Enterprise Manager: how Comcast manages and secures thousands of databases

December 7, 2023 | 5 minute read
Moe Fardoost
Senior Director of Product Management

If you were unable to attend Oracle CloudWorld 2023, be sure to return to the observability blog site for more updates on the team’s key sessions. In this post, I’ll review session LRN3210, “Safeguarding Databases: Comprehensive Monitoring and Vulnerability Assessment,” which was co-presented by Oracle and customer Comcast. The session covered Comcast’s use cases and how they implemented their solution with Oracle Enterprise Manager (EM).

About session LRN3210

This session, presented at Oracle CloudWorld 2023, covered EM’s latest enhancements in database monitoring and vulnerability management and was hosted by Ana McCollum and Harish Niddagatta of Oracle Product Management. For more details on the topic, be sure to review the presentation content linked in the resources section of this blog. What follows is a summary of Comcast’s section of the presentation, which covered the role of EM in their IT environment and how it helps Comcast monitor and secure thousands of databases.

Database engineering and operations at Comcast

Anupam Mohanta, Director of Database Engineering, leads the team managing database engineering at Comcast. His team is responsible for maintaining the EM deployment, designing and developing monitoring solutions, and automating common tasks to reduce DBA workload and improve developer productivity. Anupam summarized the scope of the team’s challenges and how they overcame them using EM.

Figure 1: Comcast Database Engineering and Operations

EM at Comcast

EM is an integral part of Comcast’s infrastructure footprint. Comcast manages over 13,000 database targets. About 80% are Oracle Database and the remaining 20% are MySQL. Let’s take a look at their top use cases.

Like many customers, Comcast started their use of EM with monitoring, alerting, and performance troubleshooting of their database fleet. They have expanded their EM usage to include securing databases, risk compliance reporting, and deploying database-as-a-service for their DevOps team.

Figure 2: Comcast use cases

EM architecture at Comcast

Comcast maintains a high-availability architecture using an active-passive model. Communication between agents and the mid-tier goes through a load balancer. When EM is brought down for planned maintenance, they use Always-On Monitoring so that they continue to receive notifications for critical alerts from their targets.

Figure 3: Comcast architecture

DBA workload optimizations

The self-healing architecture deployed by the Comcast team enables them to remediate about 500 alerts each month with no action required from DBAs. This level of automation represents about 10% of the overall alerts processed each month in the following categories:

  • Filesystem Usage
  • Fast Recovery Area (FRA) Usage
  • MySQL Replication
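
As a quick sanity check on these figures: if roughly 500 auto-remediated alerts are about 10% of the alert volume in question, that volume is on the order of 5,000 alerts per month. The short Python sketch below shows the arithmetic. The function name is illustrative, the inputs are the rounded figures quoted above, and it assumes the 10% is measured against the same alert population as the 500.

```python
def estimate_total_alerts(auto_remediated: int, share: float) -> int:
    """Estimate monthly alert volume from the auto-remediated count and its share."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return round(auto_remediated / share)


# Rounded figures from the session summary (both are approximate).
total = estimate_total_alerts(auto_remediated=500, share=0.10)
print(total)        # estimated alerts processed per month
print(total - 500)  # estimated alerts that were not auto-remediated
```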

Comcast has also automated the resolution of many common events using Corrective Actions. These Corrective Actions are based on custom scripts that execute automatically and resolve the events whenever they occur.

Figure 4: Self-healing workflow

Self-healing implementation and observations

Key aspects of the implementation include:

  • Corrective action target types for cluster and single-instance Oracle Databases, MySQL, and host
  • Corrective action methods can be OS command or host-based authentication
  • Key variables/parameters:  ORACLE_SID, ORACLE_HOME, DB_NAME, and PDB_NAME
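
To make the parameter list concrete, here is a minimal Python sketch of how a custom corrective-action script might collect these values. It assumes the values reach the script as environment variables with the same names; how EM actually passes them depends on how the corrective action is defined. The `DbContext` class, the `load_db_context` helper, and the choice of which parameters are required are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DbContext:
    oracle_sid: str
    oracle_home: str
    db_name: str
    pdb_name: Optional[str]  # assumption: only set for pluggable databases


def load_db_context(env: Mapping[str, str] = os.environ) -> DbContext:
    """Collect the key parameters, failing early if a required one is missing."""
    # Assumption: these three are always required for an Oracle Database target.
    required = ("ORACLE_SID", "ORACLE_HOME", "DB_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return DbContext(
        oracle_sid=env["ORACLE_SID"],
        oracle_home=env["ORACLE_HOME"],
        db_name=env["DB_NAME"],
        pdb_name=env.get("PDB_NAME") or None,
    )
```

Failing early with a clear message is a deliberate choice here: a script that starts with an empty `ORACLE_HOME` tends to fail later in ways that are harder to diagnose.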

Top challenges observed:

  • Credential setup: Corrective Actions will not execute successfully without the necessary credentials in place; the affected jobs remain in a pending state indefinitely.
  • Script placement: to run a script as part of a corrective action, the script must first be deployed on the target.
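
Both challenges lend themselves to a preflight check before a corrective action is enabled. The Python sketch below covers only the script-placement half: it checks that a script exists on the host as a regular file and is executable by the current user. The function name and the example path are hypothetical, and the sketch does not query EM for credentials.

```python
import os


def script_is_deployed(path: str) -> bool:
    """Return True if the path is a regular file that the current user can execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Hypothetical location; substitute wherever scripts are staged on the target.
    candidate = "/opt/dba/scripts/clear_fra.sh"
    status = "ready" if script_is_deployed(candidate) else "missing or not executable"
    print(f"{candidate}: {status}")
```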

Self-service administration, provisioning, and delivery

Developers, application teams, DevOps engineers, and testers all need access to databases, and meeting their requirements is challenging. The Comcast team has met this challenge with a Comcast-built chatbot powered by Natural Language Processing (NLP). Dev teams can chat with the bot to perform the following tasks:

  • Monitoring – Health Checks
  • Performance Review & Remediation – Get AWR Reports, Identify & Kill Blocking Sessions
  • Switchover – In a disaster recovery scenario, application teams can use the bot to switch over to an available node. The exposed API also lets developers automate full-stack failover (application and database)

These capabilities give application teams considerable power. Access to EM resources is nonetheless secured by integrating Comcast’s Configuration Management Database (CMDB) with EM groups, which prevents unauthorized access.
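
The pattern described here, a chat request that is matched to a task and then authorized against group membership before it runs, can be sketched as follows. This is a minimal Python illustration, not Comcast’s implementation: the keyword matching stands in for real NLP, and the intent names, group names, and the `ALLOWED` table are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping from intent to trigger keywords (a stand-in for NLP).
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "health_check": {"health", "status"},
    "awr_report": {"awr"},
    "kill_blocking_session": {"blocking", "kill"},
    "switchover": {"switchover", "failover"},
}

# Hypothetical access table: which groups may trigger which intents.
ALLOWED: Dict[str, Set[str]] = {
    "app-team-billing": {"health_check", "awr_report", "switchover"},
    "qa-team": {"health_check"},
}


def classify(message: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the message, if any."""
    words = set(message.lower().split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


def handle(message: str, group: str) -> str:
    """Classify the request, check the caller's group, then dispatch or refuse."""
    intent = classify(message)
    if intent is None:
        return "Sorry, I did not understand that request."
    if intent not in ALLOWED.get(group, set()):
        return f"Access denied: {group} may not run {intent}."
    return f"Running {intent} for {group}."  # a real bot would call an API here


print(handle("please run a health check", "qa-team"))
print(handle("start switchover to standby", "qa-team"))
```

Checking authorization after classification but before dispatch keeps the access rule in one place, regardless of how many tasks the bot learns to recognize.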

Security and Compliance

The EM compliance framework and corrective actions provide the ability to report and fix Center for Internet Security (CIS) violations. Figure 5 shows the automated workflow in action.

Figure 5: Implementation of CIS compliance workflow 

The implementation has the following key aspects:

  • CIS compliance target types include cluster, pluggable, and single-instance databases
  • Corrective action methods include OS command and host-based authentication
  • Key variables/parameters are ORACLE_SID, ORACLE_HOME, DB_NAME, and PDB_NAME

Together, the EM compliance framework and corrective actions let Comcast report and fix CIS violations and keep its environments secure.
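
The report-and-fix loop can be sketched in a few lines of Python. Everything in this example is illustrative: the rule IDs, the target names, and the `remediate` callback are hypothetical placeholders rather than actual CIS benchmark items or EM APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Finding:
    target: str      # hypothetical target name
    rule_id: str     # hypothetical rule identifier, not a real CIS item
    compliant: bool


def report_and_fix(findings: Iterable[Finding],
                   remediate: Callable[[Finding], bool]) -> List[str]:
    """Report each violation and attempt a corrective action for it."""
    log = []
    for finding in findings:
        if finding.compliant:
            continue
        fixed = remediate(finding)  # stand-in for a corrective action
        outcome = "fixed" if fixed else "needs manual review"
        log.append(f"{finding.target} {finding.rule_id}: violation {outcome}")
    return log


findings = [
    Finding("db01", "RULE-A", compliant=True),
    Finding("db02", "RULE-B", compliant=False),
]
for line in report_and_fix(findings, remediate=lambda f: True):
    print(line)
```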

Summary and wrap-up

Anupam Mohanta, Director of Database Engineering at Comcast, presented the company’s use of Oracle Enterprise Manager at Oracle CloudWorld 2023. This post summarizes what was presented during the session, including use cases, implementation approaches, and the benefits the Comcast team is seeing.

Resources 

Moe Fardoost

Senior Director of Product Management

Moe is a seasoned information technology professional with extensive experience in every stage of the software market, having held hands-on roles as a software developer, QA lead, ITOps and customer service manager, product marketer, and, most recently, product manager. Moe's passions are cloud observability and automation, fly fishing, and skiing.

