Introduction
Database performance monitoring is a critical aspect of maintaining optimal application performance and user experience. Amazon Relational Database Service (RDS) provides a powerful feature called Enhanced Monitoring that allows administrators to gain deeper visibility into the health and performance of their database instances. This blog post will explore Enhanced Monitoring in detail, covering its features, implementation, benefits, and best practices.
What is Enhanced Monitoring?
Enhanced Monitoring is an advanced monitoring feature available for Amazon RDS instances that provides real-time operating system (OS) metrics for your database instances. While standard CloudWatch metrics offer basic database performance statistics, Enhanced Monitoring goes a step further by providing visibility into processes or threads that consume CPU, file system performance, and disk I/O latency.
The key difference between standard CloudWatch monitoring and Enhanced Monitoring is:
- CloudWatch: Gathers metrics from the hypervisor layer for an instance
- Enhanced Monitoring: Collects metrics directly from the agent on the instance
This fundamental difference allows Enhanced Monitoring to provide more accurate and granular metrics, especially when the instance experiences high CPU utilization.
How Enhanced Monitoring Works
Enhanced Monitoring uses an agent installed on the RDS instance to collect OS-level metrics. Here’s how the system functions:
- When enabled, an agent is deployed on your RDS instance
- The agent collects OS-level metrics at your specified interval (as frequent as every 1 second)
- Metrics are pushed to CloudWatch Logs rather than standard CloudWatch metrics
- These metrics can be viewed within the RDS console or retrieved from CloudWatch Logs
The architecture involves several AWS services working together:
- RDS Instance: Where the monitoring agent runs
- CloudWatch Logs: Where the metrics are stored
- IAM: For managing the permissions needed to collect and store the metrics
Available Metrics in Enhanced Monitoring
Enhanced Monitoring provides an extensive set of metrics that standard monitoring doesn’t offer. These include:
OS Metrics
- General: OS version, virtualization type, system uptime
- CPU: User, system, idle, wait, steal, guest, irq, nice percentages (per CPU core and total)
- Memory: Total, active, inactive, free, cached, buffered, available
- Tasks: Total number of processes, sleeping, running, blocked, stopped, zombie
- Swap Space: Total, free, used, cached
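Each Enhanced Monitoring sample arrives in CloudWatch Logs as a single JSON document, so the OS metrics above can be read with ordinary JSON tooling. The following is a minimal sketch, not a complete parser: the sample event is made up and heavily trimmed, and the field names (cpuUtilization, memory, swap, instanceID) are assumed to match the documented OS-metrics JSON, so verify them against a real event from your own instance.

```python
import json

# Hypothetical, heavily trimmed sample event; real events carry many more fields.
SAMPLE_EVENT = json.dumps({
    "instanceID": "mydbinstance",
    "numVCPUs": 2,
    "cpuUtilization": {"user": 12.5, "system": 4.0, "wait": 1.5, "idle": 82.0, "total": 18.0},
    "memory": {"total": 8000000, "free": 1000000, "cached": 3000000, "buffers": 200000},
    "swap": {"total": 4000000, "free": 3900000, "cached": 0},
})


def summarize_os_metrics(message: str) -> dict:
    """Pull a few headline numbers out of one Enhanced Monitoring JSON event."""
    event = json.loads(message)
    cpu = event.get("cpuUtilization", {})
    memory = event.get("memory", {})
    swap = event.get("swap", {})
    mem_total = memory.get("total", 0)
    swap_total = swap.get("total", 0)
    return {
        "instance": event.get("instanceID"),
        "cpu_total_pct": cpu.get("total"),
        "cpu_wait_pct": cpu.get("wait"),
        # Percentages are derived here; guard against missing or zero totals.
        "memory_free_pct": round(100.0 * memory.get("free", 0) / mem_total, 1) if mem_total else None,
        "swap_used_pct": round(100.0 * (swap_total - swap.get("free", 0)) / swap_total, 1) if swap_total else None,
    }


print(summarize_os_metrics(SAMPLE_EVENT))
```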
Process List Metrics
- Process ID (PID)
- Name: Process name
- CPU%: CPU percentage used by the process
- Memory%: Memory percentage used by the process
- VSZ/RSS: Virtual and resident set size
- Status: Process status (running, sleeping, etc.)
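To see which processes dominate CPU or memory, the process list in an event can be ranked directly. This is a sketch under assumptions: the field names (processList, name, id, cpuUsedPc, memoryUsedPc) are assumed to match the OS-metrics JSON, and the sample values are invented for illustration.

```python
# Hypothetical sample; process names and numbers are invented.
SAMPLE_PROCESS_EVENT = {
    "processList": [
        {"name": "postgres", "id": 101, "cpuUsedPc": 35.5, "memoryUsedPc": 12.0},
        {"name": "OS processes", "id": 102, "cpuUsedPc": 1.1, "memoryUsedPc": 3.4},
        {"name": "RDS processes", "id": 103, "cpuUsedPc": 4.2, "memoryUsedPc": 6.8},
    ]
}


def top_processes(event: dict, n: int = 3, key: str = "cpuUsedPc") -> list:
    """Return (name, id, value) tuples for the n processes with the highest `key`."""
    processes = event.get("processList", [])
    ranked = sorted(processes, key=lambda p: p.get(key, 0.0), reverse=True)
    return [(p.get("name"), p.get("id"), p.get(key, 0.0)) for p in ranked[:n]]


print(top_processes(SAMPLE_PROCESS_EVENT, n=2))
```

Passing key="memoryUsedPc" ranks the same list by memory share instead.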
File System Metrics
- Filesystem utilization: Total, used, available space
- IOPS: Read/write operations per second
- Latency: Read/write latency
- Throughput: Read/write throughput
Disk I/O Metrics
- Read/Write IOPS
- Read/Write bandwidth
- Read/Write latency
- Read/Write queue depths
- Disk utilization
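The disk I/O figures lend themselves to a simple threshold check. The sketch below flags devices whose utilization or average wait time looks high. The field names (diskIO, device, util, await, avgQueueLen) are assumptions about the event layout, which can differ by engine, and the thresholds and device names are arbitrary illustrations rather than recommendations.

```python
def busy_devices(event: dict, max_util_pct: float = 80.0, max_await_ms: float = 20.0) -> list:
    """Return the devices whose utilization or await exceeds the given limits."""
    flagged = []
    for disk in event.get("diskIO", []):
        util = disk.get("util", 0.0)
        await_ms = disk.get("await", 0.0)
        if util > max_util_pct or await_ms > max_await_ms:
            flagged.append({
                "device": disk.get("device"),
                "util_pct": util,
                "await_ms": await_ms,
                "avg_queue_len": disk.get("avgQueueLen", 0.0),
            })
    return flagged


# Hypothetical sample with one saturated device and one quiet device.
sample = {"diskIO": [
    {"device": "rdsdev", "util": 95.0, "await": 4.0, "avgQueueLen": 2.5},
    {"device": "filesystem", "util": 10.0, "await": 1.0, "avgQueueLen": 0.1},
]}
print(busy_devices(sample))
```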
Setting Up Enhanced Monitoring
Setting up Enhanced Monitoring involves a few straightforward steps:
Prerequisites
- Create an IAM role that grants RDS permission to send metrics to CloudWatch Logs
- You can allow RDS to create this role automatically when enabling Enhanced Monitoring
- The managed policy AmazonRDSEnhancedMonitoringRole provides the necessary permissions
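If you prefer to create the role yourself, the prerequisite can be scripted. This is a minimal sketch assuming boto3: the function expects a client created with boto3.client("iam"), and the role name rds-monitoring-role is only an example. The trust principal and managed policy ARN are the ones used in the CloudFormation template later in this post.

```python
import json

# Lets the RDS monitoring service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "monitoring.rds.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"


def create_monitoring_role(iam_client, role_name: str = "rds-monitoring-role") -> str:
    """Create the role, attach the managed policy, and return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=MANAGED_POLICY_ARN)
    return response["Role"]["Arn"]
```

The returned ARN is the value to supply as the monitoring role when enabling Enhanced Monitoring.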
Enabling Enhanced Monitoring for New Instances
- In the RDS console, when creating a new DB instance:
- Go to the “Additional configuration” section
- Find the “Monitoring” options
- Check “Enable Enhanced monitoring”
- Select your desired granularity (1, 5, 10, 15, 30, or 60 seconds)
- Choose an IAM role (create new or use existing)
Enabling Enhanced Monitoring for Existing Instances
- In the RDS console:
- Select the instance you want to modify
- Click “Modify”
- Find the “Monitoring” section
- Check “Enable Enhanced monitoring”
- Select your granularity and IAM role
- Apply the changes immediately or during the next maintenance window
Via AWS CLI
aws rds modify-db-instance \
  --db-instance-identifier mydbinstance \
  --monitoring-interval 15 \
  --monitoring-role-arn arn:aws:iam::123456789012:role/rds-monitoring-role
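The same change can be made from a script. Here is a sketch of a boto3 equivalent of the CLI command above, assuming a client created with boto3.client("rds"). The interval check mirrors the granularity options listed earlier, and the helper name is our own.

```python
VALID_INTERVALS = (1, 5, 10, 15, 30, 60)  # seconds, as offered in the console


def enable_enhanced_monitoring(rds_client, instance_id: str, role_arn: str,
                               interval: int = 15, apply_immediately: bool = True) -> dict:
    """Turn on Enhanced Monitoring for an existing DB instance."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}, got {interval}")
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MonitoringInterval=interval,
        MonitoringRoleArn=role_arn,
        ApplyImmediately=apply_immediately,
    )
```

Setting apply_immediately to False defers the change to the next maintenance window, matching the console option.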
Via AWS CloudFormation
Resources:
  MyDBInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      MonitoringInterval: 15
      MonitoringRoleArn: !GetAtt RDSMonitoringRole.Arn
      # other DB instance properties...
  RDSMonitoringRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: monitoring.rds.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole
Viewing Enhanced Monitoring Metrics
Once enabled, you can view the Enhanced Monitoring metrics in several ways:
RDS Console
- Navigate to the RDS console
- Select your DB instance
- Choose the “Monitoring” tab
- View “Enhanced monitoring” metrics in the respective graphs
- You can adjust the time range and granularity of the displayed data
CloudWatch Logs
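Enhanced Monitoring metrics are stored in CloudWatch Logs in a single log group shared by all DB instances in the account and Region:
RDSOSMetrics
Each DB instance writes to its own log stream in that group. The stream is named after the instance’s resource ID (the DbiResourceId, which starts with db-), not its DB instance identifier.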
You can query this data using CloudWatch Logs Insights with queries like:
fields @timestamp, @message
| sort @timestamp desc
| limit 20
Or for specific metrics:
filter @message like "cpuUtilization"
| stats avg(cpuUtilization.total) by bin(5m)
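Queries like these can also be run programmatically. The sketch below assumes boto3 and a client created with boto3.client("logs"). It starts a Logs Insights query against the RDSOSMetrics log group, polls until the query finishes, and returns each result row as a dictionary. The helper name, the polling defaults, and the example query text are our own choices.

```python
import time

# Example query; the field name is assumed to match the event JSON.
CPU_QUERY = 'filter @message like "cpuUtilization" | stats avg(cpuUtilization.total) as avg_cpu by bin(5m)'


def run_insights_query(logs_client, query: str, start_epoch: int, end_epoch: int,
                       log_group: str = "RDSOSMetrics",
                       poll_seconds: float = 1.0, max_polls: int = 60) -> list:
    """Run a Logs Insights query and return rows as {field: value} dicts."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_epoch,   # seconds since the epoch
        endTime=end_epoch,
        queryString=query,
    )["queryId"]
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")
```

Values come back as strings, so convert them before doing arithmetic on the results.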
AWS CLI
# The log stream name is the instance's resource ID (DbiResourceId), not its identifier
aws logs get-log-events \
  --log-group-name RDSOSMetrics \
  --log-stream-name db-EXAMPLERESOURCEID
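Because the log stream is named after the instance’s resource ID, a script usually has to look that ID up first. The following sketch assumes boto3, with clients created via boto3.client("rds") and boto3.client("logs"). It resolves the DbiResourceId and then fetches the most recent events; the helper name is hypothetical.

```python
import json


def latest_os_metrics(rds_client, logs_client, instance_id: str, limit: int = 1) -> list:
    """Return the most recent Enhanced Monitoring events for one DB instance."""
    instance = rds_client.describe_db_instances(
        DBInstanceIdentifier=instance_id
    )["DBInstances"][0]
    stream_name = instance["DbiResourceId"]  # log stream name inside RDSOSMetrics
    response = logs_client.get_log_events(
        logGroupName="RDSOSMetrics",
        logStreamName=stream_name,
        limit=limit,
        startFromHead=False,  # read from the newest events
    )
    return [json.loads(event["message"]) for event in response["events"]]
```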
Enhanced Monitoring vs. Performance Insights
RDS offers another monitoring feature called Performance Insights. It’s important to understand how these two features complement each other:
| Feature | Enhanced Monitoring | Performance Insights |
|---|---|---|
| Focus | OS metrics, resource utilization | Database performance, SQL queries |
| Granularity | As low as 1 second | 1 second |
| Retention | Based on CloudWatch Logs settings | 7 days free, up to 24 months paid |
| Cost | Additional charges apply | Basic features free, extended retention has additional cost |
| Primary use case | Troubleshooting OS and resource issues | Identifying problematic SQL queries and database load |
For comprehensive database monitoring, it’s recommended to use both features together:
- Enhanced Monitoring: To understand resource utilization at the OS level
- Performance Insights: To identify database performance bottlenecks
Use Cases for Enhanced Monitoring
Enhanced Monitoring provides value in numerous scenarios:
Troubleshooting Performance Issues
- Identify if database performance problems are related to OS-level resource constraints
- Determine if high CPU usage is caused by database processes or other OS processes
- Detect disk I/O bottlenecks that might be affecting database performance
Capacity Planning
- Track resource usage trends over time
- Make informed decisions about when to scale up or scale out
- Understand the impact of workload increases on system resources
Security Monitoring
- Identify unexpected processes running on the database instance
- Detect unusual resource consumption patterns that might indicate security issues
- Monitor for potential resource exhaustion attacks
Performance Optimization
- Identify excessive swap usage that might indicate memory pressure
- Find file systems that are reaching capacity limits
- Detect I/O performance issues that could be addressed with configuration changes
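The swap and file system checks above are easy to automate once an event has been parsed. This is a sketch with arbitrary thresholds. The field names (fileSys, mountPoint, usedPercent, swap) are assumptions about the event layout, and the sample mount points are invented.

```python
def capacity_warnings(event: dict, max_fs_used_pct: float = 85.0,
                      max_swap_used_pct: float = 10.0) -> list:
    """Return human-readable warnings for full file systems and heavy swap use."""
    warnings = []
    for fs in event.get("fileSys", []):
        used = fs.get("usedPercent", 0.0)
        if used >= max_fs_used_pct:
            warnings.append(f"filesystem {fs.get('mountPoint')} is {used:.1f}% full")
    swap = event.get("swap", {})
    total = swap.get("total", 0)
    if total:
        swap_used = 100.0 * (total - swap.get("free", 0)) / total
        if swap_used >= max_swap_used_pct:
            warnings.append(f"swap is {swap_used:.1f}% used")
    return warnings


# Hypothetical sample event.
sample_capacity_event = {
    "fileSys": [{"mountPoint": "/rdsdbdata", "usedPercent": 92.0},
                {"mountPoint": "/", "usedPercent": 40.0}],
    "swap": {"total": 4000000, "free": 3000000},
}
print(capacity_warnings(sample_capacity_event))
```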
Best Practices
To get the most from Enhanced Monitoring, follow these best practices:
Configuration Best Practices
- Choose appropriate monitoring interval: Balance between detail and cost
- For production databases: 15-30 seconds is often a good balance
- For troubleshooting: Use 1-5 seconds temporarily
- For non-critical systems: 60 seconds may be sufficient
- Set up appropriate CloudWatch Logs retention: By default, Enhanced Monitoring data in the RDSOSMetrics log group is kept for 30 days
- Consider setting a retention period to manage costs
- Match retention to your compliance and troubleshooting needs
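- Create CloudWatch Alarms: Set up alarms on key metrics (Enhanced Monitoring data lives in CloudWatch Logs, so first publish the values you care about as CloudWatch metrics using metric filters)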
- High CPU utilization sustained over time
- Low available memory
- High disk I/O wait times
- High swap usage
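The retention and alarm recommendations above can be applied in one pass. The sketch below assumes boto3, with clients from boto3.client("logs") and boto3.client("cloudwatch"). It sets a retention period on the RDSOSMetrics log group, publishes total CPU for one instance as a custom metric through a metric filter, and alarms on it. The filter name, metric name, namespace, alarm name, and thresholds are all hypothetical.

```python
def configure_retention_and_alarm(logs_client, cloudwatch_client, instance_id: str,
                                  retention_days: int = 90, cpu_threshold_pct: float = 80.0,
                                  namespace: str = "Custom/RDSEnhanced") -> None:
    """Set log retention, then create a metric filter and an alarm on OS-level CPU."""
    logs_client.put_retention_policy(logGroupName="RDSOSMetrics", retentionInDays=retention_days)

    metric_name = f"{instance_id}-cpu-total"
    logs_client.put_metric_filter(
        logGroupName="RDSOSMetrics",
        filterName=f"{instance_id}-cpu-total",
        # The log group is shared, so restrict the filter to one instance.
        filterPattern=f'{{ ($.instanceID = "{instance_id}") && ($.cpuUtilization.total >= 0) }}',
        metricTransformations=[{
            "metricName": metric_name,
            "metricNamespace": namespace,
            "metricValue": "$.cpuUtilization.total",
        }],
    )

    cloudwatch_client.put_metric_alarm(
        AlarmName=f"{instance_id}-os-cpu-high",
        Namespace=namespace,
        MetricName=metric_name,
        Statistic="Average",
        Period=300,            # 5-minute windows
        EvaluationPeriods=3,   # sustained for 15 minutes
        Threshold=cpu_threshold_pct,
        ComparisonOperator="GreaterThanThreshold",
    )
```

The same pattern extends to memory, swap, and I/O wait by changing the JSON path in the filter.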
Operational Best Practices
- Establish baselines: Understand normal performance patterns
- Document baseline metrics during different workload periods
- Compare current metrics against established baselines
- Regular review: Don’t just wait for alerts
- Schedule regular reviews of Enhanced Monitoring data
- Look for trends that might indicate developing issues
- Correlate with application metrics: Connect database performance with user experience
- Compare Enhanced Monitoring metrics with application response times
- Identify patterns between resource utilization and application performance
- Coordinate with Performance Insights: Use both monitoring systems together
- When Performance Insights shows database load, check Enhanced Monitoring for resource constraints
- When Enhanced Monitoring shows resource issues, check Performance Insights for problematic queries
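Comparing current metrics against a baseline does not need heavy tooling. One simple approach, sketched below, stores the mean and standard deviation of a metric over a known-good period and flags values that fall more than a few standard deviations away. The three-sigma default is an arbitrary starting point, not a recommendation from AWS.

```python
from statistics import mean, pstdev


def build_baseline(samples: list) -> tuple:
    """Return (mean, population standard deviation) for a list of metric samples."""
    return (mean(samples), pstdev(samples))


def deviates(value: float, baseline: tuple, sigmas: float = 3.0) -> bool:
    """True if value lies more than `sigmas` standard deviations from the baseline mean."""
    avg, spread = baseline
    if spread == 0:
        return value != avg
    return abs(value - avg) > sigmas * spread


# Hypothetical CPU samples (percent) from a normal workload period.
cpu_baseline = build_baseline([20.0, 22.0, 18.0, 20.0])
print(deviates(30.0, cpu_baseline), deviates(21.0, cpu_baseline))
```

Keeping separate baselines per workload period (for example, business hours and nightly batch) avoids flagging expected peaks.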
Cost Considerations
Enhanced Monitoring incurs additional costs that should be considered:
- CloudWatch Logs costs:
- Charges for ingestion of log data
- Charges for storage of log data
- Charges for any custom metrics or dashboards created
- Impact of monitoring interval:
- More frequent intervals generate more data, increasing costs
- Example cost difference: sampling every 1 second produces roughly 60 times as much log data as sampling every 60 seconds
- Cost optimization:
- Set appropriate log retention periods
- Use longer monitoring intervals (lower granularity) for non-critical systems
- Temporarily increase granularity during troubleshooting, then revert to normal
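The effect of the interval on data volume is simple arithmetic. The sketch below counts events per month for a given interval and estimates ingestion volume. The average event size is an input you would need to measure on your own instances, and no price is assumed; multiply the result by your Region’s CloudWatch Logs ingestion rate to get a cost figure.

```python
SECONDS_PER_DAY = 86400


def events_per_month(interval_seconds: int, days: int = 30) -> int:
    """Number of Enhanced Monitoring events one instance emits in `days` days."""
    return SECONDS_PER_DAY * days // interval_seconds


def estimated_ingestion_gb(interval_seconds: int, avg_event_kb: float, days: int = 30) -> float:
    """Approximate log volume in GB, given a measured average event size in KB."""
    return events_per_month(interval_seconds, days) * avg_event_kb / (1024 * 1024)


for interval in (1, 15, 60):
    print(interval, events_per_month(interval))
```

At a 1-second interval this gives 2,592,000 events per 30 days, against 43,200 at 60 seconds.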
Limitations and Considerations
While Enhanced Monitoring is powerful, it has some limitations to be aware of:
- Not available on all instance classes:
- The legacy db.m1.small instance class doesn’t support Enhanced Monitoring
- Impact during high load:
- The monitoring agent itself consumes resources
- At the finest granularity (1 second), it can add slight overhead
- No direct integration with third-party monitoring tools:
- Need to use CloudWatch Logs APIs to export data
- Limited historical data without configuration:
- The default 30-day retention on the RDSOSMetrics log group may not be sufficient for long-term analysis
Advanced Usage: Automation and Custom Dashboards
Take Enhanced Monitoring to the next level with these advanced techniques:
Creating Custom CloudWatch Dashboards
- Extract specific metrics from Enhanced Monitoring logs
- Create CloudWatch metrics from log data using metric filters
- Build custom dashboards combining RDS metrics, Enhanced Monitoring metrics, and application metrics
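A dashboard that places hypervisor-level and OS-level CPU side by side can be defined as JSON. The sketch below builds such a body with one metric widget and one Logs Insights widget. The widget layout, titles, and query are illustrative, and the property names reflect our understanding of the CloudWatch dashboard body format, so validate the result in the console before relying on it.

```python
import json


def build_dashboard_body(instance_id: str, region: str, log_group: str = "RDSOSMetrics") -> str:
    """Return a dashboard body (JSON string) comparing CloudWatch and OS-level CPU."""
    query = (
        f"SOURCE '{log_group}' | filter instanceID = '{instance_id}' "
        "| stats avg(cpuUtilization.total) as os_cpu by bin(1m)"
    )
    widgets = [
        {   # Standard CloudWatch metric, measured at the hypervisor.
            "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "metrics": [["AWS/RDS", "CPUUtilization", "DBInstanceIdentifier", instance_id]],
                "stat": "Average", "period": 60, "region": region,
                "title": "CPU (CloudWatch)",
            },
        },
        {   # Enhanced Monitoring data, queried from CloudWatch Logs.
            "type": "log", "x": 12, "y": 0, "width": 12, "height": 6,
            "properties": {
                "query": query, "region": region, "view": "timeSeries",
                "title": "CPU (Enhanced Monitoring)",
            },
        },
    ]
    return json.dumps({"widgets": widgets})
```

With boto3, the returned string can be passed as DashboardBody to the CloudWatch client’s put_dashboard call, together with a DashboardName of your choosing.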
Automation with Lambda
- Create Lambda functions that process Enhanced Monitoring logs
- Implement automated responses to specific conditions
- Example: Automatically adjust RDS parameters based on observed metrics
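A Lambda function subscribed to the log group receives events as a base64-encoded, gzip-compressed JSON payload. The handler below is a minimal sketch of the processing step only: it decodes the payload, parses each Enhanced Monitoring event, and returns the ones above a CPU threshold. The threshold is hypothetical, and the automated response itself (notifying a team or changing a parameter) is deliberately left out.

```python
import base64
import gzip
import json

CPU_ALERT_PCT = 90.0  # hypothetical threshold


def decode_subscription_event(event: dict) -> dict:
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda function."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    """Collect instances whose total CPU exceeds the alert threshold."""
    data = decode_subscription_event(event)
    alerts = []
    for log_event in data.get("logEvents", []):
        metrics = json.loads(log_event["message"])
        cpu_total = metrics.get("cpuUtilization", {}).get("total", 0.0)
        if cpu_total > CPU_ALERT_PCT:
            alerts.append({"instance": metrics.get("instanceID"), "cpu_total": cpu_total})
    # A real function would act on the alerts here, for example by publishing a notification.
    return {"alerts": alerts}
```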
Integration with Third-Party Monitoring
- Use Lambda to forward Enhanced Monitoring data to third-party systems
- Create unified monitoring views across all systems
- Apply advanced analytics to Enhanced Monitoring data
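Most third-party systems expect flat metric names rather than nested JSON. Before forwarding, an event can be flattened into dotted names, as in the sketch below. The rds.os prefix is an arbitrary choice, and list-valued sections such as the per-device and per-process entries are skipped here for brevity.

```python
def flatten_numeric(event: dict, prefix: str = "rds.os") -> dict:
    """Flatten nested numeric fields into {dotted.name: float}; lists and strings are skipped."""
    points = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, bool):
            return  # bool is a subclass of int; do not treat it as a metric
        elif isinstance(node, (int, float)):
            points[path] = float(node)

    walk(event, prefix)
    return points


# Hypothetical trimmed event.
print(flatten_numeric({"instanceID": "mydbinstance", "numVCPUs": 2,
                       "cpuUtilization": {"total": 18.0, "wait": 1.5}}))
```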
Conclusion
Enhanced Monitoring for Amazon RDS provides deep visibility into the operating system and resources of your database instances. By collecting metrics directly from the database instance rather than the hypervisor, it offers a more accurate view of resource utilization, particularly in high-load situations.
While it comes with additional costs, the benefits for troubleshooting, optimization, and capacity planning make it an essential tool for any serious database administrator. When used in conjunction with Performance Insights and standard CloudWatch metrics, Enhanced Monitoring provides a comprehensive monitoring solution that can help ensure your databases perform optimally and reliably.
By following the best practices outlined in this guide, you can implement Enhanced Monitoring effectively and leverage its capabilities to maintain healthy, high-performing database environments.