For observability, Amazon CloudWatch is one option for collecting and tracking metrics and for raising alerts when a metric crosses a configured threshold. It is especially useful when you do not want to adopt external monitoring and observability tools such as Datadog or Prometheus, or pay extra for transferring data out.
What we need is an automated way of setting up CloudWatch alarms for EC2 instances and customising both the metrics and the alerts. In particular, when new EC2 instances are created by autoscaling or on demand, the automation must install the CloudWatch agent on them and set up alarms for metrics such as CPU utilization, disk I/O, and memory usage.
In this blog post, I demonstrate how to automate the setup and configuration of CloudWatch alarms on Amazon EC2 and how to send alert notifications to a Slack channel.
🚀 Solution overview
The CloudWatch Auto Alarms and Install CloudWatch Agent AWS Lambda functions quickly and automatically create a standard set of CloudWatch alarms for new Amazon EC2 instances (for an existing instance, restarting it generates the Running state event that triggers the functions). This saves the time spent installing and configuring the CloudWatch agent, deploying alarms, and setting up metric alerts, and it reduces the skills gap involved in creating and managing alarms.
This blog post gives an example of setting the default configuration and creating alarms for Amazon EC2 with the Amazon Linux AMI (the Lambda function also supports other operating systems such as Ubuntu, Red Hat, SUSE, and Windows):
Disk Space Used
CloudWatch agent predefined metric sets - Advanced
CPU: cpu_usage_idle, cpu_usage_iowait, cpu_usage_user, cpu_usage_system
Disk: disk_used_percent, disk_inodes_free
Diskio: diskio_io_time, diskio_write_bytes, diskio_read_bytes, diskio_writes, diskio_reads
Netstat: netstat_tcp_established, netstat_tcp_time_wait
When triggered, the created alarms notify an Amazon SNS topic. AWS Chatbot subscribes to that SNS topic and is associated with a Slack channel, so alert messages are delivered directly to Slack.
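To make the alarm-to-SNS wiring concrete, here is a minimal sketch of the parameters one such alarm might carry. The field names mirror the input of CloudWatch's PutMetricAlarm API, and `CWAgent` is the CloudWatch agent's default namespace. Everything else is an assumption for illustration and not taken from the solution: the helper name `diskUsedAlarm`, the placeholder alarm name, the 80 percent default threshold, the 5-minute period, and the idea that the metric is aggregated on `InstanceId` only.

```typescript
// Subset of the CloudWatch PutMetricAlarm input used in this sketch.
interface MetricAlarmParams {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: "Average" | "Maximum" | "Minimum" | "Sum" | "SampleCount";
  Period: number; // seconds
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator:
    | "GreaterThanThreshold"
    | "GreaterThanOrEqualToThreshold"
    | "LessThanThreshold"
    | "LessThanOrEqualToThreshold";
  AlarmActions: string[]; // SNS topic ARNs notified when the alarm fires
}

// Hypothetical helper: builds the parameters for a disk usage alarm on one
// instance. Threshold and period defaults are illustrative assumptions.
function diskUsedAlarm(
  instanceId: string,
  snsTopicArn: string,
  thresholdPercent: number = 80
): MetricAlarmParams {
  return {
    AlarmName: `disk-used-${instanceId}`, // placeholder name for this sketch
    Namespace: "CWAgent", // default namespace of the CloudWatch agent
    MetricName: "disk_used_percent",
    // Assumption: the agent config aggregates this metric on InstanceId only.
    // Otherwise the alarm must also carry the metric's other dimensions.
    Dimensions: [{ Name: "InstanceId", Value: instanceId }],
    Statistic: "Average",
    Period: 300,
    EvaluationPeriods: 1,
    Threshold: thresholdPercent,
    ComparisonOperator: "GreaterThanThreshold",
    AlarmActions: [snsTopicArn], // the alarm notifies the SNS topic
  };
}
```

Calling `diskUsedAlarm("i-0123456789abcdef0", topicArn)` with the ARN of the alert topic yields an object that could be passed to a PutMetricAlarm call; the only link between the alarm and Slack is the topic ARN in `AlarmActions`.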
🚀 Flow overview
Prerequisites: the EC2 instances must use AMI versions that install and start the SSM Agent automatically at startup.
In the flow chart above, the solution performs the following steps:
For any EC2 instance that is launched or restarted, the EventBridge rule cw-auto-alarm catches the instance's new Running state event and triggers its targets, which are the Lambda functions.
The Lambda function install-cw-agent-install-cw-agent does the following:
- Gets the instance tags and checks whether they contain the tag key Create_Auto_Alarms (taken from the ALARM_TAG environment variable of the Lambda). If they do, it proceeds; otherwise, it ignores the instance.
- Runs the SSM document AWS-ConfigureAWSPackage to install the CloudWatch agent on the target instance, then runs the SSM document AWS-RunShellScript to load the CloudWatch agent configuration from SSM Parameter Store and start the CloudWatch agent service.
The Lambda function cw-auto-alarm creates CloudWatch alarms based on the EC2 instance tags, using the name format AutoAlarm-<InstanceID>-<cw-namespace>-<MetricName>-<ComparisonOperator>-<Period>-<EvaluationPeriods>-<Statistic>-<CloudWatchAutoAlarms>. These alarms send alerts to the SNS topic which is defined in
When the SNS topic receives a message, it forwards the message to the AWS Chatbot webhook, and AWS Chatbot then sends an alert message to the registered Slack channel.
If an instance is terminated, the EventBridge rule cw-auto-alarm catches the event and triggers the Lambda function to delete the alarms that belong to the terminated instance.
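The decisions in the steps above can be sketched as plain functions. This is an illustration under stated assumptions, not the project's Lambda code: the function names `routeStateChange`, `hasAlarmTag`, `agentInstallCommands`, and `buildAlarmName` are hypothetical, the event fields follow the EC2 Instance State-change Notification format, the shell command targets Linux instances, and the Parameter Store name and the notation of the period segment are placeholders chosen by the caller.

```typescript
// Subset of an "EC2 Instance State-change Notification" event from EventBridge.
interface Ec2StateChangeEvent {
  source: string;
  "detail-type": string;
  detail: { "instance-id": string; state: string };
}

type Action = "create-alarms" | "delete-alarms" | "ignore";

// Decide what to do with a state-change event: running -> create, terminated -> delete.
function routeStateChange(event: Ec2StateChangeEvent): Action {
  if (
    event.source !== "aws.ec2" ||
    event["detail-type"] !== "EC2 Instance State-change Notification"
  ) {
    return "ignore";
  }
  switch (event.detail.state) {
    case "running":
      return "create-alarms";
    case "terminated":
      return "delete-alarms";
    default:
      return "ignore";
  }
}

// Only instances carrying the alarm tag key are processed.
function hasAlarmTag(
  tags: { Key: string; Value: string }[],
  alarmTag: string = "Create_Auto_Alarms"
): boolean {
  return tags.some((t) => t.Key === alarmTag);
}

// Subset of the SSM SendCommand input used in this sketch.
interface SendCommandInput {
  DocumentName: string;
  InstanceIds: string[];
  Parameters: Record<string, string[]>;
}

// The two SSM commands: install the agent, then load its config and start it.
// configParameterName is a hypothetical SSM Parameter Store name.
function agentInstallCommands(instanceId: string, configParameterName: string): SendCommandInput[] {
  const ctl = "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl";
  return [
    {
      DocumentName: "AWS-ConfigureAWSPackage",
      InstanceIds: [instanceId],
      Parameters: { action: ["Install"], name: ["AmazonCloudWatchAgent"] },
    },
    {
      DocumentName: "AWS-RunShellScript",
      InstanceIds: [instanceId],
      Parameters: {
        commands: [`${ctl} -a fetch-config -m ec2 -s -c ssm:${configParameterName}`],
      },
    },
  ];
}

interface AlarmNameParts {
  instanceId: string;
  namespace: string;
  metricName: string;
  comparisonOperator: string;
  period: string; // notation such as "5m" is an assumption
  evaluationPeriods: number;
  statistic: string;
  suffix: string; // stands in for the final <CloudWatchAutoAlarms> segment
}

// Join the parts in the order given by the alarm name format.
function buildAlarmName(p: AlarmNameParts): string {
  return [
    "AutoAlarm", p.instanceId, p.namespace, p.metricName, p.comparisonOperator,
    p.period, String(p.evaluationPeriods), p.statistic, p.suffix,
  ].join("-");
}
```

Keeping these pieces as pure functions makes the flow easy to unit test without AWS credentials; the actual calls to EC2, Systems Manager, and CloudWatch would wrap them in the Lambda handlers.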
🚀 Deploying the solution
For infrastructure as code, this blog post uses AWS CDK in TypeScript.
Stack visualization chart
Add the AWS Chatbot app to the Slack channel.
Provide the Slack workspace ID and Slack channel ID to the CDK code.
Deploy the CDK stacks:
cdk deploy --all
🚀 Test alarms
cdk deploy --all includes creating an EC2 instance, but there might be a gap before the EventBridge rule can catch that instance's Running state-change event, so to be sure, just restart the EC2 instance.
Create one more instance through the stack to test alarm creation for a newly launched instance.
EC2 with proper tags
will be created according to alarms
- In-alarm threshold
- Slack alert
Destroy all the stacks within this project by running
cdk destroy --all
The CloudWatch log groups created by the Lambda functions are not part of the project stacks, so they are not deleted. Although the log groups have a retention period, you might want to delete them to clean up completely.
In this post, I use serverless services such as Lambda functions, EventBridge rules, Systems Manager, and SNS to automate the creation of CloudWatch alarms and alerts for Amazon EC2 instances in an AWS account.
By using the SSM Agent from Systems Manager, the Lambda function can remotely install the CloudWatch agent on the EC2 instances to collect system logs and metrics, and then create CloudWatch alarms based on the EC2 tags.
The solution is deployed using AWS CDK in TypeScript. For production, I encourage creating a CDK pipeline so that the IaC is deployed entirely through CodePipeline.