Table of contents
- Abstract
- Overview of EC2 spot instance
- Simulate Spot Interruptions architecture
- Now we start creating CDK stacks
- Create Lambda function - send slack
- Create event rule of spot interruption
- Create FIS service role
- Create FIS Experiment Template
- Start experiment template
- Conclusion
Abstract
AWS Fault Injection Simulator (FIS) now supports Spot Interruptions, so you can trigger the interruption of an Amazon EC2 Spot Instance from an FIS experiment.
With FIS, you can test the resiliency of your workload and validate that your application reacts to the interruption notices that EC2 sends before terminating your instances.
This blog guides you step by step through creating FIS experiment templates using AWS CDK.
Overview of EC2 spot instance
Amazon EC2 Spot Instances can cut costs by up to 90% compared with On-Demand pricing, but EC2 can interrupt or reclaim them at any time with only a two-minute warning.
We can use `aws-node-termination-handler` to ensure that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable.
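For workloads outside Kubernetes, the same two-minute notice can be read from the instance metadata service. The sketch below only parses the notice body and computes how much time is left. It assumes the documented JSON shape of the `spot/instance-action` metadata item (an `action` and a UTC `time` field); the helper name `seconds_until_interruption` and the sample timestamps are made up for illustration.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Return the seconds left before EC2 acts on a Spot interruption notice.

    notice_body is the JSON served at
    http://169.254.169.254/latest/meta-data/spot/instance-action
    once an interruption is scheduled. Assumed shape:
    {"action": "terminate", "time": "2021-10-20T08:22:00Z"}
    """
    notice = json.loads(notice_body)
    # The timestamp is UTC with a trailing "Z"; make it timezone-aware.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Illustrative values only: a notice received exactly two minutes ahead.
sample_notice = '{"action": "terminate", "time": "2021-10-20T08:22:00Z"}'
now = datetime(2021, 10, 20, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample_notice, now)
print(remaining)
```

A real poller would fetch the metadata path every few seconds and start draining work as soon as the item appears.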
Simulate Spot Interruptions architecture
Starting the FIS experiment runs the `send-spot-instance-interruptions` action. A CloudWatch event rule catches the resulting `EC2 Spot Instance Interruption Warning` event and triggers a Lambda function that sends a Slack notification. The `aws-node-termination-handler` Kubernetes DaemonSet also takes action when it catches the event.
Now we start creating CDK stacks
Create Lambda function - send slack
The Lambda handler parses the event and sends a Slack message containing the event detail-type, the instance ID and the action.
```python
import json
from datetime import datetime

import requests


def send_slack(msg):
    """Send the message payload to Slack."""
    # Redacted placeholder: replace with your own Slack incoming-webhook URL
    webhook_url = "https://hooks.slack.com/services**"
    footer_icon = 'https://cdkworkshop.com/images/new-cdk-logo.png'
    color = '#36C5F0'
    level = ':white_check_mark: INFO :white_check_mark:'
    curr_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    payload = {
        "username": "Test",
        "attachments": [{
            "pretext": level,
            "color": color,
            "text": f"{msg}",
            "footer": f"{curr_time}",
            "footer_icon": footer_icon
        }]
    }
    requests.post(webhook_url, data=json.dumps(payload),
                  headers={'Content-Type': 'application/json'})


def handler(event, context):
    # Pull the fields of interest out of the Spot interruption event
    detail_type = event.get('detail-type', '')
    instance_id = event['detail']['instance-id']
    action = event['detail']['instance-action']
    message = f'{detail_type}\nresource: {instance_id}, action: {action}'
    send_slack(message)
```
* Lambda stack `lambda.ts`
```typescript
// Assumes `lambda` is the CDK Lambda module imported at the top of the stack file
const send_slack = new lambda.Function(this, 'slackLambda', {
  description: 'Send Event message to slack',
  runtime: lambda.Runtime.PYTHON_3_8,
  // app.zip bundles app.py together with the `requests` dependency
  code: lambda.Code.fromAsset('lambda-code/app.zip'),
  handler: 'app.handler',
  functionName: 'send-slack-spot-event'
});
```
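The handler depends on `requests`, which has to be bundled into `app.zip`. If you would rather avoid packaging a dependency, the same payload can be posted with the standard library. This is only a sketch: `build_slack_request` is a hypothetical helper, the webhook URL is the redacted placeholder from the handler, and the block builds the request without sending it.

```python
import json
import urllib.request
from datetime import datetime

# Redacted placeholder, not a working webhook URL
WEBHOOK_URL = "https://hooks.slack.com/services**"


def build_slack_request(msg: str, webhook_url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request carrying the Slack payload."""
    payload = {
        "username": "Test",
        "attachments": [{
            "pretext": ":white_check_mark: INFO :white_check_mark:",
            "color": "#36C5F0",
            "text": msg,
            "footer": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        }],
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack(msg: str) -> int:
    """Send the message and return the HTTP status code."""
    with urllib.request.urlopen(build_slack_request(msg), timeout=5) as resp:
        return resp.status


# Build a request for a sample message; nothing is sent here.
req = build_slack_request(
    "EC2 Spot Instance Interruption Warning\n"
    "resource: i-0123456789abcdef0, action: terminate"
)
print(req.get_method(), req.full_url)
```

With this variant, `lambda.Code.fromAsset` could point at a directory containing only the handler file instead of a zip with vendored packages.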
Create event rule of spot interruption
The event rule listens for `EC2 Spot Instance Interruption Warning` events and triggers the Lambda function above.
`event.ts`
```typescript
const spot_event = new event.Rule(this, 'SpotEventRule', {
  description: 'Spot termination event rule',
  ruleName: 'spot-event',
  eventPattern: {
    source: ['aws.ec2'],
    detailType: ['EC2 Spot Instance Interruption Warning'],
    detail: {
      'instance-action': ['terminate']
    }
  }
});

// Invoke the Slack Lambda whenever the rule matches
spot_event.addTarget(new event_target.LambdaFunction(send_slack));
```
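To see which events the rule lets through, the pattern can be checked against a sample event offline. The matcher below is a deliberately simplified sketch that covers exact-value matching only (a list means "any of these values", a nested dict recurses); it is not the full event pattern language, and the account ID, instance ID and timestamps in the sample event are invented.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified exact-match check of an event against an event pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event value must be a dict that matches too
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values
            return False
    return True


# The pattern the rule above renders to (CDK's detailType becomes "detail-type")
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
    "detail": {"instance-action": ["terminate"]},
}

# Sample event with invented identifiers
terminate_event = {
    "version": "0",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "source": "aws.ec2",
    "account": "123456789012",
    "region": "ap-northeast-1",
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}
stop_event = dict(
    terminate_event,
    detail={"instance-id": "i-0123456789abcdef0", "instance-action": "stop"},
)

print(matches(pattern, terminate_event))
print(matches(pattern, stop_event))
```

Because the `detail` filter only accepts `terminate`, Spot Instances whose interruption behavior is stop or hibernate would not trigger the Slack notification.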
Create FIS service role
Create an IAM role that grants AWS FIS permission to act on the target resources, which here are EC2 instances.
`fis_role.ts`
```typescript
// Role that the FIS service assumes when it runs the experiment
const fis_role = new iam.Role(this, 'FisRole', {
  roleName: 'spot-fis-test',
  assumedBy: new iam.ServicePrincipal('fis.amazonaws.com')
});

// Allow FIS to find instances and send Spot interruptions in the stack's region
const ec2_policy_sts = new iam.PolicyStatement({
  sid: 'SpotFisTest',
  effect: iam.Effect.ALLOW,
  actions: [
    'ec2:DescribeInstances',
    'ec2:StopInstances',
    'ec2:SendSpotInstanceInterruptions'
  ],
  resources: ['arn:aws:ec2:ap-northeast-1:*:instance/*'],
  conditions: {
    'StringEquals': {'aws:RequestedRegion': props?.env?.region}
  }
});

fis_role.addToPolicy(ec2_policy_sts);
```
Create FIS Experiment Template
The experiment template includes:
- Action: `send-spot-instance-interruptions`, parameter: `durationBeforeInterruption` = `PT2M`
- Targets:
  - Resource type: `aws:ec2:spot-instance`
  - Resource filters: `State.Name` = `running`
  - Selection mode: `COUNT(1)`
Stack `fis.ts`
```typescript
// Target: one running Spot Instance from the tagged EKS node group
const target: fis.CfnExperimentTemplate.ExperimentTemplateTargetProperty = {
  resourceType: 'aws:ec2:spot-instance',
  resourceTags: {'eks:nodegroup-name': 'eks-airflow-nodegroup-pet'},
  selectionMode: 'COUNT(1)',
  filters: [{
    path: 'State.Name',
    values: ['running']
  }]
};

// Action: send the interruption notice two minutes before the interruption
const action: fis.CfnExperimentTemplate.ExperimentTemplateActionProperty = {
  actionId: 'aws:ec2:send-spot-instance-interruptions',
  parameters: {'durationBeforeInterruption': 'PT2M'},
  targets: {'SpotInstances': 'spot-fis-target'}
};

const fis_exp = new fis.CfnExperimentTemplate(this, 'FisExperiment', {
  description: 'Spot Interruption Simulate',
  roleArn: fis_role.roleArn,
  tags: {
    'Name': 'spot-interrupt-test',
    'cdk': 'fis-stack'
  },
  stopConditions: [
    {source: 'none'}
  ],
  targets: {'spot-fis-target': target},
  actions: {'send-spot-instance-interruptions': action}
});
```
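The `durationBeforeInterruption` value is an ISO 8601 duration, so `PT2M` means two minutes. A small check before deploying can catch a malformed value early. The sketch below handles minute-only durations and assumes the accepted range is 2 to 15 minutes, which is my reading of the FIS action reference and worth confirming against the current documentation; both function names are hypothetical.

```python
import re


def duration_minutes(iso_duration: str) -> int:
    """Parse a minute-only ISO 8601 duration such as 'PT2M' into minutes."""
    match = re.fullmatch(r"PT(\d+)M", iso_duration)
    if match is None:
        raise ValueError(f"unsupported duration format: {iso_duration!r}")
    return int(match.group(1))


def validate_duration_before_interruption(value: str) -> int:
    """Check the value against the assumed 2 to 15 minute range."""
    minutes = duration_minutes(value)
    if not 2 <= minutes <= 15:
        raise ValueError(f"expected 2 to 15 minutes, got {minutes}")
    return minutes


minutes = validate_duration_before_interruption("PT2M")
print(minutes)
```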
Start experiment template
- Start
- Complete
- Slack is notified of the event, and `aws-node-termination-handler` takes action as well
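The experiment can also be started from a script instead of the console. The sketch below assumes a boto3-style FIS client exposing `start_experiment` and `get_experiment` with the response shape shown in the comments; to keep the block runnable offline it is exercised with `FakeFisClient`, a hypothetical stand-in, and the template ID is a made-up placeholder.

```python
import time
import uuid

# Statuses after which the experiment will not change any more
TERMINAL_STATUSES = {"completed", "stopped", "failed"}


def run_experiment(fis_client, template_id: str, poll_seconds: float = 10.0) -> str:
    """Start an experiment from a template and wait for a terminal status."""
    started = fis_client.start_experiment(
        clientToken=str(uuid.uuid4()),  # idempotency token
        experimentTemplateId=template_id,
    )
    experiment_id = started["experiment"]["id"]
    while True:
        experiment = fis_client.get_experiment(id=experiment_id)["experiment"]
        status = experiment["state"]["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)


class FakeFisClient:
    """Hypothetical offline stand-in for boto3.client("fis")."""

    def __init__(self):
        self._statuses = iter(["initiating", "running", "completed"])

    def start_experiment(self, clientToken, experimentTemplateId):
        return {"experiment": {"id": "EXP-hypothetical",
                               "state": {"status": "initiating"}}}

    def get_experiment(self, id):
        return {"experiment": {"id": id,
                               "state": {"status": next(self._statuses)}}}


final_status = run_experiment(FakeFisClient(), "EXT-hypothetical", poll_seconds=0)
print(final_status)
```

Against a real account you would pass `boto3.client("fis")` and the ID of the template created by the CDK stack instead of the fake client.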
Conclusion
This kind of FIS experiment lets us simulate a Spot interruption to verify both `aws-node-termination-handler` and the fault tolerance of the application.
We should also keep FIS pricing in mind: AWS FIS costs `$0.10` per action-minute.
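As a rough worked example of that pricing, the sketch below multiplies action-minutes by the `$0.10` rate quoted above. It assumes, for illustration only, that the single action in this experiment is billed for the two minutes set in `durationBeforeInterruption`; actual billing may differ, so check the FIS pricing page.

```python
from decimal import Decimal

# Rate quoted in this post: $0.10 per action-minute
PRICE_PER_ACTION_MINUTE = Decimal("0.10")


def experiment_cost(action_minutes: list) -> Decimal:
    """Sum the minutes each action ran and multiply by the per-minute rate."""
    total_minutes = sum(Decimal(m) for m in action_minutes)
    return total_minutes * PRICE_PER_ACTION_MINUTE


# One action assumed to run for 2 minutes
cost = experiment_cost([2])
print(f"${cost}")
```

At this rate, running the experiment ten times under the same assumption would cost about $2.00.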