Abstract
A service connection represents a service principal in Azure DevOps: an identity that uses headless authentication.
We create a connection from Azure Pipelines to external and remote services in order to execute tasks in a job. Once you establish a connection, you can view, edit, and add security to the service connection.
To make Azure DevOps (ADO) work with AWS services, we install the AWS Toolkit for Microsoft Azure DevOps plugin, which contains tasks you can use in build and release definitions in Azure DevOps to interact with AWS services (especially when the ADO agents are not EC2 instances, which could otherwise assume an instance profile).
When we create a service connection of type AWS in ADO, we need an AWS identity (typically an IAM user) and must fill in its access key and credential key (the secret access key) so that the service connection can access AWS services.
Rotating IAM access keys periodically is a recommended security best practice. In this blog post, you will learn how to create an AWS service connection that follows least privilege and security best practices, and how to rotate the access key automatically using AWS serverless services such as AWS Lambda, an EventBridge rule, and AWS Step Functions.
Solution overview
The IAM access key is used by Azure DevOps pipelines to manage AWS development resources as well as some production resources, so it should be rotated weekly. To reduce the impact on pipeline jobs, we schedule the rotation on Sunday.
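The weekly Sunday schedule could be expressed as an EventBridge cron rule. The following is a minimal sketch, not the stack used in this post: the function names, the rule name, the target ID, the 00:00 UTC run time, and the ARNs are all assumptions for illustration.

```python
def weekly_sunday_cron(hour_utc: int = 0, minute: int = 0) -> str:
    """Build an EventBridge cron expression that fires every Sunday (UTC)."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    # EventBridge cron fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour_utc} ? * SUN *)"


def schedule_rotation(events_client, rule_name: str, state_machine_arn: str, role_arn: str) -> None:
    """Create or update the rule and point it at the state machine.

    events_client is expected to be boto3.client("events"); role_arn is a
    hypothetical role that EventBridge can assume to start the execution.
    """
    events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=weekly_sunday_cron(),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "rotate-key-sfn", "Arn": state_machine_arn, "RoleArn": role_arn}],
    )
```

Passing the client in as an argument keeps the sketch easy to try with a stub before pointing it at a real account.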
The operation is orchestrated by AWS Step Functions, which invokes Lambda functions to rotate the access key, store the new access key in Secrets Manager, and update the Azure DevOps service connection (AWS type) with the new access key. Finally, it sends the result status to a Slack channel.
The operation also requires ADO credentials such as a personal access token (PAT), the workspace URL, and the project ID, which are stored in SSM Parameter Store.
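Reading those ADO settings from Parameter Store might look like the sketch below. The parameter names under the `/ado` prefix and the helper name `load_ado_settings` are hypothetical; only the `get_parameters` call itself is a real boto3 SSM API.

```python
def load_ado_settings(ssm_client, prefix: str = "/ado") -> dict:
    """Fetch the PAT, workspace URL, and project ID in a single call.

    ssm_client is expected to be boto3.client("ssm"). The parameter names
    below are placeholders, not the names used in the original stack.
    """
    names = {
        "pat": f"{prefix}/pat",
        "workspace_url": f"{prefix}/workspace-url",
        "project_id": f"{prefix}/project-id",
    }
    response = ssm_client.get_parameters(Names=list(names.values()), WithDecryption=True)
    if response.get("InvalidParameters"):
        raise KeyError(f"missing parameters: {response['InvalidParameters']}")
    by_name = {p["Name"]: p["Value"] for p in response["Parameters"]}
    return {key: by_name[name] for key, name in names.items()}
```

`WithDecryption=True` matters if the PAT is stored as a SecureString, which is the sensible choice for a token.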
Process flow detail
A scheduled EventBridge rule triggers the Step Functions state machine weekly, on Sunday
Step Functions executes the state machine, which includes:
Invoke a Lambda function to create a new access key -> delete the old one -> update the new access key credentials in Secrets Manager
Invoke a Lambda function to re-create the Azure DevOps service endpoint from the new access key and credential in Secrets Manager and the PAT from Parameter Store
Invoke a Lambda function to send the result status as a Slack message
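The flow above can be sketched as an Amazon States Language definition, built here as a Python dictionary. This is an illustration under assumptions, not the definition deployed in this post: the state names, the `ResultPath` values, and the catch-all routing of failures to the Slack step are guesses at how the three steps might be wired.

```python
import json


def build_definition(rotate_arn: str, update_arn: str, notify_arn: str) -> str:
    """Return a state machine definition (JSON string) for the three steps."""
    # Any failure is captured under $.error and routed to the Slack step,
    # so the notification can report which step failed and why.
    catch_all = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifySlack"}]
    definition = {
        "Comment": "Rotate the IAM access key used by the ADO service connection",
        "StartAt": "RotateAccessKey",
        "States": {
            "RotateAccessKey": {
                "Type": "Task",
                "Resource": rotate_arn,
                "ResultPath": "$.rotate",
                "Catch": catch_all,
                "Next": "UpdateServiceConnection",
            },
            "UpdateServiceConnection": {
                "Type": "Task",
                "Resource": update_arn,
                "ResultPath": "$.update",
                "Catch": catch_all,
                "Next": "NotifySlack",
            },
            "NotifySlack": {"Type": "Task", "Resource": notify_arn, "End": True},
        },
    }
    return json.dumps(definition, indent=2)
```

The returned string can be passed as the `definition` argument when creating or updating a state machine.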
Lambda function - Rotate access key and update Secrets Manager
Source handler: rotate-access-key.py
The function requires an IAM service role that has permission to create and delete IAM access keys and to put the secret value in Secrets Manager.
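A rotation handler along these lines could look like the sketch below. It is not the contents of rotate-access-key.py: the function names, the environment variable names `IAM_USER_NAME` and `SECRET_ID`, and the JSON layout of the secret are assumptions. It follows the create -> delete -> store order described above and returns only the key ID, never the secret.

```python
import json


def rotate_access_key(iam, secrets, user_name: str, secret_id: str) -> str:
    """Create a new key, delete the old ones, and store the new key as a secret.

    iam and secrets are expected to be boto3 "iam" and "secretsmanager" clients.
    """
    existing = iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    old_key_ids = [key["AccessKeyId"] for key in existing]

    # An IAM user can hold at most two access keys, so this call fails if
    # two already exist; a real handler would need to deal with that case.
    new_key = iam.create_access_key(UserName=user_name)["AccessKey"]

    for key_id in old_key_ids:
        iam.delete_access_key(UserName=user_name, AccessKeyId=key_id)

    # Assumed secret layout: a JSON object holding both halves of the key.
    secrets.put_secret_value(
        SecretId=secret_id,
        SecretString=json.dumps(
            {"AccessKeyId": new_key["AccessKeyId"], "SecretAccessKey": new_key["SecretAccessKey"]}
        ),
    )
    return new_key["AccessKeyId"]


def handler(event, context):
    """Lambda entry point; the environment variable names are hypothetical."""
    import os
    import boto3

    key_id = rotate_access_key(
        boto3.client("iam"),
        boto3.client("secretsmanager"),
        os.environ["IAM_USER_NAME"],
        os.environ["SECRET_ID"],
    )
    return {"AccessKeyId": key_id}
```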
Lambda function - Send to Slack
Source handler: send-slack.py
The function requires an IAM service role that has permission to get the Slack webhook URL from Parameter Store.
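Posting the result to a Slack incoming webhook needs only the standard library. The sketch below is an assumption about what such a handler might do, not the contents of send-slack.py; the message wording and function names are illustrative. The webhook URL would first be read from Parameter Store, for example with `ssm.get_parameter(Name=..., WithDecryption=True)["Parameter"]["Value"]`.

```python
import json
import urllib.request


def build_slack_payload(status: str, failed_step: str = "", error: str = "") -> dict:
    """Compose the message body; failure details are added only when present."""
    text = f"Access key rotation: {status}"
    if failed_step:
        text += f"\nFailed step: {failed_step}"
    if error:
        text += f"\nError: {error}"
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, opener=urllib.request.urlopen) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=10) as response:
        return response.status
```

The `opener` parameter exists only so the function can be exercised without network access.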
Lambda function - Azure DevOps service connection
- Solution for creating the Azure DevOps service connection: use a Lambda function. There are two ways to do this:
Build a custom Lambda layer that contains the Azure DevOps Python module to run the Azure DevOps CLI
Build a Docker Lambda container image
- Challenges
Why do we have challenges here? The big issue: "azure-cli package could lose some weight". At the time of writing this post, installing or packaging azure-cli takes more than 1 GB, which is too big for a Lambda function deployed as a zip file: the deployment package, including any custom Lambda layers, is limited to 250 MB unzipped.
Fortunately, there is a workaround that reduces the size of the Docker image: install only the Azure CLI libraries that are actually needed.
- Build the Lambda container image using CodePipeline
The Dockerfile builds a container image of only about 299.41 MB. The Lambda container image acts as the Azure CLI to run Azure DevOps commands.
For CI, to build and automatically deploy the Lambda container image, we use CodePipeline with the following stages:
Pull the source code when a commit is merged to master
Trigger CodeBuild with buildspec.yml
We could use azure-devops-python-api to write Python code that handles the ADO service connection, but using its wrapper tool, the Azure DevOps CLI, is much more convenient, so I wrote a bash script, run.sh, and use Python to execute it:
subprocess.check_call(f"export AZURE_DEVOPS_EXT_PAT={_pat}; ./run.sh {access_key} {cred}", shell=True)
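An alternative way to make the same call is to pass the PAT through the child process environment and the key values as an argument list, which avoids interpolating secrets into a shell string. This is a sketch of that variation, not the code used in this post; the helper names are hypothetical. `AZURE_DEVOPS_EXT_PAT` is the variable the Azure DevOps CLI extension reads, and it is case-sensitive.

```python
import os
import subprocess


def build_run_invocation(pat: str, access_key: str, cred: str, base_env=None):
    """Return (args, env) for running run.sh without going through a shell."""
    env = dict(os.environ if base_env is None else base_env)
    env["AZURE_DEVOPS_EXT_PAT"] = pat  # read by the Azure DevOps CLI extension
    # Arguments are passed as a list, so special characters in the secret
    # are never interpreted by a shell.
    return ["./run.sh", access_key, cred], env


def run_update(pat: str, access_key: str, cred: str) -> None:
    """Execute run.sh and raise CalledProcessError on a non-zero exit code."""
    args, env = build_run_invocation(pat, access_key, cred)
    subprocess.check_call(args, env=env)
```

A secret access key can contain characters such as `/` and `+`, and a list-based call passes them through unchanged.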
Test the step function
Prerequisite: you need to deploy the pipeline stack first, then push the code to CodeCommit so that it triggers the pipeline to deploy the stacks and to build, push, and update the container image.
Let's check the full diagram
Manually trigger "start execution" of the state machine to test
Failed case
Success case -> Trigger a sample Azure DevOps pipeline which uses AWS resources to test
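Besides the console, an execution can also be started from a script. A minimal sketch, assuming a boto3 Step Functions client and a placeholder state machine ARN; the helper names are illustrative.

```python
import json
from typing import Optional


def start_rotation(sfn_client, state_machine_arn: str, payload: Optional[dict] = None) -> str:
    """Start one execution and return its ARN.

    sfn_client is expected to be boto3.client("stepfunctions").
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload or {}),
    )
    return response["executionArn"]


def execution_status(sfn_client, execution_arn: str) -> str:
    """Return the current status, e.g. RUNNING, SUCCEEDED, or FAILED."""
    return sfn_client.describe_execution(executionArn=execution_arn)["status"]
```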
Conclusion
We now have an automated way to rotate the AWS access key of the Azure DevOps service connection every week. Any failed state or step is reported to Slack with details of which step failed and the error message.
You should be aware that the simflexcloud-d1-azure-devops-sfn-rotate-key-task task does not include the AWS credential key in its output, for security reasons, so that value must be read from Secrets Manager.
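Reading the rotated key back from Secrets Manager could look like the following sketch. The JSON layout of the secret (an object with `AccessKeyId` and `SecretAccessKey` fields) and the helper name are assumptions; adjust them to however the rotation function actually stores the value.

```python
import json


def read_rotated_key(secrets_client, secret_id: str):
    """Return (access_key_id, secret_access_key) from the stored secret.

    secrets_client is expected to be boto3.client("secretsmanager").
    """
    raw = secrets_client.get_secret_value(SecretId=secret_id)["SecretString"]
    secret = json.loads(raw)  # assumed layout, see the note above
    return secret["AccessKeyId"], secret["SecretAccessKey"]
```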
References: