Diving Deep into AWS DevOps Agent
📍 TBA
Modern cloud applications generate overwhelming amounts of operational data across distributed systems. When incidents occur:
- Mean Time to Resolution (MTTR) is too high due to manual investigation across multiple tools
- Context switching between observability platforms, logs, metrics and infrastructure consoles slows response
- Knowledge is siloed, making incident response dependent on specific team members
- Preventative measures are reactive and inconsistent, leading to recurring incidents
- On-call burden impacts team morale and productivity
AWS DevOps Agent is a frontier AI agent that autonomously investigates, resolves, and prevents operational incidents across distributed cloud applications. The agent leverages advanced natural language processing to conduct real-time root cause analysis by correlating data across metrics, logs, traces, and deployment events from Amazon CloudWatch, third-party observability platforms and open-source tools.
Your workshop environment has multiple applications deployed (modules to be added incrementally):
- Module 1: Simple Lambda: a Lambda function that writes to an S3 bucket and a DynamoDB table every 60 seconds.
- Module 2: Data Analytics Pipeline: a pipeline that uses Amazon Managed Workflows for Apache Airflow (MWAA) for workflow orchestration, AWS Lambda for serverless data generation, and AWS Glue for ETL (extract, transform, load) processing.
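To make Module 1 more concrete, here is a minimal sketch of what such a Lambda function could look like. It is not the workshop's actual code: the record fields, the `BUCKET_NAME` and `TABLE_NAME` environment variables, the `records/` key prefix, and the assumption that an EventBridge schedule invokes the function every 60 seconds are all illustrative.

```python
import json
import time
import uuid


def build_record(now=None):
    """Build one sample record; the field names are illustrative assumptions."""
    ts = int(now if now is not None else time.time())
    return {"id": str(uuid.uuid4()), "created_at": ts, "source": "module-1-lambda"}


def write_record(record, s3_client, table, bucket_name):
    """Write the same record to S3 (as a JSON object) and to DynamoDB (as an item)."""
    key = f"records/{record['id']}.json"  # hypothetical key layout
    s3_client.put_object(
        Bucket=bucket_name,
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    table.put_item(Item=record)
    return key


def lambda_handler(event, context):
    """Entry point; assumed to be invoked every 60 seconds by a schedule."""
    import os
    import boto3  # imported lazily so the helpers above run without AWS access

    s3_client = boto3.client("s3")
    # BUCKET_NAME and TABLE_NAME are assumed environment variables.
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    record = build_record()
    key = write_record(record, s3_client, table, os.environ["BUCKET_NAME"])
    return {"statusCode": 200, "body": json.dumps({"s3_key": key})}
```

Passing the S3 client and DynamoDB table into `write_record` keeps the write logic separate from the AWS wiring. A failure in either write (for example a missing permission) would surface as a Lambda error, which is the kind of signal an alarm could pick up.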
To start, you will set up an Agent Space and investigate a CloudWatch alarm. Once the agent is operational, you will simulate errors and use AWS DevOps Agent to identify the root cause and generate a mitigation plan and prevention recommendations.
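For readers who have not worked with CloudWatch alarms, the sketch below shows one way to define an alarm on a Lambda function's error count with boto3. It is an assumption about what such an alarm might look like, not the workshop's configuration: the alarm name, threshold, period, and the example function name are hypothetical.

```python
def lambda_error_alarm_params(function_name, alarm_name=None):
    """Return put_metric_alarm arguments for a Lambda 'Errors' alarm.

    The threshold and period are illustrative choices, not workshop values.
    """
    return {
        "AlarmName": alarm_name or f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,  # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 1,  # fire on one or more errors in a window
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(cloudwatch_client, function_name):
    """Create (or update) the alarm using a CloudWatch client passed in by the caller."""
    params = lambda_error_alarm_params(function_name)
    cloudwatch_client.put_metric_alarm(**params)
    return params["AlarmName"]
```

With AWS credentials configured, this could be called as `create_alarm(boto3.client("cloudwatch"), "workshop-simple-lambda")`, where the function name is a placeholder.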
Delivered by Dionysios Kakaletris (Senior Technical Account Manager @ AWS) and Thomas Wieger (Senior Solutions Architect @ AWS)