Ask Claude About Your AWS Alarms: Building an MCP Server for CloudWatch

CloudWatch Is Comprehensive. It's Also a Pain During an Incident.

The full source for this project is at github.com/ivandir/mcp-cloudwatch.

CloudWatch ingests metrics from every AWS service, runs Insights queries against your log groups, and fires alarms when thresholds are crossed. That's genuinely impressive. It's also genuinely hostile to use when something's on fire. You're context-switching between the Metrics console, the Logs Insights query editor, and the Alarms dashboard, each with its own navigation model, its own query language, its own mental overhead. By the time you've assembled the picture, the incident has moved on.

What you actually want in those moments is a conversation. "Are there any alarms firing in production right now?" "Show me the error rate for the payments service over the last hour." "Query the application logs for timeout errors in the last 15 minutes." These aren't hard questions, but answering them through the console requires multiple navigation steps and context switches for each one.

An MCP server for CloudWatch gives Claude Code the ability to answer those questions directly, in context, as part of a larger investigation. The model can chain calls (check alarms, pull the relevant metric, query the logs, synthesize an explanation) without the engineer having to drive each step manually.

Six Tools, One Prefix

The server registers six tools, all prefixed with cloudwatch_ to avoid name collisions when multiple MCP servers are active in the same Claude Code session:

  • cloudwatch_list_alarms: Returns alarms filtered by state (OK / ALARM / INSUFFICIENT_DATA) and optional name prefix. Strips ~40 boto3 fields down to 9 essentials: name, state, reason, metric, namespace, dimensions, threshold, comparisonOperator, updatedAt.
  • cloudwatch_get_metric: Fetches a time series for any CloudWatch metric — namespace, metric name, dimensions, stat, period, and time window. Returns {"label": ..., "points": [{"timestamp": ..., "value": ...}]} sorted ascending by timestamp.
  • cloudwatch_list_metrics: Lists available metrics within a namespace, optionally filtered by metric name — useful for discovery when you don't know the exact metric name.
  • cloudwatch_query_logs: Runs a CloudWatch Logs Insights query across one or more log groups. Polls until Complete with a hard 30-second timeout and returns rows as [{"field": "value", ...}].
  • cloudwatch_get_log_events: Fetches raw log events from a log stream with optional filter pattern, start time, and result limit.
  • cloudwatch_list_log_groups: Lists available CloudWatch Logs log groups — useful for discovery before querying.

Six tools cover the 90% case for operational use. The prefix is deliberate. When you have multiple MCP servers registered simultaneously, tool names like list_alarms collide. A database server might have one. A monitoring server might have one. The cloudwatch_ prefix namespaces them cleanly and makes the model's tool selection more predictable across a crowded session.
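To make the get_metric return shape concrete, here's a sketch of the kind of shaping function involved. This is illustrative rather than the project's actual code, and it assumes a single GetMetricData-style result dict with Timestamps and Values lists:

```python
from datetime import datetime, timezone

def shape_metric_result(result: dict) -> dict:
    """Convert one GetMetricData result into the compact
    {"label": ..., "points": [...]} shape, sorted ascending by timestamp."""
    points = [
        {"timestamp": ts.isoformat(), "value": v}
        for ts, v in zip(result["Timestamps"], result["Values"])
    ]
    points.sort(key=lambda p: p["timestamp"])
    return {"label": result.get("Label", ""), "points": points}
```

The sort matters: GetMetricData can return points newest-first, and an ascending series lets the model read the trend left to right without post-processing.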

Session Setup: client.py

All the AWS credential setup lives in client.py, a single file that creates the two boto3 clients used by every tool:

import os
import boto3

region = os.environ.get("AWS_REGION", "us-east-1")
profile = os.environ.get("AWS_PROFILE")

session = boto3.Session(profile_name=profile, region_name=region)
cloudwatch = session.client("cloudwatch")
logs = session.client("logs")

That's the whole file. AWS_REGION defaults to us-east-1 if not set; AWS_PROFILE is optional and passes straight through to boto3.Session. The session is created once at import time, and both clients are module-level globals that the tool implementations import directly.

If you use aws-vault or aws sso login, this is zero-config. Those tools inject credentials into the environment before spawning the Claude Code process, and boto3's default credential chain picks them up automatically. No tokens in .claude/settings.json, no credential files to manage separately.
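As a concrete example, an aws-vault launch might look like this (the profile name is a placeholder):

```shell
# aws-vault resolves the profile, injects short-lived credentials into the
# environment, then execs the wrapped command; Claude Code and any MCP
# servers it spawns inherit those credentials
aws-vault exec my-profile -- claude
```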

Registering with Claude Code

The quickest path is the claude mcp add command:

# minimal registration
claude mcp add cloudwatch -- python -m mcp_cloudwatch

# with explicit region and profile
claude mcp add cloudwatch \
  -e AWS_REGION=us-east-1 \
  -e AWS_PROFILE=my-profile \
  -- python -m mcp_cloudwatch

For teams that prefer committing MCP config to the repository, the equivalent .mcp.json block is:

{
  "mcpServers": {
    "cloudwatch": {
      "command": "python",
      "args": ["-m", "mcp_cloudwatch"],
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "my-profile"
      }
    }
  }
}

If you work across multiple accounts, add a separate entry per account with a distinct key (e.g., cloudwatch-prod, cloudwatch-staging) and different AWS_PROFILE values in each env block. Because all six tools are namespaced, both instances can be active at the same time without ambiguity.
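A two-account setup along those lines might look like the following (the profile names are placeholders):

```json
{
  "mcpServers": {
    "cloudwatch-prod": {
      "command": "python",
      "args": ["-m", "mcp_cloudwatch"],
      "env": { "AWS_REGION": "us-east-1", "AWS_PROFILE": "prod" }
    },
    "cloudwatch-staging": {
      "command": "python",
      "args": ["-m", "mcp_cloudwatch"],
      "env": { "AWS_REGION": "us-east-1", "AWS_PROFILE": "staging" }
    }
  }
}
```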

How query_logs Works Under the Hood

CloudWatch Logs Insights is asynchronous. You submit a query, get back a query ID, then poll until the status flips to Complete. The query_logs implementation handles that polling loop explicitly:

import time

# query_id comes from the logs.start_query(...) call described above
for _ in range(30):
    result = logs.get_query_results(queryId=query_id)
    status = result["status"]
    if status == "Complete":
        return [
            {r["field"]: r["value"] for r in row}
            for row in result["results"]
        ]
    if status in ("Failed", "Cancelled"):
        raise RuntimeError(f"Query {status.lower()}")
    time.sleep(1)
raise TimeoutError("Query did not complete within 30 seconds")

Thirty iterations at one second each gives a hard 30-second timeout. When the query completes, rows come back as a list of dicts, [{"@timestamp": "...", "@message": "..."}], which the model can read and reason about directly. If the query fails or times out, the error message carries enough context for the model to decide what to do next: reduce the time window, add a more specific filter, or surface the issue to the user.
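For readers who want the full shape, here's a self-contained sketch of the whole flow, with the client passed in as a parameter so the example stands alone. The project's actual tools code may differ:

```python
import time

def query_logs(logs, log_group_names, query, start_ts, end_ts, timeout_s=30):
    """Start a Logs Insights query, poll until Complete, return rows as dicts.

    `logs` is a boto3 CloudWatch Logs client (or anything exposing the same
    start_query / get_query_results interface). start_ts and end_ts are
    epoch seconds, as the Logs Insights API expects.
    """
    query_id = logs.start_query(
        logGroupNames=log_group_names,
        queryString=query,
        startTime=start_ts,
        endTime=end_ts,
    )["queryId"]

    for _ in range(timeout_s):
        result = logs.get_query_results(queryId=query_id)
        status = result["status"]
        if status == "Complete":
            # Each row arrives as a list of {"field": ..., "value": ...}
            # pairs; flatten every row into one plain dict.
            return [
                {r["field"]: r["value"] for r in row}
                for row in result["results"]
            ]
        if status in ("Failed", "Cancelled"):
            raise RuntimeError(f"Query {status.lower()}")
        time.sleep(1)
    raise TimeoutError(f"Query did not complete within {timeout_s}s")
```

Injecting the client also makes the polling logic easy to unit-test with a stub that returns Running a few times before Complete.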

Stripping DescribeAlarms Down to What Matters

CloudWatch's DescribeAlarms API returns around 40 fields per alarm. Most of them (evaluation-period configuration, the alarm ARN, action ARNs, enablement flags) are irrelevant when the question is "what's firing and why." The list_alarms tool in tools/alarms.py strips the response down to exactly nine fields:

{
  "name": "PaymentsAPIErrorRate",
  "state": "ALARM",
  "reason": "Threshold Crossed: 3 out of the last 3 datapoints...",
  "metric": "ErrorRate",
  "namespace": "MyApp/Payments",
  "dimensions": [{"Name": "Service", "Value": "payments"}],
  "threshold": 5.0,
  "comparisonOperator": "GreaterThanOrEqualToThreshold",
  "updatedAt": "2026-03-13T14:22:00Z"
}

That's enough for the model to understand what's firing, why, and when. The same philosophy applies to get_metric, which returns {"label": ..., "points": [...]} with points sorted ascending by timestamp so the model can read the trend directly without any post-processing.
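The stripping step itself is a small transformation. Here's a sketch (the function name is illustrative; the input field names follow boto3's DescribeAlarms response shape):

```python
def summarize_alarm(alarm: dict) -> dict:
    """Reduce one boto3 MetricAlarms entry to the nine fields the model needs."""
    return {
        "name": alarm["AlarmName"],
        "state": alarm["StateValue"],
        "reason": alarm.get("StateReason", ""),
        "metric": alarm.get("MetricName"),
        "namespace": alarm.get("Namespace"),
        "dimensions": alarm.get("Dimensions", []),
        "threshold": alarm.get("Threshold"),
        "comparisonOperator": alarm.get("ComparisonOperator"),
        "updatedAt": str(alarm.get("StateUpdatedTimestamp", "")),
    }
```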

The parse_time Utility

Every tool that accepts a time window takes start_time and end_time parameters. Both accept either ISO 8601 timestamps or relative strings like -1h, -30m, or -7d. The parse_time utility resolves those relative expressions to absolute datetime objects at call time.

This matters more than it sounds. When you ask Claude Code "show me the error rate for the last hour," the model can pass -1h directly. No need to reason about the current UTC timestamp, no computing an exact ISO 8601 string. Without this, every time-windowed query needs an extra reasoning step before the actual call. It's a small thing, but across a dozen tool calls during an incident it adds up.
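A minimal parse_time along those lines might look like this (a sketch, not necessarily the project's exact implementation):

```python
import re
from datetime import datetime, timedelta, timezone

_RELATIVE = re.compile(r"^-(\d+)([smhd])$")
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_time(value: str) -> datetime:
    """Resolve '-1h' / '-30m' / '-7d' relative strings, or ISO 8601, to a datetime."""
    match = _RELATIVE.match(value)
    if match:
        amount, unit = match.groups()
        # Relative values resolve against the current UTC time at call time
        return datetime.now(timezone.utc) - timedelta(**{_UNITS[unit]: int(amount)})
    return datetime.fromisoformat(value)
```

Anything that doesn't match the relative pattern falls through to fromisoformat, so absolute timestamps still work unchanged.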

A Real Incident Investigation, Step by Step

The most useful thing about this server isn't any individual tool. It's chaining them. Here's what a typical investigation looks like using the actual tool names:

  • Ask Claude Code: "Are there any alarms in ALARM state?" → cloudwatch_list_alarms(state="ALARM") returns two alarms: PaymentsAPIErrorRate and PaymentsAPILatencyP99.
  • "Show me the error rate for the last 30 minutes." → cloudwatch_get_metric(namespace="MyApp/Payments", metric_name="ErrorRate", stat="Average", start_time="-30m") returns a time series with a spike starting 18 minutes ago.
  • "What log groups are available for the payments service?" → cloudwatch_list_log_groups() returns /ecs/payments and /ecs/payments-worker.
  • "Query the payments logs for errors in the last 30 minutes." → cloudwatch_query_logs(log_group_names=["/ecs/payments"], query="fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20", start_time="-30m") returns the 20 most recent error lines as structured rows.
  • Claude synthesizes: "The error rate spiked 18 minutes ago, correlating with a series of ConnectionRefused errors in the payments logs targeting the database endpoint. The P99 latency alarm followed 3 minutes later. This looks like a database connectivity issue rather than an application error."

That whole sequence takes about 45 seconds in Claude Code. The same investigation through the console takes 5 to 10 minutes of navigation, tab-switching, and manual correlation.

The model doesn't just fetch data; it correlates it. That correlation step is where the time savings are. A human reading three separate consoles has to hold the context in working memory. Claude Code holds it in its context window and reasons across all of it simultaneously.

IAM Permissions

The server is read-only by design. The minimal IAM policy covers exactly what the six tools need:

  • cloudwatch:DescribeAlarms
  • cloudwatch:GetMetricData
  • cloudwatch:ListMetrics
  • logs:StartQuery
  • logs:GetQueryResults
  • logs:FilterLogEvents
  • logs:DescribeLogGroups
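Expressed as an IAM policy document, that list becomes something like the following (Resource is left wide open here for brevity; scope it down to specific log groups and alarms in practice):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:DescribeAlarms",
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "logs:StartQuery",
        "logs:GetQueryResults",
        "logs:FilterLogEvents",
        "logs:DescribeLogGroups"
      ],
      "Resource": "*"
    }
  ]
}
```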

No write permissions. You can't create, modify, or delete alarms through this server. Can't publish custom metrics or purge log groups. These are intentional omissions. A model that can silently modify alarm thresholds during an incident is a liability, not a tool. The read-only constraint also means these tools can be granted automatic approval in Claude Code's permission settings without any review burden. That's a real operational win.

What I'd Tell Someone Building a Similar Server

The biggest mistake in observability MCP servers is exposing too much. CloudWatch has hundreds of API operations. An MCP server that exposes all of them gives the model too many choices and produces worse results than one that exposes six well-chosen tools. A few clear, purpose-built tools outperform an exhaustive surface almost every time.

The prefix lesson generalizes beyond this project. Claude Code users routinely run five or more MCP servers simultaneously. Generic names like list_alarms or query_logs collide. Namespaced names don't.

Response shaping matters as much as the API call itself. DescribeAlarms returns 40 fields; the model needs 9. Passing the full boto3 response through wastes context tokens on irrelevant fields and buries the signal. Before designing a tool's return type, ask what the model actually needs to answer the question. That question alone catches most of the bad design decisions before they ship.

The full implementation, Python, with installation instructions and Claude Code configuration, is at github.com/ivandir/mcp-cloudwatch.