My Quest for CloudWatch Monitoring on AWS Lightsail with Terraform
Last month, I started a small side project riding the OpenClaw craze. After a bit of research I decided I needed something beyond the usual free Vercel setup, because I needed server-sent events (SSE) and a long-running server. I shopped around and decided to try AWS Lightsail. The appeal was obvious: predictable pricing (with the first 90 days free), a simple interface, and enough power for a modest web application. Just like EC2, but without the hassle. I picked the 1 GB RAM, 2 vCPU, 40 GB SSD instance. Everything was going smoothly until I asked myself a simple question:
“How do I know if my server is running out of memory?”
I was worried my side-project might go viral and I’d need to scale up quickly.
What followed was a frustrating deep dive into AWS documentation, Stack Overflow threads, and AI chats, ending in a solution that I wish someone had written up clearly from the start. So here we are.
The Problem: Lightsail’s Hidden Limitation
Lightsail gives you basic metrics out of the box — CPU utilization, network in/out, and status checks. But if you want memory usage, disk space, or swap utilization, you need to install the CloudWatch agent.
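For what it’s worth, those built-in metrics are also reachable from the CLI, which is handy for a quick look before any agent is involved. Something like this pulls the last hour of CPU data (the instance name is a placeholder, and the date arithmetic assumes GNU date):
# Built-in Lightsail CPU metric for the last hour
aws lightsail get-instance-metric-data \
  --instance-name my-instance-name \
  --metric-name CPUUtilization \
  --period 300 \
  --unit Percent \
  --statistics Average \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"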
“No problem,” I thought. “I’ve done this on EC2 a dozen times.”
I SSHed into my instance, installed the CloudWatch agent, configured it, and started it up. Then I waited. And waited. No metrics appeared in CloudWatch.
The agent logs told the story:
No volume is attached. Skipping waiter for attached volume.
And more ominously:
No valid credentials found for the CloudWatch agent.
Here’s what I quickly learned: Lightsail instances have an assumed IAM role that you cannot modify. On EC2, I’d simply attach an IAM role with CloudWatchAgentServerPolicy to my instance. On Lightsail, that option doesn’t exist. The instance has some IAM role baked in for Lightsail-specific operations, but you can’t add policies to it or swap it out.
The CloudWatch agent’s --mode ec2 flag tries to use instance metadata credentials, which exist but lack CloudWatch permissions. And despite what you might hope, the --mode onPremise flag doesn’t magically make the agent ignore the instance role. It still tries to use it.
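If you’re hitting the same wall, these messages show up in the agent’s own log file (this is the default install path):
tail -n 100 /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log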
The Solution: IAM User Credentials
The workaround is straightforward once you know it: create an IAM user with the necessary permissions and configure the CloudWatch agent to use those credentials explicitly.
This felt a bit old-school to me. IAM users with long-lived access keys? In 2026? But sometimes the pragmatic solution is the right one.
Here’s how I set it up, step by step.
Step 1: Create an IAM User for the Agent
First, I needed an IAM user with only the permissions the CloudWatch agent requires. AWS provides a managed policy for exactly this: CloudWatchAgentServerPolicy.
resource "aws_iam_user" "cloudwatch_agent" {
  name = "lightsail-cloudwatch-agent"
  path = "/lightsail/"
}

resource "aws_iam_user_policy_attachment" "cloudwatch_agent" {
  user       = aws_iam_user.cloudwatch_agent.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

resource "aws_iam_access_key" "cloudwatch_agent" {
  user = aws_iam_user.cloudwatch_agent.name
}
Why a dedicated user instead of reusing an existing one? Principle of least privilege. This user can only publish CloudWatch metrics — nothing else. If the credentials were ever compromised, the blast radius is minimal.
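For completeness, the CLI equivalent of the Terraform above is three calls, using the same user name:
aws iam create-user --user-name lightsail-cloudwatch-agent --path /lightsail/
aws iam attach-user-policy \
  --user-name lightsail-cloudwatch-agent \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
aws iam create-access-key --user-name lightsail-cloudwatch-agent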
Step 2: Store Credentials on the Instance
The CloudWatch agent looks for credentials in a specific location with a specific profile name. Through trial and error (and eventually finding it buried in AWS documentation), I discovered it expects:
File: /root/.aws/credentials
[AmazonCloudWatchAgent]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
The profile name AmazonCloudWatchAgent isn’t arbitrary — it’s what the agent looks for by default. You can use a different name, but then you need additional configuration.
I also learned the hard way that permissions matter:
chmod 600 /root/.aws/credentials
Without this, the agent might refuse to read the file, or worse, other users on the system could read your AWS credentials.
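Later in this post, my user data script creates this file at boot. A minimal sketch of that step, with the ${...} placeholders filled in by Terraform’s templatefile (shown further down), looks like this:
# Write the agent's credentials file with restrictive permissions
mkdir -p /root/.aws
cat > /root/.aws/credentials <<'EOF'
[AmazonCloudWatchAgent]
aws_access_key_id = ${aws_access_key_id}
aws_secret_access_key = ${aws_secret_access_key}
EOF
chmod 600 /root/.aws/credentials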
Step 3: Tell the Agent Where to Find Credentials
Even with credentials in place, the agent might still try to use instance metadata first. To force it to use our credentials file, we need a common configuration file:
File: /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml
[credentials]
shared_credential_profile = "AmazonCloudWatchAgent"
shared_credential_file = "/root/.aws/credentials"
This tells the agent: “Don’t guess. Use this specific profile from this specific file.”
Step 4: Configure What Metrics to Collect
Now for the fun part: deciding what to monitor. The CloudWatch agent uses a JSON configuration file. Here’s what I settled on:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "namespace": "Lightsail/Custom",
    "append_dimensions": {
      "InstanceId": "my-instance-name"
    },
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_idle", "cpu_usage_user", "cpu_usage_system"],
        "totalcpu": true
      },
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      },
      "swap": {
        "measurement": ["used_percent"]
      }
    }
  }
}
A few things worth noting:
Custom namespace: I use Lightsail/Custom instead of the default CWAgent. This makes it easy to find my Lightsail metrics in the CloudWatch console and keeps them separate from EC2 metrics if I have those too.
Instance identifier: Since Lightsail instances don’t have EC2-style instance IDs, I use the instance name as a dimension. This becomes important when you have multiple instances and need to tell their metrics apart.
CPU idle vs. CPU used: The agent reports cpu_usage_idle (percentage of time CPU is idle). To alarm on high CPU usage, you check when idle drops below a threshold. It’s a bit backwards, but it’s how the agent works.
Step 5: Start the Agent
With everything configured, starting the agent requires a specific incantation:
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m onPremise \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
  -s
The -m onPremise flag is important. Even though it doesn’t prevent the agent from trying instance credentials, it does change some behaviors around how metrics are tagged. Combined with our explicit credential configuration, this gives us what we need.
The -s flag starts the agent after loading the configuration.
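Two quick checks I run afterwards: ask the agent for its own status, and confirm metrics are actually landing in the custom namespace (the second command needs CloudWatch read access, so run it from your workstation rather than with the agent’s limited credentials):
# Is the agent running?
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status

# Are metrics showing up?
aws cloudwatch list-metrics --namespace "Lightsail/Custom"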
Step 6: Set Up Alarms
Metrics are only useful if you act on them. I created CloudWatch alarms for the three things most likely to cause problems:
High CPU usage:
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "lightsail-cpu-high"
  comparison_operator = "LessThanThreshold"
  metric_name         = "cpu_usage_idle"
  namespace           = "Lightsail/Custom"
  statistic           = "Average"
  threshold           = 20 # Alert when idle drops below 20% (i.e., usage above 80%)
  evaluation_periods  = 2
  period              = 300
  # Dimensions must match what the agent publishes; CPU metrics also carry a
  # "cpu" dimension ("cpu-total" for the aggregate).
  dimensions = {
    InstanceId = "my-instance-name"
    cpu        = "cpu-total"
  }
}
High memory usage:
resource "aws_cloudwatch_metric_alarm" "memory_high" {
  alarm_name          = "lightsail-memory-high"
  comparison_operator = "GreaterThanThreshold"
  metric_name         = "mem_used_percent"
  namespace           = "Lightsail/Custom"
  statistic           = "Average"
  threshold           = 80
  evaluation_periods  = 2
  period              = 300
  dimensions = {
    InstanceId = "my-instance-name"
  }
}
Disk filling up:
resource "aws_cloudwatch_metric_alarm" "disk_high" {
  alarm_name          = "lightsail-disk-high"
  comparison_operator = "GreaterThanThreshold"
  metric_name         = "disk_used_percent"
  namespace           = "Lightsail/Custom"
  statistic           = "Average"
  threshold           = 80
  evaluation_periods  = 2
  period              = 300
  # Disk metrics also carry path, device, and fstype dimensions; the values
  # below are examples, so copy the exact set from the CloudWatch console.
  dimensions = {
    InstanceId = "my-instance-name"
    path       = "/"
    device     = "xvda1"
    fstype     = "ext4"
  }
}
I use two evaluation periods of 5 minutes each. This means the metric needs to breach the threshold for 10 minutes before alerting. Brief spikes during deployments or cron jobs won’t wake me up at 3 AM.
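One caveat: as written, these alarms only change state; to actually get notified you’d attach an SNS topic through alarm_actions, which I’ve left out here. Either way, it’s worth checking that they leave INSUFFICIENT_DATA once metrics start flowing:
aws cloudwatch describe-alarms \
  --alarm-names lightsail-cpu-high lightsail-memory-high lightsail-disk-high \
  --query 'MetricAlarms[].{Name:AlarmName,State:StateValue}' \
  --output table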
Automating Everything with Terraform
Doing all of this manually once is educational. Doing it for every new instance is tedious. So I wrapped everything into a Terraform template.
The key insight is using Terraform’s templatefile function to inject the IAM credentials into the user data script:
resource "aws_lightsail_instance" "main" {
  name              = var.instance_name
  availability_zone = var.availability_zone # required argument, e.g. "us-east-1a"
  blueprint_id      = var.blueprint_id
  bundle_id         = var.bundle_id

  user_data = templatefile("${path.module}/templates/user_data.sh.tpl", {
    aws_access_key_id     = aws_iam_access_key.cloudwatch_agent.id
    aws_secret_access_key = aws_iam_access_key.cloudwatch_agent.secret
    aws_region            = var.aws_region
    cloudwatch_namespace  = var.cloudwatch_namespace
    instance_name         = var.instance_name
  })
}
The user data script runs on first boot and handles:
- Installing the CloudWatch agent (detecting Amazon Linux vs. Ubuntu)
- Creating the credentials file
- Creating the common config
- Deploying the agent configuration
- Starting and enabling the agent
By the time I can SSH into the instance, monitoring is already running.
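The only part of that script that really varies by OS is the install step. Here’s a sketch of how user_data.sh.tpl can handle it; the download URL and package name are the ones documented at the time of writing, so double-check them for your architecture:
#!/bin/bash
# Sketch of the install step. Inside a Terraform template, shell variables
# must be written as $${VAR} so templatefile leaves them alone.
set -euo pipefail

. /etc/os-release
if [ "$${ID}" = "ubuntu" ]; then
  # Ubuntu: install the official .deb
  curl -fsSL -o /tmp/amazon-cloudwatch-agent.deb \
    https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
  dpkg -i /tmp/amazon-cloudwatch-agent.deb
else
  # Amazon Linux 2/2023: the agent ships in the distro repos
  yum install -y amazon-cloudwatch-agent
fi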
Security Considerations
A few things kept me up at night while building this:
Access keys in user data: Yes, the IAM credentials end up in the instance’s user data, which is visible in the AWS console to anyone with Lightsail access. For my use case (personal projects), this is acceptable. For production workloads, you might want to fetch credentials from Secrets Manager during boot instead.
Terraform state: The access keys are stored in Terraform state. If you’re using local state, that file contains secrets. Use an encrypted S3 backend for anything beyond experimentation.
Key rotation: IAM access keys should be rotated periodically. This template doesn’t handle rotation automatically — destroying and recreating the instance would generate new keys, but that’s disruptive. For long-lived instances, consider a separate process for key rotation.
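For reference, a manual rotation is two IAM calls plus an update on the instance (the old key ID below is a placeholder):
# IAM allows two active keys per user, so create the new one first
aws iam create-access-key --user-name lightsail-cloudwatch-agent

# ...update /root/.aws/credentials on the instance and restart the agent...

# Then retire the old key
aws iam delete-access-key \
  --user-name lightsail-cloudwatch-agent \
  --access-key-id AKIAOLDKEYEXAMPLE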
For Production Workloads
The approach above works well for personal projects and experimentation, but for production workloads you should consider more robust alternatives:
AWS Secrets Manager: Instead of embedding credentials in user data, store them in Secrets Manager and fetch them at boot time. Your user data script would retrieve the credentials via the AWS CLI or a simple API call, keeping them out of the instance metadata entirely (sketched below, after these alternatives).
Systems Manager Parameter Store: Similar to Secrets Manager but often simpler for this use case. Store the credentials as SecureString parameters and fetch them during instance initialization.
Migrate to EC2: If you need proper IAM role support, EC2 with instance profiles is the right tool. You can find EC2 instances at similar price points to Lightsail, and you get native support for temporary credentials that rotate automatically. The operational overhead is higher, but so is the security posture.
Lightsail Containers: If your workload fits a containerized model, Lightsail container services can use IAM roles properly, avoiding the credential management problem entirely.
Configuration Management: Use tools like Ansible, Chef, or Puppet to deploy credentials after instance creation rather than baking them into user data. This keeps secrets out of AWS-visible metadata and gives you a central place to manage rotation.
Automated Key Rotation: If you stick with IAM user credentials, implement automated rotation. AWS provides a Secrets Manager rotation function that can rotate IAM keys on a schedule. Your instances would fetch fresh credentials periodically rather than relying on static keys.
CloudTrail Monitoring: Regardless of which approach you choose, enable CloudTrail and set up alerts for unusual activity from your CloudWatch agent credentials. If the keys are ever used from an unexpected IP or for unexpected API calls, you want to know immediately.
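To make the first two options concrete, here’s a sketch of the boot-time fetch. The secret name and its JSON layout are my own assumptions, and the instance still needs some way to authenticate that first call:
# Secrets Manager variant (requires jq); Parameter Store is the same idea
# with `aws ssm get-parameter --with-decryption`.
SECRET_JSON=$(aws secretsmanager get-secret-value \
  --secret-id lightsail/cloudwatch-agent \
  --query SecretString --output text)

AWS_ACCESS_KEY_ID=$(echo "$SECRET_JSON" | jq -r .aws_access_key_id)
AWS_SECRET_ACCESS_KEY=$(echo "$SECRET_JSON" | jq -r .aws_secret_access_key)

# ...then write /root/.aws/credentials exactly as in Step 2 and start the agent.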
The tradeoff is always between simplicity and security. This template optimizes for simplicity — getting monitoring running with minimal friction. For workloads where security is paramount, invest the additional effort in one of the approaches above.
Was It Worth It?
Absolutely. The first time I got a memory alert before my application crashed, the setup paid for itself. Lightsail’s simplicity is its strength, but knowing what’s happening inside your instance is non-negotiable for anything you care about keeping online.
The whole ordeal taught me something I keep relearning: cloud services that abstract away complexity also abstract away flexibility. Lightsail hides IAM roles to keep things simple, but that simplicity has a cost when you need something it didn’t anticipate.
The workaround isn’t elegant. Creating an IAM user with static credentials feels like a step backward from the instance-role model. But it works, it’s secure enough for most purposes, and once it’s automated, you never have to think about it again.
If you’re running anything on Lightsail that matters, set up monitoring. Your future self, debugging an outage at midnight, will thank you.