Observability 101: Start with Logs

Stop over-engineering your observability stack. Start with what you already have

Overview

It’s easy to get overwhelmed by all of the advice out there on instrumenting your systems with telemetry. Vendors and experienced practitioners will tell you that you have to support the three pillars of observability (metrics, traces, and logs).

The truth is, most applications start with logs, and you can do a lot with just those. Logs are the most common form of telemetry, and you probably already have them in your application, so this post will show you how to create log-based alerts.

This post provides a step-by-step demo of how to set up log-based alerting, using open-source tools exclusively. You can find the repo to run the setup yourself here.

- **OpenSearch**: an open-source search and analytics engine that supports telemetry like logs and metrics
- **Fluent Bit**: a lightweight telemetry processor and forwarder
- **Docker**: popular containerization technology

Container Setup

We’ll start by defining the three containers to run in a docker-compose file: opensearch, opensearch-dashboards, and fluent-bit. Please note: use your own admin password, and store it in a secrets-management platform like HashiCorp Vault so you don’t commit it to your code base. The password below is for demo purposes only.

services:
  opensearch:
    image: opensearchproject/opensearch:2.12.0
    container_name: opensearch
    environment:
      # Run as single-node cluster (no clustering overhead)
      - discovery.type=single-node
      # Required for OpenSearch 2.12+
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=YourStr0ng!Password
      # Limit JVM heap to 1GB (adjust based on available memory)
      - "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g"
    ports:
      # REST API
      - 9200:9200
    volumes:
      # Persist index data across restarts
      - opensearch-data:/usr/share/opensearch/data
    healthcheck:
      test: ["CMD-SHELL", "curl -sk -u admin:YourStr0ng!Password https://localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.12.0
    container_name: opensearch-dashboards
    ports:
      # Web UI
      - 5601:5601
    environment:
      # Point to the OpenSearch container
      OPENSEARCH_HOSTS: '["https://opensearch:9200"]'
    depends_on:
      opensearch:
        # Wait for OpenSearch to be healthy before starting
        condition: service_healthy

  fluent-bit:
    image: fluent/fluent-bit:latest
    container_name: fluent-bit
    volumes:
      # Mount config files
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - ./parsers.conf:/fluent-bit/etc/parsers.conf
      # Mount logs directory - write logs here to ship to OpenSearch
      - ./logs:/var/log/app
    depends_on:
      opensearch:
        condition: service_healthy

Fluent Bit will write logs to OpenSearch, and we can then log in to the dashboards container to see our logs and alerts in the UI.

The fluent-bit.conf file also tells Fluent Bit where to find the logs and how to forward them to OpenSearch. In the config below, it tails logs at /var/log/app/*.log and forwards them to the opensearch host on port 9200 (where our OpenSearch instance is listening). The docker-compose volume mount is ./logs:/var/log/app, since /var/log is the standard Linux location for logs.

[SERVICE]
    Flush        5
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name         tail
    Path         /var/log/app/*.log
    Tag          app.logs
    Parser       json
    Refresh_Interval 5

[OUTPUT]
    Name         opensearch
    Match        *
    Host         opensearch
    Port         9200
    HTTP_User    admin
    HTTP_Passwd  YourStr0ng!Password
    Index        application-logs
    Logstash_Format On
    Logstash_Prefix app-logs
    tls          On
    tls.verify   Off
    Suppress_Type_Name On

You can start the containers defined in the repo with `docker compose up -d`.

Generate Logs

Once you have the containers running, you can generate some sample logs that will automatically get shipped to OpenSearch.

# Batch of logs (to trigger High Error Rate Monitor)
for i in {1..8}; do
  echo '{"timestamp":"'$(date -u +%Y-%m-%dT%H:%M:%S.000)'","level":"ERROR","message":"Test error '$i'","service":"my-app"}' >> logs/app.log
done
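For contrast, you can also write a few non-error lines to the same file. The alert query later in this post only counts `level:ERROR`, so these should not move the error count. A small sketch using the same log path:

```shell
# Append a few INFO-level lines to the same file Fluent Bit tails;
# only lines with level:ERROR are counted by the monitor's query.
mkdir -p logs
for i in 1 2 3; do
  echo '{"timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%S.000)"'","level":"INFO","message":"Heartbeat '"$i"'","service":"my-app"}' >> logs/app.log
done
# Quick sanity check on what was written:
grep -c '"level":"INFO"' logs/app.log
```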

View the Logs

Once these are generated and forwarded to OpenSearch, we can log in to the dashboard at localhost:5601 using the credentials defined above.

Navigate to Discover -> Create Index Pattern and enter a wildcard pattern like app-logs* to match the Logstash prefix used by Fluent Bit. You’ll also be prompted to pick a time field (select @timestamp).

You can now navigate to Discover to view your logs!

Configure Alerts

Now that we have logs flowing into our OpenSearch instance, we want to configure alerts based on them. You do this by defining a monitor, which runs a query against your data on a schedule and lets you configure one or more triggers that act on those signals.

In the UI dashboard you can navigate to Alerting -> Monitors -> Create Monitor to configure your own, either with the wizard or with the query editor. Below is a monitor I have written that can be copy/pasted into the query editor. It looks at the last 5 minutes and searches for the key/value pair level:ERROR to match any logs that have an error level defined.

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "@timestamp": {
                            "from": "now-5m",
                            "to": "now",
                            "include_lower": true,
                            "include_upper": true,
                            "boost": 1
                        }
                    }
                },
                {
                    "term": {
                        "level.keyword": {
                            "value": "ERROR",
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {
        "error_count": {
            "value_count": {
                "field": "_id"
            }
        }
    }
}

We can define one or more triggers for each monitor, each with its own threshold and severity. In the screenshot below, the trigger fires with SEV-2 severity whenever ≥ 5 errors occur within the 5-minute window used by the monitor:
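Under the hood, a trigger condition is a Painless expression evaluated against the monitor’s query results. Assuming the error_count aggregation from the monitor above, the ≥ 5 threshold would look something like this (a sketch of the condition, not copied from the UI):

```
ctx.results[0].aggregations.error_count.value >= 5
```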

Based on the data, we can see that this alert fired twice for me, once at 22:40 UTC and again at 22:44 UTC.

You can also forward alerts to Slack. In the OpenSearch dashboard, navigate to Notifications and provide a Slack webhook URL that posts to the channel you chose when setting up the webhook.
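Before wiring the webhook into OpenSearch, it can help to sanity-check it by hand. Slack incoming webhooks accept a simple JSON payload with a `text` field; the webhook URL below is a placeholder, so substitute the one Slack generated for you:

```shell
# Build the payload that Slack's incoming webhooks expect.
PAYLOAD='{"text":"Test alert from OpenSearch monitor"}'
echo "$PAYLOAD"

# To actually send it, POST the payload to your webhook URL
# (placeholder shown; replace with your real URL):
# curl -s -X POST -H 'Content-Type: application/json' \
#      -d "$PAYLOAD" "https://hooks.slack.com/services/T000/B000/XXXX"
```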

I can then update my monitor trigger to include an action to send the notification to Slack.

After sending a test alert, I regenerated some additional error logs using the scripts provided in the repo and, tada, we have our Slack alert.

Conclusion

Observability can get complex and expensive very quickly, but getting started isn’t too bad when you already have telemetry like logs. Logs are a great place to start, and open-source tools like OpenSearch and Fluent Bit are powerful and provide alerting out of the box. This entire demo was hosted on my local machine, and I didn’t pay a dime for any of it.

Of course, in production you will likely need to host these containers somewhere, or possibly pay for a managed service if you don’t want to self-host. Still, the main point stands: you can deliver a lot of observability right away without having to configure separate systems for metrics and traces.

With observability, start simple and only layer on complexity when it’s warranted.

Want help scaling your ops?

I help engineering teams build operational excellence, from observability to incident response.

Book a Discovery Call