Matthew Ford · 4 min read

Building an AI agent to help support ISO 27001 audit checks
ISO 27001 creates a useful discipline for security. It also creates a lot of evidence work.

At Bit Zesty, we need to show that our operational controls are being checked in practice. That includes server hardening, access control, patching, firewall posture, service health, and signs that a host has drifted from its expected setup.

Some environments need a full security operations platform. But a small hardened server does not always need the cost and overhead of a SIEM. Jumpboxes, utility hosts, and self-hosted GitHub Actions runners still need regular checks. They just need checks that fit their risk and scale.

That is why we built Watchkeeper, a small AI agent for reviewing server security evidence.

The problem

Manual server checks do not scale well. They are repetitive, easy to postpone, and hard to evidence consistently during an audit.

Heavyweight monitoring can create the opposite problem. You get another tool to manage, another stream of alerts, and more noise for the team to triage.

We wanted something smaller:

  • daily evidence collection
  • clear drift detection
  • fewer false alarms
  • human review when something changes
  • useful records for ISO 27001 reporting

Watchkeeper sits in that gap. It reviews the evidence, emails the team when something needs attention, and produces monthly audit emails we can retain for ISO 27001 reporting.

What Watchkeeper does

Watchkeeper is a lightweight server-side AI agent that runs from cron on Ubuntu servers.

It collects a plain-text security report and compares the current host state with known-good baselines and documented expectations.

It checks things like:

  • listening ports
  • SSH configuration and authorised keys
  • users, groups, and sudo access
  • cron entries and systemd services
  • available security updates
  • UFW, fail2ban, auditd, and AppArmor
  • antivirus service health
  • local file integrity signals

The facts come from the server. The AI does not decide whether a port is open, whether a user exists, or whether a service is running. The operating system provides that evidence.

The agent reviews the evidence and decides whether a person needs to look. It also sends a monthly audit summary so we have a regular record of checks, even when there are no urgent alerts.

How the agent review works

After Watchkeeper collects the report, the agent compares the evidence with the host context.

The output is deliberately simple:

DECISION: SEND
DECISION: SKIP

If the decision is SEND, the team gets an alert email. If the decision is SKIP, the report is retained as evidence without creating another alert.
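A narrow output format like this is easy to parse defensively. This sketch assumes one detail not stated above: that anything the parser cannot recognise is treated as SEND, so a confused or malformed agent response escalates to a person rather than being silently skipped.

```python
import re

def parse_decision(agent_output: str) -> str:
    """Extract the DECISION line; unparseable output escalates to SEND (fail safe)."""
    match = re.search(r"^DECISION:\s*(SEND|SKIP)\s*$", agent_output, re.MULTILINE)
    return match.group(1) if match else "SEND"
```

Failing towards SEND means the worst case of a broken model response is one extra email, never a missed change.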

Separately, Watchkeeper sends a monthly audit email. That gives us a regular summary for ISO 27001 evidence, rather than only hearing from the system when something has changed.

The agent cannot change the server. It cannot approve a risky change. It cannot refresh the baseline. That remains a human decision.

This is the part we care about most. AI is useful here because it is constrained. It reviews structured evidence, with a narrow decision, inside a workflow we already understand.

Why context matters

Servers are not all meant to look the same.

A Tailscale jumpbox has a different normal state from a self-hosted GitHub Actions runner. A utility server might have expected cron jobs that would be suspicious elsewhere. Some hosts should have no public web service at all. Others exist to expose one.

Watchkeeper gives each host its own context file. That context can describe expected ports, approved admin users, trusted private network ranges, cloud provider recovery behaviour, expected jobs, and known absences.
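As a rough sketch, a context file might look like the following. The field names and values here are hypothetical, not Watchkeeper's actual schema; they just show how documented expectations turn drift detection into a simple set comparison.

```python
import json

# Hypothetical per-host context -- field names are illustrative only.
CONTEXT = json.loads("""
{
  "host": "jumpbox-01",
  "expected_ports": [22, 41641],
  "approved_admins": ["alice", "deploy"],
  "trusted_ranges": ["100.64.0.0/10"],
  "expected_absences": ["nginx", "apache2"]
}
""")

def port_drift(observed_ports, context):
    """Report ports that are open but unexpected, and expected ports that are missing."""
    expected = set(context["expected_ports"])
    observed = set(observed_ports)
    return {
        "unexpected": sorted(observed - expected),
        "missing": sorted(expected - observed),
    }
```

Because the expectations are written down per host, a port that is normal on one machine can still be flagged on another.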

This reduces noise. It also gives us a clear record of what we expect from each host, which is useful for ISO 27001 reviews.

What it caught

During testing, Watchkeeper caught the kind of issues we wanted it to catch.

In one case, a recovery session had left SSH weaker than expected. Root login and keyboard-interactive authentication had been re-enabled. Watchkeeper flagged the change, we reviewed it, fixed the SSH configuration, removed the temporary recovery keys, refreshed the baseline, and reran the check.

In another case, an antivirus service was failing because the host had no swap and the operating system was killing the daemon. That was not a dramatic security incident, but it was still a control failure. Watchkeeper surfaced it, we added a small swapfile, restarted the service, and confirmed the next report was clean.

These are ordinary operational issues. That is why they are worth automating. They matter, but a person should not have to rediscover them manually every day.

How this supports ISO 27001

ISO 27001 is easier to operate when evidence is created as part of the normal workflow.

Watchkeeper gives us a repeatable record of:

  • what was checked
  • what the expected baseline was
  • what changed
  • whether human review was needed
  • which monthly audit summary covered the check
  • what was fixed before the baseline changed

That helps with internal review and audit preparation. It also helps us talk about risk in practical terms. Instead of saying "we believe this server is hardened", we can point to recent evidence that it still matches the expected posture.

Why we built it this way

We built Watchkeeper for ourselves first.

We were not trying to create a product, a dashboard, or a SIEM replacement. We wanted a practical way to improve our own ISO 27001 evidence process and reduce the manual effort involved in checking small servers.

But the principles are reusable. The same pattern could be applied to other routine control checks where the facts can be collected from a trusted system and reviewed against documented expectations.

Over time, this kind of approach could also grow into something broader. You could add central reporting, richer integrations, more control types, or connect the evidence into an existing SIEM. The important point is the order: start with reliable evidence and clear escalation rules, then add more automation where it helps.

Lessons learned

The useful AI opportunities are not always the flashy ones.

For a CEO, the value of Watchkeeper is not that "AI does security". It is that a repetitive control activity becomes more consistent, better evidenced, and less dependent on someone remembering to run manual checks or compile audit updates by hand.

The same pattern applies beyond server monitoring:

  • collect facts from a trusted system
  • write down what good looks like
  • use AI to review the evidence
  • limit what the AI can do
  • send unclear or risky cases to a person
  • keep the evidence
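The steps above can be sketched as a single constrained pipeline. This is a minimal illustration, not Watchkeeper's code: the agent is injected as a function that sees text and returns text, while every side effect stays in ordinary code you control.

```python
def review_pipeline(report: str, context: dict, ask_agent, send_alert, retain):
    """Sketch of the pattern: the agent only classifies; side effects stay in our code.

    ask_agent, send_alert, and retain are injected callables, so the AI never
    touches the host directly -- it reviews evidence and returns a decision.
    """
    decision = ask_agent(report, context)  # expected to return "SEND" or "SKIP"
    retain(report)                         # evidence is kept either way
    if decision != "SKIP":                 # anything unclear escalates to a person
        send_alert(report)
    return decision
```

Structuring it this way makes the constraint auditable: the only thing the model can influence is whether a human looks sooner.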

That is where AI agents are useful in real organisations. They work best as reviewers inside controlled processes, not as unsupervised operators.

Do you need help with your application?

At Bit Zesty, we specialise in building and maintaining bespoke software and integrating AI into existing applications.

Looking to build an application, but unsure of the price? Keen to discuss our experience, processes and availability?
