Wazuh Backup, Recovery and Business Continuity

A SIEM that loses its own configuration or index data during an outage is worse than useless during an incident. Here is how to back up, restore and continuity-plan a Wazuh deployment so your monitoring survives hardware failure, ransomware and human error.

Published 26 June 2026 · 11 min read · Codesecure SOC Practice · SOC

Key Takeaways

  • A Wazuh deployment has three independent data tiers to protect: manager configuration and ruleset (filesystem), agent registration keys and state (filesystem plus database), and indexer event data (OpenSearch snapshots). Each needs its own backup method.
  • Configuration and ruleset change rarely but are catastrophic to lose. Version-control ossec.conf, custom rules and decoders in Git, and back up /var/ossec/etc daily.
  • Indexer event data is the largest tier. Use OpenSearch snapshot repositories to object storage (S3, Azure Blob, MinIO) with a tiered retention policy rather than full nightly dumps.
  • Agent keys matter: losing client.keys forces re-enrollment of every agent. Back it up and treat it as a secret.
  • Restore testing is the only proof a backup works. Schedule quarterly restore drills to a clean environment and measure actual RTO against your target.

What a Wazuh Deployment Actually Contains

Before you can back up Wazuh you have to understand that it is not a single application with a single data store. A production deployment is three loosely coupled components: the Wazuh manager, the Wazuh indexer (an OpenSearch fork) and the Wazuh dashboard. Each holds a different class of data with a different change rate, size and recovery priority.

The Wazuh manager holds your configuration and detection logic. This lives almost entirely under /var/ossec: the main ossec.conf, custom rules in /var/ossec/etc/rules, custom decoders in /var/ossec/etc/decoders, CDB lists, active-response scripts, agent group configuration under /var/ossec/etc/shared, and the agent registration database. This data is small (typically tens of megabytes) but represents months of tuning work. Losing it means rebuilding every custom rule and re-tuning every false positive from scratch.

The Wazuh indexer holds the actual security events: alerts, archives, file integrity monitoring state, vulnerability detection results. This is by far the largest tier, often hundreds of gigabytes to terabytes depending on retention and log volume. It is also the tier most often skipped in naive backup plans because its size makes a simple file copy impractical.

The dashboard holds saved objects: index patterns, visualisations, dashboards, saved searches and alerting definitions. These are stored inside the indexer in dedicated system indices, so they are recoverable if you snapshot those indices, but they are easy to forget because they live separately from your event data.

Backing Up Configuration, Rules and Decoders

The single highest-value backup in any Wazuh deployment is the configuration and ruleset, and the best way to protect it is not a backup tool at all; it is version control. Keep ossec.conf, every custom rules file, every custom decoder, CDB lists and active-response scripts in a Git repository. Every tuning change becomes a commit with an author, a timestamp and a reason. You get a full audit trail, instant rollback of a bad rule, and a backup that lives off the SIEM host by definition.

Version control does not capture everything, though. The agent registration database (client.keys plus the internal agent state) is generated at runtime and is not config you would commit. Run a daily filesystem backup of /var/ossec/etc and /var/ossec/queue/agents-timestamp to capture registration state. A simple approach is a cron job that creates a timestamped tar archive and pushes it to object storage or a separate backup host.
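
The cron-driven archive described above could be sketched as follows. This is a minimal illustration, not an official Wazuh tool: the function names, the `wazuh-manager` file prefix and the destination directory are assumptions, and pushing the result to object storage or a backup host is left to whatever transfer tool you already use.

```python
import tarfile
import time
from pathlib import Path

# Paths named in the text above; adjust them if your installation differs.
DEFAULT_SOURCES = ("/var/ossec/etc", "/var/ossec/queue/agents-timestamp")


def archive_name(prefix: str, now: time.struct_time) -> str:
    """Build a sortable, timestamped archive file name (UTC)."""
    return f"{prefix}-{time.strftime('%Y%m%dT%H%M%SZ', now)}.tar.gz"


def create_archive(sources, dest_dir, prefix="wazuh-manager", now=None) -> Path:
    """Write a gzip-compressed tar of every existing source path into dest_dir."""
    now = now or time.gmtime()
    dest = Path(dest_dir) / archive_name(prefix, now)
    with tarfile.open(dest, "w:gz") as tar:
        for src in map(Path, sources):
            if src.exists():  # skip paths that are absent on this node
                # Store paths relative to "/" so a restore can target any root.
                tar.add(str(src), arcname=src.as_posix().lstrip("/"))
    dest.chmod(0o600)  # the archive contains client.keys, so restrict access
    return dest
```

A cron entry would then call something like `create_archive(DEFAULT_SOURCES, "/backup/wazuh")` as root, where `/backup/wazuh` is a hypothetical staging directory. Because the archive includes client.keys, encrypt it before it leaves the host.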

Active-response scripts, integrations (the integrations directory used for Slack, VirusTotal, TheHive and similar), and any wodle or custom Python modules also live on the manager filesystem and must be in either Git or the daily archive. A common gap is the API configuration and credentials under /var/ossec/api, which you need if your n8n or TheHive automation authenticates to the Wazuh API. Back these up as secrets, encrypted at rest, never in a plain Git repository.

For multi-manager clusters, remember that the master node distributes shared configuration to worker nodes. You only need to back up the master node's configuration tier, but you must back up each node's local client.keys and node key. Document which node is the master so a restore does not accidentally promote a worker with stale config.

Backing Up Indexer Event Data with Snapshots

Event data is too large for file copies, so use the native snapshot mechanism that the Wazuh indexer inherits from OpenSearch. A snapshot is an incremental, point-in-time copy of one or more indices written to a registered snapshot repository. The first snapshot copies all data; subsequent snapshots copy only changed segments, which keeps storage and runtime manageable even on terabyte-scale indices.

Register a snapshot repository backed by durable, off-host storage. The standard choices are an S3-compatible bucket (AWS S3, MinIO, Wasabi), Azure Blob storage, or a shared filesystem mount that itself lives on separate hardware. Never register the repository on the same disk as the live indexer data, because a disk failure would then destroy both the primary and the backup at once.
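
As a rough sketch, registering an S3-backed repository is a single `PUT /_snapshot/<name>` call. The helper below only builds the request; the repository name `wazuh-s3`, the bucket name and the base path are placeholders, and it assumes the S3 repository plugin is installed on the indexer nodes with storage credentials already configured. Authentication and TLS settings are deliberately left to the caller.

```python
import json
import urllib.request


def repository_body(bucket: str, base_path: str) -> dict:
    """Body for PUT /_snapshot/<name> using the S3 repository type."""
    return {"type": "s3", "settings": {"bucket": bucket, "base_path": base_path}}


def register_request(indexer_url: str, repo: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the repository registration request."""
    return urllib.request.Request(
        url=f"{indexer_url.rstrip('/')}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values for illustration only.
request = register_request(
    "https://indexer.example.internal:9200",
    "wazuh-s3",
    repository_body("example-wazuh-snapshots", "prod"),
)
```

Sending the request with `urllib.request.urlopen` requires an authorisation header and a TLS context that trusts the indexer certificate, both of which depend on your deployment.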

Schedule snapshots through the OpenSearch Snapshot Management policy or an external scheduler. A practical pattern is hourly snapshots of the current day's hot indices for a low recovery point objective on recent data, plus a daily snapshot of all indices for completeness. Snapshots are cheap to take and the incremental design means frequent snapshots do not multiply storage cost.
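
If you drive snapshots from an external scheduler, the hourly and daily pattern above might look like the sketch below. It assumes the default `wazuh-alerts-4.x-YYYY.MM.DD` index naming; the snapshot name formats and the function names are illustrative choices, not a Wazuh convention. Each returned pair is meant for `PUT /_snapshot/<repo>/<snapshot_name>`.

```python
from datetime import datetime, timezone


def hot_index_pattern(now: datetime) -> str:
    """Today's alert index, assuming the default Wazuh daily index naming."""
    return f"wazuh-alerts-4.x-{now:%Y.%m.%d}"


def snapshot_request(kind: str, now: datetime) -> tuple[str, dict]:
    """Return (snapshot_name, body) for PUT /_snapshot/<repo>/<snapshot_name>."""
    if kind == "hourly":
        # Only the current day's hot index, for a low RPO on recent data.
        name = f"hourly-{now:%Y%m%d-%H}"
        body = {
            "indices": hot_index_pattern(now),
            "ignore_unavailable": True,
            "include_global_state": False,
        }
    elif kind == "daily":
        # Omitting "indices" snapshots every index in the cluster.
        name = f"daily-{now:%Y%m%d}"
        body = {"ignore_unavailable": True}
    else:
        raise ValueError(f"unknown snapshot kind: {kind}")
    return name, body


name, body = snapshot_request("hourly", datetime.now(timezone.utc))
```

Because snapshots are incremental, the hourly job mostly uploads segments written since the previous hour.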

Pair snapshots with an Index State Management policy that ages older indices to a warm or cold tier and eventually deletes them per your retention requirement. This keeps the snapshot footprint bounded and aligns recovery with compliance: you snapshot what you are legally required to retain (often one year for security logs, up to six years for regulated sectors) and let everything older roll off. Always snapshot the system indices that hold dashboard saved objects and alerting definitions, or a restore will recover your data but leave you with a blank dashboard.
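
A retention policy of this shape can be generated rather than hand-written. The sketch below builds an Index State Management policy body suitable for `PUT _plugins/_ism/policies/<policy_id>`. The state names, the 30-day warm threshold and the use of the `read_only` action as a stand-in for a warm tier are assumptions for illustration; a real warm or cold tier depends on how your cluster nodes are laid out.

```python
def retention_policy(retention_days: int, warm_after_days: int = 30) -> dict:
    """Build an ISM policy body: hot -> warm (read-only) -> delete."""
    if not 0 < warm_after_days < retention_days:
        raise ValueError("warm_after_days must be positive and below retention_days")
    return {
        "policy": {
            "description": (
                f"Wazuh alerts: read-only after {warm_after_days}d, "
                f"delete after {retention_days}d"
            ),
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "warm",
                         "conditions": {"min_index_age": f"{warm_after_days}d"}}
                    ],
                },
                {
                    "name": "warm",
                    "actions": [{"read_only": {}}],  # stand-in for a warm tier
                    "transitions": [
                        {"state_name": "delete",
                         "conditions": {"min_index_age": f"{retention_days}d"}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created alert indices.
            "ism_template": {"index_patterns": ["wazuh-alerts-*"], "priority": 100},
        }
    }


one_year = retention_policy(365)
```

Make sure the snapshot retention in the repository is at least as long as the legal retention period, since the policy deletes the live index once it ages out.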

Disaster Recovery Architecture and RPO/RTO

Backups answer the question of what you can restore. Disaster recovery answers how fast and to what point. Start by setting two numbers with the business: the Recovery Point Objective (how much data you can afford to lose, measured in time) and the Recovery Time Objective (how long monitoring can be down before the risk is unacceptable). For most SOC deployments a sensible target is an RPO of one hour for recent events and an RTO of four to eight hours for full service, but a payment processor or a regulated bank will demand far tighter numbers.
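
To sanity-check an RPO target against a snapshot schedule, a conservative model is useful: if the indexer fails just before a snapshot finishes, the last usable snapshot is the previous one, so the worst-case loss is roughly the snapshot interval plus the time a snapshot takes to complete. The sketch below encodes that model; it is a simplification, and the function names are illustrative.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, snapshot_duration: timedelta) -> timedelta:
    """Worst case: failure occurs just before the in-flight snapshot completes."""
    return interval + snapshot_duration


def meets_rpo(interval: timedelta, snapshot_duration: timedelta, rpo: timedelta) -> bool:
    """True when the conservative worst-case loss stays within the RPO."""
    return worst_case_data_loss(interval, snapshot_duration) <= rpo


hourly = meets_rpo(timedelta(hours=1), timedelta(minutes=10), rpo=timedelta(hours=1))
tighter = meets_rpo(timedelta(minutes=45), timedelta(minutes=10), rpo=timedelta(hours=1))
```

Under this model an hourly schedule with ten-minute snapshots slightly overshoots a strict one-hour RPO, while a 45-minute interval leaves headroom. Whether that margin matters is a business decision.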

The cheapest DR posture is backup-and-restore: snapshots to object storage, configuration in Git, and a documented runbook to rebuild a manager and indexer from scratch on fresh infrastructure. This is appropriate when an RTO of several hours is acceptable. Define the entire stack as code (Ansible, Terraform) so the rebuild is a script, not a memory exercise during a crisis.

For tighter RTO, run a warm standby. A second Wazuh manager and indexer in a different availability zone or data centre stays current through indexer cross-cluster replication and a synchronised configuration repository. On failure you redirect agents to the standby manager (agents can be configured with multiple manager addresses for automatic failover) and promote the standby. This cuts RTO to minutes but doubles infrastructure cost.

Whatever the architecture, agents must know where to fail over. Configure each agent with multiple <server> entries so that if the primary manager is unreachable the agent automatically connects to the standby. Without this, a manager outage silently blinds your entire fleet until someone manually reconfigures every endpoint, which is exactly the wrong activity during an active incident.
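
The agent-side failover block is easy to template. The sketch below renders a `<client>` section with one `<server>` entry per manager, in priority order, for inclusion in the agent's ossec.conf. The hostnames are placeholders, and port 1514 over TCP is assumed as the agent connection default; confirm both against your own deployment.

```python
import xml.etree.ElementTree as ET


def client_block(manager_addresses, port: int = 1514, protocol: str = "tcp") -> str:
    """Render a <client> block with one <server> entry per manager address."""
    client = ET.Element("client")
    for address in manager_addresses:
        server = ET.SubElement(client, "server")
        ET.SubElement(server, "address").text = address
        ET.SubElement(server, "port").text = str(port)
        ET.SubElement(server, "protocol").text = protocol
    ET.indent(client, space="  ")  # requires Python 3.9 or later
    return ET.tostring(client, encoding="unicode")


# Hypothetical hostnames: primary first, standby second.
print(client_block(["wazuh-primary.example.internal", "wazuh-standby.example.internal"]))
```

Distributing the rendered block through your configuration management tool keeps the failover order consistent across the fleet.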

Protecting Backups Against Ransomware and Insider Error

A SIEM is a high-value target precisely because it records the attacker's activity. Sophisticated intrusions deliberately attempt to wipe or encrypt SIEM data and its backups to destroy evidence and slow response. A backup that an attacker with manager access can also delete is not a backup; it is a convenience.

Apply immutability to your snapshot repository. S3 Object Lock (in compliance mode), Azure Blob immutability policies and MinIO object locking let you write snapshots that cannot be deleted or overwritten for a defined retention window, even by an administrator. This is the single most important control for SIEM backup integrity, because it survives a fully compromised manager.

Separate the credentials. The Wazuh host should be able to write new snapshots but not delete existing ones. Use a scoped IAM role or storage policy that grants put-object but denies delete-object on the backup bucket. Snapshot deletion for retention is then handled by a separate, tightly controlled identity or a storage-side lifecycle rule, not by the SIEM itself.
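
For an AWS S3 bucket, the write-but-not-delete split might be expressed as an IAM policy like the one generated below. The bucket name and statement IDs are hypothetical. Treat it as a starting point: the snapshot repository plugin may itself issue deletes during routine housekeeping, so verify in a test environment that snapshot creation still succeeds with deletes denied before relying on it.

```python
import json


def writer_policy(bucket: str) -> dict:
    """IAM policy for the indexer: list, read and write objects, never delete."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ListBucket", "Effect": "Allow",
             "Action": ["s3:ListBucket"], "Resource": bucket_arn},
            {"Sid": "ReadWriteObjects", "Effect": "Allow",
             "Action": ["s3:GetObject", "s3:PutObject"], "Resource": objects_arn},
            # An explicit Deny overrides any Allow granted elsewhere.
            {"Sid": "DenyDelete", "Effect": "Deny",
             "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
             "Resource": objects_arn},
        ],
    }


print(json.dumps(writer_policy("example-wazuh-snapshots"), indent=2))
```

Retention-driven deletion would then run under a different identity or a bucket lifecycle rule, as the paragraph above describes.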

Do not forget human error, which causes more lost data than ransomware. Versioned object storage protects against an accidental overwrite, and the off-host Git repository protects against a fat-fingered rule deletion. Maintain at least one backup copy in a different account or subscription so that a compromised or misconfigured primary cloud account cannot take the backups down with it.

Restore Testing and Continuity Drills

An untested backup is a hypothesis, not a safeguard. The only way to know your Wazuh backups work is to restore them, and the only way to know your RTO target is realistic is to time an actual rebuild. Codesecure schedules a quarterly restore drill for managed-SOC clients: spin up a clean isolated environment, restore the manager configuration from Git and the daily archive, restore a representative set of indices from snapshots, point a test agent at the rebuilt manager and confirm events flow end to end.
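
The index-restore step of a drill comes down to one `POST /_snapshot/<repo>/<snapshot>/_restore` call. The sketch below builds its body. The optional `restored-` prefix is an illustrative choice for drills that restore alongside existing indices; in a clean environment you can pass an empty prefix and restore under the original names.

```python
def restore_body(indices, prefix: str = "restored-", include_global_state: bool = False) -> dict:
    """Body for POST /_snapshot/<repo>/<snapshot>/_restore."""
    body = {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": include_global_state,
    }
    if prefix:
        # Rename on restore so drill data never collides with live indices.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = f"{prefix}$1"
    return body


# A representative sample: one recent alert index (hypothetical date).
drill = restore_body(["wazuh-alerts-4.x-2026.06.25"])
```

Restoring a small, representative set keeps the drill short while still exercising the repository, the credentials and the restore path.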

Measure three things during every drill. First, did the restore succeed completely, or were there missing indices, broken dashboards or failed agent re-enrollment? Second, how long did it actually take, and how does that compare to your RTO commitment? Third, was the runbook accurate, or did the engineer have to improvise? Every drill produces a short list of fixes to the runbook and the backup configuration.
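
Recording those three measurements in a fixed structure makes drills comparable from quarter to quarter. The following is one possible shape, with hypothetical field names, that turns a drill record into a list of findings to assign.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class DrillResult:
    restore_complete: bool       # all indices, dashboards and agents recovered
    elapsed: timedelta           # measured wall-clock rebuild time
    runbook_deviations: list[str] = field(default_factory=list)

    def findings(self, rto: timedelta) -> list[str]:
        """Translate the three measurements into actionable findings."""
        items = []
        if not self.restore_complete:
            items.append("restore incomplete")
        if self.elapsed > rto:
            items.append(f"RTO missed by {self.elapsed - rto}")
        items.extend(f"runbook gap: {d}" for d in self.runbook_deviations)
        return items


result = DrillResult(True, timedelta(hours=9), ["indexer certificate step missing"])
open_items = result.findings(rto=timedelta(hours=8))
```

An empty findings list is the goal; anything else becomes an owned action before the next drill.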

Test the failure modes you actually fear, not just the easy ones. Restore after a simulated ransomware event by treating the live environment as untrusted and rebuilding entirely from immutable backups. Restore a single accidentally deleted custom rule from Git history. Restore the dashboard saved objects after a system-index loss. Each scenario exercises a different part of the backup design, and each tends to reveal a different gap.

Document the outcome and assign owners to every gap. Business continuity is not a one-time project; it is a recurring discipline. The deployments that recover cleanly from real incidents are the ones that rehearsed the recovery before they needed it, and treated every failed drill as a finding to close rather than an embarrassment to hide.

Frequently Asked Questions

How often should I back up a Wazuh deployment?

Configuration and ruleset should be in Git with every change committed, plus a daily filesystem archive of /var/ossec/etc and agent registration state. Indexer event data should be snapshotted at least daily, and ideally hourly for the current day's hot indices to keep your recovery point objective low. Agent keys (client.keys) should be in the daily archive and treated as a secret.

What is the difference between backing up the Wazuh manager and the indexer?

The manager holds configuration, rules, decoders and agent registration, which is small (tens of megabytes) and best protected with Git plus a daily filesystem archive. The indexer holds event data, which is large (hundreds of gigabytes to terabytes) and must use OpenSearch incremental snapshots to object storage. They are different tiers with different sizes, change rates and recovery priorities, so they need different methods.

Can I just copy the /var/ossec directory to back up Wazuh?

That captures the manager configuration and rules, which is valuable, but it does not capture the indexer event data, which lives in OpenSearch indices, not on the manager filesystem. A full backup needs the filesystem copy for configuration plus OpenSearch snapshots for event data plus the dashboard saved objects from the system indices. A directory copy alone leaves your event history unprotected.

How do I make Wazuh backups resistant to ransomware?

Write snapshots to an object store with immutability enabled (S3 Object Lock, Azure Blob immutability or MinIO object locking) so they cannot be deleted within the retention window even by a compromised administrator. Use scoped credentials that allow writing new snapshots but deny deleting existing ones, and keep at least one copy in a separate cloud account or subscription so a single compromised account cannot destroy the backups.

What RPO and RTO should a SOC aim for with Wazuh?

For most deployments an RPO of one hour for recent events and an RTO of four to eight hours for full service is a reasonable baseline, achievable with hourly snapshots and an infrastructure-as-code rebuild. Regulated sectors such as banking or payments often demand much tighter numbers, which usually requires a warm standby manager and indexer with cross-cluster replication rather than backup-and-restore alone.

How do I test that my Wazuh backups actually work?

Run a quarterly restore drill: build a clean isolated environment, restore configuration from Git and the daily archive, restore a representative set of indices from snapshots, point a test agent at the rebuilt manager and confirm events flow end to end. Measure whether the restore succeeded completely, how long it took against your RTO target, and whether the runbook was accurate. Treat every gap as a finding to close before the next drill.

Codesecure SOC Practice

OSCP / CEH / CISSP Certified Analysts

Codesecure Solutions is ISO/IEC 27001:2022 certified and runs open-source managed SOC stacks (Wazuh, n8n, TheHive, Cortex, MISP) for businesses across India, Singapore, UAE and Malaysia. Named OSCP, CEH and CISSP analysts, fixed-price implementation and managed-SOC retainers, board-ready reporting.

✓ ISO/IEC 27001:2022 Certified
