Key Takeaways
- High-volume Wazuh has three pressure points: the analysis engine (analysisd), the indexer write path and storage I/O. Optimisation targets whichever is the current bottleneck.
- Indexer sharding determines write parallelism and search speed. Too few shards limit throughput; too many shards waste memory on overhead.
- The analysisd event queue is the canary. A queue consistently near capacity means events are about to be dropped and is the clearest signal to scale.
- EPS capacity is bounded by the slowest stage. A fast indexer cannot help if analysisd is single-threaded against a heavy ruleset.
- Hardware sizing follows EPS, retention and search load. RAM for the indexer JVM and fast disk for indices are usually the binding constraints, not CPU.
Find the Real Bottleneck First
Performance tuning without measurement is guesswork. A high-volume Wazuh deployment has three distinct pressure points, and the right fix depends entirely on which one is saturated. The first is analysisd, where decoding and rule evaluation happen. The second is the indexer write path, where alerts and archives are stored. The third is storage I/O, which both of the others depend on.
Symptoms point to different stages. If alerts are delayed but the indexer is idle, analysisd is the bottleneck. If the analysisd queue is empty but searches are slow and indexing lags, the indexer or its disk is the bottleneck. If everything is slow and disk wait is high, storage I/O is starving the whole pipeline. Diagnose before you tune.
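The symptom-to-stage mapping above can be sketched as a small decision function. This is an illustration only: the `PipelineSnapshot` fields and the threshold defaults are hypothetical and should be replaced with values observed in your own environment.

```python
from dataclasses import dataclass


@dataclass
class PipelineSnapshot:
    # All three fields are hypothetical names for metrics you collect yourself.
    analysisd_queue_usage: float   # fraction of the event queue in use, 0.0 to 1.0
    indexer_write_rejections: int  # new write rejections in the observation window
    disk_io_wait_pct: float        # average I/O wait percentage on the storage host


def diagnose(s: PipelineSnapshot, queue_high: float = 0.8, iowait_high: float = 20.0) -> str:
    """Return the stage that looks saturated, checking shared storage first."""
    if s.disk_io_wait_pct >= iowait_high:
        return "storage"    # high disk wait starves both other stages
    if s.analysisd_queue_usage >= queue_high:
        return "analysisd"  # queue backing up while the disk is healthy
    if s.indexer_write_rejections > 0:
        return "indexer"    # analysisd keeps up but writes are refused
    return "none"


print(diagnose(PipelineSnapshot(0.95, 0, 3.0)))
```

Storage is checked first because both other stages depend on it, so a saturated disk can make either of them look guilty.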
The metrics that matter are analysisd event queue usage, the indexer write thread-pool queue depth and rejection counts, JVM heap pressure and disk I/O wait. Wazuh exposes the internal queue figures and the indexer exposes node and thread-pool statistics. Scaling decisions should be driven by sustained trends in these numbers, not by a single bad afternoon.
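As a rough sketch of reading the queue figures, the snippet below parses text in the `key='value'` style that wazuh-analysisd writes to its state file. The field names `event_queue_usage` and `events_dropped`, and the sample content, are assumptions to verify against the state file produced by your Wazuh version.

```python
def parse_state(text: str) -> dict:
    """Parse key='value' lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = raw.strip().strip("'\"")
    return values


def queue_report(values: dict) -> tuple:
    """Return (event queue usage, events dropped); field names are assumed."""
    usage = float(values.get("event_queue_usage", 0.0))
    dropped = int(values.get("events_dropped", 0))
    return usage, dropped


# Illustrative sample, not real output.
SAMPLE = """
# Events dropped
events_dropped='12'
# Event queue
event_queue_usage='0.93'
event_queue_size='16384'
"""

usage, dropped = queue_report(parse_state(SAMPLE))
print(f"queue usage={usage:.2f} dropped={dropped}")
```

Sampling this on a schedule and keeping the history is what turns one reading into a trend.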
Tuning the Analysis Engine
analysisd is single-process but multi-threaded for rule matching. On a busy manager, the default thread count for rule evaluation can become the ceiling. Increasing the analysisd rule-matching threads to use available cores lets the engine drain its event queue faster under bursty load, which directly raises sustainable EPS.
Reducing useless work is equally powerful. Every event that reaches analysisd consumes decoding and matching cycles even if it never produces an alert. Filtering high-volume, low-value log sources at the agent, and writing tight decoders that exit early, cuts the load analysisd must carry. At high EPS, dropping ten percent of pointless events can be the difference between a healthy and a saturated queue.
The event queue size itself can be raised to absorb bigger bursts, but a larger queue only buys time. If the steady-state arrival rate exceeds the steady-state processing rate, a bigger queue simply delays the moment events start dropping. Queue sizing handles spikes; thread count and event reduction handle sustained load.
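The point that a bigger queue only buys time follows from simple arithmetic, sketched below. The rates and queue sizes are illustrative numbers, not Wazuh defaults or measurements.

```python
from typing import Optional


def seconds_until_first_drop(arrival_eps: float, processing_eps: float,
                             queue_size: int) -> Optional[float]:
    """Seconds until an initially empty queue overflows, or None if it never does."""
    backlog_growth = arrival_eps - processing_eps  # events added to the backlog per second
    if backlog_growth <= 0:
        return None  # processing keeps up, so the queue never fills
    return queue_size / backlog_growth


# Sustained overload: 12,000 EPS arriving against 10,000 EPS processed.
print(seconds_until_first_drop(12_000, 10_000, 16_384))   # 8.192
print(seconds_until_first_drop(12_000, 10_000, 131_072))  # 65.536
# Processing rate above arrival rate: no overflow at any queue size.
print(seconds_until_first_drop(9_000, 10_000, 16_384))    # None
```

An eightfold larger queue moves the first drop from about eight seconds to about a minute, while raising the processing rate above the arrival rate removes it entirely.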
Indexer Sharding and Write Throughput
The Wazuh indexer (an OpenSearch derivative) writes data into indices that are split into shards. Shards are the unit of parallelism: writes and searches are distributed across them. Choosing the shard count is the single most consequential indexer decision for high-volume estates. Too few shards and write throughput is capped because all writes funnel through limited parallelism. Too many shards and the cluster wastes heap and file handles on per-shard overhead.
A practical target is keeping each shard in a sensible size band (commonly tens of gigabytes) so shards are big enough to be efficient but small enough to recover and rebalance quickly. Index lifecycle management (Index State Management, or ISM, in OpenSearch terms) should roll indices over by size or age, and the shard count per index should be set with the node count in mind so shards distribute evenly across data nodes.
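One way to turn the size band and node count into a number is sketched below. The 30 GB target and the rounding rule are assumptions chosen for illustration, not a Wazuh or OpenSearch recommendation.

```python
import math


def primary_shard_count(index_size_gb: float, target_shard_gb: float = 30.0,
                        data_nodes: int = 1) -> int:
    """Estimate primary shards for one index before it rolls over."""
    # Enough shards that none exceeds the target size.
    needed = max(1, math.ceil(index_size_gb / target_shard_gb))
    # Round up to a multiple of the node count so primaries spread evenly.
    return math.ceil(needed / data_nodes) * data_nodes


# A 200 GB index on three data nodes: 7 shards needed, rounded up to 9.
print(primary_shard_count(200, target_shard_gb=30, data_nodes=3))  # 9
```

Rounding up to a node multiple trades slightly smaller shards for an even distribution; on small indices it may be preferable to keep a single shard rather than one per node.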
The indexer also has write thread pools with finite queues, which handle bulk indexing requests. Under sustained high EPS, write rejections appear when these queues are full, which means events are being refused. Rejections are a definitive signal that the indexer cannot keep pace and that you need more data nodes, faster disk or a better shard layout, not just a config tweak.
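A minimal sketch of spotting new rejections, assuming you have already fetched the JSON output of the OpenSearch `_cat/thread_pool` API (for example with `format=json` and the columns `node_name,name,queue,rejected`). The node names and counts below are made up. Rejection counters are cumulative since node start, so the useful signal is the change between two samples.

```python
import json


def write_rejections(cat_thread_pool_json: str) -> dict:
    """Map node name to its cumulative rejected count for the write pool."""
    rows = json.loads(cat_thread_pool_json)
    return {row["node_name"]: int(row["rejected"])
            for row in rows if row["name"] == "write"}


def new_rejections(before: dict, after: dict) -> dict:
    """Nodes whose rejected counter grew between two samples, with the increase."""
    return {node: count - before.get(node, 0)
            for node, count in after.items()
            if count > before.get(node, 0)}


# Illustrative samples taken a few minutes apart (hypothetical node names).
EARLIER = '[{"node_name": "idx-1", "name": "write", "queue": "0", "rejected": "5"},' \
          ' {"node_name": "idx-2", "name": "write", "queue": "0", "rejected": "0"}]'
LATER = '[{"node_name": "idx-1", "name": "write", "queue": "180", "rejected": "47"},' \
        ' {"node_name": "idx-2", "name": "write", "queue": "2", "rejected": "0"}]'

print(new_rejections(write_rejections(EARLIER), write_rejections(LATER)))
```

Rejections concentrated on one node, as in this sample, tend to point at uneven shard distribution rather than a cluster-wide capacity shortfall.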
Storage, JVM Heap and Memory
Indexing is heavily I/O bound. SSD or NVMe storage for the active indices is effectively mandatory at high volume; spinning disk simply cannot sustain the random write and merge load. Separating hot indices (recent, heavily written and searched) from warm and cold tiers on cheaper storage controls cost while keeping the write path fast.
JVM heap for the indexer should generally sit around half of available RAM, leaving the rest for the operating system file cache that OpenSearch relies on heavily. Oversizing the heap past recommended limits hurts rather than helps because of garbage-collection behaviour. Heap pressure and frequent GC pauses are a common, easily missed cause of indexing lag that masquerades as a throughput problem.
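The half-of-RAM rule with an upper limit can be written as a one-line calculation. The 31 GB ceiling used here is an assumption based on the commonly cited guidance to stay below roughly 32 GB so the JVM keeps using compressed object pointers; the text above does not name a specific limit, so check the guidance for your indexer version.

```python
def recommended_heap_gb(total_ram_gb: float, ceiling_gb: float = 31.0) -> float:
    """Half of RAM for the JVM heap, capped at an assumed ceiling."""
    return min(total_ram_gb / 2, ceiling_gb)


for ram in (16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> {heap} GB heap, {ram - heap} GB left for file cache")
```

Past the ceiling, extra RAM still helps because it goes to the operating system file cache rather than the heap.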
Retention multiplies storage requirements directly. Storing one hundred days of high-EPS archives needs roughly ten times the disk of ten days. Decide early what must be hot and searchable versus what can be compressed to cold storage or off-cluster, because retention strategy drives the storage budget more than EPS alone does.
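The linear relationship between retention and disk can be sketched as follows. The average event size, replica count and `overhead` factor are hypothetical inputs; real on-disk size depends on compression and mapping, so measure it from an existing index before relying on the result.

```python
def storage_gb(eps: float, avg_event_bytes: float, retention_days: int,
               replicas: int = 1, overhead: float = 1.0) -> float:
    """Estimated disk in GB: raw volume x copies x an on-disk overhead factor."""
    raw_bytes = eps * avg_event_bytes * 86_400 * retention_days  # 86,400 seconds per day
    return raw_bytes * (1 + replicas) * overhead / 1e9


ten_days = storage_gb(5_000, 1_000, 10)
hundred_days = storage_gb(5_000, 1_000, 100)
print(f"10 days: {ten_days:,.0f} GB, 100 days: {hundred_days:,.0f} GB")
```

With these illustrative inputs, ten days needs about 8.6 TB and one hundred days about 86 TB, which is why tiering decisions dominate the storage budget.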
Understanding EPS Capacity
Events per second is the headline capacity figure, but a single number hides important detail. Sustainable EPS is bounded by the slowest stage in the pipeline, so a manager that can decode 20,000 EPS paired with an indexer that can only write 8,000 EPS has a real ceiling of 8,000. Sizing must balance the stages, not maximise one.
Peak versus average matters enormously. Security telemetry is bursty: a scan, an outbreak or a misconfigured device can multiply the event rate for minutes. Sizing for average EPS guarantees data loss during exactly the incidents you most need to capture. Capacity should be planned against realistic peaks with headroom, typically sizing to a comfortable fraction of tested maximum throughput.
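Both ideas, the slowest-stage ceiling and sizing against peaks with headroom, reduce to a few lines. The stage figures reuse the 20,000 and 8,000 EPS example above; the 0.6 target utilisation and the 6,000 EPS peak are hypothetical values standing in for "a comfortable fraction" and your own measured peak.

```python
def bottleneck(stage_capacities: dict) -> str:
    """Name of the stage with the lowest tested throughput."""
    return min(stage_capacities, key=stage_capacities.get)


def sustainable_eps(stage_capacities: dict) -> float:
    """The pipeline ceiling is the capacity of its slowest stage."""
    return min(stage_capacities.values())


def required_capacity(peak_eps: float, target_utilisation: float = 0.6) -> float:
    """Tested throughput needed so the expected peak uses only the target fraction."""
    if not 0 < target_utilisation <= 1:
        raise ValueError("target_utilisation must be in (0, 1]")
    return peak_eps / target_utilisation


stages = {"analysisd": 20_000, "indexer": 8_000}
print(bottleneck(stages), sustainable_eps(stages))

needed = required_capacity(peak_eps=6_000)
print(f"a 6,000 EPS peak calls for about {needed:,.0f} EPS of tested capacity")
```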
Measure your own EPS rather than estimating. Wazuh and the indexer report ingestion rates, and a short period of observation reveals both the average and the burst profile of the real environment. Codesecure benchmarks EPS against tested node throughput before committing a sizing, so clients pay for the capacity they need and keep headroom for incident-driven spikes.
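A short observation period can be reduced to an average and a peak rate as sketched below. The input is assumed to be periodic readings of a cumulative event counter, collected however suits your environment; the sample values are invented.

```python
def eps_profile(samples: list) -> tuple:
    """Return (average EPS, peak EPS) from (timestamp_seconds, cumulative_count) samples."""
    total_events = 0
    total_seconds = 0.0
    peak = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0 or c1 < c0:
            continue  # skip out-of-order samples and counter resets
        total_events += c1 - c0
        total_seconds += t1 - t0
        peak = max(peak, (c1 - c0) / (t1 - t0))
    if total_seconds == 0:
        raise ValueError("need at least two usable samples")
    return total_events / total_seconds, peak


# Two quiet 10-second intervals followed by a burst.
readings = [(0, 0), (10, 10_000), (20, 20_000), (30, 80_000)]
average, peak = eps_profile(readings)
print(f"average={average:.0f} EPS, peak={peak:.0f} EPS")
```

Here the peak is more than double the average, which is the gap that headroom has to cover. Shorter sampling intervals reveal sharper bursts that longer intervals average away.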
When Tuning Is Not Enough: Scaling Out
Every single-node deployment has a ceiling. When analysisd threading, event reduction and indexer tuning have been applied and the queues still saturate, the answer is horizontal scale: a manager cluster with worker nodes sharing the agent load, and a multi-node indexer cluster spreading indexing and search across data nodes.
Scaling out is not just adding servers; it is distributing the right stage. If analysisd is the bottleneck, additional manager worker nodes help. If indexing is the bottleneck, additional indexer data nodes help. Diagnosing which stage saturates determines what to add, which is why the measurement discipline from the start of this guide pays off again at scale-out time.
Performance optimisation and scalability planning are two halves of the same problem. Optimisation extracts maximum throughput from the hardware you have; scalability planning decides when and how to add hardware. Treating them together avoids both the trap of over-provisioning early and the trap of hitting a wall with no migration path.
Frequently Asked Questions
How many events per second can Wazuh handle?
It depends on hardware, ruleset weight and shard layout, and it is bounded by the slowest stage. A single well-sized node can sustain several thousand EPS; clustered managers and multi-node indexers scale to tens of thousands and beyond. The honest answer comes from benchmarking your own ruleset and storage, not from a generic number.
How do I know if Wazuh is dropping events?
Watch the analysisd event queue usage and the indexer thread-pool rejection counts. A queue consistently near capacity means events are about to be dropped, and write or bulk rejections on the indexer mean events are being refused outright. Both are definitive signals that a stage is saturated and needs tuning or scale-out.
How many shards should a Wazuh index have?
Enough for write parallelism across your data nodes, but few enough that each shard stays in a sensible size band (commonly tens of gigabytes). Too few shards caps throughput; too many waste heap and file handles on overhead. Use index lifecycle management to roll over by size or age and distribute shards evenly across nodes.
How much RAM does the Wazuh indexer need?
Plan JVM heap at roughly half of available RAM, leaving the rest for the operating system file cache that the indexer depends on. Oversizing heap past recommended limits hurts performance because of garbage collection. At high volume, RAM for the indexer and fast disk for indices are usually the binding constraints rather than CPU.
Does increasing analysisd threads improve performance?
Yes, on a multi-core manager where analysisd is the bottleneck. More rule-matching threads let the engine drain its event queue faster under bursty load, raising sustainable EPS. Pair it with dropping high-volume, low-value events at the agent so analysisd spends cycles only on telemetry that matters.
Can Codesecure size and tune Wazuh for high volume?
Yes. Codesecure benchmarks EPS against tested node throughput, diagnoses the saturated stage, and tunes analysisd threading, shard layout, JVM heap and storage tiers. We size for realistic peaks with headroom so clients capture data during the incidents that generate the most events. ISO/IEC 27001:2022 certified delivery.