Benchmarks
Wickra is a streaming-first library: the state machine inside every indicator takes a single new tick and returns its updated output in constant time. The charts below show what that costs against the full Python TA ecosystem and the other Rust crates — wins and losses, the same figures the project README carries. Each bar is normalised to the slowest in its group, so the shortest bar is the fastest library; the value to the right is the measured number.
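The bar normalisation described above can be sketched in a few lines. This is a minimal illustration, not the project's plotting code: the function name `normalise_to_slowest` and the library labels are hypothetical, and the latencies are placeholders rather than measured figures.

```python
def normalise_to_slowest(latencies_us: dict[str, float]) -> dict[str, float]:
    """Scale each latency by the slowest entry, so the slowest bar has length 1.0."""
    slowest = max(latencies_us.values())
    return {name: value / slowest for name, value in latencies_us.items()}


# Placeholder latencies in microseconds; these are NOT benchmark results.
group = {"lib_a": 0.5, "lib_b": 4.0, "lib_c": 8.0}

bars = normalise_to_slowest(group)
fastest = min(bars, key=bars.get)  # shortest bar = fastest library

for name, length in bars.items():
    print(f"{name}: bar length {length:.4f}, measured {group[name]} µs")
print("fastest:", fastest)
```

Because every bar is a ratio against the slowest entry in its own group, bar lengths are only comparable within a group, never across charts.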
Choosing a language? Jump to per-binding throughput
All 10 bindings call the same verified Rust core, but the cost of crossing each language's FFI boundary differs by orders of magnitude on streaming workloads. See Per-binding throughput to pick the binding that keeps up with your hot loop (Rust / C / C++ / C# are near-core; R is the outlier).
Reproduce these on your own hardware
# Python — vs talipp / TA-Lib / tulipy / pandas-ta / finta
pip install -e bindings/python[bench]
python -m benchmarks.compare_libraries
# Rust core — vs kand / ta-rs / yata
cargo bench -p wickra-bench
The Python script auto-detects every peer library installed in your venv. The nightly cross-library-bench workflow runs both suites on a Linux runner and uploads the raw reports as artefacts.
1. Streaming — the structural win
Live trading feeds one tick at a time. Wickra updates every indicator in O(1); batch-only libraries (TA-Lib, tulipy, finta, pandas-ta) have no incremental API and must recompute the whole history on every tick. Only talipp (Python) and ta-rs / yata (Rust) carry real per-tick state. This is the gap the library was built to expose.
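The difference between carrying per-tick state and recomputing the history can be illustrated with a simple moving average. This is a sketch under stated assumptions: `StreamingSMA` and `batch_sma` are hypothetical names written for this page, not Wickra's API or any peer library's, and the prices are placeholders.

```python
from collections import deque


class StreamingSMA:
    """Hypothetical O(1)-per-tick simple moving average (not Wickra's actual API)."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.window = deque()
        self.total = 0.0

    def update(self, price: float):
        # Constant work per tick: one add, at most one subtract.
        self.window.append(price)
        self.total += price
        if len(self.window) > self.period:
            self.total -= self.window.popleft()
        if len(self.window) < self.period:
            return None  # still warming up
        return self.total / self.period


def batch_sma(prices, period):
    """Batch-only stand-in: computes the whole series on every call."""
    out = []
    for i in range(len(prices)):
        if i + 1 < period:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - period : i + 1]) / period)
    return out


prices = [float(p) for p in (10, 11, 12, 13, 14, 15)]  # placeholder ticks
sma = StreamingSMA(3)
history, streamed, recomputed = [], [], []
for p in prices:
    streamed.append(sma.update(p))                 # O(1) per tick
    history.append(p)
    recomputed.append(batch_sma(history, 3)[-1])   # O(len(history)) per tick

print(streamed)
print(recomputed)
```

Both paths produce the same values here; the difference is that the recompute path does work proportional to the history length on every tick, which is where the gap on a 5 000-bar seed comes from.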
Python — per-tick latency (seed 5 000 bars, then feed ticks one at a time):
Against talipp, the only other incremental Python peer, Wickra is 11–56× faster; against the recompute-on-every-tick libraries it is 2 800–19 000× faster (finta RSI hits 19 000×). tulipy / pandas-ta land in the same recompute band as TA-Lib, too far off-scale to chart next to a sub-microsecond bar.
Rust — per-tick latency (whole 50 000-bar series, lower = faster):
ta-rs hands back a bare f64 from the first tick with no warmup and no validation; it leads several rows by giving those guarantees up. Against kand, Wickra wins streaming RSI, Bollinger and ATR. yata exposes only SMA/EMA as raw-value methods, so its other rows are omitted rather than faked.
2. Batch — competitive, not the headline
Whole series in one call. This is not the headline: hand-tuned C (tulipy, TA-Lib) and the leanest crate (kand) win the simple recurrences, and we show the full field rather than cherry-pick. Wickra trades a few µs per pass for the None-warmup, NaN-safety and bit-exact batch == streaming guarantees none of them keep — yet it still beats pandas-ta and finta on every row, and TA-Lib on RSI and ATR.
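One way to picture the three guarantees named above is a batch function that is just a loop over the streaming state machine. The sketch below is an assumption-laden illustration, not Wickra's implementation: `StreamingEMA` and `batch_ema` are hypothetical names, seeding the EMA with the mean of the first `period` values is one common convention, and rejecting non-finite input with `ValueError` is only one possible NaN policy.

```python
import math


class StreamingEMA:
    """Hypothetical EMA state machine; names and policies are assumptions."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.alpha = 2.0 / (period + 1)
        self.count = 0
        self.seed_sum = 0.0
        self.value = None

    def update(self, price: float):
        # Assumed NaN policy: reject before touching state, so one bad tick
        # cannot poison every later output.
        if not math.isfinite(price):
            raise ValueError("non-finite input rejected")
        self.count += 1
        if self.count < self.period:
            self.seed_sum += price
            return None  # warmup: no value until `period` ticks have arrived
        if self.count == self.period:
            self.value = (self.seed_sum + price) / self.period
        else:
            self.value += self.alpha * (price - self.value)
        return self.value


def batch_ema(prices, period):
    """Batch pass built on the same state machine, so results match bit for bit."""
    ema = StreamingEMA(period)
    return [ema.update(p) for p in prices]


prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder series
batch = batch_ema(prices, 3)

live = StreamingEMA(3)
streamed = [live.update(p) for p in prices]

print(batch)
print(batch == streamed)
```

Sharing one code path makes the equality hold by construction, at the price of per-element bookkeeping that a hand-tuned batch loop can skip; that is the kind of trade the paragraph above describes.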
Python (20 000-bar pass, µs/op, lower = faster):
All five libraries are measured in the same Python 3.12 run as Wickra (no CI-vs-desktop mix). tulipy's SIMD C and TA-Lib lead the simple recurrences; pandas-ta and finta trail across the board. talipp is excluded from the batch chart on purpose: it is streaming-first, so re-instantiating it for a full batch pass is not a like-for-like comparison.
Rust (50 000-bar pass, µs, lower = faster). Only Wickra and kand expose a batch API; ta-rs and yata are streaming-only:
Wickra wins RSI, Bollinger and ATR outright and trades a few µs on the simple recurrences for the warmup/NaN guarantees. Its real edge is breadth (514 indicators) and O(1) streaming across ten languages, not winning every micro-benchmark — the project README carries the same tables.
3. Per-binding throughput
The sections above compare Wickra against other libraries — which only exist for Python and Rust. Every binding calls the same Rust core, so this last table is not a speed claim: it measures the raw cost of crossing each language's FFI boundary, in million updates per second (Mupd/s), for SMA(20) over 200 000 bars (median of 3, same machine as above).
| Target | streaming (Mupd/s) | batch (Mupd/s) |
|---|---|---|
| Rust core (no FFI) | 380 | 498 |
| C / C++ | 365 | 358 |
| C# | 348 | 259 |
| Python | 31 | 46 |
| Java | 38 | 173 |
| Go | 23 | 394 |
| WASM | 21 | 169 |
| Node.js | 16 | 9 |
| R | 0.1 | 279 |
Streaming spans three orders of magnitude — the raw C ABI (365) sits just under the FFI-free Rust ceiling (380), while R's per-call interpreter overhead makes streaming ~2800× slower than its own batch. The single batch crossing stays high for the bindings that return a contiguous buffer; the low outliers are Node (its napi batch boxes every element into a JS Array) and Python (a stdlib array.array copy, now that NumPy is optional). Reproduce with the per-binding throughput scripts — see BENCHMARKS.md §3.
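The Mupd/s metric and the median-of-3 rule can be sketched as a small timing harness. This is not the project's benchmark script: `mupd_per_s`, `StreamingSMA` and `sma_batch` are hypothetical pure-Python stand-ins, so the numbers it prints measure interpreter overhead rather than any FFI crossing, and the series is shortened to 20 000 synthetic bars to keep the run quick.

```python
import statistics
import time
from collections import deque


def mupd_per_s(run, n_updates: int, repeats: int = 3) -> float:
    """Median throughput of `run()` in million updates per second."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        elapsed = time.perf_counter() - start
        rates.append(n_updates / elapsed / 1e6)
    return statistics.median(rates)


class StreamingSMA:
    """Stand-in indicator: one call per tick (many boundary crossings)."""

    def __init__(self, period: int) -> None:
        self.period, self.window, self.total = period, deque(), 0.0

    def update(self, price: float):
        self.window.append(price)
        self.total += price
        if len(self.window) > self.period:
            self.total -= self.window.popleft()
        return self.total / self.period if len(self.window) == self.period else None


def sma_batch(prices, period=20):
    """Stand-in batch pass: one call for the whole series (one crossing)."""
    out, total = [], 0.0
    for i, p in enumerate(prices):
        total += p
        if i >= period:
            total -= prices[i - period]
        out.append(total / period if i + 1 >= period else None)
    return out


prices = [float(i % 100) for i in range(20_000)]  # synthetic bars


def streaming_run():
    sma = StreamingSMA(20)
    for p in prices:
        sma.update(p)


def batch_run():
    sma_batch(prices, 20)


streaming = mupd_per_s(streaming_run, len(prices))
batch = mupd_per_s(batch_run, len(prices))
print(f"streaming {streaming:.2f} Mupd/s, batch {batch:.2f} Mupd/s")
```

Taking the median of three runs rather than the mean keeps a single scheduler hiccup from skewing the reported rate.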
What the numbers do not say
- Absolute µs values depend on CPU, memory clock, OS scheduler, and the Python / Node.js / Rust versions — read them as relative speedups between libraries on identical input, not as a universal performance contract.
- Reproduced on: Windows 11 Pro 26200, AMD Ryzen 9 9950X, 64 GB DDR5, Rust 1.92 (release profile, lto = "fat", codegen-units = 1), Python 3.12.
- The Python Wickra figures are the Python binding runtime, not the bare Rust kernel: a small PyO3 boundary cost is included on each measurement.
See also
- benchmarks/compare_libraries.py: the canonical Python script.
- crates/wickra-bench: the Rust cross-library benchmark harness.
- Bench workflow: nightly run on the GitHub-hosted Linux runner, archived as build artefacts.
- BENCHMARKS.md §3: per-binding throughput benchmarks, raw updates/sec for each language binding (C, C++, C#, Go, Java, Node.js, Python, R, WASM, plus the Rust core baseline). These measure each binding's FFI overhead, not the cross-library comparison shown above.
- Streaming-vs-Batch (docs): what the equivalence guarantee actually means.