V Benchmarks

Unless a section shows a different command, benchmarks were compiled with v -prod and run on an Apple M5 with 16 GB RAM, macOS (arm64). V version: 0.5.1.

GC: Boehm vs VGC

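Compares Boehm GC (-gc boehm) against V's built-in concurrent tri-color mark-and-sweep collector (-gc vgc). Each test runs 5 iterations and reports the median. The ratio column is vgc time divided by boehm time, so values above 1.00x favor Boehm.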

v run bench/bench_gc.v

  test                                             boehm       vgc     ratio
  ———————————————————————————————————————————— ————————— ————————— —————————
  small allocs (1000000x string)                   39 ms     48 ms    1.23x
  tree build+walk (depth=18, 10x)                  48 ms    118 ms    2.46x
  array grow (100x 100000 pushes)                   9 ms     26 ms    2.89x
  map insert (20x 10k entries)                     20 ms     27 ms    1.35x
  mixed workload (50 rounds)                       10 ms     16 ms    1.60x

  heap usage:
    boehm: 29856 KB allocated, 29296 KB free
    vgc:   131072 KB allocated, 0 KB free

Boehm is still 1.2x-2.9x faster across these workloads and allocates ~4.4x less heap (29856 KB vs 131072 KB).
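The median-of-5 methodology can be sketched as follows. This is a minimal illustration, not the code in bench/bench_gc.v: the helper name median_ms and the sample workload are assumptions made for the example.

```v
import time

// median_ms is a hypothetical helper: it runs `work` `iters` times
// (iters must be at least 1) and returns the median elapsed time in ms.
fn median_ms(iters int, work fn ()) i64 {
	mut samples := []i64{cap: iters}
	for _ in 0 .. iters {
		sw := time.new_stopwatch()
		work()
		samples << sw.elapsed().milliseconds()
	}
	samples.sort()
	return samples[samples.len / 2]
}

fn main() {
	// Illustrative workload: many small string allocations.
	ms := median_ms(5, fn () {
		mut parts := []string{}
		for i in 0 .. 1_000_000 {
			parts << i.str()
		}
	})
	println('small allocs: ${ms} ms (median of 5)')
}
```

Reporting the median rather than the mean keeps a single slow iteration (for example, one hit by an unlucky collection cycle) from skewing the result. Building the same file once with -gc boehm and once with -gc vgc would give the two columns being compared.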

Closures

Measures closure creation, invocation, multi-threaded creation, and memory overhead.

v -prod -o /tmp/bench_closure bench/bench_closure.v && /tmp/bench_closure

| Test Name                 | Iterations | Time(ms) | Ops/sec      |
|---------------------------|------------|----------|--------------|
| Normal Function Call      |  100000000 |        0 |  +inf Mop/s  |
| Small Closure Creation    |   10000000 |      188 | 53.19 Mop/s  |
| Medium Closure Creation   |   10000000 |      376 | 26.60 Mop/s  |
| Large Closure Creation    |    1000000 |      121 |  8.26 Mop/s  |
| Small Closure Call        |  100000000 |      136 | 735.29 Mop/s |
| Medium Closure Call       |  100000000 |      133 | 751.88 Mop/s |
| Large Closure Call        |   10000000 |       16 | 625.00 Mop/s |
| Multi-threaded Creation   |    1000000 |       95 | 10.53 Mop/s  |

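Memory: ~69 bytes per closure (medium, 4 captured vars). Closure calls run at ~625-750 Mop/s regardless of capture size, while creation slows from ~53 Mop/s (small) to ~8 Mop/s (large). The normal function call baseline finishes in under 1 ms, so its throughput is reported as +inf.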
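For readers unfamiliar with V's closure syntax, here is a small sketch of the two operations being timed: creation (capturing variables) and invocation. The function names are illustrative and do not come from bench/bench_closure.v.

```v
// make_adder returns a closure that captures `base` by value.
fn make_adder(base int) fn (int) int {
	return fn [base] (x int) int {
		return base + x
	}
}

fn main() {
	// Creation: the captured value is copied into the closure.
	add5 := make_adder(5)

	// A closure with 4 captured variables, the size the table calls "medium".
	a, b, c, d := 1, 2, 3, 4
	sum4 := fn [a, b, c, d] () int {
		return a + b + c + d
	}

	// Invocation: called like an ordinary function value.
	mut total := 0
	for i in 0 .. 10 {
		total += add5(i)
	}
	println(total) // 95
	println(sum4()) // 10
}
```

Captures are listed explicitly in square brackets, so the cost of creating a closure grows with the number and size of captured variables, which matches the small/medium/large creation rows above.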

String Deduplication

Compares four deduplication strategies on 10,000 strings with ~30% duplicates.

v -prod -o /tmp/bench_string_dedup bench/bench_string_dedup.v && /tmp/bench_string_dedup

Method 1 (basic array)          33 ms   7000 unique
Method 2 (pre-allocated array)  27 ms   7000 unique
Method 3 (map)                   0 ms   7000 unique
Method 4 (set)                   0 ms   7000 unique

Maps and sets finish in under 1 ms (below the timer resolution shown), versus 27-33 ms for linear array search: at least an order of magnitude faster for deduplication.
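The gap comes from lookup cost: each membership check on an array scans it linearly, while a map lookup is expected constant time. A minimal sketch of the two approaches, with function names that are assumptions rather than the benchmark's own:

```v
// dedup_array keeps first occurrences using a linear scan per item,
// which is O(n^2) comparisons overall.
fn dedup_array(items []string) []string {
	mut unique := []string{cap: items.len}
	for s in items {
		if s !in unique {
			unique << s
		}
	}
	return unique
}

// dedup_map tracks seen strings in a map, so each membership check
// is an expected O(1) hash lookup.
fn dedup_map(items []string) []string {
	mut seen := map[string]bool{}
	mut unique := []string{cap: items.len}
	for s in items {
		if s !in seen {
			seen[s] = true
			unique << s
		}
	}
	return unique
}

fn main() {
	items := ['a', 'b', 'a', 'c', 'b']
	println(dedup_array(items)) // ['a', 'b', 'c']
	println(dedup_map(items)) // ['a', 'b', 'c']
}
```

Both versions preserve first-occurrence order. With 10,000 inputs and 7,000 unique values, the array version performs on the order of tens of millions of string comparisons, while the map version performs about 10,000 lookups.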

Vectors (Boids Simulation)

N-body boids simulation with 10,000 entities: cohesion, separation, and alignment.

v -prod -o /tmp/bench_vectors bench/vectors/vectors.v && /tmp/bench_vectors

~50 ms per run (after warmup)
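As a rough illustration of the three rules, the sketch below performs one brute-force update step in 2D. It is not the code in bench/vectors/vectors.v: the struct layout, the radius, and the rule weights (0.01, 0.05, 0.1) are all assumptions chosen for readability.

```v
struct Boid {
mut:
	x  f64
	y  f64
	vx f64
	vy f64
}

// step advances every boid by dt using an O(n^2) neighbour scan.
// The radius and weights are illustrative, not the benchmark's values.
fn step(mut boids []Boid, radius f64, dt f64) {
	prev := boids.clone() // read from a snapshot so updates do not interfere
	for i in 0 .. boids.len {
		me := prev[i]
		mut cx := 0.0 // cohesion: sum of neighbour positions
		mut cy := 0.0
		mut ax := 0.0 // alignment: sum of neighbour velocities
		mut ay := 0.0
		mut sx := 0.0 // separation: push away from close neighbours
		mut sy := 0.0
		mut n := 0
		for j in 0 .. prev.len {
			if i == j {
				continue
			}
			o := prev[j]
			dx := o.x - me.x
			dy := o.y - me.y
			d2 := dx * dx + dy * dy
			if d2 > radius * radius || d2 == 0.0 {
				continue
			}
			cx += o.x
			cy += o.y
			ax += o.vx
			ay += o.vy
			sx -= dx / d2
			sy -= dy / d2
			n++
		}
		if n > 0 {
			inv := 1.0 / f64(n)
			boids[i].vx += 0.01 * (cx * inv - me.x) + 0.05 * (ax * inv - me.vx) + 0.1 * sx
			boids[i].vy += 0.01 * (cy * inv - me.y) + 0.05 * (ay * inv - me.vy) + 0.1 * sy
		}
		boids[i].x += boids[i].vx * dt
		boids[i].y += boids[i].vy * dt
	}
}

fn main() {
	mut boids := []Boid{}
	for i in 0 .. 4 {
		boids << Boid{
			x: f64(i)
			y: f64(i % 2)
			vx: 1.0
			vy: 0.0
		}
	}
	for _ in 0 .. 10 {
		step(mut boids, 2.0, 0.1)
	}
	println(boids[0])
}
```

A brute-force scan like this does n^2 distance checks per step, so the workload is dominated by floating-point vector arithmetic, which is presumably what this benchmark is meant to stress.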

Crypto: ECDSA

Key generation, signing, and verification (1,000 iterations each).

v -prod -o /tmp/bench_ecdsa bench/crypto/ecdsa/ecdsa.v && /tmp/bench_ecdsa

Average key generation time:   9 µs
Average sign time:            11 µs
Average verify time:          30 µs

SOA Structs (V2 cleanc only)

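Compares Array-of-Structs (AOS) vs Struct-of-Arrays (SOA) memory layouts for a 16-field particle system (500k particles). Uses V2's @[soa] attribute, which auto-generates separate contiguous arrays per field for better cache utilization. In the results below, speedup is AOS time divided by SOA time, so values below 1.00x mean AOS is faster.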

./cmd/v2/v2 -prod -backend cleanc bench/bench_soa_structs.v -o bench/bench_soa_structs
./bench/bench_soa_structs

build particles
  aos: 20 ms    soa push: 125 ms    soa indexed: 37 ms

sum x only
  aos: 19 ms    soa: 12 ms    speedup: 1.58x

sum x/y/z/life (4 of 16 fields)
  aos: 14 ms    soa: 10 ms    speedup: 1.40x

sum all 16 fields
  aos: 11 ms    soa: 16 ms    speedup: 0.69x

integrate position/velocity/life
  aos: 10 ms    soa: 13 ms    speedup: 0.77x

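SOA is 1.4x-1.6x faster for partial field access (fewer cache lines touched). When all 16 fields are summed, or when position, velocity, and life are integrated, AOS wins (0.69x and 0.77x) due to less pointer indirection. Building the particles is also slower with SOA: 125 ms via push and 37 ms via indexed writes, versus 20 ms for AOS.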
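To make the two layouts concrete, the sketch below writes both by hand with 4 fields instead of 16. It deliberately avoids the @[soa] attribute, so it shows the layout that attribute is described as generating rather than the attribute itself. All type and function names are illustrative.

```v
// Array-of-Structs: all fields of one particle are adjacent in memory.
struct Particle {
mut:
	x    f32
	y    f32
	z    f32
	life f32
}

// Struct-of-Arrays: each field lives in its own contiguous array.
struct Particles {
mut:
	x    []f32
	y    []f32
	z    []f32
	life []f32
}

// Summing x over AOS strides across every particle's other fields too.
fn sum_x_aos(ps []Particle) f32 {
	mut sum := f32(0)
	for p in ps {
		sum += p.x
	}
	return sum
}

// Summing x over SOA reads one densely packed array.
fn sum_x_soa(ps Particles) f32 {
	mut sum := f32(0)
	for v in ps.x {
		sum += v
	}
	return sum
}

fn main() {
	n := 4
	mut aos := []Particle{cap: n}
	mut soa := Particles{}
	for i in 0 .. n {
		v := f32(i)
		aos << Particle{
			x: v
			y: v
			z: v
			life: 1.0
		}
		// One push per field: this is why SOA construction costs more.
		soa.x << v
		soa.y << v
		soa.z << v
		soa.life << f32(1.0)
	}
	println(sum_x_aos(aos))
	println(sum_x_soa(soa))
}
```

With 16 f32 fields, each particle occupies 64 bytes in the AOS layout, so a loop that reads only x uses a small fraction of every cache line it pulls in. The SOA loop reads x values back to back, which is consistent with the 1.58x result for the x-only sum.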