Hacker News Viewer

Bypassing the kernel for 56ns cross-language IPC

by riyaneel on 4/16/2026, 5:13:50 PM

https://github.com/riyaneel/Tachyon/tree/main/docs/adr

Comments

by: riyaneel

I am the author of this library. The goal was to reach RAM-speed communication between independent processes (C++, Rust, Python, Go, Java, Node.js) without any serialization overhead or kernel involvement on the hot path.

I managed to hit a p50 round-trip time of 56.5 ns (for 32-byte payloads) and a throughput of ~13.2M RTT/sec on a standard CPU (i7-12650H).

Here are the primary architectural choices that make this possible:

- Strict SPSC & No CAS: I went with a strict Single-Producer Single-Consumer topology. There are no compare-and-swap loops on the hot path. acquire_tx and acquire_rx are essentially just a load, a mask, and a branch using memory_order_acquire / release.

- Hardware Sympathy: Every control structure (message headers, atomic indices) is padded to 128-byte boundaries. False sharing between the producer and consumer cache lines is structurally impossible.

- Zero-Copy: The hot path is entirely in a memfd shared memory segment after an initial Unix Domain Socket handshake (SCM_RIGHTS).

- Hybrid Wait Strategy: The consumer spins for a bounded threshold using cpu_relax(), then falls back to a sleep via SYS_futex (Linux) or __ulock_wait (macOS) to prevent CPU starvation.

The core is C++23, and it exposes a C ABI to bind the other languages.

I am sharing this here for anyone building high-throughput polyglot architectures and dealing with cross-language ingestion bottlenecks.
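For readers who want to see the shape of the SPSC and padding points, here is a minimal sketch. It is not Tachyon's code: the names SpscRing, try_push, try_pop and kPad are hypothetical, and the ring lives in ordinary process memory rather than a memfd segment. It only illustrates a CAS-free single-producer single-consumer ring whose indices are padded to 128 bytes and synchronized with acquire/release.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical names throughout; this is an illustration, not Tachyon's API.
constexpr std::size_t kPad = 128;  // padding size quoted in the post

template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two so wrap-around is a mask");

public:
    // Call from the single producer thread only.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to tail_.
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // ring is full
        slots_[head & (Capacity - 1)] = value;
        // Release publishes the slot write before the new head is visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Call from the single consumer thread only.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to head_.
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // ring is empty
        T value = slots_[tail & (Capacity - 1)];
        // Release hands the slot back to the producer.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    // Each index sits on its own 128-byte boundary, so the producer's
    // writes to head_ never share a cache line with tail_.
    alignas(kPad) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(kPad) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(kPad) std::array<T, Capacity> slots_{};
};
```

Each side only loads, masks, branches and stores, so there is no compare-and-swap loop. This holds only while exactly one thread calls try_push and exactly one calls try_pop.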

4/16/2026, 5:13:50 PM


by: BobbyTables2

Would be interesting to see performance comparisons between this and the alternatives considered like eventfd.

Sure, the “hot path” is probably very fast for all, but what about the slow path?

4/19/2026, 3:54:49 AM


by: Fire-Dragon-DoL

Wow, congrats!

4/19/2026, 3:58:47 AM


by: JSR_FDED

What would need to change when the hardware changes?

4/19/2026, 2:38:45 AM