Data Processing Benchmark Featuring Rust, Go, Swift, Zig, Julia etc.
by behnamoh on 1/31/2026, 8:50:56 PM
https://github.com/zupat/related_post_gen
Comments
by: pron
I was surprised to see that Java was slower than C++, but the Java code is run with `-XX:+UseSerialGC`, which is the slowest GC, meant to be used only on very small systems, and to optimise for memory footprint more than performance. Also, there's no heap size, which means it's hard to know what exactly is being measured. Java allows trading off CPU for RAM and vice-versa. It would be more meaningful if an appropriate GC were used (Parallel, for this batch job) with different heap sizes. If the rules say the program should take less than 8GB of RAM, then it's best to configure the heap to 8GB (or a little lower). Also, System.gc() shouldn't be invoked.

Don't know if that would make a difference, but that's how I'd run it, because in Java, the heap/GC configuration is an important part of the program and how it's actually executed.

Of course, the most recent JDK version should be used (I guess the most recent compiler version for all languages).
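A minimal sketch of the kind of invocation being described, assuming a hypothetical entry point called Main and a heap capped a bit under the 8GB limit (the repo's actual launch command may differ):

  java -XX:+UseParallelGC -Xms7g -Xmx7g Main

Parallel GC is the throughput-oriented collector, and pinning -Xms to -Xmx keeps the heap size fixed so no resizing noise shows up during the run.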
2/1/2026, 12:03:34 AM
by: jhack
D gets no respect. It's a solid language with a lot of great features and conveniences compared to C++ but it barely gets a passing mention (if that) when language discussions pop up. I'd argue a lot of the problems people have with C++ are addressed with D but they have no idea.
2/1/2026, 12:35:58 AM
by: piskov
C# is very fast (see the multicore rating). The implementation is based on SIMD (vectors), memory spans, stackalloc, source generators, and what have you; modern C# lets you go very low-level and very fast.

Probably even faster under .NET 10.

Though using Stopwatch for the benchmark is killing me :-) I wonder if multiple runs via BenchmarkDotNet would show better times (also due to JIT optimizations). For example, the Java code had more warm-up iterations before measuring.
2/1/2026, 1:28:03 AM
by: XJ6w9dTdM
I was very surprised to see the results for Common Lisp. As I scrolled down I figured the language simply wasn't included, until I saw it down there. I would have guessed SBCL to be much faster. I checked it out locally and got: Rust: 9ms, D: 16ms, and CL: 80ms.

Looking at the implementation, just adding type annotations gave a ~10% improvement. Then switching the tag-map to use vectors as values, which is more appropriate than lists (imo), gave a 40% improvement over the initial version. By additionally cutting a few allocations, the total time is halved. I'm guessing other languages will have similarly easy improvements.
2/1/2026, 4:11:12 AM
by: von_lohengramm
This entire benchmark is frankly a joke. As other commenters have pointed out, the compiler flags make no sense, they use pretty egregious ways to measure performance, and ancient versions are being used across the board. Worst of all, the code quality in each sample is extremely variable and some are _really_ bad.
2/1/2026, 1:43:48 AM
by: jasonjmcghee
What's up with the massive jump from 20k to 60k for nearly all languages?
2/1/2026, 3:39:00 AM
by: matthewfcarlson
I see some questions about the testing methodology, but is this representative of Ruby? Several minutes total when most languages finish in under a second?
2/1/2026, 2:08:05 AM
by: sergiotapia
I wrote a script (now an app basically haha) to migrate data from EMR #1 to EMR #2 and I chose Nim because it feels like Python but it's fast as hell. Claude Code did a fine job understanding and writing Nim especially when I gave it more explicit instructions in the system prompt.
2/1/2026, 4:17:01 AM
by: Imustaskforhelp
This is really interesting. Julia is a beast compared to Python.

Nowadays whenever I see benchmarks of different languages, I compare them to benjdd.com/languages or benjdd.com/languages2.

I ended up creating a visualization of this data, if anybody's interested:

https://serjaimelannister.github.io/data-processing-benchmark/

(Credits given to both sources in the repo's description.)

(Also, fair disclosure: it was generated just out of curiosity about how this benchmark data might look on benjdd's UI, and I used LLMs for prototyping. The result looks pretty similar imo, so full credit to benjdd's awesome visualization; I just wanted to see this data in that format for myself, but ended up making it open source / putting it on GitHub Pages.)

I think benjdd is on Hacker News too, so hi Ben! Your website's really cool!
2/1/2026, 12:05:40 AM
by: aatd86
Isn't that measuring the speed of JSON encoding instead?
2/1/2026, 2:24:50 AM
by: pyrolistical
That's odd, the concurrent Zig version got slower.
2/1/2026, 1:27:40 AM
by: KerrAvon
Genuine question: are GitHub workflows stable enough to be used for benchmarking? Like, is CPU time quantum scheduling guaranteed to be the same from run to run?
2/1/2026, 2:52:07 AM
by: Vaslo
So in the D vs Zig vs Rust vs C fight, learn D if speed is your thing?
2/1/2026, 12:21:20 AM
by: ekianjo
Data processing benchmark but somehow R is not even mentioned?
2/1/2026, 1:19:09 AM