The 5 best papers I read in 2021

Can you believe it? There's just a week left in 2021! Although it's been a tough year for a lot of reasons, 2021 has been a great year for learning new concepts and ideas. Here are five of my favorite research papers that I got to read this year!

#5 : "Critical Evaluation of Existing External Sorting Methods..."

  • 2015, Martin Krulis, Charles University 🇨🇿

Ever needed to process a set of data that is too large to fit in a single computer's memory? Like in that whole cloud thing everyone keeps talking about? This paper is a one-stop shop for learning about external sorting algorithms, which sort data that doesn't fit in main memory by working through it in pieces on disk.
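To make the idea concrete, here's a minimal sketch of the classic two-phase external merge sort, one of the baseline approaches in this space (not necessarily any specific method from the paper): sort memory-sized runs, spill each run to a temporary file, then do a k-way merge over the runs. It assumes the input is an iterable of integers; the function name `external_sort` and the `run_size` parameter are my own hypothetical choices.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Yield integers from `values` in sorted order, holding at most
    `run_size` values in memory while building each sorted run."""
    run_paths = []
    buffer = []

    def flush():
        # Sort the in-memory run and spill it to a temp file, one value per line.
        buffer.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as out:
            out.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    # Phase 1: split the input into sorted runs that each fit in memory.
    for v in values:
        buffer.append(v)
        if len(buffer) >= run_size:
            flush()
    if buffer:
        flush()

    # Phase 2: k-way merge of all runs, reading each file sequentially.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        # Clean up the temporary run files once the merge is done.
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Calling `list(external_sort(data, run_size=1000))` keeps at most `run_size` values in memory while building runs; during the merge, `heapq.merge` holds only one pending value per run (plus file buffers).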

#4 : "Stratified B-trees and versioning dictionaries"

  • 2011, Andrew Twigg et al., Oxford University 🇬🇧

Ah yes, the mighty B-tree: 50 years old and still the fundamental underpinning of most of the world's data. I had some fun this winter experimenting with copy-on-write trees to build a non-blocking transaction system, and this paper was a great resource on how to remove bottlenecks from that kind of design.
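For a taste of the copy-on-write idea, here's a minimal sketch of path copying on a plain binary search tree. This is a simplification I chose for brevity: it is not a B-tree and not the paper's stratified design. An update copies only the nodes on the path from the root to the changed key, so every older root stays a consistent, immutable snapshot that readers can keep using without locks. `Node`, `insert`, and `lookup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    # Frozen: nodes are never mutated, so any root is a stable snapshot.
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return the root of a new version; only nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        # Copy this node, rebuild the left path, share the right subtree as-is.
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        # Copy this node, rebuild the right path, share the left subtree as-is.
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    # Same key: copy the node with the new value, sharing both children.
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    """Search one version of the tree; unaffected by later inserts."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None
```

Each call to `insert` returns a new version while sharing every untouched subtree with the previous one, which is what makes keeping old versions around cheap.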

#3 : "Consistency Without Ordering"

  • 2012, Vijay Chidambaram et al., University of Wisconsin-Madison 🇺🇸

This paper defies gravity by side-stepping one of the most fundamental laws of filesystems: guaranteed write ordering. It demonstrates a technique for crash-proof data integrity even when writes are interrupted or reach the disk out of order.
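The key idea, as I understand it, is backpointer-based consistency: each block records which object owns it, so the filesystem can check a forward pointer against the matching backpointer instead of relying on the order in which writes hit the disk. Here's a toy, in-memory sketch of that check. `Block`, `Inode`, `read_block`, and the dict-as-disk model are all hypothetical simplifications, not the paper's actual on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    data: bytes = b""
    # Backpointer: (owning inode number, logical block index within that file).
    owner: Optional[Tuple[int, int]] = None


@dataclass
class Inode:
    number: int
    # Forward pointers: physical block numbers, indexed by logical block index.
    blocks: List[int] = field(default_factory=list)


def read_block(disk: Dict[int, Block], inode: Inode, index: int) -> Optional[bytes]:
    """Return the block's data only if the forward pointer and backpointer agree."""
    if index >= len(inode.blocks):
        return None
    block = disk.get(inode.blocks[index])
    if block is None or block.owner != (inode.number, index):
        # Mismatch: e.g. the inode update reached disk but the data block write
        # did not, so the block still belongs to someone else. Treat it as
        # unwritten instead of exposing stale data.
        return None
    return block.data
```

For example, if inode 3 points at physical block 10 but block 10 still carries the backpointer `(7, 0)` from its previous owner (because the crash hit before the new data landed), `read_block` returns `None` rather than leaking inode 7's old contents.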

#2 : "Parsing Gigabytes of JSON per Second"

  • 2019, Geoff Langdale et al., Intel 🇺🇸

A stunning paper that shattered the performance ceiling on parsing structured text. It introduces and combines several SIMD techniques for handling complex delimited and escaped text, along with novel methods for parsing floating-point numbers. And it does all of this without any preprocessing and with zero compromise on security. That sound you hear is your solid-state drives shaking in their boots.
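One of the paper's neat tricks is classifying a whole block of input at once with bitmasks. Below is a scalar Python emulation of the string-masking step: build a bitmask of quote characters, take its prefix XOR (which the paper computes with a carry-less multiply) to mark the bytes inside strings, then keep only the structural characters that fall outside them. It's a simplified sketch that assumes no backslash escapes and a single block of at most 64 bytes; the real parser also handles escaped quotes and carries state across blocks. All function names here are hypothetical.

```python
STRUCTURAL = set(b"{}[]:,")  # byte values of JSON structural characters


def bitmask(chunk: bytes, predicate) -> int:
    """Bit i is set when predicate(chunk[i]) is true (emulates SIMD compare + movemask)."""
    mask = 0
    for i, byte in enumerate(chunk):
        if predicate(byte):
            mask |= 1 << i
    return mask


def prefix_xor(mask: int, width: int = 64) -> int:
    """Bit i of the result is the XOR of bits 0..i of mask."""
    result = mask
    shift = 1
    while shift < width:
        # Doubling steps: shifts of 1, 2, 4, ..., 32 cover a 64-bit word.
        result ^= (result << shift) & ((1 << width) - 1)
        shift *= 2
    return result


def structural_positions(chunk: bytes) -> list:
    """Indices of structural characters outside string literals (no escapes assumed)."""
    if len(chunk) > 64:
        raise ValueError("this sketch handles a single 64-byte block")
    quotes = bitmask(chunk, lambda b: b == ord('"'))
    # Set from each opening quote up to, but not including, its closing quote.
    in_string = prefix_xor(quotes)
    structural = bitmask(chunk, lambda b: b in STRUCTURAL) & ~in_string
    return [i for i in range(len(chunk)) if (structural >> i) & 1]
```

For the input `{"a:b": [1, 2]}`, the colon inside the string key is masked out, and only the braces, brackets, the comma, and the colon after the key are reported.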

#1 : "BLAKE3: One Function, Fast Everywhere"

  • 2020, Jack O'Connor et al., Keybase/Zoom 🇺🇸

A paper so packed with ideas that it would make Ralph Merkle himself proud. It sets a new bar for the design, performance, and applications of hash trees while offering instructive insight into everything from cryptographic algorithm design and data validation all the way to SIMD CPU instructions and parallel data processing. It reminds me of high school, when I learned how BitTorrent sliced and shared data securely and knew at once that I wanted to study computer science. This one is really special.
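To illustrate the hash-tree structure, here's a minimal sketch that splits input into 1 KiB chunks (as BLAKE3 does) and combines the chunk hashes in a binary tree whose left subtree always holds the largest power-of-two number of chunks, mirroring the tree shape the paper describes. It is not BLAKE3: it uses Python's built-in `hashlib.blake2s` as a stand-in hash, and the one-byte leaf/parent tags are a hypothetical substitute for BLAKE3's chunk counters and domain flags, so its output will not match real BLAKE3 hashes.

```python
import hashlib

CHUNK_LEN = 1024  # BLAKE3 also splits its input into 1 KiB chunks


def _h(tag: int, data: bytes) -> bytes:
    # Hypothetical domain separation: one tag byte distinguishes leaves (0) from parents (1).
    return hashlib.blake2s(bytes([tag]) + data).digest()


def _subtree(leaves: list) -> bytes:
    if len(leaves) == 1:
        return leaves[0]
    # Left subtree gets the largest power of two strictly less than len(leaves).
    split = 1 << ((len(leaves) - 1).bit_length() - 1)
    return _h(1, _subtree(leaves[:split]) + _subtree(leaves[split:]))


def tree_hash(data: bytes) -> bytes:
    """Merkle-style root hash; empty input is treated as one empty chunk."""
    chunks = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)] or [b""]
    # Each leaf hash is independent of the others, so this step is parallelizable.
    return _subtree([_h(0, c) for c in chunks])
```

Because every chunk hash is independent, the leaves can be computed in parallel (or several at once with SIMD), which is a large part of how BLAKE3 gets its speed.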