How to Supercharge Your Linux Per-Core I/O Performance by 60%: A Step-by-Step Guide Inspired by Jens Axboe's Latest Patches

Introduction

At the recent Linux Storage, File-System, Memory Management, and BPF Summit (LSFMM) in Croatia, a presentation highlighted the per-I/O overhead of Linux compared to the Storage Performance Development Kit (SPDK). This prompted Jens Axboe, the lead io_uring developer and Linux block layer maintainer, to dig into optimizations. His resulting patches delivered a reported ~60% increase in per-core I/O performance. This guide walks you through the process, from understanding the problem to building and testing similar enhancements on your own system.

What You Need

  • A Linux development machine (preferably with a recent kernel source, e.g., 6.x)
  • Basic familiarity with Linux kernel compilation and command-line tools
  • Installation of necessary development packages: build-essential, libncurses-dev, bison, flex, libssl-dev, libelf-dev, bc, and git
  • Access to the latest kernel source code (clone from git.kernel.org or download a tarball)
  • Benchmarking tool: fio (Flexible I/O Tester) for measuring per-core performance
  • Knowledge of io_uring and the block layer (helpful but not strictly required)
  • Patience and a test environment (do not apply unfinished patches on production machines)

Step-by-Step Guide

Step 1: Identify the I/O Overhead Bottleneck

Before optimizing, understand where the overhead lies. Review presentations or documentation that compare Linux I/O performance with SPDK, which bypasses the kernel and drives the device from user space. Common bottlenecks on the kernel side include lock contention, syscall overhead, and inefficient memory management. Axboe's work focused on reducing per-I/O overhead in the block layer and io_uring paths. For your own analysis, use tools like perf and trace-cmd to capture kernel profiles and traces during heavy I/O workloads.
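
A minimal sketch of the capture step, assuming perf is installed and the benchmark is pinned to a single CPU. The CPU number, duration, and output file name are illustrative assumptions, not values taken from Axboe's work. By default the script only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: sample kernel call stacks on one CPU while an I/O benchmark runs.
# CPU, DURATION and OUT are illustrative defaults (assumptions).
CPU=${CPU:-0}
DURATION=${DURATION:-30}
OUT=${OUT:-perf-io.data}
DRY_RUN=${DRY_RUN:-1}

# Print the perf invocation that samples one CPU with call graphs.
perf_record_cmd() {
    # $1 = CPU, $2 = seconds to sample, $3 = output file
    echo "perf record -C $1 -g -o $3 -- sleep $2"
}

# Print the perf invocation that summarizes the hottest symbols.
perf_report_cmd() {
    # $1 = perf data file
    echo "perf report -i $1 --stdio --no-children --sort symbol"
}

record=$(perf_record_cmd "$CPU" "$DURATION" "$OUT")
report=$(perf_report_cmd "$OUT")

if [ "$DRY_RUN" = "1" ]; then
    echo "$record"
    echo "$report"
else
    # Start the benchmark in another terminal before running this.
    # The command strings contain no spaces inside arguments, so plain
    # word splitting is enough here.
    sudo $record
    sudo $report | head -n 40
fi
```

Symbols from the block layer and io_uring near the top of the report are the natural places to compare between a baseline and a patched kernel.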

Step 2: Set Up Your Development Environment

  1. Clone the Linux kernel source tree from the official repository:
    git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  2. Install required build dependencies. For Debian/Ubuntu:
    sudo apt-get install git build-essential libncurses-dev bison flex libssl-dev libelf-dev bc
  3. Configure the kernel. Start with a baseline configuration (e.g., make defconfig) and ensure io_uring support is enabled (CONFIG_IO_URING=y).
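
The configuration check in the last step can be sketched as a small helper that greps the generated .config. The function names are hypothetical, and BLOCK is checked only as a second example option.

```shell
#!/bin/sh
# Sketch: check whether options are built in (=y) in a kernel .config file.

# Return 0 if CONFIG_<option> is set to y in the given file.
config_enabled() {
    # $1 = path to .config, $2 = option name without the CONFIG_ prefix
    grep -q "^CONFIG_$2=y\$" "$1"
}

# Print one status line per option listed after the file name.
report_options() {
    file=$1
    shift
    for opt in "$@"; do
        if config_enabled "$file" "$opt"; then
            echo "$opt=y"
        else
            echo "$opt is not set to y"
        fi
    done
}

# Only report when a .config exists in the current directory.
if [ -f .config ]; then
    report_options .config IO_URING BLOCK
fi
```

If IO_URING is reported as missing, running scripts/config --enable IO_URING followed by make olddefconfig from the top of the kernel tree should turn it on.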

Step 3: Find and Apply the Performance Patches

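Axboe's patches are posted to the kernel mailing lists, typically io-uring@vger.kernel.org and linux-block@vger.kernel.org (both archived at lore.kernel.org), and also appear in his development branches. This guide does not have an exact series title to point at, so search by author and for subjects that mention per-I/O or per-core overhead. Steps: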

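  • Search the lore.kernel.org archives or the maintainer's git tree.
  • Download the patch series (e.g., b4 am <message-id> to fetch a series from lore.kernel.org, or git format-patch to export commits from a branch that already contains them).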
  • Apply patches on top of your kernel source: git am *.patch.
  • Resolve any conflicts manually if they occur.
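
The apply step above can be sketched as follows, assuming the series has already been saved as numbered .patch files in one directory. The function names and the branch name in the usage note are hypothetical. The script prints the git commands by default; set DRY_RUN=0 to execute them inside the kernel tree.

```shell
#!/bin/sh
# Sketch: apply a numbered patch series on a throwaway branch.
# Prints the git commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a branch, then apply every *.patch file in lexical order.
apply_series() {
    # $1 = directory holding the patches, $2 = new branch name
    run git checkout -b "$2" || return 1
    for patch in "$1"/*.patch; do
        [ -e "$patch" ] || continue   # directory had no patches
        # --3way lets git fall back to a three-way merge when context drifted.
        run git am --3way "$patch" || return 1
    done
}
```

A call such as apply_series ./patches io-perf-test keeps the experiment off your main branch. If git am stops on a conflict, git am --abort restores the branch to its state before the failed patch.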

Step 4: Compile and Install the Custom Kernel

  1. Build the kernel and modules: make -j$(nproc)
  2. Install modules: sudo make modules_install
  3. Install the kernel image: sudo make install
  4. Update bootloader (e.g., update-grub) and reboot into the new kernel.
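
After rebooting, it is worth confirming that the machine is actually running the custom build before benchmarking. One way, sketched here, is to tag the build with a local version suffix (for example scripts/config --set-str LOCALVERSION "-iopatch" before compiling) and then check the running release string. The suffix -iopatch and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the running kernel is the custom build before benchmarking.
# SUFFIX is a hypothetical tag; it must match the LOCALVERSION used at build time.
SUFFIX=${SUFFIX:--iopatch}

# Return 0 if the release string contains the suffix.
has_suffix() {
    # $1 = kernel release (as printed by uname -r), $2 = expected suffix
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

release=$(uname -r)
if has_suffix "$release" "$SUFFIX"; then
    echo "running patched kernel: $release"
else
    echo "running $release, which lacks the $SUFFIX suffix"
fi
```

The check matches the suffix anywhere in the release string because the build may append further markers after it.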

Step 5: Benchmark Per-Core I/O Performance

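Use fio to measure single-core I/O throughput. The example below issues random reads through io_uring, uses direct I/O so the page cache does not mask storage-path overhead, and pins the job to CPU 0 so the result reflects one core. The queue depth of 64 and the CPU number are illustrative choices; add --filename to point at a test file or a scratch block device:

fio --name=test --ioengine=io_uring --rw=randread --bs=4k --direct=1 --iodepth=64 --numjobs=1 --cpus_allowed=0 --size=1G --runtime=30 --time_based --group_reporting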

Run the same benchmark on the baseline kernel (without patches) and the patched kernel. Compare the IOPS (I/O operations per second) and latency percentiles.
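
The comparison itself is simple arithmetic: take the median IOPS of several runs on each kernel, then compute the relative change. A minimal sketch follows; the function names are hypothetical and the IOPS figures are made up purely to show the calculation.

```shell
#!/bin/sh
# Sketch: summarize repeated fio runs and compare two kernels.

# Print the median of the numbers given as arguments, to one decimal place.
median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) m = v[(NR + 1) / 2]
            else        m = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "%.1f\n", m
        }'
}

# Print the percentage change from baseline ($1) to patched ($2).
pct_change() {
    awk -v b="$1" -v p="$2" 'BEGIN { printf "%.1f\n", (p - b) * 100 / b }'
}

# Made-up IOPS figures, three runs per kernel.
base=$(median 1000000 990000 1010000)
new=$(median 1600000 1590000 1610000)
echo "baseline median: $base"
echo "patched median:  $new"
echo "change: $(pct_change "$base" "$new")%"
```

To feed in real numbers, fio can write machine-readable results with --output-format=json --output=baseline.json, and, assuming jq is installed, jq '.jobs[0].read.iops' baseline.json extracts the read IOPS of the first job.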

Step 6: Analyze and Iterate

If your results fall well short of the reported ~60% improvement, which will not necessarily carry over to your hardware and workload, investigate:

  • Check kernel config differences (ensure no debugging options that slow down I/O).
  • Use perf top while running fio to identify remaining hot spots.
  • Try different patch versions or additional optimizations from Axboe or other developers.
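
The config check in the first bullet can be sketched as a scan for debugging options that are known to add runtime overhead. The option list is a small illustrative selection rather than a complete one, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list debugging options enabled in a kernel .config that are known
# to add runtime overhead. The list is an illustrative selection.
SLOW_OPTIONS="KASAN KCSAN PROVE_LOCKING LOCKDEP DEBUG_PAGEALLOC DEBUG_OBJECTS SLUB_DEBUG_ON DEBUG_KMEMLEAK"

# Print each option from SLOW_OPTIONS that is set to y in the given file.
slow_debug_options() {
    # $1 = path to a kernel .config
    for opt in $SLOW_OPTIONS; do
        if grep -q "^CONFIG_$opt=y\$" "$1"; then
            echo "CONFIG_$opt"
        fi
    done
}

# Check the running kernel's config if the distribution ships it in /boot.
cfg="/boot/config-$(uname -r)"
if [ -f "$cfg" ]; then
    slow_debug_options "$cfg"
fi
```

To compare the baseline and patched configurations directly, the kernel tree ships scripts/diffconfig, which prints the options that differ between two config files.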

Tips for Success

  • Test on a non-critical system – these patches are cutting-edge and may have stability issues.
  • Use the exact same hardware and workload for before/after comparisons to avoid variables.
  • Watch the io-uring and linux-block mailing lists for updated revisions of the patches, as Axboe often posts new versions.
  • Consider enabling kernel debug options initially to catch any regressions, then disable for performance runs.
  • Document each patch and its effect to contribute back to the community if you build on the work.
  • Understand the trade-offs – the patches may increase per-core performance at the cost of slightly higher memory usage or complexity.

Conclusion

By following these steps, you can harness the same optimizations that Jens Axboe developed to boost per-core I/O performance by up to 60%. Remember that kernel development is iterative; your mileage may vary depending on your hardware and workload. Stay engaged with the open-source community to get the latest improvements and contribute your findings.
