Using eBPF Hooks to Profile Linux File System Activity Across Benchmarking Workloads
Subject
Kernel Benchmarking
Abstract
Modern operating systems must balance performance and adaptability when managing diverse application workloads, yet their file system behavior relies on fixed policies that may make suboptimal choices for dynamic workloads. This project investigates Linux file system activity by inserting eBPF probes into vfs_read and vfs_write, the two central functions that dispatch user-level I/O requests. Using KernMLOps, a standardized benchmarking and instrumentation framework, we extended its support so that probes can attach not only at function entry but also within specific branches. This lets us precisely distinguish the legacy .read/.write paths from the newer .read_iter/.write_iter paths. Experiments were run on CloudLab nodes running Linux kernel 6.6.42 across representative workloads, including Redis and Fio. The collected syscall metadata (PID/TID, buffer size, return value, and which path was taken) was analyzed with polars and matplotlib. The results revealed distinct workload “signatures”: Redis exhibited mixed path usage with noisy buffer-size distributions, while Fio showed structured, iterator-dominant access patterns. These differences highlight how workloads leave identifiable syscall traces that can serve as ML-ready features for adaptive OS policies. By characterizing I/O behavior in this way, we lay the foundation for data-driven optimizations in caching, batching, and scheduling, advancing the broader LDOS goal of using machine learning to optimize operating systems.
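In Linux 6.6, vfs_read calls file->f_op->read when the file provides it and otherwise falls back to file->f_op->read_iter through new_sync_read; vfs_write does the same with .write and .write_iter. A probe placed inside each branch can therefore tag every call with the path it took. The sketch below illustrates how such per-call records might be condensed into a per-workload "signature" of features. It is a minimal, stdlib-only illustration under stated assumptions: the IoRecord fields, the "legacy"/"iter" labels, and the signature() helper are hypothetical, not the KernMLOps record schema. The use of the coefficient of variation as a "noisiness" measure is likewise an assumption, and the project's actual analysis was done with polars.

```python
from dataclasses import dataclass
from statistics import median, pstdev
from typing import Iterable


@dataclass(frozen=True)
class IoRecord:
    """Hypothetical per-call record, as a branch-level probe might emit it."""
    pid: int
    tid: int
    op: str     # "read" (vfs_read) or "write" (vfs_write)
    path: str   # "legacy" (.read/.write branch) or "iter" (.read_iter/.write_iter branch)
    count: int  # requested buffer size in bytes
    ret: int    # bytes transferred, or a negative errno on failure


def signature(records: Iterable[IoRecord]) -> dict[str, float]:
    """Condense one workload's records into a small, ML-ready feature vector."""
    recs = list(records)
    if not recs:
        raise ValueError("no records to summarize")
    n = len(recs)
    sizes = [r.count for r in recs]
    mean_size = sum(sizes) / n
    return {
        "calls": float(n),
        # Share of calls that went through the iterator-based path.
        "iter_fraction": sum(r.path == "iter" for r in recs) / n,
        "write_fraction": sum(r.op == "write" for r in recs) / n,
        "error_fraction": sum(r.ret < 0 for r in recs) / n,
        "median_count": float(median(sizes)),
        # Coefficient of variation of buffer sizes: 0 means perfectly uniform.
        "count_cv": pstdev(sizes) / mean_size if mean_size else 0.0,
        "distinct_tids": float(len({r.tid for r in recs})),
    }
```

For example, with synthetic records (not measured data), a uniform, iterator-only stream yields iter_fraction 1.0 and count_cv 0.0, while a mixed stream of legacy and iterator calls with varied buffer sizes yields a lower iter_fraction and a much larger count_cv:

```python
uniform = [IoRecord(100, 100 + i % 4, "read", "iter", 4096, 4096) for i in range(8)]
mixed = [
    IoRecord(200, 200, "read", "legacy", 16384, 57),
    IoRecord(200, 200, "write", "iter", 43, 43),
    IoRecord(200, 201, "read", "legacy", 16384, -11),
    IoRecord(200, 200, "write", "legacy", 120, 120),
]
print(signature(uniform))
print(signature(mixed))
```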