Profiling Python with eBPF: A New Frontier in Performance Analysis

Profiling Python applications can be challenging, especially in scenarios involving high-performance requirements or complex workloads. Existing tools often require code instrumentation, making them impractical for certain use cases. Enter eBPF (Extended Berkeley Packet Filter)—a revolutionary Linux technology—and the open-source project Parca, which together are reshaping the landscape of Python profiling.

In this post, I’ll explore how eBPF enables continuous profiling, discuss challenges like stack unwinding in Python, and demonstrate the power of modern profiling tools.

You can also watch my full talk here or refer to the slides from the presentation.

Why Do We Need Profiling?

Profiling helps optimize performance and troubleshoot issues, such as CPU spikes, memory leaks, or out-of-memory (OOM) events. For instance:

Performance optimization: Identifying bottlenecks in code.
Incident resolution: Determining which function or component caused a memory spike or CPU overload.

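Many traditional Python profilers, such as cProfile, require instrumenting the application or launching it under the profiler, which isn’t always feasible, especially in production environments where changing or redeploying code may be restricted. Sampling profilers such as py-spy can attach to a running process without code changes, but they are usually run on demand against a single process rather than continuously across a whole system. This is where eBPF shines, offering non-intrusive, external profiling.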

Existing Profiling Solutions in Python

The Python ecosystem offers several profiling tools, each with unique strengths:

cProfile: A built-in module for deterministic profiling.
pyinstrument: A call stack profiler for Python.
py-spy: A sampling profiler for Python programs.
yappi: Yet Another Python Profiler, supports multithreaded programs.
Pyflame: A ptrace-based sampling profiler for Python.
Scalene: A high-performance CPU and memory profiler.

While these tools are valuable, many require code instrumentation or introduce significant overhead, making them less suitable for continuous profiling in production environments.
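To make "code instrumentation" concrete, here is a minimal sketch of deterministic profiling with the standard-library cProfile and pstats modules. The `fib` workload and the `profile_call` helper are names I made up for illustration; the point is that the profiler has to be switched on from inside the process, around the code you want to measure.

```python
import cProfile
import io
import pstats


def fib(n: int) -> int:
    """Deliberately slow recursive workload (illustrative only)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()       # from here on, every function call is recorded
    try:
        result = func(*args)
    finally:
        profiler.disable()  # stop recording
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(fib, 15)
print(report)
```

Because every call is recorded, the overhead grows with the number of calls, which is one reason this style is hard to leave running in production.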

What Is eBPF?

Originally designed for network packet filtering, eBPF has evolved into a versatile event-driven system that safely executes custom, verified programs inside the Linux kernel. For CPU profiling, eBPF programs build on two pieces:

Performance Monitoring Units (PMUs): Hardware counters in the CPU that track cycles and other events with very little overhead.
Perf subsystem: A Linux facility for hooking into kernel and user-space events, such as CPU activity, memory allocation, or I/O. eBPF programs can be attached to these events.

Because samples are collected in the kernel whenever a perf event fires, the target process needs no modification and the cost per sample stays low, which makes this approach cheaper than profilers that trace every function call.

Continuous Profiling with Parca

Parca is an open-source project enabling continuous profiling. Its eBPF agent hooks into perf events, collects stack traces, and aggregates data for visualization. The process involves:

Hooking into CPU events to monitor active functions.
Stack unwinding to trace function calls.
Data aggregation and visualization in a web-based UI.

Unlike traditional profilers, Parca introduces minimal runtime overhead, making it ideal for production workloads.
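The sample, unwind, aggregate loop can be imitated in pure Python to get a feel for it. This is only an in-process analogy and not how Parca works: Parca samples from the kernel via perf events, whereas this sketch uses a background thread and CPython's `sys._current_frames()`. The `Sampler` class, `capture_stack`, and the `spin` workload are hypothetical names.

```python
import collections
import sys
import threading
import time


def capture_stack(frame):
    """Follow f_back links and return the stack as a root-first tuple of names."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(names))


class Sampler(threading.Thread):
    """Samples one target thread at a fixed interval and counts the stacks seen."""

    def __init__(self, target_thread_id, interval=0.005):
        super().__init__(daemon=True)
        self.target_thread_id = target_thread_id
        self.interval = interval
        self.counts = collections.Counter()
        self._stop_event = threading.Event()

    def run(self):
        # wait() returns False on timeout, so we take one sample per interval
        # until stop() sets the event.
        while not self._stop_event.wait(self.interval):
            frame = sys._current_frames().get(self.target_thread_id)
            if frame is not None:
                self.counts[capture_stack(frame)] += 1

    def stop(self):
        self._stop_event.set()
        self.join()


def spin(seconds):
    """Busy loop standing in for real CPU-bound work."""
    deadline = time.monotonic() + seconds
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


sampler = Sampler(threading.get_ident())
sampler.start()
spin(0.3)
sampler.stop()

for stack, count in sampler.counts.most_common(3):
    print(count, ";".join(stack))
```

The exact counts vary from run to run, as with any sampling profiler; what matters is that the most frequently seen stacks are the ones where time is being spent.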

Stack Unwinding: A Key Challenge

Native Code

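Profiling native code is comparatively straightforward: we unwind the stack by reading the instruction pointer from the CPU registers and following the chain of return addresses stored on the stack, then resolve those addresses into human-readable symbols using symbol tables and debug information (e.g., DWARF).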
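As a rough illustration, the sketch below unwinds a simulated stack by following frame pointers and then symbolizes the collected addresses. It assumes an x86-64 style layout where the saved frame pointer sits at `[fp]` and the return address at `[fp + 8]`; binaries built without frame pointers need unwind tables from the debug information instead. All addresses, symbol names, and the `memory` dictionary are made up.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0x0F00, "_start"),
    (0x1000, "main"),
    (0x1100, "handle_request"),
    (0x1200, "parse_json"),
]
SYMBOL_STARTS = [start for start, _ in SYMBOLS]


def symbolize(address):
    """Map an address to 'function+offset' using the nearest preceding symbol."""
    index = bisect.bisect_right(SYMBOL_STARTS, address) - 1
    if index < 0:
        return f"0x{address:x}"  # below every known symbol
    start, name = SYMBOLS[index]
    return f"{name}+0x{address - start:x}"


def unwind(memory, instruction_pointer, frame_pointer, max_depth=64):
    """Collect the current address plus every return address up the stack."""
    addresses = [instruction_pointer]
    while frame_pointer != 0 and len(addresses) < max_depth:
        saved_frame_pointer = memory[frame_pointer]
        return_address = memory[frame_pointer + 8]
        addresses.append(return_address)
        frame_pointer = saved_frame_pointer
    return addresses


# Simulated stack memory: address -> 8-byte value.
memory = {
    0x7000: 0x7020, 0x7008: 0x1150,  # parse_json frame, returns into handle_request
    0x7020: 0x7040, 0x7028: 0x1010,  # handle_request frame, returns into main
    0x7040: 0x0000, 0x7048: 0x0F10,  # main frame, returns into _start
}

stack = [symbolize(a) for a in unwind(memory, 0x1234, 0x7000)]
print(stack)
```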

Python Code

For Python, stack unwinding is complex due to its interpreter-based execution. Python maintains execution state in custom data structures, such as:

Interpreter state: Tracks threads and their execution context.
Thread state: A linked list of threads running in the interpreter, each pointing to its current frame.
Frame state: Represents one execution frame and links back to the frame of its caller.

To unwind Python stacks, we must traverse these structures, extract relevant information, and map them to human-readable symbols.
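The traversal can be sketched against a fake address space. Everything about the layout here is an assumption for illustration: the field offsets, the length-prefixed strings, and storing a single line number on the code object are simplifications, and real CPython structures differ between versions. What carries over is the shape of the walk: interpreter state, then the thread list, then each thread's frame chain.

```python
import struct

# Hypothetical field offsets, for illustration only.
INTERP_THREADS_HEAD = 0
TSTATE_NEXT, TSTATE_THREAD_ID, TSTATE_FRAME = 0, 8, 16
FRAME_BACK, FRAME_CODE = 0, 8
CODE_NAME, CODE_FIRST_LINE = 0, 8


class FakeMemory:
    """Stands in for the target process's address space."""

    def __init__(self):
        self.data = bytearray(8)  # address 0 is reserved as NULL

    def alloc(self, payload: bytes) -> int:
        address = len(self.data)
        self.data += payload
        return address

    def read_u64(self, address: int) -> int:
        return struct.unpack_from("<Q", self.data, address)[0]

    def read_str(self, address: int) -> str:
        length = self.read_u64(address)
        return self.data[address + 8:address + 8 + length].decode()


def make_frame(mem, name, line, back=0):
    """Lay out a name string, a code object, and a frame; return the frame address."""
    encoded = name.encode()
    name_addr = mem.alloc(struct.pack("<Q", len(encoded)) + encoded)
    code_addr = mem.alloc(struct.pack("<QQ", name_addr, line))
    return mem.alloc(struct.pack("<QQ", back, code_addr))


def unwind_thread(mem, tstate):
    """Walk one thread's frame chain from the innermost frame outwards."""
    stack = []
    frame = mem.read_u64(tstate + TSTATE_FRAME)
    while frame:
        code = mem.read_u64(frame + FRAME_CODE)
        name = mem.read_str(mem.read_u64(code + CODE_NAME))
        line = mem.read_u64(code + CODE_FIRST_LINE)
        stack.append(f"{name}:{line}")
        frame = mem.read_u64(frame + FRAME_BACK)
    return stack


def unwind_interpreter(mem, interp):
    """Walk the linked list of thread states and unwind each one."""
    stacks = {}
    tstate = mem.read_u64(interp + INTERP_THREADS_HEAD)
    while tstate:
        thread_id = mem.read_u64(tstate + TSTATE_THREAD_ID)
        stacks[thread_id] = unwind_thread(mem, tstate)
        tstate = mem.read_u64(tstate + TSTATE_NEXT)
    return stacks


mem = FakeMemory()
main_frame = make_frame(mem, "main", 1)
leaf_frame = make_frame(mem, "compute", 10, back=main_frame)
worker_frame = make_frame(mem, "worker", 42)
tstate_b = mem.alloc(struct.pack("<QQQ", 0, 2, worker_frame))
tstate_a = mem.alloc(struct.pack("<QQQ", tstate_b, 1, leaf_frame))
interp = mem.alloc(struct.pack("<Q", tstate_a))

stacks = unwind_interpreter(mem, interp)
print(stacks)
```

An eBPF unwinder has to do the same pointer chasing with bounded loops and raw memory reads, which is why knowing the correct offsets for each Python version matters so much.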

How Parca Profiles Python

Here’s how Parca handles Python profiling:

Reverse Engineering the Python Runtime:
- Analyze Python’s internal structures (e.g., thread and frame states).
- Identify offsets and symbols using tools like GDB and the DWARF debug information.
Unwinding Python Stacks:
- Traverse thread states to locate the active Global Interpreter Lock (GIL) holder.
- Walk through execution frames to collect function call data.
Mapping Symbols:
- Resolve function addresses to readable symbols.
- Encode line numbers and function names for better traceability.
Efficient Data Handling:
- Use eBPF maps for kernel-to-user space communication.
- Optimize symbol resolution by caching frequently seen traces.
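The symbol mapping and caching steps can be sketched in plain Python. This is not Parca's actual implementation: a `Counter` stands in for an eBPF map keyed by trace, and `SymbolTable`, `record`, and `resolve` are hypothetical names. The idea is that the hot path only handles small integer IDs, while the expensive conversion to readable text happens once per distinct trace.

```python
import collections
import functools


class SymbolTable:
    """Interns (function, file, line) triples as small integer IDs."""

    def __init__(self):
        self._ids = {}
        self._symbols = []

    def intern(self, function, filename, line):
        key = (function, filename, line)
        symbol_id = self._ids.get(key)
        if symbol_id is None:
            symbol_id = len(self._symbols)
            self._ids[key] = symbol_id
            self._symbols.append(key)
        return symbol_id

    def lookup(self, symbol_id):
        return self._symbols[symbol_id]


symbols = SymbolTable()
stack_counts = collections.Counter()  # plays the role of a map keyed by trace


def record(frames):
    """Encode a stack as a tuple of symbol IDs and bump its counter."""
    trace = tuple(symbols.intern(*frame) for frame in frames)
    stack_counts[trace] += 1
    return trace


@functools.lru_cache(maxsize=1024)
def resolve(trace):
    """Turn a trace of IDs back into readable text; repeated traces hit the cache."""
    return ";".join("{}({}:{})".format(*symbols.lookup(i)) for i in trace)


request = [("main", "app.py", 3), ("handler", "app.py", 12)]
render = [("main", "app.py", 3), ("render", "views.py", 40)]
for frames in (request, request, render, request):
    resolve(record(frames))

print(stack_counts.most_common())
print(resolve.cache_info())
```

In this run only two of the four samples need real resolution work; the other two are served from the cache.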

Python 3.13: A Game-Changer for Profiling

The upcoming Python 3.13 release introduces a debug offset structure that simplifies stack unwinding. It provides precomputed offsets for key runtime fields, eliminating much of the manual reverse engineering required for earlier versions. This improvement marks a significant leap forward for tools like Parca.
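The difference between the two approaches can be modelled in a few lines. The header format below (an 8-byte magic value followed by a single offset) is a toy of my own and not the real CPython layout, and the numbers in the version table are invented. It only shows the idea: prefer offsets published by the runtime itself, and fall back to a hand-maintained table for older versions.

```python
import struct

# Older runtimes: offsets hard-coded per version by the profiler.
# These numbers are made up for illustration.
HARDCODED_OFFSETS = {
    (3, 10): {"tstate_frame": 24},
    (3, 11): {"tstate_frame": 56},
}

# Toy "debug offsets" header: 8-byte magic, then one u64 offset.
# NOT the real CPython structure, only a model of the idea.
HEADER = struct.Struct("<8sQ")
MAGIC = b"toydebug"


def offsets_from_header(memory: bytes, runtime_address: int) -> dict:
    """Read offsets that the runtime publishes about itself."""
    magic, tstate_frame = HEADER.unpack_from(memory, runtime_address)
    if magic != MAGIC:
        raise ValueError("no debug offsets header at this address")
    return {"tstate_frame": tstate_frame}


def find_offsets(version, memory, runtime_address):
    """Prefer self-described offsets; otherwise use the per-version table."""
    try:
        return offsets_from_header(memory, runtime_address)
    except ValueError:
        return HARDCODED_OFFSETS[version]


new_runtime = HEADER.pack(MAGIC, 72)  # runtime that describes itself
old_runtime = bytes(16)               # runtime with no header

print(find_offsets((3, 13), new_runtime, 0))
print(find_offsets((3, 11), old_runtime, 0))
```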

Visualizing Profiles with Parca

Parca’s UI provides a comprehensive view of application performance:

Flame graphs: Visualize aggregated stack traces, where the width of each frame reflects how often it appeared in the samples, highlighting bottlenecks.
Filtering and Metadata: Focus on specific languages (e.g., Python) or layers (e.g., C libraries).
Continuous Insights: Compare profiles across deployments to monitor performance regressions.

For example, a flame graph might reveal inefficient recursion in a Python function, enabling developers to pinpoint and optimize the problematic code.
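Under the hood, a flame graph is built from stack samples that have been collapsed into counts. The sketch below uses the common "folded" text format (frames joined by semicolons, followed by a count); the sample data and the `fold` and `inclusive_share` helpers are invented for illustration and say nothing about Parca's storage format.

```python
import collections


def fold(samples):
    """Collapse root-first stacks into sorted 'a;b;c count' lines."""
    counts = collections.Counter(";".join(stack) for stack in samples)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


def inclusive_share(samples, function):
    """Fraction of samples in which `function` appears anywhere on the stack."""
    hits = sum(1 for stack in samples if function in stack)
    return hits / len(samples)


# Made-up samples: a recursive fib dominates the profile.
samples = [
    ("main", "load"),
    ("main", "fib", "fib", "fib"),
    ("main", "fib", "fib"),
    ("main", "fib", "fib", "fib"),
]

folded = fold(samples)
for line in folded:
    print(line)
print(inclusive_share(samples, "fib"))
```

Each folded line becomes one tower in the graph and its count becomes the width, so the repeated `fib` frames show up as a wide, deep stack.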

Supported Python Versions

Parca supports profiling for Python versions from 2.7 to 3.11, with ongoing work for 3.12 and full support anticipated for 3.13. The project’s modular design allows quick adaptation to new Python runtime changes.

Conclusion

Profiling Python applications with eBPF and Parca represents a new frontier in performance analysis. By leveraging eBPF and continuous profiling, we can gain invaluable insights into our applications, enabling effective performance optimization. I encourage you to explore Parca, provide feedback, and contribute to the project—it’s a collaborative effort that can benefit us all as we tackle the challenges of modern software development.

Get Started

Watch my full talk or check out the presentation slides. Explore Parca on GitHub and join the community. Your feedback helps improve the tooling and shape the future of observability.