Intel Processor Trace – Software Tracing at the Finest Grain

Tracing is a specialized use of logging to record information about a program’s execution. – Wikipedia

Tracing is a crucial way to learn how your program or your system actually behaves. The computer logs the events you are interested in, which gives you a profile of what goes on inside. With that, you can debug the program or fine-tune it for better performance.

Recently, at LinuxCon North America 2014, Brendan Gregg, a senior performance architect at Netflix, released a quick cheat sheet of tools for peeking into Linux internals at different levels. It is convenient and useful enough that I even made a post just for his diagram, “Linux Performance Observability Tools”.

Linux Performance Observability Tools

Up until now, software has been traced by other software. It is a trap-based approach: when an event of interest occurs, e.g. a network packet is sent, execution jumps to the tracing program to record the event and then returns to normal operation. The execution plane and the tracing plane share the same instruction stream fed to the processor. This disturbs program execution, because it has to jump back and forth. The overhead can become noticeable when events happen frequently or when the tracing program has to deal with slow components such as hard disks.
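To make the trap-based idea concrete, here is a minimal sketch using Python's built-in `sys.settrace` hook. It is only an analogy for software tracers in general, not how any particular Linux tool works; `traced_events` and `sum_to` are names I made up for the illustration. The interpreter diverts into the `tracer` callback on every call, line, and return event, so the traced function and the tracer share the same execution stream.

```python
import sys


def traced_events(func, *args):
    """Run func under a software tracer and return (result, events)."""
    events = []

    def tracer(frame, event, arg):
        # The interpreter jumps into this callback for every traced event,
        # then goes back to the program: tracing and execution are interleaved.
        events.append((event, frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep receiving line/return events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, events


def sum_to(n):
    """Hypothetical workload: sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, events = traced_events(sum_to, 1000)
    print("result:", result)
    print("tracer callbacks:", len(events))
```

The number of callbacks grows with the number of executed lines, which is exactly the kind of disturbance that becomes noticeable when events are frequent.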

Intel has released a specification for its new hardware-based software tracing extension named Processor Trace (I don’t know since when). It is documented in Chapter 36 of the Intel 64 and IA-32 Architectures Software Developer’s Manual (the System Programming Guide volume) in this link. It allows a processor with this extension to record how a program was executed at the machine-instruction level, the finest grain in computer software. This adds only slight overhead to execution time, because program execution and tracing are separated. Programs run normally, as if nothing were happening. The extension just watches the execution from a bird’s-eye view and records traces.

It does not blindly record every instruction that passes through the processor; that would introduce a huge performance overhead. It records only what makes execution dynamic: machine instructions that can change control flow (such as branches), asynchronous events, and environment information. Programs, whatever programming language they are written in, are eventually ground down to machine instructions specific to a processor. Processors execute these machine instructions in sequence until they meet an event that diverts the execution flow.

The payoff from using this extension is that you can reconstruct exactly how the program you just traced was executed. You just have to provide the traces and the traced program’s binary to a trace analyzer. Then you get a slow-motion replay of how your program ran. Sounds interesting, doesn’t it?
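Here is a toy sketch of why the traces plus the binary are enough. The four-instruction "machine" below is entirely hypothetical and has nothing to do with the real packet format; it only illustrates the principle. The tracer records one taken/not-taken bit per conditional branch, and the analyzer replays the program text, consulting the trace only where the flow could diverge.

```python
# Toy program for a made-up machine: each entry is (opcode, operand).
PROGRAM = [
    ("set", 3),      # 0: counter = 3
    ("dec", None),   # 1: counter -= 1
    ("jnz", 1),      # 2: jump back to 1 while counter != 0
    ("halt", None),  # 3: stop
]


def run_and_trace(program):
    """Execute the program, recording only conditional-branch outcomes."""
    pc, counter = 0, 0
    path, trace = [], []
    while True:
        op, arg = program[pc]
        path.append(pc)
        if op == "set":
            counter = arg
            pc += 1
        elif op == "dec":
            counter -= 1
            pc += 1
        elif op == "jnz":
            taken = counter != 0
            trace.append(taken)  # the only thing the "hardware" records
            pc = arg if taken else pc + 1
        elif op == "halt":
            return path, trace


def reconstruct(program, trace):
    """Rebuild the executed path from the program text and the trace alone."""
    outcomes = iter(trace)
    pc, path = 0, []
    while True:
        op, arg = program[pc]
        path.append(pc)
        if op == "jnz":
            # No data values are needed: the trace says which way we went.
            pc = arg if next(outcomes) else pc + 1
        elif op == "halt":
            return path
        else:
            pc += 1  # straight-line code is implied by the program itself


if __name__ == "__main__":
    executed, trace = run_and_trace(PROGRAM)
    print("executed path:", executed)
    print("recorded trace:", trace)
    print("reconstructed:", reconstruct(PROGRAM, trace))
```

In this toy run, eight executed instructions are recovered from just three recorded bits, which is the intuition behind recording only the dynamic parts of execution.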

An Intel employee wrote a blog post giving an overview of this technology and pointing to a Linux library for working with it.

I don’t know when this extension will ship, or in which Intel products. Phoronix reported that perf has been updated to support it, and speculated that it will be included in the upcoming Broadwell architecture.
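If you want to check a particular machine, here is a small sketch that looks for the `intel_pt` feature flag that Linux lists in `/proc/cpuinfo` on processors with Processor Trace. The helper names `cpu_flags` and `has_intel_pt` are my own, and the script assumes a kernel recent enough to report the flag, so a negative answer on an old kernel is not conclusive.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_intel_pt(cpuinfo_text):
    """True if the text reports the intel_pt feature flag."""
    return "intel_pt" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            print("Processor Trace reported:", has_intel_pt(f.read()))
    else:
        print("No /proc/cpuinfo here; this check only works on Linux.")
```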

In the next post, I’m going to talk about what kind of data is generated from this extension.

