uprofile(1) — Commands
NAME
uprofile − Profile user code with the EV4 performance counters
SYNOPSIS
uprofile [-v] [-i] [-all|-each|-one] [STATS] <command> [<arg> ...]
FLAGS
-v
Engages verbose mode, which prints some useful information about the program being profiled.
-i
Uses integer (32 bit) sample buckets in the generated umon.out file(s). While this makes the generated file twice as large as using the default short (16 bit) buckets, it is almost impossible to overflow a bucket. Recommended when profiling programs that spend much time in small, tight loops. If a bucket overflows when not using the -i option, a warning is produced with the offending PC.
-all|-each|-one
Specifies which mode to use for profiling on multi-processor machines. Using the -all flag (the default) aggregates all CPUs’ data into one umon.out file. Using the -each flag collects separate profiles for each CPU, and writes the output into a set of files named umon.out.n, where n is the CPU number. Using the -one flag only profiles the current CPU. The uprofile program must be run using the runon command for the -one mode to work.
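The following is a minimal sketch of how the per-CPU files written in -each mode might be collected for analysis. The function name list_cpu_profiles is hypothetical and is not part of uprofile; the only thing taken from this page is the umon.out.n naming, where n is the CPU number.

```shell
# Hypothetical helper: list the per-CPU profile files (umon.out.N, N numeric)
# found in a directory, in ascending CPU order.
list_cpu_profiles() {
    dir=${1:-.}
    for f in "$dir"/umon.out.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        n=${f##*.}                       # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;     # skip non-numeric suffixes
        esac
        printf '%s %s\n' "$n" "$f"
    done | sort -n | cut -d' ' -f2-
}
```

Each file name printed can then be passed to the prof command in the same way as a single umon.out file.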
DESCRIPTION
The uprofile command uses the EV4 performance counters to produce a fine-grained PC profile of a user program. The program creates the umon.out file that you can use as input to the prof command. During execution the program follows these steps:
1. Creates (or overwrites) the umon.out file.
2. Enables performance counters on the EV4 chip.
3. Runs the command you specify.
4. Writes the profile data to the umon.out file.
You use the prof command to help you analyze the data in the umon.out file. The following example shows how to invoke the prof command:
% prof umon.out
You specify the statistics you want to collect for the user code being profiled in the STATS parameter.
The EV4 chip has two performance counter registers, each of which can be separately programmed. The statistics each counter can collect are shown in the following table:
| Counter 0 Stats | Counter 1 Stats |
| 0disabled | 1disabled |
| issues | dcache |
| pipedry | icache |
| loads | dualissues |
| pipefrozen | mispredicts |
| branches | floatops |
| cycles | intops |
| PALcycles | stores |
| nonissues | novictims |
| victims | |
For a complete description of these statistics, refer to pfm(7).
The <command> [<arg> ...] parameter specifies the command to execute and its optional arguments. The command need not have been compiled with the normal profiling switch (-p). Note that the uprofile command does not work correctly on "stripped" images, that is, images that do not contain a symbol table.
When either of the counters on the EV4 chip reaches 4096 events (for example, 4096 cycles when the counter is counting cycles), an interrupt is triggered. This interrupt causes uprofile to record the PC.
Either counter0 or counter1 may be disabled, by specifying "0disabled" or "1disabled" as the counter statistic. This can be used to isolate specific event types, such as loads, without extraneous data being generated. You cannot disable both counters at once.
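The pairing rules above can be sketched as a small pre-flight check. This is an illustration only and does not show how uprofile itself validates its arguments; the function names counter_of and check_stats are hypothetical. Two points are assumptions: that victims belongs to counter 0 (inferred from the table layout), and that naming two statistics for the same counter is an error (inferred from each counter register holding a single selection).

```shell
# Hypothetical helper: report which counter a statistic name belongs to.
# Placing "victims" under counter 0 is an assumption based on the table layout.
counter_of() {
    case $1 in
        0disabled|issues|pipedry|loads|pipefrozen|branches|cycles|PALcycles|nonissues|victims)
            echo 0 ;;
        1disabled|dcache|icache|dualissues|mispredicts|floatops|intops|stores|novictims)
            echo 1 ;;
        *)
            echo unknown ;;
    esac
}

# Hypothetical helper: succeed only if the given statistics name at most one
# selection per counter and do not disable both counters.
check_stats() {
    seen0= seen1=
    for s in "$@"; do
        case $(counter_of "$s") in
            0)  if [ -n "$seen0" ]; then
                    echo "two statistics given for counter 0" >&2; return 1
                fi
                seen0=$s ;;
            1)  if [ -n "$seen1" ]; then
                    echo "two statistics given for counter 1" >&2; return 1
                fi
                seen1=$s ;;
            *)  echo "unknown statistic: $s" >&2; return 1 ;;
        esac
    done
    if [ "$seen0" = 0disabled ] && [ "$seen1" = 1disabled ]; then
        echo "cannot disable both counters" >&2; return 1
    fi
    return 0
}
```

For example, check_stats loads 1disabled succeeds, while check_stats 0disabled 1disabled fails.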
When you specify no counter statistics, the uprofile command counts cycles on counter 0 by default, disabling counter 1. A 150 MHz EV4 produces 36621 samples per second, which is much higher than the normal hardclock-driven profiling rate of 1024 per second. Note that this produces a heavy interrupt load on the system, which can noticeably slow performance. Also, because the interrupt rate is not tied to the 1024 Hz hardclock, the number of "seconds" reported by the prof command is incorrect. Only the percentages of elapsed time are reliable.
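The sample rate quoted above follows from dividing the clock rate by the 4096-event interrupt threshold. The arithmetic can be reproduced as follows; the function name samples_per_second is hypothetical and is used only for illustration.

```shell
# Samples per second when counting cycles:
# clock rate in Hz divided by 4096 events per interrupt (integer division).
samples_per_second() {
    echo $(( $1 / 4096 ))
}

samples_per_second 150000000   # 150 MHz EV4; prints 36621
```

The same calculation gives the expected cycle-sampling rate for other clock speeds.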
If you specify a statistic for each counter, the uprofile command accumulates their results. You cannot then view the results of any single statistic separately.
Alternate events can be specified to produce some interesting results. For example, specifying "uprofile PALcycles icache <command>" generates a statistical profile of which routines tend to use PAL cycles and incur instruction cache misses. For a complete description of each of the various statistics available, consult the pfm(7) reference page.
NOTES
The kernel in use must have the pfm pseudo-device configured into it. To do this, add the following line to the kernel configuration file, and rebuild the kernel:
pseudo-device pfm
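As a rough sketch, the presence of this line can be checked before rebuilding. The function name has_pfm_device is hypothetical, and the location of the kernel configuration file is system-specific, so it is passed as an argument rather than assumed.

```shell
# Hypothetical helper: succeed if the given kernel configuration file contains
# an active "pseudo-device pfm" line. Lines where other text (such as a
# comment marker) precedes the keyword are ignored.
has_pfm_device() {
    grep -Eq '^[[:space:]]*pseudo-device[[:space:]]+pfm([[:space:]]|$)' "$1"
}
```

A successful check only shows that the line is present in the file; it does not show that the running kernel was built from that file.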
The victims and novictims statistics rely on the external performance counter pin connections as described in the EV4 chip specification. Currently, only the DEC 3000/400, /500, /600, and /800 workstations have these connections. Attempts to collect either of these statistics on other platforms (while allowed) will typically generate empty data.
User-level profiling is only possible on EV4 Pass 3 or later processors. Attempts to do this on a Pass 2 processor will gather PC samples for every process running on the system; this will lead to massively erroneous samples in the data stream.
FILES
/dev/pfcntr
The performance counter device file.
umon.out[.n]
The generated statistics file(s). Use as the mon.out file for the prof command.