kprofile(1) — Commands
NAME
kprofile − Profile the kernel with the EV4 performance counters
SYNOPSIS
kprofile [-all |-each|-one] [[-i] -k <kernel_name>] [-t] [[STAT0] [STAT1]] [<command> [<arg> ...] ]
PARAMETERS
STAT0 STAT1
Specifies the statistics you want the kernel to collect. The EV4 chip has two performance counter registers, each of which can be separately programmed. The statistics each counter can collect are shown in the following table:
| Counter0 Statistics (STAT0) | Counter1 Statistics (STAT1) |
| 0disabled | 1disabled |
| issues | dcache |
| pipedry | icache |
| loads | dualissues |
| pipefrozen | mispredicts |
| branches | floatops |
| cycles | intops |
| PALcycles | stores |
| nonissues | novictims |
| victims | |
For a complete description of these statistics, refer to pfm(7).
command [arguments]
Specifies a command to execute. If you specify a command, kprofile collects statistics on only that command and its process descendants. Collection runs until the command completes, and then kprofile generates the kmon.out file.
FLAGS
If you specify flags, they must be first on the command line (before the statistics or command parameters).
-k <kernel_name>
Overrides the name of the kernel to profile. (The default is the booted kernel.)
-t
Enables triggered mode. This flag sets up all required information for running the performance counters, but does not start them. See the DESCRIPTION section for further information.
-i
Uses integer (32-bit) sample buckets in the generated kmon.out file(s). This flag generates a file twice as large as the one generated with the default short (16-bit) buckets, but it avoids bucket overflow in practice. This flag is useful for profiling programs that spend much of their execution time in small loops.
-all|-each|-one
Specifies which mode to use for profiling on multi-processor machines. Using the -all flag (the default) aggregates all of the CPU data into one kmon.out file. Using the -each flag collects separate profiles for each CPU, and writes the output into a set of files named kmon.out.n, where n is the CPU number. Using the -one flag profiles only the current CPU. The kprofile command must be run using the runon command for the -one flag to operate correctly.
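As an illustration of the -each naming scheme, the following Python sketch lists the per-CPU output files and a matching prof invocation for each one. The helper name each_mode_outputs is hypothetical. Passing a kmon.out.n file to prof in the same way as kmon.out is an assumption, not something this reference page states.

```python
def each_mode_outputs(cpu_numbers, kernel="/vmunix"):
    """Return (file name, prof command) pairs for the given CPU numbers.

    File names follow the kmon.out.n pattern described for the -each flag.
    The prof command form is an assumption modeled on "prof /vmunix kmon.out".
    """
    pairs = []
    for n in cpu_numbers:
        name = f"kmon.out.{n}"
        pairs.append((name, f"prof {kernel} {name}"))
    return pairs


# Example: a two-processor system with CPUs numbered 0 and 1.
for name, command in each_mode_outputs([0, 1]):
    print(name, "->", command)
```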
DESCRIPTION
The kprofile command uses the performance counters on the EV4 chip to produce a detailed kernel program counter (PC) profile. The program creates the kmon.out file that you can use as input to the prof command. During execution the program follows these steps:
1. Creates (or overwrites) the kmon.out file.
2. Enables performance counters on the EV4 chip.
3. Runs the command you specify or sleeps (while the kernel collects performance data) until you enter Ctrl/C.
4. Writes the profile data to the kmon.out file.
You use the prof command to help you analyze the data in the kmon.out file. The following example shows how to invoke the prof command:
% prof /vmunix kmon.out
When one of the counters on the EV4 chip reaches 4096 events (for example, 4096 cycles when the counter is counting cycles), an interrupt is triggered. This interrupt causes kprofile to record the PC.
You can disable either counter0 or counter1, by specifying 0disabled or 1disabled as the counter statistic. You can use this feature to isolate specific event types, such as loads, without extraneous data being generated. You cannot disable both counters at once.
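The rules for choosing a statistic pair can be sketched as a small Python check. This is only an illustration and not part of kprofile: the function name check_stats is hypothetical, the statistic names are copied from the table in the PARAMETERS section, and the sketch assumes that the lone victims entry in that table belongs to counter0. The default pair matches the stated default of cycles on counter0 with counter1 disabled.

```python
# Statistic names taken from the PARAMETERS table.
# Assumption: "victims" is a counter0 statistic.
COUNTER0_STATS = {
    "0disabled", "issues", "pipedry", "loads", "pipefrozen",
    "branches", "cycles", "PALcycles", "nonissues", "victims",
}
COUNTER1_STATS = {
    "1disabled", "dcache", "icache", "dualissues", "mispredicts",
    "floatops", "intops", "stores", "novictims",
}


def check_stats(stat0="cycles", stat1="1disabled"):
    """Return (stat0, stat1) if the pair is acceptable, else raise ValueError."""
    if stat0 not in COUNTER0_STATS:
        raise ValueError(f"{stat0!r} is not a counter0 statistic")
    if stat1 not in COUNTER1_STATS:
        raise ValueError(f"{stat1!r} is not a counter1 statistic")
    if stat0 == "0disabled" and stat1 == "1disabled":
        raise ValueError("cannot disable both counters at once")
    return stat0, stat1


print(check_stats())                       # the default pair
print(check_stats("loads", "1disabled"))   # isolate loads on counter0
```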
By default, the system counts cycles on counter0 and disables counter1. A 150-megahertz (MHz) EV4 therefore produces about 36621 samples per second (150,000,000 cycles divided by 4096), which is much higher than the normal hardclock-driven profiling rate of 1024 per second. This high sample rate produces a heavy interrupt load on the system that can noticeably slow performance. Also, because the interrupt rate is not tied to the 1024 Hz hardclock, the number of "seconds" reported by the prof command is incorrect. Only the percentages of elapsed time are reliable.
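The sampling arithmetic can be reproduced with a short Python sketch. It derives the sample rate from the 4096-event threshold, compares it with the 1024 Hz hardclock rate, and estimates how quickly a single sample bucket could fill. The overflow estimate is a worst case under two assumptions that this page does not state: that buckets are unsigned, and that every sample lands in the same bucket (as in a very tight loop). It is meant only to suggest why the -i flag can matter.

```python
EVENTS_PER_INTERRUPT = 4096    # counter threshold that triggers an interrupt
CLOCK_HZ = 150_000_000         # the 150 MHz EV4 used as the example
HARDCLOCK_HZ = 1024            # normal hardclock-driven profiling rate


def samples_per_second(event_rate_hz, events_per_interrupt=EVENTS_PER_INTERRUPT):
    """One PC sample is recorded each time the counter reaches the threshold."""
    return event_rate_hz / events_per_interrupt


def seconds_to_overflow(bucket_bits, sample_rate):
    """Worst-case time for one bucket to reach its maximum value.

    Assumes an unsigned bucket that receives every sample.
    """
    return (2 ** bucket_bits - 1) / sample_rate


rate = samples_per_second(CLOCK_HZ)
print(int(rate))                          # 36621
print(rate / HARDCLOCK_HZ)                # ratio to the hardclock rate
print(seconds_to_overflow(16, rate))      # short (16-bit) bucket, in seconds
print(seconds_to_overflow(32, rate))      # integer (32-bit) bucket, in seconds
```

Under these assumptions a 16-bit bucket could fill in under two seconds when counting cycles, while a 32-bit bucket would take more than a day.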
You can specify alternate events to produce interesting results. For example, consider the following command:
% kprofile PALcycles icache
This command generates a statistical list of which kernel routines tend to use PAL cycles or generate I-cache misses. For a complete description of the various statistics available, consult the pfm(7) reference page.
To perform a detailed analysis of short sections of kernel code, use triggered mode (invoked with the -t flag). When you use this mode, kprofile performs all of the required setup for enabling the counters as normal, but does not invoke them. You can insert counter start or stop commands into the kernel code to be instrumented as follows:
Turn counters on: wrperfmon (PFOPT, 1)
Turn counters off: wrperfmon (0)
You can turn the counters on and off repeatedly to collect data over many iterations or multiple sections of code.
The macro PFOPT is defined in <sys/pfcntr.h>.
NOTES
You must configure the pfm pseudo-device into the kernel you want to profile. Follow these steps to configure the pfm pseudo-device:
1. Become the superuser by logging in as root or using the su command.
2. Add the following line to the file /sys/conf/MACHINE, where MACHINE is the name of your system’s configuration file:
pseudo-device pfm
Position this line in the file among the other pseudo-device configurations.
3. Reconfigure and rebuild the kernel by issuing the doconfig command using the -c flag. The following shows the doconfig command for a system that has a configuration file named PEARLY:
# doconfig -c PEARLY
Replace PEARLY with the name of your system’s configuration file in the preceding command.
4. Copy the new kernel to the boot location, as shown:
# cp /sys/PEARLY/vmunix /vmunix
Replace PEARLY with your system’s name in the preceding command.
5. Reboot your system:
# shutdown -r now
This shutdown command performs an immediate system shutdown and automatically reboots the system.
RESTRICTIONS
The victims and novictims statistics rely on the external performance counter pin connections as described in the EV4 chip specification. Currently, only the DEC 3000/400, 3000/500, 3000/600, and 3000/800 workstations have these connections. Attempts to collect either of these statistics on other platforms are allowed, but typically generate empty data.
Generating statistics for a single command is possible only on EV4 Pass 3 processors. An attempt to do this on a Pass 2 processor gathers statistics for the entire system, as if no command had been specified.
Using triggered mode also requires an EV4 Pass 3 processor and cannot be performed with per-process monitoring.
FILES
/dev/pfcntr
The performance counter device file.
kmon.out[.n]
The generated statistics file(s).
/vmunix
The kernel to profile.