⇒ kprofile(1) — Digital UNIX 4.0d

uprofile(1)  —  Commands

NAME

uprofile, kprofile − Profile a program (uprofile) or kernel (kprofile) with Alpha on-chip performance counters

SYNOPSIS

uprofile [-v] [-all|-each|-one] [statistic ...] program [argument ...]

kprofile [-all|-each|-one] [-k kernel_name] [-t] [statistic] [program [argument ...]]

FLAGS

-v
Engages verbose mode, which prints some useful information about the program being profiled.

-all|-each|-one
Specifies which mode to use for profiling on multi-processor machines. Using the -all flag (the default) aggregates the data for all CPUs into one umon.out file.  Using the -each flag collects separate profiles for each CPU and writes the output into a set of files named umon.out.n, where n is the CPU number.  Using the -one flag profiles only the current CPU.  For the -one flag to work, the uprofile or kprofile program must be run using the runon command. 

-k kernel_name
Overrides the name of the kernel to profile. (The default is the booted kernel.)

-t
Enables triggered mode for kprofile.  This flag sets up all required information for running the performance counters, but does not invoke them.  See the DESCRIPTION section for additional information.

PARAMETERS

program
The name of the executable to run while profiling operations are being performed.

argument
An argument to pass to the program that is run.

DESCRIPTION

The uprofile program uses the Alpha on-chip performance counters to produce a finely-grained program-counter profile of a user program.  The program creates the umon.out file for use as input to the prof command.  During execution, uprofile performs the following steps:

     1. Creates (or overwrites) the umon.out file.

     2. Enables performance counters on the Alpha chip.

     3. Runs the program you specify with the arguments you specify, collecting the selected statistics on the program’s process and its descendants.

     4. Writes the profile data to the umon.out file.

The kprofile program uses the Alpha on-chip performance counters to produce a detailed kernel program-counter profile.  The program creates the kmon.out file for use as input to the prof command.  During execution, kprofile performs the following steps:

     1. Creates (or overwrites) the kmon.out file.

     2. Enables performance counters on the Alpha chip.

     3. Runs the program you specify with the arguments you specify or, if no program is specified, sleeps (while the kernel collects performance data) until you enter Ctrl/C.

     4. Writes the profile data to the kmon.out file.

You use the prof command to help you analyze the data in the umon.out or kmon.out file.  The following examples show how to invoke the prof command to analyze data in the respective files:

% prof a.out umon.out
% prof /vmunix kmon.out
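
When the -each flag is used, the output is a set of per-CPU files (umon.out.n or kmon.out.n) rather than a single file. The following is a minimal sketch, in POSIX sh, of a helper that prints one prof command per per-CPU file found in the current directory. The function name per_cpu_prof_cmds is hypothetical, and the sketch assumes that prof accepts a per-CPU file in the same position as umon.out or kmon.out, which this page does not state explicitly. The helper only prints the commands; it does not run them.

```sh
# Print (do not run) one prof command per per-CPU profile file.
# $1: executable or kernel image, for example a.out or /vmunix
# $2: base name of the profile files, umon.out or kmon.out
# Assumption: prof takes a per-CPU file the same way it takes umon.out.
per_cpu_prof_cmds() {
    image=$1
    base=$2
    for f in "$base".*; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        printf 'prof %s %s\n' "$image" "$f"
    done
}

# Example (prints the commands for review):
#   per_cpu_prof_cmds a.out umon.out
```

Printing the commands instead of running them makes it easy to check which files were picked up before analyzing each CPU's profile separately.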

You specify the statistics that you want to collect for the program being profiled in one or more statistic parameters. 

The Alpha architecture implemented on your machine determines which statistics can be collected and the number of counters available for collecting multiple statistics at the same time.  The implementation is indicated by the Alpha chip number, which can be displayed by using the show config console command before booting Digital UNIX, or, after booting, by using the psrinfo -v command, or by calling getsysinfo (GSI_PROC_TYPE).  Also, if the uprofile command is run without arguments, it will show how many counters and what statistics are available on your machine.

All of the chips in the EV4 chip set (21064 [EV4], 21064A [EV45], 21066/21068 [LCA4]) have two performance counter registers, each of which can be separately programmed.  The statistics that each counter can collect are shown in the following table:

Counter0Stats   Counter1Stats
0disabled       1disabled
issues          dcache
pipedry         icache
loads           dualissues
pipefrozen      mispredicts
branches        floatops
cycles          intops
PALcycles       stores
nonissues       novictims
victims

All of the chips in the EV5 chip set (21164 [EV5], 21164A [EV56], and 21164PC [PCA56]) have three performance counter registers, each of which can be separately programmed.  Some of the counters are common to all EV5 implementations, some are specific to EV5 and EV56, and some are specific to PCA56. 

The statistics that each of the common EV5 counters can collect are shown in the following table:

Counter0Stats   Counter1Stats   Counter2Stats
0disabled       1disabled       2disabled
cycles0         nonissues       longstalls
issues          splitissue      pcmispredicts
                pipedry         branchmispredicts
                replay          icachemisses
                singleissues    itbmisses
                dualissues      dcacheldmisses
                tripleissues    dtbmisses
                quadissues      ldsmerged
                flowchanges     ldureplays
                intops          fullreplays
                floatops        externalinput
                loads           cycles2
                stores          memorybarriers
                icacheacc       lockedloads
                dcacheacc

The statistics that each of the EV5- and EV56-specific counters can collect are shown in the following table:

Counter1Stats   Counter2Stats
scacheacc       scachemisses
scachereads     scachereadmisses
scachewrites1   scachewritemisses
scachevictim    scachesharedwrites
bcacheref       scachewrites2
bcachevictim    bcachemisses
sysreqs         systeminvalidates
                systemreadrequests

The statistics that each of the PCA56-specific counters can collect are shown in the following table:

Counter1Stats Counter2Stats
bcachereads bcachedreads
bcachedreadhits bcachereadhits
bcachedreadfills bcachereadfills
bcachewrites bcachewritehits
bcachecleanwritehits bcachewritefills
bcachevictims sysreadflushhits
readmisstwo sysreadflushmisses
readmissthree

For descriptions of the statistics for all EV4 and EV5 implementations, refer to pfm(7). 

You can disable any counter by specifying 0disabled, 1disabled, or 2disabled as the counter statistic.  You can use this feature to isolate specific event types, such as loads, without extraneous data being generated.  You cannot disable all counters at the same time, choose two statistics for the same counter, or disable a counter once its statistic is specified. 
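
The selection rules above can be checked before a long profiling run. The following is a minimal sketch, in POSIX sh, that covers only the EV4 statistics (and omits victims and novictims). The function names ev4_counter_of and ev4_check_stats are hypothetical and are not part of uprofile or kprofile. The sketch treats any second mention of a counter as an error, whether it is a second statistic, a disable after a statistic, or a statistic after a disable; the last case is an assumption, since the text above does not address it.

```sh
# Map an EV4 statistic name to its counter number.
# Covers the EV4 table only, without victims and novictims.
ev4_counter_of() {
    case $1 in
        0disabled|issues|pipedry|loads|pipefrozen|branches|cycles|PALcycles|nonissues)
            echo 0 ;;
        1disabled|dcache|icache|dualissues|mispredicts|floatops|intops|stores)
            echo 1 ;;
        *)
            return 1 ;;
    esac
}

# Check a list of statistics against the rules stated above.
# Prints "ok" or a reason; returns 0 on success and 1 on failure.
ev4_check_stats() {
    used=""        # counters that already have a statistic or are disabled
    disabled=0     # number of counters explicitly disabled
    for s in "$@"; do
        c=$(ev4_counter_of "$s") || { echo "unknown statistic: $s"; return 1; }
        case " $used " in
            *" $c "*) echo "counter $c specified twice: $s"; return 1 ;;
        esac
        used="$used $c"
        case $s in
            *disabled) disabled=$((disabled + 1)) ;;
        esac
    done
    # EV4 has two counters, so two disables means nothing is collected.
    if [ "$disabled" -eq 2 ]; then
        echo "all counters disabled"
        return 1
    fi
    echo ok
}

# Example:
#   ev4_check_stats loads 1disabled && uprofile loads 1disabled ./a.out
```

Running uprofile without arguments remains the authoritative way to see which statistics your machine supports; the table in this sketch is only a copy of the EV4 list.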

When you specify no counter statistics, uprofile and kprofile count cycles on counter 0 by default, and prof displays a profile in terms of seconds used by each procedure in the program (except for any shared libraries).  For the noncycle statistics, prof’s profile shows the number of samples recorded, the sampling interval (events per second), and the total number of events that this implies. 

If you specify statistics for multiple counters, uprofile and kprofile accumulate their results.  You cannot then view the results of any single statistic separately.  Because collected data is merged into a single buffer, interpretation of multiply collected statistics may be difficult. 

To perform a detailed analysis of short sections of kernel code, use the kprofile command with triggered mode (invoked with the -t flag).  When you use this mode, kprofile performs all of the required setup for enabling the counters as normal, but does not invoke them.  You can insert counter start or stop commands into the kernel code to be instrumented as follows:

Turn counters on:  wrperfmon (PFOPT, 1)
Turn counters off: wrperfmon (0)

You can turn the counters on and off repeatedly to collect data over many iterations or multiple sections of code. 

The macro PFOPT is defined in <sys/pfcntr.h>. 

NOTES

The interrupt load that profiling places on the system may affect performance. 

The kernel in use must have the pfm pseudo-device configured into it. To do this, add the following line to the kernel configuration file and rebuild the kernel:

        pseudo-device       pfm
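
Before rebuilding, it can be useful to confirm whether the configuration file already contains the line. The following is a minimal sketch in POSIX sh. The function name has_pfm_device is hypothetical, and the path of the kernel configuration file is passed as an argument because this page does not name it.

```sh
# Return 0 if the given kernel configuration file contains an
# uncommented "pseudo-device pfm" line, 1 otherwise.
# $1: path to the kernel configuration file (supplied by the caller)
has_pfm_device() {
    grep -Eq '^[[:space:]]*pseudo-device[[:space:]]+pfm([[:space:]]|$)' "$1"
}

# Example (CONFIG_FILE is a placeholder for your configuration file):
#   has_pfm_device "$CONFIG_FILE" || echo "pfm is not configured"
```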

RESTRICTIONS

The victims and novictims statistics rely on the external performance counter pin connections as described in the EV4 chip specification.  The DEC 3000/400, /500, /600, and /800 workstations have these connections.  Attempts to collect either of these statistics on other platforms are allowed, but will typically generate empty data.

The uprofile command is only supported on EV4 Pass 3 or later processors.  Attempts to use it on a Pass 2 processor will gather PC samples for every process running on the system. 

Using kprofile to generate statistics for a single command is only possible on EV4 Pass 3 or later processors.  Attempts to do this on a Pass 2 processor will gather statistics for the entire system, as if no command had been specified. 

Using kprofile with triggered mode also requires an EV4 Pass 3 or later processor and cannot be performed with per-process monitoring. 

FILES

/dev/pfcntr
The performance counter device file.

umon.out[.n]
The statistics file(s) generated by uprofile.

kmon.out[.n]
The statistics file(s) generated by kprofile.

/vmunix
The default kernel to profile.

RELATED INFORMATION

prof(1), runon(1), pfm(7), pdtostd(1), psrinfo(1)
