R10KCOUNTERS(5) R10KCOUNTERS(5)
NAME
r10k_evcntrs, r10k_event_counters, r10k_counters - Programming the R10000
event counters
DESCRIPTION
The R10000 processor supplies two performance counters for counting
certain hardware events. Each counter can track one event at a time and
there is a choice of sixteen events per counter. There are also two
associated control registers which are used to specify which event the
relevant counter is counting. Each counter is a 32-bit read / write
register and is incremented by one each time the event specified in its
associated control register occurs. Furthermore, the control registers
allow one to indicate that the events are only counted in a specific
mode. The modes may be user mode or several choices of kernel mode, or
some combination of kernel and user mode.
The counters may optionally assert an interrupt upon overflow, which is
defined to be when the most significant bit of one of the counter
registers (bit 31) becomes set. If such an overflow interrupt is enabled
for that event in the associated control register, then the interrupt
will be presented to the cpu. Whether or not the interrupt is asserted,
the counting of events continues after overflow.
The format of the control registers is as follows:
 31                  9 8       5   4   3   2   1    0
--------------------------------------------------------
|          0          |  Event  | IE | U | S | K | EXL |
--------------------------------------------------------
Bit 4 is the interrupt enable bit and it specifies whether overflows for
the specified event will generate interrupts or not. Bits 3 through 0
specify the mode the event is counted in, or the count enable bits. These
bits will enable counting when they match the equivalent KSU settings in
the status register of the R10000. That is:
U bit <----> KSU = 2, EXL = 0, ERL = 0 (user mode)
S bit <----> KSU = 1, EXL = 0, ERL = 0 (supervisor mode, not supported)
K bit <----> KSU = 0, EXL = 0, ERL = 0 (kernel mode)
EXL bit <---> EXL = 1, ERL = 0 (transient kernel mode)
So, for example, if the KSU bits in the status register are 2 and ERL and
EXL bits are both off, then events enabled with the U bit will be
counted. In this way, a program which intends to use the performance
counters must specify which events are to be counted and in which mode(s)
they are to be counted.
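The register layout and the mode matching rules above can be sketched in portable C. This is only an illustration: the CTRL_* names and both helper functions are hypothetical and are not taken from any system header; the bit positions follow the layout described above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names for the control register bits described above. */
#define CTRL_EXL          0x01u  /* bit 0: count while EXL = 1, ERL = 0 */
#define CTRL_K            0x02u  /* bit 1: count in kernel mode (KSU = 0) */
#define CTRL_S            0x04u  /* bit 2: count in supervisor mode (KSU = 1) */
#define CTRL_U            0x08u  /* bit 3: count in user mode (KSU = 2) */
#define CTRL_IE           0x10u  /* bit 4: interrupt on overflow */
#define CTRL_EVENT_SHIFT  5      /* bits 8..5: event number, 0..15 */

/* Build a control register value from an event number, the IE flag
 * and a combination of the four count-enable bits. */
static uint32_t ctrl_encode(unsigned event, int ie, unsigned mode)
{
    assert(event < 16 && (mode & ~0x0Fu) == 0);
    return ((uint32_t)event << CTRL_EVENT_SHIFT) | (ie ? CTRL_IE : 0u) | mode;
}

/* Return nonzero if an event with these count-enable bits would be
 * counted under the given status register fields. */
static int ctrl_counts(unsigned mode, unsigned ksu, int exl, int erl)
{
    if (erl)
        return 0;                       /* no enable bit matches ERL = 1 */
    if (exl)
        return (mode & CTRL_EXL) != 0;  /* transient kernel mode */
    switch (ksu) {
    case 0:  return (mode & CTRL_K) != 0;
    case 1:  return (mode & CTRL_S) != 0;
    case 2:  return (mode & CTRL_U) != 0;
    default: return 0;
    }
}
```

For instance, ctrl_encode(9, 0, CTRL_U) describes event 9 counted in user mode only, with no overflow interrupt.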
The events that can be tracked by the performance counters are given in
the following table:
Event  Counter 0                      Counter 1
--------------------------------------------------------------------
|  0 | Cycles                       | Cycles                       |
--------------------------------------------------------------------
|  1 | Instructions issued          | Instructions graduated       |
|    | to ALU, FPU or load/store    |                              |
|    | units                        |                              |
--------------------------------------------------------------------
|  2 | Load/prefetch/sync issued    | Load/prefetch/sync graduated |
--------------------------------------------------------------------
|  3 | Stores issued                | Stores graduated             |
--------------------------------------------------------------------
|  4 | Store conditional issued     | Store conditional graduated  |
--------------------------------------------------------------------
|  5 | Failed store conditional     | Floating-point instr. grad.  |
--------------------------------------------------------------------
|  6 | Rev2.x: Branches decoded     | Write back from data cache   |
|    | Rev3.x: Branches resolved    | to secondary cache           |
--------------------------------------------------------------------
|  7 | Write back from secondary    | TLB refill exceptions        |
|    | cache to System interface    |                              |
--------------------------------------------------------------------
|  8 | Single-bit ECC errors on     | Branches mispredicted        |
|    | secondary cache data         |                              |
--------------------------------------------------------------------
|  9 | Instruction cache misses     | Data cache misses            |
--------------------------------------------------------------------
| 10 | Secondary cache misses       | Secondary cache misses       |
|    | (instruction)                | (data)                       |
--------------------------------------------------------------------
| 11 | Secondary cache way          | Secondary cache way          |
|    | mispredicted (instruction)   | mispredicted (data)          |
--------------------------------------------------------------------
| 12 | External intervention        | External intervention hits   |
|    | requests                     |                              |
--------------------------------------------------------------------
| 13 | External invalidation        | External invalidation hits   |
|    | requests                     |                              |
--------------------------------------------------------------------
| 14 | Rev2.x: Virtual coherency    | Upgrade requests on clean    |
|    | Rev3.x: Functional unit      | secondary cache lines        |
|    | completion cycles            |                              |
--------------------------------------------------------------------
| 15 | Instructions graduated       | Upgrade requests on shared   |
|    |                              | secondary cache lines        |
--------------------------------------------------------------------
Note that the definitions of events 6 and 14 on counter 0 differ depending
on the chip revision. The chip revision can be determined via the
command hinv(1).
The kernel maintains 64-bit virtual counters for the user program using
the hardware counters. This view of the counters as being 64-bit is
maintained through the programming interfaces which use them, even though
the actual counters are only 32 bits. Similarly, there are only two
hardware counters per R10000, but the programming interface supports the
view that there are actually 32 counters. That is, a user program can
specify that more than one event per hardware counter is to be counted,
up to sixteen events per counter. The kernel will then multiplex the
events across clock tick boundaries. So, if a program is tracking more
than one event per counter, every clock tick the kernel will check to see
if it is necessary to switch the events being tracked. If so, it saves
the counts for the previous events and sets up the counters for the
next event(s). Thus, to the program there are 32 64-bit
counters available.
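As a rough sketch of this virtual view, the following C fragment maps a virtual counter index to a hardware counter and event, and folds a 32-bit hardware reading into a 64-bit total. The function names are hypothetical, and the delta-based accumulation is an assumption about how such a kernel could work, not a description of the actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#define EVENTS_PER_COUNTER 16  /* events selectable on each hardware counter */
#define VIRTUAL_COUNTERS   32  /* 2 hardware counters x 16 events */

/* Hardware counter (0 or 1) that backs virtual counter idx. */
static int hw_counter_of(int idx)
{
    assert(idx >= 0 && idx < VIRTUAL_COUNTERS);
    return idx / EVENTS_PER_COUNTER;
}

/* Event number (0..15) on that hardware counter. */
static int hw_event_of(int idx)
{
    assert(idx >= 0 && idx < VIRTUAL_COUNTERS);
    return idx % EVENTS_PER_COUNTER;
}

/* Add the events seen since the previous snapshot to a 64-bit total.
 * Unsigned 32-bit subtraction yields the correct delta even if the
 * hardware register wrapped once between the two snapshots. */
static void fold_delta(uint64_t *virt, uint32_t hw_now, uint32_t hw_prev)
{
    *virt += (uint32_t)(hw_now - hw_prev);
}
```

Under this mapping, virtual counter 25 is event 9 on hardware counter 1, which matches the index used for data cache misses in the EXAMPLE section.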
The performance counters are available to the user program primarily via
the /proc interface. A limited and more specialized functionality is also
provided through the syssgi interface, but this is not intended to be the
general interface. Through /proc, ioctls exist which allow one to start
or stop using the counters, to read the counts in one's counters, or to
modify the way the counters are being used. Since this interface
specifies a pid as a parameter, it is possible, in general, for one
process to read or manipulate another process's counters, as long as the
calling process belongs to the same process group or is root.
There are also ioctls which allow the program to specify overflow
thresholds on a per-event basis, and to supply a signal to be sent to the
program upon overflow. That is, the fact that an interrupt can be
generated whenever a particular counter overflows can be exploited to
allow a program to specify a threshold N for an event such that after N
occurrences of the event an interrupt will be generated. In addition to
this, while the kernel is servicing the counter overflow interrupt it can
perform some user-specified action, such as sending a user-specified
signal to the program whenever an overflow is generated, or incrementing
a PC bucket for profiling. The latter choice is a more specialized
functionality and is not part of the general /proc interface.
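One common way to turn "overflow when bit 31 becomes set" into "interrupt after N events" is to preload the counter so that the N-th event sets bit 31. The sketch below illustrates that arithmetic only; this page does not say how the kernel implements thresholds, so the preload scheme and both function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define OVERFLOW_BIT 0x80000000u  /* overflow = bit 31 becoming set */

/* Initial counter value such that bit 31 becomes set on the n-th event. */
static uint32_t threshold_preload(uint32_t n)
{
    assert(n >= 1 && n <= OVERFLOW_BIT);
    return OVERFLOW_BIT - n;
}

/* Nonzero once the counter value has reached the overflow condition. */
static int overflowed(uint32_t counter)
{
    return (counter & OVERFLOW_BIT) != 0;
}
```

With a threshold of 1000, the preloaded counter is still below the overflow bit after 999 events and reaches it on event 1000.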
For a process using the counters in user mode, the control block for the
counters is kept in the u-area. Thus, once the process forks, the child
acquires the same state of the counters as the parent, which implies that
the next time the child runs the performance counters will be run for the
child, tracking the same events as its parent. Therefore, the counter
values are zeroed for the child upon fork so that at a later time the
child's counters will accurately depict the activity of the child. For
this reason, it is possible for the parent to fork and then wait for the
child to exit. When the child exits, if the kernel sees that the parent
is waiting for the child it will add the child's 64-bit counters to those
of the parent and the parent will thus have the event trace of the child.
Other methods for a parent to acquire a child's counters are discussed
with the PIOCSAVECCNTRS ioctl.
Operation Modes for the Performance Counters
There are two basic modes that the counters are used in, user mode and
system mode. Using them in user mode allows the counters to be shared
among any number of user programs. In this mode the kernel saves and
restores the counts and state of the counters across context switch
boundaries. System mode is defined when a user with root privileges uses
the counters in kernel mode (user mode and/or EXL mode may also be
specified, but kernel mode is essential). In this mode there are no
context switch boundaries and so other programs will not be able to use
the counters when they are in use in system mode.
Therefore, when the counters are already in use in user mode, a program
which attempts to use them in system mode will fail with EBUSY since the
two modes cannot co-exist (unless certain commands are employed to force
the release of the counters in user mode and their acquisition in
system mode, as discussed later). Likewise, if the counters are in use
in system mode, any program attempting to use the counters will fail with
EBUSY (root-level or otherwise).
The approach taken to these two operating modes is that system mode has a
higher priority. For this reason there is a syssgi command to forcibly
acquire the counters in system mode. Any current users of the counters on
any cpu will be forced to release them. And any users of the counters who
are not currently running will not be able to acquire them when they run
again. This latter situation holds at all times. That is, there may be
several programs sharing the counters in user mode. If at any moment they
happen to all be switched out, the counters are temporarily free. At this
point it is possible for a super-user to acquire the counters in system
mode. Then, when the other programs are run again, they won't be able to
acquire the counters since they are in use in system mode. Since such a
program then runs without the intended event counting, the kernel
arranges that it will not use the counters again unless they are
explicitly restarted. This is because the values in the counters are no
longer representative of the program.
To re-iterate, a root-level program may receive EBUSY from the kernel if
it tries to acquire the counters in system mode through /proc and they
are actively in use at the time of the system call. If they are in use in
user mode by other programs but those programs are not running at the
time of the system call, then the counters will be successfully acquired
in system mode and the other programs will not be able to acquire them
again; the kernel will not try to start up the counters for those other
programs again.
In order to make this situation visible to the program, a generation
number is employed to reflect the current state of the counters. In this
case, whenever the kernel does turn off the use of the counters for a
program because the mode of operation has switched from user mode to
system mode, the generation number for the counters for the user programs
will be increased. Thus, subsequent reads of the counters will return the
new number and should signal the program that the counter values are not
to be trusted. The number will be discussed in greater detail later.
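The arbitration rules between user mode and system mode can be summarized with a small model. This is a simplified sketch, not kernel code: the structure, its fields and the function names are hypothetical, a single generation number stands in for the per-program ones, and the functions return an errno value directly instead of -1.

```c
#include <assert.h>
#include <errno.h>

struct counter_state {
    int in_system_mode;   /* counters are held in system mode */
    int user_clients;     /* programs that acquired them in user mode */
    int user_active;      /* of those, how many are on a cpu right now */
    int user_generation;  /* generation number seen by user-mode programs */
};

/* A program requests the counters in user mode. */
static int acquire_user(struct counter_state *s)
{
    if (s->in_system_mode)
        return EBUSY;
    s->user_clients++;
    s->user_active++;
    return 0;
}

/* Evict any user-mode clients and switch to system mode. */
static void take_for_system(struct counter_state *s)
{
    if (s->user_clients > 0)
        s->user_generation++;  /* tells evicted programs their counts are stale */
    s->user_clients = 0;
    s->user_active = 0;
    s->in_system_mode = 1;
}

/* System-mode request through /proc: never preempts an active user. */
static int acquire_system(struct counter_state *s)
{
    if (s->in_system_mode || s->user_active > 0)
        return EBUSY;
    take_for_system(s);
    return 0;
}

/* Forced system-mode request: fails only if already in system mode. */
static int force_system(struct counter_state *s)
{
    if (s->in_system_mode)
        return EBUSY;
    take_for_system(s);
    return 0;
}
```

In this model a /proc request for system mode fails while a user-mode program is running, succeeds once all such programs are switched out, and bumps the generation number so that those programs can detect the loss.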
To support using the counters in system mode, each cpu has its own
control block for the counters, pointed to in its private area. There is
also a global counter control block which maintains counter state for the
entire system. When the counters are being used in system mode they are
not read and stored across context switch boundaries. In fact, unless
they are explicitly read by a program, the counters are not read by the
kernel until there is an overflow interrupt. When this occurs the cpu on
which the interrupt occurs updates its own private virtual counters; no
changes are made to the global counter control block.
When the counters are read in system mode via PIOCGETEVCTRS through
/proc, the per-cpu counters are all added together into the global
counters so that the global counters represent the sum total of the
counted events for the entire system. This same coalescing of the per-cpu
counters happens when the counters are released. Note that it is also
possible to read a particular cpu's counters via the syssgi
HWPERF_GET_CPUCNTRS command.
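The coalescing step amounts to a per-event sum over all cpus. The sketch below uses a local stand-in for hwperf_cntr_t and a hypothetical coalesce() function; recomputing the global block from scratch on each read is an assumption made to keep the example free of double counting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NEVENTS 32  /* virtual counters per control block */

/* Stand-in for hwperf_cntr_t; the real type is in sys/hwperftypes.h. */
typedef struct {
    uint64_t evctr[NEVENTS];
} counters_t;

/* Recompute the global counters as the sum of all per-cpu counters. */
static void coalesce(counters_t *global, const counters_t *percpu, int ncpus)
{
    assert(global != NULL && percpu != NULL && ncpus >= 0);
    memset(global, 0, sizeof *global);
    for (int cpu = 0; cpu < ncpus; cpu++)
        for (int ev = 0; ev < NEVENTS; ev++)
            global->evctr[ev] += percpu[cpu].evctr[ev];
}
```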
/proc Commands for the Performance Counters
To support the /proc interface for the counters, there are several data
structures defined in /usr/include/sys/hwperftypes.h that are used to
either pass parameters with the calls or to receive data back from the
kernel.
struct hwperf_ctrlreg {
    ushort_t hwp_ev  :11,  /* event counted */
             hwp_ie  :1,   /* overflow intr enable */
             hwp_mode:4;   /* user/kernel/EXL */
};

typedef union {
    short                 hwperf_spec;
    struct hwperf_ctrlreg hwperf_creg;
} hwperf_ctrl_t;

typedef struct {
    hwperf_ctrl_t hwp_evctrl[HWPERF_EVENTMAX];
} hwperf_eventctrl_t;
Each event is described to the kernel through an hwperf_ctrl_t. Where
relevant, the ioctls take the address of an hwperf_eventctrl_t, the array
of 32 hwperf_ctrl_t's. If the user is not interested in an event, then
care must be taken to ensure that the corresponding element in this array
is zero.
For a user program to gain access to the counters, it must indicate which
events are of interest and how they are to be counted; whether overflow
thresholds are to be used to generate overflow interrupts or not, and
what those thresholds are per event; and what signal the user program
would like to receive from the kernel upon overflow interrupt. All of
this information is conveyed with the structure hwperf_profevctrarg_t:
typedef struct hwperf_profevctrarg {
    hwperf_eventctrl_t hwp_evctrargs;
    int                hwp_ovflw_freq[HWPERF_EVENTMAX];
    int                hwp_ovflw_sig;  /* SIGUSR1,2 */
} hwperf_profevctrarg_t;
With the above structure as parameter the user program must take care to
zero the hwp_ovflw_freq elements for which no overflow thresholds are
intended. The hwp_ovflw_sig field is used to tell the kernel which signal
the program wants to receive upon overflow interrupt. The acceptable
signals are between 1 and 32 (SIG32). This field should be zero if no
signals are wanted.
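The zeroing rules above suggest clearing the whole argument block first and then filling in only the events of interest. The following sketch uses simplified local replicas of the structures (the union is dropped and the field names are shortened), so it compiles without the IRIX headers. The type and function names, the MODE_USER value, and the choice to set the interrupt enable bit only when a threshold is given are all assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define EVENTMAX  32   /* stand-in for HWPERF_EVENTMAX */
#define CNT1BASE  16   /* stand-in for HWPERF_CNT1BASE */
#define MODE_USER 0x8  /* assumed value of the U count-enable bit */

/* Simplified replicas of hwperf_ctrlreg and hwperf_profevctrarg_t. */
typedef struct {
    unsigned int ev : 11, ie : 1, mode : 4;
} ctrlreg_t;

typedef struct {
    ctrlreg_t evctrl[EVENTMAX];
    int       ovflw_freq[EVENTMAX];
    int       ovflw_sig;
} profarg_t;

/* Zero every event and threshold, then record the overflow signal
 * (0 means no signal is wanted). */
static void profarg_init(profarg_t *a, int sig)
{
    assert(a != NULL && sig >= 0 && sig <= 32);
    memset(a, 0, sizeof *a);
    a->ovflw_sig = sig;
}

/* Request one event on one hardware counter in user mode, with an
 * optional overflow threshold. Returns the array index, or -1. */
static int profarg_watch(profarg_t *a, int counter, int event, int freq)
{
    int idx;

    if (counter < 0 || counter > 1 || event < 0 || event > 15 || freq < 0)
        return -1;
    idx = counter * CNT1BASE + event;
    a->evctrl[idx].ev = (unsigned)event;
    a->evctrl[idx].ie = freq > 0;
    a->evctrl[idx].mode = MODE_USER;
    a->ovflw_freq[idx] = freq;
    return idx;
}
```

A caller would run profarg_init() once, passing for example SIGUSR1, and then call profarg_watch() for each event it cares about; every other element stays zero, as required above.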
The following structure is an array of 32 64-bit virtual counters and is
used when a program wants to read the virtual counters of a process:
typedef struct {
    __uint64_t hwp_evctr[HWPERF_EVENTMAX];
} hwperf_cntr_t;
It is also possible to read the counters and all of the prusage
information of a process in one call. To this end the hwperf_prusage_t is
defined:
typedef struct hwperf_prusage {
    timespec_t pu_tstamp;         /* time stamp */
    timespec_t pu_starttime;      /* time process was started */
    timespec_t pu_utime;          /* user CPU time */
    timespec_t pu_stime;          /* system CPU time */
    __uint64_t pu_minf;           /* minor (mapping) page faults */
    __uint64_t pu_majf;           /* major (disk) page faults */
    __uint64_t pu_utlb;           /* user TLB misses */
    __uint64_t pu_nswap;          /* swaps (process only) */
    __uint64_t pu_gbread;         /* gigabytes ... */
    __uint64_t pu_bread;          /* and bytes read */
    __uint64_t pu_gbwrit;         /* gigabytes ... */
    __uint64_t pu_bwrit;          /* and bytes written */
    __uint64_t pu_sigs;           /* signals received */
    __uint64_t pu_vctx;           /* voluntary context switches */
    __uint64_t pu_ictx;           /* involuntary context switches */
    __uint64_t pu_sysc;           /* system calls */
    __uint64_t pu_syscr;          /* read() system calls */
    __uint64_t pu_syscw;          /* write() system calls */
    __uint64_t pu_syscps;         /* poll() or select() system calls */
    __uint64_t pu_sysci;          /* ioctl() system calls */
    __uint64_t pu_graphfifo;      /* graphics pipeline stalls */
    __uint64_t pu_graph_req[8];   /* graphics resource requests */
    __uint64_t pu_graph_wait[8];  /* graphics resource waits */
    __uint64_t pu_size;           /* size of swappable image in pages */
    __uint64_t pu_rss;            /* resident set size */
    __uint64_t pu_inblock;        /* block input operations */
    __uint64_t pu_oublock;        /* block output operations */
    __uint64_t pu_vfault;         /* total number of vfaults */
    __uint64_t pu_ktlb;           /* kernel TLB misses */
    cpu_mon_t  pu_cpu_mon;        /* cpu monitoring stats */
} hwperf_prusage_t;
The ioctls available through /proc are the following:
PIOCENEVCTRS - Start using the counters for a process, either in user
mode or system mode. It initializes the counters for the
target process and, if the process is running, starts
them. Otherwise, the counters will be started the next
time the process is run. Fails with EINVAL if events are
specified improperly, or if an input overflow
frequency (threshold) is negative.
If supervisor or kernel mode is specified for any of
the events and the caller does not have root privileges,
it will fail with EPERM. EBUSY may be returned for two
possible reasons:
(1) the counters are already in use in system mode or,
(2) the caller is requesting the counters in system
mode and, at the time of the request, the counters are
in use in user mode, on at least one cpu (this command
will not forcibly acquire the counters for a root
process).
Returns a positive generation number if successful.
PIOCGETEVCTRS - Read the virtual counters of the target process.
The address of an hwperf_cntr_t must be supplied in
the call.
Returns a positive generation number if successful.
PIOCGETPREVCTRS- Read a process's counters in addition to reading all
the prusage information associated with the process.
The address of an hwperf_prusage_t must be supplied
with the call.
Returns a positive generation number if successful.
PIOCGETEVCTRL - Retrieve the control information for the process's
counters: which events are being counted and the mode
they are being counted in. The kernel will copyout an
array of 32 event specifiers, so the user must supply
an address of an hwperf_eventctrl_t.
Returns a positive generation number if successful.
PIOCSETEVCTRL - Modify how a program is using the counters, whether it
be events and/or their associated mode of operation, or
overflow threshold values, or overflow signal. Once the
counters have been acquired this is how their operation
for a program is modified without releasing the
counters. Each time the PIOCSETEVCTRL is made the
generation number for the target process's counters will
be incremented. The parameter to this call is the
address of an hwperf_profevctrarg_t.
Returns a positive generation number if successful.
PIOCRELEVCTRS - Release the performance counters; the target process
will not have any events counted after this call. Note
that the virtual counters associated with the target
may still be read as long as the process has not exited.
No parameters are necessary.
PIOCSAVECCNTRS - Allow a parent process to receive the counter values
of one of its children when that child exits, without having to
wait for the child (when the parent is waiting no
explicit call is necessary). When the child exits its
counter values will be added to the parent's, whether
the parent is using its counters or not. No parameters
are necessary other than target pid.
EXAMPLE
An example of how these commands would be used is given here. Suppose
that we wanted to count instruction cache misses and data cache misses
for our own program. That means that we want to count event 9 for both
counters, and these events would be counted in user mode. The following
code would accomplish this. Note that the constants used are defined in
/usr/include/sys/hwperfmacros.h, and evctr_args is an
hwperf_profevctrarg_t.
/* declarations assumed by this fragment */
int fd, i, generation1, generation2;
pid_t pid;
char pfile[32];
hwperf_profevctrarg_t evctr_args;
hwperf_cntr_t cnts;

/* open our own /proc entry */
pid = getpid();
sprintf(pfile, "/proc/%05d", (int)pid);
fd = open(pfile, O_RDWR);

/* counter 0: count event 9 (instruction cache misses) in user mode */
for (i = 0; i < HWPERF_CNTEVENTMAX; i++) {
    if (i == 9) {
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_mode = HWPERF_CNTEN_U;
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ie = 1;
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ev = i;
        evctr_args.hwp_ovflw_freq[i] = 0;
    } else {
        /* events we are not interested in must be zero */
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_spec = 0;
        evctr_args.hwp_ovflw_freq[i] = 0;
    }
}

/* counter 1: count event 9 (data cache misses), kept at index 25 */
for (i = HWPERF_CNT1BASE; i < HWPERF_EVENTMAX; i++) {
    if (i == HWPERF_CNT1BASE + 9) {
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_mode = HWPERF_CNTEN_U;
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ie = 1;
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_creg.hwp_ev = i - HWPERF_CNT1BASE;
        evctr_args.hwp_ovflw_freq[i] = 0;
    } else {
        evctr_args.hwp_evctrargs.hwp_evctrl[i].hwperf_spec = 0;
        evctr_args.hwp_ovflw_freq[i] = 0;
    }
}

/* no overflow signal wanted */
evctr_args.hwp_ovflw_sig = 0;

/* acquire the counters and remember the generation number */
generation1 = ioctl(fd, PIOCENEVCTRS, (void *)&evctr_args);
if (generation1 < 0) {
    perror("failed to acquire counters");
    exit(errno);
}

. . . . . (body of program) . . . .

/* now read the counter values */
if ((generation2 = ioctl(fd, PIOCGETEVCTRS, (void *)&cnts)) < 0) {
    perror("PIOCGETEVCTRS returns error");
    exit(errno);
}

/* generation number should be the same */
if (generation1 != generation2) {
    printf("program lost event counters\n");
    exit(0);
}

/* release the counters */
if ((ioctl(fd, PIOCRELEVCTRS)) < 0) {
    perror("prioctl PIOCRELEVCTRS returns error");
    exit(errno);
}

/* print out the counts; the virtual counters are 64 bits wide */
printf("instruction cache misses: %llu\n", (unsigned long long)cnts.hwp_evctr[9]);
printf("data cache misses: %llu\n", (unsigned long long)cnts.hwp_evctr[25]);
exit(0);
Syssgi Commands for the Performance Counters
The syssgi commands which access the R10000 event counters are not
intended for general use. Rather, specialized commands are implemented
through this interface. Note that each command is passed as the argument
immediately following SGI_EVENTCTR in the syssgi call. The available commands are:
HWPERFPROFENABLE - Enable sprofil-like profiling using the
performance counters rather than the clock.
Returns EINVAL on incorrect input, or EBUSY
if the counters are already in use in system
mode. The second argument to this command is
the address of an hwperf_profevctrarg_t, the third
argument is a profp, and the fourth is the profcnt,
both referring to input necessary for profiling.
Returns a positive generation number if
successful.
HWPERFENSYSCNTRS - Forcibly acquire the counters in system mode.
ROOT PERMISSIONS ARE REQUIRED FOR THIS COMMAND.
Note that the counters must be set up in kernel
mode (user and EXL may be included, but kernel mode
is required); EINVAL will be returned otherwise.
That is, at least one of the events must be
counted in kernel mode. Will fail with EBUSY if
the counters are already in use in system mode.
Otherwise, the command is guaranteed to return
the counters in system mode. Starts up the
counters on all the cpus, with all the cpus
counting the same events.
Takes as input (third parameter of syssgi call)
the address of an hwperf_profevctrarg_t, which
is set up just as it is for PIOCENEVCTRS
(see example above).
Returns a positive generation number if
successful.
HWPERFGETSYSCNTRS - Read the global system counters to get the global
event counts. All of the per-cpu counters will be
aggregated into the global counters and the
results will be returned to the caller. Caller
must supply in third argument the address of
an hwperf_cntr_t.
Returns a positive generation number if
successful.
HWPERF_GET_CPUCNTRS - Read a particular cpu's event counters. The third
parameter is a cpuid, the fourth is the address
of an hwperf_cntr_t.
Returns a positive generation number if
successful, 0 otherwise (which would indicate
an invalid cpuid.)
HWPERFGETSYSEVCTRL - Retrieve the control information for the system's
event counters: which events are being counted
and the modes they are being counted in. The third
parameter must be the address of an
hwperf_eventctrl_t. Returns EINVAL if the counters
are not in use.
Returns a positive generation number if
successful.
HWPERF_SET_SYSEVCTRL - Modify how the system counters are operating,
whether it be events being counted and/or their
associated mode of operation, or overflow
threshold values, or overflow signal.
MUST BE ROOT TO ISSUE THIS COMMAND, or else EPERM
will be returned.
Once the counters have been acquired this is how
their operation is modified without releasing
them. Each time the system call
syssgi(SGI_EVENTCTR, HWPERF_SET_SYSEVCTRL,...)
is issued the generation number for the system's
counters is incremented. The third parameter to
this call is the address of an
hwperf_profevctrarg_t.
Returns a positive generation number if
successful.
HWPERFRELSYSCNTRS - Stop using the counters in system mode and
make them available again.
ROOT PERMISSION REQUIRED.
Returns 0 upon success.
FILES
/usr/include/sys/hwperftypes.h
/usr/include/sys/hwperfmacros.h
SEE ALSO
ecadmin(1M),
ecstats(1M),
perfex(1M),
libperfex(3C),
libperfex(3F),
http://www.sgi.com/MIPS/products/r10k/PerfCnt/R10KPFCount.doc.html