hiprof(5) — Macro Packages and Conventions
NAME
hiprof − Hierarchical instruction profiler
SYNOPSIS
atom appl_prog −tool hiprof [−env threads] [−toolargs="arg1 arg2..."] [atom_flags...]
PARAMETERS
appl_prog
File name of a fully linked shared or nonshared executable to be profiled. This program should be compiled with the −g1, −g2, or −g3 flag to obtain more complete profiling information. If the default symbol table level (−g0) has been used, line number information, static procedure names, and file names are unavailable to the profiler.
FLAGS
−tool hiprof
Identifies the hiprof tool to atom.
−env threads
Specifies that the hiprof tool is being invoked on an application that runs in a threaded environment. To make run-time analysis of an application threadsafe, you must specify −env threads on the atom command line. Only POSIX threads created using the pthread_create function are supported.
The threadsafe instrumented executable is named appl_prog.hiprof.threads by default. You can omit the −env threads flag if the application does not create threads; in this case, the instrumented executable is named appl_prog.hiprof.
−toolargs="arg1 arg2 ..."
Passes arguments to the hiprof tool’s instrumentation routines. Use whitespace characters to separate arguments from their parameters (if any) and from other arguments.
atom_flags
Specifies flags to the atom command. See the atom(1) reference page for descriptions of other flags accepted by the atom command, such as those that enable instrumentation of shared libraries, specify the names of instrumented objects, and request debugging information.
The hiprof tool allows the following flags to be passed in the −toolargs flag for use by the hiprof tool’s instrumentation routine when instrumenting appl_prog. Except where noted, these flags can also be passed to the instrumented program at execution time by being defined as part of the HIPROF_ARGS environment variable.
−calltime
Causes hiprof to apply more precise, pthread-dependent profiling process-wide. This style of profiling measures the cost of calls during each call. By default, hiprof uses threadsafe, pthread-independent profiling, which shows the cost of calls proportional to the number of calls. This flag cannot be defined as part of the HIPROF_ARGS environment variable.
−cputime
Causes hiprof to use CPU time obtained from the hardware cycle counter rather than from instruction counts. This flag cannot be defined as part of the HIPROF_ARGS environment variable.
−dirname directory
Specifies the directory path in which hiprof creates its .hiout data files.
−exc procname
Excludes time spent in procname from the profile. This flag can be used multiple times to exclude multiple procedures.
−fastrecur
Invokes a simpler heuristic for mapping recursion into a hierarchical report when used with the −calltime, −cputime, or −pagefaults flag.
−nolog
Disables use of a trace buffer for −cputime. This is useful for studying the performance of hiprof. This flag cannot be defined as part of the HIPROF_ARGS environment variable.
−nousr
Excludes user execution time from the profile. This flag cannot be defined as part of the HIPROF_ARGS environment variable.
−[no]pids
Includes (or excludes) the process ID of the process running the program in the name of the hiprof profile file produced by the instrumented application.
−pagefaults
Measures pagefaults instead of program execution time. Works only for nonthreaded programs. This flag cannot be defined as part of the HIPROF_ARGS environment variable.
−sigdump sig
Causes the process running the instrumented application to catch the signal indicated by sig (see signal(4)). When it receives that signal, the process writes the current profiling data to the output file, reinitializes the profile by setting the execution time to zero, and resumes execution.
−systime
Incorporates cycle counter estimates of system time into instruction count estimates of user time when used with the −calltime flag.
−threads
When used with the −calltime or −cputime flags (and −env threads is specified on the atom command line), causes hiprof to apply more localized, pthread-dependent profiling to each individual thread in the process. Otherwise, hiprof provides process-wide profiling for the modes enabled by these flags.
−textout
When used with the −calltime, −cputime, or −pagefaults flags, produces a text-format profile file instead of a binary profiling data file. This file is similar to the output from gprof, although it cannot be combined or filtered. It also contains additional statistics on the instrumentation that has been used on appl_prog. By default, the profile file contains binary data that the gprof utility can combine with other profiles and filter, prior to generating a report.
When −textout is specified with −env threads, each thread is individually profiled, as if −threads had also been specified.
While the instrumented appl_prog is being executed, flags specified in the definition of the HIPROF_ARGS environment variable override any corresponding settings in the −toolargs flags. For example:
% setenv HIPROF_ARGS "-dirname /tmp/profiles -pids"
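The override rule can be pictured as a dictionary merge in which run-time settings replace instrumentation-time settings. The following Python sketch is an illustration only, not hiprof's actual argument parser. The function names parse_flags and effective_flags are hypothetical, only two value-taking flags are modeled, and rejecting instrumentation-only flags with an error is an assumption (the page says only that such flags cannot be defined in HIPROF_ARGS).

```python
import shlex

# Value-taking flags modeled in this sketch (a subset of those on this page).
TAKES_VALUE = {"-dirname", "-sigdump"}

# Flags this page says cannot be defined as part of HIPROF_ARGS.
INSTRUMENT_ONLY = {"-calltime", "-cputime", "-nolog", "-nousr", "-pagefaults"}


def parse_flags(arg_string):
    """Parse a flag string into a {flag: value} dictionary (hypothetical helper)."""
    settings = {}
    tokens = shlex.split(arg_string)
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag in TAKES_VALUE:
            settings[flag] = tokens[i + 1]
            i += 2
        elif flag in ("-pids", "-nopids"):
            # -pids and -nopids are two values of the same setting.
            settings["-pids"] = (flag == "-pids")
            i += 1
        else:
            settings[flag] = True
            i += 1
    return settings


def effective_flags(toolargs, hiprof_args):
    """Merge settings; HIPROF_ARGS wins over -toolargs for the same setting."""
    runtime = parse_flags(hiprof_args)
    rejected = INSTRUMENT_ONLY.intersection(runtime)
    if rejected:
        # Assumption: the sketch treats this as an error.
        raise ValueError("not allowed in HIPROF_ARGS: " + ", ".join(sorted(rejected)))
    merged = parse_flags(toolargs)
    merged.update(runtime)
    return merged


merged = effective_flags("-calltime -dirname /usr/tmp -nopids",
                         "-dirname /tmp/profiles -pids")
```

In this sketch, merged keeps −calltime from instrumentation time, while −dirname and −pids take the values given in HIPROF_ARGS.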
DESCRIPTION
The hiprof tool is an Atom-based program profiling tool that produces both flat and hierarchical profiles. The flat profile shows the execution time spent in any given procedure. The hierarchical profile shows the time spent in a given procedure and all its descendants. The hierarchical profile enables the user to answer questions of the form "How much time is spent in printf() and all procedures called by printf()?"
The hiprof tool’s output is similar to that generated by the −pg flag of the cc command. However, hiprof uses code instrumentation rather than PC-sampling to gather statistics. The gprof command is usually used to filter and merge output files and to format profile reports.
The hiprof tool generates an instrumented version of appl_prog. The instrumented program behaves identically to the original except that it writes out an execution profile after it is done.
Multiple profile files can be created by a single program run because a separate profile can optionally be generated for each thread of each process. Nonthreaded programs are treated as programs with just one thread.
Specifying the Name of Profile Files
The name of the profile file has the following form:
appl_prog.pid.tid.hiout
The pid (process ID) portion of the filename appears only if you specify the −pids flag in either the atom command’s −toolargs flag or the HIPROF_ARGS environment variable. The tid (thread ID) portion appears only if you specify both −env threads on the atom command line and −threads in either the atom command’s −toolargs flag or the HIPROF_ARGS environment variable.
The directory in which the profile file is created can be specified with the −dirname flag.
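The naming rule can be sketched as a small Python function. This is an illustration under stated assumptions, not hiprof's code: the function name profile_file_name is hypothetical, and the sketch assumes that an omitted pid or tid field is dropped together with its dot (which agrees with the default name appl_prog.hiout listed under FILES).

```python
import posixpath


def profile_file_name(appl_prog, pid=None, tid=None, dirname=None):
    """Build a profile file name of the form appl_prog.pid.tid.hiout.

    pid is supplied only when -pids is in effect; tid only when both
    -env threads and -threads are in effect; dirname only with -dirname.
    """
    parts = [appl_prog]
    if pid is not None:
        parts.append(str(pid))
    if tid is not None:
        parts.append(str(tid))
    parts.append("hiout")
    name = ".".join(parts)
    return posixpath.join(dirname, name) if dirname else name


default_name = profile_file_name("appl_prog")
full_name = profile_file_name("appl_prog", pid=1234, tid=2, dirname="/tmp/profiles")
```

The pid and tid values shown are made-up examples.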
Resetting the Profile
It is sometimes useful to start profiling part way into the execution of a program. For example, a user may wish to omit program initialization from the profile. Also, it is sometimes useful to force the program to print its profile even before it has finished executing. For example, a user might wish to extract the profile of a running file server. The hiprof tool provides a mechanism to do these things.
If you specify the −sigdump flag in the atom command line or define the −sigdump flag in the HIPROF_ARGS environment variable, the specified signal will be caught by the process. When it receives that signal, the process writes the current profiling data to the output file, reinitializes the profile by setting the execution time to zero, and resumes execution.
The process can be signaled any number of times during its execution.
If you do not specify the −textout flag in the atom command line or define it in the HIPROF_ARGS environment variable (that is, when you are producing binary profile files for gprof), each signal causes the process to overwrite any existing file.
If you do specify the −textout flag (that is, when you are producing text-format profile files), the output file will contain two sets of profile data when the process completes execution:
• From the beginning of the program to the point at which the signal was received
• From the point at which the signal was received to the end of the program
For example:
% setenv HIPROF_ARGS "-sigdump USR1"
% application_program.hiprof &
<wait until the desired time>
% kill -USR1 pid
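The difference between binary and text-format output after a signal can be modeled in a few lines of Python. This is a toy model, not hiprof's implementation: the class ProfiledProcess and its methods are hypothetical, no real signal is delivered, and the sketch assumes that the write at program exit follows the same overwrite or append rule as the write triggered by the signal.

```python
class ProfiledProcess:
    """Toy model of an instrumented process that honors -sigdump."""

    def __init__(self, textout=False):
        self.textout = textout
        self.elapsed = 0      # execution time accumulated since the last reset
        self.output = []      # data sets currently held in the output file

    def run(self, units):
        self.elapsed += units

    def _write(self):
        if self.textout:
            self.output.append(self.elapsed)   # text file keeps every data set
        else:
            self.output = [self.elapsed]       # binary file is overwritten

    def on_sigdump(self):
        # Write the current profile, reset the execution time, keep running.
        self._write()
        self.elapsed = 0

    def exit(self):
        self._write()


def simulate(textout):
    proc = ProfiledProcess(textout=textout)
    proc.run(1000)        # for example, program initialization
    proc.on_sigdump()     # stands in for: kill -USR1 pid
    proc.run(250)         # the part of the run that is of interest
    proc.exit()
    return proc.output


text_sets = simulate(textout=True)
binary_sets = simulate(textout=False)
```

With one signal, the text-format model ends with two data sets, while the binary model ends with only the data gathered after the signal.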
User Time Estimates
The hiprof tool provides two ways of estimating user execution time: instruction counts and the cycle counter. By default, the hiprof tool estimates execution time by counting the number of user-level instructions executed. However, if the −cputime flag is specified during instrumentation (that is, in the −toolargs flag on the atom command line), CPU time is estimated using the hardware cycle counter. This involves reading the hardware cycle counter before and after a procedure call to determine the time spent in the procedure.
The advantage of instruction counts is that they are repeatable and are unaffected by the presence of the instrumentation code. If a program is run twice with identical inputs, the instruction counts for both runs will be identical. The disadvantage of instruction counts is that they do not account for various second-order effects (cache misses, TLB misses, and pipeline stalls) which degrade the execution time of a real program.
The advantage of using the cycle counter is that the effects of cache misses, TLB misses, and pipeline stalls are accounted for. The disadvantage is that the presence of the instrumentation code can degrade the performance of the cache and TLB seen by the application. If an application procedure is short (100 or so instructions), then times reported for both the short procedure and the procedure calling the short procedure can be unrealistically pessimistic. If a significant fraction of an application’s time is spent in a short procedure, it may be better not to instrument that procedure at all. To exclude procedure procname from instrumentation, you can specify the −exc procname flag in the atom command line or define it in the HIPROF_ARGS environment variable. If a procedure is not instrumented, its run time is charged to its parent and all calls made by the procedure appear to be made by the parent.
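The effect of leaving a procedure uninstrumented can be sketched on a small call tree. The Python below is an illustration, not hiprof's algorithm: the tuple layout (name, self_time, children), the function name exclude, and the procedure names and times are all made up for the example, and the root of the tree is assumed never to be excluded.

```python
def exclude(node, excluded):
    """Return a copy of the call tree with excluded procedures spliced out.

    An excluded procedure's own time is charged to its parent, and the calls
    it made appear to have been made by the parent.
    """
    name, self_time, children = node
    new_children = []
    for child in children:
        child = exclude(child, excluded)
        child_name, child_time, grandchildren = child
        if child_name in excluded:
            self_time += child_time             # run time charged to the parent
            new_children.extend(grandchildren)  # calls re-attributed to the parent
        else:
            new_children.append(child)
    return (name, self_time, new_children)


tree = ("main", 10, [("parse", 50, [("getc", 30, [("read", 5, [])])])])
collapsed = exclude(tree, {"getc"})
```

Here parse absorbs the 30 units of getc and appears to call read directly.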
System Time Estimates
By default, the hiprof tool uses instruction counts and omits system time from its estimates of execution time. However, passing the −cputime flag in the −toolargs flag to hiprof’s instrumentation routine causes the instrumentation routine to use the hardware cycle counter to measure both user and system CPU time. If you specify the −calltime flag in the −toolargs flag on the atom command line, you can also specify the −systime flag (either in −toolargs or in the HIPROF_ARGS environment variable) to incorporate cycle counter estimates of system time into instruction count estimates of user time. You can exclude user execution time from the profile by using the −nousr flag in the −toolargs flag at instrumentation time.
Multiple Processes and Threads
When a program calls fork, an additional output file is created for the new child process. The child’s output file reports only the execution time used by the child process following the fork. The parent’s output file reports the execution time of the parent process both before and after the fork. Similarly, when a threaded application creates a new thread, a separate profile is created for that thread.
If a process calls exec and the exec succeeds, then all execution time statistics from the creation of the process up to the exec are lost. This occurs because the profile statistics are lost when the exec overwrites the address space. For the most part, this is not a problem because calls to exec are usually immediately preceded by a fork. If the program being invoked by the exec call is instrumented, then the execution time of the process following the exec is reported in that new program’s output file.
Recursion
Recursion causes complications for hierarchical profilers because the call graph is not a tree. The hiprof tool uses a heuristic to map the times from a cyclic graph to a hierarchical report. While the application runs, hiprof dynamically detects edges that close cycles in the call graph. Then, hiprof breaks the cycle by stopping the clock for all edges in the cycle. Edges that close cycles in the call graph are marked in the text-format report (generated when the −textout flag is specified in the −toolargs flag or in the HIPROF_ARGS environment variable) with a ’+’ character and will have zero time assigned to them.
Although the above heuristic produces the most intuitive reports, it can be inefficient for some programs that are highly recursive. A simpler algorithm can be invoked by including −fastrecur in the −toolargs flag to the atom command line or in the definition of the HIPROF_ARGS environment variable. In the simpler algorithm, the clock is stopped only for the edge closing the cycle. All of the other edges in the cycle continue to accumulate time, with the result that the times of the edges leaving a node can sum to more than the execution time of the program.
Algorithm
Although hiprof’s output format was modeled after gprof’s PC-sampling format, its algorithms (except in the default mode) are different. A couple of improvements result. For example, the amount of time spent by a child procedure on behalf of its parent is measured rather than estimated, as it is in PC-sampling. Unlike profilers based on pixie, both the source and destination of indirect calls can be reported.
The hiprof tool dynamically constructs the procedure call graph during the execution of the program. This allows the profiler to handle indirect calls that would otherwise be ambiguous from a static analysis of the program. Nodes in the graph represent procedures, and arcs between nodes represent procedure calls. During the execution of the program, the profiler maintains a model of the procedure call stack. When a procedure is called, the profiler pushes the identity of the called procedure and the time of the call onto its stack. When a procedure returns, the profiler pops the top entry off its simulated stack. The difference in the times of the call and return gives the time spent in the called procedure and all of its descendants.
The algorithm performs a test to avoid double-counting time when recursion occurs. If multiple calls to the same procedure are outstanding simultaneously, the profiler times only the first call.
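The simulated call stack and the recursion test can be sketched as follows. This Python is a minimal model of the description above, not hiprof's source: the class name ShadowStackProfiler and its methods are hypothetical, and the clock value passed as now (an instruction count or a cycle counter reading) is assumed to be supplied by the instrumentation.

```python
from collections import defaultdict


class ShadowStackProfiler:
    """Model of a profiler that keeps its own copy of the call stack."""

    def __init__(self):
        self.stack = []                      # entries: (procedure, call_time, timed)
        self.outstanding = defaultdict(int)  # active calls per procedure
        self.inclusive = defaultdict(int)    # time in a procedure and its descendants
        self.arcs = defaultdict(int)         # (caller, callee) -> number of calls

    def call(self, proc, now):
        caller = self.stack[-1][0] if self.stack else None
        self.arcs[(caller, proc)] += 1       # call graph is built dynamically
        timed = self.outstanding[proc] == 0  # time only the first outstanding call
        self.outstanding[proc] += 1
        self.stack.append((proc, now, timed))

    def ret(self, now):
        proc, call_time, timed = self.stack.pop()
        self.outstanding[proc] -= 1
        if timed:
            self.inclusive[proc] += now - call_time


profiler = ShadowStackProfiler()
profiler.call("main", 0)
profiler.call("fact", 10)
profiler.call("fact", 20)   # recursive call: recorded as an arc, but not timed
profiler.ret(30)
profiler.ret(50)
profiler.ret(60)
```

In this trace the inner call to fact contributes nothing on its own, so fact is charged 40 units (from time 10 to time 50) rather than 50.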
FILES
appl_prog.hiprof
Default name for instrumented version of appl_prog
appl_prog.hiout
Default name of profile output file
BUGS
If the cycle counter is used to measure the execution time of a procedure and the procedure call executes more than 2^32 cycles without making another procedure call, the reported execution time for that procedure will be too small because the wraparound of the 32-bit cycle counter is not detected. Wraparound may also occur if not all procedures or shared libraries are profiled. Consequently, when you specify the −cputime flag, you should also specify the −all flag.
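The arithmetic behind this limitation can be shown with a short Python sketch. It assumes, for illustration only, that elapsed time is computed by subtracting two 32-bit readings modulo 2^32; the function names and the 500 MHz clock rate are hypothetical and are not taken from hiprof.

```python
COUNTER_BITS = 32
MODULUS = 1 << COUNTER_BITS   # 2^32 = 4294967296


def read_counter(true_cycles):
    """A 32-bit counter exposes only the low 32 bits of the true cycle count."""
    return true_cycles % MODULUS


def measured_elapsed(start_reading, end_reading):
    """Elapsed cycles as seen through the 32-bit counter (assumed method)."""
    return (end_reading - start_reading) % MODULUS


def wrap_period_seconds(clock_hz):
    """Time the counter takes to wrap at a given clock rate."""
    return MODULUS / clock_hz


start = 4_000_000_000

# A short interval that crosses the wrap point is still measured correctly.
short_run = measured_elapsed(read_counter(start), read_counter(start + 500_000_000))

# An interval longer than 2^32 cycles loses a full counter period.
long_run = measured_elapsed(read_counter(start), read_counter(start + MODULUS + 1000))

period = wrap_period_seconds(500e6)   # hypothetical 500 MHz clock
```

Under these assumptions, a procedure that really ran for 2^32 + 1000 cycles is reported as having run for 1000 cycles, and at 500 MHz the counter wraps roughly every 8.6 seconds.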
If a deadlock occurs in a multithreaded application that has signal handlers in instrumented objects, you can use any one of the following flags on the atom command line to avoid the problem:
• Specify the −excobj flag to avoid instrumenting the object
• Specify the −toolargs="−exc procname" flag to avoid instrumenting the signal-handling procedure
• Specify the −toolargs="−calltime" flag to enable deadlock detection (and thereby ignore the signal-handling procedures)
A maximum of 1024 threads is allowed.
SEE ALSO
atom(1), gprof(1), cc(1), dxprof(1) (dxprof(1) is available only if the Developer’s Tool Kit and associated reference pages are installed on your system.)
Programmer’s Guide