⇒ tha(1) — Sun WorkShop 3.0.1

tha(1)

NAME

tha − view graphs and tables from a traced MT program

SYNOPSIS

tha [ -V ] [ instr-dir ] [ -exec executable ]

DESCRIPTION

Thread Analyzer (tha) is a graphical user interface for viewing trace information in various ways.  One of its modes of display produces tables which look just like prof and gprof on a per-thread basis.  It can display information as tables of metrics (for example, CPU time) or as graphs of these metrics over the lifetime of the program.  Other metrics are based on read and write system calls and on libthread synchronization primitives. 

-V (or -v)
Causes Thread Analyzer to print out its version number and exit. 

-exec
Specifies the executable which was used to create the trace directory. 

Any other argument (aside from X toolkit options) is taken to be the name of a trace directory to load.  If a trace directory name is not provided, it can be selected using a file chooser from within the tool. 

To generate trace information for use with Thread Analyzer, use the -Ztha option at link time.  You can also use -Ztha at compile time with the cc, CC, and f77 compilers to generate trace information when the program is run.  (See EXAMPLES below.) 

Thread-level metrics based on MT synchronization calls and file operations can be collected by simply linking a program with -Ztha.  Metrics can additionally be collected for user functions by compiling the files containing those functions with the -Ztha option.  If any of the .o files in a program were compiled with -Ztha, then the program must be linked with -Ztha.  Omitting -Ztha at link time in that case causes a link-time error. 

The -Ztha flag automatically turns on the -mt flag.  Non-threaded programs can be instrumented with -Ztha, but you must be able to compile them using the -mt switch. 

References to the program cc (the C compiler) in this man page apply equally well to CC (the C++ compiler) or f77 (the Fortran compiler). 

When an instrumented program is run, it creates a directory called tha.pid in the current working directory to hold trace data about that particular run. Running the command

   % tha tha.nnn
 

brings up the Thread Analyzer on the trace data collected from the corresponding run of the program, where nnn is the process ID (PID) of the traced run.
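Because the PID differs on every run, it can be convenient to pick the newest trace directory automatically. The following is only a sketch: latest_trace_dir is a hypothetical helper, not part of tha, and it assumes that nothing other than trace directories matches tha.* in the chosen directory.

```shell
#!/bin/sh
# latest_trace_dir is a hypothetical helper, not part of tha.
# It prints the most recently modified tha.* entry under DIR
# (default: the current directory), or nothing if there is none.
latest_trace_dir() {
    dir=${1:-.}
    # -t sorts newest first; -d lists directories themselves, not their contents.
    ls -td "$dir"/tha.* 2>/dev/null | head -n 1
}
```

With this defined, `tha "$(latest_trace_dir)"` would load the trace from the most recent run in the current directory.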

The instrumented executable must be available the first time that Thread Analyzer is run on a particular trace directory, so that the symbols can be read.  The symbols and their values are then cached inside the trace directory, allowing the original binary to be removed or moved, if desired.  Currently the program is only automatically searched for in the same directory as the trace directory, and only the first 14 characters of the program name are stored.  If the program isn’t in the right directory, or the name is too long, then use the -exec option to specify a full or relative pathname to the executable. 
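The rule above can be sketched as a small shell function. This is an illustration under stated assumptions, not part of tha: tha_command is a hypothetical name, and "the same directory as the trace directory" is read here as the directory that contains the trace directory. The function only prints the command line it would suggest.

```shell
#!/bin/sh
# tha_command is a hypothetical helper, not part of tha.
# Usage: tha_command TRACE_DIR EXECUTABLE
# Prints a tha command line, adding -exec when the automatic search
# described above would not be expected to find the executable.
tha_command() {
    trace_dir=$1
    exe=$2
    name=$(basename "$exe")
    # Assumption: tha searches the directory that contains the trace directory.
    search_dir=$(dirname "$trace_dir")
    if [ "${#name}" -le 14 ] && [ -f "$search_dir/$name" ]; then
        echo "tha $trace_dir"
    else
        echo "tha -exec $exe $trace_dir"
    fi
}
```

For example, `tha_command tha.1294 /export/bin/prog` prints `tha -exec /export/bin/prog tha.1294` unless a file named prog also sits next to tha.1294.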

USER INTERFACE

Main Window
The main window displays a tree representing all the threads and function calls in the program. The Load button can be used to select a trace directory, if Thread Analyzer is brought up without specifying the trace directory as a command line argument.  The View menu is used to display graph windows and tables which show the various metrics available, and to apply filters.  The general way to use the View menu is to select a node by clicking the SELECT mouse button on it, and then select an option from the View menu.  Some of the View options will then display a table from which to select one or more metrics to be shown. 

When the tree is first displayed, some nodes might be abbreviated: descendant lines are drawn, but the child nodes themselves are not shown.  At any point, a node can be compressed (hiding its children) or expanded (showing its children) by clicking the right mouse button on it.  Individual nodes may be hidden by clicking the middle mouse button on them.  A node which is currently selected will never be hidden. It must be deselected first by clicking on the tree display background or by selecting another node. 

Another way of modifying the display of the tree is by using a filter based on percentage of CPU time. 

A program compiled without the -Ztha option will not contain trace information about user functions, so when tha is run on its trace directory, no function nodes will be displayed. 

Filter Popups
When either of the two filter options is selected from the View menu, a popup is displayed which can be used to filter either the thread-level nodes or the function-level nodes, depending on which View menu option was selected.  Nodes which do not meet the minimum criterion set in the popup will be hidden. 

Graph Windows
Graph windows can show the value of a metric over time.  Several metrics may be selected for the same graph window.  At most ten (10) graph windows may be displayed at one time; an attempt to open more produces a warning.  At the function level, only the CPU time metric can be graphed.

gprof Tables
gprof tables are available at both the function and thread levels.  The layout of the table that comes up resembles gprof(1).  A gprof table at the function level will show information only due to one particular thread, that is, the thread above it in the tree display. 

prof Tables
prof tables are available at the function, thread and program level. All prof tables at a certain level will look the same.  Selecting a node for use by the prof option is effectively selecting a level at which you would like to apply it.  The function level prof table has an entry for each thread/function combination.  The thread-level prof table has an entry for each thread (with all function calls made by thread added up).  The program level prof table shows the total time for the entire program.  (This is one of the less informative tables.) 

Sorted Metric Table
The sorted metric table shows an arbitrary metric in the same fashion as the prof table. The display is either per-program, per-thread, or per-thread/function combination, depending on the level of the node which was selected when the table was brought up.

Metric Table
The metric table can show multiple metrics at once, for whichever object is selected.  If a function node is selected, then the metrics are shown for calls of that function made by the thread which is the node’s parent.

METRICS

Thread Analyzer displays graphs and tables which show one or more measured statistics over the wall clock execution time of the program.  The metrics include read/write system call statistics such as read-ops/sec, read-bytes/sec, read+write-ops/sec, and so forth. There are also metrics based on percentage of wall-clock time spent blocked on each major form of synchronization (mutex_lock, cond_wait, cond_timedwait, sema_wait, thr_join, or the sum of these).  Each of these metrics can be graphed on a per-thread basis. 

Metrics apply to objects. An object can be an entire program, a single thread, or a single function.  If a metric refers to a single function, then it refers to all the times that function was called by the thread.  In other words, function-level metrics are implicitly divided up according to the thread that made the call. 

When a metric is collected for a function, the measurement is not normally accumulated into the parent function (the caller of the function).  For example, CPU time for a function will only reflect time spent in that function, and will not reflect time spent in the functions that it calls.  The exception to this is the way uninstrumented functions are treated.  When an instrumented function calls an uninstrumented function, the metrics accumulated by the uninstrumented function are attributed to the nearest preceding instrumented function in the call chain. 
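The attribution rule can be illustrated with a short shell sketch. It models only the rule as stated above and says nothing about how tha implements it. The notation is hypothetical: each argument is one frame of a call chain, outermost first, and a trailing "*" marks a function compiled with -Ztha.

```shell
#!/bin/sh
# charged_to is a hypothetical illustration, not part of tha.
# Arguments: call-chain frames, outermost first; a trailing "*" marks an
# instrumented function.  Prints the function that would be charged for
# metrics accumulated in the innermost frame.
charged_to() {
    owner=
    for frame in "$@"; do
        case $frame in
            # Instrumented frame: it becomes the current owner.
            *\*) owner=${frame%\*} ;;
        esac
    done
    echo "$owner"
}

charged_to 'main*' 'parse*' helper strlen
```

Here helper and strlen are uninstrumented, so time spent in them is charged to parse, the nearest instrumented function above them in the chain.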

The word object in the list below refers to either a thread or a function, as represented by a node in the tree display. 

CPU Time
This metric measures the amount of time an object was scheduled by the operating system, and running on a CPU. 

Wall Clock Time
This metric measures the amount of time between when an object was created and when it was destroyed.  The obvious difference from CPU Time is that when a thread is suspended (for instance waiting for a mutex), the thread is still accumulating Wall Clock time even though it is not taking up CPU time. 

Mutex lock wait time
This metric measures the wall-clock time an object remains suspended waiting to acquire a mutex lock. 

Semaphore wait time
This metric measures the wall-clock time an object remains suspended waiting to acquire a semaphore. 

Condition variable wait time
This metric measures the wall-clock time an object remains suspended waiting to be signalled on a condition variable.

RW read-lock wait time
This metric measures the wall-clock time an object remains suspended waiting to acquire a read lock on a Reader/Writer style lock.

RW write-lock wait time
This metric measures the wall-clock time an object remains suspended waiting to acquire a write lock on a Reader/Writer style lock.

Join wait time
This metric measures the wall-clock time an object remains suspended in the thr_join function waiting for another thread to terminate. 

Total Sync wait time
This metric measures the total wall-clock time spent waiting on any of these six forms of thread synchronization. 

Writes/Sec
This metric measures the number of write system calls made per second by a certain object. 

Reads/Sec
This metric measures the number of read system calls made per second by a certain object. 

Write Bytes/Sec
This metric measures the number of bytes written per second by write system calls. 

Read Bytes/Sec
This metric measures the number of bytes read per second by read system calls. 

File IO Ops/Sec
This metric measures the combined number of read and write system calls made by an object per second. 

File IO Bytes/Sec
This metric measures the number of bytes read or written per second by an object via the read and write system calls. 

Read Wait Time
This metric measures the amount of time spent blocked on read system calls. 
 

EXAMPLES

 
Here’s how to compile a simple program to collect both function and thread-level data:

    % cc -Ztha -o prog prog.c -lthread

To collect only thread-level metrics:

    % cc -c prog.c
    % cc -Ztha -o prog prog.o -lthread

To collect function-level metrics on select files (functions in prog2.c will not emit trace information):

    % cc -Ztha -c prog1.c
    % cc -c prog2.c
    % cc -Ztha -c prog3.c
    % cc -Ztha -o prog prog1.o prog2.o prog3.o -lthread

Then, to see the results:

    % ./prog          (the PID was 1294 for this run)
    % ls tha.*
    tha.1294
    % tha -exec prog tha.1294

 

CAVEATS

Wall-clock time for a thread is measured from the point where the thread is created with thr_create() until the thread terminates itself with thr_exit().  Wall-clock time is accumulated even if the thread is suspended by a synchronization function or if the thread’s LWP is swapped out by the OS. 

This means that some metric values will look a little odd at first.  For example, consider a program which creates 5 threads on startup, runs for 5 minutes, then terminates all 5 threads and exits.  This program will show 25 minutes of wall-clock time: 5 threads for 5 minutes each.  This is true even if only one thread actually runs, and even on a single-CPU machine. 
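The arithmetic behind that figure is a plain sum of per-thread lifetimes. As a minimal sketch (total_wall_minutes is a hypothetical name used only for illustration):

```shell
#!/bin/sh
# total_wall_minutes is a hypothetical illustration, not part of tha.
# Arguments: the lifetime of each thread, in minutes.
# Prints the sum, which is the wall-clock total described above.
total_wall_minutes() {
    total=0
    for lifetime in "$@"; do
        total=$((total + lifetime))
    done
    echo "$total"
}

total_wall_minutes 5 5 5 5 5   # prints 25
```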

The color-coded graph window doesn’t work well on black and white displays. 

There is currently no way to get metrics based on all calls to a function, across threads.  For example, the CPU time spent in function foo() over the entire program cannot be displayed. 

Thread Analyzer cannot switch to a second trace directory without quitting the tool and restarting it. 

There is no way to save either tabular or graphical output to a file. 

Each instrumented call will record 32 bytes of information into a data file at run time.  This happens on calls to mutex_lock, cond_wait, cond_timedwait, thr_join, rw_rdlock, rw_wrlock and sema_wait, as well as read and write system calls.  A record is written when libthread actively suspends one thread and schedules another.  A record is also written for each call to a function that was compiled with -Ztha.  This can cause CPU intensive programs with many short function calls to quickly consume large amounts of disk space.  If this problem is encountered, it can sometimes be alleviated by removing the -Ztha flag from the compilation of files with lots of small, frequently called functions. 
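A rough estimate of trace size follows directly from the 32-byte figure. The sketch below assumes every record is exactly 32 bytes and counts only the calls given as its argument, so it is a lower bound: records written for thread switches are not included. trace_bytes is a hypothetical name, not part of tha.

```shell
#!/bin/sh
# trace_bytes is a hypothetical illustration, not part of tha.
# Argument: number of instrumented calls made during a run.
# Prints the minimum number of bytes of trace data those calls produce.
RECORD_BYTES=32
trace_bytes() {
    echo $(( $1 * RECORD_BYTES ))
}

trace_bytes 1000000   # prints 32000000
```

By this estimate, a small function called a million times adds about 32 MB to the trace directory.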

BUGS

Due to the nature of the instrumentation, applications which create large numbers of threads will perform poorly. 

Thread Analyzer does not support programs with overly large stack depths.  If more than 1000 functions are active at once in any one thread, Thread Analyzer will fail. 

Due to the nature of the display, applications with a large number (>1000) of threads or instrumented functions will not be easy to manipulate graphically.  Filtering either threads or functions by CPU time can help alleviate this problem. 

Because of the way synchronization time is measured, threads which don’t ever actually block on synchronization objects may still show a very small but non-zero value.  This would falsely indicate that threads had blocked. 

Because of the way the instrumentation is implemented, there are some cases where per-function (compile-time) instrumentation should not be used.  Signal handler functions should not be instrumented, and if an application has a private version of the "malloc" routine, then it also should not be compiled with -Ztha.  The instrumentation uses malloc() to allocate memory. 

If you are using C++ to create shared libraries, then you need to be careful not to use -Ztha on source files with static constructors in them.  These constructors will end up trying to record data before the data collection code has been initialized, causing your program to seg fault.  The most common thing that creates static constructors is including <stream.h>, unfortunately. 

There are known problems when instrumenting C++ templates. 

There are occasionally round-off errors which cause the least significant digit of a "total" not to match the sum of its component numbers. 

Instrumenting shared libraries (.so files) has not been adequately tested and may cause problems.  We recommend not compiling files with -Ztha if they are going into a shared library.  As a workaround, statically link your libraries (i.e., create .a files) while you are performance-profiling your application. 

PROGRAM TERMINATION

The recommended way for a threaded program to terminate itself is by calling thr_exit() from the function main().  Calling exit() or _exit(), or returning from main() as a regular function will result in the abrupt termination of any currently running threads. 

If a program aborts while other threads are actively running, the trace directory can be left in an inconsistent state.  This sometimes happens when interrupting a program with ctrl-C.  Terminating a program by calling _exit() will corrupt the trace directory, and calling exit() when there are other threads logging trace records may do so. 

Calling exit() when other threads are suspended on mutexes or condition variables should work fine. 

Interrupting a running program with a ctrl-C may also leave the trace directory in a corrupted state. 

SEE ALSO

Read the file /opt/SUNWspro/READMEs/impact for last-minute release notes or product notes.  (If you have installed SPARCcompilers somewhere other than /opt, the file will be $(INSTALL)/SUNWspro/READMEs/impact.) 

Thread Analyzer  —  Last change: 30 Aug 96
