⇒ ddopt(1) — IRIX 6.5.3f

DDOPT(1)                                                              DDOPT(1)



NAME
     ddopt - MIPS Data-Dependency-based Optimizer

SYNOPSIS
     ddopt unopt_file opt_file [ -v -mips3 -hostcache -cachesz  size ]

DESCRIPTION
     ddopt, the MIPS data-dependency-based optimizer, reads the input binary
     ucode file on a procedure by procedure basis, performs loop-based
     transformations on each outer-most loop nest in each procedure and
     outputs the optimized binary ucode file.  By convention, it takes a
     binary ucode file with the extension .B or .M as input and outputs a
     binary ucode file with the extension .D.  In the compilation process,
     ddopt runs after the front-end, after uld and usplit, and before umerge,
     uopt and ugen.  Currently, ddopt only takes ucode files generated from
     FORTRAN.

     ddopt borrows optimization techniques that originated from compilers for
     supercomputers and adapts them to apply to scalar machines.  It performs
     high-level analysis on the behavior of array accesses in loops, deriving
     what we call data dependency information.  Numerous optimization
     transformations on the program code are performed based on such
     information (and thus the name ddopt).  The transformations are
     invariably associated with program loops that operate on arrays.
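
     As an illustration of what data dependency information can look like,
     the following is a minimal sketch, not ddopt's actual analysis.  It
     assumes a single loop over i with unit stride and references of the
     form a[i + offset]; the names Ref and classify are hypothetical.  The
     four kinds returned (true, anti, output, input) use the standard
     compiler terminology that also appears in several option names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """One reference to a[i + offset] inside a loop over i (unit stride)."""
    offset: int
    is_write: bool


def classify(first, second):
    """Classify the dependence between two references to the same array.

    `first` executes before `second` within one iteration of the loop.
    Returns (kind, distance): the later reference (the sink) touches the
    same element `distance` iterations after the earlier one (the source).
    A distance of 0 means the dependence is loop-independent.
    """
    # Iteration i of `first` touches element i + first.offset, and iteration
    # j of `second` touches j + second.offset.  They meet when j - i == d.
    d = first.offset - second.offset
    if d >= 0:
        src, snk = first, second
    else:
        src, snk, d = second, first, -d

    if src.is_write and snk.is_write:
        kind = "output"   # write followed by write
    elif src.is_write:
        kind = "true"     # write followed by read
    elif snk.is_write:
        kind = "anti"     # read followed by write
    else:
        kind = "input"    # read followed by read
    return kind, d
```

     For the statement a[i] = a[i-1] + 1, the read of a[i-1] and the write
     of a[i] give classify(Ref(-1, False), Ref(0, True)), which reports a
     true dependence carried across one iteration.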

     There are different kinds of transformations performed by ddopt that
     benefit program performance:

     1. Those that reduce memory references.  Techniques include re-using
     array references that have been allocated to registers (register
     allocation for array references) and moving array references and
     assignments outside loops.

     2. Those that improve locality of memory references (thus reducing data
     cache misses).  Techniques include changing the order of loop nests (loop
     interchange) and partitioning loop iterations to operate on smaller
     sections of the array (strip-mining).

     3. Those that reduce floating-point interlocks and promote greater
     parallelism among floating-point operations by creating larger pieces of
     straight-line code in loops.  Techniques include unrolling and
     unroll-and-jam (unrolling the outer loop and jamming the resulting
     copies of the inner loop into one bigger loop).
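
     The three kinds of transformation can be shown by hand on a small loop
     nest.  The sketch below is illustrative only: it is written in Python
     rather than FORTRAN, the function names and the strip parameter are
     hypothetical, and it does not claim to reproduce the code ddopt
     generates.  It applies strip-mining (with the strip loop moved
     outermost), unroll-and-jam by a factor of 2, and reuse of array
     elements held in scalars to a matrix-vector product.

```python
def matvec_plain(a, x):
    """Reference loop nest: y[i] = sum over j of a[i][j] * x[j]."""
    n, m = len(a), len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            y[i] = y[i] + a[i][j] * x[j]
    return y


def matvec_transformed(a, x, strip=4):
    """The same computation after three hand-applied transformations."""
    n, m = len(a), len(x)
    y = [0.0] * n
    # Strip-mining: the j loop is split into strips of `strip` iterations,
    # and the strip loop is placed outermost so that one section of x is
    # reused for every row before the next section is touched.
    for js in range(0, m, strip):
        je = min(js + strip, m)
        # Unroll-and-jam: the i loop is unrolled by 2 and the two copies of
        # the inner j loop are fused into one bigger loop body.
        i = 0
        while i + 1 < n:
            # y[i] and y[i+1] do not vary with j, so they are kept in
            # scalars (standing in for registers) across the inner loop.
            y0, y1 = y[i], y[i + 1]
            for j in range(js, je):
                xj = x[j]  # a single load of x[j] serves both rows
                y0 = y0 + a[i][j] * xj
                y1 = y1 + a[i + 1][j] * xj
            y[i], y[i + 1] = y0, y1
            i += 2
        # Cleanup iteration for the last row when n is odd.
        if i < n:
            y0 = y[i]
            for j in range(js, je):
                y0 = y0 + a[i][j] * x[j]
            y[i] = y0
    return y
```

     Each y[i] still accumulates its terms in increasing order of j, so both
     functions perform the same floating-point additions in the same order
     and return identical results.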

     There are other optimizations that ddopt performs only to create more
     opportunities for the above transformations:  local common
     subexpression elimination, secondary index variable elimination,
     constant propagation, copy propagation, constant folding, jump folding
     and dead code elimination.  Some of these optimizations duplicate the
     optimizations performed in uopt.  These optimizations are applied
     iteratively until there is no more change to the code, and they precede
     the data-dependency-based analyses and transformations.
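
     The idea of applying such passes repeatedly until the code stops
     changing can be sketched as follows.  This is a toy model, not ddopt's
     implementation: it assumes straight-line code in which each statement
     is a hypothetical (dest, op, args) tuple, and it covers only
     constant/copy propagation, constant folding and dead code elimination.

```python
def propagate(code):
    """Constant and copy propagation over straight-line code."""
    known, out = {}, []
    for dest, op, args in code:
        # Replace each operand by the constant or variable it is known to copy.
        args = tuple(known.get(a, a) for a in args)
        # Assigning to dest invalidates every fact that mentions dest.
        known = {k: v for k, v in known.items() if k != dest and v != dest}
        if op == "copy" and args[0] != dest:
            known[dest] = args[0]
        out.append((dest, op, args))
    return out


def fold(code):
    """Constant folding: evaluate add/mul whose operands are all constants."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    out = []
    for dest, op, args in code:
        if op in ops and all(isinstance(a, int) for a in args):
            op, args = "copy", (ops[op](*args),)
        out.append((dest, op, args))
    return out


def eliminate_dead(code, live_out):
    """Dead code elimination: drop assignments whose result is never used."""
    live, kept = set(live_out), []
    for dest, op, args in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(a for a in args if isinstance(a, str))
            kept.append((dest, op, args))
    return kept[::-1]


def cleanup(code, live_out):
    """Apply the passes repeatedly until the code no longer changes."""
    while True:
        new = eliminate_dead(fold(propagate(code)), live_out)
        if new == code:
            return code
        code = new


example = [
    ("n", "copy", (4,)),
    ("k", "add", ("n", 1)),
    ("m", "copy", ("k",)),
    ("r", "mul", ("m", "x")),
    ("u", "add", ("n", "n")),  # result never used
]
```

     Calling cleanup(example, {"r"}) needs more than one round: folding
     k = 4 + 1 into a constant in the first round is what allows the constant
     to be propagated into the multiplication in the second, leaving the
     single statement r = 5 * x.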

     The following options are interpreted by ddopt. Options starting with -X
     are not recognized by the compiler driver, and have to be passed to ddopt
     via -Wd,... .

     -v      Turns on verbose mode.  In this mode, ddopt will print the name
             of the procedure it is currently optimizing.

     -mips3  Tells ddopt that the target machine uses the MIPS3 instruction
             set.

     -hostcache
             Tells ddopt to assume that the target machine has the same data
             cache size as the host machine, so that it can obtain the cache
             size through a system call.

     -cachesz  size
             Gives ddopt the data cache size of the target machine, in bytes.
             The default is 8192 bytes.

     -Xbldgr Dumps the computed data dependency information, for debugging
             purposes.

     -Xbboptoff
             Turns off the conventional global optimizations that precede the
             data-dependency-related transformations.

     -Xbf  size
             Changes the blocking factor used by ddopt in strip-mining.  The
             default is 36 bytes.

     -Xdump  Tells ddopt to dump the original and transformed program in a
             compact, close-to-source-level format.

     -Xdosizethreshold  count
             If the number of statements in a DO loop exceeds this number,
             that DO loop is excluded from transformation by ddopt. The
             default is 150.

     -Xgcopyoff
             Turns off global copy propagation.

     -Xinteroff
             Turns off loop interchange.

     -Xindepregoff
             Turns off loop-independent dependence register allocation.

     -Xinputregoff
             Turns off input dependence register allocation.

     -Xinvarregoff
             Turns off loop-invariant register allocation.

     -Xlcopyoff
             Turns off local copy propagation.

     -Xmergepiblockoff
             Disallows the merging of pi-blocks created for statements in the
             same basic blocks.

     -Xmoreunrolljam
             By default, unroll-and-jam is performed only on inner loop nests
             that come out of strip-mining.  This flag removes this
             restriction and tells ddopt to perform unroll-and-jam whenever
             it judges the transformation advantageous.

     -Xmaxintregs
             Tells ddopt the number of integer registers available in the
             underlying machine.  The default is 32.

     -Xmaxfloatregs
             Tells ddopt the number of floating-point registers available in
             the underlying machine.  The default is 16.

     -Xofffoo
             Turns off all transformations for the named procedure ("foo"
             in this case).

     -Xoutputregoff
             Turns off output dependence register allocation.

     -Xoverallocate
             Tells ddopt to perform register allocation without regard to the
             number of registers available in the underlying machine.

     -Xstripoff
             Turns off strip-mining.

     -Xstriponly
             Tells ddopt to perform strip-mining but to prevent the newly
             formed loops from being interchanged into a deeper region of the
             loop nest.  For debugging purposes only.

     -Xstat  Prints optimization statistics: the line numbers at which the
             various transformations were applied and the number of times
             each was applied.

     -Xtrueregoff
             Turns off true dependence register allocation.

     -Xunrolloff
             Turns off loop unrolling.

     -Xunrolljamoff
             Turns off unroll-and-jam.

     -Xunrollthreshold  count
             Sets the threshold that limits the extent to which unrolling can
             be performed without causing the number of statements in the loop
             to exceed this number.  The default is 180.

     -Xunrolltimes  count
             Sets the maximum number of times to unroll a loop.  The default
             is 4.
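
     One plausible reading of how the three size-related limits interact is
     sketched below.  This is an assumption made for illustration: the text
     above does not state the exact rule ddopt uses, the function name
     unroll_factor is hypothetical, and "times" is interpreted here as the
     number of copies of the loop body after unrolling.  Only the default
     values are taken from the option descriptions.

```python
def unroll_factor(stmts, unroll_times=4, unroll_threshold=180,
                  do_size_threshold=150):
    """Pick an unroll factor for a DO loop with `stmts` statements.

    Assumed model (not confirmed by the manual page):
      * a loop larger than do_size_threshold is not transformed at all;
      * the unrolled body may not exceed unroll_threshold statements;
      * the factor never exceeds unroll_times.
    A factor of 1 means the loop is left as it is.
    """
    if stmts <= 0:
        raise ValueError("a loop must contain at least one statement")
    if stmts > do_size_threshold:
        return 1
    # Largest number of body copies that still fits under the threshold.
    fits = unroll_threshold // stmts
    return max(1, min(unroll_times, fits))
```

     Under this model a 10-statement loop is unrolled 4 times, a
     60-statement loop only 3 times (3 x 60 = 180), and a 100-statement
     loop is left alone because two copies would exceed 180 statements.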

SEE ALSO
     ucode(1), uopt(1), btou(1), ppu(1)

DIAGNOSTICS
     ddopt assumes the input ucode file is error-free.