⇒ migration(5) — IRIX 6.5.3f

migration(5)                                                      migration(5)



NAME
     migration - dynamic memory migration

DESCRIPTION
     This document describes the dynamic memory migration system available in
     Origin systems.


   Introduction
     Dynamic page migration is a mechanism that provides adaptive memory
     locality for applications running on a NUMA machine such as the Origin
     systems. The Origin hardware implements a competitive algorithm based on
     comparing remote memory access counters to a local memory access counter;
     when the difference between the numbers of remote and local accesses goes
     beyond a preset threshold, an interrupt is generated to inform the
     Operating System that a physical memory page is currently experiencing
     excessive remote accesses.


     Within the interrupt handler, the Operating System makes the final
     decision as to whether to migrate the page or not. If it decides to
     migrate the page, the migration is executed immediately. The system may
     decide not to execute the migration due to enforcement of a migration
     control policy or due to lack of resources.


     Page migration can also be explicitly requested by users, and in
     addition, it is used to assist the memory coalescing algorithms for
     multiple page size support.


   Migration Modules
     The migration subsystem is composed of the following modules:

     - Detection Module. This module monitors memory accesses issued by nodes
       in the system to each physical memory page. In Origin systems this
       module is mostly implemented in hardware. This detection module informs
       the Migration Control Module that a page is experiencing excessive
       remote accesses via an interrupt sent to the page's home node.

     - Migration Engine Module. This module carries out the actual movement of
       data from its current physical memory page to a new page in the node
       issuing the remote accesses.

     - Migration Control Module. This module decides whether the page should
       be migrated or not, based on migration control policies, defined by
       parameters such as migration threshold, bounce detection and
       prevention, dampening factor, and others.

     - Migration Control Periodic Operations Module. This module executes all
       periodic operations needed for the Migration Control Module.

     - Memory Management Control Interface Module (MMCI Module). This module
       provides an interface for users to tune the migration policy associated
       with an address space.


   Migration Detection Module
     The basic goal of memory migration is to minimize memory access latency.
     In a NUMA system where local memory access latency is smaller than remote
     memory access latency, we can achieve this latency minimization goal by
     moving the data to the node where most memory references are going to be
     issued from.

     It'd be great to be able to move data to the node where they're going to
     be needed right before they're actually needed. Unfortunately, we cannot
     predict the future (yet). Fortunately, common programs usually have some
     amount of temporal and spatial locality, which allows us to heuristically
     predict the future based on the behavior observed during some past period
     of time.

     The usual procedure used to predict future memory accesses to a page is
     to count the memory references to this page issued by each node in the
     system. If the accumulated number of remote references becomes
     considerably greater than the number of accumulated local references,
     then it may be beneficial to migrate the page to the remote node issuing
     the references, especially if this remote node will continue accessing
     this same page for a long time.

     Origin systems have counters that continuously monitor all memory
     accesses issued by each node in the system to each physical memory page.
     In a 64-node Origin (128 processors), we have 64 memory access counters
     for every 4-KB low level physical page (4 KB is the size of a low level
     physical page; software page sizes start at 16 KB for Origin systems).
     For every memory access, the counter associated with the node issuing the
     reference is incremented; at the same time, this counter is compared to
     the counter that keeps track of local accesses, and if the remote counter
     exceeds the local counter by a threshold, an interrupt is generated
     advising the Operating System about the existence of a page with
     excessive remote accesses.
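
     The count-and-compare step described above can be sketched in software
     as follows. This is only an illustration: the real counters are
     implemented in hardware, and the structure, function name, and the use
     of ">=" for the comparison are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64  /* one counter per node, as in a 64-node Origin */

/* Hypothetical per-page counter set; the real counters live in hardware. */
struct page_refcnt {
    uint32_t count[MAX_NODES];
};

/*
 * Record one access to a page whose home is home_node, issued by
 * accessing_node. Returns true when the remote counter exceeds the local
 * counter by at least `threshold`, which is the condition under which the
 * hardware would raise the migration interrupt.
 */
bool record_access(struct page_refcnt *pc, unsigned home_node,
                   unsigned accessing_node, uint32_t threshold)
{
    assert(home_node < MAX_NODES && accessing_node < MAX_NODES);

    pc->count[accessing_node]++;
    if (accessing_node == home_node)
        return false;               /* local accesses never trigger */

    uint32_t remote = pc->count[accessing_node];
    uint32_t local  = pc->count[home_node];
    return remote > local && (remote - local) >= threshold;
}
```

     With a threshold of 3 and one prior local access, the fourth remote
     access from the same node is the first one to report true.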

     Upon reception of the interrupt, the Migration Control Module in the
     Operating System decides whether to migrate the page or not.

     The threshold that determines how large the difference between the remote
     and the local counters needs to be in order for the interrupt to be
     generated is stored in a per-node hardware register, which is initialized
     by the Migration Control Module. Two thresholds are not stored directly
     in this register, because different pages on the same node may have
     different migration thresholds: the default system threshold, defined in
     /var/sysgen/mtune/numa by the tunable variables
     numa_migr_default_threshold and numa_migr_threshold_reference (see Page
     Migration Tunables below), and the threshold specified by users as a
     parameter of a migration policy (mmci(5)). These thresholds are instead
     used to initialize the reference counters when a page is initialized.
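
     As a rough illustration of how the default threshold relates to the
     threshold reference, the sketch below evaluates the expression given in
     the tunables section, (reference / 100) * threshold, where the
     threshold appears to be a percentage. The function name and the use of
     integer division are assumptions of this sketch, not a description of
     the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold reference values listed for Origin 2000 in this page. */
enum {
    THRESHREF_STANDARD = 2048,    /* 11-bit counters */
    THRESHREF_PREMIUM  = 524288   /* 19-bit counters */
};

/*
 * Counter difference needed to raise a migration request, following the
 * expression (reference / 100) * percent. Integer division is an
 * assumption of this sketch.
 */
uint32_t migr_effective_threshold(uint32_t reference, uint32_t percent)
{
    assert(percent <= 100);
    return (reference / 100) * percent;
}
```

     Under these assumptions, a 50 percent threshold on a system with the
     standard reference corresponds to a counter difference of 1000.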


   Migration Engine Module
     This module transparently moves a page from one physical frame to
     another. The migration engine first verifies the availability of all
     resources needed to realize the migration of a page and, if not all of
     them are available, it just aborts the operation.

     The data transfer operation may be done using a processor or a
     specialized Block Transfer Engine. Translation lookaside buffer (TLB)
     shootdowns may be done using inter-processor interrupts or special
     hardware known as poison bits, available only as an option on special
     Origin systems running Cellular Irix 6.5 or later. TLB shootdowns are
     needed in order to avoid the use of stale translations that may be
     pointing to the physical memory page that contained the data before
     migration took place. Normally, a TLB shootdown operation is performed by
     sending interrupts to all processors in the system with a TLB that may
     have stale translation entries. On systems with poison bits, this global
     TLB shootdown is not needed: along with the data transfer operation,
     hardware bits are automatically set to indicate that the page is now
     stale (poisonous); if a processor tries to access this stale page via a
     stale translation, the memory management hardware generates a special Bus
     Error which causes the TLB with the stale translation to be updated.
     Effectively, poison bits allow for the implementation of a lazy TLB
     shootdown algorithm.

     The vehicle used for the data transfer operation may be selected by the
     system administrator via a tunable variable in /var/sysgen/mtune/numa:
     numa_migr_vehicle. Poison bit based TLB shootdowns are enabled whenever
     the data transfer vehicle is the Block Transfer Engine and the hardware
     is equipped with the optional poison bits.
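
     The rule for choosing the TLB shootdown method can be written as a small
     predicate. The vehicle values (0 for the Block Transfer Engine, 1 for
     the processor) are the ones documented for the vehicle tunable; the enum
     and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of the vehicle tunable as described in this page. */
enum migr_vehicle {
    MIGR_VEHICLE_BTE = 0,   /* Block Transfer Engine */
    MIGR_VEHICLE_CPU = 1    /* processor copy */
};

/* Hypothetical helper: true when lazy (poison bit) TLB shootdown applies. */
bool use_lazy_tlb_shootdown(int vehicle, bool has_poison_bits)
{
    assert(vehicle == MIGR_VEHICLE_BTE || vehicle == MIGR_VEHICLE_CPU);
    return vehicle == MIGR_VEHICLE_BTE && has_poison_bits;
}
```

     In every other combination the system falls back to the global
     shootdown based on inter-processor interrupts.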


   Migration Control Module
     This module decides whether a page should be migrated or not, after
     receiving a notification (via an interrupt) from the Migration Detection
     Module alerting that a page is experiencing excessive remote accesses.
     This decision is based on applicable migration control policies and
     resource availability.

     The basic idea behind controlling migration is that it's not always a
     good idea to migrate a page when the memory reference counters are
     telling us that a page is experiencing excessive remote accesses. The
     page may be bouncing back and forth due to ill application behavior; the
     counters may have accumulated too much past knowledge, making them unfit
     to predict near future behavior; the destination node may be very low on
     memory; or the path needed to do the migration may currently be too busy.

     The Migration Control Module applies a series of filters to a reference
     counter notification or migration request, as enumerated below. All
     tunables mentioned in this list are found in /var/sysgen/mtune/numa.


     Node Distance Filter        This filter rejects all migration requests
                                 where the distance between the source and the
                                 destination is less than
                                 numa_migr_min_distance in
                                 /var/sysgen/mtune/numa. All rejected requests
                                 result in the page being frozen in order to
                                 prevent this request from being re-issued too
                                 soon.

     Memory Pressure Filter      This filter rejects migration requests to
                                 nodes where physical memory is low. The
                                 threshold for low memory is defined by the
                                 tunable numa_migr_memory_low_threshold,
                                 which defines the minimum percentage of
                                 physical memory that needs to be available
                                 in order for a page to be migrated there.
                                 This filter can be enabled and disabled
                                 using the tunable
                                 numa_migr_memory_low_enabled.

     Traffic Control Filter      Experimental filter intended to throttle
                                 migration down when the Craylink Interconnect
                                 traffic reaches peak levels. Experiments have
                                 shown that this filter is unnecessary for
                                 Origin 2000 systems.

     Bounce Control Filter       Sometimes pages may start bouncing due to ill
                                 application behavior or simple page level
                                 false sharing. This filter detects and
                                 freezes bouncing pages. The detection is done
                                 by keeping a count of the number of
                                 migrations per page in a counter that is aged
                                 (periodically decremented by a system
                                 daemon). If the count ever goes beyond a
                                 threshold, the page is considered to be
                                 bouncing and is therefore frozen. Frozen
                                 pages start melting immediately, so that
                                 after some period of time, they are unfrozen
                                 and migratable again. Note that the melting
                                 procedure is gradual, not instantaneous. The
                                 bounce control filter relies on operations
                                 executed periodically by the Migration
                                 Control Periodic Operations Module described
                                 below, for a) aging of the migration counters
                                 and b) melting of frozen pages. The period of
                                 these bounce control periodic operations is
                                 defined by the tunable
                                 numa_migr_bounce_control_interval. The
                                 default value for this tunable is 0, which
                                 translates into a period such that 4 physical
                                 pages are operated on per tick (10 [ms]
                                 intervals). Freezing can be enabled and
                                 disabled using the tunable
                                 numa_migr_freeze_enabled, and the freezing
                                 threshold can be set using the tunable
                                 numa_migr_freeze_threshold. This threshold is
                                 specified as a percentage of the maximum
                                 effective freezing threshold value, which is
                                 7 for Origin 2000 systems. Melting can be
                                 enabled and disabled using the tunable
                                 numa_migr_melt_enabled, and the melting
                                 threshold can be set using the tunable
                                 numa_migr_melt_threshold. The melting
                                 threshold is expressed as a percentage of the
                                 maximum effective melting threshold value,
                                 which is 7 for Origin 2000 systems.

     Migration Dampening Filter  This filter minimizes the amount of migration
                                 due to quick temporary remote memory
                                 accesses, such as those that occur when
                                 caches are loaded from a cold state, or when
                                 they are reloaded with a new context. We
                                 implement this dampening filter using a
                                 per-page migration request counter that is
                                 incremented every time we receive a migration
                                 request interrupt, and aged (periodically
                                 decremented) by the Migration Control
                                 Periodic Operations Module. We effectively
                                 migrate a page only if the counter reaches a
                                 value greater than some dampening threshold.
                                 This will happen only for applications that
                                 continuously generate remote accesses to the
                                 same page during some interval of time. If
                                 the application experiences just a short,
                                 transitory sequence of remote accesses, it is
                                 very unlikely that the migration request
                                 counter will reach the threshold value. This
                                 filter can be enabled and disabled using the
                                 tunable numa_migr_dampening_enabled, and the
                                 migration request count threshold can be set
                                 using the tunable numa_migr_dampening_factor.
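
     The filter list above can be read as a chain of checks applied to each
     request. The sketch below models that chain in user-level C. All
     structure and function names are hypothetical, the order of the checks
     simply follows the order of the list, and the experimental Traffic
     Control Filter is left out. It is meant as a reading aid, not as the
     kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the tunables the filters consult. */
struct migr_tunables {
    unsigned min_distance;          /* Node Distance Filter */
    bool     memory_low_enabled;    /* Memory Pressure Filter on/off */
    unsigned memory_low_threshold;  /* minimum free memory, percent */
    bool     dampening_enabled;     /* Migration Dampening Filter on/off */
    unsigned dampening_threshold;   /* request count that must be exceeded */
};

/* Hypothetical description of one migration request. */
struct migr_request {
    unsigned distance;          /* source to destination distance */
    unsigned dest_free_percent; /* free memory on the destination node */
    bool     page_frozen;       /* set by bounce control or a rejection */
    unsigned request_count;     /* per-page dampening counter */
};

enum migr_verdict {
    MIGR_ACCEPT,
    MIGR_REJECT_DISTANCE,
    MIGR_REJECT_MEMORY,
    MIGR_REJECT_FROZEN,
    MIGR_REJECT_DAMPENED
};

enum migr_verdict migr_apply_filters(const struct migr_tunables *t,
                                     struct migr_request *r)
{
    assert(t != NULL && r != NULL);

    if (r->distance < t->min_distance) {
        r->page_frozen = true;   /* distance rejections freeze the page */
        return MIGR_REJECT_DISTANCE;
    }
    if (t->memory_low_enabled &&
        r->dest_free_percent < t->memory_low_threshold)
        return MIGR_REJECT_MEMORY;
    if (r->page_frozen)
        return MIGR_REJECT_FROZEN;
    if (t->dampening_enabled) {
        r->request_count++;      /* count this request interrupt */
        if (r->request_count <= t->dampening_threshold)
            return MIGR_REJECT_DAMPENED;
    }
    return MIGR_ACCEPT;
}
```

     With a dampening threshold of 2, for example, the first two requests
     for a page are dampened and only the third one is accepted, provided
     the counter has not been aged in between.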


     The memory reference counters are re-initialized to their startup values
     after every reference counter interrupt.


   Migration Control Periodic Operations Module
     The Migration Control Module relies on several periodic operations. These
     operations are listed below:

     - Bounce Control Operations. Age migration counters for freezing and
       melting.

     - Unpegging. Reset memory reference counters that have reached a
       saturation level.

     - Queue Control Operations. Age queued outstanding migration requests.
       Experimental, always disabled for production systems.

     - Traffic Control Operations. Sample the state of the Craylink
       interconnect and correspondingly adjust the per-node migration
       threshold. Experimental, always disabled for production systems.

     These operations are executed in a loop, triggered once every
     mem_tick_base_period, a tunable that defines the period of the migration
     control periodic operations in terms of system ticks (a system tick is
     equivalent to 10 [ms] on Origin systems running Irix 6.5). This loop of
     operations may be enabled and disabled using the tunable
     mem_tick_enabled. If migration is enabled or users are allowed to use
     migration, this loop must be enabled.

     In order to minimize interference with user processes, we limit the
     number of pages operated on in a loop to a few pages, trying to limit the
     time used to less than 20 [us]. Administrators can adjust the time
     dedicated to these periodic operations via the following tunables:

     + mem_tick_base_period

     + numa_migr_unpegging_control_interval

     + numa_migr_traffic_control_interval

     + numa_migr_bounce_control_interval
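
     When an interval tunable is left at 0, a fixed number of pages is
     processed per mem_tick (4 for bounce control, 8 for unpegging), so the
     time needed to visit every page of a node grows with the amount of
     memory in that node. The arithmetic can be sketched as follows; the
     function name and the rounding are assumptions, and the page count used
     in the note after the code is only an example.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTEM_TICK_MS 10u   /* one system tick on Origin running Irix 6.5 */

/*
 * Approximate time, in milliseconds, for a periodic operation to visit
 * every page of a node once when a fixed number of pages is processed
 * per mem_tick.
 */
uint64_t full_scan_period_ms(uint64_t pages_in_node,
                             uint32_t pages_per_mem_tick,
                             uint32_t mem_tick_base_period)
{
    assert(pages_per_mem_tick > 0 && mem_tick_base_period > 0);
    /* Round up: a partial batch still costs one mem_tick. */
    uint64_t mem_ticks =
        (pages_in_node + pages_per_mem_tick - 1) / pages_per_mem_tick;
    return mem_ticks * mem_tick_base_period * SYSTEM_TICK_MS;
}
```

     For a node with 16384 pages and one system tick per mem_tick, this
     gives 40960 [ms] for a complete bounce control pass at 4 pages per
     mem_tick, and 20480 [ms] for a complete unpegging pass at 8 pages per
     mem_tick.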

   Description of Periodic Operations
     The following list describes the Bounce Control Periodic Operations in
     detail:

     Aging Migration Counters    In order to detect bouncing we keep track of
                                 the number of migrations per page using a
                                 counter that is periodically decremented
                                 (aged). When the counter goes beyond a
                                 threshold, we consider the page to be
                                 bouncing and freeze it.

     Aging Migration Request Counters
                                 In order to avoid excessive migration or
                                 bouncing due to short, transitory remote
                                 memory access sequences we have a migration
                                 dampening filter that needs to count several
                                 migration requests within a limited period of
                                 time before it actually lets a real page
                                 migration take place. The time factor is
                                 introduced in the filter by aging the
                                 migration request counters.

     Melting Frozen Pages        When a page is frozen we want to eventually
                                 unfreeze it so that it becomes migratable
                                 again. This behavior is desirable because
                                 usually the events that cause a page to be
                                 frozen are temporary. As part of the periodic
                                 operations, we increment a counter per page
                                 to keep track of how long the page has been
                                 frozen. When the counter goes beyond a
                                 threshold, meaning that the page has been
                                 frozen for a sufficiently long time already,
                                 we unfreeze the page making it migratable
                                 again.
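
     The interplay of aging, freezing, and melting described in this list can
     be modeled with a small state machine per page. The sketch below is a
     user-level model under several assumptions: the structure and function
     names are hypothetical, percentages are converted to counts by integer
     truncation, and the maximum count of 7 is the value cited for Origin
     2000.

```c
#include <assert.h>
#include <stdbool.h>

#define MIGR_COUNTER_MAX 7u   /* maximum count cited for Origin 2000 */

/* Hypothetical per-page bounce control state. */
struct bounce_state {
    unsigned migr_count;   /* migrations seen, aged periodically */
    unsigned frozen_age;   /* periodic passes spent frozen */
    bool     frozen;
};

/* Convert a percentage tunable into a count on the 0..max scale. */
unsigned percent_to_count(unsigned percent, unsigned max)
{
    assert(percent <= 100);
    return (percent * max) / 100;
}

/* Called when a page has just been migrated. */
void bounce_note_migration(struct bounce_state *s, unsigned freeze_percent)
{
    if (s->migr_count < MIGR_COUNTER_MAX)
        s->migr_count++;
    if (s->migr_count > percent_to_count(freeze_percent, MIGR_COUNTER_MAX)) {
        s->frozen = true;      /* page is considered to be bouncing */
        s->frozen_age = 0;
    }
}

/* One periodic pass over a page: age the counter, melt if frozen. */
void bounce_periodic(struct bounce_state *s, unsigned melt_percent)
{
    if (s->migr_count > 0)
        s->migr_count--;       /* aging */
    if (s->frozen &&
        ++s->frozen_age > percent_to_count(melt_percent, MIGR_COUNTER_MAX)) {
        s->frozen = false;     /* frozen long enough, migratable again */
        s->frozen_age = 0;
    }
}
```

     With both percentages set to 30, which truncates to a count of 2, a
     page freezes on its third migration and melts on the third periodic
     pass after that.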

     The Unpegging Periodic Operation consists of scanning all the memory
     reference counters looking for those counters that have pegged due to
     having reached their maximum count. When a pegged counter is found, the
     complete set of counters that counter belongs to (all counters associated
     with a page) is restarted.
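
     A minimal sketch of such a scan is shown below. It works on a software
     array standing in for the hardware counters; the structure, the
     function name, the small node count, and the restart value are
     assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODES 4   /* small node count for illustration */

/* Hypothetical software view of one page's reference counters. */
struct page_counters {
    uint32_t count[NODES];
};

/*
 * Scan npages counter sets and restart every set that contains a counter
 * at or above peg_value. Returns the number of pages that were restarted.
 */
size_t unpeg_scan(struct page_counters *pages, size_t npages,
                  uint32_t peg_value, uint32_t start_value)
{
    assert(pages != NULL || npages == 0);
    size_t restarted = 0;

    for (size_t p = 0; p < npages; p++) {
        int pegged = 0;
        for (size_t n = 0; n < NODES; n++) {
            if (pages[p].count[n] >= peg_value) {
                pegged = 1;
                break;
            }
        }
        if (pegged) {
            /* Restart the whole set, not just the pegged counter. */
            for (size_t n = 0; n < NODES; n++)
                pages[p].count[n] = start_value;
            restarted++;
        }
    }
    return restarted;
}
```

     Restarting the whole set keeps the counters of a page comparable with
     each other, since a single pegged counter would otherwise stop
     reflecting the relative access rates.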

     The current implementation of the Migration Control module does not
     execute Queue Control Periodic Operations or Traffic Control Periodic
     Operations.

   Page Migration Tunables
     This is a list of all the memory migration tunables in
     /var/sysgen/mtune/numa that define the default memory migration policy
     used by the system.

     * numa_migr_default_mode.  This tunable defines the default migration
       mode. It can take the following values:


              0: MIGR_DEFMODE_DISABLED
                 Migration is completely disabled, users cannot use migration.

              1: MIGR_DEFMODE_ENABLED
                 Migration is always enabled, users cannot disable migration.

              2: MIGR_DEFMODE_NORMOFF
                 Migration is normally off, users can enable migration for an application.

              3: MIGR_DEFMODE_NORMON
                 Migration is normally on, users can disable migration for an application.

              4: MIGR_DEFMODE_LIMITED
                 Migration is normally off for machine configurations with
                 a maximum Craylink distance less than numa_migr_min_maxradius
                 (defined below). Migration is normally on otherwise. Users
                 can override this mode.



     *    numa_migr_default_threshold.  This threshold defines the minimum
          difference between the local and any remote counter needed to
          generate a migration request interrupt.


              if ( (remote_counter - local_counter) >=
                   ((numa_migr_threshold_reference_value / 100) * numa_migr_default_threshold)) {
                          send_migration_request_intr();
              }



     *    numa_migr_threshold_reference.  This parameter defines the pegging
          value for the memory reference counters.  It is machine
          configuration dependent. For Origin 2000 systems, it can take the
          following values:


             0: MIGR_THRESHREF_STANDARD = Threshold reference is 2048 (11-bit counters)
                                          Maximum threshold allowed for systems
                                          with STANDARD DIMMS. This is the default.
             1: MIGR_THRESHREF_PREMIUM =  Threshold reference is 524288 (19-bit counters)
                                          Maximum threshold allowed for systems
                                          with *all* PREMIUM DIMMS.



     *    numa_migr_vehicle.  This tunable defines what device the system
          should use to migrate a page.  The value 0 selects the Block
          Transfer Engine (BTE) and a value of 1 selects the processor. When
          the BTE is selected, and the system is equipped with the optional
          poison bits, the system automatically uses Lazy TLB Shootdown
          Algorithms.

     *    numa_migr_min_maxradius.  This tunable is used if
          numa_migr_default_mode has been set to mode 4
          (MIGR_DEFMODE_LIMITED). For this mode, migration is normally off for
          machine configurations with a maximum Craylink distance less than
          numa_migr_min_maxradius. Migration is normally on otherwise.

     *    numa_migr_auto_migr_mech.  This tunable defines the migration
          execution mode for memory reference counter triggered migrations: 0
          for immediate and 1 for delayed. Only the Immediate Mode (0) is
          currently available.

     *    numa_migr_user_migr_mech.  This tunable defines the migration
          execution mode for user requested migrations:  0 for immediate and 1
          for delayed. Only the Immediate Mode (0) is currently available.

     *    numa_migr_coaldmigr_mech.  This tunable defines the migration
          execution mode for memory coalescing migrations:  0 for immediate
          and 1 for delayed. Only the Immediate Mode (0) is currently
          available.

     *    numa_refcnt_default_mode.  This tunable defines the default
          extended reference counter mode. It can take the following values:


             0: REFCNT_DEFMODE_DISABLED
                Extended reference counters are disabled, users cannot access the
                extended reference counters (refcnt(5)).

             1: REFCNT_DEFMODE_ENABLED
                Extended reference counters are always enabled, users cannot disable them.

             2: REFCNT_DEFMODE_NORMOFF
                Extended reference counters are normally disabled, users can
                disable or enable the counters for an application.

             3: REFCNT_DEFMODE_NORMON
                Extended reference counters are normally enabled, users can disable or
                enable the counters for an application.


     *    numa_refcnt_overflow_threshold.  This tunable defines the count at
          which the hardware reference counters notify the operating system of
          a counter overflow in order for the count to be transferred into the
          (software) extended reference counters. It is expressed as a
          percentage of the threshold reference value defined by
          numa_migr_threshold_reference.

     *    numa_migr_min_distance.  Minimum distance required by the Node
          Distance Filter in order to accept a migration request.

     *    numa_migr_memory_low_enabled.  Enable or disable the Memory Pressure
          Filter.

     *    numa_migr_memory_low_threshold.  Threshold at which the Memory
          Pressure Filter starts rejecting migration requests to a node. This
          threshold is expressed as a percentage of the total amount of
          physical memory in a node.

     *    numa_migr_freeze_enabled.  Enable or disable the freezing operation
          in the Bounce Control Filter.

     *    numa_migr_freeze_threshold.  Threshold at which a page is frozen.
          This tunable is expressed as a percent of the maximum count
          supported by the migration counters (7 for Origin 2000).

     *    numa_migr_melt_enabled.  Enable or disable the melting operation in
          the Bounce Control Filter.

     *    numa_migr_melt_threshold.  When a migration counter goes below this
          threshold a page is unfrozen.  This tunable is expressed as a
          percent of the maximum count supported by the migration counters (7
          for Origin 2000).

     *    numa_migr_bounce_control_interval.  This tunable defines the period
          for the loop that ages the migration counters and the dampening
          counters. It is expressed in terms of number of mem_ticks (the
          mem_tick unit is defined by mem_tick_base_period below).  If it is
          set to 0, we process 4 pages per mem_tick. In this case, the actual
          period depends on the amount of physical memory present in a node.

     *    numa_migr_dampening_enabled.  Enable or disable migration dampening.

     *    numa_migr_dampening_factor.  The number of migration requests needed
          for a page before migration is actually executed. It is expressed as
          a percentage of the maximum count supported by the migration-request
          counters (3 for Origin 2000).

     *    mem_tick_enabled.  Enable or disable the loop that executes the
          Migration Control Periodic Operations.

     *    mem_tick_base_period.  Number of 10 [ms] system ticks in one
          mem_tick.

     *    numa_migr_unpegging_control_enabled.  Enable or disable the
          unpegging periodic operation.

     *    numa_migr_unpegging_control_interval.  This tunable defines the
          period for the loop that unpegs the hardware memory reference
          counters. It is expressed in terms of number of mem_ticks (the
          mem_tick unit is defined by mem_tick_base_period).  If it is set to
          0, we process 8 pages per mem_tick. In this case, the actual period
          depends on the amount of physical memory present in a node.

     *    numa_migr_unpegging_control_threshold.  Hardware memory reference
          counter value at which we consider the counter to be pegged. It is
          expressed as a percent of the maximum count defined by
          numa_migr_threshold_reference.

     *    numa_migr_traffic_control_enabled.  Enable or disable the Traffic
          Control Filter. This is an experimental module, and therefore it
          should always be disabled.

     *    numa_migr_traffic_control_interval.  Traffic control period.
          Experimental module.

     *    numa_migr_traffic_control_threshold.  Traffic control threshold for
          kicking the batch migration of enqueued migration requests.
          Experimental module.
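
     To summarize how the default migration mode interacts with a per-
     application request, the sketch below resolves whether migration is
     active for an application. The mode values are the ones listed for the
     default mode tunable; the user preference type, the function name, and
     the reading of "users can override this mode" as applying to mode 4 in
     both directions are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode values as listed for the default migration mode tunable. */
enum migr_defmode {
    MIGR_DEFMODE_DISABLED = 0,
    MIGR_DEFMODE_ENABLED  = 1,
    MIGR_DEFMODE_NORMOFF  = 2,
    MIGR_DEFMODE_NORMON   = 3,
    MIGR_DEFMODE_LIMITED  = 4
};

/* Hypothetical per-application preference. */
enum user_pref { USER_UNSET, USER_ON, USER_OFF };

bool migration_active(enum migr_defmode mode, enum user_pref pref,
                      unsigned max_distance, unsigned min_maxradius)
{
    bool normally_on;

    switch (mode) {
    case MIGR_DEFMODE_DISABLED: return false;  /* users cannot enable */
    case MIGR_DEFMODE_ENABLED:  return true;   /* users cannot disable */
    case MIGR_DEFMODE_NORMOFF:  normally_on = false; break;
    case MIGR_DEFMODE_NORMON:   normally_on = true;  break;
    case MIGR_DEFMODE_LIMITED:
        /* Off for small configurations, on otherwise. */
        normally_on = max_distance >= min_maxradius;
        break;
    default:
        assert(!"unknown migration mode");
        return false;
    }
    if (pref == USER_UNSET)
        return normally_on;
    return pref == USER_ON;
}
```

     Modes 0 and 1 ignore the user preference entirely, while modes 2, 3,
     and 4 only supply the default that applies when the application has
     not asked for anything.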



SEE ALSO
     numa(5), replication(5), mtune(4), /var/sysgen/mtune/numa, refcnt(5),
     mmci(5), nstats(1), sn(1).