Museum

Home

Lab Overview

Retrotechnology Articles

Online Manuals

⇒ hashtable(3X) — OSF1 1.0

Media Vault

Software Library

Restoration Projects

Artifacts Sought

Related Articles

sorttable(3X)

stringx(3X)

HASHTABLE(3X)  —  Subroutines

NAME

hashtable − generic hashtable for C++

SYNOPSIS


#include <codelibs/boolean.h>
#include <codelibs/hashtable.h>
 // usage:
declare_hashtable(hashtable, iterator, symbol, key)
implement_hashtable(hashtable, iterator, symbol, key, hash)
 // preconditions:
unsigned hash(key);    // optional
class symbol
{
public:
symbol(symbol&);    // if appropriate
void operator=(symbol&);  // if appropriate
~symbol();                // if appropriate
symbol(key);
boolean operator==(key);
operator key();
};
 // builds this:
class hashtable
{
public:
hashtable(unsigned hs = 0,
unsigned (∗bump_function)(unsigned) = NULL);
hashtable(hashtable&);
~hashtable();
 boolean find(key);
boolean insert(key);
boolean remove(key);
 boolean operator+=(key k) { return insert(k); }
boolean operator-=(key k) { return remove(k); }
// operators += and -= are overloaded below
 boolean operator==(key k) { return find(k); }
boolean operator!=(key k) { return !find(k); }
 hashtable &clear();
hashtable &operator=(hashtable&);
 // operators += and -= are overloaded above
boolean &operator==(hashtable&);
hashtable &operator+=(hashtable&);
hashtable &operator-=(hashtable&);
hashtable &operator&=(hashtable&);
 symbol &operator[](key);
symbol &get();
symbol &operator()();
symbol &operator∗();
symbol ∗operator->();
unsigned size();
};
 class iterator
{
public:
iterator();
iterator(hashtable &);
~iterator();
hashtable &operator=(hashtable &);
boolean operator==(iterator &);
boolean operator!=(iterator &);
operator boolean();
symbol &get();
symbol &operator()();
symbol &operator∗();
symbol ∗operator->();
symbol &operator++();
void remove();
};
 CC ... -lcodelibs
 

DESCRIPTION

Hashtable is a C++ header file which defines a generic hashtable class and an associated iterator class. 

A hashtable is an unordered set of instances of type symbol which are uniquely identified by values of type key. If the symbol and key types are identical, or if the key type is a reference to symbol, then the hashtable is simply a set of symbol. An iterator is used to successively refer to the members of a hashtable without identifying them by their key values.  The ordering of the symbols seen by an iterator is undefined. 

DECLARATION

declare_hashtable
This macro declares a class named hashtable with the given symbol and key types and an associated class named iterator. This declaration must be visible at any place which refers to one of the class names hashtable or iterator.  It is typically added to a header file where externs are kept. 

implement_hashtable
This macro implements the hashtable and iterator classes declared above, using the hash function for mapping key values onto nonnegative integer values.  This implementation must appear exactly once somewhere in the user’s code, and is typically put into some ∗.C source file. 

PRECONDITIONS

The symbol and key arguments must denote C++ types, either predefined or user-defined.  These arguments must be type expressions that can precede an identifier in a declaration (e.g., int, complex, char∗, complex&).  Types which require an embedded identifier in a declaration (e.g., int (∗)(float)) MUST be given simple names via “typedef”. 

The key argument may denote a reference type, in which case all key values will be passed by reference to the hashtable member functions find, insert, remove, and operator[].  This is useful when key values are large.  For example, when implementing a set of symbol, the key type should be symbol& if it is too expensive to pass entire symbol values around.  The symbol argument itself must not denote a reference type. 

The classes or types symbol and key must be defined to provide the operators, constructors, and destructors needed by the generated hashtable as described in the SYNOPSIS section above.  Any user-defined ‘inline’ functions will indeed be expanded at the point of call. 

The ‘==’ operator is used to compare a symbol to a key. This operator should return TRUE if and only if the symbol is identified by the given key.

It must be possible to initialize a symbol from a key such that the resulting symbol is identified by the given key.

The name hash must denote a function which maps key values onto integer ‘hash’ values.  The hash function must always map equivalent key values onto the same hash value.  To increase the efficiency of hashtable operations, the hash function should be very fast and should tend to map different key values onto different hash values.  (A default hash function for standard C-string char∗ types is provided in the stringx3x package and is called strhash.)

It must be possible to convert a symbol into a key. This occurs when the hashtable is dynamically resized and each symbol is rehashed into the larger hashing space.  If symbol is a class type, it could have a member function operator key().  For example, if the key type is char∗, the member function operator char∗() would return the symbol’s key. If the key type is complex&, then the member function operator complex&() would return a reference to the key value.  Alternatively, if key is a class type, the type conversion can be accomplished by having a key constructor from type symbol (e.g., key(symbol) or key(symbol&). 

If symbol is a class type with a destructor, the destructor is called when a symbol is removed from a hashtable.

OPERATIONS

A hashtable has a current symbol referred to by the member function operator().  The current symbol is initially undefined.  The only member functions which can change the current symbol are find, insert, remove, and operator[]. 

An iterator has a parent which, if defined, refers to a hashtable. An iterator with an undefined parent is known as an "orphan".  An iterator has a “current symbol” which, if defined, refers to a symbol within the parent of the iterator. The current symbol of an orphan is undefined.

When an iterator is initialized from or assigned to a parent, its current symbol is the first symbol in the parent.  If the parent is empty, the current symbol is undefined.  When an iterator is advanced (via operator++) its current symbol becomes the next symbol in the parent.  If there is no next symbol, the current symbol becomes undefined. Advancing an iterator whose current symbol is undefined has no effect.  As an iterator is advanced from the first symbol through the last, it is guaranteed never to see the same symbol twice. 

If the current symbol of an iterator is removed from its parent, the current symbol of the iterator becomes the next symbol in the parent. 

If a symbol is inserted into the parent of an iterator, the new entry may appear before or after the current symbol of the iterator. Thus, it is undefined whether the iterator will encounter the new entry during its advancement through the remainder of the parent. 

A hashtable is never dynamically resized as long as there is an iterator associated with it.  Otherwise, the ordering of the entries would change ‘underneath’ the iterator, causing unpredictable behavior. Thus, it is generally unwise to have an iterator associated with a hashtable while insertions are being made to that hashtable, since this will stunt the growth of the hashtable.

HASHTABLE

hashtable
Construct an empty hashtable. The current symbol is undefined. The initial hashsize of this hashtable is hs if that value is nonzero.  Otherwise, the initial hashsize is computed using the ‘bump’ function specified by bump_function. If bump_function is NULL, a default bump function is used.  This should be sufficient for most applications.  For more details, see "BUMP FUNCTION" below. 

~hashtable
Destroy the hashtable. Each symbol in the hashtable is deleted, and any associated iterators become orphans. 

find operator== operator!=
Find and operator== return TRUE if and only if there is a symbol with the given key. If TRUE is returned, the current symbol is the one with the given key; otherwise it is undefined. Operator!= returns TRUE if there is NO such symbol in the table.  If there is such a symbol then the current symbol is set to it and FALSE is returned.  (The operator inlines are provided for convenience.) 

insert operator+=
Insert a symbol with the given key into the hashtable. Returns FALSE if and only if there is a symbol with the given key already in the hashtable. Thus if TRUE is returned, the current symbol is a newly inserted one constructed from the key; otherwise it is the already existing one. Operator+= is overloaded for merging hashtables below. 

remove operator-=
Remove the symbol with the given key from the hashtable. Return TRUE if and only if the symbol was actually found and deleted from the hashtable. All iterators referring to the existing symbol are advanced before the symbol is deleted.  If FALSE is returned, then the contents of the hashtable are unchanged.  In either case, the current symbol is undefined.  Operator-= is also overloaded below. 

clear Remove all elements from the hashtable. Return a reference to the cleared table to allow chaining of operations.

operator=
Copy the contents of the specified hashtable. This allows easy duplication and manipulation of tables. Returns a reference to this.

operator==
This compares two hashtables return TRUE if they contain the same elements and FALSE if they don’t.

operator+=
(This operator is also overloaded for adding single elements.) Here, it adds all elements in the specified hashtable to the current table ( this). Returns a reference to this.

operator-=
(This operator is also overloaded for removing single elements.) Here, it removes all the elements in the specified hashtable from the current table ( this). Returns a reference to this.

operator&=
Performs an intersection between the two tables leaving only the common elements in this. Returns a reference to this.

operator[]
Return a reference to the symbol with the given key if it exists; otherwise return a reference to a newly inserted symbol constructed from the key. The current symbol is the one whose reference is returned. (These semantics ensure that this operation can never fail.)

get operator() operator∗
Return a reference to the current symbol. If there is no current symbol, the returned value is both invalid and undefined.

operator->
Returns a pointer to the current symbol. If there is no current symbol, then NULL is returned.

size Return the number of symbols currently in the hashtable.

ITERATOR

iterator
Construct either an orphan iterator or an iterator with the given hashtable as its parent.  The current symbol of this iterator is the first symbol in the parent, or undefined if the parent is empty. 

~iterator
Destroy this iterator. The iterator is no longer associated with its parent. 

operator=
Make the given hashtable be the new parent of this iterator. The current symbol of this iterator is the first symbol in the new parent, or undefined if the new parent is empty.  Return a reference to the new parent. 

operator==
Return TRUE if and only if the current symbols of this iterator and the given one are identical (i.e., refer to the same symbol). Two undefined current symbols are considered identical.

operator!=
Return TRUE if and only if the current symbols of this iterator and the given one are different. 

operator boolean
Return TRUE if and only if the current symbol of this iterator is defined. 

get operator() operator∗
Return a reference to the current symbol of this iterator.

operator->
Returns a pointer to the current symbol of this iterator.

operator++
Advance this iterator to the next symbol in the parent and then return a reference to the current symbol.  If there is no next symbol in the parent, the current symbol becomes undefined.  Advancing an iterator has no effect if its current symbol is undefined. 

remove
Remove the current symbol of this iterator from the parent.  If the current symbol is undefined, this function has no effect.  All iterators (including this one) which refer to the current symbol of this iterator are advanced before the symbol is deleted. 

BUMP FUNCTION

Whenever a hashtable is dynamically resized into a larger hashing space, the ‘bump’ function is used to compute the new hashsize as a function of the current hashsize.  Dynamic resizing occurs in two situations.  First, when a new hashtable is constructed, its initial hashsize is bump(0) if bump(0) is nonzero.  Otherwise, its initial hashsize is 1.  Second, when the number of entries in a hashtable reaches a threshold value (25% more than its current hashsize), and there are no iterators associated with the hashtable, the bump function is called to determine the new hashsize. If the new hashsize is greater than the current hashsize, then the hashtable is resized accordingly.  Otherwise, the hashtable is not resized. 

The default bump function essentially doubles the current hashsize and rounds that result up to the nearest prime number.  Its actual definition is subject to change. 

The user may provide a bump-function to better control memory growth or the hash itself.  This function must map an unsigned int onto another unsigned int which is typically greater than its argument.  If it is not greater, then the hashtable is not resized.  Prime numbers should be returned as they tend to make for better hashing.  However, with a properly matching hash function, even this is not necessary. 

COMMENTS

Note that a symbol class may overload its member functions to accept multiple key types.  Although only one key type is used in a particular hashtable class, different hashtable classes can have the same symbol type but make use of different key types. 

Data which is shared among different instances of one or more symbol types must be managed appropriately.  For example, the destructor for the symbol class could decrement a reference count in the shared data and delete the data if the count became zero. 

Note that a symbol class need not actually contain an object of type key. The information constituting the key value of a symbol can be "distributed" within the symbol or even computed from it in some manner.  As noted previously, the symbol and key types can even be identical. 

WARNING

This version of hashtable now that the user’s class have another constructor to create a symbol from another symbol, as well as an operator= to assign one symbol to another.  This is to fix a bug when one hashtable is assigned to or constructed from another. 

It is generally unwise to have an iterator associated with a hashtable while insertions are being made to that hashtable, since this will stunt the growth of the hashtable.

EXAMPLE

#include <stdio.h>
#include <string.h>
#include <codelibs/boolean.h>
#include <codelibs/hashtable.h>
#include <codelibs/stringx.h>
 class Entry
{
    char ∗ename;
    unsigned long enumber;
 public:
    // the following entries are for the hashtable
    Entry(Entry &e) { ename = strdup(e.ename); enumber = e.enumber; }
    void operator=(Entry &e)
        { strfree(ename); ename = strdup(e.ename); enumber = e.enumber; }
    ~Entry() { strfree(ename); }
    Entry(char ∗k) { ename = strdup(k); enumber = 0; }
    boolean operator==(char ∗k) { return strcmp(ename, k) == 0; }
    operator char∗() { return ename; }
     // these are so that we can read and set the class vars
    char ∗name() { return ename; }
    unsigned long number() { return enumber; }
    void number(unsigned long n) { enumber = n; }
};
 // the table will be named type "Table" with iterator "Table_iter":
declare_hashtable(Table, Table_iter, Entry, char∗);
implement_hashtable(Table, Table_iter, Entry, char∗, strhash);
  main(int argc, char ∗argv[])
{
    Table table;
    int i;
     // add our command-line arguments into the table
    for (i = 1; i < argc; i++)
if (table.insert(argv[i]))
{
    table().number(i);
    printf("New Entry: %s %d\n", table().name(), table().number());
}
else
    printf("Sorry, %s is already listed at %d\n",
    table().name(), table().number());
    printf("\n");
     // look for the string "test" in the table
    char ∗s = "test";
    if (table.find(s))
    printf("The number for %s is %d\n",
    table().name(), table().number());
    else
    printf("There is no one listed by that name.\n");
     // and try to remove it
    if (table.remove(s))
    printf("Removed %s\n", s);
    else
    printf("Cannot remove nonexistent %s\n", s);
    printf("\n");
     // now set the value for "bozo" to 42, creating it if necessary
    table["bozo"].number(42);
    printf("The number of %s is now %d\n\n",
    table().name(), table->number());
     printf("The size of the Table is %d\n\n", table.size());
     // print out all the entries in the table
    Table_iter iter;
    printf("Table:\n");
    for (iter = table; iter; iter++)
printf("%s : %d\n", (∗iter).name(), iter->number());
    printf("\n");
     // create a new copy of the "table"
    Table newtab = table;
     // Remove all entries whose number is greater than 25
    printf("Table with numbers greater than 25 deleted:\n");
    iter = table;
    while (iter)
if (iter().number() > 25)
    iter.remove();
else
{
    printf("%s : %d\n", iter().name(), iter().number());
    iter++;
}
    printf("\n");
     // remove all entries in "table" from "newtab",
    //     leaving only the elements originally deleted from "table"!
    newtab -= table;
    printf("Deleted entries are:\n");
    for (iter = newtab; iter; iter++)
printf("%s\n", iter->name());
     return 0;
}

SEE ALSO

sorttable(3X), stringx(3X)
The Art of Computer Programming, by Donald E. Knuth.

  —  codelibs  —  C++

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026