view doc/misc/html/oldclasslib/strdict.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
parents
children
line wrap: on
line source

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>String Dictionary Class Definition</TITLE>
</HEAD>


<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
 TEXT="#000000" LINK="#0033CC"
 VLINK="#CC0033" ALINK="#CC0099">

<P>
<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram"
         WIDTH=124 HEIGHT=30 >
<BR CLEAR="all">
Back to :
<A HREF="../index.html">Index</A> |
<A HREF="index.html">Class libraries for examples</A>
<P>

<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
<P>

<H1>String Dictionary Class Definition</H1>
<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
<P>
<BR>
<H2>Introduction</H2>
<P>

A string dictionary is a device for keeping track of character strings.
It associates a unique, non-zero, unsigned integer with each unique
string inserted in the dictionary. These indices are assigned in order
of insertion, so that the first string inserted has index 1, the second
index 2, and so on. These indices may be used to index auxiliary tables
containing information about the corresponding string. In particular,
these indices may be conveniently used to index a symbol table. A
hashing scheme is used to facilitate quick identification of a given
string.
<P>
<BR>

<H2>Constructor </H2>

The constructor for a string dictionary takes two unsigned integer
arguments: the maximum number of entries and the average number of
bytes per entry. The latter will default to ten if you don't specify
it. The product of size and length must be less than 65536. For
example:
<PRE>
      string_dictionary td(5000);    // 5000 entries, default size
      string_dictionary xx(500,20);  // 500 entries, average size = 20
</PRE>
<P>
<BR>

<H2>Entering Strings in a Dictionary </H2>

To enter a string in a dictionary, use the overloaded operator " &lt;&lt; ",
with the dictionary on the left, the string on the right. The result is
a nonzero, unsigned integer. For example:
<PRE>
      unsigned hwx = td &lt;&lt; "Hello, world!";
</PRE>
<P>
<BR>

<H2>Dictionary Lookups </H2>

To look up a string in a dictionary, use the overloaded operator
"<CODE>[]</CODE>",
using the string to index the dictionary. The result is an unsigned
integer. It will be zero if the string is not found in the dictionary.
For example:
<PRE>
       unsigned k = td["Hello, world!"];
</PRE>
<BR>

<H2>Recovering a String from a Dictionary </H2>

The operator "<CODE>[]</CODE>" is also overloaded to allow you to get the pointer to
a particular string in the dictionary. Simply index the dictionary with
an unsigned integer. The result will be a pointer to the desired
string. If the index is zero, or larger than the number of strings in
the dictionary, the pointer will be null. For example:
<PRE>
       unsigned k = td &lt;&lt; "Hello";
       char *ptr = td[k];             //returns pointer to "Hello"
</PRE>
<BR>

<H2>Number of Entries in the Dictionary </H2>

To find the number of entries in a dictionary, use the function "size".
It returns an unsigned integer. Since the zero entry is unused, the
integer returned by size is the index of the last entry in the
dictionary. For example:
<PRE>
       unsigned n = size(td);
</PRE>
<P>
<BR>

<H2>Resetting a String Dictionary </H2>

The overloaded function reset() may be used to restore a string
dictionary to its initial condition, deleting all entries:
<PRE>
        reset(td);
</PRE>
<P>
<BR>

<H2>Technical Notes </H2>

The dictionary uses a hashing algorithm freely adapted from Fletcher's
checksum as described by John Kodis in Doctor Dobb's Journal, May,
1992.
<P>
The length of the hash table is the smallest power of two that is
greater than the specified size of the dictionary. Therefore, if you
are using a computer with 8086 architecture, you are limited to a
dictionary with fewer than 32768 entries.
<P>
The hash algorithm produces two unsigned integers, k1 and k2, which are
each less than 255. The hash code is then calculated as
 <CODE> 255*k1 + k2</CODE>,
and masked to the length of the hash table. In case of collision, a
hash increment is calculated by using k1 and k2 in the opposite order:
<CODE> 255*k2 + k1</CODE>.
This is multiplied by two and one is added to guarantee an
increment which is relatively prime to the length of the hash table.
This process guarantees the entire table can be searched.
<P>
The number of calls and the number of collisions are recorded so you
can check the efficiency of the hash algorithm.
<P>
Because the dictionary allocates a single character array for all the
dictionary entries, the product of the max number of entries and the
average size must be less than 65536 when using 8086 architecture. If
this is inconvenient there are several alternatives. You can allocate
storage individually for each string and simply maintain a list of
pointers, or you can declare a "huge" array, and use longs to index it.
Remember that in this case the hash table would have to be declared as
an array of longs.


<P>
<BR>


<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software"
                WIDTH=181 HEIGHT=25>
<BR CLEAR="right">

<P>
Back to :
<A HREF="../index.html">Index</A> |
<A HREF="index.html">Class libraries for examples</A>

<P>
<ADDRESS><FONT SIZE="-1">
                  AnaGram parser generator - examples<BR>
                  Class libraries - String dictionary class definition<BR>
                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
                  All Rights Reserved.<BR>
</FONT></ADDRESS>

</BODY>
</HTML>