Mercurial > ~dholland > hg > ag > index.cgi
diff doc/misc/html/oldclasslib/strdict.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/misc/html/oldclasslib/strdict.html Sat Dec 22 17:52:45 2007 -0500 @@ -0,0 +1,177 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> +<HTML> +<HEAD> +<TITLE>String Dictionary Class Definition</TITLE> +</HEAD> + + +<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" + TEXT="#000000" LINK="#0033CC" + VLINK="#CC0033" ALINK="#CC0099"> + +<P> +<IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram" + WIDTH=124 HEIGHT=30 > +<BR CLEAR="all"> +Back to : +<A HREF="../index.html">Index</A> | +<A HREF="index.html">Class libraries for examples</A> +<P> + +<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" + WIDTH=1010 HEIGHT=2 > +<P> + +<H1>String Dictionary Class Definition</H1> +<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" + WIDTH=1010 HEIGHT=2 > +<P> +<BR> +<H2>Introduction</H2> +<P> + +A string dictionary is a device for keeping track of character strings. +It associates a unique, non-zero, unsigned integer with each unique +string inserted in the dictionary. These indices are assigned in order +of insertion, so that the first string inserted has index 1, the second +index 2, and so on. These indices may be used to index auxiliary tables +containing information about the corresponding string. In particular, +these indices may be conveniently used to index a symbol table. A +hashing scheme is used to facilitate quick identification of a given +string. +<P> +<BR> + +<H2>Constructor </H2> + +The constructor for a string dictionary takes two unsigned integer +arguments: the maximum number of entries and the average number of +bytes per entry. The latter will default to ten if you don't specify +it. The product of size and length must be less than 65536. For +example: +<PRE> + string_dictionary td(5000); // 5000 entries, default size + string_dictionary xx(500,20); // 500 entries, average size = 20 +</PRE> +<P> +<BR> + +<H2>Entering Strings in a Dictionary </H2> + +To enter a string in a dictionary, use the overloaded operator " << ", +with the dictionary on the left, the string on the right. The result is +a nonzero, unsigned integer. For example: +<PRE> + unsigned hwx = td << "Hello, world!"; +</PRE> +<P> +<BR> + +<H2>Dictionary Lookups </H2> + +To look up a string in a dictionary, use the overloaded operator +"<CODE>[]</CODE>", +using the string to index the dictionary. The result is an unsigned +integer. It will be zero if the string is not found in the dictionary. +For example: +<PRE> + unsigned k = td["Hello, world!"]; +</PRE> +<BR> + +<H2>Recovering a String from a Dictionary </H2> + +The operator "<CODE>[]</CODE>" is also overloaded to allow you to get the pointer to +a particular string in the dictionary. Simply index the dictionary with +an unsigned integer. The result will be a pointer to the desired +string. If the index is zero, or larger than the number of strings in +the dictionary, the pointer will be null. For example: +<PRE> + unsigned k = td << "Hello"; + char *ptr = td[k]; //returns pointer to "Hello" +</PRE> +<BR> + +<H2>Number of Entries in the Dictionary </H2> + +To find the number of entries in a dictionary, use the function "size". +It returns an unsigned integer. Since the zero entry is unused, the +integer returned by size is the index of the last entry in the +dictionary. For example: +<PRE> + unsigned n = size(td); +</PRE> +<P> +<BR> + +<H2>Resetting a String Dictionary </H2> + +The overloaded function reset() may be used to restore a string +dictionary to its initial condition, deleting all entries: +<PRE> + reset(td); +</PRE> +<P> +<BR> + +<H2>Technical Notes </H2> + +The dictionary uses a hashing algorithm freely adapted from Fletcher's +checksum as described by John Kodis in Doctor Dobb's Journal, May, +1992. +<P> +The length of the hash table is the smallest power of two that is +greater than the specified size of the dictionary. Therefore, if you +are using a computer with 8086 architecture, you are limited to a +dictionary with fewer than 32768 entries. +<P> +The hash algorithm produces two unsigned integers, k1 and k2, which are +each less than 255. The hash code is then calculated as + <CODE> 255*k1 + k2</CODE>, +and masked to the length of the hash table. In case of collision, a +hash increment is calculated by using k1 and k2 in the opposite order: +<CODE> 255*k2 + k1</CODE>. +This is multiplied by two and one is added to guarantee an +increment which is relatively prime to the length of the hash table. +This process guarantees the entire table can be searched. +<P> +The number of calls and the number of collisions are recorded so you +can check the efficiency of the hash algorithm. +<P> +Because the dictionary allocates a single character array for all the +dictionary entries, the product of the max number of entries and the +average size must be less than 65536 when using 8086 architecture. If +this is inconvenient there are several alternatives. You can allocate +storage individually for each string and simply maintain a list of +pointers, or you can declare a "huge" array, and use longs to index it. +Remember that in this case the hash table would have to be declared as +an array of longs. + + +<P> +<BR> + + +<IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" + WIDTH=1010 HEIGHT=2 > +<P> +<IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software" + WIDTH=181 HEIGHT=25> +<BR CLEAR="right"> + +<P> +Back to : +<A HREF="../index.html">Index</A> | +<A HREF="index.html">Class libraries for examples</A> + +<P> +<ADDRESS><FONT SIZE="-1"> + AnaGram parser generator - examples<BR> + Class libraries - String dictionary class definition<BR> + Copyright © 1993-1999, Parsifal Software. <BR> + All Rights Reserved.<BR> +</FONT></ADDRESS> + +</BODY> +</HTML> +