Mercurial > ~dholland > hg > ag > index.cgi
view doc/misc/html/oldclasslib/strdict.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
line wrap: on
line source
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <HTML> <HEAD> <TITLE>String Dictionary Class Definition</TITLE> </HEAD> <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" TEXT="#000000" LINK="#0033CC" VLINK="#CC0033" ALINK="#CC0099"> <P> <IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram" WIDTH=124 HEIGHT=30 > <BR CLEAR="all"> Back to : <A HREF="../index.html">Index</A> | <A HREF="index.html">Class libraries for examples</A> <P> <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 > <P> <H1>String Dictionary Class Definition</H1> <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 > <P> <BR> <H2>Introduction</H2> <P> A string dictionary is a device for keeping track of character strings. It associates a unique, non-zero, unsigned integer with each unique string inserted in the dictionary. These indices are assigned in order of insertion, so that the first string inserted has index 1, the second index 2, and so on. These indices may be used to index auxiliary tables containing information about the corresponding string. In particular, these indices may be conveniently used to index a symbol table. A hashing scheme is used to facilitate quick identification of a given string. <P> <BR> <H2>Constructor </H2> The constructor for a string dictionary takes two unsigned integer arguments: the maximum number of entries and the average number of bytes per entry. The latter will default to ten if you don't specify it. The product of size and length must be less than 65536. For example: <PRE> string_dictionary td(5000); // 5000 entries, default size string_dictionary xx(500,20); // 500 entries, average size = 20 </PRE> <P> <BR> <H2>Entering Strings in a Dictionary </H2> To enter a string in a dictionary, use the overloaded operator " << ", with the dictionary on the left, the string on the right. The result is a nonzero, unsigned integer. For example: <PRE> unsigned hwx = td << "Hello, world!"; </PRE> <P> <BR> <H2>Dictionary Lookups </H2> To look up a string in a dictionary, use the overloaded operator "<CODE>[]</CODE>", using the string to index the dictionary. The result is an unsigned integer. It will be zero if the string is not found in the dictionary. For example: <PRE> unsigned k = td["Hello, world!"]; </PRE> <BR> <H2>Recovering a String from a Dictionary </H2> The operator "<CODE>[]</CODE>" is also overloaded to allow you to get the pointer to a particular string in the dictionary. Simply index the dictionary with an unsigned integer. The result will be a pointer to the desired string. If the index is zero, or larger than the number of strings in the dictionary, the pointer will be null. For example: <PRE> unsigned k = td << "Hello"; char *ptr = td[k]; //returns pointer to "Hello" </PRE> <BR> <H2>Number of Entries in the Dictionary </H2> To find the number of entries in a dictionary, use the function "size". It returns an unsigned integer. Since the zero entry is unused, the integer returned by size is the index of the last entry in the dictionary. For example: <PRE> unsigned n = size(td); </PRE> <P> <BR> <H2>Resetting a String Dictionary </H2> The overloaded function reset() may be used to restore a string dictionary to its initial condition, deleting all entries: <PRE> reset(td); </PRE> <P> <BR> <H2>Technical Notes </H2> The dictionary uses a hashing algorithm freely adapted from Fletcher's checksum as described by John Kodis in Doctor Dobb's Journal, May, 1992. <P> The length of the hash table is the smallest power of two that is greater than the specified size of the dictionary. Therefore, if you are using a computer with 8086 architecture, you are limited to a dictionary with fewer than 32768 entries. <P> The hash algorithm produces two unsigned integers, k1 and k2, which are each less than 255. The hash code is then calculated as <CODE> 255*k1 + k2</CODE>, and masked to the length of the hash table. In case of collision, a hash increment is calculated by using k1 and k2 in the opposite order: <CODE> 255*k2 + k1</CODE>. This is multiplied by two and one is added to guarantee an increment which is relatively prime to the length of the hash table. This process guarantees the entire table can be searched. <P> The number of calls and the number of collisions are recorded so you can check the efficiency of the hash algorithm. <P> Because the dictionary allocates a single character array for all the dictionary entries, the product of the max number of entries and the average size must be less than 65536 when using 8086 architecture. If this is inconvenient there are several alternatives. You can allocate storage individually for each string and simply maintain a list of pointers, or you can declare a "huge" array, and use longs to index it. Remember that in this case the hash table would have to be declared as an array of longs. <P> <BR> <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" WIDTH=1010 HEIGHT=2 > <P> <IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software" WIDTH=181 HEIGHT=25> <BR CLEAR="right"> <P> Back to : <A HREF="../index.html">Index</A> | <A HREF="index.html">Class libraries for examples</A> <P> <ADDRESS><FONT SIZE="-1"> AnaGram parser generator - examples<BR> Class libraries - String dictionary class definition<BR> Copyright © 1993-1999, Parsifal Software. <BR> All Rights Reserved.<BR> </FONT></ADDRESS> </BODY> </HTML>