Mercurial > ~dholland > hg > ag > index.cgi
comparison doc/misc/html/oldclasslib/strdict.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> | |
2 <HTML> | |
3 <HEAD> | |
4 <TITLE>String Dictionary Class Definition</TITLE> | |
5 </HEAD> | |
6 | |
7 | |
8 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" | |
9 TEXT="#000000" LINK="#0033CC" | |
10 VLINK="#CC0033" ALINK="#CC0099"> | |
11 | |
12 <P> | |
13 <IMG ALIGN="right" SRC="../images/agrsl6c.gif" ALT="AnaGram" | |
14 WIDTH=124 HEIGHT=30 > | |
15 <BR CLEAR="all"> | |
16 Back to : | |
17 <A HREF="../index.html">Index</A> | | |
18 <A HREF="index.html">Class libraries for examples</A> | |
19 <P> | |
20 | |
21 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
22 WIDTH=1010 HEIGHT=2 > | |
23 <P> | |
24 | |
25 <H1>String Dictionary Class Definition</H1> | |
26 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
27 WIDTH=1010 HEIGHT=2 > | |
28 <P> | |
29 <BR> | |
30 <H2>Introduction</H2> | |
31 <P> | |
32 | |
33 A string dictionary is a device for keeping track of character strings. | |
34 It associates a unique, non-zero, unsigned integer with each unique | |
35 string inserted in the dictionary. These indices are assigned in order | |
36 of insertion, so that the first string inserted has index 1, the second | |
37 index 2, and so on. These indices may be used to index auxiliary tables | |
38 containing information about the corresponding string. In particular, | |
39 these indices may be conveniently used to index a symbol table. A | |
40 hashing scheme is used to facilitate quick identification of a given | |
41 string. | |
42 <P> | |
43 <BR> | |
44 | |
45 <H2>Constructor </H2> | |
46 | |
47 The constructor for a string dictionary takes two unsigned integer | |
48 arguments: the maximum number of entries and the average number of | |
49 bytes per entry. The latter will default to ten if you don't specify | |
50 it. The product of size and length must be less than 65536. For | |
51 example: | |
52 <PRE> | |
53 string_dictionary td(5000); // 5000 entries, default size | |
54 string_dictionary xx(500,20); // 500 entries, average size = 20 | |
55 </PRE> | |
56 <P> | |
57 <BR> | |
58 | |
59 <H2>Entering Strings in a Dictionary </H2> | |
60 | |
61 To enter a string in a dictionary, use the overloaded operator " << ", | |
62 with the dictionary on the left, the string on the right. The result is | |
63 a nonzero, unsigned integer. For example: | |
64 <PRE> | |
65 unsigned hwx = td << "Hello, world!"; | |
66 </PRE> | |
67 <P> | |
68 <BR> | |
69 | |
70 <H2>Dictionary Lookups </H2> | |
71 | |
72 To look up a string in a dictionary, use the overloaded operator | |
73 "<CODE>[]</CODE>", | |
74 using the string to index the dictionary. The result is an unsigned | |
75 integer. It will be zero if the string is not found in the dictionary. | |
76 For example: | |
77 <PRE> | |
78 unsigned k = td["Hello, world!"]; | |
79 </PRE> | |
80 <BR> | |
81 | |
82 <H2>Recovering a String from a Dictionary </H2> | |
83 | |
84 The operator "<CODE>[]</CODE>" is also overloaded to allow you to get the pointer to | |
85 a particular string in the dictionary. Simply index the dictionary with | |
86 an unsigned integer. The result will be a pointer to the desired | |
87 string. If the index is zero, or larger than the number of strings in | |
88 the dictionary, the pointer will be null. For example: | |
89 <PRE> | |
90 unsigned k = td << "Hello"; | |
91 char *ptr = td[k]; //returns pointer to "Hello" | |
92 </PRE> | |
93 <BR> | |
94 | |
95 <H2>Number of Entries in the Dictionary </H2> | |
96 | |
97 To find the number of entries in a dictionary, use the function "size". | |
98 It returns an unsigned integer. Since the zero entry is unused, the | |
99 integer returned by size is the index of the last entry in the | |
100 dictionary. For example: | |
101 <PRE> | |
102 unsigned n = size(td); | |
103 </PRE> | |
104 <P> | |
105 <BR> | |
106 | |
107 <H2>Resetting a String Dictionary </H2> | |
108 | |
109 The overloaded function reset() may be used to restore a string | |
110 dictionary to its initial condition, deleting all entries: | |
111 <PRE> | |
112 reset(td); | |
113 </PRE> | |
114 <P> | |
115 <BR> | |
116 | |
117 <H2>Technical Notes </H2> | |
118 | |
119 The dictionary uses a hashing algorithm freely adapted from Fletcher's | |
120 checksum as described by John Kodis in Doctor Dobb's Journal, May, | |
121 1992. | |
122 <P> | |
123 The length of the hash table is the smallest power of two that is | |
124 greater than the specified size of the dictionary. Therefore, if you | |
125 are using a computer with 8086 architecture, you are limited to a | |
126 dictionary with fewer than 32768 entries. | |
127 <P> | |
128 The hash algorithm produces two unsigned integers, k1 and k2, which are | |
129 each less than 255. The hash code is then calculated as | |
130 <CODE> 255*k1 + k2</CODE>, | |
131 and masked to the length of the hash table. In case of collision, a | |
132 hash increment is calculated by using k1 and k2 in the opposite order: | |
133 <CODE> 255*k2 + k1</CODE>. | |
134 This is multiplied by two and one is added to guarantee an | |
135 increment which is relatively prime to the length of the hash table. | |
136 This process guarantees the entire table can be searched. | |
137 <P> | |
138 The number of calls and the number of collisions are recorded so you | |
139 can check the efficiency of the hash algorithm. | |
140 <P> | |
141 Because the dictionary allocates a single character array for all the | |
142 dictionary entries, the product of the max number of entries and the | |
143 average size must be less than 65536 when using 8086 architecture. If | |
144 this is inconvenient there are several alternatives. You can allocate | |
145 storage individually for each string and simply maintain a list of | |
146 pointers, or you can declare a "huge" array, and use longs to index it. | |
147 Remember that in this case the hash table would have to be declared as | |
148 an array of longs. | |
149 | |
150 | |
151 <P> | |
152 <BR> | |
153 | |
154 | |
155 <IMG ALIGN="bottom" SRC="../images/rbline6j.gif" ALT="----------------------" | |
156 WIDTH=1010 HEIGHT=2 > | |
157 <P> | |
158 <IMG ALIGN="right" SRC="../images/pslrb6d.gif" ALT="Parsifal Software" | |
159 WIDTH=181 HEIGHT=25> | |
160 <BR CLEAR="right"> | |
161 | |
162 <P> | |
163 Back to : | |
164 <A HREF="../index.html">Index</A> | | |
165 <A HREF="index.html">Class libraries for examples</A> | |
166 | |
167 <P> | |
168 <ADDRESS><FONT SIZE="-1"> | |
169 AnaGram parser generator - examples<BR> | |
170 Class libraries - String dictionary class definition<BR> | |
171 Copyright © 1993-1999, Parsifal Software. <BR> | |
172 All Rights Reserved.<BR> | |
173 </FONT></ADDRESS> | |
174 | |
175 </BODY> | |
176 </HTML> | |
177 |