comparison doc/mansupp/mansupp-201.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 <HTML> | |
2 <HEAD> | |
3 <TITLE>AnaGram 2.01 Manual Supplement</TITLE> | |
4 </HEAD> | |
5 <BODY TEXT="#000000" LINK="#0000ff" VLINK="#551a8b" ALINK="#ff0000" | |
6 BGCOLOR="#ffffff"> | |
7 | |
8 <H1 ALIGN="CENTER">AnaGram 2.01</H1> | |
9 | |
10 <H1 ALIGN="CENTER">Supplement to User's Guide</H1> | |
11 <P> | |
12 <BR> | |
13 | |
14 <H2>Thread Safe Parsers</H2> | |
15 | |
16 <P> | |
17 AnaGram 2.01 incorporates several changes designed to make it | |
18 easier to write thread safe parsers. | |
19 </P> | |
20 | |
21 <P> | |
22 First, the new <B><A HREF="#reentrantParser">reentrant parser</A></B> | |
23 switch makes the AnaGram parse | |
24 engine reentrant by passing the parser control block as an argument | |
25 to all function calls. Without it, the parser control block is a | |
26 global resource, so only one parse context can be in use at a | |
27 time. | |
28 </P> | |
29 | |
30 <P> | |
31 Second, the <B><A HREF="#extendPcb">extend pcb</A></B> | |
32 statement allows you to add your own | |
33 declarations to the parser control block, so that you can avoid | |
34 references to global or static variables in your reduction procedures. | |
35 </P> | |
36 | |
37 <P> | |
38 Finally, the parsers generated by AnaGram 2.01 no longer use any | |
39 static or global variables to store temporary data. All working storage | |
40 is now kept on the stack or in the parser control block. | |
41 </P> | |
42 | |
43 <P> | |
44 These are the steps to make a parser thread safe: | |
45 <UL> | |
46 <LI>Set the <B>reentrant parser</B> switch in your syntax file.</LI> | |
47 <LI>Add one or more <B>extend pcb</B> statements to your syntax file | |
48 and include declarations for all the variables needed by your | |
49 reduction procedures. Update your reduction procedures | |
50 accordingly.</LI> | |
51 <LI>If your parser will modify any variable which is not in the | |
52 parser control block, make sure that variable is protected by | |
53 a mutex, or otherwise synchronized properly.</LI> | |
54 <LI>To run the parser, declare an instance of the parser control | |
55 block <EM>on the stack</EM>, initialize your fields in the | |
56 parser control block as appropriate, lock any relevant mutexes, | |
57 and then call the parser function with a pointer to the parser | |
58 control block as the argument.</LI> | |
59 </UL> | |
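<P>
The steps above can be sketched in C++ as follows. This is only an illustration, not AnaGram output: <TT>demo_pcb_type</TT>, <TT>demo_parse</TT>, and the fields shown are hypothetical stand-ins for the generated parser control block, the generated parse function, and fields added with <B>extend pcb</B>. Because every call keeps its state in its own control block on its own stack, no mutex is needed in this particular sketch.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a generated parser control block.
// wordCount plays the role of a field added with "extend pcb".
struct demo_pcb_type {
    const char *input;   // text to scan (assumed field)
    int exit_flag;       // 0 means success (assumed field)
    int wordCount;       // user data kept in the control block
};

// Hypothetical stand-in for a reentrant parse function: all state is
// reached through the pointer argument, never through globals or statics.
void demo_parse(demo_pcb_type *pcb) {
    bool inWord = false;
    for (const char *p = pcb->input; *p != '\0'; ++p) {
        bool isSpace = (*p == ' ' || *p == '\t' || *p == '\n');
        if (!isSpace && !inWord) ++pcb->wordCount;
        inWord = !isSpace;
    }
    pcb->exit_flag = 0;
}

// One parse: control block on the stack, fields initialized, pointer passed.
int countWords(const std::string &text) {
    demo_pcb_type pcb;
    pcb.input = text.c_str();
    pcb.exit_flag = -1;
    pcb.wordCount = 0;
    demo_parse(&pcb);
    return pcb.wordCount;
}

// Runs one parse per input, each in its own thread.
std::vector<int> countWordsInParallel(const std::vector<std::string> &inputs) {
    std::vector<int> results(inputs.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&inputs, &results, i] {
            results[i] = countWords(inputs[i]);   // each thread writes its own element
        });
    }
    for (std::thread &w : workers) w.join();
    return results;
}
```

<P>
If a reduction procedure also had to update data shared between threads, that update would need a lock, as described in the third step.
</P>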
60 <BR> | |
61 | |
62 <H2>Added C++ Support</H2> | |
63 | |
64 <P> | |
65 In previous versions of AnaGram it was not possible to return class | |
66 instances (rather than pointers to them) from reduction procedures except | |
67 under limited circumstances. This is because AnaGram generates code that | |
68 stores objects on the parser value stack simply by casting the stack pointer | |
69 and assigning the value. This approach is correct for all traditional data | |
70 types, but leads to unpredictable behavior for a class that has supplied its | |
71 own assignment operator. Overloaded assignment operators depend on the | |
72 destination being a valid instance of the class. With the traditional AnaGram | |
73 parser value stack, however, this is not normally the case. | |
74 </P> | |
75 | |
76 <P> | |
77 Since there are many classes, such as string classes, which require | |
78 their own implementation of the assignment operator, the restriction | |
79 on returning class instances has often made reduction procedures | |
80 unnecessarily complex. | |
81 </P> | |
82 | |
83 <P> | |
84 AnaGram 2.01 now has a <B><A HREF="#wrapper">wrapper</A></B> | |
85 statement which can be used to | |
86 overcome this problem. For each class specified in a <B>wrapper</B> | |
87 statement, AnaGram generates a wrapper class that transparently | |
88 solves the problem. The stacked object is created using the copy | |
89 constructor. The reduction procedure is called with a reference to the | |
90 stacked object rather than a copy. Wrapped objects are removed <EM>after</EM> | |
91 the reduction procedure that uses them returns. | |
92 </P> | |
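<P>
The idea behind a wrapper can be illustrated with a small C++ sketch. It is not the code AnaGram generates. <TT>StackSlot</TT>, <TT>storeWrapped</TT>, <TT>removeWrapped</TT>, and <TT>roundTrip</TT> are hypothetical names, and the sketch only assumes what the text states: the object is copy-constructed into the stack slot, used by reference, and destroyed afterwards.
</P>

```cpp
#include <cassert>
#include <new>
#include <string>

// Raw, suitably aligned storage standing in for one value stack slot.
// No object of type T lives here until one is constructed in place.
template <typename T>
struct StackSlot {
    alignas(T) unsigned char bytes[sizeof(T)];
};

// Store: copy-construct into the raw slot with placement new.
// Assigning through a cast pointer instead would invoke T::operator=
// on memory that does not yet hold a valid T.
template <typename T>
T &storeWrapped(StackSlot<T> &slot, const T &value) {
    return *new (slot.bytes) T(value);
}

// Remove: run the destructor explicitly; the storage itself is not freed.
template <typename T>
void removeWrapped(T &stacked) {
    stacked.~T();
}

// Store a string, use it by reference as a reduction procedure would,
// then remove it after that use is finished.
std::string roundTrip(const std::string &value) {
    StackSlot<std::string> slot;
    std::string &stacked = storeWrapped(slot, value);
    std::string result = stacked + "!";
    removeWrapped(stacked);
    return result;
}
```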
93 <BR> | |
94 | |
95 <H2>Error Diagnostic Support</H2> | |
96 | |
97 <P> | |
98 The error diagnostics created by the <STRONG>diagnose errors</STRONG> | |
99 switch have | |
100 been revised so that their text is defined by macros which the user | |
101 can replace. There are three macros involved: | |
102 </P> | |
103 | |
104 <UL> | |
105 <LI><TT>MISSING_FORMAT</TT>. The default definition of this macro is | |
106 <CODE>"Missing %s"</CODE>. It is used when the parser expects a unique | |
107 input token, the name of the token exists in the <B>token names</B> | |
108 table, and the token is not found in the input.</LI> | |
109 <LI><TT>UNEXPECTED_FORMAT</TT>. The default definition of this | |
110 macro is <CODE>"Unexpected %s"</CODE>. It is used when there is more | |
111 than one possible input token, but the token found is not one of | |
112 those expected.</LI> | |
113 <LI><TT>UNNAMED_TOKEN</TT>. The default definition is <TT>"input"</TT>. It | |
114 is used in place of a token name in <TT>UNEXPECTED_FORMAT</TT> | |
115 when the actual input encountered cannot be identified as a | |
116 token.</LI> | |
117 </UL> | |
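<P>
As a rough sketch of how replaceable format macros of this kind behave, the C++ fragment below supplies a user definition of <TT>MISSING_FORMAT</TT> and lets the other two macros fall back to the defaults quoted above. The helper <TT>formatDiagnostic</TT> and the <TT>#ifndef</TT> arrangement are assumptions made for illustration. The generated parser's own diagnostic code is not shown in this supplement.
</P>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// User-supplied replacement, defined before the defaults are seen.
#define MISSING_FORMAT "Expected %s"

// Defaults as quoted in the text, used only when not already defined.
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

// Hypothetical helper: expectedName is non-null when exactly one named
// token was expected; foundName is null when the offending input could
// not be identified as a token.
std::string formatDiagnostic(const char *expectedName, const char *foundName) {
    char buffer[128];
    if (expectedName != nullptr) {
        std::snprintf(buffer, sizeof buffer, MISSING_FORMAT, expectedName);
    } else {
        std::snprintf(buffer, sizeof buffer, UNEXPECTED_FORMAT,
                      foundName != nullptr ? foundName : UNNAMED_TOKEN);
    }
    return buffer;
}
```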
118 | |
119 <P> | |
120 Note that if <B>diagnose errors</B> is ON, AnaGram automatically | |
121 includes in your generated parser the array of strings specified by the | |
122 <TT>TOKEN_NAMES</TT> macro, which is useful in creating | |
123 diagnostics. The default | |
124 name of this array is | |
125 <PRE> | |
126 &lt;parser name&gt;_token_names | |
127 </PRE> | |
128 </P> | |
129 <BR> | |
130 | |
131 <H2>New Attribute Statements</H2> | |
132 | |
133 <H3><A NAME="extendPcb">extend pcb</A></H3> | |
134 | |
135 <P> | |
136 The <B>extend pcb</B> statement is an attribute statement that allows you to | |
137 add declarations of your own to the parser control block. With this | |
138 feature, data needed by reduction procedures can be stored in the | |
139 parser control block rather than in global or static storage. This | |
140 capability greatly facilitates the construction of thread safe | |
141 parsers. | |
142 </P> | |
143 | |
144 <P> | |
145 The <B>extend pcb</B> statement may be used in any configuration section. | |
146 The format is as follows: | |
147 <PRE> | |
148 extend pcb { &lt;C or C++ declaration&gt;... } | |
149 </PRE> | |
150 </P> | |
151 | |
152 <P> | |
153 It may, of course, extend over multiple lines and may contain any number | |
154 of C or C++ declarations of any kind. AnaGram will append it to the end of | |
155 the parser control block definition in the generated parser header file. | |
156 There may be any number of <B>extend pcb</B> statements. The extensions are | |
157 appended to the parser control block definition in the order in which they | |
158 occur in the syntax file. | |
159 </P> | |
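<P>
For instance, a configuration section along the following lines would add two members to the parser control block. The member names <TT>lineCount</TT> and <TT>fileName</TT> are made up for this illustration.
</P>

```
[
  extend pcb {
    int lineCount;
    const char *fileName;
  }
]
```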
160 | |
161 <P> | |
162 The <B>extend pcb</B> statement is compatible with both C and C++ parsers. | |
163 Note that even if you are deriving your own class from the parser | |
164 control block, you might want to use <B>extend pcb</B> to provide virtual | |
165 function definitions or other declarations appropriate to a base class. | |
166 </P> | |
167 | |
168 <H3><A NAME="wrapper">wrapper</A></H3> | |
169 | |
170 <P> | |
171 The <B>wrapper</B> attribute statement provides correct handling of C++ | |
172 objects returned inline by reduction procedures. | |
173 </P> | |
174 | |
175 <P> | |
176 If you specify a wrapper for a C++ object, when a reduction | |
177 procedure returns an instance of the object, a copy of the object will | |
178 be constructed on the parser value stack and the destructor will be | |
179 called when that object is removed from the stack. | |
180 </P> | |
181 | |
182 <P> | |
183 Without a wrapper, objects are stored on the value stack simply by | |
184 coercing the stack pointer to the appropriate type. No constructor | |
185 is called when the object is stored, and no destructor is called when | |
186 it is removed from the stack. | |
187 </P> | |
188 | |
189 <P> | |
190 Classes which use reference counts or otherwise overload the | |
191 assignment operator should always have wrappers in order to | |
192 function correctly. | |
193 </P> | |
194 | |
195 <P> | |
196 Wrapper statements, like other attribute statements, must appear in | |
197 configuration sections. The syntax is: | |
198 <PRE> | |
199 wrapper {&lt;comma delimited list of data types&gt;} | |
200 </PRE> | |
201 For example: | |
202 <PRE> | |
203 [ | |
204 wrapper {CString, CFont} | |
205 ] | |
206 </PRE> | |
207 </P> | |
208 | |
209 <P> | |
210 You cannot specify a wrapper for the <B>default token type</B>. | |
211 </P> | |
212 | |
213 <P> | |
214 If your parser uses AnaGram wrappers and exits with an error condition, there | |
215 may be objects remaining on the parser value stack. If you have no | |
216 further use for | |
217 these objects, you should call the <TT>DELETE_WRAPPERS</TT> macro on error exit | |
218 so that they will be properly deleted, thus avoiding a memory leak. If you | |
219 have enabled <B>auto resynch</B>, <TT>DELETE_WRAPPERS</TT> will be | |
220 invoked automatically. | |
221 </P> | |
222 <BR> | |
223 | |
224 <H2>Changed Configuration Parameters</H2> | |
225 | |
226 <H3>Parser stack alignment</H3> | |
227 | |
228 <P> | |
229 <B>Parser stack alignment</B> now defaults to <TT>long</TT> instead | |
230 of <TT>int</TT>. With | |
231 this default, AnaGram parsers will compile and run on 64-bit | |
232 processors with no further attention. Users who are building parsers | |
233 for embedded systems or other uses where memory is limited may | |
234 want to override this default value with their own specification. | |
235 </P> | |
236 | |
237 <H3>Parser stack size</H3> | |
238 | |
239 <P> | |
240 <B>Parser stack size</B> now defaults to 128 instead of 32. AnaGram | |
241 adjusts the parser stack size upwards, if necessary, depending on the | |
242 grammar. If your grammar uses only left recursive constructs, you | |
243 will never have a problem with parser stack overflow. If there is | |
244 center recursion or right recursion in your grammar, however, there | |
245 always exists syntactically correct input which can cause stack | |
246 overflow no matter how large the stack. Be sure that the parser stack | |
247 size is large enough to handle all reasonable cases. | |
248 </P> | |
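<P>
The difference between left and right recursion can be seen by counting stacked symbols for a simple list grammar. The C++ sketch below simulates the shift and reduce steps of a bottom-up parser by hand. It is a simplified model (it counts grammar symbols only and ignores any bookkeeping entries a real parse engine keeps), and the function names are hypothetical.
</P>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Left recursive:  list -> item | list item
// Each new item is reduced as soon as it is shifted, so at most two
// symbols are ever stacked, however long the input.
std::size_t maxDepthLeftRecursive(std::size_t itemCount) {
    std::size_t depth = 0, maxDepth = 0;
    for (std::size_t i = 0; i < itemCount; ++i) {
        ++depth;                                 // shift the next item
        maxDepth = std::max(maxDepth, depth);
        if (i > 0) --depth;                      // reduce "list item" to list
        // for the first item, "item" reduces to list with no change in depth
    }
    return maxDepth;
}

// Right recursive: list -> item | item list
// No reduction is possible until the last item has been shifted, so the
// stack grows with the length of the input.
std::size_t maxDepthRightRecursive(std::size_t itemCount) {
    std::size_t depth = 0, maxDepth = 0;
    for (std::size_t i = 0; i < itemCount; ++i) {
        ++depth;                                 // shift; nothing can be reduced yet
        maxDepth = std::max(maxDepth, depth);
    }
    while (depth > 1) --depth;                   // reduce "item list" to list, repeatedly
    return maxDepth;
}
```

<P>
In this model a 1000-item list needs 2 stack entries with the left recursive rule and 1000 with the right recursive rule.
</P>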
249 | |
250 <H3>Token names</H3> | |
251 | |
252 <P> | |
253 <B>Token names</B> defaults to OFF. When it is ON, AnaGram generates a | |
254 static array of character strings, indexed by token number, to provide | |
255 ASCII representations of token names for use in error diagnostics. | |
256 </P> | |
257 | |
258 <P> | |
259 The array contains strings for all grammar tokens which have been | |
260 explicitly named in the syntax file as well as tokens which represent | |
261 keywords or single character constants. | |
262 </P> | |
263 | |
264 <P> | |
265 Prior to version 2.01 of AnaGram, the array contained strings | |
266 for explicitly named tokens only. If this restriction is required, set the | |
267 <B>token names only</B> switch. | |
268 </P> | |
269 | |
270 <H2>New Configuration Parameters</H2> | |
271 | |
272 <H3>iso latin 1</H3> | |
273 | |
274 <P> | |
275 The <B>iso latin 1</B> configuration switch defaults to ON. It controls case | |
276 conversion on input characters when the <B>case sensitive</B> switch is set | |
277 to OFF. When <B>iso latin 1</B> is set, the default <TT>CONVERT_CASE</TT> macro | |
278 is defined to correctly convert all characters in the latin 1 character | |
279 set. | |
280 </P> | |
281 | |
282 <P> | |
283 When the <B>iso latin 1</B> switch is OFF, only characters in the ASCII range | |
284 (0-127) are converted. | |
285 </P> | |
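<P>
The effect of the switch can be sketched with a small C++ function. This is not the definition of <TT>CONVERT_CASE</TT> that AnaGram emits. The function name <TT>convertCase</TT> and the choice to fold to upper case are assumptions made for illustration. In ISO Latin 1 the lower case letters 0xE0 through 0xFE, other than the division sign 0xF7, have upper case forms exactly 0x20 below them.
</P>

```cpp
#include <cassert>

// Folds one character code to upper case (direction is an assumption).
// With isoLatin1 false, only the ASCII letters a-z are converted.
unsigned char convertCase(unsigned char c, bool isoLatin1) {
    if (c >= 'a' && c <= 'z') {
        return static_cast<unsigned char>(c - ('a' - 'A'));
    }
    // Latin 1 lower case letters; 0xF7 is the division sign, and 0xDF
    // (sharp s) and 0xFF (y with diaeresis) have no upper case form
    // in this character set, so they are left unchanged.
    if (isoLatin1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) {
        return static_cast<unsigned char>(c - 0x20);
    }
    return c;
}
```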
286 | |
287 <H3><A NAME="reentrantParser">reentrant parser</A></H3> | |
288 | |
289 <P> | |
290 The <B>reentrant parser</B> configuration switch defaults to OFF. If you | |
291 turn it on, AnaGram will generate code that passes the parser control | |
292 block to every function as an explicit argument, so that no function | |
293 needs a static reference to find the control block. | |
294 </P> | |
295 | |
296 <P> | |
297 AnaGram declares the parser control block argument using the macro | |
298 <TT>PCB_TYPE</TT>. For example, | |
299 <PRE> | |
300 static void ag_ra(PCB_TYPE *pcb_pointer) | |
301 </PRE> | |
302 AnaGram will define <TT>PCB_TYPE</TT> as the type of the parser | |
303 control block if you | |
304 do not define it otherwise. If you are using C++, and derive a class from the | |
305 parser control block, you can override the definition of | |
306 <TT>PCB_TYPE</TT> in order to | |
307 make your derived class accessible from your reduction procedures. | |
308 </P> | |
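<P>
A minimal C++ sketch of this override follows. The names <TT>demo_pcb_type</TT>, <TT>CountingPcb</TT>, and <TT>demo_reduce</TT> are hypothetical stand-ins for the generated control block type, a user class derived from it, and a function in the generated parser. Only the <TT>PCB_TYPE</TT> macro itself comes from AnaGram.
</P>

```cpp
#include <cassert>

// Hypothetical stand-in for the generated parser control block type.
struct demo_pcb_type {
    int exit_flag = -1;
};

// Hypothetical user class derived from the control block.
struct CountingPcb : demo_pcb_type {
    int reductions = 0;
};

// User override, placed ahead of the default definition.
#define PCB_TYPE CountingPcb

// Default, used only when the user has not defined PCB_TYPE.
#ifndef PCB_TYPE
#define PCB_TYPE demo_pcb_type
#endif

// Stand-in for a generated function. Because the argument is declared
// with PCB_TYPE, members of the derived class are directly accessible.
void demo_reduce(PCB_TYPE *pcb_pointer) {
    ++pcb_pointer->reductions;
    pcb_pointer->exit_flag = 0;
}
```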
309 | |
310 <P> | |
311 The <B>reentrant parser</B> switch cannot be used in conjunction with the | |
312 <B>old style</B> switch. | |
313 </P> | |
314 | |
315 <P> | |
316 When you have enabled the reentrant parser switch, the parse | |
317 function, the initializer function, and the parser value function are all | |
318 defined to take a pointer to the parser control block as their sole | |
319 argument. | |
320 </P> | |
321 | |
322 <H3>token names only</H3> | |
323 | |
324 <P> | |
325 <B>Token names only</B> defaults to OFF. This configuration | |
326 switch was added to AnaGram 2.01 to provide the functionality previously | |
327 provided by the <B>token names</B> switch. When <B>token names | |
328 only</B> is ON, only tokens which have been given explicit names in the | |
329 syntax file have non-empty strings in the generated list of character strings. | |
330 <B>Token names only</B> takes precedence over the <B>token names</B> switch. | |
331 </P> | |
332 | |
333 <H3>no cr</H3> | |
334 | |
335 <P> | |
336 The <B>no cr</B> configuration switch is provided for developers | |
337 who intend to use the generated parser on a Unix system. When | |
338 <B>no cr</B> is set, it causes AnaGram's | |
339 output parser and header files to be written without carriage | |
340 returns. The switch defaults to OFF, to maintain compatibility with | |
341 Windows systems. | |
342 </P> | |
343 | |
344 </BODY> | |
345 </HTML> |