comparison doc/mansupp/mansupp-201.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
-1:000000000000 0:13d2b8934445
1 <HTML>
2 <HEAD>
3 <TITLE>AnaGram 2.01 Manual Supplement</TITLE>
4 </HEAD>
5 <BODY TEXT="#000000" LINK="#0000ff" VLINK="#551a8b" ALINK="#ff0000"
6 BGCOLOR="#ffffff">
7
8 <H1 ALIGN="CENTER">AnaGram 2.01</H1>
9
10 <H1 ALIGN="CENTER">Supplement to User's Guide</H1>
11 <P>
12 <BR>
13
14 <H2>Thread Safe Parsers</H2>
15
16 <P>
17 AnaGram 2.01 incorporates several changes designed to make it
18 easier to write thread safe parsers.
19 </P>
20
21 <P>
22 First, the new <B><A HREF="#reentrantParser">reentrant parser</A></B>
23 switch makes the AnaGram parse
24 engine reentrant by passing the parser control block as an argument
25 to all function calls. Without it, the parser control block is a
26 global resource, so only one parse context can be in use at a
27 time.
28 </P>
29
30 <P>
31 Second, the <B><A HREF="#extendPcb">extend pcb</A></B>
32 statement allows you to add your own
33 declarations to the parser control block, so that you can avoid
34 references to global or static variables in your reduction procedures.
35 </P>
36
37 <P>
38 Finally, the parsers generated by AnaGram 2.01 no longer use any
39 static or global variables to store temporary data. All working storage
40 is now kept on the stack or in the parser control block.
41 </P>
42
43 <P>
44 These are the steps to make a parser thread safe:
45 <UL>
46 <LI>Set the <B>reentrant parser</B> switch in your syntax file.</LI>
47 <LI>Add one or more <B>extend pcb</B> statements to your syntax file
48 and include declarations for all the variables needed by your
49 reduction procedures. Update your reduction procedures
50 accordingly.</LI>
51 <LI>If your parser will modify any variable which is not in the
52 parser control block, make sure that variable is protected by
53 a mutex, or otherwise synchronized properly.</LI>
54 <LI>To run the parser, declare an instance of the parser control
55 block <EM>on the stack</EM>, initialize your fields in the
56 parser control block as appropriate, lock any relevant mutexes,
57 and then call the parser function with a pointer to the parser
58 control block as the argument.</LI>
59 </UL>
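<P>
The last step above can be sketched as follows. This is a minimal
illustration only, not AnaGram output: <TT>demo_pcb_type</TT>,
<TT>demo_parse</TT>, <TT>sum_list</TT> and all field names are hypothetical
stand-ins for a generated control block, a generated parse function, and a
caller. The members <TT>total</TT> and <TT>count</TT> play the role of
declarations added with <B>extend pcb</B>.
</P>

```cpp
#include <string>

// Hypothetical stand-in for a generated parser control block.
// "total" and "count" play the role of fields added with "extend pcb";
// all names here are assumptions made for illustration.
struct demo_pcb_type {
    const char *pointer;  // input cursor
    int exit_flag;        // status set by the parse function
    long total;           // user data that would otherwise be a global
    int count;            // number of values seen
};

// Hypothetical stand-in for a generated parse function. As with the
// reentrant parser switch, it receives the control block as its argument.
// This toy version sums comma-separated non-negative integers.
void demo_parse(demo_pcb_type *pcb) {
    long value = 0;
    bool in_number = false;
    for (;; ++pcb->pointer) {
        char c = *pcb->pointer;
        if (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            in_number = true;
        } else {
            if (in_number) {          // a number just ended: record it
                pcb->total += value;
                ++pcb->count;
            }
            value = 0;
            in_number = false;
            if (c == '\0') break;     // end of input
        }
    }
    pcb->exit_flag = 0;
}

// Each caller declares its own control block on the stack, initializes
// its fields, and passes a pointer to the parse function.
long sum_list(const std::string &text) {
    demo_pcb_type pcb = {};           // zero-initialize every field
    pcb.pointer = text.c_str();
    demo_parse(&pcb);
    return pcb.total;
}
```

<P>
Because every piece of working data lives in the caller's own control
block, two calls to <TT>sum_list</TT> share no mutable state, so in this
sketch there is nothing that would need a mutex.
</P>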
60 <BR>
61
62 <H2>Added C++ Support</H2>
63
64 <P>
65 In previous versions of AnaGram it was not possible to return class
66 instances (rather than pointers to them) from reduction procedures except
67 under limited circumstances. This is because AnaGram generates code that
68 stores objects on the parser value stack simply by casting the stack pointer
69 and assigning the value. This approach is correct for all traditional data
70 types, but leads to unpredictable behavior for a class that has supplied its
71 own assignment operator. Overloaded assignment operators depend on the
72 destination being a valid instance of the class. With the traditional AnaGram
73 parser value stack, however, this is not normally the case.
74 </P>
75
76 <P>
77 Since there are many classes, such as string classes, which require
78 their own implementation of the assignment operator, the restriction
79 on returning class instances has often made reduction procedures
80 unnecessarily complex.
81 </P>
82
83 <P>
84 AnaGram 2.01 now has a <B><A HREF="#wrapper">wrapper</A></B>
85 statement which can be used to
86 overcome this problem. For each class specified in a <B>wrapper</B>
87 statement, AnaGram generates a wrapper class that transparently
88 solves the problem. The stacked object is created using the copy
89 constructor. The reduction procedure is called with a reference to the
90 stacked object rather than a copy. Wrapped objects are removed <EM>after</EM>
91 the reduction procedure that uses them returns.
92 </P>
93 <BR>
94
95 <H2>Error Diagnostic Support</H2>
96
97 <P>
98 The error diagnostics created by the <STRONG>diagnose errors</STRONG>
99 switch have
100 been revised so that their text is defined by macros which the user
101 can replace. There are three macros involved:
102 </P>
103
104 <UL>
105 <LI><TT>MISSING_FORMAT</TT>. The default definition of this macro is
106 <CODE>"Missing %s"</CODE>. It is used when the parser expects a unique
107 input token, the name of the token exists in the <B>token names</B>
108 table, and the token is not found in the input.</LI>
109 <LI><TT>UNEXPECTED_FORMAT</TT>. The default definition of this
110 macro is <CODE>"Unexpected %s"</CODE>. It is used when there is more
111 than one possible input token, but the token found is not one of
112 those expected.</LI>
113 <LI><TT>UNNAMED_TOKEN</TT>. The default definition is <TT>"input"</TT>. It
114 is used in place of a token name in <TT>UNEXPECTED_FORMAT</TT>
115 when the actual input encountered cannot be identified as a
116 token.</LI>
117 </UL>
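<P>
As a rough sketch of how such replaceable macros can work, the fragment
below overrides <TT>MISSING_FORMAT</TT> and keeps the documented defaults for
the other two. The <TT>#ifndef</TT> guards are an assumption about how a
user definition takes precedence, and <TT>format_diagnostic</TT> is a
hypothetical helper, not a function in the generated parser.
</P>

```cpp
#include <cstdio>
#include <string>

// User override: defined first, so the guarded default below is skipped.
#define MISSING_FORMAT "Expected %s here"

// Documented defaults. The #ifndef guards are an assumption about how
// a user-supplied definition is allowed to win.
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

// Hypothetical helper: build a diagnostic from a format macro and a
// token name. A null or empty name falls back to UNNAMED_TOKEN.
std::string format_diagnostic(const char *format, const char *token_name) {
    if (token_name == nullptr || *token_name == '\0') {
        token_name = UNNAMED_TOKEN;
    }
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, format, token_name);
    return buffer;
}
```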
118
119 <P>
120 Note that if <B>diagnose errors</B> is ON, AnaGram automatically
121 includes in your generated parser the array of strings specified by the
122 <TT>TOKEN_NAMES</TT> macro, which is useful in creating
123 diagnostics. The default
124 name of this array is
125 <PRE>
126 &lt;parser name&gt;_token_names
127 </PRE>
128 </P>
129 <BR>
130
131 <H2>New Attribute Statements</H2>
132
133 <H3><A NAME="extendPcb">extend pcb</A></H3>
134
135 <P>
136 The <B>extend pcb</B> statement is an attribute statement that allows you to
137 add declarations of your own to the parser control block. With this
138 feature, data needed by reduction procedures can be stored in the
139 parser control block rather than in global or static storage. This
140 capability greatly facilitates the construction of thread safe
141 parsers.
142 </P>
143
144 <P>
145 The <B>extend pcb</B> statement may be used in any configuration section.
146 The format is as follows:
147 <PRE>
148 extend pcb { &lt;C or C++ declaration&gt;... }
149 </PRE>
150 </P>
151
152 <P>
153 The statement may extend over multiple lines and may contain any number
154 of C or C++ declarations of any kind. AnaGram appends them to the end of
155 the parser control block definition in the generated parser header file.
156 There may be any number of <B>extend pcb</B> statements. The extensions are
157 appended to the parser control block definition in the order in which they
158 occur in the syntax file.
159 </P>
160
161 <P>
162 The <B>extend pcb</B> statement is compatible with both C and C++ parsers.
163 Note that even if you are deriving your own class from the parser
164 control block, you might want to use <B>extend pcb</B> to provide virtual
165 function definitions or other declarations appropriate to a base class.
166 </P>
167
168 <H3><A NAME="wrapper">wrapper</A></H3>
169
170 <P>
171 The <B>wrapper</B> attribute statement provides correct handling of C++
172 objects returned by value from reduction procedures.
173 </P>
174
175 <P>
176 If you specify a wrapper for a C++ class, then when a reduction
177 procedure returns an instance of that class, a copy of the instance is
178 constructed on the parser value stack, and its destructor is
179 called when the copy is removed from the stack.
180 </P>
181
182 <P>
183 Without a wrapper, objects are stored on the value stack simply by
184 coercing the stack pointer to the appropriate type. There is no
185 constructor call when the object is stored and no destructor call when
186 it is removed from the stack.
187 </P>
188
189 <P>
190 Classes which use reference counts or otherwise overload the
191 assignment operator should always have wrappers in order to
192 function correctly.
193 </P>
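<P>
The lifecycle a wrapper provides can be sketched in plain C++ as follows.
This is an illustration of the general technique (copy construction into raw
storage, then an explicit destructor call), not the code AnaGram generates.
<TT>Tracked</TT>, <TT>Slot</TT>, <TT>store</TT> and <TT>discard</TT> are
hypothetical names.
</P>

```cpp
#include <new>
#include <string>

// Hypothetical class with its own copy constructor, assignment operator
// and destructor. It counts live instances so the lifecycle is visible.
struct Tracked {
    static int live;
    std::string text;
    explicit Tracked(const std::string &t) : text(t) { ++live; }
    Tracked(const Tracked &other) : text(other.text) { ++live; }
    Tracked &operator=(const Tracked &other) {
        text = other.text;   // only valid if *this is already constructed
        return *this;
    }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// One raw, suitably aligned slot standing in for a value stack entry.
struct Slot {
    alignas(Tracked) unsigned char bytes[sizeof(Tracked)];
};

// Store: copy-construct the value into the raw slot with placement new,
// instead of casting the slot and assigning to an unconstructed object.
Tracked &store(Slot &slot, const Tracked &value) {
    return *new (slot.bytes) Tracked(value);
}

// Remove: run the destructor explicitly; the slot's bytes are reusable.
void discard(Tracked &stacked) {
    stacked.~Tracked();
}
```

<P>
Assigning through a cast pointer instead of calling <TT>store</TT> would
invoke <TT>operator=</TT> on memory that was never constructed, which is the
unpredictable behavior described above.
</P>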
194
195 <P>
196 Wrapper statements, like other attribute statements, must appear in
197 configuration sections. The syntax is:
198 <PRE>
199 wrapper {&lt;comma delimited list of data types&gt;}
200 </PRE>
201 For example:
202 <PRE>
203 [
204 wrapper {CString, CFont}
205 ]
206 </PRE>
207 </P>
208
209 <P>
210 You cannot specify a wrapper for the <B>default token type</B>.
211 </P>
212
213 <P>
214 If your parser uses AnaGram wrappers and exits with an error condition, there
215 may be objects remaining on the parser value stack. If you have no
216 further use for
217 these objects, you should call the <TT>DELETE_WRAPPERS</TT> macro on error exit
218 so that they will be properly deleted, thus avoiding a memory leak. If you
219 have enabled <B>auto resynch</B>, <TT>DELETE_WRAPPERS</TT> will be
220 invoked automatically.
221 </P>
222 <BR>
223
224 <H2>Changed Configuration Parameters</H2>
225
226 <H3>Parser stack alignment</H3>
227
228 <P>
229 <B>Parser stack alignment</B> now defaults to <TT>long</TT> instead
230 of <TT>int</TT>. With
231 this default, AnaGram parsers will compile and run on 64-bit
232 processors with no further attention. Users who are building parsers
233 for embedded systems or other uses where memory is limited may
234 want to override this default value with their own specification.
235 </P>
236
237 <H3>Parser stack size</H3>
238
239 <P>
240 <B>Parser stack size</B> now defaults to 128 instead of 32. AnaGram
241 adjusts the parser stack size upwards, if necessary, depending on the
242 grammar. If your grammar uses only left recursive constructs, you
243 will never have a problem with parser stack overflow. If there is
244 center recursion or right recursion in your grammar, however, there
245 always exists syntactically correct input which can cause stack
246 overflow no matter how large the stack. Be sure that the parser stack
247 size is large enough to handle all reasonable cases.
248 </P>
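<P>
The effect of recursion direction on stack depth can be illustrated with an
idealized shift-reduce model of a comma-separated list. The two functions
below are hypothetical and only count stacked symbols; the depth AnaGram
actually needs for a given grammar may differ.
</P>

```cpp
#include <algorithm>

// Peak number of stacked symbols when parsing n comma-separated items
// with the left recursive rule
//     list: item | list ',' item
// Each item is reduced as soon as it is shifted, so the depth is bounded.
int peak_depth_left(int n) {
    int depth = 0, peak = 0;
    for (int i = 0; i < n; ++i) {
        if (i > 0) ++depth;          // shift ','
        ++depth;                     // shift item
        peak = std::max(peak, depth);
        depth = 1;                   // reduce everything to a single list
    }
    return peak;
}

// Same input with the right recursive rule
//     list: item | item ',' list
// No reduction is possible until the last item has been shifted, so the
// depth grows with the length of the input.
int peak_depth_right(int n) {
    int depth = 0, peak = 0;
    for (int i = 0; i < n; ++i) {
        if (i > 0) ++depth;          // shift ','
        ++depth;                     // shift item
        peak = std::max(peak, depth);
    }
    // The reductions that follow only shrink the stack.
    return peak;
}
```

<P>
In this model a right recursive list of 65 items already needs 129 entries,
one more than the default stack size of 128, while the left recursive form
never needs more than 3.
</P>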
249
250 <H3>Token names</H3>
251
252 <P>
253 <B>Token names</B> defaults to OFF. If it is set, AnaGram generates a
254 static array of character strings, indexed by token number, to provide
255 ASCII representations of token names for use in error diagnostics.
256 </P>
257
258 <P>
259 The array contains strings for all grammar tokens which have been
260 explicitly named in the syntax file as well as tokens which represent
261 keywords or single character constants.
262 </P>
263
264 <P>
265 Prior to version 2.01 of AnaGram, the array contained strings
266 for explicitly named tokens only. If this restriction is required, set the
267 <B>token names only</B> switch.
268 </P>
269
270 <H2>New Configuration Parameters</H2>
271
272 <H3>iso latin 1</H3>
273
274 <P>
275 The <B>iso latin 1</B> configuration switch defaults to ON. It controls case
276 conversion on input characters when the <B>case sensitive</B> switch is set
277 to OFF. When <B>iso latin 1</B> is set, the default <TT>CONVERT_CASE</TT> macro
278 is defined to correctly convert all characters in the ISO Latin-1 character
279 set.
280 </P>
281
282 <P>
283 When the <B>iso latin 1</B> switch is OFF, only characters in the ASCII range
284 (0-127) are converted.
285 </P>
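<P>
The difference between the two settings can be sketched as below. The
function <TT>convert_case</TT> is hypothetical and is not the definition of
the <TT>CONVERT_CASE</TT> macro. It assumes that conversion is from lower to
upper case and that the source is compiled with an ASCII-compatible
character set.
</P>

```cpp
// Hypothetical sketch: convert one character code (0-255) to upper case.
// With latin1 == false only the ASCII letters a-z are converted.
// With latin1 == true the accented lower case letters 0xE0-0xFE of
// ISO 8859-1 are converted as well, except 0xF7 (the division sign).
// 0xDF (sharp s) and 0xFF (y with diaeresis) have no upper case form
// in ISO 8859-1 and are returned unchanged.
int convert_case(int c, bool latin1) {
    if (c >= 'a' && c <= 'z') return c - 0x20;
    if (latin1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    return c;
}
```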
286
287 <H3><A NAME="reentrantParser">reentrant parser</A></H3>
288
289 <P>
290 The <B>reentrant parser</B> configuration switch defaults to OFF. If you
291 turn it on, AnaGram generates code that passes the parser control
292 block to each function as an argument, so that functions do not need a
293 static reference to find the control block.
294 </P>
295
296 <P>
297 AnaGram declares the type of the parser control block argument using the macro
298 <TT>PCB_TYPE</TT>. For example,
299 <PRE>
300 static void ag_ra(PCB_TYPE *pcb_pointer)
301 </PRE>
302 AnaGram will define <TT>PCB_TYPE</TT> as the type of the parser
303 control block if you
304 do not define it otherwise. If you are using C++, and derive a class from the
305 parser control block, you can override the definition of
306 <TT>PCB_TYPE</TT> in order to
307 make your derived class accessible from your reduction procedures.
308 </P>
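<P>
One way such an override could look is sketched below. Everything except the
macro name <TT>PCB_TYPE</TT> is hypothetical: <TT>demo_pcb_struct</TT> stands
in for the generated control block, <TT>MyParser</TT> for a user class derived
from it, and <TT>demo_reduce</TT> for a generated function. The
<TT>#ifndef</TT> guard is an assumption about how the default is supplied.
</P>

```cpp
#include <string>

// Hypothetical stand-in for the generated parser control block.
struct demo_pcb_struct {
    int exit_flag;
};

// User class derived from the control block, with its own members.
struct MyParser : demo_pcb_struct {
    std::string log;
    void note(const char *message) { log += message; }
};

// User override, placed before the point where the default is supplied.
#define PCB_TYPE MyParser

// Assumed shape of the default: used only if PCB_TYPE is not yet defined.
#ifndef PCB_TYPE
#define PCB_TYPE demo_pcb_struct
#endif

// Stand-in for a generated function. Because PCB_TYPE names the derived
// class, its members are reachable through pcb_pointer without a cast.
static void demo_reduce(PCB_TYPE *pcb_pointer) {
    pcb_pointer->note("reduced;");
    pcb_pointer->exit_flag = 0;
}
```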
309
310 <P>
311 The <B>reentrant parser</B> switch cannot be used in conjunction with the
312 <B>old style</B> switch.
313 </P>
314
315 <P>
316 When you have enabled the reentrant parser switch, the parse
317 function, the initializer function, and the parser value function are all
318 defined to take a pointer to the parser control block as their sole
319 argument.
320 </P>
321
322 <H3>token names only</H3>
323
324 <P>
325 <B>Token names only</B> defaults to OFF. This configuration
326 switch was added to AnaGram 2.01 to provide the functionality previously
327 provided by the <B>token names</B> switch. When <B>token names
328 only</B> is ON, only tokens which have been given explicit names in the
329 syntax file have non-empty strings in the generated list of character strings.
330 <B>Token names only</B> takes precedence over the <B>token names</B> switch.
331 </P>
332
333 <H3>no cr</H3>
334
335 <P>
336 The <B>no cr</B> configuration switch is provided for developers
337 who intend to use the generated parser on a Unix system. When
338 <B>no cr</B> is set, AnaGram writes its
339 output parser and header files without carriage
340 returns. The switch defaults to OFF, to maintain compatibility with
341 Windows systems.
342 </P>
343
344 </BODY>
345 </HTML>