comparison doc/misc/html/examples/mpp/mas.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
-1:000000000000 | 0:13d2b8934445 |
---|---|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> | |
2 <HTML> | |
3 <HEAD> | |
4 <TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE> | |
5 </HEAD> | |
6 | |
7 | |
8 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" | |
9 TEXT="#000000" LINK="#0033CC" | |
10 VLINK="#CC0033" ALINK="#CC0099"> | |
11 | |
12 <P> | |
13 <IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram" | |
14 WIDTH=124 HEIGHT=30 > | |
15 <BR CLEAR="all"> | |
16 Back to : | |
17 <A HREF="../../index.html">Index</A> | | |
18 <A HREF="index.html">Macro preprocessor overview</A> | |
19 <P> | |
20 | |
21 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" | |
22 WIDTH=1010 HEIGHT=2 > | |
23 <P> | |
24 <H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser </H1> | |
25 | |
26 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" | |
27 WIDTH=1010 HEIGHT=2 > | |
28 <P> | |
29 | |
30 <H2>Introduction</H2> | |
31 <P> | |
32 | |
33 The Macro/Argument Substitution module, MAS.SYN, | |
34 accomplishes the following tasks: | |
35 <OL> | |
36 <LI> It can scan the body of a macro, to identify and | |
37 substitute for parameters and macro calls. </LI> | |
38 <LI> It can scan the set of arguments for a macro call | |
39 embedded within another macro, only substituting | |
40 arguments to the outer macro for parameters found | |
41 within arguments to the inner call. </LI> | |
42 <LI> It can scan the argument to a macro for macro calls | |
43 prior to substituting the argument for a parameter. </LI> | |
44 <LI> It can recognize the "##" operator and paste two tokens | |
45 together. </LI> | |
46 <LI> It can recognize the "#" operator and turn a macro | |
47 argument into a string. </LI> | |
48 </OL> | |
49 The macro/argument substitution parser, mas(), is called | |
50 from a shell function, expand_text(). expand_text() is, in | |
51 turn, called by expand_macro() and expand_arg(). Output from | |
52 mas() is accumulated on the token_accumulator, ta. | |
53 <P> | |
54 <BR> | |
55 | |
56 <H2> Theory of Operation </H2> | |
57 | |
58 The primary purpose of mas() is to scan sequences of tokens | |
59 for macro calls and parameter names and to do the indicated | |
60 substitutions. At the same time it must correctly handle the | |
61 "##" and '#' operators, both of which inhibit macro expansion | |
62 of their operands. Thus the entire grammar is structured | |
63 around the requirements for these two operators. | |
64 <P> | |
65 A further complication is the handling of white space. | |
66 White space within a macro argument cannot be deleted, since | |
67 otherwise the '#' operator would not provide a correct | |
68 result. Thus, in numerous circumstances in this grammar, it | |
69 is not clear what to do with white space at the time it is | |
70 encountered. For this reason, any particular sequence of | |
71 white space tokens is saved up on a temporary stack, | |
72 space_stack, which can be output later or simply | |
73 disregarded. | |
74 <P> | |
75 Like TS.SYN, MAS.SYN must have special syntax for | |
76 accumulating the arguments of macro calls. The differences | |
77 between the two grammars arise from the fact that TS.SYN is | |
78 converting an ASCII representation to a token | |
79 representation, while MAS.SYN already has token input. | |
80 <P> | |
81 <BR> | |
82 | |
83 <H2> Elements of the Macro/Argument Substitution Module </H2> | |
84 | |
85 | |
86 The remainder of this document describes the macro | |
87 definitions, the structure definitions, the static data | |
88 definitions, all configuration parameter settings, and all | |
89 non-terminal parsing tokens used in the macro/argument | |
90 substitution module. It also explains each configuration | |
91 parameter setting in the syntax file. In MAS.SYN, each | |
92 function that is defined is preceded by a short explanation | |
93 of its purpose. | |
94 <P> | |
95 <BR> | |
96 | |
97 <H2> Macro Definitions </H2> | |
98 | |
99 <DL> | |
100 <DT> INPUT_CODE | |
101 <DD> Since this grammar uses "input values" and "pointer input", | |
102 the parser needs to know how to extract the identification | |
103 code for an input token from the item identified by the | |
104 pointer. The parser expects this macro to be appropriately | |
105 defined. Here it is defined so that it extracts the id field | |
106 of the token. | |
107 | |
108 <DT> PCB | |
109 <DD> Since the "declare pcb" switch has been turned off, PCB has | |
110 to be defined manually. | |
111 | |
112 <DT> SYNTAX_ERROR | |
113 <DD> This definition of SYNTAX_ERROR overrides the default | |
114 definition provided by AnaGram. | |
115 </DL> | |
116 <P> | |
117 <BR> | |
118 | |
119 <H2> Static Variables </H2> | |
120 | |
121 <DL> | |
122 <DT> active_macros | |
123 <DD> Type: stack&lt;unsigned&gt; | |
124 <P> | |
125 This is a multilevel stack used to keep track of which | |
126 macros have been invoked in any particular expansion. If, | |
127 after an expansion pass, it is determined that the result of | |
128 a concatenation is a macro name, all the macros which have | |
129 been expanded so far are marked busy and the text is scanned | |
130 again. Once there is no need for further scans, the busy | |
131 flags are turned off. The stack is multi-level so that it | |
132 can nest easily for recursive usage. | |
133 | |
134 <DT> args | |
135 <DD> Type: token ** | |
136 <P> | |
137 "args" is an array of pointers to token strings. It is the | |
138 set of argument strings for the macro currently being | |
139 expanded. | |
140 | |
141 <DT> args_only | |
142 <DD> Type: int | |
143 <P> | |
144 "args_only" is a switch to tell the macro/argument | |
145 substitution logic only to make argument substitutions and | |
146 not to expand macros. In id_macro it is interrogated and if | |
147 set, NAME tokens are not checked to see if they identify | |
148 macros. | |
149 | |
150 <DT> mas_pcb | |
151 <DD> Type: mas_pcb_type * | |
152 <P> | |
153 This variable contains a pointer to the currently active | |
154 parser control block for mas(). It is saved, set and | |
155 restored in expand_text(). | |
156 | |
157 <DT> n_concats | |
158 <DD> Type: int | |
159 <P> | |
160 This variable is used to count the number of concatenation | |
161 operations that result in a macro name in the course of a | |
162 single scan for macros. If it is non-zero, the text is | |
163 rescanned. | |
164 | |
165 <DT> n_args | |
166 <DD> Type: int | |
167 <P> | |
168 This variable specifies the number of arguments for the | |
169 current macro being expanded. | |
170 | |
171 <DT> params | |
172 <DD> Type: unsigned * | |
173 <P> | |
174 This variable is a pointer to a list of n_args unsigned | |
175 integers. The integers are the indices in the token | |
176 dictionary of the parameter names for the macro currently | |
177 being expanded. | |
178 | |
179 <DT> space_stack | |
180 <DD> Type: token_accumulator | |
181 <P> | |
182 In a number of places in this grammar, it is necessary to | |
183 pass over white space tokens without knowing whether they | |
184 are to be output or disregarded. "space_stack" provides a | |
185 place to store them temporarily until the decision can be | |
186 made. Remember that within macro arguments spaces can be | |
187 significant, and therefore must not be discarded | |
188 prematurely. | |
189 </DL> | |
190 <P> | |
191 <BR> | |
192 | |
193 <H2> Configuration Parameters </H2> | |
194 <DL> | |
195 | |
196 <DT> ~allow macros | |
197 <DD> This statement turns off the allow macros switch so that | |
198 AnaGram implements all reduction procedures as explicit | |
199 function definitions. This simplifies debugging at the cost | |
200 of a slight performance degradation. | |
201 | |
202 <DT> ~backtrack | |
203 <DD> This statement turns off the backtrack switch. This means | |
204 that if the parser encounters a syntax error, it will | |
205 not undo default reductions that may have been caused by the | |
206 bad input before it generates diagnostics. | |
207 | |
208 <DT> context type = location | |
209 <DD> This statement specifies that the generated parser is to | |
210 track context automatically. The context variables have type | |
211 "location". location is defined elsewhere to consist of two | |
212 fields: line number and column number. | |
213 | |
214 <DT> ~declare pcb | |
215 <DD> This statement tells AnaGram not to declare a parser control | |
216 block for the parser. Access to the parser control block is | |
217 through a pointer. Actual allocation of storage and setting | |
218 of the pointer takes place in expand_text(). | |
219 | |
220 <DT> default input type = token | |
221 <DD> This statement tells AnaGram how to code reduction procedure | |
222 calls that involve input tokens. | |
223 | |
224 <DT> enum | |
225 <DD> This enumeration statement provides definitions for terminal | |
226 tokens. The same enum statement is found in EX.SYN, in | |
227 JRC.SYN, in KRC.SYN and in MPP.H. | |
228 | |
229 <DT> ~error frame | |
230 <DD> This turns off the error frame portion of the automatic | |
231 syntax error diagnostic generator, since the context of the | |
232 error in the macro substitution syntax is of little interest. | |
233 If an error frame were to be used in diagnostics, that of the | |
234 C parser would be more appropriate. | |
235 | |
236 <DT> error trace | |
237 <DD> This turns on the error trace function, so that if the | |
238 parser encounters a syntax error it will write an .etr | |
239 file. | |
240 | |
241 <DT> input values | |
242 <DD> This switch tells AnaGram that the input units carry some | |
243 baggage, that they have values apart from their identifying | |
244 code. Since this grammar uses pointer input, an INPUT_CODE | |
245 macro must also be defined. | |
246 | |
247 <DT> ~lines and columns | |
248 <DD> Turns off the lines and columns switch so your parser won't | |
249 try to track them here where they certainly make no sense. | |
250 | |
251 <DT> line numbers | |
252 <DD> This statement causes AnaGram to include #line statements in | |
253 the parser file so that your compiler can provide | |
254 diagnostics keyed to your syntax file. | |
255 | |
256 <DT> pointer input | |
257 <DD> This statement tells AnaGram that the input to mas() is an | |
258 array in memory that can be scanned simply by incrementing a | |
259 pointer. Since the input tokens are not simply characters, a | |
260 pointer type statement is required and the INPUT_CODE macro | |
261 must be defined. | |
262 | |
263 <DT> pointer type = token * | |
264 <DD> This statement provides the C data type of the input units | |
265 to the parser. If this statement were omitted, pointer type | |
266 would default to unsigned char *, and your compiler would | |
267 scold when it tried to compile the parser. | |
268 | |
269 <DT> subgrammar parse unit | |
270 <DD> This statement tells AnaGram that the specifications for | |
271 parse unit are internally complete and that it should not | |
272 determine reductions by inspecting following tokens. | |
273 | |
274 <DT> ~test range | |
275 <DD> This statement tells AnaGram not to check input characters | |
276 to see if they are within allowable limits. This checking is | |
277 not necessary since the input to mas() has been generated in | |
278 such a way that it cannot possibly get an out of range | |
279 token. | |
280 </DL> | |
281 <P> | |
282 <BR> | |
283 | |
284 <H2> Grammar Tokens </H2> | |
285 <DL> | |
286 <DT> arg element | |
287 <DD> An "arg element" is a discrete token in an argument to a | |
288 macro or a sequence of "nested elements". | |
289 | |
290 <DT> arg elements | |
291 <DD> A sequence of "arg element". | |
292 | |
293 <DT> concatenation | |
294 <DD> These productions implement the "##" operator. "parameter | |
295 name" is distinguished on the right side to avoid improper | |
296 macro expansions. | |
297 | |
298 <DT> defined | |
299 <DD> See "variable". "defined" is the special operator available | |
300 in #if and #elif statements to determine whether a macro has | |
301 been defined. It is recognized only if "if_clause" has been | |
302 set. | |
303 | |
304 <DT> grammar | |
305 <DD> "grammar" simply describes the complete input to MAS. Since | |
306 "grammar" is a special name recognized by AnaGram, there is | |
307 no need for any further specification of the "start token" | |
308 for the grammar. | |
309 | |
310 <DT> left side | |
311 <DD> This refers to a "##" operator, its left operand, and any | |
312 intervening white space. The cross recursion with | |
313 "concatenation" allows for constructs of the form A ## B ## | |
314 C, with grouping to the left. | |
315 | |
316 <DT> macro | |
317 <DD> See "variable". This grammar distinguishes between a "simple | |
318 macro" which was defined without any parameter list, and a | |
319 "macro" which had an explicit, although perhaps empty, | |
320 parameter list. If a "macro" appears without following | |
321 parentheses, it is simply passed on without being expanded. | |
322 | |
323 <DT> macro arg list | |
324 <DD> This token counts the number of arguments to a macro, and | |
325 stacks them on so many levels of the token accumulator. The | |
326 logic is essentially the same as for macro arg list in | |
327 TS.SYN. | |
328 | |
329 <DT> nested elements | |
330 <DD> The "nested elements" token represents a sequence of macro | |
331 argument tokens enclosed in matching parentheses. | |
332 | |
333 <DT> not parameter | |
334 <DD> This token exists simply to avoid multiple copies of the | |
335 same reduction procedure. | |
336 | |
337 <DT> parameter expansion | |
338 <DD> "parameter expansion" is simply a device to defer the | |
339 replacement of a macro parameter name until it has been | |
340 determined whether it is followed by "##", with perhaps some | |
341 intervening white space. | |
342 | |
343 <DT> parameter name | |
344 <DD> See "variable". | |
345 | |
346 <DT> parse unit | |
347 <DD> The major problem in expanding a macro body is dealing with | |
348 the "##" operator. Since the arguments of ## are not to have | |
349 their macros expanded, but only macro arguments replaced, | |
350 they have to be dealt with specially. Thus "parse unit" | |
351 distinguishes those tokens which are not macro parameters, | |
352 "simple parse units", from the parameters, and allows for | |
353 recognition of the concatenation operator before it goes | |
354 ahead and allows complete expansion of a macro parameter. | |
355 | |
356 <DT> right side | |
357 <DD> This token consists of anything that can follow "##" with | |
358 the exception of a parameter name which needs special | |
359 treatment. If the token is a macro name, the macro is not | |
360 expanded. | |
361 | |
362 <DT> simple macro | |
363 <DD> See "variable". A "simple macro" is one which was defined | |
364 without any following parameter list. | |
365 | |
366 <DT> simple parse unit | |
367 <DD> "simple parse unit" consists of the input constructs which | |
368 do not immediately involve macro parameters. It allows for | |
369 complete macro expansion. | |
370 | |
371 <DT> space | |
372 <DD> The token scanner passes spaces along because they cannot be | |
373 discarded until after macros have been expanded. This is | |
374 because of the # operator which turns macro arguments into | |
375 strings. | |
376 <P> | |
377 If the args_only flag is set, spaces have to be passed on to | |
378 the output. Otherwise they can be discarded. These | |
379 productions accumulate space tokens on a stack, so that the | |
380 decision to output them or to discard them can be deferred. | |
381 If they are not output, they will effectively be discarded | |
382 by the reset() operation the next time a sequence of spaces | |
383 is encountered. | |
384 | |
385 <DT> variable | |
386 <DD> Since a NAME token can, depending on circumstance, name a | |
387 parameter, a macro, or the "defined" operator, a semantically | |
388 determined production is used to make the distinction. | |
389 "variable" is the outcome for NAME tokens that are simply to | |
390 be passed on without any special treatment. | |
391 </DL> | |
392 <P> | |
393 | |
394 <BR> | |
395 | |
396 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------" | |
397 WIDTH=1010 HEIGHT=2 > | |
398 <P> | |
399 <IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software" | |
400 WIDTH=181 HEIGHT=25> | |
401 <BR CLEAR="right"> | |
402 | |
403 <P> | |
404 Back to : | |
405 <A HREF="../../index.html">Index</A> | | |
406 <A HREF="index.html">Macro preprocessor overview</A> | |
407 <P> | |
408 | |
409 <P> | |
410 <ADDRESS><FONT SIZE="-1"> | |
411 AnaGram parser generator - examples<BR> | |
412 Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR> | |
413 Copyright © 1993-1999, Parsifal Software. <BR> | |
414 All Rights Reserved.<BR> | |
415 </FONT></ADDRESS> | |
416 | |
417 </BODY> | |
418 </HTML> | |
419 |