comparison doc/misc/html/examples/mpp/mas.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
-1:000000000000 0:13d2b8934445
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2 <HTML>
3 <HEAD>
4 <TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE>
5 </HEAD>
6
7
8 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
9 TEXT="000000" LINK="0033CC"
10 VLINK="CC0033" ALINK="CC0099">
11
12 <P>
13 <IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
14 WIDTH=124 HEIGHT=30 >
15 <BR CLEAR="all">
16 Back to :
17 <A HREF="../../index.html">Index</A> |
18 <A HREF="index.html">Macro preprocessor overview</A>
19 <P>
20
21 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
22 WIDTH=1010 HEIGHT=2 >
23 <P>
24 <H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser </H1>
25
26 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
27 WIDTH=1010 HEIGHT=2 >
28 <P>
29
30 <H2>Introduction</H2>
31 <P>
32
33 The Macro/Argument Substitution module, MAS.SYN,
34 accomplishes the following tasks:
35 <OL>
36 <LI> It can scan the body of a macro, to identify and
37 substitute for parameters and macro calls. </LI>
38 <LI> It can scan the set of arguments for a macro call
39 embedded within another macro, only substituting
40 arguments to the outer macro for parameters found
41 within arguments to the inner call. </LI>
42 <LI> It can scan the argument to a macro for macro calls
43 prior to substituting the argument for a parameter. </LI>
44 <LI> It can recognize the "##" operator and paste two tokens
45 together. </LI>
46 <LI> It can recognize the "#" operator and turn a macro
47 argument into a string. </LI>
48 </OL>
49 The macro/argument substitution parser, mas(), is called
50 from a shell function, expand_text(). expand_text() is, in
51 turn, called by expand_macro() and expand_arg(). Output from
52 mas() is accumulated on the token_accumulator, ta.
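<P>
As a rough illustration of the five tasks listed above, the following C fragment exercises each of them with an ordinary C preprocessor. The macro names (SQUARE, SUM_SQ, TWO, PASTE, STR) and the function demo_tasks() are hypothetical and do not come from MAS.SYN; the fragment only shows the behavior that mas() has to reproduce.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macros, chosen only to illustrate the five tasks. */
#define SQUARE(x)     ((x) * (x))
#define SUM_SQ(a, b)  (SQUARE(a) + SQUARE(b)) /* tasks 1 and 2: calls nested in a body */
#define TWO           2
#define PASTE(a, b)   a ## b                  /* task 4: token pasting */
#define STR(a)        #a                      /* task 5: stringizing */

static int demo_tasks(void)
{
    int xy = 7;                 /* the identifier that PASTE(x, y) forms */
    int n  = SUM_SQ(TWO, 3);    /* task 3: TWO is expanded before substitution */

    assert(n == 13);                           /* (2*2) + (3*3) */
    assert(PASTE(x, y) == 7);                  /* x ## y becomes xy */
    assert(strcmp(STR(a + b), "a + b") == 0);  /* argument turned into a string */
    return n + PASTE(x, y);
}
```

Calling demo_tasks() from a test driver runs the three checks and returns the sum of the two integer results.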
53 <P>
54 <BR>
55
56 <H2> Theory of Operation </H2>
57
58 The primary purpose of mas() is to scan sequences of tokens
59 for macro calls and parameter names and to do the indicated
60 substitutions. At the same time it must correctly handle the
61 "##" and "#" operators, both of which inhibit macro expansion
62 of their operands. Thus the entire grammar is structured
63 around the requirements for these two operators.
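<P>
The effect of this inhibition can be seen in standard C, independent of how mas() implements it. In the sketch below, the macro and variable names (LIMIT, STR, XSTR, CAT, XCAT, N, vN, v1) are hypothetical. An operand of "#" or "##" is used exactly as written, while an argument that passes through an extra macro layer is expanded first.

```c
#include <assert.h>
#include <string.h>

#define LIMIT      100
#define N          1

#define STR(x)     #x          /* operand of # is not macro-expanded */
#define XSTR(x)    STR(x)      /* x is expanded here, then handed to STR */
#define CAT(a, b)  a ## b      /* operands of ## are not macro-expanded */
#define XCAT(a, b) CAT(a, b)   /* a and b are expanded here, then pasted */

static int vN = 10;            /* named by CAT(v, N) */
static int v1 = 20;            /* named by XCAT(v, N) */

static const char *str_direct(void)   { return STR(LIMIT);  }  /* "LIMIT" */
static const char *str_indirect(void) { return XSTR(LIMIT); }  /* "100" */
static int paste_direct(void)         { return CAT(v, N);   }  /* vN */
static int paste_indirect(void)       { return XCAT(v, N);  }  /* v1 */

static void demo_inhibit(void)
{
    assert(strcmp(str_direct(), "LIMIT") == 0);
    assert(strcmp(str_indirect(), "100") == 0);
    assert(paste_direct() == 10);
    assert(paste_indirect() == 20);
}
```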
64 <P>
65 A further complication is the handling of white space.
66 White space within a macro argument cannot be deleted, since
67 otherwise the '#' operator would not provide a correct
68 result. Thus, in numerous circumstances in this grammar, it
69 is not clear what to do with white space at the time it is
70 encountered. For this reason, any particular sequence of
71 white space tokens is saved up on a temporary stack,
72 space_stack, which can be output later or simply
73 disregarded.
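<P>
The idea of deferring the decision about white space can be sketched in a few lines of C. This is not the code in MAS.SYN: the buffers, the helper names, and the use of plain characters in place of tokens are all assumptions made to keep the example small. White space is saved as it is seen, and only when the next item arrives is it either flushed to the output or dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 32
#define MAX_OUT     128

static char   pending[MAX_PENDING];  /* hypothetical stand-in for space_stack */
static size_t n_pending;
static char   out[MAX_OUT];          /* hypothetical stand-in for the output */
static size_t n_out;

static void put(char c)        { if (n_out + 1 < MAX_OUT) out[n_out++] = c; }
static void save_space(char c) { if (n_pending < MAX_PENDING) pending[n_pending++] = c; }
static void drop_spaces(void)  { n_pending = 0; }

static void flush_spaces(void)
{
    size_t i;
    for (i = 0; i < n_pending; i++) put(pending[i]);
    n_pending = 0;
}

/* Copy src to out, deciding about each run of spaces only when the
   following non-space character is seen. */
static const char *copy_deferred(const char *src, int keep_spaces)
{
    n_out = 0;
    n_pending = 0;
    for (; *src != '\0'; src++) {
        if (*src == ' ' || *src == '\t') { save_space(*src); continue; }
        if (keep_spaces) flush_spaces(); else drop_spaces();
        put(*src);
    }
    drop_spaces();        /* trailing white space never reaches the output */
    out[n_out] = '\0';
    return out;
}

static void demo_deferred(void)
{
    assert(strcmp(copy_deferred("a  + b ", 1), "a  + b") == 0);
    assert(strcmp(copy_deferred("a  + b ", 0), "a+b") == 0);
}
```

With keep_spaces set, interior white space survives unchanged, which is what a later "#" operation would need; with it clear, the saved spaces are simply discarded.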
74 <P>
75 Like TS.SYN, MAS.SYN must have special syntax for
76 accumulating the arguments of macro calls. The differences
77 between the two grammars arise from the fact that TS.SYN is
78 converting an ASCII representation to a token
79 representation, while MAS.SYN already has token input.
80 <P>
81 <BR>
82
83 <H2> Elements of the Macro/Argument Substitution Module </H2>
84
85
86 The remainder of this document describes the macro
87 definitions, the structure definitions, the static data
88 definitions, all configuration parameter settings, and all
89 non-terminal parsing tokens used in the macro/argument
90 substitution module. It also explains each configuration
91 parameter setting in the syntax file. In MAS.SYN, each
92 function that is defined is preceded by a short explanation
93 of its purpose.
94 <P>
95 <BR>
96
97 <H2> Macro Definitions </H2>
98
99 <DL>
100 <DT> INPUT_CODE
101 <DD> Since this grammar uses "input values" and "pointer input",
102 the parser needs to know how to extract the identification
103 code for an input token from the item identified by the
104 pointer. The parser expects this macro to be appropriately
105 defined. Here it is defined so that it extracts the id field
106 of the token.
107
108 <DT> PCB
109 <DD> Since the "declare pcb" switch has been turned off, PCB has
110 to be defined manually.
111
112 <DT> SYNTAX_ERROR
113 <DD> This definition of SYNTAX_ERROR overrides the default
114 definition provided by AnaGram.
115 </DL>
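<P>
A minimal sketch of what such an INPUT_CODE definition might look like follows. The layout of the token structure, the field name handle, the enumeration values, and the helper functions are assumptions made for illustration; only the existence of an id field is taken from the description above. The loop also shows the sense in which input is scanned "by incrementing a pointer".

```c
#include <assert.h>

/* Hypothetical token layout; only the id field is mentioned in the text. */
typedef struct {
    unsigned id;       /* identification code of the token */
    unsigned handle;   /* assumed: some value carried along with the token */
} token;

/* Assumed shape of the macro: extract the code from one input item. */
#define INPUT_CODE(t) ((t).id)

enum { END_OF_INPUT = 0, NAME = 1, SPACE = 2 };   /* hypothetical codes */

/* Walk a token array through a pointer, looking only at the codes. */
static unsigned count_names(const token *p)
{
    unsigned n = 0;
    for (; INPUT_CODE(*p) != END_OF_INPUT; p++)
        if (INPUT_CODE(*p) == NAME)
            n++;
    return n;
}

static unsigned demo_count(void)
{
    static const token input[] = {
        { NAME, 1 }, { SPACE, 0 }, { NAME, 2 }, { END_OF_INPUT, 0 }
    };
    unsigned n = count_names(input);
    assert(n == 2);
    return n;
}
```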
116 <P>
117 <BR>
118
119 <H2> Static variables </H2>
120
121 <DL>
122 <DT> active_macros
123 <DD> Type: stack&lt;unsigned&gt;
124 <P>
125 This is a multilevel stack used to keep track of which
126 macros have been invoked in any particular expansion. If,
127 after an expansion pass, it is determined that the result of
128 a concatenation is a macro name, all the macros which have
129 been expanded so far are marked busy and the text is scanned
130 again. Once there is no need for further scans, the busy
131 flags are turned off. The stack is multi-level so that it
132 can nest easily for recursive usage.
133
134 <DT> args
135 <DD> Type: token **
136 <P>
137 "args" is an array of pointers to token strings. It is the
138 set of argument strings for the macro currently being
139 expanded.
140
141 <DT> args_only
142 <DD> Type: int
143 <P>
144 "args_only" is a switch to tell the macro/argument
145 substitution logic only to make argument substitutions and
146 not to expand macros. It is interrogated in id_macro and, if
147 set, NAME tokens are not checked to see if they identify
148 macros.
149
150 <DT> mas_pcb
151 <DD> Type: mas_pcb_type *
152 <P>
153 This variable contains a pointer to the currently active
154 parser control block for mas(). It is saved, set and
155 restored in expand_text().
156
157 <DT> n_concats
158 <DD> Type: int
159 <P>
160 This variable is used to count the number of concatenation
161 operations that result in a macro name in the course of a
162 single scan for macros. If it is non-zero, the text is
163 rescanned.
164
165 <DT> n_args
166 <DD> Type: int
167 <P>
168 This variable specifies the number of arguments for the
169 current macro being expanded.
170
171 <DT> params
172 <DD> Type: unsigned *
173 <P>
174 This variable is a pointer to a list of n_args unsigned
175 integers. The integers are the indices in the token
176 dictionary of the parameter names for the macro currently
177 being expanded.
178
179 <DT> space_stack
180 <DD> Type: token_accumulator
181 <P>
182 In a number of places in this grammar, it is necessary to
183 pass over white space tokens without knowing whether they
184 are to be output or disregarded. "space_stack" provides a
185 place to store them temporarily until the decision can be
186 made. Remember that within macro arguments spaces can be
187 significant, and therefore must not be discarded
188 prematurely.
189 </DL>
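<P>
The interplay of n_concats, active_macros and the busy flags can be pictured with the following control-flow sketch. It is a simplification under stated assumptions, not the logic of expand_text(): the parser call is replaced by a scripted table of pass results, active_macros is flattened to a plain array, and every name other than n_concats is hypothetical.

```c
#include <assert.h>

#define MAX_MACROS 8

static int      busy[MAX_MACROS];     /* hypothetical per-macro busy flags */
static unsigned active[MAX_MACROS];   /* flattened stand-in for active_macros */
static int      n_active;
static int      n_concats;

/* Scripted outcome of one scan, standing in for a real call of mas(). */
typedef struct {
    unsigned expanded;         /* macro expanded during this pass */
    int      new_macro_names;  /* concatenations that produced a macro name */
} pass_result;

/* Rescan while concatenations keep producing macro names; return the
   number of passes made. */
static int expand_until_stable(const pass_result *script, int n_script)
{
    int passes = 0, i;

    n_active = 0;
    do {
        n_concats = 0;
        if (passes < n_script && n_active < MAX_MACROS
                && script[passes].expanded < MAX_MACROS) {
            active[n_active++] = script[passes].expanded;
            n_concats = script[passes].new_macro_names;
        }
        passes++;
        if (n_concats != 0)                     /* another scan is needed */
            for (i = 0; i < n_active; i++)
                busy[active[i]] = 1;            /* block re-expansion */
    } while (n_concats != 0);

    for (i = 0; i < n_active; i++)              /* no further scans */
        busy[active[i]] = 0;
    return passes;
}

static int any_busy(void)
{
    int i;
    for (i = 0; i < MAX_MACROS; i++)
        if (busy[i]) return 1;
    return 0;
}

static int demo_rescan(void)
{
    static const pass_result script[] = { { 0, 1 }, { 1, 1 }, { 2, 0 } };
    return expand_until_stable(script, 3);
}
```

In the demonstration, the first two passes each report a concatenation that formed a macro name, so a third pass is made; after it, all busy flags are clear again.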
190 <P>
191 <BR>
192
193 <H2> Configuration Parameters </H2>
194 <DL>
195
196 <DT> ~allow macros
197 <DD> This statement turns off the allow macros switch so that
198 AnaGram implements all reduction procedures as explicit
199 function definitions. This simplifies debugging at the cost
200 of a slight performance degradation.
201
202 <DT> ~backtrack
203 <DD> This statement turns off the backtrack switch. This means
204 that if the parser encounters a syntax error, it will
205 not undo default reductions that may have been caused by the
206 bad input before it generates diagnostics.
207
208 <DT> context type = location
209 <DD> This statement specifies that the generated parser is to
210 track context automatically. The context variables have type
211 "location". location is defined elsewhere to consist of two
212 fields: line number and column number.
213
214 <DT> ~declare pcb
215 <DD> This statement tells AnaGram not to declare a parser control
216 block for the parser. Access to the parser control block is
217 through a pointer. Actual allocation of storage and setting
218 of the pointer takes place in expand_text().
219
220 <DT> default input type = token
221 <DD> This statement tells AnaGram how to code reduction procedure
222 calls that involve input tokens.
223
224 <DT> enum
225 <DD> This enumeration statement provides definitions for terminal
226 tokens. The same enum statement is found in EX.SYN, in
227 JRC.SYN, in KRC.SYN and in MPP.H.
228
229 <DT> ~error frame
230 <DD> This turns off the error frame portion of the automatic
231 syntax error diagnostic generator, since the context of the
232 error in the macro substitution syntax is of little interest.
233 If an error frame were to be used in diagnostics, that of the
234 C parser would be more appropriate.
235
236 <DT> error trace
237 <DD> This turns on the error trace function, so that if the
238 parser encounters a syntax error it will write an .etr
239 file.
240
241 <DT> input values
242 <DD> This switch tells AnaGram that the input units carry some
243 baggage, that they have values apart from their identifying
244 code. Since this grammar uses pointer input, an INPUT_CODE
245 macro must also be defined.
246
247 <DT> ~lines and columns
248 <DD> Turns off the lines and columns switch so your parser won't
249 try to track them here where they certainly make no sense.
250
251 <DT> line numbers
252 <DD> This statement causes AnaGram to include #line statements in
253 the parser file so that your compiler can provide
254 diagnostics keyed to your syntax file.
255
256 <DT> pointer input
257 <DD> This statement tells AnaGram that the input to mas() is an
258 array in memory that can be scanned simply by incrementing a
259 pointer. Since the input tokens are not simply characters, a
260 pointer type statement is required and the INPUT_CODE macro
261 must be defined.
262
263 <DT> pointer type = token *
264 <DD> This statement provides the C data type of the input units
265 to the parser. If this statement were omitted, pointer type
266 would default to unsigned char *, and your compiler would
267 scold when it tried to compile the parser.
268
269 <DT> subgrammar parse unit
270 <DD> This statement tells AnaGram that the specifications for
271 parse unit are internally complete and that it should not
272 determine reductions by inspecting following tokens.
273
274 <DT> ~test range
275 <DD> This statement tells AnaGram not to check input characters
276 to see if they are within allowable limits. This checking is
277 not necessary since the input to mas() has been generated in
278 such a way that it cannot possibly get an out of range
279 token.
280 </DL>
281 <P>
282 <BR>
283
284 <H2> Grammar Tokens </H2>
285 <DL>
286 <DT> arg element
287 <DD> An "arg element" is a discrete token in an argument to a
288 macro or a sequence of "nested elements".
289
290 <DT> arg elements
291 <DD> A sequence of "arg element" tokens.
292
293 <DT> concatenation
294 <DD> These productions implement the "##" operator. "parameter
295 name" is distinguished on the right side to avoid improper
296 macro expansions.
297
298 <DT> defined
299 <DD> See "variable". "defined" is the special operator available
300 in #if and #elif statements to determine whether a macro has
301 been defined. It is recognized only if "if_clause" has been
302 set.
303
304 <DT> grammar
305 <DD> "grammar" simply describes the complete input to MAS. Since
306 "grammar" is a special name recognized by AnaGram, there is
307 no need for any further specification of the "start token"
308 for the grammar.
309
310 <DT> left side
311 <DD> This refers to a "##" operator, its left operand, and any
312 intervening white space. The cross recursion with
313 "concatenation" allows for constructs of the form A ## B ##
314 C, with grouping to the left.
315
316 <DT> macro
317 <DD> See "variable". This grammar distinguishes between a "simple
318 macro" which was defined without any parameter list, and a
319 "macro" which had an explicit, although perhaps empty,
320 parameter list. If a "macro" appears without following
321 parentheses, it is simply passed on without being expanded.
322
323 <DT> macro arg list
324 <DD> This token counts the number of arguments to a macro, and
325 stacks them on so many levels of the token accumulator. The
326 logic is essentially the same as for macro arg list in
327 TS.SYN.
328
329 <DT> nested elements
330 <DD> The "nested elements" token represents a sequence of macro
331 argument tokens enclosed in matching parentheses.
332
333 <DT> not parameter
334 <DD> This token exists simply to avoid multiple copies of the
335 same reduction procedure.
336
337 <DT> parameter expansion
338 <DD> "parameter expansion" is simply a device to defer the
339 replacement of a macro parameter name until it has been
340 determined whether it is followed by "##", with perhaps some
341 intervening white space.
342
343 <DT> parameter name
344 <DD> See <STRONG>variable</STRONG>.
345
346 <DT> parse unit
347 <DD> The major problem in expanding a macro body is dealing with
348 the "##" operator. Since the arguments of ## are not to have
349 their macros expanded, but only macro arguments replaced,
350 they have to be dealt with specially. Thus "parse unit"
351 distinguishes those tokens which are not macro parameters,
352 "simple parse units", from the parameters, and allows for
353 recognition of the concatenation operator before it goes
354 ahead and allows complete expansion of a macro parameter.
355
356 <DT> right side
357 <DD> This token consists of anything that can follow "##" with
358 the exception of a parameter name which needs special
359 treatment. If the token is a macro name, the macro is not
360 expanded.
361
362 <DT> simple macro
363 <DD> See "variable". A "simple macro" is one which was defined
364 without any following parameter list.
365
366 <DT> simple parse unit
367 <DD> "simple parse unit" consists of the input constructs which
368 do not immediately involve macro parameters. It allows for
369 complete macro expansion.
370
371 <DT> space
372 <DD> The token scanner passes spaces along because they cannot be
373 discarded until after macros have been expanded. This is
374 because of the # operator which turns macro arguments into
375 strings.
376 <P>
377 If the args_only flag is set, spaces have to be passed on to
378 the output. Otherwise they can be discarded. These
379 productions accumulate space tokens on a stack, so that the
380 decision to output them or to discard them can be deferred.
381 If they are not output, they will effectively be discarded
382 by the reset() operation the next time a sequence of spaces
383 is encountered.
384
385 <DT> variable
386 <DD> Since a NAME token can, depending on circumstance, name a
387 parameter, a macro, or the "defined" operator, a semantically
388 determined production is used to make the distinction.
389 "variable" is the outcome for NAME tokens that are simply to
390 be passed on without any special treatment.
391 </DL>
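<P>
Several of the distinctions drawn by these tokens correspond to behavior that can be observed with any conforming C preprocessor. The fragment below is an illustration only; the names twice, CAT3, HAVE_CAT3, NO_SUCH_MACRO and the demo function are hypothetical. It shows a "macro" that appears without following parentheses and is therefore passed on unexpanded, a chained concatenation of the form A ## B ## C, and the "defined" operator in an #if statement.

```c
#include <assert.h>

static int twice(int x) { return 2 * x; }   /* an ordinary function */
#define twice(x) ((x) + 1000)               /* a macro with the same name */

#define CAT3(a, b, c) a ## b ## c           /* chained concatenation */

#if defined(CAT3) && !defined(NO_SUCH_MACRO)
#define HAVE_CAT3 1
#else
#define HAVE_CAT3 0
#endif

static int demo_without_parens(void)
{
    int (*fp)(int) = twice;   /* no parenthesis follows: not expanded */
    return fp(1);             /* calls the function */
}

static int demo_with_parens(void)
{
    return twice(1);          /* expanded to ((1) + 1000) */
}

static int demo_cat3(void)
{
    int abc = 5;
    return CAT3(a, b, c);     /* a ## b ## c becomes abc */
}

static void demo_grammar_tokens(void)
{
    assert(demo_without_parens() == 2);
    assert(demo_with_parens() == 1001);
    assert(demo_cat3() == 5);
    assert(HAVE_CAT3 == 1);
}
```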
392 <P>
393
394 <BR>
395
396 <IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
397 WIDTH=1010 HEIGHT=2 >
398 <P>
399 <IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
400 WIDTH=181 HEIGHT=25>
401 <BR CLEAR="right">
402
403 <P>
404 Back to :
405 <A HREF="../../index.html">Index</A> |
406 <A HREF="index.html">Macro preprocessor overview</A>
407 <P>
408
409 <P>
410 <ADDRESS><FONT SIZE="-1">
411 AnaGram parser generator - examples<BR>
412 Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR>
413 Copyright &copy; 1993-1999, Parsifal Software. <BR>
414 All Rights Reserved.<BR>
415 </FONT></ADDRESS>
416
417 </BODY>
418 </HTML>
419