diff doc/misc/html/examples/mpp/mas.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author: David A. Holland
date: Sat, 22 Dec 2007 17:52:45 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/misc/html/examples/mpp/mas.html	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,419 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+<TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE>
+</HEAD>
+
+
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+ TEXT="#000000" LINK="#0033CC"
+ VLINK="#CC0033" ALINK="#CC0099">
+
+<P>
+<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
+         WIDTH=124 HEIGHT=30 >
+<BR CLEAR="all">
+Back to :
+<A HREF="../../index.html">Index</A> |
+<A HREF="index.html">Macro preprocessor overview</A>
+<P>
+
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+<H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser   </H1>
+
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+        WIDTH=1010 HEIGHT=2  >
+<P>
+
+<H2>Introduction</H2>
+<P>
+
+          The Macro/Argument Substitution module, MAS.SYN,
+          accomplishes the following tasks:
+<OL>
+<LI>            It can scan the body of a macro to identify parameters and
+               macro calls and substitute for them. </LI>
+<LI>            It can scan the set of arguments for a macro call
+               embedded within another macro, only substituting
+               arguments to the outer macro for parameters found
+               within arguments to the inner call. </LI>
+<LI>            It can scan the argument to a macro for macro calls
+               prior to substituting the argument for a parameter. </LI>
+<LI>            It can recognize the "##" operator and paste two tokens
+               together. </LI>
+<LI>            It can recognize the "#" operator and turn a macro
+               argument into a string. </LI>
+</OL>
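+<P>
+          The "#" and "##" operators in items 4 and 5 behave as in any
+          C preprocessor. The C++ fragment below is an illustration
+          only, not part of MAS.SYN, and its macro and function names
+          are hypothetical. It shows what a conforming preprocessor
+          produces for both operators, including how white space
+          inside an argument affects the result of "#".

```cpp
#include <cstring>  // std::strcmp, for comparing the generated strings

// "#" turns a macro argument into a string literal.
#define STRINGIZE(x) #x

// "##" pastes its two operands together into a single token.
#define PASTE(a, b) a ## b

// PASTE(value, 1) below becomes the single identifier value1.
const int value1 = 42;

// Runs of white space between argument tokens become one space;
// leading and trailing white space is dropped.
const char *with_spaces() { return STRINGIZE(  a   +   b  ); }  // "a + b"

// The absence of white space is preserved as well, which is why spaces
// cannot simply be discarded before "#" has been applied.
const char *without_spaces() { return STRINGIZE(a+b); }  // "a+b"

// Refers to value1 through the pasted name.
int pasted() { return PASTE(value, 1); }  // 42
```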
+          The macro/argument substitution parser, mas(), is called
+          from a shell function, expand_text(), which is in turn
+          called by expand_macro() and expand_arg(). Output from
+          mas() is accumulated on the token_accumulator ta.
+<P>
+<BR>
+
+<H2>     Theory of Operation </H2>
+
+          The primary purpose of mas() is to scan sequences of tokens
+          for macro calls and parameter names and to make the indicated
+          substitutions. At the same time it must correctly handle the
+          "##" and "#" operators, both of which inhibit macro expansion
+          of their operands. The entire grammar is therefore structured
+          around the requirements of these two operators.
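+<P>
+          The effect of this inhibition can be observed with any
+          conforming C or C++ preprocessor. In the hypothetical fragment
+          below, an operand of "#" or "##" is used exactly as spelled;
+          only routing it through a second macro level lets it be
+          expanded before the operator is applied.

```cpp
#include <cstring>  // std::strcmp, for comparing the generated strings

#define LIMIT 100

// The operand of "#" is not macro-expanded, so STR(LIMIT) is "LIMIT".
#define STR(x) #x

// In XSTR's body x is not an operand of "#", so the argument is expanded
// first and STR then receives 100: XSTR(LIMIT) is "100".
#define XSTR(x) STR(x)

// The operands of "##" are not expanded either: CAT(LIMIT, _max) pastes
// the spelling LIMIT, producing the identifier LIMIT_max.
#define CAT(a, b) a ## b

const int LIMIT_max = 7;

const char *unexpanded() { return STR(LIMIT); }
const char *expanded() { return XSTR(LIMIT); }
int pasted_unexpanded() { return CAT(LIMIT, _max); }
```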
+<P>
+          A further complication is the handling of white space.
+          White space within a macro argument cannot be deleted,
+          since otherwise the "#" operator could not produce the
+          correct string. Thus, in many places in this grammar, it
+          is not clear what to do with white space at the time it is
+          encountered. For this reason, each sequence of white space
+          tokens is saved on a temporary stack, space_stack, from
+          which it can later be output or simply discarded.
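+<P>
+          The deferral can be sketched in C++ as follows. This is a
+          simplified model under assumed names (Token, Expander,
+          expand), not the code in MAS.SYN: spaces are held on a
+          space_stack until the next non-space token arrives and are
+          then either output or dropped, depending on a flag that
+          plays the role of args_only. Unlike MAS.SYN, which relies
+          on reset() at the start of the next sequence of spaces, the
+          sketch empties the stack as soon as the decision is made.

```cpp
#include <string>
#include <vector>

// Hypothetical token: only its spelling and whether it is white space.
struct Token {
  std::string text;
  bool is_space;
};

// Simplified model of deferred white space handling.
class Expander {
 public:
  explicit Expander(bool keep_spaces) : keep_spaces_(keep_spaces) {}

  void feed(const Token &t) {
    if (t.is_space) {
      space_stack_.push_back(t);  // decision deferred
      return;
    }
    if (keep_spaces_) {
      // Output the saved spaces ahead of the token that follows them.
      output_.insert(output_.end(), space_stack_.begin(), space_stack_.end());
    }
    space_stack_.clear();  // when not kept, the spaces are simply dropped
    output_.push_back(t);
  }

  // Spaces still on the stack at the end are never output.
  std::string text() const {
    std::string s;
    for (const Token &t : output_) s += t.text;
    return s;
  }

 private:
  bool keep_spaces_;
  std::vector<Token> space_stack_;
  std::vector<Token> output_;
};

// Runs a token sequence through an Expander and returns the output text.
std::string expand(const std::vector<Token> &tokens, bool keep_spaces) {
  Expander e(keep_spaces);
  for (const Token &t : tokens) e.feed(t);
  return e.text();
}
```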
+<P>
+          Like TS.SYN, MAS.SYN must have special syntax for
+          accumulating the arguments of macro calls. The differences
+          between the two grammars arise because TS.SYN converts an
+          ASCII representation to a token representation, while
+          MAS.SYN already has token input.
+<P>
+<BR>
+
+<H2>     Elements of the Macro/Argument Substitution Module </H2>
+
+
+          The remainder of this document describes the macro
+          definitions, the structure definitions, the static data
+          definitions, and the nonterminal parsing tokens used in the
+          macro/argument substitution module, and explains each
+          configuration parameter setting in the syntax file. In
+          MAS.SYN itself, each function definition is preceded by a
+          short explanation of its purpose.
+<P>
+<BR>
+
+<H2>     Macro Definitions </H2>
+
+<DL>
+<DT>     INPUT_CODE
+   <DD>          Since this grammar uses "input values" and "pointer input",
+          the parser needs to know how to extract the identification
+          code for an input token from the item identified by the
+          pointer. The parser expects this macro to be appropriately
+          defined. Here it is defined so that it extracts the id field
+          of the token.
+
+<DT>     PCB
+   <DD>          Since the "declare pcb" switch has been turned off, PCB has
+          to be defined manually.
+
+<DT>     SYNTAX_ERROR
+   <DD>          This definition of SYNTAX_ERROR overrides the default
+          definition provided by AnaGram.
+</DL>
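+<P>
+          Because "~declare pcb" is in effect, PCB is defined in terms
+          of the mas_pcb pointer described under Static variables, and
+          expand_text() saves, sets, and restores that pointer. The C++
+          sketch below assumes a one-field mas_pcb_type and an
+          expand_text() whose only parameter is a nesting count; both
+          are hypothetical stand-ins for the generated control block
+          and the real shell function. It shows why saving and
+          restoring the pointer keeps nested expansions from
+          disturbing one another.

```cpp
// Hypothetical stand-in for the generated parser control block.
struct mas_pcb_type {
  int depth;  // illustrative field only
};

// With "~declare pcb", references to PCB go through the active block.
static mas_pcb_type *mas_pcb = nullptr;
#define PCB (*mas_pcb)

// Deepest nesting level reached so far (for demonstration only).
static int deepest = -1;

// Hypothetical shell function. Each call gets its own control block,
// makes it current, and restores the caller's block before returning.
int expand_text(int nesting) {
  mas_pcb_type *saved = mas_pcb;  // save
  mas_pcb_type local{};           // storage for this invocation
  mas_pcb = &local;               // set
  PCB.depth = saved ? saved->depth + 1 : 0;
  if (PCB.depth > deepest) deepest = PCB.depth;
  if (nesting > 0) expand_text(nesting - 1);  // recursive use
  int depth = PCB.depth;  // unchanged by the nested call
  mas_pcb = saved;        // restore
  return depth;
}
```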
+<P>
+<BR>
+
+<H2>     Static variables </H2>
+
+<DL>
+<DT>     active_macros
+   <DD>          Type: stack&lt;unsigned&gt;
+<P>
+          This is a multilevel stack used to keep track of which
+          macros have been invoked in any particular expansion. If,
+          after an expansion pass, it is determined that the result of
+          a concatenation is a macro name, all the macros that have
+          been expanded so far are marked busy and the text is scanned
+          again. Once no further scans are needed, the busy flags are
+          turned off. The stack is multilevel so that it nests easily
+          when expansion is invoked recursively.
+
+<DT>     args
+   <DD>          Type: token **
+<P>
+          "args" is an array of pointers to token strings. It is the
+          set of argument strings for the macro currently being
+          expanded.
+
+<DT>     args_only
+    <DD>         Type: int
+<P>
+          "args_only" is a switch that tells the macro/argument
+          substitution logic to make argument substitutions only, and
+          not to expand macros. It is tested in id_macro; if it is
+          set, NAME tokens are not checked to see whether they
+          identify macros.
+
+<DT>     mas_pcb
+   <DD>          Type: mas_pcb_type *
+<P>
+          This variable contains a pointer to the currently active
+          parser control block for mas(). It is saved, set and
+          restored in expand_text().
+
+<DT>     n_concats
+    <DD>         Type: int
+<P>
+          This variable is used to count the number of concatenation
+          operations that result in a macro name in the course of a
+          single scan for macros. If it is non-zero, the text is
+          rescanned.
+
+<DT>     n_args
+   <DD>          Type: int
+<P>
+          This variable specifies the number of arguments for the
+          macro currently being expanded.
+
+<DT>     params
+    <DD>         Type: unsigned *
+<P>
+          This variable is a pointer to a list of n_args unsigned
+          integers. The integers are the indices in the token
+          dictionary of the parameter names for the macro currently
+          being expanded.
+
+<DT>     space_stack
+   <DD>          Type: token_accumulator
+<P>
+          In a number of places in this grammar, it is necessary to
+          pass over white space tokens without knowing whether they
+          are to be output or disregarded. "space_stack" provides a
+          place to store them temporarily until the decision can be
+          made. Remember that within macro arguments spaces can be
+          significant, and therefore must not be discarded
+          prematurely.
+</DL>
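+<P>
+          The relationship among params, args, and n_args can be
+          sketched as follows. The token layout, lookup_param(), and
+          the example data are assumptions made for illustration, not
+          code from MAS.SYN; the sketch only shows how the dictionary
+          index of a NAME token could be matched against params to find
+          the argument string that replaces it.

```cpp
// Hypothetical token layout: a code and a dictionary index ("handle").
struct token {
  unsigned id;
  unsigned handle;
};

// Illustrative versions of the statics described above.
static unsigned *params = nullptr;  // dictionary indices of parameter names
static token **args = nullptr;      // one token string per argument
static int n_args = 0;              // number of parameters/arguments

// Returns the argument string that replaces a NAME token whose dictionary
// index is `handle`, or nullptr if the name is not a parameter.
token *lookup_param(unsigned handle) {
  for (int i = 0; i < n_args; ++i)
    if (params[i] == handle) return args[i];
  return nullptr;
}

// Example data: a hypothetical macro M(x, y), whose parameter names have
// dictionary indices 10 and 20, invoked with two one-token arguments.
static unsigned example_params[] = {10, 20};
static token arg_x[] = {{1, 101}};
static token arg_y[] = {{1, 102}};
static token *example_args[] = {arg_x, arg_y};

// Makes M(x, y) the macro currently being expanded.
void begin_example_expansion() {
  params = example_params;
  args = example_args;
  n_args = 2;
}
```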
+<P>
+<BR>
+
+<H2>     Configuration Parameters  </H2>
+<DL>
+
+<DT>     ~allow macros
+   <DD>         This statement turns off the allow macros switch so that
+          AnaGram implements all reduction procedures as explicit
+          function definitions. This simplifies debugging at the cost
+          of a slight performance degradation.
+
+<DT>     ~backtrack
+   <DD>          This statement turns off the backtrack switch. This means
+          that if mas() encounters a syntax error, it will not undo
+          default reductions that may have been caused by the bad
+          input before it generates diagnostics.
+
+<DT>     context type = location
+   <DD>          This statement specifies that the generated parser is to
+          track context automatically. The context variables have type
+          "location". location is defined elsewhere to consist of two
+          fields: line number and column number.
+
+<DT>     ~declare pcb
+    <DD>         This statement tells AnaGram not to declare a parser control
+          block for the parser. Access to the parser control block is
+          through a pointer. Actual allocation of storage and setting
+          of the pointer takes place in expand_text().
+
+<DT>     default input type = token
+   <DD>          This statement tells AnaGram that input tokens have type
+          token, so that it can correctly code reduction procedure
+          calls that involve input tokens.
+
+<DT>     enum
+   <DD>          This enumeration statement provides definitions for terminal
+          tokens. The same enum statement is found in EX.SYN,
+          JRC.SYN, KRC.SYN, and MPP.H.
+
+<DT>     ~error frame
+   <DD>          This turns off the error frame portion of the automatic
+          syntax error diagnostic generator, since the context of an
+          error in the macro substitution syntax is of little interest.
+          If an error frame were to be used in diagnostics, that of the
+          C parser would be more appropriate.
+
+<DT>     error trace
+   <DD>          This turns on the error trace function, so that if mas()
+          encounters a syntax error, it will write an .etr file.
+
+<DT>     input values
+   <DD>          This switch tells AnaGram that the input units carry some
+          baggage, that they have values apart from their identifying
+          code. Since this grammar uses pointer input, an INPUT_CODE
+          macro must also be defined.
+
+<DT>     ~lines and columns
+   <DD>          This statement turns off the lines and columns switch, so
+          that the parser does not try to track line and column
+          numbers, which make no sense for token input.
+
+<DT>     line numbers
+   <DD>          This statement causes AnaGram to include #line statements in
+          the parser file so that your compiler can provide
+          diagnostics keyed to your syntax file.
+
+<DT>     pointer input
+   <DD>          This statement tells AnaGram that the input to mas() is an
+          array in memory that can be scanned simply by incrementing a
+          pointer. Since the input tokens are not simply characters, a
+          pointer type statement is required and the INPUT_CODE macro
+          must be defined.
+
+<DT>     pointer type = token *
+   <DD>          This statement provides the C data type of the input units
+          to the parser. If this statement were omitted, pointer type
+          would default to unsigned char *, and your compiler would
+          report errors when it tried to compile the parser.
+
+<DT>     subgrammar parse unit
+   <DD>          This statement tells AnaGram that the specifications for
+          parse unit are internally complete and that it should not
+          determine reductions by inspecting following tokens.
+
+<DT>     ~test range
+   <DD>          This statement tells AnaGram not to check input tokens
+          to see if they are within allowable limits. This checking is
+          not necessary, since the input to mas() has been generated in
+          such a way that it cannot contain an out-of-range token.
+</DL>
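+<P>
+          The combination of "pointer input", "pointer type = token *",
+          and the INPUT_CODE macro can be pictured with the C++ sketch
+          below. The token fields and the enum values are assumptions
+          (the real codes come from the shared enum statement), and
+          count_names() is a hypothetical function that walks the input
+          the way the parser does: by incrementing a token pointer and
+          extracting each identifying code with INPUT_CODE.

```cpp
// Hypothetical token layout: "id" is the code the parser switches on;
// "handle" stands for any additional value the token carries.
struct token {
  unsigned id;
  unsigned handle;
};

// The input units are structs rather than characters, so a macro is
// needed to pull the identifying code out of each one.
#define INPUT_CODE(T) (T).id

// Hypothetical token codes; the real values come from the enum statement.
enum { END_OF_FILE = 0, NAME = 1, SPACE = 2 };

// Counts the NAME tokens in an END_OF_FILE-terminated array, advancing
// through it by incrementing a token pointer, as pointer input does.
int count_names(const token *pointer) {
  int n = 0;
  for (; INPUT_CODE(*pointer) != END_OF_FILE; ++pointer)
    if (INPUT_CODE(*pointer) == NAME) ++n;
  return n;
}
```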
+<P>
+<BR>
+
+<H2>     Grammar Tokens </H2>
+<DL>
+<DT>     arg element
+    <DD>         An "arg element" is either a single token in an argument to
+          a macro or a sequence of "nested elements".
+
+<DT>     arg elements
+   <DD>          A sequence of "arg element" tokens.
+
+<DT>     concatenation
+   <DD>         These productions implement the "##" operator. "parameter
+          name" is distinguished on the right side to avoid improper
+          macro expansions.
+
+<DT>     defined
+   <DD>          See "variable". "defined" is the special operator available
+          in #if and #elif statements to determine whether a macro has
+          been defined. It is recognized only if "if_clause" has been
+          set.
+
+<DT>     grammar
+   <DD>          "grammar" simply describes the complete input to MAS. Since
+          "grammar" is a special name recognized by AnaGram, there is
+          no need for any further specification of the "start token"
+          for the grammar.
+
+<DT>     left side
+   <DD>          This refers to a "##" operator, its left operand, and any
+          intervening white space. The cross recursion with
+          "concatenation" allows for constructs of the form A ## B ##
+          C, with grouping to the left.
+
+<DT>     macro
+   <DD>          See "variable". This grammar distinguishes between a "simple
+          macro" which was defined without any parameter list, and a
+          "macro" which had an explicit, although perhaps empty,
+          parameter list. If a "macro" appears without following
+          parentheses, it is simply passed on without being expanded.
+
+<DT>     macro arg list
+    <DD>         This token counts the number of arguments to a macro and
+          stacks them on that many levels of the token accumulator. The
+          logic is essentially the same as for macro arg list in
+          TS.SYN.
+
+<DT>     nested elements
+   <DD>          The "nested elements" token represents a sequence of macro
+          argument tokens enclosed in matching parentheses.
+
+<DT>     not parameter
+   <DD>          This token exists simply to avoid multiple copies of the
+          same reduction procedure.
+
+<DT>     parameter expansion
+   <DD>          "parameter expansion" is simply a device to defer the
+          replacement of a macro parameter name until it has been
+          determined whether it is followed by "##", with perhaps some
+          intervening white space.
+
+<DT>     parameter name
+   <DD>          See <STRONG>variable</STRONG>.
+
+<DT>     parse unit
+   <DD>          The major problem in expanding a macro body is dealing with
+          the "##" operator. Since the operands of ## are not to have
+          their macros expanded, but only their macro parameters
+          replaced, they must be handled specially. Thus "parse unit"
+          distinguishes tokens that are not macro parameters
+          ("simple parse units") from parameters, so that the
+          concatenation operator can be recognized before a macro
+          parameter is completely expanded.
+
+<DT>     right side
+   <DD>          This token consists of anything that can follow "##" with
+          the exception of a parameter name, which needs special
+          treatment. If the token is a macro name, the macro is not
+          expanded.
+
+<DT>     simple macro
+   <DD>          See variable. A "simple macro" is one which was defined
+          without any following parameter list.
+
+<DT>     simple parse unit
+   <DD>          "simple parse unit" consists of the input constructs which
+          do not immediately involve macro parameters. It allows for
+          complete macro expansion.
+
+<DT>     space
+   <DD>          The token scanner passes spaces along because they cannot be
+          discarded until after macros have been expanded: the #
+          operator, which turns macro arguments into strings, depends
+          on them.
+<P>
+          If the args_only flag is set, spaces have to be passed on to
+          the output. Otherwise they can be discarded. These
+          productions accumulate space tokens on a stack, so that the
+          decision to output them or to discard them can be deferred.
+          If they are not output, they will effectively be discarded
+          by the reset() operation the next time a sequence of spaces
+          is encountered.
+
+<DT>     variable
+   <DD>       Since a NAME token can, depending on circumstance, name a
+          parameter, a macro, or the "defined" operator, a semantically
+          determined production is used to make the distinction.
+          "variable" is the outcome for NAME tokens that are simply to
+          be passed on without any special treatment.
+</DL>
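+<P>
+          Two of the behaviors described above, a "macro" that is
+          passed on unexpanded when no parentheses follow it and a
+          chain of the form A ## B ## C, can be checked against any
+          conforming C++ preprocessor. The names below are
+          hypothetical. MAS.SYN groups the chain to the left; standard
+          C and C++ leave the order of evaluation of "##" unspecified,
+          but for plain identifiers the result is the same single
+          token either way.

```cpp
// Function-like macro: expanded only when its name is followed by "(".
#define twice(x) ((x) * 2)

// A ## B ## C pastes three operands into one token.
#define CAT3(a, b, c) a ## b ## c

// Not followed by "(", so "twice" is passed on unexpanded and can also
// name an ordinary variable.
const int twice = 5;

// Declares the single identifier xyz.
const int CAT3(x, y, z) = 7;

// The outer call expands to ((twice) * 2); the inner name is left alone.
int twice_twice() { return twice(twice); }  // 10
int read_xyz() { return xyz; }              // 7
```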
+<P>
+
+<BR>
+
+<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
+      WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
+                WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+
+<P>
+Back to :
+<A HREF="../../index.html">Index</A> |
+<A HREF="index.html">Macro preprocessor overview</A>
+<P>
+
+<P>
+<ADDRESS><FONT SIZE="-1">
+                  AnaGram parser generator - examples<BR>
+                  Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR>
+                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
+                  All Rights Reserved.<BR>
+</FONT></ADDRESS>
+
+</BODY>
+</HTML>
+