author David A. Holland
date Mon, 13 Jun 2022 00:40:23 -0400

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Macro/Argument Substitution Module - Macro preprocessor and C Parser </TITLE>
</HEAD>


<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
 TEXT="000000" LINK="0033CC"
 VLINK="CC0033" ALINK="CC0099">

<P>
<IMG ALIGN="right" SRC="../../images/agrsl6c.gif" ALT="AnaGram"
         WIDTH=124 HEIGHT=30 >
<BR CLEAR="all">
Back to :
<A HREF="../../index.html">Index</A> |
<A HREF="index.html">Macro preprocessor overview</A>
<P>

<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
<P>
<H1> Macro/Argument Substitution Module - Macro preprocessor and C Parser   </H1>

<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >
<P>

<H2>Introduction</H2>
<P>

          The Macro/Argument Substitution module, MAS.SYN,
          accomplishes the following tasks:
<OL>
<LI>            It can scan the body of a macro, to identify and
               substitute for parameters and macro calls. </LI>
<LI>            It can scan the set of arguments for a macro call
               embedded within another macro, only substituting
               arguments to the outer macro for parameters found
               within arguments to the inner call. </LI>
<LI>            It can scan the argument to a macro for macro calls
               prior to substituting the argument for a parameter. </LI>
<LI>            It can recognize the "##" operator and paste two tokens
               together. </LI>
<LI>            It can recognize the "#" operator and turn a macro
               argument into a string. </LI>
</OL>
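<P>
          The tasks listed above match standard C preprocessing, so
          their effect can be shown with any conforming C++ compiler.
          The sketch below uses the host compiler's own preprocessor,
          not mpp. The macro and variable names are invented for
          illustration.

```cpp
#include <cassert>
#include <cstring>

// Invented macros, one per task in the list above. The host compiler's
// preprocessor does the work here; mpp is not involved.
#define N           21
#define TWICE(x)    ((x) * 2)        // task 1: parameter x substituted in the body
#define OUTER(v)    TWICE(v + 1)     // task 2: v appears inside an argument to an inner call
#define PASTE(a, b) a##b             // task 4: "##" pastes two tokens together
#define STR(x)      #x               // task 5: "#" turns an argument into a string

static const int counter_value = 7;

// Task 3: the argument N is expanded to 21 before it replaces x,
// giving ((21) * 2).
static const int twice_n = TWICE(N);

// Task 2: OUTER's argument replaces v inside TWICE's argument list, giving
// TWICE(21 + 1), which is expanded on rescan to ((21 + 1) * 2).
static const int outer_n = OUTER(N);

// PASTE(counter_, value) becomes the single identifier counter_value.
static const int pasted = PASTE(counter_, value);

// STR(a + b) becomes the string literal "a + b".
static const char *const stringized = STR(a + b);
```

          Here twice_n is 42, outer_n is 44, pasted is 7, and
          stringized is "a + b".
<P>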
          The macro/argument substitution parser, mas(), is called
          from a shell function, expand_text(). expand_text() is, in
          turn, called by expand_macro() and expand_arg(). Output from
          mas() is accumulated on the token_accumulator, ta.
<P>
<BR>

<H2>     Theory of Operation </H2>

          The primary purpose of mas() is to scan sequences of tokens
          for macro calls and parameter names and to do the indicated
          substitutions. At the same time, it must correctly handle the
          "##" and "#" operators, both of which inhibit macro expansion
          of their operands. Thus the entire grammar is structured
          around the requirements for these two operators.
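<P>
          The inhibiting effect can be seen with a standard C or C++
          preprocessor. In this sketch the names VERSION, STR, XSTR,
          CAT, XCAT and v3 are invented. Each operand of # or ## is used
          exactly as written, so an extra level of macro is needed when
          expansion should happen first.

```cpp
#include <cassert>
#include <cstring>

#define VERSION 3

#define STR(x)     #x           // operand of '#': argument is not expanded
#define XSTR(x)    STR(x)       // x is expanded here, before STR sees it
#define CAT(a, b)  a##b         // operands of '##': arguments are not expanded
#define XCAT(a, b) CAT(a, b)    // a and b are expanded here, before CAT sees them

static const int v3 = 30;

static const char *const raw      = STR(VERSION);    // "VERSION"
static const char *const expanded = XSTR(VERSION);   // "3"

// CAT(v, VERSION) would paste v and VERSION into the unknown name vVERSION.
// XCAT expands VERSION to 3 first, so the result is the identifier v3.
static const int pasted = XCAT(v, VERSION);          // 30
```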
<P>
          A further complication is the handling of white space.
          White space within a macro argument cannot be deleted, since
          otherwise the "#" operator would not produce a correct
          result. Thus, in numerous circumstances in this grammar, it
          is not clear what to do with white space at the time it is
          encountered. For this reason, each sequence of
          white space tokens is saved on a temporary stack,
          space_stack, which can be output later or simply
          disregarded.
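<P>
          The deferral idea can be sketched as follows. This is not the
          MAS.SYN code. The Token type, the TokenId values, the
          SpaceStack class and copy_tokens() are all hypothetical. A run
          of spaces is parked in a SpaceStack. If the run should be kept,
          it is flushed to the output. Otherwise it is forgotten by
          reset() when the next run begins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token representation; mpp's real token type differs.
enum TokenId { NAME, SPACE, PUNCT };
struct Token {
    TokenId id;
    std::string text;
};

// Stand-in for space_stack: white space waits here until the parser
// knows whether it belongs in the output.
class SpaceStack {
public:
    void reset() { pending_.clear(); }                 // forget parked spaces
    void push(const Token &t) { pending_.push_back(t); }
    void flush(std::vector<Token> &out) {              // emit parked spaces
        out.insert(out.end(), pending_.begin(), pending_.end());
        pending_.clear();
    }
private:
    std::vector<Token> pending_;
};

// Hypothetical copier: a run of spaces reaches the output only if
// keep_spaces is set (compare the args_only case). A discarded run is
// cleared by reset() when the next run starts. Trailing spaces are never
// emitted in this sketch.
std::vector<Token> copy_tokens(const std::vector<Token> &in, bool keep_spaces) {
    std::vector<Token> out;
    SpaceStack spaces;
    bool in_space_run = false;
    for (const Token &t : in) {
        if (t.id == SPACE) {
            if (!in_space_run) spaces.reset();   // a new run begins
            in_space_run = true;
            spaces.push(t);
            continue;
        }
        if (in_space_run && keep_spaces) spaces.flush(out);
        in_space_run = false;
        out.push_back(t);
    }
    return out;
}
```

          For the input a, space, space, +, space, b, the function
          returns all six tokens when keep_spaces is true and only a, +
          and b when it is false.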
<P>
          Like TS.SYN, MAS.SYN must have special syntax for
          accumulating the arguments of macro calls. The differences
          between the two grammars arise from the fact that TS.SYN is
          converting an ASCII representation to a token
          representation, while MAS.SYN already has token input.
<P>
<BR>

<H2>     Elements of the Macro/Argument Substitution Module </H2>


          The remainder of this document describes the macro
          definitions, the structure definitions, the static data
          definitions, all configuration parameter settings, and all
          non-terminal parsing tokens used in the macro/argument
          substitution module. It also explains each configuration
          parameter setting in the syntax file. In MAS.SYN, each
          function that is defined is preceded by a short explanation
          of its purpose.
<P>
<BR>

<H2>     Macro Definitions </H2>

<DL>
<DT>     INPUT_CODE
   <DD>          Since this grammar uses "input values" and "pointer input",
          the parser needs to know how to extract the identification
          code for an input token from the item identified by the
          pointer. The parser expects this macro to be appropriately
          defined. Here it is defined so that it extracts the id field
          of the token.

<DT>     PCB
   <DD>          Since the "declare pcb" switch has been turned off, PCB has
          to be defined manually.

<DT>     SYNTAX_ERROR
   <DD>          This definition of SYNTAX_ERROR overrides the default
          definition provided by AnaGram.
</DL>
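<P>
          The sketch below shows one way the INPUT_CODE and PCB
          definitions described above could fit together. It is
          hypothetical. The token layout, the mas_pcb_type field and
          current_code() are assumptions for illustration. The actual
          definitions are in MAS.SYN and, presumably, in the parser that
          AnaGram generates from it.

```cpp
#include <cassert>

// Hypothetical stand-ins for mpp's token type and the generated pcb type.
struct token {
    unsigned id;      // identification code the parser acts on
    unsigned handle;  // the "baggage", e.g. an index into the token dictionary
};

struct mas_pcb_type {
    const token *pointer;   // current input position ("pointer input")
};

// With "~declare pcb", only a pointer to the control block exists here.
mas_pcb_type *mas_pcb = nullptr;

// Illustrative definitions in the spirit of those described above.
#define PCB (*mas_pcb)             // parser code refers to PCB.field
#define INPUT_CODE(T) ((T).id)     // take the id field of the pointed-to token

// Roughly how parser code might obtain the code of the current input token.
unsigned current_code() {
    return INPUT_CODE(*PCB.pointer);
}
```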
<P>
<BR>

<H2>     Static variables </H2>

<DL>
<DT>     active_macros
   <DD>          Type: stack<unsigned>
<P>
          This is a multilevel stack used to keep track of which
          macros have been invoked in any particular expansion. If,
          after an expansion pass, it is determined that the result of
          a concatenation is a macro name, all the macros which have
          been expanded so far are marked busy and the text is scanned
          again. Once there is no need for further scans, the busy
          flags are turned off. The stack is multilevel so that it
          can nest easily for recursive usage.

<DT>     args
   <DD>          Type: token **
<P>
          "args" is an array of pointers to token strings. It is the
          set of argument strings for the macro currently being
          expanded.

<DT>     args_only
    <DD>         Type: int
<P>
          "args_only" is a switch that tells the macro/argument
          substitution logic to make only argument substitutions and
          not to expand macros. id_macro tests it, and if it is
          set, NAME tokens are not checked to see whether they identify
          macros.

<DT>     mas_pcb
   <DD>          Type: mas_pcb_type *
<P>
          This variable contains a pointer to the currently active
          parser control block for mas(). It is saved, set and
          restored in expand_text().

<DT>     n_concats
    <DD>         Type: int
<P>
          This variable is used to count the number of concatenation
          operations that result in a macro name in the course of a
          single scan for macros. If it is non-zero, the text is
          rescanned.

<DT>     n_args
   <DD>          Type: int
<P>
          This variable specifies the number of arguments of the
          macro currently being expanded.

<DT>     params
    <DD>         Type: unsigned *
<P>
          This variable is a pointer to a list of n_args unsigned
          integers. The integers are the indices in the token
          dictionary of the parameter names for the macro currently
          being expanded.

<DT>     space_stack
   <DD>          Type: token_accumulator
<P>
          In a number of places in this grammar, it is necessary to
          pass over white space tokens without knowing whether they
          are to be output or disregarded. "space_stack" provides a
          place to store them temporarily until the decision can be
          made. Remember that within macro arguments, spaces can be
          significant, and therefore must not be discarded
          prematurely.
</DL>
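<P>
          The saving, setting and restoring of mas_pcb in expand_text()
          can be sketched as follows. Everything here is hypothetical. A
          one-field mas_pcb_type and an expand_text() that takes a
          nesting count stand in for the real parser and the macro calls
          it encounters. Each invocation gets its own control block, so a
          recursive expansion leaves the caller's block untouched.

```cpp
#include <cassert>

// Hypothetical parser control block; the real mas_pcb_type is much larger.
struct mas_pcb_type {
    int depth;   // illustrative field: nesting level of this parse
};

mas_pcb_type *mas_pcb = nullptr;   // currently active control block

// Sketch of the save/set/restore pattern. 'nesting' stands in for the
// number of nested macro expansions mas() would trigger.
int expand_text(int nesting) {
    mas_pcb_type *saved = mas_pcb;                  // save the caller's block
    mas_pcb_type pcb{saved ? saved->depth + 1 : 0}; // storage for this call
    mas_pcb = &pcb;                                 // set: parser uses this block

    int deepest = mas_pcb->depth;
    if (nesting > 0) {
        int inner = expand_text(nesting - 1);       // a nested expansion
        if (inner > deepest) deepest = inner;
    }
    assert(mas_pcb == &pcb);   // the inner call restored our block
    mas_pcb = saved;           // restore the caller's block
    return deepest;
}
```

          A call such as expand_text(3) nests four control blocks and
          leaves mas_pcb exactly as it found it.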
<P>
<BR>

<H2>     Configuration Parameters  </H2>
<DL>

<DT>     ~allow macros
   <DD>         This statement turns off the allow macros switch so that
          AnaGram implements all reduction procedures as explicit
          function definitions. This simplifies debugging at the cost
          of a slight performance degradation.

<DT>     ~backtrack
   <DD>          This statement turns off the backtrack switch. This means
          that if the token scanner encounters a syntax error, it will
          not undo default reductions that may have been caused by the
          bad input before it generates diagnostics.

<DT>     context type = location
   <DD>          This statement specifies that the generated parser is to
          track context automatically. The context variables have type
          "location". location is defined elsewhere to consist of two
          fields: line number and column number.

<DT>     ~declare pcb
    <DD>         This statement tells AnaGram not to declare a parser control
          block for the parser. Access to the parser control block is
          through a pointer. Actual allocation of storage and setting
          of the pointer takes place in expand_text().

<DT>     default input type = token
   <DD>          This statement tells AnaGram how to code reduction procedure
          calls that involve input tokens.

<DT>     enum
   <DD>          This enumeration statement provides definitions for terminal
          tokens. The same enum statement is found in EX.SYN, in
          JRC.SYN, in KRC.SYN and in MPP.H.

<DT>     ~error frame
   <DD>          This turns off the error frame portion of the automatic
          syntax error diagnostic generator, since the context of the
          error in the macro substitution syntax is of little interest.
          If an error frame were to be used in diagnostics, that of the
          C parser would be more appropriate.

<DT>     error trace
   <DD>          This turns on the error trace function, so that if the token
          scanner encounters a syntax error it will write an .etr
          file.

<DT>     input values
   <DD>          This switch tells AnaGram that the input units carry some
          baggage: they have values apart from their identifying
          code. Since this grammar uses pointer input, an INPUT_CODE
          macro must also be defined.

<DT>     ~lines and columns
   <DD>          This statement turns off the lines and columns switch so
          that the parser does not try to track them here, where they
          make no sense: the input to mas() is an array of tokens,
          not a stream of characters.

<DT>     line numbers
   <DD>          This statement causes AnaGram to include #line statements in
          the parser file so that your compiler can provide
          diagnostics keyed to your syntax file.

<DT>     pointer input
   <DD>          This statement tells AnaGram that the input to mas() is an
          array in memory that can be scanned simply by incrementing a
          pointer. Since the input tokens are not simply characters, a
          pointer type statement is required and the INPUT_CODE macro
          must be defined.

<DT>     pointer type = token *
   <DD>          This statement provides the C data type of the input units
          to the parser. If this statement were omitted, pointer type
          would default to unsigned char *, and your compiler would
          scold when it tried to compile the parser.

<DT>     subgrammar parse unit
   <DD>          This statement tells AnaGram that the specifications for
          parse unit are internally complete and that it should not
          determine reductions by inspecting following tokens.

<DT>     ~test range
   <DD>          This statement tells AnaGram not to check input characters
          to see if they are within allowable limits. This checking is
          not necessary since the input to mas() has been generated in
          such a way that it cannot possibly get an out-of-range
          token.
</DL>
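<P>
          The effect of the #line statements produced by the "line
          numbers" switch can be seen in isolation. In this sketch the
          file name mas.syn and the line number 120 are invented. After
          the directive, __LINE__ and __FILE__ follow it rather than the
          physical position in the compiled file, and so do the
          compiler's diagnostics.

```cpp
#include <cassert>
#include <cstring>

// Illustrative only: a #line directive like those the "line numbers"
// switch places in the parser file, pointing back at the syntax file.
#line 120 "mas.syn"
static const int reported_line = __LINE__;    // the compiler now says 120
static const char *const reported_file = __FILE__;  // ...in "mas.syn"
```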
<P>
<BR>

<H2>     Grammar Tokens </H2>
<DL>
<DT>     arg element
    <DD>         An "arg element" is a discrete token in an argument to a
          macro or a sequence of "nested elements".

<DT>     arg elements
   <DD>          A sequence of "arg element".

<DT>     concatenation
   <DD>         These productions implement the "##" operator. "parameter
          name" is distinguished on the right side to avoid improper
          macro expansions.

<DT>     defined
   <DD>          See "variable". "defined" is the special operator available
          in #if and #elif statements to determine whether a macro has
          been defined. It is recognized only if "if_clause" has been
          set.

<DT>     grammar
   <DD>          "grammar" simply describes the complete input to MAS. Since
          "grammar" is a special name recognized by AnaGram, there is
          no need for any further specification of the "start token"
          for the grammar.

<DT>     left side
   <DD>          This refers to a "##" operator, its left operand, and any
          intervening white space. The cross recursion with
          "concatenation" allows for constructs of the form A ## B ##
          C, with grouping to the left.

<DT>     macro
   <DD>          See "variable". This grammar distinguishes between a "simple
          macro" which was defined without any parameter list, and a
          "macro" which had an explicit, although perhaps empty,
          parameter list. If a "macro" appears without following
          parentheses, it is simply passed on without being expanded.

<DT>     macro arg list
    <DD>         This token counts the number of arguments to a macro, and
          stacks them on that many levels of the token accumulator. The
          logic is essentially the same as for macro arg list in
          TS.SYN.

<DT>     nested elements
   <DD>          The "nested elements" token represents a sequence of macro
          argument tokens enclosed in matching parentheses.

<DT>     not parameter
   <DD>          This token exists simply to avoid multiple copies of the
          same reduction procedure.

<DT>     parameter expansion
   <DD>          "parameter expansion" is simply a device to defer the
          replacement of a macro parameter name until it has been
          determined whether it is followed by "##", with perhaps some
          intervening white space.

<DT>     parameter name
   <DD>          See <STRONG>variable</STRONG>.

<DT>     parse unit
   <DD>          The major problem in expanding a macro body is dealing with
          the "##" operator. Since the operands of ## are not to have
          their macros expanded, but only macro arguments replaced,
          they have to be dealt with specially. Thus "parse unit"
          distinguishes the tokens which are not macro parameters
          ("simple parse units") from the parameters, so that a
          concatenation operator can be recognized before a macro
          parameter is allowed complete expansion.

<DT>     right side
   <DD>          This token consists of anything that can follow "##" with
          the exception of a parameter name, which needs special
          treatment. If the token is a macro name, the macro is not
          expanded.

<DT>     simple macro
   <DD>          See variable. A "simple macro" is one which was defined
          without any following parameter list.

<DT>     simple parse unit
   <DD>          "simple parse unit" consists of the input constructs which
          do not immediately involve macro parameters. It allows for
          complete macro expansion.

<DT>     space
   <DD>          The token scanner passes spaces along because they cannot be
          discarded until after macros have been expanded. This is
          because of the # operator, which turns macro arguments into
          strings.
<P>
          If the args_only flag is set, spaces have to be passed on to
          the output. Otherwise they can be discarded. These
          productions accumulate space tokens on a stack, so that the
          decision to output them or to discard them can be deferred.
          If they are not output, they will effectively be discarded
          by the reset() operation the next time a sequence of spaces
          is encountered.

<DT>     variable
   <DD>       Since a NAME token can, depending on circumstance, name a
          parameter, a macro, or the "defined" operator, a semantically
          determined production is used to make the distinction.
          "variable" is the outcome for NAME tokens that are simply to
          be passed on without any special treatment.
</DL>
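<P>
          The rule that a "macro" without following parentheses is
          passed on unexpanded matches standard C preprocessing. This
          rule is what lets a function-like macro and an ordinary
          function share a name. The sketch below uses invented names and
          the host compiler's preprocessor.

```cpp
#include <cassert>

// Invented example: square is a function-like macro (a "macro" in the
// terms above) and also, deliberately, the name of an ordinary function.
#define square(x) ((x) * (x))

// The parentheses around the name keep this definition from being read as
// a macro call. The function returns x * x + 1 so the two can be told apart.
static int (square)(int x) { return x * x + 1; }

// Here square is not followed by '(', so it is passed through unexpanded
// and names the function.
static int (*const fn)(int) = square;
```

          In this sketch square(3) goes through the macro and yields 9,
          while fn(3) and (square)(3) call the function and yield 10.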
<P>

<BR>

<IMG ALIGN="bottom" SRC="../../images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="../../images/pslrb6d.gif" ALT="Parsifal Software"
                WIDTH=181 HEIGHT=25>
<BR CLEAR="right">

<P>
Back to :
<A HREF="../../index.html">Index</A> |
<A HREF="index.html">Macro preprocessor overview</A>
<P>

<P>
<ADDRESS><FONT SIZE="-1">
                  AnaGram parser generator - examples<BR>
                  Macro/Argument Substitution Module - Macro preprocessor and C Parser <BR>
                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
                  All Rights Reserved.<BR>
</FONT></ADDRESS>

</BODY>
</HTML>