diff doc/mansupp/mansupp-201.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/mansupp/mansupp-201.html	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,345 @@
+<HTML>
+<HEAD>
+<TITLE>AnaGram 2.01 Manual Supplement</TITLE>
+</HEAD>
+<BODY TEXT="#000000" LINK="#0000ff" VLINK="#551a8b" ALINK="#ff0000" 
+      BGCOLOR="#ffffff">
+
+<H1 ALIGN="CENTER">AnaGram 2.01</H1>
+
+<H1 ALIGN="CENTER">Supplement to User's Guide</H1>
+<P>
+<BR>
+
+<H2>Thread Safe Parsers</H2>
+
+<P>
+AnaGram 2.01 incorporates several changes designed to make it
+easier to write thread safe parsers.
+</P>
+
+<P>
+First, the new <B><A HREF="#reentrantParser">reentrant parser</A></B>
+switch makes the AnaGram parse
+engine reentrant by passing the parser control block as an argument
+to all function calls. Without it, the parser control block becomes a
+global resource, so that only one parse context can be in use at one
+time.
+</P>
+
+<P>
+Second, the <B><A HREF="#extendPcb">extend pcb</A></B>
+statement allows you to add your own
+declarations to the parser control block, so that you can avoid
+references to global or static variables in your reduction procedures.
+</P>
+
+<P>
+Finally, the parsers generated by AnaGram 2.01 no longer use any
+static or global variables to store temporary data. All working storage
+is now kept on the stack or in the parser control block.
+</P>
+
+<P>
+These are the steps to make a parser thread safe:
+<UL>
+  <LI>Set the <B>reentrant parser</B> switch in your syntax file.</LI>
+  <LI>Add one or more <B>extend pcb</B> statements to your syntax file
+      and include declarations for all the variables needed by your
+      reduction procedures. Update your reduction procedures
+      accordingly.</LI>
+  <LI>If your parser will modify any variable which is not in the
+      parser control block, make sure that variable is protected by
+      a mutex, or otherwise synchronized properly.</LI>
+  <LI>To run the parser, declare an instance of the parser control
+      block <EM>on the stack</EM>, initialize your fields in the
+      parser control block as appropriate, lock any relevant mutexes,
+      and then call the parser function with a pointer to the parser
+      control block as the argument.</LI>
+</UL>
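+<P>
+As a rough illustration of the first two steps, a configuration section
+along the following lines could be used. The sketch assumes the usual
+convention that naming a switch in a configuration section turns it on.
+The member names <TT>line_count</TT> and <TT>user_data</TT> are
+hypothetical and stand in for whatever your reduction procedures need:
+<PRE>
+   [
+      reentrant parser
+      extend pcb {
+         int   line_count;   /* hypothetical: per-parse counter */
+         void *user_data;    /* hypothetical: caller-supplied context */
+      }
+   ]
+</PRE>
+</P>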
+<BR>
+
+<H2>Added C++ Support</H2>
+
+<P>
+In previous versions of AnaGram, it was not possible to return class
+instances (rather than pointers to them) from reduction procedures except
+under limited circumstances. This is because AnaGram generates code that
+stores objects on the parser value stack simply by casting the stack pointer
+and assigning the value. This approach is correct for all traditional data
+types, but leads to unpredictable behavior for a class that has supplied its
+own assignment operator. Overloaded assignment operators depend on the
+destination being a valid instance of the class. With the traditional AnaGram
+parser value stack, however, this is not normally the case.
+</P>
+
+<P>
+Many classes, such as string classes, require their own implementation
+of the assignment operator, so the restriction on returning class
+instances has often made reduction procedures unnecessarily complex.
+</P>
+
+<P>
+AnaGram 2.01 now has a <B><A HREF="#wrapper">wrapper</A></B>
+statement which can be used to
+overcome this problem. For each class specified in a <B>wrapper</B>
+statement, AnaGram generates a wrapper class that manages the stacked
+object transparently. The stacked object is created using the copy
+constructor. The reduction procedure is called with a reference to the
+stacked object rather than a copy. Wrapped objects are removed <EM>after</EM>
+the reduction procedure that uses them returns.
+</P>
+<BR>
+
+<H2>Error Diagnostic Support</H2>
+
+<P>
+The error diagnostics created by the <STRONG>diagnose errors</STRONG>
+switch have
+been revised so that their text is defined by macros which the user
+can replace. There are three macros involved:
+</P>
+
+<UL>
+  <LI><TT>MISSING_FORMAT</TT>. The default definition of this macro is
+      <CODE>"Missing %s"</CODE>. It is used when the parser expects a unique
+      input token, the name of the token exists in the <B>token names</B>
+      table, and the token is not found in the input.</LI>
+  <LI><TT>UNEXPECTED_FORMAT</TT>. The default definition of this
+      macro is <CODE>"Unexpected %s"</CODE>. It is used when there is more
+      than one possible input token, but the token found is not one of
+      those expected.</LI>
+  <LI><TT>UNNAMED_TOKEN</TT>. The default definition is <CODE>"input"</CODE>. It
+      is used in place of a token name in <TT>UNEXPECTED_FORMAT</TT>
+      when the actual input encountered cannot be identified as a
+      token.</LI>
+</UL>
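+<P>
+For instance, a syntax file might replace the default texts with
+definitions along these lines. The replacement strings are only
+examples, and the sketch assumes that the definitions appear in embedded
+C code that precedes the generated parser code:
+<PRE>
+  #define MISSING_FORMAT    "Expected %s"          /* example text */
+  #define UNEXPECTED_FORMAT "Did not expect %s"    /* example text */
+  #define UNNAMED_TOKEN     "unrecognized input"   /* example text */
+</PRE>
+</P>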
+
+<P>
+Note that if <B>diagnose errors</B> is ON, AnaGram automatically
+includes in your generated parser the array of strings specified by the
+<TT>TOKEN_NAMES</TT> macro, which is useful in creating
+diagnostics. The default name of this array is:
+<PRE>
+       &lt;parser name&gt;_token_names
+</PRE>
+</P>
+<BR>
+
+<H2>New Attribute Statements</H2>
+
+<H3><A NAME="extendPcb">extend pcb</A></H3>
+
+<P>
+The <B>extend pcb</B> statement is an attribute statement that allows you to
+add declarations of your own to the parser control block. With this
+feature, data needed by reduction procedures can be stored in the
+parser control block rather than in global or static storage. This
+capability greatly facilitates the construction of thread safe
+parsers.
+</P>
+
+<P>
+The <B>extend pcb</B> statement may be used in any configuration section.
+The format is as follows:
+<PRE>
+  extend pcb { &lt;C or C++ declaration&gt;... }
+</PRE>
+</P>
+
+<P>
+The statement may extend over multiple lines and may contain any number
+of C or C++ declarations of any kind. AnaGram appends the declarations to
+the end of the parser control block definition in the generated parser
+header file.
+There may be any number of <B>extend pcb</B> statements. The extensions are
+appended to the parser control block definition in the order in which they
+occur in the syntax file.
+</P>
+
+<P>
+The <B>extend pcb</B> statement is compatible with both C and C++ parsers.
+Note that even if you are deriving your own class from the parser
+control block, you might want to use <B>extend pcb</B> to provide virtual
+function definitions or other declarations appropriate to a base class.
+</P>
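+<P>
+As a sketch of the ordering rule, two <B>extend pcb</B> statements such
+as the following would add <TT>symbol_count</TT> to the end of the parser
+control block first, followed by <TT>error_count</TT>. Both member names
+are hypothetical:
+<PRE>
+   [
+      extend pcb { int symbol_count; }   /* hypothetical member, appended first */
+      extend pcb { int error_count; }    /* hypothetical member, appended second */
+   ]
+</PRE>
+</P>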
+
+<H3><A NAME="wrapper">wrapper</A></H3>
+
+<P>
+The <B>wrapper</B> attribute statement provides correct handling of C++
+objects returned inline by reduction procedures.
+</P>
+
+<P>
+If you specify a wrapper for a C++ class, then whenever a reduction
+procedure returns an instance of that class, a copy of the object is
+constructed on the parser value stack, and its destructor is
+called when the object is removed from the stack.
+</P>
+
+<P>
+Without a wrapper, objects are stored on the value stack simply by
+coercing the stack pointer to the appropriate type. No constructor is
+called when the object is stored, and no destructor is called when
+it is removed from the stack.
+</P>
+
+<P>
+Classes which use reference counts or otherwise overload the
+assignment operator should always have wrappers in order to
+function correctly.
+</P>
+
+<P>
+Wrapper statements, like other attribute statements, must appear in
+configuration sections. The syntax is:
+<PRE>
+  wrapper {&lt;comma delimited list of data types&gt;}
+</PRE>
+For example:
+<PRE>
+   [
+      wrapper {CString, CFont}
+   ]
+</PRE>
+</P>
+
+<P>
+You cannot specify a wrapper for the <B>default token type</B>.
+</P>
+
+<P>
+If your parser uses AnaGram wrappers and exits with an error condition,
+objects may remain on the parser value stack. If you have no further use
+for these objects, invoke the <TT>DELETE_WRAPPERS</TT> macro on error exit
+so that they are properly destroyed, which avoids a memory leak. If you
+have enabled <B>auto resynch</B>, <TT>DELETE_WRAPPERS</TT> is
+invoked automatically.
+</P>
+<BR>
+
+<H2>Changed Configuration Parameters</H2>
+
+<H3>Parser stack alignment</H3>
+
+<P>
+<B>Parser stack alignment</B> now defaults to <TT>long</TT> instead 
+of <TT>int</TT>. With
+this default, AnaGram parsers will compile and run on 64-bit
+processors with no further attention. Users who are building parsers
+for embedded systems or other uses where memory is limited may
+want to override this default value with their own specification.
+</P>
+
+<H3>Parser stack size</H3>
+
+<P>
+<B>Parser stack size</B> now defaults to 128 instead of 32. AnaGram
+adjusts the parser stack size upwards, if necessary, depending on the
+grammar. If your grammar uses only left recursive constructs, you
+will never have a problem with parser stack overflow. If there is
+center recursion or right recursion in your grammar, however, there
+always exists syntactically correct input which can cause stack
+overflow no matter how large the stack. Be sure that the parser stack
+size is large enough to handle all reasonable cases.
+</P>
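+<P>
+If the earlier defaults for the two parser stack parameters are
+preferred, for example on a memory-constrained target, they can be
+restored explicitly. The following configuration section is a sketch
+that assumes the usual <TT>parameter = value</TT> form for configuration
+parameters. AnaGram may still adjust the stack size upwards if the
+grammar requires it:
+<PRE>
+   [
+      parser stack alignment = int
+      parser stack size = 32
+   ]
+</PRE>
+</P>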
+
+<H3>Token names</H3>
+
+<P>
+<B>Token names</B> defaults to OFF. If it is set, AnaGram generates a
+static array of character strings, indexed by token number, to provide
+ASCII representations of token names for use in error diagnostics.
+</P>
+
+<P>
+The array contains strings for all grammar tokens which have been
+explicitly named in the syntax file as well as tokens which represent
+keywords or single character constants.
+</P>
+
+<P>
+Prior to version 2.01 of AnaGram, the array contained strings
+for explicitly named tokens only. If this restriction is required, set the
+<B>token names only</B> switch.
+</P>
+
+<H2>New Configuration Parameters</H2>
+
+<H3>iso latin 1</H3>
+
+<P>
+The <B>iso latin 1</B> configuration switch defaults to ON. It controls case
+conversion on input characters when the <B>case sensitive</B> switch is set
+to OFF. When <B>iso latin 1</B> is set, the default <TT>CONVERT_CASE</TT> macro
+is defined to convert all characters in the Latin 1 character
+set correctly.
+</P>
+
+<P>
+When the <B>iso latin 1</B> switch is OFF, only characters in the ASCII range
+(0-127) are converted.
+</P>
+
+<H3><A NAME="reentrantParser">reentrant parser</A></H3>
+
+<P>
+The <B>reentrant parser</B> configuration switch defaults to OFF. If you
+turn it on, AnaGram will generate code that passes the parser control
+block to functions as an argument, so that they do not have to use a
+static reference to find the control block.
+</P>
+
+<P>
+AnaGram declares the parser control block argument using the macro
+<TT>PCB_TYPE</TT>. For example:
+<PRE>
+  static void ag_ra(PCB_TYPE *pcb_pointer)
+</PRE>
+AnaGram will define <TT>PCB_TYPE</TT> as the type of the parser
+control block if you
+do not define it otherwise. If you are using C++, and derive a class from the
+parser control block, you can override the definition of
+<TT>PCB_TYPE</TT> in order to
+make your derived class accessible from your reduction procedures.
+</P>
+
+<P>
+The <B>reentrant parser</B> switch cannot be used in conjunction with the
+<B>old style</B> switch.
+</P>
+
+<P>
+When you have enabled the reentrant parser switch, the parse
+function, the initializer function, and the parser value function are all
+defined to take a pointer to the parser control block as their sole
+argument.
+</P>
+
+<H3>token names only</H3>
+
+<P>
+<B>Token names only</B> defaults to OFF. This configuration
+switch was added in AnaGram 2.01 to preserve the behavior that the
+<B>token names</B> switch had in earlier versions. When <B>token names
+only</B> is ON, only tokens which have been given explicit names in the
+syntax file have non-empty strings in the generated list of character strings.
+<B>Token names only</B> takes precedence over the <B>token names</B> switch.
+</P>
+
+<H3>no cr</H3>
+
+<P>
+The <B>no cr</B> configuration switch is provided for developers
+who intend to use the generated parser on a Unix system. When
+<B>no cr</B> is set, AnaGram writes its generated parser and header
+files without carriage returns. The switch defaults to OFF, to maintain
+compatibility with Windows systems.
+</P>
+
+</BODY>
+</HTML>