view doc/mansupp/mansupp-201.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500

<HTML>
<HEAD>
<TITLE>AnaGram 2.01 Manual Supplement</TITLE>
</HEAD>
<BODY TEXT="#000000" LINK="#0000ff" VLINK="#551a8b" ALINK="#ff0000" 
      BGCOLOR="#ffffff">

<H1 ALIGN="CENTER">AnaGram 2.01</H1>

<H1 ALIGN="CENTER">Supplement to User's Guide</H1>
<P>
<BR>

<H2>Thread Safe Parsers</H2>

<P>
AnaGram 2.01 incorporates several changes designed to make it
easier to write thread safe parsers.
</P>

<P>
First, the new <B><A HREF="#reentrantParser">reentrant parser</A></B>
switch makes the AnaGram parse
engine reentrant by passing the parser control block as an argument
to all function calls. Without it, the parser control block becomes a
global resource, so that only one parse context can be in use at one
time.
</P>

<P>
Second, the <B><A HREF="#extendPcb">extend pcb</A></B>
statement allows you to add your own
declarations to the parser control block, so that you can avoid
references to global or static variables in your reduction procedures.
</P>

<P>
Finally, the parsers generated by AnaGram 2.01 no longer use any
static or global variables to store temporary data. All working storage
is now kept on the stack or in the parser control block.
</P>

<P>
These are the steps to make a parser thread safe:
<UL>
  <LI>Set the <B>reentrant parser</B> switch in your syntax file.</LI>
  <LI>Add one or more <B>extend pcb</B> statements to your syntax file
      and include declarations for all the variables needed by your
      reduction procedures. Update your reduction procedures
      accordingly.</LI>
  <LI>If your parser will modify any variable which is not in the
      parser control block, make sure that variable is protected by
      a mutex, or otherwise synchronized properly.</LI>
  <LI>To run the parser, declare an instance of the parser control
      block <EM>on the stack</EM>, initialize your fields in the
      parser control block as appropriate, lock any relevant mutexes,
      and then call the parser function with a pointer to the parser
      control block as the argument.</LI>
</UL>
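<P>
The steps above can be sketched as follows. This is only an illustration:
<TT>demo_pcb_type</TT>, its <TT>sum</TT> field, and <TT>demo_parse</TT> are
hypothetical stand-ins for the parser control block and parse function that
AnaGram would generate from your syntax file; the real names will differ.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a generated parser control block. The field
// "sum" plays the role of a declaration added with "extend pcb".
struct demo_pcb_type {
    const char *pointer;  // next input character
    long sum;             // per-parse working storage (not global)
};

// Hypothetical stand-in for a reentrant parse function: all state is
// reached through the pcb argument. It simply adds up the digits it sees.
void demo_parse(demo_pcb_type *pcb) {
    assert(pcb != nullptr);
    for (; *pcb->pointer != '\0'; ++pcb->pointer) {
        if (*pcb->pointer >= '0' && *pcb->pointer <= '9') {
            pcb->sum += *pcb->pointer - '0';
        }
    }
}

// Shared data outside the control block must be guarded by a mutex.
std::mutex total_mutex;
long grand_total = 0;

void parse_one(const std::string &text) {
    demo_pcb_type pcb;           // control block lives on this thread's stack
    pcb.pointer = text.c_str();  // initialize our own fields
    pcb.sum = 0;
    demo_parse(&pcb);            // this parser touches nothing shared...
    std::lock_guard<std::mutex> lock(total_mutex);
    grand_total += pcb.sum;      // ...so only the final update is locked
}

// Runs one parse per input string, each on its own thread.
long parse_all(const std::vector<std::string> &inputs) {
    {
        std::lock_guard<std::mutex> lock(total_mutex);
        grand_total = 0;
    }
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back(parse_one, std::cref(inputs[i]));
    }
    for (std::size_t i = 0; i < workers.size(); ++i) {
        workers[i].join();
    }
    return grand_total;  // safe to read: every worker has been joined
}
```

<P>
If the reduction procedures themselves modified shared data during the parse,
the mutex would have to be locked before the parse function is called, as the
last step in the list describes.
</P>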
<BR>

<H2>Added C++ Support</H2>

<P>
In previous versions of AnaGram it has not been possible to return class
instances (rather than pointers to them) from reduction procedures except
under limited circumstances. This is because AnaGram generates code that
stores objects on the parser value stack simply by casting the stack pointer
and assigning the value. This approach is correct for all traditional data
types, but leads to unpredictable behavior for a class that has supplied its
own assignment operator. Overloaded assignment operators depend on the
destination being a valid instance of the class. With the traditional AnaGram
parser value stack, however, this is not normally the case.
</P>

<P>
Since many classes, such as string classes, require their own
implementation of the assignment operator, the restriction
on returning class instances has often made reduction procedures
unnecessarily complex.
</P>

<P>
AnaGram 2.01 now has a <B><A HREF="#wrapper">wrapper</A></B>
statement which can be used to
overcome this problem. For each class specified in a <B>wrapper</B>
statement, AnaGram generates a wrapper class that transparently
solves the problem. The stacked object is created using the copy
constructor. The reduction procedure is called with a reference to the
stacked object rather than a copy. Wrapped objects are removed <EM>after</EM>
the reduction procedure that uses them returns.
</P>
<BR>

<H2>Error Diagnostic Support</H2>

<P>
The error diagnostics created by the <STRONG>diagnose errors</STRONG>
switch have
been revised so that their text is defined by macros which the user
can replace. There are three macros involved:
</P>

<UL>
  <LI><TT>MISSING_FORMAT</TT>. The default definition of this macro is
      <CODE>"Missing %s"</CODE>. It is used when the parser expects a unique
      input token, the name of the token exists in the <B>token names</B>
      table, and the token is not found in the input.</LI>
  <LI><TT>UNEXPECTED_FORMAT</TT>. The default definition of this
      macro is <CODE>"Unexpected %s"</CODE>. It is used when there is more
      than one possible input token, but the token found is not one of
      those expected.</LI>
  <LI><TT>UNNAMED_TOKEN</TT>. The default definition is <TT>"input"</TT>. It
      is used in place of a token name in <TT>UNEXPECTED_FORMAT</TT>
      when the actual input encountered cannot be identified as a
      token.</LI>
</UL>
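<P>
As an illustration, the sketch below replaces one of the three macros and
leaves the other two at the defaults quoted above. It assumes that a default
is supplied only when the macro is not already defined (the <TT>#ifndef</TT>
guards are an assumption, not a quotation of the generated code), and the two
helper functions are hypothetical.
</P>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Replacement text supplied by the user.
#define MISSING_FORMAT "Expected %s here"

// Defaults quoted in the text; the #ifndef guards are an assumption.
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

// Hypothetical helper: message for a missing, uniquely expected token.
std::string missing_message(const char *token_name) {
    assert(token_name != nullptr);
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, MISSING_FORMAT, token_name);
    return buffer;
}

// Hypothetical helper: message for an unexpected token. A null name stands
// for input that could not be identified as a token.
std::string unexpected_message(const char *token_name) {
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, UNEXPECTED_FORMAT,
                  token_name ? token_name : UNNAMED_TOKEN);
    return buffer;
}
```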

<P>
Note that if <B>diagnose errors</B> is ON, AnaGram automatically
includes in your generated parser the array of strings specified by the
<TT>TOKEN_NAMES</TT> macro, which is useful in creating
diagnostics. The default
name of this array is
<PRE>
       &lt;parser name&gt;_token_names
</PRE>
</P>
<BR>

<H2>New Attribute Statements</H2>

<H3><A NAME="extendPcb">extend pcb</A></H3>

<P>
The <B>extend pcb</B> statement is an attribute statement that allows you to
add declarations of your own to the parser control block. With this
feature, data needed by reduction procedures can be stored in the
parser control block rather than in global or static storage. This
capability greatly facilitates the construction of thread safe
parsers.
</P>

<P>
The <B>extend pcb</B> statement may be used in any configuration section.
The format is as follows:
<PRE>
  extend pcb { &lt;C or C++ declaration&gt;... }
</PRE>
</P>

<P>
It may, of course, extend over multiple lines and may contain any number
of C or C++ declarations of any kind. AnaGram will append it to the end of
the parser control block definition in the generated parser header file.
There may be any number of <B>extend pcb</B> statements. The extensions are
appended to the parser control block definition in the order in which they
occur in the syntax file.
</P>

<P>
The <B>extend pcb</B> statement is compatible with both C and C++ parsers.
Note that even if you are deriving your own class from the parser
control block, you might want to use <B>extend pcb</B> to provide virtual
function definitions or other declarations appropriate to a base class.
</P>

<H3><A NAME="wrapper">wrapper</A></H3>

<P>
The <B>wrapper</B> attribute statement provides correct handling of C++
objects returned inline by reduction procedures.
</P>

<P>
If you specify a wrapper for a C++ class, then when a reduction
procedure returns an instance of that class, a copy of the object will
be constructed on the parser value stack, and its destructor will be
called when the object is removed from the stack.
</P>

<P>
Without a wrapper, objects are stored on the value stack simply by
coercing the stack pointer to the appropriate type. There is no
constructor call when the object is stored and no destructor call when
it is removed from the stack.
</P>

<P>
Classes which use reference counts or otherwise overload the
assignment operator should always have wrappers in order to
function correctly.
</P>
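<P>
The storage strategy described for wrapped objects can be illustrated with a
small stand-alone sketch. It is not the code AnaGram generates:
<TT>Tracked</TT> and <TT>ValueStack</TT> are hypothetical names, and the
sketch only shows the general technique of copy-constructing into raw
storage and destroying explicitly on removal.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class with its own copy constructor, assignment operator and
// destructor; "live" counts the instances that currently exist.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked &other) : value(other.value) { ++live; }
    Tracked &operator=(const Tracked &other) {
        value = other.value;  // relies on *this being a valid instance
        return *this;
    }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical value stack built on raw, suitably aligned storage.
class ValueStack {
public:
    ValueStack() : depth_(0) {}
    ~ValueStack() { clear(); }

    // Store: copy-construct into raw storage instead of assigning to it.
    void push(const Tracked &item) {
        assert(depth_ < kSlots);
        void *raw = storage_ + depth_ * sizeof(Tracked);
        items_[depth_] = new (raw) Tracked(item);
        ++depth_;
    }
    // Access by reference, as a reduction procedure would.
    Tracked &top() {
        assert(depth_ > 0);
        return *items_[depth_ - 1];
    }
    // Remove: run the destructor explicitly.
    void pop() {
        assert(depth_ > 0);
        --depth_;
        items_[depth_]->~Tracked();
    }
    // Destroy whatever is left, for example after an error exit.
    void clear() {
        while (depth_ > 0) pop();
    }

private:
    static constexpr std::size_t kSlots = 8;
    alignas(Tracked) unsigned char storage_[kSlots * sizeof(Tracked)];
    Tracked *items_[kSlots];
    std::size_t depth_;
};
```

<P>
Assigning through a pointer into <TT>storage_</TT> without first constructing
an object there would invoke <TT>operator=</TT> on memory that is not a valid
instance, which is the failure the wrapper is meant to prevent.
</P>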

<P>
Wrapper statements, like other attribute statements, must appear in
configuration sections. The syntax is:
<PRE>
  wrapper {&lt;comma delimited list of data types&gt;}
</PRE>
For example:
<PRE>
   [
      wrapper {CString, CFont}
   ]
</PRE>
</P>

<P>
You cannot specify a wrapper for the <B>default token type</B>.
</P>

<P>
If your parser uses AnaGram wrappers and exits with an error condition, there
may be objects remaining on the parser value stack. If you have no
further use for
these objects, you should call the <TT>DELETE_WRAPPERS</TT> macro on error exit
so that they will be properly deleted, thus avoiding a memory leak. If you
have enabled <B>auto resynch</B>, <TT>DELETE_WRAPPERS</TT> will be
invoked automatically.
</P>
<BR>

<H2>Changed Configuration Parameters</H2>

<H3>Parser stack alignment</H3>

<P>
<B>Parser stack alignment</B> now defaults to <TT>long</TT> instead 
of <TT>int</TT>. With
this default, AnaGram parsers will compile and run on 64-bit
processors with no further attention. Users who are building parsers
for embedded systems or other uses where memory is limited may
want to override this default value with their own specification.
</P>

<H3>Parser stack size</H3>

<P>
<B>Parser stack size</B> now defaults to 128 instead of 32. AnaGram
adjusts the parser stack size upwards, if necessary, depending on the
grammar. If your grammar uses only left recursive constructs, you
will never have a problem with parser stack overflow. If there is
center recursion or right recursion in your grammar, however, there
always exists syntactically correct input which can cause stack
overflow no matter how large the stack. Be sure that the parser stack
size is large enough to handle all reasonable cases.
</P>
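<P>
The effect of recursion direction on stack depth can be seen in a simplified
model of a shift-reduce parse of a list of <I>n</I> items. The sketch below
counts only grammar symbols on the stack (a real parser may need a slot or
two more), and the function names are hypothetical.
</P>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Peak stack depth when parsing n items with the left-recursive rules
//   list -> item | list item
std::size_t peak_depth_left(std::size_t n) {
    std::size_t depth = 0, peak = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ++depth;                     // shift the next item
        peak = std::max(peak, depth);
        if (depth == 2) depth = 1;   // reduce "list item" to list at once
        // with depth == 1, "item" reduces to list and depth is unchanged
    }
    assert(peak <= 2);
    return peak;
}

// Peak stack depth for the same input with the right-recursive rules
//   list -> item | item list
// Nothing can be reduced until the last item has been shifted.
std::size_t peak_depth_right(std::size_t n) {
    std::size_t depth = 0, peak = 0;
    for (std::size_t i = 0; i < n; ++i) {
        ++depth;                     // shift every item first
        peak = std::max(peak, depth);
    }
    while (depth > 1) --depth;       // then reduce "item list" repeatedly
    assert(peak == n);
    return peak;
}
```

<P>
In this model the left-recursive grammar never needs more than two slots,
while the right-recursive grammar needs one slot per item, so a list of 129
items would already exceed a stack of 128 entries.
</P>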

<H3>Token names</H3>

<P>
<B>Token names</B> defaults to OFF. If it is set, AnaGram generates a
static array of character strings, indexed by token number, to provide
ASCII representations of token names for use in error diagnostics.
</P>

<P>
The array contains strings for all grammar tokens which have been
explicitly named in the syntax file as well as tokens which represent
keywords or single character constants.
</P>

<P>
Prior to version 2.01 of AnaGram, the array contained strings
for explicitly named tokens only. If this restriction is required, set the
<B>token names only</B> switch.
</P>

<H2>New Configuration Parameters</H2>

<H3>iso latin 1</H3>

<P>
The <B>iso latin 1</B> configuration switch defaults to ON. It controls case
conversion on input characters when the <B>case sensitive</B> switch is set
to OFF. When <B>iso latin 1</B> is set, the default <TT>CONVERT_CASE</TT> macro
is defined to correctly convert all characters in the ISO Latin-1 (ISO 8859-1)
character set.
</P>

<P>
When the <B>iso latin 1</B> switch is OFF, only characters in the ASCII range
(0-127) are converted.
</P>
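<P>
The difference between the two settings can be sketched with two small
functions. They are hypothetical and are not the definition of
<TT>CONVERT_CASE</TT>; in particular, conversion to upper case is assumed
here purely for illustration.
</P>

```cpp
#include <cassert>

// ASCII-only conversion: only the letters a-z are changed.
int upper_ascii(int c) {
    assert(c >= 0 && c <= 255);
    return (c >= 'a' && c <= 'z') ? c - 0x20 : c;
}

// Latin-1 conversion: additionally maps 0xE0-0xFE to 0xC0-0xDE, except
// 0xF7 (the division sign). 0xDF (sharp s) and 0xFF (y with diaeresis)
// have no upper-case form in Latin-1 and are left unchanged.
int upper_latin1(int c) {
    assert(c >= 0 && c <= 255);
    if (c >= 'a' && c <= 'z') return c - 0x20;
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    return c;
}
```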

<H3><A NAME="reentrantParser">reentrant parser</A></H3>

<P>
The <B>reentrant parser</B> configuration switch defaults to OFF. If you
turn it on, AnaGram will generate code that passes a pointer to the parser
control block as an argument to each function, so that the functions do not
have to use a static reference to find the control block.
</P>

<P>
AnaGram declares the parser control block argument using the macro
<TT>PCB_TYPE</TT>. For example,
<PRE>
  static void ag_ra(PCB_TYPE *pcb_pointer)
</PRE>
AnaGram will define <TT>PCB_TYPE</TT> as the type of the parser
control block if you
do not define it otherwise. If you are using C++, and derive a class from the
parser control block, you can override the definition of
<TT>PCB_TYPE</TT> in order to
make your derived class accessible from your reduction procedures.
</P>
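<P>
A minimal sketch of such an override follows. The names
<TT>calc_pcb_type</TT>, <TT>MyParser</TT>, and <TT>note_identifier</TT> are
hypothetical, and the guarded fallback definition is an assumption about how
a default could be supplied, not a quotation of the generated code.
</P>

```cpp
#include <cassert>

// Hypothetical stand-in for the generated parser control block.
struct calc_pcb_type {
    int token_count;
};

// Derived class carrying application data.
struct MyParser : calc_pcb_type {
    int identifiers_seen;
    MyParser() : identifiers_seen(0) { token_count = 0; }
};

// Override supplied by the user before the parser code is compiled.
#define PCB_TYPE MyParser

// Fallback used only when no override exists (assumed to be guarded).
#ifndef PCB_TYPE
#define PCB_TYPE calc_pcb_type
#endif

// Hypothetical function written in the style of the signature shown above.
static void note_identifier(PCB_TYPE *pcb_pointer) {
    assert(pcb_pointer != nullptr);
    ++pcb_pointer->token_count;       // member of the base control block
    ++pcb_pointer->identifiers_seen;  // derived member, visible via PCB_TYPE
}
```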

<P>
The <B>reentrant parser</B> switch cannot be used in conjunction with the
<B>old style</B> switch.
</P>

<P>
When you have enabled the reentrant parser switch, the parse
function, the initializer function, and the parser value function are all
defined to take a pointer to the parser control block as their sole
argument.
</P>

<H3>token names only</H3>

<P>
<B>Token names only</B> defaults to OFF. This configuration
switch was added in AnaGram 2.01 to preserve the behavior that the
<B>token names</B> switch had in earlier versions. When <B>token names
only</B> is ON, only tokens which have been given explicit names in the
syntax file have non-empty strings in the generated array of character strings.
<B>Token names only</B> takes precedence over the <B>token names</B> switch.
</P>

<H3>no cr</H3>

<P>
The <B>no cr</B> configuration switch is provided for developers
who intend to use the generated parser on a Unix system. When
<B>no cr</B> is set, it causes AnaGram's
output parser and header files to be written without carriage
returns. The switch defaults to OFF, to maintain compatibility with
Windows systems.
</P>

</BODY>
</HTML>