diff doc/mansupp/mansupp-201.html @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/mansupp/mansupp-201.html Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,345 @@
+<HTML>
+<HEAD>
+<TITLE>AnaGram 2.01 Manual Supplement</TITLE>
+</HEAD>
+<BODY TEXT="#000000" LINK="#0000ff" VLINK="#551a8b" ALINK="#ff0000"
+ BGCOLOR="#ffffff">
+
+<H1 ALIGN="CENTER">AnaGram 2.01</H1>
+
+<H1 ALIGN="CENTER">Supplement to User's Guide</H1>
+<P>
+<BR>
+
+<H2>Thread Safe Parsers</H2>
+
+<P>
+AnaGram 2.01 incorporates several changes designed to make it
+easier to write thread safe parsers.
+</P>
+
+<P>
+First, the new <B><A HREF="#reentrantParser">reentrant parser</A></B>
+switch makes the AnaGram parse
+engine reentrant by passing the parser control block as an argument
+to all function calls. Without it, the parser control block becomes a
+global resource, so that only one parse context can be in use at one
+time.
+</P>
+
+<P>
+Second, the <B><A HREF="#extendPcb">extend pcb</A></B>
+statement allows you to add your own
+declarations to the parser control block, so that you can avoid
+references to global or static variables in your reduction procedures.
+</P>
+
+<P>
+Finally, the parsers generated by AnaGram 2.01 no longer use any
+static or global variables to store temporary data. All working storage
+is now kept on the stack or in the parser control block.
+</P>
+
+<P>
+These are the steps to make a parser thread safe:
+<UL>
+ <LI>Set the <B>reentrant parser</B> switch in your syntax file.</LI>
+ <LI>Add one or more <B>extend pcb</B> statements to your syntax file
+ and include declarations for all the variables needed by your
+ reduction procedures. Update your reduction procedures
+ accordingly.</LI>
+ <LI>If your parser will modify any variable which is not in the
+ parser control block, make sure that variable is protected by
+ a mutex, or otherwise synchronized properly.</LI>
+ <LI>To run the parser, declare an instance of the parser control
+ block <EM>on the stack</EM>, initialize your fields in the
+ parser control block as appropriate, lock any relevant mutexes,
+ and then call the parser function with a pointer to the parser
+ control block as the argument.</LI>
+</UL>
+<BR>
+
+<H2>Added C++ Support</H2>
+
+<P>
+In previous versions of AnaGram it has not been possible to return class
+instances (rather than pointers to them) from reduction procedures except
+under limited circumstances. This is because AnaGram generates code that
+stores objects on the parser value stack simply by casting the stack pointer
+and assigning the value. This approach is correct for all traditional data
+types, but leads to unpredictable behavior for a class that has supplied its
+own assignment operator. Overloaded assignment operators depend on the
+destination being a valid instance of the class. With the traditional AnaGram
+parser value stack, however, this is not normally the case.
+</P>
+
+<P>
+Since there are many classes, such as string classes, which require
+their own implementation of the assignment operator, the restriction
+on returning class instances has often made reduction procedures
+unnecessarily complex.
+</P>
+
+<P>
+AnaGram 2.01 now has a <B><A HREF="#wrapper">wrapper</A></B>
+statement which can be used to
+overcome this problem. For each class specified in a <B>wrapper</B>
+statement, AnaGram generates a wrapper class that transparently
+solves the problem. The stacked object is created using the copy
+constructor. The reduction procedure is called with a reference to the
+stacked object rather than a copy. Wrapped objects are removed <EM>after</EM>
+the reduction procedure that uses them returns.
+</P>
+<BR>
+
+<H2>Error Diagnostic Support</H2>
+
+<P>
+The error diagnostics created by the <STRONG>diagnose errors</STRONG>
+switch have
+been revised so that their text is defined by macros which the user
+can replace. There are three macros involved:
+</P>
+
+<UL>
+ <LI><TT>MISSING_FORMAT</TT>. The default definition of this macro is
+ <CODE>"Missing %s"</CODE>. It is used when the parser expects a unique
+ input token, the name of the token exists in the <B>token names</B>
+ table, and the token is not found in the input.</LI>
+ <LI><TT>UNEXPECTED_FORMAT</TT>. The default definition of this
+ macro is <CODE>"Unexpected %s"</CODE>. It is used when there is more
+ than one possible input token, but the token found is not one of
+ those expected.</LI>
+ <LI><TT>UNNAMED_TOKEN</TT>. The default definition is <TT>"input"</TT>. It
+ is used in place of a token name in <TT>UNEXPECTED_FORMAT</TT>
+ when the actual input encountered cannot be identified as a
+ token.</LI>
+</UL>
+
+<P>
+Note that if <B>diagnose errors</B> is ON, AnaGram automatically
+includes in your generated parser the array of strings specified by the
+<TT>TOKEN_NAMES</TT> macro, which is useful in creating
+diagnostics. The default
+name of this array is
+<PRE>
+ <parser name>_token_names
+</PRE>
+</P>
+<BR>
+
+<H2>New Attribute Statements</H2>
+
+<H3><A NAME="extendPcb">extend pcb</A></H3>
+
+<P>
+The <B>extend pcb</B> statement is an attribute statement that allows you to
+add declarations of your own to the parser control block. With this
+feature, data needed by reduction procedures can be stored in the
+parser control block rather than in global or static storage. This
+capability greatly facilitates the construction of thread safe
+parsers.
+</P>
+
+<P>
+The <B>extend pcb</B> statement may be used in any configuration section.
+The format is as follows:
+<PRE>
+ extend pcb { <C or C++ declaration>... }
+</PRE>
+</P>
+
+<P>
+It may, of course, extend over multiple lines and may contain any number
+of C or C++ declarations of any kind. AnaGram will append it to the end of
+the parser control block definition in the generated parser header file.
+There may be any number of <B>extend pcb</B> statements. The extensions are
+appended to the parser control block definition in the order in which they
+occur in the syntax file.
+</P>
+
+<P>
+The <B>extend pcb</B> statement is compatible with both C and C++ parsers.
+Note that even if you are deriving your own class from the parser
+control block, you might want to use <B>extend pcb</B> to provide virtual
+function definitions or other declarations appropriate to a base class.
+</P>
+
+<H3><A NAME="wrapper">wrapper</A></H3>
+
+<P>
+The <B>wrapper</B> attribute statement provides correct handling of C++
+objects returned inline by reduction procedures.
+</P>
+
+<P>
+If you specify a wrapper for a C++ object, when a reduction
+procedure returns an instance of the object, a copy of the object will
+be constructed on the parser value stack and the destructor will be
+called when that object is removed from the stack.
+</P>
+
+<P>
+Without a wrapper, objects are stored on the value stack simply by
+coercing the stack pointer to the appropriate type. There is no
+constructor call when the object is stored nor a destructor call when
+it is removed from the stack.
+</P>
+
+<P>
+Classes which use reference counts or otherwise overload the
+assignment operator should always have wrappers in order to
+function correctly.
+</P>
+
+<P>
+Wrapper statements, like other attribute statements, must appear in
+configuration sections. The syntax is:
+<PRE>
+ wrapper {<comma delimited list of data types>}
+</PRE>
+For example:
+<PRE>
+ [
+ wrapper {CString, CFont}
+ ]
+</PRE>
+</P>
+
+<P>
+You cannot specify a wrapper for the <B>default token type</B>.
+</P>
+
+<P>
+If your parser uses AnaGram wrappers and exits with an error condition, there
+may be objects remaining on the parser value stack. If you have no
+further use for
+these objects, you should call the <TT>DELETE_WRAPPERS</TT> macro on error exit
+so that they will be properly deleted, thus avoiding a memory leak. If you
+have enabled <B>auto resynch</B>, <TT>DELETE_WRAPPERS</TT> will be
+invoked automatically.
+</P>
+<BR>
+
+<H2>Changed Configuration Parameters</H2>
+
+<H3>Parser stack alignment</H3>
+
+<P>
+<B>Parser stack alignment</B> now defaults to <TT>long</TT> instead
+of <TT>int</TT>. With
+this default, AnaGram parsers will compile and run on 64-bit
+processors with no further attention. Users who are building parsers
+for embedded systems or other uses where memory is limited may
+want to override this default value with their own specification.
+</P>
+
+<H3>Parser stack size</H3>
+
+<P>
+<B>Parser stack size</B> now defaults to 128 instead of 32. AnaGram
+adjusts the parser stack size upwards, if necessary, depending on the
+grammar. If your grammar uses only left recursive constructs, you
+will never have a problem with parser stack overflow. If there is
+center recursion or right recursion in your grammar, however, there
+always exists syntactically correct input which can cause stack
+overflow no matter how large the stack. Be sure that the parser stack
+size is ample enough to handle all reasonable cases.
+</P>
+
+<H3>Token names</H3>
+
+<P>
+<B>Token names</B> defaults to OFF. If it is set, AnaGram generates a
+static array of character strings, indexed by token number, to provide
+ASCII representations of token names for use in error diagnostics.
+</P>
+
+<P>
+The array contains strings for all grammar tokens which have been
+explicitly named in the syntax file as well as tokens which represent
+keywords or single character constants.
+</P>
+
+<P>
+Prior to version 2.01 of AnaGram, the array contained strings
+for explicitly named tokens only. If this restriction is required, set the
+<B>token names only</B> switch.
+</P>
+
+<H2>New Configuration Parameters</H2>
+
+<H3>iso latin 1</H3>
+
+<P>
+The <B>iso latin 1</B> configuration switch defaults to ON. It controls case
+conversion on input characters when the <B>case sensitive</B> switch is set
+to OFF. When <B>iso latin 1</B> is set, the default <TT>CONVERT_CASE</TT> macro
+is defined to correctly convert all characters in the latin 1 character
+set.
+</P>
+
+<P>
+When the <B>iso latin 1</B> switch is OFF, only characters in the ASCII range
+(0-127) are converted.
+</P>
+
+<H3><A NAME="reentrantParser">reentrant parser</A></H3>
+
+<P>
+The <B>reentrant parser</B> configuration switch defaults to OFF. If you
+turn it on, AnaGram will generate code that passes the parser control
+block to functions via calling sequences so they do not have to use a
+static reference to find the control block.
+</P>
+
+<P>
+AnaGram passes the parser control block using the macro
+<TT>PCB_TYPE</TT>. For example,
+<PRE>
+ static void ag_ra(PCB_TYPE *pcb_pointer)
+</PRE>
+AnaGram will define <TT>PCB_TYPE</TT> as the type of the parser
+control block if you
+do not define it otherwise. If you are using C++, and derive a class from the
+parser control block, you can override the definition of
+<TT>PCB_TYPE</TT> in order to
+make your derived class accessible from your reduction procedures.
+</P>
+
+<P>
+The <B>reentrant parser</B> switch cannot be used in conjunction with the
+<B>old style</B> switch.
+</P>
+
+<P>
+When you have enabled the reentrant parser switch, the parse
+function, the initializer function, and the parser value function are all
+defined to take a pointer to the parser control block as their sole
+argument.
+</P>
+
+<H3>token names only</H3>
+
+<P>
+<B>Token names only</B> defaults to OFF. This configuration
+switch was added to AnaGram 2.01 to provide the functionality previously
+provided by the <B>token names</B> switch. When <B>token names
+only</B> is ON, only tokens which have been given explicit names in the
+syntax file have non-empty strings in the generated list of character strings.
+<B>Token names only</B> takes precedence over the <B>token names</B> switch.
+</P>
+
+<H3>no cr</H3>
+
+<P>
+The <B>no cr</B> configuration switch is provided for developers
+who intend to use the generated parser on a Unix system. When
+<B>no cr</B> is set, it causes AnaGram's
+output parser and header files to be written without carriage
+returns. The switch defaults to OFF, to maintain compatibility with
+Windows systems.
+</P>
+
+</BODY>
+</HTML>