view doc/mansupp/mansupp-201.html @ 7:57b2cc9b87f7
Use memcpy instead of strncpy when we know the length anyway.
Modern gcc seems to think it knows how to detect misuse of strncpy,
but it's wrong (in fact: very, very wrong) and the path of least
resistance is to not try to fight with it.
author | David A. Holland |
---|---|
date | Mon, 30 May 2022 23:47:52 -0400 |
<HTML> <HEAD> <TITLE>AnaGram 2.01 Manual Supplement</TITLE> </HEAD> <BODY TEXT="#000000" LINK="#0000ff" VLINK="#551a8b" ALINK="#ff0000" BGCOLOR="#ffffff"> <H1 ALIGN="CENTER">AnaGram 2.01</H1> <H1 ALIGN="CENTER">Supplement to User's Guide</H1> <P> <BR> <H2>Thread Safe Parsers</H2> <P> AnaGram 2.01 incorporates several changes designed to make it easier to write thread safe parsers. </P> <P> First, the new <B><A HREF="#reentrantParser">reentrant parser</A></B> switch makes the AnaGram parse engine reentrant by passing the parser control block as an argument to all function calls. Without it, the parser control block becomes a global resource, so that only one parse context can be in use at one time. </P> <P> Second, the <B><A HREF="#extendPcb">extend pcb</A></B> statement allows you to add your own declarations to the parser control block, so that you can avoid references to global or static variables in your reduction procedures. </P> <P> Finally, the parsers generated by AnaGram 2.01 no longer use any static or global variables to store temporary data. All working storage is now kept on the stack or in the parser control block. </P> <P> These are the steps to make a parser thread safe: <UL> <LI>Set the <B>reentrant parser</B> switch in your syntax file.</LI> <LI>Add one or more <B>extend pcb</B> statements to your syntax file and include declarations for all the variables needed by your reduction procedures. 
Update your reduction procedures accordingly.</LI> <LI>If your parser will modify any variable which is not in the parser control block, make sure that variable is protected by a mutex, or otherwise synchronized properly.</LI> <LI>To run the parser, declare an instance of the parser control block <EM>on the stack</EM>, initialize your fields in the parser control block as appropriate, lock any relevant mutexes, and then call the parser function with a pointer to the parser control block as the argument.</LI> </UL> <BR> <H2>Added C++ Support</H2> <P> In previous versions of AnaGram it has not been possible to return class instances (rather than pointers to them) from reduction procedures except under limited circumstances. This is because AnaGram generates code that stores objects on the parser value stack simply by casting the stack pointer and assigning the value. This approach is correct for all traditional data types, but leads to unpredictable behavior for a class that has supplied its own assignment operator. Overloaded assignment operators depend on the destination being a valid instance of the class. With the traditional AnaGram parser value stack, however, this is not normally the case. </P> <P> Since there are many classes, such as string classes, which require their own implementation of the assignment operator, the restriction on returning class instances has often made reduction procedures unnecessarily complex. </P> <P> AnaGram 2.01 now has a <B><A HREF="#wrapper">wrapper</A></B> statement which can be used to overcome this problem. For each class specified in a <B>wrapper</B> statement, AnaGram generates a wrapper class that transparently solves the problem. The stacked object is created using the copy constructor. The reduction procedure is called with a reference to the stacked object rather than a copy. Wrapped objects are removed <EM>after</EM> the reduction procedure that uses them returns. 
</P> <BR> <H2>Error Diagnostic Support</H2> <P> The error diagnostics created by the <STRONG>diagnose errors</STRONG> switch have been revised so that their text is defined by macros which the user can replace. There are three macros involved: </P> <UL> <LI><TT>MISSING_FORMAT</TT>. The default definition of this macro is <CODE>"Missing %s"</CODE>. It is used when the parser expects a unique input token, the name of the token exists in the <B>token names</B> table, and the token is not found in the input.</LI> <LI><TT>UNEXPECTED_FORMAT</TT>. The default definition of this macro is <CODE>"Unexpected %s"</CODE>. It is used when there is more than one possible input token, but the token found is not one of those expected.</LI> <LI><TT>UNNAMED_TOKEN</TT>. The default definition is <TT>"input"</TT>. It is used in place of a token name in <TT>UNEXPECTED_FORMAT</TT> when the actual input encountered cannot be identified as a token.</LI> </UL> <P> Note that if <B>diagnose errors</B> is ON, AnaGram automatically includes in your generated parser the array of strings specified by the <TT>TOKEN_NAMES</TT> macro, which is useful in creating diagnostics. The default name of this array is <PRE> &lt;parser name&gt;_token_names </PRE> </P> <BR> <H2>New Attribute Statements</H2> <H3><A NAME="extendPcb">extend pcb</A></H3> <P> The <B>extend pcb</B> statement is an attribute statement that allows you to add declarations of your own to the parser control block. With this feature, data needed by reduction procedures can be stored in the parser control block rather than in global or static storage. This capability greatly facilitates the construction of thread safe parsers. </P> <P> The <B>extend pcb</B> statement may be used in any configuration section. The format is as follows: <PRE> extend pcb { &lt;C or C++ declaration&gt;... } </PRE> </P> <P> It may, of course, extend over multiple lines and may contain any number of C or C++ declarations of any kind. 
AnaGram will append it to the end of the parser control block definition in the generated parser header file. There may be any number of <B>extend pcb</B> statements. The extensions are appended to the parser control block definition in the order in which they occur in the syntax file. </P> <P> The <B>extend pcb</B> statement is compatible with both C and C++ parsers. Note that even if you are deriving your own class from the parser control block, you might want to use <B>extend pcb</B> to provide virtual function definitions or other declarations appropriate to a base class. </P> <H3><A NAME="wrapper">wrapper</A></H3> <P> The <B>wrapper</B> attribute statement provides correct handling of C++ objects returned inline by reduction procedures. </P> <P> If you specify a wrapper for a C++ object, when a reduction procedure returns an instance of the object, a copy of the object will be constructed on the parser value stack and the destructor will be called when that object is removed from the stack. </P> <P> Without a wrapper, objects are stored on the value stack simply by coercing the stack pointer to the appropriate type. There is no constructor call when the object is stored nor a destructor call when it is removed from the stack. </P> <P> Classes which use reference counts or otherwise overload the assignment operator should always have wrappers in order to function correctly. </P> <P> Wrapper statements, like other attribute statements, must appear in configuration sections. The syntax is: <PRE> wrapper {&lt;comma delimited list of data types&gt;} </PRE> For example: <PRE> [ wrapper {CString, CFont} ] </PRE> </P> <P> You cannot specify a wrapper for the <B>default token type</B>. </P> <P> If your parser uses AnaGram wrappers and exits with an error condition, there may be objects remaining on the parser value stack. 
If you have no further use for these objects, you should call the <TT>DELETE_WRAPPERS</TT> macro on error exit so that they will be properly deleted, thus avoiding a memory leak. If you have enabled <B>auto resynch</B>, <TT>DELETE_WRAPPERS</TT> will be invoked automatically. </P> <BR> <H2>Changed Configuration Parameters</H2> <H3>Parser stack alignment</H3> <P> <B>Parser stack alignment</B> now defaults to <TT>long</TT> instead of <TT>int</TT>. With this default, AnaGram parsers will compile and run on 64-bit processors with no further attention. Users who are building parsers for embedded systems or other uses where memory is limited may want to override this default value with their own specification. </P> <H3>Parser stack size</H3> <P> <B>Parser stack size</B> now defaults to 128 instead of 32. AnaGram adjusts the parser stack size upwards, if necessary, depending on the grammar. If your grammar uses only left recursive constructs, you will never have a problem with parser stack overflow. If there is center recursion or right recursion in your grammar, however, there always exists syntactically correct input which can cause stack overflow no matter how large the stack. Be sure that the parser stack size is large enough to handle all reasonable cases. </P> <H3>Token names</H3> <P> <B>Token names</B> defaults to OFF. If it is set, AnaGram generates a static array of character strings, indexed by token number, to provide ASCII representations of token names for use in error diagnostics. </P> <P> The array contains strings for all grammar tokens which have been explicitly named in the syntax file as well as tokens which represent keywords or single character constants. </P> <P> Prior to version 2.01 of AnaGram, the array contained strings for explicitly named tokens only. If this restriction is required, set the <B>token names only</B> switch. 
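</P> <P> As a minimal sketch, a configuration section requesting the pre-2.01 contents of the array might look like the following. It assumes the usual AnaGram convention that naming a switch inside a configuration section turns it ON, and the one-switch-per-line layout is likewise an assumption rather than something this supplement specifies: <PRE>
[
  token names
  token names only
]
</PRE> Both switches are named here only to be explicit; the square brackets follow the form of the <B>wrapper</B> example above.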
</P> <H2>New Configuration Parameters</H2> <H3>iso latin 1</H3> <P> The <B>iso latin 1</B> configuration switch defaults to ON. It controls case conversion on input characters when the <B>case sensitive</B> switch is set to OFF. When <B>iso latin 1</B> is set, the default <TT>CONVERT_CASE</TT> macro is defined to correctly convert all characters in the latin 1 character set. </P> <P> When the <B>iso latin 1</B> switch is OFF, only characters in the ASCII range (0-127) are converted. </P> <H3><A NAME="reentrantParser">reentrant parser</A></H3> <P> The <B>reentrant parser</B> configuration switch defaults to OFF. If you turn it on, AnaGram will generate code that passes the parser control block to functions via calling sequences so they do not have to use a static reference to find the control block. </P> <P> AnaGram passes the parser control block using the macro <TT>PCB_TYPE</TT>. For example, <PRE> static void ag_ra(PCB_TYPE *pcb_pointer) </PRE> AnaGram will define <TT>PCB_TYPE</TT> as the type of the parser control block if you do not define it otherwise. If you are using C++, and derive a class from the parser control block, you can override the definition of <TT>PCB_TYPE</TT> in order to make your derived class accessible from your reduction procedures. </P> <P> The <B>reentrant parser</B> switch cannot be used in conjunction with the <B>old style</B> switch. </P> <P> When you have enabled the reentrant parser switch, the parse function, the initializer function, and the parser value function are all defined to take a pointer to the parser control block as their sole argument. </P> <H3>token names only</H3> <P> <B>Token names only</B> defaults to OFF. This configuration switch was added to AnaGram 2.01 to provide the functionality previously provided by the <B>token names</B> switch. When <B>token names only</B> is ON, only tokens which have been given explicit names in the syntax file have non-empty strings in the generated list of character strings. 
<B>Token names only</B> takes precedence over the <B>token names</B> switch. </P> <H3>no cr</H3> <P> The <B>no cr</B> configuration switch is provided for developers who intend to use the generated parser on a Unix system. When <B>no cr</B> is set, it causes AnaGram's output parser and header files to be written without carriage returns. The switch defaults to OFF, to maintain compatibility with Windows systems. </P> </BODY> </HTML>