diff anagram/guisupport/helpdata.src @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author: David A. Holland
date: Sat, 22 Dec 2007 17:52:45 -0500
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/anagram/guisupport/helpdata.src Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,7195 @@
+Accept Action
+
+The accept action is one of the four actions of a
+traditional ©parsing engineª. The accept action is
+performed when the ©parserª has succeeded in identifying
+the goal, or ©grammar tokenª, for the ©grammarª. When
+the parser executes the accept action, it sets the ©exit_flagª
+field in the ©parser control blockª to AG_SUCCESS_CODE and returns
+to the calling program. The accept action is thus the last action of
+the parsing engine and occurs only once for each successful execution
+of the parser.
+
+If the grammar token has a non-void value, you may
+obtain its value by calling the ©parser value functionª
+whose name is given by <parser name>_value, that is,
+by appending "_value" to the ©parser nameª.
+##
+
+Parser Value Function, Return Value
+
+The value assigned to the ©grammar tokenª in your parser
+may be retrieved by calling the parser value function after
+the parser has finished. The name of this function is given
+by <©parser nameª>_value. The return type of the function
+is the type assigned to the grammar token.
+
+If you have set the ©reentrant parserª switch, the parser
+value function takes a pointer to the ©parser control blockª
+as its sole argument. Otherwise, it takes no arguments. The
+value function is not defined if the grammar token has type "void".
+##
+
+AG_PLACEMENT_DELETE_REQUIRED
+
+When the ©wrapperª option is specified, the wrapper
+template class that AnaGram defines uses a "placement
+new" operator to construct the wrapper object on the
+©parser value stackª. The MSVC++ 6.0 compiler requires,
+in this situation, that a corresponding "placement
+delete" operator be defined. Some other C++ compilers,
+notably MSVC++ 5.0, generate an error message if
+they encounter the definition of a "placement delete"
+operator.
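A minimal C++ sketch of the placement new/placement delete pairing described above. The Wrapper template and the construct_in helper are hypothetical stand-ins, not the class AnaGram generates. The class-specific placement operator new builds the object in storage the caller already owns, such as a slot on a value stack. The matching placement operator delete allocates and frees nothing; it exists because the C++ runtime calls it if the constructor throws during that placement-new expression.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical wrapper template, for illustration only; it is not the
// class AnaGram generates.
template <class T>
struct Wrapper {
    T value;
    explicit Wrapper(const T& v) : value(v) {}

    // Class-specific placement new: the object is built in storage the
    // caller supplies (for example, a value-stack slot); nothing is allocated.
    static void* operator new(std::size_t, void* where) { return where; }

    // Matching placement delete: called only if the constructor throws
    // during a placement-new expression. There is no memory to release.
    static void operator delete(void*, void*) {}
};

// Construct a Wrapper<T> in a caller-provided, suitably aligned slot.
template <class T>
Wrapper<T>* construct_in(void* slot, const T& v) {
    return new (slot) Wrapper<T>(v);
}
```

Because such an object is constructed in place, it is destroyed with an explicit destructor call rather than with delete, and the storage itself is reused by the owner of the stack.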
+
+Accordingly, AG_PLACEMENT_DELETE_REQUIRED is used to determine
+whether a "placement delete" operator should be defined.
+
+AG_PLACEMENT_DELETE_REQUIRED is defined to be 1 if you are using MSVC++
+6.0 or greater, 0 otherwise. You can override the automatic definition of
+AG_PLACEMENT_DELETE_REQUIRED by defining it in the ©C prologueª section
+of your grammar. Set it to a non-zero value to force the "placement
+delete" definition, or to zero to skip the definition.
+
+##
+
+ag_tcv
+
+ag_tcv is an array that AnaGram includes in your ©parserª.
+Your parser uses ag_tcv to translate external codes to
+the internal token numbers that AnaGram uses. It uses
+the actual input code to index the ag_tcv array to
+fetch a ©token numberª. The token number is then used
+to identify the input token.
+##
+
+Allow macros
+
+"Allow macros" is a ©configuration switchª which
+defaults to on. When it is set, i.e., on, ©reduction
+procedureªs will be implemented as macros if they are
+sufficiently simple. This makes your ©parserª somewhat
+more compact but makes it somewhat more difficult to
+debug. It's a good idea to turn this switch off for
+debugging.
+##
+
+Analyze Grammar
+
+The Analyze Grammar command will scan and
+analyze your ©syntax fileª and create a number of
+tables summarizing your grammar.
+
+Analyze Grammar does not create any ©output filesª.
+To create a ©parserª, use the ©Build Parserª command.
+You would probably use Analyze Grammar, rather than Build Parser, during
+initial development of your ©grammarª.
+
+You can use ©File Traceª and ©Grammar Traceª as soon as you have
+analyzed your grammar. It is not necessary to build a parser first.
+##
+
+Attribute Statement
+
+Attribute statements are used in ©configuration
+sectionsª of your ©syntax fileª to specify certain
+properties for ©tokenªs, ©character setªs, or other
+units of your grammar.
+The attribute statements
+available are:
+  ©disregardª
+  ©distinguish keywordsª
+  ©enumª
+  ©extend pcbª
+  ©hiddenª
+  ©leftª
+  ©lexemeª
+  ©nonassocª
+  ©rename macroª
+  ©reserve keywordsª
+  ©rightª
+  ©stickyª
+  ©subgrammarª
+  ©wrapperª
+##
+
+Auto init
+
+Auto init is a ©configuration switchª which defaults to
+on. It controls the initialization of any ©parserª that
+is not ©event drivenª. When it is set to on, your
+parser is automatically initialized every time it is
+called. This is the setting you will normally use. On
+occasion, however, it is desirable to call a parser
+several times without reinitializing it. In this case,
+you may set the auto init parameter to off and then
+call the ©initializerª yourself whenever it is
+appropriate.
+##
+
+Auto resynch
+
+"Auto resynch" is a ©configuration switchª which
+defaults to off. You may use it to specify ©automatic
+resynchronizationª as an ©error recoveryª mechanism.
+
+Setting the "auto resynch" switch causes AnaGram to
+include an automatic ©resynchronizationª procedure in
+your ©parserª. The resynchronization procedure will be
+invoked when your parser encounters a ©syntax errorª
+and will skip over input until it finds input
+characters or ©tokensª consistent with its state at the
+time of the error.
+
+An alternative technique, ©error token resynchronizationª,
+uses an ©error tokenª which you include in your grammar.
+##
+
+Automatic Resynchronization
+
+Automatic ©resynchronizationª is one of several ©error
+recoveryª options available as part of parsers built by
+AnaGram. You enable automatic resynchronization by
+setting the ©auto resynchª ©configuration switchª. If
+your parser includes automatic resynchronization, it will
+incorporate a heuristic procedure which will skip over
+input tokens until it finds a token which makes sense
+with respect to one or another of the ©productionªs
+active at the time of the ©syntax errorª.
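The skipping step can be pictured with a deliberately simplified C++ sketch. It assumes that the set of tokens which would make sense at the point of the error has already been computed (the acceptable parameter) and that token numbers are plain ints. AnaGram's actual heuristic, which considers every production active at the time of the error, is not reproduced here, and resynchronize is a hypothetical name. The sketch also stops at the eof token, so the end of the input can never be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical recovery helper. Starting at index pos, discard input tokens
// until one is acceptable in the current state or is the eof token.
// Returns the index at which parsing would resume, or input.size() if the
// input runs out first.
std::size_t resynchronize(const std::vector<int>& input, std::size_t pos,
                          const std::set<int>& acceptable, int eof_token) {
    while (pos < input.size()) {
        int tok = input[pos];
        if (tok == eof_token || acceptable.count(tok) != 0) {
            return pos;  // resume parsing here
        }
        ++pos;  // skip a token that cannot follow the current state
    }
    return pos;
}
```

Note that nothing in this loop runs reduction procedures for the skipped tokens, which matches the trade-off the following paragraphs describe.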
+
+The purpose of the resynchronization procedure is to
+provide a simple way for your parser to proceed in the
+event of syntax errors so that it can find more than one
+syntax error on a given pass. The resynchronization
+procedure uses a heuristic based on your own syntax.
+AnaGram itself uses this technique to resynchronize
+after syntax errors in its input.
+
+A disadvantage of using this resynchronization technique
+is that the resynchronization procedure turns off all
+©reduction procedureªs. Because of the error, a number
+of reduction procedures, which normally would be
+executed, will be skipped. The parameters for any
+reduction procedures that might be called later would be
+suspect and could cause serious problems. It seems more
+prudent simply to shut them down.
+
+If you use the automatic resynchronization procedure,
+you must also specify an ©eof tokenª so that the
+synchronizer doesn't inadvertently skip over the end of
+file.
+
+An alternative technique for resynchronization is called
+©error token resynchronizationª.
+##
+
+Auxiliary Trace
+
+An Auxiliary Trace is a pre-built grammar trace which
+you may select from the ©Auxiliary Windowsª popup menu for
+most windows which display parser state information.
+The Auxiliary Trace provides a path to the state
+specified in the highlighted line of the primary window.
+
+When obtained from the Parser Stack pane of the ©File Traceª or
+©Grammar Traceª, the Auxiliary Trace is simply a copy of the current
+status of that trace, so you can explore your alternatives while still
+retaining the status of the original trace for reference.
+##
+
+Auxiliary Windows
+
+From most AnaGram windows you can pop up an Auxiliary Windows
+menu by clicking the right mouse button or by pressing Shift F10.
+Auxiliary Windows may have Auxiliary Windows of their own.
+
+  Windows with a cursor bar (highlighted line):
+The windows available in the Auxiliary Windows menu depend on the
+grammar elements identified by the cursor bar in the parent window. If
+the cursor bar identifies a ©parser stateª, there will be windows that
+describe the state. If the cursor bar identifies a ©grammar ruleª,
+there will be windows that describe the rule. If the cursor bar
+identifies a ©tokenª, there will be windows that describe the token. In
+the case of a ©marked ruleª, token windows will describe the marked
+token, if any. In some cases, specialized pre-built grammar traces
+such as the ©Conflict Traceª or ©Auxiliary Traceª are on the menu.
+
+  Help windows:
+For Help windows, the Auxiliary Windows menu will show all the
+available links to other ©Help topicsª from this window. ©Using Helpª
+is always available.
+##
+
+Backtrack
+
+If your ©parserª does not continue after encountering a
+©syntax errorª, you can speed it up and make it a
+little smaller by turning off the backtrack
+©configuration switchª. If backtrack is on, AnaGram
+configures your parser so that in case of syntax error
+it can undo any ©default reductionsª it might have made
+as a consequence of the erroneous input. The purpose of
+such an undo function is to identify the proper ©error
+frameª and to maximize the probability of being able to
+recover gracefully.
+##
+
+Empty Recursion
+
+This warning message tells you that the recursive step of the
+specified ©recursive ruleª can be completely matched by ©zero
+lengthª tokens, i.e., by nothing at all.
+The result is potentially an infinite loop in the generated ©parserª.
+The specified rule is an expansion rule of the specified token.
+
+Because of the possibility of encountering an infinite loop while parsing,
+AnaGram turns off its ©keyword anomalyª analysis if empty recursion is
+found. The ©File Traceª function is also disabled for the same reason.
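One way to see how a recursive step can be "completely matched by nothing at all" is to compute which tokens can match zero-length input, then look for a rule in which the token being defined reappears while every other element is one of those tokens. The C++ sketch below does this for direct recursion only. The Rule representation (string token names, an empty right-hand side for a null production) and both function names are assumptions made for illustration, not AnaGram's internal data structures. A rule consisting of nothing but the token itself passes the same test.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Illustrative grammar representation: lhs -> rhs[0] rhs[1] ...
// An empty rhs is a null production.
struct Rule {
    std::string lhs;
    std::vector<std::string> rhs;
};

// Fixed-point computation of the tokens that can match zero-length input.
std::set<std::string> nullable_tokens(const std::vector<Rule>& rules) {
    std::set<std::string> nullable;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Rule& r : rules) {
            if (nullable.count(r.lhs)) continue;
            bool all_nullable = true;
            for (const std::string& s : r.rhs) {
                if (!nullable.count(s)) { all_nullable = false; break; }
            }
            if (all_nullable) { nullable.insert(r.lhs); changed = true; }
        }
    }
    return nullable;
}

// Indices of rules "A -> ... A ..." whose other elements are all nullable:
// the recursive step can be matched without consuming any input.
std::vector<std::size_t> empty_recursions(const std::vector<Rule>& rules) {
    const std::set<std::string> nullable = nullable_tokens(rules);
    std::vector<std::size_t> found;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        const Rule& r = rules[i];
        bool recursive = false, rest_nullable = true;
        for (const std::string& s : r.rhs) {
            if (s == r.lhs && !recursive) { recursive = true; continue; }
            if (!nullable.count(s)) rest_nullable = false;
        }
        if (recursive && rest_nullable) found.push_back(i);
    }
    return found;
}
```

For instance, given list -> list spaces together with a null production for spaces (informal notation), the first rule is reported; spaces -> spaces ' ' is not, because each repetition still has to read a blank.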
+
+The ©circular definitionª of a token has the same effect as an
+empty recursion, in that no additional input is required to match
+the recursive rule.
+
+##
+
+Keyword Anomaly analysis aborted: empty recursion
+
+The ©keyword anomalyª analysis has been turned off, since the presence of
+©recursive ruleªs with ©empty recursionª can cause infinite loops in the analysis.
+
+##
+
+Keyword Anomaly analysis aborted: circular definition
+
+The ©keyword anomalyª analysis has been turned off, since the presence of
+a ©circular definitionª can cause infinite loops in the analysis.
+
+##
+
+File Trace disabled: empty recursion
+
+Because of the presence of ©recursive ruleªs with ©empty recursionª in this grammar and
+the infinite loops that can ensue, the ©File Traceª function has been
+disabled.
+
+##
+
+File Trace disabled: circular definition
+
+Because of the presence of a ©circular definitionª in this grammar and
+the infinite loops that can ensue, the ©File Traceª function has been
+disabled.
+
+##
+
+Both Error Token Resynch and Auto Resynch Specified
+
+This ©warningª message indicates that your ©grammarª
+defines an ©error tokenª and also requests ©automatic
+resynchronizationª. AnaGram will ignore the request
+for automatic resynchronization and will provide ©error
+token resynchronizationª. If you named a token "error"
+but do not want ©error token resynchronizationª, you can
+either rename "error" or, in a ©configuration
+sectionª, explicitly specify the error token to
+be something you don't otherwise use in your grammar:
+  [ error token = not used ]
+##
+
+Bottom Margin
+
+"Bottom margin" is an ©obsolete configuration parameterª.
+##
+
+Bright Background
+
+"Bright background" is a ©configuration switchª which
+was used in the DOS version of AnaGram. It is no longer
+used, but is still recognized for the sake of backward
+compatibility with old ©configuration fileªs.
+##
+
+Build Parser
+
+You use the Build Parser command to create a ©parserª based on your
+©grammarª. The parser is a C file consisting of the ©embedded Cª (which
+may include C++) code in your ©syntax fileª, your ©reduction
+procedureªs, a number of tables derived from your grammar
+specification, and a ©parsing engineª customized to your requirements.
+
+If you only wish to investigate your grammar and do not
+wish to create ©output filesª, use the ©Analyze
+Grammarª command.
+##
+
+Build <file name>
+
+This item on the ©Action Menuª is available when you have analyzed a
+©grammarª but you have not yet built it. It builds the grammar
+without reloading the ©syntax fileª from the disk.
+##
+
+Cannot Make Wrapper for Default Token Type
+
+This ©warningª message occurs when AnaGram finds that a type listed
+in a ©wrapperª statement has previously been defined as the
+©default token typeª. If a wrapper is needed for a particular type,
+you must specify the ©data typeª explicitly for each relevant ©tokenª.
+
+As a result, a wrapper class has not been created for the specified token type.
+##
+
+Token with Wrapper cannot be Default Token Type
+
+This ©warningª message indicates that an attempt has been made
+to specify, as the ©default token typeª, a class that has previously
+been listed in a ©wrapperª statement.
+If a wrapper is needed for a particular type, you must specify the
+©data typeª explicitly for each relevant ©tokenª.
+
+As a result, the default token type has not been set.
+##
+
+Case Sensitive
+
+"Case sensitive" is a ©configuration switchª which
+defaults to on. When it is on, it instructs AnaGram to
+build a parser for which all input is case sensitive.
+When it is off, AnaGram builds a parser which
+ignores case for all input.
+
+If the ©iso latin 1ª configuration switch is turned
+off, case conversion will be limited to characters
+in the normal ascii range. When it is on, case
+conversion will be done for all iso latin 1 characters.
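The difference between the two settings of the iso latin 1 switch can be sketched as a case-folding function. This is a hypothetical helper written for illustration, not the CONVERT_CASE macro AnaGram generates. It folds to upper case, which is an assumption about direction. It leaves ß (0xDF) and ÿ (0xFF) unchanged because neither has an upper-case counterpart inside Latin-1, and it skips 0xF7, the division sign.

```cpp
#include <cassert>

// Hypothetical case-folding helper (not AnaGram's CONVERT_CASE).
// Folds lower-case letters to upper case. With iso_latin_1 off, only
// ascii a-z are affected; with it on, the Latin-1 lower-case letters
// 0xE0-0xFE (except 0xF7, the division sign) are folded as well.
// Assumes an ascii-compatible execution character set.
int fold_case(int c, bool iso_latin_1) {
    if (c >= 'a' && c <= 'z') return c - ('a' - 'A');
    if (iso_latin_1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    return c;  // everything else, including 0xDF and 0xFF, is unchanged
}
```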
+
+If you have other requirements for case conversion,
+you may provide your own definition of the
+©CONVERT_CASEª macro, which is invoked to perform case
+conversion on input characters, in your ©embedded cª.
+
+Note that the value of an input token is unaffected
+by the case sensitive switch. When case sensitive is
+off, 'a' and 'A' will be treated as the same input
+token by the parser, but the ©token valueªs will
+nevertheless be different.
+##
+
+C Prologue
+
+If you include a block of ©embedded Cª code at the very
+beginning of your syntax file, it is called the "C
+prologue". It will be copied to your ©parser fileª
+before any of the code generated by AnaGram. You can
+use the C prologue to ensure that copyright notices,
+#include directives, or type definitions, for example,
+occur at the very beginning of your parser file.
+
+If you specify a C or C++ type of your own definition,
+you must provide its definition in the C prologue.
+##
+
+CHANGE_REDUCTION
+
+CHANGE_REDUCTION(t) is a macro which AnaGram defines in
+your ©parser fileª if your ©parserª uses ©semantically
+determined productionsª. In your ©reduction procedureª,
+when you need to change the ©reduction tokenª, you can
+easily do so by calling CHANGE_REDUCTION with the name
+of the desired token as the argument. If the token name
+has embedded spaces, replace the embedded spaces with
+underscore characters.
+##
+
+Character Constant
+
+You may represent single characters in your ©grammarª by
+using character constants. The rules for character
+constants are the same as in C. The escape sequences
+are as follows:
+  \a    alert (bell) character
+  \b    backspace
+  \f    formfeed
+  \n    newline
+  \r    carriage return
+  \t    horizontal tab
+  \v    vertical tab
+  \\    backslash
+  \?    question mark
+  \'    single quote
+  \"    double quote
+  \ooo  octal number
+  \xhh  hexadecimal number
+
+AnaGram treats a single character as a ©character setª
+which contains only the specified character.
+Therefore you can use a character constant in a ©set expressionª.
+##
+
+Character Map
+
+The Character Map table shows you the mapping of input
+characters to ©token numbersª. The ©ag_tcvª table in
+your parser is based on the information in this table.
+
+The fields in this table are:
+  character code
+  display character, if any
+  ©partition set numberª
+  ©token numberª
+  ©token representationª
+
+The display character will be what Windows displays for the character
+code in the Data Tables font you have chosen.
+##
+
+Character Range
+
+A "character range" is a simple way to specify a
+©character setª. There are two ways to represent a
+character range in an AnaGram ©syntax fileª.
+
+The first way is like a ©character constantª: 'a-z'.
+
+The second way allows somewhat greater freedom:
+  'a'..'z'
+  'a'..255
+  ^Z..037
+  -1..0xff
+Here you use two arbitrary ©character representationsª
+separated by two dots. If the two characters are out of
+order, AnaGram will reverse the order, but will give
+you a ©warningª.
+
+More complex ©character setsª may be specified by using
+©unionª, ©differenceª, ©intersectionª, or ©complementª
+operators.
+##
+
+Character Representation
+
+In an AnaGram ©syntax fileª you may represent a
+character literally with a ©character constantª or
+numerically using decimal, octal or hexadecimal
+representations following the conventions for C. Thus
+'A', 65, 0101, and 0x41 all represent the same
+character. Control characters can be represented using
+the '^' character and either an upper or lower case
+letter. Thus ^j and ^J are acceptable representations
+of the ascii newline code. The rules for character
+constants are identical to those in C, and the same
+escape sequences are recognized.
+##
+
+Character Set
+
+In AnaGram grammars you can conveniently specify whole
+sets of characters at a time. This avoids
+needless repetition and complexity.
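Because every character set lives in the same finite universe, the operations this entry describes map directly onto bit vectors. The C++ sketch below assumes the default universe of codes 0 to 255 and uses std::bitset<256>; the helper names are hypothetical. range() swaps reversed bounds, mirroring the way an out-of-order character range is reordered, and subtract() is written as intersection with the complement.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>

// Character sets over the default universe of codes 0..255 (an assumption;
// a grammar using larger or negative codes would need a wider universe).
using CharSet = std::bitset<256>;

// Set containing one character, as a character constant does.
CharSet single(int c) {
    CharSet s;
    s.set(static_cast<std::size_t>(c));
    return s;
}

// Set for the range lo..hi; reversed bounds are swapped.
CharSet range(int lo, int hi) {
    if (lo > hi) std::swap(lo, hi);
    CharSet s;
    for (int c = lo; c <= hi; ++c) s.set(static_cast<std::size_t>(c));
    return s;
}

// The four operations: union (+), intersection (&), difference (-),
// and complement (~) relative to the universe.
CharSet unite(const CharSet& a, const CharSet& b)     { return a | b; }
CharSet intersect(const CharSet& a, const CharSet& b) { return a & b; }
CharSet subtract(const CharSet& a, const CharSet& b)  { return a & ~b; }
CharSet complement(const CharSet& a)                  { return ~a; }
```

With letter built as unite(range('A', 'Z'), range('a', 'z')) and hex digit built the same way from '0'..'9', 'A'..'F' and 'a'..'f', intersect() yields exactly upper and lower case a to f and subtract() yields g to z. For eof = {0, ^Z}, complement(eof) leaves the other 254 codes.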
+
+Sets of characters may be defined in an AnaGram ©syntax
+fileª in any of a number of ways. A single character is
+taken to represent a character set consisting of a
+single element. (See ©character representationª.) You
+can also specify a set consisting of a range of
+characters (see ©character rangeª) and perform the
+familiar set operations: union, intersection, difference,
+and complement.
+
+All the sets you define in your syntax file are
+summarized in the ©Character Setsª window.
+
+The ©unionª of two character sets, represented by a '+',
+contains all characters that are in either of
+the two sets. Thus, 'A-Z' + 'a-z' represents the set of
+all upper and lower case letters.
+
+The ©intersectionª of two character sets, represented
+by a '&', contains all characters that are in both
+sets. For example, suppose you have the ©definitionsª
+  letter = 'A-Z' + 'a-z'
+  hex digit = '0-9' + 'A-F' + 'a-f'
+Then (letter & hex digit) contains precisely upper and
+lower case a to f.
+
+The ©differenceª of two character sets, represented by
+a '-', contains all characters that are in the first
+set but not in the second set. Thus, using the same
+definitions as above, (letter - hex digit) contains
+precisely upper and lower case g to z.
+
+The ©complementª of a character set, represented by a
+preceding '~', contains all characters in the
+©character universeª which are not in the given set.
+Suppose you have defined a set, ©eofª, which consists of
+the characters which represent end of file. Then, in
+your grammar where you wish to accept an arbitrary
+character, what you really want is anything but an end
+of file character. You can define it thus:
+  anything = ~eof
+##
+
+Character Sets
+
+This window lists all of the distinct ©character setªs
+which you defined, implicitly or explicitly, in your
+©grammarª. Each line in the table describes one such
+set.
+
+The description takes the form of the internal set
+number and the defining ©expressionª.
+The ©Auxiliary
+Windowsª menu will allow you to see the ©Partition
+Setsª which cover the character set, and the ©Set
+Elementsª which it comprises, as well as the ©Token Usageª.
+##
+
+Character Universe, Universe
+
+The character universe, or set of all expected input
+characters to your parser, is defined as all characters
+in the range given by a particular lower bound and a
+particular upper bound, as described below.
+
+The character universe is used for two things in
+AnaGram. The first use is for calculating the
+©complementª of a character set. The second use is in
+the input processing of your parser. Input characters
+will be used to index a ©token conversionª table to
+convert character codes to token numbers. The length of
+this table will be given by the size of the character
+universe. If you have set the ©test rangeª
+©configuration switchª, your parser will verify that the
+input character is within the range of the conversion
+table. Otherwise, the character code will not be
+checked for validity. In this case, an out-of-range
+character will lead to undefined behavior.
+
+If you have not used any characters with negative codes
+in your grammar, the lower bound is zero. Otherwise, it
+is the code of the most negative such character.
+
+If the highest character code you have used is less
+than or equal to 255, the upper bound will be 255.
+
+If you have used a character code greater than 255, the
+upper bound will be the largest such code which appears
+in your syntax file.
+##
+
+Characteristic Rule
+
+Each ©parser stateª is characterized by a particular
+set of ©grammar rulesª, and for each such rule, a
+marked token which is the next ©tokenª expected. The
+combination of a grammar rule and its marked token is often
+called a ©marked ruleª. A marked rule which
+characterizes a state is called a "characteristic
+rule". In the course of doing ©grammar analysisª,
+AnaGram determines the characteristic rules for each
+©parser stateª.
+After analyzing your grammar, you may
+inspect the ©State Definition Tableª to see the
+characteristic rules for any state in your parser.
+##
+
+Characteristic Token
+
+Every state in a ©parserª, except state 0, can be
+characterized by the unique ©tokenª which causes a
+jump to that state. That token is called the
+©characteristic tokenª of the state, because to get to
+that ©parser stateª you must have just seen precisely
+that token in the input. Note that several states could
+have the same characteristic token.
+
+When you have a list of states, such as is given by the
+©parser state stackª, it is equivalent to a list of
+characteristic tokens. This list of tokens is the list
+of tokens that have been recognized so far by the
+parser.
+##
+
+Circular Definition
+
+If the ©expansion ruleªs for a ©tokenª contain a ©grammar ruleª that
+consists only of the token itself, the definition of the
+token is circular. A circular definition is an extreme
+case of ©empty recursionª.
+
+As in cases of empty recursion, the generated parser may contain
+infinite loops. When such a condition is detected, therefore,
+©keyword anomalyª analysis and the ©File Traceª option are disabled.
+
+##
+
+column
+
+"column" is an integer field in your ©parser control
+blockª used for keeping track of the column number of
+the current character in your input. Line and column
+numbers are tracked only if the ©lines and columnsª
+©configuration switchª has been set.
+##
+
+Command Line
+
+If you provide the name of a syntax file on the
+command line when you start AnaGram, it will open
+the file and run either ©Analyze Grammarª or ©Build
+Parserª, depending on the setting of the ©Autobuildª
+switch.
+##
+
+Command Line Version, agcl.exe
+
+The command line version of AnaGram, agcl.exe, can be
+used in make files. It takes the name of a single syntax
+file on the command line. Error and ©warningª messages
+are written to stdout.
+
+Normally you would only use the command line version once you
+have finished developing your ©parserª and are integrating
+it with the rest of your program.
+
+The command line version of AnaGram is not included with
+trial copies.
+##
+
+Comment
+
+You may incorporate comments in your syntax file using
+either of two conventions. The first is the normal C
+convention for comments which begin with "/*" and end
+with "*/". Such comments may be of arbitrary length. By
+setting or resetting the ©nest commentsª switch, you
+may control whether or not they may be nested.
+
+The second convention for comments is the C++ comment
+convention. In this case the comment begins with "//"
+and ends with a newline.
+
+When writing a ©grammarª, you may wish to allow a user
+to comment his input freely without your having to
+explicitly allow for comments in your grammar. You may
+accomplish this by using the ©disregardª statement.
+##
+
+Compile Command
+
+"Compile command" is a ©configuration parameterª which
+takes a string value. This parameter was used in the
+DOS version of AnaGram, but is ignored in the Windows
+version.
+##
+
+Complement
+
+In set theory, the complement of a set, S, is the set
+of all elements of the ©universeª which are not members
+of the set S.
+
+In AnaGram, the complement operator for ©character
+setsª is given by '~' and has higher precedence than
+©differenceª, ©intersectionª, or ©unionª.
+
+In AnaGram, the most useful complement is that of the
+end of file character set. For ordinary ascii files it
+is often convenient to read the entire file into
+memory, append a zero byte to the end, and define the
+end of file set thus:
+  eof = 0 + ^Z
+Then, ~©eofª represents all legitimate input characters.
+
+You can then use set operations to specify certain
+useful sets without tedious enumeration.
+For example, a comment that ends at the end of the line
+consists of characters from the set
+  comment char = ~'\n' & ~eof
+This set could also be written
+  comment char = ~('\n' + eof)
+##
+
+Completed Rule
+
+A "completed rule" is a ©characteristic ruleª which has no ©marked
+tokenª. In other words, it has been completely matched and will be
+reduced by the next input.
+
+If there is more than one completed rule in a state,
+the decision as to which to reduce is made based on the
+next input token. If there is only one completed rule
+in a state, it will be reduced by default unless the
+©default reductionsª switch has been reset, i.e.,
+turned off.
+##
+
+Configuration File
+
+If it can find them, AnaGram reads two configuration
+files to set up ©configuration parameterªs. At program
+initialization, it will first attempt to read a
+configuration file in the directory that contains
+the AnaGram executable file you are running. Then it
+will read a configuration file in your working
+directory. Both files, if present, should be named
+"AnaGram.cfg". Neither is required.
+
+If a parameter is specified in both files, the
+specification in the file from the working directory
+takes precedence.
+
+The effect of this two-stage process is to allow you to
+set your standard preferences in the AnaGram program
+directory, with specific overrides in your working
+directories.
+
+The values for configuration parameters in ©syntax
+filesª override those read from configuration files.
+
+AnaGram does not save configuration parameters in
+the Windows registry, nor does it provide any
+mechanism for setting or changing the values of
+configuration parameters within AnaGram itself.
+##
+
+Configuration Parameter
+
+Configuration parameters may be specified either in
+©configuration filesª or in your ©syntax fileª. In your
+syntax files, configuration parameters are specified,
+one per line, in a ©configuration sectionª.
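The precedence among these sources amounts to applying layers in sequence, with later assignments overriding earlier ones. The C++ sketch below assumes each source has already been parsed into an ordered list of name/value pairs (Layer, a hypothetical type) and folds names to lower case, since configuration parameter names are matched without regard to case. The function names are hypothetical, and the sketch does not model the reset to initial values that precedes each Analyze Grammar or Build Parser command.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One source of settings, in file order: for example the built-in defaults,
// AnaGram.cfg in the program directory, AnaGram.cfg in the working
// directory, or the configuration sections of a syntax file.
using Layer = std::vector<std::pair<std::string, std::string>>;

// Parameter names are compared without regard to case.
std::string normalize_name(const std::string& name) {
    std::string out;
    for (unsigned char ch : name) out += static_cast<char>(std::tolower(ch));
    return out;
}

// Apply layers in order. A later assignment replaces an earlier one, so the
// most recently specified value wins, both across and within layers.
std::map<std::string, std::string> resolve(const std::vector<Layer>& layers) {
    std::map<std::string, std::string> settings;
    for (const Layer& layer : layers)
        for (const auto& entry : layer)
            settings[normalize_name(entry.first)] = entry.second;
    return settings;
}
```

Keeping only one value per normalized name mirrors the single stored value per parameter; the values used in any test of this sketch are made up, not AnaGram's real defaults.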
+
+AnaGram ignores case when identifying a configuration
+parameter, so that "ALLOW MACROS", "Allow Macros", and
+"allow macros" are all equivalent forms.
+
+There may be any number of configuration sections in a
+©syntax fileª. Any parameter may be specified any
+number of times. Since AnaGram maintains only one value
+in storage for these parameters, whenever it refers to
+one it will see the most recently specified value.
+Every configuration parameter has a default value which
+has been chosen to correspond to a standard if one
+exists, to customary usage if such can be determined, or
+otherwise to the most likely usage.
+
+Before executing an Analyze Grammar or Build Parser command, AnaGram
+resets configuration parameters to their initial values, as
+determined by the built-in defaults and the configuration files read
+at program initialization.
+
+The ©Configuration Parameters Windowª shows the current settings of all
+of the configuration parameters. When this window is active you may
+press ©F1ª or click with the ©help cursorª to pop up a help window
+describing the parameter under the cursor bar.
+
+There are several varieties of configuration
+parameters. Some simply set or reset a condition. These
+need only be stated to set the condition, or negated
+with the tilde (~) to reset it. Thus
+  [ nest comments ]
+causes AnaGram to allow nested comments, and
+  [ ~nest comments ]
+causes AnaGram to disallow nested comments.
+
+If you prefer, you may explicitly specify a switch value as on or off:
+  [ nest comments = on ]
+
+A second kind of configuration parameter takes a value
+which is the name of a token. Thus
+  [ grammar token = c grammar ]
+specifies that the token, c grammar, is the ©grammar
+tokenª which is to be analyzed.
+
+A third variety of configuration parameter takes a
+value which is a C data type.
+Thus
+  [ default token type = unsigned char * ]
+signifies that the ©semantic valueª of a token, unless
+otherwise specified, is a pointer to an unsigned char.
+
+A fourth variety of configuration parameter takes a
+string value to set some ascii string used by AnaGram.
+Thus
+  [ header file name = "widget.h" ]
+signifies that the header file created by AnaGram
+should be called "widget.h".
+
+In string-valued parameters used to specify the names
+of output files or the name of your parser, you may use
+the '#' character to indicate the name of your syntax
+file: when the string is actually used, AnaGram will
+substitute the syntax file name for the '#'.
+
+In string-valued parameters used to specify the names
+of functions or variables that AnaGram generates, you
+may use '$' to specify the name of your parser. When
+the string is actually used, AnaGram will substitute
+the name of your parser for the '$'.
+
+In the "©enum constant nameª" configuration parameter
+you may use '%' to specify where a token name is to be
+substituted.
+
+The final variety of configuration parameter takes a
+numeric value. The value may be decimal, octal
+or hexadecimal, following the C conventions, and may
+have an optional sign. Thus
+  [ parser stack size = 50 ]
+tells AnaGram to allocate space for at least fifty stack entries
+when it creates your parser.
+##
+
+Configuration Parameters Window
+
+The Configuration Parameters window lists the
+©configuration parameterªs AnaGram accepts with their
+current values, as set by the ©configuration filesª it
+has read and by the most recent ©syntax fileª it has
+analyzed. Configuration parameters cannot be changed
+from within AnaGram.
+##
+
+Configuration Section
+
+A configuration section is one of the main divisions of
+your ©syntax fileª. It begins with a left square
+bracket on a fresh line. It then contains definitions
+of ©configuration parameterªs, ©configuration switchª
+settings, and ©attribute statementªs.
+These specifications must each start on a new line. The
+configuration section is closed with a right bracket.
+Any further component of your syntax file, other than a
+©commentª, must start on a fresh line.
+
+There can be any number of configuration sections in a
+syntax file.
+##
+
+Configuration Switch
+
+A configuration switch is a ©configuration parameterª
+which can take on only the two values true and false,
+or on and off. You set a configuration switch, or turn
+it on, by simply naming it in your ©configuration fileª
+or in a ©configuration sectionª of your ©syntax fileª.
+You turn it off, or "reset" it, by use of the tilde:
+"~nest comments", for example, resets, or turns off,
+the ©nest commentsª switch. If you prefer, you may
+assign the value "on" to set the switch, or "off" to
+reset it. For example:
+  nest comments = on
+##
+
+Conflict
+
+"Conflicts" arise during the ©grammar analysisª when
+AnaGram cannot determine how to treat a given input
+token. There are two sorts of conflicts: ©shift-reduce
+conflictsª and ©reduce-reduce conflictsª. Conflicts may
+arise either because the grammar is inherently
+ambiguous, or simply because the grammar analyzer
+cannot look far enough ahead to resolve the conflict.
+In the latter case, it is often possible to rewrite the
+grammar in such a way as to eliminate the conflict. In
+particular, ©null productionsª are a common source of
+conflicts.
+
+When AnaGram analyzes your grammar, it lists all
+unresolved conflicts in the ©Conflictsª window. A number
+of ©Auxiliary Windowsª available from the Conflicts window
+provide help in identifying the source of the conflict.
+
+There are a number of ways to deal with conflicts. If
+you understand the conflict well, you may simply choose
+to ignore it. When AnaGram encounters a shift-reduce
+conflict while building parse tables, it resolves it by
+choosing the ©shift actionª.
When AnaGram encounters a +reduce-reduce conflict while building parse tables, it +resolves it by selecting the ©grammar ruleª which +occurred first in the grammar. + +A second way to deal with conflicts is to set ©operator +precedenceª parameters. If you set these parameters, +AnaGram will use them preferentially to resolve +conflicts. Any conflicts so resolved will be listed in +the ©Resolved Conflictsª window. + +A third way to resolve a conflict is to declare some +tokens as ©stickyª. This is particularly useful for +©productionªs whose sole purpose is to skip over +uninteresting input. + +A fourth way to resolve conflicts is to declare a token +to be a ©subgrammarª. When you do this, AnaGram does +not look beyond the definition of the subgrammar token +itself for reducing tokens. This is not a particularly +selective way to resolve conflicts and should be used +only when the subgrammar token is naturally defined +only by internal criteria. The tokens identified by +lexical scanners are prime examples of this genre. + +The fifth way to deal with conflicts is to rewrite the +grammar to eliminate them. Many people prefer this +approach since it yields the highest level of +confidence in the resulting program. + +Please refer to the AnaGram User's Guide for more information about +dealing with conflicts. +## + +Conflicts + +If there are ©conflictªs in your grammar which are not +resolved by ©precedence rulesª, they will be listed in +the Conflicts window. The Conflicts window will also be +listed in the ©Browse Menuª. Conflicts which have been +resolved by ©precedence rulesª are listed in the +©Resolved Conflictsª window. + +The Conflicts window lists the conflicts, or +ambiguities, which AnaGram found in your grammar. The +table identifies the ©parser statesª in which it found +conflicts, the ©conflict tokenªs for which it had more +than one option, and the ©marked rulesª for each such +option. 
If one of the rules for a particular conflict +has a ©marked tokenª, the conflict is +a ©shift-reduce conflictª. The marked token is the token +to be shifted. If none of the rules has a marked token the conflict is +a ©reduce-reduce conflictª. + +AnaGram provides a number of ©Auxiliary Windowsª to help +you find and fix the source of the conflict. The +©Conflict Traceª window is a pre-built ©Grammar Traceª +window which shows you one of perhaps many ways to +encounter the conflict. The ©Reduction Traceª window +shows the result of reducing a particular ambiguous +rule. + +In addition, the ©Rule Derivationª and ©Token +Derivationª windows show you why the conflict token is a +©reducing tokenª. They are particularly useful for +shift-reduce conflicts. + +The ©Expansion Chainª window is helpful for understanding +reduce-reduce conflicts. + +Other Auxiliary Windows which are often useful are the +©State Definitionª window, the ©Reduction Statesª +window, and the ©Problem Statesª window. + +Please refer to the AnaGram User's Guide for more information on how to +deal with conflicts. +## + +Conflicts Resolved by Precedence Rules + +This ©warningª message indicates that AnaGram has +resolved conflicts in your grammar by using ©precedence +rulesª: guidelines you supplied either by explicit +©precedence declarationsª, by using a ©stickyª +statement or ©distinguish lexemesª statement, or +implicitly by using a ©disregardª statement. These +conflicts are listed in the ©Resolved Conflictsª +window, and are not listed in the ©Conflictsª window. +## + +Conflict Token + +In any given ©conflictª, there is a ©tokenª for which +an unambiguous ©parser actionª cannot be determined. +This token is called the "conflict token". +## + +Conflict Trace + +The Conflict Trace is a ready-made ©Grammar Traceª +which shows you one of perhaps many ways to get to the +state which has the ©conflictª selected by the cursor +bar. 
The Conflict Trace window is an option in the +©Auxiliary Windowsª menu for the ©Conflictsª window and +the ©Resolved Conflictsª window. +## + +Const Data + +The const data ©configuration switchª controls the use +of CONST qualifiers in generated code. If the switch is +set, all fixed data arrays in the ©parser fileª will be +qualified as CONST, unless the ©old styleª switch is +set. The default setting is ON. Other configuration +switches which control declaration qualifiers in the +parser file are ©near functionsª and ©far tablesª. +## + +CONTEXT + +"CONTEXT" is a macro which AnaGram defines for you if +you have defined a ©context typeª. It provides access +to the top value of the ©context stackª. Your +©GET_CONTEXTª macro may store the current context by +assigning a value to CONTEXT. Suppose your parser uses +©pointer inputª, and you wish to know the value of the +©pointerª for every production. You could define +GET_CONTEXT thus: + #define GET_CONTEXT CONTEXT = PCB.pointer + + In ©reduction procedureªs, you may use the CONTEXT +macro to find the context for the rule you are +reducing, that is to say, the value the context +variables had when the first token in the rule was +encountered. +## + +Context Stack + +It is often convenient, when writing ©reduction +procedureªs, to know the actual context of the ©grammar +ruleª your procedure is reducing. To do this you need +to know the values that certain variables, such as +stack pointers, or input pointers, in your program had +at various stages as your parser matched the rule. You +can accomplish this by maintaining a context stack. + +If you wish, AnaGram will keep track, on a stack, of any +context variables you wish. To do so, define a structure +which can hold all the values you need to stack. Use the +©context typeª ©configuration parameterª to tell AnaGram +how to declare the stack. Then define the ©GET_CONTEXTª +macro to gather the appropriate values and store them on +the stack. 
The ©CONTEXTª macro evaluates to the proper +location into which the GET_CONTEXT macro should store +the context value. AnaGram will invoke the GET_CONTEXT +macro whenever necessary to make sure the right values +are stacked. In a reduction procedure, you can then use +the macro ©RULE_CONTEXTª to find the value of the +context structure as of the beginning of each token in +the rule you are reducing. + +If your parser is ©event drivenª, store the context of +the input token in PCB.input_context. The default +version of GET_CONTEXT will stack the context as +appropriate. + +If your parser should encounter an error, you may use +©ERROR_CONTEXTª to determine the values of the context +variables at the beginning of the aborted grammar rule. +## + +context type + +"Context type" is a ©configuration parameterª whose +value is a C type name, possibly as defined by a +typedef statement. By default, "context type" is +undefined. If you define it, AnaGram will set up a +©context stackª in your ©parser control blockª so you +can track the context of ©productionªs. + +Each time your parser pushes values onto the state +stack and value stack it will invoke the ©GET_CONTEXTª +macro to store the current context on the context +stack. The macro ©CONTEXTª names the current stack +location. In your GET_CONTEXT macro you can use it as +the destination for the current context. In a +©reduction procedureª, CONTEXT names the context as of +the beginning of the production. Two other macros are +available to inspect the values of the context stack. +In a reduction procedure, you may use ©RULE_CONTEXTª[k] +to determine the value of the context variable as it +was as of the (k+1)th token in the rule. In particular, +RULE_CONTEXT[0] is the value the context variable had +when the first token in the rule was seen. 
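A small, self-contained toy model can illustrate how CONTEXT, GET_CONTEXT and RULE_CONTEXT work together. This is a sketch under stated assumptions, not AnaGram's generated code. The structure toy_pcb, its fields line, column and ssx, the context type location, and the functions toy_push() and toy_begin_reduction() are all hypothetical. The real macro definitions in your parser may differ. The model stores one context entry per pushed token. It models the start of a reduction by moving the stack index back to the first token of the rule, so that RULE_CONTEXT[k] indexes the rule's tokens.

```c
#include <assert.h>

/* Assumed context type: the input position at which a token began. */
typedef struct {
  int line;
  int column;
} location;

#define TOY_STACK_SIZE 32

/* Hypothetical stand-in for the parser control block (not AnaGram's PCB). */
typedef struct {
  int line;                     /* current input line */
  int column;                   /* current input column */
  int ssx;                      /* current stack index */
  location cs[TOY_STACK_SIZE];  /* the context stack */
} toy_pcb;

static toy_pcb PCB;

/* CONTEXT names the context-stack slot at the current stack index. */
#define CONTEXT (PCB.cs[PCB.ssx])

/* RULE_CONTEXT[k]: context of the (k+1)th token of the rule being
   reduced, once ssx has been moved back to the rule's first token. */
#define RULE_CONTEXT (&PCB.cs[PCB.ssx])

/* User-supplied macro: copy the current input position into CONTEXT. */
#define GET_CONTEXT (CONTEXT.line = PCB.line, CONTEXT.column = PCB.column)

/* Toy shift: record the context of a token seen at (line, column). */
static void toy_push(int line, int column) {
  assert(PCB.ssx < TOY_STACK_SIZE);
  PCB.line = line;
  PCB.column = column;
  GET_CONTEXT;   /* store the context before advancing the stack */
  PCB.ssx++;
}

/* Toy reduction: back the stack index up to the rule's first token. */
static void toy_begin_reduction(int rule_length) {
  assert(rule_length <= PCB.ssx);
  PCB.ssx -= rule_length;
}
```

For example, suppose three tokens are pushed at (1,1), (1,5) and (2,3), and then toy_begin_reduction(3) is called. CONTEXT, which is the same entry as RULE_CONTEXT[0], then holds line 1, column 1. RULE_CONTEXT[2] holds line 2, column 3.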
+ +If you enable the ©error frameª ©configuration switchª, +you may use ©ERROR_CONTEXTª to determine the context of +the production your parser was trying to identify at +the time of the error. +## + +CONVERT_CASE + +CONVERT_CASE is a user definable macro which AnaGram +invokes to convert the case of input characters when +the ©case sensitiveª switch has been turned off. If +you do not define the macro yourself, AnaGram will +provide a macro which will convert case correctly +for characters in the ASCII character range and +also for ©ISO latin 1ª characters if the corresponding +©configuration switchª is on. + +## + +Coverage File Name + +If you have set the ©rule coverageª ©configuration +switchª to include coverage analysis in your parser, +AnaGram uses the value of the coverage file name +©configuration parameterª to find the results of your +testing. The value of the parameter is a string. The +default value is "#.nrc", where '#' represents the name +of your syntax file. +## + +cs + +cs is a field in a ©parser control blockª which +contains your ©context stackª. cs will be defined only +if you have defined the ©configuration parameterª +©context typeª. +## + +Current Grammar + +The Current Grammar is the ©grammarª you presently have +loaded. Its name is displayed on the title bar of +each AnaGram window. + +A status field at the right center of the ©Control Panelª +indicates the state of processing that has been +carried out on the grammar. + +"Loaded" means that the ©syntax fileª has been read +into memory, but that syntax errors have been found. + +"Parsed" means that AnaGram has tried to analyze the +grammar, but got into some kind of difficulty and did +not complete the job. The explanation should be +apparent from the messages in the ©Warningsª window. + +"Analyzed" means that a ©grammar analysisª has been +completed, but no ©output filesª have been written. + +"Built" means that an analysis has been completed and +output files have been written. 
+## + +Data Type + +The ©tokensª in your ©parserª usually have ©semantic +valuesª. The data types for these values will be +determined by the ©default input typeª and ©default +token typeª ©configuration parameterªs unless you +explicitly provide ©token declarationsª in your grammar. +You may also define the data type for any ©nonterminalª +token by preceding the token name with an ordinary C +cast when you write a production. For example: + + (int) integer + -> '0-9':d =d-'0'; + -> integer:n, '0-9':d =10*n + d - '0'; + +The data type may be any simple C or C++ data type, with +arbitrary indirection and qualification. You may also +use any type you have defined by means of typedef, +struct or class definitions. Template classes may also +be used. If you specify a type of your own definition, +you must provide a definition in the ©C prologueª at the +beginning of your ©syntax fileª. + +A token may have the type "void" if its value has no +interest for the parser. Since your parser will not +stack a value for a void token, your parser may run +somewhat faster when tokens are declared as void. +## + +Declare pcb + +"Declare pcb" is a ©configuration switchª that defaults +to on. If this switch is set when you invoke the ©Build +Parserª command, AnaGram will automatically declare a +©parser control blockª for you, at the beginning of +your parser file. If you have used data types that you +define yourself, the typedef statements need to precede +the parser control block declaration. In this case, you +should turn "declare pcb" off and declare it yourself. + +For more information, see the AnaGram User's Guide. +## + +Default Input Type + +The default input type is a ©configuration parameterª +which determines the ©data typeª for the ©semantic +valueªs of ©terminal tokensª if they are not explicitly +declared. Normally, you would explicitly declare +terminal tokens only when you have set the ©input +valuesª ©configuration switchª. 
If you do not set the +default input type, it will default to "int". + +The default data type for the values of ©nonterminal +tokensª is given by the ©default token typeª +configuration parameter. +## + +Default Reduction + +"Default reductions" is a ©configuration switchª which +defaults to on. + +A "default reduction" is a ©parser actionª which may be +used in your parser in any state which has precisely +one ©completed ruleª. + +If a given ©parser stateª has, among its ©characteristic +rulesª, exactly one completed rule, it is usually faster +to reduce it on any input than to check specifically for +correct input before reducing it. The only time this +default reduction causes trouble is in the event of a +©syntax errorª. In this situation you may get an +erroneous reduction. Normally when you are parsing a +file, this is inconsequential because you are not going +to continue semantic action in the presence of error. +But, if you are using your parser to handle real-time +interactive input, you have to be able to continue +semantic processing after notifying your user that he +has entered erroneous input. In this case you would want +default reductions to have been turned off so that +©productionªs are reduced only when there is correct +input. +## + +Default reduction value + +If a ©grammar ruleª does not have a ©reduction procedureª +the ©semantic valueª of the first token in the rule will +be taken as the semantic value of the token on the left +hand side. If these tokens do not have the same ©data typeª +a ©warningª will be given. +## + +Default Token Type + +"Default token type" is a ©configuration parameterª +which determines the ©data typeª for the ©semantic +valueª of a ©nonterminal tokenª if no other type is +explicitly specified. It defaults to void. Therefore, if +any ©reduction procedureª returns a value, you must +either explicitly set the type of the ©reduction tokenª +or you must set default token type to an appropriate +value. 
+ +The default token type cannot have a ©wrapperª class +defined. + +The default data type for the value of a ©terminal +tokenª is given by the ©default input typeª +configuration parameter. +## + +Definition, Definition Statement + +AnaGram syntax files may contain definition statements +which assign new names to ©character setsª, ©virtual +productionsª, ©keyword stringsª, ©immediate actionsª, +or ©tokensª. Definitions have the form + name = <character set> + name = <virtual production> + name = <keyword string> + name = <immediate action> + name = <token name> + +For example, + letter = 'a-z' + 'A-Z' + statement list = statement?... + include = "include" + +The symbols thus defined may be used anywhere the +expression on the right hand side might be used. Such +definitions, in and of themselves, do not define tokens. +Tokens are defined only by their usage in productions. + +## + +DELETE_WRAPPERS + +If your parser uses ©wrapperªs and exits with an error condition, there +may be objects remaining on the ©parser value stackª. The DELETE_WRAPPERS macro +can be used to delete any remaining objects on the stack. +If you have enabled +©auto resynchª, DELETE_WRAPPERS will be invoked automatically. +## + +Diagnose Errors + +"Diagnose errors" is a ©configuration switchª which +defaults to on. When this switch is on, AnaGram includes a +function, ag_diagnose(), in your parser which provides simple +syntax error diagnoses. When your parser encounters a +syntax error, this function will be called immediately prior +to the invocation of the ©SYNTAX_ERRORª macro. A pointer to the message will be +stored in the ©error_messageª field of the ©parser control blockª. + +If you wish to implement your own ©error diagnosisª, you +should turn this switch off, and include a call to your +own diagnostic procedure in your SYNTAX_ERROR macro. + +ag_diagnose() provides three possible error messages, +governed by three macros: ©MISSING_FORMATª, ©UNEXPECTED_FORMATª, and +©UNNAMED_TOKENª.
You may override the definitions of +these macros with your own definitions if you wish +to provide diagnostics in another language. + +If you have set the ©error frameª +switch, it will also set the ©error_frame_tokenª field. +The "error_frame_token" is the nonterminal token which +the parser was trying to complete when the error was +encountered. + +When the "diagnose errors" switch is set, AnaGram also +includes a ©token namesª table in the parser which +contains the ascii names of the tokens in the grammar, +including entries for character constants and keywords. + +Use the ©token names onlyª switch to limit the table +to explicitly named tokens only. +## + +MISSING_FORMAT + +MISSING_FORMAT is a macro that is used by the error +diagnostic function created by the ©diagnose errorsª +switch. If you do not define it in your parser, +AnaGram will define it thus: + #define MISSING_FORMAT "Missing %s" + + This format is used when the diagnostic function can +identify a unique terminal or nonterminal token that +would satisfy the syntactic rules and is named +in the ©token namesª table. +## + +UNEXPECTED_FORMAT + +UNEXPECTED_FORMAT is a macro that is used by the error +diagnostic function created by the ©diagnose errorsª +switch. If you do not define it in your parser, +AnaGram will define it thus: + #define UNEXPECTED_FORMAT "Unexpected %s" + + This format is used when the diagnostic function cannot +identify a named, unique terminal or nonterminal token that +would satisfy the syntactic rules and finds an +incorrect token, the name of which can be found +in the ©token namesª table. +## + +UNNAMED_TOKEN + +UNNAMED_TOKEN is a macro that is used by the error +diagnostic function created by the ©diagnose errorsª +switch. If you do not define it in your parser, +AnaGram will define it thus: + #define UNNAMED_TOKEN "input" + + This macro is used as the argument for the ©UNEXPECTED_FORMATª +macro when the actual, erroneous input cannot be identified.
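Plain C can sketch how a diagnostic function might choose among MISSING_FORMAT, UNEXPECTED_FORMAT and UNNAMED_TOKEN. This is an illustration only. The function toy_diagnose() and its parameters are hypothetical and do not reproduce ag_diagnose(). The sketch assumes the caller already knows two things: the name of the unique acceptable token (NULL if there is none) and the name of the offending input (NULL if it has no name). The #ifndef guards mirror the documented rule that the defaults apply only if you have not defined the macros yourself. Defining MISSING_FORMAT before these lines would therefore replace its default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Documented defaults; define any of these earlier to override them. */
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

/* Hypothetical helper: write a diagnostic into buf and return buf.
   expected: name of the unique token that would have been acceptable,
             or NULL if no such named token can be identified.
   found:    name of the offending input token, or NULL if unnamed. */
static const char *toy_diagnose(char *buf, size_t size,
                                const char *expected, const char *found) {
  assert(buf != NULL && size > 0);
  if (expected != NULL) {
    snprintf(buf, size, MISSING_FORMAT, expected);
  } else {
    snprintf(buf, size, UNEXPECTED_FORMAT,
             found != NULL ? found : UNNAMED_TOKEN);
  }
  return buf;
}
```

With the defaults in place, toy_diagnose(buf, sizeof buf, NULL, NULL) produces "Unexpected input".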
+## + +Difference + +In set theory, the difference of two sets, A and B, is +defined to be the set of all elements of A that are not +elements of B. In an AnaGram ©syntax fileª, you +represent the difference of two ©character setsª by +using the '-' operator. Thus the difference of A and B +is A - B. The difference operator is ©left +associativeª. +## + +Disregard + +The purpose of the "disregard" statement is to skip over +uninteresting ©white spaceª and comments in your input +file. It allows you to specify a token that should be +passed over in the input to your parser. The statement +takes the form: + disregard ws +where "ws" is a token name or character set. Disregard +statements, like other ©attribute statementªs, may be +placed in any ©configuration sectionª. + +You may have more than one disregard statement in your +©grammarª. If you do, AnaGram will create a shell +production. For example, suppose you write: + [ disregard alpha + disregard beta ] +AnaGram will proceed as though you had written: + gamma -> alpha | beta + [ disregard gamma ] + + It frequently happens that you wish your ©parserª to +disregard blanks or comments, except that ©white spaceª +within names, numbers, strings, and other elementary +constructs is subject to special rules and thus should +not be disregarded blindly. In this case, you can use +the "©lexemeª" statement to declare these constructs off +limits for the disregard statement. Within these +constructs, the disregard statement will be inoperative +and the admissibility of white space is determined +solely by the productions which define these constructs. + +Outside those productions which define lexemes, you +should not generally use a token which is supposed to be +disregarded. If you do, your grammar will have +©conflictªs, since the token could satisfy both the +explicit usage, as well as the implicit rules set up by +the disregard statement. 
Such conflicts, however, are +resolved automatically in favor of your explicit use of +the token. The conflicts will appear in the ©Resolved +Conflictsª window. + +If you have "open ended" lexemes in your grammar, such +as variable names or numeric constants, AnaGram +will detect a conflict if one of these lexemes may +follow another such lexeme immediately. To deal with +these conflicts, you should turn on the "©Distinguish +Lexemesª" configuration switch. It will cause white +space to be required as a separator between the +lexemes. + +In order to implement the "disregard" statement, AnaGram +will redefine some tokens in your grammar. For example, +'+' may be redefined to consist of a simple plus sign +followed by optional white space: + '+' -> '+'%, white space?... +The ©percent signª is used to indicate the original, +simple plus without the optional white space attached. +You will probably notice the percent sign appearing in +some windows and traces. +## + +distinguish keywords + +"distinguish keywords" is an ©attribute statementª +which you may include in a ©configuration sectionª. It +is used to tell AnaGram how to distinguish ©keywordªs +from similar sequences of characters in your input +stream. For example, you may want your parser to +recognize "int" as a keyword when it appears in the +following context: + int x; +but not when it appears in the middle of such words as +"integral" and "intolerant". The operand of +"distinguish keywords" is a list of character set +©expressionªs separated by commas and enclosed in braces +({ }). + +Once AnaGram has read your entire syntax file, it +evaluates all of these character sets and tests each +keyword string against the character sets in the order +in which they were encountered in the program.
If all +the characters which constitute a particular keyword +are members of the specified set, the keyword logic is +set up so that it will recognize the keyword only if +the immediately following character is not in the set. + +In the example above, + [distinguish keywords {'a-z'} ] +will do the trick. + +The "©stickyª" statement also affects the recognition +of keywords. +## + +Distinguish Lexemes + +The "distinguish lexemes" ©configuration switchª is +used in conjunction with the "©disregardª" statement +and the "©lexemeª" statement to resolve the +©shift-reduce conflictªs which often crop up when +suppressing white space. + +The difficulty with suppressing white space is that you +wish it to be optional in cases like "x+y", where it is +not necessary in order to parse correctly, but you want +to require it in situations such as "mytype x", where +it is necessary to separate otherwise indistinguishable +constructs. If the white space were optional, it would +be necessary to allow for "mytypex", but it would be +impossible to determine if this were to be interpreted as +"mytype x", "mytyp ex", or any of the many other +possibilities. + +The distinguish lexemes switch causes AnaGram to make +the white space optional where doing so causes no +ambiguity and makes it mandatory where to make it +optional would lead to ambiguity. In the example given +above, "mytypex" would be treated as a single name, and +another name would have to follow separating white +space. + +The default value for distinguish lexemes is OFF. It is +anticipated that this will be changed to ON in future +releases of AnaGram. +## + +Duplicate Production + +This ©warningª message appears when a ©productionª +appears twice in your ©grammarª. You will have a +number of ©reduce-reduce conflictªs as a consequence. +Eliminate the duplicate, and the conflicts it caused +will go away. +## + +Edit Command + +"Edit command" is a ©configuration parameterª which +accepts a string value. 
It is no longer used and is +retained only for file compatibility with the DOS +version of AnaGram. +## + +Embedded C + +You may encapsulate pieces of C or C++ code in your ©syntax +fileª more or less arbitrarily. Such pieces of code will +simply be copied to the ©parser fileª in the order in +which they are encountered. Each such piece of code must +be enclosed in braces ({}). The left brace must be on a +new line, and nothing except comments may follow the +right brace. AnaGram does not inspect the interior of +such a piece of C code except to identify character +constants, strings, comments and blocks surrounded with +braces so that it does not identify the end of the +embedded C prematurely. Note that AnaGram will use the +status of the ©nest commentsª ©configuration switchª in +effect at the beginning of the embedded C. + +AnaGram, of course, can be confused by unterminated +strings, unbalanced brackets, and unterminated comments. +The most likely outcome, in such a situation, is that +AnaGram will reach the end of the file while looking for the +end of the embedded C. Should this happen, AnaGram will +identify the beginning of the piece of embedded C which +caused the problem. + +If your syntax file begins with a block of embedded C, +called the "©C prologueª", it will be copied to the very +beginning of the parser file, preceding all of AnaGram's +output. You may use such an initial block of embedded C +to guarantee that program title comments, copyright +notices and important definitions are at the very +beginning of your parser file. + +The code you include as embedded C, of course, has to +coexist with the code AnaGram generates. In order to +keep the potential for name conflicts to a minimum, all +variables and functions which AnaGram defines begin with +the letters "ag_". You should avoid variable names which +begin with these letters.
+ +If AnaGram finds no embedded C in a syntax file, and you +ask it to build a parser, it will automatically generate +a main program that calls your parser. If you don't want +it to do this, you may turn off the ©main programª +©configuration switchª. +## + +Empty Keyword String + +This ©warningª appears when you have a keyword string +that contains no characters whatsoever. ©Keyword +stringsª must contain at least one character. If you +wish a null match, use a ©null productionª instead. +## + +Enable Mouse + +"Enable mouse" is a ©configuration switchª that defaults +to on. It is not used in the Windows version of AnaGram +and has been retained only for file compatibility with +the DOS version. +## + +Enum Constant Name + +The "enum constant name" ©configuration parameterª +allows you to select the name AnaGram will use for the +set of enumeration constants it defines in the ©parser +headerª file for your ©parserª. The value of "enum +constant name" should be a string containing the '%' +character. AnaGram will substitute each token name in +turn into this template as it creates the list of +enumeration constants. If it finds a '$' character it +will substitute the name of your parser. The default +value of "enum constant name" is "$_%_token". +## + +Enumeration Constants + +In your ©parser headerª file, AnaGram includes a typedef +enum statement which provides enumeration constants +corresponding to all the named constants in your +grammar. The names of the enumeration constants +themselves are defined by the ©enum constant nameª +©configuration parameterª. These constants are useful +when dealing with ©semantically determined productionsª. +## + +Enum + +Within a ©configuration sectionª, you may use an "enum" +statement to define numeric values for any number of +tokens just as you define enumeration constants in C. 
+The syntax is effectively the same as the enum statement +in C: + + [ + enum { + first = 60, + second, + third, + fourth = 'a', + fifth, + } + ] + +is exactly equivalent to + first = 60 + second = 61 + third = 62 + fourth = 'a' + fifth = 'b' +## + +eof + +"eof" is a quasi reserved word in AnaGram, used to +specify an end of file token. You may use another token +as an end of file delimiter by setting the ©Eof Tokenª +©configuration parameterª. eof is not required unless +you use ©automatic resynchronizationª in your ©parserª. + +If you have not defined eof or specified an Eof Token +parameter, ©File Traceª may show a syntax error when it +encounters the end of a test file. + +There are various ascii values that are commonly used +to represent an end of file. The end of a string in +memory is commonly 0, DOS uses ^Z, Unix uses ^D, and +Unix style stream I/O uses -1. It is often convenient +then to define + + eof = -1 + 0 + ^D + ^Z +## + +Eof Token + +"Eof token" is a ©configuration parameterª which accepts +a token name as a value. There is no default value. +AnaGram does not need a specification for the eof token +unless you are using its ©automatic resynchronizationª +facility. + +If you use the ©automatic resynchronizationª capability +of AnaGram, you must specify explicitly an end of file +token. You can do this either by defining a ©terminal +tokenª in your ©grammarª called eof or by using the "eof +token" parameter to identify some other terminal token +to be used as the end of file marker. You would do this +only if you must use the name "©eofª" for some other +purpose. + +Note that "eof" is case sensitive. Neither Eof nor +EOF will qualify as end of file tokens unless you +explicitly specify them using the eof token parameter. +## + +Eof Token Not Defined + +This ©warningª appears if you have requested either +©error token resynchronizationª or ©automatic +resynchronizationª and you have not defined an ©eof +tokenª. 
The resynchronization procedure will not work +correctly at end of file. +## + +Error Action + +The error action is one of the four ©parser actionªs of a +traditional ©parsing engineª. The error action is +performed when the parser has encountered an input +token which is not admissible in the current state. +The further behavior of a traditional parser is +undefined. +## + +Error Defining + +"Error defining TXXX: <token representation>" is a +©warningª message which appears if errors are encountered +while attempting to evaluate the ©character setª for +the specified ©tokenª. This warning is always generated +in addition to more detailed warnings that are made +when the actual errors are encountered. +## + +Error frame + +"Error frame" is a ©configuration switchª which defaults +to off. You use this switch to specify the ©error +diagnosisª capabilities of your parser. If this switch +is set and the ©diagnose errorsª switch is set, i.e., +on, your parser will include a function which will +determine the "context" of any ©syntax errorª, that is, +the token the parser was trying to complete. + +To determine the context of an error, your parser will +scan backwards through the ©parser state stackª, +examining ©characteristic rulesª until it finds a state +which can accept a unique ©nonterminalª reduction token +that you have not marked as ©hiddenª. It will then set +PCB.©error_frame_ssxª to the ©parser stack indexª for +that level. +## + +ERROR_CONTEXT + +ERROR_CONTEXT is a macro AnaGram defines for you. If +your parser encounters a ©syntax errorª, you have +enabled the ©error frameª ©configuration switchª, and +you have defined a ©context typeª, ERROR_CONTEXT will +enable you to access the ©contextª as of when the parser +encountered the beginning of the ©error_frame_tokenª. +## + +Error Diagnosis + +"Error diagnosis" and ©error recoveryª are the two +aspects of ©error handlingª. 
If in the ©embedded Cª +portion of your syntax file you define a macro called +©SYNTAX_ERRORª, it will be invoked by the parser when a +©syntax errorª is encountered. If you have set the +©diagnose errorsª ©configuration switchª, the +©error_messageª field of the ©parser control blockª will +contain a pointer to a string containing a diagnostic +message. The diagnostic is of the form "Missing <token +name>" or "Unexpected <token name>". + +If you do not define SYNTAX_ERROR it will be +automatically defined so that a message will be written +to stderr. + +If the ©lines and columnsª switch has been set you will +have the current line number and column number available +for your diagnostic message. + +If you have set the ©error frameª switch as well as the +diagnose errors switch, the variable +PCB.©error_frame_tokenª will identify the ©nonterminal +tokenª the parser was trying to recognize when the +error was encountered. + +Of course, if your parser is controlling direct keyboard +input, a diagnosis might be unnecessary. In this case +you might define SYNTAX_ERROR so that it simply beeps at +the user and let it go at that. +## + +Error Handling + +Rarely is a parser built to read an arbitrary input +file. The normal situation is that the parser is built +to read files that conform to the rules specified in a +grammar, rules that describe a class of input files +rather than all possible input files. If the input file +does not conform to the grammar, the parser will detect +a ©syntax errorª. + +There are two aspects to error handling in your parser: +©error diagnosisª and ©error recoveryª. Error diagnosis +consists in informing your user that something +unexpected has happened. Error recovery consists in +either aborting the parse, or getting it started again +in some reasonable manner. AnaGram provides several +options for both error diagnosis and error recovery. 
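A toy model can show a SYNTAX_ERROR macro doing both jobs: it diagnoses first and then recovers. It writes the diagnostic and position to stderr. It then applies the simplest recovery described under Error Recovery: setting exit_flag to zero so the erroneous token is ignored. This is a hypothetical sketch. The toy_pcb structure and toy_report_error() stand in for the generated parser, and the line and column field names are assumed. Following the Error Recovery description, the toy sets exit_flag to two before invoking the macro.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the parser control block. */
typedef struct {
  const char *error_message;  /* filled in by the diagnosis step */
  int line;                   /* assumed position fields */
  int column;
  int exit_flag;              /* 2 means "terminated on syntax error" */
} toy_pcb;

static toy_pcb PCB;

/* Diagnosis: print the message and position to stderr.
   Recovery: reset exit_flag to zero so the parse continues. */
#define SYNTAX_ERROR                                              \
  (fprintf(stderr, "%s, line %d, column %d\n", PCB.error_message, \
           PCB.line, PCB.column),                                 \
   PCB.exit_flag = 0)

/* Toy error path: record the error state, then invoke SYNTAX_ERROR. */
static void toy_report_error(const char *message, int line, int column) {
  assert(message != NULL);
  PCB.error_message = message;
  PCB.line = line;
  PCB.column = column;
  PCB.exit_flag = 2;
  SYNTAX_ERROR;
}
```

Omitting the final assignment from the macro would leave exit_flag at two. In that case the parser would return to the calling program, as it does when no error recovery is specified.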
+ +When a syntax error is encountered, first your error +diagnosis option is executed and then your error +recovery option is executed. +## + +error_message + +error_message is a field in a ©parser control blockª to +which your ©error handlingª procedures may refer. If you +have set the ©diagnose errorsª ©configuration switchª, +on encountering a ©syntax errorª your ©parserª will +create a string containing an appropriate diagnostic +message and store a pointer to it into +PCB.error_message. +## + +Error Trace + +"Error Trace" is both a ©configuration switchª and the +name of an option in the ©Action Menuª. If the switch +is on, AnaGram adds code to your parser to capture +state information to a file in case of a ©syntax errorª. The Error +Trace option can then read this information and prepare a pre-built +©Grammar Traceª showing you the state of the parser at the time of +the error. + +The name of the file is determined by the macro +©AG_TRACE_FILE_NAMEª. AnaGram will provide a default +definition for the macro consisting of the name of +your ©syntax fileª plus the extension ".etr". You +may override this definition by defining AG_TRACE_FILE_NAME +in your ©embedded Cª. + +If error trace is enabled, AnaGram will also enable the +Error Trace option on the ©Action Menuª. If you select +Error Trace AnaGram will initialize a ©Grammar Traceª +window from the error trace file you select. The parser +stack of the trace will be as it was when the error +occurred. The last line of the parser stack pane will +show the ©lookahead tokenª that caused the syntax error. You may +then use the Grammar Trace to explore the nature of +the syntax error your parser encountered. + +AnaGram will +warn you if the error trace file is older than +the syntax file, since under those conditions, the +error trace file might be invalid. 
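The default trace file name is described as the name of the syntax file with the extension ".etr". The sketch below shows one way to derive such a name. trace_file_name is a hypothetical helper. The assumption that an existing extension such as ".syn" is replaced, rather than appended to, belongs to this sketch and is not documented AnaGram behavior. To use a different name, you would define AG_TRACE_FILE_NAME in your embedded C, presumably as a string literal.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<syntax file stem>.etr" into out.
   Only '/' is treated as a directory separator in this sketch, so a dot
   in a directory name is not mistaken for an extension.
   Returns 1 on success, 0 if the result did not fit in out. */
static int trace_file_name(const char *syntax_file, char *out, size_t size) {
    const char *slash = strrchr(syntax_file, '/');
    const char *base = slash ? slash + 1 : syntax_file;
    const char *dot = strrchr(base, '.');
    /* Keep everything up to the last dot of the base name, if any. */
    size_t stem_len = dot ? (size_t)(dot - syntax_file) : strlen(syntax_file);
    int n = snprintf(out, size, "%.*s.etr", (int)stem_len, syntax_file);
    return n >= 0 && (size_t)n < size;
}
```

For example, "calc.syn" yields "calc.etr", and "dir.v2/grammar" yields "dir.v2/grammar.etr".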
+## + +AG_TRACE_FILE_NAME + +AG_TRACE_FILE_NAME is a C macro used to determine the +name of the file your parser will write when it +encounters a ©syntax errorª if you have enabled +the ©error traceª ©configuration switchª. + +You may define AG_TRACE_FILE_NAME in your ©embedded Cª. +AnaGram provides a default definition given by the +name of your ©syntax fileª with the extension ".etr". +## + +Error Recovery + +Error recovery is the process of continuing after a +©syntax errorª. AnaGram offers several options. These +are controlled by ©configuration parameterªs and by +your grammar. + +If you do not specify any error recovery, your parser +will simply return to the calling program when it +encounters a syntax error. ©PCBª.©exit_flagª will be set +to two, to indicate termination on syntax error. + +If you wish your parser to simply ignore the erroneous +token and continue, set PCB.exit_flag to zero in your +©SYNTAX_ERRORª macro. You might use this option if your +parser is dealing directly with keyboard input. + +You may wish to use YACC type error handling. To do +this, simply incorporate a token called "error" in your +grammar, or specify some other token as an ©error +tokenª. On syntax error, your parser will back up to +the most recent state where "error" was acceptable +input, treat the bad input as an instance of error, and +then skip all input until it finds an acceptable input +token. At that point it will proceed as though nothing +had happened. + +AnaGram also provides an ©automatic resynchronizationª +option, which uses a complex heuristic to compare input +tokens against all stacked states in order to find the +best state from which to continue. +## + +Error Token Resynchronization + +One of your options for ©error recoveryª after a ©syntax +errorª is a technique similar to that provided in YACC. +You include a terminal token called "error" in your +grammar. 
(Or, use the ©error tokenª configuration +parameter to specify some other token to serve this +purpose.) When the parser encounters an error in the +input, after invoking the ©SYNTAX_ERRORª macro, it backs +up the ©parser state stackª to the most recent state in +which "error" was an acceptable input. It then shifts to +the new state as though it had seen an actual "error" +token. At this point, it skips over any character in the +input which is not an acceptable input character for +this state. Once it does find an acceptable input +character, it continues processing as though nothing had +happened. +## + +error_frame_ssx + +error_frame_ssx is a field in a ©parser control blockª +to which your ©error handlingª routines may refer. When +your ©SYNTAX_ERRORª macro is called, if you have set +both the ©diagnose errorsª and ©error frameª +configuration switches, error_frame_ssx will contain the +value of the ©parser stack indexª at the beginning of +the ©error_frame_tokenª. For example, if in a syntax +file, you fail to close a comment, AnaGram will +encounter an illegal end of file in the comment. In this +situation, error_frame_token is the token for a comment, +and error_frame_ssx gives the parser stack depth at the +beginning of the comment. +## + +error_frame_token + +error_frame_token is a field in a ©parser control blockª +to which your ©error handlingª routines may refer. If +you have set both the ©diagnose errorsª and ©error +frameª ©configuration switchªes, when your +©SYNTAX_ERRORª macro is called, it will contain the +©token numberª of the error_frame_token. +## + +error, Error Token + +"Error token" is a ©configuration parameterª that takes +a token name for a value. It has no default value. If +you do not specify it, and your grammar has a terminal +token called "error", it will be used as the error +token. If you have an error token defined your parser +will presume that you wish to use the ©error token +resynchronizationª method of ©error recoveryª. 
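The error token resynchronization method described above consists of three steps. The parser pops the state stack until a state accepts "error", shifts as though an "error" token had been read, and then skips input until a character is acceptable in the new state. This toy C model shows those steps and is not AnaGram's generated code. The states, the error_goto table, acceptable() and resynchronize() are all hypothetical. The caller must leave room on the stack for one more state.

```c
/* Toy model: states are small integers; stack[depth - 1] is the current state. */
#define NUM_STATES 4

/* error_goto[s] is the state reached by shifting "error" in state s,
   or -1 if "error" is not acceptable input in state s. */
static const int error_goto[NUM_STATES] = { 3, -1, -1, -1 };

/* In state 3 (just after "error") only ';' is acceptable input. */
static int acceptable(int state, int c) {
    return state == 3 ? c == ';' : 1;
}

/* Returns the input position at which normal parsing resumes, or -1 if
   recovery fails (no state accepts "error", or the input runs out). */
static long resynchronize(int *stack, int *depth, const char *input, long pos) {
    int d = *depth;
    /* 1. Back up to the most recent state in which "error" is acceptable. */
    while (d > 0 && error_goto[stack[d - 1]] < 0)
        d--;
    if (d == 0)
        return -1;
    /* 2. Shift as though an actual "error" token had been seen. */
    stack[d] = error_goto[stack[d - 1]];
    d++;
    *depth = d;
    /* 3. Skip input that is not acceptable in the new state. */
    while (input[pos] != '\0' &&
           !acceptable(stack[d - 1], (unsigned char)input[pos]))
        pos++;
    return input[pos] != '\0' ? pos : -1;
}
```

With the stack {0, 1, 2} and the input "x)@#;y" starting at position 1, the model pops back to state 0, shifts to state 3, and resumes at the ';' (position 4).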
+## + +Escape Backslashes + +"©Escape backslashesª" is a ©configuration switchª that +defaults to off. When turned on, the ©line numbersª switch +will write pathnames with doubled backslashes. The switch +is no longer necessary, since AnaGram now uses forward slashes +in the pathnames in #line directives rather than backslashes. +## + +Event Driven + +It is often convenient to configure your parser to be +"event driven". In this situation, instead of calling +your parser once to process the entire input, you call +an ©initializerª to initialize the parser, and then you +call the parser once for each input token. Each time you +call it, the parser processes the single input token +until it can do no more. + +You can interrogate the ©exit_flagª field of the +©parser control blockª to determine whether the parse is +complete or whether the parser encountered an error. + +Event driven parsers are especially convenient for +dealing with terminal input or communications protocols. +## + +Event Driven Parser Cannot Use Pointer Input + +This ©warningª message appears if you specify pointer +input for your ©parserª and also specify that it should +be event driven. If you are going to use ©pointer +inputª, you should not specify your ©parserª as event +driven. Conversely, if you really want an ©event +drivenª parser, you cannot specify pointer input. +## + +Excessive Recursion + +This ©warningª message appears if an internal stack in +AnaGram overflows because of the complexity of an +expression in your ©grammarª. Simplify your grammar by +using ©definitionª statements to name subexpressions. +## + +exit_flag + +exit_flag is a field in the ©parser control blockª. +When your parser returns, PCB.exit_flag contains an exit +code describing the outcome of the parse. Mnemonic +values for the exit codes are defined in the parser +header file AnaGram generates.
These mnemonics, their +values and their meanings are: + AG_RUNNING_CODE = 0: Parse is not yet complete + AG_SUCCESS_CODE = 1: Parse terminated successfully + AG_SYNTAX_ERROR_CODE = 2: Syntax error encountered + AG_REDUCTION_ERROR_CODE = 3: Bad reduction token encountered + AG_STACK_ERROR_CODE = 4: Parser stack overflowed + AG_SEMANTIC_ERROR_CODE = 5: Semantic error, user defined + + An AnaGram parser checks exit_flag on return +from every ©reduction procedureª. If the flag is non-zero, the parser +returns immediately, leaving the flag unchanged. To halt a parse +from a reduction procedure, then, you need only set the +exit_flag to AG_SEMANTIC_ERROR_CODE, or any other unused value +greater than zero that suits your needs. +## + +Expansion, Expansion Rule + +In analyzing a ©grammarª, we are often interested in the +full range of input that can be expected at a certain +point. The expansion of a ©tokenª or state shows us +all the expected input. An expansion yields a set of +©marked ruleªs. The ©marked tokenª in each rule +shows us what input to expect. + +The set of expansion rules of a (©nonterminalª) token +shows all the expected input that can occur whenever the +token appears in the grammar. The set consists of all +the ©grammar ruleªs produced by the token, plus all the +rules produced by the first token of any rule in the +set. The ©marked tokenª of an expansion rule of a token +is the first element in the rule. + +The expansion of a state consists of its ©characteristic +ruleªs plus the expansion rules of the marked token in each +characteristic rule. +## + +Expansion Chain + +You may select an Expansion Chain window from the +©Auxiliary Windowsª popup menu of most windows that contain +©expansion ruleªs. + +The Expansion Chain window is extremely useful for +indicating why a particular ©grammar ruleª is an +©expansion ruleª in a particular state.
To see a chain +of productions that produces a desired expansion rule, +select the expansion rule with the cursor bar, press +the right mouse button for the Auxiliary Windows menu, and select +Expansion Chain. + +The Expansion Chain window will then present a sequence +of expansion rules, using the same format as the +Expansion Rules window, but subject to the constraint +that each rule is produced by the ©marked tokenª in the previous line. + +The first rule in the window is a ©characteristic ruleª +for the given state. The last rule in the window is +the rule selected by the cursor bar in the window from +which you chose the Expansion Chain. It should be noted +that this expansion is not unique. There may be other +derivations. +## + +Expansion Rules + +You may select an Expansion Rules window from the +©Auxiliary Windowsª popup menu of most windows which display +©marked rulesª. The Expansion Rules window shows the +complete set of ©expansion ruleªs for the ©marked +tokenª in the highlighted rule. + +In other windows, including all trace windows, the +Expansion Rules window shows the expansion of the token +on the highlighted line. +## + +F1 + +Use the F1 key to bring up a context sensitive help window. Because of +various peculiarities of the Windows API, there are a few contexts +where the F1 key does not work; however, generally the ©help cursorª +works where F1 does not and vice versa. + +©Helpª windows have hypertext links to related help windows. +In a help window, the right mouse button pops up a menu of +all the links for the window. +## + +extend pcb + +The "extend pcb" statement is an ©attribute statementª that allows you to +add declarations of your own to the ©parser control blockª. With this +feature, data needed by ©reduction procedureªs can be stored in the pcb +rather than in global or static storage. This capability greatly +facilitates the construction of ©thread safe parsersª. 
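The following sketch illustrates why keeping reduction-procedure data in the parser control block helps thread safety. It uses a hypothetical control block whose last two fields stand for declarations that a statement such as extend pcb { int nesting_depth; long item_count; } might append. The real generated structure has many more fields, and all names here are illustrative. The functions play the part of reduction procedures that work through a pointer to the pcb, roughly as code in a reentrant parser can. Because each parse has its own pcb, two parses running at once do not share counters.

```c
/* Hypothetical parser control block. */
typedef struct {
    int exit_flag;       /* stand-in for the generated fields */
    /* Fields assumed to be appended by an extend pcb statement: */
    int nesting_depth;
    long item_count;
} demo_pcb_type;

/* Reduction-procedure-like functions: all state lives in the pcb they
   are given, not in global or static variables. */
static void open_block(demo_pcb_type *pcb)  { pcb->nesting_depth++; }
static void close_block(demo_pcb_type *pcb) { pcb->nesting_depth--; }
static void count_item(demo_pcb_type *pcb)  { pcb->item_count++; }
```

Had item_count been a static variable, two threads each running a parse would corrupt each other's counts. Here, each pcb tracks only its own parse.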
+ +The extend pcb statement may be used in any configuration section. +The format is as follows: + extend pcb { <C or C++ declaration>... } + +It may, of course, extend over multiple lines and may contain any +C or C++ declarations. AnaGram will append it to the end of the parser +control block declaration in the generated parser ©header fileª. There may +be any number of extend pcb statements. The extensions are appended to +the pcb in the order in which they occur in the syntax file. + +The extend pcb statement is compatible with both C and C++ parsers. Note +that even if you are deriving your own class from the parser control +block, you might want to use the extend pcb to provide virtual function +definitions or other declarations appropriate to a base class. +## + +Far Tables + +"Far tables" is a ©configuration switchª which defaults +to off. If it is set, when AnaGram builds a ©parserª it +will declare the larger tables it builds as FAR. This +can be a convenience when using some memory models with +8086 architecture. +## + +Fatal Syntax Errors + +This ©warningª message occurs when AnaGram cannot +complete the ©Analyze Grammarª command on your ©syntax +fileª because of errors in your syntax file. +## + +File Trace + +You can use the File Trace facility to verify your grammar, +even before you have implemented ©reduction proceduresª or +any other code. Thus you can defer writing procedural code +until you have the grammar working to your specifications. + +To run File Trace, select +File Trace from the ©Action Menuª or click on the File Trace button. + +Select a test file. When the ©File Trace Windowª appears, +double click at any point in the ©test file paneª, or +click the ©Parse Fileª button to parse the entire file. +AnaGram will parse up to the point you have selected +according to the rules in your ©grammarª. If the test file does not +conform to the rules of the grammar, the parse will halt with a +©syntax errorª. 
You can then inspect the ©Parser Stack paneª and the +©Rule Stack paneª to get an idea of the nature of the problem. + +AnaGram uses different colors to +distinguish the portion of the test file that has +been parsed from the portion that has not been parsed, +so the location of the error should be readily apparent. + +Since the syntax error is often detected somewhat downstream +from the actual error, you may need to back the parse up +and approach the error slowly. In the Test File pane, +double click at any point prior to the error to back +the parse up to that point. You can then click on the +©Single Stepª button to perform a single parser action. + +You may also use the cursor keys to control the parse. +As long as no error is encountered, the parse is locked +to the blinking cursor. If you cursor past the syntax +error, however, the parse can no longer track the cursor, +so the cursor location will differ from the parse location. The +cursor and parse locations will also differ after you single click +at any point other than the current parse location. + +When the cursor and the parse location are thus out of synch, the +Single Step button is replaced with a ©Synch Parseª button. You +can click on Synch Parse to get the parse back in synch with the +cursor. + +The File Trace option will be greyed out on the ©Action Menuª +if your grammar has ©empty recursionª, since +such a grammar may cause infinite loops in the parser. + +Because a File Trace is based on character codes, it will also be greyed out +on the ©Action Menuª if your parser uses ©token inputª rather than +character input. + +All parser actions performed by a File Trace update the ©trace +coverageª counts, enabling you to verify the extent to which +your test files exercise your parser. + +Normally, AnaGram reads test files in "text" mode, +discarding carriage return characters.
If your parser +needs to recognize carriage return characters +explicitly, you should turn the "©test file binaryª" +switch on. +## + +File Trace Window + +The ©File Traceª window normally consists of three panes: + The ©Parser Stack paneª + The ©Test File paneª + The ©Rule Stack paneª + + If your grammar uses ©semantically determined productionsª, +the ©Reduction Choices paneª will appear when necessary +to allow you to select a ©reduction tokenª. The choice that +you make will be remembered and reused if you should back up +the parse and parse past this point again. The remembered choice +is not made automatically when you use ©Single Stepª. Thus, +if you wish to change your choice, position the cursor at the location where +the choice must be made and Single Step past the choice. + +If you ©reloadª the test file, the choices you have made will +be discarded. + +The active pane has +a distinctively colored title panel and cursor bar. You can +use the tab key to tab among the panes. The function of +other keyboard keys depends on which pane is active.
+ +Along the bottom of the Grammar Trace Window is a toolbar with +a ©Parse Statusª box, a ©text entryª field +and four buttons: + ©Proceedª + ©Single Stepª + ©Resetª + ©Helpª + + In the ©Parser Stack paneª you can see a +representation of the ©parser state stackª and ©parser stateª as they +might appear in the course of execution of your ©parserª. You can +examine the ©allowable inputª tokens and see the changes to the +state and the state stack caused by any input token you +choose. The ©Rule Stack paneª shows the relationship between the +contents of the parser stack and your ©grammarª. If your grammar +uses ©semantically determined productionsª, you can select the +appropriate ©reduction tokenª from the ©Reduction Choices paneª. + +You can enter text characters directly in the ©text entryª +field. This means you can run a Grammar Trace like a ©File Traceª +where the test file is replaced by the characters you type in the +text entry field. This is a very convenient way to check out your +grammar. +## + +Test File, Test File Pane + +In the ©File Traceª, the file under test is displayed in the +upper right pane. To parse to a specific point, double +click at that point. + +As long as the parse location and the cursor are synchronized, +when you use the cursor keys to +move the cursor, the parse will track the cursor. + +If the parse encounters a ©syntax errorª, it will not be able +to go beyond the location of the error. In this situation, +moving the cursor right or down will cause the cursor position to +differ from the parse location. The parse and cursor positions can also +differ if you single click anywhere in the Test File pane. + +If the +parse location and the cursor are thus not synchronized, the +©Single Stepª button will be replaced with a ©Synch Parseª +button. Click on the Synch Parse button to get the cursor +and the parse back in synch. Of course, the parse will still +not be able to proceed past a syntax error. 
+ +In the default color scheme, parsed text is shown on a lighter +background than is unparsed text. + +If your grammar uses ©semantically determined productionªs, +the parse will halt when one is encountered and the ©reduction +choices paneª will be displayed so you may select the appropriate +©reduction tokenª. + +At any time you can click on the ©Reset buttonª to reset the parse to +the beginning of the test file. If you modify the test file, you +can click on the ©Reload buttonª to load the modified file and +reset the parse. + +Normally, AnaGram reads test files in "text" mode, discarding carriage +return characters. If your parser needs to recognize carriage return +characters explicitly, you should turn the ©test file binaryª +©configuration switchª on. + +Sample test files are provided with the FFCALC and FC ©examplesª. +## + +Parse Location + +The current location of the ©File Traceª parser in the +©test file paneª. The format is <line number>:<column number>. +## + +Parse Status + +The current state of the ©File Traceª or ©Grammar Traceª parser. + + Ready: The parser is ready for input. + Running: The parser is processing input. + Parse Complete: The parser has reached the end of the input. Click +on ©resetª or ©reloadª to restart the parse. + Syntax error: A syntax error has been encountered. The parser cannot +go any further. + Unexpected end of file: The parser has reached the end of the actual +input but the grammar still expects more. + Select reduction token: The parser encountered a ©semantically determined +productionª. Select a ©reduction tokenª from the ©Reduction Choices paneª. + Selection error: The reduction token selected from the Reduction Choices +pane was not allowable input in the present state. Select another +reduction token. +## + +Parse File + +Use the Parse File button in the ©File Traceª to parse all the way +to the end of file. 
The parse will not stop until it encounters a +©syntax errorª, a ©semantically determined productionª, or the end of file. +## + +Reset + +Use the Reset button in the ©File Traceª or ©Grammar Traceª to reset +the parse to its initial state. This is most convenient when using +a ©Conflict Traceª, ©Error Traceª, or other ©Auxiliary Traceª +since these traces seldom begin at state 0. +## + +Reload + +The Reload button in the ©File Trace Windowª rereads the test file. +This is convenient if you modify the test file while you are testing +the ©grammarª. +## + +Lookahead Token + +In an ©LALR-1 parserª the "lookahead token" is the next token to be +processed. For each ©parser stateª there is a list of tokens that +may be seen in this state. For each token there is a corresponding +©parser actionª. The parser scans the list looking for the lookahead +token and then performs the corresponding parser action. If the +lookahead token cannot be found and there is no ©default reductionª, +the parser signals a ©syntax errorª. + +In File Trace, and in some circumstances in Grammar Trace, the +lookahead token can be seen on the last line of the +©Parser Stack paneª. +## + +GET_CONTEXT + +If you have defined a "©context typeª" ©configuration +parameterª, and wish to maximize the performance of your +parser, you should write a GET_CONTEXT macro which +stores the context of the input token directly in +©CONTEXTª, the current stack location. Otherwise, you +can write your ©GET_INPUTª macro so that it stores +context into ©PCBª.©input_contextª. The default +definition for GET_CONTEXT will then copy +PCB.input_context to the ©context stackª at the +appropriate time. +## + +GET_INPUT + +GET_INPUT is a macro which you should define to control +©parser inputª if your +parser is not ©event drivenª and you are not using +©pointer inputª. 
If you don't define it, AnaGram will +define it by default to read a single character from +stdin: + + #define GET_INPUT (PCB.input_code = getchar()) + + ©PCBª.©input_codeª is an integer field in the ©parser control blockª +which is used to hold the current character code. You +may also want GET_INPUT to set the values of ©input_contextª or +©input_valueª. It may call an input function, or it may execute +in-line code when it is invoked. +## + +iso latin 1 + +The "iso latin 1" ©configuration switchª controls case +conversion on input characters when the ©case sensitiveª +switch is set to off. When "iso latin 1" is set, the +default ©CONVERT_CASEª macro is defined to convert +correctly all characters in the latin 1 character set. +When the switch is off, only characters in the ASCII +range (0-127) are converted. +## + +Dragon Book + +The "dragon book" is the classic reference on formal parsing: + Compilers: Principles, Techniques, and Tools + Aho, Sethi, and Ullman + Addison-Wesley, 1986. + + It is called the "dragon book" because of its +colorful cover illustration showing a knight in +armour ("data flow analysis") armed with sword +("©LALR parser generatorª") and shield ("syntax +directed translation") at his PC attacking a +bright red dragon ("complexity of compiler design"). +## + +LALR-1 Parser + +An LALR-1 parser is a ©parserª created from a +©grammarª by an ©LALR parser generatorª. +## + +LALR Parser Generator + +LALR(k) (LookAhead Left-to-right Rightmost derivation) +parser generators are +programs that create parsers algorithmically from +formal grammars. The (k) refers to the number of +lookahead symbols used to make parsing decisions. +Normally, k = 1. + +LALR parsers are a subset of the class of +so-called LR parsers. LALR parsers are generally more compact +and less costly to create. These advantages are +obtained at a slight sacrifice in generality. 
Although it +is possible to contrive an LR grammar which has +©conflictªs when analyzed with the LALR algorithm, +such situations rarely occur in practice, and can +be easily resolved by rewriting a few rules. + +In the ©dragon bookª, section 4.7, the authors list the following +attractive properties of LR parsing: + LR parsers can be constructed to recognize virtually +all programming-language constructs for which context-free +grammars can be written. + The LR parsing method is the most general nonbacktracking +shift-reduce parsing method known, yet it can be implemented as +efficiently as other shift-reduce methods. + The class of grammars that can be parsed using LR methods is +a superset of the class of grammars that can be parsed with +predictive parsers. + An LR parser can detect a syntactic error as soon as it is +possible to do so on a left-to-right scan of the input. +## + +Getting Started + +AnaGram is an ©LALR parser generatorª. Its input is +a ©syntax fileª, which you prepare with an ordinary +programming editor. Its output is a ©parser fileª, which +you can compile with a C or C++ compiler on any platform +and link into your program. To compile on Unix platforms, set +the ©no crª ©configuration switchª. + +AnaGram has extensive context-sensitive hypertext +©helpª. In any AnaGram window, press ©F1ª or select an item with the +©Help Cursorª. Further documentation in HTML format, including +documentation of examples, is found in the html subdirectory. AnaGram +also has a comprehensive hard-copy manual, the AnaGram User's Guide. + +If you are new to AnaGram, you might begin by reviewing the Help +Topics ©How AnaGram Worksª and ©Program Developmentª, and looking at +An Annotated Example and Summary of AnaGram Notation in the HTML +documentation. + +If you are not already familiar with formal parsing techniques, you +may want to read Introduction to Syntax Directed Parsing in the HTML +documentation.
Note also the Fahrenheit to Celsius conversion +examples in the examples/fc directory, which comprise a graded +sequence of syntax files illustrating most of the basic +principles of ©syntax directed parsingª in easy steps. Documentation +is in html/fc.html. + +AnaGram has many features, many of which are not +commonly found in parser generators: + the ©configuration sectionª + ©thread safe parsersª + C++ support + the ©disregardª and ©lexemeª statements + ©event drivenª parsers + ©character setsª + ©virtual productionsª + ©File Traceª, ©Grammar Traceª + ©automatic resynchronizationª + ©error token resynchronizationª + +To familiarize yourself with the many options available for configuring +your parsers, select ©Configuration Parametersª from the ©Browse Menuª. +Use ©F1ª or the ©Help Cursorª to pop up explanations of the various +parameters. + + +If you don't find the information you need, please visit the +AnaGram web page at http://www.parsifalsoft.com for further +information and support. + +## + +How AnaGram Works + +AnaGram contains an ©LALR Parser Generatorª which creates a +table driven ©LALR-1 parserª from a ©grammarª written in a variant +of Backus-Naur Form. AnaGram works in two steps. In the +first step, or analysis phase, it reads a ©syntax fileª and +compiles a number of tables describing the grammar. In the +second step, or build phase, it writes two output files: +a ©parser fileª written in C or C++ and a ©header fileª. + +Syntax files normally have the extension .syn. The rules for +writing syntax files are given in the AnaGram User's Guide +and in the Summary of AnaGram Notation in the HTML documentation. + +The header file contains definitions and declarations, including +the definition of a ©parser control blockª. + +The parser file consists of: + The ©C prologueª, if any. + Definitions and declarations provided by AnaGram. + ©Reduction procedureªs. + a customized ©parsing engineª. + a ©parse functionª to be called when input is to be parsed. 
+ + The name of the parser file is controlled by the ©parser +file nameª ©configuration parameterª. In the +default case, the parser file will have the same name as +the syntax file, with the extension .c. The name of the +parse function is given by the ©parser nameª parameter, which defaults +to the name of the syntax file. +## + +Examples + +The EXAMPLES directory of the AnaGram distribution disk +contains a number of examples to help you get started. +Documentation for the examples, in HTML format, is located +in the html directory (start at index.html or examples.html). + +The traditional Hello, World, in examples/hw, is a good +example for getting familiar with the mechanical +procedures of building both C and C++ parsers from +©syntax fileªs. + +The Fahrenheit/Celsius conversion examples in the +examples/fc directory on your AnaGram diskette comprise +a graded sequence of syntax files which illustrate +most of the basic principles of ©syntax directed +parsingª in easy steps. In addition, these examples +demonstrate many features of AnaGram which are not +found in other parser generators: + the ©configuration sectionª + ©character setsª + ©virtual productionsª + ©error token resynchronizationª + ©File Traceª + the ©disregardª and ©lexemeª statements + ©event drivenª parsers + +The Four Function Calculator (examples/ffcalc) is used +traditionally to demonstrate parser generators. If you +are already familiar with ©syntax directed parsingª, this +example will give you a good overview of the basics of +AnaGram. An annotated version of this example may be +found in AnaGram's HTML documentation. +The FFCALC example illustrates the use of ©precedence +rulesª to resolve ©conflictsª. + +Other examples are available to demonstrate additional +features of AnaGram. + +RCALC (examples/rcalc) is a simple four function +calculator which accepts roman numeral input.
It +illustrates the following AnaGram features: + ©pointer inputª + ©SYNTAX_ERRORª macro + ©context stackª + +DSL (examples/dsl) is a complete DOS script language, +which provides capabilities well in excess of DOS batch +files. DSL is a complete working program, used in the +past to create AnaGram's install program. Some of the +specific features of AnaGram which it illustrates are: + ©distinguish lexemesª + ©distinguish keywordsª + ©far tablesª + +MPP is a fully functional macro preprocessor for C or +C++. Included with MPP are two C grammars, either of +which may be incorporated into MPP. MPP uses several +parsers that work together: + TS.SYN is the primary token scanner parser that +identifies tokens, and handles preprocessor +commands. + MAS.SYN is used to do macro argument substitution. + CT.SYN is used to identify tokens that result from +string concatenation during macro argument +substitution. + EX.SYN is used to evaluate constant expressions in +#if preprocessor statements. + +Among the more powerful features of AnaGram that MPP +illustrates are: + ©semantically determined productionsª + ©event drivenª parsers +## + +Goal, Goal Token, Start Token + +The ©grammar tokenª is the token which represents the +"top level" in your grammar. Some people refer to it as +the "goal" or "goal token" and others as the "start +token". Whichever it is called, it is the single token +which describes the complete input to your parser. + +The most common way to specify a grammar token is as +follows: + grammar -> statements?..., eof +This production tells AnaGram that the input to your +parser consists of a (possibly empty) sequence of +statements followed by an end of file token. + +There are a number of ways of specifying which token in +your ©syntax fileª represents the top level of your +grammar. You may simply name it "grammar", or you may +tag it with a '$' character when you define it, or you +may set the ©grammar tokenª ©configuration parameterª. 

If you inadvertently tag several tokens with the
'$' character and/or set the grammar token parameter,
it is the last such specification in the file which
wins. Some people develop their grammars bottom up,
gradually adding new levels of complexity. In the
course of development, they may specify a number of
tokens as grammar tokens and forget to remove the old
specifications.

Notice that if you define the token "grammar" anywhere
in your syntax and specify the grammar token otherwise,
"grammar" will not be the grammar token. This is to
keep "grammar" from being a reserved word. If you need
to use it in your syntax for something other than the
whole grammar, you are free to do so.
##

Grammar

Traditionally, a "grammar" is a set of ©productionªs
which, taken together, specify precisely a set of
acceptable input streams, in terms of an abstract set
of ©terminal tokensª. The set of acceptable input
streams is often called the "language" defined by the
grammar.

In AnaGram, the term "grammar" also includes
©configuration sectionsª as well as the ©definitionsª
of ©character setsª and ©virtual productionsª which
augment the collection of productions. The term is
often used in contrast to the term "©syntax fileª",
which signifies the complete AnaGram source file,
including reduction procedures and embedded C, or to
the term "©parserª", which refers to AnaGram's output
file.

A grammar is often called a "syntax", and the rules of
the grammar are often called syntactic rules.
##

Grammar Analysis

The major function of AnaGram is the analysis of
context-free grammars written in a particular variant
of Backus-Naur Form.

The analysis of a grammar proceeds in four stages. In
the first, the input grammar is analyzed and a number
of tables are built which describe all of the
©productionªs and components of the ©grammarª.

In the second stage, AnaGram analyzes all of the
character sets defined in the grammar and, where
necessary, defines auxiliary tokens and productions.

In the third stage, AnaGram identifies all of the
states of the parser and builds the go-to table for the
parser.

In the fourth stage, AnaGram identifies ©reduction
tokensª for each completed ©grammar ruleª in each state
and checks for ©conflictªs.

Use the ©Analyze Grammarª command to cause AnaGram to
analyze your grammar.
##

Grammar Is Ambiguous

This ©warningª message appears if your ©grammarª
contains ©conflictªs. AnaGram will resolve ©shift-reduce
conflictsª by selecting the shift option. It will
resolve ©reduce-reduce conflictsª by selecting from the
conflicting ©grammar ruleªs the one which appears first
in the ©syntax fileª.
##

Grammar Rule

A "grammar rule" is the right hand side of a production.
It is a sequence of ©rule elementsª. Each rule element
identifies some token, which can be either a ©terminal
tokenª or a ©nonterminal tokenª.

A grammar rule is "matched" by a
corresponding sequence of tokens in the input stream to
the parser. The rule elements in the grammar rule may be
©token nameªs, ©set expressionsª, ©character constantsª,
©immediate actionªs, ©keyword stringsª, or ©virtual
productionsª.

A grammar rule may be followed by a
©reduction procedureª. The ©semantic valuesª of
the tokens that comprise the rule may be passed to the
reduction procedure by using ©parameter assignmentsª.

Since a grammar rule always makes up the right hand side
of a production, the left hand side of the production
identifies one or more ©nonterminal tokensª, or
©reduction tokensª, to which the rule reduces when
matched. If there is more than one reduction token,
the production is called a ©semantically determined productionª and
there should be a ©reduction procedureª to select
the correct reduction token at run time.
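As a rough illustration of what such a reduction procedure has to decide, here is a plain C sketch of the decision logic alone, for a hypothetical production whose left side lists two reduction tokens, a variable name and a type name. The enum, the symbol table contents and select_reduction() are all hypothetical, and the sketch does not show how a reduction procedure reports its choice to an AnaGram parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reduction tokens for a semantically determined
   production with two tokens on its left side. */
enum reduction_token { VARIABLE_NAME, TYPE_NAME };

/* Hypothetical symbol table: names previously declared as types. */
static const char *const type_names[] = { "size_t", "FILE", "widget" };

/* Choose, at run time, the reduction token for a matched name by
   consulting semantic information (here, the symbol table). */
enum reduction_token select_reduction(const char *name)
{
    size_t count = sizeof type_names / sizeof type_names[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, type_names[i]) == 0)
            return TYPE_NAME;
    }
    return VARIABLE_NAME;
}
```

The matched grammar rule is the same in both cases. Only information available at run time distinguishes the two outcomes, which is why grammar analysis alone cannot make the choice.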
+## + +Grammar Token + +The "grammar token" ©configuration parameterª may be +used to specify the ©goalª, or "start" token for the +syntax analyzer portion of AnaGram. Alternatively, you +could simply call the token "grammar", or you could +append a '$' character to it when you define it. + +Each grammar must have a grammar token specified before +it can be analyzed or before a parser can be built. The +grammar token is the single token to which the grammar +finally condenses. When this token is identified by the +parser, the parse is complete. +## + +Grammar Trace + +AnaGram's Grammar Trace facility lets you examine the workings of your +©parserª in detail. You can use the Grammar Trace as soon as you have +analyzed your ©grammarª, even before you have written any ©reduction +procedureªs or other code. Thus you can defer writing procedural code +until you have the grammar working to your specifications. + +Select the ©Grammar Trace Windowª +from the ©Action Menuª or click on the Grammar Trace +button. + +In the ©Parser Stack paneª you can see a representation of the +©parser state stackª and ©parser stateª as they might appear in the +course of execution of your ©parserª. The ©Rule Stack paneª shows the +relationship between the contents of the parser stack and your +©grammarª. If your grammar uses ©semantically determined +productionsª, you can select the appropriate ©reduction tokenª from +the ©Reduction Choices paneª. + +At any stage, the ©Parser Stackª represents a parse +in progress. It shows the sequence of ©tokenªs that have +been input so far and the states in which they were +seen. When a production is complete and the grammar rule +is reduced, the tokens that make up the rule are removed +from the stack and replaced by the token on the left +side of the production. Initially, the Parser Stack contains +only a ©lookahead lineª. + +To explore your grammar, choose ©tokenªs one by one from +the ©Allowable Inputª +pane. 
This pane shows the tokens allowable at the current state of the
grammar, and the actions that result when the tokens are chosen.

You can also enter text characters directly in the ©text entryª
field. This means you can run a Grammar Trace like a ©File Traceª,
where the test file is replaced by the characters you type in the
text entry field. This is a very convenient way to check out your
grammar. Text entry is, of course, not appropriate for grammars that
expect ©token inputª.

In a ©File Traceª you can advance the parse no matter which pane is
active. In a Grammar Trace, input could come either from the
Allowable Input pane or from the text entry field. Therefore the
parse can only be advanced when one of these two is active, which
indicates that it is the source of input.

Specialized prebuilt Grammar Traces such as the ©Conflict Traceª and
the ©Auxiliary Traceª can be selected from ©Auxiliary Windowsª popup
menus where appropriate.

All Grammar Trace activity updates the ©trace coverageª counts.
##

Text Entry

It is sometimes more convenient to enter text in the
text entry box on the ©Grammar Traceª toolbar than to
select individual tokens from the ©Allowable Input paneª.

By entering text you can proceed quickly to a troublesome
state without having to choose each individual token
en route.

After entering text, press Enter or click on the Proceed
button to parse the text. Click on the single step button
to work slowly through the text step by step.
##

header file name

The "header file name" parameter names the ©parser
headerª file that AnaGram will generate when it builds
your parser. This header file can be used with your
parser or with other modules in your program. The
header file contains a number of typedef statements and
a number of macro definitions which are needed in your
parser and may be useful in other modules.
+ +If the value of this parameter contains a '#' character, +AnaGram will substitute the name of your syntax file for +the '#'. The default value of "header file name" is +"#.h". +## + +Help, Using Help + +There are 3 main ways to access AnaGram Online Help: + Press F1 for context-sensitive help from most windows and menu items. + Similarly, use the ©Help Cursorª from most windows and menu items. + From the Help menu, you can bring up ©Help Topicsª and choose a topic. + +You can also get fly-over help for the toolbar buttons on the ©Control +Panelª. File and Grammar Traces have a Help button. + +AnaGram's Help windows, unlike most others, remain on-screen until you +dismiss them. This means you can refer to several topics at once. They +have hypertext links to other Help topics. Also, right-clicking +the mouse on a Help window or pressing F1 will pop up an Auxiliary +Windows menu of all linked topics in the window. "Using Help" is always +available from this popup menu. + +Note that, for the ©Warningsª, ©Configuration Parameterªs and ©Help +Topicsª windows, F1 will give you help for the item +on the highlighted line, whereas the Help Cursor allows you +to select any line by clicking on it. + +AnaGram also has documentation in HTML format, indexed in the index.html +file. This documentation covers Getting Started, examples, and some +further topics mainly condensed from the User's Guide. Hard copy +documentation is in the AnaGram User's Guide, which has the most +detail. +## + +Hidden + +In a ©configuration sectionª of your grammar you may use +an ©attribute statementª to declare one or more tokens +to be "hidden". Tokens that are "hidden" do not appear +in the ©token namesª table, and thus do not appear in syntax error +diagnoses. When your parser attempts to determine the +©error frameª of a ©syntax errorª, it will disregard the +tokens that have been declared hidden. 
The hidden
declaration consists simply of the keyword hidden
followed by a list of tokens, separated by commas and
enclosed in braces ({ }):
 [ hidden { widget, wombat, foo, bar } ]

You would use the "hidden" attribute primarily for
tokens whose names would not mean anything to your users.
##

Immediate Action

Immediate actions are snippets of C code which are to
be executed in the middle of a ©grammar ruleª. Immediate
actions are denoted by a '!' character followed by
either a C expression terminated by a semicolon, or a
block of C code enclosed in braces. For example, in a
simple desk calculator one might write the
following:
 transaction
 -> !printf("#");, expression:x =printf("%d\n",x);

The only apparent difference between an
immediate action and a ©reduction procedureª is that the
immediate action is preceded by '!' instead of '='.
Notice that the immediate action must be followed by a
comma to separate it from the following ©rule elementª.

Immediate actions may also be used in ©definitionªs:
 prompt = !printf("#");

Using this definition, the above example becomes:
 transaction
 -> prompt, expression:x =printf("%d\n",x);

You could accomplish the same result by writing a ©null
productionª and a reduction procedure:
 prompt
 -> =printf("#");

This is exactly how AnaGram implements immediate
actions.
##

Implementation Errors

"Implementation errors" are errors your parser detects
which are not the immediate result of bad input. When
it encounters an implementation error, your parser will
call a macro which you can define to deal with the
problem in a manner suitable to your needs. If you don't
provide these macros, AnaGram will make default
definitions.
There are two macros corresponding to two
implementation errors:
 ©PARSER_STACK_OVERFLOWª
 ©REDUCTION_TOKEN_ERRORª
##

Inappropriate Value

This ©warningª message appears when the value assigned to
a ©configuration parameterª is not appropriate to that
parameter. Check the definition of the parameter by
opening the ©Configuration Parameters Windowª,
selecting the parameter and pressing F1.
##

Initializer

For every ©parserª it generates, AnaGram also generates an
"initializer" function which prepares the parser for use.
AnaGram names the initializer by prefixing the ©parser nameª
with "init_". If your parser is ©event drivenª, you must
call the initializer before you call the parser.

If your parser is not event driven, AnaGram will
normally include a call to the initializer in the
parser. If you wish to be able to call your parser more
than once without its being re-initialized, you may turn
off the ©auto initª ©configuration switchª. When you do
this, you assume responsibility for calling the
initializer. If your parser is event driven, you must
always call the initializer function.

If the ©reentrant parserª switch is set, the initializer takes
a pointer to the ©parser control blockª as its sole argument. Otherwise
it takes no arguments. The initializer returns no value. All
communication is by means of the ©parser control blockª.
##

Input Character

The actual unit of ©parser inputª is usually a
single character. Note that you are not limited to
eight-bit characters. Your parser will use the input
character to index a translation table, ©ag_tcvª, to
determine the ©token numberª for that character. The
©token numberª identifies the actual syntactic token.
The character code itself will be the ©semantic valueª
of the token. Note that AnaGram groups together all
input characters that are syntactically
indistinguishable into a single input token.
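The table lookup can be pictured with the following C sketch. It assumes ASCII input, eight-bit characters, and a hypothetical grammar that distinguishes only digits, letters and '+'. The token numbers, the table name tcv and both functions are illustrative stand-ins, not the ag_tcv table AnaGram actually generates.

```c
/* Hypothetical token numbers; a real parser's numbering comes
   from AnaGram. */
enum { TK_OTHER, TK_DIGIT, TK_LETTER, TK_PLUS };

/* Sketch of a character-to-token translation table, sized for
   eight-bit characters only. */
static unsigned char tcv[256];

/* Characters the grammar cannot tell apart get the same token
   number: all ten digits map to TK_DIGIT, all letters to TK_LETTER.
   Assumes ASCII, where 'a'..'z' and 'A'..'Z' are contiguous. */
void build_tcv(void)
{
    for (int c = 0; c < 256; c++) {
        if (c >= '0' && c <= '9')
            tcv[c] = TK_DIGIT;
        else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            tcv[c] = TK_LETTER;
        else if (c == '+')
            tcv[c] = TK_PLUS;
        else
            tcv[c] = TK_OTHER;
    }
}

/* The input character indexes the table to give the token number;
   the character itself remains available as the semantic value. */
int token_number(unsigned char c)
{
    return tcv[c];
}
```

Grouping indistinguishable characters keeps the number of distinct input tokens small. In this sketch, '0' and '9' become the same token, although their character codes still differ as semantic values.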
+## + +input_code + +input_code is a field in the ©parser control blockª +which contains the current ©input characterª, or, if your +©GET_INPUTª macro supplies ©token numberªs directly, the +token number. + +If you write your own ©GET_INPUTª macro, you must make +sure that you store the input character, or token +number, you get into ©PCBª.input_code. +## + +INPUT_CODE(t) + +If you set both the ©pointer inputª and the ©input +valuesª ©configuration parameterªs, you must provide an +INPUT_CODE macro for your parser. In this situation, +your parser will use the pointer to load the +©input_valueª field of the ©parser control blockª and +uses the INPUT_CODE macro to extract the appropriate +value for the ©input_codeª field. For example, if the +input_value is a structure and the appropriate member +field is called "id" you would write: + + #define INPUT_CODE(t) (t).id +## + +input_context + +"input_context" is a field which AnaGram adds to the +definition of the ©parser control blockª structure when +you define a ©context typeª ©configuration parameterª. +If you choose, you can write your GET_INPUT macro so +that it stores the context value in ©PCBª.input_context. +The default definition for ©GET_CONTEXTª will then stack +the context value at the appropriate time. You can think +of PCB.input_context as a sort of temporary "parking +place" for the context value. +## + +Input Scan Aborted + +This ©warningª message appears if AnaGram is unable to +finish scanning your ©syntax fileª because of previous +errors. +## + +input values + +"Input values" is a ©configuration switchª which +defaults to off. If your ©parser inputª includes +explicit ©token valueªs which are not simply the ascii +values of corresponding ascii input characters, you must +set the "input values" switch to inform AnaGram. Unless +your parser is ©event drivenª or uses ©pointer inputª, +you must also provide your own ©GET_INPUTª macro. 

If your parser uses pointer input, you must provide an
©INPUT_CODE(t)ª macro.

The semantic value of an input token is to be stored in the
©input_valueª field of the parser control block.
##

input_value

input_value is a field in the ©parser control blockª
which is used to store the semantic value of the input
token.

If you write your own ©GET_INPUTª macro, and you have
set the ©input valuesª ©configuration switchª, you
should make sure that you store the value of the ©input
characterª or token into ©PCBª.input_value.
##

Internal Error

"AnaGram internal error: ..." is a ©warningª message which
appears if one of AnaGram's internal consistency tests
fails. This message should never appear if AnaGram is
working properly. Usually AnaGram will abort on
encountering an internal error, although under
a small set of circumstances it may continue. Should
this happen, it would be wise to close AnaGram and
restart it.

If you do get an internal error, please note the complete
message identifying the problem and file a bug report,
following the directions posted on the AnaGram web page
at http://www.parsifalsoft.com.
A copy of the relevant
syntax file and a summary of the circumstances surrounding
the problem would be greatly appreciated.
##

Intersection

In set theory, the intersection of two sets, A and B, is
defined to be the set of all elements of A which are
also elements of B. In an AnaGram ©syntax fileª, the
intersection of two ©character setsª is represented with
the '&' operator. The intersection operator has lower
©precedenceª than the ©complementª operator, but higher
precedence than the ©unionª and ©differenceª operators.
The intersection operator is ©left associativeª.
##

Keyboard Support

AnaGram can be controlled entirely from the keyboard. In the Control
Panel, you can tab to any button and press Enter to select it.
In addition to the conventional
Windows keyboard functions, the following keys have been implemented:
 Escape closes any AnaGram window except the Control Panel.
 F8 toggles between an active AnaGram window and the Control Panel.
 F10 accesses the Control Panel menu from any
AnaGram window.
 Shift F10 pops up the Auxiliary Windows menu.
##

Keyword, Keyword String

Keywords are a very important feature of AnaGram. They
provide an easy way to pick up special character
sequences in your input, thereby eliminating the need
for a lot of tedious ©productionªs.

If AnaGram finds, on the right hand side of one of your
©grammarª productions, a string enclosed in double
quotes, such as "IF", it automatically creates from the
string a "keyword" which is incorporated into your
parser. You may have any number of keywords. A keyword
is treated as a single terminal token. Recognition of
keywords is governed by the ©case sensitiveª switch.

Your parser will look for a keyword in its input stream
wherever you have defined this particular keyword to be
legitimate input. It will do whatever lookahead is
necessary in order to pick up the entire keyword. If
several keywords match the input, such as IF and IFF,
it will select the longest match, IFF in this example.

Important points to notice about keywords:
 1) Keywords take precedence over ordinary
characters in the input stream. Thus if the character
I and the keyword IF are both legitimate input at some
point, IF will be selected, if present, in preference
to I.
 2) Keywords are not reserved words. Your parser
will only look for a keyword when it is in a state
where that keyword is legitimate input.
 3) Keywords do not participate in character sets
and should not appear in definitions of character sets.
In particular, they are not considered as belonging to
the complement of a character set. Thus
Thus +a keyword would not be considered legitimate input +for the production + next char -> ~( '/' + '*' ) + + 4) Keywords may appear in virtual productions. + + 5) Keywords may be named by means of a definition. + +AnaGram will list all the keywords in your grammar in +the ©Keywordsª window. In addition, in numerous +windows where the cursor bar selects a state, the +©Auxiliary Windowsª popup menu will list a Keywords option. +This window will provide a list of the keywords +acceptable in the selected ©parser stateª. + +On occasion, a kind of conflict, called a ©keyword +anomalyª may occur. If so, such conflicts will be listed +in the ©Keyword Anomaliesª window. The "©stickyª" +©attribute statementª is useful in dealing with keyword +anomalies. +## + +Keyword Anomalies Found + +This ©warningª message indicates that AnaGram has found +at least one ©keyword anomalyª in your ©grammarª. Open +the ©Keyword Anomaliesª window to see a list of those +that have been found. +## + +Keyword Anomaly + +In ©syntax directed parsingª, it is assumed that input +©tokenªs can be uniquely identified. In the case of +©keywordªs, however, there is the possibility that the +individual characters making up the keyword, as well as +the keyword taken as a whole, could constitute +legitimate input under some circumstances. Thus +©keywordsª, though a powerful and useful tool, are not +completely consistent with the assumptions that underlie +©syntax directed parsingª. This can occasionally give +rise to a type of conflict, diagnosed by AnaGram, +called a "keyword anomaly". AnaGram is quite +conservative in its diagnoses, so that many keyword +anomalies it reports are actually innocuous and can be +safely ignored. + +Basically, a keyword anomaly is a situation where a +keyword is recognized, causes a reduction, and the +parser arrives in a state where the keyword is not +legal input. 
If the keyword, seen simply as a sequence +of characters, might have been legal input in the +original state, AnaGram notes the existence of a +keyword anomaly. + +If you have a keyword that causes a keyword anomaly and +it is actually a reserved word in your grammar, the +anomaly is by definition innocuous. You should use the +©reserve keywordsª statement to inform AnaGram that the +keyword is reserved and the anomaly need not be +diagnosed. + +To help identify and correct any problems associated +with keyword anomalies, AnaGram provides the ©Keyword +Anomaliesª window to identify the anomalies, and the +©Keyword Anomaly Traceª to help you understand a +particular anomaly. +## + +Keyword Anomaly Trace + +A Keyword Anomaly Trace is a ready made ©grammar traceª +window which you may select from the ©Auxiliary Windowsª +menu of the ©Keyword Anomaliesª window. The anomaly +trace provides a path to a state which illustrates the +©keyword anomalyª. In this state, the keyword is a +reducing token, but after the reduction, it is not +allowable input. +## + +Keyword Anomalies + +The Keyword Anomalies window is available only if your +grammar has ©keywordª anomalies. + +Each entry in the Keyword Anomalies window consists of +two lines. The first line identifies the ©parser stateª +at which the ©keyword anomalyª occurs and the offending +keyword. The second line identifies the ©grammar ruleª +which the keyword may erroneously reduce. + +The ©Auxiliary Windowsª menu provides three auxiliary +windows keyed directly to the anomaly to help you +determine the nature of the problem: The ©Keyword +Anomaly Traceª window, the ©Reduction Traceª window, and +the ©Rule Derivationª window. Three other windows provide +supporting information: the ©Reduction Statesª window, +the ©Rule Contextª window and the ©State Definitionª +window. +## + +Keywords + +The Keywords entry in the ©Browse Menuª pops up a +window which lists all of the keywords defined in your +©grammarª. 
The ©token numberª is also specified.

A Keywords window is also an option in the ©Auxiliary
Windowsª popup menu for any window which distinguishes
various states of your parser. The Keywords window will
show all of the ©keywordªs which will be recognized in
the state selected by the cursor bar in the parent
window.

The ©Auxiliary Windowsª menu for a Keywords window
provides a ©Token Usageª option which will allow you to
see all the uses of a particular keyword in your grammar.
##

left

"left" controls a ©precedence declarationª, indicating
that all of the listed ©rule elementsª are to be
considered ©left associativeª.
##

Left Associative

A binary operator is said to be left associative if
an expression with repeated instances of the operator
is to be evaluated from the left. Thus, for example,
 x = a/b/c

is normally taken to mean x = (a/b)/c. The division
operator is said to be left associative.

In ©grammarªs with ©conflictªs, you may use ©precedence
declarationªs to specify that an operator should be left
associative.
##

Lexeme

The "lexeme" ©attribute statementª is used to fine-tune
the "©disregardª" statement. The lexeme statement takes
the form:
 lexeme { T1, T2, ..., Tn }

where T1, ..., Tn is a list of ©nonterminalª tokens
separated by commas. Lexeme statements may be placed in
any ©configuration sectionª, and there may be any number
of them.

When you specify that a ©tokenª is to be disregarded,
AnaGram rewrites your ©grammarª so that the token will be
passed over whenever it occurs at the beginning of a
file or following a lexical unit, or "lexeme". If you
have no lexeme statement, then the lexemes in your
grammar are just the terminal tokens.

The lexeme statement allows you to specify that certain
nonterminal tokens are also to be treated as lexemes.
This means that the disregard token will be skipped
following the lexeme, but not between the characters
that constitute the lexeme.
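To see which blanks a lexeme declaration affects, consider the following hand-written C scanner. This is a swapped-in technique used purely for illustration, because AnaGram achieves the effect by rewriting the grammar rather than with code like this. The function next_lexeme() is hypothetical. It skips blanks before each lexeme, but the blank inside a quoted string stays part of the string lexeme.

```c
#include <ctype.h>
#include <stddef.h>

/* Find the next lexeme in s, store its start in *start, and return
   its length (0 at end of input). Blanks and tabs are disregarded
   before a lexeme, but a blank inside a quoted string is part of
   the string lexeme. An unterminated string runs to end of input. */
size_t next_lexeme(const char *s, const char **start)
{
    while (*s == ' ' || *s == '\t')         /* disregarded between lexemes */
        s++;
    *start = s;

    const char *p = s;
    if (*p == '"') {                        /* string lexeme: keep blanks */
        p++;
        while (*p != '\0' && *p != '"')
            p++;
        if (*p == '"')
            p++;                            /* include closing quote */
    } else if (isalnum((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))  /* name or number */
            p++;
    } else if (*p != '\0') {
        p++;                                /* any other single character */
    }
    return (size_t)(p - s);
}
```

For the input  name = "a b"  this yields three lexemes: name, =, and "a b". The blank between a and b survives because it lies within the string lexeme.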

Lexemes correspond to the tokens that a lexical scanner,
if you were using one, would commonly identify and pass
to a parser as single tokens. You don't usually wish to
disregard ©white spaceª within these tokens. For
example, in a grammar for a conventional programming
language where blank characters are to be disregarded,
you might include:
 [
 lexeme {string, character constant, name, number}
 ]

since blank characters must not be overlooked within
strings and constants, and should not be permitted
within names or numbers.

If your grammar allows for situations where successive
lexemes could run together if they were not separated
by space (a name followed by a number, for example), you
may use the "©distinguish lexemesª" ©configuration
switchª to force a separation between the tokens.

White space may be used explicitly within definitions of
lexeme tokens in your grammar if desired, without
causing conflicts. Thus, if you wish to allow embedded
space in variable names, you might write:
 [
 disregard space
 lexeme {variable name}
 ]
 space = ' ' + '\t'
 letter = 'a-z' + 'A-Z'
 digit = '0-9'

 variable name
 -> letter
 -> variable name, letter + digit
 -> variable name, space..., letter + digit
##

line

line is a field in your ©parser control blockª used for
keeping track of the line number of the current
character in your input. Line and column numbers are
tracked only if the ©lines and columnsª ©configuration
switchª has been set.
##

line length

Line length is an ©obsolete configuration parameterª.
##

Line Numbers

"Line numbers" is a ©configuration switchª which
defaults to off. If it is on, the ©Build Parserª
command will put "#line" directives into the generated
C code file so that your compiler diagnostics will
refer to lines in the ©syntax fileª rather than in the
generated C code file. For more information on the
"#line" directive, see Kernighan and Ritchie, second
edition, section A12.6.
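The effect of a #line directive can be checked directly in C. In this sketch, calc.syn is a hypothetical syntax file name. After the directive, the compiler treats the next source line as line 42 of calc.syn, so its diagnostics and the standard __LINE__ and __FILE__ macros will point there.

```c
#include <string.h>

/* After the directive below, the following line is treated as
   line 42 of the (hypothetical) syntax file calc.syn. */
#line 42 "calc.syn"
static const int reported_line = __LINE__;
static const char *const reported_file = __FILE__;

/* Returns 1 if the compiler attributes the lines above to calc.syn. */
int points_to_syntax_file(void)
{
    return reported_line == 42 && strcmp(reported_file, "calc.syn") == 0;
}
```

A generated parser uses the same mechanism in front of each reduction procedure, so an error in that code is reported against the syntax file.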

If the "line numbers" switch is off, AnaGram will put
comments into your parser file to help you find
reduction procedures and embedded C in your syntax
file.

Prior to AnaGram 2.01, if your C or C++ compiler required that the
backslashes in the pathname in the #line directive be doubled, you
would have used AnaGram's ©escape backslashesª switch to make this
happen. Although you may still use ©escape backslashesª, it should no
longer be necessary because AnaGram now puts forward slashes into #line
pathnames instead of backslashes.

If you wish, you may specify the pathname in the #line
directives explicitly by using the ©Line Numbers Pathª
configuration parameter.

You may also wish to change the "©parser file nameª"
parameter to provide a full path name for your parser
file.
##

Line Numbers Path

"Line Numbers Path" is a ©configuration parameterª
which takes a string value. It defaults to NULL.

When you have set the ©Line Numbersª ©configuration
switchª and Line Numbers Path is not NULL, AnaGram
uses it in the #line directive in place of the full
path name of your ©syntax fileª.

Note that Line Numbers Path should be the complete
pathname for your syntax file.

Line Numbers Path is useful when using AnaGram in cross
platform development. When parsers are to be compiled
and tested on a platform different from that used to run
AnaGram, you may use Line Numbers Path to provide a
pathname on the platform used for compiling and
testing.
##

Lines and Columns

"Lines and columns" is a ©configuration switchª which
defaults to on. When it is on, it causes the
©Build Parserª command to incorporate code into your
parser which will automatically track the line number
and column number of the input token.

You would normally set the "lines and columns" switch
when you are planning to build a parser which will read
an input file and which will need to diagnose ©syntax
errorsª with some precision.
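As an illustration of the bookkeeping involved, here is a plain C sketch of one way to track line and column numbers. It counts from one and treats a tab as advancing to the next tab stop. The struct, the function names and the tab width of 8 are assumptions made for illustration, not the code AnaGram generates.

```c
/* Hypothetical input position, counted from one. */
struct position {
    int line;
    int column;
};

/* Assumed tab stop interval for this sketch. */
#define TAB_WIDTH 8

void position_init(struct position *p)
{
    p->line = 1;
    p->column = 1;
}

/* Update the position to that of the character following c. */
void position_advance(struct position *p, int c)
{
    if (c == '\n') {
        p->line++;
        p->column = 1;
    } else if (c == '\t') {
        /* Advance to the next tab stop: columns 1, 9, 17, ... */
        p->column += TAB_WIDTH - (p->column - 1) % TAB_WIDTH;
    } else {
        p->column++;
    }
}
```

The tab rule is why the tab spacing matters. With a different width, every column number reported after a tab would shift.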
+ +Your parser will store the line and column numbers in +the ©lineª and ©columnª fields respectively in the +©parser control blockª. + +If the input to your parser includes tab characters, you +should either set the ©tab spacingª ©configuration +parameterª appropriately or provide a ©TAB_SPACINGª +macro for your parser. + +Your parser will count line and column numbers beginning +with one. +## + +Main Program + +The "main program" ©configuration switchª determines +what AnaGram does if you invoke the ©Build Parserª +command, but have no ©embedded Cª in your ©syntax +fileª. If the switch is on and you have not specified +©pointer inputª or an ©event drivenª parser, AnaGram +creates a main program which does nothing but call your +©parserª. The "main program" switch defaults to on. + +This feature, along with the default definitions for +©GET_INPUTª and ©error handlingª, makes it possible +to write a grammar with no ©embedded Cª or ©reduction +procedureªs whatsoever and still get an executable +program which will read input from stdin and parse it +according to your grammar. +## + +Marked Rule + +A "marked rule" is a ©grammar ruleª together with a +marked token that indicates how much of the rule has already +been matched. The ©marked tokenª and any tokens following it +indicate the input that should be expected if the +remainder of the rule is to be matched. + +When marked rules are displayed in AnaGram windows, the +marked token is represented by a difference in the font. The token may +be in bold face, underlined, italicized, shown with a different point +size, or in a different font altogether. Since AnaGram allows you to +change fonts to suit your own preferences, you should be careful that +the font you choose for the marked tokens allows them to be readily +distinguished from the other tokens in your grammar rules. An +underlined font is often suitable. 
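The idea of a marked rule can be made concrete with a small data structure. The following C sketch is hypothetical throughout: the struct, its fields and format_marked_rule() are illustrative only. Because plain text has no font changes, it shows the marked token in square brackets.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical marked rule: a grammar rule plus the index of the
   marked token, i.e. the next element the parser expects. */
struct marked_rule {
    const char *lhs;
    const char *const *rhs;   /* rule elements */
    int length;               /* number of rule elements */
    int mark;                 /* index of the marked token */
};

/* Write the rule into buf (size must be > 0) with the marked token
   bracketed; the marked token and those after it are the input still
   expected. Returns the length of the resulting string. */
size_t format_marked_rule(const struct marked_rule *r, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%s ->", r->lhs);
    for (int i = 0; i < r->length; i++) {
        if (n < 0 || (size_t)n >= size)
            break;                          /* output truncated; stop */
        const char *sep = (i == 0) ? " " : ", ";
        if (i == r->mark)
            n += snprintf(buf + n, size - (size_t)n, "%s[%s]", sep, r->rhs[i]);
        else
            n += snprintf(buf + n, size - (size_t)n, "%s%s", sep, r->rhs[i]);
    }
    return strlen(buf);
}
```

For the rule expression -> expression, '+', term with the mark at index 1, the result is  expression -> expression, ['+'], term . This says that an expression has been matched and '+' followed by term is still expected.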
+## + +Max conflicts + +The "max conflicts" ©configuration parameterª limits the +number of ©conflictªs AnaGram will record. Sometimes, a +simple error editing your syntax file can cause hundreds +of conflicts, which you don't need to see in gory +detail. The default value of max conflicts is 50. If you +have a grammar that is in serious trouble and you want +to see more conflicts, you may change max conflicts to +suit your needs. +## + +Missing + +The ©warningª message Missing <element 1> in <element 2> +indicates that AnaGram expects to see an instance of +syntactic element 1 at the specified location, internal +to an instance of syntactic element 2. AnaGram cannot +reliably continue parsing its input after an error of +this type. Therefore, it limits further analysis of +your grammar to scanning for syntax errors. +## + +Missing Production + +"Missing production, TXXX: <token name>" is a ©warningª +message which indicates that the specified ©tokenª +appears to be defined recursively, but there is no +initial ©productionª to get the recursion started. If +you get this warning, check your ©grammarª closely. +## + +Missing Reduction Procedure + +"Missing reduction procedure, RXXX" is a ©warningª +message which appears either when the ©grammar ruleª indicated +specifies a ©parameter assignmentª but does not have a +©reduction procedureª to use it, or when the rule has no reduction +procedure but the value of the token on the left hand side is used in +as an argument for some other reduction procedure and the ©default reduction valueª +does not have the same type as the token on the left hand side. +In this latter case, a reduction procedure may be needed to effect +correct type conversion. + +This warning is +provided in case the lack of a reduction procedure is an +oversight. 
+## + +Multiple Definitions + +"Multiple definitions for TXXX: <token name>" is a +©warningª message which indicates that the specified +©tokenª has been defined both as a ©character setª and +as a ©nonterminal tokenª. It cannot be both. +## + +Near Functions + +"Near Functions" is a ©configuration switchª that +defaults to off. It controls the use of the "near" +keyword for static functions in your parser. If your +parser is to run on an 80x86 processor, you might wish +to turn it on. Your parser will then be slightly +smaller and will run slightly faster. + +If you are going to run your parser on some other +processor or use a C or C++ compiler that does not +support the "near" keyword, you should make sure "near +functions" is set to off. +## + +Negative Character Code in Pointer Mode + +This ©warningª message appears if your ©grammarª defines +negative character codes and uses ©pointer inputª. If +your grammar uses the default definition for ©pointer +typeª, the parser will read unsigned characters, so +it will never see the negative codes that have +been defined. You may correct the problem by providing +your own definition of pointer type. +## + +Nest Comments + +"Nest comments" is a ©configuration switchª that +controls the treatment of ©commentsª +while scanning your ©syntax fileª. It defaults to off, +in accordance with the ANSI standard for C, which +disallows ©nested commentsª. Note that AnaGram scans +comments in any ©embedded Cª code as well as in the +grammar specification. You may turn this switch on and +off as many times as necessary in a single file. +## + +Nested Comment + +As delivered, AnaGram treats C style ©commentsª +according to the ANSI standard: they do not nest. For +those who prefer nested comments, however, the ©nest +commentsª ©configuration switchª allows them to nest.
+## + +Nesting too deep + +This ©warningª message indicates that ©set +expressionªs or ©virtual productionsª are +nested so deeply that they have exhausted the available +stack space, and AnaGram cannot continue its analysis. + +To reduce the nesting depth, use a ©definitionª statement to name an intermediate +level. +## + +no cr + +"no cr" is a ©configuration switchª which +defaults to off. When this switch is set, it will +cause the ©parser fileª and ©header fileª to be +written without carriage returns. This is convenient +if you wish to use the generated parser files in a +Unix environment. +## + +No Grammar Token Specified + +This ©warningª message appears if your ©grammarª does not +specify a ©grammar tokenª. Edit your ©syntax fileª to +specify one. +## + +No Productions in Syntax File + +This ©warningª message appears if AnaGram did not find +any ©productionsª at all in your ©syntax fileª. Check +that you have the right file. +## + +No Such Parameter + +This ©warningª message appears when AnaGram does not +recognize the name of a ©configuration parameterª you +have tried to set in your ©syntax fileª. Check the +spelling of the parameter you wish to set in the +©Configuration Parameters Windowª. +## + +No Terminal Tokens in Expansion + +"No terminal tokens in expansion of TXXX" is a ©warningª +message indicating that there are no terminal tokens +to be found in an expansion of the specified token. +Although there are a few circumstances where this could +be legitimate, it is more likely that there is a missing +rule in the grammar. +## + +Not a Character Set + +"Not a character set, TXXX: <token name>" is a ©warningª +message which indicates that the specified ©tokenª has +been used both on the left side of a ©productionª and in +a ©character setª expression defining some other token. +AnaGram will use an empty set in place of the +specified token in evaluating the ©character setª.
You +will get another warning, ©Error definingª token, when +AnaGram finishes its evaluation of the character set. +## + +Nothing Reduces + +"Nothing reduces TXXX -> RYYY" is a ©warningª message +which indicates that the ©grammarª does not specify any +input to follow an instance of the indicated ©grammar +ruleª. In all probability, the grammar does not have +any explicit end of file, or ©eof tokenª. If the grammar +does not have any conflicts with ©tokenª T000, then an +explicit end of file indicator is not necessary. +Otherwise, you should modify your grammar to require an +explicit end of file. +## + +Null Character in String + +This ©warningª message appears when AnaGram finds an +explicit null character in a quoted string. If you must +allow for a null in a ©keyword stringª, +you will have to rewrite your +©grammar ruleª. For instance, instead of + + widget + -> "abc\0def" + +write + + widget + -> "abc", 0, "def" +## + +nonassoc + +"nonassoc" is a keyword used in a ©precedence declarationª +to indicate that all of the listed ©rule elementsª are +to be considered non-associative. +## + +Nonterminal Token, Nonterminal + +A nonterminal token is one which is constructed from a +series of other tokens as specified by one or more +©productionªs. Nonterminal tokens are to be +distinguished from ©terminal tokenªs, which are the +basic input units appearing in your input stream. +Terminal tokens most often represent single characters +or a character belonging to a ©character setª such as +'a-z'. +## + +Null Production + +A "null production" is one that has no tokens on the +right hand side whatsoever. Since a null ©productionª +matches no input, it is identified by the first input +token that follows it. Null productions are extremely convenient +syntactic elements when you wish to make some input +optional. For example, suppose that you wish to allow an +optional semicolon at some point in your grammar.
You +could write the following pair of productions: + optional semicolon -> | ';' +Note that a null production can never follow a '|'. + +This could also be written on multiple lines thus: + optional semicolon + -> + -> ';' + +You can always rewrite your grammar to eliminate null +productions if you wish, but you usually pay a price in +conciseness and clarity. Sometimes, however, it is +necessary to do such a rewrite in order to avoid +©conflictªs, to which null productions are especially +prone. For example, suppose you have the following +production: + foo -> wombat, optional semicolon, widget + +You can rewrite this as two productions: + foo + -> wombat, widget + -> wombat, ';', widget + +This rewrite specifies exactly the same input language, +but is less prone to conflicts. On the other hand, it +may require significantly more table space in your +parser. + +If you have a null production with no ©reduction +procedureª specified, your parser will automatically +assign the value zero to the ©reduction tokenª. + +Null productions can also be generated by ©virtual +productionsª. + +A token that has a null production is a "©zero lengthª" +token. +## + +Old Style + +"Old Style" is a ©configuration switchª which defaults +to off. It controls the function definitions in the code +AnaGram generates. When "old style" is off, AnaGram generates +ANSI style calling sequences with prototypes as +necessary. When "old style" is on, it generates old +style function definitions. +## + +Output Files + +When you use the ©Build Parserª command to request +output from AnaGram, it creates two files: a ©parser +fileª and a ©parser headerª file. +## + +Page Length + +"Page length" is an ©obsolete configuration parameterª. +## + +Obsolete Configuration Parameter, Obsolete Configuration Switch + +A number of ©configuration parameterªs and ©configuration switchªes +which were used in the DOS version of AnaGram are no longer +used, but are still recognized for the sake of upward +compatibility.
These parameters include: + ©bottom marginª + ©line lengthª + ©page lengthª + ©top marginª + ©quick referenceª + ©video modeª + +## + +Parameter + +"Parameter <name> has type void" is a ©warningª message +which appears when a ©parameter assignmentª is attached +to a ©tokenª that has been defined to have the void +©data typeª. +## + +Parameter Assignment + +In any ©grammar ruleª, the ©semantic valueª of any +©rule elementª may be passed to a ©reduction procedureª +by means of a parameter assignment. Simply follow the +rule element with a colon and a C variable name. The C +variable name can then be used in the reduction +procedure to reference the semantic value of the token +it is attached to. AnaGram will automatically provide the +necessary declarations. + +Here are some examples of rule elements with parameter +assignments: + + '0-9':d + integer:n + expression:x + declaration : declaration_descriptor + +## + +Parameter Not Defined + +AnaGram does not have a ©configuration parameterª +with the specified name. Please check the spelling. +## + +Parameter Takes Integer Value + +The specified ©configuration parameterª takes +an integer value only. +## + +Parameter Takes String Value + +The specified ©configuration parameterª takes +a string value only. +## + +Parse Function + +To run your parser, you call the parse function. +The name of the parse function is given by +the ©parser nameª ©configuration parameterª and defaults to the +name of your syntax file. + +If your parser uses ©pointer inputª, you should set the ©pointerª +field of the ©parser control blockª before calling the parse +function. + +If your parser is ©event drivenª, you should first call the +©initializerª, and then call the parse function +once for each input token. + +If the ©reentrant parserª switch is set, the parse function takes +a pointer to the ©parser control blockª as its sole argument. Otherwise, +it takes no arguments. The parse function returns no value.
All +communication is by means of the ©parser control blockª. + +To retrieve the value of the ©grammar tokenª, once the parse is complete, +use the ©parser value functionª. +## + +Parser + +A parser is a program or, more commonly, a procedure within +a program, which scans a sequence of ©input charactersª +or input tokens and accumulates them in an input +buffer or stack as determined by a set of ©productionªs +which constitute a ©grammarª. + +When the parser discovers +a sequence of tokens as defined by a ©grammar ruleª, or +right hand side of a production, it "reduces" the +sequence to a single ©reduction tokenª as defined by the +left hand side of the grammar rule. This ©nonterminal +tokenª now replaces the tokens which matched the grammar +rule, and the search for matches continues. + +If an input +token is encountered which will not yield a match for +any rule, it is considered a ©syntax errorª and some +kind of ©error recoveryª may be required to continue. If +a match, or ©reduce actionª, yields the ©grammar tokenª, +sometimes called the ©goal tokenª or ©start tokenª, the +parser deems its work complete and returns to whatever +procedure may have called it. + +The ©Grammar Traceª and ©File Traceª functions in +AnaGram provide a convenient means for understanding the +detailed operation of a syntax-directed parser. + +©Tokensª may have ©semantic valuesª. If the ©input +valuesª ©configuration switchª is on, your parser will +expect semantic values to be provided by the input +process along with the token identification code. If the +input values switch is off, your parser will take the +ASCII value of the input character, that is, the actual +input code, as the value of the character. + +When the +parser reduces a production, it can call a ©reduction +procedureª or ©semantic actionª to analyze the values of +the constituent tokens. This reduction procedure can +then return a value which characterizes the reduced +token.
+## + +Parser Control Block + +A "Parser Control Block" is a structure which contains +all of the data necessary to describe the instantaneous +state of a parser. The typedef statement which defines +the structure is included in the ©parser headerª file +for your parser. AnaGram creates the name of the data +type for the structure by appending "_pcb_type" to the +©parser nameª. + +You may add your own declarations to the parser control +block by using the ©extend pcbª statement. + +If the ©declare pcbª ©configuration switchª is on, its +normal state, AnaGram will declare a parser control +block for you at the beginning of your parser file. +AnaGram will determine the name of the parser control +block by appending "_pcb" to the ©parser nameª. AnaGram +will also define the macro PCB as a shorthand notation +for use within the parser. All references to the parser +control block within the code that AnaGram generates +are made using the PCB macro. + +If you wish to declare your own parser control block, +you must include the ©parser headerª file for your +parser before your declaration. Then declare a +control block and define PCB to refer to the control +block you have declared. + +Suppose your grammar is called widget. You would then +write the following statements in your ©embedded Cª: + #include "widget.h" + widget_pcb_type widget_control_pcb; + #define PCB widget_control_pcb + +Alternatively, you could write the following: + #include "widget.h" + widget_pcb_type *widget_control_pcb_pointer; + #define PCB (*widget_control_pcb_pointer) + +and then allocate storage for the structure when +necessary.
+ +Some fields of interest in the parser control block are +as follows: + ©input_codeª + ©input_valueª + ©input_contextª + ©pointerª + ©token_numberª + ©reduction_tokenª + ©ssxª + ©snª + ©ssª[©parser stack sizeª] + ©vsª[parser stack size] + ©csª[parser stack size] + ©lineª + ©columnª + *©error_messageª + ©error_frame_ssxª + ©error_frame_tokenª +## + +PCB + +"PCB" is a macro AnaGram defines for use in the code it +generates to refer to the ©parser control blockª for +your ©parserª. Normally, AnaGram automatically declares +storage for a parser control block and defines PCB for +you. If you turn off the ©declare PCBª switch, you may +define PCB yourself. +## + +PCB_TYPE + +If you are writing your parser in C++, you may prefer to derive +a class from the ©parser control blockª rather than use the +©extend pcbª statement. In this case, you may define the +PCB_TYPE macro in your syntax file to specify your derived +class. + +For instance, suppose you have defined + +class MyPcb : public parser_pcb_type {...}; + +You would then add the following line: + +#define PCB_TYPE MyPcb + +If you do not define PCB_TYPE, AnaGram will define it as the +type of your parser control block. +## + +Parser File + +The "parser file" is the C (or C++) file output by AnaGram when +you execute the ©Build Parserª command. It contains all +of the ©embedded Cª from your ©syntax fileª, all of the +©reduction procedureªs defined in your ©grammarª, +syntax tables which represent, in a condensed form, all +of the intricacies of your grammar, and a customized +©parsing engineª. The name of the parser file is given +by the ©parser file nameª ©configuration parameterª. The +name of the ©parserª itself is given by the ©parser +nameª configuration parameter. + +If you wish the parser file to be written without carriage +returns, suitable for a Unix environment, set the ©no crª +configuration switch. +## + +Parser File Name + +"Parser file name" is a ©configuration parameterª which +takes a string value.
The default value is "#.c". +AnaGram uses this parameter to generate the name of the +output C file, or ©parser fileª, created by the ©Build +Parserª command. The '#' character is used in this +string as a wild card to indicate the name of the +current ©syntax fileª. If the first character of the +parser file name string is a '.' character, AnaGram +will substitute the name of the current working +directory for the dot. Thus ".\\#.c" will create the +file name as a complete path. This can sometimes be +important when using the ©line numbersª switch to +enable a debugger to find code in your parser file. + +Note that the parser file name is not the same as the +©parser nameª. +## + +Parser Generator + +A parser generator, such as AnaGram, is a program that +converts a ©grammarª, a rule-based description of the +input to a program, into a conventional, procedural +module called a ©parserª. The parsers AnaGram generates +are simple C modules which can be compiled on almost +any platform. AnaGram parsers are also compatible with +C++. +## + +Header File, Parser Header + +When you use the command ©Build Parserª to generate +source code for a parser, AnaGram creates two files, a +header file and a C source file. Unless different +paths are specified in the ©parser file nameª and +©header file nameª parameters, both files will be +written to the directory that contains the ©syntax fileª. + +The header file contains a number of typedef statements, +including the definition of the ©parser control blockª, +and a number of macro +definitions which may be useful in your parser +or in other modules of your program. + +If you do not alter +the ©header file nameª parameter, the +name of the header file will be the same as the name of +your ©syntax fileª and it will have the extension ".h". + +If you wish the header file to be written without carriage +returns, suitable for a Unix environment, set the ©no crª +configuration switch. 
+## + +Parser Input + +AnaGram ©parserªs may be configured to accept input in any of +three different ways: + + By default, a ©parse functionª gets its input by invoking the +©GET_INPUTª macro each time it is ready for another input token. The +default implementation of GET_INPUT reads ©input characterªs from stdin. For +most practical problems, you will want to override this definition of +GET_INPUT, storing the current input character in PCB.input_code. + + Alternatively, you may configure a parser to read input from an +array in memory. Set the ©pointer inputª switch and load the +©pointerª field of the parser control block before calling the +parse function. The parser will then run, incrementing the +pointer, until it finishes or encounters an error. + + The third alternative is to set the ©event drivenª switch. The +parser will be configured as a callback routine. Begin by calling +the ©initializerª. Then, for each input character, store the +character in the ©input_codeª field of the parser control block and +call the parse function. Each time +you call the parse function, it will continue until it needs more +input. You can check its status by inspecting the ©exit_flagª in the +parser control block. + +The input to your parser may be either text characters or ©tokensª +accumulated by a preprocessor, or ©lexical scannerª. The latter +case is referred to as ©token inputª. If you use a lexical scanner, +you may find it convenient to configure your parser as event driven. + +Although lexical scanners are often not necessary +when you use AnaGram, if you do need one, you can write it in AnaGram. +## + +Parser Name + +"Parser Name" is a ©configuration parameterª which +defaults to "#", where "#" represents the name of your +©syntax fileª. AnaGram uses this parameter to name your +©parse functionª. The ©initializerª for your parser will have the +same name preceded by "init_".
Note that "©parser file +nameª" is not the same configuration parameter as "parser +name". +## + +Parser Stack + +Your ©parserª uses a "parser stack" to keep track of the +©grammar rulesª it is trying to match and its progress +in matching them. Normally, there are two separate +stacks defined by AnaGram: ©PCBª.©ssª, the ©parser state +stackª, which maintains ©parser stateª numbers, and +PCB.©vsª, the ©parser value stackª, which maintains the +©semantic valueªs of tokens that have been identified so +far. If you wish to maintain a stack tracking other +variables, you may set the ©context typeª ©configuration +parameterª, and AnaGram will define a third stack, +PCB.©csª. All are indexed by the same stack index, +PCB.©ssxª. + +To see how tokens accumulate on the parser stack, run +the ©Grammar Traceª or the ©File Traceª. + +Normally, when the return value of a ©reduction procedureª +is stored on the parser value stack, it is stored by +simply coercing the stack pointer to the correct type. +If the return value is a C++ object, this can cause +serious problems. These problems can be avoided by +using the ©wrapperª statement. +## + +Parser stack alignment + +Parser stack alignment is a ©configuration parameterª whose +value is a C or C++ data type. It defaults to "long". If +any tokens have type "double", it will be automatically set +to double. Thus, you will normally not need to change this +parameter if your parser is to run on a PC or compatible +processor. It provides alignment control for processors +which restrict the addresses used for multibyte data access. The +default setting provides for correct operation on 64-bit +processors. + +To control byte alignment of the parser stack, +©PCBª.©vsª, AnaGram normally adds a field of the +specified data type to the "union" statement which +defines the data type for the ©parser stackª. This +parameter can be used to deal with byte alignment +problems when a ©parserª is to be run on a processor +with byte alignment restrictions.
For instance, if your +©grammarª has ©tokenªs of type "long double" and your +processor requires long double variables to be +properly aligned, you can include the following +statement in a ©configuration sectionª in your grammar +or in your ©configuration fileª: + + parser stack alignment = long double + +If the data type specified is "void", no alignment declaration +will be made. +## + +Parser Stack Index, Stack Index + +The parser stack index, ©PCBª.©ssxª, tracks the depth +of the ©parser state stackª, the ©parser value stackª, +and the ©context stackª if you have defined one. The parser +stack index is incremented by ©shift actionsª and +decreased by ©reduce actionsª. +## + +Parser Stack Overflow + +Your ©parserª uses a ©parser stackª to keep track of the +©grammar rulesª it is trying to match and its progress +in matching them. If your grammar has any ©recursive +ruleªs that are not strictly left recursive, then no +matter how big you make the parser stack, it will be +possible to create a syntactically correct input which +will cause the stack to overflow. As a practical matter, +however, it is usually possible to set the ©parser stack +sizeª to a value large enough that an overflow is a +freak occurrence. Nevertheless, it is necessary to check +for overflow, and should an overflow occur, +your parser has to do something. What it does is invoke +the ©PARSER_STACK_OVERFLOWª macro. If you don't define +it, AnaGram will define it for you, although not +necessarily to your taste. +## + +Recursive rule, Recursion + +A ©grammar ruleª is said to be "recursive" if the ©tokenª on the left side +of the rule also appears on the right side of the rule, or +in an ©expansion ruleª of any token on the right side of the rule. + +If the token on the left side is the +first token on the right side, the rule is said to be "left recursive". +If it is the last token on the right side, the rule is said to be +"right recursive". Otherwise, the rule is "center recursive".
+ +For example: + statement list + -> statement + -> statement list, statement // left recursive + + fraction part + -> digit + -> digit, fraction part // right recursive + + expression + -> factor + -> expression, '+' + '-', factor + + factor + -> primary + -> factor, '*' + '/', primary + + primary + -> number + -> name + -> '(', expression, ')' // center recursive + +Note that if all the tokens in the rule other than the recursive token itself +are ©zero lengthª tokens, it is possible for the +rule to be matched arbitrarily many times without any input whatsoever. In +other words, such a rule creates an infinite loop in the parser. AnaGram can +detect this condition and issues an ©empty recursionª diagnostic if it occurs. + +## + +PARSER_STACK_OVERFLOW + +PARSER_STACK_OVERFLOW is a user-definable macro. If you +do not define it, AnaGram will define it so that it +will print a message on stderr and abort the ©parserª in +case of a ©parser stack overflowª. +## + +Parser Stack Size + +"Parser stack size" is a ©configuration parameterª with +a default value of 128. It is used to define the sizes +of your ©parser stacksª in your ©parser control blockª. +When analyzing your grammar, AnaGram will determine the +minimum amount of stack space required for the deepest +left ©recursionª. To this depth it will add one half the +value of the parser stack size parameter. It will then +set the actual stack size to the larger of this value +and the parser stack size parameter. +## + +Parser State, State Number + +The essential part of your ©parserª is a group of tables +which describe in detail what to do for each "state" of +the parser. + +The states of a parser are determined by sets of +"©characteristic rulesª". The ©State Definition Tableª +shows the characteristic rules for each state of your +parser. + +AnaGram numbers the states of a parser as it identifies +them, beginning with zero.
In all windows, state numbers +are displayed as three-digit numbers prefixed with the +letter 'S'. +## + +Parser State Stack, State Stack + +The parser state stack is a stack maintained by your +©parserª; it is an integral part of the parsing +process. At any point in the parse of your input +stream, the parser state stack provides a summary of +what has been found so far. The parser state stack is +stored in ©PCBª.©ssª and is indexed by PCB.©ssxª, the +©parser stack indexª. +## + +Parser Value Stack, Value Stack + +In parallel with the ©parser state stackª, your parser +maintains a "value stack", ©PCBª.©vsª, each entry of +which corresponds to the ©semantic valueª of the token +identified at that state. Since the semantic values of +different tokens might well have different ©data typeªs, +AnaGram gives you the opportunity, in your ©syntax +fileª, to define the data type for any token. AnaGram +then builds a typedef statement creating a data type +which is a union of all the types you have defined. +AnaGram creates the name for this ©data typeª by +appending "_vs_type" to the ©parser nameª. AnaGram uses +this data type to define the value stack. +## + +Parser Action + +In a traditional LR parser, there are only four actions: the ©shift +actionª, the ©reduce actionª, the ©accept actionª and the ©error +actionª. AnaGram, in doing its ©grammar analysisª, identifies a +number of special cases and creates a number of extra actions which +make for faster processing, but which can be represented as +combinations of these primitive actions. + +When a shift action is performed, the current state +number is pushed onto the ©parser state stackª and the +new state number is determined by the current state +number and the current input token. Different tokens +cause different new states.
+ +When a reduce action is performed, the length of the +rule being reduced is subtracted from the ©parser stack +indexª and the new state number is read from the top of +the parser state stack. The ©reduction tokenª for the +rule being reduced is then used as an input token. +## + +Parsing Engine + +A parser consists of three basic components: A set of +syntax tables, a set of ©reduction procedureªs and a +parsing engine. The parsing engine is the body of code +that interprets the parsing table, invokes input +functions, and calls the reduction procedures. The +©Build Parserª command configures a parsing engine +according to the implicit requirements of the syntax +specification and according to the explicit values of +the ©configuration parameterªs. + +The parsing engine itself is a simple automaton, +characterized by a set of states and a set of inputs. +The inputs are the tokens of your grammar. Each state +is represented by a list of tokens which are admissible +in that state and for each token a ©parser actionª to perform +and a parameter which further defines the action. + +Each state in the grammar, with the exception of state +zero, has a ©characteristic tokenª which must have been +recognized in order to jump to that state. Therefore, +the ©parser state stackª, which is essentially a list +of state numbers, can also be thought of as a list of +token numbers. This is the list of tokens that have +been seen so far in the parse of your input stream. +## + +Partition + +If you use ©character setsª in your grammar, AnaGram +will compute a "partition" of the ©character universeª. +This partition is a collection of non-overlapping +character sets such that every one of the sets you have +defined can be written as a ©unionª of partition sets. + +Each partition set is assigned a unique ©tokenª. 
If one +of your character sets requires more than one partition +set to represent it, AnaGram will create appropriate +©productionªs and add them to your grammar so your parser +can make the necessary distinctions. + +To see how AnaGram has partitioned the character +universe, you may inspect the ©Partition Setsª window +found in the ©Browse Menuª. +## + +Partition Set Number + +Each ©partitionª set is identified by a unique +reference number called the partition set number. +Partition set numbers are displayed in the form Pxxx. +Partition sets are numbered starting with zero, so the +first set is P000. + +To see the elements of a given partition set, call up +the ©Partition Setsª window from the ©Browse Menuª, +then, after selecting a partition set, call up the ©Set +Elementsª window from the ©Auxiliary Windowsª popup menu. +## + +Partition Sets + +The Partition Sets option in the ©Browse Menuª pops up +a window which shows the complete ©partitionª of the +©character universeª for your parser. + +The Partition Sets option in the ©Auxiliary Windowsª popup menu +for the ©Character Setsª window lets you see the +partition sets which cover the specified character set. + +Each entry in a Partition Sets window identifies a +token number and a ©partition set numberª. The ©Auxiliary +Windowsª menu provides a ©Set Elementsª entry which +enables you to see precisely which characters belong to +the partition set. It also has a Token Usage entry to show you +what rules the set is used in. +## + +PCONTEXT + +PCONTEXT is an alternate form of the ©CONTEXTª macro +which takes an explicit argument to specify the +©parser control blockª. PCONTEXT is defined in the ©parser +headerª file. +## + +PERROR_CONTEXT + +PERROR_CONTEXT is an alternative form of the +©ERROR_CONTEXTª macro. It differs only in that it takes +an argument so you can specify the appropriate +©parser control blockª explicitly. PERROR_CONTEXT is defined in +the ©parser headerª file. 
+## + +pointer + +"pointer" is a field which will be included in the +©parser control blockª for your parser if you have set +the ©pointer inputª ©configuration switchª. Your main +program should set PCB.pointer before it calls your +parser. Thereafter, your parser will increment it +appropriately. While a ©reduction +procedureª or the ©SYNTAX_ERRORª macro is executing, PCB.pointer will +always point to the next input character to be read. +## + +Pointer input + +"Pointer input" is a ©configuration switchª which you +may set to control ©parser inputª. It defaults to off. When you set +pointer input, you tell AnaGram that the input to your parser is in +memory and can be scanned simply by incrementing a pointer. Before +calling your parser, you should make sure that ©PCBª.©pointerª is +properly initialized to point to the first character or token in your +input. + +Use the ©configuration parameterª "©pointer typeª" to +specify the type of the pointer. The default value of +"pointer type" is "unsigned char *". + +There is no particular reason why pointer type should +be limited to variants on char. It could define a +pointer to int or a structure just as well. + +If you use pointer input with structures or C++ +classes, you should set the ©input valuesª switch and +define an ©INPUT_CODEª(t) macro. + +If you are using a 16-bit compiler and your input array +is so large that you need "huge" +pointers, make sure that "pointer type" is properly +defined. +## + +Pointer Type + +"Pointer Type" is a ©configuration parameterª which +defaults to "unsigned char *". When you have specified +©pointer inputª, AnaGram uses the value of pointer type +to declare a pointer field in your ©parser control +blockª. +## + +Precedence, Operator Precedence + +In expressions of the form a+b*c, the convention is to +perform the multiplication before the addition. +Multiplication is said to take precedence over +addition.
In general the rank order in which operations +are to be performed if there are no parentheses forcing +an order of computation is called the precedence of the +operators. + +If you have an ambiguous ©grammarª, that is, a grammar +with a number of ©conflictªs, you may use ©precedence +declarationªs to resolve the conflicts and to set +operator precedence. +## + +Precedence Declaration + +Precedence declarations are ©attribute statementsª which +may be used to resolve ©conflictªs in your grammar by +assigning precedence and associativity to operators. +Precedence declarations must be made inside +©configuration sectionsª. Each declaration consists of +the keyword ©leftª, ©rightª, or ©nonassocª followed by a +list of ©rule elementsª. The rule elements in the list +must be separated by commas and the entire list must be +enclosed in braces ({ }). + +Each of the rule elements is assigned the same +precedence level, which is higher than that assigned in +all previous precedence declarations and lower than that +in all subsequent declarations. The rule elements are +defined to be left, right, or nonassociative, +depending on whether the keyword was "left", "right", or +"nonassoc". + +All conflicts which are resolved by precedence +declarations are listed in the ©Resolved Conflictsª +window. +## + +Precedence Rules + +AnaGram can resolve certain types of ©conflictªs in your +grammar by applying precedence rules. There are three +classes of rules available: explicit ©precedence +declarationsª, the "©stickyª" statement, and the +implicit rule associated with the use of a "©disregardª" +token outside a ©lexemeª. + +Whenever AnaGram uses a precedence rule of any kind to +resolve a conflict, it produces a ©warningª message and +lists the conflict in the ©Resolved Conflictsª window. +## + +Previous States + +The Previous States window can be accessed via the +©Auxiliary Windowsª popup menu from any window that identifies +©parser stateªs. 
It shows the ©characteristic ruleªs +for all of the states which jump to the presently +selected state. +## + +Print File Name + +"Print file name" is a configuration parameter which +is not used in the Windows version of AnaGram. It is +retained only for compatibility with pre-existing +©configuration fileªs. +## + +Problem States + +The Problem States window is essentially a trimmed +version of the ©Reduction Statesª window. It is +available in the ©Auxiliary Windowsª popup menu for the +©Conflictsª and ©Resolved Conflictsª windows. + +The Problem States window has the same format as the +Reduction States window, and differs only in that it +shows only those reduction states for which the +©conflict tokenª is acceptable input. +## + +Production + +Productions are the mechanism you use to describe how +complex input structures are built up out of simpler +ones. Each production has a left hand side and a right +hand side. The right hand side, or ©grammar ruleª, is a +sequence of ©rule elementsª, which may represent either +©terminal tokensª or ©nonterminal tokensª. The left +hand side is a list of ©reduction tokensª. In most +cases there would be only a single reduction token. +Productions with more than one ©tokenª on the left hand +side are called ©semantically determined productionsª. + +The "->" symbol is used to separate the left hand side +from the right hand side. If you have several +productions with the same left hand side, you can avoid +rewriting the left hand side either by using '|' or by +using another "->". + +A ©null productionª, or empty right hand side, cannot +follow a '|'. + +Productions may be written thus: + name + -> letter + -> name, digit + +This could also be written + name -> letter | name, digit + +In order to accommodate semantic analysis of the data, +you may attach to any grammar rule a ©reduction +procedureª which will be executed when the rule is +identified. Each token may have a ©semantic valueª. 
By +using ©parameter assignmentªs, you may provide the +reduction procedure with access to the semantic values of +tokens that comprise the grammar rule. When it finishes, the +reduction procedure may return a value which will be +saved on the ©parser value stackª as the semantic value of the +©reduction tokenª. +## + +Productions + +The ©Productionªs window is available via the ©Auxiliary +Windowsª popup menu in any window which identifies tokens. +If the token identified by the highlighted line is +©nonterminalª, the Productions window will show the +rules produced by that ©tokenª. +## + +PRULE_CONTEXT + +PRULE_CONTEXT is an alternative form of the +©RULE_CONTEXTª macro. It differs only in that it takes +an argument so you can specify the appropriate ©parser control blockª +explicitly. PRULE_CONTEXT is defined in +the ©parser headerª file. +## + +Quick Reference + +"Quick reference" is an ©obsolete configuration switchª. +## + +Range Bounds Out of Order + +This is a ©warningª message that appears when you have a +©character rangeª of the form 'z-a'. AnaGram interprets +this range as being equal to 'a-z', but provides a +warning in case the unusual order was the result of a +clerical error. +## + +Recursive Definition of Char Set + +This ©warningª appears when AnaGram discovers a +recursively defined ©character setª. Character sets +cannot be defined recursively. +## + +Redefinition + +"Redefinition of <name>" is a ©warningª message which +appears when AnaGram discovers a redefinition of a +©symbolª. The new ©definitionª is ignored. +## + +Redefinition of Grammar Token + +This ©warningª appears when AnaGram encounters a new +definition of the ©grammar tokenª. AnaGram discards the +old definition. The last definition in the syntax file +wins. If you get this warning, check your ©syntax fileª +to make sure you have the grammar token you want. 
+## + +Redefinition of token + +"Redefinition of token, TXXX: <name>" is a ©warningª +message which occurs when AnaGram encounters a +©definitionª statement and the specified ©grammar tokenª +has already been seen on the left side of a +©productionª. AnaGram will ignore the definition +statement. +## + +Reduce Action, Reduction + +The reduce action, or reduction, is one of the four +actions of a traditional ©parsing engineª. The reduce +action is performed when the parser has succeeded in +matching all the elements of a ©grammar ruleª, and the +next input token is not erroneous. Reducing the grammar +rule amounts to subtracting the length of the rule from +the ©parser stack indexª, identifying the ©reduction +tokenª, stacking its ©semantic valueª and then doing a +©shift actionª with the reduction token as though it had +been input directly. +## + +Reduce-Reduce Conflict + +A grammar has a "reduce-reduce" ©conflictª at some +state if a single token turns out to be a ©reducing +tokenª for more than one ©completed ruleª. +## + +Reducing Token + +In a ©parser stateª with more than one ©completed ruleª, +your parser must be able to determine which one was +actually found. Therefore, during analysis of your +grammar, AnaGram examines each completed rule in order +to determine all the states the ©parserª will branch to +once the rule is reduced. These states are called the +"reduction states" for the rule. In any window that +displays ©marked ruleªs, these states may be found in +the ©Reduction Statesª window listed in the ©Auxiliary +Windowsª popup menu. + +The acceptable input tokens for those states are the +"reducing tokens" for the completed rules in the state +under investigation. If there is a single token which is +a reducing token for more than one rule, then the +grammar is said to have a ©reduce-reduce conflictª at +that state. 
If in a particular state there is both a +©shift actionª and a ©reduce actionª for the same token, +the grammar is said to have a ©shift-reduce conflictª in +that state. + +Note that a "reducing token" is not the same as a +"©reduction tokenª". +## + +Reduction Choices + +"Reduction choices" is a ©configuration switchª which +defaults to off. If it is set, AnaGram will include in +your ©parser fileª a function which will identify the +acceptable choices for ©reduction tokenª in the current +state. This function, of course, is useful only if you +are using ©semantically determined productionsª. The +prototype of this function is: + int $_reduction_choices(int *); + where '$' represents the name of your parser. You must +provide an integer array whose length is at least +the maximum number of reduction choices you +might have. The function will fill the array with +the token numbers of the reduction tokens which are acceptable in the +current state and will return a count of the number of +acceptable choices it found. +## + +reduction_token + +"reduction_token" is a field in your ©parser control +blockª. If your grammar uses ©semantically determined +productionsª, your ©reduction procedureªs need a +mechanism to specify which token the rule is to reduce +to. ©PCBª.reduction_token contains the +©token numberª of the ©reduction tokenª. +Prior to calling your reduction procedure, your parser +will set this field to the token number of the default +©reduction tokenª, i.e., the leftmost syntactically correct token in the +reduction token list for the production being reduced. +If the reduction procedure establishes that a different +reduction token is appropriate, it should store the +appropriate token number in PCB.reduction_token. +## + +Reduction Procedures + +The Reduction Procedures window lists the C function +prototypes for the ©reduction procedureªs in your grammar.
+ +When this window is active, the ©syntax fileª window, if +visible, is synchronized with it so you can see the body of +the reduction procedure as well as its usage. +## + +REDUCTION_TOKEN_ERROR + +REDUCTION_TOKEN_ERROR is a user definable macro which your ©parserª +invokes when it encounters an inadmissible reduction +token. This error should occur only if your parser uses +©semantically determined productionsª and your +©reduction procedureª provides an incorrect ©token +numberª. If you do not define it, AnaGram will define +it so that it will print an error message on stderr and +abort the parse. + +## + +Reduction Procedure, Semantic Action + +A "reduction procedure", or "semantic action", is a +function you write which your ©parserª executes when it +has identified the grammar rule to which the reduction +procedure is attached in your grammar. + +When your parser has identified a particular ©grammar +ruleª, that is to say, a particular sequence of ©tokensª +that you have specified in your grammar, it "reduces" +the production to the token at the head of the +production, or ©reduction tokenª. + +If you choose, you can +specify a "reduction procedure" which your parser will +call so that your program can do semantic analysis on +the production just identified. Your reduction procedure +will be called using, as arguments, the ©semantic +valuesª of tokens on the right side of the production. + +Your reduction procedure may, if you choose, return a +value which will become the semantic value of the +reduction token. Since many of the tokens in +©productionªs are there for only syntactic purposes, you +may specify, when you write your grammar, the tokens +whose values are needed as arguments for your reduction +procedure. + +To attach a reduction procedure to a grammar rule, just +write it immediately following the rule. There +are two formats for reduction procedures, +depending on the size and complexity of the procedure. 
+ +The first form consists of an equal sign followed by a C +expression and a semicolon. When the rule is matched, the +expression will be evaluated and its value will be +stacked on the ©parser value stackª as +the value of the reduction token. For example: + =-a; + =myProcedure(x, q); + +The second form consists of an equal sign followed by a +block of C code enclosed in curly braces. If you wish to +return a value for the reduction token, you have to use a +return statement. For example: + ={ + if (x > y) return x; + return x + 2*y; + } + +In both forms of the reduction procedure, ©parameter +assignmentªs may be attached to ©rule elementªs in +order to make their semantic values available to the reduction +procedure. When the reduction procedure is executed, +local variables +will be defined with the names specified in the parameter +assignments. The values of these variables +will have been set to the semantic values of the corresponding +tokens. + +If the return value of your reduction procedure is a +C++ object, you may wish to specify that AnaGram +enclose it in a ©wrapperª so that constructor calls +and destructor calls are made. Otherwise, the object is +pushed onto and popped from the parser value stack simply by +coercing the stack pointer to the appropriate type. + +The reduction procedures in your grammar are summarized +in the ©Reduction Proceduresª window. +## + +Reduction States + +The Reduction States window can be accessed via the +©Auxiliary Windowsª popup menu from any window which displays +©parser stateª numbers and ©marked ruleªs. If the highlighted +©grammar ruleª has no marked token, the Reduction States window will +show the states the parser could reach by reducing the rule and +processing the resultant ©reduction tokenª. +## + +Reduction Token + +A ©tokenª which appears on the left hand side of a +©productionª is called a reduction token.
It is so +called because when the ©grammar ruleª on the right side +of the production is matched in the input stream, your +©parserª will "reduce" the sequence of tokens which +matches the rule by replacing the sequence of tokens +with the reduction token. + +If more than one +reduction token is specified, +the production is called a ©semantically determined productionª +and your ©reduction procedureª +should choose the appropriate reduction token. If it does not, your parser +will use the first token in the list that is syntactically +correct as the default. + +The ©CHANGE_REDUCTIONª macro can be used to specify the reduction +token. + +Note that a "reduction token" is not the same as a +"©reducing tokenª". +## + +Reduction Trace + +The Reduction Trace window is available from the +©Conflictsª window and the ©Resolved Conflictsª window. +It can be used in conjunction with the ©Conflict Traceª +to study ©conflictªs. The Reduction Trace represents the +result of taking the reduce option in the conflict state +of the Conflict Trace. +## + +Reentrant Parser + +"Reentrant parser" is a ©configuration switchª which defaults to off. +If it is on when AnaGram builds a parser, AnaGram will generate code that +passes the pointer to the ©parser control blockª as a function argument, +rather than using static references to the pcb. + +You can use the reentrant parser switch to help make ©thread safe +parsersª. + +The reentrant parser switch is compatible with both C and C++. + +The reentrant parser switch cannot be used in conjunction with +the ©old styleª switch. + +When you have enabled the reentrant parser switch, the ©parse functionª, +the ©initializerª function, and the ©parser value functionª +will be defined to take a pointer to the parser control block as +their sole argument. +## + +Reload Button + +The ©File Traceª window includes a reload button to allow +you to reread your ©test fileª after you have modified +it without having to start a new file trace.
After the +file has been reread, the file trace is reset. +## + +rename macro + +AnaGram uses a number of macros in its generated code. +It is possible, therefore, to run into naming +collisions with other components of your program. The +rename macro ©attribute statementª allows you to change +the name AnaGram uses for a particular macro to avoid +these problems. + +For example, in the Microsoft +Foundation Classes, V4.2, there is a class called +"CONTEXT". If you use the ©context stackª option in +AnaGram, your ©parserª will have a macro called +©CONTEXTª. To avoid the name collision, add the +following attribute statement to any configuration +section in your grammar: + rename macro CONTEXT AG_CONTEXT +Then, simply use "AG_CONTEXT" where you would otherwise +have used "CONTEXT". +## + +reserve keywords + +"reserve keywords" is an ©attribute statementª which +can be used to specify a list of ©keywordªs that are +reserved and cannot be used except as explicitly +specified in the grammar. In particular, this statement +enables AnaGram to avoid issuing meaningless ©keyword +anomalyª warnings. + +AnaGram does not automatically presume that keywords +are also reserved words, since in many grammars there +is no need to specify reserved words. + +Reserve keywords statements must be made inside +©configuration sectionsª. Each statement consists of +the keyword "reserve keywords" followed by a list of +keyword ©tokensª. The tokens must be separated by +commas and the list must be enclosed in braces ({ }). +Each keyword listed will then be treated as a reserved +word. +## + +Reset Button + +The Reset button, found on ©File Traceª and ©Grammar +Traceª windows, restores the initial configuration of +the trace. This is especially convenient for ©Conflict +Traceª or other ©Auxiliary Traceªs.
+## + +Resolved Conflicts + +AnaGram creates the Resolved Conflicts window only when +the grammar it is analyzing has ©conflictªs and when +those conflicts have been resolved by ©precedence +declarationªs, by "©stickyª" statements, or in +connection with the explicit use of a token specified in +a ©disregardª statement. The Resolved Conflicts window +shows the conflicts that have been resolved, using the +same format as that of the ©Conflictsª window. The rule +chosen is marked with an asterisk in the leftmost column +of the window. +## + +Resynchronization + +"Resynchronization" is the process of getting your +parser back in step with its input after encountering a +©syntax errorª. As such, it is one method of ©error +recoveryª. Of course, you would resynchronize only if it +is necessary to continue after the error. There are +several options available when using AnaGram. You could +use the ©auto resynchª option, which causes AnaGram to +incorporate an automatic resynchronizing procedure into +your parser, or you could use the ©error token +resynchronizationª option, which is similar to the +technique used by YACC programmers. +## + +right + +"right" controls a ©precedence declarationª, indicating +that all of the listed ©rule elementsª are to be +considered ©right associativeª. +## + +Right Associative + +A binary operator is said to be right associative if +an expression with repeated instances of the operator +is to be evaluated from the right. Thus, for example, +when '=' is used as an assignment operator, + x = a = b +is normally taken to mean a = b followed by x = a. The +assignment operator is said to be right associative. + +In ©grammarªs with ©conflictªs, you may use ©precedence +declarationªs to specify that an operator should be +right associative. +## + +Rule Context + +The Rule Context window can be accessed via the +©Auxiliary Windowsª menu in any window that displays +©grammar ruleªs.
AnaGram displays all occurrences in the +©grammarª of all the ©reduction tokenªs for the rule. +## + +RULE_CONTEXT + +RULE_CONTEXT is a macro you may use if you have defined +a ©context stackª. In any reduction procedure, +RULE_CONTEXT will be a pointer to the context value +stacked before the first token of the rule being +reduced. Since the context stack contains an entry for +each token in the rule, you may inspect the context +value for each token in the rule by subscripting +RULE_CONTEXT. RULE_CONTEXT[k] is the context of the +(k+1)th token in the rule, so RULE_CONTEXT[0] is the +context of the first token. +## + +Rule Coverage + +"Rule Coverage" is the name of both a ©configuration +switchª and a window. The configuration switch +defaults to off. If you set it, AnaGram will include +code in your ©parserª to count the number of times your +parser identifies each ©grammar ruleª in your grammar. +To maintain the counts, AnaGram declares, at the +beginning of your parser, an integer array whose name +is created by appending "_nrc" to your ©parser nameª. +The array contains one counter for each rule you have +defined in your grammar. There are no entries for the +auxiliary rules that AnaGram creates to deal with set +overlaps or ©disregardª statements. In order to positively +identify all the rules that the parser reduces, +AnaGram has to turn off certain optimization features in +your parser. Therefore, a parser that has rule coverage +enabled will run slightly slower than one with the +switch off. + +In addition, AnaGram creates a pair of functions to +write the counters to a file and to initialize the +counters from a file. The names of these functions are +given by appending "_write_counts" and "_read_counts" to +the name of your parser. The name of the file is given by the +©coverage file nameª parameter, which defaults +to the name of your ©syntax fileª but with the extension ".nrc". + +If rule coverage is enabled, AnaGram will also enable the +Rule Coverage option on the ©Browse Menuª. If you select
If you select +Rule Coverage, AnaGram will initialize a ©Rule Coverageª +window from the rule count file you select. + +AnaGram will +warn you if the rule count file is older than +the syntax file, since under those conditions, the +coverage file might be invalid. +## + +Rule Derivation, Token Derivation + +You can use the Rule Derivation and Token Derivation +windows to understand the nature of ©conflictªs in your +grammar. To create these windows, open the ©Conflictsª +window. Move the cursor bar to a ©completed ruleª, that +is, one which has no marked token. Press the right mouse button to pop +up the ©Auxiliary Windowsª menu. You may then select the Rule +Derivation or the Token Derivation. + +The Rule Derivation window and the Token Derivation +window, together, show how a ©conflictª, or ambiguity, +has arisen in your grammar. Both windows contain a +sequence of rules, and both begin with the same rule, +the rule which is the root cause of the conflict. + +Each subsequent line in the rule derivation is an +©expansionª of the marked token in +the previous rule. The last rule in the derivation +window is the rule you selected in the Conflicts +window. Thus the rule derivation window shows you how +the rule involved in the conflict derives from the +root. + +Each subsequent line in the token derivation window +shows an expansion of the marked token in the previous rule. The first +token of the last rule in the derivation window is the token that +causes the conflict. This is the usage that is inconsistent with other +usages of this token in the conflict state. + +The Rule Derivation and Token Derivation windows each +have five auxiliary windows. The ©Rule Contextª window +is keyed to the highlighted rule. the other four +windows, the ©Expansion Rulesª window, the +©Productionsª window, the ©Set Elementsª window and the +©Token Usageª window are keyed to the marked token. +Remember that there is no marked token on the last +line of the Rule Derivation window. 
+## + +Rule Element + +A ©grammar ruleª is a list of "rule elements", separated +by commas. Rule elements may be ©token nameªs, +©character setsª, ©keywordªs, ©immediate actionªs, or +©virtual productionsª. When AnaGram encounters a rule +element for which no token presently exists, it creates +one. + +Any rule element may be followed by a ©parameter assignmentª +in order to make the ©semantic valueª of +the rule element available to a ©reduction procedureª. +## + +Rule Number + +AnaGram assigns a unique rule number to each ©grammar +ruleª that you specify in your grammar. Rules are +numbered sequentially as they are encountered in the +©syntax fileª. AnaGram constructs rule 0 itself. Rule +0 has a single ©rule elementª, the ©grammar tokenª, +unless you have a ©disregardª statement in your +grammar, in which case it has two elements. + +In AnaGram displays, rule numbers are shown as an 'R' +followed by a three-digit decimal number, for example R001. + +## + +Rule Stack, Rule Stack Pane + +The Rule Stack pane appears across the bottom of a ©Grammar +Traceª or ©File Traceª window. It provides an alternate view of the +parser stack for the trace, showing, for each state, rules instead of +the tokens that you see in the ©Parser Stack paneª. Because it is +synched with the syntax file window, the Rule Stack makes it easy to +see the relationship between the trace and your grammar. + +For each level of the parser stack, the Rule Stack shows the ©parser +stateª number and all the active rules. The active rules at any +state consist of all the ©expansion ruleªs for the state that are +consistent with the input at all subsequent states. + +Except for the last level +of the stack, each rule has a ©marked tokenª, which in the default +configuration is displayed in bold, italic type.
The significance of +the marked token is that all tokens in the rule to the left of the +marked token have already been matched in the input, and the input +in subsequent levels is consistent so far with the marked +token. As more input is processed, rules +that are inconsistent with the new input are deleted from the display. + +The last level of the stack shows the current state of the parser and +the rules against which the ©lookahead tokenª will be matched. At +this level, there may be rules with no marked tokens. These are +rules which have been matched exactly in the input. If there is +more than one such rule, at the next parser step the parser will use +the lookahead token to determine which rule to reduce. + +In the last level of the stack, marked tokens represent the input the +parser expects to see. + +The Rule Stack pane is synched with the ©syntax fileª window, if it is +visible, so that the rule highlighted in the Rule Stack can be seen +in context in the syntax file. +For rules that AnaGram +generated automatically (to implement ©virtual productionsª +or the ©disregardª statement), the cursor bar will move to the +top of the syntax file window. + +The Rule Stack pane is also synched with the other panes in the trace. +As you move the cursor bar in the Rule Stack, the cursor bar in the +Parser Stack pane will track the stack level in the Rule Stack. In +a File Trace, text will be highlighted in the ©Test Fileª pane +corresponding to the selected token in the Parser Stack pane. In a +Grammar Trace, the marked token in the highlighted rule will be +highlighted in the ©Allowable Input paneª. + +Clicking the right mouse button pops up an ©Auxiliary Windowsª menu to +give you more information about the highlighted rule. +## + +Rule Table + +The Rule Table lists, in numerical order, all the +©grammar ruleªs defined in your ©grammarª. Each rule is +preceded by the ©nonterminalª tokens which produce it.
+If you are not using ©semantically determined +productionªs, then there will be precisely one token +line per rule. The Rule Table is synched to your ©syntax +fileª to show the rule in context. +## + +Semantic Value, Token Value + +A ©tokenª generally has a "semantic value", or "token +value", as well as the ©token numberª which identifies +it syntactically. Each instance of the token in the +input stream can have a different value. For example, +you might have a token called "variable name". In one +instance the variable name might be "widget" and in +another, "wombat". Then "widget" and "wombat" would be +the semantic values in the two instances. Another token +might have numeric semantic values. + +You can specify the C or C++ ©data typeª of the token value. +The data type of "variable name" could be "char *", +where the value is a pointer to a string holding the name. There +are separate default types for the values of ©terminalª +and ©nonterminalª tokens. In the usual case of ordinary +character input, the value of a terminal token is just +the ASCII character code. + +The value of a nonterminal token is determined by the ©reduction procedureªs +attached to the rules the token produces. If there is no reduction +procedure, the value of the token is the value of the first token +in the rule. + +It should be noted that the stack operations have been +implemented in such a way that a C++ object that belongs +to a class for which the assignment operator has been +overloaded will encounter serious problems. This shortcoming +will be addressed in a future version of AnaGram. Note that +there is no problem with using a pointer to any C++ object. +## + +Semantically Determined Production + +A "semantically determined production" is one which has +more than one ©reduction tokenª specified on the left +side of the ©productionª. You would write such a +production when the reduction tokens are syntactically +indistinguishable.
The ©reduction procedureª may then +specify which of the listed reduction tokens the grammar +rule is to reduce to based on semantic considerations. +If there is no reduction procedure, or the reduction +procedure does not specify a reduction token, the parser +will use the first syntactically correct one in the list. + +To simplify changing the reduction token, AnaGram +provides a predefined macro, ©CHANGE_REDUCTIONª. + +The ©semantic valueªs of all the reduction tokens for a +given semantically determined production must have the +same ©data typeª. + +©File Traceª and ©Grammar Traceª have a ©Reduction Choices paneª which +appears when a semantically determined production is invoked and +you need to choose a reduction token. +## + +Set Elements + +The Set Elements window is available via the ©Auxiliary +Windowsª popup menu from windows which specify character sets, +partition sets or tokens. It displays the actual +characters which make up the set, or which map to the +specified token. For each character, the numeric code as +well as its display symbol is given. +## + +Set Expression, Expression + +A set expression is an algebraic expression used to +define a ©character setª in terms of individual +characters, ranges of characters, or other sets of +characters as constructed using ©complementsª, ©unionsª, +©intersectionsª, and ©differencesª. +## + +Shift Action + +The shift action is one of the four actions of a +traditional ©parsing engineª. The shift action is +performed when the input token matches one of the +acceptable input tokens for the current ©parser stateª. +The ©semantic valueª of the token and the current +©state numberª are stacked, the ©parser stack indexª is +incremented and the state number is set to a value +determined by the previous state and the input token. 
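The shift action above, together with the reduce action described under Reduce Action, can be sketched in C. The structure DemoPcb and the functions demo_shift and demo_reduce are hypothetical simplifications, not AnaGram's generated parser. In this sketch, values are plain ints, there is no stack overflow checking, and the caller supplies the state transition as a function pointer. The field names sn, ss and ssx mirror the parser control block fields of the same names.

```c
#define STACK_MAX 100

/* Hypothetical, simplified stand-in for a parser control block. */
typedef struct {
    int sn;               /* current state number */
    int ss[STACK_MAX];    /* state stack */
    int vs[STACK_MAX];    /* value stack (int values only, for simplicity) */
    int ssx;              /* parser stack index */
} DemoPcb;

/* Hypothetical transition function: next state given state and token. */
typedef int (*NextState)(int state, int token);

/* Shift: stack the current state and the token's semantic value, increment
   the stack index, then move to the state determined by (state, token). */
void demo_shift(DemoPcb *p, int token, int value, NextState next)
{
    p->ss[p->ssx] = p->sn;
    p->vs[p->ssx] = value;
    p->ssx++;
    p->sn = next(p->sn, token);
}

/* Reduce: subtract the rule length from the stack index, which restores the
   state in which the rule began, then shift the reduction token as though
   it had been input directly. A null production pops nothing. */
void demo_reduce(DemoPcb *p, int rule_length, int reduction_token,
                 int value, NextState next)
{
    if (rule_length > 0) {
        p->ssx -= rule_length;
        p->sn = p->ss[p->ssx];
    }
    demo_shift(p, reduction_token, value, next);
}
```

Before calling demo_reduce, the caller computes the reduction token's value from the rule's values, vs[ssx - rule_length] through vs[ssx - 1]. This is roughly where a reduction procedure fits.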
+## + +Shift-Reduce Conflict + +A "shift-reduce" ©conflictª occurs if in some ©parser +stateª there exists a ©terminal tokenª that should be +shifted, because it is legitimate input for one of the +©grammar ruleªs of the state, but should also be used to +reduce some other rule because it is a ©reducing tokenª +for that rule. +## + +sn + +sn is a field in a ©parser control blockª to which your +©error handlingª routines and your ©reduction +procedureªs may refer. Its value is the current ©state +numberª of your ©parserª. sn is modified every time +your parser "shifts" (performs a ©shift actionª on) a +token or reduces (performs a ©reduce actionª on) a +©productionª. +## + +ss + +ss is a field in a ©parser control blockª to which your +©error handlingª and ©reduction procedureªs may refer. +It is the ©state stackª for your ©parserª. Before every +©shift actionª, the current ©state numberª, ©snª, is +stored in PCB.ss[PCB.ssx], where ©ssxª is the ©parser +stack indexª. PCB.ssx is then incremented. +## + +ssx + +ssx is a field in a ©parser control blockª to which +your ©error handlingª routines and ©reduction +procedureªs may refer. It is the ©parser stack indexª +for your ©parserª. On every ©shift actionª it is +incremented. On every ©reduce actionª the length of +the ©grammar ruleª being reduced is subtracted from +PCB.ssx. +## + +State Definition + +The State Definition window can be accessed via the +©Auxiliary Windowsª popup menu from any window that specifies +states. It displays the ©characteristic rulesª that +define the state. The rules are displayed with a marked token, which is +the next token needed in the input if the particular ©grammar ruleª is +to be matched. If the rule is a completed rule, no token will be +marked. + +Each line contains the state number, blank if it is the +same as the state number of the previous line, the ©rule +numberª, and finally the ©marked ruleª. 
+ +The ©State Definition Tableª, found in the ©Browse +Menuª, displays the characteristic rules for all states +in the ©grammarª. +## + +State Definition Table + +The State Definition Table lists, for each ©parser +stateª, all of the ©characteristic rulesª which define +that state. The rules are displayed with a ©marked tokenª, which is the +next token needed in the input if the particular ©grammar ruleª is to +be matched. If the rule is a completed rule, no token will be +marked. + +Each line contains the state number, blank if it is the +same as the state number of the previous line, the ©rule +numberª, and finally the ©marked ruleª. + +In the ©Auxiliary Windowsª menu for many states there is +a ©State Definitionª entry which provides the +characteristic rules for the ©parser stateª identified by +the cursor bar. +## + +State Expansion + +The State Expansion window may be accessed using the +©Auxiliary Windowsª menu from any window that identifies +a particular ©parser stateª. It shows the complete set +of ©expansion ruleªs for the state, consisting of the +union of the set of ©characteristic ruleªs and, for each +characteristic rule, the set of expansion rules for the +marked token. Thus the State +Expansion window shows all possible legal input to your +parser in the given state. +## + +Sticky + +"Sticky" statements are ©attribute statementªs and may +be used just like a ©precedence declarationª to resolve +©conflictªs. If a ©shift-reduce conflictª occurs in a +state where the ©characteristic tokenª is "sticky", the +shift action will always be chosen. + +Sticky statements must be made inside ©configuration +sectionsª. Each statement consists of the keyword +"sticky" followed by a list of ©tokensª. The tokens must +be separated by commas and the list must be enclosed in +braces ({ }). Each token will then be treated as sticky. + +All conflicts which are resolved by sticky statements +are listed in the ©Resolved Conflictsª window. 
+## + +subgrammar + +Declaring a nonterminal token to be a "subgrammar" +changes the way AnaGram searches for reducing tokens. + +Normally, if there is a completed rule in a particular +state, AnaGram investigates all states to which the +parser could jump on reducing the rule. It then +considers all terminal tokens that are acceptable input +in these states to be reducing tokens for the given +rule. If this set of tokens overlaps the set of tokens +for which there are shift actions, or the set of tokens +which reduce a different rule, there is a ©conflictª. + +Now consider a particular nonterminal token T and all +the rules it produces, whether directly or indirectly. +What the preceding remarks mean is that in determining +the reducing tokens for any of these rules, AnaGram +considers not only the definition, but also the usage +of T. + +There are circumstances when it is inappropriate to +consider the usage of T. The most common example occurs +when building a lexical scanner for a language such as +C. In this case, you can write a complete grammar for a +C token with no difficulty. But if you try to extend it +to a sequence of tokens, you get scores of conflicts. +This situation arises because you specify that any C +token can follow another, when in actual practice, an +identifier, for example, cannot follow another +identifier without some intervening space or +punctuation. While it is theoretically possible to write +a grammar for a sequence of tokens that has no +conflicts, it is not usually pretty. + +The subgrammar declaration resolves this problem by +telling AnaGram that when it is looking for reducing +tokens for any rule produced directly or indirectly by a +subgrammar token, it should disregard the usage of the +token and only consider usage internal to the definition +of the subgrammar token, as though the subgrammar token +were the start token of the grammar. 
+ +The subgrammar declaration is made in a ©configuration +sectionª and consists of the keyword "subgrammar" +followed by a list of token names separated by +commas and enclosed in braces ({ }). For example: + subgrammar { name, number} +## + +Suspicious Production + +This ©warningª message appears when AnaGram finds a +©productionª of the form x -> x. There is probably a +typo somewhere in your ©syntax fileª. This production +causes a ©conflictª in your grammar. AnaGram leaves +this production in your ©grammarª, but if you build a +parser, it will never succeed in recognizing this +production. +## + +Switch Takes on/off Values Only + +The specified parameter is a ©configuration switchª. The +only values it may be assigned are ON and OFF. + +## + +Symbol + +In writing your ©grammarª you use symbols, or names, to +represent most of your ©tokensª. You may also use +symbols to represent ©character setªs, ©virtual +productionªs, ©immediate actionªs, or ©keywordªs. + +A symbol, or name, must begin with a letter or an +underscore. It may then contain any number of these +characters as well as digits and embedded white space +(including comments). For identification purposes all +adjacent white space characters within a symbol name +are considered to be a single blank. + +Upper case and lower case letters are considered to be +different. + +Examples: + token name + token/*embedded comment*/name + + All symbols used in your grammar are listed in +the ©Symbol Tableª window found in the ©Browse Menuª. +## + +Symbol Table + +The Symbol Table lists all the symbols, or names, you +used in your grammar. ©Symbolªs may be used, of course, +to identify ©tokensª, ©definitionsª, ©virtual +productionsª, ©immediate actionªs, or ©keywordªs. + +Each line in this table identifies a single symbol. The +first field is the token number, if any. This is +followed by the name. 
If the name identifies an +©expressionª or virtual production, it is followed by an +equal sign and the expression or virtual production. +## + +Syntax Analysis Aborted + +This ©warningª message appears if, because of previous +errors, AnaGram is unable to complete the ©Analyze +Grammarª command on your ©syntax fileª. +## + +Syntax Directed Parsing + +Syntax directed parsing, or formal parsing, is an +approach to building ©parsersª based on formal language +theory. Given a suitable description of a language, +called a ©grammarª, there are algorithms which can be +used to create parsers for the language automatically. +In this context, the set of all possible inputs to a +program may be considered to constitute a language, and +the rules for formulating the input to the program +constitute the grammar for the language. + +The parsers built from a grammar have the advantage +that they can recognize any input that conforms to the +rules, and can reject as erroneous any input that fails +to conform. + +Since the program logic necessary to parse input is +often extremely intricate, programs which use formal +parsing are usually much more reliable than those built +by hand. They are also much easier to maintain, since +it is much easier to modify a grammar specification +than it is to modify complex program logic. +## + +Syntax Error + +When you specify a ©grammarª, you specify a set of +input character or token sequences which your ©parserª +will "recognize". Usually it is possible for there to +be other sequences of input tokens which deviate from +the rules set down by your grammar. Should your parser +find such a sequence in its input which is not +explicitly allowed for in your grammar, it is said to +have found a "syntax error". The general treatment of +syntax errors is called ©error handlingª, of which there +are two distinct aspects: ©error diagnosisª and ©error +recoveryª. 
AnaGram allows you to make provision for +error handling to fit your needs, but should you not do +so, it will provide simple default error handling. +## + +Statements + +AnaGram source files, or ©syntax fileªs, consist of +the following types of statements: + ©productionªs + ©configuration sectionªs + ©embedded Cª + ©definitionªs + ©token declarationªs + + Statements may be in any order. Each statement must +begin on a new line. If a statement cannot be +construed as complete, it may continue onto another +line. + +Statements may contain spaces, tabs or comments, but +may not contain blank lines. +## + +Syntax File + +Input files to AnaGram are called syntax files. The +default extension for syntax files is .syn. A +syntax file contains a "©grammarª" and supporting C or +C++ code. The file consists of several distinct types +of statements. These are ©token declarationsª, +©productionªs, ©definitionsª, ©embedded Cª, and +©configuration sectionsª. There may be as many of each +as you need, in whatever order you find convenient. + +Each such statement begins on a new line. +## + +SYNTAX_ERROR + +SYNTAX_ERROR is a macro which your parser will invoke +when it encounters a syntax error in its input stream. +If you have set the ©diagnose errorsª ©configuration +switchª, the static variable ©PCBª.©syntax_errorª will +contain a pointer to a diagnostic message when +SYNTAX_ERROR is invoked. If you have also set the +©error frameª switch, ©PCBª.©error_frame_ssxª and +©PCBª.©error_frame_tokenª will also be set +appropriately. +## + +Tab Spacing + +"tab spacing" is a ©configuration parameterª which +controls the expansion of tabs when AnaGram displays +your source file or test files in the ©File Traceª window. + +The value of "tab spacing" is also used to set the +default value of the ©TAB_SPACINGª macro in your parser. + +The default value of "tab spacing" is 8. 
If you prefer +a different value, you should probably include an +appropriate statement in your ©configuration fileª. For +example: + + tab spacing = 2 +## + +TAB_SPACING + +If you have enabled the ©lines and columnsª switch, your +parser needs to know tab spacing in order to increment +the column count when it encounters a tab character. It +is set up to use the value given by the TAB_SPACING +macro. If you do not define TAB_SPACING in your parser, +AnaGram will provide a default definition, setting it to +the value of the ©tab spacingª ©configuration +parameterª. +## + +Terminal, Terminal Token + +A "terminal token" is a token which does not appear on +the left side of a ©productionª. It represents, +therefore, a basic unit of input to your ©parserª. If +the input to your parser consists of ascii characters, +you may define terminal tokens explicitly as ascii +characters or as sets of ascii characters. If you have a +lexical scanner, or preprocessor, which produces numeric +codes, you may define the terminal tokens directly in +terms of these numeric codes. +## + +Test File Binary + +"Test file binary" is a ©configuration switchª which +defaults to off. When it is off, and you select the +©File Traceª option, AnaGram will read your test files +in "text" mode, discarding carriage return characters. +When "test file binary" is on, AnaGram will read test +files in "binary" mode, preserving carriage return +characters. + +If your parser needs to recognize carriage return +characters explicitly, you should turn "test file +binary" on. +## + +Test File Mask + +"Test file mask" is a string-valued ©configuration +parameterª which AnaGram uses to set up the file dialog +for the ©File Traceª command. It defaults to "*.*". If +there is a conventional file name format for the input +to the ©parserª you are developing, you will probably +want to set "test file mask" in a ©configuration +sectionª in your ©syntax fileª so it is easier to pick +out your test files. 
+## + +Test range + +"Test range" is a ©configuration switchª which defaults +to on. When it is set, i.e., on, AnaGram will configure +your parser so that it checks input characters to +verify that they are within the range given by the +©character universeª before it indexes the ©token +conversionª table. If range testing is not necessary +for your parser, you may turn test range off and get a +slight improvement in the performance of your parser. +## + +Thread Safe Parsers + +AnaGram 2.01 incorporates several changes designed to make it +easier to write thread safe parsers. + +First, the ©parserªs generated by AnaGram 2.01 no longer use static or global +variables to store temporary data. All nonconstant data have been +moved to the ©parser control blockª. + +Second, two new features which make it substantially +easier to build thread safe parsers have been added. The ©reentrant parserª switch +makes the entire parser reentrant, by passing the pointer to the parser control +block as an argument on all function calls. The ©extend pcbª statement allows +you to add your own variable declarations to the ©parser control +blockª so you can avoid references to global or static variables in +your ©reduction procedureªs. + +Third, new support has been added for C++ classes, including +the ©wrapperª statement and the ©PCB_TYPEª macro. +## + +token_number + +token_number is a field in a ©parser control blockª to +which your ©error handlingª procedures and ©reduction +procedureªs may refer. It contains the actual ©token +numberª of the current input token. If you are supplying +token numbers directly, it is the result of using the +actual input character to index the ©token conversionª +array, ag_tcv. +## + +Token + +Tokens are the units with which your parser works. +There are two kinds of tokens: ©terminal tokensª and +©nonterminal tokensª. These latter are identified by the +parser as sequences of tokens. 
The grouping of tokens +into more complex tokens is governed by the ©grammar +rulesª, or ©productionªs in your grammar. In your +grammar, tokens are denoted by ©token nameªs, ©virtual +productionsª, explicit ©character representationsª, +©keywordªs, ©immediate actionªs, or ©expressionªs which +yield ©character setsª. +## + +Token Conversion + +By using ©character setª ©expressionªs, you may in your +©syntax fileª define a number of input characters as +being syntactically equivalent. When your ©parserª gets +an input character, it uses the character code to index +a table called ©ag_tcvª. The value it extracts from this +table is the ©token numberª for the input character. The +actual character code of the input character becomes the +©token valueª. +## + +Token Declaration + +A token declaration is simply a ©productionª with no +right hand side. Token declarations can be used to +define the ©data typeªs of tokens. To define the data type +of a token, simply put the data type in parentheses +preceding the name of the token. You can use a list of +tokens joined by commas, if you wish. Thus: + (char *) variable name, function name +could be used to specify that the ©semantic valueªs of +the tokens "variable name" and "function name" are both +character pointers. + +Of course, token types may be specified as part of any +production the token generates, but sometimes, in the +interest of clarity, it is advisable to group all +declarations together. +## + +Token Name + +All ©nonterminal tokensª that you define in your +©grammarª by means of explicit ©productionªs must have +names by which they may be referenced. Token names are +©symbolsª which represent the token syntactically in +your grammar specification. +## + +Token Names + +"Token names" is a ©configuration switchª that defaults +to off. 
If it is set, it causes AnaGram to include in +the ©parser fileª a static array of character strings, indexed by +token number, which provides ascii representations of token +names. The name of this array is given by "<parser name>_token_names", +where <parser name> is the name of the parser function as +given by the value of the ©parser nameª parameter. + +AnaGram also defines a macro, ©TOKEN_NAMESª, which evaluates +to the name of the array. + +The array contains strings for all grammar tokens which have +been explicitly named in the syntax file as well as tokens +which represent ©keywordªs or single character constants. + +The array is useful in creating ©syntax errorª diagnostics. + +Prior to version 2.01 of AnaGram, the TOKEN_NAMES array contained +strings only for explicitly named tokens. If this restriction +is required, set the ©token names onlyª switch. + +Token names are also included if the ©diagnose errorsª +switch is set. +## + +TOKEN_NAMES + +"TOKEN_NAMES" is the name of a macro that AnaGram defines to +provide access to a static array of character strings indexed by +token number, which provides ascii representation of token +names. The array is generated if any of the ©token namesª, +©token names onlyª or ©diagnose errorsª switches are ON. + +If ©token names onlyª is set, the array contains non-empty +strings only for those tokens which are explicitly named +in the syntax file. Otherwise, the array also contains +strings for tokens which represent keywords or single +character constants. +## + + +token names only + +"Token names only" is a ©configuration switchª that defaults to +off. If it is set, it will cause AnaGram to include in the +parser file a static array containing the names of the tokens +in your grammar. This array will include only those tokens +to which you have assigned names explicitly and will not +include character constants or keywords. "Token names only" +takes precedence over ©token namesª. 
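The token name array can help in building syntax error diagnostics. The following C sketch assumes a parser named "calc". The array contents are invented for illustration; the real array, its ordering and its string formats come from the generated parser file. The local TOKEN_NAMES definition only stands in for the macro AnaGram defines. An entry may be empty, for example under "token names only". In that case the sketch falls back on the T-plus-three-digit form described under Token Representation.

```c
#include <stdio.h>

/* Invented stand-in for the array AnaGram would generate for a parser
   named "calc"; in a real parser, use the generated array instead. */
static const char *const calc_token_names[] = {
    "",            /* token 0: no printable name */
    "expression",
    "number",
    "'+'",
    "",            /* token 4: e.g. an unnamed token under "token names only" */
};
#define TOKEN_NAMES calc_token_names   /* stand-in for AnaGram's macro */
#define TOKEN_COUNT (sizeof TOKEN_NAMES / sizeof TOKEN_NAMES[0])

/* Write a printable description of a token number into buf: the name
   from the array if it is non-empty, otherwise "T" followed by the
   zero-padded, three-digit token number. */
static void describe_token(char *buf, size_t size, int token) {
    const char *name = "";
    if (token >= 0 && (size_t)token < TOKEN_COUNT) {
        name = TOKEN_NAMES[token];
    }
    if (name[0] != '\0') {
        snprintf(buf, size, "%s", name);
    } else {
        snprintf(buf, size, "T%03d", token);
    }
}
```

With this stand-in array, describing token 2 yields "number", while token 4 (empty entry) yields "T004".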
+## + +Token Not Used + +"Token not used, TXXX: <token name>" is a ©warningª +message which appears if AnaGram finds an unused ©tokenª +in your ©grammarª. Often an unused token is the result +of an oversight of some kind and indicates a problem in +the grammar. + +Token Number + +AnaGram assigns a unique number, called the "token +number", to each token in the grammar, no matter whether +it is a ©terminal tokenª or a ©nonterminal tokenª. Your +parser does all of its analysis of your input stream +using token numbers as its primary material. + +You may need to know the values of token numbers that +AnaGram has assigned, either so a lexical scanner can +output correct token numbers, or so a ©reduction +procedureª can correctly resolve a ©semantically +determined productionª. + +To help you, AnaGram defines enumeration constants for +each of the named tokens in your grammar. The definition +of these constants is in the ©parser headerª file. + +Token Representation + +Not all of the ©tokensª in your grammar have a ©token +nameª. Some of the tokens may represent ©character setsª +which you spelled out explicitly, ©virtual productionsª, +©immediate actionªs, or ©keywordªs. In its analysis +tables, AnaGram tries to provide a meaningful +representation for tokens whenever it can. Its first +choice is to use the name, if it has one. Otherwise, it +will use the set definition or the definition of the +virtual production if one exists. If AnaGram cannot +otherwise represent your token, it will fall back on +the token number, which it normally writes as the +letter T followed by a three-digit, zero-padded token +number. + +Token Table + +The Token Table lists all the tokens of your grammar. +The first field is the token number. It is followed by a +flag field which is "zl" if the token is a ©nonterminal +tokenª and is ©zero lengthª. If the token is nonterminal +and not zero length, the flag field contains "nt".
If +the token is a ©terminal tokenª, the field is blank. + +The next field is blank unless the token has been +declared ©stickyª or has had a ©precedenceª level +assigned. If the token is sticky, this field will +contain 's'. If a precedence level has been assigned, +this field will contain the letter 'l', 'r', or 'n' to +indicate associativity followed by the precedence +level. Finally there is the ©data typeª of the ©semantic +valueª of this token and the ©token representationª. +## + +Token Usage + +The Token Usage table may be accessed via the ©Auxiliary +Windowsª menu from any window that identifies tokens. It +shows all the rules in the grammar that use the token. +## + +Top Margin + +"Top margin" is an ©obsolete configuration parameterª. +## + +Trace Coverage + +Trace Coverage is a table which is built whenever you +run ©Grammar Traceª, one of its pre-built versions, or a ©File +Traceª. You can access it from the ©Browse Menuª. It shows the number +of times each rule in your grammar has been reduced. Unless you have +set the ©Rule Coverageª ©configuration switchª, some ©null productionªs +and some rules that consist of only one element will not be counted +because of speed optimizations in the parser tables. + +The Trace Coverage tables are reset to zero when you load a new syntax +file or start AnaGram. +## + +Compound Action + +Traditionally, ©LALR-1 parserªs use only four simple +©parser actionªs: shift, reduce, accept and error. +AnaGram parsers use a number of compound actions +in order to reduce the size of parse tables and +speed up processing. A single compound action +may replace several simple shift or reduce actions. + +The ©Traditional Engineª ©configuration switchª may +be used to force AnaGram to use only the simple +actions. +## + +Traditional Engine + +"Traditional engine" is a ©configuration switchª that +defaults to off. 
Traditional ©LALR-1 parserªs use a +©parsing engineª which has only four actions: + ©shift actionª + ©reduce actionª + ©accept actionª + ©error actionª + + +AnaGram, in the interest of +faster execution and more compact parse tables, +uses a parsing engine with a number of +short-cut, or ©compound actionªs. The "traditional engine" switch tells +AnaGram not to use the short-cut actions. + +You would turn this switch on if you wished to use the ©Grammar Traceª +or ©File Traceª to see how the standard four parser actions work for +a particular combination of grammar and input. Note that to see the +effects of single parser actions, you must use the ©Single Stepª +button. Remember that in the Grammar Trace, when you single step and +the token you have selected causes a reduce action, it will appear +on the ©lookahead lineª of the ©parser stack paneª and will be preselected +in the ©allowable input paneª until it is finally shifted onto +the parser stack. + +Normally, you should leave the "traditional engine" switch off. Then +AnaGram will, whenever possible, compress several parsing actions into +one compound action in order to speed execution of the parser. + +Unfortunately, the term "traditional" has sometimes created the +impression that there is a conservative aspect to the operation of +traditional engine parsers. This is not the case. They produce the same +results, but are slower and have much larger tables. +## + +Type Redefinition + +"Type Redefinition of TXXX: <token name>" is a ©warningª +message which appears when AnaGram finds a conflicting +©data typeª definition for a ©tokenª in your ©grammarª. +The new definition will override the previous one. If +you intend to use different type definitions, you should +use extreme caution and check the generated code to +verify that your ©reduction procedureªs are getting the +values you intended.
+## + +Undefined Symbol + +"Undefined symbol: <name>" is a ©warningª message which +appears when AnaGram encounters an undefined ©symbolª +while evaluating a ©character setª expression. The +following warning in the ©Warningsª window identifies +the particular ©tokenª AnaGram was trying to evaluate. +## + +Undefined Token + +"Undefined token TXXX: <name>" is a ©warningª message +which appears when the indicated ©tokenª has been used +in the ©grammarª, but there is no definition of it as a +©terminal tokenª nor does any ©productionª define it as +a ©nonterminal tokenª. +## + +Unexpected + +"Unexpected <element 1> in <element 2>" is a ©warningª +message which you may get when AnaGram analyzes your +grammar. It appears when AnaGram unexpectedly encounters an instance of +syntactic element 1 at the specified location in an instance of +syntactic element 2. AnaGram cannot reliably continue parsing its +input. Therefore, it limits further analysis to scanning for syntax +errors. If this error is not the result of a prior error, you should +correct your ©syntax fileª. Remember that this error could result from +something missing just as well as from something extraneous. + +If element 1 is ©eofª, it often means that you have +an unbalanced brace or comment delimiter in the code +following the indicated location. +## + +Union + +The union of two sets is the set of all elements that +are to be found in one or another of the two sets. In an +AnaGram syntax file the union of two ©character setsª A +and B is represented using the plus sign, as in A + B. +The union operator has the same precedence as the +©differenceª operator: lower than that of ©intersectionª +and ©complementª. The union operator is ©left +associativeª. + +Watch out! In an AnaGram syntax file 65 + 97 represents +the character set which consists of the lower case 'a' +and upper case 'A'. It does not represent 162, the sum +of 65 and 97. 
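The set semantics of the union operator can be illustrated with a small C sketch. It models a character set over an 8-bit character universe as a 256-bit bitmap. This shows only the set algebra, not AnaGram's internal representation of character sets, and the type and function names are hypothetical. Take the union of the set containing code 65 and the set containing code 97. On an ASCII-compatible system the result holds exactly 'A' and 'a', and code 162 is not a member.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical character set over codes 0..255: one bit per code. */
typedef struct {
    unsigned char bits[256 / 8];
} charset;

static void cs_clear(charset *s) {
    memset(s->bits, 0, sizeof s->bits);
}

static void cs_add(charset *s, unsigned char c) {
    s->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

static bool cs_contains(const charset *s, unsigned char c) {
    return (s->bits[c / 8] >> (c % 8)) & 1u;
}

/* Union (A + B in a syntax file): every code found in A or in B. */
static charset cs_union(const charset *a, const charset *b) {
    charset r;
    for (size_t i = 0; i < sizeof r.bits; i++) {
        r.bits[i] = (unsigned char)(a->bits[i] | b->bits[i]);
    }
    return r;
}

/* Number of codes in the set. */
static int cs_count(const charset *s) {
    int n = 0;
    for (int c = 0; c < 256; c++) {
        n += cs_contains(s, (unsigned char)c);
    }
    return n;
}
```

Union combines set members rather than adding numeric values, so 65 + 97 yields a two-element set and never the code 162.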
+## + +Video mode + +"Video mode" is an ©obsolete configuration parameterª. +## + +Virtual Production + +Virtual productions are a special shorthand +representation of ©grammar rulesª which can be used to +indicate a choice of inputs. They are an important +convenience, especially useful when you are first +building a grammar. + +Here are some examples of virtual productions: + name? // optional name + name?... // zero or more instances of name + {name | number} // exactly one name or number + {name | number}... // one or more instances of name or number + [name | number] // optional choice of name or number + [name | number]... // zero or more instances of name or number + + AnaGram rewrites virtual productions, so that when you +look at the syntax tables in AnaGram, there will be +actual ©productionªs replacing the virtual productions. + +A virtual production appears as one of the rule +elements in a grammar rule, i.e., as one of the members +of the list on the right side of a production. + +The simplest virtual production is the "optional" +token. If x is an arbitrary token, x? can be used to +indicate an optional x. + +Related virtual productions are x... and x?..., where +the three dots indicate repetition. x... represents an +arbitrary number of occurrences of x, but at least one. +x?... represents zero or more occurrences of x. + +The remaining virtual productions use curly or square +brackets to enclose a sequence of rules. The brackets +may be followed variously by nothing, by three dots, +or by a slash and three dots (/...), to indicate the choices to be made +from the rules. Note that rules may be used, not merely +tokens. + +If r1 through rn are a set of ©grammar rulesª, then + {r1 | r2 | ... | rn} +is a virtual production that allows a choice of exactly +one of the rules. Similarly, + {r1 | r2 | ... | rn}... +is a virtual production that allows a choice of one or +more of the rules. And, finally, + {r1 | r2 | ... | rn}/...
+is a virtual production that allows a choice of one or +more of the rules subject to the side condition that +rules must alternate, that is, that no rule can follow +itself immediately without the interposition of some +other rule. This is a case that is not particularly +easy to write by hand, but is quite useful in a number +of contexts. + +If the above virtual productions are written with [] +instead of {}, they all become optional. [] is an +optional choice, []... is zero or more choices, and +[]/... is zero or more alternating choices. + +Null productions are not permitted in virtual +productions in those cases where they would cause an +intrinsic ambiguity. + +You may use a ©definitionª statement to assign a name to +a virtual production. +## + +Void token + +"Void token, <token name>, used as parameter" is a +©warningª message which appears if AnaGram encounters a +©data typeª definition declaring a ©tokenª to have type +void when the token has previously been used in a +©parameter assignmentª for a ©reduction procedureª. Your +C or C++ compiler will complain when it tries to compile +the call to the reduction procedure. +## + +vs + +vs is a field in a ©parser control blockª to which your +©error handlingª procedures and ©reduction procedureªs +may refer. It is the ©parser value stackª for your +parser. The ©semantic valuesª of the ©tokensª identified +by the parser are stored in the value stack. The value +stack, like the other ©parser stacksª, is indexed by +©PCBª.©ssxª. When you are executing a reduction +procedure, PCB.vs[PCB.ssx] contains the semantic value +of the first token in the grammar rule you are reducing, +PCB.vs[PCB.ssx+1] contains the second, and so forth. The +return value from your reduction procedure will be +stored in turn in PCB.vs[PCB.ssx]. + +vs is defined to be of type $_vt, where "$" represents +the name of your parser. 
AnaGram defines $_vt to +be a union of fields of sizes corresponding to all the +different data types declared in your syntax for the +semantic values of your tokens. In order to avoid +restrictions on the use of C++ classes, the fields are +defined as character arrays. On some processors which +have byte alignment restrictions for multibyte data, +you might encounter a bus error. To correct this +problem, set the ©parser stack alignmentª parameter to +an appropriate data type. +## + +Warning + +If while analyzing your syntax file, AnaGram finds +something suspicious, it is likely to issue a warning. +The Warnings window will pop up automatically when the +analysis has been completed. If the warning is for a +©syntax errorª in your input file, you will have to fix +it, because AnaGram cannot successfully interpret it. +Otherwise, AnaGram will be able to create a ©parserª for +you, if you wish, no matter how serious the warnings may +be. + +You can bring up the Help topic associated with a highlighted warning +by pressing F1 or by clicking with a ©Help Cursorª. + +If you have syntax errors, AnaGram will synchronize the +cursor in the ©syntax fileª window with the cursor in the +Warnings window so that whenever the Warnings window is +active, the cursor bar in the syntax file window will +identify the location of the error. + +## + +What's New + +Changes in AnaGram 2.40 + +Most of the changes in AnaGram 2.40 are under the hood - cleanup of +source files, reorganization of the source tree, revision of build and +test procedures, and so forth, in preparation for the open source +release. All of this will, with luck, be invisible to the end user. + +Open Source + +AnaGram is now ©open sourceª. AnaGram itself +uses the 4-clause BSD ©licenseª; the ©parsing engineª, and thus the output +files, are licensed with the less restrictive zlib ©licenseª. Source +distributions are available from http://www.parsifalsoft.com. 
+ +The manual has been re-typeset using LaTeX instead of WordPerfect. +The typographic consistency and formatting have been considerably +improved; unfortunately, the pagination is now completely different, +so page numbers from the old manual do not carry over to the new version. + +All the logic dealing with registration, trial copies, serial numbers, +and so forth has been removed. + +Unix Support + +The Unix build of the ©command line versionª of AnaGram (agcl) is now +supported and available to the public. There is at present no GUI for +the Unix version. The long-term goal is to migrate the AnaGram GUI +away from the closed (and orphaned) IBM Visual Age class library to +something else, probably GTK, so as to support both Windows and Unix. + +Improved Functionality + + Examples. The examples have been adjusted to the current dialect of +C++ and are now compilable again. The legacy "classlib" code on which +some of them still depend is being phased out. + +Increased Convenience + + File names. File names in the AnaGram distribution and source +tree are no longer limited to 8+3 characters, and quite a few now have +less cryptic names. Additionally, all HTML files now use the extension +".html" rather than ".htm". + + Installed files. The AnaGram.cgb and AnaGram.hlp files found in +older releases of AnaGram no longer exist; their contents are compiled +into the AnaGram executables instead. + +Bug Fixes + + Engine compiler error. The ©error_messageª field of the PCB has +been changed to const char * so current C++ compilers will accept the +code generated when ©diagnose errorsª is turned off. + + Multiple output header files. Including more than one AnaGram +output header file at once used to cause some compilers to issue a +warning, because an #ifndef directive was checking the wrong +symbol. This has been corrected. + + Wrappers and error tokens. AnaGram 2.01 generated uncompilable +code if you tried to use the ©wrapperª feature and error token +resynchronization at the same time. This has been corrected.
+ + More than 256 keywords. Build 8 of AnaGram 2.01 fixed certain +problems with large keyword tables, but in the process introduced +another, which is now fixed. + +For changes in the previous versions of AnaGram, see ©What's New in AnaGram +2.01ª and ©What's New in AnaGram 2.0ª. + +## + +What's New in AnaGram 2.01 + +Changes in AnaGram 2.01 + +Improved Functionality + + Improved support for building ©thread safe parsersª. All +nonconstant parser data previously declared as static variables has been +moved to the ©parser control blockª. When the ©reentrant parserª switch +is set, all references to the parser control block are passed to functions +via calling sequences. The ©extend pcbª switch provides a mechanism to +add user-defined variables to the parser control block. + + Improved support for C++ parsers. The ©wrapperª statement +provides C++ wrapper classes for objects to be stored on the ©parser value stackª. +The ©PCB_TYPEª macro allows you to derive a C++ class from the parser control +block and to access its members from your ©reduction proceduresª. + + Support for the ©ISO Latin 1ª character set. When using +the ©case sensitiveª switch, case conversion is performed for all ISO-Latin-1 +characters, not just those in the ASCII range. + + Improved support for error diagnostics. It is now possible for users +to provide their own text for the error messages created by the ©diagnose errorsª +switch. In addition, the ©token namesª table option now includes ASCII representations +of individual characters and keywords instead of only named tokens. The ©token names +onlyª switch can be used for compatibility with previous versions of AnaGram. + + More precise determination of error context. The tables used by the ©error frameª +option to provide the context of a syntax error have been reworked and now provide +a substantially more precise localization of the error. + +Improved error diagnostics in AnaGram + + ©Missing reduction procedureª diagnostic.
+In addition to warning that there is a ©parameter assignmentª +without a ©reduction procedureª, this +diagnostic is now provided if the ©default reduction valueª +does not have the same ©data typeª as the ©reduction tokenª. + + ©Command line versionª. Diagnostics have been reformatted so +they can be recognized by the Microsoft Visual C++ IDE. + + Refined ©keyword anomalyª diagnostics. There should +now be fewer false alarms. + +Increased Convenience + + ©File Traceª. If your grammar uses ©semantically determined productionsª, +the File Trace feature will now remember the choices you have +made for ©reduction tokenªs, so that you do not have to make +the same choices over and over again as you work with an example. + + File Paths. The file paths in the #line directives created by the ©line numbersª +switch now use forward slashes instead of backslashes. + +Changed Defaults + + ©Parser stack alignmentª. Now defaults to long instead of int. + ©Parser stack sizeª. Now defaults to 128 instead of 32. + +Bug Fixes + + Interaction between context tracking and error token. In previous +versions of AnaGram, if the first token in a rule was the ©error tokenª, +the value of ©CONTEXTª was the value that corresponded to the location +of the error. CONTEXT now correctly shows the context at which the +aborted rule began. For instance, in the following example, if a +syntax error is encountered while parsing the expression, the error +rule will skip over remaining characters to the terminating semicolon. +When invoked from handleError(), the CONTEXT macro will return the +context as it was at the beginning of the expression. + expression statement + -> expression, ';' + -> error, ~(eof + ';')?..., ';' =handleError(); + + ©Distinguish lexemesª. Several minor bugs in the implementation of distinguish lexemes have been +corrected. + + Set partition logic. Corrected problems in the interaction between the set ©partitionª logic +and the implementation of the ©disregardª statement. 
+ + Table size. Fixed a data sizing problem which occurred when one particular parse table +had precisely 256 entries. + + Keyword recognition. Fixed a problem that could cause difficulties with ©keywordª +recognition when the ©case sensitiveª switch was turned off. + + Default conflict resolution. With unresolved ©shift-reduce conflictªs, the shift case was +not always being selected. This problem has been corrected. + + Lockup. It was possible to write an erroneous grammar that would cause +AnaGram to lock up. This problem has been corrected. + + Potential bus error. The error diagnostic function created by the ©diagnose errorsª +switch could, under some circumstances, access an uninitialized value +on the ©parser value stackª. This problem has been corrected. + + Internal errors. Fixed a number of minor bugs which could cause ©internal errorªs +while running ©File Traceª. + +For changes in the previous version of AnaGram, see ©What's New in AnaGram 2.0ª. +## + +What's New in AnaGram 2.0 + +AnaGram's user interface has been completely revamped to make it more +convenient and easier to use. However, the same tried-and-true AnaGram +algorithms are still in place to build your parsers. The rules for +syntax files are also unchanged. + +The ©File Traceª and ©Grammar Traceª facilities have each had their +windows combined into a single unit, and a ©Rule Stackª synched with +these windows and with your syntax file window has been added. The +Rule Stack is particularly convenient for relating the progress of the +parse to the ©grammar rulesª in your ©syntax fileª. + +A ©text entryª field has also been added to the Grammar Trace. This +means you can provide character input to your parser in much the same +way you can with a ©test fileª in File Trace, but with instant control +over the input. + +Some further controls have been added to both File and Grammar Traces. +In particular, there is a Reset button to reset the trace to its initial +state.
This is particularly useful for ©Conflict Traceªs. + +AnaGram now has a small ©Control Panelª (default position is at the +upper right of the screen) from which you can conveniently control +operation. A menu bar provides access to the various commands and +tables. There are toolbar buttons for Analyze Grammar, Build Parser, +File Trace, and so on. The panel also has a data entry field for +entering search keys. + +You can set both colors and fonts in AnaGram windows to suit your own +preferences. We suggest you check Help for ©Colorsª or ©Fontsª before +making changes to make sure that all information will still be properly +displayed. + +AnaGram's ©Helpª has been updated to provide hypertext-type links. But +you can still keep multiple Help windows on view at once. A popup menu +shows all the links in a window. New topics have been added. Also, +further documentation topics are provided in HTML format in the html +subdirectory. + +A ©Help Cursorª on the Control Panel toolbar can be used to get help for +most AnaGram windows, buttons and menu items. F1 can also be used. + +On the ©Action Menuª you will find a list of your most recently used +syntax files. Just click on the file of your choice to have AnaGram +analyze it (or build it if ©Autobuildª is on). +## + +White Space + +In many grammars it is desirable to pass over blanks, +tabs, and similar characters, as well as comments, +collectively termed "white space", as though they were +not there. The "©disregardª" statement in AnaGram may +be optionally used to accomplish this. The "©lexemeª" +statement may be used to exercise fine control over the +scope of the disregard statement. +## + +Wrapper + +The wrapper ©attribute statementª provides correct handling of C++ +objects returned by ©reduction procedureªs. 
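As a rough illustration of what wrapper-style handling amounts to, the following is a minimal C++ sketch, assuming a fixed-size array of raw, aligned storage stands in for the parser value stack. A copy of each pushed object is constructed in place with placement new, and its destructor is called explicitly when the entry is popped or the stack is cleared. The names WrappedValueStack, push, pop, clear and Tracked are hypothetical; this is not the wrapper template AnaGram actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical sketch only: NOT the wrapper template AnaGram generates.
// A fixed-size array of raw, suitably aligned storage stands in for the
// parser value stack.
template <typename T, std::size_t N>
class WrappedValueStack {
 public:
  WrappedValueStack() : depth_(0) {}
  ~WrappedValueStack() { clear(); }  // destroy anything left on the stack

  // Construct a copy of value in place (placement new), as a wrapper does
  // when a reduction procedure returns an object.
  void push(const T &value) {
    assert(depth_ < N);
    new (static_cast<void *>(slots_[depth_].bytes)) T(value);
    ++depth_;
  }

  // Copy the top object out, then run its destructor explicitly, as a
  // wrapper does when the object is removed from the stack.
  T pop() {
    assert(depth_ > 0);
    --depth_;
    T *top = at(depth_);
    T result(*top);
    top->~T();
    return result;
  }

  // Destroy every remaining object, e.g. after an error exit
  // (loosely comparable to the role of DELETE_WRAPPERS).
  void clear() {
    while (depth_ > 0) {
      --depth_;
      at(depth_)->~T();
    }
  }

  std::size_t depth() const { return depth_; }

 private:
  struct Slot {
    alignas(T) unsigned char bytes[sizeof(T)];
  };
  T *at(std::size_t i) { return reinterpret_cast<T *>(slots_[i].bytes); }

  Slot slots_[N];
  std::size_t depth_;
};

// Hypothetical test type that counts live instances, so the effect of the
// constructor and destructor calls can be observed.
struct Tracked {
  static int live;
  std::string name;
  explicit Tracked(const std::string &n) : name(n) { ++live; }
  Tracked(const Tracked &other) : name(other.name) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;
```

Without a wrapper, the equivalent of push would be a bytewise copy into the slot followed by a pointer cast, so no constructor or destructor would run. For a class that keeps a reference count, the count would then no longer match the number of live copies, which is why such classes should always be given wrappers.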
+ +If you specify a wrapper for a C++ object, then, when a reduction +procedure returns an instance of the object, a copy of the object will +be constructed on the ©parser value stackª and the destructor will be +called when the object is removed from the stack. + +Without a wrapper, objects are stored on the value stack simply +by coercing the stack pointer to the appropriate type. +There is no constructor call when the object is stored nor +a destructor call when it is removed from the stack. + +Classes which use reference counts or otherwise overload the +assignment operator should always have wrappers in order to +function correctly. + +Wrapper statements, like other ©attribute statementsª, must appear in +configuration sections. The syntax is simply + wrapper { <comma delimited list of data types> } + +For example: + [ + wrapper {CString, CFont} + ] + +You cannot specify a wrapper for the ©default token typeª. + +If your parser exits with an error condition, there may be +objects remaining on the stack. The ©DELETE_WRAPPERSª macro +may be used to delete these objects. If you have enabled +©auto resynchª, DELETE_WRAPPERS will be invoked automatically. + +The ©AG_PLACEMENT_DELETE_REQUIREDª macro is used to control +definition of a "placement delete" operator in the wrapper +class AnaGram defines. +## + +Zero Length + +A zero length ©tokenª is a ©reduction tokenª which can +be matched by a void, i.e. by nothing at all. It +represents an optional item, or a sequence of optional +items, in the input. Since the matching process can +involve several levels of reductions, it is most precise +to use the following recursive definition: A zero length +token is one which either has at least one ©null +productionª or has at least one grammar rule defining it +such that all the tokens in the rule are zero length +tokens. + +Care should be taken when using ©zero lengthª tokens in +©recursive ruleªs. 
If all the tokens in the rule other than +the recursive token itself are zero length tokens +the rule will generate an infinite loop in the generated +parser. + +The ©Token Tableª identifies zero length tokens because +the use of such tokens sometimes inadvertently causes +©conflictªs. +## + +Control Panel + +The AnaGram Control Panel appears at the upper right of your monitor +when you start AnaGram. It has a menu bar, command buttons, a button +which enables a ©help cursorª, and a ©status indicatorª. At the lower +left you will see a data entry field for entering ©searchª +keys, with neighboring search forward and search backward buttons. + +Notice that the ©Options Menuª has a "Stay On Top" entry which +allows you to specify whether the Control Panel stays on top of +other AnaGram windows. +## + +Status Indicator + +The status indicator at the right of the AnaGram +Control Panel shows the status of the ©current grammarª: + Ready + Loaded + Error + Parsed + Analyzed + Built + +"Ready" appears only when no grammar has been selected. + +"Loaded" and "Parsed" are normally transitory. + +"Error" means at least one syntax error has been detected +in your grammar and AnaGram cannot continue. Check the +Warnings window to determine the nature of the problem. + +"Analyzed" means that a ©grammar analysisª has been +completed, but no ©output filesª have been written. + +"Built" means that an analysis has been completed and +output files have been written. +## + +Help Cursor + +The Help Cursor is accessed via the button with the question mark on +AnaGram's ©Control Panelª. It is convenient for getting help on +©Warningªs, browse tables, menu items and so on. + +If you click on the button you enable the Help Cursor, which you can +then drag with the mouse. A further mouse click will provide help +for the item underneath the cursor. + +Note further that AnaGram also has F1 help which you may find +simpler and faster than the Help Cursor. 
+## + +Search + +AnaGram has a simple search facility to let you search for text strings +in AnaGram windows. A data entry field on the ©Control Panelª is +provided for you to enter text. Left-clicking on the neighboring +buttons lets you search either forward or backward for a line in the +active window which contains at least one instance of the text. + +Note that the search begins at the next line after the highlighted line +for forward search; at the line preceding the highlighted line for +backward search. +## + +Search Key + +To find a text string in an AnaGram window, enter the +string in the Search Key field in the ©Control Panelª +and press Enter. + +To find another instance of the string click on the +©Find Nextª button or press F3. + +To find a previous instance of the string click on +the ©Find Previousª button or press F4. + +In windows that have a cursor bar, a forward search +begins on the line following the cursor and a backward +search begins on the line preceding the cursor. +## + +Find Next + +The Find Next key, on the ©Control Panelª immediately +to the right of the ©Search Keyª field, locates +the next instance of the search key in the most recently +active AnaGram window. F3 is the keyboard equivalent. +## + +Find Previous + +The Find Previous key, on the ©Control Panelª immediately +to the right of the ©Find Nextª key, searches +backwards for the search key in the most recently +active AnaGram window. F4 is the keyboard equivalent. +## + +Fonts, Set Fonts + +The Set Fonts dialog allows you to use the fonts of your choice in +AnaGram windows. You should make sure that the ©marked tokenªs font is +very distinctive so that marked tokens will show up clearly even if +they are only 1 or 2 characters long. Sometimes it is helpful to use an +underlined font for marked tokens. + +A Default button at the bottom of the dialog lets you revert to +AnaGram's original fonts if you wish. 
+## + +Colors, Set Colors + +The Set Colors dialog allows you to change the colors of +AnaGram windows. Notice that in the ©File Traceª, the ©test file paneª +requires three different sets of text and background colors. You +should make sure that the backgrounds, at least, can be easily +distinguished from each other so the trace information can be +properly displayed. You also want to take care that an active pane in +a File Trace or Grammar Trace can be distinguished from inactive +panes. + +The Default button at the bottom of the dialog lets you revert to +AnaGram's original colors if you wish. + +Color changes pertain only to the client areas of AnaGram windows. The +remaining parts of your windows will have the customary colors you have +chosen for your system. +## + +Marked Token + +Some tables and trace panes display each rule with one token marked to +show how far parsing has progressed in the rule. The marked token is +the next input expected in the input stream. It is shown in a different +font to distinguish it from other tokens in the rule. If no token is +marked, the rule is a ©completed ruleª, i.e. it has been completely +matched and will be reduced by the next input. + +You can set the font for marked tokens by choosing Fonts from the +©Options Menuª. You should make sure that the font is very distinctive so +that marked tokens will show up clearly even if they are only 1 or 2 +characters long. Sometimes it is helpful to use an underlined font for +marked tokens. +## + +Synch Parse + +The Synch Parse button replaces the ©Single Stepª button on the +toolbar of the ©File Trace windowª when, for some reason, the +location of the blinking cursor in the ©test file paneª differs from +the current parse position. This can occur when you single-click in +the test file pane or when the parse cannot track the cursor because +of a ©syntax errorª or a ©semantically determined productionª. + +Click the Synch Parse button to resynch the parse with the cursor.
+## + + +Single Step + +The Single Step button is one of the control buttons for the ©File +Traceª and ©Grammar Traceª. It advances the parse one ©parser +actionª at a time. In the File Trace, it is replaced with the "©Synch +Parseª" button whenever the blinking cursor loses synch with +the current parse location. + +In the Grammar Trace, the Single Step button takes its input from the +Allowable Input pane, the Reduction Choices pane, or the ©text entryª +field, depending on which is active. +## + +Proceed + +The Proceed button is one of the control buttons for the +©Grammar Traceª. If the ©Reduction Choices paneª or the ©Allowable +Input paneª is active, Proceed parses the highlighted token +until it is shifted in to the ©parser stackª. If the ©text entryª +field is active, Proceed parses all text in the field. If a +©syntax errorª is encountered, the parse stops and all ©reduce +actionªs are undone. + +Note that selecting a token in Allowable Input can cause a syntax +error under certain circumstances. This can happen only if the +following conditions are all true: + the indicated operation is a ©reductionª, + the reduction token for the rule being reduced has been used in several +different contexts in the grammar + and the specified token may +follow it in some contexts and not in others. +## + +Reduction Choices Pane + +The ©File Traceª and ©Grammar Traceª display a Reduction Choices +pane when they need to reduce a ©semantically determined productionª. + +The rule to be reduced is highlighted in the ©rule stack paneª. +If the ©syntax fileª window is visible, it shows the rule in +context in your grammar. + +The Reduction Choices pane lists all possible ©reduction tokenªs for +the specified rule. The first reduction token that is admissible in +the current context is highlighted and it appears +as the ©lookahead tokenª in the ©parser stack paneª. The text that +comprises the entire rule is highlighted in the ©test file paneª. 
+ +Select the desired reduction token before continuing with the parse. + +If you select a token and it does not appear as the lookahead token, +it is not syntactically correct in the current context. If you try +to proceed with the parse, you will get a ©selection errorª. +## + +Selection Error + +The ©Parse Statusª field indicates a "selection error" if you +choose a ©reduction tokenª from the ©Reduction Choices paneª of +a ©File Traceª or ©Grammar Traceª and the selected token is not +syntactically correct in the current context. +## + +Parser Stack Pane + +The Parser Stack pane, the upper left pane of the ©File Traceª and +©Grammar Traceª windows, displays the ©parser stackª for the current +trace. + +Each line corresponds to one level in the parser state stack. It shows +the stack index, the ©parser stateª for that level, and the ©tokenª which +was seen at that state. The last line of the stack, the ©lookahead +lineª, corresponds to the current state of the parser. Since no input +has yet been processed for this state, the token, if any, which +appears at this level is a ©lookahead tokenª. + +If you move the cursor in the Parser Stack pane of a File Trace, +the text that makes up the selected token will be +highlighted in the ©Test File paneª. You can back the parse up to +any desired stack level by double clicking at the beginning of the +token text in the Test File pane. + +Similarly, if you move the cursor bar in the Parser Stack pane of a +Grammar Trace, the ©Allowable Input paneª will change to display the +allowable tokens in the selected state. The previously +selected token will be highlighted. Then, double click on any token in +the Allowable Input pane to back the parse up and choose a token +a second time. + +The ©Rule Stack paneª of the File or Grammar Trace is also synched +to the Parser Stack pane. If the ©syntax fileª window is visible, it +will be synched to show the rule currently selected in the rule +stack pane. 
Note that rules that have been automatically generated +by the expansion of ©virtual productionsª cannot be synched, so the +top line of the syntax file will be highlighted instead. + +In the Grammar Trace, the last line of the Parser Stack may or may not +display a ©lookahead tokenª, depending on the last ©parser actionª +performed. If input was taken from Allowable Input and the last +action was a simple ©reduce actionª, the last input token selected +will be displayed as the lookahead input. But if the last action +performed shifted the token in, the lookahead field will be empty. + +If you right-click on a highlighted line in the Parser Stack pane, you will +get a pop-up menu to give you more information. In particular you can +get an ©Auxiliary Traceª starting at the current point in your File or +Grammar Trace, so you can explore various possibilities without losing +your position in the old trace. +## + +Exit + +Select this entry from the ©Action Menuª to terminate AnaGram. +## + +Allowable Input, Allowable Input Pane + +The upper right pane of the ©Grammar Traceª window lists the +allowable input tokens for the current state of the ©grammarª. + +The tokens in the Allowable Input pane are listed in two groups: +first, the ©terminal tokensª allowable in this state, and +second, the ©nonterminal tokensª. Between these two groups of tokens +is inserted a line which is either an option for a ©default reductionª, +or declares that there is no default action. + +Double click, press Enter, or click the ©Proceedª button to +parse the highlighted token. When all parse actions triggered +by the highlighted token have been completed, all panes of the trace +will be redrawn to show the new state of the parser. + +Note that selecting a token in Allowable Input can cause a syntax +error under certain circumstances. 
This can happen only if the +following conditions are all true: + the indicated operation is a ©reductionª, + the reduction token for the rule being reduced has been used in several +different contexts in the grammar + and the specified token may +follow it in some contexts and not in others. + +If you wish to see the results of a single parser action, click +on the ©single stepª button. The parser will perform a single +parser action. If the +token you selected was not shifted in, it will now be displayed +as the ©lookahead tokenª on the last line, the ©lookahead lineª in +the ©Parser Stack paneª, and will be preselected in the Allowable +Input pane. + +Because AnaGram, by default, uses a number of compound +parser actions, this situation does not arise very often unless you +have set the ©traditional engineª switch or reset the ©default +reductionsª switch. Usually you will want to select the same token to +proceed, but it is not necessary. + +The Allowable Input pane also displays +the ©parser actionª associated with a specific token. If it is +not a ©compound actionª, the action and its result are also shown. + +The ©parser actionª field for a token may be interpreted as follows: If +this token would cause a shift to a new state, the action field is ">>" +followed by the new state number. If the token would cause a +©reductionª, the action field is "<<" followed by a ©rule numberª to +show the rule reduced. If the parser action is a compound action, the +action field is blank. If the token would cause the grammar to be +accepted, the action field is "Accept". + + +The ©text entryª field at the bottom of the Grammar Trace can be +used as a convenient alternative to the Allowable Input pane. It +accepts characters rather than tokens. Most non-printing characters +such as newline are only available from Allowable Input. +## + +Copy + +The Copy command on the ©Windows Menuª copies the currently active +table or Help topic to the clipboard. 
+## + +Statistical Summary + +While your grammar is being analyzed, a Statistical Summary window +pops up to show you the progress of the analysis. Unless you have +turned off ©Show Statisticsª on the ©Options Menuª, this window will remain +on-screen for your reference. Among other things, it shows you the +number of rules and states in your grammar, and the number of conflicts +and warnings, if any. + +Note that if your grammar is small and you have Show Statistics turned +off, the appearance of this window on your monitor may be exceedingly +brief - you may just see a flash. + +If the window is turned off or you have closed it, you can get it from +the ©Browse Menuª. +## + +Stay On Top + +The Stay On Top entry in the ©Options Menuª allows you to specify whether +the ©Control Panelª stays on top of other AnaGram windows. +## + +Show Syntax + +If this entry in the ©Options Menuª is checked, AnaGram will display the +©syntax fileª when it has analyzed your ©grammarª. If this entry is not checked +or you have closed the syntax file window, you can select the window +from the ©Browse Menuª. +## + +Show Statistics + +If this entry in the ©Options Menuª is checked, AnaGram will leave the +©Statistical Summaryª on the screen after it has analyzed your ©grammarª. If +this entry is not checked or you have closed the Statistical Summary +window, you can select the window from the ©Browse Menuª. +## + +About AnaGram + +Select this entry from the ©Help Menuª to find out the version and +serial numbers of your copy of AnaGram, and how to contact Parsifal +Software. +## + +Help Topics + +Select Help Topics from the ©Help Menuª to get a complete list of AnaGram +Help Topics titles. You can bring up the window for a highlighted topic +by double-clicking with the left mouse button, pressing F1, or using +the ©Help Cursorª. +## + +Cascade Windows + +Select this entry from the ©Windows Menuª to cascade your open windows +starting at top left of the screen. 
+## + +Close Windows + +Select this entry from the ©Windows Menuª to close all open windows +except the ©Control Panelª. You may also close the active window +by pressing the Escape key. +## + +Hide Windows + +Select this entry from the ©Windows Menuª to hide all open windows +except the ©Control Panelª. Restore them to the screen with ©Restore +Windowsª. +## + +Restore Windows + +Use this command on the ©Windows Menuª to restore to the screen +any windows you have previously hidden with ©Hide Windowsª. +## + +Token Input, Preprocessor, Lexical Scanner + +AnaGram makes it unnecessary, in most cases, to have a separate +preprocessor to provide the ©tokensª which are fed to your parser. + +However, in some cases you may want to use a preprocessor, or lexical +scanner, to provide input to your parser. The preprocessor may +or may not be written in AnaGram. If it sends the parser token +numbers, as opposed to character codes, this is referred to as token +input, as opposed to character input. Please refer to the AnaGram +User's Guide for information on identifying the tokens to the parser +and providing their semantic values, if any. + +Since a ©File Traceª is based on character codes, it will be greyed out +on the ©Action Menuª if you have token input. For a ©Grammar Traceª, +entering characters in the ©text entryª field is not appropriate and +will simply cause a syntax error. +## + +Lookahead Line + +The last line of the ©Parser Stack paneª, the "lookahead" line, +will sometimes show a ©lookahead +tokenª, and sometimes not. In a ©File Traceª, you will always see a +lookahead token because it is available from the ©test fileª. + +In a ©Grammar Traceª, you will usually see a lookahead token only when +you have used the ©Single Stepª button or if there is available +input in the ©text entryª field. In the latter case, the token +corresponding to the first character of the input will appear on the +lookahead line.
+ +If you click Single Step after selecting a token from ©Allowable +Inputª and it causes only a simple ©reduce actionª (as opposed to a +shift or a compound action), then, upon completion of the reduction, +the token you selected will appear on the lookahead line and also +will be preselected in Allowable Input. + +Usually you would select +this token for the next parse step. However, if there are other +possible inputs in this state, the parse theoretically could have +arrived at this state by a different sequence of input tokens. Thus, +if you are more interested in the behavior of the parser at this +state than in the response of the parser to a particular sequence of +inputs, it is perfectly valid to select a different input token, and +AnaGram will let you do it. + +Note that if you have enabled the ©traditional engineª switch or +disabled the ©default reductionsª switch, the +probability of finding a token which does a simple reduction is +noticeably higher than otherwise. +## + +Action Menu + +The Action menu begins with the ©Analyze Grammarª and ©Build Parserª +commands. If a grammar has already been analyzed, but not yet built, +there will also be an extra Build command bearing the name of your +syntax file. + +There are also ©Reanalyzeª and ©Rebuildª commands which are +initially greyed out. They become available if you change the +current syntax file. + +The next section has ©File Traceª and ©Grammar Traceª +commands. If you have enabled the ©Error Traceª +©configuration switchª, this section also shows an +Error Trace command. + +The menu ends with an ©Exitª command +and a list of recently used syntax files, if any. Just +click on a syntax file name to have AnaGram analyze it, or +build it if the ©Autobuildª option is on. +## + +Browse Menu + +Initially, the Browse Menu shows only a single entry: +©Configuration Parametersª which lets you see the +current state of configuration parameters before any +may have been set by your syntax file. 
Once you have +analyzed a grammar, this menu fills up with many tables +containing information about your grammar. You can also +bring up a window showing your ©syntax fileª from this menu. +If your grammar has generated ©syntax errorªs or warnings, or +contains conflicts, there will be ©Warningªs or ©Conflictªs +entries. +## + +Options Menu + +From this menu you can select a ©Fontsª or ©Colorsª dialog so you can +set AnaGram's fonts and colors to suit your own tastes. You can set +©Autobuildª if you want AnaGram to automatically build your ©grammarª +when you select a ©syntax fileª from the ©Action Menuª. You can also +choose whether or not to automatically show the ©Statistical Summaryª +window or your syntax file window when you open a grammar, or make +the ©Control Panelª stay on top of other AnaGram windows. +## + +Windows Menu + +The Windows menu lets you cascade, close, or hide all AnaGram +windows except the ©Control Panelª, or restore them if they +have been hidden. It also has a list of open windows (even +if hidden) so you can select the one you want. The Copy command will +copy most windows to the clipboard. +## + +Help Menu + +The Help Menu has the following entries: + +©Getting Startedª provides a brief description of AnaGram and +introductory suggestions. + +©Help Topicsª brings up a list of all help topics. + +©Using Helpª tells you how to use AnaGram's help facilities. + +©What's Newª has information on new features of this version of AnaGram. + +©About AnaGramª tells you what version of AnaGram you are using, and also +provides contact information for Parsifal Software. +## + +Autobuild + +When Autobuild (©Options Menuª) is checked, selecting a file +from the list of most recently used files on the ©Action Menuª +invokes the ©Build Parserª command. Otherwise, the ©Analyze +Grammarª command is invoked. +## + +Reanalyze, Rebuild + +Reanalyze and Rebuild commands on the ©Action Menuª are +initially greyed out. 
+ +Reanalyze becomes available if +you have a syntax file currently analyzed or built +in AnaGram and change it while AnaGram is still running. + +Rebuild becomes available if +you have a syntax file currently built +and change it while AnaGram is still running. +## + +Percent Sign + +The percent sign ( % ) is used to mark certain tokens in your grammar +which AnaGram must redefine in order to implement the ©disregardª +statement. If you have used this statement in your grammar, you will +probably notice the percent sign appearing in some windows and traces. + +The percent sign indicates the original token, without the optional +white space attached. Early versions of AnaGram used the degree sign +instead, but this character is not generally available in Windows. +## + +Program Development + +The first step in writing a program is to write a ©grammarª in +AnaGram notation which describes the input the program expects. + +The file containing the grammar, called the ©syntax fileª, should +have the extension ".syn". You could also make up a few sample input +files at this time, but it is not necessary to write ©reduction +procedureªs at this stage. + +Run AnaGram and use the ©Analyze Grammarª command to create parse +tables. If there are ©syntax errorsª in the grammar at this point, +you will have to correct them before proceeding, but you do not +necessarily have to eliminate ©conflictsª, if there are any, at this +time. There are, however, many aids available to help you with +conflicts. These aids are described in the AnaGram User's Guide, and +somewhat more briefly in the Online Help topics. + +Once syntax errors are corrected, you can try out your grammar on the +sample input files using the ©File Traceª facility. +With File Trace, you can see interactively just how your grammar +operates on your test files. You can also use ©Grammar Traceª to +answer "what if" questions concerning input to the grammar.
The +Grammar Trace does not use a test file, but rather allows you to make +input choices interactively. + +At any time, you can write ©reduction procedureªs to process your +input data as its components are identified in the input stream. Each +procedure is associated with a ©grammar ruleª. The reduction +procedures will be incorporated into your parser when you create it +with the ©Build Parserª command. + +By default, unless you specify an input procedure, ©parser inputª +will be read from stdin, using the default ©GET_INPUTª macro. +You will probably wish to redefine GET_INPUT, or configure your +parser to use ©pointer inputª or ©event drivenª input. +## + +License, Copyright, Copying, Open Source, Warranty, No Warranty + +AnaGram, A System for Syntax Directed Programming + +Copyright 1993-2002 Parsifal Software + +Copyright 2006, 2007 David A. Holland + +All Rights Reserved. + +AnaGram itself is released to the public under the traditional 4-clause BSD +license: + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + + 1. Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + 3. All advertising materials mentioning features or use of this software + must display the following acknowledgement: + This product includes software developed by Parsifal Software, + Jerome T. Holland, and their contributors. + + 4. Neither the name of Parsifal Software nor the name of Jerome T. + Holland nor the names of their contributors may be used to endorse or + promote products derived from this software without specific prior written + permission. 
+ + THIS SOFTWARE IS PROVIDED BY PARSIFAL SOFTWARE, + JEROME T. HOLLAND, AND CONTRIBUTORS ``AS IS'' AND ANY + EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY + AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + IN NO EVENT SHALL PARSIFAL SOFTWARE, JEROME T. + HOLLAND, OR THE CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF + USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, + WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + +The AnaGram ©parsing engineª, that is, the code that is emitted by +AnaGram and incorporated into programs developed using AnaGram, uses +this less restrictive zlib-style license: + + This software is provided 'as-is', without any express or implied warranty. + In no event will the authors be held liable for any damages arising from + the use of this software. + + Permission is granted to anyone to use this software for any purpose, + including commercial applications, and to alter it and redistribute it + freely, subject to the following restrictions: + + 1. The origin of this software must not be misrepresented; you must not + claim that you wrote the original software. If you use this software in a + product, an acknowledgment in the product documentation would be + appreciated but is not required. + + 2. Altered source versions must be plainly marked as such, and must not + be misrepresented as being the original software. + + 3. This notice may not be removed or altered from any source distribution. + +##