view anagram/guisupport/helpdata.src @ 15:f5acaf0c8a29
Don't cast through "volatile int". Causes a gcc warning nowadays.
XXX: should put something else back here to frighten the optimizer
author | David A. Holland |
---|---|
date | Tue, 31 May 2022 01:00:55 -0400 |
parents | 13d2b8934445 |
children |
## Accept Action

The accept action is one of the four actions of a traditional ©parsing engineª. The accept action is performed when the ©parserª has succeeded in identifying the goal, or ©grammar tokenª, for the ©grammarª. When the parser executes the accept action, it sets the ©exit_flagª field in the ©parser control blockª to AG_SUCCESS_CODE and returns to the calling program. The accept action is thus the last action of the parsing engine and occurs only once for each successful execution of the parser. If the grammar token has a non-void value, you may obtain its value by calling the ©parser value functionª whose name is given by <parser name>_value, that is, by appending "_value" to the ©parser nameª.

## Parser Value Function, Return Value

The value assigned to the ©grammar tokenª in your parser may be retrieved by calling the parser value function after the parser has finished. The name of this function is given by <©parser nameª>_value. The return type of the function is the type assigned to the grammar token. If you have set the ©reentrant parserª switch, the parser value function takes a pointer to the ©parser control blockª as its sole argument. Otherwise, it takes no arguments. The value function is not defined if the grammar token has type "void".

## AG_PLACEMENT_DELETE_REQUIRED

When the ©wrapperª option is specified, the wrapper template class that AnaGram defines uses a "placement new" operator to construct the wrapper object on the ©parser value stackª. The MSVC++ 6.0 compiler requires, in this situation, that a corresponding "placement delete" operator be defined. Other C++ compilers, notably MSVC++ 5.0, generate an error message if they encounter the definition of a "placement delete" operator. Accordingly, AG_PLACEMENT_DELETE_REQUIRED is used to determine whether a "placement delete" operator should be defined. AG_PLACEMENT_DELETE_REQUIRED is defined to be 1 if you are using MSVC++ 6.0 or greater, 0 otherwise.
You can override the automatic definition of AG_PLACEMENT_DELETE_REQUIRED by defining it in the ©C prologueª section of your grammar. Set it to a non-zero value to force the "placement delete" definition, zero to skip the definition.

## ag_tcv

ag_tcv is an array AnaGram includes in your ©parserª. Your parser uses ag_tcv to translate external codes to the internal token numbers that AnaGram uses. It uses the actual input code to index the ag_tcv array to fetch a ©token numberª. The token number is then used to identify the input token.

## Allow macros

"Allow macros" is a ©configuration switchª which defaults to on. When it is set, i.e., on, ©reduction procedureªs will be implemented as macros if they are sufficiently simple. This makes your ©parserª somewhat more compact but makes it somewhat more difficult to debug. It's a good idea to turn this switch off for debugging.

## Analyze Grammar

The Analyze Grammar command will scan and analyze your ©syntax fileª, and create a number of tables summarizing your grammar. Analyze Grammar does not create any ©output filesª. To create a ©parserª, use the ©Build Parserª command. You would probably use Analyze Grammar, rather than Build Parser, during initial development of your ©grammarª. You can use ©File Traceª and ©Grammar Traceª as soon as you have analyzed your grammar. It is not necessary to build a parser first.

## Attribute Statement

Attribute statements are used in ©configuration sectionsª of your ©syntax fileª to specify certain properties for ©tokenªs, ©character setªs, or other units of your grammar. The attribute statements available are: ©disregardª, ©distinguish keywordsª, ©enumª, ©extend pcbª, ©hiddenª, ©leftª, ©lexemeª, ©nonassocª, ©rename macroª, ©reserve keywordsª, ©rightª, ©stickyª, ©subgrammarª, and ©wrapperª.

## Auto init

Auto init is a ©configuration switchª which defaults to on. It controls the initialization of any ©parserª that is not ©event drivenª.
When it is set to on, your parser is automatically initialized every time it is called. This is the situation you will normally use. On occasion, however, it is desirable to call a parser several times without reinitializing it. In this case, you may set the auto init parameter to off and then call the ©initializerª yourself whenever it is appropriate. ## Auto resynch "Auto resynch" is a ©configuration switchª which defaults to off. You may use it to specify ©automatic resynchronizationª as an ©error recoveryª mechanism. Setting the "auto resynch" switch causes AnaGram to include an automatic ©resynchronizationª procedure in your ©parserª. The resynchronization procedure will be invoked when your parser encounters a ©syntax errorª and will skip over input until it finds input characters or ©tokensª consistent with its state at the time of the error. An alternate technique, ©error token resynchronizationª, uses an ©error tokenª which you include in your grammar. ## Automatic Resynchronization Automatic ©resynchronizationª is one of several ©error recoveryª options available as part of parsers built by AnaGram. You enable automatic resynchronization by setting the ©auto resynchª ©configuration switchª. If your parser includes automatic resynchronization it will incorporate a heuristic procedure which will skip over input tokens until it finds a token which makes sense with respect to one or another of the ©productionªs active at the time of the ©syntax errorª. The purpose of the resynchronization procedure is to provide a simple way for your parser to proceed in the event of syntax errors so that it can find more than one syntax error on a given pass. The resynchronization procedure uses a heuristic based on your own syntax. AnaGram itself uses this technique to resynchronize after syntax errors in its input. A disadvantage to using this resynchronization technique is that the resynchronization procedure turns off all ©reduction procedureªs. 
Because of the error, a number of reduction procedures, which normally would be executed, will be skipped. The parameters for any reduction procedures that might be called later would be suspect and could cause serious problems. It seems more prudent simply to shut them down. If you use the automatic resynchronization procedure, you must also specify an ©eof tokenª so that the synchronizer doesn't inadvertently skip over the end of file. An alternative technique for resynchronization is called ©error token resynchronizationª. ## Auxiliary Trace An Auxiliary Trace is a pre-built grammar trace which you may select from the ©Auxiliary Windowsª popup menu for most windows which display parser state information. The Auxiliary Trace provides a path to the state specified in the highlighted line of the primary window. When obtained from the Parser Stack pane of the ©File Traceª or ©Grammar Traceª, the Auxiliary Trace is simply a copy of the current status of these traces so you can explore your alternatives while still retaining the status of the original trace for reference. ## Auxiliary Windows From most AnaGram windows you can pop up an Auxiliary Windows menu by clicking the right mouse button or by pressing Shift F10. Auxiliary Windows may have Auxiliary Windows of their own. Windows with a cursor bar (highlighted line): The windows available in the Auxiliary Windows menu depend on the grammar elements identified by the cursor bar in the parent window. If the cursor bar identifies a ©parser stateª, there will be windows that describe the state. If the cursor bar identifies a ©grammar ruleª, there will be windows that describe the rule. If the cursor bar identifies a ©tokenª, there will be windows that describe the token. In the case of a ©marked ruleª, token windows will describe the marked token, if any. In some cases, specialized pre-built grammar traces such as the ©Conflict Traceª or ©Auxiliary Traceª are on the menu. 
Help windows: For Help windows, the Auxiliary Windows menu will show all the available links to other ©Help topicsª from this window. ©Using Helpª is always available. ## Backtrack If your ©parserª does not continue after encountering a ©syntax errorª, you can speed it up and make it a little smaller by turning off the backtrack ©configuration switchª. If backtrack is on, AnaGram configures your parser so that in case of syntax error it can undo any ©default reductionsª it might have made as a consequence of the erroneous input. The purpose of such an undo function is to identify the proper ©error frameª and to maximize the probability of being able to recover gracefully. ## Empty Recursion This warning message tells you that the recursive step of the specified ©recursive ruleª can be completely matched by ©zero lengthª tokens, i.e., by nothing at all. The result is potentially an infinite loop in the generated ©parserª. The specified rule is an expansion rule of the specified token. Because of the possibility of encountering an infinite loop while parsing, AnaGram turns off its ©keyword anomalyª analysis if empty recursion is found. The ©File Traceª function is also disabled for the same reason. The ©circular definitionª of a token has the same effect as an empty recursion, in that no additional input is required to match the recursive rule. ## Keyword Anomaly analysis aborted: empty recursion The ©keyword anomalyª analysis has been turned off, since the presence of ©recursive ruleªs with ©empty recursionª can cause infinite loops in the analysis. ## Keyword Anomaly analysis aborted: circular definition The ©keyword anomalyª analysis has been turned off, since the presence of a ©circular definitionª can cause infinite loops in the analysis. ## File Trace disabled: empty recursion Because of the presence of ©recursive ruleªs with ©empty recursionª in this grammar and the infinite loops that can ensue, the ©File Traceª function has been disabled. 
## File Trace disabled: circular definition Because of the presence of a ©circular definitionª in this grammar and the infinite loops that can ensue, the ©File Traceª function has been disabled. ## Both Error Token Resynch and Auto Resynch Specified This ©warningª message indicates that your ©grammarª defines an ©error tokenª and also requests ©automatic resynchronizationª. AnaGram will ignore the request for automatic resynchronization and will provide ©error token resynchronizationª. If you named a token "error" but do not wish ©error token resynchronizationª, you can either rename "error", or, in a ©configuration sectionª, you may explicitly specify the error token to be something you don't otherwise use in your grammar: [ error token = not used ] ## Bottom Margin "Bottom margin" is an ©obsolete configuration parameterª. ## Bright Background "Bright background" is a ©configuration switchª which was used in the DOS version of AnaGram. It is no longer used, but is still recognized for the sake of upward compatibility with old ©configuration fileªs. ## Build Parser You use the Build Parser command to create a ©parserª based on your ©grammarª. The parser is a C file consisting of the ©embedded Cª (which may include C++) code in your ©syntax fileª, your ©reduction procedureªs, a number of tables derived from your grammar specification, and a ©parsing engineª customized to your requirements. If you only wish to investigate your grammar and do not wish to create ©output filesª, use the ©Analyze Grammarª command. ## Build <file name> This item on the ©Action Menuª is available when you have analyzed a ©grammarª but you have not yet built it. It builds the grammar without reloading the ©syntax fileª from the disk. ## Cannot Make Wrapper for Default Token Type This ©warningª message occurs when AnaGram finds a token type that has been previously defined as the ©default token typeª listed in a ©wrapperª statement. 
If a wrapper is needed for a particular type, you must specify the ©data typeª explicitly for each relevant ©tokenª. As a result, a wrapper class has not been created for the specified token type.

## Token with Wrapper cannot be Default Token Type

This ©warningª message indicates that an attempt has been made to specify a class that has previously been listed in a ©wrapperª statement as the ©default token typeª. If a wrapper is needed for a particular type, you must specify the ©data typeª explicitly for each relevant ©tokenª. As a result, the default token type has not been set.

## Case Sensitive

"Case sensitive" is a ©configuration switchª which defaults to on. When it is on, it instructs AnaGram to build a parser for which all input is case sensitive. When it is off, AnaGram builds a parser which ignores case for all input. If the ©iso latin 1ª configuration switch is turned off, case conversion will be limited to characters in the normal ASCII range. When it is on, case conversion will be done for all ISO Latin 1 characters. If you have other requirements for case conversion, you may provide your own definition in your ©embedded cª for the ©CONVERT_CASEª macro which is invoked to perform case conversion on input characters. Note that the value of an input token is unaffected by the case sensitive switch. When case sensitive is off, 'a' and 'A' will be treated as the same input token by the parser, but the ©token valueªs will nevertheless be different.

## C Prologue

If you include a block of ©embedded Cª code at the very beginning of your syntax file, it is called the "C prologue". It will be copied to your ©parser fileª before any of the code generated by AnaGram. You can use the C prologue to ensure that copyright notices, #include directives, or type definitions, for example, occur at the very beginning of your parser file. If you specify a C or C++ type of your own definition, you must provide a definition in the C prologue.
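
The matching behavior described under "Case Sensitive" above can be sketched in plain C. This is only an illustration, not AnaGram's CONVERT_CASE macro: the helper names `fold_case` and `same_token` are hypothetical, and folding to upper case is an arbitrary choice, since the text does not say which direction the real conversion uses. The sketch folds a character code for matching purposes only and leaves the original code untouched, so the token value is preserved.

```c
#include <assert.h>

/* Hypothetical helper (not part of AnaGram): fold a character code to
   upper case for matching only. With latin1 == 0, only ASCII a-z is
   folded; with latin1 != 0, the ISO Latin 1 lower-case letters are
   folded as well. */
static int fold_case(int c, int latin1) {
    assert(c >= 0 && c <= 255);
    if (c >= 'a' && c <= 'z') return c - ('a' - 'A');
    /* ISO Latin 1 lower-case letters occupy 0xE0..0xFE, except 0xF7,
       which is the division sign. Code 0xFF has no upper-case partner
       in Latin 1 and is left alone. */
    if (latin1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 0x20;
    return c;
}

/* Two input codes count as the same token when their folded codes
   agree. The callers keep the original codes a and b as token values. */
static int same_token(int a, int b, int latin1) {
    return fold_case(a, latin1) == fold_case(b, latin1);
}
```

With these definitions, `same_token('a', 'A', 0)` is true while the two values, 97 and 65, remain distinct. An accented pair such as 0xE9 and 0xC9 matches only when the `latin1` argument is non-zero, mirroring the role the text gives to the iso latin 1 switch.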

## CHANGE_REDUCTION

CHANGE_REDUCTION(t) is a macro which AnaGram defines in your ©parser fileª if your ©parserª uses ©semantically determined productionsª. In your ©reduction procedureª, when you need to change the ©reduction tokenª you can easily do so by calling CHANGE_REDUCTION with the name of the desired token as the argument. If the token name has embedded spaces, replace the embedded spaces with underline characters.

## Character Constant

You may represent single characters in your ©grammarª by using character constants. The rules for character constants are the same as in C. The escape sequences are as follows:

| Escape | Meaning |
|---|---|
| \a | alert (bell) character |
| \b | backspace |
| \f | formfeed |
| \n | newline |
| \r | carriage return |
| \t | horizontal tab |
| \v | vertical tab |
| \\ | backslash |
| \? | question mark |
| \' | single quote |
| \" | double quote |
| \ooo | octal number |
| \xhh | hexadecimal number |

AnaGram treats a single character as a ©character setª which contains only the specified character. Therefore you can use a character constant in a ©set expressionª.

## Character Map

The Character Map table shows you the mapping of input characters to ©token numbersª. The ©ag_tcvª table in your parser is based on the information in this table. The fields in this table are: character code; display character, if any (what Windows displays for this code); ©partition set numberª; ©token numberª; and ©token representationª. The display character will be what Windows displays for the character code in the Data Tables font you have chosen.

## Character Range

A "character range" is a simple way to specify a ©character setª. There are two ways to represent a character range in an AnaGram ©syntax fileª. The first way is like a ©character constantª: 'a-z'. The second way allows somewhat greater freedom:

    'a'..'z'
    'a'..255
    ^Z..037
    -1..0xff

Here you use two arbitrary ©character representationsª separated by two dots. If the two characters are out of order, AnaGram will reverse the order, but will give you a ©warningª.
More complex ©character setsª may be specified by using ©unionª, ©differenceª, ©intersectionª, or ©complementª operators.

## Character Representation

In an AnaGram ©syntax fileª you may represent a character literally with a ©character constantª or numerically using decimal, octal or hexadecimal representations following the conventions for C. Thus 'A', 65, 0101, and 0x41 all represent the same character. Control characters can be represented using the '^' character and either an upper or lower case letter. Thus ^j and ^J are acceptable representations of the ASCII newline code. The rules for character constants are identical to those in C, and the same escape sequences are recognized.

## Character Set

In AnaGram grammars you can conveniently specify whole sets of characters at a time. This avoids needless repetition and complexity. Sets of characters may be defined in an AnaGram ©syntax fileª in any of a number of ways. A single character is taken to represent a character set consisting of a single element. (See ©character representationª.) You can also specify a set consisting of a range of characters (see ©character rangeª) and perform the familiar set operations, union, intersection, difference and complement. All the sets you define in your syntax file are summarized in the ©Character Setsª window.

The ©unionª of two character sets, represented by a '+', contains all characters that are in one or another of the two sets. Thus, 'A-Z' + 'a-z' represents the set of all upper and lower case letters.

The ©intersectionª of two character sets, represented by a '&', contains all characters that are in both sets. Thus, suppose you have the ©definitionsª

    letter = 'A-Z' + 'a-z'
    hex digit = '0-9' + 'A-F' + 'a-f'

Then (letter & hex digit) contains precisely upper and lower case a to f.

The ©differenceª of two character sets, represented by a '-', contains all characters that are in the first set but not in the second set.
Thus, using the same definitions as above, (letter - hex digit) contains precisely upper and lower case g to z.

The ©complementª of a character set, represented by a preceding '~', represents all characters in the ©character universeª which are not in the given set. Suppose you have defined a set, ©eofª, which consists of the characters which represent end of file. Then, in your grammar where you wish to accept an arbitrary character, what you really want is anything but an end of file character. You can define it thus:

    anything = ~eof

## Character Sets

This window lists all of the distinct ©character setªs which you defined, implicitly or explicitly, in your ©grammarª. Each line in the table describes one such set. The description takes the form of the internal set number and the defining ©expressionª. The ©Auxiliary Windowsª menu will allow you to see the ©Partition Setsª which cover the character set, and the ©Set Elementsª which it comprises, as well as the ©Token Usageª.

## Character Universe, Universe

The character universe, or set of all expected input characters to your parser, is defined as all characters in the range given by a particular lower bound and a particular upper bound, as described below. The character universe is used for two things in AnaGram. The first use is for calculating the ©complementª of a character set. The second use is in the input processing of your parser. Input characters will be used to index a ©token conversionª table to convert character codes to token numbers. The length of this table will be given by the size of the character universe. If you have set the ©test rangeª ©configuration switchª your parser will verify that the input character is within the range of the conversion table. Otherwise, the character code will not be checked for validity. In this case, an out-of-range character will lead to undefined behavior. If you have not used any characters with negative codes in your grammar, the lower bound is zero.
Otherwise, it is the most negative such character. If the highest character code you have used is less than or equal to 255, the upper bound will be 255. If you have used a character code greater than 255, the upper bound will be the largest such code which appears in your syntax file.

## Characteristic Rule

Each ©parser stateª is characterized by a particular set of ©grammar rulesª, and for each such rule, a marked token which is the next ©tokenª expected. The combination of a grammar rule and its marked token is often called a ©marked ruleª. A marked rule which characterizes a state is called a "characteristic rule". In the course of doing ©grammar analysisª, AnaGram determines the characteristic rules for each ©parser stateª. After analyzing your grammar, you may inspect the ©State Definition Tableª to see the characteristic rules for any state in your parser.

## Characteristic Token

Every state in a ©parserª, except state 0, can be characterized by the one, unique ©tokenª which causes a jump to that state. That token is called the ©characteristic tokenª of the state, because to get to that ©parser stateª you must have just seen precisely that token in the input. Note that several states could have the same characteristic token. When you have a list of states, such as is given by the ©parser state stackª, it is equivalent to a list of characteristic tokens. This list of tokens is the list of tokens that have been recognized so far by the parser.

## Circular Definition

If the ©expansion ruleªs for a ©tokenª contain a ©grammar ruleª that consists only of the token itself, the definition of the token is circular. A circular definition is an extreme case of ©empty recursionª. As in cases of empty recursion, the generated parser may contain infinite loops. When such a condition is detected, therefore, ©keyword anomalyª analysis and the ©File Traceª option are disabled.
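
The set operations described under "Character Set" above (union '+', intersection '&', difference '-', complement '~') can be modeled with a small C sketch. It is an illustration only and says nothing about how AnaGram represents sets internally. Everything named here (`CharSet`, the `cs_*` functions, `UNIVERSE_SIZE`) is hypothetical. The sketch assumes a character universe of 0..255 and, for the complement example, an end of file set consisting of code 0 and ^Z (26).

```c
#include <assert.h>
#include <string.h>

#define UNIVERSE_SIZE 256   /* assumption: character universe is 0..255 */

/* A character set as one membership flag per code in the universe. */
typedef struct { unsigned char has[UNIVERSE_SIZE]; } CharSet;

/* Set containing every code from lo to hi inclusive, like 'a-z'. */
static CharSet cs_range(int lo, int hi) {
    CharSet s;
    int c;
    assert(0 <= lo && lo <= hi && hi < UNIVERSE_SIZE);
    memset(&s, 0, sizeof s);
    for (c = lo; c <= hi; c++) s.has[c] = 1;
    return s;
}

/* a + b: codes that are in one or the other set. */
static CharSet cs_union(CharSet a, CharSet b) {
    int c;
    for (c = 0; c < UNIVERSE_SIZE; c++) a.has[c] = a.has[c] || b.has[c];
    return a;
}

/* a & b: codes that are in both sets. */
static CharSet cs_intersection(CharSet a, CharSet b) {
    int c;
    for (c = 0; c < UNIVERSE_SIZE; c++) a.has[c] = a.has[c] && b.has[c];
    return a;
}

/* a - b: codes that are in a but not in b. */
static CharSet cs_difference(CharSet a, CharSet b) {
    int c;
    for (c = 0; c < UNIVERSE_SIZE; c++) a.has[c] = a.has[c] && !b.has[c];
    return a;
}

/* ~a: codes of the universe that are not in a. */
static CharSet cs_complement(CharSet a) {
    int c;
    for (c = 0; c < UNIVERSE_SIZE; c++) a.has[c] = !a.has[c];
    return a;
}

static int cs_contains(CharSet s, int c) {
    return c >= 0 && c < UNIVERSE_SIZE && s.has[c];
}

/* letter = 'A-Z' + 'a-z' */
static CharSet letter(void) {
    return cs_union(cs_range('A', 'Z'), cs_range('a', 'z'));
}

/* hex digit = '0-9' + 'A-F' + 'a-f' */
static CharSet hex_digit(void) {
    return cs_union(cs_range('0', '9'),
                    cs_union(cs_range('A', 'F'), cs_range('a', 'f')));
}

/* Assumed end of file set: code 0 and ^Z (26). */
static CharSet eof_set(void) {
    return cs_union(cs_range(0, 0), cs_range(26, 26));
}
```

Under these assumptions, `cs_intersection(letter(), hex_digit())` holds exactly the letters a to f in both cases, `cs_difference(letter(), hex_digit())` holds g to z in both cases, and `cs_complement(eof_set())` holds every code in the universe except 0 and 26.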
## column "column" is an integer field in your ©parser control blockª used for keeping track of the column number of the current character in your input. Line and column numbers are tracked only if the ©lines and columnsª ©configuration switchª has been set. ## Command Line If you provide the name of a syntax file on the command line when you start AnaGram, it will open the file and run either ©Analyze Grammarª or ©Build Parserª depending on the setting of the ©Autobuildª switch. ## Command Line Version, agcl.exe The command line version of AnaGram, agcl.exe, can be used in make files. It takes the name of a single syntax file on the command line. Error and ©warningª messages are written to stdout. Normally you would only use the command line version once you have finished developing your ©parserª and are integrating it with the rest of your program. The command line version of AnaGram is not included with trial copies. ## Comment You may incorporate comments in your syntax file using either of two conventions. The first is the normal C convention for comments which begin with "/*" and end with "*/". Such comments may be of arbitrary length. By setting or resetting the ©nest commentsª switch, you may control whether they may be nested or not. The second convention for comments is the C++ comment convention. In this case the comment begins with "//" and ends with a newline. When writing a ©grammarª, you may wish to allow a user to comment his input freely without your having to explicitly allow for comments in your grammar. You may accomplish this by using the ©disregardª statement. ## Compile Command "Compile command" is a ©configuration parameterª which takes a string value. This parameter was used in the DOS version of AnaGram, but is ignored in the Windows version. ## Complement In set theory, the complement of a set, S, is the set of all elements of the ©universeª which are not members of the set S. 
In AnaGram, the complement operator for ©character setsª is given by '~' and has higher precedence than ©differenceª, ©intersectionª, or ©unionª. In AnaGram, the most useful complement is that of the end of file character set. For ordinary ascii files it is often convenient to read the entire file into memory, append a zero byte to the end, and define the end of file set thus: eof = 0 + ^Z. Then, ~©eofª represents all legitimate input characters. You can then use set differences to specify certain useful sets without tedious enumeration. For example, a comment that is to be terminated by the end of line then consists of characters from the set comment char = ~'\n' & ~eof This set could also be written comment char = ~('\n' + eof) ## Completed Rule A "completed rule" is a ©characteristic ruleª which has no ©marked tokenª. In other words, it has been completely matched and will be reduced by the next input. If there is more than one completed rule in a state, the decision as to which to reduce is made based on the next input token. If there is only one completed rule in a state, it will be reduced by default unless the ©default reductionsª switch has been reset, i.e., turned off. ## Configuration File If it can find them, AnaGram reads two configuration files to set up ©configuration parameterªs. At program initialization, it will first attempt to read a configuration file in the directory that contains the AnaGram executable file you are running. Then it will read a configuration file in your working directory. Both files should have the name "AnaGram.cfg" if they exist. Neither is necessary. If a parameter is specified in both files, the specification in the file from the working directory takes precedence. The effect of this two stage process is to allow you to set your standard preferences in the principal directory, with specific overrides in your working directories. 
The values for configuration parameters in ©syntax filesª override those read from configuration files. AnaGram does not save configuration parameters in the Windows registry, nor does it provide any mechanism for setting or changing the values of configuration parameters within AnaGram itself. ## Configuration Parameter Configuration parameters may be specified either in ©configuration filesª or in your ©syntax fileª. In your syntax files, configuration parameters are specified, one per line, in a ©configuration sectionª. AnaGram ignores case when identifying a configuration parameter, so that "ALLOW MACROS", "Allow Macros", and "allow macros" are all equivalent forms. There may be any number of configuration sections in a ©syntax fileª. Any parameter may be specified any number of times. Since AnaGram maintains only one value in storage for these parameters, whenever it refers to one it will see the most recently specified value. Every configuration parameter has a default value which has been chosen to correspond to a standard if it exists, customary usage if such can be determined, or otherwise to the most likely usage. Before executing an Analyze Grammar or Build Parser command, AnaGram resets configuration parameters to their initial values, as determined by the built in defaults and the configuration files read at program initialization. The ©Configuration Parameters Windowª shows the current settings of all of the configuration parameters. When this window is active you may press ©F1ª or click with the ©help cursorª to pop up a help window describing the parameter under the cursor bar. There are several varieties of configuration parameters. Some simply set or reset a condition. These need simply be stated to set the condition or negated with the tilde (~) to reset the condition. Thus [ nest comments ] causes AnaGram to allow nested comments, and [ ~nest comments ] causes AnaGram to disallow nested comments. 
If you prefer you may explicitly specify a switch value as on or off: [ nest comments = on] A second kind of configuration parameter takes a value which is the name of a token. Thus [ grammar token = c grammar] specifies that the token, c grammar, is the ©grammar tokenª which is to be analyzed. A third variety of configuration parameter takes a value which is a C data type. Thus [ default token type = unsigned char *] signifies that the ©semantic valueª of a token, unless otherwise specified is a pointer to an unsigned char. A fourth variety of configuration parameter takes a string value to set some ascii string used by AnaGram. Thus [ header file name = "widget.h" ] signifies that the header file created by AnaGram should be called "widget.h". In string-valued parameters used to specify the names of output files or the name of your parser, you may use the '#' character to indicate the name of your syntax file: When the string is actually used, AnaGram will substitute the syntax file name for the '#'. In string-valued parameters used to specify the names of functions or variables that AnaGram generates, you may use '$' to specify the name of your parser. When the string is actually used, AnaGram will substitute the name of your parser for the '$'. In the "©enum constant nameª" configuration parameter you may use '%' to specify where a token name is to be substituted. The final variety of configuration parameter takes a numeric value. The value may be decimal, octal or hexadecimal, following the C conventions, and may have an optional sign. Thus [parser stack size = 50] tells AnaGram to allocate space for at least fifty stack entries when it creates your parser. ## Configuration Parameters Window The Configuration Parameters window lists the ©configuration parameterªs AnaGram accepts with their current values, as set by the ©configuration filesª it has read and by the most recent ©syntax fileª it has analyzed. 
Configuration parameters cannot be changed from within AnaGram. ## Configuration Section A configuration section is one of the main divisions of your ©syntax fileª. It begins with a left square bracket on a fresh line. It then contains definitions of ©configuration parameterªs, ©configuration switchª settings and ©attribute statementªs. These specifications must each start on a new line. The configuration section is closed with a right bracket. Any further component of your syntax file, other than a ©commentª, must start on a fresh line. There can be any number of configuration sections in a syntax file. ## Configuration Switch A configuration switch is a ©configuration parameterª which can take on only the two values true and false, or on and off. You set a configuration switch, or turn it on, by simply naming it in your ©configuration fileª or in a ©configuration sectionª of your ©syntax fileª. You turn it off, or "reset" it, by use of the tilde: "~nest comments", for example, resets, or turns off, the ©nest commentsª switch. If you prefer, you may assign the value "on" to set the switch, or "off" to reset it. For example: nest comments = on ## Conflict "Conflicts" arise during the ©grammar analysisª when AnaGram cannot determine how to treat a given input token. There are two sorts of conflicts: ©shift-reduce conflictsª and ©reduce-reduce conflictsª. Conflicts may arise either because the grammar is inherently ambiguous, or simply because the grammar analyzer cannot look far enough ahead to resolve the conflict. In the latter case, it is often possible to rewrite the grammar in such a way as to eliminate the conflict. In particular, ©null productionsª are a common source of conflicts. When AnaGram analyzes your grammar, it lists all unresolved conflicts in the ©Conflictsª window. A number of ©Auxiliary Windowsª available from the Conflicts window provide help in identifying the source of the conflict. There are a number of ways to deal with conflicts. 
If you understand the conflict well, you may simply choose to ignore it. When AnaGram encounters a shift-reduce conflict while building parse tables it resolves it by choosing the ©shift actionª. When AnaGram encounters a reduce-reduce conflict while building parse tables, it resolves it by selecting the ©grammar ruleª which occurred first in the grammar. A second way to deal with conflicts is to set ©operator precedenceª parameters. If you set these parameters, AnaGram will use them preferentially to resolve conflicts. Any conflicts so resolved will be listed in the ©Resolved Conflictsª window. A third way to resolve a conflict is to declare some tokens as ©stickyª. This is particularly useful for ©productionªs whose sole purpose is to skip over uninteresting input. A fourth way to resolve conflicts is to declare a token to be a ©subgrammarª. When you do this, AnaGram does not look beyond the definition of the subgrammar token itself for reducing tokens. This is not a particularly selective way to resolve conflicts and should be used only when the subgrammar token is naturally defined only by internal criteria. The tokens identified by lexical scanners are prime examples of this genre. The fifth way to deal with conflicts is to rewrite the grammar to eliminate them. Many people prefer this approach since it yields the highest level of confidence in the resulting program. Please refer to the AnaGram User's Guide for more information about dealing with conflicts. ## Conflicts If there are ©conflictªs in your grammar which are not resolved by ©precedence rulesª, they will be listed in the Conflicts window. The Conflicts window will also be listed in the ©Browse Menuª. Conflicts which have been resolved by ©precedence rulesª are listed in the ©Resolved Conflictsª window. The Conflicts window lists the conflicts, or ambiguities, which AnaGram found in your grammar. 
The table identifies the ©parser statesª in which it found conflicts, the ©conflict tokenªs for which it had more than one option, and the ©marked rulesª for each such option. If one of the rules for a particular conflict has a ©marked tokenª, the conflict is a ©shift-reduce conflictª. The marked token is the token to be shifted. If none of the rules has a marked token the conflict is a ©reduce-reduce conflictª. AnaGram provides a number of ©Auxiliary Windowsª to help you find and fix the source of the conflict. The ©Conflict Traceª window is a pre-built ©Grammar Traceª window which shows you one of perhaps many ways to encounter the conflict. The ©Reduction Traceª window shows the result of reducing a particular ambiguous rule. In addition, the ©Rule Derivationª and ©Token Derivationª windows show you why the conflict token is a ©reducing tokenª. They are particularly useful for shift-reduce conflicts. The ©Expansion Chainª window is helpful for understanding reduce-reduce conflicts. Other Auxiliary Windows which are often useful are the ©State Definitionª window, the ©Reduction Statesª window, and the ©Problem Statesª window. Please refer to the AnaGram User's Guide for more information on how to deal with conflicts. ## Conflicts Resolved by Precedence Rules This ©warningª message indicates that AnaGram has resolved conflicts in your grammar by using ©precedence rulesª: guidelines you supplied either by explicit ©precedence declarationsª, by using a ©stickyª statement or ©distinguish lexemesª statement, or implicitly by using a ©disregardª statement. These conflicts are listed in the ©Resolved Conflictsª window, and are not listed in the ©Conflictsª window. ## Conflict Token In any given ©conflictª, there is a ©tokenª for which an unambiguous ©parser actionª cannot be determined. This token is called the "conflict token". 
## Conflict Trace The Conflict Trace is a ready-made ©Grammar Traceª which shows you one of perhaps many ways to get to the state which has the ©conflictª selected by the cursor bar. The Conflict Trace window is an option in the ©Auxiliary Windowsª menu for the ©Conflictsª window and the ©Resolved Conflictsª window. ## Const Data The const data ©configuration switchª controls the use of CONST qualifiers in generated code. If the switch is set, all fixed data arrays in the ©parser fileª will be qualified as CONST, unless the ©old styleª switch is set. The default setting is ON. Other configuration switches which control declaration qualifiers in the parser file are ©near functionsª and ©far tablesª. ## CONTEXT "CONTEXT" is a macro which AnaGram defines for you if you have defined a ©context typeª. It provides access to the top value of the ©context stackª. Your ©GET_CONTEXTª macro may store the current context by assigning a value to CONTEXT. Suppose your parser uses ©pointer inputª, and you wish to know the value of the ©pointerª for every production. You could define GET_CONTEXT thus: #define GET_CONTEXT CONTEXT = PCB.pointer In ©reduction procedureªs, you may use the CONTEXT macro to find the context for the rule you are reducing, that is to say, the value the context variables had when the first token in the rule was encountered. ## Context Stack It is often convenient, when writing ©reduction procedureªs, to know the actual context of the ©grammar ruleª your procedure is reducing. To do this you need to know the values that certain variables, such as stack pointers, or input pointers, in your program had at various stages as your parser matched the rule. You can accomplish this by maintaining a context stack. If you wish, AnaGram will keep track, on a stack, of any context variables you wish. To do so, define a structure which can hold all the values you need to stack. Use the ©context typeª ©configuration parameterª to tell AnaGram how to declare the stack. 
Then define the ©GET_CONTEXTª macro to gather the appropriate values and store them on the stack. The ©CONTEXTª macro evaluates to the proper location into which the GET_CONTEXT macro should store the context value. AnaGram will invoke the GET_CONTEXT macro whenever necessary to make sure the right values are stacked. In a reduction procedure, you can then use the macro ©RULE_CONTEXTª to find the value of the context structure as of the beginning of each token in the rule you are reducing. If your parser is ©event drivenª, store the context of the input token in PCB.input_context. The default version of GET_CONTEXT will stack the context as appropriate. If your parser should encounter an error, you may use ©ERROR_CONTEXTª to determine the values of the context variables at the beginning of the aborted grammar rule. ## context type "Context type" is a ©configuration parameterª whose value is a C type name, possibly as defined by a typedef statement. By default, "context type" is undefined. If you define it, AnaGram will set up a ©context stackª in your ©parser control blockª so you can track the context of ©productionªs. Each time your parser pushes values onto the state stack and value stack it will invoke the ©GET_CONTEXTª macro to store the current context on the context stack. The macro ©CONTEXTª names the current stack location. In your GET_CONTEXT macro you can use it as the destination for the current context. In a ©reduction procedureª, CONTEXT names the context as of the beginning of the production. Two other macros are available to inspect the values of the context stack. In a reduction procedure, you may use ©RULE_CONTEXTª[k] to determine the value of the context variable as it was as of the (k+1)th token in the rule. In particular, RULE_CONTEXT[0] is the value the context variable had when the first token in the rule was seen. 
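The context stack mechanics described above can be sketched in plain C. This is a toy model, not AnaGram's generated code: the structure `toy_context`, the stand-in control block `toy_pcb`, and the helpers `toy_reset`, `toy_shift` and `toy_rule_context` are all hypothetical names invented for illustration, and the macro definitions shown are assumptions about how CONTEXT and GET_CONTEXT could behave, not the definitions AnaGram emits.

```c
#include <assert.h>

/* Hypothetical context structure: the values we want remembered per token. */
typedef struct { int line; int column; } toy_context;

enum { TOY_STACK_DEPTH = 32 };

/* Stand-in for a parser control block; only the context stack is modeled. */
typedef struct {
    toy_context cs[TOY_STACK_DEPTH]; /* context stack                      */
    int ssx;                         /* index of the next free stack slot  */
    int line, column;                /* current input position             */
} toy_pcb;

static toy_pcb PCB;

/* Illustrative analogues of the macros discussed in the text. */
#define CONTEXT     (PCB.cs[PCB.ssx])
#define GET_CONTEXT (CONTEXT.line = PCB.line, CONTEXT.column = PCB.column)

void toy_reset(void) { PCB.ssx = 0; PCB.line = 1; PCB.column = 1; }

/* Called each time a token is pushed: record the context, then advance. */
void toy_shift(int token_width) {
    assert(PCB.ssx < TOY_STACK_DEPTH);
    GET_CONTEXT;
    PCB.ssx++;
    PCB.column += token_width;
}

/* Context of the (k+1)th token of a rule with rule_length tokens,
   in the spirit of RULE_CONTEXT[k]. */
toy_context toy_rule_context(int rule_length, int k) {
    assert(0 <= k && k < rule_length && rule_length <= PCB.ssx);
    return PCB.cs[PCB.ssx - rule_length + k];
}
```

After three calls to `toy_shift`, `toy_rule_context(3, 0)` yields the position recorded when the first of the three tokens was pushed, which mirrors the role the text assigns to RULE_CONTEXT[0].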
If you enable the ©error frameª ©configuration switchª, you may use ©ERROR_CONTEXTª to determine the context of the production your parser was trying to identify at the time of the error. ## CONVERT_CASE CONVERT_CASE is a user definable macro which AnaGram invokes to convert the case of input characters when the ©case sensitiveª switch has been turned off. If you do not define the macro yourself, AnaGram will provide a macro which will convert case correctly for characters in the ASCII character range and also for ©ISO latin 1ª characters if the corresponding ©configuration switchª is on. ## Coverage File Name If you have set the ©rule coverageª ©configuration switchª to include coverage analysis in your parser, AnaGram uses the value of the coverage file name ©configuration parameterª to find the results of your testing. The value of the parameter is a string. The default value is "#.nrc", where '#' represents the name of your syntax file. ## cs cs is a field in a ©parser control blockª which contains your ©context stackª. cs will be defined only if you have defined the ©configuration parameterª ©context typeª. ## Current Grammar The Current Grammar is the ©grammarª you presently have loaded. Its name is displayed on the title bar of each AnaGram window. A status field at the right center of the ©Control Panelª indicates the state of processing that has been carried out on the grammar. "Loaded" means that the ©syntax fileª has been read into memory, but that syntax errors have been found. "Parsed" means that AnaGram has tried to analyze the grammar, but got into some kind of difficulty and did not complete the job. The explanation should be apparent from the messages in the ©Warningsª window. "Analyzed" means that a ©grammar analysisª has been completed, but no ©output filesª have been written. "Built" means that an analysis has been completed and output files have been written. ## Data Type The ©tokensª in your ©parserª usually have ©semantic valuesª. 
The data types for these values will be determined by the ©default input typeª and ©default token typeª ©configuration parameterªs unless you explicitly provide ©token declarationsª in your grammar. You may also define the data type for any ©nonterminalª token by preceding the token name with an ordinary C cast when you write a production. For example: (int) integer -> '0-9':d =d-'0'; -> integer:n, '0-9':d =10*n + d - '0'; The data type may be any simple C or C++ data type, with arbitrary indirection and qualification. You may also use any type you have defined by means of typedef, struct or class definitions. Template classes may also be used. If you specify a type of your own definition, you must provide a definition in the ©C prologueª at the beginning of your ©syntax fileª. A token may have the type "void" if its value has no interest for the parser. Since your parser will not stack a value for a void token, your parser may run somewhat faster when tokens are declared as void. ## Declare pcb "Declare pcb" is a ©configuration switchª that defaults to on. If this switch is set when you invoke the ©Build Parserª command, AnaGram will automatically declare a ©parser control blockª for you, at the beginning of your parser file. If you have used data types that you define yourself, the typedef statements need to precede the parser control block declaration. In this case, you should turn "declare pcb" off and declare it yourself. For more information, see the AnaGram User's Guide. ## Default Input Type The default input type is a ©configuration parameterª which determines the ©data typeª for the ©semantic valueªs of ©terminal tokensª if they are not explicitly declared. Normally, you would explicitly declare terminal tokens only when you have set the ©input valuesª ©configuration switchª. If you do not set the default input type, it will default to "int". 
The default data type for the values of ©nonterminal tokensª is given by the ©default token typeª configuration parameter. ## Default Reduction "Default reductions" is a ©configuration switchª which defaults to on. A "default reduction" is a ©parser actionª which may be used in your parser in any state which has precisely one ©completed ruleª. If a given ©parser stateª has, among its ©characteristic rulesª, exactly one completed rule, it is usually faster to reduce it on any input than to check specifically for correct input before reducing it. The only time this default reduction causes trouble is in the event of a ©syntax errorª. In this situation you may get an erroneous reduction. Normally when you are parsing a file, this is inconsequential because you are not going to continue semantic action in the presence of error. But, if you are using your parser to handle real-time interactive input, you have to be able to continue semantic processing after notifying your user that he has entered erroneous input. In this case you would want default reductions to have been turned off so that ©productionªs are reduced only when there is correct input. ## Default reduction value If a ©grammar ruleª does not have a ©reduction procedureª the ©semantic valueª of the first token in the rule will be taken as the semantic value of the token on the left hand side. If these tokens do not have the same ©data typeª a ©warningª will be given. ## Default Token Type "Default token type" is a ©configuration parameterª which determines the ©data typeª for the ©semantic valueª of a ©nonterminal tokenª if no other type is explicitly specified. It defaults to void. Therefore, if any ©reduction procedureª returns a value, you must either explicitly set the type of the ©reduction tokenª or you must set default token type to an appropriate value. The default token type cannot have a ©wrapperª class defined. 
The default data type for the value of a ©terminal tokenª is given by the ©default input typeª configuration parameter. ## Definition, Definition Statement AnaGram syntax files may contain definition statements which assign new names to ©character setsª, ©virtual productionsª, ©keyword stringsª, ©immediate actionsª, or ©tokensª. Definitions have the form name = <character set> name = <virtual production> name = <keyword string> name = <immediate action> name = <token name> For example, letter = 'a-z' + 'A-Z' statement list = statement?... include = "include" The symbols thus defined may be used anywhere the expression on the right hand side might be used. Such definitions, in and of themselves, do not define tokens. Tokens are defined only by their usage in productions. ## DELETE_WRAPPERS If your parser uses ©wrapperªs and exits with an error condition, there may be objects remaining on the ©parser value stackª. The DELETE_WRAPPERS macro can be used to delete any remaining objects on the stack. If you have enabled ©auto resynchª, DELETE_WRAPPERS will be invoked automatically. ## Diagnose Errors "Diagnose errors" is a ©configuration switchª which defaults to on. When this switch is on, AnaGram includes a function, ag_diagnose(), in your parser which provides simple syntax error diagnoses. When your parser encounters a syntax error, this function will be called immediately prior to the invocation of the ©SYNTAX_ERRORª macro. A pointer to the message will be stored in the ©error_messageª field of the ©parser control blockª. If you wish to implement your own ©error diagnosisª, you should turn this switch off, and include a call to your own diagnostic procedure in your SYNTAX_ERROR macro. ag_diagnose() provides three possible error messages, governed by three macros: ©MISSING_FORMATª, ©UNEXPECTED_FORMATª, and ©UNNAMED_TOKENª.
You may override the definitions of these macros with your own definitions if you wish to provide diagnostics in another language. If you have set the ©error frameª switch, it will also set the ©error_frame_tokenª field. The "error_frame_token" is the non-terminal token which the parser was trying to complete when the error was encountered. When the "diagnose errors" switch is set, AnaGram also includes a ©token namesª table in the parser which contains the ASCII names of the tokens in the grammar, including entries for character constants and keywords. Use the ©token names onlyª switch to limit the table to explicitly named tokens only. ## MISSING_FORMAT MISSING_FORMAT is a macro that is used by the error diagnostic function created by the ©diagnose errorsª switch. If you do not define it in your parser, AnaGram will define it thus: #define MISSING_FORMAT "Missing %s" This format is used when the diagnostic function can identify a unique terminal or nonterminal token that would satisfy the syntactic rules and is named in the ©token namesª table. ## UNEXPECTED_FORMAT UNEXPECTED_FORMAT is a macro that is used by the error diagnostic function created by the ©diagnose errorsª switch. If you do not define it in your parser, AnaGram will define it thus: #define UNEXPECTED_FORMAT "Unexpected %s" This format is used when the diagnostic function cannot identify a named, unique terminal or nonterminal token that would satisfy the syntactic rules and finds an incorrect token, the name of which can be found in the ©token namesª table. ## UNNAMED_TOKEN UNNAMED_TOKEN is a macro that is used by the error diagnostic function created by the ©diagnose errorsª switch. If you do not define it in your parser, AnaGram will define it thus: #define UNNAMED_TOKEN "input" This macro is used as argument for the ©UNEXPECTED_FORMATª macro when the actual, erroneous input cannot be identified.
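To make the interplay of the three macros concrete, here is a minimal sketch of how a diagnostic routine might choose among them. The three `#define` defaults are the ones quoted in the text; the function `toy_diagnose` and its parameters are hypothetical and are not the ag_diagnose() that AnaGram generates, whose internals are not described here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Defaults as quoted in the text; define your own earlier to override,
   for example to produce messages in another language. */
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

/* Hypothetical helper, for illustration only.
   expected_name: name of the unique token that would have been accepted,
                  or NULL if no such named token exists.
   found_name:    name of the offending token, or NULL if it has no name. */
const char *toy_diagnose(char *buf, size_t size,
                         const char *expected_name, const char *found_name) {
    assert(buf != NULL && size > 0);
    if (expected_name != NULL) {
        snprintf(buf, size, MISSING_FORMAT, expected_name);
    } else {
        snprintf(buf, size, UNEXPECTED_FORMAT,
                 found_name != NULL ? found_name : UNNAMED_TOKEN);
    }
    return buf;
}
```

Because the formats are ordinary printf-style strings, a translated parser only needs to redefine the three macros; the selection logic stays the same.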
## Difference In set theory, the difference of two sets, A and B, is defined to be the set of all elements of A that are not elements of B. In an AnaGram ©syntax fileª, you represent the difference of two ©character setsª by using the '-' operator. Thus the difference of A and B is A - B. The difference operator is ©left associativeª. ## Disregard The purpose of the "disregard" statement is to skip over uninteresting ©white spaceª and comments in your input file. It allows you to specify a token that should be passed over in the input to your parser. The statement takes the form: disregard ws where "ws" is a token name or character set. Disregard statements, like other ©attribute statementªs, may be placed in any ©configuration sectionª. You may have more than one disregard statement in your ©grammarª. If you do, AnaGram will create a shell production. For example, suppose you write: [ disregard alpha disregard beta ] AnaGram will proceed as though you had written: gamma -> alpha | beta [ disregard gamma ] It frequently happens that you wish your ©parserª to disregard blanks or comments, except that ©white spaceª within names, numbers, strings, and other elementary constructs is subject to special rules and thus should not be disregarded blindly. In this case, you can use the "©lexemeª" statement to declare these constructs off limits for the disregard statement. Within these constructs, the disregard statement will be inoperative and the admissibility of white space is determined solely by the productions which define these constructs. Outside those productions which define lexemes, you should not generally use a token which is supposed to be disregarded. If you do, your grammar will have ©conflictªs, since the token could satisfy both the explicit usage, as well as the implicit rules set up by the disregard statement. Such conflicts, however, are resolved automatically in favor of your explicit use of the token. 
The conflicts will appear in the ©Resolved Conflictsª window. If you have "open ended" lexemes in your grammar such as variable names or numeric constants, AnaGram will detect a conflict if one of these lexemes may follow another such lexeme immediately. To deal with these conflicts, you should turn on the "©Distinguish Lexemesª" configuration switch. It will cause white space to be required as a separator between the lexemes. In order to implement the "disregard" statement AnaGram will redefine some tokens in your grammar. For example, '+' may be redefined to consist of a simple plus sign followed by optional white space: '+' -> '+'%, white space?... The ©percent signª is used to indicate the original, simple plus without the optional white space attached. You will probably notice the percent sign appearing in some windows and traces. ## distinguish keywords "distinguish keywords" is an ©attribute statementª which you may include in a ©configuration sectionª. It is used to tell AnaGram how to distinguish ©keywordªs from similar sequences of characters in your input stream. For example, you may want your parser to recognize "int" as a keyword when it appears in the following context: int x; but not when it appears in the middle of such words as "integral" and "intolerant". The operand of "distinguish keywords" is a list of character set ©expressionªs separated by commas and enclosed in braces ({ }). Once AnaGram has read your entire syntax file, it evaluates all of these character sets and tests each keyword string against the character sets in the order in which they were encountered in the program. If all the characters which constitute a particular keyword are members of the specified set, the keyword logic is set up so that it will recognize the keyword only if the immediately following character is not in the set. In the example above, [distinguish keywords {'a-z'} ] will do the trick. The "©stickyª" statement also affects the recognition of keywords.
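The rule that "distinguish keywords" applies can be illustrated with a small hand-written check. This is only a sketch of the rule as stated in the text, using the character set 'a-z' from the example; `toy_in_set` and `toy_match_keyword` are hypothetical names, and the generated keyword logic in a real parser is table driven rather than written like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Membership test for the character set 'a-z' used in the example. */
bool toy_in_set(int c) { return c >= 'a' && c <= 'z'; }

/* Returns true if keyword should be recognized at the start of input. */
bool toy_match_keyword(const char *input, const char *keyword) {
    size_t n = strlen(keyword);
    assert(n > 0); /* keyword strings must contain at least one character */

    if (strncmp(input, keyword, n) != 0)
        return false;

    /* The extra test applies only when every keyword character is in the
       set; a keyword such as "<=" is recognized unconditionally. */
    for (size_t i = 0; i < n; i++)
        if (!toy_in_set((unsigned char)keyword[i]))
            return true;

    /* Recognize the keyword only if the next character is not in the set. */
    return !toy_in_set((unsigned char)input[n]);
}
```

With this check, `toy_match_keyword("int x;", "int")` succeeds, while the same keyword is rejected at the start of "integral" and "intolerant" because the character after "int" is a lower-case letter.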
## Distinguish Lexemes The "distinguish lexemes" ©configuration switchª is used in conjunction with the "©disregardª" statement and the "©lexemeª" statement to resolve the ©shift-reduce conflictªs which often crop up when suppressing white space. The difficulty with suppressing white space is that you wish it to be optional in cases like "x+y", where it is not necessary in order to parse correctly, but you want to require it in situations such as "mytype x", where it is necessary to separate otherwise indistinguishable constructs. If the white space were optional, it would be necessary to allow for "mytypex", but it would be impossible to determine if this were to be interpreted as "mytype x", "mytyp ex", or any of the many other possibilities. The distinguish lexemes switch causes AnaGram to make the white space optional where doing so causes no ambiguity and makes it mandatory where to make it optional would lead to ambiguity. In the example given above, "mytypex" would be treated as a single name, and another name would have to follow separating white space. The default value for distinguish lexemes is OFF. It is anticipated that this will be changed to ON in future releases of AnaGram. ## Duplicate Production This ©warningª message appears when a ©productionª appears twice in your ©grammarª. You will have a number of ©reduce-reduce conflictªs as a consequence. Eliminate the duplicate, and the conflicts it caused will go away. ## Edit Command "Edit command" is a ©configuration parameterª which accepts a string value. It is no longer used and is retained only for file compatibility with the DOS version of AnaGram. ## Embedded C You may encapsulate pieces of C or C++ code in your ©syntax fileª more or less arbitrarily. Such pieces of code will simply be copied to the ©parser fileª in the order in which they are encountered. Each such piece of code must be enclosed with braces ({}).
The left brace must be on a new line, and nothing except comments may follow the right brace. AnaGram does not inspect the interior of such a piece of C code except to identify character constants, strings, comments and blocks surrounded with braces so that it does not identify the end of the embedded C prematurely. Note that AnaGram will use the status of the ©nest commentsª ©configuration switchª in effect at the beginning of the embedded C. AnaGram, of course, can be confused by unterminated strings, unbalanced brackets, and unterminated comments. The most likely outcome, in such a situation, is that AnaGram will encounter an end of file looking for the end of the embedded C. Should this happen, AnaGram will identify the beginning of the piece of embedded C which caused the problem. If your syntax file begins with a block of embedded C, called the "©C prologueª", it will be copied to the very beginning of the parser file, preceding all of AnaGram's output. You may use such an initial block of embedded C to guarantee that program title comments, copyright notices and important definitions are at the very beginning of your parser file. The code you include as embedded C, of course, has to coexist with the code AnaGram generates. In order to keep the potential for name conflicts to a minimum, all variables and functions which AnaGram defines begin with the letters "ag_". You should avoid variable names which begin with these letters. If AnaGram finds no embedded C in a syntax file, and you ask it to build a parser, it will automatically generate a main program that calls your parser. If you don't want it to do this, you may turn off the ©main programª ©configuration switchª. ## Empty Keyword String This ©warningª appears when you have a keyword string that contains no characters whatsoever. ©Keyword stringsª must contain at least one character. If you wish a null match, use a ©null productionª instead. 
## Enable Mouse "Enable mouse" is a ©configuration switchª that defaults to on. It is not used in the Windows version of AnaGram and has been retained only for file compatibility with the DOS version. ## Enum Constant Name The "enum constant name" ©configuration parameterª allows you to select the name AnaGram will use for the set of enumeration constants it defines in the ©parser headerª file for your ©parserª. The value of "enum constant name" should be a string containing the '%' character. AnaGram will substitute each token name in turn into this template as it creates the list of enumeration constants. If it finds a '$' character it will substitute the name of your parser. The default value of "enum constant name" is "$_%_token". ## Enumeration Constants In your ©parser headerª file, AnaGram includes a typedef enum statement which provides enumeration constants corresponding to all the named constants in your grammar. The names of the enumeration constants themselves are defined by the ©enum constant nameª ©configuration parameterª. These constants are useful when dealing with ©semantically determined productionsª. ## Enum Within a ©configuration sectionª, you may use an "enum" statement to define numeric values for any number of tokens just as you define enumeration constants in C. The syntax is effectively the same as the enum statement in C: [ enum { first = 60, second, third, fourth = 'a', fifth, } ] is exactly equivalent to first = 60 second = 61 third = 62 fourth = 'a' fifth = 'b' ## eof "eof" is a quasi reserved word in AnaGram, used to specify an end of file token. You may use another token as an end of file delimiter by setting the ©Eof Tokenª ©configuration parameterª. eof is not required unless you use ©automatic resynchronizationª in your ©parserª. If you have not defined eof or specified an Eof Token parameter, ©File Traceª may show a syntax error when it encounters the end of a test file. 
There are various ascii values that are commonly used to represent an end of file. The end of a string in memory is commonly 0, DOS uses ^Z, Unix uses ^D, and Unix style stream I/O uses -1. It is often convenient then to define eof = -1 + 0 + ^D + ^Z ## Eof Token "Eof token" is a ©configuration parameterª which accepts a token name as a value. There is no default value. AnaGram does not need a specification for the eof token unless you are using its ©automatic resynchronizationª facility. If you use the ©automatic resynchronizationª capability of AnaGram, you must specify explicitly an end of file token. You can do this either by defining a ©terminal tokenª in your ©grammarª called eof or by using the "eof token" parameter to identify some other terminal token to be used as the end of file marker. You would do this only if you must use the name "©eofª" for some other purpose. Note that "eof" is case sensitive. Neither Eof nor EOF will qualify as end of file tokens unless you explicitly specify them using the eof token parameter. ## Eof Token Not Defined This ©warningª appears if you have requested either ©error token resynchronizationª or ©automatic resynchronizationª and you have not defined an ©eof tokenª. The resynchronization procedure will not work correctly at end of file. ## Error Action The error action is one of the four ©parser actionªs of a traditional ©parsing engineª. The error action is performed when the parser has encountered an input token which is not admissible in the current state. The further behavior of a traditional parser is undefined. ## Error Defining "Error defining TXXX: <token representation>" is a ©warningª message which appears if errors are encountered while attempting to evaluate the ©character setª for the specified ©tokenª. This warning is always generated in addition to more detailed warnings that are made when the actual errors are encountered. ## Error frame "Error frame" is a ©configuration switchª which defaults to off. 
You use this switch to specify the ©error diagnosisª capabilities of your parser. If this switch is set and the ©diagnose errorsª switch is set, i.e., on, your parser will include a function which will determine the "context" of any ©syntax errorª, that is, the token the parser was trying to complete. To determine the context of an error, your parser will scan backwards through the ©parser state stackª, examining ©characteristic rulesª until it finds a state which can accept a unique ©nonterminalª reduction token that you have not marked as ©hiddenª. It will then set PCB.©error_frame_ssxª to the ©parser stack indexª for that level. ## ERROR_CONTEXT ERROR_CONTEXT is a macro AnaGram defines for you. If your parser encounters a ©syntax errorª, you have enabled the ©error frameª ©configuration switchª, and you have defined a ©context typeª, ERROR_CONTEXT will enable you to access the ©contextª as of when the parser encountered the beginning of the ©error_frame_tokenª. ## Error Diagnosis "Error diagnosis" and ©error recoveryª are the two aspects of ©error handlingª. If in the ©embedded Cª portion of your syntax file you define a macro called ©SYNTAX_ERRORª, it will be invoked by the parser when a ©syntax errorª is encountered. If you have set the ©diagnose errorsª ©configuration switchª, the ©error_messageª field of the ©parser control blockª will contain a pointer to a string containing a diagnostic message. The diagnostic is of the form "Missing <token name>" or "Unexpected <token name>". If you do not define SYNTAX_ERROR it will be automatically defined so that a message will be written to stderr. If the ©lines and columnsª switch has been set you will have the current line number and column number available for your diagnostic message. If you have set the ©error frameª switch as well as the diagnose errors switch, the variable PCB.©error_frame_tokenª will identify the ©nonterminal tokenª the parser was trying to recognize when the error was encountered. 
Of course, if your parser is controlling direct keyboard input, a diagnosis might be unnecessary. In this case you might define SYNTAX_ERROR so that it simply beeps at the user and let it go at that. ## Error Handling Rarely is a parser built to read an arbitrary input file. The normal situation is that the parser is built to read files that conform to the rules specified in a grammar, rules that describe a class of input files rather than all possible input files. If the input file does not conform to the grammar, the parser will detect a ©syntax errorª. There are two aspects to error handling in your parser: ©error diagnosisª and ©error recoveryª. Error diagnosis consists in informing your user that something unexpected has happened. Error recovery consists in either aborting the parse, or getting it started again in some reasonable manner. AnaGram provides several options for both error diagnosis and error recovery. When a syntax error is encountered, first your error diagnosis option is executed and then your error recovery option is executed. ## error_message error_message is a field in a ©parser control blockª to which your ©error handlingª procedures may refer. If you have set the ©diagnose errorsª ©configuration switchª, on encountering a ©syntax errorª your ©parserª will create a string containing an appropriate diagnostic message and store a pointer to it into PCB.error_message. ## Error Trace "Error Trace" is both a ©configuration switchª and the name of an option in the ©Action Menuª. If the switch is on, AnaGram adds code to your parser to capture state information to a file in case of a ©syntax errorª. The Error Trace option can then read this information and prepare a pre-built ©Grammar Traceª showing you the state of the parser at the time of the error. The name of the file is determined by the macro ©AG_TRACE_FILE_NAMEª. AnaGram will provide a default definition for the macro consisting of the name of your ©syntax fileª plus the extension ".etr". 
You may override this definition by defining AG_TRACE_FILE_NAME in your ©embedded Cª. If error trace is enabled, AnaGram will also enable the Error Trace option on the ©Action Menuª. If you select Error Trace AnaGram will initialize a ©Grammar Traceª window from the error trace file you select. The parser stack of the trace will be as it was when the error occurred. The last line of the parser stack pane will show the ©lookahead tokenª that caused the syntax error. You may then use the Grammar Trace to explore the nature of the syntax error your parser encountered. AnaGram will warn you if the error trace file is older than the syntax file, since under those conditions, the error trace file might be invalid. ## AG_TRACE_FILE_NAME AG_TRACE_FILE_NAME is a C macro used to determine the name of the file your parser will write when it encounters a ©syntax errorª if you have enabled the ©error traceª ©configuration switchª. You may define AG_TRACE_FILE_NAME in your ©embedded Cª. AnaGram provides a default definition given by the name of your ©syntax fileª with the extension ".etr". ## Error Recovery Error recovery is the process of continuing after a ©syntax errorª. AnaGram offers several options. These are controlled by ©configuration parameterªs and by your grammar. If you do not specify any error recovery, your parser will simply return to the calling program when it encounters a syntax error. ©PCBª.©exit_flagª will be set to two, to indicate termination on syntax error. If you wish your parser to simply ignore the erroneous token and continue, set PCB.exit_flag to zero in your ©SYNTAX_ERRORª macro. You might use this option if your parser is dealing directly with keyboard input. You may wish to use YACC type error handling. To do this, simply incorporate a token called "error" in your grammar, or specify some other token as an ©error tokenª. 
On syntax error, your parser will back up to the most recent state where "error" was acceptable input, treat the bad input as an instance of error, and then skip all input until it finds an acceptable input token. At that point it will proceed as though nothing had happened. AnaGram also provides an ©automatic resynchronizationª option, which uses a complex heuristic to compare input tokens against all stacked states in order to find the best state from which to continue. ## Error Token Resynchronization One of your options for ©error recoveryª after a ©syntax errorª is a technique similar to that provided in YACC. You include a terminal token called "error" in your grammar. (Or, use the ©error tokenª configuration parameter to specify some other token to serve this purpose.) When the parser encounters an error in the input, after invoking the ©SYNTAX_ERRORª macro, it backs up the ©parser state stackª to the most recent state in which "error" was an acceptable input. It then shifts to the new state as though it had seen an actual "error" token. At this point, it skips over any character in the input which is not an acceptable input character for this state. Once it does find an acceptable input character, it continues processing as though nothing had happened. ## error_frame_ssx error_frame_ssx is a field in a ©parser control blockª to which your ©error handlingª routines may refer. When your ©SYNTAX_ERRORª macro is called, if you have set both the ©diagnose errorsª and ©error frameª configuration switches, error_frame_ssx will contain the value of the ©parser stack indexª at the beginning of the ©error_frame_tokenª. For example, if in a syntax file, you fail to close a comment, AnaGram will encounter an illegal end of file in the comment. In this situation, error_frame_token is the token for a comment, and error_frame_ssx gives the parser stack depth at the beginning of the comment. 
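The error token resynchronization steps described above (back up the state stack, shift "error", skip unacceptable input) can be sketched in plain C. This is only a toy illustration under stated assumptions: the names NO_STATE, error_goto, acceptable and resynchronize, and the five-state table, are hypothetical and are not taken from AnaGram's generated parser. The call to the ©SYNTAX_ERRORª macro that precedes recovery is omitted.

```c
/* Toy illustration only: these tables and names are hypothetical and are
   not taken from AnaGram's generated code. */
#define NO_STATE (-1)

/* error_goto[s]: state entered by shifting "error" in state s, or NO_STATE. */
static const int error_goto[] = { 4, NO_STATE, NO_STATE, NO_STATE, NO_STATE };

/* In this toy grammar, state 4 (entered after "error") accepts only ';'. */
static int acceptable(int state, int ch)
{
    return state == 4 && ch == ';';
}

/* Returns the input position at which parsing may resume, or -1 if
   recovery is impossible. stack[0..*depth-1] is the state stack and
   capacity is the number of slots available in stack. */
static int resynchronize(int *stack, int *depth, int capacity,
                         const char *input, int pos)
{
    /* 1. Back up to the most recent state in which "error" is acceptable. */
    while (*depth > 0 && error_goto[stack[*depth - 1]] == NO_STATE)
        --*depth;
    if (*depth == 0 || *depth >= capacity)
        return -1;

    /* 2. Shift to the new state as though an "error" token had been seen. */
    stack[*depth] = error_goto[stack[*depth - 1]];
    ++*depth;

    /* 3. Skip input characters that are not acceptable in the new state. */
    while (input[pos] != '\0' && !acceptable(stack[*depth - 1], input[pos]))
        ++pos;

    return input[pos] != '\0' ? pos : -1;
}
```

With a state stack of 0, 1, 2, 3 and the input "a + + b; c" failing at the second '+', this sketch pops back to state 0, shifts to state 4, and resumes at the ';'.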
## error_frame_token error_frame_token is a field in a ©parser control blockª to which your ©error handlingª routines may refer. If you have set both the ©diagnose errorsª and ©error frameª ©configuration switchªes, when your ©SYNTAX_ERRORª macro is called, it will contain the ©token numberª of the error_frame_token. ## error, Error Token "Error token" is a ©configuration parameterª that takes a token name for a value. It has no default value. If you do not specify it, and your grammar has a terminal token called "error", it will be used as the error token. If you have an error token defined, your parser will presume that you wish to use the ©error token resynchronizationª method of ©error recoveryª. ## Escape Backslashes "©Escape backslashesª" is a ©configuration switchª that defaults to off. When turned on, the ©line numbersª switch will write pathnames with doubled backslashes. The switch is no longer necessary, since AnaGram now uses forward slashes in the pathnames in #line directives rather than backslashes. ## Event Driven It is often convenient to configure your parser to be "event driven". In this situation, instead of calling your parser once to process the entire input, you call an ©initializerª to initialize the parser, and then you call the parser once for each input token. Each time you call it, the parser processes the single input token until it can do no more. You can interrogate the ©exit_flagª field of the ©parser control blockª to determine whether the parse is complete or whether the parser encountered an error. Event driven parsers are especially convenient for dealing with terminal input or communications protocols. ## Event Driven Parser Cannot Use Pointer Input This ©warningª message appears if you specify pointer input for your ©parserª and also specify that it should be event driven. If you are going to use ©pointer inputª, you should not specify your ©parserª as event driven. 
Conversely, if you really want an ©event drivenª parser, you cannot specify pointer input. ## Excessive Recursion This ©warningª message appears if an internal stack in AnaGram overflows because of the complexity of an expression in your ©grammarª. Simplify your grammar by using ©definitionª statements to name subexpressions. ## exit_flag exit_flag is a field in the ©parser control blockª. When your parser returns, PCB.exit_flag contains an exit code describing the outcome of the parse. Mnemonic values for the exit codes are defined in the parser header file AnaGram generates. These mnemonics, their values and their meanings are: AG_RUNNING_CODE = 0: Parse is not yet complete AG_SUCCESS_CODE = 1: Parse terminated successfully AG_SYNTAX_ERROR_CODE = 2: Syntax error encountered AG_REDUCTION_ERROR_CODE = 3: Bad reduction token encountered AG_STACK_ERROR_CODE = 4: Parser stack overflowed AG_SEMANTIC_ERROR_CODE = 5: Semantic error, user defined An AnaGram parser checks exit_flag on return from every ©reduction procedureª. AnaGram will exit with the flag unchanged if it is non-zero. To halt a parse from a reduction procedure, then, you need only set the exit_flag to AG_SEMANTIC_ERROR_CODE, or any other unused value greater than zero that suits your needs. ## Expansion, Expansion Rule In analyzing a ©grammarª, we are often interested in the full range of input that can be expected at a certain point. The expansion of a ©tokenª or state shows us all the expected input. An expansion yields a set of ©marked ruleªs. The ©marked tokenª in each rule shows us what input to expect. The set of expansion rules of a (©nonterminalª) token shows all the expected input that can occur whenever the token appears in the grammar. The set consists of all the ©grammar ruleªs produced by the token, plus all the rules produced by the first token of any rule in the set. A ©marked tokenª for an expansion rule of a token is the first element in the rule. 
The expansion of a state consists of its ©characteristic ruleªs plus the expansion rules of the marked token in each characteristic rule. ## Expansion Chain You may select an Expansion Chain window from the ©Auxiliary Windowsª popup menu of most windows that contain ©expansion ruleªs. The Expansion Chain window is extremely useful for indicating why a particular ©grammar ruleª is an ©expansion ruleª in a particular state. To see a chain of productions that produces a desired expansion rule, select the expansion rule with the cursor bar, press the right mouse button for the Auxiliary Windows menu, and select Expansion Chain. The Expansion Chain window will then present a sequence of expansion rules, using the same format as the Expansion Rules window, but subject to the constraint that each rule is produced by the ©marked tokenª in the previous line. The first rule in the window is a ©characteristic ruleª for the given state. The last rule in the window is the rule selected by the cursor bar in the window from which you chose the Expansion Chain. It should be noted that this expansion is not unique. There may be other derivations. ## Expansion Rules You may select an Expansion Rules window from the ©Auxiliary Windowsª popup menu of most windows which display ©marked rulesª. The Expansion Rules window shows the complete set of ©expansion ruleªs for the ©marked tokenª in the highlighted rule. In other windows, including all trace windows, the Expansion Rules window shows the expansion of the token on the highlighted line. ## F1 Use the F1 key to bring up a context sensitive help window. Because of various peculiarities of the Windows API, there are a few contexts where the F1 key does not work; however, generally the ©help cursorª works where F1 does not and vice versa. ©Helpª windows have hypertext links to related help windows. In a help window, the right mouse button pops up a menu of all the links for the window. 
## extend pcb The "extend pcb" statement is an ©attribute statementª that allows you to add declarations of your own to the ©parser control blockª. With this feature, data needed by ©reduction procedureªs can be stored in the pcb rather than in global or static storage. This capability greatly facilitates the construction of ©thread safe parsersª. The extend pcb statement may be used in any configuration section. The format is as follows: extend pcb { <C or C++ declaration>... } It may, of course, extend over multiple lines and may contain any C or C++ declarations. AnaGram will append it to the end of the parser control block declaration in the generated parser ©header fileª. There may be any number of extend pcb statements. The extensions are appended to the pcb in the order in which they occur in the syntax file. The extend pcb statement is compatible with both C and C++ parsers. Note that even if you are deriving your own class from the parser control block, you might want to use the extend pcb to provide virtual function definitions or other declarations appropriate to a base class. ## Far Tables "Far tables" is a ©configuration switchª which defaults to off. If it is set, when AnaGram builds a ©parserª it will declare the larger tables it builds as FAR. This can be a convenience when using some memory models with 8086 architecture. ## Fatal Syntax Errors This ©warningª message occurs when AnaGram cannot complete the ©Analyze Grammarª command on your ©syntax fileª because of errors in your syntax file. ## File Trace You can use the File Trace facility to verify your grammar, even before you have implemented ©reduction proceduresª or any other code. Thus you can defer writing procedural code until you have the grammar working to your specifications. To run File Trace, select File Trace from the ©Action Menuª or click on the File Trace button. Select a test file. 
When the ©File Trace Windowª appears, double click at any point in the ©test file paneª, or click the ©Parse Fileª button to parse the entire file. AnaGram will parse up to the point you have selected according to the rules in your ©grammarª. If the test file does not conform to the rules of the grammar, the parse will halt with a ©syntax errorª. You can then inspect the ©Parser Stack paneª and the ©Rule Stack paneª to get an idea of the nature of the problem. AnaGram uses different colors to distinguish the portion of the test file that has been parsed from the portion that has not been parsed, so the location of the error should be readily apparent. Since the syntax error often occurs somewhat downstream from the actual error, you may need to back the parse up and approach the error slowly. In the Test File pane, double click at any point prior to the error to back the parse up to that point. You can then click on the ©Single Stepª button to perform a single parser action. You may also use the cursor keys to control the parse. As long as no error is encountered, the parse is locked to the blinking cursor. If you move the cursor past the syntax error, however, the parse can no longer track the cursor, so the cursor location will differ from the parse location. The cursor and parse locations will also differ after you single click at any point other than the current parse location. When the cursor and the parse location are thus out of synch, the Single Step button is replaced with a ©Synch Parseª button. You can click on Synch Parse to get the parse back in synch with the cursor. The File Trace option will be greyed out on the ©Action Menuª if your grammar has ©empty recursionª, since such a grammar may cause infinite loops in the parser. Because a File Trace is based on character codes, it will also be greyed out on the ©Action Menuª if your parser uses ©token inputª rather than character input. 
All parser actions performed by a File Trace update the ©trace coverageª counts, enabling you to verify the extent to which your test files exercise your parser. Normally, AnaGram reads test files in "text" mode, discarding carriage return characters. If your parser needs to recognize carriage return characters explicitly, you should turn the "©test file binaryª" switch on. ## File Trace Window The ©File Traceª window normally consists of three panes: The ©Parser Stack paneª The ©Test File paneª The ©Rule Stack paneª If your grammar uses ©semantically determined productionsª, the ©Reduction Choices paneª will appear when necessary to allow you to select a ©reduction tokenª. The choice that you make will be remembered and reused if you should back up the parse and parse past this point again. The remembered choice is not made automatically when you use ©Single Stepª. Thus, if you wish to change your choice, position the cursor at the location where the choice must be made and Single Step past the choice. If you ©reloadª the test file, the choices you have made will be discarded. The active pane has a distinctively colored title panel and cursor bar. You can use the tab key to tab among the panes. The function of other keyboard keys depends on which pane is active. Along the bottom of the File Trace Window is a toolbar with two status boxes: ©Parse Locationª ©Parse Statusª and five buttons: ©Single Stepª ©Parse Fileª ©Resetª ©Reloadª ©Helpª If the blinking cursor loses synch with the current parse location, the Single Step button is replaced with the ©Synch Parseª button. ## Grammar Trace Window The ©Grammar Traceª window normally consists of three panes: The ©Parser Stack paneª The ©Allowable Input paneª The ©Rule Stack paneª If your grammar uses ©semantically determined productionsª, the ©Reduction Choices paneª will appear when necessary to allow you to select a ©reduction tokenª. The active pane has a distinctively colored column header and cursor bar. 
You can use the tab key to tab among the panes. The function of other keyboard keys depends on which pane is active. Along the bottom of the Grammar Trace Window is a toolbar with a ©Parse Statusª box, a ©text entryª field and four buttons: ©Proceedª ©Single Stepª ©Resetª ©Helpª In the ©Parser Stack paneª you can see a representation of the ©parser state stackª and ©parser stateª as they might appear in the course of execution of your ©parserª. You can examine the ©allowable inputª tokens and see the changes to the state and the state stack caused by any input token you choose. The ©Rule Stack paneª shows the relationship between the contents of the parser stack and your ©grammarª. If your grammar uses ©semantically determined productionsª, you can select the appropriate ©reduction tokenª from the ©Reduction Choices paneª. You can enter text characters directly in the ©text entryª field. This means you can run a Grammar Trace like a ©File Traceª where the test file is replaced by the characters you type in the text entry field. This is a very convenient way to check out your grammar. ## Test File, Test File Pane In the ©File Traceª, the file under test is displayed in the upper right pane. To parse to a specific point, double click at that point. As long as the parse location and the cursor are synchronized, when you use the cursor keys to move the cursor, the parse will track the cursor. If the parse encounters a ©syntax errorª, it will not be able to go beyond the location of the error. In this situation, moving the cursor right or down will cause the cursor position to differ from the parse location. The parse and cursor positions can also differ if you single click anywhere in the Test File pane. If the parse location and the cursor are thus not synchronized, the ©Single Stepª button will be replaced with a ©Synch Parseª button. Click on the Synch Parse button to get the cursor and the parse back in synch. 
Of course, the parse will still not be able to proceed past a syntax error. In the default color scheme, parsed text is shown on a lighter background than is unparsed text. If your grammar uses ©semantically determined productionªs, the parse will halt when one is encountered and the ©reduction choices paneª will be displayed so you may select the appropriate ©reduction tokenª. At any time you can click on the ©Reset buttonª to reset the parse to the beginning of the test file. If you modify the test file, you can click on the ©Reload buttonª to load the modified file and reset the parse. Normally, AnaGram reads test files in "text" mode, discarding carriage return characters. If your parser needs to recognize carriage return characters explicitly, you should turn the ©test file binaryª ©configuration switchª on. Sample test files are provided with the FFCALC and FC ©examplesª. ## Parse Location The current location of the ©File Traceª parser in the ©test file paneª. The format is <line number>:<column number>. ## Parse Status The current state of the ©File Traceª or ©Grammar Traceª parser. Ready: The parser is ready for input. Running: The parser is processing input. Parse Complete: The parser has reached the end of the input. Click on ©resetª or ©reloadª to restart the parse. Syntax error: A syntax error has been encountered. The parser cannot go any further. Unexpected end of file: The parser has reached the end of the actual input but the grammar still expects more. Select reduction token: The parser encountered a ©semantically determined productionª. Select a ©reduction tokenª from the ©Reduction Choices paneª. Selection error: The reduction token selected from the Reduction Choices pane was not allowable input in the present state. Select another reduction token. ## Parse File Use the Parse File button in the ©File Traceª to parse all the way to the end of file. 
The parse will not stop until it encounters a ©syntax errorª, a ©semantically determined productionª, or the end of file. ## Reset Use the Reset button in the ©File Traceª or ©Grammar Traceª to reset the parse to its initial state. This is most convenient when using a ©Conflict Traceª, ©Error Traceª, or other ©Auxiliary Traceª since these traces seldom begin at state 0. ## Reload The Reload button in the ©File Trace Windowª rereads the test file. This is convenient if you modify the test file while you are testing the ©grammarª. ## Lookahead Token In an ©LALR-1 parserª the "lookahead token" is the next token to be processed. For each ©parser stateª there is a list of tokens that may be seen in this state. For each token there is a corresponding ©parser actionª. The parser scans the list looking for the lookahead token and then performs the corresponding parser action. If the lookahead token cannot be found and there is no ©default reductionª, the parser signals a ©syntax errorª. In File Trace, and in some circumstances in Grammar Trace, the lookahead token can be seen on the last line of the ©Parser Stack paneª. ## GET_CONTEXT If you have defined a "©context typeª" ©configuration parameterª, and wish to maximize the performance of your parser, you should write a GET_CONTEXT macro which stores the context of the input token directly in ©CONTEXTª, the current stack location. Otherwise, you can write your ©GET_INPUTª macro so that it stores context into ©PCBª.©input_contextª. The default definition for GET_CONTEXT will then copy PCB.input_context to the ©context stackª at the appropriate time. ## GET_INPUT GET_INPUT is a macro which you should define to control ©parser inputª if your parser is not ©event drivenª and you are not using ©pointer inputª. 
If you don't define it, AnaGram will define it by default to read a single character from stdin: #define GET_INPUT (PCB.input_code = getchar()) ©PCBª.©input_codeª is an integer field in the ©parser control blockª which is used to hold the current character code. You may also want GET_INPUT to set the values of ©input_contextª or ©input_valueª. It may call an input function, or it may execute in-line code when it is invoked. ## iso latin 1 The "iso latin 1" ©configuration switchª controls case conversion on input characters when the ©case sensitiveª switch is set to off. When "iso latin 1" is set, the default ©CONVERT_CASEª macro is defined to convert correctly all characters in the latin 1 character set. When the switch is off, only characters in the ASCII range (0-127) are converted. ## Dragon Book The "dragon book" is the classic reference on formal parsing: Compilers: Principles, Techniques, and Tools Aho, Sethi, and Ullman Addison-Wesley, 1986. It is called the "dragon book" because of its colorful cover illustration showing a knight in armour ("data flow analysis") armed with sword ("©LALR parser generatorª") and shield ("syntax directed translation") at his PC attacking a bright red dragon ("complexity of compiler design"). ## LALR-1 Parser An LALR-1 parser is a ©parserª created from a ©grammarª by an ©LALR parser generatorª. ## LALR Parser Generator LALR(k) (LookAhead Left-to-right Rightmost derivation) parser generators are programs that create parsers algorithmically from formal grammars. The (k) refers to the number of lookahead symbols used to make parsing decisions. Normally, k = 1. LALR parsers are a subset of the class of so-called LR parsers. LALR parsers are generally more compact and less costly to create. These advantages are obtained at a slight sacrifice in generality. 
Although it is possible to contrive an LR grammar which has ©conflictªs when analyzed with the LALR algorithm, such situations rarely occur in practice, and can be easily resolved by rewriting a few rules. In the ©dragon bookª, section 4.7, the authors list the following attractive properties of LR parsing: LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. The LR parsing method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods. The class of grammars that can be parsed using LR methods is a superset of the class of grammars that can be parsed with predictive parsers. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input. ## Getting Started AnaGram is an ©LALR parser generatorª. Its input is a ©syntax fileª, which you prepare with an ordinary programming editor. Its output is a ©parser fileª, which you can compile with a C or C++ compiler on any platform and link into your program. To compile on Unix platforms, set the ©no crª ©configuration switchª. AnaGram has extensive context-sensitive hypertext ©helpª. In any AnaGram window, press ©F1ª or select an item with the ©Help Cursorª. Further documentation in HTML format, including documentation of examples, is found in the html subdirectory. AnaGram also has a comprehensive hard-copy manual, the AnaGram User's Guide. If you are new to AnaGram, you might begin by reviewing the Help Topics ©How AnaGram Worksª and ©Program Developmentª, and looking at An Annotated Example and Summary of AnaGram Notation in the HTML documentation. If you are not already familiar with formal parsing techniques, you may want to read Introduction to Syntax Directed Parsing in the HTML documentation. 
Note also the Fahrenheit to Celsius conversion examples in the examples/fc directory, which comprise a graded sequence of syntax files illustrating most of the basic principles of ©syntax directed parsingª in easy steps. Documentation is in html/fc.html. AnaGram has many features, many of which are not commonly found in parser generators: the ©configuration sectionª ©thread safe parsersª C++ support the ©disregardª and ©lexemeª statements ©event drivenª parsers ©character setsª ©virtual productionsª ©File Traceª, ©Grammar Traceª ©automatic resynchronizationª ©error token resynchronizationª To familiarize yourself with the many options available for configuring your parsers, select ©Configuration Parametersª from the ©Browse Menuª. Use ©F1ª or the ©Help Cursorª to pop up explanations of the various parameters. If you don't find the information you need, please visit the AnaGram web page at http://www.parsifalsoft.com for further information and support. ## How AnaGram Works AnaGram contains an ©LALR Parser Generatorª which creates a table driven ©LALR-1 parserª from a ©grammarª written in a variant of Backus-Naur Form. AnaGram works in two steps. In the first step, or analysis phase, it reads a ©syntax fileª and compiles a number of tables describing the grammar. In the second step, or build phase, it writes two output files: a ©parser fileª written in C or C++ and a ©header fileª. Syntax files normally have the extension .syn. The rules for writing syntax files are given in the AnaGram User's Guide and in the Summary of AnaGram Notation in the HTML documentation. The header file contains definitions and declarations, including the definition of a ©parser control blockª. The parser file consists of: The ©C prologueª, if any. Definitions and declarations provided by AnaGram. ©Reduction procedureªs. a customized ©parsing engineª. a ©parse functionª to be called when input is to be parsed. 
The name of the parser file is controlled by the ©parser file nameª ©configuration parameterª. The name of the parse function itself is controlled by ©parser nameª. In the default case, the parser file will have the same name as the syntax file, with the extension .c. The name of the parse function is given by the ©parser nameª parameter. It defaults to the name of the syntax file. ## Examples The EXAMPLES directory of the AnaGram distribution disk contains a number of examples to help you get started. Documentation for the examples, in HTML format, is located in the html directory (start at index.html or examples.html). The traditional Hello, World, in examples/hw, is a good example for getting familiar with the mechanical procedures of building both C and C++ parsers from ©syntax fileªs. The Fahrenheit/Celsius conversion examples in the examples/fc directory on your AnaGram diskette comprise a graded sequence of syntax files which illustrate most of the basic principles of ©syntax directed parsingª in easy steps. In addition, these examples demonstrate many features of AnaGram which are not found in other parser generators: the ©configuration sectionª ©character setsª ©virtual productionsª ©error token resynchronizationª ©File Traceª the ©disregardª and ©lexemeª statements ©event drivenª parsers The Four Function Calculator (examples/ffcalc) is used traditionally to demonstrate parser generators. If you are already familiar with ©syntax directed parsingª this example will give you a good overview of the basics of AnaGram. An annotated version of this example may be found in AnaGram's HTML documentation. The FFCALC example illustrates the use of ©precedence rulesª to resolve ©conflictsª. Other examples are available to demonstrate additional features of AnaGram. RCALC (examples/rcalc) is a simple four function calculator which accepts roman numeral input. 
It illustrates the following AnaGram features: ©pointer inputª ©SYNTAX_ERRORª macro ©context stackª DSL (examples/dsl) is a complete DOS script language, which provides capabilities well in excess of DOS batch files. DSL is a complete working program, used in the past to create AnaGram's install program. Some of the specific features of AnaGram which it illustrates are: ©distinguish lexemesª ©distinguish keywordsª ©far tablesª MPP is a fully functional macro preprocessor for C or C++. Included with MPP are two C grammars, either of which may be incorporated into MPP. MPP uses several parsers that work together: TS.SYN is the primary token scanner parser that identifies tokens, and handles preprocessor commands. MAS.SYN is used to do macro argument substitution. CT.SYN is used to identify tokens that result from string concatenation during macro argument substitution. EX.SYN is used to evaluate constant expressions in #if preprocessor statements. Among the more powerful features of AnaGram that MPP illustrates are: ©semantically determined productionsª ©event drivenª parsers ## Goal, Goal Token, Start Token The ©grammar tokenª is the token which represents the "top level" in your grammar. Some people refer to it as the "goal" or "goal token" and others as the "start token". Whichever it is called, it is the single token which describes the complete input to your parser. The most common way to specify a grammar token is as follows: grammar -> statements?..., eof This production tells AnaGram that the input to your parser consists of a (possibly empty) sequence of statements followed by an end of file token. There are a number of ways of specifying which token in your ©syntax fileª represents the top level of your grammar. You may simply name it "grammar", or you may tag it with a '$' character when you define it, or you may set the ©grammar tokenª ©configuration parameterª. 
If you should inadvertently tag several tokens with the '$' character and/or set the grammar token parameter, it is the last such specification in the file which wins. Some people develop their grammars bottom up, gradually adding new levels of complexity. In the course of development, they may specify a number of tokens as grammar tokens and forget to remove the old specifications. Notice that if you define the token "grammar" anywhere in your syntax and specify the grammar token otherwise, "grammar" will not be the grammar token. This is to keep "grammar" from being a reserved word. If you need to use it in your syntax for something other than the whole grammar, you are free to do so. ## Grammar Traditionally, a "grammar" is a set of ©productionªs which taken together specify precisely a set of acceptable input streams, in terms of an abstract set of ©terminal tokensª. The set of acceptable input streams is often called the "language" defined by the grammar. In AnaGram, the term "grammar" also includes ©configuration sectionsª as well as the ©definitionsª of ©character setsª and ©virtual productionsª which augment the collection of productions. The term is often used in contrast to the term "©syntax fileª" which is used to signify the complete AnaGram source file including reduction procedures and embedded C or the term "©parserª" which refers to AnaGram's output file. A grammar is often called a "syntax", and the rules of the grammar are often called syntactic rules. ## Grammar Analysis The major function of AnaGram is the analysis of context-free grammars written in a particular variant of Backus-Naur Form. The analysis of a grammar proceeds in four stages. In the first, the input grammar is analyzed and a number of tables are built which describe all of the ©productionªs and components of the ©grammarª. In the second stage, AnaGram analyzes all of the character sets defined in the grammar, and where necessary, defines auxiliary tokens and productions. 
In the third stage, AnaGram identifies all of the states of the parser and builds the go-to table for the parser. In the fourth stage, AnaGram identifies ©reduction tokensª for each completed ©grammar ruleª in each state and checks for ©conflictªs. Use the ©Analyze Grammarª command to cause AnaGram to analyze your grammar. ## Grammar Is Ambiguous This ©warningª message appears if your ©grammarª contains ©conflictªs. AnaGram will resolve ©shift-reduce conflictsª by selecting the shift option. It will resolve ©reduce-reduce conflictsª by selecting from the conflicting ©grammar ruleªs the one which appears first in the ©syntax fileª. ## Grammar Rule A "grammar rule" is the right hand side of a production. It is a sequence of ©rule elementsª. Each rule element identifies some token, which can be either a ©terminal tokenª or ©nonterminal tokenª. A grammar rule is "matched" by a corresponding sequence of tokens in the input stream to the parser. The rule elements in the grammar rule may be ©token nameªs, ©set expressionsª, ©character constantsª, ©immediate actionªs, ©keyword stringsª, or ©virtual productionsª. A grammar rule may be followed by an optional ©reduction procedureª. The ©semantic valuesª of the tokens that comprise the rule may be passed to the reduction procedure by using ©parameter assignmentsª. A grammar rule always makes up the right hand side of a production. The left hand side of the production identifies one or more ©nonterminal tokensª, or ©reduction tokensª, to which the rule reduces when matched. If there is more than one reduction token, the production is called a ©semantically determined productionª and there should be a ©reduction procedureª to select the correct reduction token at run time. ## Grammar Token The "grammar token" ©configuration parameterª may be used to specify the ©goalª, or "start" token for the syntax analyzer portion of AnaGram. 
Alternatively, you could simply call the token "grammar", or you could append a '$' character to it when you define it. Each grammar must have a grammar token specified before it can be analyzed or before a parser can be built. The grammar token is the single token to which the grammar finally condenses. When this token is identified by the parser, the parse is complete. ## Grammar Trace AnaGram's Grammar Trace facility lets you examine the workings of your ©parserª in detail. You can use the Grammar Trace as soon as you have analyzed your ©grammarª, even before you have written any ©reduction procedureªs or other code. Thus you can defer writing procedural code until you have the grammar working to your specifications. Select the ©Grammar Trace Windowª from the ©Action Menuª or click on the Grammar Trace button. In the ©Parser Stack paneª you can see a representation of the ©parser state stackª and ©parser stateª as they might appear in the course of execution of your ©parserª. The ©Rule Stack paneª shows the relationship between the contents of the parser stack and your ©grammarª. If your grammar uses ©semantically determined productionsª, you can select the appropriate ©reduction tokenª from the ©Reduction Choices paneª. At any stage, the ©Parser Stackª represents a parse in progress. It shows the sequence of ©tokenªs that have been input so far and the states in which they were seen. When a production is complete and the grammar rule is reduced, the tokens that make up the rule are removed from the stack and replaced by the token on the left side of the production. Initially, the Parser Stack contains only a ©lookahead lineª. To explore your grammar, choose ©tokenªs one by one from the ©Allowable Inputª pane. This pane shows the tokens allowable at the current state of the grammar, and the actions that result when the tokens are chosen. You can also enter text characters directly in the ©text entryª field. 
This means you can run a Grammar Trace like a ©File Traceª where the test file is replaced by the characters you type in the text entry field. This is a very convenient way to check out your grammar. Text entry is, of course, not appropriate for grammars that expect ©token inputª. In a ©File Traceª you can advance the parse no matter which pane is active. In a Grammar Trace there is a question as to whether input is intended to come from the Allowable Input pane or the text entry field. Therefore the parse can only be advanced when one of these two is active to indicate that it is the source of input. Specialized prebuilt Grammar Traces such as the ©Conflict Traceª and the ©Auxiliary Traceª can be selected from ©Auxiliary Windowsª popup menus where appropriate. All Grammar Trace activity updates the ©trace coverageª counts. ## Text Entry It is sometimes more convenient to enter text in the text entry box on the ©Grammar Traceª toolbar than to select individual tokens from the ©Allowable Input paneª. By entering text you can proceed quickly to a troublesome state without having to choose each individual token en route. After entering text, press Enter or click on the Proceed button to parse the text. Click on the single step button to work slowly through the text step by step. ## header file name The "header file name" parameter names the ©parser headerª file that AnaGram will generate when it builds your parser. This header file can be used with your parser or with other modules in your program. The header file contains a number of typedef statements and a number of macro definitions which are needed in your parser and may be useful in other modules. If the value of this parameter contains a '#' character, AnaGram will substitute the name of your syntax file for the '#'. The default value of "header file name" is "#.h". ## Help, Using Help There are 3 main ways to access AnaGram Online Help: Press F1 for context-sensitive help from most windows and menu items.
Similarly, use the ©Help Cursorª from most windows and menu items. From the Help menu, you can bring up ©Help Topicsª and choose a topic. You can also get fly-over help for the toolbar buttons on the ©Control Panelª. File and Grammar Traces have a Help button. AnaGram's Help windows, unlike most others, remain on-screen until you dismiss them. This means you can refer to several topics at once. They have hypertext links to other Help topics. Also, right-clicking the mouse on a Help window or pressing F1 will pop up an Auxiliary Windows menu of all linked topics in the window. "Using Help" is always available from this popup menu. Note that, for the ©Warningsª, ©Configuration Parameterªs and ©Help Topicsª windows, F1 will give you help for the item on the highlighted line, whereas the Help Cursor allows you to select any line by clicking on it. AnaGram also has documentation in HTML format, indexed in the index.html file. This documentation covers Getting Started, examples, and some further topics mainly condensed from the User's Guide. Hard copy documentation is in the AnaGram User's Guide, which has the most detail. ## Hidden In a ©configuration sectionª of your grammar you may use an ©attribute statementª to declare one or more tokens to be "hidden". Tokens that are "hidden" do not appear in the ©token namesª table, and thus do not appear in syntax error diagnoses. When your parser attempts to determine the ©error frameª of a ©syntax errorª, it will disregard the tokens that have been declared hidden. The hidden declaration consists simply of the keyword hidden followed by a list of tokens, separated by commas and enclosed in braces ({ }): [ hidden { widget, wombat, foo, bar } ] You would use the "hidden" attribute primarily for tokens whose name would not mean anything to your users. ## Immediate Action Immediate actions are snippets of C code which are to be executed in the middle of a ©grammar ruleª. Immediate actions are denoted by a '!' 
character followed by either a C expression, terminated by a semicolon; or a block of C code enclosed in braces. For example, in a simple desk calculator one might write the following: transaction -> !printf("#");, expression:x =printf("%d\n",x); Notice that the only apparent difference between an immediate action and a ©reduction procedureª is that the immediate action is preceded by '!' instead of '='. Notice that the immediate action must be followed by a comma to separate it from the following ©rule elementª. Immediate actions may also be used in ©definitionªs: prompt = !printf("#"); The above example, using this definition, would then be: transaction -> prompt, expression:x =printf("%d\n",x); You could accomplish the same result by writing a ©null productionª and a reduction procedure: prompt -> =printf("#"); This is exactly how AnaGram implements immediate actions. ## Implementation Errors "Implementation errors" are errors your parser detects which are not the immediate result of bad input. When it encounters an implementation error, your parser will call a macro which you can define to deal with the problem in a manner suitable to your needs. If you don't provide these macros, AnaGram will make default definitions. There are two macros corresponding to two implementation errors: ©PARSER_STACK_OVERFLOWª ©REDUCTION_TOKEN_ERRORª ## Inappropriate Value This ©warningª message appears when the value assigned to a ©configuration parameterª is not appropriate to that parameter. Check the definition of the parameter, by opening the ©Configuration Parameters Windowª, selecting the parameter and pressing F1. ## Initializer For every ©parserª it generates, AnaGram generates an "initializer" function to call the parser. AnaGram names the initializer by prefixing the ©parser nameª with "init_". If your parser is ©event drivenª, you must call the initializer before you call the parser.
If your parser is not event driven, AnaGram will normally include a call to the initializer in the parser. If you wish to be able to call your parser more than once without its being re-initialized, you may turn off the ©auto initª ©configuration switchª. When you do this, you assume responsibility for calling the initializer. If your parser is event driven, you must always call the initializer function. If the ©reentrant parserª switch is set, the initializer takes a pointer to the ©parser control blockª as its sole argument. Otherwise it takes no arguments. The initializer returns no value. All communication is by means of the ©parser control blockª. ## Input Character The actual unit of ©parser inputª is usually a single character. Note that you are not limited to eight-bit characters. Your parser will use the input character to index a translation table, ©ag_tcvª, to determine the ©token numberª for that character. The ©token numberª identifies the actual syntactic token. The character code itself will be the ©semantic valueª of the token. Note that AnaGram groups together all input characters that are syntactically indistinguishable into a single input token. ## input_code input_code is a field in the ©parser control blockª which contains the current ©input characterª, or, if your ©GET_INPUTª macro supplies ©token numberªs directly, the token number. If you write your own ©GET_INPUTª macro, you must make sure that you store the input character, or token number, you get into ©PCBª.input_code. ## INPUT_CODE(t) If you set both the ©pointer inputª and the ©input valuesª ©configuration parameterªs, you must provide an INPUT_CODE macro for your parser. In this situation, your parser will use the pointer to load the ©input_valueª field of the ©parser control blockª and use the INPUT_CODE macro to extract the appropriate value for the ©input_codeª field.
For example, if the input_value is a structure and the appropriate member field is called "id" you would write: #define INPUT_CODE(t) (t).id ## input_context "input_context" is a field which AnaGram adds to the definition of the ©parser control blockª structure when you define a ©context typeª ©configuration parameterª. If you choose, you can write your GET_INPUT macro so that it stores the context value in ©PCBª.input_context. The default definition for ©GET_CONTEXTª will then stack the context value at the appropriate time. You can think of PCB.input_context as a sort of temporary "parking place" for the context value. ## Input Scan Aborted This ©warningª message appears if AnaGram is unable to finish scanning your ©syntax fileª because of previous errors. ## input values "Input values" is a ©configuration switchª which defaults to off. If your ©parser inputª includes explicit ©token valueªs which are not simply the ascii values of corresponding ascii input characters, you must set the "input values" switch to inform AnaGram. Unless your parser is ©event drivenª or uses ©pointer inputª, you must also provide your own ©GET_INPUTª macro. If your parser uses pointer input, you must provide an ©INPUT_CODE(t)ª macro. The semantic value of an input token is to be stored in the ©input_valueª field of the parser control block. ## input_value input_value is a field in the ©parser control blockª which is used to store the semantic value of the input token. If you write your own ©GET_INPUTª macro, and you have set the ©input valuesª ©configuration switchª, you should make sure that you store the value of the ©input characterª or token into ©PCBª.input_value. ## Internal Error "AnaGram internal error: ..." is a ©warningª message which appears if one of AnaGram's internal consistency tests fails. This message should never appear if AnaGram is working properly. 
Usually AnaGram will abort on encountering an internal error, although under a small set of circumstances it may continue. Should this happen, it would be wise to close AnaGram and restart it. If you do get an internal error, please note the complete message identifying the problem and file a bug report, following the directions posted on the AnaGram web page at http://www.parsifalsoft.com. A copy of the relevant syntax file and a summary of the circumstances surrounding the problem would be greatly appreciated. ## Intersection In set theory, the intersection of two sets, A and B, is defined to be the set of all elements of A which are also elements of B. In an AnaGram ©syntax fileª, the intersection of two ©character setsª is represented with the '&' operator. The intersection operator has lower ©precedenceª than the ©complementª operator, but higher precedence than the ©unionª and ©differenceª operators. The intersection operator is ©left associativeª. ## Keyboard Support AnaGram can be controlled entirely from the keyboard. In the Control Panel, you can tab to any button and press Enter to select it. In addition to the conventional Windows keyboard functions, the following keys have been implemented: Escape closes any AnaGram window except the Control Panel. F8 toggles between an active AnaGram window and the Control Panel. F10 accesses the Control Panel menu from any AnaGram window. Shift F10 pops up the Auxiliary Windows menu. ## Keyword, Keyword String Keywords are a very important feature of AnaGram. They provide an easy way to pick up special character sequences in your input, thereby eliminating the need for a lot of tedious ©productionªs. If AnaGram finds, on the right hand side of one of your ©grammarª productions, a string enclosed in double quotes, such as "IF", it automatically creates from the string a "keyword" which is incorporated into your parser. You may have any number of keywords. A keyword is treated as a single terminal token.
Recognition of keywords is governed by the ©case sensitiveª switch. Your parser will look for a keyword in its input stream wherever you have defined this particular keyword to be legitimate input. It will do whatever lookahead is necessary in order to pick up the entire keyword. If several keywords match the input, such as IF and IFF, it will select the longest match, IFF in this example. Important points to notice about keywords: 1) Keywords take precedence over ordinary characters in the input stream - thus if the character I and the keyword IF are both legitimate input at some point, IF will be selected, if present, in preference to I. 2) Keywords are not reserved words. Your parser will only look for a keyword when it is in a state where that keyword is legitimate input. 3) Keywords do not participate in character sets and should not appear in definitions of character sets. In particular, they are not considered as belonging to the complement of a character set. Thus a keyword would not be considered legitimate input for the production next char -> ~( '/' + '*' ) 4) Keywords may appear in virtual productions. 5) Keywords may be named by means of a definition. AnaGram will list all the keywords in your grammar in the ©Keywordsª window. In addition, in numerous windows where the cursor bar selects a state, the ©Auxiliary Windowsª popup menu will list a Keywords option. This window will provide a list of the keywords acceptable in the selected ©parser stateª. On occasion, a kind of conflict, called a ©keyword anomalyª may occur. If so, such conflicts will be listed in the ©Keyword Anomaliesª window. The "©stickyª" ©attribute statementª is useful in dealing with keyword anomalies. ## Keyword Anomalies Found This ©warningª message indicates that AnaGram has found at least one ©keyword anomalyª in your ©grammarª. Open the ©Keyword Anomaliesª window to see a list of those that have been found. 
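The longest-match rule described above for keywords (IFF is preferred to IF when both match) can be sketched in C. This is only an illustration under stated assumptions, not the code AnaGram generates: the function name longest_keyword and its arguments are hypothetical, the candidate list stands in for the keywords that are legitimate input in the current parser state, and matching is assumed to be case sensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Returns the index of the longest keyword in keywords[] that is a
 * prefix of input, or -1 if none of the candidates matches.
 * The candidates are assumed to be the keywords acceptable in the
 * current parser state; comparison is case sensitive. */
int longest_keyword(const char *input,
                    const char *const keywords[], size_t count)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    assert(input != NULL);
    for (i = 0; i < count; i++) {
        size_t len = strlen(keywords[i]);
        /* Keep a candidate only if it matches and is longer than
         * the best match found so far. */
        if (len > best_len && strncmp(input, keywords[i], len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the candidates "IF" and "IFF", the input "IFF x" selects "IFF", the input "IF x" selects "IF", and the input "I x" matches neither keyword, so the character I would be handled as an ordinary input character.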
## Keyword Anomaly In ©syntax directed parsingª, it is assumed that input ©tokenªs can be uniquely identified. In the case of ©keywordªs, however, there is the possibility that the individual characters making up the keyword, as well as the keyword taken as a whole, could constitute legitimate input under some circumstances. Thus ©keywordsª, though a powerful and useful tool, are not completely consistent with the assumptions that underlie ©syntax directed parsingª. This can occasionally give rise to a type of conflict, diagnosed by AnaGram, called a "keyword anomaly". AnaGram is quite conservative in its diagnoses, so that many keyword anomalies it reports are actually innocuous and can be safely ignored. Basically, a keyword anomaly is a situation where a keyword is recognized, causes a reduction, and the parser arrives in a state where the keyword is not legal input. If the keyword, seen simply as a sequence of characters, might have been legal input in the original state, AnaGram notes the existence of a keyword anomaly. If you have a keyword that causes a keyword anomaly and it is actually a reserved word in your grammar, the anomaly is by definition innocuous. You should use the ©reserve keywordsª statement to inform AnaGram that the keyword is reserved and the anomaly need not be diagnosed. To help identify and correct any problems associated with keyword anomalies, AnaGram provides the ©Keyword Anomaliesª window to identify the anomalies, and the ©Keyword Anomaly Traceª to help you understand a particular anomaly. ## Keyword Anomaly Trace A Keyword Anomaly Trace is a ready made ©grammar traceª window which you may select from the ©Auxiliary Windowsª menu of the ©Keyword Anomaliesª window. The anomaly trace provides a path to a state which illustrates the ©keyword anomalyª. In this state, the keyword is a reducing token, but after the reduction, it is not allowable input. 
## Keyword Anomalies The Keyword Anomalies window is available only if your grammar has ©keywordª anomalies. Each entry in the Keyword Anomalies window consists of two lines. The first line identifies the ©parser stateª at which the ©keyword anomalyª occurs and the offending keyword. The second line identifies the ©grammar ruleª which the keyword may erroneously reduce. The ©Auxiliary Windowsª menu provides three auxiliary windows keyed directly to the anomaly to help you determine the nature of the problem: The ©Keyword Anomaly Traceª window, the ©Reduction Traceª window, and the ©Rule Derivationª window. Three other windows provide supporting information: the ©Reduction Statesª window, the ©Rule Contextª window and the ©State Definitionª window. ## Keywords The Keywords entry in the ©Browse Menuª pops up a window which lists all of the keywords defined in your ©grammarª. The ©token numberª is also specified. A Keywords window is also an option in the ©Auxiliary Windowsª popup menu for any window which distinguishes various states of your parser. The Keywords window will show all of the ©keywordªs which will be recognized in the state selected by the cursor bar in the parent window. The ©Auxiliary Windowsª menu for a Keywords window provides a ©Token Usageª option which will allow you to see all the uses of a particular keyword in your grammar. ## left "left" controls a ©precedence declarationª, indicating that all of the listed ©rule elementsª are to be considered ©left associativeª. ## Left Associative A binary operator is said to be left associative if an expression with repeated instances of the operator is to be evaluated from the left. Thus, for example, x = a/b/c is normally taken to mean x = (a/b)/c The division operator is said to be left associative. In ©grammarªs with ©conflictªs, you may use ©precedence declarationªs to specify that an operator should be left associative.
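The difference that left associativity makes for a/b/c can be checked directly in C, where division is left associative. The sketch below is only an illustration; the function names divide_left, divide_right and divide_plain are hypothetical and are not part of AnaGram.

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only. */

/* Left associative grouping: (a / b) / c. */
double divide_left(double a, double b, double c)
{
    assert(b != 0.0 && c != 0.0);
    return (a / b) / c;
}

/* Right associative grouping, shown for contrast: a / (b / c). */
double divide_right(double a, double b, double c)
{
    assert(b != 0.0 && c != 0.0);
    return a / (b / c);
}

/* The unparenthesized expression, as the C compiler groups it. */
double divide_plain(double a, double b, double c)
{
    assert(b != 0.0 && c != 0.0);
    return a / b / c;
}
```

For a = 8, b = 4, c = 2, the left associative grouping gives (8/4)/2 = 1, while the right associative grouping gives 8/(4/2) = 4. The unparenthesized form agrees with the left associative grouping.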
## Lexeme The "lexeme" ©attribute statementª is used to fine-tune the "©disregardª" statement. The lexeme statement takes the form: lexeme { T1, T2,....Tn } where T1,...Tn is a list of ©nonterminalª tokens separated by commas. Lexeme statements may be placed in any ©configuration sectionª, and there may be any number of them. When you specify that a ©tokenª is to be disregarded, AnaGram rewrites your ©grammarª so that the token will be passed over whenever it occurs at the beginning of a file or following a lexical unit, or "lexeme". If you have no lexeme statement, then the lexemes in your grammar are just the terminal tokens. The lexeme statement allows you to specify that certain nonterminal tokens are also to be treated as lexemes. This means that the disregard token will be skipped following the lexeme, but not between the characters that constitute the lexeme. Lexemes correspond to the tokens that a lexical scanner, if you were using one, would commonly identify and pass to a parser as single tokens. You don't usually wish to disregard ©white spaceª within these tokens. For example, in a grammar for a conventional programming language where blank characters are to be disregarded, you might include: [ lexeme {string, character constant, name, number} ] since blank characters must not be overlooked within strings and constants, and should not be permitted within names or numbers. If your grammar allows for situations where successive lexemes could run together if they were not separated by space, a name followed by a number, for example, you may use the "©distinguish lexemesª" ©configuration switchª to force a separation between the tokens. White space may be used explicitly within definitions of lexeme tokens in your grammar if desired, without causing conflicts. 
Thus, if you wish to allow embedded space in variable names, you might write: [ disregard space lexeme {variable name} ] space = ' ' + '\t' letter = 'a-z' + 'A-Z' digit = '0-9' variable name -> letter -> variable name, letter + digit -> variable name, space..., letter + digit ## line line is a field in your ©parser control blockª used for keeping track of the line number of the current character in your input. Line and column numbers are tracked only if the ©lines and columnsª ©configuration switchª has been set. ## line length Line length is an ©obsolete configuration parameterª. ## Line Numbers "Line numbers" is a ©configuration switchª which defaults to off. If it is on, the ©Build Parserª command will put "#line" directives into the generated C code file so that your compiler diagnostics will refer to lines in the ©syntax fileª rather than in the generated C code file. For more information on the "#line" directive, see Kernighan and Ritchie, second edition, section A12.6. If the "line numbers" switch is off, AnaGram will put comments into your parser file to help you find reduction procedures and embedded C in your syntax file. Prior to AnaGram 2.01, if your C or C++ compiler required that the backslashes in the pathname in the #line directive be doubled, you would have used AnaGram's ©escape backslashesª switch to make this happen. Although you may still use ©escape backslashesª, it should no longer be necessary because AnaGram now puts forward slashes into #line pathnames instead of backslashes. If you wish, you may specify the pathname in the #line directives explicitly by using the ©Line Numbers Pathª configuration parameter. You may also wish to change the "©parser file nameª" parameter to provide a full path name for your parser file. ## Line Numbers Path "Line Numbers Path" is a ©configuration parameterª which takes a string value. It defaults to NULL. 
When you have set the ©Line Numbersª ©configuration switchª and Line Numbers Path is not NULL, AnaGram uses it in the #line directive in place of the full path name of your ©syntax fileª. Note that Line Numbers Path should be the complete pathname for your syntax file. Line Numbers Path is useful when using AnaGram in cross platform development. When parsers are to be compiled and tested on a platform different from that used to run AnaGram, you may use Line Numbers Path to provide a pathname on the platform used for compiling and testing. ## Lines and Columns "Lines and columns" is a ©configuration switchª which defaults to on. When set, i.e., on, it causes the ©Build Parserª command to incorporate code into your parser which will automatically track the line number and column number of the input token. You would normally set the "lines and columns" switch when you are planning to build a parser which will read an input file and which will need to diagnose ©syntax errorsª with some precision. Your parser will store the line and column numbers in the ©lineª and ©columnª fields respectively in the ©parser control blockª. If the input to your parser includes tab characters, you should either set the ©tab spacingª ©configuration parameterª appropriately or provide a ©TAB_SPACINGª macro for your parser. Your parser will count line and column numbers beginning with one. ## Main Program The "main program" ©configuration switchª determines what AnaGram does if you invoke the ©Build Parserª command, but have no ©embedded Cª in your ©syntax fileª. If the switch is on and you have not specified ©pointer inputª or an ©event drivenª parser, AnaGram creates a main program which does nothing but call your ©parserª. The "main program" switch defaults to on. 
This feature, along with the default definitions for ©GET_INPUTª and ©error handlingª, makes it possible to write a grammar with no ©embedded Cª or ©reduction procedureªs whatsoever and still get an executable program which will read input from stdin and parse it according to your grammar. ## Marked Rule A "marked rule" is a ©grammar ruleª together with a marked token that indicates how much of the rule has already been matched. The ©marked tokenª and any tokens following it indicate the input that should be expected if the remainder of the rule is to be matched. When marked rules are displayed in AnaGram windows, the marked token is represented by a difference in the font. The token may be in bold face, underlined, italicized, shown with a different point size, or in a different font altogether. Since AnaGram allows you to change fonts to suit your own preferences, you should be careful that the font you choose for the marked tokens allows them to be readily distinguished from the other tokens in your grammar rules. An underlined font is often suitable. ## Max conflicts The "max conflicts" ©configuration parameterª limits the number of ©conflictªs AnaGram will record. Sometimes, a simple error editing your syntax file can cause hundreds of conflicts, which you don't need to see in gory detail. The default value of max conflicts is 50. If you have a grammar that is in serious trouble and you want to see more conflicts, you may change max conflicts to suit your needs. ## Missing The ©warningª message Missing <element 1> in <element 2> indicates that AnaGram expects to see an instance of syntactic element 1 at the specified location, internal to an instance of syntactic element 2. AnaGram cannot reliably continue parsing its input after an error of this type. Therefore, it limits further analysis of your grammar to scanning for syntax errors. 
## Missing Production "Missing production, TXXX: <token name>" is a ©warningª message which indicates that the specified ©tokenª appears to be defined recursively, but there is no initial ©productionª to get the recursion started. If you get this warning, check your ©grammarª closely. ## Missing Reduction Procedure "Missing reduction procedure, RXXX" is a ©warningª message which appears either when the ©grammar ruleª indicated specifies a ©parameter assignmentª but does not have a ©reduction procedureª to use it, or when the rule has no reduction procedure but the value of the token on the left hand side is used as an argument for some other reduction procedure and the ©default reduction valueª does not have the same type as the token on the left hand side. In this latter case, a reduction procedure may be needed to effect correct type conversion. This warning is provided in case the lack of a reduction procedure is an oversight. ## Multiple Definitions "Multiple definitions for TXXX: <token name>" is a ©warningª message which indicates that the specified ©tokenª has been defined both as a ©character setª and as a ©nonterminal tokenª. It cannot be both. ## Near Functions "Near Functions" is a ©configuration switchª that defaults to off. It controls the use of the "near" keyword for static functions in your parser. If your parser is to run on an 80x86 processor you might wish to turn it on. Your parser will then be a slight bit smaller and will run a little bit faster. If you are going to run your parser on some other processor or use a C or C++ compiler that does not support the "near" keyword you should make sure "near functions" is set to off. ## Negative Character Code in Pointer Mode This ©warningª message appears if your ©grammarª defines negative character codes and uses ©pointer inputª.
If your grammar uses the default definition for ©pointer typeª it will be reading unsigned characters so that the parser will never see the negative codes that have been defined. You may correct the problem by providing your own definition of pointer type. ## Nest Comments "Nest comments" is a ©configuration switchª which defaults to off. It controls the treatment of ©commentsª while scanning your ©syntax fileª. It defaults to off, in accordance with the ANSI standard for C which disallows ©nested commentsª. Note that AnaGram scans comments in any ©embedded Cª code as well as in the grammar specification. You may turn this switch on and off as many times as necessary in a single file. ## Nested Comment As delivered, AnaGram treats C style ©commentsª according to the ANSI standard: They do not nest. For those who prefer nested comments, however, the ©nest commentsª ©configuration switchª allows them to nest. ## Nesting too deep This ©warningª message indicates that ©set expressionªs or ©virtual productionsª are nested so deeply they have exhausted the available stack space and AnaGram cannot continue its analysis. Use a ©definitionª statement to name an intermediate level. ## no cr "no cr" is a ©configuration switchª which defaults to off. When this switch is set, it will cause the ©parser fileª and ©header fileª to be written without carriage returns. This is convenient if you wish to use the generated parser files in a Unix environment. ## No Grammar Token Specified This ©warningª message appears if your ©grammarª does not specify a ©grammar tokenª. Edit your ©syntax fileª to specify one. ## No Productions in Syntax File This ©warningª message appears if AnaGram did not find any ©productionsª at all in your ©syntax fileª. Check to see you have the right file. ## No Such Parameter This ©warningª message appears when AnaGram does not recognize the name of a ©configuration parameterª you have tried to set in your ©syntax fileª. 
Check the spelling of the parameter you wish to set in the ©Configuration Parameters Windowª. ## No Terminal Tokens in Expansion No terminal tokens in expansion of TXXX is a ©warningª message indicating that there are no terminal tokens to be found in an expansion of the specified token. Although there are a few circumstances where this could be legitimate, it is more likely that there is a missing rule in the grammar. ## Not a Character Set "Not a character set, TXXX: <token name>" is a ©warningª message which indicates that the specified ©tokenª has been used both on the left side of a ©productionª and in a ©character setª expression defining some other token. AnaGram will use an empty set in place of the specified token in evaluating the ©character setª. You will get another warning, ©Error definingª token, when AnaGram finishes its evaluation of the character set. ## Nothing Reduces "Nothing reduces TXXX -> RYYY" is a ©warningª message which indicates that the ©grammarª does not specify any input to follow an instance of the indicated ©grammar ruleª. In all probability, the grammar does not have any explicit end of file, or ©eof tokenª. If the grammar does not have any conflicts with ©tokenª T000, then an explicit end of file indicator is not necessary. Otherwise you should modify your grammar to require an explicit end of file. ## Null Character in String This ©warningª message appears when AnaGram finds an explicit null character in a quoted string. If you must allow for a null in a ©keyword stringª you will have to rewrite your ©grammar ruleª. For instance, instead of widget -> "abc\0def" write widget -> "abc", 0, "def" ## nonassoc "nonassoc" controls a ©precedence declarationª, indicating that all of the listed ©rule elementsª are to be considered non-associative. ## Nonterminal Token, Nonterminal A nonterminal token is one which is constructed from a series of other tokens as specified by one or more ©productionªs. 
Nonterminal tokens are to be distinguished from ©terminal tokenªs, which are the basic input units appearing in your input stream. Terminal tokens most often represent single characters or a character belonging to a ©character setª such as 'a-z'. ## Null Production A "null production" is one that has no tokens on the right hand side whatsoever. Null ©productionªs essentially are identified by the first following input token. Null productions are extremely convenient syntactic elements when you wish to make some input optional. For example, suppose that you wish to allow an optional semicolon at some point in your grammar. You could write the following pair of productions: optional semicolon -> | ';' Note that a null production can never follow a '|'. This could also be written on multiple lines thus: optional semicolon -> -> ';' You can always rewrite your grammar to eliminate null productions if you wish, but you usually pay a price in conciseness and clarity. Sometimes, however, it is necessary to do such a rewrite in order to avoid ©conflictªs, to which null productions are especially prone. For example suppose you have the following production: foo -> wombat, optional semicolon, widget You can rewrite this as two productions: foo -> wombat, widget -> wombat, ';', widget This rewrite specifies exactly the same input language, but is less prone to conflicts. On the other hand, it may require significantly more table space in your parser. If you have a null production with no ©reduction procedureª specified, your parser will automatically assign the value zero to ©reduction tokenª. Null productions can also be generated by ©virtual productionsª. A token that has a null production is a "©zero lengthª" token. ## Old Style "Old Style" is a ©configuration switchª which defaults to off. It controls the function definitions in the code AnaGram generates. When "old style" is off, it generates ANSI style calling sequences with prototypes as necessary. 
When "old style" is on, it generates old style function definitions. ## Output Files When you use the ©Build Parserª command, to request output from AnaGram, it creates two files: a ©parser fileª and a ©parser headerª file. ## Page Length "Page length" is an ©obsolete configuration parameterª. ## Obsolete Configuration Parameter, Obsolete Configuration Switch A number of ©configuration parameterªs and ©configuration switchªes which were used in the DOS version of AnaGram are no longer used, but are still recognized for the sake of upward compatibility. These parameters include: ©bottom marginª ©line lengthª ©page lengthª ©top marginª ©quick referenceª ©video modeª ## Parameter "Parameter <name> has type void" is a ©warningª message which appears when a ©parameter assignmentª is attached to a ©tokenª that has been defined to have the void ©data typeª. ## Parameter Assignment In any ©grammar ruleª, the ©semantic valueª of any ©rule elementª may be passed to a ©reduction procedureª by means of a parameter assignment. Simply follow the rule element with a colon and a C variable name. The C variable name can then be used in the reduction procedure to reference the semantic value of the token it is attached to. AnaGram will automatically provide necessary declarations. Here are some examples of rule elements with parameter assignments: '0-9':d integer:n expression:x declaration : declaration_descriptor ## Parameter Not Defined AnaGram does not have a ©configuration parameterª with the specified name. Please check the spelling. ## Parameter Takes Integer Value The specified ©configuration parameterª takes an integer value only. ## Parameter Takes String Value The specified ©configuration parameterª takes a string value only. ## Parse Function To run your parser, you call the parse function. The name of the parse function is given by the ©parser nameª ©configuration parameterª and defaults to the name of your parser file. 
If your parser uses ©pointer inputª, you should set the ©pointerª field of the ©parser control blockª before calling the parse function. If your parser is ©event drivenª, you should first call the ©initializerª, and then you should call the parse function for each input token. If the ©reentrant parserª switch is set, the parse function takes a pointer to the ©parser control blockª as its sole argument. Otherwise it takes no arguments. The parse function returns no value. All communication is by means of the ©parser control blockª. To retrieve the value of the ©grammar tokenª, once the parse is complete, use the ©parser value functionª. ## Parser A parser is a program or, more commonly, a procedure within a program, which scans a sequence of ©input charactersª or input tokens and accumulates them in an input buffer or stack as determined by a set of ©productionªs which constitute a ©grammarª. When the parser discovers a sequence of tokens as defined by a ©grammar ruleª, or right hand side of a production, it "reduces" the sequence to a single ©reduction tokenª as defined by the left hand side of the grammar rule. This ©nonterminal tokenª now replaces the tokens which matched the grammar rule and the search for matches continues. If an input token is encountered which will not yield a match for any rule, it is considered a ©syntax errorª and some kind of ©error recoveryª may be required to continue. If a match, or ©reduce actionª, yields the ©grammar tokenª, sometimes called the ©goal tokenª or ©start tokenª, the parser deems its work complete and returns to whatever procedure may have called it. The ©Grammar Traceª and ©File Traceª functions in AnaGram provide a convenient means for understanding the detailed operation of a syntax directed parser. ©Tokensª may have ©semantic valuesª. If the ©input valuesª ©configuration switchª is on, your parser will expect semantic values to be provided by the input process along with the token identification code.
If the input values switch is off, your parser will take the ASCII value of the input character, that is, the actual input code, as the value of the character. When the parser reduces a production, it can call a ©reduction procedureª or ©semantic actionª to analyze the values of the constituent tokens. This reduction procedure can then return a value which characterizes the reduced token. ## Parser Control Block A "Parser Control Block" is a structure which contains all of the data necessary to describe the instantaneous state of a parser. The typedef statement which defines the structure is included in the ©parser headerª file for your parser. AnaGram creates the name of the data type for the structure by appending "_pcb_type" to the ©parser nameª. You may add your own declarations to the parser control block by using the ©extend pcbª statement. If the ©declare pcbª ©configuration switchª is on, its normal state, AnaGram will declare a parser control block for you at the beginning of your parser file. AnaGram will determine the name of the parser control block by appending "_pcb" to the ©parser nameª. AnaGram will also define the macro PCB as a shorthand notation for use within the parser. All references to the parser control block within the code that AnaGram generates are made using the PCB macro. If you wish to declare your own parser control block, you must include the ©parser headerª file for your parser before your declaration. Then you declare a control block and define PCB to refer to the control block you have declared. Suppose your grammar is called widget. You would then write the following statements in your ©embedded Cª: #include "widget.h" widget_pcb_type widget_control_pcb; #define PCB widget_control_pcb Alternatively, you could write the following: #include "widget.h" widget_pcb_type *widget_control_pcb_pointer; #define PCB (*widget_control_pcb_pointer) and then allocate storage for the structure when necessary.
Some fields of interest in the parser control block are as follows: ©input_codeª ©input_valueª ©input_contextª ©pointerª ©token_numberª ©reduction_tokenª ©ssxª ©snª ©ssª[©parser stack sizeª] ©vsª[parser stack size]; ©csª[parser stack size]; ©lineª ©columnª *©error_messageª ©error_frame_ssxª ©error_frame_tokenª ## PCB "PCB" is a macro AnaGram defines for use in the code it generates to refer to the ©parser control blockª for your ©parserª. Normally, AnaGram automatically declares storage for a parser control block and defines PCB for you. If you turn off the ©declare PCBª switch, you may define PCB yourself. ## PCB_TYPE If you are writing your parser in C++, you may prefer to derive a class from the ©parser control blockª rather than use the ©extend pcbª statement. In this case you may define the PCB_TYPE macro in your syntax file to specify your derived class. For instance, you have defined class MyPcb : public parser_pcb_type {...}; You would then add the following line: #define PCB_TYPE MyPcb If you do not define PCB_TYPE, AnaGram will define it as the type of your parser control block. ## Parser File The "parser file" is the C (or C++) file output by AnaGram when you execute the ©Build Parserª command. It contains all of the ©embedded Cª from your ©syntax fileª, all of the ©reduction procedureªs defined in your ©grammarª, syntax tables which represent, in a condensed form, all of the intricacies of your grammar, and a customized ©parsing engineª. The name of the parser file is given by the ©parser file nameª ©configuration parameterª. The name of the ©parserª itself is given by the ©parser nameª configuration parameter. If you wish the parser file to be written without carriage returns, suitable for a Unix environment, set the ©no crª configuration switch. ## Parser File Name "Parser file name" is a ©configuration parameterª which takes a string value. The default value is "#.c". 
AnaGram uses this parameter to generate the name of the output C file, or ©parser fileª, created by the ©Build Parserª command. The '#' character is used in this string as a wild card to indicate the name of the current ©syntax fileª. If the first character of the parser file name string is a '.' character, AnaGram will substitute the name of the current working directory for the dot. Thus ".\\#.c" will create the file name as a complete path. This can sometimes be important when using the ©line numbersª switch to enable a debugger to find code in your parser file. Note that the parser file name is not the same as the ©parser nameª. ## Parser Generator A parser generator, such as AnaGram, is a program that converts a ©grammarª, a rule-based description of the input to a program, into a conventional, procedural module called a ©parserª. The parsers AnaGram generates are simple C modules which can be compiled on almost any platform. AnaGram parsers are also compatible with C++. ## Header File, Parser Header When you use the command ©Build Parserª to generate source code for a parser, AnaGram creates two files, a header file and a C source file. Unless different paths are specified in the ©parser file nameª and ©header file nameª parameters, both files will be written to the directory that contains the ©syntax fileª. The header file contains a number of typedef statements, including the definition of the ©parser control blockª, and a number of macro definitions which may be useful in your parser or in other modules of your program. If you do not alter the ©header file nameª parameter, the name of the header file will be the same as the name of your ©syntax fileª and it will have the extension ".h". If you wish the header file to be written without carriage returns, suitable for a Unix environment, set the ©no crª configuration switch. 
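The '#' wild card used in the ©parser file nameª parameter can be illustrated with a short C sketch. This is not AnaGram's own code: the function name expand_file_name and its arguments are hypothetical, it assumes every '#' in the template is replaced by the syntax file name, and it does not attempt the substitution of the working directory for a leading '.'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy tmpl to out, replacing each '#' with base
   (the syntax file name without its extension).
   Returns 0 on success, -1 if out is too small to hold the result. */
int expand_file_name(const char *tmpl, const char *base,
                     char *out, size_t out_size)
{
    size_t used = 0;
    size_t base_len;

    assert(tmpl != NULL && base != NULL && out != NULL);
    if (out_size == 0) return -1;
    base_len = strlen(base);

    for (; *tmpl != '\0'; tmpl++) {
        if (*tmpl == '#') {
            /* leave room for the terminating null */
            if (used + base_len >= out_size) return -1;
            memcpy(out + used, base, base_len);
            used += base_len;
        } else {
            if (used + 1 >= out_size) return -1;
            out[used++] = *tmpl;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With the default template "#.c" and a syntax file named widget, the sketch yields "widget.c".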
## Parser Input AnaGram ©parserªs may be configured to accept input in any of three different ways: By default, a ©parse functionª gets its input by invoking the ©GET_INPUTª macro each time it is ready for another input token. The default implementation of GET_INPUT reads ©input characterªs from stdin. For most practical problems, you will want to override this definition of GET_INPUT, storing the current input character in PCB.input_code. Alternatively, you may configure a parser to read input from an array in memory. Set the ©pointer inputª switch and load the ©pointerª field of the parser control block before calling the parse function. The parser will then run, incrementing the pointer, until it finishes or encounters an error. The third alternative is to set the ©event drivenª switch. The parser will be configured as a callback routine. Begin by calling the ©initializerª. Then, for each input character, store the character in the ©input_codeª field of the parser control block and call the parse function. Each time you call the parse function it will continue until it needs more input. You can check its status by inspecting the ©exit_flagª in the parser control block. The input to your parser may be either text characters or ©tokensª accumulated by a pre-processor, or ©lexical scannerª. The latter case is referred to as ©token inputª. If you use a lexical scanner, you may find it convenient to configure your parser as event driven. Although lexical scanners are often not necessary when you use AnaGram, if you do need one you can write it in AnaGram. ## Parser Name "Parser Name" is a ©configuration parameterª which defaults to "#", where "#" represents the name of your ©syntax fileª. AnaGram uses this parameter to name your ©parse functionª. The ©initializerª for your parser will have the same name preceded by "init_". Note that "©parser file nameª" is not the same configuration parameter as "parser name".
## Parser Stack Your ©parserª uses a "parser stack" to keep track of the ©grammar rulesª it is trying to match and its progress in matching them. Normally, there are two separate stacks defined by AnaGram: ©PCBª.©ssª, the ©parser state stackª which maintains ©parser stateª numbers, and PCB.©vsª, the ©parser value stackª which maintains the ©semantic valueªs of tokens that have been identified so far. If you wish to maintain a stack tracking other variables you may set the ©context typeª ©configuration parameterª, and AnaGram will define a third stack, PCB.©csª. All are indexed by the same stack index, PCB.©ssxª. To see how tokens accumulate on the parser stack, run the ©Grammar Traceª or the ©File Traceª. Normally, when the return value of a ©reduction procedureª is stored on the parser value stack, it is stored by simply coercing the stack pointer to the correct type. If the return value is a C++ object, this can cause serious problems. These problems can be avoided by using the ©wrapperª statement. ## Parser stack alignment Parser stack alignment is a ©configuration parameterª whose value is a C or C++ data type. It defaults to "long". If any tokens have type "double", it will be automatically set to double. Thus, you will normally not need to change this parameter if your parser is to run on a PC or compatible processor. It provides alignment control for processors which restrict address for multibyte data access. The default setting provides for correct operation on 64 bit processors. To control byte alignment of the parser stack, ©PCBª.©vsª, AnaGram normally adds a field of the specified data type to the "union" statement which defines the data type for the ©parser stackª. This parameter can be used to deal with byte alignment problems when a ©parserª is to be run on a processor with byte alignment restrictions. 
For instance, if your ©grammarª has ©tokenªs of type "long double" and your processor requires long double variables to be properly aligned, you can include the following statement in a ©configuration sectionª in your grammar or in your ©configuration fileª: parser stack alignment = long double If the data type specified is "void", no alignment declaration will be made. ## Parser Stack Index, Stack Index The parser stack index, ©PCBª.©ssxª, tracks the depth of the ©parser state stackª, the ©parser value stackª, and the ©context stackª if you defined one. The parser stack index is incremented by ©shift actionsª and reduced by ©reduce actionsª. ## Parser Stack Overflow Your ©parserª uses a ©parser stackª to keep track of the ©grammar rulesª it is trying to match and its progress in matching them. If your grammar has any ©recursive ruleªs that are not strictly left recursive, then no matter how big you make the parser stack, it will be possible to create a syntactically correct input which will cause the stack to overflow. As a practical matter, however, it is usually possible to set the ©parser stack sizeª to a value large enough so that an overflow is a freak occurrence. Nevertheless, it is necessary to check for overflow, and in the case overflow should occur, your parser has to do something. What it does is invoke the ©PARSER_STACK_OVERFLOWª macro. If you don't define it, AnaGram will define it for you, although not necessarily to your taste. ## Recursive rule, Recursion A ©grammar ruleª is said to be "recursive" if the ©tokenª on the left side of the rule also appears on the right side of the rule, or in an ©expansion ruleª of any token on the right side of the rule. If the token on the left side is the first token on the right side, the rule is said to be "left recursive". If it is the last token on the right side, the rule is said to be "right recursive". Otherwise, the rule is "center recursive". 
For example: statement list -> statement -> statement list, statement // left recursive fraction part -> digit -> digit, fraction part // right recursive expression -> factor -> expression, '+' + '-', factor factor -> primary -> factor, '*' + '/', primary primary -> number -> name -> '(', expression, ')' // center recursive Note that if all the tokens in the rule other than the recursive token itself are ©zero lengthª tokens, it is possible for the rule to be matched arbitrarily many times without any input whatsoever. In other words, such a rule creates an infinite loop in the parser. AnaGram can detect this condition and issues an ©empty recursionª diagnostic if it occurs. ## PARSER_STACK_OVERFLOW PARSER_STACK_OVERFLOW is a user definable macro. If you do not define it, AnaGram will define it so that it will print a message on stderr and abort the ©parserª in case of a ©parser stack overflowª. ## Parser Stack Size "Parser stack size" is a ©configuration parameterª with a default value of 128. It is used to define the sizes of your ©parser stacksª in your ©parser control blockª. When analyzing your grammar, AnaGram will determine the minimum amount of stack space required for the deepest left ©recursionª. To this depth it will add one half the value of the parser stack size parameter. It will then set the actual stack size to the larger of this value and the parser stack size parameter. ## Parser State, State Number The essential part of your ©parserª is a group of tables which describe in detail what to do for each "state" of the parser. The states of a parser are determined by sets of "©characteristic rulesª". The ©State Definition Tableª shows the characteristic rules for each state of your parser. AnaGram numbers the states of a parser as it identifies them, beginning with zero. In all windows, state numbers are displayed as three digit numbers prefixed with the letter 'S'.
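The sizing rule given under ©Parser Stack Sizeª can be written out as a small C function. This is only a sketch of the arithmetic as described there, not code taken from AnaGram: the name actual_stack_size is hypothetical, and the use of integer division for "one half" is an assumption.

```c
#include <assert.h>

/* Hypothetical illustration of the stack sizing rule.
   min_depth:        minimum stack depth needed for the deepest left recursion
   stack_size_param: value of the "parser stack size" parameter
   Returns the larger of (min_depth + stack_size_param / 2) and
   stack_size_param. Integer division is an assumption of this sketch. */
int actual_stack_size(int min_depth, int stack_size_param)
{
    int padded;

    assert(min_depth >= 0 && stack_size_param > 0);
    padded = min_depth + stack_size_param / 2;
    return padded > stack_size_param ? padded : stack_size_param;
}
```

With the default parameter value of 128, a grammar needing a depth of 10 would get 128 entries under this rule, while one needing a depth of 100 would get 164.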
## Parser State Stack, State Stack The parser state stack is a stack maintained by your ©parserª which is an integral part of the parsing process. At any point in the parse of your input stream, the parser state stack provides a summary of what has been found so far. The parser state stack is stored in ©PCBª.©ssª and is indexed by PCB.©ssxª, the ©parser stack indexª. ## Parser Value Stack, Value Stack In parallel with the ©parser state stackª, your parser maintains a "value stack", ©PCBª.©vsª, each entry of which corresponds to the ©semantic valueª of the token identified at that state. Since the semantic values of different tokens might well have different ©data typeªs, AnaGram gives you the opportunity, in your ©syntax fileª, to define the data type for any token. AnaGram then builds a typedef statement creating a data type which is a union of all the types you have defined. AnaGram creates the name for this ©data typeª by appending "_vs_type" to the ©parser nameª. AnaGram uses this data type to define the value stack. ## Parser Action In a traditional LR parser, there are only four actions: the ©shift actionª, the ©reduce actionª, the ©accept actionª and the ©error actionª. AnaGram, in doing its ©grammar analysisª, identifies a number of special cases, and creates a number of extra actions which make for faster processing, but which can be represented as combinations of these primitive actions. When a shift action is performed, the current state number is pushed onto the ©parser state stackª and the new state number is determined by the current state number and the current input token. Different tokens cause different new states. When a reduce action is performed, the length of the rule being reduced is subtracted from the ©parser stack indexª and the new state number is read from the top of the parser state stack. The ©reduction tokenª for the rule being reduced is then used as an input token.
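The stack handling for the shift and reduce actions described above can be modelled in a few lines of C. This is a toy model, not the generated parsing engine: the type toy_pcb and the functions toy_shift and toy_reduce are hypothetical, the field names ss, ssx and sn merely echo the corresponding parser control block fields, and the new state is passed in by the caller instead of being looked up in syntax tables.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_SIZE 128   /* hypothetical stack size for the model */

typedef struct {
    int ss[TOY_STACK_SIZE];  /* state stack */
    int ssx;                 /* stack index */
    int sn;                  /* current state number */
} toy_pcb;

/* Shift: push the current state, then enter new_state.
   Returns 0 on success, -1 if the stack would overflow. */
int toy_shift(toy_pcb *p, int new_state)
{
    assert(p != NULL);
    if (p->ssx >= TOY_STACK_SIZE) return -1;
    p->ss[p->ssx++] = p->sn;
    p->sn = new_state;
    return 0;
}

/* Reduce: subtract the rule length from the stack index and read the
   new state from the top of the stack. A rule of length zero (a null
   production) leaves the state unchanged in this model.
   Returns 0 on success, -1 if the rule is longer than the stack. */
int toy_reduce(toy_pcb *p, int rule_length)
{
    assert(p != NULL);
    if (rule_length < 0 || rule_length > p->ssx) return -1;
    if (rule_length > 0) {
        p->ssx -= rule_length;
        p->sn = p->ss[p->ssx];
    }
    return 0;
}
```

After a reduce, the caller would treat the reduction token as input and call toy_shift again with whatever state the tables give for it, which is how the model restores the stack depth by one.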
## Parsing Engine A parser consists of three basic components: A set of syntax tables, a set of ©reduction procedureªs and a parsing engine. The parsing engine is the body of code that interprets the parsing table, invokes input functions, and calls the reduction procedures. The ©Build Parserª command configures a parsing engine according to the implicit requirements of the syntax specification and according to the explicit values of the ©configuration parameterªs. The parsing engine itself is a simple automaton, characterized by a set of states and a set of inputs. The inputs are the tokens of your grammar. Each state is represented by a list of tokens which are admissible in that state and for each token a ©parser actionª to perform and a parameter which further defines the action. Each state in the grammar, with the exception of state zero, has a ©characteristic tokenª which must have been recognized in order to jump to that state. Therefore, the ©parser state stackª, which is essentially a list of state numbers, can also be thought of as a list of token numbers. This is the list of tokens that have been seen so far in the parse of your input stream. ## Partition If you use ©character setsª in your grammar, AnaGram will compute a "partition" of the ©character universeª. This partition is a collection of non-overlapping character sets such that every one of the sets you have defined can be written as a ©unionª of partition sets. Each partition set is assigned a unique ©tokenª. If one of your character sets requires more than one partition set to represent it, AnaGram will create appropriate ©productionªs and add them to your grammar so your parser can make the necessary distinctions. To see how AnaGram has partitioned the character universe, you may inspect the ©Partition Setsª window found in the ©Browse Menuª. ## Partition Set Number Each ©partitionª set is identified by a unique reference number called the partition set number. 
Partition set numbers are displayed in the form Pxxx. Partition sets are numbered starting with zero, so the first set is P000. To see the elements of a given partition set, call up the ©Partition Setsª window from the ©Browse Menuª, then, after selecting a partition set, call up the ©Set Elementsª window from the ©Auxiliary Windowsª popup menu. ## Partition Sets The Partition Sets option in the ©Browse Menuª pops up a window which shows the complete ©partitionª of the ©character universeª for your parser. The Partition Sets option in the ©Auxiliary Windowsª popup menu for the ©Character Setsª window lets you see the partition sets which cover the specified character set. Each entry in a Partition Sets window identifies a token number and a ©partition set numberª. The ©Auxiliary Windowsª menu provides a ©Set Elementsª entry which enables you to see precisely which characters belong to the partition set. It also has a Token Usage entry to show you what rules the set is used in. ## PCONTEXT PCONTEXT is an alternate form of the ©CONTEXTª macro which takes an explicit argument to specify the ©parser control blockª. PCONTEXT is defined in the ©parser headerª file. ## PERROR_CONTEXT PERROR_CONTEXT is an alternative form of the ©ERROR_CONTEXTª macro. It differs only in that it takes an argument so you can specify the appropriate ©parser control blockª explicitly. PERROR_CONTEXT is defined in the ©parser headerª file. ## pointer "pointer" is a field which will be included in the ©parser control blockª for your parser if you have set the ©pointer inputª ©configuration switchª. Your main program should set PCB.pointer before it calls your parser. Thereafter, your parser will increment it appropriately. When you are executing a ©reduction procedureª or a ©SYNTAX_ERRORª macro PCB.pointer will always point to the next input character to be read. ## Pointer input "Pointer input" is a ©configuration switchª which you may set to control ©parser inputª. It defaults to off. 
When you set pointer input, you tell AnaGram that the input to your parser is in memory and can be scanned simply by incrementing a pointer. Before calling your parser you should make sure that ©PCBª.©pointerª is properly initialized to point to the first character or token in your input. Use the ©configuration parameterª "©pointer typeª" to specify the type of the pointer. The default value of "pointer type" is "unsigned char *". There is no particular reason why pointer type should be limited to variants on char. It could define a pointer to int or a structure just as well. If you use pointer input with structures or C++ classes, you should set the ©input valuesª switch and define an ©INPUT_CODEª(t) macro. If you are using a 16 bit compiler and your input array is so large that you need "huge" pointers, make sure that "pointer type" is properly defined. ## Pointer Type "Pointer Type" is a ©configuration parameterª which defaults to "unsigned char *". When you have specified ©pointer inputª, AnaGram uses the value of pointer type to declare a pointer field in your ©parser control blockª. ## Precedence, Operator Precedence In expressions of the form a+b*c, the convention is to perform the multiplication before the addition. Multiplication is said to take precedence over addition. In general the rank order in which operations are to be performed if there are no parentheses forcing an order of computation is called the precedence of the operators. If you have an ambiguous ©grammarª, that is, a grammar with a number of ©conflictªs, you may use ©precedence declarationªs to resolve the conflicts and to set operator precedence. ## Precedence Declaration Precedence declarations are ©attribute statementsª which may be used to resolve ©conflictªs in your grammar by assigning precedence and associativity to operators. Precedence declarations must be made inside ©configuration sectionsª.
Each declaration consists of the keyword ©leftª, ©rightª, or ©nonassocª followed by a list of ©rule elementsª. The rule elements in the list must be separated by commas and the entire list must be enclosed in braces ({ }). Each of the rule elements is assigned the same precedence level, which is higher than that assigned in all previous precedence declarations and lower than that in all subsequent declarations. The rule elements are defined to be left, right, or nonassociative, depending on whether the keyword was "left", "right", or "nonassoc". All conflicts which are resolved by precedence declarations are listed in the ©Resolved Conflictsª window. ## Precedence Rules AnaGram can resolve certain types of ©conflictªs in your grammar by applying precedence rules. There are three classes of rules available: explicit ©precedence declarationsª, the "©stickyª" statement, and the implicit rule associated with the use of a "©disregardª" token outside a ©lexemeª. Whenever AnaGram uses a precedence rule of any kind to resolve a conflict, it produces a ©warningª message and lists the conflict in the ©Resolved Conflictsª window. ## Previous States The Previous States window can be accessed via the ©Auxiliary Windowsª popup menu from any window that identifies ©parser stateªs. It shows the ©characteristic ruleªs for all of the states which jump to the presently selected state. ## Print File Name "Print file name" is a configuration parameter which is not used in the Windows version of AnaGram. It is retained only for compatibility with pre-existing ©configuration fileªs. ## Problem States The Problem States window is essentially a trimmed version of the ©Reduction Statesª window. It is available in the ©Auxiliary Windowsª popup menu for the ©Conflictsª and ©Resolved Conflictsª windows. The Problem States window has the same format as the Reduction States window, and differs only in that it shows only those reduction states for which the ©conflict tokenª is acceptable input. 
## Production Productions are the mechanism you use to describe how complex input structures are built up out of simpler ones. Each production has a left hand side and a right hand side. The right hand side, or ©grammar ruleª, is a sequence of ©rule elementsª, which may represent either ©terminal tokensª or ©nonterminal tokensª. The left hand side is a list of ©reduction tokensª. In most cases there would be only a single reduction token. Productions with more than one ©tokenª on the left hand side are called ©semantically determined productionsª. The "->" symbol is used to separate the left hand side from the right hand side. If you have several productions with the same left hand side, you can avoid rewriting the left hand side either by using '|' or by using another "->". A ©null productionª, or empty right hand side, cannot follow a '|'. Productions may be written thus: name -> letter -> name, digit This could also be written name -> letter | name, digit In order to accommodate semantic analysis of the data, you may attach to any grammar rule a ©reduction procedureª which will be executed when the rule is identified. Each token may have a ©semantic valueª. By using ©parameter assignmentªs, you may provide the reduction procedure with access to the semantic values of tokens that comprise the grammar rule. When it finishes, the reduction procedure may return a value which will be saved on the ©parser value stackª as the semantic value of the ©reduction tokenª. ## Productions The ©Productionªs window is available via the ©Auxiliary Windowsª popup menu in any window which identifies tokens. If the token identified by the highlighted line is ©nonterminalª, the Productions window will show the rules produced by that ©tokenª. ## PRULE_CONTEXT PRULE_CONTEXT is an alternative form of the ©RULE_CONTEXTª macro. It differs only in that it takes an argument so you can specify the appropriate ©parser control blockª explicitly. 
PRULE_CONTEXT is defined in the ©parser headerª file. ## Quick Reference "Quick reference" is an ©obsolete configuration switchª. ## Range Bounds Out of Order This is a ©warningª message that appears when you have a ©character rangeª of the form 'z-a'. AnaGram interprets this range as being equal to 'a-z', but provides a warning in case the unusual order was the result of a clerical error. ## Recursive Definition of Char Set This ©warningª appears when AnaGram discovers a recursively defined ©character setª. Character sets cannot be defined recursively. ## Redefinition "Redefinition of <name>" is a ©warningª message which appears when AnaGram discovers a redefinition of a ©symbolª. The new ©definitionª is ignored. ## Redefinition of Grammar Token This ©warningª appears when AnaGram encounters a new definition of the ©grammar tokenª. AnaGram discards the old definition. The last definition in the syntax file wins. If you get this warning, check your ©syntax fileª to make sure you have the grammar token you want. ## Redefinition of token "Redefinition of token, TXXX: <name>" is a ©warningª message which occurs when AnaGram encounters a ©definitionª statement and the specified ©grammar tokenª has already been seen on the left side of a ©productionª. AnaGram will ignore the definition statement. ## Reduce Action, Reduction The reduce action, or reduction, is one of the four actions of a traditional ©parsing engineª. The reduce action is performed when the parser has succeeded in matching all the elements of a ©grammar ruleª, and the next input token is not erroneous. Reducing the grammar rule amounts to subtracting the length of the rule from the ©parser stack indexª, identifying the ©reduction tokenª, stacking its ©semantic valueª and then doing a ©shift actionª with the reduction token as though it had been input directly. 
## Reduce-Reduce Conflict A grammar has a "reduce-reduce" ©conflictª at some state if a single token turns out to be a ©reducing tokenª for more than one ©completed ruleª. ## Reducing Token In a ©parser stateª with more than one ©completed ruleª, your parser must be able to determine which one was actually found. Therefore, during analysis of your grammar, AnaGram examines each completed rule in order to determine all the states the ©parserª will branch to once the rule is reduced. These states are called the "reduction states" for the rule. In any window that displays ©marked ruleªs, these states may be found in the ©Reduction Statesª window listed in the ©Auxiliary Windowsª popup menu. The acceptable input tokens for those states are the "reducing tokens" for the completed rules in the state under investigation. If there is a single token which is a reducing token for more than one rule, then the grammar is said to have a ©reduce-reduce conflictª at that state. If in a particular state there is both a ©shift actionª and a ©reduce actionª for the same token the grammar is said to have a ©shift-reduce conflictª in that state. Note that a "reducing token" is not the same as a "©reduction tokenª". ## Reduction Choices "Reduction choices" is a ©configuration switchª which defaults to off. If it is set, AnaGram will include in your ©parser fileª a function which will identify the acceptable choices for ©reduction tokenª in the current state. This function, of course, is useful only if you are using ©semantically determined productionsª. The prototype of this function is: int $_reduction_choices(int *); where '$' represents the name of your parser. You must provide an integer array whose length is at least as long as the maximum number of reduction choices you might have. The function will fill the array with the token numbers of those which are acceptable in the current state and will return a count of the number of acceptable choices it found. 
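As a rough sketch of how the reduction choices function might be called, the following C fragment checks whether a given token number is among the acceptable choices. The parser name `myparser`, the bound `MAX_CHOICES`, and the token numbers are assumptions made up for the example, and `myparser_reduction_choices` here is only a stand-in for the function AnaGram would generate.

```c
#include <assert.h>

#define MAX_CHOICES 8   /* assumed upper bound on reduction choices */

/* Stand-in for the generated function, so the sketch compiles on its own.
   It follows the documented prototype: fill the caller's array with
   token numbers and return how many were stored. */
static int myparser_reduction_choices(int *choices) {
    static const int acceptable[] = { 12, 17 };   /* made-up token numbers */
    int n = (int)(sizeof acceptable / sizeof acceptable[0]);
    for (int i = 0; i < n; i++)
        choices[i] = acceptable[i];
    return n;
}

/* Returns 1 if token is an acceptable reduction token in the current
   state, 0 otherwise. */
static int is_acceptable_reduction(int token) {
    int choices[MAX_CHOICES];
    int n = myparser_reduction_choices(choices);
    assert(n <= MAX_CHOICES);   /* sanity check on the assumed bound */
    for (int i = 0; i < n; i++)
        if (choices[i] == token)
            return 1;
    return 0;
}
```

A reduction procedure for a semantically determined production could use a check of this kind before storing a token number in PCB.reduction_token.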
## reduction_token "reduction_token" is a field in your ©parser control blockª. If your grammar uses ©semantically determined productionsª, your ©reduction procedureªs need a mechanism to specify which token the rule is to reduce to. ©PCBª.reduction_token names the variable which contains the ©token numberª of the ©reduction tokenª. Prior to calling your reduction procedure, your parser will set this field to the token number of the default ©reduction tokenª, i.e., the leftmost syntactically correct token in the reduction token list for the production being reduced. If the reduction procedure establishes that a different reduction token is appropriate, it should store the appropriate token number in PCB.reduction_token. ## Reduction Procedures The Reduction Procedures window lists the C function prototypes for the ©reduction procedureªs in your grammar. When this window is active, the ©syntax fileª window, if visible, is synchronized with it so you can see the body of the reduction procedure as well as its usage. ## REDUCTION_TOKEN_ERROR REDUCTION_TOKEN_ERROR is a user definable macro which your ©parserª invokes when it encounters an inadmissible reduction token. This error should occur only if your parser uses ©semantically determined productionsª and your ©reduction procedureª provides an incorrect ©token numberª. If you do not define it, AnaGram will define it so that it will print an error message on stderr and abort the parse. ## Reduction Procedure, Semantic Action A "reduction procedure", or "semantic action", is a function you write which your ©parserª executes when it has identified the grammar rule to which the reduction procedure is attached in your grammar. When your parser has identified a particular ©grammar ruleª, that is to say, a particular sequence of ©tokensª that you have specified in your grammar, it "reduces" the production to the token at the head of the production, or ©reduction tokenª. 
If you choose, you can specify a "reduction procedure" which your parser will call so that your program can do semantic analysis on the production just identified. Your reduction procedure will be called using, as arguments, the ©semantic valuesª of tokens on the right side of the production. Your reduction procedure may, if you choose, return a value which will become the semantic value of the reduction token. Since many of the tokens in ©productionªs are there for only syntactic purposes, you may specify, when you write your grammar, the tokens whose values are needed as arguments for your reduction procedure. To attach a reduction procedure to a grammar rule, just write it immediately following the rule. There are two formats for reduction procedures, depending on the size and complexity of the procedure. The first form consists of an equal sign followed by a C expression and a semicolon. When the rule is matched the expression will be evaluated and its value will be stacked on the ©parser value stackª as the value of the reduction token. For example: =-a; =myProcedure(x, q); The second form consists of an equal sign followed by a block of C code enclosed in curly braces. If you wish to return a value for the reduction token you have to use a return statement. For example: ={ if (x > y) return x; return x+2*y; } In both forms of the reduction procedure, ©parameter assignmentªs may be attached to ©rule elementªs in order to make their semantic values available to the reduction procedure. When the reduction procedure is executed, local variables will be defined with the names specified in the parameter assignments. The values of these variables will have been set to the value of the corresponding token. If the return value of your reduction procedure is a C++ object, you may wish to specify that AnaGram enclose it in a ©wrapperª so that constructor calls and destructor calls are made. 
Otherwise the object is pushed onto and popped from the parser value stack simply by coercing the stack pointer to the appropriate type. The reduction procedures in your grammar are summarized in the ©Reduction Proceduresª window. ## Reduction States The Reduction States window can be accessed via the ©Auxiliary Windowsª popup menu from any window which displays ©parser stateª numbers and ©marked ruleªs. If the highlighted ©grammar ruleª has no marked token, the Reduction States window will show the states the parse could reach by reducing the rule and processing the resultant ©reduction tokenª. ## Reduction Token A ©tokenª which appears on the left hand side of a ©productionª is called a reduction token. It is so called because when the ©grammar ruleª on the right side of the production is matched in the input stream, your ©parserª will "reduce" the sequence of tokens which matches the rule by replacing the sequence of tokens with the reduction token. If more than one reduction token is specified, the production is called a ©semantically determined productionª and your ©reduction procedureª should choose the appropriate reduction token. If it does not, your parser will use the first token in the list that is syntactically correct as the default. The ©CHANGE_REDUCTIONª macro can be used to specify the reduction token. Note that a "reduction token" is not the same as a "©reducing tokenª". ## Reduction Trace The Reduction Trace window is available from the ©Conflictsª window and the ©Resolved Conflictsª window. It can be used in conjunction with the ©Conflict Traceª to study ©conflictªs. The Reduction Trace represents the result of taking the reduce option in the conflict state of the Conflict Trace. ## Reentrant Parser "Reentrant parser" is a ©configuration switchª which defaults to off. 
If it is on when AnaGram builds a parser AnaGram will generate code that passes the pointer to the ©parser control blockª via calling sequences, rather than using static references to the pcb. You can use the reentrant parser switch to help make ©thread safe parsersª. The reentrant parser switch is compatible with both C and C++. The reentrant parser switch cannot be used in conjunction with the ©old styleª switch. When you have enabled the reentrant parser switch, the ©parse functionª, the ©initializerª function, and the ©parser value functionª will be defined to take a pointer to the parser control block as their sole argument. ## Reload Button The ©File Traceª window includes a reload button to allow you to reread your ©test fileª after you have modified it without having to start a new file trace. After the file has been reread, the file trace is reset. ## rename macro AnaGram uses a number of macros in its generated code. It is possible, therefore, to run into naming collisions with other components of your program. The rename macro ©attribute statementª allows you to change the name AnaGram uses for a particular macro to avoid these problems. For example, in the Microsoft Foundation Classes, V4.2, there is a class called "CONTEXT". If you use the ©context stackª option in AnaGram, your ©parserª will have a macro called ©CONTEXTª. To avoid the name collision, add the following attribute statement to any configuration section in your grammar: rename macro CONTEXT AG_CONTEXT Then, simply use "AG_CONTEXT" where you would otherwise have used "CONTEXT". ## reserve keywords "reserve keywords" is an ©attribute statementª which can be used to specify a list of ©keywordªs that are reserved and cannot be used except as explicitly specified in the grammar. In particular this switch enables AnaGram to avoid issuing meaningless ©keyword anomalyª warnings. 
AnaGram does not automatically presume that keywords are also reserved words, since in many grammars there is no need to specify reserved words. Reserve keywords statements must be made inside ©configuration sectionsª. Each statement consists of the keyword "reserve keywords" followed by a list of keyword ©tokensª. The tokens must be separated by commas and the list must be enclosed in braces ({ }). Each keyword listed will then be treated as a reserved word. ## Reset Button The Reset button, found on ©File Traceª and ©Grammar Traceª windows restores the initial configuration of the trace. This is especially convenient for ©Conflict Traceª or other ©Auxiliary Traceªs. ## Resolved Conflicts AnaGram creates the Resolved Conflicts window only when the grammar it is analyzing has ©conflictªs and when those conflicts have been resolved by ©precedence declarationªs, by "©stickyª" statements, or in connection with the explicit use of a token specified in a ©disregardª statement. The Resolved Conflicts window shows the conflicts that have been resolved, using the same format as that of the ©Conflictsª Window. The rule chosen is marked with an asterisk in the leftmost column of the window. ## Resynchronization "Resynchronization" is the process of getting your parser back in step with its input after encountering a ©syntax errorª. As such, it is one method of ©error recoveryª. Of course, you would resynchronize only if it is necessary to continue after the error. There are several options available when using AnaGram. You could use the ©auto resynchª option, which causes AnaGram to incorporate an automatic resynchronizing procedure into your parser, or you could use the ©error token resynchronizationª option, which is similar to the technique used by YACC programmers. ## right "right" controls a ©precedence declarationª, indicating that all of the listed ©rule elementsª are to be considered ©right associativeª. 
## Right Associative A binary operator is said to be right associative if an expression with repeated instances of the operator is to be evaluated from the right. Thus, for example, when '=' is used as an assignment operator x = a = b is normally taken to mean a = b followed by x = a The assignment operator is said to be right associative. In ©grammarªs with ©conflictªs, you may use ©precedence declarationªs to specify that an operator should be right associative. ## Rule Context The Rule Context window can be accessed via the ©Auxiliary Windowsª menu in any window that displays ©grammar ruleªs. AnaGram displays all occurrences in the ©grammarª of all the ©reduction tokenªs for the rule. ## RULE_CONTEXT RULE_CONTEXT is a macro you may use if you have defined a ©context stackª. In any reduction procedure, RULE_CONTEXT will be a pointer to the context value stacked before the first token of the rule being reduced. Since the context stack contains an entry for each token in the rule, you may inspect the context value for each token in the rule by subscripting RULE_CONTEXT. RULE_CONTEXT[k] is the context of the (k-1)th token in the rule. ## Rule Coverage "Rule Coverage" is the name of both a ©configuration switchª and a window. The configuration switch defaults to off. If you set it, AnaGram will include code in your ©parserª to count the number of times your parser identifies each ©grammar ruleª in your grammar. To maintain the counts, AnaGram declares, at the beginning of your parser, an integer array, whose name is created by appending "_nrc" to your ©parser nameª. The array contains one counter for each rule you have defined in your grammar. There are no entries for the auxiliary rules that AnaGram creates to deal with set overlaps or ©disregardª statements. In order to identify positively all the rules that the parser reduces, AnaGram has to turn off certain optimization features in your parser. 
Therefore a parser that has rule coverage enabled will run slightly slower than one with the switch off. In addition, AnaGram creates a pair of functions to write the counters to a file and to initialize the counters from a file. The names of these functions are given by appending "_write_counts" and "_read_counts" to the name of your parser. The name of the file is given by the ©coverage file nameª parameter which defaults to the name of your ©syntax fileª but with the extension ".nrc". If rule coverage is enabled, AnaGram will also enable the Rule Coverage option on the ©Browse Menuª. If you select Rule Coverage, AnaGram will initialize a ©Rule Coverageª window from the rule count file you select. AnaGram will warn you if the rule count file is older than the syntax file, since under those conditions, the coverage file might be invalid. ## Rule Derivation, Token Derivation You can use the Rule Derivation and Token Derivation windows to understand the nature of ©conflictªs in your grammar. To create these windows, open the ©Conflictsª window. Move the cursor bar to a ©completed ruleª, that is, one which has no marked token. Press the right mouse button to pop up the ©Auxiliary Windowsª menu. You may then select the Rule Derivation or the Token Derivation. The Rule Derivation window and the Token Derivation window, together, show how a ©conflictª, or ambiguity, has arisen in your grammar. Both windows contain a sequence of rules, and both begin with the same rule, the rule which is the root cause of the conflict. Each subsequent line in the rule derivation is an ©expansionª of the marked token in the previous rule. The last rule in the derivation window is the rule you selected in the Conflicts window. Thus the rule derivation window shows you how the rule involved in the conflict derives from the root. Each subsequent line in the token derivation window shows an expansion of the marked token in the previous rule. 
The first token of the last rule in the derivation window is the token that causes the conflict. This is the usage that is inconsistent with other usages of this token in the conflict state. The Rule Derivation and Token Derivation windows each have five auxiliary windows. The ©Rule Contextª window is keyed to the highlighted rule. The other four windows, the ©Expansion Rulesª window, the ©Productionsª window, the ©Set Elementsª window and the ©Token Usageª window are keyed to the marked token. Remember that there is no marked token on the last line of the Rule Derivation window. ## Rule Element A ©grammar ruleª is a list of "rule elements", separated by commas. Rule elements may be ©token nameªs, ©character setsª, ©keywordªs, ©immediate actionªs, or ©virtual productionsª. When AnaGram encounters a rule element for which no token presently exists, it creates one. Any rule element may be followed by a ©parameter assignmentª in order to make the ©semantic valueª of the rule element available to a ©reduction procedureª. ## Rule Number AnaGram assigns a unique rule number to each ©grammar ruleª that you specify in your grammar. Rules are numbered sequentially as they are encountered in the ©syntax fileª. AnaGram constructs rule 0 itself. Rule zero has a single ©rule elementª, the ©grammar tokenª, unless you have a ©disregardª statement in your grammar. In this case, there will be two elements. In AnaGram displays, rule numbers are displayed with a prefixed 'R' and a three digit decimal number. ## Rule Stack, Rule Stack Pane The Rule Stack pane appears across the bottom of a ©Grammar Traceª or ©File Traceª window. It provides an alternate view of the parser stack for the trace, showing, for each state, rules instead of the tokens that you see in the ©Parser Stack paneª. Because it is synched with the syntax file window, the Rule Stack makes it easy to see the relationship between the trace and your grammar. 
For each level of the parser stack, the Rule Stack shows the ©parser stateª number and all the active rules. The active rules at any state consist of all the ©expansion ruleªs for the state that are consistent with the input at all subsequent states. Except for the last level of the stack, each rule has a ©marked tokenª, which in the default configuration is displayed in bold, italic type. The significance of the marked token is that all tokens in the rule to the left of the marked token have already been matched in the input, and the input in subsequent levels is consistent so far with the marked token. As more input is processed, rules that are inconsistent with the new input are deleted from the display. The last level of the stack shows the current state of the parser and the rules against which the ©lookahead tokenª will be matched. At this level, there may be rules with no marked tokens. These are rules which have been matched exactly in the input. If there is more than one such rule, at the next parser step the parser will use the lookahead token to determine which rule to reduce. In the last level of the stack, marked tokens represent the input the parser expects to see. The Rule Stack pane is synched with the ©syntax fileª window if it is visible so that the rule highlighted in the Rule Stack can be seen in context in the syntax file. For rules that AnaGram generated automatically (to implement ©virtual productionsª or the ©disregardª statement), the cursor bar will move to the top of the syntax file window. The Rule Stack pane is also synched with the other panes in the trace. As you move the cursor bar in the Rule Stack, the cursor bar in the Parser Stack pane will track the stack level in the Rule Stack. In a File Trace, text will be highlighted in the ©Test Fileª pane corresponding to the selected token in the Parser Stack pane. In a Grammar Trace, the marked token in the highlighted rule will be highlighted in the ©Allowable Input paneª. 
Clicking the right mouse button pops up an ©Auxiliary Windowsª menu to give you more information about the highlighted rule. ## Rule Table The Rule Table lists, in numerical order, all the ©grammar ruleªs defined in your ©grammarª. Each rule is preceded by the ©nonterminalª tokens which produce it. If you are not using ©semantically determined productionªs, then there will be precisely one token line per rule. The Rule Table is synched to your ©syntax fileª to show the rule in context. ## Semantic Value, Token Value A ©tokenª generally has a "semantic value", or "token value", as well as the ©token numberª which identifies it syntactically. Each instance of the token in the input stream can have a different value. For example, you might have a token called "variable name". In one instance the variable name might be "widget" and in another, "wombat". Then "widget" and "wombat" would be the semantic values in the two instances. Another token might have numeric semantic values. You can specify the C or C++ ©data typeª of the token value. The data type of "variable name" could be "char *" where the value is a pointer to a string holding the name. There are separate default types for the values of ©terminalª and ©nonterminalª tokens. In the usual case of ordinary character input, the value of a terminal token is just the ascii character code. The value of a nonterminal token is determined by the ©reduction procedureªs attached to the rules the token produces. If there is no reduction procedure, the value of the token is the value of the first token in the rule. It should be noted that the stack operations have been implemented in such a way that a C++ object that belongs to a class for which the assignment operator has been overridden will encounter serious problems. This shortcoming will be addressed in a future version of AnaGram. Note that there is no problem with using a pointer to any C++ object. 
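To make the idea of semantic values concrete, here is a small C sketch of the arithmetic that reduction procedures for a decimal number might perform. It assumes ordinary character input, so the value of each digit token is its ascii character code. The function names are hypothetical, and the loop in `number_value` merely stands in for the sequence of reductions a parser would carry out.

```c
#include <assert.h>

/* Value of a one-digit number: convert the character code of the
   terminal token to the integer it denotes. */
static int digit_value(int ch) {
    assert(ch >= '0' && ch <= '9');
    return ch - '0';
}

/* Value of "number followed by digit": n is the semantic value of the
   number matched so far, ch is the character code of the new digit. */
static int append_digit(int n, int ch) {
    return 10 * n + digit_value(ch);
}

/* Simulates the successive reductions over a run of leading digits. */
static int number_value(const char *text) {
    int n = 0;
    for (; *text >= '0' && *text <= '9'; text++)
        n = append_digit(n, *text);
    return n;
}
```

Each call returns the value that would be stacked as the semantic value of the nonterminal token, so the same token can carry a different value at each occurrence in the input.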
## Semantically Determined Production A "semantically determined production" is one which has more than one ©reduction tokenª specified on the left side of the ©productionª. You would write such a production when the reduction tokens are syntactically indistinguishable. The ©reduction procedureª may then specify which of the listed reduction tokens the grammar rule is to reduce to based on semantic considerations. If there is no reduction procedure, or the reduction procedure does not specify a reduction token, the parser will use the first syntactically correct one in the list. To simplify changing the reduction token, AnaGram provides a predefined macro, ©CHANGE_REDUCTIONª. The ©semantic valueªs of all the reduction tokens for a given semantically determined production must have the same ©data typeª. ©File Traceª and ©Grammar Traceª have a ©Reduction Choices paneª which appears when a semantically determined production is invoked and you need to choose a reduction token. ## Set Elements The Set Elements window is available via the ©Auxiliary Windowsª popup menu from windows which specify character sets, partition sets or tokens. It displays the actual characters which make up the set, or which map to the specified token. For each character, the numeric code as well as its display symbol is given. ## Set Expression, Expression A set expression is an algebraic expression used to define a ©character setª in terms of individual characters, ranges of characters, or other sets of characters as constructed using ©complementsª, ©unionsª, ©intersectionsª, and ©differencesª. ## Shift Action The shift action is one of the four actions of a traditional ©parsing engineª. The shift action is performed when the input token matches one of the acceptable input tokens for the current ©parser stateª. 
The ©semantic valueª of the token and the current ©state numberª are stacked, the ©parser stack indexª is incremented and the state number is set to a value determined by the previous state and the input token. ## Shift-Reduce Conflict A "shift-reduce" ©conflictª occurs if in some ©parser stateª there exists a ©terminal tokenª that should be shifted, because it is legitimate input for one of the ©grammar ruleªs of the state, but should also be used to reduce some other rule because it is a ©reducing tokenª for that rule. ## sn sn is a field in a ©parser control blockª to which your ©error handlingª routines and your ©reduction procedureªs may refer. Its value is the current ©state numberª of your ©parserª. sn is modified every time your parser "shifts" (performs a ©shift actionª on) a token or reduces (performs a ©reduce actionª on) a ©productionª. ## ss ss is a field in a ©parser control blockª to which your ©error handlingª and ©reduction procedureªs may refer. It is the ©state stackª for your ©parserª. Before every ©shift actionª, the current ©state numberª, ©snª, is stored in PCB.ss[PCB.ssx], where ©ssxª is the ©parser stack indexª. PCB.ssx is then incremented. ## ssx ssx is a field in a ©parser control blockª to which your ©error handlingª routines and ©reduction procedureªs may refer. It is the ©parser stack indexª for your ©parserª. On every ©shift actionª it is incremented. On every ©reduce actionª the length of the ©grammar ruleª being reduced is subtracted from PCB.ssx. ## State Definition The State Definition window can be accessed via the ©Auxiliary Windowsª popup menu from any window that specifies states. It displays the ©characteristic rulesª that define the state. The rules are displayed with a marked token, which is the next token needed in the input if the particular ©grammar ruleª is to be matched. If the rule is a completed rule, no token will be marked. 
Each line contains the state number, blank if it is the same as the state number of the previous line, the ©rule numberª, and finally the ©marked ruleª. The ©State Definition Tableª, found in the ©Browse Menuª, displays the characteristic rules for all states in the ©grammarª. ## State Definition Table The State Definition Table lists, for each ©parser stateª, all of the ©characteristic rulesª which define that state. The rules are displayed with a ©marked tokenª, which is the next token needed in the input if the particular ©grammar ruleª is to be matched. If the rule is a completed rule, no token will be marked. Each line contains the state number, blank if it is the same as the state number of the previous line, the ©rule numberª, and finally the ©marked ruleª. In the ©Auxiliary Windowsª menu for many states there is a ©State Definitionª entry which provides the characteristic rules for the ©parser stateª identified by the cursor bar. ## State Expansion The State Expansion window may be accessed using the ©Auxiliary Windowsª menu from any window that identifies a particular ©parser stateª. It shows the complete set of ©expansion ruleªs for the state, consisting of the union of the set of ©characteristic ruleªs and, for each characteristic rule, the set of expansion rules for the marked token. Thus the State Expansion window shows all possible legal input to your parser in the given state. ## Sticky "Sticky" statements are ©attribute statementªs and may be used just like a ©precedence declarationª to resolve ©conflictªs. If a ©shift-reduce conflictª occurs in a state where the ©characteristic tokenª is "sticky", the shift action will always be chosen. Sticky statements must be made inside ©configuration sectionsª. Each statement consists of the keyword "sticky" followed by a list of ©tokensª. The tokens must be separated by commas and the list must be enclosed in braces ({ }). Each token will then be treated as sticky. 
All conflicts which are resolved by sticky statements are listed in the ©Resolved Conflictsª window. ## subgrammar Declaring a nonterminal token to be a "subgrammar" changes the way AnaGram searches for reducing tokens. Normally, if there is a completed rule in a particular state, AnaGram investigates all states to which the parser could jump on reducing the rule. It then considers all terminal tokens that are acceptable input in these states to be reducing tokens for the given rule. If this set of tokens overlaps the set of tokens for which there are shift actions, or the set of tokens which reduce a different rule, there is a ©conflictª. Now consider a particular nonterminal token T and all the rules it produces, whether directly or indirectly. What the preceding remarks mean is that in determining the reducing tokens for any of these rules, AnaGram considers not only the definition, but also the usage of T. There are circumstances when it is inappropriate to consider the usage of T. The most common example occurs when building a lexical scanner for a language such as C. In this case, you can write a complete grammar for a C token with no difficulty. But if you try to extend it to a sequence of tokens, you get scores of conflicts. This situation arises because you specify that any C token can follow another, when in actual practice, an identifier, for example, cannot follow another identifier without some intervening space or punctuation. While it is theoretically possible to write a grammar for a sequence of tokens that has no conflicts, it is not usually pretty. The subgrammar declaration resolves this problem by telling AnaGram that when it is looking for reducing tokens for any rule produced directly or indirectly by a subgrammar token, it should disregard the usage of the token and only consider usage internal to the definition of the subgrammar token, as though the subgrammar token were the start token of the grammar. 
The subgrammar declaration is made in a ©configuration sectionª and consists of the keyword "subgrammar" followed by a list of token names separated by commas and enclosed in braces ({ }). For example: subgrammar { name, number} ## Suspicious Production This ©warningª message appears when AnaGram finds a ©productionª of the form x -> x. There is probably a typo somewhere in your ©syntax fileª. This production causes a ©conflictª in your grammar. AnaGram leaves this production in your ©grammarª, but if you build a parser, it will never succeed in recognizing this production. ## Switch Takes on/off Values Only The specified parameter is a ©configuration switchª. The only values it may be assigned are ON and OFF. ## Symbol In writing your ©grammarª you use symbols, or names, to represent most of your ©tokensª. You may also use symbols to represent ©character setªs, ©virtual productionªs, ©immediate actionªs, or ©keywordªs. A symbol, or name, must begin with a letter or an underscore. It may then contain any number of these characters as well as digits and embedded white space (including comments). For identification purposes all adjacent white space characters within a symbol name are considered to be a single blank. Upper case and lower case letters are considered to be different. Examples: token name token/*embedded comment*/name All symbols used in your grammar are listed in the ©Symbol Tableª window found in the ©Browse Menuª. ## Symbol Table The Symbol Table lists all the symbols, or names, you used in your grammar. ©Symbolªs may be used, of course, to identify ©tokensª, ©definitionsª, ©virtual productionsª, ©immediate actionªs, or ©keywordªs. Each line in this table identifies a single symbol. The first field is the token number, if any. This is followed by the name. If the name identifies an ©expressionª or virtual production, it is followed by an equal sign and the expression or virtual production. 
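The rule given under Symbol, that adjacent white space within a name counts as a single blank, can be sketched in C as follows. This is only an illustration of the comparison rule, not AnaGram's own code: the function name is hypothetical, and it assumes that an embedded comment is treated as white space and collapses into the same single blank.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the normalized form of in to out (capacity cap, including the
   terminating NUL). Runs of white space and C-style comments between
   name characters become one blank; leading and trailing runs are
   dropped. Case is preserved, since upper and lower case differ.
   Returns 0 on success, -1 if out is too small. */
static int normalize_symbol(const char *in, char *out, size_t cap) {
    size_t n = 0;
    int pending_blank = 0;
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;
    while (*in) {
        if (in[0] == '/' && in[1] == '*') {
            /* Skip the comment; an unterminated one runs to the end. */
            const char *end = strstr(in + 2, "*/");
            in = end ? end + 2 : in + strlen(in);
            pending_blank = 1;
        } else if (isspace((unsigned char)*in)) {
            in++;
            pending_blank = 1;
        } else {
            if (pending_blank && n > 0) {
                if (n + 1 >= cap)
                    return -1;
                out[n++] = ' ';
            }
            if (n + 1 >= cap)
                return -1;
            out[n++] = *in++;
            pending_blank = 0;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Under these assumptions, two spellings name the same symbol exactly when their normalized forms compare equal with strcmp.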
## Syntax Analysis Aborted This ©warningª message appears if, because of previous errors, AnaGram is unable to complete the ©Analyze Grammarª command on your ©syntax fileª. ## Syntax Directed Parsing Syntax directed parsing, or formal parsing, is an approach to building ©parsersª based on formal language theory. Given a suitable description of a language, called a ©grammarª, there are algorithms which can be used to create parsers for the language automatically. In this context, the set of all possible inputs to a program may be considered to constitute a language, and the rules for formulating the input to the program constitute the grammar for the language. The parsers built from a grammar have the advantage that they can recognize any input that conforms to the rules, and can reject as erroneous any input that fails to conform. Since the program logic necessary to parse input is often extremely intricate, programs which use formal parsing are usually much more reliable than those built by hand. They are also much easier to maintain, since it is much easier to modify a grammar specification than it is to modify complex program logic. ## Syntax Error When you specify a ©grammarª, you specify a set of input character or token sequences which your ©parserª will "recognize". Usually it is possible for there to be other sequences of input tokens which deviate from the rules set down by your grammar. Should your parser find such a sequence in its input which is not explicitly allowed for in your grammar, it is said to have found a "syntax error". The general treatment of syntax errors is called ©error handlingª, of which there are two distinct aspects: ©error diagnosisª and ©error recoveryª. AnaGram allows you to make provision for error handling to fit your needs, but should you not do so, it will provide simple default error handling. 
## Statements AnaGram source files, or ©syntax fileªs, consist of the following types of statements: ©productionªs ©configuration sectionªs ©embedded Cª ©definitionªs ©token declarationªs Statements may be in any order. Each statement must begin on a new line. If a statement cannot be construed as complete, it may continue onto another line. Statements may contain spaces, tabs or comments, but may not contain blank lines. ## Syntax File Input files to AnaGram are called syntax files. The default extension for syntax files is .syn. A syntax file contains a "©grammarª" and supporting C or C++ code. The file consists of several distinct types of statements. These are ©token declarationsª, ©productionªs, ©definitionsª, ©embedded Cª, and ©configuration sectionsª. There may be as many of each as you need, in whatever order you find convenient. Each such statement begins on a new line. ## SYNTAX_ERROR SYNTAX_ERROR is a macro which your parser will invoke when it encounters a syntax error in its input stream. If you have set the ©diagnose errorsª ©configuration switchª, the static variable ©PCBª.©syntax_errorª will contain a pointer to a diagnostic message when SYNTAX_ERROR is invoked. If you have also set the ©error frameª switch, ©PCBª.©error_frame_ssxª and ©PCBª.©error_frame_tokenª will also be set appropriately. ## Tab Spacing "tab spacing" is a ©configuration parameterª which controls the expansion of tabs when AnaGram displays your source file or test files in the ©File Traceª window. The value of "tab spacing" is also used to set the default value of the ©TAB_SPACINGª macro in your parser. The default value of "tab spacing" is 8. If you prefer a different value, you should probably include an appropriate statement in your ©configuration fileª. For example: tab spacing = 2 ## TAB_SPACING If you have enabled the ©lines and columnsª switch, your parser needs to know tab spacing in order to increment the column count when it encounters a tab character. 
It is set up to use the value given by the TAB_SPACING macro. If you do not define TAB_SPACING in your parser, AnaGram will provide a default definition, setting it to the value of the ©tab spacingª ©configuration parameterª. ## Terminal, Terminal Token A "terminal token" is a token which does not appear on the left side of a ©productionª. It represents, therefore, a basic unit of input to your ©parserª. If the input to your parser consists of ascii characters, you may define terminal tokens explicitly as ascii characters or as sets of ascii characters. If you have a lexical scanner, or preprocessor, which produces numeric codes, you may define the terminal tokens directly in terms of these numeric codes. ## Test File Binary "Test file binary" is a ©configuration switchª which defaults to off. When it is off, and you select the ©File Traceª option, AnaGram will read your test files in "text" mode, discarding carriage return characters. When "test file binary" is on, AnaGram will read test files in "binary" mode, preserving carriage return characters. If your parser needs to recognize carriage return characters explicitly, you should turn "test file binary" on. ## Test File Mask "Test file mask" is a string-valued ©configuration parameterª which AnaGram uses to set up the file dialog for the ©File Traceª command. It defaults to "*.*". If there is a conventional file name format for the input to the ©parserª you are developing, you will probably want to set "test file mask" in a ©configuration sectionª in your ©syntax fileª so it is easier to pick out your test files. ## Test range "Test range" is a ©configuration switchª which defaults to on. When it is set, i.e., on, AnaGram will configure your parser so that it checks input characters to verify that they are within the range given by the ©character universeª before it indexes the ©token conversionª table. 
If range testing is not necessary for your parser, you may turn test range off and get a slight improvement in the performance of your parser. ## Thread Safe Parsers AnaGram 2.01 incorporates several changes designed to make it easier to write thread safe parsers. First, the ©parserªs generated by AnaGram 2.01 no longer use static or global variables to store temporary data. All nonconstant data have been moved to the ©parser control blockª. Second, two new features which make it substantially easier to build thread safe parsers have been added. The ©reentrant parserª switch makes the entire parser reentrant, by passing the pointer to the parser control block as an argument on all function calls. The ©extend pcbª statement allows you to add your own variable declarations to the ©parser control blockª so you can avoid references to global or static variables in your ©reduction procedureªs. Third, new support has been added for C++ classes, including the ©wrapperª statement and the ©PCB_TYPEª macro. ## token_number token_number is a field in a ©parser control blockª to which your ©error handlingª procedures and ©reduction procedureªs may refer. It contains the actual ©token numberª of the current input token. If you are supplying token numbers directly, it is the result of using the actual input character to index the ©token conversionª array, ag_tcv. ## Token Tokens are the units with which your parser works. There are two kinds of tokens: ©terminal tokensª and ©nonterminal tokensª. These latter are identified by the parser as sequences of tokens. The grouping of tokens into more complex tokens is governed by the ©grammar rulesª, or ©productionªs in your grammar. In your grammar, tokens are denoted by ©token nameªs, ©virtual productionsª, explicit ©character representationsª, ©keywordªs, ©immediate actionªs, or ©expressionªs which yield ©character setsª. 
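The table lookup mentioned under token_number and Test range can be sketched in C as follows. This is an illustration under stated assumptions, not the code AnaGram generates: demo_tcv stands in for the generated ag_tcv table, the token numbers and the 128-character universe are made up, and returning TOK_ERROR for an out-of-range character is a choice of this sketch, since the text above does not say what the generated parser does in that case. The character-class loops assume an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token numbers; a real parser gets its numbers from AnaGram. */
enum { TOK_ERROR = 0, TOK_LETTER = 1, TOK_DIGIT = 2, TOK_OTHER = 3 };

#define UNIVERSE_SIZE 128   /* assumed character universe: codes 0..127 */

/* Illustrative stand-in for the generated ag_tcv table. */
static unsigned char demo_tcv[UNIVERSE_SIZE];

/* Fill the stand-in table: letters and digits get their own token numbers. */
void init_demo_tcv(void) {
    int c;
    for (c = 0; c < UNIVERSE_SIZE; c++) demo_tcv[c] = TOK_OTHER;
    for (c = 'a'; c <= 'z'; c++) demo_tcv[c] = TOK_LETTER;
    for (c = 'A'; c <= 'Z'; c++) demo_tcv[c] = TOK_LETTER;
    for (c = '0'; c <= '9'; c++) demo_tcv[c] = TOK_DIGIT;
}

/* Map an input character code to a token number. When test_range is
   nonzero, codes outside the universe yield TOK_ERROR instead of indexing
   past the end of the table. When it is zero, the caller must guarantee
   that the code is in range. */
int convert_token(int input_code, int test_range) {
    if (test_range && (input_code < 0 || input_code >= UNIVERSE_SIZE))
        return TOK_ERROR;
    assert(input_code >= 0 && input_code < UNIVERSE_SIZE);
    return demo_tcv[input_code];
}
```

The extra comparison on every input character is the cost that turning test range off removes. Doing so is only safe when the input is known to stay inside the character universe.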
## Token Conversion By using ©character setª ©expressionªs, you may in your ©syntax fileª define a number of input characters as being syntactically equivalent. When your ©parserª gets an input character, it uses the character code to index a table called ©ag_tcvª. The value it extracts from this table is the ©token numberª for the input character. The actual character code of the input character becomes the ©token valueª. ## Token Declaration A token declaration is simply a ©productionª with no right hand side. Token declarations can be used to define the ©data typeªs of tokens. To define the data type of a token, simply put the data type in parentheses preceding the name of the token. You can use a list of tokens joined by commas, if you wish. Thus: (char *) variable name, function name could be used to specify that the ©semantic valueªs of the tokens "variable name" and "function name" are both character pointers. Of course, token types may be specified as part of any production the token generates, but sometimes, in the interest of clarity, it is advisable to group all declarations together. ## Token Name All ©nonterminal tokensª that you define in your ©grammarª by means of explicit ©productionªs must have names by which they may be referenced. Token names are ©symbolsª which represent the token syntactically in your grammar specification. ## Token Names "Token names" is a ©configuration switchª that defaults to off. If it is set, it causes AnaGram to include in the ©parser fileª a static array of character strings, indexed by token number, which provides ascii representations of token names. The name of this array is given by "<parser name>_token_names", where <parser name> is the name of the parser function as given by the value of the ©parser nameª parameter. AnaGram also defines a macro, ©TOKEN_NAMESª, which evaluates to the name of the array. 
The array contains strings for all grammar tokens which have been explicitly named in the syntax file as well as tokens which represent ©keywordªs or single character constants. The array is useful in creating ©syntax errorª diagnostics. Prior to version 2.01 of AnaGram, the TOKEN_NAMES array contained strings only for explicitly named tokens. If this restriction is required, set the ©token names onlyª switch. Token names are also included if the ©diagnose errorsª switch is set. ## TOKEN_NAMES "TOKEN_NAMES" is the name of a macro that AnaGram defines to provide access to a static array of character strings indexed by token number, which provides ascii representation of token names. The array is generated if any of the ©token namesª, ©token names onlyª or ©diagnose errorsª switches are ON. If ©token names onlyª is set, the array contains non-empty strings only for those tokens which are explicitly named in the syntax file. Otherwise, the array also contains strings for tokens which represent keywords or single character constants. ## token names only "Token names only" is a ©configuration switchª that defaults to off. If it is set, it will cause AnaGram to include in the parser file a static array containing the names of the tokens in your grammar. This array will include only those tokens to which you have assigned names explicitly and will not include character constants or keywords. "Token names only" takes precedence over ©token namesª. ## Token Not Used "Token not used, TXXX: <token name>" is a ©warningª message which appears if AnaGram finds an unused ©tokenª in your ©grammarª. Often an unused token is the result of an oversight of some kind and indicates a problem in the grammar. ## Token Number AnaGram assigns a unique number, called the "token number" to each token in the grammar, no matter whether it is a ©terminal tokenª or a ©nonterminal tokenª. Your parser does all of its analysis of your input stream using token numbers as its primary material. 
You may need to know the values of token numbers that AnaGram has assigned, either so a lexical scanner can output correct token numbers, or so a ©reduction procedureª can correctly resolve a ©semantically determined productionª. To help you, AnaGram defines enumeration constants for each of the named tokens in your grammar. The definition of these constants is in the ©parser headerª file. ## Token Representation Not all of the ©tokensª in your grammar have a ©token nameª. Some of the tokens may represent ©character setsª which you spelled out explicitly, ©virtual productionsª, ©immediate actionªs, or ©keywordªs. In its analysis tables, AnaGram tries to provide a meaningful representation for tokens whenever it can. Its first choice is to use the name, if it has one. Otherwise it will use the set definition or the definition of the virtual production if one exists. If AnaGram cannot otherwise represent your token, it will resort to using the token number which it normally represents using the letter T followed by a three digit, zero-padded token number. ## Token Table The Token Table lists all the tokens of your grammar. The first field is the token number. It is followed by a flag field which is "zl" if the token is a ©nonterminal tokenª and is ©zero lengthª. If the token is nonterminal and not zero length, the flag field contains "nt". If the token is a ©terminal tokenª, the field is blank. The next field is blank unless the token has been declared ©stickyª or has had a ©precedenceª level assigned. If the token is sticky, this field will contain 's'. If a precedence level has been assigned, this field will contain the letter 'l', 'r', or 'n' to indicate associativity followed by the precedence level. Finally there is the ©data typeª of the ©semantic valueª of this token and the ©token representationª. ## Token Usage The Token Usage table may be accessed via the ©Auxiliary Windowsª menu from any window that identifies tokens. 
It shows all the rules in the grammar that use the token. ## Top Margin "Top margin" is an ©obsolete configuration parameterª. ## Trace Coverage Trace Coverage is a table which is built whenever you run ©Grammar Traceª, one of its pre-built versions, or a ©File Traceª. You can access it from the ©Browse Menuª. It shows the number of times each rule in your grammar has been reduced. Unless you have set the ©Rule Coverageª ©configuration switchª, some ©null productionªs and some rules that consist of only one element will not be counted because of speed optimizations in the parser tables. The Trace Coverage tables are reset to zero when you load a new syntax file or start AnaGram. ## Compound Action Traditionally, ©LALR-1 parserªs use only four simple ©parser actionªs: shift, reduce, accept and error. AnaGram parsers use a number of compound actions in order to reduce the size of parse tables and speed up processing. A single compound action may replace several simple shift or reduce actions. The ©Traditional Engineª ©configuration switchª may be used to force AnaGram to use only the simple actions. ## Traditional Engine "Traditional engine" is a ©configuration switchª that defaults to off. Traditional ©LALR-1 parserªs use a ©parsing engineª which has only four actions: ©shift actionª ©reduce actionª ©accept actionª ©error actionª AnaGram, in the interest of faster execution and more compact parse tables, uses a parsing engine with a number of short-cut, or ©compound actionªs. The "traditional engine" switch tells AnaGram not to use the short-cut actions. You would turn this switch on if you wished to use the ©Grammar Traceª or ©File Traceª to see how the standard four parser actions work for a particular combination of grammar and input. Note that to see the effects of single parser actions, you must use the ©Single Stepª button. 
Remember that in the Grammar Trace, when you single step and the token you have selected causes a reduce action, it will appear on the ©lookahead lineª of the ©parser stack paneª and will be preselected in the ©allowable input paneª until it is finally shifted in to the parser stack. Normally, you should leave the "traditional engine" switch off. Then AnaGram will, whenever possible, compress several parsing actions into one compound action in order to speed execution of the parser. Unfortunately, use of the term "traditional" has sometimes created the impression that there is a conservative aspect to the operation of traditional engine parsers. This is not the case. They have the same effect, but are slower and have much larger tables. ## Type Redefinition "Type Redefinition of TXXX: <token name>" is a ©warningª message which appears when AnaGram finds a conflicting ©data typeª definition for a ©tokenª in your ©grammarª. The new definition will override the previous one. If you intend to use different type definitions, you should use extreme caution and check the generated code to verify that your ©reduction procedureªs are getting the values you intended. ## Undefined Symbol "Undefined symbol: <name>" is a ©warningª message which appears when AnaGram encounters an undefined ©symbolª while evaluating a ©character setª expression. The following warning in the ©Warningsª window identifies the particular ©tokenª AnaGram was trying to evaluate. ## Undefined Token "Undefined token TXXX: <name>" is a ©warningª message which appears when the indicated ©tokenª has been used in the ©grammarª, but there is no definition of it as a ©terminal tokenª nor does any ©productionª define it as a ©nonterminal tokenª. ## Unexpected "Unexpected <element 1> in <element 2>" is a ©warningª message which you may get when AnaGram analyzes your grammar. 
It appears when AnaGram unexpectedly encounters an instance of syntactic element 1 at the specified location in an instance of syntactic element 2. AnaGram cannot reliably continue parsing its input. Therefore, it limits further analysis to scanning for syntax errors. If this error is not the result of a prior error, you should correct your ©syntax fileª. Remember that this error could result from something missing just as well as from something extraneous. If element 1 is ©eofª, it often means that you have an unbalanced brace or comment delimiter in the code following the indicated location. ## Union The union of two sets is the set of all elements that are to be found in one or another of the two sets. In an AnaGram syntax file the union of two ©character setsª A and B is represented using the plus sign, as in A + B. The union operator has the same precedence as the ©differenceª operator: lower than that of ©intersectionª and ©complementª. The union operator is ©left associativeª. Watch out! In an AnaGram syntax file 65 + 97 represents the character set which consists of the lower case 'a' and upper case 'A'. It does not represent 162, the sum of 65 and 97. ## Video mode "Video mode" is an ©obsolete configuration parameterª. ## Virtual Production Virtual productions are a special short hand representation of ©grammar rulesª which can be used to indicate a choice of inputs. They are an important convenience, especially useful when you are first building a grammar. Here are some examples of virtual productions: name? // optional name name?... // 0 or more instances of name {name | number} // exactly one name or number {name | number}... // one or more instances of name or number [name | number] // optional choice of name or number [name | number]... // zero or more instances of name or number AnaGram rewrites virtual productions, so that when you look at the syntax tables in AnaGram, there will be actual ©productionªs replacing the virtual productions. 
A virtual production appears as one of the rule elements in a grammar rule, i.e. as one of the members of the list on the right side of a production. The simplest virtual production is the "optional" token. If x is an arbitrary token, x? can be used to indicate an optional x. Related virtual productions are x... and x?... where the three dots indicate repetition. x... represents an arbitrary number of occurrences of x, but at least one. x?... represents zero or more occurrences of x. The remaining virtual productions use curly or square brackets to enclose a sequence of rules. The brackets may be followed variously by nothing, a string of three dots, or a slash, to indicate the choices to be made from the rules. Note that rules may be used, not merely tokens. If r1 through rn are a set of ©grammar rulesª, then {r1 | r2 | ... | rn} is a virtual production that allows a choice of exactly one of the rules. Similarly, {r1 | r2 | ... | rn}... is a virtual production that allows a choice of one or more of the rules. And, finally, {r1 | r2 | ... | rn}/... is a virtual production that allows a choice of one or more of the rules subject to the side condition that rules must alternate, that is, that no rule can follow itself immediately without the interposition of some other rule. This is a case that is not particularly easy to write by hand, but is quite useful in a number of contexts. If the above virtual productions are written with [] instead of {}, they all become optional. [] is an optional choice, []... is zero or more choices, and []/... is zero or more alternating choices. Null productions are not permitted in virtual productions in those cases where they would cause an intrinsic ambiguity. You may use a ©definitionª statement to assign a name to a virtual production. 
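The side condition on the alternating forms {r1 | r2 | ... | rn}/... and [r1 | r2 | ... | rn]/... can be stated compactly in C. The sketch below is not AnaGram code and does no parsing: alternates is a hypothetical checker that takes the sequence of rule indices matched by successive repetitions and reports whether that sequence satisfies the condition described above. The optional flag is an assumption of the sketch that distinguishes the [] form, which also accepts an empty sequence, from the {} form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker, not AnaGram code. rules[] holds the index of the
   rule matched at each repetition. Returns 1 if the sequence is acceptable
   for an alternating virtual production: no rule immediately follows
   itself, and the sequence is nonempty unless optional is nonzero
   (the [ ... ]/... form). */
int alternates(const int *rules, size_t count, int optional) {
    size_t i;

    if (count == 0)
        return optional != 0;
    assert(rules != NULL);
    for (i = 1; i < count; i++) {
        if (rules[i] == rules[i - 1])
            return 0;          /* a rule followed itself directly */
    }
    return 1;
}
```

For example, the rule sequence 1, 2, 1, 3 is accepted, while 1, 2, 2 is rejected because rule 2 follows itself without some other rule in between.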
## Void token "Void token, <token name>, used as parameter" is a ©warningª message which appears if AnaGram encounters a ©data typeª definition declaring a ©tokenª to have type void when the token has previously been used in a ©parameter assignmentª for a ©reduction procedureª. Your C or C++ compiler will complain when it tries to compile the call to the reduction procedure. ## vs vs is a field in a ©parser control blockª to which your ©error handlingª procedures and ©reduction procedureªs may refer. It is the ©parser value stackª for your parser. The ©semantic valuesª of the ©tokensª identified by the parser are stored in the value stack. The value stack, like the other ©parser stacksª, is indexed by ©PCBª.©ssxª. When you are executing a reduction procedure, PCB.vs[PCB.ssx] contains the semantic value of the first token in the grammar rule you are reducing, PCB.vs[PCB.ssx+1] contains the second, and so forth. The return value from your reduction procedure will be stored in turn in PCB.vs[PCB.ssx]. vs is defined to be of type $_vt, where "$" represents the name of your parser. AnaGram defines $_vt to be a union of fields of sizes corresponding to all the different data types declared in your syntax for the semantic values of your tokens. In order to avoid restrictions on the use of C++ classes, the fields are defined as character arrays. On some processors which have byte alignment restrictions for multibyte data, you might encounter a bus error. To correct this problem, set the ©parser stack alignmentª parameter to an appropriate data type. ## Warning If while analyzing your syntax file, AnaGram finds something suspicious, it is likely to issue a warning. The Warnings window will pop up automatically when the analysis has been completed. If the warning is for a ©syntax errorª in your input file, you will have to fix it, because AnaGram cannot successfully interpret it. 
Otherwise, AnaGram will be able to create a ©parserª for you, if you wish, no matter how serious the warnings may be. You can bring up the Help topic associated with a highlighted warning by pressing F1 or by clicking with a ©Help Cursorª. If you have syntax errors, AnaGram will synchronize the cursor in the ©syntax fileª window with the cursor in the Warnings window so that whenever the Warnings window is active, the cursor bar in the syntax file window will identify the location of the error. ## What's New Changes in AnaGram 2.40 Most of the changes in AnaGram 2.40 are under the hood - cleanup of source files, reorganization of the source tree, revision of build and test procedures, and so forth, in preparation for the open source release. All of this will, with luck, be invisible to the end user. Open Source AnaGram is now ©open sourceª. AnaGram itself uses the 4-clause BSD ©licenseª; the ©parsing engineª, and thus the output files, are licensed with the less restrictive zlib ©licenseª. Source distributions are available from http://www.parsifalsoft.com. The manual has been re-typeset using LaTeX instead of WordPerfect. The typographic consistency and formatting have been considerably improved; unfortunately, the pagination is now completely different, so page numbers are not portable to the new version. All the logic dealing with registration, trial copies, serial numbers, and so forth has been removed. Unix Support The Unix build of the ©command line versionª of AnaGram (agcl) is now supported and available to the public. There is at present no GUI for the Unix version. The long-term goal is to migrate the AnaGram GUI away from the closed (and orphaned) IBM Visual Age class library to something else, probably GTK, so as to support both Windows and Unix. Improved Functionality Examples. The examples have been adjusted to the current dialect of C++ and are now compilable again. The legacy "classlib" code some still depend on is being phased out. 
Increased Convenience File names. File names in the AnaGram distribution and source tree are no longer limited to 8+3 characters, and quite a few now have less cryptic names. Additionally, all HTML files are now named ".html", not ".htm". Installed files. The AnaGram.cgb and AnaGram.hlp files found in older releases of AnaGram no longer exist; their contents are compiled into the AnaGram executables instead. Bug Fixes Engine compiler error. The ©error_messageª field of the PCB has been changed to const char * so current C++ compilers will accept the code generated when ©diagnose errorsª is turned off. Multiple output header files. Including more than one AnaGram output header file at once used to cause some compilers to issue a warning, because an #ifndef directive was checking the wrong symbol. This has been corrected. Wrappers and error tokens. AnaGram 2.01 generated uncompilable code if you tried to use the ©wrapperª feature and error token resynchronization at the same time. This has been corrected. More than 256 keywords. Build 8 of AnaGram 2.01 fixed certain problems with large keyword tables, but in the process introduced another, which is now fixed. For changes in the previous versions of AnaGram, see ©What's New in AnaGram 2.01ª and ©What's New in AnaGram 2.0ª. ## What's New in AnaGram 2.01 Changes in AnaGram 2.01 Improved Functionality Improved support for building ©thread safe parsersª. All nonconstant parser data previously declared as static variables has been moved to the ©parser control blockª. When the ©reentrant parserª switch is set, all references to the parser control block are passed to functions via calling sequences. The ©extend pcbª switch provides a mechanism to add user-defined variables to the parser control block. Improved support for C++ parsers. The ©wrapperª statement provides C++ wrapper classes for objects to be stored on the ©parser value stackª. 
The ©PCB_TYPEª macro allows you to derive a C++ class from the parser control block and to access its members from your ©reduction proceduresª. Support for the ©ISO Latin 1ª character set. When using the ©case sensitiveª switch, case conversion is performed for all ISO-Latin-1 characters, not just those in the ASCII range. Improved support for error diagnostics. It is now possible for users to provide their own text for the error messages created by the ©diagnose errorsª switch. In addition, the ©token namesª table option now includes ascii representation of individual characters and keywords instead of only named tokens. The ©token names onlyª switch can be used for compatibility with previous versions of AnaGram. More precise determination of error context. The tables used by the ©error frameª option to provide the context of a syntax error have been reworked and now provide a substantially more precise localization of the error. Improved error diagnostics in AnaGram ©Missing reduction procedureª diagnostic. In addition to warning that there is a ©parameter assignmentª without a ©reduction procedureª, this diagnostic is now provided if the ©default reduction valueª does not have the same ©data typeª as the ©reduction tokenª. ©Command line versionª. Diagnostics have been reformatted so they can be recognized by the Microsoft Visual C++ IDE. Refined ©keyword anomalyª diagnostics. There should now be fewer false alarms. Increased Convenience ©File Traceª. If your grammar uses ©semantically determined productionsª, the File Trace feature will now remember the choices you have made for ©reduction tokenªs, so that you do not have to make the same choices over and over again as you work with an example. File Paths. The file paths in the #line directives created by the ©line numbersª switch now use forward slashes instead of backslashes. Changed Defaults ©Parser stack alignmentª. Now defaults to long instead of int. ©Parser stack sizeª. Now defaults to 128 instead of 32. 
Bug Fixes Interaction between context tracking and error token. In previous versions of AnaGram, if the first token in a rule was the ©error tokenª, the value of ©CONTEXTª was the value that corresponded to the location of the error. CONTEXT now correctly shows the context at which the aborted rule began. For instance, in the following example, if a syntax error is encountered while parsing the expression, the error rule will skip over remaining characters to the terminating semicolon. When invoked from handleError(), the CONTEXT macro will return the context as it was at the beginning of the expression. expression statement -> expression, ';' -> error, ~(eof + ';')?..., ';' =handleError(); ©Distinguish lexemesª. Several minor bugs in the implementation of distinguish lexemes have been corrected. Set partition logic. Corrected problems in the interaction between the set ©partitionª logic and the implementation of the ©disregardª statement. Table size. Fixed a data sizing problem which occurred when one particular parse table had precisely 256 entries. Keyword recognition. Fixed a problem that could cause difficulties with ©keywordª recognition when the ©case sensitiveª switch was turned off. Default conflict resolution. With unresolved ©shift-reduce conflictªs, the shift case was not always being selected. This problem has been corrected. Lockup. It was possible to write an erroneous grammar that would cause AnaGram to lock up. This problem has been corrected. Potential bus error. The error diagnostic function created by the ©diagnose errorsª switch could, under some circumstances, access an uninitialized value on the ©parser value stackª. This problem has been corrected. Internal errors. Fixed a number of minor bugs which could cause ©internal errorªs while running ©File Traceª. For changes in the previous version of AnaGram, see ©What's New in AnaGram 2.0ª. 
## What's New in AnaGram 2.0 AnaGram's user interface has been completely revamped to make it more convenient and easier to use. However, the same tried and true AnaGram algorithms are still in place to build your parsers. The rules for syntax files are also unchanged. The ©File Traceª and ©Grammar Traceª facilities have each had their windows combined into a single unit, and a ©Rule Stackª synched with these windows and with your syntax file window has been added. The Rule Stack is particularly convenient for relating the progress of the parse to the ©grammar rulesª in your ©syntax fileª. A ©text entryª field has also been added to the Grammar Trace. This means you can provide character input to your parser in much the same way you can with a ©test fileª in File Trace, but with instant control over the input. Some further controls have been added to both File and Grammar Traces. In particular there is a Reset button to reset the trace to its initial state. This is particularly useful for ©Conflict Traceªs. AnaGram now has a small ©Control Panelª (default position is at the upper right of the screen) from which you can conveniently control operation. A menu bar provides access to the various commands and tables. There are toolbar buttons for Analyze Grammar, Build Parser, File Trace, and so on. The panel also has a data entry field for entering search keys. You can set both colors and fonts in AnaGram windows to suit your own preferences. We suggest you check Help for ©Colorsª or ©Fontsª before making changes to make sure that all information will still be properly displayed. AnaGram's ©Helpª has been updated to provide hypertext-type links. But you can still keep multiple Help windows on view at once. A popup menu shows all the links in a window. New topics have been added. Also, further documentation topics are provided in HTML format in the html subdirectory. 
A ©Help Cursorª on the Control Panel toolbar can be used to get help for most AnaGram windows, buttons and menu items. F1 can also be used. On the ©Action Menuª you will find a list of your most recently used syntax files. Just click on the file of your choice to have AnaGram analyze it (or build it if ©Autobuildª is on). ## White Space In many grammars it is desirable to pass over blanks, tabs, and similar characters, as well as comments, collectively termed "white space", as though they were not there. The "©disregardª" statement in AnaGram may be optionally used to accomplish this. The "©lexemeª" statement may be used to exercise fine control over the scope of the disregard statement. ## Wrapper The wrapper ©attribute statementª provides correct handling of C++ objects returned by ©reduction procedureªs. If you specify a wrapper for a C++ object, then, when a reduction procedure returns an instance of the object, a copy of the object will be constructed on the ©parser value stackª and the destructor will be called when the object is removed from the stack. Without a wrapper, objects are stored on the value stack simply by coercing the stack pointer to the appropriate type. There is no constructor call when the object is stored nor a destructor call when it is removed from the stack. Classes which use reference counts or otherwise overload the assignment operator should always have wrappers in order to function correctly. Wrapper statements, like other ©attribute statementsª, must appear in configuration sections. The syntax is simply wrapper { <comma delimited list of data types> } For example: [ wrapper {CString, CFont} ] You cannot specify a wrapper for the ©default token typeª. If your parser exits with an error condition, there may be objects remaining on the stack. The ©DELETE_WRAPPERSª macro may be used to delete these objects. If you have enabled ©auto resynchª, DELETE_WRAPPERS will be invoked automatically. 
The ©AG_PLACEMENT_DELETE_REQUIREDª macro is used to control definition of a "placement delete" operator in the wrapper class AnaGram defines. ## Zero Length A zero length ©tokenª is a ©reduction tokenª which can be matched by a void, i.e. by nothing at all. It represents an optional item, or a sequence of optional items, in the input. Since the matching process can involve several levels of reductions, it is most precise to use the following recursive definition: A zero length token is one which either has at least one ©null productionª or has at least one grammar rule defining it such that all the tokens in the rule are zero length tokens. Care should be taken when using ©zero lengthª tokens in ©recursive ruleªs. If all the tokens in the rule other than the recursive token itself are zero length tokens the rule will generate an infinite loop in the generated parser. The ©Token Tableª identifies zero length tokens because the use of such tokens sometimes inadvertently causes ©conflictªs. ## Control Panel The AnaGram Control Panel appears at the upper right of your monitor when you start AnaGram. It has a menu bar, command buttons, a button which enables a ©help cursorª, and a ©status indicatorª. At the lower left you will see a data entry field for entering ©searchª keys, with neighboring search forward and search backward buttons. Notice that the ©Options Menuª has a "Stay On Top" entry which allows you to specify whether the Control Panel stays on top of other AnaGram windows. ## Status Indicator The status indicator at the right of the AnaGram Control Panel shows the status of the ©current grammarª: Ready Loaded Error Parsed Analyzed Built "Ready" appears only when no grammar has been selected. "Loaded" and "Parsed" are normally transitory. "Error" means at least one syntax error has been detected in your grammar and AnaGram cannot continue. Check the Warnings window to determine the nature of the problem. 
"Analyzed" means that a ©grammar analysisª has been completed, but no ©output filesª have been written. "Built" means that an analysis has been completed and output files have been written. ## Help Cursor The Help Cursor is accessed via the button with the question mark on AnaGram's ©Control Panelª. It is convenient for getting help on ©Warningªs, browse tables, menu items and so on. If you click on the button you enable the Help Cursor, which you can then drag with the mouse. A further mouse click will provide help for the item underneath the cursor. Note further that AnaGram also has F1 help which you may find simpler and faster than the Help Cursor. ## Search AnaGram has a simple search facility to let you search for text strings in AnaGram windows. A data entry field on the ©Control Panelª is provided for you to enter text. Left-clicking on the neighboring buttons lets you search either forward or backward for a line in the active window which contains at least one instance of the text. Note that the search begins at the next line after the highlighted line for forward search; at the line preceding the highlighted line for backward search. ## Search Key To find a text string in an AnaGram window, enter the string in the Search Key field in the ©Control Panelª and press Enter. To find another instance of the string click on the ©Find Nextª button or press F3. To find a previous instance of the string click on the ©Find Previousª button or press F4. In windows that have a cursor bar, a forward search begins on the line following the cursor and a backward search begins on the line preceding the cursor. ## Find Next The Find Next key, on the ©Control Panelª immediately to the right of the ©Search Keyª field, locates the next instance of the search key in the most recently active AnaGram window. F3 is the keyboard equivalent. 
## Find Previous The Find Previous key, on the ©Control Panelª immediately to the right of the ©Find Nextª key, searches backwards for the search key in the most recently active AnaGram window. F4 is the keyboard equivalent. ## Fonts, Set Fonts The Set Fonts dialog allows you to use the fonts of your choice in AnaGram windows. You should make sure that the font for ©marked tokenªs is very distinctive so that marked tokens will show up clearly even if they are only 1 or 2 characters long. Sometimes it is helpful to use an underlined font for marked tokens. A Default button at the bottom of the dialog lets you revert to AnaGram's original fonts if you wish. ## Colors, Set Colors The Set Colors dialog allows you to change the colors of AnaGram windows. Notice that in the ©File Traceª the ©test file paneª requires three different sets of text and background colors. You should make sure that the backgrounds, at least, can be easily distinguished from each other so the trace information can be properly displayed. You should also take care that an active pane in a File Trace or Grammar Trace can be distinguished from inactive panes. The Default button at the bottom of the dialog lets you revert to AnaGram's original colors if you wish. Color changes pertain only to the client areas of AnaGram windows. The remaining parts of your windows will have the customary colors you have chosen for your system. ## Marked Token Some tables and trace panes display each rule with one token marked to show how far parsing has progressed in the rule. The marked token is the next input expected in the input stream. It is shown in a different font to distinguish it from other tokens in the rule. If no token is marked, the rule is a ©completed ruleª, i.e. it has been completely matched and will be reduced by the next input. You can set the font for marked tokens by choosing Fonts from the ©Options Menuª. 
You should make sure that the font is very distinctive so that marked tokens will show up clearly even if they are only 1 or 2 characters long. Sometimes it is helpful to use an underlined font for marked tokens. ## Synch Parse The Synch Parse button replaces the ©Single Stepª button on the toolbar of the ©File Trace windowª when, for some reason, the location of the blinking cursor in the ©test file paneª differs from the current parse position. This can occur when you single click in the test file pane or when the parse cannot track the cursor because of a ©syntax errorª or a ©semantically determined productionª. Click the synch parse button to resynch the parse with the cursor. ## Single Step The Single Step button is one of the control buttons for the ©File Traceª and ©Grammar Traceª. It advances the parse one ©parser actionª at a time. In the File Trace, it is replaced with the "©Synch Parseª" button whenever the blinking cursor loses synch with the current parse location. In the Grammar Trace, the Single Step button takes its input from the Allowable Input pane, the Reduction Choices pane, or the ©text entryª field, depending on which is active. ## Proceed The Proceed button is one of the control buttons for the ©Grammar Traceª. If the ©Reduction Choices paneª or the ©Allowable Input paneª is active, Proceed parses the highlighted token until it is shifted in to the ©parser stackª. If the ©text entryª field is active, Proceed parses all text in the field. If a ©syntax errorª is encountered, the parse stops and all ©reduce actionªs are undone. Note that selecting a token in Allowable Input can cause a syntax error under certain circumstances. This can happen only if the following conditions are all true: the indicated operation is a ©reductionª, the reduction token for the rule being reduced has been used in several different contexts in the grammar and the specified token may follow it in some contexts and not in others. 
## Reduction Choices Pane The ©File Traceª and ©Grammar Traceª display a Reduction Choices pane when they need to reduce a ©semantically determined productionª. The rule to be reduced is highlighted in the ©rule stack paneª. If the ©syntax fileª window is visible, it shows the rule in context in your grammar. The Reduction Choices pane lists all possible ©reduction tokenªs for the specified rule. The first reduction token that is admissible in the current context is highlighted and it appears as the ©lookahead tokenª in the ©parser stack paneª. The text that comprises the entire rule is highlighted in the ©test file paneª. Select the desired reduction token before continuing with the parse. If you select a token and it does not appear as the lookahead token, it is not syntactically correct in the current context. If you try to proceed with the parse, you will get a ©selection errorª. ## Selection Error The ©Parse Statusª field indicates a "selection error" if you choose a ©reduction tokenª from the ©Reduction Choices paneª of a ©File Traceª or ©Grammar Traceª and the selected token is not syntactically correct in the current context. ## Parser Stack Pane The Parser Stack pane, the upper left pane of the ©File Traceª and ©Grammar Traceª windows, displays the ©parser stackª for the current trace. Each line corresponds to one level in the parser state stack. It shows the stack index, the ©parser stateª for that level, and the ©tokenª which was seen at that state. The last line of the stack, the ©lookahead lineª, corresponds to the current state of the parser. Since no input has yet been processed for this state, the token, if any, which appears at this level is a ©lookahead tokenª. If you move the cursor in the Parser Stack pane of a File Trace, the text that makes up the selected token will be highlighted in the ©Test File paneª. You can back the parse up to any desired stack level by double clicking at the beginning of the token text in the Test File pane. 
Similarly, if you move the cursor bar in the Parser Stack pane of a Grammar Trace, the ©Allowable Input paneª will change to display the allowable tokens in the selected state. The previously selected token will be highlighted. Then, double click on any token in the Allowable Input pane to back the parse up and choose a token a second time. The ©Rule Stack paneª of the File or Grammar Trace is also synched to the Parser Stack pane. If the ©syntax fileª window is visible, it will be synched to show the rule currently selected in the rule stack pane. Note that rules that have been automatically generated by the expansion of ©virtual productionsª cannot be synched, so the top line of the syntax file will be highlighted instead. In the Grammar Trace, the last line of the Parser Stack may or may not display a ©lookahead tokenª, depending on the last ©parser actionª performed. If input was taken from Allowable Input and the last action was a simple ©reduce actionª, the last input token selected will be displayed as the lookahead input. But if the last action performed shifted the token in, the lookahead field will be empty. If you right-click on a highlighted line in the Parser Stack pane, you will get a pop-up menu to give you more information. In particular you can get an ©Auxiliary Traceª starting at the current point in your File or Grammar Trace, so you can explore various possibilities without losing your position in the old trace. ## Exit Select this entry from the ©Action Menuª to terminate AnaGram. ## Allowable Input, Allowable Input Pane The upper right pane of the ©Grammar Traceª window lists the allowable input tokens for the current state of the ©grammarª. The tokens in the Allowable Input pane are listed in two groups: first, the ©terminal tokensª allowable in this state, and second, the ©nonterminal tokensª. 
Between these two groups of tokens is inserted a line which is either an option for a ©default reductionª, or declares that there is no default action. Double click, press Enter, or click the ©Proceedª button to parse the highlighted token. When all parse actions triggered by the highlighted token have been completed, all panes of the trace will be redrawn to show the new state of the parser. Note that selecting a token in Allowable Input can cause a syntax error under certain circumstances. This can happen only if the following conditions are all true: the indicated operation is a ©reductionª, the reduction token for the rule being reduced has been used in several different contexts in the grammar and the specified token may follow it in some contexts and not in others. If you wish to see the results of a single parser action, click on the ©single stepª button. The parser will perform a single parser action. If the token you selected was not shifted in, it will now be displayed as the ©lookahead tokenª on the last line, the ©lookahead lineª in the ©Parser Stack paneª, and will be preselected in the Allowable Input pane. Because AnaGram, by default, uses a number of compound parser actions, this situation does not arise very often unless you have set the ©traditional engineª switch or reset the ©default reductionsª switch. Usually you will want to select the same token to proceed, but it is not necessary. The Allowable Input pane also displays the ©parser actionª associated with a specific token. If it is not a ©compound actionª, the action and its result are also shown. The ©parser actionª field for a token may be interpreted as follows: If this token would cause a shift to a new state, the action field is ">>" followed by the new state number. If the token would cause a ©reductionª, the action field is "<<" followed by a ©rule numberª to show the rule reduced. If the parser action is a compound action, the action field is blank. 
If the token would cause the grammar to be accepted, the action field is "Accept". The ©text entryª field at the bottom of the Grammar Trace can be used as a convenient alternative to the Allowable Input pane. It accepts characters rather than tokens. Most non-printing characters such as newline are only available from Allowable Input. ## Copy The Copy command on the ©Windows Menuª copies the currently active table or Help topic to the clipboard. ## Statistical Summary While your grammar is being analyzed, a Statistical Summary window pops up to show you the progress of the analysis. Unless you have turned off ©Show Statisticsª on the ©Options Menuª, this window will remain on-screen for your reference. Among other things, it shows you the number of rules and states in your grammar, and the number of conflicts and warnings, if any. Note that if your grammar is small and you have Show Statistics turned off, the appearance of this window on your monitor may be exceedingly brief - you may just see a flash. If the window is turned off or you have closed it, you can get it from the ©Browse Menuª. ## Stay On Top The Stay On Top entry in the ©Options Menuª allows you to specify whether the ©Control Panelª stays on top of other AnaGram windows. ## Show Syntax If this entry in the ©Options Menuª is checked, AnaGram will display the ©syntax fileª when it has analyzed your ©grammarª. If this entry is not checked or you have closed the syntax file window, you can select the window from the ©Browse Menuª. ## Show Statistics If this entry in the ©Options Menuª is checked, AnaGram will leave the ©Statistical Summaryª on the screen after it has analyzed your ©grammarª. If this entry is not checked or you have closed the Statistical Summary window, you can select the window from the ©Browse Menuª. ## About AnaGram Select this entry from the ©Help Menuª to find out the version and serial numbers of your copy of AnaGram, and how to contact Parsifal Software. 
## Help Topics Select Help Topics from the ©Help Menuª to get a complete list of AnaGram Help topic titles. You can bring up the window for a highlighted topic by double-clicking with the left mouse button, pressing F1, or using the ©Help Cursorª. ## Cascade Windows Select this entry from the ©Windows Menuª to cascade your open windows starting at the top left of the screen. ## Close Windows Select this entry from the ©Windows Menuª to close all open windows except the ©Control Panelª. You may also close the active window by pressing the Escape key. ## Hide Windows Select this entry from the ©Windows Menuª to hide all open windows except the ©Control Panelª. Restore them to the screen with ©Restore Windowsª. ## Restore Windows Use this command on the ©Windows Menuª to restore to the screen any windows you have previously hidden with ©Hide Windowsª. ## Token Input, Preprocessor, Lexical Scanner AnaGram makes it unnecessary, in most cases, to have a separate preprocessor to provide the ©tokensª which are fed to your parser. However, in some cases you may want to use a preprocessor, or lexical scanner, to provide input to your parser. The preprocessor may or may not be written in AnaGram. If it sends the parser token numbers rather than character codes, this is referred to as token input, as opposed to character input. Please refer to the AnaGram User's Guide for information on identifying the tokens to the parser and providing their semantic values, if any. Since a ©File Traceª is based on character codes, it will be greyed out on the ©Action Menuª if you have token input. For a ©Grammar Traceª, entering characters in the ©text entryª field is not appropriate and will simply cause a syntax error. ## Lookahead Line The last line of the ©Parser Stack paneª, the "lookahead" line, will sometimes show a ©lookahead tokenª, and sometimes not. In a ©File Traceª, you will always see a lookahead token because it is available from the ©test fileª. 
In a ©Grammar Traceª you will usually see a lookahead token only when you have used the ©Single Stepª button or if there is available input in the ©text entryª field. In the latter case the token corresponding to the first character of the input will appear on the lookahead line. If you click Single Step after selecting a token from ©Allowable Inputª and it causes only a simple ©reduce actionª (as opposed to a shift or a compound action), then, upon completion of the reduction, the token you selected will appear on the lookahead line and also will be preselected in Allowable Input. Usually you would select this token for the next parse step. However, if there are other possible inputs in this state, the parse theoretically could have arrived at this state by a different sequence of input tokens. Thus, if you are more interested in the behavior of the parser at this state than in the response of the parser to a particular sequence of inputs, it is perfectly valid to select a different input token, and AnaGram will let you do it. Note that if you have enabled the ©traditional engineª switch or disabled the ©default reductionsª switch, the probability of finding a token which does a simple reduction is noticeably higher than otherwise. ## Action Menu The Action menu begins with the ©Analyze Grammarª and ©Build Parserª commands. If a grammar has already been analyzed, but not yet built, there will also be an extra Build command bearing the name of your syntax file. There are also ©Reanalyzeª and ©Rebuildª commands which are initially greyed out. They become available if you change the current syntax file. The next section has ©File Traceª and ©Grammar Traceª commands. If you have enabled the ©Error Traceª ©configuration switchª, this section also shows an Error Trace command. The menu ends with an ©Exitª command and a list of recently used syntax files, if any. Just click on a syntax file name to have AnaGram analyze it, or build it if the ©Autobuildª option is on. 
## Browse Menu Initially, the Browse Menu shows only a single entry: ©Configuration Parametersª which lets you see the current state of configuration parameters before any may have been set by your syntax file. Once you have analyzed a grammar, this menu fills up with many tables containing information about your grammar. You can also bring up a window showing your ©syntax fileª from this menu. If your grammar has generated ©syntax errorªs or warnings, or contains conflicts, there will be ©Warningªs or ©Conflictªs entries. ## Options Menu From this menu you can select a ©Fontsª or ©Colorsª dialog so you can set AnaGram's fonts and colors to suit your own tastes. You can set ©Autobuildª if you want AnaGram to automatically build your ©grammarª when you select a ©syntax fileª from the ©Action Menuª. You can also choose whether or not to automatically show the ©Statistical Summaryª window or your syntax file window when you open a grammar, or make the ©Control Panelª stay on top of other AnaGram windows. ## Windows Menu The Windows menu lets you cascade, close, or hide all AnaGram windows except the ©Control Panelª, or restore them if they have been hidden. It also has a list of open windows (even if hidden) so you can select the one you want. The Copy command will copy most windows to the clipboard. ## Help Menu The Help Menu has the following entries: ©Getting Startedª provides a brief description of AnaGram and introductory suggestions. ©Help Topicsª brings up a list of all help topics. ©Using Helpª tells you how to use AnaGram's help facilities. ©What's Newª has information on new features of this version of AnaGram. ©About AnaGramª tells you what version of AnaGram you are using, and also provides contact information for Parsifal Software. ## Autobuild When Autobuild (©Options Menuª) is checked, selecting a file from the list of most recently used files on the ©Action Menuª invokes the ©Build Parserª command. Otherwise, the ©Analyze Grammarª command is invoked. 
## Reanalyze, Rebuild The Reanalyze and Rebuild commands on the ©Action Menuª are initially greyed out. Reanalyze becomes available if you have a syntax file currently analyzed or built in AnaGram and change it while AnaGram is still running. Rebuild becomes available if you have a syntax file currently built and change it while AnaGram is still running. ## Percent Sign The percent sign ( % ) is used to mark certain tokens in your grammar which AnaGram must redefine in order to implement the ©disregardª statement. If you have used this statement in your grammar, you will probably notice the percent sign appearing in some windows and traces. The percent sign indicates the original token, without the optional white space attached. Early versions of AnaGram used the degree sign instead, but this character is not generally available in Windows. ## Program Development The first step in writing a program is to write a ©grammarª in AnaGram notation which describes the input the program expects. The file containing the grammar, called the ©syntax fileª, should have the extension ".syn". You could also make up a few sample input files at this time, but it is not necessary to write ©reduction procedureªs at this stage. Run AnaGram and use the ©Analyze Grammarª command to create parse tables. If there are ©syntax errorsª in the grammar at this point, you will have to correct them before proceeding, but you do not necessarily have to eliminate ©conflictsª, if there are any, at this time. There are, however, many aids available to help you with conflicts. These aids are described in the AnaGram User's Guide, and somewhat more briefly in the Online Help topics. Once syntax errors are corrected, you can try out your grammar on the sample input files using the ©File Traceª facility. With File Trace, you can see interactively just how your grammar operates on your test files. You can also use ©Grammar Traceª to answer "what if" questions concerning input to the grammar. 
The Grammar Trace does not use a test file, but rather allows you to make input choices interactively. At any time, you can write ©reduction procedureªs to process your input data as its components are identified in the input stream. Each procedure is associated with a ©grammar ruleª. The reduction procedures will be incorporated into your parser when you create it with the ©Build Parserª command. By default, unless you specify an input procedure, ©parser inputª will be read from stdin, using the default ©GET_INPUTª macro. You will probably wish to redefine GET_INPUT, or configure your parser to use ©pointer inputª or ©event drivenª input. ## License, Copyright, Copying, Open Source, Warranty, No Warranty AnaGram, A System for Syntax Directed Programming Copyright 1993-2002 Parsifal Software Copyright 2006, 2007 David A. Holland All Rights Reserved. AnaGram itself is released to the public under the traditional 4-clause BSD license: Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by Parsifal Software, Jerome T. Holland, and their contributors. 4. Neither the name of Parsifal Software nor the name of Jerome T. Holland nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY PARSIFAL SOFTWARE, JEROME T. 
HOLLAND, AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL PARSIFAL SOFTWARE, JEROME T. HOLLAND, OR THE CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. The AnaGram ©parsing engineª, that is, the code that is emitted by AnaGram and incorporated into programs developed using AnaGram, uses this less restrictive zlib-style license: This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. ##