Mercurial > ~dholland > hg > ag > index.cgi
annotate doc/misc/html/isdp.html @ 24:a4899cdfc2d6 default tip
Obfuscate the regexps to strip off the IBM compiler's copyright banners.
I don't want bots scanning github to think they're real copyright
notices because that could cause real problems.
author | David A. Holland |
---|---|
date | Mon, 13 Jun 2022 00:40:23 -0400 |
parents | 13d2b8934445 |
children |
rev | line source |
---|---|
0
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
2 <HTML> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
3 <HEAD> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
4 <TITLE>Introduction to Syntax Directed Parsing</TITLE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
5 </HEAD> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
6 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
7 <BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
8 TEXT="#000000" LINK="#0033CC" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
9 VLINK="#CC0033" ALINK="#CC0099"> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
10 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
11 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
12 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
13 <IMG ALIGN="right" SRC="images/agrsl6c.gif" ALT="AnaGram" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
14 WIDTH=124 HEIGHT=30> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
15 <BR CLEAR="all"> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
16 Back to <A HREF="index.html">Index</A> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
17 </P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
18 <IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
19 WIDTH=1010 HEIGHT=2 > |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
20 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
21 <BR CLEAR="all"> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
22 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
23 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
24 <H1 ALIGN="LEFT">Introduction to Syntax Directed Parsing</H1> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
25 <IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
26 WIDTH=1010 HEIGHT=2 > |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
27 </P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
28 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
29 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
30 (Adapted from Chapter 4, AnaGram User's Guide)</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
31 </P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
32 <UL> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
33 <LI SRC="#What"><A HREF="#What">What is Syntax Directed Parsing?</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
34 <LI><A HREF="#Describing">Describing an Input Sequence</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
35 <LI><A HREF="#How">How a Parser Works</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
36 <LI><A HREF="#Note">A Note on Notation</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
37 <LI><A HREF="#Reduction">Reduction Procedures</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
38 <LI><A HREF="#Building">Building a Parser</A> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
39 <UL> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
40 <LI SRC="#Invoking"><A HREF="#Invoking">Invoking a Parser</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
41 <LI><A HREF="#Communicating">Communicating with a Parser</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
42 <LI><A HREF="#Parser">Parser Input</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
43 <LI><A HREF="#Error">Error Handling</A></LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
44 </UL> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
45 </LI> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
46 </UL> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
47 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
48 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
49 <H2><A NAME="What">What is Syntax Directed Parsing?</A></H2> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
50 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
51 Every programmer has to deal with input data. Usually processing input data |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
52 depends on what has preceded, and often even on what follows, the input under |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
53 consideration. Keeping track of these dependencies in order to know how to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
54 process the data is called parsing. It is often relatively easy to keep track of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
55 simple dependencies when first writing a program. As the program develops, as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
56 new features are added, as bugs are fixed, the dependencies often stop being |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
57 simple. Input processing becomes a headache, since it is hard to keep track of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
58 or even identify all the particular cases. Changes to the program cause |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
59 unexpected problems and program maintenance threatens to get out of control.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
60 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
61 Syntax directed parsing is a technique for addressing these difficulties. In |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
62 syntax directed parsing, the input part of a program is constructed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
63 automatically, by means of a standard algorithm, from a high level description |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
64 of the input data structure. Code to perform any required processing of the data |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
65 is attached to the description in a convenient way.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
66 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
67 The description, being non-procedural, is generally easier to write and to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
68 modify than the equivalent program code, and much less likely to harbor bugs. It |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
69 is easier to read, and easier to maintain. It is easy to re-use in other |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
70 programs requiring the same input, thus encouraging uniform interfaces. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
71 technique also simplifies the overall program by decoupling the input and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
72 processing components and providing a natural, modular structure.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
73 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
74 To use syntax directed parsing you first write the input data description, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
75 called a <A HREF="gloss.html#Grammar">grammar</A>. A file which contains a grammar is called a syntax |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
76 file. A parser generator, such as AnaGram, then can create, from the syntax |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
77 file, a function (or program) called a <A HREF="gloss.html#Parser">parser</A>, written in C or C++. The parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
78 keeps track of all the dependencies in your input, and calls certain functions, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
79 <A HREF="gloss.html#ReductionProcedure">reduction procedures</A>, to deal with specific units or sequences of data as they |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
80 are encountered.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
81 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
82 Reduction procedures are the code you write to process your data. In your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
83 grammar they are linked to the structures in your input, so that your parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
84 will call them at precisely the right times with precisely the right data. Note |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
85 that with this technique you need only provide a non-procedural <I>description</I> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
86 of the input. The details of flow of control are handled entirely by the parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
87 When you write reduction procedures, you can concentrate entirely on what you |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
88 have to do with the data. You don't have to encumber your code with switches and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
89 tests to determine the structure of your input.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
90 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
91 The parsers you build using a parser generator such as AnaGram may be complete |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
92 stand-alone programs, or they may serve as input routines for a more extensive |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
93 program. Some programs may even use more than one parser.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
94 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
95 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
96 <H2><A NAME="Describing">Describing an Input Sequence</A></H2> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
97 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
98 Writing a grammar consists of describing the acceptable input sequences for your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
99 program. The vehicle for describing an input sequence is called a <A HREF="gloss.html#Production">production</A>. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
100 Productions show how a logical component of the input can be made up of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
101 sequence of more elementary components. A production that describes a date might |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
102 be written:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
103 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
104 date |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
105 -> name of month, day, comma, year </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
106 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
107 The components of the input are called <A HREF="gloss.html#Token">tokens</A>. The sequence of tokens on the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
108 side of the production is called a <A HREF="gloss.html#GrammarRule">grammar rule</A>, or rule for short. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
109 individual tokens on the right side of the rule are also called <A HREF="gloss.html#RuleElement">rule elements</A>. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
110 </P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
111 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
112 The token on the left side of the production is called the <A HREF="gloss.html#ReductionToken">reduction token</A> for |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
113 the rule. Tokens may have semantic values, as distinguished from syntactic |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
114 values, which you can use in your <A HREF="gloss.html#ReductionProcedure">reduction procedures</A>. For instance, the value |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
115 of name of month could be an integer in the range zero to eleven, or it could be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
116 a pointer to an ascii string. The value of day could be an integer in the range |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
117 one to thirty-one.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
118 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
119 A <A HREF="gloss.html#Grammar">grammar</A> consists of a number of such <A HREF="gloss.html#Production">productions</A>, each of which describes some |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
120 component of the input in terms of other components. It does not take very many |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
121 productions to describe quite complex input streams. A grammar for the C |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
122 language, for instance, requires about two hundred productions.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
123 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
124 Many people find the term "production" quite confusing. It comes from theoretical |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
125 linguistics where it is used to describe how one may produce sequences which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
126 correspond to a set of grammatical rules. Ironically, the major usage of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
127 idea has been in parsing where the interest is not so much in creating sequences |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
128 which satisfy the grammatical rules as in decoding and analyzing such sequences. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
129 Nonetheless, it is convenient, in the above example, to say that the token date |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
130 produces a sequence of tokens consisting of name of month, day, comma, and year. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
131 We also say that the sequence reduces to date.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
132 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
133 There may be more than one production to describe a given component, if there is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
134 more than one way it may be represented. For instance,</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
135 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
136 date |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
137 -> day, name of month, year </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
138 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
139 describes another common way of writing a date. In other words, a <A HREF="gloss.html#ReductionToken">reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
140 token</A> may produce a number of different <A HREF="gloss.html#GrammarRule">grammar rules</A>.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
141 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
142 Tokens which appear on the left side of one or more productions are called |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
143 <A HREF="gloss.html#Nonterminal">nonterminal tokens</A>. Those which appear <I>only</I> on the right sides of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
144 productions are called <A HREF="gloss.html#Terminal">terminal tokens</A>. Terminal tokens are the units which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
145 actually appear physically in the input. Nonterminal tokens are identified when |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
146 a sequence of tokens that matches the right side of a production is seen in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
147 input. When AnaGram analyzes a <A HREF="gloss.html#Grammar">grammar</A>, it assigns a unique token number to each |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
148 token it finds in the grammar.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
149 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
150 Nonterminal tokens, such as date in the example above, may appear in any grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
151 rule just as though they were input tokens. The token on the left side of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
152 production can even appear on the right side as well. Such a production is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
153 called a recursive production. When a nonterminal token appears on the right |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
154 side of a production, it may be represented in this context by <I>any</I> of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
155 the <A HREF="gloss.html#GrammarRule">grammar rules</A> it produces. Grammars described in this manner are called |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
156 <A HREF="gloss.html#ContextFreeGrammar">context free grammars</A> since there is no contextual constraint on which of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
157 rules that a token produces can appear in any given context.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
158 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
159 Recursive productions may be either left recursive or right recursive. Left |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
160 recursive productions are those where the recursively defined <A HREF="gloss.html#Nonterminal">nonterminal</A> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
161 appears as the first <A HREF="gloss.html#RuleElement">element</A> in the recursive rule. Right recursive productions |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
162 are those where it is the last element. If it appears anywhere between, the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
163 production is said to be center recursive.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
164 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
165 Any nonterminal token which has a recursive production must also have at least |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
166 one simple, non-recursive production. Otherwise, it is not possible to create a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
167 finite sequence of terminal tokens from the nonterminal token.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
168 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
169 Recursion may also occur implicitly in a <A HREF="gloss.html#Grammar">grammar</A> when one of the tokens on the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
170 right side of a production itself has a production involving the token on the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
171 left. Such implicit recursion sometimes may involve numerous levels of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
172 productions. Implicit recursion occurs most commonly in describing constructs |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
173 such as arithmetic expressions or the block structure of programming languages.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
174 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
175 Clearly, grammars can accommodate multiple levels of structure in the input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
176 sequences they describe. There must, at the top, be a single token which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
177 encompasses the entire input. This special token is variously called the <A HREF="gloss.html#GrammarToken">grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
178 token</A>, the goal token or the start token.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
179 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
180 AnaGram allows you to specify <A HREF="gloss.html#Terminal">terminal tokens</A> explicitly as ascii characters, or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
181 even as <A HREF="gloss.html#CharacterSets">sets of ascii characters</A>, right in the grammar. Thus, you may write |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
182 '0-9' to represent the set of ascii digits, or 'A-Z' to represent the set of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
183 upper case letters. The semantic value of such a <A HREF="gloss.html#Token">token</A> is the ascii character |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
184 code that actually appears in the input stream. If the various sets you use in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
185 your grammar overlap, they may not properly represent terminal tokens. In this |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
186 case, AnaGram automatically extends your <A HREF="gloss.html#Grammar">grammar</A> appropriately. </P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
187 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
188 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
189 <H2><A NAME="How">How a Parser Works</A></H2> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
190 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
191 The aim of a <A HREF="gloss.html#Parser">parser</A> is to match its input with the full syntactic structure |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
192 specified by the <A HREF="gloss.html#Production">productions</A> which make up the grammar. The primary component of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
193 a parser is an input buffer, sometimes thought of as a stack, into which tokens |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
194 are shifted, or stored sequentially, as they are encountered in the input. At |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
195 the same time that a token is shifted into the input buffer, its semantic value |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
196 is pushed onto the value stack. A token is not shifted into the buffer unless it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
197 "makes sense", that is, unless it is consistent with the rules of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
198 <A HREF="gloss.html#Grammar">grammar</A> and with the input that has preceded it. If a token does not make sense, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
199 the parser signals a syntax error. In order to determine whether a token makes |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
200 sense, the parser has a sort of decision table which provides a list of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
201 acceptable tokens for each of a number of states. The table also specifies what |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
202 the parser is to do with each acceptable token. When the table indicates that a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
203 token is to be shifted into the buffer, it also specifies a new state. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
204 parser stacks the current state on a state stack and jumps to the new state. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
205 Thus every time a token is shifted into the input buffer, a state number is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
206 pushed onto the state stack. For each state of the parser, excepting only the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
207 initial state, there is a unique token which will cause a jump to that state. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
208 This token is called the <A HREF="gloss.html#CharacteristicToken">characteristic token</A> for the state.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
209 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
210 When the rightmost, or most recent, tokens in the input buffer match the right |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
211 side of a production precisely, the parser <I>may</I> replace the tokens that |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
212 match the rule with a single token, the token on the left side of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
213 production. This process of replacing a sequence of tokens with a single token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
214 is called reduction. The token that replaces the sequence of tokens is called |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
215 the <A HREF="gloss.html#ReductionToken">reduction token</A>.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
216 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
217 The actual mechanism of the reduction is quite important. At the same time that |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
218 input tokens are removed from the input buffer, state numbers are popped from |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
219 the state stack, so that when all input tokens matching the rule have been |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
220 removed, the parser state has been restored to the value it had at the time the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
221 first token in the rule was seen. As state numbers are popped from the state |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
222 stack, token values are popped from the value stack. If the rule has a <A HREF="gloss.html#ReductionProcedure">reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
223 procedure</A>, temporary variables are loaded with the values popped from the stack |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
224 and the reduction procedure is called. The reduction token is now shifted into |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
225 the input buffer just as though it were an input token. If the reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
226 procedure returned a result it is shifted into the value stack as the value of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
227 the reduction token. The parser stacks the current state again and jumps to a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
228 new state as determined by the parser tables.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
229 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
230 If there are no errors in your input, when the last token has been read from the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
231 input, shifted into the input buffer, and reductions performed, there will be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
232 precisely one token in the input buffer: the <A HREF="gloss.html#GrammarToken">grammar, or goal, token</A> which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
233 describes your entire input. At this point your parser declares itself finished.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
234 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
235 Reductions do not necessarily occur every time a rule matches the tokens in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
236 input buffer. If the reduction token does not make sense, that is, if it is not |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
237 consistent with the rules of the <A HREF="gloss.html#GrammarToken">grammar</A> and with the input that has preceded |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
238 it, the parser will not perform the reduction. Suppose there are tokens in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
239 input which match one of the rules given above for date. A reduction will not |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
240 occur unless date is one of the tokens the parser is actually looking for at |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
241 that stage in the input.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
242 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
243 Even when the reduction token would make sense, there are still situations where |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
244 the reduction would not take place. Suppose a grammar includes the two |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
245 <A HREF="gloss.html#Production">productions</A> given above for date as well as the following:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
246 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
247 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
248 date |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
249 -> name of month, day </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
250 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
251 This production is the same as the first, but with no year specification. When a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
252 parser directed by this grammar has encountered name of month and day, it can't |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
253 tell without looking further whether it has a short form or a long form date. In |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
254 such a circumstance, the parser looks at the next following token, which is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
255 called a <A HREF="gloss.html#Lookahead">lookahead token</A>. If the lookahead token is a comma, then, in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
256 absence of other productions, the input is a long form date. If the lookahead |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
257 token is not a comma, then the input is certainly not a long form date, and it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
258 is proper to reduce the short form production.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
259 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
260 Suppose the lookahead token were a comma and the grammar were also to contain |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
261 the following production:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
262 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
263 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
264 appointment |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
265 -> date, comma, time </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
266 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
267 Since a comma can follow date, according to this rule, and can also follow day |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
268 according to the first production, it is impossible to determine, simply by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
269 looking at the <A HREF="gloss.html#Lookahead">lookahead token</A>, whether the date was being given in short form |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
270 or long form. One would have to look beyond the comma to see if what follows the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
271 comma matches the rules for time or for year. Although it is possible to build |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
272 parsers which can do this, it is not generally feasible. Situations of this sort |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
273 are called <A HREF="gloss.html#Conflict">conflicts</A>. AnaGram warns you about the conflicts in your <A HREF="gloss.html#Grammar">grammar</A> when |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
274 it analyzes your grammar, and provides numerous facilities to help you correct |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
275 them.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
276 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
277 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
278 <H2><A NAME="Note">A Note on Notation</A></H2> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
279 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
280 <A HREF="gloss.html#ContextFreeGrammar">Context free grammars</A> have been traditionally represented in the literature |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
281 using <A HREF="gloss.html#BNF">Backus-Naur Form</A>, or BNF. In Backus-Naur Form, certain characters, called |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
282 metacharacters, are used to punctuate <A HREF="gloss.html#Production">productions</A> and all other printable |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
283 characters are taken to represent themselves literally. Named tokens are denoted |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
284 by enclosing the name within angle brackets <CODE>< ></CODE> . The left side of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
285 production is distinguished from the right side by the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
286 characters <CODE> ::= </CODE>. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
287 If several productions have the same left side, the vertical |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
288 bar <CODE> | </CODE> is used to separate them. The <A HREF="gloss.html#RuleElement">elements</A> of a <A HREF="gloss |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
289 .html#GrammarRule">grammar rule</A> are simply juxtaposed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
290 to indicate that one follows another. Blanks are ignored. Thus, in BNF, the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
291 first production given for date, above, would be:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
292 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
293 <date> ::= <name of month> <day>, <year> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
294 </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
295 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
296 AnaGram uses a notation more consonant with ordinary programming usage. Thus |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
297 token names need not be bracketed and literal characters must be appropriately |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
298 quoted. The elements of rules are joined by commas. Using this approach, there |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
299 is no need for metacharacters and it becomes possible to make a number of useful |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
300 extensions to the notation.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
301 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
302 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
303 <H2><A NAME="Reduction">Reduction Procedures</A></H2> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
304 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
305 Of course, the reason for parsing an input stream is to interpret the data in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
306 the stream and to process it in some useful manner. The primary tool for doing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
307 this is the <A HREF="gloss.html#ReductionProcedure">reduction procedure</A>. A reduction procedure is a piece of C code that |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
308 is executed when a particular <A HREF="gloss.html#GrammarRule">grammar rule</A> is reduced. Often, a reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
309 procedure calculates a value which becomes the semantic value of the reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
310 token. The input to the reduction procedure consists of the values of the tokens |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
311 that make up the grammar rule.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
312 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
313 AnaGram allows you to assign C variable names to the tokens in a grammar rule, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
314 so that you can refer to them in the reduction procedure. To assign a C variable |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
315 name, simply follow the token in the rule with a colon and the C variable name. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
316 Simple reduction procedures can be written as C or C++ expressions. At the end |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
317 of the rule, write an equal sign and an expression, terminated by a semicolon. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
318 For example:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
319 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
320 (int) hex digit |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
321 -> '0-9':d =d-'0'; |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
322 -> 'a-f':d =d-'a'+10; |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
323 -> 'A-F':d =d-'A'+10; </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
324 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
325 When any one of these rules is matched, the value of the token in the rule is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
326 assigned to the temporary variable. The expression to the right of the equal |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
327 sign is evaluated and the value of the expression is stored as the value of hex |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
328 digit, which has been declared to be an int. These <A HREF="gloss.html#Production">productions</A> define |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
329 hexadecimal digits as ascii characters, and calculate the binary value of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
330 digit in terms of the ascii character code, d. The binary value becomes the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
331 value of hex digit. Hexadecimal digits can be combined to make hexadecimal |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
332 numbers by writing the following productions:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
333 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
334 (int) hex number |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
335 -> hex digit |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
336 -> hex number:n, hex digit:d =16*n+d; </PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
337 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
338 There are several important points to notice in this example. First, <A HREF="gloss.html#ReductionProcedure">reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
339 procedures</A> are executed "from the bottom up". That is, the reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
340 procedure for hex digit is executed before any reduction procedure for hex |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
341 number. Second, if there is no reduction procedure for a production, the value |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
342 of the first token in the rule is assigned to the reduction token. Thus, it is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
343 not necessary to provide a reduction procedure in the first production for hex |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
344 number. Third, the reduction procedures for recursive productions are always |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
345 executed <I>after</I> the reduction procedure, if any, for the non-recursive |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
346 production which begins the recursion. Finally, when an input sequence is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
347 described using left recursion, as in this example, the <A HREF="gloss.html#RuleElement">elements</A> of the sequence |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
348 are processed left to right.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
349 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
350 If you wish to process the elements of a sequence right to left, you may use |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
351 right recursion. For example, it is sometimes convenient to define the fraction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
352 part of a decimal number thus:</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
353 <PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
354 (double) fraction part |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
355 -> '0-9':d =(d-'0')/10.; |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
356 -> '0-9':d,fraction part:f =(d-'0'+f)/10.;</PRE> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
357 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
358 In this case the leading digits are stored temporarily in the parse stack, and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
359 then the fraction part is evaluated right to left only when the last digit has |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
360 been found.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
361 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
362 Reduction procedures can be more complex than simple expressions. After the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
363 equal sign you may include an arbitrary block of C or C++ code, enclosed in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
364 braces, { }. To return a semantic value for the reduction token simply use a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
365 return statement. Of course, reduction procedures have the full resources of C |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
366 or C++ at their disposal. They may set and interrogate global variables and may |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
367 call functions freely.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
368 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
369 Since the reduction procedures you write will probably need some support code, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
370 such as #include statements and declarations, you may incorporate C or C++ code |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
371 into your syntax file at any point. You need only enclose it in braces ({}). |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
372 Such code is called embedded C. All embedded C code is also copied to the parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
373 file, and <I>precedes</I> all of your reduction procedures.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
374 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
375 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
376 <H2><A NAME="Building">Building a Parser</A></H2> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
377 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
378 In order to round out a <A HREF="gloss.html#Parser">parser</A> into a functioning program it needs input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
379 procedures, as well as error diagnosis and recovery capabilities. AnaGram has a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
380 number of options available which give you a high degree of flexibility in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
381 configuring a parser to suit your particular needs. All of the options are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
382 provided with reasonable defaults, so that you can safely disregard any option |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
383 until you need the features it provides.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
384 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
385 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
386 <H4><A NAME="Invoking">Invoking a Parser</A></H4> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
387 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
388 Normally, AnaGram configures <A HREF="gloss.html#Parser">parsers</A> as functions which you can call from |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
389 elsewhere in your program. In this situation, you call the parser, it processes |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
390 its input, and returns either when it has finished or cannot proceed because of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
391 errors. Alternatively, if you set the event driven configuration switch, your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
392 parser will be configured so that you have two procedures to call: an |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
393 initializer and a parser. In the event driven configuration you start the parse |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
394 by calling the initializer and then you call the parser once for each unit of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
395 input. Using the event driven mode makes it quite easy to configure a parser as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
396 a filter and to chain several parsers together so that the output from one |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
397 parser is the input to the next. Such multi-stage parsing is a convenient way to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
398 deal with complex input that is not context free.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
399 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
400 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
401 <H4><A NAME="Communicating">Communicating with a Parser</A></H4> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
402 <P>The complete status of your parser is contained in a single data structure |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
403 called a parser control block. All communications with a parser take place via |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
404 the parser control block. Input procedures must place input data in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
405 appropriate field in the parser control block. When the parse is complete or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
406 encounters an error, the results of the parse may be found in the parser control |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
407 block. When AnaGram builds a parser it includes, in the header file it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
408 generates, a typedef statement which defines the structure of the parser control |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
409 block.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
410 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
411 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
412 <H4><A NAME="Parser">Parser Input</A></H4> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
413 <P>The input to your <A HREF="gloss.html#Parser">parser</A> may be either characters read directly from an |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
414 input stream, or tokens accumulated by a pre-processor or <A HREF="gloss.html#LexicalScanner">lexical scanner</A>. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
415 way you provide input to your parser depends on how your <A HREF="gloss.html#Grammar">grammar</A> defines input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
416 tokens and also on whether or not you have requested an event driven parser.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
417 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
418 If your parser is event driven, you provide its input by storing the input code |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
419 and the input value, if any, into the parser control block and calling the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
420 parser.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
421 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
422 If you have set the pointer input configuration switch in your syntax file, you |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
423 simply initialize the pointer field in your parser control block before you call |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
424 your parser. Your parser will then read its input directly from memory by simply |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
425 incrementing the pointer as necessary.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
426 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
427 Otherwise, your parser will invoke a macro called GET_INPUT every time it needs |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
428 more input. You may define GET_INPUT according to your needs. You can define it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
429 so that it calls an input function, or you can define it so that it executes |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
430 in-line code each time it is invoked. Your GET_INPUT macro should store its |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
431 input code in the input_code field of the parser control block. If you do not |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
432 write a GET_ INPUT macro, AnaGram will provide one which will read characters |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
433 from stdin.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
434 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
435 If your <A HREF="gloss.html#Grammar">grammar</A> does not define <A HREF="gloss.html#Terminal">terminal tokens</A> in terms of ascii characters or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
436 external token numbers, your GET_INPUT will have to determine the appropriate |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
437 internal token number for each input token. To assist you in determining these |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
438 token numbers AnaGram provides a typedef enum statement in the header file. You |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
439 can then use named constants to specify the internal token numbers for the input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
440 tokens.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
441 <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
442 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
443 <H4><A NAME="Error">Error Handling</A></H4> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
444 <P>Your <A HREF="gloss.html#Parser">parser</A> must be prepared to deal with erroneous input. There are two |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
445 aspects to error handling: diagnosing the error, and recovering from the error. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
446 On encountering an error, your parser will invoke a macro called SYNTAX_ ERROR. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
447 If you do not provide a definition for SYNTAX_ERROR, AnaGram will provide a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
448 simple error diagnostic. AnaGram can also provide automatic error diagnoses |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
449 which pinpoint the location of the error.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
450 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
451 AnaGram provides two options for error recovery: error |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
452 token <A HREF="gloss.html#Resynchronization">resynchronization</A> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
453 and automatic resynchronization. These are techniques for getting your parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
454 back in synch with its input so it can proceed after encountering an error. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
455 Normally, if you do not select one of these recovery techniques, your parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
456 will terminate when it encounters an error; however, you may override this |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
457 default if you wish and provide your own recovery technique.</P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
458 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
459 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
460 <IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
461 WIDTH=1010 HEIGHT=2 > |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
462 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
463 <IMG ALIGN="right" SRC="images/pslrb6d.gif" ALT="Parsifal Software" |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
464 WIDTH=181 HEIGHT=25> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
465 <BR CLEAR="right"> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
466 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
467 Back to <A HREF="index.html">Index</A> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
468 <P> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
469 <ADDRESS><FONT SIZE="-1"> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
470 AnaGram parser generator - documentation<BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
471 Introduction to Syntax Directed Parsing<BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
472 Copyright © 1993-1999, Parsifal Software. <BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
473 All Rights Reserved.<BR> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
474 </FONT></ADDRESS> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
475 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
476 </BODY> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
477 </HTML> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
478 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
479 |