Mercurial > ~dholland > hg > ag > index.cgi
annotate doc/manual/gl.tex @ 10:5b21f127e957
silence random gcc warning
author | David A. Holland |
---|---|
date | Mon, 30 May 2022 23:58:14 -0400 |
parents | 13d2b8934445 |
children |
rev | line source |
---|---|
0
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
1 \chapter{Glossary} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
2 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
3 % XXX Add: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
4 % parameter assignment (?) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
5 % parser (state) stack |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
6 % parser value stack |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
7 % (may be defined in some other version of the glossary) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
8 % (and appear in the help) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
9 % (there are apparently dangling links to these in some version) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
10 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
11 \index{Action} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
12 \index{Parser action} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
13 \index{Parsing engine} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
14 \paragraph{Action.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
15 A ``parser action'' is one of the basic executable elements of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
16 parsing engine. In a traditional parsing engine there are four |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
17 actions: the shift action, the reduce action, the accept action, and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
18 the error action. The parsing engines which AnaGram generates use a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
19 substantially greater number of actions, in order to provide certain |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
20 short cuts in the interest of greater speed and greater compactness of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
21 the tables which control the action of the parsing engine. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
22 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
23 \paragraph{Accept Action.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
24 The accept action is one of the four actions of a traditional parsing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
25 engine. The accept action is performed when the parser has succeeded |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
26 in identifying the goal token for the grammar. When the parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
27 executes the accept action, it returns to the calling program. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
28 % , returning the semantic value of the goal token as its value. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
29 The accept action is thus the last action of the parsing engine and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
30 occurs only once for each successful execution of the parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
31 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
32 \paragraph{Associativity.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
33 This is a term used to describe binary operators (q.v.). It describes |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
34 how you interpret a sequence of operations which all involve the same |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
35 operator. Consider $a - b - c$, for instance. Here the convention is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
36 that we subtract $b$ from $a$, and then $c$ from the difference. We |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
37 say that subtraction is left associative, since if there is a sequence |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
38 of subtraction operations, the leftmost one is to be performed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
39 first. As a second example, consider exponentiation. FORTRAN uses |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
40 \agcode{**} as an operator to indicate exponentiation. In order to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
41 conform as closely as possible to ordinary mathematical notation, the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
42 expression \agcode{a**b**c} is interpreted to mean that first |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
43 \agcode{b} is raised to the power \agcode{c}, and then the resultant |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
44 value is used as the power to which \agcode{a} is raised. We say that |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
45 exponentiation is right associative since the rightmost of a sequence |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
46 of exponentiation operations is to be performed first. If a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
47 programming language does not allow for unparenthesized sequences of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
48 particular operator, the operator is said to be non-associative. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
49 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
50 Another way to view left and right associativity is as implied |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
51 parenthesization. Left associative operators are parenthesized from |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
52 the left. Right associative operators are parenthesized from the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
53 right. Thus $a - b - c = ((a-b) - c)$ and \agcode{a**b**c = (a**(b**c))}. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
54 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
55 AnaGram offers two ways to specify associativity of binary |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
56 operators. The first way is to use conventional Backus-Naur Form |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
57 syntax descriptions. You would use recursion to describe both |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
58 subtraction and exponentiation. Subtraction, since it is left |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
59 associative, uses left recursion, and exponentiation, being right |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
60 associative, uses right recursion. Thus |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
61 \begin{indentingcode}{0.4in} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
62 difference -> value | difference, '-', value |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
63 exponential -> value | value, "**", exponential |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
64 \end{indentingcode} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
65 could be used to describe differences and exponents. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
66 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
67 The second way to specify associativity is to use an ambiguous grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
68 and precedence declarations. (See Chapter 9.) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
69 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
70 \paragraph{Backtracking.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
71 In order to make parse tables more compact and parsers faster, it is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
72 common to use default reductions. In case of error, it is necessary to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
73 undo default reductions before diagnostics can be properly |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
74 determined. In AnaGram, this undo operation is called backtracking. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
75 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
76 \index{Backus-Naur form} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
77 \index{BNF} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
78 \paragraph{Backus-Naur Form.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
79 Backus-Naur Form, or BNF, is a conventional notation for describing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
80 context free grammars. AnaGram uses an extended notation for grammars, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
81 which, except for semantically determined productions, can be shown to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
82 be equivalent to BNF. The term ``BNF'' is often used colloquially to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
83 refer to a grammar specification. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
84 % XXX: shouldn't this say ``any grammar specification''? |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
85 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
86 AnaGram's syntax specification language differs from BNF in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
87 following respects: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
88 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
89 \begin{enumerate} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
90 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
91 \item |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
92 In conventional BNF, a symbol not enclosed in angle brackets ($< >$) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
93 is taken to represent itself literally. In AnaGram, literal character |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
94 representations must be enclosed in single quotes and literal strings |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
95 within double quotes. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
96 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
97 \item |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
98 In BNF, the right side of a production is simply a list of symbols |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
99 without any delimiters or punctuation. AnaGram requires that the rule |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
100 elements which make up a grammar rule, or right side, be joined by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
101 commas. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
102 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
103 \item |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
104 BNF makes no provision for identifying reduction procedures or their |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
105 arguments. AnaGram provides both reduction procedures, introduced by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
106 an ``\agcode{=}'' at the end of the production, and named arguments, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
107 introduced by a ``\agcode{:}'' at the end of any token on the right |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
108 side of the production. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
109 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
110 \item |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
111 AnaGram allows virtual productions to be used freely. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
112 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
113 \item |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
114 BNF is ``pure'' in that if you wish to define a nonterminal token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
115 called ``digit'' to represent the digits from zero to nine, you must |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
116 provide ten explicit productions to define it. AnaGram treats the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
117 concept of ``terminal token'' as used in language theory as an |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
118 abstraction, and interposes character set functionality between actual |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
119 character input and the terminal tokens of BNF. You can define digit |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
120 to be the character range \agcode{'0-9'}, for instance, and AnaGram |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
121 will determine and define precisely the minimum number of productions |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
122 necessary, taking into account any other usage of the characters |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
123 \agcode{'0'} through \agcode{'9'} in your grammar. This makes your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
124 grammar more compact, more manageable, and easier to modify. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
125 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
126 \item |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
127 AnaGram allows for semantically determined productions, which provide |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
128 a significant increment in the ease with which semantic analysis may |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
129 be joined to syntactic analysis. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
130 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
131 \end{enumerate} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
132 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
133 \paragraph{Binary operator.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
134 A binary operator is an operator that works on two operands to create |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
135 a result. It is often written between its operands and is sometimes |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
136 called an infix operator. In ordinary programming languages |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
137 ``\agcode{+}'' and ``\agcode{-}'' are binary operators. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
138 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
139 \paragraph{Binding.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
140 Most programming languages, rather than executing arithmetic operators |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
141 in simple left to right order, conform to the traditional conventions |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
142 of ordinary algebra, which dictate that, except where parenthesis |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
143 indicate otherwise, exponentiation is done before multiplication, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
144 multiplication before addition, and addition before comparison. One |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
145 says that exponentiation is ``more tightly binding'' than |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
146 multiplication, multiplication is more tightly binding than addition, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
147 and so on. The sense of the word here is that the operator binds |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
148 together its operands into a single entity. An operand which falls |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
149 between two different operators is ``bound'' by the more tightly |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
150 binding operator. An operator that is more tightly binding than |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
151 another is also said to have higher precedence. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
152 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
153 \index{Character sets} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
154 \paragraph{Character sets.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
155 One of the traditional problems with syntax directed programming is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
156 that caused by the fact that most formal language specifications have |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
157 productions of the form |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
158 \begin{indentingcode}{0.4in} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
159 letter -> a | b | c | d ... | z |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
160 \end{indentingcode} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
161 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
162 Since the letters are not often distinguished in the grammar, this |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
163 large number of essentially identical productions causes a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
164 correspondingly large number of states in the parser. This problem is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
165 often attacked by using a ``lexical scanner'' which simply specifies a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
166 ``letter token'' whose value indicates precisely which letter is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
167 intended. This works fine as long as there is nothing in the grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
168 which distinguishes any letter at all. If there is, and there is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
169 usually some special use of \agcode{h}, \agcode{x}, \agcode{q}, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
170 \agcode{o}, \agcode{e}, or even \agcode{a-f}, the situation gets more |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
171 complicated. Often the result is to abandon the use of syntax directed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
172 programming for elemental units of the language and use a more complex |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
173 lexical scanner which identifies names, numbers, key words and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
174 operators. Such lexical scanners are often built using conventional |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
175 programming languages or simple pattern recognizing languages such as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
176 \agfile{lex}. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
177 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
178 AnaGram avoids this problem by incorporation the notion of character |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
179 set into its input specifications. Briefly, the way it works is the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
180 following: You specify the set of characters which make up any |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
181 ostensibly terminal token. AnaGram then analyzes the overlaps among |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
182 these definitions and creates a minimal set of tokens which match your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
183 specifications exactly. For instance, suppose you have the following |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
184 definitions: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
185 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
186 \begin{indentingcode}{0.4in} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
187 letter = 'a-z' + 'A-Z' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
188 hex digit = '0-9' + 'a-f' + 'A-F' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
189 \end{indentingcode} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
190 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
191 AnaGram will define a token for letter which has two productions: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
192 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
193 \begin{indentingcode}{0.4in} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
194 letter -> 'a-f' + 'A-F' | 'g-z' + 'G-Z' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
195 \end{indentingcode} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
196 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
197 With this technique, you have the freedom to specify your grammar in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
198 the easiest possible manner, without the penalty of an absurdly large, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
199 slow parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
200 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
201 Character sets may be specified in terms of ranges of characters, as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
202 in the above example, as unions, denoted by ``\agcode{+}'', as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
203 intersections, denoted by ``\agcode{\&}'', as differences, denoted by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
204 ``\agcode{-}'', and as complements, using ``\agcode{\~{}}''. You may |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
205 also specify a set consisting of a single character either with a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
206 literal character or with a numeric specification. If you specify a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
207 character numerically, you may use either decimal, octal or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
208 hexadecimal notation, as in C. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
209 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
210 \index{Character universe} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
211 \index{Universe} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
212 \paragraph{Character Universe.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
213 In ordinary set theory, the ``universe'' is the set of all entities |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
214 which may be elements of a set. It must be defined so that the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
215 complement of a set can mean something unambiguous. In AnaGram, the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
216 character universe normally comprises the ASCII characters and the IBM |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
217 extended character set, that is, all characters in the range from 0 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
218 through 255. If, however, you have defined input characters outside |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
219 this range, the character universe will be extended to the least |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
220 character you have used and to the greatest character you have used in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
221 your grammar. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
222 % XXX this should mention character sets, and it should mention that |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
223 % it's limited to character sets of size 65536. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
224 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
225 \index{Characteristic rules} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
226 \index{Rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
227 \paragraph{Characteristic Rule.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
228 Each parser state is characterized by a particular set of grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
229 rules, and for each such rule, a marked token which is the next token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
230 expected. These rules are called the characteristic rules for the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
231 state. In the course of doing grammar analysis, AnaGram determines the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
232 characteristic rules for each parser state. After analyzing your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
233 grammar, you may inspect the \agwindow{State Definition Table} to see |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
234 the characteristic rules for any state in your parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
235 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
236 \index{Characteristic token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
237 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
238 \paragraph{Characteristic Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
239 Every state in a parser, except state zero, can be characterized by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
240 the one, unique token which causes a jump to that state. That token is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
241 called the characteristic token of the state, because to get to that |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
242 parser state you must have just seen precisely that token in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
243 input. Note that several states could have the same characteristic |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
244 token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
245 % XXX s/one,/one/ |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
246 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
247 When you have a list of states, such as is given by the parser state |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
248 stack, it is equivalent to a list of characteristic tokens. This list |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
249 of tokens is the list of tokens that have been recognized so far by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
250 the parser. Some of these tokens, of course, may be nonterminal tokens |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
251 and may thus represent the result of reducing a sequence of previously |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
252 recognized tokens. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
253 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
254 \index{Complement} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
255 \paragraph{Complement of a set.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
256 In set theory, the complement of a set, $S$, is the collection of all |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
257 elements of the character universe which are not elements of $S$. Using |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
258 AnaGram's notation, the complement of \agcode{S} is written |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
259 \agcode{\~{}S}. The complement operator has higher precedence than the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
260 difference, intersection, or union operators. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
261 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
262 \index{Rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
263 \index{Completed rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
264 \paragraph{Completed Rule.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
265 A ``completed rule'' is a characteristic rule whose index is pointing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
266 beyond the last rule element. In other words, it has been completely |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
267 matched and will be reduced by the next input. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
268 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
269 If a state has more than one completed rule, the decision as to which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
270 to reduce is made based on the next input token. If there is only one |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
271 completed rule in a state, it will be reduced by default unless the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
272 default reductions switch has been reset, i.e., turned off. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
273 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
274 \index{Conflicts} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
275 \paragraph{Conflict.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
276 Conflicts arise during a grammar analysis when AnaGram cannot |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
277 determine how to treat a given input token. There are two sorts of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
278 conflicts: \agterm{shift-reduce} conflicts and \agterm{reduce-reduce} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
279 conflicts. Conflicts may arise either because the grammar is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
280 inherently ambiguous, or simply because the grammar analyzer cannot |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
281 look far enough ahead to resolve the conflict. In the latter case, it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
282 is often possible to rewrite the grammar in such a way as to eliminate |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
283 the conflict. Null productions are a common source of conflict. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
284 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
285 There are a number of ways to deal with conflicts. If you understand |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
286 the conflict well, you may simply choose to ignore it. When AnaGram |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
287 encounters a shift-reduce conflict while building parse tables it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
288 resolves it by choosing the shift action. When AnaGram encounters a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
289 reduce-reduce conflict while building parse tables, it resolves it by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
290 selecting the grammar rule which occurred first in the grammar. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
291 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
292 A second way to deal with conflicts is to set operator precedence |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
293 parameters. If you set these parameters, AnaGram will use them |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
294 preferentially to resolve conflicts. Any conflicts so resolved will be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
295 listed in the Resolved Conflicts window. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
296 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
297 A third way to resolve a conflict is to declare some tokens as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
298 \agparam{sticky} or as \agparam{subgrammar}s. This is particularly |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
299 useful for productions whose sole purpose is to skip over |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
300 uninteresting input. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
301 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
302 The fourth way to deal with conflicts is to rewrite the grammar to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
303 eliminate them. Many people prefer this approach since it yields the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
304 highest level of confidence in the resulting program. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
305 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
306 \index{Context free grammar} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
307 \paragraph{Context Free Grammars.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
308 Context free grammars are grammars wherein the definition of a grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
309 unit, or nonterminal token, does not depend on the context in which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
310 the nonterminal is used. That means that if you define ``widget'' in a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
311 certain way, that definition is valid in all contexts in which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
312 ``widget'' might occur. Context free grammars can be represented in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
313 Backus-Naur Form. AnaGram's syntax specification language has no |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
314 facility for representing grammars which are not context free. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
315 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
316 \index{Default reductions} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
317 \index{Reduction} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
318 \paragraph{Default reductions.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
319 A ``default reduction'' is a parser action which may be used in your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
320 parser in any state which has precisely one completed rule. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
321 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
322 If a given parser state has, among its characteristic rules, exactly |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
323 one completed rule, it is usually faster to reduce it on any input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
324 than to check specifically for correct input before reducing it. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
325 only time this default reduction causes trouble is in the event of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
326 syntax error. In this situation you may get an erroneous reduction. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
327 Normally when you are parsing a file, this is inconsequential because |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
328 you are not going to continue semantic action in the presence of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
329 error. But, if you are using your parser to handle real-time |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
330 interactive input, you have to be able to continue semantic processing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
331 after notifying your user that he has entered erroneous input. In this |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
332 case you would want \agparam{default reductions} to have been turned |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
333 off so that productions are reduced only when there is correct input. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
334 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
335 \index{Difference} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
336 % XXX: \index{Set difference} ? |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
337 \paragraph{Difference of two sets.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
338 In set theory, the difference of two sets, $A$ and $B$, is defined to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
339 be the set of all elements of $A$ that are not elements of $B$. In an |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
340 AnaGram syntax file, you represent the difference of two character |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
341 sets by using the ``\agcode{-}'' operator. Thus the difference of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
342 \agcode{A} and \agcode{B} is \agcode{A - B}. The difference operator |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
343 is left associative. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
344 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
345 \paragraph{Error Action.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
346 The error action is one of the four actions of a traditional parsing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
347 engine. The error action is performed when the parser has encountered |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
348 an input token which is not admissible in the current state. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
349 further behavior of a traditional parser is undefined. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
350 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
351 \index{Expansion rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
352 \index{Rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
353 \paragraph{Expansion Rule.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
354 In analyzing a grammar, we are often interested in the full range of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
355 input that can be expected at a certain point. The expansion of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
356 token or state shows us all the expected input. An expansion yields a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
357 set of marked rules. Looking at the marked token shows us what input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
358 to expect. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
359 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
360 The set of expansion rules of a (nonterminal) token shows all the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
361 expected input that can occur whenever the token appears in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
362 grammar. The set consists of all the grammar rules produced by the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
363 token, plus all the rules produced by the first token of any rule in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
364 the set. The marked tokens for the expansion rules of a token are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
365 always the first element in the rule. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
366 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
367 The expansion of a state consists of its characteristic rules plus the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
368 expansion rules of the marked token in each characteristic rule. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
369 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
370 \index{Grammar} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
371 \paragraph{Grammar.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
372 Traditionally, a grammar is a set of productions which taken together |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
373 specify precisely a set of acceptable input sequences in terms of an |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
374 abstract set of terminal tokens. The set of acceptable input sequences |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
375 is often called the ``language'' defined by the grammar. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
376 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
377 In AnaGram, the term grammar also includes configuration sections, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
378 definitions of character sets, virtual productions, etc. that augment |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
379 the collection of productions. A grammar is often called a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
380 ``syntax'', and the rules of the grammar are often called syntactic |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
381 rules. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
382 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
383 % XXX you can't say that step 1 (of 4) of analysis is analysis. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
384 \paragraph{Grammar Analysis.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
385 The major function of AnaGram is the analysis of context free grammars |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
386 written in a particular variant of Backus-Naur Form. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
387 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
388 The analysis of a grammar proceeds in four stages. In the first, the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
389 input grammar is analyzed and a number of tables are built which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
390 describe all of the productions and components of the grammar. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
391 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
392 In the second stage, AnaGram analyzes all of the character sets |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
393 defined in the grammar, and where necessary, defines auxiliary tokens |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
394 and productions. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
395 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
396 In the third stage, AnaGram identifies all of the states of the parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
397 and builds the go-to table for the parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
398 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
399 In the fourth stage, Anagram identifies reduction tokens for each |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
400 completed grammar rule in each state and checks for conflicts. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
401 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
402 \index{Grammar rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
403 \paragraph{Grammar Rule.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
404 A ``grammar rule'' is the right side of a production. It consists of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
405 sequence of rule elements. Each rule element identifies some token, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
406 which can be either a terminal token or nonterminal token. It is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
407 ``matched'' by a corresponding sequence of tokens in the input stream |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
408 to the parser. The left side of the production identifies one or more |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
409 nonterminal tokens, or reduction tokens, to which the rule reduces |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
410 when matched. If there is more than one reduction token, there should |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
411 be a reduction procedure to make the choice. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
412 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
413 \index{Grammar token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
414 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
415 \paragraph{Grammar Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
416 The grammar token is the token which represents the ``top level'' in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
417 your grammar. Some people refer to it as the ``goal'' or ``goal |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
418 token'' and others as the ``start token''. Whatever it is called, it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
419 is the single token which describes the complete input to your parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
420 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
421 \index{Intersection} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
422 \paragraph{Intersection.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
423 In set theory, the intersection of two sets, $A$ and $B$, is defined |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
424 to be the set of all elements of $A$ which are also elements of $B$. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
425 In an AnaGram syntax file, the intersection of two character sets is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
426 represented with the ``\agcode{\&}'' operator. The intersection |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
427 operator has lower precedence than the complement operator, but higher |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
428 precedence than the union and difference operators. The intersection |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
429 operator is left associative. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
430 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
431 \index{LALR parsing} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
432 \paragraph{LALR parsing.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
433 LALR, or ``lookahead LR'' parsing, is a somewhat less powerful variant |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
434 of LR parsing which produces much smaller parsing tables. Thus an LALR |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
435 parser has very significant advantages in size over its LR |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
436 counterpart. It is possible to construct grammars which can be parsed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
437 by an LR parser but cannot be parsed by an LALR parser. That is, the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
438 LALR parser will find these grammars to be ambiguous and the LR parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
439 will not. In practice, however, such grammars seem to be rare and the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
440 ambiguities can often be eliminated by minor revisions. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
441 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
442 AnaGram generates LALR parsers and, in addition, uses LALR parsing to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
443 interpret syntax files. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
444 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
445 \index{Lexical scanner} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
446 \paragraph{Lexical Scanner.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
447 A lexical scanner is a program or procedure used as a preprocessor for |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
448 a parser. It scans input characters and lumps them together into |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
449 tokens. Many systems for generating parsers, most notably |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
450 \agfile{yacc}, can scarcely be used without a lexical scanner. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
451 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
452 AnaGram, although it can support the use of a lexical scanner, does |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
453 not need a lexical scanner for most practical problems. AnaGram avoids |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
454 the need for a lexical scanner by using character sets and disregard |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
455 and lexeme statements. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
456 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
457 If your problem does in fact need a lexical scanner, you can use |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
458 AnaGram itself to write it, so you don't need to know different |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
459 languages for the scanner and for the parser. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
460 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
461 \index{Lookahead token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
462 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
463 \paragraph{Lookahead Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
464 The current input token to a parser is often called the ``lookahead'' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
465 token. In many situations, it is not shifted into the input buffer |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
466 immediately, but acts like a catalyst to cause a number of rules to be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
467 reduced before it is eventually shifted in. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
468 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
469 \index{LR parsing} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
470 \paragraph{LR Parsing.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
471 An LR($k$) parser is a type of deterministic bottom-up shift-reduce |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
472 parser using at most $k$ lookahead symbols to determine its next |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
473 action at any point in the parse. If ($k$) is omitted, then $k$ is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
474 assumed to be 1. Discussion of the technical details of LR($k$) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
475 parsing is beyond the scope of this manual. The interested reader may |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
476 consult any of a variety of works covering the subject\footnote{ See, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
477 for example, Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
478 \textit{Compilers: Principles, Techniques, and Tools}. (Reading, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
479 Massachusetts: Addison-Wesley, 1988).}. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
480 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
481 AnaGram produces parsers which employ a variant of LR parsing known as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
482 LALR parsing. It is not necessary to understand the details of LR and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
483 LALR parsing theory in order to use AnaGram. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
484 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
485 \index{Marked rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
486 \index{Rule} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
487 \paragraph{Marked Rule.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
488 A ``marked rule'' is a grammar rule together with a ``marked token'' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
489 that indicates how much of the rule has already been matched and what |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
490 input should be expected if the remainder of the rule is to be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
491 matched. Each parser state is defined by a small number of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
492 characteristic rules which indicate what matches have already been |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
493 made and what input can be expected. Note that when a state has more |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
494 than one characteristic rule, any two characteristic rules with the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
495 same number of tokens match identically up to the marked token. Even |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
496 if one has fewer tokens to the left than the other, its tokens match |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
497 the other exactly up to the marked token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
498 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
499 When marked rules are displayed in AnaGram windows, the marked token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
500 is displayed in a distinctive font which the developer can select. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
501 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
502 \paragraph{Non-associative.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
503 A binary operator, say \agcode{\#}, is said to be non-associative if |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
504 the sequence \agcode{x \# y \# z} is not permissible. If this sequence |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
505 is to be interpreted as \agcode{(x \# y) \# z}, the operator is said |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
506 to be left associative. If the sequence is to be interpreted as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
507 \agcode{x \# (y \# z)}, the operator is said to be right |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
508 associative. (See \agterm{associativity}.) |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
509 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
510 \index{Nonterminal token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
511 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
512 \paragraph{Nonterminal Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
513 A ``nonterminal token'' is a token which results from reducing the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
514 sequence of tokens which match a grammar rule and replacing them with |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
515 the appropriate reduction token. Nonterminal tokens are to be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
516 distinguished from terminal tokens or input tokens. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
517 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
518 \index{Null productions} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
519 \index{Production} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
520 \paragraph{Null Production.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
521 A ``null production'' is one that has no tokens on the right side |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
522 whatsoever. Null productions essentially are identified by the first |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
523 following input token. Null productions are extremely convenient |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
524 syntactic elements when you wish to make some input optional. For |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
525 example, suppose that you wish to allow an optional semicolon at some |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
526 point in your grammar. You could write the following pair of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
527 productions: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
528 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
529 \begin{indentingcode}{0.4in} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
530 optional semicolon -> | ';' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
531 \end{indentingcode} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
532 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
533 Note that a null production can never follow a ``\agcode{|}''. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
534 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
535 The above could also be written on multiple lines thus: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
536 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
537 \begin{indentingcode}{0.4in} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
538 optional semicolon |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
539 -> |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
540 -> ';' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
541 \end{indentingcode} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
542 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
543 You can always rewrite your grammar to eliminate null productions if |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
544 you wish, but you usually pay a price in conciseness and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
545 clarity. Sometimes, however, it is necessary to do such a rewrite in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
546 order to avoid conflicts, to which null productions are especially |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
547 prone. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
548 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
549 If you have a null production with no reduction procedure specified, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
550 your parser will automatically assign the value zero to the reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
551 token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
552 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
553 Null productions can be generated by virtual productions. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
554 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
555 \index{Parser} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
556 \paragraph{Parser.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
557 A parser is a program, or more likely a procedure within a program, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
558 which scans a sequence of input characters or input tokens and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
559 accumulates them in an input buffer or stack as determined by a set of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
560 productions which constitute a grammar. When the parser discovers a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
561 sequence of tokens as defined by a grammar rule, or right side of a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
562 production, it ``reduces'' the sequence to a single reduction token as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
563 defined by the left side of the grammar rule. This nonterminal token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
564 now replaces the tokens which matched the grammar rule and the search |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
565 for matches continues. If an input token is encountered which will not |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
566 yield a match for any rule, it is considered a syntax error and some |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
567 kind of error recovery may be required to continue. If a match, or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
568 reduction action, yields the grammar token, sometimes called the goal |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
569 token or start token, the parser deems its work complete and returns |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
570 to whatever procedure may have called it. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
571 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
572 Tokens may have semantic values. If the input values configuration |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
573 switch is on, your parser will expect semantic values to be provided |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
574 by the input process along with the token identification code. If the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
575 \agparam{input values} switch is off, your parser will take the ASCII |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
576 value of the input character, that is, the actual input code, as the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
577 value of the character. When the parser reduces a production, it can |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
578 call a reduction procedure or semantic action to analyze the values of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
579 the constituent tokens. This reduction procedure can then return a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
580 value which characterizes the reduced token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
581 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
582 \paragraph{Parser Generator.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
583 A parser generator, such as AnaGram, is a program that converts a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
584 grammar, a rule-based description of the input to a program, into a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
585 conventional, procedural module called a parser. The parsers AnaGram |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
586 generates are simple C modules which can be compiled on almost any |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
587 platform. AnaGram parsers are also compatible with C++. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
588 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
589 \index{Parsing engine} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
590 \paragraph{Parsing Engine.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
591 A parser consists of three basic components: A set of syntax tables, a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
592 set of reduction procedures and a parsing engine. The parsing engine |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
593 is the body of code that interprets the parsing table, invokes input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
594 functions, and calls the reduction procedures. The \agmenu{Build |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
595 Parser} command configures a parsing engine according to the implicit |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
596 requirements of the syntax specification and according to the explicit |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
597 values of the configuration parameters. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
598 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
599 The parsing engine itself is a simple automaton, characterized by a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
600 set of states and a set of inputs. The inputs are the tokens of your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
601 grammar. Each state is represented by a list of tokens which are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
602 admissible in that state and for each token an action to perform and a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
603 parameter which further defines the action. In a traditional LR |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
604 parser, there are only four actions: the shift action, the reduce |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
605 action, the accept action and the error action. AnaGram, in doing its |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
606 grammar analysis, identifies a number of special cases, and creates a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
607 number of extra actions which make for faster processing, but which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
608 can be represented as combinations of these primitive actions. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
609 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
610 When a shift action is performed, the current state number is pushed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
611 onto the parser state stack and the new state number is determined by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
612 the current state number and the current lookahead token. Different |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
613 tokens cause different new states. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
614 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
615 When a reduce action is performed, the length of the rule being |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
616 reduced is subtracted from the parser stack index and the new state |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
617 number is read from the top of the parser state stack. The reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
618 token for the rule being reduced is then used as an input token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
619 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
620 Each state in the grammar, with the exception of state zero, has a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
621 characteristic token which must have been recognized in order to jump |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
622 to that state. Therefore, the parser state stack, which is essentially |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
623 a list of state numbers, can also be thought of as a list of token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
624 numbers. This is the list of tokens that have been seen so far in the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
625 parse of your input stream. Some of these tokens, of course, may be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
626 nonterminal tokens which have resulted from reducing other sequences |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
627 of tokens. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
628 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
629 \index{Partition} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
630 \paragraph{Partition.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
631 If you use character sets in your grammar, AnaGram will compute a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
632 ``partition'' of the character universe. This partition is a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
633 collection of non-overlapping character sets such that every one of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
634 the sets you have defined can be written as a union of partition |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
635 sets. Each partition set is identified by a unique reference number |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
636 called the partition set number. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
637 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
638 Each partition set is assigned a unique token. If one of your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
639 character sets requires more than one partition set to represent it, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
640 AnaGram will create appropriate productions and add them to your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
641 grammar so your parser can make the necessary distinctions. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
642 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
643 \index{Precedence} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
644 \paragraph{Precedence.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
645 ``Precedence'' is an attribute of binary operators and unary operators |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
646 which can be used to resolve conflicts. Operator precedence is also |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
647 referred to as \agterm{binding}. Suppose that \agcode{\#} and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
648 \agcode{@} are two binary operators. The question is how to interpret |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
649 \agcode{x \# y @ z}. If \agcode{\#} has higher precedence than |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
650 \agcode{@}, it is interpreted as \agcode{(x \# y) @ z}. On the other |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
651 hand if \agcode{\#} has lower precedence than \agcode{@}, it would be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
652 \agcode{x \# (y @ z)}. If \agcode{\#} and \agcode{@} have the same |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
653 precedence then the question must be resolved by associativity. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
654 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
655 Note that all operators at the same precedence level must have the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
656 same associativity. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
657 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
658 % XXX ``neither ... and ...?'' maybe you mean ``neither ... nor ...'' |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
659 The situation is somewhat simpler for unary operators. If |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
660 \agcode{\#} and \agcode{@} were both |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
661 prefix operators, or both suffix operators, there would be no |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
662 ambiguity in interpretation, since neither \agcode{\# @ x} and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
663 \agcode{x \# @} offer more than one possible interpretation. The only |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
664 difficulty arises when both a prefix and a suffix operator are applied |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
665 to the same operand. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
666 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
667 % XXX also ``if ... has ... would'' is wrong; it should be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
668 % either ``if ... had ... would'' or ``if ... has ... will''. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
669 Suppose that \agcode{\#} is a prefix operator and \agcode{@} is a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
670 suffix operator. The question then is how to interpret \agcode{\# x @}. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
671 If \agcode{\#} has higher precedence than \agcode{@}, it would be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
672 interpreted as \agcode{(\# x) @}. On the other hand, if \agcode{\#} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
673 has lower precedence than \agcode{@}, it would be interpreted as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
674 \agcode{\# (x @)}. If \agcode{\#} and \agcode{@} have the same |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
675 precedence then the question must be resolved by associativity. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
676 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
677 \index{Production} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
678 \paragraph{Production.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
679 Productions are the mechanism used to describe how complex input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
680 structures are built up out of simpler ones. Each production has a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
681 left side and a right side. The right side, or grammar rule, is a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
682 sequence of rule elements, which may represent either terminal tokens |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
683 or nonterminal tokens. The left side is a list of reduction tokens. In |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
684 most cases there would be only a single reduction token. Productions |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
685 with more than one token on the left side are called semantically |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
686 determined productions. The ``\agcode{->}'' symbol is used to separate |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
687 the left side from the right side. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
688 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
689 \paragraph{Reduce Action.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
690 The reduce action is one of the four actions of a traditional parsing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
691 engine. The reduce action is performed when the parser has succeeded |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
692 in matching all the elements of a grammar rule and the next input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
693 token is not erroneous. Reducing the grammar rule amounts to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
694 subtracting the length of the rule from the parser stack index, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
695 identifying the reduction token, stacking its semantic value and then |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
696 doing a shift action with the reduction token as though it had been |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
697 input directly. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
698 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
699 \index{Conflicts} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
700 \index{Reduce-reduce conflicts} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
701 \paragraph{Reduce-Reduce Conflict.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
702 A grammar has a ``reduce-reduce'' conflict at some state if a single |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
703 token turns out to be a reducing token for more than one completed |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
704 rule. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
705 % XXX it would be nice to expand this. And given an example. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
706 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
707 \index{Reducing token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
708 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
709 \paragraph{Reducing Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
710 In a state with more than one completed rule, your parser must be able |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
711 to determine which one was actually found. AnaGram deals with this |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
712 problem by looking at all the states the parser will branch to once |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
713 each rule is reduced. The acceptable input tokens for those states are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
714 the ``reducing tokens'' for the completed rules in the state under |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
715 investigation. If there is a single token which is a reducing token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
716 for more than one rule, then the grammar is said to have a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
717 reduce-reduce conflict at that state. If in a particular state there |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
718 is both a shift action and a reduce action for the same token the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
719 grammar is said to have a shift-reduce conflict in that state. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
720 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
721 A reducing token is not the same as a reduction token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
722 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
723 \index{Reduction procedure} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
724 \paragraph{Reduction Procedure.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
725 A ``reduction procedure'' is a function you write which your parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
726 executes when it has identified the grammar rule to which the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
727 reduction procedure is attached in your grammar. There are two formats |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
728 for reduction procedures, depending on the size and complexity of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
729 procedure. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
730 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
731 \index{Reduction token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
732 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
733 \paragraph{Reduction Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
734 A token which appears on the left side of a production is called a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
735 reduction token. It is so called because when the grammar rule on the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
736 right side of the production is matched in the input stream, your |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
737 parser will ``reduce'' the sequence of tokens which matches the rule |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
738 by replacing the sequence of tokens with the reduction token. Note |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
739 that, if more than one reduction token is specified, your reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
740 procedure should choose the exact one. If it does not, your parser |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
741 will use the leftmost syntactically correct one in the list as the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
742 default. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
743 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
744 A reduction token is not the same as a reducing token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
745 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
746 \index{Resynchronization} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
747 \paragraph{Resynchronization.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
748 ``Resynchronization'' is the process of getting your parser back in |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
749 step with its input after encountering a syntax error. As such, it is |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
750 one method of error recovery. Of course, you would resynchronize only |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
751 if it is necessary to continue after the error. There are several |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
752 options available when using AnaGram. You could use the \agparam{auto |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
753 resynch} option, which causes AnaGram to incorporate an automatic |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
754 resynchronizing procedure into your parser, or you could use the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
755 \agparam{error resynch} option, which is similar to the technique used |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
756 by \agfile{yacc} programmers. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
757 % XXX That should be ``error token'', not ``error resynch''. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
758 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
759 \index{Rule elements} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
760 \paragraph{Rule Element.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
761 A grammar rule is a list of ``rule elements'', separated by commas. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
762 Rule elements may be token names, character sets, keywords, immediate |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
763 actions, or virtual productions. When AnaGram encounters a rule |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
764 element for which no token presently exists, it creates one. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
765 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
766 \index{Semantic action} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
767 \index{Action} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
768 \paragraph{Semantic Action.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
769 A semantic action is a piece of C or C++ code that is executed when a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
770 grammar rule has been identified by a parser. Semantic actions are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
771 also called reduction procedures, since they are executed when a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
772 grammar rule is reduced to the token on the left side of the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
773 production. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
774 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
775 \index{Semantic value} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
776 \index{Value} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
777 \paragraph{Semantic Value.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
778 Tokens, whether terminal or nonterminal, may have a semantic value. In |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
779 the case of terminal tokens, this may be a value assigned by the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
780 lexical scanner or, if the parser is using direct character input, it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
781 will be the ASCII value of the character itself. The values of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
782 nonterminal tokens are created by reduction procedures. As a parse |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
783 progresses, token values are shifted onto the stack, so that in a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
784 reduction procedure, the values of the tokens that comprise the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
785 grammar rule that is being reduced are available for inspection. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
786 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
787 \index{Semantically determined production} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
788 \index{Production} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
789 \paragraph{Semantically Determined Production.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
790 A ``semantically determined production'' is one which has more than |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
791 one reduction token specified on the left side of the production. You |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
792 would write such a production when the reduction tokens are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
793 syntactically indistinguishable. The reduction procedure may then |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
794 specify which of the listed reduction tokens the grammar rule is to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
795 reduce to based on semantic considerations. If there is no reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
796 procedure, or the reduction procedure does not specify a reduction |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
797 token, the parser will use the leftmost syntactically correct |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
798 reduction token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
799 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
800 \index{Set expressions} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
801 \paragraph{Set Expression.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
802 A set expression is an algebraic expression used to define a character |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
803 set in terms of individual characters, ranges of characters, and/or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
804 other sets of characters as constructed using complements, unions, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
805 intersections, and differences. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
806 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
807 \paragraph{Shift Action.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
808 The shift action is one of the four actions of a traditional parsing |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
809 engine. The shift action is performed when the input token matches one |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
810 of the acceptable input tokens for the current parser state. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
811 semantic value of the token and the current state number are stacked, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
812 the parser stack index is incremented and the state number is set to a |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
813 value determined by the previous state and the input token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
814 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
815 \index{Shift-reduce conflict} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
816 \index{Conflicts} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
817 \paragraph{Shift-Reduce Conflict.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
818 A ``shift-reduce'' conflict occurs if in some state there is a single |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
819 token that should be shifted, because it is legitimate input for one |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
820 of the rules of the state, but should also be used to reduce some |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
821 other rule because it is a reducing token for that rule. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
822 % XXX again, should expand this and give an example. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
823 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
824 \index{Parsing} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
825 \index{Syntax directed parsing} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
826 \paragraph{Syntax Directed Parsing.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
827 Syntax directed parsing, or formal parsing, is an approach to building |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
828 parsers based on formal language theory. Given a suitable description |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
829 of a language, called a grammar, there are algorithms which can be |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
830 used to create parsers for the language automatically. In this |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
831 context, the set of all possible inputs to a program may be considered |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
832 to constitute a language, and the rules for formulating the input to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
833 the program constitute the grammar for the language. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
834 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
835 The parsers built from a grammar have the advantage that they can |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
836 recognize any input that conforms to the rules, and can reject as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
837 erroneous any input that fails to conform. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
838 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
839 Since the program logic necessary to parse input is often extremely |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
840 intricate, programs which use formal parsing are usually much more |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
841 reliable than those built by hand. They are also much easier to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
842 maintain, since it is much easier to modify a grammar specification |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
843 than it is to modify complex program logic. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
844 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
845 \index{Terminal token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
846 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
847 \paragraph{Terminal Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
848 A ``terminal token'' is a token which does not appear on the left side |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
849 of a production. It represents, therefore, a basic unit of input to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
850 your parser. If the input to your parser consists of ASCII |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
851 characters, you may define terminal tokens explicitly as ASCII |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
852 characters or as sets of ASCII characters. If you have an input |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
853 procedure which produces numeric codes, you may define the terminal |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
854 tokens directly in terms of these numeric codes. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
855 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
856 \index{Token} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
857 \paragraph{Token.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
858 Tokens are the units with which your parser works. There are two kinds |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
859 of tokens: terminal tokens and nonterminal tokens. These latter are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
860 identified by the parser as sequences of tokens. The grouping of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
861 tokens into more complex tokens is governed by the grammar rules, or |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
862 productions in your grammar. In your grammar, tokens may be denoted by |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
863 token names, by virtual productions, by immediate actions, by explicit |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
864 character representations, or by expressions which yield character |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
865 sets. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
866 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
867 A ``token'' should be thought of more or less as being something like |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
868 a cloakroom token. Imagine you had taken a kindergarten class to the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
869 museum and all the kids checked their coats. Each one gets a token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
870 What is the probability of losing one? You take all of the tokens |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
871 from the children, put them in a paper bag, tie it up, and check |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
872 it. You get one token in return. We would say that the kids' coats |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
873 represent the actual input to the system, their individual tokens are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
874 the terminal tokens, and your one token which enables you to redeem |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
875 the paper bag is a nonterminal token. Nonterminal tokens are just |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
876 single token numbers to represent the result of compacting a sequence |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
877 of more primitive tokens. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
878 %Thus we sometimes refer to the input character as the input even though |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
879 %there is a translation process, token conversion, required to turn the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
880 %input character, the winter coat, into a token number, or numbered brass |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
881 %slug. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
882 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
883 Actually, the word ``token'' is commonly used to refer to a number of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
884 apparently different things. The tokens you use in a grammar are |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
885 abstract. The tokens identified by a parser are concrete instances of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
886 the abstract tokens in the grammar. Furthermore, we have to identify |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
887 tokens in some manner, so we have token names and token numbers with |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
888 which to refer to particular tokens. As a result the word ``token'', |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
889 in any particular context, may refer directly to either an abstract |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
890 token, or a concrete instance of an abstract token, or may refer only |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
891 indirectly to the token by means of a token name or token number. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
892 % |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
893 % concrete whether concrete or abstract, can be confused |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
894 % |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
895 %we can easily confuse the identification with the token itself, whether |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
896 %concrete or abstract. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
897 % |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
898 % |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
899 %the token, whether concrete or abstract, can be confused with its |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
900 %identification. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
901 % |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
902 %"token" is sometimes used to refer to the token identification rather than |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
903 %the token itself. The correct meaning is usually clear from the context, at |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
904 %least to an experienced reader. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
905 % |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
906 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
907 As an example, consider a C compiler which has a lexical scanner to |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
908 identify input tokens. According to Kernighan and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
909 Ritchie\footnote{Brian W. Kernighan and Dennis M. Ritchie. \textit{The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
910 C Programming Language}, 2nd ed. (Englewood Cliffs, New Jersey: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
911 Prentice-Hall, 1988), p. 191.}, there are six classes of C tokens: |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
912 identifiers, keywords, constants, string literals, operators, and |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
913 other separators. Consider a particular token, a hexadecimal constant, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
914 \agcode{0XF00}. When the lexical scanner hands this over to the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
915 parser, it has to provide an identification code which says what kind |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
916 of token it is, a hexadecimal constant, as well as the ``semantic |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
917 value'' of the token, 3840. The identification code is usually an |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
918 integer value, determined by the designer of the lexical scanner, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
919 which by agreement with the designer of the parser uniquely represents |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
920 a hexadecimal constant. This identification code can be thought of as |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
921 an external token number. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
922 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
923 The grammar for the parser, on the other hand, has a token, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
924 \agcode{HEXconstant}, let us say, to represent the abstract usage of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
925 hexadecimal constants in the C language. When AnaGram, or any other |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
926 parser generator for that matter, analyzes the grammar, it assigns its |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
927 own reference number to identify \agcode{HEXconstant}. This reference |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
928 number can be thought of as an internal token number to contrast it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
929 with the external token number provided by the lexical scanner. The |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
930 parsers AnaGram builds contain a table, \agcode{ag{\us}tcv}, which |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
931 provides a translation from the external to the internal token |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
932 number. Both of these token numbers, of course, differ from the |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
933 semantic value of the token, in this case, 3840. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
934 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
935 Even when an AnaGram parser is using simple character input, the word |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
936 ``token'' may represent several different things. Suppose a grammar |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
937 has a token \agcode{'A-Z'}, which represents the set of upper case |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
938 letters. There will be an internal token number, assigned by AnaGram, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
939 which identifies the (abstract) token in parser tables. The actual |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
940 (concrete) input token to a parser will, however, be a single letter, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
941 say ``Q''. In this case, the ASCII value of the character ``Q'', 81, |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
942 is used both as the external token number and as the semantic value of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
943 the token. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
944 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
945 \paragraph{Unary Operator.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
946 A ``unary operator'' is a token which represents some sort of |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
947 operation which operates on a single entity to produce a new resultant |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
948 entity. It is normally represented by a symbol which is written either |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
949 preceding the operand or following the operand. In the first case it |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
950 is called a ``prefix operator'' and in the second case a ``suffix |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
951 operator''. |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
952 |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
953 \index{Union} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
954 \paragraph{Union.} |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
955 The union of two sets is the set of all elements that are to be found |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
956 in one or another of the two sets. In an AnaGram syntax file the union |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
957 of two character sets \agcode{A} and \agcode{B} is represented using |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
958 the plus sign, as in \agcode{A + B}. The union operator has the same |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
959 precedence as the difference operator: lower than that of intersection |
13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
David A. Holland
parents:
diff
changeset
|
960 and complement. The union operator is left associative. |