diff doc/admin/todo-large.txt @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.
author David A. Holland
date Sat, 22 Dec 2007 17:52:45 -0500
parents
children f9e4689b837d
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/admin/todo-large.txt	Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,213 @@
+Large todo items
+
+This file describes the bigger and more involved projects or
+undertakings that can or should be done on AnaGram moving forward.
+
+------------------------------------------------------------
+USER INTERFACE
+
+ - Build a new user interface that's based on a non-legacy toolkit.
+This is the most pressing issue and almost nothing else in this file
+should be undertaken until the new user interface is under control.
+
+ - Conflict diagnostics. AG's conflict diagnostics are already far
+better than yacc's; however, they are still opaque to most users and
+could be improved a good deal. There are a number of areas for
+improvement: (1) presentation: we have a GUI, so we can and should use
+it to draw diagrams, arrows, and whatnot, rather than limiting
+ourselves to rows of text as AG 1.x had to. We should show a sample
+input that leads to the conflict and bracket it, showing below it the
+rules and reductions that go one way and above it the rules and
+reductions that go the other. (2) common forms: a lot of conflicts
+arise from common mistakes and common issues. We should have a set of
+pattern-matching rules to identify and report these common forms, and
+link to explanations and fix suggestions in the help.
+
+ - Better cross-referencing of tables. The right-button popup menu for
+auxiliary windows is fine, but you ought to get something useful by
+default if you click (or double-click?) on things directly. Also,
+there are useful cross-references that don't currently exist at all.
+
+ - Better presentation of tables. To really take advantage of the
+information in the various tables and windows AG provides, you have to
+understand quite a bit about how AG works. This should not be
+necessary. Also, the AG 2.x user interface is a direct conceptual port
+of the text-based AG 1.x user interface. It's still fundamentally
+based on lines of text. This is not necessary and could be improved a
+great deal without undue difficulty.
+
+ - Command-line version. agcl should print basic conflict diagnoses as
+well as the warning and error messages, so one doesn't have to fire up
+the GUI every time one makes a silly editing mistake.
+
+ - It would be nice if, when you open the Configuration Parameters
+window, you could change the settings for the current run. (It has to
+be for the current run, because saving changes is not remotely
+practical.)
+
+ - Multiple grammars. In the long run it would be nice to be able to
+have multiple grammars loaded at the same time and be able to shuffle
+tokens between them when using File Trace or Grammar Trace. This would
+allow easier testing and debugging of projects that use multiple
+interacting grammars, or that use AG to implement communication
+protocols. Unfortunately, this is *not* trivial.
+
+
+------------------------------------------------------------
+PROGRAMMING INTERFACE
+
+ - Language support. There's been talk of Java support for AG for
+quite a while. It would be nice to actually *do* it. Beyond that,
+other languages which would be useful or interesting to support
+include Perl, Python, Ruby, Cyclone, OCaml... you name it. Right now
+adding any of these would be nontrivial; however, after the first
+couple, adding another should be relatively straightforward. Note also
+that since many of these languages aren't syntactically compatible with
+either C or AnaGram, we will need to design a way to put the syntax
+and the accompanying code in separate files.
+
+ - Configuration support for multiple languages. There also needs to
+be a better and more systematic configuration mechanism for choosing
+output language and output language dialect.
+
+ - Output name. It's nice for makefiles to be able to know what the
+output files will be named. Unfortunately, there's no good way to do
+this that I can think of, because it depends on the output language.
+If anyone has a brilliant idea, please share it.
+
+ - Include files and/or a module system for grammars. Common bits of
+grammar have to be cut and pasted all the time. Some mechanism to
+avoid this would be nice. (Some users have used cpp or m4 to
+preprocess AG input; this works but it's awfully ugly.)
+
+ - Reading yacc files. There's been a yacc-to-AnaGram converter around
+for a long time, but it's not fully functional and hasn't ever been
+officially released or supported. (It originally got hung up because
+yacc's input language is defined, at least in some references, in a
+way that makes the grammar LR(2). This is not important now, if it
+ever was; bison, for example, does not accept the offending
+constructs.) It might be better at this point to merge the
+functionality into AnaGram itself so it can read (and build) .y
+files directly. Finding a clean version of y2ag is the first order of
+business. Note: the y2ag.syn in the test suite is mangled in some
+fashion and not a good place to start.
+
+
+------------------------------------------------------------
+BUILD
+
+ - Dumping UI. To improve the regression testing I'd like to have a
+fake user interface that takes *all* the data normally available in
+the various windows and tables and just dumps it to stdout.
+
+ - Parser equivalence. One of the problems with the test suite is that
+sometimes an internal change will, perfectly legitimately, permute
+elements in the various parse tables, causing huge test diffs that
+need to be inspected by hand. It would be nice to figure out how to
+either (1) canonicalize the tables so this doesn't happen, or (2)
+write some kind of munging script that automatically validates and
+suppresses these changes.  (Obviously, (1) is better, but I'm not sure
+it's feasible.)
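+
+A minimal sketch of option (1), in C, assuming a hypothetical toy
+table format (a deterministic goto table indexed by state and token,
+-1 for no transition) rather than AnaGram's actual tables: renumber
+the states in breadth-first order from the start state, visiting
+tokens in ascending order. That order depends only on the shape of
+the automaton, not on the original state numbers, so two tables that
+differ only by a permutation of states come out identical. The names
+canonicalize, renumber, and tables_equivalent are made up here.
```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy format, not AnaGram's real tables:
   table[s][t] is the state reached from state s on token t,
   or -1 if there is no transition. */
#define NSTATES 4
#define NTOKENS 3

/* Assign canonical numbers in breadth-first order from the start
   state, visiting tokens in ascending order. Unreachable states
   keep canon[s] == -1. */
static void canonicalize(int table[NSTATES][NTOKENS], int start,
                         int canon[NSTATES]) {
    int queue[NSTATES];
    int head = 0, tail = 0, next = 0;

    for (int s = 0; s < NSTATES; s++)
        canon[s] = -1;
    canon[start] = next++;
    queue[tail++] = start;
    while (head < tail) {
        int s = queue[head++];
        for (int t = 0; t < NTOKENS; t++) {
            int d = table[s][t];
            if (d >= 0 && canon[d] < 0) {
                assert(tail < NSTATES);  /* each state is queued once */
                canon[d] = next++;
                queue[tail++] = d;
            }
        }
    }
}

/* Write the table under the canonical numbering into out. */
static void renumber(int in[NSTATES][NTOKENS], int start,
                     int out[NSTATES][NTOKENS]) {
    int canon[NSTATES];

    canonicalize(in, start, canon);
    for (int s = 0; s < NSTATES; s++)
        for (int t = 0; t < NTOKENS; t++)
            out[s][t] = -1;
    for (int s = 0; s < NSTATES; s++) {
        if (canon[s] < 0)
            continue;  /* unreachable state: drop it */
        for (int t = 0; t < NTOKENS; t++) {
            int d = in[s][t];
            out[canon[s]][t] = d >= 0 ? canon[d] : -1;
        }
    }
}

/* Two tables are equivalent if their canonical forms are identical. */
static int tables_equivalent(int a[NSTATES][NTOKENS], int start_a,
                             int b[NSTATES][NTOKENS], int start_b) {
    int ca[NSTATES][NTOKENS], cb[NSTATES][NTOKENS];

    renumber(a, start_a, ca);
    renumber(b, start_b, cb);
    return memcmp(ca, cb, sizeof ca) == 0;
}
```
+This only canonicalizes state numbers; rule numbers, reduction
+entries, and any other orderings in the real tables would need
+similar treatment before a test diff could ignore them.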
+
+ - Figure out how to do coverage checking in the test suite. (This
+includes coverage of AG code, coverage of the code sections from the
+engine definition, and also coverage of config parameters that affect
+code generation.)
+
+
+------------------------------------------------------------
+INTERNALS
+
+ - Character sets. Right now, AG is limited to 16-bit-wide character
+sets, which is a problem as Unicode code points now run up to U+10FFFF
+and need 21 bits. Also, the internal handling of case folding and so on
+is totally inadequate and needs a general revamp. And keywords only
+work with 8-bit character sets.
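+
+To make the 16-bit limit concrete, a small C sketch (plain C, not
+AnaGram code): the largest code point, U+10FFFF, needs 21 bits, and
+UTF-16 has to split anything above U+FFFF into a surrogate pair,
+which a 16-bit character set would see as two separate characters.
+For example, U+1F600 becomes 0xD83D 0xDE00. The helper names
+bits_needed and to_surrogates are made up for illustration.
```c
#include <assert.h>
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu  /* largest Unicode code point */

/* Number of bits needed to hold v. */
static int bits_needed(uint32_t v) {
    int n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Split a supplementary-plane code point (U+10000..U+10FFFF) into
   its UTF-16 high and low surrogates. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo) {
    assert(cp >= 0x10000u && cp <= UNICODE_MAX);
    cp -= 0x10000u;                             /* now a 20-bit value */
    *hi = (uint16_t)(0xD800u + (cp >> 10));     /* top 10 bits */
    *lo = (uint16_t)(0xDC00u + (cp & 0x3FFu));  /* bottom 10 bits */
}
```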
+
+ - Lexemes and disregard. I'd like to clean up the implementation of
+disregard whitespace and lexemes. It has a sound theoretical
+foundation, but you wouldn't know it from the existing code. (To be
+fair, the existing code was severely constrained by DOS memory
+issues.)
+
+ - LR(k) grammars. Jerry was talking about this before he died;
+unfortunately, I don't think anyone knows exactly what he had in
+mind...
+
+ - Regexp keywords. Keyword recognition is already a string search
+process; there's no real reason it couldn't be extended to include
+regular expression matching. The catch of course is how you preserve
+any useful notion of soundness in the grammar.
+
+ - Merge duplicate conflicts. A lot of times the same conflict will
+happen with a whole pile of tokens. Right now these are all generated
+and displayed independently; they should really be folded together.
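+
+One way to do the folding, sketched in C under the assumption
+(hypothetical, not AnaGram's internal representation) that each
+conflict is recorded as a parser state, the two competing rules, and
+the lookahead token: sort the records so entries for the same state
+and rule pair are adjacent, then report each run as a single conflict
+with a list of tokens. struct conflict and merge_conflicts are
+illustrative names.
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical conflict record; not AnaGram's internal layout. */
struct conflict {
    int state;   /* parser state where the conflict occurs */
    int rule_a;  /* one of the two competing rules */
    int rule_b;  /* the other competing rule */
    int token;   /* lookahead token on which they collide */
};

/* qsort comparator: order by state, rule pair, then token. */
static int cmp_conflict(const void *pa, const void *pb) {
    const struct conflict *a = pa, *b = pb;

    if (a->state != b->state)
        return a->state < b->state ? -1 : 1;
    if (a->rule_a != b->rule_a)
        return a->rule_a < b->rule_a ? -1 : 1;
    if (a->rule_b != b->rule_b)
        return a->rule_b < b->rule_b ? -1 : 1;
    if (a->token != b->token)
        return a->token < b->token ? -1 : 1;
    return 0;
}

/* Same state and same pair of rules means the same conflict. */
static int same_conflict(const struct conflict *a,
                         const struct conflict *b) {
    return a->state == b->state && a->rule_a == b->rule_a
        && a->rule_b == b->rule_b;
}

/* Sort the records in place and print one line per distinct conflict,
   listing all of its tokens. Returns the number of merged conflicts. */
static int merge_conflicts(struct conflict *c, size_t n, FILE *out) {
    int groups = 0;

    assert(n == 0 || c != NULL);
    qsort(c, n, sizeof *c, cmp_conflict);
    for (size_t i = 0; i < n; ) {
        size_t j = i;

        fprintf(out, "state %d: rule %d vs rule %d on tokens",
                c[i].state, c[i].rule_a, c[i].rule_b);
        while (j < n && same_conflict(&c[i], &c[j])) {
            fprintf(out, " %d", c[j].token);
            j++;
        }
        fputc('\n', out);
        groups++;
        i = j;
    }
    return groups;
}
```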
+
+ - Fix the code generator. The code generator is an ugly mess and
+needs rationalization.
+
+ - Interactive completions. Many command-driven programs nowadays
+allow you to press a key (often Tab) to get a list of legal inputs
+based on what's been typed so far. If input is being handled by an AG
+parser, you can in theory run the parser on the input so far, inspect
+the state, and get a list of legal tokens, which you can then use to
+give the user a context-sensitive list of possible things to
+type. This is, however, not something that you can possibly code up
+yourself by interfacing to AG, at least not in any pleasant way, so AG
+ought to provide support.
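+
+A minimal sketch of the "inspect the state" step, in C, assuming a
+hypothetical toy parser whose action table is indexed by state and
+token (AnaGram's generated tables are laid out differently, and all
+names here are made up): the legal tokens in a state are the ones
+whose action is not an error.
```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy parser tables; not AnaGram's generated format. */
enum action { ERR, SHIFT, REDUCE };

#define NSTATES 3
#define NTOKENS 4

static const char *const token_name[NTOKENS] = {
    "open", "close", "quit", "help"
};

/* action_table[state][token]: what the parser does on that lookahead. */
static const enum action action_table[NSTATES][NTOKENS] = {
    /* open    close   quit    help   */
    {  SHIFT,  ERR,    SHIFT,  SHIFT  },  /* state 0: start of a command */
    {  ERR,    SHIFT,  ERR,    ERR    },  /* state 1: after "open" */
    {  REDUCE, ERR,    REDUCE, REDUCE },  /* state 2: after "open close" */
};

/* Store every token with a non-error action in `state` into out[]
   and return how many there are. */
static int legal_tokens(int state, int out[NTOKENS]) {
    int n = 0;

    assert(state >= 0 && state < NSTATES);
    for (int t = 0; t < NTOKENS; t++)
        if (action_table[state][t] != ERR)
            out[n++] = t;
    return n;
}

/* Print the completion list a Tab-key handler might show. */
static void print_completions(int state) {
    int toks[NTOKENS];
    int n = legal_tokens(state, toks);

    for (int i = 0; i < n; i++)
        printf("%s\n", token_name[toks[i]]);
}
```
+Caveat: this only looks at the current state's row. If the parser
+uses default reductions, a state may reduce on any lookahead, so a
+token that looks legal here can still fail once the reductions are
+done; a real implementation would have to simulate those reductions
+for each candidate token before offering it.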
+
+
+------------------------------------------------------------
+MANUAL AND DOCS
+
+ - The manual index is in a parlous state. It was badly mauled by
+conversion from WP to LaTeX and needs to be rebuilt with reference to
+the old WP version.
+
+ - Merge the various articles of shared text that are found in the
+help, the manual, the miscellaneous documentation, the web page,
+and/or elsewhere too. These should all be maintained from one master
+source. The most urgent instances are "sbb" (the syntactic building
+blocks, found in the examples directory, the misc documentation, and
+the manual) and the Glossary. Note that the text in the help system is
+often slightly different from the corresponding text in the manual on
+purpose, because it plays a different role and has a somewhat
+different narrative context. (However, the various copies of the
+Glossary should all be the same. These include: the manual appendix,
+the glossary web page, the copy in the misc docs, and the one in the
+help.)
+
+ - The manual needs a discussion of Unicode, wchar_t, wide characters,
+multibyte characters, and all that. Of course, first we have to
+actually *work* with such constructs... (All four listed terms need to
+appear in the index, too.)
+
+ - Somewhere in the manual there should be a section to tell you what
+to do for interactive use. The most important issue, if each input
+line isn't a complete parse, is how to make sure the parser won't try to look
+ahead past the end of a line when the user expects the line to be
+complete and the program to do something.
+
+ - Somewhere in the manual there should be a section to tell you what
+to do for a parser that runs indefinitely and only exits at program
+shutdown, like you might use to handle a network protocol in a daemon.
+
+ - It would be nice to have a section or appendix that describes
+common sources of conflicts and what to do about them. Such constructs
+include:
+	- if-then-else (text already exists)
+	- repeated repetitions (a -> b..., b -> c...)
+	- poisoning lookahead by bad nesting of productions
+	  (I had an example involving trying to distinguish whitespace
+	  and comments at too high a level)
+	- sequences of nondelimited tokens (a -> identifier...)
+	- generalize the previous to the full lexer model
+	- ...
+
+
+------------------------------------------------------------
+EXAMPLES
+
+ - Update the examples to use agclib1 (as seen in XIDEK) instead of
+oldclasslib.
+