diff doc/admin/todo-large.txt @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children | f9e4689b837d |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/admin/todo-large.txt Sat Dec 22 17:52:45 2007 -0500
@@ -0,0 +1,213 @@
+Large todo items
+
+This file describes the bigger and more involved projects or
+undertakings that can or should be done on AnaGram moving forward.
+
+------------------------------------------------------------
+USER INTERFACE
+
+ - Build a new user interface that's based on a non-legacy toolkit.
+This is the most pressing issue and almost nothing else in this file
+should be undertaken until the new user interface is under control.
+
+ - Conflict diagnostics. AG's conflict diagnostics are already far
+better than yacc's; however, they are still opaque to most users and
+could be improved a good deal. There are a number of areas for
+improvement: (1) presentation; we have a GUI, we can and should use it
+to draw diagrams and arrows and whatnot, and not limit ourselves to
+rows of text like AG 1.x had to. We should show a sample input that
+leads to the conflict, and bracket it on the bottom showing the rules
+and reductions that go one way and on the top showing the rules and
+reductions that go the other. (2) common forms; a lot of conflicts
+arise from common mistakes and common issues. We should have a set of
+pattern-matching rules to identify and report these common forms, and
+link to explanations and fix suggestions in the help.
+
+ - Better crossreferencing of tables. The right-button popup menu for
+auxiliary windows is fine, but you ought to get something useful by
+default if you click (or double-click?) on things directly. Also,
+there are more useful crossreferences than currently exist.
+
+ - Better presentation of tables. To really take advantage of the
+information in the various tables and windows AG provides, you have to
+understand quite a bit about how AG works. This should not be
+necessary. Also, the AG 2.x user interface is a direct conceptual port
+of the text-based AG 1.x user interface. It's still fundamentally
+based on lines of text. This is not necessary and could be improved a
+great deal without undue difficulty.
+
+ - Command-line version. agcl should print basic conflict diagnoses as
+well as the warning and error messages, so one doesn't have to fire up
+the GUI every time one makes a silly editing mistake.
+
+ - It would be nice if when you opened the Configuration Parameters
+window you could change the settings for the current run. (Has to be
+for the current run, because saving changes is not remotely
+practical.)
+
+ - Multiple grammars. In the long run it would be nice to be able to
+have multiple grammars loaded at the same time and be able to shuffle
+tokens between them when using File Trace or Grammar Trace. This would
+allow easier testing and debugging of projects that use multiple
+interacting grammars, or that use AG to implement communication
+protocols. Unfortunately, this is *not* trivial.
+
+
+------------------------------------------------------------
+PROGRAMMING INTERFACE
+
+ - Language support. There's been talk of Java support for AG for
+quite a while. It would be nice to actually *do* it. Beyond that,
+other languages which would be useful or interesting to support
+include Perl, Python, Ruby, Cyclone, OCaml... you name it. Right now
+adding any of these would be nontrivial; however, after the first
+couple adding another should be relatively straightforward. Note also
+that as many of these languages aren't syntactically compatible with
+either C or AnaGram, we will need to design a way to put the syntax
+and the accompanying code in separate files.
+
+ - Configuration support for multiple languages. There also needs to
+be a better and more systematic configuration mechanism for choosing
+output language and output language dialect.
+
+ - Output name. It's nice for makefiles to be able to know what the
+output files will be named. Unfortunately, there's no good way to do
+this that I can think of, because it depends on the output language.
+If anyone has a brilliant idea, please share it.
+
+ - Include files and/or a module system for grammars. Common bits of
+grammar have to be cut and pasted all the time. Some mechanism to
+avoid this would be nice. (Some users have used cpp or m4 to
+preprocess AG input; this works but it's awfully ugly.)
+
+ - Reading yacc files. There's been a yacc-to-AnaGram converter around
+for a long time, but it's not fully functional and hasn't ever been
+officially released or supported. (It originally got hung up because
+yacc's input language is defined, at least in some references, in a
+way that makes the grammar LR(2). This is not important now, if it
+ever was; bison, for example, does not accept the offending
+constructs.) It might be better at this point to merge the
+functionality directly into AnaGram so it can read (and build) .y
+files directly. Finding a clean version of y2ag is the first order of
+business. Note: the y2ag.syn in the test suite is mangled in some
+fashion and not a good place to start.
+
+
+------------------------------------------------------------
+BUILD
+
+ - Dumping UI. To improve the regression testing I'd like to have a
+fake user interface that takes *all* the data normally available in
+the various windows and tables and just dumps it to stdout.
+
+ - Parser equivalence. One of the problems with the test suite is that
+sometimes an internal change will, perfectly legitimately, permute
+elements in the various parse tables, causing huge test diffs that
+need to be inspected by hand. It would be nice to figure out how to
+either (1) canonicalize the tables so this doesn't happen, or (2)
+write some kind of munging script that automatically validates and
+suppresses these changes. (Obviously, (1) is better, but I'm not sure
+it's feasible.)
+
+ - Figure out how to do coverage checking in the test suite. (This
+includes coverage of AG code, coverage of the code sections from the
+engine definition, and also coverage of config parameters that affect
+code generation.)
+
+
+------------------------------------------------------------
+INTERNALS
+
+ - Character sets. Right now, AG is limited to 16-bit-wide character
+sets, which is a problem as Unicode now needs 21 bits. Also, the
+internal handling of case folding and so on is totally inadequate and
+needs a general revamp. And keywords only work with 8-bit character
+sets.
+
+ - Lexemes and disregard. I'd like to clean up the implementation of
+disregard whitespace and lexemes. It has a sound theoretical
+foundation, but you wouldn't know it from the existing code. (To be
+fair, the existing code was severely constrained by DOS memory
+issues.)
+
+ - LR(k) grammars. Jerry was talking about this before he died;
+unfortunately, I don't think anyone knows exactly what he had in
+mind...
+
+ - Regexp keywords. Keyword recognition is already a string search
+process; there's no real reason it couldn't be extended to include
+regular expression matching. The catch of course is how you preserve
+any useful notion of soundness in the grammar.
+
+ - Merge duplicate conflicts. A lot of times the same conflict will
+happen with a whole pile of tokens. Right now these are all generated
+and displayed independently; they should really be folded together.
+
+ - Fix the code generator. The code generator is an ugly mess and
+needs rationalization.
+
+ - Interactive completions. Many command-driven programs nowadays
+allow you to push a key (often tab) to get a list of legal inputs
+based on what's been typed so far. If input is being handled by an AG
+parser, you can in theory run the parser on the input so far, inspect
+the state, and get a list of legal tokens, which you can then use to
+give the user a context-sensitive list of possible things to
+type. This is, however, not something that you can possibly code up
+yourself by interfacing to AG, at least not in any pleasant way, so AG
+ought to provide support.
+
+
+------------------------------------------------------------
+MANUAL AND DOCS
+
+ - The manual index is in a parlous state. It was badly mauled by
+conversion from WP to LaTeX and needs to be rebuilt with reference to
+the old WP version.
+
+ - Merge the various articles of shared text that are found in the
+help, the manual, the miscellaneous documentation, the web page,
+and/or elsewhere too. These should all be maintained from one master
+source. The most urgent instances are "sbb" (the syntactic building
+blocks, found in the examples directory, the misc documentation, and
+the manual) and the Glossary. Note that the text in the help system is
+often slightly different from the corresponding text in the manual on
+purpose, because it plays a different role and has a somewhat
+different narrative context. (However, the various copies of the
+Glossary should all be the same. These include: the manual appendix,
+the glossary web page, the copy in the misc docs, and the one in the
+help.)
+
+ - The manual needs a discussion of Unicode, wchar_t, wide characters,
+multibyte characters, and all that. Of course, first we have to
+actually *work* with such constructs... (All four listed terms need to
+appear in the index, too.)
+
+ - Somewhere in the manual there should be a section to tell you what
+to do for interactive use. The most important issue, if each input
+line isn't a complete parse, is how to make sure it won't try to look
+ahead past the end of a line when the user expects the line to be
+complete and the program to do something.
+
+ - Somewhere in the manual there should be a section to tell you what
+to do for a parser that runs indefinitely and only exits at program
+shutdown, like you might use to handle a network protocol in a daemon.
+
+ - It would be nice to have a section or appendix that describes
+common sources of conflicts and what to do about them. Such constructs
+include:
+     - if-then-else (text already exists)
+     - repeated repetitions (a -> b..., b -> c...)
+     - poisoning lookahead by bad nesting of productions
+       (I had an example involving trying to distinguish whitespace
+       and comments at too high a level)
+     - sequences of nondelimited tokens (a -> identifier...)
+     - generalize the previous to the full lexer model
+     - ...
+
+
+------------------------------------------------------------
+EXAMPLES
+
+ - Update the examples to use agclib1 (as seen in XIDEK) instead of
+oldclasslib.
+
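
Note on the "Parser equivalence" item under BUILD: option (1), canonicalizing the tables, could look roughly like the sketch below. This is not AnaGram code. It assumes, purely for illustration, that a parse table can be modelled as a mapping from state number to {symbol: next state}; the function name canonicalize and the sample tables are hypothetical, and real tables also carry reductions and other data that would need the same renumbering. The idea is to renumber states in breadth-first order from the start state, visiting symbols in sorted order, so that two tables differing only in how their states are numbered come out identical.

```python
from collections import deque


def canonicalize(start, transitions):
    """Renumber states in breadth-first order from `start`.

    `transitions` maps state -> {symbol: next_state}. Symbols are visited
    in sorted order, so the new numbering depends only on the shape of
    the automaton, not on the original state numbers.
    """
    order = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in sorted(transitions.get(state, {})):
            target = transitions[state][symbol]
            if target not in order:
                order[target] = len(order)
                queue.append(target)
    # Rebuild the table under the new numbering.
    # States unreachable from `start` are dropped.
    return {
        order[state]: {
            symbol: order[target]
            for symbol, target in transitions.get(state, {}).items()
        }
        for state in order
    }


# Two hypothetical tables for the same automaton, numbered differently.
table_a = {0: {"a": 1, "b": 2}, 1: {"c": 3}, 2: {"c": 3}, 3: {}}
table_b = {0: {"a": 2, "b": 3}, 2: {"c": 1}, 3: {"c": 1}, 1: {}}
assert canonicalize(0, table_a) == canonicalize(0, table_b)
```

If the test suite dumped tables in this canonical form, a legitimate permutation of states would produce no diff at all, while a real change in the automaton would still show up. Whether AnaGram's tables can actually be normalized this way is the open question the item raises.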