view help2html/notes.txt @ 1:60d3ca9d3f6b

Fix stupid bug in help2html.
author David A. Holland
date Tue, 12 Aug 2008 00:13:42 -0400
parents 13d2b8934445


Notes on HTML version of help.src
---------------------------------

----- Overview of structure assumptions and treatment of help.src ------

(a) The help.src file

help.src consists of a sequence of topics, each with a title line and
a topic body. They are only partly sorted by title line. There may or
may not be "blank line"s between the topics. "Blank line"s, however,
are necessary to separate some sections in the topic bodies. It was
found necessary to disallow extra spaces or tabs in a "blank line",
i.e. a blank line is just a newline (lf) char.  It turns out Jerry had
already found it necessary to remove such extra spaces.

There must be at least one "blank line" after a title line, preceding
the topic body. A topic ends with an "end topic" line starting with
"##" beginning in column 1. The end topic line may have some spaces
before the terminating newline.

The final eof line can have leading blanks and be preceded by "blank
line"s as above.
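
For illustration only, the topic structure described in (a) can be
sketched in Python. This is not the mhh5.syn grammar. The name
split_topics and the decision to ignore stray lines between topics
(blank lines, or lines that start with a blank such as the final eof
line) are assumptions made for the sketch.

```python
# Characters that may NOT start a title line (see section (b)).
NON_LEAD_TITLE_CHARS = " \t,\x07"


def split_topics(text):
    """Split help.src-style text into (title_line, body_lines) pairs.

    Assumption: a topic ends at a line starting with "##" in column 1,
    and the next line that starts with a lead title char is a title.
    """
    topics = []
    title = None
    body = []
    for line in text.split("\n"):
        if line.startswith("##"):
            # End-topic line; trailing spaces after "##" are allowed.
            if title is not None:
                topics.append((title, body))
            title, body = None, []
        elif title is None:
            # Between topics: skip anything that cannot start a title.
            if line and line[0] not in NON_LEAD_TITLE_CHARS:
                title = line
        elif line != "" or body:
            # Drop the blank line(s) between title and body, keep the rest.
            body.append(line)
    return topics
```

Blank lines inside a body are kept, since they separate the paragraphs
discussed in (c).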


(b) Title:

The title line is one line only and begins in col. 1 with a "lead title char":

  lead title char = char - blank - tab - ',' - bullet - '\n',  char=~eof

Here a "bullet" is a control-G (bell) character, aka 0x7.

The individual topics in the title are comma-separated. They may
contain these chars:

   title char = char - ',' - bullet - '\n',  where  char=~eof
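
The two character classes above can be written out as a small Python
check. This is only a sketch: the names is_lead_title_char and
parse_title_line are hypothetical, and trimming the blanks around each
comma-separated topic is an assumption, not something these notes
specify.

```python
BULLET = "\x07"  # control-G (bell)


def is_lead_title_char(ch):
    """lead title char = char - blank - tab - ',' - bullet - '\\n'"""
    return ch not in (" ", "\t", ",", BULLET, "\n")


def parse_title_line(line):
    """Return the topic names on a title line, or None if it is not one."""
    line = line.rstrip("\n")
    if not line or not is_lead_title_char(line[0]):
        return None
    if BULLET in line:
        # A bullet is not a legal title char anywhere in the line.
        return None
    # Assumption: blanks around each name are not significant.
    return [name.strip() for name in line.split(",")]
```

Note that a text paragraph may begin with the same characters, so a
check like this only makes sense on the first line of a topic.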


(c) Topic body:

Inspection of help.src suggested that there were 6 different kinds of
paragraphs in a topic body, which were separated by blank lines. The
first paragraph was always of type "text" - this turned out to be
needed for the parsing. Remaining paragraphs could be text or look
like a table, code, or several varieties of list.

(c1) Text paragraphs
The first line of a text paragraph had to begin with a 

   lead topic char = char - blank - tab - bullet - '\n',  where char=~eof

or with a single blank followed by a lead topic char. The leading
blank seemed to occur only after a preceding table-type section, after
a blank line.  It seems that Jerry uses this construct for some
purpose, but it didn't seem to be necessary for mhh5 parsing.

Some text paragraphs had code interspersed, not always separated with
a newline, and some had a following "table" or list without a blank
line as separator.

(c2) Table paragraphs
A paragraph with a tab in col 1 of the first line was assumed to be
some sort of table. Sometimes there were several leading tabs.
Sometimes there were further sequences of tabs in the rest of the
line. Mostly there was just one set of tabs - these were easy to deal
with. There are only a couple of cases of several sets of tabs, which
correspond to 2 and 3-column tables. As of July 12, 2001, mhh5 uses the
tabs to construct multi-column tables and the contents of the table
cells are treated as code.

A table can occur following a text paragraph, tab list or one-space
list without an intervening blank line.

Treating the whole table as code enclosed in <pre> </pre> tags,
removing the leading tabs and any leading spaces, worked very well
except for the 2 or 3-column tables.
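
A minimal Python sketch of the column detection described above,
assuming that every run of one or more tabs after the leading
whitespace separates two cells. The name split_table_row is
hypothetical and mhh5 may well do this differently.

```python
import re


def split_table_row(line):
    """Drop leading tabs and spaces, then split cells on runs of tabs."""
    stripped = line.lstrip("\t ")
    if not stripped:
        return []
    return re.split(r"\t+", stripped)
```

A row that yields a single cell corresponds to the easy case of one set
of tabs; two or three cells correspond to the multi-column tables.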
 
(c3) Code paragraphs
If a paragraph began with a sequence of 2, 3, 4, 5, 6, 8, or 10 spaces
(all these sequences occur in help.src) it is considered a code
paragraph and enclosed in <pre> </pre> tags. In this case the leading
spaces are preserved.

(c4) List paragraphs
If a paragraph begins with a "bullet" (control-G) character it is
deemed to be some sort of list. Sometimes the list items are separated
by blank lines and sometimes not. The bullet was variously followed by
one space, two spaces, or a tab. The two-spaces version occurs only
once in help.src (in part of What's New) and possibly has no
significance for Jerry's parsing; maybe it could be eliminated.

All 3 list types were turned into HTML unordered lists, with the
leading bullet and spaces or tab swallowed. Since list items
sometimes were separated with blank lines and sometimes not, it was
necessary to keep an AgStack with the current list type on it.
Actually a multi-valued flag would have been sufficient, but the stack
might also be useful for tables.
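
To show why some state is needed, here is a rough Python sketch that
uses a plain flag instead of an AgStack. It assumes the body has
already been split into paragraphs at blank lines, that a non-bullet
line inside a list paragraph continues the previous item, and that all
list types map to the same <ul>. The name render_paragraphs is
hypothetical and entity replacement is left out.

```python
BULLET = "\x07"  # control-G (bell)


def render_paragraphs(paragraphs):
    """paragraphs: non-empty lists of lines, already split at blank lines."""
    out = []
    in_list = False  # the "flag": is a <ul> currently open?
    for para in paragraphs:
        is_list = para[0].startswith(BULLET)
        if is_list and not in_list:
            out.append("<ul>")
        elif not is_list and in_list:
            out.append("</ul>")
        in_list = is_list
        if is_list:
            items = []
            for line in para:
                if line.startswith(BULLET):
                    # Swallow the bullet and the spaces or tab after it.
                    items.append(line[1:].lstrip(" \t"))
                else:
                    # Assumed to be a continuation of the previous item.
                    items[-1] += " " + line.strip()
            out.extend("<li>" + item + "</li>" for item in items)
        else:
            out.append("<p>" + " ".join(para) + "</p>")
    if in_list:
        out.append("</ul>")
    return out
```

Because the flag survives from one paragraph to the next, list items
separated by blank lines end up in the same <ul> as items that are not.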

A one-space list can be followed by a table or code without an
intervening blank line and a tab list can be similarly followed by a
table.
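
Taken together, the rules in (c1) to (c4) amount to classifying a
paragraph by the first characters of its first line. A rough Python
sketch follows. It is not how mhh5 does it, and the names
classify_paragraph and "list_tab" are made up for illustration
("list1" and "list2" stand for the one-space and two-space lists).

```python
BULLET = "\x07"  # control-G (bell)


def classify_paragraph(first_line):
    """Guess the paragraph type from the first line of a paragraph."""
    if first_line.startswith("\t"):
        return "table"
    if first_line.startswith(BULLET):
        rest = first_line[1:]
        if rest.startswith("\t"):
            return "list_tab"
        if rest.startswith("  "):
            return "list2"
        # Assumption: anything else after a bullet is a one-space list.
        return "list1"
    if first_line.startswith("  "):
        # Two or more leading spaces: code, spaces are preserved.
        return "code"
    # A lead topic char, or a single blank followed by one.
    return "text"
```

This only looks at the start of a paragraph, so it does not cover the
cases noted above where a table, list or code follows another
paragraph without an intervening blank line.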

============================================================================


July 1, 2001

The list2 variation, with 2 leading spaces after the control G instead
of 1 as in list1, appears to occur *only* in What's New (not What's
New in AnaGram 2.0). There doesn't seem to be any reason for it in
terms of how AnaGram online help treats it.

The Internal Error topic has a line of code with a period after it.
This looks wrong in the HTML and also looks wrong in the online help.
The period should be removed from help.src.

Keyword topic - the list here has extra line spacing for the last list
items, plus an embedded code example, so spacing doesn't look too
good.  Possibly the line spacing could be altered in help.src. Also,
use of IF and IFF to demonstrate keyword lookahead is not a good
choice for HTML as the I looks like a vertical divider in some
fonts. Change to EX and EXT or GO and GOO?

Also, the use of double quotes around "keyword" doesn't look proper in
HTML and is probably not good even in help.src. Remove quotes in
help.src?


July 3, 2001

The exit_flag topic has a table which does not seem to be generated in
any regular manner and is basically fitted to the default help font
used by AnaGram, which is not monospaced. Combinations of spaces and
tabs have been used to get alignment:

AG_RUNNING_CODE (15 char) - preceded by a tab, then 4 spaces and 4 tabs,
                            then "= 0:", a tab, and the explanation
AG_SUCCESS_CODE (15 char) - same form as above but 8 spaces and 4 tabs
AG_SYNTAX_ERROR_CODE (20 char) - 3 spaces, 2 tabs
AG_REDUCTION_ERROR_CODE (23 char) - 1 space, 1 tab
AG_STACK_ERROR_CODE (19 char) - 3 spaces, 3 tabs
AG_SEMANTIC_ERROR_CODE (22 char) - 1 space, 2 tabs

It is desirable to keep the width to a minimum here because it governs
the width chosen by the browser for displaying *all* the text.
It is possible either to create a table (ugh) or to swallow the spaces
and tabs in favor of a single space.
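
The second option is easy to sketch in Python. The name
collapse_alignment is hypothetical, and dropping the leading tab
entirely (rather than keeping one space) is an assumption.

```python
import re


def collapse_alignment(line):
    """Replace each run of spaces and tabs with a single space."""
    return re.sub(r"[ \t]+", " ", line.strip(" \t"))
```

For example, a line of the form tab, AG_RUNNING_CODE, 4 spaces, 4 tabs,
"= 0:", tab, explanation comes out as "AG_RUNNING_CODE = 0: explanation".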

The What's New topic has an example under Bug Fixes which uses a lot
of spaces before the reduction procedure. These make the HTML page too
wide, and in the online help they do not seem to appear at all,
probably being replaced with a single space (?). This is a code
paragraph, leading with 8 and 10 spaces, not a table paragraph; it is
hard to see what to do other than change help.src.


July 4, 2001

mhh5 now inserts links properly. Turns out that some links in help.src
include the "s" at the end inside the link if they are plural, even if
the help topic is singular. So it is necessary to strip the "s" and
test again for a matching topic title.
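
The lookup described here might look like the following Python sketch.
The name resolve_link and the use of a set of exact, case-sensitive
titles are assumptions. mhh5 itself is an AnaGram program and may match
titles differently.

```python
def resolve_link(link_text, topic_titles):
    """Return the topic title a link refers to, or None if none matches."""
    if link_text in topic_titles:
        return link_text
    # Plural link to a singular topic: strip the "s" and test again.
    if link_text.endswith("s") and link_text[:-1] in topic_titles:
        return link_text[:-1]
    return None
```

Testing the unmodified text first keeps topics whose titles really do
end in "s" from being mangled.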

Topics still in need of some adjustment are:
  exit_flag
  Internal Error
  Keyword
  Virtual Production
  What's New
  Character Constant
  Data Type
  PCB_TYPE
Some of these are discussed above. Mostly the problem is use of tabs
and spaces in help.src in a way that does not lend itself to general
rules to apply to the HTML.

Still need to be able to replace <, >, and & in help.src with the
appropriate entities before creating the HTML version.
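
In Python the replacement would be a one-liner. It is shown here only
to make the ordering issue explicit. The name escape_entities is
hypothetical. The standard library call html.escape(text, quote=False)
produces the same result.

```python
def escape_entities(text):
    """Replace &, < and > with HTML entities.

    The ampersand must be handled first, otherwise the "&" in the
    freshly inserted "&lt;" and "&gt;" would be escaped a second time.
    """
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
```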


July 12, 2001

mhh5 now inserts entities and can handle multiple-column tables (as
detected by tab sequences in the table lines).


Aug 3, 2001

mhh5.syn should have some comments inserted at the beginning to
mention this notes.txt file and to say what the program does. It
should also probably print a comment at the beginning of the html file
along the lines of:

  " All the information in this file may also be accessed from within
AnaGram using Help Topics on the Help menu. This HTML file is provided
because it is sometimes more convenient to read the topics with your
browser, and because the browser can be used to search the whole file.
This is a long file with many links; some older browsers slow to a
crawl. Netscape works fine."

I am holding off on doing this because it means another round of
compilation and testing and Jerry has not been available to even look
at the output or pass on the above wording.

Moreover, the output file has not been tested with up-to-date versions
of Microsoft Internet Explorer. The old version on Secondo really does
slow to a crawl, and with it the file is not really usable. It is not
clear whether modifying the HTML would help if the newer Internet
Explorer versions can't deal with it either.