diff options
Diffstat (limited to 'doc/release2.html')
| -rw-r--r-- | doc/release2.html | 546 |
1 files changed, 546 insertions, 0 deletions
diff --git a/doc/release2.html b/doc/release2.html new file mode 100644 index 000000000..d34b49cc1 --- /dev/null +++ b/doc/release2.html @@ -0,0 +1,546 @@ +<html> + +<body bgcolor="#FFFFFF" text="#000000"> + +<center> + +<h1>Grammatical Framework Version 2</h1> + +Release of Version 2.0 + +<p> + +Planned: 24 June 2004 + +<p> + +<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a> + +</center> + + +<!-- NEW --> + +<h2>Highlights</h2> + +Module system. + +<p> + +Separate compilation to canonical GF. + +<p> + +Improved GUI. + +<p> + +Improved parser generation. + +<p> + +Improved shell (new commands and options, help, error messages). + +<p> + +Accurate <a href="DocGF.pdf">language specification</a> +(also of GFC). + +<p> + +Extended resource library. + +<p> + +Extended Numerals library. + + +<!-- NEW --> + + + +<h2>Module system</h2> + +<li> Separate modules for <tt>abstract</tt>, + <tt>concrete</tt>, and <tt>resource</tt>. +<li> Replaces the file-based <tt>include</tt> system +<li> Name space handling with qualified names +<li> Hierarchic structure (single inheritance <tt>**</tt>) + + cross-cutting reuse (<tt>open</tt>) +<li> Separate compilation, one module per file +<li> Reuse of <tt>abstract</tt>+<tt>concrete</tt> as <tt>resource</tt> +<li> Parametrized modules: + <tt>interface</tt>, <tt>instance</tt>, <tt>incomplete</tt>. +<li> New experimental module types: <tt>transfer</tt>, + <tt>union</tt>. + + +<!-- NEW --> + +<h4>Canonical format GFC</h4> + +<li> The target of GF compiler; to reuse, just read in. + +<li> Readable by Haskell/Java/C++/C applications (by BNFC generated parsers). + + + +<!-- NEW --> + +<h4>New features in expression language</h4> + +In addition to the module system: + +<p> + +<li> Disjunctive patterns <tt>P | ... | Q</tt>. +<li> String patterns <tt>"foo"</tt>. +<li> (?) Integer patterns <tt>74</tt>. +<li> Binding token <tt>&+</tt> to glue separate tokens at unlexing phase, + and unlexer to resolve this. +<li> New syntax alternatives for local definitions: <tt>let</tt> without + braces and <tt>where</tt>. +<li> Pattern variables can be used on lhs's of <tt>oper</tt> definitions. +<li> New Unicode transliterations (by Harad Hammarström). + + +<!-- NEW --> + +<h4>New shell commands and command functionalities</h4> + +<li> <tt>pi</tt> = <tt>print_info</tt>: information on an identifier in scope. +<li> <tt>h</tt> = <tt>help</tt> now in long or short form, + and on individual commands. +<li> <tt>gt</tt> = <tt>generate_trees</tt>: all trees of a given + category or instantiations of a given incomplete term, up to a + given depth. +<li> <tt>gr</tt> = <tt>generate_random</tt> can now be given + an incomplete term as an argument, to constrain generation. +<li> <tt>so</tt> = <tt>show_opers</tt> shows all <tt>ope</tt> + operations with a given value type. +<li> <tt>pm</tt> = <tt>print_multi</tt> prints the multilingual + grammar resident in the current state to a ready-compiles + <tt>.gfcm</tt> file. +<li> All commands have both long and short names (see help). Short + names are easier to type, whereas long names + make scripts more readable. +<li> Meaningless command options generate warnings. + + +<!-- NEW --> + +<h4>New editor features</h4> + +<li> Active text field: click the middle button in the focus to send + in refinement through the parser. +<li> Clipboard: copy complex terms into the refine menu. +<li> Two-step refinements generated by the "Generate" operation. + +<!-- NEW --> + +<h4>Improved implementation</h4> + +<li> Haskell source code is organized into subdirectories. +<li> BNF Converter is used for defining the languages GF and GFC, which also + give reliable LaTeX documentation. +<li> Lexical rules sorted out by option <tt>-cflexer</tt> for efficient + parsing with large lexica. +<li> GHC optimizations and strictness flags are used for improving performance. + + +<!-- NEW --> + +<h4>New parser (work in progress)</h4> + +<li> By Peter Ljunglöf, based on MCFG. +<li> Much more efficient for morphology and discontinuous constituents. +<li> Treatment of cyclic rules. +<li> Currently lots of alternative parsers via flags <tt>-parser=newX</tt>. + + +<!-- NEW --> + +<h2>Status (21/6/2004)</h2> + +Grammar compiler, editor GUIs, and shell work for all platforms +(with restrictions for Solaris). + +<p> + +The updated <tt>HelpFile</tt> (accessible through <tt>h</tt> command) +marks unsupported features present in GF 1.2 with <tt>*</tt>. +They will be supported again if interested users appear. + +<p> + +GF1 grammars can be automatically translated to GF2 (although the +result is not as good +as manual, since indentation and comments are destroyed). The results can be +saved in GF2 files, but this is not necessary. +Some rarely used GF1 features are no longer supported (see next section). + +<p> + +It is also possible to write a GF2 grammar back to GF1, with the +command <tt>pg -printer=old</tt>. + + +<!-- NEW --> + +Resource libraries +and some example grammars and have been +converted. Most old example grammars work without any changes. +There is a new resource API with +many new constructions. + +<p> + +A make facility works, finding out which modules have to be recompiled. + +<p> + +Soundness checking of module depencencies and completeness is not +complete. This means that some errors may show up too late. + +<p> + +The environment variable <tt>GF_LIB_PATH</tt> needs some more work. + +<p> + +Latex and XML printing of grammars do not work yet. + + + +<!-- NEW --> + +<h2>How to use GF 1.* files</h2> + +Backward compatibility with respect to old GF grammars has been +a central goal. All GF grammars, from version 0.9, should work in +the old way in GF2. The main exceptions are some features that +are rarely used. +<ul> +<li> The <tt>package</tt> system introduced in GF 1.2, cannot be + interpreted in the module system of GF 2.0, since packages are in + mutual scope with the top level. +<li> <tt>tokenizer</tt> pragmas are cannot be parsed any more. In GF + 1.2, they are already replaced by <tt>lexer</tt> flags. +<li> <tt>var</tt> pragmas cannot be parsed any more. +</ul> + +<p> + +Very old GF grammars (from versions before 0.9), with the completely +different notation, do not work. They should be first converted to +GF1 by using GF version 1.2. + + +<!-- NEW --> + + +The import command <tt>i</tt> can be given the option <tt>-old</tt>. E.g. +<pre> + i -old tut1.Eng.g2 +</pre> +But this is no more necessary: GF2 detects automatically if a grammar +is in the GF1 format. + +<p> + +Importing a set of GF2 files generates, internally, three modules: +<pre> + abstract tut1 = ... + resource ResEng = ... + concrete Eng of tut1 = open ResEng in ... +</pre> +(The names are different if the file name has fewer parts.) + + +<p> + +The option <tt>-o</tt> causes GF2 to write these modules into files. + + +<!-- NEW --> + +The flags <tt>-abs</tt>, <tt>-cnc</tt>, and <tt>-res</tt> can be used +to give custom names to the modules. In particular, it is good to use +the <tt>-abs</tt> flag to guarantee that the abstract syntax module +has the same name for all grammars in a multilingual environmens: +<pre> + i -old -abs=Numerals hungarian.gf + i -old -abs=Numerals tamil.gf + i -old -abs=Numerals sanskrit.gf +</pre> + +<p> + +The same flags as in the import command can be used when invoking +GF2 from the system shell. Many grammars can be imported on the same command +line, e.g. +<pre> + % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf +</pre> + +<p> + +To write a GF2 grammar back to GF1 (as one big file), use the command +<pre> + > pg -old +</pre> + + + +<!-- NEW --> + + + +GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor +replaces every identifier that has the shape of a new reserved word +with a variant where the last letter is replaced by <tt>Z</tt>, e.g. +<tt>instance</tt> is replaced by <tt>instancZ</tt>. This method is of course +unsafe and should be replaced by something better. + + + + +<!-- NEW --> + +<h2>Abstract, concrete, and resource modules</h2> + +Judgement forms are sorted as follows: +<ul> +<li> abstract: + <tt>cat</tt>, <tt>fun</tt>, <tt>def</tt>, <tt>data</tt>, <tt>flags</tt> +<li> concrete: + <tt>lincat</tt>, <tt>cat</tt>, <tt>printname</tt>, <tt>flags</tt> +<li> resource: + <tt>param</tt>, <tt>oper</tt>, <tt>flags</tt> +<li> +</ul> + + +<!-- NEW --> + +Example: +<pre> + abstract Sums = { + cat + Exp ; + fun + One : Exp ; + plus : Exp -> Exp -> Exp ; + } + + concrete EnglishSums of Sums = open ResEng in { + lincat + Exp = {s : Str ; n : Number} ; + lin + One = expSg "one" ; + sum x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ; + } + + resource ResEng = { + param + Number = Sg | Pl ; + oper + expSG : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ; + } +</pre> + + + +<!-- NEW --> + +<h2>Opening and extending modules</h2> + +A <tt>concrete</tt> or <tt>resource</tt> can <b>open</b> a +<tt>resource</tt>. This means that +<ul> +<li> the names defined in <tt>resource</tt> can be used ("become visible") +<li> but: these names are not included in ("exported from") the opening module +</ul> +A module of any type can moreover <b>extend</b> a module of the same type. +This means that +<ul> +<li> the names defined in the extended module can be used ("become visible") +<li> and also: these names are included in ("exported from") the extending module +</ul> +Examples of extension: +<pre> + abstract Products = Sums ** { + fun times : Exp -> Exp -> Exp ; + } + -- names exported: Exp, plus, times + + concrete English of Products = EnglishSums ** open ResEng in { + lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ; + } +</pre> +Another important difference: +<li> extension is single +<li> opening can be multiple: <tt>open Foo, Bar, Baz in {...}</tt> + +<!-- NEW --> + +Moreover: +<li> opening can be <b>qualified</b> +<p> +Example of qualified opening: +<pre> + concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in { + lin + BZero = Bin.Zero ; + DZero = Dec.Zero + } +</pre> + + +<!-- NEW --> + +<h2>Compiling modules</h2> + +Separate compilation assumes there is <b>one module per file</b>. + +<p> + +The <b>module header</b> is the beginning of the module code up to the +first left bracket (<tt>{</tt>). The header gives +<ul> +<li> the module type: <tt>abstract</tt>, <tt>concrete</tt> (<tt>of</tt> <i>A</i>), + or <tt>resource</tt> +<li> the name of the module (next to the module type keyword) +<li> the name of extended module (between <tt>=</tt> and <tt>**</tt>) +<li> the names of opened modules +</ul> + +<!-- NEW --> + + +<b>filename</b> = <b>modulename</b> <tt>.</tt> <b>extension</b> + +<p> + +File name extensions: +<ul> +<li> <tt>gf</tt>: GF source file (uses GF syntax, is type checked and compiled) +<li> <tt>gfc</tt>: canonical GF file (uses GFC syntax, is simply read +in instead of compiled; produced from all kinds of modules) +<li> <tt>gfr</tt>: GF resource file (uses GF syntax, is only read in; produced from +<tt>resource</tt> modules) +<li> <tt>gfcm</tt>: canonical multilingual GF file +(uses GFC syntax, is only read in; produced +from a set of <tt>abstract</tt> and <tt>conctrete</tt> modules) +</ul> +Only <tt>gf</tt> files should ever be written/edited manually! + + + +<!-- NEW --> + + +What the make facility does when compiling <tt>Foo.gf</tt> +<ol> +<li> read the module header of <tt>Foo.gf</tt>, and recursively all headers from +the modules it <b>depends</b> on (i.e. extends or opens) +<li> build a dependency graph of these modules, and do topological sorting +<li> starting from the first module in topological order, +compare the modification times of each <tt>gf</tt> and <tt>gfc</tt> file: +<ul> +<li> if <tt>gf</tt> is later, compile the module and all modules depending on it +<li> if <tt>gfc</tt> is later, just read in the module +</ul> +</ol> +Inside the GF shell, also time stamps of modules read into memory are +taken into account. Thus a module need not be read from a file if the +module is in the memory and the file has not been modified. + + +<!-- NEW --> + +If the compilation of a grammar fails at some module, the state of the +GF shell contains all modules read up to that point. This makes it +faster to compile the faulty module again after fixing it. + +<p> + +Use the command <tt>po</tt> = <tt>print_options</tt> to see what +modules are in the state. + +<p> + +To force compilation: +<ul> +<li> The flag <i>-src</i> in the import command forces compilation from + source even if more recent object files exist. This is useful + when testing new versions of GF. +<li> The flag <i>-retain</i> in the import command forces reading in + <tt>gfr</tt> files in addition to <tt>gfc</tt> files. This is useful + when testing operations with the <tt>cc</tt> command. +</ul> + +<!-- NEW --> + +<h2>Module search paths</h2> + +Modules can reside in different directories. Use the <tt>path</tt> +flag to extend the directory search path. For instance, +<pre> + -path=.:../resource/russian:../prelude +</pre> +enables files to be found in three different directories. +By default, only the current directory is included. +If a <tt>path</tt> flag is given, the current directory +<tt>.</tt> must be explicitly included if it is wanted. + +<p> + +The <tt>path</tt> flag can be set in any of the following +places: +<ul> +<li> when invoking GF: <tt>gf -path=xxx</tt> +<li> when importing a module: <tt>i -path=xxx Foo.gf</tt> +<li> as a pragma in a topmost file: <tt>--# -path=xxx</tt> +</ul> +A flag set on a command line overrides ones set in files. + +<p> + +The value of the environment variable <tt>GF_LIB_PATH</tt> is +appended to the user-given path. + + +<!-- NEW --> + +<h2>To do</h2> + +Testing + +<p> + +Documentation + +<p> + +Packaging + + + +<!-- NEW --> + +<h2>Nasty details</h2> + + +<li> Readline in Solaris + +<li> Proper treatment file search paths + +<li> Unicode fonts in GUIs + +<li> directionality of Semitic alphabets + + + +</body> +</html> |
