| field | value | date |
|---|---|---|
| author | aarne <aarne@cs.chalmers.se> | 2008-06-27 11:32:49 +0000 |
| committer | aarne <aarne@cs.chalmers.se> | 2008-06-27 11:32:49 +0000 |
| commit | 64d2a981a99c8f48f85c4efd0cecd1db1e5ce93a | |
| tree | 8ec777785ae6b99e4ade6ab7c97a7653317b82ad | |
| parent | 032531c6a690edbb377ff11ee2a743a30c5bf500 | |
more rm in doc
| mode | file | lines deleted |
|---|---|---|
| -rw-r--r-- | doc/gf-course.html | 221 |
| -rw-r--r-- | doc/gf-course.txt | 149 |
| -rw-r--r-- | doc/gf-help.txt | 699 |
| -rw-r--r-- | doc/gf-history.html | 865 |
| -rw-r--r-- | doc/gf-modules.html | 1183 |
| -rw-r--r-- | doc/gf-modules.txt | 994 |
| -rw-r--r-- | doc/overview-resource.txt | 300 |
7 files changed, 0 insertions, 4411 deletions
diff --git a/doc/gf-course.html b/doc/gf-course.html deleted file mode 100644 index 039bbe72c..000000000 --- a/doc/gf-course.html +++ /dev/null @@ -1,221 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> -<HTML> -<HEAD> -<META NAME="generator" CONTENT="http://txt2tags.sf.net"> -<TITLE>Graduate Course: GF (Grammatical Framework)</TITLE> -</HEAD><BODY BGCOLOR="white" TEXT="black"> -<P ALIGN="center"><CENTER><H1>Graduate Course: GF (Grammatical Framework)</H1> -<FONT SIZE="4"> -<I>Aarne Ranta</I><BR> -Wed Oct 24 09:49:27 2007 -</FONT></CENTER> - -<P> -<A HREF="http://www.gslt.hum.gu.se">GSLT</A>, -<A HREF="http://ngslt.org/">NGSLT</A>, -and -<A HREF="http://www.chalmers.se/cse/EN/">Department of Computer Science and Engineering</A>, -Chalmers University of Technology and Gothenburg University. -</P> -<P> -Autumn Term 2007. -</P> -<H1>News</H1> -<P> -24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to -the table below. Material (new chapters) will appear later today. -It will explain some of the files in -</P> -<UL> -<LI><A HREF="http://digitalgrammars.com/gf/examples/tutorial/syntax/"><CODE>syntax/</CODE></A>: - linguistic grammar programming -<LI><A HREF="http://digitalgrammars.com/gf/examples/tutorial/semantics/"><CODE>semantics/</CODE></A>: - a question-answer system based on logical semantics -</UL> - -<P> -12/9 The course starts tomorrow at 8.00. A detailed plan for the day is -right below. Don't forget to -</P> -<UL> -<LI>join the mailing list (send a mail to <CODE>gf-subscribe at gslt hum gu se</CODE>) -<LI>install GF on your laptops from <A HREF="../download.html">here</A> -<LI>take with you a copy of the book (as sent to the mailing list yesterday) -</UL> - -<P> -31/8 Revised the description of the one- and five-point variants. -</P> -<P> -21/8 Course mailing list started. 
-To subscribe, send a mail to <CODE>gf-subscribe at gslt hum gu se</CODE> -(replacing spaces by dots except around the word at, where the spaces -are just removed, and the word itself is replaced by the at symbol). -</P> -<P> -20/8/2007 <A HREF="http://www.gslt.hum.gu.se/courses/schedule.html">Schedule</A>. -The course will start on Thursday 13 September in Room C430 at the Humanities -Building of Gothenburg University ("Humanisten"). -</P> -<H1>Plan</H1> -<P> -First week (13-14/9) -</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Time</TH> -<TH>Subject</TH> -<TH COLSPAN="2">Assignment</TH> -</TR> -<TR> -<TD>Thu 8.00-9.30</TD> -<TD>Chapters 1-3</TD> -<TD>Hello and Food in a new language</TD> -</TR> -<TR> -<TD>Thu 10.00-11.30</TD> -<TD>Chapters 3-4</TD> -<TD>Foods in a new language</TD> -</TR> -<TR> -<TD>Thu 13.15-14.45</TD> -<TD>Chapter 5</TD> -<TD>ExtFoods in a new language</TD> -</TR> -<TR> -<TD>Thu 15.15-16.45</TD> -<TD>Chapters 6-7</TD> -<TD>straight code compiler</TD> -</TR> -<TR> -<TD>Fri 8.00-9.30</TD> -<TD>Chapters 8</TD> -<TD>application in Haskell or Java</TD> -</TR> -</TABLE> - -<P></P> -<P> -Second week (25/10) -</P> -<TABLE CELLPADDING="4" BORDER="1"> -<TR> -<TH>Time</TH> -<TH>Subject</TH> -<TH COLSPAN="2">Assignment</TH> -</TR> -<TR> -<TD>Thu 8.15-9.45</TD> -<TD>Chapters 13-15</TD> -<TD>mini resource in a new language</TD> -</TR> -<TR> -<TD>Thu 10.15-11.45</TD> -<TD>Chapters 12,16</TD> -<TD>query system for a new domain</TD> -</TR> -<TR> -<TD>Thu 13.15-14.45</TD> -<TD>presentations</TD> -<TD>explain your own project</TD> -</TR> -</TABLE> - -<P></P> -<P> -The structure of each lecture will be the following: -</P> -<UL> -<LI>ca. 75min lecture, going through the book -<LI>ca. 15min work on computer, individually or in pairs -</UL> - -<P> -In order for this to work out, it is important that enough many -have a working GF installation, including the directory -<A HREF="../examples/tutorial"><CODE>examples/tutorial</CODE></A>. 
This directory is -included in the Darcs version, as well as in the updated binary -packages from 12 September. -</P> -<H1>Purpose</H1> -<P> -<A HREF="http://www.cs.chalmers.se/~aarne/GF/">GF</A> -(Grammatical Framework) is a grammar formalism, i.e. a special-purpose -programming language for writing grammars. It is suitable for many -natural language processing tasks, in particular, -</P> -<UL> -<LI>multilingual applications -<LI>systems where grammar-based components are needed for e.g. - parsing, translation, or speech recognition -</UL> - -<P> -The goal of the course is to develop an understanding of GF and -practical skills in using it. -</P> -<H1>Contents</H1> -<P> -The course consists of two modules. The first module is a one-week -intensive course (during the first intensive week of GSLT), which -is as such usable as a one-week intensive course for doctoral studies, -if completed with a small course project. -</P> -<P> -The second module is a larger programming project, written -by each student (possibly working in groups) during the Autumn term. -The projects are discussed during the second intensive week of GSLT -(see <A HREF="http://www.gslt.hum.gu.se/courses/schedule.html">schedule</A>), -and presented at a date that will be set later. -</P> -<P> -The first module goes through the basics of GF, including -</P> -<UL> -<LI>using the GF programming language -<LI>writing multilingual grammars -<LI>using the - <A HREF="http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/">GF resource grammar library</A> -<LI>generating speech recognition systems from GF grammars -<LI>using embedded grammars as components of software systems -</UL> - -<P> -The lectures follow a draft of GF book. It contains a heavily updated -version os the -<A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>; -thus the on-line tutorial is not adequate for this course. To get the course -book, join the course mailing list. 
-</P> -<P> -Those who just want to do the first module will write a simple application -as their course work during and after the first intensive week. -</P> -<P> -Those who continue with the second module will choose a more substantial -project. Possible topics are -</P> -<UL> -<LI>building a dialogue system by using GF -<LI>implementing a multilingual document generator -<LI>experimenting with synthetized multilingual tree banks -<LI>extending the GF resource grammar library -</UL> - -<H1>Prerequisites</H1> -<P> -Experience in programming. No earlier natural language processing -or functional programming experience is necessary. -</P> -<P> -The course is thus suitable both for GSLT and NGSLT students, -and for graduate students in computer science. -</P> -<P> -We will in particular welcome students from the Baltic countries -who wish to build resources for their own language in GF. -</P> - -<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) --> -<!-- cmdline: txt2tags gf-course.txt --> -</BODY></HTML> diff --git a/doc/gf-course.txt b/doc/gf-course.txt deleted file mode 100644 index 846186049..000000000 --- a/doc/gf-course.txt +++ /dev/null @@ -1,149 +0,0 @@ -Graduate Course: GF (Grammatical Framework) -Aarne Ranta -%%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags -thtml --toc gf-reference.html - -%!target:html - -[GSLT http://www.gslt.hum.gu.se], -[NGSLT http://ngslt.org/], -and -[Department of Computer Science and Engineering http://www.chalmers.se/cse/EN/], -Chalmers University of Technology and Gothenburg University. - -Autumn Term 2007. - - -=News= - -24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to -the table below. Material (new chapters) will appear later today. 
-It will explain some of the files in -- [``syntax/`` http://digitalgrammars.com/gf/examples/tutorial/syntax/]: - linguistic grammar programming -- [``semantics/`` http://digitalgrammars.com/gf/examples/tutorial/semantics/]: - a question-answer system based on logical semantics - - - -12/9 The course starts tomorrow at 8.00. A detailed plan for the day is -right below. Don't forget to -- join the mailing list (send a mail to ``gf-subscribe at gslt hum gu se``) -- install GF on your laptops from [here ../download.html] -- take with you a copy of the book (as sent to the mailing list yesterday) - - -31/8 Revised the description of the one- and five-point variants. - -21/8 Course mailing list started. -To subscribe, send a mail to ``gf-subscribe at gslt hum gu se`` -(replacing spaces by dots except around the word at, where the spaces -are just removed, and the word itself is replaced by the at symbol). - -20/8/2007 [Schedule http://www.gslt.hum.gu.se/courses/schedule.html]. -The course will start on Thursday 13 September in Room C430 at the Humanities -Building of Gothenburg University ("Humanisten"). - - -=Plan= - -First week (13-14/9) - -|| Time | Subject | Assignment || -| Thu 8.00-9.30 | Chapters 1-3 | Hello and Food in a new language | -| Thu 10.00-11.30 | Chapters 3-4 | Foods in a new language | -| Thu 13.15-14.45 | Chapter 5 | ExtFoods in a new language | -| Thu 15.15-16.45 | Chapters 6-7 | straight code compiler | -| Fri 8.00-9.30 | Chapters 8 | application in Haskell or Java | - -Second week (25/10) - -|| Time | Subject | Assignment || -| Thu 8.15-9.45 | Chapters 13-15 | mini resource in a new language | -| Thu 10.15-11.45 | Chapters 12,16 | query system for a new domain | -| Thu 13.15-14.45 | presentations | explain your own project | - - - -The structure of each lecture will be the following: -- ca. 75min lecture, going through the book -- ca. 
15min work on computer, individually or in pairs - - -In order for this to work out, it is important that enough many -have a working GF installation, including the directory -[``examples/tutorial`` ../examples/tutorial]. This directory is -included in the Darcs version, as well as in the updated binary -packages from 12 September. - - - -=Purpose= - -[GF http://www.cs.chalmers.se/~aarne/GF/] -(Grammatical Framework) is a grammar formalism, i.e. a special-purpose -programming language for writing grammars. It is suitable for many -natural language processing tasks, in particular, -- multilingual applications -- systems where grammar-based components are needed for e.g. - parsing, translation, or speech recognition - - -The goal of the course is to develop an understanding of GF and -practical skills in using it. - - -=Contents= - -The course consists of two modules. The first module is a one-week -intensive course (during the first intensive week of GSLT), which -is as such usable as a one-week intensive course for doctoral studies, -if completed with a small course project. - -The second module is a larger programming project, written -by each student (possibly working in groups) during the Autumn term. -The projects are discussed during the second intensive week of GSLT -(see [schedule http://www.gslt.hum.gu.se/courses/schedule.html]), -and presented at a date that will be set later. - -The first module goes through the basics of GF, including -- using the GF programming language -- writing multilingual grammars -- using the - [GF resource grammar library http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/] -- generating speech recognition systems from GF grammars -- using embedded grammars as components of software systems - - -The lectures follow a draft of GF book. It contains a heavily updated -version os the -[GF Tutorial http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html]; -thus the on-line tutorial is not adequate for this course. 
To get the course -book, join the course mailing list. - -Those who just want to do the first module will write a simple application -as their course work during and after the first intensive week. - -Those who continue with the second module will choose a more substantial -project. Possible topics are -- building a dialogue system by using GF -- implementing a multilingual document generator -- experimenting with synthetized multilingual tree banks -- extending the GF resource grammar library - - - -=Prerequisites= - -Experience in programming. No earlier natural language processing -or functional programming experience is necessary. - -The course is thus suitable both for GSLT and NGSLT students, -and for graduate students in computer science. - -We will in particular welcome students from the Baltic countries -who wish to build resources for their own language in GF. - diff --git a/doc/gf-help.txt b/doc/gf-help.txt deleted file mode 100644 index d77e9aff7..000000000 --- a/doc/gf-help.txt +++ /dev/null @@ -1,699 +0,0 @@ -=GF Command Help= - -Each command has a long and a short name, options, and zero or more -arguments. Commands are sorted by functionality. The short name is -given first. - -Commands and options marked with * are currently not implemented. - -==Commands that change the state== - -``` -i, import: i File - Reads a grammar from File and compiles it into a GF runtime grammar. - Files "include"d in File are read recursively, nubbing repetitions. - If a grammar with the same language name is already in the state, - it is overwritten - but only if compilation succeeds. 
- The grammar parser depends on the file name suffix: - .gf normal GF source - .gfc canonical GF - .gfr precompiled GF resource - .gfcm multilingual canonical GF - .gfe example-based grammar files (only with the -ex option) - .gfwl multilingual word list (preprocessed to abs + cncs) - .ebnf Extended BNF format - .cf Context-free (BNF) format - .trc TransferCore format - options: - -old old: parse in GF<2.0 format (not necessary) - -v verbose: give lots of messages - -s silent: don't give error messages - -src from source: ignore precompiled gfc and gfr files - -gfc from gfc: use compiled modules whenever they exist - -retain retain operations: read resource modules (needed in comm cc) - -nocf don't build old-style context-free grammar (default without HOAS) - -docf do build old-style context-free grammar (default with HOAS) - -nocheckcirc don't eliminate circular rules from CF - -cflexer build an optimized parser with separate lexer trie - -noemit do not emit code (default with old grammar format) - -o do emit code (default with new grammar format) - -ex preprocess .gfe files if needed - -prob read probabilities from top grammar file (format --# prob Fun Double) - -treebank read a treebank file to memory (xml format) - flags: - -abs set the name used for abstract syntax (with -old option) - -cnc set the name used for concrete syntax (with -old option) - -res set the name used for resource (with -old option) - -path use the (colon-separated) search path to find modules - -optimize select an optimization to override file-defined flags - -conversion select parsing method (values strict|nondet) - -probs read probabilities from file (format (--# prob) Fun Double) - -preproc use a preprocessor on each source file - -noparse read nonparsable functions from file (format --# noparse Funs) - examples: - i English.gf -- ordinary import of Concrete - i -retain german/ParadigmsGer.gf -- import of Resource to test - -r, reload: r - Executes the previous import (i) command. 
- -rl, remove_language: rl Language - Takes away the language from the state. - -e, empty: e - Takes away all languages and resets all global flags. - -sf, set_flags: sf Flag* - The values of the Flags are set for Language. If no language - is specified, the flags are set globally. - examples: - sf -nocpu -- stop showing CPU time - sf -lang=Swe -- make Swe the default concrete - -s, strip: s - Prune the state by removing source and resource modules. - -dc, define_command Name Anything - Add a new defined command. The Name must star with '%'. Later, - if 'Name X' is used, it is replaced by Anything where #1 is replaced - by X. - Restrictions: Currently at most one argument is possible, and a defined - command cannot appear in a pipe. - To see what definitions are in scope, use help -defs. - examples: - dc %tnp p -cat=NP -lang=Eng #1 | l -lang=Swe -- translate NPs - %tnp "this man" -- translate and parse - -dt, define_term Name Tree - Add a constant for a tree. The constant can later be called by - prefixing it with '$'. - Restriction: These terms are not yet usable as a subterm. - To see what definitions are in scope, use help -defs. - examples: - p -cat=NP "this man" | dt tm -- define tm as parse result - l -all $tm -- linearize tm in all forms -``` - -==Commands that give information about the state== - -``` -pg, print_grammar: pg - Prints the actual grammar (overridden by the -lang=X flag). - The -printer=X flag sets the format in which the grammar is - written. - N.B. since grammars are compiled when imported, this command - generally does not show the grammar in the same format as the - source. In particular, the -printer=latex is not supported. - Use the command tg -printer=latex File to print the source - grammar in LaTeX. - options: - -utf8 apply UTF8-encoding to the grammar - flags: - -printer - -lang - -startcat -- The start category of the generated grammar. - Only supported by some grammar printers. 
- examples: - pg -printer=cf -- show the context-free skeleton - -pm, print_multigrammar: pm - Prints the current multilingual grammar in .gfcm form. - (Automatically executes the strip command (s) before doing this.) - options: - -utf8 apply UTF8 encoding to the tokens in the grammar - -utf8id apply UTF8 encoding to the identifiers in the grammar - examples: - pm | wf Letter.gfcm -- print the grammar into the file Letter.gfcm - pm -printer=graph | wf D.dot -- then do 'dot -Tps D.dot > D.ps' - -vg, visualize_graph: vg - Show the dependency graph of multilingual grammar via dot and gv. - -po, print_options: po - Print what modules there are in the state. Also - prints those flag values in the current state that differ from defaults. - -pl, print_languages: pl - Prints the names of currently available languages. - -pi, print_info: pi Ident - Prints information on the identifier. -``` - -==Commands that execute and show the session history== - -``` -eh, execute_history: eh File - Executes commands in the file. - -ph, print_history; ph - Prints the commands issued during the GF session. - The result is readable by the eh command. - examples: - ph | wf foo.hist" -- save the history into a file -``` - - -==Linearization, parsing, translation, and computation== - -``` -l, linearize: l PattList? Tree - Shows all linearization forms of Tree by the actual grammar - (which is overridden by the -lang flag). - The pattern list has the form [P, ... ,Q] where P,...,Q follow GF - syntax for patterns. All those forms are generated that match with the - pattern list. Too short lists are filled with variables in the end. - Only the -table flag is available if a pattern list is specified. - HINT: see GF language specification for the syntax of Pattern and Term. - You can also copy and past parsing results. - options: - -struct bracketed form - -table show parameters (not compatible with -record, -all) - -record record, i.e. 
explicit GF concrete syntax term (not compatible with -table, -all) - -all show all forms and variants (not compatible with -record, -table) - -multi linearize to all languages (can be combined with the other options) - flags: - -lang linearize in this grammar - -number give this number of forms at most - -unlexer filter output through unlexer - examples: - l -lang=Swe -table -- show full inflection table in Swe - -p, parse: p String - Shows all Trees returned for String by the actual - grammar (overridden by the -lang flag), in the category S (overridden - by the -cat flag). - options for batch input: - -lines parse each line of input separately, ignoring empty lines - -all as -lines, but also parse empty lines - -prob rank results by probability - -cut stop after first lexing result leading to parser success - -fail show strings whose parse fails prefixed by #FAIL - -ambiguous show strings that have more than one parse prefixed by #AMBIGUOUS - options for selecting parsing method: - -fcfg parse using a fast variant of MCFG (default is no HOAS in grammar) - -old parse using an overgenerating CFG (default if HOAS in grammar) - -cfg parse using a much less overgenerating CFG - -mcfg parse using an even less overgenerating MCFG - Note: the first time parsing with -cfg, -mcfg, and -fcfg may take a long time - options that only work for the -old default parsing method: - -n non-strict: tolerates morphological errors - -ign ignore unknown words when parsing - -raw return context-free terms in raw form - -v verbose: give more information if parsing fails - flags: - -cat parse in this category - -lang parse in this grammar - -lexer filter input through this lexer - -parser use this parsing strategy - -number return this many results at most - examples: - p -cat=S -mcfg "jag är gammal" -- parse an S with the MCFG - rf examples.txt | p -lines -- parse each non-empty line of the file - -at, apply_transfer: at (Module.Fun | Fun) - Transfer a term using Fun from Module, or the 
topmost transfer - module. Transfer modules are given in the .trc format. They are - shown by the 'po' command. - flags: - -lang typecheck the result in this lang instead of default lang - examples: - p -lang=Cncdecimal "123" | at num2bin | l -- convert dec to bin - -tb, tree_bank: tb - Generate a multilingual treebank from a list of trees (default) or compare - to an existing treebank. - options: - -c compare to existing xml-formatted treebank - -trees return the trees of the treebank - -all show all linearization alternatives (branches and variants) - -table show tables of linearizations with parameters - -record show linearization records - -xml wrap the treebank (or comparison results) with XML tags - -mem write the treebank in memory instead of a file TODO - examples: - gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file - rf tb.xml | tb -c -- compare-test treebank from file - rf old.xml | tb -trees | tb -xml -- create new treebank from old - -ut, use_treebank: ut String - Lookup a string in a treebank and return the resulting trees. - Use 'tb' to create a treebank and 'i -treebank' to read one from - a file. - options: - -assocs show all string-trees associations in the treebank - -strings show all strings in the treebank - -trees show all trees in the treebank - -raw return the lookup result as string, without typechecking it - flags: - -treebank use this treebank (instead of the latest introduced one) - examples: - ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation - ut -assocs | grep "ComplV2" -- show all associations with ComplV2 - -tt, test_tokenizer: tt String - Show the token list sent to the parser when String is parsed. - HINT: can be useful when debugging the parser. - flags: - -lexer use this lexer - examples: - tt -lexer=codelit "2*(x + 3)" -- a favourite lexer for program code - -g, grep: g String1 String2 - Grep the String1 in the String2. 
String2 is read line by line, - and only those lines that contain String1 are returned. - flags: - -v return those lines that do not contain String1. - examples: - pg -printer=cf | grep "mother" -- show cf rules with word mother - -cc, compute_concrete: cc Term - Compute a term by concrete syntax definitions. Uses the topmost - resource module (the last in listing by command po) to resolve - constant names. - N.B. You need the flag -retain when importing the grammar, if you want - the oper definitions to be retained after compilation; otherwise this - command does not expand oper constants. - N.B.' The resulting Term is not a term in the sense of abstract syntax, - and hence not a valid input to a Tree-demanding command. - flags: - -table show output in a similar readable format as 'l -table' - -res use another module than the topmost one - examples: - cc -res=ParadigmsFin (nLukko "hyppy") -- inflect "hyppy" with nLukko - -so, show_operations: so Type - Show oper operations with the given value type. Uses the topmost - resource module to resolve constant names. - N.B. You need the flag -retain when importing the grammar, if you want - the oper definitions to be retained after compilation; otherwise this - command does not find any oper constants. - N.B.' The value type may not be defined in a supermodule of the - topmost resource. In that case, use appropriate qualified name. - flags: - -res use another module than the topmost one - examples: - so -res=ParadigmsFin ResourceFin.N -- show N-paradigms in ParadigmsFin - -t, translate: t Lang Lang String - Parses String in Lang1 and linearizes the resulting Trees in Lang2. - flags: - -cat - -lexer - -parser - examples: - t Eng Swe -cat=S "every number is even or odd" - -gr, generate_random: gr Tree? - Generates a random Tree of a given category. If a Tree - argument is given, the command completes the Tree with values to - the metavariables in the tree. 
- options: - -prob use probabilities (works for nondep types only) - -cf use a very fast method (works for nondep types only) - flags: - -cat generate in this category - -lang use the abstract syntax of this grammar - -number generate this number of trees (not impl. with Tree argument) - -depth use this number of search steps at most - examples: - gr -cat=Query -- generate in category Query - gr (PredVP ? (NegVG ?)) -- generate a random tree of this form - gr -cat=S -tr | l -- gererate and linearize - -gt, generate_trees: gt Tree? - Generates all trees up to a given depth. If the depth is large, - a small -alts is recommended. If a Tree argument is given, the - command completes the Tree with values to the metavariables in - the tree. - options: - -metas also return trees that include metavariables - -all generate all (can be infinitely many, lazily) - -lin linearize result of -all (otherwise, use pipe to linearize) - flags: - -depth generate to this depth (default 3) - -atoms take this number of atomic rules of each category (default unlimited) - -alts take this number of alternatives at each branch (default unlimited) - -cat generate in this category - -nonub don't remove duplicates (faster, not effective with -mem) - -mem use a memorizing algorithm (often faster, usually more memory-consuming) - -lang use the abstract syntax of this grammar - -number generate (at most) this number of trees (also works with -all) - -noexpand don't expand these categories (comma-separated, e.g. -noexpand=V,CN) - -doexpand only expand these categories (comma-separated, e.g. -doexpand=V,CN) - examples: - gt -depth=10 -cat=NP -- generate all NP's to depth 10 - gt (PredVP ? 
(NegVG ?)) -- generate all trees of this form - gt -cat=S -tr | l -- generate and linearize - gt -noexpand=NP | l -mark=metacat -- the only NP is meta, linearized "?0 +NP" - gt | l | p -lines -ambiguous | grep "#AMBIGUOUS" -- show ambiguous strings - -ma, morphologically_analyse: ma String - Runs morphological analysis on each word in String and displays - the results line by line. - options: - -short show analyses in bracketed words, instead of separate lines - -status show just the work at success, prefixed with "*" at failure - flags: - -lang - examples: - wf Bible.txt | ma -short | wf Bible.tagged -- analyse the Bible -``` - - -==Elementary generation of Strings and Trees== - -``` -ps, put_string: ps String - Returns its argument String, like Unix echo. - HINT. The strength of ps comes from the possibility to receive the - argument from a pipeline, and altering it by the -filter flag. - flags: - -filter filter the result through this string processor - -length cut the string after this number of characters - examples: - gr -cat=Letter | l | ps -filter=text -- random letter as text - -pt, put_tree: pt Tree - Returns its argument Tree, like a specialized Unix echo. - HINT. The strength of pt comes from the possibility to receive - the argument from a pipeline, and altering it by the -transform flag. - flags: - -transform transform the result by this term processor - -number generate this number of terms at most - examples: - p "zero is even" | pt -transform=solve -- solve ?'s in parse result - -* st, show_tree: st Tree - Prints the tree as a string. Unlike pt, this command cannot be - used in a pipe to produce a tree, since its output is a string. - flags: - -printer show the tree in a special format (-printer=xml supported) - -wt, wrap_tree: wt Fun - Wraps the tree as the sole argument of Fun. 
- flags: - -c compute the resulting new tree to normal form - -vt, visualize_tree: vt Tree - Shows the abstract syntax tree via dot and gv (via temporary files - grphtmp.dot, grphtmp.ps). - flags: - -c show categories only (no functions) - -f show functions only (no categories) - -g show as graph (sharing uses of the same function) - -o just generate the .dot file - examples: - p "hello world" | vt -o | wf my.dot ;; ! open -a GraphViz my.dot - -- This writes the parse tree into my.dot and opens the .dot file - -- with another application without generating .ps. -``` - -==Subshells== - -``` -es, editing_session: es - Opens an interactive editing session. - N.B. Exit from a Fudget session is to the Unix shell, not to GF. - options: - -f Fudget GUI (necessary for Unicode; only available in X Window System) - -ts, translation_session: ts - Translates input lines from any of the actual languages to all other ones. - To exit, type a full stop (.) alone on a line. - N.B. Exit from a Fudget session is to the Unix shell, not to GF. - HINT: Set -parser and -lexer locally in each grammar. - options: - -f Fudget GUI (necessary for Unicode; only available in X Windows) - -lang prepend translation results with language names - flags: - -cat the parser category - examples: - ts -cat=Numeral -lang -- translate numerals, show language names - -tq, translation_quiz: tq Lang Lang - Random-generates translation exercises from Lang1 to Lang2, - keeping score of success. - To interrupt, type a full stop (.) alone on a line. - HINT: Set -parser and -lexer locally in each grammar. - flags: - -cat - examples: - tq -cat=NP TestResourceEng TestResourceSwe -- quiz for NPs - -tl, translation_list: tl Lang Lang - Random-generates a list of ten translation exercises from Lang1 - to Lang2. The number can be changed by a flag. - HINT: use wf to save the exercises in a file. 
- flags: - -cat - -number - examples: - tl -cat=NP TestResourceEng TestResourceSwe -- quiz list for NPs - -mq, morphology_quiz: mq - Random-generates morphological exercises, - keeping score of success. - To interrupt, type a full stop (.) alone on a line. - HINT: use printname judgements in your grammar to - produce nice expressions for desired forms. - flags: - -cat - -lang - examples: - mq -cat=N -lang=TestResourceSwe -- quiz for Swedish nouns - -ml, morphology_list: ml - Random-generates a list of ten morphological exercises, - keeping score of success. The number can be changed with a flag. - HINT: use wf to save the exercises in a file. - flags: - -cat - -lang - -number - examples: - ml -cat=N -lang=TestResourceSwe -- quiz list for Swedish nouns -``` - - -==IO-related commands== - -``` -rf, read_file: rf File - Returns the contents of File as a String; error if File does not exist. - -wf, write_file: wf File String - Writes String into File; File is created if it does not exist. - N.B. the command overwrites File without a warning. - -af, append_file: af File - Writes String into the end of File; File is created if it does not exist. - -* tg, transform_grammar: tg File - Reads File, parses as a grammar, - but instead of compiling further, prints it. - The environment is not changed. When parsing the grammar, the same file - name suffixes are supported as in the i command. - HINT: use this command to print the grammar in - another format (the -printer flag); pipe it to wf to save this format. - flags: - -printer (only -printer=latex supported currently) - -* cl, convert_latex: cl File - Reads File, which is expected to be in LaTeX form. - -sa, speak_aloud: sa String - Uses the Flite speech generator to produce speech for String. - Works for American English spelling. 
- examples: - h | sa -- listen to the list of commands - gr -cat=S | l | sa -- generate a random sentence and speak it aloud - -si, speech_input: si - Uses an ATK speech recognizer to get speech input. - flags: - -lang: The grammar to use with the speech recognizer. - -cat: The grammar category to get input in. - -language: Use acoustic model and dictionary for this language. - -number: The number of utterances to recognize. - -h, help: h Command? - Displays the paragraph concerning the command from this help file. - Without the argument, shows the first lines of all paragraphs. - options - -all show the whole help file - -defs show user-defined commands and terms - -FLAG show the values of FLAG (works for grammar-independent flags) - examples: - h print_grammar -- show all information on the pg command - -q, quit: q - Exits GF. - HINT: you can use 'ph | wf history' to save your session. - -!, system_command: ! String - Issues a system command. No value is returned to GF. - example: - ! ls - -?, system_command: ? String - Issues a system command that receives its arguments from GF pipe - and returns a value to GF. - example: - h | ? 'wc -l' | p -cat=Num -``` - - -==Flags== - -The availability of flags is defined separately for each command. -``` --cat, category in which parsing is performed. - The default is S. - --depth, the search depth in e.g. random generation. - The default depends on application. - --filter, operation performed on a string. The default is identity. - -filter=identity no change - -filter=erase erase the text - -filter=take100 show the first 100 characters - -filter=length show the length of the string - -filter=text format as text (punctuation, capitalization) - -filter=code format as code (spacing, indentation) - --lang, grammar used when executing a grammar-dependent command. - The default is the last-imported grammar. - --language, voice used by Festival as its --language flag in the sa command. - The default is system-dependent. 
- --length, the maximum number of characters shown of a string. - The default is unlimited. - --lexer, tokenization transforming a string into lexical units for a parser. - The default is words. - -lexer=words tokens are separated by spaces or newlines - -lexer=literals like words, but GF integer and string literals recognized - -lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta - -lexer=chars each character is a token - -lexer=code use Haskell's lex - -lexer=codevars like code, but treat unknown words as variables, ?? as meta - -lexer=textvars like text, but treat unknown words as variables, ?? as meta - -lexer=text with conventions on punctuation and capital letters - -lexer=codelit like code, but treat unknown words as string literals - -lexer=textlit like text, but treat unknown words as string literals - -lexer=codeC use a C-like lexer - -lexer=ignore like literals, but ignore unknown words - -lexer=subseqs like ignore, but then try all subsequences from longest - --number, the maximum number of generated items in a list. - The default is unlimited. - --optimize, optimization on generated code. - The default is share for concrete, none for resource modules. - Each of the flags can have the suffix _subs, which performs - common subexpression elimination after the main optimization. - Thus, -optimize=all_subs is the most aggressive one. The _subs - strategy only works in GFC, and applies therefore in concrete but - not in resource modules. - -optimize=share share common branches in tables - -optimize=parametrize first try parametrize then do share with the rest - -optimize=values represent tables as courses-of-values - -optimize=all first try parametrize then do values with the rest - -optimize=none no optimization - --parser, parsing strategy. The default is chart. If -cfg or -mcfg are - selected, only bottomup and topdown are recognized. 
- -parser=chart bottom-up chart parsing - -parser=bottomup a more up to date bottom-up strategy - -parser=topdown top-down strategy - -parser=old an old bottom-up chart parser - --printer, format in which the grammar is printed. The default is - gfc. Those marked with M are (only) available for pm, the rest - for pg. - -printer=gfc GFC grammar - -printer=gf GF grammar - -printer=old old GF grammar - -printer=cf context-free grammar, with profiles - -printer=bnf context-free grammar, without profiles - -printer=lbnf labelled context-free grammar for BNF Converter - -printer=plbnf grammar for BNF Converter, with precedence levels - *-printer=happy source file for Happy parser generator (use lbnf!) - -printer=haskell abstract syntax in Haskell, with transl to/from GF - -printer=haskell_gadt abstract syntax GADT in Haskell, with transl to/from GF - -printer=morpho full-form lexicon, long format - *-printer=latex LaTeX file (for the tg command) - -printer=fullform full-form lexicon, short format - *-printer=xml XML: DTD for the pg command, object for st - -printer=old old GF: file readable by GF 1.2 - -printer=stat show some statistics of generated GFC - -printer=probs show probabilities of all functions - -printer=gsl Nuance GSL speech recognition grammar - -printer=jsgf Java Speech Grammar Format - -printer=jsgf_sisr_old Java Speech Grammar Format with semantic tags in - SISR WD 20030401 format - -printer=srgs_abnf SRGS ABNF format - -printer=srgs_abnf_non_rec SRGS ABNF format, without any recursion. - -printer=srgs_abnf_sisr_old SRGS ABNF format, with semantic tags in - SISR WD 20030401 format - -printer=srgs_xml SRGS XML format - -printer=srgs_xml_non_rec SRGS XML format, without any recursion. - -printer=srgs_xml_prob SRGS XML format, with weights - -printer=srgs_xml_sisr_old SRGS XML format, with semantic tags in - SISR WD 20030401 format - -printer=vxml Generate a dialogue system in VoiceXML. 
- -printer=slf a finite automaton in the HTK SLF format - -printer=slf_graphviz the same automaton as slf, but in Graphviz format - -printer=slf_sub a finite automaton with sub-automata in the - HTK SLF format - -printer=slf_sub_graphviz the same automaton as slf_sub, but in - Graphviz format - -printer=fa_graphviz a finite automaton with labelled edges - -printer=regular a regular grammar in a simple BNF - -printer=unpar a gfc grammar with parameters eliminated - -printer=functiongraph abstract syntax functions in 'dot' format - -printer=typegraph abstract syntax categories in 'dot' format - -printer=transfer Transfer language datatype (.tr file format) - -printer=cfg-prolog M cfg in prolog format (also pg) - -printer=gfc-prolog M gfc in prolog format (also pg) - -printer=gfcm M gfcm file (default for pm) - -printer=graph M module dependency graph in 'dot' (graphviz) format - -printer=header M gfcm file with header (for GF embedded in Java) - -printer=js M JavaScript type annotator and linearizer - -printer=mcfg-prolog M mcfg in prolog format (also pg) - -printer=missing M the missing linearizations of each concrete - --startcat, like -cat, but used in grammars (to avoid clash with keyword cat) - --transform, transformation performed on a syntax tree. The default is identity. - -transform=identity no change - -transform=compute compute by using definitions in the grammar - -transform=nodup return the term only if it has no constants duplicated - -transform=nodupatom return the term only if it has no atomic constants duplicated - -transform=typecheck return the term only if it is type-correct - -transform=solve solve metavariables as derived refinements - -transform=context solve metavariables by unique refinements as variables - -transform=delete replace the term by metavariable - --unlexer, untokenization transforming linearization output into a string. - The default is unwords. 
- -unlexer=unwords space-separated token list (like unwords) - -unlexer=text format as text: punctuation, capitals, paragraph <p> - -unlexer=code format as code (spacing, indentation) - -unlexer=textlit like text, but remove string literal quotes - -unlexer=codelit like code, but remove string literal quotes - -unlexer=concat remove all spaces - -unlexer=bind like identity, but bind at "&+" - --mark, marking of parts of tree in linearization. The default is none. - -mark=metacat append "+CAT" to every metavariable, showing its category - -mark=struct show tree structure with brackets - -mark=java show tree structure with XML tags (used in gfeditor) - --coding, Some grammars are in UTF-8, some in isolatin-1. - If the letters ä (a-umlaut) and ö (o-umlaut) look strange, either - change your terminal to isolatin-1, or rewrite the grammar with - 'pg -utf8'. -``` diff --git a/doc/gf-history.html b/doc/gf-history.html deleted file mode 100644 index 3fe8153e2..000000000 --- a/doc/gf-history.html +++ /dev/null @@ -1,865 +0,0 @@ -<html> -<body bgcolor="#FFFFFF" text="#000000" > -<center> -<IMG SRC="gf-logo.gif"> - - -<h1>Grammatical Framework History of Changes</h1> - - - -Changes in functionality since May 17, 2005, release of GF Version 2.2 - -</center> - -<p> - -25/6 (BB) -Added new speech recognition grammar printers for non-recursive SRGS grammars, -as used by Nuance Recognizer 9.0. Try <tt>pg -printer=srgs_xml_non_rec</tt> -or <tt>pg -printer=srgs_abnf_non_rec</tt>. - -<p> - -19/6 (AR) -Extended the functor syntax (<tt>with</tt> modules) so that the functor can have -restricted import and a module body (whose function is normally to complete restricted -import). Thus the following format is now possible: -<pre> - concrete C of A = E ** CI - [f,g] with (...) ** open R in {...} -</pre> -At the same time, the possibility of an empty module body was added to other modules -for symmetry. 
This can be useful for "proxy modules" that just collect other modules -without adding anything, e.g. -<pre> - abstract Math = Arithmetic, Geometry ; -</pre> - - -<p> - - -18/6 (AR) -Added a warning for clashing constants. A constant coming from multiple opened modules -was interpreted as "the first" found by the compiler, which was a source of difficult -errors. Clashing is officially forbidden, but we chose to give a warning instead of -raising an error to begin with (in version 2.8). - -<p> - -30/1/2007 (AR) -Semantics of variants fixed for complex types. Officially, it was only -defined for basic types (Str and parameters). When used for records, results were -multiplicative, which was not usable. But now variants should work for any type. - -<p> - -<hr> - -<p> - -22/12 (AR) <b>Release of GF version 2.7</b>. - -<p> - -21/12 (AR) -Overloading rules for GF version 2.7: -<ol> -<li> If a unique instance is found by exact match with argument types, - that instance is used. -<li> Otherwise, if exact match with the expected value type gives a - unique instance, that instance is used. -<li> Otherwise, if among possible instances only one returns a non-function - type, that instance is used, but a warning is issued. -<li> Otherwise, an error results, and the list of possible instances is shown. -</ol> -These rules are still experimental, but all future developments will guarantee -that their type-correct use will work. Rule (3) is only needed because the -current type checker does not always know an expected type. It can give -an incorrect result which is captured later in the compilation. To be noticed, -in particular, is that exact match is required. Match by subtyping will be -investigated later. - -<p> - -21/12 (BB) Java Speech Grammar Format with SISR tags can now be generated. -Use <tt>pg -printer=jsgf_sisr_old</tt>. 
The SISR tags are in Working Draft -20030401 format, which is supported by the OptimTALK VoiceXML interpreter -and the IBM XHTML+Voice implementation used by the Opera web browser. - -<p> - -21/12 (BB) <a name="voicexml"> -VoiceXML 2.0 dialog systems can now be generated from GF grammars. -Use <tt>pg -printer=vxml</tt>. - -<p> - -21/12 (BB) <a name="javascript"> -JavaScript code for linearization and type annotation can now be -generated from a multilingual GF grammar. Use <tt>pm -printer=js</tt>. - - -<p> - -5/12 (BB) <a name="gfcc2c"> -A new tool for generating C linearization libraries -from a GFCC file. <tt>make gfcc2c</tt> in <tt>src</tt> -compiles the tool. The generated -code includes header files in <tt>lib/c</tt> and should be linked -against <tt>libgfcc.a</tt> in <tt>lib/c</tt>. For an example of -using the generated code, see <tt>src/tools/c/examples/bronzeage</tt>. -<tt>make</tt> in that directory generates a GFCC file, then generates -C code from that, and then compiles a program <tt>bronzeage-test</tt>. -The <tt>main</tt> function for that program is defined in -<tt>bronzeage-test.c</tt>. - - -<p> - -20/11 (AR) Type error messages in concrete syntax are printed with a -heuristic where a type of the form <tt>{... ; lock_C : {} ; ...}</tt> -is printed as <tt>C</tt>. This gives more readable error messages, but -can produce wrong results if lock fields are hand-written or if subtypes -of lock-fielded categories are used. - -<p> - -17/11 (AR) <a name="overloading"> -Operation overloading: an <tt>oper</tt> can have many types, -from which one is picked at compile time. The types must have different -argument lists. Exact match with the arguments given to the <tt>oper</tt> -is required. An example is given in -<a href="../lib/resource-1.0/doc/gfdoc/Constructors.gf"><tt>Constructors.gf</tt></a>. 
-The purpose of overloading is to make libraries easier to use, since -only one name for each grammatical operation is needed: predication, modification, -coordination, etc. The concrete syntax is, at this experimental level, not -extended but relies on using a record with the function name repeated -as label name (see the example). The treatment of overloading is inspired -by C++, and was first suggested by Björn Bringert. - -<p> - - -3/10 (AR) A new low-level format <tt>gfcc</tt> ("Canonical Canonical GF"). -It is going to replace the <tt>gfc</tt> format later, but is already now -an efficient format for multilingual generation. -See <a href="../src/GF/Canon/GFCC/doc/gfcc.html">GFCC document</a> -for more information. - -<p> - -1/9 (AR) New way for managing errors in grammar compilation: -<pre> - Predef.Error : Type ; - Predef.error : Str -> Predef.Error ; -</pre> -Denotationally, <tt>Error</tt> is the empty type and thus a -subtype of any other type: it can be used anywhere. But the -<tt>error</tt> function is not canonical. Hence the compilation -is interrupted when <tt>(error s)</tt> is translated to GFC, and -the message <tt>s</tt> is emitted. An example use is given in -<tt>english/ParadigmsEng.gf</tt>: -<pre> - regDuplV : Str -> V ; - regDuplV fit = - case last fit of { - ("a" | "e" | "i" | "o" | "u" | "y") => - Predef.error (["final duplication makes no sense for"] ++ fit) ; - t => - let fitt = fit + t in - mkV fit (fit + "s") (fitt + "ed") (fitt + "ed") (fitt + "ing") - } ; -</pre> -This function thus cannot be applied to a stem ending with a vowel, -which is exactly what we want. In future, it may be good to add similar -checks to all morphological paradigms in the resource. - - -<p> - -16/8 (AR) New generation algorithm: slower but works with less -memory. Default of <tt>gt</tt>; use <tt>gt -mem</tt> for the old -algorithm. The new option <tt>gt -all</tt> lazily generates all -trees until interrupted. 
It cannot be piped to other GF commands, -hence use <tt>gt -all -lin</tt> to print out linearized strings -rather than trees. - -<hr> - - -22/6 (AR) <b>Release of GF version 2.6</b>. - -<p> - -20/6 (AR) The FCFG parser is now the default, as it even handles literals. -The old default can be selected by <tt>p -old</tt>. Since -FCFG does not support variable bindings, <tt>-old</tt> is automatically -selected if the grammar has bindings - and unless the <tt>-fcfg</tt> flag -is used. - -<p> - -17/6 (AR) The FCFG parser is now the recommended method for parsing -heavy grammars such as the resource grammars. It does not yet support -literals and variable bindings. - -<p> - -1/6 (AR) Added the FCFG parser written by Krasimir Angelov. Invoked by -<tt>p -fcfg</tt>. This parser is as general as MCFG but faster. -It needs more testing and debugging. - -<p> - -1/6 (AR) The command <tt>r = reload</tt> repeats the latest -<tt>i = import</tt> command. - -<p> - -30/5 (AR) It is now possible to use the flags <tt>-all, -table, -record</tt> -in combination with <tt>l -multi</tt>, and also with <tt>tb</tt>. - -<p> - -18/5 (AR) Introduced a wordlist format <tt>gfwl</tt> for -quick creation of language exercises and (in future) multilingual lexica. -The format is now very simple: -<pre> - # Svenska - Franska - Finska - berg - montagne - vuori - klättra - grimper / escalader - kiivetä / kiipeillä -</pre> -but can be extended to cover paradigm functions in addition to just -words. - -<p> - -3/4 (AR) The predefined abstract syntax type <tt>Int</tt> now has two -inherent parameters indicating its last digit and its size. The (hard-coded) -linearization type is -<pre> - {s : Str ; size : Predef.Ints 1 ; last : Predef.Ints 9} -</pre> -The <tt>size</tt> field has value <tt>1</tt> for integers greater than 9, and -value <tt>0</tt> for other integers (which are never negative). This parameter can -be used e.g. 
in calculating number agreement, -<pre> - Risala i = {s = i.s ++ table (Predef.Ints 1 * Predef.Ints 9) { - <0,1> => "risalah" ; - <0,2> => "risalatan" ; - <0,_> | <1,0> => "rasail" ; - _ => "risalah" - } ! <i.size,i.last> - } ; -</pre> -Notice that the table has to be typed explicitly for <tt>Ints k</tt>, -because type inference would otherwise return <tt>Int</tt> and therefore -fail to expand the table. - - -<p> - -31/3 (AR) Added flags and options to some commands, to help generation: -<ul> -<li> <tt>gt -noexpand=NP,V,TV</tt> does not expand these categories, -but only generates metavariables for them. -<li> <tt>gt -doexpand=NP,V,TV</tt> only expands these categories, -and generates metavariables for others. -<li> <tt>gr -cf</tt> has the same flags. -<li> <tt>l -mark=metacat</tt> marks the metavariables with their categories. -<li> <tt>p -fail</tt> marks with <tt>#FAIL</tt> strings that have no parse. -<li> <tt>p -ambiguous</tt> marks as <tt>#AMBIGUOUS</tt> - strings that have more than one parse. -</ul> - -<p> - -<hr> - -21/3/2006 <b>Release of GF 2.5</b>. - -<p> - -16/3 (AR) Added two flag values to <tt>pt -transform=X</tt>: -<tt>nodup</tt> which excludes terms where a constant is duplicated, -and -<tt>nodupatom</tt> which excludes terms where an atomic constant is duplicated. -The latter, in particular, is useful as a filter in generation: -<pre> - gt -cat=Cl | pt -transform=nodupatom -</pre> -This gives a corpus where words don't (usually) occur twice in the same clause. - -<p> - -6/3 (AR) Generalized the <tt>gfe</tt> file format in two ways: -<ol> -<li> Use the real grammar parser, hence <tt>(in M.C "foo")</tt> expressions - may occur anywhere. But the <i>ad hoc</i> word substitution syntax is - abandoned: ordinary <tt>let</tt> (and <tt>where</tt>) expressions - can now be used instead. -<li> The resource may now be a treebank, not just a grammar. Parsing - is thus replaced by treebank lookup, which in most cases is faster. 
-</ol> -A minor novelty is that the <tt>--# -resource=FILE</tt> flag can now be -relative to <tt>GF_LIB_PATH</tt>, both for grammars and treebanks. -The flag <tt> --# -treebank=IDENT</tt> gives the language whose treebank -entries are used, in case of a multilingual treebank. - -<p> - -4/3 (AR) Added command <tt>use_treebank = ut</tt> for lookup in a treebank. -This command can be used as a fast substitute for parsing, but also as a -way to browse treebanks. -<pre> - ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation - ut -assocs | grep "ComplV2" -- show all associations with ComplV2 -</pre> - -<p> - -3/3 (AR) Added option <tt>-treebank</tt> to the <tt>i</tt> command. This adds treebanks to -the shell state. The possible file formats are -<ol> -<li> XML file with a multilingual treebank, produced by <tt>tb -xml</tt> -<li> tab-organized text file with a unilingual treebank, produced by <tt>ut -assocs</tt> -</ol> -Notice that the treebanks in shell state are unilingual, and have strings as keys. -Multilingual treebanks have trees as keys. In case 1, one unilingual treebank per -language is built in the shell state. - - -<p> - -1/3 (AR) Added option <tt>-trees</tt> to the command <tt>tree_bank = tb</tt>. -By this option, the command just returns the trees in the treebank. It can be -used for producing new treebanks with the same trees: -<pre> - rf old.xml | tb -trees | tb -xml | wf new.xml -</pre> -Recall that only treebanks in the XML format can be read with the <tt>-trees</tt> -and <tt>-c</tt> flags. - -<p> - -1/3 (AR) A <tt>.gfe</tt> file can have a <tt>--# -path=PATH</tt> on its -second line. The file given on the first line (<tt>--# -resource=FILE</tt>) -is then read w.r.t. this path. This is useful if the resource file has -no path itself, which happens when it is gfc-only. 
- -<p> - -25/2 (AR) The flag <tt>preproc</tt> of the <tt>i</tt> command (and thereby -to <tt>gf</tt> itself) causes GF to apply a preprocessor to each sourcefile -it reads. - -<p> - -8/2 (AR) The command <tt>tb = tree_bank</tt> for creating and testing against -multilingual treebanks. Example uses: -<pre> - gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file - rf tb.txt | tb -c -- read comparison treebank from file -</pre> - -<p> - -10/1 (AR) Forbade variable binding inside negation and Kleene star -patterns. - -<p> - -7/1 (AR) Full set of regular expression patterns, with -as-patterns to enable variable bindings to matched expressions: -<ul> - <li> <i>p</i> <tt>+</tt> <i>q</i> : token consisting of <i>p</i> followed by <i>q</i> - <li> <i>p</i> <tt>*</tt> : token <i>p</i> repeated 0 or more times - (max the length of the string to be matched) - <li> <tt>-</tt> <i>p</i> : matches anything that <i>p</i> does not match - <li> <i>x</i> <tt>@</tt> <i>p</i> : bind to <i>x</i> what <i>p</i> matches - <li> <i>p</i> <tt>|</tt> <i>q</i> : matches what either <i>p</i> or <i>q</i> matches -</ul> -The last three apply to all types of patterns, the first two only to token strings. -Example: plural formation in Swedish 2nd declension -(<i>pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar</i>): -<pre> - plural2 : Str -> Str = \w -> case w of { - pojk + "e" => pojk + "ar" ; - nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ; - bil => bil + "ar" - } ; -</pre> -Semantics: variables are always bound to the <b>first match</b>, in the sequence defined -as the list <tt>Match p v</tt> as follows: -<pre> - Match (p1|p2) v = Match p1 v ++ Match p2 v - Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s] - Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ... 
- Match c v = [[]] if c == v -- for constant patterns c - Match x v = [[(x,v)]] -- for variable patterns x - Match x@p v = [[(x,v)]] + M if M = Match p v /= [] - Match p v = [] otherwise -- failure -</pre> -Examples: -<ul> -<li> <tt>x + "e" + y</tt> matches <tt>"peter"</tt> with <tt>x = "p", y = "ter"</tt> -<li> <tt>x@("foo"*)</tt> matches any token with <tt>x = ""</tt> -<li> <tt>x + y@("er"*)</tt> matches <tt>"burgerer"</tt> with <tt>x = "burg", y = "erer"</tt> -</ul> -<p> - -6/1 (AR) Concatenative string patterns to help morphology definitions... -This can be seen as a step towards regular expression string patterns. -The natural notation <tt>p1 + p2</tt> will be considered later. -<b>Note</b>. This was done on 7/1. - -<p> - -5/1/2006 (BB) New grammar printers <tt>slf_sub</tt> and <tt>slf_sub_graphviz</tt> -for creating SLF networks with sub-automata. - -<hr> - -22/12 <b>Release of GF 2.4</b>. - -<p> - -21/12 (AR) It now works to parse escaped string literals from command -line, and also string literals with spaces: -<pre> - gf examples/tram0/TramEng.gf - > p -lexer=literals "I want to go to \"Gustaf Adolfs torg\" ;" - QInput (GoTo (DestNamed "Gustaf Adolfs torg")) -</pre> - -<p> - -20/12 (AR) Support for full disjunctive patterns (<tt>P|Q</tt>) i.e. -not just on top level. - -<p> - -14/12 (BB) The command <tt>si</tt> (<tt>speech_input</tt>) which creates -a speech recognizer from a grammar for English and admits speech input -of strings has been added. The command uses an -<a href="http://htk.eng.cam.ac.uk/develop/atk.shtml">ATK</a> recognizer and -creates a recognition -network which accepts strings in the currently active grammar. -In order to use the <tt>si</tt> command, -you need to install the -<a href="http://www.cs.chalmers.se/~bringert/darcs/atkrec/">atkrec library</a> -and configure GF with <tt>./configure --with-atk</tt> before compiling. -You need to set two environment variables for the <tt>si</tt> command to -work. 
<tt>ATK_HOME</tt> should contain the path to your copy of ATK -and <tt>GF_ATK_CFG</tt> should contain the path to your GF ATK configuration -file. A default version of this file can be found in - <tt>GF/src/gf_atk.cfg</tt>. - - -<p> - -11/12 (AR) Parsing of float literals now possible in object language. -Use the flag <tt>lexer=literals</tt>. - -<p> - -6/12 (AR) Accept <tt>param</tt> and <tt>oper</tt> definitions in -<tt>concrete</tt> modules. The definitions are just inlined in the -current module and not inherited. The purpose is to support rapid -prototyping of grammars. - -<p> - -2/12 (AR) The built-in type <tt>Float</tt> added to abstract syntax (and -resource). Values are stored as Haskell's <tt>Double</tt> precision -floats. For the syntax of float literals, see BNFC document. -NB: some bug still prevents parsing float literals in object -languages. <b>Bug fixed 11/12.</b> - -<p> - -1/12 (BB,AR) The command <tt>at = apply_transfer</tt>, which applies -a transfer function to a term. This is used for noncompositional -translation. Transfer functions are defined in a special transfer -language (file suffix <tt>.tr</tt>), which is compiled into a -run-time transfer core language (file suffix <tt>.trc</tt>). -The compiler is included in <tt>GF/transfer</tt>. The following is -a complete example of how to try out transfer: -<pre> - % cd GF/transfer - % make -- compile the trc compiler - % cd examples -- GF/transfer/examples - % ../compile_to_core -i../lib numerals.tr - % mv numerals.trc ../../examples/numerals - % cd ../../examples/numerals -- GF/examples/numerals - % gf - > i decimal.gf - > i BinaryDigits.gf - > i numerals.trc - > p -lang=Cncdecimal "123" | at num2bin | l - 1 0 0 1 1 0 0 1 1 1 0 -</pre> -Other relevant commands are: -<ul> -<li> <tt>i file.trc</tt>: import a transfer module -<li> <tt>pg -printer=transfer</tt>: create a syntax datatype in <tt>.tr</tt> format -</ul> -For more information on the commands, see <tt>help</tt>. 
Documentation on -the transfer language: to appear. - -<p> - -17/11 (AR) Made it possible for lexers to be nondeterministic. -Now with a simple-minded implementation that the parser is sent -each lexing result in turn. The option <tt>-cut</tt> is used for -breaking after first lexing leading to successful parse. The only -nondeterministic lexer right now is <tt>-lexer=subseqs</tt>, which -first filters with <tt>-lexer=ignore</tt> (dropping words neither in -the grammar nor literals) and then starts ignoring other words from -longest to shortest subsequence. This is usable for parser tasks -of keyword spotting type, but expensive (2<sup>n</sup>) in long input. -A smarter implementation is therefore desirable. - -<p> - -14/11 (AR) Functions can be made unparsable (or "internal" as -in BNFC). This is done by <tt>i -noparse=file</tt>, where -the nonparsable functions are given in <tt>file</tt> using the -line format <tt>--# noparse Funs</tt>. This can be used e.g. to -rule out expensive parsing rules. It is used in -<tt>lib/resource/abstract/LangVP.gf</tt> to get parse values -structured with <tt>VP</tt>, which is obtained via transfer. -So far only the default (= old) parser generator supports this. - -<p> - -14/11 (AR) Removed the restrictions how a lincat may look like. -Now any record type that has a value in GFC (i.e. without any -functions in it) can be used, e.g. {np : NP ; cn : Bool => CN}. -To display linearization values, only <tt>l -record</tt> shows -nice results. - -<p> - -9/11 (AR) GF shell state can now have several abstract syntaxes with -their associated concrete syntaxes. This allows e.g. parsing with -resource while testing an application. One can also have a -parse-transfer-lin chain from one abstract syntax to another. - -<p> -7/11 (BB) Running commands can now be interrupted with Ctrl-C, without -killing the GF process. This feature is not supported on Windows. 
- -<p> - -1/11 (AR) Yet another method for adding probabilities: append -<tt> --# prob Double</tt> to the end of a line defining a function. -This can be (1) a <tt>.cf</tt> rule (2) a <tt>fun</tt> rule, or -(3) a <tt>lin</tt> rule. The probability is attached to the -first identifier on the line. - -<p> -1/11 (BB) Added generation of weighted SRGS grammars. The weights -are calculated from the function probabilities. The algorithm -for calculating the weights is not yet very good. -Use <tt>pg -printer=srgs_xml_prob</tt>. - -<p> -31/10 (BB) Added option for converting grammars to SRGS grammars in XML format. -Use <tt>pg -printer=srgs_xml</tt>. - -<p> - -31/10 (AR) Probabilistic grammars. Probabilities can be used to -weight random generation (<tt>gr -prob</tt>) and to rank parse -results (<tt>p -prob</tt>). They are read from a separate file -(flag <tt>i -probs=File</tt>, format <tt>--# prob Fun Double</tt>) -or from the top-level grammar file itself (option <tt>i -prob</tt>). -To see the probabilities, use <tt>pg -printer=probs</tt>. -<br> -As a by-product, the probabilistic random generation algorithm is -available for any context-free abstract syntax. Use the flag -<tt>gr -cf</tt>. This algorithm is much faster than the -old (more general) one, but it may sometimes loop. - -<p> - -12/10 (AR) Flag <tt>-atoms=Int</tt> to the command <tt>gt = generate_trees</tt> -takes away all zero-argument functions except Int per category. In -this way, it is possible to generate a corpus illustrating each -syntactic structure even when the lexicon (which consists of -zero-argument functions) is large. - -<p> - -6/10 (AR) New commands <tt>dc = define_command</tt> and -<tt>dt = define_tree</tt> to define macros in a GF session. -See <tt>help</tt> for details and examples. - -<p> - -5/10 (AR) Printing missing linearization rules: -<tt>pm -printer=missing</tt>. Command <tt>g = grep</tt>, -which works in a way similar to Unix grep. 
- -<p> - -5/10 (PL) Printing graphs with function and category dependencies: -<tt>pg -printer=functiongraph</tt>, <tt>pg -printer=typegraph</tt>. - -<p> - -20/9 (AR) Added optimization by <b>common subexpression elimination</b>. -It works on GFC modules and creates <tt>oper</tt> definitions for -subterms that occur more than once in <tt>lin</tt> definitions. These -<tt>oper</tt> definitions are automatically reinlined in functionalities -that don't support <tt>oper</tt>s in GFC. This conversion is done by -module and the <tt>oper</tt>s are not inherited. Moreover, the subterms -can contain free variables which means that the <tt>oper</tt>s are not -always well typed. However, since all variables in GFC are type-specific -(and local variables are <tt>lin</tt>-specific), this does not destroy -subject reduction or cause illegal captures. -<br> -The optimization is triggered by the flag <tt>optimize=OPT_subs</tt>, -where <tt>OPT</tt> is any of the other optimizations (see <tt>h -optimize</tt>). -The most aggressive value of the flag is <tt>all_subs</tt>. In experiments, -the size of a GFC module can shrink by 85% compared to plain <tt>all</tt>. - -<p> - -18/9 (AR) Removed superfluous spaces from GFC printing. This shrinks -the GFC size by 5-10%. - -<p> - -15/9 (AR) Fixed some bugs in dependent-type type checking of abstract -modules at compile time. The type checker is more severe now, which means -that some old grammars may fail to compile - but this is usually the -right result. However, the type checker of <tt>def</tt> judgements still -needs work. - -<p> - -14/9 (AR) Added printing of grammars to a format without parameters, in -the spirit of Peano's "Latino sine flexione". The command <tt>pg -unpar</tt> -does the trick, and the result can be saved in a <tt>gfcm</tt> file. The generated -concrete syntax modules get the prefix <tt>UP_</tt>. The translation is briefly: -<pre> - (P => T)* = T* - (t ! 
p)* = t* - (table {p => t ; ...})* = t* -</pre> -In order for this to be maximally useful, the grammar should be written in such -a way that the first value of every parameter type is the desired one. For -instance, in Peano's case it would be the ablative for noun cases, the singular for -numbers, and the 2nd person singular imperative for verb forms. - -<p> - -14/9 (BB) Added finite state approximation of grammars. -Internally the conversion is done <tt>cfg -> regular -> fa -> slf</tt>, so the -different printers can be used to check the output of each stage. -The new options are: -<dl> -<dt><tt>pg -printer=slf</tt></dt> -<dd>A finite automaton in the HTK SLF format.</dd> -<dt><tt>pg -printer=slf_graphviz</tt></dt> -<dd>The same FA as in SLF, but in Graphviz format.</dd> -<dt><tt>pg -printer=fa_graphviz</tt></dt> -<dd>A finite automaton with labelled edges, instead of labelled nodes which SLF has.</dd> -<dt><tt>pg -printer=regular</tt></dt> -<dd>A regular grammar in a simple BNF.</dd> -</dl> - -<p> - -4/9 (AR) Added the option <tt>pg -printer=stat</tt> to show -statistics of gfc compilation result. To be extended with new information. -The most important stats now are the top-40 sized definitions. - -<p> -<hr> - -1/7 <b>Release of GF 2.3</b>. - -<p> - - -1/7 (AR) Added the flag <tt>-o</tt> to the <tt>vt</tt> command -to just write the <tt>.dot</tt> file without going to <tt>.ps</tt> -(cf. 20/6). - -<p> - -29/6 (AR) The printer used by Embedded Java GF Interpreter -(<tt>pm -header</tt>) now produces -working code from all optimized grammars - hence you need not select a -weaker optimization just to use the interpreter. However, the -optimization <tt>-optimize=share</tt> usually produces smaller object -grammars because the "unoptimizer" just undoes all optimizations. -(This is to be considered a temporary solution until the interpreter -knows how to handle stronger optimizations.) 
- -<p> - -27/6 (AR) The flag <tt>flags optimize=noexpand</tt> placed in a -resource module prevents the optimization phase of the compiler when -the <tt>.gfr</tt> file is created. This can prevent serious code -explosion, but it will also make the processing of modules using the -resource slower. A favourable example is <tt>lib/resource/finnish/ParadigmsFin</tt>. - -<p> - -23/6 (HD,AR) The new editor GUI <tt>gfeditor</tt> by Hans-Joachim -Daniels can now be used. It is based on Janna Khegai's <tt>jgf</tt>. -New functionality includes HTML display (<tt>gfeditor -h</tt>) and -programmable refinement tooltips. - -<p> - -23/6 (AR) The flag <tt>unlexer=finnish</tt> can be used to bind -Finnish suffixes (e.g. possessives) to preceding words. The GF source -notation is e.g. <tt>"isä" ++ "&*" ++ "nsa" ++ "&*" ++ "ko"</tt>, -which unlexes to <tt>"isänsäkö"</tt>. There is no corresponding lexer -support yet. - - -<p> - -22/6 (PL,AR) The MCFG parser (<tt>p -mcfg</tt>) now works on all -optimized grammars - hence you need not select a weaker optimization -to use this parser. The same concerns the CFGM printer (<tt>pm -printer=cfgm</tt>). - -<p> - -20/6 (AR) Added the command <tt>visualize_tree</tt> = <tt>vt</tt>, to -display syntax trees graphically. Like <tt>vg</tt>, this command uses -GraphViz and Ghostview. The foremost use is to pipe the parser to this -command. - -<p> - -17/6 (BB) There is now support for lists in GF abstract syntax. -A list category is declared as: -<pre> -cat [C] -</pre> -or -<pre> -cat [C]{n} -</pre> -where <tt>C</tt> is a category and <tt>n</tt> is a non-negative integer. -<tt>cat [C]</tt> is equivalent to <tt>cat [C]{0}</tt>. List category -syntax can be used wherever categories are used. - -<p> - -<tt>cat [C]{n}</tt> is equivalent to the declarations: -<pre> -cat ListC -fun BaseC : C^n -> ListC -fun ConsC : C -> ListC -> ListC -</pre> - -where <tt>C^0 -> X</tt> means <tt>X</tt>, and <tt>C^m -> X</tt> (where -m > 0) means <tt>C -> C^(m-1) -> X</tt>.
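<p>

As a worked instance of the expansion above (a sketch only: the category <tt>Prop</tt> and the length 2 are chosen here for illustration), the declaration <tt>cat [Prop]{2}</tt> unfolds to
<pre>
cat ListProp
fun BaseProp : Prop -> Prop -> ListProp
fun ConsProp : Prop -> ListProp -> ListProp
</pre>
because the argument list <tt>Prop^2</tt> of <tt>BaseProp</tt> unfolds step by step to <tt>Prop -> Prop</tt> in front of the result type. With <tt>cat [Prop]</tt>, i.e. <tt>n = 0</tt>, the base constructor takes no arguments and the second line becomes <tt>fun BaseProp : ListProp</tt>.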
- -<p> - -A lincat declaration of the form: -<pre> -lincat [C] = T -</pre> -is equivalent to -<pre> -lincat ListC = T -</pre> - -The linearizations of the list constructors are written -just like they would be if the function declarations above -had been made manually, e.g.: -<pre> -lin BaseC x_1 ... x_n = t -lin ConsC x xs = t' -</pre> - -<p> - -10/6 (AR) Preprocessing of <tt>.gfe</tt> files can now be performed as part of -any grammar compilation. The flag <tt>-ex</tt> causes GF to look for -the <tt>.gfe</tt> files and preprocess those that are newer -than the corresponding <tt>.gf</tt> files. The files are first sorted -and grouped by the resource, so that each resource needs to be compiled only once. - -<p> - -10/6 (AR) Editor GUI can now be alternatively invoked by the shell -command <tt>gf -edit</tt> (equivalent to <tt>jgf</tt>). - -<p> - -10/6 (AR) Editor GUI command <tt>pc Int</tt> to pop <tt>Int</tt> -items from the clip board. - -<p> - -4/6 (AR) Sequence of commands in the Java editor GUI now possible. -The commands are separated by <tt> ;; </tt> (notice the space on -both sides of the two semicolons). Such a sequence can be sent -from the "GF Command" pop-up field, but is mostly intended -for external processes that communicate with GF. - -<p> - -3/6 (AR) The format <tt>.gfe</tt> defined to support -<b>grammar writing by examples</b>. Files of this format are first -converted to <tt>.gf</tt> files by the command -<pre> - gf -examples File.gfe -</pre> -See <a href="../lib/resource/doc/example/QuestionsI.gfe"> -<tt>../lib/resource/doc/examples/QuestionsI.gfe</tt></a> -for an example. - -<p> - -31/5 (AR) Default of p -rawtrees=k changed to 999999. - -<p> - -31/5 (AR) Support for restricted inheritance. 
Syntax: -<pre> - M -- inherit everything from M, as before - M [a,b,c] -- only inherit constants a,b,c - M-[a,b,c] -- inherit everything except a,b,c -</pre> -Caution: there is no check yet for completeness and -consistency, so restricted inheritance can create -run-time failures. - -<p> - -29/5 (AR) Parser support for reading GFC files line by line. -The category <tt>Line</tt> in <tt>GFC.cf</tt> can be used -as the entry point instead of <tt>Grammar</tt> to achieve this. - -<p> - -28/5 (AR) Environment variables and path wild cards. -<ul> -<li> <tt>GF_LIB_PATH</tt> gives the location of <tt>GF/lib</tt> -<li> <tt>GF_GRAMMAR_PATH</tt> gives a list of directories appended - to the explicitly given path -<li> <tt>DIR/*</tt> is expanded to the union of all subdirectories - of <tt>DIR</tt> -</ul> -<p> - - -26/5/2005 (BB) Notation for list categories. - - - -</body> -</html> diff --git a/doc/gf-modules.html b/doc/gf-modules.html deleted file mode 100644 index 6292bd855..000000000 --- a/doc/gf-modules.html +++ /dev/null @@ -1,1183 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> -<HTML> -<HEAD> -<META NAME="generator" CONTENT="http://txt2tags.sf.net"> -<TITLE>The Module System of GF</TITLE> -</HEAD><BODY BGCOLOR="white" TEXT="black"> -<P ALIGN="center"><CENTER><H1>The Module System of GF</H1> -<FONT SIZE="4"> -<I>Aarne Ranta</I><BR> -8/4/2005 - 5/7/2007 -</FONT></CENTER> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> - <UL> - <LI><A HREF="#toc1">The principal module types</A> - <UL> - <LI><A HREF="#toc2">Abstract syntax</A> - <UL> - <LI><A HREF="#toc3">Compilation of abstract syntax</A> - </UL> - <LI><A HREF="#toc4">Concrete syntax</A> - <LI><A HREF="#toc5">Top-level grammar</A> - <UL> - <LI><A HREF="#toc6">Compiling top-level grammars</A> - <LI><A HREF="#toc7">Using top-level grammars</A> - </UL> - <LI><A HREF="#toc8">Multilingual grammar</A> - <UL> - <LI><A HREF="#toc9">Using multilingual grammars</A> - </UL> - <LI><A HREF="#toc10">Resource modules</A> - 
<UL> - <LI><A HREF="#toc11">Compiling resource modules</A> - <LI><A HREF="#toc12">Using resource modules</A> - </UL> - <LI><A HREF="#toc13">Inheritance</A> - <UL> - <LI><A HREF="#toc14">Multiple inheritance</A> - <LI><A HREF="#toc15">Restricted inheritance</A> - <LI><A HREF="#toc16">Compiling inheritance</A> - <LI><A HREF="#toc17">Inspecting grammar hierarchies</A> - </UL> - <LI><A HREF="#toc18">Reuse of top-level grammars as resources</A> - </UL> - <LI><A HREF="#toc19">Additional module types</A> - <UL> - <LI><A HREF="#toc20">Interfaces, instances, and incomplete grammars</A> - <UL> - <LI><A HREF="#toc21">Using an interface</A> - <LI><A HREF="#toc22">Instantiating an interface</A> - <LI><A HREF="#toc23">Compiling interfaces, instances, and parametrized modules</A> - </UL> - </UL> - <LI><A HREF="#toc24">Summary of module syntax and semantics</A> - <UL> - <LI><A HREF="#toc25">Abstract syntax modules</A> - <LI><A HREF="#toc26">Concrete syntax modules</A> - <LI><A HREF="#toc27">Resource modules</A> - <LI><A HREF="#toc28">Interface modules</A> - <LI><A HREF="#toc29">Instance modules</A> - <LI><A HREF="#toc30">Instantiated concrete syntax modules</A> - </UL> - </UL> - -<P></P> -<HR NOSHADE SIZE=1> -<P></P> -<P> -A GF grammar consists of a set of <B>modules</B>, which can be -combined in different ways to build different grammars. -There are several different <B>types of modules</B>: -</P> -<UL> -<LI><CODE>abstract</CODE> -<LI><CODE>concrete</CODE> -<LI><CODE>resource</CODE> -<LI><CODE>interface</CODE> -<LI><CODE>instance</CODE> -<LI><CODE>incomplete concrete</CODE> -</UL> - -<P> -We will go through the module types in this order, which is also -their order of "importance" from the most basic to -the more advanced ones. -</P> -<P> -This document presupposes knowledge of GF judgements and expressions, which can -be gained from the <A HREF="tutorial/gf-tutorial2.html">GF tutorial</A>. 
It aims -to give a systematic description of the module system; -some tutorial information is repeated to make the document -self-contained. -</P> -<A NAME="toc1"></A> -<H1>The principal module types</H1> -<A NAME="toc2"></A> -<H2>Abstract syntax</H2> -<P> -Any GF grammar that is used in an application -will probably contain at least one module -of the <CODE>abstract</CODE> module type. Here is an example of -such a module, defining a fragment of propositional logic. -</P> -<PRE> - abstract Logic = { - cat Prop ; - fun Conj : Prop -> Prop -> Prop ; - fun Disj : Prop -> Prop -> Prop ; - fun Impl : Prop -> Prop -> Prop ; - fun Falsum : Prop ; - } -</PRE> -<P> -The <B>name</B> of this module is <CODE>Logic</CODE>. -</P> -<P> -An <CODE>abstract</CODE> module defines an <B>abstract syntax</B>, which -is a language-independent representation of a fragment of language. -It consists of two kinds of <B>judgements</B>: -</P> -<UL> -<LI><CODE>cat</CODE> judgements telling what <B>categories</B> there are - (types of abstract syntax trees) -<LI><CODE>fun</CODE> judgements telling what <B>functions</B> there are - (to build abstract syntax trees) -</UL> - -<P> -There can also be <CODE>def</CODE> and <CODE>data</CODE> judgements in an -abstract syntax. -</P> -<A NAME="toc3"></A> -<H3>Compilation of abstract syntax</H3> -<P> -The GF grammar compiler expects to find the module <CODE>Logic</CODE> in a file named -<CODE>Logic.gf</CODE>. When the compiler is run, it produces -another file, named <CODE>Logic.gfc</CODE>. This file is in the -format called <B>canonical GF</B>, which is the "machine language" -of GF. Next time that the module <CODE>Logic</CODE> is needed in -compiling a grammar, it can be read from the compiled (<CODE>gfc</CODE>) -file instead of the source (<CODE>gf</CODE>) file, unless the source -has been changed after the compilation. 
-</P> -<A NAME="toc4"></A> -<H2>Concrete syntax</H2> -<P> -In order for a GF grammar to describe a concrete language, the abstract -syntax must be completed with a <B>concrete syntax</B> of it. -For this purpose, we use modules of type <CODE>concrete</CODE>: for instance, -</P> -<PRE> - concrete LogicEng of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = a.s ++ "and" ++ b.s} ; - lin Disj a b = {s = a.s ++ "or" ++ b.s} ; - lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ; - lin Falsum = {s = ["we have a contradiction"]} ; - } -</PRE> -<P> -The module <CODE>LogicEng</CODE> is a concrete syntax <CODE>of</CODE> the -abstract syntax <CODE>Logic</CODE>. The GF grammar compiler checks that -the concrete is valid with respect to the abstract syntax <CODE>of</CODE> -which it is claimed to be. The validity requires that there has to be -</P> -<UL> -<LI>a <CODE>lincat</CODE> judgement for each <CODE>cat</CODE> judgement, telling what the - <B>linearization types</B> of categories are -<LI>a <CODE>lin</CODE> judgement for each <CODE>fun</CODE> judgement, telling what the - <B>linearization functions</B> corresponding to functions are -</UL> - -<P> -Validity also requires that the linearization functions defined by -<CODE>lin</CODE> judgements are type-correct with respect to the -linearization types of the arguments and value of the function. -</P> -<P> -There can also be <CODE>lindef</CODE> and <CODE>printname</CODE> judgements in a -concrete syntax. -</P> -<A NAME="toc5"></A> -<H2>Top-level grammar</H2> -<P> -When a <CODE>concrete</CODE> module is successfully compiled, a <CODE>gfc</CODE> -file is produced in the same way as for <CODE>abstract</CODE> modules. The -pair of an <CODE>abstract</CODE> and a corresponding <CODE>concrete</CODE> module -is a <B>top-level grammar</B>, which can be used in the GF system to -perform various tasks. 
The most fundamental tasks are -</P> -<UL> -<LI><B>linearization</B>: take an abstract syntax tree and find the corresponding string -<LI><B>parsing</B>: take a string and find the corresponding abstract syntax - trees (which can be zero, one, or many) -</UL> - -<P> -In the current grammar, infinitely many trees and strings are recognized, although -no very interesting ones. For example, the tree -</P> -<PRE> - Impl (Disj Falsum Falsum) Falsum -</PRE> -<P> -has the linearization -</P> -<PRE> - if we have a contradiction or we have a contradiction then we have a contradiction -</PRE> -<P> -which in turn can be parsed uniquely as that tree. -</P> -<A NAME="toc6"></A> -<H3>Compiling top-level grammars</H3> -<P> -When GF compiles the module <CODE>LogicEng</CODE> it also has to compile -all modules that it <B>depends</B> on (in this case, just <CODE>Logic</CODE>). -The compilation process starts with dependency analysis to find -all these modules, recursively, starting from the explicitly imported one. -The compiler then reads either <CODE>gf</CODE> or <CODE>gfc</CODE> files, in -a dependency order. The decision on which files to read depends on -time stamps and dependencies in a natural way, so that all and only -those modules that have to be compiled are compiled. (This behaviour can -be changed with flags, see below.) -</P> -<A NAME="toc7"></A> -<H3>Using top-level grammars</H3> -<P> -To use a top-level grammar in the GF system, one uses the <CODE>import</CODE> -command (short name <CODE>i</CODE>). 
For instance, -</P> -<PRE> - i LogicEng.gf -</PRE> -<P> -It is also possible to specify the imported grammar(s) on the command -line when invoking GF: -</P> -<PRE> - gf LogicEng.gf -</PRE> -<P> -Various <B>compilation flags</B> can be added to both ways of compiling a module: -</P> -<UL> -<LI><CODE>-src</CODE> forces compilation from source files -<LI><CODE>-v</CODE> gives more verbose information on compilation -<LI><CODE>-s</CODE> makes compilation silent (except if it fails with an error message) -</UL> - -<P> -A complete list of flags can be obtained in GF by <CODE>help i</CODE>. -</P> -<P> -Importing a grammar makes it visible in GF's <B>internal state</B>. To see -what modules are available, use the command <CODE>print_options</CODE> (<CODE>po</CODE>). -You can empty the state with the command <CODE>empty</CODE> (<CODE>e</CODE>); this is -needed if you want to read in grammars with a different abstract syntax -than the current one without exiting GF. -</P> -<P> -Grammar modules can reside in different directories. They can then be found -by means of a <B>search path</B>, which is a flag such as -</P> -<PRE> - -path=.:api/toplevel:prelude -</PRE> -<P> -given to the <CODE>import</CODE> command or the shell command invoking GF. -(It can also be defined in the grammar file; see below.) The compiler -writes every <CODE>gfc</CODE> file in the same directory as the corresponding -<CODE>gf</CODE> file. -</P> -<P> -The <CODE>path</CODE> is relative to the working directory <CODE>pwd</CODE>, so that -all directories listed are primarily interpreted as subdirectories of -<CODE>pwd</CODE>. Secondarily, they are searched relative to the value of the -environment variable <CODE>GF_LIB_PATH</CODE>, which is by default set to -<CODE>/usr/local/share/GF</CODE>. -</P> -<P> -Parsing and linearization can be performed with the <CODE>parse</CODE> -(<CODE>p</CODE>) and <CODE>linearize</CODE> (<CODE>l</CODE>) commands, respectively. 
-For instance, -</P> -<PRE> - > l Impl (Disj Falsum Falsum) Falsum - if we have a contradiction or we have a contradiction then we have a contradiction - - > p -cat=Prop "we have a contradiction" - Falsum -</PRE> -<P> -Notice that the <CODE>parse</CODE> command needs the parsing category -as a flag. This is necessary since a grammar can have several -possible parsing categories ("entry points"). -</P> -<A NAME="toc8"></A> -<H2>Multilingual grammar</H2> -<P> -One <CODE>abstract</CODE> syntax can have several <CODE>concrete</CODE> syntaxes. -Here are two new ones for <CODE>Logic</CODE>: -</P> -<PRE> - concrete LogicFre of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = a.s ++ "et" ++ b.s} ; - lin Disj a b = {s = a.s ++ "ou" ++ b.s} ; - lin Impl a b = {s = "si" ++ a.s ++ "alors" ++ b.s} ; - lin Falsum = {s = ["nous avons une contradiction"]} ; - } - - concrete LogicSymb of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ; - lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ; - lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ; - lin Falsum = {s = "_|_"} ; - } -</PRE> -<P> -The four modules <CODE>Logic</CODE>, <CODE>LogicEng</CODE>, <CODE>LogicFre</CODE>, and -<CODE>LogicSymb</CODE> together form a <B>multilingual grammar</B>, in which -it is possible to perform parsing and linearization with respect to any -of the concrete syntaxes. As a combination of parsing and linearization, -one can also perform <B>translation</B> from one language to another. -(By <B>language</B> we mean the set of expressions generated by one -concrete syntax.) -</P> -<A NAME="toc9"></A> -<H3>Using multilingual grammars</H3> -<P> -Any combination of abstract syntax and corresponding concrete syntaxes -is thus a multilingual grammar. With many languages and other enrichments -(as described below), a multilingual grammar easily grows to the size of -tens of modules. 
The grammar developer, having finished her job, can -package the result in a <B>multilingual canonical grammar</B>, a file -with the suffix <CODE>.gfcm</CODE>. For instance, to compile the set of grammars -described by now, the following sequence of GF commands can be used: -</P> -<PRE> - i LogicEng.gf - i LogicFre.gf - i LogicSymb.gf - pm | wf logic.gfcm -</PRE> -<P> -The "end user" of the grammar only needs the file <CODE>logic.gfcm</CODE> to -access all the functionality of the multilingual grammar. It can be -imported in the GF system in the same way as <CODE>.gf</CODE> files. But -it can also be used in the -<A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html">Embedded Java Interpreter for GF</A> -to build Java programs of which the multilingual grammar functionalities -(linearization, parsing, translation) form a part. -</P> -<P> -In a multilingual grammar, the concrete syntax module names work as -names of languages that can be selected for linearization and parsing: -</P> -<PRE> - > l -lang=LogicFre Impl Falsum Falsum - si nous avons une contradiction alors nous avons une contradiction - - > l -lang=LogicSymb Impl Falsum Falsum - ( _|_ -> _|_ ) - - > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" - Conj Falsum Falsum -</PRE> -<P> -The option <CODE>-multi</CODE> gives linearization to all languages: -</P> -<PRE> - > l -multi Impl Falsum Falsum - if we have a contradiction then we have a contradiction - si nous avons une contradiction alors nous avons une contradiction - ( _|_ -> _|_ ) -</PRE> -<P> -Translation can be obtained by using a <B>pipe</B> from a parser -to a linearizer: -</P> -<PRE> - > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng - we have a contradiction and we have a contradiction -</PRE> -<P></P> -<A NAME="toc10"></A> -<H2>Resource modules</H2> -<P> -The <CODE>concrete</CODE> modules shown above would look much nicer if -we used the main idea of functional programming: avoid repetitive -code by using <B>functions</B> 
that capture repeated patterns of -expressions. A collection of such functions can be a valuable -<B>resource</B> for a programmer, reusable in many different -top-level grammars. Thus we introduce the <CODE>resource</CODE> -module type, with the first example -</P> -<PRE> - resource Util = { - oper SS : Type = {s : Str} ; - oper ss : Str -> SS = \s -> {s = s} ; - oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ; - oper infix : Str -> SS -> SS -> SS = \h,x,y -> - ss (x.s ++ h ++ y.s) ; - oper infixp : Str -> SS -> SS -> SS = \h,x,y -> - ss (paren (infix h x y)) ; - } -</PRE> -<P> -Modules of <CODE>resource</CODE> type have two forms of judgement: -</P> -<UL> -<LI><CODE>oper</CODE> defining auxiliary operations -<LI><CODE>param</CODE> defining parameter types -</UL> - -<P> -A <CODE>resource</CODE> can be used in a <CODE>concrete</CODE> (or another -<CODE>resource</CODE>) by <CODE>open</CODE>ing it. This means that -all operations (and parameter types) defined in the resource -module become usable in the module that opens it. For instance, -we can rewrite the module <CODE>LogicSymb</CODE> much more concisely: -</P> -<PRE> - concrete LogicSymb of Logic = open Util in { - lincat Prop = SS ; - lin Conj = infixp "&" ; - lin Disj = infixp "v" ; - lin Impl = infixp "->" ; - lin Falsum = ss "_|_" ; - } -</PRE> -<P> -What happens when this variant of <CODE>LogicSymb</CODE> is -compiled is that the <CODE>oper</CODE>-defined constants -of <CODE>Util</CODE> are <B>inlined</B> in the -right-hand-sides of the judgements of <CODE>LogicSymb</CODE>, -and these expressions are <B>partially evaluated</B>, i.e. -computed as far as possible. The generated <CODE>gfc</CODE> file -will look just like the file generated for the first version -of <CODE>LogicSymb</CODE> - at least, it will do the same job. -</P> -<P> -Several <CODE>resource</CODE> modules can be <CODE>open</CODE>ed -at the same time. 
If the modules contain the same names, the -conflict can be resolved by <B>qualified</B> opening and -reference. For instance, -</P> -<PRE> - concrete LogicSymb of Logic = open Util, Prelude in { ... - } ; -</PRE> -<P> -(where <CODE>Prelude</CODE> is a standard library of GF) brings -into scope two definitions of the constant <CODE>SS</CODE>. -To specify which one is used, you can write -<CODE>Util.SS</CODE> or <CODE>Prelude.SS</CODE> instead of just <CODE>SS</CODE>. -You can also introduce abbreviations to avoid long qualifiers, e.g. -</P> -<PRE> - concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ... - } ; -</PRE> -<P> -which means that you can write <CODE>U.SS</CODE> and <CODE>P.SS</CODE>. -</P> -<P> -Judgements of <CODE>param</CODE> and <CODE>oper</CODE> forms may also be used -in <CODE>concrete</CODE> modules, and they are then considered local -to those modules, i.e. they are not exported. -</P> -<A NAME="toc11"></A> -<H3>Compiling resource modules</H3> -<P> -The compilation of a <CODE>resource</CODE> module differs -from the compilation of <CODE>abstract</CODE> and -<CODE>concrete</CODE> modules because <CODE>oper</CODE> operations -do not in general have values in <CODE>gfc</CODE>. A <CODE>gfc</CODE> -file <I>is</I> generated, but it contains only -<CODE>param</CODE> judgements (also recall that <CODE>oper</CODE>s -are inlined in their top-level use sites, so it is not -necessary to save them in the compiled grammar). -However, since computing the operations over and over -again can be time-consuming, and since type checking -<CODE>resource</CODE> modules also takes time, a third kind -of file is generated for resource modules: a <CODE>.gfr</CODE> -file. This file is written in the GF source code notation, -but it is type checked and type annotated, and <CODE>oper</CODE>s -are computed as far as possible. 
-</P> -<P> -If you look at any <CODE>gfc</CODE> or <CODE>gfr</CODE> file generated -by the GF compiler, you see that all names have been replaced by -their qualified variants. This is an important first step that the -compiler performs after parsing. As for the commands in the GF shell, some output -qualified names and some do not. The difference does not always result -from firm principles. -</P> -<A NAME="toc12"></A> -<H3>Using resource modules</H3> -<P> -The typical use is through <CODE>open</CODE> in a -<CODE>concrete</CODE> module, which means that -<CODE>resource</CODE> modules are not imported on their own. -However, in the developing and testing phase of grammars, it -can be useful to evaluate <CODE>oper</CODE>s with different -arguments. To prevent them from being thrown away after inlining, the -<CODE>-retain</CODE> option can be used: -</P> -<PRE> - > i -retain Util.gf -</PRE> -<P> -The command <CODE>compute_concrete</CODE> (<CODE>cc</CODE>) -can now be used for evaluating expressions that may contain -operations defined in <CODE>Util</CODE>: -</P> -<PRE> - > cc ss (paren "foo") - {s = "(" ++ "foo" ++ ")"} -</PRE> -<P> -To find out what <CODE>oper</CODE>s are available for a given type, -the command <CODE>show_operations</CODE> (<CODE>so</CODE>) can be used: -</P> -<PRE> - > so SS - Util.ss : Str -> SS ; - Util.infix : Str -> SS -> SS -> SS ; - Util.infixp : Str -> SS -> SS -> SS ; -</PRE> -<P></P> -<A NAME="toc13"></A> -<H2>Inheritance</H2> -<P> -The most characteristic modularity of GF lies in the division of -grammars into <CODE>abstract</CODE>, <CODE>concrete</CODE>, and -<CODE>resource</CODE> modules. This permits writing multilingual -grammars and sharing the maximum of code between different -languages. -</P> -<P> -In addition to this special kind of modularity, GF provides <B>inheritance</B>, -which is familiar from other programming languages (in particular, -object-oriented ones). 
Inheritance means that a module inherits all -judgements from another module; we also say that it <B>extends</B> -the other module. Inheritance is useful to divide big grammars into -smaller units, and also to reuse the same units in different bigger -grammars. -</P> -<P> -The first example of inheritance is for abstract syntax. Let us -extend the module <CODE>Logic</CODE> to <CODE>Arithmetic</CODE>: -</P> -<PRE> - abstract Arithmetic = Logic ** { - cat Nat ; - fun Even : Nat -> Prop ; - fun Odd : Nat -> Prop ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; - } -</PRE> -<P> -In parallel with the extension of the abstract syntax -<CODE>Logic</CODE> to <CODE>Arithmetic</CODE>, we can extend -the concrete syntax <CODE>LogicEng</CODE> to <CODE>ArithmeticEng</CODE>: -</P> -<PRE> - concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in { - lincat Nat = SS ; - lin Even x = ss (x.s ++ "is" ++ "even") ; - lin Odd x = ss (x.s ++ "is" ++ "odd") ; - lin Zero = ss "zero" ; - lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ; - } -</PRE> -<P> -Another extension of <CODE>Logic</CODE> is <CODE>Geometry</CODE>, -</P> -<PRE> - abstract Geometry = Logic ** { - cat Point ; - cat Line ; - fun Incident : Point -> Line -> Prop ; - } -</PRE> -<P> -The corresponding concrete syntax is left as an exercise. -</P> -<A NAME="toc14"></A> -<H3>Multiple inheritance</H3> -<P> -Inheritance can be <B>multiple</B>, which means that a module -may extend many modules at the same time. Suppose, for instance, -that we want to build a module for mathematics covering both -arithmetic and geometry, and the underlying logic. We then write -</P> -<PRE> - abstract Mathematics = Arithmetic, Geometry ** { - } ; -</PRE> -<P> -We could of course add some new judgements in this module, but -it is not necessary to do so. 
If no new judgements are added, the -module body can be omitted: -</P> -<PRE> - abstract Mathematics = Arithmetic, Geometry ; -</PRE> -<P></P> -<P> -The module <CODE>Mathematics</CODE> shows that it is possible -to extend a module already built by extension. The correctness -criterion for extensions is that the same name -(<CODE>cat</CODE>, <CODE>fun</CODE>, <CODE>oper</CODE>, or <CODE>param</CODE>) -may not be defined twice in the resulting union of names. -That the names defined in <CODE>Logic</CODE> are "inherited twice" -by <CODE>Mathematics</CODE> (via both <CODE>Arithmetic</CODE> and -<CODE>Geometry</CODE>) is no violation of this rule; the usual -problems of multiple inheritance do not arise, since -the definitions of inherited constants cannot be changed. -</P> -<A NAME="toc15"></A> -<H3>Restricted inheritance</H3> -<P> -Inheritance can be <B>restricted</B>, which means that only some of -the constants are inherited. There are two dual notations for this: -</P> -<PRE> - A [f,g] -</PRE> -<P> -meaning that <I>only</I> <CODE>f</CODE> and <CODE>g</CODE> are inherited from <CODE>A</CODE>, and -</P> -<PRE> - A-[f,g] -</PRE> -<P> -meaning that <I>everything except</I> <CODE>f</CODE> and <CODE>g</CODE> is inherited from <CODE>A</CODE>. -</P> -<P> -Constants that are not inherited may be redefined in the inheriting module. -</P> -<A NAME="toc16"></A> -<H3>Compiling inheritance</H3> -<P> -Inherited judgements are not copied into the inheriting modules. -Instead, an <B>indirection</B> is created for each inherited name, -as can be seen by looking into the generated <CODE>gfc</CODE> (and -<CODE>gfr</CODE>) files. Thus for instance the names -</P> -<PRE> - Mathematics.Prop Arithmetic.Prop Geometry.Prop Logic.Prop -</PRE> -<P> -all refer to the same category, declared in the module -<CODE>Logic</CODE>. 
-</P> -<A NAME="toc17"></A> -<H3>Inspecting grammar hierarchies</H3> -<P> -The command <CODE>visualize_graph</CODE> (<CODE>vg</CODE>) shows the -dependency graph in the current GF shell state. The graph can -also be saved in a file and used e.g. in documentation, by the -command <CODE>print_multi -graph</CODE> (<CODE>pm -graph</CODE>). -</P> -<P> -The <CODE>vg</CODE> command uses the free software packages Graphviz (command <CODE>dot</CODE>) -and Ghostscript (command <CODE>gv</CODE>). -</P> -<A NAME="toc18"></A> -<H2>Reuse of top-level grammars as resources</H2> -<P> -Top-level grammars have a straightforward translation to -<CODE>resource</CODE> modules. The translation concerns -pairs of abstract-concrete judgements: -</P> -<PRE> - cat C ; ===> oper C : Type = T ; - lincat C = T ; - - fun f : A ; ===> oper f : A = t ; - lin f = t ; -</PRE> -<P> -Due to this translation, a <CODE>concrete</CODE> module -can be <CODE>open</CODE>ed in the same way as a -<CODE>resource</CODE> module; the translation is done -on the fly (it is computationally very cheap). -</P> -<P> -Modular grammar engineering often means that some grammarians -focus on the semantics of the domain whereas others take care -of linguistic details. Thus a typical reuse opens a -linguistically oriented <B>resource grammar</B>, -</P> -<PRE> - abstract Resource = { - cat S ; NP ; A ; - fun PredA : NP -> A -> S ; - } - concrete ResourceEng of Resource = { - lincat S = ... ; - lin PredA = ... 
; - } -</PRE> -<P> -The <B>application grammar</B>, instead of giving linearizations -explicitly, just reduces them to categories and functions in the -resource grammar: -</P> -<PRE> - concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in { - lincat Nat = NP ; - lin Even x = PredA x (regA "even") ; - } -</PRE> -<P> -If the resource grammar is only capable of generating grammatically -correct expressions, then the grammaticality of the application -grammar is also guaranteed: the type checker of GF is used as -grammar checker. -To guarantee distinctions between categories that have -the same linearization type, the actual translation used -in GF adds to every linearization type and linearization -a <B>lock field</B>, -</P> -<PRE> - cat C ; ===> oper C : Type = T ** {lock_C : {}} ; - lincat C = T ; - - fun f : C_1 ... C_n -> C ; ===> oper f : C_1 ... C_n -> C = \x_1,...,x_n -> - lin f = t ; t x_1 ... x_n ** {lock_C = &lt;>}; -</PRE> -<P> -(Notice that the latter translation is type-correct because of -record subtyping, which means that <CODE>t</CODE> can ignore the -lock fields of its arguments.) An application grammarian who -only uses resource grammar categories and functions never -needs to write these lock fields herself. Having to do so -serves as a warning that the grammaticality guarantee given -by the resource grammar no longer holds. -</P> -<P> -<B>Note</B>. The lock field mechanism is experimental, and may be changed -to a stronger abstraction mechanism in the future. This may result in -hand-written lock fields ceasing to work. -</P> -<A NAME="toc19"></A> -<H1>Additional module types</H1> -<A NAME="toc20"></A> -<H2>Interfaces, instances, and incomplete grammars</H2> -<P> -One difference between top-level grammars and <CODE>resource</CODE> -modules is that the former systematically separate the -declarations of categories and functions from their definitions. 
-In the reuse translation creating an <CODE>oper</CODE> judgement, -the declaration coming from the <CODE>abstract</CODE> module is put -together with the definition coming from the <CODE>concrete</CODE> -module. -</P> -<P> -However, the separation of declarations and definitions is so -useful a notion that GF also has specific module types that split -<CODE>resource</CODE> modules into two parts. In this splitting, -an <CODE>interface</CODE> module corresponds to an abstract syntax, -in giving the declarations of operations (and parameter types). -For instance, a generic markup interface would look as follows: -</P> -<PRE> - interface Markup = open Util in { - oper Boldface : Str -> Str ; - oper Heading : Str -> Str ; - oper markupSS : (Str -> Str) -> SS -> SS = \f,r -> - ss (f r.s) ; - } -</PRE> -<P> -The definitions of the constants declared in an <CODE>interface</CODE> -are given in an <CODE>instance</CODE> module (which is always <CODE>of</CODE> -an interface, in the same way as a <CODE>concrete</CODE> is always -<CODE>of</CODE> an abstract). The following <CODE>instance</CODE>s -define markup in HTML and LaTeX. -</P> -<PRE> - instance MarkupHTML of Markup = open Util in { - oper Boldface s = "&lt;b>" ++ s ++ "&lt;/b>" ; - oper Heading s = "&lt;h2>" ++ s ++ "&lt;/h2>" ; - } - - instance MarkupLatex of Markup = open Util in { - oper Boldface s = "\\textbf{" ++ s ++ "}" ; - oper Heading s = "\\section{" ++ s ++ "}" ; - } -</PRE> -<P> -Notice that both <CODE>interface</CODE>s and <CODE>instance</CODE>s may -<CODE>open</CODE> <CODE>resource</CODE>s (and also reused top-level grammars). -An <CODE>interface</CODE> may moreover define some of the operations it -declares; these definitions are inherited by all instances and cannot -be changed in them. Inheritance by module extension -is possible, as always, between modules of the same type. 
-</P> -<A NAME="toc21"></A> -<H3>Using an interface</H3> -<P> -An <CODE>interface</CODE> or an <CODE>instance</CODE> -can be <CODE>open</CODE>ed in -a <CODE>concrete</CODE> using the same syntax as when opening -a <CODE>resource</CODE>. For an <CODE>instance</CODE>, the semantics -is the same as when opening the definitions together with -the type signatures - one can think of an <CODE>interface</CODE> -and an <CODE>instance</CODE> of it together forming an ordinary -<CODE>resource</CODE>. Opening an <CODE>interface</CODE>, however, -is different: functions that are only declared without -having a definition cannot be compiled (inlined); neither -can functions whose definitions depend on undefined functions. -</P> -<P> -A module that <CODE>open</CODE>s an <CODE>interface</CODE> is therefore -<B>incomplete</B>, and has to be <B>completed</B> with an -<CODE>instance</CODE> of the interface to become complete. To make -this situation clear, GF requires any module that opens an -<CODE>interface</CODE> to be marked as <CODE>incomplete</CODE>. Thus -the module -</P> -<PRE> - incomplete concrete DocMarkup of Doc = open Markup in { - ... - } -</PRE> -<P> -uses the interface <CODE>Markup</CODE> to place markup in -chosen places in its linearization rules, but the -implementation of markup - whether in HTML or in LaTeX - is -left unspecified. This is a powerful way of sharing -the code of a whole module with just differences in -the definitions of some constants. -</P> -<P> -Another terminology for <CODE>incomplete</CODE> modules is -<B>parametrized modules</B> or <B>functors</B>. -The <CODE>interface</CODE> gives the list of parameters -that the functor depends on. -</P> -<A NAME="toc22"></A> -<H3>Instantiating an interface</H3> -<P> -To complete an <CODE>incomplete</CODE> module, each <CODE>interface</CODE> -that it opens has to be provided with an <CODE>instance</CODE>. 
The following -syntax is used for this: -</P> -<PRE> - concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ; -</PRE> -<P> -Instantiation of <CODE>Markup</CODE> with <CODE>MarkupLatex</CODE> is -another one-liner. -</P> -<P> -If more interfaces than one are instantiated, a comma-separated -list of equations in parentheses is used, e.g. -</P> -<PRE> - concrete MusicIta = MusicI with - (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ; -</PRE> -<P> -This example shows a common design pattern for building applications: -the concrete syntax is a functor on the generic resource grammar library -interface <CODE>Syntax</CODE> and a domain-specific lexicon interface, here -<CODE>LexMusic</CODE>. -</P> -<P> -All interfaces that are <CODE>open</CODE>ed in the completed module -must be completed. -</P> -<P> -Notice that the completion of an <CODE>incomplete</CODE> module -may at the same time extend modules of the same type (which need -not be completions). It can also add new judgements in a module body, -and restrict inheritance from the functor. -</P> -<PRE> - concrete MusicIta = MusicI - [f] with - (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** { - - lin f = ... - - } ; -</PRE> -<P></P> -<A NAME="toc23"></A> -<H3>Compiling interfaces, instances, and parametrized modules</H3> -<P> -Interfaces, instances, and parametric modules are purely a -front-end feature of GF: these module types do not exist in -the <CODE>gfc</CODE> and <CODE>gfr</CODE> formats. The compiler has -nevertheless to keep track of their dependencies and modification -times. 
Here is a summary of how they are compiled: -</P> -<UL> -<LI>an <CODE>interface</CODE> is compiled into a <CODE>resource</CODE> with an empty body -<LI>an <CODE>instance</CODE> is compiled into a <CODE>resource</CODE> in union with its - <CODE>interface</CODE> -<LI>an <CODE>incomplete</CODE> module (<CODE>concrete</CODE> or <CODE>resource</CODE>) is compiled - into a module of the same type with an empty body -<LI>a completion module (<CODE>concrete</CODE> or <CODE>resource</CODE>) is compiled - into a module of the same type by compiling its functor so that, instead of - each <CODE>interface</CODE>, its given <CODE>instance</CODE> is used -</UL> - -<P> -This means that some generated code is duplicated, because those operations that -do have complete definitions in an <CODE>interface</CODE> are copied to each of -the <CODE>instances</CODE>. -</P> -<A NAME="toc24"></A> -<H1>Summary of module syntax and semantics</H1> -<A NAME="toc25"></A> -<H2>Abstract syntax modules</H2> -<P> -Syntax: -</P> -<P> -<CODE>abstract</CODE> A <CODE>=</CODE> (A<sub>1</sub>,...,A<sub>n</sub> <CODE>**</CODE>)? -<CODE>{</CODE>J<sub>1</sub> <CODE>;</CODE> ... <CODE>;</CODE> J<sub>m</sub> <CODE>; }</CODE> -</P> -<P> -where -</P> -<UL> -<LI>i >= 0 -<LI>each <I>A<sub>i</sub></I> is itself an abstract module, - possibly with restrictions on inheritance, i.e. <I>A<sub>i</sub></I><CODE>-[</CODE><I>f,..,g</I><CODE>]</CODE> - or <I>A<sub>i</sub></I><CODE>[</CODE><I>f,..,g</I><CODE>]</CODE> -<LI>each <I>J<sub>i</sub></I> is a judgement of one of the forms - <CODE>cat, fun, def, data</CODE> -</UL> - -<P> -Semantic conditions: -</P> -<UL> -<LI>all inherited names declared in each <I>A<sub>i</sub></I> and <I>A</I> must be distinct -<LI>names in restriction lists must be defined in the restricted module -<LI>inherited constants may not depend on names excluded by restriction -</UL> - -<A NAME="toc26"></A> -<H2>Concrete syntax modules</H2> -<P> -Syntax: -</P> -<P> -<CODE>incomplete</CODE>? 
<CODE>concrete</CODE> C <CODE>of</CODE> A <CODE>=</CODE> -(C<sub>1</sub>,...,C<sub>n</sub> <CODE>**</CODE>)? -(<CODE>open</CODE> O<sub>1</sub>,...,O<sub>k</sub> <CODE>in</CODE>)? -<CODE>{</CODE>J<sub>1</sub> <CODE>;</CODE> ... <CODE>;</CODE> J<sub>m</sub> <CODE>; }</CODE> -</P> -<P> -where -</P> -<UL> -<LI>i >= 0 -<LI><I>A</I> is an abstract module -<LI>each <I>C<sub>i</sub></I> is a concrete module, - possibly with restrictions on inheritance, i.e. <I>C<sub>i</sub></I><CODE>-[</CODE><I>f,..,g</I><CODE>]</CODE> -<LI>each <I>O<sub>i</sub></I> is an open specification, of one of the forms - <UL> - <LI><I>R</I> - <LI><CODE>(</CODE><I>Q</I><CODE>=</CODE><I>R</I><CODE>)</CODE> - </UL> -</UL> - -<P> - where <I>R</I> is a resource, instance, or concrete, and <I>Q</I> is any identifier -</P> -<UL> -<LI>each <I>J<sub>i</sub></I> is a judgement of one of the forms - <CODE>lincat, lin, lindef, printname</CODE>; also the forms <CODE>oper, param</CODE> are - allowed, but they cannot be inherited. -</UL> - -<P> -If the modifier <CODE>incomplete</CODE> appears, then any <I>R</I> in -an open specification may also be an interface or an abstract. -</P> -<P> -Semantic conditions: -</P> -<UL> -<LI>each <CODE>cat</CODE> judgement in <I>A</I> - must have a corresponding, unique - <CODE>lincat</CODE> judgement in <I>C</I> -<LI>each <CODE>fun</CODE> judgement in <I>A</I> - must have a corresponding, unique - <CODE>lin</CODE> judgement in <I>C</I> -<LI>names in restriction lists must be defined in the restricted module -<LI>inherited constants may not depend on names excluded by restriction -</UL> - -<A NAME="toc27"></A> -<H2>Resource modules</H2> -<P> -Syntax: -</P> -<P> -<CODE>resource</CODE> R <CODE>=</CODE> -(R<sub>1</sub>,...,R<sub>n</sub> <CODE>**</CODE>)? -(<CODE>open</CODE> O<sub>1</sub>,...,O<sub>k</sub> <CODE>in</CODE>)? -<CODE>{</CODE>J<sub>1</sub> <CODE>;</CODE> ... 
<CODE>;</CODE> J<sub>m</sub> <CODE>; }</CODE> -</P> -<P> -where -</P> -<UL> -<LI>i >= 0 -<LI>each <I>R<sub>i</sub></I> is a resource, instance, or concrete module, - possibly with restrictions on inheritance, i.e. <I>R<sub>i</sub></I><CODE>-[</CODE><I>f,..,g</I><CODE>]</CODE> -<LI>each <I>O<sub>i</sub></I> is an open specification, of one of the forms - <UL> - <LI><I>P</I> - <LI><CODE>(</CODE><I>Q</I><CODE>=</CODE><I>R</I><CODE>)</CODE> - </UL> -</UL> - -<P> - where <I>P</I> is a resource, instance, or concrete, and <I>Q</I> is any identifier -</P> -<UL> -<LI>each <I>J<sub>i</sub></I> is a judgement of one of the forms <CODE>oper, param</CODE> -</UL> - -<P> -Semantic conditions: -</P> -<UL> -<LI>all names defined in each <I>R<sub>i</sub></I> and <I>R</I> must be distinct -<LI>all constants declared must have a definition -<LI>names in restriction lists must be defined in the restricted module -<LI>inherited constants may not depend on names excluded by restriction -</UL> - -<A NAME="toc28"></A> -<H2>Interface modules</H2> -<P> -Syntax: -</P> -<P> -<CODE>interface</CODE> R <CODE>=</CODE> -(R<sub>1</sub>,...,R<sub>n</sub> <CODE>**</CODE>)? -(<CODE>open</CODE> O<sub>1</sub>,...,O<sub>k</sub> <CODE>in</CODE>)? -<CODE>{</CODE>J<sub>1</sub> <CODE>;</CODE> ... <CODE>;</CODE> J<sub>m</sub> <CODE>; }</CODE> -</P> -<P> -where -</P> -<UL> -<LI>i >= 0 -<LI>each <I>R<sub>i</sub></I> is an interface or abstract module, - possibly with restrictions on inheritance, i.e. 
<I>R<sub>i</sub></I><CODE>-[</CODE><I>f,..,g</I><CODE>]</CODE> -<LI>each <I>O<sub>i</sub></I> is an open specification, of one of the forms - <UL> - <LI><I>P</I> - <LI><CODE>(</CODE><I>Q</I><CODE>=</CODE><I>R</I><CODE>)</CODE> - </UL> -</UL> - -<P> - where <I>P</I> is a resource, instance, or concrete, and <I>Q</I> is any identifier -</P> -<UL> -<LI>each <I>J<sub>i</sub></I> is a judgement of one of the forms <CODE>oper, param</CODE> -</UL> - -<P> -Semantic conditions: -</P> -<UL> -<LI>all names declared in each <I>R<sub>i</sub></I> and <I>R</I> must be distinct -<LI>names in restriction lists must be defined in the restricted module -<LI>inherited constants may not depend on names excluded by restriction -</UL> - -<A NAME="toc29"></A> -<H2>Instance modules</H2> -<P> -Syntax: -</P> -<P> -<CODE>instance</CODE> R <CODE>of</CODE> I <CODE>=</CODE> -(R<sub>1</sub>,...,R<sub>n</sub> <CODE>**</CODE>)? -(<CODE>open</CODE> O<sub>1</sub>,...,O<sub>k</sub> <CODE>in</CODE>)? -<CODE>{</CODE>J<sub>1</sub> <CODE>;</CODE> ... <CODE>;</CODE> J<sub>m</sub> <CODE>; }</CODE> -</P> -<P> -where -</P> -<UL> -<LI>i >= 0 -<LI><I>I</I> is an interface module -<LI>each <I>R<sub>i</sub></I> is an instance, resource, or concrete module, - possibly with restrictions on inheritance, i.e. 
<I>R<sub>i</sub></I><CODE>-[</CODE><I>f,..,g</I><CODE>]</CODE> -<P></P> -<LI>each <I>O<sub>i</sub></I> is an open specification, of one of the forms - <UL> - <LI><I>P</I> - <LI><CODE>(</CODE><I>Q</I><CODE>=</CODE><I>R</I><CODE>)</CODE> - </UL> -</UL> - -<P> - where <I>P</I> is a resource, instance, or concrete, and <I>Q</I> is any identifier -</P> -<UL> -<LI>each <I>J<sub>i</sub></I> is a judgement of one of the forms - <CODE>oper, param</CODE> -</UL> - -<P> -Semantic conditions: -</P> -<UL> -<LI>all names declared in each <I>R<sub>i</sub></I>, <I>I</I>, and <I>R</I> must be distinct -<LI>all constants declared in <I>I</I> must have a definition either in - <I>I</I> or <I>R</I> -<LI>names in restriction lists must be defined in the restricted module -<LI>inherited constants may not depend on names excluded by restriction -</UL> - -<A NAME="toc30"></A> -<H2>Instantiated concrete syntax modules</H2> -<P> -Syntax: -</P> -<P> -<CODE>concrete</CODE> C <CODE>of</CODE> A <CODE>=</CODE> -(C<sub>1</sub>,...,C<sub>n</sub> <CODE>**</CODE>)? -B -<CODE>with</CODE> -<CODE>(</CODE>I<sub>1</sub> <CODE>=</CODE>J<sub>1</sub><CODE>),</CODE> ... -<CODE>, (</CODE>I<sub>p</sub> <CODE>=</CODE>J<sub>p</sub><CODE>)</CODE> -(<CODE>-</CODE>? <CODE>[</CODE>c<sub>1</sub>,...,c<sub>q</sub> <CODE>]</CODE>)? -(<CODE>**</CODE>? -(<CODE>open</CODE> O<sub>1</sub>,...,O<sub>k</sub> <CODE>in</CODE>)? -<CODE>{</CODE>J<sub>1</sub> <CODE>;</CODE> ... <CODE>;</CODE> J<sub>m</sub> <CODE>; }</CODE>)? <CODE>;</CODE> -</P> -<P> -where -</P> -<UL> -<LI>i >= 0 -<LI><I>A</I> is an abstract module -<LI>each <I>C<sub>i</sub></I> is a concrete module, - possibly with restrictions on inheritance, i.e. 
<I>C<sub>i</sub></I><CODE>-[</CODE><I>f,..,g</I><CODE>]</CODE> -<LI><I>B</I> is an incomplete concrete syntax of <I>A</I> -<LI>each <I>I<sub>i</sub></I> is an interface or an abstract -<LI>each <I>J<sub>i</sub></I> is an instance or a concrete of <I>I<sub>i</sub></I> -<LI>each <I>O<sub>i</sub></I> is an open specification, of one of the forms - <UL> - <LI><I>R</I> - <LI><CODE>(</CODE><I>Q</I><CODE>=</CODE><I>R</I><CODE>)</CODE> - </UL> -</UL> - -<P> - where <I>R</I> is a resource, instance, or concrete, and <I>Q</I> is any identifier -</P> -<UL> -<LI>each <I>J<sub>i</sub></I> is a judgement of one of the forms - <CODE>lincat, lin, lindef, printname</CODE>; also the forms <CODE>oper, param</CODE> are - allowed, but they cannot be inherited. -</UL> - - -<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) --> -<!-- cmdline: txt2tags -\-toc -thtml gf-modules.txt --> -</BODY></HTML> diff --git a/doc/gf-modules.txt b/doc/gf-modules.txt deleted file mode 100644 index 1a4067b40..000000000 --- a/doc/gf-modules.txt +++ /dev/null @@ -1,994 +0,0 @@ -The Module System of GF -Aarne Ranta -8/4/2005 - 5/7/2007 - -%!postproc(html): #SUB1 <sub>1</sub> -%!postproc(html): #SUBk <sub>k</sub> -%!postproc(html): #SUBi <sub>i</sub> -%!postproc(html): #SUBm <sub>m</sub> -%!postproc(html): #SUBn <sub>n</sub> -%!postproc(html): #SUBp <sub>p</sub> -%!postproc(html): #SUBq <sub>q</sub> - - -% to compile: txt2tags --toc -thtml modulesystem.txt - - -A GF grammar consists of a set of **modules**, which can be -combined in different ways to build different grammars. -There are several different **types of modules**: -- ``abstract`` -- ``concrete`` -- ``resource`` -- ``interface`` -- ``instance`` -- ``incomplete concrete`` - - -We will go through the module types in this order, which is also -their order of "importance" from the most basic to -the more advanced ones. 
- -This document presupposes knowledge of GF judgements and expressions, which can -be gained from the [GF tutorial tutorial/gf-tutorial2.html]. It aims -to give a systematic description of the module system; -some tutorial information is repeated to make the document -self-contained. - - - - -=The principal module types= - -==Abstract syntax== - -Any GF grammar that is used in an application -will probably contain at least one module -of the ``abstract`` module type. Here is an example of -such a module, defining a fragment of propositional logic. -``` - abstract Logic = { - cat Prop ; - fun Conj : Prop -> Prop -> Prop ; - fun Disj : Prop -> Prop -> Prop ; - fun Impl : Prop -> Prop -> Prop ; - fun Falsum : Prop ; - } -``` -The **name** of this module is ``Logic``. - - - -An ``abstract`` module defines an **abstract syntax**, which -is a language-independent representation of a fragment of language. -It consists of two kinds of **judgements**: -- ``cat`` judgements telling what **categories** there are - (types of abstract syntax trees) -- ``fun`` judgements telling what **functions** there are - (to build abstract syntax trees) - - -There can also be ``def`` and ``data`` judgements in an -abstract syntax. - - -===Compilation of abstract syntax=== - -The GF grammar compiler expects to find the module ``Logic`` in a file named -``Logic.gf``. When the compiler is run, it produces -another file, named ``Logic.gfc``. This file is in the -format called **canonical GF**, which is the "machine language" -of GF. Next time that the module ``Logic`` is needed in -compiling a grammar, it can be read from the compiled (``gfc``) -file instead of the source (``gf``) file, unless the source -has been changed after the compilation. - - -==Concrete syntax== - -In order for a GF grammar to describe a concrete language, the abstract -syntax must be completed with a **concrete syntax** of it. 
-For this purpose, we use modules of type ``concrete``: for instance, -``` - concrete LogicEng of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = a.s ++ "and" ++ b.s} ; - lin Disj a b = {s = a.s ++ "or" ++ b.s} ; - lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ; - lin Falsum = {s = ["we have a contradiction"]} ; - } -``` -The module ``LogicEng`` is a concrete syntax ``of`` the -abstract syntax ``Logic``. The GF grammar compiler checks that -the concrete is valid with respect to the abstract syntax ``of`` -which it is claimed to be. The validity requires that there has to be -- a ``lincat`` judgement for each ``cat`` judgement, telling what the - **linearization types** of categories are -- a ``lin`` judgement for each ``fun`` judgement, telling what the - **linearization functions** corresponding to functions are - - -Validity also requires that the linearization functions defined by -``lin`` judgements are type-correct with respect to the -linearization types of the arguments and value of the function. - - - -There can also be ``lindef`` and ``printname`` judgements in a -concrete syntax. - - -==Top-level grammar== - -When a ``concrete`` module is successfully compiled, a ``gfc`` -file is produced in the same way as for ``abstract`` modules. The -pair of an ``abstract`` and a corresponding ``concrete`` module -is a **top-level grammar**, which can be used in the GF system to -perform various tasks. The most fundamental tasks are -- **linearization**: take an abstract syntax tree and find the corresponding string -- **parsing**: take a string and find the corresponding abstract syntax - trees (which can be zero, one, or many) - - -In the current grammar, infinitely many trees and strings are recognized, although -no very interesting ones. 
For example, the tree -``` - Impl (Disj Falsum Falsum) Falsum -``` -has the linearization -``` - if we have a contradiction or we have a contradiction then we have a contradiction -``` -which in turn can be parsed uniquely as that tree. - - -===Compiling top-level grammars=== - -When GF compiles the module ``LogicEng`` it also has to compile -all modules that it **depends** on (in this case, just ``Logic``). -The compilation process starts with dependency analysis to find -all these modules, recursively, starting from the explicitly imported one. -The compiler then reads either ``gf`` or ``gfc`` files, in -a dependency order. The decision on which files to read depends on -time stamps and dependencies in a natural way, so that all and only -those modules that have to be compiled are compiled. (This behaviour can -be changed with flags, see below.) - - -===Using top-level grammars=== - -To use a top-level grammar in the GF system, one uses the ``import`` -command (short name ``i``). For instance, -``` - i LogicEng.gf -``` -It is also possible to specify the imported grammar(s) on the command -line when invoking GF: -``` - gf LogicEng.gf -``` -Various **compilation flags** can be added to both ways of compiling a module: -- ``-src`` forces compilation from source files -- ``-v`` gives more verbose information on compilation -- ``-s`` makes compilation silent (except if it fails with an error message) - - -A complete list of flags can be obtained in GF by ``help i``. - -Importing a grammar makes it visible in GF's **internal state**. To see -what modules are available, use the command ``print_options`` (``po``). -You can empty the state with the command ``empty`` (``e``); this is -needed if you want to read in grammars with a different abstract syntax -than the current one without exiting GF. - - - -Grammar modules can reside in different directories. 
They can then be found -by means of a **search path**, which is a flag such as -``` - -path=.:api/toplevel:prelude -``` -given to the ``import`` command or the shell command invoking GF. -(It can also be defined in the grammar file; see below.) The compiler -writes every ``gfc`` file in the same directory as the corresponding -``gf`` file. - -The ``path`` is relative to the working directory ``pwd``, so that -all directories listed are primarily interpreted as subdirectories of -``pwd``. Secondarily, they are searched relative to the value of the -environment variable ``GF_LIB_PATH``, which is by default set to -``/usr/local/share/GF``. - -Parsing and linearization can be performed with the ``parse`` -(``p``) and ``linearize`` (``l``) commands, respectively. -For instance, -``` - > l Impl (Disj Falsum Falsum) Falsum - if we have a contradiction or we have a contradiction then we have a contradiction - - > p -cat=Prop "we have a contradiction" - Falsum -``` -Notice that the ``parse`` command needs the parsing category -as a flag. This is necessary since a grammar can have several -possible parsing categories ("entry points"). - - - -==Multilingual grammar== - -One ``abstract`` syntax can have several ``concrete`` syntaxes. 
-Here are two new ones for ``Logic``: -``` - concrete LogicFre of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = a.s ++ "et" ++ b.s} ; - lin Disj a b = {s = a.s ++ "ou" ++ b.s} ; - lin Impl a b = {s = "si" ++ a.s ++ "alors" ++ b.s} ; - lin Falsum = {s = ["nous avons une contradiction"]} ; - } - - concrete LogicSymb of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ; - lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ; - lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ; - lin Falsum = {s = "_|_"} ; - } -``` -The four modules ``Logic``, ``LogicEng``, ``LogicFre``, and -``LogicSymb`` together form a **multilingual grammar**, in which -it is possible to perform parsing and linearization with respect to any -of the concrete syntaxes. As a combination of parsing and linearization, -one can also perform **translation** from one language to another. -(By **language** we mean the set of expressions generated by one -concrete syntax.) - - -===Using multilingual grammars=== - -Any combination of abstract syntax and corresponding concrete syntaxes -is thus a multilingual grammar. With many languages and other enrichments -(as described below), a multilingual grammar easily grows to the size of -tens of modules. The grammar developer, having finished her job, can -package the result in a **multilingual canonical grammar**, a file -with the suffix ``.gfcm``. For instance, to compile the set of grammars -described by now, the following sequence of GF commands can be used: -``` - i LogicEng.gf - i LogicFre.gf - i LogicSymb.gf - pm | wf logic.gfcm -``` -The "end user" of the grammar only needs the file ``logic.gfcm`` to -access all the functionality of the multilingual grammar. It can be -imported in the GF system in the same way as ``.gf`` files. 
But -it can also be used in the -[Embedded Java Interpreter for GF http://www.cs.chalmers.se/~bringert/gf/gf-java.html] -to build Java programs of which the multilingual grammar functionalities -(linearization, parsing, translation) form a part. - -In a multilingual grammar, the concrete syntax module names work as -names of languages that can be selected for linearization and parsing: -``` - > l -lang=LogicFre Impl Falsum Falsum - si nous avons une contradiction alors nous avons une contradiction - - > l -lang=LogicSymb Impl Falsum Falsum - ( _|_ -> _|_ ) - - > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" - Conj Falsum Falsum -``` -The option ``-multi`` gives linearization to all languages: -``` - > l -multi Impl Falsum Falsum - if we have a contradiction then we have a contradiction - si nous avons une contradiction alors nous avons une contradiction - ( _|_ -> _|_ ) -``` -Translation can be obtained by using a **pipe** from a parser -to a linearizer: -``` - > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng - we have a contradiction and we have a contradiction -``` - - - -==Resource modules== - -The ``concrete`` modules shown above would look much nicer if -we used the main idea of functional programming: avoid repetitive -code by using **functions** that capture repeated patterns of -expressions. A collection of such functions can be a valuable -**resource** for a programmer, reusable in many different -top-level grammars. 
Thus we introduce the ``resource`` -module type, with the first example -``` - resource Util = { - oper SS : Type = {s : Str} ; - oper ss : Str -> SS = \s -> {s = s} ; - oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ; - oper infix : Str -> SS -> SS -> SS = \h,x,y -> - ss (x.s ++ h ++ y.s) ; - oper infixp : Str -> SS -> SS -> SS = \h,x,y -> - ss (paren (infix h x y)) ; - } -``` -Modules of ``resource`` type have two forms of judgement: - -- ``oper`` defining auxiliary operations -- ``param`` defining parameter types - - -A ``resource`` can be used in a ``concrete`` (or another -``resource``) by ``open``ing it. This means that -all operations (and parameter types) defined in the resource -module become usable in the module that opens it. For instance, -we can rewrite the module ``LogicSymb`` much more concisely: -``` - concrete LogicSymb of Logic = open Util in { - lincat Prop = SS ; - lin Conj = infixp "&" ; - lin Disj = infixp "v" ; - lin Impl = infixp "->" ; - lin Falsum = ss "_|_" ; - } -``` -What happens when this variant of ``LogicSymb`` is -compiled is that the ``oper``-defined constants -of ``Util`` are **inlined** in the -right-hand-sides of the judgements of ``LogicSymb``, -and these expressions are **partially evaluated**, i.e. -computed as far as possible. The generated ``gfc`` file -will look just like the file generated for the first version -of ``LogicSymb`` - at least, it will do the same job. - - -Several ``resource`` modules can be ``open``ed -at the same time. If the modules contain the same names, the -conflict can be resolved by **qualified** opening and -reference. For instance, -``` - concrete LogicSymb of Logic = open Util, Prelude in { ... - } ; -``` -(where ``Prelude`` is a standard library of GF) brings -into scope two definitions of the constant ``SS``. -To specify which one is used, you can write -``Util.SS`` or ``Prelude.SS`` instead of just ``SS``. -You can also introduce abbreviations to avoid long qualifiers, e.g. 
-``` - concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ... - } ; -``` -which means that you can write ``U.SS`` and ``P.SS``. - -Judgements of ``param`` and ``oper`` forms may also be used -in ``concrete`` modules, and they are then considered local -to those modules, i.e. they are not exported. - - - -===Compiling resource modules=== - -The compilation of a ``resource`` module differs -from the compilation of ``abstract`` and -``concrete`` modules because ``oper`` operations -do not in general have values in ``gfc``. A ``gfc`` -file //is// generated, but it contains only -``param`` judgements (also recall that ``oper``s -are inlined in their top-level use sites, so it is not -necessary to save them in the compiled grammar). -However, since computing the operations over and over -again can be time-consuming, and since type checking -``resource`` modules also takes time, a third kind -of file is generated for resource modules: a ``.gfr`` -file. This file is written in the GF source code notation, -but it is type checked and type annotated, and ``oper``s -are computed as far as possible. - - - -If you look at any ``gfc`` or ``gfr`` file generated -by the GF compiler, you see that all names have been replaced by -their qualified variants. This is an important first step (after parsing) -the compiler does. As for the commands in the GF shell, some output -qualified names and some not. The difference does not always result -from firm principles. - - -===Using resource modules=== - -The typical use is through ``open`` in a -``concrete`` module, which means that -``resource`` modules are not imported on their own. -However, in the developing and testing phase of grammars, it -can be useful to evaluate ``oper``s with different -arguments. 
To prevent them from being thrown away after inlining, the -``-retain`` option can be used: -``` - > i -retain Util.gf -``` -The command ``compute_concrete`` (``cc``) -can now be used for evaluating expressions that may contain -operations defined in ``Util``: -``` - > cc ss (paren "foo") - {s = "(" ++ "foo" ++ ")"} -``` -To find out what ``oper``s are available for a given type, -the command ``show_operations`` (``so``) can be used: -``` - > so SS - Util.ss : Str -> SS ; - Util.infix : Str -> SS -> SS -> SS ; - Util.infixp : Str -> SS -> SS -> SS ; -``` - - - - -==Inheritance== - -The most characteristic modularity of GF lies in the division of -grammars into ``abstract``, ``concrete``, and -``resource`` modules. This permits writing multilingual -grammar and sharing the maximum of code between different -languages. - - -In addition to this special kind of modularity, GF provides **inheritance**, -which is familiar from other programming languages (in particular, -object-oriented ones). Inheritance means that a module inherits all -judgements from another module; we also say that it **extends** -the other module. Inheritance is useful to divide big grammars into -smaller units, and also to reuse the same units in different bigger -grammars. - - - -The first example of inheritance is for abstract syntax. 
Let us -extend the module ``Logic`` to ``Arithmetic``: -``` - abstract Arithmetic = Logic ** { - cat Nat ; - fun Even : Nat -> Prop ; - fun Odd : Nat -> Prop ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; - } -``` -In parallel with the extension of the abstract syntax -``Logic`` to ``Arithmetic``, we can extend -the concrete syntax ``LogicEng`` to ``ArithmeticEng``: -``` - concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in { - lincat Nat = SS ; - lin Even x = ss (x.s ++ "is" ++ "even") ; - lin Odd x = ss (x.s ++ "is" ++ "odd") ; - lin Zero = ss "zero" ; - lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ; - } -``` -Another extension of ``Logic`` is ``Geometry``, -``` - abstract Geometry = Logic ** { - cat Point ; - cat Line ; - fun Incident : Point -> Line -> Prop ; - } -``` -The corresponding concrete syntax is left as an exercise. - - -===Multiple inheritance=== - - -Inheritance can be **multiple**, which means that a module -may extend many modules at the same time. Suppose, for instance, -that we want to build a module for mathematics covering both -arithmetic and geometry, and the underlying logic. We then write -``` - abstract Mathematics = Arithmetic, Geometry ** { - } ; -``` -We could of course add some new judgements in this module, but -it is not necessary to do so. If no new judgements are added, the -module body can be omitted: -``` - abstract Mathematics = Arithmetic, Geometry ; -``` - -The module ``Mathematics`` shows that it is possible -to extend a module already built by extension. The correctness -criterion for extensions is that the same name -(``cat``, ``fun``, ``oper``, or ``param``) -may not be defined twice in the resulting union of names. -That the names defined in ``Logic`` are "inherited twice" -by ``Mathematics`` (via both ``Arithmetic`` and -``Geometry``) is no violation of this rule; the usual -problems of multiple inheritance do not arise, since -the definitions of inherited constants cannot be changed. 
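
As a sketch of the exercise left open above, and of multiple inheritance on the concrete side, the following modules may help. They are not part of the original document: the names ``GeometryEng`` and ``MathematicsEng`` and the wording "lies on" are hypothetical, and the modules assume that ``Util``, ``LogicEng``, and ``ArithmeticEng`` are defined as shown earlier.
```
  -- hypothetical concrete syntax for Geometry, extending LogicEng
  concrete GeometryEng of Geometry = LogicEng ** open Util in {
    lincat Point = SS ;
    lincat Line = SS ;
    -- Prop has the linearization type {s : Str} in LogicEng, which is what ss returns
    lin Incident x y = ss (x.s ++ "lies" ++ "on" ++ y.s) ;
    }

  -- hypothetical concrete syntax for Mathematics: it adds no new judgements,
  -- mirroring the empty body of the abstract module
  concrete MathematicsEng of Mathematics = ArithmeticEng, GeometryEng ** {
    } ;
```
Here the judgements of ``LogicEng`` reach ``MathematicsEng`` through both ``ArithmeticEng`` and ``GeometryEng``, which is the "inherited twice" situation that the text above describes as harmless.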
- - - -===Restricted inheritance=== - -Inheritance can be **restricted**, which means that only some of -the constants are inherited. There are two dual notations for this: -``` - A [f,g] -``` -meaning that //only// ``f`` and ``g`` are inherited from ``A``, and -``` - A-[f,g] -``` -meaning that //everything except// ``f`` and ``g`` is inherited from ``A``. - -Constants that are not inherited may be redefined in the inheriting module. - - - - -===Compiling inheritance=== - -Inherited judgements are not copied into the inheriting modules. -Instead, an **indirection** is created for each inherited name, -as can be seen by looking into the generated ``gfc`` (and -``gfr``) files. Thus for instance the names -``` - Mathematics.Prop Arithmetic.Prop Geometry.Prop Logic.Prop -``` -all refer to the same category, declared in the module -``Logic``. - - - -===Inspecting grammar hierarchies=== - -The command ``visualize_graph`` (``vg``) shows the -dependency graph in the current GF shell state. The graph can -also be saved in a file and used e.g. in documentation, by the -command ``print_multi -graph`` (``pm -graph``). - -The ``vg`` command uses the free software packages Graphviz (command ``dot``) -and Ghostscript (command ``gv``). - - - -==Reuse of top-level grammars as resources== - -Top-level grammars have a straightforward translation to -``resource`` modules. The translation concerns -pairs of abstract-concrete judgements: -``` - cat C ; ===> oper C : Type = T ; - lincat C = T ; - - fun f : A ; ===> oper f : A = t ; - lin f = t ; -``` -Due to this translation, a ``concrete`` module -can be ``open``ed in the same way as a -``resource`` module; the translation is done -on the fly (it is computationally very cheap). - -Modular grammar engineering often means that some grammarians -focus on the semantics of the domain whereas others take care -of linguistic details.
Thus a typical reuse opens a -linguistically oriented **resource grammar**, -``` - abstract Resource = { - cat S ; NP ; A ; - fun PredA : NP -> A -> S ; - } - concrete ResourceEng of Resource = { - lincat S = ... ; - lin PredA = ... ; - } -``` -The **application grammar**, instead of giving linearizations -explicitly, just reduces them to categories and functions in the -resource grammar: -``` - concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in { - lincat Nat = NP ; - lin Even x = PredA x (regA "even") ; - } -``` -If the resource grammar is only capable of generating grammatically -correct expressions, then the grammaticality of the application -grammar is also guaranteed: the type checker of GF is used as -a grammar checker. -To guarantee distinctions between categories that have -the same linearization type, the actual translation used -in GF adds to every linearization type and linearization -a **lock field**, -``` - cat C ; ===> oper C : Type = T ** {lock_C : {}} ; - lincat C = T ; - - fun f : C_1 ... C_n -> C ; ===> oper f : C_1 ... C_n -> C = \x_1,...,x_n -> - lin f = t ; t x_1 ... x_n ** {lock_C = <>}; -``` -(Notice that the latter translation is type-correct because of -record subtyping, which means that ``t`` can ignore the -lock fields of its arguments.) An application grammarian who -only uses resource grammar categories and functions never -needs to write these lock fields herself. Having to do so -serves as a warning that the grammaticality guarantee given -by the resource grammar no longer holds. - -**Note**. The lock field mechanism is experimental, and may be changed -to a stronger abstraction mechanism in the future. This may result in -hand-written lock fields ceasing to work.
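The difference between staying inside the resource API and bypassing it can be sketched as follows. This is a hedged illustration, not the library's actual code: it assumes, hypothetically, that ``ResourceEng`` gives ``A`` the linearization type ``{s : Str}`` and that ``regA`` is available as in the example above. Only two linearization rules are shown.

```
  concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in {
    lincat Nat = NP ;

    -- only resource functions are used: no lock field has to be written
    lin Even x = PredA x (regA "even") ;

    -- the adjective is built by hand (assuming A is {s : Str} in ResourceEng),
    -- so the lock field must be added explicitly
    lin Odd x = PredA x {s = "odd" ; lock_A = <>} ;
  }
```

The second rule type-checks only because of the hand-written ``lock_A`` field, which is the visible sign that the grammaticality guarantee of the resource grammar no longer covers it.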
- - -=Additional module types= - -==Interfaces, instances, and incomplete grammars== - -One difference between top-level grammars and ``resource`` -modules is that the former systematically separate the -declarations of categories and functions from their definitions. -In the reuse translation creating an ``oper`` judgement, -the declaration coming from the ``abstract`` module is put -together with the definition coming from the ``concrete`` -module. - - - -However, the separation of declarations and definitions is so -useful a notion that GF also has specific module types that split -``resource`` modules into two parts. In this splitting, -an ``interface`` module corresponds to an abstract syntax, -in giving the declarations of operations (and parameter types). -For instance, a generic markup interface would look as follows: -``` - interface Markup = open Util in { - oper Boldface : Str -> Str ; - oper Heading : Str -> Str ; - oper markupSS : (Str -> Str) -> SS -> SS = \f,r -> - ss (f r.s) ; - } -``` -The definitions of the constants declared in an ``interface`` -are given in an ``instance`` module (which is always ``of`` -an interface, in the same way as a ``concrete`` is always -``of`` an abstract). The following ``instance``s -define markup in HTML and LaTeX. -``` - instance MarkupHTML of Markup = open Util in { - oper Boldface s = "<b>" ++ s ++ "</b>" ; - oper Heading s = "<h2>" ++ s ++ "</h2>" ; - } - - instance MarkupLatex of Markup = open Util in { - oper Boldface s = "\\textbf{" ++ s ++ "}" ; - oper Heading s = "\\section{" ++ s ++ "}" ; - } -``` -Notice that both ``interface``s and ``instance``s may -``open`` ``resource``s (and also reused top-level grammars). -An ``interface`` may moreover define some of the operations it -declares; these definitions are inherited by all instances and cannot -be changed in them. Inheritance by module extension -is possible, as always, between modules of the same type.
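As a further illustration, a third instance can be written in the same way. The module name ``MarkupPlain`` and the plain-text conventions it uses are hypothetical; the point of the sketch is that only the two declared operations are defined, whereas ``markupSS`` is already defined in the interface and is inherited unchanged.

```
  -- hypothetical plain-text instance of the Markup interface
  instance MarkupPlain of Markup = open Util in {
    oper Boldface s = "*" ++ s ++ "*" ;   -- emphasis marked with asterisks
    oper Heading  s = s ++ ":" ;          -- heading marked with a trailing colon
    -- markupSS is not repeated: its definition comes from the interface
  }
```

Assuming the ``-retain`` option and the ``cc`` command described earlier, the instance could be tried out in the GF shell with ``i -retain MarkupPlain.gf`` followed by ``cc markupSS Boldface (ss "GF")``.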
- - -===Using an interface=== - -An ``interface`` or an ``instance`` -can be ``open``ed in -a ``concrete`` using the same syntax as when opening -a ``resource``. For an ``instance``, the semantics -is the same as when opening the definitions together with -the type signatures - one can think of an ``interface`` -and an ``instance`` of it together forming an ordinary -``resource``. Opening an ``interface``, however, -is different: functions that are only declared without -having a definition cannot be compiled (inlined); neither -can functions whose definitions depend on undefined functions. - - - -A module that ``open``s an ``interface`` is therefore -**incomplete**, and has to be **completed** with an -``instance`` of the interface to become complete. To make -this situation clear, GF requires any module that opens an -``interface`` to be marked as ``incomplete``. Thus -the module -``` - incomplete concrete DocMarkup of Doc = open Markup in { - ... - } -``` -uses the interface ``Markup`` to place markup in -chosen places in its linearization rules, but the -implementation of markup - whether in HTML or in LaTeX - is -left unspecified. This is a powerful way of sharing -the code of a whole module with just differences in -the definitions of some constants. - - - -Another terminology for ``incomplete`` modules is -**parametrized modules** or **functors**. -The ``interface`` gives the list of parameters -that the functor depends on. - - -===Instantiating an interface=== - -To complete an ``incomplete`` module, each ``interface`` -that it opens has to be provided with an ``instance``. The following -syntax is used for this: -``` - concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ; -``` -Instantiation of ``Markup`` with ``MarkupLatex`` is -another one-liner. - -If more interfaces than one are instantiated, a comma-separated -list of equations in parentheses is used, e.g.
-``` - concrete MusicIta = MusicI with - (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ; -``` -This example shows a common design pattern for building applications: -the concrete syntax is a functor on the generic resource grammar library -interface ``Syntax`` and a domain-specific lexicon interface, here -``LexMusic``. - -All interfaces that are ``open``ed in the completed module -must be instantiated. - -Notice that the completion of an ``incomplete`` module -may at the same time extend modules of the same type (which need -not be completions). It can also add new judgements in a module body, -and restrict inheritance from the functor. -``` - concrete MusicIta = MusicI - [f] with - (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** { - - lin f = ... - - } ; -``` - - -===Compiling interfaces, instances, and parametrized modules=== - -Interfaces, instances, and parametrized modules are purely a -front-end feature of GF: these module types do not exist in -the ``gfc`` and ``gfr`` formats. The compiler has -nevertheless to keep track of their dependencies and modification -times. Here is a summary of how they are compiled: -- an ``interface`` is compiled into a ``resource`` with an empty body -- an ``instance`` is compiled into a ``resource`` in union with its - ``interface`` -- an ``incomplete`` module (``concrete`` or ``resource``) is compiled - into a module of the same type with an empty body -- a completion module (``concrete`` or ``resource``) is compiled - into a module of the same type by compiling its functor so that, instead of - each ``interface``, its given ``instance`` is used - - -This means that some generated code is duplicated, because those operations that -do have complete definitions in an ``interface`` are copied to each of -the ``instances``. - - -=Summary of module syntax and semantics= - - -==Abstract syntax modules== - -Syntax: - -``abstract`` A ``=`` (A#SUB1,...,A#SUBn ``**``)? -``{``J#SUB1 ``;`` ...
``;`` J#SUBm ``; }`` - - - -where -- i >= 0 -- each //A#SUBi// is itself an abstract module, - possibly with restrictions on inheritance, i.e. //A#SUBi//``-[``//f,..,g//``]`` - or //A#SUBi//``[``//f,..,g//``]`` -- each //J#SUBi// is a judgement of one of the forms - ``cat, fun, def, data`` - - -Semantic conditions: -- all inherited names declared in each //A#SUBi// and //A// must be distinct -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Concrete syntax modules== - -Syntax: - -``incomplete``? ``concrete`` C ``of`` A ``=`` -(C#SUB1,...,C#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }`` - - - -where -- i >= 0 -- //A// is an abstract module -- each //C#SUBi// is a concrete module, - possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]`` -- each //O#SUBi// is an open specification, of one of the forms - - //R// - - ``(``//Q//``=``//R//``)`` - - - where //R// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms - ``lincat, lin, lindef, printname``; also the forms ``oper, param`` are - allowed, but they cannot be inherited. - - - -If the modifier ``incomplete`` appears, then any //R// in -an open specification may also be an interface or an abstract. - - -Semantic conditions: -- each ``cat`` judgement in //A// - must have a corresponding, unique - ``lincat`` judgement in //C// -- each ``fun`` judgement in //A// - must have a corresponding, unique - ``lin`` judgement in //C// -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Resource modules== - -Syntax: - -``resource`` R ``=`` -(R#SUB1,...,R#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... 
``;`` J#SUBm ``; }`` - - -where -- i >= 0 -- each //R#SUBi// is a resource, instance, or concrete module, - possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]`` -- each //O#SUBi// is an open specification, of one of the forms - - //P// - - ``(``//Q//``=``//R//``)`` - - - where //P// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms ``oper, param`` - - - - -Semantic conditions: -- all names defined in each //R#SUBi// and //R// must be distinct -- all constants declared must have a definition -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Interface modules== - -Syntax: - -``interface`` R ``=`` -(R#SUB1,...,R#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }`` - - -where -- i >= 0 -- each //R#SUBi// is an interface or abstract module, - possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]`` -- each //O#SUBi// is an open specification, of one of the forms - - //P// - - ``(``//Q//``=``//R//``)`` - - - where //P// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms ``oper, param`` - - - - -Semantic conditions: -- all names declared in each //R#SUBi// and //R// must be distinct -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - - -==Instance modules== - -Syntax: - -``instance`` R ``of`` I ``=`` -(R#SUB1,...,R#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }`` - - -where -- i >= 0 -- //I// is an interface module -- each //R#SUBi// is an instance, resource, or concrete module, - possibly with restrictions on inheritance, i.e. 
//R#SUBi//``-[``//f,..,g//``]`` - -- each //O#SUBi// is an open specification, of one of the forms - - //P// - - ``(``//Q//``=``//R//``)`` - - - where //P// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms - ``oper, param`` - - - - -Semantic conditions: -- all names declared in each //R#SUBi//, //I//, and //R// must be distinct -- all constants declared in //I// must have a definition either in - //I// or //R// -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Instantiated concrete syntax modules== - -Syntax: - -``concrete`` C ``of`` A ``=`` -(C#SUB1,...,C#SUBn ``**``)? -B -``with`` -``(``I#SUB1 ``=``J#SUB1``),`` ... -``, (``I#SUBp ``=``J#SUBp``)`` -(``-``? ``[``c#SUB1,...,c#SUBq ``]``)? -(``**``? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``)? ``;`` - -where -- i >= 0 -- //A// is an abstract module -- each //C#SUBi// is a concrete module, - possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]`` -- //B// is an incomplete concrete syntax of //A// -- each //I#SUBi// is an interface or an abstract -- each //J#SUBi// is an instance or a concrete of //I#SUBi// -- each //O#SUBi// is an open specification, of one of the forms - - //R// - - ``(``//Q//``=``//R//``)`` - - - where //R// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms - ``lincat, lin, lindef, printname``; also the forms ``oper, param`` are - allowed, but they cannot be inherited. - - - - diff --git a/doc/overview-resource.txt b/doc/overview-resource.txt deleted file mode 100644 index 2f9b2cd04..000000000 --- a/doc/overview-resource.txt +++ /dev/null @@ -1,300 +0,0 @@ -==Texts, phrases, and utterances== - -The outermost linguistic structure is ``Text``.
``Text``s are composed -from Phrases (``Phr``) followed by punctuation marks - one of ".", "?" or -"!" (with their proper variants in Spanish and Arabic). Here is an -example of a ``Text`` string. -``` - John walks. Why? He doesn't want to sleep! -``` -Phrases are mostly built from Utterances (``Utt``), which in turn are -declarative sentences, questions, or imperatives - but there -are also "one-word utterances" consisting of noun phrases -or other subsentential phrases. Some Phrases are atomic, -for instance "yes" and "no". Here are some examples of Phrases. -``` - yes - come on, John - but John walks - give me the stick please - don't you know that he is sleeping - a glass of wine - a glass of wine please -``` -There is no connection between the punctuation marks and the -types of utterances. This reflects the fact that the punctuation -mark in a real text is selected as a function of the speech act -rather than the grammatical form of an utterance. The following -text is thus well-formed. -``` - John walks. John walks? John walks! -``` -What is the difference between Phrase and Utterance? Just technical: -a Phrase is an Utterance with an optional leading conjunction ("but") -and an optional trailing vocative ("John", "please"). - - -==Sentences and clauses== - -TODO: use overloaded operations in the examples. - -The richest of the categories below Utterance is ``S``, Sentence. A Sentence -is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity. -For example, each of the following strings has a distinct syntax tree -in the category Sentence: -``` - John walks - John doesn't walk - John walked - John didn't walk - John has walked - John hasn't walked - John will walk - John won't walk - ... -``` -whereas in the category Clause all of them are just different forms of -the same tree. -The difference between Sentence and Clause is thus also rather technical.
-It may not correspond exactly to any standard usage of the terms -"clause" and "sentence". - -Figure 1 shows a type-annotated syntax tree of the Text "John walks." -and gives an overview of the structural levels. - -#BFIG - -``` -Node Constructor Value type Other constructors ------------------------------------------------------------ - 1. TFullStop Text TQuestMark - 2. (PhrUtt Phr - 3. NoPConj PConj but_PConj - 4. (UttS Utt UttQS - 5. (UseCl S UseQCl - 6. TPres Tense TPast - 7. ASimul Anter AAnter - 8. PPos Pol PNeg - 9. (PredVP Cl -10. (UsePN NP UsePron, DetCN -11. john_PN) PN mary_PN -12. (UseV VP ComplV2, ComplV3 -13. walk_V)))) V sleep_V -14. NoVoc) Voc please_Voc -15. TEmpty Text -``` - -#BCENTER -Figure 1. Type-annotated syntax tree of the Text "John walks." -#ECENTER - -#EFIG - -Here are some examples of the results of changing constructors. -``` - 1. TFullStop -> TQuestMark John walks? - 3. NoPConj -> but_PConj But John walks. - 6. TPres -> TPast John walked. - 7. ASimul -> AAnter John has walked. - 8. PPos -> PNeg John doesn't walk. -11. john_PN -> mary_PN Mary walks. -13. walk_V -> sleep_V John sleeps. -14. NoVoc -> please_Voc John sleeps please. -``` -All constructors cannot of course be changed so freely, because the -resulting tree would not remain well-typed. Here are some changes involving -many constructors: -``` - 4- 5. UttS (UseCl ...) -> - UttQS (UseQCl (... QuestCl ...)) Does John walk? -10-11. UsePN john_PN -> - UsePron we_Pron We walk. -12-13. UseV walk_V -> - ComplV2 love_V2 this_NP John loves this. -``` - - -==Parts of sentences== - -The linguistic phenomena mostly discussed in both traditional grammars and modern -syntax belong to the level of Clauses, that is, lines 9-13, and occasionally -to Sentences, lines 5-13. At this level, the major categories are -``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically -consists of just an ``NP`` and a ``VP``. 
-The internal structure of both ``NP`` and ``VP`` can be very complex, -and these categories are mutually recursive: not only can a ``VP`` -contain an ``NP``, -``` - [VP loves [NP Mary]] -``` -but also an ``NP`` can contain a ``VP`` -``` - [NP every man [RS who [VP walks]]] -``` -(a labelled bracketing like this is of course just a rough approximation of -a GF syntax tree, but still a useful device of exposition). - -Most of the resource modules thus define functions that are used inside -NPs and VPs. Here is a brief overview: - -**Noun**. How to construct NPs. The three main mechanisms -for constructing NPs are -- from proper names: "John" -- from pronouns: "we" -- from common nouns by determiners: "this man" - - -The ``Noun`` module also defines the construction of common nouns. -The most frequent ways are -- lexical noun items: "man" -- adjectival modification: "old man" -- relative clause modification: "man who sleeps" -- application of relational nouns: "successor of the number" - - -**Verb**. -How to construct VPs. The main mechanism is verbs with their arguments, -for instance, -- one-place verbs: "walks" -- two-place verbs: "loves Mary" -- three-place verbs: "gives her a kiss" -- sentence-complement verbs: "says that it is cold" -- VP-complement verbs: "wants to give her a kiss" - - -A special verb is the copula, "be" in English but not even realized -by a verb in all languages. -A copula can take different kinds of complement: -- an adjectival phrase: "(John is) old" -- an adverb: "(John is) here" -- a noun phrase: "(John is) a man" - - -**Adjective**. -How to construct ``AP``s. The main ways are -- positive forms of adjectives: "old" -- comparative forms with object of comparison: "older than John" - - -**Adverb**. -How to construct ``Adv``s. The main ways are -- from adjectives: "slowly" -- as prepositional phrases: "in the car" - - -==Modules and their names== - -This section is not necessary for users of the library. - -TODO: explain the overloaded API.
- -The resource modules are named after the kind of -phrases that are constructed in them, -and they can be roughly classified by the "level" or "size" of expressions that are -formed in them: -- Larger than sentence: ``Text``, ``Phrase`` -- Same level as sentence: ``Sentence``, ``Question``, ``Relative`` -- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb`` -- Cross-cut (coordination): ``Conjunction`` - - -Because of mutual recursion such as in embedded sentences, this classification is -not a complete order. However, no mutual dependence is needed between the -modules themselves - they can all be compiled separately. This is due -to the module ``Cat``, which defines the type system common to the other modules. -For instance, the types ``NP`` and ``VP`` are defined in ``Cat``, -and the module ``Verb`` only -needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement -a rule such as -``` - Verb.ComplV2 : V2 -> NP -> VP -``` -it is enough to know the linearization type of ``NP`` -(as well as those of ``V2`` and ``VP``, all -given in ``Cat``). It is not necessary to know what -ways there are to build ``NP``s (given in ``Noun``), since all these ways must -conform to the linearization type defined in ``Cat``. Thus the format of -category-specific modules is as follows: -``` - abstract Adjective = Cat ** {...} - abstract Noun = Cat ** {...} - abstract Verb = Cat ** {...} -``` - - -==Top-level grammar and lexicon== - -The module ``Grammar`` collects all the category-specific modules into -a complete grammar: -``` - abstract Grammar = - Adjective, Noun, Verb, ..., Structural, Idiom -``` -The module ``Structural`` is a lexicon of structural words (function words), -such as determiners. - -The module ``Idiom`` is a collection of idiomatic structures whose -implementation is very language-dependent. An example is existential -structures ("there is", "es gibt", "il y a", etc). 
- -The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of -ca. 350 content words: -``` - abstract Lang = Grammar, Lexicon -``` -Using ``Lang`` instead of ``Grammar`` as a library may provide, -for free, some words needed in an application. But its main purpose is to -help test the resource library, rather than to serve as a resource itself. -It does not even seem realistic to develop -a general-purpose multilingual resource lexicon. - -The diagram in Figure 2 shows the structure of the API. - -#BFIG - -#GRAMMAR - -#BCENTER -Figure 2. The resource syntax API. -#ECENTER - -#EFIG - -==Language-specific syntactic structures== - -The API collected in ``Grammar`` has been designed to be implementable for -all languages in the resource package. It does contain some rules that -are strange or superfluous in some languages; for instance, the distinction -between definite and indefinite articles does not apply to Finnish and Russian. -But such rules are still easy to implement: they only create some superfluous -ambiguity in the languages in question. - -But the library makes no claim that all languages should have exactly the same -abstract syntax. The common API is therefore extended by language-dependent -rules. The top level of each language looks as follows (with English as example): -``` - abstract English = Grammar, ExtraEngAbs, DictEngAbs -``` -where ``ExtraEngAbs`` is a collection of syntactic structures specific to English, -and ``DictEngAbs`` is an English dictionary -(at the moment, it consists of ``IrregEngAbs``, -the irregular verbs of English). Each of these language-specific grammars has -the potential to grow into a full-scale grammar of the language. These grammars -can also be used as libraries, but the possibility of using functors is lost.
- -To give a better overview of language-specific structures, -modules like ``ExtraEngAbs`` -are built from a language-independent module ``ExtraAbs`` -by restricted inheritance: -``` - abstract ExtraEngAbs = Extra [f,g,...] -``` -Thus any category and function in ``Extra`` may be shared by a subset of all -languages. One can see this set-up as a matrix, which tells -what ``Extra`` structures -are implemented in what languages. For the common API in ``Grammar``, the matrix -is filled with 1's (everything is implemented in every language). - -Language-specific extensions and the use of restricted -inheritance are a recent addition to the resource grammar library, and -have only been exploited on a very small scale so far.
