diff options
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/gf-formalism.html | 350 | ||||
| -rw-r--r-- | doc/gf-formalism.txt | 240 |
2 files changed, 590 insertions, 0 deletions
diff --git a/doc/gf-formalism.html b/doc/gf-formalism.html new file mode 100644 index 000000000..e52929c03 --- /dev/null +++ b/doc/gf-formalism.html @@ -0,0 +1,350 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> +<HTML> +<HEAD> +<META NAME="generator" CONTENT="http://txt2tags.sf.net"> +<TITLE>A Birds-Eye View of GF as a Grammar Formalism</TITLE> +</HEAD><BODY BGCOLOR="white" TEXT="black"> +<P ALIGN="center"><CENTER><H1>A Birds-Eye View of GF as a Grammar Formalism</H1> +<FONT SIZE="4"> +<I>Author: Aarne Ranta</I><BR> +Last update: Thu Feb 2 14:16:01 2006 +</FONT></CENTER> + +<P></P> +<HR NOSHADE SIZE=1> +<P></P> + <UL> + <LI><A HREF="#toc1">GF in a few words</A> + <LI><A HREF="#toc2">History of GF</A> + <LI><A HREF="#toc3">Some key ingredients of GF in other grammar formalisms</A> + <LI><A HREF="#toc4">Examples of descriptions in each formalism</A> + <LI><A HREF="#toc5">Lambda terms and records</A> + <LI><A HREF="#toc6">The structure of GF formalisms</A> + <LI><A HREF="#toc7">The expressivity of GF</A> + <LI><A HREF="#toc8">Grammars and parsing</A> + <LI><A HREF="#toc9">Grammars as software libraries</A> + <LI><A HREF="#toc10">Multilinguality</A> + <LI><A HREF="#toc11">Parametrized modules</A> + </UL> + +<P></P> +<HR NOSHADE SIZE=1> +<P></P> +<P> +<IMG ALIGN="middle" SRC="gf-logo.gif" BORDER="0" ALT=""> +</P> +<P> +<I>Abstract. This document gives a general description of the</I> +<I>Grammatical Framework (GF), with comparisons to other grammar</I> +<I>formalisms such as CG, ACG, HPSG, and LFG.</I> +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc1"></A> +<H2>GF in a few words</H2> +<P> +Grammatical Framework (GF) is a grammar formalism +based on <B>constructive type theory</B>. +</P> +<P> +GF makes a distinction between <B>abstract syntax</B> and <B>concrete syntax</B>. +</P> +<P> +The abstract syntax part of GF is a <B>logical framework</B>, with +dependent types and higher-order functions. +</P> +<P> +The concrete syntax is a system of <B>records</B> containing strings and features. +</P> +<P> +A GF grammar defines a <B>reversible homomorphism</B> from an abstract syntax to a +concrete syntax. +</P> +<P> +A <B>multilingual GF grammar</B> is a set of concrete syntaxes associated with +one abstract syntax. +</P> +<P> +GF grammars are written in a high-level <B>functional programming language</B>, +which is compiled into a <B>core language</B> (GFC). +</P> +<P> +GF grammars can be used as <B>resources</B>, i.e. as libraries for writing +new grammars; these are compiled and optimized by the method of +<B>grammar composition</B>. +</P> +<P> +GF has a <B>module system</B> that supports grammar engineering and separate +compilation. +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc2"></A> +<H2>History of GF</H2> +<P> +1988. Intuitionistic Categorial Grammar; type theory as abstract syntax, +playing the role of Montague's analysis trees. Grammars implemented in Prolog. +</P> +<P> +1994. Type-Theoretical Grammar. Abstract syntax organized as a system of +combinators. Grammars implemented in ALF. +</P> +<P> +1996. Multilingual Type-Theoretical Grammar. Rules for generating six +languages from the same abstract syntax. Grammars implemented in ALF, ML, and +Haskell. +</P> +<P> +1998. The first implementation of GF as a language of its own. +</P> +<P> +2000. New version of GF: high-level functional source language, records used +for concrete syntax. +</P> +<P> +2003. The module system. +</P> +<P> +2004. Ljunglöf's thesis <I>Expressivity and Complexity of GF</I>. +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc3"></A> +<H2>Some key ingredients of GF in other grammar formalisms</H2> +<UL> +<LI>[GF ]: Grammatical Framework +<LI>[CG ]: categorial grammar +<LI>[ACG ]: abstract categorial grammar +<LI>[HPSG ]: head-driven phrase structure grammar +<LI>[LFG ]: lexical functional grammar +</UL> + +<TABLE CELLPADDING="4" BORDER="1"> +<TR> +<TD ALIGN="center">/</TD> +<TD>GF</TD> +<TD>ACG</TD> +<TD>LFG</TD> +<TD>HPSG</TD> +<TD>CG</TD> +</TR> +<TR> +<TD>abstract vs concrete syntax</TD> +<TD>X</TD> +<TD>X</TD> +<TD>?</TD> +<TD>-</TD> +<TD>-</TD> +</TR> +<TR> +<TD>type theory</TD> +<TD>X</TD> +<TD>X</TD> +<TD>-</TD> +<TD>-</TD> +<TD>X</TD> +</TR> +<TR> +<TD>records and features</TD> +<TD>X</TD> +<TD>-</TD> +<TD>X</TD> +<TD>X</TD> +<TD>-</TD> +</TR> +</TABLE> + +<P></P> +<P> +<!-- NEW --> +</P> +<A NAME="toc4"></A> +<H2>Examples of descriptions in each formalism</H2> +<P> +To be written... +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc5"></A> +<H2>Lambda terms and records</H2> +<P> +In CS, abstract syntax is trees and concrete syntax is strings. +This works more or less for programming languages. +</P> +<P> +In CG, all syntax is lambda terms. +</P> +<P> +In Montague grammar, abstract syntax is lambda terms and +concrete syntax is trees. Abstract syntax as lambda terms +can be considered well-established. +</P> +<P> +In PATR and HPSG, concrete syntax it records. This can be considered +well-established for natural languages. +</P> +<P> +In ACG, both are lambda terms. This is more general than GF, +but reversibility requires linearity restriction, which can be +unnatural for grammar writing. +</P> +<P> +In GF, linearization from lambda terms to records is reversible, +and grammar writing is not restricted to linear terms. +</P> +<P> +Grammar composition in ACG is just function composition. In GF, +it is more restricted... +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc6"></A> +<H2>The structure of GF formalisms</H2> +<P> +The following diagram (to be drawn properly!) describes the +levels. +</P> +<PRE> + | programming language design + V + GF source language + | + | type-directed partial evaluation + V + GFC assembly language + | + | Ljunglöf's translation + V + MCFG parser +</PRE> +<P> +The last two phases are nontrivial mathematica properties. +</P> +<P> +In most grammar formalisms, grammarians have to work on the GFC +(or MCFG) level. +</P> +<P> +Maybe they use macros - they are therefore like macro assemblers. But there +are no separately compiled library modules, no type checking, etc. +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc7"></A> +<H2>The expressivity of GF</H2> +<P> +Parsing complexity is the same as MCFG: polynomial, with +unrestricted exponent depending on grammar. +This is between TAG and HPSG. +</P> +<P> +If semantic well-formedness (type theory) is taken into account, +then arbitrary logic can be expressed. The well-formedness of +abstract syntax is decidable, but the well-formedness of a +concrete-syntax string can require an arbitrary proof construction +and is therefore undecidable. +</P> +<P> +Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF +has the goal of assigning intended trees for strings. This is +generalized to shared trees for different languages. +</P> +<P> +The high-level language strives after the properties of +writability and readability (programming language notions). +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc8"></A> +<H2>Grammars and parsing</H2> +<P> +In many projects, a grammar is just seen as a <B>declarative parsing program</B>. +</P> +<P> +For GF, a grammar is primarily the <B>definition of a language</B>. +</P> +<P> +Detaching grammars from parsers is a good idea, giving +</P> +<UL> +<LI>more efficient and robust parsing (statistical etc) +<LI>cleaner grammars +</UL> + +<P> +Separating abstract from concrete syntax is a prerequisite for this: +we want parsers to return abstract syntax objects, and these must exist +independently of parse trees. +</P> +<P> +A possible radical approach to parsing: +use a grammar to generate a treebank and machine-learn +a statistical parser from this. +</P> +<P> +Comparison: Steedman in CCG has done something like this. +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc9"></A> +<H2>Grammars as software libraries</H2> +<P> +Reuse for different purposes. +</P> +<P> +Grammar composition. +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc10"></A> +<H2>Multilinguality</H2> +<P> +In <B>application grammars</B>, the AS is a semantic +model, and a CS covers domain terminology and idioms. +</P> +<P> +This can give publication-quality translation on +limited domains (e.g. the WebALT project). +</P> +<P> +Resource grammars with grammar composition lead to +<B>compile-time transfer</B>. +</P> +<P> +When is <B>run-time transfer</B> necessary? +</P> +<P> +Cf. CLE (Core Language Engine). +</P> +<P> +<!-- NEW --> +</P> +<A NAME="toc11"></A> +<H2>Parametrized modules</H2> +<P> +This notion comes from the ML language in the 1980's. +</P> +<P> +It can be used for sharing even more code between languages +than their AS. +</P> +<P> +Especially, for related languages (Scandinavian, Romance). +</P> +<P> +Cf. grammar porting in CLE: what they do with untyped +macro packages GF does with typable interfaces. +</P> + +<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) --> +<!-- cmdline: txt2tags -thtml -\-toc gf-formalism.txt --> +</BODY></HTML> diff --git a/doc/gf-formalism.txt b/doc/gf-formalism.txt new file mode 100644 index 000000000..ca63656d1 --- /dev/null +++ b/doc/gf-formalism.txt @@ -0,0 +1,240 @@ +A Birds-Eye View of GF as a Grammar Formalism +Author: Aarne Ranta +Last update: %%date(%c) + +% NOTE: this is a txt2tags file. +% Create an html file from this file using: +% txt2tags -thtml --toc gf-formalism.txt + +%!target:html + +%!postproc(html): #NEW <!-- NEW --> + +[gf-logo.gif] + +//Abstract. This document gives a general description of the// +//Grammatical Framework (GF), with comparisons to other grammar// +//formalisms such as CG, ACG, HPSG, and LFG.// + + +#NEW + +==GF in a few words== + +Grammatical Framework (GF) is a grammar formalism +based on **constructive type theory**. + +GF makes a distinction between **abstract syntax** and **concrete syntax**. + +The abstract syntax part of GF is a **logical framework**, with +dependent types and higher-order functions. + +The concrete syntax is a system of **records** containing strings and features. + +A GF grammar defines a **reversible homomorphism** from an abstract syntax to a +concrete syntax. + +A **multilingual GF grammar** is a set of concrete syntaxes associated with +one abstract syntax. + +GF grammars are written in a high-level **functional programming language**, +which is compiled into a **core language** (GFC). + +GF grammars can be used as **resources**, i.e. as libraries for writing +new grammars; these are compiled and optimized by the method of +**grammar composition**. + +GF has a **module system** that supports grammar engineering and separate +compilation. + + +#NEW + +==History of GF== + +1988. Intuitionistic Categorial Grammar; type theory as abstract syntax, +playing the role of Montague's analysis trees. Grammars implemented in Prolog. + +1994. Type-Theoretical Grammar. Abstract syntax organized as a system of +combinators. Grammars implemented in ALF. + +1996. Multilingual Type-Theoretical Grammar. Rules for generating six +languages from the same abstract syntax. Grammars implemented in ALF, ML, and +Haskell. + +1998. The first implementation of GF as a language of its own. + +2000. New version of GF: high-level functional source language, records used +for concrete syntax. + +2003. The module system. + +2004. Ljunglöf's thesis //Expressivity and Complexity of GF//. + + + +#NEW + +==Some key ingredients of GF in other grammar formalisms== + +- [GF ]: Grammatical Framework +- [CG ]: categorial grammar +- [ACG ]: abstract categorial grammar +- [HPSG ]: head-driven phrase structure grammar +- [LFG ]: lexical functional grammar + + +| / | GF | ACG | LFG | HPSG | CG | +| abstract vs concrete syntax | X | X | ? | - | - | +| type theory | X | X | - | - | X | +| records and features | X | - | X | X | - | + + +#NEW + +==Examples of descriptions in each formalism== + +To be written... + + +#NEW + +==Lambda terms and records== + +In CS, abstract syntax is trees and concrete syntax is strings. +This works more or less for programming languages. + +In CG, all syntax is lambda terms. + +In Montague grammar, abstract syntax is lambda terms and +concrete syntax is trees. Abstract syntax as lambda terms +can be considered well-established. + +In PATR and HPSG, concrete syntax it records. This can be considered +well-established for natural languages. + +In ACG, both are lambda terms. This is more general than GF, +but reversibility requires linearity restriction, which can be +unnatural for grammar writing. + +In GF, linearization from lambda terms to records is reversible, +and grammar writing is not restricted to linear terms. + +Grammar composition in ACG is just function composition. In GF, +it is more restricted... + + +#NEW + +==The structure of GF formalisms== + +The following diagram (to be drawn properly!) describes the +levels. +``` + | programming language design + V + GF source language + | + | type-directed partial evaluation + V + GFC assembly language + | + | Ljunglöf's translation + V + MCFG parser +``` +The last two phases are nontrivial mathematica properties. + +In most grammar formalisms, grammarians have to work on the GFC +(or MCFG) level. + +Maybe they use macros - they are therefore like macro assemblers. But there +are no separately compiled library modules, no type checking, etc. + + +#NEW + +==The expressivity of GF== + +Parsing complexity is the same as MCFG: polynomial, with +unrestricted exponent depending on grammar. +This is between TAG and HPSG. + +If semantic well-formedness (type theory) is taken into account, +then arbitrary logic can be expressed. The well-formedness of +abstract syntax is decidable, but the well-formedness of a +concrete-syntax string can require an arbitrary proof construction +and is therefore undecidable. + +Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF +has the goal of assigning intended trees for strings. This is +generalized to shared trees for different languages. + +The high-level language strives after the properties of +writability and readability (programming language notions). + + +#NEW + +==Grammars and parsing== + +In many projects, a grammar is just seen as a **declarative parsing program**. + +For GF, a grammar is primarily the **definition of a language**. + +Detaching grammars from parsers is a good idea, giving +- more efficient and robust parsing (statistical etc) +- cleaner grammars + + +Separating abstract from concrete syntax is a prerequisite for this: +we want parsers to return abstract syntax objects, and these must exist +independently of parse trees. + +A possible radical approach to parsing: +use a grammar to generate a treebank and machine-learn +a statistical parser from this. + +Comparison: Steedman in CCG has done something like this. + + +#NEW + +==Grammars as software libraries== + +Reuse for different purposes. + +Grammar composition. + + +#NEW + +==Multilinguality== + +In **application grammars**, the AS is a semantic +model, and a CS covers domain terminology and idioms. + +This can give publication-quality translation on +limited domains (e.g. the WebALT project). + +Resource grammars with grammar composition lead to +**compile-time transfer**. + +When is **run-time transfer** necessary? + +Cf. CLE (Core Language Engine). + + +#NEW + +==Parametrized modules== + +This notion comes from the ML language in the 1980's. + +It can be used for sharing even more code between languages +than their AS. + +Especially, for related languages (Scandinavian, Romance). + +Cf. grammar porting in CLE: what they do with untyped +macro packages GF does with typable interfaces. |
