summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
Diffstat (limited to 'doc')
-rw-r--r--doc/gf-formalism.html350
-rw-r--r--doc/gf-formalism.txt240
2 files changed, 590 insertions, 0 deletions
diff --git a/doc/gf-formalism.html b/doc/gf-formalism.html
new file mode 100644
index 000000000..e52929c03
--- /dev/null
+++ b/doc/gf-formalism.html
@@ -0,0 +1,350 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<HTML>
+<HEAD>
+<META NAME="generator" CONTENT="http://txt2tags.sf.net">
+<TITLE>A Birds-Eye View of GF as a Grammar Formalism</TITLE>
+</HEAD><BODY BGCOLOR="white" TEXT="black">
+<P ALIGN="center"><CENTER><H1>A Birds-Eye View of GF as a Grammar Formalism</H1>
+<FONT SIZE="4">
+<I>Author: Aarne Ranta</I><BR>
+Last update: Thu Feb 2 14:16:01 2006
+</FONT></CENTER>
+
+<P></P>
+<HR NOSHADE SIZE=1>
+<P></P>
+ <UL>
+ <LI><A HREF="#toc1">GF in a few words</A>
+ <LI><A HREF="#toc2">History of GF</A>
+ <LI><A HREF="#toc3">Some key ingredients of GF in other grammar formalisms</A>
+ <LI><A HREF="#toc4">Examples of descriptions in each formalism</A>
+ <LI><A HREF="#toc5">Lambda terms and records</A>
+ <LI><A HREF="#toc6">The structure of GF formalisms</A>
+ <LI><A HREF="#toc7">The expressivity of GF</A>
+ <LI><A HREF="#toc8">Grammars and parsing</A>
+ <LI><A HREF="#toc9">Grammars as software libraries</A>
+ <LI><A HREF="#toc10">Multilinguality</A>
+ <LI><A HREF="#toc11">Parametrized modules</A>
+ </UL>
+
+<P></P>
+<HR NOSHADE SIZE=1>
+<P></P>
+<P>
+<IMG ALIGN="middle" SRC="gf-logo.gif" BORDER="0" ALT="">
+</P>
+<P>
+<I>Abstract. This document gives a general description of the</I>
+<I>Grammatical Framework (GF), with comparisons to other grammar</I>
+<I>formalisms such as CG, ACG, HPSG, and LFG.</I>
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc1"></A>
+<H2>GF in a few words</H2>
+<P>
+Grammatical Framework (GF) is a grammar formalism
+based on <B>constructive type theory</B>.
+</P>
+<P>
+GF makes a distinction between <B>abstract syntax</B> and <B>concrete syntax</B>.
+</P>
+<P>
+The abstract syntax part of GF is a <B>logical framework</B>, with
+dependent types and higher-order functions.
+</P>
+<P>
+The concrete syntax is a system of <B>records</B> containing strings and features.
+</P>
+<P>
+A GF grammar defines a <B>reversible homomorphism</B> from an abstract syntax to a
+concrete syntax.
+</P>
+<P>
+A <B>multilingual GF grammar</B> is a set of concrete syntaxes associated with
+one abstract syntax.
+</P>
+<P>
+GF grammars are written in a high-level <B>functional programming language</B>,
+which is compiled into a <B>core language</B> (GFC).
+</P>
+<P>
+GF grammars can be used as <B>resources</B>, i.e. as libraries for writing
+new grammars; these are compiled and optimized by the method of
+<B>grammar composition</B>.
+</P>
+<P>
+GF has a <B>module system</B> that supports grammar engineering and separate
+compilation.
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc2"></A>
+<H2>History of GF</H2>
+<P>
+1988. Intuitionistic Categorial Grammar; type theory as abstract syntax,
+playing the role of Montague's analysis trees. Grammars implemented in Prolog.
+</P>
+<P>
+1994. Type-Theoretical Grammar. Abstract syntax organized as a system of
+combinators. Grammars implemented in ALF.
+</P>
+<P>
+1996. Multilingual Type-Theoretical Grammar. Rules for generating six
+languages from the same abstract syntax. Grammars implemented in ALF, ML, and
+Haskell.
+</P>
+<P>
+1998. The first implementation of GF as a language of its own.
+</P>
+<P>
+2000. New version of GF: high-level functional source language, records used
+for concrete syntax.
+</P>
+<P>
+2003. The module system.
+</P>
+<P>
+2004. Ljunglöf's thesis <I>Expressivity and Complexity of GF</I>.
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc3"></A>
+<H2>Some key ingredients of GF in other grammar formalisms</H2>
+<UL>
+<LI>[GF ]: Grammatical Framework
+<LI>[CG ]: categorial grammar
+<LI>[ACG ]: abstract categorial grammar
+<LI>[HPSG ]: head-driven phrase structure grammar
+<LI>[LFG ]: lexical functional grammar
+</UL>
+
+<TABLE CELLPADDING="4" BORDER="1">
+<TR>
+<TD ALIGN="center">/</TD>
+<TD>GF</TD>
+<TD>ACG</TD>
+<TD>LFG</TD>
+<TD>HPSG</TD>
+<TD>CG</TD>
+</TR>
+<TR>
+<TD>abstract vs concrete syntax</TD>
+<TD>X</TD>
+<TD>X</TD>
+<TD>?</TD>
+<TD>-</TD>
+<TD>-</TD>
+</TR>
+<TR>
+<TD>type theory</TD>
+<TD>X</TD>
+<TD>X</TD>
+<TD>-</TD>
+<TD>-</TD>
+<TD>X</TD>
+</TR>
+<TR>
+<TD>records and features</TD>
+<TD>X</TD>
+<TD>-</TD>
+<TD>X</TD>
+<TD>X</TD>
+<TD>-</TD>
+</TR>
+</TABLE>
+
+<P></P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc4"></A>
+<H2>Examples of descriptions in each formalism</H2>
+<P>
+To be written...
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc5"></A>
+<H2>Lambda terms and records</H2>
+<P>
+In CS, abstract syntax is trees and concrete syntax is strings.
+This works more or less for programming languages.
+</P>
+<P>
+In CG, all syntax is lambda terms.
+</P>
+<P>
+In Montague grammar, abstract syntax is lambda terms and
+concrete syntax is trees. Abstract syntax as lambda terms
+can be considered well-established.
+</P>
+<P>
+In PATR and HPSG, concrete syntax it records. This can be considered
+well-established for natural languages.
+</P>
+<P>
+In ACG, both are lambda terms. This is more general than GF,
+but reversibility requires linearity restriction, which can be
+unnatural for grammar writing.
+</P>
+<P>
+In GF, linearization from lambda terms to records is reversible,
+and grammar writing is not restricted to linear terms.
+</P>
+<P>
+Grammar composition in ACG is just function composition. In GF,
+it is more restricted...
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc6"></A>
+<H2>The structure of GF formalisms</H2>
+<P>
+The following diagram (to be drawn properly!) describes the
+levels.
+</P>
+<PRE>
+ | programming language design
+ V
+ GF source language
+ |
+ | type-directed partial evaluation
+ V
+ GFC assembly language
+ |
+ | Ljunglöf's translation
+ V
+ MCFG parser
+</PRE>
+<P>
+The last two phases are nontrivial mathematica properties.
+</P>
+<P>
+In most grammar formalisms, grammarians have to work on the GFC
+(or MCFG) level.
+</P>
+<P>
+Maybe they use macros - they are therefore like macro assemblers. But there
+are no separately compiled library modules, no type checking, etc.
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc7"></A>
+<H2>The expressivity of GF</H2>
+<P>
+Parsing complexity is the same as MCFG: polynomial, with
+unrestricted exponent depending on grammar.
+This is between TAG and HPSG.
+</P>
+<P>
+If semantic well-formedness (type theory) is taken into account,
+then arbitrary logic can be expressed. The well-formedness of
+abstract syntax is decidable, but the well-formedness of a
+concrete-syntax string can require an arbitrary proof construction
+and is therefore undecidable.
+</P>
+<P>
+Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF
+has the goal of assigning intended trees for strings. This is
+generalized to shared trees for different languages.
+</P>
+<P>
+The high-level language strives after the properties of
+writability and readability (programming language notions).
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc8"></A>
+<H2>Grammars and parsing</H2>
+<P>
+In many projects, a grammar is just seen as a <B>declarative parsing program</B>.
+</P>
+<P>
+For GF, a grammar is primarily the <B>definition of a language</B>.
+</P>
+<P>
+Detaching grammars from parsers is a good idea, giving
+</P>
+<UL>
+<LI>more efficient and robust parsing (statistical etc)
+<LI>cleaner grammars
+</UL>
+
+<P>
+Separating abstract from concrete syntax is a prerequisite for this:
+we want parsers to return abstract syntax objects, and these must exist
+independently of parse trees.
+</P>
+<P>
+A possible radical approach to parsing:
+use a grammar to generate a treebank and machine-learn
+a statistical parser from this.
+</P>
+<P>
+Comparison: Steedman in CCG has done something like this.
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc9"></A>
+<H2>Grammars as software libraries</H2>
+<P>
+Reuse for different purposes.
+</P>
+<P>
+Grammar composition.
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc10"></A>
+<H2>Multilinguality</H2>
+<P>
+In <B>application grammars</B>, the AS is a semantic
+model, and a CS covers domain terminology and idioms.
+</P>
+<P>
+This can give publication-quality translation on
+limited domains (e.g. the WebALT project).
+</P>
+<P>
+Resource grammars with grammar composition lead to
+<B>compile-time transfer</B>.
+</P>
+<P>
+When is <B>run-time transfer</B> necessary?
+</P>
+<P>
+Cf. CLE (Core Language Engine).
+</P>
+<P>
+<!-- NEW -->
+</P>
+<A NAME="toc11"></A>
+<H2>Parametrized modules</H2>
+<P>
+This notion comes from the ML language in the 1980's.
+</P>
+<P>
+It can be used for sharing even more code between languages
+than their AS.
+</P>
+<P>
+Especially, for related languages (Scandinavian, Romance).
+</P>
+<P>
+Cf. grammar porting in CLE: what they do with untyped
+macro packages GF does with typable interfaces.
+</P>
+
+<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
+<!-- cmdline: txt2tags -thtml -\-toc gf-formalism.txt -->
+</BODY></HTML>
diff --git a/doc/gf-formalism.txt b/doc/gf-formalism.txt
new file mode 100644
index 000000000..ca63656d1
--- /dev/null
+++ b/doc/gf-formalism.txt
@@ -0,0 +1,240 @@
+A Birds-Eye View of GF as a Grammar Formalism
+Author: Aarne Ranta
+Last update: %%date(%c)
+
+% NOTE: this is a txt2tags file.
+% Create an html file from this file using:
+% txt2tags -thtml --toc gf-formalism.txt
+
+%!target:html
+
+%!postproc(html): #NEW <!-- NEW -->
+
+[gf-logo.gif]
+
+//Abstract. This document gives a general description of the//
+//Grammatical Framework (GF), with comparisons to other grammar//
+//formalisms such as CG, ACG, HPSG, and LFG.//
+
+
+#NEW
+
+==GF in a few words==
+
+Grammatical Framework (GF) is a grammar formalism
+based on **constructive type theory**.
+
+GF makes a distinction between **abstract syntax** and **concrete syntax**.
+
+The abstract syntax part of GF is a **logical framework**, with
+dependent types and higher-order functions.
+
+The concrete syntax is a system of **records** containing strings and features.
+
+A GF grammar defines a **reversible homomorphism** from an abstract syntax to a
+concrete syntax.
+
+A **multilingual GF grammar** is a set of concrete syntaxes associated with
+one abstract syntax.
+
+GF grammars are written in a high-level **functional programming language**,
+which is compiled into a **core language** (GFC).
+
+GF grammars can be used as **resources**, i.e. as libraries for writing
+new grammars; these are compiled and optimized by the method of
+**grammar composition**.
+
+GF has a **module system** that supports grammar engineering and separate
+compilation.
+
+
+#NEW
+
+==History of GF==
+
+1988. Intuitionistic Categorial Grammar; type theory as abstract syntax,
+playing the role of Montague's analysis trees. Grammars implemented in Prolog.
+
+1994. Type-Theoretical Grammar. Abstract syntax organized as a system of
+combinators. Grammars implemented in ALF.
+
+1996. Multilingual Type-Theoretical Grammar. Rules for generating six
+languages from the same abstract syntax. Grammars implemented in ALF, ML, and
+Haskell.
+
+1998. The first implementation of GF as a language of its own.
+
+2000. New version of GF: high-level functional source language, records used
+for concrete syntax.
+
+2003. The module system.
+
+2004. Ljunglöf's thesis //Expressivity and Complexity of GF//.
+
+
+
+#NEW
+
+==Some key ingredients of GF in other grammar formalisms==
+
+- [GF ]: Grammatical Framework
+- [CG ]: categorial grammar
+- [ACG ]: abstract categorial grammar
+- [HPSG ]: head-driven phrase structure grammar
+- [LFG ]: lexical functional grammar
+
+
+| / | GF | ACG | LFG | HPSG | CG |
+| abstract vs concrete syntax | X | X | ? | - | - |
+| type theory | X | X | - | - | X |
+| records and features | X | - | X | X | - |
+
+
+#NEW
+
+==Examples of descriptions in each formalism==
+
+To be written...
+
+
+#NEW
+
+==Lambda terms and records==
+
+In CS, abstract syntax is trees and concrete syntax is strings.
+This works more or less for programming languages.
+
+In CG, all syntax is lambda terms.
+
+In Montague grammar, abstract syntax is lambda terms and
+concrete syntax is trees. Abstract syntax as lambda terms
+can be considered well-established.
+
+In PATR and HPSG, concrete syntax it records. This can be considered
+well-established for natural languages.
+
+In ACG, both are lambda terms. This is more general than GF,
+but reversibility requires linearity restriction, which can be
+unnatural for grammar writing.
+
+In GF, linearization from lambda terms to records is reversible,
+and grammar writing is not restricted to linear terms.
+
+Grammar composition in ACG is just function composition. In GF,
+it is more restricted...
+
+
+#NEW
+
+==The structure of GF formalisms==
+
+The following diagram (to be drawn properly!) describes the
+levels.
+```
+ | programming language design
+ V
+ GF source language
+ |
+ | type-directed partial evaluation
+ V
+ GFC assembly language
+ |
+ | Ljunglöf's translation
+ V
+ MCFG parser
+```
+The last two phases are nontrivial mathematica properties.
+
+In most grammar formalisms, grammarians have to work on the GFC
+(or MCFG) level.
+
+Maybe they use macros - they are therefore like macro assemblers. But there
+are no separately compiled library modules, no type checking, etc.
+
+
+#NEW
+
+==The expressivity of GF==
+
+Parsing complexity is the same as MCFG: polynomial, with
+unrestricted exponent depending on grammar.
+This is between TAG and HPSG.
+
+If semantic well-formedness (type theory) is taken into account,
+then arbitrary logic can be expressed. The well-formedness of
+abstract syntax is decidable, but the well-formedness of a
+concrete-syntax string can require an arbitrary proof construction
+and is therefore undecidable.
+
+Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF
+has the goal of assigning intended trees for strings. This is
+generalized to shared trees for different languages.
+
+The high-level language strives after the properties of
+writability and readability (programming language notions).
+
+
+#NEW
+
+==Grammars and parsing==
+
+In many projects, a grammar is just seen as a **declarative parsing program**.
+
+For GF, a grammar is primarily the **definition of a language**.
+
+Detaching grammars from parsers is a good idea, giving
+- more efficient and robust parsing (statistical etc)
+- cleaner grammars
+
+
+Separating abstract from concrete syntax is a prerequisite for this:
+we want parsers to return abstract syntax objects, and these must exist
+independently of parse trees.
+
+A possible radical approach to parsing:
+use a grammar to generate a treebank and machine-learn
+a statistical parser from this.
+
+Comparison: Steedman in CCG has done something like this.
+
+
+#NEW
+
+==Grammars as software libraries==
+
+Reuse for different purposes.
+
+Grammar composition.
+
+
+#NEW
+
+==Multilinguality==
+
+In **application grammars**, the AS is a semantic
+model, and a CS covers domain terminology and idioms.
+
+This can give publication-quality translation on
+limited domains (e.g. the WebALT project).
+
+Resource grammars with grammar composition lead to
+**compile-time transfer**.
+
+When is **run-time transfer** necessary?
+
+Cf. CLE (Core Language Engine).
+
+
+#NEW
+
+==Parametrized modules==
+
+This notion comes from the ML language in the 1980's.
+
+It can be used for sharing even more code between languages
+than their AS.
+
+Especially, for related languages (Scandinavian, Romance).
+
+Cf. grammar porting in CLE: what they do with untyped
+macro packages GF does with typable interfaces.