summaryrefslogtreecommitdiff
path: root/doc/gf-formalism.html
blob: e52929c03c10efc5018b7fe2f9391fa34a722773 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>A Birds-Eye View of GF as a Grammar Formalism</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>A Birds-Eye View of GF as a Grammar Formalism</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta</I><BR>
Last update: Thu Feb  2 14:16:01 2006
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
    <UL>
    <LI><A HREF="#toc1">GF in a few words</A>
    <LI><A HREF="#toc2">History of GF</A>
    <LI><A HREF="#toc3">Some key ingredients of GF in other grammar formalisms</A>
    <LI><A HREF="#toc4">Examples of descriptions in each formalism</A>
    <LI><A HREF="#toc5">Lambda terms and records</A>
    <LI><A HREF="#toc6">The structure of GF formalisms</A>
    <LI><A HREF="#toc7">The expressivity of GF</A>
    <LI><A HREF="#toc8">Grammars and parsing</A>
    <LI><A HREF="#toc9">Grammars as software libraries</A>
    <LI><A HREF="#toc10">Multilinguality</A>
    <LI><A HREF="#toc11">Parametrized modules</A>
    </UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<IMG ALIGN="middle" SRC="gf-logo.gif" BORDER="0" ALT="">
</P>
<P>
<I>Abstract. This document gives a general description of the</I>
<I>Grammatical Framework (GF), with comparisons to other grammar</I>
<I>formalisms such as CG, ACG, HPSG, and LFG.</I>
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc1"></A>
<H2>GF in a few words</H2>
<P>
Grammatical Framework (GF) is a grammar formalism 
based on <B>constructive type theory</B>.
</P>
<P>
GF makes a distinction between <B>abstract syntax</B> and <B>concrete syntax</B>.
</P>
<P>
The abstract syntax part of GF is a <B>logical framework</B>, with
dependent types and higher-order functions.
</P>
<P>
The concrete syntax is a system of <B>records</B> containing strings and features.
</P>
<P>
A GF grammar defines a <B>reversible homomorphism</B> from an abstract syntax to a
concrete syntax.
</P>
<P>
A <B>multilingual GF grammar</B> is a set of concrete syntaxes associated with
one abstract syntax.
</P>
<P>
GF grammars are written in a high-level <B>functional programming language</B>,
which is compiled into a <B>core language</B> (GFC).
</P>
<P>
GF grammars can be used as <B>resources</B>, i.e. as libraries for writing
new grammars; these are compiled and optimized by the method of
<B>grammar composition</B>.
</P>
<P>
GF has a <B>module system</B> that supports grammar engineering and separate 
compilation.
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc2"></A>
<H2>History of GF</H2>
<P>
1988. Intuitionistic Categorial Grammar; type theory as abstract syntax,
playing the role of Montague's analysis trees. Grammars implemented in Prolog.
</P>
<P>
1994. Type-Theoretical Grammar. Abstract syntax organized as a system of
combinators. Grammars implemented in ALF.
</P>
<P>
1996. Multilingual Type-Theoretical Grammar. Rules for generating six
languages from the same abstract syntax. Grammars implemented in ALF, ML, and
Haskell.
</P>
<P>
1998. The first implementation of GF as a language of its own.
</P>
<P>
2000. New version of GF: high-level functional source language, records used
for concrete syntax.
</P>
<P>
2003. The module system.
</P>
<P>
2004. Ljunglöf's thesis <I>Expressivity and Complexity of GF</I>.
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc3"></A>
<H2>Some key ingredients of GF in other grammar formalisms</H2>
<UL>
<LI>[GF ]: Grammatical Framework
<LI>[CG ]: categorial grammar
<LI>[ACG ]: abstract categorial grammar
<LI>[HPSG ]: head-driven phrase structure grammar
<LI>[LFG ]: lexical functional grammar
</UL>

<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TD ALIGN="center">/</TD>
<TD>GF</TD>
<TD>ACG</TD>
<TD>LFG</TD>
<TD>HPSG</TD>
<TD>CG</TD>
</TR>
<TR>
<TD>abstract vs concrete syntax</TD>
<TD>X</TD>
<TD>X</TD>
<TD>?</TD>
<TD>-</TD>
<TD>-</TD>
</TR>
<TR>
<TD>type theory</TD>
<TD>X</TD>
<TD>X</TD>
<TD>-</TD>
<TD>-</TD>
<TD>X</TD>
</TR>
<TR>
<TD>records and features</TD>
<TD>X</TD>
<TD>-</TD>
<TD>X</TD>
<TD>X</TD>
<TD>-</TD>
</TR>
</TABLE>

<P></P>
<P>
<!-- NEW -->
</P>
<A NAME="toc4"></A>
<H2>Examples of descriptions in each formalism</H2>
<P>
To be written...
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc5"></A>
<H2>Lambda terms and records</H2>
<P>
In CS, abstract syntax is trees and concrete syntax is strings.
This works more or less for programming languages.
</P>
<P>
In CG, all syntax is lambda terms. 
</P>
<P>
In Montague grammar, abstract syntax is lambda terms and
concrete syntax is trees. Abstract syntax as lambda terms
can be considered well-established.
</P>
<P>
In PATR and HPSG, concrete syntax it records. This can be considered
well-established for natural languages.
</P>
<P>
In ACG, both are lambda terms. This is more general than GF,
but reversibility requires linearity restriction, which can be
unnatural for grammar writing.
</P>
<P>
In GF, linearization from lambda terms to records is reversible,
and grammar writing is not restricted to linear terms.
</P>
<P>
Grammar composition in ACG is just function composition. In GF,
it is more restricted...
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc6"></A>
<H2>The structure of GF formalisms</H2>
<P>
The following diagram (to be drawn properly!) describes the
levels.
</P>
<PRE>
         |   programming language design
         V
    GF source language
         |
         |   type-directed partial evaluation
         V
    GFC assembly language
         |
         |   Ljunglöf's translation
         V
    MCFG parser
</PRE>
<P>
The last two phases are nontrivial mathematica properties.
</P>
<P>
In most grammar formalisms, grammarians have to work on the GFC
(or MCFG) level.
</P>
<P>
Maybe they use macros - they are therefore like macro assemblers. But there
are no separately compiled library modules, no type checking, etc.
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc7"></A>
<H2>The expressivity of GF</H2>
<P>
Parsing complexity is the same as MCFG: polynomial, with
unrestricted exponent depending on grammar.
This is between TAG and HPSG.
</P>
<P>
If semantic well-formedness (type theory) is taken into account,
then arbitrary logic can be expressed. The well-formedness of
abstract syntax is decidable, but the well-formedness of a
concrete-syntax string can require an arbitrary proof construction
and is therefore undecidable.
</P>
<P>
Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF
has the goal of assigning intended trees for strings. This is
generalized to shared trees for different languages.
</P>
<P>
The high-level language strives after the properties of
writability and readability (programming language notions).
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc8"></A>
<H2>Grammars and parsing</H2>
<P>
In many projects, a grammar is just seen as a <B>declarative parsing program</B>.
</P>
<P>
For GF, a grammar is primarily the <B>definition of a language</B>.
</P>
<P>
Detaching grammars from parsers is a good idea, giving
</P>
<UL>
<LI>more efficient and robust parsing (statistical etc)
<LI>cleaner grammars
</UL>

<P>
Separating abstract from concrete syntax is a prerequisite for this:
we want parsers to return abstract syntax objects, and these must exist
independently of parse trees.
</P>
<P>
A possible radical approach to parsing: 
use a grammar to generate a treebank and machine-learn
a statistical parser from this.
</P>
<P>
Comparison: Steedman in CCG has done something like this.
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc9"></A>
<H2>Grammars as software libraries</H2>
<P>
Reuse for different purposes.
</P>
<P>
Grammar composition.
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc10"></A>
<H2>Multilinguality</H2>
<P>
In <B>application grammars</B>, the AS is a semantic
model, and a CS covers domain terminology and idioms.
</P>
<P>
This can give publication-quality translation on
limited domains (e.g. the WebALT project).
</P>
<P>
Resource grammars with grammar composition lead to
<B>compile-time transfer</B>.
</P>
<P>
When is <B>run-time transfer</B> necessary?
</P>
<P>
Cf. CLE (Core Language Engine).
</P>
<P>
<!-- NEW -->
</P>
<A NAME="toc11"></A>
<H2>Parametrized modules</H2>
<P>
This notion comes from the ML language in the 1980's.
</P>
<P>
It can be used for sharing even more code between languages
than their AS.
</P>
<P>
Especially, for related languages (Scandinavian, Romance).
</P>
<P>
Cf. grammar porting in CLE: what they do with untyped 
macro packages GF does with typable interfaces.
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -thtml -\-toc gf-formalism.txt -->
</BODY></HTML>