summaryrefslogtreecommitdiff
path: root/src/GF/GFCC/doc/gfcc.html
blob: 8f8c478c0607422db2ba104b3160b3fcfec452dd (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>The GFCC Grammar Format</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>The GFCC Grammar Format</H1>
<FONT SIZE="4">
<I>Aarne Ranta</I><BR>
October 5, 2007
</FONT></CENTER>

<P>
Author's address:
<A HREF="http://www.cs.chalmers.se/~aarne"><CODE>http://www.cs.chalmers.se/~aarne</CODE></A>
</P>
<P>
History:
</P>
<UL>
<LI>5 Oct 2007: new, better structured GFCC with full expressive power
<LI>19 Oct: translation of lincats, new figures on C++
<LI>3 Oct 2006: first version
</UL>

<H2>What is GFCC</H2>
<P>
GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
that is needed to process GF grammars at runtime. This minimality has three
advantages:
</P>
<UL>
<LI>compact grammar files and run-time objects
<LI>time and space efficient processing
<LI>simple definition of interpreters
</UL>

<P>
Thus we also want to call GFCC the <B>portable grammar format</B>.
</P>
<P>
The idea is that all embedded GF applications use GFCC.
The GF system would be primarily used as a compiler and as a grammar
development tool.
</P>
<P>
Since GFCC is implemented in BNFC, a parser of the format is readily
available for C, C++, C#, Haskell, Java, and OCaml. Also an XML 
representation can be generated in BNFC. A 
<A HREF="../">reference implementation</A>
of linearization and some other functions has been written in Haskell.
</P>
<H2>GFCC vs. GFC</H2>
<P>
GFCC is aimed to replace GFC as the run-time grammar format. GFC was designed
to be a run-time format, but also to
support separate compilation of grammars, i.e.
to store the results of compiling 
individual GF modules. But this means that GFC has to contain extra information,
such as type annotations, which is only needed in compilation and not at
run-time. In particular, the pattern matching syntax and semantics of GFC is
complex and therefore difficult to implement in new platforms.
</P>
<P>
Actually, GFC is planned to be omitted also as the target format of
separate compilation, where plain GF (type annotated and partially evaluated)
will be used instead. GFC provides only marginal advantages as a target format
compared with GF, and it is therefore just extra weight to carry around this
format.
</P>
<P>
The main differences of GFCC compared with GFC (and GF) can be summarized as follows:
</P>
<UL>
<LI>there are no modules, and therefore no qualified names
<LI>a GFCC grammar is multilingual, and consists of a common abstract syntax 
  together with one concrete syntax per language
<LI>records and tables are replaced by arrays
<LI>record labels and parameter values are replaced by integers
<LI>record projection and table selection are replaced by array indexing
<LI>even though the format does support dependent types and higher-order abstract
  syntax, there is no interpreted yet that does this
</UL>

<P>
Here is an example of a GF grammar, consisting of three modules, 
as translated to GFCC. The representations are aligned; thus they do not completely
reflect the order of judgements in GFCC files, which have different orders of
blocks of judgements, and alphabetical sorting.
</P>
<PRE>
                                      grammar Ex(Eng,Swe);
  
  abstract Ex = {                     abstract {
    cat                                 cat
      S ; NP ; VP ;                      NP[]; S[]; VP[];
    fun                                 fun
      Pred : NP -&gt; VP -&gt; S ;             Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
      She, They : NP ;                   She=[0,"she"];
      Sleep : VP ;                       They=[1,"they"];
                                         Sleep=[["sleeps","sleep"]];
  }                                     } ;
                                      
  concrete Eng of Ex = {              concrete Eng {
    lincat                             lincat
      S  = {s : Str} ;                  S=[()];
      NP = {s : Str ; n : Num} ;        NP=[1,()];
      VP = {s : Num =&gt; Str} ;           VP=[[(),()]];
    param
      Num = Sg | Pl ;
    lin                                lin
      Pred np vp = {                    Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
        s = np.s ++ vp.s ! np.n} ;      
      She = {s = "she" ; n = Sg} ;      She=[0,"she"];
      They = {s = "they" ; n = Pl} ;    They = [1, "they"];
      Sleep = {s = table {              Sleep=[["sleeps","sleep"]];
        Sg =&gt; "sleeps" ; 
        Pl =&gt; "sleep"                   
        }                               
      } ;
  }                                   } ;
  
  concrete Swe of Ex = {              concrete Swe {
    lincat                             lincat
      S  = {s : Str} ;                  S=[()];
      NP = {s : Str} ;                  NP=[()];
      VP = {s : Str} ;                  VP=[()];
    param
      Num = Sg | Pl ;
    lin                                lin
      Pred np vp = {                    Pred = [(($0!0),($1!0))];
        s = np.s ++ vp.s} ;
      She = {s = "hon"} ;               She = ["hon"];
      They = {s = "de"} ;               They = ["de"];
      Sleep = {s = "sover"} ;           Sleep = ["sover"];
  }                                     } ;                                   
</PRE>
<P></P>
<H2>The syntax of GFCC files</H2>
<P>
The complete BNFC grammar, from which  
the rules in this section are taken, is in the file 
<A HREF="../DataGFCC.cf"><CODE>GF/GFCC/GFCC.cf</CODE></A>.
</P>
<H3>Top level</H3>
<P>
A grammar has a header telling the name of the abstract syntax
(often specifying an application domain), and the names of
the concrete languages. The abstract syntax and the concrete
syntaxes themselves follow.
</P>
<PRE>
    Grm. Grammar  ::= 
      "grammar" CId "(" [CId] ")" ";" 
      Abstract ";" 
      [Concrete] ;
  
    Abs. Abstract ::= 
      "abstract" "{" 
        "flags" [Flag] 
        "fun"   [FunDef] 
        "cat"   [CatDef] 
      "}" ;
  
    Cnc. Concrete ::= 
      "concrete" CId "{" 
        "flags"  [Flag] 
        "lin"    [LinDef] 
        "oper"   [LinDef] 
        "lincat" [LinDef] 
        "lindef" [LinDef] 
        "printname" [LinDef]
      "}" ;
</PRE>
<P>
This syntax organizes each module to a sequence of <B>fields</B>, such
as flags, linearizations, operations, linearization types, etc.
It is envisaged that particular applications can ignore some
of the fields, typically so that earlier fields are more
important than later ones.
</P>
<P>
The judgement forms have the following syntax.
</P>
<PRE>
    Flg. Flag     ::= CId "=" String ;
    Cat. CatDef   ::= CId "[" [Hypo] "]" ;
    Fun. FunDef   ::= CId ":" Type "=" Exp ;
    Lin. LinDef   ::= CId "=" Term ;
</PRE>
<P>
For the run-time system, the reference implementation in Haskell
uses a structure that gives efficient look-up:
</P>
<PRE>
    data GFCC = GFCC {
      absname   :: CId ,
      cncnames  :: [CId] ,
      abstract  :: Abstr ,
      concretes :: Map CId Concr
      }
  
    data Abstr = Abstr {
      aflags  :: Map CId String,     -- value of a flag
      funs    :: Map CId (Type,Exp), -- type and def of a fun
      cats    :: Map CId [Hypo],     -- context of a cat
      catfuns :: Map CId [CId]       -- funs yielding a cat (redundant, for fast lookup)
      }
  
    data Concr = Concr {
      flags   :: Map CId String, -- value of a flag
      lins    :: Map CId Term,   -- lin of a fun
      opers   :: Map CId Term,   -- oper generated by subex elim
      lincats :: Map CId Term,   -- lin type of a cat
      lindefs :: Map CId Term,   -- lin default of a cat
      printnames :: Map CId Term -- printname of a cat or a fun
      }
</PRE>
<P>
These definitions are from <A HREF="../DataGFCC.hs"><CODE>GF/GFCC/DataGFCC.hs</CODE></A>.
</P>
<P>
Identifiers (<CODE>CId</CODE>) are like <CODE>Ident</CODE> in GF, except that
the compiler produces constants prefixed with <CODE>_</CODE> in
the common subterm elimination optimization.
</P>
<PRE>
    token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
</PRE>
<P></P>
<H3>Abstract syntax</H3>
<P>
Types are first-order function types built from argument type
contexts and value types.
category symbols. Syntax trees (<CODE>Exp</CODE>) are
rose trees with nodes consisting of a head (<CODE>Atom</CODE>) and
bound variables (<CODE>CId</CODE>). 
</P>
<PRE>
    DTyp. Type  ::= "[" [Hypo] "]" CId [Exp] ;        
    DTr.  Exp   ::= "[" "(" [CId] ")" Atom [Exp] "]" ;
    Hyp.  Hypo  ::= CId ":" Type ;
</PRE>
<P>
The head Atom is either a function
constant, a bound variable, or a metavariable, or a string, integer, or float
literal.
</P>
<PRE>
    AC.   Atom  ::= CId ;
    AS.   Atom  ::= String ;
    AI.   Atom  ::= Integer ;
    AF.   Atom  ::= Double ;
    AM.   Atom  ::= "?" Integer ;
</PRE>
<P>
The context-free types and trees of the "old GFCC" are special
cases, which can be defined as follows:
</P>
<PRE>
    Typ.  Type  ::= [CId] "-&gt;" CId
    Typ args val = DTyp [Hyp (CId "_") arg | arg &lt;- args] val
  
    Tr.   Exp   ::= "(" CId [Exp] ")"
    Tr fun exps  = DTr [] fun exps
</PRE>
<P>
To store semantic (<CODE>def</CODE>) definitions by cases, the following expression
form is provided, but it is only meaningful in the last field of a function
declaration in an abstract syntax:
</P>
<PRE>
    EEq. Exp      ::= "{" [Equation] "}" ;
    Equ. Equation ::= [Exp] "-&gt;" Exp ;
</PRE>
<P>
Notice that expressions are used to encode patterns. Primitive notions
(the default semantics in GF) are encoded as empty sets of equations
(<CODE>[]</CODE>). For a constructor (canonical form) of a category <CODE>C</CODE>, we
aim to use the encoding as the application <CODE>(_constr C)</CODE>.
</P>
<H3>Concrete syntax</H3>
<P>
Linearization terms (<CODE>Term</CODE>) are built as follows.
Constructor names are shown to make the later code
examples readable.
</P>
<PRE>
    R.  Term ::= "[" [Term] "]" ;        -- array (record/table)
    P.  Term ::= "(" Term "!" Term ")" ; -- access to field (projection/selection)
    S.  Term ::= "(" [Term] ")" ;        -- concatenated sequence
    K.  Term ::= Tokn ;                  -- token
    V.  Term ::= "$" Integer ;           -- argument (subtree)
    C.  Term ::= Integer ;               -- array index (label/parameter value)
    FV. Term ::= "[|" [Term] "|]" ;      -- free variation
    TM. Term ::= "?" ;                   -- linearization of metavariable
</PRE>
<P>
Tokens are strings or (maybe obsolescent) prefix-dependent
variant lists.
</P>
<PRE>
    KS.  Tokn     ::= String ;
    KP.  Tokn     ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
    Var. Variant  ::= [String] "/" [String] ;
</PRE>
<P>
Two special forms of terms are introduced by the compiler
as optimizations. They can in principle be eliminated, but
their presence makes grammars much more compact. Their semantics
will be explained in a later section.
</P>
<PRE>
    F.  Term ::= CId ;                     -- global constant
    W.  Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
</PRE>
<P>
There is also a deprecated form of "record parameter alias",
</P>
<PRE>
    RP. Term ::= "(" Term "@" Term ")";    -- DEPRECATED
</PRE>
<P>
which will be removed when the migration to new GFCC is complete.
</P>
<H2>The semantics of concrete syntax terms</H2>
<P>
The code in this section is from <A HREF="../Linearize.hs"><CODE>GF/GFCC/Linearize.hs</CODE></A>.
</P>
<H3>Linearization and realization</H3>
<P>
The linearization algorithm is essentially the same as in
GFC: a tree is linearized by evaluating its linearization term
in the environment of the linearizations of the subtrees.
Literal atoms are linearized in the obvious way.
The function also needs to know the language (i.e. concrete syntax)
in which linearization is performed.
</P>
<PRE>
    linExp :: GFCC -&gt; CId -&gt; Exp -&gt; Term
    linExp gfcc lang tree@(DTr _ at trees) = case at of
      AC fun -&gt; comp (Prelude.map lin trees) $ look fun
      AS s   -&gt; R [kks (show s)] -- quoted
      AI i   -&gt; R [kks (show i)]
      AF d   -&gt; R [kks (show d)]
      AM     -&gt; TM
     where
       lin  = linExp gfcc lang
       comp = compute gfcc lang
       look = lookLin gfcc lang
</PRE>
<P>
TODO: bindings must be supported.
</P>
<P>
The result of linearization is usually a record, which is realized as
a string using the following algorithm.
</P>
<PRE>
    realize :: Term -&gt; String
    realize trm = case trm of
      R (t:_)  -&gt; realize t
      S ss     -&gt; unwords $ Prelude.map realize ss
      K (KS s) -&gt; s
      K (KP s _) -&gt; unwords s ---- prefix choice TODO
      W s t    -&gt; s ++ realize t
      FV (t:_) -&gt; realize t
      TM       -&gt; "?"
</PRE>
<P>
Notice that realization always picks the first field of a record.
If a linearization type has more than one field, the first field
does not necessarily contain the desired string.
Also notice that the order of record fields in GFCC is not necessarily
the same as in GF source.
</P>
<H3>Term evaluation</H3>
<P>
Evaluation follows call-by-value order, with two environments
needed:
</P>
<UL>
<LI>the grammar (a concrete syntax) to give the global constants
<LI>an array of terms to give the subtree linearizations
</UL>

<P>
The code is presented in one-level pattern matching, to
enable reimplementations in languages that do not permit
deep patterns (such as Java and C++).
</P>
<PRE>
  compute :: GFCC -&gt; CId -&gt; [Term] -&gt; Term -&gt; Term
  compute gfcc lang args = comp where
    comp trm = case trm of
      P r p  -&gt; proj (comp r) (comp p)
      W s t  -&gt; W s (comp t)
      R ts   -&gt; R $ Prelude.map comp ts
      V i    -&gt; idx args (fromInteger i)  -- already computed
      F c    -&gt; comp $ look c             -- not computed (if contains V)
      FV ts  -&gt; FV $ Prelude.map comp ts
      S ts   -&gt; S $ Prelude.filter (/= S []) $ Prelude.map comp ts
      _ -&gt; trm
  
    look = lookOper gfcc lang
  
    idx xs i = xs !! i
  
    proj r p = case (r,p) of
      (_,     FV ts) -&gt; FV $ Prelude.map (proj r) ts
      (W s t, _)     -&gt; kks (s ++ getString (proj t p))
      _              -&gt; comp $ getField r (getIndex p)
  
    getString t = case t of
      K (KS s) -&gt; s
      _ -&gt; trace ("ERROR in grammar compiler: string from "++ show t) "ERR"
  
    getIndex t =  case t of
      C i    -&gt; fromInteger i
      RP p _ -&gt; getIndex p
      TM     -&gt; 0  -- default value for parameter
      _ -&gt; trace ("ERROR in grammar compiler: index from " ++ show t) 0
  
    getField t i = case t of
      R rs   -&gt; idx rs i
      RP _ r -&gt; getField r i
      TM     -&gt; TM
      _ -&gt; trace ("ERROR in grammar compiler: field from " ++ show t) t
</PRE>
<P></P>
<H3>The special term constructors</H3>
<P>
The three forms introduced by the compiler may a need special
explanation.
</P>
<P>
Global constants
</P>
<PRE>
    Term ::= CId ;
</PRE>
<P>
are shorthands for complex terms. They are produced by the
compiler by (iterated) <B>common subexpression elimination</B>.
They are often more powerful than hand-devised code sharing in the source
code. They could be computed off-line by replacing each identifier by 
its definition.
</P>
<P>
<B>Prefix-suffix tables</B>
</P>
<PRE>
    Term ::= "(" String "+" Term ")" ; 
</PRE>
<P>
represent tables of word forms divided to the longest common prefix
and its array of suffixes. In the example grammar above, we have
</P>
<PRE>
    Sleep = [("sleep" + ["s",""])]
</PRE>
<P>
which in fact is equal to the array of full forms
</P>
<PRE>
    ["sleeps", "sleep"]
</PRE>
<P>
The power of this construction comes from the fact that suffix sets
tend to be repeated in a language, and can therefore be collected
by common subexpression elimination. It is this technique that
explains the used syntax rather than the more accurate
</P>
<PRE>
    "(" String "+" [String] ")"
</PRE>
<P>
since we want the suffix part to be a <CODE>Term</CODE> for the optimization to
take effect.
</P>
<H2>Compiling to GFCC</H2>
<P>
Compilation to GFCC is performed by the GF grammar compiler, and
GFCC interpreters need not know what it does. For grammar writers,
however, it might be interesting to know what happens to the grammars
in the process.
</P>
<P>
The compilation phases are the following
</P>
<OL>
<LI>type check and partially evaluate GF source
<LI>create a symbol table mapping the GF parameter and record types to
  fixed-size arrays, and parameter values and record labels to integers
<LI>traverse the linearization rules replacing parameters and labels by integers
<LI>reorganize the created GF grammar so that it has just one abstract syntax
  and one concrete syntax per language
<LI>TODO: apply UTF8 encoding to the grammar, if not yet applied (this is told by the
  <CODE>coding</CODE> flag)
<LI>translate the GF grammar object to a GFCC grammar object, using a simple
  compositional mapping
<LI>perform the word-suffix optimization on GFCC linearization terms
<LI>perform subexpression elimination on each concrete syntax module
<LI>print out the GFCC code
</OL>

<H3>Problems in GFCC compilation</H3>
<P>
Two major problems had to be solved in compiling GF to GFCC:
</P>
<UL>
<LI>consistent order of tables and records, to permit the array translation
<LI>run-time variables in complex parameter values.
</UL>

<P>
The current implementation is still experimental and may fail
to generate correct code. Any errors remaining are likely to be 
related to the two problems just mentioned.
</P>
<P>
The order problem is solved in slightly different ways for tables and records.
In both cases, <B>eta expansion</B> is used to establish a
canonical order. Tables are ordered by applying the preorder induced
by <CODE>param</CODE> definitions. Records are ordered by sorting them by labels.
This means that
e.g. the <CODE>s</CODE> field will in general no longer appear as the first
field, even if it does so in the GF source code. But relying on the
order of fields in a labelled record would be misplaced anyway.
</P>
<P>
The canonical form of records is further complicated by lock fields,
i.e. dummy fields of form <CODE>lock_C = &lt;&gt;</CODE>, which are added to grammar
libraries to force intensionality of linearization types. The problem
is that the absence of a lock field only generates a warning, not
an error. Therefore a GF grammar can contain objects of the same
type with and without a lock field. This problem was solved in GFCC
generation by just removing all lock fields (defined as fields whose
type is the empty record type). This has the further advantage of
(slightly) reducing the grammar size. More importantly, it is safe
to remove lock fields, because they are never used in computation,
and because intensional types are only needed in grammars reused
as libraries, not in grammars used at runtime.
</P>
<P>
While the order problem is rather bureaucratic in nature, run-time 
variables are an interesting problem. They arise in the presence
of complex parameter values, created by argument-taking constructors
and parameter records. To give an example, consider the GF parameter
type system
</P>
<PRE>
    Number = Sg | Pl ;
    Person = P1 | P2 | P3 ;
    Agr = Ag Number Person ;
</PRE>
<P>
The values can be translated to integers in the expected way,
</P>
<PRE>
    Sg = 0, Pl = 1
    P1 = 0, P2 = 1, P3 = 2
    Ag Sg P1 = 0, Ag Sg P2 = 1, Ag Sg P3 = 2,
    Ag Pl P1 = 3, Ag Pl P2 = 4, Ag Pl P3 = 5
</PRE>
<P>
However, an argument of <CODE>Agr</CODE> can be a run-time variable, as in
</P>
<PRE>
    Ag np.n P3
</PRE>
<P>
This expression must first be translated to a case expression,
</P>
<PRE>
    case np.n of {
      0 =&gt; 2 ;
      1 =&gt; 5
      }
</PRE>
<P>
which can then be translated to the GFCC term
</P>
<PRE>
    ([2,5] ! ($0 ! $1))  
</PRE>
<P>
assuming that the variable <CODE>np</CODE> is the first argument and that its
<CODE>Number</CODE> field is the second in the record.
</P>
<P>
This transformation of course has to be performed recursively, since
there can be several run-time variables in a parameter value:
</P>
<PRE>
    Ag np.n np.p
</PRE>
<P>
A similar transformation would be possible to deal with the double
role of parameter records discussed above. Thus the type
</P>
<PRE>
    RNP = {n : Number ; p : Person}
</PRE>
<P>
could be uniformly translated into the set <CODE>{0,1,2,3,4,5}</CODE>
as <CODE>Agr</CODE> above. Selections would be simple instances of indexing.
But any projection from the record should be translated into
a case expression,
</P>
<PRE>
    rnp.n  ===&gt; 
    case rnp of {
      0 =&gt; 0 ;
      1 =&gt; 0 ;
      2 =&gt; 0 ;
      3 =&gt; 1 ;
      4 =&gt; 1 ;
      5 =&gt; 1
      }
</PRE>
<P>
To avoid the code bloat resulting from this, we have chosen to
deal with records by a <B>currying</B> transformation:
</P>
<PRE>
    table {n : Number ; p : Person} {... ...}
     ===&gt;
    table Number {Sg =&gt; table Person {...} ; table Person {...}}
</PRE>
<P>
This is performed when GFCC is generated. Selections with
records have to be treated likewise,
</P>
<PRE>
    t ! r   ===&gt; t ! r.n ! r.p
</PRE>
<P></P>
<H3>The representation of linearization types</H3>
<P>
Linearization types (<CODE>lincat</CODE>) are not needed when generating with
GFCC, but they have been added to enable parser generation directly from
GFCC. The linearization type definitions are shown as a part of the
concrete syntax, by using terms to represent types. Here is the table
showing how different linearization types are encoded.
</P>
<PRE>
    P*                         = max(P)         -- parameter type
    {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- record
    (P =&gt; T)*                  = [T* ,...,T*]   -- table, size(P) cases
    Str*                       = ()
</PRE>
<P>
For example, the linearization type <CODE>present/CatEng.NP</CODE> is
translated as follows:
</P>
<PRE>
    NP = {
      a : {                     -- 6 = 2*3 values
        n : {ParamX.Number} ;   -- 2 values
        p : {ParamX.Person}     -- 3 values
      } ;
      s : {ResEng.Case} =&gt; Str  -- 3 values
    }
  
    __NP = [[1,2],[(),(),()]]
</PRE>
<P></P>
<H3>Running the compiler and the GFCC interpreter</H3>
<P>
GFCC generation is a part of the 
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html">developers' version</A> 
of GF since September 2006. To invoke the compiler, the flag 
<CODE>-printer=gfcc</CODE> to the command
<CODE>pm = print_multi</CODE> is used. It is wise to recompile the grammar from
source, since previously compiled libraries may not obey the canonical
order of records. 
Here is an example, performed in
<A HREF="../../../../../examples/bronzeage">example/bronzeage</A>.
</P>
<PRE>
    i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageEng.gf
    i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageGer.gf
    strip
    pm -printer=gfcc | wf bronze.gfcc
</PRE>
<P>
There is also an experimental batch compiler, which does not use the GFC
format or the record aliases. It can be produced by
</P>
<PRE>
    make gfc
</PRE>
<P>
in <CODE>GF/src</CODE>, and invoked by
</P>
<PRE>
    gfc --make FILES
</PRE>
<P></P>
<H2>The reference interpreter</H2>
<P>
The reference interpreter written in Haskell consists of the following files:
</P>
<PRE>
    -- source file for BNFC
    GFCC.cf       -- labelled BNF grammar of gfcc
  
    -- files generated by BNFC
    AbsGFCC.hs    -- abstrac syntax datatypes
    ErrM.hs       -- error monad used internally
    LexGFCC.hs    -- lexer of gfcc files
    ParGFCC.hs    -- parser of gfcc files and syntax trees
    PrintGFCC.hs  -- printer of gfcc files and syntax trees
  
    -- hand-written files
    DataGFCC.hs   -- grammar datatype, post-parser grammar creation
    Linearize.hs  -- linearization and evaluation
    Macros.hs     -- utilities abstracting away from GFCC datatypes
    Generate.hs   -- random and exhaustive generation, generate-and-test parsing
    API.hs        -- functionalities accessible in embedded GF applications
    Generate.hs   -- random and exhaustive generation
    Shell.hs      -- main function - a simple command interpreter
</PRE>
<P>
It is included in the
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html">developers' version</A>
of GF, in the subdirectories <A HREF="../"><CODE>GF/src/GF/GFCC</CODE></A> and 
<A HREF="../../Devel"><CODE>GF/src/GF/Devel</CODE></A>.
</P>
<P>
As of September 2007, default parsing in main GF uses GFCC (implemented by Krasimir
Angelov). The interpreter uses the relevant modules
</P>
<PRE>
    GF/Conversions/SimpleToFCFG.hs  -- generate parser from GFCC
    GF/Parsing/FCFG.hs              -- run the parser
</PRE>
<P></P>
<P>
To compile the interpreter, type
</P>
<PRE>
    make gfcc
</PRE>
<P>
in <CODE>GF/src</CODE>. To run it, type
</P>
<PRE>
    ./gfcc &lt;GFCC-file&gt;
</PRE>
<P>
The available commands are
</P>
<UL>
<LI><CODE>gr &lt;Cat&gt; &lt;Int&gt;</CODE>:  generate a number of random trees in category.
  and show their linearizations in all languages
<LI><CODE>grt &lt;Cat&gt; &lt;Int&gt;</CODE>:  generate a number of random trees in category.
  and show the trees and their linearizations in all languages
<LI><CODE>gt &lt;Cat&gt; &lt;Int&gt;</CODE>:  generate a number of trees in category from smallest,
  and show their linearizations in all languages
<LI><CODE>gtt &lt;Cat&gt; &lt;Int&gt;</CODE>:  generate a number of trees in category from smallest,
  and show the trees and their linearizations in all languages
<LI><CODE>p &lt;Lang&gt; &lt;Cat&gt; &lt;String&gt;</CODE>: parse a string into a set of trees
<LI><CODE>lin &lt;Tree&gt;</CODE>: linearize tree in all languages, also showing full records
<LI><CODE>q</CODE>: terminate the system cleanly
</UL>

<H2>Embedded formats</H2>
<UL>
<LI>JavaScript: compiler of linearization and abstract syntax
<P></P>
<LI>Haskell: compiler of abstract syntax and interpreter with parsing,
  linearization, and generation
<P></P>
<LI>C: compiler of linearization (old GFCC)
<P></P>
<LI>C++: embedded interpreter supporting linearization (old GFCC)
</UL>

<H2>Some things to do</H2>
<P>
Support for dependent types, higher-order abstract syntax, and
semantic definition in GFCC generation and interpreters.
</P>
<P>
Replacing the entire GF shell by one based on GFCC.
</P>
<P>
Interpreter in Java.
</P>
<P>
Hand-written parsers for GFCC grammars to reduce code size
(and efficiency?) of interpreters.
</P>
<P>
Binary format and/or file compression of GFCC output.
</P>
<P>
Syntax editor based on GFCC.
</P>
<P>
Rewriting of resource libraries in order to exploit the
word-suffix sharing better (depth-one tables, as in FM).
</P>

<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -thtml gfcc.txt -->
</BODY></HTML>