summaryrefslogtreecommitdiff
path: root/src-3.0/PGF/doc/syntax.txt
blob: db8f7c1499e487a5aaeacae3da40c5552380573c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
GFCC Syntax 


==Syntax of GFCC files==

The parser syntax is very simple, as defined in BNF:
```
  Grm. Grammar ::= [RExp] ;

  App.  RExp ::= "(" CId [RExp] ")" ;
  AId.  RExp ::= CId ;
  AInt. RExp ::= Integer ;
  AStr. RExp ::= String ;
  AFlt. RExp ::= Double ;
  AMet. RExp ::= "?" ;

  terminator RExp "" ;

  token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
```
While a parser and a printer can be generated for many languages
from this grammar by using the BNF Converter, a parser is also
easy to write by hand using recursive descent.


==Syntax of well-formed GFCC code==

Here is a summary of well-formed syntax, 
with a comment on the semantics of each construction.
```
  Grammar ::= 
    ("grammar" CId CId*)      -- abstract syntax name and concrete syntax names
    "(" "flags"    Flag* ")"     -- global and abstract flags
    "(" "abstract" Abstract ")"  -- abstract syntax
    "(" "concrete" Concrete* ")" -- concrete syntaxes

  Abstract ::= 
    "(" "fun"   FunDef* ")"      -- function definitions
    "(" "cat"   CatDef* ")"      -- category definitions

  Concrete ::= 
    "(" CId                      -- language name
      "flags"     Flag*          -- concrete flags
      "lin"       LinDef*        -- linearization rules
      "oper"      LinDef*        -- operations (macros)
      "lincat"    LinDef*        -- linearization type definitions
      "lindef"    LinDef*        -- linearization default definitions
      "printname" LinDef*        -- printname definitions
      "param"     LinDef*        -- lincats with labels and parameter value names
    ")" 

  Flag   ::= "(" CId String ")"   -- flag and value
  FunDef ::= "(" CId Type Exp ")" -- function, type, and definition
  CatDef ::= "(" CId Hypo* ")"    -- category and context
  LinDef ::= "(" CId Term ")"     -- function and definition

  Type ::= 
    "(" CId                 -- value category
      "(" "H" Hypo* ")"     --   argument context
      "(" "X" Exp* ")" ")"  --   arguments (of dependent value type)

  Exp ::=
     "(" CId                -- function
       "(" "B" CId* ")"     --   bindings
       "(" "X" Exp* ")" ")" --   arguments
   | CId                    -- variable
   | "?"                    -- metavariable
   | "(" "Eq" Equation* ")" -- group of pattern equations
   | Integer                -- integer literal (non-negative)
   | Float                  -- floating-point literal (non-negative)
   | String                 -- string literal (in double quotes)

  Hypo ::= "(" CId Type ")" -- variable and type

  Equation ::= "(" "E" Exp Exp* ")" -- value and pattern list

  Term ::= 
     "(" "R"  Term* ")"       -- array (record or table)
   | "(" "S"  Term* ")"       -- concatenated sequence
   | "(" "FV" Term* ")"       -- free variant list
   | "(" "P"  Term Term ")"   -- access to index (projection or selection)
   | "(" "W"  String Term ")" -- token prefix with suffix list
   | "(" "A"  Integer ")"     -- pointer to subtree
   | String                   -- token (in double quotes)
   | Integer                  -- index in array
   | CId                      -- macro constant
   | "?"                      -- metavariable
```


==GFCC interpreter==

The first phase in interpreting GFCC is to parse a GFCC file and
build an internal abstract syntax representation, as specified
in the previous section.

With this representation, linearization can be performed by
a straightforward function from expressions (``Exp``) to terms
(``Term``). All expressions except groups of pattern equations
can be linearized.

Here is a reference Haskell implementation of linearization:
```
  linExp :: GFCC -> CId -> Exp -> Term
  linExp gfcc lang tree@(DTr _ at trees) = case at of
    AC fun -> comp (map lin trees) $ look fun
    AS s   -> R [K (show s)] -- quoted
    AI i   -> R [K (show i)]
    AF d   -> R [K (show d)]
    AM     -> TM
   where
     lin  = linExp gfcc lang
     comp = compute gfcc lang
     look = lookLin gfcc lang
```
TODO: bindings must be supported.

Terms resulting from linearization are evaluated in
call-by-value order, with two environments needed:
- the grammar (a concrete syntax) to give the global constants
- an array of terms to give the subtree linearizations


The Haskell implementation works as follows:
```
compute :: GFCC -> CId -> [Term] -> Term -> Term
compute gfcc lang args = comp where
  comp trm = case trm of
    P r p  -> proj (comp r) (comp p)
    W s t  -> W s (comp t)
    R ts   -> R $ map comp ts
    V i    -> idx args (fromInteger i)  -- already computed
    F c    -> comp $ look c             -- not computed (if contains V)
    FV ts  -> FV $ Prelude.map comp ts
    S ts   -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
    _ -> trm

  look = lookOper gfcc lang

  idx xs i = xs !! i

  proj r p = case (r,p) of
    (_,     FV ts) -> FV $ Prelude.map (proj r) ts
    (FV ts, _    ) -> FV $ Prelude.map (\t -> proj t p) ts
    (W s t, _)     -> kks (s ++ getString (proj t p))
    _              -> comp $ getField r (getIndex p)

  getString t = case t of
    K (KS s) -> s
    _ -> trace ("ERROR in grammar compiler: string from "++ show t) "ERR"

  getIndex t =  case t of
    C i    -> fromInteger i
    RP p _ -> getIndex p
    TM     -> 0  -- default value for parameter
    _ -> trace ("ERROR in grammar compiler: index from " ++ show t) 0

  getField t i = case t of
    R rs   -> idx rs i
    RP _ r -> getField r i
    TM     -> TM
    _ -> trace ("ERROR in grammar compiler: field from " ++ show t) t
```
The result of linearization is usually a record, which is realized as
a string using the following algorithm.
```
  realize :: Term -> String
  realize trm = case trm of
    R (t:_)  -> realize t
    S ss     -> unwords $ map realize ss
    K s      -> s
    W s t    -> s ++ realize t
    FV (t:_) -> realize t  -- TODO: all variants
    TM       -> "?"
```
Notice that realization always picks the first field of a record.
If a linearization type has more than one field, the first field
does not necessarily contain the desired string.
Also notice that the order of record fields in GFCC is not necessarily
the same as in GF source.