From d82a53ebc660cf0319cd18edb4ddf9a2256bda3f Mon Sep 17 00:00:00 2001 From: "John J. Camilleri" Date: Sun, 9 Dec 2018 20:38:02 +0100 Subject: Replace gf-refman.html with Markdown version gf-refman.md The raw HTML was invalid, and this way we use the common website template for a uniform look without any duplication. It seems gf-refman.html was once generated from txt2tags, although I have been unable to find this original .t2t file. I also tried to re-generate txt2tags from HTML but was not able to. However I was able to convert HTML to Markdown using Pandoc and I think the result is pretty good, so I think we should use this. The original gf-refman.html can be obtained from git history, e.g.: https://github.com/GrammaticalFramework/gf-core/blob/a7e43d872f5e612f93131f2d8caf811fbee9aa83/doc/gf-refman.html --- doc/gf-refman.md | 2770 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2770 insertions(+) create mode 100644 doc/gf-refman.md (limited to 'doc/gf-refman.md') diff --git a/doc/gf-refman.md b/doc/gf-refman.md new file mode 100644 index 000000000..0462cf185 --- /dev/null +++ b/doc/gf-refman.md @@ -0,0 +1,2770 @@ +--- +title: GF Language Reference Manual +author: Aarne Ranta, Krasimir Angelov +date: June 2014, GF 3.6 +toc: true +--- + +This document is a reference manual to the GF programming language. GF, +Grammatical Framework, is a special-purpose programming language, +designed to support definitions of grammars. + +This document is not an introduction to GF; such introduction can be +found in the GF tutorial available on line on the GF web page, + +[`grammaticalframework.org`](http://grammaticalframework.org) + +This manual covers only the language, not the GF compiler or interactive +system. We will however make some references to different compiler +versions, if they involve changes of behaviour having to do with the +language specification. + +This manual is meant to be fully compatible with GF version 3.0. Main +discrepancies with version 2.8 are indicated, as well as with the +reference article on GF, + +A. Ranta, \"Grammatical Framework. A Type Theoretical Grammar +Formalism\", *The Journal of Functional Programming* 14(2), 2004, pp. +145-189. + +This article will referred to as \"the JFP article\". + +As metalinguistic notation, we will use the symbols + +- *a* === *b* to say that *a* is syntactic sugar for *b* +- *a* ==\> *b* to say that *a* is computed (or compiled) to *b* + + +Overview of GF +-------------- + +GF is a typed functional language, borrowing many of its constructs from +ML and Haskell: algebraic datatypes, higher-order functions, pattern +matching. The module system bears resemblance to ML (functors) but also +to object-oriented languages (inheritance). The type theory used in the +abstract syntax part of GF is inherited from logical frameworks, in +particular ALF (\"Another Logical Framework\"; in a sense, GF is Yet +Another ALF). From ALF comes also the use of dependent types, including +the use of explicit type variables instead of Hindley-Milner +polymorphism. + +The look and feel of GF is close to Java and C, due to the use of curly +brackets and semicolons in structuring the code; the expression syntax, +however, follows Haskell in using juxtaposition for function application +and parentheses only for grouping. + +To understand the constructs of GF, and especially their limitations in +comparison to general-purpose programming languages, it is essential to +keep in mind that GF is a special-purpose and non-turing-complete +language. Every GF program is ultimately compiled to a **multilingual +grammar**, which consists of an **abstract syntax** and a set of +**concrete syntaxes**. The abstract syntax defines a system of **syntax +trees**, and each concrete syntax defines a mapping from those syntax +trees to **nested tuples** of strings and integers. This mapping is +**compositional**, i.e. **homomorphic**, and moreover **reversible**: +given a nested tuple, there exists an effective way of finding the set +of syntax trees that map to this tuple. The procedure of applying the +mapping to a tree to produce a tuple is called **linearization**, and +the reverse search procedure is called **parsing**. It is ultimately the +requirement of reversibility that restricts GF to be less than +turing-complete. This is reflected in restrictions to recursion in +concrete syntax. Tree formation in abstract syntax, however, is fully +recursive. + +Even though run-time GF grammars manipulate just nested tuples, at +compile time these are represented by by the more fine-grained labelled +records and finite functions over algebraic datatypes. This enables the +programmer to write on a higher abstraction level, and also adds type +distinctions and hence raises the level of checking of programs. + + +The module system +----------------- + + +### Top-level and supplementary module structure + +The big picture of GF as a programming language for multilingual +grammars explains its principal module structure. Any GF grammar must +have an abstract syntax module; it can in addition have any number of +concrete syntax modules matching that abstract syntax. Before going to +details, we give a simple example: a module defining the **category** +`A` of adjectives and one adjective-forming **function**, the zero-place +function `Even`. We give the module the name `Adj`. The GF code for the +module looks as follows: + + abstract Adj = { + cat A ; + fun Even : A ; + } + +Here are two concrete syntax modules, one intended for mapping the trees +to English, the other to Swedish. The mappling is defined by `lincat` +definitions assigning a **linearization type** to each category, and +`lin` definitions assigning a **linearization** to each function. + + concrete AdjEng of Adj = { + lincat A = {s : Str} ; + lin Even = {s = "even"} ; + } + + concrete AdjSwe of Adj = { + lincat A = {s : AForm => Str} ; + lin Even = {s = table { + ASg Utr => "jämn" ; + ASg Neutr => "jämnt" ; + APl => "jämna" + } + } ; + param AForm = ASg Gender | APl ; + param Gender = Utr | Neutr ; + } + +These examples illustrate the main ideas of multilingual grammars: + +- the concrete syntax must match the abstract syntax: + - every `cat` is given a `lincat` + - every `fun` is given a `lin` + + + +- the concrete syntax is internally coherent: + - the `lin` rules respect the types defined by `lincat` rules + + + +- concrete syntaxes are independent of each other + - they can use different `lincat` and `lin` definitions + - they can define their own **parameter types** (`param`) + +The first two ideas form the core of the **static checking** of GF +grammars, eliminating the possibility of run-time errors in +linearization and parsing. The third idea gives GF the expressive power +needed to map abstract syntax to vastly different languages. + +Abstract and concrete modules are called **top-level grammar modules**, +since they are the ones that remain in grammar systems at run time. +However, in order to support **modular grammar engineering**, GF +provides much more module structure than strictly required in top-level +grammars. + +**Inheritance**, also known as **extension**, means that a module can +inherit the contents of one or more other modules to which new +judgements are added, e.g. + + abstract MoreAdj = Adj ** { + fun Odd : A ; + } + +**Resource modules** define parameter types and **operations** usable in +several concrete syntaxes, + + resource MorphoFre = { + param Number = Sg | Pl ; + param Gender = Masc | Fem ; + oper regA : Str -> {s : Gender => Number => Str} = + \fin -> { + s = table { + Masc => table {Sg => fin ; Pl => fin + "s"} ; + Fem => table {Sg => fin + "e" ; Pl => fin + "es"} + } + } ; + } + +By **opening**, a module can use the contents of a resource module +without inheriting them, e.g. + + concrete AdjFre of Adj = open MorphoFre in { + lincat A = {s : Gender => Number => Str} ; + lin Even = regA "pair" ; + } + +**Interfaces** and **instances** separate the contents of a resource +module to type signatures and definitions, in a way analogous to +abstract vs. concrete modules, e.g. + + interface Lexicon = { + oper Adjective : Type ; + oper even_A : Adjective ; + } + + instance LexiconEng of Lexicon = { + oper Adjective = {s : Str} ; + oper even_A = {s = "even"} ; + } + +**Functors** i.e. **parametrized modules** i.e. **incomplete modules**, +defining a concrete syntax in terms of an interface. + + incomplete concrete AdjI of Adj = open Lexicon in { + lincat A = Adjective ; + lin Even = even_A ; + } + +A functor can be **instantiated** by providing instances of its open +interfaces. + + concrete AdjEng of Adj = AdjI with (Lexicon = LexiconEng) ; + + +### Compilation units + +The compilation unit of GF source code is a file that contains a module. +Judgements outside modules are supported only for backward +compatibility, as explained [here](#oldgf). Every source file, suffixed +`.gf`, is compiled to a \"GF object file\", suffixed `.gfo` (as of GF +Version 3.0 and later). For runtime grammar objects used for parsing and +linearization, a set of `.gfo` files is linked to a single file suffixed +`.pgf`. While `.gf` and `.gfo` files may contain modules of any kinds, a +`.pgf` file always contains a multilingual grammar with one abstract and +a set of concrete syntaxes. + +The following diagram summarizes the files involved in the compilation +process. + +`module1.gf module2.gf ... modulen.gf` + +==\> + +`module1.gfo module2.gfo ... modulen.gfo` + +==\> + +grammar.pgf + +Both `.gf` and `.gfo` files are written in the GF source language; +`.pgf` files are written in a lower-level format. The process of +translating `.gf` to `.gfo` consists of **name resolution**, **type +annotation**, **partial evaluation**, and **optimization**. There is a +great advantage in the possibility to do this separately for GF modules +and saving the result in `.gfo` files. The partial evaluation phase, in +particular, is time and memory consuming, and GF libraries are therefore +distributed in `.gfo` to make their use less arduous. + +*In GF before version 3.0, the object files are in a format called +`.gfc`,* *and the multilingual runtime grammar is in a format called +`.gfcm`.* + +The standard compiler has a built-in **make facility**, which finds out +what other modules are needed when compiling an explicitly given module. +This facility builds a dependency graph and decides which of the +involved modules need recompilation (from `.gf` to `.gfo`), and for +which the GF object can be used directly. + + +### Names + +Each module *M* defines a set of **names**, which are visible in *M* +itself, in all modules extending *M* (unless excluded, as explained +[here](#restrictedinheritance)), and all modules opening *M*. These +names can stand for abstract syntax categories and functions, parameter +types and parameter constructors, and operations. All these names live +in the same **name space**, which means that a name entering a module +more than once due to inheritance or opening can lead to a **conflict**. +It is specified [here](#renaming) how these conflicts are resolved. + +The names of modules live in a name space separate from the other names. +Even here, all names must be distinct in a set of files compiled to a +multilingual grammar. In particular, even files residing in different +directories must have different names, since GF has no notion of +hierarchic module names. + +Lexically, names belong to the class of **identifiers**. An idenfifier +is a letter followed by any number of letters, digits, undercores (`_`) +and primes (`'`). Upper- and lower-case letters are treated as distinct. +Nothing dictates the choice of upper or lower-case initials, but the +standard libraries follow conventions similar to Haskell: + +- upper case is used for modules, abstract syntax categories and + functions, parameter types and constructors, and type synonyms +- lower case is used for non-type-valued operations and for variables + +[]{#identifiers} + +\"Letters\" as mentioned in the identifier syntax include all 7-bit +ASCII letters. Iso-latin-1 and Unicode letters are supported in varying +degrees by different tools and platforms, and are hence not recommended +in identifiers. + + +### The structure of a module + +Modules of all types have the following structure: + +*moduletype* *name* `=` *extends* *opens* *body* + +The part of the module preceding the body is its **header**. The header +defines the type of the module and tells what other modules it inherits +and opens. The body consists of the judgements that introduce all the +new names defined by the module. + +Any of the parts *extends*, *opens*, and *body* may be empty. If they +are all filled, delimiters and keywords separate the parts in the +following way: + +*moduletype* *name* `=` *extends* `**` `open` *opens* `in` `{` *body* +`}` + +The part *moduletype* *name* looks slightly different if the type is +`concrete` or `instance`: the *name* intrudes between the type keyword +and the name of the module being implemented and which really belongs to +the type of the module: + +`concrete` *name* `of` *abstractname* + +The only exception to the schema of functor syntax is functor +instantiations: the instantiation list is given in a special way between +*extends* and *opens*: + +`incomplete concrete` *name* `of` *abstractname* `=` *extends* `**` +*functorname* `with` *instantiations* `**` `open` *opens* `in` `{` +*body* `}` + +Logically, the part \"*functorname* `with` *instantiations*\" should +really be one of the *extends*. This is also shown by the fact that it +can have restricted inheritance (concept defined +[here](#restrictedinheritance)). + + +### Module types, headers, and bodies + +The *extends* and *opens* parts of a module header are lists of module +names (with possible qualifications, as defined below +[here](#qualifiednames)). The first step of type checking a module +consists of verifying that these names stand for modules of approptiate +module types. As a rule of thumb, + +- the *extends* of a module must have the same *moduletype* +- the *opens* of a module must be of type `resource` + +However, the precise rules are a little more fine-grained, because of +the presence of interfaces and their instances, and the possibility to +reuse abstract and concrete modules as resources. The following table +gives, for all module types, the possible module types of their +*extends* and *opens*, as well as the forms of judgement legal in that +module type. + + module type extends opens body + --------------------------- ------------ ------------ ---------------------------- + `abstract` abstract \- `cat, fun, def, data` + `concrete of` *abstract* concrete resource\* `lincat, cat, oper, param` + `resource` resource\* resource\* `oper, param` + `interface` resource+ resource\* `oper, param` + `instance of` *interface* resource\* resource\* `oper, param` + `incomplete` concrete concrete+ resource+ `lincat, cat, oper, param` + +The table uses the following shorthands for lists of module types: + +- resource\*: resource, instance, concrete +- resource+: resource\*, interface, abstract +- concrete+: concrete, incomplete concrete + +The legality of judgements in the body is checked before the judgements +themselves are checked. + +The forms of judgement are explained [here](#judgementforms). + + +### Digression: the logic of module types + +Why are the legality conditions of opens and extends so complicated? The +best way to grasp them is probably to consider a simplified logical +model of the module system, replacing modules by types and functions. +This model could actually be developed towards treating modules in GF as +first-class objects; so far, however, this step has not been motivated +by any practical needs. + + module object and type + ------------------------------------------ ----------------------- + abstract A = B A = B : type + concrete C of A = B C = B : A -\> S + interface I = B I = B : type + instance J of I = B J = B : I + incomplete concrete C of A = open I in B C = B : I -\> A -\> S + concrete K of A = C with (I=J) K = B(J) : A -\> S + resource R = B R = B : I + concrete C of A = open R in B C = B(R) : A -\> S + +A further step of defining modules as first-class objects would use +GADTs and record types: + +- an abstract syntax is a Generalized Algebraic Datatype (GADT) +- the target type `S` of concrete syntax is the type of nested tuples + over strings and integers +- an interface is a labelled record type +- an instance is a record of the type defined by the interface +- a functor, with a module body opening an interface, is a function on + its instances +- the instantiation of a functor is an application of the function to + some instance +- a resource is a typed labelled record, putting together an interface + and an instance of it +- the body of a module opening a resource is as a function on the + interface implicit in the resource; this function is immediately + applied to the instance defined in the resource + +Slightly unexpectedly, interfaces and instances are easier to understand +in this way than resources - a resource is, indeed, more complex, since +it fuses together an interface and an instance. + +[]{#openabstract} + +When an abstract is used as an interface and a concrete as its instance, +they are actually reinterpreted so that they match the model. Then the +abstract is no longer a GADT, but a system of *abstract* datatypes, with +a record field of type `Type` for each category, and a function among +these types for each abstract syntax function. A concrete syntax +instantiates this record with linearization types and linearizations. + + +### Inheritance + +After checking that the *extends* of a module are of appropriate module +types, the compiler adds the inherited judgements to the judgements +included in the body. The inherited judgements are not copied entirely, +but their names with links to the inherited module. Conflicts may arise +in this process: a name can have two definitions in the combined pool of +inherited and added judgements. Such a conflict is always an error: GF +provides no way to redefine an inherited constant. + +Simple as the definition of a conflict may sound, it has to take care of +the inheritance hierarchy. A very common pattern of inheritance is the +**diamond**: inheritance from two modules which themselves inherit a +common base module. Assume that the base module defines a name `f`: + + N + / \ + M1 M2 + \ / + Base {f} + +Now, `N` inherits `f` from both `M1` and `M2`, so is there a conflict? +The answer in GF is *no*, because the \"two\" `f`\'s are in the end the +same: the one defined in `Base`. The situation is thus simpler than in +**multiple inheritance** in languages like C++, because definitions in +GF are **immutable**: neither `M1` nor `M2` can possibly have changed +the definition of `f` given in `Base`. In practice, the compiler manages +inheritance through hierarchy in a very simple way, by just always +creating a link not to the immediate parent, but the original ancestor; +this ancestor can be read from the link provided by the immediate +parent. Here is how links are created from source modules by the +compiler: + + Base {f} + M1 {m1} ===> M1 {Base.f, m1} + M2 {m2} ===> M2 {Base.f, m2} + N {n} ===> N {Base.f, M1.m1, M2.m2, n} + +[]{#restrictedinheritance} + +Inheritance can be **restricted**. This means that a module can be +specified as inheriting *only* explicitly listed constants, or all +constants *except* ones explicitly listed. The syntax uses constant +names in brackets, prefixed by a minus sign in the case of an exclusion +list. In the following configuration, N inherits `a,b,c` from `M1`, and +all names but `d` from `M2` + + N = M1 {a,b,c}, M2-{d} + +Restrictions are performed as a part of inheritance linking, module by +module: the link is created for a constant if and only if it is both +included in the module and compatible with the restriction. Thus, for +instance, an inadvertent usage can exclude a constant from one module +but inherit it from another one. In the following configuration, `f` is +inherited via `M1`, if `M1` inherits it. + + N = M1 [a,b,c], M2-[f] + +Unintended inheritance may cause problems later in compilation, in the +judgement-level dependency analysis phase. For instance, suppose a +function `f` has category `C` as its type in `M`, and we only include +`f`. The exclusion has the effect of creating an ill-formed module: + + abstract M = {cat C ; fun f : C ;} + M [f] ===> {fun f : C ;} + +One might expect inheritance restriction to be transitive: if an +included constant *b* depends on some other constant *a*, then *a* +should be included automatically. However, this rule would leave to +hard-to-detect inheritances. And it could only be applied later in the +compilation phase, when the compiler has not only collected the names +defined, but also resolved the names used in definitions. + +Yet another pitfall with restricted inheritance is that it must be +stated for each module separately. For instance, a concrete syntax of an +abstract must exclude all those names that the abstract does, and a +functor instantiation must replicate all restrictions of the functor. + + +### Opening + +Opening makes constants from other modules usable in judgements, without +inheriting them. This means that, unlike inheritance, opening is not +transitive. + +[]{#qualifiednames} + +Opening cannot be restricted as inheritance can, but it can be +**qualified**. This means that the names from the opened modules cannot +be used as such, but only as prefixed by a qualifier and a dot (`.`). +The qualifier can be any identifier, including the name of the module. +Here is an example of an *opens* list: + + open A, (X = XSLTS), (Y = XSLTS), B + +If `A` defines the constant `a`, it can be accessed by the names + + a A.a + +If `XSLTS` defines the constant `x`, it can be accessed by the names + + X.x Y.x XSLTS.x + +Thus qualification by real module name is always possible, and one and +the same module can be qualified in different ways at the same time (the +latter can be useful if you want to be able to change the +implementations of some constants to a different resource later). Since +the qualification with real module name is always possible, it is not +possible to \"swap\" the names of modules locally: + + open (A=B), (B=A) -- NOT POSSIBLE! + +The list of qualifiers names and module names in a module header may +thus not contain any duplicates. + + +### Name resolution + +[]{#renaming} + +**Name resolution** is the compiler phase taking place after inheritance +linking. It qualifies all names occurring in the definition parts of +judgements (that is, just excluding the defined names themselves) with +the names of the modules they come from. If a name can come from +different modules (that is, not from their common ancestor), a conflict +is reported; this decision is hence not dependent on e.g. types, which +are known only at a later phase. + +Qualification of names is the main device for avoiding conflicts in name +resolution. No other information is used, such as priorities between +modules. However, if a name is defined in different opened modules but +never used in the module body, a conflict does not arise: conflicts +arise only when names are used. Also in this respect, opening is thus +different from inheritance, where conflicts are checked independently of +use. + +As usual, inner scope has priority in name resolution. This means that +if an identifier is in scope as a bound variable, it will not be +interpreted as a constant, unless qualified by a module name (variable +bindings are explained [here](#variablebinding)). + + +### Functor instantiations + +We have dealt with the principles of module headers, inheritance, and +names in a general way that applies to all module types. The exception +is functor instantiations, that have an extra part of the instantiating +equations, assigning an instance to every interface. Here is a typical +example, displaying the full generality: + + concrete FoodsEng of Foods = PhrasesEng ** + FoodsI-[Pizza] with + (Syntax = SyntaxEng), + (LexFoods = LexFoodsEng) ** + open SyntaxEng, ParadigmsEng in { + lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; + } + +(The example is modified from Section 5.9 in the GF Tutorial.) + +The instantiation syntax is similar to qualified *opens*. The +left-hand-side names must be interfaces, the right-hand-side names their +instances. (Recall that `abstract` can be use as `interface` and +`concrete` as its `instance`.) Inheritance from the functor can be +restricted, typically in the purpose of defining some excluded functions +in language-specific ways in the module body. + + +### Completeness + +(This section refers to the forms of judgement introduced +[here](#judgementforms).) + +A `concrete` is complete with respect to an `abstract`, if it contains a +`lincat` definition for every `cat` declaration, and a `lin` definition +for every `fun` declaration. + +The same completeness criterion applies to functor instantiations. It is +not possible to use a partial functor instantiation, leading to another +functor. + +Functors do not need to be complete in the sense concrete modules need. +The missing definitions can then be provided in the body of each functor +instantiation. + +A `resource` is complete, if all its `oper` and `param` judgements have +a definition part. While a `resource` must be complete, an `interface` +need not. For an `interface`, it is the definition parts of judgements +are optional. + +An `instance` is complete with respect to an `interface`, if it gives +the definition parts of all `oper` and `param` judgements that are +omitted in the `interface`. Giving definitions to judgements that have +already been defined in the `interface` is illegal. Type signatures, on +the other hand, can be repeated if the same types are used. + +In addition to completing the definitions in an `interface`, its +instance may contain other judgements, but these must all be complete +with definitions. + +Here is an example of an instance and its interface showing the above +variations: + + interface Pos = { + param Case ; -- no definition + param Number = Sg | Pl ; -- definition given + oper Noun : Type = { -- relative definition given + s : Number => Case => Str + } ; + oper regNoun : Str -> Noun ; -- no definition + } + + instance PosEng of Pos = { + param Case = Nom | Gen ; -- definition of Case + -- Number and Noun inherited + oper regNoun = \dog -> { -- type of regNoun inherited + s = table { -- definition of regNoun + Sg => table { + Nom => dog + -- etc + } + } ; + oper house_N : Noun = -- new definition + regNoun "house" ; + } + + +Judgements +---------- + + +### Overview of the forms of judgement + +[]{#judgementforms} + +A module body in GF is a set of **judgements**. Judgements are +definitions or declarations, sometimes combinations of the two; the +common feature is that every judgement introduces a name, which is +available in the module and whenever the module is extended or opened. + +There are several different **forms of judgement**, identified by +different **judgement keywords**. Here is a list of all these forms, +together with syntax descriptions and the types of modules in which each +form can occur. The table moreover indicates whether the judgement has a +default value, and whether it contributes to the **name base**, i.e. +introduces a new name to the scope. + + judgement where module default base + ----------------------------- ---------------------------- ------------ --------- ------ + `cat` C G G context abstract N/A yes + `fun` f : A A type abstract N/A yes + `def` f ps = t f fun, ps patterns, t term abstract yes no + `data` C = f `|` \... `|` g C cat, f\...g fun abstract yes no + `lincat` C = T C cat, T type concrete\* yes yes + `lin` f = t f fun, t term concrete\* no yes + `lindef` C = t C cat, t term concrete\* yes no + `linref` C = t C cat, t term concrete\* yes no + `printname cat` C = t C cat, t term concrete\* yes no + `printname fun` f = t f fun, t term concrete\* yes no + `param` P = C`|` \... `|` D C\...D constructors resource\* N/A yes + `oper` f : T = t T type, t term resource\* N/A yes + `flags` o = v o flag, v value all yes N/A + +Judgements that have default values are rarely used, except `lincat` and +`flags`, which often need values different from the defaults. + +Introducing a name twice in the same module is an error. In other words, +all judgements that have a \"yes\" in the name base column, must have +distinct identifiers on their left-hand sides. + +All judgement end with semicolons (`;`). + +In addition to the syntax given in the table, many of the forms have +syntactic sugar. This sugar will be explained below in connection to +each form. There are moreover two kinds of syntactic sugar common to all +forms: + +- the judgement keyword is shared between consecutive judgements until + a new keyword appears: + `keyw J ; K ;` === `keyw J ; keyw K ;` +- the right-hand sides of colon (`:`) and equality (`=`) can be + shared, by using comma (`,`) as separator of left-hand sides, which + must consist of identifiers + `c,d : T` === `c : T ; d : T ;` + `c,d = t` === `c = t ; d = t ;` + +These conventions, like all syntactic sugar, are performed at an early +compilation phase, directly after parsing. This means that e.g. + + lin f,g = \x -> x ; + +can be correct even though `f` and `g` required different function +types. + +Within a module, judgements can occur in any order. In particular, a +name can be used before it is introduced. + +The explanations of judgement forms refer to the notions of **type** and +**term** (the latter also called **expression**). These notions will be +explained in detail [here](#expressions). + + +### Category declarations, cat + +[]{#catjudgements} + +Category declarations + +`cat` *C* *G* + +define the **basic types** of abstract syntax. A basic type is formed +from a category by giving values to all variables in the **context** +*G*. If the context is empty, the basic type looks the same as the +category itself. Otherwise, application syntax is used: + +*C* *a*~1~\...*a*~n~ + + +### Hypotheses and contexts + +[]{#contexts} + +A context is a sequence of **hypotheses**, i.e. variable-type pairs. A +hypothesis is written + +`(` *x* `:` *T* `)` + +and a sequence does not have any separator symbols. As syntactic sugar, + +- variables can share a type, + `(` *x,y* `:` *T* `)` === `(` *x* `:` *T* `)` `(` *y* `:` *T* `)` +- a **wildcard** can be used for a variable not occurring in types + later in the context, + `(` `_` `:` *T* `)` === `(` *x* `:` *T* `)` +- if the variable does not occur later, it can be omitted altogether, + and parentheses are not used, + *T* === `(` *x* `:` *T* `)` + But if *T* is more complex than an identifier, it needs parentheses + to be separated from the rest of the context. + +An abstract syntax has **dependent types**, if any of its categories has +a non-empty context. + + +### Function declarations, fun + +Function declarations, + +`fun` *f* `:` *T* + +define the **syntactic constructors** of abstract syntax. The type *T* +of *f* is built built from basic types (formed from categories) by using +the function type constructor `->`. Thus its form is + +(*x*~1~ `:` *A*~1~) `->` \... `->` (*x*~n~ `:` *A*~n~) `->` *B* + +where *Ai* are types, called the **argument types**, and *B* is a basic +type, called the **value type** of *f*. The **value category** of *f* is +the category that forms the type *B*. + +A **syntax tree** is formed from *f* by applying it to a full list of +arguments, so that the result is of a basic type. + +A **higher-order function** is one that has a function type as an +argument. The concrete syntax of GF does not support displaying the +bound variables of functions of higher than second order, but they are +legal in abstract syntax. + +An abstract syntax is **context-free**, if it has neither dependent +types nor higher-order functions. Grammars with context-free abstract +syntax are an important subclass of GF, with more limited complexity +than full GF. Whether the *concrete* syntax is context-free in the sense +of the Chomsky hierarchy is independent of the context-freeness of the +abstract syntax. + + +### Function definitions, def + +Function definitions, + +`def` *f* *p*~1~ \... *p*~n~ `=` *t* + +where *f* is a `fun` function and *p*~i~\# are patterns, impose a +relation of **definitional equality** on abstract syntax trees. They +form the basis of **computation**, which is used when comparing whether +two types are equal; this notion is relevant only if the types are +dependent. Computation can also be used for the **normalization** of +syntax trees, which applies even in context-free abstract syntax. + +The set of `def` definitions for *f* can be scattered around the module +in which *f* is introduced as a function. The compiler builds the set of +pattern equations in the order in which the equations appear; this order +is significant in the case of overlapping patterns. All equations must +appear in the same module in which *f* itself declared. + +The syntax of patterns will be specified [here](#patternmatching), +commonly for abstract and concrete syntax. In abstract syntax, +**constructor patterns** are those of the form + +*C* *p*~1~ \... *p*~n~ + +where *C* is declared as `data` for some abstract syntax category (see +next section). A **variable pattern** is either an identifier or a +wildcard. + +A common pitfall is to forget to declare a constructor as data, which +causes it to be interpreted as a variable pattern in definitions. + +Computation is performed by applying definitions and beta conversions, +and in general by using **pattern matching**. Computation and pattern +matching are explained commonly for abstract and concrete syntax +[here](#patternmatching). + +In contrast to concrete syntax, abstract syntax computation is +completely **symbolic**: it does not produce a value, but just another +term. Hence it is not an error to have incomplete systems of pattern +equations for a function. In addition, the definitions can be +**recursive**, which means that computation can fail to terminate; this +can never happen in concrete syntax. + + +### Data constructor definitions, data + +A data constructor definition, + +`data` *C* `=` *f*~1~ `|` \... `|` *f*~n~ + +defines the functions *f1*\...*fn* to be **constructors** of the +category *C*. This means that they are recognized as constructor +patterns when used in function definitions. + +In order for the data constructor definition to be correct, +*f*~1~\...*f*~n~ must be functions with *C* as their value category. + +The complete set of constructors for a category *C* is the union of all +its data constructor definitions. Thus a category can be \"extended\" by +new constructors afterwards. However, all these constructor definitions +must appear in the same module in which the category is itself defined. + +There is syntactic sugar for declaring a function as a constructor at +the same time as introducing it: + +`data` *f* : *A*~1~ `->` \... `->` *A*~n~ `->` *C* *t*~1~ \... *t*~m~ + +=== + +`fun` *f* : *A*~1~ `->` \... `->` *A*~n~ `->` *C* *t*~1~ \... *t*~m~ ; +`data` *C* = *f* + + +### The semantic status of an abstract syntax function + +There are three possible statuses for a function declared in a `fun` +judgement: + +- primitive notion: the default status +- constructor: the function appears on the right-hand side in `data` + judgement +- defined: the function has a `def` definition + +The \"constructor\" and \"defined\" statuses are in contradiction with +each other, whereas the primitive notion status is overridden by any of +the two others. + +This distinction is relevant for the semantics of abstract syntax, not +for concrete syntax. It shows in the way patterns are treated in +equations in `def` definitions: a constructor in a pattern matches only +itself, whereas any other name is treated as a variable pattern, which +matches anything. + + +### Linearization type definitions, lincat + +A linearization type definition, + +`lincat` *C* `=` *T* + +defines the type of linearizations of trees whose type has category *C*. +Type dependences have no effect on the linearization type. + +The type *T* must be a **legal linearization type**, which means that it +is a *record type* whose fields have either parameter types, the type +Str of strings, or table or record types of these. In particular, +function types may not appear in *T*. A detailed explanation of types in +concrete syntax will be given [here](#cnctypes). + +If *K* is the concrete syntax of an abstract syntax *A*, then *K* must +define the linearization type of all categories declared in *A*. +However, the definition can be omitted from the source code, in which +case the default type `{s : Str}` is used. + + +### Linearization definitions, lin + +A linearization definition, + +`lin` *f* `=` *t* + +defines the linearizations function of function *f*, i.e. the function +used for linearizing trees formed by *f*. + +The type of *t* must be the homomorphic image of the type of *f*. In +other words, if + +`fun` *f* `:` *A*~1~ `->` \... `->` *A*~n~ `->` *A* + +then + +`lin` *f* `:` *A*~1~\* `->` \... `->` *A*~n~\* `->` *A*\* + +where the type *T*\* is defined as follows depending on *T*: + +- (*C* *t*~1~ \... *t*~n~)\* = *T*, if `lincat` *C* `=` *T* +- (*B*~1~ `->` \... `->` *B*~m~ `->` *B*)\* = *B*\* + `** {$0,...,$m : Str}` + +The second case is relevant for higher-order functions only. It says +that the linearization type of the value type is extended by adding a +string field for each argument types; these fields store the variable +symbol used for the binding of each variable. + +[]{#HOAS} + +Since the arguments of a function argument are treated as bare strings, +orders higher than the second are irrelevant for concrete syntax. + +There is syntactic sugar for binding the variables of the linearization +of a function on the left-hand side: + +`lin` *f* *p* `=` *t* === `lin` *f* `= \`*p* `->` *t* + +The pattern *p* must be either a variable or a wildcard (`_`); this is +what the syntax of lambda abstracts (`\p -> t`) requires. + + +### Linearization default definitions, lindef + +[]{#lindefjudgements} + +A linearization default definition, + +`lindef` *C* `=` *t* + +defines the default linearization of category *C*, i.e. the function +applicable to a string to make it into an object of the linearization +type of *C*. + +Linearization defaults are invoked when linearizing variable bindings in +higher-order abstract syntax. A variable symbol is then presented as a +string, which must be converted to correct type in order for the +linearization not to fail with an error. + +The other use of the defaults is for linearizing metavariables and +abstract functions without linearization in the concrete syntax. In the +first case the default linearization is applied to the string `"?X"` +where `X` is the unique index of the metavariable, and in the second +case the string is `"[f]"` where `f` is the name of the abstract +function with missing linearization. + +Usually, linearization defaults are generated by using the default rule +that \"uses the symbol itself for every string, and the first value of +the parameter type for every parameter\". The precise definition is by +structural recursion on the type: + +- default(Str,s) = s +- default(P,s) = \#1(P) +- default(P =\> T,s) = `\\_ =>` default(T,s) +- default(`{`\... ; r : R ; \...`}`,s) = `{`\... ; r : default(R,s) ; + \...`}` + +The notion of the first value of a parameter type (\#1(P)) is defined +[below](#paramvalues). + + +### Linearization reference definitions, linref + +[]{#linrefjudgements} + +A linearization reference definition, + +`linref` *C* `=` *t* + +defines the reference linearization of category *C*, i.e. the function +applicable to an object of the linearization type of *C* to make it into +a string. + +The reference linearization is always applied to the top-level node of +the abstract syntax tree. For example when we linearize the tree +`f x1 x2 .. xn`, then we first apply `f` to its arguments which gives us +an object of the linearization type of its category. After that we apply +the reference linearization for the same category to get a string out of +the object. This is particularly useful when the linearization type of +*C* contains discontious constituents. In this case usually the +reference linearization glues the constituents together to produce an +intuitive linearization string. + +The reference linearization is also used for linearizing metavariables +which stand in function position. For example the tree +`f (? x1 x2 .. xn)` is linearized as follows. Each of the arguments +`x1 x2 .. xn` is linearized, and after that the reference linearization +of the its category is applied to the output of the linearization. The +result is a sequence of `n` strings which are concatenated into a single +string. The final string is the input to the default linearization of +the category for the argument of `f`. After applying the default +linearization we get an object that we could safely pass to `f`. + +Usually, linearization references are generated by using the rule that +\"picks the first string in the linearization type\". The precise +definition is by structural recursion on the type: + +- reference(Str,o) = Just o +- reference(P,s) = Nothing +- reference(P =\> T,o) = reference(T,o ! \#1(P)) \|\| reference(T,o ! + \#2(P)) \|\| \... \|\| reference(T,o ! \#n(P)) +- reference({r1 : R1; \... rn : Rn},o) = reference(R1, o.r1) \|\| + reference(R2, o.r2) \|\| \... \|\| reference(Rn, o.rn) + +Here each call to reference returns either `(Just o)` or `Nothing`. When +we compute the reference for a table or a record then we pick the +reference for the first expression for which the recursive call gives us +`Just`. If we get `Nothing` for all of them then the final result is +`Nothing` too. + + +### Printname definitions, printname cat and printname fun + +A category printname definition, + +`printname cat` *C* `=` *s* + +defines the printname of category *C*, i.e. the name used in some +abstract syntax information shown to the user. + +Likewise, a function printname definition, + +`printname fun` *f* `=` *s* + +defines the printname of function *f*, i.e. the name used in some +abstract syntax information shown to the user. + +The most common use of printnames is in the interactive syntax editor, +where printnames are displayed in menus. It is possible e.g. to adapt +them to each language, or to embed HTML tooltips in them (as is used in +some HTML-based editor GUIs). + +Usually, printnames are generated automatically from the symbol and/or +concrete syntax information. + + +### Parameter type definitions, param + +[]{#paramjudgements} + +A parameter type definition, + +`param` *P* `=` *C*~1~ *G*~1~ `|` \... `|` *C*~n~ *G*~n~ + +defines a parameter type *P* with the **parameter constructors** +*C*~1~\...*C*~n~, with their respective contexts *G*~1~\...*G*~n~. + +[]{#paramtypes} + +Contexts have the same syntax as in `cat` judgements, explained +[here](#catjudgements). Since dependent types are not available in +parameter type definitions, the use of variables is never necessary. The +types in the context must themselves be **parameter types**, which are +defined as follows: + +- Given the judgement `param` *P* \..., *P* is a parameter type. +- A record type of parameter types is a parameter type. +- `Ints` *n* (an initial segment of integers) is a parameter type. + +The names defined by a parameter type definition include both the type +name *P* and the constructor names *C*~i~. Therefore all these names +must be distinct in a module. + +A parameter type may not be recursive, i.e. *P* itself may not occur in +the contexts of its constructors. This restriction extends to mutual +recursion: we say that *P* **depends** on the types that occur in the +contexts of its constructors and on all types that those types depend +on, and state that *P* may not depend on itself. + +In an `interface module`, it is possible to declare a parameter type +without defining it, + +`param` *P* `;` + + +### Parameter values + +[]{#paramvalues} + +All parameter types are finite, and the GF compiler will internally +compute them to **lists of parameter values**. These lists are formed by +traversing the `param` definitions, usually respecting the order of +constructors in the source code. For records, bibliographical sorting is +applied. However, both the order of traversal of `param` definitions and +the order of fields in a record are specified in a compiler-internal +way, which means that the programmer should not rely on any particular +order. + +The order of the list of parameter values can affect the program in two +cases: + +- in the default `lindef` definition ([here](#lindefjudgements)), the + first value is chosen +- in course-of-value tables ([here](#tables)), the compiler-internal + order is followed + +The first usage implies that, if `lindef` definitions are essential for +the application, they should be given manually. The second usage implies +that course-of-value tables should be avoided in hand-written GF code. + +In run-time grammar generation, all parameter values are translated to +integers denotions positions in these parameter lists. + + +### Operation definitions, oper + +An operation definition, + +`oper` *h* `:` *T* `=` *t* + +defines an **operation** *h* of type *T*, with the computation rule + +*h* ==\> *t* + +The type *T* can be any concrete syntax type, including function types +of any order. The term *t* must have the type *T*, as defined +[here](#expressions). + +As syntactic sugar, the type can be omitted, + +`oper` *h* `=` *t* + +which works in two cases + +- the type can be inferred from *t* (compiler-dependent) +- the definition occurs in an `instance` and the type is given in the + `interface` + +It is also possible to give the type and the definition separately: + +`oper` *h* `:` *T* ; `oper` *h* `=` *t* === `oper` *h* `:` *T* `=` *t* + +The order of the type part and the definition part is free, and there +can be other judgements in between. However, they must occur in the same +`resource` module for it to be complete (as defined +[here](#completeness)). In an `interface` module, it is enough to give +the type. + +When only the definition is given, it is possible to use a shorthand +similar to `lin` judgements: + +`oper` *h* *p* `=` *t* === `oper` *h* `=` `\`*p* `->` *t* + +The pattern *p* is either a variable or a wildcard (`_`). + +Operation definitions may not be recursive, not even mutually recursive. +This condition ensures that functions can in the end be eliminated from +concrete syntax code (as explained [here](#functionelimination)). + + +### Operation overloading + +[]{#overloading} + +One and the same operation name *h* can be used for different +operations, which have to have different types. For each call of *h*, +the type checker selects one of these operations depending on what type +is expected in the context of the call. The syntax of overloaded +operation definitions is + +`oper` *h* `= overload {`*h* : *T*~1~ = *t*~1~ ; \... ; *h* : *T*~n~ = +*t*~n~`}` + +Notice that *h* must be the same in all cases. This format can be used +to give the complete implementation; to give just the types, e.g. in an +interface, one can use the form + +`oper` *h* `: overload {`*h* : *T*~1~ ; \... ; *h* : *T*~n~`}` + +The implementation of this operation typing is given by a judgement of +the first form. The order of branches need not be the same. + + +### Flag definitions, flags + +A flag definition, + +`flags` *o* `=` *v* + +sets the value of the flag *o*, to be used when compiling or using the +module. + +The flag *o* is an identifier, and the value *v* is either an identifier +or a quoted string. + +Flags are a kind of metadata, which do not strictly belong to the GF +language. For instance, compilers do not necessarily check the +consistency of flags, or the meaningfulness of their values. The +inheritance of flags is not well-defined; the only certain rule is that +flags set in the module body override the settings from inherited +modules. + +Here are some flags commonly included in grammars. + + flag value description module + ------------ -------------------- ---------------------------------- ---------- + `coding` character encoding encoding used in string literals concrete + `startcat` category default target of parsing abstract + +The possible values of these flags are specified [here](#flagvalues). +Note that the `lexer` and `unlexer` flags are deprecated. If you need +their functionality, you should use supply them to GF shell commands +like so: + + put_string -lextext "страви, напої" | parse + +A summary of their possible values can be found at the [GF shell +reference](http://www.grammaticalframework.org/doc/gf-shell-reference.html). + + +Types and expressions +--------------------- + + +### Overview of expression forms + +[]{#expressions} + +Like many dependently typed languages, GF makes no syntactic distinction +between expressions and types. An illegal use of a type as an expression +or vice versa comes out as a type error. Whether a variable, for +instance, stands for a type or an expression value, can only be resolved +from its context of use. + +One practical consequence of the common syntax is that global and local +definitions (`oper` judgements and `let` expressions, respectively) work +in the same way for types and expressions. Thus it is possible to +abbreviate a type occurring in a type expression: + + let A = {s : Str ; b : Bool} in A -> A -> A + +Type and other expressions have a system of **precedences**. The +following table summarizes all expression forms, from the highest to the +lowest precedence. Some expressions are moreover left- or +right-associative. + + prec expression example explanation + --------- ------------------------------------ ----------------------------------- + 7 `c` constant or variable + 7 `Type` the type of types + 7 `PType` the type of parameter types + 7 `Str` the type of strings/token lists + 7 `"foo"` string literal + 7 `123` integer literal + 7 `0.123` floating point literal + 7 `?` metavariable + 7 `[]` empty token list + 7 `[C a b]` list category + 7 `["foo bar"]` token list + 7 `{"s : Str ; n : Num}` record type + 7 `{"s = "foo" ; n = Sg}` record + 7 `` tuple + 7 `` type-annotated expression + 6 left `t.r` projection or qualification + 5 left `f a` function application + 5 `table {Sg => [] ; _ => "xs"}` table + 5 `table P [a ; b ; c]` course-of-values table + 5 `case n of {Sg => [] ; _ => "xs"}` case expression + 5 `variants {"color" ; "colour"}` free variation + 5 `pre {vowel => "an" ; _ => "a"}` prefix-dependent choice + 4 left `t ! v` table selection + 4 left `A * B` tuple type + 4 left `R ** {b : Bool}` record (type) extension + 3 left `t + s` token gluing + 2 left `t ++ s` token list concatenation + 1 right `\x,y -> t` function abstraction (\"lambda\") + 1 right `\\x,y => t` table abstraction + 1 right `(x : A) -> B` dependent function type + 1 right `A -> B` function type + 1 right `P => T` table type + 1 right `let x = v in t` local definition + 1 `t where {x = v}` local definition + 1 `in M.C "foo"` rule by example + +Any expression in parentheses (`(`*exp*`)`) is in the highest precedence +class. + + +### The functional fragment: expressions in abstract syntax + +[]{#functiontype} + +The expression syntax is the same in abstract and concrete syntax, +although only a part of the syntax is actually usable in well-typed +expressions in abstract syntax. An abstract syntax is essentially used +for defining a set of types and a set of functions between those types. +Therefore it needs essentially the **functional fragment** of the +syntax. This fragment comprises two kinds of types: + +- **basic types**, of form *C a1\...an* where + - `cat` *C* (*x*~1~ : *A*~1~)\...(*x*~n~ : *A*~n~), including the + predefined categories `Int`, `Float`, and `String` explained + [here](#predefabs) + - *a*~1~ : *A*~1~,\...,*a*~n~ : *A*~n~{*x*~1~ = + *a*~1~,\...,*x*~n-1~=*a*~n-1~} + + + +- **function types**, of form (*x* : *A*) `->` *B*, where + - *A* is a type + - *B* is a type possibly depending on *x* : *A* + +When defining basic types, we used the notation *t*{*x*~1~ = +*t*~1~,\...,*x*~n~=*t*~n~} for the **substitution** of values to +variables. This is a metalevel notation, which denotes a term that is +formed by replacing the free occurrences of each variable *x*~i~ by +*t*~i~. + +These types have six kinds of expressions: + +- **constants**, *f* : *A* where + - `fun` *f* : *A* + + + +- **literals** for integers, floats, and strings (defined in + [here](#predefabs)) + + + +- **variables**, *x* : *A* where + - *x* has been introduced by a binding + + + +- **applications**, *f a* : *B*{*x*=*a*}, where + - *f* : (*x* : *A*) `->` *B* + - *a* : *A* + + + +- **abstractions**, `\`*x* `->` *b* : (*x* : *A*) `->` *B*, where + - *b* : *B* possibly depending on *x* : *A* + + + +- **metavariables**, `?`, as introduced in intermediate phases of + incremental type checking; metavariables are not permitted in GF + source code + +[]{#variablebinding} + +The notion of **binding** is defined for occurrences of variables in +subexpressions as follows: + +- in (*x* : *A*) `->` *B*, *x* is bound in *B* +- in `\`*x* `->` *b*, *x* is bound in *b* +- in `def` *f* *p*~1~ \... *p*~n~ = *t*, any pattern variable + introduced in any *pi* is bound in *t* (as defined + [here](#patternmatching)) + +As syntactic sugar, function types have sharing of types and suppression +of variables, in the same way as contexts (defined [here](#contexts)): + +- variables can share a type, + `(` *x,y* `:` *A* `)` `->` *B* === `(` *x* `:` *A* `) -> (` *y* `:` + *A* `) ->` *B* +- a **wildcard** can be used for a variable not occurring later in the + type, + `(` `_` `:` *A* `) ->` *B* === `(` *x* `:` *T* `) ->` *B* +- if the variable does not occur later, it can be omitted altogether, + and parentheses are not used, + *A* `->` *B* === `(` *\_* `:` *A* `) ->` *B* + +There is analogous syntactic sugar for constant functions, + +`\`*\_* `->` *t* === `\`*x* `->` *t* + +where *x* does not occur in *t*, and for multiple lambda abstractions: + +`\`*p,q* `->` *t* === `\`*p* `->` `\`*q* `->` *t* + +where *p* and *q* are variables or wild cards (`_`). + + +### Conversions + +Among expressions, there is a relation of **definitional equality** +defined by four **conversion rules**: + +- **alpha conversion**: `\`*x* `->` *b* = `\`*y* `->` *b*{*x*=*y*} + + + +- **beta conversion**: (`\`*x* `->` *b*) *a* = *b*{*x*=*a*} + + + +- **delta conversion**: *f* *a*~1~ \... *a*~n~ = *tg*, if + - there is a definition `def` *f* *p*~1~ \... *p*~n~ = *t* + - this definition is the first for *f* that matches the sequence + *a*~1~ \.... *a*~n~, with the substitution *g* + + + +- **eta conversion**: *c* = `\`*x* `->` *c x*, if *c* : (*x* : *A*) + `->` *B* + +Pattern matching substitution used in delta conversion is defined +[here](#patternmatching). + +An expression is in **beta-eta-normal form** if + +- it has no subexpressions to which beta conversion applies (beta + normality) +- each constant or variable whose type is a function type must be + **eta-expanded**, i.e. made into an abstract equal to it by eta + conversion (eta normality) + +Notice that the iteration of eta expansion would lead to an expression +not in beta-normal form. + + +### Syntax trees + +[]{#syntaxtrees} + +The **syntax trees** defined by an abstract syntax are well-typed +expressions of basic types in beta-eta normal form. Linearization +defined in concrete syntax applies to all and only these expressions. + +There is also a direct definition of syntax trees, which does not refer +to beta and eta conversions: keeping in mind that a type always has the +form + +(*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` *B* + +where *Ai* are types and *B* is a basic type, a syntax tree is an +expression + +*b* *t*~1~ \... *t*~n~ : *B\'* + +where + +- *B\'* is the basic type *B*{*x*~1~ = *t*~1~,\...,*x*~n~ = *t*~n~} +- `fun` *b* : (*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` + *B* +- each *t*~i~ has the form `\`*z*~1~,\...,*z*~m~ `->` *c* where *A*~i~ + is + (*y*~1~ : *B*~1~) `->` \... `->` (*y*~m~ : *B*~m~) `->` *B* + + +### Predefined types in abstract syntax + +[]{#predefabs} + +GF provides three predefined categories for abstract syntax, with +predefined expressions: + + category expressions + ---------- --------------------------------------- + `Int` integer literals, e.g. `123` + `Float` floating point literals, e.g. `12.34` + `String` string literals, e.g. `"foo"` + +These categories take no arguments, and they can be used as basic types +in the same way as if they were introduced in `cat` judgements. However, +it is not legal to define `fun` functions that have any of these types +as value type: their only well-typed expressions are literals as defined +in the above table. + + +### Overview of expressions in concrete syntax + +[]{#cnctypes} + +Concrete syntax is about defining mappings from abstract syntax trees to +**concrete syntax objects**. These objects comprise + +- records +- tables +- strings +- parameter values + +Thus functions are not concrete syntax objects; however, the mappings +themselves are expressed as functions, and the source code of a concrete +syntax can use functions under the condition that they can be eliminated +from the final compiled grammar (which they can; this is one of the +fundamental properties of compilation, as explained in more detail in +the *JFP* article). + +Concrete syntax thus has the same function types and expression forms as +abstract syntax, specified [here](#functiontype). The basic types +defined by categories (`cat` judgements) are available via grammar reuse +explained [here](#reuse); this also comprises the predefined categories +`Float` and `String`. + + +### Values, canonical forms, and run-time variables + +In abstract syntax, the conversion rules fiven [here](#conversions) +define a computational relation among expressions, but there is no +separate notion of a **value** of computation: the value (the end point) +of a computation chain is simply an expression to which no more +conversions apply. In general, we are interested in expressions that +satisfy the conditions of being syntax trees (as defined +[here](#syntaxtrees)), but there can be many computationally equivalent +syntax trees which nonetheless are distinct syntax trees and hence have +different linearizations. The main use of computation in abstract syntax +is to compare types in dependent type checking. + +In concrete syntax, the notion of values is central. At run time, we +want to compute the values of linearizations; at compile time, we want +to perform **partial evaluation**, which computes expressions as far as +possible. To specify what happens in computation we therefore have to +distinguish between **canonical forms** and other forms of expressions. +The canonical forms are defined separately for each form of type, +whereas the other forms may usually produce expressions of any type. + +[]{#linexpansion} []{#runtimevariables} + +What is done at compile time is the elimination of any noncanonical +forms, except for those depending on **run-time variables**. Run-time +variables are the same as the **argument variables** of linearization +rules, i.e. the variables *x*~1~,\...,*x*~n~ in + +`lin` *f* `= \` *x*~1~,\...,*x*~n~ `->` *t* + +where + +`fun` *f* `:` (*x*~1~ : *A*~1~) `->` \... `->` (*x*~n~ : *A*~n~) `->` +*B* + +Notice that this definition refers to the **eta-expanded** linearization +term, which has one abstracted variable for each argument type of *f*. +These variables are not necessarily explicit in GF source code, but +introduced by the compiler. + +Since certain expression forms should be eliminated in compilation but +cannot be eliminated if run-time variables appear in them, errors can +appear late in compilation. This is an issue with the following +expression forms: + +- gluing (`s + t`), defined [here](#gluing) +- pattern matching on strings, defined [here](#patternmatching) +- predefined string operations, defined [here](#predefcnc) (those + taking `Str` arguments) + + +### Token lists, tokens, and strings + +[]{#strtype} + +The most prominent basic type is `Str`, the type of **token lists**. +This type is often sloppily referred to as the type of **strings**; but +it should be kept in mind that the objects of `Str` are *lists* of +strings rather than single strings. + +Expressions of type `Str` have the following canonical forms: + +- **tokens**, i.e. **string literals**, in double quotes, e.g. `"foo"` +- **the empty token list**, `[]` +- **concatenation**, *s* `++` *t*, where *s,t* : `Str` +- **prefix-dependent choice**, + `pre {p1 => s1 ; ... ; pn => sn ; _ => s }`, where + - *s*, *s*~1~,\...,*s*~n~, *p*~1~,\...,*p*~n~ : `Str` + +For convenience, the notation is overloaded so that tokens are +identified with singleton token lists, and there is no separate type of +tokens (this is a change from the *JFP* article). The notion of a token +is still important for compilation: all tokens introduced by the grammar +must be known at compile time. This, in turn, is required by the parsing +algorithms used for parsing with GF grammars. + +In addition to string literals, tokens can be formed by a specific +non-canonical operator: + +- **gluing**, *s* `+` *t*, where *s,t* : `Str` + +[]{#gluing} + +Being noncanonical, gluing is equipped with a computation rule: string +literals are glued by forming a new string literal, and empty token +lists can be ignored: + +- `"foo" + "bar"` ==\> `"foobar"` +- *t* `+ []` ==\> *t* +- `[] +` *t* ==\> *t* + +Since tokens must be known at compile time, the operands of gluing may +not depend on run-time variables, as defined [here](#runtimevariables). + +As syntactic sugar, token lists can be given as bracketed string +literals, where spaces separate tokens: + +- **token lists**, `["one two three"]` === `"one" ++ "two" ++ "three"` + +Notice that there are no empty tokens, but the expression `[]` can be +used in a context requiring a token, in particular in gluing expression +below. Since `[]` denotes an empty token list, the following computation +laws are valid: + +- *t* `++ []` ==\> *t* +- `[] ++` *t* ==\> *t* + +Moreover, concatenation and gluing are associative: + +- s `+` (t `+` u) ==\> s `+` t `+` u +- s `++` (t `++` u) ==\> s `++` t `++` u + +For the programmer, associativity and the empty token laws mean that the +compiler can use them to simplify string expressions. It also means that +these laws are respected in pattern matching on strings. + +A prime example of prefix-dependent choice operation is the following +approximative expression for the English indefinite article: + + pre { + ("a" | "e" | "i" | "o") => "an" ; + _ => "a" + } ; + +This expression can be computed in the context of a subsequent token: + +- `pre {p1 => s1 ; ... ; pn => sn ; _ => s } ++ t` ==\> + - *s*~i~ for the first *i* such that the prefix *p*~i~ matches + *t*, if it exists + - *s* otherwise + +The **matching prefix** is defined by comparing the string with the +prefix of the token. If the prefix is a variant list of strings, then it +matches the token if any of the strings in the list matches it. + +The computation rule can sometimes be applied at compile time, but it +general, prefix-dependent choices need to be passed to the run-time +grammar, because they are not given a subsequent token to compare with, +or because the subsequent token depends on a run-time variable. + +The prefix-dependent choice expression itself may not depend on run-time +variables. + +*There is an older syntax for prefix-dependent choice, namely: +`pre { s ; s1 / p1 ; ... ; sn / pn}`. This syntax will not accept +strings as patterns.* + +*In GF prior to 3.0, a specific type* `Strs` *is used for defining +prefixes,* *instead of just* `variants` *of* `Str`. + + +### Records and record types + +A **record** is a collection of objects of possibly different types, +accessible by **projections** from the record with **labels** pointing +to these objects. A record is also itself an object, whose type is a +**record type**. Record types have the form + +`{` *r*~1~ : *A*~1~ `;` \... `;` *r*~n~ : *A*~n~ `}` + +where *n* \>= 0, each *A*~i~ is a type, and the labels *r*~i~ are +distinct. A record of this type has the form + +`{` *r*~1~ = *a*~1~ `;` \... `;` *r*~n~ = *a*~n~ `}` + +where each \#aii : \"Aii. A limiting case is the **empty record type** +`{}`, which has the object `{}`, the **empty record**. + +The **fields** of a record type are its parts of the form *r* : *A*, +also called **typings**. The **fields** of a record are of the form *r* += *a*, also called **value assignments**. Value assignments may +optionally indicate the type, as in *r* : *A* = *a*. + +The order of fields in record types and records is insignificant: two +record types (or records) are equal if they have the same fields, in any +order, and a record is an object of a record type, if it has +type-correct value assignments for all fields of the record type. The +latter definition implies the even stronger principle of **record +subtyping**: a record can have any type that has some subset of its +fields. This principle is explained further [here](#subtyping). + +All fields in a record must have distinct labels. Thus it is not +possible e.g. to \"redefine\" a field \"later\" in a record. + +Lexically, labels are identifiers (defined [here](#identifiers)). This +is with the exception of the labels selecting bound variables in the +linearization of higher-order abstract syntax, which have the form +`$`*i* for an integer *i*, as specified [here](#HOAS). In source code, +these labels should not appear in records fields, but only in +selections. + +Labels occur only in syntactic positions where they cannot be confused +with constants or variables. Therefore it is safe to write, as in +`Prelude`, + + ss : Str -> {s : Str} = \s -> {s = s} ; + +A **projection** is an expression of the form + +*t*.*r* + +where *t* must be a record and *r* must be a label defined in it. The +type of the projection is the type of that field. The computation rule +for projection returns the value assigned to that field: + +`{` \... `;` *r* = *a* `;` \... `}.`*r* ==\> *a* + +Notice that the dot notation *t*.*r* is also used for qualified names as +specified [here](#qualifiednames). This ambiguity follows tradition and +convenience. It is resolved by the following rules (before type +checking): + +1. if *t* is a bound variable or a constant in scope, *t*.*r* is + type-checked as a projection +2. otherwise, *t*.*r* is type-checked as a qualified name + +As syntactic sugar, types and values can be shared: + +- `{` \... `;` *r,s* : *A* `;` \... `}` === `{` \... `;` *r* : *A* `;` + *s* : *A* `;` \... `}` +- `{` \... `;` *r,s* = *a* `;` \... `}` === `{` \... `;` *r* = *a* `;` + *s* = *a* `;` \... `}` + +Another syntactic sugar are **tuple types** and **tuples**, which are +translated by endowing their unlabelled fields by the labels `p1`, +`p2`,\... in the order of appearance of the fields: + +- *A*~1~ `*` \... `*` *A*~n~ === `{` `p1` : *A*~1~ `;` \... `;` `pn` : + *A*~n~ `}` +- `<`*a*~1~ `,` \... `,` *a*~n~ `>` === `{` `p1` = *a*~1~`;` \... `;` + `pn` = *a*~n~ `}` + +A **record extension** is formed by adding fields to a record or a +record type. The general syntax involves two expressions, + +*R* `**` *S* + +The result is a record type or a record with a union of the fields of +*R* and *S*. It is therefore well-formed if + +- both *R* and *S* are either records or record types +- the labels in *R* and *S* are disjoint, if *R* and *S* are record + types + +(Since GF version 3.6) If *R* and *S* are record objects, then the +labels in them need not be disjoint. Labels defined in *S* are then +given priority, so that record extensions in fact works as **record +update**. A common pattern of using this feature is + + lin F x ... = x ** {r = ... x.r ...} + +where `x` is a record with many fields, just one of which is updated. +Following the normal binding conditions, `x.r` on the right hand side +still refers to the old value of the `r` field. + + +### Subtyping + +The possibility of having superfluous fields in a record forms the basis +of the **subtyping** relation. That *A* is a subtype of *B* means that +*a : A* implies *a : B*. This is clearly satisfied for records with +superfluous fields: + +- if *R* is a record type without the label *r*, then *R* `** {` *r* : + *A* `}` is a subtype of *R* + +The GF grammar compiler extends subtyping to function types by +**covariance** and **contravariance**: + +- covariance: if *A* is a subtype of *B*, then *C* `->` *A* is a + subtype of *C* `->` *B* +- contravariance: if *A* is a subtype of *B*, then *B* `->` *C* is a + subtype of *A* `->` *C* + +The logic of these rules is natural: if a function is returns a value in +a subtype, then this value is *a fortiori* in the supertype. If a +function is defined for some type, then it is *a fortiori* defined for +any subtype. + +In addition to the well-known principles of record subtyping and co- and +contravariance, GF implements subtyping for initial segments of +integers: + +- if *m* \< *n*, then `Ints` *m* is a subtype of `Ints` *n* +- `Ints` *n* is a subtype of `Integer` + +As the last rule, subtyping is transitive: + +- if *A* is a subtype of *B* and *B* is a subtype of *C*, then *A* is + a subtype of *C*. + + +### Tables and table types + +[]{#tables} + +One of the most characteristic constructs of GF is **tables**, also +called **finite functions**. That these functions are finite means that +it is possible to finitely enumerate all argument-value pairs; this, in +turn, is possible because the argument types are finite. + +A **table type** has the form + +*P* `=>` *T* + +where *P* must be a parameter type in the sense defined +[here](#paramtypes), whereas *T* can be any type. + +Canonical expressions of table types are **tables**, of the form + +`table` `{` *V*~1~ `=>` *t*~1~ ; \... ; *V*~n~ `=>` *t*~n~ `}` + +where *V*~1~,\...,*V*~n~ is the complete list of the parameter values of +the argument type *P* (defined [here](#paramvalues)), and each *t*~i~ is +an expression of the value type *T*. + +In addition to explicit enumerations, tables can be given by **pattern +matching**, + +`table` `{`*p*~1~ `=>` *t*~1~ ; \... ; *p*~m~ `=>` *t*~m~`}` + +where *p*~1~,\....,*p*~m~ is a list of patterns that covers all values +of type *P*. Each pattern *p*~i~ may bind some variables, on which the +expression *t*~i~ may depend. A complete account of patterns and pattern +matching is given [here](#patternmatching). + +A **course-of-values table** omits the patterns and just lists all +values. It uses the enumeration of all values of the argument type *P* +to pair the values with arguments: + +`table` *P* `[`*t*~1~ ; \... ; *t*~n~`]` + +This format is not recommended for GF source code, since the ordering of +parameter values is not specified and therefore a compiler-internal +decision. + +The argument type can be indicated in ordinary tables as well, which is +sometimes helpful for type inference: + +`table` *P* `{` \... `}` + +The **selection** operator `!`, applied to a table *t* and to an +expression *v* of its argument type + +*t* `!` *v* + +returns the first pattern matching result from *t* with *v*, as defined +[here](#patternmatching). The order of patterns is thus significant as +long as the patterns contain variables or wildcards. When the compiler +reorders the patterns following the enumeration of all values of the +argument type, this order no longer matters, because no overlap remains +between patterns. + +The GF compiler performs **table expansion**, i.e. an analogue of eta +expansion defined [here](#conversions), where a table is applied to all +values to its argument type: + +*t* : *P* `=>` *T* ==\> `table` *P* `[`*t* `!` *V*~1~ ; \... ; *t* `!` +*V*~n~`]` + +As syntactic sugar, one-branch tables can be written in a way similar to +lambda abstractions: + +`\\`*p* `=>` *t* === `table {`*p* `=>` *t* `}` + +where *p* is either a variable or a wildcard (`_`). Multiple bindings +can be abbreviated: + +`\\`*p,q* `=>` *t* === `\\`*p* `=>` `\\`*q* `=>` *t* + +**Case expressions** are syntactic sugar for selections: + +`case` *e* `of {`\...`}` === `table {`\...`} !` *e* + + +### Pattern matching + +[]{#patternmatching} + +We will list all forms of patterns that can be used in table branches. +We define their **variable bindings** and **matching substitutions**. + +We start with the patterns available for all parameter types, as well as +for the types `Integer` and `Str`. + +- A constructor pattern *C* *p*~1~\...*p*~n~ binds the union of all + variables bound in the subpatterns *p*~1~,\...,*p*~n~. It matches + any value *C* *V*~1~\...*V*~n~ where each *p*~i~\# matches *V*~i~, + and the matching substitution is the union of these substitutions. +- A record pattern `{` *r*~1~ `=` *p*~1~ `;` \... `;` *r*~n~ `=` + *p*~n~ `}` binds the union of all variables bound in the subpatterns + *p*~1~,\...,*p*~n~. It matches any value `{` *r*~1~ `=` *V*~1~ `;` + \... `;` *r*~n~ `=` *V*~n~ `;` \...`}` where each *p*~i~\# matches + *V*~i~, and the matching substitution is the union of these + substitutions. +- A variable pattern *x* (identifier other than parameter constructor) + binds the variable *x*. It matches any value *V*, with the + substitution {*x* = *V*}. +- The wild card `_` binds no variables. It matches any value, with the + empty substitution. +- A disjunctive pattern *p* `|` *q* binds the intersection of the + variables bound by *p* and *q*. It matches anything that either *p* + or *q* matches, with the first substitution starting with *p* + matches, from which those variables that are not bound by both + patterns are removed. +- A negative pattern `-` *p* binds no variables. It matches anything + that *p* does *not* match, with the empty substitution. +- An alias pattern *x* `@` *p* binds *x* and all the variables bound + by *p*. It matches any value *V* that *p* matches, with the same + substition extended by {*x* = *V*}. + +The following patterns are only available for the type `Str`: + +- A string literal pattern, e.g. `"s"`, binds no variables. It matches + the same string, with the empty substitution. +- A concatenation pattern, *p* `+` *q*, binds the union of variables + bound by *p* and *q*. It matches any string that consists of a + prefix matching *p* and a suffix matching *q*, with the union of + substitutions corresponding to the first match (see below). +- A repetition pattern *p*`*` binds no variables. It matches any + string that can be decomposed into strings that match *p*, with the + empty substitution. + +The following pattern is only available for the types `Integer` and +`Ints` *n*: + +- An integer literal pattern, e.g. `214`, binds no variables. It + matches the same integer, with the empty substitution. + +All patterns must be **linear**: the same pattern variable may occur +only once in them. This is what makes it straightforward to speak about +unions of binding sets and substitutions. + +Pattern matching is performed in the order in which the branches appear +in the source code: the branch of the first matching pattern is +followed. In concrete syntax, the type checker reject sets of patterns +that are not exhaustive, and warns for completely overshadowed patterns. +It also checks the type correctness of patterns with respect to the +argument type. In abstract syntax, only type correctness is checked, no +exhaustiveness or overshadowing. + +It follows from the definition of record pattern matching that it can +utilize partial records: the branch + + {g = Fem} => t + +in a table of type `{g : Gender ; n : Number} => T` means the same as + + {g = Fem ; n = _} => t + +Variables in regular expression patterns are always bound to the **first +match**, which is the first in the sequence of binding lists. For +example: + +- `x + "e" + y` matches `"peter"` with `x = "p", y = "ter"` +- `x + "er"*` matches `"burgerer"` with `x = "burg"` + + +### Free variation + +An expressions of the form + +`variants` `{`*t*~1~ ; \... ; *t*~n~`}` + +where all *t*~i~ are of the same type *T*, has itseld type *T*. This +expression presents *t*~i~,\...,*t*~n~ as being in **free variation**: +the choice between them is not determined by semantics or parameters. A +limiting case is + +`variants {}` + +which encodes a rule saying that there is no way to express a certain +thing, e.g. that a certain inflectional form does not exist. + +A common wisdom in linguistics is that \"there is no free variation\", +which refers to the situation where *all* aspects are taken into +account. For instance, the English negation contraction could be +expressed as free variation, + + variants {"don't" ; "do" ++ "not"} + +if only semantics is taken into account, but if stylistic aspects are +included, then the proper formulation might be with a parameter +distinguishing between informal and formal style: + + case style of {Informal => "don't" ; Formal => "do" ++ "not"} + +Since there is not way to choose a particular element from a +\`\`variants\` list, free variants is normally not adequate in +libraries, nor in grammars meant for natural language generation. In +application grammars meant to parse user input, free variation is a way +to avoid cluttering the abstract syntax with semantically insignificant +distinctions and even to tolerate some grammatical errors. + +Permitting `variants` in all types involves a major modification of the +semantics of GF expressions. All computation rules have to be lifted to +deal with lists of expressions and values. For instance, + +*t* `!` `variants` `{`*t*~1~ ; \... ; *t*~n~`}` ==\> `variants` `{`*t* +`!` *t*~1~ ; \... ; *t* `!` *t*~n~`}` + +This is done in such a way that variation does not distribute to records +(or other product-like structures). For instance, variants of records, + + variants {{s = "Auto" ; g = Neutr} ; {s = "Wagen" ; g = Masc}} + +is *not* the same as a record of variants, + + {s = variants {"Auto" ; "Wagen"} ; g = variants {Neutr ; Masc}} + +Variants of variants are flattened, + +`variants` `{`\...; `variants` `{`*t*~1~ ;\...; *t*~n~`}` ;\...`}` ==\> +`variants` `{`\...; *t*~1~ ;\...; *t*~n~ ;\...`}` + +and singleton variants are eliminated, + +`variants` `{`*t*`}` ==\> *t* + + +### Local definitions + +A **local definition**, i.e. a **let expression** has the form + +`let` *x* : *T* = *t* `in` *e* + +The type of *x* must be *T*, which also has to be the type of *t*. +Computation is performed by substituting *t* for *x* in *e*: + +`let` *x* : *T* = *t* `in` *e* ==\> *e* {*x* = *t*} + +As syntactic sugar, the type can be omitted if the type checker is able +to infer it: + +`let` *x* = *t* `in` *e* + +It is possible to compress several local definitions into one block: + +`let` *x* : *T* = *t* `;` *y* : *U* = *u* `in` *e* === `let` *x* : *T* = +*t* `in` `let` *y* : *U* = *u* `in` *e* + +Another notational variant is a definition block appearing after the +main expression: + +*e* `where` `{`\...`}` === `let` `{`\...`}` `in` *e* + +Curly brackets are obligatory in the `where` form, and can also be +optionally used in the `let` form. + +Since a block of definitions is treated as syntactic sugar for a nested +`let` expression, a constant must be defined before it is used: the +scope is not mutual, as in a module body. Furthermore, unlike in `lin` +and `oper` definitions, it is *not* possible to bind variables on the +left of the equality sign. + + +### Function applications in concrete syntax + +[]{#functionelimination} + +Fully compiled concrete syntax may not include expressions of function +types except on the outermost level of `lin` rules, as defined +[here](#linexpansion). However, in the source code, and especially in +`oper` definitions, functions are the main vehicle of code reuse and +abstraction. Thus function types and functions follow the same rules as +in abstract syntax, as specified [here](#functiontype). In particular, +the application of a lambda abstract is computed by beta conversion. + +To ensure the elimination of functions, GF uses a special computation +rule for pushing function applications inside tables, since otherwise +run-time variables could block their applications: + +(`table` `{`*p*~1~ `=>` *f*~1~ ; \... ; *p*~n~ `=>` *f*~n~ `}` `!` *e*) +*a* ==\> `table` `{`*p*~1~ `=>` *f*~1~ *a* ; \... ; *p*~n~ `=>` *f*~n~ +*a*`}` `!` *e* + +Also parameter constructors with non-empty contexts, as defined +[here](#paramjudgements), result in expressions in application form. +These expressions are never a problem if their arguments are just +constructors, because they can then be translated to integers +corresponding to the position of the expression in the enumaration of +the values of its type. However, a constructor applied to a run-time +variable may need to be converted as follows: + +*C*\...*x*\... ==\> `case` *x* of `{_ =>` *C*\...*x*`}` + +The resulting expression, when processed by table expansion as explained +[here](#tables), results in *C* being applied to just values of the type +of *x*, and the application thereby disappears. + + +### Reusing top-level grammars as resources + +[]{#reuse} + +*This section is valid for GF 3.0, which abandons the \"lock field\"* +*discipline of GF 2.8.* + +As explained [here](#openabstract), abstract syntax modules can be +opened as interfaces and concrete syntaxes as their instances. This +means that judgements are, as it were, translated in the following way: + +- `cat` *C* *G* ===\> `oper` *C* : `Type` +- `fun` *f* : *T* ===\> `oper` *f* : *T* +- `lincat` *C* = *T* ===\> `oper` *C* : `Type` = *C* +- `lin` *f* = *t* ===\> `oper` *f* = *t* + +Notice that the value *T* of `lincat` definitions is not disclosed in +the translation. This means that the type *C* remains abstract: the only +ways of building an object of type *C* are the operations *f* obtained +from *fun* and *lin* rules. + +The purpose of keeping linearization types abstract is to enforce +**grammar checking via type checking**. This means that any well-typed +operation application is also well-typed in the sense of the original +grammar. If the types were disclosed, then we could for instance easily +confuse all categories that have the linearization type `{s : Str}`. Yet +another reason is that revealing the types makes it impossible for the +library programmers to change their type definitions afterwards. + +Library writers may occasionally want to have access to the values of +linearization types. The way to make it possible is to add an extra +construction operation to a module in which the linearization type is +available: + + oper MkC : T -> C = \x -> x + +In object-oriented terms, the type *C* itself is **protected**, whereas +*MkC* is a **public constructor** of *C*. Of course, it is possible to +make these constructors overloaded (concept explained +[here](#overloading)), to enable easy access to special cases. + + +### Predefined concrete syntax types + +[]{#predefcnc} + +The following concrete syntax types are predefined: + +- `Str`, the type of tokens and token lists (defined [here](#strtype)) +- `Integer`, the type of nonnegative integers +- `Ints` *n*, the type of integers from *0* to *n* +- `Type`, the type of (concrete syntax) types +- `PType`, the type of parameter types + +The last two types are, in a way, extended by user-written grammars, +since new parameter types can be defined in the way shown +[here](#paramjudgements), and every paramater type is also a type. From +the point of view of the values of expressions, however, a `param` +declaration does not extend `PType`, since all parameter types get +compiled to initial segments of integers. + +Notice the difference between the concrete syntax types `Str` and +`Integer` on the one hand, and the abstract syntax categories `String` +and `Int`, on the other. As *concrete syntax* types, the latter are +treated in the same way as any reused categories: their objects can be +formed by using syntax trees (string and integer literals). + +*The type name* `Integer` *replaces in GF 3.0 the name* `Int`, *to avoid +confusion with the abstract syntax type and to be analogous* *with the* +`Str` *vs.* `String` *distinction.* + + +### Predefined concrete syntax operations + +The following predefined operations are defined in the resource module +`prelude/Predef.gf`. Their implementations are defined as a part of the +GF grammar compiler. + + ------------------------------------------------------------------------------------------------- + operation type explanation + -------------- --------------------------------- ------------------------------------------------ + `PBool` `PType` `PTrue | PFalse` + + `Error` `Type` the empty type + + `Int` `Type` the type of integers + + `Ints` `Integer -> Type` the type of integers from 0 to n + + `error` `Str -> Error` forms error message + + `length` `Str -> Int` length of string + + `drop` `Integer -> Str -> Str` drop prefix of length + + `take` `Integer -> Str -> Str` take prefix of length + + `tk` `Integer -> Str -> Str` drop suffix of length + + `dp` `Integer -> Str -> Str` take suffix of length + + `eqInt` `Integer -> Integer -> PBool` test if equal integers + + `lessInt` `Integer -> Integer -> PBool` test order of integers + + `plus` `Integer -> Integer -> Integer` add integers + + `eqStr` `Str -> Str -> PBool` test if equal strings + + `occur` `Str -> Str -> PBool` test if occurs as substring + + `occurs` `Str -> Str -> PBool` test if any char occurs + + `show` `(P : Type) -> P -> Str` convert param to string + + `read` `(P : Type) -> Str -> P` convert string to param + + `toStr` `(L : Type) -> L -> Str` find the \"first\" string + + `nonExist` `Str` a special token marking\ + non-existing morphological forms + + `BIND` `Str` a special token marking\ + that the surrounding tokens should not\ + be separated by space + + `SOFT_BIND` `Str` a special token marking\ + that the surrounding tokens may not\ + be separated by space + + `SOFT_SPACE` `Str` a special token marking\ + that the space between the surrounding tokens\ + is optional + + `CAPIT` `Str` a special token marking\ + that the first character in the next token\ + should be capitalized + + `ALL_CAPIT` `Str` a special token marking\ + that the next word should be\ + in all capital letters + ------------------------------------------------------------------------------------------------- + +Compilation eliminates these operations, and they may therefore not take +arguments that depend on run-time variables. + +The module `Predef` is included in the *opens* list of all modules, and +therefore does not need to be opened explicitly. + + +Flags and pragmas +----------------- + + +### Some flags and their values + +[]{#flagvalues} + +The flag `coding` in concrete syntax sets the **character encoding** +used in the grammar. Internally, GF uses unicode, and `.pgf` files are +always written in UTF8 encoding. The presence of the flag `coding=utf8` +prevents GF from encoding an already encoded file. + +The flag `startcat` in abstract syntax sets the default start category +for parsing, random generation, and any other grammar operation that +depends on category. Its legal values are the categories defined or +inherited in the abstract syntax. + + +### Compiler pragmas + +**Compiler pragmas** are a special form of comments prefixed with `--#`. +Currently GF interprets the following pragmas. + + pragma explanation + -------------- --------------------------------- + `-path=`PATH path list for searching modules + +For instance, the line + + --# -path=.:present:prelude:/home/aarne/GF/tmp + +in the top of `FILE.gf` causes the GF compiler, when invoked on +`FILE.gf`, to search through the current directory (`.`) and the +directories `present`, `prelude`, and `/home/aarne/GF/tmp`, in this +order. If a directory `DIR` is not found relative to the working +directory, `$(GF_LIB_PATH)/DIR` is searched. `$GF_LIB_PATH` can be a +colon-separated list of directories, in which case each directory in the +list contributes to the search path expansion. + + +Alternative grammar input formats +--------------------------------- + +While the GF language as specified in this document is the most +versatile and powerful way of writing GF grammars, there are several +other formats that a GF compiler may make available for users, either to +get started with small grammars or to semiautomatically convert grammars +from other formats to GF. Here are the ones supported by GF 2.8 and 3.0. + + +### Old GF without modules + +[]{#oldgf} + +Before GF compiler version 2.0, there was no module system, and all +kinds of judgement could be written in all files, without any headers. +This format is still available, and the compiler (version 2.8) detects +automatically if a file is in the current or the old format. However, +the old format is not recommended because of pure modularity and missing +separate compilation, and also because libraries are not available, +since the old and the new format cannot be mixed. With version 2.8, +grammars in the old format can be converted to modular grammar with the +command + + > import -o FILE.gf + +which rewrites the grammar divided into three files: an abstract, a +concrete, and a resource module. + + +### Context-free grammars + +A quick way to write a GF grammar is to use the context-free format, +also known as BNF. Files of this form are recognized by the suffix +`.cf`. Rules in these files have the form + +*Label* `.` *Cat* `::=` (*String* \| *Cat*)\* `;` + +where *Label* and *Cat* are identifiers and *String* quoted strings. + +There is a shortcut form generating labels automatically, + +*Cat* `::=` (*String* \| *Cat*)\* `;` + +In the shortcut form, vertical bars (`|`) can be used to give several +right-hand-sides at a time. An empty right-hand side means the singleton +of an empty sequence, and not an empty union. + +Just like old-style GF files (previous section), contex-free grammar +files can be converted to modular GF by using the `-o` option to the +compiler in GF 2.8. + + +### Extended BNF grammars + +Extended BNF (`FILE.ebnf`) goes one step further from the shortcut +notation of previous section. The rules have the form + +*Cat* `::=` *RHS* `;` + +where an *RHS* can be any regular expression built from quoted strings +and category symbols, in the following ways: + + RHS item explanation + ----------------- ------------------------ + *Cat* nonterminal + *String* terminal + *RHS* *RHS* sequence + *RHS* `|` *RHS* alternatives + *RHS* `?` optional + *RHS* `*` repetition + *RHS* `+` non-empty repetition\| + +Parentheses are used to override standard precedences, where `|` binds +weaker than sequencing, which binds weaker than the unary operations. + +The compiler generates not only labels, but also new categories +corresponding to the regular expression combinations actually in use. + +Just like `.cf` files (previous section), `.ebnf` files can be converted +to modular GF by using the `-o` option to the compiler in GF 2.8. + + +### Example-based grammars + +**Example-based grammars** (`.gfe`) provide a way to use resource +grammar libraries without having to know the names of functions in them. +The compiler works as a preprocessor, saving the result in a (`.gf`) +file, which can be compiled as usual. + +If a library is implemented as an abstract and concrete syntax, it can +be used for parsing. Calls of library functions can therefore be formed +by parsing strings in the library. GF has an expression format for this, + +`in` *C* *String* + +where *C* is the category in which to parse (it can be qualified by the +module name) and the string is the input to parser. Expressions of this +form are replaced by the syntax trees that result. These trees are +always type-correct. If several parses are found, all but the first one +are given in comments. + +Here is an example, from `GF/examples/animal/`: + + --# -resource=../../lib/present/LangEng.gfc + --# -path=.:present:prelude + + incomplete concrete QuestionsI of Questions = open Lang in { + lincat + Phrase = Phr ; + Entity = N ; + Action = V2 ; + lin + Who love_V2 man_N = in Phr "who loves men" ; + Whom man_N love_V2 = in Phr "whom does the man love" ; + Answer woman_N love_V2 man_N = in Phr "the woman loves men" ; + } + +The `resource` pragma shows the grammar that is used for parsing the +examples. + +Notice that the variables `love_V2`, `man_N`, etc, are actually +constants in the library. In the resulting rules, such as + + lin Whom = \man_N -> \love_V2 -> + PhrUtt NoPConj (UttQS (UseQCl TPres ASimul PPos + (QuestSlash whoPl_IP (SlashV2 (DetCN (DetSg (SgQuant + DefArt)NoOrd)(UseN man_N)) love_V2)))) NoVoc ; + +those constants are nonetheless treated as variables, following the +normal binding conventions, as stated [here](#renaming). + + +The grammar of GF +----------------- + +The following grammar is actually used in the parser of GF, although we +have omitted some obsolete rules still included in the parser for +backward compatibility reasons. + +This document was automatically generated by the *BNF-Converter*. It was +generated together with the lexer, the parser, and the abstract syntax +module, which guarantees that the document matches with the +implementation of the language (provided no hand-hacking has taken +place). + + +The lexical structure of GF +--------------------------- + + +### Identifiers + +Identifiers *Ident* are unquoted strings beginning with a letter, +followed by any combination of letters, digits, and the characters `_ '` +reserved words excluded. + + +### Literals + +Integer literals *Integer* are nonempty sequences of digits. + +String literals *String* have the form `"`*x*`"`}, where *x* is any +sequence of any characters except `"` unless preceded by `\`. + +Double-precision float literals *Double* have the structure indicated by +the regular expression `digit+ '.' digit+ ('e' ('-')? digit+)?` i.e. +two sequences of digits separated by a decimal point, optionally +followed by an unsigned or negative exponent. + + +### Reserved words and symbols + +The set of reserved words is the set of terminals appearing in the +grammar. Those reserved words that consist of non-letter characters are +called symbols, and they are treated in a different way from those that +are similar to identifiers. The lexer follows rules familiar from +languages like Haskell, C, and Java, including longest match and spacing +conventions. + +The reserved words used in GF are the following: + +- `PType` +- `Str` +- `Strs` +- `Type` +- `abstract` +- `case` +- `cat` +- `concrete` +- `data` +- `def` +- `flags` +- `fun` +- `in` +- `incomplete` +- `instance` +- `interface` +- `let` +- `lin` +- `lincat` +- `lindef` +- `linref` +- `of` +- `open` +- `oper` +- `param` +- `pre` +- `printname` +- `resource` +- `strs` +- `table` +- `transfer` +- `variants` +- `where` +- `with` + +The symbols used in GF are the following: + +- `;` +- `=` +- `:` +- `->` +- `{` +- `}` +- `**` +- `,` +- `(` +- `)` +- `[` +- `]` +- `-` +- `.` +- `|` +- `?` +- `<` +- `>` +- `@` +- `!` +- `*` +- `+` +- `++` +- `\` +- `=>` +- `_` +- `$` +- `/` + +### Comments + +Single-line comments begin with `--`. +Multiple-line comments are enclosed with `{-` and `-}`. + + +The syntactic structure of GF +----------------------------- + +Terminal appear as `code`. +The symbols **->** (production), **|** (union) and **eps** (empty rule) belong to the BNF notation. +All other symbols are non-terminals. + + --------------------- -------- ------------------------------------------------------------------------------------ + *Grammar* **->** *\[ModDef\]* + *\[ModDef\]* **->** **eps** + **|** *ModDef* *\[ModDef\]* + *ModDef* **->** *ModDef* `;` + **|** *ComplMod* *ModType* `=` *ModBody* + *ModType* **->** `abstract` *Ident* + **|** `resource` *Ident* + **|** `interface` *Ident* + **|** `concrete` *Ident* `of` *Ident* + **|** `instance` *Ident* `of` *Ident* + **|** `transfer` *Ident* `:` *Open* `->` *Open* + *ModBody* **->** *Extend* *Opens* `{` *\[TopDef\]* `}` + **|** *\[Included\]* + **|** *Included* `with` *\[Open\]* + **|** *Included* `with` *\[Open\]* `**` *Opens* `{` *\[TopDef\]* `}` + **|** *\[Included\]* `**` *Included* `with` *\[Open\]* + **|** *\[Included\]* `**` *Included* `with` *\[Open\]* `**` *Opens* `{` *\[TopDef\]* `}` + *\[TopDef\]* **->** **eps** + **|** *TopDef* *\[TopDef\]* + *Extend* **->** *\[Included\]* `**` + **|** **eps** + *\[Open\]* **->** **eps** + **|** *Open* + **|** *Open* `,` *\[Open\]* + *Opens* **->** **eps** + **|** `open` *\[Open\]* `in` + *Open* **->** *Ident* + **|** `(` *QualOpen* *Ident* `)` + **|** `(` *QualOpen* *Ident* `=` *Ident* `)` + *ComplMod* **->** **eps** + **|** `incomplete` + *QualOpen* **->** **eps** + *\[Included\]* **->** **eps** + **|** *Included* + **|** *Included* `,` *\[Included\]* + *Included* **->** *Ident* + **|** *Ident* `[` *\[Ident\]* `]` + **|** *Ident* `-` `[` *\[Ident\]* `]` + *Def* **->** *\[Name\]* `:` *Exp* + **|** *\[Name\]* `=` *Exp* + **|** *Name* *\[Patt\]* `=` *Exp* + **|** *\[Name\]* `:` *Exp* `=` *Exp* + *TopDef* **->** `cat` *\[CatDef\]* + **|** `fun` *\[FunDef\]* + **|** `data` *\[FunDef\]* + **|** `def` *\[Def\]* + **|** `data` *\[DataDef\]* + **|** `param` *\[ParDef\]* + **|** `oper` *\[Def\]* + **|** `lincat` *\[PrintDef\]* + **|** `lindef` *\[Def\]* + **|** `linref` *\[Def\]* + **|** `lin` *\[Def\]* + **|** `printname` `cat` *\[PrintDef\]* + **|** `printname` `fun` *\[PrintDef\]* + **|** `flags` *\[FlagDef\]* + *CatDef* **->** *Ident* *\[DDecl\]* + **|** `[` *Ident* *\[DDecl\]* `]` + **|** `[` *Ident* *\[DDecl\]* `]` `{` *Integer* `}` + *FunDef* **->** *\[Ident\]* `:` *Exp* + *DataDef* **->** *Ident* `=` *\[DataConstr\]* + *DataConstr* **->** *Ident* + **|** *Ident* `.` *Ident* + *\[DataConstr\]* **->** **eps** + **|** *DataConstr* + **|** *DataConstr* `|` *\[DataConstr\]* + *ParDef* **->** *Ident* `=` *\[ParConstr\]* + **|** *Ident* `=` `(` `in` *Ident* `)` + **|** *Ident* + *ParConstr* **->** *Ident* *\[DDecl\]* + *PrintDef* **->** *\[Name\]* `=` *Exp* + *FlagDef* **->** *Ident* `=` *Ident* + *\[Def\]* **->** *Def* `;` + **|** *Def* `;` *\[Def\]* + *\[CatDef\]* **->** *CatDef* `;` + **|** *CatDef* `;` *\[CatDef\]* + *\[FunDef\]* **->** *FunDef* `;` + **|** *FunDef* `;` *\[FunDef\]* + *\[DataDef\]* **->** *DataDef* `;` + **|** *DataDef* `;` *\[DataDef\]* + *\[ParDef\]* **->** *ParDef* `;` + **|** *ParDef* `;` *\[ParDef\]* + *\[PrintDef\]* **->** *PrintDef* `;` + **|** *PrintDef* `;` *\[PrintDef\]* + *\[FlagDef\]* **->** *FlagDef* `;` + **|** *FlagDef* `;` *\[FlagDef\]* + *\[ParConstr\]* **->** **eps** + **|** *ParConstr* + **|** *ParConstr* `|` *\[ParConstr\]* + *\[Ident\]* **->** *Ident* + **|** *Ident* `,` *\[Ident\]* + *Name* **->** *Ident* + **|** `[` *Ident* `]` + *\[Name\]* **->** *Name* + **|** *Name* `,` *\[Name\]* + *LocDef* **->** *\[Ident\]* `:` *Exp* + **|** *\[Ident\]* `=` *Exp* + **|** *\[Ident\]* `:` *Exp* `=` *Exp* + *\[LocDef\]* **->** **eps** + **|** *LocDef* + **|** *LocDef* `;` *\[LocDef\]* + *Exp6* **->** *Ident* + **|** *Sort* + **|** *String* + **|** *Integer* + **|** *Double* + **|** `?` + **|** `[` `]` + **|** `data` + **|** `[` *Ident* *Exps* `]` + **|** `[` *String* `]` + **|** `{` *\[LocDef\]* `}` + **|** `<` *\[TupleComp\]* `>` + **|** `<` *Exp* `:` *Exp* `>` + **|** `(` *Exp* `)` + *Exp5* **->** *Exp5* `.` *Label* + **|** *Exp6* + *Exp4* **->** *Exp4* *Exp5* + **|** `table` `{` *\[Case\]* `}` + **|** `table` *Exp6* `{` *\[Case\]* `}` + **|** `table` *Exp6* `[` *\[Exp\]* `]` + **|** `case` *Exp* `of` `{` *\[Case\]* `}` + **|** `variants` `{` *\[Exp\]* `}` + **|** `pre` `{` *Exp* `;` *\[Altern\]* `}` + **|** `strs` `{` *\[Exp\]* `}` + **|** *Ident* `@` *Exp6* + **|** *Exp5* + *Exp3* **->** *Exp3* `!` *Exp4* + **|** *Exp3* `*` *Exp4* + **|** *Exp3* `**` *Exp4* + **|** *Exp4* + *Exp1* **->** *Exp2* `+` *Exp1* + **|** *Exp2* + *Exp* **->** *Exp1* `++` *Exp* + **|** `\` *\[Bind\]* `->` *Exp* + **|** `\` `\` *\[Bind\]* `=>` *Exp* + **|** *Decl* `->` *Exp* + **|** *Exp3* `=>` *Exp* + **|** `let` `{` *\[LocDef\]* `}` `in` *Exp* + **|** `let` *\[LocDef\]* `in` *Exp* + **|** *Exp3* `where` `{` *\[LocDef\]* `}` + **|** `in` *Exp5* *String* + **|** *Exp1* + *Exp2* **->** *Exp3* + *\[Exp\]* **->** **eps** + **|** *Exp* + **|** *Exp* `;` *\[Exp\]* + *Exps* **->** **eps** + **|** *Exp6* *Exps* + *Patt2* **->** `_` + **|** *Ident* + **|** *Ident* `.` *Ident* + **|** *Integer* + **|** *Double* + **|** *String* + **|** `{` *\[PattAss\]* `}` + **|** `<` *\[PattTupleComp\]* `>` + **|** `(` *Patt* `)` + *Patt1* **->** *Ident* *\[Patt\]* + **|** *Ident* `.` *Ident* *\[Patt\]* + **|** *Patt2* `*` + **|** *Ident* `@` *Patt2* + **|** `-` *Patt2* + **|** *Patt2* + *Patt* **->** *Patt* `|` *Patt1* + **|** *Patt* `+` *Patt1* + **|** *Patt1* + *PattAss* **->** *\[Ident\]* `=` *Patt* + *Label* **->** *Ident* + **|** `$` *Integer* + *Sort* **->** `Type` + **|** `PType` + **|** `Str` + **|** `Strs` + *\[PattAss\]* **->** **eps** + **|** *PattAss* + **|** *PattAss* `;` *\[PattAss\]* + *\[Patt\]* **->** *Patt2* + **|** *Patt2* *\[Patt\]* + *Bind* **->** *Ident* + **|** `_` + *\[Bind\]* **->** **eps** + **|** *Bind* + **|** *Bind* `,` *\[Bind\]* + *Decl* **->** `(` *\[Bind\]* `:` *Exp* `)` + **|** *Exp4* + *TupleComp* **->** *Exp* + *PattTupleComp* **->** *Patt* + *\[TupleComp\]* **->** **eps** + **|** *TupleComp* + **|** *TupleComp* `,` *\[TupleComp\]* + *\[PattTupleComp\]* **->** **eps** + **|** *PattTupleComp* + **|** *PattTupleComp* `,` *\[PattTupleComp\]* + *Case* **->** *Patt* `=>` *Exp* + *\[Case\]* **->** *Case* + **|** *Case* `;` *\[Case\]* + *Altern* **->** *Exp* `/` *Exp* + *\[Altern\]* **->** **eps** + **|** *Altern* + **|** *Altern* `;` *\[Altern\]* + *DDecl* **->** `(` *\[Bind\]* `:` *Exp* `)` + **|** *Exp6* + *\[DDecl\]* **->** **eps** + **|** *DDecl* *\[DDecl\]* + --------------------- -------- ------------------------------------------------------------------------------------ -- cgit v1.2.3