Diffstat (limited to 'doc/intro-resource.txt')
-rw-r--r-- doc/intro-resource.txt 511
1 files changed, 0 insertions, 511 deletions
diff --git a/doc/intro-resource.txt b/doc/intro-resource.txt
deleted file mode 100644
index c4c292fca..000000000
--- a/doc/intro-resource.txt
+++ /dev/null
@@ -1,511 +0,0 @@
-
-
-==Coverage==
-
-The GF Resource Grammar Library contains grammar rules for
-10 languages (in addition, 2 languages are available as incomplete
-implementations, and a few more are under construction). Its purpose
-is to make these rules available for application programmers,
-who can thereby concentrate on the semantic and stylistic
-aspects of their grammars, without having to think about
-grammaticality. The targeted level of application grammarians
-is that of a skilled programmer with
-a practical knowledge of the target languages, but without
-theoretical knowledge about their grammars.
-Such a combination of
-skills is typical of programmers who, for instance, want to localize
-software to new languages.
-
-The current resource languages are
-- ``Ara``bic (incomplete)
-- ``Cat``alan (incomplete)
-- ``Dan``ish
-- ``Eng``lish
-- ``Fin``nish
-- ``Fre``nch
-- ``Ger``man
-- ``Ita``lian
-- ``Nor``wegian
-- ``Rus``sian
-- ``Spa``nish
-- ``Swe``dish
-
-
-The first three letters (``Eng`` etc.) are used in grammar module names.
-The incomplete Arabic and Catalan implementations are
-sufficient for many applications; they both contain, among other
-things, complete inflectional morphology.
-
-
-
-==A first example==
-
-To give an example application, consider a system for steering
-music playing devices by voice commands. In the application,
-we may have a semantic category ``Kind``, examples
-of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
-is linearized into the noun "Lied", but knowing this is not
-enough to make the application work, because the noun must be
-produced in both singular and plural, and in four different
-cases. By using the resource grammar library, it is enough to
-write
-```
- lin Song = mkN "Lied" "Lieder" neuter
-```
-and the eight forms are correctly generated. The resource grammar
-library contains a complete set of inflectional paradigms (such as
-``mkN`` here), enabling the definition of any lexical items.
-
-The resource grammar library is not only about inflectional paradigms - it
-also has syntax rules. The music player application
-might also want to modify songs with properties, such as "American",
-"old", "good". The German grammar for adjectival modifications is
-particularly complex, because adjectives have to agree in gender,
-number, and case, and also depend on what determiner is used
-("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
-variation is taken care of by the resource grammar function
-```
- mkCN : AP -> CN -> CN
-```
-(see the table at the end of this document for the list of all resource grammar
-functions). The resource grammar implementation of the rule adding properties
-to kinds is
-```
- lin PropKind kind prop = mkCN prop kind
-```
-given that
-```
- lincat Prop = AP
- lincat Kind = CN
-```
-The resource library API is divided into language-specific
-and language-independent parts. To put it roughly,
-- the lexicon API is language-specific
-- the syntax API is language-independent
-
-
-Thus, to render the above example in French instead of German, we need to
-pick a different linearization of ``Song``,
-```
- lin Song = mkN "chanson" feminine
-```
-But to linearize ``PropKind``, we can use the very same rule as in German.
-The resource function ``mkCN`` has different implementations in the two
-languages (e.g. a different word order in French),
-but the application programmer need not care about the difference.
-
-
-
-==Note on APIs==
-
-From version 1.1 onwards, the resource library is available via two
-APIs:
-- original ``fun`` and ``oper`` definitions
-- overloaded ``oper`` definitions
-
-
-Introducing overloading in GF version 2.7 has been a success in improving
-the accessibility of libraries. It has also created a layer of abstraction
-between the writers and users of libraries, and thereby makes the library
-easier to modify. We shall therefore use the overloaded API
-in this document. The original function names are mainly interesting
-for those who want to write or modify libraries.
-
-
-
-==A complete example==
-
-To summarize the example, and also give a template for a programmer to work on,
-here is the complete implementation of a small system with songs and properties.
-The abstract syntax defines a "domain ontology":
-```
- abstract Music = {
-
- cat
- Kind ;
- Property ;
- fun
- PropKind : Kind -> Property -> Kind ;
- Song : Kind ;
- American : Property ;
- }
-```
-The concrete syntax is defined by a functor (parametrized module),
-independently of language, by opening
-two interfaces: the resource ``Syntax`` and an application lexicon.
-```
- incomplete concrete MusicI of Music =
- open Syntax, MusicLex in {
- lincat
- Kind = CN ;
- Property = AP ;
- lin
- PropKind k p = mkCN p k ;
- Song = mkCN song_N ;
- American = mkAP american_A ;
- }
-```
-The application lexicon ``MusicLex`` is an interface
-extending the resource category system ``Cat``.
-```
- interface MusicLex = Cat ** {
- oper
- song_N : N ;
- american_A : A ;
- }
-```
-It could also be an abstract syntax that extends ``Cat``, but
-this would limit the kinds of constructions that are possible in
-the interface.
-
-Each language has its own instance of the lexicon interface, which opens the
-inflectional paradigms module for that language:
-```
- instance MusicLexGer of MusicLex =
- CatGer ** open ParadigmsGer in {
- oper
- song_N = mkN "Lied" "Lieder" neuter ;
- american_A = mkA "amerikanisch" ;
- }
-
- instance MusicLexFre of MusicLex =
- CatFre ** open ParadigmsFre in {
- oper
- song_N = mkN "chanson" feminine ;
- american_A = mkA "américain" ;
- }
-```
-The top-level ``Music`` grammars are obtained by
-instantiating the two interfaces of ``MusicI``:
-```
- concrete MusicGer of Music = MusicI with
- (Syntax = SyntaxGer),
- (MusicLex = MusicLexGer) ;
-
- concrete MusicFre of Music = MusicI with
- (Syntax = SyntaxFre),
- (MusicLex = MusicLexFre) ;
-```
-Both of these files can use the same ``path``, defined as
-```
- --# -path=.:present:prelude
-```
-The ``present`` directory contains the compiled resources, restricted to
-present tense; ``alltenses`` has the full resources.
-
-To localize the music player system to a new language,
-all that is needed is two modules,
-one implementing ``MusicLex`` and the other
-instantiating ``MusicI``. The latter is
-completely trivial, whereas the former one involves the choice of correct
-vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
-```
- instance MusicLexFin of MusicLex =
- CatFin ** open ParadigmsFin in {
- oper
- song_N = mkN "kappale" ;
- american_A = mkA "amerikkalainen" ;
- }
-
- concrete MusicFin of Music = MusicI with
- (Syntax = SyntaxFin),
- (MusicLex = MusicLexFin) ;
-```
-More work is of course needed if the language-independent linearizations in
-``MusicI`` are not satisfactory for some language. The resource grammar guarantees
-that the linearizations are possible in all languages, in the sense of being grammatical,
-but they might still be inadequate for stylistic reasons. Assume,
-for the sake of argument, that adjectival modification does not sound good in
-English, and that a relative clause would be preferable. One can then use
-restricted inheritance of the functor:
-```
- concrete MusicEng of Music =
- MusicI - [PropKind]
- with
- (Syntax = SyntaxEng),
- (MusicLex = MusicLexEng) **
- open SyntaxEng in {
- lin
- PropKind k p = mkCN k (mkRS (mkRCl which_RP (mkVP p))) ;
- }
-```
-The lexicon is as expected:
-```
- instance MusicLexEng of MusicLex =
- CatEng ** open ParadigmsEng in {
- oper
- song_N = mkN "song" ;
- american_A = mkA "American" ;
- }
-```
-
-
-==Lock fields==
-
-//This section is only relevant as a guide to error messages that have to do with lock fields, and can be skipped otherwise.//
-
-FIXME: this section may become obsolete.
-
-When the categories of the resource grammar are used
-in applications, a **lock field** is added to their linearization types.
-The lock field for a category ``C`` is a record field
-```
- lock_C : {}
-```
-with the only possible value
-```
- lock_C = <>
-```
-The lock field carries no information, but its presence
-makes the linearization type of ``C``
-unique, so that categories
-with the same implementation are not confused with each other.
-(This is inspired by the ``newtype`` discipline in Haskell.)
-
-For example, the lincats of adverbs and conjunctions are the same
-in ``CatEng`` (and therefore in ``GrammarEng``, which inherits it):
-```
- lincat Adv = {s : Str} ;
- lincat Conj = {s : Str} ;
-```
-But when these category symbols are used to denote their linearization
-types in an application, these definitions are translated to
-```
- oper Adv : Type = {s : Str ; lock_Adv : {}} ;
- oper Conj : Type = {s : Str ; lock_Conj : {}} ;
-```
-In this way, the user of a resource grammar cannot confuse adverbs with
-conjunctions. In other words, the lock fields force the type checker
-to function as a grammaticality checker.
-
-When the resource grammar is ``open``ed in an application grammar,
-and only functions from the resource are used in a type-correct way, the
-lock fields are never seen (except possibly in type error messages).
-If an application grammarian has to write lock fields herself,
-it is a sign that the guarantees given by the resource grammar
-no longer hold. But since the resource may be incomplete, the
-application grammarian may occasionally have to provide the dummy
-values of lock fields (always ``<>``, the empty record).
-Here is an example:
-```
- mkUtt : Str -> Utt ;
- mkUtt s = {s = s ; lock_Utt = <>} ;
-```
-Currently, missing lock fields produce warnings rather than errors,
-but this behaviour of GF may change in the future.
-
-
-==Parsing with resource grammars?==
-
-The intended use of the resource grammar is as a library for writing
-application grammars. It is not designed for parsing e.g. newspaper text. There
-are several reasons why this is not practical:
-- Efficiency: the resource grammar uses complex data structures, in
-particular, discontinuous constituents, which make parsing slow and the
-parser size huge.
-- Completeness: the resource grammar does not necessarily cover all rules
-of the language - only enough to be able to express everything
-in one way or another.
-- Lexicon: the resource grammar has a very small lexicon, only meant for test
-purposes.
-- Semantics: the resource grammar has very little semantic control, and may
-accept strange input or deliver strange interpretations.
-- Ambiguity: parsing in the resource grammar may return many results,
-many of which are implausible.
-
-
-All of these problems should be solved in application grammars.
-The task of resource grammars is just to take care of low-level linguistic
-details such as inflection, agreement, and word order.
-
-It is for the same reasons that resource grammars are not adequate for translation.
-That the syntax API is implemented for different languages of course makes
-it possible to translate via it - but there is no guarantee of translation
-equivalence. Of course, the use of functor implementations such as ``MusicI``
-above only extends to those cases where the syntax API does give translation
-equivalence - but this must be seen as a limiting case, and bigger applications
-will often use only restricted inheritance of ``MusicI``.
-
-
-
-=To find rules in the resource grammar library=
-
-==Inflection paradigms==
-
-Inflection paradigms are defined separately for each language //L//
-in the module ``Paradigms``//L//. To test them, the command
-``cc`` (= ``compute_concrete``)
-can be used:
-```
- > i -retain german/ParadigmsGer.gf
-
- > cc mkN "Schlange"
- {
- s : Number => Case => Str = table Number {
- Sg => table Case {
- Nom => "Schlange" ;
- Acc => "Schlange" ;
- Dat => "Schlange" ;
- Gen => "Schlange"
- } ;
- Pl => table Case {
- Nom => "Schlangen" ;
- Acc => "Schlangen" ;
- Dat => "Schlangen" ;
- Gen => "Schlangen"
- }
- } ;
- g : Gender = Fem
- }
-```
-For the sake of convenience, every language implements these five paradigms:
-```
- oper
- mkN : Str -> N ; -- regular nouns
- mkA : Str -> A ; -- regular adjectives
- mkV : Str -> V ; -- regular verbs
- mkPN : Str -> PN ; -- regular proper names
- mkV2 : V -> V2 ; -- direct transitive verbs
-```
-It is often possible to initialize a lexicon by just using these functions,
-and later revise it by using the more involved paradigms. For instance, in
-German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
-masculine noun with the plural form ``"Liede"``.
-The individual ``Paradigms`` modules
-tell what cases are covered by the regular heuristics.
-
-As a limiting case, one could even initialize the lexicon for a new language
-by copying the English (or some other already existing) lexicon. This would
-produce a language with correct grammar but with content words directly borrowed from
-English - maybe not so strange in certain technical domains.
-
-
-
-==Syntax rules==
-
-Syntax rules should be looked for in the module ``Constructors``.
-Below this top-level module exposing overloaded constructors,
-there are around 10 abstract modules, each defining constructors for
-a group of one or more related categories. For instance, the module
-``Noun`` defines how to construct common nouns, noun phrases, and determiners.
-But these special modules are seldom or never needed by the users of the library.
-
-TODO: when are they needed?
-
-Browsing the libraries is helped by the gfdoc-generated HTML pages,
-whose LaTeX versions are included in the present document.
-
-
-==Special-purpose APIs==
-
-To give an analogy with the well-known typesetting software, GF can be compared
-with TeX and the resource grammar library with LaTeX.
-Just as TeX frees the author
-from thinking about low-level problems of page layout, so GF frees the grammarian
-from writing parsing and generation algorithms. But quite a lot of knowledge of
-//how// to write grammars is still needed, and the resource grammar library helps
-GF grammarians in a way similar to how the LaTeX macro package helps TeX authors.
-
-But even LaTeX is often too detailed and low-level, and users are encouraged to
-develop their own macro packages. The same applies to GF resource grammars:
-the application grammarian might not need all the choices that the resource
-provides, but would prefer less writing and higher-level programming.
-To this end, application grammarians may want to write their own views on the
-resource grammar.
-
-
-==Browsing by the parser==
-
-An alternative to browsing the library documentation is
-to use the parser.
-Even though parsing is not an intended end-user application
-of resource grammars, it is a useful technique for application grammarians
-to browse the library. To find out which resource function implements
-a particular structure, one can just parse a string that exemplifies this
-structure. For instance, to find out how sentences are built using
-transitive verbs, write
-```
- > i english/LangEng.gf
-
- > p -cat=Cl "she loves him"
- PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
-```
-The parser returns original constructors, not overloaded ones. Overloaded
-constructors can be returned, so far with experimental heuristics, by using
-the grammar ``api/toplevel/OverLangEng.gf`` and a special flag:
-```
- > i api/toplevel/OverLangEng.gf
-
- > p -cat=Cl -overload "she loves him"
- mkCl (mkNP she_Pron) love_V2 (mkNP he_Pron)
-```
-Parsing with the English resource grammar has an acceptable speed, but
-with most languages it takes too many resources even to build the
-parser. However, examples parsed in one language can always be linearized into
-other languages:
-```
- > i italian/LangIta.gf
-
- > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
- lo ama
-```
-Therefore, one can use the English parser to write an Italian grammar, and also
-to write a language-independent (incomplete) grammar. One can also parse strings
-that are bizarre in English but are the intended way of expression in another language.
-For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
-This can be built by parsing "I have beer" in ``OverLangEng`` and then writing
-```
- lin IamHungry =
- let beer_N = mkN "fame" feminine
- in
- mkCl (mkNP i_Pron) have_V2 (mkNP massQuant beer_N)
-```
-which uses ``ParadigmsIta.mkN``.
-
-
-
-==Example-based grammar writing==
-
-The technique of parsing with the resource grammar can be used in GF source files,
-endowed with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess
-the file by replacing all expressions of the form
-```
- in Module.Cat "example string"
-```
-by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``.
-For instance,
-```
- lin IamHungry =
- let beer_N = mkN "fame" feminine
- in
- (in LangEng.Cl "I have beer") ;
-```
-will result in the rule displayed in the previous section. The normal binding rules
-of functional programming (and GF) guarantee that local bindings of identifiers
-take precedence over constants of the same forms. Thus it is also possible to
-linearize functions taking arguments in this way:
-```
- lin
- PropKind car_N old_A = in LangEng.CN "old car" ;
-```
-However, the technique of example-based grammar writing has some limitations:
-- Ambiguity. If a string has several parses, the first one is returned, and
-it may not be the intended one. The other parses are shown in a comment, from
-which they can be picked manually.
-- Lexicality. The arguments of a function must be atomic identifiers, and are thus
-not available for categories that have no lexical items.
-For instance, the ``PropKind`` rule above gives the result
-```
- lin
- PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
-```
-However, it is possible to write a special lexicon that gives atomic rules for
-all those categories that can be used as arguments, for instance,
-```
- fun
- cat_CN : CN ;
- old_AP : AP ;
-```
-and then use this lexicon instead of the standard one included in ``Lang``.
-
-