From e4e64c13a69db6505df499a0c3445ada9b1b2d88 Mon Sep 17 00:00:00 2001
From: aarne
Date: Fri, 27 Jun 2008 11:27:00 +0000
Subject: more rm in doc

---
 doc/multimodal.html | 863 ----------------------------------------------------
 1 file changed, 863 deletions(-)
 delete mode 100644 doc/multimodal.html

diff --git a/doc/multimodal.html b/doc/multimodal.html
deleted file mode 100644
index 9f2b43902..000000000
--- a/doc/multimodal.html
+++ /dev/null
@@ -1,863 +0,0 @@

Demonstrative Expressions and Multimodal Grammars

- -Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: Mon Jan 9 20:29:45 2006 -
- -

-
-

- - -

-
-

- -

Abstract

-

-This document shows a method for writing grammars in which spoken utterances are accompanied by pointing gestures. One computer application of such grammars is multimodal dialogue systems, in which the pointing gestures are performed by mouse clicks and movements.

-

-After an introduction to the notions of -demonstratives and integrated multimodality, -we will show by a concrete example -how multimodal grammars can be written in GF -and how they can be used in dialogue systems. -The explanation is given in three stages: -

-
    -
  1. How to write a multimodal grammar by hand. -
  2. How to add multimodality to a unimodal grammar. -
  3. How to use a multimodal resource grammar. -
- - -

Multimodal grammars

-

-Demonstrative expressions are an old idea. Such -expressions get their meaning from the context. -

-
- This train is faster than that airplane. -
-

-
- I want to go from this place to this place. -
-

-

-In particular, as in these examples, the meaning -can be obtained from accompanying pointing gestures. -

-

-Thus the meaning-bearing unit is neither the words nor the -gestures alone, but their combination. Demonstratives -thus provide an example of integrated multimodality, -as opposed to parallel multimodality. In parallel -multimodality, speech and other modes of communication -are just alternative ways to convey the same information. -

- -

Representing demonstratives in semantics and grammar

-

-When formalizing the semantics of demonstratives, we can combine syntax with coordinates: -

-
- I want to go from this place to this place -
-

-

-is interpreted as something like -

-
-    want(I, go, this(place,(123,45)), this(place,(98,10))) 
-
-

-Now, the same semantic value can be given in many ways, by performing -the clicks at different points of time in relation to the speech: -

-
- I want to go from this place CLICK(123,45) to this place CLICK(98,10) -
-

-
- I want to go from this place to this place CLICK(123,45) CLICK(98,10) -
-

-
- CLICK(123,45) CLICK(98,10) I want to go from this place to this place -
-

-

-How do we build the value compositionally in parsing? -Traditional parsing is sequential: its input is a string of tokens. -It works for demonstratives only if the pointing is adjacent to -the spoken expression. In the actual input, the demonstrative word -can be separated from the accompanying click by other words. The two -can also be simultaneous. -

- -

Asynchronous syntax in GF

-

-What we need is a notion of asynchronous parsing, as opposed to -sequential parsing (where demonstrative words and clicks must be -adjacent). -

-

-We can implement asynchronous parsing in GF by exploiting the generality of linearization types. A linearization type is the type of the concrete syntax objects assigned to semantic values. What a GF grammar defines is a relation

-
-        abstract syntax trees  <--->  concrete syntax objects
-
-

-When modelling context-free grammar in GF, -the concrete syntax objects are just strings. -But they can be more structured objects as well - in general, they are -records of different kinds of objects. For example, -a demonstrative expression can be linearized into a record of two strings. -

-
-                                       {s = "this place" ;
-    this place (coord 123 45)  <--->    p = "(123,45)"
-                                       }
-
-

-The record -

-
-    {s = "I want to go from this place to this place" ;
-     p = "(123,45) (98,10)"
-    }
-
-

-represents any combination of the sentence and the clicks, as long -as the clicks appear in this order. -
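The claim that one record stands for every interleaving can be illustrated outside GF. The following Python sketch is only a model, not part of GF or of the Tram Demo system: the function name `split_modalities` is hypothetical, and the `CLICK(x,y)` token format is simply borrowed from the notation of the examples above.

```python
import re

# Hypothetical token format, borrowed from the CLICK(x,y) notation used above.
CLICK = re.compile(r"^CLICK\((\d+),(\d+)\)$")

def split_modalities(tokens):
    """Separate a mixed token stream into a speech field and a pointing field.

    The order inside each modality is kept; the interleaving between the
    two modalities is discarded.
    """
    words, clicks = [], []
    for tok in tokens:
        m = CLICK.match(tok)
        if m:
            clicks.append("(%s,%s)" % (m.group(1), m.group(2)))
        else:
            words.append(tok)
    return {"s": " ".join(words), "p": " ".join(clicks)}

inputs = [
    "I want to go from this place CLICK(123,45) to this place CLICK(98,10)",
    "I want to go from this place to this place CLICK(123,45) CLICK(98,10)",
    "CLICK(123,45) CLICK(98,10) I want to go from this place to this place",
]
records = [split_modalities(line.split()) for line in inputs]
```

All three inputs collapse to the same two-field record, which is why a grammar over such records does not need to care when the clicks happened, only in which order.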

- -

Example multimodal grammar: abstract syntax

-

-A simple example of a multimodal GF grammar is the one called -the Tram Demo grammar. It was written by Björn Bringert within -the TALK project as a part of a dialogue system that -deals with queries about tram timetables. The system interprets -a speech input in combination with mouse clicks on a digital map. -

-

-The abstract syntax of (a minimal fragment of) the Tram Demo -grammar is -

-
-  cat
-    Input, Dep, Dest, Click ;
-  fun
-    GoFromTo    : Dep  -> Dest -> Input ; -- "I want to go from x to y"
-    DepHere     : Click -> Dep ;          -- "from here" with click
-    DestHere    : Click -> Dest ;         -- "to here" with click
-  
-    CCoord      : Int -> Int -> Click ;   -- click coordinates
-
-

-An English concrete syntax of the grammar is -

-
-  lincat
-    Input, Dep, Dest = {s : Str ; p : Str} ;
-    Click            = {p : Str} ;
-  
-  lin
-    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
-    DepHere c     = {s = ["from here"]                  ; p = c.p} ;
-    DestHere c    = {s = ["to here"]                    ; p = c.p} ;
-  
-    CCoord x y    = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
-
-

-When the grammar is used in the actual system, standard parsing methods -are used for interpreting the integrated speech and click input. -Parsing appears on two levels: the speech input parsing -performed by the Nuance speech recognition program (without the clicks), -and the semantics-yielding parser sending input to the dialogue manager. -The latter parser just attaches the clicks to the speech input. The order -of the clicks is preserved, and the parser can hence associate each of -the clicks with proper demonstratives. Here is the grammar used in the -two parsing phases. -

-
-  cat
-    Query,    -- whole content
-    Speech ;  -- speech only
-  fun
-    QueryInput  : Input -> Query ;   -- the whole content shown
-    SpeechInput : Input -> Speech ;  -- only the speech shown
-  
-  lincat
-    Query, Speech = {s : Str} ;
-  lin
-    QueryInput i  = {s = i.s ++ ";" ++ i.p} ;
-    SpeechInput i = {s = i.s} ;
-
-
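To make the two views of the same tree concrete, the linearization rules above can be mimicked with plain functions on records. This is a Python model under simplifying assumptions, not GF: records are dictionaries, and the helper `cat` (a hypothetical name) imitates GF's `++` by joining tokens with single spaces.

```python
def cat(*parts):
    """Model of GF's ++ : join non-empty token strings with single spaces."""
    return " ".join(p for p in parts if p)

# lin rules of the Tram Demo fragment, as functions on records (dicts)
def CCoord(x, y):
    return {"p": cat("(", str(x), ",", str(y), ")")}

def DepHere(c):
    return {"s": "from here", "p": c["p"]}

def DestHere(c):
    return {"s": "to here", "p": c["p"]}

def GoFromTo(x, y):
    return {"s": cat("I want to go", x["s"], y["s"]), "p": cat(x["p"], y["p"])}

def QueryInput(i):   # whole content: speech, separator, clicks
    return {"s": cat(i["s"], ";", i["p"])}

def SpeechInput(i):  # speech only
    return {"s": i["s"]}

tree = GoFromTo(DepHere(CCoord(123, 45)), DestHere(CCoord(98, 10)))
speech = SpeechInput(tree)["s"]
query = QueryInput(tree)["s"]
```

In this model `speech` is what a speech recognizer would be expected to deliver, while `query` carries the same words followed by the clicks in their original order, which is the form the semantics-yielding parser works on.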

- -

Digression: discontinuous constituents

-

-The GF representation of integrated multimodality is similar to the representation of discontinuous constituents. For instance, assume has arrived is a verb phrase in English, which can be used both in declarative sentences and questions,

-
- she has arrived -
-

-
- has she arrived -
-

-

-In the question, the two words are separated from each other. If -has arrived is a constituent of the question, it is thus discontinuous. -To represent such constituents in GF, records can be used: -we split verb phrases (VP) into a finite and infinitive part. -

-
-    lincat VP = {fin, inf : Str} ;
-  
-    lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
-    lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
-
-
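The effect of the two rules can be checked with a small Python model, given here as an illustration only (dictionaries stand in for GF records, and the sample lexical entries are assumptions made for the example).

```python
def Indic(np, vp):
    # declarative: subject, finite part, infinite part
    return {"s": " ".join([np["s"], vp["fin"], vp["inf"]])}

def Quest(np, vp):
    # question: the finite part moves in front of the subject
    return {"s": " ".join([vp["fin"], np["s"], vp["inf"]])}

she = {"s": "she"}
has_arrived = {"fin": "has", "inf": "arrived"}

declarative = Indic(she, has_arrived)["s"]
question = Quest(she, has_arrived)["s"]
```

The verb phrase is one value with two string fields; only the sentence-level rules decide where each field ends up, just as the sentence-level rules of a multimodal grammar decide where the pointing fields end up.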

- -

From grammars to dialogue systems

-

-The general recipe for using GF when building dialogue systems -is to write a grammar with the following components: -

- - -

-The engineering advantages of this approach have to do partly with -the declarativity of the description, partly with the tools provided -by GF to derive different components of the system: -

- - -

-An example of this process is Björn Bringert's TramDemo. More recently, grammars have been integrated with the GoDiS dialogue manager via Prolog representations of abstract syntax.

- -

Adding multimodality to a unimodal grammar

-

-This section gives a recipe for making any unimodal grammar -multimodal, by adding pointing gestures to chosen expressions. The recipe -guarantees that the resulting grammar remains semantically well-formed, -i.e. type correct. -

- -

The multimodal conversion

-

-The multimodal conversion of a grammar consists of seven -steps, of which the first is always the same, the second -involves a decision, and the rest are derivative: -

-
    -
  1. Add the category Point with a standard linearization type. -
    -    cat Point ;
    -    lincat Point = {point : Str} ;
    -
    -
  2. (Decision) Decide which constructors are demonstrative, i.e. take a pointing gesture as an argument. Add a Point as their last argument. The new type signatures for such constructors d have the form -
    -     fun d : ... -> Point -> D 
    -
    -
  3. (Derivative) Add a point field to the linearization type L of any - demonstrative category D, i.e. a category that has at least one demonstrative - constructor: -
    -      lincat D = L ** {point : Str} ;
    -
    -
  4. (Derivative) If some other category C has a constructor d that takes - demonstratives as arguments, make it demonstrative by adding a point field - to its linearization type. -
  5. (Derivative) Store the point field in the linearization t of any - constructor d that has been made demonstrative: -
    -      lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ;
    -
    -
  6. (Derivative) For each constructor f that takes demonstratives D_1,...,D_n - as arguments, collect the point fields of the arguments in the point - field of the value: -
    -    lin f x_1 ... x_m = 
    -      t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
    -
    - Make sure that the pointings x_d1.point ... x_dn.point are concatenated - in the same order as the arguments appear in the linearization t, - which is not necessarily the same as the abstract argument order. -
  7. (Derivative) To preserve type correctness, add an empty - point field to the linearization t of any - constructor c of a demonstrative category: -
    -      lin c x1 ... xn = t x1 ... xn ** {point = []} ;
    -
    -
- - -

An example of the conversion

-

-Start with a Tram Demo grammar with no demonstratives, but just -tram stop names and the indexical here (interpreted as e.g. the user's -standing place). -

-
-  cat
-    Input, Dep, Dest, Name ;
-  fun
-    GoFromTo    : Dep  -> Dest -> Input ;
-    DepHere     : Dep ;                  
-    DestHere    : Dest ;                 
-    DepName     : Name -> Dep ;          
-    DestName    : Name -> Dest ;         
-  
-    Almedal     : Name ;                 
-
-

-A unimodal English concrete syntax of the grammar is -

-
-  lincat
-    Input, Dep, Dest, Name = {s : Str} ;
-  
-  lin
-    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ;
-    DepHere       = {s = ["from here"]} ;
-    DestHere      = {s = ["to here"]} ;
-    DepName n     = {s = ["from"] ++ n.s} ;
-    DestName n    = {s = ["to"] ++ n.s} ;
-  
-    Almedal       = {s = "Almedal"} ;
-
-

-Let us follow the steps of the recipe. -

-
    -
  1. We add the category Point and its linearization type. -
  2. We decide that DepHere and DestHere involve a pointing gesture. -
  3. We add point to the linearization types of Dep and Dest. -
  4. Therefore, also add point to Input. (But Name remains unimodal.) -
  5. Add p.point to the linearizations of DepHere and DestHere. -
  6. Concatenate the points of the arguments of GoFromTo. -
  7. Add an empty point to DepName and DestName. -
- -

-In the resulting grammar, one category is added and -two functions are changed in the abstract syntax (annotated by the step numbers): -

-
-  cat
-    Point ;                                               -- 1
-  fun
-    DepHere     : Point -> Dep ;                          -- 2
-    DestHere    : Point -> Dest ;                         -- 2
-  
-
-

-The concrete syntax in its entirety looks as follows:

-
-  lincat
-    Dep, Dest = {s : Str ; point : Str} ;                 -- 3    
-    Input = {s : Str ; point : Str} ;                     -- 4
-    Name = {s : Str} ;
-    Point = {point : Str} ;                               -- 1
-  lin
-    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6
-                     point = x.point ++ y.point
-                    } ;
-    DepHere p     = {s = ["from here"] ;                  -- 5
-                     point = p.point
-                    } ;
-    DestHere p    = {s = ["to here"] ;                    -- 5
-                     point = p.point
-                    } ;
-    DepName n     = {s = ["from"] ++ n.s ;                -- 7
-                     point = []
-                    } ;
-    DestName n    = {s = ["to"] ++ n.s ;                  -- 7
-                     point = []
-                    } ;
-    Almedal       = {s = "Almedal"} ;
-
-

-What we need in addition, to use the grammar in applications, are -

-
    -
  1. Constructors for Point, e.g. coordinate pairs. -
  2. Top-level categories, like Query and Speech in the original. -
- -

-But their proper place is probably in another grammar module, so that -the core Tram Demo grammar can be used in different systems e.g. -encoding clicks in different ways. -

- -

Multimodal conversion combinators

-

-GF is a functional programming language, and we exploit this -by providing a set of combinators that makes the multimodal conversion easier -and clearer. We start with the type of sequences of pointing gestures. -

-
-      Point : Type = {point : Str} ;
-
-

-To make a record type multimodal is to extend it with Point. -The record extension operator ** is needed here. -

-
-      Dem   : Type -> Type = \t -> t ** Point ;
-
-

-To construct, use, and concatenate pointings: -

-
-      mkPoint : Str -> Point = \s -> {point = s} ;
-  
-      noPoint : Point = mkPoint [] ;
-  
-      point   : Point -> Str = \p -> p.point ;
-  
-      concatPoint : (x,y : Point) -> Point = \x,y -> 
-        mkPoint (point x ++ point y) ;
-
-

-Finally, two combinators add pointing to a record; nonDem is the limiting case in which no demonstrative is needed:

-
-      mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ;
-  
-      nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ;
-
-

-Let us rewrite the Tram Demo grammar by using these combinators: -

-
-  oper
-    SS : Type = {s : Str} ;
-  lincat
-    Input, Dep, Dest = Dem SS ; 
-    Name = SS ;
-  
-  lin
-    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ** 
-                    concatPoint x y ;
-    DepHere       = mkDem  SS {s = ["from here"]} ;
-    DestHere      = mkDem  SS {s = ["to here"]} ;
-    DepName n     = nonDem SS {s = ["from"] ++ n.s} ;
-    DestName n    = nonDem SS {s = ["to"] ++ n.s} ;
-  
-    Almedal       = {s = "Almedal"} ;
-
-

-The type synonym SS is introduced to make the combinator applications concise. Notice the use of partial application in DepHere and DestHere; an equivalent way to write DepHere is

-
-    DepHere p     = mkDem  SS {s = ["from here"]} p ;
-
-
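For readers who want to experiment with the combinators outside GF, here is a rough Python model. It is a sketch under stated assumptions: dictionaries stand in for records, dictionary merging stands in for the record extension operator `**`, and the type argument of `mkDem` and `nonDem` is dropped because Python does not need it. The click string passed at the end is an arbitrary sample value.

```python
def mkPoint(s):
    return {"point": s}

noPoint = mkPoint("")

def point(p):
    return p["point"]

def concatPoint(x, y):
    # join the two pointing strings, skipping empty ones
    return mkPoint(" ".join(s for s in (point(x), point(y)) if s))

def mkDem(rec, p):
    # record extension (GF's **): all fields of rec plus a 'point' field
    return {**rec, "point": point(p)}

def nonDem(rec):
    return mkDem(rec, noPoint)

# Tram Demo rules written with the combinators
def DepHere(p):
    return mkDem({"s": "from here"}, p)

def DestName(n):
    return nonDem({"s": "to " + n["s"]})

def GoFromTo(x, y):
    return {"s": " ".join(["I want to go", x["s"], y["s"]]),
            **concatPoint(x, y)}

Almedal = {"s": "Almedal"}
result = GoFromTo(DepHere(mkPoint("(123,45)")), DestName(Almedal))
```

Mixing a demonstrative departure with a named destination yields a sentence record whose point field contains exactly one click, because the `nonDem` branch contributes the empty pointing.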

- -

Multimodal resource grammars

-

-The main advantage of using GF when building dialogue systems is that various components of the system can be automatically generated from GF grammars. Writing these grammars, however, can still be a considerable task. Multilingual systems are a case in point: how can a system built into a car, for example, be localized to the languages of all the customers to whom the car is sold? This problem has been the main focus of GF for some years, and the solution on which most work has been done is the development of resource grammar libraries. These libraries work in the same way as program libraries in software engineering, enabling a division of labour between linguists and domain experts.

-

-One of the goals in the resource grammars of different -languages has been to provide a language-independent API, -which makes the same resource grammar functions available for -different languages. For instance, the categories -S, NP, and VP are available in all of the -10 languages currently supported, and so is the function -

-
-    PredVP : NP -> VP -> S
-
-

-which corresponds to the rule S -> NP VP in phrase structure grammar. However, there are several levels of abstraction between the function PredVP and the phrase structure rule, because the rule is implemented in such different ways in different languages. In particular, discontinuous constituents are needed in various degrees to make the rule work in different languages.

-

-Now, dealing with discontinuous constituents is one of the demanding -aspects of multilingual grammar writing that the resource grammar -API is designed to hide. But the proposed treatment of integrated -multimodality is heavily dependent on similar things. What can we -do to make multimodal grammars easier to write (for different languages)? -There are two orthogonal answers: -

-
    -
  1. Use resource grammars to write a unimodal dialogue grammar and - then apply the multimodal - conversion to manually chosen parts. -
  2. Use multimodal resource grammars to derive multimodal - dialogue system grammars directly. -
- -

-The multimodal resource grammar library has been obtained from -the unimodal one by applying the multimodal conversion manually. -In addition, the API has been simplified -by leaving out structures needed in written technical documents -(the original application area of GF) but not in spoken dialogue. -

-

-In the following subsections, we will show a part of the -multimodal resource grammar API, limited to a fragment that -is needed to get the main ideas and to reimplement the -Tram Demo grammar. The reimplementation shows one more advantage -of the resource grammar approach: dialogue systems can be -automatically instantiated to different languages. -

- -

Resource grammar API

-

-The resource grammar API has three main kinds of entries: -

-
    -
  1. Language-independent linguistic structures (``linguistic ontology''), e.g. -
    -    PredVP : NP -> VP -> S ;     -- "Mary helps him"
    -
    -
  2. Language-specific syntax extensions, e.g. Swedish and German fronting -topicalization -
    -    TopicObj : NP -> VP -> S ;   -- "honom hjälper Mary"
    -
    -
  3. Language-specific lexical constructors, e.g. Germanic Ablaut patterns -
    -    irregV : (sing,sang,sung : Str) -> V ;
    -
    -
- -

-The first two kinds of entries are cat and fun definitions -in an abstract syntax. The multimodal, restricted API has -e.g. the following categories. Their names are obtained from -the corresponding unimodal categories by prefixing M. -

-
-    MS ;     -- multimodal sentence or question
-    MQS ;    -- multimodal wh question
-    MImp ;   -- multimodal imperative
-    MVP ;    -- multimodal verb phrase
-    MNP ;    -- multimodal (demonstrative) noun phrase
-    MAdv ;   -- multimodal (demonstrative) adverbial
-  
-    Point ;  -- pointing gesture
-
-

- -

Multimodal API: functions for building demonstratives

-

-Demonstrative pronouns can be used both as noun phrases and -as determiners. -

-
-      this_MNP    : Point -> MNP ;        -- this
-      thisDet_MNP : CN -> Point -> MNP ;  -- this car
-
-

-There are also demonstrative adverbs, and prepositions give -a productive way to build more adverbs. -

-
-      here_MAdv      : Point -> MAdv ;    -- here
-      here7from_MAdv : Point -> MAdv ;    -- from here
-  
-      MPrepNP : Prep -> MNP -> MAdv ;     -- in this car
-
-

- -

Multimodal API: functions for building sentences and phrases

-

-A handful of predication rules construct sentences, questions, and imperatives. -

-
-      MPredVP   : MNP -> MVP -> MS ;    -- this plane flies here
-      MQPredVP  : MNP -> MVP -> MQS ;   -- does this plane fly here
-      MQuestVP  : IP  -> MVP -> MQS ;   -- who flies here
-      MImpVP    : MVP -> MImp ;         -- fly here!
-
-

-Verb phrases are constructed from verbs (inherited as such from -the unimodal API) by providing their complements. -

-
-      MUseV     : V   -> MVP ;          -- flies
-      MComplV2  : V2  -> MNP -> MVP ;   -- takes this
-      MComplVV  : VV  -> MVP -> MVP ;   -- wants to take this
-
-

-A multimodal adverb can be attached to a verb phrase. -

-
-      MAdvVP    : MVP -> MAdv -> MVP ;  -- flies here
-
-

- -

Language-independent implementation: examples

-

-The implementation makes heavy use of the multimodal conversion -combinators. It adds a point field to whatever the implementation of the unimodal -category is in any language. Thus, for example -

-
-    lincat
-      MVP   = Dem VP ;
-      MNP   = Dem NP ;
-      MAdv  = Dem Adv ;
-  
-    lin 
-      this_MNP = mkDem NP this_NP ;
-      -- i.e. this_MNP p = this_NP ** {point = p.point} ;
-  
-      MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
-  
-      MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
-
-

- -

Multimodal API: interface to unimodal expressions

-

-Using nondemonstrative expressions as demonstratives: -

-
-      DemNP   : NP  -> MNP ;
-      DemAdv  : Adv -> MAdv ;
-
-

-Building top-level phrases: -

-
-      PhrMS   : Pol -> MS   -> Phr ;
-      PhrMQS  : Pol -> MQS  -> Phr ;
-      PhrMImp : Pol -> MImp -> Phr ;
-
-

- -

Instantiating multimodality to different languages

-

-The implementation above has only used the resource grammar API, -not the concrete implementations. The library Demonstrative -is a parametrized module, also called a functor, which -has the following structure -

-
-    incomplete concrete DemonstrativeI of Demonstrative = 
-      Cat, TenseX ** open Test, Structural in {
-      
-      -- lincat and lin rules
-  
-      }
-
-

-It can be instantiated to different languages as follows. -

-
-    concrete DemonstrativeEng of Demonstrative = 
-      CatEng, TenseX ** DemonstrativeI with
-        (Test = TestEng),
-        (Structural = StructuralEng) ;
-  
-    concrete DemonstrativeSwe of Demonstrative = 
-      CatSwe, TenseX ** DemonstrativeI with
-        (Test = TestSwe),
-        (Structural = StructuralSwe) ;
-
-

- -

Language-independent reimplementation of TramDemo

-

-Again using the functor idea, we reimplement TramDemo -as follows: -

-
-  incomplete concrete TramI of Tram = open Multimodal in {
-  
-  lincat
-    Query = Phr ; Input = MS ; 
-    Dep, Dest = MAdv ; Click = Point ;
-  lin
-    QInput = PhrMS PPos ;
-  
-    GoFromTo x y = 
-      MPredVP (DemNP (UsePron i_Pron)) 
-        (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
-  
-    DepHere    = here7from_MAdv ;
-    DestHere   = here7to_MAdv ;
-    DepName s  = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
-    DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
-  }
-
-

-Then we can instantiate this to all languages for which -the Multimodal API has been implemented: -

-
-    concrete TramEng of Tram = TramI with 
-      (Multimodal = MultimodalEng) ;
-  
-    concrete TramSwe of Tram = TramI with 
-      (Multimodal = MultimodalSwe) ;
-  
-    concrete TramFre of Tram = TramI with 
-      (Multimodal = MultimodalFre) ;
-
-
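The functor idea can be pictured as a function from a language-specific module to a complete grammar. The Python sketch below is an analogy only: `make_tram` and the dictionary keys are hypothetical names, and the English and Swedish strings are hand-written for illustration, not output of the resource grammar library.

```python
def make_tram(lang):
    """Model of the functor TramI: 'lang' plays the role of the Multimodal instance."""
    def go_from_to(dep, dest):
        s = " ".join([lang["i_want_to_go"], dep["s"], dest["s"]])
        pts = " ".join(p for p in (dep["point"], dest["point"]) if p)
        return {"s": s, "point": pts}

    def dep_here(p):
        return {"s": lang["from_here"], "point": p}

    def dest_here(p):
        return {"s": lang["to_here"], "point": p}

    return {"GoFromTo": go_from_to, "DepHere": dep_here, "DestHere": dest_here}

# Hand-written illustrative strings; NOT produced by the resource library.
ENG = {"i_want_to_go": "I want to go", "from_here": "from here", "to_here": "to here"}
SWE = {"i_want_to_go": "jag vill åka", "from_here": "härifrån", "to_here": "hit"}

def linearize(tram):
    # the same abstract tree, linearized by whichever instance is passed in
    return tram["GoFromTo"](tram["DepHere"]("(123,45)"), tram["DestHere"]("(98,10)"))

tram_eng, tram_swe = make_tram(ENG), make_tram(SWE)
eng, swe = linearize(tram_eng), linearize(tram_swe)
```

The rule bodies are written once; only the argument changes per language. The pointing field is identical in both results, since it does not depend on the language-specific strings in this simple case.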

- -

The order problem

-

-It was pointed out in the section on the multimodal conversion that -the concrete word order may be different from the abstract one, -and vary between different languages. For instance, Swedish -topicalization -

-
- Det här tåget vill den här kunden inte ta. -
-

-

-(``this train, this customer doesn't want to take'') may well have -an abstract syntax of a form in which the customer appears -before the train. -

-

-This is a problem for the implementor of the resource grammar. -It means that some parts of the resource must be written manually -and not as a functor. -However, the user of the resource can safely -ignore the word order problem, if it is correctly dealt with in -the resource. -
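A small Python model shows what goes wrong if the point fields are concatenated in abstract argument order while the words are reordered. It is an illustration only: the function names are hypothetical, the English strings follow the gloss of the Swedish example, and the coordinates are arbitrary sample values.

```python
def this_np(noun, p):
    return {"s": "this " + noun, "point": p}

def take_plain(subj, obj):
    # subject-first linearization: points follow abstract argument order
    return {"s": f"{subj['s']} doesn't want to take {obj['s']}",
            "point": f"{subj['point']} {obj['point']}"}

def take_topicalized(subj, obj):
    # object-first linearization: the point fields must be swapped as well
    return {"s": f"{obj['s']} , {subj['s']} doesn't want to take",
            "point": f"{obj['point']} {subj['point']}"}

customer = this_np("customer", "(10,20)")
train = this_np("train", "(30,40)")

plain = take_plain(customer, train)
topicalized = take_topicalized(customer, train)
```

Both rules take their arguments in the same abstract order, but the topicalized one mentions the train first, so its point field must also start with the click on the train. A functor shared by all languages cannot know this, which is why such rules are written per language.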

- -

A recipe for using the resource library

-

-When starting to develop resource grammars, we believed they would be all that an application grammarian needs to write a concrete syntax. However, experience has shown that it can be tough to start grammar development in this way: selecting functions from a resource API requires more abstract thinking than just writing strings, and it takes longer to reach testable results. The most lightweight approach may be to start with context-free grammars (a notation that GF also supports). Context-free grammars that give acceptable, even though overgenerating, results for languages like English are quick to produce.

-

-This experience has led to the following steps for grammar development. While giving the work a quick start, this recipe increases abstraction at a later stage, when it is time to localize the grammar to different languages. If context-free notation is used, steps 1 and 2 can be merged.

-
    -
  1. Encode domain ontology in an abstract syntax, Domain. -
  2. Write a rough concrete syntax in English, DomainRough. This can be oversimplified and overgenerating. -
  3. Reimplement by using the resource library, and build a functor DomainI. This can be helped by example-based grammar writing, where the examples are generated from DomainRough. -
  4. Instantiate the functor DomainI to different languages, and test the results by generating linearizations. -
  5. If some rule is unsatisfactory in some language, use the resource in a different way for that case (compile-time transfer). -