<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>Demonstrative Expressions and Multimodal Grammars</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>Demonstrative Expressions and Multimodal Grammars</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Mon Jan  9 20:29:45 2006
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
    <UL>
    <LI><A HREF="#toc1">Abstract</A>
    <LI><A HREF="#toc2">Multimodal grammars</A>
      <UL>
      <LI><A HREF="#toc3">Representing demonstratives in semantics and grammar</A>
      <LI><A HREF="#toc4">Asynchronous syntax in GF</A>
      <LI><A HREF="#toc5">Example multimodal grammar: abstract syntax</A>
      <LI><A HREF="#toc6">Digression: discontinuous constituents</A>
      <LI><A HREF="#toc7">From grammars to dialogue systems</A>
      </UL>
    <LI><A HREF="#toc8">Adding multimodality to a unimodal grammar</A>
      <UL>
      <LI><A HREF="#toc9">The multimodal conversion</A>
      <LI><A HREF="#toc10">An example of the conversion</A>
      <LI><A HREF="#toc11">Multimodal conversion combinators</A>
      </UL>
    <LI><A HREF="#toc12">Multimodal resource grammars</A>
      <UL>
      <LI><A HREF="#toc13">Resource grammar API</A>
      <LI><A HREF="#toc14">Multimodal API: functions for building demonstratives</A>
      <LI><A HREF="#toc15">Multimodal API: functions for building sentences and phrases</A>
      <LI><A HREF="#toc16">Language-independent implementation: examples</A>
      <LI><A HREF="#toc17">Multimodal API: interface to unimodal expressions</A>
      <LI><A HREF="#toc18">Instantiating multimodality to different languages</A>
      <LI><A HREF="#toc19">Language-independent reimplementation of TramDemo</A>
      <LI><A HREF="#toc20">The order problem</A>
      <LI><A HREF="#toc21">A recipe for using the resource library</A>
      </UL>
    </UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<A NAME="toc1"></A>
<H2>Abstract</H2>
<P>
This document shows a method to write grammars
in which spoken utterances are accompanied by
pointing gestures. Such grammars can be applied
in <B>multimodal dialogue systems</B>, in
which the pointing gestures are performed by 
mouse clicks and movements.
</P>
<P>
After an introduction to the notions of
<B>demonstratives</B> and <B>integrated multimodality</B>,
we will show by a concrete example
how multimodal grammars can be written in GF
and how they can be used in dialogue systems.
The explanation is given in three stages:
</P>
<OL>
<LI>How to write a multimodal grammar by hand.
<LI>How to add multimodality to a unimodal grammar.
<LI>How to use a multimodal resource grammar.
</OL>

<A NAME="toc2"></A>
<H2>Multimodal grammars</H2>
<P>
<B>Demonstrative expressions</B> are an old idea. Such
expressions get their meaning from the context.
</P>
	<BLOCKQUOTE>
	    <I>This train</I> is faster than <I>that airplane</I>.
	</BLOCKQUOTE>
<P></P>
	<BLOCKQUOTE>
	    I want to go from <I>this place</I> to <I>this place</I>.
	</BLOCKQUOTE>
<P></P>
<P>
In particular, as in these examples, the meaning
can be obtained from accompanying pointing gestures.
</P>
<P>
Thus the meaning-bearing unit is neither the words nor the 
gestures alone, but their combination. Demonstratives
thus provide an example of <B>integrated multimodality</B>,
as opposed to parallel multimodality. In parallel
multimodality, speech and other modes of communication 
are just alternative ways to convey the same information.
</P>
<A NAME="toc3"></A>
<H3>Representing demonstratives in semantics and grammar</H3>
<P>
When formalizing the semantics of demonstratives, we can combine syntax with coordinates:
</P>
	<BLOCKQUOTE>
	    I want to go from this place to this place
	</BLOCKQUOTE>
<P></P>
<P>
is interpreted as something like
</P>
<PRE>
    want(I, go, this(place,(123,45)), this(place,(98,10))) 
</PRE>
<P>
Now, the same semantic value can be given in many ways, by performing
the clicks at different points of time in relation to the speech: 
</P>
	<BLOCKQUOTE>
	    I want to go from this place CLICK(123,45) to this place CLICK(98,10) 
	</BLOCKQUOTE>
<P></P>
	<BLOCKQUOTE>
	    I want to go from this place to this place CLICK(123,45) CLICK(98,10) 
	</BLOCKQUOTE>
<P></P>
	<BLOCKQUOTE>
	    CLICK(123,45) CLICK(98,10) I want to go from this place to this place 
	</BLOCKQUOTE>
<P></P>
<P>
How do we build the value compositionally in parsing?
Traditional parsing is sequential: its input is a string of tokens.
It works for demonstratives only if the pointing is adjacent to 
the spoken expression. In the actual input, the demonstrative word
can be separated from the accompanying click by other words. The two
can also be simultaneous. 
</P>
<A NAME="toc4"></A>
<H3>Asynchronous syntax in GF</H3>
<P>
What we need is a notion of <B>asynchronous parsing</B>, as opposed to
sequential parsing (where demonstrative words and clicks must be
adjacent). 
</P>
<P>
We can implement asynchronous parsing in GF by exploiting the generality
of <B>linearization types</B>. A linearization type is the type of
the <B>concrete syntax objects</B> assigned to semantic values.
What a GF grammar defines is a relation 
</P>
<PRE>
        abstract syntax trees  &lt;---&gt;  concrete syntax objects
</PRE>
<P>
When modelling context-free grammar in GF,
the concrete syntax objects are just strings. 
But they can be more structured objects as well - in general, they are
<B>records</B> of different kinds of objects. For example,
a demonstrative expression can be linearized into a record of two strings.
</P>
<PRE>
                                       {s = "this place" ;
    this place (coord 123 45)  &lt;---&gt;    p = "(123,45)"
                                       }
</PRE>
<P>
The record
</P>
<PRE>
    {s = "I want to go from this place to this place" ;
     p = "(123,45) (98,10)"
    }
</PRE>
<P>
represents any combination of the sentence and the clicks, as long
as the clicks appear in this order.
</P>
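<P>
The following Python sketch, which is not part of GF or of any system described here,
illustrates why one record covers all interleavings. It assumes that the mixed input
arrives as a token list in which clicks are written <CODE>CLICK(x,y)</CODE>; that encoding
and the function name <CODE>to_record</CODE> are assumptions made for illustration only.
</P>

```python
import re

# Hypothetical token format for a click: CLICK(x,y)
CLICK = re.compile(r"CLICK\((\d+),(\d+)\)")

def to_record(tokens):
    """Split a mixed token stream into a speech field s and a pointing field p."""
    words, points = [], []
    for tok in tokens:
        m = CLICK.fullmatch(tok)
        if m:
            # Keep clicks in their order of arrival.
            points.append("(%s,%s)" % (m.group(1), m.group(2)))
        else:
            words.append(tok)
    return {"s": " ".join(words), "p": " ".join(points)}

speech = "I want to go from this place to this place".split()
# Clicks adjacent to the demonstratives, after the speech, and before the speech.
a = "I want to go from this place CLICK(123,45) to this place CLICK(98,10)".split()
b = speech + ["CLICK(123,45)", "CLICK(98,10)"]
c = ["CLICK(123,45)", "CLICK(98,10)"] + speech
```

<P>
Under these assumptions, <CODE>to_record</CODE> returns the same record for
<CODE>a</CODE>, <CODE>b</CODE>, and <CODE>c</CODE>. Swapping the two clicks would change
the <CODE>p</CODE> field, and hence the interpretation.
</P>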
<A NAME="toc5"></A>
<H3>Example multimodal grammar: abstract syntax</H3>
<P>
A simple example of a multimodal GF grammar is the one called
the Tram Demo grammar. It was written by Björn Bringert within
the TALK project as a part of a dialogue system that
deals with queries about tram timetables. The system interprets
a speech input in combination with mouse clicks on a digital map.
</P>
<P>
The abstract syntax of (a minimal fragment of) the Tram Demo
grammar is
</P>
<PRE>
  cat
    Input, Dep, Dest, Click ;
  fun
    GoFromTo    : Dep  -&gt; Dest -&gt; Input ; -- "I want to go from x to y"
    DepHere     : Click -&gt; Dep ;          -- "from here" with click
    DestHere    : Click -&gt; Dest ;         -- "to here" with click
  
    CCoord      : Int -&gt; Int -&gt; Click ;   -- click coordinates
</PRE>
<P>
An English concrete syntax of the grammar is
</P>
<PRE>
  lincat
    Input, Dep, Dest = {s : Str ; p : Str} ;
    Click            = {p : Str} ;
  
  lin
    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
    DepHere c     = {s = ["from here"]                  ; p = c.p} ;
    DestHere c    = {s = ["to here"]                    ; p = c.p} ;
  
    CCoord x y    = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
</PRE>
<P>
When the grammar is used in the actual system, standard parsing methods
are used for interpreting the integrated speech and click input.
Parsing appears on two levels: the speech input parsing
performed by the Nuance speech recognition program (without the clicks),
and the semantics-yielding parser sending input to the dialogue manager.
The latter parser just attaches the clicks to the speech input. The order
of the clicks is preserved, and the parser can hence associate each of
the clicks with proper demonstratives. Here is the grammar used in the
two parsing phases.
</P>
<PRE>
  cat
    Query,    -- whole content
    Speech ;  -- speech only
  fun
    QueryInput  : Input -&gt; Query ;   -- the whole content shown
    SpeechInput : Input -&gt; Speech ;  -- only the speech shown
  
  lincat
    Query, Speech = {s : Str} ;
  lin
    QueryInput i  = {s = i.s ++ ";" ++ i.p} ;
    SpeechInput i = {s = i.s} ;
</PRE>
<P></P>
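<P>
To see what these rules compute, here is a rough Python model of the linearization
records. It is only a sketch: records are modelled as dictionaries, and the helper
<CODE>cat</CODE> (an assumed name) stands in for GF's <CODE>++</CODE>, on the assumption
that tokens are joined with single spaces.
</P>

```python
def cat(*parts):
    """Stand-in for GF's ++ : join non-empty token strings with single spaces."""
    return " ".join(p for p in parts if p)

def CCoord(x, y):
    return {"p": cat("(", str(x), ",", str(y), ")")}

def DepHere(c):
    return {"s": "from here", "p": c["p"]}

def DestHere(c):
    return {"s": "to here", "p": c["p"]}

def GoFromTo(x, y):
    # Speech and clicks are collected in separate fields.
    return {"s": cat("I want to go", x["s"], y["s"]), "p": cat(x["p"], y["p"])}

def QueryInput(i):
    # The whole content: speech, a separator, then the clicks.
    return {"s": cat(i["s"], ";", i["p"])}

def SpeechInput(i):
    # Only the speech, as seen by the speech recognizer.
    return {"s": i["s"]}

tree = GoFromTo(DepHere(CCoord(123, 45)), DestHere(CCoord(98, 10)))
```

<P>
In this model, <CODE>SpeechInput(tree)</CODE> yields the string
<CODE>I want to go from here to here</CODE>, while <CODE>QueryInput(tree)</CODE>
appends <CODE>; ( 123 , 45 ) ( 98 , 10 )</CODE> to it.
</P>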
<A NAME="toc6"></A>
<H3>Digression: discontinuous constituents</H3>
<P>
The GF representation of integrated multimodality is 
similar to the representation of <B>discontinuous constituents</B>.
For instance, assume <I>has arrived</I> is a verb phrase in English,
which can be used both in declarative sentences and questions,
</P>
	<BLOCKQUOTE>
	she <I>has arrived</I>
	</BLOCKQUOTE>
<P></P>
	<BLOCKQUOTE>
	<I>has</I> she <I>arrived</I>
	</BLOCKQUOTE>
<P></P>
<P>
In the question, the two words are separated from each other. If
<I>has arrived</I> is a constituent of the question, it is thus discontinuous.
To represent such constituents in GF, records can be used:
we split verb phrases (<CODE>VP</CODE>) into a finite and an infinitive part.
</P>
<PRE>
    lincat VP = {fin, inf : Str} ;
  
    lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
    lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
</PRE>
<P></P>
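<P>
The same two rules can be mimicked in Python with dictionaries as records. This is
only an illustrative sketch; the sample records <CODE>she</CODE> and
<CODE>has_arrived</CODE> are assumptions, not part of any grammar shown here.
</P>

```python
def Indic(np, vp):
    # Declarative: subject, finite part, infinitive part.
    return {"s": " ".join([np["s"], vp["fin"], vp["inf"]])}

def Quest(np, vp):
    # Question: the finite part is fronted, which splits the verb phrase.
    return {"s": " ".join([vp["fin"], np["s"], vp["inf"]])}

she = {"s": "she"}
has_arrived = {"fin": "has", "inf": "arrived"}
```

<P>
Both <CODE>Indic(she, has_arrived)</CODE> and <CODE>Quest(she, has_arrived)</CODE> use the
same verb phrase record; only the order in which its fields are placed differs.
</P>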
<A NAME="toc7"></A>
<H3>From grammars to dialogue systems</H3>
<P>
The general recipe for using GF when building dialogue systems
is to write a grammar with the following components:
</P>
<UL>
<LI>The abstract syntax defines the semantics (the "ontology")
  of the domain of the system.
<LI>The concrete syntaxes define alternative modes of input and output.
</UL>

<P>
The engineering advantages of this approach have to do partly with
the declarativity of the description, partly with the tools provided
by GF to derive different components of the system:
</P>
<UL>
<LI>The type checker guarantees that all the input and output
  modes match with the ontology.
<LI>The grammar compiler generates parsers for each input grammar
  and generators for each output grammar.
<LI>Translators between GF's abstract syntax and other ontology
  description languages enable communication with different 
  kinds of dialogue managers and cover e.g. Prolog terms and XML objects.
<LI>Translators from GF's concrete syntax to speech recognition formats
  make it possible to generate e.g. Nuance grammars and ATK language
  models. 
</UL>

<P>
An example of this process is Björn Bringert's TramDemo.
More recently, grammars have been integrated with the GoDiS dialogue
manager via Prolog representations of abstract syntax.
</P>
<A NAME="toc8"></A>
<H2>Adding multimodality to a unimodal grammar</H2>
<P>
This section gives a recipe for making any unimodal grammar
multimodal, by adding pointing gestures to chosen expressions. The recipe
guarantees that the resulting grammar remains semantically well-formed,
i.e. type correct.
</P>
<A NAME="toc9"></A>
<H3>The multimodal conversion</H3>
<P>
The <B>multimodal conversion</B> of a grammar consists of seven
steps, of which the first is always the same, the second
involves a decision, and the rest are derivative:
</P>
<OL>
<LI>Add the category <CODE>Point</CODE> with a standard linearization type.
<PRE>
    cat Point ;
    lincat Point = {point : Str} ;
</PRE>
<LI>(Decision) Decide which constructors are demonstrative, i.e. take
  a pointing gesture as an argument. Add a <CODE>Point</CODE> as their last argument.
  The new type signatures for such constructors <I>d</I> have the form
<PRE>
     fun d : ... -&gt; Point -&gt; D 
</PRE>
<LI>(Derivative) Add a <CODE>point</CODE> field to the linearization type <I>L</I> of any
  demonstrative category <I>D</I>, i.e. a category that has at least one demonstrative
  constructor:
<PRE>
      lincat D = L ** {point : Str} ;
</PRE>
<LI>(Derivative) If some other category <I>C</I> has a constructor <I>d</I> that takes
  demonstratives as arguments, make it demonstrative by adding a <CODE>point</CODE> field
  to its linearization type.
<LI>(Derivative) Store the <CODE>point</CODE> field in the linearization <I>t</I> of any
  constructor <I>d</I> that has been made demonstrative:
<PRE>
      lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ;
</PRE>
<LI>(Derivative) For each constructor <I>f</I> that takes demonstratives <I>D_1,...,D_n</I> 
  as arguments, collect the <CODE>point</CODE> fields of the arguments in the <CODE>point</CODE>
  field of the value:
<PRE>
    lin f x_1 ... x_m = 
      t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
</PRE>
  Make sure that the pointings <CODE>x_d1.point ... x_dn.point</CODE> are concatenated
  in the same order as the arguments appear in the <I>linearization</I> <I>t</I>,
  which is not necessarily the same as the abstract argument order.
<LI>(Derivative) To preserve type correctness, add an empty 
  <CODE>point</CODE> field to the linearization <I>t</I> of any
  constructor <I>c</I> of a demonstrative category:
<PRE>
      lin c x1 ... xn = t x1 ... xn ** {point = []} ;
</PRE>
</OL>

<A NAME="toc10"></A>
<H3>An example of the conversion</H3>
<P>
Start with a Tram Demo grammar with no demonstratives, but just 
tram stop names and the indexical <I>here</I> (interpreted as e.g. the user's 
standing place). 
</P>
<PRE>
  cat
    Input, Dep, Dest, Name ;
  fun
    GoFromTo    : Dep  -&gt; Dest -&gt; Input ;
    DepHere     : Dep ;                  
    DestHere    : Dest ;                 
    DepName     : Name -&gt; Dep ;          
    DestName    : Name -&gt; Dest ;         
  
    Almedal     : Name ;                 
</PRE>
<P>
A unimodal English concrete syntax of the grammar is
</P>
<PRE>
  lincat
    Input, Dep, Dest, Name = {s : Str} ;
  
  lin
    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ;
    DepHere       = {s = ["from here"]} ;
    DestHere      = {s = ["to here"]} ;
    DepName n     = {s = ["from"] ++ n.s} ;
    DestName n    = {s = ["to"] ++ n.s} ;
  
    Almedal       = {s = "Almedal"} ;
</PRE>
<P>
Let us follow the steps of the recipe.
</P>
<OL>
<LI>We add the category <CODE>Point</CODE> and its linearization type.
<LI>We decide that <CODE>DepHere</CODE> and <CODE>DestHere</CODE> involve a pointing gesture.
<LI>We add <CODE>point</CODE> to the linearization types of <CODE>Dep</CODE> and <CODE>Dest</CODE>.
<LI>Therefore, also add <CODE>point</CODE> to <CODE>Input</CODE>. (But <CODE>Name</CODE> remains unimodal.)
<LI>Add <CODE>p.point</CODE> to the linearizations of <CODE>DepHere</CODE> and <CODE>DestHere</CODE>.
<LI>Concatenate the points of the arguments of <CODE>GoFromTo</CODE>.
<LI>Add an empty <CODE>point</CODE> to <CODE>DepName</CODE> and <CODE>DestName</CODE>.
</OL>

<P>
In the resulting grammar, one category is added and 
two functions are changed in the abstract syntax (annotated by the step numbers):
</P>
<PRE>
  cat
    Point ;                                               -- 1
  fun
    DepHere     : Point -&gt; Dep ;                          -- 2
    DestHere    : Point -&gt; Dest ;                         -- 2
  
</PRE>
<P>
The concrete syntax in its entirety looks as follows 
</P>
<PRE>
  lincat
    Dep, Dest = {s : Str ; point : Str} ;                 -- 3    
    Input = {s : Str ; point : Str} ;                     -- 4
    Name = {s : Str} ;
    Point = {point : Str} ;                               -- 1
  lin
    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6
                     point = x.point ++ y.point
                    } ;
    DepHere p     = {s = ["from here"] ;                  -- 5
                     point = p.point
                    } ;
    DestHere p    = {s = ["to here"] ;                    -- 5
                     point = p.point
                    } ;
    DepName n     = {s = ["from"] ++ n.s ;                -- 7
                     point = []
                    } ;
    DestName n    = {s = ["to"] ++ n.s ;                  -- 7
                     point = []
                    } ;
    Almedal       = {s = "Almedal"} ;
</PRE>
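<P>
A small Python model of the converted concrete syntax shows how demonstrative and
non-demonstrative arguments mix. It is a sketch under stated assumptions: records are
dictionaries, <CODE>cat</CODE> stands in for <CODE>++</CODE>, and <CODE>PCoord</CODE> is a
hypothetical constructor for <CODE>Point</CODE>, which the grammar above leaves open.
</P>

```python
def cat(*parts):
    """Stand-in for GF's ++ : join non-empty token strings with single spaces."""
    return " ".join(p for p in parts if p)

def PCoord(x, y):
    # Hypothetical Point constructor; the grammar above does not define one.
    return {"point": "(%d,%d)" % (x, y)}

Almedal = {"s": "Almedal"}

def DepHere(p):                                   # step 5
    return {"s": "from here", "point": p["point"]}

def DestHere(p):                                  # step 5
    return {"s": "to here", "point": p["point"]}

def DepName(n):                                   # step 7: empty point
    return {"s": cat("from", n["s"]), "point": ""}

def DestName(n):                                  # step 7: empty point
    return {"s": cat("to", n["s"]), "point": ""}

def GoFromTo(x, y):                               # step 6: collect the points
    return {"s": cat("I want to go", x["s"], y["s"]),
            "point": cat(x["point"], y["point"])}

mixed = GoFromTo(DepName(Almedal), DestHere(PCoord(98, 10)))
both = GoFromTo(DepHere(PCoord(123, 45)), DestHere(PCoord(98, 10)))
```

<P>
Here <CODE>mixed</CODE> carries a single click, because <CODE>DepName</CODE> contributes an
empty <CODE>point</CODE> field, whereas <CODE>both</CODE> carries two clicks in the order
of the demonstratives.
</P>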
<P>
What we need in addition, to use the grammar in applications, are
</P>
<OL>
<LI>Constructors for <CODE>Point</CODE>, e.g. coordinate pairs.
<LI>Top-level categories, like <CODE>Query</CODE> and <CODE>Speech</CODE> in the original.
</OL>

<P>
But their proper place is probably in another grammar module, so that
the core Tram Demo grammar can be used in different systems e.g.
encoding clicks in different ways.
</P>
<A NAME="toc11"></A>
<H3>Multimodal conversion combinators</H3>
<P>
GF is a functional programming language, and we exploit this
by providing a set of combinators that makes the multimodal conversion easier
and clearer. We start with the type of sequences of pointing gestures.
</P>
<PRE>
      Point : Type = {point : Str} ;
</PRE>
<P>
To make a record type multimodal is to extend it with <CODE>Point</CODE>.
The record extension operator <CODE>**</CODE> is needed here.
</P>
<PRE>
      Dem   : Type -&gt; Type = \t -&gt; t ** Point ;
</PRE>
<P>
To construct, use, and concatenate pointings:
</P>
<PRE>
      mkPoint : Str -&gt; Point = \s -&gt; {point = s} ;
  
      noPoint : Point = mkPoint [] ;
  
      point   : Point -&gt; Str = \p -&gt; p.point ;
  
      concatPoint : (x,y : Point) -&gt; Point = \x,y -&gt; 
        mkPoint (point x ++ point y) ;
</PRE>
<P>
Finally, to add pointing to a record, with the limiting case in which no demonstrative is needed:
</P>
<PRE>
      mkDem : (t : Type) -&gt; t -&gt; Point -&gt; Dem t = \_,x,s -&gt; x ** s ;
  
      nonDem : (t : Type) -&gt; t -&gt; Dem t = \t,x -&gt; mkDem t x noPoint ;
</PRE>
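<P>
For readers who want to experiment outside GF, the combinators can be approximated in
Python. This is a loose sketch, not a translation: records are dictionaries, record
extension <CODE>**</CODE> becomes a dictionary merge, and the type argument of
<CODE>mkDem</CODE> and <CODE>nonDem</CODE> is dropped because Python does not need it.
</P>

```python
def mkPoint(s):
    return {"point": s}

noPoint = mkPoint("")

def point(p):
    return p["point"]

def concatPoint(x, y):
    # Join the two pointing sequences, skipping empty ones.
    return mkPoint(" ".join(t for t in (point(x), point(y)) if t))

def mkDem(x, p):
    # Models x ** p: extend record x with the point field of p.
    return {**x, "point": point(p)}

def nonDem(x):
    # Limiting case: a demonstrative record with no pointing.
    return mkDem(x, noPoint)

dep = mkDem({"s": "from here"}, mkPoint("(123,45)"))
dest = nonDem({"s": "to Almedal"})
```

<P>
Since <CODE>dep</CODE> and <CODE>dest</CODE> both have a <CODE>point</CODE> field, they can
be passed directly to <CODE>concatPoint</CODE>, which mirrors the way the GF version
accepts any record that extends <CODE>Point</CODE>.
</P>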
<P>
Let us rewrite the Tram Demo grammar by using these combinators:
</P>
<PRE>
  oper
    SS : Type = {s : Str} ;
  lincat
    Input, Dep, Dest = Dem SS ; 
    Name = SS ;
  
  lin
    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ** 
                    concatPoint x y ;
    DepHere       = mkDem  SS {s = ["from here"]} ;
    DestHere      = mkDem  SS {s = ["to here"]} ;
    DepName n     = nonDem SS {s = ["from"] ++ n.s} ;
    DestName n    = nonDem SS {s = ["to"] ++ n.s} ;
  
    Almedal       = {s = "Almedal"} ;
</PRE>
<P>
The type synonym <CODE>SS</CODE> is introduced to make the combinator applications
concise. Notice the use of partial application in <CODE>DepHere</CODE> and
<CODE>DestHere</CODE>; an equivalent way to write the first rule is
</P>
<PRE>
    DepHere p     = mkDem  SS {s = ["from here"]} p ;
</PRE>
<P></P>
<A NAME="toc12"></A>
<H2>Multimodal resource grammars</H2>
<P>
The main advantage of using GF when building dialogue systems is
that various components of the system
can be automatically generated from GF grammars.
Writing these grammars, however, can still be a considerable
task. A case in point is multilingual systems:
how to localize e.g. a system built in a car to
the languages of all those customers to whom the
car is sold? This problem has been the main focus of
GF for some years, and the solution on which most work has been
done is the development of <B>resource grammar libraries</B>.
These libraries work in the same way as program libraries
in software engineering, enabling a division of labour
between linguists and domain experts.
</P>
<P>
One of the goals in the resource grammars of different
languages has been to provide a <B>language-independent API</B>,
which makes the same resource grammar functions available for
different languages. For instance, the categories
<CODE>S</CODE>, <CODE>NP</CODE>, and <CODE>VP</CODE> are available in all of the
10 languages currently supported, and so is the function
</P>
<PRE>
    PredVP : NP -&gt; VP -&gt; S
</PRE>
<P>
which corresponds to the rule <CODE>S -&gt; NP VP</CODE> in phrase
structure grammar. However, there are several levels of abstraction
between the function <CODE>PredVP</CODE> and the phrase structure rule,
because the rule is implemented in such different ways in different
languages. In particular, discontinuous constituents are needed in
various degrees to make the rule work in different languages.
</P>
<P>
Now, dealing with discontinuous constituents is one of the demanding
aspects of multilingual grammar writing that the resource grammar 
API is designed to hide. But the proposed treatment of integrated
multimodality is heavily dependent on similar things. What can we
do to make multimodal grammars easier to write (for different languages)?
There are two orthogonal answers:
</P>
<OL>
<LI>Use resource grammars to write a unimodal dialogue grammar and 
  then apply the multimodal
  conversion to manually chosen parts.
<LI>Use <B>multimodal resource grammars</B> to derive multimodal
  dialogue system grammars directly.
</OL>

<P>
The multimodal resource grammar library has been obtained from
the unimodal one by applying the multimodal conversion manually.
In addition, the API has been simplified
by leaving out structures needed in written technical documents 
(the original application area of GF) but not in spoken dialogue.
</P>
<P>
In the following subsections, we will show a part of the
multimodal resource grammar API, limited to a fragment that
is needed to get the main ideas and to reimplement the
Tram Demo grammar. The reimplementation shows one more advantage
of the resource grammar approach: dialogue systems can be 
automatically instantiated to different languages.
</P>
<A NAME="toc13"></A>
<H3>Resource grammar API</H3>
<P>
The resource grammar API has three main kinds of entries:
</P>
<OL>
<LI>Language-independent linguistic structures ("linguistic ontology"), e.g.
<PRE>
    PredVP : NP -&gt; VP -&gt; S ;     -- "Mary helps him"
</PRE>
<LI>Language-specific syntax extensions, e.g. Swedish and German
topicalization by fronting
<PRE>
    TopicObj : NP -&gt; VP -&gt; S ;   -- "honom hjälper Mary"
</PRE>
<LI>Language-specific lexical constructors, e.g. Germanic <I>Ablaut</I> patterns
<PRE>
    irregV : (sing,sang,sung : Str) -&gt; V ;
</PRE>
</OL>

<P>
The first two kinds of entries are <CODE>cat</CODE> and <CODE>fun</CODE> definitions
in an abstract syntax. The multimodal, restricted API has
e.g. the following categories. Their names are obtained from
the corresponding unimodal categories by prefixing <CODE>M</CODE>.
</P>
<PRE>
    MS ;     -- multimodal sentence or question
    MQS ;    -- multimodal wh question
    MImp ;   -- multimodal imperative
    MVP ;    -- multimodal verb phrase
    MNP ;    -- multimodal (demonstrative) noun phrase
    MAdv ;   -- multimodal (demonstrative) adverbial
  
    Point ;  -- pointing gesture
</PRE>
<P></P>
<A NAME="toc14"></A>
<H3>Multimodal API: functions for building demonstratives</H3>
<P>
Demonstrative pronouns can be used both as noun phrases and
as determiners.
</P>
<PRE>
      this_MNP    : Point -&gt; MNP ;        -- this
      thisDet_MNP : CN -&gt; Point -&gt; MNP ;  -- this car
</PRE>
<P>
There are also demonstrative adverbs, and prepositions give
a productive way to build more adverbs.
</P>
<PRE>
      here_MAdv      : Point -&gt; MAdv ;    -- here
      here7from_MAdv : Point -&gt; MAdv ;    -- from here
  
      MPrepNP : Prep -&gt; MNP -&gt; MAdv ;     -- in this car
</PRE>
<P></P>
<A NAME="toc15"></A>
<H3>Multimodal API: functions for building sentences and phrases</H3>
<P>
A handful of predication rules construct sentences, questions, and imperatives.
</P>
<PRE>
      MPredVP   : MNP -&gt; MVP -&gt; MS ;    -- this plane flies here
      MQPredVP  : MNP -&gt; MVP -&gt; MQS ;   -- does this plane fly here
      MQuestVP  : IP  -&gt; MVP -&gt; MQS ;   -- who flies here
      MImpVP    : MVP -&gt; MImp ;         -- fly here!
</PRE>
<P>
Verb phrases are constructed from verbs (inherited as such from
the unimodal API) by providing their complements.
</P>
<PRE>
      MUseV     : V   -&gt; MVP ;          -- flies
      MComplV2  : V2  -&gt; MNP -&gt; MVP ;   -- takes this
      MComplVV  : VV  -&gt; MVP -&gt; MVP ;   -- wants to take this
</PRE>
<P>
A multimodal adverb can be attached to a verb phrase.
</P>
<PRE>
      MAdvVP    : MVP -&gt; MAdv -&gt; MVP ;  -- flies here
</PRE>
<P></P>
<A NAME="toc16"></A>
<H3>Language-independent implementation: examples</H3>
<P>
The implementation makes heavy use of the multimodal conversion
combinators. It adds a <CODE>point</CODE> field to whatever the implementation of the unimodal
category is in any language. Thus, for example
</P>
<PRE>
    lincat
      MVP   = Dem VP ;
      MNP   = Dem NP ;
      MAdv  = Dem Adv ;
  
    lin 
      this_MNP = mkDem NP this_NP ;
      -- i.e. this_MNP p = this_NP ** {point = p.point} ;
  
      MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
  
      MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
</PRE>
<P></P>
<A NAME="toc17"></A>
<H3>Multimodal API: interface to unimodal expressions</H3>
<P>
Using nondemonstrative expressions as demonstratives:
</P>
<PRE>
      DemNP   : NP  -&gt; MNP ;
      DemAdv  : Adv -&gt; MAdv ;
</PRE>
<P>
Building top-level phrases:
</P>
<PRE>
      PhrMS   : Pol -&gt; MS   -&gt; Phr ;
      PhrMQS  : Pol -&gt; MQS  -&gt; Phr ;
      PhrMImp : Pol -&gt; MImp -&gt; Phr ;
</PRE>
<P></P>
<A NAME="toc18"></A>
<H3>Instantiating multimodality to different languages</H3>
<P>
The implementation above has only used the resource grammar API,
not the concrete implementations. The library <CODE>Demonstrative</CODE>
is a <B>parametrized module</B>, also called a <B>functor</B>, which
has the following structure
</P>
<PRE>
    incomplete concrete DemonstrativeI of Demonstrative = 
      Cat, TenseX ** open Test, Structural in {
      
      -- lincat and lin rules
  
      }
</PRE>
<P>
It can be <B>instantiated</B> to different languages as follows.
</P>
<PRE>
    concrete DemonstrativeEng of Demonstrative = 
      CatEng, TenseX ** DemonstrativeI with
        (Test = TestEng),
        (Structural = StructuralEng) ;
  
    concrete DemonstrativeSwe of Demonstrative = 
      CatSwe, TenseX ** DemonstrativeI with
        (Test = TestSwe),
        (Structural = StructuralSwe) ;
</PRE>
<P></P>
<A NAME="toc19"></A>
<H3>Language-independent reimplementation of TramDemo</H3>
<P>
Again using the functor idea, we reimplement <CODE>TramDemo</CODE>
as follows:
</P>
<PRE>
  incomplete concrete TramI of Tram = open Multimodal in {
  
  lincat
    Query = Phr ; Input = MS ; 
    Dep, Dest = MAdv ; Click = Point ;
  lin
    QInput = PhrMS PPos ;
  
    GoFromTo x y = 
      MPredVP (DemNP (UsePron i_Pron)) 
        (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
  
    DepHere    = here7from_MAdv ;
    DestHere   = here7to_MAdv ;
    DepName s  = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
    DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
  }
</PRE>
<P>
Then we can instantiate this to all languages for which
the <CODE>Multimodal</CODE> API has been implemented:
</P>
<PRE>
    concrete TramEng of Tram = TramI with 
      (Multimodal = MultimodalEng) ;
  
    concrete TramSwe of Tram = TramI with 
      (Multimodal = MultimodalSwe) ;
  
    concrete TramFre of Tram = TramI with 
      (Multimodal = MultimodalFre) ;
</PRE>
<P></P>
<A NAME="toc20"></A>
<H3>The order problem</H3>
<P>
It was pointed out in the section on the multimodal conversion that
the concrete word order may be different from the abstract one,
and vary between different languages. For instance, Swedish
topicalization
</P>
	<BLOCKQUOTE>
	 Det här tåget vill den här kunden inte ta.
	</BLOCKQUOTE>
<P></P>
<P>
("this train, this customer doesn't want to take") may well have
an abstract syntax of a form in which the customer appears 
before the train.
</P>
<P>
This is a problem for the implementor of the resource grammar.
It means that some parts of the resource must be written manually
and not as a functor.
However, the <I>user</I> of the resource can safely
ignore the word order problem, if it is correctly dealt with in
the resource.
</P>
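<P>
The following Python sketch illustrates the issue with English glosses. It is
hypothetical throughout: the functions <CODE>plain</CODE> and <CODE>topicalized</CODE>,
the helper <CODE>cat</CODE>, and the coordinates are invented for illustration and do not
correspond to functions of the resource library.
</P>

```python
def cat(*parts):
    """Stand-in for GF's ++ : join non-empty token strings with single spaces."""
    return " ".join(p for p in parts if p)

# Two demonstrative noun phrases; the coordinates are made up for illustration.
customer = {"s": "this customer", "point": "(10,20)"}
train = {"s": "this train", "point": "(30,40)"}

def plain(subj, obj):
    # Subject precedes object in the string, so its click comes first.
    return {"s": cat(subj["s"], "doesn't want to take", obj["s"]),
            "point": cat(subj["point"], obj["point"])}

def topicalized(subj, obj):
    # Same abstract argument order, but the object is fronted in the string,
    # so the object's click must be fronted in the point field as well.
    return {"s": cat(obj["s"], subj["s"], "doesn't want to take"),
            "point": cat(obj["point"], subj["point"])}
```

<P>
Both functions take the customer first and the train second, yet
<CODE>topicalized</CODE> must list the train's click first. A language-independent
functor that always concatenated the points in abstract argument order would pair
the clicks with the wrong demonstratives here.
</P>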
<A NAME="toc21"></A>
<H3>A recipe for using the resource library</H3>
<P>
When starting to develop resource grammars, we believed they
would be all that
an application grammarian needs to write a concrete syntax.
However, experience has shown that it can be tough to start
grammar development in this way: selecting functions from
a resource API requires more abstract thinking than just
writing strings, and it takes longer to reach testable
results. The most lightweight approach is
perhaps to start with context-free grammars (a notation that
GF also supports). Context-free grammars that
give acceptable, even though overgenerating,
results for languages like English are quick to produce.
</P>
<P>
The experience has led to the following
steps for grammar development. While giving the work
a quick start, this recipe
increases abstraction at a later stage, when it is time
to localize the grammar to different languages.
If context-free notation is used, steps 1 and 2 can 
be merged.
</P>
<OL>
<LI>Encode the domain ontology in an abstract syntax, <CODE>Domain</CODE>.
<LI>Write a rough concrete syntax in English, <CODE>DomainRough</CODE>.
  This can be oversimplified and overgenerating. 
<LI>Reimplement by using the resource library, and build a functor <CODE>DomainI</CODE>.
  This can be helped by <B>example-based grammar writing</B>, where
  the examples are generated from <CODE>DomainRough</CODE>.
<LI>Instantiate the functor <CODE>DomainI</CODE> to different languages, 
  and test the results by generating linearizations.
<LI>If some rule is not satisfactory in some language, use the resource in
  a different way for that case (<B>compile-time transfer</B>).
</OL>


<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc multimodal.txt -->
</BODY></HTML>