From b3c302ca6fa99abaa5cbc3ed69f138aecc9d7e98 Mon Sep 17 00:00:00 2001
From: aarne
Date: Tue, 1 Jun 2010 22:48:43 +0000
Subject: updated phrasebook doc
---
 examples/phrasebook/phrasebook.html | 466 ++++++++++++++++++++++++++++++++----
 1 file changed, 425 insertions(+), 41 deletions(-)
diff --git a/examples/phrasebook/phrasebook.html b/examples/phrasebook/phrasebook.html
index fae61468a..2d36e5fc0 100644
--- a/examples/phrasebook/phrasebook.html
+++ b/examples/phrasebook/phrasebook.html
@@ -2,6 +2,7 @@
+ MOLTO Multilingual Phrasebook

MOLTO Multilingual Phrasebook

@@ -10,6 +11,25 @@ Showcase for project FP7-ICT-247914, Deliverable D10.2.
+

+
+

+ + +

+
+


@@ -18,6 +38,8 @@ Showcase for project FP7-ICT-247914, Deliverable D10.2.
History

-

Acknowledgements

+ +

Effort and cost

Language | Grammarian's language skills | Grammarian's GF skills | Informant used for development | Informant used for testing | Use of external tools | Impact of external tools | Changes on the resource grammar | Development time
Bulgarian######---?###
Catalan######---?##
Danish-###+++######
Dutch-###+++#####
English#####-+--_#
Finnish######---?###
French#####-+-?##
German####+++#######
Italian####---?####
Norwegian####+-+#####
Polish######+++####
Romanian######--+#######
Spanish###---?_##
Swedish#####-+-?-##
+ +

+Explanation on scores +


Example-based grammar writing prototype

+

+The figure presents the process of creating a Phrasebook using an example-based
+approach for a language X, where X is one of Danish, Dutch, German, or Norwegian.
+

+

+ +

+ + +

+The time spent preparing the configuration files for a grammar will not have to be
+spent again, since the files are reusable for other applications.
+The time for the second step can be saved if automatic tools such as Google Translate
+are used. This is only possible for languages with simpler morphology and syntax
+and with large corpora available.
+Good results were obtained for German and Dutch with Google Translate, but for
+languages like Romanian or Polish, which are both complex and lack sufficient resources,
+the results are discouraging.
+

+

+If the statistical oracle works well, the only step that requires a human
+translator is the evaluation and feedback step. For the languages in our
+experiment, this took on average 2 rounds of about 4 hours each.
+More complex languages may require more effort.
+

+ +

Conclusions (tentative)

+

+The grammarian need not be a native speaker of the language. +

+

+For many languages, the grammarian need not even know the language; native informants are
+enough.
+

+

+However, evaluation by native speakers is necessary. +

+

+Correct and idiomatic translations are possible. +

+

+A typical development time was 2-3 person working days per language. +

+

+Google Translate helps in bootstrapping grammars, but its output must be checked.
+

+ + +

+Resource grammars should give some more support +

+ + + +

Acknowledgements

The Phrasebook has been built in the MOLTO project funded by the European Commission.

The authors are grateful to their native speaker informants helping to bootstrap and evaluate
-the grammars: Richard Bubel, Grégoire Détrez, Michal Palka, Willard Rafnsson,...
+the grammars:
+Richard Bubel,
+Grégoire Détrez,
+Karin Keijzer,
+Michał Pałka,
+Willard Rafnsson,
+Nick Smallbone.

- +