MODELING QEBENA-ENGLISH CORPUS BASED BILINGUAL DICTIONARY: USING STATISTICAL MACHINE TRANSLATION

dc.contributor.author	Gragn Kedir Nassir
dc.date.accessioned	2016-05-31T06:22:13Z
dc.date.available	2016-05-31T06:22:13Z
dc.date.issued	2014-09
dc.identifier.uri	http://hdl.handle.net/123456789/294
dc.description.abstract	In the world we live in today, understanding language and its usage is very important. Dictionaries have been used for many years to translate texts or simply to look up single words. They have become an essential part of modern life and play a vital role in increasing communication among people. Many dictionaries are available for nearly every combination of popular languages, and they often exist for language pairs in which one language is popular and the other is not. Bilingual dictionaries for uncommon languages are much less likely to exist. Hence, the need to develop bilingual dictionaries for uncommon languages such as Qebena becomes more noticeable. Therefore, the aim of this study was to model a Qebena-English corpus-based bilingual dictionary and to analyze the output of the system, with the objective of identifying the challenges that need to be tackled. The development approach used in this study was corpus-based statistical machine translation (SMT), which allows a translation model to be trained on parallel corpora. Parallel corpora play a fundamental role in modeling bilingual dictionaries in an SMT system. Today, a large number of parallel corpora are available for most major languages; however, far fewer appear to be available for the Qebena-English language pair. Consequently, finding parallel corpora for the Qebena-English language pair was a challenging task in this study. The only parallel documents found for this language pair were the Grammar of Qebena from the council of the Southern Nations, Nationalities, and Peoples (SNNP) and some parts of the Grade 4 Qebena student's textbook from an elementary school in Qebena woreda. The resulting parallel corpus of 5,144 sentences containing 15,951 words was used for training, tuning, and testing the translation system. 
In order to measure the accuracy of the translation system, the experiment was conducted on parallel Qebena-English data, distinct from the training data, consisting of 240 Qebena-English sentences containing 1,550 words. The resulting BLEU score was 32.49%. A 6% increase in BLEU score was achieved by tuning the translation system. Due to the scarcity of resources for the Qebena language, the researcher could not obtain as many parallel documents as needed for the experiment. However, even with a limited corpus, the translation accuracy achieved was not too low compared with systems built on a relatively sufficient amount of resources. Therefore, based on the results obtained from the model, it is possible, given large parallel Qebena-English document collections, to develop a bilingual dictionary using the statistical machine translation approach for the Qebena-English language pair.	en_US
dc.language.iso	en	en_US
dc.publisher	ARBA MINCH UNIVERSITY	en_US
dc.subject	Communication, Bilingual dictionary, Parallel Corpora, Statistical Machine Translation	en_US
dc.title	MODELING QEBENA-ENGLISH CORPUS BASED BILINGUAL DICTIONARY: USING STATISTICAL MACHINE TRANSLATION	en_US
dc.type	Thesis	en_US