| dc.description.abstract |
In today's world, understanding language and its usage is very important. Dictionaries have been
used for many years to translate texts or simply to look up single words. They have become an essential part of
modern life and play a vital role in increasing communication among people. Many dictionaries are
available for nearly every combination of popular languages. They also often exist for language pairs
in which one language is popular and the other is not. Bilingual dictionaries for uncommon languages are much
less likely to exist. Hence, the need to develop bilingual dictionaries for uncommon languages such as
Qebena becomes more noticeable.
Therefore, the aim of this study was to model a corpus-based Qebena-English bilingual dictionary and to
analyze the output of the system in order to identify the challenges that need to be tackled. The
development approach used in this study was corpus-based statistical machine translation (SMT), which
allows a translation model to be trained on parallel corpora. Parallel corpora play a fundamental role in
modeling bilingual dictionaries in an SMT system. Today, a large number of parallel corpora are
available for most major languages; however, far fewer appear to be available for the
Qebena-English language pair. Consequently, finding parallel corpora for the Qebena-English language pair
was the most challenging task in this study. The only parallel documents found for this language pair
were the Grammar of Qebena from the Council of South Nation Nationality People (SNNP) and parts of the Grade
4 Qebena student's textbook from an elementary school in Qebena woreda. The resulting parallel corpus of
5,144 sentences containing 15,951 words was used for training, tuning, and testing the translation system.
To measure the accuracy of the translation system, the experiment was conducted on
parallel Qebena-English data distinct from the training data, consisting of 240 Qebena-English sentences
with 1,550 words. The resulting BLEU score was 32.49%. A 6% increase in BLEU score was
achieved by tuning the translation system. Due to the scarcity of resources for the Qebena language, the
researcher could not obtain as many parallel documents as the experiment needed. However, even with a limited
corpus, the translation accuracy achieved was not much lower than that of systems built on a relatively
sufficient amount of resources. Therefore, based on the results obtained from the model, it is possible, given
large parallel Qebena-English document collections, to develop a bilingual dictionary using the
statistical machine translation approach for the Qebena-English language pair. |
en_US |