DEVELOPING PART-OF-SPEECH TAGGING MODEL FOR GAMOTSTSO LANGUAGE

MIHIRETEAB THOMAS

URI: http://hdl.handle.net/123456789/1849

Date: 2012-10

Abstract:

For a computer to manipulate, analyze, and process human language, the language must be organized and structured in a form the computer can handle. Part-of-speech (POS) tagging is one of the tasks in Natural Language Processing (NLP): it labels each word with its appropriate part-of-speech tag. To the best of the researcher's knowledge, no POS tagging research has been conducted for the Gamotstso language, although taggers have been developed with various approaches for other local languages such as Amharic, Afaan Oromoo, Wolayta, Kafi-noonoo, and Tigrigna. In this study, a part-of-speech tagger for the Gamotstso language was developed using a Hidden Markov Model approach and a rule-based approach. The Viterbi algorithm was used for the Hidden Markov Model, and Brill's transformation-based error-driven learning was used for the rule-based approach, each with slight modifications to its modules to suit the nature of the language. Natural Language Toolkit (NLTK) version 3.4.5 and Python 2.7 were used to implement the tagger models and to conduct the experimental analysis. Linguists were consulted and the literature was reviewed to understand the morphological and grammatical structure of the language and to identify a suitable tagset for the study. As a result, a tagset of 25 tags was identified. A corpus of 1,346 sentences, comprising 25,512 words of which 6,919 are unique, was collected from the news of the FM 90.9 radio station and the Gamotstso New Testament. The corpus was split so that 80% was used to train the tagger models and the remaining 20% to test their performance. Both the Hidden Markov Model and the rule-based taggers were trained and tested on the same data.
The Hidden Markov Model taggers (unigram, bigram, and trigram) achieved accuracies of 89.6%, 90.6%, and 91.5%, respectively, and the rule-based taggers, which use the unigram, bigram, and trigram taggers as initial-stage taggers, achieved accuracies of 91%, 91.5%, and 92.2%, respectively. The performance analysis shows that the rule-based taggers outperform the Hidden Markov Model taggers. To improve tagger performance, a pre-prepared, standard, balanced corpus and a standard tagset are recommended for future work.
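
To make the Hidden Markov Model side of the method concrete, the following is a minimal sketch of bigram HMM tagging with Viterbi decoding and an 80/20 train/test split. It is not the thesis implementation: it does not use NLTK, it omits the Brill transformation-based stage, and the toy English sentences, the add-one smoothing, and all function names (split_corpus, train_hmm, viterbi, accuracy) are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Toy English data invented for illustration; the thesis corpus is not reproduced here.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]


def split_corpus(sentences, train_fraction=0.8):
    """Split sentences into training and test portions (80/20 by default)."""
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words
    # Each column maps tag -> (best log score ending in that tag, backpointer).
    columns = [{
        t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        columns.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


def accuracy(test_sentences, trans, emit, vocab):
    """Fraction of test tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in test_sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        predicted = viterbi(words, trans, emit, vocab)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


train, test = split_corpus(TAGGED)
trans, emit, vocab = train_hmm(train)
print(viterbi(["the", "dog", "barks"], trans, emit, vocab))
print(accuracy(test, trans, emit, vocab))
```

In a rule-based setup like the one described in the abstract, the output of such an initial tagger would then be corrected by learned transformation rules; that stage is not shown here.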
