| dc.contributor.author | Deresse Demeke Shallo | |
| dc.date.accessioned | 2016-07-26T06:34:29Z | |
| dc.date.available | 2016-07-26T06:34:29Z | |
| dc.date.issued | 2016-05 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/317 | |
| dc.description.abstract | Stemming is the conflation of the variant forms of a word into a single representation. The aim of this study is to develop a stemming algorithm for Gamo language text documents that conflates such variants. Gamo is a language of the Omotic family, spoken by the Gamo ethnic group in the Gamo Gofa zone of SNNPR and in other parts of Ethiopia. The language is used in many social activities and, in education, as a medium of instruction in most parts of the Gamo Gofa zone. As far as the researcher knows, no one has previously tried to develop a stemming algorithm for Gamotho. There are different approaches to developing a stemming algorithm for a given language, such as affix removal, n-gram, table lookup and successor variety stemming. Of these, the researcher adopted the Porter stemmer (an affix removal approach), the n-gram approach and the successor variety approach, implemented in the Python programming language. To develop and test the stemmers, a corpus of 62,716 words was prepared from various documents written in the Gamo language, covering social, cultural, spiritual and political issues. To test the performance of the stemmers, a test set of 1,000 unique words was randomly selected from the corpus. The results show that the Porter stemmer achieved an accuracy of 81.1%, the n-gram stemmer 30% and the successor variety stemmer 34.3%. The major sources of error are also reported, together with recommendations for improving the performance of the stemmers and for further research. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Arbaminch University | en_US |
| dc.subject | stemming algorithm, n-gram, successor variety, table lookup, under-stemming, over-stemming, text mining, IR, NLP, conflation method | en_US |
| dc.title | Adopting Stemming Algorithm for Gamotho Text Documents | en_US |
| dc.type | Thesis | en_US |
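
The abstract names n-gram conflation as one of the three approaches evaluated. This record does not include the thesis code, so the following is only a minimal Python sketch of a common form of that idea: comparing two words by the Dice coefficient over their unique character bigrams. The function names `bigrams` and `dice_similarity` are hypothetical, and the English example words are placeholders, not Gamo data from the study.

```python
def bigrams(word):
    """Return the set of unique adjacent character pairs in a word."""
    word = word.lower()
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over unique bigrams: 2 * |A & B| / (|A| + |B|).

    Assumption: this is one textbook n-gram similarity measure; the
    thesis may use a different n or a different measure.
    """
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a or not grams_b:
        # Words shorter than two characters have no bigrams to compare.
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Placeholder English words: 6 shared bigrams out of 7 and 8 unique ones.
score = dice_similarity("statistics", "statistical")
```

Word pairs whose score exceeds a chosen threshold would be grouped as variants of the same stem. The threshold is a tuning parameter, and this record does not state what value, if any, the study used.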