AFAAN OROMOO INFORMATION EXTRACTION SYSTEM

dc.contributor.author	AMEHA GERO BERSISA
dc.date.accessioned	2019-11-14T11:32:13Z
dc.date.available	2019-11-14T11:32:13Z
dc.date.issued	2019-04
dc.identifier.uri	http://hdl.handle.net/123456789/1280
dc.description.abstract	In today's digital world, handling the overload of electronic sport news documents is a critical issue. These text documents are an essential source of significant information about the selected sport news articles. Hence, automated text analysis is an essential strategy when searching this domain for important information. Information extraction (IE) is a systematic method for analyzing and capturing the significant information contained in a given news text document. As recent studies have noted, the volume of Afaan Oromoo sport news text in electronic form keeps increasing. Reading through these documents to find the most relevant information on football news is time-consuming, tedious, and difficult for users. The main objective of this study is to develop an automated information extraction system for Afaan Oromoo text documents using a supervised machine learning classification approach. The system extracts the most relevant football news information from Afaan Oromoo sport news text documents, and its core consists of a training phase and a prediction phase. To implement the Afaan Oromoo Information Extraction System (AOIES), Afaan Oromoo sport news documents collected from the Afaan Oromoo broadcasting service of Radio Fana Share Company are used as the training and testing corpus. Tokenization, normalization, stop word removal, and regular expression methods are applied together with the Naïve Bayes classification algorithm to learn patterns from the corpus. The standard precision, recall, and F-score metrics are used to evaluate the text classification and IE models of the system prototype, and 10-fold cross-validation is applied when experimenting with the training and testing datasets.
The classification module achieved an F-score of 91.7% and the IE model an F-score of 94.6%. These results indicate that the prototype achieves promising performance using the Naïve Bayes classification algorithm. Overall, the evaluation demonstrates that a machine learning classification algorithm can be adopted as an information extraction method for Afaan Oromoo text documents.	en_US
dc.language.iso	en	en_US
dc.publisher	Arba Minch University	en_US
dc.subject	Afaan Oromoo, Machine Learning, Naïve Bayes, Information Extraction	en_US
dc.title	AFAAN OROMOO INFORMATION EXTRACTION SYSTEM	en_US
dc.type	Thesis	en_US
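
The pipeline named in the abstract (normalization, tokenization, stop word removal, Naïve Bayes classification, and precision/recall/F-score evaluation) can be illustrated with a minimal sketch. This is not the thesis implementation: the stop-word list, the toy documents, the labels, and all function and class names below are assumptions made for illustration, and the classifier shown is a plain multinomial Naïve Bayes with add-one smoothing, which may differ from the variant the author used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the list used in the thesis is not given in the abstract.
STOP_WORDS = {"fi", "kan", "akka", "irra"}


def preprocess(text):
    """Normalize to lowercase, tokenize with a regular expression, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        tokens = [t for t in preprocess(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods of the known tokens.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_score(gold, predicted, positive):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy strings invented for this sketch; they are not drawn from the thesis corpus.
train_docs = [
    "kubbaa miillaa garee",
    "garee kubbaa miillaa fi taphataa",
    "atileetiksii fiigicha",
    "fiigicha dheeraa atileetiksii",
]
train_labels = ["football", "football", "other", "other"]
model = NaiveBayes().fit(train_docs, train_labels)
```

With a model trained this way, `model.predict(text)` returns the most probable label for a new document, and `f_score(gold, predicted, "football")` scores a list of predictions against gold labels. A 10-fold cross-validation like the one described in the abstract would repeat the fit and score steps over ten train/test splits and average the results.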