| dc.description.abstract |
In today's digital world, handling the overload of electronic sports news documents is a
critical issue. These text documents are an essential source of the most significant information about
a selected sports news article. Hence, automated text analysis is an essential strategy in this domain
when searching for such information. Information extraction (IE) is a
systematic method for analyzing and capturing the significant
information contained in a given news text document. According to recent
studies, the volume of Afaan Oromoo sports news text available in electronic form is
increasing over time. Reading through these documents to find and access
the most relevant information on a football news topic is a time-consuming, tedious, and difficult
task for users. The main objective of this study is to develop an automated information extraction
system for Afaan Oromoo text documents using a supervised machine learning
classification approach. The system extracts the most relevant football news information from
Afaan Oromoo sports news text documents and is built around training and prediction phases.
To implement the Afaan Oromoo information extraction system (AOIES), Afaan Oromoo sports news
documents collected from the Radio Fana Share Company Afaan Oromoo broadcasting service were
used as the training and testing corpus. Tokenization, normalization, stop-word removal, and
regular expression methods, together with the Naïve Bayes machine learning classification
algorithm, were applied to learn patterns from this corpus. Standard precision, recall, and
F-score metrics were used to evaluate the text classification and IE models of the developed
system prototype. The proposed model was evaluated on the training and testing dataset using
10-fold cross-validation.
The classification module of the developed system achieved an F-score of 91.7%, and the IE
model an F-score of 94.6%. These results indicate that the system prototype performs
promisingly with the Naïve Bayes classification algorithm. Overall, the evaluation
demonstrates that a machine learning classification algorithm can be adopted as an
information extraction method for Afaan Oromoo text documents. |
en_US |