A Thesis Submitted to the School of Graduate Studies in Partial Fulfillment of the Requirement for the Degree of Master of Science in Computer Science


dc.contributor.author Samuel Eto Salgedo
dc.date.accessioned 2025-10-23T07:01:43Z
dc.date.available 2025-10-23T07:01:43Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/123456789/2592
dc.description.abstract The rapid growth of data on the web has made it increasingly difficult to extract relevant information efficiently. To address this problem, numerous information extraction tasks have been explored in the literature. One such task is information extraction for the Gamotho language, which aims to identify key information in large text collections, organize it chronologically, and answer questions about what happened in a specific situation and when it occurred. Although tasks such as entity extraction have been widely studied for other languages, there is a notable research gap in text information extraction (IE) for the Gamotho language: to date, no work has been conducted in this specific area. As the first comprehensive effort in this field, the researcher designed a model for extracting information from Gamotho texts. The model consists of several components, including general preprocessing, learning and classification, and Gamotho language information extraction. Different approaches were employed for each task in the proposed model. For the Gamotho language information extraction component, the researcher used a machine learning classifier that draws on syntactic features such as part-of-speech (POS) tags, morphological analysis, and gazetteer lists. In practice, relying on a single information extraction method is difficult because annotated or labeled data are scarce and linguistic resources are lacking. To overcome these limitations and draw on the strengths of machine learning approaches, the researcher developed a machine learning approach for Gamotho text information extraction. The researcher conducted experiments with several information extraction algorithms, namely Bi-LSTM with CRF, Support Vector Machines (SVM), and fine-tuned BERT, for named entity recognition (NER), relationship extraction, and event extraction on Gamotho text.
All experiments used the commonly applied percentage-split option, with 70% of the data assigned to training. Of the 600 sentences in the dataset, 70% (420) were used for training and the remaining 30% (180) for testing. On the training and testing sets respectively, the Bi-LSTM+CRF model scored 80.5% and 80.9% for NER, 85.7% and 84.3% for relationship extraction, and 80.2% and 79.5% for event extraction. en_US
dc.language.iso en en_US
dc.subject Information extraction, machine learning, machine learning approach, Gamotho text information extraction en_US
dc.title A Thesis Submitted to the School of Graduate Studies in Partial Fulfillment of the Requirement for the Degree of Master of Science in Computer Science en_US
dc.type Thesis en_US
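
The 70/30 percentage split described in the abstract can be illustrated with a minimal sketch. It assumes the 600 sentences are shuffled once with a fixed seed before being cut; the record does not say whether the data were shuffled, and the function name `percentage_split` and the placeholder sentence IDs are hypothetical, used here only for illustration.

```python
import random


def percentage_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into (train, test).

    Shuffling and the fixed seed are assumptions made for this sketch;
    the abstract only states the 70%/30% proportions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 600 annotated sentences.
sentences = [f"sentence_{i}" for i in range(600)]
train, test = percentage_split(sentences)
print(len(train), len(test))  # 420 180
```

The printed sizes match the 420 training and 180 testing sentences reported in the abstract.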

