A Thesis Submitted to the School of Graduate Studies in Partial Fulfillment of the Requirement for the Degree of Master of Science in Computer Science

Samuel Eto Salgedo

URI: http://hdl.handle.net/123456789/2592

Date: 2025

Abstract:

The rapid growth of data on the web has made it increasingly challenging to extract relevant information efficiently. To address this issue, numerous information extraction (IE) tasks have been explored in the literature. One such task is information extraction for the Gamotho language, which aims to identify key information in large text collections, organize it chronologically, and answer questions about what happened in a specific situation and when it occurred. In contrast to other information extraction tasks, such as entity extraction, text information extraction for the Gamotho language remains a notable research gap: to date, no work has been conducted in this specific area. As the first comprehensive effort in this field, the researcher designed a model for extracting information from Gamotho texts. The model consists of several components, including general preprocessing, learning and classification, and Gamotho language information extraction. Different approaches were employed for each task. For the Gamotho language information extraction component, the researcher used a machine learning classifier that leverages syntactic features such as part-of-speech (POS) tagging, morphological analysis, and gazetteer lists. In practice, relying on a single information extraction method is challenging because annotated or labeled data are scarce and linguistic resources are lacking. To overcome these limitations while drawing on the strengths of machine learning, the researcher developed a machine learning approach for Gamotho text information extraction. The researcher conducted experiments with several information extraction algorithms, namely Bi-LSTM with CRF, Support Vector Machines (SVM), and fine-tuned BERT, for named entity recognition (NER), relationship extraction, and event extraction on Gamotho text.
All experiments used a percentage split, one of the most commonly used training options, with 70% of the data reserved for training. Of the 600 sentences in the dataset, 70% (420) were used for training and the remaining 30% (180) for testing. On the training and testing sets respectively, the Bi-LSTM+CRF model scored 80.5% and 80.9% for NER, 85.7% and 84.3% for relationship extraction, and 80.2% and 79.5% for event extraction.
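
The 70/30 percentage split described above can be illustrated with a minimal sketch. It assumes a seeded random shuffle before the cut, which the abstract does not specify. The function name `percentage_split`, the seed, and the placeholder sentence IDs are hypothetical and stand in for the actual dataset and tooling used in the thesis.

```python
import random


def percentage_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and split it into (train, test).

    Hypothetical helper for illustration; the shuffle and seed are
    assumptions, not details taken from the thesis.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # round() guards against floating-point error in the product.
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 600 annotated sentences.
sentences = [f"sentence_{i}" for i in range(600)]
train, test = percentage_split(sentences)
print(len(train), len(test))  # 420 180
```

Fixing the seed keeps the split reproducible, so that different models can be compared on the same 420 training and 180 testing sentences.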

Files in this item

Name: Sami final defence ...

Size: 3.029Mb

Format: PDF

This item appears in the following Collection(s)

Computer Science and Information Technology