Abstract:
Amharic is written in the indigenous Ethiopic script, a syllabic writing system adapted
from the ancient Geez script. The Ethiopic script used by Amharic has about 317 different
symbols, of which 238 are basic characters, 50 are labialized characters, 20 are numerals, and 9 are punctuation
marks. Recently, optical character recognition for the Amharic script has become an area
of research interest because a large number of handwritten Amharic documents are held in
libraries, information centers, museums, and offices. Digitizing these manuscripts
allows existing language technologies to be applied to local information needs and developments.
Little research has been done on handwritten character recognition for the Amharic
script, and most of it uses datasets composed of text characters only, excluding
digits and punctuation marks. A complete handwritten character dataset for the Amharic script
that includes all text characters, digits, and punctuation marks is not available. As a result,
complete handwritten character recognition at this level is very challenging and
time-consuming. This research therefore focuses on handwritten digits and punctuation marks
only. In this research work, we develop a model for recognizing handwritten digits and punctuation
marks so that future researchers can integrate it with earlier work on handwritten
text character recognition and build a complete handwritten character recognition system for
the Amharic language. For this research work, 200 handwritten samples were collected for each
of the 29 characters (20 numerals and 9 punctuation marks), for a total of 5,800 (200 × 29)
samples. We used data augmentation to increase the amount of training data. Using a
convolutional neural network and a grid search over the network's hyperparameters, we
attained an accuracy of 96% on the training set and 95% on the validation and test sets.
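
As an illustration of the grid search step, the following minimal Python sketch enumerates
every combination in a small search space and keeps the configuration with the highest score.
The hyperparameter names and values, and the `placeholder_score` function, are assumptions made
for illustration only; the abstract does not state which hyperparameters or values were searched.
In practice the scoring function would train the convolutional network with the given
configuration and return its validation accuracy.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative assumptions,
# not the settings used in this research.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.001],
    "batch_size": [32, 64],
    "dropout": [0.25, 0.5],
}


def grid(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(space, evaluate):
    """Return (best_config, best_score) for the highest-scoring configuration."""
    best_config, best_score = None, float("-inf")
    for config in grid(space):
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def placeholder_score(config):
    """Stand-in for 'train the CNN with config and return validation accuracy'.

    This is an arbitrary toy function, not a measurement; it only makes the
    sketch runnable without a dataset or a deep learning library.
    """
    return (
        -abs(config["learning_rate"] - 0.001)
        - abs(config["dropout"] - 0.5)
        - abs(config["batch_size"] - 64) / 1000
    )


if __name__ == "__main__":
    best_config, best_score = grid_search(SEARCH_SPACE, placeholder_score)
    print(best_config)
```

The number of configurations is the product of the number of values per hyperparameter (here
2 × 2 × 2 = 8), so the cost of an exhaustive search grows quickly as the search space is extended.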