Journal & Conference Proceeding Publications

ID Code : CSC 0066
Title : Towards Stemming Error Reduction for Malay Texts
Author/s : Mohamad Nizam Kassim; Shaiful Hisham Mat Jali; Mohd Aizaini Maaruf; Anazida Zainal
Abstract : A text stemmer is a useful language pre-processing tool in information retrieval, text mining and natural language processing. It maps morphological variants of words to their base forms. Most current text stemmers for the Malay language focus on removing affixes, clitics, and particles from affixed words. However, these stemmers still suffer from stemming errors because they do not sufficiently address the root causes of those errors. This paper investigates the root causes of stemming errors and proposes a stemming technique to address them. The proposed text stemmer combines an affix removal method with multiple dictionary lookups to address the various root causes of stemming errors. The experimental results show promising stemming accuracy and a reduction in the various possible stemming errors.
Publication : Lecture Notes in Electrical Engineering – Computational Science and Technology
Year Published : 2018
PDF / Official URL :
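
Illustrative sketch : The abstract describes combining affix removal with dictionary lookups. The Python sketch below shows one way such a combination could work; it is not the authors' algorithm. The names `ROOT_WORDS`, `PREFIXES`, `SUFFIXES` and `stem` are hypothetical, the affix lists are small illustrative subsets, and the dictionary is a toy stand-in for a real Malay lexicon. The word is looked up before any stripping (so a root such as "makan" is not over-stemmed to "mak"), and every stripped candidate is accepted only if it is found in the dictionary.

```python
# Hypothetical toy dictionary of root words; a real stemmer would load a full lexicon.
ROOT_WORDS = {"makan", "ajar", "baca", "main", "minum", "tulis"}

# Illustrative subsets of Malay prefixes and of suffixes, clitics and particles.
# These are assumptions for the sketch, not the rule set used in the paper.
PREFIXES = ("mem", "men", "meng", "me", "ber", "bel", "di", "ter", "pel", "pe")
SUFFIXES = ("kan", "an", "i", "nya", "lah", "kah")


def stem(word, roots=ROOT_WORDS):
    """Return a dictionary-confirmed root for `word`, or `word` unchanged."""
    word = word.lower()

    # Lookup 1: a word that is already a root is returned untouched.
    # This prevents over-stemming roots that merely look affixed.
    if word in roots:
        return word

    candidates = []

    # Candidates with only a prefix removed.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            candidates.append(word[len(prefix):])

    # Candidates with only a suffix removed.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidates.append(word[:-len(suffix)])

    # Candidates with both a prefix and a suffix removed.
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if (word.startswith(prefix) and word.endswith(suffix)
                    and len(word) > len(prefix) + len(suffix)):
                candidates.append(word[len(prefix):-len(suffix)])

    # Lookup 2: accept the first candidate confirmed by the dictionary.
    for candidate in candidates:
        if candidate in roots:
            return candidate

    # No confirmed root: leave the word as it is rather than guess.
    return word


if __name__ == "__main__":
    for w in ["makan", "makanan", "membaca", "diajarkan", "belajar", "komputer"]:
        print(w, "->", stem(w))
```

Returning the word unchanged when no candidate is confirmed trades some under-stemming for fewer wrong roots; a fuller implementation would also need to handle spelling changes at prefix boundaries, which this sketch ignores.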