Myanmar Text Grade-Level Prediction Using Statistical and Linguistics Features

dc.contributor.author: Khaing Hsu Wai
dc.contributor.author: Hlaing Myat Nwe
dc.contributor.author: Thura Aung
dc.contributor.author: Kaung Khant Si Thu
dc.contributor.author: Hsu Yee Mon
dc.contributor.author: Seng Pan
dc.contributor.author: Thiha Nyein
dc.contributor.author: Yu Myat Moe
dc.contributor.author: Ye Kyaw Thu
dc.contributor.author: Thazin Myint Oo
dc.date.accessioned: 2026-05-08T19:26:05Z
dc.date.issued: 2025-11-12
dc.description.abstract: Readability assessment supports curriculum design and adaptive learning by estimating the difficulty of a text for readers at different proficiency levels. However, text readability for the Myanmar language remains unexplored, mainly due to the absence of labeled resources and established computational approaches. This paper presents the first comprehensive study on Myanmar text readability classification. We construct a grade-annotated corpus from official Myanmar school textbooks (Grades 1-12) and extract linguistic, statistical, and Myanmar-specific indicators. We then evaluate regression and classification baselines, text-only embeddings, and ensembles. We also adapt three classic readability formulas (LIX, Dale-Chall, Flesch) to Myanmar and empirically show that their score distributions overlap heavily across educational levels. Experimental results show that ensemble-based models achieve the best performance in predicting grade levels, demonstrating the effectiveness of our feature design and modeling framework. This work introduces the first readability dataset, modeling approaches, and benchmark results for the Myanmar language, providing a strong foundation for future research in readability prediction, low-resource natural language processing (NLP), and educational text analysis.
dc.identifier.doi: 10.1109/isai-nlp66160.2025.11320515
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/20402
dc.subject: Text Readability and Simplification
dc.subject: Second Language Acquisition and Learning
dc.subject: Natural Language Processing Techniques
dc.title: Myanmar Text Grade-Level Prediction Using Statistical and Linguistics Features
dc.type: Article