Myanmar Text Grade-Level Prediction Using Statistical and Linguistics Features

Khaing Hsu Wai; Hlaing Myat Nwe; Thura Aung; Kaung Khant Si Thu; Hsu Yee Mon; Seng Pan; Thiha Nyein; Yu Myat Moe; Ye Kyaw Thu; Thazin Myint Oo

doi:10.1109/isai-nlp66160.2025.11320515

dc.contributor.author	Khaing Hsu Wai
dc.contributor.author	Hlaing Myat Nwe
dc.contributor.author	Thura Aung
dc.contributor.author	Kaung Khant Si Thu
dc.contributor.author	Hsu Yee Mon
dc.contributor.author	Seng Pan
dc.contributor.author	Thiha Nyein
dc.contributor.author	Yu Myat Moe
dc.contributor.author	Ye Kyaw Thu
dc.contributor.author	Thazin Myint Oo
dc.date.accessioned	2026-05-08T19:26:05Z
dc.date.issued	2025-11-12
dc.description.abstract	Readability assessment supports curriculum design and adaptive learning by estimating the difficulty of a text for readers at different proficiency levels. However, text readability for the Myanmar language remains unexplored, mainly due to the absence of labeled resources and established computational approaches. This paper presents the first comprehensive study on Myanmar text readability classification. We construct a grade-annotated corpus from official Myanmar school textbooks (Grades 1-12) and extract linguistic, statistical, and Myanmar-specific indicators. We then evaluate regression and classification baselines, text-only embeddings, and ensembles. We also adapt three classic readability formulas (LIX, Dale-Chall, Flesch) to Myanmar and empirically show that their score distributions overlap heavily across educational levels. Experimental results show that ensemble-based models achieve the best performance in predicting grade levels, demonstrating the effectiveness of our feature design and modeling framework. This work introduces the first readability dataset, modeling approaches, and benchmark results for the Myanmar language, providing a strong foundation for future research in readability prediction, low-resource natural language processing (NLP), and educational text analysis.
dc.identifier.doi	10.1109/isai-nlp66160.2025.11320515
dc.identifier.uri	https://dspace.kmitl.ac.th/handle/123456789/20402
dc.subject	Text Readability and Simplification
dc.subject	Second Language Acquisition and Learning
dc.subject	Natural Language Processing Techniques
dc.title	Myanmar Text Grade-Level Prediction Using Statistical and Linguistics Features
dc.type	Article
