myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy

dc.contributor.author: Shin Thant Phyo
dc.contributor.author: Pyae Linn
dc.contributor.author: Lynn Myat Bhone
dc.contributor.author: Thet Hmue Khin
dc.contributor.author: Eaint Kay Khaing Kyaw
dc.contributor.author: Ye Kyaw Thu
dc.date.accessioned: 2026-05-08T19:26:05Z
dc.date.issued: 2025-11-12
dc.description.abstract: This paper introduces myFoodQA (Myanmar (Burmese) Food Question Answering), the first multimodal benchmark designed to evaluate AI models on Myanmar's rich gastronomic culture. A core contribution of this work is the thorough construction of the benchmark itself, which involved curating a diverse set of food images for 20 distinct dishes and, crucially, generating a novel corpus of 2,485 question-answer pairs. The benchmark features tasks for single-image, multi-image, and text-only reasoning, specifically designed to evaluate model understanding of ingredient recognition, cultural context, preparation methods, and comparative logic. To ensure authenticity, data were collected from personal photography and web crawling, with all annotations, prompts, and questions validated by native Burmese speakers. Leading vision-language models were evaluated in a zero-shot setting, revealing a large performance disparity. While the models perform well on text-based tasks, their performance drops significantly on image-based reasoning, which requires specific image understanding and extensive cultural knowledge. These findings reveal the limitations of current large language models (LLMs) and vision-language models (VLMs) in the Myanmar gastronomic domain. Consequently, this work establishes myFoodQA as a foundational resource for advancing multimodal AI in culturally relevant and low-resource settings.
dc.identifier.doi: 10.1109/isai-nlp66160.2025.11320506
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/20409
dc.subject: Nutritional Studies and Diet
dc.subject: Culinary Culture and Tourism
dc.subject: Multimodal Machine Learning Applications
dc.title: myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy
dc.type: Article
