myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy

dc.contributor.author: Shin Thant Phyo
dc.contributor.author: Pyae Linn
dc.contributor.author: Thet Hmue Khin
dc.contributor.author: Lynn Myat Bhone
dc.contributor.author: Eaint Kay Khaing Kyaw
dc.contributor.author: Ye Kyaw Thu
dc.date.accessioned: 2026-05-08T19:25:53Z
dc.date.issued: 2025-11-05
dc.description.abstract: This paper presents myFoodQA (Myanmar Food Question Answering), the first multimodal benchmark focused on Myanmar's rich gastronomic culture. We constructed a novel dataset of 2,485 question-answer pairs covering 20 distinct dishes, with all data sourced from personal photography and web crawling and subsequently validated by native Burmese speakers for authenticity. The benchmark tests single-image, multi-image, and text-only reasoning, evaluating a model's understanding of ingredients, cultural context, preparation methods, and comparative logic. Our zero-shot evaluation of leading vision-language models (VLMs) reveals a clear performance gap: while the models perform well on text-only tasks, they show a marked deficit in image-based reasoning, which demands both fine-grained visual understanding and deep cultural knowledge. These findings expose the limitations of current models in the Myanmar gastronomic domain. We establish myFoodQA as a foundational resource for advancing culturally aware multimodal AI, particularly in low-resource settings.
dc.identifier.doi: 10.5281/zenodo.17531976
dc.identifier.uri: https://dspace.kmitl.ac.th/handle/123456789/20309
dc.publisher: Zenodo (CERN European Organization for Nuclear Research)
dc.title: myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy
dc.type: Other
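
For concreteness, the zero-shot evaluation described in the abstract could be run with a loop like the minimal Python sketch below. The JSON-lines record layout, the query_vlm helper, and exact-match scoring are all assumptions of ours for illustration; this record does not specify the dataset format, the model interface, or the metric actually used.

```python
import json


def query_vlm(question: str, image_paths: list[str]) -> str:
    """Hypothetical stand-in for a call to the VLM under test
    (e.g. an HTTP request to a hosted model). The record does not
    specify the evaluation interface, so this must be filled in."""
    raise NotImplementedError


def evaluate(qa_file: str) -> float:
    """Exact-match accuracy over one split of the benchmark.

    Assumed (hypothetical) layout: one JSON object per line with a
    'question', a gold 'answer', and a possibly empty 'images' list,
    covering the text-only (0 images), single-image (1 image), and
    multi-image (2+ images) settings described in the abstract.
    """
    correct, total = 0, 0
    with open(qa_file, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            prediction = query_vlm(record["question"], record.get("images", []))
            correct += int(prediction.strip() == record["answer"].strip())
            total += 1
    return correct / total if total else 0.0
```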
