myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy
| dc.contributor.author | Shin Thant Phyo | |
| dc.contributor.author | Pyae Linn | |
| dc.contributor.author | Thet Hmue Khin | |
| dc.contributor.author | Lynn Myat Bhone | |
| dc.contributor.author | Eaint Kay Khaing Kyaw | |
| dc.contributor.author | Ye Kyaw Thu | |
| dc.date.accessioned | 2026-05-08T19:25:53Z | |
| dc.date.issued | 2025-11-05 | |
| dc.description.abstract | This paper presents myFoodQA (Myanmar Food Question Answering), the first multimodal benchmark focused on Myanmar's rich gastronomic culture. We constructed a novel dataset containing 2,485 question-answer pairs covering 20 distinct dishes, with all data sourced from personal photography and web crawling and subsequently validated by native Burmese speakers for authenticity. The benchmark is designed to test single-image, multi-image, and text-only reasoning, evaluating a model's understanding of ingredients, cultural context, preparation methods, and comparative logic. Our zero-shot evaluation of leading vision-language models (VLMs) reveals a significant performance gap: while models perform well on text-based tasks, they show a marked deficit in image-based reasoning, which requires specific visual understanding and deep cultural knowledge. These findings expose the limitations of current models in the Myanmar gastronomic domain. We establish myFoodQA as a foundational resource for advancing culturally-aware multimodal AI, particularly in low-resource settings. | |
| dc.identifier.doi | 10.5281/zenodo.17531977 | |
| dc.identifier.uri | https://dspace.kmitl.ac.th/handle/123456789/20310 | |
| dc.publisher | Zenodo (CERN European Organization for Nuclear Research) | |
| dc.title | myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy | |
| dc.type | Other |