myFoodQA: A Multimodal Dataset for Evaluating Cultural and Visual Reasoning in Myanmar Gastronomy
Publisher
Zenodo (CERN European Organization for Nuclear Research)
Abstract
This paper presents myFoodQA (Myanmar Food Question Answering), the first multimodal benchmark focused on Myanmar's rich gastronomic culture. We constructed a novel dataset containing 2,485 question-answer pairs covering 20 distinct dishes, with all data sourced from personal photography and web crawling and subsequently validated by native Burmese speakers for authenticity. The benchmark is designed to test single-image, multi-image, and text-only reasoning, evaluating a model's understanding of ingredients, cultural context, preparation methods, and comparative logic. Our zero-shot evaluation of leading vision-language models (VLMs) reveals a significant performance gap: while models perform well on text-based tasks, they fall markedly short on image-based reasoning, which requires fine-grained visual understanding and deep cultural knowledge. These findings expose the limitations of current models in the Myanmar gastronomic domain. We establish myFoodQA as a foundational resource for advancing culturally aware multimodal AI, particularly in low-resource settings.