Modular LLM Architecture with Pluggable Reasoning Heads: A Scalable Approach to Multi-Modal AI Reasoning

dc.contributor.author	Anurag Pathak
dc.contributor.author	Dilip Kumar Sharma
dc.contributor.author	Harshada Agrawal
dc.contributor.author	Rathachai Chawuthai
dc.contributor.author	Jirayu Petchhan
dc.date.accessioned	2026-05-08T19:25:55Z
dc.date.issued	2025-08-05
dc.description.abstract	This paper introduces a modular architecture for Large Language Models (LLMs) that incorporates pluggable, domain-specific reasoning heads to augment the model's capabilities beyond conventional text generation. Central to our approach is an attention-routing controller that classifies user prompts and dispatches them to the appropriate reasoning module (a symbolic, logical, or graph-based head) based on the prompt's structure and intent. This design enables hybrid, multi-paradigm reasoning without requiring retraining or fine-tuning of the base LLM. By decoupling reasoning tasks from general language understanding, our system improves both computational efficiency and interpretability. We demonstrate the architecture using Groq as the base LLM and integrate lightweight engines such as SymPy for symbolic mathematics and custom modules for logical and graph-based reasoning. Experiments across a suite of structured and unstructured prompts show a significant reduction in inference latency and token usage, along with higher accuracy and better explainability. The proposed framework offers a scalable foundation for embedding modular reasoning capabilities into modern LLM-driven applications.
dc.identifier.doi	10.1109/etncc66224.2025.11299615
dc.identifier.uri	https://dspace.kmitl.ac.th/handle/123456789/20338
dc.subject	Topic Modeling
dc.subject	Advanced Graph Neural Networks
dc.subject	Multimodal Machine Learning Applications
dc.title	Modular LLM Architecture with Pluggable Reasoning Heads: A Scalable Approach to Multi-Modal AI Reasoning
dc.type	Article

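The abstract describes an attention-routing controller that dispatches prompts to pluggable reasoning heads, with SymPy backing the symbolic head. Below is a minimal, illustrative sketch of that routing idea, assuming a rule-based stand-in for the controller and hypothetical names (route, symbolic_head, base_llm_head); it is not the authors' implementation, whose controller classifies prompts from attention features rather than pattern matching.

# Illustrative sketch only: a toy routing controller dispatching prompts to a
# pluggable SymPy-backed symbolic head or a stubbed base-LLM fallback.
# All names are hypothetical; this is not the paper's implementation.
import re

from sympy import SympifyError, simplify, sympify


def symbolic_head(prompt: str) -> str:
    """Pluggable symbolic-math head backed by SymPy."""
    # Pull out the first expression-like substring (digits and arithmetic
    # operators); a naive stand-in for the prompt parsing a real head would do.
    match = re.search(r"\d[\d\s+\-*/().]*", prompt)
    if match is None:
        return "symbolic head: no expression found"
    try:
        return str(simplify(sympify(match.group().strip())))
    except SympifyError:
        return "symbolic head: could not parse the expression"


def base_llm_head(prompt: str) -> str:
    """Fallback path to the base LLM for unstructured prompts (stubbed here)."""
    return f"[base LLM response to: {prompt!r}]"


def route(prompt: str) -> str:
    """Toy routing controller: classify the prompt and dispatch it to a head."""
    # Arithmetic-looking prompts go to the symbolic head; everything else
    # falls through to the base model.
    if re.search(r"\d\s*[-+*/]\s*\d", prompt):
        return symbolic_head(prompt)
    return base_llm_head(prompt)


if __name__ == "__main__":
    print(route("What is 12*(3 + 4)/6?"))         # routed to the symbolic head -> 14
    print(route("Summarise the plot of Hamlet"))  # routed to the base LLM stub

Because the dispatch decision sits outside the base model, a new head can be registered without retraining the LLM, which is the decoupling the abstract emphasizes.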