Modular LLM Architecture with Pluggable Reasoning Heads: A Scalable Approach to Multi-Modal AI Reasoning

Abstract

This paper introduces a modular architecture for Large Language Models (LLMs) that incorporates pluggable, domain-specific reasoning heads to augment the model's capabilities beyond conventional text generation. Central to our approach is an attention-routing controller that classifies each user prompt and dispatches it to the appropriate reasoning module (a symbolic, logical, or graph-based head) according to the prompt's structure and intent. This design enables hybrid, multi-paradigm reasoning without retraining or fine-tuning the base LLM. By decoupling reasoning tasks from general language understanding, our system improves both computational efficiency and interpretability. We demonstrate the architecture using Groq as the base LLM and integrate lightweight engines such as SymPy for symbolic mathematics and custom modules for logical and graph-based reasoning. Experiments across a suite of structured and unstructured prompts show a significant reduction in inference latency and token usage, along with higher accuracy and better explainability. The proposed framework offers a scalable foundation for embedding modular reasoning capabilities into modern LLM-driven applications.
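
To make the routing idea concrete, the sketch below shows one way such a dispatch layer might look in Python. It is a minimal illustration under stated assumptions, not the paper's implementation: the keyword-based route function stands in for the attention-routing controller, and symbolic_head is a hypothetical SymPy-backed head that only handles single-variable equations.

```python
import sympy as sp

def route(prompt: str) -> str:
    # Toy stand-in for the paper's attention-routing controller:
    # choose a reasoning head from surface features of the prompt.
    lowered = prompt.lower()
    if any(tok in lowered for tok in ("solve", "integrate", "simplify", "=")):
        return "symbolic"
    if any(tok in lowered for tok in ("implies", "therefore", "entails")):
        return "logical"
    if any(tok in lowered for tok in ("graph", "node", "shortest path")):
        return "graph"
    return "base_llm"  # fall back to the unmodified base model

def symbolic_head(prompt: str) -> str:
    # Hypothetical SymPy-backed head: handles prompts of the form
    # "solve <expr> = <expr>" in the single variable x.
    lhs, rhs = prompt.lower().removeprefix("solve").split("=")
    x = sp.Symbol("x")
    solutions = sp.solve(sp.sympify(lhs) - sp.sympify(rhs), x)
    return f"x = {solutions}"

if __name__ == "__main__":
    prompt = "solve x**2 - 4 = 0"
    head = route(prompt)              # -> "symbolic"
    if head == "symbolic":
        print(symbolic_head(prompt))  # -> x = [-2, 2]
```

Because every head sits behind the same dispatch interface, a new head can be registered without touching the base model, which is the property the abstract describes as pluggability.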
