MULTIMODAL LARGE LANGUAGE MODELS FOR LOW-RESOURCE LANGUAGES AND GLOBAL SOUTH DEPLOYMENT: A COMPREHENSIVE SURVEY OF ARCHITECTURES, BENCHMARKS, AND SOCIOTECHNICAL CHALLENGES
Abstract
Multimodal Large Language Models (MLLMs) such as LLaVA, InstructBLIP, and Qwen2-VL have unlocked joint reasoning over text and images at an unprecedented scale. The technical literature on MLLM efficiency, domain adaptation, and benchmarking has matured into a substantial corpus. However, the overwhelming majority of surveys are written from the vantage point of high-resource English-language datasets and well-resourced computing infrastructure. This survey takes a different angle. We consolidate the recent literature on multimodal language understanding through the lens of low-resource languages and Global South deployment contexts, where data scarcity, compute constraints, intermittent connectivity, and sociotechnical trust considerations interact in ways that high-resource surveys rarely address. We propose a four-quadrant taxonomy that organises the field around linguistic coverage, data scarcity, compute constraints, and sociotechnical trust. We trace the evolution of multilingual multimodal architectures across an eight-year arc, map the benchmark landscape against nine languages and five modalities, and describe a three-tier edge-regional-global deployment topology suited to low-resource environments. We survey five high-impact application domains: healthcare, agriculture, education, disaster response, and public service. A dedicated section examines sociotechnical considerations, including linguistic justice, algorithmic fairness, and regulatory readiness. We identify six concrete open research problems and outline a future research agenda. The survey is intended as a single-source reference for researchers, policymakers, and practitioners pursuing equitable multimodal AI deployment in low-resource contexts.
License
Copyright (c) 2026 Science World Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.