MARVIS: Modality Adaptive Reasoning over VISualizations

Benjamin Feuer (Stanford University), Lennart Purucker (Prior Labs), Oussama Elachqar (Oumi), Chinmay Hegde (New York University)

Architectural Patterns & Composition

MARVIS converts latent embeddings from small specialized ML models into visual representations, then uses a VLM's spatial reasoning to make predictions on non-traditional modalities and long-tail domains. It achieves competitive accuracy without requiring raw data exposure or retraining the underlying specialized models.

Presentation

Talk

Paper Session 8: AI Systems in Practice

Friday, May 29 · 1:20 PM – 1:30 PM

Bayshore Ballroom

Poster

Friday, May 29 · 1:45 PM – 3:15 PM

Carmel / Monterey

Abstract

Predictive applications of machine learning often rely on small (sub-1B parameter) specialized models tuned to particular domains or modalities. Such models often achieve excellent performance, but lack flexibility. LLMs and VLMs offer versatility, but typically underperform specialized predictors, especially on non-traditional modalities and long-tail domains. We propose MARVIS (Modality Adaptive Reasoning over VISualizations), a system that transforms latent embedding spaces into visual representations and then leverages the spatial and fine-grained reasoning skills of VLMs to interpret the visualizations and use them to make predictions. MARVIS achieves competitive performance across vision, audio, biological, and tabular domains using a single 3B parameter model, yielding results that beat Gemini 2.0 by 16% on average. MARVIS drastically reduces the gap between LLM/VLM approaches and specialized domain-specific methods, without requiring any domain-specific training. Code and datasets are available at https://github.com/penfever/marvis.
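The pipeline described in the abstract (embeddings, then a visualization, then a VLM query) can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a plain PCA projection as a stand-in for whatever dimensionality reduction MARVIS actually uses, renders the plot as an SVG string, and stops short of calling a VLM. The function names (`project_2d`, `render_svg`, `build_prompt`), the synthetic embeddings, and the prompt wording are all hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_two_components(rows, iters=200, seed=0):
    """Approximate the top-2 principal directions of mean-centered rows
    using power iteration (PCA is an assumed stand-in projection)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    comps = []
    for _ in range(2):
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for _ in range(iters):
            # w = X^T (X v), computed without forming the covariance matrix.
            proj = [_dot(r, v) for r in rows]
            w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]
            # Keep w orthogonal to the directions already found.
            for c in comps:
                d = _dot(w, c)
                w = [a - d * b for a, b in zip(w, c)]
            norm = math.sqrt(_dot(w, w)) or 1.0
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def project_2d(embeddings):
    """Center the embeddings and project them onto two principal directions."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(dim)]
    centered = [[x - m for x, m in zip(e, mean)] for e in embeddings]
    c1, c2 = top_two_components(centered)
    return [(_dot(r, c1), _dot(r, c2)) for r in centered]


def render_svg(points, labels, query_index, size=400, pad=20):
    """Draw labeled points as colored dots and the query as a red ring."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd"]
    classes = sorted({l for l in labels if l is not None})
    color = {c: palette[i % len(palette)] for i, c in enumerate(classes)}
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for i, (x, y) in enumerate(points):
        px = pad + (x - min_x) / span_x * (size - 2 * pad)
        py = pad + (y - min_y) / span_y * (size - 2 * pad)
        if i == query_index:
            parts.append(f'<circle cx="{px:.1f}" cy="{py:.1f}" r="8" '
                         'fill="none" stroke="red" stroke-width="3"/>')
        else:
            parts.append(f'<circle cx="{px:.1f}" cy="{py:.1f}" r="4" '
                         f'fill="{color[labels[i]]}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


def build_prompt(classes):
    """Hypothetical text prompt to accompany the image in a VLM request."""
    return ("The image is a 2D projection of embeddings. Colored dots are "
            "labeled examples and the red ring marks the query point. "
            "Which class does the query most likely belong to? Options: "
            + ", ".join(classes))


# Synthetic 8-dimensional embeddings: two clusters plus one unlabeled query.
rng = random.Random(0)
cluster = lambda center, n: [[center + rng.gauss(0, 0.1) for _ in range(8)]
                             for _ in range(n)]
embeddings = cluster(0.0, 10) + cluster(1.0, 10) + [[0.95] * 8]
labels = ["a"] * 10 + ["b"] * 10 + [None]
points = project_2d(embeddings)
svg = render_svg(points, labels, query_index=20)
prompt = build_prompt(["a", "b"])
```

In a real system the embeddings would come from a specialized model rather than a random generator, and the rendered image together with the prompt would be sent to a VLM, which only ever sees the plot and never the raw data.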

ACM CAIS 2026 Sponsors