The DVPS project (Diversibus Viis Plurima Solvo, Latin for “Through diverse paths, I solve many issues”) builds on the success of large language models by exploring the future of AI through multimodal foundation models. Unlike today’s systems, which learn from representations of the world such as text, images, and video, these next-generation models are designed to learn directly across multiple input channels, including visual, auditory, linguistic, and other sensory signals, to gain a grounded understanding of the physical world. This multimodal approach enables them to interpret meaning across modalities in parallel, manage complexity, and adapt to real-world scenarios where narrower, single-modality AI often fails.
DVPS is funded under the Horizon Europe Programme.