For the last decade, computer vision has focused on one goal: perception. Researchers taught machines to classify images, detect objects, and segment pixels. But in the enterprise, as in the real world, "seeing" is only the first step.
To drive value, AI must move from observation to action. We invite you to join us at GRAIL-V (Grounded Retrieval & Agentic Intelligence for Vision-Language) at CVPR 2026. This workshop brings researchers and practitioners together to tackle one of the hardest challenges in AI today: building multimodal agents that work in production.
From Passive Observation to Active Agents
Real-world data is messy. Critical context lives in fragmented PDFs, dashboards, images, spreadsheets, and videos. A reliable multimodal agent must connect these distributed sources to make accurate decisions and solve user tasks. The question is no longer "Can the model interpret the input?" It is "Can you trust the system to act?"
To bridge the gap between impressive demos and deployable reliability, agents must master the following capabilities (sketched in code after this list):
- Plan and route intelligently across tools and modalities.
- Generate and edit content only when necessary.
- Ground decisions in evidence with precise citations.
- Handle uncertainty responsibly when evidence conflicts.
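To make these capabilities concrete, here is a minimal Python sketch of what a single evidence-gated agent step might look like. Everything in it is hypothetical and for illustration only: the `Evidence` type, the `answer_with_citations` function, the `retrieve` and `generate` callables, and the `min_score` threshold are assumptions, not part of any specific framework or system presented at the workshop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical types and names for illustration only.
@dataclass
class Evidence:
    source_id: str   # e.g. a PDF page, dashboard panel, or video timestamp
    content: str
    score: float     # retriever confidence in [0, 1]

def answer_with_citations(
    query: str,
    retrieve: Callable[[str], Iterable[Evidence]],
    generate: Callable[[str, list], str],
    min_score: float = 0.6,
) -> dict:
    """One evidence-gated agent step: retrieve, filter, act or abstain."""
    evidence = [e for e in retrieve(query) if e.score >= min_score]

    # Handle uncertainty responsibly: abstain when no evidence clears the
    # threshold. A production system would also detect conflicting sources
    # here and escalate instead of guessing.
    if not evidence:
        return {"answer": None, "citations": [], "status": "abstained"}

    # Generate only from retrieved context, and cite exactly what was used.
    answer = generate(query, evidence)
    return {
        "answer": answer,
        "citations": [e.source_id for e in evidence],
        "status": "ok",
    }
```

The key design choice in this sketch is that the generator only ever sees retrieved evidence, and the agent abstains rather than guesses when that evidence is missing; this is what makes its answers auditable and its citations meaningful.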

Why You Should Attend
Oracle AI is proud to sponsor and organize GRAIL-V. We believe the next wave of enterprise AI is defined by evidence-driven systems that behave predictably in complex environments.
This half-day workshop features a mix of invited talks, peer-reviewed research, and expert panels. You will hear from industry leaders and distinguished professors, including:
- Kristen Grauman (UT Austin)
- Mohit Bansal (UNC Chapel Hill)
- Dan Roth (University of Pennsylvania)
- Scott Yih (FAIR, Meta)
- Sujith Ravi (Oracle AI)
Call for Papers
We want to see your work. We are looking for submissions that advance the core mechanisms behind multimodal agents.
Topics of Interest (non-exhaustive list):
- Multimodal Retrieval: Scaling search across images, video, and UI.
- Image/Video Understanding: Deep interpretation of visual data.
- Generative Tools: How multimodal agents use generation across images, videos, and text.
- Benchmarks & Evaluation: Reproducible methods for measuring success.
- Grounding: Evidence, citation provenance, and audit-ready faithfulness.
Important Dates
Mark your calendars for the following deadlines:
- Submission Deadline: March 5, 2026
- Author Notification: March 18, 2026
- Conference: June 3–7, 2026 (Denver, USA)
To learn more about the submission guidelines and speakers, visit the GRAIL-V Workshop Website.
We look forward to seeing you in Denver to advance the state of grounded multimodal agents.

