VAST & TwelveLabs bring AI video search on-premises
VAST Data and video intelligence company TwelveLabs have signed a partnership to offer a customer-managed deployment option for TwelveLabs' video models on the VAST AI Operating System. The move targets organisations that keep large video archives in controlled environments.
The agreement focuses on scenarios where public cloud-only deployments do not meet governance and data-handling requirements, particularly for organisations managing large volumes of long-lived video across hybrid infrastructure.
Enterprises have expanded their use of video across media libraries, smart spaces and public safety programmes, but applying AI to video datasets at scale can be difficult. Files are large, the content is complex and unstructured, and many deployments operate under strict rules for sovereignty, retention and access.
The partnership is positioned as a response to demand for bringing video intelligence closer to where footage is created and governed, including on-premises environments and newer hosted infrastructure described as "neocloud".
Deployment model
The collaboration introduces TwelveLabs' first customer-managed deployment path. Under this model, customers run TwelveLabs software in their own environment on top of the VAST AI Operating System, rather than consuming model services only through public cloud platforms.
VAST positions its AI Operating System as a software foundation for managing unstructured data at very large scale. In this partnership, it provides the layer for storing and organising video data and running related processing workflows.
TwelveLabs develops foundation models for video understanding tasks. It highlighted two models: Marengo, designed for multimodal embeddings and search, and Pegasus, designed for deep video understanding and text generation.
The models are intended to support natural-language search and analysis across video, including any-to-any search patterns. Depending on pipeline design, this can include video frames, speech and other signals.
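The mechanics behind this kind of any-to-any search are not detailed in the announcement, but embedding-based retrieval typically works by mapping queries and video clips into a shared vector space and ranking by similarity. The sketch below illustrates the pattern with hand-made placeholder vectors; the embedding values, clip names and dimensionality are illustrative assumptions, not output from Marengo or any VAST component.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, clip_index, top_k=3):
    """Rank indexed clips against a query embedding.

    In a real deployment the embeddings would come from a multimodal
    model; here they are hypothetical 4-dimensional placeholders.
    """
    scored = [
        (clip_id, cosine_similarity(query_embedding, emb))
        for clip_id, emb in clip_index.items()
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

# Hypothetical embeddings standing in for model output.
index = {
    "clip_goal.mp4":      [0.9, 0.1, 0.0, 0.2],
    "clip_interview.mp4": [0.1, 0.8, 0.3, 0.0],
    "clip_crowd.mp4":     [0.2, 0.2, 0.9, 0.1],
}
# Embedding of a natural-language query, e.g. "goal celebration".
query = [0.85, 0.15, 0.05, 0.1]

results = search(query, index, top_k=2)
print(results[0][0])  # prints "clip_goal.mp4"
```

Because queries and clips share one vector space, the same index can serve text-to-video, video-to-video or audio-to-video lookups, which is what makes the pattern "any-to-any".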
Platform components
The integration centres on several VAST software components. VAST DataSpace is positioned as a global namespace for accessing large archives across hybrid multicloud environments. VAST DataBase provides vector storage and retrieval at what VAST describes as trillion-vector scale, supporting similarity search used in many AI retrieval patterns.
VAST DataEngine provides orchestration for data and compute workflows, with event-driven pipelines that can generate and store embeddings and metadata as new video arrives. The companies framed this as an alternative to assembling separate tools for orchestration and data movement.
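The companies do not publish the internals of these pipelines, but the event-driven shape they describe can be sketched as a handler that fires per arriving file, produces an embedding and metadata, and writes both to a store. Everything below is a hypothetical illustration: the `Pipeline` class, its trivial stand-in `embed` and `extract_metadata` methods, and the dictionary store are assumptions, not DataEngine APIs.

```python
from dataclasses import dataclass, field

@dataclass
class EmbeddingRecord:
    clip_id: str
    embedding: list
    metadata: dict

@dataclass
class Pipeline:
    """Minimal event-driven ingest loop: each 'video arrived' event
    triggers embedding and metadata extraction, and the results are
    stored alongside a reference to the raw footage."""
    store: dict = field(default_factory=dict)

    def embed(self, path):
        # Stand-in for a model call; derives a trivial deterministic
        # vector from the path so the example is self-contained.
        return [float(ord(c) % 7) for c in path[:4]]

    def extract_metadata(self, path):
        # Stand-in for speech-to-text, labelling and similar steps.
        return {"source": path, "duration_s": None}

    def on_video_arrived(self, path):
        record = EmbeddingRecord(path, self.embed(path), self.extract_metadata(path))
        self.store[path] = record
        return record

pipeline = Pipeline()
for event in ["cam01/0001.mp4", "cam01/0002.mp4"]:
    pipeline.on_video_arrived(event)

print(len(pipeline.store))  # prints 2: one record per arriving clip
```

The appeal of the integrated stack, as framed here, is that the eventing, the model call and the vector store sit on one platform instead of being stitched together from separate orchestration and data-movement tools.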
In practice, the combined stack targets common video processing steps such as frame sampling, speech-to-text and multimodal chunking. These steps can increase the volume of derived data stored alongside raw footage, raising storage and governance considerations for customers operating at scale.
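Two of those steps, frame sampling and chunking, are simple to sketch, and doing so shows why derived data grows alongside raw footage. The interval and window sizes below are arbitrary assumptions for illustration, not parameters of either company's product.

```python
def sample_frames(duration_s, fps, every_n_seconds):
    """Return frame indices sampled at a fixed time interval."""
    step = int(fps * every_n_seconds)
    total = int(duration_s * fps)
    return list(range(0, total, step))

def chunk_clip(duration_s, chunk_s, overlap_s=0.0):
    """Split a clip into (start, end) windows for per-chunk embedding."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start += chunk_s - overlap_s
    return chunks

# A 10-second clip at 30 fps, sampled every 2 seconds:
frames = sample_frames(duration_s=10, fps=30, every_n_seconds=2)
print(frames)  # prints [0, 60, 120, 180, 240]

# The same clip split into 4-second windows with 1 second of overlap:
chunks = chunk_clip(duration_s=10, chunk_s=4, overlap_s=1)
print(chunks)  # prints [(0.0, 4.0), (3.0, 7.0), (6.0, 10.0), (9.0, 10.0)]
```

Even this toy example yields five sampled frames, four chunk embeddings and a transcript per 10 seconds of footage; at archive scale, that derived data is what drives the storage and governance considerations the article notes.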
Target sectors
The companies pointed to demand across media and entertainment, financial services and the public sector. Media teams often manage large libraries of finished content and production assets, where search and enrichment can support reuse and discovery across archives and production workflows.
In financial services, the companies cited surveillance footage as a potential input for fraud detection and compliance. Public sector agencies often use multi-camera footage for investigations and situational awareness. Many of these deployments require data to remain in controlled infrastructure rather than moving to public cloud services.
Constraints that can limit where AI workloads run are a key driver behind the customer-managed model. The companies cited sovereignty rules, regulatory requirements, security policies and costs at scale as factors that can narrow deployment options for large archives.
Partner comments
Danny Nicolopoulos, Head of Global Strategic Partnerships at TwelveLabs, said: "Our mission at TwelveLabs is to help machines understand video the way people do, across visuals, speech, sound and time - so teams can find what matters and extract insight from the moments that used to be unreachable."
He added: "Partnering with VAST expands where customers can deploy video intelligence, including on-premises and emerging AI and neocloud environments, while supporting the performance and scale demanded by the world's largest video archives. Together, we're opening new possibilities for video search and analytics in industries where governance and data control are non-negotiable."
John Mao, Vice President, Global Technology Alliances at VAST Data, said: "Video is one of the most valuable sources of truth across both private and public sectors, and one of the hardest to operationalise at scale because the data is massive, long-lived, and often highly governed."
He added: "By partnering with TwelveLabs, we're giving organisations a path to bring state-of-the-art video understanding to the data, not the other way around. With the VAST AI Operating System, customers can unify their video archives in a single global namespace and pair TwelveLabs' models with a built-in, trillion-vector-scale foundation for embeddings and retrieval - so they can power search, analytics, and reasoning workflows with speed, governance, and control."
The partnership will focus on deployments where organisations want to run video search, analytics and reasoning workflows across very large archives while keeping custody of the underlying data and infrastructure.