Three non-negotiables before upgrading the data stack to an AI stack

Most companies are not ready to replace a dashboard-era data stack with an AI stack. Salesforce’s latest State of Data & Analytics indicates that 84% of data and analytics leaders say their strategies require a complete overhaul before AI ambitions can succeed. Leaders estimate that 26% of their data is untrustworthy, only 43% report formal data governance frameworks, and around 50% are not confident in their ability to generate and deliver timely insights. At the same time, 70% believe the most valuable insights are locked in unstructured data. The conclusion is straightforward: the obstacle is not enthusiasm but foundation, and that foundation must change before agentic systems can scale.

Agentic AI turns data platforms into systems of action that read contracts and tickets, watch cameras, listen to calls, correlate with logs and events, and then execute. Dashboards can tolerate delay, whereas agents cannot; compliance can operate through documentation, yet agents require controls at runtime. Before any enterprise proclaims an AI stack, three elements must be non-negotiable: multimodality as the default, streaming as the operating mode, and governance as a runtime system. The sections that follow translate these convictions into architectural choices that carry agentic workloads at production scale.

Multimodal as the default

Treating documents, images, audio, video, and logs as second-class assets constrains agent capability by design. A platform fit for agents stores tables and tensors as peers under one catalog, with a unified lineage system and a consistent access model, so high-dimensional media can move through the same lifecycle as relational data. In practice, this means chunked or tiled layouts that enable partial reads and region-of-interest extraction; content-addressed identifiers with versioned sidecar metadata so every artifact is reproducible; and array- and column-native formats that allow predicate and coordinate pushdown. Text and documents require embeddings as a first-class derivative: generated deterministically, versioned with their sources, and indexed with clear trade-offs, for example, HNSW when recall dominates, IVF-PQ when memory and latency must be balanced, and hybrid dense-plus-lexical indexing when product codes and numbers are essential. Retrieval should return evidence bundles such as passages, pages, frames, or clips, because agents reason over evidence rather than isolated substrings.
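
The hybrid dense-plus-lexical pattern and the evidence-bundle contract can be sketched in a few lines. This is a toy illustration, not a production retriever: the corpus, the three-dimensional "embeddings", and the fusion weight alpha are all assumptions, and a real system would use an ANN index such as HNSW or IVF-PQ instead of a linear scan.

```python
# Sketch of hybrid dense + lexical retrieval that returns evidence
# bundles rather than bare strings. Corpus, embeddings, and the
# fusion weight are toy assumptions for illustration.
import math

CORPUS = [
    {"doc_id": "contract-17", "version": "v3", "page": 4,
     "text": "order 88231 ships from warehouse B under SKU-4417",
     "emb": [0.9, 0.1, 0.2]},
    {"doc_id": "ticket-502", "version": "v1", "page": 1,
     "text": "customer reports delay on order 88231",
     "emb": [0.2, 0.8, 0.3]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def lexical_overlap(query, text):
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def retrieve(query_emb, query_text, alpha=0.5, k=2):
    """Score = alpha * dense + (1 - alpha) * lexical, so exact tokens
    such as order numbers still match even when embeddings drift."""
    scored = []
    for doc in CORPUS:
        score = (alpha * cosine(query_emb, doc["emb"])
                 + (1 - alpha) * lexical_overlap(query_text, doc["text"]))
        # Evidence bundle: the passage plus its provenance, not a substring.
        scored.append({"doc_id": doc["doc_id"], "version": doc["version"],
                       "page": doc["page"], "passage": doc["text"],
                       "score": round(score, 3)})
    return sorted(scored, key=lambda b: b["score"], reverse=True)[:k]

bundles = retrieve([0.85, 0.15, 0.2], "order 88231")
```

The point of the bundle shape is that every result carries its document, version, and page, so an agent can cite and an auditor can replay.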

The greatest gains come from fusion, yet fusion is only valuable when alignment is solved at the platform level. Early fusion captures fine-grained interactions, such as aligning text spans to image regions; late fusion preserves modality-specific models until a decision boundary; and hybrid approaches combine both where interaction points are well defined. Alignment is the hard part. Event-time synchronization across sources with different sampling rates, spatial registration across sensors, and semantic linking, so that an order ID in a PDF, a lane-three camera frame, a vibration spike, and a ledger event resolve to the same business object, are capabilities that belong in the catalog and lineage layer. Without this shared notion of time, identity, and provenance, agents will hallucinate context or act on stale signals. Given that 70% of leaders believe the most valuable insights are inside unstructured data, multimodality cannot be a phase-two add-on. It is the default that unlocks context.
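
The semantic-linking step above can be made concrete with a small sketch: signals from different modalities resolve to one business object when their extracted identifiers match and their event times fall within a shared window. The sources, field names, and the five-minute window are assumptions for illustration; a real implementation would live in the catalog and lineage layer, not application code.

```python
# Toy sketch of semantic linking plus event-time alignment: signals
# from different modalities fuse into one context when they share an
# entity identifier and their event times fall inside one window.
from collections import defaultdict
from datetime import datetime, timedelta

SIGNALS = [
    {"source": "pdf",    "entity": "order-88231", "event_time": datetime(2024, 5, 1, 10, 0)},
    {"source": "camera", "entity": "order-88231", "event_time": datetime(2024, 5, 1, 10, 2)},
    {"source": "ledger", "entity": "order-88231", "event_time": datetime(2024, 5, 1, 10, 4)},
    {"source": "ledger", "entity": "order-99999", "event_time": datetime(2024, 5, 1, 10, 3)},
]

def fuse(signals, window=timedelta(minutes=5)):
    """Group by entity, then keep groups whose signals all fall within
    the window, ordered by event time rather than arrival order."""
    by_entity = defaultdict(list)
    for s in signals:
        by_entity[s["entity"]].append(s)
    fused = {}
    for entity, group in by_entity.items():
        group.sort(key=lambda s: s["event_time"])
        if group[-1]["event_time"] - group[0]["event_time"] <= window:
            fused[entity] = [s["source"] for s in group]
    return fused

contexts = fuse(SIGNALS)
```

An agent reading `contexts["order-88231"]` sees the PDF, camera, and ledger evidence as one aligned object instead of three unrelated signals.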

Streaming as the operating mode

Agentic work introduces a central service-level objective: decision latency, the time between a real-world signal and an acceptable action. Meeting that objective requires a streaming-first platform. The backbone is an event log that acts as the system of record, with schemas enforced on write and event-time semantics with watermarks, so windows reflect business truth rather than arrival order. On that backbone run stateful stream processors that maintain durable local state, handle backpressure predictably, join hot streams to cold reference data, and emit decisions rather than merely transformed rows. To support inspection and justification by both humans and agents, place a real-time analytical layer on top that ingests directly from the log and answers sub-second queries over continuously updating tables.
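
Why watermarks make windows reflect business truth rather than arrival order can be shown with a minimal event-time counter. The window size, the allowed lateness, and the integer timestamps are assumptions for the sketch; engines like Flink or Kafka Streams implement the same idea with durable state and real clocks.

```python
# Minimal event-time tumbling windows with a watermark. Out-of-order
# events are still credited to the window their event time belongs to;
# only events behind the watermark are rejected as late.
from collections import defaultdict

WINDOW = 10           # seconds per tumbling window
ALLOWED_LATENESS = 5  # watermark trails the max seen event time by this much

class WindowedCounter:
    def __init__(self):
        self.open_windows = defaultdict(int)  # window_start -> event count
        self.max_event_time = 0
        self.closed = {}                      # finalized windows

    def on_event(self, event_time):
        watermark = self.max_event_time - ALLOWED_LATENESS
        start = (event_time // WINDOW) * WINDOW
        if start + WINDOW <= watermark:
            return "late-dropped"  # a real system would route this to a DLQ
        self.open_windows[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        # Close every window that now lies entirely behind the watermark.
        watermark = self.max_event_time - ALLOWED_LATENESS
        for s in [s for s in self.open_windows if s + WINDOW <= watermark]:
            self.closed[s] = self.open_windows.pop(s)
        return "accepted"

proc = WindowedCounter()
for t in [1, 3, 12, 2, 28]:  # note the out-of-order event at t=2
    proc.on_event(t)
```

The out-of-order event at t=2 still lands in the first window because the watermark had not passed it, which is exactly the business-truth-over-arrival-order guarantee the backbone must provide.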

The hot path and the cold path should live in the same fabric. Land streams into open table formats with ACID transactions and time travel, so replays, backfills, audits, and training can proceed without freezing the pipeline. This collapses the false choice between fast-but-ephemeral on one side and durable-but-slow on the other. Operational discipline closes the loop: define lateness budgets and retraction rules, document replay and dead-letter runbooks, and use canary deployments with mirrored traffic so rollouts are reversible. When these practices are in place, fraud agents can block before settlement, personalization can adapt in session, reliability agents can schedule maintenance when drift first appears, and service agents can escalate when frustration rises rather than after churn.
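
The time-travel property that makes replays and audits safe can be illustrated with a toy versioned table. The in-memory list stands in for an open table format such as Delta Lake or Apache Iceberg; commit semantics and the row shapes are assumptions for the sketch.

```python
# Toy append-only table with commit versions and time travel, showing
# how the cold path supports replays and audits without freezing the
# pipeline: any past commit remains readable by version.
class VersionedTable:
    def __init__(self):
        self.commits = []  # each commit stores the full row set at that version

    def append(self, rows):
        current = self.commits[-1] if self.commits else []
        self.commits.append(current + rows)  # one atomic, ACID-style commit
        return len(self.commits) - 1         # commit version number

    def read(self, version=None):
        """Time travel: read the table as of any past commit."""
        if not self.commits:
            return []
        v = len(self.commits) - 1 if version is None else version
        return self.commits[v]

table = VersionedTable()
v0 = table.append([{"order": 88231, "status": "placed"}])
v1 = table.append([{"order": 88231, "status": "shipped"}])
```

A backfill or training job pins a version and gets a stable snapshot, while the hot path keeps appending, which is the collapse of the fast-versus-durable trade-off described above.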

Governance as a runtime system

As autonomy increases, trust becomes the gate. Documentation and periodic audits do not prevent bad decisions; control at runtime does. Two moves establish that control. First, enforce data contracts at write: producers publish schemas, semantics, and freshness objectives; the registry blocks incompatible changes or routes them to quarantine; consumers declare expectations; and when expectations are breached, incidents are raised automatically and agent permissions are downgraded until conditions recover. Second, evaluate policy as code on every action, whether triggered by a human or an agent. When a tool or dataset is requested, a policy engine should evaluate who is calling, for what purpose, over which data, under which risk, and with which obligations, for example, attaching an evidence bundle, redacting sensitive fields, or requiring a human co-signature above a threshold. The outcome must be a signed decision and an auditable trail.
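
The policy-as-code evaluation can be sketched as a pure function over the request: who is calling, for what purpose, over which data, under which risk. The rule set, field names, and the 0.7 risk threshold are toy assumptions; a production system would express these rules in a policy engine such as OPA rather than inline Python.

```python
# Sketch of policy-as-code on every tool call: the engine returns an
# allow/deny decision plus obligations the caller must discharge.
# Rules, thresholds, and field names are assumptions for illustration.
def evaluate(request):
    # Purpose binding: the caller may only act for declared purposes.
    if request["purpose"] not in request["caller_allowed_purposes"]:
        return {"decision": "deny", "obligations": []}
    obligations = ["attach-evidence-bundle"]  # every allowed action carries evidence
    if request["data_sensitivity"] == "pii":
        obligations.append("redact-sensitive-fields")
    if request["risk_score"] > 0.7:
        obligations.append("human-cosignature")  # high risk needs a co-signer
    return {"decision": "allow", "obligations": obligations}

verdict = evaluate({
    "caller": "refund-agent",
    "caller_allowed_purposes": ["refund", "lookup"],
    "purpose": "refund",
    "data_sensitivity": "pii",
    "risk_score": 0.82,
})
```

Because the decision is data, it can be signed, logged, and replayed, which is what turns governance from documentation into a runtime control.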

Security needs a Zero Trust posture that assumes breach and verifies every call. Least privilege, continuous verification, sandboxed tool execution, and strict egress control are table stakes, while agent-specific risks such as prompt injection, retrieval poisoning, and tool abuse require telemetry and containment at runtime rather than model-centric tuning alone. The stakes are clear in the same Salesforce research: only 43% of leaders report formal governance frameworks, 89% of teams already using AI have experienced inaccurate or misleading outputs, and 88% say AI demands new governance and security approaches. Proven governance means the platform can show who acted, with what data, under which policy, using which model or tool version, and how to roll back if necessary, which is what separates pilots that impress from systems that survive audits.

How the three elements converge

The blueprint is a living system that connects producers, storage, compute, and policy without handoffs that erode trust. Producers emit change events and deposit artifacts such as PDFs, images, audio, and video into chunked storage, while a global catalog registers assets, ownership, lineage, and applicable policies. Near-data pipelines normalize, redact, embed, and align, and the resulting features and embeddings flow into indexed stores with documented retrieval profiles. Stateful stream processors enrich and decide in motion, and a real-time analytical surface exposes fresh facts to humans and to agents. In parallel, the same events tier into open lakehouse tables with ACID semantics and time travel, so real-time and historical views live in one governed substrate. Agents interact through narrow, least-privilege tools; every call is checked by policy, and every action is signed and linked to lineage. This pattern addresses the trust, timeliness, and accessibility gaps that block scale.
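
The "every action is signed and linked to lineage" requirement can be sketched as a hash-chained audit trail: each entry binds actor, data version, policy verdict, and model version, and chains to the previous signature so tampering anywhere invalidates the rest. The HMAC key and record fields are assumptions for illustration; production systems would use a managed secret and an append-only store.

```python
# Minimal signed audit trail for agent actions. Each record is signed
# together with the previous signature, so modifying any entry breaks
# verification of the chain from that point on.
import hashlib, hmac, json

KEY = b"demo-signing-key"  # assumption: in production, a managed secret

def sign_action(prev_sig, record):
    payload = json.dumps({"prev": prev_sig, **record}, sort_keys=True).encode()
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()

trail, prev = [], ""
for record in [
    {"actor": "fraud-agent", "tool": "block_payment",
     "data_version": "v1", "policy": "allow", "model": "m-2024-05"},
    {"actor": "fraud-agent", "tool": "notify_ops",
     "data_version": "v1", "policy": "allow", "model": "m-2024-05"},
]:
    prev = sign_action(prev, record)
    trail.append({**record, "sig": prev})

def verify(trail):
    prev = ""
    for entry in trail:
        record = {k: v for k, v in entry.items() if k != "sig"}
        if sign_action(prev, record) != entry["sig"]:
            return False
        prev = entry["sig"]
    return True
```

A trail in this shape answers the audit questions directly: who acted, with what data, under which policy, with which model version, and whether the record has been altered since.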

Conclusion

Enterprises that aim to scale AI should resist the urge to bolt agent features onto a dashboard-era stack. The foundation must change first. Make multimodality the default so every useful signal sits under one catalog with shared identity, time, and lineage, and so retrieval returns evidence rather than fragments. Make streaming the operating mode so decisions respect event time and arrive within a defined latency budget, while the same streams land in open tables for audit and learning. Upgrade governance from slides to runtime by enforcing contracts at write, evaluating policies on every action, applying Zero Trust to tools and data, and observing continuously so autonomy is paused or downgraded when quality slips. Taken together, these moves replace a reporting substrate with an action substrate. Agents then operate on fresh multimodal evidence inside explicit guardrails, with explanations built in, which is how pilots become production and how AI compounds value instead of exceptions.