Multimodal AI is critical, says Zebra Technologies Director

Stuart Hubbard, Global Senior Director, AI and Advanced Development, Zebra Technologies, discusses the intelligence supercycle, ACI and multimodal AI in this exclusive Q&A.

The concept of an “Intelligence Supercycle” suggests a rapid shift in how decisions and work are done—what are the most immediate, tangible changes manufacturers should expect on the factory floor in the next 2–3 years?

There have been several supercycles or industrial leaps, each marked by specific technologies that transformed and shaped work and society long term. Steam power, electricity, computing and digitalisation brought about new ways of working, new jobs, and whole new industries with possibilities never seen before.

We need to leverage AI to bring about the same sort of transformation across industries like manufacturing, where AI acts as a growth engine that positively impacts revenue and profit, and creates new industries and new jobs. We are already beginning to see this with high-demand roles like forward-deployed AI engineers. Physical environments, workflows, assets and inventory are digitised and turned into new, richer sources of insight to support immediate decision-making, intelligent operations and longer-term planning.

Today’s supercycle is driven by people building multimodal AI, applying it as an embedded intelligence layer across frontline operations, and putting it in the hands of workers. On-premises and on-device AI is particularly important for manufacturers, so AI solutions that meet this requirement will be in demand, particularly smaller, more tailored AI models.

I also think manufacturers will discover new value in their frontline data. Huge volumes of data are generated from machine operations, inventory creation, storage and movement, workflows and worker tasks, and customer and supply chain interactions. AI is providing new impetus for data orchestration as it becomes the nervous system for Agentic AI and the new digital workers who augment the frontline worker.

While much of the AI conversation is focused on AGI, you’ve emphasized Augmented Collective Intelligence (ACI). What does ACI look like in practice for frontline manufacturing workers, and why is it a more realistic path right now?

ACI is about combining the capabilities of AI to enrich and elevate human experience, judgement, and insight. ACI on the frontline extends human intelligence and creates a low-threshold user experience.

There are three key components of ACI. First, agent swarms instead of a single “all-knowing” model. ACI utilises a network or “swarm” of connected, dedicated agents. Second, it is multi-style, combining different styles of AI such as generative and deep learning algorithms to address complex tasks. And third, human integration, with workers contributing unique intelligence, common sense, and domain expertise to the network, while AI scales these talents through decision support and automation.

Many manufacturers are already dealing with skills shortages and high turnover—how can AI meaningfully reduce onboarding time and upskill workers without adding complexity or friction?

Talent shortages, slow time-to-value for new hires, and churn are primary headwinds facing manufacturing leaders. In the immediate term, addressing them means investing in automation to fill labour gaps and take some of the manual and cognitive burden off the current workforce. It’s about ensuring uptime and productivity, but also worker experience and wellbeing. And with AI agents trained on proprietary standard operating procedures, worker time-to-value is faster. New and current talent have accessible, consistent and tailored AI agents on their workplace wearable and handheld devices. These provide the knowledge they need – from booking time off to locating items and knowing the next step in a workflow.

You’ve highlighted that AI for the frontline is fundamentally different. Can you explain why multimodal AI is so critical in physical environments, and what challenges companies face in deploying it effectively?

Multimodal AI is critical because our world is multimodal. We see, touch, smell, hear and taste. AI models need to be able to do something similar with real-world data inputs across image, video, temperature, location, text, and audio. And the model should be able to deliver multimodal outputs to match the needs of the frontline worker and the environment. Data capture and AI can create living, digitised versions of workflows and environments, which means each “sense” needs to be present to make replication accurate and authentic.

There are a few key challenges. CTOs will be thinking about the financial investments needed, the build versus buy case, and the right sorts of AI partners to work with as part of AI transformation. Smaller, on-device models are already available, which removes the need for R&D investments and accelerates proofs of concept and pilots. There are also AI enablers and blueprints for developers to take AI models and templates and integrate them into operational technology, again speeding up time-to-value.

Meanwhile, CIOs and IT teams are concerned with AI governance, data quality and security, using multi-factor authentication and role-based access control to strengthen protection. Regular vulnerability testing, code reviews, threat modelling, and compliance checks are needed for timely identification and correction of any vulnerabilities at each stage of the project. Treat agents like evolving operational systems, not static deployments.

Zebra has spoken about the importance of purpose-built, “AI-first” hardware. How does hardware innovation—like smart sensors, industrial cameras, and RFID—change what’s actually possible with AI compared to software-only approaches?

AI-first hardware combined with RFID systems, smart sensors, and mobile computers plays a threefold role. First, it acts as the multimodal data capture layer, capturing text, character, audio, 2D, 3D and visual data from workflows, environments and interactions between humans and machines. The hardware and sensors are specifically designed for the environment in which they operate, to maximise accuracy and speed.

Second, they enable on-device AI inference, as they are equipped with specialised neural processing units and graphics processing units, and AI models optimised to fit within a device’s storage and memory. AI inference happens on the device, so no data needs to leave the security of the device and company network, cloud costs (tokens) are reduced, and latency eliminated.

And third, they are the user interface for human workers to access intelligence, and the interface between machine and machine, so the benefits of AI are shared across a fleet of solutions sharing new learnings.

There’s a shift from IoT to what you call “Ambient Intelligence.” What does that transition look like in real-world manufacturing settings, and how close are we to truly context-aware, autonomous operations?

The terms internet of things (IoT) and the industrial internet of things (IIoT) are still valid but need updating. Our ability to capture much more data from our physical environments and workflows is coupled with the capabilities of AI to harness data and turn it into intelligence for human decision making, and for AI systems that self-improve over time.

Ambient intelligence is the evolution of the IoT and performs a couple of important functions. First, it recognises that environments change, sometimes a lot, even within the structured environments of the factory and production line. Second, it recognises that within a working environment there is more going on than “things” – human interaction, data flows, environmental conditions, the unexpected and unforeseen are part of the environment. If the IoT is a system of record, then ambient intelligence is a system of reality that reflects the moment the worker finds themselves in.

I think the technologies are already in place for fully context-aware solutions, but I would add that the focus is on an ACI approach rather than fully autonomous operations without humans, such as “dark factories” operating 24/7. The focus is frontline AI, and I think it makes sense to talk about intelligent operations with humans in the loop, with different levels of automation for manual and cognitive tasks.

Image Credit: Zebra Technologies
