Tech Explained: Siemens is building its AI bet on a problem the industry refuses to admit. Here’s a simplified look at the update and what it means for users.
In 2025, global enterprises invested an estimated $684 billion in artificial intelligence initiatives. By year-end, more than 80% of that investment had failed to deliver its intended business value. In manufacturing, where the promises of predictive maintenance, autonomous quality control, and self-optimising production lines have been loudest, the AI project failure rate stands at 76.4%, with OT/IT integration and IoT data quality cited as the primary culprits. The technology, as it turns out, is rarely the problem. The data feeding it almost always is.
This is the context in which Siemens AG, the German industrial conglomerate, has been making a pointed argument. It cuts against the industry’s reflex toward more: more sensors, more storage, more compute. The argument is that industrial AI does not have a data shortage. It has a data quality problem that volume cannot fix.
Dirk Didascalou, Head of Foundational Technologies at Siemens AG, calls it the correlation trap. In factory environments, machinery running within designed tolerances produces readings that are, structurally, repetitive. “When I have a running system, it always runs well,” he explains. “I can make a lot of data, only one piece of information: good.” A sensor logging stable temperature readings a billion times yields, informationally, just two data points. “It’s not that we don’t have enough gigabytes,” he says. “The question is, is the data uncorrelated enough that it contains different information?”
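That intuition can be made concrete with a back-of-the-envelope sketch (ours, not Siemens’; every number in it is invented). Discretise two simulated temperature streams and compute their Shannon entropy: a stream from a machine holding its setpoint carries almost no information per reading, while one that also captures warm-ups and overheating carries far more, at the same gigabyte count.

```python
import numpy as np

def shannon_entropy(readings: np.ndarray, n_bins: int = 64,
                    value_range: tuple = (0.0, 120.0)) -> float:
    """Shannon entropy (bits per reading) of a discretised sensor stream."""
    counts, _ = np.histogram(readings, bins=n_bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)

# A healthy machine: a million readings hugging a 70 °C setpoint.
stable = 70.0 + rng.normal(0.0, 0.05, size=1_000_000)

# Same volume, but the stream also captures warm-ups and faults.
varied = np.concatenate([
    70.0 + rng.normal(0.0, 0.05, size=900_000),  # normal operation
    np.linspace(20.0, 70.0, 50_000),             # warm-up ramps
    95.0 + rng.normal(0.0, 2.0, size=50_000),    # overheating events
])

print(f"stable stream: {shannon_entropy(stable):.2f} bits/reading")
print(f"varied stream: {shannon_entropy(varied):.2f} bits/reading")
# Identical gigabyte counts can carry wildly different information.
```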
Gartner’s own projections estimate that 60% of AI projects unsupported by AI-ready data will be abandoned through 2026, a finding that maps directly onto what Didascalou describes. Industrial environments are, almost by definition, hostile to the kind of data diversity that AI models require. Failure states are rare. Edge cases are infrequent. The data that would actually train a useful model (breakdowns, anomalies, degradations) is precisely the data that well-run systems are designed to prevent.
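This is the classic class-imbalance problem in miniature. A short sketch (with a hypothetical 0.1% failure rate, chosen only for illustration) shows why volume alone cannot fix it: a model that has only ever seen healthy operation can score near-perfect accuracy while detecting nothing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical plant history: roughly 1 failure per 1,000 machine-hours.
n_hours = 100_000
labels = (rng.random(n_hours) < 0.001).astype(int)  # 1 = failure hour

# A "model" that has only ever seen healthy operation and therefore
# predicts "healthy" for every single hour.
predictions = np.zeros(n_hours, dtype=int)

accuracy = float((predictions == labels).mean())
failures = int(labels.sum())
caught = int(((predictions == 1) & (labels == 1)).sum())

print(f"failure hours in history: {failures}")
print(f"accuracy: {accuracy:.4f}")   # ~0.999, looks excellent
print(f"failures caught: {caught}")  # 0, useful for nothing
```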
Peter Koerte, Chief Technology and Strategy Officer at Siemens, puts a number on the gap. “AI without data is nothing,” he told an audience at the company’s India Innovation Day in March. “And in the industrial case, it’s very, very different.” His estimate: today, 80% of data generated across more than 20 billion connected field devices globally goes unused. Factories are, in effect, generating the raw material for intelligent systems while capturing almost none of its value.
The precision requirements compound the problem in ways that have no consumer equivalent. Industry experts now warn that in manufacturing, an AI-generated suggestion directing a factory worker to take the wrong action on a critical piece of equipment could result in production disruptions, unplanned downtime, or damaged machinery. “Hallucination is definitely a big issue that we have to overcome so that it really becomes reliable,” Koerte said. “In industry, we have to rethink how we deploy AI.” A general-purpose model delivering 70 to 80% accuracy may be commercially viable in consumer applications. On a production line, the same accuracy rate is a liability.
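A rough back-of-the-envelope comparison makes the asymmetry visible. All figures below are placeholders we have invented, not Siemens’ numbers: the point is only that the cost of a wrong output, not the error rate itself, determines whether 80% accuracy is viable.

```python
def expected_error_cost(actions_per_day: int, accuracy: float,
                        cost_per_error: float) -> float:
    """Expected daily cost of acting on wrong model outputs."""
    return actions_per_day * (1.0 - accuracy) * cost_per_error

# Consumer setting: a bad movie recommendation costs almost nothing.
consumer = expected_error_cost(1_000_000, 0.80, 0.001)

# Industrial setting: a wrong instruction on critical equipment can mean
# unplanned downtime; $10,000 per incident is a placeholder, not a quote.
industrial = expected_error_cost(200, 0.80, 10_000.0)

print(f"consumer:   ${consumer:,.0f}/day")    # $200 across a million users
print(f"industrial: ${industrial:,.0f}/day")  # $400,000 on a single line
```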
Siemens’ response is to build what it calls an Industrial Foundation Model, an architecture intended to natively process the actual languages of industrial systems: CAD files, piping and instrumentation diagrams, PLC code, sensor streams. The distinction from large language models is architectural, not cosmetic. “How do you chunk a 3D model into small pieces and pretend it’s sequential language? It doesn’t work,” Didascalou says. Siemens holds roughly 300 petabytes of operational data and two million CAD files internally. Koerte has acknowledged that this is still not sufficient. Building usable industrial AI models requires heterogeneous data drawn from multiple industries, sites, and failure profiles; no single company assembles that dataset alone. Whether data alliances and shared training infrastructure can actually close that gap at the speed the market expects remains an open question.
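Siemens has not published the model’s internals, but the shape of Didascalou’s objection is clear enough to sketch. One established alternative to flattening geometry into token sequences is to keep a part’s native structure, for instance as a boundary-representation graph that a graph neural network can consume directly. The illustration below is a minimal, hypothetical version of that idea, not a description of Siemens’ architecture.

```python
from dataclasses import dataclass, field

# Illustrative only: Siemens has not published its model internals.
# One common alternative to "chunking" a 3D model as text is to keep
# its native structure, e.g. a boundary-representation (B-rep) graph
# in which faces are nodes and shared edges are adjacency links.

@dataclass
class Face:
    face_id: int
    surface_type: str   # "plane", "cylinder", ...
    area: float         # mm^2

@dataclass
class PartGraph:
    faces: dict = field(default_factory=dict)
    adjacency: set = field(default_factory=set)

    def add_face(self, face: Face) -> None:
        self.faces[face.face_id] = face

    def connect(self, a: int, b: int) -> None:
        self.adjacency.add((min(a, b), max(a, b)))

# A trivial bracket: two planar faces joined by a cylindrical bore.
part = PartGraph()
part.add_face(Face(0, "plane", 40.0))
part.add_face(Face(1, "plane", 40.0))
part.add_face(Face(2, "cylinder", 12.6))
part.connect(0, 2)
part.connect(1, 2)

# A graph neural network can consume this structure directly;
# flattening it into a token sequence discards the adjacency
# information that makes the geometry meaningful.
print(f"{len(part.faces)} faces, {len(part.adjacency)} adjacencies")
```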
Koerte’s warning to manufacturers is unambiguous: “We’ve seen so many proof points of pilots that don’t scale. You have to do it industry by industry, electronics, battery manufacturing, automotive, and really understand what the key use cases are. Is that a use case that can actually scale? Otherwise, it will be just yet another pilot.” That gap, as Forrester research noted in late 2025, is widening rather than closing, with 25% of enterprise AI investments planned for 2026 deferred to 2027, not because the technology does not work, but because the distance between vendor promises and measurable results keeps growing.
The data problem in industrial AI was never about having too little. It was always about whether what you had was worth anything at all.
