Product development organizations in industrial sectors are under more pressure than ever before. Regulatory requirements keep mounting, sustainability mandates are reshaping materials decisions, and the demand for speed is relentless.
In this environment, AI arrived with enormous promise. Yet for many organizations, it hasn’t paid off: Pilots get stuck, models trained in one context fall apart in another, and scientists don’t trust results that can’t be traced back to something they understand. BCG found in 2024 that 74% of companies struggled to achieve value from AI anywhere, much less in the lab or on the production line.
In most cases, though, the problem isn’t the algorithms. It’s the data underneath them. Any serious AI initiative must start with an honest look at data quality, structure, and accessibility.
Most organizations find significant gaps. In fact, Gartner predicts that 60% of AI projects will be abandoned due to a lack of AI-ready data. Therefore, the highest-leverage investment isn’t in the latest model—it’s in the experimental data the model will be trained on.
The data problem that AI cannot solve for you
Walk into the R&D or QC function of most industrial companies, and you’ll find data being captured in ways that made sense within each department, but don’t connect into anything coherent.
Formulation records sit in spreadsheets. Test results live in a LIMS that was set up to track compliance, not support analysis. Process parameters are buried in equipment logs that nobody ever thought to link to the experiments they were part of. Observations, photos, and customer notes are somewhere on a shared drive, inconsistently named and inconsistently filed.
That fragmentation creates a real obstacle for AI. Machine learning needs data in context: formulation compositions connected to process conditions, linked to measured properties, tied to how the final product performed.
When those connections don’t exist, or when someone must spend days rebuilding them by hand before any analysis can happen, the economics of AI stop making sense. This is a large part of why AI projects fail at a higher rate than other IT projects.
Data structure is a strategic decision
The gap between structured and unstructured data is both a software and an organizational problem. Scientists working on their own, with the tools they’ve always used, generate unstructured data as the path of least resistance—spreadsheets, free-text notebooks, and PDFs with no consistent naming or metadata. These are quick to produce but nearly impossible to use at any scale.
Structured data is different. It requires someone to sit down and design a shared model before the data is ever captured: what gets measured and how, which metadata travels with every observation, and how different data types connect to each other.
For example, take something as basic as Brookfield viscosity (a commonly used industry standard for measuring a fluid’s resistance to flow). Pull records from a typical unstructured lab system, and you might see the same property written three different ways: “Viscosity, 7D = 3000,” “BV, ON = 1800,” or “Brookfield Visc. Sp #4 = 5500.”
Are those comparable? Maybe. It depends on multiple factors, including the spindle, the RPM, the temperature, how long the sample was aged, and which instrument was used—none of which was recorded.
A structured data model captures all of this information as separate, searchable fields attached to every measurement. The difference in what you can do with the data afterward isn’t trivial. It’s the difference between data you can use and data you can’t.
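To make that concrete, here is a minimal sketch of what such a structured record could look like in code. The `ViscosityMeasurement` type and its field names are illustrative assumptions, not any specific platform’s schema; the point is that every condition affecting comparability travels with the value.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ViscosityMeasurement:
    """One Brookfield viscosity reading with its full measurement context.

    Field names are illustrative; a real data model would follow the
    organization's own controlled vocabulary.
    """
    sample_id: str
    value_cp: float        # viscosity in centipoise
    spindle: str           # e.g. "#4"
    rpm: int               # rotational speed during the reading
    temperature_c: float   # sample temperature at measurement
    aging_days: int        # how long the sample was aged first
    instrument_id: str     # which viscometer produced the reading
    measured_on: date

# Two readings that a free-text log would flatten into "Viscosity = 3000":
a = ViscosityMeasurement("F-1042", 3000.0, "#4", 20, 25.0, 7, "VISC-01", date(2024, 3, 5))
b = ViscosityMeasurement("F-1042", 1800.0, "#2", 60, 25.0, 0, "VISC-02", date(2024, 3, 12))

# With the context captured as fields, comparability becomes a query, not a guess:
comparable = (a.spindle, a.rpm, a.temperature_c) == (b.spindle, b.rpm, b.temperature_c)
print(comparable)  # False: different spindle and RPM, so the values aren't comparable
```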
Getting there is harder than it sounds, for two reasons that tend to compound each other. The first is technical: Existing systems like LIMS, ELNs, and equipment logs were built for specific purposes and weren't designed to feed AI. Stitching them together after the fact through data lake initiatives usually yields marginally better access without solving the harder problem: whether the data means the same thing across systems and sites.
Purpose-built product development platforms take a different approach, designing the data model around scientific workflows from the ground up, so that formulations, process conditions, and test results are linked by default rather than assembled after the fact.
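As a hedged illustration of what “linked by default” means, the sketch below connects plain Python records through shared identifiers. The entity names (`Formulation`, `ProcessRun`, `TestResult`) and their fields are assumptions made for this example, not any real product’s data model.

```python
from dataclasses import dataclass

@dataclass
class Formulation:
    formulation_id: str
    components: dict[str, float]  # ingredient -> weight fraction

@dataclass
class ProcessRun:
    run_id: str
    formulation_id: str  # link back to the formulation used
    mix_speed_rpm: int
    cure_temp_c: float

@dataclass
class TestResult:
    run_id: str          # link back to the process run tested
    property_name: str
    value: float
    unit: str

# Because the links exist at capture time, "assemble the context" is a lookup:
formulations = {"F-1042": Formulation("F-1042", {"resin": 0.70, "filler": 0.25, "additive": 0.05})}
runs = {"R-209": ProcessRun("R-209", "F-1042", mix_speed_rpm=400, cure_temp_c=120.0)}
results = [TestResult("R-209", "tensile_strength", 42.5, "MPa")]

for r in results:
    run = runs[r.run_id]
    formulation = formulations[run.formulation_id]
    # One traversal yields a complete record: composition, process
    # conditions, and measured property, with no manual stitching.
    print(formulation.components, run.cure_temp_c, r.property_name, r.value, r.unit)
```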
The second problem is cultural, and it’s often the more difficult one. Scientists have built their workflows around familiar tools, and when you try to change those workflows, you get resistance—even when the new approach is technically better. Plenty of sound data initiatives have died in implementation because this piece wasn’t taken seriously enough.
Failure modes to avoid
Not enough data
A simple model trained on a large dataset almost always outperforms a sophisticated one trained on too little data. When AI goes live before enough useful experiments have been captured in a consistent format, the models don’t generalize—and when they fail, the blame lands on the AI rather than on the data shortage that caused the problem.
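One practical way to detect this failure mode before going live is a learning-curve check: if held-out performance is still climbing as training data is added, the bottleneck is the data, not the model. Below is a minimal sketch using scikit-learn, with synthetic data standing in for real experimental records.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

# Synthetic stand-in for experimental records; real use would load the
# structured formulation/property data described above.
X, y = make_regression(n_samples=400, n_features=12, noise=10.0, random_state=0)

# Score a simple model at increasing training-set sizes with cross-validation.
sizes, _, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="r2",
)

for n, scores in zip(sizes, val_scores):
    print(f"n={n:>3}  mean validation R^2 = {scores.mean():.3f}")
# If the validation score is still rising at the largest size, the model is
# data-limited: collect more consistent experiments before deploying.
```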
Applying AI where it doesn’t fit
AI doesn’t work equally well on every type of problem. Push predictive models onto projects with sparse data, poorly defined outcomes, or highly variable results, and you’ll get noise, not answers. Choosing the right targets matters just as much as the modeling work itself. The strongest candidates are areas with a solid history of consistent experiments and measurable outcomes.
Setting the bar at the wrong place
It’s tempting to point AI at the hardest problems—the ones with hundreds of interacting variables and only a few dozen historical data points to draw from. When nothing useful comes out, everyone walks away thinking AI failed. Really, the expectation was never realistic. Starting with grounded targets lets AI build a real track record before you ask it to do something more difficult.
A maturity model for data-driven product development
Most organizations follow a recognizable path when it comes to data maturity. It usually starts with paper notebooks, moves to spreadsheets and Word documents, and eventually lands on some kind of shared digital repository such as SharePoint or an ELN. At that point, the data is at least findable, but it’s rarely usable across teams, and running any kind of cross-functional analysis usually requires a lot of manual cleanup.
The real turning point is when an organization builds a unified data model with scientific context in mind. When formulations, process conditions, and measured properties are captured consistently across teams and sites, something genuinely useful starts to accumulate: institutional knowledge that doesn’t evaporate when someone leaves.
Some modern platforms are purpose-built for this inflection point, giving both R&D and QC teams a shared environment where experimental data accumulates in a consistent, connected structure.
With this system in place, old experiments become training data. Patterns that no single scientist could spot on their own begin to surface. Once that foundation is solid, AI stops being a high-stakes gamble and becomes something more incremental and manageable.
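As a sketch of what “old experiments become training data” can look like in practice (with illustrative table and column names, not any specific platform’s schema): once records are captured in a connected structure, assembling a model-ready table is a couple of joins rather than a cleanup project.

```python
import pandas as pd

# Illustrative extracts from a unified data model; column names are assumptions.
formulations = pd.DataFrame({
    "formulation_id": ["F-1", "F-2"],
    "resin_frac": [0.70, 0.65],
    "filler_frac": [0.25, 0.30],
})
runs = pd.DataFrame({
    "run_id": ["R-1", "R-2"],
    "formulation_id": ["F-1", "F-2"],
    "cure_temp_c": [120.0, 135.0],
})
results = pd.DataFrame({
    "run_id": ["R-1", "R-2"],
    "tensile_mpa": [42.5, 39.1],
})

# Because the links were captured at the bench, training data is two joins:
training = (
    results
    .merge(runs, on="run_id")
    .merge(formulations, on="formulation_id")
    .drop(columns=["run_id", "formulation_id"])
)
print(training)  # features (composition, process) alongside the target property
```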
You find the projects with enough historical depth. You validate what the models suggest against what your experienced scientists know. You build AI recommendations into existing workflows. And you keep the feedback loops tight so that data quality and model performance improve together over time.
Institutional knowledge as a competitive advantage
There’s also an argument for data infrastructure that has nothing to do with AI. Industrial organizations hold decades of experimental knowledge, and they lose enormous amounts of it to turnover, inconsistent documentation, and siloed files. A well-designed data architecture changes that equation. It turns tacit knowledge into something an organization can build on.
Over a five-to-ten-year horizon, two organizations with equivalent AI tools but different approaches to data will end up in very different places. The one with structured, connected, historically rich data will get stronger model performance, spot innovation opportunities faster, catch problems before they become late-stage failures, and bring new scientists up to speed more quickly.
The one that tried to run AI on top of fragmented spreadsheets will keep cycling through vendor pilots, never getting the returns it was promised.
This article was originally published on Technology Networks.


