From Lab to Market: A Practical Framework for Evaluating AI Products in Research Organisations

May 26, 2026

Sipho Dikweni

A Practical Approach To Identifying AI Innovations With Real Commercial and Societal Potential

In research environments, we are exceptionally good at producing novel AI solutions: new models, improved accuracy, and breakthrough techniques. Yet many of these innovations struggle to move beyond pilots, not because they lack technical merit, but because they are not evaluated early enough through the lens of real-world deployment, trust, and commercial viability. Over time, one insight becomes unavoidable: AI products don’t fail in the lab; they fail in the real world. And the gap between those two environments is exactly where commercialisation managers must operate.

Traditional AI evaluation has focused heavily on accuracy, benchmark performance, and novelty. While these are important, they are not sufficient for commercialisation. The real question is whether an AI solution can move from prototype to a deployable system that is trusted and capable of generating sustainable revenue. This requires a broader evaluation approach—one that balances novelty, system readiness, trust, and economics in a structured and disciplined way.

The starting point is problem–market fit. Many research projects begin with a solution and then attempt to find a market, but commercial success requires the opposite. It is critical to understand whether AI is solving a real, urgent problem that organisations are already spending money on. If there is no existing budget or clear demand, the risk of commercial failure is high. For example, an AI model for predictive maintenance may demonstrate strong technical performance, but if existing rule-based systems are already sufficient and significantly cheaper, adoption will depend on whether the AI delivers a clear and measurable economic advantage.

Closely linked to this is novelty and defensibility. Novelty absolutely has value, but it must be interrogated carefully. Not all novelty translates into commercial advantage. It is important to distinguish between technical novelty, system-level innovation, and data-driven advantage. A marginal improvement in model accuracy may be academically impressive but commercially irrelevant if users cannot perceive the difference. On the other hand, a system that reconfigures workflows or leverages proprietary data to continuously improve can create meaningful and defensible value. The key question is whether the novelty shows up in a way that the user experiences and whether it can be sustained over time.

Even when problem–market fit and novelty are present, many AI solutions fail at the point of system readiness. A model that performs well in a controlled environment often struggles when exposed to real-world conditions such as messy data, inconsistent inputs, and integration with legacy systems. Commercialisation managers must therefore evaluate whether the solution can operate reliably outside the lab. This includes assessing how the system handles variability, whether it can integrate into existing infrastructure, and what is likely to break first during deployment. For instance, an AI system designed for solar panel defect detection may work well on curated datasets but fail when deployed in environments with dust, lighting variability, or limited connectivity.

Perhaps the most critical and most underestimated dimension is trust and adoption. AI systems are often abandoned not because they are inaccurate, but because users cannot predict when they will be wrong. This uncertainty introduces what can be described as a “trust tax.” In practice, users begin to double-check outputs, run parallel processes, and validate results manually. This creates hidden costs in the form of verification time, error correction, and increased cognitive load. If an AI system saves time but requires constant checking, it does not create value; it creates friction. In environments such as banking or healthcare, where decisions carry significant risk, this trust tax can completely undermine adoption.
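The arithmetic behind the trust tax can be made explicit. The sketch below is illustrative only: the class, field names, and all figures are hypothetical assumptions, not measurements from any real deployment. It subtracts expected verification and correction time from the gross time an AI tool saves per task.

```python
from dataclasses import dataclass


@dataclass
class TrustTaxInputs:
    """Hypothetical per-task figures; times are in minutes, rates are fractions (0 to 1)."""
    manual_minutes: float        # time to complete the task without the AI
    ai_minutes: float            # time to complete the task with the AI, before any checking
    verification_rate: float     # fraction of outputs users double-check
    verification_minutes: float  # time spent on each check
    error_rate: float            # fraction of outputs that need correcting
    correction_minutes: float    # time spent fixing each error


def trust_tax(inputs: TrustTaxInputs) -> float:
    """Expected minutes per task spent checking and fixing AI output."""
    return (inputs.verification_rate * inputs.verification_minutes
            + inputs.error_rate * inputs.correction_minutes)


def net_minutes_saved(inputs: TrustTaxInputs) -> float:
    """Gross time saving minus the trust tax; a negative value means net friction."""
    gross_saving = inputs.manual_minutes - inputs.ai_minutes
    return gross_saving - trust_tax(inputs)


# Same tool, same error rate; only user behaviour differs (assumed numbers).
trusted = TrustTaxInputs(10, 2, 0.1, 5, 0.02, 15)     # users spot-check 10% of outputs
distrusted = TrustTaxInputs(10, 2, 1.0, 8, 0.02, 15)  # users re-verify every output

print(f"{net_minutes_saved(trusted):.1f}")     # 7.2
print(f"{net_minutes_saved(distrusted):.1f}")  # -0.3
```

Under these assumed figures, the model itself is identical in both cases. The value disappears purely because users feel they must check everything.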

Trust is built not only through accuracy but through how a system behaves when it fails. This is where failure design becomes essential. A fail-safe AI system is one that signals uncertainty clearly, limits the impact of errors, and enables recovery. It should indicate when confidence is low, avoid presenting uncertain outputs as definitive, and ensure that errors do not propagate silently through downstream processes. It should also allow users to easily correct, override, or escalate decisions. In a healthcare triage system, for example, a fail-safe approach would involve flagging uncertain diagnoses, escalating them to human experts, and preventing automated high-risk decisions without validation. Without these mechanisms, even a highly accurate system can quickly lose user trust after a single visible failure.
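One way to picture this failure design is as a routing rule that sits between the model and the user. The following is a minimal sketch under stated assumptions: the `Prediction` fields, the `Action` names, and both thresholds are hypothetical and would need to be set and validated for any real system, particularly in a clinical setting.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_ACCEPT = "auto_accept"
    FLAG_FOR_REVIEW = "flag_for_review"
    ESCALATE_TO_EXPERT = "escalate_to_expert"


@dataclass
class Prediction:
    label: str
    confidence: float  # the model's own confidence estimate, expected in [0, 1]
    high_risk: bool    # whether acting on this label carries significant risk


def route(pred: Prediction,
          review_below: float = 0.9,
          escalate_below: float = 0.6) -> Action:
    """Decide how a prediction is handled instead of always presenting it as definitive."""
    if not 0.0 <= pred.confidence <= 1.0:
        # A malformed confidence (including NaN) is treated as maximum uncertainty,
        # so the error cannot propagate silently downstream.
        return Action.ESCALATE_TO_EXPERT
    if pred.confidence < escalate_below:
        return Action.ESCALATE_TO_EXPERT
    if pred.high_risk or pred.confidence < review_below:
        # High-risk decisions are never automated without human validation.
        return Action.FLAG_FOR_REVIEW
    return Action.AUTO_ACCEPT


print(route(Prediction("routine follow-up", 0.97, high_risk=False)).value)  # auto_accept
print(route(Prediction("urgent referral", 0.97, high_risk=True)).value)     # flag_for_review
print(route(Prediction("urgent referral", 0.40, high_risk=True)).value)     # escalate_to_expert
```

The design choice worth noting is that the default path for anything unusual is a human, not the automated outcome. A real system would also need the override and correction paths described above, which this sketch leaves out.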

Related to this is the importance of predictability. Users generally prefer a system that is consistently reliable over one that is occasionally brilliant but unpredictable. Predictability allows users to form mental models, understand when to trust the system, and integrate it into their workflows. Without this consistency, users remain cautious and continue to verify outputs, increasing the trust tax and reducing overall value.

Workflow fit further determines whether an AI system is adopted or ignored. Even a technically strong and trustworthy solution will fail if it does not align with how people actually work. It is important to assess whether the AI reduces steps or introduces additional layers of complexity, whether it integrates into existing tools or requires users to switch platforms, and whether it is available at the point where decisions are made. Common warning signs include users exporting results to verify them elsewhere, maintaining manual backups, or using the AI only occasionally rather than as part of their core workflow. These are clear indicators that the system is not properly embedded and that adoption will remain limited.

Beyond adoption, commercial viability must also be addressed. A solution may be technically sound and trusted by users but still fail if it cannot generate sustainable revenue. This requires clarity on the business model, a realistic understanding of unit economics (particularly compute and infrastructure costs), and alignment between the user and the buyer. In some cases, AI solutions incur costs that make them difficult to scale profitably, despite their technical strengths.
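A rough unit-economics check can be sketched in a few lines. Everything here is an assumption for illustration: a flat monthly subscription, a per-request compute cost, a fixed infrastructure cost per customer, and invented figures. Real pricing and cost structures will differ.

```python
def monthly_margin(monthly_price: float,
                   requests_per_month: int,
                   compute_cost_per_request: float,
                   fixed_infra_cost: float) -> float:
    """Gross margin per customer per month under a flat subscription (hypothetical model)."""
    variable_cost = requests_per_month * compute_cost_per_request
    return monthly_price - (variable_cost + fixed_infra_cost)


def break_even_requests(monthly_price: float,
                        compute_cost_per_request: float,
                        fixed_infra_cost: float) -> float:
    """Usage level at which the subscription stops covering its own compute costs."""
    if compute_cost_per_request <= 0:
        raise ValueError("compute_cost_per_request must be positive")
    return (monthly_price - fixed_infra_cost) / compute_cost_per_request


# Assumed figures: 500 per month subscription, 0.02 compute cost per request, 50 fixed cost.
print(f"{monthly_margin(500, 20_000, 0.02, 50):.0f}")  # 50
print(f"{monthly_margin(500, 30_000, 0.02, 50):.0f}")  # -150
print(f"{break_even_requests(500, 0.02, 50):.0f}")     # 22500
```

With these assumed numbers, the heaviest users are the unprofitable ones. This is the pattern that makes some technically strong AI products hard to scale under flat pricing.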

Finally, strategic fit must be considered. Research organisations must decide whether they are the right entities to commercialise a given innovation. Not all solutions should be developed into standalone products. Some may be better suited for licensing, spin-outs, or partnerships. Choosing the wrong commercialisation pathway can stall even the most promising technologies.

Ultimately, the role of a commercialisation manager is to balance two competing forces: the research drive to maximise novelty and the commercial imperative to maximise adoption. The objective is not to prioritise one over the other, but to translate novelty into deployable, trusted, and scalable value. This requires disciplined evaluation across all dimensions, rather than an overreliance on technical performance alone.

In practice, every AI product should be tested against four fundamental conditions: whether it solves a real problem, whether it can operate reliably in the real world, whether users will trust and adopt it, and whether it can sustain a viable business model. If any one of these conditions is not met, commercialisation becomes significantly more difficult.
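These four conditions work as a gate rather than a weighted score: strength on one does not compensate for failure on another. A minimal sketch of that idea follows, with a hypothetical class and field names chosen purely for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class CommercialReadiness:
    """One yes/no judgement per condition (hypothetical structure)."""
    solves_real_problem: bool
    reliable_in_real_world: bool
    trusted_and_adopted: bool
    viable_business_model: bool


def unmet_conditions(assessment: CommercialReadiness) -> list[str]:
    """Return the names of the conditions that are not yet met, in declaration order."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


def ready_to_commercialise(assessment: CommercialReadiness) -> bool:
    """All four conditions must hold; there is no averaging across them."""
    return not unmet_conditions(assessment)


pilot = CommercialReadiness(
    solves_real_problem=True,
    reliable_in_real_world=True,
    trusted_and_adopted=False,
    viable_business_model=True,
)
print(ready_to_commercialise(pilot))  # False
print(unmet_conditions(pilot))        # ['trusted_and_adopted']
```

In practice each yes/no would rest on evidence gathered during evaluation. The value of the list is that it names the specific gap to work on next.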

Research organisations are sitting on immense AI potential, but unlocking that potential requires a shift in mindset: from models to systems, from accuracy to reliability, and from innovation to adoption. In the end, AI does not create value when it is impressive; it creates value when it is trusted, used, and embedded into real decisions.

I’d be interested to hear from others working at the intersection of AI and commercialisation: where have you seen the biggest gaps—novelty, deployment, or trust?
