
Bullseye - the critical importance of data accuracy in regulated industries
Jul 24, 2025
4 min read
Part three of a three-part series

When it comes to compliance in regulated industries, there’s little room for error - you either hit the bullseye, or you miss. Even a small slip-up can trigger costly consequences or regulatory setbacks. Take this scenario:
Megan from Marketing has misread a critical note on the updated ingredient list for a new sparkling lemon soda. Instead of highlighting that the new recipe contains 15% less sugar, she’s launched an advert claiming it has 1.5% less sugar. Fortunately, the compliance process catches the mistake before it gets published - and Megan gets off with a reminder from her manager to treat it as a ‘learning opportunity’, plus mandatory training on the marketing SOPs.
But imagine if I told you Megan isn’t a human but a Gen-AI ‘agent’. Would this catch you by surprise?
While the so-called ‘black box’ of artificial intelligence remains largely opaque, the software and scripts that run it are designed by humans. That’s why common AI tools such as ChatGPT are routinely cited for hallucinations - simply making things up [1]. The common thread in AI errors is the data the model is fed: trained on almost every entry on the World Wide Web, it’s no wonder the output is questionable from time to time. So, what happens when you are more selective about the data you train AI algorithms on? To explore how data quality affects AI accuracy, I spoke with experts Samuel Fryer and Lynn Epstein, CEO and COO at BlueberryLabs respectively.
Back to the chalkboard
“In Megan’s example, it’s plausible that her algorithm picked up on an advertising campaign from a competitor and took inspiration from it, producing the incorrect claim about sugar content. How do you prevent the ‘agent’ from doing that? Jumping to technical solutions to solve AI’s problems is certainly instinctive; however, it’s more critical for organizations to redefine trusted data sources.” - Lynn Epstein

In today’s world, the definition of a trusted source of information appears to be constantly up for debate. However, as an organization in a regulated industry, there will always be undisputed data repositories:
Government agency websites
Publicly funded bodies
Research publications
Recipes and formulations
Your own SOPs, policies, and other internal documents
According to Lynn, “these act as the irrefutable fact base of your organization.” Once you’ve clearly defined your trusted sources, you can unlock AI’s full potential. Lynn explains: “You can then utilize retrieval augmented generation (RAG) to combine AI technology with only the most up-to-date information from your agreed-upon trusted sources.”
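To make the pattern concrete, here’s a minimal sketch of a RAG step in Python. Everything in it is illustrative: the toy document store, the document names, and the word-overlap retriever are stand-ins for whatever vector index and LLM a real platform would use. The essential move is that the model is only ever handed passages drawn from the approved repository.

```python
# Minimal RAG sketch. All names here are illustrative: the toy document
# store and word-overlap retriever stand in for a real vector index.
from dataclasses import dataclass


@dataclass
class TrustedDoc:
    source: str  # e.g. an internal SOP ID or a government agency URL
    text: str


# The "irrefutable fact base": only vetted documents are added here.
TRUSTED_STORE = [
    TrustedDoc("Product spec v3 (2025-06-01)",
               "Sparkling lemon soda, new recipe: 15% less sugar than v2."),
    TrustedDoc("Marketing SOP MKT-014",
               "All nutrition claims must match the current product spec."),
]


def retrieve(question: str, k: int = 2) -> list[TrustedDoc]:
    """Toy retriever: rank documents by word overlap with the question.
    A production system would use embeddings and a vector index."""
    words = set(question.lower().split())
    return sorted(TRUSTED_STORE,
                  key=lambda d: len(words & set(d.text.lower().split())),
                  reverse=True)[:k]


def build_prompt(question: str) -> str:
    """Constrain the model to the retrieved, trusted passages only."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in retrieve(question))
    return ("Answer using ONLY the sources below. If they do not contain "
            f"the answer, say so.\n\nSources:\n{context}\n\nQuestion: {question}")


print(build_prompt("How much less sugar does the new lemon soda recipe have?"))
```

Because every passage arrives tagged with its source, whatever the model generates can be traced straight back to the fact base.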
Sam elaborates:
“By telling AI models which sources to reference, you provide them with a clear framework to guide their output, improving accuracy and usefulness.”
“In Megan’s simple example, the AI agent can check its marketing claim against its trusted sources, discover that the statement is inaccurate, and, per policy and/or procedure, block it from being published.”
This is a simple example, but what happens when things get more complicated or specific? “That’s the comfort in providing better-quality data to your LLM, even if it’s unstructured,” Sam explains. “With high-quality sources, you also gain a clear audit trail - showing when the information was published, who authored it, and what it references. This leads to more reliable answers to your questions.”
For example: “What US federal legislation affecting the food and beverage industry has been published since January 1st, 2025?”
Sam adds: “Sure, ChatGPT or another general AI model might attempt to answer - but without trusted data, can you rely on the result for critical decisions? The answer is no.”
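Sam’s point about the audit trail can be made concrete by attaching provenance metadata to every passage before it ever reaches the model. The record below is a sketch in the same spirit; the field names are assumptions, not a prescribed schema:

```python
# Illustrative provenance record: every passage handed to the LLM carries
# its own audit trail. Field names are assumptions, not a fixed schema.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenancedPassage:
    text: str
    source_title: str
    author: str
    published: date
    references: list[str] = field(default_factory=list)

    def citation(self) -> str:
        """Render the audit trail that accompanies any answer built on this."""
        refs = "; ".join(self.references) or "none"
        return (f"{self.source_title} ({self.author}, "
                f"{self.published.isoformat()}); references: {refs}")


passage = ProvenancedPassage(
    text="Sparkling lemon soda v3 contains 15% less sugar than v2.",
    source_title="Product spec v3",
    author="R&D team",
    published=date(2025, 6, 1),
    references=["Lab report LR-2210"],
)
print(passage.citation())
# Product spec v3 (R&D team, 2025-06-01); references: Lab report LR-2210
```

A date-bounded question like the legislation example above then becomes answerable by filtering on the published field, rather than trusting the model’s memory.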

Test, verify, test again.
LLMs will inevitably improve in accuracy over time. And that’s why RAG-ready solutions are so helpful for regulated industries, as Sam explains: “Being able to feed in controlled and trusted data sources gives you an advantage. It’s then on us humans to test and verify the output [2]. You can even use other LLMs to verify the output for you - that’s why our platform allows users to switch between popular LLMs.”
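The cross-checking Sam describes can be sketched in a few lines. The call_llm helper below is a hypothetical stand-in, not any specific vendor’s API; the pattern itself is model-agnostic:

```python
# Cross-model verification sketch. `call_llm` is a hypothetical stand-in
# for whichever provider SDK you actually use; the pattern is model-agnostic.
def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real API call (hosted or local model)."""
    raise NotImplementedError("wire this up to your provider of choice")


def verify_answer(answer: str, sources: list[str],
                  verifier_model: str = "independent-model") -> str:
    """Ask a second, independent model whether the draft answer is
    actually supported by the trusted sources."""
    prompt = ("You are a compliance verifier. Reply SUPPORTED or UNSUPPORTED, "
              "then explain which source, if any, backs each claim.\n\n"
              "Sources:\n"
              + "\n".join(f"- {s}" for s in sources)
              + f"\n\nDraft answer:\n{answer}")
    return call_llm(verifier_model, prompt)
```

Even with a second model in the loop, the verdict is advisory: the output stays unverified until a human signs it off.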
“We’ve worked with numerous multinational organizations that set the bar when it comes to data validation. The key is to work with technology and your teams to build fail-safe processes that apply a zero-trust approach to output accuracy until it can be validated.”
So there it is then...
Hitting the bullseye requires nuance and a trusted internal data governance process. But the truth is that technology and its implementation can do much of the heavy lifting for you. Discover how your knowledge, intelligence, and data science teams can benefit from RAG-ready AI solutions, customized for the legal, life science, and finance sectors.
Let’s talk data
Looking to get started on leveraging your data and integrating it with AI?
Take proactive steps to manage your data using the latest techniques. Book a call with one of our experts today.

References
[1] Guinness, H. (2024, July 10). What are AI hallucinations and how do you prevent them? Zapier. https://zapier.com/blog/ai-hallucinations/
[2] Amer, M. (n.d.). Evaluating outputs. Cohere. https://cohere.com/llmu/evaluating-llm-outputs



