Transforming fragmented regulatory data into navigable intelligence for top pharmaceutical companies.
At Redica, we used GPT-4o and a knowledge graph to turn fragmented pharma regulatory data into navigable intelligence. We built a system that connected guidance documents, inspection reports, enforcement actions, and CFRs. Users could explore relationships between regulatory activity across different agencies and topics. I worked with the team to use GPT-4o for extracting relationships, summarizing documents, translating non-English content, and surfacing connections that weren't obvious through keyword search. The system allowed users to chat with any document, automatically generate site risk briefings, and explore complex regulatory topics. This made it much easier to find relevant context and make informed decisions.
Redica had excellent inspection data that was structured, clean, and trusted by top pharma companies. However, their regulatory intelligence product was still in early stages with limited structure and no connections to other data sources.
The challenge was taking a disorganized collection of guidance documents, warning letters, and enforcement actions and making it genuinely useful. Users needed more than just search capability; they needed to navigate relationships and understand context across regulatory activity, inspections, and risk patterns. QA, compliance, and strategy teams needed to spot trends before they became problems.
The individual pieces of data existed, but the meaningful relationships between them didn't. Our goal was to build those connections systematically at scale.
Most systems pile on more data. We focused on surfacing what matters and how it's connected.
We began by identifying the core entities that regulatory teams care about: guidance documents, inspections, enforcement actions, CFR sections, and the sites they reference.
Each entity included relevant metadata, and every connection had provenance tracking and confidence weighting. We used GPT-4o to identify potential relationships, LangChain to process and chunk lengthy documents, and Neo4j for graph storage and traversal. I collaborated with the engineering team on schema design and worked on the user experience to help people explore these relationships without feeling overwhelmed.
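The provenance and confidence requirements can be made concrete with a small sketch. This is an illustrative edge model, not Redica's actual schema; the field names and threshold are assumptions.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of an edge in the graph: every connection carries
# provenance (which extraction run produced it) and a confidence weight
# that can be used to filter low-certainty links before display.
@dataclass(frozen=True)
class Edge:
    source_id: str     # e.g. an inspection node
    target_id: str     # e.g. a guidance-document node
    relation: str      # "CITES", "RELATED_GUIDANCE", ...
    confidence: float  # 0.0-1.0, assigned by the extraction model
    provenance: str    # document/run that produced this edge
    extracted_on: date

def confident_edges(edges, threshold=0.7):
    """Keep only edges above a confidence threshold for default views."""
    return [e for e in edges if e.confidence >= threshold]

edges = [
    Edge("insp-483-001", "fda-guid-aseptic", "RELATED_GUIDANCE", 0.91,
         "gpt4o-run-2024-05", date(2024, 5, 1)),
    Edge("insp-483-001", "ema-annex-1", "RELATED_GUIDANCE", 0.42,
         "gpt4o-run-2024-05", date(2024, 5, 1)),
]
print(len(confident_edges(edges)))  # → 1: only the high-confidence edge survives
```

Keeping confidence on the edge rather than the node lets the same document pair carry both strong and weak relationships without losing either.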
Practical AI applications that improved daily workflows
The graph provided the underlying structure, while GPT-4o helped us extract meaningful insights from inspections, enforcement actions, and regulatory documents. We focused on reducing noise, minimizing manual work, and helping users find relevant answers more efficiently.
Most of Redica's users aren't searching for PDFs. They're trying to answer questions.
We added chat to any node in the graph. You could open an inspection or guidance doc and ask a real question. The model used the graph context and source text to give a useful answer, with references. No magic. Just fast access to information that mattered.
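The grounding step can be sketched as prompt assembly: pull the node's own text plus its graph neighborhood, then ask for an answer with citations. Function names, dict fields, and the prompt wording here are illustrative assumptions, not the production API.

```python
# Minimal sketch of "chat with a node": combine the node's source text with
# its graph neighbors into a single grounded prompt. The assembled string
# would be sent to GPT-4o; here we only build and inspect it.
def build_node_chat_prompt(node, neighbors, question, max_chars=4000):
    context_parts = [f"DOCUMENT: {node['title']}\n{node['text']}"]
    for n in neighbors:
        context_parts.append(f"RELATED ({n['relation']}): {n['title']}")
    context = "\n\n".join(context_parts)[:max_chars]  # hard cap on context size
    return (
        "Answer using only the context below. Cite document titles.\n\n"
        f"{context}\n\nQUESTION: {question}"
    )

node = {"title": "FDA 483 - Site A",
        "text": "Observation: inadequate process control."}
neighbors = [{"relation": "RELATED_GUIDANCE", "title": "EMA Annex 1"}]
prompt = build_node_chat_prompt(node, neighbors, "What guidance applies?")
print("EMA Annex 1" in prompt)  # → True
```

Injecting graph neighbors alongside the source text is what lets the model cite related guidance the document itself never names.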
A lot of documents in Redica had no summaries. Many weren't in English. That slowed everything down.
We used GPT-4o to fix both.
Every document now has a clear, scoped summary that regulatory teams can scan quickly. If the original language wasn't English, we translated it. If it lacked metadata, we filled it in with topic and geography. We gave users a reason to open the document instead of skipping it.
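The enrichment pass above can be sketched as a per-document task planner: decide which GPT-4o jobs a document still needs rather than running everything on everything. The field names and task labels are assumptions for illustration.

```python
# Hypothetical sketch of the enrichment planner: summarize if no summary,
# translate if not English, fill topic/geography metadata if missing.
def plan_enrichment(doc):
    tasks = []
    if not doc.get("summary"):
        tasks.append("summarize")
    if doc.get("language", "en") != "en":
        tasks.append("translate")
    if not doc.get("topic") or not doc.get("geography"):
        tasks.append("fill_metadata")
    return tasks

doc = {"title": "Anvisa inspection report", "language": "pt", "topic": None}
print(plan_enrichment(doc))  # → ['summarize', 'translate', 'fill_metadata']
```

Planning before calling the model keeps token spend proportional to what each document is actually missing.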
Regulatory documents often have genuine relationships that aren't obvious from their titles or surface content.
We used GPT-4o to identify meaningful connections that weren't apparent through keyword matching alone. For example, an FDA observation about inadequate process control could be linked to EMA guidance on aseptic processing. They covered similar regulatory themes but used different terminology. Traditional search might miss these connections, but the model could identify the conceptual relationships.
These connections appeared as "related guidance" or "related inspections" in the interface, providing users with relevant context without requiring them to know specific search terms.
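The FDA/EMA example shows why keyword matching falls short; the toy below makes that concrete. A tiny synonym map stands in for the GPT-4o judgment, which is a loud simplification: the real system used the model, not a lookup table.

```python
# Toy contrast between surface keyword overlap and concept-level matching.
# CONCEPTS is a stand-in for the model's understanding that "process
# control" and "aseptic processing" share a regulatory theme.
CONCEPTS = {
    "process control": "contamination-control",
    "aseptic processing": "contamination-control",
}

def keyword_match(a, b):
    return bool(set(a.lower().split()) & set(b.lower().split()))

def concept_match(a, b):
    def concepts(text):
        return {c for phrase, c in CONCEPTS.items() if phrase in text.lower()}
    return bool(concepts(a) & concepts(b))

obs = "Inadequate process control in filling line"
guidance = "Guidance on aseptic processing of sterile products"
print(keyword_match(obs, guidance), concept_match(obs, guidance))  # → False True
```

The pair shares no words, so search misses it; it shares a concept, so the graph links it.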
Customers spend hours compiling reports before audits or internal reviews. They pull citations manually, summarize findings, and guess what context to include. We built a tool that does most of that for them.
You enter a site or manufacturer. It pulls in relevant inspections, observations, enforcement actions, linked documents, and guidance. Then it assembles a briefing that's actually readable. Structured. Reviewable. Editable.
It doesn't replace judgment. It just saves the team from doing the same work over and over.
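The assembly step can be sketched as a template over graph query results. Section names, record fields, and the markdown layout here are illustrative; the real briefing is fed by Neo4j queries rather than a records list.

```python
# Sketch of the briefing generator: group linked records by section and
# render a structured, reviewable, editable document for one site.
SECTIONS = ["Inspections", "Observations", "Enforcement Actions", "Related Guidance"]

def assemble_briefing(site, records):
    lines = [f"# Site Briefing: {site}"]
    for section in SECTIONS:
        lines.append(f"## {section}")
        items = [r for r in records if r["type"] == section]
        if not items:
            lines.append("(none on record)")  # absence is itself a finding
        for r in items:
            lines.append(f"- {r['title']} ({r['date']})")
    return "\n".join(lines)

records = [
    {"type": "Inspections", "title": "FDA 483", "date": "2024-03"},
    {"type": "Related Guidance", "title": "EMA Annex 1", "date": "2022-08"},
]
briefing = assemble_briefing("Site A", records)
print(briefing.splitlines()[0])  # → # Site Briefing: Site A
```

Rendering empty sections explicitly matters in audit prep: "no enforcement actions on record" is information, not a gap.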
- Defined schema alongside engineering and data science teams
- Mapped product requirements to user-facing features and model evaluation
- Set evaluation metrics: recall of relevant nodes, user task completion, query latency
- Prioritized development based on customer interviews and feedback
- Built a feedback loop with SMEs to validate edge accuracy and reduce false positives
- Scoped and reviewed API contracts for frontend graph exploration tooling
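One of those metrics, recall of relevant nodes, is simple enough to show directly: the fraction of SME-labeled relevant documents that a query actually surfaced. The node IDs are illustrative.

```python
# Recall of relevant nodes: of everything the SMEs marked relevant,
# how much did the query actually return?
def recall_of_relevant(retrieved, relevant):
    if not relevant:
        return 1.0  # nothing to find counts as full recall
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

retrieved = ["doc-1", "doc-2", "doc-5"]
relevant = ["doc-1", "doc-5", "doc-9", "doc-12"]
print(recall_of_relevant(retrieved, relevant))  # → 0.5
```

Tracking recall per query type (guidance lookup vs. inspection history) shows which graph paths need more edges, not just an overall average.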
Regulatory teams can now see when new documents signal changes in inspection behavior
Cross-jurisdictional intelligence reveals hidden regulatory patterns and precedent connections
| Metric | Keyword Search | Graph Query |
|---|---|---|
| Avg. relevant docs found | 5.7 | 11.2 |
| Time to first insight | 3m12s | 38s |
| Tasks completed (SMEs) | 54% | 92% |
| # of hops to full context | N/A | 2.3 |
Fast, deep, and useful: the graph changes how users get work done
We limited default traversal depth for common queries and precomputed relationship paths for the most used node types. Redis handled caching. This kept UX responsive without oversimplifying the graph.
We had regulatory experts from top pharma companies on staff. They reviewed relationships directly. If a link didn't hold up, it was removed. We didn't pad counts or chase novelty. The graph had to reflect reality.
We also built feedback tools into the product. Early on, we weighted input from a trusted group of power users. They knew the space and gave direct, actionable feedback. It helped us catch weak connections and keep the signal clean.
Next up: plugging the graph into dynamic monitoring, triggering alerts when new documents strengthen risk signals for a known site. We've already started work on query-driven workflows and narrative explanations on top of the graph engine.
We turned a disconnected regulatory corpus into a navigable graph with provenance. Guidance, inspections, enforcement actions, and CFRs linked in one place.
Users stopped guessing keywords and hopping between tools. A citation, its finding, and the related guidance were a few clicks apart. Time to first insight fell from 3m12s to 38s; SME task completion rose from 54% to 92%.
Teams could track issues across sites and regions and prep for audits without copying data by hand. It reflected how people actually work and what they need to move faster and make better decisions.
I'll help you define objects and relationships, set evaluation, and use LLMs where it adds value. If you have volume and messy text, we can scope a pilot in ~4-8 weeks and measure time to insight and task completion.
Talk About Your Data