
Pharmaceutical

Scientific & GxP AI Knowledge Assistant

Built an Enterprise RAG assistant unifying PubMed, Veeva Vault QualityDocs, QUMAS and SharePoint into a single Q&A interface for scientific and regulatory teams.

Client: Confidential, pharma
Stack: Azure AI Search, Azure AI Foundry, Databricks, Python, Docker, RAG, LangChain

Outcomes

  • 3M+ documents indexed and continuously updated across PubMed, Veeva QualityDocs, QUMAS, and SharePoint
  • 45 monthly active users across scientific and quality teams
  • Every answer carries citations and references to source documents
  • Per-source access control is preserved end to end
  • Document keyword search and user-curated document selection

Scientific, quality and regulatory teams were searching across four disconnected systems (PubMed, Veeva QualityDocs, QUMAS and SharePoint) to find answers buried in over 3M documents. Each system had its own search interface, access model and content type, forcing users to query them separately and manually reconcile findings. As GenAI capabilities matured and the pressure to accelerate scientific and quality work increased, the cost of this fragmented workflow became hard to justify.

I designed, implemented and led the deployment of the Scientific & GxP AI Knowledge Assistant, an enterprise RAG assistant on Azure AI Search, Azure AI Foundry and Databricks. I delivered the application end to end: data engineering pipelines for continuous ingestion across the four sources, the AI systems handling retrieval and answer generation, the orchestration and infrastructure layer, and the frontend that scientists and quality teams use day to day. The platform continuously synchronizes over 3M documents from the source systems, preserves per-source access controls end to end, and grounds every answer in cited source documents.
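As a rough illustration of what "per-source access control plus citations" can look like, here is a minimal Python sketch. It is not the production implementation: the `Chunk` type, the `allowed_groups` field, the group names and the document IDs are all hypothetical, and the sketch assumes permissions are copied from each source system onto every chunk at ingestion time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str                # hypothetical document identifier
    source: str                # originating system, e.g. "PubMed" or "QUMAS"
    text: str
    allowed_groups: frozenset  # assumption: groups copied from the source at ingestion


def visible_chunks(chunks, user_groups):
    """Keep only chunks whose source permissions overlap the user's groups."""
    groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]


def cite(chunks):
    """Build a numbered citation list with one entry per distinct document."""
    seen = []
    for c in chunks:
        key = (c.source, c.doc_id)
        if key not in seen:
            seen.append(key)
    return [f"[{i}] {source}: {doc_id}" for i, (source, doc_id) in enumerate(seen, start=1)]


# Illustrative data only; IDs, texts and groups are made up.
corpus = [
    Chunk("PMID-0001", "PubMed", "Public literature abstract.", frozenset({"everyone"})),
    Chunk("SOP-0042", "QUMAS", "Deviation handling procedure.", frozenset({"quality"})),
    Chunk("SOP-0042", "QUMAS", "Escalation steps.", frozenset({"quality"})),
]

scientist_view = visible_chunks(corpus, ["everyone", "research"])
quality_view = visible_chunks(corpus, ["everyone", "quality"])

print(cite(scientist_view))  # only the PubMed document is citable
print(cite(quality_view))    # PubMed plus the QUMAS SOP, cited once
```

The point of the sketch is the ordering: the permission filter runs before anything reaches answer generation, so a citation can never point at a document the user could not open in the source system.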

A defining design choice was giving users explicit control over which documents inform each answer. Through keyword and metadata search across the full corpus, users first narrow down to the documents they trust and then ask their questions. This bounds retrieval to that selection and creates a transparent audit trail of what drove each answer. It was a deliberate choice for regulated environments, where unverified answers are unusable.
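The select-then-ask flow can be sketched in a few lines of Python. This is a simplified stand-in, not the deployed retrieval stack: it ranks by plain keyword overlap instead of a search index, and the function name `retrieve_within_selection`, the chunk dictionaries, the audit fields and the document IDs are all hypothetical.

```python
import re
from datetime import datetime, timezone


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_within_selection(question, chunks, selected_doc_ids, top_k=3):
    """Rank only chunks from user-selected documents; return hits and an audit record."""
    selected = set(selected_doc_ids)
    q_terms = tokenize(question)
    scored = []
    for chunk in chunks:
        if chunk["doc_id"] not in selected:
            continue  # documents outside the user's selection never reach ranking
        overlap = len(q_terms & tokenize(chunk["text"]))
        if overlap:
            scored.append((overlap, chunk))
    # Stable sort: ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    hits = [chunk for _, chunk in scored[:top_k]]
    audit = {
        "question": question,
        "selected_doc_ids": sorted(selected),
        "used_doc_ids": sorted({h["doc_id"] for h in hits}),
        "asked_at": datetime.now(timezone.utc).isoformat(),
    }
    return hits, audit


# Illustrative data only; IDs and texts are made up.
chunks = [
    {"doc_id": "PMID-0001", "text": "Review of enzalutamide resistance mechanisms."},
    {"doc_id": "SOP-0042", "text": "Protocol for enzalutamide resistance assays."},
    {"doc_id": "SOP-0099", "text": "Enzalutamide resistance training material."},
]

hits, audit = retrieve_within_selection(
    "What drives enzalutamide resistance?",
    chunks,
    selected_doc_ids=["PMID-0001", "SOP-0042"],
)
print([h["doc_id"] for h in hits])
print(audit["used_doc_ids"])
```

Even though `SOP-0099` matches the question, it is excluded because the user did not select it, and the audit record captures both what was selectable and what was actually used.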

This design unlocks cross-source questions that no single system could answer on its own: “What does recent literature say about resistance mechanisms to enzalutamide, and which of our internal protocols address this?”, “For a given compound, what trials have been published externally and what internal study reports do we have on file?”, or “Which of our SOPs reference the regulatory standards cited in this recent publication?”. Scientific, quality and regulatory teams can now reconcile external evidence with internal policy in a single conversation, with citations back to every source.
