THE CHALLENGE

A Cambridge-based biotech startup targeting rare neurological diseases faced the classic drug discovery dilemma: traditional high-throughput screening is expensive (£50-100M), time-consuming (5-7 years to a candidate), and has a high failure rate.

The startup needed to:

  • Identify novel small-molecule drug candidates for 3 rare disease targets
  • Validate binding affinity, toxicity, and synthesizability computationally
  • Dramatically reduce the time and cost to lead candidate identification
  • Compete with pharma giants despite limited resources

They sought an AI-first approach to accelerate discovery while maintaining scientific rigour.

OUR APPROACH

We built a generative AI drug discovery platform combining deep learning, molecular simulations, and laboratory automation:

GENERATIVE DESIGN

  1. Molecular Generation Models
  • Graph neural networks (GNNs) that generate molecular structures atom by atom
  • Variational autoencoders (VAEs) trained on 200M known compounds
  • Reinforcement learning that optimises for multiple objectives: binding affinity, drug-likeness (Lipinski’s Rule of Five), synthesizability and toxicity
  2. Virtual Screening at Scale
  • Molecular docking simulations predicting binding to target proteins
  • ADMET prediction (Absorption, Distribution, Metabolism, Excretion, Toxicity)
  • Quantum chemistry calculations (DFT) for top candidates
  • Processed 50M generated molecules computationally before synthesis
  3. Active Learning Loop
  • Iterative cycle: Generate → Predict → Synthesise top candidates → Test → Retrain models
  • Each synthesis/assay result improves model accuracy
  • Converges on optimal candidates faster than traditional screening
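
A minimal sketch of how the multi-objective reward in the reinforcement-learning step might fold these objectives into a single number. The `Candidate` fields, the weights, the pIC50 scaling and the use of a 1-10 synthetic-accessibility (SA) score are illustrative assumptions, not the platform's actual reward function. In a real pipeline the physicochemical descriptors would typically be computed with RDKit, and the potency and toxicity values would come from trained models. The Rule-of-Five check uses the standard thresholds and, as is common, tolerates one violation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Descriptors for one generated molecule. Field names are hypothetical;
    physicochemical values might come from RDKit, and potency, synthetic
    accessibility and toxicity from trained predictive models."""
    name: str
    mol_weight: float   # molecular weight, Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    pred_pic50: float   # predicted potency, -log10(IC50 in mol/L)
    sa_score: float     # synthetic accessibility, 1 (easy) to 10 (hard)
    tox_prob: float     # predicted probability of toxicity, 0 to 1


def lipinski_violations(c: Candidate) -> int:
    """Count Rule-of-Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def reward(c: Candidate, weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted multi-objective reward in [0, 1] (weights sum to 1).
    Molecules with more than one Rule-of-Five violation get zero reward."""
    if lipinski_violations(c) > 1:
        return 0.0
    w_bind, w_synth, w_safe = weights
    # Map pIC50 5 (10 uM) .. 9 (1 nM) onto 0..1, clipping outside that range.
    binding = min(max((c.pred_pic50 - 5.0) / 4.0, 0.0), 1.0)
    synth = (10.0 - c.sa_score) / 9.0   # 1.0 = easiest to make
    safety = 1.0 - c.tox_prob
    return w_bind * binding + w_synth * synth + w_safe * safety


# Two hypothetical generated molecules.
lead = Candidate("cand_A", 420.0, 3.1, 2, 6, pred_pic50=7.5, sa_score=2.8, tox_prob=0.1)
outlier = Candidate("cand_B", 610.0, 6.2, 3, 12, pred_pic50=8.0, sa_score=3.0, tox_prob=0.05)
print(f"{lead.name}: reward {reward(lead):.3f}")
print(f"{outlier.name}: {lipinski_violations(outlier)} Ro5 violations, reward {reward(outlier):.3f}")
```

Gating on the Rule of Five, rather than adding it as one more weighted term, stops the generator from trading drug-likeness away for a slightly higher predicted potency; whether the platform made that choice is not stated.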

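The Generate → Predict → Synthesise → Test → Retrain cycle can be sketched as a toy loop. Every component is a stand-in: a single number `x` in [0, 1] plays the role of a molecule, `assay` is a simulated oracle rather than lab data, and the kernel-regression surrogate with an upper-confidence-bound (UCB) selection rule is a swapped-in technique chosen for brevity, not the platform's GNN/VAE models. What the sketch shows is the shape of the loop: only a small batch per round reaches the expensive lab step, and each result sharpens the next round's predictions.

```python
import math
import random


def assay(x: float) -> float:
    """Simulated wet-lab assay (hypothetical): hidden activity peaking at x = 0.7.
    A single design variable x in [0, 1] stands in for a molecule."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def surrogate(x: float, data: list[tuple[float, float]], bw: float = 0.1) -> tuple[float, float]:
    """Nadaraya-Watson kernel regression over assayed points.
    Returns (predicted activity, uncertainty); uncertainty shrinks near existing data."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * bw ** 2)) for xi, _ in data]
    total = sum(weights)
    if total < 1e-12:  # no nearby data yet: fall back to an uninformed prior
        return 0.0, 1.0
    mean = sum(w * yi for w, (_, yi) in zip(weights, data)) / total
    return mean, 1.0 / (1.0 + total)


def active_learning(rounds: int = 5, n_generate: int = 50, batch: int = 3,
                    beta: float = 0.5, seed: int = 0) -> list[tuple[float, float]]:
    rng = random.Random(seed)
    data: list[tuple[float, float]] = []  # (design, assay result) pairs seen so far
    for _ in range(rounds):
        # Generate: half the designs explore near the current best, half are random.
        best_x = max(data, key=lambda d: d[1])[0] if data else 0.5
        designs = [min(max(rng.gauss(best_x, 0.1), 0.0), 1.0) for _ in range(n_generate // 2)]
        designs += [rng.random() for _ in range(n_generate - len(designs))]

        # Predict: rank designs by an upper confidence bound (exploit + explore).
        def ucb(x: float) -> float:
            mean, uncertainty = surrogate(x, data)
            return mean + beta * uncertainty

        picks = sorted(designs, key=ucb, reverse=True)[:batch]

        # Synthesise + test only the top batch; "retraining" this kernel model
        # simply means the new results feed into the next round's predictions.
        data.extend((x, assay(x)) for x in picks)
    return data


results = active_learning()
best_x, best_y = max(results, key=lambda d: d[1])
print(f"{len(results)} assays run; best activity {best_y:.2f} at x = {best_x:.2f}")
```

The `beta` parameter sets the balance between exploiting high predictions and exploring poorly characterised regions; with `beta = 0` the loop becomes purely greedy and can stall near its first good result.
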
INFRASTRUCTURE

  • Compute: AWS HPC clusters with GPU instances (p4d.24xlarge)
  • Chemical Libraries: ZINC, ChEMBL, PubChem integration
  • Molecular Simulation: Schrödinger Suite, RDKit, OpenMM
  • Laboratory Automation: Integrated with lab robotics for automated synthesis and testing

SCIENTIFIC VALIDATION

  • Retrospective validation on known protein-drug pairs (95% prediction accuracy)
  • Collaboration with academic researchers for target protein validation
  • Independent toxicology assessment by CRO (Contract Research Organisation)
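
The case study does not say how the 95% figure was computed. One common way to score a retrospective validation is to treat known protein-drug pairs as labelled examples and compare the model's scores with those labels, as in this sketch; the scores, labels and the 0.5 threshold below are hypothetical.

```python
def accuracy(scores: list[float], labels: list[int], threshold: float = 0.5) -> float:
    """Fraction of pairs whose thresholded score matches the known label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random known binder outscores a random non-binder
    (ties count half); equivalent to the Mann-Whitney U statistic scaled to [0, 1]."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six known protein-drug pairs (1 = known binder).
scores = [0.92, 0.81, 0.66, 0.35, 0.58, 0.12]
labels = [1, 1, 1, 0, 0, 0]
print(f"accuracy {accuracy(scores, labels):.2f}, ROC AUC {roc_auc(scores, labels):.2f}")
```

In this toy example the ranking is perfect (ROC AUC of 1.0) while accuracy is 5/6, because one non-binder sits above the 0.5 threshold. Accuracy depends on the chosen threshold and on the binder/non-binder balance, so it is usually reported alongside a threshold-free rank metric such as ROC AUC.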

THE RESULTS

DISCOVERY TIMELINE

Traditional high-throughput screening would take 4-6 years and £60M to identify lead candidates. Our AI platform achieved:

  • Months 1-3: Target protein characterisation and model training
  • Months 4-8: Generative model designed 2,500 novel candidates
  • Months 9-12: Virtual screening narrowed the pool to 150 candidates
  • Months 13-14: Synthesis and testing of the top 45 candidates
  • Month 14: 12 candidates showed strong binding and acceptable ADMET profiles
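
The attrition at each stage of the timeline can be tabulated directly from the counts above. This sketch uses only those stated numbers (the stage labels are paraphrased): about 6% of designs pass virtual screening, 30% of those are synthesised, roughly 27% of synthesised compounds become leads, and 0.48% of all designed molecules reach lead status overall.

```python
# Candidate counts per stage, taken from the discovery timeline.
funnel = [
    ("Designed by generative model", 2500),
    ("Passed virtual screening", 150),
    ("Synthesised and tested", 45),
    ("Strong binding + acceptable ADMET", 12),
]


def stage_rates(stages: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Fraction of candidates kept at each stage relative to the stage before it."""
    return [(name, n / prev_n) for (_, prev_n), (name, n) in zip(stages, stages[1:])]


for name, rate in stage_rates(funnel):
    print(f"{name}: {rate:.1%} of previous stage")
print(f"Overall: {funnel[-1][1] / funnel[0][1]:.2%} of designed candidates became leads")
```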

14 MONTHS. £4.2M INVESTED. 12 VIABLE LEAD CANDIDATES.

SCIENTIFIC OUTCOMES

  • Hit Rate: 27% (12 of 45 synthesised compounds viable) vs. 0.1-1% for traditional screening
  • Binding Affinity: 3 candidates with IC50 < 100 nM (strong binding)
  • Novelty: 9 of 12 candidates had no structural analogues in the existing literature (novel chemical scaffolds)
  • Synthesizability: 98% of designed molecules could be synthesised (vs. 30-40% for typical generative models)
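
With only 45 compounds tested, the 27% hit rate carries considerable sampling uncertainty. A Wilson score interval is one standard way to express this; the sketch below assumes a 95% confidence level (z = 1.96) and compares the point estimate with both ends of the 0.1-1% traditional range quoted above.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - margin, centre + margin


hits, tested = 12, 45
hit_rate = hits / tested
low, high = wilson_interval(hits, tested)
print(f"Hit rate {hit_rate:.1%} (95% CI {low:.1%} to {high:.1%})")
for baseline in (0.001, 0.01):  # traditional screening range quoted in the text
    print(f"Enrichment vs {baseline:.1%} baseline: {hit_rate / baseline:.0f}x")
```

For 12 of 45 the interval works out to roughly 16% to 41%, so even its lower end sits well above the traditional 0.1-1% range.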

BUSINESS IMPACT

  • Cost Savings: £55M+ vs. traditional discovery (92% reduction)
  • Time Savings: Timeline accelerated by 3-5 years
  • IP Portfolio: 4 patent applications filed for novel compounds
  • Investment: Series B funding (£28M) secured on the strength of the platform and candidates
  • Partnership: Big Pharma licensing deal signed (£180M potential value)

CLINICAL PROGRESS

As of March 2026:

  • 3 candidates in pre-clinical IND-enabling studies
  • 1 candidate preparing for Phase I clinical trial (Q4 2026)
  • Platform expansion: Applied to 4 additional disease targets

INDUSTRY TRANSFORMATION

The platform demonstrated that AI can fundamentally change drug discovery economics:

  • Democratisation: Small biotechs can compete with pharma R&D budgets
  • Neglected Diseases: Rare diseases with smaller markets become economically viable targets
  • Failure Risk Reduction: Computational validation before expensive synthesis
  • Speed to Clinic: Faster patient access to novel therapeutics

The startup is now licensing the platform to other biotechs and has spun out the AI capability as a separate drug-discovery-as-a-service company.

TECHNOLOGIES USED
Python, PyTorch, TensorFlow, RDKit, Schrödinger Suite, OpenMM, DeepChem, Graph Neural Networks, Reinforcement Learning, AWS (EC2 P4, S3, Batch), PostgreSQL, Airflow, Jupyter, React