THE CHALLENGE
A Cambridge-based biotech startup targeting rare neurological diseases faced the classic drug discovery dilemma: traditional high-throughput screening is expensive (£50-100M), time-consuming (5-7 years to a candidate), and has a high failure rate.
The startup needed to:
- Identify novel small-molecule drug candidates for 3 rare disease targets
- Validate binding affinity, toxicity, and synthesizability computationally
- Dramatically reduce the time and cost to lead candidate identification
- Compete with pharma giants despite limited resources
They sought an AI-first approach to accelerate discovery while maintaining scientific rigour.
OUR APPROACH
We built a generative AI drug discovery platform combining deep learning, molecular simulations, and laboratory automation:
GENERATIVE DESIGN
- Molecular Generation Models
  - Graph neural networks (GNNs) generating molecular structures atom by atom
  - Variational autoencoders (VAEs) trained on 200M known compounds
  - Reinforcement learning optimising for multiple objectives: binding affinity, drug-likeness (Lipinski’s Rule of Five), synthesizability and toxicity
- Virtual Screening at Scale
  - Molecular docking simulations predicting binding to target proteins
  - ADMET prediction (absorption, distribution, metabolism, excretion, toxicity)
  - Density functional theory (DFT) quantum chemistry calculations for top candidates
  - 50M generated molecules processed computationally before any synthesis
- Active Learning Loop
  - Iterative cycle: Generate → Predict → Synthesise top candidates → Test → Retrain models
  - Each synthesis and assay result feeds back into the models to improve their accuracy
  - Converges on promising candidates faster than traditional screening
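The multi-objective optimisation described above can be illustrated with a reward function of the kind a reinforcement-learning generator might maximise. This is a minimal sketch, not the platform's actual objective: the `Candidate` record, the weights (0.5/0.3/0.2) and the rule that more than one Rule of Five violation zeroes the reward are all assumptions for illustration. Descriptors would normally come from a toolkit such as RDKit; here they are passed in directly so the example runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-molecule record. All values are assumed to be
    computed or predicted upstream (descriptors from a cheminformatics
    toolkit, scores from trained models)."""
    mol_id: str
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    affinity: float     # predicted binding score, scaled to [0, 1]
    synth_score: float  # predicted synthesizability, scaled to [0, 1]
    tox_risk: float     # predicted probability of toxicity, in [0, 1]


def lipinski_violations(c: Candidate) -> int:
    """Count Rule of Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def reward(c: Candidate, w_aff: float = 0.5, w_syn: float = 0.3, w_tox: float = 0.2) -> float:
    """Weighted multi-objective reward with hypothetical weights.

    Molecules with more than one Rule of Five violation are treated as
    non-drug-like and get zero reward; otherwise the reward is a weighted
    sum of affinity, synthesizability and (1 - toxicity risk).
    """
    if lipinski_violations(c) > 1:
        return 0.0
    return w_aff * c.affinity + w_syn * c.synth_score + w_tox * (1.0 - c.tox_risk)


if __name__ == "__main__":
    # Illustrative values only, not computed from a real molecule.
    ok = Candidate("cand-001", 350.0, 2.1, 2, 5, 0.4, 0.9, 0.1)
    too_big = Candidate("cand-002", 650.0, 6.2, 2, 4, 0.9, 0.9, 0.0)
    print(round(reward(ok), 3))  # 0.65
    print(reward(too_big))       # 0.0 (two violations: MW and logP)
```

Gating on drug-likeness rather than adding it as another weighted term keeps the generator from trading away oral bioavailability for a marginally better affinity score; a softer penalty is an equally valid design.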
INFRASTRUCTURE
- Compute: AWS HPC clusters with GPU instances (p4d.24xlarge)
- Chemical Libraries: ZINC, ChEMBL, PubChem integration
- Molecular Simulation: Schrödinger Suite, RDKit, OpenMM
- Laboratory Automation: Integrated with lab robotics for automated synthesis and testing
SCIENTIFIC VALIDATION
- Retrospective validation on known protein-drug pairs (95% prediction accuracy)
- Collaboration with academic researchers for target protein validation
- Independent toxicology assessment by CRO (Contract Research Organisation)
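The case study does not say how the 95% figure was computed. One common form of retrospective check is to score known binders and non-binders for a target and measure how often a score threshold classifies them correctly. The sketch below does that with hypothetical scores and a hypothetical 0.6 threshold, and adds ROC AUC as a threshold-free comparison; neither metric is confirmed as the one the platform used.

```python
from typing import Iterable, List, Tuple

# Each pair is (predicted_score, is_known_binder); values here are hypothetical.
Pair = Tuple[float, bool]


def accuracy_at_threshold(pairs: Iterable[Pair], threshold: float) -> float:
    """Fraction of pairs where (score >= threshold) agrees with the known label."""
    items: List[Pair] = list(pairs)
    if not items:
        raise ValueError("no pairs to evaluate")
    correct = sum((score >= threshold) == is_binder for score, is_binder in items)
    return correct / len(items)


def roc_auc(pairs: Iterable[Pair]) -> float:
    """Probability that a random known binder outscores a random non-binder
    (ties count as half), i.e. the Mann-Whitney form of ROC AUC."""
    items: List[Pair] = list(pairs)
    pos = [s for s, y in items if y]
    neg = [s for s, y in items if not y]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    known = [(0.92, True), (0.81, True), (0.67, True),
             (0.62, False), (0.40, False), (0.30, False)]
    print(f"accuracy @0.6: {accuracy_at_threshold(known, 0.6):.3f}")  # 0.833
    print(f"ROC AUC:       {roc_auc(known):.3f}")                     # 1.000
```

Accuracy depends on the chosen threshold and on the ratio of binders to non-binders in the benchmark, which is why a rank-based metric such as ROC AUC is often reported alongside it: in the toy data every binder outranks every non-binder, yet one non-binder still lands above the threshold.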
THE RESULTS
DISCOVERY TIMELINE
Traditional high-throughput screening would take 4-6 years and £60M to identify lead candidates. Our AI platform achieved:
- Months 1-3: Target protein characterisation and model training
- Months 4-8: Generative model designed 2,500 novel candidates
- Months 9-12: Virtual screening narrowed these to 150 candidates
- Months 13-14: Synthesis and testing of the top 45 candidates
- Month 14: 12 candidates showed strong binding and acceptable ADMET profiles
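The timeline is a narrowing funnel: many designed molecules, a smaller screened set, and a short list sent for synthesis. A minimal sketch of that ranking logic follows, using the stage sizes from the timeline (2,500, 150, 45). The scores are random placeholders for model predictions, and the two-stage ordering and the 60/40 docking/ADMET blend are assumptions for illustration, not the platform's documented procedure.

```python
import heapq
import random


def top_k(candidates, key, k):
    """Return the k candidates with the highest key(candidate), best first."""
    return heapq.nlargest(k, candidates, key=key)


def screening_funnel(n_generated=2500, n_screened=150, n_synthesised=45, seed=0):
    """Two-stage virtual screening funnel with placeholder scores.

    Random numbers stand in for docking and ADMET predictions; the default
    stage sizes match the case-study timeline.
    """
    rng = random.Random(seed)
    pool = [
        {"id": f"cand-{i:04d}", "docking": rng.random(), "admet": rng.random()}
        for i in range(n_generated)
    ]
    # Stage 1: rank every generated molecule by its (placeholder) docking score.
    screened = top_k(pool, key=lambda c: c["docking"], k=n_screened)
    # Stage 2: re-rank survivors on a hypothetical 60/40 blend of docking
    # and ADMET scores before committing to synthesis.
    shortlist = top_k(
        screened, key=lambda c: 0.6 * c["docking"] + 0.4 * c["admet"], k=n_synthesised
    )
    return pool, screened, shortlist


if __name__ == "__main__":
    pool, screened, shortlist = screening_funnel()
    print(len(pool), len(screened), len(shortlist))  # 2500 150 45
```

Ordering the stages from cheapest to most expensive mirrors the approach described earlier, where DFT calculations are reserved for top candidates: each costlier method only ever sees the molecules that survived the cheaper filters.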
14 MONTHS. £4.2M INVESTED. 12 VIABLE LEAD CANDIDATES.
SCIENTIFIC OUTCOMES
- Hit Rate: 27% (12 of 45 synthesised compounds viable) vs. 0.1-1% for traditional screening
- Binding Affinity: 3 candidates with IC50 < 100 nM (strong binding)
- Novelty: 9 of 12 candidates had no structural analogues in the existing literature (novel mechanisms)
- Synthesizability: 98% of designed molecules could be synthesised (vs. 30-40% for typical generative models)
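The hit-rate comparison is simple arithmetic, and making the enrichment explicit helps: 12 hits from 45 synthesised compounds is roughly 27 to 267 times the 0.1-1% range quoted for traditional screening. The helper names below are illustrative; the pIC50 conversion (pIC50 = -log10 of IC50 in molar units) is the standard definition, under which the 100 nM threshold corresponds to a pIC50 of 7.

```python
import math


def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested compounds that met the success criteria."""
    return hits / tested


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return -math.log10(ic50_nm / 1e9)


if __name__ == "__main__":
    rate = hit_rate(12, 45)
    print(f"AI-designed hit rate: {rate:.1%}")  # 26.7%
    for baseline in (0.001, 0.01):
        # Enrichment: how many times higher the hit rate is than the baseline.
        print(f"vs {baseline:.1%} baseline: {rate / baseline:.0f}x")  # 267x, then 27x
    print(round(pic50_from_nm(100), 2))  # 7.0
```

On the pIC50 scale higher is better and each unit is a tenfold change in potency, so "IC50 < 100 nM" is the same criterion as "pIC50 > 7".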
BUSINESS IMPACT
- Cost Savings: £55M+ vs. traditional discovery (92% reduction)
- Time Savings: Timeline accelerated by 3-5 years
- IP Portfolio: 4 patent applications filed for novel compounds
- Investment: Series B funding (£28M) secured on the strength of the platform and candidates
- Partnership: Big Pharma licensing deal signed (£180M potential value)
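The cost and time figures can be checked against the baseline quoted in the discovery timeline (£60M and 4-6 years). In this minimal sketch, a £4.2M spend gives a £55.8M saving, about a 93% reduction; the 92% figure corresponds to the saving rounded down to £55M. Against the 4-6 year estimate, a 14-month programme saves roughly 2.8-4.8 years, consistent with the 3-5 years quoted. The function names are illustrative.

```python
def savings(baseline: float, actual: float) -> tuple[float, float]:
    """Return (absolute saving, fractional reduction) relative to a baseline."""
    saving = baseline - actual
    return saving, saving / baseline


def years_saved(traditional_years: float, actual_months: float) -> float:
    """Years saved when a traditional estimate (years) is compared with an actual duration (months)."""
    return (traditional_years * 12 - actual_months) / 12


if __name__ == "__main__":
    saving, fraction = savings(60.0, 4.2)             # figures in £M
    print(f"saving: £{saving:.1f}M ({fraction:.0%})")  # saving: £55.8M (93%)
    print(f"{55 / 60:.0%}")                            # 92%: same ratio with the saving rounded to £55M
    for years in (4, 6):
        # 2.8 years saved against a 4-year baseline, 4.8 against a 6-year one
        print(f"{years}-year baseline: {years_saved(years, 14):.1f} years saved")
```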
CLINICAL PROGRESS
As of March 2026:
- 3 candidates in preclinical IND-enabling studies
- 1 candidate preparing for a Phase I clinical trial (Q4 2026)
- Platform expansion: Applied to 4 additional disease targets
INDUSTRY TRANSFORMATION
The platform demonstrated that AI can fundamentally change drug discovery economics:
- Democratisation: Small biotechs can compete with pharma R&D budgets
- Neglected Diseases: Rare diseases with small markets become economically viable targets
- Failure Risk Reduction: Computational validation before expensive synthesis
- Speed to Clinic: Faster patient access to novel therapeutics
The startup is now licensing the platform to other biotechs and has spun out the AI capability as a separate drug-discovery-as-a-service company.
TECHNOLOGIES USED
Python, PyTorch, TensorFlow, RDKit, Schrödinger Suite, OpenMM, DeepChem, Graph Neural Networks, Reinforcement Learning, AWS (EC2 P4, S3, Batch), PostgreSQL, Airflow, Jupyter, React