A six-tiered framework for evaluating AI models from repeatability to replaceability

Siqi Tian, Alicia Wan Yu Lam, Joseph Jao Yiu Sung*, Wilson Wen Bin Goh

*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

Abstract

Artificial intelligence (AI) is rapidly transforming biotechnology and medicine, but evaluating its safety, effectiveness, and generalizability is increasingly challenging, especially for complex generative models. Traditional evaluation metrics often fall short in high-stakes applications where reliability and adaptability are critical. We propose a six-tiered framework to guide AI evaluation across the dimensions of repeatability, reproducibility, robustness, rigidity, reusability, and replaceability. These tiers reflect increasing expectations, from basic consistency to deployment. Each tier is clearly defined, with actionable testing methodologies informed by the literature. Designed for flexibility, the framework applies to both traditional and generative AI. Through case studies in diagnostics and medical large language models (LLMs), we demonstrate its utility in fostering trustworthy, accountable, and effective AI for biomedicine, biotechnology, and beyond.

Original language: English
Journal: Trends in Biotechnology
DOIs
Publication status: Accepted/In press - 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2025 Elsevier Ltd

ASJC Scopus Subject Areas

  • Biotechnology
  • Bioengineering

Keywords

  • artificial intelligence
  • data science
  • machine learning
  • model evaluation
  • robustness
