December 5 @ 12:00 PM – 1:00 PM
Advancing Agentic, Data-Driven, and Trustworthy Hypothesis Generation in Biomedical Research
Yang Lu, PhD
Assistant Professor
Cheriton School of Computer Science
University of Waterloo
Abstract:
Rapid developments in high-throughput sequencing have enabled biologists to collect large volumes of multi-omics data at unprecedented resolution. However, interpreting this growing volume of heterogeneous biological data is highly nontrivial. In this talk, I will present a data-driven research paradigm for discovering testable hypotheses directly from biological data in an interpretable and trustworthy fashion. In particular, the talk will focus on three recent works that address key aspects of biomedical research: analyzing data, generating and prioritizing hypotheses, and engaging with users.
(1) An interpretation method that detects non-additive interactions from any machine learning (ML) model. The detected interactions, treated as hypotheses, are rigorously controlled for statistical errors without relying on p-values. This method was the first to demonstrate to the community that higher-order interpretations of ML models can be achieved with confidence guarantees.
(2) An AI-driven agent that automatically translates biologists’ needs into actionable insights. The agent we developed automatically executes off-the-shelf Python-based bioinformatics tools, allowing researchers to generate analysis results with minimal tool-specific knowledge and coding expertise. This work was the first initiative to streamline the automatic, codeless execution of general-purpose bioinformatics tasks via conversation.
(3) A critical reevaluation of the problematic statistical estimation in the Basic Local Alignment Search Tool (BLAST), a cornerstone tool used in daily biomedical analysis over the past 30 years. We have introduced an alternative method that addresses this issue and does not yield inflated estimates of significance. Our study has the potential to influence and reshape numerous conclusions drawn by researchers.