Modern biological experiments generate an unprecedented amount of data. How do we discover new biology when we have millions of microscopy images or protein sequences and can no longer examine the data one by one? My research develops machine learning methods for discovering hypotheses in biology.
I have broad research interests, but central themes include:
How do we discover biological hypotheses with machine learning?
Self-supervised representation learning produces unbiased representations of biology
Visualization and interpretation discover biological hypotheses
How do we build models that ignore non-biological variation?