Virchow: A Million-Slide Digital Pathology Foundation Model

Sep 14, 2023
Eugene Vorontsov, Alican Bozkurt, Adam Casson, George Shaikovski, Michal Zelechowski, Siqi Liu, Kristen Severson, Eric Zimmermann, James Hall, Neil Tenenholtz, Nicolo Fusi, Philippe Mathieu, Alexander van Eck, Donghun Lee, Julian Viret, Eric Robert, Yi Kan Wang, Jeremy D. Kunz, Matthew C. H. Lee, Jan Bernhard, Ran A. Godrich, Gerard Oakley, Ewan Millar, Matthew Hanna, Juan Retamero, William A. Moye, Razik Yousfi, Christopher Kanan, David Klimstra, Brandon Rothrock, Thomas J. Fuchs
Abstract
The use of artificial intelligence to enable precision medicine and decision support systems through the analysis of pathology images has the potential to revolutionize the diagnosis and treatment of cancer. Such applications will depend on models’ abilities to capture the diverse patterns observed in pathology images. To address this challenge, we present Virchow, a foundation model for computational pathology. Using self-supervised learning empowered by the DINOv2 algorithm, Virchow is a vision transformer model with 632 million parameters trained on 1.5 million hematoxylin and eosin stained whole slide images from diverse tissue and specimen types, which is orders of magnitude more data than previous works. The Virchow model enables the development of a pan-cancer detection system with 0.949 overall specimen-level AUC across 17 different cancer types, while also achieving 0.937 AUC on 7 rare cancer types. The Virchow model sets the state-of-the-art on the internal and external image tile level benchmarks and slide level biomarker prediction tasks.
Publication
arXiv preprint arXiv:2309.07778 (2023)
Authors
Alican Bozkurt
AI Scientist
I am an AI Scientist at Paige AI. I did my Ph.D. with Jennifer Dy, Dana Brooks, and Jan-Willem van de Meent at Northeastern University. My main research interests are in machine learning, with an emphasis on probabilistic programming and deep neural networks, and their applications in biomedical image processing. I am one of the developers of Probabilistic Torch, a library for deep generative models that extends PyTorch. I am also one of the maintainers of the PyTorch distributions module.