Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens

  • Kunal Nagpal
    Google Health, Google LLC, Mountain View, California
  • Davis Foote
    Google Health, Google LLC, Mountain View, California
  • Fraser Tan
    Google Health, Google LLC, Mountain View, California
  • Yun Liu
    Google Health, Google LLC, Mountain View, California
  • Po-Hsuan Cameron Chen
    Google Health, Google LLC, Mountain View, California
  • David F. Steiner
    Google Health, Google LLC, Mountain View, California
  • Naren Manoj
    Google Health, Google LLC, Mountain View, California
  • Niels Olson
    Laboratory Department, Naval Medical Center San Diego, San Diego, California
  • Jenny L. Smith
    Laboratory Department, Naval Medical Center San Diego, San Diego, California
  • Arash Mohtashamian
    Laboratory Department, Naval Medical Center San Diego, San Diego, California
  • Brandon Peterson
    Laboratory Department, Naval Medical Center San Diego, San Diego, California
  • Mahul B. Amin
    Department of Pathology and Laboratory Medicine, University of Tennessee Health Science Center, Memphis
  • Andrew J. Evans
    Department of Pathology, Laboratory Medicine and Pathology, University Health Network and University of Toronto, Toronto, Ontario, Canada
  • Joan W. Sweet
    Department of Pathology, Laboratory Medicine and Pathology, University Health Network and University of Toronto, Toronto, Ontario, Canada
  • Carol Cheung
    Department of Pathology, Laboratory Medicine and Pathology, University Health Network and University of Toronto, Toronto, Ontario, Canada
  • Theodorus van der Kwast
    Department of Pathology, Laboratory Medicine and Pathology, University Health Network and University of Toronto, Toronto, Ontario, Canada
  • Ankur R. Sangoi
    Department of Pathology, El Camino Hospital, Mountain View, California
  • Ming Zhou
    Tufts Medical Center, Boston, Massachusetts
  • Robert Allan
    Pathology and Laboratory Medicine Service, North Florida/South Georgia Veterans Health System, Gainesville, Florida
  • Peter A. Humphrey
    Department of Pathology, Yale School of Medicine, New Haven, Connecticut
  • Jason D. Hipp
    Google Health, Google LLC, Mountain View, California
  • Krishna Gadepalli
    Google Health, Google LLC, Mountain View, California
  • Greg S. Corrado
    Google Health, Google LLC, Mountain View, California
  • Lily H. Peng
    Google Health, Google LLC, Mountain View, California
  • Martin C. Stumpe
    Google Health, Google LLC, Mountain View, California
  • Craig H. Mermel
    Google Health, Google LLC, Mountain View, California

Description

For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice.

The objective of this study was to evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens.

The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemical-stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019.

The main outcome was the frequency of exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists' opinions with the subspecialists' majority opinions was also evaluated.

For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P < .001). In subanalyses of biopsy specimens from an external validation set (n = 322), the Gleason grading performance of the DLS remained similar. For distinguishing nontumor from tumor-containing biopsy specimens (n = 752), the rate of agreement with subspecialists was 94.3% (95% CI, 92.4%-95.9%) for the DLS and similar at 94.7% (95% CI, 92.8%-96.3%) for general pathologists (P = .58).

In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.
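
The reference standard and outcome measure described in the abstract (a majority opinion from 2 or 3 subspecialist reviews, then the rate of exact agreement across the 5 categories) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `majority_opinion` and `exact_agreement`, the error raised when three reviews all differ, and the four toy specimens are assumptions made only for illustration, and the toy labels are not study data.

```python
from collections import Counter

# The 5 categories named in the abstract.
CATEGORIES = ("nontumor", "GG1", "GG2", "GG3", "GG4-5")


def majority_opinion(first, second, third=None):
    """Return the reference label for one specimen.

    Two concordant initial reviews settle the label; a discordant pair
    needs a third review. Handling of a three-way split is an assumption
    here (we raise), since the abstract does not describe that case.
    """
    for label in (first, second, third):
        if label is not None and label not in CATEGORIES:
            raise ValueError(f"unknown category: {label!r}")
    if first == second:
        return first
    if third is None:
        raise ValueError("discordant reviews need a third opinion")
    label, count = Counter([first, second, third]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among three reviews")
    return label


def exact_agreement(predicted, reference):
    """Fraction of specimens where the predicted label equals the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical toy specimens (NOT data from the study).
reference = [
    majority_opinion("GG1", "GG1"),
    majority_opinion("GG2", "GG3", "GG2"),  # discordant, third review decides
    majority_opinion("GG4-5", "GG4-5"),
    majority_opinion("GG3", "GG3"),
]
predicted = ["GG1", "GG3", "GG4-5", "GG3"]  # hypothetical grader output

rate = exact_agreement(predicted, reference)
print(f"exact agreement: {rate:.1%}")
```

The confidence intervals and P values reported in the abstract would need a separate statistical procedure, which the abstract does not specify and this sketch does not attempt.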

Journal

  • JAMA Oncology

    JAMA Oncology 6 (9), 1372-, 2020-09-01

    American Medical Association (AMA)
