Human-Level X-Ray Diagnosis: A research summary of CheXbert for labeling chest X-rays

Like nurses who can’t decipher a doctor’s handwriting, machine learning models can’t decipher medical scans — without labels. Conveniently, natural language models can read medical records to extract labels for X-ray images.

What’s new: A Stanford team including Akshay Smit and Saahil Jain developed CheXbert, a network that labels chest X-rays nearly as accurately as human radiologists. (Disclosure: The authors include Pranav Rajpurkar, teacher of deeplearning.ai’s AI for Medicine Specialization, as well as Andrew Ng.)

Key insight: A natural language model trained on the output of a rule-based system can generalize to situations the rule-based system doesn’t recognize. The insight isn’t new, but applying it to labeling radiology reports is.

How it works: CheXbert predicts a label from 14 diagnostic classes in the similarly named CheXpert dataset: one of 12 conditions, uncertain, or blank. CheXpert comes with a rule-based labeler that searches radiological reports for mentions of the conditions and determines whether they appear in an image.
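
The article doesn’t detail the model’s architecture, but the idea can be sketched as a BERT-style encoder that reads a report and emits one prediction per condition. Everything below is an illustrative assumption rather than a detail from the paper: the `bert-base-uncased` checkpoint stands in for BlueBERT, the condition list is an abbreviated sample, and the per-condition heads and class set are a simplification of whatever the authors actually used.

```python
# Minimal sketch of a BERT-based report labeler in the spirit of CheXbert.
# Assumptions (not from the article): Hugging Face transformers API,
# "bert-base-uncased" as a stand-in for BlueBERT, an abbreviated condition
# list, and one small classification head per condition.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

CONDITIONS = ["Cardiomegaly", "Edema", "Pneumonia", "Pleural Effusion"]  # illustrative subset
CLASSES = ["blank", "present", "uncertain"]  # per-condition output, following the article's description

class ReportLabeler(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear head per condition, applied to the [CLS] embedding.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, len(CLASSES)) for _ in CONDITIONS]
        )

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        # Shape: (batch, num_conditions, num_classes)
        return torch.stack([head(cls) for head in self.heads], dim=1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ReportLabeler()
batch = tokenizer(["Mild cardiomegaly. No pleural effusion."],
                  return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    pred = model(batch["input_ids"], batch["attention_mask"]).argmax(dim=-1)
```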

  • The researchers started with BlueBERT, a language model pre-trained on medical documents.
  • They further trained the model on CheXpert’s 190,000 reports to predict labels generated by CheXpert’s labeler.
  • Then they fine-tuned the model on 1,000 reports labeled by two board-certified radiologists.
  • The fine-tuning set also included augmented versions of the reports produced via back translation: the researchers used a Facebook translation model to render each report from English into German and back into English, producing rephrased versions (sketched in the code after this list).
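
For a concrete picture of the back-translation step, here’s a rough sketch. The authors used a Facebook translation model; the publicly available MarianMT checkpoints below are stand-ins, and the helper function name is ours.

```python
# Back-translation augmentation: English -> German -> English yields a
# rephrased report with (ideally) the same clinical meaning. The MarianMT
# checkpoints are assumptions; the authors used a Facebook translator.
from transformers import MarianMTModel, MarianTokenizer

en_de_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
en_de = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")
de_en_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")
de_en = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-de-en")

def back_translate(reports):
    # English -> German
    de_ids = en_de.generate(**en_de_tok(reports, return_tensors="pt",
                                        padding=True, truncation=True))
    german = en_de_tok.batch_decode(de_ids, skip_special_tokens=True)
    # German -> English, producing paraphrased reports for augmentation
    en_ids = de_en.generate(**de_en_tok(german, return_tensors="pt",
                                        padding=True, truncation=True))
    return de_en_tok.batch_decode(en_ids, skip_special_tokens=True)

augmented = back_translate(["Stable cardiomegaly. No pleural effusion or pneumothorax."])
```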

Results: CheXbert achieved an F1 score of 0.798 on the MIMIC-CXR dataset of chest X-rays. That’s 0.045 better than CheXpert’s labeler and 0.007 short of a board-certified radiologist’s score.
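
For readers unfamiliar with the metric: F1 is the harmonic mean of precision and recall, so a score of 0.798 balances how many predicted findings were correct against how many true findings were caught. A generic illustration (not the authors’ evaluation code, and the labels are made up):

```python
# Generic F1 illustration with scikit-learn; labels are invented for the example.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1]  # radiologist's labels for one condition
y_pred = [1, 0, 1, 0, 0, 1]  # model's labels
print(f1_score(y_true, y_pred))  # 2 * precision * recall / (precision + recall) ≈ 0.86
```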

Yes, but: This approach requires a pre-existing, high-quality labeler. Moreover, the neural network’s gain over the rule-based system comes at the cost of interpretability.

Why it matters: A doctor’s attention is too valuable to spend labeling hundreds of thousands of patient records as one-hot vectors for every possible medical condition. Rule-based labeling can automate some of the work, but a fine-tuned language model can extract labels more accurately.

We’re thinking: Deep learning is poised to accomplish great things in medicine. It all starts with good labels.
