Summary: Researchers developed a new approach to improving uncertainty estimates in machine learning models, making AI predictions more trustworthy. Their method, IF-COMP, uses the minimum description length principle to provide more reliable confidence measures for AI decisions, essential in high-stakes environments such as healthcare.
This scalable technique can be applied to large models, helping non-experts determine the reliability of AI predictions. The findings can lead to better decision-making in real-world applications.
Key facts:
- Improved accuracy: IF-COMP produces more accurate uncertainty estimates for AI predictions.
- Scalability: Applicable to large, complex models used in critical environments such as healthcare.
- User friendly: It helps non-experts assess the reliability of AI decisions.
Source: MIT
Because machine learning models can make false predictions, researchers often equip them with the ability to tell a user how confident they are about a particular decision. This is especially important in high-stakes environments, such as when using patterns to help identify diseases in medical images or filter job applications.
But a model’s uncertainty quantifications are only useful if they are accurate. If a model says it is 49% confident that a medical image shows a pleural effusion, then 49% of the time, the model should be right.
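Calibration in this sense can be checked directly from a model’s stated confidences and its hit rate. Below is a minimal sketch of one common diagnostic, expected calibration error; the function name and toy numbers are illustrative, not taken from the paper.

```python
# A minimal sketch of checking calibration, assuming we already have predicted
# confidences and correctness indicators for a held-out set (illustrative only).
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, compare average confidence to empirical
    accuracy in each bin, and return the weighted gap (0 = perfectly calibrated)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A model that says "49% confident" should be right about 49% of the time.
conf = [0.49, 0.49, 0.9, 0.8, 0.6]
hits = [1, 0, 1, 1, 0]
print(expected_calibration_error(conf, hits))
```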
MIT researchers have introduced a new approach that can improve uncertainty estimates in machine learning models. Their method not only generates more accurate uncertainty estimates than other techniques, but does so more efficiently.
In addition, because the technique is scalable, it can be applied to large deep learning models that are increasingly being used in healthcare and other safety-critical situations.
This technique could give end users, many of whom lack machine learning expertise, better information they can use to determine whether to trust a model’s predictions or whether the model should be deployed for a particular task.
“It’s easy to see that these models perform really well in scenarios where they’re very good, and then assume they’ll be just as good in other scenarios. This makes it especially important to push this kind of work, which seeks to better calibrate the uncertainty of these models so that it aligns with human notions of uncertainty,” says lead author Nathan Ng, a graduate student at the University of Toronto who is a visiting student at MIT.
Ng wrote the paper with Roger Grosse, an assistant professor of computer science at the University of Toronto, and senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems. The research will be presented at the International Conference on Machine Learning.
Quantification of uncertainty
Uncertainty quantification methods often require complex statistical calculations that do not scale well to machine learning models with millions of parameters. These methods also require users to make assumptions about the model and the data used to train it.
The MIT researchers took a different approach. They use what is known as the minimum description length principle (MDL), which does not require the assumptions that can hinder the accuracy of other methods. MDL is used to better quantify and calibrate uncertainty for the test points the model is asked to label.
The technique developed by the researchers, known as IF-COMP, makes MDL fast enough to be used with the kinds of large deep learning models deployed in many real-world environments.
MDL involves considering all the possible labels a model could give a test point. If there are many alternative labels for that point that fit well, the model’s confidence in the label it chose should decrease accordingly.
“One way to figure out how confident a model is would be to show it some counterfactual information and see how likely it is to believe you,” says Ng.
For example, consider a model that says a medical image shows a pleural effusion. If researchers tell the model that this image shows edema and it is willing to update its belief, then the model should be less confident in its original decision.
With MDL, if a model is confident when labeling a data point, it should use a very short code to describe that point. If it is unsure of its decision because the point could have many other labels, it uses a longer code to capture these possibilities.
The amount of code used to label a data point is known as stochastic data complexity. If researchers ask the model how willing it is to update its belief about a data point given contrary evidence, the stochastic data complexity should decrease if the model is confident.
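As a concrete illustration of this idea (not the authors’ implementation), a naive version of the computation would fine-tune a copy of the model toward each candidate label and see how plausible that label becomes; the log of the normalizer over all candidates then plays the role of the extra code length. The tiny linear classifier and all names below are assumptions made for brevity.

```python
# A deliberately naive sketch of the MDL idea: adapt a copy of a toy softmax
# classifier to each candidate label of a test point, then see how probable
# each label becomes. The log of the normalizer acts as the extra "code length"
# (stochastic data complexity). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def finetune_on_point(W, x, y, n_classes, lr=0.1, steps=5):
    """One-point gradient fine-tuning of a linear softmax classifier W toward label y."""
    W = W.copy()
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        p = softmax(W @ x)
        grad = np.outer(p - onehot, x)   # cross-entropy gradient for a single point
        W -= lr * grad
    return W

def mdl_confidence(W, x):
    n_classes = W.shape[0]
    # Probability of each candidate label after the model is allowed to adapt to it.
    adapted = np.array([softmax(finetune_on_point(W, x, y, n_classes) @ x)[y]
                        for y in range(n_classes)])
    normalizer = adapted.sum()           # grows when many labels fit the point well
    complexity = np.log(normalizer)      # longer "code": higher = less certain
    return adapted / normalizer, complexity

W = rng.normal(size=(3, 4))              # stand-in for a trained model's weights
x = rng.normal(size=4)                   # a test point to be labeled
probs, complexity = mdl_confidence(W, x)
print("per-label confidences:", probs)
print("stochastic data complexity:", complexity)
```

Note that this brute-force version retrains the model once per candidate label for every test point, which is exactly the cost problem described next.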
But testing each data point using the MDL would require a large amount of computation.
Speeding up the process
With IF-COMP, the researchers developed an approximation technique that can accurately estimate stochastic data complexity using a special function known as an influence function. They also employed a statistical technique called temperature scaling, which improves the calibration of the model’s outputs. This combination of influence functions and temperature scaling enables high-quality approximations of stochastic data complexity.
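The sketch below conveys the flavor of that speed-up under strong simplifying assumptions: a linear model, an identity-damped Hessian standing in for a real influence-function computation, and an arbitrary temperature. It is illustrative only and does not reproduce the authors’ temperature-scaled Boltzmann influence function.

```python
# Rough flavor of an influence-function shortcut: instead of re-fitting the
# model for every candidate label, a first-order term estimates how much each
# label's probability would rise if the model adapted to it, and temperature
# scaling smooths the raw logits first. Illustrative assumptions throughout.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ifcomp_like_scores(W, x, temperature=1.5, damping=10.0):
    n_classes = W.shape[0]
    logits = W @ x
    p = softmax(logits / temperature)            # temperature-scaled predictions
    adapted = np.empty(n_classes)
    for y in range(n_classes):
        # Gradient of log p(y|x) with respect to the weights of a linear softmax model.
        g = np.outer(np.eye(n_classes)[y] - p, x)
        # First-order estimate of the log-prob gain from adapting to label y,
        # with the Hessian crudely replaced by damping * identity (assumption).
        boost = (g * g).sum() / damping
        adapted[y] = p[y] * np.exp(boost)
    normalizer = adapted.sum()
    complexity = np.log(normalizer)              # higher = harder, more uncertain point
    return adapted / normalizer, complexity

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
print(ifcomp_like_scores(W, x))
```

The design point is that each label now costs one gradient evaluation rather than a full round of retraining, which is what makes this style of approximation usable on large models.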
Ultimately, IF-COMP can efficiently produce well-calibrated uncertainty quantifications that reflect a model’s true confidence. The technique can also determine whether the model has mislabeled certain data points or detect which data points are outliers.
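In practice, such per-point complexity scores can serve as a simple audit signal; the thresholding rule and numbers below are purely illustrative, not from the paper.

```python
# Toy audit: flag points whose complexity scores sit well above the typical
# value as candidates for human review (possible mislabels or outliers).
import numpy as np

scores = np.array([0.12, 0.10, 0.95, 0.14, 0.80, 0.11])   # toy per-point complexity scores
cutoff = scores.mean() + scores.std()                      # illustrative threshold
flagged = np.flatnonzero(scores > cutoff)
print("points to review:", flagged)
```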
The researchers tested their technique on all three tasks and found that it was faster and more accurate than other methods.
“It’s really important to have some confidence that a model is well calibrated, and there’s a growing need to detect when a specific prediction doesn’t seem quite right. Audit tools are becoming more necessary in machine learning problems as we use large amounts of unexamined data to make models that will be applied to problems that humans face,” says Ghassemi.
IF-COMP is model agnostic, so it can provide accurate uncertainty quantifications for many types of machine learning models. This could enable it to be deployed in a wider range of real-world settings, ultimately helping more practitioners make better decisions.
“People need to understand that these systems are very fallible and can make things up as they go. A model may look highly confident, but there are a number of different things it is willing to believe, given evidence to the contrary,” says Ng.
In the future, the researchers are interested in applying their approach to large language models and studying other possible use cases for the minimum description length principle.
About this AI research news
Author: Melanie Grados
Source: MIT
Contact: Melanie Grados – MIT
Image: Image is credited to Neuroscience News
Original research: Closed access.
“Measuring Stochastic Data Complexity with Boltzmann Influence Functions” by Roger Grosse et al. arXiv
ABSTRACT
Measuring Stochastic Data Complexity with Boltzmann Influence Functions
Estimating the uncertainty of a model’s prediction at a test point is an essential part of ensuring reliability and calibration under distribution shifts.
A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point and decreases confidence in a prediction if other labels are also consistent with the model and training data.
In this work we propose IF-COMP, a scalable and efficient approximation of the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions on test points as well as measure complexity in labeled and unlabeled settings.
We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently matches or outperforms strong baseline methods.