One of the most important comparative metrics in ML is log loss. In layman’s terms, it measures how close a model’s predicted probabilities are to the actual outcomes. For example, a retailer could use an ML system to predict whether a customer will buy jeans or khaki pants based on their previous behavior. This article covers the basics of using the log loss function.

The metric is a reliable parameter for model comparison: the lower it is, the better the predictions. Begin with our guide to find out how the log loss function works, using the example of a clothing retailer.


## Example of a Problem

Suppose your company sells apparel online. It is searching for an accurate prediction model that will show which product a customer is likely to order based on their previous behavior in the store. The model faces a binary choice between a pair of jeans and a pair of khaki pants. These classes have true values of 1 and 0, respectively.


## Finding a Solution via Log Loss

Data scientists use log loss averages to compare different predictive models. In each case, log loss is calculated as the negative average of the logs of the corrected predicted probabilities, i.e., the probabilities the model assigned to the class that actually occurred. The logic is the same regardless of whether you assess clothing choices, generate weather forecasts, or decide if an email should be classified as spam.

The value of log loss for every instance shows how far the prediction probability is from the actual/true value. The lower the divergence, the more accurate the model.

- To decide whether a customer is more likely to choose jeans or khaki pants, the algorithm first calculates the probability of jeans (class 1) and then produces a classification based on that probability.
- By default, 0.5 is the threshold. This means that if the probability of jeans being chosen is 0.4, the model predicts that the customer will purchase khaki pants (0).
- A predicted probability of 1 for the true class is ideal: there is no divergence, so the log loss is exactly zero (ln(1) = 0).
- If the probability is 0.8, the divergence is 0.2, and the log loss is 0.223 (you will find the formula below). For a probability of 0.9, the value is 0.105. Here is how to calculate this output:

Log Loss_i = −(y_i · ln(p_i) + (1 − y_i) · ln(1 − p_i)), where:

- “i” = the record (observation),
- “ln” = the natural logarithm (base e),
- “y” = the true value (1 for jeans, 0 for khaki pants),
- “p” = the predicted probability of class 1 (1, 0.9, and 0.8 in our examples).
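The calculation above can be sketched in a few lines of Python. The helper name and the small clipping epsilon (which keeps ln() finite when p is exactly 0 or 1) are our own choices, not part of any specific library:

```python
import math

def log_loss_single(y, p, eps=1e-15):
    """Log loss for one observation.

    y: true class (1 = jeans, 0 = khaki pants)
    p: predicted probability of class 1 (jeans)
    """
    p = min(max(p, eps), 1 - eps)  # keep ln() finite at p = 0 or 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Values from the article: a customer who truly bought jeans (y = 1)
print(round(log_loss_single(1, 1.0), 3))  # 0.0   (perfect prediction)
print(round(log_loss_single(1, 0.9), 3))  # 0.105
print(round(log_loss_single(1, 0.8), 3))  # 0.223
```

Note that because y is either 0 or 1, only one of the two terms survives for any observation: the formula reduces to −ln(p) when the true class is 1 and −ln(1 − p) when it is 0.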

Log loss must be calculated for every observation, and the average of all outputs is used to compare entire models. This requires the use of the following formula, in which “N” is the number of observations in the set:

Log Loss = −(1/N) · Σ [y_i · ln(p_i) + (1 − y_i) · ln(1 − p_i)]
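To illustrate the model-level comparison, here is a minimal sketch in Python. The purchase history and the two models’ probabilities are made up for the example; the function name is our own:

```python
import math

def avg_log_loss(y_true, p_pred, eps=1e-15):
    """Negative average of the per-observation log losses
    over N observations (the model-level score)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep ln() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical purchase history: 1 = jeans, 0 = khaki pants
y_true  = [1, 0, 1, 1, 0]
model_a = [0.9, 0.2, 0.8, 0.7, 0.3]    # confident and mostly right
model_b = [0.6, 0.5, 0.55, 0.5, 0.45]  # hedging near the 0.5 threshold

# The more accurate model produces the lower average log loss
print(avg_log_loss(y_true, model_a) < avg_log_loss(y_true, model_b))  # True
```

Both models would classify most of these customers the same way at the 0.5 threshold, but model A earns a lower log loss because it assigns higher probabilities to the outcomes that actually occurred.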

## To Sum up

The lower the log loss average of a model, the more accurate its predictions. Note that these scores can only be compared between models evaluated on the same dataset.