The F1 score is a key metric used to evaluate the accuracy of a machine learning model. It helps us understand how well our models are performing, especially when dealing with imbalanced datasets.
At SmarterX, we use F1 scores to validate our models. It is a critical part of our process before we bring a model or CPG decision to market.
The F1 score is a measure of a model's accuracy that considers both precision and recall. It ranges from 0 to 1, with 1 being the best possible score.
Specifically, the F1 score is the harmonic mean of precision and recall. It provides a single metric that balances both concerns, ensuring the model identifies relevant items without generating too many false alarms.
Making accurate predictions is crucial. An F1 score gives us a reliable measure of our model's performance – and it removes subjectivity. For example, in regulatory compliance, we need to ensure that our models correctly identify products that fall under certain regulations without missing any or incorrectly flagging non-relevant products.
To calculate the F1 score, we first need to understand two concepts:

- **Precision**: of all the items the model flagged as positive, the fraction that are actually positive. It is computed from true positives (TP) and false positives (FP).
- **Recall**: of all the items that are actually positive, the fraction the model correctly flagged. It is computed from true positives (TP) and false negatives (FN).
The formulas are as follows:
Precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall:
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1 Score:
$$\text{F1 Score} = 2 \times \left( \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \right)$$
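The three formulas above can be sketched as a small Python function (a minimal illustration; the function and argument names are our own, not part of any particular library):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute the F1 score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)  # fraction of flagged items that are truly positive
    recall = tp / (tp + fn)     # fraction of actual positives the model caught
    # F1 is the harmonic mean of precision and recall
    return 2 * (precision * recall) / (precision + recall)
```

Note that this sketch assumes at least one item was flagged and at least one positive exists; production code should guard against division by zero.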
Let’s say we have a model that identifies defective products on a production line. Out of 100 products, the model produces:

- 70 true positives (defective products correctly flagged)
- 10 false positives (good products incorrectly flagged as defective)
- 20 false negatives (defective products the model missed)
We calculate:
Precision:
$$\text{Precision} = \frac{70}{70 + 10} = \frac{70}{80} = 0.875$$
Recall:
$$\text{Recall} = \frac{70}{70 + 20} = \frac{70}{90} \approx 0.778$$
F1 Score:
$$\text{F1 Score} = 2 \times \left( \frac{0.875 \times 0.778}{0.875 + 0.778} \right) \approx 0.824$$
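The arithmetic above can be checked with a few lines of Python (variable names are illustrative):

```python
tp, fp, fn = 70, 10, 20  # counts from the production-line example

precision = tp / (tp + fp)  # 70 / 80 = 0.875
recall = tp / (tp + fn)     # 70 / 90 ≈ 0.778
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Precision: {precision:.3f}")  # 0.875
print(f"Recall:    {recall:.3f}")     # 0.778
print(f"F1 score:  {f1:.3f}")         # 0.824
```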
This score tells us that our model is fairly accurate in identifying defective products, balancing both precision and recall. What counts as an acceptable F1 score depends on you and your business; at a 0.82, we would keep refining the model.