What to Monitor in a Machine Learning Model

Deploying a machine learning model is only half the battle — knowing whether it's still working weeks or months later is where most teams fall short.

Telling you to monitor your model is vague, so what specifically should you monitor? You can combine operational approaches (is the process working?) with analytical approaches (is the model still performing well?). We’ll break these down into three steps in this post:

  1. Confirm the predictions are being generated.
  2. Compare predictions to actual outcomes.
  3. Evaluate important distributions where applicable.

Confirm the Predictions Are Being Generated

This step is purely operational. The goal is to stay on top of any backend processes that break. It’s much better to alert your stakeholders to an issue with a process you own than to have them alert you. When you’re the one raising the issue, stakeholders see you as more competent; the inverse is also true when they’re the ones alerting you to an issue with your process.

How you implement this step depends on the technology you’re using. In general, the goal is to run a validation process that confirms predictions are generated in the correct place. Keep this process parallel to your prediction workflow rather than integrating it directly: if prediction generation fails, a separate process is still in place to identify the missing predictions, which is especially important when you’re using automation. In practice, this means two separately scheduled scripts: the model code that generates the predictions and the code that checks whether those predictions have been generated.
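A minimal sketch of such a check script in Python follows. It assumes, purely for illustration, that predictions land in a SQLite table named `predictions` with an ISO-formatted `prediction_date` column; the table, column, and function names are hypothetical, so swap in whatever storage and alerting your stack actually uses.

```python
import sqlite3
import sys
from datetime import date


def predictions_exist(conn: sqlite3.Connection, day: date) -> bool:
    """Return True if at least one prediction is stored for `day`.

    Assumes a hypothetical table predictions(prediction_date TEXT, ...)
    with dates stored as ISO strings (YYYY-MM-DD).
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM predictions WHERE prediction_date = ?",
        (day.isoformat(),),
    ).fetchone()
    return count > 0


def main(db_path: str) -> int:
    """Entry point for the separately scheduled check; returns an exit code."""
    today = date.today()
    # Note: sqlite3.connect creates an empty database if the file is missing.
    conn = sqlite3.connect(db_path)
    try:
        ok = predictions_exist(conn, today)
    except sqlite3.Error:
        # A missing table or unreadable database also counts as no predictions.
        ok = False
    finally:
        conn.close()
    if not ok:
        # Placeholder alert: replace with email, chat, or paging as needed.
        print(f"ALERT: no predictions found for {today}", file=sys.stderr)
        return 1
    return 0
```

Scheduled on its own, for example a little after the model script usually finishes, the job would call something like `sys.exit(main("predictions.db"))` (the file name is hypothetical). Because the check never imports or calls the model code, a non-zero exit still surfaces in the scheduler even if the prediction job never ran at all.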

While this process may not be the most glamorous, it offers the best return on investment (ROI) for your time. As mentioned, there’s a big difference between a stakeholder informing you of an issue and you informing them. The probability of the process failing may be quite low, but the impact when it does fail is significant. This type of process is also relatively easy to establish: in its simplest form, you schedule a job that checks whether the current day contains predictions. The investment is therefore small compared to the potential return of getting ahead of any issues.

Compare Predictions to Actual Outcomes

This is where you evaluate and monitor model drift. Conceptually, think of it as an ongoing test, similar to how you evaluate a model during training: once the actual outcome is available, you combine it with your model’s predictions to track how well the model performs. A line graph of predictions and actual outcomes shows where the term model drift comes from: as time progresses and the model’s training data becomes outdated, the predictions and the actual outcomes drift away from each other. The figure below shows an example of what this can look like over time. The drift starts in March and becomes severe in June. By monitoring your model, you’ll see these results as they occur and be able to retrain the model to account for new underlying behaviors.

Model Drift Example
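The comparison behind a chart like this can be computed with a few lines of Python. This sketch assumes the predictions have already been matched to their actual outcomes as `(day, predicted, actual)` tuples and averages the signed gap per calendar month; the function names, the `tolerance` threshold, and the sample numbers are illustrative choices, not a standard drift metric.

```python
from collections import defaultdict
from datetime import date


def monthly_gap(rows):
    """Average signed gap (predicted - actual) per calendar month.

    `rows` holds (day, predicted, actual) tuples for predictions that have
    already been matched to their actual outcomes. A value near zero means
    the predicted and actual lines still overlap for that month.
    """
    totals = defaultdict(lambda: [0.0, 0])  # [sum_of_gaps, count]
    for day, predicted, actual in rows:
        bucket = totals[(day.year, day.month)]
        bucket[0] += predicted - actual
        bucket[1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


def drifting_months(gaps, tolerance):
    """Months whose average gap exceeds `tolerance` in either direction."""
    return [month for month, gap in gaps.items() if abs(gap) > tolerance]


# Hypothetical matched rows for illustration only.
rows = [
    (date(2024, 1, 5), 100.0, 98.0),
    (date(2024, 1, 20), 102.0, 101.0),
    (date(2024, 3, 3), 100.0, 120.0),
]
gaps = monthly_gap(rows)
print(gaps)  # → {(2024, 1): 1.5, (2024, 3): -20.0}
print(drifting_months(gaps, tolerance=5.0))  # → [(2024, 3)]
```

A signed average tracks the two lines pulling apart in one direction; over- and under-predictions within the same month can cancel out, so you may want to track mean absolute error alongside it.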

Doing this in practice requires joining your predictions to your actual data, and a unique identifier associated with your predictions is what makes that join possible. As shown in the previous figure, we included the date with the predictions so that we could create such a graph. If you’re predicting something about customers, including the customer ID lets you compare each predicted outcome with the actual one.
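As a minimal sketch, assuming predictions and actual outcomes are each available as a Python dict keyed by the same identifier (customer IDs here), the join is an intersection of keys. The helper name and sample IDs are hypothetical; in a database or DataFrame workflow you would express the same thing as an inner join on the ID column.

```python
def join_predictions(predictions, actuals):
    """Pair each prediction with its actual outcome by a shared ID.

    Both inputs map an identifier (for example a customer ID) to a value.
    Returns the matched pairs plus the IDs that could not be matched on
    either side, which are worth watching in their own right.
    """
    shared = predictions.keys() & actuals.keys()
    matched = {key: (predictions[key], actuals[key]) for key in sorted(shared)}
    missing_actual = sorted(predictions.keys() - actuals.keys())
    missing_prediction = sorted(actuals.keys() - predictions.keys())
    return matched, missing_actual, missing_prediction


preds = {"C1": 0.8, "C2": 0.3, "C3": 0.5}
acts = {"C1": 1, "C3": 0, "C4": 1}
matched, no_actual, no_prediction = join_predictions(preds, acts)
print(matched)        # → {'C1': (0.8, 1), 'C3': (0.5, 0)}
print(no_actual)      # → ['C2']
print(no_prediction)  # → ['C4']
```

IDs without an actual outcome are normal when outcomes arrive with a lag, but a growing list of IDs without a prediction can point back to the operational problems the first step is meant to catch.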

Displaying this data as a time series lets you see whether the gap between predictions and actual outcomes is changing over time. If you look at only a fixed period of time, you lack context and run the risk of over- or underreacting. If you look at the results only in aggregate, it will likely take a while to identify any model drift.
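The risk of relying on the aggregate alone can be shown with a small worked example. The sketch below uses made-up numbers and computes mean absolute error (MAE) for the whole history and for each period, plus the ratio of the latest period's error to the average of the earlier periods; the function names are hypothetical.

```python
def mean_abs_error(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def error_in_context(periods):
    """Compare aggregate, per-period, and latest-vs-history error.

    `periods` is a list of lists of (predicted, actual) pairs, oldest first.
    Requires at least two periods and a nonzero historical error.
    """
    per_period = [mean_abs_error(period) for period in periods]
    overall = mean_abs_error([pair for period in periods for pair in period])
    history = per_period[:-1]
    latest_vs_history = per_period[-1] / (sum(history) / len(history))
    return overall, per_period, latest_vs_history


# Hypothetical data: four steady periods, then one where the model drifts.
steady = [(10, 8), (10, 12)]
drifted = [(10, 22), (10, -2)]
overall, per_period, ratio = error_in_context([steady] * 4 + [drifted])
print(overall, per_period, ratio)  # → 4.0 [2.0, 2.0, 2.0, 2.0, 12.0] 6.0
```

Here the aggregate MAE of 4.0 understates a latest-period error of 12.0, while the ratio of 6.0 supplies the historical context that a single period viewed on its own lacks.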

It’s helpful to set up a dashboard or recurring report that pushes this information to you. Each model is different, but aim to review the results each week or, at minimum, a few times a month. That way, you avoid a situation in which a model should have been retrained long ago but business leaders are still making decisions based on it.
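For the recurring report, even a plain-text digest can work. This sketch, using the same hypothetical `(day, predicted, actual)` row format, groups absolute errors by ISO week and formats the most recent weeks; delivering the text (email, chat, or a dashboard) is left to whatever tooling you already use, and the function name and sample rows are illustrative.

```python
from collections import defaultdict
from datetime import date


def weekly_digest(rows, weeks=4):
    """Summarize mean absolute error for the most recent `weeks` ISO weeks.

    `rows` holds (day, predicted, actual) tuples; returns one line per week.
    """
    errors_by_week = defaultdict(list)
    for day, predicted, actual in rows:
        iso_year, iso_week, _ = day.isocalendar()
        errors_by_week[(iso_year, iso_week)].append(abs(predicted - actual))
    lines = []
    # Sorting (year, week) keys keeps the weeks in chronological order.
    for year, week in sorted(errors_by_week)[-weeks:]:
        errors = errors_by_week[(year, week)]
        mae = sum(errors) / len(errors)
        lines.append(f"{year}-W{week:02d}: MAE {mae:.2f} (n={len(errors)})")
    return "\n".join(lines)


rows = [
    (date(2024, 1, 1), 10, 8),
    (date(2024, 1, 3), 10, 11),
    (date(2024, 1, 8), 5, 5),
]
print(weekly_digest(rows))
# 2024-W01: MAE 1.50 (n=2)
# 2024-W02: MAE 0.00 (n=1)
```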

Evaluate Important Distributions Where Applicable

This is a subset of your comparison of actual data and predictions, adding a layer to evaluate how prediction performance varies across different population groups. Let’s use the example of a resume scoring model to illustrate this concept.

An important measurement for a resume scoring model is how the model handles demographic information like ethnicity, gender, and age. You don’t want the model to create or perpetuate any bias from the data it was trained on. You can monitor this by comparing the actual results to the predictions for each subset of the population, as shown in this figure.

Model Monitoring Gender Example

By looking at the data this way, you can see how the actual results compare to the model’s predictions. In this example, the model actually creates a more stable balance between the scores of males and females. A finding like this is an insight you can bring to your stakeholders: it helps you demonstrate the model’s value while highlighting that you’re actively reviewing this type of information. A key component of model monitoring is building trust with your stakeholders, and proactively showing that you look at your data this way helps build that trust.
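The per-group view from the resume example can be sketched the same way. Assuming hypothetical `(day, group, predicted, actual)` records, one per scored resume, the function below averages predicted and actual scores for each group and month, which yields one predicted line and one actual line per group to plot; the function name and sample records are illustrative only.

```python
from collections import defaultdict
from datetime import date


def group_trend(records):
    """Mean predicted and mean actual score per group and calendar month.

    `records` holds (day, group, predicted, actual) tuples. Returns
    {(group, (year, month)): (mean_predicted, mean_actual)}, i.e. the data
    behind one predicted line and one actual line per group.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_pred, sum_actual, count]
    for day, group, predicted, actual in records:
        s = sums[(group, (day.year, day.month))]
        s[0] += predicted
        s[1] += actual
        s[2] += 1
    return {key: (p / n, a / n) for key, (p, a, n) in sorted(sums.items())}


# Hypothetical records for illustration only.
records = [
    (date(2024, 1, 5), "F", 70, 60),
    (date(2024, 1, 9), "F", 80, 90),
    (date(2024, 1, 7), "M", 75, 85),
    (date(2024, 2, 1), "M", 72, 70),
]
for (group, month), (pred, actual) in group_trend(records).items():
    print(group, month, pred, actual)
```

Groups with only a handful of records in a month can swing noticeably, so it helps to show the record count next to each point when you review the chart.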

Sometimes the cardinality of a column, that is, the number of distinct values it contains, is too large for a time series view to be useful. In a case like this, you can visualize the data in aggregate, as shown.

Model Monitoring Example in Aggregation

This makes it easier to review the data across multiple values. Even with a binary male–female view, this visualization more concisely shows how the model’s predictions produce a more stable and equal distribution compared to the actual outcomes.
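For a high-cardinality column, a sketch of the aggregate view drops the time dimension and ranks categories by how far the average prediction sits from the average actual outcome. The `(category, predicted, actual)` record format, the function name, and the `top` cutoff are assumptions for illustration.

```python
from collections import defaultdict


def category_gaps(records, top=10):
    """Rank categories by the gap between mean predicted and mean actual.

    `records` holds (category, predicted, actual) tuples covering the whole
    review period, with no time dimension. Returns up to `top` tuples of
    (category, mean_predicted, mean_actual), largest absolute gap first.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_pred, sum_actual, count]
    for category, predicted, actual in records:
        s = sums[category]
        s[0] += predicted
        s[1] += actual
        s[2] += 1
    rows = [(c, p / n, a / n) for c, (p, a, n) in sums.items()]
    rows.sort(key=lambda r: abs(r[1] - r[2]), reverse=True)
    return rows[:top]


# Hypothetical records for illustration only.
records = [("A", 10, 10), ("B", 10, 4), ("C", 10, 13), ("A", 12, 12)]
print(category_gaps(records))
# → [('B', 10.0, 4.0), ('C', 10.0, 13.0), ('A', 11.0, 11.0)]
```

Ranking by the gap keeps the review short even when the column has hundreds of values: you look first at the categories where the model and reality disagree most.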

Conclusion

Monitoring a machine learning model effectively comes down to three interconnected habits: confirming predictions are being generated reliably, tracking how well those predictions hold up against real outcomes over time, and evaluating performance across meaningful population segments. Done consistently, this work isn't just technical hygiene — it's how you build credibility with stakeholders, catch problems before they become costly, and ensure the models you deploy keep delivering value long after training ends.

Editor’s note: This post has been adapted from a section of the book Applied Machine Learning: Using Machine Learning to Solve Business Problems by Jason Hodson. Jason has worked in data-centric roles for nearly a decade. He currently works as an HR analytics manager, and he has prior experience in a forecasting role using the full range of applied machine learning. In a previous role, Jason wrote the end-to-end code for an enterprise hiring manager and candidate experience process, collaborating with recruiting leaders to understand and leverage data from a company-wide survey. He’s built large data models and dashboards and taught nontechnical users how to adopt and use them. Jason has been a technical mentor in all his roles, helping others develop their analytics and programming skill set. The common thread across Jason’s career is his ability to be a translator for stakeholders, peers, and junior team members. His learning journey also gives him a unique perspective: Before earning a master’s degree in business analytics, he was entirely self-taught. This has made his approach to teaching more practical, allowing concepts to translate better (and faster) into the business world.

This post was originally published 6/2026.

Recommendation

Develop Machine Learning Models to Solve Business Problems!

If you've ever wanted to put machine learning to work but didn't know where to start, this book is the guide you need. Follow three real-world business use cases, learn to select and evaluate the right models, and implement solutions that deliver measurable results — with downloadable sample code included.

by Rheinwerk Computing

Rheinwerk Computing is an imprint of Rheinwerk Publishing and publishes books by leading experts in the fields of programming, administration, security, analytics, and more.
