Deploying machine learning models in a production environment is a significant challenge, often requiring as much effort as training the model itself.
In this post, I will discuss one of the key aspects that needs to be taken care of: logging all the necessary information once the model is deployed.
At the stage of laboratory (offline) experiments, we usually do not think much about saving intermediate calculations and data to disk. After all, we have our dataset, and the final metrics let us select the appropriate model. If we forget something, we can simply reproduce the experiment and add it.
The situation is drastically different when the model is running online, i.e., when data arrives in real time, the model computes its predictions (and is perhaps retrained periodically), and specific actions are taken based on them. If we fail to capture some piece of information, we will not only make it difficult to analyze the results; in critical situations we will be unable to find out what is happening, nor will we be able to reproduce the experiment and draw conclusions.
Let’s look at what the information flow in an ML model typically looks like:
At the offline experiment stage, we usually have saved to disk only what I marked in green (sometimes, for example due to long computation times, we save more information, but this is not standard).
You may be wondering: which of these elements should we save after deploying the model?
The only sensible answer is… all of them!
Why? Because each arrow in the diagram implies performing some operations and calculations. If what you observe (the model’s metrics or the actions it took) is incorrect, the cause may lie in any of the preceding stages. Maybe the model received low-quality data? Perhaps the model had many stages and one of them failed? Below, I discuss each element in turn.
- Raw data – i.e., what we collect from various sources and use to build the model. We need to capture this original information before performing any operations on it. Often the data are already skewed, incomplete, or noisy at this stage, and this is where the cause of a model’s deteriorating performance lies.
- Processed data – i.e., the result of operations such as removing duplicates, correcting erroneous or missing data (preprocessing).
- Features for the model – Based on the data, we often compute additional representations for the model, such as one-hot encodings, embeddings, or various ratios and aggregates. We need to know exactly what the model’s input was in order to explain why it returned one prediction rather than another.
- ML model – Remember that your model can change dynamically: you may retrain it periodically or deploy newer, better versions. You always need to know which version of the model was used in each particular case.
- Intermediate states of the model – Imagine your model is an ensemble of a neural network, a random forest, and a logistic regression, and the final result is the arithmetic mean of their three predictions. If this combined model performs poorly, you will need to determine whether one or more of the components is at fault. The same applies if the model has a complex, multi-stage architecture. The more intermediate states you save, the easier it will be to pinpoint potential problems.
- Model predictions – Often the model predicts many things, and only part of that information reaches the final decision. This is typical of classification models, which return a probability for each class while the final decision is simply the most probable class. In this situation, you need to save the probabilities of all classes, not just the final result, to see in detail how the model behaved. The same goes for recommendation systems: even though they return only a handful of best-matched products or content, they score many more candidates, and it is worth keeping all of those scores (a sketch of persisting all these elements follows this list).
- Metrics – With them, you can monitor the model’s performance in real time and react when it deteriorates.
- Actions – The results of the model operating online usually lead to specific actions, such as displaying relevant information, saving something, or making a recommendation. You need to know whether what the model returned was correctly interpreted and whether the action was actually taken.
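To make this concrete, here is a minimal sketch of how the elements above could be persisted for each prediction. Everything in it is hypothetical: the table name, the columns, and the `log_prediction` helper are illustrations, and storing structured fields as JSON in SQLite is just one convenient choice.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical per-request log: one row per prediction, capturing every
# stage from raw input to the action taken. Adjust the schema to your case.
conn = sqlite3.connect("model_logs.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS prediction_log (
        request_id TEXT PRIMARY KEY,
        ts TEXT,
        model_version TEXT,   -- which version produced this prediction
        raw_input TEXT,       -- data before any preprocessing
        features TEXT,        -- the exact input the model saw
        intermediate TEXT,    -- e.g., per-component ensemble outputs
        probabilities TEXT,   -- full class distribution, not just the argmax
        action TEXT           -- what was actually done with the result
    )
""")

def log_prediction(model_version, raw_input, features,
                   intermediate, probabilities, action):
    """Persist one prediction together with all of its intermediate stages."""
    conn.execute(
        "INSERT INTO prediction_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (
            str(uuid.uuid4()),
            datetime.now(timezone.utc).isoformat(),
            model_version,
            json.dumps(raw_input),
            json.dumps(features),
            json.dumps(intermediate),
            json.dumps(probabilities),
            action,
        ),
    )
    conn.commit()
```

With a record like this, any suspicious action can be traced back through the probabilities, the component outputs, and the features to the raw input that produced it.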
Is logging all this information everything? Unfortunately not. To have a full picture of how the model works, you still need to log two more categories of events.
- System information – Here we log things such as the model’s response status, returned warnings or errors, as well as execution time, memory usage, and the load on the computational infrastructure (e.g., GPU memory); see the sketch after this list.
- “Meta”-information – It is worth keeping an additional document where you record, for example, the planned scope of the experiment and its assumptions, as well as any external events that may affect the interpretation of the results (e.g., exceptional days, events in the economy or society, weather phenomena, etc.).
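For the system information, Python’s standard logging module is enough for a first version. A sketch, assuming a `model.predict`-style interface (the logger name, file, and format are arbitrary choices):

```python
import logging
import time

# Write system events to a file with timestamps and severity levels.
logging.basicConfig(
    filename="model_system.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("model_service")

def predict_with_logging(model, features):
    """Run a prediction and log its status and execution time."""
    start = time.perf_counter()
    try:
        prediction = model.predict(features)
        logger.info("prediction ok, took %.3f s", time.perf_counter() - start)
        return prediction
    except Exception:
        # logger.exception records the full traceback for later diagnosis.
        logger.exception("prediction failed after %.3f s",
                         time.perf_counter() - start)
        raise
```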
To collect all this information, we usually need three places: a database, files for system logs (these two should be populated automatically), and one document for manually entered meta-information.
It may happen that detailed logging of everything will not be feasible with our resources (e.g., due to storage limits). In that case, we need to try solutions such as automatically deleting older records or aggregating them after some time, as in the sketch below. As a last resort, we may not log a particular stage, but giving it up must always be an entirely deliberate decision.
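One way such a retention policy could look, assuming the hypothetical `prediction_log` table from the earlier sketch: records older than 90 days are rolled up into daily counts, and only then are the detailed rows deleted. The cut-off and the aggregates are examples only.

```python
import sqlite3

conn = sqlite3.connect("model_logs.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS prediction_log_daily (
        day TEXT,
        model_version TEXT,
        n_predictions INTEGER
    )
""")
# Roll up old detailed rows into daily per-version counts...
conn.execute("""
    INSERT INTO prediction_log_daily
    SELECT date(ts), model_version, COUNT(*)
    FROM prediction_log
    WHERE date(ts) < date('now', '-90 days')
    GROUP BY date(ts), model_version
""")
# ...and only then drop the detail.
conn.execute("DELETE FROM prediction_log WHERE date(ts) < date('now', '-90 days')")
conn.commit()
```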
Depending on what our deployment looks like, the logging implementation will differ completely. A sample technology stack for a fully manual deployment (without a prediction-serving system such as NVIDIA Triton) could be SQLite for the database, Python’s logging module for system data, and Google Docs for entering meta-information.
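Gluing those pieces together, a hand-rolled serving path could look like the sketch below. It assumes the `log_prediction` and `predict_with_logging` helpers from the earlier sketches; `extract_features` and the decision rule are placeholders for your own code.

```python
# Assumes extract_features is your own preprocessing function and that
# model.predict returns a plain list of class probabilities.
def handle_request(model, raw_input):
    """Serve one request, logging every stage along the way."""
    features = extract_features(raw_input)
    probabilities = predict_with_logging(model, features)
    # Toy decision rule standing in for your real business logic.
    action = "recommend" if max(probabilities) > 0.5 else "skip"
    log_prediction(
        model_version="v1.2.0",  # ideally read from your model registry
        raw_input=raw_input,
        features=features,
        intermediate={},         # fill in if the model has several components
        probabilities=probabilities,
        action=action,
    )
    return action
```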
I hope this post encourages you to take a conscious approach to logging data in production. Remember that without it, it will be difficult for you to improve your model, reproduce experiments, or respond to problems.