Planning Machine Learning Projects

Today, I would like to introduce you to the process of planning Machine Learning projects as practiced in my company, COGITA. If you are involved in ML projects, whether as an analyst or a Data Scientist, you are likely either taking part in this process or relying directly on its outcomes. So stick around till the end of this article, and you will likely find something relevant to your work.

Agile Approach to ML Projects Instead of Detailed Planning

First and foremost, when planning ML work, remember the agile philosophy of project management. In this light, the aim of the planning process should be a preliminary sketch that outlines the main phases, rather than a detailed plan of every stage. Decisions about details should instead be made on an ongoing basis, for example at the end of each two-week sprint.

In practice, though, you often need to fix the project timeframe fairly rigidly. What to do then?

There are several ways to handle this. Firstly, it is valuable to have access to historical projects from your own or your team’s portfolio. When starting a new project, you can quickly find projects of similar complexity or from the same domain and see how long they took.
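The lookup described above can be sketched in a few lines of Python. This is only an illustration: the `PastProject` record, the 1 to 5 complexity score, and all project names and durations are hypothetical placeholders, not data from a real portfolio.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PastProject:
    name: str
    domain: str
    complexity: int     # subjective 1-5 score assigned by the team (assumption)
    duration_md: float  # actual logged duration in man-days


def similar_projects(history, domain, complexity, tolerance=1):
    """Keep projects from the same domain or within `tolerance` of the complexity score."""
    return [
        p for p in history
        if p.domain == domain or abs(p.complexity - complexity) <= tolerance
    ]


def estimate_range(history, domain, complexity):
    """Return (shortest, median, longest) duration of similar projects, or None if there are none."""
    matches = similar_projects(history, domain, complexity)
    if not matches:
        return None
    durations = sorted(p.duration_md for p in matches)
    return durations[0], median(durations), durations[-1]


# Placeholder portfolio, for illustration only.
history = [
    PastProject("project_a", "retail", 3, 80.0),
    PastProject("project_b", "retail", 4, 120.0),
    PastProject("project_c", "banking", 2, 45.0),
    PastProject("project_d", "healthcare", 5, 200.0),
]

print(estimate_range(history, "retail", 3))
```

Reporting a range instead of a single number keeps the estimate honest about how much similar projects have varied in the past.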

The second approach involves setting a time limit but not defining expected metrics (similar to what I discussed in this blog about limiting scope with a set timeframe). It is very difficult to predict how long it will take to achieve, for example, 90% model accuracy. But it is quite easy to plan to spend X time on initial training and Y time on adjustments to achieve the highest quality.
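One way to make such a time limit concrete is a time-boxed search loop: it keeps trying configurations until the budget runs out and then returns the best result found so far. The sketch below uses plain random search and a toy scoring function; `timeboxed_search`, `evaluate`, and `sample_config` are hypothetical names chosen for this example, and random search is just one possible strategy.

```python
import random
import time


def timeboxed_search(evaluate, sample_config, budget_seconds, clock=time.monotonic):
    """Try configurations until the time budget is used up; return the best one found.

    The deadline is checked between trials, so a trial that has already
    started is allowed to finish and may overshoot the budget slightly.
    """
    deadline = clock() + budget_seconds
    best_config, best_score = None, float("-inf")
    trials = 0
    while clock() < deadline:
        config = sample_config()
        score = evaluate(config)
        trials += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, trials


def toy_score(x):
    """Stand-in for 'train a model and return its validation score'."""
    return -(x - 0.3) ** 2


rng = random.Random(0)
config, score, trials = timeboxed_search(toy_score, rng.random, budget_seconds=0.05)
print(f"best config {config:.3f} with score {score:.5f} after {trials} trials")
```

The plan then says how long you will search, not which score you will reach, which matches the idea of fixing time rather than metrics.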

The third approach is to estimate only the Proof-of-Concept (PoC) stage in detail and to give, for example, rough time ranges for the full algorithm-building stage. The PoC stage allows for preliminary data analysis and running initial models. Afterwards you will know where you stand and how far you are from the desired outcome (remember to start with benchmarks).

Start with “Why?”

When embarking on a Machine Learning project, it is important to first understand the problem you are trying to solve. It is worth asking simple questions: What is the intended model supposed to do? What need does it address? Will it automate a function completely or partially, or will it only provide recommendations while a human makes the final decisions?

Consider the advantage this solution will have over the current state. Why was the decision made to apply Machine Learning to this problem?

Keeping the end result in mind will make it easier for you to make decisions and plan work.

Gather Detailed Requirements

Now, think about the users of your model. They could be people from another department in your company, employees of your client (e.g., bank, store, hospital, etc.), or individual users of your ML-utilizing application.

Ask them why what they are currently using is not sufficient. How do they envision the desired solution? Consult on the form of interaction with the model, the shape of the user interface, etc.

Try to gather all requirements regarding the model—what level of effectiveness is ideal, and what is acceptable? How long is an acceptable wait time for the model’s response?

Then, investigate constraints: will these people want to use your model in every situation? Pay attention to trust in the model. If, for example, the model decides whether to grant a loan, users will likely require a justification for each decision beyond a bare YES/NO answer. Ask what kind of justification would be sufficient.

Our Intuition and Quick Hypothesis Validation

When designing any Machine Learning solution, consciously or not, you rely on your intuition about data, how the world works, and the capabilities of algorithms. Firstly, it is valuable for this intuition to be based on as broad a range of information as possible. That’s why I always ask for a data sample before planning work on models and try to talk to the target users of the model to understand their experiences, needs, and requirements. I also conduct research on current approaches to similar problems and algorithms used. I have described these actions more extensively here.

Then, it is worth documenting the assumptions (hypotheses) on which our intuition is based. The goal of the PoC stage should be to verify these hypotheses as quickly as possible. Initial work should really answer two questions: how wrong were we initially and what should we change in the further plan to achieve the goal.
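A hypothesis log does not need special tooling. Below is a minimal sketch of one possible structure; the `Hypothesis` fields and the example entries are hypothetical and only illustrate the kind of information worth recording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    statement: str                    # the assumption our intuition relies on
    check: str                        # how the PoC is supposed to verify it
    confirmed: Optional[bool] = None  # None until the PoC has produced an answer
    plan_change: str = ""             # what to change in the plan if it was refuted


def open_hypotheses(log):
    """Hypotheses the PoC has not verified yet."""
    return [h for h in log if h.confirmed is None]


def refuted_hypotheses(log):
    """Hypotheses that turned out to be wrong, i.e. 'how wrong were we initially'."""
    return [h for h in log if h.confirmed is False]


# Illustrative entries only.
log = [
    Hypothesis("The data sample is representative of production data",
               "Compare feature distributions on a second sample"),
    Hypothesis("An off-the-shelf model is accurate enough",
               "Run it on the labelled sample",
               confirmed=False,
               plan_change="Add time for a custom model"),
    Hypothesis("Labels are consistent between annotators",
               "Re-label a small subset and compare",
               confirmed=True),
]

for h in refuted_hypotheses(log):
    print(f"Refuted: {h.statement} -> {h.plan_change}")
print(f"Still open: {len(open_hypotheses(log))}")
```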

Communication: People Replaced by Models

During the planning stage of the project, you must consider the aspect of communication. If you are creating an algorithm for an external client, you should ensure that they are available to address your doubts and plan the scope of sprints together.

You need to be particularly tactful when collaborating with people whose work is ultimately to be replaced by AI models. It is worth familiarizing yourself with the increasingly popular approach of collaborative AI, in which the goal is to facilitate and accelerate human work using AI models, not to replace it. Keep this in mind when designing AI solutions.

Divide Work into Stages

A good division of the project into stages is one in which each stage delivers a self-contained result that ideally adds value for the user immediately and can be deployed. An example could be a heuristic model, or a model that works on a subset of the data. The end user then sees progress after each stage and is more willing to invest further in the project. Let me give you an example. If the target model is to determine a customer’s creditworthiness, the first step could be clustering customers into groups with similar creditworthiness. The next step could be predicting a range of values. Only the final model would provide a specific value.

The second aspect is that the end of every stage should leave several paths open. This aligns with the agile philosophy: completing the first stage does not commit you to carrying out the second one as originally planned.

Example of Time Estimation

In my company, we use Clockify to log working hours. This allows us to say fairly accurately how much time we spent on specific tasks. Moreover, by comparing this time with the initial estimate, we steadily improve our intuition and our ability to estimate project size. I recommend this approach to everyone!

I will provide a real example from one of the projects in a previous company, where the goal was to detect products and read their names and prices.

Here’s the initial estimation:

| Planned task | Estimate in MD (man-days) |
| --- | --- |
| Data analysis and preprocessing | 10 |
| Detection algorithm | 40 |
| OCR | 25 |
| Matching algorithm | 15 |
| Total | 90 |

And here is the time actually logged:

| Completed task | Logged time in MD |
| --- | --- |
| Data analysis and preprocessing | 12 |
| Detection algorithm | 15 |
| OCR | 11 |
| Bug fixes: freezing the train-test-val split | 12.5 |
| Heuristics algorithms | 13 |
| Metrics and different heuristics | 12 |
| NLP model extracting data from OCR output | 12 |
| NLP: OCR + finding subset with high precision | 11 |
| Error analysis and confidence in the overall solution + quick fixes | 26 |
| Detecting boxes | 7 |
| Total | 131.5 |

We see that the project took about 46% more time than estimated (131.5 MD instead of 90 MD). Detection turned out to be much easier than expected, whereas OCR for reading names and prices performed so poorly that we tried several other approaches (heuristics and NLP models). Moreover, 38.5 MD (almost 30% of the time!) was spent on analyzing the correctness of the solution and fixing errors: 12.5 MD on bug fixes and 26 MD on error analysis and quick fixes.
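The comparison of estimate and logged time can be recomputed with a short script. The numbers are taken from the two tables above; the variable names and the choice of which rows count as verification work (the bug-fix and error-analysis rows, which add up to the 38.5 MD mentioned in the text) are my own framing for this sketch.

```python
# Initial estimate in man-days (MD), from the first table.
estimated_md = {
    "Data analysis and preprocessing": 10,
    "Detection algorithm": 40,
    "OCR": 25,
    "Matching algorithm": 15,
}

# Logged time in man-days (MD), from the second table.
logged_md = {
    "Data analysis and preprocessing": 12,
    "Detection algorithm": 15,
    "OCR": 11,
    "Bug fixes: freezing the train-test-val split": 12.5,
    "Heuristics algorithms": 13,
    "Metrics and different heuristics": 12,
    "NLP model extracting data from OCR output": 12,
    "NLP: OCR + finding subset with high precision": 11,
    "Error analysis and confidence in the overall solution + quick fixes": 26,
    "Detecting boxes": 7,
}

total_estimated = sum(estimated_md.values())
total_logged = sum(logged_md.values())
overrun = total_logged / total_estimated - 1

# Rows treated as correctness analysis and error fixing.
verification_md = (
    logged_md["Bug fixes: freezing the train-test-val split"]
    + logged_md["Error analysis and confidence in the overall solution + quick fixes"]
)
verification_share = verification_md / total_logged

print(f"Estimated: {total_estimated} MD, logged: {total_logged} MD")
print(f"Overrun: {overrun:.1%}")                                                # Overrun: 46.1%
print(f"Verification and fixes: {verification_md} MD ({verification_share:.1%})")  # 38.5 MD (29.3%)
```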

Summary

In this article, I have presented several practices I use when planning Machine Learning projects. I suggest you try at least one of them before starting your next project.

Our mission is to create artificial intelligence technology that benefits people.

COGITA Sp. z o.o. is a company registered in the National Court Register kept by the District Court in Częstochowa (Poland), XVII Economic Division of the National Court Register. KRS (National Court Register) number: 0000995030, NIP (Tax Identification Number): 9492257381.