Today’s post is for anyone who wants to begin their journey with Data Science or is considering it.
In this post, I will explain:
– What does working in Data Science involve?
– What skills are needed?
– How to acquire these skills?
– How to get your first job?
What does working in Data Science involve?
Data Science (practiced by people in the Data Scientist role) is about building models from data. Unlike a Data Analyst, who mainly uses tools for simpler analyses and visualizations (such as Power BI), a Data Scientist works primarily in programming languages like Python or R, which offer far more flexibility.
The goal is not only to discover interesting patterns or trends in data but, more importantly, to create models that allow for predicting various values, optimizing processes, or automating human tasks.
What skills are needed?
The core skill is usually programming in Python or (less often) R, together with the ability to use the relevant libraries.
Typical Python libraries include:
– NumPy – for mathematical computations, vector operations, and matrix manipulations,
– pandas – for analyzing tabular data,
– scikit-learn – for building various machine learning models,
– Matplotlib – for charts and visualizations,
– TensorFlow / PyTorch – for more advanced machine learning and deep learning.
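To give a feel for the first two libraries, here is a minimal sketch using NumPy and pandas on a small made-up dataset. The variable names, cities, and numbers are invented for illustration, and Matplotlib and the other libraries are left out to keep it short:

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic on whole arrays, no explicit loops
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])
revenue = prices * quantities  # element-wise product: 30, 20, 60

# NumPy: basic matrix operations
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
inverse = np.linalg.inv(matrix)
product = matrix @ inverse  # matrix times its inverse: the 2x2 identity, up to rounding

# pandas: tabular data with named columns (toy data, invented)
df = pd.DataFrame({
    "city": ["Warsaw", "Krakow", "Warsaw", "Gdansk"],
    "sales": [120, 80, 100, 60],
})
avg_sales = df["sales"].mean()                    # mean of one column
sales_by_city = df.groupby("city")["sales"].sum()  # total sales per city

print(revenue)
print(sales_by_city)
```

In a real project the DataFrame would usually be loaded from a file or a database, for example with `pd.read_csv`, rather than typed in by hand.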
To use these libraries effectively, you need to understand the basic machine learning (ML) algorithms behind them, such as regression models, neural networks, or random forests.
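As an illustration of why understanding the algorithm matters, here is a sketch of the simplest one on that list, linear regression, fitted by ordinary least squares with plain NumPy. The data are synthetic (assumed to follow roughly y = 2x + 1 plus noise), and `predict` is a hypothetical helper written for this example:

```python
import numpy as np

# Synthetic data (invented for this example): y is roughly 2*x + 1 plus noise
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(loc=0.0, scale=0.5, size=x.size)

# Design matrix: one column for x and a column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: choose w = (slope, intercept) minimizing ||X @ w - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = w

def predict(x_new):
    """Hypothetical helper: apply the fitted line to new inputs."""
    return slope * np.asarray(x_new) + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(predict([11.0, 12.0]))
```

scikit-learn’s `LinearRegression` fits the same ordinary least-squares model behind a `fit`/`predict` interface; knowing what happens underneath makes it easier to judge when a linear model is appropriate.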
You also need a foundational understanding of statistics, probability theory, and basic calculus. In my experience, the basics are enough: what a probability distribution is, the most common distributions, and what a derivative is, including the derivatives of the most common functions. This is roughly the level of advanced high-school mathematics.
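To make this concrete, the sketch below uses only the Python standard library: `statistics.NormalDist` for the most common distribution, the normal, and a central finite difference to check the textbook derivative rules numerically. The helper `derivative` and its step size `h` are choices made for this example:

```python
import math
from statistics import NormalDist

# A probability distribution: the standard normal N(0, 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)
# Probability of landing within two standard deviations of the mean (roughly 0.95)
p_within_2sd = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

def derivative(f, x, h=1e-5):
    """Central difference approximation: f'(x) ~ (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the numerical result with the textbook rule for each common function
checks = {
    "x^2 at x=3 (rule: 2x)": (derivative(lambda t: t**2, 3.0), 2 * 3.0),
    "e^x at x=1 (rule: e^x)": (derivative(math.exp, 1.0), math.exp(1.0)),
    "ln x at x=2 (rule: 1/x)": (derivative(math.log, 2.0), 1 / 2.0),
    "sin x at x=0 (rule: cos x)": (derivative(math.sin, 0.0), math.cos(0.0)),
}

print(f"P(-2 < Z < 2) = {p_within_2sd:.4f}")
for name, (numeric, exact) in checks.items():
    print(f"{name}: numeric={numeric:.6f}, exact={exact:.6f}")
```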
You will also need basic database skills (SQL) to extract data for analysis, as well as familiarity with fundamental tools such as Git, Linux, and Jupyter Notebooks.
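Here is a minimal sketch of the kind of query used to pull data for analysis, using Python’s built-in `sqlite3` module and an in-memory database. The `orders` table and its contents are invented for illustration:

```python
import sqlite3

# In-memory database with a made-up "orders" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 40.0), ("alice", 30.0), ("carol", 75.0)],
)

# A typical extraction query: aggregate per customer, filter groups, sort
query = """
    SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 50
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()  # list of (customer, total, n_orders) tuples
conn.close()

for customer, total, n_orders in rows:
    print(customer, total, n_orders)
```

The same query can be loaded straight into a DataFrame with `pandas.read_sql_query(query, conn)` (before closing the connection), which is a common first step in an analysis.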
How to learn Data Science?
Depending on your budget, available time, independence, and self-discipline, there are several approaches you can take:
1. Data Science Degree at a University
This is the most time-consuming option, but also the most thorough. In my opinion, full-time programs are the most effective because they let you dedicate yourself fully to the field. One of the best choices in Poland is the Machine Learning program at the University of Warsaw.
2. Data Science Bootcamp
These programs usually last several months, run on weekends or in the evenings (often online), and cover the most popular topics comprehensively. There are a few popular options in Poland, with prices starting at around a thousand dollars. Bootcamps demand a lot of time and consistency, and many of them offer support in finding your first job.
3. Online Data Science Courses
Here, you buy access to pre-recorded materials online. You can complete these courses at your own pace, and they typically contain anywhere from a few to a dozen or more hours of video.
Platforms like Coursera or Udemy offer popular courses that are usually a good, affordable introduction to the topic. Prices start at just a few dollars (and some courses are free), but quality varies. I recommend the Machine Learning Specialization and the Deep Learning courses from Stanford.
This is a good low-budget solution if you’re highly motivated and disciplined. Keep in mind, though, that help may be hard to find when you get stuck, and there are usually no practical assignments (and if there are, the instructor will most likely not review them). You also have to plan your own curriculum and find the right courses, since most of them cover only part of the material.
4. Custom Learning Programs for Machine Learning
In addition to videos, these courses provide access to the instructor, extra materials, student groups, and online meetings. There are not many such programs. One example is my Machine Learning Mastery program, where over an intensive 7 weeks we go from a surface-level understanding to in-depth knowledge of machine learning. Homework is reviewed by the instructor, which allows for rapid progress and practical application of what you learn. Live meetings in small groups work almost like individual consultations and mentoring.
The cost of this and similar programs is several hundred dollars, which is significantly less than a bootcamp but more than typical online courses.
5. Free Data Science Materials Online
Remember, you can find (almost) everything online 🙂 So, in theory, you can learn:
– Basic Python from free tutorials, like this one,
– The relevant libraries from tutorials or YouTube videos,
– Statistics and math from books or e-books; for instance, “StatQuest” can be a good choice.
This is certainly the cheapest option, but it requires the most self-discipline and motivation, and you have to select the materials yourself. Meeting with a mentor can help, since they can point you to the right materials and answer your questions.
Whichever path you choose, start working on your own projects as soon as possible. The advantage of degree programs, bootcamps, and custom programs is that you usually get projects suited to your level, which are then evaluated by an experienced person.
If you’re learning on your own, it might be harder to find such projects. It’s worth joining Data Science or ML groups on Facebook or LinkedIn, where you can ask for help.
How to get your first job?
I believe you should start looking for a job as soon as possible. The search can take many months, and during that time you’ll refine your skills to match what the market needs. You’ll also see at which stage of the recruitment process you’re being rejected, and why, so you can gradually improve your weakest areas.
Initially, it’s good to get a sense of the job market by browsing portals like Just Join IT, No Fluff Jobs, or LinkedIn. Make sure to have a well-prepared LinkedIn profile that shows you are open to new opportunities.
The biggest challenge is often landing your first job in the industry. Therefore, it’s essential to be creative and use different strategies, such as:
– Applying for unpaid internships,
– Tailoring your CV to each job description (a tool such as ChatGPT can help you adjust it),
– Creating your own projects and building a portfolio on platforms like GitHub.
Conclusion
Data Science is a fascinating field, and I definitely recommend pursuing it. Remember, though, that everything worthwhile takes effort. So don’t get discouraged if learning doesn’t come easy or if you struggle to find a job. If I can help or if you have any questions, feel free to write to me at adam.dobrakowski@praktycznyml.pl. For more in-depth advice on working in Data Science, check out my mini-course “Career in Data Science,” or join my Machine Learning Mastery program.