5 Steps to Get Started With Data Science

As a beginner, it is easy to get lost in the details and the sheer overwhelming nature of learning machine learning.


Cross the beginner’s block

I get a lot of emails from readers asking:

  • How do I get started with Machine Learning? 
  • I do not have a background in math, how can I learn data science?                        

The material in blog posts and courses is often targeted at intermediate learners. But remember, it is easier to get started without the math. You will still need the math, but it can come later. Below is a step-by-step guide to get started.


Be Curious

When I first started with machine learning, I read anything that had data science or machine learning in the title. I often did not understand most of it, but slowly I built up chunks of knowledge that I later assembled. The important skill here is to be curious and to believe in yourself.

Learn a tool

Never get overwhelmed by the choice of tool. Just pick one! Beginners are often divided between R and Python. Here is a list of resources to get started with the tools.



Get your hands dirty

A good place to find data is the UCI Machine Learning Repository. The repository is a collection of many small real-world data sets. Start with the simple Iris Data Set.

Learn to explore the data and try the following with your tool of choice. Preparing data for data science problems is an art in its own right. Below is a list of techniques you should try your hand at.

  1. Wrangle

    Start by dicing the data into subsets. Understand the variables and their types. Take a look at the variables that might impact the machine learning problem at hand.

  2. Transform

    Try simple data transformations like aggregation, decomposition (splitting the variables), and log transforms.

  3. Visualize

    A key part of solving data problems is to understand the data at hand. Visualization is a wonderful way to understand the data and the hidden gold in it.

  4. Question

    Most data science problems are about looking for answers. Practice asking questions and looking for answers in the data.
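
To make the four techniques above concrete, here is a minimal sketch in Python using only the standard library. It is one possible way to practice, not a prescribed method. The nine rows are a small sample in the layout of the Iris data file, embedded as an assumption so the sketch runs without a download; the column names and the `load_rows` helper are hypothetical names chosen for illustration, and the text bar chart stands in for a real plotting library.

```python
import csv
import io
import math
from collections import defaultdict
from statistics import mean

# Assumption: a few rows in the layout of the Iris data file (no header row),
# embedded here so the sketch runs offline. Use the full file for real practice.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""

# Hypothetical column names chosen for this sketch.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_rows(text):
    """Parse CSV text into dicts, converting the four measurements to float."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        row = dict(zip(COLUMNS, record))
        for name in COLUMNS[:4]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)

# 1. Wrangle: dice the data into one subset per species.
by_species = defaultdict(list)
for row in rows:
    by_species[row["species"]].append(row)

# 2. Transform: aggregate (mean per subset) and add a log-transformed column.
mean_petal_length = {
    species: mean(r["petal_length"] for r in subset)
    for species, subset in by_species.items()
}
for row in rows:
    row["log_petal_length"] = math.log(row["petal_length"])

# 3. Visualize: a crude text bar chart, one '#' per 0.5 cm of mean petal length.
for species, value in mean_petal_length.items():
    print(f"{species:<16} {'#' * round(value / 0.5)} {value:.2f}")

# 4. Question: which species has the longest petals on average?
longest = max(mean_petal_length, key=mean_petal_length.get)
print("Longest petals on average:", longest)
```

Once this feels comfortable, try the same four steps with R or a Python data library, and ask a different question of the same data.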

Applied Data Science Process

Understand the process behind solutions to data science problems. The most common approach to solving data science problems is as follows.

  1. Define the problem: Understand the problem that is being solved.
  2. Analyze data: Analyze the data for patterns and information that could be used to develop a model.
  3. Data preparation: Prepare the data for modelling.
  4. Model: Start applying machine learning algorithms and validate.
  5. Evaluate: Evaluate the performance of the model and choose the best performing model.
  6. Deploy: Implement the model in production.
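
The six steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the rows are a small sample of petal measurements from the Iris data set embedded inline, the model is a nearest-centroid classifier that I picked because it fits in a few lines (it is not the only or the best choice), and `fit`, `predict`, and `classify_flower` are hypothetical names. With only three held-out rows the accuracy figure means very little; the point is the shape of the process.

```python
import math
from collections import Counter, defaultdict

# 1. Define the problem: predict the species from petal length and width (cm).
# Assumption: a few Iris rows embedded inline so the sketch runs offline.
DATA = [
    (1.4, 0.2, "setosa"), (1.4, 0.2, "setosa"), (1.3, 0.2, "setosa"),
    (4.7, 1.4, "versicolor"), (4.5, 1.5, "versicolor"), (4.9, 1.5, "versicolor"),
    (6.0, 2.5, "virginica"), (5.1, 1.9, "virginica"), (5.9, 2.1, "virginica"),
]

# 2. Analyze data: check how many examples each class has.
class_counts = Counter(label for _, _, label in DATA)
print("Examples per class:", dict(class_counts))

# 3. Data preparation: hold out every third row for testing.
train = [row for i, row in enumerate(DATA) if i % 3 != 2]
test = [row for i, row in enumerate(DATA) if i % 3 == 2]


# 4. Model: nearest centroid. Each class is summarized by the mean of its
# training points, and a new point gets the label of the closest mean.
def fit(rows):
    grouped = defaultdict(list)
    for length, width, label in rows:
        grouped[label].append((length, width))
    return {
        label: (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )
        for label, points in grouped.items()
    }


def predict(centroids, length, width):
    return min(
        centroids,
        key=lambda label: math.dist(centroids[label], (length, width)),
    )


centroids = fit(train)

# 5. Evaluate: fraction of held-out rows that are labelled correctly.
correct = sum(predict(centroids, l, w) == label for l, w, label in test)
accuracy = correct / len(test)
print(f"Held-out accuracy: {accuracy:.2f}")


# 6. Deploy: in production this would sit behind a service or a batch job;
# here it is simply a function that other code can call.
def classify_flower(petal_length, petal_width):
    return predict(centroids, petal_length, petal_width)


print(classify_flower(1.5, 0.3))
```

To practice the "choose the best performing model" part of step 5, add a second model, evaluate both on the same held-out rows, and compare.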

Practice, Practice, Practice

Once you start learning the tools, get your hands on the data, and practice the applied data science process, it is important to rinse and repeat this process on different datasets across different domains.

Diving Deep

As you start learning the tricks of the trade, it is important to get down into the details. The next step is to dive deeper into the algorithms and understand why and how they work. Understand when one is better than another and under what circumstances each performs better.


In this post you learned a step-by-step approach to getting started with data science, along with simple ways to practice and get better at applied data science.