How to smartly choose your Data Science toolkit – R vs Python

I often get questions from readers who are caught in the tool conundrum:

Should I choose R or Python to start learning data science?

If you are new to the world of data science and have not tried either of these languages, it is easy to land on this question. In this post, we shall carefully examine both with the needs of data science in mind.

R

R is built by data scientists for data scientists, so data analysis, model building, and communicating results are its core strengths.

The major power of R is its user community, which offers extensive support and has contributed a large package base to CRAN (the Comprehensive R Archive Network).

A few great packages to start exploring in R:

  • ggplot2/ggvis – Data Visualization
  • dplyr – Data Munging and Wrangling
  • data.table – Data Wrangling
  • caret – Machine Learning Workbench
  • reshape2 – Data Shaping

R has a steep learning curve and is generally built to run on stand-alone systems, although several packages exist to speed up the process.

If you are a beginner, I strongly recommend downloading RStudio, the de facto IDE for R.

Python

Python is a great programming language and is very easy to start with. You can perform most data science tasks, such as data wrangling, munging, and visualization, and of course it has a great machine learning library, scikit-learn. If you are already familiar with Java or C++, getting started with Python is straightforward.

According to a data science survey conducted by O’Reilly, almost 40% of data scientists use Python to solve their problems. Python also has a great community of open source packages.

Below is a list of packages that are great for data science applications:

  • Seaborn – Data Visualization
  • Pandas – Data Munging and Wrangling
  • NumPy/SciPy – Data Wrangling/Representation
  • Scikit-learn – Machine Learning Library
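To see how a few of these packages fit together, here is a minimal sketch: pandas cleans a small table, NumPy arrays carry the data over, and scikit-learn fits a model. The dataset (hours studied vs. pass/fail) is made up for illustration, and LogisticRegression is just one example model, not a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied vs. whether a student passed.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, np.nan, 10.0, 11.0, 12.0],
    "passed": [0, 0, 0, 1, 1, 1, 1],
})

# Wrangling with pandas: drop the row with a missing value.
clean = df.dropna()

# Convert to NumPy arrays, which scikit-learn accepts directly.
X = clean[["hours"]].to_numpy()   # shape (n_samples, 1)
y = clean["passed"].to_numpy()    # shape (n_samples,)

# Fit a simple classifier and predict for two new students.
model = LogisticRegression()
model.fit(X, y)
print(model.predict(np.array([[2.5], [11.5]])))
```

Seaborn is left out only to keep the sketch free of plotting code.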

Clash of the Titans: Python vs R

It is indeed a clash of the titans of the data science world. Here are a few guidelines you could use to choose a language.

Popularity

Python is one of the top programming languages overall, but let us get down to the numbers within the data science community. I got the data from here. The graph below shows that the popularity of Python is increasing in the data science community. (The plot was made in R 😛)

[Figure: R vs Python on Data Science Stack Exchange]

Personal Choice

Coming from an engineering background, I chose Python because it felt more natural to me. Later, I explored R to understand its strengths and support. The best way is to start with one and then learn the other to take advantage of its strengths.

Learning Curve

R has a steeper learning curve than Python, but deliberate practice can help you climb it faster. To learn R, I deliberately chose to use it for my projects, thereby gaining knowledge and experience.

Type of Problem

Often the type of problem you are solving has a bearing on the choice of language. If the problem at hand calls for thorough data analysis, I choose R; but if I need to write quick scripts to get things done or scrape the web, Python is simpler.
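For the kind of quick scraping script mentioned here, Python's standard library already goes a long way. This minimal sketch pulls every link out of an HTML snippet with html.parser. The snippet is hypothetical; a real script would first download the page (for example with urllib.request.urlopen). Third-party libraries such as requests and Beautiful Soup are popular for this but are not used here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content standing in for a downloaded page.
page = '<p>See <a href="https://cran.r-project.org">CRAN</a> and <a href="https://pypi.org">PyPI</a>.</p>'
print(extract_links(page))  # ['https://cran.r-project.org', 'https://pypi.org']
```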

Communication

An often overlooked but important data science skill is the ability to communicate results and exchange ideas. IPython notebooks are a beauty in themselves, providing the best interface for communication, closely followed by R Markdown.

Verdict

As a data scientist, it is always best to be open to learning more tools. Preferring one over the other may be fine to start with, but it is always best to know both and use each to its strengths.

Crack Your Next Data Science Interview

Preparing for a data science interview might seem like climbing a huge mountain, with a wide variety of topics piled in front of you. But it isn’t as hard as it seems.

The time is now!!

Covering such a wide range of topics calls for setting aside time and preparing meticulously. Interview questions can range from explaining logistic regression to a five-year-old to tuning the parameters of a model. Set aside time every day and religiously sit down to prepare for the interview topics. With consistent effort, it is easier to reach the top of the mountain. From experience, below are the topics you should cover to ace your next data science interview.

With such a wide variety of topics, it is entirely possible to get sucked into one of these holes. This makes it necessary to set SMART goals (specific, measurable, achievable, relevant, time-bound) and prepare toward them.

Below are the steps which I personally followed to prepare for my interviews.

  1. Review your background and prepare a list of topics you may want to cover. Data scientists come from different backgrounds, such as political science, statistics, and software engineering, so it is important to understand your weak links and prepare to strengthen them.
  2. Write down your goals and prepare a schedule to work on your weak links in small pieces. By writing down your goals, you create subconscious wiring to work toward them.
  3. Make a commitment by setting aside time every day to religiously study the topics on your weak-links list.
  4. Attend interviews: Attending interviews is another way to get feedback, understand your weak links, and iterate on them.
  5. Review your goals: Set weekly review meetings with yourself to assess your current preparation.

While these steps are important, below are the topics that are essential for a data scientist to know.

Basic Mathematics

To become a good data scientist, one must be able to deliver insights from data, and that requires a decent understanding of mathematical concepts. Go through a refresher on linear algebra, probability, and statistical theory.
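As a quick probability refresher, here is a classic interview-style question worked through with Bayes' theorem: if a condition affects 1% of people, a test detects it 99% of the time, and it gives a false positive 5% of the time, how likely is someone who tests positive to actually have the condition? The numbers are hypothetical, chosen only to illustrate the calculation.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem."""
    # Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(D|+) = P(+|D)P(D) / P(+)
    return sensitivity * prior / evidence


p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 4))  # 0.1667
```

The answer is only about 17%, because the false positives from the healthy 99% of people far outnumber the true positives.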

Asking the right questions

This is learned more through practice than taught. Many employers look for curiosity and the candidate’s ability to ask questions that extract insights from data. Take a completely unfamiliar dataset, practice asking questions about it, and look for the answers. With this approach, you will improve your questions and strengthen your ability to find the answers.

Applied machine learning

It is important to understand the basic algorithms in machine learning. Interviewers focus on how candidates formulate the problem and on their ability to transform a business problem into an analytical one. If you are new to machine learning, a good way to start understanding these concepts is to enroll in a course or learn from the web. Do check out the Data Science Specialization on Coursera and the Nanodegrees at Udacity; these are great places to start.
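Since interviewers may ask you to explain logistic regression, it helps to be able to write it from scratch. Below is a minimal sketch in plain Python that fits a one-feature logistic regression with batch gradient descent on the mean log loss. The learning rate, number of epochs, and toy data are arbitrary assumptions; for real work a library such as scikit-learn would be the practical choice.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each example
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of mean log loss: dL/dw = mean(error * x), dL/db = mean(error)
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical toy data with one feature, already centered around zero.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)  # [0, 0, 0, 1, 1, 1]
```

Being able to explain why the gradient takes the simple form "error times input" is often as valuable in an interview as the code itself.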

Learn whiteboard coding

This is similar to a software engineering interview, where interviewers test the candidate’s ability to define, analyze, solve, and test the problem at hand. It is important to brush up on algorithms and data structures. This has been part of many product-oriented data science interviews, where data scientists are expected to be good programmers. There are tons of websites and books to get you started.
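A typical whiteboard warm-up is binary search, which tests both the algorithm and careful handling of boundaries. This is one common iterative version, shown as an example of the kind of problem you might practice rather than a question any particular company asks.

```python
def binary_search(items: list[int], target: int) -> int:
    """Return the index of target in the sorted list items, or -1 if absent. O(log n)."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```

When practicing, walk through the edge cases out loud: an empty list, a target at either end, and a target that is missing.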

Get the right tools

Though there is a wide range of tools for analytics, the top choices of many data scientists have been Python and R. Both languages have great machine learning libraries, and both are good to know and have in your toolbox.

Be a data hacker

Learn data wrangling and munging techniques in your language of choice. This helps you get up to speed with any given dataset.
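As a small wrangling exercise, the sketch below normalizes inconsistent city names, drops rows with missing readings, and computes a mean per group. In Python you would usually reach for pandas (a groupby on the cleaned column); this version sticks to the standard library so it runs anywhere. The input data is made up.

```python
import csv
import io
from collections import defaultdict

# Hypothetical messy input: inconsistent casing, stray whitespace, a missing value.
raw = """city,temp_c
 Boston ,3.5
boston,4.5
Austin,20
AUSTIN,
Austin,22
"""


def mean_by_city(text: str) -> dict[str, float]:
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        city = row["city"].strip().title()   # normalize the grouping key
        value = row["temp_c"].strip()
        if not value:                         # drop rows with a missing reading
            continue
        totals[city] += float(value)
        counts[city] += 1
    return {city: totals[city] / counts[city] for city in totals}


print(mean_by_city(raw))  # {'Boston': 4.0, 'Austin': 21.0}
```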

Understand databases

Relational databases are part of every industry, so it is important to learn database basics and how to write efficient queries.
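To practice writing queries, Python's built-in sqlite3 module gives you a throwaway in-memory database. This sketch creates two hypothetical tables, indexes the column used for the join, and lets the database do the joining, aggregation, and sorting. Whether an index actually speeds things up depends on the engine and the data size, so treat it as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
-- Index the join column so lookups by customer can avoid a full table scan.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Ana"), (2, "Ben")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 20.0), (1, 30.0), (2, 15.0)])

# Total spend per customer: join, aggregate, and sort inside the database.
rows = cur.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Ana', 50.0), ('Ben', 15.0)]
conn.close()
```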

Learn Data Visualization

The best way to start understanding data is to visualize it. Choose a tool and learn visualization techniques in it. Though it may not be asked during an interview, it is a must-have skill for a good data scientist.

Practice

Practicing the theoretical concepts you learn will help you develop a better understanding of them and quickly reveal your weaknesses.

Research about the role

Along with preparing for the interview, it is essential to align your skills with the type of data science role you are looking for. Think about what kind of data scientist you want to be and which types of teams you would like to join. It is important to identify the type of role the employer is looking to fill and focus your preparation in that direction: take time to understand the job description, and ask appropriate questions to understand the requirements of the role so you can tailor your preparation. Look up the profiles of the people who will be interviewing you, and of those performing similar roles at the company, to understand their backgrounds; this will help you anticipate the type of questions you could face. Remember to work on your weaknesses for the chosen type of role. Below are the simplified types of data scientists that employers commonly look for.

Business Savvy Data Scientist

The business-savvy data scientist focuses on building analytic solutions to help business users and final decision makers. They help uncover the underlying problems in a company’s marketing campaign, understand churn, or find out what interests customers. Communication and storytelling play a major role in these types of roles, as they involve communicating value to non-technical people. They do not have to build complex models, but they must unearth the value in the data to answer the questions of why and how.

Product Savvy Data Scientist

The other type of data scientist focuses on building products to help businesses. They build highly complex models using sophisticated statistical and machine learning algorithms. They are very focused on improving model performance, since it has a direct impact on the company’s product. They need good statistical and solid computer science skills.

I hope the above steps help you crack your next data science interview. Don’t wait to make your next leap.

Resources to get Started