Machine Learning Showdown: Python vs R

Posted on August 2, 2017 by

Let’s say you have an amazing idea for a machine learning app. It’s going to be brilliant. It’s going to revolutionize the world of finance, mobile advertising, or… some other world, but it’s definitely going to revolutionize something. And gosh darn it, it’s going to be the smartest, most learned app the world has ever seen.

The only thing standing between you and glory is the small matter of actually coding your brilliant idea. The first question to ask yourself is which programming language to use for your app, and the two immediate candidates are likely to be R and Python.

Each of these languages has its pros, cons, and diehard fanbase. This article is meant to help developers choose between these two bitter rivals, in the context of machine learning (for a more general, feature-by-feature comparison you might want to check out this great infographic by DataCamp).

Let’s get down to it then!

Round 1: Ease of Development

Python lets you hit the ground running… if you have programming experience.

While both Python and R are completely manageable and used by many developers in both business and academia, Python lends itself more easily to developers who have experience with other programming languages. Its syntax is more familiar than R’s, and also closer to regular English text – making it easier to read and debug.

R is very popular with advanced business users – e.g. data analysts in fields such as retail, marketing or finance – who come from more of a statistics background, rather than programming or software development. Since you’re developing a machine learning app, we’re guessing you’re closer to the latter group – in which case you might appreciate Python’s flexibility, readability and similarity to the type of programming you already know and love.
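
To make the readability point a little more concrete, here is a minimal sketch of a one-nearest-neighbor classifier in plain Python, using only the standard library. The toy data and the `nearest_neighbor` helper are made up for illustration; a real app would lean on a tested library instead.

```python
import math


def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query`."""
    # Each training item is a (features, label) pair; pick the pair whose
    # features have the smallest Euclidean distance to the query point.
    _, closest_label = min(train, key=lambda item: math.dist(item[0], query))
    return closest_label


# Hypothetical toy data: two small clusters of 2-D points.
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

print(nearest_neighbor(training_data, (2.0, 1.5)))
print(nearest_neighbor(training_data, (7.5, 8.0)))
```

Even without comments, the core of the function reads close to its English description: take the minimum over the training items, keyed by distance to the query.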

Winner: Python

Round 2: Robustness and Production Readiness

Python fits more naturally into a complex coding environment.

While applications of R in the business world are definitely on a growth trajectory, Python is still a more full-fledged programming language and is used for many types of web and other applications, in addition to its data science applications. R, on the other hand, is still mostly used for data analysis and advanced statistical modeling.

Hence, assuming you would want to integrate your machine learning algorithms into some kind of interface that’s communicating with other code, written by other programmers, Python might be the better choice. R can be used for rapid prototyping or to solve a specific problem, but Python will be easier to maintain and scale in the long run (especially considering its versioning and documentation are far more consistent).
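
As a rough sketch of what such an integration might look like, the snippet below wraps a prediction function behind a small JSON interface that other code could call. Everything here is hypothetical: the `predict` function is a hand-written stand-in for a trained model, and the field names (`features`, `clicks`, `bounces`) are invented for the example.

```python
import json


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    score = 0.8 * features["clicks"] - 0.5 * features["bounces"]
    return {"score": score, "label": "engaged" if score > 0 else "not_engaged"}


def handle_request(raw_body):
    """Parse a JSON request, run the model, and return a JSON response."""
    try:
        payload = json.loads(raw_body)
        result = predict(payload["features"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Malformed JSON or missing fields produce an error payload
        # instead of crashing the calling service.
        return json.dumps({"error": f"bad request: {exc}"})
    return json.dumps(result)


print(handle_request('{"features": {"clicks": 3, "bounces": 1}}'))
print(handle_request("not json"))
```

The same `handle_request` function could sit behind a web framework, a message queue consumer, or a command-line tool, which is the kind of flexibility this round is about.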

Winner: Python

Round 3: External Libraries

While both languages have a breadth of external libraries that can be (relatively) easily used in a machine learning project, Python’s are a bit more mature. Specifically, scikit-learn is an extremely popular, open-source machine learning package that is used in many commercial applications.

Meanwhile, R libraries such as caret are catching up, but are not quite there yet when it comes to breadth of functionality. With R you might be able to more quickly build and launch your first model – but mastering scikit-learn and similar libraries will provide you with a deeper and more complete toolset that you can feel safe using in your machine learning app.
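
For a feel of what working with scikit-learn looks like, here is a minimal sketch that trains a classifier on the library's bundled iris dataset and scores it on held-out data. It assumes scikit-learn is installed; the choice of logistic regression, the 25% test split and the `train_and_score` helper name are ours, picked only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(seed=0):
    """Fit a classifier on iris and return its accuracy on held-out data."""
    X, y = load_iris(return_X_y=True)
    # Hold out a quarter of the samples, keeping class proportions equal.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # max_iter is raised from the default so the solver has room to converge.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Held-out accuracy: {train_and_score():.2f}")
```

Swapping in a different model is usually a one-line change, since scikit-learn estimators share the same `fit` and `predict` methods.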

Winner: Python

Round 4: Performance with Big Data

R can provide better performance when performing large computations.

Machine learning will often involve working with massive datasets and highly complex computations to train and test your algorithms – so you’ll want to make sure the programming language you use will perform well in these kinds of scenarios.

While both R and Python can integrate with Hadoop for big data, newer R packages utilize C to provide better performance for large-scale computation. Hence, you might get faster results when using R in these situations.

Winner: R

Round 5: Statistics and Data Visualization

While this would not be the core of your machine learning software, your app might very well include some elements of statistics, analytics and data visualization.

Here, R is the hands-down winner, as a tool that’s built from the ground up to provide a robust platform for advanced statistical analysis. Integrating ggplot2 will enable you to create some really nifty visualizations as well, and companion packages such as plotly can turn them into interactive, browser-based graphs and charts.

While Python can be, and is, used for statistical analysis and data visualization, R will probably be the better choice for this type of functionality – especially when it comes to ‘one-off’ operations, prototyping and testing various hypotheses (versus creating reusable and extensible features).

Winner: R

And the overall winner is….

Python. With the necessary caveats that every application, use case and business scenario is different, Python is the more mature, fully-fledged and flexible option for machine learning – and for creating complex coding projects in general. However, with R’s rapid development and growing popularity, we won’t be surprised if it catches up within a few years.

P.S.: if you’re developing from scratch, it’s probably neither

Our discussion above assumes you would want to be using an external library and build your machine learning app around it. Unless you’ve got a team of programming superstars, this is probably the direction you’d go.

However, if you want to start from scratch and rewrite the libraries themselves – either as a research project or because you have a truly brilliant idea for optimizing some of the under-the-hood processes – then you probably would use a compiled language (rather than an interpreted one), such as C or Java. In fact, most of the external libraries you’ll be using are actually written in these languages.
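
A small experiment hints at why library authors reach for compiled code. The sketch below assumes CPython, where the built-in `sum` is implemented in C, and compares it with an equivalent loop written in Python. The `loop_sum` helper is ours, and the timings will vary by machine, so no specific numbers are claimed.

```python
import timeit


def loop_sum(values):
    """Add up the values one at a time in interpreted Python code."""
    total = 0
    for value in values:
        total += value
    return total


values = list(range(100_000))

# Both approaches must agree before the timings mean anything.
assert loop_sum(values) == sum(values)

loop_time = timeit.timeit(lambda: loop_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)

print(f"Python loop:  {loop_time:.4f} s")
print(f"Built-in sum: {builtin_time:.4f} s")
```

On a typical CPython install the built-in version tends to finish noticeably faster, and numerical libraries apply the same idea on a much larger scale.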

Do you have your own opinion on the Python vs R debate? Go ahead and tell us in the comments. Whichever language you choose, you’ll want some great datasets to train your machine learning algorithms on – so go ahead and check out our open machine learning datasets now. You can also read more about using web data in artificial intelligence, or check out some of our case studies.

This entry was posted in Machine Learning, Python.

8 thoughts on “Machine Learning Showdown: Python vs R”

  1. Round 3: External Libraries … The winner here is actually R.

    Round 4: Performance with Big Data … The winner here is actually python. The Pyspark API actually works meanwhile the R API is terrible.

  2. James says:

    The comment that machine learning libraries are weaker in R is questionable….the leading names in research such as Hastie, Tibshirani have a book out there….Statistical Learning with….R. Caret is just a wrapper for underlying functionality. Gradient boosting, which is all the rage with XGB, was initially deployed by Greg Ridgeway….in R, with the original vignettes/docs dating back to 2007. Deep learning algos in Python are very strong though. Python can also call on R packages.

    Something critical missing from your analysis is the available IDEs, and in this respect RStudio does not have an equal in Python.

  3. Glen DePalma says:

    External libraries and ease of use : Python? No way. Not today, not tomorrow, not any day.

    Also, how can Python be production-ready when half the code is written in Python 2.* and the other half in Python 3.*? That’s the opposite of robust.

  4. Cristi Neagu says:

    Not only are most statistical and big data libraries for Python written in C, but Python will let you write sections of your code in C/C++, enabling your script to run much faster. I would say Python wins that section as well. Also, these days, speed of development is more important than speed of execution.
    And I’m not sure about the visualization libraries available for R, but Python probably wins again, considering the many options one has when it comes to data visualization.
    Sounds like it’s a complete win for Python. But I do understand. You have to give R something too, or else what’s the point of the comparison.

  5. Stuart says:

    Cristi Neagu says:
    And I’m not sure about the visualization libraries available for R…so I’ll refrain from commenting on that altogether.
    It’s a complete win for Python for what I do personally. But I do understand. A person who has only coded in Fortran77 would no doubt find Fortran77 superior to all other languages until they took the time to try and learn something else.


  6. Konstantin Buzanovsky says:

    Round 1: Ease of Development
    R definitely wins.
    Tidyverse makes code readable, like pseudocode-readable.

    Round 3: External Libraries
    R definitely wins, especially after adding keras and mxnet.
    If you want sklearn in R, get mlr.

    As a result – 4-1?

  7. Anujay says:

    Strongly agree with cristi

  8. Solutions says:

    It’s HARD to know whether to use Python or R for data analysis. And this is especially true if you’re a newbie data analyst looking for the right language to start with.
