# Python or R: Which Is A Better Choice For Data Science?

Data science is expected to reshape many industries in the coming years. A hard question among data scientists is which programming language matters most for data science. Many languages are used in the field, including R, C++ and Python.

In this blog post, we discuss two important programming languages, Python and R, to help you choose the best fit for your next data science project.

Python is an open-source, flexible, object-oriented and easy-to-use programming language. It has a large community and a rich set of libraries and tools. It is, in fact, the first choice of many data scientists.

On the other hand, R is a very useful programming language for statistical computation and data science. It offers a wide range of techniques, such as linear and nonlinear modeling, clustering, time-series analysis, classical statistical tests and classification.

Features of Python

• Dynamically typed, so the type of a variable is determined at runtime and does not have to be declared.
• More readable, and usually needs less code for the same task than many other programming languages.
• Strongly typed, so values are not converted between unrelated types implicitly; developers have to cast them explicitly.
• An interpreted language, so a program does not need a separate compilation step before it runs.
• Flexible and portable, so it runs on all major platforms. It is scalable and can be integrated with third-party software easily.
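
To make the dynamic and strong typing points above concrete, here is a minimal sketch. The variable `value` and the helper `add_as_numbers` are made-up names used only for illustration.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
first_type = type(value).__name__      # type is attached to the object, not the name
value = "forty-two"
second_type = type(value).__name__

# Strong typing: Python refuses to mix str and int without an explicit cast.
try:
    "1" + 2
    mixed_types_allowed = True
except TypeError:
    mixed_types_allowed = False


def add_as_numbers(text, number):
    """Hypothetical helper: cast explicitly, then add."""
    return int(text) + number


print(first_type, second_type, mixed_types_allowed, add_as_numbers("1", 2))
```

The explicit `int(text)` call is the manual cast that the list above refers to.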

R features for data science apps

• Multiple calculations can be done with vectors
• Statistical language
• You can run your code without any compiler
• Data science support

Below, I compare the two programming languages for data science across several areas.

1) Data structures

When it comes to data structures, binary trees can be implemented easily in Python with classes. In R, the same structure is usually built with the list class, which is slow.

Implementation of binary trees in Python is shown below:

First, create a Node class and assign a value to the node. This creates a tree with only a root node.

    class Node:

        def __init__(self, data):
            self.left = None
            self.right = None
            self.data = data

        def PrintTree(self):
            print(self.data)

    root = Node(10)

    root.PrintTree()

Output: 10

Next, we need to insert values into the tree, so we add an insert method to the same Node class defined above.

    class Node:

        def __init__(self, data):
            self.left = None
            self.right = None
            self.data = data

        def insert(self, data):
            # Compare the new value with the parent node
            # ("is not None" so that a stored value of 0 is handled correctly)
            if self.data is not None:
                if data < self.data:
                    if self.left is None:
                        self.left = Node(data)
                    else:
                        self.left.insert(data)
                elif data > self.data:
                    if self.right is None:
                        self.right = Node(data)
                    else:
                        self.right.insert(data)
            else:
                self.data = data

        # Print the tree in ascending order (left subtree, node, right subtree)
        def PrintTree(self):
            if self.left:
                self.left.PrintTree()
            print(self.data, end=" ")
            if self.right:
                self.right.PrintTree()

    # Use the insert method to add nodes
    root = Node(12)
    root.insert(6)
    root.insert(14)
    root.insert(3)

    root.PrintTree()

Output: 3 6 12 14
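
Once values are in the tree, the same ordering rule can be used to look them up. The sketch below repeats the Node class so that it runs on its own and adds two methods that are not part of the original example: `contains` and `in_order` are hypothetical names chosen for illustration.

```python
class Node:
    """Binary search tree node with the same layout as the Node class above."""

    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # Smaller values go left, larger values go right, duplicates are ignored.
        if data < self.data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif data > self.data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)

    def contains(self, data):
        # Hypothetical helper: follow one branch per comparison until found.
        node = self
        while node is not None:
            if data == node.data:
                return True
            node = node.left if data < node.data else node.right
        return False

    def in_order(self):
        # Hypothetical helper: yield the stored values in ascending order.
        if self.left is not None:
            yield from self.left.in_order()
        yield self.data
        if self.right is not None:
            yield from self.right.in_order()


root = Node(12)
for value in (6, 14, 3):
    root.insert(value)

print(list(root.in_order()))
print(root.contains(14), root.contains(7))
```

Returning the values from a generator instead of printing them makes the traversal easier to reuse and to test.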

Winning language: Python

2) Programming language unity

The move of Python from version 2.7 to 3.x has not caused lasting disruption in the market, whereas R is splitting into two different dialects because of RStudio, namely base R and the Tidyverse, and that split has a large impact.

Winning language: Python

3) Meta programming & OOP facts

Python programming language has one OOP paradigm while in R, you can print a function to the terminal many times. The metaprogramming features of R, i.e. code that produces code, are powerful, which makes them appealing to computer scientists. Though functions are objects in both programming languages, R takes this more seriously than Python does.

As a functional programming language, R provides good tools for well-structured code generation. Here, a simple function takes a vector as an argument and returns the number of elements that are greater than a threshold. The threshold is read from a variable named threshold that is defined outside the function.

    myFun <- function(vec) {
      numElements <- length(which(vec > threshold))
      numElements
    }

To support different threshold values, we write a function that generates all these functions instead of rewriting each one by hand. The function below produces many myFun-type functions:

    genMyFuns <- function(thresholds) {
      ll <- length(thresholds)
      print("Generating functions:")
      for(i in 1:ll) {
        fName <- paste("myFun.", i, sep="")
        print(fName)
        assign(fName, eval(
            substitute(
              function(vec) {
                numElements <- length(which(vec > tt));
                numElements;
              },
              list(tt=thresholds[i])
            )
          ),
          envir=parent.frame()
        )
      }
    }


You can also try a numeric example in an R CLI session, as shown below:

    > genMyFuns(c(7, 9, 10))
    [1] "Generating functions:"
    [1] "myFun.1"
    [1] "myFun.2"
    [1] "myFun.3"
    > myFun.1(1:20)
    [1] 13
    > myFun.2(1:20)
    [1] 11
    > myFun.3(1:20)
    [1] 10
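
For comparison, a similar family of functions can be produced in Python with closures. This is only a sketch and not a one-to-one translation: it captures the threshold in a closure instead of substituting it into generated code, and the names `make_counter` and `gen_counters` are invented for illustration.

```python
def make_counter(threshold):
    """Return a function that counts the elements greater than threshold."""
    def count_above(values):
        return sum(1 for v in values if v > threshold)

    # Give each generated function a readable name, similar to myFun.1 in R.
    count_above.__name__ = f"count_above_{threshold}"
    return count_above


def gen_counters(thresholds):
    """Hypothetical counterpart of genMyFuns: one counter per threshold."""
    return [make_counter(t) for t in thresholds]


counters = gen_counters([7, 9, 10])
data = range(1, 21)  # same input as 1:20 in the R session

print([f.__name__ for f in counters])
print([f(data) for f in counters])
```

The counts match the R session: 13, 11 and 10 elements of 1..20 are greater than 7, 9 and 10.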

Winning language: R

4) Interface to C/C++

R has stronger tools for interfacing with C/C++ than Python. R's Rcpp is a powerful tool for interfacing with C/C++, and the new ALTREP idea can further enhance performance and usability. Python, on the other hand, has tools such as SWIG, which are not as powerful but serve the same purpose. Cython, a Python variant, and PyPy, an alternative implementation, can in some cases remove the need for an explicit C/C++ interface.

Winning language: R programming

5) Parallel computation

Neither language provides good built-in support for multicore computation. R comes with the parallel package, which is not a good workaround, and neither is Python's multiprocessing package. Python has better interfaces for GPUs. However, external libraries supporting cluster computation are good in both languages.
Winning language: None of the two
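
To show what the multiprocessing workaround mentioned above looks like in practice, here is a minimal sketch. The task `count_primes` and the wrapper `count_primes_parallel` are made-up examples, and the worker count of 2 is an arbitrary assumption.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Hypothetical helper: run count_primes for each limit in a separate process."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

Each task runs in its own process, so arguments and results have to be pickled and copied between processes. That overhead is one reason this approach feels like a workaround rather than built-in multicore support.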

6) Statistical issues

R was written by statisticians for statisticians, so it handles statistical issues correctly by design. Python professionals, on the other hand, mostly work in machine learning and often have a weaker grasp of statistical issues.

R is related to the S statistical language, which is commercially available as S-PLUS. R provides numerous statistical functions, such as sd(variable), median(variable), min(variable), mean(variable), quantile(variable, level), length(variable) and var(variable). A t-test is used to determine whether the means of two samples differ significantly. An example of a t-test is shown below:

    > t.test(x1, x2)

        Welch Two Sample t-test

    data:  x1 and x2
    t = 4.0369, df = 22.343, p-value = 0.0005376
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     2.238967 6.961033
    sample estimates:
    mean of x mean of y
     8.733333  4.133333

However, the classic version of the t-test, which assumes equal variances, can be run as shown below:

    > t.test(x1, x2, var.equal=TRUE)

        Two Sample t-test

    data:  x1 and x2
    t = 4.0369, df = 28, p-value = 0.0003806
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     2.265883 6.934117
    sample estimates:
    mean of x mean of y
     8.733333  4.133333

To run a t-test on paired data, use the code below:

    > t.test(x1, x2, paired=TRUE)

        Paired t-test

    data:  x1 and x2
    t = 4.3246, df = 14, p-value = 0.0006995
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     2.318620 6.881380
    sample estimates:
    mean of the differences
                        4.6

Winning language: R language

7) AI & ML

Python gained huge importance with the rise of machine learning and artificial intelligence. It offers a large number of finely tuned libraries for image recognition, such as AlexNet. Good R versions could easily be developed, since the power of the Python libraries comes from certain image-smoothing operations that could also be implemented in R's Keras wrapper; for that matter, a pure-R version of TensorFlow could be developed as well. Meanwhile, R's package availability for gradient boosting and random forests is outstanding.

Winning language: Python

8) Presence of libraries

The Comprehensive R Archive Network (CRAN) has over 12,000 packages, while the Python Package Index (PyPI) has over 183,000. However, PyPI is thin on data science compared to CRAN.

Winning language: Tie between the two

9) Learning curve

To become proficient in Python, one needs to learn a lot of material, including Pandas, NumPy and matplotlib, while matrix types and basic graphics are already built into R. The novice can easily learn R programming language within minutes by doing simple data analysis. Python libraries, however, can be tricky for a beginner to configure, whereas R packages work out of the box.

Winning language: R programming language

10) Elegance

Although this is the last comparison factor, it is also the most subjective one. Python is more elegant than R because it greatly reduces the use of parentheses and braces, which makes code sleeker to write and read.

Winning language: Python

Final Note:

The two languages are competing head-to-head in the world of data science. On some criteria Python wins, while on others R comes out ahead. So the final choice between the two for data science depends on the following factors:

-> Amount of time you invest

Nice post!!!

"The novice can easily learn R programming language within minutes by doing simple data analysis."
It depends a lot on the learner's previous experience.
R is easier for statistics specialists without much previous programming experience, but Python is much easier to understand for learners with basic programming experience.
I have learned quite a lot of Python and tried to learn R. To me, R seems very non-intuitive and hard to learn: it does not look like a strict, logical programming language, but more like a set of statistical tools where the same thing can be done in a lot of different ways, which confuses the learner.
And R has no use outside data science, while Python can be used in a lot of different fields, so it is much more useful. Its popularity is rising from year to year, and I am pretty sure it will continue to replace R in the future.

Stop directly comparing Python and R, these are two completely different tools. Python is a simplified language designed to basically solve everything at the cost of performance/maintainability. R is designed for data science and visualization. Period.

1)
When you compare data structures, it would be nice to see both Python and R implementations, as well as a performance comparison. In order to get a tree-like structure in R, you can probably use environments, or write a faster implementation in C/C++ (with the help of e.g. Rcpp), or look for a CRAN/GitHub package with said functionality.

2)
Comparing the Python 2.7/3.x debacle to the relationship between R and tidyverse is simply inappropriate. While Python introduced breaking changes and some packages are not being ported to 3.x, R versions have been consistent.
tidyverse is an addon to R, a set of packages that can be used at any point in the code. tidyverse is closer to numpy/matplotlib libs from Python. There is always an alternative, but tidyverse happens to be more widely used.

3)

"Python programming language has one OOP paradigm while in R, you can print a function to the terminal many times."

I fail to understand what this even means. Anyway, you show no examples of Python metaprogramming/OOP features, while presenting a very poor example of R metaprogramming in the form of code generation. It would be nice to have an explanation of R OOP mechanics: S3 at the very basic level, S4, reference classes, proto (which powers ggplot2 OOP) and R6. This could be compared to Python's more traditional OOP system with decorators (which do provide some level of metaprogramming).

9)
The R learning curve can be much steeper than that of Python. Python was designed to be simple and understandable and its syntax is very concise. R, on the contrary, differs from typical programming languages. It is also affected by its predecessor, the S language. In R, virtually everything is an expression, and this feature, together with non-standard evaluation (NSE), is among the greatest tools that R has to offer. R also has formulas, a rather unusual type that is used in e.g. machine learning/fitting procedures (like lm()). Formulas, NSE and environments are at the core of the grammar and syntax of tools like ggplot2, dplyr, purrr, data.table and so on. I doubt that learning to understand NSE and environments is easier than mastering advanced Python OOP.

10)
Elegance is indeed subjective. I find R more elegant because it naturally allows for functional-style piping (thanks to magrittr and friends). Pipes, NSE used within dplyr/data.table, lambdas that are constructed from one-sided functions in purrr (~ .x + 5), pronouns like . — all these features make R more elegant, in my opinion.
Python, however, has a very typical imperative/OOP syntax. Its only advantage is the absence of braces/other scope delimiting symbols and usage of short keywords (like def, pass, with). The indentation may sometimes make the code less elegant.