import numpy as np
import pandas as pd
import math
myVect = np.array([7.3, 22.5, 30.0, 25.3])
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]})
Python and R are powerful programming languages widely used in data science, and in many cases they do not feel much different.
But R was developed by statisticians for statistical computing, modeling, and data visualization, while Python is a more general-purpose language designed for readability, efficiency, and scalability in large-scale computation across many possible uses.
Their syntax, typing rules and behavior reflect these different purposes…
Many basic aspects of syntax and functions may feel practically the same…
Task | Python | R |
---|---|---|
Assignment | x = [1,2,3] | x = list(1,2,3) or x <- list(1,2,3) |
Indexing | x[0] | x[1] |
Comments | # like this | # like this |
Printing | print(x) | print(x) |
Functions | sum(x) | sum(x) |
Functions | round(2/3, 3) | round(2/3, 3) |
Coercion | float("5.334") | as.numeric("5.334") |
Type check | type(x) | typeof(x) |
Logical values | True, False | TRUE, FALSE |
…but for us, it might be a bit annoying that Python has no built-in functions for many statistical and data-related tasks, which require packages instead:
Task | Python | R |
---|---|---|
average | np.mean(x) | mean(x) |
standard deviation | np.std(x, ddof=1) | sd(x) |
square root | np.sqrt(x) | sqrt(x) |
linear model | smf.ols("y ~ x", data=df).fit() | lm(y ~ x, data=df) |
correlation | np.corrcoef(x, y) | cor(x,y) |
create data frame | pd.DataFrame(...) | data.frame(...) |
random normal | np.random.normal(0,1,size=1) | rnorm(n=1) |
normal cdf | scipy.stats.norm.cdf(1.96) | pnorm(1.96) |
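As a minimal sketch of the first rows of this table, the snippet below applies the NumPy equivalents to the vector from the setup code and to two short arrays (the names x and y and the variable names for the results are chosen only for illustration). Note that np.std() divides by n by default, so ddof=1 is needed to match R's sd().

```python
import numpy as np

myVect = np.array([7.3, 22.5, 30.0, 25.3])

avg = np.mean(myVect)                # R: mean(x)
sd_sample = np.std(myVect, ddof=1)   # R: sd(x), divides by n - 1
sd_pop = np.std(myVect)              # NumPy default (ddof=0) divides by n
roots = np.sqrt(myVect)              # R: sqrt(x), applied element by element

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
r = np.corrcoef(x, y)[0, 1]          # R: cor(x, y); corrcoef returns a 2x2 matrix

print(avg, sd_sample, sd_pop, r)
```

The [0, 1] index is needed because np.corrcoef() returns the whole correlation matrix, whereas cor(x, y) in R returns a single number.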
In Python, the dot (“.”) has a very special role: it is part of the syntax.
df.shape: accesses an attribute of the DataFrame df;
df.head(): calls a method attached to a DataFrame;
math.sqrt(16): calls a function of a module (like :: in R);
sklearn.linear_model: accesses a submodule of a package;
model.fit(): calls the fitting method of the object model;
my.data = 5: raises an error in Python, because my is read as an object and data as its attribute!
On the contrary, in R the dot has no particular meaning and can be part of names.
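The roles of the dot listed above can be tried out in a few lines. This is only a sketch: the small DataFrame and the variable names are assumptions made for the example, and exec() is used just so that the failing assignment can be caught and inspected.

```python
import math
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]})

shape = df.shape          # attribute: no parentheses
first_rows = df.head(2)   # method: called with parentheses
root = math.sqrt(16)      # function that lives in the math module

# A dot inside a name is not allowed: Python reads my.data as
# "attribute data of an object called my", which does not exist here.
try:
    exec("my.data = 5")
    dotted_name_error = None
except NameError as err:
    dotted_name_error = err

my_data = 5               # an underscore is the usual alternative

print(shape, root, dotted_name_error)
```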
In the code behind the output below, the dot (“.”) is used to access submodules of modules: statsmodels.formula.api, np.random; class constructor methods: .DataFrame(), .ols(); methods: .fit(), .normal(), .summary(); attributes: .bic.
Results: Ordinary least squares
================================================================
Model: OLS Adj. R-squared: -0.101
Dependent Variable: y AIC: 64.4683
Date: 2025-06-04 15:53 BIC: 67.4555
No. Observations: 20 Log-Likelihood: -29.234
Df Model: 2 F-statistic: 0.1303
Df Residuals: 17 Prob (F-statistic): 0.879
R-squared: 0.015 Scale: 1.2815
-----------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
-----------------------------------------------------------------
Intercept -0.0355 0.2571 -0.1379 0.8919 -0.5778 0.5069
x1 -0.0451 0.3117 -0.1447 0.8867 -0.7028 0.6126
x2 0.1500 0.3000 0.5000 0.6235 -0.4830 0.7829
----------------------------------------------------------------
Omnibus: 2.891 Durbin-Watson: 1.761
Prob(Omnibus): 0.236 Jarque-Bera (JB): 1.210
Skew: -0.025 Prob(JB): 0.546
Kurtosis: 1.796 Condition No.: 2
================================================================
np.float64(67.45549215305579)
In Python you cannot assign to a position beyond the end of a list or vector.
But you can use the list method append():
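A minimal sketch of this behaviour, using a short hypothetical list called scores:

```python
scores = [10, 11, 12]

# Assigning past the end of a list raises IndexError
# (in R, x[4] <- 13 would extend the vector instead).
try:
    scores[3] = 13
    index_error = None
except IndexError as err:
    index_error = err

scores.append(13)   # append() adds one element at the end, in place
print(scores)       # [10, 11, 12, 13]
```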
Python
[10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14, 15]
['a', 'b', 'c', 'd', 'f', 'a', 'b', 'c', 'd', 'f', 'a', 'b', 'c', 'd', 'f']
TypeError: can only concatenate list (not "int") to list
R
Error in grades * 3: non-numeric argument to binary operator
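The Python outputs above are what the list operators produce: * repeats a list and + only concatenates two lists. The sketch below reproduces them under the assumption that the lists looked like this (the name grades is borrowed from the R error message, x is hypothetical):

```python
x = [10, 11, 12, 13, 14, 15]
grades = ["a", "b", "c", "d", "f"]

tripled = x * 3               # repeats the list three times, no arithmetic
tripled_grades = grades * 3   # works for strings too, unlike in R

try:
    x + 3                     # + expects another list on the right-hand side
    add_error = None
except TypeError as err:
    add_error = err

print(tripled)
print(tripled_grades)
print(add_error)
```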
In Python, you may even multiply and add strings:
'BasicsPythonBasicsPythonBasicsPythonBasicsPythonBasicsPython'
'BasicsPython is a PhD course at Psychological Sciences'
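The two strings above can be obtained with * and +; a short sketch, where the variable names are chosen only for illustration:

```python
course = "BasicsPython"

repeated = course * 5   # repetition
sentence = course + " is a PhD course at Psychological Sciences"   # concatenation

print(repeated)
print(sentence)
```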
However, classical numerical operations on vectors may appear incredibly painful for us…
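For instance, with plain lists an element-wise operation needs an explicit loop or a list comprehension. A minimal sketch with illustrative values:

```python
x = [10, 11, 12, 13, 14, 15]
y = [1, 2, 3, 4, 5, 6]

# Multiply every element by 3: a list comprehension is required,
# because x * 3 would repeat the list instead.
x_times_3 = [value * 3 for value in x]

# The element-wise sum of two lists needs zip() to pair the elements
x_plus_y = [a + b for a, b in zip(x, y)]

print(x_times_3)
print(x_plus_y)
```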
…luckily, there are good packages for data science in Python! To obtain the vectorized operations that we expect for data analysis and are accustomed to in R, we can use the numpy package (more on numpy later!):
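A small sketch of what this looks like, with an illustrative array x: once the values are stored in a NumPy array, * and + act element by element, as they do on R vectors.

```python
import numpy as np

x = np.array([10, 11, 12, 13, 14, 15])

tripled = x * 3   # element-wise multiplication
shifted = x + 3   # element-wise addition

print(tripled)
print(shifted)
```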
Unlike in R, where syntactic symbols such as the curly brackets {} define code blocks, Python uses indentation (i.e., spaces at the beginning of lines) to delimit blocks of code. This makes the code more readable… but also unforgiving: indentation is part of the syntax!
Incorrect indentation in Python raises errors 🤯
This enforces clarity in Python code, but requires discipline…
Remember that, in any case, writing tidy and readable code is a best practice!
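As a sketch of both points, the function below (a hypothetical describe()) uses indentation to mark the body of the function and of the if/else, while the second part passes a badly indented snippet to compile() so that the resulting error can be caught rather than stopping the script.

```python
def describe(value):
    # The indented lines form the body of the function and of the if/else
    if value > 0:
        label = "positive"
    else:
        label = "non-positive"
    return label

# The same kind of block with an unindented body is rejected before it runs
bad_source = "if True:\nprint('hello')\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_error = None
except IndentationError as err:
    indentation_error = err

print(describe(3), type(indentation_error).__name__)
```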
With mutable types like lists, “=” creates a reference to the same object, not a copy as in R.
This referencing allows for faster, more convenient, and more memory-efficient operations when editing large datasets, although it requires caution to avoid unintended modifications.
To create a copy, you can use the copy() method.
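A minimal sketch of the difference, using three hypothetical names a, b, and c:

```python
a = [1, 2, 3]

b = a          # b is a second name for the same list
b.append(4)    # the change is visible through a as well

c = a.copy()   # c is an independent (shallow) copy
c.append(5)    # a and b are not affected

print(a)       # [1, 2, 3, 4]
print(b)       # [1, 2, 3, 4]
print(c)       # [1, 2, 3, 4, 5]
print(a is b, a is c)
```

Note that copy() is shallow: for a list that contains other lists, the inner lists are still shared, and copy.deepcopy() from the standard library would be needed for a fully independent copy.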
Python