Basic Syntax, Typing, Indexing, Differences with R

Enrico Toffalini

Two Languages, Two Philosophies

Python and R are powerful programming languages widely used in data science, and in many cases they do not feel much different.

But R was developed by statisticians for statistical computing, modeling, and data visualization, while Python is a more general-purpose language designed for readability, efficiency and scalability in large-scale computation across many possible uses.

Their syntax, typing rules and behavior reflect these different purposes…

Basic syntax

Many basic aspects of syntax and functions may feel practically the same…

Task            Python            R
Assignment      x = [1,2,3]       x = list(1,2,3) or x <- list(1,2,3)
Indexing        x[0]              x[1]
Comments        # like this       # like this
Print           print(x)          print(x)
Functions       sum(x)            sum(x)
Functions       round(2/3, 3)     round(2/3, 3)
Coercion        float("5.334")    as.numeric("5.334")
Type check      type(x)           typeof(x)
Logical values  True, False       TRUE, FALSE
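
As a quick check of the Python column, the lines below run in plain Python with no imports. The variable names (first, total, rounded, number, kind) are only illustrative and do not come from the slides.

```python
x = [1, 2, 3]              # assignment: a Python list
first = x[0]               # indexing starts at 0, so this is the 1st element
total = sum(x)             # built-in function, as in R
rounded = round(2 / 3, 3)  # round to 3 decimal places
number = float("5.334")    # coercion from string to float
kind = type(x)             # type check
flag = True                # logical values are True / False

print(first, total, rounded, number, kind, flag)
```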

Key differences: BUILT-IN FUNCTIONS for stats

…but for us, it might be a bit annoying that Python does not have built-in functions for many statistical and data-related tasks; these come from packages such as numpy (np), pandas (pd), scipy and statsmodels (smf):

Task                Python                             R
average             np.mean(x)                         mean(x)
standard deviation  np.std(x, ddof=1)                  sd(x)
square root         np.sqrt(x)                         sqrt(x)
linear model        smf.ols("y ~ x", data=df).fit()    lm(y ~ x, data=df)
correlation         np.corrcoef(x, y)[0, 1]            cor(x,y)
create data frame   pd.DataFrame(...)                  data.frame(...)
random normal       np.random.normal(0,1,size=1)       rnorm(n=1)
normal cdf          scipy.stats.norm.cdf(1.96)         pnorm(1.96)
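
For small tasks, Python's standard library also ships a statistics module (it still has to be imported, so it is not "built-in" in the R sense). The sketch below is only an illustration with made-up variable names. It also shows a point worth keeping in mind: sd() in R divides by n - 1, which corresponds to statistics.stdev(), whereas np.std() divides by n unless ddof=1 is passed.

```python
import math
import statistics

x = [7.3, 22.5, 30.0, 25.3]

average = statistics.mean(x)
sd_sample = statistics.stdev(x)       # divides by n - 1, like sd() in R
sd_population = statistics.pstdev(x)  # divides by n
root = math.sqrt(16)

# Cumulative probability of a standard normal, like pnorm(1.96) in R
p_below = statistics.NormalDist(mu=0, sigma=1).cdf(1.96)

print(average, sd_sample, sd_population, root, p_below)
```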

Key differences: ROLE OF “.”

In Python, the dot (“.”) has a very special role and is part of the syntax:

  • Access object attributes, methods and functions:
    • df.shape: access an attribute of the DataFrame df;
    • df.head(): access a method attached to a DataFrame;
    • math.sqrt(16): access a function of a module (like :: in R);
    • sklearn.linear_model: access a submodule of a package;
    • model.fit(): access the method that fits the object model;
  • Cannot be used in variable names!
    • my.data = 5: raises an error in Python (unless an object named my already exists)!

On the contrary, in R it has no special syntactic role and can be part of names.
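
A minimal sketch of these points using only the standard library. The names root, shout, my_data and message are illustrative; exec() is used here only so that the failing line can be caught and inspected instead of stopping the script.

```python
import math

root = math.sqrt(16)      # "." reaches a function inside the math module
shout = "adult".upper()   # "." reaches a method attached to a str object

my_data = 5               # an underscore is fine in a variable name

# A dot in a name is read as "attribute data of object my",
# and no object called my exists here.
try:
    exec("my.data = 5")
except NameError as err:
    message = str(err)

print(root, shout, my_data, message)
```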

Key differences: ROLE OF “.”

import numpy as np
import pandas as pd
import math
myVect = np.array([7.3, 22.5, 30.0, 25.3])
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [5, 6, 7, 8]})

.mean() accesses a method of a numpy array

myVect.mean()
np.float64(21.275)

.shape accesses an attribute of a pandas DataFrame

df.shape
(4, 2)

.pi accesses an attribute of the module ‘math’

math.pi
3.141592653589793

Key differences: ROLE OF “.”

Using . to access:

  • submodules of modules: statsmodels.formula.api, np.random;
  • class constructor methods: .DataFrame(), .ols();
  • methods: .fit(), .normal(), .summary();
  • attributes: .bic

import statsmodels.formula.api as smf

df = pd.DataFrame({
  "x1": np.random.normal(size=20),
  "x2": np.random.normal(size=20),
  "y" : np.random.normal(size=20)
})

myModel = smf.ols("y ~ x1 + x2", data=df)
myFit = myModel.fit()

myFit.summary()
myFit.bic

Key differences: ROLE OF “.”

                Results: Ordinary least squares
================================================================
Model:              OLS              Adj. R-squared:     -0.101 
Dependent Variable: y                AIC:                64.4683
Date:               2025-06-04 15:53 BIC:                67.4555
No. Observations:   20               Log-Likelihood:     -29.234
Df Model:           2                F-statistic:        0.1303 
Df Residuals:       17               Prob (F-statistic): 0.879  
R-squared:          0.015            Scale:              1.2815 
-----------------------------------------------------------------
               Coef.   Std.Err.     t     P>|t|    [0.025  0.975]
-----------------------------------------------------------------
Intercept     -0.0355    0.2571  -0.1379  0.8919  -0.5778  0.5069
x1            -0.0451    0.3117  -0.1447  0.8867  -0.7028  0.6126
x2             0.1500    0.3000   0.5000  0.6235  -0.4830  0.7829
----------------------------------------------------------------
Omnibus:              2.891        Durbin-Watson:          1.761
Prob(Omnibus):        0.236        Jarque-Bera (JB):       1.210
Skew:                 -0.025       Prob(JB):               0.546
Kurtosis:             1.796        Condition No.:          2    
================================================================
np.float64(67.45549215305579)

Key differences: INDEXING

Python

grades = ["a","b","c","d","f"]
grades[1]
'b'

Python: the 1st element is actually at index “zero”

grades[0]
'a'

R

grades = c("a","b","c","d","f")
grades[1]
[1] "a"

Python: “minus” means index from the end

grades[-2]
'd'

R: “minus” means exclude

grades[-2]
[1] "a" "c" "d" "f"

Python: takes the elements at [2] and [3] (the end index is excluded)

grades[2:4]
['c', 'd']

R: takes the elements at [2], [3], and [4] (both ends are included)

grades[2:4]
[1] "b" "c" "d"
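
To translate R habits into Python indexing, the sketch below reproduces the R results above with Python slices. The variable names are illustrative only.

```python
grades = ["a", "b", "c", "d", "f"]

first = grades[0]            # R: grades[1]
second_to_last = grades[-2]  # negative index counts from the end
middle = grades[2:4]         # indices 2 and 3; the end index is excluded

# R's grades[2:4] (positions 2, 3, 4) corresponds to indices 1, 2, 3 in Python
like_r_slice = grades[1:4]

# R's grades[-2] (drop the 2nd element) has no direct index form in Python;
# one option is to join the parts before and after it
without_second = grades[:1] + grades[2:]

print(first, second_to_last, middle, like_r_slice, without_second)
```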

Key differences: INDEXING

In Python you cannot index outside a list or vector

Python

grades = ["A","B","C","D","E","F"]
grades[10]
IndexError: list index out of range
grades[10] = "K"
IndexError: list assignment index out of range

R

grades = c("A","B","C","D","E","F")
grades[10]
[1] NA
grades[10] = "K"
grades
 [1] "A" "B" "C" "D" "E" "F" NA  NA  NA  "K"

But in Python you can add an element at the end with the list method append():

grades.append("K")
grades
['A', 'B', 'C', 'D', 'E', 'F', 'K']
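
If the R behaviour of padding with missing values is really needed, it has to be written explicitly. The helper set_with_padding below is hypothetical (it is not part of Python or of the slides) and uses None as a stand-in for NA.

```python
def set_with_padding(values, index, item, fill=None):
    """Return a copy of values with item at index, padding with fill if needed."""
    padded = list(values)  # work on a copy, leave the input untouched
    if index >= len(padded):
        padded.extend([fill] * (index + 1 - len(padded)))
    padded[index] = item
    return padded

grades = ["A", "B", "C", "D", "E", "F"]

# Index 9 in Python is position 10 in R
padded_grades = set_with_padding(grades, 9, "K")
print(padded_grades)
```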

Key differences: VECTORIZED OPERATIONS

Python

grades = [10, 11, 12, 13, 14, 15]
grades * 3
[10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14, 15]
grades = ["a","b","c","d","f"]
grades * 3
['a', 'b', 'c', 'd', 'f', 'a', 'b', 'c', 'd', 'f', 'a', 'b', 'c', 'd', 'f']
grades = [10, 11, 12, 13, 14, 15]
grades + 10
TypeError: can only concatenate list (not "int") to list
grades = [10, 11, 12, 13, 14, 15]
grades + [10, 1000]
[10, 11, 12, 13, 14, 15, 10, 1000]

R

grades = c(10,11,12,13,14,15)
grades * 3
[1] 30 33 36 39 42 45
grades = c("a","b","c","d","f")
grades * 3
Error in grades * 3: non-numeric argument to binary operator
grades = c(10,11,12,13,14,15)
grades + 10
[1] 20 21 22 23 24 25


grades = c(10,11,12,13,14,15)
grades + c(10, 1000) # vector recycling
[1]   20 1011   22 1013   24 1015

Key differences: VECTORIZED OPERATIONS

In Python, you may even multiply and add strings:

ourCourse = "BasicsPython"
ourCourse * 5
'BasicsPythonBasicsPythonBasicsPythonBasicsPythonBasicsPython'
ourCourse + " is a PhD course at Psychological Sciences"
'BasicsPython is a PhD course at Psychological Sciences'

However, classical numerical operations on vectors may appear incredibly painful for us…

grades = [10, 11, 12, 13, 14, 15]
grades = [g + 10 for g in grades]
print(grades)
[20, 21, 22, 23, 24, 25]
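
The same list-comprehension pattern covers the other R examples above. As a sketch with plain lists: zip() pairs elements, and itertools.cycle() can imitate R's vector recycling. The names tripled, bonus and recycled are illustrative.

```python
from itertools import cycle

grades = [10, 11, 12, 13, 14, 15]

# Element-wise multiplication, like grades * 3 in R
tripled = [g * 3 for g in grades]

# Element-wise addition with recycling, like grades + c(10, 1000) in R:
# cycle() repeats the shorter list, zip() stops when grades is exhausted
bonus = [10, 1000]
recycled = [g + b for g, b in zip(grades, cycle(bonus))]

print(tripled)
print(recycled)
```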

Key differences: VECTORIZED OPERATIONS

…luckily, there are good packages for data science in Python! To obtain vectorized operations like those we expect for data analysis and we are accustomed to in R, we can use the numpy package (more on numpy later!):

import numpy as np

grades = np.array([10,11,12,13,14,15])

grades * 3
array([30, 33, 36, 39, 42, 45])
grades + 10
array([20, 21, 22, 23, 24, 25])
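
One caveat, sketched below under the assumption that numpy is installed: numpy does not recycle a shorter vector the way R does. Adding arrays of length 6 and 2 raises an error, so the shorter one has to be repeated explicitly (here with np.resize, one of several options).

```python
import numpy as np

grades = np.array([10, 11, 12, 13, 14, 15])

# Shapes (6,) and (2,) are not compatible, so this fails instead of recycling
try:
    grades + np.array([10, 1000])
except ValueError as err:
    recycling_error = str(err)

# Repeat the shorter vector explicitly to match the length of grades
bonus = np.resize([10, 1000], grades.shape)
recycled = grades + bonus

print(recycling_error)
print(recycled)
```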

Key differences: THE CRUCIAL ROLE OF INDENTATION

Unlike in R, where syntactic symbols such as the curly brackets {} define code blocks, Python uses indentation (i.e., spaces at the beginning of lines) to delimit blocks of code. This makes the code more readable… but also unforgiving: indentation is part of the syntax!

Python: indentation is not just style, it is mandatory!

age = 20
if age >= 18:
    print("Adult")  # indented block
else:
    print("Minor")  # same indentation
Adult

R: indentation is good practice, but it relies on brackets { }

age = 20
if(age >= 18){
    print("Adult")
} else {
    print("Minor")
}
[1] "Adult"

Key differences: THE CRUCIAL ROLE OF INDENTATION

Incorrect indentation in Python raises errors 🤯

if age >= 18:
print("Adult")
expected an indented block after 'if' statement on line 1 (<string>, line 2)
for i in range(3):
print(i)
expected an indented block after 'for' statement on line 1 (<string>, line 2)

R accepts even totally inconsistent indentation

if(age >= 18){print("Adult") } else {
print("Minor") }
[1] "Adult"
for(i in 1:3){
print(i) }
[1] 1
[1] 2
[1] 3

This enforces clarity in Python code, but requires discipline
Remember that, in any case, writing tidy and readable code is a best practice!
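
Indentation also expresses nesting: each deeper level of a block gets one more level of indentation. The function classify below is a made-up illustration; the second part uses compile() only to show that badly indented source is rejected with an IndentationError before any line runs.

```python
def classify(ages):
    labels = []
    for age in ages:                    # level 1: body of the function
        if age >= 18:                   # level 2: body of the for loop
            labels.append("Adult")      # level 3: body of the if
        else:
            labels.append("Minor")
    return labels                       # back to level 1: runs after the loop

labels = classify([20, 15])

# Source code with a missing indentation is refused at compile time
bad_source = "if True:\nprint('Adult')\n"
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    error_name = type(err).__name__

print(labels, error_name)
```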

Key differences: COPYING MUTABLE TYPES

In Python, with mutable types like lists, “=” creates a reference to the same object, not a copy as in R

Python
v1 = ["A","B","C","D"]

newvar = v1
newvar[0] = "ZZZ"
newvar
['ZZZ', 'B', 'C', 'D']
v1
['ZZZ', 'B', 'C', 'D']
R
v1 = c("A","B","C","D")

newvar = v1
newvar[1] = "ZZZ"
newvar
[1] "ZZZ" "B"   "C"   "D"  
v1  # untouched!
[1] "A" "B" "C" "D"

This referencing allows for faster, convenient, and memory-efficient operations when editing large datasets, although of course it requires caution to avoid unintended modifications
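
A short sketch of what this means in practice: the operator is checks whether two names point to the same object, and a function that edits a list it receives also changes the caller's list. The function rename_first is a made-up example.

```python
v1 = ["A", "B", "C", "D"]
newvar = v1

same_object = newvar is v1   # both names refer to one list

def rename_first(values):
    # Modifies the list in place; nothing is copied
    values[0] = "ZZZ"

rename_first(v1)

print(same_object, v1, newvar)
```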

Key differences: COPYING MUTABLE TYPES

To create a (shallow) copy in Python, you can use the list method copy()

Python
v1 = ["A","B","C","D"]

newvar = v1.copy()
newvar[0] = "ZZZ"
newvar
['ZZZ', 'B', 'C', 'D']
v1  # untouched!
['A', 'B', 'C', 'D']
R
v1 = c("A","B","C","D")

newvar = v1
newvar[1] = "ZZZ"
newvar
[1] "ZZZ" "B"   "C"   "D"  
v1  # untouched!
[1] "A" "B" "C" "D"
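
Note that the list method copy() is shallow: it copies the outer list only, so nested lists are still shared. When nested structures are involved, copy.deepcopy() from the standard library is one option. The sketch below uses made-up variable names.

```python
import copy

nested = [["A", "B"], ["C", "D"]]

shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # inner lists are copied too

shallow[0][0] = "ZZZ"          # edits an inner list shared with nested
shallow.append(["E"])          # edits only the outer list of shallow

print(nested)   # inner change is visible here, the appended item is not
print(deep)     # fully independent
```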

Key differences: MULTIPLE ASSIGNMENT AND UNPACKING

Python

a, b, c = 59.2, 11.4, 98.0
print(b)
11.4
x = [59.2, 11.4, 98.0]
a, b, c = x
print(b)
11.4
a, b, c = np.random.normal(size=3)
print(b)
2.294976928839455
intrcpt, beta0, beta1 = myFit.params  # parameters belong to the fitted results, not to the model

R

a, b, c = 59.2, 11.4, 98.0
Error in parse(text = input): <text>:1:2: unexpected ','
1: a,
     ^





intrcpt = coef(myModel)[1]
beta0 = coef(myModel)[2]
beta1 = coef(myModel)[3]
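
On the Python side, unpacking has a few more uses worth knowing. The sketch below (illustrative names only) shows swapping two variables, collecting the remaining values with a starred name, and the error raised when the number of names does not match the number of values.

```python
a, b = 1, 2
a, b = b, a                         # swap without a temporary variable

first, *rest = [59.2, 11.4, 98.0]   # rest collects everything after the first

# The number of names must match the number of values
try:
    p, q = [59.2, 11.4, 98.0]
except ValueError as err:
    unpack_error = str(err)

print(a, b, first, rest, unpack_error)
```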