venv → an isolated Python environment that lets you install packages and manage dependencies separately from your system/main Python installation. This helps avoid conflicts between projects that require different versions of the same packages. venvs are routinely used in professional projects. We will not use them systematically in this course, but know that they are best practice for managing projects and ensuring reproducibility
BTW, something similar now also exists in R via the renv package, although unfortunately it is not widely used
In practice, a venv is a local folder containing an isolated, full Python environment: its own Python interpreter (a copy or a link) with pip, the activation scripts (in bin/ on macOS/Linux, in Scripts/ on Windows), a site-packages folder where the packages of this environment are installed, and a pyvenv.cfg configuration file
This folder can ideally be placed inside your project directory
Create a virtual environment with this command in your bash/terminal:
python -m venv nameOfMyEnv
then activate it before using it:
source nameOfMyEnv/bin/activate      (macOS/Linux)
nameOfMyEnv\Scripts\activate         (Windows)
…alternatively, inside IDEs, you can activate the venv via specific commands, like reticulate::use_virtualenv("nameOfMyEnv", required=T) (in RStudio), or by setting the Python interpreter manually and then restarting the kernel (in Spyder)
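To double-check which environment is actually running (e.g., after activating the venv in a terminal or selecting the interpreter in an IDE), here is a minimal sketch using only the standard library. Inside a venv, sys.prefix points to the venv folder, while sys.base_prefix still points to the main installation. The helper name in_virtualenv is just illustrative.

```python
import sys

def in_virtualenv() -> bool:
    """Illustrative helper: True if this interpreter runs inside a venv."""
    # venv sets sys.prefix to the environment folder; base_prefix keeps the main installation
    return sys.prefix != sys.base_prefix

print(sys.executable)    # path of the Python interpreter currently in use
print(in_virtualenv())
```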
Regardless of your main Python installation, you should reinstall all packages needed for your project inside the local venv (after activating it). This is considered best practice, as it ensures isolation across projects and reproducibility. At any time, you can export a requirements.txt file documenting the exact versions of all installed packages (particularly useful for sharing your environment, e.g., via GitHub):
pip freeze > requirements.txt
and the same packages can later be reinstalled in a new venv with:
pip install -r requirements.txt
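pip freeze writes one name==version line per installed package. To get a rough idea of what ends up in requirements.txt, you can list the same information from Python with the standard-library importlib.metadata module (Python 3.8+). This sketch is only an approximation of pip freeze, which handles some special cases (e.g., editable installs) differently.

```python
from importlib.metadata import distributions

# One "name==version" line per installed distribution, similar to pip freeze output
pinned = sorted(
    {f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()},
    key=str.lower,
)
for line in pinned[:10]:    # print only the first few lines
    print(line)
```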
"data/myfile.csv"
);myProjectFolder/
├── venv/               ← virtual environment
├── data/               ← .csv, .xlsx, etc.
├── scripts/            ← .py scripts
├── results/            ← output files, figures
├── notebooks/          ← Markdown files, Colab notebooks, etc.
├── requirements.txt    ← list of installed packages for reproducibility
└── README.md           ← brief description of the project
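This folder skeleton can also be created from Python with the standard-library pathlib module. The sketch below builds it in a temporary location as a stand-in for your real project path. The toy CSV content is hypothetical. The venv/ folder and requirements.txt are left out because they come from python -m venv and pip freeze.

```python
import os
import tempfile
from pathlib import Path

# Temporary location standing in for your real project path
root = Path(tempfile.mkdtemp()) / "myProjectFolder"
for sub in ["data", "scripts", "results", "notebooks"]:
    (root / sub).mkdir(parents=True, exist_ok=True)    # also creates root itself
(root / "README.md").write_text("Brief description of the project\n")
(root / "data" / "myfile.csv").write_text("a,b\n1,2\n")  # hypothetical toy data

# From inside the project folder, files are reachable via short relative paths
previous = os.getcwd()
os.chdir(root)
found = Path("data/myfile.csv").exists()
os.chdir(previous)    # restore the original working directory
print(found)
```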
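Installing packages inside an IDE console (IPython/Jupyter) or Colab:
%pip install pandas numpy
(in Colab, !pip install pandas works too; in a terminal with the venv activated, simply use pip install pandas)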
Then, before using any of their functions, import packages and modules:
as gives a shorter alias to a package or module name (e.g., pd for pandas, np for numpy). This is convenient because in Python you often need to call functions by specifying the package/module name every time (unlike in R), unless you import individual functions, e.g., from numpy import array
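Here is a minimal sketch of the import styles described above. It assumes numpy and pandas are installed in the active environment.

```python
# Whole packages under their conventional aliases
import numpy as np
import pandas as pd
# A single function, usable without the "np." prefix
from numpy import array

x = np.array([1, 2, 3])   # called through the package alias
y = array([1, 2, 3])      # same function, imported individually
s = pd.Series(x)          # pandas object built from a NumPy array
print(np.array_equal(x, y), s.sum())
```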
Use a function from a package, and call help:
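For instance, with NumPy (assumed to be installed): help() works in any Python console. In IPython, Jupyter and Colab, appending a question mark (e.g., np.mean?) shows the same documentation.

```python
import numpy as np

scores = [2.5, 3.5, 4.0]
avg = np.mean(scores)    # function from the numpy package
print(avg)

help(np.mean)            # prints the documentation of np.mean
```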
Use Tab to autocomplete and explore the available functions of a package
As in R, you can rely on the positional order of arguments instead of naming them, or omit them entirely when they have valid default values. However, it's best practice to make all relevant arguments explicit, for readability and reproducibility
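The three options can be illustrated with np.round, whose second parameter is decimals (default 0):

```python
import numpy as np

x = [1.234, 5.678]
a = np.round(x, 1)            # positional: the second argument is `decimals`
b = np.round(x, decimals=1)   # same call, argument named explicitly (preferred)
c = np.round(x)               # argument omitted: default decimals=0
print(a, b, c)
```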
In Python, objects may have functions attached to them: these are called methods, and are accessed using dot (.) notation (more on this later!)
Use Tab to autocomplete and explore the available methods of an object
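A few methods of built-in objects and of a NumPy array (NumPy assumed to be installed). Note that some methods modify the object in place, while others return a new object:

```python
import numpy as np

fruits = ["banana", "apple"]
fruits.append("cherry")    # list method: adds an element in place
fruits.sort()              # list method: sorts in place
shout = "python".upper()   # string method: returns a new string
arr = np.array([3, 1, 2])
total = arr.sum()          # ndarray method, equivalent to np.sum(arr)
print(fruits, shout, total)
```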
Get and set the working directory (like getwd() / setwd() in R):
(in Colab, the default working directory is /content on the Colab virtual machine; to work with files stored in Google Drive, you first need to mount it)
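The standard-library os module provides both operations. This minimal sketch uses the system temporary folder as a stand-in for your project folder:

```python
import os
import tempfile

original = os.getcwd()            # like getwd()
target = tempfile.gettempdir()    # stand-in for your project folder
os.chdir(target)                  # like setwd()
print(os.getcwd())
os.chdir(original)                # go back to where we started
```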
Save objects to a file and reload them (similar to save.image() of R)
simpler version:
(this simpler version is suboptimal because it doesn't properly close the file after use, but it still works)
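One common way to do this is the standard-library pickle module, and the sketch below assumes pickle. The file name myObjects.pkl is hypothetical. Unlike save.image(), pickle saves only the objects you pass to it, not the whole workspace.

```python
import os
import pickle
import tempfile

myObjects = {"x": [1, 2, 3], "label": "test"}
path = os.path.join(tempfile.gettempdir(), "myObjects.pkl")  # hypothetical file name

# Recommended: "with" closes the file automatically, even if an error occurs
with open(path, "wb") as f:
    pickle.dump(myObjects, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# Simpler version: the file is never explicitly closed
# (CPython closes it when the unreferenced file object is garbage-collected)
pickle.dump(myObjects, open(path, "wb"))
restored2 = pickle.load(open(path, "rb"))
```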
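Import data with pandas (more on pandas later!)
from CSV
from Excel
from Ctrl+C copied elements (beautiful ❤️; works out of the box on Windows and macOS, while on Linux it requires a clipboard utility such as xclip or xsel)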
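This sketch writes a tiny CSV file first so that the example is self-contained (the data are hypothetical). It then reads the file with pd.read_csv. The Excel and clipboard variants are shown commented out. The path "data/myfile.xlsx" is hypothetical, and reading .xlsx files additionally requires the openpyxl package.

```python
import os
import tempfile
import pandas as pd

# Write a tiny CSV so the example runs anywhere (hypothetical data)
path = os.path.join(tempfile.gettempdir(), "myfile.csv")
with open(path, "w") as f:
    f.write("name,score\nAnna,28\nBen,31\n")

df = pd.read_csv(path)                      # from CSV
# df = pd.read_excel("data/myfile.xlsx")    # from Excel (.xlsx files need openpyxl)
# df = pd.read_clipboard()                  # from cells copied with Ctrl+C
print(df)
```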
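Remove an object (like rm(df) in R)
List the objects in the workspace (like ls() in R)
dir()
dir() is a built-in function that does more than just return a list of the objects in the workspace: it lets you inspect all attributes and methods of any object. Examples of (truncated) output for a list, a NumPy array, and the numpy module: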
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse']
['all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'device', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield']
['abs', 'absolute', 'acos', 'acosh', 'add', 'all', 'allclose', 'amax', 'amin', 'angle', 'any', 'append', 'apply_along_axis', 'apply_over_axes', 'arange', 'arccos', 'arccosh', 'arcsin', 'arcsinh', 'arctan', 'arctan2', 'arctanh', 'argmax', 'argmin', 'argpartition', 'argsort', 'argwhere', 'around', 'array', 'array2string']
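Here is a sketch of removing an object with del and of inspecting objects with dir(). The helper public is hypothetical: it keeps only names starting with a lowercase letter, which drops the underscore-prefixed internals and the uppercase names. The exact lists for the NumPy array and the numpy module depend on your NumPy version.

```python
import numpy as np

df = [1, 2, 3]
del df                     # like rm(df) in R
print("df" in dir())       # dir() without arguments lists the names in the current scope

def public(obj, n):
    """Hypothetical helper: first n names in dir(obj) starting with a lowercase letter."""
    return [name for name in dir(obj) if name[0].islower()][:n]

print(public([], 10))                 # list methods
print(public(np.array([1, 2]), 30))   # ndarray attributes and methods
print(public(np, 30))                 # functions and objects of the numpy module
```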