In previous tutorials, we studied how tests and loops work, allowing us to write Python programs that make automated decisions. In practice, a program will generally consist of different blocks, each executing an action or a group of actions (e.g., data import, data cleaning, statistical modeling, etc.). Furthermore, some of these actions are repeated with slight differences throughout a program (e.g., importing multiple different datasets). It will be useful to model each of these actions as a function, a sort of mini-program within the overall program. Using functions is a best practice in programming, as they make the logical structure of the code more explicit and help reduce code duplication.
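A function can be defined as a structured block of code that:
- takes a set of arguments as input
- performs a specific action through a set of instructions
- returns a result as output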
We have already seen and used several functions in previous tutorials (range, len, etc.). We have also used methods, which are simply functions attached to a particular type of object. Let’s use a well-known function to illustrate their general operation.
len('do re mi fa sol')
15
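In this example, the len function:
- takes an argument as input (a string)
- calculates the number of characters in that string
- returns this number as output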
The “set of instructions” that calculate the length of the string is not known. As a user, you only need to know what the function takes as input and what it returns as output. This is true for cases where you use Python’s built-in functions or functions from trusted Python libraries. Such functions are referred to as “black boxes.”
In practice, you will want to define your own functions to structure your code and reuse it in analyses.
The def statement is used to define a function.
def welcome(name):
    msg = "Greetings " + name + "!"
    return msg
Let’s analyze the syntax of the function definition:
- a def statement that:
  - specifies the name of the function (here, welcome)
  - specifies in parentheses the arguments expected by the function (here, a single argument: name)
  - ends with a colon (:), like the different statements we have seen
- a set of operations that will be performed by the function, which must be indented one level relative to the def statement
- a return statement that specifies what the function will return when called (here, the content of the msg variable)
Defining a function as above makes the function’s code available in the Python environment. It is only when the function is called in the code, with arguments, that the contained code is executed and produces a result.
welcome("Miranda")
'Greetings Miranda!'
As explained in the introduction, the main purpose of a function is to reuse code without duplicating it in the program.
welcome("Romuald")
'Greetings Romuald!'
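To make the difference between defining and calling a function concrete, here is a minimal sketch. The calls list is a hypothetical helper added only to observe when the body of the function actually runs; it is not part of the original welcome function.

```python
calls = []  # hypothetical log, used only to observe when the body runs


def welcome(name):
    calls.append(name)  # executed only when welcome() is called
    return "Greetings " + name + "!"


print(calls)  # still empty: defining the function did not run its body

print(welcome("Miranda"))
print(welcome("Romuald"))

print(calls)  # one entry per call
```

The same definition is reused for both calls, which is exactly what avoids duplicating the string-building code.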
When you call a function and specify arguments, you are “passing” arguments to it. These arguments then become variables that can be used within the context of the function. Unlike a for loop, the variables created do not persist after the function call.
def addition(x, y):
    return x + y
addition(5, 3)
8
x  # The variable does not persist in memory after the function call
NameError: name 'x' is not defined
Note: We will look more closely at this behavior later in the tutorial through the concepts of global and local variables.
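One way to check this behavior without stopping the program is to catch the error raised when a parameter name is used outside the function. This is only a sketch: the parameter names first and second are chosen here for illustration, so that they cannot clash with variables already defined in your session.

```python
def addition(first, second):
    # first and second exist only while addition() is running
    return first + second


print(addition(5, 3))

try:
    first  # not defined outside the function
except NameError as error:
    print("NameError:", error)
```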
The number of arguments you can pass to a function varies. Strictly speaking, you can define a function that does not need any arguments, although this is rarely useful in practice.
def nine():
    return 9
a = nine()
a
9
In Python, functions allow two modes of passing arguments:
passing by position, which is the mode we have seen in all previous examples: arguments are passed to the function in the order they were defined, without specifying the parameter name.
passing by keyword: you specify the parameter name when passing the argument, which allows you not to follow the order specified during the definition.
Let’s illustrate this difference with a function that simply performs a division.
def division(x, y):
    return x / y
division(4, 2)  # Passing by position
2.0
division(x=4, y=2)  # Passing by keyword
2.0
In the case of passing by position, maintaining the order is imperative.
print(division(0, 5))
print(division(5, 0))
0.0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[51], line 2
      1 print(division(0, 5))
----> 2 print(division(5, 0))

Cell In[48], line 2, in division(x, y)
      1 def division(x, y):
----> 2     return x / y

ZeroDivisionError: division by zero
In the case of passing by keyword, the order no longer matters.
print(division(x=0, y=5))
print(division(y=5, x=0))
0.0
0.0
When defining a function, it is common to want to mix arguments that the user must specify and optional arguments that specify a default behavior of the function but can be changed if needed.
Let’s see how we can modify the behavior of the print function using an optional argument.
print("hello")
print("hello")
hello
hello
print("hello", end=' ')
print("hello")
hello hello
We modified the behavior of the first print call via the optional end parameter. By default, this value is set to '\n', meaning a newline. We changed it to a space in the second cell, hence the difference in result.
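As a complementary sketch, print has another optional parameter, sep, which controls what is written between the items (a space by default), while end controls what is written after the last item. The strings used below are arbitrary.

```python
# Default behavior: items separated by a space, line ended by a newline
print("do", "re", "mi")

# sep changes what is written between the items
print("do", "re", "mi", sep="-")

# end changes what is written after the last item
print("do", end=" ")
print("re")
```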
This example also illustrates the link between the mandatory or optional nature of an argument and its passing mode:
generally, mandatory arguments are passed by position. They can also be passed by keyword, but since they are “expected,” they are usually passed by position for conciseness
optional arguments are generally passed by keyword, to clearly indicate that the default behavior of the function is being modified (some functions, such as print with its end parameter, accept them only by keyword)
How do you specify that an argument is optional when defining a function? Simply by specifying a default value for the argument. For example, let’s build a function that concatenates two strings and allows the user to specify a separator.
def concat_string(str1, str2, sep=''):
    return str1 + sep + str2
concat_string('hello', 'world')  # Default behavior
'helloworld'
concat_string('hello', 'world', sep=', ')  # Modified behavior
'hello, world'
This example also illustrates the rule when mixing positional and keyword arguments: positional arguments must always be placed before keyword arguments.
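The ordering rule can be checked with a short sketch. Python rejects a call in which a positional argument follows a keyword argument before the code even runs, so the invalid call is written as a string and passed to the built-in compile function, which lets us catch the SyntaxError instead of breaking the whole example.

```python
def concat_string(str1, str2, sep=''):
    return str1 + sep + str2


# Valid: positional arguments first, keyword arguments after
print(concat_string('hello', 'world', sep=', '))
print(concat_string('hello', str2='world', sep=', '))

# Invalid: a positional argument placed after a keyword argument
invalid_call = "concat_string(sep=', ', 'hello', 'world')"
try:
    compile(invalid_call, "<example>", "eval")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
```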
We have seen that every function returns a result as output and that the return statement specifies this result. When the function is called, it is evaluated to the value specified by return, and this value can then be stored in a variable and used in subsequent calculations, and so on.
def division(x, y):
    return x / y
a = division(4, 2)
b = division(9, 3)
division(a, b)  # 2 / 3
0.6666666666666666
Important note: when a return statement is reached in a function, the rest of the function is not executed.
def test(x):
    return x
    print("Will I be displayed?")
test(3)
3
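This behavior is often used deliberately: since return ends the call, a function can contain several return statements and leave as soon as one of them applies. The sign function below is a hypothetical illustration of this pattern, not part of the original tutorial.

```python
def sign(x):
    if x > 0:
        return "positive"  # the call ends here for positive numbers
    if x < 0:
        return "negative"  # reached only if the first return was skipped
    return "zero"          # reached only if neither condition held


print(sign(3))
print(sign(-2))
print(sign(0))
```

No else branch is needed: each return guarantees that the lines after it are skipped for that call.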
None value
A function necessarily returns a result when called… but what happens if you do not specify a return statement?
def welcome(name):
    print("Greetings " + name + "!")
x = welcome("Leontine")
print(x)
print(type(x))
Greetings Leontine!
None
<class 'NoneType'>
As expected, the function printed a welcome message in the console. But we did not specify a value to return. Since an object must still be returned by definition, Python returns the value None, which is a special object of type NoneType representing the absence of a value. Its only purpose is to clearly indicate the difference between a real value and the absence of a value.
To test if an object has the value None, use the following syntax:
x is None  # and not x == None
True
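A typical use of this test is to check whether a function actually produced a value. The first_negative function below is a hypothetical example: it returns the first negative number of a list, and implicitly returns None when the loop ends without finding one.

```python
def first_negative(numbers):
    for n in numbers:
        if n < 0:
            return n
    # No return statement is reached here, so the function returns None


result = first_negative([4, 7, 1])

if result is None:
    print("No negative number found")
else:
    print("First negative number:", result)
```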
A function by definition returns one result, which can be any Python object. What if you want to return multiple results? You can simply store the different results in a container (list, tuple, dictionary, etc.), which can hold many objects.
In practice, it is very common to return a tuple when you want to return multiple objects. Tuples have the property of tuple unpacking, which we have seen several times in previous tutorials. This property allows a very convenient and elegant syntax for assigning the results of a function to variables.
def powers(x):
    return x**2, x**3, x**4
a, b, c = powers(2)
print(a)
print(b)
print(c)
4
8
16
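To see that a single object is still returned in this situation, the sketch below inspects the result before unpacking it. The min_max function is a hypothetical example built on the built-in min and max functions.

```python
def min_max(values):
    # The comma builds a tuple: one object holding two results
    return min(values), max(values)


result = min_max([3, 9, 1])
print(type(result))
print(result)

# Tuple unpacking assigns each element to its own variable
low, high = min_max([3, 9, 1])
print(low, high)
```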
In the introduction, we saw that functions could be viewed as mini-programs within a global program. This interpretation gives us an opportunity to quickly discuss the notion of scope in Python. A scope is a sort of container for Python objects, which can only be accessed within the context of that scope.
All the objects (variables, functions, etc.) that you define during a Python session are recorded in Python’s global scope. These objects can then be accessed anywhere in the program, including within a function. When this happens, they are referred to as global variables.
x = 5  # global variable

def add(y):
    return x + y
add(6)
11
The x variable was not passed as an argument to the add function nor defined within the function. Yet, it can be used within the function. This allows sharing elements between multiple functions.
However, arguments passed to a function or variables defined within the function are local variables: they only exist within the specific context of the function and cannot be reused once the function has executed.
def add(y):
    z = 5  # local variable
    return z + y
add(6)
print(z)
NameError: name 'z' is not defined
Within a given context, each variable is unique. However, it is possible to have variables with the same name in different contexts. Let’s see what happens when we create a variable within the context of a function, even though it already exists in the global context.
x = 5  # global variable

def add(y):
    x = 10
    return x + y
add(6)
16
This is a good example of a more general principle: the most local context always takes precedence. When Python performs the x + y operation, it looks for the values of x and y first in the local context and then, only if it doesn’t find them, in the higher context (in this case, the global context).
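A consequence worth checking is that assigning to a name inside a function creates a separate local variable and leaves the global variable of the same name untouched. The sketch below uses a hypothetical variable named counter to avoid clashing with variables already defined in your session.

```python
counter = 5  # global variable


def add(y):
    counter = 10  # local variable: hides the global one inside add()
    return counter + y


print(add(6))   # uses the local value 10
print(counter)  # the global variable still holds 5
```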
Note: We will see in a future tutorial on best practices that it is best to limit the use of global variables to a strict minimum, as they reduce the reproducibility of analyses.
Using functions helps reduce code duplication and better isolate the logical blocks of a program.
A function takes arguments as input, performs a specific action through a set of instructions, and returns a result as output.
“Black box” functions are functions whose code is not known to the user when they are executed, such as Python’s built-in functions (len, range, etc.). They contrast with user-defined functions.
When you define a function using the def statement, you store the function’s code in memory. It is only when you call the function that this code is executed, and a result is returned.
As many as you want.
By position: you pass the arguments in the order they were specified when the function was defined. By keyword: you pass the arguments by naming them.
To modify the default behavior of a function, as intended by its designer.
First the mandatory arguments, then the optional arguments.
No, a function always returns an object. If no return statement is specified, the function returns the value None, which is an object of type NoneType.
No, a function returns a single object. However, if you want a function to return multiple results, you can simply put them in a container (list, tuple, dictionary, etc.).
They disappear and cannot be reused in the global scope.
Create a power function that takes two numbers x and y as input and returns x raised to the power y, \(x^y\).
# Test your answer in this cell
def power(x, y):
    return x**y
power(2, 3)
8
Suppose that x = 5 and y = 3 are passed as arguments to each of the functions defined in the following cell. Predict what the functions will return (value and type of the object), and verify your answers.
def f1(x):
    return x

def f2(x):
    return ''

def f3(x):
    print("Hello World")

def f4(x, y):
    print(x + y)

def f5(x, y):
    x + y

def f6(x, y):
    if x >= 3 and y < 9:
        return 'test ok'
    else:
        return 'test not ok'

def f7(x, y):
    return f6(2, 8)

def f8(x, y, z):
    return x + y + z

def f9(x, y, z=5):
    return x + y + z
# Test your answer in this cell
- f1. Value: 5; Type: int
- f2. Value: ''; Type: str
- f3. Value: None; Type: NoneType
- f4. Value: None; Type: NoneType
- f5. Value: None; Type: NoneType
- f6. Value: 'test ok'; Type: str
- f7. Value: 'test not ok'; Type: str
- f8. Error: the required argument z is missing (TypeError)
- f9. Value: 13; Type: int
What is the value of the total variable in the following program?
z = 3

def f1(x, y):
    z = 5
    return x + y + z

def f2(x, y, z=1):
    return x + y + z

def f3(x, y):
    return x + y + z

total = f1(2, 3) + f2(3, 1) + f3(1, 0)
print(total)
19
# Test your answer in this cell
z = 3

def f1(x, y):
    z = 5
    return x + y + z

def f2(x, y, z=1):
    return x + y + z

def f3(x, y):
    return x + y + z

total = f1(2, 3) + f2(3, 1) + f3(1, 0)

print(f1(2, 3))
# The local variable z within f1 is used -> f1 returns 10
print(f2(3, 1))
# The local variable z within f2 is used
# Its default value is 1 -> f2 returns 5
print(f3(1, 0))
# The global variable z is used -> f3 returns 4
print(total)
10
5
4
19
Write a calculator function that:
- takes two numbers as input
- returns their sum, difference, product, and quotient as output
Use the tuple unpacking property to assign the results to variables in a single line.
# Test your answer in this cell
def calculator(a, b):
    return a + b, a - b, a * b, a / b
add, sub, mult, div = calculator(5, 3)
print(add, sub, mult, div)
8 2 15 1.6666666666666667
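Write a function that:
- takes a list as input
- returns a new list containing the unique elements of the input list
- lets the user sort the resulting list via an optional parameter (default behavior: no sorting)
Hint: The procedure was discussed in the tutorial on dictionaries and sets.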
# Test your answer in this cell
def dedup(l, sort=False):
    l_dedup = list(set(l))
    if sort:
        l_dedup.sort()
    return l_dedup
l = ["a", "a", "b", "c"]
print(dedup(l))  # Default behavior: no sorting
print(dedup(l, sort=True))  # Modified behavior: sorting
['c', 'b', 'a']
['a', 'b', 'c']
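Write a function that:
- takes a list of numbers as input
- prints the message "There are n numbers in the list.", where n is the number of elements in the list
- multiplies all the numbers in the list together
- returns the resulting product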
# Test your answer in this cell
def multiply(l):
    print("There are " + str(len(l)) + " numbers in the list.")
    c = 1
    for x in l:
        c *= x  # Equivalent to: c = c * x
    return c
l = [2, 8, 3]
multiply(l)
There are 3 numbers in the list.
48
In an exercise from the previous tutorial, we manually coded the calculation of the variance of a list of numbers using the formula: \[\sigma^2 = {\frac {1}{n}}\sum_{i=1}^{n} (x_{i}-\bar{x})^2\]
Strictly speaking, this formula is valid when calculating population variance. If we only observe a sample of the population, we do not calculate the variance but estimate it, and we must then use the following formula to obtain an unbiased estimator of the true variance: \[s^2 = {\frac {1}{n-1}}\sum_{i=1}^{n} (x_{i}-\bar{x})^2\].
To account for this distinction:
- code a mean function that calculates the mean as in the previous tutorial exercise
- code a var function that calculates the variance as in the previous tutorial exercise (calling the mean function to calculate the mean)
- modify the var function to allow the user to choose the calculation method via an optional mode parameter (default value: 'population' for calculation using the population formula; alternative value: 'sample' for calculation using the sample formula)
Compare the values obtained in both cases with what the black box function var from the numpy library returns (see the solution to the previous tutorial exercise for the syntax, and see the documentation of the function, especially the ddof parameter, to vary the calculation method).
# Test your answer in this cell
def mean(x):
    n = len(x)
    sum_mean = 0
    for x_i in x:
        sum_mean += x_i
    mean = sum_mean / n
    return mean

def var(x, mode="population"):
    n = len(x)
    mean_value = mean(x)
    sum_var = 0
    for x_i in x:
        sum_var += (x_i - mean_value)**2
    if mode == "population":
        variance = sum_var / n
    elif mode == "sample":
        variance = sum_var / (n-1)
    return variance

x = [8, 18, 6, 0, 15, 17.5, 9, 1]
print(mean(x))
print(var(x))  # population
print(var(x, mode="sample"))  # sample

# Verification with numpy library functions
import numpy as np
print(np.mean(x))
print(np.var(x))  # population
print(np.var(x, ddof=1))  # sample
9.3125
42.93359375
49.066964285714285
9.3125
42.93359375
49.066964285714285
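If numpy is not available, the same check can be sketched with Python’s standard statistics module, which provides pvariance for the population formula and variance for the sample formula. The last line also illustrates the link between the two formulas: multiplying the population variance by n / (n - 1) gives the sample variance, up to floating-point rounding.

```python
import statistics

x = [8, 18, 6, 0, 15, 17.5, 9, 1]
n = len(x)

population_variance = statistics.pvariance(x)  # divides by n
sample_variance = statistics.variance(x)       # divides by n - 1

print(population_variance)
print(sample_variance)

# Converting one estimate into the other
print(population_variance * n / (n - 1))
```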
Recursive functions are functions that call themselves within their own body, triggering successive calls until a stopping criterion is reached.
A good example of a recursive function is one that calculates the factorial of an integer. The factorial of a natural number \(n\) is the product of all positive integers less than or equal to n. For example: \(5! = 5*4*3*2*1 = 120\).
Code this function and verify that it works correctly.
# Test your answer in this cell
def factorial(n):
    if n == 0:
        # Stopping criterion
        return 1
    else:
        return n * factorial(n-1)
factorial(5)
120
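One way to gain confidence in a recursive function is to compare it with a non-recursive version and with a trusted reference. The sketch below does so for small non-negative integers; factorial_iterative is a hypothetical loop-based variant written for this comparison, and math.factorial comes from the standard library.

```python
import math


def factorial(n):
    if n == 0:
        # Stopping criterion
        return 1
    return n * factorial(n - 1)


def factorial_iterative(n):
    # Same computation written with a loop instead of recursive calls
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


for n in range(6):
    assert factorial(n) == factorial_iterative(n) == math.factorial(n)

print(factorial(5))
```

Both versions assume a non-negative integer as input: with a negative number, the recursive version would never reach its stopping criterion.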