Functions

In previous tutorials, we studied how tests and loops work, allowing us to write Python programs that make automated decisions. In practice, a program will generally consist of different blocks, each executing an action or a group of actions (e.g., data import, data cleaning, statistical modeling, etc.). Furthermore, some of these actions are repeated with slight differences throughout a program (e.g., importing multiple different datasets). It will be useful to model each of these actions as a function, a sort of mini-program within the overall program. Using functions is a best practice in programming, as they make the logical structure of the code more explicit and help reduce code duplication.

Definition

A function can be defined as a structured block of code that:

  • takes a set of arguments (Python objects) as input
  • performs a specific action through a set of instructions
  • returns a result (a Python object) as output

We have already seen and used several functions in previous tutorials (range, len, etc.). We have also used methods, which are simply functions attached to a particular type of object. Let’s use a well-known function to illustrate their general operation.

len('do re mi fa sol')
15

In this example, the len function:

  • takes an argument as input (a string)
  • calculates the number of characters present in the string
  • returns this number as output

The “set of instructions” that calculate the length of the string is not known. As a user, you only need to know what the function takes as input and what it returns as output. This is true for cases where you use Python’s built-in functions or functions from trusted Python libraries. Such functions are referred to as “black boxes.”

In practice, you will want to define your own functions to structure your code and reuse it in analyses.

Syntax

The def statement is used to define a function.

def welcome(name):
    msg = "Greetings " + name + "!"
    return msg

Let’s analyze the syntax of the function definition:

  • a def statement that:

    • specifies the name of the function (here, welcome)
    • specifies the expected arguments in parentheses (here, a single argument: name)
    • ends with : like the different statements we have seen
  • a set of operations that will be performed by the function, which must be indented one level relative to the def statement

  • a return statement that specifies what the function will return when called (here, the content of the msg variable)

Defining a function as above makes the function’s code available in the Python environment. It is only when the function is called in the code, with arguments, that the contained code is executed and produces a result.
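
The difference between defining and calling a function can be observed with a minimal sketch. The names noisy and calls are hypothetical and serve only this illustration.

```python
calls = []  # hypothetical log used to observe when the body runs

def noisy():
    # This line runs only when noisy() is called, not when it is defined
    calls.append("body executed")
    return len(calls)

print(calls)      # the definition alone did not run the body
result = noisy()  # the call executes the body
print(calls)
print(result)
```

The list is still empty after the def statement and receives its first element only once the function is called.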

welcome("Miranda")
'Greetings Miranda!'

As explained in the introduction, the main purpose of a function is to reuse code without duplicating it in the program.

welcome("Romuald")
'Greetings Romuald!'

Passing Arguments

Principle

When you call a function and specify arguments, you are “passing” arguments to it. These arguments then become variables that can be used within the context of the function. Unlike a for loop, the variables created do not persist after the function call.

def addition(x, y):
    return x + y
addition(5, 3)
8
x  # Raises an error in a fresh session: x existed only during the function call
NameError: name 'x' is not defined

Note: We will look more closely at this behavior later in the tutorial through the concepts of global and local variables.

Number of Arguments

The number of arguments you can pass to a function varies. Strictly speaking, you can define a function that does not need any arguments, although this is rarely useful in practice.

def nine():
    return 9
a = nine()
a
9

Passing by Position and Passing by Keyword

In Python, functions allow two modes of passing arguments:

  • passing by position, which is the mode we have seen in all previous examples: arguments are passed to the function in the order they were defined, without specifying the parameter name.

  • passing by keyword: you specify the parameter name when passing the argument, which allows you not to follow the order specified during the definition.

Let’s illustrate this difference with a function that simply performs a division.

def division(x, y):
    return x / y
division(4, 2)  # Passing by position
2.0
division(x=4, y=2)  # Passing by keyword
2.0

In the case of passing by position, maintaining the order is imperative.

print(division(0, 5))
print(division(5, 0))
0.0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[51], line 2
      1 print(division(0, 5))
----> 2 print(division(5, 0))

Cell In[48], line 2, in division(x, y)
      1 def division(x, y):
----> 2     return x / y

ZeroDivisionError: division by zero

In the case of passing by keyword, the order no longer matters.

print(division(x=0, y=5))
print(division(y=5, x=0))
0.0
0.0

Mandatory and Optional Arguments

When defining a function, it is common to want to mix arguments that the user must specify and optional arguments that specify a default behavior of the function but can be changed if needed.

Let’s see how we can modify the behavior of the print function using an optional argument.

print("hello")
print("hello")
hello
hello
print("hello", end=' ')
print("hello")
hello hello

We modified the behavior of the first print call via the optional end parameter. By default, this value is set to '\n', meaning a newline. We changed it to a space in the second cell, hence the difference in result.

This example also illustrates the link between the mandatory or optional nature of an argument and its passing mode:

  • generally, mandatory arguments are passed by position. They can also be passed by keyword, but since they are “expected,” they are usually passed by position for conciseness

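  • optional arguments are generally passed by keyword, to clearly indicate that the default behavior of the function is being modified. Python also accepts them by position, unless the function declares them keyword-only (as print does for end), but this makes the call harder to read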

How do you specify that an argument is optional when defining a function? Simply by specifying a default value for the argument. For example, let’s build a function that concatenates two strings and allows the user to specify a separator.

def concat_string(str1, str2, sep=''):
    return str1 + sep + str2
concat_string('hello', 'world')  # Default behavior
'helloworld'
concat_string('hello', 'world', sep=', ')  # Modified behavior
'hello, world'

This example also illustrates the rule when mixing positional and keyword arguments: positional arguments must always be placed before keyword arguments.
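
The sketch below reuses concat_string to show the valid ways of mixing the two passing modes, and what happens when the ordering rule is broken. Wrapping the invalid call in compile() is only a device to keep the cell runnable, since a SyntaxError would otherwise prevent the whole cell from executing.

```python
def concat_string(str1, str2, sep=''):
    return str1 + sep + str2

# Mandatory arguments by position, optional argument by keyword (the usual style)
a = concat_string('hello', 'world', sep=', ')

# Everything by keyword: the order is then free
b = concat_string(sep=', ', str2='world', str1='hello')

# The optional argument can also be passed by position, at the cost of readability
c = concat_string('hello', 'world', ', ')

print(a, b, c, sep=' | ')

# A positional argument placed after a keyword argument is rejected by Python
try:
    compile("concat_string(str1='hello', 'world')", "<example>", "eval")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```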

Returning Results

Principle

We have seen that every function returns a result as output and that the return statement specifies this result. When the function is called, it is evaluated to the value specified by return, and this value can then be stored in a variable and used in subsequent calculations, and so on.

def division(x, y):
    return x / y
a = division(4, 2)
b = division(9, 3)
division(a, b)  # 2 / 3
0.6666666666666666

Important note: when a return statement is reached in a function, the rest of the function is not executed.

def test(x):
    return x
    print("Will I be displayed?")
    
test(3)
3
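
This behavior is often used on purpose, to leave a function as soon as the answer is known. A minimal sketch, with a hypothetical sign function:

```python
def sign(x):
    if x > 0:
        return "positive"  # the function stops here for positive numbers
    if x < 0:
        return "negative"  # and here for negative numbers
    return "zero"          # reached only if neither test succeeded

print(sign(3))
print(sign(-2))
print(sign(0))
```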

The None Value

A function necessarily returns a result when called… but what happens if you do not specify a return statement?

def welcome(name):
    print("Greetings " + name + "!")
    
x = welcome("Leontine")
print(x)
print(type(x))
Greetings Leontine!
None
<class 'NoneType'>

As expected, the function printed a welcome message in the console. But we did not specify a value to return. Since an object must still be returned by definition, Python returns the value None, which is a special object of type NoneType representing the absence of a value. Its only purpose is to clearly indicate the difference between a real value and the absence of a value.

To test if an object has the value None, use the following syntax:

x is None  # and not x == None
True
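
A common use of this test is to give an optional argument the default value None, meaning "nothing was provided". The sketch below is a hypothetical variant of the welcome function, not the version defined earlier.

```python
def welcome(name=None):
    # None as default value signals that no name was given
    if name is None:
        return "Greetings stranger!"
    return "Greetings " + name + "!"

print(welcome())
print(welcome("Miranda"))
```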

Returning Multiple Results

A function by definition returns one result, which can be any Python object. What if you want to return multiple results? You can simply store the different results in a container (list, tuple, dictionary, etc.), which can hold many objects.

In practice, it is very common to return a tuple when you want to return multiple objects. Tuples have the property of tuple unpacking, which we have seen several times in previous tutorials. This property allows a very convenient and elegant syntax for assigning the results of a function to variables.

def powers(x):
    return x**2, x**3, x**4

a, b, c = powers(2)

print(a)
print(b)
print(c)
4
8
16
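
To see that a single object is returned, the result can also be stored in one variable. The sketch below reuses powers and shows what happens when the number of variables does not match the number of returned values.

```python
def powers(x):
    return x**2, x**3, x**4

result = powers(2)
print(type(result))  # the three values travel together in a single tuple
print(result[0])     # individual values remain accessible by index

# Unpacking requires as many variables as there are returned values
try:
    a, b = powers(2)
except ValueError as err:
    print("ValueError:", err)
```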

Local and Global Variables

In the introduction, we saw that functions could be viewed as mini-programs within a global program. This interpretation gives us an opportunity to quickly discuss the notion of scope in Python. A scope is a sort of container for Python objects, which can only be accessed within the context of that scope.

All the objects (variables, functions, etc.) that you define during a Python session are recorded in Python’s global scope. These objects can then be accessed anywhere in the program, including within a function. When this happens, they are referred to as global variables.

x = 5  # global variable

def add(y):
    return x + y

add(6)
11

The x variable was neither passed as an argument to the add function nor defined within it. Yet it can be accessed within the function. This makes it possible to share elements between multiple functions.

However, arguments passed to a function or variables defined within the function are local variables: they only exist within the specific context of the function and cannot be reused once the function has executed.

def add(y):
    z = 5  # local variable
    return z + y

add(6)
print(z)  # Raises an error: z exists only inside add
NameError: name 'z' is not defined

Within a given context, each variable is unique. However, it is possible to have variables with the same name in different contexts. Let’s see what happens when we create a variable within the context of a function, even though it already exists in the global context.

x = 5  # global variable

def add(y):
    x = 10
    return x + y

add(6)
16

This is a good example of a more general principle: the most local context always takes precedence. When Python performs the x + y operation, it looks for the values of x and y first in the local context and then, only if it doesn’t find them, in the higher context, which here is the global context.

Note: We will see in a future tutorial on best practices that it is best to limit the use of global variables to a strict minimum, as they reduce the reproducibility of analyses.
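
The sketch below illustrates two points from this section: assigning to x inside a function creates a local variable and leaves the global x untouched, and a function that receives everything it needs as arguments does not depend on the global context at all. The names add_shadow and add_explicit are hypothetical.

```python
x = 5  # global variable

def add_shadow(y):
    x = 10  # creates a local x; the global x is not modified
    return x + y

def add_explicit(x, y):
    # Hypothetical alternative: everything the function needs is an argument
    return x + y

print(add_shadow(6))
print(x)  # still 5 after the call
print(add_explicit(x, 6))
```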

Exercises

Comprehension Questions

  1. Why is using functions in a program considered a best practice in development?
  2. What are the three characteristics of a function?
  3. What is a “black box” function? What other functions is it opposed to?
  4. What happens when you define a function? And when you call it?
  5. How many arguments can you pass to a function?
  6. What are the two modes of passing arguments to a function?
  7. What is the usefulness of passing optional arguments to a function?
  8. In what order should arguments be passed to a function if it has both mandatory and optional arguments?
  9. Are there functions that return nothing?
  10. Can a function return multiple objects?
  11. What happens to the variables in the local scope of a function once the function has been called?
Show the solution
  1. Using functions helps reduce code duplication and better isolate the logical blocks of a program.

  2. A function takes arguments as input, performs a specific action through a set of instructions, and returns a result as output.

  3. “Black box” functions are functions that you use without knowing their code, such as Python’s built-in functions (len, range, etc.). They contrast with user-created functions, whose code you have written yourself.

  4. When you define a function using the def statement, you store the function’s code in memory. It is only when you call the function that this code is executed, and a result is returned.

  5. As many as the function’s definition accepts. A function can be defined with any number of parameters, including none.

  6. By position: you pass the arguments in the order they were specified when the function was defined. By keyword: you pass the arguments by naming them.

  7. To modify the default behavior of a function, as intended by its designer.

  8. First the mandatory arguments, then the optional arguments.

  9. No, a function always returns an object. If no return statement is specified, the function returns the value None, which is an object of type NoneType.

  10. No, a function returns a single object. However, if you want a function to return multiple results, you can simply put them in a container (list, tuple, dictionary, etc.).

  11. They disappear and cannot be reused in the global scope.

Power Function

Create a power function that takes two numbers x and y as input and returns x raised to the power y, \(x^y\).

# Test your answer in this cell
Show the solution
def power(x, y):
    return x**y

power(2, 3)
8

Predicting Values Returned by Functions

Suppose x = 5 and y = 3 are passed as arguments to each of the functions defined in the following cell (only x for the functions that take a single argument). Predict what each function will return (value and type of the object), and verify your answers.

def f1(x):
    return x

def f2(x):
    return ''

def f3(x):
    print("Hello World")
    
def f4(x, y):
    print(x + y)
    
def f5(x, y):
    x + y
    
def f6(x, y):
    if x >= 3 and y < 9:
        return 'test ok'
    else:
        return 'test not ok'
    
def f7(x, y):
    return f6(2, 8)

def f8(x, y, z):
    return x + y + z

def f9(x, y, z=5):
    return x + y + z
# Test your answer in this cell
Show the solution
- f1. Value: 5; Type: int

- f2. Value: ''; Type: str

- f3. Value: None; Type: NoneType

- f4. Value: None; Type: NoneType

- f5. Value: None; Type: NoneType

- f6. Value: 'test ok'; Type: str

- f7. Value: 'test not ok'; Type: str

- f8. Error: TypeError, because the mandatory argument z is missing from the call

- f9. Value: 13; Type: int

Global and Local Variables

What is the value of the total variable in the following program?

z = 3

def f1(x, y):
    z = 5
    return x + y + z

def f2(x, y, z=1):
    return x + y + z

def f3(x, y):
    return x + y + z

total = f1(2, 3) + f2(3, 1) + f3(1, 0)
print(total)
19
# Test your answer in this cell
Show the solution
z = 3

def f1(x, y):
    z = 5
    return x + y + z

def f2(x, y, z=1):
    return x + y + z

def f3(x, y):
    return x + y + z

total = f1(2, 3) + f2(3, 1) + f3(1, 0)

print(f1(2, 3))  
# The local variable z within f1 is used -> f1 returns 10

print(f2(3, 1))  
# The local variable z within f2 (its optional parameter) is used
# Its default value is 1 -> f2 returns 5

print(f3(1, 0)) 
# The global variable z is used -> f3 returns 4

print(total)
10
5
4
19

Calculator

Write a calculator function that:

  • takes two numbers as input
  • returns the addition, subtraction, multiplication, and division of these two numbers as output

Use the tuple unpacking property to assign the results to variables in a single line.

# Test your answer in this cell
Show the solution
def calculator(a, b):
    return a + b, a - b, a * b, a / b

add, sub, mult, div = calculator(5, 3)
print(add, sub, mult, div)
8 2 15 1.6666666666666667

Deduplicating a List

Write a function that:

  • takes a list of any elements as input
  • returns a new list consisting of the unique elements of the initial list
  • allows via an optional parameter to sort or not the final list in alphanumeric order. The default behavior should be not to sort.

Hint: The procedure was discussed in the tutorial on dictionaries and sets.

# Test your answer in this cell
Show the solution
def dedup(l, sort=False):
    l_dedup = list(set(l))
    if sort:
        l_dedup.sort()
    return l_dedup

l = ["a", "a", "b", "c"]
print(dedup(l))  # Default behavior: no sorting
print(dedup(l, sort=True))  # Modified behavior: sorting
['c', 'b', 'a']
['a', 'b', 'c']
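
Because a set has no guaranteed order, the unsorted result above may list the elements in a different order from one run to another. If the order of first appearance should be preserved, one possible variant relies on dictionary keys, which are unique and keep insertion order since Python 3.7. The name dedup_ordered is hypothetical.

```python
def dedup_ordered(items, sort=False):
    # dict keys are unique and keep insertion order (Python 3.7+)
    unique = list(dict.fromkeys(items))
    if sort:
        unique.sort()
    return unique

letters = ["c", "a", "a", "b", "c"]
print(dedup_ordered(letters))             # order of first appearance
print(dedup_ordered(letters, sort=True))  # sorted on request
```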

Multiplying List Elements

Write a function that:

  • takes a list of numbers as input
  • prints: “There are \(n\) numbers in the list.” with \(n\) being the actual number
  • multiplies all elements of the list (without using a pre-coded function)
  • returns the result
# Test your answer in this cell
Show the solution
def multiply(l):
    print("There are " + str(len(l)) + " numbers in the list.")
    c = 1
    for x in l:
        c *= x  # Equivalent to: c = c * x
    return c

l = [2, 8, 3]
multiply(l)
There are 3 numbers in the list.
48

Variance in a Population and Variance in a Sample

In an exercise from the previous tutorial, we manually coded the calculation of the variance of a list of numbers using the formula: \[\sigma^2 = {\frac {1}{n}}\sum_{i=1}^{n} (x_{i}-\bar{x})^2\]

Strictly speaking, this formula is valid when calculating population variance. If we only observe a sample of the population, we do not calculate the variance but estimate it, and we must then use the following formula to obtain an unbiased estimator of the true variance: \[s^2 = {\frac {1}{n-1}}\sum_{i=1}^{n} (x_{i}-\bar{x})^2\].

To account for this distinction:

  • code a mean function that calculates the mean as in the previous tutorial exercise
  • code a var function that calculates the variance as in the previous tutorial exercise (calling the mean function to calculate the mean)
  • modify the var function to allow the user to choose the calculation method via an optional mode parameter (default value: ‘population’ for calculation using the population formula; alternative value: ‘sample’ for calculation using the sample formula)

Compare the values obtained in both cases with what the black box function var from the numpy library returns (see the solution to the previous tutorial exercise for the syntax, and see the doc of the function, especially the ddof parameter to vary the calculation method).

# Test your answer in this cell
Show the solution
def mean(x):
    n = len(x)
    sum_mean = 0
    for x_i in x:
        sum_mean += x_i
    mean = sum_mean / n
    return mean

def var(x, mode="population"):
    n = len(x)
    mean_value = mean(x)
    sum_var = 0
    for x_i in x:
        sum_var += (x_i - mean_value)**2
    if mode == "population":
        variance = sum_var / n
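    elif mode == "sample":
        variance = sum_var / (n-1)
    else:
        # Any other value would otherwise leave variance undefined
        raise ValueError("mode must be 'population' or 'sample'")
    return variance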

x = [8, 18, 6, 0, 15, 17.5, 9, 1]
print(mean(x))
print(var(x))  # population
print(var(x, mode="sample"))  # sample

# Verification with numpy library functions
import numpy as np
print(np.mean(x))
print(np.var(x))  # population
print(np.var(x, ddof=1))  # sample
9.3125
42.93359375
49.066964285714285
9.3125
42.93359375
49.066964285714285
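
As an additional check that does not require numpy, Python’s standard statistics module provides both estimators: pvariance for the population formula and variance for the sample formula. The sketch below uses the same data and also verifies that the two results differ by the factor \(n/(n-1)\).

```python
import math
import statistics

x = [8, 18, 6, 0, 15, 17.5, 9, 1]

pop_var = statistics.pvariance(x)    # divides by n
sample_var = statistics.variance(x)  # divides by n - 1

print(pop_var)
print(sample_var)

# The two estimates differ exactly by the factor n / (n - 1)
n = len(x)
print(math.isclose(sample_var, pop_var * n / (n - 1)))
```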

Recursive Functions: Factorial

Recursive functions are functions that call themselves within their own body. Each call triggers a new one, producing a chain of nested calls that continues until a stopping criterion is reached. Without such a criterion, the calls would never end.

A good example of a recursive function is one that calculates the factorial of an integer. The factorial of a natural number \(n\) is the product of all positive integers less than or equal to \(n\), with the convention \(0! = 1\). For example: \(5! = 5 \times 4 \times 3 \times 2 \times 1 = 120\).

Code this function and verify that it works correctly.

# Test your answer in this cell
Show the solution
def factorial(n):
    if n == 0:
        # Stopping criterion
        return 1
    else:
        return n * factorial(n-1)

factorial(5)
120
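
For comparison, the same calculation can be written with a loop. Python limits the depth of nested calls (the default limit is typically around 1000), so the recursive version raises a RecursionError for large values of n, whereas a loop has no such limit. The name factorial_iter is hypothetical. The result is checked against math.factorial from the standard library.

```python
import math

def factorial_iter(n):
    # Hypothetical loop-based version: same result, no nested calls
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

print(factorial_iter(5))
print(factorial_iter(0))  # the loop does not run, so the result stays 1
print(factorial_iter(10) == math.factorial(10))
```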