= {'coffee': '500g', 'milk': '1.5L'}
inventory inventory
{'coffee': '500g', 'milk': '1.5L'}
In the previous tutorial, we worked with sequential data structures: lists and tuples. Now, we will discover dictionaries and sets, which are unordered data structures: objects are no longer stored by position (or index) but by key, which is a unique identifier.
Dictionaries are unordered collections of key-value pairs. A dictionary is defined using the following syntax: d = {'key1': 'value1', 'key2': 'value2'}
.
= {'coffee': '500g', 'milk': '1.5L'}
inventory inventory
{'coffee': '500g', 'milk': '1.5L'}
type(inventory)
dict
It is possible to have as many keys as desired in a dictionary. However, keys are unique to uniquely identify the associated value. If you try to define a dictionary with a duplicated key, Python does not return an error, but only the last duplicated key is taken into account.
= {'coffee': '500g', 'milk': '1.5L', 'coffee': '300g'}
inventory inventory
{'coffee': '300g', 'milk': '1.5L'}
What can a dictionary contain? Keys can be of different types, but strings or integers are generally used. The values of a dictionary can be any type of Python object.
Since dictionaries are unordered, there is no notion of position: you access a value by its associated key. For example, to retrieve the value ('1.5L'
) associated with the key 'milk'
:
'milk'] inventory[
'1.5L'
Additional key-value pairs can be added to an existing dictionary using variable assignment syntax.
"cereal"] = "250g"
inventory[ inventory
{'coffee': '300g', 'milk': '1.5L', 'cereal': '250g'}
Unlike lists, keys do not necessarily start at 0 and can be any number.
= {12: "Aveyron", 33: "Gironde"}
dic1
print("The department 33 is " + dic1[33]) # String concatenation!
The department 33 is Gironde
Similarly, values can be of different types, including data containers.
= {"scale": "C major",
dic2 "notes": ["do", "re", "mi", "fa", "sol", "la", "si"]}
"notes"] dic2[
['do', 're', 'mi', 'fa', 'sol', 'la', 'si']
Dictionaries can also contain other dictionaries. This makes them particularly suitable for representing hierarchical data structures.
= {
resume "marc": {"position": "manager", "experience": 7, "hobbies": ["sewing", "frisbee"]},
"mirande": {"position": "engineer", "experience": 5, "hobbies": ["trekking"]}
}
print(resume["marc"])
print(resume["marc"]["hobbies"][0])
{'position': 'manager', 'experience': 7, 'hobbies': ['sewing', 'frisbee']}
sewing
Let’s repeat: dictionaries have no notion of order. Therefore, there is no sense in querying the element at position 0
in a dictionary (unless the key 0
exists). Querying a non-existent key returns an error.
= {12: "Aveyron", 33: "Gironde"}
dic1
0] dic1[
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[50], line 3 1 dic1 = {12: "Aveyron", 33: "Gironde"} ----> 3 dic1[0] KeyError: 0
It is possible to modify a value associated with an existing key in the dictionary. The new value can be of a different type from the original.
= {'coffee': '500g', 'milk': '1.5L'}
inventory 'coffee'] = {'arabica': '250g', 'robusta': '400g'}
inventory[ inventory
{'coffee': {'arabica': '250g', 'robusta': '400g'}, 'milk': '1.5L'}
To delete a key (and the associated value), you can use the same operations as those used to delete elements from a list.
# Using the `del` operator
= {'coffee': '500g', 'milk': '1.5L'}
inventory del inventory['milk']
inventory
{'coffee': '500g'}
# Using the `pop` method
= {'coffee': '500g', 'milk': '1.5L'}
inventory 'milk')
inventory.pop( inventory
{'coffee': '500g'}
We saw earlier that querying a non-existent key returns an error. The .get()
method allows querying a key without being sure of its existence, as it does not return an error in this case, but the None
object, which we will see in a future tutorial.
= {'coffee': '500g', 'milk': '1.5L'}
inventory 'honey') inventory.get(
You can also specify a default value when the key does not exist.
'honey', 'not found') inventory.get(
'not found'
The .keys()
, .values()
, and .items()
methods return the keys, values, and key-value pairs of a dictionary, respectively. The objects returned by these methods are a bit complex, but they can be converted to lists using the list
function to query them by position.
= {
resume "marc": {"position": "manager", "experience": 7, "hobbies": ["sewing", "frisbee"]},
"mirande": {"position": "engineer", "experience": 5, "hobbies": ["triathlon"]}
}
list(resume.keys())
['marc', 'mirande']
list(resume.values())
[{'position': 'manager', 'experience': 7, 'hobbies': ['sewing', 'frisbee']},
{'position': 'engineer', 'experience': 5, 'hobbies': ['triathlon']}]
list(resume.items())
[('marc',
{'position': 'manager', 'experience': 7, 'hobbies': ['sewing', 'frisbee']}),
('mirande',
{'position': 'engineer', 'experience': 5, 'hobbies': ['triathlon']})]
Sets are unordered collections of unique elements. As such, they can be seen as dictionaries without values, where only the keys (which are unique by definition in a dictionary) are kept. Another analogy is that of mathematical sets, whose elements are also unordered and unique.
Due to their similarity to dictionaries, sets are also defined using curly braces {}
.
= {3, 2, 1}
x x
{1, 2, 3}
type(x)
set
Just like dictionaries, sets are unordered, so there is no notion of position. Asking for the element at position i
, as in a list, returns an error.
= {3, 2, 1}
x 0] x[
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[61], line 2 1 x = {3, 2, 1} ----> 2 x[0] TypeError: 'set' object is not subscriptable
It is possible to add an element to a set using the add
method.
= {3, 2, 1}
x "4")
x.add( x
{1, 2, 3, '4'}
Adding an existing element changes nothing by definition.
= {3, 2, 1}
x 2)
x.add( x
{1, 2, 3}
It is possible to remove an element from a set using the remove
method.
= {3, 2, 1}
x 2)
x.remove( x
{1, 3}
Sets are not very often used in practice, but they are quite useful in certain specific situations. Due to the uniqueness of the elements they contain, sets allow you to simply and effectively remove duplicates from a sequential container, such as a list.
Deduplication
Suppose you want to remove duplicates from a given list. By definition, converting a list to a set removes duplicates. However, you generally want to return to a list, as sets do not offer the same flexibility as lists (for example, the ability to find an element by position). It is therefore common to perform the list -> set -> list
chain of operations to deduplicate a list.
= [1, 2, 3, 3, 2, 1]
l = list(set(l))
l_dedup l_dedup
[1, 2, 3]
Set Operations
Since sets programmatically represent mathematical sets, it is not surprising that they allow for elementary set operations. For example, union and intersection.
= [5, 3, 2, 3, 3, 5, 8, 9]
l1 = [3, 7, 0, 0, 1, 9, 4, 6] l2
# Union: elements in l1, l2, or both
= list(set(l1) | set(l2))
l_union l_union
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Intersection: elements in both l1 and l2
= list(set(l1) & set(l2))
l_inter l_inter
[9, 3]
Membership Tests
Sets are also very useful for membership tests, as they offer much better performance than lists for this type of test.
The concept of testing will be covered in a future tutorial. For now, let’s note that a membership test such as “is element a
in list l
” is written in Python as a in l
and returns True
or False
depending on whether a
is actually present in list l
.
= [1, 2, 3]
l 2 in l
True
4 in l
False
Now, imagine performing this test on a list containing millions of elements. Exaggerating, the Python interpreter would then have to go through all the elements of the list one by one until it finds the element in question, which can take a long time.
In contrast, since the elements of a set are unique, Python can easily keep track of the list of unique elements contained in the set and thus conclude the test very quickly. We will see a performance comparison in an end-of-tutorial exercise.
NB: The implementation of the concepts mentioned above is called a “hash table”. Interested readers can find more information about this data structure here.
Can you access the i
th element of a dictionary? of a set?
What types of objects can be used as dictionary keys? As values?
For what types of data is it beneficial to use a dictionary?
Can a dictionary have duplicate keys?
Why can we say that a set is a special type of dictionary?
Why are sets used to deduplicate lists?
Why are sets more suitable than lists for membership tests?
No, dictionaries and sets are unordered collections of objects.
For values: any type of object. For keys, we generally restrict to strings and/or integers.
Hierarchical data.
No, keys are unique.
A set contains only unique elements and is written with curly braces. It can thus be seen as a special dictionary containing only keys.
By definition, the elements of a set are unique. Converting a list to a set removes duplicates.
Due to the uniqueness of elements, Python can keep track of the positions of different elements. Membership tests are therefore highly optimized compared to performing them with a list.
Given the dictionary defined in the cell below.
Display using print
operations:
= {
results "6emeA": {"Miranda" : {"grades": {"physics": 16, "history": 12}},
"Celestin": {"grades": {"physics": "absent", "history": 18}}
},"6emeB": {"Hypolyte": {"grades": {"math": 11, "english": 0}},
"Josephine": {"grades": {"math": 16, "english": 20}}
} }
# Test your answer in this cell
print(list(results.keys()))
print(results["6emeA"]["Miranda"]["grades"]["history"])
print(list(results["6emeB"]["Hypolyte"]["grades"].values()))
print(list(results["6emeB"].keys()))
print(list(results["6emeA"]["Miranda"]["grades"].keys()))
print(list(results["6emeA"]["Miranda"]["grades"].keys())
+ list(results["6emeB"]["Josephine"]["grades"].keys()))
print(list(results["6emeA"]["Miranda"]["grades"].values())
+ list(results["6emeB"]["Josephine"]["grades"].values()))
['6emeA', '6emeB']
12
[11, 0]
['Hypolyte', 'Josephine']
['physics', 'history']
['physics', 'history', 'math', 'english']
[16, 12, 16, 20]
In previous tutorials, we saw the len
function, which counts the number of elements in a sequence. Does this function work with dictionaries? What does it count?
# Test your answer in this cell
= {
resume "marc": {"position": "manager", "experience": 7, "hobbies": ["sewing", "frisbee"]},
"miranda": {"position": "engineer", "experience": 5, "hobbies": ["trekking"]}
}
print(len(resume))
print(len(resume["marc"]))
2
3
The len
function applied to a dictionary counts the number of keys.
We saw that we can delete a key from a dictionary in two different ways:
del
operator: del my_dict[key]
pop
method: my_dict.pop(key)
Beyond syntax, what are the two major differences between these two ways of deleting a key from a dictionary? Feel free to experiment with examples of your choice.
# Test your answer in this cell
= {'coffee': '500g', 'milk': '1.5L'}
inventory
print(inventory.pop('coffee'))
print(inventory.pop('orange', 'unavailable'))
500g
unavailable
1st difference: when deleting an existing key with the pop
method, the value associated with the key is returned. The del
operation returns nothing (actually, a None
object).
2nd difference: the pop
method allows specifying a default value in case the key does not exist, and thus does not return an error in this case. The del
operation necessarily returns an error when the key does not exist.
By exploiting the fact that the pop
method used to delete a key from a dictionary returns the value associated with that key, propose a method to rename a dictionary key in a single operation.
# Test your answer in this cell
= {'coffee': '500g', 'milk': '1.5L'}
inventory
'water'] = inventory.pop('milk')
inventory[
inventory
{'coffee': '500g', 'water': '1.5L'}
Given the following dictionary:
animals = {'cats': 5, 'dogs': 12}
What will the following membership tests return? Verify your predictions.
'cats' in animals.keys()
'cats' in animals.values()
'cats' in animals
# Test your answer in this cell
= {'cats': 5, 'dogs': 12}
animals
print(animals.keys())
print('cats' in animals.keys())
# True: 'cats' is indeed in the keys of `animals`
print()
print(animals.values())
print('cats' in animals.values())
# False: 'cats' is not a value in `animals`
print()
print(animals)
print('cats' in animals)
# True: this test is strictly equivalent to 'cats' in animals.keys()
dict_keys(['cats', 'dogs'])
True
dict_values([5, 12])
False
{'cats': 5, 'dogs': 12}
True
We saw that the del
operation returns an error when used to delete a non-existent key from a dictionary. Using your new knowledge of membership tests, can you imagine a method (without necessarily implementing it) that would handle this case without an error?
# Test your answer in this cell
= {'coffee': '500g', 'milk': '1.5L'}
inventory
= 'chocolate'
key
if key in inventory.keys():
del inventory[key]
We use a membership test: if the key exists in the dictionary’s keys, it is deleted. Otherwise, nothing happens. This syntax will become clearer with the next tutorial.
Given the following string with repetitions:
x = "cdabcdabcdabcdabcdabcdabcdabcdabcdab"
Construct a list of unique characters in this string, sorted alphabetically, i.e.:
l = ['a', 'b', 'c', 'd']
Hint: the procedure is similar to removing duplicates from a list.
# Test your answer in this cell
= "cdabcdabcdabcdabcdabcdabcdabcdabcdab"
x = list(set(x))
l
l.sort() l
['a', 'b', 'c', 'd']
Given the following two strings:
cyrano1 = 'C’est un roc ! … c’est un pic ! … c’est un cap !'
cyrano2 = 'Que dis-je, c’est un cap ? … C’est une péninsule !'
Question 1: find the characters that appear in both strings.
Question 2: find the characters that appear in at least one of the two texts.
# Test your answer in this cell
= 'C’est un roc ! … c’est un pic ! … c’est un cap !'
cyrano1 = 'Que dis-je, c’est un cap ? … C’est une péninsule !'
cyrano2
# Question 1
= list(set(cyrano1) & set(cyrano2))
inter print(inter)
# Question 2
= list(set(cyrano1) | set(cyrano2))
union print(union)
['p', 'a', 'c', '’', ' ', 'e', 'u', 't', 'i', 'C', 's', '!', '…', 'n']
['p', 'a', 'c', 'j', 'e', 't', ',', '-', ' ', 'C', 'u', '!', '…', 'n', 'r', 'Q', '’', 'd', 'o', 'i', 's', 'l', 'é', '?']
The code below generates a list with the letters a, b, c, and d repeated 1 million times. Then, it performs a membership test for a letter that does not exist in the list and calculates the time taken by the Python interpreter to perform the test.
Using this syntax, compare the time taken for the same membership test when the list is converted to a set beforehand.
= ['a', 'b', 'c', 'd'] * 1000000
x print(x[:10])
%time 'e' in x
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b']
CPU times: user 39.7 ms, sys: 32 μs, total: 39.7 ms
Wall time: 39.7 ms
False
# Test your answer in this cell
= ['a', 'b', 'c', 'd'] * 1000000
x print(x[:10])
= set(x)
x_set print(x_set)
%time 'e' in x
%time 'e' in x_set
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b']
{'a', 'd', 'b', 'c'}
CPU times: user 39.9 ms, sys: 53 μs, total: 40 ms
Wall time: 40 ms
CPU times: user 3 μs, sys: 0 ns, total: 3 μs
Wall time: 7.39 μs
False
The initial membership test is measured in milliseconds. The one performed on the set is measured in microseconds. The test is much faster when converted to a set beforehand.