Scientific Programming Lab¶
Data Science Master @University of Trento - AA 2019/20
Download: PDF EPUB HTML
Teaching assistant: David Leoni david.leoni@unitn.it website: davidleoni.it
This work is licensed under a Creative Commons Attribution 4.0 License CC-BY
News¶
25 August 2020 - Published 2020-08-24 exam results
27 July 2020 - Published 2020-07-17 exam results
17 June 2020 - Published 2020-06-16 exam results
4 March 2020 - Published 2020-02-10 exam results
31 January 2020 - Published 2020-01-23 exam results
7 January 2020 Extra tutoring:
(Beware: rooms are not always the same)
Tue 14 January 10.00 - 12.00 A216
Wed 15 January 10.00 - 12.00 A216
Thu 16 January 10.00 - 12.00 A214
Fri 17 January 10.00 - 12.00 A221
Tue 21 January 10.00 - 12.00 A216
23 December 2019 - Published Midterm B grades:
07 December 2019 - Set midterm Part B date:
Friday 20th December, lab A202, from 11.45 to 13.45
Admission: students who got grade >= 16 at the first midterm
06 December 2019: Published midterm results:
28 November 2019: Set exams dates:
23 January 8:30-13:30 A201
10 February 8:30-13:30 A202
7 November 2019: published Midterm Part A solution
Slides¶
See Slides page
Labs timetable¶
For the regular labs timetable please see:
Part A: Andrea Passerini’s course site
Part B: Luca Bianco’s course site
Tutoring¶
A tutoring service for Scientific Programming - Data science labs has been set up and will be held by Gabriele Masina - email: gabriele.masina (guess what) studenti.unitn.it
Please take advantage of it as much as possible so you don’t end up writing random code at the exam!
Mondays: room A215 from 11.30-13.30 (note: it will be until 13:30 and not 14:30 as previously said in class)
Wednesdays: 9:00-11:00, rooms: A219 until Wednesday 13 November included, A218 afterwards
Complete tutoring schedule:
November 2019:
4 monday 11.30-13.30 A218
6 wednesday 9.00-11:00
11 monday 11.30-13.30 A218
13 wednesday 9.00-11:00
18 monday 11.30-13.30 A218
20 wednesday 9.00-11:00 A218
25 monday 11.30-13.30 A218
27 wednesday 9.00-11:00 A218
December 2019:
2 monday 11.30-13.30 A218
4 wednesday 9.00-11:00 A218
9 monday 11.30-13.30 A218
11 wednesday 9.00-11:00 A218
16 monday 11.30-13.30 A218
18 wednesday 9.00-11:00 A218
January 2020:
(Beware: rooms are not always the same)
Tue 14 January 10.00 - 12.00 A216
Wed 15 January 10.00 - 12.00 A216
Thu 16 January 10.00 - 12.00 A214
Fri 17 January 10.00 - 12.00 A221
Tue 21 January 10.00 - 12.00 A216
Exams¶
Schedule¶
Taking part in an exam erases any grade you had before (except for Midterm B, which of course doesn’t erase Midterm A taken in the same academic year)
Exams dates:
23 January 8:30-11:30 A201
10 February 8:30-11:30 A202
Exam modalities¶
Sciprog exams are open book. You can bring a printed version of the material listed below.
Exam will take place in the lab with no internet access. You will only be able to access this documentation:
Andrea Passerini slides and Luca Bianco slides
Python 3 documentation, in particular the unittest docs
Part A: Think Python book
Part B: Problem Solving with Algorithms and Data Structures using Python book
Exams how to¶
Practice with the lab computers!!
The exam will be in a Linux Ubuntu environment, so learn how to browse folders there and also how to type on the noisy lab keyboards :-)
If you need to look up some Python function, please start today learning how to search documentation on Python website.
Make sure all exercises at least compile!
Don’t leave duplicated code around!
If I see duplicated code, I don’t know what to grade, I waste time, and you don’t want me angry while grading.
Only implementations of provided function signatures will be evaluated !!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...
# Called by f, will be graded:
def my_f(y, z):
    # bla
    ...
def f(x):
    return my_f(x, 5)
How to edit and run¶
Look in Applications->Programming:
Part A: Jupyter: open Terminal and type jupyter notebook
Part B: open Visual Studio Code
If for whatever reason tests don’t work in Visual Studio Code, be prepared to run them in the Terminal.
PAY close attention to function comments!
DON’T modify function signatures! Just provide the implementation
DON’T change existing test methods. If you want, you can add tests
DON’T create other files. If you still do it, they won’t be evaluated
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies
Even if print statements are allowed, be careful with prints that might break your function! For example, avoid stuff like this:
x = 0
print(1/x)
Expectations¶
This is a data science master, so you must learn to be a proficient programmer - no matter the background you have.
Exercises proposed during labs are an example of what you will get during the exam, BUT there is no way you can learn the required level of programming only doing exercises on this website. Fortunately, since Python is so trendy nowadays there are a zillion good resources to hone your skills - you can find some in Resources
To successfully pass the exam, you should be able to quickly solve exercises proposed during labs with difficulty ranging from ✪ to ✪✪✪ stars. By quickly I mean that you should be able to solve a three-star exercise ✪✪✪ in half an hour. Typically, an exercise will be divided into two parts, the first easy ✪✪ to introduce you to the concept and the second more difficult ✪✪✪ to see if you really grasped the idea.
Before getting scared, keep in mind I’m most interested in your capability to understand the problem and find your way to the solution. In real life, senior colleagues often give junior programmers functions to implement based on specifications, possibly along with tests to make sure what they implement meets the specifications. Also, programmers copy code all the time. This is why during the exam I give you tests for the functions to implement, so you can quickly spot errors, and also let you use the course material (see exam modalities).
Part A expectations: performance does not matter: if you are able to run the required algorithm on your computer and the tests pass, it should be fine. Just be careful when given a 100 MB file: in that case bad code may sometimes lead to very slow execution and/or clog the memory.
In particular, on lab computers the whole system can even hang, so watch out for errors such as:
infinite while loops which keep adding new elements to lists: whenever possible, prefer for loops
scanning a big pandas dataframe using a for in loop instead of pandas native transformations
Part B expectations: performance does matter (e.g. finding the diagonal of a matrix should take time linearly proportional to \(n\), not \(n^2\)). Also, in this part we will deal with more complex data structures. Here we generally follow the Do It Yourself method, reimplementing things from scratch. So please, use your brain:
if the exercise is about sorting, do not call Python’s .sort() method !!!
if the exercise is about data structures, and you are thinking about converting the whole data structure (or part of it) into Python lists, first think about the computational cost of such a conversion, and second, do ask the instructor for permission.
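To make the \(n\) versus \(n^2\) remark concrete, here is a minimal sketch that assumes the matrix is a square list of lists. The names diagonal and diagonal_slow are hypothetical and do not come from the course material. The first function reads only the \(n\) cells on the diagonal. The second scans all \(n^2\) cells to find the same ones.

```python
def diagonal(mat):
    """Return the main diagonal of a square matrix given as a list of lists.

    Only the n cells mat[i][i] are read, so this runs in O(n) time.
    """
    return [mat[i][i] for i in range(len(mat))]


def diagonal_slow(mat):
    """Same result, but visits all n*n cells: O(n^2) time, avoid this."""
    res = []
    for i in range(len(mat)):
        for j in range(len(mat)):
            if i == j:          # only n of the n*n cells are kept
                res.append(mat[i][j])
    return res


m = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(diagonal(m))  # [1, 5, 9]
```

Both functions return the same list. On a large matrix, the nested loop in diagonal_slow does \(n\) times more work for no benefit.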
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. array of fixed size 2), commenting why you did so.
When all tests pass you should hopefully get full grade (although tests are never exhaustive!), but if the code is not correct you will still get a percentage. The percentage of course is subjective, and may depend on unfathomable factors such as the quantity of jam I found in the morning croissant that particular day. Jokes aside, the amount you get is usually inversely proportional to the amount of time I have to spend fixing your algorithm.
After exams I publish the code with corrections. If all tests pass and you still don’t get 100% grade, you may come to my office to question the grade. If tests don’t pass I’m less available for debating: I don’t much like complaints like ‘my colleague made the same error as me and got more points’, and even worse is complaining without having read the corrections.
Exams FAQ¶
As part of the exam, there are some questions you need to know. Luckily, answers are pretty easy.
I did well in part A/B, can I do only part B/A in the next exam?
No way.
Can I have an additional retake just for me?
No way.
Can I have an additional oral exam to increase the grade?
No way.
I have 7 + \(\sqrt{3}\) INF credits from a Summer School in Applied Calculonics, can I please take only Part B?
I’m not into credits engineering, please ask the administrative office and/or Passerini.
I have another request which does not concern corrections / possibly wrong grading
Ask Passerini, I’m not the boss.
I’ve got 26.99 but this is my last exam and I really need 27 so I can get a good master final outcome, could you please raise my grade by just that little 0.01?
Preposterous requests like this will be forwarded to our T-800 assistant, it’s very efficient.
Past exams¶
See Past exams page
Resources¶
Google colabs: Scratchpads to show python code. During the lesson you can also write on them to share code.
Source code of these worksheets (download zip), in Jupyter Notebook format.
Part A Resources¶
Part A Theory slides by Andrea Passerini
Allen Downey, Think Python
Talks a lot, step by step, good for beginners
License: Creative Commons CC BY Non Commercial 3.0, as reported in the original page
Tutorials from Nicola Cassetta
Tutorial step by step, in Italian, good for beginners. They are well done and with solutions - please try them all.
Dive into Python 3
More practical, contains more focused tutorials (i.e. manage XML files)
Licence: Creative Commons By Share-alike 3.0, as reported at the bottom of the book’s website
LeetCode
Website with collections of exercises sorted by difficulty and acceptance rate. You can generally try sorting by Acceptance and Easy filters.
For a selection of exercises from LeetCode, see the Further resources section at the end of the List chapter
HackerRank
Contains many Python 3 exercises on algorithms and data structures (requires login)
Geeks for Geeks
Contains many exercises - doesn’t have solutions nor explicit asserts but if you login and submit solutions, the system will run some tests serverside and give you a response.
In general for Part A you can filter difficulty by school+basic+easy and if you need to do part B also include medium.
Example: Filter difficulty by school+basic+easy and topic String
You can select many more topics if you click more>> under Topic Tags:
Material from other courses of mine (in Italian)
SoftPython - pulizia e analisi dati con Python: contains many practical examples of reading files and performing analysis on them.
Part B Resources¶
Part B theory slides by Luca Bianco
Problem Solving with Algorithms and Data Structures using Python online book by Brad Miller and David Ranum
Theory exercises (complexity, tree visits, graph visits) - by Alberto Montresor
LeetCode (sort by easy difficulty)
Editors¶
Visual Studio Code: the course official editor.
Spyder: Seems like a fine and simple editor
Jupyter Notebook: Nice environment to execute Python commands and display results like graphs. Allows you to include documentation in Markdown format
JupyterLab : next and much better version of Jupyter, although as of Sept 2018 is still in beta
PythonTutor, a visual virtual machine (very useful! can also be found in examples inside the book!)
Further readings¶
Rule based design by Lex Wedemeijer, Stef Joosten, Jaap van der woude: a very readable text on how to represent information using only binary relations with boolean matrices (not a mandatory read, it only gives context and practical applications for some of the material on graphs presented during the course)
Acknowledgements¶
I wish to thank Dr. Luca Bianco for the introductory material on Visual Studio Code and Python
This site was made with Jupyter using NBSphinx extension and Jupman template
Past Exams¶
Data science¶
NOTE: 19-20 exams are very similar to 18-19, the only difference being that you might also get an exercise on Pandas.
Midterm Simulation - Tue 13, November 2018 - solutions¶
Scientific Programming - Data Science Master @ University of Trento
Introduction¶
This simulation gives you NO credit whatsoever, it’s just an example. If you do everything wrong, you lose nothing. If you do everything correct, you gain nothing.
Allowed material¶
There won’t be any internet access. You will only be able to access:
DS Sciprog Lab worksheets
Alberto Montresor slides
Python 3 documentation (in particular, see unittest)
The course book “Problem Solving with Algorithms and Data Structures using Python”
Grading FACSIMILE - IN THIS SIMULATION TIME YOU GET NO GRADE !!!!¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. array of fixed size 2), commenting why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like:
if x > 5:
    return True
else:
    return False
when you could write just:
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...
# Called by f, will be graded:
def my_f(y, z):
    # bla
    ...
def f(x):
    return my_f(x, 5)
How to edit and run¶
To edit the files, you can use Jupyter (start it from Terminal with jupyter notebook). If it doesn’t work, use an editor of your choice. You can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler), or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2018-11-13-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2018-11-13-FIRSTNAME-LASTNAME-ID
    |-jupman.py
    |-sciprog.py
    |-other stuff ...
    |-exams
        |-2018-11-13
            |- A1_exercise.ipynb
            |- A2_exercise.ipynb
            |- B1_exercise.py
            |- B1_test.py
            |- B2_exercise.py
            |- B2_test.py
Rename the datasciprolab-2018-11-13-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2018-11-13-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
1. matrices¶
1.1 fill¶
Difficulty: ✪✪
def fill(lst1, lst2):
    """ Takes a list lst1 of n elements and a list lst2 of m elements, and MODIFIES lst2
    by copying all lst1 elements into the first n positions of lst2
    If n > m, raises a ValueError
    """
    #jupman-raise
    if len(lst1) > len(lst2):
        raise ValueError("List 1 is bigger than list 2 ! len(lst1) = %s, len(lst2) = %s" % (len(lst1), len(lst2)))
    j = 0
    for x in lst1:
        lst2[j] = x
        j += 1
    #/jupman-raise
try:
    fill(['a','b'], [None])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
try:
    fill(['a','b','c'], [None,None])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
L1 = []
R1 = []
fill(L1, R1)
assert L1 == []
assert R1 == []
L = []
R = ['x']
fill(L, R)
assert L == []
assert R == ['x']
L = ['a']
R = ['x']
fill(L, R)
assert L == ['a']
assert R == ['a']
L = ['a']
R = ['x','y']
fill(L, R)
assert L == ['a']
assert R == ['a','y']
L = ['a','b']
R = ['x','y']
fill(L, R)
assert L == ['a','b']
assert R == ['a','b']
L = ['a','b']
R = ['x','y','z',]
fill(L, R)
assert L == ['a','b']
assert R == ['a','b','z']
L = ['a']
R = ['x','y','z',]
fill(L, R)
assert L == ['a']
assert R == ['a','y','z']
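Python’s slice assignment gives a more compact way to meet the same specification. The sketch below uses fill_slice, a hypothetical name chosen so it does not shadow fill. It assigns to lst2[:len(lst1)] a list of the same length, which replaces those elements in place. As a result, lst2 keeps its length and is modified as the docstring requires.

```python
def fill_slice(lst1, lst2):
    """Hypothetical alternative to fill: copies lst1 into the first
    len(lst1) positions of lst2, MODIFYING lst2 in place."""
    if len(lst1) > len(lst2):
        raise ValueError("len(lst1) = %s > len(lst2) = %s" % (len(lst1), len(lst2)))
    # The slice and lst1 have the same length, so elements are replaced
    # one for one: lst2 keeps its length and is the same list object
    lst2[:len(lst1)] = lst1


L = ['a', 'b']
R = ['x', 'y', 'z']
fill_slice(L, R)
print(R)  # ['a', 'b', 'z']
```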
1.2 lab¶
✪✪✪ If you’re a teacher who often sees new students, you have this problem: if two students who are friends sit side by side, they can start chatting way too much. To keep them quiet, you want to somehow randomize student placement by following this algorithm:
first sort the students alphabetically
then the sorted students progressively take the available chairs one by one, first filling the first row, then the second, and so on until the end.
Now implement the algorithm:
def lab(students, chairs):
    """
    INPUT:
    - students: a list of strings of length <= n*m
    - chairs: an nxm matrix as list of lists filled with None values (empty chairs)
    OUTPUT: MODIFIES BOTH students and chairs inputs, without returning anything
    If there are more students than available chairs, raises ValueError
    Example:
    ss = ['b', 'd', 'e', 'g', 'c', 'a', 'h', 'f' ]
    mat = [
        [None, None, None],
        [None, None, None],
        [None, None, None],
        [None, None, None]
    ]
    lab(ss, mat)
    # after execution, mat should be changed to this:
    assert mat == [
        ['a', 'b', 'c'],
        ['d', 'e', 'f'],
        ['g', 'h', None],
        [None, None, None],
    ]
    # after execution, input ss should now be ordered:
    assert ss == ['a','b','c','d','e','f','g','h']
    For more examples, see tests
    """
    #jupman-raise
    n = len(chairs)
    m = len(chairs[0])
    if len(students) > n*m:
        raise ValueError("There are more students than chairs ! Students = %s, chairs = %sx%s" % (len(students), n, m))
    i = 0
    j = 0
    students.sort()
    for s in students:
        chairs[i][j] = s
        if j == m - 1:
            j = 0
            i += 1
        else:
            j += 1
    #/jupman-raise
try:
    lab(['a','b'], [[None]])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
try:
    lab(['a','b','c'], [[None,None]])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
m0 = [
[None]
]
r0 = lab([],m0)
assert m0 == [
[None]
]
assert r0 == None # function is not meant to return anything (so returns None by default)
m1 = [
[None]
]
r1 = lab(['a'], m1)
assert m1 == [
['a']
]
assert r1 == None # function is not meant to return anything (so returns None by default)
m2 = [
[None, None]
]
lab(['a'], m2) # 1 student 2 chairs in one row
assert m2 == [
['a', None]
]
m3 = [
[None],
[None],
]
lab(['a'], m3) # 1 student 2 chairs in one column
assert m3 == [
['a'],
[None]
]
ss4 = ['b', 'a']
m4 = [
[None, None]
]
lab(ss4, m4) # 2 students 2 chairs in one row
assert m4 == [
['a','b']
]
assert ss4 == ['a', 'b'] # also modified input list as required by function text
m5 = [
[None, None],
[None, None]
]
lab(['b', 'c', 'a'], m5) # 3 students 2x2 chairs
assert m5 == [
['a','b'],
['c', None]
]
m6 = [
[None, None],
[None, None]
]
lab(['b', 'd', 'c', 'a'], m6) # 4 students 2x2 chairs
assert m6 == [
['a','b'],
['c','d']
]
m7 = [
[None, None, None],
[None, None, None]
]
lab(['b', 'd', 'e', 'c', 'a'], m7) # 5 students 2x3 chairs
assert m7 == [
['a','b','c'],
['d','e',None]
]
ss8 = ['b', 'd', 'e', 'g', 'c', 'a', 'h', 'f' ]
m8 = [
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None]
]
lab(ss8, m8) # 8 students 4x3 chairs
assert m8 == [
['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', None],
[None, None, None],
]
assert ss8 == ['a','b','c','d','e','f','g','h']
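The manual i/j bookkeeping in lab can also be written with divmod. The k-th sorted student sits at row k // m and column k % m, where m is the length of a row. This is a minimal sketch: lab_divmod is a hypothetical name, and the code assumes every row has the same length m.

```python
def lab_divmod(students, chairs):
    """Hypothetical variant of lab: the k-th student (after sorting) goes to
    row k // m and column k % m, where m is the row length."""
    n = len(chairs)
    m = len(chairs[0])
    if len(students) > n * m:
        raise ValueError("There are more students than chairs!")
    students.sort()  # sorts in place, as lab requires
    for k, s in enumerate(students):
        i, j = divmod(k, m)  # i = k // m (row), j = k % m (column)
        chairs[i][j] = s


ss = ['b', 'd', 'e', 'c', 'a']
mat = [[None, None, None],
       [None, None, None]]
lab_divmod(ss, mat)
print(mat)  # [['a', 'b', 'c'], ['d', 'e', None]]
```

This version never has to reset j or increment i by hand, which removes a common source of off-by-one bugs.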
2. phones¶
A radio station used to gather calls by recording just the name of the caller and the phone number as seen on the phone display. For marketing purposes, the station owner now wants to better understand the places from where listeners were calling. He then hires you as Algorithmic Market Strategist and asks you to show statistics about the provinces of the calling sites. There is a problem, though: numbers were written down by hand and sometimes they are not uniform, so it would be better to find a canonical representation.
NOTE: Phone prefixes can be a very tricky subject, if you are ever to deal with them seriously please use proper phone number parsing libraries and do read Falsehoods Programmers Believe About Phone Numbers
2.1 canonical¶
✪ We first want to canonicalize a phone number as a string.
For us, a canonical phone number:
contains no spaces
contains no international prefix, so no +39 nor 0039: we assume all calls were placed from Italy (even if they have an international prefix)
For example, all of these are canonicalized to “0461123456”:
+39 0461 123456
+390461123456
0039 0461 123456
00390461123456
These are canonicalized as the following:
328 123 4567 -> 3281234567
0039 328 123 4567 -> 3281234567
0039 3771 1234567 -> 37711234567
REMEMBER: strings are immutable !!!!!
def canonical(phone):
    """ RETURN the canonical version of phone as a string. See above for an explanation.
    """
    #jupman-raise
    p = phone.replace(' ', '')
    if p.startswith('0039'):
        p = p[4:]
    if p.startswith('+39'):
        p = p[3:]
    return p
    #/jupman-raise
assert canonical('+39 0461 123456') == '0461123456'
assert canonical('+390461123456') == '0461123456'
assert canonical('0039 0461 123456') == '0461123456'
assert canonical('00390461123456') == '0461123456'
assert canonical('003902123456') == '02123456'
assert canonical('003902120039') == '02120039'
assert canonical('0039021239') == '021239'
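The same normalization can be sketched with a regular expression anchored at the start of the string. This is an illustrative variant, and canonical_re is a hypothetical name rather than the reference solution. One difference: canonical_re strips at most one leading +39 or 0039, whereas canonical above may strip 0039 and then +39. On the tests above the two give the same results.

```python
import re


def canonical_re(phone):
    """Hypothetical regex-based variant of canonical: drop all spaces,
    then strip one leading +39 or 0039 international prefix."""
    p = phone.replace(' ', '')
    # ^ anchors the match at the start, so a '0039' later in the number is kept;
    # the '+' must be escaped because it is a regex quantifier
    return re.sub(r'^(\+39|0039)', '', p)


print(canonical_re('+39 0461 123456'))  # 0461123456
print(canonical_re('003902120039'))     # 02120039
```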
2.2 prefix¶
✪✪ We now want to extract the province prefix: the ones we consider valid are in the province_prefixes list.
Note that some numbers are from mobile operators, and you can distinguish them by prefixes like 328: the ones we consider are in the mobile_prefixes list.
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
def prefix(phone):
    """ RETURN the prefix of the phone as a string. Remember first to make it canonical !!
    If phone is mobile, RETURN the string 'mobile'. If it is neither a province phone nor a mobile, RETURN
    the string 'unrecognized'
    To determine if the phone is mobile or from a province, use the above province_prefixes and mobile_prefixes lists.
    DO USE THE ALREADY DEFINED FUNCTION canonical(phone)
    """
    #jupman-raise
    c = canonical(phone)
    for m in mobile_prefixes:
        if c.startswith(m):
            return 'mobile'
    for p in province_prefixes:
        if c.startswith(p):
            return p
    return 'unrecognized'
    #/jupman-raise
assert prefix('0461123') == '0461'
assert prefix('+39 0461 4321') == '0461'
assert prefix('0039011 432434') == '011'
assert prefix('328 432434') == 'mobile'
assert prefix('+39340 432434') == 'mobile'
assert prefix('00666011 432434') == 'unrecognized'
assert prefix('12345') == 'unrecognized'
assert prefix('+39 123 12345') == 'unrecognized'
2.3 hist¶
Difficulty: ✪✪✪
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
def hist(phones):
    """ Given a list of non-canonical phones, RETURN a dictionary where the keys are the prefixes of the canonical phones
    and the values are the frequencies of the prefixes (keys may also be 'unrecognized' or 'mobile')
    NOTE: Numbers corresponding to the same phone (so which have the same canonical representation)
    must be counted ONLY ONCE!
    DO USE THE ALREADY DEFINED FUNCTIONS canonical(phone) AND prefix(phone)
    """
    #jupman-raise
    d = {}
    s = set()
    for phone in phones:
        c = canonical(phone)
        if c not in s:
            s.add(c)
            p = prefix(phone)
            if p in d:
                d[p] += 1
            else:
                d[p] = 1
    return d
    #/jupman-raise
assert hist(['0461123']) == {'0461':1}
assert hist(['123']) == {'unrecognized':1}
assert hist(['328 123']) == {'mobile':1}
assert hist(['0461123','+390461123']) == {'0461':1} # same canonicals, should be counted only once
assert hist(['0461123', '+39 0461 4321']) == {'0461':2}
assert hist(['0461123', '+39 0461 4321', '0039011 432434']) == {'0461':2, '011':1}
assert hist(['+39 02 423', '0461123', '02 426', '+39 0461 4321', '0039328 1234567', '02 423', '02 424']) == {'0461':2, 'mobile':1, '02':3}
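hist only needs to count each distinct canonical number once. One possible alternative deduplicates with a dict keyed by canonical number and then tallies prefixes with collections.Counter. This is a minimal sketch: hist_counter is a hypothetical name, and canonical and prefix are repeated from the solutions above so the block runs on its own.

```python
from collections import Counter

province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']


def canonical(phone):
    # same logic as the canonical solution above
    p = phone.replace(' ', '')
    if p.startswith('0039'):
        p = p[4:]
    if p.startswith('+39'):
        p = p[3:]
    return p


def prefix(phone):
    # same logic as the prefix solution above
    c = canonical(phone)
    for m in mobile_prefixes:
        if c.startswith(m):
            return 'mobile'
    for p in province_prefixes:
        if c.startswith(p):
            return p
    return 'unrecognized'


def hist_counter(phones):
    """Hypothetical variant of hist: keep one phone per canonical number,
    then let Counter tally the prefixes."""
    by_canonical = {canonical(p): p for p in phones}  # one entry per distinct number
    return dict(Counter(prefix(p) for p in by_canonical.values()))


phones = ['+39 02 423', '0461123', '02 426', '+39 0461 4321',
          '0039328 1234567', '02 423', '02 424']
print(hist_counter(phones))  # {'02': 3, '0461': 2, 'mobile': 1}
```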
2.4 display calls by prefixes¶
✪✪ Using matplotlib, display a bar plot of the frequency of calls by prefixes (including mobile and unrecognized), sorting them in reverse order so you first see the province with the highest number of calls. Also, save the plot on disk with plt.savefig('prefixes-count.png') (call it before plt.show())
If you’re in trouble you can find plenty of examples in the visualization chapter
You should obtain something like this:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
phones = ['+39 02 423', '0461123', '02 426', '+39 0461 4321', '0039328 1234567', '02 423', '02 424']
# write here
# SOLUTION
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
province_prefixes = ['0461', '02', '011']
province_names = ['Trento', 'Milano', 'Torino']
mobile_prefixes = ['330', '340', '328', '390', '3771']
phones = ['+39 02 423', '0461123', '02 426', '+39 0461 4321', '0039328 1234567', '02 423', '02 424']
coords = list(hist(phones).items())
coords.sort(key=lambda x:x[1], reverse=True)
xs = np.arange(len(coords))
ys = [c[1] for c in coords]
plt.bar(xs, ys, 0.5, align='center')
plt.title("province calls by prefixes sorted solution")
plt.xticks(xs, [c[0] for c in coords])
plt.xlabel('prefixes')
plt.ylabel('calls')
plt.savefig('prefixes-count-solution.png')
plt.show()

Midterm - Fri 16 November 2018 - solutions¶
Scientific Programming - Data Science Master @ University of Trento
Introduction¶
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. array of fixed size 2), commenting why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like:
if x > 5:
    return True
else:
    return False
when you could write just:
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...
# Called by f, will be graded:
def my_f(y, z):
    # bla
    ...
def f(x):
    return my_f(x, 5)
How to edit and run¶
To edit the files, you can use Jupyter (start it from Terminal with jupyter notebook). If it doesn’t work, use an editor of your choice. You can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler), or PyCharm (more complex).
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2018-11-16-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2018-11-16-FIRSTNAME-LASTNAME-ID
    |-jupman.py
    |-sciprog.py
    |-other stuff ...
    |-exams
        |-2018-11-16
            |- exam-2018-11-16-exercise.ipynb
Rename the datasciprolab-2018-11-16-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2018-11-16-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise.
When done:
If you have a unitn login: zip your folder and send it to examina.icts.unitn.it
If you don’t have a unitn login: tell the instructors and we will download your work manually
A1 union¶
✪✪ When we talk about the union of two graphs, we mean the graph whose vertices are the union of the vertices of both graphs and whose edges are the union of the edges of both graphs. In this exercise, we have two graphs represented as lists of lists with boolean edges. To simplify, we suppose they have the same vertices but possibly different edges, and we want to calculate the union as a new graph.
For example, if we have a graph ma like this:
ma = [
[True, False, False],
[False, True, False],
[True, False, False]
]
draw_mat(ma)

And another mb like this:
mb = [
[True, True, False],
[False, False, True],
[False, True, False]
]
draw_mat(mb)

The result of calling union(ma, mb) will be the following:
res = [[True, True, False], [False, True, True], [True, True, False]]
which will be displayed as
draw_mat(res)

So we get the same vertices, and the edges from both ma and mb.
def union(mata, matb):
    """ Takes two graphs represented as nxn matrices of lists of lists with boolean edges,
    and RETURN a NEW matrix which is the union of both graphs
    If mata has a different number of rows than matb, raises ValueError
    """
    #jupman-raise
    if len(mata) != len(matb):
        raise ValueError("mata and matb have different row number a:%s b:%s!" % (len(mata), len(matb)))
    n = len(mata)
    ret = []
    for i in range(n):
        row = []
        ret.append(row)
        for j in range(n):
            row.append(mata[i][j] or matb[i][j])
    return ret
    #/jupman-raise
try:
    union([[False],[False]], [[False]])
    raise Exception("Shouldn't arrive here !")
except ValueError:
    "test passed"
try:
    union([[False]], [[False],[False]])
    raise Exception("Shouldn't arrive here !")
except ValueError:
    "test passed"
ma1 = [
[False]
]
mb1 = [
[False]
]
assert union(ma1, mb1) == [
[False]
]
ma2 = [
[False]
]
mb2 = [
[True]
]
assert union(ma2, mb2) == [
[True]
]
ma3 = [
[True]
]
mb3 = [
[False]
]
assert union(ma3, mb3) == [
[True]
]
ma4 = [
[True]
]
mb4 = [
[True]
]
assert union(ma4, mb4) == [
[True]
]
ma5 = [
[False, False, False],
[False, False, False],
[False, False, False]
]
mb5 = [
[True, False, True],
[False, True, True],
[False, False, False]
]
assert union(ma5, mb5) == [
[True, False, True],
[False, True, True],
[False, False, False]
]
ma6 = [
[True, False, True],
[False, True, True],
[False, False, False]
]
mb6 = [
[False, False, False],
[False, False, False],
[False, False, False]
]
assert union(ma6, mb6) == [
[True, False, True],
[False, True, True],
[False, False, False]
]
ma7 = [
[True, False, False],
[False, True, False],
[True, False, False]
]
mb7 = [
[True, True, False],
[False, False, True],
[False, True, False]
]
assert union(ma7, mb7) == [
[True, True, False],
[False, True, True],
[True, True, False]
]
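The same union can also be written more compactly by pairing rows, and then cells, with zip. This is only a sketch of an alternative (the name union_zip is hypothetical, not part of the exercise); like the version above, it assumes square matrices of the same size and raises ValueError when the row counts differ.

```python
def union_zip(mata, matb):
    """Alternative sketch of union(): RETURN a NEW matrix, the cell-wise OR of mata and matb."""
    if len(mata) != len(matb):
        raise ValueError("mata and matb have different row number a:%s b:%s!" % (len(mata), len(matb)))
    # zip pairs corresponding rows, then corresponding cells inside each pair of rows
    return [[a or b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(mata, matb)]

print(union_zip([[True, False, False], [False, True, False], [True, False, False]],
                [[True, True, False], [False, False, True], [False, True, False]]))
# [[True, True, False], [False, True, True], [True, True, False]]
```

Since the result is built with fresh lists, the input matrices are never modified.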
A2 surjective¶
✪✪ If we consider a graph as an nxn binary relation where the domain is the same as the codomain, such a relation is called surjective if every node is reached by at least one edge.
For example, G1 here is surjective, because there is at least one edge reaching into each node (self-loops, as on node 0, also count as incoming edges):
[7]:
G1 = [
[True, True, False, False],
[False, False, False, True],
[False, True, True, False],
[False, True, True, True],
]
[8]:
draw_mat(G1)

G2 down here instead does not represent a surjective relation, as there is at least one node (2 in our case) which does not have any incoming edge:
[9]:
G2 = [
[True, True, False, False],
[False, False, False, True],
[False, True, False, False],
[False, True, False, False],
]
[10]:
draw_mat(G2)

[11]:
def surjective(mat):
    """ RETURN True if provided graph mat as list of boolean lists is an
        nxn surjective binary relation, otherwise return False
    """
    #jupman-raise
    n = len(mat)
    c = 0  # number of columns (nodes) found to have at least one incoming edge
    for j in range(n):  # go column by column
        for i in range(n):  # go row by row
            if mat[i][j]:
                c += 1
                break  # as you find first incoming edge, increment c and stop search for that column
    return c == n
    #/jupman-raise
m1 = [
[False]
]
assert surjective(m1) == False
m2 = [
[True]
]
assert surjective(m2) == True
m3 = [
[True, False],
[False, False],
]
assert surjective(m3) == False
m4 = [
[False, True],
[False, False],
]
assert surjective(m4) == False
m5 = [
[False, False],
[True, False],
]
assert surjective(m5) == False
m6 = [
[False, False],
[False, True],
]
assert surjective(m6) == False
m7 = [
[True, False],
[True, False],
]
assert surjective(m7) == False
m8 = [
[True, False],
[False, True],
]
assert surjective(m8) == True
m9 = [
[True, True],
[False, True],
]
assert surjective(m9) == True
m10 = [
[True, True, False, False],
[False, False, False, True],
[False, True, False, False],
[False, True, False, False],
]
assert surjective(m10) == False
m11 = [
[True, True, False, False],
[False, False, False, True],
[False, True, True, False],
[False, True, True, True],
]
assert surjective(m11) == True
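The column-by-column search with an early break can also be expressed with the built-ins all and any, which short-circuit in the same way. This is a sketch of an alternative formulation; the name surjective_any is hypothetical.

```python
def surjective_any(mat):
    """Alternative sketch of surjective(): True if every column of the
    nxn boolean matrix mat contains at least one True (an incoming edge)."""
    n = len(mat)
    # node j is reached if some row i has an edge i -> j
    return all(any(mat[i][j] for i in range(n)) for j in range(n))

print(surjective_any([[True, True, False, False],
                      [False, False, False, True],
                      [False, True, True, False],
                      [False, True, True, True]]))   # True
```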
A3 ediff¶
✪✪✪ The edge difference of two graphs ediff(da,db) is a graph with the edges of the first except the edges of the second. For simplicity, here we consider only graphs having the same vertices but possibly different edges. This time we will try to operate on graphs represented as dictionaries of adjacency lists.
For example, if we have
[12]:
da = {
'a':['a','c'],
'b':['b', 'c'],
'c':['b','c']
}
[13]:
draw_adj(da)

and
[14]:
db = {
'a':['c'],
'b':['a','b', 'c'],
'c':['a']
}
[15]:
draw_adj(db)

The result of calling ediff(da,db) will be:
[16]:
res = {
'a':['a'],
'b':[],
'c':['b','c']
}
Which can be shown as
[17]:
draw_adj(res)

[18]:
def ediff(da,db):
    """ Takes two graphs as dictionaries of adjacency lists da and db, and
        RETURN a NEW graph as dictionary of adjacency lists, containing the same vertices of da,
        and the edges of da except the edges of db.
        - As order of elements within the adjacency lists, use the same order as found in da.
        - We assume all vertices in da and db are represented in the keys (even if they have
          no outgoing edge), and that da and db have the same keys
        EXAMPLE:
        da = {
            'a':['a','c'],
            'b':['b', 'c'],
            'c':['b','c']
        }
        db = {
            'a':['c'],
            'b':['a','b', 'c'],
            'c':['a']
        }
        assert ediff(da, db) == {
            'a':['a'],
            'b':[],
            'c':['b','c']
        }
    """
    #jupman-raise
    ret = {}
    for key in da:
        ret[key] = []
        for target in da[key]:
            # not efficient but works for us
            # using sets would be better, see https://stackoverflow.com/a/6486483
            if target not in db[key]:
                ret[key].append(target)
    return ret
    #/jupman-raise
da1 = {
'a': []
}
db1 = {
'a': []
}
assert ediff(da1, db1) == {
'a': []
}
da2 = {
'a': []
}
db2 = {
'a': ['a']
}
assert ediff(da2, db2) == {
'a': []
}
da3 = {
'a': ['a']
}
db3 = {
'a': []
}
assert ediff(da3, db3) == {
'a': ['a']
}
da4 = {
'a': ['a']
}
db4 = {
'a': ['a']
}
assert ediff(da4, db4) == {
'a': []
}
da5 = {
'a':['b'],
'b':[]
}
db5 = {
'a':['b'],
'b':[]
}
assert ediff(da5, db5) == {
'a':[],
'b':[]
}
da6 = {
'a':['b'],
'b':[]
}
db6 = {
'a':[],
'b':[]
}
assert ediff(da6, db6) == {
'a':['b'],
'b':[]
}
da7 = {
'a':['a','b'],
'b':[]
}
db7 = {
'a':['a'],
'b':[]
}
assert ediff(da7, db7) == {
'a':['b'],
'b':[]
}
da8 = {
'a':['a','b'],
'b':['a']
}
db8 = {
'a':['a'],
'b':['b']
}
assert ediff(da8, db8) == {
'a':['b'],
'b':['a']
}
da9 = {
'a':['a','c'],
'b':['b', 'c'],
'c':['b','c']
}
db9 = {
'a':['c'],
'b':['a','b', 'c'],
'c':['a']
}
assert ediff(da9, db9) == {
'a':['a'],
'b':[],
'c':['b','c']
}
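The comment in the solution notes that sets would be more efficient. A minimal sketch of that idea (ediff_set is a hypothetical name): each adjacency list of db is turned into a set once, so every membership test is O(1) on average instead of a scan of the list, while iterating over da[key] keeps the order found in da.

```python
def ediff_set(da, db):
    """Alternative sketch of ediff(): same result, with set-based membership tests."""
    ret = {}
    for key in da:
        excluded = set(db[key])  # built once per vertex
        # iterate over da[key] so the original order of da is preserved
        ret[key] = [target for target in da[key] if target not in excluded]
    return ret

print(ediff_set({'a': ['a', 'c'], 'b': ['b', 'c'], 'c': ['b', 'c']},
                {'a': ['c'], 'b': ['a', 'b', 'c'], 'c': ['a']}))
# {'a': ['a'], 'b': [], 'c': ['b', 'c']}
```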
Midterm - Thu 10, Jan 2019 - solutions¶
Scientific Programming - Data Science Master @ University of Trento
Download exercises and solution¶
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like:
if x > 5:
    return True
else:
    return False
when you could write just:
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    pass  # bla
# Called by f, will be graded:
def my_f(y,z):
    pass  # bla
def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra, which is easy to use; you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal, which can be found in Accessories -> Terminal.
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-01-10-exam.zip and extract it on your desktop. The folder content should look like this:
datasciprolab-2019-01-10-FIRSTNAME-LASTNAME-ID
|-jupman.py
|-sciprog.py
|-other stuff ...
|-exams
|-2019-01-10
|- gaps_exercise.py
|- gaps_test.py
|- tasks_exercise.py
|- tasks_test.py
|- exits_exercise.py
|- exits_test.py
|- other stuff ...
Rename the datasciprolab-2019-01-10-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-01-10-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it
If you don’t have a unitn login: tell the instructors and we will download your work manually
Introduction¶
B1 Theory¶
Please write the solution in the text file theory.txt
Given the following function:
def fun(N, M):
    S1 = set(N)
    S2 = set(M)
    res = []
    for x in S1:
        if x in S2:
            for i in range(N.count(x)):
                res.append(x)
    return res
Let N and M be two lists of length n and m, respectively. What is the computational complexity of function fun() with respect to n and m?
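One way to reason about it, given here as a sketch rather than an official answer, assumes the usual average-case costs of Python sets: building a set from k elements takes O(k) and each membership test takes O(1) (worst-case hash collisions are ignored). Call d ≤ n the number of distinct elements of N. Each loop iteration performs one membership test and, when x also appears in M, a call to N.count(x), which scans the whole of N; the appends add up to at most n over the entire run:

\[
\begin{aligned}
T(n, m) &= \underbrace{O(n)}_{\text{set(N)}} + \underbrace{O(m)}_{\text{set(M)}} + \underbrace{d \cdot O(1)}_{\text{x in S2}} + \underbrace{d \cdot O(n)}_{\text{N.count(x)}} + \underbrace{O(n)}_{\text{appends}} \\
        &\le O(n) + O(m) + n \cdot O(n) + O(n) = O(n^2 + m)
\end{aligned}
\]

The quadratic term is reached in the worst case, when all n elements of N are distinct and all of them also appear in M.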
B2 Gaps linked list¶
Given a linked list L of size n which only contains integers, a gap is an index \(i\), with \(0 < i < n\), such that \(L[i-1] < L[i]\). For the purpose of this exercise, we assume an empty list or a list with one element has zero gaps.
Example:
data: 9 7 6 8 9 2 2 5
index: 0 1 2 3 4 5 6 7
The list contains three gaps, [3,4,7], because:
number 8 at index 3 is greater than previous number 6 at index 2
number 9 at index 4 is greater than previous number 8 at index 3
number 5 at index 7 is greater than previous number 2 at index 6
Open file gaps_exercise.py and implement this method:
def gaps(self):
    """ Assuming all the data in the linked list is made by numbers,
        finds the gaps in the LinkedList and return them as a Python list.
        - we assume empty list and list of one element have zero gaps
        - MUST perform in O(n) where n is the length of the list
        NOTE: gaps to return are *indices*, *not* data!!!!
    """
Testing: python3 -m unittest gaps_test.GapsTest
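A minimal sketch of the single-pass idea behind gaps(): walk the list once, keeping the previous node and the index of the current one. The Node and LinkedList classes below are hypothetical bare-bones stand-ins (nodes with data and next attributes, a head reference), not the classes in gaps_exercise.py, whose attribute and method names may differ.

```python
class Node:
    """Hypothetical minimal node: holds data and a reference to the next node."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

class LinkedList:
    """Hypothetical minimal singly linked list, only for illustrating gaps()."""
    def __init__(self, values):
        self.head = None
        # build the chain back to front, so the head holds values[0]
        for v in reversed(values):
            self.head = Node(v, self.head)

    def gaps(self):
        """RETURN the indices i (0 < i < n) where data[i-1] < data[i], in O(n)."""
        ret = []
        if self.head is None:
            return ret              # empty list: zero gaps
        prev = self.head
        current = prev.next
        i = 1                       # index of current
        while current is not None:
            if prev.data < current.data:
                ret.append(i)       # store the index, not the data
            prev = current
            current = current.next
            i += 1
        return ret

print(LinkedList([9, 7, 6, 8, 9, 2, 2, 5]).gaps())   # [3, 4, 7]
```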
B3 Tasks stack¶
Very often, you begin a task only to discover it requires doing 3 other tasks, so you start carrying them out one at a time and discover that one of them actually requires yet another two subtasks…
To represent the fact a task may have subtasks, we will use a dictionary mapping a task label to a list of subtasks, each represented as a label. For example:
[2]:
subtasks = {
'a':['b','g'],
'b':['c','d','e'],
'c':['f'],
'd':['g'],
'e':[],
'f':[],
'g':[]
}
Task a requires subtasks b and g to be carried out (in this order), but task b requires subtasks c, d and e to be done. c requires f to be done, and d requires g.
You will have to implement a function called do and use a Stack data structure, which is already provided, so you don’t need to implement it. Let’s see an example of execution.
IMPORTANT: In the execution example, there are many prints just to help you understand what’s going on, but the only thing we actually care about is the final list returned by the function!
IMPORTANT: notice subtasks are scheduled in reversed order, so the item on top of the stack will be the first to get executed !
[3]:
from tasks_solution import *
do('a', subtasks)
DEBUG: Stack: elements=['a']
DEBUG: Doing task a, scheduling subtasks ['b', 'g']
DEBUG: Stack: elements=['g', 'b']
DEBUG: Doing task b, scheduling subtasks ['c', 'd', 'e']
DEBUG: Stack: elements=['g', 'e', 'd', 'c']
DEBUG: Doing task c, scheduling subtasks ['f']
DEBUG: Stack: elements=['g', 'e', 'd', 'f']
DEBUG: Doing task f, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=['g', 'e', 'd']
DEBUG: Doing task d, scheduling subtasks ['g']
DEBUG: Stack: elements=['g', 'e', 'g']
DEBUG: Doing task g, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=['g', 'e']
DEBUG: Doing task e, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=['g']
DEBUG: Doing task g, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=[]
[3]:
['a', 'b', 'c', 'f', 'd', 'g', 'e', 'g']
The Stack you must use is simple and supports push, pop, and is_empty operations:
[4]:
s = Stack()
[5]:
print(s)
Stack: elements=[]
[6]:
s.is_empty()
[6]:
True
[7]:
s.push('a')
[8]:
print(s)
Stack: elements=['a']
[9]:
s.push('b')
[10]:
print(s)
Stack: elements=['a', 'b']
[11]:
s.pop()
[11]:
'b'
[12]:
print(s)
Stack: elements=['a']
B3.1 do¶
Now open tasks_exercise.py and implement function do:
def do(task, subtasks):
    """ Takes a task to perform and a dictionary of subtasks,
        and RETURN a list of performed tasks
        - To implement it, inside create a Stack instance and a while loop.
        - DO *NOT* use a recursive function
        - Inside the function, you can use a print like "I'm doing task a",
          but that is only to help yourself in debugging, only the
          list returned by the function will be considered in the evaluation!
    """
Testing: python3 -m unittest tasks_test.DoTest
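A minimal sketch of the iterative approach, assuming a stand-in Stack class that mirrors the push, pop and is_empty operations shown above (the real Stack is provided with the exam and is not reproduced here). Subtasks are pushed in reverse order, so the first subtask ends up on top of the stack and is done next.

```python
class Stack:
    """Stand-in for the provided Stack: a thin wrapper over a Python list (top at the end)."""
    def __init__(self):
        self._elements = []

    def push(self, item):
        self._elements.append(item)

    def pop(self):
        return self._elements.pop()

    def is_empty(self):
        return not self._elements

def do(task, subtasks):
    """Sketch: RETURN the list of performed tasks, using a Stack and a while loop (no recursion)."""
    stack = Stack()
    stack.push(task)
    done = []
    while not stack.is_empty():
        current = stack.pop()
        done.append(current)
        # reversed, so the first subtask is on top and gets done next
        for sub in reversed(subtasks[current]):
            stack.push(sub)
    return done

subtasks = {'a': ['b', 'g'], 'b': ['c', 'd', 'e'], 'c': ['f'],
            'd': ['g'], 'e': [], 'f': [], 'g': []}
print(do('a', subtasks))   # ['a', 'b', 'c', 'f', 'd', 'g', 'e', 'g']
```

Note that g appears twice in the result: it is a subtask of both a and d, and nothing in the exercise asks to skip tasks already done.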
B3.2 do_level¶
In this exercise, you are asked to implement a slightly more complex version of the previous function, where on the Stack you push two-valued tuples containing the task label and the associated level. The first task has level 0, the immediate subtask has level 1, the subtask of the subtask has level 2, and so on. In the list returned by the function, you will put such tuples.
One possible use is to display the executed tasks as an indented tree, where the indentation is determined by the level. Here we see an example:
IMPORTANT: Again, the prints are only to let you understand what’s going on, and you are not required to code them. The only thing that really matters is the list the function must return !
[13]:
subtasks = {
'a':['b','g'],
'b':['c','d','e'],
'c':['f'],
'd':['g'],
'e':[],
'f':[],
'g':[]
}
do_level('a', subtasks)
DEBUG: Stack: elements=[('a', 0)]
DEBUG: I'm doing a level=0 Stack: elements=[('g', 1), ('b', 1)]
DEBUG: I'm doing b level=1 Stack: elements=[('g', 1), ('e', 2), ('d', 2), ('c', 2)]
DEBUG: I'm doing c level=2 Stack: elements=[('g', 1), ('e', 2), ('d', 2), ('f', 3)]
DEBUG: I'm doing f level=3 Stack: elements=[('g', 1), ('e', 2), ('d', 2)]
DEBUG: I'm doing d level=2 Stack: elements=[('g', 1), ('e', 2), ('g', 3)]
DEBUG: I'm doing g level=3 Stack: elements=[('g', 1), ('e', 2)]
DEBUG: I'm doing e level=2 Stack: elements=[('g', 1)]
DEBUG: I'm doing g level=1 Stack: elements=[]
[13]:
[('a', 0),
('b', 1),
('c', 2),
('f', 3),
('d', 2),
('g', 3),
('e', 2),
('g', 1)]
Now implement the function:
def do_level(task, subtasks):
    """ Takes a task to perform and a dictionary of subtasks,
        and RETURN a list of performed tasks, as tuples (task label, level)
        - To implement it, use a Stack and a while loop
        - DO *NOT* use a recursive function
        - Inside the function, you can use a print like "I'm doing task a",
          but that is only to help yourself in debugging, only the
          list returned by the function will be considered in the evaluation
    """
Testing: python3 -m unittest tasks_test.DoLevelTest
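A sketch of the level-tracking variant, again with a stand-in Stack (assumed to behave like the provided one): each stack entry is a (label, level) tuple, and subtasks are pushed with the level of their parent plus one. The final loop shows the indented-tree display mentioned above, using 4 spaces per level as an arbitrary choice.

```python
class Stack:
    """Stand-in for the provided Stack: a thin wrapper over a Python list (top at the end)."""
    def __init__(self):
        self._elements = []

    def push(self, item):
        self._elements.append(item)

    def pop(self):
        return self._elements.pop()

    def is_empty(self):
        return not self._elements

def do_level(task, subtasks):
    """Sketch: RETURN the performed tasks as (label, level) tuples, iteratively."""
    stack = Stack()
    stack.push((task, 0))          # the first task has level 0
    done = []
    while not stack.is_empty():
        label, level = stack.pop()
        done.append((label, level))
        # reversed so the first subtask is on top; subtasks sit one level deeper
        for sub in reversed(subtasks[label]):
            stack.push((sub, level + 1))
    return done

subtasks = {'a': ['b', 'g'], 'b': ['c', 'd', 'e'], 'c': ['f'],
            'd': ['g'], 'e': [], 'f': [], 'g': []}

# indented tree: each task is shifted right according to its level
for label, level in do_level('a', subtasks):
    print('    ' * level + label)
```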
B4 Exits graph¶
There is a place near Trento called Silent Hill, where people always study and do little else. Unfortunately, one day an unethical biotech AI experiment goes wrong and a buggy cyborg is left free to roam the building. To avoid panic, you are quickly asked to devise an evacuation plan. The place is a well-known labyrinth, with endless corridors that also loop into cycles. But you know you can model this network as a digraph, and decide to represent crossings as nodes. When a crossing has a door to leave the building, its label starts with letter e, while when there is no such door the label starts with letter n.
In the example below, there are three exits e1, e2, and e3. Given a node, say n1, you want to tell the crowd in that node the shortest paths leading to the three exits. To avoid congestion, one third of the crowd may be told to go to e2, one third to reach e1, and the remaining third to go to e3, even if e1 and e3 are farther away than e2.
In Python terms, we would like to obtain a dictionary of paths like the following, where the keys are the exits and the values are the shortest sequences of nodes from n1 leading to each exit:
{
'e1': ['n1', 'n2', 'e1'],
'e2': ['n1', 'e2'],
'e3': ['n1', 'e2', 'n3', 'e3']
}
[14]:
from sciprog import draw_dig
from exits_solution import *
from exits_test import dig
[15]:
G = dig({'n1':['n2','e2'],
'n2':['e1'],
'e1':['n1'],
'e2':['n2','n3', 'n4'],
'n3':['e3'],
'n4':['n1']})
draw_dig(G)

You will solve the exercise in steps, so open exits_solution.py and read the following points.
B4.1 cp¶
Implement this method
def cp(self, source):
    """ Performs a BFS search starting from provided node label source and
        RETURN a dictionary of nodes representing the visit tree in the
        child-to-parent format, that is, each key is a node label and as value
        has the node label from which it was discovered for the first time
        So if node "n2" was discovered for the first time while
        inspecting the neighbors of "n1", then in the output dictionary there
        will be the pair "n2":"n1".
        The source node will have None as parent, so if source is "n1" in the
        output dictionary there will be the pair "n1": None
        NOTE: This method must *NOT* distinguish between exits
              and normal nodes, in the tests we label them n1, e1 etc just
              because we will reuse them in the next exercise
        NOTE: You are allowed to put debug prints, but the only thing that
              matters for the evaluation and tests to pass is the returned
              dictionary
    """
Testing: python3 -m unittest exits_test.CpTest
Example:
[16]:
G.cp('n1')
DEBUG: Removed from queue: n1
DEBUG: Found neighbor: n2
DEBUG: not yet visited, enqueueing ..
DEBUG: Found neighbor: e2
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['n2', 'e2']
DEBUG: Removed from queue: n2
DEBUG: Found neighbor: e1
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['e2', 'e1']
DEBUG: Removed from queue: e2
DEBUG: Found neighbor: n2
DEBUG: already visited
DEBUG: Found neighbor: n3
DEBUG: not yet visited, enqueueing ..
DEBUG: Found neighbor: n4
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['e1', 'n3', 'n4']
DEBUG: Removed from queue: e1
DEBUG: Found neighbor: n1
DEBUG: already visited
DEBUG: Queue is: ['n3', 'n4']
DEBUG: Removed from queue: n3
DEBUG: Found neighbor: e3
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['n4', 'e3']
DEBUG: Removed from queue: n4
DEBUG: Found neighbor: n1
DEBUG: already visited
DEBUG: Queue is: ['e3']
DEBUG: Removed from queue: e3
DEBUG: Queue is: []
[16]:
{'n1': None,
'n2': 'n1',
'e2': 'n1',
'e1': 'n2',
'n3': 'e2',
'n4': 'e2',
'e3': 'n3'}
Basically, the dictionary above represents this visit tree:
         n1
        /  \
      n2    e2
       \   /  \
       e1 n3   n4
          |
          e3
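A standalone sketch of the BFS visit described above. Since the DiGraph class is not shown here, it is written as a plain function over a dictionary of adjacency lists instead of as the cp method; the name cp_from_adj is hypothetical. Nodes that appear only as targets (like e3) are treated as having no outgoing edges, and collections.deque serves as the FIFO queue.

```python
from collections import deque

def cp_from_adj(adj, source):
    """Sketch: BFS from source over adj (dict: node -> list of neighbors).
    RETURN the visit tree as a child-to-parent dict; source maps to None."""
    parents = {source: None}          # also acts as the set of visited nodes
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, []):
            if neighbor not in parents:       # not yet visited
                parents[neighbor] = node      # the first discovery fixes the parent
                queue.append(neighbor)
    return parents

adj = {'n1': ['n2', 'e2'], 'n2': ['e1'], 'e1': ['n1'],
       'e2': ['n2', 'n3', 'n4'], 'n3': ['e3'], 'n4': ['n1']}
print(cp_from_adj(adj, 'n1'))
# {'n1': None, 'n2': 'n1', 'e2': 'n1', 'e1': 'n2', 'n3': 'e2', 'n4': 'e2', 'e3': 'n3'}
```

Marking a node as visited when it is enqueued (not when it is dequeued) is what guarantees each node gets exactly one parent, the one from which it was first discovered.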
B4.2 exits¶
Implement this function. NOTE: the function is external to class DiGraph.
def exits(cp):
    """
    INPUT: a dictionary of nodes representing a visit tree in the
           child-to-parent format, that is, each key is a node label and as value
           has its parent as a node label. The root has associated None as parent.
    OUTPUT: a dictionary mapping node labels of exits to the shortest path
            from the root to the exit (root and exit included)
    """
Testing: python3 -m unittest exits_test.ExitsTest
Example:
[17]:
# as example we can use the same dictionary outputted by the cp call in the previous exercise
visit_cp = { 'e1': 'n2',
'e2': 'n1',
'e3': 'n3',
'n1': None,
'n2': 'n1',
'n3': 'e2',
'n4': 'e2'
}
exits(visit_cp)
[17]:
{'e1': ['n1', 'n2', 'e1'], 'e2': ['n1', 'e2'], 'e3': ['n1', 'e2', 'n3', 'e3']}
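One possible sketch of exits(): for every node whose label starts with e (the exit convention stated above), climb the parent links up to the root and reverse the collected path. Because the input is a BFS visit tree, each such path is a shortest path from the root.

```python
def exits(cp):
    """Sketch: map each exit label (assumed to start with 'e') to its path from the root."""
    ret = {}
    for node in cp:
        if node.startswith('e'):
            path = []
            current = node
            # climb from the exit up to the root, whose parent is None
            while current is not None:
                path.append(current)
                current = cp[current]
            path.reverse()            # root first, exit last
            ret[node] = path
    return ret

visit_cp = {'e1': 'n2', 'e2': 'n1', 'e3': 'n3', 'n1': None,
            'n2': 'n1', 'n3': 'e2', 'n4': 'e2'}
print(exits(visit_cp))
# {'e1': ['n1', 'n2', 'e1'], 'e2': ['n1', 'e2'], 'e3': ['n1', 'e2', 'n3', 'e3']}
```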
Exam - Wed 23, Jan 2019 - solutions¶
Scientific Programming - Data Science Master @ University of Trento
Download exercises and solution¶
What to do¶
Download datasciprolab-2019-01-23-exam.zip and extract it on your desktop. The folder content should look like this:
datasciprolab-2019-01-23-FIRSTNAME-LASTNAME-ID
|-jupman.py
|-sciprog.py
|-other stuff ...
|-exams
|-2019-01-23
|- exam-2019-01-23-exercise.ipynb
|- list_exercise.py
|- list_test.py
|- tree_exercise.py
|- tree_test.py
Rename the datasciprolab-2019-01-23-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-01-23-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2019-01-23-exercise.ipynb
A.1 table_to_adj¶
Suppose you have a table expressed as a list of lists with headers like this:
[2]:
m0 = [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
where a, b, c etc. are the row identifiers (imagine they represent items in a store), while Price and Quantity are properties they might have. NOTE: here we put two properties, but there might be n properties!
We want to transform such a table into a graph-like format as a dictionary of lists, which relates store items as keys to the properties they might have. To include in the list both the property identifier and its value, we will use tuples. So you need to write a function that transforms the above input into this:
[3]:
res0 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}
[4]:
def table_to_adj(table):
    #jupman-raise
    ret = {}
    headers = table[0]
    for row in table[1:]:
        lst = []
        for j in range(1, len(row)):
            lst.append((headers[j], row[j]))
        ret[row[0]] = lst
    return ret
    #/jupman-raise
m0 = [
['I','P','Q']
]
res0 = {}
assert res0 == table_to_adj(m0)
m1 = [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
res1 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}
assert res1 == table_to_adj(m1)
m2 = [
['I','P','Q'],
['a','x','y'],
['b','w','z'],
['c','z','x'],
['d','w','w'],
['e','y','x']
]
res2 = {
'a':[('P','x'),('Q','y')],
'b':[('P','w'),('Q','z')],
'c':[('P','z'),('Q','x')],
'd':[('P','w'),('Q','w')],
'e':[('P','y'),('Q','x')]
}
assert res2 == table_to_adj(m2)
m3 = [
['I','P','Q', 'R'],
['a','x','y', 'x'],
['b','z','x', 'y'],
]
res3 = {
'a':[('P','x'),('Q','y'), ('R','x')],
'b':[('P','z'),('Q','x'), ('R','y')],
}
assert res3 == table_to_adj(m3)
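The inner loop pairs each header (after the identifier column) with the matching cell of the row, which is exactly what zip does. A compact sketch of that alternative, under the hypothetical name table_to_adj_zip:

```python
def table_to_adj_zip(table):
    """Alternative sketch of table_to_adj(): {identifier: [(header, value), ...]}."""
    headers = table[0][1:]   # skip the identifier column header
    # row[0] is the identifier, row[1:] the property values in header order
    return {row[0]: list(zip(headers, row[1:])) for row in table[1:]}

print(table_to_adj_zip([['I', 'P', 'Q', 'R'],
                        ['a', 'x', 'y', 'x'],
                        ['b', 'z', 'x', 'y']]))
# {'a': [('P', 'x'), ('Q', 'y'), ('R', 'x')], 'b': [('P', 'z'), ('Q', 'x'), ('R', 'y')]}
```

It works for any number of properties, and a table holding only the header row gives an empty dictionary.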
A.2 bus stops¶
Today we will analyze the intercity bus network in GTFS format, taken from dati.trentino.it, MITT service.
The original GTFS data was split into several files, which we merged into the dataset data/network.csv, containing the bus stop times of three extra-urban routes. To load it, we provide this function:
[5]:
def load_stops():
    "Loads file network.csv and RETURN a list of dictionaries with the stop times"
    import csv
    with open('data/network.csv', newline='', encoding='UTF-8') as csvfile:
        reader = csv.DictReader(csvfile)
        lst = []
        for d in reader:
            lst.append(d)
        return lst
[6]:
stops = load_stops()
stops[0:2]
[6]:
[OrderedDict([('', '1'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('trip_headsign', 'Trento-Autostaz.'),
('direction_id', '0'),
('arrival_time', '06:25:00'),
('departure_time', '06:25:00'),
('stop_id', '844'),
('stop_sequence', '2'),
('stop_code', '2620'),
('stop_name', 'Sardagna'),
('stop_desc', ''),
('stop_lat', '46.064848'),
('stop_lon', '11.09729'),
('zone_id', '2620.0')]),
OrderedDict([('', '2'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('trip_headsign', 'Trento-Autostaz.'),
('direction_id', '0'),
('arrival_time', '06:26:00'),
('departure_time', '06:26:00'),
('stop_id', '5203'),
('stop_sequence', '3'),
('stop_code', '2620VD'),
('stop_name', 'Sardagna Civ. 22'),
('stop_desc', ''),
('stop_lat', '46.069494'),
('stop_lon', '11.095252'),
('zone_id', '2620.0')])]
Of interest to you are the fields route_short_name, arrival_time, and stop_lat and stop_lon, which provide the geographical coordinates of the stop. Stops are already sorted in the file from earliest to latest.
Given a route_short_name, like B202, we want to plot the graph of bus velocity, measured in km/h, at each stop. We define velocity at stop n as
\(velocity_n = \frac{\Delta space_n}{\Delta time_n }\)
where
\(\Delta time_n = time_n - time_{n-1}\) is the time in hours the bus takes between stop \(n\) and stop \(n-1\).
and
\(\Delta space_n = space_n - space_{n-1}\) is the distance the bus has moved between stop \(n\) and stop \(n-1\).
We also set \(velocity_0 = 0\)
NOTE FOR TIME: When we say time in hours, we mean the following: if you have the time as the string 08:27:42, its number of seconds since midnight is:
[7]:
secs = 8*60*60+27*60+42
and to calculate the time in float hours you need to divide secs by 60*60=3600:
[8]:
hours_float = secs / (60*60)
hours_float
[8]:
8.461666666666666
NOTE FOR SPACE: Unfortunately, we could not find the actual road distance travelled by the bus between one stop and the next one. So, for the sake of the exercise, we will take the geo distance, that is, the air-line distance between the stops, calculated from their geographical coordinates. The function geo_distance is already implemented:
[9]:
def geo_distance(lat1, lon1, lat2, lon2):
    """ Return the geo distance in kilometers
        between the points 1 and 2 at provided geographical coordinates.
    """
    # Shamelessly copied from https://stackoverflow.com/a/19412565
    from math import sin, cos, sqrt, atan2, radians
    # approximate radius of earth in km
    R = 6373.0
    lat1 = radians(lat1)
    lon1 = radians(lon1)
    lat2 = radians(lat2)
    lon2 = radians(lon2)
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c
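Putting the time and space notes together, the velocity formula can be sketched as a small function over a list of stop dictionaries, independent of any plotting. The names to_hours and velocities are hypothetical; geo_distance is repeated so the block runs on its own, and the sketch assumes consecutive arrival times differ (otherwise \(\Delta time_n = 0\) and the division fails). The sample data are the first two B202 stops shown in the load_stops() output above.

```python
from math import sin, cos, sqrt, atan2, radians

def geo_distance(lat1, lon1, lat2, lon2):
    """Same haversine formula as above: distance in km between two points."""
    R = 6373.0
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

def to_hours(time_string):
    """'08:27:42' -> seconds since midnight divided by 3600."""
    h, m, s = (int(part) for part in time_string.split(':'))
    return (h * 3600 + m * 60 + s) / 3600

def velocities(stops):
    """Sketch: velocity_n = delta_space_n / delta_time_n in km/h, with velocity_0 = 0.
    stops: list of dicts with 'arrival_time', 'stop_lat', 'stop_lon' as strings."""
    vs = [0]
    for prev, cur in zip(stops, stops[1:]):   # consecutive pairs (n-1, n)
        delta_space = geo_distance(float(prev['stop_lat']), float(prev['stop_lon']),
                                   float(cur['stop_lat']), float(cur['stop_lon']))
        delta_time = to_hours(cur['arrival_time']) - to_hours(prev['arrival_time'])
        vs.append(delta_space / delta_time)
    return vs

sample = [
    {'arrival_time': '06:25:00', 'stop_lat': '46.064848', 'stop_lon': '11.09729'},
    {'arrival_time': '06:26:00', 'stop_lat': '46.069494', 'stop_lon': '11.095252'},
]
print(velocities(sample))   # about [0, 32.41]
```

The second value can be cross-checked against the second entry of the ys list printed by plot('B202') below.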
In the following we see the bus line B202, going from Sardagna to Trento. The graph should show something like the following.
We can see that as long as the bus is taking stops within Sardagna town, velocity (always meant as air-line velocity) is high, but when the bus has to go to Trento, since there are many twists and turns on the road, it takes a while to arrive even if Trento is near in geo distance, so the air-line velocity decreases. In such a case it would be much more convenient to take the cable car.
This type of graph might reveal places in the territory where shortcuts such as cable cars, tunnels or bridges might be helpful for transportation.
[10]:
def to_float_hour(time_string):
    """
    Takes a time string in the format like 08:27:42
    and RETURN the time since midnight in hours as a float (es 8.461666666666666)
    """
    #jupman-raise
    hours = int(time_string[0:2])
    mins = int(time_string[3:5])
    secs = int(time_string[6:])
    return (hours * 60 * 60 + mins * 60 + secs) / (60*60)
    #/jupman-raise
def plot(route_short_name):
    """ Takes a route_short_name and *PLOTS* with matplotlib a graph of the velocity of
        the bus trip for that route
        - just use matplotlib, you *don't* need pandas and *don't* need numpy
        - xs positions MUST be in *float hours*, distanced at lengths proportional
          to the actual time the bus arrives at that stop
        - xticks MUST show
            - the stop name *NICELY* (with carriage returns)
            - the time in *08:50:12 format*
        - ys MUST show the velocity of the bus at that time
        - assume velocity at stop 0 equals 0
        - remember to set the figure width and height
        - remember to set axis labels and title
    """
    #jupman-raise
    stops = load_stops()
    %matplotlib inline
    import matplotlib.pyplot as plt
    xs = []
    ys = []
    ticks = []
    seq = [d for d in stops if d['route_short_name'] == route_short_name]
    d_prev = seq[0]
    n = 0
    for d in seq:
        xs.append(to_float_hour(d['arrival_time']))
        if n == 0:
            v = 0
        else:
            delta_distance = geo_distance(float(d['stop_lat']), float(d['stop_lon']),
                                          float(d_prev['stop_lat']), float(d_prev['stop_lon']))
            delta_time = (to_float_hour(d['arrival_time']) - to_float_hour(d_prev['arrival_time']))
            v = delta_distance / delta_time
        ys.append(v)
        ticks.append("%s\n%s" % (d['stop_name'].replace(' ','\n').replace('-','\n'), d['arrival_time']))
        d_prev = d
        n += 1
    fig = plt.figure(figsize=(20,12))  # width: 20 inches, height 12 inches
    plt.plot(xs, ys)
    plt.title("%s stops SOLUTION" % route_short_name)
    plt.xlabel('stops')
    plt.ylabel('velocity (Km/h)')
    # FIRST NEEDS A SEQUENCE WITH THE POSITIONS, THEN A SEQUENCE OF SAME LENGTH WITH LABELS
    plt.xticks(xs, ticks)
    print('xs = %s' % xs)
    print('ys = %s' % ys)
    print('xticks = %s' % ticks)
    plt.savefig('img/%s.png' % route_short_name)
    plt.show()
    #/jupman-raise
plot('B202')
xs = [6.416666666666667, 6.433333333333334, 6.45, 6.466666666666667, 6.516666666666667, 6.55, 6.566666666666666, 6.616666666666666, 6.65, 6.683333333333334]
ys = [0, 32.410644806589666, 25.440452145453996, 29.058090168277648, 4.151814096935986, 7.514788081665398, 24.226499833822754, 3.8149164687282586, 34.89698602693173, 14.321244382769315]
xticks = ['Sardagna\n06:25:00', 'Sardagna\nCiv.\n22\n06:26:00', 'Sardagna\nCiv.20\n06:27:00', 'Sardagna\nMaso\nScala\n06:28:00', 'Trento\nLoc.\nS.Antonio\n06:31:00', 'Trento\nVia\nSardagna\nCiv.\n104\n06:33:00', 'Trento\nMaso\nPedrotti\n06:34:00', 'Trento\nLoc.Conotter\n06:37:00', 'Trento\nVia\nBrescia\n4\n06:39:00', 'Trento\nAutostaz.\n06:41:00']
plot('B201')
xs = [18.25, 18.283333333333335, 18.333333333333332, 18.533333333333335, 18.75, 19.166666666666668]
ys = [0, 57.11513455659372, 27.731105466934423, 41.63842308087865, 28.5197376150513, 31.49374154105802]
xticks = ['Tione\nAutostazione\n18:15:00', 'Zuclo\nSs237\n"Superm.\nLidl"\n18:17:00', 'Saone\n18:20:00', 'Ponte\nArche\nAutost.\n18:32:00', 'Sarche\nCentro\nComm.\n18:45:00', 'Trento\nAutostaz.\n19:10:00']
plot('B301')
xs = [17.583333333333332, 17.666666666666668, 17.733333333333334, 17.766666666666666, 17.8, 17.833333333333332, 17.883333333333333, 17.9, 17.916666666666668, 17.933333333333334, 17.983333333333334, 18.0, 18.05, 18.066666666666666, 18.083333333333332, 18.1, 18.133333333333333, 18.15, 18.166666666666668, 18.183333333333334, 18.25, 18.266666666666666, 18.3, 18.316666666666666, 18.35, 18.383333333333333, 18.4]
ys = [0, 12.183536596091201, 11.250009180954352, 16.612469697023045, 20.32290877261807, 29.650645502388567, 43.45858933073937, 33.590326783093374, 51.14340770207765, 31.710506116846854, 24.12416002315475, 68.52690370810224, 66.54632979050625, 36.97129817779247, 29.62791050495846, 34.08490909322781, 29.184331044522004, 19.648559840967014, 37.7140096915846, 43.892216115372726, 33.48796397878209, 29.521341752309603, 32.83990219938084, 38.20505182104893, 27.292895333249888, 12.602972475349818, 28.804672730461583]
xticks = ['Trento\nAutostaz.\n17:35:00', 'Trento\nC.So\nTre\nNovembre\n17:40:00', 'Trento\nViale\nVerona\n17:44:00', 'Trento\nS.Bartolameo\n17:46:00', 'Trento\nViale\nVerona\nBig\nCenter\n17:48:00', 'Trento\nMan\n17:50:00', 'Mattarello\nLoc.Ronchi\n17:53:00', 'Mattarello\nVia\nNazionale\n17:54:00', 'Mattarello\n17:55:00', 'Mattarello\nEx\nSt.Vestimenta\n17:56:00', 'Acquaviva\n17:59:00', 'Acquaviva\nPizzeria\n18:00:00', 'Besenello\nPosta\nVecchia\n18:03:00', 'Besenello\nFerm.\nNord\n18:04:00', 'Besenello\n18:05:00', 'Besenello\nFerm.\nSud\n18:06:00', 'Calliano\nSp\n49\n"Cimitero"\n18:08:00', 'Calliano\n18:09:00', 'Calliano\nGrafiche\nManfrini\n18:10:00', 'Castelpietra\n18:11:00', 'Volano\n18:15:00', 'Volano\nVia\nDes\nTor\n18:16:00', 'Ss.12\nS.Ilario/Via\nStroperi\n18:18:00', 'S.Ilario\n18:19:00', 'Rovereto\nV.Le\nTrento\n18:21:00', 'Rovereto\nVia\nBarattieri\n18:23:00', 'Rovereto\nVia\nManzoni\n18:24:00']
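Each velocity value above is the distance between two consecutive stops divided by the time between them. A minimal stdlib sketch of just that step follows. The helper `velocities` is hypothetical and not part of the exam code: it takes already computed distances in km as input, instead of calling geo_distance.

```python
def velocities(hours, distances_km):
    """Hypothetical helper, not part of the exam code.

    hours[k]        arrival time at stop k, in float hours
    distances_km[k] distance in km from stop k-1 to stop k (distances_km[0] is ignored)

    RETURN the average speed in km/h on each leg, using 0 for the first stop.
    """
    ys = [0]
    for k in range(1, len(hours)):
        delta_time = hours[k] - hours[k - 1]   # in hours
        ys.append(distances_km[k] / delta_time)
    return ys


# 10 km in half an hour, then 20 km in half an hour
print(velocities([6.0, 6.5, 7.0], [0, 10, 20]))   # [0, 20.0, 40.0]
```

If two consecutive stops share the same arrival_time, delta_time is 0 and the division raises ZeroDivisionError, both here and in plot. The three routes plotted above happen to have distinct times at every stop.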
Part B¶
B.1 Theory¶
Let L be a list of size n, and i and j two indices. Return the computational complexity of function fun() with respect to n.
def fun(L,i,j):
    if i==j:
        return 0
    else:
        m = (i+j)//2
        count = 0
        for x in L[i:m]:
            for y in L[m:j+1]:
                if x==y:
                    count = count+1
        left = fun(L,i,m)
        right = fun(L,m+1,j)
        return left+right+count
ANSWER: \(O(n^2)\)
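One way to see the bound is to write T(n) for the cost on a range of n elements. The two nested loops compare the roughly n/2 elements of the left half with the roughly n/2 elements of the right half, which is Θ(n²) work. The function then recurses on the two halves. Summing the work level by level over the recursion tree gives:

```latex
T(n) = 2\,T\!\left(\frac{n}{2}\right) + \Theta(n^2)
\quad\Longrightarrow\quad
T(n) = \sum_{k \ge 0} 2^k \,\Theta\!\left(\left(\frac{n}{2^k}\right)^{2}\right)
     = \Theta\!\left(n^2 \sum_{k \ge 0} 2^{-k}\right) = \Theta(n^2)
```

The top-level call alone already costs Θ(n²), so the O(n²) answer is also tight.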
B.2 Linked List flatv¶
Suppose a LinkedList only contains integer numbers, say 3,8,8,7,5,8,6,3,9. Implement method flatv, which scans the list. When it finds the first occurrence of a node containing a number that is less than the previous one and less than the successive one, it inserts after the current node another node with the same data, and exits.
Example:
For the linked list 3,8,8,7,5,8,6,3,9, calling flatv should modify the linked list so that it becomes 3,8,8,7,5,5,8,6,3,9.
Note that it only modifies the first occurrence found, turning 7,5,8 into 7,5,5,8. The successive sequence 6,3,9 is not altered.
Open list_exercise.py and implement this method:
def flatv(self):
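A minimal sketch of the single left-to-right scan follows. It assumes a hypothetical Node with `_data` and `_next` attributes and a LinkedList with a `_head` attribute. The real classes in list_exercise.py may use different names and methods.

```python
class Node:
    """Hypothetical singly linked node (the exam's Node class may differ)."""
    def __init__(self, data, next_node=None):
        self._data = data
        self._next = next_node


class LinkedList:
    """Minimal hypothetical linked list, just enough to demonstrate flatv."""
    def __init__(self, *values):
        self._head = None
        # build back to front, so the first value ends up at the head
        for v in reversed(values):
            self._head = Node(v, self._head)

    def to_list(self):
        ret = []
        current = self._head
        while current is not None:
            ret.append(current._data)
            current = current._next
        return ret

    def flatv(self):
        """Duplicate the first node whose data is smaller than both neighbours."""
        if self._head is None:
            return
        prev = self._head
        current = prev._next
        while current is not None and current._next is not None:
            succ = current._next
            if current._data < prev._data and current._data < succ._data:
                # insert a copy of current right after it, then stop
                current._next = Node(current._data, succ)
                return
            prev = current
            current = succ


lst = LinkedList(3, 8, 8, 7, 5, 8, 6, 3, 9)
lst.flatv()
print(lst.to_list())   # [3, 8, 8, 7, 5, 5, 8, 6, 3, 9]
```

The scan stops at the first match, so it takes O(n) time in the worst case and O(1) extra memory.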
B.3 Generic Tree rightmost¶
In the example above, the rightmost branch of a is given by the node sequence a, d, n.
Open tree_exercise.py and implement this method:
def rightmost(self):
    """ RETURN a list containing the *data* of the nodes
        in the *rightmost* branch of the tree.

        Example:

        a
        ├b
        ├c
        |└e
        └d
         ├f
         └g
          ├h
          └i

        should give

        ['a','d','g','i']
    """
Exam - Wed 13, Feb 2019 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. array of fixed size 2), commenting why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation

def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla

# Called by f, will be graded:
def my_f(y,z):
    # bla

def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice, you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler), or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-02-13-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2019-02-13-FIRSTNAME-LASTNAME-ID
    |-jupman.py
    |-sciprog.py
    |-other stuff ...
    |-exams
        |-2019-02-13
            |- exam-2019-02-13-exercise.ipynb
            |- queue_exercise.py
            |- queue_test.py
            |- tree_exercise.py
            |- tree_test.py
Rename the datasciprolab-2019-02-13-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-02-13-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A - Bus network visualization¶
Open Jupyter and start editing this notebook exam-2019-02-13-exercise.ipynb
Today we will visualize the intercity bus network in GTFS format, taken from dati.trentino.it, MITT service. The original data was split into several files, which we merged into the dataset data/network-short.csv.
To visualize it, we will use the networkx library. Let’s first see an example of how to do it:
[2]:
import networkx as nx
from sciprog import draw_nx
Gex = nx.DiGraph()
# we can force horizontal layout like this:
Gex.graph['graph'] = {
    'rankdir':'LR',
}
# When we add nodes, we can identify them with an identifier like the
# stop_id, which is separate from the label, because in some unfortunate
# cases two different stops can share the same label.
Gex.add_node('1', label='Trento-Autostaz.',
             color='black', fontcolor='black')
Gex.add_node('723', label='Trento-Via Brescia 4',
             color='black', fontcolor='black')
Gex.add_node('870', label='Sarche Centro Comm.',
             color='black', fontcolor='black')
Gex.add_node('1180', label='Trento Corso 3 Novembre',
             color='black', fontcolor='black')
# IMPORTANT: edges connect stop_ids , NOT labels !!!!
Gex.add_edge('870','1')
Gex.add_edge('723','1')
Gex.add_edge('1','1180')
# function defined in sciprog.py :
draw_nx(Gex)

Since we have a bus stop network, we might want to draw edges according to the route they represent. Here we show how to do it only for the edge from Trento-Autostaz to Trento Corso 3 Novembre:
[3]:
# we can retrieve an edge like this:
edge = Gex['1']['1180']
# and set attributes, like these:
edge['weight'] = 5                  # it takes 5 minutes to go from Trento-Autostaz
                                    # to Trento Corso 3 Novembre
edge['label'] = str(5)              # the label is a string
edge['color'] = '#2ca02c'           # we can set some style for the edge, such as color
edge['penwidth'] = 4                # and thickness
edge['route_short_name'] = 'B301'   # we can add any attribute we want,
                                    # Note these custom ones won't show in the graph
draw_nx(Gex)

To be more explicit, we can also add a legend this way:
[4]:
draw_nx(Gex, [{'color': '#2ca02c', 'label': 'B211'}])

[5]:
# Note an edge is a simple dictionary:
print(edge)
{'weight': 5, 'label': '5', 'color': '#2ca02c', 'penwidth': 4, 'route_short_name': 'B301'}
To load network-short.csv, we provide this function:
[6]:
def load_stops():
    """Loads file data and RETURN a list of dictionaries with the stop times
    """
    import csv
    with open('data/network-short.csv', newline='', encoding='UTF-8') as csvfile:
        reader = csv.DictReader(csvfile)
        lst = []
        for d in reader:
            lst.append(d)
        return lst
[7]:
stops = load_stops()
#IMPORTANT: NOTICE *ALL* VALUES ARE *STRINGS* !!!!!!!!!!!!
stops[0:2]
[7]:
[OrderedDict([('', '3'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('trip_headsign', 'Trento-Autostaz.'),
('direction_id', '0'),
('arrival_time', '06:27:00'),
('departure_time', '06:27:00'),
('stop_id', '5025'),
('stop_sequence', '4'),
('stop_code', '2620VE'),
('stop_name', 'Sardagna Civ.20'),
('stop_desc', ''),
('stop_lat', '46.073125'),
('stop_lon', '11.093579'),
('zone_id', '2620.0')]),
OrderedDict([('', '4'),
('route_id', '76'),
('agency_id', '12'),
('route_short_name', 'B202'),
('route_long_name',
'Trento-Sardagna-Candriai-Vaneze-Vason-Viote'),
('route_type', '3'),
('service_id', '22018091220190621'),
('trip_id', '0002402742018091220190621'),
('trip_headsign', 'Trento-Autostaz.'),
('direction_id', '0'),
('arrival_time', '06:28:00'),
('departure_time', '06:28:00'),
('stop_id', '843'),
('stop_sequence', '5'),
('stop_code', '2620MS'),
('stop_name', 'Sardagna-Maso Scala'),
('stop_desc', ''),
('stop_lat', '46.069871'),
('stop_lon', '11.097749'),
('zone_id', '2620.0')])]
A1 extract_routes¶
Implement the extract_routes function:
[8]:
import networkx as nx
from sciprog import draw_nx
stops = load_stops()
def extract_routes(stops):
    """ Extract all route_short_name from the stops list and RETURN
        an alphabetically sorted list of them, without duplicates
        (see example)
    """
    #jupman-raise
    s = set()
    for diz in stops:
        s.add(diz['route_short_name'])
    ret = list(s)
    ret.sort()
    return ret
    #/jupman-raise
Example:
[9]:
extract_routes(stops)
[9]:
['B201', 'B202', 'B211', 'B217', 'B301']
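The same result can be written more compactly with a set comprehension and sorted(). This is an equivalent variant, not the official solution. It assumes, as with load_stops, that each stop is a dictionary with a 'route_short_name' key.

```python
def extract_routes(stops):
    """Sorted list of distinct route_short_name values (set-comprehension variant)."""
    # the set comprehension drops duplicates, sorted() returns a new sorted list
    return sorted({d['route_short_name'] for d in stops})


print(extract_routes([{'route_short_name': 'B301'},
                      {'route_short_name': 'B202'},
                      {'route_short_name': 'B301'}]))   # ['B202', 'B301']
```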
A2 to_int_min¶
Implement this function:
[10]:
def to_int_min(time_string):
    """
    Takes a time string in the format like 08:27:42
    and RETURN the time since midnight in minutes, ignoring
    the seconds (e.g. 507)
    """
    #jupman-raise
    hours = int(time_string[0:2])
    mins = int(time_string[3:5])
    return (hours * 60 + mins)
    #/jupman-raise
Example:
[11]:
to_int_min('08:27:42')
[11]:
507
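The arithmetic behind the example is 8 * 60 + 27 = 507. Below is a sketch of a variant that uses str.split instead of fixed slice positions, so it does not rely on the hours having exactly two digits. It is an alternative, not the official solution.

```python
def to_int_min(time_string):
    """Minutes since midnight for an 'HH:MM:SS' string, ignoring seconds."""
    # '08:27:42'.split(':') -> ['08', '27', '42']; the seconds are discarded
    hours, mins, _secs = time_string.split(':')
    return int(hours) * 60 + int(mins)


print(to_int_min('08:27:42'))   # 8 * 60 + 27 = 507
print(to_int_min('8:05:00'))    # 8 * 60 + 5 = 485
```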
A3 get_legend_edges¶
If you have n routes numbered from 0 to n-1, and you want to assign a different color to each of them, we provide this function:
[12]:
def get_color(i, n):
    """ RETURN the i-th color chosen from n possible colors, in
        hex format (e.g. #ff0018).

        - if i < 0 or i >= n, raise ValueError
    """
    if n < 1:
        raise ValueError("Invalid n: %s" % n)
    if i < 0 or i >= n:
        raise ValueError("Invalid i: %s" % i)

    #HACKY, just for matplotlib < 3
    lst = ['#1f77b4',
           '#ff7f0e',
           '#2ca02c',
           '#d62728',
           '#9467bd',
           '#8c564b',
           '#e377c2',
           '#7f7f7f',
           '#bcbd22',
           '#17becf']

    return lst[i % 10]
[13]:
get_color(4,5)
[13]:
'#9467bd'
Now implement this function:
[14]:
def get_legend_edges():
    """
    RETURN a list of dictionaries, where each dictionary represents a route
    with label and associated color. Dictionaries are in the order returned by
    extract_routes() function.
    """
    #jupman-raise
    legend_edges = []
    i = 0
    routes = extract_routes(stops)
    for route_short_name in routes:
        legend_edges.append({
            'label': route_short_name,
            'color': get_color(i, len(routes))
        })
        i += 1
    return legend_edges
    #/jupman-raise
[15]:
get_legend_edges()
[15]:
[{'label': 'B201', 'color': '#1f77b4'},
{'label': 'B202', 'color': '#ff7f0e'},
{'label': 'B211', 'color': '#2ca02c'},
{'label': 'B217', 'color': '#d62728'},
{'label': 'B301', 'color': '#9467bd'}]
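The manual counter i can be replaced by enumerate. In the sketch below, make_legend_edges is a hypothetical variant that takes the routes and the color function as parameters instead of reading the global stops. This keeps it testable on its own.

```python
def make_legend_edges(routes, color_fn):
    """Hypothetical variant of get_legend_edges: routes and the color
    function (same signature as get_color(i, n)) are passed in."""
    n = len(routes)
    # enumerate yields (index, value) pairs, so no manual counter is needed
    return [{'label': route, 'color': color_fn(i, n)}
            for i, route in enumerate(routes)]


toy_colors = lambda i, n: ['#1f77b4', '#ff7f0e'][i]   # stand-in for get_color
print(make_legend_edges(['B201', 'B202'], toy_colors))
# [{'label': 'B201', 'color': '#1f77b4'}, {'label': 'B202', 'color': '#ff7f0e'}]
```

In the notebook, the call would be make_legend_edges(extract_routes(stops), get_color).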
A4 calc_nx¶
Implement this function:
[16]:
def calc_nx(stops):
    """
    RETURN a NetworkX DiGraph representing the bus stop network

    - To keep things simple, we suppose routes NEVER overlap (no edge is ever
      shared by two routes), so we need only a DiGraph and not a MultiGraph
    - as label for nodes, use the stop_name, and try to format it nicely.
    - as 'weight' for the edges, use the time in minutes between one stop
      and the next one
    - as custom property, add 'route_short_name'
    - as 'color' for the edges, use the color given by provided
      get_color(i,n) function
    - as 'penwidth' for edges, set 4

    - IMPORTANT: notice stops are already ordered by arrival_time, this
      makes it easy to find edges !
    - HINT: to make sure you're on the right track, try first to
      represent one single route, like B202
    """
    #jupman-raise
    G = nx.DiGraph()
    G.graph['graph'] = {
        'rankdir':'LR',  # horizontal layout
    }
    G.name = '************* calc_nx SOLUTION '
    routes = extract_routes(stops)
    i = 0
    for route_short_name in routes:
        prev_diz = None
        for diz in stops:
            if diz['route_short_name'] == route_short_name:
                G.add_node(diz['stop_id'],
                           label=diz['stop_name'].replace(' ', '\n').replace('-','\n'),
                           color='black',
                           fontcolor='black')
                if prev_diz:
                    G.add_edge(prev_diz['stop_id'], diz['stop_id'])
                    delta_time = to_int_min(diz['arrival_time']) - to_int_min(prev_diz['arrival_time'])
                    edge = G[prev_diz['stop_id']][diz['stop_id']]
                    edge['weight'] = delta_time
                    edge['label'] = str(delta_time)
                    edge['route_short_name'] = route_short_name
                    edge['color'] = get_color(i, len(routes))
                    edge['penwidth'] = 4
                prev_diz = diz
        i += 1
    return G
    #/jupman-raise
[17]:
G = calc_nx(stops)
draw_nx(G, get_legend_edges())
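The core of calc_nx is pairing each stop of a route with the next one. Below is a stdlib-only sketch of just that step. The helper route_edges is hypothetical, and the sample stops reuse stop_ids and times that appear in this notebook. Like calc_nx, it assumes the stops are already ordered by arrival_time.

```python
def route_edges(stops, route_short_name):
    """Hypothetical helper: RETURN (from_stop_id, to_stop_id, minutes) triples
    for consecutive stops of one route, assuming stops are ordered by arrival_time."""
    def to_min(t):
        h, m, _s = t.split(':')
        return int(h) * 60 + int(m)

    seq = [d for d in stops if d['route_short_name'] == route_short_name]
    # zip(seq, seq[1:]) pairs every stop with the following one
    return [(a['stop_id'], b['stop_id'],
             to_min(b['arrival_time']) - to_min(a['arrival_time']))
            for a, b in zip(seq, seq[1:])]


sample = [
    {'route_short_name': 'B202', 'stop_id': '5025', 'arrival_time': '06:27:00'},
    {'route_short_name': 'B202', 'stop_id': '843', 'arrival_time': '06:28:00'},
    {'route_short_name': 'B301', 'stop_id': '1', 'arrival_time': '17:35:00'},
    {'route_short_name': 'B301', 'stop_id': '1108', 'arrival_time': '17:40:00'},
]
print(route_edges(sample, 'B301'))   # [('1', '1108', 5)]
```

Each triple could then be added in one call, since networkx's add_edge accepts edge attributes as keyword arguments. An example is G.add_edge(u, v, weight=minutes, label=str(minutes)).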

A5 color_hubs¶
A hub is a node that allows switching routes, that is, a node touched by at least two different routes.
For example, Trento-Autostaz is touched by three routes, which is more than one, so it is a hub. Let’s examine the node. We know it has stop_id='1':
[18]:
G.nodes['1']
[18]:
{'label': 'Trento\nAutostaz.', 'color': 'black', 'fontcolor': 'black'}
If we examine its in_edges, we find it has incoming edges from stop_id '723' and '870', which represent Trento Via Brescia and Sarche Centro Commerciale respectively:
[19]:
G.in_edges('1')
[19]:
InEdgeDataView([('870', '1'), ('723', '1')])
If you get a View object, you can easily convert it to a list if needed:
[20]:
list(G.in_edges('1'))
[20]:
[('870', '1'), ('723', '1')]
[21]:
G.nodes['723']
[21]:
{'label': 'Trento\nVia\nBrescia\n4', 'color': 'black', 'fontcolor': 'black'}
[22]:
G.nodes['870']
[22]:
{'label': 'Sarche\nCentro\nComm.', 'color': 'black', 'fontcolor': 'black'}
There is only one outgoing edge, toward Trento Corso 3 Novembre:
[23]:
G.out_edges('1')
[23]:
OutEdgeDataView([('1', '1108')])
[24]:
G.nodes['1108']
[24]:
{'label': 'Trento\nC.So\nTre\nNovembre',
'color': 'black',
'fontcolor': 'black'}
If, for example, we want to know the route_short_name of this outgoing edge, we can access it this way:
[25]:
G['1']['1108']
[25]:
{'weight': 5,
'label': '5',
'route_short_name': 'B301',
'color': '#9467bd',
'penwidth': 4}
If you want to change the color attribute of the node '1', you can write it like this:
[26]:
G.nodes['1']['color'] = 'red'
G.nodes['1']['fontcolor'] = 'red'
Now implement the function color_hubs:
[27]:
def color_hubs(G):
    """ Prints the hubs in the graph G as text, and then draws the graph
        with the hubs colored in red.

        NOTE: you don't need to recalculate the graph, just set the relevant
              nodes' color to red
    """
    #jupman-raise
    G.name = '************* color_hubs SOLUTION '
    hubs = []
    for node in G.nodes():
        edges = list(G.in_edges(node)) + list(G.out_edges(node))
        route_short_names = set()
        for edge in edges:
            route_short_names.add(G[edge[0]][edge[1]]['route_short_name'])
        if len(route_short_names) > 1:
            hubs.append(node)
    print("SOLUTION: The hubs are:")
    print()
    for hub in hubs:
        print("stop_id:%s\n%s\n" % (hub, G.nodes[hub]['label']))
        G.nodes[hub]['color'] = 'red'
        G.nodes[hub]['fontcolor'] = 'red'
    #/jupman-raise
    draw_nx(G, legend_edges=get_legend_edges())
[28]:
color_hubs(G)
SOLUTION: The hubs are:
stop_id:757
Tione
Autostazione
stop_id:742
Ponte
Arche
Autost.
stop_id:1
Trento
Autostaz.
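The hub test does not depend on networkx itself. Below is a stdlib-only sketch in which find_hubs is hypothetical and the edges are a toy dictionary mapping (from_node, to_node) to route_short_name, loosely modelled on the edges around Trento-Autostaz. Each edge counts for both of its endpoints, like in_edges plus out_edges in color_hubs.

```python
from collections import defaultdict


def find_hubs(edges):
    """Hypothetical helper: edges maps (from_node, to_node) -> route_short_name.
    RETURN the sorted nodes touched by at least two different routes."""
    routes_at = defaultdict(set)
    for (u, v), route in edges.items():
        # an edge touches both of its endpoints
        routes_at[u].add(route)
        routes_at[v].add(route)
    return sorted(node for node, routes in routes_at.items() if len(routes) > 1)


toy_edges = {('742', '870'): 'B201',
             ('870', '1'): 'B201',
             ('723', '1'): 'B202',
             ('1', '1108'): 'B301'}
print(find_hubs(toy_edges))   # ['1']
```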

A6 plot_timings¶
To extract bus times from G, use this:
[29]:
G.edges()
[29]:
OutEdgeView([('757', '746'), ('746', '857'), ('857', '742'), ('742', '870'), ('870', '1'), ('1', '1108'), ('5025', '843'), ('843', '842'), ('842', '3974'), ('3974', '841'), ('841', '881'), ('881', '723'), ('723', '1'), ('1556', '4392'), ('4392', '4391'), ('4391', '4390'), ('4390', '742'), ('829', '3213'), ('3213', '757'), ('1108', '1109')])
If you get a View, you can iterate through the sequence as if it were a list.
To get the data from an edge, you can use this:
[30]:
G.get_edge_data('1','1108')
[30]:
{'weight': 5,
'label': '5',
'route_short_name': 'B301',
'color': '#9467bd',
'penwidth': 4}
Now implement the function plot_timings:
[31]:
def plot_timings(G):
    """
    Given a networkx DiGraph G plots a frequency histogram of the
    time between bus stops.
    """
    #jupman-raise
    import numpy as np
    import matplotlib.pyplot as plt

    timings = [G.get_edge_data(edge[0], edge[1])['weight'] for edge in G.edges()]

    # add histogram
    min_x = min(timings)
    max_x = max(timings)
    bar_width = 1.0

    # in this case hist returns a tuple of three values
    # we put in three variables
    # one bin per integer minute: edges go up to max_x + 1, so max_x gets its own bin
    n, bins, columns = plt.hist(timings,
                                bins=range(min_x, max_x + 2),
                                width=bar_width)  # graphical width of the bars
    xs = np.arange(min_x, max_x + 1)

    plt.xlabel('Time between stops in minutes')
    plt.ylabel('Frequency counts')
    plt.title('Time histogram SOLUTION')
    plt.xlim(0, max(timings) + 2)
    plt.xticks(xs + bar_width / 2,  # position of ticks
               xs)
    plt.show()
    #/jupman-raise
[32]:
plot_timings(G)
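Before plotting, it can help to check the counts the histogram should show. Below is a stdlib sketch in which timing_frequencies is a hypothetical helper. It assumes timings is a list of integer minutes, as built inside plot_timings.

```python
from collections import Counter


def timing_frequencies(timings):
    """Hypothetical helper: RETURN sorted (minutes, count) pairs, i.e. the
    bar heights a one-bin-per-minute histogram of `timings` should show."""
    return sorted(Counter(timings).items())


print(timing_frequencies([5, 1, 1, 3, 5, 1]))   # [(1, 3), (3, 1), (5, 2)]
```

Minutes with no occurrences (here 2 and 4) simply do not appear in the result, while plt.hist draws them as empty bins.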

Part B¶
B.1 Theory¶
Let L be a list of size n, and i and j two indices. Return the computational complexity of function fun() with respect to n.
Write the solution in a separate theory.txt file
def fun(L, i, j):
    # j-i+1 is the number of elements
    # between index i and index j (both included)
    if j-i+1 <= 3:
        # Compute their minimum
        return min(L[i:j+1])
    else:
        onethird = (j-i+1)//3
        res1 = fun(L,i, i+onethird)
        res2 = fun(L,i+onethird+1, i+2*onethird)
        res3 = fun(L,i+2*onethird+1, j)
        return min(res1,res2,res3)
ANSWER: \(\Theta(n)\)
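One way to see this bound is to write T(n) for the cost on a range of n elements. Each call does a constant amount of work: at most three elements in the base case, otherwise a few index computations and a min of three values. It then recurses on three ranges of about n/3 elements each:

```latex
T(n) = 3\,T\!\left(\frac{n}{3}\right) + \Theta(1)
\quad\Longrightarrow\quad
T(n) = \sum_{k=0}^{\log_3 n} 3^k \,\Theta(1) = \Theta\!\left(3^{\log_3 n}\right) = \Theta(n)
```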
B2 Company queues¶
We can model a company as a list of many employees ordered by their rank, the highest ranking being the first in the list. We assume all employees have different ranks. Each employee has a name, a rank, and a queue of tasks to perform (as a Python deque).
When a new employee arrives, it is inserted in the list in the right position according to his rank:
[33]:
from queue_solution import *
c = Company()
print(c)
Company:
name rank tasks
[34]:
c.add_employee('x',9)
[35]:
print(c)
Company:
name rank tasks
x 9 deque([])
[36]:
c.add_employee('z',2)
[37]:
print(c)
Company:
name rank tasks
x 9 deque([])
z 2 deque([])
[38]:
c.add_employee('y',6)
[39]:
print(c)
Company:
name rank tasks
x 9 deque([])
y 6 deque([])
z 2 deque([])
B2.1 add_employee¶
Implement this method:
def add_employee(self, name, rank):
    """
    Adds employee with name and rank to the company, maintaining
    the _employees list sorted by rank (higher rank comes first)

    Represent the employee as a dictionary with keys 'name', 'rank'
    and 'tasks' (a Python deque)

    - here we don't care about complexity, feel free to use a
      linear scan and .insert
    - If an employee of the same rank already exists, raise ValueError
    - If an employee of the same name already exists, raise ValueError
    """
Testing: python3 -m unittest queue_test.AddEmployeeTest
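Below is a minimal sketch of the linear scan plus list.insert suggested by the docstring. The Company class is a hypothetical stand-in: the real one in queue_exercise.py has more methods and its own __str__.

```python
from collections import deque


class Company:
    """Hypothetical minimal Company, only enough to show add_employee."""

    def __init__(self):
        self._employees = []    # dicts, sorted by rank, highest first

    def add_employee(self, name, rank):
        for e in self._employees:
            if e['rank'] == rank:
                raise ValueError("An employee with rank %s already exists" % rank)
            if e['name'] == name:
                raise ValueError("An employee named %s already exists" % name)
        # linear scan: skip every employee ranking higher than the new one
        i = 0
        while i < len(self._employees) and self._employees[i]['rank'] > rank:
            i += 1
        self._employees.insert(i, {'name': name, 'rank': rank, 'tasks': deque()})


c = Company()
c.add_employee('x', 9)
c.add_employee('z', 2)
c.add_employee('y', 6)
print([(e['name'], e['rank']) for e in c._employees])   # [('x', 9), ('y', 6), ('z', 2)]
```

The duplicate checks run before anything is inserted, so a rejected employee leaves the list untouched.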
B2.2 add_task¶
Each employee has a queue of tasks to perform. Tasks enter from the right and leave from the left. Each task has an associated required rank to perform it, but when it is assigned to an employee, the required rank may exceed the employee rank or be far below it. Still, when the company receives the task, it is scheduled in the given employee’s queue, ignoring the task rank.
[40]:
c.add_task('a',3,'x')
[41]:
c
[41]:
Company:
name rank tasks
x 9 deque([('a', 3)])
y 6 deque([])
z 2 deque([])
[42]:
c.add_task('b',5,'x')
[43]:
c
[43]:
Company:
name rank tasks
x 9 deque([('a', 3), ('b', 5)])
y 6 deque([])
z 2 deque([])
[44]:
c.add_task('c',12,'x')
c.add_task('d',1,'x')
c.add_task('e',8,'y')
c.add_task('f',2,'y')
c.add_task('g',8,'y')
c.add_task('h',10,'z')
[45]:
c
[45]:
Company:
name rank tasks
x 9 deque([('a', 3), ('b', 5), ('c', 12), ('d', 1)])
y 6 deque([('e', 8), ('f', 2), ('g', 8)])
z 2 deque([('h', 10)])
Implement this method:
def add_task(self, task_name, task_rank, employee_name):
    """ Append the task as a (name, rank) tuple to the tasks of
        given employee

        - If employee does not exist, raise ValueError
    """
Testing: python3 -m unittest queue_test.AddTaskTest
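Below is a minimal sketch of add_task written as a plain function. This is a hypothetical simplification: the employees parameter stands for the company's list of {'name', 'rank', 'tasks'} dictionaries. In the exam it is a method that reads self._employees.

```python
from collections import deque


def add_task(employees, task_name, task_rank, employee_name):
    """Hypothetical function version of Company.add_task."""
    for e in employees:
        if e['name'] == employee_name:
            # tasks enter from the right; the task rank is deliberately ignored here
            e['tasks'].append((task_name, task_rank))
            return
    raise ValueError("No employee named %s" % employee_name)


employees = [{'name': 'x', 'rank': 9, 'tasks': deque()},
             {'name': 'y', 'rank': 6, 'tasks': deque()}]
add_task(employees, 'a', 3, 'x')
add_task(employees, 'b', 5, 'x')
print(employees[0]['tasks'])   # deque([('a', 3), ('b', 5)])
```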
B2.3 work¶
Work in the company is produced in work steps. Each work step produces a list of all task names executed by the company in that work step.
A work step is done this way:
For each employee, starting from the highest ranking one, dequeue its current task (from the left), and then compare the task required rank with the employee rank according to these rules:
When an employee discovers a task requires a rank strictly greater than his rank, he will append the task to his supervisor’s tasks. Note the highest ranking employee may be forced to do tasks that are greater than his rank.
When an employee discovers he should do a task requiring a rank strictly less than his, he will check whether the next lower ranking employee can do the task (that is, whether the task rank is less than or equal to that employee’s rank), and if so append the task to that employee’s tasks.
When an employee can pass the task neither to the supervisor nor to the next lower ranking employee, he will actually execute the task, adding it to the work step list.
Example:
[46]:
c
[46]:
Company:
name rank tasks
x 9 deque([('a', 3), ('b', 5), ('c', 12), ('d', 1)])
y 6 deque([('e', 8), ('f', 2), ('g', 8)])
z 2 deque([('h', 10)])
[47]:
c.work()
DEBUG: Employee x gives task ('a', 3) to employee y
DEBUG: Employee y gives task ('e', 8) to employee x
DEBUG: Employee z gives task ('h', 10) to employee y
DEBUG: Total performed work this step: []
[47]:
[]
[48]:
c
[48]:
Company:
name rank tasks
x 9 deque([('b', 5), ('c', 12), ('d', 1), ('e', 8)])
y 6 deque([('f', 2), ('g', 8), ('a', 3), ('h', 10)])
z 2 deque([])
[49]:
c.work()
DEBUG: Employee x gives task ('b', 5) to employee y
DEBUG: Employee y gives task ('f', 2) to employee z
DEBUG: Employee z executes task ('f', 2)
DEBUG: Total performed work this step: ['f']
[49]:
['f']
[50]:
c
[50]:
Company:
name rank tasks
x 9 deque([('c', 12), ('d', 1), ('e', 8)])
y 6 deque([('g', 8), ('a', 3), ('h', 10), ('b', 5)])
z 2 deque([])
[51]:
c.work()
DEBUG: Employee x executes task ('c', 12)
DEBUG: Employee y gives task ('g', 8) to employee x
DEBUG: Total performed work this step: ['c']
[51]:
['c']
[52]:
c
[52]:
Company:
name rank tasks
x 9 deque([('d', 1), ('e', 8), ('g', 8)])
y 6 deque([('a', 3), ('h', 10), ('b', 5)])
z 2 deque([])
[53]:
c.work()
DEBUG: Employee x gives task ('d', 1) to employee y
DEBUG: Employee y executes task ('a', 3)
DEBUG: Total performed work this step: ['a']
[53]:
['a']
[54]:
c
[54]:
Company:
name rank tasks
x 9 deque([('e', 8), ('g', 8)])
y 6 deque([('h', 10), ('b', 5), ('d', 1)])
z 2 deque([])
[55]:
c.work()
DEBUG: Employee x executes task ('e', 8)
DEBUG: Employee y gives task ('h', 10) to employee x
DEBUG: Total performed work this step: ['e']
[55]:
['e']
[56]:
c
[56]:
Company:
name rank tasks
x 9 deque([('g', 8), ('h', 10)])
y 6 deque([('b', 5), ('d', 1)])
z 2 deque([])
[57]:
c.work()
DEBUG: Employee x executes task ('g', 8)
DEBUG: Employee y executes task ('b', 5)
DEBUG: Total performed work this step: ['g', 'b']
[57]:
['g', 'b']
[58]:
c
[58]:
Company:
name rank tasks
x 9 deque([('h', 10)])
y 6 deque([('d', 1)])
z 2 deque([])
[59]:
c.work()
DEBUG: Employee x executes task ('h', 10)
DEBUG: Employee y gives task ('d', 1) to employee z
DEBUG: Employee z executes task ('d', 1)
DEBUG: Total performed work this step: ['h', 'd']
[59]:
['h', 'd']
[60]:
c
[60]:
Company:
name rank tasks
x 9 deque([])
y 6 deque([])
z 2 deque([])
Now implement this method:
def work(self):
    """ Performs a work step and RETURN a list of performed task names.

        For each employee, dequeue its current task from the left and:
        - if the task rank is greater than the rank of the
          current employee, append the task to his supervisor queue
          (the highest ranking employee must execute the task)
        - if the task rank is lower or equal to the rank of the
          next lower ranking employee, append the task to that employee
          queue
        - otherwise, add the task name to the list of
          performed tasks to return
    """
Testing: python3 -m unittest queue_test.WorkTest
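Below is a minimal sketch of one work step written as a plain function over the employees list. This is a hypothetical simplification of the method, and it prints no DEBUG lines. Employees are processed in order within the same step, so a task just handed down can be handled by the lower employee in that same step, as in the trace above where y gives ('f', 2) to z and z executes it immediately.

```python
from collections import deque


def work(employees):
    """Hypothetical function version of Company.work. `employees` is the
    list of {'name', 'rank', 'tasks'} dicts, highest rank first."""
    done = []
    for i, emp in enumerate(employees):
        if not emp['tasks']:
            continue                            # nothing to do this step
        task = emp['tasks'].popleft()           # tasks leave from the left
        task_name, task_rank = task
        if task_rank > emp['rank'] and i > 0:
            # too hard: hand it to the supervisor (next higher rank)
            employees[i - 1]['tasks'].append(task)
        elif i + 1 < len(employees) and task_rank <= employees[i + 1]['rank']:
            # easy enough for the next lower ranking employee
            employees[i + 1]['tasks'].append(task)
        else:
            done.append(task_name)
    return done


employees = [
    {'name': 'x', 'rank': 9, 'tasks': deque([('a', 3), ('b', 5), ('c', 12), ('d', 1)])},
    {'name': 'y', 'rank': 6, 'tasks': deque([('e', 8), ('f', 2), ('g', 8)])},
    {'name': 'z', 'rank': 2, 'tasks': deque([('h', 10)])},
]
steps = [work(employees) for _ in range(7)]
print(steps)   # [[], ['f'], ['c'], ['a'], ['e'], ['g', 'b'], ['h', 'd']]
```

Running the seven steps on the example company reproduces the work lists shown in the transcript above.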
B3 GenericTree¶
B3.1 fill_left¶
Open tree_exercise.py and implement the fill_left method:
def fill_left(self, stuff):
    """ MODIFIES the tree by filling the leftmost branch data
        with values from provided array 'stuff'

        - if there aren't enough nodes to fill, raise ValueError
        - root data is not modified
        - *DO NOT* use recursion
    """
Testing: python3 -m unittest tree_test.FillLeftTest
Example:
[61]:
from tree_test import gt
from tree_solution import *
[62]:
t = gt('a',
        gt('b',
            gt('e',
                gt('f'),
                gt('g',
                    gt('i')),
                gt('h')),
            gt('c'),
            gt('d')))
[63]:
print(t)
a
└b
 ├e
 │├f
 │├g
 ││└i
 │└h
 ├c
 └d
[64]:
t.fill_left(['x','y'])
[65]:
print(t)
a
└x
 ├y
 │├f
 │├g
 ││└i
 │└h
 ├c
 └d
[66]:
t.fill_left(['W','V','T'])
print(t)
a
└W
 ├V
 │├T
 │├g
 ││└i
 │└h
 ├c
 └d
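Below is a minimal non-recursive sketch of fill_left on a hypothetical left-child / right-sibling GenericTree with `_data`, `_child` and `_sibling` attributes. The real class may name them differently. It collects the branch nodes first, so that on ValueError the tree is left untouched. The docstring does not require this, so treat it as a design choice. The leftmost helper exists only for checking results and is also hypothetical.

```python
class GenericTree:
    """Hypothetical left-child / right-sibling tree (attribute names assumed)."""

    def __init__(self, data, *children):
        self._data = data
        self._child = None      # first (leftmost) child
        self._sibling = None    # next sibling to the right
        # prepend children in reverse, so the sibling chain keeps their order
        for c in reversed(children):
            c._sibling = self._child
            self._child = c

    def fill_left(self, stuff):
        """Overwrite the leftmost branch below the root with `stuff`, iteratively."""
        nodes = []
        node = self._child                      # root data is not modified
        while node is not None and len(nodes) < len(stuff):
            nodes.append(node)
            node = node._child
        if len(nodes) < len(stuff):
            raise ValueError("Only %s nodes to fill, need %s" % (len(nodes), len(stuff)))
        for n, value in zip(nodes, stuff):
            n._data = value

    def leftmost(self):
        """Hypothetical helper for checking results: data on the leftmost branch."""
        ret = []
        node = self
        while node is not None:
            ret.append(node._data)
            node = node._child
        return ret


gt = GenericTree
t = gt('a', gt('b', gt('e', gt('f'), gt('g', gt('i')), gt('h')), gt('c'), gt('d')))
t.fill_left(['x', 'y'])
print(t.leftmost())   # ['a', 'x', 'y', 'f']
```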
B3.2 follow¶
Open tree_exercise.py and implement the follow method:
def follow(self, positions):
    """
    RETURN an array of node data, representing a branch from the
    root down to a certain depth.
    The path to follow is determined by given positions, which
    is an array of integer indices, see example.

    - if provided indices lead to non-existing nodes, raise ValueError
    - IMPORTANT: *DO NOT* use recursion, use a couple of while loops instead.
    - IMPORTANT: *DO NOT* attempt to convert siblings to
      a python list !!!! Doing so will give you fewer points!
    """
Testing: python3 -m unittest tree_test.FollowTest
Example:
level 01234

      a
      ├b
      ├c
      |└e
      | ├f
      | ├g
      | |└i
      | └h
      └d

                          RETURNS
t.follow([])              [a]          root data is always present
t.follow([0])             [a,b]        b is the 0-th child of a
t.follow([2])             [a,d]        d is the 2-nd child of a
t.follow([1,0,2])         [a,c,e,h]    c is the 1-st child of a
                                       e is the 0-th child of c
                                       h is the 2-nd child of e
t.follow([1,0,1,0])       [a,c,e,g,i]  c is the 1-st child of a
                                       e is the 0-th child of c
                                       g is the 1-st child of e
                                       i is the 0-th child of g
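Below is a minimal sketch of follow with two while loops and no recursion, on a hypothetical left-child / right-sibling GenericTree with `_data`, `_child` and `_sibling` attributes. The real class may name them differently. The inner loop walks the sibling chain directly instead of converting siblings to a Python list.

```python
class GenericTree:
    """Hypothetical left-child / right-sibling tree (attribute names assumed)."""

    def __init__(self, data, *children):
        self._data = data
        self._child = None      # first (leftmost) child
        self._sibling = None    # next sibling to the right
        # prepend children in reverse, so the sibling chain keeps their order
        for c in reversed(children):
            c._sibling = self._child
            self._child = c

    def follow(self, positions):
        """RETURN the data on the branch selected by positions, iteratively."""
        ret = [self._data]
        node = self
        j = 0
        while j < len(positions):               # one level per position
            pos = positions[j]
            if pos < 0:
                raise ValueError("Invalid position: %s" % pos)
            child = node._child
            i = 0
            while child is not None and i < pos:    # step along the siblings
                child = child._sibling
                i += 1
            if child is None:
                raise ValueError("No child at position %s" % pos)
            ret.append(child._data)
            node = child
            j += 1
        return ret


gt = GenericTree
t = gt('a',
       gt('b'),
       gt('c',
          gt('e',
             gt('f'),
             gt('g', gt('i')),
             gt('h'))),
       gt('d'))
print(t.follow([1, 0, 2]))   # ['a', 'c', 'e', 'h']
```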
Exam - Monday 10, June 2019 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. array of fixed size 2), commenting why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation

def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla

# Called by f, will be graded:
def my_f(y,z):
    # bla

def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice, you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler), or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-06-10-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2019-06-10-FIRSTNAME-LASTNAME-ID
    |-jupman.py
    |-sciprog.py
    |-other stuff ...
    |-exams
        |-2019-06-10
            |- exam-2019-06-10-exercise.ipynb
            |- stack_exercise.py
            |- stack_test.py
            |- tree_exercise.py
            |- tree_test.py
Rename the datasciprolab-2019-06-10-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-06-10-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2019-06-10-exercise.ipynb
A1 ITEA real estate¶
You will now analyze public real estate properties in Trentino, which are managed by the ITEA agency. Every property has a type, and we will find the type distribution.
Data provider: ITEA - dati.trentino.it
A function load_itea is given to load the dataset (you don’t need to implement it):
[2]:
def load_itea():
    """Loads file data and RETURN a list of dictionaries with the ITEA real estate records
    """
    import csv
    with open('data/itea.csv', newline='', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=';')
        lst = []
        for d in reader:
            lst.append(d)
    return lst
itea = load_itea()
IMPORTANT: look at the dataset by yourself !
Here we show only the first 5 rows, but to get a clear picture of the dataset you need to study it a bit by yourself.
[3]:
itea[:5]
[3]:
[OrderedDict([('Tipologia', 'ALTRO'),
('Proprietà', 'ITEA'),
('Indirizzo', "Codice unita': 30100049"),
('Frazione', ''),
('Comune', "BASELGA DI PINE'")]),
OrderedDict([('Tipologia', 'ALLOGGIO'),
('Proprietà', 'ITEA'),
('Indirizzo', "Codice unita': 43100011"),
('Frazione', ''),
('Comune', 'TRENTO')]),
OrderedDict([('Tipologia', 'ALLOGGIO'),
('Proprietà', 'ITEA'),
('Indirizzo', "Codice unita': 43100002"),
('Frazione', ''),
('Comune', 'TRENTO')]),
OrderedDict([('Tipologia', 'ALLOGGIO'),
('Proprietà', 'ITEA'),
('Indirizzo', 'VIALE DELLE ROBINIE 26'),
('Frazione', ''),
('Comune', 'TRENTO')]),
OrderedDict([('Tipologia', 'ALLOGGIO'),
('Proprietà', 'ITEA'),
('Indirizzo', 'VIALE DELLE ROBINIE 26'),
('Frazione', ''),
('Comune', 'TRENTO')])]
A1.1 calc_types_hist¶
Implement function calc_types_hist to extract the types ('Tipologia') of ITEA real estate and RETURN a histogram which associates to each type its frequency.
You will discover there are three types of apartments: ‘ALLOGGIO’, ‘ALLOGGIO DUPLEX’ and ‘ALLOGGIO MONOLOCALE’. In the resulting histogram you must place only the key ‘ALLOGGIO’ which will be the sum of all of them.
The same goes for ‘POSTO MACCHINA’ (parking space): there are many of them (‘POSTO MACCHINA COMUNE ESTERNO’, ‘POSTO MACCHINA COMUNE INTERNO’, ‘POSTO MACCHINA ESTERNO’, ‘POSTO MACCHINA INTERNO’, ‘POSTO MACCHINA SOTTO TETTOIA’), but we only want to see ‘POSTO MACCHINA’ as key, with the sum of all of them. NOTE: please don’t use 5 ifs; try to come up with some generic code to catch all these cases.
[4]:
def calc_types_hist(db):
    #jupman-raise
    tipologie = {}
    for diz in db:
        if diz['Tipologia'].startswith('ALLOGGIO'):
            chiave = 'ALLOGGIO'
        elif diz['Tipologia'].startswith('POSTO MACCHINA'):
            chiave = 'POSTO MACCHINA'
        else:
            chiave = diz['Tipologia']
        if chiave in tipologie:
            tipologie[chiave] += 1
        else:
            tipologie[chiave] = 1
    return tipologie
    #/jupman-raise
calc_types_hist(itea)
[4]:
{'ALTRO': 64,
'ALLOGGIO': 10778,
'POSTO MACCHINA': 3147,
'MAGAZZINO': 143,
'CABINA ELETTRICA': 41,
'LOCALE COMUNE': 28,
'NEGOZIO': 139,
'CANTINA': 40,
'GARAGE': 2221,
'CENTRALE TERMICA': 4,
'UFFICIO': 29,
'TETTOIA': 2,
'ARCHIVIO ITEA': 10,
'SALA / ATTIVITA SOCIALI': 45,
'AREA URBANA': 6,
'ASILO': 1,
'CASERMA': 2,
'LABORATORIO PER ARTI E MESTIERI': 3,
'MUSEO': 1,
'SOFFITTA': 3,
'AMBULATORIO': 1,
'LEGNAIA': 3,
'RUDERE': 1}
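The NOTE above asks for generic code instead of one if per subtype. Here is an alternative sketch, not the official solution. It keeps the prefixes to collapse in a tuple and delegates the counting to the standard collections.Counter. The PREFIXES tuple and the helper names normalize_type and calc_types_hist_counter are hypothetical, and the prefixes are taken from the two families described in the exercise.

```python
from collections import Counter

# Families whose sub-types are collapsed into one key (assumed from the exercise text)
PREFIXES = ('ALLOGGIO', 'POSTO MACCHINA')

def normalize_type(tipologia):
    """Return the family prefix if tipologia starts with one of PREFIXES, else tipologia."""
    for prefix in PREFIXES:
        if tipologia.startswith(prefix):
            return prefix
    return tipologia

def calc_types_hist_counter(db):
    """Hypothetical alternative to calc_types_hist based on collections.Counter."""
    return dict(Counter(normalize_type(d['Tipologia']) for d in db))

sample = [{'Tipologia': 'ALLOGGIO DUPLEX'}, {'Tipologia': 'ALLOGGIO'},
          {'Tipologia': 'POSTO MACCHINA INTERNO'}, {'Tipologia': 'GARAGE'}]
print(calc_types_hist_counter(sample))
# {'ALLOGGIO': 2, 'POSTO MACCHINA': 1, 'GARAGE': 1}
```

To support a new family, you only need to add one more string to PREFIXES.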
A1.2 calc_types_series¶
Implement function calc_types_series, which takes a dictionary histogram and RETURN a list of tuples containing key/value pairs, sorted from most frequent items to least frequent. Note the solution below keeps only the 10 most frequent items.
HINT: if you don’t remember how to sort by an element of a tuple, look at this example and also at the Python documentation about sorting.
[5]:
def calc_types_series(hist):
    #jupman-raise
    ret = []
    for key in hist:
        ret.append((key, hist[key]))
    ret.sort(key=lambda c: c[1], reverse=True)
    return ret[:10]
    #/jupman-raise
tipologie = calc_types_series(calc_types_hist(itea))
tipologie
[5]:
[('ALLOGGIO', 10778),
('POSTO MACCHINA', 3147),
('GARAGE', 2221),
('MAGAZZINO', 143),
('NEGOZIO', 139),
('ALTRO', 64),
('SALA / ATTIVITA SOCIALI', 45),
('CABINA ELETTRICA', 41),
('CANTINA', 40),
('UFFICIO', 29)]
A1.3 Real estates plot¶
Once you have obtained the series as above, plot the 10 most frequent items, in decreasing order.
Please pay attention to the plot title, width and height, and axis labels. Everything MUST display in a readable way.
Also try to print the labels nicely: if they are too long or overlap, as for ‘SALA / ATTIVITA SOCIALI’, insert line breaks in a generic way.
[6]:
# write here
[7]:
# SOLUTION
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(len(tipologie))
xs_labels = [t[0].replace('/', '/\n') for t in tipologie]
ys = [t[1] for t in tipologie]
fig = plt.figure(figsize=(15,5))
plt.bar(xs, ys, 0.5, align='center')
plt.title("ITEA real estates SOLUTION")
plt.xticks(xs, xs_labels)
plt.xlabel('name')
plt.ylabel('quantity')
plt.show()

A2 Air quality¶
You will now analyze air quality in Trentino. You are given a dataset which records various pollutants (‘Inquinante’) at various stations ('Stazione') in Trentino. Pollutants can be 'PM10', 'Biossido Zolfo' and a few others. Each station records some set of pollutants. For each pollutant, values ('Valore') are recorded 24 times per day.
Data provider: PAT Ag. Provinciale per la protezione dell’Ambiente - dati.trentino.it
A function load_air_quality is given to load the dataset (you don’t need to implement it):
[8]:
def load_air_quality():
    """Loads file data and RETURN a list of dictionaries with the air quality measurements
    """
    import csv
    with open('data/air-quality.csv', newline='', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)
        lst = []
        for d in reader:
            lst.append(d)
    return lst
air_quality = load_air_quality()
IMPORTANT 1: look at the dataset by yourself !
Here we show only the first 5 rows, but to get a clear picture of the dataset you need to study it a bit by yourself.
IMPORTANT 2: EVERY field is a STRING, including ‘Valore’ !
[9]:
air_quality[:5]
[9]:
[OrderedDict([('Stazione', 'Parco S. Chiara'),
('Inquinante', 'PM10'),
('Data', '2019-05-04'),
('Ora', '1'),
('Valore', '17'),
('Unità di misura', 'µg/mc')]),
OrderedDict([('Stazione', 'Parco S. Chiara'),
('Inquinante', 'PM10'),
('Data', '2019-05-04'),
('Ora', '2'),
('Valore', '19'),
('Unità di misura', 'µg/mc')]),
OrderedDict([('Stazione', 'Parco S. Chiara'),
('Inquinante', 'PM10'),
('Data', '2019-05-04'),
('Ora', '3'),
('Valore', '17'),
('Unità di misura', 'µg/mc')]),
OrderedDict([('Stazione', 'Parco S. Chiara'),
('Inquinante', 'PM10'),
('Data', '2019-05-04'),
('Ora', '4'),
('Valore', '15'),
('Unità di misura', 'µg/mc')]),
OrderedDict([('Stazione', 'Parco S. Chiara'),
('Inquinante', 'PM10'),
('Data', '2019-05-04'),
('Ora', '5'),
('Valore', '13'),
('Unità di misura', 'µg/mc')])]
Now implement the following function:
[10]:
def calc_avg_pollution(db):
    """ RETURN a dictionary containing two-element tuples as keys:
        - first tuple element is the station ('Stazione'),
        - second tuple element is the name of a pollutant ('Inquinante')
        To each tuple key, you must associate as value the average for that station
        _and_ pollutant over all days.
    """
    #jupman-raise
    ret = {}      # running sum of values for each (station, pollutant)
    counts = {}   # number of measurements for each (station, pollutant)
    for diz in db:
        t = (diz['Stazione'], diz['Inquinante'])
        if t in ret:
            ret[t] += float(diz['Valore'])
            counts[t] += 1
        else:
            ret[t] = float(diz['Valore'])
            counts[t] = 1
    for t in ret:
        ret[t] /= counts[t]
    return ret
    #/jupman-raise
calc_avg_pollution(air_quality)
[10]:
{('Parco S. Chiara', 'PM10'): 11.385752688172044,
('Parco S. Chiara', 'PM2.5'): 7.9471544715447155,
('Parco S. Chiara', 'Biossido di Azoto'): 20.828146143437078,
('Parco S. Chiara', 'Ozono'): 66.69541778975741,
('Parco S. Chiara', 'Biossido Zolfo'): 1.2918918918918918,
('Via Bolzano', 'PM10'): 12.526881720430108,
('Via Bolzano', 'Biossido di Azoto'): 29.28493894165536,
('Via Bolzano', 'Ossido di Carbonio'): 0.5964769647696474,
('Piana Rotaliana', 'PM10'): 9.728744939271255,
('Piana Rotaliana', 'Biossido di Azoto'): 15.170068027210885,
('Piana Rotaliana', 'Ozono'): 67.03633916554509,
('Rovereto', 'PM10'): 9.475806451612904,
('Rovereto', 'PM2.5'): 7.764784946236559,
('Rovereto', 'Biossido di Azoto'): 16.284167794316645,
('Rovereto', 'Ozono'): 70.54655870445345,
('Borgo Valsugana', 'PM10'): 11.819407008086253,
('Borgo Valsugana', 'PM2.5'): 7.413746630727763,
('Borgo Valsugana', 'Biossido di Azoto'): 15.73806275579809,
('Borgo Valsugana', 'Ozono'): 58.599730458221025,
('Riva del Garda', 'PM10'): 9.912398921832883,
('Riva del Garda', 'Biossido di Azoto'): 17.125845737483086,
('Riva del Garda', 'Ozono'): 68.38159675236807,
('A22 (Avio)', 'PM10'): 9.651821862348179,
('A22 (Avio)', 'Biossido di Azoto'): 33.0650406504065,
('A22 (Avio)', 'Ossido di Carbonio'): 0.4228848821081822,
('Monte Gaza', 'PM10'): 7.794520547945205,
('Monte Gaza', 'Biossido di Azoto'): 4.34412955465587,
('Monte Gaza', 'Ozono'): 99.0858310626703}
Part B¶
B1 Theory¶
Let L be a list containing n lists, each of them of size m. Return the computational complexity of function fun() with respect to n and m.
Write the solution in a separate theory.txt file
def fun(L):
    for r1 in L:
        for r2 in L:
            if r1 != r2 and sum(r1) == sum(r2):
                print("Similar:")
                print(r1)
                print(r2)
ANSWER: \(\Theta(m \cdot n^2 )\)
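One way to justify this answer, as a sketch: each of the n · n iterations of the two loops costs Θ(m). If r1 != r2 is true, both sums scan m elements. If it is false, the two lists are equal, so the comparison itself had to scan all m elements. Printing a list of size m is also O(m).

```latex
T(n, m) = \sum_{r_1 \in L} \sum_{r_2 \in L} \Theta(m) = n \cdot n \cdot \Theta(m) = \Theta(m \cdot n^2)
```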
B2 WStack¶
Using a text editor, open file stack_exercise.py. You will find a WStack class skeleton which represents a simple stack that can only contain integers.
B2.1 implement class WStack¶
Fill in the missing methods in class WStack, in the order they are presented, so as to have a .weight() method that returns the total sum of the integers in the stack in O(1) time.
Example:
[11]:
from stack_solution import *
[12]:
s = WStack()
[13]:
print(s)
WStack: weight=0 elements=[]
[14]:
s.push(7)
[15]:
print(s)
WStack: weight=7 elements=[7]
[16]:
s.push(4)
[17]:
print(s)
WStack: weight=11 elements=[7, 4]
[18]:
s.push(2)
[19]:
s.pop()
[19]:
2
[20]:
print(s)
WStack: weight=11 elements=[7, 4]
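A minimal sketch of how the O(1) weight can work: keep a running sum that push and pop update, so weight() never has to scan the elements. The real skeleton in stack_exercise.py may differ. The methods peek, size and is_empty, the internal field names, and the exceptions raised on bad input or an empty stack are assumptions. Only push, pop, weight and the printed format come from the example above.

```python
class WStack:
    """Sketch of a stack of integers with an O(1) weight() method."""

    def __init__(self):
        self._elements = []
        self._weight = 0          # running sum, updated by push and pop

    def push(self, item):
        if not isinstance(item, int):
            raise ValueError("WStack can only contain integers, got %r" % (item,))
        self._elements.append(item)
        self._weight += item

    def pop(self):
        if not self._elements:
            raise IndexError("pop from empty WStack")
        item = self._elements.pop()
        self._weight -= item
        return item

    def peek(self):
        if not self._elements:
            raise IndexError("peek on empty WStack")
        return self._elements[-1]

    def size(self):
        return len(self._elements)

    def is_empty(self):
        return not self._elements

    def weight(self):
        return self._weight       # O(1): no need to sum the elements

    def __str__(self):
        return "WStack: weight=%d elements=%s" % (self._weight, self._elements)


s = WStack()
s.push(7)
s.push(4)
s.push(2)
s.pop()
print(s)   # WStack: weight=11 elements=[7, 4]
```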
B2.2 accumulate¶
Implement function accumulate:
def accumulate(stack1, stack2, min_amount):
    """ Pushes on stack2 elements taken from stack1 until the weight of
        stack2 is equal or exceeds the given min_amount
        - if the given min_amount cannot possibly be reached because
          stack1 has not enough weight, raises early ValueError without
          changing stack1.
        - DO NOT access internal fields of stacks, only use class methods.
        - MUST perform in O(n) where n is the size of stack1
        - NOTE: this function is defined *outside* the class !
    """
Testing: python -m unittest stack_test.AccumulateTest
Example:
[21]:
s1 = WStack()
print(s1)
WStack: weight=0 elements=[]
[22]:
s1.push(2)
s1.push(9)
s1.push(5)
s1.push(3)
[23]:
print(s1)
WStack: weight=19 elements=[2, 9, 5, 3]
[24]:
s2 = WStack()
print(s2)
WStack: weight=0 elements=[]
[25]:
s2.push(1)
s2.push(7)
s2.push(4)
[26]:
print(s2)
WStack: weight=12 elements=[1, 7, 4]
[27]:
# attempts to reach in s2 a weight of at least 17
[28]:
accumulate(s1,s2,17)
[29]:
print(s1)
WStack: weight=11 elements=[2, 9]
Two top elements were taken from s1 and now s2 has a weight of 20, which is >= 17
[30]:
print(s2)
WStack: weight=20 elements=[1, 7, 4, 3, 5]
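A possible sketch of accumulate follows. Thanks to the O(1) weight(), the early check costs constant time. After that, each loop iteration moves one element, so the loop runs at most n times. The sketch assumes the stack holds non-negative integers, as the word "weight" suggests. With negative values, a partial transfer could reach min_amount even when the total does not, and this check would then be too strict. A compact copy of WStack is repeated here only so the block runs on its own.

```python
class WStack:
    """Compact stand-in for WStack, repeated so this block is self-contained."""

    def __init__(self):
        self._elements = []
        self._weight = 0

    def push(self, item):
        self._elements.append(item)
        self._weight += item

    def pop(self):
        item = self._elements.pop()
        self._weight -= item
        return item

    def weight(self):
        return self._weight

    def __str__(self):
        return "WStack: weight=%d elements=%s" % (self._weight, self._elements)


def accumulate(stack1, stack2, min_amount):
    """Sketch: move top elements of stack1 onto stack2 until stack2.weight() >= min_amount."""
    # Early O(1) check: even moving everything would not be enough, so stack1 is left untouched
    total = stack1.weight() + stack2.weight()
    if total < min_amount:
        raise ValueError("Cannot reach %s, total weight is only %s" % (min_amount, total))
    # Each iteration moves one element, so at most n iterations: O(n)
    while stack2.weight() < min_amount:
        stack2.push(stack1.pop())


s1 = WStack()
for x in [2, 9, 5, 3]:
    s1.push(x)
s2 = WStack()
for x in [1, 7, 4]:
    s2.push(x)
accumulate(s1, s2, 17)
print(s1)   # WStack: weight=11 elements=[2, 9]
print(s2)   # WStack: weight=20 elements=[1, 7, 4, 3, 5]
```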
B3 GenericTree¶
Open file tree_exercise.py in a text editor and read the following instructions.
B3.1 is_triangle¶
A triangle is a node which has exactly two children.
Let’s see some examples:
          a
        /   \
       /     \
      b ----- c
     /|\     /
    d-e-f   g
           / \
          h---i
              /
             l
The tree above can also be represented like this:
a
├b
|├d
|├e
|└f
└c
 └g
  ├h
  └i
   └l
node a is a triangle because it has exactly two children b and c (note it doesn’t matter if b or c have children)
b is not a triangle (it has 3 children)
c and i are not triangles (they have only 1 child)
g is a triangle as it has exactly two children h and i
d, e, f, h and l are not triangles, because they have zero children
Now implement this method:
def is_triangle(self, elems):
    """ RETURN True if this node is a triangle matching the data
        given by list elems.
        In order to match:
        - first list item must be equal to this node data
        - second list item must be equal to this node first child data
        - third list item must be equal to this node second child data
        - if elems has less than three elements, raises ValueError
    """
Testing: python -m unittest tree_test.IsTriangleTest
Examples:
[31]:
from tree_test import gt
[32]:
# this is the tree from the example above
tb = gt('b', gt('d'), gt('e'), gt('f'))
tg = gt('g', gt('h'), gt('i', gt('l')))
ta = gt('a', tb, gt('c', tg))
ta.is_triangle(['a','b','c'])
[32]:
True
[33]:
ta.is_triangle(['b','c','a'])
[33]:
False
[34]:
tb.is_triangle(['b','d','e'])
[34]:
False
[35]:
tg.is_triangle(['g','h','i'])
[35]:
True
[36]:
tg.is_triangle(['g','i','h'])
[36]:
False
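A minimal sketch of is_triangle on a simplified tree follows. The real GenericTree built by gt in tree_test may store children differently (for example with child/sibling links). Here each node is assumed to keep its ordered children in a plain Python list, and gt below is a hypothetical stand-in with the same call shape as in the examples.

```python
class GenericTree:
    """Simplified stand-in for the exam's GenericTree: children in an ordered list."""

    def __init__(self, data, *children):
        self.data = data
        self.children = list(children)

    def is_triangle(self, elems):
        """RETURN True if this node has exactly two children and its data and
        its children's data match elems[0], elems[1], elems[2]."""
        if len(elems) < 3:
            raise ValueError("elems must have at least 3 items, got %s" % len(elems))
        return (len(self.children) == 2
                and self.data == elems[0]
                and self.children[0].data == elems[1]
                and self.children[1].data == elems[2])


def gt(data, *children):
    """Hypothetical shortcut mirroring gt from tree_test."""
    return GenericTree(data, *children)


# tree from the example above: b has three children d, e, f
tb = gt('b', gt('d'), gt('e'), gt('f'))
tg = gt('g', gt('h'), gt('i', gt('l')))
ta = gt('a', tb, gt('c', tg))
print(ta.is_triangle(['a', 'b', 'c']))   # True
print(tb.is_triangle(['b', 'd', 'e']))   # False: b has three children
```

The method checks the number of children first, so it never indexes a missing second child.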
B3.2 has_triangle¶
Implement this method:
def has_triangle(self, elems):
    """ RETURN True if this node *or one of its descendants* is a triangle
        matching given elems. Otherwise, return False.
        - a recursive solution is acceptable
    """
Testing: python -m unittest tree_test.HasTriangleTest
Examples:
[37]:
# example tree seen at the beginning
tb = gt('b', gt('d'), gt('e'), gt('f'))
tg = gt('g', gt('h'), gt('i', gt('l')))
tc = gt('c', tg)
ta = gt('a', tb, tc)
ta.has_triangle(['a','b','c'])
[37]:
True
[38]:
ta.has_triangle(['a','c','b'])
[38]:
False
[39]:
ta.has_triangle(['b','c','a'])
[39]:
False
[40]:
tb.has_triangle(['b','d','e'])
[40]:
False
[41]:
tg.has_triangle(['g','h','i'])
[41]:
True
[42]:
tc.has_triangle(['g','h','i']) # check recursion
[42]:
True
[43]:
ta.has_triangle(['g','h','i']) # check recursion
[43]:
True
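A recursive sketch of has_triangle follows, on the same simplified list-of-children tree used for is_triangle. That tree representation and the gt helper are assumptions, not the exam's classes. A node matches if it is itself a matching triangle. Otherwise the search continues into each child, and any() stops at the first success.

```python
class GenericTree:
    """Simplified stand-in for the exam's GenericTree: children in an ordered list."""

    def __init__(self, data, *children):
        self.data = data
        self.children = list(children)

    def is_triangle(self, elems):
        if len(elems) < 3:
            raise ValueError("elems must have at least 3 items, got %s" % len(elems))
        return (len(self.children) == 2
                and self.data == elems[0]
                and self.children[0].data == elems[1]
                and self.children[1].data == elems[2])

    def has_triangle(self, elems):
        """RETURN True if this node or one of its descendants is a matching triangle."""
        if self.is_triangle(elems):
            return True
        # recurse into the children; any() stops at the first True
        return any(child.has_triangle(elems) for child in self.children)


def gt(data, *children):
    """Hypothetical shortcut mirroring gt from tree_test."""
    return GenericTree(data, *children)


tb = gt('b', gt('d'), gt('e'), gt('f'))
tg = gt('g', gt('h'), gt('i', gt('l')))
tc = gt('c', tg)
ta = gt('a', tb, tc)
print(ta.has_triangle(['g', 'h', 'i']))   # True: found two levels below the root
```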
Exam - Tue 02, July 2019 - solutions¶
Scientific Programming - Data Science Master @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting on why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...

def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...

# Called by f, will be graded:
def my_f(y, z):
    # bla
    ...

def f(x):
    my_f(x, 5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler), or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-07-02-exam.zip and extract it on your desktop. The folder content should look like this:
datasciprolab-2019-07-02-FIRSTNAME-LASTNAME-ID
   |-jupman.py
   |-sciprog.py
   |-other stuff ...
   |-exams
      |-2019-07-02
           |- exam-2019-07-02-exercise.ipynb
           |- theory.txt
           |- linked_sort_exercise.py
           |- linked_sort_test.py
           |- stacktris_exercise.py
           |- stacktris_test.py
Rename the datasciprolab-2019-07-02-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-07-02-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2019-07-02-exercise.ipynb
A1 Botteghe storiche¶
You will work on the dataset of “Botteghe storiche del Trentino” (historic small shops and workshops of Trentino).
Data provider: Provincia Autonoma di Trento - dati.trentino.it
A function load_botteghe is given to load the dataset (you don’t need to implement it):
[2]:
def load_botteghe():
    """Loads file data and RETURN a list of dictionaries with the botteghe data
    """
    import csv
    with open('data/botteghe.csv', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        lst = []
        for d in reader:
            lst.append(d)
    return lst
botteghe = load_botteghe()
IMPORTANT: look at the dataset !
Here we show only first 5 rows, but to get a clear picture of the dataset you should explore it further.
[3]:
botteghe[:5]
[3]:
[OrderedDict([('Numero', '1'),
('Insegna', 'BAZZANELLA RENATA'),
('Indirizzo', 'Via del Lagorai'),
('Civico', '30'),
('Comune', 'Sover'),
('Cap', '38068'),
('Frazione/Località', 'Piscine di Sover'),
('Note', 'generi misti, bar - ristorante')]),
OrderedDict([('Numero', '2'),
('Insegna', 'CONFEZIONI MONTIBELLER S.R.L.'),
('Indirizzo', 'Corso Ausugum'),
('Civico', '48'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale')]),
OrderedDict([('Numero', '3'),
('Insegna', 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.'),
('Indirizzo', 'Largo Dordi'),
('Civico', '8'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale, attività artigianale')]),
OrderedDict([('Numero', '4'),
('Insegna', 'BAR SERAFINI DI MINATI RENZO'),
('Indirizzo', ''),
('Civico', '24'),
('Comune', 'Grigno'),
('Cap', '38055'),
('Frazione/Località', 'Serafini'),
('Note', 'esercizio commerciale')]),
OrderedDict([('Numero', '6'),
('Insegna', 'SEMBENINI GINO & FIGLI S.R.L.'),
('Indirizzo', 'Via S. Francesco'),
('Civico', '35'),
('Comune', 'Riva del Garda'),
('Cap', '38066'),
('Frazione/Località', ''),
('Note', '')])]
We would like to know which different categories of bottega there are, and count them. Unfortunately, there is no specific field for Categoria, so we will need to extract this information from other fields such as Insegna and Note. For example, this Insegna contains the category BAR, while the Note (commercial enterprise) is a bit too generic to be useful:
'Insegna': 'BAR SERAFINI DI MINATI RENZO',
'Note': 'esercizio commerciale',
while this other Insegna contains just the owner name, and Note holds both the categories bar and ristorante:
'Insegna': 'BAZZANELLA RENATA',
'Note': 'generi misti, bar - ristorante',
As you see, data is non-uniform:
sometimes the category is in the Insegna
sometimes it is in the Note
sometimes it is in both
sometimes it is lowercase
sometimes it is uppercase
sometimes it is single
sometimes it is multiple (bar - ristorante)
First we want to extract all the categories we can find, and rank them according to their frequency, from most frequent to least frequent.
To do so, you need to:
count all the words you can find in both the Insegna and Note fields, and sort them. Note you need to normalize the case to uppercase.
consider a category relevant if it is present at least 11 times in the dataset.
filter out non-relevant words: some words like prepositions, company types ('S.N.C.', 'S.R.L.', ...), etc. will appear a lot and need to be ignored. To detect them, you are given a list called stopwords.
NOTE: the rules above do not actually extract all the categories, for the sake of the exercise we only keep the most frequent ones.
A1.1 rank_categories¶
[4]:
def rank_categories(db, stopwords):
    #jupman-raise
    ret = {}
    for diz in db:
        parole = diz['Insegna'].split(" ") + diz['Note'].upper().split(" ")
        for parola in parole:
            if parola in stopwords:
                continue    # skip prepositions, company types, etc.
            if parola in ret:
                ret[parola] += 1
            else:
                ret[parola] = 1
    # keep only words present at least 11 times, most frequent first
    return sorted([(key, val) for key, val in ret.items() if val > 10], key=lambda c: c[1], reverse=True)
    #/jupman-raise
stopwords = ['',
'S.N.C.', 'SNC','S.A.S.', 'S.R.L.', 'S.C.A.R.L.', 'SCARL','S.A.S', 'COMMERCIALE','FAMIGLIA','COOPERATIVA',
'-', '&', 'C.', 'ESERCIZIO',
'IL', 'DE', 'DI','A', 'DA', 'E', 'LA', 'AL', 'DEL', 'ALLA', ]
categories = rank_categories(botteghe, stopwords)
categories
[4]:
[('BAR', 191),
('RISTORANTE', 150),
('HOTEL', 67),
('ALBERGO', 64),
('MACELLERIA', 27),
('PANIFICIO', 22),
('CALZATURE', 21),
('FARMACIA', 21),
('ALIMENTARI', 20),
('PIZZERIA', 16),
('SPORT', 16),
('TABACCHI', 12),
('FERRAMENTA', 12),
('BAZAR', 11)]
A1.2 plot¶
Now plot the 10 most frequent categories. Please pay attention to plot title, width and height, axis labels. Everything MUST display in a readable way.
[5]:
# write here
[6]:
# SOLUTION
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
cats = categories[:10]
xs = np.arange(len(cats))
xs_labels = [t[0] for t in cats]
ys = [t[1] for t in cats]
fig = plt.figure(figsize=(15,5))
plt.bar(xs, ys, 0.5, align='center')
plt.title("Categorie botteghe storiche SOLUTION")
plt.xticks(xs, xs_labels)
plt.xlabel('name')
plt.ylabel('frequency')
plt.show()

A1.3 enrich¶
Once you have found the categories, implement function enrich, which takes the db and the previously computed categories, and RETURN a NEW DB where the dictionaries are enriched with a new field Categorie, which holds a list of the categories a particular bottega belongs to.
[7]:
def enrich(db, categories):
    #jupman-raise
    ret = []
    for diz in db:
        new_diz = {key: val for key, val in diz.items()}   # copy, so db is not modified
        new_diz['Categorie'] = []
        for cat in categories:
            if cat[0] in diz['Insegna'].upper() or cat[0] in diz['Note'].upper():
                new_diz['Categorie'].append(cat[0])
        ret.append(new_diz)
    return ret
    #/jupman-raise
new_db = enrich(botteghe, rank_categories(botteghe, stopwords))
new_db[:6] #NOTE here we only show a sample
[7]:
[{'Numero': '1',
'Insegna': 'BAZZANELLA RENATA',
'Indirizzo': 'Via del Lagorai',
'Civico': '30',
'Comune': 'Sover',
'Cap': '38068',
'Frazione/Località': 'Piscine di Sover',
'Note': 'generi misti, bar - ristorante',
'Categorie': ['BAR', 'RISTORANTE']},
{'Numero': '2',
'Insegna': 'CONFEZIONI MONTIBELLER S.R.L.',
'Indirizzo': 'Corso Ausugum',
'Civico': '48',
'Comune': 'Borgo Valsugana',
'Cap': '38051',
'Frazione/Località': '',
'Note': 'esercizio commerciale',
'Categorie': []},
{'Numero': '3',
'Insegna': 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.',
'Indirizzo': 'Largo Dordi',
'Civico': '8',
'Comune': 'Borgo Valsugana',
'Cap': '38051',
'Frazione/Località': '',
'Note': 'esercizio commerciale, attività artigianale',
'Categorie': []},
{'Numero': '4',
'Insegna': 'BAR SERAFINI DI MINATI RENZO',
'Indirizzo': '',
'Civico': '24',
'Comune': 'Grigno',
'Cap': '38055',
'Frazione/Località': 'Serafini',
'Note': 'esercizio commerciale',
'Categorie': ['BAR']},
{'Numero': '6',
'Insegna': 'SEMBENINI GINO & FIGLI S.R.L.',
'Indirizzo': 'Via S. Francesco',
'Civico': '35',
'Comune': 'Riva del Garda',
'Cap': '38066',
'Frazione/Località': '',
'Note': '',
'Categorie': []},
{'Numero': '7',
'Insegna': 'HOTEL RISTORANTE PIZZERIA “ALLA NAVE”',
'Indirizzo': 'Via Nazionale',
'Civico': '29',
'Comune': 'Lavis',
'Cap': '38015',
'Frazione/Località': 'Nave San Felice',
'Note': '',
'Categorie': ['RISTORANTE', 'HOTEL', 'PIZZERIA']}]
A2 dump¶
The multinational ToxiCorp wants to hire you to devise an automated truck driver which will deposit highly contaminated waste in the illegal dumps they own worldwide. You find it ethically questionable, but they pay well, so you accept.
A dump is modelled as a rectangular region of dimensions nrow and ncol, implemented as a list of lists matrix. Every cell i, j contains the tons of waste present, and can contain at most 7 tons of waste.
The dumpster truck will transport q tons of waste, and try to fill the dump by depositing waste in the first row, filling each cell up to 7 tons. When the first row is filled, it will proceed to the second one from the left, then to the third one again from the left, until there is no waste left to dispose of.
Function dump(mat, q) takes as input the dump mat and the number of tons q to dispose of, and RETURN a NEW list representing a plan with the sequence of tons to dispose of. If the waste to dispose of exceeds the dump capacity, raises ValueError.
NOTE: the function does not modify the matrix
Example:
m = [
[5,4,6],
[4,7,1],
[3,2,6],
[3,6,2],
]
dump(m, 22)
[2, 3, 1, 3, 0, 6, 4, 3]
For the first row we dispose of 2, 3, 1 tons in three cells, for the second row we dispose of 3, 0, 6 tons in three cells, and for the third row we dispose of only 4 and 3 tons in two cells, as the limit q=22 is reached.
[8]:
def dump(mat, q):
    #jupman-raise
    rem = q        # tons still to dispose of
    ret = []
    for riga in mat:
        for j in range(len(riga)):
            cellfill = 7 - riga[j]          # free space left in this cell
            unload = min(cellfill, rem)
            rem -= unload
            if rem > 0:
                ret.append(unload)
            else:
                if unload > 0:
                    ret.append(unload)
                return ret
    if rem > 0:
        raise ValueError("Couldn't fill the dump, %s tons remain!" % rem)
    return ret
    #/jupman-raise
m1 = [
[5]
]
assert dump(m1,0) == [] # nothing to dump
m2 = [
[4]
]
assert dump(m2,2) == [2]
m3 = [
[5,4]
]
assert dump(m3,3) == [2, 1]
m4 = [
    [5,7,3]
]
assert dump(m4,3) == [2, 0, 1]
m5 = [
[2,5], # 5 2
[4,3] # 3 1
]
assert dump(m5,11) == [5,2,3,1]
m6 = [ # tons to dump in each cell
[5,4,6], # 2 3 1
[4,7,1], # 3 0 6
[3,2,6], # 4 3 0
[3,6,2], # 0 0 0
]
assert dump(m6, 22) == [2,3,1,3,0,6,4,3]
try:
    dump([[5]], 10)
    raise Exception("Should have failed !")
except ValueError:
    pass
Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file
Let L1 and L2 be two lists containing n lists, each of them of size n. Compute the computational complexity of function fun() with respect to n.
def fun(L1, L2):
    for r1 in L1:
        for val in r1:
            for r2 in L2:
                if val == sum(r2):
                    print(val)
ANSWER: \(\Theta(n^4)\)
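As a sketch of the reasoning: the three nested loops run n · n · n times in total. Each time, sum(r2) scans a list of size n at a cost of Θ(n), while printing a single value is O(1).

```latex
T(n) = \sum_{r_1 \in L_1} \sum_{val \in r_1} \sum_{r_2 \in L_2} \Theta(n) = n \cdot n \cdot n \cdot \Theta(n) = \Theta(n^4)
```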
B2 Linked List sorting¶
Open a text editor and edit file linked_sort_exercise.py
B2.1 bubble_sort¶
You will implement bubble sort on a LinkedList.
def bubble_sort(self):
    """ Sorts in-place this linked list using the method of bubble sort
        - MUST execute in O(n^2) where n is the length of the linked list
    """
Testing: python3 -m unittest linked_sort_test.BubbleSortTest
As a reference, you can look at the example_bubble implementation below, which operates on regular Python lists. Basically, you will have to translate the for cycles into two suitable while cycles and use node pointers.
NOTE: this version of the algorithm is inefficient, as we do not use j in the inner loop: your linked list implementation can have this inefficiency as well.
[9]:
def example_bubble(plist):
    for j in range(len(plist)):
        for i in range(len(plist)):
            if i + 1 < len(plist) and plist[i] > plist[i+1]:
                # swap adjacent elements that are out of order
                temp = plist[i]
                plist[i] = plist[i+1]
                plist[i+1] = temp
my_list = [23, 34, 55, 32, 7777, 98, 3, 2, 1]
example_bubble(my_list)
print(my_list)
[1, 2, 3, 23, 32, 34, 55, 98, 7777]
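The translation of example_bubble into two while cycles might look like the sketch below. The Node/LinkedList classes, their _data/_next/_head fields, and the add and to_python_list helpers are hypothetical: the classes in linked_sort_exercise.py may use different names. The design choice here is to swap the data stored in adjacent nodes rather than relinking the nodes, which keeps the list in place with little pointer bookkeeping.

```python
class Node:
    """Singly linked node (hypothetical field names _data and _next)."""
    def __init__(self, data, next_node=None):
        self._data = data
        self._next = next_node


class LinkedList:
    def __init__(self):
        self._head = None

    def add(self, data):
        """Prepend data in O(1) (hypothetical helper)."""
        self._head = Node(data, self._head)

    def to_python_list(self):
        """Only used to display results."""
        ret = []
        current = self._head
        while current is not None:
            ret.append(current._data)
            current = current._next
        return ret

    def bubble_sort(self):
        """Sketch of in-place bubble sort swapping node data: O(n^2)."""
        outer = self._head
        while outer is not None:                 # plays the role of `for j ...`
            current = self._head
            while current is not None and current._next is not None:   # `for i ...`
                if current._data > current._next._data:
                    current._data, current._next._data = current._next._data, current._data
                current = current._next
            outer = outer._next


ll = LinkedList()
for x in reversed([23, 34, 55, 32, 7777, 98, 3, 2, 1]):
    ll.add(x)
ll.bubble_sort()
print(ll.to_python_list())   # [1, 2, 3, 23, 32, 34, 55, 98, 7777]
```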
B2.2 merge¶
Implement this method:
def merge(self, l2):
    """ Assumes this linkedlist and l2 linkedlist contain integer numbers
        sorted in ASCENDING order, and RETURN a NEW LinkedList with
        all the numbers from this and l2 sorted in DESCENDING order
        IMPORTANT 1: *MUST* EXECUTE IN O(n1+n2) TIME where n1 and n2 are
                     the sizes of this and l2 linked_list, respectively
        IMPORTANT 2: *DO NOT* attempt to convert linked lists to
                     python lists!
    """
Testing: python3 -m unittest linked_sort_test.MergeTest
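One possible approach is sketched below, again on hypothetical Node/LinkedList classes with _data/_next/_head fields and an O(1) add that prepends. The real classes may differ. The idea is to walk both ascending lists with two pointers and always take the smaller current value. Because each taken value is prepended, the smallest values end up at the tail and the result reads in descending order. Every node is visited once, which gives O(n1 + n2). The from_python_list and to_python_list helpers exist only to build and display the example.

```python
class Node:
    """Singly linked node (hypothetical field names _data and _next)."""
    def __init__(self, data, next_node=None):
        self._data = data
        self._next = next_node


class LinkedList:
    def __init__(self):
        self._head = None

    def add(self, data):
        """Prepend data in O(1) (hypothetical helper)."""
        self._head = Node(data, self._head)

    def to_python_list(self):
        """Only used to display results, not inside merge."""
        ret = []
        current = self._head
        while current is not None:
            ret.append(current._data)
            current = current._next
        return ret

    def merge(self, l2):
        """Sketch: merge this and l2 (both ASCENDING) into a NEW DESCENDING list."""
        ret = LinkedList()
        a = self._head
        b = l2._head
        while a is not None or b is not None:
            # take the smaller current value; prepending pushes it towards the tail
            if b is None or (a is not None and a._data <= b._data):
                ret.add(a._data)
                a = a._next
            else:
                ret.add(b._data)
                b = b._next
        return ret


def from_python_list(values):
    """Build a LinkedList with the same order as values (example helper)."""
    ll = LinkedList()
    for x in reversed(values):
        ll.add(x)
    return ll


merged = from_python_list([1, 4, 7]).merge(from_python_list([2, 3, 8]))
print(merged.to_python_list())   # [8, 7, 4, 3, 2, 1]
```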
B3 Stacktris¶
Open a text editor and edit file stacktris_exercise.py
A Stacktris is a data structure that operates like the famous game Tetris, with some restrictions:
Falling pieces can be either of length 1 or 2. We call them 1-block and 2-block respectively.
The pit has a fixed width of 3 columns.
2-blocks can only be horizontal.
We print a Stacktris like this:
\ j 012
i
4 | 11| # two 1-blocks
3 | 22| # one 2-block
2 | 1 | # one 1-block
1 |22 | # one 2-block
0 |1 1| # on the ground there are two 1-blocks
In Python, we model the Stacktris as a class holding in the variable _stack a list of lists of integers, which models the pit:
class Stacktris:

    def __init__(self):
        """ Creates a Stacktris
        """
        self._stack = []
So in the situation above the _stack variable would look like this (notice the row order is inverted with respect to the print):
[
[1,0,1],
[2,2,0],
[0,1,0],
[0,2,2],
[0,1,1],
]
The class has three methods of interest which you will implement: drop1(j), drop2h(j) and _shorten.
Example
Let’s see an example:
[10]:
from stacktris_solution import *
st = Stacktris()
At the beginning the pit is empty:
[11]:
st
[11]:
Stacktris:
EMPTY
We can start by dropping from the ceiling a block of dimension 1 into the last column at index j=2. By doing so, a new row will be created, and it will be a list containing the numbers [0,0,1].
IMPORTANT: zeroes are not displayed
[12]:
st.drop1(2)
DEBUG: Stacktris:
| 1|
[12]:
[]
Now we drop a horizontal block of dimension 2 (a 2-block) having the leftmost block at column j=1. Since below in the pit there is already the 1-block we previously put, the new block will fall and stay upon it. Internally, we will add a new row as a Python list containing the numbers [0,2,2].
[13]:
st.drop2h(1)
DEBUG: Stacktris:
| 22|
| 1|
[13]:
[]
We see the zeroth column is empty, so if we drop a 1-block there it will fall to the ground. Internally, the zeroth list will become [1,0,1]:
[14]:
st.drop1(0)
DEBUG: Stacktris:
| 22|
|1 1|
[14]:
[]
Now we drop again a 2-block at column j=1, on top of the previously laid one. This will add a new row as list [0,2,2].
[15]:
st.drop2h(1)
DEBUG: Stacktris:
| 22|
| 22|
|1 1|
[15]:
[]
In the game Tetris, when a row becomes completely filled it disappears. So if we drop a 1-block into the leftmost column, the middle line should be removed.
NOTE: The messages on the console are just debug prints; the function drop1 only returns the extracted line [1,2,2]:
[16]:
st.drop1(0)
DEBUG: Stacktris:
| 22|
|122|
|1 1|
DEBUG: POPPING [1, 2, 2]
DEBUG: Stacktris:
| 22|
|1 1|
[16]:
[1, 2, 2]
Now we insert another 2-block starting at j=0. It will fall upon the previously laid one:
[17]:
st.drop2h(0)
DEBUG: Stacktris:
|22 |
| 22|
|1 1|
[17]:
[]
We can complete the topmost row by dropping a 1-block into the rightmost column. As a result, the row will be removed from the stack and returned by the call to drop1:
[18]:
st.drop1(2)
DEBUG: Stacktris:
|221|
| 22|
|1 1|
DEBUG: POPPING [2, 2, 1]
DEBUG: Stacktris:
| 22|
|1 1|
[18]:
[2, 2, 1]
Another line completion with a drop1 at column j=0:
[19]:
st.drop1(0)
DEBUG: Stacktris:
|122|
|1 1|
DEBUG: POPPING [1, 2, 2]
DEBUG: Stacktris:
|1 1|
[19]:
[1, 2, 2]
We can finally empty the Stacktris by dropping a 1-block in the middle column:
[20]:
st.drop1(1)
DEBUG: Stacktris:
|111|
DEBUG: POPPING [1, 1, 1]
DEBUG: Stacktris:
EMPTY
[20]:
[1, 1, 1]
B3.1 _shorten¶
Start by implementing this private method:
def _shorten(self):
    """ Scans the Stacktris from top to bottom searching for a completely filled line:
        - if found, remove it from the Stacktris and return it as a list.
        - if not found, return an empty list.
    """
If you wish, you can add debug prints but they are not mandatory
Testing: python3 -m unittest stacktris_test.ShortenTest
B3.2 drop1¶
Once you are done with the previous function, implement the drop1 method:
NOTE: In the implementation, feel free to call the previously implemented _shorten method.
def drop1(self, j):
    """ Drops a 1-block on column j.
        - If another block is found, place the 1-block on top of that block,
          otherwise place it on the ground.
        - If, after the 1-block is placed, a row results completely filled, removes
          the row and RETURN it. Otherwise, RETURN an empty list.
        - if index `j` is outside bounds, raises ValueError
    """
Testing: python3 -m unittest stacktris_test.Drop1Test
B3.3 drop2h¶
Once you are done with the previous function, implement the drop2h method:
def drop2h(self, j):
    """ Drops a 2-block horizontally with left block on column j,

        - If another block is found, place the 2-block on top of that block,
          otherwise place it on the ground.
        - If, after the 2-block is placed, a row results completely filled,
          removes the row and RETURN it. Otherwise, RETURN an empty list.
        - if index `j` is outside bounds, raises ValueError
    """
Testing: python3 -m unittest stacktris_test.Drop2hTest
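A sketch of drop2h on the same hypothetical bare grid. A horizontal 2-block covers columns j and j+1, so it stops at the higher of the two landing rows. Reading "outside bounds" as "j+1 must also be a valid column" is an assumption. The example replays cell [17] above.

```python
def shorten(rows):
    """Removes and returns the topmost completely filled row, or []."""
    for i in range(len(rows) - 1, -1, -1):
        if 0 not in rows[i]:
            return rows.pop(i)
    return []


def landing_row(rows, j):
    """Row index where a block falling down column j stops (0 = ground)."""
    for i in range(len(rows) - 1, -1, -1):
        if rows[i][j] != 0:
            return i + 1
    return 0


def drop2h(rows, width, j):
    """Sketch of Stacktris.drop2h on a bare grid (rows[0] is the bottom)."""
    if not 0 <= j < width - 1:  # both j and j+1 must be valid columns
        raise ValueError("Column %s out of bounds for a 2-block" % j)
    # the block rests on whichever of its two columns is taller
    i = max(landing_row(rows, j), landing_row(rows, j + 1))
    if i == len(rows):
        rows.append([0] * width)
    rows[i][j] = 2
    rows[i][j + 1] = 2
    return shorten(rows)


rows = [[1, 0, 1], [0, 2, 2]]   # state right before cell [17]
print(drop2h(rows, 3, 0))        # []
print(rows)                      # [[1, 0, 1], [0, 2, 2], [2, 2, 0]]
```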
Exam - Mon 26, August 2019 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before.
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting on why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like:
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...

def f(x):
    raise Exception("TODO implement me")
we will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...

# Called by f, will be graded:
def my_f(y, z):
    # bla
    ...

def f(x):
    my_f(x, 5)
How to edit and run¶
To edit the files you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Other options are GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-08-26-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2019-08-26-FIRSTNAME-LASTNAME-ID
|-jupman.py
|-sciprog.py
|-exams
|-2019-08-26
|- exam-2019-08-26-exercise.ipynb
|- theory.txt
|- backpack_exercise.py
|- backpack_test.py
|- concert_exercise.py
|- concert_test.py
Rename the datasciprolab-2019-08-26-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-08-26-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A - University of Trento staff¶
Open Jupyter and start editing this notebook exam-2019-08-26-exercise.ipynb
You will work on the dataset of University of Trento staff, modified so as not to contain names or surnames.
Data provider: University of Trento
A function load_data is given to load the dataset (you don’t need to implement it):
[1]:
import json

def load_data():
    with open('data/2019-06-30-persone-en-stripped.json', encoding='utf-8') as json_file:
        data = json.load(json_file)
    return data

unitn = load_data()
IMPORTANT: look at the dataset !
Here we show only the first 2 rows, but to get a clear picture of the dataset you should explore it further.
The dataset contains a list of employees, each of whom may have one or more positions in one or more university units. Each unit is identified by a code like STO0000435:
[2]:
unitn[:2]
[2]:
[{'givenName': 'NAME-1',
'phone': ['0461 283752'],
'identifier': 'eb9139509dc40d199b6864399b7e805c',
'familyName': 'SURNAME-1',
'positions': [{'unitIdentifier': 'STO0008929',
'role': 'Staff',
'unitName': 'Student Support Service: Economics, Law and International Studies'}]},
{'givenName': 'NAME-2',
'phone': ['0461 281521'],
'identifier': 'b6292ffe77167b31e856d2984544e45b',
'familyName': 'SURNAME-2',
'positions': [{'unitIdentifier': 'STO0000435',
'role': 'Associate professor',
'unitName': 'Doctoral programme – Physics'},
{'unitIdentifier': 'STO0000435',
'role': 'Deputy coordinator',
'unitName': 'Doctoral programme – Physics'},
{'unitIdentifier': 'STO0008627',
'role': 'Associate professor',
'unitName': 'Department of Physics'}]}]
Department names can be very long, so when you need to display them you can use the function abbreviate.
NOTE: the function is already fully implemented, do not modify it.
[3]:
def abbreviate(unitName):

    abbreviations = {
        "Department of Psychology and Cognitive Science": "COGSCI",
        "Center for Mind/Brain Sciences - CIMeC": "CIMeC",
        "Department of Civil, Environmental and Mechanical Engineering": "DICAM",
        "Centre Agriculture Food Environment - C3A": "C3A",
        "School of International Studies - SIS": "SIS",
        "Department of Sociology and social research": "Sociology",
        "Faculty of Law": "Law",
        "Department of Economics and Management": "Economics",
        "Department of Information Engineering and Computer Science": "DISI",
        "Department of Cellular, Computational and Integrative Biology - CIBIO": "CIBIO",
        "Department of Industrial Engineering": "DII"
    }
    if unitName in abbreviations:
        return abbreviations[unitName]
    else:
        return unitName.replace("Department of ", "")
Example:
[4]:
abbreviate("Department of Information Engineering and Computer Science")
[4]:
'DISI'
A1 calc_uid_to_abbr¶
✪ It will be useful to have a map from each department id to its abbreviation, if one exists, otherwise to its original name. To implement it, you can use the previously defined function abbreviate. The expected result looks like this (excerpt):
{
 ...
 'STO0008629': 'DISI',
 'STO0008630': 'Sociology',
 'STO0008631': 'COGSCI',
 ...
 'STO0012897': 'Institutional Relations and Strategic Documents',
 ...
}
[5]:
def calc_uid_to_abbr(db):
    #jupman-raise
    ret = {}
    for person in db:
        for position in person['positions']:
            uid = position['unitIdentifier']
            ret[uid] = abbreviate(position['unitName'])
    return ret
    #/jupman-raise

#calc_uid_to_abbr(unitn)
print(calc_uid_to_abbr(unitn)['STO0008629'])
print(calc_uid_to_abbr(unitn)['STO0012897'])
DISI
Institutional Relations and Strategic Documents
A2.1 calc_prof_roles¶
✪✪ For each department, we want to see how many professor roles are covered, sorted from greatest to lowest. In the returned list we will only put the 10 departments with the most roles.
NOTE 1: we are interested in roles covered. Don’t worry if the actual number of people is lower (one person can cover several professor roles within the same unit).
NOTE 2: there are several professor roles. Please avoid listing all roles in the code (“Senior Professor”, “Visiting Professor”, …), and prefer a smarter way to match them.
[6]:
def calc_prof_roles(db):
    #jupman-raise
    hist = {}
    uid_to_abbr = calc_uid_to_abbr(db)
    for person in db:
        for position in person['positions']:
            role = position['role']
            uid = position['unitIdentifier']
            if 'professor' in role.lower():
                if uid in hist:
                    hist[uid] += 1
                else:
                    hist[uid] = 1
    ret = [(uid_to_abbr[x[0]], x[1]) for x in hist.items()]
    ret.sort(key=lambda c: c[1], reverse=True)
    return ret[:10]
    #/jupman-raise

#calc_prof_roles(unitn)
[7]:
# EXPECTED RESULT
calc_prof_roles(unitn)
[7]:
[('Humanities', 92),
('DICAM', 85),
('Law', 84),
('Economics', 83),
('Sociology', 66),
('COGSCI', 61),
('Physics', 60),
('DISI', 55),
('DII', 49),
('Mathematics', 47)]
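The counting and top-10 selection above can also be written with collections.Counter, whose most_common(n) returns the n most frequent keys with their counts. This is an alternative sketch, not the notebook's solution. The name calc_prof_roles_counter and its label parameter are hypothetical. Passing label=abbreviate should reproduce the abbreviated names, while the default keeps the full unitName so the block runs on its own with the tiny made-up sample.

```python
from collections import Counter


def calc_prof_roles_counter(db, label=lambda name: name, top=10):
    """Alternative sketch of calc_prof_roles based on collections.Counter.

    Counts, for each unit identifier, the positions whose role contains
    'professor' (case-insensitive) and returns the `top` units as
    (label(unitName), count) pairs, most frequent first."""
    names = {}          # unit identifier -> unit name
    counts = Counter()  # unit identifier -> number of professor roles
    for person in db:
        for position in person['positions']:
            uid = position['unitIdentifier']
            names[uid] = position['unitName']
            if 'professor' in position['role'].lower():
                counts[uid] += 1
    return [(label(names[uid]), n) for uid, n in counts.most_common(top)]


# made-up sample with the same shape as the unitn records
sample_db = [
    {'positions': [{'unitIdentifier': 'U1', 'role': 'Associate professor', 'unitName': 'Department of A'},
                   {'unitIdentifier': 'U1', 'role': 'Deputy coordinator', 'unitName': 'Department of A'}]},
    {'positions': [{'unitIdentifier': 'U2', 'role': 'Full Professor', 'unitName': 'Department of B'},
                   {'unitIdentifier': 'U1', 'role': 'Visiting professor', 'unitName': 'Department of A'}]},
]
print(calc_prof_roles_counter(sample_db))  # [('Department of A', 2), ('Department of B', 1)]
```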
A2.2 plot_profs¶
✪ Write a function to plot a bar chart of the data calculated above
[8]:
%matplotlib inline
import matplotlib.pyplot as plt

def plot_profs(db):
    #jupman-raise
    prof_roles = calc_prof_roles(db)
    xs = list(range(len(prof_roles)))
    xticks = [p[0] for p in prof_roles]
    ys = [p[1] for p in prof_roles]
    fig = plt.figure(figsize=(20,3))
    plt.bar(xs, ys, 0.5, align='center')
    plt.title("Professor roles per department SOLUTION")
    plt.xticks(xs, xticks)
    plt.xlabel('departments')
    plt.ylabel('professor roles')
    plt.show()
    #/jupman-raise

#plot_profs(unitn)
[9]:
# EXPECTED RESULT
plot_profs(unitn)

A3.1 calc_roles¶
✪✪ We want to calculate how many roles are covered for each department.
You will group roles by these macro groups (some already exist, some are new):
Professor : “Senior Professor”, “Visiting Professor”, …
Research : “Senior researcher”, “Research collaborator”, …
Teaching : “Teaching assistant”, “Teaching fellow”, …
Guest : “Guest”, …
and discard all the others (there are many, like “Rector”, “Head”, etc.)
NOTE: Please avoid listing all roles in the code (“Senior researcher”, “Research collaborator”, …), and prefer using some smarter way to match them.
[10]:
def calc_roles(db):
    #jupman-raise
    ret = {}
    for person in db:
        for position in person['positions']:
            uid = position['unitIdentifier']
            role = position['role']
            grouped_role = None
            if "professor" in role.lower():
                grouped_role = 'Professor'
            elif "research" in role.lower():
                grouped_role = 'Research'
            elif "teaching" in role.lower():
                grouped_role = 'Teaching'
            elif "guest" in role.lower():
                grouped_role = 'Guest'
            if grouped_role:
                if uid in ret:
                    if grouped_role in ret[uid]:
                        ret[uid][grouped_role] += 1
                    else:
                        ret[uid][grouped_role] = 1
                else:
                    diz = {}
                    diz[grouped_role] = 1
                    ret[uid] = diz
    return ret
    #/jupman-raise

#print(calc_roles(unitn)['STO0000001'])
#print(calc_roles(unitn)['STO0000006'])
#print(calc_roles(unitn)['STO0000012'])
#print(calc_roles(unitn)['STO0008629'])
EXPECTED RESULT - showing just the first ones …
>>> calc_roles(unitn)
{
 'STO0000001': {'Teaching': 9, 'Research': 3, 'Professor': 12},
 'STO0000006': {'Professor': 1},
 'STO0000012': {'Guest': 3},
 'STO0008629': {'Teaching': 94, 'Research': 71, 'Professor': 55, 'Guest': 38},
 ...
}
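The if/elif chain in calc_roles can be expressed as an ordered keyword table, which keeps the "first match wins" behaviour (a "Research professor" still counts as Professor) while avoiding repeated role.lower() calls. This is an alternative sketch. ROLE_GROUPS, group_of and calc_roles_compact are illustrative names, and the sample data is made up.

```python
from collections import defaultdict

# Keyword -> macro group, checked in this order (first match wins),
# mirroring the if/elif chain of the solution above.
ROLE_GROUPS = [('professor', 'Professor'),
               ('research', 'Research'),
               ('teaching', 'Teaching'),
               ('guest', 'Guest')]


def group_of(role):
    """Returns the macro group of a role, or None if it must be discarded."""
    low = role.lower()
    for keyword, group in ROLE_GROUPS:
        if keyword in low:
            return group
    return None


def calc_roles_compact(db):
    """Alternative sketch of calc_roles: unit id -> {macro group: count}."""
    ret = defaultdict(dict)
    for person in db:
        for position in person['positions']:
            group = group_of(position['role'])
            if group is not None:
                counts = ret[position['unitIdentifier']]
                counts[group] = counts.get(group, 0) + 1
    return dict(ret)  # plain dict, like the original function


sample_db = [{'positions': [
    {'unitIdentifier': 'U1', 'role': 'Associate professor', 'unitName': 'A'},
    {'unitIdentifier': 'U1', 'role': 'Senior researcher', 'unitName': 'A'},
    {'unitIdentifier': 'U1', 'role': 'Rector', 'unitName': 'A'},
    {'unitIdentifier': 'U2', 'role': 'Guest', 'unitName': 'B'},
    {'unitIdentifier': 'U2', 'role': 'Teaching assistant', 'unitName': 'B'},
    {'unitIdentifier': 'U2', 'role': 'Teaching fellow', 'unitName': 'B'}]}]
print(calc_roles_compact(sample_db))
```

Units whose roles are all discarded never get an entry, which is also how the original function behaves.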
A3.2 plot_roles¶
✪✪ Implement a function plot_roles that, given the abbreviations (or long names) of some departments, plots pie charts of their grouped role distribution, all in one row.
NOTE 1: different plots MUST show equal groups with equal colors
NOTE 2: always show all the 4 macro groups defined before, even if they have zero frequency
For an example of how to plot pie charts, see this
For an example of plotting side by side, see this
[11]:
%matplotlib inline
import matplotlib.pyplot as plt

def plot_roles(db, abbrs):
    #jupman-raise
    fig = plt.figure(figsize=(15,4))
    uid_to_abbr = calc_uid_to_abbr(db)
    roles = calc_roles(db)  # computed once, not once per department
    # same label order in every pie => same colors for the same group
    labels = ['Professor', 'Guest', 'Teaching', 'Research']
    for i in range(len(abbrs)):
        abbr = abbrs[i]
        uid = None
        for key in uid_to_abbr:
            if uid_to_abbr[key] == abbr:
                uid = key
        fracs = []
        for role in labels:
            if role in roles[uid]:
                fracs.append(roles[uid][role])
            else:
                fracs.append(0)
        plt.subplot(1,           # rows
                    len(abbrs),  # columns
                    i+1)         # plotting in cell i+1
        plt.pie(fracs, labels=labels, autopct='%1.1f%%', shadow=True)
        plt.title(abbr)
    #/jupman-raise

#plot_roles(unitn, ['DISI','Sociology', 'COGSCI'])
[12]:
# EXPECTED RESULT
plot_roles(unitn, ['DISI','Sociology', 'COGSCI'])

Part B¶
B1 Theory¶
Write the solution in the separate theory.txt file
Let M be a square matrix: a list containing n lists, each of them of size n. Return the computational complexity of function fun() with respect to n:
def fun(M):
    for row in M:
        for element in row:
            print(sum([x for x in row if x != element]))
ANSWER: \(O(n^3)\)
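Why \(O(n^3)\): there are n rows, each with n elements. For every element the list comprehension scans the whole row again, which costs n comparisons, and sum then adds at most n more steps. That gives \(n \cdot n \cdot O(n)\). The sketch below is an instrumented copy of fun (fun_counting is a hypothetical name) that counts the x != element comparisons instead of printing.

```python
def fun_counting(M):
    """Same loops as fun(M), but counts the comparisons instead of printing."""
    comparisons = 0
    for row in M:                  # n iterations
        for element in row:        # n iterations
            kept = []
            for x in row:          # n iterations: the list comprehension
                comparisons += 1
                if x != element:
                    kept.append(x)
            sum(kept)              # at most n additions, also O(n)
    return comparisons


for n in [1, 2, 4, 8]:
    M = [[i * n + j for j in range(n)] for i in range(n)]
    print(n, fun_counting(M))      # the count is exactly n**3
```

Doubling n multiplies the count by 8, as expected for cubic growth.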
B2 Backpack¶
Open a text editor and edit file backpack_exercise.py
We can model a backpack as a stack of elements, each being a tuple with a name and a weight.
A sensible strategy to fill a backpack is to place the heaviest elements at the bottom, so our backpack will allow pushing an element only if its weight is less than or equal to the weight of the current topmost element.
The backpack also has a maximum weight: you can put in as many items as you want, as long as the maximum weight is not exceeded.
Example
[17]:
from backpack_solution import *
bp = Backpack(30) # max_weight = 30
bp.push('a',10) # item 'a' with weight 10
DEBUG: Pushing (a,10)
[18]:
print(bp)
Backpack: weight=10 max_weight=30
elements=[('a', 10)]
[19]:
bp.push('b',8)
DEBUG: Pushing (b,8)
[20]:
print(bp)
Backpack: weight=18 max_weight=30
elements=[('a', 10), ('b', 8)]
>>> bp.push('c', 11)
DEBUG: Pushing (c,11)
ValueError: ('Pushing weight greater than top element weight! %s > %s', (11, 8))
[21]:
bp.push('c', 7)
DEBUG: Pushing (c,7)
[22]:
print(bp)
Backpack: weight=25 max_weight=30
elements=[('a', 10), ('b', 8), ('c', 7)]
>>> bp.push('d', 6)
DEBUG: Pushing (d,6)
ValueError: Can't exceed max_weight ! (31 > 30)
B2.1 class¶
✪✪ Implement the methods in the class Backpack, in the order they are shown. If you want, you can add debug prints by calling the debug function.
IMPORTANT: the data structure should provide the total current weight in O(1), so make sure to add and update an appropriate field to meet this constraint.
Testing: python3 -m unittest backpack_test.BackpackTest
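A minimal sketch of a Backpack that meets the constraints above. Only push, pop and the printed form appear in this worksheet. The accessors peek, size, is_empty and weight, and the choice of IndexError for an empty backpack, are hypothetical and may not match the exam's class. The key point is the _weight field: it is updated on every push and pop, so the total is read in O(1) instead of being recomputed by summing the stack.

```python
class Backpack:
    """Sketch: a stack of (name, weight) tuples, heaviest at the bottom."""

    def __init__(self, max_weight):
        self._elements = []           # index 0 is the bottom of the stack
        self._max_weight = max_weight
        self._weight = 0              # running total, kept in sync -> O(1) reads

    def size(self):
        return len(self._elements)

    def is_empty(self):
        return not self._elements

    def weight(self):
        return self._weight

    def peek(self):
        if not self._elements:
            raise IndexError("Empty backpack!")
        return self._elements[-1]

    def push(self, name, weight):
        if self._elements and weight > self._elements[-1][1]:
            raise ValueError("Pushing weight greater than top element weight! %s > %s"
                             % (weight, self._elements[-1][1]))
        if self._weight + weight > self._max_weight:
            raise ValueError("Can't exceed max_weight ! (%s > %s)"
                             % (self._weight + weight, self._max_weight))
        self._elements.append((name, weight))
        self._weight += weight

    def pop(self):
        if not self._elements:
            raise IndexError("Empty backpack!")
        el = self._elements.pop()
        self._weight -= el[1]
        return el

    def __str__(self):
        return "Backpack: weight=%s max_weight=%s\nelements=%s" % (
            self._weight, self._max_weight, self._elements)


bp = Backpack(30)
bp.push('a', 10)
bp.push('b', 8)
print(bp)
```

Both checks run before anything is appended, so a rejected push leaves the backpack unchanged.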
B2.2 remove¶
✪✪ Implement function remove:
# NOTE: this function is implemented *outside* the class !
def remove(backpack, el):
    """
    Remove topmost occurrence of el found in the backpack,
    and RETURN it (as a tuple name, weight)
    - if el is not found, raises ValueError
    - DO *NOT* ACCESS DIRECTLY FIELDS OF BACKPACK !!!
      Instead, just call methods of the class!
    - MUST perform in O(n), where n is the backpack size
    - HINT: To remove el, you need to call Backpack.pop() until
            the top element is what you are looking for. You need
            to save somewhere the popped items except the one to
            remove, and then push them back again.
    """
Testing: python3 -m unittest backpack_test.RemoveTest
Example:
[23]:
bp = Backpack(50)
bp.push('a',9)
bp.push('b',8)
bp.push('c',8)
bp.push('b',8)
bp.push('d',7)
bp.push('e',5)
bp.push('f',2)
DEBUG: Pushing (a,9)
DEBUG: Pushing (b,8)
DEBUG: Pushing (c,8)
DEBUG: Pushing (b,8)
DEBUG: Pushing (d,7)
DEBUG: Pushing (e,5)
DEBUG: Pushing (f,2)
[24]:
print(bp)
Backpack: weight=47 max_weight=50
elements=[('a', 9), ('b', 8), ('c', 8), ('b', 8), ('d', 7), ('e', 5), ('f', 2)]
[25]:
remove(bp, 'b')
DEBUG: Popping ('f', 2)
DEBUG: Popping ('e', 5)
DEBUG: Popping ('d', 7)
DEBUG: Popping ('b', 8)
DEBUG: Pushing (d,7)
DEBUG: Pushing (e,5)
DEBUG: Pushing (f,2)
[25]:
('b', 8)
[26]:
print(bp)
Backpack: weight=39 max_weight=50
elements=[('a', 9), ('b', 8), ('c', 8), ('d', 7), ('e', 5), ('f', 2)]
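One way to follow the hint, sketched with a stripped-down stand-in Backpack so the block runs on its own. The stand-in and its is_empty method are assumptions, not the exam's class. Each element is popped and pushed at most once, so the cost is O(n). Pushing the saved items back cannot fail: they were above the removed element, so they weigh no more than whatever is now below them, and the total weight has only decreased. Restoring the backpack before raising when el is missing is a design choice, not a stated requirement.

```python
class Backpack:
    """Stripped-down stand-in: only push, pop and is_empty are needed."""

    def __init__(self, max_weight):
        self._elements = []
        self._max_weight = max_weight
        self._weight = 0

    def push(self, name, weight):
        if self._elements and weight > self._elements[-1][1]:
            raise ValueError("Pushing weight greater than top element weight!")
        if self._weight + weight > self._max_weight:
            raise ValueError("Can't exceed max_weight !")
        self._elements.append((name, weight))
        self._weight += weight

    def pop(self):
        el = self._elements.pop()  # IndexError if empty
        self._weight -= el[1]
        return el

    def is_empty(self):
        return not self._elements


def remove(backpack, el):
    """Removes and returns the topmost (name, weight) whose name is el."""
    popped = []                     # items above el, topmost first
    while not backpack.is_empty():
        item = backpack.pop()
        if item[0] == el:
            while popped:           # push back in the original order
                backpack.push(*popped.pop())
            return item
        popped.append(item)
    while popped:                   # not found: restore before failing
        backpack.push(*popped.pop())
    raise ValueError("Element %s not found!" % el)


bp = Backpack(50)
for name, weight in [('a', 9), ('b', 8), ('c', 8), ('b', 8), ('d', 7), ('e', 5), ('f', 2)]:
    bp.push(name, weight)
removed = remove(bp, 'b')
print(removed)  # ('b', 8)
```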
B3 Concert¶
Start editing file concert_exercise.py.
When there are events with lots of potential visitors, such as concerts, there are at least two queues to speed up check-in: one at the cash, where tickets are sold, and one at the actual entrance to the event.
Each visitor may or may not have a ticket. Also, since people usually attend in groups (couples, families, and so on), each group tends to move through the queues as a whole.
In Python, we will model a Person as a class you can create like this:
[27]:
from concert_solution import *
[28]:
Person('a', 'x', False)
[28]:
Person(a,x,False)
'a' is the name, 'x' is the group, and False indicates the person doesn’t have a ticket.
To model the two queues, the Concert class has these fields and methods:
from collections import deque

class Concert:

    def __init__(self):
        self._cash = deque()
        self._entrance = deque()

    def enqc(self, person):
        """ Enqueues at the cash from the right """
        self._cash.append(person)

    def enqe(self, person):
        """ Enqueues at the entrance from the right """
        self._entrance.append(person)
B3.1 dequeue¶
✪✪✪ Implement dequeue. If you want, you can add debug prints by calling the debug function.
def dequeue(self):
    """ RETURN the names of people admitted to concert

        Dequeuing for the whole queue system is done in groups, that is,
        with a _single_ call to dequeue, these steps happen, in order:

        1. entrance queue: all people belonging to the same group at
           the front of entrance queue who have the ticket exit the queue
           and are admitted to concert. People in the group without the
           ticket are sent to cash.
        2. cash queue: all people belonging to the same group at the front
           of cash queue are given a ticket, and are queued at the entrance queue
    """
Testing: python3 -m unittest concert_test.DequeueTest
Example:
[29]:
con = Concert()
con.enqc(Person('a','x',False)) # a,b,c belong to same group x
con.enqc(Person('b','x',False))
con.enqc(Person('c','x',False))
con.enqc(Person('d','y',False)) # d belongs to group y
con.enqc(Person('e','z',False)) # e,f belong to group z
con.enqc(Person('f','z',False))
con.enqc(Person('g','w',False)) # g belongs to group w
[30]:
con
[30]:
Concert:
cash: deque([Person(a,x,False),
Person(b,x,False),
Person(c,x,False),
Person(d,y,False),
Person(e,z,False),
Person(f,z,False),
Person(g,w,False)])
entrance: deque([])
The first time we dequeue, the entrance queue is empty, so no one enters the concert, while at the cash queue the people in group x are given a ticket and enqueued at the entrance queue.
NOTE: The messages on the console are just debug prints; the function dequeue only returns the names of people admitted to the concert.
[31]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: giving ticket to a (group x)
DEBUG: giving ticket to b (group x)
DEBUG: giving ticket to c (group x)
DEBUG: Concert:
cash: deque([Person(d,y,False),
Person(e,z,False),
Person(f,z,False),
Person(g,w,False)])
entrance: deque([Person(a,x,True),
Person(b,x,True),
Person(c,x,True)])
[31]:
[]
[32]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: a (group x) admitted to concert
DEBUG: b (group x) admitted to concert
DEBUG: c (group x) admitted to concert
DEBUG: giving ticket to d (group y)
DEBUG: Concert:
cash: deque([Person(e,z,False),
Person(f,z,False),
Person(g,w,False)])
entrance: deque([Person(d,y,True)])
[32]:
['a', 'b', 'c']
[33]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: d (group y) admitted to concert
DEBUG: giving ticket to e (group z)
DEBUG: giving ticket to f (group z)
DEBUG: Concert:
cash: deque([Person(g,w,False)])
entrance: deque([Person(e,z,True),
Person(f,z,True)])
[33]:
['d']
[34]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: e (group z) admitted to concert
DEBUG: f (group z) admitted to concert
DEBUG: giving ticket to g (group w)
DEBUG: Concert:
cash: deque([])
entrance: deque([Person(g,w,True)])
[34]:
['e', 'f']
[35]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: g (group w) admitted to concert
DEBUG: Concert:
cash: deque([])
entrance: deque([])
[35]:
['g']
[36]:
# calling dequeue on empty queues gives an empty list:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: Concert:
cash: deque([])
entrance: deque([])
[36]:
[]
Special dequeue case: broken group¶
In the special case when there is a group at the entrance with one or more members without a ticket, it is assumed that the group gets broken, so whoever has the ticket enters and the others get enqueued at the cash.
[37]:
con = Concert()
con.enqe(Person('a','x',True))
con.enqe(Person('b','x',False))
con.enqe(Person('c','x',True))
con.enqc(Person('f','y',False))
con
[37]:
Concert:
cash: deque([Person(f,y,False)])
entrance: deque([Person(a,x,True),
Person(b,x,False),
Person(c,x,True)])
[38]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: a (group x) admitted to concert
DEBUG: b (group x) has no ticket! Sending to cash
DEBUG: c (group x) admitted to concert
DEBUG: giving ticket to f (group y)
DEBUG: Concert:
cash: deque([Person(b,x,False)])
entrance: deque([Person(f,y,True)])
[38]:
['a', 'c']
[39]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: f (group y) admitted to concert
DEBUG: giving ticket to b (group x)
DEBUG: Concert:
cash: deque([])
entrance: deque([Person(b,x,True)])
[39]:
['f']
[40]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: b (group x) admitted to concert
DEBUG: Concert:
cash: deque([])
entrance: deque([])
[40]:
['b']
[41]:
con
[41]:
Concert:
cash: deque([])
entrance: deque([])
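A possible dequeue, sketched on stand-in Person and Concert classes so the block runs on its own. The attribute names name, group and ticket are assumptions: the worksheet only shows Person(name, group, ticket) being constructed. Debug printing is omitted. Step 1 pops the whole front group from the entrance and admits whoever has a ticket, sending the rest to the back of the cash queue (the broken-group case). Step 2 pops the front group from the cash, hands out tickets and moves them to the entrance. Since popleft on a deque is O(1), one call costs time proportional to the number of people moved.

```python
from collections import deque


class Person:
    """Hypothetical stand-in for the exam's Person."""

    def __init__(self, name, group, ticket):
        self.name = name
        self.group = group
        self.ticket = ticket

    def __repr__(self):
        return "Person(%s,%s,%s)" % (self.name, self.group, self.ticket)


class Concert:
    def __init__(self):
        self._cash = deque()
        self._entrance = deque()

    def enqc(self, person):
        self._cash.append(person)

    def enqe(self, person):
        self._entrance.append(person)

    def dequeue(self):
        admitted = []
        # 1. entrance: the group at the front leaves the queue as a whole
        if self._entrance:
            group = self._entrance[0].group
            while self._entrance and self._entrance[0].group == group:
                person = self._entrance.popleft()
                if person.ticket:
                    admitted.append(person.name)
                else:
                    self._cash.append(person)  # broken group: back to cash
        # 2. cash: the group at the front gets tickets and moves on
        if self._cash:
            group = self._cash[0].group
            while self._cash and self._cash[0].group == group:
                person = self._cash.popleft()
                person.ticket = True
                self._entrance.append(person)
        return admitted


con = Concert()
con.enqe(Person('a', 'x', True))
con.enqe(Person('b', 'x', False))
con.enqe(Person('c', 'x', True))
con.enqc(Person('f', 'y', False))
print(con.dequeue())  # ['a', 'c']
print(con.dequeue())  # ['f']
print(con.dequeue())  # ['b']
```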
[42]:
import sys
sys.path.append('../../')
import jupman
import backpack_solution
import backpack_test
backpack_solution.DEBUG = False
jupman.run(backpack_test)
import concert_solution
import concert_test
concert_solution.DEBUG = False
jupman.run(concert_test)
..................
----------------------------------------------------------------------
Ran 18 tests in 0.010s
OK
.......
----------------------------------------------------------------------
Ran 7 tests in 0.004s
OK
Midterm sim - Thu 31, October 2019 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
This is only a simulation. By participating in it, you gain nothing and you lose nothing.
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...

def f(x):
    raise Exception("TODO implement me")
we will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...

# Called by f, will be graded:
def my_f(y, z):
    # bla
    ...

def f(x):
    my_f(x, 5)
How to edit and run¶
To edit the files you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use, you can find it under Applications->Programming->Editra.
Other options are GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-10-31-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2019-10-31-FIRSTNAME-LASTNAME-ID
|-jupman.py
|-sciprog.py
|-exams
|-2019-10-31
|- exam-2019-10-31-exercise.ipynb
Rename the datasciprolab-2019-10-31-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-10-31-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A - EURES job offers (offerte lavoro)¶
Open Jupyter and start editing this notebook exam-2019-10-31-exercise.ipynb
After exiting this university prison, you will look for a job and be shocked to discover that a great variety of languages are spoken in Europe. Many job listings are provided by the EURES portal, which is easily searchable and has many fields you can filter on. For this exercise we will use a test dataset which was generated just for a hackathon: it is a crude Italian version of the job offers data, with many fields expressed in natural language. We will try to convert it into a dataset with more columns and translate some terms into English.
Data provider: Autonomous Province of Trento
License: Creative Commons Zero 1.0
WARNING: avoid constants in function bodies !!
In the exercise data you will find many names such as 'Austria', 'Giugno' (June), etc. DO NOT put such constant names inside the body of functions !! You have to write generic code which works with any input.
offerte dataset¶
We will load the dataset data/offerte-lavoro.csv into Pandas:
[1]:
import pandas as pd # we import pandas and for ease we rename it to 'pd'
import numpy as np # we import numpy and for ease we rename it to 'np'
# remember the encoding !
offerte = pd.read_csv('data/offerte-lavoro.csv', encoding='UTF-8')
offerte.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 8 columns):
RIFER. 53 non-null object
SEDE LAVORO 53 non-null object
POSTI 53 non-null int64
IMPIEGO RICHIESTO 53 non-null object
TIPO CONTRATTO 53 non-null object
LINGUA RICHIESTA 51 non-null object
RET. LORDA 53 non-null object
DESCRIZIONE OFFERTA 53 non-null object
dtypes: int64(1), object(7)
memory usage: 3.4+ KB
It contains Italian column names, and many string fields:
[2]:
offerte.head()
[2]:
 | RIFER. | SEDE LAVORO | POSTI | IMPIEGO RICHIESTO | TIPO CONTRATTO | LINGUA RICHIESTA | RET. LORDA | DESCRIZIONE OFFERTA |
---|---|---|---|---|---|---|---|---|
0 | 18331901000024 | Norvegia | 6 | Restaurant staff | Tempo determinato da maggio ad agosto | Inglese fluente + Vedi testo | Da 3500\nFr/\nmese | We will be working together with sales, prepar... |
1 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese; italiano; francese fluente | Da definire | Vos missions principales sont les suivantes : ... |
2 | 4954752 | Danimarca | 1 | Italian Sales Representative | Non specificato | Inglese; Italiano fluente | Da definire | Minimum 2 + years sales experience, preferably... |
3 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... |
4 | 10531631 | Svezia | 1 | Italian speaking purchase | Non specificato | Inglese; italiano fluente | Da definire | This is a varied Purchasing role, where your m... |
rename columns¶
As a first step, we create a new dataframe offers with the columns renamed into English:
[3]:
replacements = ['Reference','Workplace','Positions','Qualification','Contract type','Required languages','Gross retribution','Offer description']
# map each original Italian column name to its English replacement, in order
diz = dict(zip(offerte.columns, replacements))
offers = offerte.rename(columns = diz)
[4]:
offers
[4]:
 | Reference | Workplace | Positions | Qualification | Contract type | Required languages | Gross retribution | Offer description |
---|---|---|---|---|---|---|---|---|
0 | 18331901000024 | Norvegia | 6 | Restaurant staff | Tempo determinato da maggio ad agosto | Inglese fluente + Vedi testo | Da 3500\nFr/\nmese | We will be working together with sales, prepar... |
1 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese; italiano; francese fluente | Da definire | Vos missions principales sont les suivantes : ... |
2 | 4954752 | Danimarca | 1 | Italian Sales Representative | Non specificato | Inglese; Italiano fluente | Da definire | Minimum 2 + years sales experience, preferably... |
3 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... |
4 | 10531631 | Svezia | 1 | Italian speaking purchase | Non specificato | Inglese; italiano fluente | Da definire | This is a varied Purchasing role, where your m... |
5 | 51485 | Islanda | 1 | Pizza chef | Tempo determinato | Inglese Buono | Da definire | Job details/requirements: Experience in making... |
6 | 4956299 | Danimarca | 1 | Regional Key account manager - Italy | Non specificato | Inglese; italiano fluente | Da definire | Requirements: possess good business acumen; ar... |
7 | - | Italia\nLazise | 1 | Receptionist | Non specificato | Inglese; Tedesco fluente + Vedi testo | Min 1500€\nMax\n1800€\nnetto\nmese | Camping Village Du Parc, Lazise,Italy is looki... |
8 | 2099681 | Irlanda | 11 | Customer Service Representative in Athens | Non specificato | Italiano fluente; Inglese buono | Da definire | Responsibilities: Solving customers queries by... |
9 | 12091902000474 | Norvegia | 1 | Dispatch personnel | Maggio – agosto 2019 | Inglese fluente + Vedi testo | Da definire | The Dispatch Team works outside in all weather... |
10 | 10000-1169373760-S | Svizzera | 1 | Mitarbeiter (m/w/d) im Verkaufsinnendienst | Non specificato | Tedesco fluente; francese e/o italiano buono | Da definire | Was Sie erwartet: telefonische und persönliche... |
11 | 10000-1168768920-S | Germania | 1 | Vertriebs assistent | Non specificato | Tedesco ed inglese fluente + italiano e/o spag... | Da definire | Ihre Tätigkeit: enge Zusammenarbeit mit unsere... |
12 | 082BMLG | Francia | 1 | Second / Seconde de cuisine | Tempo determinato da aprile ad ottobre 2019 | Francese discreto | Da definire | Missions : Vous serez en charge de la mise en ... |
13 | 23107550 | Svezia | 1 | Waiter/Waitress | Non specificato | Inglese ed Italiano buono | Da definire | Bar Robusta are looking for someone that speak... |
14 | 11949-11273083-S | Austria | 1 | Empfangskraft | Non specificato | Tedesco ed Inglese Fluente + vedi testo | Da definire | Erfolgreich abgeschlossene Ausbildung in der H... |
15 | 18331901000024 | Norvegia | 6 | Salesclerk | Da maggio ad ottobre | Inglese fluente + Vedi testo | Da definire | We will be working together with sales, prepar... |
16 | ID-11252967 | Austria | 1 | Verkaufssachbearbeiter für Italien (m/w) | Non specificato | Tedesco e italiano fluenti | 2574,68 Euro/\nmese | Unsere Anforderungen: Sie haben eine kaufmänni... |
17 | 10000-1162270517-S | Germania | 1 | Koch/Köchin | Non specificato | Italiano e tedesco buono | Da definire | Kenntnisse und Fertigkeiten: Erfolgreich abges... |
18 | 2100937 | Irlanda | 1 | Garden Centre Assistant | Non specificato | Inglese fluente | Da definire | Applicants should have good plant knowledge an... |
19 | WBS697919 | Paesi Bassi | 5 | Strawberries and Rhubarb processors | Da maggio a settembre | NaN | Vedi testo | In this job you will be busy picking strawberr... |
20 | 19361902000002 | Norvegia | 2 | Cleaners/renholdere Fishing Camp 2019 season | Tempo determinato da aprile ad ottobre 2019 | Inglese fluente | Da definire | Torsvåg Havfiske, estbl. 2005, is a touristcom... |
21 | 2095000 | Spagna | 15 | Customer service agent for solar energy | Non specificato | Inglese e tedesco fluenti | €21,000 per annum + 3.500 | One of our biggest clients offer a wide range ... |
22 | 58699222 | Norvegia | 1 | Receptionists tourist hotel | Da maggio a settembre o da giugno ad agosto | Inglese Fluente; francese e/o spagnolo buoni | Da definire | The job also incl communication with the kitch... |
23 | 10000-1169431325-S | Svizzera | 1 | Reiseverkehrskaufmann/-frau - Touristik | Non specificato | Tedesco Fluente + Vedi testo | Da definire | Wir erwarten: Abgeschlossene Reisebüroausbildu... |
24 | 082QNLW | Francia | 1 | Assistant administratif export avec Italie (H/F) | Non specificato | Francese ed italiano fluenti | Da definire | Vous serez en charge des missions suivantes po... |
25 | 2101510 | Irlanda | 1 | Receptionist | Non specificato | Inglese fluente; Tedesco discreto | Da definire | Receptionist required for the 2019 Season. Kno... |
26 | 171767 | Spagna | 300 | Seasonal worker in a strawberry farm | Da febbraio a giugno | NaN | Da definire | Peon agricola (recolector fresa) / culegator d... |
27 | 14491903000005 | Norvegia\nMøre e Romsdal e Sogn og Fjordane. | 6 | Guider | Tempo determinato da maggio a settembre | Tedesco e inglese fluente + Italiano buono | 20000 NOK /mese | We require that you: are at least 20 years old... |
28 | 10000-1167210671-S | Germania | 1 | Sales Manager Südeuropa m/w | Tempo indeterminato | Inglese e tedesco fluente + Italiano e/o spagn... | Da definire | Ihr Profil :Idealerweise Erfahrung in der Text... |
29 | 507 | Italia\ned\nestero | 25 | Animatori - coreografi - ballerini - istruttor... | Tempo determinato da aprile ad ottobre | Inglese Buono + Vedi testo | Vedi testo | Padronanza di una o più lingue tra queste (ita... |
30 | 846727 | Belgio | 1 | Junior Buyer Italian /English (m/v) | Non specificato | Inglese Ed italiano fluente | Da definire | You have a Bachelor degree. 2-3 years of profe... |
31 | 10531631 | Svezia\nLund | 1 | Italian Speaking Sales Administration Officer | Tempo indeterminato | Inglese ed italiano fluente | Da definire | You will focus on: Act as our main contact for... |
32 | 082ZFDB | Francia | 1 | Assistant Administratif et Commercial Bilingue... | Non specificato | Francese ed italiano fluente | Da definire | Au sein de l'équipe administrative, vous trava... |
33 | 1807568 | Regno Unito | 1 | Account Manager - German, Italian, Spanish, Dutch | Non specificato | Inglese Fluente + Vedi testo | £25,000 per annum | Account Manager The Candidate You will be an e... |
34 | 2103264 | Irlanda | 1 | Receptionist - Summer | Da maggio a settembre | Inglese fluente | Da definire | Assist with any ad-hoc project as required by ... |
35 | ID-11146984 | Austria Klagenfurt | 1 | Nachwuchsführungskraft im Agrarhandel / Traine... | Non specificato | Tedesco; Italiano buono | 1.950\nEuro/ mese | Ihre Qualifikationen: landwirtschaftliche Ausb... |
36 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... |
37 | 243096 | Spagna | 1 | Customer Service with French and Italian | Non specificato | Italiano; Francese fluente; Spagnolo buono | Da definire | As an IT Helpdesk, you will be responsible for... |
38 | 9909319 | Francia | 1 | Commercial Web Italie (H/F) | Non specificato | Italiano; Francese fluente | Da definire | Profil : Première expérience réussie dans la v... |
39 | WBS1253419 | Paesi\nBassi | 1 | Customer service employee Dow | Tempo determinato | Inglese; italiano fluente + vedi testo | Da definire | Requirements: You have a bachelor degree or hi... |
40 | 70cb25b1-5510-11e9-b89f-005056ac086d | Svizzera | 1 | Hauswart/In | Non specificato | Tedesco buono | Da definire | Wir suchen in unserem Team einen Mitarbeiter m... |
41 | 10000-1170625924-S | Germania | 1 | Monteur (m/w/d) Photovoltaik (Elektroanlagenmo... | Non specificato | Tedesco e/o inglese buono | Da definire | Anforderungen an die Bewerber/innen: abgeschlo... |
42 | 2106868 | Irlanda | 1 | Retail Store Assistant | Non specificato | Inglese Fluente | Da definire | Retail Store Assistant required for a SPAR sho... |
43 | 23233743 | Svezia | 1 | E-commerce copywriter | Non specificato | Inglese Fluente + vedi testo | Da definire | We support 15 languages incl Chinese, Russian ... |
44 | ID-11478229 | Italia\nAustria | 1 | Forstarbeiter/in | Aprile – maggio 2019 | Tedesco italiano discreto | €9,50\n/ora | ANFORDERUNGSPROFIL: Pflichtschulabschluss und ... |
45 | ID-11477956 | Austria | 1 | Koch/Köchin für italienische Küche in Teilzeit | Non specificato | Tedesco buono | Da definire | ANFORDERUNGSPROFIL:Erfahrung mit Pasta & Pizze... |
46 | 6171903000036 | Norvegia\nHesla Gaard | 1 | Maid / Housekeeping assistant | Tempo determinato da aprile a dicembre | Inglese fluente | 20.000 NOK mese | Responsibility for cleaning off our apartments... |
47 | 9909319 | Finlandia | 1 | Test Designer | Non specificato | Inglese fluente | Da definire | As Test Designer in R&D Devices team you will:... |
48 | ID-11239341 | Cipro Grecia Spagna | 5 | Animateur 2019 (m/w) | Tempo determinato aprile-ottobre | Tedesco; inglese buono | 800\n€/mese | Deine Fähigkeiten: Im Vordergrund steht Deine ... |
49 | 10000-1167068836-S | Germania | 2 | Verkaufshilfe im Souvenirshop (m/w/d) 5 Tage-W... | Contratto stagionale fino a novembre 2019 | Tedesco buono; Inglese buono | Da definire | Wir bieten: Einen zukunftssicheren, saisonalen... |
50 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese francese; Italiano fluente | Da definire | Description : Au sein d'une équipe de 10 perso... |
51 | 4956299 | Belgio | 1 | ACCOUNT MANAGER EXPORT ITALIE - HAYS - StepSto... | Non specificato | Inglese francese; Italiano fluente | Da definire | Votre profil : Pour ce poste, nous recherchons... |
52 | - | Austria\nPfenninger Alm | 1 | Cameriere e Commis de rang | Non specificato | Inglese buono; tedesco preferibile | 1500-1600\n€/mese | Lavoro estivo nella periferia di Salisburgo. E... |
1. Rename countries¶
We would like to create a new column holding the list of countries where the job is to be done. You will also have to translate the countries to their English names.
To allow for text processing, you are provided with some data as Python data structures (you do not need to edit it further):
[5]:
connectives = ['e', 'ed']
punctuation = ['.',';',',']
countries = {
'Austria':'Austria',
'Belgio': 'Belgium',
'Cipro':'Cyprus',
'Danimarca': 'Denmark',
'Irlanda':'Ireland',
'Italia':'Italy',
'Grecia':'Greece',
'Finlandia' : 'Finland',
'Francia' : 'France',
'Norvegia': 'Norway',
'Paesi Bassi':'Netherlands',
'Regno Unito': 'United Kingdom',
'Spagna': 'Spain',
'Svezia':'Sweden',
'Islanda':'Iceland',
'Svizzera':'Switzerland',
'estero': 'abroad' # special case
}
cities = {
'Pfenninger Alm': 'Pfenninger Alm',
'Berlino': 'Berlin',
'Trento': 'Trento',
'Klagenfurt': 'Klagenfurt',
'Lazise': 'Lazise',
'Lund':'Lund',
'Møre e Romsdal': 'Møre og Romsdal',
'Sogn og Fjordane': 'Sogn og Fjordane',
'Hesla Gaard':'Hesla Gaard'
}
1.1 countries_to_list¶
✪✪ Implement the function countries_to_list which, given a string from the Workplace column, RETURNS a list holding the country names in English, in the exact order they appear in the string. The function will have to remove city names as well as punctuation, connectives and newlines, using the data defined in the previous cell. There are various ways to solve the exercise: if you try the most straightforward one, you will most probably get countries which are not in the same order as in the string.
NOTE: this function only takes a single string as input!
Example:
>>> countries_to_list("Regno Unito, Italia ed estero")
['United Kingdom', 'Italy', 'abroad']
For other examples, see asserts.
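To see why the most straightforward approach loses the order, here is a minimal sketch using a reduced copy of the countries dictionary (`countries_small`, `naive` and `by_position` are names introduced here for illustration, not part of the exercise). Looping over the dictionary and testing `in` returns countries in dictionary order; recording where each name occurs with `str.find` and sorting by that position restores the order of the string. It assumes every country appears at most once and that no country name is contained in another.

```python
# Reduced copy of the exercise's `countries` dict (illustration only)
countries_small = {
    'Cipro': 'Cyprus',
    'Grecia': 'Greece',
    'Italia': 'Italy',
    'Paesi Bassi': 'Netherlands',
    'Regno Unito': 'United Kingdom',
    'Spagna': 'Spain',
    'estero': 'abroad',
}

def naive(s):
    # Iterates over the dict, so results follow the dict's insertion order
    return [eng for ita, eng in countries_small.items() if ita in s]

def by_position(s):
    # Newlines become spaces so names split across lines (e.g. 'Paesi\nBassi') still match
    s = s.replace('\n', ' ')
    found = []
    for ita, eng in countries_small.items():
        pos = s.find(ita)  # first occurrence only, -1 if absent
        if pos != -1:
            found.append((pos, eng))
    # Sorting the (position, name) pairs restores the order of appearance
    return [eng for pos, eng in sorted(found)]

print(naive('Regno Unito, Italia ed estero'))        # ['Italy', 'United Kingdom', 'abroad']
print(by_position('Regno Unito, Italia ed estero'))  # ['United Kingdom', 'Italy', 'abroad']
```

Note that in the full dictionary 'Italia' also comes before 'Regno Unito', which is exactly why the naive loop fails the last assert below.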
[6]:
def countries_to_list(s):
    #jupman-raise
    ret = []
    # normalize: newlines become spaces, connectives and punctuation are dropped
    ns = s.replace('\n', ' ')
    for connective in connectives:
        ns = ns.replace(' ' + connective + ' ', ' ')
    for p in punctuation:
        ns = ns.replace(p, '')
    # scan left to right, so countries come out in the order they appear
    i = 0
    while i < len(ns):
        for country in countries:
            if ns[i:].startswith(country):
                ret.append(countries[country])
                i += len(country)
                break
        i += 1  # crude but works for this dataset ;-)
    return ret
    #/jupman-raise
# single country
assert countries_to_list("Francia") == ['France']
# country with a city
assert countries_to_list("Austria Klagenfurt") == ['Austria']
# country with a space
assert countries_to_list("Paesi Bassi") == ['Netherlands']
# one country, newline, one city
assert countries_to_list("Italia\nLazise") == ['Italy']
# newline, multiple cities
assert countries_to_list("Norvegia\nMøre e Romsdal e Sogn og Fjordane.") == ['Norway']
# multiple countries - order *must* be preserved !
assert countries_to_list('Cipro Grecia Spagna') == ['Cyprus', 'Greece', 'Spain']
# punctuation and connectives, multiple countries - order *must* be preserved !
assert countries_to_list('Regno Unito, Italia ed estero') == ['United Kingdom', 'Italy', 'abroad']
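The normalization steps in the solution above replace a connective only when it is surrounded by spaces, and they turn newlines into spaces first. A quick check with plain strings shows why both details matter: a bare replace of 'e' also eats letters inside country names, and in 'Italia\ned\nestero' the connective ' ed ' only becomes visible once the newlines are gone. The helper `normalize` is a name introduced here for illustration; it repeats the same steps as the solution.

```python
connectives = ['e', 'ed']
punctuation = ['.', ';', ',']

def normalize(s):
    # Same order as the solution: newlines, then padded connectives, then punctuation
    ns = s.replace('\n', ' ')
    for c in connectives:
        ns = ns.replace(' ' + c + ' ', ' ')
    for p in punctuation:
        ns = ns.replace(p, '')
    return ns

print('Grecia e Spagna'.replace('e', ''))          # prints: Grcia  Spagna (letters lost)
print(normalize('Grecia e Spagna'))                # prints: Grecia Spagna
print(normalize('Italia\ned\nestero'))             # prints: Italia estero
print(normalize('Regno Unito, Italia ed estero'))  # prints: Regno Unito Italia estero
```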
1.2 Filling column Workplace Country¶
✪ Now create a new column Workplace Country with data calculated using the function you just defined.
To do it, check the transform method in the Pandas worksheet.
[7]:
# write here
[8]:
# SOLUTION
offers['Workplace Country'] = offers['Workplace'].transform(countries_to_list)
[9]:
print()
print(" ***************** SOLUTION OUTPUT ********************")
offers
***************** SOLUTION OUTPUT ********************
[9]:
| Reference | Workplace | Positions | Qualification | Contract type | Required languages | Gross retribution | Offer description | Workplace Country |
---|---|---|---|---|---|---|---|---|---|
0 | 18331901000024 | Norvegia | 6 | Restaurant staff | Tempo determinato da maggio ad agosto | Inglese fluente + Vedi testo | Da 3500\nFr/\nmese | We will be working together with sales, prepar... | [Norway] |
1 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese; italiano; francese fluente | Da definire | Vos missions principales sont les suivantes : ... | [France] |
2 | 4954752 | Danimarca | 1 | Italian Sales Representative | Non specificato | Inglese; Italiano fluente | Da definire | Minimum 2 + years sales experience, preferably... | [Denmark] |
3 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... | [] |
4 | 10531631 | Svezia | 1 | Italian speaking purchase | Non specificato | Inglese; italiano fluente | Da definire | This is a varied Purchasing role, where your m... | [Sweden] |
5 | 51485 | Islanda | 1 | Pizza chef | Tempo determinato | Inglese Buono | Da definire | Job details/requirements: Experience in making... | [Iceland] |
6 | 4956299 | Danimarca | 1 | Regional Key account manager - Italy | Non specificato | Inglese; italiano fluente | Da definire | Requirements: possess good business acumen; ar... | [Denmark] |
7 | - | Italia\nLazise | 1 | Receptionist | Non specificato | Inglese; Tedesco fluente + Vedi testo | Min 1500€\nMax\n1800€\nnetto\nmese | Camping Village Du Parc, Lazise,Italy is looki... | [Italy] |
8 | 2099681 | Irlanda | 11 | Customer Service Representative in Athens | Non specificato | Italiano fluente; Inglese buono | Da definire | Responsibilities: Solving customers queries by... | [Ireland] |
9 | 12091902000474 | Norvegia | 1 | Dispatch personnel | Maggio – agosto 2019 | Inglese fluente + Vedi testo | Da definire | The Dispatch Team works outside in all weather... | [Norway] |
10 | 10000-1169373760-S | Svizzera | 1 | Mitarbeiter (m/w/d) im Verkaufsinnendienst | Non specificato | Tedesco fluente; francese e/o italiano buono | Da definire | Was Sie erwartet: telefonische und persönliche... | [Switzerland] |
11 | 10000-1168768920-S | Germania | 1 | Vertriebs assistent | Non specificato | Tedesco ed inglese fluente + italiano e/o spag... | Da definire | Ihre Tätigkeit: enge Zusammenarbeit mit unsere... | [] |
12 | 082BMLG | Francia | 1 | Second / Seconde de cuisine | Tempo determinato da aprile ad ottobre 2019 | Francese discreto | Da definire | Missions : Vous serez en charge de la mise en ... | [France] |
13 | 23107550 | Svezia | 1 | Waiter/Waitress | Non specificato | Inglese ed Italiano buono | Da definire | Bar Robusta are looking for someone that speak... | [Sweden] |
14 | 11949-11273083-S | Austria | 1 | Empfangskraft | Non specificato | Tedesco ed Inglese Fluente + vedi testo | Da definire | Erfolgreich abgeschlossene Ausbildung in der H... | [Austria] |
15 | 18331901000024 | Norvegia | 6 | Salesclerk | Da maggio ad ottobre | Inglese fluente + Vedi testo | Da definire | We will be working together with sales, prepar... | [Norway] |
16 | ID-11252967 | Austria | 1 | Verkaufssachbearbeiter für Italien (m/w) | Non specificato | Tedesco e italiano fluenti | 2574,68 Euro/\nmese | Unsere Anforderungen: Sie haben eine kaufmänni... | [Austria] |
17 | 10000-1162270517-S | Germania | 1 | Koch/Köchin | Non specificato | Italiano e tedesco buono | Da definire | Kenntnisse und Fertigkeiten: Erfolgreich abges... | [] |
18 | 2100937 | Irlanda | 1 | Garden Centre Assistant | Non specificato | Inglese fluente | Da definire | Applicants should have good plant knowledge an... | [Ireland] |
19 | WBS697919 | Paesi Bassi | 5 | Strawberries and Rhubarb processors | Da maggio a settembre | NaN | Vedi testo | In this job you will be busy picking strawberr... | [Netherlands] |
20 | 19361902000002 | Norvegia | 2 | Cleaners/renholdere Fishing Camp 2019 season | Tempo determinato da aprile ad ottobre 2019 | Inglese fluente | Da definire | Torsvåg Havfiske, estbl. 2005, is a touristcom... | [Norway] |
21 | 2095000 | Spagna | 15 | Customer service agent for solar energy | Non specificato | Inglese e tedesco fluenti | €21,000 per annum + 3.500 | One of our biggest clients offer a wide range ... | [Spain] |
22 | 58699222 | Norvegia | 1 | Receptionists tourist hotel | Da maggio a settembre o da giugno ad agosto | Inglese Fluente; francese e/o spagnolo buoni | Da definire | The job also incl communication with the kitch... | [Norway] |
23 | 10000-1169431325-S | Svizzera | 1 | Reiseverkehrskaufmann/-frau - Touristik | Non specificato | Tedesco Fluente + Vedi testo | Da definire | Wir erwarten: Abgeschlossene Reisebüroausbildu... | [Switzerland] |
24 | 082QNLW | Francia | 1 | Assistant administratif export avec Italie (H/F) | Non specificato | Francese ed italiano fluenti | Da definire | Vous serez en charge des missions suivantes po... | [France] |
25 | 2101510 | Irlanda | 1 | Receptionist | Non specificato | Inglese fluente; Tedesco discreto | Da definire | Receptionist required for the 2019 Season. Kno... | [Ireland] |
26 | 171767 | Spagna | 300 | Seasonal worker in a strawberry farm | Da febbraio a giugno | NaN | Da definire | Peon agricola (recolector fresa) / culegator d... | [Spain] |
27 | 14491903000005 | Norvegia\nMøre e Romsdal e Sogn og Fjordane. | 6 | Guider | Tempo determinato da maggio a settembre | Tedesco e inglese fluente + Italiano buono | 20000 NOK /mese | We require that you: are at least 20 years old... | [Norway] |
28 | 10000-1167210671-S | Germania | 1 | Sales Manager Südeuropa m/w | Tempo indeterminato | Inglese e tedesco fluente + Italiano e/o spagn... | Da definire | Ihr Profil :Idealerweise Erfahrung in der Text... | [] |
29 | 507 | Italia\ned\nestero | 25 | Animatori - coreografi - ballerini - istruttor... | Tempo determinato da aprile ad ottobre | Inglese Buono + Vedi testo | Vedi testo | Padronanza di una o più lingue tra queste (ita... | [Italy, abroad] |
30 | 846727 | Belgio | 1 | Junior Buyer Italian /English (m/v) | Non specificato | Inglese Ed italiano fluente | Da definire | You have a Bachelor degree. 2-3 years of profe... | [Belgium] |
31 | 10531631 | Svezia\nLund | 1 | Italian Speaking Sales Administration Officer | Tempo indeterminato | Inglese ed italiano fluente | Da definire | You will focus on: Act as our main contact for... | [Sweden] |
32 | 082ZFDB | Francia | 1 | Assistant Administratif et Commercial Bilingue... | Non specificato | Francese ed italiano fluente | Da definire | Au sein de l'équipe administrative, vous trava... | [France] |
33 | 1807568 | Regno Unito | 1 | Account Manager - German, Italian, Spanish, Dutch | Non specificato | Inglese Fluente + Vedi testo | £25,000 per annum | Account Manager The Candidate You will be an e... | [United Kingdom] |
34 | 2103264 | Irlanda | 1 | Receptionist - Summer | Da maggio a settembre | Inglese fluente | Da definire | Assist with any ad-hoc project as required by ... | [Ireland] |
35 | ID-11146984 | Austria Klagenfurt | 1 | Nachwuchsführungskraft im Agrarhandel / Traine... | Non specificato | Tedesco; Italiano buono | 1.950\nEuro/ mese | Ihre Qualifikationen: landwirtschaftliche Ausb... | [Austria] |
36 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... | [] |
37 | 243096 | Spagna | 1 | Customer Service with French and Italian | Non specificato | Italiano; Francese fluente; Spagnolo buono | Da definire | As an IT Helpdesk, you will be responsible for... | [Spain] |
38 | 9909319 | Francia | 1 | Commercial Web Italie (H/F) | Non specificato | Italiano; Francese fluente | Da definire | Profil : Première expérience réussie dans la v... | [France] |
39 | WBS1253419 | Paesi\nBassi | 1 | Customer service employee Dow | Tempo determinato | Inglese; italiano fluente + vedi testo | Da definire | Requirements: You have a bachelor degree or hi... | [Netherlands] |
40 | 70cb25b1-5510-11e9-b89f-005056ac086d | Svizzera | 1 | Hauswart/In | Non specificato | Tedesco buono | Da definire | Wir suchen in unserem Team einen Mitarbeiter m... | [Switzerland] |
41 | 10000-1170625924-S | Germania | 1 | Monteur (m/w/d) Photovoltaik (Elektroanlagenmo... | Non specificato | Tedesco e/o inglese buono | Da definire | Anforderungen an die Bewerber/innen: abgeschlo... | [] |
42 | 2106868 | Irlanda | 1 | Retail Store Assistant | Non specificato | Inglese Fluente | Da definire | Retail Store Assistant required for a SPAR sho... | [Ireland] |
43 | 23233743 | Svezia | 1 | E-commerce copywriter | Non specificato | Inglese Fluente + vedi testo | Da definire | We support 15 languages incl Chinese, Russian ... | [Sweden] |
44 | ID-11478229 | Italia\nAustria | 1 | Forstarbeiter/in | Aprile – maggio 2019 | Tedesco italiano discreto | €9,50\n/ora | ANFORDERUNGSPROFIL: Pflichtschulabschluss und ... | [Italy, Austria] |
45 | ID-11477956 | Austria | 1 | Koch/Köchin für italienische Küche in Teilzeit | Non specificato | Tedesco buono | Da definire | ANFORDERUNGSPROFIL:Erfahrung mit Pasta & Pizze... | [Austria] |
46 | 6171903000036 | Norvegia\nHesla Gaard | 1 | Maid / Housekeeping assistant | Tempo determinato da aprile a dicembre | Inglese fluente | 20.000 NOK mese | Responsibility for cleaning off our apartments... | [Norway] |
47 | 9909319 | Finlandia | 1 | Test Designer | Non specificato | Inglese fluente | Da definire | As Test Designer in R&D Devices team you will:... | [Finland] |
48 | ID-11239341 | Cipro Grecia Spagna | 5 | Animateur 2019 (m/w) | Tempo determinato aprile-ottobre | Tedesco; inglese buono | 800\n€/mese | Deine Fähigkeiten: Im Vordergrund steht Deine ... | [Cyprus, Greece, Spain] |
49 | 10000-1167068836-S | Germania | 2 | Verkaufshilfe im Souvenirshop (m/w/d) 5 Tage-W... | Contratto stagionale fino a novembre 2019 | Tedesco buono; Inglese buono | Da definire | Wir bieten: Einen zukunftssicheren, saisonalen... | [] |
50 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese francese; Italiano fluente | Da definire | Description : Au sein d'une équipe de 10 perso... | [France] |
51 | 4956299 | Belgio | 1 | ACCOUNT MANAGER EXPORT ITALIE - HAYS - StepSto... | Non specificato | Inglese francese; Italiano fluente | Da definire | Votre profil : Pour ce poste, nous recherchons... | [Belgium] |
52 | - | Austria\nPfenninger Alm | 1 | Cameriere e Commis de rang | Non specificato | Inglese buono; tedesco preferibile | 1500-1600\n€/mese | Lavoro estivo nella periferia di Salisburgo. E... | [Austria] |
2. Work dates¶
You will add columns holding the months in which a job starts and ends.
2.1 from_to function¶
✪✪ First define the from_to function, which takes some text from the "Contract type" column and RETURNS a tuple holding the extracted month numbers (starting from ONE, not zero!)
Example:
In this case the result is (5, 8), because May is the fifth month and August is the eighth:
>>> from_to("Tempo determinato da maggio ad agosto")
(5,8)
If it is not possible to extract the months, the function should return a tuple holding NaNs:
>>> from_to('Non specificato')
(np.nan, np.nan)
Beware: NaNs can lead to puzzling results, so make sure you have read the NaN and Infinities section in the Numpy Matrices notebook.
For other patterns to check, see asserts.
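The assert `from_to('Non specificato') == (np.nan, np.nan)` in the cell below only passes because of a detail worth seeing once. NaN is never equal to itself, but Python's built-in containers (tuples, lists) first check whether two elements are the very same object, so tuples holding the same NaN object compare equal, while two separately created NaNs do not. The sketch uses `math.nan`, which, like `np.nan`, is a single module-level object; `both_nan` is a hypothetical helper showing the identity-independent check with `math.isnan`.

```python
import math

a = math.nan
print(a == a)                              # False: NaN never equals itself
print((a, a) == (math.nan, math.nan))      # True: same object, identity shortcut
print((float('nan'),) == (float('nan'),))  # False: two distinct NaN objects

def both_nan(pair):
    # Robust check that does not depend on object identity
    return all(isinstance(x, float) and math.isnan(x) for x in pair)

print(both_nan((float('nan'), float('nan'))))  # True
```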
[10]:
import numpy as np
months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
def from_to(text):
    #jupman-raise
    # lowercase and turn 'ad' into 'a', so 'da X a Y' and 'da X ad Y' look the same
    ntext = text.lower().replace('ad ', 'a ')
    found = False
    if 'da ' in ntext:
        from_pos = ntext.find('da ') + 3
        from_month = ntext[from_pos:].split(' ')[0]
        if ' a ' in ntext:
            to_pos = ntext.find(' a ') + 3
            to_month = ntext[to_pos:].split(' ')[0]
            found = True
    if '–' in ntext:  # en dash, NOT a minus
        parts = ntext.split(' – ')
        from_month = parts[0].split(' ')[-1]  # last word before the dash
        to_month = parts[1].split(' ')[0]     # 'agosto 2019' -> 'agosto'
        found = True
    if found:
        from_number = months.index(from_month) + 1
        to_number = months.index(to_month) + 1
        return (from_number, to_number)
    else:
        return (np.nan, np.nan)
    #/jupman-raise
assert from_to('Da maggio a settembre') == (5,9)
assert from_to('Da maggio ad ottobre') == (5, 10)
assert from_to('Tempo determinato da maggio ad agosto') == (5,8)
# Unspecified
assert from_to('Non specificato') == (np.nan, np.nan)
# WARNING: BE SUPERCAREFUL ABOUT THIS ONE: SYMBOL – IS *NOT* A MINUS !!
# COPY AND PASTE IT EXACTLY AS YOU FIND IT HERE
# (BUT OF COURSE *DO NOT COPY* THE MONTH NAMES !)
assert from_to('Maggio – agosto 2019') == (5, 8)
# special case 'or': we just consider the first interval and ignore the following one
assert from_to('Da maggio a settembre o da giugno ad agosto') == (5,9)
# special case: only the end month is given, so we ignore the whole text
assert from_to('Contratto stagionale fino a novembre 2019') == (np.nan, np.nan)
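The string-searching solution above is tailored to this dataset. As an alternative, a regular expression can describe both accepted patterns at once. This is a sketch under stated assumptions: `from_to_re`, `MONTHS` and `PATTERN` are names introduced here, only the 'da X a/ad Y' form and the 'X – Y' form with an en dash (U+2013) are recognized, a plain hyphen is deliberately not accepted, and when several intervals appear the leftmost one wins, as in the 'or' special case.

```python
import math
import re

MONTHS = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
M = '|'.join(MONTHS)

# Branch 1: "da <month> a|ad <month>"; branch 2: "<month> – <month>" (en dash)
PATTERN = re.compile(
    rf'(?:\bda\s+({M})\s+ad?\s+({M}))|(?:\b({M})\s*\u2013\s*({M}))')

def from_to_re(text):
    m = PATTERN.search(text.lower())  # leftmost match only
    if m is None:
        return (math.nan, math.nan)
    # Only one pair of groups is filled, depending on which branch matched
    start = m.group(1) or m.group(3)
    end = m.group(2) or m.group(4)
    return (MONTHS.index(start) + 1, MONTHS.index(end) + 1)

print(from_to_re('Tempo determinato da maggio ad agosto'))  # (5, 8)
print(from_to_re('Tempo determinato aprile-ottobre'))       # (nan, nan): hyphen, not en dash
```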
2.2 From To columns¶
✪ Change the offers dataframe so as to add From and To columns.
HINT 1: you can call transform; see the Transforming section in the Pandas worksheet.
HINT 2: to extract the element you want from the tuple, you can pass transform a function defined on the fly with lambda. See the lambdas section in the Functions worksheet.
[11]:
# write here
[12]:
# SOLUTION
offers['From'] = offers['Contract type'].transform(lambda t: from_to(t)[0])
offers['To'] = offers['Contract type'].transform(lambda t: from_to(t)[1])
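The solution cell above uses two lambdas, each keeping one element of the tuple, so every contract text is parsed twice. In plain Python the same idea, and a parse-once alternative that splits the pairs with `zip(*...)`, look like this. `parse_range` is a hypothetical stand-in for from_to (a hard-coded lookup, only for the demo).

```python
import math

def parse_range(text):
    # Hypothetical stand-in for from_to: hard-coded lookup, just for this demo
    known = {'Da maggio a settembre': (5, 9), 'Da febbraio a giugno': (2, 6)}
    return known.get(text, (math.nan, math.nan))

texts = ['Da maggio a settembre', 'Non specificato', 'Da febbraio a giugno']

# Two lambdas, each picking one tuple element: every text is parsed twice
froms = list(map(lambda t: parse_range(t)[0], texts))
tos = list(map(lambda t: parse_range(t)[1], texts))
print(froms, tos)    # [5, nan, 2] [9, nan, 6]

# Parse once, then transpose the list of pairs with zip(*...)
pairs = [parse_range(t) for t in texts]
froms2, tos2 = zip(*pairs)
print(froms2, tos2)  # (5, nan, 2) (9, nan, 6)
```

For a table of a few dozen rows, the double parsing of the two-lambda version costs nothing noticeable, so its simplicity is a reasonable trade-off.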
[13]:
print()
print(" **************** SOLUTION OUTPUT ****************")
offers
**************** SOLUTION OUTPUT ****************
[13]:
| Reference | Workplace | Positions | Qualification | Contract type | Required languages | Gross retribution | Offer description | Workplace Country | From | To |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18331901000024 | Norvegia | 6 | Restaurant staff | Tempo determinato da maggio ad agosto | Inglese fluente + Vedi testo | Da 3500\nFr/\nmese | We will be working together with sales, prepar... | [Norway] | 5.0 | 8.0 |
1 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese; italiano; francese fluente | Da definire | Vos missions principales sont les suivantes : ... | [France] | NaN | NaN |
2 | 4954752 | Danimarca | 1 | Italian Sales Representative | Non specificato | Inglese; Italiano fluente | Da definire | Minimum 2 + years sales experience, preferably... | [Denmark] | NaN | NaN |
3 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... | [] | NaN | NaN |
4 | 10531631 | Svezia | 1 | Italian speaking purchase | Non specificato | Inglese; italiano fluente | Da definire | This is a varied Purchasing role, where your m... | [Sweden] | NaN | NaN |
5 | 51485 | Islanda | 1 | Pizza chef | Tempo determinato | Inglese Buono | Da definire | Job details/requirements: Experience in making... | [Iceland] | NaN | NaN |
6 | 4956299 | Danimarca | 1 | Regional Key account manager - Italy | Non specificato | Inglese; italiano fluente | Da definire | Requirements: possess good business acumen; ar... | [Denmark] | NaN | NaN |
7 | - | Italia\nLazise | 1 | Receptionist | Non specificato | Inglese; Tedesco fluente + Vedi testo | Min 1500€\nMax\n1800€\nnetto\nmese | Camping Village Du Parc, Lazise,Italy is looki... | [Italy] | NaN | NaN |
8 | 2099681 | Irlanda | 11 | Customer Service Representative in Athens | Non specificato | Italiano fluente; Inglese buono | Da definire | Responsibilities: Solving customers queries by... | [Ireland] | NaN | NaN |
9 | 12091902000474 | Norvegia | 1 | Dispatch personnel | Maggio – agosto 2019 | Inglese fluente + Vedi testo | Da definire | The Dispatch Team works outside in all weather... | [Norway] | 5.0 | 8.0 |
10 | 10000-1169373760-S | Svizzera | 1 | Mitarbeiter (m/w/d) im Verkaufsinnendienst | Non specificato | Tedesco fluente; francese e/o italiano buono | Da definire | Was Sie erwartet: telefonische und persönliche... | [Switzerland] | NaN | NaN |
11 | 10000-1168768920-S | Germania | 1 | Vertriebs assistent | Non specificato | Tedesco ed inglese fluente + italiano e/o spag... | Da definire | Ihre Tätigkeit: enge Zusammenarbeit mit unsere... | [] | NaN | NaN |
12 | 082BMLG | Francia | 1 | Second / Seconde de cuisine | Tempo determinato da aprile ad ottobre 2019 | Francese discreto | Da definire | Missions : Vous serez en charge de la mise en ... | [France] | 4.0 | 10.0 |
13 | 23107550 | Svezia | 1 | Waiter/Waitress | Non specificato | Inglese ed Italiano buono | Da definire | Bar Robusta are looking for someone that speak... | [Sweden] | NaN | NaN |
14 | 11949-11273083-S | Austria | 1 | Empfangskraft | Non specificato | Tedesco ed Inglese Fluente + vedi testo | Da definire | Erfolgreich abgeschlossene Ausbildung in der H... | [Austria] | NaN | NaN |
15 | 18331901000024 | Norvegia | 6 | Salesclerk | Da maggio ad ottobre | Inglese fluente + Vedi testo | Da definire | We will be working together with sales, prepar... | [Norway] | 5.0 | 10.0 |
16 | ID-11252967 | Austria | 1 | Verkaufssachbearbeiter für Italien (m/w) | Non specificato | Tedesco e italiano fluenti | 2574,68 Euro/\nmese | Unsere Anforderungen: Sie haben eine kaufmänni... | [Austria] | NaN | NaN |
17 | 10000-1162270517-S | Germania | 1 | Koch/Köchin | Non specificato | Italiano e tedesco buono | Da definire | Kenntnisse und Fertigkeiten: Erfolgreich abges... | [] | NaN | NaN |
18 | 2100937 | Irlanda | 1 | Garden Centre Assistant | Non specificato | Inglese fluente | Da definire | Applicants should have good plant knowledge an... | [Ireland] | NaN | NaN |
19 | WBS697919 | Paesi Bassi | 5 | Strawberries and Rhubarb processors | Da maggio a settembre | NaN | Vedi testo | In this job you will be busy picking strawberr... | [Netherlands] | 5.0 | 9.0 |
20 | 19361902000002 | Norvegia | 2 | Cleaners/renholdere Fishing Camp 2019 season | Tempo determinato da aprile ad ottobre 2019 | Inglese fluente | Da definire | Torsvåg Havfiske, estbl. 2005, is a touristcom... | [Norway] | 4.0 | 10.0 |
21 | 2095000 | Spagna | 15 | Customer service agent for solar energy | Non specificato | Inglese e tedesco fluenti | €21,000 per annum + 3.500 | One of our biggest clients offer a wide range ... | [Spain] | NaN | NaN |
22 | 58699222 | Norvegia | 1 | Receptionists tourist hotel | Da maggio a settembre o da giugno ad agosto | Inglese Fluente; francese e/o spagnolo buoni | Da definire | The job also incl communication with the kitch... | [Norway] | 5.0 | 9.0 |
23 | 10000-1169431325-S | Svizzera | 1 | Reiseverkehrskaufmann/-frau - Touristik | Non specificato | Tedesco Fluente + Vedi testo | Da definire | Wir erwarten: Abgeschlossene Reisebüroausbildu... | [Switzerland] | NaN | NaN |
24 | 082QNLW | Francia | 1 | Assistant administratif export avec Italie (H/F) | Non specificato | Francese ed italiano fluenti | Da definire | Vous serez en charge des missions suivantes po... | [France] | NaN | NaN |
25 | 2101510 | Irlanda | 1 | Receptionist | Non specificato | Inglese fluente; Tedesco discreto | Da definire | Receptionist required for the 2019 Season. Kno... | [Ireland] | NaN | NaN |
26 | 171767 | Spagna | 300 | Seasonal worker in a strawberry farm | Da febbraio a giugno | NaN | Da definire | Peon agricola (recolector fresa) / culegator d... | [Spain] | 2.0 | 6.0 |
27 | 14491903000005 | Norvegia\nMøre e Romsdal e Sogn og Fjordane. | 6 | Guider | Tempo determinato da maggio a settembre | Tedesco e inglese fluente + Italiano buono | 20000 NOK /mese | We require that you: are at least 20 years old... | [Norway] | 5.0 | 9.0 |
28 | 10000-1167210671-S | Germania | 1 | Sales Manager Südeuropa m/w | Tempo indeterminato | Inglese e tedesco fluente + Italiano e/o spagn... | Da definire | Ihr Profil :Idealerweise Erfahrung in der Text... | [] | NaN | NaN |
29 | 507 | Italia\ned\nestero | 25 | Animatori - coreografi - ballerini - istruttor... | Tempo determinato da aprile ad ottobre | Inglese Buono + Vedi testo | Vedi testo | Padronanza di una o più lingue tra queste (ita... | [Italy, abroad] | 4.0 | 10.0 |
30 | 846727 | Belgio | 1 | Junior Buyer Italian /English (m/v) | Non specificato | Inglese Ed italiano fluente | Da definire | You have a Bachelor degree. 2-3 years of profe... | [Belgium] | NaN | NaN |
31 | 10531631 | Svezia\nLund | 1 | Italian Speaking Sales Administration Officer | Tempo indeterminato | Inglese ed italiano fluente | Da definire | You will focus on: Act as our main contact for... | [Sweden] | NaN | NaN |
32 | 082ZFDB | Francia | 1 | Assistant Administratif et Commercial Bilingue... | Non specificato | Francese ed italiano fluente | Da definire | Au sein de l'équipe administrative, vous trava... | [France] | NaN | NaN |
33 | 1807568 | Regno Unito | 1 | Account Manager - German, Italian, Spanish, Dutch | Non specificato | Inglese Fluente + Vedi testo | £25,000 per annum | Account Manager The Candidate You will be an e... | [United Kingdom] | NaN | NaN |
34 | 2103264 | Irlanda | 1 | Receptionist - Summer | Da maggio a settembre | Inglese fluente | Da definire | Assist with any ad-hoc project as required by ... | [Ireland] | 5.0 | 9.0 |
35 | ID-11146984 | Austria Klagenfurt | 1 | Nachwuchsführungskraft im Agrarhandel / Traine... | Non specificato | Tedesco; Italiano buono | 1.950\nEuro/ mese | Ihre Qualifikationen: landwirtschaftliche Ausb... | [Austria] | NaN | NaN |
36 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... | [] | NaN | NaN |
37 | 243096 | Spagna | 1 | Customer Service with French and Italian | Non specificato | Italiano; Francese fluente; Spagnolo buono | Da definire | As an IT Helpdesk, you will be responsible for... | [Spain] | NaN | NaN |
38 | 9909319 | Francia | 1 | Commercial Web Italie (H/F) | Non specificato | Italiano; Francese fluente | Da definire | Profil : Première expérience réussie dans la v... | [France] | NaN | NaN |
39 | WBS1253419 | Paesi\nBassi | 1 | Customer service employee Dow | Tempo determinato | Inglese; italiano fluente + vedi testo | Da definire | Requirements: You have a bachelor degree or hi... | [Netherlands] | NaN | NaN |
40 | 70cb25b1-5510-11e9-b89f-005056ac086d | Svizzera | 1 | Hauswart/In | Non specificato | Tedesco buono | Da definire | Wir suchen in unserem Team einen Mitarbeiter m... | [Switzerland] | NaN | NaN |
41 | 10000-1170625924-S | Germania | 1 | Monteur (m/w/d) Photovoltaik (Elektroanlagenmo... | Non specificato | Tedesco e/o inglese buono | Da definire | Anforderungen an die Bewerber/innen: abgeschlo... | [] | NaN | NaN |
42 | 2106868 | Irlanda | 1 | Retail Store Assistant | Non specificato | Inglese Fluente | Da definire | Retail Store Assistant required for a SPAR sho... | [Ireland] | NaN | NaN |
43 | 23233743 | Svezia | 1 | E-commerce copywriter | Non specificato | Inglese Fluente + vedi testo | Da definire | We support 15 languages incl Chinese, Russian ... | [Sweden] | NaN | NaN |
44 | ID-11478229 | Italia\nAustria | 1 | Forstarbeiter/in | Aprile – maggio 2019 | Tedesco italiano discreto | €9,50\n/ora | ANFORDERUNGSPROFIL: Pflichtschulabschluss und ... | [Italy, Austria] | 4.0 | 4.0 |
45 | ID-11477956 | Austria | 1 | Koch/Köchin für italienische Küche in Teilzeit | Non specificato | Tedesco buono | Da definire | ANFORDERUNGSPROFIL:Erfahrung mit Pasta & Pizze... | [Austria] | NaN | NaN |
46 | 6171903000036 | Norvegia\nHesla Gaard | 1 | Maid / Housekeeping assistant | Tempo determinato da aprile a dicembre | Inglese fluente | 20.000 NOK mese | Responsibility for cleaning off our apartments... | [Norway] | 4.0 | 12.0 |
47 | 9909319 | Finlandia | 1 | Test Designer | Non specificato | Inglese fluente | Da definire | As Test Designer in R&D Devices team you will:... | [Finland] | NaN | NaN |
48 | ID-11239341 | Cipro Grecia Spagna | 5 | Animateur 2019 (m/w) | Tempo determinato aprile-ottobre | Tedesco; inglese buono | 800\n€/mese | Deine Fähigkeiten: Im Vordergrund steht Deine ... | [Cyprus, Greece, Spain] | NaN | NaN |
49 | 10000-1167068836-S | Germania | 2 | Verkaufshilfe im Souvenirshop (m/w/d) 5 Tage-W... | Contratto stagionale fino a novembre 2019 | Tedesco buono; Inglese buono | Da definire | Wir bieten: Einen zukunftssicheren, saisonalen... | [] | NaN | NaN |
50 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese francese; Italiano fluente | Da definire | Description : Au sein d'une équipe de 10 perso... | [France] | NaN | NaN |
51 | 4956299 | Belgio | 1 | ACCOUNT MANAGER EXPORT ITALIE - HAYS - StepSto... | Non specificato | Inglese francese; Italiano fluente | Da definire | Votre profil : Pour ce poste, nous recherchons... | [Belgium] | NaN | NaN |
52 | - | Austria\nPfenninger Alm | 1 | Cameriere e Commis de rang | Non specificato | Inglese buono; tedesco preferibile | 1500-1600\n€/mese | Lavoro estivo nella periferia di Salisburgo. E... | [Austria] | NaN | NaN |
3. Required languages¶
Now we will try to extract the required languages.
3.1 function reqlan¶
✪✪✪ First implement function reqlan that, given a string from column 'Required languages', produces a dictionary with the extracted languages and their associated level codes in the CEFR standard (Common European Framework of Reference for Languages).
Example:
>>> reqlan("Italiano; Francese fluente; Spagnolo buono")
{'italian': 'C1', 'french': 'C1', 'spanish': 'B2'}
To know how the Italian words should be translated, use the dictionaries provided in the following cell.
See tests for more cases to handle.
WARNING 1: the function takes a single string !!
WARNING 2: BE VERY CAREFUL WITH NaN input !
The function might also receive a NaN value (math.nan or np.nan, which behave the same), in which case it should RETURN an empty dictionary:
>>> reqlan(np.nan)
{}
If you are checking for a NaN, DO NOT write
if text == np.nan: # WRONG !
To see why, do read the NaNs and Infinities section in the Numpy Matrices worksheet !
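A minimal sketch of why the equality test fails: NaN is the only float value that is not equal to itself, so `text == np.nan` is always False. The helper name `is_nan_value` is hypothetical. It uses only `math.isnan` and first checks the type, so strings never reach `math.isnan` (pandas also provides `pd.isna` for the same purpose).

```python
import math

def is_nan_value(x):
    """Return True only for float NaN values; strings and other types give False."""
    return isinstance(x, float) and math.isnan(x)

nan = float('nan')
print(nan == nan)                        # False: NaN never equals anything, not even itself
print(is_nan_value(nan))                 # True
print(is_nan_value("Inglese fluente"))   # False: strings are never NaN
```

Because `np.nan` is a plain Python float, the same check also works for values read by pandas.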
[14]:
languages = {
'italiano':'italian',
'tedesco':'german',
'francese':'french',
'inglese':'english',
'spagnolo':'spanish',
}
lang_levels = {
'discreto':'B1',
'buono':'B2',
'fluente':'C1',
}
def reqlan(text):
    #jupman-raise
    import math
    if type(text) != str and math.isnan(text):
        return {}
    ret = {}
    ntext = text.lower().replace('+ vedi testo', '')
    ntext = ntext.replace('e/o','; ')
    ntext = ntext.replace(' e ','; ')
    words = ntext.replace(';','').split(' ')
    found_langs = []
    for w in words:
        if w in languages:
            found_langs.append(w)
        if w in lang_levels or (w[:-1] +'e' in lang_levels):
            if w in lang_levels:
                label = lang_levels[w]
            else:
                label = lang_levels[w[:-1] + 'e']
            # all languages found so far share this level
            for lang in found_langs:
                ret[languages[lang]] = label
            found_langs = []   # reset
    return ret
    #/jupman-raise
# different languages may have different skills
assert reqlan("Italiano fluente; Inglese buono") == {'italian': 'C1',
'english': 'B2'}
# a sequence of languages terminating with a level is assumed to have that same level
assert reqlan("Inglese; italiano; francese fluente") == {'english': 'C1',
'italian':'C1',
'french' : 'C1'}
# semicolon absence shouldn't be a problem
assert reqlan("Tedesco italiano discreto") == {
'german':'B1',
'italian': 'B1'
}
# we can have multiple sequences
assert reqlan("Italiano; Francese fluente; Spagnolo buono") == {'italian': 'C1',
'french': 'C1',
'spanish': 'B2'}
# text after plus needs to be removed
assert reqlan("Inglese fluente + Vedi testo") == {'english': 'C1'}
# plural.
# NOTE: to do this, assume all plurals in the world
# are constructed by substituting the last character of singular words with 'i'
assert reqlan("Tedesco e italiano fluenti") == {'german':'C1',
'italian':'C1'}
# special case: we ignore codes in parentheses and just put B2
assert reqlan("Inglese Buono (B1-B2); Tedesco base") == {'english': 'B2'}
# e/o: 'and/or' case. We simplify and just list them like the others
assert reqlan("Tedesco fluente; francese e/o italiano buono") == { 'german':'C1',
'french':'B2',
'italian':'B2'
}
# of course there is a cell which is NaN :P
assert reqlan(np.nan) == {}
3.2 Languages column¶
✪ Now add the Languages column using the previously defined reqlan function:
[15]:
# write here
offers['Languages'] = offers['Required languages'].transform(reqlan)
[16]:
print()
print(" ******************* SOLUTION OUTPUT ***********************")
offers
******************* SOLUTION OUTPUT ***********************
[16]:
| Reference | Workplace | Positions | Qualification | Contract type | Required languages | Gross retribution | Offer description | Workplace Country | From | To | Languages |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 18331901000024 | Norvegia | 6 | Restaurant staff | Tempo determinato da maggio ad agosto | Inglese fluente + Vedi testo | Da 3500\nFr/\nmese | We will be working together with sales, prepar... | [Norway] | 5.0 | 8.0 | {'english': 'C1'} |
1 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese; italiano; francese fluente | Da definire | Vos missions principales sont les suivantes : ... | [France] | NaN | NaN | {'english': 'C1', 'italian': 'C1', 'french': '... |
2 | 4954752 | Danimarca | 1 | Italian Sales Representative | Non specificato | Inglese; Italiano fluente | Da definire | Minimum 2 + years sales experience, preferably... | [Denmark] | NaN | NaN | {'english': 'C1', 'italian': 'C1'} |
3 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... | [] | NaN | NaN | {'english': 'B2'} |
4 | 10531631 | Svezia | 1 | Italian speaking purchase | Non specificato | Inglese; italiano fluente | Da definire | This is a varied Purchasing role, where your m... | [Sweden] | NaN | NaN | {'english': 'C1', 'italian': 'C1'} |
5 | 51485 | Islanda | 1 | Pizza chef | Tempo determinato | Inglese Buono | Da definire | Job details/requirements: Experience in making... | [Iceland] | NaN | NaN | {'english': 'B2'} |
6 | 4956299 | Danimarca | 1 | Regional Key account manager - Italy | Non specificato | Inglese; italiano fluente | Da definire | Requirements: possess good business acumen; ar... | [Denmark] | NaN | NaN | {'english': 'C1', 'italian': 'C1'} |
7 | - | Italia\nLazise | 1 | Receptionist | Non specificato | Inglese; Tedesco fluente + Vedi testo | Min 1500€\nMax\n1800€\nnetto\nmese | Camping Village Du Parc, Lazise,Italy is looki... | [Italy] | NaN | NaN | {'english': 'C1', 'german': 'C1'} |
8 | 2099681 | Irlanda | 11 | Customer Service Representative in Athens | Non specificato | Italiano fluente; Inglese buono | Da definire | Responsibilities: Solving customers queries by... | [Ireland] | NaN | NaN | {'italian': 'C1', 'english': 'B2'} |
9 | 12091902000474 | Norvegia | 1 | Dispatch personnel | Maggio – agosto 2019 | Inglese fluente + Vedi testo | Da definire | The Dispatch Team works outside in all weather... | [Norway] | 5.0 | 5.0 | {'english': 'C1'} |
10 | 10000-1169373760-S | Svizzera | 1 | Mitarbeiter (m/w/d) im Verkaufsinnendienst | Non specificato | Tedesco fluente; francese e/o italiano buono | Da definire | Was Sie erwartet: telefonische und persönliche... | [Switzerland] | NaN | NaN | {'german': 'C1', 'french': 'B2', 'italian': 'B2'} |
11 | 10000-1168768920-S | Germania | 1 | Vertriebs assistent | Non specificato | Tedesco ed inglese fluente + italiano e/o spag... | Da definire | Ihre Tätigkeit: enge Zusammenarbeit mit unsere... | [] | NaN | NaN | {'german': 'C1', 'english': 'C1', 'italian': '... |
12 | 082BMLG | Francia | 1 | Second / Seconde de cuisine | Tempo determinato da aprile ad ottobre 2019 | Francese discreto | Da definire | Missions : Vous serez en charge de la mise en ... | [France] | 4.0 | 10.0 | {'french': 'B1'} |
13 | 23107550 | Svezia | 1 | Waiter/Waitress | Non specificato | Inglese ed Italiano buono | Da definire | Bar Robusta are looking for someone that speak... | [Sweden] | NaN | NaN | {'english': 'B2', 'italian': 'B2'} |
14 | 11949-11273083-S | Austria | 1 | Empfangskraft | Non specificato | Tedesco ed Inglese Fluente + vedi testo | Da definire | Erfolgreich abgeschlossene Ausbildung in der H... | [Austria] | NaN | NaN | {'german': 'C1', 'english': 'C1'} |
15 | 18331901000024 | Norvegia | 6 | Salesclerk | Da maggio ad ottobre | Inglese fluente + Vedi testo | Da definire | We will be working together with sales, prepar... | [Norway] | 5.0 | 10.0 | {'english': 'C1'} |
16 | ID-11252967 | Austria | 1 | Verkaufssachbearbeiter für Italien (m/w) | Non specificato | Tedesco e italiano fluenti | 2574,68 Euro/\nmese | Unsere Anforderungen: Sie haben eine kaufmänni... | [Austria] | NaN | NaN | {'german': 'C1', 'italian': 'C1'} |
17 | 10000-1162270517-S | Germania | 1 | Koch/Köchin | Non specificato | Italiano e tedesco buono | Da definire | Kenntnisse und Fertigkeiten: Erfolgreich abges... | [] | NaN | NaN | {'italian': 'B2', 'german': 'B2'} |
18 | 2100937 | Irlanda | 1 | Garden Centre Assistant | Non specificato | Inglese fluente | Da definire | Applicants should have good plant knowledge an... | [Ireland] | NaN | NaN | {'english': 'C1'} |
19 | WBS697919 | Paesi Bassi | 5 | Strawberries and Rhubarb processors | Da maggio a settembre | NaN | Vedi testo | In this job you will be busy picking strawberr... | [Netherlands] | 5.0 | 9.0 | {} |
20 | 19361902000002 | Norvegia | 2 | Cleaners/renholdere Fishing Camp 2019 season | Tempo determinato da aprile ad ottobre 2019 | Inglese fluente | Da definire | Torsvåg Havfiske, estbl. 2005, is a touristcom... | [Norway] | 4.0 | 10.0 | {'english': 'C1'} |
21 | 2095000 | Spagna | 15 | Customer service agent for solar energy | Non specificato | Inglese e tedesco fluenti | €21,000 per annum + 3.500 | One of our biggest clients offer a wide range ... | [Spain] | NaN | NaN | {'english': 'C1', 'german': 'C1'} |
22 | 58699222 | Norvegia | 1 | Receptionists tourist hotel | Da maggio a settembre o da giugno ad agosto | Inglese Fluente; francese e/o spagnolo buoni | Da definire | The job also incl communication with the kitch... | [Norway] | 5.0 | 9.0 | {'english': 'C1'} |
23 | 10000-1169431325-S | Svizzera | 1 | Reiseverkehrskaufmann/-frau - Touristik | Non specificato | Tedesco Fluente + Vedi testo | Da definire | Wir erwarten: Abgeschlossene Reisebüroausbildu... | [Switzerland] | NaN | NaN | {'german': 'C1'} |
24 | 082QNLW | Francia | 1 | Assistant administratif export avec Italie (H/F) | Non specificato | Francese ed italiano fluenti | Da definire | Vous serez en charge des missions suivantes po... | [France] | NaN | NaN | {'french': 'C1', 'italian': 'C1'} |
25 | 2101510 | Irlanda | 1 | Receptionist | Non specificato | Inglese fluente; Tedesco discreto | Da definire | Receptionist required for the 2019 Season. Kno... | [Ireland] | NaN | NaN | {'english': 'C1', 'german': 'B1'} |
26 | 171767 | Spagna | 300 | Seasonal worker in a strawberry farm | Da febbraio a giugno | NaN | Da definire | Peon agricola (recolector fresa) / culegator d... | [Spain] | 2.0 | 6.0 | {} |
27 | 14491903000005 | Norvegia\nMøre e Romsdal e Sogn og Fjordane. | 6 | Guider | Tempo determinato da maggio a settembre | Tedesco e inglese fluente + Italiano buono | 20000 NOK /mese | We require that you: are at least 20 years old... | [Norway] | 5.0 | 9.0 | {'german': 'C1', 'english': 'C1', 'italian': '... |
28 | 10000-1167210671-S | Germania | 1 | Sales Manager Südeuropa m/w | Tempo indeterminato | Inglese e tedesco fluente + Italiano e/o spagn... | Da definire | Ihr Profil :Idealerweise Erfahrung in der Text... | [] | NaN | NaN | {'english': 'C1', 'german': 'C1', 'italian': '... |
29 | 507 | Italia\ned\nestero | 25 | Animatori - coreografi - ballerini - istruttor... | Tempo determinato da aprile ad ottobre | Inglese Buono + Vedi testo | Vedi testo | Padronanza di una o più lingue tra queste (ita... | [Italy, abroad] | 4.0 | 10.0 | {'english': 'B2'} |
30 | 846727 | Belgio | 1 | Junior Buyer Italian /English (m/v) | Non specificato | Inglese Ed italiano fluente | Da definire | You have a Bachelor degree. 2-3 years of profe... | [Belgium] | NaN | NaN | {'english': 'C1', 'italian': 'C1'} |
31 | 10531631 | Svezia\nLund | 1 | Italian Speaking Sales Administration Officer | Tempo indeterminato | Inglese ed italiano fluente | Da definire | You will focus on: Act as our main contact for... | [Sweden] | NaN | NaN | {'english': 'C1', 'italian': 'C1'} |
32 | 082ZFDB | Francia | 1 | Assistant Administratif et Commercial Bilingue... | Non specificato | Francese ed italiano fluente | Da definire | Au sein de l'équipe administrative, vous trava... | [France] | NaN | NaN | {'french': 'C1', 'italian': 'C1'} |
33 | 1807568 | Regno Unito | 1 | Account Manager - German, Italian, Spanish, Dutch | Non specificato | Inglese Fluente + Vedi testo | £25,000 per annum | Account Manager The Candidate You will be an e... | [United Kingdom] | NaN | NaN | {'english': 'C1'} |
34 | 2103264 | Irlanda | 1 | Receptionist - Summer | Da maggio a settembre | Inglese fluente | Da definire | Assist with any ad-hoc project as required by ... | [Ireland] | 5.0 | 9.0 | {'english': 'C1'} |
35 | ID-11146984 | Austria Klagenfurt | 1 | Nachwuchsführungskraft im Agrarhandel / Traine... | Non specificato | Tedesco; Italiano buono | 1.950\nEuro/ mese | Ihre Qualifikationen: landwirtschaftliche Ausb... | [Austria] | NaN | NaN | {'german': 'B2', 'italian': 'B2'} |
36 | - | Berlino\nTrento | 1 | Apprendista perito elettronico; Elettrotecnico | Inizialmente contratto di apprendistato con po... | Inglese Buono (B1-B2); Tedesco base | Min 1000\nMax\n1170\n€/mese | Ti stai diplomando e/o stai cercando un primo ... | [] | NaN | NaN | {'english': 'B2'} |
37 | 243096 | Spagna | 1 | Customer Service with French and Italian | Non specificato | Italiano; Francese fluente; Spagnolo buono | Da definire | As an IT Helpdesk, you will be responsible for... | [Spain] | NaN | NaN | {'italian': 'C1', 'french': 'C1', 'spanish': '... |
38 | 9909319 | Francia | 1 | Commercial Web Italie (H/F) | Non specificato | Italiano; Francese fluente | Da definire | Profil : Première expérience réussie dans la v... | [France] | NaN | NaN | {'italian': 'C1', 'french': 'C1'} |
39 | WBS1253419 | Paesi\nBassi | 1 | Customer service employee Dow | Tempo determinato | Inglese; italiano fluente + vedi testo | Da definire | Requirements: You have a bachelor degree or hi... | [Netherlands] | NaN | NaN | {'english': 'C1', 'italian': 'C1'} |
40 | 70cb25b1-5510-11e9-b89f-005056ac086d | Svizzera | 1 | Hauswart/In | Non specificato | Tedesco buono | Da definire | Wir suchen in unserem Team einen Mitarbeiter m... | [Switzerland] | NaN | NaN | {'german': 'B2'} |
41 | 10000-1170625924-S | Germania | 1 | Monteur (m/w/d) Photovoltaik (Elektroanlagenmo... | Non specificato | Tedesco e/o inglese buono | Da definire | Anforderungen an die Bewerber/innen: abgeschlo... | [] | NaN | NaN | {'german': 'B2', 'english': 'B2'} |
42 | 2106868 | Irlanda | 1 | Retail Store Assistant | Non specificato | Inglese Fluente | Da definire | Retail Store Assistant required for a SPAR sho... | [Ireland] | NaN | NaN | {'english': 'C1'} |
43 | 23233743 | Svezia | 1 | E-commerce copywriter | Non specificato | Inglese Fluente + vedi testo | Da definire | We support 15 languages incl Chinese, Russian ... | [Sweden] | NaN | NaN | {'english': 'C1'} |
44 | ID-11478229 | Italia\nAustria | 1 | Forstarbeiter/in | Aprile – maggio 2019 | Tedesco italiano discreto | €9,50\n/ora | ANFORDERUNGSPROFIL: Pflichtschulabschluss und ... | [Italy, Austria] | 4.0 | 4.0 | {'german': 'B1', 'italian': 'B1'} |
45 | ID-11477956 | Austria | 1 | Koch/Köchin für italienische Küche in Teilzeit | Non specificato | Tedesco buono | Da definire | ANFORDERUNGSPROFIL:Erfahrung mit Pasta & Pizze... | [Austria] | NaN | NaN | {'german': 'B2'} |
46 | 6171903000036 | Norvegia\nHesla Gaard | 1 | Maid / Housekeeping assistant | Tempo determinato da aprile a dicembre | Inglese fluente | 20.000 NOK mese | Responsibility for cleaning off our apartments... | [Norway] | 4.0 | 12.0 | {'english': 'C1'} |
47 | 9909319 | Finlandia | 1 | Test Designer | Non specificato | Inglese fluente | Da definire | As Test Designer in R&D Devices team you will:... | [Finland] | NaN | NaN | {'english': 'C1'} |
48 | ID-11239341 | Cipro Grecia Spagna | 5 | Animateur 2019 (m/w) | Tempo determinato aprile-ottobre | Tedesco; inglese buono | 800\n€/mese | Deine Fähigkeiten: Im Vordergrund steht Deine ... | [Cyprus, Greece, Spain] | NaN | NaN | {'german': 'B2', 'english': 'B2'} |
49 | 10000-1167068836-S | Germania | 2 | Verkaufshilfe im Souvenirshop (m/w/d) 5 Tage-W... | Contratto stagionale fino a novembre 2019 | Tedesco buono; Inglese buono | Da definire | Wir bieten: Einen zukunftssicheren, saisonalen... | [] | NaN | NaN | {'german': 'B2', 'english': 'B2'} |
50 | 083PZMM | Francia | 1 | Assistant export trilingue italien et anglais ... | Non specificato | Inglese francese; Italiano fluente | Da definire | Description : Au sein d'une équipe de 10 perso... | [France] | NaN | NaN | {'english': 'C1', 'french': 'C1', 'italian': '... |
51 | 4956299 | Belgio | 1 | ACCOUNT MANAGER EXPORT ITALIE - HAYS - StepSto... | Non specificato | Inglese francese; Italiano fluente | Da definire | Votre profil : Pour ce poste, nous recherchons... | [Belgium] | NaN | NaN | {'english': 'C1', 'french': 'C1', 'italian': '... |
52 | - | Austria\nPfenninger Alm | 1 | Cameriere e Commis de rang | Non specificato | Inglese buono; tedesco preferibile | 1500-1600\n€/mese | Lavoro estivo nella periferia di Salisburgo. E... | [Austria] | NaN | NaN | {'english': 'B2'} |
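Row 22 above ('Inglese Fluente; francese e/o spagnolo buoni') only yields {'english': 'C1'}. The reqlan solution maps a plural back to its singular by replacing the last letter with 'e' (`w[:-1] + 'e'`). That works for 'fluenti' → 'fluente' but not for 'buoni' → 'buono'. The sketch below applies the stated plural assumption in the forward direction instead: each singular level word has its last letter replaced with 'i'. `level_of` is a hypothetical helper, not part of the exercise.

```python
lang_levels = {
    'discreto': 'B1',
    'buono': 'B2',
    'fluente': 'C1',
}

def level_of(word):
    """Return the CEFR code for a level word, singular or plural, else None.

    Assumes, as the exercise does, that a plural is formed by replacing
    the last character of the singular with 'i'.
    """
    if word in lang_levels:
        return lang_levels[word]
    for singular, code in lang_levels.items():
        if singular[:-1] + 'i' == word:
            return code
    return None

print(level_of('fluenti'))   # C1
print(level_of('buoni'))     # B2
print(level_of('base'))      # None
```

Suppose the `w in lang_levels or (w[:-1] + 'e' in lang_levels)` test in reqlan is replaced by `level_of(w) is not None`, with the label taken from `level_of(w)`. Row 22 would then also assign B2 to french and spanish.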
[1]:
#Please execute this cell
import sys
sys.path.append('../../')
import jupman
Midterm - Thu 07, Nov 2019 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before.
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), adding a comment that explains why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like, e.g.
if x > 5:
    return True
else:
    return False
when you could just write
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, that is OK:
# Not called by f, will get ignored:
def my_g(x):
    # bla
# Called by f, will be graded:
def my_f(y,z):
    # bla
def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files you can use any editor of your choice; you can find several under Applications->Programming:
Visual Studio Code
Editra is easy to use; you can find it under Applications->Programming->Editra.
Other options are GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-11-07-exam.zip and extract it on your desktop. The folder content should look like this:
datasciprolab-2019-11-07-FIRSTNAME-LASTNAME-ID
   |-jupman.py
   |-sciprog.py
   |-exams
      |-2019-11-07
          |- exam-2019-11-07-exercise.ipynb
Rename the datasciprolab-2019-11-07-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-11-07-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2019-11-07-exercise.ipynb
You will work on a dataset of events which take place in the Municipality of Trento in the years 2019-20. Each event can be held on a single day, on two days, or on many days specified as a range. Dates are written in natural language, so we will try to extract them, taking into account that the information can sometimes be partial or absent.
Data provider: Comune di Trento
License: Creative Commons Attribution 4.0
WARNING: avoid constants in function bodies !!
In the exercise data you will find many names and connectives such as ‘Giovedì’, ‘Novembre’, ‘e’, ‘a’, etc. DO NOT put such constant names inside the body of functions !! You have to write generic code which works with any input.
[2]:
import pandas as pd # we import pandas and for ease we rename it to 'pd'
import numpy as np # we import numpy and for ease we rename it to 'np'
# remember the encoding !
eventi = pd.read_csv('data/eventi.csv', encoding='UTF-8')
eventi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 253 entries, 0 to 252
Data columns (total 35 columns):
remoteId 253 non-null object
published 253 non-null object
modified 253 non-null object
Priorità 253 non-null int64
Evento speciale 0 non-null float64
Titolo 253 non-null object
Titolo breve 1 non-null object
Sottotitolo 227 non-null object
Descrizione 224 non-null object
Locandina 16 non-null object
Inizio 253 non-null object
Termine 252 non-null object
Quando 253 non-null object
Orario 251 non-null object
Durata 6 non-null object
Dove 252 non-null object
lat 253 non-null float64
lon 253 non-null float64
address 241 non-null object
Pagina web 201 non-null object
Contatto email 196 non-null object
Contatto telefonico 196 non-null object
Informazioni 62 non-null object
Costi 132 non-null object
Immagine 252 non-null object
Evento - manifestazione 252 non-null object
Manifestazione cui fa parte 108 non-null object
Tipologia 252 non-null object
Materia 252 non-null object
Destinatari 24 non-null object
Circoscrizione 109 non-null object
Struttura ospitante 220 non-null object
Associazione 1 non-null object
Ente organizzatore 0 non-null float64
Identificativo 0 non-null float64
dtypes: float64(5), int64(1), object(29)
memory usage: 69.3+ KB
We will concentrate on the Quando (When) column:
[3]:
eventi['Quando']
[3]:
0 venerdì 5 aprile alle 20:30 in via degli Olmi ...
1 Giovedì 7 novembre 2019
2 Giovedì 14 novembre 2019
3 Giovedì 21 novembre 2019
4 Giovedì 28 novembre 2019
...
248 sabato 9 novembre 2019
249 da venerdì 8 a domenica 10 novembre 2019
250 giovedì 7 novembre 2019
251 giovedì 28 novembre 2019
252 giovedì 21 novembre 2019
Name: Quando, Length: 253, dtype: object
A.1 leap_year¶
✪ A leap year has 366 days instead of the regular 365. You are given some criteria to detect whether or not a year is a leap year. Implement them in a function which, given a year as a number, RETURNS True if it is a leap year, False otherwise.
IMPORTANT: in Python there are predefined methods to detect leap years, but here you MUST write your own code!
1. If the year is evenly divisible by 4, go to step 2. Otherwise, go to step 5.
2. If the year is evenly divisible by 100, go to step 3. Otherwise, go to step 4.
3. If the year is evenly divisible by 400, go to step 4. Otherwise, go to step 5.
4. The year is a leap year (it has 366 days).
5. The year is not a leap year (it has 365 days).
(if you’re curious about calendars, see this link)
[4]:
def is_leap(year):
    #jupman-raise
    if year % 4 == 0:
        if year % 100 == 0:
            return year % 400 == 0
        else:
            return True
    else:
        return False
    #/jupman-raise
assert is_leap(4) == True
assert is_leap(104) == True
assert is_leap(204) == True
assert is_leap(400) == True
assert is_leap(1600) == True
assert is_leap(2000) == True
assert is_leap(2400) == True
assert is_leap(2004) == True
assert is_leap(2008) == True
assert is_leap(2012) == True
assert is_leap(1) == False
assert is_leap(5) == False
assert is_leap(100) == False
assert is_leap(200) == False
assert is_leap(1700) == False
assert is_leap(1800) == False
assert is_leap(1900) == False
assert is_leap(2100) == False
assert is_leap(2200) == False
assert is_leap(2300) == False
assert is_leap(2500) == False
assert is_leap(2600) == False
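The five steps above collapse into a single boolean expression. Below is a sketch under the hypothetical name `is_leap_compact`. It is cross-checked against the standard library's `calendar.isleap`, which the exercise forbids inside the solution but which is handy for testing.

```python
import calendar

def is_leap_compact(year):
    # divisible by 4, and either not a century year or divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# compare against the standard library on a range of years
mismatches = [y for y in range(1, 3001) if is_leap_compact(y) != calendar.isleap(y)]
print(mismatches)   # []
```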
A.2 full_date¶
✪✪ Write function full_date which takes some natural language text representing a complete date and outputs a string in the format yyyy-mm-dd, like 2019-03-25.
Dates will be expressed in Italian, so we report here the corresponding translations
your function should work regardless of the capitalization of the input
we assume the date is always well formed
Examples:
At the beginning you always have the day name (Mercoledì means Wednesday):
>>> full_date("Mercoledì 13 Novembre 2019")
"2019-11-13"
Right after the day name, you may also find a day phase, like mattina for morning:
>>> full_date("Mercoledì mattina 13 Novembre 2019")
"2019-11-13"
Remember you can have lowercase names and single-digit days, which must be prefixed with a zero:
>>> full_date("domenica 4 dicembre 1923")
"1923-12-04"
For more examples, see assertions.
[5]:
days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
months = ['gennaio', 'febbraio', 'marzo' , 'aprile' , 'maggio' , 'giugno',
'luglio' , 'agosto' , 'settembre', 'ottobre', 'novembre', 'dicembre' ]
# morning, afternoon, evening, night
day_phase = ['mattina', 'pomeriggio', 'sera', 'notte']
[6]:
def full_date(text):
    #jupman-raise
    ntext = text.lower()
    words = ntext.split()
    i = 1                       # words[0] is always the day name
    if words[i] in day_phase:   # optional day phase, e.g. 'mattina'
        i += 1
    day = int(words[i])
    i += 1
    month = months.index(words[i]) + 1
    i += 1
    year = int(words[i])
    return "{:04d}-{:02d}-{:02d}".format(year, month, day)
    #/jupman-raise
assert full_date("Giovedì 14 novembre 2019") == "2019-11-14"
assert full_date("Giovedì 7 novembre 2019") == "2019-11-07"
assert full_date("Giovedì pomeriggio 14 novembre 2019") == "2019-11-14"
assert full_date("sabato mattina 25 marzo 2017") == "2017-03-25"
assert full_date("Mercoledì 13 Novembre 2019") == "2019-11-13"
assert full_date("domenica 4 dicembre 1923") == "1923-12-04"
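Once day, month and year are extracted as integers, the standard library's `datetime.date` can produce the same yyyy-mm-dd string through `isoformat()`. It also rejects impossible dates with a ValueError. This is a sketch that assumes the Italian month list from the cell above; `to_iso` is a hypothetical helper.

```python
import datetime

months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']

def to_iso(day, month_name, year):
    """Build a yyyy-mm-dd string from an Italian month name, validating the date."""
    month = months.index(month_name.lower()) + 1
    # datetime.date raises ValueError for impossible dates like 30 February
    return datetime.date(year, month, day).isoformat()

print(to_iso(4, 'dicembre', 1923))    # 1923-12-04
print(to_iso(13, 'Novembre', 2019))   # 2019-11-13
```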
A.3 partial_date¶
✪✪✪ Write a function partial_date which takes a natural language text representing one or more dates, and RETURNS only the FIRST date found, in the format yyyy-mm-dd. If the FIRST date contains insufficient information to form a complete date, in the returned date leave the characters 'yyyy' for an unknown year, 'mm' for an unknown month and 'dd' for an unknown day.
NOTE: Here we only care about the FIRST date. DO NOT attempt to fetch any missing information from the second date; we will deal with that in a later exercise.
Examples:
>>> partial_date("Giovedì 7 novembre 2019")
"2019-11-07"
>>> partial_date("venerdì 15 novembre")
"yyyy-11-15"
>>> partial_date("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019")
"yyyy-mm-15"
For more examples, see asserts.
[7]:
connective_and = 'e'
connective_from = 'da'
connective_to = 'a'
days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
months = ['gennaio', 'febbraio', 'marzo' , 'aprile' , 'maggio' , 'giugno',
'luglio' , 'agosto' , 'settembre', 'ottobre', 'novembre', 'dicembre' ]
# morning, afternoon, evening, night
day_phases = ['mattina', 'pomeriggio', 'sera', 'notte']
[8]:
def partial_date(text):
    #jupman-raise
    if type(text) != str:
        return 'yyyy-mm-dd'
    year = 'yyyy'
    month = 'mm'
    day = 'dd'
    ntext = text.lower()
    words = ntext.split()
    if len(words) > 0:
        if words[0] == connective_from:
            i = 1
        else:
            i = 0
        if words[i] in days:
            i = i + 1
        if words[i] in day_phases:
            i += 1
        day = "{:02d}".format(int(words[i]))
        i += 1
        if i < len(words):
            # in the 'e' case with a double date, no month follows the first day
            if words[i] in months:
                month = "{:02d}".format(months.index(words[i]) + 1)
                i += 1
                if i < len(words):
                    if words[i].isdigit():
                        year = "{:04d}".format(int(words[i]))
    return "%s-%s-%s" % (year, month, day)
    #/jupman-raise
# complete, uppercase day
assert partial_date("Giovedì 7 novembre 2019") == "2019-11-07"
assert partial_date("Giovedì 14 novembre 2019") == "2019-11-14"
# lowercase day
assert partial_date("mercoledì 13 novembre 2019") == "2019-11-13"
# lowercase, dayphase, missing month and year
assert partial_date("venerdì pomeriggio 15") == "yyyy-mm-15"
# single day, lowercase, no year
assert partial_date("venerdì 15 novembre") == "yyyy-11-15"
# no year, hour / location to be discarded
assert partial_date("venerdì 5 aprile alle 20:30 in via degli Olmi 26 (Trento sud)")\
== "yyyy-04-05"
# two dates, 'and' connective ('e'), day phase morning/afternoon ('mattina'/'pomeriggio')
assert partial_date("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019") \
== "yyyy-mm-15"
# two dates, begins with connective 'Da'
assert partial_date("Da lunedì 25 novembre a domenica 01 dicembre 2019") == "yyyy-11-25"
assert partial_date("da giovedì 12 a domenica 15 dicembre 2019") == "yyyy-mm-12"
assert partial_date("da giovedì 9 a domenica 12 gennaio 2020") == "yyyy-mm-09"
assert partial_date("Da lunedì 04 a domenica 10 novembre 2019") == "yyyy-mm-04"
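As an alternative technique (not the solution above), the same "optional tokens, then day, then optional month and year" structure can be written as a single regular expression built from the vocabulary lists. This is a sketch: the names `alternatives`, `FIRST_DATE` and `partial_date_re` are hypothetical. Unlike the loop-based version, it returns the 'yyyy-mm-dd' placeholder when no day number is found, instead of failing.

```python
import re

connective_from = 'da'
days = ['lunedì', 'martedì', 'mercoledì', 'giovedì', 'venerdì', 'sabato', 'domenica']
months = ['gennaio', 'febbraio', 'marzo', 'aprile', 'maggio', 'giugno',
          'luglio', 'agosto', 'settembre', 'ottobre', 'novembre', 'dicembre']
day_phases = ['mattina', 'pomeriggio', 'sera', 'notte']

def alternatives(words):
    # non-capturing group matching any of the given words
    return '(?:' + '|'.join(re.escape(w) for w in words) + ')'

# optional 'da', optional day name, optional day phase, then the day number,
# optionally followed by a month name and, after that, a 4-digit year
FIRST_DATE = re.compile(
    r'^(?:' + re.escape(connective_from) + r'\s+)?'
    r'(?:' + alternatives(days) + r'\s+)?'
    r'(?:' + alternatives(day_phases) + r'\s+)?'
    r'(\d{1,2})'
    r'(?:\s+(' + alternatives(months) + r')(?:\s+(\d{4})\b)?)?'
)

def partial_date_re(text):
    if not isinstance(text, str):
        return 'yyyy-mm-dd'
    m = FIRST_DATE.match(text.lower())
    if m is None:
        return 'yyyy-mm-dd'
    day = '{:02d}'.format(int(m.group(1)))
    month = '{:02d}'.format(months.index(m.group(2)) + 1) if m.group(2) else 'mm'
    year = m.group(3) if m.group(3) else 'yyyy'
    return year + '-' + month + '-' + day

print(partial_date_re("Da lunedì 25 novembre a domenica 01 dicembre 2019"))   # yyyy-11-25
```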
A.4 parse_dates_and¶
✪✪✪ Write a function which, given a string representing two possibly partial dates separated by the e connective (and), RETURNS a tuple holding the two extracted dates, each in the format yyyy-mm-dd.
IMPORTANT: Notice that the year or month of the first date might actually be indicated in the second date ! In this exercise we want missing information in the first date to be filled in with the year and/or month taken from the second date.
HINT: implement this function calling previously defined functions. If you do so, it will be fairly easy.
Examples:
>>> parse_dates_and("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019")
("2019-11-15", "2019-11-16")
>>> parse_dates_and("lunedì 4 e domenica 10 novembre")
("yyyy-11-04","yyyy-11-10")
For more examples, see asserts.
[9]:
def parse_dates_and(text):
    #jupman-raise
    ntext = text.lower()
    strings = ntext.split(' ' + connective_and + ' ')
    date_left = partial_date(strings[0])
    date_right = partial_date(strings[1])
    if 'yyyy' in date_left:
        date_left = date_left.replace('yyyy', date_right[0:4])
    if 'mm' in date_left:
        date_left = date_left.replace('mm', date_right[5:7])
    return (date_left, date_right)
    #/jupman-raise
# complete dates
assert parse_dates_and("lunedì 25 aprile 2018 e domenica 01 dicembre 2019") == ("2018-04-25","2019-12-01")
# exactly two dates, day phase morning/afternoon ('mattina'/'pomeriggio')
assert parse_dates_and("venerdì pomeriggio 15 e sabato mattina 16 novembre 2019") == ("2019-11-15", "2019-11-16")
# first date missing year
assert parse_dates_and("lunedì 13 settembre e sabato 25 dicembre 2019") == ("2019-09-13","2019-12-25")
# first date missing month and year
assert parse_dates_and("Giovedì 12 e domenica 15 dicembre 2019") == ("2019-12-12","2019-12-15")
assert parse_dates_and("giovedì 9 e domenica 12 gennaio 2020") == ("2020-01-09", "2020-01-12")
assert parse_dates_and("lunedì 4 e domenica 10 novembre 2019") == ("2019-11-04","2019-11-10")
# first missing month and year, second missing year
assert parse_dates_and("lunedì 4 e domenica 10 novembre") == ("yyyy-11-04","yyyy-11-10")
# first missing month and year, second missing month and year
assert parse_dates_and("lunedì 4 e domenica 10") == ("yyyy-mm-04","yyyy-mm-10")
A.5 Fake news generator¶
Functional illiteracy refers to reading and writing skills that are inadequate “to manage daily living and employment tasks that require reading skills beyond a basic level”
✪✪ Knowing that functional illiteracy is on the rise, a news website wants to fire obsolete human journalists and attract customers by feeding them with automatically generated fake news. You are asked to develop the algorithm for producing the texts: while ethically questionable, the company pays well, so you accept.
Typically, a fake news story starts with a real subject and a real fact (the antecedent), and follows them with some invented statement (the consequence). The company provides you with three databases: one of subjects, one of antecedents and one of consequences. Each antecedent and each consequence is associated with a topic.
Write a function fake_news which takes the databases and RETURNs a list holding strings with all possible combinations of subjects, antecedents and consequences where the topic of the antecedent matches the topic of the consequence. See the desired output for more info.
NOTE: Your code MUST work with any database
[10]:
db_subjects = [
'Government',
'Party X',
]
db_antecedents = [
("passed fiscal reform","economy"),
("passed jobs act","economy"),
("regulated pollution emissions", "environment"),
("restricted building in natural areas", "environment"),
("introduced more controls in agrifood production","environment"),
("changed immigration policy","foreign policy"),
]
db_consequences = [
("economy","now spending is out of control"),
("economy","this increased taxes by 10%"),
("economy","this increased deficit by a staggering 20%"),
("economy","as a consequence our GDP has fallen dramatically"),
("environment","businesses had to fire many employees"),
("environment","businesses are struggling to meet law requirements"),
("foreign policy","immigrants are stealing our jobs"),
]
def fake_news(subjects, antecedents, consequences):
    #jupman-raise
    ret = []
    for subject in subjects:
        for ant in antecedents:
            for con in consequences:
                if ant[1] == con[0]:
                    ret.append(subject + ' ' + ant[0] + ', ' + con[1])
    return ret
    #/jupman-raise
#fake_news(db_subjects, db_antecedents, db_consequences)
[11]:
print()
print(" ******************* EXPECTED OUTPUT *******************")
print()
fake_news(db_subjects, db_antecedents, db_consequences)
******************* EXPECTED OUTPUT *******************
[11]:
['Government passed fiscal reform, now spending is out of control',
'Government passed fiscal reform, this increased taxes by 10%',
'Government passed fiscal reform, this increased deficit by a staggering 20%',
'Government passed fiscal reform, as a consequence our GDP has fallen dramatically',
'Government passed jobs act, now spending is out of control',
'Government passed jobs act, this increased taxes by 10%',
'Government passed jobs act, this increased deficit by a staggering 20%',
'Government passed jobs act, as a consequence our GDP has fallen dramatically',
'Government regulated pollution emissions, businesses had to fire many employees',
'Government regulated pollution emissions, businesses are struggling to meet law requirements',
'Government restricted building in natural areas, businesses had to fire many employees',
'Government restricted building in natural areas, businesses are struggling to meet law requirements',
'Government introduced more controls in agrifood production, businesses had to fire many employees',
'Government introduced more controls in agrifood production, businesses are struggling to meet law requirements',
'Government changed immigration policy, immigrants are stealing our jobs',
'Party X passed fiscal reform, now spending is out of control',
'Party X passed fiscal reform, this increased taxes by 10%',
'Party X passed fiscal reform, this increased deficit by a staggering 20%',
'Party X passed fiscal reform, as a consequence our GDP has fallen dramatically',
'Party X passed jobs act, now spending is out of control',
'Party X passed jobs act, this increased taxes by 10%',
'Party X passed jobs act, this increased deficit by a staggering 20%',
'Party X passed jobs act, as a consequence our GDP has fallen dramatically',
'Party X regulated pollution emissions, businesses had to fire many employees',
'Party X regulated pollution emissions, businesses are struggling to meet law requirements',
'Party X restricted building in natural areas, businesses had to fire many employees',
'Party X restricted building in natural areas, businesses are struggling to meet law requirements',
'Party X introduced more controls in agrifood production, businesses had to fire many employees',
'Party X introduced more controls in agrifood production, businesses are struggling to meet law requirements',
'Party X changed immigration policy, immigrants are stealing our jobs']
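The same nested-loop logic can also be expressed with itertools.product from the Python standard library, which walks the Cartesian product in the same order as the three nested for loops (subjects outermost, consequences innermost), so the output order should match. This is a minimal sketch; the name fake_news_product is hypothetical and not part of the exercise:

```python
from itertools import product


def fake_news_product(subjects, antecedents, consequences):
    """Variant of fake_news built on itertools.product (sketch)."""
    return [
        subject + ' ' + fact + ', ' + claim
        # each antecedent is (fact, topic), each consequence is (topic, claim)
        for subject, (fact, ant_topic), (con_topic, claim)
        in product(subjects, antecedents, consequences)
        if ant_topic == con_topic
    ]


print(fake_news_product(['Government'],
                        [("passed jobs act", "economy")],
                        [("economy", "this increased taxes by 10%")]))
```

The explicit loops are arguably easier to read for beginners; product mainly saves indentation levels.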
Midterm B - Fri 20, Dec 2019¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
You can take this midterm ONLY IF you got a grade >= 16 in the Part A midterm.
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting on why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...

def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...

# Called by f, will be graded:
def my_f(y,z):
    # bla
    ...

def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find several under Applications->Programming:
Visual Studio Code
Editra is easy to use; you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal, which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2019-12-20-exam.zip and extract it on your desktop. The folder content should look like this:
datasciprolab-2019-12-20-FIRSTNAME-LASTNAME-ID
|-jupman.py
|-sciprog.py
|-exams
|-2019-12-20
|- exam-2019-12-20-exercise.ipynb
|- theory.txt
|- linked_list_exercise.py
|- linked_list_test.py
|- bin_tree_exercise.py
|- bin_tree_test.py
Rename the datasciprolab-2019-12-20-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2019-12-20-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
if you have a unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part B¶
B1 Theory¶
Write the solution in the separate theory.txt file
B1.1 Complexity¶
Given a list 𝐿 of 𝑛 elements, please compute the asymptotic computational complexity of the following function, explaining your reasoning.
def my_fun(L):
    R = 0
    for i in range(len(L)):
        for j in range(len(L)-1,0,-1):
            k = 0
            while k < 4:
                R = R + L[j] - L[i]
                k += 1
    return R
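To build intuition before writing the answer, one can instrument a copy of the function and count how often the innermost statement runs. This is a sketch for experimentation, not the expected written answer; my_fun_counted is a hypothetical name. With the outer loop running n times, the middle loop n - 1 times and the while loop exactly 4 times, the count should be 4 * n * (n - 1):

```python
def my_fun_counted(L):
    """Copy of my_fun that also counts executions of the innermost statement."""
    R = 0
    ops = 0
    for i in range(len(L)):                  # n iterations
        for j in range(len(L) - 1, 0, -1):   # n - 1 iterations (j = n-1 down to 1)
            k = 0
            while k < 4:                     # always exactly 4 iterations
                R = R + L[j] - L[i]
                ops += 1
                k += 1
    return R, ops


for n in [10, 50, 100]:
    _, ops = my_fun_counted(list(range(n)))
    print(n, ops, 4 * n * (n - 1))
```

The constant 4 from the while loop does not depend on n, so it disappears in the asymptotic analysis.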
B1.2 Data structure choice¶
Consider an algorithm that frequently checks the presence of an element in its internal data structure. Please briefly answer the following questions:
What data structure would you choose? Why?
If the entries are sorted, would you use the same data structure?
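As background, here is how two common membership tests look in Python: a set is hash-based, so x in s takes O(1) time on average, while on a sorted list the standard bisect module allows a binary search in O(log n). This is a sketch to experiment with, not the expected written answer; the function names are hypothetical:

```python
import bisect


def contains_hashed(items, x):
    """Membership test on a set: average O(1) thanks to hashing."""
    return x in items


def contains_sorted(sorted_items, x):
    """Binary search on a sorted list: O(log n) comparisons."""
    i = bisect.bisect_left(sorted_items, x)  # leftmost position where x could be inserted
    return i < len(sorted_items) and sorted_items[i] == x


data = [3, 8, 15, 42, 99]
print(contains_hashed(set(data), 42), contains_sorted(data, 42))
print(contains_hashed(set(data), 7), contains_sorted(data, 7))
```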
B2 LinkedList¶
Open a text editor and edit the file linked_list_exercise.py
You are given a LinkedList holding a _head pointer, a _last pointer and also a _size attribute.
Notice the list also holds _last and _size attributes !!!
B2.1 rotate¶
✪✪ Implement this method:
def rotate(self):
    """ Rotate the list by 1 element, that is, remove the last node and
        insert it as the first one.

        - MUST execute in O(n) where n is the length of the list
        - Remember to also update _last pointer
        - WARNING: DO *NOT* try to convert whole linked list to a python list
        - WARNING: DO *NOT* swap node data or create nodes, I want you to
                   change existing node links !!
    """
Testing: python3 -m unittest linked_list_test.RotateTest
Example:
[2]:
from linked_list_solution import *
[3]:
ll = LinkedList()
ll.add('d')
ll.add('c')
ll.add('b')
ll.add('a')
print(ll)
LinkedList: a,b,c,d
[4]:
ll.rotate()
[5]:
print(ll)
LinkedList: d,a,b,c
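A minimal sketch of one way to relink the nodes, assuming a hypothetical Node class with data and next attributes and an add method that inserts at the head (which is what the a,b,c,d order in the example suggests). The exam's own Node class and accessor names may differ; to_list is a hypothetical helper for testing only:

```python
class Node:
    """Hypothetical minimal node: the exam's own Node class may differ."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class LinkedList:
    """Sketch of a singly linked list with _head, _last and _size."""
    def __init__(self):
        self._head = None
        self._last = None
        self._size = 0

    def add(self, data):
        """Insert at the head."""
        node = Node(data, self._head)
        if self._head is None:
            self._last = node
        self._head = node
        self._size += 1

    def to_list(self):
        """Return the data as a Python list (for testing only)."""
        ret = []
        cur = self._head
        while cur is not None:
            ret.append(cur.data)
            cur = cur.next
        return ret

    def rotate(self):
        """Move the last node to the front by changing links, in O(n)."""
        if self._size < 2:
            return                      # 0 or 1 nodes: nothing moves
        # singly linked: we must walk to the node just before _last
        prev = self._head
        while prev.next is not self._last:
            prev = prev.next
        prev.next = None                # prev becomes the new last node
        self._last.next = self._head    # old last node now points to old head
        self._head = self._last
        self._last = prev


ll = LinkedList()
for x in ['d', 'c', 'b', 'a']:
    ll.add(x)
ll.rotate()
print(ll.to_list())
```

The walk to the second-to-last node is what makes rotate O(n): with only forward links, there is no faster way to find it.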
B2.2 rotaten¶
✪✪✪ Implement this method:
def rotaten(self, k):
    """ Rotate the linked list k times

        - k can range from 0 to any positive integer number (even greater than list size)
        - if k < 0 raise ValueError
        - MUST execute in O( n-(k%n) ) where n is the length of the list
        - WARNING: DO *NOT* call .rotate() k times !!!!
        - WARNING: DO *NOT* try to convert whole linked list to a python list
        - WARNING: DO *NOT* swap node data or create nodes, I want you to
                   change node links !!
    """
Testing: python3 -m unittest linked_list_test.RotatenTest
IMPORTANT HINT
The line “MUST execute in O( n-(k%n) ) where n is the length of the list” means that you have to calculate m = k%n, and then only scan the first n-m nodes!
Example:
[6]:
ll = LinkedList()
ll.add('h')
ll.add('g')
ll.add('f')
ll.add('e')
ll.add('d')
ll.add('c')
ll.add('b')
ll.add('a')
print(ll)
LinkedList: a,b,c,d,e,f,g,h
[7]:
ll.rotaten(0) # changes nothing
[8]:
print(ll)
LinkedList: a,b,c,d,e,f,g,h
[9]:
ll.rotaten(3)
[10]:
print(ll)
LinkedList: f,g,h,a,b,c,d,e
[11]:
ll.rotaten(8) # changes nothing
[12]:
print(ll)
LinkedList: f,g,h,a,b,c,d,e
[13]:
ll.rotaten(5)
[14]:
print(ll)
LinkedList: a,b,c,d,e,f,g,h
[15]:
ll.rotaten(11) # 11 = 8 + 3 , only rotates 3 nodes
[16]:
print(ll)
LinkedList: f,g,h,a,b,c,d,e
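Following the hint, a minimal sketch assuming a hypothetical Node class with data and next attributes and an add method inserting at the head (the exam's classes may differ; to_list is for testing only). Only the first n - m nodes are scanned; then the tail of m nodes is moved in front by relinking:

```python
class Node:
    """Hypothetical minimal node: the exam's own Node class may differ."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class LinkedList:
    def __init__(self):
        self._head = None
        self._last = None
        self._size = 0

    def add(self, data):
        """Insert at the head."""
        node = Node(data, self._head)
        if self._head is None:
            self._last = node
        self._head = node
        self._size += 1

    def to_list(self):
        """Return the data as a Python list (for testing only)."""
        ret = []
        cur = self._head
        while cur is not None:
            ret.append(cur.data)
            cur = cur.next
        return ret

    def rotaten(self, k):
        """Rotate k times in O(n - (k % n)) by relinking nodes."""
        if k < 0:
            raise ValueError("k must be non-negative, got %s" % k)
        n = self._size
        if n == 0:
            return
        m = k % n                     # rotating n times is the identity
        if m == 0:
            return
        # scan the first n - m nodes: the (n-m)-th one becomes the new last
        new_last = self._head
        for _ in range(n - m - 1):
            new_last = new_last.next
        new_head = new_last.next      # first of the m nodes moving to the front
        self._last.next = self._head  # close the old tail onto the old head
        new_last.next = None
        self._head = new_head
        self._last = new_last


ll = LinkedList()
for x in reversed('abcdefgh'):
    ll.add(x)
ll.rotaten(3)
print(ll.to_list())
```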
B3 Binary trees¶
We will now go looking for leaves, that is, nodes with no children. Open bin_tree_exercise.py.
[17]:
from bin_tree_test import bt
from bin_tree_solution import *
B3.1 sum_leaves_rec¶
✪✪ Implement this method:
def sum_leaves_rec(self):
    """ Supposing the tree holds integer numbers in all nodes,
        RETURN the sum of ONLY the numbers in the leaves.

        - a root with no children is considered a leaf
        - implement it as a recursive Depth First Search (DFS) traversal
        NOTE: with big trees a recursive solution would surely
              exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.SumLeavesRecTest
Example:
[18]:
t = bt(3,
        bt(10,
            bt(1),
            bt(7,
                bt(5))),
        bt(9,
            bt(6,
                bt(2,
                    None,
                    bt(4)),
                bt(8))))
t.sum_leaves_rec() # 1 + 5 + 4 + 8
[18]:
18
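A minimal sketch of the recursive DFS, assuming a hypothetical BinaryTree class with _data, _left and _right attributes and a bt(data, left=None, right=None) builder shaped like the one used in the example; the exam's real class and helper may differ:

```python
class BinaryTree:
    """Hypothetical minimal binary tree node; the exam's class may differ."""
    def __init__(self, data):
        self._data = data
        self._left = None
        self._right = None

    def sum_leaves_rec(self):
        """Sum of the data in the leaves, via recursive DFS."""
        if self._left is None and self._right is None:
            return self._data           # a leaf (also a childless root)
        total = 0
        if self._left is not None:
            total += self._left.sum_leaves_rec()
        if self._right is not None:
            total += self._right.sum_leaves_rec()
        return total


def bt(data, left=None, right=None):
    """Hypothetical builder mirroring the bt helper used in the example."""
    t = BinaryTree(data)
    t._left = left
    t._right = right
    return t


t = bt(3,
        bt(10,
            bt(1),
            bt(7,
                bt(5))),
        bt(9,
            bt(6,
                bt(2,
                    None,
                    bt(4)),
                bt(8))))
print(t.sum_leaves_rec())   # leaves are 1, 5, 4, 8
```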
B3.2 leaves_stack¶
✪✪✪ Implement this method:
def leaves_stack(self):
    """ RETURN a list holding the *data* of all the leaves of the tree,
        in left to right order.

        - a root with no children is considered a leaf
        - DO *NOT* use recursion
        - implement it with a while and a stack (as a Python list)
    """
Testing: python3 -m unittest bin_tree_test.LeavesStackTest
Example:
[19]:
t = bt('a',
        bt('b',
            bt('c'),
            bt('d',
                None,
                bt('e'))),
        bt('f',
            bt('g',
                bt('h')),
            bt('i')))
t.leaves_stack()
[19]:
['c', 'e', 'h', 'i']
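A minimal sketch of the iterative version, again assuming a hypothetical BinaryTree class with _data, _left and _right attributes and a bt builder like the one in the example. The key detail is pushing the right child before the left one: since a stack is last-in first-out, the left subtree is then popped first and leaves come out in left to right order:

```python
class BinaryTree:
    """Hypothetical minimal binary tree node; the exam's class may differ."""
    def __init__(self, data):
        self._data = data
        self._left = None
        self._right = None

    def leaves_stack(self):
        """Data of the leaves, left to right, using a while loop and a stack."""
        ret = []
        stack = [self]
        while stack:
            node = stack.pop()
            if node._left is None and node._right is None:
                ret.append(node._data)
            else:
                # push right first so the left subtree is popped (visited) first
                if node._right is not None:
                    stack.append(node._right)
                if node._left is not None:
                    stack.append(node._left)
        return ret


def bt(data, left=None, right=None):
    """Hypothetical builder mirroring the bt helper used in the example."""
    t = BinaryTree(data)
    t._left = left
    t._right = right
    return t


t = bt('a',
        bt('b',
            bt('c'),
            bt('d',
                None,
                bt('e'))),
        bt('f',
            bt('g',
                bt('h')),
            bt('i')))
print(t.leaves_stack())
```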
Exam - Thu 23, Jan 2020 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting on why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You have style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...

def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...

# Called by f, will be graded:
def my_f(y,z):
    # bla
    ...

def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find several under Applications->Programming:
Visual Studio Code
Editra is easy to use; you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal, which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2020-01-23-exam.zip and extract it on your desktop. The folder content should look like this:
datasciprolab-2020-01-23-FIRSTNAME-LASTNAME-ID
|- data
   |- db.mm
   |- proof.txt
|- exam-2020-01-23.ipynb
|- digi_list_exercise.py
|- digi_list_test.py
|- bin_tree_exercise.py
|- bin_tree_test.py
|- jupman.py
|- sciprog.py
Rename the datasciprolab-2020-01-23-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2020-01-23-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
if you have a unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell the instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2020-01-23.ipynb
Metamath¶
Metamath is a language that can express theorems, accompanied by proofs that can be verified by a computer program. Its website lets you browse from complex theorems down to the most basic axioms they rely on.
For this exercise, we have two files to consider, db.mm and proof.txt:
db.mm contains the description of a simple algebra where you can only add zero to variables
proof.txt contains the awesome proof that… any variable is equal to itself
The purpose of this exercise is to visualize the steps of the proof as a graph, and visualize statement frequencies.
DISCLAIMER: Don’t panic!
You DO NOT need to understand any of the mathematics which follows. Here we are only interested in parsing the data and visualizing it.
Metamath db¶
First you will load data/db.mm and parse the text file into Python. Here is the full content:
$( Declare the constant symbols we will use $)
$c 0 + = -> ( ) term wff |- $.
$( Declare the metavariables we will use $)
$v t r s P Q $.
$( Specify properties of the metavariables $)
tt $f term t $.
tr $f term r $.
ts $f term s $.
wp $f wff P $.
wq $f wff Q $.
$( Define "term" and "wff" $)
tze $a term 0 $.
tpl $a term ( t + r ) $.
weq $a wff t = r $.
wim $a wff ( P -> Q ) $.
$( State the axioms $)
a1 $a |- ( t = r -> ( t = s -> r = s ) ) $.
a2 $a |- ( t + 0 ) = t $.
$( Define the modus ponens inference rule $)
${
min $e |- P $.
maj $e |- ( P -> Q ) $.
mp $a |- Q $.
$}
Format description:
Each row is a statement.
Words are separated by spaces. Each word that appears in a statement is called a token.
Tokens starting with a dollar $ are called keywords: you may have $(, $), $c, $v, $a, $e, $f, ${, $}, $.
Statements may be identified with a unique arbitrary label, which is placed at the beginning of the row. For example, tt, weq and maj are all labels (in the file there are more):
tt $f term t $.
weq $a wff t = r $.
maj $e |- ( P -> Q ) $.
Some rows have no label, examples:
$c 0 + = -> ( ) term wff |- $.
$v t r s P Q $.
$( State the axioms $)
${
$}
In each row, after the first dollar keyword, you may have an arbitrary sequence of characters terminated by a dollar followed by a dot, $. . You don’t need to care about the sequence meaning! Examples:
tt $f term t $. has sequence term t
weq $a wff t = r $. has sequence wff t = r
$v t r s P Q $. has sequence t r s P Q
Now implement the function parse_db, which scans the file line by line (it is a text file, so you can use the line files examples), parses ONLY rows with labels, and RETURNs a dictionary mapping each label to the remaining data in its row, represented as a dictionary formatted like this (showing here only the first three labels):
{
'a1': {'keyword': '$a',
'sequence': '|- ( t = r -> ( t = s -> r = s ) )'
},
'a2': {
'keyword': '$a',
'sequence': '|- ( t + 0 ) = t'
},
'maj': {
'keyword': '$e',
'sequence': '|- ( P -> Q )'
},
.
.
.
}
A.1 Metamath db¶
[2]:
def parse_db(filepath):
    #jupman-raise
    ret = {}
    with open(filepath, encoding='utf-8') as f:
        # a for loop skips blank lines instead of stopping the scan at the first one
        for line in f:
            line = line.strip()
            if not line:
                continue
            #print(line)
            if line.startswith('$('):
                label = ''
                keyword = '$('
                sequence = ''
            elif line.split()[0].startswith('${'):
                label = ''
                keyword = '${'
                sequence = ''
            elif line.split()[0].startswith('$}'):
                label = ''
                keyword = '$}'
                sequence = ''
            elif line.split()[0].startswith('$'):
                label = ''
                keyword = line.split()[0]
                sequence = line.split()[1][:-2].strip()
            else:
                label = line.split(' $')[0].strip()
                keyword = line.split()[1]
                sequence = ''
                if line.endswith('$.'):
                    sequence = line.split(keyword)[1][1:-2].strip()
            if label:
                ret[label] = {
                    'keyword' : keyword,
                    'sequence' : sequence
                }
                #print('  DEBUG: FOUND', label, ':', ret[label])
            #else:
                #print('  DEBUG: DISCARDED')
    return ret
    #/jupman-raise
db_mm = parse_db('data/db.mm')
assert db_mm['tt'] == {'keyword': '$f', 'sequence': 'term t'}
assert db_mm['maj'] == {'keyword': '$e', 'sequence': '|- ( P -> Q )'}
# careful 'mp' label shouldn't have spaces inside !
assert 'mp' in db_mm
assert db_mm['mp'] == {'keyword': '$a', 'sequence': '|- Q'}
from pprint import pprint
#pprint(db_mm)
[3]:
from pprint import pprint
print("************ EXPECTED OUTPUT: ****************")
pprint(db_mm)
************ EXPECTED OUTPUT: ****************
{'a1': {'keyword': '$a', 'sequence': '|- ( t = r -> ( t = s -> r = s ) )'},
'a2': {'keyword': '$a', 'sequence': '|- ( t + 0 ) = t'},
'maj': {'keyword': '$e', 'sequence': '|- ( P -> Q )'},
'min': {'keyword': '$e', 'sequence': '|- P'},
'mp': {'keyword': '$a', 'sequence': '|- Q'},
'tpl': {'keyword': '$a', 'sequence': 'term ( t + r )'},
'tr': {'keyword': '$f', 'sequence': 'term r'},
'ts': {'keyword': '$f', 'sequence': 'term s'},
'tt': {'keyword': '$f', 'sequence': 'term t'},
'tze': {'keyword': '$a', 'sequence': 'term 0'},
'weq': {'keyword': '$a', 'sequence': 'wff t = r'},
'wim': {'keyword': '$a', 'sequence': 'wff ( P -> Q )'},
'wp': {'keyword': '$f', 'sequence': 'wff P'},
'wq': {'keyword': '$f', 'sequence': 'wff Q'}}
A.2 Metamath proof¶
A proof file is made of steps, one per row. Each statement, in order to be proven, needs other steps to be proven, until very basic facts called axioms are reached, which need no further proof. (Proofs in Metamath are typically shown in a much shorter format, but here we use a more explicit one.)
So a proof can be nicely displayed as a tree of the steps it is made of, where the top node is the step to be proven and the axioms are the leaves of the tree.
Complete content of data/proof.txt:
1 tt $f term t
2 tze $a term 0
3 1,2 tpl $a term ( t + 0 )
4 tt $f term t
5 3,4 weq $a wff ( t + 0 ) = t
6 tt $f term t
7 tt $f term t
8 6,7 weq $a wff t = t
9 tt $f term t
10 9 a2 $a |- ( t + 0 ) = t
11 tt $f term t
12 tze $a term 0
13 11,12 tpl $a term ( t + 0 )
14 tt $f term t
15 13,14 weq $a wff ( t + 0 ) = t
16 tt $f term t
17 tze $a term 0
18 16,17 tpl $a term ( t + 0 )
19 tt $f term t
20 18,19 weq $a wff ( t + 0 ) = t
21 tt $f term t
22 tt $f term t
23 21,22 weq $a wff t = t
24 20,23 wim $a wff ( ( t + 0 ) = t -> t = t )
25 tt $f term t
26 25 a2 $a |- ( t + 0 ) = t
27 tt $f term t
28 tze $a term 0
29 27,28 tpl $a term ( t + 0 )
30 tt $f term t
31 tt $f term t
32 29,30,31 a1 $a |- ( ( t + 0 ) = t -> ( ( t + 0 ) = t -> t = t ) )
33 15,24,26,32 mp $a |- ( ( t + 0 ) = t -> t = t )
34 5,8,10,33 mp $a |- t = t
Each line represents a step of the proof. Last line is the final goal of the proof.
Each line contains, in order:
a step number at the beginning, starting from 1 (step_id)
possibly a list of other step_ids, separated by commas, like 29,30,31 - they are references to previous rows
the label of the db_mm statement referenced by the step, like tt, tze, weq - that label must have been defined somewhere in the db.mm file
the statement type: a token starting with a dollar, like $a, $f
a sequence of characters, like (for you they are just characters, don’t care about the meaning!):
term ( t + 0 )
|- ( ( t + 0 ) = t -> ( ( t + 0 ) = t -> t = t ) )
Implement the function parse_proof, which takes a filepath to the proof and RETURNs a list of steps, each expressed as a dictionary, in this format (showing here only the first 5 items):
NOTE: referenced step_ids are integer numbers and they are the original ones from the file, meaning they start from one.
[
{'keyword': '$f',
'label': 'tt',
'sequence': 'term t',
'step_ids': []},
{'keyword': '$a',
'label': 'tze',
'sequence': 'term 0',
'step_ids': []},
{'keyword': '$a',
'label': 'tpl',
'sequence': 'term ( t + 0 )',
'step_ids': [1,2]},
{'keyword': '$f',
'label': 'tt',
'sequence': 'term t',
'step_ids': []},
{'keyword': '$a',
'label': 'weq',
'sequence': 'wff ( t + 0 ) = t',
'step_ids': [3,4]},
.
.
.
]
[4]:
def parse_proof(filepath):
    #jupman-raise
    ret = []
    with open(filepath, encoding='utf-8') as f:
        # a for loop skips blank lines instead of stopping the scan at the first one
        for line in f:
            line = line.strip()
            if not line:
                continue
            step_id = int(line.split(' ')[0])
            label = line.split('$')[0].strip().split(' ')[-1]
            keyword = '$' + line.split('$')[1][:1]
            sequence = line.split('$')[1][2:]
            candidate_step_ids = line.split(' ')[1]
            if candidate_step_ids != label:
                step_ids = [int(x) for x in candidate_step_ids.split(',')]
            else:
                step_ids = []
            #print('step_ids =', step_ids)
            ret.append({
                'step_ids': step_ids,
                'sequence': sequence,
                'label': label,
                'keyword': keyword
            })
    return ret
    #/jupman-raise
proof = parse_proof('data/proof.txt')
assert proof[0] == {'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []}
assert proof[1] == {'keyword': '$a', 'label': 'tze', 'sequence': 'term 0', 'step_ids': []}
assert proof[2] == {'keyword': '$a',
'label': 'tpl',
'sequence': 'term ( t + 0 )',
'step_ids': [1, 2]}
assert proof[4] == {'keyword': '$a',
'label': 'weq',
'sequence': 'wff ( t + 0 ) = t',
'step_ids': [3,4]}
assert proof[33] == { 'keyword': '$a',
'label': 'mp',
'sequence': '|- t = t',
'step_ids': [5, 8, 10, 33]}
pprint(proof)
[{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a', 'label': 'tze', 'sequence': 'term 0', 'step_ids': []},
{'keyword': '$a',
'label': 'tpl',
'sequence': 'term ( t + 0 )',
'step_ids': [1, 2]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'weq',
'sequence': 'wff ( t + 0 ) = t',
'step_ids': [3, 4]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a', 'label': 'weq', 'sequence': 'wff t = t', 'step_ids': [6, 7]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'a2',
'sequence': '|- ( t + 0 ) = t',
'step_ids': [9]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a', 'label': 'tze', 'sequence': 'term 0', 'step_ids': []},
{'keyword': '$a',
'label': 'tpl',
'sequence': 'term ( t + 0 )',
'step_ids': [11, 12]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'weq',
'sequence': 'wff ( t + 0 ) = t',
'step_ids': [13, 14]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a', 'label': 'tze', 'sequence': 'term 0', 'step_ids': []},
{'keyword': '$a',
'label': 'tpl',
'sequence': 'term ( t + 0 )',
'step_ids': [16, 17]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'weq',
'sequence': 'wff ( t + 0 ) = t',
'step_ids': [18, 19]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'weq',
'sequence': 'wff t = t',
'step_ids': [21, 22]},
{'keyword': '$a',
'label': 'wim',
'sequence': 'wff ( ( t + 0 ) = t -> t = t )',
'step_ids': [20, 23]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'a2',
'sequence': '|- ( t + 0 ) = t',
'step_ids': [25]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a', 'label': 'tze', 'sequence': 'term 0', 'step_ids': []},
{'keyword': '$a',
'label': 'tpl',
'sequence': 'term ( t + 0 )',
'step_ids': [27, 28]},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$f', 'label': 'tt', 'sequence': 'term t', 'step_ids': []},
{'keyword': '$a',
'label': 'a1',
'sequence': '|- ( ( t + 0 ) = t -> ( ( t + 0 ) = t -> t = t ) )',
'step_ids': [29, 30, 31]},
{'keyword': '$a',
'label': 'mp',
'sequence': '|- ( ( t + 0 ) = t -> t = t )',
'step_ids': [15, 24, 26, 32]},
{'keyword': '$a',
'label': 'mp',
'sequence': '|- t = t',
'step_ids': [5, 8, 10, 33]}]
Checking proof¶
If you’ve done everything properly, by executing the following cells you should be able to see nice graphs.
IMPORTANT: You do not need to implement anything!
Just look if results match expected graphs
Overview plot¶
Here we only show step numbers, using the function draw_proof defined in the sciprog library
[5]:
from sciprog import draw_proof
# uncomment and check
#draw_proof(proof, db_mm, only_ids=True) # all graph, only numbers
[6]:
print()
print('************************ EXPECTED COMPLETE GRAPH **********************************')
draw_proof(proof, db_mm, only_ids=True)
************************ EXPECTED COMPLETE GRAPH **********************************

Detail plot¶
Here we show data from both the proof and the db_mm we calculated earlier. To avoid having a huge graph, we only focus on the subtree starting from step_id 24.
To understand what is shown, look at node 20:
the first line contains the statement wff ( t + 0 ) = t, taken from line 20 of the proof file
the second line, weq: wff t = r, is taken from db_mm, and means the rule labeled weq was used to derive the statement in the first line.
[7]:
# uncomment and check
#draw_proof(proof, db_mm, step_id=24)
[8]:
print()
print('************************* EXPECTED DETAIL GRAPH *******************************')
draw_proof(proof, db_mm, step_id=24)
************************* EXPECTED DETAIL GRAPH *******************************

A.3 Metamath top statements¶
We can measure the importance of theorems and definitions (in general, statements) by counting how many times they are referenced in proofs.
A3.1 histogram¶
Write some code to plot the histogram of statement labels referenced by steps in proof, from most to least frequently referenced.
A label gets a count each time a step references another step with that label.
For example, in the subgraph above:
tt is referenced 4 times, that is, there are 4 steps referencing other steps which contain the label tt
weq is referenced 2 times
tpl and tze are referenced 1 time each
wim is referenced 0 times (it is only present in the last node, which, being the root node, cannot be referenced by any step)
NOTE: the previous counts are just for the subgraph example.
In your exercise, you will need to consider all the steps
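The counting rule can be checked by hand on the subgraph above with collections.Counter from the Python standard library. This minimal sketch hard-codes steps 16-24 of data/proof.txt as (step_id, referenced step_ids, label) tuples; SUBGRAPH and count_references are hypothetical names, and for the full exercise you would loop over all the steps of proof instead:

```python
from collections import Counter

# Steps 16-24 of data/proof.txt: the subtree rooted at step 24.
# Each tuple is (step_id, referenced step_ids, label).
SUBGRAPH = [
    (16, [], 'tt'),
    (17, [], 'tze'),
    (18, [16, 17], 'tpl'),
    (19, [], 'tt'),
    (20, [18, 19], 'weq'),
    (21, [], 'tt'),
    (22, [], 'tt'),
    (23, [21, 22], 'weq'),
    (24, [20, 23], 'wim'),
]


def count_references(steps):
    """For each label, count how many times a step references a step with that label."""
    label_of = {step_id: label for step_id, _, label in steps}
    counts = Counter()
    for _, refs, _ in steps:
        for ref in refs:
            counts[label_of[ref]] += 1
    return counts


counts = count_references(SUBGRAPH)
print(counts.most_common())   # labels from most to least referenced
```

Counter.most_common already returns the (label, count) pairs sorted by decreasing count, which is the order the histogram needs.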
A3.2 print list¶
Below the graph, print the list of labels from most to least frequent, associating each label with the corresponding statement sequence taken from db_mm
[9]:
# write here
[10]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt

freqs = {}
for step in proof:
    for step_id in step['step_ids']:
        label = proof[step_id-1]['label']
        if label not in freqs:
            freqs[label] = 1
        else:
            freqs[label] += 1

xs = np.arange(len(freqs.keys()))
coords = [(k, freqs[k]) for k in freqs ]
coords.sort(key=lambda c: c[1], reverse=True)
ys_in = [c[1] for c in coords]
plt.bar(xs, ys_in, 0.5, align='center')
plt.title("Statement references SOLUTION")
plt.xticks(xs, [c[0] for c in coords])
plt.xlabel('Statement labels')
plt.ylabel('frequency')
plt.show()

for c in coords:
    print(c[0], ':', '\t', db_mm[c[0]]['sequence'])

tt : term t
weq : wff t = r
tze : term 0
tpl : term ( t + r )
a2 : |- ( t + 0 ) = t
wim : wff ( P -> Q )
a1 : |- ( t = r -> ( t = s -> r = s ) )
mp : |- Q
Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file.
B1.1 my_fun¶
Given a list L of n elements, please compute the asymptotic computational complexity of the following function, explaining your reasoning.
def my_fun(L):
    n = len(L)
    if n <= 1:
        return 1
    else:
        L1 = L[0:n//2]
        L2 = L[n//2:]
        a = my_fun(L1) + max(L1)
        b = my_fun(L2) + max(L2)
        return a + b
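One possible way to set up the analysis, sketched under the assumption that L is a regular Python list: the slices L[0:n//2] and L[n//2:] each copy about n/2 elements, and each max call scans one half, so apart from the two recursive calls on halves every call does linear work. That gives the recurrence below, which the master theorem (a = 2, b = 2, f(n) = Θ(n) = Θ(n^{log_2 2})) solves as follows:

```latex
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + \Theta(n), \qquad T(1) = \Theta(1)
\quad\Longrightarrow\quad T(n) = \Theta(n \log n)
```

This is the same recurrence that describes merge sort.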
B1.2 differences¶
Briefly describe the main differences between the stack and queue data structures. Please provide an example of where you would use one or the other.
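As a small illustration of the difference (the page and job names are made up): a stack is LIFO, so the last element pushed is the first one out, as in an undo history or a browser back button; a queue is FIFO, so the first element in is the first one out, as in a print queue. In Python, a plain list works well as a stack, while collections.deque is the usual choice for a queue, because list.pop(0) has to shift every remaining element.

```python
from collections import deque

# Stack (LIFO): append and pop both work at the end of the list, O(1) amortized
stack = []
stack.append('page1')
stack.append('page2')
stack.append('page3')
last_visited = stack.pop()      # the most recent page comes out first

# Queue (FIFO): deque gives O(1) append at the right and popleft at the left
queue = deque()
queue.append('job1')
queue.append('job2')
queue.append('job3')
first_served = queue.popleft()  # the oldest job comes out first
```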
B2 plus_one¶
Open a text editor and edit the file digi_lists_exercise.py
You are given this class:
class DigiList:
    """
    This is a stripped down version of the LinkedList as previously seen,
    which can only hold integer digits 0-9

    NOTE: there is also a _last pointer
    """
Implement this method:
def plus_one(self):
    """ MODIFIES the digi list by summing one to the integer number it represents

        - you are allowed to perform multiple scans of the linked list
        - remember the list has a _last pointer
        - MUST execute in O(N) where N is the size of the list
        - DO *NOT* create new nodes EXCEPT for special cases:
            a. empty list ( [] -> [5] )
            b. all nines ( [9,9,9] -> [1,0,0,0] )
        - DO *NOT* convert the digi list to a python int
        - DO *NOT* convert the digi list to a python list
        - DO *NOT* reverse the digi list
    """
Test: python3 -m unittest digi_list_test.PlusOneTest
Example:
[11]:
from digi_list_solution import *
dl = DigiList()
dl.add(9)
dl.add(9)
dl.add(7)
dl.add(3)
dl.add(9)
dl.add(2)
print(dl)
DigiList: 2,9,3,7,9,9
[12]:
dl.last()
[12]:
9
[13]:
dl.plus_one()
[14]:
print(dl)
DigiList: 2,9,3,8,0,0
B3 add_row¶
Open a text editor and edit the file bin_tree_exercise.py.
Now implement this method:
def add_row(self, elems):
    """ Takes as input a list of data and MODIFIES the tree by adding
        a row of new leaves, each having as data one element of elems,
        in order.

        - elems size can be less than 2*|leaves|
        - if elems size is more than 2*|leaves|, raises ValueError
        - for simplicity, you can assume self is a perfect
          binary tree, that is a binary tree in which all interior nodes
          have two children and all leaves have the same depth
        - MUST execute in O(n+|elems|) where n is the size of the tree
        - DO *NOT* use recursion
        - implement it with a while and a stack (as a Python list)
    """
Test: python3 -m unittest bin_tree_test.AddRowTest
Example:
[15]:
from bin_tree_solution import *
from bin_tree_test import bt
t = bt('a',
        bt('b',
                bt('d'),
                bt('e')),
        bt('c',
                bt('f'),
                bt('g')))
print(t)
a
├b
│├d
│└e
└c
├f
└g
[16]:
t.add_row(['h','i','j','k','l'])
[17]:
print(t)
a
├b
│├d
││├h
││└i
│└e
│ ├j
│ └k
└c
├f
│├l
│└
└g
Exam - Monday 10 February 2020 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before.
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting on why you did so.
Bonus point: One bonus point can be earned by writing stylish code. You've got style if you:
do not infringe the Commandments
write pythonic code
avoid convoluted code like, e.g.,
if x > 5:
    return True
else:
    return False
when you could write just
return x > 5
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
    ...

def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
    ...

# Called by f, will be graded:
def my_f(y,z):
    # bla
    ...

def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use; you can find it under Applications->Programming->Editra.
Other options are GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal, which can be found in Accessories -> Terminal.
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you still do it, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2020-02-10-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2020-02-10-FIRSTNAME-LASTNAME-ID
|-jupman.py
|-sciprog.py
|-exams
|-2020-02-10
|- exam-2020-02-10-exercise.ipynb
|- B1-theory.txt
|- B2_italian_queue_v2_exercise.py
|- B2_italian_queue_v2_test.py
Rename the datasciprolab-2020-02-10-FIRSTNAME-LASTNAME-ID folder: put your name, lastname and id number, like datasciprolab-2020-02-10-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
if you have unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have unitn login: tell instructors and we will download your work manually
Part A¶
Open Jupyter and start editing this notebook exam-2020-02-10-exercise.ipynb
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of semantic relations. The resulting network of related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download, making it a useful tool for computational linguistics and natural language processing. Princeton University “About WordNet.” WordNet. Princeton University. 2010
In Python there are specialized libraries to read WordNet like NLTK, but for the sake of this exercise, you will parse the noun database as a text file which can be read line by line.
We will focus on nouns and how they are linked by the IS A relation: for example, a dalmatian IS A dog (IS A is also called the hypernym relation).
A1 parse_db¶
First, you will begin by parsing an excerpt of WordNet, data/dogs.noun, which is a noun database shown here in its entirety.
According to documentation, a noun database begins with several lines containing a copyright notice, version number, and license agreement: these lines all begin with two spaces and the line number like
1 This software and database is being provided to you, the LICENSEE, by
2 Princeton University under the following license. By obtaining, using
3 and/or copying this software and database, you agree that you have
Afterwards, each of the following lines describes a noun synset, that is, a unique concept identified by a number called synset_offset.
Each synset can have many words to represent it: for example, the noun synset 02112993 has 03 (w_cnt) words: dalmatian, coach_dog, carriage_dog.
A synset can be linked to other ones by relations. The dalmatian synset is linked to 002 (p_cnt) other synsets: to synset 02086723 by the @ relation, and to synset 02113184 by the ~ relation. For our purposes, you can focus on the @ symbol, which means IS A relation (also called hypernym). If you search for a line starting with 02086723, you will see it is the synset for dog, so WordNet is telling us a dalmatian IS A dog.
WARNING 1: lines can be quite long, so if they appear to span multiple lines don’t be fooled: remember each noun definition occupies one single line with no carriage returns!
WARNING 2: there are no empty lines between the synsets, here you see them just to visually separate the text blobs
1 This software and database is being provided to you, the LICENSEE, by
2 Princeton University under the following license. By obtaining, using
3 and/or copying this software and database, you agree that you have
4 read, understood, and will comply with these terms and conditions.:
5
6 Permission to use, copy, modify and distribute this software and
7 database and its documentation for any purpose and without fee or
8 royalty is hereby granted, provided that you agree to comply with
9 the following copyright notice and statements, including the disclaimer,
10 and that the same appear on ALL copies of the software, database and
11 documentation, including modifications that you make for internal
12 use or for distribution.
13
14 WordNet 3.1 Copyright 2011 by Princeton University. All rights reserved.
15
16 THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON
17 UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
18 IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON
19 UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANT-
20 ABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE
21 OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT
22 INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR
23 OTHER RIGHTS.
24
25 The name of Princeton University or Princeton may not be used in
26 advertising or publicity pertaining to distribution of the software
27 and/or database. Title to copyright in this software, database and
28 any associated documentation shall at all times remain with
29 Princeton University and LICENSEE agrees to preserve same.
01320032 05 n 02 domestic_animal 0 domesticated_animal 0 007 @ 00015568 n 0000 ~ 01320304 n 0000 ~ 01320544 n 0000 ~ 01320872 n 0000 ~ 02086723 n 0000 ~ 02124460 n 0000 ~ 02125232 n 0000 | any of various animals that have been tamed and made fit for a human environment
02085998 05 n 02 canine 0 canid 0 011 @ 02077948 n 0000 #m 02085690 n 0000 + 02688440 a 0101 ~ 02086324 n 0000 ~ 02086723 n 0000 ~ 02116752 n 0000 ~ 02117748 n 0000 ~ 02117987 n 0000 ~ 02119787 n 0000 ~ 02120985 n 0000 %p 02442560 n 0000 | any of various fissiped mammals with nonretractile claws and typically long muzzles
02086723 05 n 03 dog 0 domestic_dog 0 Canis_familiaris 0 023 @ 02085998 n 0000 @ 01320032 n 0000 #m 02086515 n 0000 #m 08011383 n 0000 ~ 01325095 n 0000 ~ 02087384 n 0000 ~ 02087513 n 0000 ~ 02087924 n 0000 ~ 02088026 n 0000 ~ 02089774 n 0000 ~ 02106058 n 0000 ~ 02112993 n 0000 ~ 02113458 n 0000 ~ 02113610 n 0000 ~ 02113781 n 0000 ~ 02113929 n 0000 ~ 02114152 n 0000 ~ 02114278 n 0000 ~ 02115149 n 0000 ~ 02115478 n 0000 ~ 02115987 n 0000 ~ 02116630 n 0000 %p 02161498 n 0000 | a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; “the dog barked all night”
02106058 05 n 01 working_dog 0 016 @ 02086723 n 0000 ~ 02106493 n 0000 ~ 02107175 n 0000 ~ 02109506 n 0000 ~ 02110072 n 0000 ~ 02110741 n 0000 ~ 02110906 n 0000 ~ 02111074 n 0000 ~ 02111324 n 0000 ~ 02111699 n 0000 ~ 02111802 n 0000 ~ 02112043 n 0000 ~ 02112177 n 0000 ~ 02112339 n 0000 ~ 02112463 n 0000 ~ 02112613 n 0000 | any of several breeds of usually large powerful dogs bred to work as draft animals and guard and guide dogs
02112993 05 n 03 dalmatian 0 coach_dog 0 carriage_dog 0 002 @ 02086723 n 0000 ~ 02113184 n 0000 | a large breed having a smooth white coat with black or brown spots; originated in Dalmatia
02107175 05 n 03 shepherd_dog 0 sheepdog 0 sheep_dog 0 012 @ 02106058 n 0000 ~ 02107534 n 0000 ~ 02107903 n 0000 ~ 02108064 n 0000 ~ 02108157 n 0000 ~ 02108293 n 0000 ~ 02108507 n 0000 ~ 02108682 n 0000 ~ 02108818 n 0000 ~ 02109034 n 0000 ~ 02109202 n 0000 ~ 02109314 n 0000 | any of various usually long-haired breeds of dog reared to herd and guard sheep
02111324 05 n 02 bulldog 0 English_bulldog 0 003 @ 02106058 n 0000 + 01121448 v 0101 ~ 02111567 n 0000 | a sturdy thickset short-haired breed with a large head and strong undershot lower jaw; developed originally in England for bull baiting
02116752 05 n 01 wolf 0 007 @ 02085998 n 0000 #m 02086515 n 0000 ~ 01324999 n 0000 ~ 02117019 n 0000 ~ 02117200 n 0000 ~ 02117364 n 0000 ~ 02117507 n 0000 | any of various predatory carnivorous canine mammals of North America and Eurasia that usually hunt in packs
Field description¶
While parsing, skip the copyright notice. Then, each noun definition has the following format:
synset_offset lex_filenum ss_type w_cnt word lex_id [word lex_id...] p_cnt [ptr...] | gloss
synset_offset: Number identifying the synset, for example 02112993. MUST be converted to a Python int
lex_filenum: Two digit decimal integer corresponding to the lexicographer file name containing the synset, for example 03. MUST be converted to a Python int
ss_type: One character code indicating the synset type, store it as a string.
w_cnt: Two digit hexadecimal integer indicating the number of words in the synset, for example b3. MUST be converted to a Python int.
WARNING: w_cnt is expressed as hexadecimal!
To convert a hexadecimal number like b3 to a decimal int you will need to specify the base 16, like in int('b3',16), which produces the decimal integer 179.
Afterwards, there will be w_cnt words, each represented by two fields (for example, dalmatian 0). You MUST store these fields into a Python list called words containing a dictionary for each word, having these fields:
word: ASCII form of a word (example: dalmatian), with spaces replaced by underscore characters (_)
lex_id: One digit hexadecimal integer (example: 0) that MUST be converted to a Python int
WARNING: lex_id is expressed as hexadecimal!
As for w_cnt, convert it by specifying the base 16, like in int('b3',16), which produces the decimal integer 179.
p_cnt: Three digit decimal integer indicating the number of pointers (that is, relations, for example IS A) from this synset to other synsets. MUST be converted to a Python int
WARNING: differently from w_cnt, the value p_cnt is expressed as decimal!
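To make the decimal/hexadecimal distinction concrete, here are the numeric fields of the dalmatian line converted by hand. Only the built-in int() is used; the variable names are just for illustration. int() accepts leading zeros inside a string, and the second argument selects the base.

```python
# Tokens copied from the dalmatian line of dogs.noun:
# 02112993 05 n 03 dalmatian 0 coach_dog 0 carriage_dog 0 002 @ 02086723 n 0000 ...
synset_offset = int('02112993')   # decimal: leading zeros in a string are fine
lex_filenum = int('05')           # decimal
w_cnt = int('03', 16)             # hexadecimal
p_cnt = int('002')                # decimal
ptr_target = int('02086723')      # decimal

# Why the base matters: the same text can give different numbers
as_hex = int('10', 16)            # sixteen
as_dec = int('10')                # ten
b3 = int('b3', 16)                # 11*16 + 3
```

Without the base, int('b3') raises a ValueError, which is a quick way to notice a field was parsed with the wrong base.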
Afterwards, there will be p_cnt pointers, each represented by four fields pointer_symbol synset_offset pos source/target (for example, @ 02086723 n 0000). You MUST store these fields into a Python list called ptrs containing a dictionary for each pointer, having these fields:
pointer_symbol: a symbol indicating the type of relation, for example @ (which represents IS A relation)
synset_offset: the identifier of the target synset, for example 02086723. You MUST convert this to a Python int
pos: just parse it as a string (we will not use it)
source/target: just parse it as a string (we will not use it)
WARNING: DO NOT assume first pointer is an @ (IS A) !!
For example, in the full database the root synset entity can’t possibly have a parent synset, so none of its pointers is an @:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
00001740 03 n 01 entity 0 003 ~ 00001930 n 0000 ~ 00002137 n 0000 ~ 04431553 n 0000 | that which is perceived or known or inferred to have its own distinct existence (living or nonliving)
gloss: Each synset contains a gloss (that is, a description). A gloss is represented as a vertical bar (|), followed by a text string that continues until the end of the line. For example, a large breed having a smooth white coat with black or brown spots; originated in Dalmatia
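Before writing the whole parse_db, it can help to parse a single line. The sketch below (parse_synset_line is a hypothetical helper, not part of the exercise) avoids tracking an index by hand. It splits off the gloss at the first |, then pulls tokens one at a time from an iterator, consuming exactly w_cnt word pairs and p_cnt pointer quadruples. It reuses the dictionary keys of the expected printout, including 'source_target'.

```python
def parse_synset_line(line):
    """Sketch: parse one synset line of a WordNet noun database into a dict."""
    header, gloss = line.split('|', 1)      # the gloss is everything after the first bar
    tokens = iter(header.split())           # split() also drops the trailing space
    # dict displays are evaluated left to right, so tokens are consumed in order
    d = {
        'synset_offset': int(next(tokens)),
        'lex_filenum': int(next(tokens)),
        'ss_type': next(tokens),
        'w_cnt': int(next(tokens), 16),     # hexadecimal!
    }
    d['words'] = [{'word': next(tokens), 'lex_id': int(next(tokens), 16)}
                  for _ in range(d['w_cnt'])]
    d['p_cnt'] = int(next(tokens))          # decimal!
    d['ptrs'] = [{'pointer_symbol': next(tokens),
                  'synset_offset': int(next(tokens)),
                  'pos': next(tokens),
                  'source_target': next(tokens)}
                 for _ in range(d['p_cnt'])]
    d['gloss'] = gloss
    return d


dalmatian = parse_synset_line(
    "02112993 05 n 03 dalmatian 0 coach_dog 0 carriage_dog 0 002 "
    "@ 02086723 n 0000 ~ 02113184 n 0000 | a large breed having a smooth white "
    "coat with black or brown spots; originated in Dalmatia")
```

Because pointers are read generically, this also works for lines like entity, whose first pointer is not an @.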
implement parse_db¶
[2]:
def parse_db(filename):
    """ Parses noun database filename as a text file and RETURN a dictionary containing
        all the synsets found. Each key will be a synset_offset mapping to a dictionary
        holding the fields of the corresponding synset. See next printout for an example.
    """
    #jupman-raise
    ret = {}
    with open(filename, encoding='utf-8') as f:
        line=f.readline()
        r = 0
        while line.startswith(' '):
            line=f.readline()
            #print(line)
            r += 1
        while line != "":
            i = 0
            d = {}
            params = line.split('|')[0].split(' ')
            d['synset_offset'] = int(params[0])     # '00001740'
            d['lex_filenum'] = int(params[1])       # '03'
            d['ss_type'] = params[2]                # 'n'
            # WARNING: HERE THE STRING REPRESENTS A NUMBER IN *HEXADECIMAL* FORMAT,
            #          AND WE WANT TO STORE AN *INTEGER*
            #          TO DO THE CONVERSION PROPERLY, YOU NEED TO USE int(my_string, 16)
            d['w_cnt'] = int(params[3], 16)         # 'b3' -> 179
            d['words'] = []
            i = 4
            for j in range(d['w_cnt']):
                wd = {
                    'word' : params[i],                # 'entity'
                    'lex_id': int(params[i + 1],16),   # '0'
                }
                d['words'].append(wd)
                i += 2
            #
            # WARNING: HERE THE STRING REPRESENTS A NUMBER IN *DECIMAL* FORMAT,
            #          AND WE WANT TO STORE AN *INTEGER*
            #          TO DO THE CONVERSION PROPERLY, YOU NEED TO USE int(my_string)
            d['p_cnt'] = int(params[i])             # '003' -> 3
            d['ptrs'] = []
            i += 1
            for j in range(d['p_cnt']):
                ptr = {
                    'pointer_symbol': params[i],            # '~'
                    'synset_offset': int(params[i + 1]),    # '00001930'
                    'pos': params[i + 2],                   # 'n'
                    'source_target':params[i + 3],          # '0000'
                }
                d['ptrs'].append(ptr)
                i += 4
            d['gloss'] = line.split('|')[1]
            ret[d['synset_offset']] = d
            i += 1
            line=f.readline()
    return ret
    #/jupman-raise
[3]:
dogs_db = parse_db('data/dogs.noun')
from pprint import pprint
pprint(dogs_db)
{1320032: {'gloss': ' any of various animals that have been tamed and made fit '
'for a human environment\n',
'lex_filenum': 5,
'p_cnt': 7,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 15568},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320304},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320544},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320872},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2124460},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2125232}],
'ss_type': 'n',
'synset_offset': 1320032,
'w_cnt': 2,
'words': [{'lex_id': 0, 'word': 'domestic_animal'},
{'lex_id': 0, 'word': 'domesticated_animal'}]},
2085998: {'gloss': ' any of various fissiped mammals with nonretractile claws '
'and typically long muzzles \n',
'lex_filenum': 5,
'p_cnt': 11,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2077948},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2085690},
{'pointer_symbol': '+',
'pos': 'a',
'source_target': '0101',
'synset_offset': 2688440},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086324},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2116752},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117748},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117987},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2119787},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2120985},
{'pointer_symbol': '%p',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2442560}],
'ss_type': 'n',
'synset_offset': 2085998,
'w_cnt': 2,
'words': [{'lex_id': 0, 'word': 'canine'},
{'lex_id': 0, 'word': 'canid'}]},
2086723: {'gloss': ' a member of the genus Canis (probably descended from the '
'common wolf) that has been domesticated by man since '
'prehistoric times; occurs in many breeds; "the dog barked '
'all night" \n',
'lex_filenum': 5,
'p_cnt': 23,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2085998},
{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1320032},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086515},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 8011383},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1325095},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2087384},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2087513},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2087924},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2088026},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2089774},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106058},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112993},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113458},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113610},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113781},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113929},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2114152},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2114278},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2115149},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2115478},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2115987},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2116630},
{'pointer_symbol': '%p',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2161498}],
'ss_type': 'n',
'synset_offset': 2086723,
'w_cnt': 3,
'words': [{'lex_id': 0, 'word': 'dog'},
{'lex_id': 0, 'word': 'domestic_dog'},
{'lex_id': 0, 'word': 'Canis_familiaris'}]},
2106058: {'gloss': ' any of several breeds of usually large powerful dogs '
'bred to work as draft animals and guard and guide '
'dogs \n',
'lex_filenum': 5,
'p_cnt': 16,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106493},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2107175},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109506},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2110072},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2110741},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2110906},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111074},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111324},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111699},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111802},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112043},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112177},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112339},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112463},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2112613}],
'ss_type': 'n',
'synset_offset': 2106058,
'w_cnt': 1,
'words': [{'lex_id': 0, 'word': 'working_dog'}]},
2107175: {'gloss': ' any of various usually long-haired breeds of dog reared '
'to herd and guard sheep\n',
'lex_filenum': 5,
'p_cnt': 12,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106058},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2107534},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2107903},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108064},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108157},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108293},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108507},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108682},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2108818},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109034},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109202},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2109314}],
'ss_type': 'n',
'synset_offset': 2107175,
'w_cnt': 3,
'words': [{'lex_id': 0, 'word': 'shepherd_dog'},
{'lex_id': 0, 'word': 'sheepdog'},
{'lex_id': 0, 'word': 'sheep_dog'}]},
2111324: {'gloss': ' a sturdy thickset short-haired breed with a large head '
'and strong undershot lower jaw; developed originally in '
'England for bull baiting \n',
'lex_filenum': 5,
'p_cnt': 3,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2106058},
{'pointer_symbol': '+',
'pos': 'v',
'source_target': '0101',
'synset_offset': 1121448},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2111567}],
'ss_type': 'n',
'synset_offset': 2111324,
'w_cnt': 2,
'words': [{'lex_id': 0, 'word': 'bulldog'},
{'lex_id': 0, 'word': 'English_bulldog'}]},
2112993: {'gloss': ' a large breed having a smooth white coat with black or '
'brown spots; originated in Dalmatia \n',
'lex_filenum': 5,
'p_cnt': 2,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086723},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2113184}],
'ss_type': 'n',
'synset_offset': 2112993,
'w_cnt': 3,
'words': [{'lex_id': 0, 'word': 'dalmatian'},
{'lex_id': 0, 'word': 'coach_dog'},
{'lex_id': 0, 'word': 'carriage_dog'}]},
2116752: {'gloss': ' any of various predatory carnivorous canine mammals of '
'North America and Eurasia that usually hunt in packs \n',
'lex_filenum': 5,
'p_cnt': 7,
'ptrs': [{'pointer_symbol': '@',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2085998},
{'pointer_symbol': '#m',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2086515},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 1324999},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117019},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117200},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117364},
{'pointer_symbol': '~',
'pos': 'n',
'source_target': '0000',
'synset_offset': 2117507}],
'ss_type': 'n',
'synset_offset': 2116752,
'w_cnt': 1,
'words': [{'lex_id': 0, 'word': 'wolf'}]}}
A2 to_adj¶
Implement a function to_adj which takes the parsed db and RETURNS a graph-like data structure in adjacency list format. Each node represents a synset: as label, use the first word of the synset. A node is linked to another one if there is an IS A relation between them, so use the @ symbol to filter the hypernyms.
IMPORTANT: not all linked synsets are present in the dogs excerpt.
HINT: If you couldn’t implement the parse_db function properly, use as data the result of the previous print.
[4]:
def to_adj(db):
    #jupman-raise
    ret = {}
    for d in db.values():
        targets = []
        for ptr in d['ptrs']:
            if ptr['pointer_symbol'] == '@':
                if ptr['synset_offset'] in db:
                    targets.append(db[ptr['synset_offset']]['words'][0]['word'])
                #else:
                #    targets.append(ptr['synset_offset'])
        ret[d['words'][0]['word']] = targets
    return ret
    #/jupman-raise
dogs_graph = to_adj(dogs_db)
from pprint import pprint
pprint(dogs_graph)
{'bulldog': ['working_dog'],
'canine': [],
'dalmatian': ['dog'],
'dog': ['canine', 'domestic_animal'],
'domestic_animal': [],
'shepherd_dog': ['working_dog'],
'wolf': ['canine'],
'working_dog': ['dog']}
Check results¶
If parsing is right, you should get the following graph.
DO NOT implement any drawing function: this is just for checking your results.
[5]:
from sciprog import draw_adj
draw_adj(dogs_graph, options={'graph':{'rankdir':'BT'}})

A.3 hist¶
You are given a dictionary mapping each relation symbol (e.g. @) to its description (e.g. Hypernym).
Implement a function to draw the histogram of relation frequencies found in the relation links of the entire WordNet, which can be loaded from the file data/data.noun. If you previously implemented parse_db correctly, you should be able to load the whole db. If for any reason you can’t, try at least to draw the histogram of frequencies found in dogs_db.
sort the histogram from greatest to lowest frequency
do not count the relations containing the word ‘domain’ inside (upper/lowercase)
do not count the ‘\’ relation
display the relation names nicely, adding newlines if necessary
[6]:
relation_names = {
'!':'Antonym',
'@':'Hypernym',
'@i':'Instance Hypernym',
'~':'Hyponym',
'~i':'Instance Hyponym',
'#m':'Member holonym',
'#s':'Substance holonym',
'#p':'Part holonym',
'%m':'Member meronym',
'%s':'Substance meronym',
'%p':'Part meronym',
'=':'Attribute',
'+':'Derivationally related form',
';c':'Domain of synset - TOPIC', # DISCARD
'-c':'Member of this domain - TOPIC', # DISCARD
';r':'Domain of synset - REGION', # DISCARD
'-r':'Member of this domain - REGION', # DISCARD
';u':'Domain of synset - USAGE', # DISCARD
'-u':'Member of this domain - USAGE', # DISCARD
'\\': 'Pertainym (pertains to noun)' # DISCARD
}
def draw_hist(db):
    #jupman-raise
    hist = {}
    for d in db.values():
        for ptr in d['ptrs']:
            ps = ptr['pointer_symbol']
            # skip domain relations and the '\' (pertainym) relation
            if 'domain' not in relation_names[ps].lower() and ps != '\\':
                if ps in hist:
                    hist[ps] += 1
                else:
                    hist[ps] = 1    # the first occurrence already counts as one
    pprint(hist)
    import numpy as np
    import matplotlib.pyplot as plt
    xs = list(range(len(hist.keys())))
    coords = [(x,hist[x]) for x in hist.keys()]
    coords.sort(key=lambda c: c[1], reverse=True)
    ys = [c[1] for c in coords]
    fig = plt.figure(figsize=(18,6))
    plt.bar(xs, ys,
            0.5,              # the width of the bars
            color='green',    # someone suggested the default blue color is depressing, so let's put green
            align='center')   # bars are centered on the xtick
    plt.title('Wordnet Relation frequency SOLUTION')
    xticks = [relation_names[c[0]].replace(' ', '\n') for c in coords]
    plt.xticks(xs,xticks)
    plt.show()
    #/jupman-raise
[7]:
wordnet = parse_db('data/data.noun')
draw_hist(wordnet)
{'!': 2154,
'#m': 12288,
'#p': 9111,
'#s': 797,
'%m': 12288,
'%p': 9111,
'%s': 797,
'+': 37236,
'=': 639,
'@': 75916,
'@i': 8589,
'~': 75916,
'~i': 8589}

Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file.
B1.1 complexity¶
Given a list \(L\) of \(n\) elements, please compute the asymptotic computational complexity of the following function, explaining your reasoning. Any ideas on how to improve the complexity of this code?
def my_fun(L):
    n = len(L)
    out = []
    for i in range(n-2):
        out.insert(0, L[i] + L[i+1] + L[i+2])
    return out
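One possible improvement, sketched with hypothetical function names: `list.insert(0, x)` shifts every element already in `out`, so the i-th call costs O(i) and the loop adds up to O(n^2). Appending at the end is amortized O(1), so building the list in order and reversing it once gives O(n). A `collections.deque` with `appendleft` gets the same result.

```python
from collections import deque

def my_fun(L):
    """Original version: insert(0, ...) shifts all of out each time -> O(n^2)."""
    out = []
    for i in range(len(L) - 2):
        out.insert(0, L[i] + L[i+1] + L[i+2])
    return out

def my_fun_append(L):
    """O(n): amortized O(1) appends, then a single O(n) reverse."""
    out = []
    for i in range(len(L) - 2):
        out.append(L[i] + L[i+1] + L[i+2])
    out.reverse()
    return out

def my_fun_deque(L):
    """O(n): deque.appendleft is O(1), so prepending is cheap."""
    out = deque()
    for i in range(len(L) - 2):
        out.appendleft(L[i] + L[i+1] + L[i+2])
    return list(out)

print(my_fun([1, 2, 3, 4, 5]))         # [12, 9, 6]
print(my_fun_append([1, 2, 3, 4, 5]))  # [12, 9, 6]
print(my_fun_deque([1, 2, 3, 4, 5]))   # [12, 9, 6]
```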
B1.2 graph visits¶
Briefly describe the two classic ways of visiting the nodes of a graph.
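The two classic visits are breadth-first search (BFS) and depth-first search (DFS). BFS explores nodes level by level using a FIFO queue. DFS follows one path as deep as possible before backtracking, and this version uses an explicit stack. Below is a minimal sketch. It assumes the graph is stored as a Python dict of adjacency lists; that representation and the example graph are illustrative and do not come from the exercise.

```python
from collections import deque

def bfs(graph, source):
    """Breadth-first visit: returns nodes in the order they are discovered."""
    order = [source]
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()              # FIFO: oldest discovered node first
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

def dfs(graph, source):
    """Depth-first visit with an explicit stack instead of recursion."""
    order = []
    seen = set()
    stack = [source]
    while stack:
        node = stack.pop()                  # LIFO: most recently pushed node first
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # push in reverse so neighbors are visited in list order
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in seen:
                stack.append(neighbor)
    return order

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d', 'e'], 'd': [], 'e': []}
print(bfs(graph, 'a'))   # ['a', 'b', 'c', 'd', 'e']
print(dfs(graph, 'a'))   # ['a', 'b', 'd', 'c', 'e']
```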
B2 ItalianQueue v2¶
Open a text editor and have a look at the file italian_queue_v2_exercise.py
In the original v1 implementation of the ItalianQueue we’ve already seen in class, enqueue can take O(n): you will improve it by adding further indexing so it runs in O(1).
An ItalianQueue is modelled as a LinkedList with two pointers, a _head and a _tail:
an element is enqueued by scanning from _head until a matching group is found, in which case the element is inserted after (that is, to the right of) the matching group; otherwise the element is appended at the very end, marked by _tail
an element is dequeued from the _head
For this improved v2 version, you will use an additional dictionary _tails which associates to each group present in the queue the node at the tail of that group sequence. This way, instead of scanning, you will be able to jump directly to the insertion point.
class ItalianQueue:
    def __init__(self):
        """ Initializes the queue.
            - Complexity: O(1)
        """
        self._head = None
        self._tail = None
        self._tails = {}  # <---- NEW !
        self._size = 0
Example:
If we have the following situation:
data  : a -> b -> c -> d -> e -> f -> g -> h
group : x    x    y    y    y    z    z    z
        ^    ^              ^              ^
        |    |              |              |
        |    _tails[x]      _tails[y]      _tails[z]
        |                                  |
        _head                              _tail
By calling
q.enqueue('i','y')
We get:
data  : a -> b -> c -> d -> e -> i -> f -> g -> h
group : x    x    y    y    y    y    z    z    z
        ^    ^                   ^              ^
        |    |                   |              |
        |    _tails[x]           _tails[y]      _tails[z]
        |                                       |
        _head                                   _tail
We can see here the complete run:
[8]:
from italian_queue_v2_solution import *
q = ItalianQueue()
print(q)
ItalianQueue:
_head: None
_tail: None
_tails: {}
[9]:
q.enqueue('a','x') # 'a' is the element,'x' is the group
[10]:
print(q)
ItalianQueue: a
x
_head: Node(a,x)
_tail: Node(a,x)
_tails: {'x': Node(a,x),}
[11]:
q.enqueue('c','y') # 'c' belongs to new group 'y', goes to the end of the queue
[12]:
print(q)
ItalianQueue: a->c
x y
_head: Node(a,x)
_tail: Node(c,y)
_tails: {'x': Node(a,x),
'y': Node(c,y),}
[13]:
q.enqueue('d','y') # 'd' belongs to existing group 'y', goes to the end of the group
[14]:
print(q)
ItalianQueue: a->c->d
x y y
_head: Node(a,x)
_tail: Node(d,y)
_tails: {'x': Node(a,x),
'y': Node(d,y),}
[15]:
q.enqueue('b','x') # 'b' belongs to existing group 'x', goes to the end of the group
[16]:
print(q)
ItalianQueue: a->b->c->d
x x y y
_head: Node(a,x)
_tail: Node(d,y)
_tails: {'x': Node(b,x),
'y': Node(d,y),}
[17]:
q.enqueue('f','z') # 'f' belongs to a new group, goes at the end of the queue
[18]:
print(q)
ItalianQueue: a->b->c->d->f
x x y y z
_head: Node(a,x)
_tail: Node(f,z)
_tails: {'x': Node(b,x),
'y': Node(d,y),
'z': Node(f,z),}
[19]:
q.enqueue('e','y') # 'e' belongs to an existing group 'y', goes at the end of the group
[20]:
print(q)
ItalianQueue: a->b->c->d->e->f
x x y y y z
_head: Node(a,x)
_tail: Node(f,z)
_tails: {'x': Node(b,x),
'y': Node(e,y),
'z': Node(f,z),}
[21]:
q.enqueue('g','z') # 'g' belongs to an existing group 'z', goes at the end of the group
[22]:
print(q)
ItalianQueue: a->b->c->d->e->f->g
x x y y y z z
_head: Node(a,x)
_tail: Node(g,z)
_tails: {'x': Node(b,x),
'y': Node(e,y),
'z': Node(g,z),}
[23]:
q.enqueue('h','z') # 'h' belongs to an existing group 'z', goes at the end of the group
[24]:
print(q)
ItalianQueue: a->b->c->d->e->f->g->h
x x y y y z z z
_head: Node(a,x)
_tail: Node(h,z)
_tails: {'x': Node(b,x),
'y': Node(e,y),
'z': Node(h,z),}
[25]:
q.enqueue('h','z') # 'h' belongs to an existing group 'z', goes at the end of the group
[26]:
print(q)
ItalianQueue: a->b->c->d->e->f->g->h->h
x x y y y z z z z
_head: Node(a,x)
_tail: Node(h,z)
_tails: {'x': Node(b,x),
'y': Node(e,y),
'z': Node(h,z),}
[27]:
q.enqueue('i','y') # 'i' belongs to an existing group 'y', goes at the end of the group
[28]:
print(q)
ItalianQueue: a->b->c->d->e->i->f->g->h->h
x x y y y y z z z z
_head: Node(a,x)
_tail: Node(h,z)
_tails: {'x': Node(b,x),
'y': Node(i,y),
'z': Node(h,z),}
Dequeue always happens at the head, without taking the group into consideration:
[29]:
q.dequeue()
[29]:
'a'
[30]:
print(q)
ItalianQueue: b->c->d->e->i->f->g->h->h
x y y y y z z z z
_head: Node(b,x)
_tail: Node(h,z)
_tails: {'x': Node(b,x),
'y': Node(i,y),
'z': Node(h,z),}
[31]:
q.dequeue() # removes the last member of group 'x', so key 'x' disappears from _tails
[31]:
'b'
[32]:
print(q)
ItalianQueue: c->d->e->i->f->g->h->h
y y y y z z z z
_head: Node(c,y)
_tail: Node(h,z)
_tails: {'y': Node(i,y),
'z': Node(h,z),}
[33]:
q.dequeue()
[33]:
'c'
[34]:
print(q)
ItalianQueue: d->e->i->f->g->h->h
y y y z z z z
_head: Node(d,y)
_tail: Node(h,z)
_tails: {'y': Node(i,y),
'z': Node(h,z),}
B2.1 enqueue¶
Implement enqueue:
def enqueue(self, v, g):
    """ Enqueues provided element v having group g, with the following
        criteria:
        _tails is used to find whether there is another element
        with a matching group:
        - if there is, v is inserted after the last element in the
          same group sequence (so to the right of the group)
        - otherwise v is inserted at the end of the queue
        - MUST run in O(1)
    """
Testing: python3 -m unittest italian_queue_test.EnqueueTest
B2.2 dequeue¶
Implement dequeue:
def dequeue(self):
    """ Removes head element and returns it.
        - If the queue is empty, raises a LookupError.
        - MUST perform in O(1)
        - REMEMBER to clean unused _tails keys
    """
IMPORTANT: you can test dequeue even if you didn’t implement enqueue correctly
Testing: python3 -m unittest italian_queue_test.DequeueTest
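Both methods can be sketched as follows. This is a minimal, self-contained version and not the official italian_queue_v2_solution. The Node class, its _data, _group and _next attributes, and the to_list helper are assumptions made for illustration. The key idea is that _tails[g] gives the insertion point for group g directly. Because groups stay contiguous, dequeue can tell that a group has emptied when the removed head is also that group's tail.

```python
class Node:
    def __init__(self, data, group):
        self._data = data
        self._group = group
        self._next = None

class ItalianQueue:
    def __init__(self):
        self._head = None
        self._tail = None
        self._tails = {}     # group -> last node of that group's sequence
        self._size = 0

    def enqueue(self, v, g):
        """O(1): no scan, _tails gives the insertion point directly."""
        new_node = Node(v, g)
        if g in self._tails:
            prev = self._tails[g]            # insert right after the group's tail
            new_node._next = prev._next
            prev._next = new_node
            if prev is self._tail:           # the group was last in the queue
                self._tail = new_node
        else:                                # new group: append at the very end
            if self._head is None:
                self._head = new_node
            else:
                self._tail._next = new_node
            self._tail = new_node
        self._tails[g] = new_node            # the new node is now the group's tail
        self._size += 1

    def dequeue(self):
        """O(1): remove the head and clean up _tails if its group empties."""
        if self._head is None:
            raise LookupError("Can't dequeue from an empty queue!")
        node = self._head
        self._head = node._next
        if self._head is None:
            self._tail = None
        # groups are contiguous, so if the head is also its group's tail,
        # it was the last member of that group
        if self._tails[node._group] is node:
            del self._tails[node._group]
        self._size -= 1
        return node._data

    def to_list(self):
        """Hypothetical helper: (data, group) pairs from head to tail."""
        out, cur = [], self._head
        while cur is not None:
            out.append((cur._data, cur._group))
            cur = cur._next
        return out

q = ItalianQueue()
for v, g in [('a', 'x'), ('c', 'y'), ('d', 'y'), ('b', 'x'), ('f', 'z'),
             ('e', 'y'), ('g', 'z'), ('h', 'z'), ('h', 'z'), ('i', 'y')]:
    q.enqueue(v, g)
print(''.join(v for v, _ in q.to_list()))   # abcdeifghh
```

The sequence of calls mirrors the complete run shown earlier, so the final order should match the a->b->c->d->e->i->f->g->h->h printed there.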
Exam - Tuesday 16 June 2020 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting why you did so.
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
# Called by f, will be graded:
def my_f(y,z):
    # bla
def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use; you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal, which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2020-06-16-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2020-06-16-FIRSTNAME-LASTNAME-ID
    exam-2020-06-16-exercise.ipynb
    theory.txt
    linked_list_exercise.py
    linked_list_test.py
    bin_tree_exercise.py
    bin_tree_test.py
Rename the datasciprolab-2020-06-16-FIRSTNAME-LASTNAME-ID folder: put your first name, last name and ID number, like datasciprolab-2020-06-16-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell instructors and we will download your work manually
Part A - Zoom surveillance¶
A training center holds online courses with Zoom software. Participant attendance is mandatory, and teachers want to determine who left, when and for what reason. Zoom allows saving a meeting log in a sort of CSV format which holds the timings of joins and leaves of each student. You will clean the file content and show relevant data in charts.
Basically, you are going to build a surveillance system to monitor YOU. Welcome to digital age.
CSV format¶
You are provided with the file UserQos_12345678901.csv. Unfortunately, it is a weird CSV which actually looks like two completely different CSVs merged together, one after the other. It contains the following:
1st line: general meeting header
2nd line: general meeting data
3rd line: empty
4th line: a completely different header for the participant sessions of that meeting. Each session contains a join time and a leave time, and each participant can have multiple sessions in a meeting.
5th line and following: sessions data
The file has lots of useless fields, try to explore it and understand the format (if you want, you may use LibreOffice Calc to help yourself)
Here we only show the few fields we are actually interested in, and examples of the transformations you should apply:
From the general meeting information section:
Meeting ID: 123 4567 8901
Topic: Hydraulics Exam
Start Time: "Apr 17, 2020 02:00 PM" should become Apr 17, 2020
From the participant sessions section:
Participant: Luigi
Join Time: 01:54 PM should become 13:54
Leave Time: 03:10 PM(Luigi got disconnected from the meeting.Reason: Network connection error. ) should be split into two fields, one for the actual leave time in 15:10 format and another one for the disconnection reason.
There are 3 possible disconnection reasons (try to come up with a general way to parse them, and notice that there is no dot at the end of the transformed string):
(Luigi got disconnected from the meeting.Reason: Network connection error. ) should become Network connection error
(Bowser left the meeting.Reason: Host closed the meeting. ) should become Host closed the meeting
(Princess Toadstool left the meeting.Reason: left the meeting.) should become left the meeting
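One possible general way to split the Leave Time field is sketched below. parse_leave is a hypothetical helper that the exercise does not require. It assumes every field contains exactly one '(' and one 'Reason:', as in the three examples above. The returned time is still in 12-hour format and would go through time24 afterwards.

```python
def parse_leave(field):
    """Split a Zoom 'Leave Time' field into (12-hour time, reason)."""
    # everything before '(' is the time, everything after is the message
    time_part, _, message = field.partition('(')
    # keep only what follows 'Reason:'
    reason = message.split('Reason:', 1)[1]
    # drop surrounding spaces, the closing ')' and the final '.'
    reason = reason.strip().rstrip(')').strip().rstrip('.')
    return time_part.strip(), reason

print(parse_leave('03:10 PM(Luigi got disconnected from the meeting.Reason: Network connection error. )'))
# ('03:10 PM', 'Network connection error')
```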
Your first goal will be to load the dataset and restructure the data so it looks like this:
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Luigi','13:54','15:10','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Luigi','15:12','15:54','left the meeting'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','14:02','14:16','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','14:19','15:02','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','15:04','15:50','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','15:52','15:55','Network connection error'],
['123 4567 8901','Hydraulics Exam','Apr 17, 2020','Mario','15:56','16:00','Host closed the meeting'],
...
]
To fix the times, you will first need to implement the following function.
Open Jupyter and start editing this notebook exam-2020-06-16-exercise.ipynb
A1 time24¶
[1]:
def time24(t):
    """ Takes a time string like '06:27 PM' and outputs a string like 18:27
    """
    #jupman-raise
    hh_mm, period = t.split()          # '06:27 PM' -> '06:27', 'PM'
    h, m = hh_mm.split(':')
    h = int(h) % 12                    # hours 1-12 -> 1-11 and 0, so 12 AM becomes 0
    if period == 'PM':
        h += 12                        # 12 PM -> 12, 01 PM -> 13, ...
    return '%02d:%s' % (h, m)
    #/jupman-raise
assert time24('12:00 AM') == '00:00' # midnight
assert time24('01:06 AM') == '01:06'
assert time24('09:45 AM') == '09:45'
assert time24('12:00 PM') == '12:00' # special case, it's actually midday
assert time24('01:27 PM') == '13:27'
assert time24('06:27 PM') == '18:27'
assert time24('10:03 PM') == '22:03'
assert time24('12:30 AM') == '00:30'
assert time24('12:30 PM') == '12:30'
A2 load¶
Implement a function which loads the file UserQos_12345678901.csv and RETURNS a list of lists.
To parse the file, you can use simple CSV parsing as seen in class (there is no need to use pandas).
[2]:
import csv
def load(filepath):
    #jupman-raise
    ret = []
    with open(filepath, encoding='utf-8', newline='') as f:
        lettore = csv.reader(f, delimiter=',')
        next(lettore)                  # skip the general meeting header
        riga_meeting = next(lettore)   # general meeting data
        meeting_id = riga_meeting[0]
        topic = riga_meeting[1]
        meeting_date = riga_meeting[7]
        next(lettore)                  # empty line
        next(lettore)                  # second header (participant sessions)
        ret.append(['meeting_id', 'topic','date', 'participant','join_time','leave_time','reason'])
        for riga in lettore:
            if len(riga) > 0:          # skip empty lines
                ret.append([meeting_id,
                            topic,
                            meeting_date[:12],                   # 'Apr 17, 2020 02:00 PM' -> 'Apr 17, 2020'
                            riga[0],
                            time24(riga[10]),
                            time24(riga[11].split('(')[0]),      # leave time is before '('
                            riga[11].split('Reason: ')[1].split('.')[0]])
    return ret
    #/jupman-raise
meeting_log = load('UserQos_12345678901.csv')
from pprint import pprint
pprint(meeting_log, width=150)
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '13:54', '15:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '15:12', '15:54', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:02', '14:16', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:19', '15:02', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:04', '15:50', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:52', '15:55', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:56', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:15', '14:30', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:54', '15:03', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:12', '15:40', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:45', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Princess Toadstool', '13:56', '15:33', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:05', '14:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:15', '14:29', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:33', '15:10', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:25', '15:54', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:55', '16:00', 'Host closed the meeting']]
[3]:
EXPECTED_MEETING_LOG = \
[['meeting_id', 'topic', 'date', 'participant', 'join_time', 'leave_time', 'reason'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '13:54', '15:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Luigi', '15:12', '15:54', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:02', '14:16', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '14:19', '15:02', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:04', '15:50', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:52', '15:55', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Mario', '15:56', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:15', '14:30', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '14:54', '15:03', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:12', '15:40', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Bowser', '15:45', '16:00', 'Host closed the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Princess Toadstool', '13:56', '15:33', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:05', '14:10', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:15', '14:29', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '14:33', '15:10', 'left the meeting'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:25', '15:54', 'Network connection error'],
['123 4567 8901', 'Hydraulics Exam', 'Apr 17, 2020', 'Wario', '15:55', '16:00', 'Host closed the meeting']]
assert meeting_log[0] == EXPECTED_MEETING_LOG[0] # header
assert meeting_log[1] == EXPECTED_MEETING_LOG[1] # first Luigi row
assert meeting_log[1:3] == EXPECTED_MEETING_LOG[1:3] # Luigi rows
assert meeting_log[:4] == EXPECTED_MEETING_LOG[:4] # until first Mario row included
assert meeting_log == EXPECTED_MEETING_LOG # all table
A3.1 duration¶
Given two times as strings a and b in a format like 17:34, RETURN the duration in minutes between them as an integer.
To calculate gap durations, we assume a meeting NEVER ends after midnight
[4]:
def duration(a, b):
    #jupman-raise
    asp = a.split(':')
    ta = int(asp[0])*60 + int(asp[1])   # minutes since midnight
    bsp = b.split(':')
    tb = int(bsp[0])*60 + int(bsp[1])
    return tb - ta
    #/jupman-raise
assert duration('15:00','15:34') == 34
assert duration('15:00','17:34') == 120 + 34
assert duration('15:50','16:12') == 22
assert duration('09:55','11:06') == 5 + 60 + 6
assert duration('00:00','00:01') == 1
#assert duration('23:58','00:01') == 3 # no need to support this case !!
A3.2 calc_stats¶
We want to know something about the time each participant has been disconnected from the exam. We call such intervals gaps: a gap goes from a session leave time to the join time of the same participant's next session.
Implement the function calc_stats that, given a cleaned log produced by load, RETURNS a dictionary mapping each participant to a dictionary with these statistics:
max_gap: the longest time in minutes during which the participant has been disconnected
gaps: the number of disconnections that happened to the participant during the meeting
time_away: the total time in minutes during which the participant has been disconnected during the meeting
To calculate gap durations, we assume a meeting NEVER ends after midnight
For the data format details, see EXPECTED_STATS below.
To test the function, you DON’T NEED to have correctly implemented previous functions
[5]:
def calc_stats(log):
    #jupman-raise
    ret = {}
    last_sessions = {}              # participant -> their previous session row
    for session in log[1:]:         # skip the header row
        participant = session[3]
        join_time = session[4]
        if participant not in ret:
            ret[participant] = {'max_gap': 0,
                                'gaps': 0,
                                'time_away': 0}
        if participant in last_sessions:
            # gap = previous leave time -> current join time
            last_leave_time = last_sessions[participant][5]
            gap = duration(last_leave_time, join_time)
            ret[participant]['max_gap'] = max(gap, ret[participant]['max_gap'])
            ret[participant]['gaps'] += 1
            ret[participant]['time_away'] += gap
        last_sessions[participant] = session
    return ret
    #/jupman-raise
stats = calc_stats(meeting_log)
# in case you had trouble implementing load function, use this:
#stats = calc_stats(EXPECTED_MEETING_LOG)
stats
[5]:
{'Bowser': {'gaps': 3, 'max_gap': 24, 'time_away': 38},
'Luigi': {'gaps': 1, 'max_gap': 2, 'time_away': 2},
'Mario': {'gaps': 4, 'max_gap': 3, 'time_away': 8},
'Princess Toadstool': {'gaps': 0, 'max_gap': 0, 'time_away': 0},
'Wario': {'gaps': 4, 'max_gap': 15, 'time_away': 25}}
[6]:
EXPECTED_STATS = { 'Bowser': {'gaps': 3, 'max_gap': 24, 'time_away': 38},
'Luigi': {'gaps': 1, 'max_gap': 2, 'time_away': 2},
'Mario': {'gaps': 4, 'max_gap': 3, 'time_away': 8},
'Princess Toadstool': {'gaps': 0, 'max_gap': 0, 'time_away': 0},
'Wario': {'gaps': 4, 'max_gap': 15, 'time_away': 25}}
assert stats == EXPECTED_STATS
A4 viz¶
Produce a bar chart of the statistics you calculated before. For how to do it, see the examples in the Visualization tutorial
participant names MUST be sorted in alphabetical order
remember to put title, legend and axis labels
To test the function, you DON’T NEED to have correctly implemented previous functions
[7]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
def viz(stats):
    #jupman-raise
    labels = sorted(stats)                 # participant names in alphabetical order
    xs = np.arange(len(labels))
    ys_max_gap = [stats[p]['max_gap'] for p in labels]
    ys_time_away = [stats[p]['time_away'] for p in labels]
    width = 0.35
    fig, ax = plt.subplots(figsize=(10,3))
    # two bars per participant, shifted left and right of the tick
    ax.bar(xs - width/2, ys_max_gap, width, color='red', label='max_gap')
    ax.bar(xs + width/2, ys_time_away, width, color='darkred', label='time_away')
    plt.xticks(xs, labels)
    ax.set_title('Disconnections SOLUTION')
    ax.legend()
    plt.xlabel('participant')
    plt.ylabel('minutes')
    plt.savefig('surveillance.png')
    plt.show()
    #/jupman-raise
viz(stats)
# in case you had trouble implementing calc_stats, use this:
#viz(EXPECTED_STATS)

Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file
B1.1 complexity¶
Given a list L of n positive integers, please compute the asymptotic computational complexity of the following function, explaining your reasoning.
def my_max(L):
    M = -1
    for e in L:
        if e > M:
            M = e
    return M
def my_fun(L):
    n = len(L)
    out = 0
    for i in range(5):
        out = out + my_max(L[i:])
    return out
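Here is a term-by-term cost count for my_fun. It assumes that slicing L[i:] copies its n - i elements, as it does for Python lists, and that my_max does constant work per element. The constants c_1, c_2, c_3 are unspecified:

```latex
T(n) = c_3 + \sum_{i=0}^{4} \Big( \underbrace{c_1 \max(0,\, n-i)}_{\text{copy of } L[i:]}
       + \underbrace{c_2 \max(0,\, n-i)}_{\text{scan in my\_max}} \Big)
     \;\le\; c_3 + 5\,(c_1 + c_2)\,n \;=\; O(n)
```

The outer loop always runs exactly 5 times, so it contributes a constant factor rather than another factor of n.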
B1.2 describe¶
Briefly describe what a bidirectional linked list is. How does it differ from a queue?
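Here is a sketch of the idea. The class and method names are illustrative and are not taken from the course files. In a bidirectional (doubly) linked list, each node keeps a reference to both the next and the previous node. This means the list can be walked in both directions, and a node can be unlinked in O(1) once you hold a reference to it.

```python
class DNode:
    """Node of a doubly linked list: one pointer forward, one backward."""
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        """Add data at the tail in O(1) and return the new node."""
        node = DNode(data)
        if self.tail is None:               # empty list
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def remove(self, node):
        """Unlink a given node in O(1): no scan needed to find its predecessor."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def forward(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.data)
            cur = cur.next
        return out

    def backward(self):
        out, cur = [], self.tail
        while cur is not None:
            out.append(cur.data)
            cur = cur.prev
        return out

dll = DoublyLinkedList()
dll.append(1)
middle = dll.append(2)
dll.append(3)
dll.remove(middle)
print(dll.forward())    # [1, 3]
print(dll.backward())   # [3, 1]
```

A queue, by contrast, is defined by its FIFO operations rather than by how its nodes are linked, and it could be implemented on top of a structure like this one.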
B2 - LinkedList slice¶
Open a text editor and edit the file linked_list_exercise.py
Implement the method slice:
def slice(self, start, end):
    """ RETURN a NEW LinkedList created by copying nodes of this list
        from index start INCLUDED to index end EXCLUDED
        - if start is greater than or equal to end, returns an empty LinkedList
        - if start is greater than the number of available nodes, returns an empty LinkedList
        - if end is greater than the number of available nodes, copies all items until the tail without errors
        - if start index is negative, raises ValueError
        - if end index is negative, raises ValueError
        - IMPORTANT: All nodes in the returned LinkedList MUST be NEW
        - DO *NOT* modify original linked list
        - DO *NOT* add an extra size field
        - MUST execute in O(n), where n is the size of the list
    """
Testing: python3 -m unittest linked_list_test.SliceTest
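One possible O(n) approach is sketched below. It makes a single pass that stops at index end or at the tail, and it keeps a pointer to the last copied node so each append is O(1). This is not the official linked_list_solution. The Node and LinkedList internals (_head, _data, _next) and the to_python helper are assumptions. As in the examples, add is taken to prepend at the head.

```python
class Node:
    def __init__(self, data):
        self._data = data
        self._next = None

class LinkedList:
    def __init__(self):
        self._head = None

    def add(self, item):
        """Prepend item at the head (so adding 'g'...'a' yields a,b,...,g)."""
        node = Node(item)
        node._next = self._head
        self._head = node

    def to_python(self):
        """Hypothetical helper returning the elements as a Python list."""
        out, cur = [], self._head
        while cur is not None:
            out.append(cur._data)
            cur = cur._next
        return out

    def slice(self, start, end):
        if start < 0:
            raise ValueError('Negative values for start are not supported: %s' % start)
        if end < 0:
            raise ValueError('Negative values for end are not supported: %s' % end)
        ret = LinkedList()
        ret_tail = None              # last node of the copy, so appending is O(1)
        current = self._head
        i = 0
        # stop at the end of the list or at index end, whichever comes first
        while current is not None and i < end:
            if i >= start:
                new_node = Node(current._data)   # NEW node, original untouched
                if ret_tail is None:
                    ret._head = new_node
                else:
                    ret_tail._next = new_node
                ret_tail = new_node
            current = current._next
            i += 1
        return ret

la = LinkedList()
for letter in 'gfedcba':
    la.add(letter)
print(la.slice(2, 5).to_python())    # ['c', 'd', 'e']
print(la.slice(3, 10).to_python())   # ['d', 'e', 'f', 'g']
print(la.slice(5, 3).to_python())    # []
```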
Example:
[8]:
from linked_list_solution import *
[9]:
la = LinkedList()
la.add('g')
la.add('f')
la.add('e')
la.add('d')
la.add('c')
la.add('b')
la.add('a')
[10]:
print(la)
LinkedList: a,b,c,d,e,f,g
Create a NEW LinkedList by copying nodes from index 2 INCLUDED up to index 5 EXCLUDED:
[11]:
lb = la.slice(2,5)
[12]:
print(lb)
LinkedList: c,d,e
Note that the original LinkedList is still intact:
[13]:
print(la)
LinkedList: a,b,c,d,e,f,g
Special cases¶
If start is greater than or equal to end, you get an empty LinkedList:
[14]:
print(la.slice(5,3))
LinkedList:
If start is greater than the number of available nodes, you get an empty LinkedList:
[15]:
print(la.slice(10,15))
LinkedList:
If end is greater than the number of available nodes, you get a copy of all the nodes until the tail without errors:
[16]:
print(la.slice(3,10))
LinkedList: d,e,f,g
Using negative indexes for start, end or both raises ValueError:
la.slice(-3,4)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-184-e3380bb66e77> in <module>()
----> 1 la.slice(-3,4)
~/Da/prj/datasciprolab/prj/exams/2020-06-16/linked_list_solution.py in slice(self, start, end)
63
64 if start < 0:
---> 65 raise ValueError('Negative values for start are not supported! %s ' % start)
66 if end < 0:
67 raise ValueError('Negative values for end are not supported: %s' % end)
ValueError: Negative values for start are not supported! -3
la.slice(1,-2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-185-8e09ec468c30> in <module>()
----> 1 la.slice(1,-2)
~/Da/prj/datasciprolab/prj/exams/2020-06-16/linked_list_solution.py in slice(self, start, end)
65 raise ValueError('Negative values for start are not supported! %s ' % start)
66 if end < 0:
---> 67 raise ValueError('Negative values for end are not supported: %s' % end)
68
69 ret = LinkedList()
ValueError: Negative values for end are not supported: -2
B3 BinaryTree prune_rec¶
Implement the method prune_rec:
def prune_rec(self, el):
    """ MODIFIES the tree by cutting all the subtrees that have their
        root node data equal to el. By 'cutting' we mean they are no longer linked
        to the tree on which prune is called.
        - if prune is called on a node having data equal to el, raises ValueError
        - MUST execute in O(n) where n is the number of nodes of the tree
        - NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.PruneRecTest
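A recursive sketch follows. It assumes a BinaryTree whose constructor takes (data, left, right) and stores them in _data, _left and _right. These internals, the _prune_children helper and the to_tuple helper are hypothetical and may differ from bin_tree_exercise.py. Each node is visited at most once. A matching child is unlinked without descending into it, so the whole run stays O(n).

```python
class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self._data = data
        self._left = left
        self._right = right

    def to_tuple(self):
        """Hypothetical helper: nested (data, left, right) tuples, None for missing children."""
        left = self._left.to_tuple() if self._left is not None else None
        right = self._right.to_tuple() if self._right is not None else None
        return (self._data, left, right)

    def prune_rec(self, el):
        if self._data == el:
            raise ValueError('Tried to prune the tree root !')
        self._prune_children(el)

    def _prune_children(self, el):
        if self._left is not None:
            if self._left._data == el:
                self._left = None                 # cut the whole left subtree
            else:
                self._left._prune_children(el)
        if self._right is not None:
            if self._right._data == el:
                self._right = None                # cut the whole right subtree
            else:
                self._right._prune_children(el)

bt = BinaryTree   # short alias, mimicking the bt helper used in the examples
t = bt('a',
       bt('b',
          bt('z'),
          bt('c',
             bt('d'),
             bt('z', None, bt('e')))),
       bt('z',
          bt('f'),
          bt('z', None, bt('g'))))
t.prune_rec('z')
print(t.to_tuple())   # ('a', ('b', None, ('c', ('d', None, None), None)), None)
```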
Example:
[17]:
from bin_tree_solution import *
from bin_tree_test import bt
[18]:
t = bt('a',
bt('b',
bt('z'),
bt('c',
bt('d'),
bt('z',
None,
bt('e')))),
bt('z',
bt('f'),
bt('z',
None,
bt('g'))))
[19]:
print(t)
a
├b
│├z
│└c
│ ├d
│ └z
│ ├
│ └e
└z
├f
└z
├
└g
[20]:
t.prune_rec('z')
[21]:
print(t)
a
├b
│├
│└c
│ ├d
│ └
└
[22]:
t.prune_rec('c')
[23]:
print(t)
a
├b
└
Trying to prune the root will throw a ValueError:
t.prune_rec('a')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-f8e8fa8a97dd> in <module>()
----> 1 t.prune_rec('a')
ValueError: Tried to prune the tree root !
Exam - Friday 17 July 2020 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
Grading¶
Correct implementations: Correct implementations with the required complexity grant you full grade.
Partial implementations: Partial implementations might still give you a few points. If you just can’t solve an exercise, try to solve it at least for some subcase (e.g. an array of fixed size 2), commenting why you did so.
Valid code¶
WARNING: MAKE SURE ALL EXERCISE FILES AT LEAST COMPILE !!! 10 MINS BEFORE THE END OF THE EXAM I WILL ASK YOU TO DO A FINAL CLEAN UP OF THE CODE
WARNING: ONLY IMPLEMENTATIONS OF THE PROVIDED FUNCTION SIGNATURES WILL BE EVALUATED !!!!!!!!!
For example, if you are asked to implement:
def f(x):
    raise Exception("TODO implement me")
and you ship this code:
def my_f(x):
    # a super fast, correct and stylish implementation
def f(x):
    raise Exception("TODO implement me")
We will assess only the latter one, f(x), and conclude it doesn’t work at all :P !!!!!!!
Helper functions
Still, you are allowed to define any extra helper function you might need. If your f(x) implementation calls some other function you defined, like my_f here, it is ok:
# Not called by f, will get ignored:
def my_g(x):
    # bla
# Called by f, will be graded:
def my_f(y,z):
    # bla
def f(x):
    my_f(x,5)
How to edit and run¶
To edit the files, you can use any editor of your choice; you can find them under Applications->Programming:
Visual Studio Code
Editra is easy to use; you can find it under Applications->Programming->Editra.
Others could be GEdit (simpler) or PyCharm (more complex).
To run the tests, use the Terminal, which can be found in Accessories -> Terminal
IMPORTANT: Pay close attention to the comments of the functions.
WARNING: DON’T modify function signatures! Just provide the implementation.
WARNING: DON’T change the existing test methods, just add new ones !!! You can add as many as you want.
WARNING: DON’T create other files. If you do, they won’t be evaluated.
Debugging¶
If you need to print some debugging information, you are allowed to put extra print statements in the function bodies.
WARNING: even if print statements are allowed, be careful with prints that might break your function!
For example, avoid stuff like this:
x = 0
print(1/x)
What to do¶
Download datasciprolab-2020-07-17-exam.zip and extract it on your desktop. Folder content should be like this:
datasciprolab-2020-07-17-FIRSTNAME-LASTNAME-ID
    exam-2020-07-17-exercise.ipynb
    theory.txt
    office_queue_exercise.py
    office_queue_test.py
Rename the datasciprolab-2020-07-17-FIRSTNAME-LASTNAME-ID folder: put your first name, last name and ID number, like datasciprolab-2020-07-17-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take max 25 mins. If it takes longer, leave it and try another exercise.
When done:
If you have a unitn login: zip and send to examina.icts.unitn.it/studente
If you don’t have a unitn login: tell instructors and we will download your work manually
Part A - NACE codes¶
https://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_CLS_DLD&StrNom=NACE_REV2&StrLanguageCode=EN&StrLayoutCode=HIERARCHIC#
So you want to be a data scientist. Good, plenty of opportunities ahead!
After graduating, though, you might discover that many companies require you to actually work as a freelancer: you will just need to declare to the state which type of economic activity you are going to perform, they say. Seems easy, but you will soon encounter a pretty bureaucratic problem: do public institutions even know what a data scientist is? If not, what is the closest category they recognize? Is there any specific exclusion that would bar you from entering that category?
If you are in Europe, you will be presented with a catalog of economic activities you can choose from, called NACE, which is then further specialized by various states (for example, Italy’s catalog is called ATECO).
Sections¶
A NACE code is subdivided into a hierarchical, four-level structure. The categories at the highest level are called sections; here they are:
Section detail¶
If you drill down into, say, section M, you will find something like this:
The first two digits of the code identify the division, the third digit identifies the group, and the fourth digit identifies the class:
Let’s pick for example Advertising agencies, which has code 73.11:
Level | Name | Code | Spec | Description |
---|---|---|---|---|
1 | Section | M | a single alphabetic char | PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES |
2 | Division | 73 | two digits | Advertising and market research |
3 | Group | 73.1 | three digits, with a dot after the first two | Advertising |
4 | Class | 73.11 | four digits, with a dot after the first two | Advertising agencies |
Specifications¶
WARNING: CODES MAY CONTAIN ZEROES!
IF YOU LOAD THE CSV IN LIBREOFFICE CALC OR EXCEL, MAKE SURE IT IMPORTS EVERYTHING AS STRING!
WATCH OUT FOR CHOPPED ZEROES !
Zero examples:
Veterinary activities contains a double zero at the end: 75.00
The group Manufacture of beverages contains a single zero at the end: 11.0
Manufacture of beer contains a zero inside: 11.05
Support services to forestry contains a zero at the beginning: 02.4, which is different from 02.40 even if they have the same description!
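The sketch below shows why the codes must stay strings. It uses the csv module on a made-up three-row sample. The column names Code and Description match the real file, but the rows are only illustrative. csv.DictReader never converts values, whereas a numeric conversion collapses 02.4 and 02.40. If you use pandas instead, passing dtype=str to read_csv keeps the codes as text.

```python
import csv
import io

# Made-up sample in the spirit of the NACE CSV (only two columns kept)
sample = ("Code,Description\n"
          "02.4,Support services to forestry\n"
          "02.40,Support services to forestry\n"
          "75.00,Veterinary activities\n")

rows = list(csv.DictReader(io.StringIO(sample)))
codes = [row['Code'] for row in rows]
print(codes)                                # ['02.4', '02.40', '75.00']

# Converting to numbers loses the distinction between group and class
print(float(codes[0]) == float(codes[1]))   # True
print(float(codes[2]))                      # 75.0
```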
The section level code is not part of the NACE code. For example, the activity Manufacture of glues is identified by the code 20.52, where 20 is the code for the division, 20.5 is the code for the group and 20.52 is the code of the class; section C, to which this class belongs, does not appear in the code itself.
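Because the section is not encoded in the code, the only way to recover it is to follow the parent links until a one-letter code is reached (the CSV shown below has a Parent column, e.g. division 01 has parent A). A minimal sketch, assuming a dictionary mapping each code to its parent; the helper `section_of` and the hand-built excerpt `parents` are hypothetical:

```python
def section_of(code, parents):
    """Hypothetical helper: walk up the `parents` mapping (code -> parent code)
    until reaching a one-letter section, and RETURN that letter."""
    while len(code) != 1:
        code = parents[code]
    return code

# Hand-built excerpt, assuming the Parent chain 20.52 -> 20.5 -> 20 -> C
parents = {'20.52': '20.5', '20.5': '20', '20': 'C'}
print(section_of('20.52', parents))   # C
```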
There may be gaps (not very important for us): the divisions are coded consecutively, but some “gaps” have been left to allow the introduction of additional divisions without a complete change of the NACE coding.
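You can spot such a gap in the data shown on this page: right after division 03 comes division 05. A minimal sketch to list the missing divisions, assuming you already have a list of codes; the helper `division_gaps` is hypothetical. Converting to `int` is fine here only because the result is padded back to two digits:

```python
def division_gaps(codes):
    """Hypothetical helper: RETURN the two-digit division codes missing
    between the smallest and the largest division found in `codes`."""
    divisions = {int(c) for c in codes if len(c) == 2 and c.isdigit()}
    # f'{n:02d}' pads back the leading zero that int() dropped
    return [f'{n:02d}' for n in range(min(divisions), max(divisions) + 1)
            if n not in divisions]

# Division codes as they appear at the start of the catalog
print(division_gaps(['01', '02', '03', '05', '06', '07', '08', '09', '10']))  # ['04']
```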
NACE CSV¶
We provide you with a CSV file NACE_REV2_20200628_213139.csv that contains all the codes. Try to explore it with LibreOffice Calc or pandas.
Here we show some relevant parts (NOTE: for part A you will NOT need to use pandas).
[1]:
import pandas as pd  # we import pandas and for ease we rename it to 'pd'
import numpy as np   # we import numpy and for ease we rename it to 'np'
# show full cell contents: None means no limit (-1 is deprecated since pandas 1.0)
pd.set_option('display.max_colwidth', None)
df = pd.read_csv('NACE_REV2_20200628_213139.csv', encoding='UTF-8')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 996 entries, 0 to 995
Data columns (total 10 columns):
Order 996 non-null int64
Level 996 non-null int64
Code 996 non-null object
Parent 975 non-null object
Description 996 non-null object
This item includes 778 non-null object
This item also includes 202 non-null object
Rulings 134 non-null object
This item excludes 507 non-null object
Reference to ISIC Rev. 4 996 non-null object
dtypes: int64(2), object(8)
memory usage: 77.9+ KB
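In the full file the Code column stays textual because it also contains section letters, but pandas infers column types, so a column holding only numeric-looking codes would silently become float. A minimal sketch on a made-up inline sample showing the problem and the `dtype=str` fix:

```python
import io
import pandas as pd

sample = """Code,Description
02.4,Support services to forestry
02.40,Support services to forestry
75.00,Veterinary activities
"""

# Default type inference: the column looks numeric, so it becomes float and zeros vanish
guessed = pd.read_csv(io.StringIO(sample))
print(guessed['Code'].tolist())   # 02.4 and 02.40 both become 2.4, 75.00 becomes 75.0

# dtype=str disables inference and keeps every code exactly as written
safe = pd.read_csv(io.StringIO(sample), dtype=str)
print(safe['Code'].tolist())      # ['02.4', '02.40', '75.00']
```

Note that with `dtype=str` empty cells are still read as NaN; pass `keep_default_na=False` if you prefer empty strings.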
[2]:
df.head(5)
[2]:
Order | Level | Code | Parent | Description | This item includes | This item also includes | Rulings | This item excludes | Reference to ISIC Rev. 4 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 398481 | 1 | A | NaN | AGRICULTURE, FORESTRY AND FISHING | This section includes the exploitation of vegetal and animal natural resources, comprising the activities of growing of crops, raising and breeding of animals, harvesting of timber and other plants, animals or animal products from a farm or their natural habitats. | NaN | NaN | NaN | A |
1 | 398482 | 2 | 01 | A | Crop and animal production, hunting and related service activities | This division includes two basic activities, namely the production of crop products and production of animal products, covering also the forms of organic agriculture, the growing of genetically modified crops and the raising of genetically modified animals. This division includes growing of crops in open fields as well in greenhouses.\n \nGroup 01.5 (Mixed farming) breaks with the usual principles for identifying main activity. It accepts that many agricultural holdings have reasonably balanced crop and animal production, and that it would be arbitrary to classify them in one category or the other. | This division also includes service activities incidental to agriculture, as well as hunting, trapping and related activities. | NaN | Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here.\n\nThe division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. | 01 |
2 | 398483 | 3 | 01.1 | 01 | Growing of non-perennial crops | This group includes the growing of non-perennial crops, i.e. plants that do not last for more than two growing seasons. Included is the growing of these plants for the purpose of seed production. | NaN | NaN | NaN | 011 |
3 | 398484 | 4 | 01.11 | 01.1 | Growing of cereals (except rice), leguminous crops and oil seeds | This class includes all forms of growing of cereals, leguminous crops and oil seeds in open fields. The growing of these crops is often combined within agricultural units.\n\nThis class includes:\n- growing of cereals such as:\n . wheat\n . grain maize\n . sorghum\n . barley\n . rye\n . oats\n . millets\n . other cereals n.e.c.\n- growing of leguminous crops such as:\n . beans\n . broad beans\n . chick peas\n . cow peas\n . lentils\n . lupines\n . peas\n . pigeon peas\n . other leguminous crops\n- growing of oil seeds such as:\n . soya beans\n . groundnuts\n . castor bean\n . linseed\n . mustard seed\n . niger seed\n . rapeseed\n . safflower seed\n . sesame seed\n . sunflower seed\n . other oil seeds | NaN | NaN | This class excludes:\n- growing of rice, see 01.12\n- growing of sweet corn, see 01.13\n- growing of maize for fodder, see 01.19\n- growing of oleaginous fruits, see 01.26 | 0111 |
4 | 398485 | 4 | 01.12 | 01.1 | Growing of rice | This class includes:\n- growing of rice (including organic farming and the growing of genetically modified rice) | NaN | NaN | NaN | 0112 |
We can focus on just these columns:
[3]:
from IPython.display import display

# rows to show, identified by their 'Order' value
selection = [398482, 398488, 398530, 398608, 398518, 398521, 398567]
# assuming the variable df contains the DataFrame loaded above
example_df = df[['Order', 'Level', 'Code', 'Parent', 'Description', 'This item excludes']]
example_df = example_df[example_df['Order'].isin(selection)]
# 'pre-wrap' keeps the line breaks inside the long exclusion texts
display(example_df.style.set_properties(**{'white-space': 'pre-wrap'}))
Order | Level | Code | Parent | Description | This item excludes | |
---|---|---|---|---|---|---|
1 | 398482 | 2 | 01 | A | Crop and animal production, hunting and related service activities | Agricultural activities exclude any subsequent processing of the agricultural products (classified under divisions 10 and 11 (Manufacture of food products and beverages) and division 12 (Manufacture of tobacco products)), beyond that needed to prepare them for the primary markets. The preparation of products for the primary markets is included here. The division excludes field construction (e.g. agricultural land terracing, drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers and cooperative associations engaged in the marketing of farm products classified in section G. Also excluded is the landscape care and maintenance, which is classified in class 81.30. |
7 | 398488 | 4 | 01.15 | 01.1 | Growing of tobacco | This class excludes: - manufacture of tobacco products, see 12.00 |
37 | 398518 | 4 | 01.64 | 01.6 | Seed processing for propagation | This class excludes: - growing of seeds, see groups 01.1 and 01.2 - processing of seeds to obtain oil, see 10.41 - research to develop or modify new forms of seeds, see 72.11 |
40 | 398521 | 2 | 02 | A | Forestry and logging | Excluded is further processing of wood beginning with sawmilling and planing of wood, see division 16. |
49 | 398530 | 2 | 03 | A | Fishing and aquaculture | This division does not include building and repairing of ships and boats (30.1, 33.15) and sport or recreational fishing activities (93.19). Processing of fish, crustaceans or molluscs is excluded, whether at land-based plants or on factory ships (10.20). |
86 | 398567 | 4 | 09.90 | 09.9 | Support activities for other mining and quarrying | This class excludes: - operating mines or quarries on a contract or fee basis, see division 05, 07 or 08 - specialised repair of mining machinery, see 33.12 - geophysical surveying services, on a contract or fee basis, see 71.12 |
127 | 398608 | 4 | 11.03 | 11.0 | Manufacture of cider and other fruit wines | This class excludes: - merely bottling and labelling, see 46.34 (if performed as part of wholesale) and 82.92 (if performed on a fee or contract basis) |
A1 Extracting codes¶
Let’s say the European Commission wants to review the catalog to simplify it. One way to do this could be to look for codes that have many exclusions, the reasoning being that explaining something to somebody by stating what it is not often results in confusion.
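Once the codes are in a dictionary shaped like the one built in A2 below, ranking them by number of exclusions is a single sort. A minimal sketch; the helper `most_excluding` is hypothetical and `sample_db` is a small excerpt copied from the `build_db` output shown further down:

```python
def most_excluding(db, n):
    """Hypothetical helper: RETURN the n codes with the most exclusions,
    as (code, number_of_exclusions) pairs, most exclusions first."""
    counts = [(code, len(info['exclusions'])) for code, info in db.items()]
    # sorted is stable, even with reverse=True: ties keep the catalog order
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:n]

sample_db = {
    '01.12': {'description': 'Growing of rice', 'exclusions': []},
    '01.15': {'description': 'Growing of tobacco', 'exclusions': ['12.00']},
    '01.61': {'description': 'Support activities for crop production',
              'exclusions': ['01.63', '43.12', '71.11', '74.90', '81.30', '82.30']},
    '03.11': {'description': 'Marine fishing',
              'exclusions': ['01.70', '10.11', '10.20', '50.10', '84.24', '93.19']},
}
print(most_excluding(sample_db, 2))   # [('01.61', 6), ('03.11', 6)]
```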
A1.1 is_nace¶
Implement the following function. NOTE: it was not explicitly required in the original exam, but it can help to detect codes among the words of a text.
[4]:
def is_nace(word):
    """Given a word, RETURN True if the word is a NACE code, False otherwise"""
    #jupman-raise
    # we could implement it also with regexes, here we use explicit methods:
    if len(word) == 1:
        # section: a single uppercase letter, e.g. 'A'
        return word.isalpha() and word.isupper()
    elif len(word) == 2:
        # division: two digits, e.g. '01'
        return word.isdigit()
    elif len(word) == 4:
        # group: two digits, a dot and one digit, e.g. '01.2'
        return word[:2].isdigit() and word[2] == '.' and word[3].isdigit()
    elif len(word) == 5:
        # class: two digits, a dot and two digits, e.g. '01.20'
        return word[:2].isdigit() and word[2] == '.' and word[3:].isdigit()
    else:
        return False
    #/jupman-raise
assert is_nace('0') == False
assert is_nace('01') == True
assert is_nace('A') == True # this is a Section
assert is_nace('AA') == False
assert is_nace('a') == False
assert is_nace('01.2') == True
assert is_nace('01.20') == True
assert is_nace('03.25') == True
assert is_nace('02.753') == False
assert is_nace('300') == False
assert is_nace('5012') == False
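As the comment in `is_nace` says, the same check can be written with a regular expression. A minimal sketch; the names `NACE_RE` and `is_nace_re` are hypothetical. One small difference: `[A-Z]` and `[0-9]` accept only ASCII, while `isupper()`/`isdigit()` also accept other Unicode letters and digits:

```python
import re

# A section letter, or two digits optionally followed by a dot and one or two digits
NACE_RE = re.compile(r'[A-Z]|[0-9]{2}(\.[0-9]{1,2})?')

def is_nace_re(word):
    """Hypothetical regex-based variant of is_nace"""
    # fullmatch requires the whole word to match, not just a prefix
    return NACE_RE.fullmatch(word) is not None

print(is_nace_re('01.20'), is_nace_re('02.753'))   # True False
```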
A1.2 extract_codes¶
Implement the following function, which extracts codes from the cells of the This item excludes column. For examples, see the asserts.
[5]:
def extract_codes(text):
    """Extracts all the NACE codes from given text (a single string),
       and RETURN a list of the codes
       - also extracts section letters
       - list must have *no* duplicates
    """
    #jupman-raise
    ret = []
    # hyphens become spaces so ranges like '85.1-85.4' yield two words,
    # then surrounding punctuation is stripped from each word
    words = [word.strip(';,.:()"\'') for word in text.replace('-', ' ').split()]
    for i in range(len(words)):
        if (i < len(words) - 1
                and words[i].lower() == 'section'
                and len(words[i+1]) == 1
                and words[i+1][0].isalpha()):
            # 'section X': X is a section letter
            if words[i+1] not in ret:
                ret.append(words[i+1])
        else:
            if is_nace(words[i]) and words[i] not in ret:
                ret.append(words[i])
    return ret
    #/jupman-raise
assert extract_codes('group 02.4') == ['02.4']
assert extract_codes('class 02.40') == ['02.40']
assert extract_codes('.') == []
assert extract_codes('exceeding 300 litres') == []
assert extract_codes('see 46.34') == ['46.34']
assert extract_codes('divisions 10 and 11') == ['10','11']
assert extract_codes('(10.20)') == ['10.20']
assert extract_codes('(30.1, 33.15)') == ['30.1', '33.15']
assert extract_codes('as outlined in groups 85.1-85.4, i.e.') == ['85.1','85.4']
assert extract_codes('see 25.99 see 25.99') == ['25.99'] # no duplicates
assert extract_codes('section A') == ['A']
assert extract_codes('in section G. Also') == ['G']
assert extract_codes('section F (Construction)') == ['F']
assert extract_codes('section A, section A') == ['A']
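To see why `extract_codes` replaces hyphens and strips punctuation before calling `is_nace`, here is the tokenization step in isolation, on a text combining two of the asserts above:

```python
text = '(30.1, 33.15) and groups 85.1-85.4, i.e.'
# 1. hyphens become spaces, so the range 85.1-85.4 splits into two words
# 2. split() breaks on any whitespace, including newlines inside CSV cells
# 3. strip() removes surrounding punctuation and brackets, but not inner dots
words = [word.strip(';,.:()"\'') for word in text.replace('-', ' ').split()]
print(words)   # ['30.1', '33.15', 'and', 'groups', '85.1', '85.4', 'i.e']
```

Without step 3, words like `'(30.1,'` or `'85.4,'` would fail the `is_nace` length checks.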
[6]:
# MORE REALISTIC asserts:
t01 = """Agricultural activities exclude any subsequent processing of the
agricultural products (classified under divisions 10 and 11 (Manufacture of food
products and beverages) and division 12 (Manufacture of tobacco products)), beyond
that needed to prepare them for the primary markets. The preparation of products for
the primary markets is included here.
The division excludes field construction (e.g. agricultural land terracing,
drainage, preparing rice paddies etc.) classified in section F (Construction) and buyers
and cooperative associations engaged in the marketing of farm products classified
in section G. Also excluded is the landscape care and maintenance,
which is classified in class 81.30.
"""
assert extract_codes(t01) == ['10','11','12','F','G','81.30']
t01_15 = """This class excludes:
- manufacture of tobacco products, see 12.00
"""
assert extract_codes(t01_15) == ['12.00']
t03 = """This division does not include building and repairing of ships and
boats (30.1, 33.15) and sport or recreational fishing activities (93.19).
Processing of fish, crustaceans or molluscs is excluded, whether at land-based
plants or on factory ships (10.20).
"""
assert extract_codes(t03) == ['30.1', '33.15','93.19','10.20']
t11_03 = """This class excludes:
- merely bottling and labelling, see 46.34 (if performed as part of wholesale)
and 82.92 (if performed on a fee or contract basis)
"""
assert extract_codes(t11_03) == ['46.34', '82.92']
t01_64 = """This class excludes:
- growing of seeds, see groups 01.1 and 01.2
- processing of seeds to obtain oil, see 10.41
- research to develop or modify new forms of seeds, see 72.11
"""
assert extract_codes(t01_64) == ['01.1','01.2','10.41','72.11']
t02 = """Excluded is further processing of wood beginning with sawmilling and planing of wood,
see division 16.
"""
assert extract_codes(t02) == ['16']
t09_90 = """This class excludes:
- operating mines or quarries on a contract or fee basis, see division 05, 07 or 08
- specialised repair of mining machinery, see 33.12
- geophysical surveying services, on a contract or fee basis, see 71.12
"""
assert extract_codes(t09_90) == ['05','07','08','33.12','71.12']
A2 build_db¶
Given a filepath pointing to a NACE CSV, read the CSV and RETURN a dictionary mapping codes to dictionaries which hold the code description and a field with the list of excluded codes, for example:
{'01': {'description': 'Crop and animal production, hunting and related service activities',
'exclusions': ['10', '11', '12', 'F', 'G', '81.30']},
'01.1': {'description': 'Growing of non-perennial crops', 'exclusions': []},
'01.11': {'description': 'Growing of cereals (except rice), leguminous crops and oil seeds',
'exclusions': ['01.12', '01.13', '01.19', '01.26']},
'01.12': {'description': 'Growing of rice', 'exclusions': []},
'01.13': {'description': 'Growing of vegetables and melons, roots and tubers',
'exclusions': ['01.28', '01.30']},
...
...
}
The complete desired output is in the file expected_db.py.
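With this structure, turning the exclusion codes of an activity back into readable descriptions is a dictionary lookup per code. A minimal sketch; the helper `describe_exclusions` is hypothetical and `sample_db` is a small excerpt of the expected output. Using `dict.get` avoids a `KeyError` for codes missing from the excerpt:

```python
def describe_exclusions(db, code):
    """Hypothetical helper: RETURN (excluded_code, description) pairs for `code`.
    Codes missing from db get None as description instead of raising KeyError."""
    ret = []
    for excluded in db[code]['exclusions']:
        entry = db.get(excluded)
        ret.append((excluded, entry['description'] if entry else None))
    return ret

sample_db = {
    '01.15': {'description': 'Growing of tobacco', 'exclusions': ['12.00']},
    '12.00': {'description': 'Manufacture of tobacco products',
              'exclusions': ['01.15', '01.63']},
}
print(describe_exclusions(sample_db, '01.15'))  # [('12.00', 'Manufacture of tobacco products')]
print(describe_exclusions(sample_db, '12.00'))  # [('01.15', 'Growing of tobacco'), ('01.63', None)]
```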
[7]:
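def build_db(filepath):
    """Reads the NACE CSV at filepath and RETURN a dictionary mapping each code
       to {'description': ..., 'exclusions': [...]}"""
    #jupman-raise
    import csv
    ret = {}
    # newline='' lets the csv module handle line breaks inside quoted cells
    with open(filepath, encoding='utf-8', newline='') as f:
        my_reader = csv.DictReader(f, delimiter=',')
        for d in my_reader:
            # csv returns every cell as a string, so zeros in codes are preserved
            diz = {'description': d['Description'],
                   'exclusions': extract_codes(d['This item excludes'])}
            ret[d['Code']] = diz
    return ret
    #/jupman-raise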
activities_db = build_db('NACE_REV2_20200628_213139.csv')
activities_db
[7]:
{'01': {'description': 'Crop and animal production, hunting and related service activities',
'exclusions': ['10', '11', '12', 'F', 'G', '81.30']},
'01.1': {'description': 'Growing of non-perennial crops', 'exclusions': []},
'01.11': {'description': 'Growing of cereals (except rice), leguminous crops and oil seeds',
'exclusions': ['01.12', '01.13', '01.19', '01.26']},
'01.12': {'description': 'Growing of rice', 'exclusions': []},
'01.13': {'description': 'Growing of vegetables and melons, roots and tubers',
'exclusions': ['01.28', '01.30']},
'01.14': {'description': 'Growing of sugar cane', 'exclusions': ['01.13']},
'01.15': {'description': 'Growing of tobacco', 'exclusions': ['12.00']},
'01.16': {'description': 'Growing of fibre crops', 'exclusions': []},
'01.19': {'description': 'Growing of other non-perennial crops',
'exclusions': ['01.28']},
'01.2': {'description': 'Growing of perennial crops', 'exclusions': []},
'01.21': {'description': 'Growing of grapes', 'exclusions': ['11.02']},
'01.22': {'description': 'Growing of tropical and subtropical fruits',
'exclusions': []},
'01.23': {'description': 'Growing of citrus fruits', 'exclusions': []},
'01.24': {'description': 'Growing of pome fruits and stone fruits',
'exclusions': []},
'01.25': {'description': 'Growing of other tree and bush fruits and nuts',
'exclusions': ['01.26']},
'01.26': {'description': 'Growing of oleaginous fruits',
'exclusions': ['01.11']},
'01.27': {'description': 'Growing of beverage crops', 'exclusions': []},
'01.28': {'description': 'Growing of spices, aromatic, drug and pharmaceutical crops',
'exclusions': []},
'01.29': {'description': 'Growing of other perennial crops',
'exclusions': ['01.19', '02.30']},
'01.3': {'description': 'Plant propagation', 'exclusions': []},
'01.30': {'description': 'Plant propagation',
'exclusions': ['01.1', '01.2', '02.10']},
'01.4': {'description': 'Animal production',
'exclusions': ['01.62', '10.11']},
'01.41': {'description': 'Raising of dairy cattle', 'exclusions': ['10.51']},
'01.42': {'description': 'Raising of other cattle and buffaloes',
'exclusions': []},
'01.43': {'description': 'Raising of horses and other equines',
'exclusions': ['93.19']},
'01.44': {'description': 'Raising of camels and camelids', 'exclusions': []},
'01.45': {'description': 'Raising of sheep and goats',
'exclusions': ['01.62', '10.11', '10.51']},
'01.46': {'description': 'Raising of swine/pigs', 'exclusions': []},
'01.47': {'description': 'Raising of poultry', 'exclusions': ['10.12']},
'01.49': {'description': 'Raising of other animals',
'exclusions': ['01.70', '03.21', '03.22', '96.09', '01.47']},
'01.5': {'description': 'Mixed farming', 'exclusions': []},
'01.50': {'description': 'Mixed farming',
'exclusions': ['01.1', '01.2', '01.4']},
'01.6': {'description': 'Support activities to agriculture and post-harvest crop activities',
'exclusions': []},
'01.61': {'description': 'Support activities for crop production',
'exclusions': ['01.63', '43.12', '71.11', '74.90', '81.30', '82.30']},
'01.62': {'description': 'Support activities for animal production',
'exclusions': ['68.20', '75.00', '77.39', '96.09']},
'01.63': {'description': 'Post-harvest crop activities',
'exclusions': ['01.1', '01.2', '01.3', '01.64', '12.00', '46', '46.2']},
'01.64': {'description': 'Seed processing for propagation',
'exclusions': ['01.1', '01.2', '10.41', '72.11']},
'01.7': {'description': 'Hunting, trapping and related service activities',
'exclusions': []},
'01.70': {'description': 'Hunting, trapping and related service activities',
'exclusions': ['01.49', '01.4', '03.11', '10.11', '93.19', '94.99']},
'02': {'description': 'Forestry and logging', 'exclusions': ['16']},
'02.1': {'description': 'Silviculture and other forestry activities',
'exclusions': []},
'02.10': {'description': 'Silviculture and other forestry activities',
'exclusions': ['01.29', '01.30', '02.30', '16.10']},
'02.2': {'description': 'Logging', 'exclusions': []},
'02.20': {'description': 'Logging',
'exclusions': ['01.29', '02.10', '02.30', '16.10', '20.14']},
'02.3': {'description': 'Gathering of wild growing non-wood products',
'exclusions': []},
'02.30': {'description': 'Gathering of wild growing non-wood products',
'exclusions': ['01', '01.13', '01.25', '02.20', '16.10']},
'02.4': {'description': 'Support services to forestry', 'exclusions': []},
'02.40': {'description': 'Support services to forestry',
'exclusions': ['02.10', '43.12']},
'03': {'description': 'Fishing and aquaculture',
'exclusions': ['30.1', '33.15', '93.19', '10.20']},
'03.1': {'description': 'Fishing', 'exclusions': []},
'03.11': {'description': 'Marine fishing',
'exclusions': ['01.70', '10.11', '10.20', '50.10', '84.24', '93.19']},
'03.12': {'description': 'Freshwater fishing',
'exclusions': ['10.20', '84.24', '93.19']},
'03.2': {'description': 'Aquaculture', 'exclusions': []},
'03.21': {'description': 'Marine aquaculture',
'exclusions': ['03.22', '93.19']},
'03.22': {'description': 'Freshwater aquaculture',
'exclusions': ['03.21', '93.19']},
'05': {'description': 'Mining of coal and lignite',
'exclusions': ['19.10', '09.90', '19.20']},
'05.1': {'description': 'Mining of hard coal', 'exclusions': []},
'05.10': {'description': 'Mining of hard coal',
'exclusions': ['05.20', '08.92', '09.90', '19.10', '19.20', '43.12']},
'05.2': {'description': 'Mining of lignite', 'exclusions': []},
'05.20': {'description': 'Mining of lignite',
'exclusions': ['05.10', '08.92', '09.90', '19.20', '43.12']},
'06': {'description': 'Extraction of crude petroleum and natural gas',
'exclusions': ['09.10', '19.20', '71.12']},
'06.1': {'description': 'Extraction of crude petroleum', 'exclusions': []},
'06.10': {'description': 'Extraction of crude petroleum',
'exclusions': ['09.10', '19.20', '49.50']},
'06.2': {'description': 'Extraction of natural gas', 'exclusions': []},
'06.20': {'description': 'Extraction of natural gas',
'exclusions': ['09.10', '19.20', '20.11', '49.50']},
'07': {'description': 'Mining of metal ores',
'exclusions': ['20.13', '24.42', '24']},
'07.1': {'description': 'Mining of iron ores', 'exclusions': []},
'07.10': {'description': 'Mining of iron ores', 'exclusions': ['08.91']},
'07.2': {'description': 'Mining of non-ferrous metal ores', 'exclusions': []},
'07.21': {'description': 'Mining of uranium and thorium ores',
'exclusions': ['20.13', '24.46']},
'07.29': {'description': 'Mining of other non-ferrous metal ores',
'exclusions': ['07.21', '24.42', '24.44', '24.45']},
'08': {'description': 'Other mining and quarrying', 'exclusions': []},
'08.1': {'description': 'Quarrying of stone, sand and clay',
'exclusions': []},
'08.11': {'description': 'Quarrying of ornamental and building stone, limestone, gypsum, chalk and slate',
'exclusions': ['08.91', '23.52', '23.70']},
'08.12': {'description': 'Operation of gravel and sand pits; mining of clays and kaolin',
'exclusions': ['06.10']},
'08.9': {'description': 'Mining and quarrying n.e.c.', 'exclusions': []},
'08.91': {'description': 'Mining of chemical and fertiliser minerals',
'exclusions': ['08.93', '20.13', '20.15']},
'08.92': {'description': 'Extraction of peat',
'exclusions': ['09.90', '19.20', '20.15', '23.99']},
'08.93': {'description': 'Extraction of salt',
'exclusions': ['10.84', '36.00']},
'08.99': {'description': 'Other mining and quarrying n.e.c.',
'exclusions': []},
'09': {'description': 'Mining support service activities', 'exclusions': []},
'09.1': {'description': 'Support activities for petroleum and natural gas extraction',
'exclusions': []},
'09.10': {'description': 'Support activities for petroleum and natural gas extraction',
'exclusions': ['06.10', '06.20', '33.12', '52.21', '71.12']},
'09.9': {'description': 'Support activities for other mining and quarrying',
'exclusions': []},
'09.90': {'description': 'Support activities for other mining and quarrying',
'exclusions': ['05', '07', '08', '33.12', '71.12']},
'10': {'description': 'Manufacture of food products', 'exclusions': []},
'10.1': {'description': 'Processing and preserving of meat and production of meat products',
'exclusions': []},
'10.11': {'description': 'Processing and preserving of meat',
'exclusions': ['10.12', '82.92']},
'10.12': {'description': 'Processing and preserving of poultry meat',
'exclusions': ['82.92']},
'10.13': {'description': 'Production of meat and poultry meat products',
'exclusions': ['10.85', '10.89', '46.32', '82.92']},
'10.2': {'description': 'Processing and preserving of fish, crustaceans and molluscs',
'exclusions': []},
'10.20': {'description': 'Processing and preserving of fish, crustaceans and molluscs',
'exclusions': ['03.11', '10.11', '10.41', '10.85', '10.89']},
'10.3': {'description': 'Processing and preserving of fruit and vegetables',
'exclusions': []},
'10.31': {'description': 'Processing and preserving of potatoes',
'exclusions': []},
'10.32': {'description': 'Manufacture of fruit and vegetable juice',
'exclusions': []},
'10.39': {'description': 'Other processing and preserving of fruit and vegetables',
'exclusions': ['10.32', '10.61', '10.82', '10.85', '10.89']},
'10.4': {'description': 'Manufacture of vegetable and animal oils and fats',
'exclusions': []},
'10.41': {'description': 'Manufacture of oils and fats',
'exclusions': ['10.11', '10.42', '10.62', '20.53', '20.59']},
'10.42': {'description': 'Manufacture of margarine and similar edible fats',
'exclusions': []},
'10.5': {'description': 'Manufacture of dairy products', 'exclusions': []},
'10.51': {'description': 'Operation of dairies and cheese making',
'exclusions': ['01.41', '01.43', '01.44', '01.45', '10.89']},
'10.52': {'description': 'Manufacture of ice cream', 'exclusions': ['56.10']},
'10.6': {'description': 'Manufacture of grain mill products, starches and starch products',
'exclusions': []},
'10.61': {'description': 'Manufacture of grain mill products',
'exclusions': ['10.31', '10.62']},
'10.62': {'description': 'Manufacture of starches and starch products',
'exclusions': ['10.51', '10.81']},
'10.7': {'description': 'Manufacture of bakery and farinaceous products',
'exclusions': []},
'10.71': {'description': 'Manufacture of bread; manufacture of fresh pastry goods and cakes',
'exclusions': ['10.72', '10.73', '56']},
'10.72': {'description': 'Manufacture of rusks and biscuits; manufacture of preserved pastry goods and cakes',
'exclusions': ['10.31']},
'10.73': {'description': 'Manufacture of macaroni, noodles, couscous and similar farinaceous products',
'exclusions': ['10.85', '10.89']},
'10.8': {'description': 'Manufacture of other food products',
'exclusions': []},
'10.81': {'description': 'Manufacture of sugar', 'exclusions': ['10.62']},
'10.82': {'description': 'Manufacture of cocoa, chocolate and sugar confectionery',
'exclusions': ['10.81']},
'10.83': {'description': 'Processing of tea and coffee',
'exclusions': ['10.62', '11', '21.20']},
'10.84': {'description': 'Manufacture of condiments and seasonings',
'exclusions': ['01.28']},
'10.85': {'description': 'Manufacture of prepared meals and dishes',
'exclusions': ['10', '10.89', '47.11', '47.29', '46.38', '56.29']},
'10.86': {'description': 'Manufacture of homogenised food preparations and dietetic food',
'exclusions': []},
'10.89': {'description': 'Manufacture of other food products n.e.c.',
'exclusions': ['10.39', '10.85', '11']},
'10.9': {'description': 'Manufacture of prepared animal feeds',
'exclusions': []},
'10.91': {'description': 'Manufacture of prepared feeds for farm animals',
'exclusions': ['10.20', '10.41', '10.61']},
'10.92': {'description': 'Manufacture of prepared pet foods',
'exclusions': ['10.20', '10.41', '10.61']},
'11': {'description': 'Manufacture of beverages',
'exclusions': ['10.32', '10.51', '10.83']},
'11.0': {'description': 'Manufacture of beverages', 'exclusions': []},
'11.01': {'description': 'Distilling, rectifying and blending of spirits',
'exclusions': ['11.02', '11.06', '20.14', '46.34', '82.92']},
'11.02': {'description': 'Manufacture of wine from grape',
'exclusions': ['46.34', '82.92']},
'11.03': {'description': 'Manufacture of cider and other fruit wines',
'exclusions': ['46.34', '82.92']},
'11.04': {'description': 'Manufacture of other non-distilled fermented beverages',
'exclusions': ['46.34', '82.92']},
'11.05': {'description': 'Manufacture of beer', 'exclusions': []},
'11.06': {'description': 'Manufacture of malt', 'exclusions': []},
'11.07': {'description': 'Manufacture of soft drinks; production of mineral waters and other bottled waters',
'exclusions': ['10.32',
'10.51',
'10.83',
'11.01',
'11.02',
'11.03',
'11.04',
'11.05',
'35.30',
'46.34',
'82.92']},
'12': {'description': 'Manufacture of tobacco products', 'exclusions': []},
'12.0': {'description': 'Manufacture of tobacco products', 'exclusions': []},
'12.00': {'description': 'Manufacture of tobacco products',
'exclusions': ['01.15', '01.63']},
'13': {'description': 'Manufacture of textiles', 'exclusions': []},
'13.1': {'description': 'Preparation and spinning of textile fibres',
'exclusions': []},
'13.10': {'description': 'Preparation and spinning of textile fibres',
'exclusions': ['01', '01.16', '01.63', '20.60', '23.14']},
'13.2': {'description': 'Weaving of textiles', 'exclusions': []},
'13.20': {'description': 'Weaving of textiles',
'exclusions': ['13.91', '13.93', '13.95', '13.96', '13.99']},
'13.3': {'description': 'Finishing of textiles', 'exclusions': []},
'13.30': {'description': 'Finishing of textiles', 'exclusions': ['22.19']},
'13.9': {'description': 'Manufacture of other textiles', 'exclusions': []},
'13.91': {'description': 'Manufacture of knitted and crocheted fabrics',
'exclusions': ['13.99', '14.39']},
'13.92': {'description': 'Manufacture of made-up textile articles, except apparel',
'exclusions': ['13.96']},
'13.93': {'description': 'Manufacture of carpets and rugs',
'exclusions': ['16.29', '22.23']},
'13.94': {'description': 'Manufacture of cordage, rope, twine and netting',
'exclusions': ['14.19', '25.93', '32.30']},
'13.95': {'description': 'Manufacture of non-wovens and articles made from non-wovens, except apparel',
'exclusions': []},
'13.96': {'description': 'Manufacture of other technical and industrial textiles',
'exclusions': ['22.19', '22.21', '25.93']},
'13.99': {'description': 'Manufacture of other textiles n.e.c.',
'exclusions': ['13.93', '17.22']},
'14': {'description': 'Manufacture of wearing apparel', 'exclusions': []},
'14.1': {'description': 'Manufacture of wearing apparel, except fur apparel',
'exclusions': []},
'14.11': {'description': 'Manufacture of leather clothes',
'exclusions': ['14.20', '32.30', '32.99']},
'14.12': {'description': 'Manufacture of workwear',
'exclusions': ['15.20', '32.99', '95.29']},
'14.13': {'description': 'Manufacture of other outerwear',
'exclusions': ['14.20', '22.19', '22.29', '32.99', '95.29']},
'14.14': {'description': 'Manufacture of underwear', 'exclusions': ['95.29']},
'14.19': {'description': 'Manufacture of other wearing apparel and accessories',
'exclusions': ['32.30', '32.99', '95.29']},
'14.2': {'description': 'Manufacture of articles of fur', 'exclusions': []},
'14.20': {'description': 'Manufacture of articles of fur',
'exclusions': ['01.4',
'01.70',
'10.11',
'13.20',
'13.91',
'14.19',
'15.11',
'15.20']},
'14.3': {'description': 'Manufacture of knitted and crocheted apparel',
'exclusions': []},
'14.31': {'description': 'Manufacture of knitted and crocheted hosiery',
'exclusions': []},
'14.39': {'description': 'Manufacture of other knitted and crocheted apparel',
'exclusions': ['13.91', '14.31']},
'15': {'description': 'Manufacture of leather and related products',
'exclusions': []},
'15.1': {'description': 'Tanning and dressing of leather; manufacture of luggage, handbags, saddlery and harness; dressing and dyeing of fur',
'exclusions': []},
'15.11': {'description': 'Tanning and dressing of leather; dressing and dyeing of fur',
'exclusions': ['01.4', '10.11', '14.11', '22.19', '22.29']},
'15.12': {'description': 'Manufacture of luggage, handbags and the like, saddlery and harness',
'exclusions': ['14.11',
'14.19',
'15.20',
'30.92',
'32.12',
'32.13',
'32.99']},
'15.2': {'description': 'Manufacture of footwear', 'exclusions': []},
'15.20': {'description': 'Manufacture of footwear',
'exclusions': ['14.19', '16.29', '22.19', '22.29', '32.30', '32.50']},
'16': {'description': 'Manufacture of wood and of products of wood and cork, except furniture; manufacture of articles of straw and plaiting materials',
'exclusions': ['31.0', '43.32', '43.33', '43.39']},
'16.1': {'description': 'Sawmilling and planing of wood', 'exclusions': []},
'16.10': {'description': 'Sawmilling and planing of wood',
'exclusions': ['02.20', '16.21', '16.23', '16.29']},
'16.2': {'description': 'Manufacture of products of wood, cork, straw and plaiting materials',
'exclusions': []},
'16.21': {'description': 'Manufacture of veneer sheets and wood-based panels',
'exclusions': []},
'16.22': {'description': 'Manufacture of assembled parquet floors',
'exclusions': ['16.10']},
'16.23': {'description': "Manufacture of other builders' carpentry and joinery",
'exclusions': ['31.01', '31.02', '31.09']},
'16.24': {'description': 'Manufacture of wooden containers',
'exclusions': ['15.12', '16.29']},
'16.29': {'description': 'Manufacture of other products of wood; manufacture of articles of cork, straw and plaiting materials',
'exclusions': ['13.92',
'15.12',
'15.20',
'20.51',
'26.52',
'28.94',
'31.0',
'32.40',
'32.91',
'32.99']},
'17': {'description': 'Manufacture of paper and paper products',
'exclusions': []},
'17.1': {'description': 'Manufacture of pulp, paper and paperboard',
'exclusions': []},
'17.11': {'description': 'Manufacture of pulp', 'exclusions': []},
'17.12': {'description': 'Manufacture of paper and paperboard',
'exclusions': ['17.21', '17.22', '17.23', '17.24', '17.29', '23.91']},
'17.2': {'description': 'Manufacture of articles of paper and paperboard ',
'exclusions': []},
'17.21': {'description': 'Manufacture of corrugated paper and paperboard and of containers of paper and paperboard',
'exclusions': ['17.23', '17.29']},
'17.22': {'description': 'Manufacture of household and sanitary goods and of toilet requisites',
'exclusions': ['17.12']},
'17.23': {'description': 'Manufacture of paper stationery',
'exclusions': ['18.1']},
'17.24': {'description': 'Manufacture of wallpaper',
'exclusions': ['17.12', '22.29']},
'17.29': {'description': 'Manufacture of other articles of paper and paperboard',
'exclusions': ['32.40']},
'18': {'description': 'Printing and reproduction of recorded media',
'exclusions': ['J']},
'18.1': {'description': 'Printing and service activities related to printing',
'exclusions': []},
'18.11': {'description': 'Printing of newspapers',
'exclusions': ['58.1', '82.19']},
'18.12': {'description': 'Other printing', 'exclusions': ['17.23', '58.1']},
'18.13': {'description': 'Pre-press and pre-media services',
'exclusions': ['74.10']},
'18.14': {'description': 'Binding and related services', 'exclusions': []},
'18.2': {'description': 'Reproduction of recorded media', 'exclusions': []},
'18.20': {'description': 'Reproduction of recorded media',
'exclusions': ['18.11',
'18.12',
'58.2',
'59.11',
'59.12',
'59.13',
'59.20']},
'19': {'description': 'Manufacture of coke and refined petroleum products',
'exclusions': ['20.14', '20.11', '06.20', '35.21', '20']},
'19.1': {'description': 'Manufacture of coke oven products',
'exclusions': []},
'19.10': {'description': 'Manufacture of coke oven products',
'exclusions': ['19.20']},
'19.2': {'description': 'Manufacture of refined petroleum products',
'exclusions': []},
'19.20': {'description': 'Manufacture of refined petroleum products',
'exclusions': []},
'20': {'description': 'Manufacture of chemicals and chemical products',
'exclusions': []},
'20.1': {'description': 'Manufacture of basic chemicals, fertilisers and nitrogen compounds, plastics and synthetic rubber in primary forms',
'exclusions': []},
'20.11': {'description': 'Manufacture of industrial gases',
'exclusions': ['06.20', '19.20', '35.21']},
'20.12': {'description': 'Manufacture of dyes and pigments',
'exclusions': ['20.30']},
'20.13': {'description': 'Manufacture of other inorganic basic chemicals',
'exclusions': ['20.11', '20.15', '20.53', '24']},
'20.14': {'description': 'Manufacture of other organic basic chemicals',
'exclusions': ['20.16', '20.17', '20.41', '20.53', 'O', '21.10']},
'20.15': {'description': 'Manufacture of fertilisers and nitrogen compounds',
'exclusions': ['08.91', '20.20']},
'20.16': {'description': 'Manufacture of plastics in primary forms',
'exclusions': ['20.60', '38.32']},
'20.17': {'description': 'Manufacture of synthetic rubber in primary forms',
'exclusions': []},
'20.2': {'description': 'Manufacture of pesticides and other agrochemical products',
'exclusions': []},
'20.20': {'description': 'Manufacture of pesticides and other agrochemical products',
'exclusions': ['20.15']},
'20.3': {'description': 'Manufacture of paints, varnishes and similar coatings, printing ink and mastics',
'exclusions': []},
'20.30': {'description': 'Manufacture of paints, varnishes and similar coatings, printing ink and mastics',
'exclusions': ['20.12', '20.59']},
'20.4': {'description': 'Manufacture of soap and detergents, cleaning and polishing preparations, perfumes and toilet preparations',
'exclusions': []},
'20.41': {'description': 'Manufacture of soap and detergents, cleaning and polishing preparations',
'exclusions': ['20.13', '20.14', '20.42']},
'20.42': {'description': 'Manufacture of perfumes and toilet preparations',
'exclusions': ['20.53']},
'20.5': {'description': 'Manufacture of other chemical products',
'exclusions': []},
'20.51': {'description': 'Manufacture of explosives', 'exclusions': []},
'20.52': {'description': 'Manufacture of glues', 'exclusions': ['20.59']},
'20.53': {'description': 'Manufacture of essential oils',
'exclusions': ['20.14', '20.42']},
'20.59': {'description': 'Manufacture of other chemical products n.e.c.',
'exclusions': ['20.13', '20.14', '20.30', '23.99']},
'20.6': {'description': 'Manufacture of man-made fibres', 'exclusions': []},
'20.60': {'description': 'Manufacture of man-made fibres',
'exclusions': ['13.10']},
'21': {'description': 'Manufacture of basic pharmaceutical products and pharmaceutical preparations',
'exclusions': []},
'21.1': {'description': 'Manufacture of basic pharmaceutical products',
'exclusions': []},
'21.10': {'description': 'Manufacture of basic pharmaceutical products',
'exclusions': []},
'21.2': {'description': 'Manufacture of pharmaceutical preparations',
'exclusions': []},
'21.20': {'description': 'Manufacture of pharmaceutical preparations',
'exclusions': ['10.83', '32.50', '46.46', '47.73', '72.1', '82.92']},
'22': {'description': 'Manufacture of rubber and plastic products',
'exclusions': []},
'22.1': {'description': 'Manufacture of rubber products', 'exclusions': []},
'22.11': {'description': 'Manufacture of rubber tyres and tubes; retreading and rebuilding of rubber tyres',
'exclusions': ['22.19', '45.20']},
'22.19': {'description': 'Manufacture of other rubber products',
'exclusions': ['13.96',
'14.14',
'14.19',
'15.20',
'20.52',
'22.11',
'30.11',
'30.12',
'31.03',
'32.30',
'32.40',
'38.32']},
'22.2': {'description': 'Manufacture of plastic products', 'exclusions': []},
'22.21': {'description': 'Manufacture of plastic plates, sheets, tubes and profiles',
'exclusions': ['20.16', '22.1']},
'22.22': {'description': 'Manufacture of plastic packing goods',
'exclusions': ['15.12']},
'22.23': {'description': 'Manufacture of builders’ ware of plastic',
'exclusions': []},
'22.29': {'description': 'Manufacture of other plastic products',
'exclusions': ['15.12',
'15.20',
'31.01',
'31.02',
'31.09',
'31.03',
'32.30',
'32.40',
'32.50',
'32.99']},
'23': {'description': 'Manufacture of other non-metallic mineral products',
'exclusions': []},
'23.1': {'description': 'Manufacture of glass and glass products',
'exclusions': []},
'23.11': {'description': 'Manufacture of flat glass', 'exclusions': []},
'23.12': {'description': 'Shaping and processing of flat glass',
'exclusions': []},
'23.13': {'description': 'Manufacture of hollow glass',
'exclusions': ['32.40']},
'23.14': {'description': 'Manufacture of glass fibres',
'exclusions': ['13.20', '27.31']},
'23.19': {'description': 'Manufacture and processing of other glass, including technical glassware',
'exclusions': ['26.70', '32.50']},
'23.2': {'description': 'Manufacture of refractory products',
'exclusions': []},
'23.20': {'description': 'Manufacture of refractory products',
'exclusions': []},
'23.3': {'description': 'Manufacture of clay building materials',
'exclusions': []},
'23.31': {'description': 'Manufacture of ceramic tiles and flags',
'exclusions': ['22.23', '23.20', '23.32']},
'23.32': {'description': 'Manufacture of bricks, tiles and construction products, in baked clay',
'exclusions': ['23.20', '23.4']},
'23.4': {'description': 'Manufacture of other porcelain and ceramic products',
'exclusions': []},
'23.41': {'description': 'Manufacture of ceramic household and ornamental articles',
'exclusions': ['32.13', '32.40']},
'23.42': {'description': 'Manufacture of ceramic sanitary fixtures',
'exclusions': ['23.20', '23.3']},
'23.43': {'description': 'Manufacture of ceramic insulators and insulating fittings',
'exclusions': ['23.20']},
'23.44': {'description': 'Manufacture of other technical ceramic products',
'exclusions': ['22.23', '23.20', '23.3']},
'23.49': {'description': 'Manufacture of other ceramic products',
'exclusions': ['23.42', '32.50']},
'23.5': {'description': 'Manufacture of cement, lime and plaster',
'exclusions': []},
'23.51': {'description': 'Manufacture of cement',
'exclusions': ['23.20', '23.63', '23.64', '23.69', '32.50']},
'23.52': {'description': 'Manufacture of lime and plaster',
'exclusions': ['23.62', '23.69']},
'23.6': {'description': 'Manufacture of articles of concrete, cement and plaster',
'exclusions': []},
'23.61': {'description': 'Manufacture of concrete products for construction purposes',
'exclusions': []},
'23.62': {'description': 'Manufacture of plaster products for construction purposes',
'exclusions': []},
'23.63': {'description': 'Manufacture of ready-mixed concrete',
'exclusions': ['23.20']},
'23.64': {'description': 'Manufacture of mortars',
'exclusions': ['23.20', '23.63']},
'23.65': {'description': 'Manufacture of fibre cement', 'exclusions': []},
'23.69': {'description': 'Manufacture of other articles of concrete, plaster and cement',
'exclusions': []},
'23.7': {'description': 'Cutting, shaping and finishing of stone',
'exclusions': []},
'23.70': {'description': 'Cutting, shaping and finishing of stone',
'exclusions': ['08.11', '23.9']},
'23.9': {'description': 'Manufacture of abrasive products and non-metallic mineral products n.e.c.',
'exclusions': []},
'23.91': {'description': 'Production of abrasive products', 'exclusions': []},
'23.99': {'description': 'Manufacture of other non-metallic mineral products n.e.c.',
'exclusions': ['23.14', '27.90', '28.29']},
'24': {'description': 'Manufacture of basic metals', 'exclusions': []},
'24.1': {'description': 'Manufacture of basic iron and steel and of ferro-alloys',
'exclusions': []},
'24.10': {'description': 'Manufacture of basic iron and steel and of ferro-alloys ',
'exclusions': ['24.31']},
'24.2': {'description': 'Manufacture of tubes, pipes, hollow profiles and related fittings, of steel',
'exclusions': []},
'24.20': {'description': 'Manufacture of tubes, pipes, hollow profiles and related fittings, of steel',
'exclusions': ['24.52']},
'24.3': {'description': 'Manufacture of other products of first processing of steel',
'exclusions': []},
'24.31': {'description': 'Cold drawing of bars', 'exclusions': ['24.34']},
'24.32': {'description': 'Cold rolling of narrow strip', 'exclusions': []},
'24.33': {'description': 'Cold forming or folding', 'exclusions': []},
'24.34': {'description': 'Cold drawing of wire',
'exclusions': ['24.31', '25.93']},
'24.4': {'description': 'Manufacture of basic precious and other non-ferrous metals',
'exclusions': []},
'24.41': {'description': 'Precious metals production',
'exclusions': ['24.53', '24.54', '32.12']},
'24.42': {'description': 'Aluminium production',
'exclusions': ['24.53', '24.54']},
'24.43': {'description': 'Lead, zinc and tin production',
'exclusions': ['24.53', '24.54']},
'24.44': {'description': 'Copper production',
'exclusions': ['24.53', '24.54']},
'24.45': {'description': 'Other non-ferrous metal production',
'exclusions': ['24.53', '24.54']},
'24.46': {'description': 'Processing of nuclear fuel ', 'exclusions': []},
'24.5': {'description': 'Casting of metals',
'exclusions': ['25.21', '25.99']},
'24.51': {'description': 'Casting of iron', 'exclusions': []},
'24.52': {'description': 'Casting of steel', 'exclusions': []},
'24.53': {'description': 'Casting of light metals', 'exclusions': []},
'24.54': {'description': 'Casting of other non-ferrous metals',
'exclusions': []},
'25': {'description': 'Manufacture of fabricated metal products, except machinery and equipment',
'exclusions': ['33.1', '43.22']},
'25.1': {'description': 'Manufacture of structural metal products',
'exclusions': []},
'25.11': {'description': 'Manufacture of metal structures and parts of structures',
'exclusions': ['25.30', '25.99', '30.11']},
'25.12': {'description': 'Manufacture of doors and windows of metal',
'exclusions': []},
'25.2': {'description': 'Manufacture of tanks, reservoirs and containers of metal',
'exclusions': []},
'25.21': {'description': 'Manufacture of central heating radiators and boilers',
'exclusions': ['27.51']},
'25.29': {'description': 'Manufacture of other tanks, reservoirs and containers of metal',
'exclusions': ['25.91', '25.92', '29.20', '30.40']},
'25.3': {'description': 'Manufacture of steam generators, except central heating hot water boilers',
'exclusions': []},
'25.30': {'description': 'Manufacture of steam generators, except central heating hot water boilers',
'exclusions': ['25.21', '28.11', '28.99']},
'25.4': {'description': 'Manufacture of weapons and ammunition',
'exclusions': []},
'25.40': {'description': 'Manufacture of weapons and ammunition',
'exclusions': ['20.51', '25.71', '29.10', '30.30', '30.40']},
'25.5': {'description': 'Forging, pressing, stamping and roll-forming of metal; powder metallurgy',
'exclusions': []},
'25.50': {'description': 'Forging, pressing, stamping and roll-forming of metal; powder metallurgy',
'exclusions': ['24.1', '24.4']},
'25.6': {'description': 'Treatment and coating of metals; machining',
'exclusions': []},
'25.61': {'description': 'Treatment and coating of metals',
'exclusions': ['01.62',
'18.12',
'22.29',
'24.41',
'24.42',
'24.43',
'24.44',
'95.29']},
'25.62': {'description': 'Machining', 'exclusions': ['01.62']},
'25.7': {'description': 'Manufacture of cutlery, tools and general hardware',
'exclusions': []},
'25.71': {'description': 'Manufacture of cutlery',
'exclusions': ['25.99', '32.12']},
'25.72': {'description': 'Manufacture of locks and hinges', 'exclusions': []},
'25.73': {'description': 'Manufacture of tools',
'exclusions': ['28.24', '28.91']},
'25.9': {'description': 'Manufacture of other fabricated metal products',
'exclusions': []},
'25.91': {'description': 'Manufacture of steel drums and similar containers',
'exclusions': ['25.2']},
'25.92': {'description': 'Manufacture of light metal packaging ',
'exclusions': []},
'25.93': {'description': 'Manufacture of wire products, chain and springs',
'exclusions': ['26.52', '27.32', '28.15']},
'25.94': {'description': 'Manufacture of fasteners and screw machine products',
'exclusions': []},
'25.99': {'description': 'Manufacture of other fabricated metal products n.e.c.',
'exclusions': ['25.71',
'30.99',
'31.01',
'31.02',
'31.09',
'32.30',
'32.40']},
'26': {'description': 'Manufacture of computer, electronic and optical products',
'exclusions': []},
'26.1': {'description': 'Manufacture of electronic components and boards',
'exclusions': []},
'26.11': {'description': 'Manufacture of electronic components',
'exclusions': ['18.12',
'26.20',
'26.40',
'26.30',
'X',
'26.60',
'26.70',
'27',
'27.11',
'27.12',
'27.33']},
'26.12': {'description': 'Manufacture of loaded electronic boards',
'exclusions': ['18.12', '26.11']},
'26.2': {'description': 'Manufacture of computers and peripheral equipment',
'exclusions': []},
'26.20': {'description': 'Manufacture of computers and peripheral equipment',
'exclusions': ['18.20', '26.1', '26.12', '26.30', '26.40', '26.80']},
'26.3': {'description': 'Manufacture of communication equipment',
'exclusions': []},
'26.30': {'description': 'Manufacture of communication equipment',
'exclusions': ['26.1', '26.12', '26.20', '26.40', '26.51', '27.90']},
'26.4': {'description': 'Manufacture of consumer electronics',
'exclusions': []},
'26.40': {'description': 'Manufacture of consumer electronics',
'exclusions': ['18.2', '26.20', '26.30', '26.70', '32.40']},
'26.5': {'description': 'Manufacture of instruments and appliances for measuring, testing and navigation; watches and clocks',
'exclusions': []},
'26.51': {'description': 'Manufacture of instruments and appliances for measuring, testing and navigation',
'exclusions': ['26.30',
'26.60',
'26.70',
'28.23',
'28.29',
'32.50',
'33.20']},
'26.52': {'description': 'Manufacture of watches and clocks',
'exclusions': ['15.12', '32.12', '32.13']},
'26.6': {'description': 'Manufacture of irradiation, electromedical and electrotherapeutic equipment',
'exclusions': []},
'26.60': {'description': 'Manufacture of irradiation, electromedical and electrotherapeutic equipment',
'exclusions': ['27.90']},
'26.7': {'description': 'Manufacture of optical instruments and photographic equipment',
'exclusions': []},
'26.70': {'description': 'Manufacture of optical instruments and photographic equipment',
'exclusions': ['26.20', '26.30', '26.40', '26.60', '28.23', '32.50']},
'26.8': {'description': 'Manufacture of magnetic and optical media',
'exclusions': []},
'26.80': {'description': 'Manufacture of magnetic and optical media',
'exclusions': ['18.2']},
'27': {'description': 'Manufacture of electrical equipment',
'exclusions': ['26']},
'27.1': {'description': 'Manufacture of electric motors, generators, transformers and electricity distribution and control apparatus',
'exclusions': []},
'27.11': {'description': 'Manufacture of electric motors, generators and transformers',
'exclusions': ['26.11', '27.90', '28.11', '29.31']},
'27.12': {'description': 'Manufacture of electricity distribution and control apparatus',
'exclusions': ['26.51', '27.33']},
'27.2': {'description': 'Manufacture of batteries and accumulators',
'exclusions': []},
'27.20': {'description': 'Manufacture of batteries and accumulators',
'exclusions': []},
'27.3': {'description': 'Manufacture of wiring and wiring devices',
'exclusions': []},
'27.31': {'description': 'Manufacture of fibre optic cables',
'exclusions': ['23.14', '26.11']},
'27.32': {'description': 'Manufacture of other electronic and electric wires and cables',
'exclusions': ['24.34',
'24.41',
'24.42',
'24.43',
'24.44',
'24.45',
'26.11',
'27.90',
'29.31']},
'27.33': {'description': 'Manufacture of wiring devices',
'exclusions': ['23.43', '26.11']},
'27.4': {'description': 'Manufacture of electric lighting equipment',
'exclusions': []},
'27.40': {'description': 'Manufacture of electric lighting equipment',
'exclusions': ['23.19', '27.33', '27.51', '27.90']},
'27.5': {'description': 'Manufacture of domestic appliances',
'exclusions': []},
'27.51': {'description': 'Manufacture of electric domestic appliances',
'exclusions': ['28', '28.94', '43.29']},
'27.52': {'description': 'Manufacture of non-electric domestic appliances',
'exclusions': []},
'27.9': {'description': 'Manufacture of other electrical equipment',
'exclusions': []},
'27.90': {'description': 'Manufacture of other electrical equipment',
'exclusions': ['23.43',
'23.99',
'26.11',
'27.1',
'27.20',
'27.3',
'27.40',
'27.5',
'28.29',
'29.31']},
'28': {'description': 'Manufacture of machinery and equipment n.e.c.',
'exclusions': ['25', '26', '27', '29', '30']},
'28.1': {'description': 'Manufacture of general-purpose machinery',
'exclusions': []},
'28.11': {'description': 'Manufacture of engines and turbines, except aircraft, vehicle and cycle engines',
'exclusions': ['27.11', '29.31', '29.10', '30.30', '30.91']},
'28.12': {'description': 'Manufacture of fluid power equipment',
'exclusions': ['28.13', '28.14', '28.15']},
'28.13': {'description': 'Manufacture of other pumps and compressors',
'exclusions': ['28.12']},
'28.14': {'description': 'Manufacture of other taps and valves',
'exclusions': ['22.19', '23.19', '23.44', '28.11', '28.12']},
'28.15': {'description': 'Manufacture of bearings, gears, gearing and driving elements',
'exclusions': ['25.93', '28.12', '29.31', '29', '30']},
'28.2': {'description': 'Manufacture of other general-purpose machinery',
'exclusions': []},
'28.21': {'description': 'Manufacture of ovens, furnaces and furnace burners',
'exclusions': ['27.51', '28.93', '28.99', '32.50']},
'28.22': {'description': 'Manufacture of lifting and handling equipment',
'exclusions': ['28.99', '28.92', '30.11', '30.20', '43.29']},
'28.23': {'description': 'Manufacture of office machinery and equipment (except computers and peripheral equipment)',
'exclusions': ['26.20']},
'28.24': {'description': 'Manufacture of power-driven hand tools',
'exclusions': ['25.73', '27.90']},
'28.25': {'description': 'Manufacture of non-domestic cooling and ventilation equipment',
'exclusions': ['27.51']},
'28.29': {'description': 'Manufacture of other general-purpose machinery n.e.c.',
'exclusions': ['26.51',
'27.51',
'27.90',
'28.30',
'28.91',
'28.99',
'28.93',
'28.94']},
'28.3': {'description': 'Manufacture of agricultural and forestry machinery',
'exclusions': []},
'28.30': {'description': 'Manufacture of agricultural and forestry machinery',
'exclusions': ['25.73', '28.22', '28.24', '28.93', '29.10', '29.20']},
'28.4': {'description': 'Manufacture of metal forming machinery and machine tools',
'exclusions': []},
'28.41': {'description': 'Manufacture of metal forming machinery',
'exclusions': ['25.73', '27.90']},
'28.49': {'description': 'Manufacture of other machine tools',
'exclusions': ['25.73', '27.90', '28.24', '28.91', '28.92']},
'28.9': {'description': 'Manufacture of other special-purpose machinery',
'exclusions': []},
'28.91': {'description': 'Manufacture of machinery for metallurgy',
'exclusions': ['28.41', '25.73', '28.99']},
'28.92': {'description': 'Manufacture of machinery for mining, quarrying and construction',
'exclusions': ['28.22', '28.30', '29.10', '28.49']},
'28.93': {'description': 'Manufacture of machinery for food, beverage and tobacco processing',
'exclusions': ['26.60', '28.29', '28.30']},
'28.94': {'description': 'Manufacture of machinery for textile, apparel and leather production',
'exclusions': ['17.29', '27.51', '28.29', '28.99']},
'28.95': {'description': 'Manufacture of machinery for paper and paperboard production',
'exclusions': []},
'28.96': {'description': 'Manufacture of plastics and rubber machinery',
'exclusions': []},
'28.99': {'description': 'Manufacture of other special-purpose machinery n.e.c.',
'exclusions': ['27.5', '28.23', '28.49', '28.91']},
'29': {'description': 'Manufacture of motor vehicles, trailers and semi-trailers',
'exclusions': []},
'29.1': {'description': 'Manufacture of motor vehicles', 'exclusions': []},
'29.10': {'description': 'Manufacture of motor vehicles',
'exclusions': ['27.11',
'27.40',
'28.11',
'28.30',
'28.92',
'29.20',
'29.31',
'29.32',
'30.40',
'45.20']},
'29.2': {'description': 'Manufacture of bodies (coachwork) for motor vehicles; manufacture of trailers and semi-trailers',
'exclusions': []},
'29.20': {'description': 'Manufacture of bodies (coachwork) for motor vehicles; manufacture of trailers and semi-trailers',
'exclusions': ['28.30', '29.32', '30.99']},
'29.3': {'description': 'Manufacture of parts and accessories for motor vehicles',
'exclusions': []},
'29.31': {'description': 'Manufacture of electrical and electronic equipment for motor vehicles',
'exclusions': ['27.20', '27.40', '28.13']},
'29.32': {'description': 'Manufacture of other parts and accessories for motor vehicles',
'exclusions': ['22.11', '22.19', '28.11', '45.20']},
'30': {'description': 'Manufacture of other transport equipment',
'exclusions': []},
'30.1': {'description': 'Building of ships and boats', 'exclusions': []},
'30.11': {'description': 'Building of ships and floating structures',
'exclusions': ['13.92',
'25.99',
'28.11',
'26.51',
'27.40',
'29.10',
'30.12',
'33.15',
'38.31',
'43.3']},
'30.12': {'description': 'Building of pleasure and sporting boats',
'exclusions': ['13.92', '25.99', '28.11', '32.30', '33.15']},
'30.2': {'description': 'Manufacture of railway locomotives and rolling stock',
'exclusions': []},
'30.20': {'description': 'Manufacture of railway locomotives and rolling stock',
'exclusions': ['24.10', '25.99', '27.11', '27.90', '28.11']},
'30.3': {'description': 'Manufacture of air and spacecraft and related machinery',
'exclusions': []},
'30.30': {'description': 'Manufacture of air and spacecraft and related machinery',
'exclusions': ['13.92',
'25.40',
'26.30',
'26.51',
'27.40',
'27.90',
'28.11',
'28.99']},
'30.4': {'description': 'Manufacture of military fighting vehicles',
'exclusions': []},
'30.40': {'description': 'Manufacture of military fighting vehicles',
'exclusions': ['25.40']},
'30.9': {'description': 'Manufacture of transport equipment n.e.c.',
'exclusions': []},
'30.91': {'description': 'Manufacture of motorcycles',
'exclusions': ['30.92']},
'30.92': {'description': 'Manufacture of bicycles and invalid carriages',
'exclusions': ['30.91', '32.40']},
'30.99': {'description': 'Manufacture of other transport equipment n.e.c.',
'exclusions': ['28.22', '31.01']},
'31': {'description': 'Manufacture of furniture', 'exclusions': []},
'31.0': {'description': 'Manufacture of furniture', 'exclusions': []},
'31.01': {'description': 'Manufacture of office and shop furniture',
'exclusions': ['28.23', '29.32', '30.20', '30.30', '32.50', '43.32']},
'31.02': {'description': 'Manufacture of kitchen furniture',
'exclusions': []},
'31.03': {'description': 'Manufacture of mattresses',
'exclusions': ['22.19']},
'31.09': {'description': 'Manufacture of other furniture',
'exclusions': ['13.92',
'23.42',
'23.69',
'23.70',
'27.40',
'29.32',
'30.20',
'30.30',
'95.24']},
'32': {'description': 'Other manufacturing', 'exclusions': []},
'32.1': {'description': 'Manufacture of jewellery, bijouterie and related articles',
'exclusions': []},
'32.11': {'description': 'Striking of coins', 'exclusions': []},
'32.12': {'description': 'Manufacture of jewellery and related articles',
'exclusions': ['15.12', '25', '26.52', '32.13', '95.25']},
'32.13': {'description': 'Manufacture of imitation jewellery and related articles',
'exclusions': ['32.12']},
'32.2': {'description': 'Manufacture of musical instruments',
'exclusions': []},
'32.20': {'description': 'Manufacture of musical instruments',
'exclusions': ['18.2', '26.40', '32.40', '33.19', '59.20', '95.29']},
'32.3': {'description': 'Manufacture of sports goods', 'exclusions': []},
'32.30': {'description': 'Manufacture of sports goods',
'exclusions': ['13.92',
'14.19',
'15.12',
'15.20',
'25.40',
'25.99',
'29',
'30',
'30.12',
'32.40',
'32.99',
'95.29']},
'32.4': {'description': 'Manufacture of games and toys', 'exclusions': []},
'32.40': {'description': 'Manufacture of games and toys',
'exclusions': ['26.40', '30.92', '32.99', '58.21', '62.01']},
'32.5': {'description': 'Manufacture of medical and dental instruments and supplies',
'exclusions': []},
'32.50': {'description': 'Manufacture of medical and dental instruments and supplies',
'exclusions': ['20.42', '21.20', '26.60', '30.92', '47.78']},
'32.9': {'description': 'Manufacturing n.e.c.', 'exclusions': []},
'32.91': {'description': 'Manufacture of brooms and brushes',
'exclusions': []},
'32.99': {'description': 'Other manufacturing n.e.c. ',
'exclusions': ['13.96', '14.12', '17.29']},
'33': {'description': 'Repair and installation of machinery and equipment',
'exclusions': ['81.22', '95.1', '95.2']},
'33.1': {'description': 'Repair of fabricated metal products, machinery and equipment',
'exclusions': ['25', '30', '81.22', '95.1', '95.2']},
'33.11': {'description': 'Repair of fabricated metal products',
'exclusions': ['33.12', '43.22', '80.20']},
'33.12': {'description': 'Repair of machinery',
'exclusions': ['43.22', '43.29', '95.11']},
'33.13': {'description': 'Repair of electronic and optical equipment',
'exclusions': ['33.12', '95.11', '95.12', '95.21', '95.25']},
'33.14': {'description': 'Repair of electrical equipment',
'exclusions': ['95.11', '95.12', '95.21', '95.25']},
'33.15': {'description': 'Repair and maintenance of ships and boats',
'exclusions': ['30.1', '33.12', '38.31']},
'33.16': {'description': 'Repair and maintenance of aircraft and spacecraft',
'exclusions': ['30.30']},
'33.17': {'description': 'Repair and maintenance of other transport equipment',
'exclusions': ['30.20', '30.40', '33.11', '33.12', '45.40', '95.29']},
'33.19': {'description': 'Repair of other equipment',
'exclusions': ['95.24', '95.29']},
'33.2': {'description': 'Installation of industrial machinery and equipment',
'exclusions': []},
'33.20': {'description': 'Installation of industrial machinery and equipment',
'exclusions': ['43.29', '43.32', '62.09']},
'35': {'description': 'Electricity, gas, steam and air conditioning supply',
'exclusions': []},
'35.1': {'description': 'Electric power generation, transmission and distribution',
'exclusions': []},
'35.11': {'description': 'Production of electricity',
'exclusions': ['38.21']},
'35.12': {'description': 'Transmission of electricity', 'exclusions': []},
'35.13': {'description': 'Distribution of electricity', 'exclusions': []},
'35.14': {'description': 'Trade of electricity', 'exclusions': []},
'35.2': {'description': 'Manufacture of gas; distribution of gaseous fuels through mains',
'exclusions': []},
'35.21': {'description': 'Manufacture of gas',
'exclusions': ['06.20', '19.10', '19.20', '20.11']},
'35.22': {'description': 'Distribution of gaseous fuels through mains',
'exclusions': ['49.50']},
'35.23': {'description': 'Trade of gas through mains',
'exclusions': ['46.71', '47.78', '47.99']},
'35.3': {'description': 'Steam and air conditioning supply',
'exclusions': []},
'35.30': {'description': 'Steam and air conditioning supply',
'exclusions': []},
'36': {'description': 'Water collection, treatment and supply',
'exclusions': []},
'36.0': {'description': 'Water collection, treatment and supply',
'exclusions': []},
'36.00': {'description': 'Water collection, treatment and supply',
'exclusions': ['01.61', '37.00', '49.50']},
'37': {'description': 'Sewerage', 'exclusions': []},
'37.0': {'description': 'Sewerage', 'exclusions': []},
'37.00': {'description': 'Sewerage', 'exclusions': ['39.00', '43.22']},
'38': {'description': 'Waste collection, treatment and disposal activities; materials recovery',
'exclusions': []},
'38.1': {'description': 'Waste collection', 'exclusions': []},
'38.11': {'description': 'Collection of non-hazardous waste',
'exclusions': ['38.12', '38.21', '38.32']},
'38.12': {'description': 'Collection of hazardous waste',
'exclusions': ['39.00']},
'38.2': {'description': 'Waste treatment and disposal',
'exclusions': ['37.00', '38.3']},
'38.21': {'description': 'Treatment and disposal of non-hazardous waste',
'exclusions': ['38.22', '38.32', '39.00']},
'38.22': {'description': 'Treatment and disposal of hazardous waste',
'exclusions': ['20.13', '38.21', '39.00']},
'38.3': {'description': 'Materials recovery', 'exclusions': []},
'38.31': {'description': 'Dismantling of wrecks',
'exclusions': ['38.22', 'G']},
'38.32': {'description': 'Recovery of sorted materials',
'exclusions': ['C', '20.13', '24.10', '38.2', '38.21', '38.22', '46.77']},
'39': {'description': 'Remediation activities and other waste management services',
'exclusions': []},
'39.0': {'description': 'Remediation activities and other waste management services',
'exclusions': []},
'39.00': {'description': 'Remediation activities and other waste management services',
'exclusions': ['01.61', '36.00', '38.21', '38.22', '81.29']},
'41': {'description': 'Construction of buildings', 'exclusions': []},
'41.1': {'description': 'Development of building projects', 'exclusions': []},
'41.10': {'description': 'Development of building projects',
'exclusions': ['41.20', '71.1']},
'41.2': {'description': 'Construction of residential and non-residential buildings',
'exclusions': []},
'41.20': {'description': 'Construction of residential and non-residential buildings',
'exclusions': ['42.99', '71.1']},
'42': {'description': 'Civil engineering', 'exclusions': []},
'42.1': {'description': 'Construction of roads and railways',
'exclusions': []},
'42.11': {'description': 'Construction of roads and motorways',
'exclusions': ['43.21', '71.1']},
'42.12': {'description': 'Construction of railways and underground railways',
'exclusions': ['43.21', '71.1']},
'42.13': {'description': 'Construction of bridges and tunnels',
'exclusions': ['43.21', '71.1']},
'42.2': {'description': 'Construction of utility projects', 'exclusions': []},
'42.21': {'description': 'Construction of utility projects for fluids',
'exclusions': ['71.12']},
'42.22': {'description': 'Construction of utility projects for electricity and telecommunications',
'exclusions': ['71.12']},
'42.9': {'description': 'Construction of other civil engineering projects',
'exclusions': []},
'42.91': {'description': 'Construction of water projects',
'exclusions': ['71.12']},
'42.99': {'description': 'Construction of other civil engineering projects n.e.c.',
'exclusions': ['33.20', '68.10', '71.12']},
'43': {'description': 'Specialised construction activities',
'exclusions': []},
'43.1': {'description': 'Demolition and site preparation', 'exclusions': []},
'43.11': {'description': 'Demolition', 'exclusions': []},
'43.12': {'description': 'Site preparation',
'exclusions': ['06.10', '06.20', '39.00', '42.21', '43.99']},
'43.13': {'description': 'Test drilling and boring',
'exclusions': ['06.10', '06.20', '09.90', '42.21', '43.99', '71.12']},
'43.2': {'description': 'Electrical, plumbing and other construction installation activities',
'exclusions': []},
'43.21': {'description': 'Electrical installation',
'exclusions': ['42.22', '80.20']},
'43.22': {'description': 'Plumbing, heat and air-conditioning installation',
'exclusions': ['43.21']},
'43.29': {'description': 'Other construction installation',
'exclusions': ['33.20']},
'43.3': {'description': 'Building completion and finishing',
'exclusions': []},
'43.31': {'description': 'Plastering', 'exclusions': []},
'43.32': {'description': 'Joinery installation', 'exclusions': ['43.29']},
'43.33': {'description': 'Floor and wall covering', 'exclusions': []},
'43.34': {'description': 'Painting and glazing', 'exclusions': ['43.32']},
'43.39': {'description': 'Other building completion and finishing',
'exclusions': ['74.10', '81.21', '81.22']},
'43.9': {'description': 'Other specialised construction activities',
'exclusions': []},
'43.91': {'description': 'Roofing activities', 'exclusions': ['77.32']},
'43.99': {'description': 'Other specialised construction activities n.e.c.',
'exclusions': ['77.32']},
'45': {'description': 'Wholesale and retail trade and repair of motor vehicles and motorcycles',
'exclusions': []},
'45.1': {'description': 'Sale of motor vehicles', 'exclusions': []},
'45.11': {'description': 'Sale of cars and light motor vehicles',
'exclusions': ['45.3', '49.3', '77.1']},
'45.19': {'description': 'Sale of other motor vehicles',
'exclusions': ['45.3', '49.41', '77.12']},
'45.2': {'description': 'Maintenance and repair of motor vehicles',
'exclusions': []},
'45.20': {'description': 'Maintenance and repair of motor vehicles',
'exclusions': ['22.11']},
'45.3': {'description': 'Sale of motor vehicle parts and accessories',
'exclusions': []},
'45.31': {'description': 'Wholesale trade of motor vehicle parts and accessories',
'exclusions': []},
'45.32': {'description': 'Retail trade of motor vehicle parts and accessories',
'exclusions': ['47.30']},
'45.4': {'description': 'Sale, maintenance and repair of motorcycles and related parts and accessories',
'exclusions': []},
'45.40': {'description': 'Sale, maintenance and repair of motorcycles and related parts and accessories',
'exclusions': ['46.49', '47.64', '77.39', '95.29']},
'46': {'description': 'Wholesale trade, except of motor vehicles and motorcycles',
'exclusions': ['45.1', '45.4', '45.31', '45.40', '77', '82.92']},
'46.1': {'description': 'Wholesale on a fee or contract basis',
'exclusions': []},
'46.11': {'description': 'Agents involved in the sale of agricultural raw materials, live animals, textile raw materials and semi-finished goods',
'exclusions': ['46.2', '46.9', '47.99']},
'46.12': {'description': 'Agents involved in the sale of fuels, ores, metals and industrial chemicals',
'exclusions': ['46.2', '46.9', '47.99']},
'46.13': {'description': 'Agents involved in the sale of timber and building materials',
'exclusions': ['46.2', '46.9', '47.99']},
'46.14': {'description': 'Agents involved in the sale of machinery, industrial equipment, ships and aircraft',
'exclusions': ['45.1', '46.2', '46.9', '47.99']},
'46.15': {'description': 'Agents involved in the sale of furniture, household goods, hardware and ironmongery',
'exclusions': ['46.2', '46.9', '47.99']},
'46.16': {'description': 'Agents involved in the sale of textiles, clothing, fur, footwear and leather goods',
'exclusions': ['46.2', '46.9', '47.99']},
'46.17': {'description': 'Agents involved in the sale of food, beverages and tobacco',
'exclusions': ['46.2', '46.9', '47.99']},
'46.18': {'description': 'Agents specialised in the sale of other particular products',
'exclusions': ['46.2', '46.9', '47.99', '66.22', '68.31']},
'46.19': {'description': 'Agents involved in the sale of a variety of goods',
'exclusions': ['46.2', '46.9', '47.99']},
'46.2': {'description': 'Wholesale of agricultural raw materials and live animals',
'exclusions': []},
'46.21': {'description': 'Wholesale of grain, unmanufactured tobacco, seeds and animal feeds',
'exclusions': ['46.76']},
'46.22': {'description': 'Wholesale of flowers and plants', 'exclusions': []},
'46.23': {'description': 'Wholesale of live animals', 'exclusions': []},
'46.24': {'description': 'Wholesale of hides, skins and leather',
'exclusions': []},
'46.3': {'description': 'Wholesale of food, beverages and tobacco',
'exclusions': []},
'46.31': {'description': 'Wholesale of fruit and vegetables',
'exclusions': []},
'46.32': {'description': 'Wholesale of meat and meat products',
'exclusions': []},
'46.33': {'description': 'Wholesale of dairy products, eggs and edible oils and fats',
'exclusions': []},
'46.34': {'description': 'Wholesale of beverages',
'exclusions': ['11.01', '11.02']},
'46.35': {'description': 'Wholesale of tobacco products', 'exclusions': []},
'46.36': {'description': 'Wholesale of sugar and chocolate and sugar confectionery',
'exclusions': []},
'46.37': {'description': 'Wholesale of coffee, tea, cocoa and spices',
'exclusions': []},
'46.38': {'description': 'Wholesale of other food, including fish, crustaceans and molluscs',
'exclusions': []},
'46.39': {'description': 'Non-specialised wholesale of food, beverages and tobacco',
'exclusions': []},
'46.4': {'description': 'Wholesale of household goods', 'exclusions': []},
'46.41': {'description': 'Wholesale of textiles', 'exclusions': ['46.76']},
'46.42': {'description': 'Wholesale of clothing and footwear',
'exclusions': ['46.48', '46.49']},
'46.43': {'description': 'Wholesale of electrical household appliances',
'exclusions': ['46.52', '46.64']},
'46.44': {'description': 'Wholesale of china and glassware and cleaning materials',
'exclusions': []},
'46.45': {'description': 'Wholesale of perfume and cosmetics',
'exclusions': []},
'46.46': {'description': 'Wholesale of pharmaceutical goods',
'exclusions': []},
'46.47': {'description': 'Wholesale of furniture, carpets and lighting equipment',
'exclusions': ['46.65']},
'46.48': {'description': 'Wholesale of watches and jewellery',
'exclusions': []},
'46.49': {'description': 'Wholesale of other household goods',
'exclusions': []},
'46.5': {'description': 'Wholesale of information and communication equipment',
'exclusions': []},
'46.51': {'description': 'Wholesale of computers, computer peripheral equipment and software',
'exclusions': ['46.52', '46.66']},
'46.52': {'description': 'Wholesale of electronic and telecommunications equipment and parts',
'exclusions': ['46.43', '46.51']},
'46.6': {'description': 'Wholesale of other machinery, equipment and supplies',
'exclusions': []},
'46.61': {'description': 'Wholesale of agricultural machinery, equipment and supplies',
'exclusions': []},
'46.62': {'description': 'Wholesale of machine tools', 'exclusions': []},
'46.63': {'description': 'Wholesale of mining, construction and civil engineering machinery',
'exclusions': []},
'46.64': {'description': 'Wholesale of machinery for the textile industry and of sewing and knitting machines',
'exclusions': []},
'46.65': {'description': 'Wholesale of office furniture', 'exclusions': []},
'46.66': {'description': 'Wholesale of other office machinery and equipment',
'exclusions': ['46.51', '46.52']},
'46.69': {'description': 'Wholesale of other machinery and equipment',
'exclusions': ['45.1', '45.31', '45.40', '46.49']},
'46.7': {'description': 'Other specialised wholesale', 'exclusions': []},
'46.71': {'description': 'Wholesale of solid, liquid and gaseous fuels and related products',
'exclusions': []},
'46.72': {'description': 'Wholesale of metals and metal ores',
'exclusions': ['46.77']},
'46.73': {'description': 'Wholesale of wood, construction materials and sanitary equipment',
'exclusions': []},
'46.74': {'description': 'Wholesale of hardware, plumbing and heating equipment and supplies',
'exclusions': []},
'46.75': {'description': 'Wholesale of chemical products', 'exclusions': []},
'46.76': {'description': 'Wholesale of other intermediate products',
'exclusions': []},
'46.77': {'description': 'Wholesale of waste and scrap',
'exclusions': ['38.1', '38.2', '38.3', '38.31', '38.32', '47.79']},
'46.9': {'description': 'Non-specialised wholesale trade', 'exclusions': []},
'46.90': {'description': 'Non-specialised wholesale trade', 'exclusions': []},
'47': {'description': 'Retail trade, except of motor vehicles and motorcycles',
'exclusions': ['01', '10', '32', '45', '46', '56', '77.2']},
'47.1': {'description': 'Retail sale in non-specialised stores',
'exclusions': []},
'47.11': {'description': 'Retail sale in non-specialised stores with food, beverages or tobacco predominating',
'exclusions': []},
'47.19': {'description': 'Other retail sale in non-specialised stores',
'exclusions': []},
'47.2': {'description': 'Retail sale of food, beverages and tobacco in specialised stores',
'exclusions': []},
'47.21': {'description': 'Retail sale of fruit and vegetables in specialised stores',
'exclusions': []},
'47.22': {'description': 'Retail sale of meat and meat products in specialised stores',
'exclusions': []},
'47.23': {'description': 'Retail sale of fish, crustaceans and molluscs in specialised stores',
'exclusions': []},
'47.24': {'description': 'Retail sale of bread, cakes, flour confectionery and sugar confectionery in specialised stores',
'exclusions': []},
'47.25': {'description': 'Retail sale of beverages in specialised stores',
'exclusions': []},
'47.26': {'description': 'Retail sale of tobacco products in specialised stores',
'exclusions': []},
'47.29': {'description': 'Other retail sale of food in specialised stores',
'exclusions': []},
'47.3': {'description': 'Retail sale of automotive fuel in specialised stores',
'exclusions': []},
'47.30': {'description': 'Retail sale of automotive fuel in specialised stores',
'exclusions': ['46.71', '47.78']},
'47.4': {'description': 'Retail sale of information and communication equipment in specialised stores',
'exclusions': []},
'47.41': {'description': 'Retail sale of computers, peripheral units and software in specialised stores',
'exclusions': ['47.63']},
'47.42': {'description': 'Retail sale of telecommunications equipment in specialised stores',
'exclusions': []},
'47.43': {'description': 'Retail sale of audio and video equipment in specialised stores',
'exclusions': []},
'47.5': {'description': 'Retail sale of other household equipment in specialised stores',
'exclusions': []},
'47.51': {'description': 'Retail sale of textiles in specialised stores',
'exclusions': ['47.71']},
'47.52': {'description': 'Retail sale of hardware, paints and glass in specialised stores',
'exclusions': []},
'47.53': {'description': 'Retail sale of carpets, rugs, wall and floor coverings in specialised stores',
'exclusions': ['47.52']},
'47.54': {'description': 'Retail sale of electrical household appliances in specialised stores',
'exclusions': ['47.43']},
'47.59': {'description': 'Retail sale of furniture, lighting equipment and other household articles in specialised stores',
'exclusions': ['47.79']},
'47.6': {'description': 'Retail sale of cultural and recreation goods in specialised stores',
'exclusions': []},
'47.61': {'description': 'Retail sale of books in specialised stores',
'exclusions': ['47.79']},
'47.62': {'description': 'Retail sale of newspapers and stationery in specialised stores',
'exclusions': []},
'47.63': {'description': 'Retail sale of music and video recordings in specialised stores',
'exclusions': []},
'47.64': {'description': 'Retail sale of sporting equipment in specialised stores',
'exclusions': []},
'47.65': {'description': 'Retail sale of games and toys in specialised stores',
'exclusions': ['47.41']},
'47.7': {'description': 'Retail sale of other goods in specialised stores',
'exclusions': []},
'47.71': {'description': 'Retail sale of clothing in specialised stores',
'exclusions': ['47.51']},
'47.72': {'description': 'Retail sale of footwear and leather goods in specialised stores',
'exclusions': ['47.64']},
'47.73': {'description': 'Dispensing chemist in specialised stores',
'exclusions': []},
'47.74': {'description': 'Retail sale of medical and orthopaedic goods in specialised stores',
'exclusions': []},
'47.75': {'description': 'Retail sale of cosmetic and toilet articles in specialised stores',
'exclusions': []},
'47.76': {'description': 'Retail sale of flowers, plants, seeds, fertilisers, pet animals and pet food in specialised stores',
'exclusions': []},
'47.77': {'description': 'Retail sale of watches and jewellery in specialised stores',
'exclusions': []},
'47.78': {'description': 'Other retail sale of new goods in specialised stores',
'exclusions': []},
'47.79': {'description': 'Retail sale of second-hand goods in stores',
'exclusions': ['45.1', '47.91', '47.99', '64.92']},
'47.8': {'description': 'Retail sale via stalls and markets',
'exclusions': []},
'47.81': {'description': 'Retail sale via stalls and markets of food, beverages and tobacco products',
'exclusions': ['56.10']},
'47.82': {'description': 'Retail sale via stalls and markets of textiles, clothing and footwear',
'exclusions': []},
'47.89': {'description': 'Retail sale via stalls and markets of other goods',
'exclusions': []},
'47.9': {'description': 'Retail trade not in stores, stalls or markets',
'exclusions': []},
'47.91': {'description': 'Retail sale via mail order houses or via Internet',
'exclusions': ['45.1', '45.3', '45.40']},
'47.99': {'description': 'Other retail sale not in stores, stalls or markets',
'exclusions': []},
'49': {'description': 'Land transport and transport via pipelines',
'exclusions': []},
'49.1': {'description': 'Passenger rail transport, interurban',
'exclusions': []},
'49.10': {'description': 'Passenger rail transport, interurban',
'exclusions': ['49.31', '52.21', '55.90', '56.10']},
'49.2': {'description': 'Freight rail transport', 'exclusions': []},
'49.20': {'description': 'Freight rail transport',
'exclusions': ['52.10', '52.21', '52.24']},
'49.3': {'description': 'Other passenger land transport ', 'exclusions': []},
'49.31': {'description': 'Urban and suburban passenger land transport',
'exclusions': ['49.10']},
'49.32': {'description': 'Taxi operation', 'exclusions': []},
'49.39': {'description': 'Other passenger land transport n.e.c.',
'exclusions': ['86.90']},
'49.4': {'description': 'Freight transport by road and removal services',
'exclusions': []},
'49.41': {'description': 'Freight transport by road',
'exclusions': ['02.40',
'36.00',
'52.21',
'52.29',
'53.10',
'53.20',
'38.11',
'38.12']},
'49.42': {'description': 'Removal services', 'exclusions': []},
'49.5': {'description': 'Transport via pipeline', 'exclusions': []},
'49.50': {'description': 'Transport via pipeline',
'exclusions': ['35.22', '35.30', '36.00', '49.41']},
'50': {'description': 'Water transport', 'exclusions': ['56.10', '56.30']},
'50.1': {'description': 'Sea and coastal passenger water transport',
'exclusions': []},
'50.10': {'description': 'Sea and coastal passenger water transport',
'exclusions': ['56.10', '56.30', '77.21', '77.34', '92.00']},
'50.2': {'description': 'Sea and coastal freight water transport',
'exclusions': []},
'50.20': {'description': 'Sea and coastal freight water transport',
'exclusions': ['52.10', '52.22', '52.24', '77.34']},
'50.3': {'description': 'Inland passenger water transport', 'exclusions': []},
'50.30': {'description': 'Inland passenger water transport',
'exclusions': ['77.21']},
'50.4': {'description': 'Inland freight water transport', 'exclusions': []},
'50.40': {'description': 'Inland freight water transport',
'exclusions': ['52.24', '77.34']},
'51': {'description': 'Air transport',
'exclusions': ['01.61', '33.16', '52.23', '73.11', '74.20']},
'51.1': {'description': 'Passenger air transport', 'exclusions': []},
'51.10': {'description': 'Passenger air transport', 'exclusions': ['77.35']},
'51.2': {'description': 'Freight air transport and space transport',
'exclusions': []},
'51.21': {'description': 'Freight air transport', 'exclusions': []},
'51.22': {'description': 'Space transport', 'exclusions': []},
'52': {'description': 'Warehousing and support activities for transportation',
'exclusions': []},
'52.1': {'description': 'Warehousing and storage', 'exclusions': []},
'52.10': {'description': 'Warehousing and storage',
'exclusions': ['52.21', '68.20']},
'52.2': {'description': 'Support activities for transportation',
'exclusions': []},
'52.21': {'description': 'Service activities incidental to land transportation',
'exclusions': ['52.24']},
'52.22': {'description': 'Service activities incidental to water transportation',
'exclusions': ['52.24', '93.29']},
'52.23': {'description': 'Service activities incidental to air transportation',
'exclusions': ['52.24', '85.32', '85.53']},
'52.24': {'description': 'Cargo handling',
'exclusions': ['52.21', '52.22', '52.23']},
'52.29': {'description': 'Other transportation support activities ',
'exclusions': ['53.20', '65.12', '79.11', '79.12', '79.90']},
'53': {'description': 'Postal and courier activities', 'exclusions': []},
'53.1': {'description': 'Postal activities under universal service obligation',
'exclusions': []},
'53.10': {'description': 'Postal activities under universal service obligation',
'exclusions': ['64.19']},
'53.2': {'description': 'Other postal and courier activities',
'exclusions': []},
'53.20': {'description': 'Other postal and courier activities',
'exclusions': ['49.20', '49.41', '50.20', '50.40', '51.21', '51.22']},
'55': {'description': 'Accommodation', 'exclusions': ['L']},
'55.1': {'description': 'Hotels and similar accommodation', 'exclusions': []},
'55.10': {'description': 'Hotels and similar accommodation',
'exclusions': ['68']},
'55.2': {'description': 'Holiday and other short-stay accommodation',
'exclusions': []},
'55.20': {'description': 'Holiday and other short-stay accommodation',
'exclusions': ['55.10', '68']},
'55.3': {'description': 'Camping grounds, recreational vehicle parks and trailer parks',
'exclusions': []},
'55.30': {'description': 'Camping grounds, recreational vehicle parks and trailer parks',
'exclusions': ['55.20']},
'55.9': {'description': 'Other accommodation', 'exclusions': []},
'55.90': {'description': 'Other accommodation', 'exclusions': []},
'56': {'description': 'Food and beverage service activities',
'exclusions': ['10', '11', 'G']},
'56.1': {'description': 'Restaurants and mobile food service activities',
'exclusions': []},
'56.10': {'description': 'Restaurants and mobile food service activities',
'exclusions': ['47.99', '56.29']},
'56.2': {'description': 'Event catering and other food service activities',
'exclusions': []},
'56.21': {'description': 'Event catering activities',
'exclusions': ['10.89', '47']},
'56.29': {'description': 'Other food service activities',
'exclusions': ['10.89', '47']},
'56.3': {'description': 'Beverage serving activities', 'exclusions': []},
'56.30': {'description': 'Beverage serving activities',
'exclusions': ['47', '47.99', '93.29']},
'58': {'description': 'Publishing activities',
'exclusions': ['59', '18.11', '18.12', '18.20']},
'58.1': {'description': 'Publishing of books, periodicals and other publishing activities',
'exclusions': []},
'58.11': {'description': 'Book publishing',
'exclusions': ['32.99', '58.19', '59.20', '90.03']},
'58.12': {'description': 'Publishing of directories and mailing lists',
'exclusions': []},
'58.13': {'description': 'Publishing of newspapers', 'exclusions': ['63.91']},
'58.14': {'description': 'Publishing of journals and periodicals',
'exclusions': []},
'58.19': {'description': 'Other publishing activities',
'exclusions': ['58.13', '63.11']},
'58.2': {'description': 'Software publishing', 'exclusions': []},
'58.21': {'description': 'Publishing of computer games', 'exclusions': []},
'58.29': {'description': 'Other software publishing',
'exclusions': ['18.20', '47.41', '62.01', '63.11']},
'59': {'description': 'Motion picture, video and television programme production, sound recording and music publishing activities',
'exclusions': []},
'59.1': {'description': 'Motion picture, video and television programme activities',
'exclusions': []},
'59.11': {'description': 'Motion picture, video and television programme production activities',
'exclusions': ['18.20',
'46.43',
'46.52',
'47.63',
'59.12',
'59.20',
'60.2',
'74.20',
'74.90',
'77.22',
'82.99',
'90.0']},
'59.12': {'description': 'Motion picture, video and television programme post-production activities',
'exclusions': ['18.20',
'46.43',
'46.52',
'47.63',
'74.20',
'77.22',
'90.0']},
'59.13': {'description': 'Motion picture, video and television programme distribution activities',
'exclusions': ['18.20', '46.43', '47.63']},
'59.14': {'description': 'Motion picture projection activities',
'exclusions': []},
'59.2': {'description': 'Sound recording and music publishing activities',
'exclusions': []},
'59.20': {'description': 'Sound recording and music publishing activities',
'exclusions': []},
'60': {'description': 'Programming and broadcasting activities',
'exclusions': ['61']},
'60.1': {'description': 'Radio broadcasting', 'exclusions': []},
'60.10': {'description': 'Radio broadcasting', 'exclusions': ['59.20']},
'60.2': {'description': 'Television programming and broadcasting activities',
'exclusions': []},
'60.20': {'description': 'Television programming and broadcasting activities',
'exclusions': ['59.11', '61']},
'61': {'description': 'Telecommunications', 'exclusions': []},
'61.1': {'description': 'Wired telecommunications activities',
'exclusions': []},
'61.10': {'description': 'Wired telecommunications activities',
'exclusions': ['61.90']},
'61.2': {'description': 'Wireless telecommunications activities',
'exclusions': []},
'61.20': {'description': 'Wireless telecommunications activities',
'exclusions': ['61.90']},
'61.3': {'description': 'Satellite telecommunications activities',
'exclusions': []},
'61.30': {'description': 'Satellite telecommunications activities',
'exclusions': ['61.90']},
'61.9': {'description': 'Other telecommunications activities',
'exclusions': []},
'61.90': {'description': 'Other telecommunications activities',
'exclusions': ['61.10', '61.20', '61.30']},
'62': {'description': 'Computer programming, consultancy and related activities',
'exclusions': []},
'62.0': {'description': 'Computer programming, consultancy and related activities',
'exclusions': []},
'62.01': {'description': 'Computer programming activities',
'exclusions': ['58.29', '62.02']},
'62.02': {'description': 'Computer consultancy activities',
'exclusions': ['46.51', '47.41', '33.20', '62.09']},
'62.03': {'description': 'Computer facilities management activities',
'exclusions': []},
'62.09': {'description': 'Other information technology and computer service activities',
'exclusions': ['33.20', '62.01', '62.02', '62.03', '63.11']},
'63': {'description': 'Information service activities', 'exclusions': []},
'63.1': {'description': 'Data processing, hosting and related activities; web portals',
'exclusions': []},
'63.11': {'description': 'Data processing, hosting and related activities',
'exclusions': []},
'63.12': {'description': 'Web portals', 'exclusions': ['58', '60']},
'63.9': {'description': 'Other information service activities',
'exclusions': ['91.01']},
'63.91': {'description': 'News agency activities',
'exclusions': ['74.20', '90.03']},
'63.99': {'description': 'Other information service activities n.e.c.',
'exclusions': ['82.20']},
'64': {'description': 'Financial service activities, except insurance and pension funding',
'exclusions': []},
'64.1': {'description': 'Monetary intermediation', 'exclusions': []},
'64.11': {'description': 'Central banking', 'exclusions': []},
'64.19': {'description': 'Other monetary intermediation',
'exclusions': ['64.92', '66.19']},
'64.2': {'description': 'Activities of holding companies', 'exclusions': []},
'64.20': {'description': 'Activities of holding companies',
'exclusions': ['70.10']},
'64.3': {'description': 'Trusts, funds and similar financial entities',
'exclusions': []},
'64.30': {'description': 'Trusts, funds and similar financial entities',
'exclusions': ['64.20', '65.30', '66.30']},
'64.9': {'description': 'Other financial service activities, except insurance and pension funding',
'exclusions': ['65']},
'64.91': {'description': 'Financial leasing', 'exclusions': ['77']},
'64.92': {'description': 'Other credit granting',
'exclusions': ['64.19', '77', '94.99']},
'64.99': {'description': 'Other financial service activities, except insurance and pension funding n.e.c.',
'exclusions': ['64.91', '66.12', '68', '82.91', '94.99']},
'65': {'description': 'Insurance, reinsurance and pension funding, except compulsory social security',
'exclusions': []},
'65.1': {'description': 'Insurance', 'exclusions': []},
'65.11': {'description': 'Life insurance', 'exclusions': []},
'65.12': {'description': 'Non-life insurance', 'exclusions': []},
'65.2': {'description': 'Reinsurance', 'exclusions': []},
'65.20': {'description': 'Reinsurance', 'exclusions': []},
'65.3': {'description': 'Pension funding', 'exclusions': []},
'65.30': {'description': 'Pension funding', 'exclusions': ['66.30', '84.30']},
'66': {'description': 'Activities auxiliary to financial services and insurance activities',
'exclusions': []},
'66.1': {'description': 'Activities auxiliary to financial services, except insurance and pension funding',
'exclusions': []},
'66.11': {'description': 'Administration of financial markets',
'exclusions': []},
'66.12': {'description': 'Security and commodity contracts brokerage',
'exclusions': ['64.99', '66.30']},
'66.19': {'description': 'Other activities auxiliary to financial services, except insurance and pension funding',
'exclusions': ['66.22', '66.30']},
'66.2': {'description': 'Activities auxiliary to insurance and pension funding',
'exclusions': []},
'66.21': {'description': 'Risk and damage evaluation',
'exclusions': ['68.31', '74.90', '80.30']},
'66.22': {'description': 'Activities of insurance agents and brokers',
'exclusions': []},
'66.29': {'description': 'Other activities auxiliary to insurance and pension funding',
'exclusions': ['52.22']},
'66.3': {'description': 'Fund management activities', 'exclusions': []},
'66.30': {'description': 'Fund management activities', 'exclusions': []},
'68': {'description': 'Real estate activities', 'exclusions': []},
'68.1': {'description': 'Buying and selling of own real estate',
'exclusions': []},
'68.10': {'description': 'Buying and selling of own real estate',
'exclusions': ['41.10', '42.99']},
'68.2': {'description': 'Rental and operating of own or leased real estate',
'exclusions': []},
'68.20': {'description': 'Rental and operating of own or leased real estate',
'exclusions': ['55']},
'68.3': {'description': 'Real estate activities on a fee or contract basis',
'exclusions': []},
'68.31': {'description': 'Real estate agencies', 'exclusions': ['69.10']},
'68.32': {'description': 'Management of real estate on a fee or contract basis',
'exclusions': ['69.10', '81.10']},
'69': {'description': 'Legal and accounting activities', 'exclusions': []},
'69.1': {'description': 'Legal activities', 'exclusions': []},
'69.10': {'description': 'Legal activities', 'exclusions': ['84.23']},
'69.2': {'description': 'Accounting, bookkeeping and auditing activities; tax consultancy',
'exclusions': []},
'69.20': {'description': 'Accounting, bookkeeping and auditing activities; tax consultancy',
'exclusions': ['63.11', '70.22', '82.91']},
'70': {'description': 'Activities of head offices; management consultancy activities',
'exclusions': []},
'70.1': {'description': 'Activities of head offices', 'exclusions': []},
'70.10': {'description': 'Activities of head offices',
'exclusions': ['64.20']},
'70.2': {'description': 'Management consultancy activities',
'exclusions': []},
'70.21': {'description': 'Public relations and communication activities',
'exclusions': ['73.1', '73.20']},
'70.22': {'description': 'Business and other management consultancy activities',
'exclusions': ['62.01',
'69.10',
'69.20',
'71.11',
'71.12',
'74.90',
'78.10',
'85.60']},
'71': {'description': 'Architectural and engineering activities; technical testing and analysis',
'exclusions': []},
'71.1': {'description': 'Architectural and engineering activities and related technical consultancy',
'exclusions': []},
'71.11': {'description': 'Architectural activities ',
'exclusions': ['62.02', '62.09', '74.10']},
'71.12': {'description': 'Engineering activities and related technical consultancy',
'exclusions': ['09.10',
'09.90',
'58.29',
'62.01',
'62.02',
'62.09',
'71.20',
'72.19',
'74.10',
'74.20']},
'71.2': {'description': 'Technical testing and analysis', 'exclusions': []},
'71.20': {'description': 'Technical testing and analysis',
'exclusions': ['75.00', '86']},
'72': {'description': 'Scientific research and development ',
'exclusions': ['73.20']},
'72.1': {'description': 'Research and experimental development on natural sciences and engineering',
'exclusions': []},
'72.11': {'description': 'Research and experimental development on biotechnology',
'exclusions': []},
'72.19': {'description': 'Other research and experimental development on natural sciences and engineering',
'exclusions': []},
'72.2': {'description': 'Research and experimental development on social sciences and humanities',
'exclusions': []},
'72.20': {'description': 'Research and experimental development on social sciences and humanities',
'exclusions': ['73.20']},
'73': {'description': 'Advertising and market research', 'exclusions': []},
'73.1': {'description': 'Advertising', 'exclusions': []},
'73.11': {'description': 'Advertising agencies',
'exclusions': ['58.19',
'59.11',
'59.20',
'73.20',
'74.20',
'82.30',
'82.19']},
'73.12': {'description': 'Media representation', 'exclusions': ['70.21']},
'73.2': {'description': 'Market research and public opinion polling',
'exclusions': []},
'73.20': {'description': 'Market research and public opinion polling',
'exclusions': []},
'74': {'description': 'Other professional, scientific and technical activities',
'exclusions': []},
'74.1': {'description': 'Specialised design activities', 'exclusions': []},
'74.10': {'description': 'Specialised design activities',
'exclusions': ['62.01', '71.11', '71.12']},
'74.2': {'description': 'Photographic activities', 'exclusions': []},
'74.20': {'description': 'Photographic activities',
'exclusions': ['59.12', '71.12', '96.09']},
'74.3': {'description': 'Translation and interpretation activities',
'exclusions': []},
'74.30': {'description': 'Translation and interpretation activities',
'exclusions': []},
'74.9': {'description': 'Other professional, scientific and technical activities n.e.c.',
'exclusions': []},
'74.90': {'description': 'Other professional, scientific and technical activities n.e.c.',
'exclusions': ['45.1',
'47.91',
'47.79',
'68.31',
'69.20',
'70.22',
'71.1',
'71.12',
'74.10',
'71.20',
'73.11',
'82.30',
'82.99',
'88.99']},
'75': {'description': 'Veterinary activities', 'exclusions': []},
'75.0': {'description': 'Veterinary activities', 'exclusions': []},
'75.00': {'description': 'Veterinary activities',
'exclusions': ['01.62', '96.09']},
'77': {'description': 'Rental and leasing activities',
'exclusions': ['64.91', 'L', 'F', 'H']},
'77.1': {'description': 'Rental and leasing of motor vehicles',
'exclusions': []},
'77.11': {'description': 'Rental and leasing of cars and light motor vehicles',
'exclusions': ['49.32', '49.39']},
'77.12': {'description': 'Rental and leasing of trucks',
'exclusions': ['49.41']},
'77.2': {'description': 'Rental and leasing of personal and household goods',
'exclusions': []},
'77.21': {'description': 'Rental and leasing of recreational and sports goods',
'exclusions': ['50.10', '50.30', '77.22', '77.29', '93.29']},
'77.22': {'description': 'Rental of video tapes and disks', 'exclusions': []},
'77.29': {'description': 'Rental and leasing of other personal and household goods',
'exclusions': ['77.1', '77.21', '77.22', '77.33', '77.39', '96.01']},
'77.3': {'description': 'Rental and leasing of other machinery, equipment and tangible goods',
'exclusions': []},
'77.31': {'description': 'Rental and leasing of agricultural machinery and equipment',
'exclusions': ['01.61', '02.40']},
'77.32': {'description': 'Rental and leasing of construction and civil engineering machinery and equipment',
'exclusions': ['43']},
'77.33': {'description': 'Rental and leasing of office machinery and equipment (including computers)',
'exclusions': []},
'77.34': {'description': 'Rental and leasing of water transport equipment',
'exclusions': ['50', '77.21']},
'77.35': {'description': 'Rental and leasing of air transport equipment',
'exclusions': ['51']},
'77.39': {'description': 'Rental and leasing of other machinery, equipment and tangible goods n.e.c.',
'exclusions': ['77.21', '77.31', '77.32', '77.33']},
'77.4': {'description': 'Leasing of intellectual property and similar products, except copyrighted works',
'exclusions': []},
'77.40': {'description': 'Leasing of intellectual property and similar products, except copyrighted works',
'exclusions': ['58', '59', '68.20', '77.1', '77.2', '77.3']},
'78': {'description': 'Employment activities', 'exclusions': ['74.90']},
'78.1': {'description': 'Activities of employment placement agencies',
'exclusions': []},
'78.10': {'description': 'Activities of employment placement agencies',
'exclusions': ['74.90']},
'78.2': {'description': 'Temporary employment agency activities',
'exclusions': []},
'78.20': {'description': 'Temporary employment agency activities',
'exclusions': []},
'78.3': {'description': 'Other human resources provision', 'exclusions': []},
'78.30': {'description': 'Other human resources provision',
'exclusions': ['78.20']},
'79': {'description': 'Travel agency, tour operator and other reservation service and related activities',
'exclusions': []},
'79.1': {'description': 'Travel agency and tour operator activities',
'exclusions': []},
'79.11': {'description': 'Travel agency activities', 'exclusions': []},
'79.12': {'description': 'Tour operator activities', 'exclusions': []},
'79.9': {'description': 'Other reservation service and related activities',
'exclusions': []},
'79.90': {'description': 'Other reservation service and related activities',
'exclusions': ['79.11', '79.12', '82.30']},
'80': {'description': 'Security and investigation activities',
'exclusions': []},
'80.1': {'description': 'Private security activities', 'exclusions': []},
'80.10': {'description': 'Private security activities',
'exclusions': ['84.24']},
'80.2': {'description': 'Security systems service activities',
'exclusions': []},
'80.20': {'description': 'Security systems service activities',
'exclusions': ['43.21', '47.59', '74.90', '84.24', '95.29']},
'80.3': {'description': 'Investigation activities', 'exclusions': []},
'80.30': {'description': 'Investigation activities', 'exclusions': []},
'81': {'description': 'Services to buildings and landscape activities',
'exclusions': []},
'81.1': {'description': 'Combined facilities support activities',
'exclusions': []},
'81.10': {'description': 'Combined facilities support activities',
'exclusions': ['62.03', '84.23']},
'81.2': {'description': 'Cleaning activities',
'exclusions': ['01.61', '43.39', '43.99', '96.01']},
'81.21': {'description': 'General cleaning of buildings',
'exclusions': ['81.22']},
'81.22': {'description': 'Other building and industrial cleaning activities',
'exclusions': ['43.99']},
'81.29': {'description': 'Other cleaning activities',
'exclusions': ['01.61', '45.20']},
'81.3': {'description': 'Landscape service activities', 'exclusions': []},
'81.30': {'description': 'Landscape service activities',
'exclusions': ['01', '02', '01.30', '02.10', '01.61', 'F', '71.11']},
'82': {'description': 'Office administrative, office support and other business support activities',
'exclusions': []},
'82.1': {'description': 'Office administrative and support activities',
'exclusions': []},
'82.11': {'description': 'Combined office administrative service activities',
'exclusions': ['78']},
'82.19': {'description': 'Photocopying, document preparation and other specialised office support activities',
'exclusions': ['18.12', '18.13', '73.11', '82.99']},
'82.2': {'description': 'Activities of call centres', 'exclusions': []},
'82.20': {'description': 'Activities of call centres', 'exclusions': []},
'82.3': {'description': 'Organisation of conventions and trade shows',
'exclusions': []},
'82.30': {'description': 'Organisation of conventions and trade shows',
'exclusions': []},
'82.9': {'description': 'Business support service activities n.e.c.',
'exclusions': []},
'82.91': {'description': 'Activities of collection agencies and credit bureaus',
'exclusions': []},
'82.92': {'description': 'Packaging activities',
'exclusions': ['11.07', '52.29']},
'82.99': {'description': 'Other business support service activities n.e.c.',
'exclusions': ['82.19', '59.12']},
'84': {'description': 'Public administration and defence; compulsory social security',
'exclusions': []},
'84.1': {'description': 'Administration of the State and the economic and social policy of the community',
'exclusions': []},
'84.11': {'description': 'General public administration activities',
'exclusions': ['68.2', '68.3', '84.12', '84.13', '84.22', '91.01']},
'84.12': {'description': 'Regulation of the activities of providing health care, education, cultural services and other social services, excluding social security',
'exclusions': ['37', '38', '39', '84.30', 'P', '86', '91', '91.01', '93']},
'84.13': {'description': 'Regulation of and contribution to more efficient operation of businesses',
'exclusions': ['72']},
'84.2': {'description': 'Provision of services to the community as a whole',
'exclusions': []},
'84.21': {'description': 'Foreign affairs', 'exclusions': ['88.99']},
'84.22': {'description': 'Defence activities',
'exclusions': ['72', '84.21', '84.23', '84.24', '85.4', '86.10']},
'84.23': {'description': 'Justice and judicial activities',
'exclusions': ['69.10', '85', '86.10']},
'84.24': {'description': 'Public order and safety activities',
'exclusions': ['71.20', '84.22']},
'84.25': {'description': 'Fire service activities',
'exclusions': ['02.40', '09.10', '52.23']},
'84.3': {'description': 'Compulsory social security activities',
'exclusions': []},
'84.30': {'description': 'Compulsory social security activities',
'exclusions': ['65.30', '88.10', '88.99']},
'85': {'description': 'Education', 'exclusions': []},
'85.1': {'description': 'Pre-primary education', 'exclusions': []},
'85.10': {'description': 'Pre-primary education ', 'exclusions': ['88.91']},
'85.2': {'description': 'Primary education', 'exclusions': []},
'85.20': {'description': 'Primary education ',
'exclusions': ['85.5', '88.91']},
'85.3': {'description': 'Secondary education', 'exclusions': ['85.5']},
'85.31': {'description': 'General secondary education ', 'exclusions': []},
'85.32': {'description': 'Technical and vocational secondary education ',
'exclusions': ['85.4', '85.52', '85.53', '88.10', '88.99']},
'85.4': {'description': 'Higher education', 'exclusions': ['85.5']},
'85.41': {'description': 'Post-secondary non-tertiary education',
'exclusions': []},
'85.42': {'description': 'Tertiary education', 'exclusions': []},
'85.5': {'description': 'Other education', 'exclusions': ['85.1', '85.4']},
'85.51': {'description': 'Sports and recreation education',
'exclusions': ['85.52']},
'85.52': {'description': 'Cultural education', 'exclusions': ['85.59']},
'85.53': {'description': 'Driving school activities',
'exclusions': ['85.32']},
'85.59': {'description': 'Other education n.e.c.',
'exclusions': ['85.20', '85.31', '85.32', '85.4']},
'85.6': {'description': 'Educational support activities', 'exclusions': []},
'85.60': {'description': 'Educational support activities',
'exclusions': ['72.20']},
'86': {'description': 'Human health activities', 'exclusions': []},
'86.1': {'description': 'Hospital activities', 'exclusions': []},
'86.10': {'description': 'Hospital activities',
'exclusions': ['71.20', '75.00', '84.22', '86.23', '86.2', '86.90']},
'86.2': {'description': 'Medical and dental practice activities',
'exclusions': []},
'86.21': {'description': 'General medical practice activities',
'exclusions': ['86.10', '86.90']},
'86.22': {'description': 'Specialist medical practice activities',
'exclusions': ['86.10', '86.90']},
'86.23': {'description': 'Dental practice activities',
'exclusions': ['32.50', '86.10', '86.90']},
'86.9': {'description': 'Other human health activities', 'exclusions': []},
'86.90': {'description': 'Other human health activities',
'exclusions': ['32.50',
'49',
'50',
'51',
'71.20',
'86.10',
'86.2',
'87.10']},
'87': {'description': 'Residential care activities', 'exclusions': []},
'87.1': {'description': 'Residential nursing care activities',
'exclusions': []},
'87.10': {'description': 'Residential nursing care activities',
'exclusions': ['86', '87.30', '87.90']},
'87.2': {'description': 'Residential care activities for mental retardation, mental health and substance abuse',
'exclusions': []},
'87.20': {'description': 'Residential care activities for mental retardation, mental health and substance abuse',
'exclusions': ['86.10', '87.90']},
'87.3': {'description': 'Residential care activities for the elderly and disabled',
'exclusions': []},
'87.30': {'description': 'Residential care activities for the elderly and disabled',
'exclusions': ['87.10', '87.90']},
'87.9': {'description': 'Other residential care activities',
'exclusions': []},
'87.90': {'description': 'Other residential care activities',
'exclusions': ['84.30', '87.10', '87.30', '88.99']},
'88': {'description': 'Social work activities without accommodation',
'exclusions': []},
'88.1': {'description': 'Social work activities without accommodation for the elderly and disabled',
'exclusions': []},
'88.10': {'description': 'Social work activities without accommodation for the elderly and disabled',
'exclusions': ['84.30', '87.30', '88.91']},
'88.9': {'description': 'Other social work activities without accommodation',
'exclusions': []},
'88.91': {'description': 'Child day-care activities', 'exclusions': []},
'88.99': {'description': 'Other social work activities without accommodation n.e.c.',
'exclusions': ['84.30', '87.90']},
'90': {'description': 'Creative, arts and entertainment activities',
'exclusions': ['91',
'92',
'93',
'59.11',
'59.12',
'59.13',
'59.14',
'60.1',
'60.2']},
'90.0': {'description': 'Creative, arts and entertainment activities',
'exclusions': []},
'90.01': {'description': 'Performing arts', 'exclusions': ['74.90', '78.10']},
'90.02': {'description': 'Support activities to performing arts',
'exclusions': ['74.90', '78.10']},
'90.03': {'description': 'Artistic creation',
'exclusions': ['23.70', '33.19', '59.11', '59.12', '95.24']},
'90.04': {'description': 'Operation of arts facilities',
'exclusions': ['59.14', '79.90', '91.02']},
'91': {'description': 'Libraries, archives, museums and other cultural activities',
'exclusions': ['93']},
'91.0': {'description': 'Libraries, archives, museums and other cultural activities',
'exclusions': []},
'91.01': {'description': 'Library and archives activities', 'exclusions': []},
'91.02': {'description': 'Museums activities',
'exclusions': ['47.78', '90.03', '91.01']},
'91.03': {'description': 'Operation of historical sites and buildings and similar visitor attractions',
'exclusions': ['F']},
'91.04': {'description': 'Botanical and zoological gardens and nature reserves activities',
'exclusions': ['81.30', '93.19']},
'92': {'description': 'Gambling and betting activities', 'exclusions': []},
'92.0': {'description': 'Gambling and betting activities', 'exclusions': []},
'92.00': {'description': 'Gambling and betting activities', 'exclusions': []},
'93': {'description': 'Sports activities and amusement and recreation activities',
'exclusions': ['90']},
'93.1': {'description': 'Sports activities', 'exclusions': []},
'93.11': {'description': 'Operation of sports facilities',
'exclusions': ['49.39', '77.21', '93.13', '93.29']},
'93.12': {'description': 'Activities of sports clubs',
'exclusions': ['85.51', '93.11']},
'93.13': {'description': 'Fitness facilities', 'exclusions': ['85.51']},
'93.19': {'description': 'Other sports activities',
'exclusions': ['77.21', '85.51', '93.11', '93.12', '93.29']},
'93.2': {'description': 'Amusement and recreation activities',
'exclusions': []},
'93.21': {'description': 'Activities of amusement parks and theme parks',
'exclusions': []},
'93.29': {'description': 'Other amusement and recreation activities',
'exclusions': ['49.39', '50.10', '50.30', '55.30', '56.30', '90.01']},
'94': {'description': 'Activities of membership organisations',
'exclusions': []},
'94.1': {'description': 'Activities of business, employers and professional membership organisations',
'exclusions': []},
'94.11': {'description': 'Activities of business and employers membership organisations',
'exclusions': ['94.20']},
'94.12': {'description': 'Activities of professional membership organisations',
'exclusions': ['85']},
'94.2': {'description': 'Activities of trade unions', 'exclusions': []},
'94.20': {'description': 'Activities of trade unions', 'exclusions': ['85']},
'94.9': {'description': 'Activities of other membership organisations',
'exclusions': []},
'94.91': {'description': 'Activities of religious organisations',
'exclusions': ['85', '86', '87', '88']},
'94.92': {'description': 'Activities of political organisations',
'exclusions': []},
'94.99': {'description': 'Activities of other membership organisations n.e.c.',
'exclusions': ['88.99', '90.0', '93.12', '94.12']},
'95': {'description': 'Repair of computers and personal and household goods',
'exclusions': ['33.13']},
'95.1': {'description': 'Repair of computers and communication equipment',
'exclusions': []},
'95.11': {'description': 'Repair of computers and peripheral equipment',
'exclusions': ['95.12']},
'95.12': {'description': 'Repair of communication equipment',
'exclusions': []},
'95.2': {'description': 'Repair of personal and household goods',
'exclusions': []},
'95.21': {'description': 'Repair of consumer electronics', 'exclusions': []},
'95.22': {'description': 'Repair of household appliances and home and garden equipment',
'exclusions': ['33.12', '43.22']},
'95.23': {'description': 'Repair of footwear and leather goods',
'exclusions': []},
'95.24': {'description': 'Repair of furniture and home furnishings',
'exclusions': []},
'95.25': {'description': 'Repair of watches, clocks and jewellery',
'exclusions': ['33.13']},
'95.29': {'description': 'Repair of other personal and household goods',
'exclusions': ['25.61', '33.11', '33.12', '33.19']},
'96': {'description': 'Other personal service activities', 'exclusions': []},
'96.0': {'description': 'Other personal service activities',
'exclusions': []},
'96.01': {'description': 'Washing and (dry-)cleaning of textile and fur products',
'exclusions': ['77.29', '95.29']},
'96.02': {'description': 'Hairdressing and other beauty treatment',
'exclusions': ['32.99']},
'96.03': {'description': 'Funeral and related activities',
'exclusions': ['81.30', '94.91']},
'96.04': {'description': 'Physical well-being activities',
'exclusions': ['86.90', '93.13']},
'96.09': {'description': 'Other personal service activities n.e.c.',
'exclusions': ['75.00', '92.00', '96.01']},
'97': {'description': 'Activities of households as employers of domestic personnel',
'exclusions': []},
'97.0': {'description': 'Activities of households as employers of domestic personnel',
'exclusions': []},
'97.00': {'description': 'Activities of households as employers of domestic personnel',
'exclusions': []},
'98': {'description': 'Undifferentiated goods- and services-producing activities of private households for own use',
'exclusions': []},
'98.1': {'description': 'Undifferentiated goods-producing activities of private households for own use',
'exclusions': []},
'98.10': {'description': 'Undifferentiated goods-producing activities of private households for own use',
'exclusions': []},
'98.2': {'description': 'Undifferentiated service-producing activities of private households for own use',
'exclusions': []},
'98.20': {'description': 'Undifferentiated service-producing activities of private households for own use',
'exclusions': []},
'99': {'description': 'Activities of extraterritorial organisations and bodies',
'exclusions': []},
'99.0': {'description': 'Activities of extraterritorial organisations and bodies',
'exclusions': []},
'99.00': {'description': 'Activities of extraterritorial organisations and bodies',
'exclusions': []},
'A': {'description': 'AGRICULTURE, FORESTRY AND FISHING', 'exclusions': []},
'B': {'description': 'MINING AND QUARRYING',
'exclusions': ['C', 'F', '11.07', '23.9']},
'C': {'description': 'MANUFACTURING', 'exclusions': []},
'D': {'description': 'ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY',
'exclusions': ['36', '37']},
'E': {'description': 'WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES',
'exclusions': []},
'F': {'description': 'CONSTRUCTION', 'exclusions': []},
'G': {'description': 'WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES',
'exclusions': []},
'H': {'description': 'TRANSPORTATION AND STORAGE',
'exclusions': ['33.1', '42', '45.20', '77.1', '77.3']},
'I': {'description': 'ACCOMMODATION AND FOOD SERVICE ACTIVITIES',
'exclusions': ['L', 'C']},
'J': {'description': 'INFORMATION AND COMMUNICATION', 'exclusions': []},
'K': {'description': 'FINANCIAL AND INSURANCE ACTIVITIES', 'exclusions': []},
'L': {'description': 'REAL ESTATE ACTIVITIES', 'exclusions': []},
'M': {'description': 'PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES',
'exclusions': []},
'N': {'description': 'ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES',
'exclusions': []},
'O': {'description': 'PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY',
'exclusions': []},
'P': {'description': 'EDUCATION', 'exclusions': []},
'Q': {'description': 'HUMAN HEALTH AND SOCIAL WORK ACTIVITIES',
'exclusions': []},
'R': {'description': 'ARTS, ENTERTAINMENT AND RECREATION', 'exclusions': []},
'S': {'description': 'OTHER SERVICE ACTIVITIES', 'exclusions': []},
'T': {'description': 'ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS- AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN USE',
'exclusions': []},
'U': {'description': 'ACTIVITIES OF EXTRATERRITORIAL ORGANISATIONS AND BODIES',
'exclusions': []}}
A3 plot¶
Implement function plot which, given a db as created at the previous point and a code level among 1, 2, 3, 4, plots the number of exclusions for all codes of that exact level (so do not include sublevels in the sum), sorted in descending order.
Remember to plot the title; notice it should show the type of level (Section, Division, Group, or Class).
Try to display labels nicely, as in the example output.
(If you look at the graph, apparently the European Union has a hard time defining what an artist is :-)
IMPORTANT: if you couldn’t implement the function build_db, you will still find the complete desired output in file expected_db.py. To import it, write: from expected_db import activities_db
[8]:
%matplotlib inline
def plot(db, level):
    import matplotlib.pyplot as plt
    #jupman-raise
    # the level of a code is its length without dots: 'A' -> 1, '74' -> 2, '74.1' -> 3, '74.10' -> 4
    coords = [(code, len(db[code]['exclusions'])) for code in db if len(code.replace('.','')) == level]
    coords.sort(key=lambda c: c[1], reverse=True)
    coords = coords[:10]   # keep only the 10 codes with the most exclusions
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    fig = plt.figure(figsize=(13,6)) # width: 13 inches, height: 6 inches
    plt.bar(xs, ys, 0.5, align='center')
    def fix_label(label):
        # coding horror, sorry: one word per line, but keep 'and'/'of' attached to the previous word
        return label.replace(' ','\n').replace('\nand\n',' and\n').replace('\nof\n',' of\n')
    plt.xticks(xs, ['NACE ' + c[0] + '\n' + fix_label(db[c[0]]['description']) for c in coords])
    level_names = {
        1:'Section',
        2:'Division',
        3:'Group',
        4:'Class'
    }
    plt.title("# of exclusions by %ss (level %s) - SOLUTION" % (level_names[level], level))
    #plt.xlabel('level_names[level]')
    #plt.ylabel('y')
    fig.tight_layout()
    plt.savefig('%s-exclusions-solution.png' % level_names[level].lower())
    plt.show()
    #/jupman-raise
#Uncomment *only* if you had problems with build_db
#from expected_db import activities_db
#1 Section
#2 Division
#3 Group
#4 Class
plot(activities_db, 2)
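Separating the data selection from the drawing makes the level logic easy to check without matplotlib. The hypothetical helper `top_exclusions` below mirrors the first lines of `plot`: a code's level is its number of characters once dots are removed, and codes are sorted by number of exclusions, largest first. The `sample_db` is a small excerpt copied from `activities_db` above; the `limit=10` default is an assumption matching the `coords[:10]` cut in the solution.

```python
def top_exclusions(db, level, limit=10):
    """Return (code, number_of_exclusions) pairs for codes of exactly `level`,
    largest first (hypothetical helper, not part of the exercise)."""
    coords = [(code, len(info['exclusions']))
              for code, info in db.items()
              if len(code.replace('.', '')) == level]   # 'B'->1, '77'->2, '85.5'->3, '77.29'->4
    coords.sort(key=lambda c: c[1], reverse=True)
    return coords[:limit]

# Excerpt of activities_db (same codes and exclusions as printed above)
sample_db = {
    '75': {'description': 'Veterinary activities', 'exclusions': []},
    '77': {'description': 'Rental and leasing activities',
           'exclusions': ['64.91', 'L', 'F', 'H']},
    '78': {'description': 'Employment activities', 'exclusions': ['74.90']},
    '85.5': {'description': 'Other education', 'exclusions': ['85.1', '85.4']},
    '90': {'description': 'Creative, arts and entertainment activities',
           'exclusions': ['91', '92', '93', '59.11', '59.12', '59.13',
                          '59.14', '60.1', '60.2']},
    'B': {'description': 'MINING AND QUARRYING',
          'exclusions': ['C', 'F', '11.07', '23.9']},
}

print(top_exclusions(sample_db, 2))
```

The returned pairs can then be fed to `plt.bar` exactly as `xs` and `ys` are in `plot`.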

Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file
B1.1 complexity¶
Given a list L of n elements, please compute the asymptotic computational complexity of the following function, explaining your reasoning.
def my_fun(L):
    n = len(L)
    if n <= 1:
        return 1
    else:
        L1 = L[0:n//2]
        L2 = L[n//2:]
        a = my_fun(L1) + min(L1) - n
        b = my_fun(L2) + min(L2) - n
        return a + b
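Each call of my_fun on n > 1 elements makes two recursive calls on halves and does linear extra work: the two slices copy n elements in total, and min(L1) plus min(L2) scan n elements in total. This gives the recurrence T(n) = 2T(n/2) + Θ(n), which by the master theorem is Θ(n log n). As a rough check, the sketch below counts abstract steps under that assumed cost model (n steps for slicing plus n steps for the two min calls, constants ignored); it is an illustration of the recurrence, not a timing measurement.

```python
def cost(n):
    """Abstract step count of my_fun on a list of length n (assumed cost model)."""
    if n <= 1:
        return 0                 # base case: constant work, ignored
    half = n // 2
    # two recursive calls + n steps to build L1 and L2 + n steps for min(L1) and min(L2)
    return cost(half) + cost(n - half) + n + n

for k in range(1, 11):
    n = 2 ** k
    # for powers of two the count is exactly 2 * n * log2(n), i.e. Θ(n log n)
    print(n, cost(n), 2 * n * k)
```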
B1.2 describe¶
Briefly describe what a hash table is and provide an example of its usage.
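A hash table stores key/value pairs in an array of buckets: a hash function maps each key to a bucket index, so lookups and insertions take O(1) time on average (O(n) in the worst case, when many keys collide). Python's dict is a hash table, and a typical usage is counting occurrences. The class below is only an illustrative sketch using separate chaining; the names HashTable, put and get are hypothetical, and this is not how CPython implements dict (which uses open addressing).

```python
class HashTable:
    """Minimal hash table with separate chaining (illustrative sketch)."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # hash() maps the key to an integer; modulo picks one of the buckets
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append it to the chain
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()               # keep chains short so operations stay O(1) on average

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self):
        # double the number of buckets and redistribute every pair
        items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in items:
            self._bucket(k).append((k, v))

    def __len__(self):
        return self._size


# Example usage: word frequencies
counts = HashTable()
for word in "the cat and the hat".split():
    counts.put(word, counts.get(word, 0) + 1)
print(counts.get('the'))   # 2
```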
B2 - OfficeQueue¶
An office offers services 'x', 'y' and 'z'. When people arrive at the office, they state which service they need, get a ticket and enqueue. Suppose that at the beginning of the day there is only one queue.
The office knows on average how much time each service requires:
[9]:
SERVICES = { 'x':5, # minutes
'y':20,
'z':30
}
With this information it is able to inform new clients approximately how long they will need to wait.
OfficeQueue is implemented as a linked list, where people enter the queue from the tail and leave from the head. We can represent it like this (NOTE: ‘cumulative wait’ is not actually stored in the queue):
wait time: 155 minutes
cumulative wait: 5    10   15   45   50   55   85   105  110  130  150  155
wait times:      5    5    5    30   5    5    30   20   5    20   20   5
                 x    x    x    z    x    x    z    y    x    y    y    x
                 a -> b -> c -> d -> e -> f -> g -> h -> i -> l -> m -> n
                 ^                                                      ^
                 |                                                      |
                 head                                                   tail
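The ‘cumulative wait’ row of the diagram is just a running sum of the per-service times. A quick way to reproduce it, assuming the services of clients a to n are listed from head to tail, is itertools.accumulate:

```python
from itertools import accumulate

SERVICES = {'x': 5, 'y': 20, 'z': 30}   # minutes, as in the example
queue = list('xxxzxxzyxyyx')            # services of clients a..n, head first

wait_times = [SERVICES[s] for s in queue]
cumulative = list(accumulate(wait_times))
print(wait_times)
print(cumulative)   # [5, 10, 15, 45, 50, 55, 85, 105, 110, 130, 150, 155]
```

The last cumulative value is the total wait time of the queue (155 minutes), which is what OfficeQueue keeps in _wait_time.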
Each node holds the client identifier ('a', 'b', 'c', ...) and the service label (like 'x') requested by the client:
class Node:
    def __init__(self, initdata, service):
        self._data = initdata
        self._service = service
        self._next = None
OfficeQueue keeps the fields _services and _size, and a field _wait_time which holds the total wait time of the queue:
class OfficeQueue:
    def __init__(self, services):
        self._head = None
        self._tail = None
        self._size = 0
        self._wait_time = 0
        self._services = dict(services)
[10]:
from office_queue_solution import *
SERVICES = { 'x':5, # minutes
'y':20,
'z':30
}
oq = OfficeQueue(SERVICES)
print(oq)
OfficeQueue:
[11]:
oq.enqueue('a','x')
oq.enqueue('b','x')
oq.enqueue('c','x')
oq.enqueue('d','z')
oq.enqueue('e','x')
oq.enqueue('f','x')
oq.enqueue('g','z')
oq.enqueue('h','y')
oq.enqueue('i','x')
oq.enqueue('l','y')
oq.enqueue('m','y')
oq.enqueue('n','x')
[12]:
print(oq)
OfficeQueue:
x x x z x x z y x y y x
a -> b -> c -> d -> e -> f -> g -> h -> i -> l -> m -> n
[13]:
oq.size()
[13]:
12
Total wait time can be accessed from outside with the method wait_time():
[14]:
oq.wait_time()
[14]:
155
ATTENTION: you only need to implement the methods time_to_service and split. DO NOT touch other methods.
B2.1 - time_to_service¶
Open the file office_queue_exercise.py with an editor and start editing.
To schedule work and breaks, office employees want to know, for each service, how long it will be before they have to process the first client requiring that service.
The first service encountered always has a zero time interval (in this example it’s x):
wait time: 155 minutes
cumulative wait: 5    10   15   45   50   55   85   105  110  130  150  155
wait times:      5    5    5    30   5    5    30   20   5    20   20   5
                 x    x    x    z    x    x    z    y    x    y    y    x
                 a -> b -> c -> d -> e -> f -> g -> h -> i -> l -> m -> n
                ||              |                   |
                x : 0           |                   |
                |               |                   |
                |---------------|                   |
                |    z : 15                         |
                |                                   |
                |-----------------------------------|
                              y : 85
[15]:
SERVICES = { 'x':5, # minutes
'y':20,
'z':30
}
oq = OfficeQueue(SERVICES)
print(oq)
OfficeQueue:
[16]:
oq.enqueue('a','x')
oq.enqueue('b','x')
oq.enqueue('c','x')
oq.enqueue('d','z')
oq.enqueue('e','x')
oq.enqueue('f','x')
oq.enqueue('g','z')
oq.enqueue('h','y')
oq.enqueue('i','x')
oq.enqueue('l','y')
oq.enqueue('m','y')
oq.enqueue('n','x')
print(oq)
OfficeQueue:
x x x z x x z y x y y x
a -> b -> c -> d -> e -> f -> g -> h -> i -> l -> m -> n
The method to implement returns a dictionary mapping each service to the time interval after which the service is first required:
[17]:
oq.time_to_service()
[17]:
{'x': 0, 'y': 85, 'z': 15}
Services not required by any client¶
As a special case, if a service is not required by any client, its time interval is set to the queue’s total wait time (because a client requiring that service might still show up in the future and get enqueued).
[18]:
oq = OfficeQueue(SERVICES)
oq.enqueue('a','x') # completed after 5 mins
oq.enqueue('b','y') # completed after 5 + 20 mins
print(oq)
OfficeQueue:
x y
a -> b
[19]:
print(oq.wait_time())
25
[20]:
oq.time_to_service() # note z is set to total wait time
[20]:
{'x': 0, 'y': 5, 'z': 25}
Now implement this:
def time_to_service(self):
    """ RETURN a dictionary mapping each service to the time interval after which
        the service is first required.
        - the first service encountered will always have a zero time interval
        - If a service is not required by any client, time interval is set to
          the queue total wait time
        - MUST run in O(n) where n is the size of the queue.
    """
Testing: python3 -m unittest office_queue_test.TestTimeToService
B2.2 split¶
Suppose a new desk is opened: to reduce waiting times, the office will communicate on a screen to some people in the current queue that they should move to the new desk, thereby creating a new queue. The current queue will be split in two according to this criterion: after the cut, the total waiting time of the current queue should be the same as or slightly bigger than the waiting time of the new queue:
ATTENTION: this example is different from the previous one (total wait time is 150 instead of 155)
ORIGINAL QUEUE:
wait time = 150 minutes
wait time / 2 = 75 minutes
cumulative wait: 30   50   80   110  115  120  140  145  150
wait times:      30   20   30   30   5    5    20   5    5
                 z    y    z    z    x    x    y    x    x
                 a -> b -> c -> d -> e -> f -> g -> h -> i
                 ^           ^                           ^
                 |           |                           |
                 head    cut here                        tail
MODIFIED QUEUE:
wait time: 80 minutes
cumulative wait: 30   50   80
wait times:      30   20   30
                 z    y    z
                 a -> b -> c
                 ^         ^
                 |         |
                 head      tail
NEW QUEUE:
wait time: 70 minutes
cumulative wait: 30   35   40   60   65   70
wait times:      30   5    5    20   5    5
                 z    x    x    y    x    x
                 d -> e -> f -> g -> h -> i
                 ^                        ^
                 |                        |
                 head                     tail
Implement this method:
def split(self):
    """ Perform two operations:
        - MODIFY the queue by cutting it so that the wait time of this cut
          will be half (or slightly more) of the wait time of the whole original queue
        - RETURN a NEW queue holding the remaining nodes after the cut - the wait time of
          the new queue will be half (or slightly less) of the original wait time
        - If the queue to split is empty or has only one element, modify nothing
          and RETURN a NEW empty queue
        - After the call, the present queue wait time should be equal to or slightly
          bigger than the returned queue wait time.
        - DO *NOT* create new nodes, just reuse existing ones
        - REMEMBER to set _size, _wait_time, _tail in both original and new queue
        - MUST execute in O(n) where n is the size of the queue
    """
Testing: python3 -m unittest office_queue_test.SplitTest
Exam - Monday 24 August 2020 - solutions¶
Scientific Programming - Data Science @ University of Trento
Introduction¶
Taking part in this exam erases any grade you had before
What to do¶
Download datasciprolab-2020-08-24-exam.zip and extract it on your desktop. Folder content should be like this:
Rename the datasciprolab-2020-08-24-FIRSTNAME-LASTNAME-ID folder: put your name, last name and ID number, like datasciprolab-2020-08-24-john-doe-432432
From now on, you will be editing the files in that folder. At the end of the exam, that is what will be evaluated.
Edit the files following the instructions in this worksheet for each exercise. Every exercise should take at most 25 minutes. If it takes longer, leave it and try another exercise.
When done:
if you have a unitn login: zip the folder and send it to examina.icts.unitn.it/studente
if you don’t have a unitn login: tell the instructors and we will download your work manually
Part A - Prezzario¶
Open Jupyter and start editing this notebook exam-2020-08-24-exercise.ipynb
You are going to analyze the dataset EPPAT-2018-new-compact.csv, which is the price list for all products and services the Autonomous Province of Trento may require. Source: dati.trentino.it
DO NOT WASTE TIME LOOKING AT THE WHOLE DATASET!
The dataset is quite complex: please focus on the few examples we provide.
We will show examples with pandas, but it is not required to solve the exercises.
[1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_colwidth', None)  # None = never truncate cell text (-1 is deprecated since pandas 1.0)
df = pd.read_csv('EPPAT-2018-new-compact.csv', encoding='latin-1')
The dataset contains several columns, but we will consider the following ones:
[2]:
df = df[['Codice Prodotto', 'Descrizione Breve Prodotto', 'Categoria', 'Prezzo']]
df[:22]
[2]:
 | Codice Prodotto | Descrizione Breve Prodotto | Categoria | Prezzo |
---|---|---|---|---|
0 | A.02.35.0050 | ATTREZZATURA PER INFISSIONE PALI PILOTI | NaN | NaN |
1 | A.02.35.0050.010 | Attrezzatura per infissione pali piloti. | Noli e trasporti | 109.09 |
2 | A.02.40 | ATTREZZATURE SPECIALI | NaN | NaN |
3 | A.02.40.0010 | POMPA COMPLETA DI MOTORE | NaN | NaN |
4 | A.02.40.0010.010 | fino a mm 50. | Noli e trasporti | 2.21 |
5 | A.02.40.0010.020 | oltre mm 50 fino a mm 100. | Noli e trasporti | 3.36 |
6 | A.02.40.0010.030 | oltre mm 100 fino a mm 150. | Noli e trasporti | 4.42 |
7 | A.02.40.0010.040 | oltre mm 150 fino a mm 200. | Noli e trasporti | 5.63 |
8 | A.02.40.0010.050 | oltre mm 200. | Noli e trasporti | 6.84 |
9 | A.02.40.0020 | GRUPPO ELETTROGENO | NaN | NaN |
10 | A.02.40.0020.010 | fino a 10 KW | Noli e trasporti | 8.77 |
11 | A.02.40.0020.020 | oltre 10 fino a 13 KW | Noli e trasporti | 9.94 |
12 | A.02.40.0020.030 | oltre 13 fino a 20 KW | Noli e trasporti | 14.66 |
13 | A.02.40.0020.040 | oltre 20 fino a 28 KW | Noli e trasporti | 15.62 |
14 | A.02.40.0020.050 | oltre 28 fino a 36 KW | Noli e trasporti | 16.40 |
15 | A.02.40.0020.060 | oltre 36 fino a 56 KW | Noli e trasporti | 28.53 |
16 | A.02.40.0020.070 | oltre 56 fino a 80 KW | Noli e trasporti | 44.06 |
17 | A.02.40.0020.080 | oltre 80 fino a 100 KW | Noli e trasporti | 50.86 |
18 | A.02.40.0020.090 | oltre 100 fino a 120 KW | Noli e trasporti | 55.88 |
19 | A.02.40.0020.100 | oltre 120 fino a 156 KW | Noli e trasporti | 80.47 |
20 | A.02.40.0020.110 | oltre 156 fino a 184 KW | Noli e trasporti | 94.00 |
21 | A.02.40.0030 | NASTRO TRASPORTATORE CON MOTORE AD ARIA COMPRESSA | NaN | NaN |
Pompa completa di motore Example¶
If we look at the dataset, in some cases we can spot a pattern like the following (rows 3 to 8 included):
[3]:
df[3:12]
[3]:
 | Codice Prodotto | Descrizione Breve Prodotto | Categoria | Prezzo |
---|---|---|---|---|
3 | A.02.40.0010 | POMPA COMPLETA DI MOTORE | NaN | NaN |
4 | A.02.40.0010.010 | fino a mm 50. | Noli e trasporti | 2.21 |
5 | A.02.40.0010.020 | oltre mm 50 fino a mm 100. | Noli e trasporti | 3.36 |
6 | A.02.40.0010.030 | oltre mm 100 fino a mm 150. | Noli e trasporti | 4.42 |
7 | A.02.40.0010.040 | oltre mm 150 fino a mm 200. | Noli e trasporti | 5.63 |
8 | A.02.40.0010.050 | oltre mm 200. | Noli e trasporti | 6.84 |
9 | A.02.40.0020 | GRUPPO ELETTROGENO | NaN | NaN |
10 | A.02.40.0020.010 | fino a 10 KW | Noli e trasporti | 8.77 |
11 | A.02.40.0020.020 | oltre 10 fino a 13 KW | Noli e trasporti | 9.94 |
We see the first column holds product codes. If two rows share a code prefix, they belong to the same product type. As an example, we can take product A.02.40.0010, which has 'POMPA COMPLETA DI MOTORE' as description (‘Descrizione Breve Prodotto’ column). The first row basically tells us the product type, while the following rows specify several products of the same type (notice they all share the A.02.40.0010 prefix code, up to 'GRUPPO ELETTROGENO' excluded). Each description specifies a range of values for that product: fino a means up to, and oltre means beyond.
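The grouping described above can be sketched with plain Python. Assumptions, based only on the sample rows shown: a row without a price (NaN in pandas, None here) is a product-type header, and each following priced row whose code starts with the header code plus a dot belongs to that type. The helper name group_products is hypothetical; with a real DataFrame you would test the price with pd.isna instead of `is None`.

```python
def group_products(rows):
    """Map each product-type code to the (description, price) pairs of its products."""
    groups = {}
    current_type = None
    for code, description, price in rows:
        if price is None:
            # header row (no price): it opens a new product type
            current_type = code
            groups[code] = []
        elif current_type is not None and code.startswith(current_type + '.'):
            # priced row sharing the current type's code prefix
            groups[current_type].append((description, price))
    return groups

# Rows 3 to 11 of the sample above: (Codice Prodotto, Descrizione Breve Prodotto, Prezzo)
rows = [
    ('A.02.40.0010', 'POMPA COMPLETA DI MOTORE', None),
    ('A.02.40.0010.010', 'fino a mm 50.', 2.21),
    ('A.02.40.0010.020', 'oltre mm 50 fino a mm 100.', 3.36),
    ('A.02.40.0010.030', 'oltre mm 100 fino a mm 150.', 4.42),
    ('A.02.40.0010.040', 'oltre mm 150 fino a mm 200.', 5.63),
    ('A.02.40.0010.050', 'oltre mm 200.', 6.84),
    ('A.02.40.0020', 'GRUPPO ELETTROGENO', None),
    ('A.02.40.0020.010', 'fino a 10 KW', 8.77),
    ('A.02.40.0020.020', 'oltre 10 fino a 13 KW', 9.94),
]

for type_code, products in group_products(rows).items():
    print(type_code, len(products))
```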
Notice that:
first row has only one number
intermediate rows have two numbers
last row of the product series (row 8) has only one number and contains the word oltre (beyond) (in some other cases, the last row of a product series may have two numbers)
A1 extract_bounds¶
Write a function that, given a Descrizione Breve Prodotto as a single string, extracts the range it contains as a tuple.
If the string contains two numbers, they are the left and right bounds.
If the string contains only one number n:
if it contains UNTIL ('fino'), it is considered a first row with bounds (0, n)
if it contains BEYOND ('oltre'), it is considered a last row with bounds (n, math.inf)
DO NOT use constants like measure units 'mm', 'KW', etc. in the code
[22]:
import math
# use this list to remove unneeded stuff
PUNCTUATION = [',', '-', '.', '%']
UNTIL = 'fino'
BEYOND = 'oltre'

def extract_bounds(text):
    #jupman-raise
    fixed_text = text
    for pun in PUNCTUATION:
        fixed_text = fixed_text.replace(pun, ' ')
    words = fixed_text.split()
    i = 0
    left = None
    right = None
    # collect at most the first two numbers; 'is None' also accepts a bound equal to 0
    while i < len(words) and (left is None or right is None):
        if words[i].isdigit():
            if left is None:
                left = int(words[i])
            else:
                right = int(words[i])
        i += 1
    if right is None:        # only one number found
        if BEYOND in text:
            right = math.inf
        else:
            right = left
            left = 0
    return (left, right)
    #/jupman-raise
assert extract_bounds('fino a mm 50.') == (0,50)
assert extract_bounds('oltre mm 50 fino a mm 100.') == (50,100)
assert extract_bounds('oltre mm 200.') == (200, math.inf)
assert extract_bounds('da diametro 63 mm a diametro 127 mm') == (63, 127)
assert extract_bounds('fino a 10 KW') == (0,10)
assert extract_bounds('oltre 156 fino a 184 KW') == (156,184)
assert extract_bounds('fino a 170 A, avviamento elettrico') == (0,170)
assert extract_bounds('oltre 170 A fino a 250 A, avviamento elettrico') == (170, 250)
assert extract_bounds('oltre 300 A, avviamento elettrico') == (300, math.inf)
assert extract_bounds('tetti piani o con bassa pendenza - fino al 10%') == (0,10)
assert extract_bounds('tetti a media pendenza - oltre al 10% e fino al 45%') == (10,45)
assert extract_bounds('tetti ad alta pendenza - oltre al 45%') == (45, math.inf)
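As an alternative sketch (not the reference solution above), the numbers can be pulled out with a regular expression instead of replacing punctuation by hand. The function name extract_bounds_re is hypothetical, and the pattern assumes bounds are always non-negative integers, as in all the asserts above; decimal bounds like 2.5 would need a different pattern.

```python
import math
import re

BEYOND = 'oltre'

def extract_bounds_re(text):
    """Hypothetical regex-based variant of extract_bounds.
    Assumes bounds are non-negative integers."""
    # \d+ matches each maximal run of digits, e.g. '45' in 'al 45%'
    numbers = [int(n) for n in re.findall(r'\d+', text)]
    if len(numbers) >= 2:
        return (numbers[0], numbers[1])
    # only one number: the keyword decides which side is open
    n = numbers[0]
    if BEYOND in text:
        return (n, math.inf)
    return (0, n)
```

The regex does the tokenizing and number recognition in one step, so the PUNCTUATION list is no longer needed.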
A2 extract_product¶
Write a function that, given a filename, a code and a measure unit, parses the CSV until it finds the corresponding code and RETURNS one dictionary with relevant information for that product
Prezzo (price) must be converted to float
implement the parsing with a csv.DictReader, see example
as encoding, use latin-1
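To see what csv.DictReader yields, here is a tiny sketch on an in-memory CSV built with io.StringIO. The three sample rows are copied from the table shown above, and the header assumes the real file uses exactly the column names displayed by pandas. Each row comes out as a dict keyed by the header, and every value is a string, which is why Prezzo must be converted with float.

```python
import csv
import io

# A few rows copied from the table above; the real file would be opened with
# open(filename, encoding='latin-1', newline='') instead of io.StringIO
SAMPLE = """Codice Prodotto,Descrizione Breve Prodotto,Categoria,Prezzo
A.02.40.0010,POMPA COMPLETA DI MOTORE,,
A.02.40.0010.010,fino a mm 50.,Noli e trasporti,2.21
A.02.40.0010.020,oltre mm 50 fino a mm 100.,Noli e trasporti,3.36
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # every value is a string; empty cells become ''
    print(row['Codice Prodotto'], repr(row['Prezzo']))

price = float(rows[1]['Prezzo'])   # '2.21' -> 2.21
```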
[5]:
# Suppose we want to get all info about A.02.40.0010 prefix:
df[3:12]
[5]:
| Codice Prodotto | Descrizione Breve Prodotto | Categoria | Prezzo |
---|---|---|---|---|
3 | A.02.40.0010 | POMPA COMPLETA DI MOTORE | NaN | NaN |
4 | A.02.40.0010.010 | fino a mm 50. | Noli e trasporti | 2.21 |
5 | A.02.40.0010.020 | oltre mm 50 fino a mm 100. | Noli e trasporti | 3.36 |
6 | A.02.40.0010.030 | oltre mm 100 fino a mm 150. | Noli e trasporti | 4.42 |
7 | A.02.40.0010.040 | oltre mm 150 fino a mm 200. | Noli e trasporti | 5.63 |
8 | A.02.40.0010.050 | oltre mm 200. | Noli e trasporti | 6.84 |
9 | A.02.40.0020 | GRUPPO ELETTROGENO | NaN | NaN |
10 | A.02.40.0020.010 | fino a 10 KW | Noli e trasporti | 8.77 |
11 | A.02.40.0020.020 | oltre 10 fino a 13 KW | Noli e trasporti | 9.94 |
A call to
pprint(extract_product('EPPAT-2018-new-compact.csv', 'A.02.40.0010', 'mm'))
Must produce:
{'category': 'Noli e trasporti',
'code': 'A.02.40.0010',
'description': 'POMPA COMPLETA DI MOTORE',
'measure_unit': 'mm',
'models': [{'bounds': (0, 50), 'price': 2.21, 'subcode': '010'},
{'bounds': (50, 100), 'price': 3.36, 'subcode': '020'},
{'bounds': (100, 150), 'price': 4.42, 'subcode': '030'},
{'bounds': (150, 200), 'price': 5.63, 'subcode': '040'},
{'bounds': (200, math.inf),'price': 6.84, 'subcode': '050'}]}
Notice that if we append subcode to code (with a dot in between) we obtain the full product code.
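The relation between code, subcode and full code can be sketched with two small helpers (subcode_of and full_code_of are hypothetical names, not part of the exercise). Checking the prefix code + '.' rather than just code avoids accidentally matching a longer code that merely starts with the same digits.

```python
def subcode_of(full_code, code):
    """Hypothetical helper: return the subcode if full_code belongs to product code, else None."""
    prefix = code + '.'
    if full_code.startswith(prefix):
        return full_code[len(prefix):]   # strip 'code.' from the front
    return None

def full_code_of(code, subcode):
    """Inverse operation: append the subcode to the code, with a dot in between."""
    return code + '.' + subcode

print(subcode_of('A.02.40.0010.030', 'A.02.40.0010'))   # 030
```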
[6]:
import csv
from pprint import pprint

def extract_product(filename, code, measure_unit):
    #jupman-raise
    ret = None   # stays None if the code is never found
    with open(filename, encoding='latin-1', newline='') as f:
        my_reader = csv.DictReader(f, delimiter=',')  # Notice we now used DictReader
        for d in my_reader:
            # row of the product type: holds the general description
            if d['Codice Prodotto'] == code:
                ret = {}
                ret['description'] = d['Descrizione Breve Prodotto']
                ret['code'] = code
                ret['measure_unit'] = measure_unit
                ret['models'] = []
            # model rows: their code is code + '.' + subcode
            if d['Codice Prodotto'].startswith(code + '.'):
                ret['category'] = d['Categoria']
                subdiz = {}
                subdiz['price'] = float(d['Prezzo'])
                subdiz['subcode'] = d['Codice Prodotto'][len(code)+1:]
                subdiz['bounds'] = extract_bounds(d['Descrizione Breve Prodotto'])
                ret['models'].append(subdiz)
    return ret
    #/jupman-raise
pprint(extract_product('EPPAT-2018-new-compact.csv', 'A.02.40.0010', 'mm'))
assert extract_product('EPPAT-2018-new-compact.csv', 'A.02.40.0010', 'mm') == \
{'category': 'Noli e trasporti',
'code': 'A.02.40.0010',
'description': 'POMPA COMPLETA DI MOTORE',
'measure_unit': 'mm',
'models': [{'bounds': (0, 50), 'price': 2.21, 'subcode': '010'},
{'bounds': (50, 100), 'price': 3.36, 'subcode': '020'},
{'bounds': (100, 150), 'price': 4.42, 'subcode': '030'},
{'bounds': (150, 200), 'price': 5.63, 'subcode': '040'},
{'bounds': (200, math.inf),'price': 6.84, 'subcode': '050'}]}
#pprint(extract_product('EPPAT-2018-new-compact.csv', 'A.02.40.0020', 'KW'))
#pprint(extract_product('EPPAT-2018-new-compact.csv', 'B.02.10.0042', 'mm'))
#pprint(extract_product('EPPAT-2018-new-compact.csv','B.30.10.0010', '%'))
{'category': 'Noli e trasporti',
'code': 'A.02.40.0010',
'description': 'POMPA COMPLETA DI MOTORE',
'measure_unit': 'mm',
'models': [{'bounds': (0, 50), 'price': 2.21, 'subcode': '010'},
{'bounds': (50, 100), 'price': 3.36, 'subcode': '020'},
{'bounds': (100, 150), 'price': 4.42, 'subcode': '030'},
{'bounds': (150, 200), 'price': 5.63, 'subcode': '040'},
{'bounds': (200, inf), 'price': 6.84, 'subcode': '050'}]}
A3 plot_product¶
Implement the following function, which takes a dictionary as output by the previous extract_product and shows its price ranges.
pay attention to display title and axis labels as shown, using input data and not constants.
in case the last range holds a math.inf, show a > sign
if you don't have a working extract_product, just copy-paste data from the previous asserts.
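The tick labels are the only fiddly part of the plot. Here is a sketch of just that logic as a pure function (make_tick_labels is a hypothetical name), kept separate from matplotlib so it can be checked with asserts before plotting:

```python
import math

def make_tick_labels(models):
    """Hypothetical helper: one label per model, '>n' for an open-ended range."""
    labels = []
    for model in models:
        low, high = model['bounds']
        if high == math.inf:
            labels.append('>%s' % low)
        else:
            labels.append('%s - %s' % (low, high))
    return labels

models = [{'bounds': (0, 50), 'price': 2.21, 'subcode': '010'},
          {'bounds': (200, math.inf), 'price': 6.84, 'subcode': '050'}]
labels = make_tick_labels(models)
print(labels)   # ['0 - 50', '>200']
```

The resulting list can then be passed to plt.xticks together with the bar positions.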
[7]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

def plot_product(product):
    #jupman-raise
    models = product['models']
    xs = np.arange(len(models))
    ys = [model["price"] for model in models]
    plt.bar(xs, ys, 0.5, align='center')
    plt.title('%s (%s) SOLUTION' % (product['description'], product['code']))
    ticks = []
    for model in models:
        bounds = model["bounds"]
        if bounds[1] == math.inf:      # open-ended last range
            ticks.append('>%s' % bounds[0])
        else:
            ticks.append('%s - %s' % (bounds[0], bounds[1]))
    plt.xticks(xs, ticks)
    plt.gcf().set_size_inches(11, 8)
    plt.xlabel(product['measure_unit'])
    plt.ylabel('Price (€)')
    plt.savefig('pompa-a-motore-solution.png')
    plt.show()
    #/jupman-raise
product = extract_product('EPPAT-2018-new-compact.csv', 'A.02.40.0010', 'mm')
#product = extract_product('EPPAT-2018-new-compact.csv', 'A.02.40.0020', 'KW')
#product = extract_product('EPPAT-2018-new-compact.csv', 'B.02.10.0042', 'mm')
#product = extract_product('EPPAT-2018-new-compact.csv','B.30.10.0010', '%')
plot_product(product)

Part B¶
B1 Theory¶
Write the solution in a separate theory.txt file
B1.1 complexity¶
Given a list L of n elements, please compute the asymptotic computational complexity of the following function, explaining your reasoning.
def my_fun(L):
    n = len(L)
    tmp = []
    for i in range(int(n)):
        tmp.insert(0, L[i] - L[int(n/3)])
    return sum(tmp)
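One way to reason about it, a sketch assuming CPython's list implementation (a dynamic array, where tmp.insert(0, x) has to shift every element already stored): len(L), int(n/3) and the two indexings take constant time; at iteration i the list tmp already holds i elements, so the insertion costs time proportional to i; the final sum(tmp) scans n elements. With generic constants c_1, c_2, c_3:

\[
T(n) = \sum_{i=0}^{n-1} \left(c_1 + c_2\, i\right) + c_3\, n = c_1 n + c_2 \frac{n(n-1)}{2} + c_3 n \in \Theta(n^2)
\]

The quadratic term comes only from inserting at the front: appending at the end would cost amortized constant time per iteration.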
B1.2 describe¶
Briefly describe what a graph is and the two classic ways that can be used to represent it as a data structure.
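A minimal sketch of the two classic representations for a made-up directed graph with vertices a, b, c and edges a -> b, a -> c, c -> b: an adjacency list stored as a dict of lists, and an adjacency matrix stored as a list of lists. The helper to_adj_matrix is hypothetical, written only to show how one form maps onto the other.

```python
# Hypothetical directed graph: a -> b, a -> c, c -> b
adj_list = {
    'a': ['b', 'c'],
    'b': [],
    'c': ['b'],
}

def to_adj_matrix(adj_list):
    """Convert a dict-of-lists adjacency list into (vertices, matrix).
    matrix[i][j] is 1 if there is an edge from vertices[i] to vertices[j]."""
    vertices = sorted(adj_list)                      # fix an order for rows/columns
    index = {v: i for i, v in enumerate(vertices)}   # vertex -> row/column number
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]             # n x n, all zeros
    for source, targets in adj_list.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1
    return vertices, matrix

vertices, matrix = to_adj_matrix(adj_list)
print(matrix)   # [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
```

The matrix always takes n² cells, while the adjacency list takes space proportional to the number of vertices plus edges, which is why lists are usually preferred for sparse graphs.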
B2 couple_sort¶
Open a text editor and edit file linked_list_exercise.py. Implement this method:
def couple_sort(self):
    """MODIFIES the linked list by considering couples of nodes at *even* indexes
       and their successors: if a node data is greater than its successor data, swaps
       the nodes *data*.
       - ONLY swap *data*, DO NOT change node links.
       - if linked list has odd size, simply ignore the exceeding node.
       - MUST execute in O(n), where n is the size of the list
    """
Testing: python3 -m unittest linked_list_test.CoupleSortTest
Example:
[8]:
from linked_list_solution import *
from linked_list_test import to_ll
[9]:
ll = to_ll([4,3,5,2,6,7,6,3,2,4,5,3,2])
[10]:
print(ll)
LinkedList: 4,3,5,2,6,7,6,3,2,4,5,3,2
[11]:
ll.couple_sort()
[12]:
print(ll)
LinkedList: 3,4,2,5,6,7,3,6,2,4,3,5,2
Notice it sorted each couple at even positions. This particular linked list has odd size (13 items), so last item 2 was not considered.
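A minimal sketch of the technique, under stated assumptions: the Node and LinkedList classes below are hypothetical stand-ins (the course's own classes in linked_list_exercise.py may use different attribute and method names), and the comparison follows the example above, swapping when the first node of a couple holds the greater value. The walk advances two nodes at a time, so each node is visited once: O(n).

```python
class Node:
    """Minimal hypothetical node; the course's Node class may differ."""
    def __init__(self, data, next_node=None):
        self._data = data
        self._next = next_node

class LinkedList:
    """Minimal hypothetical singly linked list with just what couple_sort needs."""
    def __init__(self, items=()):
        self._head = None
        for item in reversed(list(items)):   # build by prepending
            self._head = Node(item, self._head)

    def to_python(self):
        ret = []
        current = self._head
        while current is not None:
            ret.append(current._data)
            current = current._next
        return ret

    def couple_sort(self):
        current = self._head
        # stop when fewer than two nodes remain (odd size: last node ignored)
        while current is not None and current._next is not None:
            succ = current._next
            if current._data > succ._data:
                # swap data only, links stay untouched
                current._data, succ._data = succ._data, current._data
            current = succ._next   # jump to the next even index

ll = LinkedList([4, 3, 5, 2, 6, 7, 6, 3, 2, 4, 5, 3, 2])
ll.couple_sort()
print(ll.to_python())   # [3, 4, 2, 5, 6, 7, 3, 6, 2, 4, 3, 5, 2]
```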
B3 schedule_rec¶
Suppose the nodes of a binary tree represent tasks (nodes data is the task label). Each task may have up to two subtasks, represented by its children. To be declared as completed, each task requires first the completion of all of its subtasks.
We want to create a schedule of tasks such that, before the task at the root of the tree can be declared completed, all tasks below it are completed: first the tasks on the left side, then the tasks on the right side. Applying this reasoning recursively, you obtain a schedule of tasks to be executed.
Open bin_tree_exercise.py and implement this method:
def schedule_rec(self):
    """ RETURN a list of task labels in the order they will be completed.
        - Implement it with recursive calls.
        - MUST run in O(n) where n is the size of the tree
        NOTE: with big trees a recursive solution would surely
              exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.ScheduleRecTest
Example:
For this tree, it should return the schedule ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
Here we show code execution with the same tree:
[13]:
from bin_tree_solution import *
from bin_tree_test import bt
[17]:
tasks = bt('i',
            bt('d',
                bt('b',
                    bt('a')),
                bt('c')),
            bt('h',
                bt('f',
                    None,
                    bt('e')),
                bt('g')))
[18]:
print(tasks)
i
├d
│├b
││├a
││└
│└c
└h
 ├f
 │├
 │└e
 └g
[15]:
tasks.schedule_rec()
[15]:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
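The schedule described above is a post-order visit (left subtree, right subtree, then the node itself). Here is a sketch on a minimal hypothetical BinaryTree class (the course's class in bin_tree_exercise.py may use different names). Appending to one shared list keeps the whole visit O(n); building and concatenating a new list at every level could degrade to O(n²) on unbalanced trees.

```python
class BinaryTree:
    """Minimal hypothetical binary tree; the course's BinaryTree class may differ."""
    def __init__(self, data, left=None, right=None):
        self._data = data
        self._left = left
        self._right = right

    def schedule_rec(self):
        """Post-order visit: left subtree, right subtree, then this task."""
        ret = []
        self._schedule(ret)
        return ret

    def _schedule(self, acc):
        # subtasks first, left before right, then the task itself
        if self._left is not None:
            self._left._schedule(acc)
        if self._right is not None:
            self._right._schedule(acc)
        acc.append(self._data)

bt = BinaryTree   # short alias, mimicking the bt helper used above
tasks = bt('i',
            bt('d', bt('b', bt('a')), bt('c')),
            bt('h', bt('f', None, bt('e')), bt('g')))
print(tasks.schedule_rec())   # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```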
2017-18 (QCB)¶
See QCB master past exams on sciprolab2.readthedocs.io
NOTE: Those exams are useful, but for you there will be:
no biological examples
less dynamic programming
more exercises on graphs & matrices
exercise on pandas
custom DiGraph won’t have Visit and VertexLog classes
2016-17 (QCB)¶
See davidleoni.github.io/algolab/past-exams.html
WARNING: keep in mind that 2016-17 exams are for Python 2 - in this course we use Python 3
Slides 2019/20¶
Old slides: 2018/19 slides
Part A¶
Lab A.1¶
Tuesday 24 Sep 2019
Links¶
lab site: datasciprolab.readthedocs.org
Questionnaire: https://tinyurl.com/y6nlnx7l
First Google Colab (for code shown during lesson)
Lesson: Introduction
Lesson (maybe): Python basics
What I expect¶
if you don’t program in Python, you don’t learn Python
you don’t learn Python if you don’t program in Python
to be a successful data scientist, you must know programming
Exercise: now put the right priorities in your TODO list ;-)
Course contents¶
Hands-on approach
Part A - python intro
logic basics
discrete structures basics
python basics
data cleaning
format conversion (matrices, tables, graphs, …)
visualization (matplotlib, graphviz)
some analytics (with pandas)
focus on correct code, don’t care about performance
plus: some software engineering wisdom
Part A exams:
There will always be some practical structured exercise
Examples:
Sometimes there can also be a more abstract exercise with matrices / relations (e.g. surjective relations)
Part B - algorithms
going from theory taught by Prof. Luca Bianco to Python 3 implementation
performance matters
few Python functions
Python Tutor¶
Let’s meet Python on the web: Python Tutor is a great way to visualize Python code.
Use it as much as possible! It really provides great guidance about how things are working under the hood.
By default it works for standard Python code. If you want to use it also with code from modules (e.g. numpy) you have to select Write code in Python3 with Anaconda (experimental)
Anaconda
System console
Jupyter
Some data types example¶
mutable vs immutable
Examples for
int
float
string
boolean
warning: everything in Python can be interpreted as a boolean!
‘empty’ objects are considered false: None, zero 0, empty string "", empty list [], empty dict dict()
list
Especially when there are examples involving lists, try them in Python tutor !!!!
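A quick way to check how Python interprets a value as a boolean is to pass it to bool(), which gives the same answer an if would use. Note that a list containing a "false" value, like [0], is not empty, so it counts as true.

```python
empty_values = [None, 0, 0.0, "", [], dict()]
non_empty_values = [7, -1, "ciao", [0], {"a": 1}]

for v in empty_values:
    print(repr(v), "->", bool(v))      # each prints False
for v in non_empty_values:
    print(repr(v), "->", bool(v))      # each prints True

if not []:
    print("an empty list counts as False")
```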
Let’s start:¶
Lab A.2¶
Thursday 26 Sep 2019
Links¶
If you didn’t yet, please fill questionnaire: https://tinyurl.com/y6nlnx7l
Second Google Colab (for code shown during lesson)
Lesson: Python basics
Lab A.3¶
Tuesday 1st Oct 2019
Links¶
If you didn’t yet, please fill questionnaire: https://tinyurl.com/y6nlnx7l
Third Google Colab (for code shown during lesson)
Lesson: Strings (NOTE: redownload it, was updated)
Material for Part A is being restructured, always pay attention to News section
In general, Part A basic notebooks will be divided into three parts:
Introduction - does not require any prior knowledge
Exercises with functions: require knowing complex statements, functions, cycles
Verify your comprehension: require knowing complex statements, functions, cycles and also Error handling and testing with asserts
Lab A.4¶
Thursday 3rd Oct 2019
Links¶
Fourth Google Colab (for code shown during lesson)
Lesson: Lists (NOTE: redownload it, was updated)
Material for Part A is being restructured, always pay attention to News section
Lab A.5¶
Tuesday 8 Oct 2019
Links¶
Fifth Google Colab (for code shown during lesson)
Lesson: Sets
Lesson: Tuples
Lesson: Dictionaries (NOTE: redownload it, was updated)
Material for Part A is being restructured, always pay attention to News section
SAVE THE DATE:
MIDTERM PART A SIMULATION: 31 october 15:30-17:30 room a202
MIDTERM PART A: 7 november 11:30-13:30 room b106
IMPORTANT: unlike past Part A exams, there will also be an exercise on pandas.
Lab A.6¶
Thursday 10 Oct 2019
Links¶
Sixth Google Colab (for code shown during lesson)
Lesson: Dictionaries (continued)
Lesson: Errors and testing (until testing with assert included)
Lesson: Matrices: list of lists
Material for Part A is being restructured, always pay attention to News section
SAVE THE DATE:
MIDTERM PART A SIMULATION: 31 october 15:30-17:30 room a202
MIDTERM PART A: 7 november 11:30-13:30 room b106
IMPORTANT: unlike past Part A exams, there will also be an exercise on Pandas.
Lab A.7¶
Tuesday 15 Oct 2019
Links¶
Seventh Google Colab (for code shown during lesson)
Lesson (continued): Matrices: list of lists
Lesson: Data formats
Lab A.8¶
Thursday 17 Oct 2019
Links¶
Eighth Google Colab (for code shown during lesson)
Lesson: Graph formats until Adjacency lists excluded
Lab A.9¶
Tuesday 22 Oct 2019
Links¶
Ninth Google Colab (for code shown during lesson)
Lesson: Graph formats from Adjacency lists included until end
mention: Binary relations (finish it at home)
Lab A.10¶
Thursday 24 Oct 2019
Links¶
Tenth Google Colab (for code shown during lesson)
lesson: Numpy matrices
lesson: Visualization
Lab A.11¶
Tuesday 29 Oct 2019
Links¶
Eleventh Google Colab (for code shown during lesson)
lesson: Pandas
Lab B.1¶
Tuesday 12 November
OOP (first part until magnitude)
Remember that from now on we only use Visual Studio Code
Lab B.2¶
Thursday 14 November
OOP (finish)
At home: try to implement MultiSet class
Remember that from now on we only use Visual Studio Code
Lab B.6¶
Thursday 28 November
Stacks CappedStack, Tasks, (maybe) Stacktris
At home: try to finish whole Stacks worksheet
Lab B.7¶
Tuesday 3 December
Queues (CircularQueue and ItalianQueue)
At home: try to finish whole Queues worksheet
Lab B.12¶
Thursday 19 December
Trees again: Binary Search Trees (added exercises 2.7 and following)
See also Further resources from LeetCode
Commandments¶
The Supreme Committee for the Doctrine of Coding has ruled important Commandments you shall follow.
If you accept their wise words, you shall become a true Python Jedi.
WARNING: if you don’t follow the Commandments, bad things shall happen.
COMMANDMENT 1: You shall test!
To run tests, enter the following command in the terminal:
Windows Anaconda:
python -m unittest my-file
Linux/Mac: remember the 3 after the python command:
python3 -m unittest my-file
WARNING: In the call above, DON’T append the extension .py to my-file
WARNING: Still, on the hard-disk the file MUST be named with a .py at the end, like my-file.py
WARNING: If strange errors occur, make sure you are using Python version 3. Just run the interpreter and it will display the current version.
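A minimal sketch of what such a test file might contain (the function add and the file name my_file.py are made up for illustration). Saved as my_file.py, it would be run with python3 -m unittest my_file; the last two lines run the same tests programmatically, so the snippet is self-contained.

```python
import unittest

def add(x, y):
    """Hypothetical function under test."""
    return x + y

class AddTest(unittest.TestCase):
    # every method whose name starts with 'test' is run by unittest
    def test_small_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_negative(self):
        self.assertEqual(add(-2, -3), -5)

# roughly what 'python3 -m unittest my_file' does from the console
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
result = unittest.TextTestRunner(verbosity=1).run(suite)
```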
COMMANDMENT 2: You shall also write on paper!
If staring at the monitor doesn’t work, help yourself and draw a representation of the state of the program. Tables, nodes, arrows: all can help in figuring out a solution for the problem.
COMMANDMENT 3: You shall copy exactly the same function definitions as in the exercises!
For example, don’t write:
def MY_selection_sort(A):
COMMANDMENT 4: You shall never ever reassign function parameters
def myfun(i, s, L, D, C):
    # You shall not do any of such evil, no matter what the type of the parameter is:
    i = 666                 # basic types (int, float, ...)
    s = "evil"              # strings
    L = [666]               # containers
    D = {"evil":666}        # dictionaries

    # For the sole case of composite parameters like lists, dictionaries or objects,
    # you can write stuff like this IF AND ONLY IF the function specification
    # requires you to modify the parameter internal elements (e.g. sorting a list
    # or changing a dictionary field):
    L[4] = 2                # list
    D["my field"] = 5       # dictionary
    C.my_field = 7          # class
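Reassigning a parameter is not only confusing, it also has no effect outside the function: it just rebinds the local name, while modifying the internal elements changes the very object the caller sees. A small sketch (the function names are made up for illustration):

```python
def reassign(L):
    L = [666]        # the sin: rebinds the local name only, the caller's list is untouched

def mutate(L):
    L[0] = 666       # changes the object the caller also sees

data = [1, 2, 3]
reassign(data)
after_reassign = data[:]   # copy: still [1, 2, 3]
mutate(data)
print(after_reassign, data)   # [1, 2, 3] [666, 2, 3]
```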
COMMANDMENT 5: You shall never ever reassign self:
Never ever write horrors such as:
class MyClass:
    def my_method(self, x, y):
        self = {'a':666}  # since self is a kind of dictionary, you might be tempted to do like this
                          # but to the outside world this will bring no effect.
                          # For example, let's say somebody from outside makes a call like this:
                          #    mc = MyClass()
                          #    mc.my_method(1, 2)
                          # after the call mc will not point to {'a':666}
        self = ['evil']   # self is only supposed to be a sort of dictionary and passed from outside
        self = 6          # self is only supposed to be a sort of dictionary and passed from outside
COMMANDMENT 6: You shall never ever assign values to function nor method calls
WRONG WRONG:
my_fun() = 666
my_fun() = 'evil'
my_fun() = [666]
CORRECT:
With the assignment operator we want to store in the left side a value from the right side, so all of these are valid operations:
x = 5
y = my_fun()
z = [5]
z[0] = 7
d = dict()
d["a"] = 6
Function calls such as my_fun() instead return the results of calculations in a box that is created just for the purpose of the call, and Python will just not allow us to reuse it as a variable. So whenever you see ‘name()’ on the left side, it can’t possibly be followed by one equality sign = (but it can be followed by two equality signs == if you are performing a comparison).
COMMANDMENT 7: You shall use return command only if you see written “return” in the function description!
If there is no return in the function description, the function is intended to return None. In this case you don’t even need to write return None, as Python will do it implicitly for you.
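A tiny illustration (greet is a made-up example function): a function without any return statement still gives back a value, and that value is None.

```python
def greet(name):
    """Hypothetical example: it only prints, its description says nothing about returning."""
    print("Hello", name)
    # no return statement: Python returns None implicitly

result = greet("Python")
print(result)   # None
```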
COMMANDMENT 8: You shall never ever redefine system functions
Python has system-defined functions: for example, list is a Python type. As such, you can use it, for example, as a function to convert some type to a list:
[1]:
list("ciao")
[1]:
['c', 'i', 'a', 'o']
When you allow the forces of evil to get the best of you, you might be tempted to use built-in names like list as variables for your own miserable purposes:
[2]:
list = ['my', 'pitiful', 'list']
Python allows you to do so, but we do not, for the consequences are disastrous.
For example, if you now attempt to use list for its intended purpose, like casting to list, it won’t work anymore:
list("ciao")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-c63add832213> in <module>()
----> 1 list("ciao")
TypeError: 'list' object is not callable
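If the sin has already been committed at the top level of a script or notebook, deleting the variable with del makes the name list refer to the built-in again; the original function is also always reachable through the builtins module. A short sketch:

```python
import builtins

list = ['my', 'pitiful', 'list']   # the sin: shadows the built-in list
# list("ciao") here would raise TypeError: 'list' object is not callable

also_letters = builtins.list("ciao")   # works even while list is shadowed
del list                               # remove the shadowing global name
letters = list("ciao")                 # the built-in is visible again
print(letters)   # ['c', 'i', 'a', 'o']
```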
COMMANDMENT 9: Whenever you introduce a variable in a cycle, such variable must be new
If you read carefully Commandment 4 you should not need to be reminded of this Commandment, nevertheless it is always worth restating the Right Way.
If you defined a variable before, you shall not reintroduce it in a for, since it is as confusing as reassigning function parameters.
So avoid these sins:
[3]:
i = 7
for i in range(3): # sin, you lose i variable
    print(i)
0
1
2
[4]:
def f(i):
    for i in range(3):     # sin again, you lose i parameter
        print(i)
[5]:
for i in range(2):
    for i in range(3):     # debugging hell, you lose i from outer for
        print(i)
0
1
2
0
1
2
Introduction solutions¶
Download exercises zip¶
In this practical we will set up a working Python3 development environment and will start familiarizing a bit with Python.
There are many ways to write and execute Python code:
Python tutor (online, visual debugger)
Python interpreter (command line)
Visual Studio Code (editor, good debugger)
Jupyter (notebook)
Google Colab (online, collaborative)
During this lab we will see all of them and get familiar with the format of the exercises. For now ignore the exercises zip and proceed reading.
Installation¶
You will need to install several pieces of software to get a working programming environment. In this section we will install everything that we are going to need in the next few weeks.
Python3 is available for Windows, Mac and Linux. Python3 alone is often not enough, and you will need to install extra system-specific libraries + editors like Visual Studio Code and Jupyter.
Windows/Mac installation¶
To avoid hassles, especially on Windows / Mac, you should install a so-called package manager (Linux distributions already come with one). Among the many options, for this course we use the package manager Anaconda for Python 3.7.
Install Anaconda for Python 3.7 (the Anaconda installer will also ask you to install Visual Studio Code, so accept the kind offer)
If you didn’t do so in the previous step, now install Visual Studio Code, which is available for all platforms. You can read about it here. Downloads for all platforms can be found here
Linux installation¶
Although you can install Anaconda on Linux, it is usually better to use the system package manager that comes with your distribution.
Check the Python interpreter - most probably you already have one in your distribution, but you have to check it is the right version. In this course we will use python version 3.x. Open a terminal and try typing in:
python3
if you get an error like “python3 command not found” , try typing
python
if you get something like this (mind the version 3):
you are already sorted, just type Ctrl-D to exit. If it doesn’t work, try typing exit() and hit Enter
Otherwise you need to install Python 3.
Linux, Debian-like (e.g. Ubuntu):
Issue the following commands on a terminal:
sudo apt-get update
sudo apt-get install python3
Linux Fedora:
Issue the following commands on a terminal:
sudo dnf install python3
Now install the package manager pip, which is a very convenient tool to install Python packages, with the following command (on Fedora the command above should have already installed it):
sudo apt-get install python3-pip
Note:
If pip is already installed in your system you will get a message like: python3-pip is already the newest version (3.x.y)
Install Jupyter notebook:
Open the system console and copy and paste this command:
python3 -m pip install --user jupyter -U
It will install jupyter in your user home.
Python tutor¶
Let’s meet Python on the web: Python Tutor is a great way to visualize Python code.
Use it as much as possible! It really provides great guidance about how things are working under the hood.
By default it works for standard Python code. If you want to use it also with code from modules (e.g. numpy) you have to select Write code in Python3 with Anaconda (experimental)
System console¶
Let’s look at the operating system console. In Anaconda installations you must open it with Anaconda Prompt (if you have a Mac but not Anaconda, open the Terminal). We assume Linux users can get around their way.
WARNING: In the system console we are entering commands for the operating system, using the system command language which varies for each operating system. So following commands are not Python !
to see the files of the folder you are in, you can type dir in Windows and ls in Mac/Linux
to enter a folder:
cd MYFOLDER
to leave a folder:
cd ..
mind the space between cd and the two dots
Python interpreter¶
To start the Python interpreter, from system console run (to open it see previous paragraph)
python
You will see the Python interpreter (the one with >>>), where you can directly issue commands and see the output. If for some reason it doesn’t work, try running
python3
WARNING: you must be running Python 3; in this course we only use that version! Please check you are indeed using version 3 by looking at the interpreter banner, which should read something similar to this:
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
WARNING: if you take random code from the internet, be sure it is for Python 3
WARNING: the >>> is there just to tell you that you are looking at the Python interpreter. It is not Python code! If you find >>> written in some code example, do not copy it!
Now we are all set to start interacting with the Python interpreter. First make sure you are inside the interpreter (you should see a >>> in the console; if not, see the previous paragraph), then type in the following instructions:
[2]:
5 + 3
[2]:
8
All as expected. The “In [1]” line is the input, while the “Out [1]” reports the output of the interpreter. Let’s challenge python with some other operations:
[3]:
12 / 5
[3]:
2.4
[4]:
1/133
[4]:
0.007518796992481203
[5]:
2**1000
[5]:
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
And some assignments:
[6]:
a = 10
b = 7
s = a + b
d = a / b
print("sum is:",s, " division is:",d)
sum is: 17 division is: 1.4285714285714286
In the first four lines, values have been assigned to variables through the = operator. In the last line, the print function is used to display the output. For the time being, we will skip all the details and just notice that the print function somehow managed to take text and variables as input and coherently merged them into an output text. Although useful on some occasions, the console is quite limited, so you can close it for now. To exit, type Ctrl-D or exit().
Visual Studio Code¶
Visual Studio Code is an Integrated Development Environment (IDE) for text files. It can handle many languages, Python included (Python programs are text files ending in .py).
Features:
open source
lightweight
used by many developers
Python plugin is not the best, but works enough for us
Once you open the IDE Visual Studio Code you will see the welcome screen:
You can find useful information on this tool here. Please spend some time having a look at that page.
Once you are done with it you can close this window pressing on the “x”. First thing to do is to set the python interpreter to use. Click on View –> Command Palette and type “Python” in the text search space. Select Python: Select Workspace Interpreter as shown in the picture below.
Finally, select the python version you want to use (e.g. Python3).
Now you can click on Open Folder to create a new folder to place all the scripts you are going to create. You can call it something like “exercises”. Next you can create a new file, example1.py (.py extension stands for python).
Visual Studio Code will understand that you are writing Python code and will help you with valid syntax for your program.
Warning:
If you get the following error message:
click on Install Pylint which is a useful tool to help your coding experience.
Add the following text to your example1.py file.
[7]:
"""
This is the first example of Python script.
"""
a = 10 # variable a
b = 33 # variable b
c = a / b # variable c holds the ratio
# Let's print the result to screen.
print("a:", a, " b:", b, " a/b=", c)
a: 10 b: 33 a/b= 0.30303030303030304
A couple of things worth noting. The first three lines, opened and closed by """, are some text describing the content of the script. Moreover, comments are preceded by the hash key (#) and they are just ignored by the Python interpreter. Please remember to comment your code, as it helps readability and will make your life easier when you have to modify or just understand the code you wrote some time in the past.
Please notice that Visual Studio Code will help you writing your Python scripts. For example, when you start writing the print line it will complete the code for you (if the Pylint extension mentioned above is installed), suggesting the functions that match the letters written. This useful feature is called code completion and, alongside suggesting possible matches, it also visualizes a description of the function and parameters it needs. Here is an example:
Save the file (Ctrl+S as shortcut). It is convenient to ask the IDE to highlight potential syntactic problems found in the code. You can toggle this function on/off by clicking on View –> Problems. The Problems panel should look like this
Visual Studio Code is warning us that the variable names a,b,c at lines 4,5,6 do not follow Python naming conventions for constants. This is because they have been defined at the top level (there is no structure to our script yet) and therefore are interpreted as constants. The naming convention for constants states that they should be in capital letters. To amend the code, you can just replace all the names with the corresponding capitalized name (i.e. A,B,C). If you do that, and you save the file again (Ctrl+S), you will see all these problems disappearing as well as the green underlining of the variable names. If your code does not have an empty line before the end, you might get another warning “Final new line missing”. Note that these were just warnings and the interpreter in this case will happily and correctly execute the code anyway, but it is always good practice to understand what the warnings are telling us before deciding to ignore them!
Had we by mistake misspelled the print function name (something that should not happen with the code completion tool that suggests function names!) writing printt (note the double t), upon saving the file the IDE would have underlined the function name in red and flagged it up as a problem.
This is because the builtin function printt does not exist and the python interpreter does not know what to do when it reads it. Note that printt is actually underlined in red, meaning that there is an error which will cause the interpreter to stop the execution with a failure. Please remember that before running any piece of code all errors must be fixed.
Now it is time to execute the code. By right-clicking in the code panel and selecting Run Python File in Terminal (see picture below) you can execute the code you have just written.
Upon clicking on Run Python File in Terminal a terminal panel should pop up in the lower section of the coding panel and the result shown above should be reported.
Saving script files like the example1.py above is also handy because they can be invoked several times (later on we will learn how to get inputs from the command line to make them more useful…). To do so, you just need to call the Python interpreter passing the script file as parameter. From the folder containing the example1.py script:
python3 example1.py
will in fact return:
a: 10 b: 33 a/b= 0.30303030303030304
Before ending this section, let me add another note on errors. The IDE will diligently point out syntactic warnings and errors (i.e. errors/warnings concerning the structure of the written code, like names of functions, number and type of parameters, etc.) but it will not detect semantic or runtime errors (i.e. errors connected to the meaning of your code or to the value of your variables). These sorts of errors will most probably make your code crash or may produce unexpected results/behaviours. In the next section we will introduce the debugger, which is a useful tool to help detect these errors.
Before getting into that, consider the following lines of code (do not focus on the import line, this is only to load the mathematics module and use its method sqrt):
[8]:
"""
Runtime error example, compute square root of numbers
"""
import math
A = 16
B = math.sqrt(A)
C = 5*B
print("A:", A, " B:", B, " C:", C)
#D = math.sqrt(A-C) # whoops, A-C is now -4!!!
#print(D)
A: 16 B: 4.0 C: 20.0
If you add that code to a Python file (e.g. sqrt_example.py), uncomment the last two lines, save it and try to execute it, you should get a ValueError (math domain error). You can see that the interpreter has happily printed the values of A, B and C, but then stumbled into an error at line 9 when trying to compute \(\sqrt{A-C} = \sqrt{-4}\), because the sqrt function of the math module cannot be applied to negative values (i.e. it works in the domain of real numbers).
Please take some time to familiarize yourself with Visual Studio Code (creating files, saving files etc.) as in the next practicals we will take this ability for granted.
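To see the failure without the whole script crashing, the offending call can be wrapped so that the error is caught and reported. This is a minimal sketch: the try/except construct is used only for illustration and is not covered in this practical.

```python
import math

A = 16
B = math.sqrt(A)   # 4.0
C = 5 * B          # 20.0

try:
    D = math.sqrt(A - C)   # A - C is -4.0, outside the domain of math.sqrt
except ValueError as err:
    # the exact wording of the message may vary between Python versions
    print("Could not compute the square root:", err)
    D = None

print("A:", A, " B:", B, " C:", C, " D:", D)
```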
The debugger¶
Another important feature of advanced Integrated Development Environments (IDEs) is their debugging capabilities. Visual Studio Code comes with a debugging tool that can help you trace the execution of your code and understand where possible errors hide.
Write the following code in a new file (let’s call it integer_sum.py) and execute it to get the result.
[9]:
""" integer_sum.py is a script to
compute the sum of the first 1200 integers. """
S = 0
for i in range(0, 1201):
    S = S + i    # indented: this line is inside the for loop
print("The sum of the first 1200 integers is: ", S)
The sum of the first 1200 integers is: 720600
Without getting into too many details, the code you just wrote starts by initializing a variable S to zero, and then loops from 0 to 1200, assigning each value in turn to a variable i and accumulating the sum S + i in the variable S. A final thing to notice is indentation. In Python it is important to indent the code properly, as indentation defines which lines belong to a block (e.g. see that the line S = S + i starts more to the right than the previous and following lines: this is because it is inside the for loop). You do not have to worry about this for the time being, we will get to it in a later practical...
How does this code work? How does the value of S and i change as the code is executed? These are questions that can be answered by the debugger.
To start the debugger, click on Debug –> Start Debugging (shortcut F5). The following small panel should pop up:
We will use it shortly, but before that, let’s focus on what we want to track. On the left hand side of the main panel, a Watch panel appeared. This is where we need to add the things we want to monitor as the program executes. With respect to the code written above, we are interested in keeping an eye on the variables S and i and also on the expression S+i (which gives the value S will have at the next iteration). Add these three expressions in the watch panel (click on + to add new expressions). The watch panel should look like this:
Do not worry about the message “name X is not defined”: this is normal, as no execution has taken place yet and the interpreter does not know the value of these expressions.
The final thing before starting to debug is to set some breakpoints, places where the execution will stop so that we can check the value of the watched expressions. This can be done by hovering with the mouse to the left of the line number: a small reddish dot should appear. Place the mouse over the correct line (e.g. the line corresponding to S = S + i) and click to add the breakpoint (a red dot should appear once you click).
Now we are ready to start debugging the code. Click on the green triangle on the small debug panel and you will see that the yellow arrow moved to the breakpoint and that the watch panel updated the value of all our expressions.
The value of all expressions is zero because the debugger stopped before executing the code specified at the breakpoint line (recall that S is initialized to 0 and that i will range from 0 to 1200). If you click again on the green arrow, execution will continue until the next breakpoint (we are in a for loop, so this will be again the same line - trust me for the time being).
Now i has been increased to 1, S is still 0 (remember that the execution stopped before executing the code at the breakpoint) and therefore S + i is now 1. Click one more time on the green arrow and the values should update accordingly (i.e. S to 1, i to 2 and S + i to 3); another round of execution should update S to 3, i to 3 and S + i to 6. Got how this works? Variable i is increased by one each time, while S increases by i. You can go on for a few more iterations and see if this makes sense to you; once you are done with debugging you can stop the execution by pressing the red square on the small debug panel.
Please take some more time to familiarize yourself with Visual Studio Code (creating files, saving files, interacting with the debugger etc.) as in the next practicals we will take this ability for granted. Once you are done you can move on and do the following exercises.
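If you want to double-check what the debugger shows, you can also record the watched values by hand. The sketch below stores, for the first four iterations only, the values of i, S and S + i as they are just before S = S + i runs, which is what you see when the execution stops at the breakpoint. The list name trace is just an illustrative choice.

```python
S = 0
trace = []  # (i, S, S + i) as seen at the breakpoint, before S = S + i runs
for i in range(0, 1201):
    if i < 4:
        trace.append((i, S, S + i))
    S = S + i

for i_val, s_val, next_s in trace:
    print("i:", i_val, " S:", s_val, " S+i:", next_s)
print("Final S:", S)
```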
Jupyter¶
Jupyter is a handy program to write notebooks (files with .ipynb extension) organized in cells, which contain code, the output of running that code, and text. The code is Python by default, but can also be in other languages like R. The text is formatted with the Markdown language - see cheatsheet. Notebooks are becoming the de-facto standard for writing technical documentation (you can find them everywhere, e.g. in blogs).
Run Jupyter¶
Jupyter is a web server: when you run it, a Jupyter server starts and you should see a system console opening (on Anaconda systems you might see it only for a very short time), after which an internet browser should open. Since Jupyter is a server, what you see in the browser is just the UI connecting to the server.
If you have Anaconda :
Launch Anaconda Navigator, and then search and run Jupyter.
If you don’t have Anaconda:
From system console try to run
jupyter notebook
or, as an alternative if the previous one doesn’t work:
python3 -m notebook
Editing notebooks¶
Useful shortcuts:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
Some tips:
when something seems wrong in computations, try to clean the memory by running
Kernel->Restart and Run all
when you see an asterisk to the side of a cell, maybe the computation has hung (an infinite while loop?). To solve the problem, run
Kernel -> Shutdown
and then Kernel -> Restart
Browsing notebooks¶
(Optional) To improve your browsing experience, you might wish to install some Jupyter extensions, like toc2
which shows paragraphs headers on the sidebar. To install it:
Install the Jupyter contrib extensions package:
If you have Anaconda:
Open Anaconda Prompt, and type:
conda install -c conda-forge jupyter_contrib_nbextensions
If you don’t have Anaconda:
Open a Terminal and type:
python3 -m pip install --user jupyter_contrib_nbextensions
Install it in Jupyter:
jupyter contrib nbextension install --user
Enable extensions
jupyter nbextension enable toc2/main
Once installed, to see the table of contents while in a document, press the list button at the right end of the toolbar.
Course exercise formats¶
In this course, you will find the solutions to the exercises on the website. At the top of each solution page, you will find the link to download a zip like this:
Download exercises zip¶
Now unzip the exercises in a folder, you should get something like this:
-jupman.py
-sciprog.py
-other stuff ...
-exercises
     |- introduction
          |- introduction-exercise.ipynb
          |- introduction-solution.ipynb
          |- other stuff ..
WARNING 1: to correctly visualize the notebook, it MUST be in an unzipped folder !
Each zip contains both the exercises, as files to edit, and their solutions in separate files.
Some exercises will need to be done in Jupyter notebooks (.ipynb files), while others in plain .py Python files.
Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser.
The browser should show a file list: navigate the list and open the notebook
exercises/introduction/introduction-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
Now look into the exercise notebook, it should begin with a cell like this:
#Please execute this cell
import sys;
sys.path.append('../../');
import jupman;
import sciprog;
This is because some code is common to all exercises. In particular:
in jupman.py there is code for special cell outputs in Jupyter notebooks (like Python Tutor or unit test display);
in sciprog.py there are common algorithms and data structures used in the course.
A notebook always looks for modules in its own current directory. Since jupman.py is located two directories above the notebook in the zip, with the lines
import sys;
sys.path.append('../../');
we tell Python to also look for modules (= Python .py files) in the directory two levels above the current one.
It is not the most elegant way to locate modules but gets around the quirks of Jupyter fine enough for our purposes.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Python Tutor inside Jupyter¶
We implemented a command jupman.pytut() to show a Python Tutor debugger in a Python notebook. Let’s see how it works.
You can put a call to jupman.pytut() at the end of a cell, and the cell code will magically appear in Python Tutor in the output (except the call to pytut(), of course).
ATTENTION: To see Python tutor you need to be online!
For this to work you need to be online both when you execute the cell and when visiting the built website.
[10]:
x = 5
y = 7
z = x + y
jupman.pytut()
[10]:
Beware of variables which were initialized in previous cells: they won’t be available in Python Tutor, like w in this case:
[11]:
w = 8
[12]:
x = w + 5
jupman.pytut()
[12]:
Exercises¶
Try to familiarize yourself with Jupyter and Visual Studio Code by doing these exercises in both of them.
Compute the area of a triangle having base 120 units (B) and height 33 (H). Assign the result to a variable named area and print it.
[13]:
# SOLUTION
B = 120
H = 33
area = B*H/2   # triangle area is base times height divided by 2
print("Triangle area is:", area)
Triangle area is: 1980.0
Compute the area of a square having side (S) equal to 145 units. Assign the result to a variable named area and print it.
[14]:
S = 145
area = S**2
print("Square area is:", area)
Square area is: 21025
Modify the program at point 2. to acquire the side S from the user at runtime. (Hint: use the input function and remember to convert the acquired value into an int).
ANSWER:
print("Insert side: ")
S = int(input())   # input() returns a string: convert it to an int
area = S**2
print("Square area is:", area)
If you have not done so already, put the two previous scripts in two separate files (e.g. triangle_area.py and square_area.py) and execute them from the terminal.
Write a small script (trapezoid.py) that computes the area of a trapezoid having major base (MB) equal to 30 units, minor base (mb) equal to 12 and height (H) equal to 17. Print the resulting area. Try executing the script from inside Visual Studio Code and from the terminal.
[15]:
# SOLUTION
"""trapezoid.py"""
MB = 30
mb = 12
H = 17
Area = (MB + mb)*H/2
print("Trapezoid area is: ", Area)
Trapezoid area is: 357.0
Rewrite the example of the sum of the first 1200 integers by using the following equation: \(\sum\limits_{i=1}^n i = \frac{n (n+1)}{2}\).
[16]:
# SOLUTION
N = 1200
print("Sum of first 1200 integers: ", N*(N+1)//2)   # // keeps the result an int
Sum of first 1200 integers: 720600
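A quick way to convince yourself that the formula and the loop agree is to compare them for a few values of n. In this sketch sum(range(n + 1)) adds up 0, 1, ..., n, and the integer division // keeps the formula result an int (n(n+1) is always even, so nothing is lost).

```python
for n in [1, 10, 100, 1200]:
    loop_sum = sum(range(n + 1))      # 0 + 1 + ... + n
    formula_sum = n * (n + 1) // 2    # closed formula
    print(n, loop_sum, formula_sum, loop_sum == formula_sum)
```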
Modify the program at point 6. to make it acquire the number of integers to sum N from the user at runtime.
ANSWER:
print("Input number N:")
N = int(input())
print("Sum of first ", N, " integers: ", N*(N+1)//2)
Write a small script to compute the length of the hypotenuse (c) of a right triangle having sides a=133 and b=72 units (see picture below). Hint: remember the Pythagorean theorem and use math.sqrt.
[17]:
# SOLUTION
import math
a = 133
b = 72
c = math.sqrt(a**2 + b**2)
print("Hypotenuse: ", c)
Hypotenuse: 151.23822268196622
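The math module also offers math.hypot, which computes the same length directly from the two sides. As a sketch, the two approaches can be compared like this (math.isclose checks that two floats are equal up to rounding):

```python
import math

a = 133
b = 72
c1 = math.sqrt(a**2 + b**2)   # Pythagorean theorem, as above
c2 = math.hypot(a, b)         # library function for sqrt(a*a + b*b)
print(c1, c2, math.isclose(c1, c2))
```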
Python basics solutions¶
Download exercises zip¶
References
In this practical we will start interacting more with Python, practicing how to handle data, functions and methods. We will see some built-in data types (integers, floats, booleans; we will reserve strings for later).
Modules¶
Python modules are simply text files having the extension .py (e.g. exercise.py). When you were writing code in the IDE in the previous practical, you were in fact implementing the corresponding module.
As said in the previous practical, once you have implemented and saved the code of the module, you can execute it by typing
python3 exercise.py
or, in Visual Studio Code, by right clicking on the code panel and selecting Run Python File in Terminal.
A module A can be loaded from another module B so that B can use the functions defined in A. Remember when we used the sqrt function? It is defined in the module math. To import it and use it we indeed wrote something like:
[2]:
import math
x = math.sqrt(4)
print(x)
2.0
When importing modules we do not need to specify the .py extension of the file.
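Besides import math, Python also lets you import a single name from a module, or give the module a shorter alias. A small sketch of the three forms, all calling the same sqrt function:

```python
import math                  # use it as math.sqrt
from math import sqrt        # use it directly as sqrt
import math as m             # use it as m.sqrt

print(math.sqrt(4), sqrt(4), m.sqrt(4))  # 2.0 2.0 2.0
```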
Objects¶
Python understands objects very well; in fact, everything in Python is an object. Objects have properties (characteristic features) and methods (things they can do). For example, a car object has the properties model, make, color, number of doors etc., and the methods steer right, steer left, accelerate, brake, stop, change gear,... According to Python’s official documentation:
“Objects are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects.”
All you need to know for now is that in Python objects have an identity (a unique identifier, or ID, which can be read with the function id(); it is not the name of the object), a type (numbers, text, collections,...) and a value (the actual data represented by the object). Once an object has been created its identity and type never change, while its value can either change (mutable objects) or stay constant (immutable objects).
Python provides these built-in data types:
We will stick with the simplest ones for now, but later on we will dive deeper into all of them.
Variables¶
Variables are just references to objects; in other words, they are names given to objects. Variables can be bound to objects by using the assignment operator =.
The instruction
[3]:
sides = 4
might represent the number of sides of a square. What happens when we execute it in Python? An object is created, it is given an identifier, its type is set to “int” (an integer number), its value to 4, and a name sides is placed in the current namespace to point to that object, so that after that instruction we can access that object through its name. The type of an object can be accessed with the function type() and the identifier with the function id():
[4]:
sides = 4
print( type(sides) )
print( id(sides) )
<class 'int'>
94241937814656
Consider now the following code:
[5]:
sides = 4 # a square
print ("value:", sides, " type:", type(sides), " id:", id(sides))
sides = 5 # a pentagon
print ("value:", sides, " type:", type(sides), " id:", id(sides))
value: 4 type: <class 'int'> id: 94241937814656
value: 5 type: <class 'int'> id: 94241937814688
The value of the variable sides has been changed from 4 to 5 but, as stated in the table above, the type int is immutable. Luckily, this did not prevent us from changing the value of sides from 4 to 5. What happened behind the scenes when we executed the instruction sides = 5 is that a new object of type int has been created (5 is still an integer) and it has been made accessible with the same name sides; since it is a different object (i.e. the integer 5), you can see that the identifier is actually different. Note: you do not really have to worry about what happens behind the scenes, as the Python interpreter will take care of these aspects for you, but it is nice to know what it does.
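The same mechanism explains why rebinding one name does not affect another name that pointed to the old object. A minimal sketch (the is operator checks whether two names refer to the same object, i.e. whether their ids are equal; old_sides is just an illustrative name):

```python
sides = 4
old_sides = sides          # both names now refer to the same int object
print(old_sides is sides)  # True

sides = 5                  # sides is rebound to a different object
print(old_sides is sides)  # False
print(old_sides, sides)    # 4 5: the object 4 itself was never modified
```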
You can even change the type of a variable during execution but that is normally a bad idea as it makes understanding the code more complicated.
You can do (but, please, refrain!):
[6]:
sides = 4 # a square
print ("value:", sides, " type:", type(sides), " id:", id(sides))
sides = "four" #the sides in text format
print ("value:", sides, " type:", type(sides), " id:", id(sides))
value: 4 type: <class 'int'> id: 94241937814656
value: four type: <class 'str'> id: 140613404719232
IMPORTANT NOTE: You can choose the names you like for your variables (I advise picking something reminding of their meaning), but you need to adhere to some simple rules:
Names should only contain upper/lower case letters (A-Z, a-z), digits (0-9) or underscores (_);
Names cannot start with a digit;
Names cannot be equal to reserved keywords;
variable names should start with a lowercase letter (this is a convention rather than a rule).
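If you want to check a name programmatically, the standard library offers two helpers: str.isidentifier() tells whether a string follows Python’s syntax rules for names, and keyword.iskeyword() tells whether it is a reserved keyword. In this sketch, the helper is_valid_name is hypothetical (written only for illustration) and the sample names are deliberately not those of the exercise below. Note that it does not warn about names that are valid but shadow built-in functions.

```python
import keyword

def is_valid_name(name):
    """True if name is syntactically valid and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["my_sides", "2sides", "side count", "for", "Sides"]:
    print(candidate, "->", is_valid_name(candidate))

print(keyword.kwlist)  # the full list of reserved keywords
```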
Exercise: variable names¶
For each of the following names, try to guess if it is a valid variable name or not, then try to assign it in the following cell
my-variable
my_variable
theCount
the count
some@var
MacDonald
7channel
channel7
stand.by
channel45
maybe3maybe
"ciao"
'hello'
as   (PLEASE: DO UNDERSTAND THE VERY IMPORTANT DIFFERENCE BETWEEN THIS AND THE FOLLOWING TWO !!!)
asino
As
lista   (PLEASE: DO UNDERSTAND THE VERY IMPORTANT DIFFERENCE BETWEEN THIS AND THE FOLLOWING TWO !!!)
list   (DO NOT EVEN TRY TO ASSIGN THIS ONE IN THE INTERPRETER (like list = 5), IF YOU DO YOU WILL BASICALLY BREAK PYTHON)
List
black&decker
black & decker
glab()
caffè   (notice the accented è!)
):-]
€zone   (notice the euro sign)
some:pasta
aren'tyouboredyet
<angular>
[7]:
# write here
Numeric types¶
We already mentioned that numbers are immutable objects. Python provides different numeric types: integers, reals (floats), booleans and even complex numbers and fractions (but we will not get into those).
Integers¶
Their range of values is limited only by the available memory. As we have already seen, Python also provides a set of standard operators to work with numbers:
[8]:
a = 7
b = 4
a + b # 11
a - b # 3
a // b # integer division: 1
a * b # 28
a ** b # power: 2401
a / b # division: 1.75
type(a / b)
[8]:
float
Note that in the latter case the result is no longer an integer, but a float (we will get to that later).
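Since Python integers have no fixed size, very large results are computed exactly. The sketch below also shows the remainder operator % and the built-in divmod, which returns integer division and remainder together:

```python
big = 2 ** 100
print(big)               # 1267650600228229401496703205376, no overflow

a = 7
b = 4
print(a % b)             # remainder: 3
print(divmod(a, b))      # (a // b, a % b) = (1, 3)
```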
Booleans¶
These objects are used for boolean algebra and have type bool.
Truth values are represented in Python with the keywords True and False: a boolean object can only have value True or False.
[9]:
x = True
[10]:
x
[10]:
True
[11]:
type(x)
[11]:
bool
[12]:
y = False
[13]:
type(y)
[13]:
bool
Boolean operators¶
We can operate on boolean values with the boolean operators not, and, or. Recall boolean algebra for their use:
[14]:
print("not True: ", not True) # False
print("not False: ", not False) # True
print()
print("False and False: ", False and False) # False
print("False and True: ", False and True ) # False
print("True and False: ", True and False) # False
print("True and True: ", True and True) # True
print()
print("False or False: ", False or False) # False
print("False or True: ", False or True) # True
print("True or False: ", True or False) # True
print("True or True: ", True or True) # True
not True: False
not False: True
False and False: False
False and True: False
True and False: False
True and True: True
False or False: False
False or True: True
True or False: True
True or True: True
Booleans exercise: constants¶
Try to guess the result of these boolean expressions (first guess, and then try it out !!)
not (True and False)
(not True) or (not (True or False))
not (not True)
not (True and (False or True))
not (not (not False))
True and (not (not((not False) and True)))
False or (False or ((True and True) and (True and False)))
Booleans exercise: variables¶
For which values of x and y do these expressions give True? Try to think of the answer before trying it !!!!
NOTE: there can be more combinations that produce True, try to find all of them.
x or (not x)
(not x) and (not y)
x and (y or y)
x and (not y)
(not x) or y
y or not (y and x)
x and ((not x) or not(y))
(not (not x)) and not (x and y)
x and (x or (not(x) or not(not(x or not (x)))))
For which values of x, y and z do these expressions give False?
NOTE: there can be more combinations that produce False, try to find all of them.
x or ((not y) or z)
x or (not y) or (not z)
not (x and y and (not z))
not (x and (not y) and (x or z))
y or ((x or y) and (not z))
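Once you have a guess, you can check it exhaustively: with two or three variables there are only 4 or 8 combinations. This sketch uses itertools.product to enumerate them. The helpers true_combinations and false_combinations are hypothetical names for illustration, lambda creates a small unnamed function (functions are covered later), and the sample expressions are not taken from the exercises above.

```python
from itertools import product

def true_combinations(expr, n_vars):
    """Return the tuples of booleans for which expr(*values) is truthy."""
    return [values for values in product([False, True], repeat=n_vars)
            if expr(*values)]

def false_combinations(expr, n_vars):
    """Return the tuples of booleans for which expr(*values) is falsy."""
    return [values for values in product([False, True], repeat=n_vars)
            if not expr(*values)]

# for which x, y is  x and y  True?
print(true_combinations(lambda x, y: x and y, 2))          # [(True, True)]
# for which x, y, z is  x or y or z  False?
print(false_combinations(lambda x, y, z: x or y or z, 3))  # [(False, False, False)]
```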
Boolean conversion¶
We can convert booleans into integers with the builtin function int. Any integer can be converted into a boolean with bool:
[15]:
a = bool(1)
b = bool(0)
c = bool(72)
d = bool(-5)
t = int(True)
f = int(False)
print("a: ", a)
print("b: ", b)
print("c: ", c)
print("d: ", d)
print("t: ", t)
print("f: ", f)
a: True
b: False
c: True
d: True
t: 1
f: 0
Any integer is evaluated to True, except 0. Note that the truth values True and False respectively behave like the integers 1 and 0.
Booleans exercise: what is a boolean?¶
Read the previous description of booleans carefully, and try to guess the result of the following expressions.
bool(True)
bool(False)
bool(2 + 4)
bool(4-3-1)
int(4-3-1)
True + True
True + False
True - True
True * True
Numeric operators¶
Numeric comparators are operators that return a boolean value. Here are some examples (from the lecture):
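A short sketch of the comparison operators on numbers; each expression evaluates to a bool. Python also allows chaining comparisons, so 1 < x < 10 means (1 < x) and (x < 10):

```python
x = 5
print(x == 5)        # True: equal
print(x != 5)        # False: not equal
print(x < 3)         # False
print(x >= 5)        # True
print(1 < x < 10)    # True: same as (1 < x) and (x < 10)
print(3 == 3.0)      # True: int and float with the same value compare equal
```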
Example: Given a variable a = 10 and a variable b = 77, let’s swap their values (i.e. at the end a will be equal to 77 and b to 10). Let’s also check the values at the beginning and at the end.
[16]:
a = 10
b = 77
print("a: ", a, " b:", b)
print("is a equal to 10?", a == 10)
print("is b equal to 77?", b == 77)
TMP = b # we need to store the value of b safely
b = a # ok, the old value of b is gone... is it?
a = TMP # a gets the old value of b... :-)
print()
print("a: ", a, " b:", b)
print("is a equal to 10?", a == 10)
print("is a equal to 77?", a == 77)
print("is b equal to 10?", b == 10)
print("is b equal to 77?", b == 77)
a: 10 b: 77
is a equal to 10? True
is b equal to 77? True
a: 77 b: 10
is a equal to 10? False
is a equal to 77? True
is b equal to 10? True
is b equal to 77? False
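Python also offers a more compact idiom for swapping without an explicit temporary variable: tuple assignment, where the right-hand side is evaluated completely before the names on the left are rebound. A sketch:

```python
a = 10
b = 77
a, b = b, a   # the tuple (77, 10) is built first, then unpacked into a and b
print("a:", a, " b:", b)
```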
Numeric operators exercise: cycling¶
Write a program that, given three variables with numbers a, b, c, cycles the values, that is, puts the value of a in b, the value of b in c, and the value of c in a.
So if you begin like this:
a = 4
b = 7
c = 9
After the code that you will write, by running this:
print(a)
print(b)
print(c)
You should see
9
4
7
There are various ways to do it, try to use only one temporary variable and be careful not to lose values !
HINT: to help yourself, try to write down in comments the state of the memory, and think about which command you need
# a b c t which command do I need?
# 4 7 9
# 4 7 9 7 t = b
#
#
#
[17]:
a = 4
b = 7
c = 9
# write code here
print(a)
print(b)
print(c)
4
7
9
[18]:
# SOLUTION
a = 4
b = 7
c = 9
# a b c t which command do I need?
# 4 7 9
# 4 7 9 7 t = b
# 4 4 9 7 b = a
# 9 4 9 7 a = c
# 9 4 7 7 c = t
t = b
b = a
a = c
c = t
print(a)
print(b)
print(c)
9
4
7
[19]:
# SOLUTION
Real numbers¶
Python stores real numbers (floating point numbers) in 64 bits of information, divided into sign, exponent and mantissa.
Exercise: Let’s calculate the area of the center circle of a football pitch (radius = 9.15m) recalling that \(area = \pi R^2\) (as power operator, use **):
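The limits of this 64-bit representation can be inspected with sys.float_info. A sketch showing that only a limited number of significant digits (roughly 15 to 17 decimal digits) is stored, so even simple decimal values like 0.1 are approximated:

```python
import sys

print(sys.float_info.mant_dig)   # 53: mantissa precision in bits (IEEE 754 double)
print(sys.float_info.max)        # largest representable float
print(sys.float_info.epsilon)    # gap between 1.0 and the next representable float
print(0.1 + 0.2)                 # 0.30000000000000004, not exactly 0.3
print(0.1 + 0.2 == 0.3)          # False
```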
[20]:
# SOLUTION
R = 9.15
Pi = 3.1415926536
Area = Pi*(R**2)
print(Area)
263.02199094102605
Note that the parentheses around R**2 are not necessary, as operator ** has higher precedence than *, but I personally think they help readability.
Here is a reminder of the precedence of operators:
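A few expressions showing precedence in action; each comment gives the value Python computes:

```python
print(2 + 3 * 4)      # 14: * binds tighter than +
print((2 + 3) * 4)    # 20: parentheses first
print(2 * 3 ** 2)     # 18: ** binds tighter than *
print(-2 ** 2)        # -4: ** binds tighter than unary minus, i.e. -(2 ** 2)
print(2 ** 3 ** 2)    # 512: ** groups right to left, i.e. 2 ** (3 ** 2)
print(7 - 4 - 1)      # 2: - groups left to right, i.e. (7 - 4) - 1
```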
Example: Let’s compute the GC content of a DNA sequence 33 base pairs long, having 12 As, 9 Ts, 5 Cs and 7 Gs. The GC content can be expressed by the formula: \(gc = \frac{G+C}{A+T+C+G}\) where A,T,C,G represent the number of nucleotides of each kind. What is the AT content? Is the GC content higher than the AT content?
[21]:
A = 12
T = 9
C = 5
G = 7
gc = (G+C)/(A+T+C+G)
print("The GC content is: ", gc)
at = 1 - gc
print("Is the GC content higher than AT content? ", gc > at)
The GC content is: 0.36363636363636365
Is the GC content higher than AT content? False
Real numbers exercise: quadratic equation¶
Calculate the zeros of the equation \(ax^2-b = 0\) where a = 10 and b = 1. Hint: use math.sqrt or ** 0.5. Finally check that substituting the obtained value of x in the equation gives zero.
[22]:
# SOLUTION
import math
A = 10
B = 1
x = math.sqrt(B/A)
print("10x**2 - 1 = 0 for x:", x)
print("Is x a solution?", 10*x**2 -1 == 0)
10x**2 - 1 = 0 for x: 0.31622776601683794
Is x a solution? True
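The equation actually has two zeros, \(x = \pm\sqrt{b/a}\). Comparing floats with == happens to work here, but in general rounding errors can make such a check fail; math.isclose compares within a tolerance instead. A sketch computing both roots and checking them this way (an absolute tolerance is needed because the expected value is 0, and the value 1e-12 is an arbitrary choice):

```python
import math

a = 10
b = 1
x1 = math.sqrt(b / a)
x2 = -x1
for x in (x1, x2):
    residual = a * x**2 - b          # should be (close to) zero
    print("x:", x, " residual:", residual,
          " is a root?", math.isclose(residual, 0, abs_tol=1e-12))
```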
Strings solutions¶
What to do¶
Unzip the exercises in a folder, you should get something like this:
-jupman.py
-my_lib.py
-other stuff ...
-exercises
     |- strings
          |- strings-exercise.ipynb
          |- strings-solution.ipynb
          |- other stuff ..
WARNING 1: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook
exercises/strings/strings-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate to the unzipped folder while in Jupyter browser!
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Introduction¶
References:
Extra, if you want to do text mining:
Read the first part, on Unicode encoding, of the chapter Strings in the book Dive into Python
Look at the NLTK library
Strings are immutable objects (note the actual type is str) used by Python to handle text data. Strings are sequences of Unicode code points that can represent characters, but also formatting information (e.g. ‘\n’ for new line). Unlike some other programming languages, Python does not have a character data type: a character is represented as a string of length 1.
There are several ways to define a string:
[1]:
S = "my first string, in double quotes"
S1 = 'my second string, in single quotes'
S2 = '''my third string is
in triple quotes
therefore it can span several lines'''
S3 = """my fourth string, in triple double-quotes
can also span
several lines"""
print(S, '\n') #let's add a new line at the end of the string with \n
print(S1,'\n')
print(S2, '\n')
print(S3, '\n')
my first string, in double quotes
my second string, in single quotes
my third string is
in triple quotes
therefore it can span several lines
my fourth string, in triple double-quotes
can also span
several lines
To put special characters like ' and " inside a string you need to “escape” them (i.e. write them preceded by a backslash).
Example: Let’s print a string containing a quote and a double quote (i.e. ' and ").
[2]:
myString = "This is how I \'quote\' and \"double quote\" things in strings"
print(myString)
This is how I 'quote' and "double quote" things in strings
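Escaping is not the only option: a quote of one kind can appear unescaped inside a string delimited by the other kind, and a backslash itself must be written as \\ (two characters in the code, one in the string). A sketch:

```python
s1 = "It's easy"            # single quote inside double quotes: no escape needed
s2 = 'He said "hi"'         # double quotes inside single quotes
s3 = 'It\'s easy'           # same as s1, with an escaped quote
path = "C:\\temp"           # \\ stands for one backslash character

print(s1 == s3)             # True
print(s2)                   # He said "hi"
print(path, len(path))      # C:\temp 7
```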
Strings can be converted to and from numbers with the functions str(), int() or float().
Example: Let’s define a string my_string with the value “47001” and convert it into an int. Try adding seven and print the result.
[3]:
my_string = "47001"
print(my_string, " has type ", type(my_string))
my_int = int(my_string)
print(my_int, " has type ", type(my_int))
my_int = my_int + 7 # adds seven
my_string = my_string + "7" # cannot add 7 (we need to use a string).
# This will append 7 at the end of the string
#my_string = my_string + 7 # CANNOT DO THIS, python will complain about concatenating a string to a different type,
# in this case an int
my_string = my_string + str(7) # this works: we first convert the integer to a string
print(my_int)
print(my_string)
47001 has type <class 'str'>
47001 has type <class 'int'>
47008
4700177
Python defines some operators to work with strings. Recall the slides shown during the lecture:
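A sketch of the main string operators (concatenation, repetition, membership, length, comparison), with the value each one produces:

```python
s = "abra"
print(s + "cadabra")    # abracadabra: concatenation with +
print(s * 3)            # abraabraabra: repetition with *
print("br" in s)        # True: substring membership
print("z" not in s)     # True
print(len(s))           # 4: number of characters
print(s == "abra")      # True: comparison by value
```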
Exercise: many hello¶
Look at the table above. Given the string x = "hello", print a string with "hello" repeated 5 times: "hellohellohellohellohello". Your code must work with any string stored in the variable x.
[4]:
x = "hello"
# write here
print(x*5)   # use x, so the code works for any string
hellohellohellohellohello
Exercise: interleave terns¶
Given two strings which both have length 3, print a string which interleaves characters from both strings. Your code should work for any strings of such length.
Example:
Given
x="say"
y="hi!"
should print
shaiy!
[5]:
# write here
x="say"
y="hi!"
print(x[0] + y[0] + x[1] + y[1] + x[2] + y[2])
shaiy!
Exercise: print length¶
Write some code that, given a string x, prints the content of the string followed by its length. Your code should work for any content of the variable x.
Example:
Given
x = 'howdy'
should print
howdy5
[6]:
# write here
x = 'howdy'
print(x + str(len(x)))
howdy5
Exercise: both contained¶
You are given two strings x and y, and a third one z. Write some code that prints True if x and y are both contained in z, and False otherwise.
For example,
Given
x = 'cad'
y = 'ra'
z = 'abracadabra'
it should print
True
x = 'zam'
y = 'ra'
z = 'abracadabra'
it should print
False
[7]:
# write here
x = 'cad'
y = 'ra'
z = 'abracadabra'
print((x in z) and (y in z))
True
Slicing¶
We can access strings at specific positions (indexing) or get a substring starting from a position S to a position E. The only thing to remember is that numbering starts from 0. The i-th character of a string can be accessed as str[i-1]. Substrings can be accessed as str[S:E]; optionally, a third parameter can be specified to set the step (i.e. str[S:E:STEP]).
Important note. Remember that when you do str[S:E], S is inclusive, while E is exclusive (see S[0:6] below).
Let’s see these aspects in action with an example:
[8]:
S = "Luther College"
print(S) #print the whole string
print(S == S[:]) #S[:] slices the whole string, so it is equal to S
print(S[0]) #first character
print(S[3]) #fourth character
print(S[-1]) #last character
print(S[0:6]) #first six characters
print(S[-7:]) #final seven characters
print(S[0:len(S):2]) #every other character starting from the first
print(S[1:len(S):2]) #every other character starting from the second
Luther College
True
L
h
e
Luther
College
Lte olg
uhrClee
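The step can also be negative, in which case the slice walks the string backwards; omitting start and end with a step of -1 is the usual idiom to reverse a string. A sketch using the same string:

```python
S = "Luther College"
print(S[::-1])      # egelloC rehtuL: the whole string reversed
print(S[5::-1])     # rehtuL: from index 5 back to the start
print(S[::3])       # every third character: indices 0, 3, 6, 9, 12
print(S[100:])      # empty string: out-of-range slices do not raise errors
```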
Exercise: garalampog¶
Write some code to extract and print alam from the string "garalampog". Try to correctly guess the indices.
[9]:
x = "garalampog"
# write here
# 0123456789
print(x[3:7])
alam
Exercise: ifE\te\nfav  lkD  lkWe¶
Write some code to extract and print kD from the string "ifE\te\nfav  lkD  lkWe". Mind the spaces and special characters (you might want to print x first). Try to correctly guess the indices.
[10]:
x = "ifE\te\nfav  lkD  lkWe"
# write here
#     0123 45 67890123456789
#x = "ifE\te\nfav  lkD  lkWe"
print(x[12:14])
kD
Exercise: javarnanda¶
Given a string x, write some code to extract and print its first 3 characters joined to its last 3. Code should work for any string of length at least 3.
Example:
Given
x = "javarnanda"
it should print
javnda
Given
x = "abcd"
it should print
abcbcd
[11]:
# write here
x = "abcd"
print(x[:3] + x[-3:])
abcbcd
Methods for the str object¶
The object str has some methods that can be applied to it (remember methods are things you can do on objects). Recall from the lecture that the main methods are:
ATTENTION: Since strings are immutable, every operation that changes the string actually produces a new str object having the modified string as value.
Example
[12]:
my_string = "ciao"
anotherstring = my_string.upper()
print(anotherstring)
CIAO
[13]:
print(my_string) # didn't change
ciao
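Beyond upper, here is a small sketch of a few other commonly used str methods. It is only a sample and not necessarily the same list shown in the lecture. Note how each call returns a new value while s itself never changes:

```python
s = "  Hello World  "

print(s.strip())                     # Hello World : leading/trailing blanks removed
print(s.lower())                     # "  hello world  " : blanks are kept
print(s.replace("World", "Trento"))  # "  Hello Trento  "
print(s.find("World"))               # 8 : index of the first occurrence (-1 if absent)
print(s.count("o"))                  # 2
print(s.strip().startswith("He"))    # True
print(s.split())                     # ['Hello', 'World']
print(repr(s))                       # '  Hello World  ' : the original is unchanged
```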
If you are unsure about a method (for example strip), you can ask Python for help like this:
NOTICE there are no round parentheses after the method !!!
[14]:
help("ciao".strip)
Help on built-in function strip:
strip(...) method of builtins.str instance
S.strip([chars]) -> str
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
Exercise substitute¶
Given a string x, write some code to print a string like x but with all occurrences of bab substituted by dada
Example:
Given
x = 'kljsfsdbabòkkrbabej'
it should print
kljsfsddadaòkkrdadaej
[15]:
# write here
x = 'kljsfsdbabòkkrbabej'
print(x.replace('bab', 'dada'))
kljsfsddadaòkkrdadaej
Exercise hatespace¶
Given a string x which may contain blanks (spaces, special control characters such as \t and \n, …) at the beginning and end, write some code that prints the string without the blanks, with the strings START and END at the extremities.
Example:
Given
x = ' \t \n \n hatespace\n \t \n'
prints
STARThatespaceEND
[16]:
# write here
x = ' \t \n \n hatespace\n \t \n'
print('START' + x.strip() + 'END')
STARThatespaceEND
Exercises with functions¶
ATTENTION
The following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
length¶
✪ a. Write a function length1(s) which, given a string, RETURNS the length of the string. Use the len function. For example, with the string "ciao" your function should return 4, while with "hi" it should return 2
>>> x = length1("ciao")
>>> x
4
✪ b. Write a function length2 that, like before, calculates the string length, this time without using len (instead, use a for loop)
>>> y = length2("mondo")
>>> y
5
[17]:
# write here
# version with len: faster, because Python always keeps the length
# of a string in memory, immediately available
def length1(s):
    return len(s)
# version with a counter: slower, it has to go through every character
def length2(s):
    counter = 0
    for character in s:
        counter = counter + 1
    return counter
contains¶
✪ Write the function contains(word, character), which RETURNS True if the string contains the given character, otherwise RETURNS False. Use the in operator
>>> x = contains('ciao', 'a')
>>> x
True
>>> y = contains('ciao', 'z')
>>> y
False
[18]:
# write here
def contains(word, character):
    return character in word
invertilet¶
✪ Write the function invertilet(first, second) which takes as input two strings of length greater than 3, and RETURNS a new string in which the words are concatenated and separated by a space, and the last characters of the two words are swapped. For example, if you pass as input 'ciao' and 'world', the function should RETURN 'ciad worlo'
If the two strings are not of adequate length, the program PRINTS error!
HINT: use slices
NOTE 1: PRINTing is different from RETURNing !!! Whatever gets printed is shown to the user but Python cannot reuse it for calculations.
NOTE 2: if a function does not explicitly return anything, Python implicitly returns None.
NOTE 3: Resorting to prints on error conditions is not actually good practice: here we use it as an invitation to think about what happens when you print something and do not return anything. You can read a discussion about it in the Errors handling and testing page
>>> x = invertilet("ciao", "world")
>>> x
'ciad worlo'
>>> x = invertilet('hi','mondo')
error!
>>> print(x)
None
>>> x = invertilet('cirippo', 'bla')
error!
>>> print(x)
None
[19]:
# write here
def invertilet(first, second):
    if len(first) <= 3 or len(second) <= 3:
        print("error!")
    else:
        return first[:-1] + second[-1] + " " + second[:-1] + first[-1]
nspace¶
✪ Write the function nspace(word, index) that, given a string word and an index, RETURNS a new string in which the character at position index is replaced by a space.
For example, given the string 'largamente' and the index 5, the program should RETURN the string 'larga ente'.
NOTE 1: if the number is too big (for example, the word has 6 characters and you pass the number 9), the program PRINTS error!
NOTE 2: PRINTing is different from RETURNing !!! Whatever gets printed is shown to the user but Python cannot reuse it for other calculations.
NOTE 3: Resorting to prints on error conditions is not actually good practice: here we use it as an invitation to think about what happens when you print something and do not return anything. You can read a discussion about it in the Errors handling and testing page
>>> x = nspace('largamente', 5)
>>> x
'larga ente'
>>> x = nspace('ciao', 9)
error!
>>> print(x)
None
>>> x = nspace('ciao', 4)
error!
>>> print(x)
None
[20]:
# write here
def nspace(word, index):
    if index >= len(word):
        print("error!")    # print and implicitly return None
    else:
        return word[:index] + ' ' + word[index+1:]
#nspace("largamente", 5)
startend¶
✪ Write a Python function startend(s) which takes a string s and, if it has a length of at least 4, PRINTS the first two and the last two characters; otherwise, it PRINTS I want at least 4 characters. For example, by passing "ciaomondo" the function should print cido. By passing "ciao" it should print ciao and by passing "hi" it should print I want at least 4 characters.
>>> startend('ciaomondo')
cido
>>> startend('hi')
I want at least 4 characters
[21]:
# write here
def startend(s):
    if len(s) >= 4:
        print(s[:2] + s[-2:])
    else:
        print("I want at least 4 characters")
swap¶
Write a function swap(s) that, given a string, swaps the first and last character and PRINTS the result.
For example, given the string "mondo", the program will PRINT oondm
>>> swap('mondo')
oondm
[22]:
# write here
def swap(s):
    print(s[-1] + s[1:-1] + s[0])
Verify comprehension¶
ATTENTION
The following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
Tests with asserts: the following exercises contain automated tests to help you spot errors. To understand how they work, first read Error handling and testing
has_char¶
✪ RETURN True if word contains char, False otherwise
Use a while loop (just for didactic purposes: using in would certainly be faster & shorter)
[23]:
def has_char(word, char):
    #jupman-raise
    index = 0   # initialize index
    while index < len(word):
        if word[index] == char:
            return True   # we found the character, we can stop the search
        index += 1   # it is like writing index = index + 1
    # if we arrive AFTER the while, there is only one reason:
    # we found nothing, so we have to return False
    return False
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert has_char("ciao", 'a')
assert not has_char("ciao", 'A')
assert has_char("ciao", 'c')
assert not has_char("", 'a')
assert not has_char("ciao", 'z')
# TEST END
count¶
✪ RETURN the number of occurrences of char in word
NOTE: I DO NOT WANT A PRINT, IT MUST RETURN THE VALUE !
Use a for in loop (just for didactic purposes: strings already provide a method to do it fast. Which one?)
[24]:
def count(word, char):
    #jupman-raise
    occurrences = 0
    for c in word:
        #print("current character = ", c)   # debugging prints are allowed
        if c == char:
            #print("found occurrence !")   # debugging prints are allowed
            occurrences += 1
    return occurrences   # THE IMPORTANT THING IS TO _RETURN_ THE VALUE AS THE EXERCISE TEXT REQUIRES !!
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert count("ciao", "z") == 0
assert count("ciao", "c") == 1
assert count("babbo", "b") == 3
assert count("", "b") == 0
assert count("ciao", "C") == 0
# TEST END
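As for the question in the text: str objects have a count method that returns the number of non-overlapping occurrences of a substring. Here is a short sketch:

```python
print("babbo".count("b"))   # 3
print("ciao".count("C"))    # 0 : the comparison is case-sensitive
print("".count("b"))        # 0
print("aaaa".count("aa"))   # 2 : occurrences are counted without overlapping
```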
dialect¶
✪✪ There exists a dialect in which every "a" must always be preceded by a "g". In case a word contains an "a" not preceded by a "g", we can say with certainty that this word does not belong to the dialect. Write a function that, given a word, RETURNS True if the word respects the rules of the dialect, False otherwise.
>>> dialect("ammot")
False
>>> dialect("paganog")
False
>>> dialect("pgaganog")
True
>>> dialect("ciao")
False
>>> dialect("cigao")
True
>>> dialect("zogava")
False
>>> dialect("zogavga")
True
[25]:
def dialect(word):
    #jupman-raise
    for i in range(0, len(word)):
        if word[i] == "a":
            # an "a" at the start, or preceded by something other than "g", breaks the rule
            if i == 0 or word[i - 1] != "g":
                return False
    return True
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert dialect("a") == False
assert dialect("ab") == False
assert dialect("ag") == False
assert dialect("ga") == True
assert dialect("gga") == True
assert dialect("gag") == True
assert dialect("gaa") == False
assert dialect("gaga") == True
assert dialect("gabga") == True
assert dialect("gabgac") == True
assert dialect("gabbgac") == True
assert dialect("gabbgagag") == True
# TEST END
countvoc¶
✪✪ Given a string, write a function that counts the number of vowels. If the number of vowels is even, RETURN the number of vowels, otherwise RAISE a ValueError exception
>>> countvoc("arco")
2
>>> countvoc("ciao")
Traceback (most recent call last):
  ...
ValueError: Odd vowels !
[26]:
def countvoc(word):
    #jupman-raise
    n_vowels = 0
    vowels = ["a", "e", "i", "o", "u"]
    for char in word:
        if char.lower() in vowels:
            n_vowels = n_vowels + 1
    if n_vowels % 2 == 0:
        return n_vowels
    else:
        raise ValueError("Odd vowels !")
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert countvoc("arco") == 2
assert countvoc("scaturire") == 4
try:
    countvoc("ciao")   # with this string we expect it raises exception ValueError
    raise Exception("I shouldn't arrive until here !")
except ValueError:   # if it raises the exception ValueError, it is behaving as expected and we do nothing
    pass
try:
    countvoc("aiuola")   # with this string we expect it raises exception ValueError
    raise Exception("I shouldn't arrive until here !")
except ValueError:   # if it raises the exception ValueError, it is behaving as expected and we do nothing
    pass
palindrome¶
✪✪✪ A word is a palindrome if it is exactly the same when read in reverse
Write a function that RETURNS True if the given word is a palindrome, False otherwise
Assume that the empty string is a palindrome
Example:
>>> x = palindrome('radar')
>>> x
True
>>> x = palindrome('scatola')
>>> x
False
There are various ways to solve this problem, some actually easy & elegant. Try to find at least a couple of them (no need to bang your head on the recursive one...).
[27]:
def palindrome(word):
    #jupman-raise
    for i in range(len(word) // 2):
        if word[i] != word[len(word) - i - 1]:
            return False
    return True   # note it is OUTSIDE the for: after passing all checks,
                  # we can conclude that the word is actually a palindrome
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert palindrome('') == True # we assume the empty string is palindrome
assert palindrome('a') == True
assert palindrome('aa') == True
assert palindrome('ab') == False
assert palindrome('aba') == True
assert palindrome('bab') == True
assert palindrome('bba') == False
assert palindrome('abb') == False
assert palindrome('abba') == True
assert palindrome('baab') == True
assert palindrome('abbb') == False
assert palindrome('bbba') == False
assert palindrome('radar') == True
assert palindrome('scatola') == False
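For instance, two alternatives to the loop above could look like the following sketch. The names palindrome_slice and palindrome_rec are made up here so they don't clash with palindrome. The first compares the word with its reverse obtained by slicing. The second is the recursive version: check the two extremities, then the part in between.

```python
def palindrome_slice(word):
    # a word is palindrome if it equals its reverse
    return word == word[::-1]

def palindrome_rec(word):
    # base case: empty and one-character strings are palindromes
    if len(word) <= 1:
        return True
    # first and last characters must match, then check what is in between
    return word[0] == word[-1] and palindrome_rec(word[1:-1])

for w in ["", "a", "ab", "abba", "radar", "scatola"]:
    print(repr(w), palindrome_slice(w), palindrome_rec(w))
```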
extract_email¶
COMMANDMENT 4 (adapted for strings): You shall never ever reassign function parameters
def myfun(s):
    # You shall not do any such evil, no matter what the type of the parameter is:
    s = "evil"   # strings
[28]:
def extract_email(s):
    """ Takes a string s formatted like
        "lun 5 nov 2018, 02:09 John Doe <john.doe@some-website.com>"
        and RETURNS the email "john.doe@some-website.com"
        NOTE: the string MAY contain spaces before and after, but your function must be able to extract the email anyway.
        If the string for some reason is found to be ill formatted, raises ValueError
    """
    #jupman-raise
    stripped = s.strip()
    i = stripped.find('<')
    # find returns -1 when '<' is missing: in that case, or without a closing '>', the format is wrong
    if i == -1 or not stripped.endswith('>'):
        raise ValueError("Ill formatted string: " + s)
    return stripped[i+1:len(stripped)-1]
    #/jupman-raise
assert extract_email("lun 5 nov 2018, 02:09 John Doe <john.doe@some-website.com>") == "john.doe@some-website.com"
assert extract_email("lun 5 nov 2018, 02:09 Foo Baz <mrfoo.baz@blabla.com>") == "mrfoo.baz@blabla.com"
assert extract_email(" lun 5 nov 2018, 02:09 Foo Baz <mrfoo.baz@blabla.com> ") == "mrfoo.baz@blabla.com" # with spaces
canon_phone¶
✪ Implement a function that canonicalizes a phone number given as a string. It must RETURN the canonical version of the phone as a string.
For us, a canonical phone number:
contains no spaces
contains no international prefix, so neither +39 nor 0039: we assume all calls were placed from Italy (even if they have an international prefix)
For example, all of these are canonicalized to "0461123456"
:
+39 0461 123456
+390461123456
0039 0461 123456
00390461123456
These are canonicalized as the following:
328 123 4567 -> 3281234567
0039 328 123 4567 -> 3281234567
0039 3771 1234567 -> 37711234567
REMEMBER: strings are immutable !!!!!
[29]:
def phone_canon(phone):
    #jupman-raise
    p = phone.replace(' ', '')
    if p.startswith('0039'):
        p = p[4:]
    if p.startswith('+39'):
        p = p[3:]
    return p
    #/jupman-raise
assert phone_canon('+39 0461 123456') == '0461123456'
assert phone_canon('+390461123456') == '0461123456'
assert phone_canon('0039 0461 123456') == '0461123456'
assert phone_canon('00390461123456') == '0461123456'
assert phone_canon('003902123456') == '02123456'
assert phone_canon('003902120039') == '02120039'
assert phone_canon('0039021239') == '021239'
phone_prefix¶
✪✪ We now want to extract the province prefix from phone numbers (see previous exercise): the ones we consider as valid are in the province_prefixes list.
Note some numbers are from mobile operators and you can distinguish them by prefixes like 328: the ones we consider are in the mobile_prefixes list.
Implement a function that RETURNS the prefix of the phone as a string. Remember first to make it canonical !!
If the phone is mobile, RETURN the string 'mobile'. If it is neither a province phone nor a mobile, RETURN the string 'unrecognized'
To determine if the phone is mobile or from a province, use the province_prefixes and mobile_prefixes lists. DO USE THE PREVIOUSLY DEFINED FUNCTION phone_canon(phone)
[30]:
province_prefixes = ['0461', '02', '011']
mobile_prefixes = ['330', '340', '328', '390', '3771']
def phone_prefix(phone):
    #jupman-raise
    c = phone_canon(phone)
    for m in mobile_prefixes:
        if c.startswith(m):
            return 'mobile'
    for p in province_prefixes:
        if c.startswith(p):
            return p
    return 'unrecognized'
    #/jupman-raise
assert phone_prefix('0461123') == '0461'
assert phone_prefix('+39 0461 4321') == '0461'
assert phone_prefix('0039011 432434') == '011'
assert phone_prefix('328 432434') == 'mobile'
assert phone_prefix('+39340 432434') == 'mobile'
assert phone_prefix('00666011 432434') == 'unrecognized'
assert phone_prefix('12345') == 'unrecognized'
assert phone_prefix('+39 123 12345') == 'unrecognized'
Further resources¶
Have a look at the LeetCode string problems, sorting them by Acceptance and picking the Easy ones.
In particular, you may check:
Lists solutions¶
What to do¶
Unzip exercises in a folder, you should get something like this:
-jupman.py
-sciprog.py
-other stuff ...
-exercises
     |- lists
         |- lists-exercise.ipynb
         |- lists-solution.ipynb
         |- other stuff ..
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/lists/lists-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in the Jupyter browser to the unzipped folder !
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try to select Kernel -> Restart
Introduction¶
References
Python lists are ordered collections of (homogeneous) objects, but they can also hold non-homogeneous data. Lists are mutable objects. Elements of the collection are specified within two square brackets [] and are comma separated.
We can use the function print to print the content of lists. Some examples of list definitions follow:
[2]:
my_first_list = [1,2,3]
print("first:" , my_first_list)
my_second_list = [1,2,3,1,3] #elements can appear several times
print("second: ", my_second_list)
fruits = ["apple", "pear", "peach", "strawberry", "cherry"] #elements can be strings
print("fruits:", fruits)
an_empty_list = []
print("empty:" , an_empty_list)
another_empty_list = list()
print("another empty:", another_empty_list)
a_list_containing_other_lists = [[1,2], [3,4,5,6]] #elements can be other lists
print("list of lists:", a_list_containing_other_lists)
my_final_example = [my_first_list, a_list_containing_other_lists]
print("a list of lists of lists:", my_final_example)
first: [1, 2, 3]
second: [1, 2, 3, 1, 3]
fruits: ['apple', 'pear', 'peach', 'strawberry', 'cherry']
empty: []
another empty: []
list of lists: [[1, 2], [3, 4, 5, 6]]
a list of lists of lists: [[1, 2, 3], [[1, 2], [3, 4, 5, 6]]]
Operators for lists¶
Python provides several operators to handle lists. The following behave as they do on strings (remember that, as in strings, the first position is 0!):
Note however that the membership test obj in list requires that the whole tested obj is present as an element of the list, and that indexing (list[i] = obj) can also change the corresponding value of the list (lists are mutable objects).
Some examples follow.
[3]:
A = [1, 2, 3 ]
B = [1, 2, 3, 1, 2]
print("A is a ", type(A))
A is a <class 'list'>
[4]:
print(A, " has length: ", len(A))
[1, 2, 3] has length: 3
[5]:
print("A[0]: ", A[0], " A[1]:", A[1], " A[-1]:", A[-1])
A[0]: 1 A[1]: 2 A[-1]: 3
[6]:
print(B, " has length: ", len(B))
[1, 2, 3, 1, 2] has length: 5
[7]:
print("Is A equal to B?", A == B)
Is A equal to B? False
[8]:
C = A + [1, 2]
print(C)
[1, 2, 3, 1, 2]
[9]:
print("Is C equal to B?", B == C)
Is C equal to B? True
[10]:
D = [1, 2, 3]*8
[11]:
E = D[12:18] #slicing
print(E)
[1, 2, 3, 1, 2, 3]
[12]:
print("Is A*2 equal to E?", A*2 == E)
Is A*2 equal to E? True
[13]:
A = [1, 2, 3, 4, 5, 6]
B = [1, 3, 5]
print("A:", A)
print("B:", B)
A: [1, 2, 3, 4, 5, 6]
B: [1, 3, 5]
[14]:
print("Is B in A?", B in A)
Is B in A? False
[15]:
print("A\'s ID:", id(A))
A's ID: 140585721605768
[16]:
A[5] = [1,3,5] # we can replace an element (here the 6) with any object, even a list
print(A)
[1, 2, 3, 4, 5, [1, 3, 5]]
[17]:
print("A\'s ID:", id(A))
A's ID: 140585721605768
[18]:
print("A has length:", len(A))
A has length: 6
[19]:
print("Is now B in A?", B in A)
Is now B in A? True
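As the last example shows, using in on a list checks whether the tested object is one of the elements, whereas on strings it checks for a substring. Here is a small sketch contrasting the two (the values are just for illustration):

```python
print("ell" in "hello")          # True : on strings, in looks for a substring
print([1, 3] in [1, 2, 3])       # False : [1, 3] is not one of the elements
print([1, 3] in [1, [1, 3], 2])  # True : here the list [1, 3] IS an element
print(3 in [1, 2, 3])            # True
```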
Note: When indexing, do not exceed the list boundaries (or you will get a list index out of range error).
Consider the following example:
[20]:
A = [1, 2, 3, 4, 5, 6]
print("A has length:", len(A))
A has length: 6
[21]:
print("First element:", A[0])
First element: 1
print("7th-element: ", A[6])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-67-98687c36d491> in <module>
----> 1 print("7th-element: ", A[6])
IndexError: list index out of range
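Slicing, unlike indexing, never raises this error, because out-of-range slice bounds are silently clipped to the list length. Here is a minimal sketch that reuses the same A and also shows one way to check an index before using it:

```python
A = [1, 2, 3, 4, 5, 6]

print(A[4:100])   # [5, 6] : the end bound is clipped to len(A)
print(A[10:])     # [] : a start beyond the end gives an empty list

# indexing, instead, must stay in range: one way is to check first
i = 6
if 0 <= i < len(A):
    print("element:", A[i])
else:
    print("index", i, "is out of range for a list of length", len(A))
```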
Example: Consider the matrix \(M = \begin{bmatrix}1 & 2 & 3\\ 1 & 2 & 1\\ 1 & 1 & 3\end{bmatrix}\) and the vector \(v=[10, 5, 10]^T\). What is the matrix-vector product \(M*v\)?
[22]:
M = [[1, 2, 3], [1, 2, 1], [1, 1, 3]]
v = [10, 5, 10]
prod = [0, 0 ,0] #at the beginning the product is the null vector
prod[0]=M[0][0]*v[0] + M[0][1]*v[1] + M[0][2]*v[2]
prod[1]=M[1][0]*v[0] + M[1][1]*v[1] + M[1][2]*v[2]
prod[2]=M[2][0]*v[0] + M[2][1]*v[1] + M[2][2]*v[2]
print("M: ", M)
M: [[1, 2, 3], [1, 2, 1], [1, 1, 3]]
[23]:
print("v: ", v)
v: [10, 5, 10]
[24]:
print("M*v: ", prod)
M*v: [50, 30, 45]
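Writing one line per row only works for a 3x3 matrix. Here is a minimal sketch of a general version with two nested loops, assuming M is a list of rows that all have the same length as v. The name mat_vec is made up for this example and is not a library function.

```python
def mat_vec(M, v):
    # hypothetical helper: M is a list of rows, each row as long as v
    prod = []
    for row in M:
        total = 0
        for j in range(len(v)):
            total += row[j] * v[j]   # row-by-column product
        prod.append(total)
    return prod

M = [[1, 2, 3], [1, 2, 1], [1, 1, 3]]
v = [10, 5, 10]
print(mat_vec(M, v))   # [50, 30, 45]
```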
Methods of the class list¶
The class list has some methods to operate on it. Recall from the lecture the following methods:
Note: Lists are mutable objects, and therefore virtually all the previous methods (except count) do not return a value: instead, they modify the list in place.
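A quick way to convince yourself of this is to look at what such a method returns. Here is a small sketch, which also shows a classic mistake to avoid:

```python
A = [3, 1, 2]

result = A.append(4)
print(result)   # None : append modified A in place
print(A)        # [3, 1, 2, 4]

# classic mistake: assigning the result of sort() back to the variable
B = [3, 1, 2]
B = B.sort()
print(B)        # None : the sorted list has been lost!
```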
Some usage examples follow:
[25]:
#A numeric list
A = [1, 2, 3]
print(A)
[1, 2, 3]
[26]:
print("A has id:", id(A))
A has id: 140585712305608
[27]:
A.append(72) # appends one and only one object.
# NOTE: does not return anything !!!!
[28]:
print(A)
[1, 2, 3, 72]
[29]:
print("A has id:", id(A))
A has id: 140585712305608
[30]:
A.extend([1, 5, 124, 99]) # adds all these objects, one after the other.
# NOTE: does not return anything !!!
[31]:
print(A)
[1, 2, 3, 72, 1, 5, 124, 99]
[32]:
print("A has id:", id(A)) # same id as before
A has id: 140585712305608
[33]:
D = [9,6,4]
A = A + D # beware: + between lists generates an entirely *new* list !!!!
print(A)
[1, 2, 3, 72, 1, 5, 124, 99, 9, 6, 4]
[34]:
print("A has now id:", id(A)) # id is different from before !!!
A has now id: 140585822899400
[35]:
A.reverse() # Does not return anything !!!
[36]:
print(A)
[4, 6, 9, 99, 124, 5, 1, 72, 3, 2, 1]
[37]:
A.sort()
print(A)
[1, 1, 2, 3, 4, 5, 6, 9, 72, 99, 124]
[38]:
print("Min value: ", A[0]) # In this simple case, could have used min(A)
Min value: 1
[39]:
print("Max value: ", A[-1]) #In this simple case, could have used max(A)
Max value: 124
[40]:
print("Number 1 appears:", A.count(1), " times")
Number 1 appears: 2 times
[41]:
print("While number 837: ", A.count(837))
While number 837: 0
Exercise: growing list 1¶
Given a list la of fixed size 7, write some code to grow an empty list lb so that it contains only the elements from la at even indices (0, 2, 4, …).
Your code should work for any list la of fixed size 7.
# 0 1 2 3 4 5 6 indices
la=[8,4,3,5,7,3,5]
lb=[]
After your code, you should get:
>>> print(lb)
[8,3,7,5]
[42]:
# 0 1 2 3 4 5 6 indices
la=[8,4,3,5,7,3,5]
lb=[]
# write here
lb.append(la[0])
lb.append(la[2])
lb.append(la[4])
lb.append(la[6])
print(lb)
[8, 3, 7, 5]
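The solution above works only because la has exactly 7 elements. Here is a sketch that works for a list of any length. It either loops over the even indices with range and a step of 2, or, more compactly, takes a slice with step 2.

```python
la = [8, 4, 3, 5, 7, 3, 5]

# version 1: grow lb by looping over the even indices 0, 2, 4, ...
lb = []
for i in range(0, len(la), 2):
    lb.append(la[i])
print(lb)    # [8, 3, 7, 5]

# version 2: a slice with step 2 directly builds a new list
lb2 = la[::2]
print(lb2)   # [8, 3, 7, 5]
```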
Exercise: growing list 2¶
Given two lists la and lb, write some code that MODIFIES la such that at the end la also contains all the elements of lb.
NOTE 1: your code should work with any la and lb
NOTE 2: If you print id(la) before modifying la and id(la) afterwards, you should get exactly the same id. If you get a different one, it means you generated an entirely new list. In any case, check how it works in Python Tutor.
la = [5,9,2,4]
lb = [9,1,2]
You should obtain:
>>> print(la)
[5,9,2,4,9,1,2]
>>> print(lb)
[9,1,2]
[43]:
la = [5,9,2,4]
lb = [9,1,2]
# write here
la.extend(lb)
print(la)
print(lb)
[5, 9, 2, 4, 9, 1, 2]
[9, 1, 2]
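A related detail worth knowing is that la += lb also modifies la in place, like extend, while la = la + lb builds an entirely new list. Here is a small sketch that makes the difference visible by comparing ids:

```python
la = [5, 9, 2, 4]
lb = [9, 1, 2]

before = id(la)
la += lb                          # in place, like la.extend(lb)
same_after_iadd = id(la) == before
print(same_after_iadd)            # True

before = id(la)
la = la + lb                      # builds a brand new list, then rebinds la
same_after_add = id(la) == before
print(same_after_add)             # False
```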
List of strings¶
Let’s now try a list with strings; we will try to obtain a reverse lexicographic order:
[44]:
#A string list
fruits = ["apple", "banana", "pineapple", "cherry","pear", "almond", "orange"]
print(fruits)
['apple', 'banana', 'pineapple', 'cherry', 'pear', 'almond', 'orange']
[45]:
fruits.sort() # does not return anything. Modifies list!
[46]:
print(fruits)
['almond', 'apple', 'banana', 'cherry', 'orange', 'pear', 'pineapple']
[47]:
fruits.reverse()
print(fruits)
['pineapple', 'pear', 'orange', 'cherry', 'banana', 'apple', 'almond']
[48]:
fruits.remove("banana") # NOTE: does not return anything !!!
[49]:
print(fruits)
['pineapple', 'pear', 'orange', 'cherry', 'apple', 'almond']
[50]:
fruits.insert(5, "wild apple") # put wild apple after apple.
# NOTE: does not return anything !!!
[51]:
print(fruits)
['pineapple', 'pear', 'orange', 'cherry', 'apple', 'wild apple', 'almond']
Let’s finally obtain the sorted fruits:
[52]:
fruits.sort() # does not return anything. Modifies list!
[53]:
print(fruits)
['almond', 'apple', 'cherry', 'orange', 'pear', 'pineapple', 'wild apple']
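These steps can also be done differently. The built-in sorted function returns a new sorted list without touching the original, and sort accepts a reverse=True parameter that gives the reverse order in one step. Here is a small sketch starting again from the original, unsorted list:

```python
fruits = ["apple", "banana", "pineapple", "cherry", "pear", "almond", "orange"]

# sorted() returns a NEW list and leaves fruits untouched
by_name = sorted(fruits)
print(by_name)   # ['almond', 'apple', 'banana', 'cherry', 'orange', 'pear', 'pineapple']
print(fruits)    # still in the original order

# reverse lexicographic order in a single step, modifying fruits in place
fruits.sort(reverse=True)
print(fruits)    # ['pineapple', 'pear', 'orange', 'cherry', 'banana', 'apple', 'almond']
```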
Some things to remember
append and extend work quite differently:
[54]:
A = [1, 2, 3]
A.extend([4, 5])
[55]:
print(A)
[1, 2, 3, 4, 5]
[56]:
B = [1, 2, 3]
B.append([4,5]) # NOTE: append does not return anything !
[57]:
print(B)
[1, 2, 3, [4, 5]]
To remove an object it must exist:
[58]:
A = [1,2,3, [[4],[5,6]], 8]
print(A)
[1, 2, 3, [[4], [5, 6]], 8]
[59]:
A.remove(2) # NOTE: remove does not return anything !!
[60]:
print(A)
[1, 3, [[4], [5, 6]], 8]
[61]:
A.remove([[4],[5,6]]) # NOTE: remove does not return anything !!
[62]:
print(A)
[1, 3, 8]
A.remove(7) # 7 is not present in list, python will complain during execution
# NOTE: remove does not return anything !!
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-61-6cfd75f76650> in <module>
----> 1 A.remove(7) # 7 is not present in list, python will complain during execution
2 # NOTE: remove does not return anything !!
ValueError: list.remove(x): x not in list
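If you are not sure an element is present, here is a sketch of two common ways to avoid the error. Also note that remove only deletes the first occurrence.

```python
A = [1, 3, 8]

# option 1: check membership before removing
if 7 in A:
    A.remove(7)

# option 2: try the removal and handle the ValueError
try:
    A.remove(7)
except ValueError:
    print("7 is not in the list, nothing was removed")

print(A)   # [1, 3, 8]

# remove deletes only the FIRST occurrence
B = [1, 2, 1]
B.remove(1)
print(B)   # [2, 1]
```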
To sort a list, its elements must be sortable, i.e. comparable with each other (for example all numbers, or all strings)!
[63]:
A = [4,3, 1,7, 2]
print(A)
[4, 3, 1, 7, 2]
[64]:
A.sort() # NOTE: sort does not return anything !!
[65]:
print(A)
[1, 2, 3, 4, 7]
[66]:
A.append("banana") # NOTE: append does not return anything !!
[67]:
print(A)
[1, 2, 3, 4, 7, 'banana']
A.sort() # Python will complain, list contains uncomparable elements
# like ints and strings
# NOTE: sort does not return anything !!
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-75-acf26fcfe0bf> in <module>
----> 1 A.sort() # Python will complain, list contains uncomparable elements like ints and strings
2 # NOTE: sort does not return anything !!
TypeError: '<' not supported between instances of 'str' and 'int'
Lists hold references¶
Important to remember:
Lists hold references to objects rather than the objects themselves, and since lists are mutable this has some important consequences!
Take a look at the following examples:
[68]:
l1 = [1, 2]
print("l1:", l1)
l1: [1, 2]
[69]:
l2 = [4, 3]
print("l2:",l2)
l2: [4, 3]
[70]:
LL = [l1, l2]
print("LL:", LL)
LL: [[1, 2], [4, 3]]
[71]:
l1.append(7) # NOTE: does not return anything !!
print("\nAppending 7 to l1...")
print("l1:", l1)
print("LL now: ", LL)
Appending 7 to l1...
l1: [1, 2, 7]
LL now: [[1, 2, 7], [4, 3]]
[72]:
LL[0][1] = -1
print("\nSetting LL[0][1]=-1...")
print("LL now:" , LL)
print("l1 now", l1)
Setting LL[0][1]=-1...
LL now: [[1, -1, 7], [4, 3]]
l1 now [1, -1, 7]
[73]:
# but the list can point also to a different object,
# without affecting the original list.
LL[0] = 100
print("\nSetting LL[0] = 100")
print("LL now:", LL)
print("l1 now", l1)
Setting LL[0] = 100
LL now: [100, [4, 3]]
l1 now [1, -1, 7]
Making copies¶
[74]:
A = ["hi", "there"]
print("A:", A)
A: ['hi', 'there']
[75]:
B = A
print("B:", B)
B: ['hi', 'there']
[76]:
A.extend(["from", "python"]) # NOTE: extend does not return anything !
[77]:
print("A now: ", A)
print("B now: ", B)
A now: ['hi', 'there', 'from', 'python']
B now: ['hi', 'there', 'from', 'python']
Copy example
Let’s make a distinct copy of A
[78]:
C = A[:] # all the elements of A have been copied in C
print("C:", C)
C: ['hi', 'there', 'from', 'python']
[79]:
A[3] = "java"
print("A now:", A)
print("C now:", C)
A now: ['hi', 'there', 'from', 'java']
C now: ['hi', 'there', 'from', 'python']
Be careful though:
[80]:
D = [A, A]
print("D:", D)
D: [['hi', 'there', 'from', 'java'], ['hi', 'there', 'from', 'java']]
[81]:
E = D[:]
print("E:", E)
E: [['hi', 'there', 'from', 'java'], ['hi', 'there', 'from', 'java']]
[82]:
D[0][0] = "hello"
print("\nD now:", D)
print("E now:", E)
print("A now:", A)
D now: [['hello', 'there', 'from', 'java'], ['hello', 'there', 'from', 'java']]
E now: [['hello', 'there', 'from', 'java'], ['hello', 'there', 'from', 'java']]
A now: ['hello', 'there', 'from', 'java']
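What happened above is that D[:] makes a shallow copy: E is a new outer list, but its elements are the very same list A. To also duplicate the nested lists, the standard library copy module offers deepcopy. Here is a sketch with a shorter A:

```python
import copy

A = ["hi", "there"]
D = [A, A]

E = D[:]               # shallow copy: new outer list, same inner list A
F = copy.deepcopy(D)   # deep copy: the inner list is duplicated too

D[0][0] = "hello"
print(E)               # [['hello', 'there'], ['hello', 'there']]
print(F)               # [['hi', 'there'], ['hi', 'there']]
print(E[0] is A, F[0] is A)   # True False
```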
Equality and identity¶
[83]:
A = [1, 2, 3]
B = A
C = [1, 2, 3]
[84]:
print("Is A equal to B?", A == B)
Is A equal to B? True
[85]:
print("Is A actually B?", A is B)
Is A actually B? True
[86]:
print("Is A equal to C?", A == C)
Is A equal to C? True
[87]:
print("Is A actually C?", A is C)
Is A actually C? False
In fact:
[88]:
print("\nA's id:", id(A))
print("B's id:", id(B))
print("C's id:", id(C))
A's id: 140585712271432
B's id: 140585712271432
C's id: 140585711965896
[89]:
#just to confirm that:
A.append(4) # NOTE: append does not return anything !
[90]:
B.append(5) # NOTE: append does not return anything !
[91]:
print("\nA now: ", A)
print("B now: ", B)
A now: [1, 2, 3, 4, 5]
B now: [1, 2, 3, 4, 5]
From strings to lists, the split method¶
Strings have a method split that can literally split the string at specific characters.
Example: Suppose we have a protein encoded as a multiline string. How can we split it into several lines?
[92]:
chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""
lines = chain_a.split('\n')
print("Original sequence:")
print( chain_a, "\n") #some spacing to keep things clear
print("line by line:")
print("1st line:" ,lines[0])
print("2nd line:" ,lines[1])
print("3rd line:" ,lines[2])
print("4th line:" ,lines[3])
print("5th line:" ,lines[4])
print("6th line:" ,lines[5])
print("\nSplit the 1st line in correspondence to FRL:\n",lines[0].split("FRL"))
Original sequence:
SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT
line by line:
1st line: SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
2nd line: FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
3rd line: RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
4th line: HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
5th line: IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
6th line: EPHHELPPGSTKRALPNNT
Split the 1st line in correspondence to FRL:
['SSSVPSQKTYQGSYG', 'GFLHSGTAKSVTCTYSPALNKM']
Note that in the last instruction the substring FRL has disappeared (as happened to the newline characters in the first split).
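When split is called with no argument, it splits on any run of whitespace (spaces, tabs, newlines) and does not produce empty strings. This can be handy for counting words, as in the manylines exercise below. An optional second argument limits the number of splits. Here is a small comparison on a made-up string:

```python
text = "this is  a\ttext\nstring on"

print(text.split(' '))        # only single spaces: note the '' and the unsplit tab/newline
print(text.split())           # ['this', 'is', 'a', 'text', 'string', 'on']
print("a-b-c".split("-", 1))  # ['a', 'b-c'] : at most one split
```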
And back to strings with the join method¶
Given a list, one can join the elements of the list together into a string by using the join method of the class str. The syntax is sep.join(list), which joins together all the elements of the list (which must be strings) into a single string, separating them with the string sep.
Example: Given the list ['Oct', '5th', '2018', '15:30'], let’s combine all its elements in a string joining the elements with a dash ("-") and print them. Let’s finally join them with a tab ("\t") and print them.
[93]:
vals = ['Oct', '5th', '2018', '15:30']
print(vals)
myStr = "-".join(vals)
print("\n" + myStr)
myStr = "\t".join(vals)
print("\n" + myStr)
['Oct', '5th', '2018', '15:30']
Oct-5th-2018-15:30
Oct 5th 2018 15:30
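Note that join only works if all the elements are strings. Here is a small sketch with a list containing numbers (parts is just an example list), along with one way to convert the elements first:

```python
parts = ['Oct', 5, 2018]

try:
    joined = "-".join(parts)
except TypeError as e:
    joined = None
    print("TypeError:", e)   # join refuses non-string elements

# convert every element to a string first
fixed = "-".join(str(p) for p in parts)
print(fixed)   # Oct-5-2018
```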
Exercise: manylines¶
Given the following text string:
"""this is a text
string on
several lines that does not say anything."""
print it
print how many lines, words and characters it contains.
sort the words alphabetically and print the first and the last in lexicographic order.
You should obtain:
this is a text
string on
several lines that does not say anything.
Lines: 3 words: 13 chars: 66
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 'x', 't', '\n', 's', 't', 'r', 'i', 'n', 'g', ' ', 'o', 'n', '\n', 's', 'e', 'v', 'e', 'r', 'a', 'l', ' ', 'l', 'i', 'n', 'e', 's', ' ', 't', 'h', 'a', 't', ' ', 'd', 'o', 'e', 's', ' ', 'n', 'o', 't', ' ', 's', 'a', 'y', ' ', 'a', 'n', 'y', 't', 'h', 'i', 'n', 'g', '.']
66
First word: a
Last word: this
['a', 'anything.', 'does', 'is', 'lines', 'not', 'on', 'say', 'several', 'string', 'text', 'that', 'this']
[94]:
s = """this is a text
string on
several lines that does not say anything."""
# write here
# 1) print it
print(s)
print("")
# 2) print the lines, words and characters
lines = s.split('\n')
# NOTE: words are separated by a space OR a newline!
# split() with no argument splits on any whitespace, so it handles both
words = s.split()
num_chars = len(s)
print("Lines:", len(lines), "words:", len(words), "chars:", num_chars)
# alternative way for number of characters:
print("")
characters = list(s)
num_chars2 = len(characters)
print(characters)
print(num_chars2)
# 3. sort the words alphabetically and print the first and the last in lexicographic order.
words.sort() # NOTE: it does not return ANYTHING!!!
print("")
print("First word: ", words[0])
print("Last word:", words[-1])
print(words)
this is a text
string on
several lines that does not say anything.
Lines: 3 words: 13 chars: 66
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 'x', 't', '\n', 's', 't', 'r', 'i', 'n', 'g', ' ', 'o', 'n', '\n', 's', 'e', 'v', 'e', 'r', 'a', 'l', ' ', 'l', 'i', 'n', 'e', 's', ' ', 't', 'h', 'a', 't', ' ', 'd', 'o', 'e', 's', ' ', 'n', 'o', 't', ' ', 's', 'a', 'y', ' ', 'a', 'n', 'y', 't', 'h', 'i', 'n', 'g', '.']
66
First word: a
Last word: this
['a', 'anything.', 'does', 'is', 'lines', 'not', 'on', 'say', 'several', 'string', 'text', 'that', 'this']
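Counting words depends on which split variant you use. split(' ') breaks only on spaces, so a newline stays glued to the words around it, while split() with no argument breaks on any whitespace and splitlines() breaks only at line boundaries. A short sketch on the same string:

```python
s = """this is a text
string on
several lines that does not say anything."""

# split(' ') keeps the newline inside a "word"
print(repr(s.split(' ')[3]))  # 'text\nstring'
print(len(s.split(' ')))      # 11, not 13!

# split() with no argument treats spaces and newlines alike
print(len(s.split()))         # 13

# splitlines() splits only at line boundaries
print(len(s.splitlines()))    # 3
```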
Exercise: welldone¶
Given the list
L = ["walnut", "eggplant", "lemon", "lime", "date", "onion", "nectarine", "endive" ]:
Create another list (called newList) containing the first letter of each element of L (e.g. newList = ["w", "e", ...]).
Add a space to newList at position 4 and append an exclamation mark (!) at the end. Print the list.
Print the content of the list joining all the elements with an empty string (i.e. use the join method: "".join(newList))
You should obtain:
['w', 'e', 'l', 'l', ' ', 'd', 'o', 'n', 'e', '!']
well done!
[95]:
L = ["walnut", "eggplant", "lemon", "lime", "date", "onion", "nectarine", "endive" ]
# write here
newList = []
for word in L:              # take the first letter of each word
    newList.append(word[0])
newList.insert(4," ")
newList.append("!")
print(newList)
print("\n" + "".join(newList))
['w', 'e', 'l', 'l', ' ', 'd', 'o', 'n', 'e', '!']
well done!
Exercise: numlist¶
Given the list lst = [10, 60, 72, 118, 11, 71, 56, 89, 120, 175]
find the min, max and median value (hint: sort it and extract the right values).
Create a list with only the elements at even indexes (e.g. [10, 72, 11, ..]; note that the “..” means that the list is not complete!) and re-compute the min, max and median values.
Re-do the same for the elements located at odd indexes (e.g. [60, 118, ..]).
You should obtain:
lst: [10, 60, 72, 118, 11, 71, 56, 89, 120, 175]
even: [10, 72, 11, 56, 120]
odd: [60, 118, 71, 89, 175]
sorted: [10, 11, 56, 60, 71, 72, 89, 118, 120, 175]
sorted even: [10, 11, 56, 72, 120]
sorted odd: [60, 71, 89, 118, 175]
lst : Min: 10 Max. 175 Median: 72
even: Min: 10 Max. 120 Median: 56
odd: Min: 60 Max. 175 Median: 89
[2]:
lst = [10, 60, 72, 118, 11, 71, 56, 89, 120, 175]
# write here
even = lst[0::2]  # get only even-indexed elements (slice BEFORE sorting lst!)
odd = lst[1::2]   # get only odd-indexed elements
print("lst:", lst)
print("even:", even)
print("odd:", odd)
lst.sort()
even.sort()
odd.sort()
print("sorted:", lst)
print("sorted even:", even)
print("sorted odd:", odd)
print("lst: Min:", lst[0], "Max.", lst[-1], "Median:", lst[len(lst) // 2])
print("even: Min:", even[0], "Max.", even[-1], "Median:", even[len(even) // 2])
print("odd: Min:", odd[0], "Max.", odd[-1], "Median:", odd[len(odd) // 2])
lst: [10, 60, 72, 118, 11, 71, 56, 89, 120, 175]
even: [10, 72, 11, 56, 120]
odd: [60, 118, 71, 89, 175]
sorted: [10, 11, 56, 60, 71, 72, 89, 118, 120, 175]
sorted even: [10, 11, 56, 72, 120]
sorted odd: [60, 71, 89, 118, 175]
lst: Min: 10 Max. 175 Median: 72
even: Min: 10 Max. 120 Median: 56
odd: Min: 60 Max. 175 Median: 89
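The exercise takes as median the element at index len(lst) // 2 of the sorted list. That is exact when the list has an odd number of elements, but with an even number of elements the usual statistical definition averages the two middle values. A minimal sketch comparing the two (median is a hypothetical helper; statistics.median is from the standard library), which also uses the built-ins min() and max():

```python
import statistics

lst = [10, 60, 72, 118, 11, 71, 56, 89, 120, 175]

def median(xs):
    """Median as the mean of the two middle values when len(xs) is even."""
    s = sorted(xs)        # sorted() RETURNS a NEW list, xs is not modified
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(min(lst), max(lst))          # 10 175
print(sorted(lst)[len(lst) // 2])  # 72, the upper of the two middle values
print(median(lst))                 # 71.5
print(statistics.median(lst))      # 71.5
```

For the odd-length lists even and odd of the exercise, both definitions agree (56 and 89).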
List comprehension¶
List comprehension is a quick way of creating a list. The resulting list is normally obtained by applying a function or a method to the elements of another list that remains unchanged.
The basic syntax is:
new_list = [some_function(x) for x in start_list]
or
new_list = [x.some_method() for x in start_list]
List comprehension can also be used to filter elements of a list and produce another list containing only some of them (remember that the original list is not changed).
In this case the syntax is:
new_list = [some_function(x) for x in start_list if condition]
or
new_list = [x.some_method() for x in start_list if condition]
where, for each element x in start_list, the value some_function(x) (or x.some_method()) becomes part of new_list if and only if the condition evaluates to True.
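To make the syntax concrete, here is a small sketch (with a made-up list of pets) of a filtered comprehension next to the for loop it is equivalent to:

```python
start_list = ["dog", "cat", "rabbit"]

# list comprehension with a condition
lengths = [len(x) for x in start_list if x != "cat"]

# the equivalent for loop
lengths_loop = []
for x in start_list:
    if x != "cat":
        lengths_loop.append(len(x))

print(lengths)       # [3, 6]
print(lengths_loop)  # [3, 6]
```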
Let’s see some examples:
Example: Given a list of strings [“hi”, “there”, “from”, “python”] create a list with the length of the corresponding element (i.e. the one with the same index).
[97]:
elems = ["hi", "there", "from", "python"]
newList = [len(x) for x in elems]
for i in range(0,len(elems)):
    print(elems[i], " has length ", newList[i])
hi has length 2
there has length 5
from has length 4
python has length 6
Example: Given a list of strings [“dog”, “cat”, “rabbit”, “guinea pig”, “hamster”, “canary”, “goldfish”] create a list with the elements starting with a “c” or “g”.
[98]:
pets = ["dog", "cat", "rabbit", "guinea pig", "hamster", "canary", "goldfish"]
cg_pets = [x for x in pets if x.startswith("c") or x.startswith("g")]
print("Original:")
print(pets)
print("Filtered:")
print(cg_pets)
Original:
['dog', 'cat', 'rabbit', 'guinea pig', 'hamster', 'canary', 'goldfish']
Filtered:
['cat', 'guinea pig', 'canary', 'goldfish']
Example: Create a list with all the numbers divisible by 17 from 1 to 200.
[99]:
values = [ x for x in range(1,200) if x % 17 == 0]
print(values)
[17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187]
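Note that range(1,200) stops at 199; since 200 is not a multiple of 17, the result is the same as for 1 to 200 included. When the step is known in advance, a sketch of an alternative that skips the filter entirely, passing a third step argument to range:

```python
# start at 17, stop BEFORE 201 (so 200 is covered), step of 17
values = list(range(17, 201, 17))
print(values)  # [17, 34, 51, 68, 85, 102, 119, 136, 153, 170, 187]
```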
Example: Transpose the matrix \(\begin{bmatrix}1 & 10\\2 & 20\\3 & 30\\4 & 40\end{bmatrix}\) stored as a list of lists (i.e. matrix = [[1, 10], [2,20], [3,30], [4,40]]). The output matrix should be: \(\begin{bmatrix}1 & 2 & 3 & 4\\10 & 20 & 30 & 40\end{bmatrix}\), represented as [[1, 2, 3, 4], [10, 20, 30, 40]]
[100]:
matrix = [[1, 10], [2,20], [3,30], [4,40]]
print(matrix)
transpose = [[row[i] for row in matrix] for i in range(2)]
print (transpose)
[[1, 10], [2, 20], [3, 30], [4, 40]]
[[1, 2, 3, 4], [10, 20, 30, 40]]
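The solution hard-codes range(2) because this matrix has exactly two columns. A sketch of a more general version, assuming a non-empty matrix whose rows all have the same length: the number of columns is read from the first row. The built-in zip(*matrix) gives the same result, because it groups together the i-th elements of every row (as tuples, hence the list() conversion).

```python
matrix = [[1, 10], [2, 20], [3, 30], [4, 40]]

# number of columns taken from the first row instead of hard-coding 2
n_cols = len(matrix[0])
transpose = [[row[i] for row in matrix] for i in range(n_cols)]
print(transpose)      # [[1, 2, 3, 4], [10, 20, 30, 40]]

# zip(*matrix) yields (1, 2, 3, 4) and then (10, 20, 30, 40)
transpose_zip = [list(col) for col in zip(*matrix)]
print(transpose_zip)  # [[1, 2, 3, 4], [10, 20, 30, 40]]
```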
Example: Given the list:
["Hotel", "Icon"," Bus","Train", "Hotel", "Eye", "Rain", "Elephant"]
create a list with all the first letters.
[101]:
myList = ["Hotel", "Icon"," Bus","Train", "Hotel", "Eye", "Rain", "Elephant"]
initials = [x[0] for x in myList]
print(myList)
print(initials)
print("".join(initials))
['Hotel', 'Icon', ' Bus', 'Train', 'Hotel', 'Eye', 'Rain', 'Elephant']
['H', 'I', ' ', 'T', 'H', 'E', 'R', 'E']
HI THERE
Exercises with functions¶
ATTENTION
Following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
printwords¶
✪ Write a function printwords that PRINTS all the words in a phrase
>>> printwords("ciao come stai?")
ciao
come
stai?
[102]:
# write here
phrase = "ciao come stai?"
def printwords(f):
    my_list = f.split()  # DO *NOT* create a variable called 'list' !!!!
    for word in my_list:
        print(word)
printwords(phrase)
ciao
come
stai?
printeven¶
✪ Write a function printeven(xs) that PRINTS all even numbers in a list of numbers xs
>>> printeven([1,2,3,4,5,6])
2
4
6
[103]:
# write here
def printeven(xs):
    for x in xs:
        if x % 2 == 0:
            print(x)
numbers = [1,2,3,4,5,6]
printeven(numbers)
2
4
6
find26¶
✪ Write a function find26(xs) that RETURNS True if the number 26 is contained in the list of numbers xs, and False otherwise
>>> find26( [1,26,143,431,53,6] )
True
[104]:
# write here
def find26(xs):
    return 26 in xs
numbers = [1,26,143,431,53,6]
find26(numbers)
[104]:
True
firstsec¶
✪ Write a function firstsec(s) that PRINTS the first and second word of a phrase.
To find a list of words, you can use the .split() method
>>> firstsec("ciao come stai?")
ciao come
[105]:
# write here
def firstsec(s):
    my_list = s.split()  # use the parameter s, not a global variable! DO *NOT* create a variable called 'list' !!!!
    print(my_list[0], my_list[1])
phrase = "ciao come stai?"
firstsec(phrase)
ciao come
threeven¶
✪ Write a function threeven(xs) that PRINTS "yes" if the first three elements of a list are even numbers. Otherwise, the function must PRINT "no". If the list contains fewer than three elements, PRINT "not good"
>>> threeven([6,4,8,4,5])
yes
>>> threeven([2,5,6,3,4,5])
no
>>> threeven([4])
not good
[106]:
# write here
def threeven(xs):
    if len(xs) >= 3:
        if xs[0] % 2 == 0 and xs[1] % 2 == 0 and xs[2] % 2 == 0:
            print("yes")
        else:
            print("no")
    else:
        print("not good")
threeven([6,4,8,4,5])
threeven([2,5,6,3,4,5])
threeven([4])
yes
no
not good
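A more compact way to express the "first three are even" test, as a sketch: the slice xs[:3] takes the first three elements, and the built-in all() RETURNS True when every element of the sequence satisfies the condition. The helper threeven_answer is hypothetical and RETURNS the answer instead of printing it, so it can be tested:

```python
def threeven_answer(xs):
    """Hypothetical variant of threeven: RETURNS the string instead of printing it."""
    if len(xs) < 3:
        return "not good"
    if all(x % 2 == 0 for x in xs[:3]):
        return "yes"
    return "no"

print(threeven_answer([6, 4, 8, 4, 5]))     # yes
print(threeven_answer([2, 5, 6, 3, 4, 5]))  # no
print(threeven_answer([4]))                 # not good
```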
separate_ip¶
✪ An IP address is a string with four sequences of numbers (each of max length 3), separated by a dot (.). For example, 192.168.19.34 and 255.31.1.0 are IP addresses.
Write a function separate_ip(s) that, given an IP address as input, PRINTS the numbers inside the IP address
NOTE: do NOT use the .replace method!
>>> separate_ip("192.168.0.1")
192
168
0
1
[107]:
# write here
def separate_ip(s):
    separated = s.split(".")
    for element in separated:
        print(element)
separate_ip("192.168.0.1")
192
168
0
1
average¶
✪ Given a list of integer numbers, write a function average(xs) that RETURNS the arithmetic average of the numbers it contains. If the given list is empty, RETURN zero.
>>> x = average([3,4,2,3]) # ( 12/4 => 3.0)
>>> x
3.0
>>> y = average([])
>>> y
0
>>> z = average([ 30, 28 , 20, 29 ])
>>> z
26.75
[108]:
# write here
def average(xs):
    if len(xs) == 0:
        return 0
    else:
        total = 0
        for x in xs:
            total = total + x
        return total / len(xs)
av = average([])
print(av)
average([30,28,20,29])
0
[108]:
26.75
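The explicit loop is good practice, but Python already has a built-in sum(). A sketch of the same function using it (average_builtin is a hypothetical name); the empty-list check is still needed, because dividing by len([]) == 0 would raise ZeroDivisionError:

```python
def average_builtin(xs):
    """Hypothetical variant of average using the built-in sum()."""
    if len(xs) == 0:
        return 0   # avoid ZeroDivisionError on empty lists
    return sum(xs) / len(xs)

print(average_builtin([3, 4, 2, 3]))      # 3.0
print(average_builtin([]))                # 0
print(average_builtin([30, 28, 20, 29]))  # 26.75
```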
Verify comprehension¶
ATTENTION
Following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
Tests with asserts: the following exercises contain automated tests to help you spot errors. To understand how they work, first read Error handling and testing
We will discuss the differences between modifying a list and returning a new one, and look into basic operations like mapping (transform), filter and reduce.
Mapping¶
Generally speaking, mapping (or transform) operations take something as input and give back the same type of thing with its elements somehow changed.
In these cases, pay attention to whether you are required to give back a NEW list or to MODIFY the existing list.
newdoublefor¶
Difficulty: ✪
[109]:
def newdoublefor(lst):
    """ Takes a list of integers as input and RETURNS a NEW one with all
        the numbers of lst doubled. Implement it with a for.
        Example:
            newdoublefor([3,7,1])
        returns:
            [6,14,2]
    """
    #jupman-raise
    ret = []
    for x in lst:
        ret.append(x*2)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert newdoublefor([]) == []
assert newdoublefor([3]) == [6]
assert newdoublefor([3,7,1]) == [6,14,2]
l = [3,7,1]
assert newdoublefor(l) == [6,14,2]
assert l == [3,7,1]
# TEST END
double¶
Difficulty: ✪✪
[110]:
def double(lst):
    """ Takes a list of integers as input and MODIFIES it by doubling all the numbers
    """
    #jupman-raise
    for i in range(len(lst)):
        lst[i] = lst[i] * 2
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
l = []
double(l)
assert l == []
l = [3]
double(l)
assert l == [6]
l = [3,7,1]
double(l)
assert l == [6,14,2]
# TEST END
newdoublecomp¶
Difficulty: ✪
[111]:
def newdoublecomp(lst):
    """ Takes a list of integers as input and RETURNS a NEW one with all
        the numbers of lst doubled. Implement it as a list comprehension
        Example:
            newdoublecomp([3,7,1])
        returns:
            [6,14,2]
    """
    #jupman-raise
    return [x*2 for x in lst]
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert newdoublecomp([]) == []
assert newdoublecomp([3]) == [6]
assert newdoublecomp([3,7,1]) == [6,14,2]
l = [3,7,1]
assert newdoublecomp(l) == [6,14,2]
assert l == [3,7,1]
# TEST END
up¶
Difficulty: ✪
[112]:
def up(lst):
    """ Takes a list of strings and RETURNS a NEW list having all the strings in lst in capital letters
        (use the .upper() method and a list comprehension)
    """
    #jupman-raise
    return [x.upper() for x in lst]
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert up([]) == []
assert up(['']) == ['']
assert up(['a']) == ['A']
assert up(['aA']) == ['AA']
assert up(['Ba']) == ['BA']
assert up(['Ba', 'aC']) == ['BA','AC']
assert up(['Ba dA']) == ['BA DA']
l = ['ciAo']
assert up(l) == ['CIAO']
assert l == ['ciAo']
# TEST END
Filter¶
Generally speaking, filter operations take something as input and give back the same type of thing with some elements filtered out.
In these cases, pay attention to whether you are required to give back a NEW list or to MODIFY the existing list.
remall¶
Difficulty: ✪✪
[113]:
def remall(list1, list2):
    """ RETURN a NEW list which has the elements from list2 except the elements in list1
    """
    #jupman-raise
    list3 = list2[:]   # copy list2, so the original is NOT modified
    for x in list1:
        if x in list3:
            list3.remove(x)   # NOTE: .remove deletes only the FIRST occurrence of x
    return list3
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert remall([],[]) == []
assert remall(['a'], []) == []
assert remall([], ['a']) == ['a']
assert remall(['a'], ['a']) == []
assert remall(['b'], ['a']) == ['a']
assert remall(['a', 'b'], ['a','c','b']) == ['c']
assert remall(['a','d'], ['a','c','d','b']) == ['c', 'b']
# TEST END
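Because list.remove deletes only the first occurrence of a value, the solution above removes each element of list1 from the copy at most once per occurrence in list1: for instance remall(['a'], ['a', 'a']) gives ['a']. If instead you want to drop every occurrence, a sketch with a list comprehension (remall_every is a hypothetical name):

```python
def remall_every(list1, list2):
    """RETURN a NEW list with the elements of list2 which do NOT appear in list1."""
    return [x for x in list2 if x not in list1]

print(remall_every(['a'], ['a', 'a', 'b']))            # ['b']
print(remall_every(['a', 'd'], ['a', 'c', 'd', 'b']))  # ['c', 'b']
```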
only_capital_for¶
Difficulty: ✪
[114]:
def only_capital_for(lst):
    """ Takes a list of strings lst and RETURNS a NEW list which only contains the strings
        of lst which are all in capital letters (so keeps 'AB' but not 'aB')
        Implement it with a for
    """
    #jupman-raise
    ret = []
    for el in lst:
        if el.isupper():
            ret.append(el)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert only_capital_for(["CD"]) == [ "CD"]
assert only_capital_for(["ab"]) == []
assert only_capital_for(["dE"]) == []
assert only_capital_for(["De"]) == []
assert only_capital_for(["ab","DE"]) == ["DE"]
assert only_capital_for(["ab", "CD", "Hb", "EF"]) == [ "CD", "EF"]
# TEST END
only_capital_comp¶
Difficulty: ✪
[115]:
def only_capital_comp(lst):
    """ Takes a list of strings lst and RETURNS a NEW list which only contains the strings
        of lst which are all in capital letters (so keeps 'AB' but not 'aB')
        Implement it with a list comprehension
    """
    #jupman-raise
    return [el for el in lst if el.isupper()]
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert only_capital_comp(["CD"]) == [ "CD"]
assert only_capital_comp(["ab"]) == []
assert only_capital_comp(["dE"]) == []
assert only_capital_comp(["De"]) == []
assert only_capital_comp(["ab","DE"]) == ["DE"]
assert only_capital_comp(["ab", "CD", "Hb", "EF"]) == [ "CD", "EF"]
# END
Reduce¶
Generally speaking, reduce operations combine a collection of elements into a result which is often smaller, frequently a single value.
In these cases we operate on lists. Pay attention to whether you are required to give back a NEW list or to MODIFY the existing list.
sum_all¶
Difficulty: ✪
[116]:
def sum_all(lst):
    """ RETURN the sum of all elements in lst
        Implement it as you like.
    """
    #jupman-raise
    return sum(lst)
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert sum_all([]) == 0
assert sum_all([7,5]) == 12
assert sum_all([9,5,8]) == 22
# TEST END
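Python's standard library offers functools.reduce, which implements exactly this pattern: it repeatedly combines an accumulated value with the next element. A sketch of sum_all written with it (sum_all_reduce is a hypothetical name); the third argument 0 is the initial value, which is also what gets RETURNED for an empty list:

```python
from functools import reduce

def sum_all_reduce(lst):
    # computes ((0 + lst[0]) + lst[1]) + ... from left to right
    return reduce(lambda acc, x: acc + x, lst, 0)

print(sum_all_reduce([]))         # 0
print(sum_all_reduce([9, 5, 8]))  # 22
```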
sum_all_even_for¶
Difficulty: ✪
[117]:
def sum_all_even_for(lst):
    """ RETURN the sum of all even elements in lst
        Implement it with a for
    """
    #jupman-raise
    ret = 0
    for el in lst:
        if el % 2 == 0:
            ret += el
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert sum_all_even_for([]) == 0
assert sum_all_even_for([9]) == 0
assert sum_all_even_for([4]) == 4
assert sum_all_even_for([7,2,5,8]) == 10
# END
sum_all_even_comp¶
Difficulty: ✪
[118]:
def sum_all_even_comp(lst):
    """ RETURN the sum of all even elements in lst
        Implement it in one line as an operation on a list comprehension
    """
    #jupman-raise
    return sum([el for el in lst if el % 2 == 0])
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert sum_all_even_comp([]) == 0
assert sum_all_even_comp([9]) == 0
assert sum_all_even_comp([4]) == 4
assert sum_all_even_comp([7,2,5,8]) == 10
# END
Other exercises¶
contains¶
✪ RETURN True if x is present in the list xs, otherwise RETURN False
[119]:
def contains(xs, x):
    #jupman-raise
    return x in xs
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert contains([],'a') == False
assert contains(['a'],'a') == True
assert contains(['a','b','c'],'b') == True
assert contains(['a','b','c'],'z') == False
# END TEST
firstn¶
✪ RETURN a list with the numbers from 0 included to n excluded
For example, firstn(3) must RETURN [0,1,2]
If n < 0, RETURN an empty list
Ingredients: - variable list to return - counter variable - while cycle (there are also other ways) - return
[120]:
def firstn(n):
    #jupman-raise
    ret = []
    counter = 0
    while counter < n:   # for n <= 0 the body never runs, so [] is returned
        ret.append(counter)
        counter += 1
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert firstn(-1) == []
assert firstn(-2) == []
assert firstn(0) == []
assert firstn(1) == [0]
assert firstn(2) == [0,1]
assert firstn(3) == [0,1,2]
# TEST END
firstlast¶
✪ RETURN True if the first element of a list is equal to the last one, otherwise RETURN False
NOTE: you can assume the list always contains at least one element.
[121]:
def firstlast(xs):
    #jupman-raise
    # note: the comparison xs[0] == xs[-1] is an EXPRESSION which generates a boolean,
    # in this case True if the first element is equal to the last one and False otherwise,
    # so we can directly return the result of the expression
    return xs[0] == xs[-1]
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert firstlast(['a']) == True
assert firstlast(['a','a']) == True
assert firstlast(['a','b']) == False
assert firstlast(['a','b','a']) == True
assert firstlast(['a','b','c','a']) == True
assert firstlast(['a','b','c','d']) == False
# TEST END
dup¶
✪ RETURN a NEW list, in which each element of the input list is duplicated. For example, dup(['ciao','mondo','python']) must RETURN ['ciao','ciao','mondo','mondo','python','python']
Ingredients: - variable for a new list - for cycle - return
[122]:
def dup(xs):
    #jupman-raise
    ret = []
    for x in xs:
        ret.append(x)
        ret.append(x)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert dup([]) == []
assert dup(['a']) == ['a','a']
assert dup(['a','b']) == ['a','a','b','b']
assert dup(['a','b','c']) == ['a','a','b','b','c','c']
assert dup(['a','a']) == ['a','a','a','a']
assert dup(['a','a','b','b']) == ['a','a','a','a','b','b','b','b']
# TEST END
hasdup¶
✪✪ RETURN True if xs contains the element x more than once, otherwise RETURN False.
[123]:
def hasdup(x, xs):
    #jupman-raise
    counter = 0
    for y in xs:
        if y == x:
            counter += 1
            if counter > 1:   # second occurrence found: no need to scan further
                return True
    return False
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert hasdup("a", []) == False
assert hasdup("a", ["a"]) == False
assert hasdup("a", ["a", "a"]) == True
assert hasdup("a", ["a", "a", "a"]) == True
assert hasdup("a", ["b", "a", "a"]) == True
assert hasdup("a", ["b", "a", "a", "a"]) == True
assert hasdup("b", ["b", "a", "a", "a"]) == False
assert hasdup("b", ["b", "a", "b", "a"]) == True
# TEST END
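A shorter alternative, as a sketch: list.count(x) RETURNS how many times x occurs in the list. Unlike the loop above, which stops as soon as it finds the second occurrence, count always scans the whole list (hasdup_count is a hypothetical name):

```python
def hasdup_count(x, xs):
    # xs.count(x) scans all of xs and counts the elements equal to x
    return xs.count(x) > 1

print(hasdup_count("a", ["b", "a", "a"]))  # True
print(hasdup_count("b", ["b", "a", "a"]))  # False
```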
ord3¶
✪✪ RETURN True if the first three elements of the provided list are in increasing order (equal consecutive values are accepted), False otherwise
If xs has fewer than three elements, RETURN False
[124]:
def ord3(xs):
    #jupman-raise
    if len(xs) >= 3:
        return xs[0] <= xs[1] and xs[1] <= xs[2]
    else:
        return False
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert ord3([5]) == False
assert ord3([4,7]) == False
assert ord3([4,6,9]) == True
assert ord3([4,9,7]) == False
assert ord3([9,5,7]) == False
assert ord3([4,8,9,1,5]) == True # first 3 elements increasing
assert ord3([9,4,8,10,13]) == False # first 3 elements NOT increasing
# TEST END
filterab¶
✪✪ Takes as input a list of characters, and RETURNS a NEW list containing only the characters 'a' and 'b' found in the input list.
Example: filterab(['c','a','c','d','b','a','c','a','b','e']) must return ['a','b','a','a','b']
[125]:
def filterab(xs):
    #jupman-raise
    ret = []
    for x in xs:
        if x == 'a' or x == 'b':
            ret.append(x)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert filterab([]) == []
assert filterab(['a']) == ['a']
assert filterab(['b']) == ['b']
assert filterab(['a','b']) == ['a','b']
assert filterab(['a','b','c']) == ['a','b']
assert filterab(['a','c','b']) == ['a','b']
assert filterab(['c','a','b']) == ['a','b']
assert filterab(['c','a','c','d','b','a','c','a','b','e']) == ['a','b','a','a','b']
l = ['a','c','b']
assert filterab(l) == ['a','b'] # verify a NEW list is returned
assert l == ['a','c','b'] # verify original list was NOT modified
# TEST END
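Since this is a filter operation, it can also be written as a list comprehension; a sketch (filterab_comp is a hypothetical name), where x in ('a', 'b') replaces the two comparisons joined by or:

```python
def filterab_comp(xs):
    # builds a NEW list, xs is left untouched
    return [x for x in xs if x in ('a', 'b')]

print(filterab_comp(['c','a','c','d','b','a','c','a','b','e']))  # ['a', 'b', 'a', 'a', 'b']
```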
hill¶
✪✪ RETURN a list whose first elements are the numbers from 1 to n in increasing order, followed by the numbers decreasing from n-1 down to 1 included. NOTE: n is contained only once.
Example: hill(4) must return [1,2,3,4,3,2,1]
Ingredients: - variable for the list to return - two for cycles one after the other, using the range function, or two while cycles one after the other
[126]:
def hill(n):
    #jupman-raise
    ret = []
    for i in range(1,n):      # increasing part: 1 .. n-1
        ret.append(i)
    for i in range(n,0,-1):   # decreasing part: n .. 1
        ret.append(i)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert hill(0) == []
assert hill(1) == [1]
assert hill(2) == [1,2,1]
assert hill(3) == [1,2,3,2,1]
assert hill(4) == [1,2,3,4,3,2,1]
assert hill(5) == [1,2,3,4,5,4,3,2,1]
# TEST END
peak¶
✪✪ Suppose a list stores the heights of a mountain road, measured every 3 km (we assume the road constantly goes upward). At a certain point you reach the mountain peak, where you measure the height above sea level. Of course, there is also a road going downhill (constantly downward), and here too the height is measured every 3 km.
A measurement example is [100, 400, 800, 1220, 1600, 1400, 1000, 300, 40]
Write a function peak(xs) that RETURNS the value from the list which corresponds to the measurement taken at the peak.
If the list contains fewer than three elements, raise a ValueError exception
>>> peak([100,400, 800, 1220, 1600, 1400, 1000, 300, 40])
1600
[127]:
def peak(xs):
    #jupman-raise
    if len(xs) < 3:
        raise ValueError("Expected at least three measurements, got %s" % len(xs))
    for i in range(len(xs) - 1):   # stop one before the end, so xs[i+1] always exists
        if xs[i] > xs[i+1]:        # first descent: xs[i] is the peak
            return xs[i]
    return xs[-1]   # road without a way down: the peak is the last measurement
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
try:
    peak([])   # with this anomalous list we expect the exception ValueError to be raised
    raise Exception("Shouldn't arrive here!")
except ValueError:   # if the exception is raised, it is behaving as expected and we do nothing
    pass
assert peak([5,40,7]) == 40
assert peak([5,30,4]) == 30
assert peak([5,70,70, 4]) == 70
assert peak([5,10,80,25,2]) == 80
assert peak([100,400, 800, 1220, 1600, 1400, 1000, 300, 40]) == 1600
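Under the exercise's assumption (the road only goes up, then only goes down), the peak measurement is simply the largest value in the list, so a sketch using the built-in max() is enough. This variant (peak_max is a hypothetical name) relies entirely on that assumption; on arbitrary data it would return the global maximum rather than the value before the first descent:

```python
def peak_max(xs):
    if len(xs) < 3:
        raise ValueError("Expected at least three measurements, got %s" % len(xs))
    return max(xs)   # valid only for an up-then-down profile

print(peak_max([100, 400, 800, 1220, 1600, 1400, 1000, 300, 40]))  # 1600
```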
even¶
✪✪ RETURN a list containing the elements at even positions, starting from zero, which is considered even
You can assume the input list always contains an even number of elements
HINT: remember that range can take three parameters
[128]:
def even(xs):
    #jupman-raise
    ret = []
    for i in range(0,len(xs),2):   # indexes 0, 2, 4, ...
        ret.append(xs[i])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert even([]) == []
assert even(['a','b']) == ['a']
assert even(['a','b','c','d']) == ['a', 'c']
assert even(['a','b','a','c']) == ['a', 'a']
assert even(['a','b','c','d','e','f']) == ['a', 'c','e']
# TEST END
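The same result can be obtained with slicing, which also takes a third step parameter: xs[::2] starts at index 0, goes to the end and takes every second element. A sketch (even_slice is a hypothetical name); unlike the exercise's assumption, it works for odd-length lists too:

```python
def even_slice(xs):
    # xs[start:stop:step] with start and stop omitted, step 2
    return xs[::2]

print(even_slice(['a', 'b', 'c', 'd', 'e', 'f']))  # ['a', 'c', 'e']
print(even_slice(['a', 'b', 'c']))                 # ['a', 'c']
```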
mix¶
✪✪ RETURN a NEW list in which the elements are taken alternately from lista and listb
You can assume that lista and listb contain the same number of elements
Example: mix(['a', 'b','c'], ['x', 'y','z']) must give ['a', 'x', 'b','y', 'c','z']
[129]:
def mix(lista, listb):
    #jupman-raise
    ret = []
    for i in range(len(lista)):
        ret.append(lista[i])
        ret.append(listb[i])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert mix([], []) == []
assert mix(['a'], ['x']) == ['a', 'x']
assert mix(['a'], ['a']) == ['a', 'a']
assert mix(['a', 'b'], ['x', 'y']) == ['a', 'x', 'b','y']
assert mix(['a', 'b','c'], ['x', 'y','z']) == ['a', 'x', 'b','y', 'c','z']
# TEST END
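Instead of indexes, the built-in zip can walk the two lists together, pairing up the elements with the same position. A sketch (mix_zip is a hypothetical name); note that zip stops at the end of the shorter list, which is harmless here since the lists are assumed to have the same length:

```python
def mix_zip(lista, listb):
    ret = []
    # zip yields the pairs ('a', 'x'), ('b', 'y'), ...
    for a, b in zip(lista, listb):
        ret.append(a)
        ret.append(b)
    return ret

print(mix_zip(['a', 'b', 'c'], ['x', 'y', 'z']))  # ['a', 'x', 'b', 'y', 'c', 'z']
```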
fill¶
✪✪ Takes a list lst1 of n elements and a list lst2 of m elements, and MODIFIES lst2 by copying all lst1 elements into the first n positions of lst2
If n > m, raises a ValueError
[130]:
def fill(lst1, lst2):
    #jupman-raise
    if len(lst1) > len(lst2):
        raise ValueError("lst1 is longer than lst2! len(lst1) = %s, len(lst2) = %s" % (len(lst1), len(lst2)))
    j = 0
    for x in lst1:
        lst2[j] = x
        j += 1
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
try:
    fill(['a','b'], [None])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
try:
    fill(['a','b','c'], [None,None])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
L1 = []
R1 = []
fill(L1, R1)
assert L1 == []
assert R1 == []
L = []
R = ['x']
fill(L, R)
assert L == []
assert R == ['x']
L = ['a']
R = ['x']
fill(L, R)
assert L == ['a']
assert R == ['a']
L = ['a']
R = ['x','y']
fill(L, R)
assert L == ['a']
assert R == ['a','y']
L = ['a','b']
R = ['x','y']
fill(L, R)
assert L == ['a','b']
assert R == ['a','b']
L = ['a','b']
R = ['x','y','z',]
fill(L, R)
assert L == ['a','b']
assert R == ['a','b','z']
L = ['a']
R = ['x','y','z',]
fill(L, R)
assert L == ['a']
assert R == ['a','y','z']
# TEST END
nostop¶
✪✪ When you analyze a phrase, it might be useful to process it to remove very common words, for example articles and prepositions: "a book on Python" can be simplified into "book Python"
The ‘not so useful’ words are called stopwords. For example, search engines do this to reduce the complexity of the input string provided by the user.
Implement a function nostop(s, stopwords) which takes a string s and a list of stopwords, and RETURNS the input string without the stopwords
HINT 1: Python strings are immutable! To remove words you need to create a new string from the original string
HINT 2: create a list of words with:
words = stringa.split(" ")
HINT 3: transform the list as needed, and then build the string to return with " ".join(lista)
[131]:
def nostop(s, stopwords):
    #jupman-raise
    words = s.split(" ")
    ret = []
    for word in words:
        # keep a word only if it is not a stopword (repeated stopwords are all dropped)
        if word not in stopwords:
            ret.append(word)
    return " ".join(ret)
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert nostop("a", ["a"]) == ""
assert nostop("a", []) == "a"
assert nostop("", []) == ""
assert nostop("", ["a"]) == ""
assert nostop("a book", ["a"]) == "book"
assert nostop("a book on Python", ["a","on"]) == "book Python"
assert nostop("a book on Python for beginners", ["a","the","on","at","in", "of", "for"]) == "book Python beginners"
# TEST END
threes¶
✪✪ To check whether an integer x is divisible by a number n, you can check whether the remainder of the integer division between x and n is equal to zero, using the operator %:
[132]:
0 % 3
[132]:
0
[133]:
1 % 3
[133]:
1
[134]:
2 % 3
[134]:
2
[135]:
3 % 3
[135]:
0
[136]:
4 % 3
[136]:
1
[137]:
5 % 3
[137]:
2
[138]:
6 % 3
[138]:
0
Now implement the following function:
[139]:
def threes(lst):
    """ RETURN a NEW list with the same elements of lst, except the ones at indices which are divisible by 3.
        In such cases, the output list will contain the string 'z'
    """
    #jupman-raise
    ret = []
    for i in range(len(lst)):
        if i % 3 == 0:
            ret.append('z')
        else:
            ret.append(lst[i])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert threes([]) == []
assert threes(['a']) == ['z']
assert threes(['a','b']) == ['z','b']
assert threes(['a','b','c']) == ['z','b','c']
assert threes(['a','b','c','d']) == ['z','b','c','z']
assert threes(['f','c','s','g','a','w','a','b']) == ['z','c','s','z','a','w','z','b']
# TEST END
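When both the index and the element are needed, the built-in enumerate yields (index, element) pairs; combined with a conditional expression it gives a one-line comprehension. A sketch (threes_comp is a hypothetical name):

```python
def threes_comp(lst):
    # 'z' at indices divisible by 3, the original element elsewhere
    return ['z' if i % 3 == 0 else el for i, el in enumerate(lst)]

print(threes_comp(['f', 'c', 's', 'g', 'a', 'w', 'a', 'b']))  # ['z', 'c', 's', 'z', 'a', 'w', 'z', 'b']
```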
list_to_int¶
Given a non-empty list of digits representing a non-negative integer, return a proper Python integer
The digits are stored such that the most significant digit is at the head of the list, and each element in the list contains a single digit.
You may assume the integer does not contain any leading zero, except the number 0 itself.
Example:
Input: [3,7,5]
Output: 375
Input: [2,0]
Output: 20
Input: [0]
Output: 0
list_to_int_dirty¶
✪✪ This is the totally dirty approach, but it may be fun (never do this in real life; prefer instead the list_to_int approach that follows).
Convert the list to a string '[5, 7, 4]' using the function str()
Remove from the string the brackets '[' and ']', the commas and the spaces, using the method .replace(str1, str2), which returns a NEW string with str1 replaced by str2
Convert the string to an integer using the special function int() and return it
[140]:
def list_to_int_dirty(lst):
    #jupman-raise
    s = str(lst)   # e.g. '[5, 7, 4]'
    stripped = s.replace('[', '').replace(']','').replace(',','').replace(' ', '')
    n = int(stripped)
    return n
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly and execute the cell, Python shouldn't raise `AssertionError`
assert list_to_int_dirty([0]) == 0
assert list_to_int_dirty([1]) == 1
assert list_to_int_dirty([2]) == 2
assert list_to_int_dirty([92]) == 92
assert list_to_int_dirty([5,7,4]) == 574
# TEST END
list_to_int¶
✪✪ The proper way is to follow the rules of math. To do it, keep in mind that, for example:
\(5746 = 5 \cdot 1000 + 7 \cdot 100 + 4 \cdot 10 + 6\)
For our purposes, it is better to rewrite the formula like this:
\(5746 = 6 \cdot 10^0 + 4 \cdot 10^1 + 7 \cdot 10^2 + 5 \cdot 10^3\)
Basically, we are performing a sum \(4\) times. Each time, starting from the least significant digit, the digit under consideration is multiplied by a progressively bigger power of 10, starting from \(10^0 = 1\) up to \(10^3 = 1000\).
To understand how it could work in Python, we might progressively add terms to an accumulator variable c like this:
c = 0
c = c + 6*1
c = c + 4*10
c = c + 7*100
c = c + 5*1000
In a more pythonic and concise way, we would write:
c = 0
c += 6*1
c += 4*10
c += 7*100
c += 5*1000
So, first of all, to get the digits 6, 4, 7, 5 it might help to scan the list in reverse order using the function reversed (notice the ed at the end!)
[141]:
for x in reversed([5,7,4,6]):
    print(x)
6
4
7
5
Once we have such a sequence, we need a way to get a sequence of progressively increasing powers of 10. To do so, we might use a variable power:
[142]:
power = 1
for x in reversed([5,7,4,6]):
    print(power)
    power = power * 10
1
10
100
1000
Now you should have the necessary elements to implement the required function by yourself.
PLEASE REMEMBER: if you can’t find a general solution, keep trying with constants and write down all the steps you do. Then, in new cells, try substituting the constants with variables and keep experimenting: it’s the best method to spot patterns!
[143]:
def list_to_int(lst):
    """ RETURN a Python integer which is represented by the provided list of digits, which always
        represents a number >= 0 and has no leading zeroes except for the special case of number 0.
        Example:
        Input: [3,7,5]
        Output: 375
        Input: [2,0]
        Output: 20
        Input: [0]
        Output: 0
    """
    #jupman-raise
    power = 1
    num = 0
    for digit in reversed(lst):    # from the least significant digit
        num += power * digit
        power = power * 10         # next power of 10
    return num
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert list_to_int([0]) == 0
assert list_to_int([1]) == 1
assert list_to_int([2]) == 2
assert list_to_int([92]) == 92
assert list_to_int([90]) == 90
assert list_to_int([5,7,4]) == 574
# TEST END
int_to_list¶
✪✪ Let’s now try the inverse operation, that is, going from a proper Python number like 574 to a list [5,7,4].
To do so, we must exploit the integer division operator // and the remainder operator %.
Let’s say we want to get the final digit 4 out of 574. To do so, we can notice that 4 is the remainder of the integer division between 574 and 10:
[144]:
574 % 10
[144]:
4
This extracts the four, but if we want to find an algorithm for our problem, we must also find a way to progressively reduce the problem size. To do so, we can exploit the integer division operator //:
[145]:
574 // 10
[145]:
57
Now, given any integer number, you know how to:
extract the last digit
reduce the problem for the next iteration
This should be sufficient to proceed. Pay attention to the special case of input 0.
[146]:
def int_to_list(num):
    """ Takes an integer number >= 0 and RETURN a list of digits representing the number in base 10.
        Example:
        Input: 375
        Output: [3,7,5]
        Input: 20
        Output: [2,0]
        Input: 0
        Output: [0]
    """
    #jupman-raise
    if num == 0:
        return [0]
    else:
        ret = []
        d = num
        while d > 0:
            digit = d % 10   # remainder of d divided by 10
            ret.append(digit)
            d = d // 10      # drop the last digit
        return list(reversed(ret))
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert int_to_list(0) == [0]
assert int_to_list(1) == [1]
assert int_to_list(2) == [2]
assert int_to_list(92) == [9,2]
assert int_to_list(90) == [9,0]
assert int_to_list(574) == [5,7,4]
# TEST END
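The two steps above (extract the last digit with %, shrink the number with //) can also be done in one call with the built-in divmod, which returns both the quotient and the remainder. This is a minimal sketch of that variant; the name int_to_list_divmod is hypothetical and not part of the exercise:

```python
def int_to_list_divmod(num):
    """Hypothetical variant of int_to_list using divmod; num must be >= 0."""
    if num == 0:
        return [0]                     # special case: 0 has one digit
    digits = []
    while num > 0:
        num, digit = divmod(num, 10)   # quotient and remainder in one call
        digits.append(digit)           # digits are collected least significant first
    digits.reverse()                   # put the most significant digit first
    return digits

print(int_to_list_divmod(574))  # [5, 7, 4]
```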
add one¶
Given a non-empty array of digits representing a non-negative integer, add one to the integer.
The digits are stored such that the most significant digit is at the head of the list, and each element in the array contains a single digit.
You may assume the integer does not contain any leading zero, except the number 0 itself.
For example:
Input: [1,2,3]
Output: [1,2,4]
Input: [3,6,9,9]
Output: [3,7,0,0]
Input: [9,9,9,9]
Output: [1,0,0,0,0]
There are two ways to solve this exercise: you can convert to a proper integer, add one, and then convert back to a list, which you will do in add_one_conv. The other way is to operate directly on the list, using a carry variable, which you will do in add_one_carry.
add_one_conv¶
✪✪✪ You need to do three steps:
Convert to a proper python integer
add one to the python integer
convert back to a list and return it
[147]:
def add_one_conv(lst):
    """
    Takes a list of digits representing a >= 0 integer without leading zeroes except zero itself
    and RETURN a NEW list representing the value of lst plus one.
    Implement it by calling the functions you already implemented.
    """
    #jupman-raise
    num = list_to_int(lst)
    return int_to_list(num + 1)
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert add_one_conv([0]) == [1]
assert add_one_conv([1]) == [2]
assert add_one_conv([2]) == [3]
assert add_one_conv([9]) == [1, 0]
assert add_one_conv([5,7]) == [5, 8]
assert add_one_conv([5,9]) == [6, 0]
assert add_one_conv([9,9]) == [1, 0, 0]
# TEST END
add_one_carry¶
✪✪✪ Given a non-empty array of digits representing a non-negative integer, add one to the integer.
The digits are stored such that the most significant digit is at the head of the list, and each element in the array contains a single digit.
You may assume the integer does not contain any leading zero, except the number 0 itself.
For example:
Input: [1,2,3]
Output: [1,2,4]
Input: [3,6,9,9]
Output: [3,7,0,0]
Input: [9,9,9,9]
Output: [1,0,0,0,0]
To implement it, operate directly on the list, using a carry variable (riporto in Italian).
Just follow addition as done in elementary school. Start from the last digit and sum one:
If you get a number <= 9, write it as the result digit and set carry to zero; the rest is easy:
 596+   carry=0
 001
 ----
   7    6 + 1 + carry = 7
 596+   carry=0
 001
 ----
  97    9 + 0 + carry = 9
 596+   carry=0
 001
 ----
 597    5 + 0 + carry = 5
If you get a number bigger than 9, then you write zero and set carry to one:
 3599+   carry=0
 0001
 -----
    0    9 + 1 + carry = 10   # > 9, will write zero and set carry to 1
 3599+   carry=1
 0001
 -----
   00    9 + 0 + carry = 10   # > 9, will write zero and set carry to 1
 3599+   carry=1
 0001
 -----
  600    5 + 0 + carry = 6    # <= 9, will write result and set carry to zero
 3599+   carry=0
 0001
 -----
 3600    3 + 0 + carry = 3    # <= 9, will write result and set carry to zero
Credits: inspiration taken from leetcode.com
[148]:
def add_one_carry(lst):
    """
    Takes a list of digits representing a >= 0 integer without leading zeroes except zero itself
    and RETURN a NEW list representing the value of lst plus one.
    Implement it using the carry method explained before.
    """
    #jupman-raise
    ret = []
    carry = 1                     # adding one is like starting with a carry of 1
    for digit in reversed(lst):   # from the least to the most significant digit
        new_digit = digit + carry
        if new_digit == 10:
            ret.append(0)
            carry = 1
        else:
            ret.append(new_digit)
            carry = 0
    if carry == 1:                # e.g. 99 + 1 needs an extra digit
        ret.append(carry)
    ret.reverse()                 # digits were collected in reverse order
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert add_one_carry([0]) == [1]
assert add_one_carry([1]) == [2]
assert add_one_carry([2]) == [3]
assert add_one_carry([9]) == [1, 0]
assert add_one_carry([5,7]) == [5, 8]
assert add_one_carry([5,9]) == [6, 0]
assert add_one_carry([9,9]) == [1, 0, 0]
# TEST END
collatz¶
Difficulty: ✪✪✪
More challenging: implement this function from Montresor's slides (the Collatz conjecture states that, starting from any positive integer n, you always end up reaching 1):
The 3n+1 sequence is defined like this: given a number n, compute a new value for n as follows: if n is even, divide n by 2. If n is odd, multiply it by 3 and add 1. Stop when you reach the value 1. Example: for n = 3, the sequence is [3, 10, 5, 16, 8, 4, 2, 1]. Write a program that creates a list D, such that for each value n between 1 and 50, D[n] contains the length of the sequence so generated. In case of n = 3, the length is 8. In case of n = 27, the sequence takes 111 steps, so its length is 112.
If you need to check your results, you can also try this nice online tool
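As a starting point, here is a minimal sketch of the core step: computing the length of the sequence for a single n, counting both n itself and the final 1 (as in the n = 3 example). The helper name collatz_length is hypothetical; building the list D from it is still left to the exercise:

```python
def collatz_length(n):
    """Hypothetical helper: length of the 3n+1 sequence starting at n (n >= 1),
    counting both n and the final 1."""
    length = 1             # the starting value n is the first element
    while n != 1:
        if n % 2 == 0:
            n = n // 2     # even: divide by 2
        else:
            n = 3 * n + 1  # odd: multiply by 3 and add 1
        length += 1
    return length

print(collatz_length(3))  # 8, as in [3, 10, 5, 16, 8, 4, 2, 1]
```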
[149]:
def collatz():
    """ Return D"""
    raise Exception("TODO IMPLEMENT ME !")
Recursive operations¶
Here we deal with recursion. Before doing these exercises, you might want to wait for Montresor's class on recursion theory.
When we have a problem, we try to solve it by splitting it in half (or into more parts), looking for solutions in each part, and then deciding what to do with the solutions found, if any.
Several cases may occur:
No solution is found
One solution is found
Two solutions are found
case 1): we can only give up.
case 2): we have only one solution, so we can just return that one.
case 3): we have two solutions, so we need to decide what is the purpose of the algorithm.
Is the purpose to …
find all possible solutions? Then we return both of them.
find the best solution, according to some measure of ‘goodness’? Then we measure each of the solutions and give back the highest scoring one.
always provide a combination of existing solutions, according to some combination method? Then we combine the found solutions and give them back
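To make the three cases concrete, here is a minimal sketch (a hypothetical example, not one of the course exercises) that finds the best solution, here the maximum, by splitting the range lst[i:j] in half:

```python
def rec_max(lst, i, j):
    """Hypothetical example: RETURN the maximum of lst[i:j], or None if the range is empty."""
    if j - i == 0:              # case 1): empty range, no solution
        return None
    if j - i == 1:              # case 2): a single element is the only solution
        return lst[i]
    m = (i + j) // 2            # both halves are non-empty here
    left = rec_max(lst, i, m)   # solve each half recursively
    right = rec_max(lst, m, j)
    # case 3): two solutions found, keep the 'best' (largest) one
    return max(left, right)

print(rec_max([4, 9, 2, 7], 0, 4))  # 9
```

Finding all solutions would instead return both results (for example concatenated in a list), while combining them would apply some other operation, such as a sum.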
gap_rec¶
✪✪ In a list \(L\) containing \(n≥2\) integers, a gap is an index \(i\), \(0< i < n\), such that \(L[i−1]< L[i]\)
If \(n≥2\) and \(L[0]< L[n−1]\), \(L\) contains at least one gap
Design an algorithm that, given a list \(L\) containing \(n≥2\) integers such that \(L[0]< L[n−1]\), finds a gap in the list.
Try to code and test the gap function. To avoid showing Python code directly, here we wrote it as pseudocode:
Use the following skeleton to code it and add some tests to the provided testcase class. To understand what’s going on, try copy-pasting it into Python Tutor.
Notice that:
We created a function gap_rec to differentiate it from the iterative one.
Users of the gap_rec function might want to call it by passing just a list, in order to find any gap in the whole list. So, for convenience, the new function gap_rec(L) only accepts a list, without indexes i and j. This function just calls the other function gap_rec_helper, which will actually contain the recursive calls. So your task is to translate the pseudocode of gap into the Python code of gap_rec_helper, which takes as input the list and the indexes, as gap does. Adding a helper function is a frequent pattern you can find when programming recursive functions.
WARNING: The specification of gap_rec assumes the input is always a list of at least two elements, and that the first element is strictly less than the last one. If these conditions are not met, the function behaviour could be completely erroneous!
When preconditions are not met, execution could stop because of an error like index out of bounds, or, even worse, we might get back some wrong index as a gap! To prevent misuse of the function, a good idea can be putting a check at the beginning of the gap_rec function. Such a check should immediately stop the execution and raise an error if the parameters don’t satisfy the preconditions. One way to do this could be to add some assertions like this:
def gap_rec(L):
    assert len(L) >= 2
    assert L[0] < L[len(L)-1]
These commands will make Python interrupt execution and raise an error as soon as it detects that list L is too small or has wrong values. This kind of behaviour is also called fail fast, which is better than returning wrong values!
You can put any condition you want after assert, but ideally it should be fast to execute. Here asserts might be better than raise Exception constructs, because asserts can be disabled with a flag passed to the interpreter (python -O). So, when you debug you can take advantage of them, and when the code is production quality and supposed to be bug free you can disable all assertions at once to gain execution speed.
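A minimal sketch of the fail-fast behaviour, using a hypothetical helper check_gap_preconditions that contains only the two assertions: when a precondition is violated, the call stops immediately with an AssertionError carrying the optional message written after the comma:

```python
def check_gap_preconditions(L):
    """Hypothetical helper: fail fast if L violates the preconditions of gap_rec."""
    assert len(L) >= 2, "the list must have at least two elements"
    assert L[0] < L[len(L) - 1], "the first element must be smaller than the last one"

check_gap_preconditions([1, 4, 2, 9])   # preconditions hold: nothing happens
try:
    check_gap_preconditions([7, 3])     # first element is bigger than the last one
except AssertionError as e:
    print("Precondition violated:", e)
```

If the interpreter is started with python -O, the assertions are skipped and the second call silently does nothing.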
GOOD PRACTICE: Notice I wrote as a comment what the helper function is expected to receive. Writing down specs often helps understanding what the function is supposed to do, and helps users of your code as well!
COMMANDMENT 2: You shall also write on paper!
To get an idea of how gap_rec works, draw histograms on paper like the following, with different heights at index m:
Notice how at each recursive call we end up with a histogram that is similar to the initial one, that is, it respects the same preconditions (a list portion of size >= 2 where the first element is strictly smaller than the last one).
[150]:
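def gap_rec_helper(L, i, j):
    """ Expects a list L and indexes i < j such that L[i] < L[j].
        RETURN an index g with i < g <= j such that L[g-1] < L[g]
    """
    #jupman-raise
    if j == i+1:
        return j
    else:
        m = (i+j) // 2
        if L[m] < L[j]:
            return gap_rec_helper(L, m, j)   # the right part keeps L[m] < L[j]
        else:
            return gap_rec_helper(L, i, m)   # here L[i] < L[j] <= L[m]
    #/jupman-raise
def gap_rec(L):
    #jupman-raise
    assert len(L) >= 2
    assert L[0] < L[len(L)-1]
    return gap_rec_helper(L, 0, len(L) - 1)
    #/jupman-raise
# try also to write asserts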
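Following the hint in the comment above, one way to test a gap finder is to check its result against the definition of gap on many random lists. This is a minimal sketch under that assumption; it repeats a version of gap_rec_helper and gap_rec so it runs on its own, and is_gap is a hypothetical checker:

```python
import random

def gap_rec_helper(L, i, j):
    # expects i < j and L[i] < L[j]; returns g with i < g <= j and L[g-1] < L[g]
    if j == i + 1:
        return j
    m = (i + j) // 2
    if L[m] < L[j]:
        return gap_rec_helper(L, m, j)
    return gap_rec_helper(L, i, m)

def gap_rec(L):
    assert len(L) >= 2
    assert L[0] < L[-1]
    return gap_rec_helper(L, 0, len(L) - 1)

def is_gap(L, g):
    """Hypothetical checker: True if g is a gap index of L by definition."""
    return 0 < g < len(L) and L[g - 1] < L[g]

random.seed(0)                  # reproducible runs
for _ in range(1000):
    n = random.randint(2, 20)
    L = [random.randint(0, 10) for _ in range(n)]
    if L[0] < L[-1]:            # only test lists satisfying the preconditions
        assert is_gap(L, gap_rec(L))
print("all random tests passed")
```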
Further exercises¶
Have a look at the LeetCode array problems, sorting them by Acceptance and filtering by Easy difficulty.
In particular, you may check:
Missing number - has many possible solutions
Array partition 1 - actually a bit hard, but makes you think
Tuples solutions¶
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-sciprog.py
-exercises
     |- tuples
         |- tuples-exercise.ipynb
         |- tuples-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/tuples/tuples-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Introduction¶
References
Tuples are immutable sequences, so it is not possible to change their content: to get a "modified" tuple you need to create a new object. They are sequential collections of objects, and their elements keep a particular order.
Duplicates are allowed
They can hold heterogeneous information.
Building tuples¶
Tuples are created with round brackets ()
Some examples:
[2]:
first_tuple = (1,2,3)
print(first_tuple)
(1, 2, 3)
[3]:
second_tuple = (1,) # this contains one element only, but we need the comma!
print(second_tuple, " type:", type(second_tuple))
(1,) type: <class 'tuple'>
[4]:
var = (1) # This is not a tuple!!!
print(var, " type:", type(var))
1 type: <class 'int'>
[5]:
empty_tuple = () # fairly useless
print(empty_tuple, "\n")
()
[6]:
third_tuple = ("January", 1 ,2007) # heterogeneous info
print(third_tuple)
('January', 1, 2007)
[7]:
days = (third_tuple,("February",2,1998), ("March",2,1978),("June",12,1978))
print(days, "\n")
(('January', 1, 2007), ('February', 2, 1998), ('March', 2, 1978), ('June', 12, 1978))
Remember tuples are immutable objects…
[8]:
print("Days has id: ", id(days))
days = ("Mon","Tue","Wed","Thu","Fri","Sat","Sun")
Days has id: 140632243535944
…hence reassignment creates a new object
[9]:
print("Days now has id: ", id(days))
Days now has id: 140632252392016
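Immutability also means that you cannot assign to a single position of a tuple, as you would with a list. A minimal sketch: the assignment raises a TypeError and the tuple stays unchanged:

```python
t = (1, 2, 3)
try:
    t[0] = 100               # tuples do not support item assignment
except TypeError as e:
    print("TypeError:", e)
print(t)                     # (1, 2, 3): the tuple is unchanged
```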
Building from sequences¶
You can build a tuple from any sequence:
[10]:
tuple([8,2,5])
[10]:
(8, 2, 5)
[11]:
tuple("abc")
[11]:
('a', 'b', 'c')
Tuple operators¶
The following operators work on tuples and they behave exactly as on lists:
[12]:
practical1 = ("Friday", "28/09/2018")
practical2 = ("Tuesday", "02/10/2018")
practical3 = ("Friday", "05/10/2018")
# A tuple containing 3 tuples
lectures = (practical1, practical2, practical3)
print("The first three lectures:\n", lectures, "\n")
The first three lectures:
(('Friday', '28/09/2018'), ('Tuesday', '02/10/2018'), ('Friday', '05/10/2018'))
[13]:
# One tuple only
mergedLectures = practical1 + practical2 + practical3
print("mergedLectures:\n", mergedLectures)
mergedLectures:
('Friday', '28/09/2018', 'Tuesday', '02/10/2018', 'Friday', '05/10/2018')
[14]:
# This returns the whole tuple
print("1st lecture was on: ", lectures[0], "\n")
1st lecture was on: ('Friday', '28/09/2018')
[15]:
# 2 elements from the same tuple
print("1st lecture was on ", mergedLectures[0], ", ", mergedLectures[1], "\n")
1st lecture was on Friday , 28/09/2018
[16]:
# Return type is tuple!
print("3rd lecture was on: ", lectures[2])
3rd lecture was on: ('Friday', '05/10/2018')
[17]:
# 2 elements from the same tuple returned in tuple
print("3rd lecture was on ", mergedLectures[4:], "\n")
3rd lecture was on ('Friday', '05/10/2018')
The following methods are available for tuples:
[18]:
practical1 = ("Friday", "28/09/2018")
practical2 = ("Tuesday", "02/10/2018")
practical3 = ("Friday", "05/10/2018")
mergedLectures = practical1 + practical2 + practical3 # One tuple only
print(mergedLectures.count("Friday"), " lectures were on Friday")
print(mergedLectures.count("Tuesday"), " lecture was on Tuesday")
print("Index:", practical2.index("Tuesday"))
2 lectures were on Friday
1 lecture was on Tuesday
Index: 0
# not present in tuple, python will complain
print("Index:", practical2.index("Wednesday"))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-125-f7ecc5f7f5d6> in <module>
----> 1 print("Index:", practical2.index("Wednesday"))
ValueError: tuple.index(x): x not in tuple
Exercise: pet tuples¶
Given the string pets = "cat,dog,bird,guinea pig,rabbit,hampster"
convert it into a list.
Then create a tuple of tuples where each tuple has two pieces of information: the name of the pet and the length of the name. E.g. ((“dog”,3), ( “hampster”,8)).
print the tuple
You should obtain:
['cat', 'dog', 'bird', 'guinea pig', 'rabbit', 'hampster']
(('cat', 3), ('dog', 3), ('bird', 4), ('guinea pig', 10), ('rabbit', 6), ('hampster', 8))
[19]:
pets = "cat,dog,bird,guinea pig,rabbit,hampster"
# write here
pet_list = pets.split(',')
print(pet_list)
pet_tuples = ((pet_list[0], len(pet_list[0])),
(pet_list[1], len(pet_list[1])),
(pet_list[2], len(pet_list[2])),
(pet_list[3], len(pet_list[3])),
(pet_list[4], len(pet_list[4])),
(pet_list[5], len(pet_list[5])))
print(pet_tuples)
['cat', 'dog', 'bird', 'guinea pig', 'rabbit', 'hampster']
(('cat', 3), ('dog', 3), ('bird', 4), ('guinea pig', 10), ('rabbit', 6), ('hampster', 8))
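The solution above spells out each of the six tuples by hand. Once loops are allowed, a shorter sketch builds the same tuple of tuples with a generator expression passed to tuple(), and works for any number of pets:

```python
pets = "cat,dog,bird,guinea pig,rabbit,hampster"
pet_list = pets.split(',')
# build each (name, length) pair with a generator expression, then freeze everything into a tuple
pet_tuples = tuple((pet, len(pet)) for pet in pet_list)
print(pet_tuples)
```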
Exercise: fruits¶
Given the string S="apple|pear|apple|cherry|pear|apple|pear|pear|cherry|pear|strawberry", store the elements separated by the "|" in a list.
How many elements does the list have?
Knowing that the list created at the previous point has only four distinct elements (i.e. "apple", "pear", "cherry" and "strawberry"), create another list where each element is a tuple containing the name of the fruit and its multiplicity (that is, how many times it appears in the original list). Ex. list_of_tuples = [(“apple”, 3), (“pear”, 5),…]. Here you can and should write code that only works with the given constant string, so there is no need for loops. Print the content of each tuple in a separate line (ex. first line: apple is present 3 times)
You should obtain:
['apple', 'pear', 'apple', 'cherry', 'pear', 'apple', 'pear', 'pear', 'cherry', 'pear', 'strawberry']
[('apple', 3), ('pear', 5), ('cherry', 2), ('strawberry', 1)]
apple is present 3 times
pear is present 5 times
cherry is present 2 times
strawberry is present 1 times
[20]:
S="apple|pear|apple|cherry|pear|apple|pear|pear|cherry|pear|strawberry"
# write here
Slist = S.split("|")
print(Slist)
appleT = ("apple", Slist.count("apple"))
pearT = ("pear", Slist.count("pear"))
cherryT = ("cherry", Slist.count("cherry"))
strawberryT = ("strawberry", Slist.count("strawberry"))
list_of_tuples =[appleT, pearT, cherryT, strawberryT]
print(list_of_tuples, "\n") #adding newline to separate elements
print(appleT[0], " is present ", appleT[1], " times")
print(pearT[0], " is present ", pearT[1], " times")
print(cherryT[0], " is present ", cherryT[1], " times")
print(strawberryT[0], " is present ", strawberryT[1], " times")
['apple', 'pear', 'apple', 'cherry', 'pear', 'apple', 'pear', 'pear', 'cherry', 'pear', 'strawberry']
[('apple', 3), ('pear', 5), ('cherry', 2), ('strawberry', 1)]
apple is present 3 times
pear is present 5 times
cherry is present 2 times
strawberry is present 1 times
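The exercise deliberately avoids loops, but for comparison here is a shorter sketch using collections.Counter from the standard library, which counts occurrences for you. It assumes Python 3.7 or later, where Counter (a dict subclass) keeps keys in order of first appearance:

```python
from collections import Counter

S = "apple|pear|apple|cherry|pear|apple|pear|pear|cherry|pear|strawberry"
counts = Counter(S.split("|"))          # maps each fruit to how many times it appears
list_of_tuples = list(counts.items())   # (fruit, count) pairs, in first-appearance order
for fruit, n in list_of_tuples:
    print(fruit, "is present", n, "times")
```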
Exercise: build a tuple¶
Given a tuple x, store in variable y another tuple containing the same elements as x except the last one, and also the elements d and e appended at the end. Your code should work with any input x.
Example:
x = ('a','b','c')
after your code, you should get printed:
x = ('a', 'b', 'c')
y = ('a', 'b', 'd', 'e')
[21]:
x = ('a','b','c')
# write here
y = x[:-1] + ('d','e')   # slicing a tuple already gives a tuple
print('x=',x)
print('y=',y)
x= ('a', 'b', 'c')
y= ('a', 'b', 'd', 'e')
Verify comprehension¶
ATTENTION
The following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
Tests with asserts: the following exercises contain automated tests to help you spot errors. To understand how they work, first read Error handling and testing
doubletup¶
✪✪ Takes as input a list with n integer numbers, and RETURN a NEW list which contains n tuples, each with two elements. Each tuple contains a number taken from the corresponding position of the original list, and its double.
Example:
>>> doubletup([ 5, 3, 8])
[(5,10), (3,6), (8,16)]
[22]:
def doubletup(xs):
    #jupman-raise
    ret = []
    for x in xs:
        ret.append((x, x * 2))
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert doubletup([]) == []
assert doubletup([3]) == [(3,6)]
assert doubletup([2,7]) == [(2,4),(7,14)]
assert doubletup([5,3,8]) == [(5,10), (3,6), (8,16)]
# verify original list has not changed
la = [6]
lb = doubletup(la)
assert la == [6]
assert lb == [(6,12)]
# END TEST
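The same function can also be written as a list comprehension, which builds a NEW list in a single expression and leaves the input untouched. A minimal sketch; the name doubletup_comp is hypothetical:

```python
def doubletup_comp(xs):
    """Hypothetical variant of doubletup written as a list comprehension."""
    return [(x, x * 2) for x in xs]   # builds a NEW list, xs is not modified

print(doubletup_comp([5, 3, 8]))  # [(5, 10), (3, 6), (8, 16)]
```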
Sets solutions¶
Download exercises zip¶
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-sciprog.py
-exercises
|-sets
|- sets-exercise.ipynb
|- sets-solution.ipynb
open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm): you will edit the files ending in -exercise.ipynb
Go on reading this notebook, and follow the instructions inside.
introduction¶
A set is an unordered collection of distinct elements, so no duplicates are allowed.
Creating a set¶
In Python you can create a set with a call to set()
[2]:
s = set()
[3]:
s
[3]:
set()
To add elements, use the .add() method:
[4]:
s.add('hello')
s.add('world')
Notice that Python represents a set with curly brackets but, unlike a dictionary, you won’t see colons : or key/value pairs:
[5]:
s
[5]:
{'hello', 'world'}
set from a sequence¶
You can create a set from any sequence, like a list. Doing so removes any duplicates:
[6]:
set(['a','b','c','b','a','d'])
[6]:
{'a', 'b', 'c', 'd'}
Empty sets¶
WARNING: {} means empty dictionary, not empty set!
Since the printed representation of a set starts and ends with curly brackets, as for dictionaries, when you see {} written you might wonder whether that is the empty set or the empty dictionary.
The empty set is represented with set()
[7]:
s = set()
[8]:
s
[8]:
set()
[9]:
type(s)
[9]:
set
Instead, the empty dictionary is represented as a pair of curly brackets:
[10]:
d = {}
[11]:
d
[11]:
{}
[12]:
type(d)
[12]:
dict
Iterating a set¶
You can iterate over a set with the for in construct:
[13]:
for el in s:
    print(el)
Sets, like dictionary keys in older Python versions, are not necessarily iterated in the same order as insertion: sets give no ordering guarantee at all. This also means they do not support access by index:
s[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-61-f8bb2b116405> in <module>()
----> 1 s[0]
TypeError: 'set' object does not support indexing
Adding twice¶
Since sets must contain distinct elements, if we add the same element twice the set remains unmodified, with no complaints from Python:
[14]:
s.add('hello')
[15]:
s
[15]:
{'hello'}
[16]:
s.add('world')
[17]:
s
[17]:
{'hello', 'world'}
A set can hold heterogeneous elements, like the number we add here:
[18]:
s.add(7)
[19]:
s
[19]:
{7, 'hello', 'world'}
To remove an element, use the .remove() method:
[20]:
s.remove('world')
[21]:
s
[21]:
{7, 'hello'}
Belonging to a set¶
To determine whether an item belongs to a set, you can use the usual ‘in’ operator, as for any other collection:
[22]:
'b' in set(['a','b','c','d'])
[22]:
True
[23]:
'z' in set(['a','b','c','d'])
[23]:
False
There is an important difference with sequences such as lists, though: searching for an item in a set is very fast (on average it takes constant time, thanks to hashing), while searching in a list in the worst case requires Python to scan the whole list.
There is a catch though: to get such performance you can only put hashable data in a set, which in practice means immutable data such as numbers, strings, tuples, etc. If you try to add a mutable type such as a list, you will get an error:
s = set()
s.add( ['a','b','c'] )
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-b345c7f28446> in <module>
----> 1 s.add( ['a','b','c'] )
TypeError: unhashable type: 'list'
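If you really need to store list-like content in a set, one common workaround (a sketch, not the only option) is to convert it first into an immutable type, such as a tuple or a frozenset (the immutable version of a set):

```python
s = set()
s.add(tuple(['a', 'b', 'c']))    # a tuple is immutable, hence hashable
s.add(frozenset(['x', 'y']))     # frozenset: an immutable, hashable set
print(('a', 'b', 'c') in s)      # True
```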
Operations¶
You can perform set operations such as .union(s2), .intersection(s2), .difference(s2) …
NOTE: set operations which don’t have ‘update’ in the name create a NEW set each time!!!
[24]:
s1 = set(['a','b','c','d','e'])
print(s1)
{'a', 'e', 'c', 'b', 'd'}
[25]:
s2 = set(['b','c','f'])
[26]:
s3 = s1.intersection(s2) # NOTE: it returns a NEW set !!!
print(s3)
{'c', 'b'}
[27]:
print(s1) # did not change
{'a', 'e', 'c', 'b', 'd'}
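Python also provides operators for the same operations. This sketch redefines the sets of the example so it runs on its own; like the methods without ‘update’ in the name, each operator returns a NEW set and leaves s1 and s2 unchanged:

```python
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f'])
print(s1 | s2)   # union
print(s1 & s2)   # intersection
print(s1 - s2)   # difference
print(s1 ^ s2)   # symmetric difference: elements in exactly one of the two sets
```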
updating sets¶
If you do want to change the original, you have to use intersection_update:
[28]:
s4 = set(['a','b','c','d','e'])
s5 = set(['b','c','f'])
res = s4.intersection_update(s5)   # NOTE: this MODIFIES s4 and thus returns None !!!!
print(res)
None
[29]:
print(s4)
{'c', 'b'}
Exercise: set operators¶
Write some code that creates a set s4 which contains all the elements of s1 and s2 but does not contain the elements of s3. Your code should work with any s1, s2, s3.
With
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])
After your code you should get
{'d', 'a', 'c', 'g', 'e'}
[30]:
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])
# write here
s4 = s1.union(s2).difference(s3)
print(s4)
{'g', 'a', 'e', 'c', 'd'}
Exercise: dedup¶
Write some short code to create a list listb which contains all elements from list lista, without duplicates and sorted alphabetically.
MUST NOT change the original lista
no loops allowed!
your code should work with any lista
lista = ['c','a','b','c','d','b','e']
after your code, you should get
lista = ['c', 'a', 'b', 'c', 'd', 'b', 'e']
listb = ['a', 'b', 'c', 'd', 'e']
[31]:
lista = ['c','a','b','c','d','b','e']
# write here
s = set(lista)
listb = sorted(s)   # NOTE: sorted returns a NEW list
print("lista =",lista)
print("listb =",listb)
lista = ['c', 'a', 'b', 'c', 'd', 'b', 'e']
listb = ['a', 'b', 'c', 'd', 'e']
Dictionaries solutions¶
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-my_lib.py
-other stuff ...
-exercises
     |- dictionaries
         |- dictionaries-exercise.ipynb
         |- dictionaries-solution.ipynb
         |- other stuff ...
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/dictionaries/dictionaries-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Introduction¶
We will review dictionaries, discuss ordering issues for keys, and finally deal with nested dictionaries.
Dict¶
First let’s review Python dictionaries:
Dictionaries map keys to values. Keys must be immutable (more precisely, hashable) types such as numbers, strings, tuples (so, for example, no lists are allowed as keys), while values can be anything. In the following example, we create a dictionary d that initially maps strings to numbers:
[2]:
# create empty dict:
d = dict()
d
[2]:
{}
[3]:
type( dict() )
[3]:
dict
Alternatively, to create a dictionary you can type {}:
[4]:
{}
[4]:
{}
[5]:
type( {} )
[5]:
dict
[6]:
# associate string "some key" to number 4
d['some key'] = 4
d
[6]:
{'some key': 4}
To access a value corresponding to a key, write this:
[7]:
d['some key']
[7]:
4
You can’t use mutable objects like lists as keys:
d[ ['a', 'mutable', 'list', 'as key'] ] = 3
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-204-fb9d60c4e88a> in <module>()
----> 1 d[ ['a', 'mutable', 'list', 'as key'] ] = 3
TypeError: unhashable type: 'list'
But you can use tuples as keys:
[8]:
d[ ('an', 'immutable', 'tuple', 'as key') ] = 3
d
[8]:
{('an', 'immutable', 'tuple', 'as key'): 3, 'some key': 4}
[9]:
# associate string "some other key" to number 7
d['some other key'] = 7
d
[9]:
{('an', 'immutable', 'tuple', 'as key'): 3, 'some key': 4, 'some other key': 7}
[10]:
# Dictionary is mutable, so you can reassign a key to a different value:
d['some key'] = 5
d
[10]:
{('an', 'immutable', 'tuple', 'as key'): 3, 'some key': 5, 'some other key': 7}
[11]:
# Dictionaries are heterogeneous, so values can be of different types:
d['yet another key'] = 'now a string!'
d
[11]:
{('an', 'immutable', 'tuple', 'as key'): 3,
'some key': 5,
'some other key': 7,
'yet another key': 'now a string!'}
[12]:
# Keys also can be of heterogeneous types, but they *must* be of immutable types:
[13]:
d[123] = 'hello'
d
[13]:
{('an', 'immutable', 'tuple', 'as key'): 3,
123: 'hello',
'some key': 5,
'some other key': 7,
'yet another key': 'now a string!'}
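To iterate through keys, use a ‘for in’ construct:
WARNING: in the Python version used to produce these outputs, iteration order most often is NOT the same as insertion order!! (Since Python 3.7, regular dictionaries are guaranteed to preserve insertion order.)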
[14]:
for k in d:
    print(k)
123
some key
some other key
('an', 'immutable', 'tuple', 'as key')
yet another key
Get all keys:
[15]:
d.keys()
[15]:
dict_keys([123, 'some key', 'some other key', ('an', 'immutable', 'tuple', 'as key'), 'yet another key'])
Get all values:
[16]:
d.values()
[16]:
dict_values(['hello', 5, 7, 3, 'now a string!'])
[17]:
# delete a key:
del d['some key']
d
[17]:
{('an', 'immutable', 'tuple', 'as key'): 3,
123: 'hello',
'some other key': 7,
'yet another key': 'now a string!'}
Dictionary methods¶
Recall from the lecture that the following methods are available for dictionaries:
These methods are specific to dictionaries and can be used to loop through their elements.
ATTENTION: dict.keys() returns a dict_keys object, not a list. To convert it to a list, we need to call list(dict.keys()).
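A minimal sketch of the conversion above, together with .items(), which yields (key, value) pairs and is handy for looping through a dictionary. The resulting orders assume Python 3.7 or later, where dictionaries keep insertion order:

```python
d = {'a': 6, 'b': 2, 'c': 5}
keys = list(d.keys())          # the dict_keys view converted to a real list
pairs = []
for key, value in d.items():   # items() yields (key, value) pairs
    print(key, "->", value)
    pairs.append((key, value))
```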
Functions working on dictionaries¶
As for the other data types, Python provides several operators that can be applied to dictionaries. The following operators are available and they basically work as in lists. The only exception is that the operator in checks whether the specified object is present among the keys (not among the values).
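A quick sketch of this difference: in looks only at the keys, so to search among the values you have to ask explicitly through .values():

```python
d = {'a': 6, 'b': 2, 'c': 5}
print('a' in d)          # True: 'a' is a key
print(6 in d)            # False: 6 is a value, not a key
print(6 in d.values())   # True: search explicitly among the values
```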
Exercise print key¶
✪ PRINT the value of key 'b', that is, 2
[18]:
d = {'a':6, 'b':2,'c':5}
# write here
d['b']
[18]:
2
Exercise modify dictionary¶
✪ MODIFY the dictionary, by setting the value of key c to 8. Then PRINT the dictionary
NOTE: the order in which key/value pairs are printed is NOT relevant!
[19]:
d = {'a':6, 'b':2, 'c':5}
# write here
d['c'] = 8
print(d)
{'c': 8, 'a': 6, 'b': 2}
Exercise print keys¶
✪ PRINT a sequence with all the keys, using the appropriate method of dictionaries
[20]:
d = {'a':6, 'b':2,'c':5}
# write here
d.keys()
[20]:
dict_keys(['c', 'a', 'b'])
Exercise print dimension¶
✪ PRINT the number of key/value pairs in the dictionary
[21]:
d = {'a':6, 'b':2, 'c':5}
# write here
print(len(d))
3
Exercise print keys as list¶
✪ PRINT a LIST with all the keys in the dictionary
NOTE 1: it is NOT necessary that the list is ordered
NOTE 2: to convert any sequence to a list, use the predefined function
list
[22]:
d = {'a':6, 'b':2,'c':5}
# write here
list(d.keys())
[22]:
['c', 'a', 'b']
Exercise ordered keys¶
✪ PRINT an ordered LIST holding all dictionary keys
NOTE 1: now it is necessary for the list to be ordered
NOTE 2: to convert any sequence to a list, use the predefined function
list
[23]:
d = {'a':6, 'c':2,'b':5}
# write here
my_list = list(d.keys())
my_list.sort() # REMEMBER: sort does NOT return anything !!!
print(my_list)
['a', 'b', 'c']
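As an alternative, the built-in sorted() accepts any iterable (iterating over a dictionary yields its keys). Unlike .sort(), it RETURNS a new sorted list, so the exercise fits in one line:

```python
d = {'a': 6, 'c': 2, 'b': 5}

# sorted() builds and returns a NEW list; d is left untouched
my_list = sorted(d)
print(my_list)  # ['a', 'b', 'c']
```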
OrderedDict¶
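As we said before, when you scan the keys of a dictionary in Python versions before 3.7, the order is not guaranteed to match the insertion order. To make it predictable there, you need to use an OrderedDict. (Since Python 3.7, regular dictionaries also preserve insertion order.)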
[24]:
# first you need to import it from collections module
from collections import OrderedDict
od = OrderedDict()
# OrderedDict looks and feels exactly like a regular dictionary. Here we reproduce the previous example:
od['some key'] = 5
od['some other key'] = 7
od[('an', 'immutable', 'tuple','as key')] = 3
od['yet another key'] = 'now a string!'
od[123] = 'hello'
od
[24]:
OrderedDict([('some key', 5),
('some other key', 7),
(('an', 'immutable', 'tuple', 'as key'), 3),
('yet another key', 'now a string!'),
(123, 'hello')])
Now you will see that if you iterate with the for in construct, you get exactly the same insertion sequence:
[25]:
for key in od:
    print("%s : %s" %(key, od[key]))
some key : 5
some other key : 7
('an', 'immutable', 'tuple', 'as key') : 3
yet another key : now a string!
123 : hello
To create it all at once, since you want to be sure of the order, you can pass a list of tuples representing key/value pairs. Here we reproduce the previous example:
[26]:
od = OrderedDict(
    [
        ('some key', 5),
        ('some other key', 7),
        (('an', 'immutable', 'tuple','as key'), 3),
        ('yet another key', 'now a string!'),
        (123, 'hello')
    ]
)
od
[26]:
OrderedDict([('some key', 5),
('some other key', 7),
(('an', 'immutable', 'tuple', 'as key'), 3),
('yet another key', 'now a string!'),
(123, 'hello')])
Again you will see that if you iterate with the for in construct, you get exactly the same insertion sequence:
[27]:
for key in od:
    print("%s : %s" % (key, od[key]))
some key : 5
some other key : 7
('an', 'immutable', 'tuple', 'as key') : 3
yet another key : now a string!
123 : hello
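Regular dictionaries keep insertion order since Python 3.7, but OrderedDict still behaves differently in a couple of ways. In this small sketch, comparing two OrderedDict objects takes the order into account, while comparing plain dictionaries does not. The method move_to_end() lets you reorder keys.

```python
from collections import OrderedDict

# same pairs, different order
od_x = OrderedDict([('a', 1), ('b', 2)])
od_y = OrderedDict([('b', 2), ('a', 1)])
print(od_x == od_y)              # False: order matters between OrderedDicts
print(dict(od_x) == dict(od_y))  # True: plain dicts ignore order

# move_to_end() moves an existing key to the end (or to the front with last=False)
od_x.move_to_end('a')
print(list(od_x))                # ['b', 'a']
print(od_x == od_y)              # True: now the order is the same
```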
Exercise: OrderedDict phonebook¶
Write some short code that, given three tuples like the following, prints an OrderedDict which associates names with phone numbers, in the order the tuples are given.
Your code should work with any tuples.
Don’t forget to import the OrderedDict from collections
Example:
t1 = ('Alice', '143242903')
t2 = ('Bob', '417483437')
t3 = ('Charles', '423413213')
after running, your code should print:
OrderedDict([('Alice', '143242903'), ('Bob', '417483437'), ('Charles', '423413213')])
[28]:
# first you need to import it from collections module
from collections import OrderedDict
t1 = ('Alice', '143242903')
t2 = ('Bob', '417483437')
t3 = ('Charles', '423413213')
# write here
od = OrderedDict([t1, t2, t3])
print(od)
OrderedDict([('Alice', '143242903'), ('Bob', '417483437'), ('Charles', '423413213')])
Exercise: OrderedDict copy¶
Given an OrderedDict od1 containing English -> Italian translations, create a NEW OrderedDict called od2 which contains the same translations as the input one PLUS the translation 'water' : 'acqua'.
NOTE 1: your code should work with any input ordered dict
NOTE 2: od2 MUST hold a NEW OrderedDict !!
Example:
With
od1 = OrderedDict()
od1['dog'] = 'cane'
od1['home'] = 'casa'
od1['table'] = 'tavolo'
after your code you should get:
>>> print(od1)
OrderedDict([('dog', 'cane'), ('home', 'casa'), ('table', 'tavolo')])
>>> print(od2)
OrderedDict([('dog', 'cane'), ('home', 'casa'), ('table', 'tavolo'), ('water', 'acqua')])
[29]:
from collections import OrderedDict
od1 = OrderedDict()
od1['dog'] = 'cane'
od1['home'] = 'casa'
od1['table'] = 'tavolo'
# write here
od2 = OrderedDict(od1)
od2['water'] = 'acqua'
print("od1=", od1)
print("od2=", od2)
od1= OrderedDict([('dog', 'cane'), ('home', 'casa'), ('table', 'tavolo')])
od2= OrderedDict([('dog', 'cane'), ('home', 'casa'), ('table', 'tavolo'), ('water', 'acqua')])
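NOTE 2 insists on a NEW OrderedDict for a reason. This sketch contrasts plain assignment, which only creates a second name for the same object, with OrderedDict(od1), which builds a copy. The names alias and copy_od are hypothetical, chosen for illustration.

```python
from collections import OrderedDict

od1 = OrderedDict([('dog', 'cane'), ('home', 'casa')])

alias = od1                 # NOT a copy: both names refer to the same object
copy_od = OrderedDict(od1)  # a new OrderedDict holding the same pairs

copy_od['water'] = 'acqua'  # only the copy changes
alias['table'] = 'tavolo'   # this ALSO changes od1!

print(od1 is alias)    # True
print(od1 is copy_od)  # False
print(od1)
```

The copy is shallow: the keys and values themselves are not duplicated. That makes no difference here because strings are immutable. od1.copy() would also produce a new OrderedDict.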
List of nested dictionaries¶
Suppose you have a list of dictionaries which represents a database of employees. Each employee is represented by a dictionary:
{
    "name":"Mario",
    "surname": "Rossi",
    "age": 34,
    "company": {
        "name": "Candy Apples Inc.",
        "sector":"Food"
    }
}
The dictionary has several simple attributes like name, surname, age. The attribute company is more complex, because it is represented as another dictionary:
"company": {
    "name": "Candy Apples Inc.",
    "sector":"Food"
}
[30]:
employees_db = [
    {
        "name":"Mario",
        "surname": "Rossi",
        "age": 34,
        "company": {
            "name": "Candy Apples Inc.",
            "sector":"Food"
        }
    },
    {
        "name":"Pippo",
        "surname": "Rossi",
        "age": 20,
        "company": {
            "name": "Batworks",
            "sector":"Clothing"
        }
    },
    {
        "name":"Paolo",
        "surname": "Bianchi",
        "age": 25,
        "company": {
            "name": "Candy Apples Inc.",
            "sector":"Food"
        }
    }
]
Exercise: print employees¶
Write some code to print all employee names and surnames from the above employees_db
You can assume employees_db has exactly 3 employees (so a for loop is not even needed)
You should obtain:
Mario Rossi
Pippo Rossi
Paolo Bianchi
[31]:
# write here
print(employees_db[0]["name"], employees_db[0]["surname"])
print(employees_db[1]["name"], employees_db[1]["surname"])
print(employees_db[2]["name"], employees_db[2]["surname"])
Mario Rossi
Pippo Rossi
Paolo Bianchi
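If the number of employees is not known in advance, a for loop over the list generalizes the exercise. In this minimal sketch, staff is a hypothetical, reduced copy of employees_db that keeps only the fields used here, so the snippet runs on its own.

```python
# hypothetical reduced copy of employees_db, only the fields used here
staff = [
    {"name": "Mario", "surname": "Rossi"},
    {"name": "Pippo", "surname": "Rossi"},
    {"name": "Paolo", "surname": "Bianchi"},
]

# the loop works for any number of employees
for employee in staff:
    print(employee["name"], employee["surname"])
```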
Exercise: print company names¶
Write some code to print all company names and their sectors from the above employees_db, without duplicates. Note that the sector must be printed in lowercase.
You can assume employees_db has exactly 3 employees (so a for loop is not even needed)
[32]:
# write here
print(employees_db[0]["company"]["name"], "is a", employees_db[0]["company"]["sector"].lower(), "company")
print(employees_db[1]["company"]["name"], "is a", employees_db[1]["company"]["sector"].lower(), "company")
Candy Apples Inc. is a food company
Batworks is a clothing company
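This sketch generalizes the exercise to any number of employees. A set remembers which company names have already been printed, so each company appears only once. company_db is a hypothetical, reduced copy of employees_db that keeps only the company field.

```python
# hypothetical reduced copy of employees_db, only the company field
company_db = [
    {"company": {"name": "Candy Apples Inc.", "sector": "Food"}},
    {"company": {"name": "Batworks", "sector": "Clothing"}},
    {"company": {"name": "Candy Apples Inc.", "sector": "Food"}},
]

seen = set()  # company names already printed
for employee in company_db:
    company = employee["company"]
    if company["name"] not in seen:
        seen.add(company["name"])
        print(company["name"], "is a", company["sector"].lower(), "company")
```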
Exercises with functions¶
ATTENTION
The following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
print_val¶
✪ Write the function print_val(d, key) which RETURNS the value associated with key
>>> x = print_val({'a':5,'b':2}, 'a')
>>> x
5
>>> y = print_val({'a':5,'b':2}, 'b')
>>> y
2
[33]:
# write here
def print_val(d, key):
    return d[key]
#x = print_val({'a':5,'b':2}, 'a')
#x
has_key¶
Write a function has_key(d, key) which PRINTS "found" if d contains the key key, otherwise PRINTS "not found"
>>> has_key({'a':5,'b':2}, 'a')
found
>>> has_key({'a':5,'b':2}, 'z')
not found
[34]:
# write here
def has_key(d, key):
    if key in d:
        print("found")
    else:
        print("not found")
#has_key({'a':5,'b':2}, 'a')
#has_key({'a':5,'b':2}, 'b')
#has_key({'a':5,'b':2}, 'z')
dim¶
✪ Write a function dim(d) which RETURNS the number of key-value associations present in the dictionary
>>> x = dim({'a':5,'b':2,'c':9})
>>> x
3
[35]:
# write here
def dim(d):
    return len(d)
#x = dim({'a':5,'b':2,'c':9})
#x
keyring¶
✪ Given a dictionary, write a function keyring which RETURNS a sorted LIST with all the keys
NOTE: the order of keys in this list IS important !
>>> x = keyring({'a':5,'c':2,'b':9})
>>> x
['a','b','c']
[36]:
# write here
def keyring(d):
    my_list = list(d.keys())
    my_list.sort() # REMEMBER: .sort() does NOT return anything !!
    return my_list
#x = keyring({'a':5,'c':2,'b':9})
#x
couples¶
✪ Given a dictionary, write a function couples which PRINTS all key/value pairs, one per row
NOTE: the order of the print is NOT important, it is enough to print all pairs !
>>> couples({'a':5,'b':2,'c':9})
a 5
c 9
b 2
[37]:
# write here
def couples(d):
    for key in d:
        print(key,d[key])
#couples({'a':5,'b':2,'c':9})
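The same function can also be written with items(), which yields (key, value) tuples that the for loop unpacks into two variables. couples_items is a hypothetical name chosen so the sketch does not overwrite the solution above.

```python
def couples_items(d):
    # each element of d.items() is a (key, value) tuple, unpacked here
    for key, value in d.items():
        print(key, value)

couples_items({'a': 5, 'b': 2, 'c': 9})
```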
Verify comprehension¶
ATTENTION
The following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Functions: Andrea Passerini slides A04
Tests with asserts: the following exercises contain automated tests to help you spot errors. To understand how they work, first read Error handling and testing
histogram¶
✪✪ RETURN a dictionary that, for each character in string, contains its number of occurrences. The keys are the characters and the values are the occurrences
[38]:
def histogram(string):
    #jupman-raise
    ret = dict()
    for c in string:
        if c in ret:
            ret[c] += 1
        else:
            ret[c] = 1
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert histogram("babbo") == {'b': 3, 'a':1, 'o':1}
assert histogram("") == {}
assert histogram("cc") == {'c': 2}
assert histogram("aacc") == {'a': 2, 'c':2}
# TEST END
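Two shorter versions of the same idea are shown below for comparison. dict.get(key, default) returns default when the key is missing, which removes the if/else. The standard library's collections.Counter does the whole counting for you; a Counter is a dict subclass, so it compares equal to a plain dict with the same counts. histogram_get is a hypothetical name.

```python
from collections import Counter

def histogram_get(string):
    ret = {}
    for c in string:
        # get() returns 0 the first time a character is seen
        ret[c] = ret.get(c, 0) + 1
    return ret

print(histogram_get("babbo"))                        # {'b': 3, 'a': 1, 'o': 1}
print(Counter("babbo") == {'b': 3, 'a': 1, 'o': 1})  # True
```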
listify¶
✪✪ Takes a dictionary d as input and RETURNS a LIST with only the values from the dict (so no keys).
To have a predictable order, the function also takes as input a list order holding the keys of the dictionary, arranged in the order we want in the resulting list
[39]:
def listify(d, order):
    #jupman-raise
    ret = list()
    for element in order:
        ret.append(d[element])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert listify({}, []) == []
assert listify({'ciao':123}, ['ciao']) == [123]
assert listify({'a':'x','b':'y'}, ['a','b']) == ['x','y']
assert listify({'a':'x','b':'y'}, ['b','a']) == ['y','x']
assert listify({'a':'x','b':'y','c':'x'}, ['c','a','b']) == ['x','x','y']
assert listify({'a':'x','b':'y','c':'x'}, ['b','c','a']) == ['y','x','x']
assert listify({'a':5,'b':2,'c':9}, ['b','c','a']) == [2,9,5]
assert listify({6:'x',8:'y',3:'x'}, [6,3,8]) == ['x','x','y']
# TEST END
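The loop-and-append pattern used in listify is so common that Python has a compact syntax for it, the list comprehension. Here is an equivalent version; listify_comp is a hypothetical name.

```python
def listify_comp(d, order):
    # one value per key, in the order given by `order`
    return [d[key] for key in order]

print(listify_comp({'a': 5, 'b': 2, 'c': 9}, ['b', 'c', 'a']))  # [2, 9, 5]
```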
tcounts¶
✪✪ Takes a list of tuples. Each tuple has two values: the first is an immutable object and the second one is an integer number (the count of that object). RETURN a dictionary that associates each immutable object found in the tuples with the total count found for it.
See asserts for examples
[40]:
def tcounts(lst):
    ret = {}
    for c in lst:
        if c[0] in ret:
            ret[c[0]] += c[1]
        else:
            ret[c[0]] = c[1]
    return ret
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert tcounts([]) == {}
assert tcounts([('a',3)]) == {'a':3}
assert tcounts([('a',3),('a',4)]) == {'a':7}
assert tcounts([('a',3),('b',8), ('a',4)]) == {'a':7, 'b':8}
assert tcounts([('a',5), ('c',8), ('b',7), ('a',2), ('a',1), ('c',4)]) == {'a':5+2+1, 'b':7, 'c': 8 + 4}
# TEST END
inter¶
✪✪ Write a function inter(d1, d2) which takes two dictionaries and RETURNS a SET of the keys whose key/value pair is the same in both dictionaries
Example
>>> a = {'key1': 1, 'key2': 2 , 'key3': 3}
>>> b = {'key1': 1 ,'key2': 3 , 'key3': 3}
>>> inter(a,b)
{'key1','key3'}
[41]:
def inter(d1, d2):
    #jupman-raise
    res = set()
    for key in d1:
        if key in d2:
            if d1[key] == d2[key]:
                res.add(key)
    return res
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert inter({'key1': 1, 'key2': 2 , 'key3': 3}, {'key1':1 ,'key2':3 , 'key3':3}) == {'key1', 'key3'}
assert inter(dict(), {'key1':1 ,'key2':3 , 'key3':3}) == set()
assert inter({'key1':1 ,'key2':3 , 'key3':3}, dict()) == set()
assert inter(dict(),dict()) == set()
# TEST END
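inter can be written more compactly with a set comprehension over d1.items(); inter_comp is a hypothetical name. The check key in d2 comes first, so d2[key] is only read for keys that exist. Set printing order can vary between runs, so the check below compares with == instead of showing the printed set.

```python
def inter_comp(d1, d2):
    # keep the keys present in both dicts with the same associated value
    return {key for key, value in d1.items() if key in d2 and d2[key] == value}

a = {'key1': 1, 'key2': 2, 'key3': 3}
b = {'key1': 1, 'key2': 3, 'key3': 3}
print(inter_comp(a, b) == {'key1', 'key3'})  # True
```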
unique_vals¶
✪✪ Write a function unique_vals(d) which RETURNS a list of the unique values in the dictionary. The list MUST be sorted alphanumerically
Question: We need it sorted for testing purposes. Why?
To sort the list, use the method .sort()
Example:
>>> unique_vals({'a':'y','b':'x','c':'x'})
['x','y']
[42]:
def unique_vals(d):
    #jupman-raise
    s = set(d.values())
    ret = list(s) # we can only sort lists (sets have no order)
    ret.sort()
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert unique_vals({}) == []
assert unique_vals({'a':'y','b':'x','c':'x'}) == ['x','y']
assert unique_vals({'a':4,'b':6,'c':4,'d':8}) == [4,6,8]
# TEST END
uppers¶
✪✪ Takes a list and RETURNS a dictionary which associates each string in the list with the same string, but with all characters in uppercase
Example:
>>> uppers(["ciao", "mondo", "come va?"])
{"ciao":"CIAO", "mondo":"MONDO", "come va?":"COME VA?"}
Ingredients:
for loop
.upper() method
[43]:
def uppers(xs):
    #jupman-raise
    d = {}
    for s in xs:
        d[s] = s.upper()
    return d
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert uppers([]) == {}
assert uppers(["ciao"]) == {"ciao":"CIAO"}
assert uppers(["ciao", "mondo"]) == {"ciao":"CIAO", "mondo":"MONDO"}
assert uppers(["ciao", "mondo", "ciao"]) == {"ciao":"CIAO", "mondo":"MONDO"}
assert uppers(["ciao", "mondo", "come va?"]) == {"ciao":"CIAO", "mondo":"MONDO", "come va?":"COME VA?"}
# TEST END
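The same mapping fits in a dictionary comprehension. If the input contains the same string twice, the second occurrence just overwrites the same key, which is why duplicates disappear. uppers_comp is a hypothetical name.

```python
def uppers_comp(xs):
    # key: the original string, value: its uppercase version
    return {s: s.upper() for s in xs}

print(uppers_comp(["ciao", "mondo", "ciao"]))  # {'ciao': 'CIAO', 'mondo': 'MONDO'}
```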
filtraz¶
✪✪ RETURN a NEW dictionary which contains only the key/value pairs of the input dictionary d whose key contains the character 'z'
Example:
filtraz({'zibibbo':'to drink',
         'mc donald': 'to avoid',
         'liquirizia': 'ze best',
         'burger king': 'zozzerie'
        })
must RETURN the NEW dictionary
{
    'zibibbo':'to drink',
    'liquirizia': 'ze best'
}
In other words, we only keep those keys which contain at least one z. We do not care about z in values.
Ingredients:
To check if z is in the key, use the operator in, for example:
'z' in 'zibibbo' evaluates to True
'z' in 'mc donald' evaluates to False
[44]:
def filtraz(diz):
    #jupman-raise
    ret = {}
    for chiave in diz:          # chiave is Italian for 'key'
        if 'z' in chiave:
            ret[chiave] = diz[chiave]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert filtraz({}) == {}
assert filtraz({'az':'t'}) == {'az':'t'}
assert filtraz({'zc':'w'}) == {'zc':'w'}
assert filtraz({'b':'h'}) == {}
assert filtraz({'b':'hz'}) == {}
assert filtraz({'az':'t','b':'hz'}) == {'az':'t'}
assert filtraz({'az':'t','b':'hz','zc':'w'}) == {'az':'t', 'zc':'w'}
# TEST END
powers¶
✪✪ RETURN a dictionary in which the keys are the integer numbers from 1 to n included, and the respective values are the squares of the keys.
Example:
powers(3)
should return:
{
1:1,
2:4,
3:9
}
[45]:
def powers(n):
    #jupman-raise
    d=dict()
    for i in range(1,n+1):
        d[i]=i**2
    return d
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert powers(1) == {1:1}
assert powers(2) == {
1:1,
2:4
}
assert powers(3) == {
1:1,
2:4,
3:9
}
assert powers(4) == {
1:1,
2:4,
3:9,
4:16
}
# TEST END
dilist¶
✪✪ RETURN a dictionary with n key-value pairs, where the keys are the integer numbers from 1 to n included, and each key i is associated with a list of the numbers from 1 to i.
NOTE: the keys are integer numbers, NOT strings !!!!
Example
>>> dilist(3)
{
1:[1],
2:[1,2],
3:[1,2,3]
}
[46]:
def dilist(n):
    #jupman-raise
    ret = dict()
    for i in range(1,n+1):
        lista = []
        for j in range(1,i+1):
            lista.append(j)
        ret[i] = lista
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert dilist(0) == dict()
assert dilist(1) == {
1:[1]
}
assert dilist(2) == {
1:[1],
2:[1,2]
}
assert dilist(3) == {
1:[1],
2:[1,2],
3:[1,2,3]
}
# TEST END
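Since list(range(1, i + 1)) already produces [1, 2, ..., i], the inner loop can be replaced. Here is a version with a dictionary comprehension; dilist_comp is a hypothetical name.

```python
def dilist_comp(n):
    # key i -> list of the numbers from 1 to i included
    return {i: list(range(1, i + 1)) for i in range(1, n + 1)}

print(dilist_comp(3))  # {1: [1], 2: [1, 2], 3: [1, 2, 3]}
```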
prefixes¶
✪✪ Write a function prefixes which, given
a dictionary d having as keys Italian provinces and as values their phone prefixes (note: prefixes are also strings !)
a list provinces with Italian provinces
RETURNS a list of the prefixes corresponding to the provinces in the given list.
Example:
>>> prefixes({
'tn':'0461',
'bz':'0471',
'mi':'02',
'to':'011',
'bo':'051'
},
['tn','to', 'mi'])
['0461', '011', '02']
HINTS:
initialize an empty list to return
go through provinces list and take corresponding prefixes from the dictionary
[47]:
def prefixes(d, provinces):
    #jupman-raise
    ret = []
    for province in provinces:
        ret.append(d[province])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert prefixes({'tn':'0461'}, []) == []
assert prefixes({'tn':'0461'}, ['tn']) == ['0461']
assert prefixes({'tn':'0461', 'bz':'0471'}, ['tn']) == ['0461']
assert prefixes({'tn':'0461', 'bz':'0471'}, ['bz']) == ['0471']
assert prefixes({'tn':'0461', 'bz':'0471'}, ['tn','bz']) == ['0461', '0471']
assert prefixes({'tn':'0461', 'bz':'0471'}, ['bz','tn']) == ['0471', '0461']
assert prefixes({'tn':'0461',
'bz':'0471',
'mi':'02',
'to':'011',
'bo':'051'
},
['tn','to', 'mi']) == ['0461', '011', '02']
# TEST END
Managers¶
Let’s look at this managers_db data structure. It is a list of dictionaries, one per manager.
Each manager supervises a department, which is also represented as a dictionary.
Each department is located either in building "A" or in building "B"
[48]:
managers_db = [
    {
        "name":"Diego",
        "surname": "Zorzi",
        "age": 34,
        "department": {
            "name": "Accounting",
            "budget":20000,
            "building":"A"
        }
    },
    {
        "name":"Giovanni",
        "surname": "Tapparelli",
        "age": 45,
        "department": {
            "name": "IT",
            "budget":10000,
            "building":"B"
        }
    },
    {
        "name":"Sara",
        "surname": "Tomasi",
        "age": 25,
        "department": {
            "name": "Human resources",
            "budget":30000,
            "building":"A"
        }
    },
    {
        "name":"Giorgia",
        "surname": "Tamanin",
        "age": 28,
        "department": {
            "name": "R&D",
            "budget":15000,
            "building":"A"
        }
    },
    {
        "name":"Paola",
        "surname": "Guadagnini",
        "age": 30,
        "department": {
            "name": "Public relations",
            "budget":40000,
            "building":"B"
        }
    }
]
managers: extract_managers¶
✪✪ RETURN the names of the managers in a list
[49]:
def extract_managers(db):
    #jupman-raise
    ret = []
    for d in db:
        ret.append(d["name"])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert extract_managers([]) == []
# if it doesn't find managers_db, remember to execute the cell above which defines it !
assert extract_managers(managers_db) == ['Diego', 'Giovanni', 'Sara', 'Giorgia', 'Paola']
# TEST END
managers: extract_departments¶
✪✪ RETURN the names of departments in a list.
[50]:
def extract_departments(db):
    #jupman-raise
    ret = []
    for d in db:
        ret.append(d["department"]["name"])
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert extract_departments([]) == []
# if it doesn't find managers_db, remember to execute the cell above which defines it !
assert extract_departments(managers_db) == ["Accounting", "IT", "Human resources","R&D", "Public relations"]
# TEST END
managers: avg_age¶
✪✪ RETURN the average age of managers
[51]:
def avg_age(db):
    #jupman-raise
    s = 0
    for d in db:
        s += d["age"]
    return s / len(db)
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
# since the function returns a float we can't compare for exact numbers but
# only for close numbers with the function math.isclose
import math
assert math.isclose(avg_age(managers_db), (34 + 45 + 25 + 28 + 30) / 5)
# TEST END
managers: buildings¶
✪✪ RETURN the buildings the departments belong to, WITHOUT duplicates !!!
[52]:
def buildings(db):
    #jupman-raise
    ret = []
    for d in db:
        building = d["department"]["building"]
        if building not in ret:
            ret.append(building)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert buildings([]) == []
assert buildings(managers_db) == ["A", "B"]
# TEST END
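Another option uses dict.fromkeys(), which builds a dictionary whose keys are the given items, so duplicates collapse into one key. Since Python 3.7, dictionaries keep insertion order, so converting back to a list preserves first-seen order. A set would also drop duplicates but would lose that order. buildings_fromkeys and sample_db are hypothetical names.

```python
def buildings_fromkeys(db):
    all_buildings = [d["department"]["building"] for d in db]
    # duplicate keys collapse into one; first-seen order is kept (Python 3.7+)
    return list(dict.fromkeys(all_buildings))

sample_db = [
    {"department": {"building": "A"}},
    {"department": {"building": "B"}},
    {"department": {"building": "A"}},
]
print(buildings_fromkeys(sample_db))  # ['A', 'B']
```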
averages¶
✪✪ Given a list of dictionaries with the grades of a student in classes V and VI, one dictionary per subject, RETURN a list containing the average for each subject
Example:
>>> averages([
{'id' : 1, 'subject' : 'math', 'V' : 70, 'VI' : 82},
{'id' : 1, 'subject' : 'italian', 'V' : 73, 'VI' : 74},
{'id' : 1, 'subject' : 'german', 'V' : 75, 'VI' : 86}
])
[ 76.0 , 73.5, 80.5 ]
which corresponds to
[ (70+82)/2 , (73+74)/2, (75+86)/2 ]
[53]:
def averages(lista):
    ret = []
    for d in lista:
        # average of the grades in class V and class VI for this subject
        ret.append((d['V'] + d['VI']) / 2)
    return ret
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
import math
def is_list_close(lista, listb):
    """ Verifies the float numbers in lista are similar to numbers in listb
    """
    if len(lista) != len(listb):
        return False
    for i in range(len(lista)):
        if not math.isclose(lista[i], listb[i]):
            return False
    return True
assert is_list_close(averages([
{'id' : 1, 'subject' : 'math', 'V' : 70, 'VI' : 82},
{'id' : 1, 'subject' : 'italian', 'V' : 73, 'VI' : 74},
{'id' : 1, 'subject' : 'german', 'V' : 75, 'VI' : 86}
]),
[ 76.0 , 73.5, 80.5 ])
# TEST END
has_pref¶
✪✪ A big store has a database of clients, modelled as a dictionary which associates customer names with their preferences regarding the categories of articles they usually buy:
{
'aldo':['cinema', 'music', 'sport'],
'giovanni':['music'],
'giacomo':['cinema', 'videogames']
}
Given the dictionary, the customer name and a category, write a function has_pref which RETURNS True if that client has the given preference, False otherwise
Example:
has_pref({
    'aldo':['cinema', 'music', 'sport'],
    'giovanni':['music'],
    'giacomo':['cinema', 'videogames']
}, 'aldo', 'music')
must return True because aldo likes music, while
has_pref({'aldo':['cinema', 'music', 'sport'],
          'giovanni':['music'],
          'giacomo':['cinema', 'videogames']
}, 'giacomo', 'sport')
must return False because giacomo does not like sport
[54]:
def has_pref(d, name, pref):
    #jupman-raise
    if name in d:
        return pref in d[name]
    else:
        return False
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert has_pref({}, 'a', 'x') == False
assert has_pref({'a':[]}, 'a', 'x') == False
assert has_pref({'a':['x']}, 'a', 'x') == True
assert has_pref({'a':['x']}, 'b', 'x') == False
assert has_pref({'a':['x','y']}, 'a', 'y') == True
assert has_pref({'a':['x','y'],
'b':['y','x','z']}, 'b', 'y') == True
assert has_pref({'aldo':['cinema', 'music', 'sport'],
'giovanni':['music'],
'giacomo':['cinema', 'videogames']
}, 'aldo', 'music') == True
assert has_pref({'aldo':['cinema', 'music', 'sport'],
'giovanni':['music'],
'giacomo':['cinema', 'videogames']
}, 'giacomo', 'sport') == False
# TEST END
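The if/else on name in d can be avoided with dict.get, passing an empty list as the default for unknown customers. has_pref_get is a hypothetical name.

```python
def has_pref_get(d, name, pref):
    # unknown customers get an empty list of preferences
    return pref in d.get(name, [])

clients = {'aldo': ['cinema', 'music', 'sport'], 'giacomo': ['cinema', 'videogames']}
print(has_pref_get(clients, 'aldo', 'music'))     # True
print(has_pref_get(clients, 'giacomo', 'sport'))  # False
print(has_pref_get(clients, 'nobody', 'music'))   # False
```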
Control flow solutions¶
Introduction¶
In this practical we will work with conditionals (branching) and loops.
References:
Complex statements: Andrea Passerini slides A03
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-exercises
    |- control-flow
        |- control-flow-exercise.ipynb
        |- control-flow-solution.ipynb
WARNING 1: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open: first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/control-flow/control-flow-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate to the unzipped folder while in Jupyter browser!
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try to select Kernel -> Restart
Execution flow¶
Recall from the lecture that there are at least three types of execution flows. Our statements can be simple and structured sequentially, when one instruction is executed right after the previous one, but some more complex flows involve conditional branching (when the portion of the code to be executed depends on the value of some condition), or loops (when a portion of the code is executed multiple times until a certain condition becomes False).
These portions of code are generally called blocks and Python, unlike most programming languages, uses indentation (together with the colon “:” that ends statements like if, else, for and while) to define blocks.
Conditionals¶
We can use conditionals any time a decision needs to be made depending on the value of some condition. A block of code will be executed if the condition is evaluated to the boolean True and another one if the condition is evaluated to False.
The basic if - else statement¶
The basic syntax of conditionals is an if statement like:
if condition :
    # This is the True branch
    # do something
else:
    # This is the False branch (or else branch)
    # do something else
where condition is a boolean expression that tells the interpreter which of the two blocks should be executed. If and only if the condition is True the first branch is executed, otherwise execution goes to the second branch (i.e. the else branch). Note that the condition is followed by a “:” character and that the two branches are indented. This is how Python identifies the block of instructions that belong to the same branch. The else keyword is followed by “:” and is not indented (i.e. it is at the same level as the if statement). There is no keyword at the end of the else branch: indentation tells when the block of code is finished.
Example: Let’s get an integer from the user and test if it is even or odd, printing the result to the screen.
print("Dear user give me an integer:")
num = int(input())
res = ""
if num % 2 == 0:
    #The number is even
    res = "even"
else:
    #The number is odd
    res = "odd"
print("Number ", num, " is ", res)
Dear user give me an integer:
34
Number 34 is even
Note that the execution is sequential until the if keyword, then it branches until the indentation goes back to the same level of the if (i.e. the two branches rejoin at the print statement in the final line). Remember that the else branch is optional.
The if - elif - else statement¶
If statements can be chained in such a way that there are more than two possible branches to be followed. Chaining them with the if - elif - else statement will make execution follow only one of the possible paths.
The syntax is the following:
if condition :
    # This is branch 1
    # do something
elif condition1 :
    # This is branch 2
    # do something
elif condition2 :
    # This is branch 3
    # do something
else:
    # else branch. Executed if all other conditions are false
    # do something else
Note that branch 1 is executed if condition is True, branch 2 if and only if condition is False and condition1 is True, branch 3 if condition is False, condition1 is False and condition2 is True. If all conditions are False the else branch is executed.
Example: The tax rate of a salary depends on the income. If the income is < 10000 euros, no tax is due; if the income is between 10000 and 20000 euros the tax rate is 25%; if it is between 20000 and 45000 euros it is 35%; otherwise it is 40%. What is the tax due by a person earning 35000 euros per year?
[1]:
income = 35000
rate = 0.0
if income < 10000:
    rate = 0
elif income < 20000:
    rate = 0.25
elif income < 45000:
    rate = 0.35
else:
    rate = 0.4
tax = income*rate
print("The tax due is ", tax, " euros (i.e ", rate*100, "%)")
The tax due is 12250.0 euros (i.e 35.0 %)
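Note the difference between the two following cases: in an if - elif - else chain at most one branch is executed, while two separate if statements are evaluated independently of each other: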
[2]:
#Example 1
val = 10
if val > 5:
    print("Value >5")
elif val > 5:
    print("I said value is >5!")
else:
    print("Value is <= 5")
Value >5
[3]:
#Example 2
val = 10
if(val > 5):
    print("\n\nValue is >5")
if(val > 5):
    print("I said Value is >5!!!")
Value is >5
I said Value is >5!!!
Nested ifs¶
If statements are blocks so they can be nested as any other block.
If you have a point with coordinates x and y and you want to know into which quadrant it falls, you might write something like this:
[4]:
x = 5
y = 9
if x >= 0:
    if y >= 0:
        print('first quadrant')
    else:
        print('fourth quadrant')
else:
    if y >= 0:
        print('second quadrant')
    else:
        print('third quadrant')
first quadrant
An equivalent way is to use boolean expressions and write:
[5]:
if x >= 0 and y >= 0:
    print('first quadrant')
elif x >= 0 and y < 0:
    print('fourth quadrant')
elif x < 0 and y >= 0:
    print('second quadrant')
elif x < 0 and y < 0:
    print('third quadrant')
first quadrant
Ternary operator¶
In some cases it is handy to be able to initialize a variable depending on the value of another one.
Example:
The discount rate applied to a purchase depends on the amount of the sale. Create a variable discount setting its value to 0 if the variable amount is lower than 100 euros, to 10% if it is higher.
[6]:
amount = 110
discount = 0
if(amount >100):
    discount = 0.1
else:
    discount = 0 # not necessary
print("Total amount:", amount, "discount:", discount)
Total amount: 110 discount: 0.1
The previous code can be written more concisely as:
[7]:
amount = 110
discount = 0.1 if amount > 100 else 0
print("Total amount:", amount, "discount:", discount)
Total amount: 110 discount: 0.1
The basic syntax of the ternary operator is:
variable = value if condition else other_value
meaning that the variable is initialized to value if the condition holds, otherwise to other_value.
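For instance, the even/odd example from the beginning of this section fits in one ternary expression. In this version a fixed number replaces the call to input().

```python
num = 34
res = "even" if num % 2 == 0 else "odd"
print("Number", num, "is", res)  # Number 34 is even
```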
Python also allows putting several statements on the same line, separated by a “;”
[8]:
a = 10; b = a + 1; c = b +2
print(a,b,c)
10 11 13
Note: although the ternary operator and statements separated by “;” are less verbose than the explicit form, use them with care. The ternary operator is best kept for short, simple conditions, and the PEP 8 style guide discourages putting multiple statements on the same line.
Loops¶
Looping is the ability to repeat a specific block of code several times (i.e. as long as a specific condition is True or until there are no more elements to process).
For loop¶
The for loop is used to loop over a collection of objects (e.g. a string, list, tuple, …). The basic syntax of the for loop is the following:
for elem in collection :
    # OK, do something with elem
    # instruction 1
    # instruction 2
the variable elem will get the value of each one of the elements present in collection, one after the other. The end of the block of code to be executed for each element in the collection is again defined by indentation.
Depending on the type of the collection, elem will get different values. Recall from the lecture that:
Let’s see this in action:
[9]:
S = "Hi there from python"
Slist = S.split(" ")
Stuple = ("Hi","there","from","python")
print("String:", S)
print("List:", Slist)
print("Tuple:", Stuple)
String: Hi there from python
List: ['Hi', 'there', 'from', 'python']
Tuple: ('Hi', 'there', 'from', 'python')
[10]:
#for loop on string
print("On strings:")
for c in S:
    print(c)
On strings:
H
i
t
h
e
r
e
f
r
o
m
p
y
t
h
o
n
[11]:
print("\nOn lists:")
#for loop on list
for item in Slist:
    print(item)
On lists:
Hi
there
from
python
[12]:
print("\nOn tuples:")
#for loop on tuple
for item in Stuple:
    print(item)
On tuples:
Hi
there
from
python
Looping over a range¶
It is possible to loop over a range of values with the Python built-in function range. The range function accepts one, two or three parameters (all of them integers). Similarly to the slicing operator, it takes a starting point, an end point and an optional step.
Three distinct syntaxes are available:
range(E) # ranges from 0 to E-1
range(S,E) # ranges from S to E-1
range(S,E,step) # ranges from S to E-1 with +step jumps
Remember that S is included while E is excluded. Let’s see some examples.
Example: Given a list of integers, return a list with all the even numbers.
[13]:
myList = [1, 7, 9, 121, 77, 82]
onlyEven = []
for i in range(0, len(myList)): #this is equivalent to range(len(myList)):
    if( myList[i] % 2 == 0 ):
        onlyEven.append(myList[i])
print("original list:", myList)
print("only even numbers:", onlyEven)
original list: [1, 7, 9, 121, 77, 82]
only even numbers: [82]
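When the index i is only used to read myList[i], it is usually simpler to loop over the elements directly. Below is the same filter written that way, followed by a version using a list comprehension with a condition.

```python
myList = [1, 7, 9, 121, 77, 82]

onlyEven = []
for x in myList:        # x takes each element in turn, no index needed
    if x % 2 == 0:
        onlyEven.append(x)
print(onlyEven)   # [82]

# same result with a list comprehension
onlyEven2 = [x for x in myList if x % 2 == 0]
print(onlyEven2)  # [82]
```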
Example: Store in a list the multiples of 19 between 1 and 100.
[14]:
multiples = []
for i in range(19,101,19):
    multiples.append(i)
print("multiples of 19: ", multiples)
#alternative way:
multiples = []
for i in range(1, (100//19) + 1):
    multiples.append(i*19)
print("multiples of 19:", multiples)
multiples of 19: [19, 38, 57, 76, 95]
multiples of 19: [19, 38, 57, 76, 95]
Note: range works differently in Python 2.x and 3.x.
In Python 3 the range function returns a range object, a lazy sequence that produces its values on demand, rather than storing the entire list.
[15]:
#Check out the difference:
print(range(0,10))
print(list(range(0,10)))
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
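A range object computes its values on demand, so it behaves like a sequence without ever storing the numbers. You can ask for its length, index it and test membership even for a huge range. The bound used below is only illustrative:

```python
r = range(0, 10**9)        # no billion-element list is built in memory
print(len(r))              # 1000000000
print(r[5])                # 5
print(r[-1])               # 999999999
print(123456789 in r)      # True: for integers this check does not scan the values
```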
Example: Let’s consider the two DNA strings s1 = “ATACATATAGGGCCAATTATTATAAGTCAC” and s2 = “CGCCACTTAAGCGCCCTGTATTAAAGTCGC”, which have the same length. Let’s create a third string out such that out[i] is "|" if s1[i] == s2[i], and a space " " otherwise.
[16]:
s1 = "ATACATATAGGGCCAATTATTATAAGTCAC"
s2 = "CGCCACTTAAGCGCCCTGTATTAAAGTCGC"
outSTR = ""
for i in range(len(s1)):
    if(s1[i] == s2[i]):
        outSTR = outSTR + "|"
    else:
        outSTR = outSTR + " "
print(s1)
print(outSTR)
print(s2)
ATACATATAGGGCCAATTATTATAAGTCAC
   ||  || |  |  |   |  ||||| |
CGCCACTTAAGCGCCCTGTATTAAAGTCGC
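The same comparison can also be written without indices. The built-in zip walks the two strings in parallel, pairing up the characters found at the same position. This is just an alternative sketch. Like the code above, it assumes the two strings have the same length, because zip silently stops at the end of the shorter one.

```python
s1 = "ATACATATAGGGCCAATTATTATAAGTCAC"
s2 = "CGCCACTTAAGCGCCCTGTATTAAAGTCGC"
outSTR = ""
for a, b in zip(s1, s2):    # a comes from s1, b from s2, at the same position
    if a == b:
        outSTR = outSTR + "|"
    else:
        outSTR = outSTR + " "
print(s1)
print(outSTR)
print(s2)
```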
Nested for loops¶
On some occasions it is useful to nest one (or more) for loops inside another one. The basic syntax is:
for i in collection:
    for j in another_collection:
        # do some stuff with i and j
Example:
Given the matrix \(\begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix}\) stored as a list of lists (i.e. matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]).
Print it out as: \(\begin{matrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{matrix}\)
[17]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for i in range(len(matrix)):
    line = ""
    for j in range(len(matrix[i])):
        line = line + str(matrix[i][j]) + " " #note int --> str conversion!
    print(line)
1 2 3
4 5 6
7 8 9
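Since a list of lists can be iterated directly, the same output can be obtained without indices. The outer loop then receives one row at a time, and the inner loop receives the numbers in that row. In this sketch the lines are also collected in a list called lines, only so they can be inspected afterwards:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lines = []
for row in matrix:            # outer loop: one row (itself a list) at a time
    line = ""
    for value in row:         # inner loop: the numbers in that row
        line = line + str(value) + " "
    lines.append(line)
    print(line)
```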
While loops¶
The for loop is great when we have to iterate over a finite sequence of elements. But when one needs to keep looping as long as a specific condition holds true, another construct must be used: the while statement. The loop ends when the condition becomes false.
The basic syntax is the following:
while condition:
    # do something
    # update the value of condition
An example follows:
[18]:
i = 0
while (i < 5):
    print("i now is:", i)
    i = i + 1 #THIS IS VERY IMPORTANT!
i now is: 0
i now is: 1
i now is: 2
i now is: 3
i now is: 4
Note that if the condition is false at the beginning, the block of code is never executed.
Note: The loop continues as long as the condition holds true, and the only code executed is the block defined through the indentation. This block must eventually make the condition false (for example by updating the variables it depends on), otherwise the interpreter will get stuck in the loop and never exit.
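A typical case where while fits better than for is when the number of iterations is not known in advance. As a sketch, here is how to count the digits of a positive integer by repeatedly removing its last digit. The number 40213 is just an example. Each iteration updates n, so the condition n > 0 eventually becomes false. For n = 0 the body would never run, and the count would stay 0.

```python
n = 40213
digits = 0
while n > 0:
    n = n // 10           # drop the last digit: this update guarantees termination
    digits = digits + 1
print(digits)             # 5
```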
We can also nest for loops and while loops one inside the other; an example is given after the break and continue statements below.
Break and continue¶
Sometimes it is useful to skip the rest of an iteration, or to end a loop before its natural end. This can be achieved with two different statements: continue and break.
Continue statement¶
Within a for or while loop, continue makes the interpreter skip the rest of the current iteration and move on to the next one.
Example: Print all the odd numbers from 1 to 20.
[19]:
#Two equivalent ways
#1. Testing remainder == 1
for i in range(21):
    if(i % 2 == 1):
        print(i, end = " ")
print("")
#2. Skipping if remainder == 0 in for
for i in range(21):
    if(i % 2 == 0):
        continue
    print(i, end = " ")
1 3 5 7 9 11 13 15 17 19
1 3 5 7 9 11 13 15 17 19
Continue can also be used within while loops, but we need to be careful to update the loop variable before reaching the continue statement, or we will get stuck in a never-ending loop. Example: print all the odd numbers from 1 to 20.
#Wrong code (never terminates, do not run it!):
i = 0
while (i < 21):
    if(i % 2 == 0):
        continue
    print(i, end = " ")
    i = i + 1 # NEVER EXECUTED IF i % 2 == 0!!!!
A possible correct solution using while:
[20]:
i = -1
while( i< 20): #i is incremented at the start of the body, so the last value processed is 20
    i = i + 1 #the variable is updated no matter what
    if(i % 2 == 0 ):
        continue
    print(i, end = " ")
1 3 5 7 9 11 13 15 17 19
Break statement¶
Within a for or while loop, break makes the interpreter exit the loop immediately and continue with the code that follows it. This is useful when the task is already complete and there is no need to run the loop until its end.
Example: Given the following list of integers [1,5,6,4,7,1,2,3,7] print them until a number already printed is found.
[21]:
L = [1,5,6,4,7,1,2,3,7]
found = []
for i in L:
    if(i in found):
        break
    found.append(i)
    print(i, end = " ")
1 5 6 4 7
Example: Pick random numbers between 1 and 50 and count how many picks it takes to randomly choose the number 27. Limit the number of random picks to 40 (i.e. if 40 picks have been done and 27 has not been found, exit anyway with a message).
[22]:
import random
iterations = 1
picks = []
while(iterations <= 40):
    pick = random.randint(1,50)
    picks.append(pick)
    if(pick == 27):
        break
    iterations += 1
if(iterations == 41):
    print("Sorry number 27 was never found!")
else:
    print("27 found in ", iterations, "iterations")
print(picks)
Sorry number 27 was never found!
[22, 12, 16, 22, 19, 41, 50, 20, 37, 47, 18, 42, 33, 19, 18, 16, 8, 16, 36, 31, 1, 49, 19, 38, 34, 18, 45, 30, 26, 44, 7, 23, 37, 12, 38, 43, 42, 26, 46, 41]
An alternative way, without the break statement, makes use of a flag variable: once the flag changes value, the loop condition becomes false and the loop ends.
[23]:
import random
found = False # This is called flag
iterations = 1
picks = []
while iterations <= 40 and not found: #the flag is used to exit
    pick = random.randint(1,50)
    picks.append(pick)
    if pick == 27:
        found = True #update the flag: the loop condition fails at the next check
    iterations += 1
if not found:
    print("Sorry number 27 was never found!")
else:
    print("27 found in ", iterations -1, "iterations")
print(picks)
Sorry number 27 was never found!
[40, 46, 29, 29, 38, 1, 12, 41, 19, 39, 8, 10, 5, 18, 31, 50, 38, 18, 9, 46, 22, 47, 36, 41, 7, 43, 24, 39, 50, 47, 15, 10, 34, 8, 6, 23, 9, 1, 24, 18]
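Example: a while loop nested inside a for loop, printing on each line the numbers from i down to 1.
[24]:
for i in range(1,10):
    j = 1
    output = ""
    while(j <= i):
        output = str(j) + " " + output  # prepend, so the numbers come out in decreasing order
        j = j + 1
    print(output)
# alternatively, without building a string:
# for i in range(1,10):
#     j = i
#     while(j >= 1):
#         print(j, end = " ")
#         j = j - 1
#     print("")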
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
6 5 4 3 2 1
7 6 5 4 3 2 1
8 7 6 5 4 3 2 1
9 8 7 6 5 4 3 2 1
Exercises¶
Given the integer 134479170, print whether it is divisible by each of the numbers from 2 to 16. Hint: use for and if.
Given the DNA string “GATTACATATATCAGTACAGATATATACGCGCGGGCTTACTATTAAAAACCCC”, write a Python script that reverse-complements it. To reverse-complement a string of DNA, one needs to replace A with T, T with A, C with G and G with C, while any other character is replaced by N. Finally, the sequence has to be reversed (e.g. the first base becomes the last). For example, ATCG becomes CGAT.
Write a python script that creates the following pattern:
+
++
+++
++++
+++++
++++++
+++++++ <-- 7
++++++
+++++
++++
+++
++
+
Count how many of the first 100 integers are divisible by 2, 3, 5, 7 but not by 10 and print these counts. Be aware that a number can be divisible by more than one of these numbers (e.g. 6) and therefore it must be counted as divisible by all of them (e.g. 6 must be counted as divisible by 2 and 3).
Given the following FASTQ record:
@HWI-ST1296:75:C3F7CACXX:1:1101:19142:14904
CCAACAACTTTGACGCTAAGGATAGCTCCATGGCAGCATATCTGGCACAA
+
FHIIJIJJGIJJJJJ1HHHFFFFFEE:;CIDDDDDDDDDDDDEDDD-./0
Store the sequence and the quality in two strings. Create a list with all the quality phred scores (given a quality character “X”, the phred score is ord(“X”) - 33). Finally, print all the bases that have quality lower than 25, reporting the base, its position, the quality character and the phred score. Output example: base: C index: 14 qual:1 phred: 16
Given the following sequence:
AUGCUGUCUCCCUCACUGUAUGUAAAUUGCAUCUAGAAUAGCA
UCUGGAGCACUAAUUGACACAUAGUGGGUAUCAAUUAUUA
UUCCAGGUACUAGAGAUACCUGGACCAUUAACGGAUAAAU
AGAAGAUUCAUUUGUUGAGUGACUGAGGAUGGCAGUUCCU
GCUACCUUCAAGGAUCUGGAUGAUGGGGAGAAACAGAGAA
CAUAGUGUGAGAAUACUGUGGUAAGGAAAGUACAGAGGAC
UGGUAGAGUGUCUAACCUAGAUUUGGAGAAGGACCUAGAA
GUCUAUCCCAGGGAAAUAAAAAUCUAAGCUAAGGUUUGAG
GAAUCAGUAGGAAUUGGCAAAGGAAGGACAUGUUCCAGAU
GAUAGGAACAGGUUAUGCAAAGAUCCUGAAAUGGUCAGAG
CUUGGUGCUUUUUGAGAACCAAAAGUAGAUUGUUAUGGAC
CAGUGCUACUCCCUGCCUCUUGCCAAGGGACCCCGCCAAG
CACUGCAUCCCUUCCCUCUGACUCCACCUUUCCACUUGCC
CAGUAUUGUUGGUGU
Consider the genetic code and the first forward open reading frame (i.e. read the string as it is, after removing the newlines).
How many start codons (i.e. AUG) are present in the whole sequence?
How many stop codons (i.e. UAA, UAG, UGA) are present?
Create another string in which every codon except the start and stop codons is replaced by "---", and print the resulting string.
Playing time! Write a Python script that:
Picks a random number from 1 to 10, with:
import random
myInt = random.randint(1,10)
Asks the user to guess a number and checks whether the user has guessed the right one
If the guess is right, the program stops with a congratulation message
If the guess is wrong, the program keeps asking for a number, reporting the numbers already guessed (hint: store them in a list and print it).
Modify the program to notify the user if he/she inputs the same number more than once.
Functions - solutions¶
Introduction¶
A function takes some parameters and uses them to produce or report some result.
In this notebook we will see how to define functions to reuse code, and talk about the scope of variables.
References
Think Python, Chapter 6, Fruitful functions. NOTE: the book uses the odd term ‘fruitful functions’ for functions which RETURN a value (mind you, RETURN a value, which is different from PRINTing it), and the term ‘void functions’ for functions which do not return anything but have some effect, like PRINTing to screen. Please ignore these terms.
What to do¶
Unzip the exercises in a folder; you should get something like this:
-jupman.py
-exercises
|- functions
|- functions-exercise.ipynb
|- functions-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook
exercises/functions/functions-exercise.ipynb
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
What is a function ?¶
A function is a block of code that has a name and that performs a task. A function can be thought of as a box that gets an input and returns an output.
Why should we use functions? For a lot of reasons including:
Reduce code duplication: put in functions parts of code that are needed several times in the whole program so that you don’t need to repeat the same code over and over again;
Decompose a complex task: make the code easier to write and understand by splitting the whole program in several easier functions;
Both improve code readability and make your code easier to maintain.
The basic definition of a function is:
def function_name(input) :
    #code implementing the function
    ...
    ...
    return return_value
Functions are defined with the def keyword, which precedes the function_name, followed by a list of parameters in parentheses. A colon : ends the line holding the definition of the function. The code implementing the function is specified by using indentation. A function might or might not return a value; in the first case a return statement is used.
Example:
Define a function that implements the sum of two integer lists (note that there is no check that the two lists actually contain integers and that they have the same size).
[2]:
def int_list_sum(la,lb):
"""implements the sum of two lists of integers having the same size
"""
ret =[]
for i in range(len(la)):
ret.append(la[i] + lb[i])
return ret
La = list(range(1,10))
print("La:", La)
La: [1, 2, 3, 4, 5, 6, 7, 8, 9]
[3]:
Lb = list(range(20,30))
print("Lb:", Lb)
Lb: [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[4]:
res = int_list_sum(La,Lb)
[5]:
print("La+Lb:", res)
La+Lb: [21, 23, 25, 27, 29, 31, 33, 35, 37]
[6]:
res = int_list_sum(La,La)
[7]:
print("La+La", res)
La+La [2, 4, 6, 8, 10, 12, 14, 16, 18]
Note that once the function has been defined, it can be called as many times as wanted with different input parameters. Moreover, a function does not do anything until it is actually called. A function can return no value (in this case calling it gives None), one value, or several values (packed into a tuple). Notice also that collecting the result of a function is not mandatory.
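A small sketch of these three cases, using two hypothetical functions, no_result and two_results:

```python
def no_result(x):
    y = x * 2          # computes a value but never returns it

def two_results(x):
    return x, x * 2    # two values, packed into a tuple

r = no_result(3)
print(r)               # None
t = two_results(3)
print(t)               # (3, 6)
a, b = two_results(3)  # the tuple can be unpacked into separate names
print(a, b)            # 3 6
two_results(3)         # calling without collecting the result is allowed
```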
Example: Let’s write a function that, given a list of elements, prints only the even-placed ones without returning anything.
[8]:
def get_even_placed(myList):
"""returns the even placed elements of myList"""
ret = [myList[i] for i in range(len(myList)) if i % 2 == 0]
print(ret)
[9]:
L1 = ["hi", "there", "from","python","!"]
[10]:
L2 = list(range(13))
[11]:
print("L1:", L1)
L1: ['hi', 'there', 'from', 'python', '!']
[12]:
print("L2:", L2)
L2: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[13]:
print("even L1:")
get_even_placed(L1)
even L1:
['hi', 'from', '!']
[14]:
print("even L2:")
get_even_placed(L2)
even L2:
[0, 2, 4, 6, 8, 10, 12]
Note that the function above is polymorphic (i.e. it works on several data types, provided that they support len() and indexing, as lists, strings and tuples do).
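To see this, here is a hypothetical variant called even_placed_list. It uses the same expression but RETURNs the list instead of printing it, and is applied below to a string and to a tuple. A set has no positions, so indexing it raises a TypeError:

```python
def even_placed_list(seq):
    """returns the elements of seq at even positions (0, 2, 4, ...)"""
    return [seq[i] for i in range(len(seq)) if i % 2 == 0]

print(even_placed_list("python"))          # ['p', 't', 'o']
print(even_placed_list((10, 20, 30, 40)))  # [10, 30]
try:
    even_placed_list({1, 2, 3})            # sets support len() but not indexing
except TypeError as e:
    print("TypeError:", e)
```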
Example: Let’s write a function that, given a list of integers, returns the number of elements, the maximum and minimum.
[15]:
def get_info(myList):
"""returns len of myList, min and max value (assumes elements are integers)"""
tmp = myList[:] #copy the input list
tmp.sort()
return len(tmp), tmp[0], tmp[-1] #return type is a tuple
A = [7, 1, 125, 4, -1, 0]
print("Original A:", A, "\n")
Original A: [7, 1, 125, 4, -1, 0]
[16]:
result = get_info(A)
[17]:
print("Len:", result[0], "Min:", result[1], "Max:",result[2], "\n" )
Len: 6 Min: -1 Max: 125
[18]:
print("A now:", A)
A now: [7, 1, 125, 4, -1, 0]
[19]:
def my_sum(myList):
    ret = 0
    for el in myList:
        ret += el # same as ret = ret + el
    return ret
A = [1,2,3,4,5,6]
B = [7, 9, 4]
[20]:
s = my_sum(A)
[21]:
print("List A:", A)
print("Sum:", s)
List A: [1, 2, 3, 4, 5, 6]
Sum: 21
[22]:
s = my_sum(B)
[23]:
print("List B:", B)
print("Sum:", s)
List B: [7, 9, 4]
Sum: 20
Please note that the return value of get_info above is actually a tuple. Importantly, a function needs to be defined (i.e. its def statement must have been executed) before it can be called.
[24]:
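# In a fresh session this raises NameError: name 'my_sum' is not defined.
# Here no error appears only because my_sum was already defined in an earlier cell.
A = [1,2,3]
my_sum(A)
def my_sum(myList):
    ret = 0
    for el in myList:
        ret += el
    return ret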
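To actually see the error, one can call a name that has not been defined yet. This sketch uses a hypothetical triple function. The NameError is caught with try/except only so that the example keeps running:

```python
try:
    print(triple(4))      # the def below has not been executed yet
except NameError as e:
    print("Error:", e)    # name 'triple' is not defined

def triple(x):
    return x * 3

print(triple(4))          # 12: now the def has run, so the call works
```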
Namespace and variable scope¶
Namespaces are mappings from names to objects, or in other words places where names are associated to objects. A namespace can be seen as the context in which a name is looked up. According to Python’s reference, a scope is a textual region of a Python program where a namespace is directly accessible, which means that Python will look into that namespace to find the object associated to a name. Four namespaces are made available by Python:
Local: the innermost scope, which contains the local names (inside a function or a class);
Enclosing: the scope of any enclosing function (relevant for nested functions); it contains neither the current local names nor the global names;
Global: contains the global names;
Built-in: contains all built-in names (e.g. print, len, range,…)
When a name is referenced, Python tries to find it in the current namespace; if it is not found there, it continues looking in the enclosing namespaces until the built-in namespace is reached. If the name is not found there either, the Python interpreter raises a NameError exception, meaning it cannot find the name. The order in which namespaces are considered is: Local, Enclosing, Global and Built-in (LEGB).
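A small sketch of the lookup order: a local name hides a built-in name with the same spelling, but only inside the function. The function name show_shadowing is hypothetical. Reusing built-in names like this is best avoided in real code, which is also why the exercise below uses sum2 instead of sum.

```python
def show_shadowing():
    len = "a local string"   # Local: this name hides the built-in len inside the function
    return len

print(show_shadowing())      # a local string
print(len("abc"))            # 3: outside the function, lookup reaches the built-in len
```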
Consider the following example:
[25]:
def my_function():
    var = 1 #local variable
    print("Local:", var)
    b = "my string"
    print("Local:", b)
var = 7 #global variable
my_function()
print("Global:", var)
print(b)
Local: 1
Local: my string
Global: 7
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-56-7dd8330a24f0> in <module>
8 my_function()
9 print("Global:", var)
---> 10 print(b)
NameError: name 'b' is not defined
Variables defined within a function can only be seen within that function. That is why b exists only inside my_function, and print(b) outside it raises a NameError. Variables defined outside all functions are global to the whole program. The local var lives in the namespace of my_function, so outside the function var keeps its global value 7.
And the following:
[26]:
def outer_function():
    var = 1 #outer
    def inner_function():
        var = 2 #inner
        print("Inner:", var)
        print("Inner:", B)
    inner_function()
    print("Outer:", var)
var = 3 #global
B = "This is B"
outer_function()
print("Global:", var)
print("Global:", B)
Inner: 2
Inner: This is B
Outer: 1
Global: 3
Global: This is B
Note in particular that the variable B is global, therefore it is accessible everywhere and also inside the inner_function. On the contrary, the value of var defined within the inner_function is accessible only in the namespace defined by it, outside it will assume different values as shown in the example.
In a nutshell, remember the three simple rules seen in the lecture. Within a def:
1. Name assignments create local names by default;
2. Name references search the following four scopes in the order:
local, enclosing functions (if any), then global and finally built-in (LEGB)
3. Names declared in global and nonlocal statements map assigned names to
enclosing module and function scopes.
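Rule 3 can be sketched as follows. All the names used here (counter, increment, make_accumulator, add, acc) are hypothetical. With global, an assignment inside the function rebinds the module-level name. With nonlocal, it rebinds the name of the enclosing function.

```python
counter = 0                  # a global name

def increment():
    global counter           # assignments to counter now target the global name
    counter = counter + 1

def make_accumulator():
    total = 0                # local to make_accumulator, enclosing for add
    def add(x):
        nonlocal total       # rebinds total in the enclosing function's scope
        total = total + x
        return total
    return add

increment()
increment()
print(counter)               # 2

acc = make_accumulator()
acc(5)
print(acc(10))               # 15
```

Without the global line, counter = counter + 1 inside increment would raise an UnboundLocalError, because rule 1 makes counter a local name of the function.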
Argument passing¶
Arguments are the parameters and data we pass to functions. When passing arguments, there are three important things to bear in mind:
Passing an argument is actually assigning an object to a local variable name;
Assigning an object to a variable name within a function does not affect the caller;
Changing a mutable object in place within a function affects the caller.
Consider the following examples:
[27]:
"""Assigning the argument does not affect the caller"""
def my_f(x):
x = "local value" #local
print("Local: ", x)
x = "global value" #global
my_f(x)
print("Global:", x)
my_f(x)
Local: local value
Global: global value
Local: local value
[28]:
"""Changing a mutable affects the caller"""
def my_f(myList):
myList[1] = "new value1"
myList[3] = "new value2"
print("Local: ", myList)
myList = ["old value"]*4
print("Global:", myList)
my_f(myList)
print("Global now: ", myList)
Global: ['old value', 'old value', 'old value', 'old value']
Local: ['old value', 'new value1', 'old value', 'new value2']
Global now: ['old value', 'new value1', 'old value', 'new value2']
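Recall what we saw in the lecture:
The behaviour above is often summarised by saying that immutable objects are passed by value (as if a copy were made), while mutable objects are passed by reference. More precisely, Python always passes a reference to the object itself: assigning a new object to the parameter name inside the function never affects the caller, while changing a mutable object in place is visible to the caller, because both names refer to the same object.
To avoid making changes to a mutable object passed as parameter, one needs to explicitly make a copy of it.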
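A sketch of what actually happens. The functions rebind, mutate and show_id are hypothetical. Inside the function, the parameter refers to the very same object as the caller's variable, so id returns the same number. Mutating that object is visible outside the function, while rebinding the parameter to a new object is not. A slice such as data[:] gives a separate copy.

```python
def rebind(x):
    x = [0]               # x now names a brand-new list; the caller's list is untouched

def mutate(x):
    x.append(0)           # modifies the one list that both x and the caller's name refer to

def show_id(x):
    return id(x)          # identity of the object received as argument

data = [1, 2]
print(show_id(data) == id(data))  # True: passing an argument does not copy it
rebind(data)
print(data)                       # [1, 2]
mutate(data)
print(data)                       # [1, 2, 0]
copy_of_data = data[:]            # explicit copy: equal content, different object
print(copy_of_data == data, copy_of_data is data)   # True False
```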
Consider the example seen before. Example: Let’s write a function that, given a list of integers, returns the number of elements, the maximum and minimum.
[29]:
def get_info(myList):
"""returns len of myList, min and max value (assumes elements are integers)"""
myList.sort()
return len(myList), myList[0], myList[-1] #return type is a tuple
def get_info_copy(myList):
"""returns len of myList, min and max value (assumes elements are integers)"""
tmp = myList[:] #copy the input list!!!!
tmp.sort()
return len(tmp), tmp[0], tmp[-1] #return type is a tuple
A = [7, 1, 125, 4, -1, 0]
B = [70, 10, 1250, 40, -10, 0, 10]
print("A:", A)
result = get_info(A)
A: [7, 1, 125, 4, -1, 0]
[30]:
print("Len:", result[0], "Min:", result[1], "Max:",result[2] )
Len: 6 Min: -1 Max: 125
[31]:
print("A now:", A) #whoops A is changed!!!
A now: [-1, 0, 1, 4, 7, 125]
[32]:
print("\nB:", B)
B: [70, 10, 1250, 40, -10, 0, 10]
[33]:
result = get_info_copy(B)
[34]:
print("Len:", result[0], "Min:", result[1], "Max:",result[2] )
Len: 7 Min: -10 Max: 1250
[35]:
print("B now:", B) #B is not changed!!!
B now: [70, 10, 1250, 40, -10, 0, 10]
Positional arguments¶
Arguments can be passed to functions following the order in which they appear in the function definition.
Consider the following example:
[36]:
def print_parameters(a,b,c,d):
print("1st param:", a)
print("2nd param:", b)
print("3rd param:", c)
print("4th param:", d)
print_parameters("A", "B", "C", "D")
1st param: A
2nd param: B
3rd param: C
4th param: D
Passing arguments by keyword¶
Given the name of an argument as specified in the definition of the function, parameters can be passed using the name = value syntax.
For example:
[37]:
def print_parameters(a,b,c,d):
print("1st param:", a)
print("2nd param:", b)
print("3rd param:", c)
print("4th param:", d)
print_parameters(a = 1, c=3, d=4, b=2)
1st param: 1
2nd param: 2
3rd param: 3
4th param: 4
[38]:
print_parameters("first","second",d="fourth",c="third")
1st param: first
2nd param: second
3rd param: third
4th param: fourth
Arguments passed positionally and by name can be used at the same time, but arguments passed positionally must always come before (to the left of) those passed by name. The following code in fact is not accepted by the Python interpreter:
def print_parameters(a,b,c,d):
print("1st param:", a)
print("2nd param:", b)
print("3rd param:", c)
print("4th param:", d)
print_parameters(d="fourth",c="third", "first","second")
File "<ipython-input-60-4991b2c31842>", line 7
print_parameters(d="fourth",c="third", "first","second")
^
SyntaxError: positional argument follows keyword argument
Specifying default values¶
During the definition of a function it is possible to specify default values. The syntax is the following:
def my_function(par1 = val1, par2 = val2, par3 = val3):
Consider the following example:
[39]:
def print_parameters(a="defaultA", b="defaultB",c="defaultC"):
print("a:",a)
print("b:",b)
print("c:",c)
print_parameters("param_A")
a: param_A
b: defaultB
c: defaultC
[40]:
print_parameters(b="PARAMETER_B")
a: defaultA
b: PARAMETER_B
c: defaultC
[41]:
print_parameters()
a: defaultA
b: defaultB
c: defaultC
[42]:
print_parameters(c="PARAMETER_C", b="PAR_B")
a: defaultA
b: PAR_B
c: PARAMETER_C
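Defaults can also be mixed with parameters that have none. In that case the parameters without a default must come first. A sketch with a hypothetical greet function:

```python
def greet(name, greeting="Hello"):     # required parameter first, defaulted one after
    return greeting + ", " + name + "!"

print(greet("Ada"))                    # Hello, Ada!
print(greet("Ada", "Hi"))              # Hi, Ada!
print(greet("Ada", greeting="Ciao"))   # Ciao, Ada!
# def bad(greeting="Hello", name): ...  would be rejected with a SyntaxError,
# because parameters without a default must come before those with one
```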
Simple exercises¶
sum2¶
✪ Write function sum2 which, given two numbers x and y, RETURNs their sum.
QUESTION: Why do we call it sum2 instead of just sum?
[43]:
sum([2,51])
[43]:
53
ANSWER: sum is already defined as a standard Python function, and we do not want to overwrite it. Notice how in the following snippet it is displayed in green:
>>> sum([5,8])
13
[44]:
# write here
def sum2(x,y):
    return x + y
[45]:
s = sum2(3,6)
print(s)
9
[46]:
s = sum2(-1,3)
print(s)
2
comparep¶
✪ Write a function comparep which, given two numbers x and y, PRINTs x is greater than y, x is less than y or x is equal to y.
NOTE: in the print, put the actual numbers. For example, comparep(10,5) should print:
10 is greater than 5
HINT: to print numbers and text together, separate them with commas in print:
print(x, "is greater than")
[47]:
# write here
def comparep(x,y):
    if x > y:
        print(x, "is greater than", y)
    elif x < y:
        print(x, "is less than", y)
    else:
        print(x, "is equal to", y)
[48]:
comparep(10,5)
10 is greater than 5
[49]:
comparep(3,8)
3 is less than 8
[50]:
comparep(3,3)
3 is equal to 3
comparer¶
✪ Write function comparer which, given two numbers x and y, RETURNs the STRING '>' if x is greater than y, the STRING '<' if x is less than y, or the STRING '==' if x is equal to y.
[51]:
# write here
def comparer(x,y):
    if x > y:
        return '>'
    elif x < y:
        return '<'
    else:
        return '=='
[52]:
c = comparer(10,5)
print(c)
>
[53]:
c = comparer(3,7)
print(c)
<
[54]:
c = comparer(3,3)
print(c)
==
even¶
✪ Write a function even which, given a number x, RETURNs True if x is even, otherwise RETURNs False.
HINT: a number is even when the remainder of its division by two is zero. To obtain the remainder of a division, write x % 2
[55]:
# Example:
2 % 2
[55]:
0
[56]:
3 % 2
[56]:
1
[57]:
4 % 2
[57]:
0
[58]:
5 % 2
[58]:
1
[59]:
# write here
def even(x):
    return x % 2 == 0
[60]:
p = even(2)
print(p)
True
[61]:
p = even(3)
print(p)
False
[62]:
p = even(4)
print(p)
True
[63]:
p = even(5)
print(p)
False
[64]:
p = even(0)
print(p)
True
gre¶
✪ Write a function gre that, given two numbers x and y, RETURNs the greater number.
If they are equal, RETURN either one.
[65]:
# write here
def gre(x,y):
    if x > y:
        return x
    else:
        return y
[66]:
m = gre(3,5)
print(m)
5
[67]:
m = gre(6,2)
print(m)
6
[68]:
m = gre(4,4)
print(m)
4
[69]:
m = gre(-5,2)
print(m)
2
[70]:
m = gre(-5, -3)
print(m)
-3
is_vocal¶
✪ Write a function is_vocal which takes a character car as parameter and PRINTs 'yes' if the character is a vowel, otherwise PRINTs 'no' (using print).
>>> is_vocal("a")
yes
>>> is_vocal("c")
no
[71]:
# write here
def is_vocal(char):
    if char == 'a' or char == 'e' or char == 'i' or char == 'o' or char == 'u':
        print('yes')
    else:
        print('no')
sphere_volume¶
✪ The volume of a sphere of radius r is \(\frac{4}{3} \pi r^3\).
Write a function sphere_volume(radius) which, given the radius of a sphere, PRINTs its volume.
NOTE: assume pi = 3.14
>>> sphere_volume(4)
267.94666666666666
[72]:
# write here
def sphere_volume(radius):
    print((4/3)*3.14*(radius**3))
ciri¶
✪ Write a function ciri(name) which takes as parameter the string name and RETURNs True if it is equal to the name 'Cirillo', otherwise False.
>>> r = ciri("Cirillo")
>>> r
True
>>> r = ciri("Mario")
>>> r
False
[73]:
# write here
def ciri(name):
if name == "Cirillo":
return True
else:
return False
age¶
✪ Write a function age which takes as parameter the year of birth and RETURNs the age of the person.
Suppose the current year is known, so to represent it in the function body use a constant like 2019:
>>> a = age(2003)
>>> print(a)
16
[74]:
# write here
def age(year):
    return 2019 - year
Verify comprehension¶
ATTENTION
The following exercises require you to know:
Complex statements: Andrea Passerini slides A03
Tests with asserts: the following exercises contain automated tests to help you spot errors. To understand how they work, first read Error handling and testing
gre3¶
✪✪ Write a function gre3(a,b,c) which takes three numbers and RETURNs the greatest among them
Examples:
>>> gre3(1,2,4)
4
>>> gre3(5,7,3)
7
>>> gre3(4,4,4)
4
[75]:
# write here
def gre3(a,b,c):
    if a > b:
        if a>c:
            return a
        else:
            return c
    else:
        if b > c:
            return b
        else:
            return c
assert gre3(1,2,4) == 4
assert gre3(5,7,3) == 7
assert gre3(4,4,4) == 4
final_price¶
✪✪ The cover price of a book is € 24,95, but a library obtains a 40% discount. Shipping costs are € 3 for the first copy and 75 cents for each additional copy. How much do n copies cost?
Write a function final_price(n) which RETURNs the price.
ATTENTION 1: For numbers Python wants a dot, NOT the comma!
ATTENTION 2: If you ordered zero books, how much should you pay?
HINT: 40% of 24,95 can be calculated by multiplying the price by 0.40
>>> p = final_price(10)
>>> print(p)
159.45
>>> p = final_price(0)
>>> print(p)
0
[76]:
def final_price(n):
    if n == 0:
        return 0
    else:
        return n* 24.95*0.6 + 3 +(n-1)*0.75
assert final_price(10) == 159.45
assert final_price(0) == 0
arrival_time¶
✪✪✪ By running slowly you take 8 minutes and 15 seconds per mile, and by running at a moderate pace you take 7 minutes and 12 seconds per mile.
Write a function arrival_time(n,m) which, supposing you start at 6:52, given n miles run at slow pace and m miles at moderate pace, RETURNs the arrival time as a string.
HINT 1: to calculate an integer division, use //
HINT 2: to calculate the remainder of an integer division, use the modulo operator %
>>> arrival_time(2,2)
'7:22'
[77]:
def arrival_time(n,m):
    starting_hours = 6
    starting_minutes = 52
    # total seconds run: 8'15" = 495 s per slow mile, 7'12" = 432 s per moderate mile
    seconds = n * 495 + m * 432
    # whole minutes elapsed (leftover seconds are dropped)
    minutes = seconds // 60
    # add the elapsed minutes to the start time, then split into hours and minutes
    total_minutes = starting_hours * 60 + starting_minutes + minutes
    final_hours = total_minutes // 60
    final_minutes = total_minutes % 60
    # zfill(2) pads single-digit minutes, e.g. 8:05 instead of 8:5
    return str(final_hours) + ":" + str(final_minutes).zfill(2)
assert arrival_time(0,0) == '6:52'
assert arrival_time(2,2) == '7:22'
assert arrival_time(2,5) == '7:44'
assert arrival_time(8,5) == '8:34'
Lambda functions¶
Lambda functions are functions which:
have no name
are defined on one line, typically right where they are needed
have an expression as their body, so you need no return
Let’s create a lambda function which takes a number x and doubles it:
[78]:
lambda x: x*2
[78]:
<function __main__.<lambda>(x)>
As you see, Python created a function object, which gets displayed by Jupyter. Unfortunately, at this point the function object got lost, because that is what happens to any object created by an expression that is not assigned to a variable.
To be able to call the function, it is thus convenient to assign such a function object to a variable, say f:
[79]:
f = lambda x: x*2
[80]:
f
[80]:
<function __main__.<lambda>(x)>
Great, now we have a function we can call as many times as we want:
[81]:
f(5)
[81]:
10
[82]:
f(7)
[82]:
14
So writing
[83]:
def f(x):
return x*2
or
[84]:
f = lambda x: x*2
are essentially equivalent forms, the main difference being that with def we can write functions with bodies on multiple lines. Lambdas may appear limited, so why should we use them? Sometimes they allow for very concise code. For example, imagine you have a list of tuples holding animals and their lifespan:
[85]:
animals = [('dog', 12), ('cat', 14), ('pelican', 30), ('eagle', 25), ('squirrel', 6)]
If you want to sort them by lifespan, you might try the plain .sort method, but it will not work as expected:
[86]:
animals.sort()
[87]:
animals
[87]:
[('cat', 14), ('dog', 12), ('eagle', 25), ('pelican', 30), ('squirrel', 6)]
Clearly, this is not what we wanted: tuples are compared element by element, so the list got sorted by animal name. To get the proper ordering, we need to tell Python that when it considers a tuple for comparison, it should extract the lifespan number. To do so, Python provides the key parameter, to which we must pass a function that takes as argument the list element under consideration (in this case a tuple) and returns a transformation of it (in this case the number at position 1):
[88]:
animals.sort(key=lambda t: t[1])
[89]:
animals
[89]:
[('squirrel', 6), ('dog', 12), ('cat', 14), ('eagle', 25), ('pelican', 30)]
Now we got the ordering we wanted. We could also have written it as:
[90]:
def myf(t):
    return t[1]
animals.sort(key=myf)
animals
[90]:
[('squirrel', 6), ('dog', 12), ('cat', 14), ('eagle', 25), ('pelican', 30)]
but lambdas clearly save some typing.
Notice lambdas can take multiple parameters:
[91]:
mymul = lambda x,y: x * y
mymul(2,5)
[91]:
10
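The same kind of key function is accepted by other built-ins as well. The sketch below reuses the animals list from above. The built-in sorted returns a new list and leaves the original untouched, and reverse=True gives descending order. max also takes a key, and picks the element whose key is largest.

```python
animals = [('dog', 12), ('cat', 14), ('pelican', 30), ('eagle', 25), ('squirrel', 6)]

# sorted returns a NEW list and leaves animals untouched; reverse=True gives descending order
by_life_desc = sorted(animals, key=lambda t: t[1], reverse=True)
print(by_life_desc)   # [('pelican', 30), ('eagle', 25), ('cat', 14), ('dog', 12), ('squirrel', 6)]

# max accepts the same kind of key function
longest = max(animals, key=lambda t: t[1])
print(longest)        # ('pelican', 30)
```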
Exercises: lambdas¶
apply_borders¶
✪ Write a function apply_borders which takes as parameters a function f and a sequence, and RETURNs a tuple holding two elements:
the first element is obtained by applying f to the first element of the sequence
the second element is obtained by applying f to the last element of the sequence
Example:
>>> apply_borders(lambda x: x.upper(), ['the', 'river', 'is', 'very', 'long'])
('THE', 'LONG')
>>> apply_borders(lambda x: x[0], ['the', 'river', 'is', 'very', 'long'])
('t', 'l')
[92]:
# write here
def apply_borders(f, seq):
    return ( f(seq[0]), f(seq[-1]) )
[93]:
print(apply_borders(lambda x: x.upper(), ['the', 'river', 'is', 'very', 'long']))
print(apply_borders(lambda x: x[0], ['the', 'river', 'is', 'very', 'long']))
('THE', 'LONG')
('t', 'l')
process¶
✪✪ Write a lambda expression to be passed as first parameter of the function process defined below, so that a call to process generates a list as shown here:
>>> f = PUT_YOUR_LAMBDA_FUNCTION
>>> process(f, ['d','b','a','c','e','f'], ['q','s','p','t','r','n'])
['An', 'Bp', 'Cq', 'Dr', 'Es', 'Ft']
NOTE: process is already defined, you do not need to change it
[94]:
def process(f, lista, listb):
    orda = list(sorted(lista))
    ordb = list(sorted(listb))
    ret = []
    for i in range(len(lista)):
        ret.append(f(orda[i], ordb[i]))
    return ret
# write here the f = lambda ...
f = lambda x,y: x.upper() + y
[95]:
process(f, ['d','b','a','c','e','f'], ['q','s','p','t','r','n'])
[95]:
['An', 'Bp', 'Cq', 'Dr', 'Es', 'Ft']
Error handling and testing solutions¶
Introduction¶
In this notebook we will try to understand what our program should do when it encounters unforeseen situations, and how to test the code we write. In particular, we will describe the exercise format as proposed in Part A and in Part B (they are different!)
For some strange reason, many people believe that computer programs do not need much error handling or testing. Just to make a simple comparison, would you ever drive a car that did not undergo scrupulous checks? We wouldn’t.
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-exercises
|- errors-and-testing
|- errors-and-testing-exercise.ipynb
|- errors-and-testing-solution.ipynb
WARNING 1: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook
exercises/errors-and-testing/errors-and-testing-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate to the unzipped folder while in Jupyter browser!
Continue reading that notebook and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
Unforeseen situations¶
It is evening, there is a birthday party and they asked you to make a pie. You need the following steps:
take milk
take sugar
take flour
mix
heat in the oven
You take the milk and the sugar, but then you discover there is no flour. It is evening, and no shops are open. Obviously, it makes no sense to proceed to step 4 and mix, so you have to give up on the pie and tell the guest of honor about the problem. You can only hope she/he decides on some alternative.
Translating everything into Python terms, we can ask ourselves whether, when a function finds an unforeseen situation during its execution, it is possible to:
interrupt the execution flow of the program
signal to whoever called the function that a problem has occurred
allow whoever called the function to handle the problem
The answer is yes: you can do it with the mechanism of exceptions (Exception).
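As a preview of the tools introduced below (raise, try and except), here is a minimal sketch of the three points above. The functions take_flour and make_dessert are hypothetical and exist only for illustration:

```python
def take_flour(available, needed=1.0):
    """Hypothetical helper: stops with an exception if there is not enough flour."""
    if available < needed:
        # 1. interrupt the execution flow and 2. signal the problem to the caller
        raise Exception("Don't have enough flour !")
    return needed

def make_dessert(flour):
    try:
        take_flour(flour)
        return "pie"
    except Exception:
        # 3. the caller decides how to handle the problem
        return "ice cream"

print(make_dessert(2.0))  # pie
print(make_dessert(0.3))  # ice cream
```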
make_problematic_pie¶
Let’s see how we can represent the above problem in Python. A basic version might be the following:
[2]:
def make_problematic_pie(milk, sugar, flour):
    """ Suppose you need 1.3 kg for the milk, 0.2kg for the sugar and 1.0kg for the flour
        - takes as parameters the quantities we have in the sideboard
    """
    if milk > 1.3:
        print("take milk")
    else:
        print("Don't have enough milk !")
    if sugar > 0.2:
        print("take sugar")
    else:
        print("Don't have enough sugar!")
    if flour > 1.0:
        print("take flour")
    else:
        print("Don't have enough flour !")
    print("Mix")
    print("Heat")
    print("I made the pie!")

make_problematic_pie(5,1,0.3)  # not enough flour ...
print("Party")
take milk
take sugar
Don't have enough flour !
Mix
Heat
I made the pie!
Party
QUESTION: this above version has a serious problem. Can you spot it ??
ANSWER: the program above is partying even when we do not have enough ingredients !
Check with the return¶
EXERCISE: We could correct the problems of the above pie by adding return commands. Implement the following function.
WARNING: DO NOT move the print("Party") inside the function
The goal of the exercise is to keep it outside, so as to use the value returned by make_pie for deciding whether to party or not.
If you have any doubts on functions with return values, check Chapter 6 of Think Python
[3]:
def make_pie(milk, sugar, flour):
    """ - suppose we need 1.3 kg for milk, 0.2kg for sugar and 1.0kg for flour
        - takes as parameters the quantities we have in the sideboard
        IMPROVE WITH return COMMAND: RETURN True if the pie is doable,
                                            False otherwise
        *OUTSIDE* USE THE VALUE RETURNED TO PARTY OR NOT
    """
    # implement here the function
    #jupman-strip
    if milk > 1.3:
        print("take milk")
        # return True  # NO, it would finish right here
    else:
        print("Don't have enough milk !")
        return False
    if sugar > 0.2:
        print("take sugar")
    else:
        print("Don't have enough sugar !")
        return False
    if flour > 1.0:
        print("take flour")
    else:
        print("Don't have enough flour !")
        return False
    print("Mix")
    print("Heat")
    print("I made the pie !")
    return True
    #/jupman-strip

# now write here the function call, make_pie(5,1,0.3)
# using the result to declare whether it is possible or not to party :-(
#jupman-strip
made_pie = make_pie(5,1,0.3)
if made_pie:
    print("Party")
else:
    print("No party !")
#/jupman-strip
take milk
take sugar
Don't have enough flour !
No party !
Exceptions¶
Real Python - Python Exceptions: an Introduction
Using return we improved the previous function, but a problem remains: the responsibility for understanding whether or not the pie was properly made is given to the caller of the function, who has to take the returned value and decide based on it whether to party or not. A careless programmer might forget to do the check and party even with an ill-formed pie.
So we ask ourselves: is it possible to stop the execution not just of the function, but of the whole program when we find an unforeseen situation?
To improve on our previous attempt, we can use exceptions. To tell Python to interrupt the program execution at a given point, we can insert the instruction raise like this:
raise Exception()
If we want, we can also write a message to help programmers (who could be ourselves …) understand the origin of the problem. In our case it could be a message like this:
raise Exception("Don't have enough flour !")
Note: in professional programs, exception messages are intended for programmers, are verbose, and typically end up hidden in system logs. To final users you should only show short messages which are understandable by a non-technical public. At most, you can add an error code which the user might give to the technician for diagnosing the problem.
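Here is a minimal sketch of this separation. It assumes Python's standard logging module records the verbose details. The logger name "pie_shop", the functions check_flour and run, and the error code "P-01" are all hypothetical and are used only for illustration:

```python
import logging

logger = logging.getLogger("pie_shop")  # hypothetical logger name

def check_flour(flour):
    if flour < 1.0:
        # verbose message, meant for programmers
        raise ValueError(f"Need 1.0 kg of flour, but only {flour} kg found in the sideboard")

def run(flour):
    try:
        check_flour(flour)
        return "Pie ready!"
    except ValueError:
        # message and traceback go to the log, for the technician
        logger.exception("Pie preparation failed")
        # short message plus a (hypothetical) error code, for the final user
        return "Sorry, we could not make the pie (error code P-01)."

print(run(2.0))
print(run(0.3))
```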
EXERCISE: Try to rewrite the function above by substituting the rows containing return with raise Exception():
[4]:
def make_exceptional_pie(milk, sugar, flour):
    """ - suppose we need 1.3 kg for milk, 0.2kg for sugar and 1.0kg for flour
        - takes as parameters the quantities we have in the sideboard
        - if there are missing ingredients, raises Exception
    """
    # implement function
    #jupman-strip
    if milk > 1.3:
        print("take milk")
    else:
        raise Exception("Don't have enough milk !")
    if sugar > 0.2:
        print("take sugar")
    else:
        raise Exception("Don't have enough sugar!")
    if flour > 1.0:
        print("take flour")
    else:
        raise Exception("Don't have enough flour !")
    print("Mix")
    print("Heat")
    print("I made the pie !")
    #/jupman-strip
Once implemented, by writing
make_exceptional_pie(5,1,0.3)
print("Party")
you should see the following (note how “Party” is not printed):
take milk
take sugar
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-10-02c123f44f31> in <module>()
----> 1 make_exceptional_pie(5,1,0.3)
2
3 print("Party")
<ipython-input-9-030239f08ca5> in make_exceptional_pie(milk, sugar, flour)
18 print("take flour")
19 else:
---> 20 raise Exception("Don't have enough flour !")
21 print("Mix")
22 print("Heat")
Exception: Don't have enough flour !
We see the program got interrupted before reaching the mix step (inside the function), and it didn’t even get to party (which is outside the function). Let’s now try calling the function with enough ingredients in the sideboard:
[5]:
make_exceptional_pie(5,1,20)
print("Party")
take milk
take sugar
take flour
Mix
Heat
I made the pie !
Party
Manage exceptions¶
Instead of brutally interrupting the program when problems are spotted, we might want to try some alternative (like going out for ice cream). We could use try except blocks like this:
[6]:
try:
    make_exceptional_pie(5,1,0.3)
    print("Party")
except:
    print("Can't make the pie, what about going out for an ice cream?")
take milk
take sugar
Can't make the pie, what about going out for an ice cream?
Notice that execution skipped print("Party"), no exception was printed, and execution passed to the row right after except.
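Sometimes you want to know which problem occurred, even while recovering from it. With except Exception as e, the name e refers to the exception object, and str(e) gives back the message passed when it was raised. A small sketch follows. The function make_small_pie is hypothetical:

```python
def make_small_pie(flour):
    """Hypothetical function: fails when there is not enough flour."""
    if flour <= 1.0:
        raise Exception("Don't have enough flour !")
    return "pie"

message = None
try:
    make_small_pie(0.3)
    print("Party")
except Exception as e:
    # e is the exception object; str(e) is the message given to Exception(...)
    message = str(e)
    print("Can't make the pie:", message)
```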
Particular exceptions¶
Until now we used a generic Exception, but, if you want, you can use more specific exceptions to better signal the nature of the error. For example, when you implement a function, checking the input values for correctness is very frequent, so Python gives you an exception called ValueError. If you use it instead of Exception, you allow the function caller to intercept only that particular error type.
If the function raises an error which is not intercepted by any except, the program will halt.
[7]:
def make_exceptional_pie_2(milk, sugar, flour):
    """ - suppose we need 1.3 kg for milk, 0.2kg for sugar and 1.0kg for flour
        - takes as parameters the quantities we have in the sideboard
        - if there are missing ingredients, raises ValueError
    """
    if milk > 1.3:
        print("take milk")
    else:
        raise ValueError("Don't have enough milk !")
    if sugar > 0.2:
        print("take sugar")
    else:
        raise ValueError("Don't have enough sugar!")
    if flour > 1.0:
        print("take flour")
    else:
        raise ValueError("Don't have enough flour!")
    print("Mix")
    print("Heat")
    print("I made the pie !")

try:
    make_exceptional_pie_2(5,1,0.3)
    print("Party")
except ValueError:
    print()
    print("There must be a problem with the ingredients!")
    print("Let's try asking neighbors !")
    print("We're lucky, they gave us some flour, let's try again!")
    print("")
    make_exceptional_pie_2(5,1,4)
    print("Party")
except:  # manages all exceptions
    print("Guys, something bad happened, don't know what to do. Better to go out and take an ice-cream !")
take milk
take sugar
There must be a problem with the ingredients!
Let's try asking neighbors !
We're lucky, they gave us some flour, let's try again!
take milk
take sugar
take flour
Mix
Heat
I made the pie !
Party
For more explanations about try except, you can see Real Python - Python Exceptions: an Introduction
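The next sketch illustrates the point about uncaught errors, using two hypothetical functions. make_pie_checked raises TypeError for a non-numeric quantity and ValueError for an insufficient one. try_pie catches only ValueError, so a TypeError escapes to the code that called try_pie:

```python
def make_pie_checked(flour):
    """Hypothetical: TypeError for non-numbers, ValueError for too little flour."""
    if not isinstance(flour, (int, float)):
        raise TypeError("flour must be a number, got: " + repr(flour))
    if flour <= 1.0:
        raise ValueError("Don't have enough flour !")
    return "pie"

def try_pie(flour):
    try:
        return make_pie_checked(flour)
    except ValueError:
        # only ValueError is handled here
        return "ask the neighbors"

print(try_pie(4))    # pie
print(try_pie(0.3))  # ask the neighbors

caught = None
try:
    try_pie("lots")  # TypeError is NOT handled inside try_pie, so it reaches us
except TypeError as e:
    caught = e
print("Unhandled inside try_pie:", caught)
```

Without the outer try, the TypeError would halt the program.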
assert¶
They asked you to develop a program to control a nuclear reactor. The reactor produces a lot of energy, but requires at least 20 meters of water to cool down, and your program needs to regulate the water level. Without enough water, you risk a meltdown. You do not feel exactly up to the job, and start sweating.
Nervously, you write the code. You check and recheck the code - everything looks fine.
On inauguration day, the reactor is turned on. Unexpectedly, the water level goes down to 5 meters, and an uncontrolled chain reaction occurs. Plutonium fireworks follow.
Could we have avoided all of this? We often believe everything is good but then for some reason we find variables with unexpected values. The wrong program described above might have been written like so:
[8]:
# we need water to cool our reactor
water_level = 40 # seems ok
print("water level: ", water_level)
# a lot of code
# a lot of code
# a lot of code
# a lot of code
water_level = 5 # forgot somewhere this bad row !
print("WARNING: water level low! ", water_level)
# a lot of code
# a lot of code
# a lot of code
# a lot of code
# after a lot of code we might not know if there are the proper conditions so that everything works all right
print("turn on nuclear reactor")
water level: 40
WARNING: water level low! 5
turn on nuclear reactor
How could we improve it? Let’s look at the assert command, which must be followed by a boolean condition.
assert True does absolutely nothing:
[9]:
print("before")
assert True
print("after")
before
after
Instead, assert False completely blocks program execution by raising an exception of type AssertionError (note how "after" is not printed):
print("before")
assert False
print("after")
before
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-7-a871fdc9ebee> in <module>()
----> 1 assert False
AssertionError:
To improve the previous program, we might use assert like this:
# we need water to cool our reactor
water_level = 40 # seems ok
print("water level: ", water_level)
# a lot of code
# a lot of code
# a lot of code
# a lot of code
water_level = 5 # forgot somewhere this bad row !
print("WARNING: water level low! ", water_level)
# a lot of code
# a lot of code
# a lot of code
# a lot of code
# after a lot of code we might not know if there are the proper conditions so that
# everything works all right, so before doing critical things, it is always a good idea
# to perform a check! If an assert fails (that is, the boolean expression is False),
# the execution suddenly stops
assert water_level >= 20
print("turn on nuclear reactor")
water level: 40
WARNING: water level low! 5
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-3-d553a90d4f64> in <module>
31 # the execution suddenly stops
32
---> 33 assert water_level >= 20
34
35 print("turn on nuclear reactor")
AssertionError:
When to use assert?¶
The case above is deliberately exaggerated, but shows how one extra check can sometimes prevent disasters.
Asserts are a quick way to do checks, so much so that Python even allows ignoring them during execution to improve performance (by calling python with the -O parameter, like in python -O my_file.py).
For the same reason, a check that must never be skipped should not rely on assert. If performance is not a concern (like in the reactor above), it’s better to rewrite the program using an if and explicitly raising an Exception:
# we need water to cool our reactor
water_level = 40 # seems ok
print("water level: ", water_level)
# a lot of code
# a lot of code
# a lot of code
# a lot of code
water_level = 5 # forgot somewhere this bad row !
print("WARNING: water level low! ", water_level)
# a lot of code
# a lot of code
# a lot of code
# a lot of code
# after a lot of code we might not know if there are the proper conditions so
# that everything works all right. So before doing critical things, it is always
# a good idea to perform a check !
if water_level < 20:
    raise Exception("Water level too low !")  # execution stops here
print("turn on nuclear reactor")
water level: 40
WARNING: water level low! 5
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-30-4840536c3388> in <module>
30
31 if water_level < 20:
---> 32 raise Exception("Water level too low !") # execution stops here
33
34 print("turn on nuclear reactor")
Exception: Water level too low !
Note how the reactor was not turned on.
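To see concretely why a critical check should not rely on assert, here is a small sketch. It assumes Python 3.7+, which is needed for the capture_output and text parameters. It uses the standard subprocess module to run two tiny versions of the reactor check in a separate Python process, with and without -O:

```python
import subprocess
import sys

check_with_assert = "assert 5 >= 20, 'Water level too low !'\nprint('turn on nuclear reactor')"
check_with_if = "if 5 < 20:\n    raise Exception('Water level too low !')\nprint('turn on nuclear reactor')"

def run_snippet(code, *flags):
    """Run code in a new process with the same interpreter, plus optional flags like -O."""
    return subprocess.run([sys.executable, *flags, "-c", code], capture_output=True, text=True)

# (assuming the PYTHONOPTIMIZE environment variable is not set)
normal = run_snippet(check_with_assert)           # assert is executed -> program stops
optimized = run_snippet(check_with_assert, "-O")  # assert is removed -> reactor turns on!
explicit = run_snippet(check_with_if, "-O")       # if + raise still works under -O

for name, proc in [("normal", normal), ("optimized", optimized), ("explicit", explicit)]:
    print(name, proc.returncode, repr(proc.stdout))
```

A nonzero return code means an uncaught exception stopped the program. Only the assert version lets the reactor start when -O is used.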
Testing¶
If it seems to work, does it actually work? Probably not.
The devil is in the details, especially for complex algorithms.
We will do a crash course on testing in Python.
WARNING: Bad software can cause losses of millions of euros or even kill people. Suggested reading: Software Horror Stories
Where Is Your Software?¶
As a data scientist, you are likely to end up with code which is algorithmically complex, but maybe not too big in size. Either way, once the red line is crossed, you should start testing properly.
In a typical scenario, you are a junior programmer and your senior colleague asks you to write a function to perform some task, giving only an informal description:
[10]:
def my_sum(x,y):
    """ RETURN the sum of x and y
    """
    raise Exception("TODO IMPLEMENT ME!")
Even better, your colleague might provide you with some automated tests you might run to check your function meets his/her expectations. If you are smart, you will even write tests for your own functions to make sure every little piece you add to your software is a solid block you can build upon.
Depending on the part of the course you are following, we will review one of two kinds of tests: testing with asserts (Part A) or testing with unittest (Part B).
Testing with asserts¶
NOTE: Testing with asserts is only done in PART A of this course
We can use assert to quickly test functions, and verify they behave like they should.
For example, given this function:
[11]:
def my_sum(x, y):
    s = x + y
    return s
We expect that my_sum(2,3) gives 5. We can write this expectation in Python by using an assert:
[12]:
assert my_sum(2,3) == 5
If my_sum is correctly implemented:
my_sum(2,3) will give 5
the boolean expression my_sum(2,3) == 5 will give True
assert True will be executed without producing any result, and the program execution will continue.
Otherwise, if my_sum is NOT correctly implemented, like in this case:
def my_sum(x,y):
    return 666
my_sum(2,3) will produce the number 666
the boolean expression my_sum(2,3) == 5 will give False
assert False will interrupt the program execution, raising an exception of type AssertionError
Part A exercise structure¶
Exercises in Part A will be often structured in the following format:
def my_sum(x,y):
    """ RETURN the sum of numbers x and y
    """
    raise Exception("TODO IMPLEMENT ME!")
assert my_sum(2,3) == 5
assert my_sum(3,1) == 4
assert my_sum(-2,5) == 3
If you attempt to execute the cell, you will see this error:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-16-5f5c8512d42a> in <module>()
6
7
----> 8 assert my_sum(2,3) == 5
9 assert my_sum(3,1) == 4
10 assert my_sum(-2,5) == 3
<ipython-input-16-5f5c8512d42a> in my_sum(x, y)
3 """ RETURN the sum of numbers x and y
4 """
----> 5 raise Exception("TODO IMPLEMENT ME!")
6
7
Exception: TODO IMPLEMENT ME!
To fix it, you will need to:
substitute the row raise Exception("TODO IMPLEMENT ME!") with the body of the function
execute the cell
If cell execution doesn’t result in raised exceptions, perfect! It means your function does what it is expected to do (asserts which succeed do not produce any output).
Otherwise, if you see some AssertionError, you probably did something wrong.
NOTE: The raise Exception("TODO IMPLEMENT ME!") is put there to remind you that the function has a big problem, that is, it doesn’t have any code!!! In long programs, you may know you need a function but not yet know what code to put in its body. In that case, do not put commands that do nothing in the body, like print(), pass or return None. It is WAY BETTER to raise an exception. Then, if by chance the program reaches the function, execution stops immediately and the user is told the nature and position of the problem. When they automatically generate function skeletons, many programmer editors put an exception like this inside them.
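Python also has a built-in exception meant for exactly this situation: NotImplementedError. It is a subclass of Exception, so code that catches Exception still catches it. Choosing it over a plain Exception is a matter of style; this course uses Exception. Here is a sketch with a hypothetical, not-yet-implemented function my_diff:

```python
def my_diff(x, y):
    """ RETURN the difference x - y  (hypothetical example, not implemented yet)
    """
    raise NotImplementedError("TODO IMPLEMENT ME!")

error = None
try:
    my_diff(5, 3)
except NotImplementedError as e:
    error = e

print(type(error).__name__, error)  # NotImplementedError TODO IMPLEMENT ME!
```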
Let’s try to deliberately write a wrong function body, which always returns 5, independently of the x and y given in input:
def my_sum(x,y):
    """ RETURN the sum of numbers x and y
    """
    return 5
assert my_sum(2,3) == 5
assert my_sum(3,1) == 4
assert my_sum(-2,5) == 3
In this case the first assertion succeeds, so execution simply passes to the next row, which contains another assert. We expect that my_sum(3,1) gives 4, but our ill-written function returns 5, so this assert fails. Note how the execution is interrupted at the second assert:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-19-e5091c194d3c> in <module>()
6
7 assert my_sum(2,3) == 5
----> 8 assert my_sum(3,1) == 4
9 assert my_sum(-2,5) == 3
AssertionError:
If we implement the function correctly and execute the cell, we will see no output: this means the function successfully passed the tests, and we can conclude that it is correct with respect to the tests:
ATTENTION: always remember that these kinds of tests are never exhaustive! If tests pass, it is only an indication the function might be correct, but it is never a certainty!
[13]:
def my_sum(x,y):
    """ RETURN the sum of numbers x and y
    """
    return x + y
assert my_sum(2,3) == 5
assert my_sum(3,1) == 4
assert my_sum(-2,5) == 3
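To make the ATTENTION note concrete, here is a deliberately wrong, hypothetical tricky_sum. It passes exactly the same three tests but is broken for inputs the tests never try:

```python
def tricky_sum(x, y):
    """ A deliberately WRONG sum that still passes the three tests above """
    if x < 10:
        return x + y
    return 0  # wrong for every x >= 10, but no test checks that case

# the same three tests all pass...
assert tricky_sum(2,3) == 5
assert tricky_sum(3,1) == 4
assert tricky_sum(-2,5) == 3

# ...yet the function is clearly broken
print(tricky_sum(20, 1))  # 0 instead of 21
```

Adding a test with a larger x, such as assert tricky_sum(20,1) == 21, would expose the bug.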
EXERCISE: Try to write the body of the function my_mul:
substitute raise Exception("TODO IMPLEMENT ME") with return x * y and execute the cell. If you have written it correctly, nothing should happen. In this case, congratulations! The code you have written is correct with respect to the tests!
Then try substituting it with return 10 instead, and see what happens.
[14]:
def my_mul(x,y):
    """ RETURN the multiplication of numbers x and y
    """
    #jupman-raise
    return x * y
    #/jupman-raise
assert my_mul(2,5) == 10
assert my_mul(0,2) == 0
assert my_mul(3,2) == 6
even_numbers example¶
Let’s see a slightly more complex function:
[15]:
def even_numbers(n):
    """
    Return a list of the first n even numbers

    Zero is considered to be the first even number.

    >>> even_numbers(5)
    [0,2,4,6,8]
    """
    raise Exception("TODO IMPLEMENT ME!")
In this case, if you run the function as it is, you are reminded to implement it:
>>> even_numbers(5)
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-2-d2cbc915c576> in <module>()
----> 1 even_numbers(5)
<ipython-input-1-a20a4ea4b42a> in even_numbers(n)
8 [0,2,4,6,8]
9 """
---> 10 raise Exception("TODO IMPLEMENT ME!")
Exception: TODO IMPLEMENT ME!
Why? The instruction raise Exception("TODO IMPLEMENT ME!") tells Python to immediately stop execution and signal an error to the caller of the function even_numbers. If there were commands right after raise Exception("TODO IMPLEMENT ME!"), they would not be executed. Here, we are calling the function directly from the prompt, and we didn’t tell Python how to handle the Exception. So Python just stopped and showed the error message given as parameter to the Exception.
Take the time to read the function text carefully!
Always read the function text very carefully and ask yourself questions! What is the supposed input? What should the output be? Is there any output to return at all, or should you instead modify a passed parameter in place (for example, when you sort a list)? Are there any edge cases, e.g. what happens for n=0? What about n < 0?
Let’s code a possible solution. As often happens, the first version may be buggy. Here, for example purposes, we intentionally introduce a bug:
[16]:
def even_numbers(n):
    """
    Return a list of the first n even numbers

    Zero is considered to be the first even number.

    >>> even_numbers(5)
    [0,2,4,6,8]
    """
    r = [2 * x for x in range(n)]
    r[n // 2] = 3   # <-- evil bug, puts number '3' in the middle, and 3 is not even ..
    return r
Typically the first test we do is printing the output and doing some ‘visual inspection’ of the result. In this case we find many numbers are correct, but we might miss errors such as the wrong 3 in the middle:
[17]:
print(even_numbers(5))
[0, 2, 3, 6, 8]
Furthermore, if we enter commands at the prompt, each time we fix something in the code we need to enter the commands again to check everything is ok. This is inefficient, boring, and error-prone.
Let’s add assertions¶
To go beyond the dumb “visual inspection” testing, it’s better to write some extra code that lets Python check for us whether the function actually returns what we expect, and raise an error otherwise. We can do so with the assert command, which verifies whether its argument is True. If it is not, it raises an AssertionError, immediately stopping execution.
Here we check that the result of even_numbers(5) is actually the list of even numbers [0,2,4,6,8] we expect:
assert even_numbers(5) == [0,2,4,6,8]
Since our code is faulty, even_numbers returns the wrong list [0,2,3,6,8], which is different from [0,2,4,6,8], so the assertion fails, showing AssertionError:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-21-d4198f229404> in <module>()
----> 1 assert even_numbers(5) == [0,2,4,6,8]
AssertionError:
We got some output, but we would like it to be more informative. To do so, we may add a message, separated by a comma:
assert even_numbers(5) == [0,2,4,6,8], "even_numbers is not working !!"
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-18-8544fcd1b7c8> in <module>()
----> 1 assert even_numbers(5) == [0,2,4,6,8], "even_numbers is not working !!"
AssertionError: even_numbers is not working !!
So whenever we modify the code to fix bugs, we can just rerun the assert commands and get quick feedback about possible errors.
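The message after the comma can be any expression. For example, an f-string can include the actual and expected values, so the error tells you what went wrong. The message is only evaluated when the assertion fails. In the sketch below, buggy_even_numbers is a hypothetical copy of the buggy function above. The try/except is there only so the example can capture the message instead of stopping:

```python
def buggy_even_numbers(n):
    r = [2 * x for x in range(n)]
    r[n // 2] = 3   # the same evil bug as above
    return r

expected = [0, 2, 4, 6, 8]
result = buggy_even_numbers(5)

failure = None
try:
    assert result == expected, f"buggy_even_numbers(5) returned {result}, expected {expected}"
except AssertionError as e:
    failure = str(e)

print(failure)  # buggy_even_numbers(5) returned [0, 2, 3, 6, 8], expected [0, 2, 4, 6, 8]
```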
Error kinds¶
As a fact of life, errors happen. Sometimes your program may have inconsistent data, like a wrong parameter type passed to a function (e.g. a string instead of an integer). A good principle to follow in these cases is to have the program detect weird situations, and stop as early as such a situation is found. For example, in the Therac 25 case, if you detect excessive radiation, showing a warning sign is not enough; it’s better to stop. Note that stopping might not always be the desirable solution: if a pigeon enters one airplane engine, you don’t want to stop all the other engines. Checking that function parameters are correct is called precondition checking.
There are roughly two kinds of errors: an external user misusing your program, and just plain wrong code. Let’s analyze both:
Error kind a) An external user misuses your program.¶
You can assume that whoever uses your software, final users or other programmers, will try their very best to wreck your precious code by passing all sorts of nonsense to functions. Anything can come in: strings instead of numbers, empty arrays, None objects… In this case you should signal to the user that they made a mistake. The crudest signal you can give is raising an Exception with raise Exception("Some error occurred"), which will stop the program and print the stack trace in the console. Maybe final users won’t understand a stack trace, but at least programmers will hopefully get a clue about what is happening.
In these cases you can raise an appropriate Exception, like TypeError for wrong types and ValueError for arguments of the right type but with an inappropriate value. Other basic exceptions can be found in the Python documentation. Notice you can also define your own, if needed (we won’t consider custom exceptions in this course).
NOTE: Many times, you can consider yourself the ‘careless external user’ to guard against.
Let’s enrich the function with some appropriate type checking.
Note that for checking input types, you can use the function type():
[18]:
type(3)
[18]:
int
[19]:
type("ciao")
[19]:
str
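Besides type(), Python offers the built-in isinstance(), which also accepts subclasses. This matters with booleans. bool is a subclass of int, so isinstance(True, int) is True, while type(True) is int is False. The right check depends on whether you want to accept True/False as integers; the even_numbers code that follows uses type(). A small comparison:

```python
results = {
    "type(True) is int": type(True) is int,                          # False: the type of True is bool
    "isinstance(True, int)": isinstance(True, int),                  # True: bool is a subclass of int
    "isinstance(3, int)": isinstance(3, int),                        # True
    "isinstance(3.0, (int, float))": isinstance(3.0, (int, float)),  # True: a tuple means "any of these types"
}

for expr, value in results.items():
    print(expr, "->", value)
```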
Let’s add the code for checking the even_numbers example:
[20]:
def even_numbers(n):
    """
    Return a list of the first n even numbers

    Zero is considered to be the first even number.

    >>> even_numbers(5)
    [0,2,4,6,8]
    """
    if type(n) is not int:
        raise TypeError("Passed a non integer number: " + str(n))

    if n < 0:
        raise ValueError("Passed a negative number: " + str(n))

    r = [2 * x for x in range(n)]
    return r
Let’s pass a wrong type and see what happens:
>>> even_numbers("ciao")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-a908b20f00c4> in <module>()
----> 1 even_numbers("ciao")
<ipython-input-13-b0b3a85f2b2a> in even_numbers(n)
9 """
10 if type(n) is not int:
---> 11 raise TypeError("Passed a non integer number: " + str(n))
12
13 if n < 0:
TypeError: Passed a non integer number: ciao
Now let’s try to pass a negative number: it should immediately stop with a meaningful message:
>>> even_numbers(-5)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-3f648fdf6de7> in <module>()
----> 1 even_numbers(-5)
<ipython-input-13-b0b3a85f2b2a> in even_numbers(n)
12
13 if n < 0:
---> 14 raise ValueError("Passed a negative number: " + str(n))
15
16 r = [2 * x for x in range(n)]
ValueError: Passed a negative number: -5
Now, even if you ship your code to careless users, as soon as they make a mistake they will be properly notified.
Error kind b): Your code is just plain wrong¶
In this case, it’s 100% your fault, and this sort of bug should never pop up in production. For example, your code internally passes wrong stuff, like strings instead of integers, or wrong ranges (typically indices outside list bounds). Suppose you have an internal function nobody else should call directly, and you suspect it receives wrong parameters or at some point holds inconsistent data. To quickly spot the error, you could add an assertion:
[21]:
def even_numbers(n):
    """
    Return a list of the first n even numbers

    Zero is considered to be the first even number.

    >>> even_numbers(5)
    [0,2,4,6,8]
    """
    assert type(n) is int, "type of n is not correct: " + str(type(n))
    assert n >= 0, "Found negative n: " + str(n)

    r = [2 * x for x in range(n)]
    return r
As before, the function will stop as soon as we call it with wrong parameters. The big difference is that this time we are assuming even_numbers is just for personal use and nobody except us should call it directly.
Assertions consume CPU time. So IF we care about performance AND we are confident our program behaves correctly, we can have Python skip them by running the interpreter with the -O flag. For more info, see the Python wiki
EXERCISE: try to call the latest definition of even_numbers with wrong parameters, and see what happens.
NOTE: here we are using the correct definition of even_numbers, not the buggy one with the 3 in the middle of the returned list!
Testing with Unittest¶
NOTE: Testing with Unittest is only done in PART B of this course
Is there anything better than assert for testing? assert can be a quick way to check, but it doesn’t tell us exactly which number is wrong in the list returned by even_numbers(5). Luckily, Python offers a better option: a complete testing framework called unittest. We will use unittest because it is the standard one, but for other projects you might consider more advanced ones like pytest.
So let’s give unittest a try. Suppose you have a file called file_test.py like this:
[22]:
import unittest

def even_numbers(n):
    """
    Return a list of the first n even numbers

    Zero is considered to be the first even number.

    >>> even_numbers(5)
    [0,2,4,6,8]
    """
    r = [2 * x for x in range(n)]
    r[n // 2] = 3   # <-- evil bug, puts number '3' in the middle
    return r

class MyTest(unittest.TestCase):

    def test_long_list(self):
        self.assertEqual(even_numbers(5),[0,2,4,6,8])
We won’t explain what class means (for classes see the book chapter). The important thing to notice is the method definition:
def test_long_list(self):
    self.assertEqual(even_numbers(5),[0,2,4,6,8])
In particular:
the method is declared like a function, and its name begins with the word 'test_'
the method takes self as parameter
self.assertEqual(even_numbers(5),[0,2,4,6,8]) executes the assertion. Other assertions could be self.assertTrue(some_condition) or self.assertFalse(some_condition)
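unittest can also check that a function raises the expected exception, with self.assertRaises. This covers the TypeError/ValueError checks from the "Error kinds" section. Here is a sketch using checked_even_numbers, a hypothetical copy of that checked version, so that the buggy even_numbers above stays untouched:

```python
import unittest

def checked_even_numbers(n):
    """ Hypothetical copy of the even_numbers version with TypeError/ValueError checks """
    if type(n) is not int:
        raise TypeError("Passed a non integer number: " + str(n))
    if n < 0:
        raise ValueError("Passed a negative number: " + str(n))
    return [2 * x for x in range(n)]

class ErrorsTest(unittest.TestCase):

    def test_negative(self):
        # context manager form: passes only if the block raises ValueError
        with self.assertRaises(ValueError):
            checked_even_numbers(-5)

    def test_wrong_type(self):
        # call form: exception class, then the function, then its arguments
        self.assertRaises(TypeError, checked_even_numbers, "ciao")

    def test_zero(self):
        # edge case: no even numbers requested
        self.assertEqual(checked_even_numbers(0), [])
```

If saved in a file like file_test.py, these tests run with the same command used for MyTest.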
Running tests¶
To run the tests, enter the following command in the terminal:
python -m unittest file_test
!!!!! WARNING: In the call above, DON’T append the extension .py to file_test !!!!!
!!!!! WARNING: Still, on the hard disk the file MUST be named with .py at the end, like file_test.py !!!!!
You should see an output like the following:
[23]:
jupman.show_run(MyTest)
F
======================================================================
FAIL: test_long_list (__main__.MyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-22-397caec8a66f>", line 19, in test_long_list
self.assertEqual(even_numbers(5),[0,2,4,6,8])
AssertionError: Lists differ: [0, 2, 3, 6, 8] != [0, 2, 4, 6, 8]
First differing element 2:
3
4
- [0, 2, 3, 6, 8]
? ^
+ [0, 2, 4, 6, 8]
? ^
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (failures=1)
Now you can see a nice display of where the error is, exactly in the middle of the list!
When tests don’t run¶
When python -m unittest does not work and you keep seeing absurd errors like Python not finding a module, and you are getting desperate (especially because unittest is included in Python by default, so there is no need to install it!), try to put the following code at the very end of the file you are editing:
if __name__ == '__main__':
    unittest.main()
Then run your file with just
python file_test.py
In this case it should REALLY work. If it still doesn’t, call the Ghostbusters. Or, better, the IndentationBusters: you probably have tabs mixed with spaces mixed with bad, bad luck.
Adding tests¶
How can we add (good) tests? Since the best ones are usually short, it is better to start with small boundary cases, for example n=1, which according to the function documentation should produce a list containing just zero:
[24]:
class MyTest(unittest.TestCase):

    def test_one_element(self):
        self.assertEqual(even_numbers(1),[0])

    def test_long_list(self):
        self.assertEqual(even_numbers(5),[0,2,4,6,8])
Let’s run the command again:
python -m unittest file_test
[25]:
jupman.show_run(MyTest)
FF
======================================================================
FAIL: test_long_list (__main__.MyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-24-306d9f1c7777>", line 7, in test_long_list
self.assertEqual(even_numbers(5),[0,2,4,6,8])
AssertionError: Lists differ: [0, 2, 3, 6, 8] != [0, 2, 4, 6, 8]
First differing element 2:
3
4
- [0, 2, 3, 6, 8]
? ^
+ [0, 2, 4, 6, 8]
? ^
======================================================================
FAIL: test_one_element (__main__.MyTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-24-306d9f1c7777>", line 4, in test_one_element
self.assertEqual(even_numbers(1),[0])
AssertionError: Lists differ: [3] != [0]
First differing element 0:
3
0
- [3]
+ [0]
----------------------------------------------------------------------
Ran 2 tests in 0.002s
FAILED (failures=2)
From the tests we can now see there is clearly something wrong with the number 3 that keeps popping up, making both tests fail. You can immediately see how many tests failed by looking at the two letters FF at the top of the output (each F is a failed test, each . a passed one). Let’s fix the code by removing the buggy line:
[26]:
def even_numbers(n):
    """
    Return a list of the first n even numbers
    Zero is considered to be the first even number.
    >>> even_numbers(5)
    [0,2,4,6,8]
    """
    r = [2 * x for x in range(n)]
    # NOW WE COMMENTED THE BUGGY LINE  r[n // 2] = 3   # <-- evil bug, puts number '3' in the middle
    return r
And run the command yet again:
python -m unittest file_test
[27]:
jupman.show_run(MyTest)
..
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK
Wonderful, both tests passed and we got rid of the bug.
WARNING: DON’T DUPLICATE TEST CLASS NAMES AND/OR METHODS!
In the following, you will be asked to add tests. Just add NEW methods with NEW names to the EXISTING class MyTest!
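Why is duplication dangerous? A class body runs like normal code, so a second def with the same name silently replaces the first one, and no error is reported. A small sketch (the test names are made up for illustration) shows that only one test survives:

```python
import unittest

class MyTest(unittest.TestCase):

    def test_one(self):
        self.assertEqual(1 + 1, 2)

    # Same name again: this second def silently REPLACES the first one,
    # so the test above will never be run
    def test_one(self):
        self.assertEqual(2 * 2, 4)

suite = unittest.TestLoader().loadTestsFromTestCase(MyTest)
print(suite.countTestCases())   # 1, not 2
```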
Exercise: boundary cases¶
Think about other boundary cases, and try to add corresponding tests.
- Can we ever have an empty list?
- Can n be equal to zero? Add a test inside the MyTest class for its expected result.
- Can n be negative? In this case the function text tells us nothing about the expected behaviour, so we might choose it now: either the function raises an error, or it gives back something, e.g. a list of even negative numbers. Try to modify even_numbers and add a corresponding test inside the MyTest class expecting even negative numbers (starting from zero).
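One possible sketch for these boundary cases. The behaviour for negative n is an assumption we choose here, as the text suggests (first -n even numbers going down from zero); for n=0 the list is simply empty, because range(0) produces no numbers:

```python
import unittest

def even_numbers(n):
    """ Return a list of the first n even numbers (zero is the first one).
        ASSUMPTION: for negative n, return the first -n even numbers going down from zero
    """
    if n >= 0:
        return [2 * x for x in range(n)]
    return [-2 * x for x in range(-n)]

class MyTest(unittest.TestCase):

    def test_zero(self):
        self.assertEqual(even_numbers(0), [])

    def test_negative(self):
        self.assertEqual(even_numbers(-3), [0, -2, -4])

# run the tests from inside the script instead of the command line
suite = unittest.TestLoader().loadTestsFromTestCase(MyTest)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())   # True
```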
Exercise: expecting assertions¶
What if the user passes us a float like 3.5 instead of an integer? If you try to run even_numbers(3.5), you will discover that the current implementation only fails indirectly, with a TypeError raised by range itself ('float' object cannot be interpreted as an integer). We might decide to be picky and explicitly reject inputs other than integers. Try to modify even_numbers so that when the input is not of type int, it raises TypeError (to check for the type, you can write type(n) == int).
To test for it, add the following test inside the MyTest class:
def test_type(self):
    with self.assertRaises(TypeError):
        even_numbers(3.5)
The with block tells Python to expect the code inside it to raise the exception TypeError:
- if even_numbers(3.5) actually raises a TypeError exception, nothing happens (the test passes)
- if even_numbers(3.5) does not raise a TypeError exception, the with block raises AssertionError (the test fails)
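A minimal sketch of the first part of this exercise, assuming we follow the suggested type(n) == int check (written here as type(n) != int); it is only one possible solution, and the 4.0 case below is left to you. The tests are run from inside the script through TestLoader and TestResult instead of the command line:

```python
import unittest

def even_numbers(n):
    """ Return a list of the first n even numbers (zero is the first one).
        Raises TypeError if n is not an int.
    """
    if type(n) != int:
        raise TypeError("Expected an int, got %s instead" % type(n).__name__)
    return [2 * x for x in range(n)]

class MyTest(unittest.TestCase):

    def test_type(self):
        with self.assertRaises(TypeError):
            even_numbers(3.5)

    def test_long_list(self):
        self.assertEqual(even_numbers(5), [0, 2, 4, 6, 8])

suite = unittest.TestLoader().loadTestsFromTestCase(MyTest)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())   # True
```

Note that type(n) == int is stricter than isinstance(n, int): the latter would also accept True and False, because bool is a subclass of int.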
After you have completed the previous task, consider the case when the input is the float 4.0: here it might make sense to still accept it, so modify even_numbers accordingly and write a test for it.
Exercise: good tests¶
What difference is there between the following two test classes? Which one is better for testing?
class MyTest(unittest.TestCase):

    def test_one_element(self):
        self.assertEqual(even_numbers(1),[0])

    def test_long_list(self):
        self.assertEqual(even_numbers(5),[0,2,4,6,8])
and
class MyTest(unittest.TestCase):

    def test_stuff(self):
        self.assertEqual(even_numbers(1),[0])
        self.assertEqual(even_numbers(5),[0,2,4,6,8])
Running unittests in Visual Studio Code¶
You can run and debug tests in Visual Studio Code, which is very handy. First, you need to set it up.
- Hit Control-Shift-P (on Mac: Command-Shift-P) and type Python: Configure Tests
- Select unittest
- Select . root directory (we assume tests are in the folder that you’ve opened)
- Select *Python files containing the word 'test'
Hopefully, in the currently opened test file, new labels should appear above the class and the test methods. Try to click on them.
In the bottom bar, you should see a recap of the tests that were run.
TROUBLESHOOTING
If you encounter problems running tests and have Anaconda, sometimes an easy solution can be just closing Visual Studio Code and running it from the Anaconda Navigator. You can also try to update it.
Running tests from the console does not work:
- remember to SAVE the files before executing tests: in Windows, a file appears as not saved when its filename in the tab is written in italics; on Linux, you might see a dot to the right of the filename
Run Test label does not show up in code:
- if you see red squiggles in the code, most probably the syntax is not correct and thus no test will get discovered! If this is the case, fix the syntax error, SAVE, and then tell Visual Studio Code to discover tests.
- you might also try Right click -> Run Current Test File.
- try selecting another testing framework such as pytest, which is also capable of discovering and executing unittest tests.
- if you are really out of luck with the editor, there is always the option of running tests from the console.
WARNING: spend time also with the console !!!!
During the exam, testing in Visual Studio Code might not work, so please be prepared to use the console.
Functional programming¶
In functional programming, functions behave like mathematical ones: they always take some parameters and return new data without ever changing the input. They say functional programming is easier to test. Why?
Immutable data structures: all data structures are (or are meant to be) immutable, so no code can ever tweak your data, and other developers just cannot (should not) be able to inadvertently change it.
Simpler parallel computing: the point above is particularly important in parallel computation, where the system can schedule thread executions differently each time you run the program. When you have multiple threads, it can be very, very hard to reproduce a bug where a thread wrongly changes data which is supposed to be exclusively managed by another one, as the program might fail in one run and succeed in another just because the system scheduled the code execution differently! Functional programming frameworks like Spark solve these problems very nicely.
Easier to reason about code: it is much easier to reason about functions, as we can use standard equational reasoning on inputs/outputs, as traditionally done in algebra. To understand what we’re talking about, you can see these slides: Visual functional programming (we will talk more about it in class)
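To make the idea concrete, here is a small sketch contrasting a pure function with one that changes its input (both function names are hypothetical). The pure one can be tested just by comparing inputs and outputs, because calling it never affects anything else:

```python
def add_item_impure(lst, x):
    # Impure: modifies the list passed by the caller
    lst.append(x)
    return lst

def add_item_pure(lst, x):
    # Pure: lst + [x] builds a NEW list, the input is left untouched
    return lst + [x]

data = [1, 2]
new_data = add_item_pure(data, 3)
print(data, new_data)    # [1, 2] [1, 2, 3]

add_item_impure(data, 3)
print(data)              # [1, 2, 3]  <- the caller's list has changed!
```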
Matrices: list of lists solutions¶
Introduction¶
Python natively does not provide easy and efficient ways to manipulate matrices. To do so, you would need an external library called numpy, which will be seen later in the course. For now we will limit ourselves to using matrices as lists of lists, because:
- lists are pervasive in Python, so you will probably encounter matrices expressed as lists of lists anyway
- you get an idea of how to construct a nested data structure
- we can discuss memory references and copies along the way
- even if numpy's internal representation is different, it prints matrices as if they were lists of lists
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-exercises
     |- matrix-lists
         |- matrix-list-exercise.ipynb
         |- matrix-list-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/matrices-lists/matrices-lists-exercise.ipynb
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
- to execute Python code inside a Jupyter cell, press Control + Enter
- to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
- to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
- if the notebooks look stuck, try to select Kernel -> Restart
Overview¶
So let’s see these lists of lists. For example, we can consider the following a matrix with 3 rows and 2 columns, or in short a 3x2 matrix:
[2]:
m = [
    ['a','b'],
    ['c','d'],
    ['a','e']
]
For convenience, we assume as input to our functions there won’t be matrices with no rows, nor rows with no columns.
Going back to the example, in practice we have a big external list:
m = [
]
and each of its elements is another list which represents a row:
m = [
    ['a','b'],
    ['c','d'],
    ['a','e']
]
So, to access the whole first row ['a','b'], we would simply access the element at index 0 of the external list m:
[3]:
m[0]
[3]:
['a', 'b']
To access the whole second row ['c','d'], we would access the element at index 1 of the external list m:
[4]:
m[1]
[4]:
['c', 'd']
To access the whole third row ['a','e'], we would access the element at index 2 of the external list m:
[5]:
m[2]
[5]:
['a', 'e']
To access the first element 'a' of the first row ['a','b'], we would add another subscript operator with index 0:
[6]:
m[0][0]
[6]:
'a'
To access the second element 'b' of the first row ['a','b'], we would instead use index 1:
[7]:
m[0][1]
[7]:
'b'
WARNING: When a matrix is a list of lists, you can only access values with the notation m[i][j], NOT with m[i,j] !!
[8]:
# write here the wrong notation m[0,0] and see which error you get:
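For reference, this is roughly what happens: inside square brackets, 0,0 is read as the tuple (0, 0), and lists only accept integers or slices as indexes, so Python raises a TypeError (the exact message may vary between Python versions):

```python
m = [
    ['a','b'],
    ['c','d'],
    ['a','e']
]

try:
    m[0,0]          # 0,0 builds the tuple (0, 0), which is not a valid list index
except TypeError as e:
    print("TypeError:", e)

print(m[0][0])      # correct notation: first the row, then the column -> a
```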
Exercises¶
Now implement the following functions.
REMEMBER: if the cell is executed and nothing happens, it is because all the assert tests have worked! In such a case you probably wrote correct code, but be careful: these kinds of tests are never exhaustive, so you could still have made some error.
COMMANDMENT 4: You shall never ever reassign function parameters
def myfun(i, s, L, D, C):

    # You shall not do any of such evil, no matter what the type of the parameter is:
    i = 666            # basic types (int, float, ...)
    s = "666"          # strings
    L = [666]          # containers
    D = {"evil":666}   # dictionaries

    # For the sole case of composite parameters like lists or dictionaries,
    # you can write stuff like this IF AND ONLY IF the function specification
    # requires you to modify the parameter internal elements (e.g. sorting a list
    # or changing a dictionary field):
    L[4] = 2               # list
    D["my field"] = 5      # dictionary
    C.my_field = 7         # object (instance of a class)
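Besides being confusing, reassigning a parameter does not do what beginners often expect: it only rebinds the local name, while changing the internal elements affects the very same object the caller passed. A short sketch (reassign and mutate are hypothetical names):

```python
def reassign(L):
    L = [666]       # rebinds the LOCAL name L only: the caller will never see this

def mutate(L):
    L[0] = 666      # changes the list object shared with the caller

data = [1, 2, 3]
reassign(data)
print(data)         # [1, 2, 3]

mutate(data)
print(data)         # [666, 2, 3]
```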
COMMANDMENT 7: You shall use the return command only if you see return written in the function description!
If there is no return in the function description, the function is intended to return None. In this case you don’t even need to write return None, as Python will do it implicitly for you.
Matrix dimensions¶
✪ EXERCISE: For getting matrix dimensions, we can use normal list operations. Which ones? You can assume the matrix is well formed (all rows have equal length) and has at least one row and at least one column
[9]:
m = [
    ['a','b'],
    ['c','d'],
    ['a','e']
]
[10]:
# write here code for printing rows and columns

# the outer list is a list of rows, so to count them we just use len(m)
print("rows")
print(len(m))

# if we assume the matrix is well formed and has at least one row and column, we can directly check the length
# of the first row
print("columns")
print(len(m[0]))
rows
3
columns
2
extract_row¶
One of the first things you might want to do is to extract the i-th row. If you’re implementing a function that does this, you basically have two choices. Either you:
- return a pointer to the original row
- return a copy of the row.
Since a copy consumes memory, why should you ever want to return a copy? Sometimes you should, because you don’t know how the data structure is going to be used. For example, suppose you got a book of exercises which has empty spaces to write exercises in. It’s such a great book that everybody in the classroom wants to read it, but you are afraid that if the book starts changing hands some careless guy might write on it. To avoid problems, you make a copy of the book and distribute it (let’s leave copyright infringement matters aside :-)
extract_row_pointer¶
So first let’s see what happens when you just return a pointer to the original row.
NOTE: For convenience, at the end of the cell we put a magic call to jupman.pytut(), which shows the code execution like in Python tutor (for further info about jupman.pytut(), see here). If you execute all the code in Python tutor, you will see that at the end you have two arrows pointing to the row ['a','b'], one starting from the m list and one from the row variable.
[11]:
def extract_row_pointer(mat, i):
    """ RETURN the ith row from mat
        NOTE: the underlying row is returned, so modifications to it will also modify original mat
    """
    return mat[i]

m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]

row = extract_row_pointer(m, 0)

jupman.pytut()
extract_row_f¶
✪ Now try to implement a version which returns a copy of the row.
You might be tempted to implement something like this:
[12]:
# WARNING: WRONG CODE!!!!
# It is adding a LIST as element to another empty list.
# In other words, it is wrapping the row (which is already a list) into another list.
def extract_row(mat, i):
    """ RETURN the ith row from mat. NOTE: the row MUST be a new list ! """
    riga = []
    riga.append(mat[i])
    return riga

# Let's check the problem in Python tutor! You will see an arrow going from row to a list of one element
# which will contain exactly one arrow to the original row.
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]

row = extract_row(m,0)

jupman.pytut()
You can build an actual copy in several ways: with a for, a slice or a list comprehension. Try to implement all versions, starting with the for here. Be sure to check your result with Python tutor: to visualize Python tutor inside the cell output, you might use the special command jupman.pytut() at the end of the cell, as we did before. If you run the code with Python tutor, you should see only one arrow going to the original ['a','b'] row in m, and there should be another ['a','b'] copy somewhere, with the row variable pointing to it.
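If you are offline, you can also check the difference between a pointer and a copy directly in Python: the is operator tells whether two names refer to the very same object, while == only compares contents. A small sketch (the variable names are illustrative):

```python
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]

row_pointer = m[0]       # same list object as the first row
row_copy = m[0][:]       # a new list with the same elements

print(row_pointer is m[0])   # True: one object, two arrows
print(row_copy is m[0])      # False: a distinct object
print(row_copy == m[0])      # True: but with equal content
```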
[13]:
def extract_row_f(mat, i):
    """ RETURN the ith row from mat.
        NOTE: the row MUST be a new list! To create a new list use a for loop
              which iterates over the elements, _not_ the indexes (so don't use range!)
    """
    riga = []
    for x in mat[i]:
        riga.append(x)
    return riga

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]
assert extract_row_f(m, 0) == ['a','b']
assert extract_row_f(m, 1) == ['c','d']
assert extract_row_f(m, 2) == ['a','e']
# check it didn't change the original matrix !
r = extract_row_f(m, 0)
r[0] = 'z'
assert m[0][0] == 'a'
# TEST END

# uncomment if you want to visualize execution here (you need to be online for this to work)
#jupman.pytut()
extract_row_fr¶
✪ Now try to iterate over a range of indexes of the row, that is, over the column indexes. Let’s have a quick look at range(n). Maybe you think it should return a sequence of integers, from zero to n - 1. Does it?
[14]:
range(5)
[14]:
range(0, 5)
Maybe you expected to see something like a list [0,1,2,3,4]; instead, we just discovered Python is pretty lazy here: range(n) actually returns an iterable object, not a list materialized in memory.
To get an actual list of integers, we must explicitly ask this iterable object to give us the numbers one by one.
When you write for i in range(5), the for loop is doing exactly this: at each round it asks the range object to generate the next number in the sequence. If we want the whole sequence materialized in memory, we can generate it by converting the range to a list object:
[15]:
list(range(5))
[15]:
[0, 1, 2, 3, 4]
Be careful, though. Depending on the size of the sequence, this might be dangerous. A list of a billion elements might saturate the RAM of your computer (as of 2018, many laptops came with 4 gigabytes of RAM, that is about 4 billion bytes, while on a 64-bit machine CPython needs 8 bytes per list element just to store the references).
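A quick sketch to check this laziness yourself: a huge range costs almost no memory, yet it still knows its length and can compute single elements on demand (sys.getsizeof reports the size of the object itself; the exact value depends on the Python version):

```python
import sys

r = range(10**9)          # nothing is materialized: only start, stop and step are stored
print(len(r))             # 1000000000
print(r[5])               # 5, computed on the fly
print(sys.getsizeof(r))   # a few dozen bytes, no matter how long the range is

small = list(range(5))    # this one IS materialized in memory
print(small)              # [0, 1, 2, 3, 4]
```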
Now implement extract_row_fr, iterating over a range of indexes:
[16]:
def extract_row_fr(mat, i):
    """ RETURN the ith row from mat.
        NOTE: the row MUST be a new list! To create a new list use a for loop
              which iterates over the indexes, _not_ the elements (so use range!)
    """
    riga = []
    for j in range(len(mat[0])):
        riga.append(mat[i][j])
    return riga

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]
assert extract_row_fr(m, 0) == ['a','b']
assert extract_row_fr(m, 1) == ['c','d']
assert extract_row_fr(m, 2) == ['a','e']
# check it didn't change the original matrix !
r = extract_row_fr(m, 0)
r[0] = 'z'
assert m[0][0] == 'a'
# TEST END

# uncomment if you want to visualize execution here (you need to be online for this to work)
#jupman.pytut()
extract_row_s¶
✪ Remember slices return a copy of a list? Now try to use them.
[17]:
def extract_row_s(mat, i):
    """ RETURN the ith row from mat.
        NOTE: the row MUST be a new list! To create a new list use slices.
    """
    return mat[i][:]   # if you omit start and end indexes, you get a copy of the whole list

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]
assert extract_row_s(m, 0) == ['a','b']
assert extract_row_s(m, 1) == ['c','d']
assert extract_row_s(m, 2) == ['a','e']
# check it didn't change the original matrix !
r = extract_row_s(m, 0)
r[0] = 'z'
assert m[0][0] == 'a'
# TEST END

# uncomment if you want to visualize execution here (you need to be online for this to work)
#jupman.pytut()
extract_row_c¶
✪ Try now to use list comprehensions.
[18]:
def extract_row_c(mat, i):
    """ RETURN the ith row from mat.
        NOTE: the row MUST be a new list! To create a new list use list comprehension.
    """
    return [x for x in mat[i]]

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]
assert extract_row_c(m, 0) == ['a','b']
assert extract_row_c(m, 1) == ['c','d']
assert extract_row_c(m, 2) == ['a','e']
# check it didn't change the original matrix !
r = extract_row_c(m, 0)
r[0] = 'z'
assert m[0][0] == 'a'
# TEST END

# uncomment if you want to visualize execution here (you need to be online for this to work)
#jupman.pytut()
extract_col_f¶
✪✪ Now we can try to extract the column at the j-th position. This time we will be forced to create a new list, so we don’t have to wonder whether we need to return a pointer or a copy.
[19]:
def extract_col_f(mat, j):
    """ RETURN the jth column from mat. To create it, use a for """
    ret = []
    for row in mat:
        ret.append(row[j])
    return ret

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]
assert extract_col_f(m, 0) == ['a','c','a']
assert extract_col_f(m, 1) == ['b','d','e']
# check returned column does not modify m
c = extract_col_f(m,0)
c[0] = 'z'
assert m[0][0] == 'a'
# TEST END

# uncomment if you want to visualize execution here (you need to be online for this to work)
#jupman.pytut()
extract_col_c¶
Difficulty: ✪✪
[20]:
def extract_col_c(mat, j):
    """ RETURN the jth column from mat. To create it, use a list comprehension """
    return [row[j] for row in mat]

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['c','d'],
    ['a','e'],
]
assert extract_col_c(m, 0) == ['a','c','a']
assert extract_col_c(m, 1) == ['b','d','e']
# check returned column does not modify m
c = extract_col_c(m,0)
c[0] = 'z'
assert m[0][0] == 'a'
# TEST END

# uncomment if you want to visualize execution here (you need to be online for this to work)
#jupman.pytut()
deep_clone¶
✪✪ Let’s try to produce a complete clone of the matrix, also called a deep clone, by creating a copy of the external list and also the internal lists representing the rows.
You might be tempted to write code like this:
[21]:
# WARNING: WRONG CODE
def deep_clone_wrong(mat):
    """ RETURN a NEW list of lists which is a COMPLETE DEEP clone
        of mat (which is a list of lists)
    """
    return mat[:]   # NOT SUFFICIENT !
                    # This is a SHALLOW clone, it's only copying the _external_ list
                    # and not also the internal ones !

m = [
    ['a','b'],
    ['b','d']
]

res = deep_clone_wrong(m)

# Notice you will have arrows in res list going to the _original_ mat. We don't want this !
jupman.pytut()
To fix the above code, you will need to iterate through the rows and for each row create a copy of that row.
[22]:
def deep_clone(mat):
    """ RETURN a NEW list of lists which is a COMPLETE DEEP clone
        of mat (which is a list of lists)
    """
    ret = []
    for row in mat:
        ret.append(row[:])
    return ret

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b'],
    ['b','d']
]
res = [
    ['a','b'],
    ['b','d']
]
# verify the copy
c = deep_clone(m)
assert c == res
# verify it is a DEEP copy (that is, it created also clones of the rows!)
c[0][0] = 'z'
assert m[0][0] == 'a'
# TEST END
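Outside of these exercises, where the point is to write the copy yourself, note that Python's standard copy module already offers both kinds of copy: copy.copy makes a shallow copy like mat[:], while copy.deepcopy also copies the nested rows. A small sketch:

```python
import copy

m = [
    ['a','b'],
    ['b','d']
]

shallow = copy.copy(m)      # like m[:]: new outer list, SAME row objects
deep = copy.deepcopy(m)     # new outer list AND new rows

shallow[0][0] = 'z'         # the row is shared, so this modifies m too!
print(m[0][0])              # z

deep[1][1] = 'w'            # m is not affected
print(m[1][1])              # d
```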
stitch_down¶
Difficulty: ✪✪
[23]:
def stitch_down(mat1, mat2):
    """Given matrices mat1 and mat2 as list of lists, with mat1 of size u x n and mat2 of size d x n,
       RETURN a NEW matrix of size (u+d) x n as list of lists, by stitching second mat to the bottom of mat1
       NOTE: by NEW matrix we intend a matrix with no pointers to original rows (see previous deep clone exercise)
    """
    res = []
    for row in mat1:
        res.append(row[:])
    for row in mat2:
        res.append(row[:])
    return res

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
    ['a']
]
m2 = [
    ['b']
]
assert stitch_down(m1, m2) == [
    ['a'],
    ['b']
]
# check we are giving back a deep clone
s = stitch_down(m1, m2)
s[0][0] = 'z'
assert m1[0][0] == 'a'

m1 = [
    ['a','b','c'],
    ['d','b','a']
]
m2 = [
    ['f','b', 'h'],
    ['g','h', 'w']
]
res = [
    ['a','b','c'],
    ['d','b','a'],
    ['f','b','h'],
    ['g','h','w']
]
assert stitch_down(m1, m2) == res
# TEST END
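For comparison, a more compact alternative sketch using a list comprehension (stitch_down_c is a hypothetical name, not part of the exercise):

```python
def stitch_down_c(mat1, mat2):
    # mat1 + mat2 creates a new OUTER list, but its elements are still the original rows,
    # so each row is sliced to get a fresh copy
    return [row[:] for row in mat1 + mat2]

m1 = [
    ['a','b','c'],
    ['d','b','a']
]
m2 = [
    ['f','b','h'],
    ['g','h','w']
]
s = stitch_down_c(m1, m2)
print(s)         # [['a', 'b', 'c'], ['d', 'b', 'a'], ['f', 'b', 'h'], ['g', 'h', 'w']]

s[0][0] = 'z'
print(m1[0][0])  # a: the original is untouched
```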
stitch_up¶
Difficulty: ✪✪
[24]:
def stitch_up(mat1, mat2):
    """Given matrices mat1 and mat2 as list of lists, with mat1 of size u x n and mat2 of size d x n,
       RETURN a NEW matrix of size (u+d) x n as list of lists, by stitching first mat to the bottom of mat2
       NOTE: by NEW matrix we intend a matrix with no pointers to original rows (see previous deep clone exercise)
       To implement this function, use a call to the function stitch_down you implemented before.
    """
    return stitch_down(mat2, mat1)

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
    ['a']
]
m2 = [
    ['b']
]
assert stitch_up(m1, m2) == [
    ['b'],
    ['a']
]
# check we are giving back a deep clone
s = stitch_up(m1, m2)
s[0][0] = 'z'
assert m1[0][0] == 'a'

m1 = [
    ['a','b','c'],
    ['d','b','a']
]
m2 = [
    ['f','b', 'h'],
    ['g','h', 'w']
]
res = [
    ['f','b','h'],
    ['g','h','w'],
    ['a','b','c'],
    ['d','b','a']
]
assert stitch_up(m1, m2) == res
# TEST END
stitch_right¶
Difficulty: ✪✪✪
[25]:
def stitch_right(mata,matb):
    """Given matrices mata and matb as list of lists, with mata of size n x l and matb of size n x r,
       RETURN a NEW matrix of size n x (l + r) as list of lists, by stitching matb to the right end of mata
    """
    ret = []
    for i in range(len(mata)):
        row_to_add = mata[i][:]
        row_to_add.extend(matb[i])
        ret.append(row_to_add)
    return ret

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
ma1 = [
    ['a','b','c'],
    ['d','b','a']
]
mb1 = [
    ['f','b'],
    ['g','h']
]
r1 = [
    ['a','b','c','f','b'],
    ['d','b','a','g','h']
]
assert stitch_right(ma1, mb1) == r1
# TEST END
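An alternative sketch based on the built-in zip, which walks two lists in parallel (stitch_right_zip is a hypothetical name). It assumes, as the exercise does, that both matrices have the same number of rows: zip silently stops at the shorter one.

```python
def stitch_right_zip(mata, matb):
    # zip pairs the i-th row of mata with the i-th row of matb;
    # ra + rb always creates a brand new list, so no original row is shared
    return [ra + rb for ra, rb in zip(mata, matb)]

ma1 = [
    ['a','b','c'],
    ['d','b','a']
]
mb1 = [
    ['f','b'],
    ['g','h']
]
print(stitch_right_zip(ma1, mb1))   # [['a', 'b', 'c', 'f', 'b'], ['d', 'b', 'a', 'g', 'h']]
```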
stitch_left_mod¶
✪✪✪ This time let’s try to modify mat1 in place, by stitching mat2 to the left of mat1. So this time don’t put a return instruction.
You will need to perform list insertion, which can be tricky. There are many ways to do it in Python; one could be the weird slice assignment:
mylist[0:0] = list_to_insert
see here for more info: https://stackoverflow.com/a/10623383
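A small sketch of how slice assignment inserts elements: the empty slice mylist[k:k] is replaced by the elements of the list on the right, so they end up inserted at position k, in place:

```python
mylist = ['a','b','c']

mylist[0:0] = ['x','y']   # insert at the front, replacing the empty slice [0:0]
print(mylist)             # ['x', 'y', 'a', 'b', 'c']

mylist[2:2] = ['k']       # the same trick works at any position
print(mylist)             # ['x', 'y', 'k', 'a', 'b', 'c']
```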
[26]:
def stitch_left_mod(mat1,mat2):
    """Given matrices mat1 and mat2 as list of lists, with mat1 of size n x l and mat2 of size n x r,
       MODIFIES mat1 so that it becomes of size n x (l + r), by stitching second mat to the left of mat1
    """
    for i in range(len(mat1)):
        mat1[i][0:0] = mat2[i]

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
    ['a','b','c'],
    ['d','b','a']
]
m2 = [
    ['f','b'],
    ['g','h']
]
res = [
    ['f','b','a','b','c'],
    ['g','h','d','b','a']
]
stitch_left_mod(m1, m2)
assert m1 == res
# TEST END
Exceptions and parameter checking¶
Let’s look at a parameter validation example (it is not an exercise).
If we wanted to implement a function mydiv(a,b) which divides a by b, we could check inside it that b is not zero. If it is, we might abruptly stop the function by raising a ValueError. For division by zero Python actually already has a very specific ZeroDivisionError, but for the sake of the example we will raise a ValueError.
[27]:
def mydiv(a,b):
    """ Divides a by b. If b is zero, raises a ValueError
    """
    if b == 0:
        raise ValueError("Invalid divisor 0")
    return a / b

# to check the function actually raises ValueError when called, we might write a quick test like this:
try:
    mydiv(3,0)
    raise Exception("SHOULD HAVE FAILED !")  # if mydiv raises an exception which is ValueError as we expect it to do,
                                             # the code should never arrive here
except ValueError:      # this only catches ValueError. Other types of errors are not caught
    "passed test"       # In an except clause you always need to put some code.
                        # Here we put a placeholder string just to fill in (the pass statement would also do)

assert mydiv(6,2) == 3
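The same check can also be written with the unittest tools seen in the testing section, where assertRaises replaces the try/except trick. A sketch (the class name MyDivTest is made up for illustration):

```python
import unittest

def mydiv(a, b):
    """ Divides a by b. If b is zero, raises a ValueError """
    if b == 0:
        raise ValueError("Invalid divisor 0")
    return a / b

class MyDivTest(unittest.TestCase):

    def test_zero_divisor(self):
        # passes only if the code inside the with block raises ValueError
        with self.assertRaises(ValueError):
            mydiv(3, 0)

    def test_normal(self):
        self.assertEqual(mydiv(6, 2), 3)

suite = unittest.TestLoader().loadTestsFromTestCase(MyDivTest)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())   # True
```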
diag¶
✪✪ diag extracts the diagonal of a matrix. To do so, diag requires an nxn matrix as input. To make sure we actually get an nxn matrix, this time you will have to validate the input, that is, check whether the number of rows is equal to the number of columns (as always, we assume the matrix has at least one row and at least one column). If the matrix is not nxn, the function should stop by raising an exception. In particular, it should raise a ValueError, which is the standard Python exception to raise when the expected input is not correct and you can’t find any other more specific error.
Just for illustrative purposes, we show here the index numbers i and j, and avoid putting quotes around strings:
 \ j  0,1,2,3
i
    [
  0  [a,b,c,d],
  1  [e,f,g,h],
  2  [p,q,r,s],
  3  [t,u,v,z]
    ]
Let’s see a step by step execution:
                            \ j  0,1,2,3
                           i
                               [
extract from row at i=0 -->  0  [a,b,c,d],     'a' is extracted from mat[0][0]
                             1  [e,f,g,h],
                             2  [p,q,r,s],
                             3  [t,u,v,z]
                               ]

                            \ j  0,1,2,3
                           i
                               [
                             0  [a,b,c,d],
extract from row at i=1 -->  1  [e,f,g,h],     'f' is extracted from mat[1][1]
                             2  [p,q,r,s],
                             3  [t,u,v,z]
                               ]

                            \ j  0,1,2,3
                           i
                               [
                             0  [a,b,c,d],
                             1  [e,f,g,h],
extract from row at i=2 -->  2  [p,q,r,s],     'r' is extracted from mat[2][2]
                             3  [t,u,v,z]
                               ]

                            \ j  0,1,2,3
                           i
                               [
                             0  [a,b,c,d],
                             1  [e,f,g,h],
                             2  [p,q,r,s],
extract from row at i=3 -->  3  [t,u,v,z]      'z' is extracted from mat[3][3]
                               ]
From the above, we notice we need elements at these indexes:
| i | j |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
There are two ways to solve this exercise: one is to use a double for (a nested for, to be precise), while the other uses only one for. Try to solve it in both ways. How many steps do you need with the double for? And with only one?
About performance
For the purposes of the first part of the course, performance considerations won’t be part of the evaluation. So if all the tests run in a decent time on your laptop (and the code is actually correct!), then the exercise is considered solved, even if there are better algorithmic ways to solve it. Typically in this first part you won’t have many performance problems, except when we deal with 100 MB files: in those cases you will be forced to use the right method, otherwise your laptop will just keep heating without spitting out results.
In the second part of the course, we will consider performance indeed, so in that part using a double for would be considered an unacceptable waste.
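To compare the two approaches, here is a sketch that counts the steps of each version on the 4x4 example above (diag_nested and diag_single are hypothetical names, and input validation is left out for brevity): the nested for visits all n*n cells, while the single for goes straight to the n diagonal cells.

```python
def diag_nested(mat):
    # Double (nested) for: visits all n*n cells, keeps only those with i == j
    ret = []
    steps = 0
    for i in range(len(mat)):
        for j in range(len(mat)):
            steps += 1
            if i == j:
                ret.append(mat[i][j])
    return ret, steps

def diag_single(mat):
    # Single for: goes straight to mat[i][i], n steps in total
    ret = []
    steps = 0
    for i in range(len(mat)):
        steps += 1
        ret.append(mat[i][i])
    return ret, steps

m = [
    ['a','b','c','d'],
    ['e','f','g','h'],
    ['p','q','r','s'],
    ['t','u','v','z']
]
print(diag_nested(m))   # (['a', 'f', 'r', 'z'], 16)
print(diag_single(m))   # (['a', 'f', 'r', 'z'], 4)
```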
[28]:
def diag(mat):
    """ Given an nxn matrix mat as a list of lists, RETURN a list which contains the elements in the diagonal
        (top left to bottom right corner).
        - if mat is not nxn raise ValueError
    """
    if len(mat) != len(mat[0]):
        raise ValueError("Matrix should be nxn, found instead %s x %s" % (len(mat), len(mat[0])))
    ret = []
    for i in range(len(mat)):
        ret.append(mat[i][i])
    return ret

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b','c'],
    ['d','e','f'],
    ['g','h','i']
]
assert diag(m) == ['a','e','i']

try:
    diag([            # 1x2 dimension, not square
        ['a','b']
    ])
    raise Exception("SHOULD HAVE FAILED !")  # if diag raises an exception which is ValueError as we expect it to do,
                                             # the code should never arrive here
except ValueError:      # this only catches ValueError. Other types of errors are not caught
    "passed test"       # In an except clause you always need to put some code.
                        # Here we put a placeholder string just to fill in
# TEST END
anti_diag¶
✪✪ Before implementing it, be sure to write down and understand the required indexes, as we did in the example for the diag function.
[29]:
def anti_diag(mat):
    """ Given an nxn matrix mat as a list of lists, RETURN a list which contains the elements in the antidiagonal
        (top right to bottom left corner). If mat is not nxn raise ValueError
    """
    n = len(mat)
    if n != len(mat[0]):
        raise ValueError("Matrix should be nxn, found instead %s x %s" % (n, len(mat[0])))
    ret = []
    for i in range(n):
        ret.append(mat[i][n-i-1])
    return ret

# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m = [
    ['a','b','c'],
    ['d','e','f'],
    ['g','h','i']
]
assert anti_diag(m) == ['c','e','g']
# TEST END

# If you have doubts about the indexes remember to try it in python tutor !
# jupman.pytut()
is_utriang¶
✪✪✪ You will now try to iterate only the lower triangular half of a matrix. Let’s look at an example:
[30]:
m = [
    [3,2,5,8],
    [0,6,2,3],
    [0,0,4,9],
    [0,0,0,5]
]
Just for illustrative purposes, we show here the index numbers i and j:
 \ j   0,1,2,3
 i
     [
 0    [3,2,5,8],
 1    [0,6,2,3],
 2    [0,0,4,9],
 3    [0,0,0,5]
     ]
Let’s see a step by step execution on a non-upper triangular matrix:
 \ j   0,1,2,3
 i
     [
 0    [3,2,5,8],
 1    [0,6,2,3],    <-- start from row at index i=1, check until column limit j=0 included
 2    [0,0,4,9],
 3    [0,7,0,5]
     ]
One zero is found, time to check the next row.
 \ j   0,1,2,3
 i
     [
 0    [3,2,5,8],
 1    [0,6,2,3],
 2    [0,0,4,9],    <-- check row at index i=2, check until column limit j=1 included
 3    [0,7,0,5]
     ]
Two zeros are found. Time to check the next row.
 \ j   0,1,2,3
 i
     [
 0    [3,2,5,8],
 1    [0,6,2,3],
 2    [0,0,4,9],
 3    [0,7,0,5]     <-- check row at index i=3, check until column limit j=2 included,
     ]                  BUT we can stop sooner at j=1 because the number at j=1
                        is different from zero: as soon as 7 is found, we can return False
In this case the matrix is not upper triangular.
When you develop these algorithms, it is fundamental to write down a step by step example like the above to get a clear picture of what is happening. Also, if you write down the indices correctly, you will easily be able to derive a generalization. To find it, try to write the indices you found into a table.
For example, from the above, for each row index i we can easily find out which limit index j we need to reach in our hunt for zeros:
| i | limit j (included) | Notes |
|---|--------------------|---------------------------------|
| 1 | 0 | we start from row at index i=1 |
| 2 | 1 | |
| 3 | 2 | |
From the table, we can see that the limit for j can be calculated in terms of the current row index i with the simple formula i - 1.
The fact that you need to span rows and columns suggests you need two fors, one for rows and one for columns: that is, a nested for.
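To check the table before writing is_utriang, you can have Python generate it. The sketch below uses a hypothetical helper, `lower_indices`, built from a nested for. The inner range stops at i excluded, which is the limit i - 1 included. For a 4x4 matrix the sketch collects every visited (i, j) pair strictly below the diagonal.

```python
def lower_indices(n):
    """Hypothetical helper: (i, j) pairs strictly below the diagonal of an nxn matrix."""
    pairs = []
    for i in range(1, n):      # rows start from index 1
        for j in range(i):     # columns go up to i - 1 included
            pairs.append((i, j))
    return pairs

print(lower_indices(4))  # [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
```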
- please use ranges of indexes to carry out the task (no for row in mat ..)
- please use letter i as index for rows, j as index of columns and, in case you need it, n as matrix dimension
HINT 1: remember you can set range to start from a specific index, like range(3,7), which will start from 3 and end at 6 included (the last 7 is excluded!)
HINT 2: To implement this, it is best to look for numbers different from zero. As soon as you find one, you can stop the function and return False. Only after all the numbers have been checked can you return True.
Finally, be reminded of the following:
COMMANDMENT 9: Whenever you introduce a variable with a for cycle, such variable must be new
If you defined a variable before, you shall not reintroduce it in a for, since it is as confusing as reassigning function parameters.
So avoid these sins:
[31]:
i = 7
for i in range(3):  # sin, you lose i variable
    print(i)
0
1
2
[32]:
def f(i):
    for i in range(3):  # sin again, you lose i parameter
        print(i)
[33]:
for i in range(2):
    for i in range(3):  # debugging hell, you lose i from outer for
        print(i)
0
1
2
0
1
2
If you read all the above, start implementing the function:
[34]:
def is_utriang(mat):
    """ RETURN True if the provided nxn matrix is upper triangular, that is, has all the entries
        below the diagonal set to zero. Return False otherwise.
    """
    #jupman-raise
    n = len(mat)
    for i in range(1,n):
        for j in range(i):  # notice it arrives until i *excluded*, that is, arrives to i - 1 *included*
            if mat[i][j] != 0:
                return False
    return True
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert is_utriang([
[1]
]) == True
assert is_utriang([
[3,2,5],
[0,6,2],
[0,0,4]
]) == True
assert is_utriang([
[3,2,5],
[0,6,2],
[1,0,4]
]) == False
assert is_utriang([
[3,2,5],
[0,6,2],
[1,1,4]
]) == False
assert is_utriang([
[3,2,5],
[0,6,2],
[0,1,4]
]) == False
assert is_utriang([
[3,2,5],
[1,6,2],
[1,0,4]
]) == False
# TEST END
transpose_1¶
✪✪✪ Transpose a matrix in-place. The transpose \(M^T\) of a matrix \(M\) is defined as
\(M^T[i][j] = M[j][i]\)
The definition is simple, yet the implementation might be tricky. If you’re not careful, you could easily end up swapping the values twice and get the same original matrix back. To prevent this, iterate only over the upper triangular part of the matrix, and remember that the range function can also have a start index:
[35]:
list(range(3,7))
[35]:
[3, 4, 5, 6]
Also, make sure you know how to swap just two values by first solving this very simple exercise, and check the result in Python Tutor:
[36]:
x = 3
y = 7
# write here code for swapping x and y (don't directly use the constants 3 and 7!)
k = x
x = y
y = k
jupman.pytut()
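Python also offers a more compact idiom based on tuple packing and unpacking, which swaps the two values without the temporary variable k:

```python
x = 3
y = 7
# the right-hand side builds the tuple (7, 3), which is then unpacked into x and y
x, y = y, x
print(x, y)  # 7 3
```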
Going back to the transpose, for now we will consider only an nxn matrix. To make sure we actually get an nxn matrix, this time you will have to validate the input, that is, check whether the number of rows is equal to the number of columns (as always, we assume the matrix has at least one row and at least one column). If the matrix is not nxn, the function should stop by raising an exception. In particular, it should raise a ValueError, which is the standard Python exception to raise when the input is not as expected and you can’t find any other, more specific error.
COMMANDMENT 4 (adapted for matrices): You shall never ever reassign function parameters
def myfun(M):
    # M is a parameter, so you shall *not* do any of such evil:
    M = [
        [6661,6662],
        [6663,6664]
    ]
    # For the sole case of composite parameters like lists (or lists of lists ..)
    # you can write stuff like this IF AND ONLY IF the function specification
    # requires you to modify the parameter internal elements (e.g. transposing _in-place_):
    M[0][1] = 6663
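To see why the first form is confusing as well as useless, consider the two hypothetical functions `reassign` and `mutate` in the sketch below. Reassigning a parameter only rebinds the local name. Writing into its elements changes the matrix the caller passed in.

```python
def reassign(M):
    # rebinds the local name M only: the caller's matrix is untouched
    M = [[0, 0], [0, 0]]

def mutate(M):
    # writes inside the very list the caller passed: the caller sees the change
    M[0][1] = 6663

mat = [[1, 2], [3, 4]]
reassign(mat)
print(mat)  # [[1, 2], [3, 4]]
mutate(mat)
print(mat)  # [[1, 6663], [3, 4]]
```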
If you read all the above, you can now proceed to implement the transpose_1 function:
[37]:
def transpose_1(mat):
    """ MODIFIES given nxn matrix mat by transposing it *in-place*.
        If the matrix is not nxn, raises a ValueError
    """
    #jupman-raise
    if len(mat) != len(mat[0]):
        raise ValueError("Matrix should be nxn, found instead %s x %s" % (len(mat), len(mat[0])))
    for i in range(len(mat)):
        for j in range(i+1,len(mat[i])):
            el = mat[i][j]
            mat[i][j] = mat[j][i]
            mat[j][i] = el
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
# let's try wrong matrix dimensions:
try:
    transpose_1([
        [3,5]
    ])
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
m1 = [
['a']
]
transpose_1(m1)
assert m1 == [
['a']
]
m2 = [
['a','b'],
['c','d']
]
transpose_1(m2)
assert m2 == [
['a','c'],
['b','d']
]
# TEST END
empty matrix¶
✪✪ There are several ways to create a new empty 3x5 matrix as a list of lists containing zeros. Try to create one with two nested for cycles:
[38]:
def empty_matrix(n, m):
    """
    RETURN a NEW nxm matrix as list of lists filled with zeros. Implement it with a nested for
    """
    #jupman-raise
    ret = []
    for i in range(n):
        row = []
        ret.append(row)
        for j in range(m):
            row.append(0)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert empty_matrix(1,1) == [
[0]
]
assert empty_matrix(1,2) == [
[0,0]
]
assert empty_matrix(2,1) == [
[0],
[0]
]
assert empty_matrix(2,2) == [
[0,0],
[0,0]
]
assert empty_matrix(3,3) == [
[0,0,0],
[0,0,0],
[0,0,0]
]
# TEST END
empty_matrix the elegant way¶
To create a new list of 3 elements filled with zeros, you can write like this:
[39]:
[0]*3
[39]:
[0, 0, 0]
The * operator applied to a list repeats its elements the given number of times.
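The same replication works with lists of any length. The whole sequence of elements is repeated in order, as this small example shows:

```python
zeros = [0] * 3        # [0, 0, 0]
pattern = [1, 2] * 3   # [1, 2, 1, 2, 1, 2]
print(zeros, pattern)
```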
Given the above, to create a 5x3 matrix filled with zeros, which is a list of seemingly equal lists, you might then be tempted to write like this:
[40]:
# WRONG
[[0]*3]*5
[40]:
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
Why is that (possibly) wrong? Let’s try to inspect it in Python tutor:
[41]:
bad = [[0]*3]*5
jupman.pytut()
If you look closely, you will see many arrows pointing to the same list of 3 zeros. This means that if we change one number, we will apparently change 5 of them in the whole column !
The right way to create a matrix as list of lists with zeroes is the following:
[42]:
# CORRECT
[[0]*3 for i in range(5)]
[42]:
[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
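You can also check the difference without Python Tutor. The check below writes one cell in each matrix. In `bad`, all five rows turn out to be the same list object. The comprehension, by contrast, builds five independent rows.

```python
bad = [[0] * 3] * 5                  # five references to ONE row
good = [[0] * 3 for i in range(5)]   # five distinct rows

bad[0][0] = 1
good[0][0] = 1

print([row[0] for row in bad])    # [1, 1, 1, 1, 1]
print([row[0] for row in good])   # [1, 0, 0, 0, 0]
print(bad[0] is bad[4], good[0] is good[4])  # True False
```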
transpose_2¶
✪✪ Now let’s try to transpose a generic nxm matrix. This time for simplicity we will return a whole new matrix.
[43]:
def transpose_2(mat):
    """ RETURN a NEW mxn matrix which is the transpose of the given nxm matrix mat as list of lists.
    """
    #jupman-raise
    n = len(mat)
    m = len(mat[0])
    ret = [[0]*n for i in range(m)]
    for i in range(n):
        for j in range(m):
            ret[j][i] = mat[i][j]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
['a']
]
r1 = transpose_2(m1)
assert r1 == [
['a']
]
r1[0][0] = 'z'
assert m1[0][0] == 'a'
m2 = [
['a','b','c'],
['d','e','f']
]
assert transpose_2(m2) == [
['a','d'],
['b','e'],
['c','f'],
]
# TEST END
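For comparison only: a common Python idiom builds the same transpose with the built-in zip, since zip(*mat) groups together the j-th elements of every row. The sketch below (the name `transpose_zip` is hypothetical) converts each resulting tuple back into a list so the output is again a list of lists. It is an alternative to, not a replacement for, the nested-for approach the exercise asks for.

```python
def transpose_zip(mat):
    """Hypothetical alternative: RETURN a NEW transposed matrix using zip."""
    # zip(*mat) yields one tuple per column of mat
    return [list(col) for col in zip(*mat)]

m2 = [
    ['a','b','c'],
    ['d','e','f']
]
print(transpose_zip(m2))  # [['a', 'd'], ['b', 'e'], ['c', 'f']]
```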
threshold¶
✪✪ Takes a matrix as a list of lists (every list has the same dimension) and RETURN a NEW matrix as list of lists where there is True if the corresponding input element is greater than t, otherwise False
Ingredients:
- a variable for the matrix to return
- for each original row, we need to create a new list
[44]:
def threshold(mat, t):
    #jupman-raise
    ret = []
    for row in mat:
        new_row = []
        ret.append(new_row)
        for el in row:
            new_row.append(el > t)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
morig = [
[1,4,2],
[7,9,3],
]
m1 = [
[1,4,2],
[7,9,3],
]
r1 = [
[False,False,False],
[True,True,False],
]
assert threshold(m1,4) == r1
assert m1 == morig # verify original didn't change
m2 = [
[5,2],
[3,7]
]
r2 = [
[True,False],
[False,True]
]
assert threshold(m2,4) == r2
# TEST END
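Once the nested-for version is clear, the same logic fits in a nested list comprehension. The sketch below (the name `threshold_comp` is hypothetical) is only an equivalent rewrite. It still produces a NEW matrix and leaves the input unchanged.

```python
def threshold_comp(mat, t):
    # the outer comprehension builds one new row per original row,
    # the inner one compares each element with t
    return [[el > t for el in row] for row in mat]

print(threshold_comp([[5, 2], [3, 7]], 4))  # [[True, False], [False, True]]
```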
swap_rows¶
Difficulty: ✪✪
[45]:
def swap_rows(mat, i1, i2):
    """Takes a matrix as list of lists, and RETURN a NEW matrix where rows at indexes i1 and i2 are swapped
    """
    #jupman-raise
    # deep clones
    ret = []
    for row in mat:
        ret.append(row[:])
    # swaps
    s = ret[i1]
    ret[i1] = ret[i2]
    ret[i2] = s
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
['a','d'],
['b','e'],
['c','f']
]
r1 = swap_rows(m1, 0, 2)
assert r1 == [
['c','f'],
['b','e'],
['a','d']
]
r1[0][0] = 'z'
assert m1[0][0] == 'a'
m2 = [
['a','d'],
['b','e'],
['c','f']
]
# swap with itself should in fact generate a deep clone
r2 = swap_rows(m2, 0, 0)
assert r2 == [
['a','d'],
['b','e'],
['c','f']
]
r2[0][0] = 'z'
assert m2[0][0] == 'a'
# TEST END
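Why does swap_rows clone each row with row[:] instead of just copying the outer list? Slicing only the outer list creates a new list that still shares the inner rows, so writing into a cell also changes the original, as the following shows:

```python
mat = [['a','d'], ['b','e']]
shallow = mat[:]                 # new outer list, same inner rows
deep = [row[:] for row in mat]   # new outer list AND new rows

shallow[0][0] = 'z'
print(mat[0][0])   # z  (the original changed!)
deep[1][0] = 'w'
print(mat[1][0])   # b  (the original is untouched)
```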
swap_cols¶
✪✪ RETURN a NEW matrix where the columns j1 and j2 are swapped
[46]:
def swap_cols(mat, j1, j2):
    #jupman-raise
    ret = []
    for row in mat:
        new_row = row[:]
        new_row[j1] = row[j2]
        new_row[j2] = row[j1]
        ret.append(new_row)
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
['a','b','c'],
['d','e','f']
]
r1 = swap_cols(m1, 0,2)
assert r1 == [
['c','b','a'],
['f','e','d']
]
r1[0][0] = 'z'
assert m1[0][0] == 'a'
# TEST END
lab¶
✪✪✪ If you’re a teacher who often sees new students, you have this problem: if two students who are friends sit side by side, they can start chatting way too much. To keep them quiet, you want to somehow randomize student placement by following this algorithm:
first sort the students alphabetically
then sorted students progressively sit at the available chairs one by one, first filling the first row, then the second, till the end.
Now implement the algorithm.
INPUT:
students: a list of strings of length <= n*m
chairs: an nxm matrix as list of lists filled with None values (empty chairs)
OUTPUT: MODIFIES BOTH students and chairs inputs, without returning anything
If there are more students than available chairs, raises ValueError
Example:
ss = ['b', 'd', 'e', 'g', 'c', 'a', 'h', 'f' ]
mat = [
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None]
]
lab(ss, mat)
# after execution, mat should result changed to this:
assert mat == [
['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', None],
[None, None, None],
]
# after execution, input ss should now be ordered:
assert ss == ['a','b','c','d','e','f','g','h']
For more examples, see tests
[47]:
def lab(students, chairs):
    #jupman-raise
    n = len(chairs)
    m = len(chairs[0])
    if len(students) > n*m:
        raise ValueError("There are more students than chairs ! Students = %s, chairs = %sx%s" % (len(students), n, m))
    i = 0
    j = 0
    students.sort()
    for s in students:
        chairs[i][j] = s
        if j == m - 1:
            j = 0
            i += 1
        else:
            j += 1
    #/jupman-raise
try:
    lab(['a','b'], [[None]])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
try:
    lab(['a','b','c'], [[None,None]])
    raise Exception("TEST FAILED: Should have failed before with a ValueError!")
except ValueError:
    "Test passed"
m0 = [
[None]
]
r0 = lab([],m0)
assert m0 == [
[None]
]
assert r0 == None # function is not meant to return anything (so returns None by default)
m1 = [
[None]
]
r1 = lab(['a'], m1)
assert m1 == [
['a']
]
assert r1 == None # function is not meant to return anything (so returns None by default)
m2 = [
[None, None]
]
lab(['a'], m2) # 1 student 2 chairs in one row
assert m2 == [
['a', None]
]
m3 = [
[None],
[None],
]
lab(['a'], m3) # 1 student 2 chairs in one column
assert m3 == [
['a'],
[None]
]
ss4 = ['b', 'a']
m4 = [
[None, None]
]
lab(ss4, m4) # 2 students 2 chairs in one row
assert m4 == [
['a','b']
]
assert ss4 == ['a', 'b'] # also modified input list as required by function text
m5 = [
[None, None],
[None, None]
]
lab(['b', 'c', 'a'], m5) # 3 students 2x2 chairs
assert m5 == [
['a','b'],
['c', None]
]
m6 = [
[None, None],
[None, None]
]
lab(['b', 'd', 'c', 'a'], m6) # 4 students 2x2 chairs
assert m6 == [
['a','b'],
['c','d']
]
m7 = [
[None, None, None],
[None, None, None]
]
lab(['b', 'd', 'e', 'c', 'a'], m7) # 5 students 3x2 chairs
assert m7 == [
['a','b','c'],
['d','e',None]
]
ss8 = ['b', 'd', 'e', 'g', 'c', 'a', 'h', 'f' ]
m8 = [
[None, None, None],
[None, None, None],
[None, None, None],
[None, None, None]
]
lab(ss8, m8) # 8 students 3x4 chairs
assert m8 == [
['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', None],
[None, None, None],
]
assert ss8 == ['a','b','c','d','e','f','g','h']
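Since the chairs are filled row by row, the k-th sorted student (counting from 0) ends up at row k // m and column k % m, where m is the number of columns. The sketch below uses this observation with Python’s built-in divmod instead of the manual i, j counters. `lab_divmod` is a hypothetical name, and the sketch assumes the same input contract as lab.

```python
def lab_divmod(students, chairs):
    """Hypothetical alternative to lab: same contract, seats computed with divmod."""
    n = len(chairs)
    m = len(chairs[0])
    if len(students) > n * m:
        raise ValueError("There are more students than chairs!")
    students.sort()                      # modifies the input list, as lab does
    for k, s in enumerate(students):
        i, j = divmod(k, m)              # row k // m, column k % m
        chairs[i][j] = s

ss = ['b', 'd', 'e', 'g', 'c', 'a', 'h', 'f']
mat = [[None] * 3 for i in range(4)]
lab_divmod(ss, mat)
print(mat)
```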
dump¶
The multinational ToxiCorp wants to hire you to devise an automated truck driver which will deposit highly contaminated waste in the illegal dumps they own worldwide. You find it ethically questionable, but they pay well, so you accept.
A dump is modelled as a rectangular region of dimensions nrow and ncol, implemented as a list of lists matrix. Every cell i, j contains the tons of waste present, and can contain at most 7 tons of waste.
The dumpster truck will transport q tons of waste, and try to fill the dump by depositing waste in the first row, filling each cell up to 7 tons. When the first row is filled, it will proceed to the second one from the left, then to the third one again from the left, until there is no waste left to dispose of.
Function dump(mat, q) takes as input the dump mat and the number of tons q to dispose of, and RETURN a NEW list representing a plan with the sequence of tons to dispose. If the waste to dispose exceeds the dump capacity, raises ValueError.
NOTE: the function does not modify the matrix
Example:
m = [
[5,4,6],
[4,7,1],
[3,2,6],
[3,6,2],
]
dump(m, 22)
[2, 3, 1, 3, 0, 6, 4, 3]
For the first row we dispose of 2,3,1 tons in three cells, for the second row we dispose of 3,0,6 tons in three cells, and for the third row we only dispose of 4,3 tons in two cells, as the limit q=22 is reached.
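To check the ValueError condition up front, you can compute the total free capacity, since each cell can still take 7 minus its current tons. In the sketch below, the helper name `free_capacity` is hypothetical. For the example matrix, the free space is 35 tons, so q = 22 fits.

```python
def free_capacity(mat):
    """Hypothetical helper: total tons that can still be dumped (7 per cell max)."""
    return sum(7 - cell for row in mat for cell in row)

m = [
    [5,4,6],
    [4,7,1],
    [3,2,6],
    [3,6,2],
]
print(free_capacity(m))  # 35
```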
[48]:
def dump(mat, q):
    #jupman-raise
    rem = q
    ret = []
    for riga in mat:   # riga is Italian for 'row'
        for j in range(len(riga)):
            cellfill = 7 - riga[j]
            unload = min(cellfill, rem)
            rem -= unload
            if rem > 0:
                ret.append(unload)
            else:
                if unload > 0:
                    ret.append(unload)
                return ret
    if rem > 0:
        raise ValueError("Couldn't fill the dump, %s tons remain!" % rem)
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
m1 = [
[5]
]
assert dump(m1,0) == [] # nothing to dump
m2 = [
[4]
]
assert dump(m2,2) == [2]
m3 = [
[5,4]
]
assert dump(m3,3) == [2, 1]
m3 = [
[5,7,3]
]
assert dump(m3,3) == [2, 0, 1]
m5 = [
[2,5], # 5 2
[4,3] # 3 1
]
assert dump(m5,11) == [5,2,3,1]
m6 = [ # tons to dump in each cell
[5,4,6], # 2 3 1
[4,7,1], # 3 0 6
[3,2,6], # 4 3 0
[3,6,2], # 0 0 0
]
assert dump(m6, 22) == [2,3,1,3,0,6,4,3]
try:
    dump([[5]], 10)
    raise Exception("Should have failed !")
except ValueError:
    pass
# TEST END
matrix multiplication¶
✪✪✪ Have a look at matrix multiplication definition on Wikipedia and try to implement it in the following function.
Basically, given an nxm matrix A and an mxp matrix B, you need to output an nxp matrix C, calculating the entries \(c_{ij}\) with the formula
\(c_{ij} = a_{i1}b_{1j} +\cdots + a_{im}b_{mj}= \sum_{k=1}^m a_{ik}b_{kj}\)
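Note that the formula counts indices from 1, while Python counts from 0, so the sum over k becomes a range(m). As a worked example, this computes the single entry \(c_{00}\) for the matrices ma4 and mb4 used in the tests below. The result 3*4 + 5*8 = 52 matches the expected first entry.

```python
ma4 = [
    [3,5],
    [7,1],
    [9,4]
]
mb4 = [
    [4,1,5,7],
    [8,5,2,7]
]
i, j = 0, 0
m = len(ma4[0])   # columns of A, which must equal the rows of B
c_ij = sum(ma4[i][k] * mb4[k][j] for k in range(m))
print(c_ij)  # 52
```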
You need to fill all the nxp cells of C, so, sure enough, to fill a rectangle you need two fors. Do you also need another for? Help yourself with the following visualization.
[49]:
def mul(mata, matb):
    """ Given matrices n x m mata and m x p matb, RETURN a NEW n x p matrix which is the result
        of the multiplication of mata by matb.
        If mata has column number different from matb row number, raises a ValueError.
    """
    #jupman-raise
    n = len(mata)
    m = len(mata[0])
    p = len(matb[0])
    if m != len(matb):
        raise ValueError("mat1 column number %s must be equal to mat2 row number %s !" % (m, len(matb)))
    ret = [[0]*p for i in range(n)]
    for i in range(n):
        for j in range(p):
            ret[i][j] = 0
            for k in range(m):
                ret[i][j] += mata[i][k] * matb[k][j]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
# let's try wrong matrix dimensions:
try:
    mul([[3,5]], [[7]])
    raise Exception("SHOULD HAVE FAILED!")
except ValueError:
    "passed test"
ma1 = [ [3] ]
mb1 = [ [5] ]
r1 = mul(ma1,mb1)
assert r1 == [
[15]
]
ma2 = [
[3],
[5]
]
mb2 = [
[2,6]
]
r2 = mul(ma2,mb2)
assert r2 == [
[3*2, 3*6],
[5*2, 5*6]
]
ma3 = [ [3,5] ]
mb3 = [ [2],
[6]
]
r3 = mul(ma3,mb3)
assert r3 == [
[3*2 + 5*6]
]
ma4 = [
[3,5],
[7,1],
[9,4]
]
mb4 = [
[4,1,5,7],
[8,5,2,7]
]
r4 = mul(ma4,mb4)
assert r4 == [
[52, 28, 25, 56],
[36, 12, 37, 56],
[68, 29, 53, 91]
]
# TEST END
check_nqueen¶
✪✪✪✪ This is a hard problem but don’t worry, exam exercises will be simpler!
You have an nxn matrix of booleans representing a chessboard, where True means there is a queen in a cell and False means there is nothing.
For the sake of visualization, we represent a configuration using a dot . to mean False, and letters like ‘A’ and ‘B’ for queens. Contrary to what we’ve done so far, for later convenience we show the matrix with i going from bottom to top.
Let’s see an example. In this case A and B cannot attack each other, so the algorithm would return True:
7 ......B.
6 ........
5 ........
4 ........
3 ....A...
2 ........
1 ........
0 ........
i
j 01234567
Let's see why by highlighting A's attack lines ..
7 \...|.B.
6 .\..|../
5 ..\.|./.
4 ...\|/..
3 ----A---
2 .../|\..
1 ../.|.\.
0 ./..|..\
i
j 01234567
... and B's attack lines:
7 ------B-
6 ...../|\
5 ..../.|.
4 .../..|.
3 ../.A.|.
2 ./....|.
1 /.....|.
0 ......|.
i
j 01234567
In this other case the algorithm would return False, as A and B can attack each other:
7 \./.|...
6 -B--|--/
5 /|\.|./.
4 .|.\|/..
3 ----A---
2 .|./|\..
1 .|/.|.\.
0 ./..|..\
i
j 01234567
In your algorithm, first you need to scan for queens. When you find one (and for each one of them !), you need to check if it can hit some other queen. Let’s see how:
In this 7x7 table we have only one queen A, at position i=1 and j=4:
6 ....|..
5 \...|..
4 .\..|..
3 ..\.|./
2 ...\|/.
1 ----A--
0 .../|\.
i
j 0123456
To completely understand the range of the queen and how to calculate the diagonals, it is convenient to visually extend the table like so, to have the diagonals hit the vertical axis. Notice we also added the letters y and x.
NOTE: in the algorithm you do not need to extend the matrix !
 y
 6 ....|....
 5 \...|.../
 4 .\..|../.
 3 ..\.|./..
 2 ...\|/...
 1 ----A----
 0 .../|\...
-1 ../.|.\..
-2 ./..|..\.
-3 /...|...\
 i
 j 012345678 x
We see that the top-left to bottom-right diagonal hits the vertical axis at y = 5, and the bottom-left to top-right diagonal hits the axis at y = -3. You should use this info to calculate the line equations.
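Spelled out, with the queen at row i and column j, and using x for columns and y for rows as in the extended picture, the two diagonals through the queen are as follows. The values at x = 0 are for the example queen at i = 1, j = 4:

\[
\begin{aligned}
\text{bottom-left to top-right:}\quad & y - i = x - j \;\Rightarrow\; y = x - j + i, && \text{at } x = 0:\; y = i - j = 1 - 4 = -3\\
\text{top-left to bottom-right:}\quad & y - i = -(x - j) \;\Rightarrow\; y = -x + j + i, && \text{at } x = 0:\; y = i + j = 1 + 4 = 5
\end{aligned}
\]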
Now you should have all the necessary hints to proceed with the implementation.
[50]:
def check_nqueen(mat):
    """ Takes an nxn matrix of booleans representing a chessboard where True means there is a queen in a cell,
        and False there is nothing. RETURN True if no queen can attack any other one, False otherwise
    """
    #jupman-raise
    # bottom-left to top-right line equation
    # y = x - 3
    # -3 = -j + i
    # y = x - j + i
    # top-left to bottom-right line equation
    # y = -x + 5
    # 5 = j + i
    # y = -x + j + i
    n = len(mat)
    for i in range(n):
        for j in range(n):
            if mat[i][j]:  # queen is found at i,j
                for y in range(n):  # vertical scan
                    if y != i and mat[y][j]:
                        return False
                for x in range(n):  # horizontal scan
                    if x != j and mat[i][x]:
                        return False
                for x in range(n):
                    y = -x + j + i  # top-left to bottom-right
                    if y >= 0 and y < n and y != i and x != j and mat[y][x]:
                        return False
                    y = x - j + i  # bottom-left to top-right
                    if y >= 0 and y < n and y != i and x != j and mat[y][x]:
                        return False
    return True
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correctly, and execute the cell, Python shouldn't raise `AssertionError`
assert check_nqueen([
[True]
])
assert check_nqueen([
[True, True],
[False, False]
]) == False
assert check_nqueen([
[True, False],
[False, True]
]) == False
assert check_nqueen([
[True, False],
[True, False]
]) == False
assert check_nqueen([
[True, False, False],
[False, False, True],
[False, False, False]
]) == True
assert check_nqueen([
[True, False, False],
[False, False, False],
[False, False, True]
]) == False
assert check_nqueen([
[False, True, False],
[False, False, False],
[False, False, True]
]) == True
assert check_nqueen([
[False, True, False],
[False, True, False],
[False, False, True]
]) == False
# TEST END
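As a cross-check, here is a different and simpler formulation that does not use the line equations above: compare every pair of queens. Two queens at (i1, j1) and (i2, j2) attack each other exactly when they share a row, share a column, or satisfy |i1 - i2| == |j1 - j2|, which is the diagonal condition. The sketch below uses the hypothetical name `check_nqueen_pairs` and also covers an antidiagonal case not in the tests above.

```python
def check_nqueen_pairs(mat):
    """Hypothetical alternative: RETURN True if no two queens attack each other."""
    # collect the (row, column) position of every queen
    queens = [(i, j) for i, row in enumerate(mat) for j, cell in enumerate(row) if cell]
    # compare each pair of queens only once
    for a in range(len(queens)):
        for b in range(a + 1, len(queens)):
            i1, j1 = queens[a]
            i2, j2 = queens[b]
            same_row = i1 == i2
            same_col = j1 == j2
            same_diag = abs(i1 - i2) == abs(j1 - j2)
            if same_row or same_col or same_diag:
                return False
    return True

# two queens on the antidiagonal can attack each other
print(check_nqueen_pairs([
    [False, False, True],
    [False, True, False],
    [False, False, False]
]))  # False
```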
Matrices: Numpy solutions¶
Introduction¶
References:
Previously we’ve seen Matrices as lists of lists, here we focus on matrices using Numpy library
There are essentially two ways to represent matrices in Python: as a list of lists, or with the external library numpy. The most used is surely Numpy; let’s see the reasons and the principal differences:
List of lists (see separate notebook):
- native in Python
- not efficient
- lists are pervasive in Python, so you will probably encounter matrices expressed as lists of lists anyway
- they give an idea of how to build a nested data structure
- they may help in understanding important concepts like pointers to memory and copies
Numpy (this notebook):
- not natively available in Python
- efficient
- many libraries for scientific calculations are based on Numpy (scipy, pandas)
- the syntax to access elements is slightly different from lists of lists
- in rare cases it might give installation problems and/or conflicts (the implementation is not pure Python)
Here we will see the data types and essential commands of the Numpy library, but we will not get into the details.
The idea is simply to start using the ndarray data format without caring too much about performance: for example, even if for cycles in Python are slow because they operate cell by cell, we will use them anyway. In case you actually need to execute calculations fast, you will want to use vectorized operations, but for this we invite you to read the links below
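To give an idea of the difference, the example below doubles every cell of a small matrix in two ways. The first uses the cell-by-cell for cycles this notebook relies on. The second uses a single vectorized expression. Both give the same ndarray, and on big matrices the second form is typically much faster.

```python
import numpy as np

mat = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

# cell by cell, with nested for cycles (slow in Python, but simple)
loop_result = np.zeros(mat.shape)
num_rows, num_cols = mat.shape
for i in range(num_rows):
    for j in range(num_cols):
        loop_result[i, j] = mat[i, j] * 2

# vectorized: the multiplication is applied to all cells at once
vec_result = mat * 2

print(np.array_equal(loop_result, vec_result))  # True
```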
ATTENTION: if you want to use Numpy in Python Tutor, instead of the default interpreter Python 3.6 you will need to select Python 3.6 with Anaconda (as of May 2019 marked as experimental)
What to do¶
unzip exercises in a folder, you should get something like this:
-jupman.py
-exercises
     |- matrices-numpy
         |- matrices-numpy-exercise.ipynb
         |- matrices-numpy-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/matrices-numpy/matrices-numpy-exercise.ipynb
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try to select Kernel -> Restart
np.array¶
First of all, we import the library, and for convenience we rename it to ‘np’:
[2]:
import numpy as np
With lists of lists we have often built the matrices one row at a time, adding lists as needed. In Numpy instead we usually create in one shot the whole matrix, filling it with zeroes.
In particular, this command creates an ndarray filled with zeroes:
[3]:
mat = np.zeros( (2,3) ) # 2 rows, 3 columns
[4]:
mat
[4]:
array([[0., 0., 0.],
[0., 0., 0.]])
Note how, inside array( ), the content seems to be represented like a list of lists, BUT in reality, in physical memory, the data is structured as a linear sequence, which allows Python to access the numbers faster.
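You can peek at that linear layout in two ways. ndarray.ravel lists the elements row by row (C order, its default), which for this array is also the order in which they sit in memory. The flags attribute reports whether the data is one contiguous block in that order.

```python
import numpy as np

mat = np.array([[5.0, 8.0, 1.0],
                [4.0, 3.0, 2.0]])
print(mat.flags['C_CONTIGUOUS'])  # True: rows stored one after the other
print(mat.ravel())                # the six numbers in a single line, row by row
```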
To access or overwrite data, square bracket notation is used, with the important difference that in Numpy you can write both indices inside the same brackets, separated by a comma:
ATTENTION: the notation mat[i,j] is only for Numpy; with lists of lists it does not work!
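Here is a side-by-side check of the two notations on a small matrix of our own. The comma form works on the ndarray. The same form on a list of lists raises a TypeError, because the list receives the tuple (0, 1) as an index.

```python
import numpy as np

lol = [[0, 9, 0],
       [0, 0, 7]]
arr = np.array(lol)

print(arr[0, 1])   # 9: comma notation works on ndarray
print(lol[0][1])   # 9: lists need two separate brackets
try:
    lol[0, 1]      # a tuple index is not accepted by lists
    error = None
except TypeError as e:
    error = e
    print("TypeError:", e)
```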
Let’s put number 9 in the cell at row 0 and column 1
[5]:
mat[0,1] = 9
[6]:
mat
[6]:
array([[0., 9., 0.],
[0., 0., 0.]])
Let’s access the cell at row 0 and column 1
[7]:
mat[0,1]
[7]:
9.0
We put number 7 into the cell at row 1 and column 2
[8]:
mat[1,2] = 7
[9]:
mat
[9]:
array([[0., 9., 0.],
[0., 0., 7.]])
To get the dimensions, we write the following:
ATTENTION: after shape there are no round parentheses! shape is an attribute, not a function to call
[10]:
mat.shape
[10]:
(2, 3)
If we want to store the dimensions in separate variables, we can use this more pythonic way (note the comma between num_rows and num_cols):
[11]:
num_rows, num_cols = mat.shape
[12]:
num_rows
[12]:
2
[13]:
num_cols
[13]:
3
✪ Exercise: try to write like the following, what happens?
mat[0,0] = "c"
[14]:
# write here
We can also create an ndarray starting from a list of lists:
[15]:
mat = np.array( [ [5.0,8.0,1.0],
[4.0,3.0,2.0]])
[16]:
mat
[16]:
array([[5., 8., 1.],
[4., 3., 2.]])
[17]:
type(mat)
[17]:
numpy.ndarray
[18]:
mat[1,1]
[18]:
3.0
✪ Exercise: Try to write like this and check what happens:
mat[1,1.0]
[19]:
# write here
NaNs and infinities¶
Float numbers can be numbers and... not numbers, and infinities. Sometimes during calculations extreme conditions may arise, like when dividing a huge number by a very small one. In such cases, you might end up having a float which is a dreaded Not a Number, NaN for short, or you might get an infinity. This can lead to awful, unexpected behaviours, so you must be well aware of it.
The following behaviours are dictated by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754), which Numpy uses and which is implemented in virtually all modern CPUs, so they actually concern all programming languages.
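Here is a small illustration in plain Python, with values chosen by us. A multiplication that exceeds the largest representable float gives an infinity, and subtracting that infinity from itself gives a NaN:

```python
import math

big = 1e308
too_big = big * 10                # exceeds the float range: becomes inf
undefined = too_big - too_big     # inf - inf has no meaningful value: becomes nan
print(too_big, undefined)         # inf nan
print(math.isinf(too_big), math.isnan(undefined))  # True True
```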
NaNs¶
A NaN is Not a Number. Which is already a silly name, since a NaN is actually a very special member of floats, with this astonishing property:
WARNING: NaN IS NOT EQUAL TO ITSELF !!!!
Yes you read it right, NaN is really not equal to itself.
Even if your mind wants to refuse it, we are going to confirm it.
To get a NaN, you can use the Python module math, which holds this alien item:
[20]:
import math
math.nan # notice it prints as 'nan' with lowercase n
[20]:
nan
As we said, a NaN is actually considered a float:
[21]:
type(math.nan)
[21]:
float
Still, it behaves very differently from its fellow floats, or any other object in the known universe:
[22]:
math.nan == math.nan # what the F... alse
[22]:
False
Detecting NaN¶
Given the above, if you want to check if a variable x is a NaN, you cannot write this:
[23]:
x = math.nan
if x == math.nan:  # WRONG
    print("I'm NaN ")
else:
    print("x is something else ??")
x is something else ??
To correctly handle this situation, you need to use the math.isnan function:
[24]:
x = math.nan
if math.isnan(x):  # CORRECT
    print("x is NaN ")
else:
    print("x is something else ??")
x is NaN
Notice math.isnan also works with negative NaN:
[25]:
y = -math.nan
if math.isnan(y):  # CORRECT
    print("y is NaN ")
else:
    print("y is something else ??")
y is NaN
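Since NaN is the only float value not equal to itself, some code detects it with the test x != x. It works, but math.isnan states the intent much more clearly. The hypothetical helper below just demonstrates the property:

```python
import math

def is_nan_by_self_compare(x):
    """Hypothetical helper: True only for NaN, which is never equal to itself."""
    return x != x

print(is_nan_by_self_compare(math.nan))   # True
print(is_nan_by_self_compare(-math.nan))  # True
print(is_nan_by_self_compare(3.0))        # False
```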
Sequences with NaNs¶
Still, not everything is completely crazy. If you compare a sequence holding math.nan to another one holding the same math.nan object, you will get reasonable results:
[26]:
[math.nan, math.nan] == [math.nan, math.nan]
[26]:
True
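The reasonable result above has a precise cause. When comparing list elements, Python first checks whether the two elements are the very same object, and only if they are not does it fall back to ==. Both lists hold the same math.nan object, so the comparison succeeds. Two distinct NaN objects, such as those created by two float('nan') calls, do not compare equal:

```python
import math

same_obj = [math.nan] == [math.nan]          # the same object in both lists
distinct = [float('nan')] == [float('nan')]  # two different NaN objects
print(same_obj, distinct)  # True False
```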
Exercise NaN: two vars¶
Given two number variables x and y, write some code that prints "same" when they are the same, even when they are NaN. Otherwise, print "not the same".
[27]:
# expected output: same
x = math.nan
y = math.nan
# expected output: not the same
#x = 3
#y = math.nan
# expected output: not the same
#x = math.nan
#y = 5
# expected output: not the same
#x = 2
#y = 7
# expected output: same
#x = 4
#y = 4
# write here
if math.isnan(x) and math.isnan(y):
    print('same')
elif x == y:
    print('same')
else:
    print('not the same')
same
Operations on NaNs¶
Any operation on a NaN will generate another NaN:
[28]:
5 * math.nan
[28]:
nan
[29]:
math.nan + math.nan
[29]:
nan
[30]:
math.nan / math.nan
[30]:
nan
The only thing you cannot do is dividing by zero with an unboxed NaN:
math.nan / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-94-1da38377fac4> in <module>
----> 1 math.nan / 0
ZeroDivisionError: float division by zero
NaN corresponds to the boolean value True:
[31]:
if math.nan:
    print("That's True")
That's True
NaN and Numpy¶
When using Numpy you are quite likely to encounter NaNs, so much so that they get redefined inside Numpy, but they behave exactly like the one in the math module:
[32]:
np.nan
[32]:
nan
[33]:
math.isnan(np.nan)
[33]:
True
[34]:
np.isnan(math.nan)
[34]:
True
In Numpy, when you have unknown numbers, you might be tempted to put a None. You can actually do it, but look closely at the result:
[35]:
import numpy as np
np.array([4.9,None,3.2,5.1])
[35]:
array([4.9, None, 3.2, 5.1], dtype=object)
The resulting array is not an array of float64, which allows fast calculations; instead it is an array containing generic objects, as Numpy assumes the array holds heterogeneous data. So what you gain in generality you lose in performance, which should actually be the whole point of using Numpy.
Despite being weird, NaNs are actually regular float citizens, so they can be stored in the array:
[36]:
np.array([4.9,np.nan,3.2,5.1]) # Notice how the `dtype=object` has disappeared
[36]:
array([4.9, nan, 3.2, 5.1])
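The check below compares the dtypes of the two arrays above. It also shows that np.isnan works element by element on the float array:

```python
import numpy as np

with_none = np.array([4.9, None, 3.2, 5.1])
with_nan = np.array([4.9, np.nan, 3.2, 5.1])
print(with_none.dtype)     # object
print(with_nan.dtype)      # float64
print(np.isnan(with_nan))  # only the second element is True
```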
Where are the NaNs ?¶
Let’s try to see where we can spot NaNs and other weird things such as infinities in the wild.
First, let’s check what happens when we call the function log of the standard module math. As we know, the log function behaves like this:
\(x < 0\): not defined
\(x = 0\): tends to minus infinity
\(x > 0\): defined
So we might wonder what happens when we pass to it a value where it is not defined:
>>> math.log(-1)
ValueError Traceback (most recent call last)
<ipython-input-38-d6e02ba32da6> in <module>
----> 1 math.log(-1)
ValueError: math domain error
Let’s try the equivalent with Numpy:
[37]:
np.log(-1)
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in log
"""Entry point for launching an IPython kernel.
[37]:
nan
Notice we actually got np.nan as a result, even if Jupyter is printing a warning.
The default behaviour of Numpy regarding dangerous calculations is to perform them anyway and store the result as a NaN or another limit object. This also works for array calculations:
[38]:
np.log(np.array([3,7,-1,9]))
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in log
"""Entry point for launching an IPython kernel.
[38]:
array([1.09861229, 1.94591015, nan, 2.19722458])
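If you would rather have Numpy stop than silently produce NaNs, this default can be changed. The sketch below uses `np.errstate`, a Numpy context manager that temporarily sets how floating-point errors are handled: `'ignore'` silences the warning, while `'raise'` turns the invalid operation into a `FloatingPointError`. The input array is just the example above written with floats.

```python
import numpy as np

values = np.array([3.0, 7.0, -1.0, 9.0])

# Silence the warning: the result still contains a NaN
with np.errstate(invalid='ignore'):
    result = np.log(values)
print(result)

# Turn invalid operations into an exception instead
try:
    with np.errstate(invalid='raise'):
        np.log(values)
    raised = False
except FloatingPointError:
    raised = True
print('FloatingPointError raised:', raised)
```

Outside the `with` blocks Numpy goes back to its default behaviour (warn and continue).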
Infinities¶
As we said previously, NumPy uses the IEEE Standard for Floating-Point Arithmetic (IEEE 754). Since somebody at IEEE decided to capture the mysteries of infinity into floating-point numbers, we have yet another citizen to take into account when performing calculations (for more info see the Numpy documentation on constants):
Positive infinity np.inf¶
[39]:
np.array( [ 5 ] ) / 0
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
[39]:
array([inf])
[40]:
np.array( [ 6,9,5,7 ] ) / np.array( [ 2,0,0,4 ] )
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
[40]:
array([3. , inf, inf, 1.75])
Be aware that:
Not a Number is not equivalent to infinity
positive infinity is not equivalent to negative infinity
infinity is equivalent to positive infinity
This time, infinity is equal to infinity:
[41]:
np.inf == np.inf
[41]:
True
so we can safely detect positive infinity with ==:
[42]:
x = np.inf
if x == np.inf:
    print("x is infinite")
else:
    print("x is finite")
x is infinite
Alternatively, we can use the function np.isinf:
[43]:
np.isinf(np.inf)
[43]:
True
Negative infinity¶
We can also have negative infinity, which is different from positive infinity:
[44]:
-np.inf == np.inf
[44]:
False
Note that isinf detects both positive and negative infinity:
[45]:
np.isinf(-np.inf)
[45]:
True
To actually check for negative infinity you have to use isneginf:
[46]:
np.isneginf(-np.inf)
[46]:
True
[47]:
np.isneginf(np.inf)
[47]:
False
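All these checks also work element by element on whole arrays. The sketch below applies them to an example array mixing a regular number, both infinities and a NaN. `np.isposinf` and `np.isfinite` are further Numpy functions from the same family.

```python
import numpy as np

arr = np.array([1.5, np.inf, -np.inf, np.nan])

print(np.isinf(arr))     # any infinity
print(np.isposinf(arr))  # only positive infinity
print(np.isneginf(arr))  # only negative infinity
print(np.isnan(arr))     # only NaN
print(np.isfinite(arr))  # neither infinite nor NaN
```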
Where do they appear? As an example, let’s try the np.log function:
[48]:
np.log(0)
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log
"""Entry point for launching an IPython kernel.
[48]:
-inf
Combining infinities and NaNs¶
When performing operations involving infinities and NaNs, IEEE arithmetic tries to mimic classical analysis, sometimes including NaN as a result:
[49]:
np.inf + np.inf
[49]:
inf
[50]:
- np.inf - np.inf
[50]:
-inf
[51]:
np.inf * -np.inf
[51]:
-inf
What in classical analysis would be undefined, here becomes NaN:
[52]:
np.inf - np.inf
[52]:
nan
[53]:
np.inf / np.inf
[53]:
nan
As usual, combining with NaN results in NaN:
[54]:
np.inf + np.nan
[54]:
nan
[55]:
np.inf / np.nan
[55]:
nan
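The same holds for other indeterminate forms of classical analysis, such as infinity times zero. Here is a quick way to check a few of them at once with np.isnan. The selection of forms is only an example.

```python
import numpy as np

# Indeterminate forms: all of them produce NaN instead of an error
indeterminate = {
    'inf - inf': np.inf - np.inf,
    'inf / inf': np.inf / np.inf,
    'inf * 0': np.inf * 0,
    '0 * -inf': 0 * -np.inf,
}
for expr, value in indeterminate.items():
    print(expr, '->', value, np.isnan(value))
```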
Negative zero¶
We can even have a negative zero - who would have thought?
[56]:
np.NZERO
[56]:
-0.0
Negative zero of course pairs well with the better known and much appreciated positive zero:
[57]:
np.PZERO
[57]:
0.0
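NOTE: Writing np.NZERO or -0.0 is exactly the same thing. Same goes for np.PZERO and 0.0. Beware that NumPy 2.0 removed the np.NZERO and np.PZERO aliases, so in new code just write -0.0 and 0.0.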
At this point, you might start wondering with some concern if they are actually equal. Let’s try:
[58]:
0.0 == -0.0
[58]:
True
Great! Finally one thing that makes sense.
Given the above, you might think that in a formula you can substitute one for the other and get the same results, in harmony with the rules of the universe.
Let’s attempt a substitution. As an example, we first try dividing a number by positive zero (even if math teachers tell us such divisions are forbidden) - what will we ever get??
\(\frac{5.0}{0.0}=???\)
In Numpy terms, we might write it like this to box everything in arrays:
[59]:
np.array( [ 5.0 ] ) / np.array( [ 0.0 ] )
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
[59]:
array([inf])
Hmm, we got an array holding an np.inf.
If 0.0 and -0.0 are actually the same, by dividing a number by -0.0 we should get the very same result, shouldn’t we?
Let’s try:
[60]:
np.array( [ 5.0 ] ) / np.array( [ -0.0 ] )
/home/da/Da/bin/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
[60]:
array([-inf])
Oh gosh. This time we got an array holding a negative infinity, -np.inf.
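Since == cannot tell the two zeros apart, how can we distinguish them? One way, sketched below, is `np.signbit`, which returns True when the sign bit of a float is set. The standard library function `math.copysign` can serve the same purpose.

```python
import math
import numpy as np

for z in [0.0, -0.0]:
    # == says both are equal to zero ...
    print(z, z == 0.0, end=' ')
    # ... but the sign bit tells them apart
    print(np.signbit(z), math.copysign(1.0, z))
```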
If all of this seems odd to you, do not blame Numpy. This is the way pretty much any CPU does floating-point calculations, so you will find it in almost ALL computer languages.
What programming languages can do is add further controls to protect you from paradoxical situations: for example, when you directly write 1.0/0.0 Python raises ZeroDivisionError (thus blocking execution), while when you operate on arrays Numpy emits a warning (but doesn’t block execution).
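The two reactions can be compared side by side in the sketch below. Plain Python floats raise `ZeroDivisionError`, while the same division on Numpy arrays yields signed infinities. Silencing the warning with `np.errstate(divide='ignore')` is just one possible choice.

```python
import numpy as np

# Plain Python: dividing by zero stops with an exception
try:
    python_result = 1.0 / 0.0
except ZeroDivisionError as e:
    python_result = None
    print('Python:', type(e).__name__)

# Numpy arrays: the division goes on and yields signed infinities
with np.errstate(divide='ignore'):
    numpy_result = np.array([1.0, 1.0]) / np.array([0.0, -0.0])
print('Numpy:', numpy_result)
```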
Exercise: detect proper numbers¶
Write some code that PRINTS equal numbers if the two numbers x and y are equal and are actual numbers, and PRINTS not equal numbers otherwise.
NOTE: not equal numbers must be printed if any of the numbers is infinite or NaN.
To solve it, feel free to call the functions indicated in the Numpy documentation about constants.
[1]:
import numpy as np
# expected: equal numbers
x = 5
y = 5
# expected: not equal numbers
#x = 5
#y = 3
# expected: not equal numbers
#x = np.inf
#y = 3
# expected: not equal numbers
#x = 3
#y = np.inf
# expected: not equal numbers
#x = np.inf
#y = np.nan
# expected: not equal numbers
#x = np.nan
#y = np.inf
# expected: not equal numbers
#x = np.nan
#y = 7
# expected: not equal numbers
#x = 9
#y = np.nan
# expected: not equal numbers
#x = np.nan
#y = np.nan
# write here
# SOLUTION 1 - the ugly one
if np.isinf(x) or np.isinf(y) or np.isnan(x) or np.isnan(y) or x != y:
    print('not equal numbers')
else:
    print('equal numbers')
# SOLUTION 2 - the pretty one (np.isfinite is False for both infinities and NaN)
if np.isfinite(x) and np.isfinite(y) and x == y:
    print('equal numbers')
else:
    print('not equal numbers')
Exercise: guess expressions¶
For each of the following expressions, try to guess the result.
WARNING: the following may cause severe convulsions and nausea.
During clinical trials, both mathematically inclined and math-averse patients have experienced illness, for different reasons which are currently being investigated.
a. 0.0 * -0.0
b. (-0.0)**3
c. np.log(-7) == math.log(-7)
d. np.log(-7) == np.log(-7)
e. np.isnan( 1 / np.log(1) )
f. np.sqrt(-1) * np.sqrt(-1) # sqrt = square root
g. 3 ** np.inf
h. 3 ** -np.inf
i. 1/np.sqrt(-3)
j. 1/np.sqrt(-0.0)
m. np.sqrt(np.inf) - np.sqrt(-np.inf)
n. np.sqrt(np.inf) + ( 1 / np.sqrt(-0.0) )
o. np.isneginf(np.log(np.e) / np.sqrt(-0.0))
p. np.isinf(np.log(np.e) / np.sqrt(-0.0))
q. [np.nan, np.inf] == [np.nan, np.inf]
r. [np.nan, -np.inf] == [np.nan, np.inf]
s. [np.nan, np.inf] == [-np.nan, np.inf]
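Once you have made your guesses, you can check them, but be aware that running this reveals all the answers. The sketch below is one way to do it. Each expression is wrapped in a `lambda` so it is only evaluated when called, and Numpy warnings are silenced with `np.errstate(all='ignore')`. An expression that raises an exception (as `math.log` does with negative numbers) is reported with the name of that exception. The labels are the ones from the list above.

```python
import math
import numpy as np

expressions = {
    'a': lambda: 0.0 * -0.0,
    'b': lambda: (-0.0)**3,
    'c': lambda: np.log(-7) == math.log(-7),
    'd': lambda: np.log(-7) == np.log(-7),
    'e': lambda: np.isnan( 1 / np.log(1) ),
    'f': lambda: np.sqrt(-1) * np.sqrt(-1),
    'g': lambda: 3 ** np.inf,
    'h': lambda: 3 ** -np.inf,
    'i': lambda: 1/np.sqrt(-3),
    'j': lambda: 1/np.sqrt(-0.0),
    'm': lambda: np.sqrt(np.inf) - np.sqrt(-np.inf),
    'n': lambda: np.sqrt(np.inf) + ( 1 / np.sqrt(-0.0) ),
    'o': lambda: np.isneginf(np.log(np.e) / np.sqrt(-0.0)),
    'p': lambda: np.isinf(np.log(np.e) / np.sqrt(-0.0)),
    'q': lambda: [np.nan, np.inf] == [np.nan, np.inf],
    'r': lambda: [np.nan, -np.inf] == [np.nan, np.inf],
    's': lambda: [np.nan, np.inf] == [-np.nan, np.inf],
}

results = {}
with np.errstate(all='ignore'):  # silence Numpy RuntimeWarnings
    for label, expr in expressions.items():
        try:
            results[label] = expr()
        except Exception as e:   # e.g. ValueError raised by math.log
            results[label] = type(e).__name__

for label, value in results.items():
    print(label, value)
```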
Verify comprehension¶
odd¶
✪✪✪ Takes a Numpy matrix mat of dimension nrows by ncols containing integer numbers and RETURN a NEW Numpy matrix of dimension nrows by ncols which is like the original, but in the cells which contained even numbers there will now be odd numbers, obtained by adding 1 to the existing even number.
Example:
odd(np.array( [
[2,5,6,3],
[8,4,3,5],
[6,1,7,9]
]))
Must give as output
array([[ 3., 5., 7., 3.],
[ 9., 5., 3., 5.],
[ 7., 1., 7., 9.]])
Hints:
Since you need to return a matrix, start by creating an empty one
go through the whole input matrix with indices i and j
[62]:
import numpy as np
def odd(mat):
    #jupman-raise
    nrows, ncols = mat.shape
    ret = np.zeros( (nrows, ncols) )
    for i in range(nrows):
        for j in range(ncols):
            if mat[i,j] % 2 == 0:
                ret[i,j] = mat[i,j] + 1
            else:
                ret[i,j] = mat[i,j]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array([
[2],
])
m2 = np.array([
[3]
])
assert np.allclose(odd(m1),
m2)
assert m1[0][0] == 2 # checks we are not modifying original matrix
m3 = np.array( [
[2,5,6,3],
[8,4,3,5],
[6,1,7,9]
])
m4 = np.array( [
[3,5,7,3],
[9,5,3,5],
[7,1,7,9]
])
assert np.allclose(odd(m3),
m4)
# TEST END
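As the Other Numpy exercises section suggests, the same result can be obtained without explicit Python loops. Here is a minimal sketch using `np.where`, which picks values from its second or third argument depending on a boolean condition. The name `odd_vec` is just illustrative and not part of the exercise.

```python
import numpy as np

def odd_vec(mat):
    # Where the number is even take mat + 1, elsewhere keep mat;
    # np.where builds a new array, so the input is not modified
    return np.where(mat % 2 == 0, mat + 1, mat)

m = np.array([[2,5,6,3],
              [8,4,3,5],
              [6,1,7,9]])
print(odd_vec(m))
```

Unlike the loop version, this keeps the integer dtype of the input instead of returning floats.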
doublealt¶
✪✪✪ Takes a Numpy matrix mat of dimensions nrows x ncols containing integer numbers and RETURN a NEW Numpy matrix of dimensions nrows x ncols having, in the rows of even index, the numbers of the original matrix multiplied by two, and in the rows of odd index the same numbers as the original matrix.
Example:
m = np.array( [ # index
[ 2, 5, 6, 3], # 0 even
[ 8, 4, 3, 5], # 1 odd
[ 7, 1, 6, 9], # 2 even
[ 5, 2, 4, 1], # 3 odd
[ 6, 3, 4, 3] # 4 even
])
A call to doublealt(m) will return the Numpy matrix:
array([[ 4, 10, 12, 6],
[ 8, 4, 3, 5],
[14, 2, 12, 18],
[ 5, 2, 4, 1],
[12, 6, 8, 6]])
[63]:
import numpy as np
def doublealt(mat):
    #jupman-raise
    nrows, ncols = mat.shape
    ret = np.zeros( (nrows, ncols) )
    for i in range(nrows):
        for j in range(ncols):
            if i % 2 == 0:
                ret[i,j] = mat[i,j] * 2
            else:
                ret[i,j] = mat[i,j]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array([
[2],
])
m2 = np.array([
[4]
])
assert np.allclose(doublealt(m1),
m2)
assert m1[0][0] == 2 # checks we are not modifying original matrix
m3 = np.array( [
[ 2, 5, 6],
[ 8, 4, 3]
])
m4 = np.array( [
[ 4,10,12],
[ 8, 4, 3]
])
assert np.allclose(doublealt(m3),
m4)
m5 = np.array( [
[ 2, 5, 6, 3],
[ 8, 4, 3, 5],
[ 7, 1, 6, 9],
[ 5, 2, 4, 1],
[ 6, 3, 4, 3]
])
m6 = np.array( [
[ 4,10,12, 6],
[ 8, 4, 3, 5],
[14, 2,12,18],
[ 5, 2, 4, 1],
[12, 6, 8, 6]
])
assert np.allclose(doublealt(m5),
m6)
# TEST END
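A loop-free alternative can be sketched with slicing. `ret[::2]` selects rows 0, 2, 4, … with a step of 2, and `*= 2` doubles them in place on a copy of the input. The name `doublealt_vec` is just illustrative.

```python
import numpy as np

def doublealt_vec(mat):
    ret = mat.copy()   # work on a copy, so the input is not modified
    ret[::2] *= 2      # rows of even index 0, 2, 4, ... are doubled
    return ret

m = np.array([[2, 5, 6, 3],
              [8, 4, 3, 5],
              [7, 1, 6, 9],
              [5, 2, 4, 1],
              [6, 3, 4, 3]])
print(doublealt_vec(m))
```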
frame¶
✪✪✪ RETURN a NEW Numpy matrix of n rows and n columns, in which all the values are zero except those on the borders, which must be equal to a given k
For example, frame(4, 7.0) must give:
array([[7.0, 7.0, 7.0, 7.0],
[7.0, 0.0, 0.0, 7.0],
[7.0, 0.0, 0.0, 7.0],
[7.0, 7.0, 7.0, 7.0]])
Ingredients:
create a matrix filled with zeros. ATTENTION: which dimensions does it have? Do you need n or k? Read the text WELL.
start by filling the cells of the first row with k values. To iterate along the columns of the first row, use a for j in range(n)
fill the other rows and columns, using appropriate for loops
[64]:
def frame(n, k):
    #jupman-raise
    mat = np.zeros( (n,n) )
    for i in range(n):
        mat[0, i] = k
        mat[i, 0] = k
        mat[i, n-1] = k
        mat[n-1, i] = k
    return mat
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
expected_mat = np.array( [[7.0, 7.0, 7.0, 7.0],
[7.0, 0.0, 0.0, 7.0],
[7.0, 0.0, 0.0, 7.0],
[7.0, 7.0, 7., 7.0]])
# np.allclose returns True if all the values in the first matrix are close enough
# (that is, within a given tolerance) to corresponding values in the second
assert np.allclose(frame(4, 7.0), expected_mat)
expected_mat = np.array( [ [7.0]
])
assert np.allclose(frame(1, 7.0), expected_mat)
expected_mat = np.array( [ [7.0, 7.0],
[7.0, 7.0]
])
assert np.allclose(frame(2, 7.0), expected_mat)
# TEST END
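With slicing, each border can be filled in a single assignment. In this sketch, index `-1` stands for the last row or column, and `:` selects a whole row or column. The name `frame_vec` is just illustrative.

```python
import numpy as np

def frame_vec(n, k):
    mat = np.zeros((n, n))
    mat[0, :] = k    # first row
    mat[-1, :] = k   # last row
    mat[:, 0] = k    # first column
    mat[:, -1] = k   # last column
    return mat

print(frame_vec(4, 7.0))
```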
chessboard¶
✪✪✪ RETURN a NEW Numpy matrix of n rows and n columns, in which all cells alternate zeros and ones.
For example, chessboard(4) must give:
array([[1.0, 0.0, 1.0, 0.0],
[0.0, 1.0, 0.0, 1.0],
[1.0, 0.0, 1.0, 0.0],
[0.0, 1.0, 0.0, 1.0]])
Ingredients:
to alternate, you can use range in the form which takes 3 parameters: for example range(0,n,2) starts from 0 and arrives at n excluded, skipping one item each time, thus generating 0,2,4,6,8, …
instead, range(1,n,2) would generate 1,3,5,7, …
[65]:
def chessboard(n):
    #jupman-raise
    mat = np.zeros( (n,n) )
    for i in range(0,n, 2):
        for j in range(0,n, 2):
            mat[i, j] = 1
    for i in range(1,n, 2):
        for j in range(1,n, 2):
            mat[i, j] = 1
    return mat
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
expected_mat = np.array([[1.0, 0.0, 1.0, 0.0],
[0.0, 1.0, 0.0, 1.0],
[1.0, 0.0, 1.0, 0.0],
[0.0, 1.0, 0.0, 1.0]])
# all_close return True if all the values in the first matrix are close enough
# (that is, within a certain tolerance) to the corresponding ones in the second matrix
assert np.allclose(chessboard(4), expected_mat)
expected_mat = np.array( [ [1.0]
])
assert np.allclose(chessboard(1), expected_mat)
expected_mat = np.array( [ [1.0, 0.0],
[0.0, 1.0]
])
assert np.allclose(chessboard(2), expected_mat)
# TEST END
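Looking at the expected result, a cell holds 1 exactly when its row index plus its column index is even. A loop-free sketch can use this observation with `np.indices`, which returns the row and column index of every cell. The name `chessboard_vec` is just illustrative.

```python
import numpy as np

def chessboard_vec(n):
    # i and j hold, for each cell, its row and column index
    i, j = np.indices((n, n))
    # 1.0 where row + column is even, 0.0 elsewhere
    return ((i + j) % 2 == 0).astype(float)

print(chessboard_vec(4))
```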
altsum¶
✪✪✪ MODIFY the input Numpy matrix (n x n) by adding to each row of odd index the row of even index right above it. For example:
m = [[1.0, 3.0, 2.0, 5.0],
[2.0, 8.0, 5.0, 9.0],
[6.0, 9.0, 7.0, 2.0],
[4.0, 7.0, 2.0, 4.0]]
altsum(m)
after the call to altsum, m should be:
m = [[1.0, 3.0, 2.0, 5.0],
[3.0, 11.0,7.0, 14.0],
[6.0, 9.0, 7.0, 2.0],
[10.0,16.0,9.0, 6.0]]
Ingredients:
to alternate, you can use range in the form which takes 3 parameters: for example range(0,n,2) starts from 0 and arrives at n excluded, skipping one item each time, thus generating 0,2,4,6,8, …
instead, range(1,n,2) would generate 1,3,5,7, …
[66]:
def altsum(mat):
    #jupman-raise
    nrows, ncols = mat.shape
    for i in range(1,nrows, 2):
        for j in range(0,ncols):
            mat[i, j] = mat[i,j] + mat[i-1, j]
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array( [
[1.0, 3.0, 2.0, 5.0],
[2.0, 8.0, 5.0, 9.0],
[6.0, 9.0, 7.0, 2.0],
[4.0, 7.0, 2.0, 4.0]
])
r1 = np.array( [
[1.0, 3.0, 2.0, 5.0],
[3.0, 11.0,7.0, 14.0],
[6.0, 9.0, 7.0, 2.0],
[10.0,16.0,9.0, 6.0]
])
altsum(m1)
assert np.allclose(m1, r1)
m2 = np.array( [ [5.0] ])
r2 = np.array( [ [5.0] ])
altsum(m2)
assert np.allclose(m2, r2)
m3 = np.array( [ [6.0, 1.0],
[3.0, 2.0]
])
r3 = np.array( [ [6.0, 1.0],
[9.0, 3.0]
])
altsum(m3)
assert np.allclose(m3, r3)
# TEST END
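The same in-place modification can be sketched with two slices of the matrix. `mat[1::2]` selects the odd rows 1, 3, 5, …, and `mat[:-1:2]` selects the even rows right above them (0, 2, 4, …). The name `altsum_vec` is just illustrative.

```python
import numpy as np

def altsum_vec(mat):
    # mat[:-1:2] stops before the last row, so that both selections
    # always contain the same number of rows, even when nrows is odd
    mat[1::2] += mat[:-1:2]

m = np.array([[1.0, 3.0, 2.0, 5.0],
              [2.0, 8.0, 5.0, 9.0],
              [6.0, 9.0, 7.0, 2.0],
              [4.0, 7.0, 2.0, 4.0]])
altsum_vec(m)
print(m)
```

Like `altsum`, the function returns nothing: since slices are views, `+=` writes directly into the input matrix.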
avg_rows¶
✪✪✪ Takes a Numpy matrix n x m and RETURN a NEW Numpy matrix consisting of a single column, in which the values are the averages of the values in the corresponding rows of the input matrix
Example:
Input: 5x4 matrix
3 2 1 4
6 2 3 5
4 3 6 2
4 6 5 4
7 2 9 3
Output: 5x1 matrix
(3+2+1+4)/4
(6+2+3+5)/4
(4+3+6+2)/4
(4+6+5+4)/4
(7+2+9+3)/4
Ingredients:
create an n x 1 matrix to return, filling it with zeros
visit all cells of the original matrix with two nested fors
during the visit, accumulate in the matrix to return the sum of the elements taken from each row of the original matrix
once the sum of a row is complete, divide it by the number of columns of the original matrix
return the matrix
[67]:
def avg_rows(mat):
    #jupman-raise
    nrows, ncols = mat.shape
    ret = np.zeros( (nrows,1) )
    for i in range(nrows):
        for j in range(ncols):
            ret[i] += mat[i,j]
        ret[i] = ret[i] / ncols
        # for brevity we could also write
        # ret[i] /= ncols
    #/jupman-raise
    return ret
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array([ [5.0] ])
r1 = np.array([ [5.0] ])
assert np.allclose(avg_rows(m1), r1)
m2 = np.array([ [5.0, 3.0] ])
r2 = np.array([ [4.0] ])
assert np.allclose(avg_rows(m2), r2)
m3 = np.array([ [3,2,1,4],
[6,2,3,5],
[4,3,6,2],
[4,6,5,4],
[7,2,9,3] ])
r3 = np.array([ [(3+2+1+4)/4],
[(6+2+3+5)/4],
[(4+3+6+2)/4],
[(4+6+5+4)/4],
[(7+2+9+3)/4] ])
assert np.allclose(avg_rows(m3), r3)
# TEST END
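Numpy can also compute row averages directly. In the sketch below, `axis=1` averages along each row, and `keepdims=True` keeps the result as an n x 1 column instead of a flat array of n elements. The name `avg_rows_vec` is just illustrative.

```python
import numpy as np

def avg_rows_vec(mat):
    # one average per row, returned as a column
    return mat.mean(axis=1, keepdims=True)

m = np.array([[3,2,1,4],
              [6,2,3,5],
              [4,3,6,2],
              [4,6,5,4],
              [7,2,9,3]])
print(avg_rows_vec(m))
```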
avg_half¶
✪✪✪ Takes as input a Numpy matrix with an even number of columns, and RETURN as output a Numpy matrix 1x2, in which the first element will be the average of the left half of the matrix, and the second element will be the average of the right half.
Ingredients:
to obtain the number of columns divided by two as an integer number, use the // operator
[68]:
def avg_half(mat):
    #jupman-raise
    nrows, ncols = mat.shape
    half_cols = ncols // 2
    avg_sx = 0.0
    avg_dx = 0.0
    # write here
    for i in range(nrows):
        for j in range(half_cols):
            avg_sx += mat[i,j]
        for j in range(half_cols, ncols):
            avg_dx += mat[i,j]
    half_elements = nrows * half_cols
    avg_sx /= half_elements
    avg_dx /= half_elements
    return np.array([avg_sx, avg_dx])
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array([[3,2,1,4],
[6,2,3,5],
[4,3,6,2],
[4,6,5,4],
[7,2,9,3]])
r1 = np.array([(3+2+6+2+4+3+4+6+7+2)/10, (1+4+3+5+6+2+5+4+9+3)/10 ])
assert np.allclose( avg_half(m1), r1)
# TEST END
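With slicing, each half can be selected directly. `mat[:, :half_cols]` takes all rows and the left columns, and `.mean()` averages every cell of the selection. The name `avg_half_vec` is just illustrative.

```python
import numpy as np

def avg_half_vec(mat):
    half_cols = mat.shape[1] // 2
    left = mat[:, :half_cols]    # all rows, left half of the columns
    right = mat[:, half_cols:]   # all rows, right half of the columns
    return np.array([left.mean(), right.mean()])

m = np.array([[3,2,1,4],
              [6,2,3,5],
              [4,3,6,2],
              [4,6,5,4],
              [7,2,9,3]])
print(avg_half_vec(m))
```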
matxarr¶
✪✪✪ Takes a Numpy matrix n x m and an ndarray of m elements, and RETURN a NEW Numpy matrix in which the values of each column of the input matrix are multiplied by the corresponding value in the m-element array.
[69]:
def matxarr(mat, arr):
    #jupman-raise
    ret = np.zeros( mat.shape )
    for i in range(mat.shape[0]):
        for j in range(mat.shape[1]):
            ret[i,j] = mat[i,j] * arr[j]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array([ [3,2,1],
[6,2,3],
[4,3,6],
[4,6,5]])
a1 = [5, 2, 6]
r1 = [ [3*5, 2*2, 1*6],
[6*5, 2*2, 3*6],
[4*5, 3*2, 6*6],
[4*5, 6*2, 5*6]]
assert np.allclose(matxarr(m1,a1), r1)
# TEST END
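This is exactly what Numpy broadcasting does when an n x m matrix is multiplied by an array of m elements. The array is stretched along the rows, so column j of the matrix is multiplied by arr[j]. A minimal sketch, where the name `matxarr_vec` is just illustrative:

```python
import numpy as np

def matxarr_vec(mat, arr):
    # np.asarray also accepts plain lists; the product is a new array
    return mat * np.asarray(arr)

m = np.array([[3,2,1],
              [6,2,3],
              [4,3,6],
              [4,6,5]])
a = [5, 2, 6]
print(matxarr_vec(m, a))
```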
quadrants¶
✪✪✪ Given a matrix 2n * 2n, divide the matrix in 4 equal square parts (see example) and RETURN a NEW matrix 2 * 2 containing the average of each quadrant.
We assume the matrix always has even dimensions.
HINT: to divide by two and obtain an integer number, use the // operator
Example:
1, 2 , 5 , 7
4, 1 , 8 , 0
2, 0 , 5 , 1
0, 2 , 1 , 1
can be divided in
1, 2 | 5 , 7
4, 1 | 8 , 0
-----------------
2, 0 | 5 , 1
0, 2 | 1 , 1
and returns
(1+2+4+1)/ 4 | (5+7+8+0)/4 2.0 , 5.0
----------------------------- => 1.0 , 2.0
(2+0+0+2)/4 | (5+1+1+1)/4
[70]:
import numpy as np
def quadrants(mat):
    #jupman-raise
    ret = np.zeros( (2,2) )
    dim = mat.shape[0]
    n = dim // 2
    elements_per_quad = n * n
    for i in range(n):
        for j in range(n):
            ret[0,0] += mat[i,j]
    ret[0,0] /= elements_per_quad
    for i in range(n,dim):
        for j in range(n):
            ret[1,0] += mat[i,j]
    ret[1,0] /= elements_per_quad
    for i in range(n,dim):
        for j in range(n,dim):
            ret[1,1] += mat[i,j]
    ret[1,1] /= elements_per_quad
    for i in range(n):
        for j in range(n,dim):
            ret[0,1] += mat[i,j]
    ret[0,1] /= elements_per_quad
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
assert np.allclose(
quadrants(np.array([
[3.0, 5.0],
[4.0, 9.0],
])),
np.array([
[3.0, 5.0],
[4.0, 9.0],
]))
assert np.allclose(
quadrants(np.array([
[1.0, 2.0 , 5.0 , 7.0],
[4.0, 1.0 , 8.0 , 0.0],
[2.0, 0.0 , 5.0 , 1.0],
[0.0, 2.0 , 1.0 , 1.0]
])),
np.array([
[2.0, 5.0],
[1.0, 2.0]
]))
# TEST END
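A compact alternative, sketched below, reshapes the 2n x 2n matrix into shape (2, n, 2, n). Element [a, i, b, j] of the reshaped array is mat[a*n + i, b*n + j], so averaging over axes 1 and 3 leaves the average of quadrant [a, b]. The name `quadrants_vec` is just illustrative.

```python
import numpy as np

def quadrants_vec(mat):
    n = mat.shape[0] // 2
    # a, b pick the quadrant; i, j move inside it
    return mat.reshape(2, n, 2, n).mean(axis=(1, 3))

m = np.array([[1.0, 2.0, 5.0, 7.0],
              [4.0, 1.0, 8.0, 0.0],
              [2.0, 0.0, 5.0, 1.0],
              [0.0, 2.0, 1.0, 1.0]])
print(quadrants_vec(m))
```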
matrot¶
✪✪✪ RETURN a NEW Numpy matrix which has the numbers of the input matrix rotated by one column.
By rotation we mean that:
if a number of the input matrix is found in column j, in the output matrix it will be in column j+1 of the same row.
if a number is found in the last column, in the output matrix it will be in the zeroth column
Example:
If we have as input:
np.array( [
[0,1,0],
[1,1,0],
[0,0,0],
[0,1,1]
])
We expect as output:
np.array( [
[0,0,1],
[0,1,1],
[0,0,0],
[1,0,1]
])
[71]:
import numpy as np
def matrot(mat):
    #jupman-raise
    ret = np.zeros(mat.shape)
    for i in range(mat.shape[0]):
        ret[i,0] = mat[i,-1]
        for j in range(1, mat.shape[1]):
            ret[i,j] = mat[i,j-1]
    return ret
    #/jupman-raise
# TEST START - DO NOT TOUCH!
# if you wrote the whole code correct, and execute the cell, Python shouldn't raise `AssertionError`
m1 = np.array( [ [1] ])
r1 = np.array( [ [1] ])
assert np.allclose(matrot(m1), r1)
m2 = np.array( [ [0,1] ])
r2 = np.array( [ [1,0] ])
assert np.allclose(matrot(m2), r2)
m3 = np.array( [ [0,1,0] ])
r3 = np.array( [ [0,0,1] ])
assert np.allclose(matrot(m3), r3)
m4 = np.array( [
[0,1,0],
[1,1,0]
])
r4 = np.array( [
[0,0,1],
[0,1,1]
])
assert np.allclose(matrot(m4), r4)
m5 = np.array([
[0,1,0],
[1,1,0],
[0,0,0],
[0,1,1]
])
r5 = np.array([
[0,0,1],
[0,1,1],
[0,0,0],
[1,0,1]
])
assert np.allclose(matrot(m5), r5)
# TEST END
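Numpy already provides this kind of circular shift with `np.roll`. Shifting by 1 along `axis=1` moves every column one step to the right and wraps the last one around to position 0, returning a new array. A minimal sketch, where the name `matrot_vec` is just illustrative:

```python
import numpy as np

def matrot_vec(mat):
    return np.roll(mat, 1, axis=1)

m = np.array([[0,1,0],
              [1,1,0],
              [0,0,0],
              [0,1,1]])
print(matrot_vec(m))
```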
Other Numpy exercises¶
Try to do the exercises from lists of lists using Numpy instead.
Try to make the exercises more performant by using Numpy features and functions (e.g. 2*arr multiplies all numbers in arr without the need of a slow Python for)
machinelearningplus Numpy exercises (in English; stop at difficulty L1, L2 and, if you want, try L3)
Data formats solutions¶
Introduction¶
Here we review how to load and write tabular data such as CSV, tree-like data such as JSON files, and how to fetch it from the web with web APIs.
Graph formats are treated in a separate notebook.
In this tutorial we will talk about these data formats:
textual files
line-based files
CSV
open data catalogs
license mentions (Creative Commons, ...)
What to do¶
unzip the exercises in a folder; you should get something like this:
-jupman.py
-exercises
|- matrices
|- formats-exercise.ipynb
|- formats-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook exercises/matrix-networks/matrix-networks-exercise.ipynb
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try to select Kernel -> Restart
1. line files¶
Line files are typically text files which contain information grouped by lines. An example using historical characters might be like the following:
Leonardo
da Vinci
Sandro
Botticelli
Niccolò
Macchiavelli
We can immediately see a regularity: the first two lines contain data of Leonardo da Vinci, first the name and then the surname. The following lines instead contain data of Sandro Botticelli, again with first the name and then the surname, and so on.
We might want to write a program that reads the lines and prints names and surnames on the terminal like the following:
Leonardo da Vinci
Sandro Botticelli
Niccolò Macchiavelli
To get a first approximation of the final result, we can open the file, read only the first line and print it:
[1]:
with open('people-simple.txt', encoding='utf-8') as f:
    line=f.readline()
    print(line)
Leonardo
What happened? Let’s examine the first lines:
open command¶
The command open('people-simple.txt', encoding='utf-8') allows us to open the text file by telling Python the file path 'people-simple.txt' and the encoding in which it was written (encoding='utf-8').
The encoding¶
The encoding depends on the operating system and on the editor used to write the file. When we open a file, Python cannot guess the encoding, and if we do not specify anything Python might open the file assuming an encoding different from the original - in other words, if we omit the encoding (or we put a wrong one) we might end up seeing weird characters (like little squares instead of accented letters).
In general, when you open a file, first try specifying the encoding utf-8, which is the most common one. If it doesn’t work try others: for example, for files written in southern Europe with Windows you might check encoding='latin-1'. If you open a file written elsewhere, you might need other encodings. For more in-depth information, you can read Dive into Python - Chapter 4 - Strings, and Dive into Python - Chapter 11 - File, both of which are highly recommended readings.
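To see the weird characters with your own eyes, here is a small sketch. It writes a hypothetical sample file in UTF-8 to a temporary folder, so it does not touch the course files, and then reads it back once with the right encoding and once with a wrong one.

```python
import os
import tempfile

# Hypothetical sample file with an accented letter, written in UTF-8
path = os.path.join(tempfile.mkdtemp(), 'encoding-example.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Niccolò\n')

# Reading with the right encoding gives back the original text
with open(path, encoding='utf-8') as f:
    right = f.readline().rstrip()

# Reading with a wrong encoding silently produces weird characters
with open(path, encoding='latin-1') as f:
    wrong = f.readline().rstrip()

print(right)
print(wrong)
```

Note that Python raises no error in the second case: the only symptom is the garbled text.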
with block¶
The with defines a block with instructions inside:
with open('people-simple.txt', encoding='utf-8') as f:
    line=f.readline()
    print(line)
We used the with to tell Python that in any case, even if errors occur, Python must automatically close the file after we have used it, that is, after executing the instructions inside the internal block (the line=f.readline() and print(line)). Properly closing a file avoids wasting memory resources and creating hard-to-find paranormal errors. If you want to avoid hunting for never-closed zombie files, always remember to open all files in with blocks!
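To check that the file really gets closed even when something goes wrong, here is a small sketch. It creates a hypothetical sample file in a temporary folder, so it does not rely on people-simple.txt, and raises an error on purpose inside the with block. It then inspects the `closed` attribute of the file object.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Leonardo\nda Vinci\n')

try:
    with open(path, encoding='utf-8') as f:
        line = f.readline()
        raise RuntimeError('something went wrong')  # simulated error
except RuntimeError as e:
    print('caught:', e)

# Python closed the file anyway when leaving the with block
print('file closed?', f.closed)
```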
Furthermore, at the end of the line, in the part as f:, we assigned the file to a variable here called f, but we could have used any other name we liked.
WARNING: To indent the code, ALWAYS use sequences of four white spaces. Sequences of only 2 spaces, even if allowed, are not recommended.
WARNING: Depending on the editor you use, by pressing TAB you might get a sequence of white spaces, as happens in Jupyter (4 spaces, which is the recommended length), or a special tabulation character (to avoid)! As annoying as this distinction might appear, remember it, because it might generate very hard-to-find errors.
WARNING: In the commands to create blocks such as with, always remember to put the colon character : at the end of the line !
The command line=f.readline() puts the entire line in the variable line, as a string. Warning: the string will contain the special newline character at the end !
You might wonder where that readline comes from. Like everything in Python, our variable f, which represents the file we just opened, is an object, and like any object, depending on its type, it has particular methods we can use on it. In this case the method is readline.
The following command prints the string content:
print(line)
✪ 1.1 Exercise: Try to rewrite here the block we’ve just seen, and execute the cell by pressing Control-Enter. Type the code with your fingers, not with copy-paste ! Pay attention to correct indentation with spaces in the block.
[2]:
# write here
with open('people-simple.txt', encoding='utf-8') as f:
    line=f.readline()
    print(line)
Leonardo
✪ 1.2 Exercise: you might be wondering what exactly that f is, and what exactly the method readline does. When you find yourself in these situations, you can help yourself with the functions type and help. This time, directly copy-paste the same code here, but insert inside the with block the commands:
print(type(f))
print(help(f))
print(help(f.readline))
# Attention: remember the f. before the readline !!
Every time you add something, try to execute with Control+Enter and see what happens.
[3]:
# write here the code (copy and paste)
with open('people-simple.txt', encoding='utf-8') as f:
    line=f.readline()
    print(type(f))
    print(help(f.readline))
    print(help(f))
    print(line)
<class '_io.TextIOWrapper'>
Help on built-in function readline:
readline(size=-1, /) method of _io.TextIOWrapper instance
Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
None
Help on TextIOWrapper object:
class TextIOWrapper(_TextIOBase)
| Character and line based layer over a BufferedIOBase object, buffer.
|
| encoding gives the name of the encoding that the stream will be
| decoded or encoded with. It defaults to locale.getpreferredencoding(False).
|
| errors determines the strictness of encoding and decoding (see
| help(codecs.Codec) or the documentation for codecs.register) and
| defaults to "strict".
|
| newline controls how line endings are handled. It can be None, '',
| '\n', '\r', and '\r\n'. It works as follows:
|
| * On input, if newline is None, universal newlines mode is
| enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
| these are translated into '\n' before being returned to the
| caller. If it is '', universal newline mode is enabled, but line
| endings are returned to the caller untranslated. If it has any of
| the other legal values, input lines are only terminated by the given
| string, and the line ending is returned to the caller untranslated.
|
| * On output, if newline is None, any '\n' characters written are
| translated to the system default line separator, os.linesep. If
| newline is '' or '\n', no translation takes place. If newline is any
| of the other legal values, any '\n' characters written are translated
| to the given string.
|
| If line_buffering is True, a call to flush is implied when a call to
| write contains a newline character.
|
| Method resolution order:
| TextIOWrapper
| _TextIOBase
| _IOBase
| builtins.object
|
| Methods defined here:
|
| __getstate__(...)
|
| __init__(self, /, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| __next__(self, /)
| Implement next(self).
|
| __repr__(self, /)
| Return repr(self).
|
| close(self, /)
| Flush and close the IO object.
|
| This method has no effect if the file is already closed.
|
| detach(self, /)
| Separate the underlying buffer from the TextIOBase and return it.
|
| After the underlying buffer has been detached, the TextIO is in an
| unusable state.
|
| fileno(self, /)
| Returns underlying file descriptor if one exists.
|
| OSError is raised if the IO object does not use a file descriptor.
|
| flush(self, /)
| Flush write buffers, if applicable.
|
| This is not implemented for read-only and non-blocking streams.
|
| isatty(self, /)
| Return whether this is an 'interactive' stream.
|
| Return False if it can't be determined.
|
| read(self, size=-1, /)
| Read at most n characters from stream.
|
| Read from underlying buffer until we have n characters or we hit EOF.
| If n is negative or omitted, read until EOF.
|
| readable(self, /)
| Return whether object was opened for reading.
|
| If False, read() will raise OSError.
|
| readline(self, size=-1, /)
| Read until newline or EOF.
|
| Returns an empty string if EOF is hit immediately.
|
| seek(self, cookie, whence=0, /)
| Change stream position.
|
| Change the stream position to the given byte offset. The offset is
| interpreted relative to the position indicated by whence. Values
| for whence are:
|
| * 0 -- start of stream (the default); offset should be zero or positive
| * 1 -- current stream position; offset may be negative
| * 2 -- end of stream; offset is usually negative
|
| Return the new absolute position.
|
| seekable(self, /)
| Return whether object supports random access.
|
| If False, seek(), tell() and truncate() will raise OSError.
| This method may need to do a test seek().
|
| tell(self, /)
| Return current stream position.
|
| truncate(self, pos=None, /)
| Truncate file to size bytes.
|
| File pointer is left unchanged. Size defaults to the current IO
| position as reported by tell(). Returns the new size.
|
| writable(self, /)
| Return whether object was opened for writing.
|
| If False, write() will raise OSError.
|
| write(self, text, /)
| Write string to stream.
| Returns the number of characters written (which is always equal to
| the length of the string).
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| buffer
|
| closed
|
| encoding
| Encoding of the text stream.
|
| Subclasses should override.
|
| errors
| The error setting of the decoder or encoder.
|
| Subclasses should override.
|
| line_buffering
|
| name
|
| newlines
| Line endings translated so far.
|
| Only line endings translated during reading are considered.
|
| Subclasses should override.
|
| ----------------------------------------------------------------------
| Methods inherited from _IOBase:
|
| __del__(...)
|
| __enter__(...)
|
| __exit__(...)
|
| __iter__(self, /)
| Implement iter(self).
|
| readlines(self, hint=-1, /)
| Return a list of lines from the stream.
|
| hint can be specified to control the number of lines read: no more
| lines will be read if the total size (in bytes/characters) of all
| lines so far exceeds hint.
|
| writelines(self, lines, /)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from _IOBase:
|
| __dict__
None
Leonardo
First we put the content of the first line into the variable line. Now we might put it in a variable with a more meaningful name, like name. We can also read the next row directly into the variable surname and then print the concatenation of both:
[4]:
with open('people-simple.txt', encoding='utf-8') as f:
    name=f.readline()
    surname=f.readline()
    print(name + ' ' + surname)
Leonardo
da Vinci
PROBLEM! The printed output contains an unexpected line break. Why is that? As we said earlier, readline returns the line content as a string that also includes the special newline character at the end. To remove it, you can use the string method rstrip():
[5]:
with open('people-simple.txt', encoding='utf-8') as f:
    name=f.readline().rstrip()
    surname=f.readline().rstrip()
    print(name + ' ' + surname)
Leonardo da Vinci
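To see the newline that readline keeps, here is a minimal sketch. It uses io.StringIO as an in-memory stand-in for people-simple.txt (an assumption made only so the snippet runs without the file); StringIO offers the same readline() method as a file opened in text mode.

```python
import io

# io.StringIO behaves like a text file opened for reading, but lives in memory
f = io.StringIO('Leonardo\nda Vinci\n')

raw = f.readline()
cleaned = raw.rstrip()    # with no arguments, rstrip() removes all trailing whitespace
print(repr(raw))          # the newline character is still attached at the end
print(repr(cleaned))
```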
✪ 1.3 Exercise: Again, rewrite the block above in the cell below, and execute the cell with Control+Enter. Question: what happens if you use strip() instead of rstrip()? What about lstrip()? Can you deduce the meaning of r and l? If you can’t manage it, try the Python help command by calling help(str.rstrip)
[6]:
# write here
with open('people-simple.txt', encoding='utf-8') as f:
    name=f.readline().rstrip()
    surname=f.readline().rstrip()
    print(name + ' ' + surname)
Leonardo da Vinci
Very good, we have the first line! Now we can read all the lines in sequence. To this end, we can use a while loop:
[7]:
with open('people-simple.txt', encoding='utf-8') as f:
    line=f.readline()
    while line != "":
        name = line.rstrip()
        surname=f.readline().rstrip()
        print(name + ' ' + surname)
        line=f.readline()
Leonardo da Vinci
Sandro Botticelli
Niccolò Macchiavelli
NOTE: In Python there are shorter ways to read a text file line by line; we used this approach to make every step explicit.
What did we do? First, we added a while loop in a new block.
WARNING: since the new block is already within the outer with, its instructions are indented by 8 spaces and not 4! If you use the wrong indentation, bad things happen!
We first read a line, and two cases are possible:
we are at the end of the file (or the file is empty): in this case the readline() call returns an empty string
we are not at the end of the file: the line is put as a string inside the variable line. Since Python internally uses a pointer to keep track of the current position inside the file, after the read this pointer is moved to the beginning of the next line. This way the next call to readline() will read a line from the new position.
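A small sketch of the two cases, again with io.StringIO standing in for a real file (an assumption for self-containment): at the end of the stream readline() returns '', while an empty line in the middle of a file comes back as '\n', so the while condition line != "" does not stop early on blank lines.

```python
import io

f = io.StringIO('Leonardo\nda Vinci\n')
lines_read = [f.readline(), f.readline(), f.readline()]
print(lines_read)    # the third call happens at the end of the stream

# an empty line in the middle of a file is NOT the end: it still holds '\n'
g = io.StringIO('first\n\nthird\n')
g.readline()
blank = g.readline()
print(repr(blank))
```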
In the while block we tell Python to continue the loop as long as line is not empty. In that case, inside the while block we parse the name from the line and put it in the variable name (removing the extra newline character with rstrip() as we did before), then we read the next line and put the parsed result in the surname variable. Finally, we read another line into the line variable, so it will be ready for the next round of name extraction. If line is empty, the loop terminates:
while line != "":                  # enter the loop if line contains characters
    name = line.rstrip()           # parses the name
    surname=f.readline().rstrip()  # reads the next line and parses the surname
    print(name + ' ' + surname)
    line=f.readline()              # reads the next line
✪ 1.4 EXERCISE: As before, rewrite in the cell below the code with the while, paying attention to the indentation (for the outer with line, use copy-and-paste):
[8]:
# write here the code of internal while
with open('people-simple.txt', encoding='utf-8') as f:
    line=f.readline()
    while line != "":
        name = line.rstrip()
        surname=f.readline().rstrip()
        print(name + ' ' + surname)
        line=f.readline()
Leonardo da Vinci
Sandro Botticelli
Niccolò Macchiavelli
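One of the shorter ways hinted at in the NOTE above is to iterate over the file object directly with a for loop, which yields one line per iteration, and to call next() on the same object to fetch the surname. This is a sketch that assumes the file has an even number of lines (otherwise next() raises StopIteration). io.StringIO replaces open('people-simple.txt', encoding='utf-8') only to keep it self-contained.

```python
import io

# in-memory stand-in for people-simple.txt; assumes an even number of lines
f = io.StringIO('Leonardo\nda Vinci\nSandro\nBotticelli\n')

full_names = []
for name_line in f:                  # a file object yields one line per iteration
    surname_line = next(f)           # take the following line from the same file
    full_names.append(name_line.rstrip() + ' ' + surname_line.rstrip())

for full_name in full_names:
    print(full_name)
```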
people-complex line file
Look at the file people-complex.txt:
name: Leonardo
surname: da Vinci
birthdate: 1452-04-15
name: Sandro
surname: Botticelli
birthdate: 1445-03-01
name: Niccolò
surname: Macchiavelli
birthdate: 1469-05-03
Suppose you want to read the file and print this output: how would you do it?
Leonardo da Vinci, 1452-04-15
Sandro Botticelli, 1445-03-01
Niccolò Macchiavelli, 1469-05-03
Hint 1: to obtain from the string 'abcde' the substring 'cde', which starts at index 2, you can use the square brackets operator with the start index followed by a colon :
[9]:
x = 'abcde'
x[2:]
[9]:
'cde'
[10]:
x[3:]
[10]:
'de'
Hint 2: To know the length of a string, use the function len:
[11]:
len('abcde')
[11]:
5
✪ 1.5 Exercise: Write here the solution of the exercise ‘People complex’:
[12]:
# write here
with open('people-complex.txt', encoding='utf-8') as f:
    line=f.readline()
    while line != "":
        name = line.rstrip()[len("name: "):]
        surname= f.readline().rstrip()[len("surname: "):]
        born = f.readline().rstrip()[len("birthdate: "):]
        print(name + ' ' + surname + ', ' + born)
        line=f.readline()
Leonardo da Vinci, 1452-04-15
Sandro Botticelli, 1445-03-01
Niccolò Macchiavelli, 1469-05-03
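The solution above relies on knowing the exact length of each prefix. An alternative sketch uses the standard str.partition method, which splits a string at the first occurrence of a separator, so the prefix length no longer matters. The helper field_value is hypothetical, and io.StringIO stands in for people-complex.txt.

```python
import io

# in-memory stand-in for the first record of people-complex.txt
f = io.StringIO('name: Leonardo\nsurname: da Vinci\nbirthdate: 1452-04-15\n')

def field_value(line):
    """Hypothetical helper: return the text after the first ': ' of a 'key: value' line."""
    _, _, value = line.rstrip().partition(': ')
    return value

name = field_value(f.readline())
surname = field_value(f.readline())
born = field_value(f.readline())
print(name + ' ' + surname + ', ' + born)
```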
Exercise: line file immersione-in-python-toc
✪✪✪ This exercise is more challenging; if you are a beginner you might skip it and go on to CSVs.
The book Dive into Python is nice, and for the Italian version there is a PDF, which has a problem though: if you try to print it, you will discover that the table of contents is missing. Not to despair, we found a program to extract the titles into a file like the one below, but you will discover it is not exactly nice to look at. Since we are Python ninjas, we decided to transform the raw titles into a real table of contents. Sure enough, there are smarter ways to do this, like loading the PDF in Python with an appropriate module for PDFs; still, this makes for an interesting exercise.
You are given the file immersione-in-python-toc.txt:
BookmarkBegin
BookmarkTitle: Il vostro primo programma Python
BookmarkLevel: 1
BookmarkPageNumber: 38
BookmarkBegin
BookmarkTitle: Immersione!
BookmarkLevel: 2
BookmarkPageNumber: 38
BookmarkBegin
BookmarkTitle: Dichiarare funzioni
BookmarkLevel: 2
BookmarkPageNumber: 41
BookmarkBegin
BookmarkTitle: Argomenti opzionali e con nome
BookmarkLevel: 3
BookmarkPageNumber: 42
BookmarkBegin
BookmarkTitle: Scrivere codice leggibile
BookmarkLevel: 2
BookmarkPageNumber: 44
BookmarkBegin
BookmarkTitle: Stringhe di documentazione
BookmarkLevel: 3
BookmarkPageNumber: 44
BookmarkBegin
BookmarkTitle: Il percorso di ricerca di import
BookmarkLevel: 2
BookmarkPageNumber: 46
BookmarkBegin
BookmarkTitle: Ogni cosa &egrave; un oggetto
BookmarkLevel: 2
BookmarkPageNumber: 47
Write a Python program to print the following output:
 Il vostro primo programma Python 38
  Immersione! 38
  Dichiarare funzioni 41
   Argomenti opzionali e con nome 42
  Scrivere codice leggibile 44
   Stringhe di documentazione 44
  Il percorso di ricerca di import 46
  Ogni cosa è un oggetto 47
For this exercise, you will need to insert artificial spaces in the output, in a quantity determined by the BookmarkLevel rows.
QUESTION: what’s that weird value &egrave; at the end of the original file? Should we report it in the output?
HINT 1: To convert a string into an integer number, use the function int:
[13]:
x = '5'
[14]:
x
[14]:
'5'
[15]:
int(x)
[15]:
5
Warning: int(x) returns a value, and never modifies the argument x!
HINT 2: To substitute a substring in a string, you can use the method .replace:
[16]:
x = 'abcde'
x.replace('cd', 'HELLO' )
[16]:
'abHELLOe'
HINT 3: while there is only one sequence to substitute, replace is fine, but what if we had a million horrible sequences like &gt;, &#62;, &#x3e;? As good data cleaners, we recognize these are HTML escape sequences, so we can use a function made specifically for them, html.unescape. Try it instead of replace and check whether it works!
NOTE: Before using html.unescape, import the module html with the command:
import html
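A quick sketch of what html.unescape does with the escape sequences mentioned in the hint. The named form &gt;, the decimal form &#62; and the hexadecimal form &#x3e; all become the same character. It also handles named accents such as &egrave; (the input strings here are just illustrative).

```python
import html

escaped = ['&gt;', '&#62;', '&#x3e;', 'Ogni cosa &egrave; un oggetto']
unescaped = [html.unescape(s) for s in escaped]
print(unescaped)
```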
HINT 4: To write n copies of a character, use * like this:
[17]:
"b" * 3
[17]:
'bbb'
[18]:
"b" * 7
[18]:
'bbbbbbb'
IMPLEMENTATION: Write here the solution for the line file immersione-in-python-toc.txt, and try to execute it by pressing Control + Enter:
[19]:
# write here
import html
with open("immersione-in-python-toc.txt", encoding='utf-8') as f:
    line=f.readline()
    while line != "":
        line = f.readline().strip()
        title = html.unescape(line[len("BookmarkTitle: "):])
        line=f.readline().strip()
        level = int(line[len("BookmarkLevel: "):])
        line=f.readline().strip()
        page = line[len("BookmarkPageNumber: "):]
        print((" " * level) + title + " " + page)
        line=f.readline()
 Il vostro primo programma Python 38
  Immersione! 38
  Dichiarare funzioni 41
   Argomenti opzionali e con nome 42
  Scrivere codice leggibile 44
   Stringhe di documentazione 44
  Il percorso di ricerca di import 46
  Ogni cosa è un oggetto 47
2. File CSV
There can be various formats for tabular data, among which you surely know Excel (.xls or .xlsx). Unfortunately, if you want to process data programmatically, you had better avoid them and prefer, if possible, the CSV format, literally ‘Comma Separated Values’. Why? The Excel format is very complex and may hide several things which have nothing to do with the raw data:
formatting (bold fonts, colors …)
merged cells
formulas
multiple tabs
macros
Correctly parsing complex files may become a nightmare. Instead, CSVs are far simpler, so much so that you can even open them with a simple text editor.
We will try to open some CSVs, taking into consideration the possible problems we might get. CSVs are not necessarily the perfect solution for everything, but they offer more control over reading, and typically if there are conversion problems it is because we made a mistake, and not because the reader module decided on its own to swap days with months in dates.
Why parse a CSV?
To load and process CSVs there exist many powerful and intuitive modules, such as Pandas in Python or dataframes in R. Yet, in this notebook we will load CSVs using the simplest possible method, that is, reading row by row, mimicking the method already seen in the previous part of the tutorial. Don’t think this method is primitive or stupid: depending on the situation, it may save the day. How? Some files may occupy huge amounts of memory, and as of 2019 many laptops had only 4 gigabytes of RAM, the memory where Python stores variables. Given this, Python’s basic functions for reading files try their best to avoid loading everything into RAM. Typically a file is read sequentially, one piece at a time, keeping only one row at a time in RAM.
QUESTION 2.1: if we want to know whether a given 1000-terabyte file contains exactly 3 million rows in which the word ‘ciao’ is present, are we obliged to put all of the rows in RAM?
ANSWER: no, it is sufficient to keep one row at a time in memory, and hold the count in another variable.
QUESTION 2.2: What if we wanted to take a 100-terabyte file and create another one by appending the word ‘ciao’ to each row of the first one? Should we put all the rows of the first file in RAM at the same time? What about the rows of the second one?
ANSWER: No, it is enough to keep one row at a time in RAM, which is first read from the first file and then written right away to the second file.
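Both answers can be sketched in a single pass. In this sketch io.StringIO objects stand in for the huge input file and for the output file (with real files you would use open(...) for reading and open(..., 'w') for writing). The test 'ciao' in row is a simple substring check, and the appended text ' ciao' with a leading space is our own choice.

```python
import io

# in-memory stand-ins for a huge input file and for the output file
source = io.StringIO('ciao mondo\nhello world\nciao ciao\n')
target = io.StringIO()

count = 0
for row in source:                       # only the current row is kept in memory
    if 'ciao' in row:                    # simple substring test
        count += 1
    target.write(row.rstrip('\n') + ' ciao\n')   # written immediately, nothing accumulated
print(count)
print(target.getvalue())
```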
Reading a CSV
We will start with an artificial example CSV. Let’s look at example-1.csv, which you can find in the same folder as this Jupyter notebook. It contains animals with their expected lifespan:
animal, lifespan
dog, 12
cat, 14
pelican, 30
squirrel, 6
eagle, 25
We notice right away that the CSV is more structured than the files we’ve seen in the previous section:
in the first line there are column names, separated by commas: animal, lifespan
fields in successive rows are also separated by commas (,): dog, 12
Let’s try now to import this file in Python:
[20]:
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    # we create an object 'my_reader' which will take rows from the file
    my_reader = csv.reader(f, delimiter=',')
    # 'my_reader' is an object considered 'iterable', that is,
    # if used in a 'for' it will produce a sequence of rows from the csv
    # NOTE: here every file row is converted into a list of Python strings !
    for row in my_reader:
        print('We just read a row !')
        print(row)  # prints variable 'row', which is a list of strings
        print('')   # prints an empty string, to separate vertically
We just read a row !
['animal', ' lifespan']
We just read a row !
['dog', '12']
We just read a row !
['cat', '14']
We just read a row !
['pelican', '30']
We just read a row !
['squirrel', '6']
We just read a row !
['eagle', '25']
We immediately notice from the output that the example file is being printed, but there are square brackets ([]). What do they mean? What we printed are lists of strings.
Let’s analyze what we did:
import csv
Python natively has a module to deal with CSV files, intuitively named csv. With this instruction, we just loaded the module.
What happens next? As we already did before with line files, we open the file in a with block:
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f, delimiter=',')
    for row in my_reader:
        print(row)
For now ignore the newline='' and notice how we first specified the encoding.
Once the file is open, in the row
my_reader = csv.reader(f, delimiter=',')
we ask the csv module to create a reader object called my_reader for our file, telling Python that the comma is the field delimiter.
NOTE: my_reader is the name of the variable we are creating; it could be any name.
This reader object can be used as a sort of generator of rows in a for loop:
for row in my_reader:
    print(row)
In the for loop we employ my_reader to iterate through the file, producing at each iteration a row we call row (but it could be any name we like). At each iteration, the variable row gets printed.
If you look closely at the printed lists, you will see that each row of the file becomes a single Python list, which contains as many elements as there are fields in the CSV row.
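csv.reader does not strictly need a file. The csv module documentation says it accepts any object that yields strings when iterated, so a plain list of lines is handy for quick experiments. This sketch also shows that the space after each comma is kept as part of the field.

```python
import csv

lines = ['animal, lifespan', 'dog, 12', 'cat, 14']
rows = list(csv.reader(lines, delimiter=','))
print(rows)
```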
✪ EXERCISE 2.3: Rewrite in the cell below the instructions to read and print the CSV, paying attention to indentation:
[21]:
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    # we create an object 'my_reader' which will take rows from the file
    my_reader = csv.reader(f, delimiter=',')
    # 'my_reader' is an object considered 'iterable', that is,
    # if used in a 'for' it will produce a sequence of rows from the csv
    # NOTE: here every file row is converted into a list of Python strings !
    for row in my_reader:
        print("We just read a row !")
        print(row)  # prints variable 'row', which is a list of strings
        print('')   # prints an empty string, to separate vertically
We just read a row !
['animal', ' lifespan']
We just read a row !
['dog', '12']
We just read a row !
['cat', '14']
We just read a row !
['pelican', '30']
We just read a row !
['squirrel', '6']
We just read a row !
['eagle', '25']
✪✪ Exercise 2.4: try to put into big_list a list containing all the rows extracted from the file, which will be a list of lists like so:
[['animal', ' lifespan'],
['dog', '12'],
['cat', '14'],
['pelican', '30'],
['squirrel', '6'],
['eagle', '25']]
HINT: Try creating an empty list and then adding elements with the .append method
[22]:
# write here
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    # we create an object 'my_reader' which will take rows from the file
    my_reader = csv.reader(f, delimiter=',')
    # 'my_reader' is an object considered 'iterable', that is,
    # if used in a 'for' it will produce a sequence of rows from the csv
    # NOTE: here every file row is converted into a list of Python strings !
    big_list = []
    for row in my_reader:
        big_list.append(row)
    print(big_list)
[['animal', ' lifespan'], ['dog', '12'], ['cat', '14'], ['pelican', '30'], ['squirrel', '6'], ['eagle', '25']]
✪✪ EXERCISE 2.5: You may have noticed that numbers in lists are represented as strings like '12' (note the quotes), instead of as Python integer numbers (represented without quotes), 12:
We just read a row!
['dog', '12']
So, by reading the file and using normal for loops, try to create a new variable big_list like this, which:
has only data: the row with the header is not present
has numbers represented as proper integers
[['dog', 12],
['cat', 14],
['pelican', 30],
['squirrel', 6],
['eagle', 25]]
HINT 1: to skip a row you can use the instruction next(my_reader)
HINT 2: to convert a string into an integer, you can use for example int('25')
[23]:
# write here
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f, delimiter=',')
    big_list = []
    next(my_reader)
    for row in my_reader:
        big_list.append([row[0], int(row[1])])
    print(big_list)
[['dog', 12], ['cat', 14], ['pelican', 30], ['squirrel', 6], ['eagle', 25]]
What’s a reader?
We said that my_reader generates a sequence of rows, and that it is iterable. In the for loop, at every iteration we ask it to read a new line, which is put into the variable row. We might then ask ourselves: what happens if we directly print my_reader, without any for? Will we see a nice list or something else? Let’s try:
[24]:
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f, delimiter=',')
    print(my_reader)
<_csv.reader object at 0x7f58767de978>
This result is quite disappointing.
✪ EXERCISE 2.6: you probably found yourself in the same situation when trying to print the sequence generated by a call to range(5): instead of the actual sequence you get a range object. If you want to convert the reader to a list, what should you do?
[25]:
# write here
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f, delimiter=',')
    print(list(my_reader))
[['animal', ' lifespan'], ['dog', '12'], ['cat', '14'], ['pelican', '30'], ['squirrel', '6'], ['eagle', '25']]
Consuming a file
Not all sequences are the same. From what you’ve seen so far, going through a file in Python looks a lot like iterating a list, which is very handy, but you need to pay attention to some things. Given that files may potentially occupy terabytes, basic Python functions avoid loading everything into memory, and typically a file is read one piece at a time. But if the whole file is not loaded into the Python environment in one shot, what happens if we try to go through it twice inside the same with? What happens if we try using it outside the with? To find out, look at the next exercises.
✪ EXERCISE 2.7: taking the solution to the previous exercise, try to call print(list(my_reader)) twice, in sequence. Do you get the same output both times?
[26]:
# write here the code
#import csv
#with open('example-1.csv', encoding='utf-8', newline='') as f:
#    my_reader = csv.reader(f, delimiter=',')
#    print(list(my_reader))
#    print(list(my_reader))
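A self-contained sketch of what happens, with io.StringIO standing in for example-1.csv (an assumption so it runs anywhere). The first list() consumes every row, so the second call on the same reader gets nothing. Moving the stream position back with seek(0) and creating a new reader lets you read the rows again.

```python
import csv
import io

f = io.StringIO('animal,lifespan\ndog,12\n')   # in-memory stand-in for example-1.csv
my_reader = csv.reader(f, delimiter=',')
first = list(my_reader)
second = list(my_reader)    # same reader again: it is already exhausted
print(first)
print(second)

f.seek(0)                   # move the stream position back to the start
again = list(csv.reader(f, delimiter=','))
print(again)
```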
✪ Exercise 2.8: Taking the solution from the previous exercise (using only one print), try here to move the print to the left (removing all the leading spaces). Does it still work?
[27]:
# write here
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f, delimiter=',')
#print(list(my_reader))  # COMMENTED, AS IT WOULD RAISE AN ERROR: THE FILE IS ALREADY CLOSED
# We can't use commands which read the file outside the with!
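To see the error instead of just reading about it, this sketch catches it. io.StringIO again stands in for the real file; when the with block ends the stream is closed, and reading through the reader afterwards raises ValueError (for both StringIO and regular files the message mentions the closed file).

```python
import csv
import io

message = None
with io.StringIO('animal,lifespan\ndog,12\n') as f:   # stand-in for example-1.csv
    my_reader = csv.reader(f, delimiter=',')

# the with block is over, so f has been closed
try:
    rows = list(my_reader)
except ValueError as error:
    message = str(error)
print('ValueError:', message)
```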
✪✪ Exercise 2.9: Now that we understand which kind of beast my_reader is, try to produce this result as done before, but using a list comprehension instead of the for:
[['dog', 12],
['cat', 14],
['pelican', 30],
['squirrel', 6],
['eagle', 25]]
If you can, also try to write the whole transformation that creates big_list in one row, using the function itertools.islice to skip the header (for example, itertools.islice(['A', 'B', 'C', 'D', 'E'], 2, None) skips the first two elements and produces the sequence C D E; in our case the elements produced by my_reader would be rows)
[28]:
import csv
import itertools
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.reader(f, delimiter=',')
    # write here
    big_list = [[row[0], int(row[1])] for row in itertools.islice(my_reader, 1, None)]
    print(big_list)
[['dog', 12], ['cat', 14], ['pelican', 30], ['squirrel', 6], ['eagle', 25]]
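A minimal sketch of itertools.islice(iterable, start, None). The stop argument None means "until the end", so only the first start elements are skipped. The same call works on a csv reader, here built over a list of lines instead of a file.

```python
import csv
import itertools

letters = ['A', 'B', 'C', 'D', 'E']
skipped = list(itertools.islice(letters, 2, None))   # start at index 2, no end
print(skipped)

# the same idea on a csv reader: skip only the header row
my_reader = csv.reader(['animal,lifespan', 'dog,12', 'cat,14'], delimiter=',')
data_rows = list(itertools.islice(my_reader, 1, None))
print(data_rows)
```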
✪ Exercise 2.10: Create a file my-example.csv in the same folder as this Jupyter notebook, and copy into it the content of the file example-1.csv. Then add a column description, remembering to separate the column name from the preceding one with a comma. As column values, put into successive rows strings like dogs walk, pelicans fly, etc. according to the animal, remembering to separate them from lifespan with a comma, like this:
dog,12,dogs walk
After this, copy and paste the Python code to load the file here below, using the file name my-example.csv, and try to load everything, just to check that everything is working:
[29]:
# write here
ANSWER:
animal,lifespan,description
dog,12,dogs walk
cat,14,cats walk
pelican,30,pelicans fly
squirrel,6,squirrels fly
eagle,25,eagles fly
✪ Exercise 2.11: Not every CSV is structured in the same way; sometimes when we write or import CSVs some tweaking is necessary. Let’s see which problems may arise:
In the file, try to put one or two spaces before numbers, for example write this line and look at what happens:
dog, 12,dogs fly
QUESTION 2.11.1: Does the space get imported?
ANSWER: yes
QUESTION 2.11.2: if we convert to integer, is the space a problem?
ANSWER: no
QUESTION 2.11.3: Modify only the dog description from dogs walk to dogs walk, but don't fly and try to re-execute the cell which opens the file. What happens?
ANSWER: Python reads one more element in the list
QUESTION 2.11.4: To overcome the previous problem, a solution you can adopt in CSVs is to enclose strings containing commas in double quotes, like this: "dogs walk, but don't fly". Does it work?
ANSWER: yes
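The answers above can be checked in a few lines. This sketch feeds single lines to csv.reader (a list of one string in place of a file). Without quotes, the comma inside the description splits it into two fields. With double quotes it stays whole. The skipinitialspace=True option of the csv module additionally drops the space after a delimiter, and int() ignores surrounding spaces anyway.

```python
import csv

unquoted = "dog,12,dogs walk, but don't fly"
quoted = 'dog, 12,"dogs walk, but don\'t fly"'

split_unquoted = next(csv.reader([unquoted]))
split_quoted = next(csv.reader([quoted]))
split_trimmed = next(csv.reader([quoted], skipinitialspace=True))
lifespan = int(split_quoted[1])     # ' 12' -> 12, int() tolerates surrounding spaces
print(split_unquoted)
print(split_quoted)
print(split_trimmed)
print(lifespan)
```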
Reading as dictionaries
To read a CSV, instead of getting lists you may more conveniently get dictionaries (OrderedDicts in some Python versions).
NOTE: different Python versions give different dictionaries:
< 3.6: dict
3.6, 3.7: OrderedDict
≥ 3.8: dict
Python 3.8 returned to the plain dict because, since Python 3.7, dictionaries are guaranteed to preserve insertion order, so the key order will be consistent with the one of the CSV headers.
[30]:
import csv
with open('example-1.csv', encoding='utf-8', newline='') as f:
    my_reader = csv.DictReader(f, delimiter=',')  # Notice we now used DictReader
    for d in my_reader:
        print(d)
{'animal': 'dog', ' lifespan': '12'}
{'animal': 'cat', ' lifespan': '14'}
{'animal': 'pelican', ' lifespan': '30'}
{'animal': 'squirrel', ' lifespan': '6'}
{'animal': 'eagle', ' lifespan': '25'}
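Notice the key ' lifespan' above keeps the space from the header. A sketch, assuming we want clean keys: DictReader passes extra formatting options to the underlying reader, so skipinitialspace=True also cleans the header. The fieldnames attribute shows the keys, and wrapping each row in dict() gives a plain dict on every Python version. io.StringIO stands in for example-1.csv.

```python
import csv
import io

# in-memory stand-in for example-1.csv, with spaces after the commas
f = io.StringIO('animal, lifespan\ndog, 12\ncat, 14\n')
my_reader = csv.DictReader(f, delimiter=',', skipinitialspace=True)
print(my_reader.fieldnames)           # the header row, used as dictionary keys
rows = [dict(d) for d in my_reader]   # dict() gives a plain dict on every version
print(rows)
```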
Writing a CSV
You can easily create a CSV by instantiating a writer object:
ATTENTION: BE SURE TO WRITE IN THE CORRECT FILE!
If you don’t pay attention to file names, you risk deleting data!
[31]:
import csv
# To write, REMEMBER to specify the `w` option.
# WARNING: 'w' *completely* replaces existing files !!
with open('written-file.csv', 'w', encoding='utf-8', newline='') as csvfile_out:
    my_writer = csv.writer(csvfile_out, delimiter=',')
    my_writer.writerow(['This', 'is', 'a header'])
    my_writer.writerow(['some', 'example', 'data'])
    my_writer.writerow(['some', 'other', 'example data'])
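A sketch of what the writer actually puts in the file, using io.StringIO instead of a real file so we can inspect the text. With the default settings, a field containing the delimiter is wrapped in double quotes, numbers are converted with str(), and each row ends with '\r\n'. The writer adds that line ending itself, which is why the csv documentation recommends opening files with newline=''.

```python
import csv
import io

out = io.StringIO()   # in-memory stand-in for a file opened with 'w' and newline=''
my_writer = csv.writer(out, delimiter=',')
my_writer.writerow(['dog', 12, "dogs walk, but don't fly"])
written = out.getvalue()
print(repr(written))
```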
Reading and writing a CSV
To create a modified copy of an existing CSV, you may nest a with for reading inside another with for writing:
ATTENTION: CAREFUL NOT TO SWAP FILE NAMES!
When we read and write, it’s easy to make mistakes and accidentally overwrite our precious data.
To avoid issues:
use explicit names both for output files (e.g. example-1-enriched.csv) and for handles (e.g. csvfile_out)
back up the data to read
always check before carelessly executing code you just wrote!
[32]:
import csv
# To write, REMEMBER to specify the `w` option.
# WARNING: 'w' *completely* replaces existing files !!
# WARNING: handle here is called *csvfile_out*
with open('example-1-enriched.csv', 'w', encoding='utf-8', newline='') as csvfile_out:
    my_writer = csv.writer(csvfile_out, delimiter=',')
    # Notice how this 'with' is *inside* the outer one:
    # WARNING: handle here is called *csvfile_in*
    with open('example-1.csv', encoding='utf-8', newline='') as csvfile_in:
        my_reader = csv.reader(csvfile_in, delimiter=',')
        for row in my_reader:
            row.append('something else')
            my_writer.writerow(row)
            my_writer.writerow(row)
            my_writer.writerow(row)
Let’s check that the new file was actually created by reading it:
[33]:
import csv
with open('example-1-enriched.csv', encoding='utf-8', newline='') as csvfile_in:
    my_reader = csv.reader(csvfile_in, delimiter=',')
    for row in my_reader:
        print(row)
['animal', ' lifespan', 'something else']
['animal', ' lifespan', 'something else']
['animal', ' lifespan', 'something else']
['dog', '12', 'something else']
['dog', '12', 'something else']
['dog', '12', 'something else']
['cat', '14', 'something else']
['cat', '14', 'something else']
['cat', '14', 'something else']
['pelican', '30', 'something else']
['pelican', '30', 'something else']
['pelican', '30', 'something else']
['squirrel', '6', 'something else']
['squirrel', '6', 'something else']
['squirrel', '6', 'something else']
['eagle', '25', 'something else']
['eagle', '25', 'something else']
['eagle', '25', 'something else']
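Instead of nesting, a single with statement can also open both files at once. The following sketch writes each row only once. It works inside a temporary folder created with tempfile (an assumption made so it never touches your real example-1.csv), and it rebuilds a tiny input file there first.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    in_path = os.path.join(folder, 'example-1.csv')
    out_path = os.path.join(folder, 'example-1-enriched.csv')
    with open(in_path, 'w', encoding='utf-8', newline='') as f:
        f.write('animal,lifespan\ndog,12\n')      # tiny sample input

    # one with statement opens both files: input for reading, output for writing
    with open(in_path, encoding='utf-8', newline='') as csvfile_in, \
         open(out_path, 'w', encoding='utf-8', newline='') as csvfile_out:
        my_reader = csv.reader(csvfile_in, delimiter=',')
        my_writer = csv.writer(csvfile_out, delimiter=',')
        for row in my_reader:
            my_writer.writerow(row + ['something else'])

    # read back the output to check it
    with open(out_path, encoding='utf-8', newline='') as f:
        result = list(csv.reader(f))
print(result)
```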
CSV Botteghe storiche
Usually in open data catalogs like the popular CKAN platform (for example, dati.trentino.it, data.gov.uk and the European data portal run instances of CKAN) files are organized in datasets, which are collections of resources: each resource directly contains a file inside the catalog (typically CSV, JSON or XML) or a link to the real file, located on a server belonging to the organization which created the data.
The first dataset we will look at will be ‘Botteghe storiche del Trentino’:
https://dati.trentino.it/dataset/botteghe-storiche-del-trentino
Here you will find some generic information about the dataset. In particular, note the data provider, Provincia Autonoma di Trento, and the license, Creative Commons Attribution v4.0, which basically allows any reuse provided you cite the author.
Inside the dataset page, there is a resource called ‘Botteghe storiche’.
At the resource page, we find a link to the CSV file (you can also find it by clicking on the blue button ‘Go to the resource’):
Depending on the browser and operating system you have, by clicking on the link above you might get different results. In our case, with the Firefox browser on Linux we get (here we only show the first 10 rows):
Numero,Insegna,Indirizzo,Civico,Comune,Cap,Frazione/LocalitÃ ,Note
1,BAZZANELLA RENATA,Via del Lagorai,30,Sover,38068,Piscine di Sover,"generi misti, bar - ristorante"
2,CONFEZIONI MONTIBELLER S.R.L.,Corso Ausugum,48,Borgo Valsugana,38051,,esercizio commerciale
3,FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.,Largo Dordi,8,Borgo Valsugana,38051,,"esercizio commerciale, attività artigianale"
4,BAR SERAFINI DI MINATI RENZO,,24,Grigno,38055,Serafini,esercizio commerciale
6,SEMBENINI GINO & FIGLI S.R.L.,Via S. Francesco,35,Riva del Garda,38066,,
7,HOTEL RISTORANTE PIZZERIA “ALLA NAVEâ€,Via Nazionale,29,Lavis,38015,Nave San Felice,
8,OBRELLI GIOIELLERIA DAL 1929 S.R.L.,Via Roma,33,Lavis,38015,,
9,MACELLERIE TROIER S.A.S. DI TROIER DARIO E C.,Via Roma,13,Lavis,38015,,
10,NARDELLI TIZIANO,Piazza Manci,5,Lavis,38015,,esercizio commerciale
As expected, values are separated with commas.
Problem: wrong characters??
You may immediately notice a problem in the first row of headers, in the column Frazione/LocalitÃ. It seems the last character is wrong: in Italian it should show an accented à. Is it truly a problem with the file? Not really. Probably, the server is not telling Firefox which encoding is the correct one for the file. Firefox is not magical, and tries its best to show the CSV based on the info it has, which may be limited and/or even wrong. The world is never the way we would like it to be
…
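We can reproduce this kind of garbling in Python. The sketch assumes the file is UTF-8 (as stated below for this dataset) and that the program showing it wrongly guesses Latin-1, a common fallback; that guess is our hypothesis, not something the server confirms. In UTF-8, à takes two bytes, and Latin-1 reads those two bytes as two separate characters: Ã and a non-breaking space.

```python
header = 'Frazione/Località'
raw_bytes = header.encode('utf-8')       # the bytes of a UTF-8 encoded file
wrong = raw_bytes.decode('latin-1')      # what a program guessing Latin-1 would show
right = raw_bytes.decode('utf-8')        # decoding with the correct encoding
print(repr(wrong))
print(right)
```

This is why, in Python, we always pass encoding='utf-8' explicitly to open().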
✪ 2.12 Exercise: download the CSV, and try opening it in Excel and / or LibreOffice Calc. Do you see a correct accented character? If not, try to set the encoding to ‘Unicode (UTF-8)’ (in Calc is called ‘Character set’).
WARNING: CAREFUL IF YOU USE Excel!
If you click directly on File->Open in Excel, Excel will probably try to guess on its own how to put the CSV in a table, and may make the mistake of placing everything in a single column. To avoid the problem, we have to tell Excel to show a panel asking us how we want to open the CSV, like so:
In old Excel versions, find File->Import
In recent Excel versions, click on the Data tab and then select From text. For further information, see the copytrans guide
NOTE: If the file is not available, in the folder where this notebook is you will find the same file renamed to botteghe-storiche.csv
We should get a table like this. Notice how the Frazione/Località header displays with the right accent, because we selected Character set: Unicode (UTF-8), which is the appropriate one for this dataset:
Botteghe storiche in Python
Now that we understand a couple of things about encoding, let’s try to import the file in Python.
If we load the first 5 entries in Python with a csv DictReader and print them, we should see something like this:
OrderedDict([('Numero', '1'),
('Insegna', 'BAZZANELLA RENATA'),
('Indirizzo', 'Via del Lagorai'),
('Civico', '30'),
('Comune', 'Sover'),
('Cap', '38068'),
('Frazione/Località', 'Piscine di Sover'),
('Note', 'generi misti, bar - ristorante')]),
OrderedDict([('Numero', '2'),
('Insegna', 'CONFEZIONI MONTIBELLER S.R.L.'),
('Indirizzo', 'Corso Ausugum'),
('Civico', '48'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale')]),
OrderedDict([('Numero', '3'),
('Insegna', 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.'),
('Indirizzo', 'Largo Dordi'),
('Civico', '8'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale, attività artigianale')]),
OrderedDict([('Numero', '4'),
('Insegna', 'BAR SERAFINI DI MINATI RENZO'),
('Indirizzo', ''),
('Civico', '24'),
('Comune', 'Grigno'),
('Cap', '38055'),
('Frazione/Località', 'Serafini'),
('Note', 'esercizio commerciale')]),
OrderedDict([('Numero', '6'),
('Insegna', 'SEMBENINI GINO & FIGLI S.R.L.'),
('Indirizzo', 'Via S. Francesco'),
('Civico', '35'),
('Comune', 'Riva del Garda'),
('Cap', '38066'),
('Frazione/Località', ''),
('Note', '')])
We would like to know which different categories of bottega there are, and count them. Unfortunately, there is no specific field for Categoria, so we will need to extract this information from other fields such as Insegna and Note. For example, this Insegna contains the category BAR, while the Note (esercizio commerciale, commercial enterprise) is a bit too generic to be useful:
'Insegna': 'BAR SERAFINI DI MINATI RENZO',
'Note': 'esercizio commerciale',
while this other Insegna contains just the owner name, and Note holds both the categories bar and ristorante:
'Insegna': 'BAZZANELLA RENATA',
'Note': 'generi misti, bar - ristorante',
As you see, data is not uniform:
sometimes the category is in the Insegna
sometimes it is in the Note
sometimes it is in both
sometimes it is lowercase
sometimes it is uppercase
sometimes it is single
sometimes it is multiple (bar - ristorante)
First we want to extract all the categories we can find, and rank them according to their frequency, from most frequent to least frequent.
To do so, you need to:
count all the words you can find in both the Insegna and Note fields, and sort them. Note you need to normalize the case to uppercase.
consider a category relevant if it is present at least 11 times in the dataset.
filter out non-relevant words: some words like prepositions, types of company ('S.N.C', 'S.R.L.', ...), etc. will appear a lot, and will need to be ignored. To detect them, you are given a list called stopwords.
NOTE: the rules above do not actually extract all the categories, for the sake of the exercise we only keep the most frequent ones.
To know how to proceed, read the following.
Botteghe storiche: rank_categories
Load the file with csv.DictReader and, while you are loading it, count the words as described above. Afterwards, return a list of words with their frequencies, from most to least frequent.
Do not load the whole file into memory: just process one dictionary at a time and update the statistics accordingly.
Expected output:
[('BAR', 191),
('RISTORANTE', 150),
('HOTEL', 67),
('ALBERGO', 64),
('MACELLERIA', 27),
('PANIFICIO', 22),
('CALZATURE', 21),
('FARMACIA', 21),
('ALIMENTARI', 20),
('PIZZERIA', 16),
('SPORT', 16),
('TABACCHI', 12),
('FERRAMENTA', 12),
('BAZAR', 11)]
[34]:
import csv
def rank_categories(stopwords):
    #jupman-raise
    ret = {}
    with open('botteghe.csv', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',')
        for d in reader:
            words = d['Insegna'].split(" ") + d['Note'].upper().split(" ")
            for word in words:
                if word in stopwords:
                    continue            # skip prepositions, company types, etc.
                if word in ret:
                    ret[word] += 1
                else:
                    ret[word] = 1
    # keep only words appearing at least 11 times, most frequent first
    return sorted([(key, val) for key, val in ret.items() if val > 10], key=lambda c: c[1], reverse=True)
    #/jupman-raise
stopwords = ['',
             'S.N.C.', 'SNC','S.A.S.', 'S.R.L.', 'S.C.A.R.L.', 'SCARL','S.A.S', 'COMMERCIALE','FAMIGLIA','COOPERATIVA',
             '-', '&', 'C.', 'ESERCIZIO',
             'IL', 'DE', 'DI','A', 'DA', 'E', 'LA', 'AL', 'DEL', 'ALLA', ]
categories = rank_categories(stopwords)
categories
[34]:
[('BAR', 191),
('RISTORANTE', 150),
('HOTEL', 67),
('ALBERGO', 64),
('MACELLERIA', 27),
('PANIFICIO', 22),
('FARMACIA', 21),
('CALZATURE', 21),
('ALIMENTARI', 20),
('PIZZERIA', 16),
('SPORT', 16),
('FERRAMENTA', 12),
('TABACCHI', 12),
('BAZAR', 11)]
Botteghe storiche: enrich¶
Once you have found the categories, implement the function enrich. It takes the previously computed categories and WRITES a NEW file botteghe-enriched.csv, where each row is enriched with a new field Categorie holding the list of categories that bottega belongs to.
Write the new file with a DictWriter (see its documentation).
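The core DictWriter pattern looks like this: reuse the reader's header, append the extra column, and write one enriched row at a time. This is a minimal sketch that uses in-memory io.StringIO objects instead of real files. The helper name add_column and the sample rows are hypothetical. Categories are matched by substring, as in the solution below.

```python
import csv
import io

# Made-up input standing in for botteghe.csv (assumption)
SAMPLE = """Numero,Insegna,Note
1,BAZZANELLA RENATA,"generi misti, bar - ristorante"
2,CONFEZIONI MONTIBELLER S.R.L.,esercizio commerciale
"""

def add_column(csv_in, csv_out, categories):
    """Copy every row of csv_in to csv_out, adding a 'Categorie' field."""
    reader = csv.DictReader(csv_in)
    # reader.fieldnames is the header as a list; build a new list with the extra column
    fieldnames = reader.fieldnames + ['Categorie']
    writer = csv.DictWriter(csv_out, fieldnames=fieldnames)
    writer.writeheader()
    for d in reader:
        text = (d['Insegna'] + ' ' + d['Note']).upper()
        d['Categorie'] = [cat for cat in categories if cat in text]
        writer.writerow(d)   # the list is written as its str() form

out = io.StringIO()
add_column(io.StringIO(SAMPLE), out, ['BAR', 'RISTORANTE'])
print(out.getvalue())
```

Note that the list ends up in the CSV as the string "['BAR', 'RISTORANTE']". Reading it back with DictReader gives a string, not a list.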
The new file should contain rows like this (showing only the first 5):
OrderedDict([ ('Numero', '1'),
('Insegna', 'BAZZANELLA RENATA'),
('Indirizzo', 'Via del Lagorai'),
('Civico', '30'),
('Comune', 'Sover'),
('Cap', '38068'),
('Frazione/Località', 'Piscine di Sover'),
('Note', 'generi misti, bar - ristorante'),
('Categorie', "['BAR', 'RISTORANTE']")])
OrderedDict([ ('Numero', '2'),
('Insegna', 'CONFEZIONI MONTIBELLER S.R.L.'),
('Indirizzo', 'Corso Ausugum'),
('Civico', '48'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale'),
('Categorie', '[]')])
OrderedDict([ ('Numero', '3'),
('Insegna', 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.'),
('Indirizzo', 'Largo Dordi'),
('Civico', '8'),
('Comune', 'Borgo Valsugana'),
('Cap', '38051'),
('Frazione/Località', ''),
('Note', 'esercizio commerciale, attività artigianale'),
('Categorie', '[]')])
OrderedDict([ ('Numero', '4'),
('Insegna', 'BAR SERAFINI DI MINATI RENZO'),
('Indirizzo', ''),
('Civico', '24'),
('Comune', 'Grigno'),
('Cap', '38055'),
('Frazione/Località', 'Serafini'),
('Note', 'esercizio commerciale'),
('Categorie', "['BAR']")])
OrderedDict([ ('Numero', '6'),
('Insegna', 'SEMBENINI GINO & FIGLI S.R.L.'),
('Indirizzo', 'Via S. Francesco'),
('Civico', '35'),
('Comune', 'Riva del Garda'),
('Cap', '38066'),
('Frazione/Località', ''),
('Note', ''),
('Categorie', '[]')])
[35]:
import csv
def enrich(categories):
    #jupman-raise
    # read headers
    with open('botteghe.csv', newline='', encoding='utf-8') as csvfile_in:
        reader = csv.DictReader(csvfile_in, delimiter=',')
        d1 = next(reader)
        fieldnames = list(d1.keys())  # convert to list, otherwise we cannot append
    fieldnames.append('Categorie')
    with open('botteghe-enriched-solution.csv', 'w', newline='', encoding='utf-8') as csvfile_out:
        writer = csv.DictWriter(csvfile_out, fieldnames=fieldnames)
        writer.writeheader()
        with open('botteghe.csv', newline='', encoding='utf-8') as csvfile_in:
            reader = csv.DictReader(csvfile_in, delimiter=',')
            for d in reader:
                new_d = {key: val for key, val in d.items()}
                new_d['Categorie'] = []
                for cat in categories:
                    # cat is a (word, frequency) tuple: only the word matters here
                    if cat[0] in d['Insegna'].upper() or cat[0] in d['Note'].upper():
                        new_d['Categorie'].append(cat[0])
                writer.writerow(new_d)
    #/jupman-raise
enrich(rank_categories(stopwords))
[36]:
# let's see if we created the file we wanted
# (using botteghe-enriched-solution.csv to avoid polluting your file)
import csv
import pprint   # for documentation see https://docs.python.org/3/library/pprint.html
with open('botteghe-enriched-solution.csv', newline='', encoding='utf-8') as csvfile_in:
    reader = csv.DictReader(csvfile_in, delimiter=',')
    # better to pretty print the dicts, otherwise we get unreadable output
    pp = pprint.PrettyPrinter(indent=4)
    for i in range(5):
        d = next(reader)
        pp.pprint(d)
{ 'Cap': '38068',
'Categorie': "['BAR', 'RISTORANTE']",
'Civico': '30',
'Comune': 'Sover',
'Frazione/Località': 'Piscine di Sover',
'Indirizzo': 'Via del Lagorai',
'Insegna': 'BAZZANELLA RENATA',
'Note': 'generi misti, bar - ristorante',
'Numero': '1'}
{ 'Cap': '38051',
'Categorie': '[]',
'Civico': '48',
'Comune': 'Borgo Valsugana',
'Frazione/Località': '',
'Indirizzo': 'Corso Ausugum',
'Insegna': 'CONFEZIONI MONTIBELLER S.R.L.',
'Note': 'esercizio commerciale',
'Numero': '2'}
{ 'Cap': '38051',
'Categorie': '[]',
'Civico': '8',
'Comune': 'Borgo Valsugana',
'Frazione/Località': '',
'Indirizzo': 'Largo Dordi',
'Insegna': 'FOTOGRAFICA TRINTINAGLIA UMBERTO S.N.C.',
'Note': 'esercizio commerciale, attività artigianale',
'Numero': '3'}
{ 'Cap': '38055',
'Categorie': "['BAR']",
'Civico': '24',
'Comune': 'Grigno',
'Frazione/Località': 'Serafini',
'Indirizzo': '',
'Insegna': 'BAR SERAFINI DI MINATI RENZO',
'Note': 'esercizio commerciale',
'Numero': '4'}
{ 'Cap': '38066',
'Categorie': '[]',
'Civico': '35',
'Comune': 'Riva del Garda',
'Frazione/Località': '',
'Indirizzo': 'Via S. Francesco',
'Insegna': 'SEMBENINI GINO & FIGLI S.R.L.',
'Note': '',
'Numero': '6'}
Graph formats solutions¶
Introduction¶
Usual matrices from linear algebra are of great importance in computer science because they are widely used in many fields, for example in machine learning and network analysis. This tutorial will give you an appreciation of the meaning of matrices when considered as networks or, as we call them in computer science, graphs. We will also review other formats for storing graphs, such as adjacency lists, and have a quick look at a specialized library called Networkx.
In Part A we will limit ourselves to graph formats in this notebook, and see some theory in the separate binary relations notebook. In Part B, the course will focus on graph algorithms.
What to do¶
unzip the exercises into a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|- graph-formats
|- graph-formats-exercise.ipynb
|- graph-formats-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open: first a console, and then a browser. The browser should show a file list: navigate the list and open the notebook
exercises/graph-formats/graph-formats-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try selecting Kernel -> Restart
Required libraries¶
For the visualizations to work, you need to install the Python libraries networkx and pydot. Pydot is an interface to the non-Python package GraphViz.
Anaconda:
From Anaconda Prompt:
Install GraphViz:
conda install graphviz
Install python packages:
conda install pydot networkx
Ubuntu
From console:
Install PyGraphViz (note: you should use apt to install it, pip might give problems):
sudo apt install python3-pygraphviz
Install python packages:
python3 -m pip install --user pydot networkx
Graph definition¶
In computer science a graph is a set of vertices V (also called nodes) linked by a set of edges E. You can visualize nodes as circles and links as lines. If the graph is undirected, links are just lines; if the graph is directed, links are represented as arrows with a tip showing the direction.
For our purposes, we will consider directed graphs (also called digraphs).
Usually we will indicate nodes with numbers starting from zero, but optionally they can be labelled. Since we are dealing with directed graphs, we can have an arrow going, for example, from node 1 to node 2, but also another arrow going from node 2 to node 1. Furthermore, a node (for example node 0) can have a self-loop, that is, an edge going to itself.
Edge weights¶
Optionally, we will sometimes assign a weight to the edges, that is, a number shown over the edge. So we can modify the previous example. Note that we can have an arrow going from node 1 to node 2 with a weight different from the weight of the arrow going from 2 to 1.
Matrices¶
Here we will represent graphs as matrices. Performance-wise, this is particularly good when the matrix is dense, that is, when it has many entries different from zero. Otherwise, when you have a so-called sparse matrix (few non-zero entries), it is best to represent the graph with adjacency lists, which we will deal with later.
If you have a directed graph (digraph) with n vertices, you can represent it as an n x n matrix by considering each row as a vertex:
A row at index i represents the outward links from node i to the other n nodes, possibly including node i itself. A value of zero means there is no link to a given node.
In general, mat[i][j] is the weight of the edge from node i to node j.
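To make the row/column convention concrete, here is a small sketch that lists every edge of a weighted matrix as an (i, j, weight) triple. The function name edges is hypothetical, and zero is taken to mean "no edge", as stated above.

```python
def edges(mat):
    """Return (i, j, weight) for every non-zero cell of a square matrix."""
    result = []
    for i in range(len(mat)):          # row i = outgoing links of node i
        for j in range(len(mat[i])):   # column j = target node j
            if mat[i][j] != 0:
                result.append((i, j, mat[i][j]))
    return result

mat = [
    [5, 9],   # node 0 -> node 0 (weight 5), node 0 -> node 1 (weight 9)
    [0, 6],   # node 1 -> node 1 (weight 6)
]
print(edges(mat))   # [(0, 0, 5), (0, 1, 9), (1, 1, 6)]
```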
Visualization examples¶
We defined a function sciprog.draw_mat to display matrices as graphs (you don’t need to understand the internals; for now we won’t go into depth about matrix visualizations).
If it doesn’t work, see the Required libraries paragraph above.
[2]:
# PLEASE EXECUTE THIS CELL TO CHECK IF VISUALIZATION IS WORKING
# notice links with weight zero are not shown
# all weights are set to 1
# first we need to import this
import sys
sys.path.append('../../')
from sciprog import draw_mat
mat = [
    [1,1,0,1], # node 0 is linked to node 0 itself, node 1 and node 3
    [0,0,1,1], # node 1 is linked to node 2 and node 3
    [1,1,1,1], # node 2 is linked to node 0, node 1, node 2 itself and node 3
    [0,1,0,1]  # node 3 is linked to node 1 and node 3 itself
]
draw_mat(mat)

Saving a graph to a file¶
If you want (or if you are not using Jupyter), you can optionally save the graph to a .png file by specifying the save_to filepath:
[3]:
mat = [
[1,1],
[0,1]
]
draw_mat( mat, save_to='example.png')

Image saved to file: example.png
Minimal graph¶
With this representation derived from matrices as we intend them (that is with at least one row and one column), the corresponding minimal graph can have only one node:
[4]:
minimal = [
[0]
]
draw_mat(minimal)

If we set the weight to a value different from zero, the zeroth node will link to itself (here we put the weight 5 on the link):
[5]:
minimal = [
[5]
]
draw_mat(minimal)

Graph with two nodes example¶
[6]:
m = [
[5,9], # node 0 links to node 0 itself with a weight of 5, and to node 1 with a weight of 9
[0,6], # node 1 links to node 1 with a weight of 6
]
draw_mat(m)

Distance matrix¶
Depending on the problem at hand, it may be reasonable to change the weights. For example, in a road network the nodes could represent places and the weights could be the distances. If we assume it is possible to travel in both directions on all roads, we get a matrix that is symmetric with respect to the diagonal, and we can call it a distance matrix. As for the diagonal, in the special case of going from a place to itself, we set the street length to 0. This makes sense for street lengths but could cause trouble for other purposes: for example, if the numbers mean ‘is connected’, a place should always be connected to itself.
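The symmetry property is easy to check in code: mat[i][j] must equal mat[j][i] for every pair of places. Here is a minimal sketch. The function name is_symmetric and the sample matrix are illustrative assumptions.

```python
def is_symmetric(mat):
    """True if mat[i][j] == mat[j][i] for every pair of nodes i, j."""
    n = len(mat)
    for i in range(n):
        for j in range(i + 1, n):    # checking the upper triangle is enough
            if mat[i][j] != mat[j][i]:
                return False
    return True

roads = [
    [0, 6, 5, 8],
    [6, 0, 9, 7],
    [5, 9, 0, 4],
    [8, 7, 4, 0],
]
print(is_symmetric(roads))   # True
roads[0][1] = 10             # make the road 0 -> 1 longer than the road 1 -> 0
print(is_symmetric(roads))   # False
```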
[7]:
# distance matrix example
mat = [
    [0,6,5,8], # place 0 is linked to place 1, place 2 and place 3
[6,0,9,7], # place 1 is linked to place 0, place 2 and place 3
[5,9,0,4], # place 2 is linked to place 0, place 1 and place 3
[8,7,4,0] # place 3 is linked to place 0, place 1 and place 2
]
draw_mat(mat)

A more realistic road network, where going in one direction might actually take longer than going back, because of one-way streets and different routing times:
[8]:
mat = [
    [0,6,0,8], # place 0 is linked to place 1 and place 3
[9,0,9,7], # place 1 is linked to place 0, place 2 and place 3
[5,5,0,4], # place 2 is linked to place 0, place 1 and place 3
[7,9,8,0] # place 3 is linked to place 0, place 1, place 2
]
draw_mat(mat)

Boolean matrix example¶
If we are not interested in the weights at all, we might use only zeroes and ones, as we did before. But this could have implications when doing operations on matrices, so sometimes it is better to use only True and False:
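Converting a 0/1 matrix to booleans is a one-liner. The sketch below also shows one of the implications mentioned above: adding two 0/1 entries can give 2, while or-ing two booleans stays True. The function name to_bool is hypothetical.

```python
def to_bool(mat):
    """Return a NEW matrix where every non-zero weight becomes True and zero becomes False."""
    return [[cell != 0 for cell in row] for row in mat]

m = [
    [0, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
b = to_bool(m)
print(b[0])                  # [False, True, False]
print(m[1][1] + m[1][2])     # 2: summing 0/1 "connections" leaves the 0/1 range
print(b[1][1] or b[1][2])    # True: boolean operations stay boolean
```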
[9]:
mat = [
[False, True, False],
[False, True, True],
[True, False, True],
]
draw_mat(mat)

Matrix exercises¶
We are now ready to start implementing the following functions. Before even starting the implementation, try to interpret each matrix as a graph by drawing it on paper. When you’re done implementing, try using draw_mat on the results. Since draw_mat is a generic display function and knows nothing about the nature of the graph, it will sometimes not show the graph in the optimal way we humans would use.
line¶
✪✪ This function is similar to diag. Like that one, you can implement it in two ways: with a double for, or with a single one. For the sake of the first part of the course the double for is acceptable, but in the second part it would be considered a waste of computing cycles.
What would be the graph representation of diag?
[10]:
def line(n):
    """ RETURN a matrix as lists of lists where node i must have an edge to node i + 1 with weight 1
        Last node points to nothing
        n must be >= 1, otherwise raises ValueError
    """
    #jupman-raise
    if n < 1:
        raise ValueError("Invalid n %s" % n)
    ret = [[0]*n for i in range(n)]
    for i in range(n-1):
        ret[i][i+1] = 1
    return ret
    #/jupman-raise
assert line(1) == [
[0]
]
assert line(2) == [
[0,1],
[0,0]
]
assert line(3) == [
[0,1,0],
[0,0,1],
[0,0,0]
]
assert line(4) == [
[0,1,0,0],
[0,0,1,0],
[0,0,0,1],
[0,0,0,0]
]
draw_mat(line(4))

cross¶
✪✪ RETURN an nxn matrix filled with zeros except on the crossing lines.
n must be >= 1 and odd, otherwise a ValueError is thrown
Example for n=7:
0001000
0001000
0001000
1111111
0001000
0001000
0001000
Try to figure out what the resulting graph would look like (try drawing it on paper; also notice that draw_mat will probably not draw the best possible representation)
[11]:
def cross(n):
    #jupman-raise
    if n < 1 or n % 2 == 0:
        raise ValueError("Invalid n %s" % n)
    ret = [[0]*n for i in range(n)]
    for i in range(n):
        ret[n//2][i] = 1    # middle row
        ret[i][n//2] = 1    # middle column
    return ret
    #/jupman-raise
assert cross(1) == [
[1]
]
assert cross(3) == [
[0,1,0],
[1,1,1],
[0,1,0]
]
assert cross(5) == [
[0,0,1,0,0],
[0,0,1,0,0],
[1,1,1,1,1],
[0,0,1,0,0],
[0,0,1,0,0]
]
union¶
✪✪ When we talk about the union of two graphs, we mean the graph whose vertices are the union of the vertices of both graphs, and whose edges are the union of the edges of both graphs. In this exercise, we have two graphs as lists of lists with boolean edges. To simplify, we suppose they have the same vertices but possibly different edges, and we want to calculate the union as a new graph.
For example, if we have a graph ma like this:
[12]:
ma = [
[True, False, False],
[False, True, False],
[True, False, False]
]
[13]:
draw_mat(ma)

And another mb like this:
[14]:
mb = [
[True, True, False],
[False, False, True],
[False, True, False]
]
[15]:
draw_mat(mb)

The result of calling union(ma, mb) will be the following:
[16]:
res = [[True, True, False], [False, True, True], [True, True, False]]
which will be displayed as
[17]:
draw_mat(res)

So we get the same vertices, and the edges of both ma and mb
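Since the union is just a cell-by-cell or, a compact alternative sketch can be written with zip and list comprehensions. The name union_zip is hypothetical. Like the exercise requires, it raises ValueError on mismatched row numbers.

```python
def union_zip(mata, matb):
    """Cell-by-cell OR of two boolean nxn matrices, returned as a NEW matrix."""
    if len(mata) != len(matb):
        raise ValueError("different row number a:%s b:%s" % (len(mata), len(matb)))
    # zip pairs up corresponding rows, then corresponding cells within each row
    return [[a or b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(mata, matb)]

ma = [[True, False, False], [False, True, False], [True, False, False]]
mb = [[True, True, False], [False, False, True], [False, True, False]]
print(union_zip(ma, mb))
# [[True, True, False], [False, True, True], [True, True, False]]
```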
[18]:
def union(mata, matb):
    """ Takes two graphs represented as nxn matrices of lists of lists with boolean edges,
        and RETURN a NEW matrix which is the union of both graphs
        if mata row number is different from matb, raises ValueError
    """
    #jupman-raise
    if len(mata) != len(matb):
        raise ValueError("mata and matb have different row number a:%s b:%s!" % (len(mata), len(matb)))
    n = len(mata)
    ret = []
    for i in range(n):
        row = []
        ret.append(row)
        for j in range(n):
            row.append(mata[i][j] or matb[i][j])
    return ret
    #/jupman-raise
try:
    union([[False],[False]], [[False]])
    raise Exception("Shouldn't arrive here !")
except ValueError:
    "test passed"
try:
    union([[False]], [[False],[False]])
    raise Exception("Shouldn't arrive here !")
except ValueError:
    "test passed"
ma1 = [
[False]
]
mb1 = [
[False]
]
assert union(ma1, mb1) == [
[False]
]
ma2 = [
[False]
]
mb2 = [
[True]
]
assert union(ma2, mb2) == [
[True]
]
ma3 = [
[True]
]
mb3 = [
[False]
]
assert union(ma3, mb3) == [
[True]
]
ma4 = [
[True]
]
mb4 = [
[True]
]
assert union(ma4, mb4) == [
[True]
]
ma5 = [
[False, False, False],
[False, False, False],
[False, False, False]
]
mb5 = [
[True, False, True],
[False, True, True],
[False, False, False]
]
assert union(ma5, mb5) == [
[True, False, True],
[False, True, True],
[False, False, False]
]
ma6 = [
[True, False, True],
[False, True, True],
[False, False, False]
]
mb6 = [
[False, False, False],
[False, False, False],
[False, False, False]
]
assert union(ma6, mb6) == [
[True, False, True],
[False, True, True],
[False, False, False]
]
ma7 = [
[True, False, False],
[False, True, False],
[True, False, False]
]
mb7 = [
[True, True, False],
[False, False, True],
[False, True, False]
]
assert union(ma7, mb7) == [
[True, True, False],
[False, True, True],
[True, True, False]
]
is_subgraph¶
✪✪ If we interpret a matrix as a graph, we may wonder when a graph A is a subgraph of another graph B, that is, when the nodes of A are a subset of the nodes of B and the edges of A are a subset of the edges of B. For convenience, here we only consider graphs having the same n nodes in both A and B, while edges may vary. Graphs are represented as boolean matrices.
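Edge inclusion means that wherever A has True, B must have True too. In other words, A[i][j] implies B[i][j], which in Python can be written as not A[i][j] or B[i][j]. Here is a compact sketch with all(). The name is_subgraph_all is hypothetical.

```python
def is_subgraph_all(A, B):
    """True if every edge of A is also an edge of B (same n for both)."""
    if len(A) != len(B):
        raise ValueError("A size %s and B size %s should match !" % (len(A), len(B)))
    # "A[i][j] implies B[i][j]" is equivalent to "not A[i][j] or B[i][j]"
    return all(not a or b
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

print(is_subgraph_all([[True, False], [True, False]],
                      [[True, False], [True, True]]))   # True
print(is_subgraph_all([[True]], [[False]]))             # False
```

all() stops at the first cell that breaks the implication, just like the early return False in the loop-based solution.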
[19]:
def is_subgraph(A, B):
    """ RETURN True if A is a subgraph of B, that is, all of its edges also belong to B.
        A and B are boolean matrices of size nxn. If sizes don't match, raises ValueError
    """
    #jupman-raise
    n = len(A)
    m = len(B)
    if n != m:
        raise ValueError("A size %s and B size %s should match !" % (n,m))
    for i in range(n):
        for j in range(n):
            if A[i][j] and not B[i][j]:
                return False
    return True
    #/jupman-raise
# the set of edges is empty
ma = [
[False]
]
# the set of edges is empty
mb = [
[False]
]
# an empty set is always a subset of an empty set
assert is_subgraph(ma, mb) == True
# the set of edges is empty
ma = [
[False]
]
# the set of edges contains one element
mb = [
[True]
]
# an empty set is always a subset of any set, so function gives True
assert is_subgraph(ma, mb) == True
ma = [
[True]
]
mb = [
[True]
]
assert is_subgraph(ma, mb) == True
ma = [
[True]
]
mb = [
[False]
]
assert is_subgraph(ma, mb) == False
ma = [
[True, False],
[True, False],
]
mb = [
[True, False],
[True, True],
]
assert is_subgraph(ma, mb) == True
ma = [
[False, False, True],
[True, True,True],
[True, False,True],
]
mb = [
[True, False, True],
[True, True,True],
[True, True,True],
]
assert is_subgraph(ma, mb) == True
remove_node¶
✪✪ Here the function description is not very precise: it talks about nodes, but you have to operate on a matrix. Can you guess exactly what you have to do? In your experiments, try drawing the matrix before and after executing remove_node
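Removing node i means deleting both row i (its outgoing edges) and column i (its incoming edges). The sketch below does the same thing but returns a NEW matrix, leaving the input untouched. The name without_node is hypothetical. Note that the remaining nodes get renumbered, so the old node i+1 becomes node i.

```python
def without_node(mat, i):
    """Return a NEW matrix without row i and column i; mat is not modified."""
    return [
        [cell for j, cell in enumerate(row) if j != i]   # drop column i
        for k, row in enumerate(mat) if k != i           # drop row i
    ]

m = [
    [3, 5, 2],
    [6, 2, 3],
    [4, 2, 1],
]
print(without_node(m, 1))   # [[3, 2], [4, 1]]
print(len(m))               # 3: the original is unchanged
```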
[20]:
def remove_node(mat, i):
    """ MODIFIES mat by removing node i.
    """
    #jupman-raise
    del mat[i]          # remove row i (outgoing edges of node i)
    for row in mat:
        del row[i]      # remove column i (incoming edges of node i)
    #/jupman-raise
m = [
[3,5,2,5],
[6,2,3,7],
[4,2,1,2],
[7,2,2,6]
]
remove_node(m,2)
assert len(m) == 3
for i in range(3):
    assert len(m[i]) == 3
utriang¶
✪✪✪ You will try to create an upper triangular matrix of side n. What could possibly be the graph interpretation of such a matrix? Since draw_mat is a generic drawing function and doesn’t provide the best possible representation, try drawing a more intuitive one on paper.
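One possible reading, offered here as an interpretation and not taken from the exercise: in an upper triangular matrix, node i links only to itself and to nodes with a higher index. So, apart from self-loops, no path can ever return to a node it left. With all the allowed cells set to 1, every node is linked to every later one, like an ordering 0 ≤ 1 ≤ ... ≤ n-1. The sketch checks the first property. The name forward_only is hypothetical.

```python
def forward_only(mat):
    """True if every edge i -> j with i != j goes from a lower to a higher index."""
    n = len(mat)
    # cells with j < i lie below the diagonal: they must all be zero / False
    return all(not mat[i][j] for i in range(n) for j in range(i))

u = [
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]
print(forward_only(u))   # True: 0 reaches 1 and 2, 1 reaches 2, plus self-loops
u[2][0] = 1              # add an edge going "back" from node 2 to node 0
print(forward_only(u))   # False
```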
[21]:
def utriang(n):
    """ RETURN a matrix of size nxn which is upper triangular, that is,
        has all cells below the diagonal set to 0, while all the other cells
        are set to 1
    """
    #jupman-raise
    ret = []
    for i in range(n):
        row = []
        for j in range(n):
            if j < i:
                row.append(0)
            else:
                row.append(1)
        ret.append(row)
    return ret
    #/jupman-raise
assert utriang(1) == [
[1]
]
assert utriang(2) == [
[1,1],
[0,1]
]
assert utriang(3) == [
[1,1,1],
[0,1,1],
[0,0,1]
]
assert utriang(4) == [
[1,1,1,1],
[0,1,1,1],
[0,0,1,1],
[0,0,0,1]
]
ediff¶
✪✪✪ The edge difference of two graphs ediff(da,db) is a graph with the edges of the first except the edges of the second. For simplicity, here we consider only graphs having the same vertices but possibly different edges. This time we will try to operate on graphs represented as dictionaries of adjacency lists.
For example, if we have
[22]:
da = {
'a':['a','c'],
'b':['b', 'c'],
'c':['b','c']
}
[23]:
draw_adj(da)

and
[24]:
db = {
'a':['c'],
'b':['a','b', 'c'],
'c':['a']
}
[25]:
draw_adj(db)

The result of calling ediff(da,db) will be:
[26]:
res = {
'a':['a'],
'b':[],
'c':['b','c']
}
Which can be shown as
[27]:
draw_adj(res)

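As the comment in the solution below hints, membership tests on lists are slow, because each in scans the whole list. Here is a sketch of a set-based variant. The name ediff_sets is hypothetical. It converts each db list to a set once, then filters the da list so that da's order is kept.

```python
def ediff_sets(da, db):
    """Edges of da not in db, keeping da's order within each adjacency list."""
    ret = {}
    for key in da:
        excluded = set(db[key])          # set membership is O(1) on average
        ret[key] = [target for target in da[key] if target not in excluded]
    return ret

da = {'a': ['a', 'c'], 'b': ['b', 'c'], 'c': ['b', 'c']}
db = {'a': ['c'], 'b': ['a', 'b', 'c'], 'c': ['a']}
print(ediff_sets(da, db))   # {'a': ['a'], 'b': [], 'c': ['b', 'c']}
```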
[28]:
def ediff(da,db):
    """ Takes two graphs as dictionaries of adjacency lists da and db, and
        RETURN a NEW graph as dictionary of adjacency lists, containing the same vertices of da,
        and the edges of da except the edges of db.
        - As order of elements within the adjacency lists, use the same order as found in da.
        - We assume all vertices in da and db are represented in the keys (even if they have
          no outgoing edge), and that da and db have the same keys
        EXAMPLE:
            da = {
                'a':['a','c'],
                'b':['b', 'c'],
                'c':['b','c']
            }
            db = {
                'a':['c'],
                'b':['a','b', 'c'],
                'c':['a']
            }
            assert ediff(da, db) == {
                'a':['a'],
                'b':[],
                'c':['b','c']
            }
    """
    #jupman-raise
    ret = {}
    for key in da:
        ret[key] = []
        for target in da[key]:
            # not efficient but works for us
            # using sets would be better, see https://stackoverflow.com/a/6486483
            if target not in db[key]:
                ret[key].append(target)
    return ret
    #/jupman-raise
da1 = {
'a': []
}
db1 = {
'a': []
}
assert ediff(da1, db1) == {
'a': []
}
da2 = {
'a': []
}
db2 = {
'a': ['a']
}
assert ediff(da2, db2) == {
'a': []
}
da3 = {
'a': ['a']
}
db3 = {
'a': []
}
assert ediff(da3, db3) == {
'a': ['a']
}
da4 = {
'a': ['a']
}
db4 = {
'a': ['a']
}
assert ediff(da4, db4) == {
'a': []
}
da5 = {
'a':['b'],
'b':[]
}
db5 = {
'a':['b'],
'b':[]
}
assert ediff(da5, db5) == {
'a':[],
'b':[]
}
da6 = {
'a':['b'],
'b':[]
}
db6 = {
'a':[],
'b':[]
}
assert ediff(da6, db6) == {
'a':['b'],
'b':[]
}
da7 = {
'a':['a','b'],
'b':[]
}
db7 = {
'a':['a'],
'b':[]
}
assert ediff(da7, db7) == {
'a':['b'],
'b':[]
}
da8 = {
'a':['a','b'],
'b':['a']
}
db8 = {
'a':['a'],
'b':['b']
}
assert ediff(da8, db8) == {
'a':['b'],
'b':['a']
}
da9 = {
'a':['a','c'],
'b':['b', 'c'],
'c':['b','c']
}
db9 = {
'a':['c'],
'b':['a','b', 'c'],
'c':['a']
}
assert ediff(da9, db9) == {
'a':['a'],
'b':[],
'c':['b','c']
}
pyramid¶
✪✪✪ The following function requires creating a matrix filled with non-zero numbers. Even if we don’t know the exact network meaning, this fact alone lets us conclude that all nodes are linked to all others. A graph where this happens is called a clique (the Italian name is cricca - where have you already seen it? ;-)
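Each cell of the pyramid holds 1 plus its distance from the nearest border. This suggests a closed-form alternative: the value at row i, column j is min(i, j, n-1-i, n-1-j) + 1. Here is a sketch, with the hypothetical name pyramid_min. Since every value is at least 1, no cell is zero, which is why the graph is a clique.

```python
def pyramid_min(n):
    """Pyramid of odd side n, where each cell is 1 + its distance from the nearest border."""
    if n % 2 == 0:
        raise ValueError("n should be odd, found instead %s" % n)
    return [[min(i, j, n - 1 - i, n - 1 - j) + 1 for j in range(n)]
            for i in range(n)]

for row in pyramid_min(5):
    print(row)
# [1, 1, 1, 1, 1]
# [1, 2, 2, 2, 1]
# [1, 2, 3, 2, 1]
# [1, 2, 2, 2, 1]
# [1, 1, 1, 1, 1]
```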
[29]:
def pyramid(n):
    """
    Takes an odd number n >= 1 and RETURN a matrix as list of lists containing numbers arranged like this
    example for a pyramid of side 7:
    1111111
    1222221
    1233321
    1234321
    1233321
    1222221
    1111111
    If n is even, raises ValueError
    """
    #jupman-raise
    if n % 2 == 0:
        raise ValueError("n should be odd, found instead %s" % n)
    ret = [[0]*n for i in range(n)]
    for i in range(n//2 + 1):
        for j in range(n//2 + 1):
            # fill the four symmetric quadrants at once
            ret[i][j] = min(i, j) + 1
            ret[i][n-j-1] = min(i, j) + 1
            ret[n-i-1][j] = min(i, j) + 1
            ret[n-i-1][n-j-1] = min(i, j) + 1
    ret[n//2][n//2] = n // 2 + 1
    return ret
    #/jupman-raise
try:
    pyramid(4)
    raise Exception("SHOULD HAVE FAILED!")
except ValueError:
    "passed test"
assert pyramid(1) == [
[1]
]
assert pyramid(3) == [
[1,1,1],
[1,2,1],
[1,1,1]
]
assert pyramid(5) == [
[1, 1, 1, 1, 1],
[1, 2, 2, 2, 1],
[1, 2, 3, 2, 1],
[1, 2, 2, 2, 1],
[1, 1, 1, 1, 1]
]
Adjacency lists¶
So far, we have represented graphs as matrices, saying they are good when the graph is dense, that is, when any given node is likely to be connected to almost all other nodes (or, equivalently, when many cell entries in the matrix are different from zero). If this is not the case, other representations might be needed. For example, we can represent a graph as adjacency lists.
Let’s look at this 6x6 boolean matrix:
[30]:
m = [
[False, False, False, False, False, False],
[False, False, False, False, False, False],
[True, False, False, True, False, False],
[False, False, False, False, False, False],
[False, False, False, False, False, False],
[False, False, True, False, False, False]
]
We see just a few True, so by drawing it we don’t expect to see many edges:
[31]:
draw_mat(m)

As a more compact representation, we might represent the data as a dictionary of adjacency lists, where the keys are the node indexes and each node is associated with a list of the target nodes it points to.
To reproduce the example above, we can write like this:
[32]:
d = {
0: [], # node 0 links to nothing
1: [], # node 1 links to nothing
2: [0,3], # node 2 links to node 0 and 3
3: [], # node 3 links to nothing
4: [], # node 4 links to nothing
5: [2] # node 5 links to node 2
}
In sciprog.py, we also provide a function sciprog.draw_adj to quickly inspect such a data structure:
[33]:
from sciprog import draw_adj
draw_adj(d)

As expected, the resulting graph is the same as for the equivalent matrix representation.
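To get a feeling for the difference in size, here is a sketch that counts the stored entries in both representations. The helper names are hypothetical. The matrix always stores n*n cells, while the adjacency lists store only the existing edges.

```python
def matrix_cells(mat):
    """Number of cells stored by the matrix, whatever their value."""
    return sum(len(row) for row in mat)

def adj_entries(d):
    """Number of targets stored by the adjacency lists, i.e. the number of edges."""
    return sum(len(targets) for targets in d.values())

m = [[False] * 6 for _ in range(6)]
m[2][0] = m[2][3] = m[5][2] = True     # same edges as the example above
d = {0: [], 1: [], 2: [0, 3], 3: [], 4: [], 5: [2]}

print(matrix_cells(m))   # 36
print(adj_entries(d))    # 3
```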
mat_to_adj¶
✪✪ Implement a function that takes a boolean nxn matrix and RETURN the equivalent representation as dictionary of adjacency lists. Remember that to create an empty dict you have to write dict()
[34]:
def mat_to_adj(bool_mat):
    #jupman-raise
    ret = dict()
    n = len(bool_mat)
    for i in range(n):
        ret[i] = []
        for j in range(n):
            if bool_mat[i][j]:
                ret[i].append(j)
    return ret
    #/jupman-raise
m1 = [
[False]
]
d1 = {
0:[]
}
assert mat_to_adj(m1) == d1
m2 = [
[True]
]
d2 = {
0:[0]
}
assert mat_to_adj(m2) == d2
m3 = [
[False,False],
[False,False]
]
d3 = {
0:[],
1:[]
}
assert mat_to_adj(m3) == d3
m4 = [
[True,True],
[True,True]
]
d4 = {
0:[0,1],
1:[0,1]
}
assert mat_to_adj(m4) == d4
m5 = [
[False,False],
[False,True]
]
d5 = {
0:[],
1:[1]
}
assert mat_to_adj(m5) == d5
m6 = [
[True,False,False],
[True, True,False],
[False,True,False]
]
d6 = {
0:[0],
1:[0,1],
2:[1]
}
assert mat_to_adj(m6) == d6
mat_ids_to_adj¶
✪✪ Implement a function that takes a boolean nxn matrix and a list of immutable identifiers for the nodes, and RETURN the equivalent representation as dictionary of adjacency lists.
If the matrix is not n x n, or the length of ids does not match n, raise ValueError
[35]:
def mat_ids_to_adj(bool_mat, ids):
    #jupman-raise
    ret = dict()
    n = len(bool_mat)
    for row in bool_mat:   # every row must have exactly n cells
        if len(row) != n:
            raise ValueError('matrix is not nxn !')
    if n != len(ids):
        raise ValueError("Identifiers quantity is different from matrix size!")
    for i in range(n):
        ret[ids[i]] = []
        for j in range(n):
            if bool_mat[i][j]:
                ret[ids[i]].append(ids[j])
    return ret
    #/jupman-raise
try:
    mat_ids_to_adj([[False, True]], ['a','b'])
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
try:
    mat_ids_to_adj([[False]], ['a','b'])
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
m1 = [
[False]
]
d1 = { 'a':[] }
assert mat_ids_to_adj(m1, ['a']) == d1
m2 = [
[True]
]
d2 = { 'a':['a'] }
assert mat_ids_to_adj(m2, ['a']) == d2
m3 = [
[False,False],
[False,False]
]
d3 = {
'a':[],
'b':[]
}
assert mat_ids_to_adj(m3,['a','b']) == d3
m4 = [
[True,True],
[True,True]
]
d4 = {
'a':['a','b'],
'b':['a','b']
}
assert mat_ids_to_adj(m4, ['a','b']) == d4
m5 = [
[False,False],
[False,True]
]
d5 = {
'a':[],
'b':['b']
}
assert mat_ids_to_adj(m5,['a','b']) == d5
m6 = [
[True,False,False],
[True, True,False],
[False,True,False]
]
d6 = {
'a':['a'],
'b':['a','b'],
'c':['b']
}
assert mat_ids_to_adj(m6,['a','b','c']) == d6
adj_to_mat¶
✪✪✪ Now try the conversion from a dictionary of adjacency lists to a matrix (this is a bit hard).
The general idea is that you have to fill an nxn matrix to return. While filling the cell at row i and column j, you have to decide whether to put True or False. Put True if the list in d corresponding to the i-th key contains the id of the j-th key. Otherwise, put False.
If you look at the tests, we are passing OrderedDict as inputs. The reason is that when we check the output matrix of your function, we want to be sure the matrix rows are ordered in a certain way.
But you have to assume d can contain arbitrary ids with no precise ordering, so:
first you should scan the dictionary and save the mapping from indexes to ids in a separate list
NOTE: d.keys() is not exactly a list (it does not allow access by index), so you must convert it to a list with this: list(d.keys())
then you should build the matrix to return, using the previously built list when needed.
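Here is a sketch of these two steps. It assumes, as in the exercise, that every node appears as a key. The helper ids_to_index is hypothetical. It also builds the reverse dict from id to index with enumerate, so each lookup is constant time instead of a scan with .index().

```python
def ids_to_index(d):
    """Map each node id to the row/column index it gets in the matrix."""
    row_indexes_to_ids = list(d.keys())    # index -> id (d.keys() is not indexable)
    return {node_id: i for i, node_id in enumerate(row_indexes_to_ids)}

d = {'x': ['y'], 'y': [], 'z': ['x', 'y']}
index = ids_to_index(d)
print(index)   # {'x': 0, 'y': 1, 'z': 2}

# with the mapping, each edge key -> target becomes a True cell
n = len(d)
mat = [[False] * n for _ in range(n)]
for key, targets in d.items():
    for target in targets:
        mat[index[key]][index[target]] = True
print(mat)     # [[False, True, False], [False, False, False], [True, True, False]]
```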
Now implement the function:
[36]:
def adj_to_mat(d):
    """ Take a dictionary of adjacency lists with arbitrary ids and
        RETURN its representation as an nxn boolean matrix (assume
        all nodes are present as keys)
        - Assume d is a simple dictionary (not necessarily an OrderedDict)
    """
    #jupman-raise
    ret = []
    n = len(d)
    # first map row indexes to keys
    row_indexes_to_ids = list(d.keys())  # because d.keys() is *not* indexable !
    for key in d:
        row = []
        ret.append(row)
        for j in range(n):
            if row_indexes_to_ids[j] in d[key]:
                row.append(True)
            else:
                row.append(False)
    return ret
    #/jupman-raise
from collections import OrderedDict
od1 = OrderedDict([
('a',[])
])
m1 = [ [False] ]
assert adj_to_mat(od1) == m1
od2 = OrderedDict([
('a',['a'])
])
m2 = [ [True] ]
assert adj_to_mat(od2) == m2
od3 = OrderedDict([
('a',['a','b']),
('b',['a','b']),
])
m3 = [
[True, True],
[True, True]
]
assert adj_to_mat(od3) == m3
od4 = OrderedDict([
('a',[]),
('b',[]),
])
m4 = [
[False, False],
[False, False]
]
assert adj_to_mat(od4) == m4
od5 = OrderedDict([
('a',['a']),
('b',['a','b']),
])
m5 = [
[True, False],
[True, True]
]
assert adj_to_mat(od5) == m5
od6 = OrderedDict([
('a',['a','c']),
('b',['c']),
('c',['a','b']),
])
m6 = [
[True, False, True],
[False, False, True],
[True, True, False],
]
assert adj_to_mat(od6) == m6
table_to_adj¶
Suppose you have a table expressed as a list of lists with headers like this:
[37]:
m0 = [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
where a, b, c etc. are the row identifiers (imagine they represent items in a store), and Price and Quantity are properties they might have. NOTE: here we put two properties, but they might have n properties!
We want to transform such a table into a graph-like format: a dictionary of lists that relates store items (as keys) to the properties they might have. To include both the property identifier and its value in the list, we will use tuples. So you need to write a function that transforms the above input into this:
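Each row is paired with the header, skipping the identifier column. So zip can do the pairing in one step. Here is a sketch with the hypothetical name table_to_adj_zip.

```python
def table_to_adj_zip(table):
    """Map each row identifier to a list of (header, value) tuples."""
    headers = table[0]
    # zip pairs each property header with the matching cell; column 0 is the identifier
    return {row[0]: list(zip(headers[1:], row[1:])) for row in table[1:]}

m0 = [
    ['Identifier', 'Price', 'Quantity'],
    ['a', 1, 1],
    ['b', 5, 8],
]
print(table_to_adj_zip(m0))
# {'a': [('Price', 1), ('Quantity', 1)], 'b': [('Price', 5), ('Quantity', 8)]}
```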
[38]:
res0 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}
[39]:
def table_to_adj(table):
    #jupman-raise
    ret = {}
    headers = table[0]
    for row in table[1:]:
        lst = []
        for j in range(1, len(row)):
            lst.append((headers[j], row[j]))
        ret[row[0]] = lst
    return ret
    #/jupman-raise
m0 = [
['I','P','Q']
]
res0 = {}
assert res0 == table_to_adj(m0)
m1 = [
['Identifier','Price','Quantity'],
['a',1,1],
['b',5,8],
['c',2,6],
['d',8,5],
['e',7,3]
]
res1 = {
'a':[('Price',1),('Quantity',1)],
'b':[('Price',5),('Quantity',8)],
'c':[('Price',2),('Quantity',6)],
'd':[('Price',8),('Quantity',5)],
'e':[('Price',7),('Quantity',3)]
}
assert res1 == table_to_adj(m1)
m2 = [
['I','P','Q'],
['a','x','y'],
['b','w','z'],
['c','z','x'],
['d','w','w'],
['e','y','x']
]
res2 = {
'a':[('P','x'),('Q','y')],
'b':[('P','w'),('Q','z')],
'c':[('P','z'),('Q','x')],
'd':[('P','w'),('Q','w')],
'e':[('P','y'),('Q','x')]
}
assert res2 == table_to_adj(m2)
m3 = [
['I','P','Q', 'R'],
['a','x','y', 'x'],
['b','z','x', 'y'],
]
res3 = {
'a':[('P','x'),('Q','y'), ('R','x')],
'b':[('P','z'),('Q','x'), ('R','y')],
}
assert res3 == table_to_adj(m3)
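As a side note, the same conversion can be written more compactly with zip, which pairs each header with the corresponding row value. The following is only a sketch (the name table_to_adj_zip is made up for this example), assuming every row has as many cells as the header:

```python
def table_to_adj_zip(table):
    """Hypothetical compact variant of table_to_adj, built with zip."""
    headers = table[0][1:]                          # property names, skipping the identifier column
    return {row[0]: list(zip(headers, row[1:]))     # pair each property name with its value
            for row in table[1:]}                   # skip the header row


m = [
    ['Identifier', 'Price', 'Quantity'],
    ['a', 1, 1],
    ['b', 5, 8],
]
print(table_to_adj_zip(m))
# {'a': [('Price', 1), ('Quantity', 1)], 'b': [('Price', 5), ('Quantity', 8)]}
```

Since zip produces tuples, the result has exactly the same shape as the one returned by table_to_adj.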
Networkx¶
Before continuing, make sure to have installed the required libraries
Networkx is a library to perform statistics on networks. For now, it will offer us a richer data structure where we can store the properties we want in nodes and also edges.
You can initialize networkx objects with the dictionary of adjacency lists we’ve already seen:
[40]:
import networkx as nx
# notice with networkx if nodes are already referenced to in an adjacency list
# you do not need to put them as keys:
G=nx.DiGraph({
'a':['b','c'], # node a links to b and c
'b':['b','c', 'd'] # node b links to b itself, c and d
})
The resulting object is not a simple dict, but something more complex:
[41]:
G
[41]:
<networkx.classes.digraph.DiGraph at 0x7fef507c1080>
To display it in a way consistent with the rest of the course, we developed a function called sciprog.draw_nx:
[42]:
from sciprog import draw_nx
[43]:
draw_nx(G)

From the picture above, we notice there are no weights displayed, because in networkx they are just considered optional attributes of edges.
To see all the attributes of an edge, you can write like this:
[44]:
G['a']['b']
[44]:
{}
This edge has no attributes, so we get back an empty dict. If we wanted to add a weight of 123 to that particular edge from a to b, we could write:
[45]:
G['a']['b']['weight'] = 123
[46]:
G['a']['b']
[46]:
{'weight': 123}
Let’s try to display it:
[47]:
draw_nx(G)

We still don’t see the weight, because weight is just one of many possible properties: the only property that gets displayed is label. So let’s set label equal to the weight:
[48]:
G['a']['b']['label'] = 123
[49]:
draw_nx(G)
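Edge attributes can also be given directly when creating the edge, since add_edge accepts them as keyword arguments. A small sketch on a fresh graph (we call it H so the G above is left untouched):

```python
import networkx as nx

H = nx.DiGraph()
# keyword arguments become edge attributes; nodes 'a' and 'b' are created if missing
H.add_edge('a', 'b', weight=123, label='123')
H.add_edge('a', 'c')            # an edge with no attributes

print(H['a']['b'])   # {'weight': 123, 'label': '123'}
print(H['a']['c'])   # {}
```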

Converting networkx graphs¶
If you try to just output the string representation of the graph, networkx (in the version used for these notes) gives back the graph name, which by default is the empty string:
[50]:
print(G)
[51]:
str(G)
[51]:
''
[52]:
repr(G)
[52]:
'<networkx.classes.digraph.DiGraph object at 0x7fef507c1080>'
To convert to the dict of adjacency lists we know, you can use this method:
[53]:
nx.to_dict_of_lists(G)
[53]:
{'a': ['b', 'c'], 'b': ['b', 'c', 'd'], 'c': [], 'd': []}
The above works, but it doesn’t convert additional edge info. For a complete conversion, use nx.to_dict_of_dicts
[54]:
nx.to_dict_of_dicts(G)
[54]:
{'a': {'b': {'weight': 123, 'label': 123}, 'c': {}},
'b': {'b': {}, 'c': {}, 'd': {}},
'c': {},
'd': {}}
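If you only need to walk through the edges together with their attributes, iterating over edges(data=True) may be handier than a full conversion. A sketch on a fresh copy of the graph above, named H to keep G untouched:

```python
import networkx as nx

H = nx.DiGraph({'a': ['b', 'c'], 'b': ['b', 'c', 'd']})
H['a']['b']['weight'] = 123

# edges(data=True) yields (source, target, attribute_dict) triples
for u, v, attrs in H.edges(data=True):
    print(u, '->', v, attrs)

# data='weight' yields (source, target, weight) triples,
# using the default value for edges that have no 'weight' attribute
weights = {(u, v): w for u, v, w in H.edges(data='weight', default=1)}
print(weights)
```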
mat_to_nx¶
✪✪ Now try by yourself to convert a matrix as list of lists along with node ids (like you did before) into a networkx object.
This time, don’t create a dictionary to pass to the nx.DiGraph constructor: instead, use networkx methods like .add_edge and .add_node. For usage examples, check the networkx tutorial. Do you need to explicitly call add_node before referring to some node with add_edge?
[55]:
def mat_to_nx(mat, ids):
    """ Given a real-valued nxn matrix as list of lists and a list of immutable identifiers for the nodes,
        RETURN the corresponding graph in networkx format (as nx.DiGraph).
        If matrix is not nxn or ids length does not match n, raise ValueError
        - DON'T transform into a dict, use add_ methods from networkx object!
        - WARNING: Remember to set the labels to the weights AS STRINGS!
    """
    #jupman-raise
    G = nx.DiGraph()
    n = len(mat)
    m = len(mat[0])
    if n != m:
        raise ValueError('matrix is not nxn !')
    if n != len(ids):
        raise ValueError("Identifiers quantity is different from matrix size!" )
    for i in range(n):
        G.add_node(ids[i])
        for j in range(n):
            if mat[i][j] != 0:
                G.add_edge(ids[i], ids[j])
                G[ids[i]][ids[j]]['weight'] = mat[i][j]
                G[ids[i]][ids[j]]['label'] = str(mat[i][j])
    return G
    #/jupman-raise
try:
    mat_to_nx([[0, 3]], ['a','b'])
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
try:
    mat_to_nx([[0]], ['a','b'])
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
m1 = [
[0]
]
d1 = {'a': {}}
assert nx.to_dict_of_dicts(mat_to_nx(m1, ['a'])) == d1
m2 = [
[7]
]
d2 = {'a': {'a': {'weight': 7, 'label': '7'}}}
assert nx.to_dict_of_dicts(mat_to_nx(m2, ['a'])) == d2
m3 = [
[0,0],
[0,0]
]
d3 = {
'a':{},
'b':{}
}
assert nx.to_dict_of_dicts(mat_to_nx(m3,['a','b'])) == d3
m4 = [
[7,9],
[8,6]
]
d4 = {
'a':{'a': {'weight':7,'label':'7'},
'b' : {'weight':9,'label':'9'},
},
'b':{'a': {'weight':8,'label':'8'},
'b' : {'weight':6,'label':'6'},
}
}
assert nx.to_dict_of_dicts(mat_to_nx(m4, ['a','b'])) == d4
m5 = [
[0,0],
[0,7]
]
d5 = {
'a':{},
'b':{
'b' : {'weight':7,'label':'7'},
}
}
assert nx.to_dict_of_dicts(mat_to_nx(m5,['a','b'])) == d5
m6 = [
[7,0,0],
[7,9,0],
[0,7,0]
]
d6 = {
'a':{
'a' : {'weight':7,'label':'7'},
},
'b': {
'a': {'weight':7,'label':'7'},
'b' : {'weight':9,'label':'9'}
},
'c':{
'b' : {'weight':7,'label':'7'}
}
}
assert nx.to_dict_of_dicts(mat_to_nx(m6,['a','b','c'])) == d6
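About the question above: in networkx, add_edge creates any missing node automatically, so calling add_node first is not required. add_node is still needed for nodes without edges, which is why mat_to_nx calls it (think of the matrix [[0]]). A minimal check:

```python
import networkx as nx

G_demo = nx.DiGraph()
G_demo.add_edge('x', 'y')       # no add_node needed: 'x' and 'y' are created on the fly
print(list(G_demo.nodes()))     # ['x', 'y']

G_demo.add_node('z')            # an isolated node (no edges) must be added explicitly
print(nx.to_dict_of_dicts(G_demo))   # {'x': {'y': {}}, 'y': {}, 'z': {}}
```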
Simple statistics¶
We will now compute simple statistics about graphs. More advanced stuff will be done in the Part B notebook about graph algorithms.
Outdegrees and indegrees¶
The out-degree \(\deg^+(v)\) of a node \(v\) is the number of edges going out from it, while the in-degree \(\deg^-(v)\) is the number of edges going into it.
NOTE: the out-degree and in-degree are not the sum of weights ! They just count presence or absence of edges.
For example, consider this graph:
[56]:
from sciprog import draw_adj
d = {
'a' : ['b','c'],
'b' : ['b','d'],
'c' : ['a','b','c','d'],
'd' : ['b','d']
}
draw_adj(d)

The out-degree of d is 2, because it has one outgoing edge to b but also an outgoing edge to itself. The in-degree of d is 3, because it has an edge coming from b, one from c and one self-loop from d itself.
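To compute the degrees of all the nodes at once from the adjacency lists, a sketch using collections.Counter could look like this (it assumes no adjacency list contains the same target twice):

```python
from collections import Counter

d = {
    'a': ['b', 'c'],
    'b': ['b', 'd'],
    'c': ['a', 'b', 'c', 'd'],
    'd': ['b', 'd'],
}

# out-degree: the length of each adjacency list
outdeg = {v: len(targets) for v, targets in d.items()}
# in-degree: how many times each node appears as a target in any list
indeg = Counter(t for targets in d.values() for t in targets)

print(outdeg['d'], indeg['d'])   # 2 3
```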
outdegree_adj¶
[57]:
def outdegree_adj(d, v):
    """ RETURN the outdegree of a node from graph d represented as a dictionary of adjacency lists
        If v is not a vertex of d, raise ValueError
    """
    #jupman-raise
    if v not in d:
        raise ValueError("Vertex %s is not in %s" % (v, d))
    return len(d[v])
    #/jupman-raise
try:
    outdegree_adj({'a':[]},'b')
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
assert outdegree_adj({
'a':[]
},'a') == 0
assert outdegree_adj({
'a':['a']
},'a') == 1
assert outdegree_adj({
'a':['a','b'],
'b':[]
},'a') == 2
assert outdegree_adj({
'a':['a','b'],
'b':['a','b','c'],
'c':[]
},'b') == 3
outdegree_mat¶
✪✪ RETURN the outdegree of a node i from a graph represented as an nxn boolean matrix (list of lists). If i is not a node of the graph, raise ValueError
[58]:
def outdegree_mat(mat, i):
    #jupman-raise
    n = len(mat)
    if i < 0 or i >= n:    # valid row indexes go from 0 to n-1
        raise ValueError("i %s is not a row of matrix %s" % (i, mat))
    ret = 0
    for j in range(n):
        if mat[i][j]:
            ret += 1
    return ret
    #/jupman-raise
try:
    outdegree_mat([[False]],7)
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
try:
    outdegree_mat([[False]],-1)
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
assert outdegree_mat(
[
[False]
]
,0) == 0
assert outdegree_mat(
[
[True]
],0) == 1
assert outdegree_mat(
[
[True, True],
[False, False]
],0) == 2
assert outdegree_mat(
[
[True, True, False],
[True, True, True],
[False, False, False],
]
,1) == 3
outdegree_avg¶
✪✪ RETURN the average outdegree of nodes in graph d, represented as a dictionary of adjacency lists. Assume all nodes are in the keys.
[59]:
def outdegree_avg(d):
    #jupman-raise
    s = 0
    for k in d:
        s += len(d[k])
    return s / len(d)
    #/jupman-raise
assert outdegree_avg({
'a':[]
}) == 0
assert round(
outdegree_avg({
'a':['a']
})
,2) == 1.00 / 1.00
assert round(
outdegree_avg({
'a':['a','b'],
'b':[]
})
,2) == (2 + 0) / 2
assert round(
outdegree_avg({
'a':['a','b'],
'b':['a','b','c'],
'c':[]
})
,2) == round( (2 + 3) / 3 , 2)
indegree_adj¶
The indegree of a node v is the number of edges going into it.
✪✪ RETURN the indegree of node v in graph d, represented as a dictionary of adjacency lists. If v is not a node of the graph, raise ValueError
[60]:
def indegree_adj(d, v):
    #jupman-raise
    if v not in d:
        raise ValueError("Vertex %s is not in %s" % (v, d))
    ret = 0
    for k in d:
        if v in d[k]:
            ret += 1
    return ret
    #/jupman-raise
try:
    indegree_adj({'a':[]},'b')
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
assert indegree_adj({
'a':[]
},'a') == 0
assert indegree_adj({
'a':['a']
},'a') == 1
assert indegree_adj({
'a':['a','b'],
'b':[]
},'a') == 1
assert indegree_adj({
'a':['a','b'],
'b':['a','b','c'],
'c':[]
},'b') == 2
indegree_mat¶
✪✪ RETURN the indegree of a node i from a graph represented as an nxn boolean matrix (list of lists). If i is not a node of the graph, raise ValueError
[61]:
def indegree_mat(mat, i):
    #jupman-raise
    n = len(mat)
    if i < 0 or i >= n:    # valid column indexes go from 0 to n-1
        raise ValueError("i %s is not a row of matrix %s" % (i, mat))
    ret = 0
    for k in range(n):
        if mat[k][i]:      # scan column i: is there an edge from k to i?
            ret += 1
    return ret
    #/jupman-raise
try:
    indegree_mat([[False]],7)
    raise Exception("SHOULD HAVE FAILED !")
except ValueError:
    "passed test"
assert indegree_mat(
[
[False]
]
,0) == 0
assert indegree_mat(
[
[True]
],0) == 1
assert indegree_mat(
[
[True, True],
[False, False]
],0) == 1
assert indegree_mat(
[
[True, True, False],
[True, True, True],
[False, False, False],
]
,1) == 2
indegree_avg¶
✪✪ RETURN the average indegree of nodes in graph d, represented as a dictionary of adjacency lists. Assume all nodes are in the keys.
[62]:
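def indegree_avg(d):
    #jupman-raise
    # each edge adds 1 to the indegree of its target, so the sum of all
    # indegrees equals the total number of edges, i.e. the sum of list lengths
    s = 0
    for k in d:
        s += len(d[k])
    return s / len(d)
    #/jupman-raise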
assert indegree_avg({
'a':[]
}) == 0
assert round(
indegree_avg({
'a':['a']
})
,2) == 1.00 / 1.00
assert round(
indegree_avg({
'a':['a','b'],
'b':[]
})
,2) == (1 + 1) / 2
assert round(
indegree_avg({
'a':['a','b'],
'b':['a','b','c'],
'c':[]
})
,2) == round( (2 + 2 + 1) / 3 , 2)
Was it worth it?¶
QUESTION: Is there any difference between the results of indegree_avg and outdegree_avg?
ANSWER: They give the same result. Think about what you did: for outdegree_avg you summed over all rows and then divided by n. For indegree_avg you summed over all columns, and then divided by n. Either way every edge is counted exactly once, so both sums equal the total number of edges.
More formally, the so-called degree sum formula holds (see Wikipedia for more info), where V is the set of vertices and A is the set of arcs (directed edges):
\(\sum_{v \in V} \deg^-(v) = \sum_{v \in V} \deg^+(v) = |A|\)
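We can verify the degree sum formula on one of the test graphs. The helper degree_sums below is hypothetical, written just for this check; it computes in-degrees the same way indegree_adj does, so it assumes no duplicate targets in the adjacency lists:

```python
def degree_sums(d):
    """Hypothetical helper: RETURN (sum of in-degrees, sum of out-degrees, |A|)
    for a graph given as a dict of adjacency lists with all nodes as keys."""
    arcs = [(u, v) for u in d for v in d[u]]     # the arc set A, as (source, target) pairs
    out_sum = sum(len(d[v]) for v in d)          # sum of deg+(v)
    in_sum = sum(                                # sum of deg-(v), counting for each v
        sum(1 for u in d if v in d[u])           # how many lists contain v
        for v in d
    )
    return in_sum, out_sum, len(arcs)


d_ex = {'a': ['a', 'b'], 'b': ['a', 'b', 'c'], 'c': []}
print(degree_sums(d_ex))   # (5, 5, 5)
```

Dividing both sums by |V| gives the same average, which is why indegree_avg and outdegree_avg agree.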
min_outdeg¶
Difficulty: ✪✪✪
Before proceeding, please make sure you have read the recursions on lists chapter
[63]:
def helper(mat, start, end):
    """
    Takes a graph as matrix of list of lists and RETURN the minimum
    outdegree of nodes with row index between indices start (included)
    and end (included)
    This function MUST be recursive, so it must call itself.
    - HINT: REMEMBER to put return instructions in all 'if' branches!
    """
    #jupman-raise
    if start == end:
        # base case: a single row, its outdegree is the number of True cells
        return mat[start].count(True)
    else:
        half = (start + end) // 2
        min_left = helper(mat, start, half)      # rows start..half
        min_right = helper(mat, half+1, end)     # rows half+1..end
        return min(min_left, min_right)
    #/jupman-raise
def min_outdeg(mat):
    """
    Takes a graph as matrix of list of lists and RETURN the minimum
    outdegree of nodes by calling function helper.
    min_outdeg function is *not* recursive, only function helper is.
    """
    #jupman-raise
    return helper(mat, 0, len(mat) - 1)
    #/jupman-raise
assert min_outdeg(
[
[False]
]) == 0
assert min_outdeg(
[
[True]
]) == 1
assert min_outdeg(
[
[False, True],
[True, False]
]) == 1
assert min_outdeg(
[
[True, True, False],
[True, True, True],
[False, True, True],
]) == 2
assert min_outdeg(
[
[True, True, False],
[True, True, True],
[False, True, False],
]) == 1
assert min_outdeg(
[
[True, True, True],
[True, True, True],
[False, True, False],
]) == 1
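While debugging the recursive version, it can help to compare it against a plain non-recursive computation. Here is a possible reference (the name min_outdeg_check is made up for this example); you could use it as assert min_outdeg(m) == min_outdeg_check(m) on the test matrices:

```python
def min_outdeg_check(mat):
    """Hypothetical non-recursive reference: the minimum, over all rows,
    of the number of True cells in the row."""
    return min(row.count(True) for row in mat)


m = [
    [True, True, False],
    [True, True, True],
    [False, True, False],
]
print(min_outdeg_check(m))   # 1
```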
networkx Indegrees and outdegrees¶
With Networkx we can easily calculate indegrees and outdegrees of a node:
[64]:
import networkx as nx
# notice with networkx if nodes are already referenced to in an adjacency list
# you do not need to put them as keys:
G=nx.DiGraph({
'a':['b','c'], # node a links to b and c
'b':['b','c', 'd'] # node b links to b itself, c and d
})
draw_nx(G)

[65]:
G.out_degree('a')
[65]:
2
QUESTION: What is the outdegree of 'b'? Try to think about it and then confirm your thoughts with networkx:
[66]:
# write here
#print("indegree b: %s" % G.in_degree('b'))
#print("outdegree b: %s" % G.out_degree('b'))
QUESTION: We defined indegree and outdegree. Can you guess what the degree might be? In particular, for a self-pointing node like 'b', what could it be? Try to use the G.degree('b') method to validate your thoughts.
[67]:
# write here
#print("degree b: %s" % G.degree('b'))
ANSWER: it is the sum of indegree and outdegree. In presence of a self-loop like the one on 'b', we count the self-loop twice, once as an outgoing edge and once as an incoming edge.
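Calling the degree methods without arguments gives the values for all the nodes at once, as (node, degree) pairs. A sketch on the same graph, which also checks that degree is in-degree plus out-degree:

```python
import networkx as nx

G = nx.DiGraph({'a': ['b', 'c'], 'b': ['b', 'c', 'd']})

# without arguments, the degree views iterate over (node, degree) pairs
print(dict(G.in_degree()))    # {'a': 0, 'b': 2, 'c': 2, 'd': 1}
print(dict(G.out_degree()))   # {'a': 2, 'b': 3, 'c': 0, 'd': 0}

# the self-loop on 'b' contributes 1 to the in-degree and 1 to the out-degree
b_deg = G.degree('b')
print(b_deg)                  # 5
```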
[68]:
# write here
#G.degree('b')
[69]:
draw_nx(mat_to_nx([
[7,0,0],
[7,9,0],
[0,7,0]
], ['a','b','c']))

Visualization solutions¶
Introduction¶
We will review the famous Matplotlib library, which allows us to display a variety of charts and is the base of many other visualization libraries.
References
What to do¶
unzip the exercises in a folder, you should get something like this:
-jupman.py
-sciprog.py
-exercises
|- visualization
|- visualization-exercise.ipynb
|- visualization-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !
open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook
exercises/visualization/visualization-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
First example¶
Let’s start with a very simple plot:
[2]:
# this is *not* a python command, it is a Jupyter-specific magic command,
# to tell jupyter we want the graphs displayed in the cell outputs
%matplotlib inline
# imports matplotlib
import matplotlib.pyplot as plt
# we can give coordinates as simple number lists
# these are (x, y) pairs for the function y = 2 * x
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]
plt.plot(xs, ys)
# we can add this after plot call, it doesn't matter
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')
# prevents showing '<matplotlib.text.Text at 0x7fbcf3c4ff28>' in Jupyter
plt.show()
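Matplotlib also offers an object-oriented interface, where you work on an explicit Axes object instead of the implicit current one; the plt functions used above act on that current Axes behind the scenes. The same plot written that way:

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2 * x for x in xs]

# plt.subplots returns the figure and an Axes object; the set_* methods
# on the Axes play the role of plt.title, plt.xlabel and plt.ylabel
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_title("my function")
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
```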

Plot style¶
To change the way the line is displayed, you can set dot styles with another string parameter. For example, to display red dots, you would add the string ro, where r stands for red and o stands for dot.
[3]:
%matplotlib inline
import matplotlib.pyplot as plt
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]
plt.plot(xs, ys, 'ro') # NOW USING RED DOTS
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')
plt.show()
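The 'ro' shorthand is equivalent to passing color, marker and linestyle as keyword arguments, which can be easier to read when styles get complicated. A sketch comparing the two:

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]

fig, ax = plt.subplots()
# shorthand: 'r' = red, 'o' = circle marker; no line style given, so no line is drawn
short_line, = ax.plot(xs, ys, 'ro')
# the same style spelled out with keyword arguments ('' means no connecting line)
spelled_out, = ax.plot(xs, ys, color='red', marker='o', linestyle='')
# other shorthands you may try: 'g--' (green dashed line), 'b^' (blue triangles)
plt.show()
```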

x power 2 exercise¶
Try to display the function y = x**2 (x power 2) using green dots and for integer xs going from -10 to 10
[4]:
# write here the solution
[5]:
# SOLUTION
%matplotlib inline
import matplotlib.pyplot as plt
xs = range(-10, 11) # range excludes its end, so we need 11 to include 10
ys = [x**2 for x in xs ]
plt.plot(xs, ys, 'go')
plt.title("x squared")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Axis limits¶
If you want to change the displayed range of the x axis, you can use plt.xlim (and plt.ylim for the y axis):
[6]:
%matplotlib inline
import matplotlib.pyplot as plt
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]
plt.plot(xs, ys, 'ro')
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(-5, 10) # SETS LOWER X DISPLAY TO -5 AND UPPER TO 10
plt.ylim(-7, 26) # SETS LOWER Y DISPLAY TO -7 AND UPPER TO 26
plt.show()

Axis size¶
[7]:
%matplotlib inline
import matplotlib.pyplot as plt
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]
fig = plt.figure(figsize=(10,3)) # width: 10 inches, height 3 inches
plt.plot(xs, ys, 'ro')
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Changing tick labels¶
You can also change the labels displayed on the axis ticks with the plt.xticks and plt.yticks functions:
Note: instead of xticks you might directly use categorical variables IF you have matplotlib >= 2.1.0. Here we use xticks as sometimes you might need to fiddle with them anyway.
[8]:
%matplotlib inline
import matplotlib.pyplot as plt
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8,10,12]
plt.plot(xs, ys, 'ro')
plt.title("my function")
plt.xlabel('x')
plt.ylabel('y')
# FIRST NEEDS A SEQUENCE WITH THE POSITIONS, THEN A SEQUENCE OF SAME LENGTH WITH LABELS
plt.xticks(xs, ['a', 'b', 'c', 'd', 'e', 'f'])
plt.show()
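As mentioned in the note above, with matplotlib >= 2.1 you may pass strings directly as x values, and each distinct string becomes a category with its own tick. A minimal sketch:

```python
import matplotlib.pyplot as plt

# string x values are treated as categories: each distinct string gets its own
# tick, placed in order of first appearance
labels = ['a', 'b', 'c', 'd', 'e', 'f']
ys = [2, 4, 6, 8, 10, 12]

fig, ax = plt.subplots()
ax.plot(labels, ys, 'ro')
ax.set_title("my function, categorical x")
plt.show()
```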

Introducing numpy¶
For functions involving reals, vanilla Python starts showing its limits and it’s better to switch to the numpy library. Matplotlib can easily handle both vanilla Python sequences like lists and numpy arrays. Let’s see an example without numpy and one with it.
Example without numpy¶
If we only use vanilla Python (that is, Python without extra libraries like numpy), to display the function y = 2x + 1
we can come up with a solution like this
[9]:
%matplotlib inline
import matplotlib.pyplot as plt
xs = [x*0.1 for x in range(10)] # notice we can't do a range with float increments
# (and it would also introduce rounding errors)
ys = [(x * 2) + 1 for x in xs]
plt.plot(xs, ys, 'bo')
plt.title("y = 2x + 1 with vanilla python")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Example with numpy¶
With numpy, we have at our disposal several new methods for dealing with arrays.
First, we can generate an interval of values with one of these methods.
Since Python range does not allow float increments, we can use np.arange:
[10]:
import numpy as np
xs = np.arange(0,1.0,0.1)
xs
[10]:
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
Equivalently, we could use np.linspace:
[11]:
xs = np.linspace(0,0.9,10)
xs
[11]:
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
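A word of caution about np.arange with float steps: it computes how many values to produce from (stop - start) / step, so rounding errors can add an element past the intended end. This is why np.linspace, which takes the number of values explicitly, is often the safer choice for floats:

```python
import numpy as np

# (1.3 - 1) / 0.1 is slightly above 3 in floating point, so arange produces
# 4 values and the last one is about 1.3, although the stop should be excluded
a = np.arange(1, 1.3, 0.1)
print(a)

# linspace gets the number of values directly: exactly 3, ending at 1.2
b = np.linspace(1, 1.2, 3)
print(b)
```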
Numpy allows us to easily write functions on arrays in a natural manner. For example, to calculate ys we can now write:
[12]:
ys = 2*xs + 1
ys
[12]:
array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
Let’s put everything together:
[13]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
xs = np.linspace(0,0.9,10) # left end: 0 *included* right end: 0.9 *included* number of values: 10
ys = 2*xs + 1
plt.plot(xs, ys, 'bo')
plt.title("y = 2x + 1 with numpy")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

y = sin(x) + 3 exercise¶
✪✪✪ Try to display the function y = sin(x) + 3 for x at pi/4 intervals, starting from 0. Use exactly 8 ticks.
NOTE: 8 is the number of x ticks (telecom people would use the term ‘samples’), NOT the x of the last tick !!
a) try to solve it without using numpy. For pi, use the constant math.pi (first you need to import the math module)
b) try to solve it with numpy. For pi, use the constant np.pi (which is exactly the same as math.pi)
b.1) solve it with np.arange
b.2) solve it with np.linspace
c) For each tick, use the label sequence "0π/4", "1π/4", "2π/4", "3π/4", "4π/4", "5π/4", .... Obviously writing them by hand is easy, try instead to devise a method that works for any number of ticks. What is changing in the sequence? What is constant? What is the type of the part that changes? What is the final type of the labels you want to obtain?
If you are in the mood, try to display them better, like 0, π/4, π/2, 3π/4, π, 5π/4, possibly using LaTeX (requires some search, this example might be a starting point)
NOTE: LaTeX often involves the usage of the backslash character \, like in \frac{2,3}. If we use it directly, Python will interpret \f as a special character (the form feed, shown as \x0c below) and will not send to the LaTeX processor the string we meant:
[14]:
'\frac{2,3}'
[14]:
'\x0crac{2,3}'
One solution would be to double the backslashes, like this:
[15]:
'\\frac{2,3}'
[15]:
'\\frac{2,3}'
An even better one is to prefix the string with the r character (making it a raw string), which allows us to write backslashes only once:
[16]:
r'\frac{2,3}'
[16]:
'\\frac{2,3}'
[17]:
# write here solution for a) y = sin(x) + 3 with vanilla python
[18]:
# SOLUTION a) y = sin(x) + 3 with vanilla python
%matplotlib inline
import matplotlib.pyplot as plt
import math
xs = [x * (math.pi)/4 for x in range(8)]
ys = [math.sin(x) + 3 for x in xs]
plt.plot(xs, ys)
plt.title("a) solution y = sin(x) + 3 with vanilla python ")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

[19]:
# write here solution b.1) y = sin(x) + 3 with numpy, arange
[20]:
# SOLUTION b.1) y = sin(x) + 3 with numpy, arange
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# left end = 0 right end = 7/4 pi 8 points
# notice numpy.pi is exactly the same as vanilla math.pi
xs = np.arange(0, # included
8 * np.pi/4, # *not* included (we put 8, as we actually want 7 to be included)
np.pi/4 )
ys = np.sin(xs) + 3 # notice we now operate on arrays: numpy functions like np.sin work element-wise on them
plt.plot(xs, ys)
plt.title("b.1 solution y = sin(x) + 3 with numpy arange")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

[21]:
# write here solution b.2) y = sin(x) + 3 with numpy, linspace
[22]:
# SOLUTION b.2) y = sin(x) + 3 with numpy, linspace
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# left end = 0 right end = 7/4 pi 8 points
# notice numpy.pi is exactly the same as vanilla math.pi
xs = np.linspace(0, (np.pi/4) * 7 , 8)
ys = np.sin(xs) + 3 # notice we now operate on arrays: numpy functions like np.sin work element-wise on them
plt.plot(xs, ys)
plt.title("b2 solution y = sin(x) + 3 with numpy , linspace")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

[23]:
# write here solution c) y = sin(x) + 3 with numpy and pi xlabels
[24]:
# SOLUTION c) y = sin(x) + 3 with numpy and pi xlabels
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
xs = np.linspace(0, (np.pi/4) * 7 , 8) # left end = 0 right end = 7/4 pi 8 points
ys = np.sin(xs) + 3 # notice we now operate on arrays: numpy functions like np.sin work element-wise on them
plt.plot(xs, ys)
plt.title("c) solution y = sin(x) + 3 with numpy and pi xlabels")
plt.xlabel('x')
plt.ylabel('y')
# FIRST NEEDS A SEQUENCE WITH THE POSITIONS, THEN A SEQUENCE OF SAME LENGTH WITH LABELS
plt.xticks(xs, ["%sπ/4" % x for x in range(8) ])
plt.show()
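For the optional LaTeX labels, one possible approach is to reduce each fraction k/4 with fractions.Fraction and build a matplotlib mathtext string from it. The pi_label function below is a hypothetical helper sketched for this exercise, and it only handles non-negative k:

```python
from fractions import Fraction

import matplotlib.pyplot as plt
import numpy as np


def pi_label(k, denominator=4):
    """Hypothetical helper: mathtext label for the angle k*pi/denominator (k >= 0)."""
    f = Fraction(k, denominator)            # reduces e.g. 2/4 to 1/2
    if f == 0:
        return r'$0$'
    # omit a leading 1, so we get 'π' rather than '1π'
    num = r'\pi' if f.numerator == 1 else r'%d\pi' % f.numerator
    if f.denominator == 1:                  # whole multiples of pi, e.g. 'π' or '2π'
        return '$%s$' % num
    return r'$\frac{%s}{%d}$' % (num, f.denominator)


xs = np.linspace(0, (np.pi / 4) * 7, 8)
plt.plot(xs, np.sin(xs) + 3)
plt.xticks(xs, [pi_label(k) for k in range(8)])
plt.title("y = sin(x) + 3 with LaTeX-style pi labels")
plt.show()
```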

Showing degrees per node¶
Going back to the indegrees and outdegrees seen in the Network statistics chapter, we will try to study their distributions visually.
Let’s take an example networkx DiGraph:
[59]:
import networkx as nx
G1=nx.DiGraph({
'a':['b','c'],
'b':['b','c', 'd'],
'c':['a','b','d'],
'd':['b', 'd']
})
draw_nx(G1)

indegree per node¶
✪✪ Display a plot for graph G1 where the xtick labels are the nodes, and the y is the indegree of those nodes.
Note: instead of xticks you might directly use categorical variables IF you have matplotlib >= 2.1.0. Here we use xticks as sometimes you might need to fiddle with them anyway.
To get the nodes, you can use the G1.nodes() method:
[26]:
G1.nodes()
[26]:
NodeView(('a', 'b', 'c', 'd'))
It gives back a NodeView, which is not a list, but you can still iterate through it with a for in loop:
[27]:
for n in G1.nodes():
    print(n)
a
b
c
d
Also, you can get the indegree of a node with:
[28]:
G1.in_degree('b')
[28]:
4
[29]:
# write here the solution
[30]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
ys_in = [G1.in_degree(n) for n in G1.nodes() ]
plt.plot(xs, ys_in, 'bo')
plt.ylim(0,max(ys_in) + 1)
plt.xlim(0,max(xs) + 1)
plt.title("G1 Indegrees per node solution")
plt.xticks(xs, G1.nodes())
plt.xlabel('node')
plt.ylabel('indegree')
plt.show()

Bar plots¶
The previous plot with dots doesn’t look so good, so we might try a bar plot instead. First look at this example, then proceed with the next exercise:
[31]:
import numpy as np
import matplotlib.pyplot as plt
xs = [1,2,3,4]
ys = [7,5,8,2 ]
plt.bar(xs, ys,
0.5, # the width of the bars
color='green', # someone suggested the default blue color is depressing, so let's put green
align='center') # bars are centered on the xtick
plt.show()

indegree per node bar plot¶
✪✪ Display a bar plot for graph G1
where the xtick labels are the nodes, and the y is the indegree of those nodes.
[32]:
# write here
[33]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
ys_in = [G1.in_degree(n) for n in G1.nodes() ]
plt.bar(xs, ys_in, 0.5, align='center')
plt.title("G1 Indegrees per node solution")
plt.xticks(xs, G1.nodes())
plt.xlabel('node')
plt.ylabel('indegree')
plt.show()

indegree per node sorted alphabetically¶
✪✪ Display the same bar plot as before, but now sort nodes alphabetically.
NOTE: you cannot run the .sort() method on the result given by G1.nodes(), because it is a NodeView, not a list. To use .sort() you first need to convert the result to a list object.
[34]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
xs_labels = list(G1.nodes())
xs_labels.sort()
ys_in = [G1.in_degree(n) for n in xs_labels ]
plt.bar(xs, ys_in, 0.5, align='center')
plt.title("G1 Indegrees per node, sorted labels solution")
plt.xticks(xs, xs_labels)
plt.xlabel('node')
plt.ylabel('indegree')
plt.show()

[35]:
# write here
indegree per node sorted¶
✪✪✪ Display the same bar plot as before, but now sort nodes according to their indegree. This is more challenging: to do it you need to use some sort trick. First read the Python documentation about sorting and then:
create a list of pairs (list of tuples) where each tuple holds the node identifier and the corresponding indegree
sort the list by using the second value of the tuples as key.
[36]:
# write here
[37]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
coords = [(v, G1.in_degree(v)) for v in G1.nodes() ]
coords.sort(key=lambda c: c[1])
ys_in = [c[1] for c in coords]
plt.bar(xs, ys_in, 0.5, align='center')
plt.title("G1 Indegrees per node, sorted by indegree solution")
plt.xticks(xs, [c[0] for c in coords])
plt.xlabel('node')
plt.ylabel('indegree')
plt.show()
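Since iterating over G1.in_degree() already produces (node, indegree) pairs, you can also pass it straight to sorted, skipping the list comprehension. A sketch:

```python
import networkx as nx

G1 = nx.DiGraph({
    'a': ['b', 'c'],
    'b': ['b', 'c', 'd'],
    'c': ['a', 'b', 'd'],
    'd': ['b', 'd'],
})

# G1.in_degree() iterates over (node, indegree) pairs, so sorted can use
# the same key trick directly; it returns a new list of tuples
coords = sorted(G1.in_degree(), key=lambda c: c[1])
labels = [c[0] for c in coords]
ys_in = [c[1] for c in coords]
print(coords)   # [('a', 1), ('c', 2), ('d', 3), ('b', 4)]
```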

out degrees per node sorted¶
✪✪✪ Do the same graph as before for the outdegrees.
You can get the outdegree of a node with:
[38]:
G1.out_degree('b')
[38]:
3
[39]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
coords = [(v, G1.out_degree(v)) for v in G1.nodes() ]
coords.sort(key=lambda c: c[1])
ys_out = [c[1] for c in coords]
plt.bar(xs, ys_out, 0.5, align='center')
plt.title("G1 Outdegrees per node sorted solution")
plt.xticks(xs, [c[0] for c in coords])
plt.xlabel('node')
plt.ylabel('outdegree')
plt.show()

[40]:
# write here
degrees per node¶
✪✪✪ We might as well check the sorted degrees per node, intended as the sum of in_degree and out_degree. To get the sum, use the G1.degree(node) method.
[41]:
# write here the solution
[42]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
coords = [(v, G1.degree(v)) for v in G1.nodes() ]
coords.sort(key=lambda c: c[1])
ys_deg = [c[1] for c in coords]
plt.bar(xs, ys_deg, 0.5, align='center')
plt.title("G1 degrees per node sorted SOLUTION")
plt.xticks(xs, [c[0] for c in coords])
plt.xlabel('node')
plt.ylabel('degree')
plt.show()

✪✪✪✪ EXERCISE: Look at this example, and make a double bar chart sorting nodes by their total degree. To do so, in the tuples you will need vertex, in_degree, out_degree and also degree.
[43]:
# write here
[44]:
# SOLUTION
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(G1.number_of_nodes())
coords = [(v, G1.degree(v), G1.in_degree(v), G1.out_degree(v) ) for v in G1.nodes() ]
coords.sort(key=lambda c: c[1])
ys_deg = [c[1] for c in coords]
ys_in = [c[2] for c in coords]
ys_out = [c[3] for c in coords]
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(xs - width/2, ys_in, width,
color='SkyBlue', label='indegrees')
rects2 = ax.bar(xs + width/2, ys_out, width,
color='IndianRed', label='outdegrees')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_title('G1 in and out degrees per node SOLUTION')
ax.set_xticks(xs)
ax.set_xticklabels([c[0] for c in coords])
ax.legend()
plt.show()

Frequency histogram¶
Now let’s try to draw degree frequencies, that is, for each degree present in the graph we want to display a bar as high as the number of times that particular degree appears.
For doing so, we will need a matplotlib histogram, see documentation
We will need to tell matplotlib how many columns we want, which in histogram terms are called bins. We also need to give the histogram a series of numbers so it can count how many times each number occurs. Let’s consider this graph G2:
[61]:
import networkx as nx
G2=nx.DiGraph({
'a':['b','c'],
'b':['b','c', 'd'],
'c':['a','b','d'],
'd':['b', 'd','e'],
'e':[],
'f':['c','d','e'],
'g':['e','g']
})
draw_nx(G2)

If we take the degree sequence of G2 we get this:
[46]:
degrees_G2 = [G2.degree(n) for n in G2.nodes()]
degrees_G2
[46]:
[3, 7, 3, 6, 7, 3, 3]
We see 3 appears four times, 6 once, and 7 twice.
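The same frequencies can be counted in plain Python with collections.Counter, which is handy to double check what the histogram should display:

```python
from collections import Counter

degrees_G2 = [3, 7, 3, 6, 7, 3, 3]   # the degree sequence printed above
freq = Counter(degrees_G2)           # maps each degree to how many times it occurs
print(sorted(freq.items()))          # [(3, 4), (6, 1), (7, 2)]
```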
Let’s try to determine a good number for the bins. First we can check the boundaries our x axis should have:
[47]:
min(degrees_G2)
[47]:
3
[48]:
max(degrees_G2)
[48]:
7
So our histogram on the x axis must go at least from 3 to 7. If we want integer columns (bins), we will need ticks going from 3 included to 7 included, so at least ticks for 3, 4, 5, 6, 7. For a precise display when we have integer x, it is best to also manually provide the sequence of bin edges, remembering it should start at least from the minimum included (in our case, 3) and arrive at the maximum + 1 included (in our case, 7 + 1 = 8)
NOTE: precise histogram drawing can be quite tricky, please do read this StackOverflow post for more details about it.
[49]:
import matplotlib.pyplot as plt
import numpy as np
degrees = [G2.degree(n) for n in G2.nodes()]
# add histogram
# in this case hist returns a tuple of three values
# we put in three variables
n, bins, columns = plt.hist(degrees_G2,
bins=range(3,9), # 3 *included* , 4, 5, 6, 7, 8 *included*
width=1.0) # graphical width of the bars
plt.xlabel('Degrees')
plt.ylabel('Frequency counts')
plt.title('G2 Degree distribution')
plt.xlim(0, max(degrees) + 2)
plt.show()

As expected, we see 3 is counted four times, 6 once, and 7 twice.
✪✪✪ EXERCISE: Still, it would be visually better to align the x ticks to the middle of the bars with xticks, and also to make the graph tighter by setting the xlim appropriately. This is not always easy to do.
Read carefully this StackOverflow post and try to do it by yourself.
NOTE: set one thing at a time and check that it works (i.e. first xticks and then xlim); doing everything at once might get quite confusing
[50]:
# write here the solution
[51]:
# SOLUTION
import matplotlib.pyplot as plt
import numpy as np
degrees = [G2.degree(n) for n in G2.nodes()]
# add histogram
min_x = min(degrees) # 3
max_x = max(degrees) # 7
bar_width = 1.0
# in this case hist returns a tuple of three values
# we put in three variables
n, bins, columns = plt.hist(degrees_G2,
bins= range(3,9), # 3 *included* to 9 *excluded*
# it is like the xs, but with one number more !!
# to understand why read this
# https://stackoverflow.com/questions/27083051/matplotlib-xticks-not-lining-up-with-histogram/27084005#27084005
width=bar_width) # graphical width of the bars
plt.xlabel('Degrees')
plt.ylabel('Frequency counts')
plt.title('G2 Degree distribution, tight graph SOLUTION')
xs = np.arange(min_x,max_x + 1) # 3 *included* to 8 *excluded*
# used numpy so we can later reuse it for float vector operations
plt.xticks(xs + bar_width / 2, # position of ticks
xs ) # labels of ticks
plt.xlim(min_x, max_x + 1) # 3 *included* to 8 *excluded*
plt.show()
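Another common way to align bars and ticks, sketched here as an alternative rather than as the method of the post above, is to put the bin edges at half-integers, so that each bar is centered on an integer and the ticks need no offset:

```python
import matplotlib.pyplot as plt
import numpy as np

degrees = [3, 7, 3, 6, 7, 3, 3]      # the G2 degree sequence shown above
min_x, max_x = min(degrees), max(degrees)

# bin edges 2.5, 3.5, ..., 7.5: each bin then contains exactly one integer degree
edges = np.arange(min_x - 0.5, max_x + 1.5, 1)
counts, bins, patches = plt.hist(degrees, bins=edges, rwidth=0.9)  # rwidth leaves small gaps
plt.xticks(range(min_x, max_x + 1))  # ticks fall at the bar centers
plt.xlim(min_x - 0.5, max_x + 0.5)
plt.xlabel('Degrees')
plt.ylabel('Frequency counts')
plt.title('G2 Degree distribution, half-integer bin edges')
plt.show()
```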

Showing plots side by side¶
You can display plots on a grid. Each cell in the grid is identified by a single number. For example, for a grid of two rows and three columns, you would have cells indexed like this:
1 2 3
4 5 6
[52]:
%matplotlib inline
import matplotlib.pyplot as plt
import math
xs = [1,2,3,4,5,6]
# cells:
# 1 2 3
# 4 5 6
plt.subplot(2, # 2 rows
3, # 3 columns
1) # plotting in first cell
ys1 = [x**3 for x in xs]
plt.plot(xs, ys1)
plt.title('first cell')
plt.subplot(2, # 2 rows
3, # 3 columns
2) # plotting in second cell
ys2 = [2*x + 1 for x in xs]
plt.plot(xs,ys2)
plt.title('2nd cell')
plt.subplot(2, # 2 rows
3, # 3 columns
3) # plotting in third cell
ys3 = [-2*x + 1 for x in xs]
plt.plot(xs,ys3)
plt.title('3rd cell')
plt.subplot(2, # 2 rows
3, # 3 columns
4) # plotting in fourth cell
ys4 = [-2*x**2 for x in xs]
plt.plot(xs,ys4)
plt.title('4th cell')
plt.subplot(2, # 2 rows
3, # 3 columns
5) # plotting in fifth cell
ys5 = [math.sin(x) for x in xs]
plt.plot(xs,ys5)
plt.title('5th cell')
plt.subplot(2, # 2 rows
3, # 3 columns
6) # plotting in sixth cell
ys6 = [-math.cos(x) for x in xs]
plt.plot(xs,ys6)
plt.title('6th cell')
plt.show()

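The same 2x3 grid can also be built with plt.subplots (note the final s). This is a different function that returns the figure together with a 2D array of Axes objects, so the six plots can be drawn in a loop. The following is a sketch of that alternative, not the approach used in the cell above.

```python
import math
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5, 6]

# (title, function) pairs in the same order as the cell numbers 1..6
cells = [
    ("first cell", lambda x: x**3),
    ("2nd cell", lambda x: 2*x + 1),
    ("3rd cell", lambda x: -2*x + 1),
    ("4th cell", lambda x: -2*x**2),
    ("5th cell", math.sin),
    ("6th cell", lambda x: -math.cos(x)),
]

# axes is a 2x3 NumPy array of Axes objects
fig, axes = plt.subplots(2, 3, figsize=(12, 6))

# axes.flat walks the grid row by row, i.e. in cell order 1, 2, ..., 6
for ax, (title, f) in zip(axes.flat, cells):
    ax.plot(xs, [f(x) for x in xs])
    ax.set_title(title)

fig.tight_layout()  # avoid overlapping titles and tick labels
plt.show()
```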
Graph models¶
Let’s study the degree frequencies of some well-known network types.
Erdős–Rényi model¶
✪✪ A simple graph model we can think of is the so-called Erdős–Rényi model: it is an undirected graph with n nodes, where each pair of nodes is connected with probability p. In networkx, we can generate a random one by issuing this command:
[53]:
G = nx.erdos_renyi_graph(10, 0.5)
In the drawing, the absence of arrows confirms the graph is undirected:
[62]:
draw_nx(G)

Try plotting the degree distribution for different values of p (0.1, 0.5, 0.9) with a fixed n=1000, putting the plots side by side on the same row. What do the distributions look like? Where are they centered?
To avoid rewriting the same code again and again, define a plot_erdos(n,p,j) function to be called three times.
[55]:
# write here the solution
[56]:
# SOLUTION
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
def plot_erdos(n, p, j):
    G = nx.erdos_renyi_graph(n, p)
    plt.subplot(1,  # 1 row
                3,  # 3 columns
                j)  # plotting in jth cell
    degrees = [G.degree(node) for node in G.nodes()]
    num_bins = 20
    # 'counts' instead of 'n', so we don't overwrite the parameter n
    counts, bins, columns = plt.hist(degrees, num_bins, width=1.0)
    plt.xlabel('Degrees')
    plt.ylabel('Frequency counts')
    plt.title('p = %s' % p)
n = 1000
fig = plt.figure(figsize=(15,6)) # width: 15 inches, height: 6 inches
plot_erdos(n, 0.1, 1)
plot_erdos(n, 0.5, 2)
plot_erdos(n, 0.9, 3)
print()
print(" Erdős–Rényi degree distribution SOLUTION")
plt.show()
Erdős–Rényi degree distribution SOLUTION

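About the question of where the distributions are centered: in a G(n, p) graph each node can link to each of the other n - 1 nodes, independently with probability p. Its degree therefore follows a binomial distribution with mean (n - 1)p and standard deviation sqrt((n - 1)p(1 - p)). The following sketch checks this numerically. The seed argument is only there to make the run reproducible.

```python
import math
import networkx as nx

n = 1000
stats = {}  # p -> (mean degree, standard deviation of the degrees)
for p in [0.1, 0.5, 0.9]:
    G = nx.erdos_renyi_graph(n, p, seed=42)  # seed only for reproducibility
    degrees = [d for _, d in G.degree()]
    mean = sum(degrees) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in degrees) / n)
    stats[p] = (mean, std)
    print("p = %s: mean %.1f (expected %.1f), std %.1f (expected %.1f)"
          % (p, mean, (n - 1) * p, std, math.sqrt((n - 1) * p * (1 - p))))
```

So the three histograms should be roughly bell shaped, centered near 100, 500 and 900. The ones for p = 0.1 and p = 0.9 should be narrower than the one for p = 0.5.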
Other plots¶
Matplotlib can display pretty much any kind of plot you might like. Here we collect some we use in the course; for others, see the extensive Matplotlib documentation.
Pie chart¶
[57]:
%matplotlib inline
import matplotlib.pyplot as plt
labels = ['Oranges', 'Apples', 'Cucumbers']
fracs = [14, 23, 5] # how much for each sector; note the values don't need to add up to 100
plt.pie(fracs, labels=labels, autopct='%1.1f%%', shadow=True)
plt.title("Super strict vegan diet (good luck)")
plt.show()

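Since the values sum to more than 1, matplotlib divides each one by their total to get the fraction of the circle, and autopct='%1.1f%%' formats the resulting percentage. The following sketch reproduces those labels by hand in plain Python, without plotting.

```python
fracs = [14, 23, 5]
total = sum(fracs)  # 42

# Each wedge covers fracs[i] / total of the circle; '%1.1f%%' keeps one decimal
labels = ['%1.1f%%' % (100 * f / total) for f in fracs]
print(labels)  # ['33.3%', '54.8%', '11.9%']
```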
Pandas solutions¶
1. Introduction¶
Today we will try analyzing data with Pandas:
data analysis with Pandas library
plotting with MatPlotLib
Examples from AstroPi dataset
Exercises with meteotrentino dataset
Python offers powerful tools for data analysis.
One of these is Pandas, which provides fast and flexible data structures, especially suited for interactive data analysis.
What to do¶
Install Pandas:
Anaconda:
conda install pandas
Without Anaconda (--user installs in your home): python3 -m pip install --user pandas
Unzip the exercises in a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|- pandas
|- pandas-exercise.ipynb
|- pandas-solution.ipynb
WARNING 1: to correctly visualize the notebook, it MUST be in an unzipped folder !
Open Jupyter Notebook from that folder. Two things should open, first a console and then a browser.
The browser should show a file list: navigate the list and open the notebook
exercises/pandas/pandas-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter, instead navigate in Jupyter browser to the unzipped folder !
Go on reading that notebook, and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press
Control + Enter
to execute Python code inside a Jupyter cell AND select next cell, press
Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press
Alt + Enter
If the notebooks look stuck, try to select
Kernel -> Restart
2. Data analysis of Astro Pi¶
Let’s try analyzing data recorded on a Raspberry Pi aboard the International Space Station, downloaded from here:
raspberrypi.org/learning/astro-pi-flight-data-analysis/worksheet
where you can find a detailed description of the data gathered by the sensors during February 2016 (one record every 10 seconds).
The function read_csv imports data from a CSV file and stores it in a DataFrame structure.
In this exercise we shall use the file Columbus_Ed_astro_pi_datalog.csv
[2]:
import pandas as pd # we import pandas and for ease we rename it to 'pd'
import numpy as np # we import numpy and for ease we rename it to 'np'
# remember the encoding !
df = pd.read_csv('Columbus_Ed_astro_pi_datalog.csv', encoding='UTF-8')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110869 entries, 0 to 110868
Data columns (total 20 columns):
ROW_ID 110869 non-null int64
temp_cpu 110869 non-null float64
temp_h 110869 non-null float64
temp_p 110869 non-null float64
humidity 110869 non-null float64
pressure 110869 non-null float64
pitch 110869 non-null float64
roll 110869 non-null float64
yaw 110869 non-null float64
mag_x 110869 non-null float64
mag_y 110869 non-null float64
mag_z 110869 non-null float64
accel_x 110869 non-null float64
accel_y 110869 non-null float64
accel_z 110869 non-null float64
gyro_x 110869 non-null float64
gyro_y 110869 non-null float64
gyro_z 110869 non-null float64
reset 110869 non-null int64
time_stamp 110869 non-null object
dtypes: float64(17), int64(2), object(1)
memory usage: 16.9+ MB
We can quickly see the number of rows and columns of the dataframe with the attribute shape:
NOTE: shape is not followed by round parentheses!
[3]:
df.shape
[3]:
(110869, 20)
The describe method gives you a lot of summary info on the fly:
the row count
the mean and the standard deviation
minimum and maximum
the quartiles (25%, 50%, 75%)
[4]:
df.describe()
[4]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.00000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 110869.000000 | 1.108690e+05 | 110869.000000 | 1.108690e+05 | 110869.000000 |
mean | 55435.000000 | 32.236259 | 28.101773 | 25.543272 | 46.252005 | 1008.126788 | 2.770553 | 51.807973 | 200.90126 | -19.465265 | -1.174493 | -6.004529 | -0.000630 | 0.018504 | 0.014512 | -8.959493e-07 | 0.000007 | -9.671594e-07 | 0.000180 |
std | 32005.267835 | 0.360289 | 0.369256 | 0.380877 | 1.907273 | 3.093485 | 21.848940 | 2.085821 | 84.47763 | 28.120202 | 15.655121 | 8.552481 | 0.000224 | 0.000604 | 0.000312 | 2.807614e-03 | 0.002456 | 2.133104e-03 | 0.060065 |
min | 1.000000 | 31.410000 | 27.200000 | 24.530000 | 42.270000 | 1001.560000 | 0.000000 | 30.890000 | 0.01000 | -73.046240 | -43.810030 | -41.163040 | -0.025034 | -0.005903 | -0.022900 | -3.037930e-01 | -0.378412 | -2.970800e-01 | 0.000000 |
25% | 27718.000000 | 31.960000 | 27.840000 | 25.260000 | 45.230000 | 1006.090000 | 1.140000 | 51.180000 | 162.43000 | -41.742792 | -12.982321 | -11.238430 | -0.000697 | 0.018009 | 0.014349 | -2.750000e-04 | -0.000278 | -1.200000e-04 | 0.000000 |
50% | 55435.000000 | 32.280000 | 28.110000 | 25.570000 | 46.130000 | 1007.650000 | 1.450000 | 51.950000 | 190.58000 | -21.339485 | -1.350467 | -5.764400 | -0.000631 | 0.018620 | 0.014510 | -3.000000e-06 | -0.000004 | -1.000000e-06 | 0.000000 |
75% | 83152.000000 | 32.480000 | 28.360000 | 25.790000 | 46.880000 | 1010.270000 | 1.740000 | 52.450000 | 256.34000 | 7.299000 | 11.912456 | -0.653705 | -0.000567 | 0.018940 | 0.014673 | 2.710000e-04 | 0.000271 | 1.190000e-04 | 0.000000 |
max | 110869.000000 | 33.700000 | 29.280000 | 26.810000 | 60.590000 | 1021.780000 | 360.000000 | 359.400000 | 359.98000 | 33.134748 | 37.552135 | 31.003047 | 0.018708 | 0.041012 | 0.029938 | 2.151470e-01 | 0.389499 | 2.698760e-01 | 20.000000 |
QUESTION: is there some missing field from the table produced by describe? Why is it not included?
To limit describe to only one column like humidity, you can write:
[5]:
df['humidity'].describe()
[5]:
count 110869.000000
mean 46.252005
std 1.907273
min 42.270000
25% 45.230000
50% 46.130000
75% 46.880000
max 60.590000
Name: humidity, dtype: float64
The dot notation is even handier:
[6]:
df.humidity.describe()
[6]:
count 110869.000000
mean 46.252005
std 1.907273
min 42.270000
25% 45.230000
50% 46.130000
75% 46.880000
max 60.590000
Name: humidity, dtype: float64
WARNING: Careful about spaces!
In case the field name has spaces (e.g. 'blender rotations'), do not use the dot notation; instead use the square bracket notation seen above (i.e. df['blender rotations'].describe())
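A minimal sketch of the difference, on a small hypothetical dataframe with one column name containing a space:

```python
import pandas as pd

# Hypothetical dataframe: one column name with a space, one without
demo = pd.DataFrame({'humidity': [44.9, 45.1, 45.3],
                     'blender rotations': [10, 12, 11]})

print(demo.humidity.equals(demo['humidity']))  # True: both notations work here

# demo.blender rotations.describe()  would be a SyntaxError, so use brackets:
stats = demo['blender rotations'].describe()
print(stats['mean'])  # 11.0
```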
The head method gives back the first rows (5 by default):
[7]:
df.head()
[7]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 31.88 | 27.57 | 25.01 | 44.94 | 1001.68 | 1.49 | 52.25 | 185.21 | -46.422753 | -8.132907 | -12.129346 | -0.000468 | 0.019439 | 0.014569 | 0.000942 | 0.000492 | -0.000750 | 20 | 2016-02-16 10:44:40 |
1 | 2 | 31.79 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.03 | 53.73 | 186.72 | -48.778951 | -8.304243 | -12.943096 | -0.000614 | 0.019436 | 0.014577 | 0.000218 | -0.000005 | -0.000235 | 0 | 2016-02-16 10:44:50 |
2 | 3 | 31.66 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.24 | 53.57 | 186.21 | -49.161878 | -8.470832 | -12.642772 | -0.000569 | 0.019359 | 0.014357 | 0.000395 | 0.000600 | -0.000003 | 0 | 2016-02-16 10:45:00 |
3 | 4 | 31.69 | 27.52 | 25.01 | 45.32 | 1001.69 | 1.57 | 53.63 | 186.03 | -49.341941 | -8.457380 | -12.615509 | -0.000575 | 0.019383 | 0.014409 | 0.000308 | 0.000577 | -0.000102 | 0 | 2016-02-16 10:45:10 |
4 | 5 | 31.66 | 27.54 | 25.01 | 45.18 | 1001.71 | 0.85 | 53.66 | 186.46 | -50.056683 | -8.122609 | -12.678341 | -0.000548 | 0.019378 | 0.014380 | 0.000321 | 0.000691 | 0.000272 | 0 | 2016-02-16 10:45:20 |
The tail method gives back the last rows (5 by default):
[8]:
df.tail()
[8]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
110864 | 110865 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | -27.642610 | 1.563183 | -0.000682 | 0.017743 | 0.014646 | -0.000264 | 0.000206 | 0.000196 | 0 | 2016-02-29 09:24:21 |
110865 | 110866 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | -27.729389 | 1.785682 | -0.000736 | 0.017570 | 0.014855 | 0.000143 | 0.000199 | -0.000024 | 0 | 2016-02-29 09:24:30 |
110866 | 110867 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | -27.719479 | 1.629533 | -0.000647 | 0.017657 | 0.014799 | 0.000537 | 0.000257 | 0.000057 | 0 | 2016-02-29 09:24:41 |
110867 | 110868 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | -27.793136 | 1.703806 | -0.000835 | 0.017635 | 0.014877 | 0.000534 | 0.000456 | 0.000195 | 0 | 2016-02-29 09:24:50 |
110868 | 110869 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | -27.824149 | 1.293345 | -0.000787 | 0.017261 | 0.014380 | 0.000459 | 0.000076 | 0.000030 | 0 | 2016-02-29 09:25:00 |
The columns property gives the column headers:
[9]:
df.columns
[9]:
Index(['ROW_ID', 'temp_cpu', 'temp_h', 'temp_p', 'humidity', 'pressure',
'pitch', 'roll', 'yaw', 'mag_x', 'mag_y', 'mag_z', 'accel_x', 'accel_y',
'accel_z', 'gyro_x', 'gyro_y', 'gyro_z', 'reset', 'time_stamp'],
dtype='object')
NOTE: as you can see above, the returned object is not a list, but a special container defined by pandas:
[10]:
type(df.columns)
[10]:
pandas.core.indexes.base.Index
Nevertheless, we can access the elements of this container using indices within square brackets:
[11]:
df.columns[0]
[11]:
'ROW_ID'
[12]:
df.columns[1]
[12]:
'temp_cpu'
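If you do need a real Python list, or the position of a given header, the Index can convert and search itself. Here is a small sketch on a hypothetical empty dataframe that only has some of the AstroPi column names:

```python
import pandas as pd

# Hypothetical dataframe with just three of the AstroPi column names
demo = pd.DataFrame(columns=['ROW_ID', 'temp_cpu', 'temp_h'])

cols = demo.columns                # a pandas Index, not a list
print(list(cols))                  # ['ROW_ID', 'temp_cpu', 'temp_h']
print(cols.tolist())               # same list, via the Index method
print(cols.get_loc('temp_cpu'))    # 1: position of a given header
print('temp_h' in cols)            # True: membership test works like a list
```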
2.1 Exercise: meteo info¶
✪ a) Create a new dataframe called meteo by importing the data from the file meteo.csv, which contains the weather data of Trento from November 2017 (source: https://www.meteotrentino.it). IMPORTANT: assign the dataframe to a variable called meteo (so we avoid confusion with the AstroPi dataframe).
Visualize the information about this dataframe.
[13]:
# write here - create dataframe
meteo = pd.read_csv('meteo.csv', encoding='UTF-8')
print("COLUMNS:")
print()
print(meteo.columns)
print()
print("INFO:")
meteo.info()  # info() prints by itself and returns None, so no print() around it
print()
print("HEAD():")
meteo.head()
COLUMNS:
Index(['Date', 'Pressure', 'Rain', 'Temp'], dtype='object')
INFO:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2878 entries, 0 to 2877
Data columns (total 4 columns):
Date 2878 non-null object
Pressure 2878 non-null float64
Rain 2878 non-null float64
Temp 2878 non-null float64
dtypes: float64(3), object(1)
memory usage: 90.0+ KB
HEAD():
[13]:
| | Date | Pressure | Rain | Temp |
---|---|---|---|---|
0 | 01/11/2017 00:00 | 995.4 | 0.0 | 5.4 |
1 | 01/11/2017 00:15 | 995.5 | 0.0 | 6.0 |
2 | 01/11/2017 00:30 | 995.5 | 0.0 | 5.9 |
3 | 01/11/2017 00:45 | 995.7 | 0.0 | 5.4 |
4 | 01/11/2017 01:00 | 995.7 | 0.0 | 5.3 |
3. Indexing, filtering, ordering¶
To obtain the i-th row (as a Series) you can use iloc[i] (here we reuse the AstroPi dataset):
[14]:
df.iloc[6]
[14]:
ROW_ID 7
temp_cpu 31.68
temp_h 27.53
temp_p 25.01
humidity 45.31
pressure 1001.7
pitch 0.63
roll 53.55
yaw 186.1
mag_x -50.4473
mag_y -7.93731
mag_z -12.1886
accel_x -0.00051
accel_y 0.019264
accel_z 0.014528
gyro_x -0.000111
gyro_y 0.00032
gyro_z 0.000222
reset 0
time_stamp 2016-02-16 10:45:41
Name: 6, dtype: object
It is possible to select contiguous rows by position using slicing.
Here, for example, we select the rows from position 5 (included) to position 7 (excluded):
[15]:
df.iloc[5:7]
[15]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 31.69 | 27.55 | 25.01 | 45.12 | 1001.67 | 0.85 | 53.53 | 185.52 | -50.246476 | -8.343209 | -11.938124 | -0.000536 | 0.019453 | 0.014380 | 0.000273 | 0.000494 | -0.000059 | 0 | 2016-02-16 10:45:30 |
6 | 7 | 31.68 | 27.53 | 25.01 | 45.31 | 1001.70 | 0.63 | 53.55 | 186.10 | -50.447346 | -7.937309 | -12.188574 | -0.000510 | 0.019264 | 0.014528 | -0.000111 | 0.000320 | 0.000222 | 0 | 2016-02-16 10:45:41 |
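Beware that iloc slices by position with the end excluded, as Python lists do. The label-based loc instead includes the end label. With the default 0, 1, 2, ... index the two look alike, so this is an easy mistake to make. A sketch on a small hypothetical dataframe:

```python
import pandas as pd

demo = pd.DataFrame({'ROW_ID': range(1, 11)})  # default index 0..9

by_position = demo.iloc[5:7]  # positions 5 and 6: end excluded
by_label = demo.loc[5:7]      # labels 5, 6 and 7: end INCLUDED
print(by_position.ROW_ID.tolist())  # [6, 7]
print(by_label.ROW_ID.tolist())     # [6, 7, 8]
```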
It is possible to filter data according to a condition.
First, let's discover the data type of a condition, for example df.ROW_ID >= 6:
[16]:
type(df.ROW_ID >= 6)
[16]:
pandas.core.series.Series
What is contained in this Series object? If we print it, we see it is a series of True or False values, according to whether the ROW_ID is greater than or equal to 6:
[17]:
df.ROW_ID >= 6
[17]:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
20 True
21 True
22 True
23 True
24 True
25 True
26 True
27 True
28 True
29 True
...
110839 True
110840 True
110841 True
110842 True
110843 True
110844 True
110845 True
110846 True
110847 True
110848 True
110849 True
110850 True
110851 True
110852 True
110853 True
110854 True
110855 True
110856 True
110857 True
110858 True
110859 True
110860 True
110861 True
110862 True
110863 True
110864 True
110865 True
110866 True
110867 True
110868 True
Name: ROW_ID, Length: 110869, dtype: bool
Similarly, (df.ROW_ID >= 6) & (df.ROW_ID <= 10) is a series of True or False values, which is True where ROW_ID is at the same time greater than or equal to 6 and less than or equal to 10:
[18]:
type((df.ROW_ID >= 6) & (df.ROW_ID <= 10))
[18]:
pandas.core.series.Series
If we want the complete rows of the dataframe which satisfy the condition, we can write:
IMPORTANT: we use df outside the square brackets of df[ ] to tell Python we want to filter the df dataframe, and we use df again inside the brackets to tell on which columns the condition is checked, and hence which rows we keep.
[19]:
df[ (df.ROW_ID >= 6) & (df.ROW_ID <= 10) ]
[19]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 31.69 | 27.55 | 25.01 | 45.12 | 1001.67 | 0.85 | 53.53 | 185.52 | -50.246476 | -8.343209 | -11.938124 | -0.000536 | 0.019453 | 0.014380 | 0.000273 | 0.000494 | -0.000059 | 0 | 2016-02-16 10:45:30 |
6 | 7 | 31.68 | 27.53 | 25.01 | 45.31 | 1001.70 | 0.63 | 53.55 | 186.10 | -50.447346 | -7.937309 | -12.188574 | -0.000510 | 0.019264 | 0.014528 | -0.000111 | 0.000320 | 0.000222 | 0 | 2016-02-16 10:45:41 |
7 | 8 | 31.66 | 27.55 | 25.01 | 45.34 | 1001.70 | 1.49 | 53.65 | 186.08 | -50.668232 | -7.762600 | -12.284196 | -0.000523 | 0.019473 | 0.014298 | -0.000044 | 0.000436 | 0.000301 | 0 | 2016-02-16 10:45:50 |
8 | 9 | 31.67 | 27.54 | 25.01 | 45.20 | 1001.72 | 1.22 | 53.77 | 186.55 | -50.761529 | -7.262934 | -11.981090 | -0.000522 | 0.019385 | 0.014286 | 0.000358 | 0.000651 | 0.000187 | 0 | 2016-02-16 10:46:01 |
9 | 10 | 31.67 | 27.54 | 25.01 | 45.41 | 1001.75 | 1.63 | 53.46 | 185.94 | -51.243832 | -6.875270 | -11.672494 | -0.000581 | 0.019390 | 0.014441 | 0.000266 | 0.000676 | 0.000356 | 0 | 2016-02-16 10:46:10 |
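Two details about boolean masks, sketched on a small hypothetical dataframe. First, the parentheses around each comparison are required, because & binds tighter than >= and <=. Second, summing a mask counts its True values. Series.between is an equivalent shorthand for an inclusive range check.

```python
import pandas as pd

demo = pd.DataFrame({'ROW_ID': range(1, 21)})

# Parentheses are required: without them Python would read
# demo.ROW_ID >= (6 & demo.ROW_ID) <= 10, which raises an error
mask = (demo.ROW_ID >= 6) & (demo.ROW_ID <= 10)
print(mask.sum())  # 5: True counts as 1, False as 0

# Series.between checks the same inclusive range
same = demo.ROW_ID.between(6, 10)
print(demo[mask].equals(demo[same]))  # True
```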
So if we want to find the record where pressure is maximal, we use the values property of the series and compute its maximum value:
[20]:
df[ (df.pressure == df.pressure.values.max()) ]
[20]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
77602 | 77603 | 32.44 | 28.31 | 25.74 | 47.57 | 1021.78 | 1.1 | 51.82 | 267.39 | -0.797428 | 10.891803 | -15.728202 | -0.000612 | 0.01817 | 0.014295 | -0.000139 | -0.000179 | -0.000298 | 0 | 2016-02-25 12:13:20 |
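Two alternatives worth knowing, sketched on a hypothetical dataframe. Series.max() can be called directly, without going through values. Series.idxmax() returns the index label of the maximum, which loc can then use. Note how they differ when the maximum is tied:

```python
import pandas as pd

demo = pd.DataFrame({'pressure': [1001.7, 1021.78, 1005.8, 1021.78]})

# Boolean filter: keeps ALL rows that reach the maximum (here two ties)
print(demo[demo.pressure == demo.pressure.max()].index.tolist())  # [1, 3]

# idxmax: index label of the FIRST occurrence of the maximum only
print(demo.pressure.idxmax())            # 1
print(demo.loc[demo.pressure.idxmax()])  # that single row, as a Series
```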
The method sort_values returns a dataframe ordered according to one or more columns:
[21]:
df.sort_values('pressure',ascending=False).head()
[21]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
77602 | 77603 | 32.44 | 28.31 | 25.74 | 47.57 | 1021.78 | 1.10 | 51.82 | 267.39 | -0.797428 | 10.891803 | -15.728202 | -0.000612 | 0.018170 | 0.014295 | -0.000139 | -0.000179 | -0.000298 | 0 | 2016-02-25 12:13:20 |
77601 | 77602 | 32.45 | 28.30 | 25.74 | 47.26 | 1021.75 | 1.53 | 51.76 | 266.12 | -1.266335 | 10.927442 | -15.690558 | -0.000661 | 0.018357 | 0.014533 | 0.000152 | 0.000459 | -0.000298 | 0 | 2016-02-25 12:13:10 |
77603 | 77604 | 32.44 | 28.30 | 25.74 | 47.29 | 1021.75 | 1.86 | 51.83 | 268.83 | -0.320795 | 10.651441 | -15.565123 | -0.000648 | 0.018290 | 0.014372 | 0.000049 | 0.000473 | -0.000029 | 0 | 2016-02-25 12:13:30 |
77604 | 77605 | 32.43 | 28.30 | 25.74 | 47.39 | 1021.75 | 1.78 | 51.54 | 269.41 | -0.130574 | 10.628383 | -15.488983 | -0.000672 | 0.018154 | 0.014602 | 0.000360 | 0.000089 | -0.000002 | 0 | 2016-02-25 12:13:40 |
77608 | 77609 | 32.42 | 28.29 | 25.74 | 47.36 | 1021.73 | 0.86 | 51.89 | 272.77 | 0.952025 | 10.435951 | -16.027235 | -0.000607 | 0.018186 | 0.014232 | -0.000260 | -0.000059 | -0.000187 | 0 | 2016-02-25 12:14:20 |
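To order by more than one column, pass lists of column names and of sort directions. The second column is used only to break ties in the first. DataFrame.nlargest is a shortcut for sorting in descending order and keeping the first n rows. A sketch on hypothetical values:

```python
import pandas as pd

demo = pd.DataFrame({'pressure': [1021.75, 1021.78, 1021.75, 1001.7],
                     'humidity': [47.29, 47.57, 47.26, 45.0]})

# Sort by pressure (descending); break ties on humidity (ascending)
ordered = demo.sort_values(['pressure', 'humidity'], ascending=[False, True])
print(ordered.index.tolist())  # [1, 2, 0, 3]

# The 3 rows with the highest pressure
top3 = demo.nlargest(3, 'pressure')
print(top3['pressure'].tolist())  # [1021.78, 1021.75, 1021.75]
```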
The loc property allows us to filter rows according to a condition and to select a column, which can also be a new one. In this case, for the rows where the CPU temperature is too high (above 31.68), we write the value True in the fields of the column with header 'Too hot':
[22]:
df.loc[(df.temp_cpu > 31.68),'Too hot'] = True
Let’s see the resulting table (scroll to the end to see the new column). Note that the rows we did not filter get the value NaN, which literally means Not a Number:
[23]:
df.head()
[23]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp | Too hot |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 31.88 | 27.57 | 25.01 | 44.94 | 1001.68 | 1.49 | 52.25 | 185.21 | -46.422753 | ... | -12.129346 | -0.000468 | 0.019439 | 0.014569 | 0.000942 | 0.000492 | -0.000750 | 20 | 2016-02-16 10:44:40 | True |
1 | 2 | 31.79 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.03 | 53.73 | 186.72 | -48.778951 | ... | -12.943096 | -0.000614 | 0.019436 | 0.014577 | 0.000218 | -0.000005 | -0.000235 | 0 | 2016-02-16 10:44:50 | True |
2 | 3 | 31.66 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.24 | 53.57 | 186.21 | -49.161878 | ... | -12.642772 | -0.000569 | 0.019359 | 0.014357 | 0.000395 | 0.000600 | -0.000003 | 0 | 2016-02-16 10:45:00 | NaN |
3 | 4 | 31.69 | 27.52 | 25.01 | 45.32 | 1001.69 | 1.57 | 53.63 | 186.03 | -49.341941 | ... | -12.615509 | -0.000575 | 0.019383 | 0.014409 | 0.000308 | 0.000577 | -0.000102 | 0 | 2016-02-16 10:45:10 | True |
4 | 5 | 31.66 | 27.54 | 25.01 | 45.18 | 1001.71 | 0.85 | 53.66 | 186.46 | -50.056683 | ... | -12.678341 | -0.000548 | 0.019378 | 0.014380 | 0.000321 | 0.000691 | 0.000272 | 0 | 2016-02-16 10:45:20 | NaN |
5 rows × 21 columns
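If you want every row to hold True or False instead of True or NaN, one option is to assign the whole boolean Series to the new column. The following sketch compares the two approaches on hypothetical temperatures taken from the first rows shown above. The column name too_hot_bool is made up for illustration.

```python
import pandas as pd

demo = pd.DataFrame({'temp_cpu': [31.88, 31.79, 31.66, 31.69, 31.66]})

# As in the text: only the filtered rows get a value, the others become NaN
demo.loc[demo.temp_cpu > 31.68, 'Too hot'] = True
print(demo['Too hot'].isna().sum())  # 2

# Alternative: assign the whole boolean Series, so every row gets True/False
demo['too_hot_bool'] = demo.temp_cpu > 31.68
print(demo['too_hot_bool'].tolist())  # [True, True, False, True, False]
```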
Pandas is a very flexible library and provides several ways to obtain the same result. For example, we can perform an operation similar to the one above with NumPy's np.where function. Here we add a column telling whether the pressure is above or below the average:
[24]:
avg_pressure = df.pressure.values.mean()
df['check_p'] = np.where(df.pressure <= avg_pressure, 'below', 'above')
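np.where(condition, value_if_true, value_if_false) picks, element by element, the second argument where the condition holds and the third elsewhere. A small sketch with hypothetical pressures whose average is easy to check by hand:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'pressure': [1001.0, 1003.0, 1008.0, 1012.0]})
avg = demo.pressure.mean()  # (1001 + 1003 + 1008 + 1012) / 4 = 1006.0

# For each row: 'below' where the condition is True, 'above' otherwise
demo['check_p'] = np.where(demo.pressure <= avg, 'below', 'above')
print(demo['check_p'].tolist())  # ['below', 'below', 'above', 'above']
```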
3.1 Exercise: Meteo stats¶
✪ Analyze the data from the dataframe meteo and find:
the average, minimum and maximum pressure
the average temperature
the dates of rainy days
[25]:
# write here
print("Average pressure : %s" % meteo.Pressure.values.mean())
print("Minimal pressure : %s" % meteo.Pressure.values.min())
print("Maximal pressure : %s" % meteo.Pressure.values.max())
print("Average temperature : %s" % meteo.Temp.values.mean())
meteo[(meteo.Rain > 0)]
Average pressure : 986.3408269631689
Minimal pressure : 966.3
Maximal pressure : 998.3
Average temperature : 6.410701876302988
[25]:
| | Date | Pressure | Rain | Temp |
---|---|---|---|---|
433 | 05/11/2017 12:15 | 979.2 | 0.2 | 8.6 |
435 | 05/11/2017 12:45 | 978.9 | 0.2 | 8.4 |
436 | 05/11/2017 13:00 | 979.0 | 0.2 | 8.4 |
437 | 05/11/2017 13:15 | 979.1 | 0.8 | 8.2 |
438 | 05/11/2017 13:30 | 979.0 | 0.6 | 8.2 |
439 | 05/11/2017 13:45 | 978.8 | 0.4 | 8.2 |
440 | 05/11/2017 14:00 | 978.7 | 0.8 | 8.2 |
441 | 05/11/2017 14:15 | 978.4 | 0.6 | 8.3 |
442 | 05/11/2017 14:30 | 978.2 | 0.6 | 8.2 |
443 | 05/11/2017 14:45 | 978.1 | 0.6 | 8.2 |
444 | 05/11/2017 15:00 | 978.1 | 0.4 | 8.1 |
445 | 05/11/2017 15:15 | 977.9 | 0.4 | 8.1 |
446 | 05/11/2017 15:30 | 977.9 | 0.4 | 8.1 |
448 | 05/11/2017 16:00 | 977.4 | 0.2 | 8.1 |
455 | 05/11/2017 17:45 | 977.1 | 0.2 | 8.1 |
456 | 05/11/2017 18:00 | 977.1 | 0.2 | 8.2 |
457 | 05/11/2017 18:15 | 977.1 | 0.2 | 8.2 |
458 | 05/11/2017 18:30 | 976.8 | 0.2 | 8.3 |
459 | 05/11/2017 18:45 | 976.7 | 0.4 | 8.3 |
460 | 05/11/2017 19:00 | 976.5 | 0.2 | 8.4 |
461 | 05/11/2017 19:15 | 976.5 | 0.2 | 8.5 |
462 | 05/11/2017 19:30 | 976.3 | 0.2 | 8.5 |
463 | 05/11/2017 19:45 | 976.1 | 0.4 | 8.6 |
464 | 05/11/2017 20:00 | 976.3 | 0.2 | 8.7 |
465 | 05/11/2017 20:15 | 976.1 | 0.4 | 8.7 |
466 | 05/11/2017 20:30 | 976.1 | 0.4 | 8.7 |
467 | 05/11/2017 20:45 | 976.2 | 0.2 | 8.7 |
468 | 05/11/2017 21:00 | 976.4 | 0.6 | 8.8 |
469 | 05/11/2017 21:15 | 976.4 | 0.6 | 8.7 |
470 | 05/11/2017 21:30 | 976.9 | 1.2 | 8.7 |
... | ... | ... | ... | ... |
1150 | 12/11/2017 23:45 | 970.1 | 0.6 | 5.3 |
1151 | 13/11/2017 00:00 | 969.9 | 0.4 | 5.6 |
1152 | 13/11/2017 00:15 | 970.1 | 0.6 | 5.5 |
1153 | 13/11/2017 00:30 | 970.4 | 0.6 | 5.1 |
1154 | 13/11/2017 00:45 | 970.4 | 0.6 | 5.2 |
1155 | 13/11/2017 01:00 | 970.4 | 0.2 | 4.7 |
1159 | 13/11/2017 02:00 | 969.5 | 0.2 | 5.4 |
2338 | 25/11/2017 09:15 | 985.9 | 0.2 | 5.0 |
2346 | 25/11/2017 11:15 | 984.6 | 0.2 | 5.0 |
2347 | 25/11/2017 11:30 | 984.2 | 0.4 | 5.0 |
2348 | 25/11/2017 11:45 | 984.1 | 0.2 | 4.8 |
2349 | 25/11/2017 12:00 | 983.7 | 0.2 | 4.9 |
2350 | 25/11/2017 12:15 | 983.6 | 0.2 | 4.9 |
2352 | 25/11/2017 12:45 | 983.2 | 0.2 | 4.9 |
2353 | 25/11/2017 13:00 | 983.0 | 0.2 | 5.0 |
2354 | 25/11/2017 13:15 | 982.6 | 0.2 | 5.0 |
2355 | 25/11/2017 13:30 | 982.5 | 0.2 | 4.9 |
2356 | 25/11/2017 13:45 | 982.4 | 0.2 | 4.9 |
2358 | 25/11/2017 14:15 | 982.0 | 0.2 | 4.8 |
2359 | 25/11/2017 14:30 | 982.1 | 0.2 | 4.8 |
2362 | 25/11/2017 15:15 | 981.5 | 0.2 | 4.9 |
2363 | 25/11/2017 15:30 | 981.2 | 0.2 | 5.0 |
2364 | 25/11/2017 15:45 | 981.1 | 0.2 | 5.0 |
2366 | 25/11/2017 16:15 | 981.0 | 0.2 | 5.0 |
2736 | 29/11/2017 12:45 | 978.0 | 0.2 | 0.9 |
2754 | 29/11/2017 17:15 | 976.1 | 0.2 | 0.9 |
2755 | 29/11/2017 17:30 | 975.9 | 0.2 | 0.9 |
2802 | 30/11/2017 05:15 | 971.3 | 0.2 | 1.3 |
2803 | 30/11/2017 05:30 | 971.3 | 0.2 | 1.1 |
2804 | 30/11/2017 05:45 | 971.5 | 0.2 | 1.1 |
107 rows × 4 columns
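The filter above lists every 15-minute reading with rain, so the same day appears many times. If you want each rainy day only once, one possible approach is to keep the day part of Date and drop the duplicates. Here is a sketch on a hypothetical excerpt that uses the same Date format as meteo.csv:

```python
import pandas as pd

# Hypothetical excerpt with the same 'Date' format as meteo.csv
demo = pd.DataFrame({
    'Date': ['05/11/2017 12:15', '05/11/2017 12:30', '05/11/2017 12:45',
             '13/11/2017 00:00', '14/11/2017 08:00'],
    'Rain': [0.2, 0.0, 0.4, 0.4, 0.0],
})

rainy = demo[demo.Rain > 0]
# Keep only the day part (text before the space), then drop duplicates
days = rainy.Date.str.split(' ').str[0].unique()
print(list(days))  # ['05/11/2017', '13/11/2017']
```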
4. MatPlotLib review¶
We’ve already seen Matplotlib in the part on visualization, and today we use it to display data.
Let’s look again at an example with the MATLAB-style approach. We will plot a line by passing two lists of coordinates, one for the xs and one for the ys:
[26]:
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
[27]:
x = [1,2,3,4]
y = [2,4,6,8]
plt.plot(x, y) # we can directly pass x and y lists
plt.title('Some numbers')
plt.show()

We can also create the series with numpy. Let’s try making a parabola:
[28]:
x = np.arange(0.,5.,0.1)
# '**' is the power operator in Python, NOT '^'
y = x**2
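To see why the comment insists on '**', compare the two operators on plain integers. '^' is bitwise XOR, and it silently gives a wrong number instead of an error. NumPy arrays apply '**' element by element, which plain lists do not support:

```python
import numpy as np

print(3 ** 2)  # 9: power
print(3 ^ 2)   # 1: bitwise XOR (0b11 ^ 0b10 = 0b01), NOT a power!

squares = np.array([1, 2, 3]) ** 2  # NumPy applies ** element by element
print(squares)  # [1 4 9]

# A plain Python list does not support this:
# [1, 2, 3] ** 2  -> TypeError
```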
Let’s use the type function to find out which data types x and y are:
[29]:
type(x)
[29]:
numpy.ndarray
[30]:
type(y)
[30]:
numpy.ndarray
Hence we have NumPy arrays.
[31]:
plt.title('The parabola')
plt.plot(x,y);

If we want the x axis units to be the same as the y axis units, we can use the function gca to get the current axes and call set_aspect('equal') on them.
To set the x and y limits, we can use xlim and ylim:
[32]:
plt.xlim([0, 5])
plt.ylim([0,10])
plt.title('The parabola')
plt.gca().set_aspect('equal')
plt.plot(x,y);

Matplotlib plots from pandas datastructures¶
We can get plots directly from pandas data structures, still using the MATLAB style. Here is the documentation of DataFrame.plot. Let’s make an example: with a large quantity of data, it may be useful to get a qualitative idea of the data by plotting them:
[33]:
df.humidity.plot(label="Humidity", legend=True)
# with secondary_y=True the pressure values get their own
# y axis, displayed on the right of the graph
df.pressure.plot(secondary_y=True, label="Pressure", legend=True);

We can put pressure values on the horizontal axis and humidity values on the vertical axis, to see which humidity values occur at a given pressure:
[34]:
plt.plot(df['pressure'], df['humidity'])
[34]:
[<matplotlib.lines.Line2D at 0x7f8e7e6d0978>]

Let’s select in the new dataframe df2 the rows between the 12500th (included) and the 15000th (excluded):
[35]:
df2=df.iloc[12500:15000]
[36]:
plt.plot(df2['pressure'], df2['humidity'])
[36]:
[<matplotlib.lines.Line2D at 0x7f8e7e52f240>]

[37]:
df2.humidity.plot(label="Humidity", legend=True)
df2.pressure.plot(secondary_y=True, label="Pressure", legend=True)
[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e7e4b0710>

With the corr method we can see the correlation (Pearson, by default) between the DataFrame columns:
[38]:
df2.corr()
[38]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | mag_y | mag_z | accel_x | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ROW_ID | 1.000000 | 0.561540 | 0.636899 | 0.730764 | 0.945210 | 0.760732 | 0.005633 | 0.266995 | 0.172192 | -0.108713 | 0.057601 | -0.270656 | 0.015936 | 0.121838 | 0.075160 | -0.014346 | -0.026012 | 0.011714 | NaN |
temp_cpu | 0.561540 | 1.000000 | 0.591610 | 0.670043 | 0.488038 | 0.484902 | 0.025618 | 0.165540 | 0.056950 | -0.019815 | -0.028729 | -0.193077 | -0.021093 | 0.108878 | 0.065628 | -0.019478 | -0.007527 | -0.006737 | NaN |
temp_h | 0.636899 | 0.591610 | 1.000000 | 0.890775 | 0.539603 | 0.614536 | 0.022718 | 0.196767 | -0.024700 | -0.151336 | 0.031512 | -0.260633 | -0.009408 | 0.173037 | 0.129074 | -0.005255 | -0.017054 | -0.016113 | NaN |
temp_p | 0.730764 | 0.670043 | 0.890775 | 1.000000 | 0.620307 | 0.650015 | 0.019178 | 0.192621 | 0.007474 | -0.060122 | -0.039648 | -0.285640 | -0.034348 | 0.187457 | 0.144595 | -0.010679 | -0.016674 | -0.017010 | NaN |
humidity | 0.945210 | 0.488038 | 0.539603 | 0.620307 | 1.000000 | 0.750000 | 0.012247 | 0.231316 | 0.181905 | -0.108781 | 0.131218 | -0.191957 | 0.040452 | 0.069717 | 0.021627 | 0.005625 | -0.001927 | 0.014431 | NaN |
pressure | 0.760732 | 0.484902 | 0.614536 | 0.650015 | 0.750000 | 1.000000 | 0.037081 | 0.225112 | 0.070603 | -0.246485 | 0.194611 | -0.173808 | 0.085183 | -0.032049 | -0.068296 | -0.014838 | -0.008821 | 0.032056 | NaN |
pitch | 0.005633 | 0.025618 | 0.022718 | 0.019178 | 0.012247 | 0.037081 | 1.000000 | 0.068880 | 0.030448 | -0.008220 | -0.002278 | -0.019085 | 0.024460 | -0.053634 | -0.029345 | 0.040685 | 0.041674 | -0.024081 | NaN |
roll | 0.266995 | 0.165540 | 0.196767 | 0.192621 | 0.231316 | 0.225112 | 0.068880 | 1.000000 | -0.053750 | -0.281035 | -0.479779 | -0.665041 | 0.057330 | -0.049233 | -0.153524 | 0.139427 | 0.134319 | -0.078113 | NaN |
yaw | 0.172192 | 0.056950 | -0.024700 | 0.007474 | 0.181905 | 0.070603 | 0.030448 | -0.053750 | 1.000000 | 0.536693 | 0.300571 | 0.394324 | -0.028267 | 0.078585 | 0.068321 | -0.021071 | -0.009650 | 0.064290 | NaN |
mag_x | -0.108713 | -0.019815 | -0.151336 | -0.060122 | -0.108781 | -0.246485 | -0.008220 | -0.281035 | 0.536693 | 1.000000 | 0.046591 | 0.475674 | -0.097520 | 0.168764 | 0.115423 | -0.017739 | -0.006722 | 0.008456 | NaN |
mag_y | 0.057601 | -0.028729 | 0.031512 | -0.039648 | 0.131218 | 0.194611 | -0.002278 | -0.479779 | 0.300571 | 0.046591 | 1.000000 | 0.794756 | 0.046693 | -0.035111 | -0.022579 | -0.084045 | -0.061460 | 0.115327 | NaN |
mag_z | -0.270656 | -0.193077 | -0.260633 | -0.285640 | -0.191957 | -0.173808 | -0.019085 | -0.665041 | 0.394324 | 0.475674 | 0.794756 | 1.000000 | 0.001699 | -0.020016 | -0.006496 | -0.092749 | -0.060097 | 0.101276 | NaN |
accel_x | 0.015936 | -0.021093 | -0.009408 | -0.034348 | 0.040452 | 0.085183 | 0.024460 | 0.057330 | -0.028267 | -0.097520 | 0.046693 | 0.001699 | 1.000000 | -0.197363 | -0.174005 | -0.016811 | -0.013694 | -0.017850 | NaN |
accel_y | 0.121838 | 0.108878 | 0.173037 | 0.187457 | 0.069717 | -0.032049 | -0.053634 | -0.049233 | 0.078585 | 0.168764 | -0.035111 | -0.020016 | -0.197363 | 1.000000 | 0.424272 | -0.023942 | -0.054733 | 0.014870 | NaN |
accel_z | 0.075160 | 0.065628 | 0.129074 | 0.144595 | 0.021627 | -0.068296 | -0.029345 | -0.153524 | 0.068321 | 0.115423 | -0.022579 | -0.006496 | -0.174005 | 0.424272 | 1.000000 | 0.006313 | -0.011883 | -0.015390 | NaN |
gyro_x | -0.014346 | -0.019478 | -0.005255 | -0.010679 | 0.005625 | -0.014838 | 0.040685 | 0.139427 | -0.021071 | -0.017739 | -0.084045 | -0.092749 | -0.016811 | -0.023942 | 0.006313 | 1.000000 | 0.802471 | -0.012705 | NaN |
gyro_y | -0.026012 | -0.007527 | -0.017054 | -0.016674 | -0.001927 | -0.008821 | 0.041674 | 0.134319 | -0.009650 | -0.006722 | -0.061460 | -0.060097 | -0.013694 | -0.054733 | -0.011883 | 0.802471 | 1.000000 | -0.043332 | NaN |
gyro_z | 0.011714 | -0.006737 | -0.016113 | -0.017010 | 0.014431 | 0.032056 | -0.024081 | -0.078113 | 0.064290 | 0.008456 | 0.115327 | 0.101276 | -0.017850 | 0.014870 | -0.015390 | -0.012705 | -0.043332 | 1.000000 | NaN |
reset | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5. Calculating new columns¶
It is possible to obtain new columns by calculating them from other columns. For example, we can create a new column mag_tot, the squared magnitude of the magnetic field measured by the space station, computed as the sum of the squares of mag_x, mag_y and mag_z, and then plot it:
[39]:
df['mag_tot'] = df['mag_x']**2 + df['mag_y']**2 + df['mag_z']**2
df.mag_tot.plot()
[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e7e3e9ba8>
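Note that the column computed above is the sum of the squares of the three components, i.e. the squared magnitude of the field. If the actual magnitude is wanted, one option is to take the square root with numpy. A minimal sketch on a small DataFrame whose values are made up for illustration (mag_norm is a hypothetical column name):

```python
import numpy as np
import pandas as pd

# Small made-up sample with the same column names as the ISS dataset
sample = pd.DataFrame({
    'mag_x': [3.0, -1.0, 0.0],
    'mag_y': [4.0, 2.0, 0.0],
    'mag_z': [0.0, 2.0, 5.0],
})

# Squared magnitude, computed column by column (vectorized, no loops)
sample['mag_tot'] = sample['mag_x']**2 + sample['mag_y']**2 + sample['mag_z']**2

# Actual magnitude (Euclidean norm): square root of the squared magnitude
sample['mag_norm'] = np.sqrt(sample['mag_tot'])

print(sample)
```

Since the square root is an increasing function, the rows where mag_tot is maximal are also the rows where mag_norm is maximal.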

Let’s find when the magnetic field was maximal:
[40]:
df['time_stamp'][(df.mag_tot == df.mag_tot.values.max())]
[40]:
96156 2016-02-27 16:12:31
Name: time_stamp, dtype: object
By entering this timestamp on the website isstracker.com/historical, we can find where the space station was when the magnetic field was at its highest.
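Another way to locate the maximum is idxmax, which returns the index label of the first occurrence of the maximum; combined with .loc it gives a single value instead of a Series. Note that it returns only one row even in case of ties, while the boolean comparison used above returns all of them. A sketch on made-up timestamps and values:

```python
import pandas as pd

# Made-up readings, for illustration only
sample = pd.DataFrame({
    'time_stamp': ['2016-02-16 10:44:40', '2016-02-16 10:44:50', '2016-02-16 10:45:00'],
    'mag_tot': [10.0, 30.0, 20.0],
})

# Boolean mask: a Series with every row whose value equals the maximum
print(sample['time_stamp'][sample['mag_tot'] == sample['mag_tot'].max()])

# idxmax gives the index label of the first maximum; .loc then picks one cell
when = sample.loc[sample['mag_tot'].idxmax(), 'time_stamp']
print(when)
```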
5.1 Exercise: Meteo Fahrenheit temperature¶
In the meteo dataframe, create a column Temp (Fahrenheit) with the temperature measured in Fahrenheit degrees.
Formula to convert from Celsius degrees (C) to Fahrenheit:
\(Fahrenheit = \frac{9}{5}C + 32\)
[41]:
# write here
[42]:
# SOLUTION
print()
print(" ************** SOLUTION OUTPUT **************")
meteo['Temp (Fahrenheit)'] = meteo['Temp']* 9/5 + 32
meteo.head()
************** SOLUTION OUTPUT **************
[42]:
| | Date | Pressure | Rain | Temp | Temp (Fahrenheit) |
---|---|---|---|---|---|
0 | 01/11/2017 00:00 | 995.4 | 0.0 | 5.4 | 41.72 |
1 | 01/11/2017 00:15 | 995.5 | 0.0 | 6.0 | 42.80 |
2 | 01/11/2017 00:30 | 995.5 | 0.0 | 5.9 | 42.62 |
3 | 01/11/2017 00:45 | 995.7 | 0.0 | 5.4 | 41.72 |
4 | 01/11/2017 01:00 | 995.7 | 0.0 | 5.3 | 41.54 |
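As a quick check of the formula on the first row of the table above (5.4 °C):

\(\frac{9}{5} \cdot 5.4 + 32 = 9.72 + 32 = 41.72\)

which matches the value in the Temp (Fahrenheit) column.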
5.2 Exercise: Pressure vs Temperature¶
In a closed environment at constant volume, pressure should be directly proportional to the absolute temperature (in kelvin), according to Gay-Lussac’s law:
\(\frac{P}{T} = k\)
Does this hold true for the meteo dataset? Try to find out by calculating the formula directly, and compare with the results of the corr() method.
[43]:
# SOLUTION
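# as expected, in an open environment there is not much linear correlation
#meteo[['Pressure', 'Temp']].corr()
# Gay-Lussac's law needs absolute temperature: convert Celsius to kelvin first
#meteo['Pressure'] / (meteo['Temp'] + 273.15)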
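A minimal sketch of the direct calculation, run on a few made-up readings rather than the real meteo data: temperatures are converted to kelvin (T_K = T_C + 273.15) before dividing, and if the law held the ratio P/T would be nearly constant, so its relative spread (standard deviation divided by mean) would be close to zero.

```python
import pandas as pd

# Made-up readings, only to show the calculation
readings = pd.DataFrame({
    'Pressure': [995.4, 995.5, 1001.2, 1003.8],  # hPa
    'Temp': [5.4, 6.0, 2.1, 9.7],                # degrees Celsius
})

# Convert Celsius to kelvin before dividing
temp_k = readings['Temp'] + 273.15
ratio = readings['Pressure'] / temp_k

# Relative spread of P/T: close to 0 would mean P/T is (almost) constant
relative_spread = ratio.std() / ratio.mean()
print(ratio)
print('relative spread:', relative_spread)

# Pearson correlation matrix between pressure and temperature
corr = readings[['Pressure', 'Temp']].corr()
print(corr)
```

Since Pearson correlation does not change when a constant is added to a variable, corr() gives the same result whether temperature is in Celsius or kelvin; only the ratio P/T requires kelvin.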
6. Object values¶
In general, when a column holds values of a known type, for example strings (type str), we can write .str after the series and then treat the result as if it were a single string, using the operators (e.g. slicing) and methods that the str class allows, plus others provided by pandas. Text in particular can be manipulated in many ways; for more details see the pandas documentation.
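A small sketch of the idea on two made-up timestamps: each .str operation is applied element by element and returns a new Series.

```python
import pandas as pd

s = pd.Series(['2016-02-16 10:44:40', '2016-02-29 09:25:00'])

# .str applies string operations to every element of the series
years = s.str[:4]                  # slicing, like '2016-02-16 10:44:40'[:4]
lengths = s.str.len()              # like len(...) on each string
dates = s.str.split(' ').str[0]    # split each string, keep the first piece

print(years, lengths, dates, sep='\n')
```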
Filter by textual values¶
When we want to filter by text values, we can use .str.contains. Here, for example, we select all the samples from the last days of February, from the 20th to the 29th (whose timestamps contain 2016-02-2):
[44]:
df[ df['time_stamp'].str.contains('2016-02-2') ]
[44]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | accel_y | accel_z | gyro_x | gyro_y | gyro_z | reset | time_stamp | Too hot | check_p | mag_tot |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30442 | 30443 | 32.30 | 28.12 | 25.59 | 45.05 | 1008.01 | 1.47 | 51.82 | 51.18 | 9.215883 | ... | 0.018792 | 0.014558 | -0.000042 | 0.000275 | 0.000157 | 0 | 2016-02-20 00:00:00 | True | sotto | 269.091903 |
30443 | 30444 | 32.25 | 28.13 | 25.59 | 44.82 | 1008.02 | 0.81 | 51.53 | 52.21 | 8.710130 | ... | 0.019290 | 0.014667 | 0.000260 | 0.001011 | 0.000149 | 0 | 2016-02-20 00:00:10 | True | sotto | 260.866157 |
30444 | 30445 | 33.07 | 28.13 | 25.59 | 45.08 | 1008.09 | 0.68 | 51.69 | 57.36 | 7.383435 | ... | 0.018714 | 0.014598 | 0.000299 | 0.000343 | -0.000025 | 0 | 2016-02-20 00:00:41 | True | sotto | 265.421154 |
30445 | 30446 | 32.63 | 28.10 | 25.60 | 44.87 | 1008.07 | 1.42 | 52.13 | 59.95 | 7.292313 | ... | 0.018857 | 0.014565 | 0.000160 | 0.000349 | -0.000190 | 0 | 2016-02-20 00:00:50 | True | sotto | 269.572476 |
30446 | 30447 | 32.55 | 28.11 | 25.60 | 44.94 | 1008.07 | 1.41 | 51.86 | 61.83 | 6.699141 | ... | 0.018871 | 0.014564 | -0.000608 | -0.000381 | -0.000243 | 0 | 2016-02-20 00:01:01 | True | sotto | 262.510966 |
30447 | 30448 | 32.47 | 28.12 | 25.61 | 44.83 | 1008.08 | 1.84 | 51.75 | 64.10 | 6.339477 | ... | 0.018833 | 0.014691 | -0.000233 | -0.000403 | -0.000337 | 0 | 2016-02-20 00:01:10 | True | sotto | 273.997653 |
30448 | 30449 | 32.41 | 28.11 | 25.61 | 45.00 | 1008.10 | 2.35 | 51.87 | 66.59 | 5.861904 | ... | 0.018828 | 0.014534 | -0.000225 | -0.000292 | -0.000004 | 0 | 2016-02-20 00:01:20 | True | sotto | 272.043915 |
30449 | 30450 | 32.41 | 28.12 | 25.61 | 45.02 | 1008.10 | 1.41 | 51.92 | 68.70 | 5.235877 | ... | 0.018724 | 0.014255 | 0.000134 | -0.000310 | -0.000101 | 0 | 2016-02-20 00:01:30 | True | sotto | 268.608057 |
30450 | 30451 | 32.38 | 28.12 | 25.61 | 45.00 | 1008.12 | 1.46 | 52.04 | 70.98 | 4.775404 | ... | 0.018730 | 0.014372 | 0.000319 | 0.000079 | -0.000215 | 0 | 2016-02-20 00:01:40 | True | sotto | 271.750032 |
30451 | 30452 | 32.36 | 28.13 | 25.61 | 44.97 | 1008.12 | 1.18 | 51.78 | 73.10 | 4.300375 | ... | 0.018814 | 0.014518 | -0.000023 | 0.000186 | -0.000118 | 0 | 2016-02-20 00:01:51 | True | sotto | 277.538126 |
30452 | 30453 | 32.38 | 28.12 | 25.61 | 45.10 | 1008.12 | 1.08 | 51.81 | 74.90 | 3.763551 | ... | 0.018526 | 0.014454 | -0.000184 | -0.000075 | -0.000077 | 0 | 2016-02-20 00:02:00 | True | sotto | 268.391448 |
30453 | 30454 | 32.33 | 28.12 | 25.61 | 44.96 | 1008.14 | 1.45 | 51.79 | 77.31 | 3.228626 | ... | 0.018607 | 0.014330 | -0.000269 | -0.000547 | -0.000262 | 0 | 2016-02-20 00:02:11 | True | sopra | 271.942019 |
30454 | 30455 | 32.32 | 28.14 | 25.61 | 44.86 | 1008.12 | 1.89 | 51.95 | 78.88 | 2.888813 | ... | 0.018698 | 0.014548 | -0.000081 | -0.000079 | -0.000240 | 0 | 2016-02-20 00:02:20 | True | sotto | 264.664070 |
30455 | 30456 | 32.39 | 28.13 | 25.61 | 45.01 | 1008.12 | 1.49 | 51.60 | 80.46 | 2.447253 | ... | 0.018427 | 0.014576 | -0.000349 | -0.000269 | -0.000198 | 0 | 2016-02-20 00:02:31 | True | sotto | 267.262186 |
30456 | 30457 | 32.34 | 28.09 | 25.61 | 45.02 | 1008.14 | 1.18 | 51.74 | 82.41 | 1.983143 | ... | 0.018866 | 0.014438 | 0.000248 | 0.000172 | -0.000474 | 0 | 2016-02-20 00:02:40 | True | sopra | 270.414588 |
30457 | 30458 | 32.34 | 28.11 | 25.61 | 45.02 | 1008.16 | 1.92 | 51.72 | 84.46 | 1.623884 | ... | 0.018729 | 0.014770 | 0.000417 | 0.000231 | -0.000171 | 0 | 2016-02-20 00:02:50 | True | sopra | 278.210856 |
30458 | 30459 | 32.33 | 28.10 | 25.61 | 44.85 | 1008.18 | 1.99 | 52.06 | 86.72 | 1.050999 | ... | 0.018867 | 0.014592 | 0.000377 | 0.000270 | -0.000074 | 0 | 2016-02-20 00:03:00 | True | sopra | 288.728974 |
30459 | 30460 | 32.35 | 28.11 | 25.61 | 44.98 | 1008.15 | 1.38 | 51.78 | 89.42 | 0.297179 | ... | 0.018609 | 0.014593 | 0.000622 | 0.000364 | -0.000134 | 0 | 2016-02-20 00:03:10 | True | sopra | 303.816530 |
30460 | 30461 | 32.34 | 28.11 | 25.61 | 44.93 | 1008.18 | 1.41 | 51.66 | 91.11 | -0.136305 | ... | 0.018504 | 0.014502 | -0.000049 | -0.000104 | -0.000286 | 0 | 2016-02-20 00:03:21 | True | sopra | 305.475482 |
30461 | 30462 | 32.29 | 28.11 | 25.61 | 44.90 | 1008.18 | 1.33 | 51.99 | 93.09 | -0.659496 | ... | 0.018584 | 0.014593 | 0.000132 | -0.000542 | -0.000221 | 0 | 2016-02-20 00:03:30 | True | sopra | 306.437506 |
30462 | 30463 | 32.32 | 28.12 | 25.61 | 45.04 | 1008.17 | 1.30 | 51.93 | 94.25 | -1.002867 | ... | 0.018703 | 0.014584 | 0.000245 | 0.000074 | -0.000308 | 0 | 2016-02-20 00:03:41 | True | sopra | 318.703894 |
30463 | 30464 | 32.30 | 28.12 | 25.61 | 44.86 | 1008.16 | 0.98 | 51.78 | 96.42 | -1.634671 | ... | 0.018833 | 0.014771 | 0.000343 | -0.000154 | -0.000286 | 0 | 2016-02-20 00:03:50 | True | sopra | 324.412585 |
30464 | 30465 | 32.31 | 28.10 | 25.60 | 44.96 | 1008.18 | 1.82 | 51.95 | 98.65 | -2.204607 | ... | 0.018867 | 0.014664 | -0.000058 | -0.000366 | -0.000091 | 0 | 2016-02-20 00:04:01 | True | sopra | 331.006515 |
30465 | 30466 | 32.34 | 28.11 | 25.60 | 45.07 | 1008.19 | 1.14 | 51.69 | 101.53 | -3.065968 | ... | 0.018461 | 0.014735 | 0.000263 | -0.000071 | -0.000370 | 0 | 2016-02-20 00:04:10 | True | sopra | 332.503688 |
30466 | 30467 | 32.37 | 28.12 | 25.61 | 44.92 | 1008.19 | 1.73 | 51.94 | 103.40 | -3.533967 | ... | 0.018810 | 0.014541 | 0.000442 | 0.000022 | -0.000193 | 0 | 2016-02-20 00:04:20 | True | sopra | 330.051496 |
30467 | 30468 | 32.32 | 28.11 | 25.60 | 44.98 | 1008.18 | 1.45 | 51.67 | 104.59 | -4.009444 | ... | 0.018657 | 0.014586 | -0.000125 | 0.000013 | 0.000209 | 0 | 2016-02-20 00:04:31 | True | sopra | 340.085476 |
30468 | 30469 | 32.32 | 28.12 | 25.60 | 44.98 | 1008.20 | 1.66 | 51.85 | 105.99 | -4.438902 | ... | 0.019021 | 0.014753 | -0.000055 | 0.000126 | 0.000070 | 0 | 2016-02-20 00:04:40 | True | sopra | 354.350961 |
30469 | 30470 | 32.30 | 28.12 | 25.60 | 44.93 | 1008.20 | 1.45 | 51.89 | 107.38 | -4.940700 | ... | 0.018959 | 0.014662 | 0.000046 | -0.000504 | 0.000041 | 0 | 2016-02-20 00:04:51 | True | sopra | 364.753950 |
30470 | 30471 | 32.28 | 28.11 | 25.60 | 44.88 | 1008.21 | 1.78 | 51.88 | 108.78 | -5.444541 | ... | 0.019012 | 0.014606 | -0.000177 | -0.000407 | -0.000427 | 0 | 2016-02-20 00:05:00 | True | sopra | 379.362654 |
30471 | 30472 | 32.33 | 28.10 | 25.60 | 44.96 | 1008.21 | 1.76 | 51.88 | 110.70 | -6.101692 | ... | 0.018822 | 0.014834 | 0.000044 | 0.000042 | -0.000327 | 0 | 2016-02-20 00:05:11 | True | sopra | 388.749366 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110839 | 110840 | 31.60 | 27.49 | 24.82 | 42.74 | 1005.83 | 1.12 | 49.34 | 90.42 | 0.319629 | ... | 0.017461 | 0.014988 | -0.000209 | -0.000005 | 0.000138 | 0 | 2016-02-29 09:20:10 | NaN | sotto | 574.877314 |
110840 | 110841 | 31.59 | 27.48 | 24.82 | 42.75 | 1005.82 | 2.04 | 49.53 | 92.11 | 0.015879 | ... | 0.017413 | 0.014565 | -0.000472 | -0.000478 | 0.000126 | 0 | 2016-02-29 09:20:20 | NaN | sotto | 593.855683 |
110841 | 110842 | 31.59 | 27.51 | 24.82 | 42.76 | 1005.82 | 1.31 | 49.19 | 93.94 | -0.658624 | ... | 0.017516 | 0.015014 | -0.000590 | -0.000372 | 0.000207 | 0 | 2016-02-29 09:20:31 | NaN | sotto | 604.215692 |
110842 | 110843 | 31.60 | 27.50 | 24.82 | 42.74 | 1005.85 | 1.19 | 48.91 | 95.57 | -1.117541 | ... | 0.017400 | 0.014982 | -0.000039 | 0.000059 | 0.000149 | 0 | 2016-02-29 09:20:40 | NaN | sotto | 606.406098 |
110843 | 110844 | 31.57 | 27.49 | 24.82 | 42.80 | 1005.83 | 1.49 | 49.17 | 98.11 | -1.860475 | ... | 0.017580 | 0.014704 | 0.000223 | 0.000278 | 0.000038 | 0 | 2016-02-29 09:20:51 | NaN | sotto | 622.733559 |
110844 | 110845 | 31.60 | 27.50 | 24.82 | 42.81 | 1005.84 | 1.47 | 49.46 | 99.67 | -2.286044 | ... | 0.017428 | 0.014325 | -0.000283 | -0.000187 | 0.000077 | 0 | 2016-02-29 09:21:00 | NaN | sotto | 641.480748 |
110845 | 110846 | 31.61 | 27.50 | 24.82 | 42.81 | 1005.82 | 2.28 | 49.27 | 103.17 | -3.182359 | ... | 0.017537 | 0.014575 | -0.000451 | -0.000100 | -0.000351 | 0 | 2016-02-29 09:21:10 | NaN | sotto | 633.949204 |
110846 | 110847 | 31.61 | 27.50 | 24.82 | 42.75 | 1005.84 | 2.18 | 49.64 | 105.05 | -3.769940 | ... | 0.017739 | 0.014926 | 0.000476 | 0.000452 | -0.000249 | 0 | 2016-02-29 09:21:20 | NaN | sotto | 643.508698 |
110847 | 110848 | 31.58 | 27.50 | 24.82 | 43.00 | 1005.85 | 2.52 | 49.31 | 107.23 | -4.431722 | ... | 0.017588 | 0.015077 | 0.000822 | 0.000739 | -0.000012 | 0 | 2016-02-29 09:21:30 | NaN | sotto | 658.512439 |
110848 | 110849 | 31.54 | 27.51 | 24.82 | 42.76 | 1005.84 | 2.35 | 49.55 | 108.68 | -4.944477 | ... | 0.017487 | 0.014864 | 0.000613 | 0.000763 | -0.000227 | 0 | 2016-02-29 09:21:41 | NaN | sotto | 667.095455 |
110849 | 110850 | 31.60 | 27.50 | 24.82 | 42.79 | 1005.82 | 2.33 | 48.79 | 109.52 | -5.481255 | ... | 0.017455 | 0.014638 | 0.000196 | 0.000519 | -0.000234 | 0 | 2016-02-29 09:21:50 | NaN | sotto | 689.714415 |
110850 | 110851 | 31.61 | 27.50 | 24.82 | 42.79 | 1005.85 | 2.11 | 49.66 | 111.90 | -6.263577 | ... | 0.017489 | 0.014960 | 0.000029 | -0.000098 | -0.000073 | 0 | 2016-02-29 09:22:01 | NaN | sotto | 707.304506 |
110851 | 110852 | 31.56 | 27.50 | 24.83 | 42.84 | 1005.83 | 1.68 | 49.91 | 113.38 | -6.844946 | ... | 0.017778 | 0.014703 | -0.000177 | -0.000452 | -0.000232 | 0 | 2016-02-29 09:22:10 | NaN | sotto | 726.361255 |
110852 | 110853 | 31.59 | 27.51 | 24.83 | 42.76 | 1005.82 | 2.26 | 49.17 | 114.42 | -7.437300 | ... | 0.017733 | 0.014838 | 0.000396 | 0.000400 | -0.000188 | 0 | 2016-02-29 09:22:21 | NaN | sotto | 743.185242 |
110853 | 110854 | 31.58 | 27.50 | 24.83 | 42.98 | 1005.83 | 1.96 | 49.41 | 116.50 | -8.271114 | ... | 0.017490 | 0.014582 | 0.000285 | 0.000312 | -0.000058 | 0 | 2016-02-29 09:22:30 | NaN | sotto | 767.328522 |
110854 | 110855 | 31.61 | 27.51 | 24.83 | 42.69 | 1005.84 | 2.27 | 49.39 | 117.61 | -8.690470 | ... | 0.017465 | 0.014720 | -0.000001 | 0.000371 | -0.000274 | 0 | 2016-02-29 09:22:40 | NaN | sotto | 791.907055 |
110855 | 110856 | 31.55 | 27.50 | 24.83 | 42.79 | 1005.83 | 1.51 | 48.98 | 119.13 | -9.585351 | ... | 0.017554 | 0.014910 | -0.000115 | 0.000029 | -0.000223 | 0 | 2016-02-29 09:22:50 | NaN | sotto | 802.932850 |
110856 | 110857 | 31.55 | 27.49 | 24.83 | 42.81 | 1005.82 | 2.12 | 49.95 | 120.81 | -10.120745 | ... | 0.017494 | 0.014718 | -0.000150 | 0.000147 | -0.000320 | 0 | 2016-02-29 09:23:00 | NaN | sotto | 820.194642 |
110857 | 110858 | 31.60 | 27.51 | 24.83 | 42.92 | 1005.82 | 1.53 | 49.33 | 121.74 | -10.657858 | ... | 0.017544 | 0.014762 | 0.000161 | 0.000029 | -0.000210 | 0 | 2016-02-29 09:23:11 | NaN | sotto | 815.462202 |
110858 | 110859 | 31.58 | 27.50 | 24.83 | 42.81 | 1005.83 | 1.60 | 49.65 | 123.50 | -11.584851 | ... | 0.017608 | 0.015093 | -0.000073 | 0.000158 | -0.000006 | 0 | 2016-02-29 09:23:20 | NaN | sotto | 851.154631 |
110859 | 110860 | 31.61 | 27.50 | 24.83 | 42.82 | 1005.84 | 2.65 | 49.47 | 124.51 | -12.089743 | ... | 0.017433 | 0.014930 | 0.000428 | 0.000137 | 0.000201 | 0 | 2016-02-29 09:23:31 | NaN | sotto | 879.563826 |
110860 | 110861 | 31.57 | 27.50 | 24.83 | 42.80 | 1005.84 | 2.63 | 50.08 | 125.85 | -12.701497 | ... | 0.017805 | 0.014939 | 0.000263 | 0.000163 | 0.000031 | 0 | 2016-02-29 09:23:40 | NaN | sotto | 895.543882 |
110861 | 110862 | 31.58 | 27.51 | 24.83 | 42.90 | 1005.85 | 1.70 | 49.81 | 126.86 | -13.393369 | ... | 0.017577 | 0.015026 | -0.000077 | 0.000179 | 0.000148 | 0 | 2016-02-29 09:23:50 | NaN | sotto | 928.948693 |
110862 | 110863 | 31.60 | 27.51 | 24.83 | 42.80 | 1005.85 | 1.66 | 49.13 | 127.35 | -13.990712 | ... | 0.017508 | 0.014478 | 0.000119 | -0.000204 | 0.000041 | 0 | 2016-02-29 09:24:01 | NaN | sotto | 957.695014 |
110863 | 110864 | 31.64 | 27.51 | 24.83 | 42.80 | 1005.85 | 1.91 | 49.31 | 128.62 | -14.691672 | ... | 0.017789 | 0.014891 | 0.000286 | 0.000103 | 0.000221 | 0 | 2016-02-29 09:24:10 | NaN | sotto | 971.126355 |
110864 | 110865 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | ... | 0.017743 | 0.014646 | -0.000264 | 0.000206 | 0.000196 | 0 | 2016-02-29 09:24:21 | NaN | sotto | 996.676408 |
110865 | 110866 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | ... | 0.017570 | 0.014855 | 0.000143 | 0.000199 | -0.000024 | 0 | 2016-02-29 09:24:30 | NaN | sotto | 1022.779594 |
110866 | 110867 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | ... | 0.017657 | 0.014799 | 0.000537 | 0.000257 | 0.000057 | 0 | 2016-02-29 09:24:41 | NaN | sotto | 1048.121268 |
110867 | 110868 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | ... | 0.017635 | 0.014877 | 0.000534 | 0.000456 | 0.000195 | 0 | 2016-02-29 09:24:50 | NaN | sotto | 1073.629703 |
110868 | 110869 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | ... | 0.017261 | 0.014380 | 0.000459 | 0.000076 | 0.000030 | 0 | 2016-02-29 09:25:00 | NaN | sotto | 1095.760426 |
80427 rows × 23 columns
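By default str.contains interprets the pattern as a regular expression. The pattern 2016-02-2 contains no special regex characters, so the result is the same, but for patterns containing characters such as . or ( passing regex=False matches the text literally. Since the timestamps begin with the date, str.startswith is also a reasonable choice here, because it only matches at the beginning of each string. A sketch on a few made-up timestamps:

```python
import pandas as pd

stamps = pd.Series(['2016-02-19 23:59:50', '2016-02-20 00:00:00', '2016-02-29 09:25:00'])

# Literal substring match (no regular expression interpretation)
mask_contains = stamps.str.contains('2016-02-2', regex=False)

# Match only at the beginning of each string
mask_starts = stamps.str.startswith('2016-02-2')

print(stamps[mask_contains])
print(stamps[mask_starts])
```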
Extracting strings¶
To extract only the day from the time_stamp column, we can use .str and the slice operator with square brackets (in the format YYYY-MM-DD hh:mm:ss the day is at positions 8 and 9, hence the slice [8:10]):
[45]:
df['time_stamp'].str[8:10]
[45]:
0 16
1 16
2 16
3 16
4 16
5 16
6 16
7 16
8 16
9 16
10 16
11 16
12 16
13 16
14 16
15 16
16 16
17 16
18 16
19 16
20 16
21 16
22 16
23 16
24 16
25 16
26 16
27 16
28 16
29 16
..
110839 29
110840 29
110841 29
110842 29
110843 29
110844 29
110845 29
110846 29
110847 29
110848 29
110849 29
110850 29
110851 29
110852 29
110853 29
110854 29
110855 29
110856 29
110857 29
110858 29
110859 29
110860 29
110861 29
110862 29
110863 29
110864 29
110865 29
110866 29
110867 29
110868 29
Name: time_stamp, Length: 110869, dtype: object
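Slicing with .str[8:10] relies on every timestamp having exactly the same format, and it returns strings. If numbers are needed, one option is to parse the column with pd.to_datetime and read the day through the .dt accessor. A sketch on made-up timestamps:

```python
import pandas as pd

stamps = pd.Series(['2016-02-16 10:44:40', '2016-02-29 09:25:00'])

# Positional slicing: characters 8 and 9 hold the day, result is a string
day_str = stamps.str[8:10]

# Parse to datetime, then extract the day as an integer
day_int = pd.to_datetime(stamps).dt.day

print(day_str.tolist(), day_int.tolist())
```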
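As an aside, numpy's histogram function counts how many temp_h values fall into each of 10 equal-width bins (the default), and returns both the counts and the bin edges:
[46]: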
count, division = np.histogram(df['temp_h'])
print(count)
print(division)
[ 2242 8186 15692 22738 20114 24683 9371 5856 1131 856]
[27.2 27.408 27.616 27.824 28.032 28.24 28.448 28.656 28.864 29.072
29.28 ]
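The edges array always has one element more than the counts array, and the bins parameter changes the number of bins. A small sketch with made-up values:

```python
import numpy as np

values = np.array([27.2, 27.3, 27.9, 28.0, 28.1, 29.2])

# 4 equal-width bins between min (27.2) and max (29.2): width 0.5
count, division = np.histogram(values, bins=4)

print(count)     # how many values fall into each bin
print(division)  # bin edges, len(division) == len(count) + 1
```

All bins are half-open except the last one, which also includes the maximum (29.2 here).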
7. Transforming¶
Suppose we want to convert all the float values of the humidity column to integers.
We know that to convert a float to an integer there is the predefined Python function int, which truncates the decimal part:
[47]:
int(23.7)
[47]:
23
We would like to apply this function to all the elements of the humidity column.
To do so, we can call the transform method and pass it the function int as a parameter.
NOTE: there are no round parentheses after int, because we pass the function itself, not the result of calling it!
[48]:
df['humidity'].transform(int)
[48]:
0 44
1 45
2 45
3 45
4 45
5 45
6 45
7 45
8 45
9 45
10 45
11 45
12 45
13 45
14 45
15 45
16 45
17 45
18 45
19 45
20 45
21 45
22 45
23 45
24 45
25 45
26 45
27 45
28 45
29 45
..
110839 42
110840 42
110841 42
110842 42
110843 42
110844 42
110845 42
110846 42
110847 43
110848 42
110849 42
110850 42
110851 42
110852 42
110853 42
110854 42
110855 42
110856 42
110857 42
110858 42
110859 42
110860 42
110861 42
110862 42
110863 42
110864 42
110865 42
110866 42
110867 42
110868 42
Name: humidity, Length: 110869, dtype: int64
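For this particular conversion pandas also offers astype, which converts the whole column at once instead of calling a Python function on each element, and is usually faster on large columns. Converting floats to integers with astype truncates the decimal part just like int, so the two agree on these values; note that both raise an error if the column contains NaN. A sketch on a few made-up humidity values:

```python
import pandas as pd

humidity = pd.Series([44.94, 45.12, 42.74, 43.00])

# Element-wise: calls Python's int on every value
via_transform = humidity.transform(int)

# Vectorized: converts the whole column at once, truncating decimals
via_astype = humidity.astype('int64')

print(via_transform.equals(via_astype))
```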
To make clear what passing a function means, let’s see two other, completely equivalent, ways to pass it:
Defining a function: we could have defined a function myf like this (notice the function MUST RETURN something!)
[49]:
def myf(x):
    return int(x)

df['humidity'].transform(myf)
[49]:
0 44
1 45
2 45
3 45
4 45
5 45
6 45
7 45
8 45
9 45
10 45
11 45
12 45
13 45
14 45
15 45
16 45
17 45
18 45
19 45
20 45
21 45
22 45
23 45
24 45
25 45
26 45
27 45
28 45
29 45
..
110839 42
110840 42
110841 42
110842 42
110843 42
110844 42
110845 42
110846 42
110847 43
110848 42
110849 42
110850 42
110851 42
110852 42
110853 42
110854 42
110855 42
110856 42
110857 42
110858 42
110859 42
110860 42
110861 42
110862 42
110863 42
110864 42
110865 42
110866 42
110867 42
110868 42
Name: humidity, Length: 110869, dtype: int64
Lambda function: we could also have used a lambda function, that is, a function without a name defined in a single expression:
[50]:
df['humidity'].transform( lambda x: int(x) )
[50]:
0 44
1 45
2 45
3 45
4 45
5 45
6 45
7 45
8 45
9 45
10 45
11 45
12 45
13 45
14 45
15 45
16 45
17 45
18 45
19 45
20 45
21 45
22 45
23 45
24 45
25 45
26 45
27 45
28 45
29 45
..
110839 42
110840 42
110841 42
110842 42
110843 42
110844 42
110845 42
110846 42
110847 43
110848 42
110849 42
110850 42
110851 42
110852 42
110853 42
110854 42
110855 42
110856 42
110857 42
110858 42
110859 42
110860 42
110861 42
110862 42
110863 42
110864 42
110865 42
110866 42
110867 42
110868 42
Name: humidity, Length: 110869, dtype: int64
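A quick way to check that the three ways are equivalent is to compare their results with Series.equals, which checks both values and dtype. A sketch on a few made-up humidity values:

```python
import pandas as pd

humidity = pd.Series([44.94, 45.12, 42.74])

def myf(x):
    # Must return the converted value: a function without return gives None
    return int(x)

a = humidity.transform(int)               # built-in function
b = humidity.transform(myf)               # named function
c = humidity.transform(lambda x: int(x))  # anonymous function

print(a.equals(b) and b.equals(c))  # True

# transform returns a new Series: the original is left unchanged
print(humidity.tolist())  # [44.94, 45.12, 42.74]
```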
Regardless of the way we choose to pass the function, the transform method does not change the original dataframe (humidity is still float64):
[51]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110869 entries, 0 to 110868
Data columns (total 23 columns):
ROW_ID 110869 non-null int64
temp_cpu 110869 non-null float64
temp_h 110869 non-null float64
temp_p 110869 non-null float64
humidity 110869 non-null float64
pressure 110869 non-null float64
pitch 110869 non-null float64
roll 110869 non-null float64
yaw 110869 non-null float64
mag_x 110869 non-null float64
mag_y 110869 non-null float64
mag_z 110869 non-null float64
accel_x 110869 non-null float64
accel_y 110869 non-null float64
accel_z 110869 non-null float64
gyro_x 110869 non-null float64
gyro_y 110869 non-null float64
gyro_z 110869 non-null float64
reset 110869 non-null int64
time_stamp 110869 non-null object
Too hot 105315 non-null object
check_p 110869 non-null object
mag_tot 110869 non-null float64
dtypes: float64(18), int64(2), object(3)
memory usage: 19.5+ MB
If we want to add a new column, say humidity_int, we have to explicitly assign the result of transform to a new column:
[52]:
df['humidity_int'] = df['humidity'].transform( lambda x: int(x) )
Notice how pandas automatically infers type int64
for the newly created column:
[53]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110869 entries, 0 to 110868
Data columns (total 24 columns):
ROW_ID 110869 non-null int64
temp_cpu 110869 non-null float64
temp_h 110869 non-null float64
temp_p 110869 non-null float64
humidity 110869 non-null float64
pressure 110869 non-null float64
pitch 110869 non-null float64
roll 110869 non-null float64
yaw 110869 non-null float64
mag_x 110869 non-null float64
mag_y 110869 non-null float64
mag_z 110869 non-null float64
accel_x 110869 non-null float64
accel_y 110869 non-null float64
accel_z 110869 non-null float64
gyro_x 110869 non-null float64
gyro_y 110869 non-null float64
gyro_z 110869 non-null float64
reset 110869 non-null int64
time_stamp 110869 non-null object
Too hot 105315 non-null object
check_p 110869 non-null object
mag_tot 110869 non-null float64
humidity_int 110869 non-null int64
dtypes: float64(18), int64(3), object(3)
memory usage: 20.3+ MB
8. Grouping¶
Reference:
It is pretty easy to group items and perform aggregated calculations by using the groupby method. Let’s say we want to count how many humidity readings were taken for each integer humidity value (here we use pandas groupby, but for histograms you could also use numpy).
After groupby we can use the count() aggregation function (other common ones are sum(), mean(), min(), max()):
[54]:
df.groupby(['humidity_int'])['humidity'].count()
[54]:
humidity_int
42 2776
43 2479
44 13029
45 32730
46 35775
47 14176
48 7392
49 297
50 155
51 205
52 209
53 128
54 224
55 164
56 139
57 183
58 237
59 271
60 300
Name: humidity, dtype: int64
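Besides count(), several aggregation functions can be applied at once with agg, passing their names in a list. Counting the rows of each group can also be done with value_counts on the grouping column, sorted by index so groups appear in ascending order. Note that count() counts only the non-missing values of the selected column, so the two may differ when that column has NaNs. A sketch on made-up data:

```python
import pandas as pd

sample = pd.DataFrame({
    'humidity':     [44.9, 45.1, 45.3, 42.7, 45.0],
    'humidity_int': [44,   45,   45,   42,   45],
})

grouped = sample.groupby(['humidity_int'])['humidity']

# Several aggregations at once: one row per group, one column per function
summary = grouped.agg(['count', 'mean', 'min', 'max'])
print(summary)

# Counting rows per group can also be done with value_counts
counts = sample['humidity_int'].value_counts().sort_index()
print(counts)
```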
Notice we got only 19 rows, one per integer humidity value. To get a series as long as the whole table, assigning to each row the count of its own group, we can use transform like this:
[55]:
df.groupby(['humidity_int'])['humidity'].transform('count')
[55]:
0 13029
1 32730
2 32730
3 32730
4 32730
5 32730
6 32730
7 32730
8 32730
9 32730
10 32730
11 32730
12 32730
13 32730
14 32730
15 32730
16 32730
17 32730
18 32730
19 32730
20 32730
21 32730
22 32730
23 32730
24 32730
25 32730
26 32730
27 32730
28 32730
29 32730
...
110839 2776
110840 2776
110841 2776
110842 2776
110843 2776
110844 2776
110845 2776
110846 2776
110847 2479
110848 2776
110849 2776
110850 2776
110851 2776
110852 2776
110853 2776
110854 2776
110855 2776
110856 2776
110857 2776
110858 2776
110859 2776
110860 2776
110861 2776
110862 2776
110863 2776
110864 2776
110865 2776
110866 2776
110867 2776
110868 2776
Name: humidity, Length: 110869, dtype: int64
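On a small made-up DataFrame the difference is easy to see: the aggregation returns one value per group, while transform('count') returns one value per original row, aligned to the DataFrame index.

```python
import pandas as pd

sample = pd.DataFrame({
    'humidity':     [44.9, 45.1, 45.3, 42.7, 45.0],
    'humidity_int': [44,   45,   45,   42,   45],
})

grouped = sample.groupby(['humidity_int'])['humidity']

per_group = grouped.count()            # one value per group: 3 rows
per_row = grouped.transform('count')   # one value per reading: 5 rows

# per_row shares the index of sample: row i holds the count of row i's group
print(per_row.tolist())  # [1, 3, 3, 1, 3]
```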
As usual, groupby does not modify the dataframe: if we want the result stored in the dataframe, we need to assign it to a new column:
[56]:
df['Humidity counts'] = df.groupby(['humidity_int'])['humidity'].transform('count')
[57]:
df
[57]:
| | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | gyro_x | gyro_y | gyro_z | reset | time_stamp | Too hot | check_p | mag_tot | humidity_int | Humidity counts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 31.88 | 27.57 | 25.01 | 44.94 | 1001.68 | 1.49 | 52.25 | 185.21 | -46.422753 | ... | 0.000942 | 0.000492 | -0.000750 | 20 | 2016-02-16 10:44:40 | True | sotto | 2368.337207 | 44 | 13029 |
1 | 2 | 31.79 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.03 | 53.73 | 186.72 | -48.778951 | ... | 0.000218 | -0.000005 | -0.000235 | 0 | 2016-02-16 10:44:50 | True | sotto | 2615.870247 | 45 | 32730 |
2 | 3 | 31.66 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.24 | 53.57 | 186.21 | -49.161878 | ... | 0.000395 | 0.000600 | -0.000003 | 0 | 2016-02-16 10:45:00 | NaN | sotto | 2648.484927 | 45 | 32730 |
3 | 4 | 31.69 | 27.52 | 25.01 | 45.32 | 1001.69 | 1.57 | 53.63 | 186.03 | -49.341941 | ... | 0.000308 | 0.000577 | -0.000102 | 0 | 2016-02-16 10:45:10 | True | sotto | 2665.305485 | 45 | 32730 |
4 | 5 | 31.66 | 27.54 | 25.01 | 45.18 | 1001.71 | 0.85 | 53.66 | 186.46 | -50.056683 | ... | 0.000321 | 0.000691 | 0.000272 | 0 | 2016-02-16 10:45:20 | NaN | sotto | 2732.388620 | 45 | 32730 |
5 | 6 | 31.69 | 27.55 | 25.01 | 45.12 | 1001.67 | 0.85 | 53.53 | 185.52 | -50.246476 | ... | 0.000273 | 0.000494 | -0.000059 | 0 | 2016-02-16 10:45:30 | True | sotto | 2736.836291 | 45 | 32730 |
6 | 7 | 31.68 | 27.53 | 25.01 | 45.31 | 1001.70 | 0.63 | 53.55 | 186.10 | -50.447346 | ... | -0.000111 | 0.000320 | 0.000222 | 0 | 2016-02-16 10:45:41 | NaN | sotto | 2756.496929 | 45 | 32730 |
7 | 8 | 31.66 | 27.55 | 25.01 | 45.34 | 1001.70 | 1.49 | 53.65 | 186.08 | -50.668232 | ... | -0.000044 | 0.000436 | 0.000301 | 0 | 2016-02-16 10:45:50 | NaN | sotto | 2778.429164 | 45 | 32730 |
8 | 9 | 31.67 | 27.54 | 25.01 | 45.20 | 1001.72 | 1.22 | 53.77 | 186.55 | -50.761529 | ... | 0.000358 | 0.000651 | 0.000187 | 0 | 2016-02-16 10:46:01 | NaN | sotto | 2773.029554 | 45 | 32730 |
9 | 10 | 31.67 | 27.54 | 25.01 | 45.41 | 1001.75 | 1.63 | 53.46 | 185.94 | -51.243832 | ... | 0.000266 | 0.000676 | 0.000356 | 0 | 2016-02-16 10:46:10 | NaN | sotto | 2809.446772 | 45 | 32730 |
10 | 11 | 31.68 | 27.53 | 25.00 | 45.16 | 1001.72 | 1.32 | 53.52 | 186.24 | -51.616473 | ... | 0.000268 | 0.001194 | 0.000106 | 0 | 2016-02-16 10:46:20 | NaN | sotto | 2851.426683 | 45 | 32730 |
11 | 12 | 31.67 | 27.52 | 25.00 | 45.48 | 1001.72 | 1.51 | 53.47 | 186.17 | -51.781714 | ... | 0.000859 | 0.001221 | 0.000264 | 0 | 2016-02-16 10:46:30 | NaN | sotto | 2864.856376 | 45 | 32730 |
12 | 13 | 31.63 | 27.53 | 25.00 | 45.20 | 1001.72 | 1.55 | 53.75 | 186.38 | -51.992696 | ... | 0.000589 | 0.001151 | 0.000002 | 0 | 2016-02-16 10:46:40 | NaN | sotto | 2880.392591 | 45 | 32730 |
13 | 14 | 31.69 | 27.53 | 25.00 | 45.28 | 1001.71 | 1.07 | 53.63 | 186.60 | -52.409175 | ... | 0.000497 | 0.000610 | -0.000060 | 0 | 2016-02-16 10:46:50 | True | sotto | 2921.288936 | 45 | 32730 |
14 | 15 | 31.70 | 27.52 | 25.00 | 45.14 | 1001.72 | 0.81 | 53.40 | 186.32 | -52.648488 | ... | -0.000053 | 0.000593 | -0.000141 | 0 | 2016-02-16 10:47:00 | True | sotto | 2946.615432 | 45 | 32730 |
15 | 16 | 31.72 | 27.53 | 25.00 | 45.31 | 1001.75 | 1.51 | 53.34 | 186.42 | -52.850708 | ... | -0.000238 | 0.000495 | 0.000156 | 0 | 2016-02-16 10:47:11 | True | sotto | 2967.640766 | 45 | 32730 |
16 | 17 | 31.71 | 27.52 | 25.00 | 45.14 | 1001.72 | 1.82 | 53.49 | 186.39 | -53.449140 | ... | 0.000571 | 0.000770 | 0.000331 | 0 | 2016-02-16 10:47:20 | True | sotto | 3029.683044 | 45 | 32730 |
17 | 18 | 31.67 | 27.53 | 25.00 | 45.23 | 1001.71 | 0.46 | 53.69 | 186.72 | -53.679986 | ... | -0.000187 | 0.000159 | 0.000386 | 0 | 2016-02-16 10:47:31 | NaN | sotto | 3052.251538 | 45 | 32730 |
18 | 19 | 31.67 | 27.53 | 25.00 | 45.28 | 1001.71 | 0.67 | 53.55 | 186.61 | -54.159015 | ... | -0.000495 | 0.000094 | 0.000084 | 0 | 2016-02-16 10:47:40 | NaN | sotto | 3095.501435 | 45 | 32730 |
19 | 20 | 31.69 | 27.53 | 25.00 | 45.21 | 1001.71 | 1.23 | 53.43 | 186.21 | -54.400646 | ... | -0.000338 | 0.000013 | 0.000041 | 0 | 2016-02-16 10:47:51 | True | sotto | 3110.640598 | 45 | 32730 |
20 | 21 | 31.69 | 27.51 | 25.00 | 45.18 | 1001.71 | 1.44 | 53.58 | 186.40 | -54.609398 | ... | -0.000266 | 0.000279 | -0.000009 | 0 | 2016-02-16 10:48:00 | True | sotto | 3140.151110 | 45 | 32730 |
21 | 22 | 31.66 | 27.52 | 25.00 | 45.18 | 1001.73 | 1.25 | 53.34 | 186.50 | -54.746114 | ... | 0.000139 | 0.000312 | 0.000050 | 0 | 2016-02-16 10:48:10 | NaN | sotto | 3156.665111 | 45 | 32730 |
22 | 23 | 31.68 | 27.54 | 25.00 | 45.25 | 1001.72 | 1.18 | 53.49 | 186.69 | -55.091416 | ... | -0.000489 | 0.000086 | 0.000065 | 0 | 2016-02-16 10:48:21 | NaN | sotto | 3188.235806 | 45 | 32730 |
23 | 24 | 31.67 | 27.53 | 24.99 | 45.30 | 1001.72 | 1.34 | 53.32 | 186.84 | -55.516313 | ... | 0.000312 | 0.000175 | 0.000308 | 0 | 2016-02-16 10:48:30 | NaN | sotto | 3238.850567 | 45 | 32730 |
24 | 25 | 31.65 | 27.53 | 25.00 | 45.40 | 1001.71 | 1.36 | 53.56 | 187.02 | -55.560991 | ... | -0.000101 | -0.000023 | 0.000377 | 0 | 2016-02-16 10:48:41 | NaN | sotto | 3242.425155 | 45 | 32730 |
25 | 26 | 31.67 | 27.52 | 25.00 | 45.33 | 1001.72 | 1.17 | 53.44 | 186.95 | -56.016359 | ... | 0.000147 | 0.000054 | 0.000147 | 0 | 2016-02-16 10:48:50 | NaN | sotto | 3288.794716 | 45 | 32730 |
26 | 27 | 31.74 | 27.54 | 25.00 | 45.27 | 1001.71 | 0.88 | 53.41 | 186.57 | -56.393694 | ... | -0.000125 | -0.000193 | 0.000269 | 0 | 2016-02-16 10:49:01 | True | sotto | 3320.328854 | 45 | 32730 |
27 | 28 | 31.63 | 27.52 | 25.00 | 45.33 | 1001.75 | 0.78 | 53.84 | 186.85 | -56.524545 | ... | -0.000175 | -0.000312 | 0.000361 | 0 | 2016-02-16 10:49:10 | NaN | sotto | 3339.433796 | 45 | 32730 |
28 | 29 | 31.68 | 27.52 | 25.00 | 45.33 | 1001.73 | 0.88 | 53.41 | 186.62 | -56.791585 | ... | -0.000382 | -0.000253 | 0.000132 | 0 | 2016-02-16 10:49:20 | NaN | sotto | 3364.310107 | 45 | 32730 |
29 | 30 | 31.67 | 27.51 | 25.00 | 45.21 | 1001.74 | 0.86 | 53.29 | 186.71 | -56.915466 | ... | 0.000031 | -0.000260 | 0.000069 | 0 | 2016-02-16 10:49:30 | NaN | sotto | 3377.217368 | 45 | 32730 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110839 | 110840 | 31.60 | 27.49 | 24.82 | 42.74 | 1005.83 | 1.12 | 49.34 | 90.42 | 0.319629 | ... | -0.000209 | -0.000005 | 0.000138 | 0 | 2016-02-29 09:20:10 | NaN | sotto | 574.877314 | 42 | 2776 |
110840 | 110841 | 31.59 | 27.48 | 24.82 | 42.75 | 1005.82 | 2.04 | 49.53 | 92.11 | 0.015879 | ... | -0.000472 | -0.000478 | 0.000126 | 0 | 2016-02-29 09:20:20 | NaN | sotto | 593.855683 | 42 | 2776 |
110841 | 110842 | 31.59 | 27.51 | 24.82 | 42.76 | 1005.82 | 1.31 | 49.19 | 93.94 | -0.658624 | ... | -0.000590 | -0.000372 | 0.000207 | 0 | 2016-02-29 09:20:31 | NaN | sotto | 604.215692 | 42 | 2776 |
110842 | 110843 | 31.60 | 27.50 | 24.82 | 42.74 | 1005.85 | 1.19 | 48.91 | 95.57 | -1.117541 | ... | -0.000039 | 0.000059 | 0.000149 | 0 | 2016-02-29 09:20:40 | NaN | sotto | 606.406098 | 42 | 2776 |
110843 | 110844 | 31.57 | 27.49 | 24.82 | 42.80 | 1005.83 | 1.49 | 49.17 | 98.11 | -1.860475 | ... | 0.000223 | 0.000278 | 0.000038 | 0 | 2016-02-29 09:20:51 | NaN | sotto | 622.733559 | 42 | 2776 |
110844 | 110845 | 31.60 | 27.50 | 24.82 | 42.81 | 1005.84 | 1.47 | 49.46 | 99.67 | -2.286044 | ... | -0.000283 | -0.000187 | 0.000077 | 0 | 2016-02-29 09:21:00 | NaN | sotto | 641.480748 | 42 | 2776 |
110845 | 110846 | 31.61 | 27.50 | 24.82 | 42.81 | 1005.82 | 2.28 | 49.27 | 103.17 | -3.182359 | ... | -0.000451 | -0.000100 | -0.000351 | 0 | 2016-02-29 09:21:10 | NaN | sotto | 633.949204 | 42 | 2776 |
110846 | 110847 | 31.61 | 27.50 | 24.82 | 42.75 | 1005.84 | 2.18 | 49.64 | 105.05 | -3.769940 | ... | 0.000476 | 0.000452 | -0.000249 | 0 | 2016-02-29 09:21:20 | NaN | sotto | 643.508698 | 42 | 2776 |
110847 | 110848 | 31.58 | 27.50 | 24.82 | 43.00 | 1005.85 | 2.52 | 49.31 | 107.23 | -4.431722 | ... | 0.000822 | 0.000739 | -0.000012 | 0 | 2016-02-29 09:21:30 | NaN | sotto | 658.512439 | 43 | 2479 |
110848 | 110849 | 31.54 | 27.51 | 24.82 | 42.76 | 1005.84 | 2.35 | 49.55 | 108.68 | -4.944477 | ... | 0.000613 | 0.000763 | -0.000227 | 0 | 2016-02-29 09:21:41 | NaN | sotto | 667.095455 | 42 | 2776 |
110849 | 110850 | 31.60 | 27.50 | 24.82 | 42.79 | 1005.82 | 2.33 | 48.79 | 109.52 | -5.481255 | ... | 0.000196 | 0.000519 | -0.000234 | 0 | 2016-02-29 09:21:50 | NaN | sotto | 689.714415 | 42 | 2776 |
110850 | 110851 | 31.61 | 27.50 | 24.82 | 42.79 | 1005.85 | 2.11 | 49.66 | 111.90 | -6.263577 | ... | 0.000029 | -0.000098 | -0.000073 | 0 | 2016-02-29 09:22:01 | NaN | sotto | 707.304506 | 42 | 2776 |
110851 | 110852 | 31.56 | 27.50 | 24.83 | 42.84 | 1005.83 | 1.68 | 49.91 | 113.38 | -6.844946 | ... | -0.000177 | -0.000452 | -0.000232 | 0 | 2016-02-29 09:22:10 | NaN | sotto | 726.361255 | 42 | 2776 |
110852 | 110853 | 31.59 | 27.51 | 24.83 | 42.76 | 1005.82 | 2.26 | 49.17 | 114.42 | -7.437300 | ... | 0.000396 | 0.000400 | -0.000188 | 0 | 2016-02-29 09:22:21 | NaN | sotto | 743.185242 | 42 | 2776 |
110853 | 110854 | 31.58 | 27.50 | 24.83 | 42.98 | 1005.83 | 1.96 | 49.41 | 116.50 | -8.271114 | ... | 0.000285 | 0.000312 | -0.000058 | 0 | 2016-02-29 09:22:30 | NaN | sotto | 767.328522 | 42 | 2776 |
110854 | 110855 | 31.61 | 27.51 | 24.83 | 42.69 | 1005.84 | 2.27 | 49.39 | 117.61 | -8.690470 | ... | -0.000001 | 0.000371 | -0.000274 | 0 | 2016-02-29 09:22:40 | NaN | sotto | 791.907055 | 42 | 2776 |
110855 | 110856 | 31.55 | 27.50 | 24.83 | 42.79 | 1005.83 | 1.51 | 48.98 | 119.13 | -9.585351 | ... | -0.000115 | 0.000029 | -0.000223 | 0 | 2016-02-29 09:22:50 | NaN | sotto | 802.932850 | 42 | 2776 |
110856 | 110857 | 31.55 | 27.49 | 24.83 | 42.81 | 1005.82 | 2.12 | 49.95 | 120.81 | -10.120745 | ... | -0.000150 | 0.000147 | -0.000320 | 0 | 2016-02-29 09:23:00 | NaN | sotto | 820.194642 | 42 | 2776 |
110857 | 110858 | 31.60 | 27.51 | 24.83 | 42.92 | 1005.82 | 1.53 | 49.33 | 121.74 | -10.657858 | ... | 0.000161 | 0.000029 | -0.000210 | 0 | 2016-02-29 09:23:11 | NaN | sotto | 815.462202 | 42 | 2776 |
110858 | 110859 | 31.58 | 27.50 | 24.83 | 42.81 | 1005.83 | 1.60 | 49.65 | 123.50 | -11.584851 | ... | -0.000073 | 0.000158 | -0.000006 | 0 | 2016-02-29 09:23:20 | NaN | sotto | 851.154631 | 42 | 2776 |
110859 | 110860 | 31.61 | 27.50 | 24.83 | 42.82 | 1005.84 | 2.65 | 49.47 | 124.51 | -12.089743 | ... | 0.000428 | 0.000137 | 0.000201 | 0 | 2016-02-29 09:23:31 | NaN | sotto | 879.563826 | 42 | 2776 |
110860 | 110861 | 31.57 | 27.50 | 24.83 | 42.80 | 1005.84 | 2.63 | 50.08 | 125.85 | -12.701497 | ... | 0.000263 | 0.000163 | 0.000031 | 0 | 2016-02-29 09:23:40 | NaN | sotto | 895.543882 | 42 | 2776 |
110861 | 110862 | 31.58 | 27.51 | 24.83 | 42.90 | 1005.85 | 1.70 | 49.81 | 126.86 | -13.393369 | ... | -0.000077 | 0.000179 | 0.000148 | 0 | 2016-02-29 09:23:50 | NaN | sotto | 928.948693 | 42 | 2776 |
110862 | 110863 | 31.60 | 27.51 | 24.83 | 42.80 | 1005.85 | 1.66 | 49.13 | 127.35 | -13.990712 | ... | 0.000119 | -0.000204 | 0.000041 | 0 | 2016-02-29 09:24:01 | NaN | sotto | 957.695014 | 42 | 2776 |
110863 | 110864 | 31.64 | 27.51 | 24.83 | 42.80 | 1005.85 | 1.91 | 49.31 | 128.62 | -14.691672 | ... | 0.000286 | 0.000103 | 0.000221 | 0 | 2016-02-29 09:24:10 | NaN | sotto | 971.126355 | 42 | 2776 |
110864 | 110865 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | ... | -0.000264 | 0.000206 | 0.000196 | 0 | 2016-02-29 09:24:21 | NaN | sotto | 996.676408 | 42 | 2776 |
110865 | 110866 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | ... | 0.000143 | 0.000199 | -0.000024 | 0 | 2016-02-29 09:24:30 | NaN | sotto | 1022.779594 | 42 | 2776 |
110866 | 110867 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | ... | 0.000537 | 0.000257 | 0.000057 | 0 | 2016-02-29 09:24:41 | NaN | sotto | 1048.121268 | 42 | 2776 |
110867 | 110868 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | ... | 0.000534 | 0.000456 | 0.000195 | 0 | 2016-02-29 09:24:50 | NaN | sotto | 1073.629703 | 42 | 2776 |
110868 | 110869 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | ... | 0.000459 | 0.000076 | 0.000030 | 0 | 2016-02-29 09:25:00 | NaN | sotto | 1095.760426 | 42 | 2776 |
110869 rows × 25 columns
9. Exercise: meteo average temperatures
9.1 meteo plot
✪ Plot the temperature from the meteo dataframe:
[58]:
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
# write here
[59]:
# SOLUTION
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
meteo.Temp.plot()
[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e74689828>

9.2 meteo pressure and raining
✪ In the same plot as above, show the pressure and the amount of rain.
[60]:
# write here
[61]:
# SOLUTION
meteo.Temp.plot(label="Temperature", legend=True)
meteo.Rain.plot(label="Rain", legend=True)
meteo.Pressure.plot(secondary_y=True, label="Pressure", legend=True);

9.3 meteo average temperature
✪✪✪ Calculate the average temperature for each day and show it in the plot, so as to have a couple of new columns like these:
Day Avg_day_temp
01/11/2017 7.983333
01/11/2017 7.983333
01/11/2017 7.983333
. .
. .
02/11/2017 7.384375
02/11/2017 7.384375
02/11/2017 7.384375
. .
. .
HINT 1: add a 'Day' column by extracting only the day from the date. To do it, use the .str accessor on the whole column.
HINT 2: there are various ways to solve the exercise:
The most performant and elegant is the groupby operator, see Pandas transform - more than meets the eye
As an alternative, you may use a for loop to cycle through the days. Typically, using a for loop is not a good idea with Pandas, as on large datasets the updates can take a long time. Still, since this dataset is small enough, you should get results in a decent amount of time.
[62]:
# write here
[63]:
print()
print(' **************** SOLUTION 1 - recalculate average for every row - slow !')
meteo = pd.read_csv('meteo.csv', encoding='UTF-8')
meteo['Day'] = meteo['Date'].str[0:10]
print("WITH DAY")
print(meteo.head())
for day in meteo['Day']:
    avg_day_temp = meteo[(meteo.Day == day)].Temp.values.mean()
    meteo.loc[(meteo.Day == day),'Avg_day_temp']= avg_day_temp
print()
print("WITH AVERAGE TEMPERATURE")
print(meteo.head())
meteo.Temp.plot(label="Temperature", legend=True)
meteo.Avg_day_temp.plot(label="Average temperature", legend=True)
**************** SOLUTION 1 - recalculate average for every row - slow !
WITH DAY
Date Pressure Rain Temp Day
0 01/11/2017 00:00 995.4 0.0 5.4 01/11/2017
1 01/11/2017 00:15 995.5 0.0 6.0 01/11/2017
2 01/11/2017 00:30 995.5 0.0 5.9 01/11/2017
3 01/11/2017 00:45 995.7 0.0 5.4 01/11/2017
4 01/11/2017 01:00 995.7 0.0 5.3 01/11/2017
WITH AVERAGE TEMPERATURE
Date Pressure Rain Temp Day Avg_day_temp
0 01/11/2017 00:00 995.4 0.0 5.4 01/11/2017 7.983333
1 01/11/2017 00:15 995.5 0.0 6.0 01/11/2017 7.983333
2 01/11/2017 00:30 995.5 0.0 5.9 01/11/2017 7.983333
3 01/11/2017 00:45 995.7 0.0 5.4 01/11/2017 7.983333
4 01/11/2017 01:00 995.7 0.0 5.3 01/11/2017 7.983333
[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e74499c50>

[64]:
print()
print('******** SOLUTION 2 - recalculate average only 30 times by using a dictionary d_avg,')
print(' faster but not yet optimal')
meteo = pd.read_csv('meteo.csv', encoding='UTF-8')
meteo['Day'] = meteo['Date'].str[0:10]
print()
print("WITH DAY")
print(meteo.head())
d_avg = {}
for day in meteo['Day']:
    if day not in d_avg:
        d_avg[day] = meteo[ meteo['Day'] == day ]['Temp'].mean()
# write each cached average once per day instead of once per row
for day, avg in d_avg.items():
    meteo.loc[(meteo.Day == day),'Avg_day_temp']= avg
print()
print("WITH AVERAGE TEMPERATURE")
print(meteo.head())
meteo.Temp.plot(label="Temperature", legend=True)
meteo.Avg_day_temp.plot(label="Average temperature", legend=True)
******** SOLUTION 2 - recalculate average only 30 times by using a dictionary d_avg,
faster but not yet optimal
WITH DAY
Date Pressure Rain Temp Day
0 01/11/2017 00:00 995.4 0.0 5.4 01/11/2017
1 01/11/2017 00:15 995.5 0.0 6.0 01/11/2017
2 01/11/2017 00:30 995.5 0.0 5.9 01/11/2017
3 01/11/2017 00:45 995.7 0.0 5.4 01/11/2017
4 01/11/2017 01:00 995.7 0.0 5.3 01/11/2017
WITH AVERAGE TEMPERATURE
Date Pressure Rain Temp Day Avg_day_temp
0 01/11/2017 00:00 995.4 0.0 5.4 01/11/2017 7.983333
1 01/11/2017 00:15 995.5 0.0 6.0 01/11/2017 7.983333
2 01/11/2017 00:30 995.5 0.0 5.9 01/11/2017 7.983333
3 01/11/2017 00:45 995.7 0.0 5.4 01/11/2017 7.983333
4 01/11/2017 01:00 995.7 0.0 5.3 01/11/2017 7.983333
[64]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e74661668>

[65]:
print()
print('**************** SOLUTION 3 - best solution with groupby and transform ')
meteo = pd.read_csv('meteo.csv', encoding='UTF-8')
meteo['Day'] = meteo['Date'].str[0:10]
# .transform returns one value per original row; a plain groupby(...).mean()
# would instead give a table with only 30 rows, one per day
meteo['Avg_day_temp'] = meteo.groupby('Day')['Temp'].transform('mean')
meteo
print()
print("WITH AVERAGE TEMPERATURE")
print(meteo.head())
meteo.Temp.plot(label="Temperature", legend=True)
meteo.Avg_day_temp.plot(label="Average temperature", legend=True)
**************** SOLUTION 3 - best solution with groupby and transform
WITH AVERAGE TEMPERATURE
Date Pressure Rain Temp Day Avg_day_temp
0 01/11/2017 00:00 995.4 0.0 5.4 01/11/2017 7.983333
1 01/11/2017 00:15 995.5 0.0 6.0 01/11/2017 7.983333
2 01/11/2017 00:30 995.5 0.0 5.9 01/11/2017 7.983333
3 01/11/2017 00:45 995.7 0.0 5.4 01/11/2017 7.983333
4 01/11/2017 01:00 995.7 0.0 5.3 01/11/2017 7.983333
[65]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e7e704780>
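To see why .transform matters in Solution 3, here is a minimal sketch on a tiny hypothetical dataframe (the name toy and its values are invented for illustration, not taken from meteo.csv): a plain groupby(...).mean() collapses to one row per day, while transform('mean') spreads each day's average back onto every original row.

```python
import pandas as pd

# hypothetical readings over two days, same 'dd/mm/yyyy HH:MM' format as meteo.csv
toy = pd.DataFrame({
    'Date': ['01/11/2017 00:00', '01/11/2017 00:15', '02/11/2017 00:00'],
    'Temp': [5.0, 7.0, 10.0],
})
# the first 10 characters are the 'dd/mm/yyyy' part
toy['Day'] = toy['Date'].str[0:10]

# aggregation: one row per distinct day, indexed by the day string
per_day = toy.groupby('Day')['Temp'].mean()

# transform: same length and index as toy, so it can be stored as a new column
toy['Avg_day_temp'] = toy.groupby('Day')['Temp'].transform('mean')

print(per_day)
print(toy)
```

The aggregated per_day has as many rows as there are days, so assigning it directly as a column of toy would not line up with the original rows; transform avoids that mismatch.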

10. Merging tables
Suppose we want to add a column with the geographical position of the ISS. To do so, we need to join our dataset with another one containing such information. Let’s take for example the dataset iss-coords.csv:
[66]:
iss_coords = pd.read_csv('iss-coords.csv', encoding='UTF-8')
[67]:
iss_coords
[67]:
 | timestamp | lat | lon |
---|---|---|---|
0 | 2016-01-01 05:11:30 | -45.103458 | 14.083858 |
1 | 2016-01-01 06:49:59 | -37.597242 | 28.931170 |
2 | 2016-01-01 11:52:30 | 17.126141 | 77.535602 |
3 | 2016-01-01 11:52:30 | 17.126464 | 77.535861 |
4 | 2016-01-01 14:54:08 | 7.259561 | 70.001561 |
5 | 2016-01-01 18:24:00 | -15.990725 | -106.400927 |
6 | 2016-01-01 22:45:51 | 31.602388 | 85.647998 |
7 | 2016-01-02 07:48:31 | -51.578009 | -26.736801 |
8 | 2016-01-02 10:50:19 | -36.512021 | 14.452174 |
9 | 2016-01-02 14:01:27 | -27.459029 | 10.991151 |
10 | 2016-01-02 14:01:27 | -27.458783 | 10.991398 |
11 | 2016-01-02 20:30:13 | 29.861877 | 156.955941 |
12 | 2016-01-03 11:43:18 | 9.065825 | -172.436293 |
13 | 2016-01-03 14:39:47 | 15.529901 | 35.812502 |
14 | 2016-01-03 14:39:47 | 15.530149 | 35.812698 |
15 | 2016-01-03 21:12:17 | -44.793666 | -28.679197 |
16 | 2016-01-03 22:39:52 | 28.061007 | 178.935724 |
17 | 2016-01-04 13:40:02 | -14.153170 | -139.759391 |
18 | 2016-01-04 13:51:36 | 9.461309 | 30.520802 |
19 | 2016-01-04 13:51:36 | 9.461560 | 30.520986 |
20 | 2016-01-04 18:42:18 | 44.974327 | 84.801522 |
21 | 2016-01-04 21:46:03 | -51.551958 | -75.103323 |
22 | 2016-01-04 21:46:03 | -51.551933 | -75.102954 |
23 | 2016-01-05 12:57:50 | -41.439217 | 3.847215 |
24 | 2016-01-05 14:36:00 | -13.581246 | 39.166522 |
25 | 2016-01-05 14:36:00 | -13.581024 | 39.166692 |
26 | 2016-01-05 17:51:36 | 26.103252 | -151.570312 |
27 | 2016-01-05 22:28:56 | -26.458448 | -108.642807 |
28 | 2016-01-06 12:07:09 | -51.204236 | -19.679525 |
29 | 2016-01-06 13:41:23 | -51.166546 | -19.318519 |
... | ... | ... | ... |
308 | 2016-02-25 21:19:08 | 14.195431 | -133.777268 |
309 | 2016-02-25 21:38:48 | -14.698631 | -85.875320 |
310 | 2016-02-26 00:51:29 | -4.376121 | -94.773870 |
311 | 2016-02-26 00:51:29 | -51.097174 | -21.117794 |
312 | 2016-02-26 13:09:56 | -1.811782 | -99.010499 |
313 | 2016-02-26 14:28:13 | -15.363988 | -87.986579 |
314 | 2016-02-26 14:28:13 | -15.364276 | -87.986354 |
315 | 2016-02-26 17:49:36 | -32.517607 | 47.514800 |
316 | 2016-02-26 22:37:28 | -41.292043 | 29.733597 |
317 | 2016-02-27 01:43:10 | -41.049112 | 30.193004 |
318 | 2016-02-27 01:43:10 | -8.402991 | -100.981726 |
319 | 2016-02-27 13:34:30 | 18.406130 | -126.884570 |
320 | 2016-02-27 13:52:46 | -22.783724 | -90.869452 |
321 | 2016-02-27 13:52:46 | -22.784018 | -90.869189 |
322 | 2016-02-27 21:47:45 | -7.038283 | -106.607037 |
323 | 2016-02-28 00:51:03 | -31.699384 | -84.328371 |
324 | 2016-02-28 08:13:04 | 40.239764 | -155.465692 |
325 | 2016-02-28 09:48:40 | 50.047523 | 175.566751 |
326 | 2016-02-28 14:29:36 | 37.854997 | 106.124377 |
327 | 2016-02-28 14:29:36 | 37.855237 | 106.124735 |
328 | 2016-02-28 20:56:33 | 51.729529 | 163.754128 |
329 | 2016-02-29 04:39:20 | -10.946978 | -100.874429 |
330 | 2016-02-29 08:56:28 | 46.885514 | -167.143393 |
331 | 2016-02-29 10:32:56 | 46.773608 | -166.800893 |
332 | 2016-02-29 11:53:49 | 46.678097 | -166.512208 |
333 | 2016-02-29 13:23:17 | -51.077590 | -31.093987 |
334 | 2016-02-29 13:44:13 | 30.688553 | -135.403820 |
335 | 2016-02-29 13:44:13 | 30.688295 | -135.403533 |
336 | 2016-02-29 18:44:57 | 27.608774 | -130.198781 |
337 | 2016-02-29 21:36:47 | 27.325186 | -129.893278 |
338 rows × 3 columns
We notice there is a timestamp column, which unfortunately has a slightly different name than the time_stamp column (notice the underscore _) in the original astropi dataset:
[68]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110869 entries, 0 to 110868
Data columns (total 25 columns):
ROW_ID 110869 non-null int64
temp_cpu 110869 non-null float64
temp_h 110869 non-null float64
temp_p 110869 non-null float64
humidity 110869 non-null float64
pressure 110869 non-null float64
pitch 110869 non-null float64
roll 110869 non-null float64
yaw 110869 non-null float64
mag_x 110869 non-null float64
mag_y 110869 non-null float64
mag_z 110869 non-null float64
accel_x 110869 non-null float64
accel_y 110869 non-null float64
accel_z 110869 non-null float64
gyro_x 110869 non-null float64
gyro_y 110869 non-null float64
gyro_z 110869 non-null float64
reset 110869 non-null int64
time_stamp 110869 non-null object
Too hot 105315 non-null object
check_p 110869 non-null object
mag_tot 110869 non-null float64
humidity_int 110869 non-null int64
Humidity counts 110869 non-null int64
dtypes: float64(18), int64(4), object(3)
memory usage: 21.1+ MB
To merge datasets according to matching column values, we can use the merge method like this:
[69]:
# remember merge produces a NEW dataframe
geo_astropi = df.merge(iss_coords, left_on='time_stamp', right_on='timestamp')
# merge will add both time_stamp and timestamp columns,
# so we remove the duplicate column `timestamp`
geo_astropi = geo_astropi.drop('timestamp', axis=1)
[70]:
geo_astropi
[70]:
 | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | gyro_z | reset | time_stamp | Too hot | check_p | mag_tot | humidity_int | Humidity counts | lat | lon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23231 | 32.53 | 28.37 | 25.89 | 45.31 | 1006.04 | 1.31 | 51.63 | 34.91 | 21.125001 | ... | 0.000046 | 0 | 2016-02-19 03:49:00 | True | sotto | 2345.207992 | 45 | 32730 | 31.434741 | 52.917464 |
1 | 27052 | 32.30 | 28.12 | 25.62 | 45.57 | 1007.42 | 1.49 | 52.29 | 333.49 | 16.083471 | ... | 0.000034 | 0 | 2016-02-19 14:30:40 | True | sotto | 323.634786 | 45 | 32730 | -46.620658 | -57.311657 |
2 | 27052 | 32.30 | 28.12 | 25.62 | 45.57 | 1007.42 | 1.49 | 52.29 | 333.49 | 16.083471 | ... | 0.000034 | 0 | 2016-02-19 14:30:40 | True | sotto | 323.634786 | 45 | 32730 | -46.620477 | -57.311138 |
3 | 46933 | 32.21 | 28.05 | 25.50 | 47.36 | 1012.41 | 0.67 | 52.40 | 27.57 | 15.441683 | ... | 0.000221 | 0 | 2016-02-21 22:14:11 | True | sopra | 342.159257 | 47 | 14176 | 19.138359 | -140.211489 |
4 | 64572 | 32.32 | 28.18 | 25.61 | 47.45 | 1010.62 | 1.14 | 51.41 | 33.68 | 11.994554 | ... | 0.000030 | 0 | 2016-02-23 23:40:50 | True | sopra | 264.655601 | 47 | 14176 | 4.713819 | 80.261665 |
5 | 68293 | 32.39 | 28.26 | 25.70 | 46.83 | 1010.51 | 0.61 | 51.91 | 287.86 | 6.554283 | ... | 0.000171 | 0 | 2016-02-24 10:05:51 | True | sopra | 436.876111 | 46 | 35775 | -46.061583 | 22.246025 |
6 | 73374 | 32.38 | 28.18 | 25.62 | 46.52 | 1008.28 | 0.90 | 51.77 | 30.80 | 9.947132 | ... | -0.000375 | 0 | 2016-02-25 00:23:01 | True | sopra | 226.089258 | 46 | 35775 | 47.047346 | 137.958918 |
7 | 90986 | 32.42 | 28.34 | 25.76 | 45.72 | 1006.79 | 0.57 | 49.85 | 10.57 | 7.805606 | ... | -0.000047 | 0 | 2016-02-27 01:43:10 | True | sotto | 149.700293 | 45 | 32730 | -41.049112 | 30.193004 |
8 | 90986 | 32.42 | 28.34 | 25.76 | 45.72 | 1006.79 | 0.57 | 49.85 | 10.57 | 7.805606 | ... | -0.000047 | 0 | 2016-02-27 01:43:10 | True | sotto | 149.700293 | 45 | 32730 | -8.402991 | -100.981726 |
9 | 102440 | 32.62 | 28.62 | 26.02 | 45.15 | 1006.06 | 1.12 | 50.44 | 301.74 | 10.348327 | ... | -0.000061 | 0 | 2016-02-28 09:48:40 | True | sotto | 381.014223 | 45 | 32730 | 50.047523 | 175.566751 |
10 rows × 27 columns
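The few rows above come from merge defaulting to an inner join (how='inner'): only rows whose time_stamp exactly equals some timestamp in iss_coords survive, and a time_stamp matching two coordinate rows yields two output rows. A minimal sketch on hypothetical toy tables (readings and coords are invented, reusing the column names above) makes the row counts visible:

```python
import pandas as pd

# hypothetical sensor readings, one every 10 seconds
readings = pd.DataFrame({
    'time_stamp': ['2016-02-19 03:48:50', '2016-02-19 03:49:00', '2016-02-19 03:49:10'],
    'temp_h': [28.30, 28.37, 28.35],
})
# hypothetical positions: only one timestamp coincides with a reading
coords = pd.DataFrame({
    'timestamp': ['2016-02-19 03:49:00', '2016-02-19 05:00:00'],
    'lat': [10.0, -20.0],
    'lon': [30.0, 40.0],
})

# default how='inner': keeps only exact timestamp matches
inner = readings.merge(coords, left_on='time_stamp', right_on='timestamp')
# how='left': keeps every reading, unmatched ones get NaN lat/lon
left = readings.merge(coords, left_on='time_stamp', right_on='timestamp', how='left')

print(len(inner), len(left), left['lat'].isna().sum())
```

A left merge keeps all the rows, but with sensor readings every ~10 seconds and far sparser positions, almost every lat and lon cell stays empty, which is why the exercise asks for filling as well.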
Exercise 10.1 better merge
As you may notice, the table above does have lat and lon columns, but it has very few rows. Why? Try to merge the tables in some meaningful way so as to keep all the original rows and have all the lat and lon cells filled.
For other merging strategies, read about the how parameter in Why And How To Use Merge With Pandas in Python
To fill missing values, don't use fancy interpolation techniques: just put the station position for that given day or hour.
[71]:
geo_astropi = df.merge(iss_coords, left_on='time_stamp', right_on='timestamp', how='left')
[72]:
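# NOTE: the result of merge_ordered is not assigned to any variable, so the table
# displayed below is still geo_astropi from the previous cell (a plain left merge)
pd.merge_ordered(df, iss_coords, fill_method='ffill', how='left', left_on='time_stamp', right_on='timestamp')
geo_astropi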
[72]:
 | ROW_ID | temp_cpu | temp_h | temp_p | humidity | pressure | pitch | roll | yaw | mag_x | ... | reset | time_stamp | Too hot | check_p | mag_tot | humidity_int | Humidity counts | timestamp | lat | lon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 31.88 | 27.57 | 25.01 | 44.94 | 1001.68 | 1.49 | 52.25 | 185.21 | -46.422753 | ... | 20 | 2016-02-16 10:44:40 | True | sotto | 2368.337207 | 44 | 13029 | NaN | NaN | NaN |
1 | 2 | 31.79 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.03 | 53.73 | 186.72 | -48.778951 | ... | 0 | 2016-02-16 10:44:50 | True | sotto | 2615.870247 | 45 | 32730 | NaN | NaN | NaN |
2 | 3 | 31.66 | 27.53 | 25.01 | 45.12 | 1001.72 | 1.24 | 53.57 | 186.21 | -49.161878 | ... | 0 | 2016-02-16 10:45:00 | NaN | sotto | 2648.484927 | 45 | 32730 | NaN | NaN | NaN |
3 | 4 | 31.69 | 27.52 | 25.01 | 45.32 | 1001.69 | 1.57 | 53.63 | 186.03 | -49.341941 | ... | 0 | 2016-02-16 10:45:10 | True | sotto | 2665.305485 | 45 | 32730 | NaN | NaN | NaN |
4 | 5 | 31.66 | 27.54 | 25.01 | 45.18 | 1001.71 | 0.85 | 53.66 | 186.46 | -50.056683 | ... | 0 | 2016-02-16 10:45:20 | NaN | sotto | 2732.388620 | 45 | 32730 | NaN | NaN | NaN |
5 | 6 | 31.69 | 27.55 | 25.01 | 45.12 | 1001.67 | 0.85 | 53.53 | 185.52 | -50.246476 | ... | 0 | 2016-02-16 10:45:30 | True | sotto | 2736.836291 | 45 | 32730 | NaN | NaN | NaN |
6 | 7 | 31.68 | 27.53 | 25.01 | 45.31 | 1001.70 | 0.63 | 53.55 | 186.10 | -50.447346 | ... | 0 | 2016-02-16 10:45:41 | NaN | sotto | 2756.496929 | 45 | 32730 | NaN | NaN | NaN |
7 | 8 | 31.66 | 27.55 | 25.01 | 45.34 | 1001.70 | 1.49 | 53.65 | 186.08 | -50.668232 | ... | 0 | 2016-02-16 10:45:50 | NaN | sotto | 2778.429164 | 45 | 32730 | NaN | NaN | NaN |
8 | 9 | 31.67 | 27.54 | 25.01 | 45.20 | 1001.72 | 1.22 | 53.77 | 186.55 | -50.761529 | ... | 0 | 2016-02-16 10:46:01 | NaN | sotto | 2773.029554 | 45 | 32730 | NaN | NaN | NaN |
9 | 10 | 31.67 | 27.54 | 25.01 | 45.41 | 1001.75 | 1.63 | 53.46 | 185.94 | -51.243832 | ... | 0 | 2016-02-16 10:46:10 | NaN | sotto | 2809.446772 | 45 | 32730 | NaN | NaN | NaN |
10 | 11 | 31.68 | 27.53 | 25.00 | 45.16 | 1001.72 | 1.32 | 53.52 | 186.24 | -51.616473 | ... | 0 | 2016-02-16 10:46:20 | NaN | sotto | 2851.426683 | 45 | 32730 | NaN | NaN | NaN |
11 | 12 | 31.67 | 27.52 | 25.00 | 45.48 | 1001.72 | 1.51 | 53.47 | 186.17 | -51.781714 | ... | 0 | 2016-02-16 10:46:30 | NaN | sotto | 2864.856376 | 45 | 32730 | NaN | NaN | NaN |
12 | 13 | 31.63 | 27.53 | 25.00 | 45.20 | 1001.72 | 1.55 | 53.75 | 186.38 | -51.992696 | ... | 0 | 2016-02-16 10:46:40 | NaN | sotto | 2880.392591 | 45 | 32730 | NaN | NaN | NaN |
13 | 14 | 31.69 | 27.53 | 25.00 | 45.28 | 1001.71 | 1.07 | 53.63 | 186.60 | -52.409175 | ... | 0 | 2016-02-16 10:46:50 | True | sotto | 2921.288936 | 45 | 32730 | NaN | NaN | NaN |
14 | 15 | 31.70 | 27.52 | 25.00 | 45.14 | 1001.72 | 0.81 | 53.40 | 186.32 | -52.648488 | ... | 0 | 2016-02-16 10:47:00 | True | sotto | 2946.615432 | 45 | 32730 | NaN | NaN | NaN |
15 | 16 | 31.72 | 27.53 | 25.00 | 45.31 | 1001.75 | 1.51 | 53.34 | 186.42 | -52.850708 | ... | 0 | 2016-02-16 10:47:11 | True | sotto | 2967.640766 | 45 | 32730 | NaN | NaN | NaN |
16 | 17 | 31.71 | 27.52 | 25.00 | 45.14 | 1001.72 | 1.82 | 53.49 | 186.39 | -53.449140 | ... | 0 | 2016-02-16 10:47:20 | True | sotto | 3029.683044 | 45 | 32730 | NaN | NaN | NaN |
17 | 18 | 31.67 | 27.53 | 25.00 | 45.23 | 1001.71 | 0.46 | 53.69 | 186.72 | -53.679986 | ... | 0 | 2016-02-16 10:47:31 | NaN | sotto | 3052.251538 | 45 | 32730 | NaN | NaN | NaN |
18 | 19 | 31.67 | 27.53 | 25.00 | 45.28 | 1001.71 | 0.67 | 53.55 | 186.61 | -54.159015 | ... | 0 | 2016-02-16 10:47:40 | NaN | sotto | 3095.501435 | 45 | 32730 | NaN | NaN | NaN |
19 | 20 | 31.69 | 27.53 | 25.00 | 45.21 | 1001.71 | 1.23 | 53.43 | 186.21 | -54.400646 | ... | 0 | 2016-02-16 10:47:51 | True | sotto | 3110.640598 | 45 | 32730 | NaN | NaN | NaN |
20 | 21 | 31.69 | 27.51 | 25.00 | 45.18 | 1001.71 | 1.44 | 53.58 | 186.40 | -54.609398 | ... | 0 | 2016-02-16 10:48:00 | True | sotto | 3140.151110 | 45 | 32730 | NaN | NaN | NaN |
21 | 22 | 31.66 | 27.52 | 25.00 | 45.18 | 1001.73 | 1.25 | 53.34 | 186.50 | -54.746114 | ... | 0 | 2016-02-16 10:48:10 | NaN | sotto | 3156.665111 | 45 | 32730 | NaN | NaN | NaN |
22 | 23 | 31.68 | 27.54 | 25.00 | 45.25 | 1001.72 | 1.18 | 53.49 | 186.69 | -55.091416 | ... | 0 | 2016-02-16 10:48:21 | NaN | sotto | 3188.235806 | 45 | 32730 | NaN | NaN | NaN |
23 | 24 | 31.67 | 27.53 | 24.99 | 45.30 | 1001.72 | 1.34 | 53.32 | 186.84 | -55.516313 | ... | 0 | 2016-02-16 10:48:30 | NaN | sotto | 3238.850567 | 45 | 32730 | NaN | NaN | NaN |
24 | 25 | 31.65 | 27.53 | 25.00 | 45.40 | 1001.71 | 1.36 | 53.56 | 187.02 | -55.560991 | ... | 0 | 2016-02-16 10:48:41 | NaN | sotto | 3242.425155 | 45 | 32730 | NaN | NaN | NaN |
25 | 26 | 31.67 | 27.52 | 25.00 | 45.33 | 1001.72 | 1.17 | 53.44 | 186.95 | -56.016359 | ... | 0 | 2016-02-16 10:48:50 | NaN | sotto | 3288.794716 | 45 | 32730 | NaN | NaN | NaN |
26 | 27 | 31.74 | 27.54 | 25.00 | 45.27 | 1001.71 | 0.88 | 53.41 | 186.57 | -56.393694 | ... | 0 | 2016-02-16 10:49:01 | True | sotto | 3320.328854 | 45 | 32730 | NaN | NaN | NaN |
27 | 28 | 31.63 | 27.52 | 25.00 | 45.33 | 1001.75 | 0.78 | 53.84 | 186.85 | -56.524545 | ... | 0 | 2016-02-16 10:49:10 | NaN | sotto | 3339.433796 | 45 | 32730 | NaN | NaN | NaN |
28 | 29 | 31.68 | 27.52 | 25.00 | 45.33 | 1001.73 | 0.88 | 53.41 | 186.62 | -56.791585 | ... | 0 | 2016-02-16 10:49:20 | NaN | sotto | 3364.310107 | 45 | 32730 | NaN | NaN | NaN |
29 | 30 | 31.67 | 27.51 | 25.00 | 45.21 | 1001.74 | 0.86 | 53.29 | 186.71 | -56.915466 | ... | 0 | 2016-02-16 10:49:30 | NaN | sotto | 3377.217368 | 45 | 32730 | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110841 | 110840 | 31.60 | 27.49 | 24.82 | 42.74 | 1005.83 | 1.12 | 49.34 | 90.42 | 0.319629 | ... | 0 | 2016-02-29 09:20:10 | NaN | sotto | 574.877314 | 42 | 2776 | NaN | NaN | NaN |
110842 | 110841 | 31.59 | 27.48 | 24.82 | 42.75 | 1005.82 | 2.04 | 49.53 | 92.11 | 0.015879 | ... | 0 | 2016-02-29 09:20:20 | NaN | sotto | 593.855683 | 42 | 2776 | NaN | NaN | NaN |
110843 | 110842 | 31.59 | 27.51 | 24.82 | 42.76 | 1005.82 | 1.31 | 49.19 | 93.94 | -0.658624 | ... | 0 | 2016-02-29 09:20:31 | NaN | sotto | 604.215692 | 42 | 2776 | NaN | NaN | NaN |
110844 | 110843 | 31.60 | 27.50 | 24.82 | 42.74 | 1005.85 | 1.19 | 48.91 | 95.57 | -1.117541 | ... | 0 | 2016-02-29 09:20:40 | NaN | sotto | 606.406098 | 42 | 2776 | NaN | NaN | NaN |
110845 | 110844 | 31.57 | 27.49 | 24.82 | 42.80 | 1005.83 | 1.49 | 49.17 | 98.11 | -1.860475 | ... | 0 | 2016-02-29 09:20:51 | NaN | sotto | 622.733559 | 42 | 2776 | NaN | NaN | NaN |
110846 | 110845 | 31.60 | 27.50 | 24.82 | 42.81 | 1005.84 | 1.47 | 49.46 | 99.67 | -2.286044 | ... | 0 | 2016-02-29 09:21:00 | NaN | sotto | 641.480748 | 42 | 2776 | NaN | NaN | NaN |
110847 | 110846 | 31.61 | 27.50 | 24.82 | 42.81 | 1005.82 | 2.28 | 49.27 | 103.17 | -3.182359 | ... | 0 | 2016-02-29 09:21:10 | NaN | sotto | 633.949204 | 42 | 2776 | NaN | NaN | NaN |
110848 | 110847 | 31.61 | 27.50 | 24.82 | 42.75 | 1005.84 | 2.18 | 49.64 | 105.05 | -3.769940 | ... | 0 | 2016-02-29 09:21:20 | NaN | sotto | 643.508698 | 42 | 2776 | NaN | NaN | NaN |
110849 | 110848 | 31.58 | 27.50 | 24.82 | 43.00 | 1005.85 | 2.52 | 49.31 | 107.23 | -4.431722 | ... | 0 | 2016-02-29 09:21:30 | NaN | sotto | 658.512439 | 43 | 2479 | NaN | NaN | NaN |
110850 | 110849 | 31.54 | 27.51 | 24.82 | 42.76 | 1005.84 | 2.35 | 49.55 | 108.68 | -4.944477 | ... | 0 | 2016-02-29 09:21:41 | NaN | sotto | 667.095455 | 42 | 2776 | NaN | NaN | NaN |
110851 | 110850 | 31.60 | 27.50 | 24.82 | 42.79 | 1005.82 | 2.33 | 48.79 | 109.52 | -5.481255 | ... | 0 | 2016-02-29 09:21:50 | NaN | sotto | 689.714415 | 42 | 2776 | NaN | NaN | NaN |
110852 | 110851 | 31.61 | 27.50 | 24.82 | 42.79 | 1005.85 | 2.11 | 49.66 | 111.90 | -6.263577 | ... | 0 | 2016-02-29 09:22:01 | NaN | sotto | 707.304506 | 42 | 2776 | NaN | NaN | NaN |
110853 | 110852 | 31.56 | 27.50 | 24.83 | 42.84 | 1005.83 | 1.68 | 49.91 | 113.38 | -6.844946 | ... | 0 | 2016-02-29 09:22:10 | NaN | sotto | 726.361255 | 42 | 2776 | NaN | NaN | NaN |
110854 | 110853 | 31.59 | 27.51 | 24.83 | 42.76 | 1005.82 | 2.26 | 49.17 | 114.42 | -7.437300 | ... | 0 | 2016-02-29 09:22:21 | NaN | sotto | 743.185242 | 42 | 2776 | NaN | NaN | NaN |
110855 | 110854 | 31.58 | 27.50 | 24.83 | 42.98 | 1005.83 | 1.96 | 49.41 | 116.50 | -8.271114 | ... | 0 | 2016-02-29 09:22:30 | NaN | sotto | 767.328522 | 42 | 2776 | NaN | NaN | NaN |
110856 | 110855 | 31.61 | 27.51 | 24.83 | 42.69 | 1005.84 | 2.27 | 49.39 | 117.61 | -8.690470 | ... | 0 | 2016-02-29 09:22:40 | NaN | sotto | 791.907055 | 42 | 2776 | NaN | NaN | NaN |
110857 | 110856 | 31.55 | 27.50 | 24.83 | 42.79 | 1005.83 | 1.51 | 48.98 | 119.13 | -9.585351 | ... | 0 | 2016-02-29 09:22:50 | NaN | sotto | 802.932850 | 42 | 2776 | NaN | NaN | NaN |
110858 | 110857 | 31.55 | 27.49 | 24.83 | 42.81 | 1005.82 | 2.12 | 49.95 | 120.81 | -10.120745 | ... | 0 | 2016-02-29 09:23:00 | NaN | sotto | 820.194642 | 42 | 2776 | NaN | NaN | NaN |
110859 | 110858 | 31.60 | 27.51 | 24.83 | 42.92 | 1005.82 | 1.53 | 49.33 | 121.74 | -10.657858 | ... | 0 | 2016-02-29 09:23:11 | NaN | sotto | 815.462202 | 42 | 2776 | NaN | NaN | NaN |
110860 | 110859 | 31.58 | 27.50 | 24.83 | 42.81 | 1005.83 | 1.60 | 49.65 | 123.50 | -11.584851 | ... | 0 | 2016-02-29 09:23:20 | NaN | sotto | 851.154631 | 42 | 2776 | NaN | NaN | NaN |
110861 | 110860 | 31.61 | 27.50 | 24.83 | 42.82 | 1005.84 | 2.65 | 49.47 | 124.51 | -12.089743 | ... | 0 | 2016-02-29 09:23:31 | NaN | sotto | 879.563826 | 42 | 2776 | NaN | NaN | NaN |
110862 | 110861 | 31.57 | 27.50 | 24.83 | 42.80 | 1005.84 | 2.63 | 50.08 | 125.85 | -12.701497 | ... | 0 | 2016-02-29 09:23:40 | NaN | sotto | 895.543882 | 42 | 2776 | NaN | NaN | NaN |
110863 | 110862 | 31.58 | 27.51 | 24.83 | 42.90 | 1005.85 | 1.70 | 49.81 | 126.86 | -13.393369 | ... | 0 | 2016-02-29 09:23:50 | NaN | sotto | 928.948693 | 42 | 2776 | NaN | NaN | NaN |
110864 | 110863 | 31.60 | 27.51 | 24.83 | 42.80 | 1005.85 | 1.66 | 49.13 | 127.35 | -13.990712 | ... | 0 | 2016-02-29 09:24:01 | NaN | sotto | 957.695014 | 42 | 2776 | NaN | NaN | NaN |
110865 | 110864 | 31.64 | 27.51 | 24.83 | 42.80 | 1005.85 | 1.91 | 49.31 | 128.62 | -14.691672 | ... | 0 | 2016-02-29 09:24:10 | NaN | sotto | 971.126355 | 42 | 2776 | NaN | NaN | NaN |
110866 | 110865 | 31.56 | 27.52 | 24.83 | 42.94 | 1005.83 | 1.58 | 49.93 | 129.60 | -15.169673 | ... | 0 | 2016-02-29 09:24:21 | NaN | sotto | 996.676408 | 42 | 2776 | NaN | NaN | NaN |
110867 | 110866 | 31.55 | 27.50 | 24.83 | 42.72 | 1005.85 | 1.89 | 49.92 | 130.51 | -15.832622 | ... | 0 | 2016-02-29 09:24:30 | NaN | sotto | 1022.779594 | 42 | 2776 | NaN | NaN | NaN |
110868 | 110867 | 31.58 | 27.50 | 24.83 | 42.83 | 1005.85 | 2.09 | 50.00 | 132.04 | -16.646212 | ... | 0 | 2016-02-29 09:24:41 | NaN | sotto | 1048.121268 | 42 | 2776 | NaN | NaN | NaN |
110869 | 110868 | 31.62 | 27.50 | 24.83 | 42.81 | 1005.88 | 2.88 | 49.69 | 133.00 | -17.270447 | ... | 0 | 2016-02-29 09:24:50 | NaN | sotto | 1073.629703 | 42 | 2776 | NaN | NaN | NaN |
110870 | 110869 | 31.57 | 27.51 | 24.83 | 42.94 | 1005.86 | 2.17 | 49.77 | 134.18 | -17.885872 | ... | 0 | 2016-02-29 09:25:00 | NaN | sotto | 1095.760426 | 42 | 2776 | NaN | NaN | NaN |
110871 rows × 28 columns
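An alternative to merge_ordered that fits the hint (use the station position for that time) is pd.merge_asof, which for each reading picks the most recent coordinate row at or before it (direction='backward'). This is a swapped-in technique, not the course's solution; the sketch uses hypothetical toy tables and assumes both key columns are converted to datetimes and sorted, which merge_asof requires.

```python
import pandas as pd

# hypothetical readings and positions, using the column names from above
readings = pd.DataFrame({
    'time_stamp': ['2016-02-16 10:44:40', '2016-02-16 10:44:50', '2016-02-16 11:00:00'],
    'temp_h': [27.57, 27.53, 27.60],
})
coords = pd.DataFrame({
    'timestamp': ['2016-02-16 10:00:00', '2016-02-16 10:50:00'],
    'lat': [10.0, 20.0],
    'lon': [100.0, 110.0],
})

# merge_asof needs ordered, comparable keys: parse the strings and sort
readings['time_stamp'] = pd.to_datetime(readings['time_stamp'])
coords['timestamp'] = pd.to_datetime(coords['timestamp'])
readings = readings.sort_values('time_stamp')
coords = coords.sort_values('timestamp')

# for each reading, take the last known position at or before its time
geo = pd.merge_asof(readings, coords,
                    left_on='time_stamp', right_on='timestamp',
                    direction='backward')
print(geo[['time_stamp', 'lat', 'lon']])
```

Every reading keeps exactly one row. Readings earlier than the first coordinate row would get NaN with direction='backward'; direction='nearest' would instead take the closest position in time, before or after.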
Binary relations solutions
Introduction
We can use graphs to model relations of many kinds, like isCloseTo, isFriendOf, loves, etc. Here we review some of them and their properties.
Before going on, make sure you have read the chapter Graph formats.
What to do
Unzip the exercises into a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
     |- binary-relations
          |- binary-relations-exercise.ipynb
          |- binary-relations-solution.ipynb
WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder!
Open Jupyter Notebook from that folder. Two things should open, first a console and then a browser. The browser should show a file list: navigate the list and open the notebook
exercises/binary-relations/binary-relations-exercise.ipynb
WARNING 2: DO NOT use the Upload button in Jupyter; instead, navigate in the Jupyter browser to the unzipped folder!
Continue reading that notebook and follow the instructions inside.
Shortcut keys:
to execute Python code inside a Jupyter cell, press Control + Enter
to execute Python code inside a Jupyter cell AND select the next cell, press Shift + Enter
to execute Python code inside a Jupyter cell AND create a new cell afterwards, press Alt + Enter
If the notebooks look stuck, try to select Kernel -> Restart
Reflexive relations
A graph is reflexive when each node links to itself.
In real life, a typical reflexive relation could be “is close to”, supposing “close to” means being within 100 meters. Obviously, any place is always close to itself. Let’s see an example (Povo is a small town near Trento):
[2]:
from sciprog import draw_adj
draw_adj({
'Trento Cathedral' : ['Trento Cathedral', 'Trento Neptune Statue'],
'Trento Neptune Statue' : ['Trento Neptune Statue', 'Trento Cathedral'],
'Povo' : ['Povo'],
})
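To see why “is close to” is reflexive by construction, a minimal sketch can build the adjacency lists from distances: every place is at distance 0 from itself, which is always within the threshold. The function close_to_graph, the place names A, B, C and their 1-D positions in meters are all hypothetical, invented only for illustration.

```python
def close_to_graph(positions, max_dist=100):
    """Build adjacency lists linking places at most max_dist meters apart.

    positions: dict mapping a place name to a hypothetical 1-D coordinate in meters.
    """
    return {
        a: [b for b in positions if abs(positions[a] - positions[b]) <= max_dist]
        for a in positions
    }

# hypothetical coordinates along a line, in meters
positions = {'A': 0, 'B': 40, 'C': 3000}
g = close_to_graph(positions)
print(g)
```

Since abs(positions[a] - positions[a]) is 0, each place always appears in its own list, whatever the coordinates are.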

Some relations are not necessarily reflexive, like “did homework for”. You should always do your own homework, but to our dismay, university intelligence services caught some of you cheating. In the following example we expose the situation; due to privacy concerns, we identify students with numbers starting from zero:
[3]:
from sciprog import draw_mat
draw_mat(
[
[True, False, False, False],
[False, False, False, False],
[False, True, True, False],
[False, False, False, False],
]
)

From the graph above, we see that student 0 and student 2 both did their own homework. Student 3 did no homework at all. Alarmingly, we notice that student 2 did the homework for student 1. The resulting conspiracy shall be severely punished with a one-year ban from having spritz at Emma’s bar.
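The same facts can be read straight from the matrix. A small sketch (homework_pairs is a hypothetical helper, not part of sciprog) lists every pair (i, j) where mat[i][j] is True, meaning student i did the homework for student j; pairs with i == j are the self-loops.

```python
def homework_pairs(mat):
    """Return the list of (i, j) pairs such that mat[i][j] is True."""
    n = len(mat)
    return [(i, j) for i in range(n) for j in range(n) if mat[i][j]]

mat = [
    [True, False, False, False],
    [False, False, False, False],
    [False, True, True, False],
    [False, False, False, False],
]
pairs = homework_pairs(mat)
print(pairs)  # [(0, 0), (2, 1), (2, 2)]
```

The only pair with i != j is (2, 1), the cheating case; (0, 0) and (2, 2) are the students who did their own homework.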
Exercises
is_reflexive_mat
✪✪ Now implement this function for matrices.
[4]:
def is_reflexive_mat(mat):
    """ RETURN True if nxn boolean matrix mat as list of lists is reflexive, False otherwise.
        A graph is reflexive when all nodes point to themselves. Please at least try to make the function efficient.
    """
    #jupman-raise
    n = len(mat)
    for i in range(n):
        if not mat[i][i]:
            return False
    return True
    #/jupman-raise
assert is_reflexive_mat([
[False]
]) == False # m1
assert is_reflexive_mat([
[True]
]) == True # m2
assert is_reflexive_mat([
[False, False],
[False, False],
]) == False # m3
assert is_reflexive_mat([
[True, True],
[True, True],
]) == True # m4
assert is_reflexive_mat([
[True, True],
[False, True],
]) == True # m5
assert is_reflexive_mat([
[True, False],
[True, True],
]) == True # m6
assert is_reflexive_mat([
[True, True],
[True, False],
]) == False # m7
assert is_reflexive_mat([
[False, True],
[True, True],
]) == False # m8
assert is_reflexive_mat([
[False, True],
[True, False],
]) == False # m9
assert is_reflexive_mat([
[False, False],
[True, False],
]) == False # m10
assert is_reflexive_mat([
[False, True, True],
[True, False, False],
[True, True, True],
]) == False # m11
assert is_reflexive_mat([
[True, True, True],
[True, True, True],
[True, True, True],
]) == True # m12
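Since only the diagonal matters, the same check can also be written more compactly with Python's built-in all(), which stops at the first False just like the early return above. A minimal sketch (the name is_reflexive_mat_compact is ours, not part of the exercise):

```python
def is_reflexive_mat_compact(mat):
    """ Same contract as is_reflexive_mat: True iff every diagonal cell is True. """
    # all() short-circuits at the first False diagonal cell, like the early return
    return all(mat[i][i] for i in range(len(mat)))

print(is_reflexive_mat_compact([[True, False], [True, True]]))   # True
print(is_reflexive_mat_compact([[True, True], [True, False]]))   # False
```

Both versions look at n cells at most, so neither needs to scan the whole nxn matrix.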
is_reflexive_adj¶
✪✪ Now implement the same function for dictionaries of adjacency lists.
[5]:
def is_reflexive_adj(d):
    """ RETURN True if provided graph as dictionary of adjacency lists is reflexive, False otherwise.
        A graph is reflexive when all nodes point to themselves. Please at least try to make the function efficient.
    """
    #jupman-raise
    for v in d:
        if v not in d[v]:
            return False
    return True
    #/jupman-raise
assert is_reflexive_adj({
'a':[]
}) == False # d1
assert is_reflexive_adj({
'a':['a']
}) == True # d2
assert is_reflexive_adj({
'a':[],
'b':[]
}) == False # d3
assert is_reflexive_adj({
'a':['a'],
'b':['b']
}) == True # d4
assert is_reflexive_adj({
'a':['a','b'],
'b':['b']
}) == True # d5
assert is_reflexive_adj({
'a':['a'],
'b':['a','b']
}) == True # d6
assert is_reflexive_adj({
'a':['a','b'],
'b':['a']
}) == False # d7
assert is_reflexive_adj({
'a':['b'],
'b':['a','b']
}) == False # d8
assert is_reflexive_adj({
'a':['b'],
'b':['a']
}) == False # d9
assert is_reflexive_adj({
'a':[],
'b':['a']
}) == False # d10
assert is_reflexive_adj({
'a':['b','c'],
'b':['a'],
'c':['a','b','c']
}) == False # d11
assert is_reflexive_adj({
'a':['a','b','c'],
'b':['a','b','c'],
'c':['a','b','c']
}) == True # d12
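The matrix and the adjacency-list forms describe the same graph, so a property checked on one should give the same answer on the other. To compare them, here is a sketch of a hypothetical helper adj_to_mat (not part of sciprog), assuming every node appears as a key and using the key order of the dictionary as row/column order:

```python
def adj_to_mat(d):
    """ Hypothetical helper: convert a dictionary of adjacency lists into an
        nxn boolean matrix. Assumes every node appears as a key of d. """
    nodes = list(d)                                  # row/column order = key order
    index = {v: i for i, v in enumerate(nodes)}      # node name -> index
    n = len(nodes)
    mat = [[False] * n for _ in range(n)]            # independent rows, no aliasing
    for v in d:
        for w in d[v]:
            mat[index[v]][index[w]] = True           # edge v -> w
    return mat

d7 = {
    'a': ['a', 'b'],
    'b': ['a']
}
print(adj_to_mat(d7))   # [[True, True], [True, False]]
```

For d7, both is_reflexive_adj(d7) and is_reflexive_mat(adj_to_mat(d7)) are False, because node 'b' has no self-loop.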
Symmetric relations¶
A graph is symmetric when for all nodes, if a node A links to another node B, there is also a link from node B to A.
In real life, the typical symmetric relation is “is friend of”. If you are a friend of someone, that someone should also be your friend.
For example, Scrooge is typically not so friendly with his lazy nephew Donald Duck, but both Scrooge and Donald Duck certainly enjoy visiting the farm of Grandma Duck, so we can model their friendship relation like this:
[6]:
from sciprog import draw_adj
draw_adj({
'Donald Duck' : ['Grandma Duck'],
'Scrooge' : ['Grandma Duck'],
'Grandma Duck' : ['Scrooge', 'Donald Duck'],
})

Note that Scrooge is not linked to Donald Duck, but this does not prevent the whole graph from being symmetric. If you pay attention to the definition above, there is an if at the beginning: if a node A links to another node B, there is also a link from node B to A.
QUESTION: Looking purely at the above definition (so do not consider the ‘is friend of’ relation), must a symmetric relation necessarily be reflexive?
ANSWER: No. In a symmetric relation some nodes can be linked to themselves, while some other nodes may have no link to themselves. To check symmetry, all we care about are links from a node to other nodes.
QUESTION: Think about the semantics of the specific “is friend of” relation: can you think of a social network where the relation is not shown as reflexive?
ANSWER: The particular case of the “is friend of” relation is interesting, as it prompts us to think about the semantic meaning of the relation: obviously, everybody should be a friend of himself/herself, but if we were to implement, say, a social network service like Facebook, it would look rather useless to show in your friends list the information that you are a friend of yourself.
QUESTION: Still talking about the specific semantics of the “is friend of” relation: can you think of some case where it would be meaningful to store information about individuals not being friends of themselves?
ANSWER: In real life you can always find fringe cases: suppose you are given the task of modeling a network of possibly depressed people with self-harming tendencies. So always be sure your model correctly fits the problem at hand.
Some relations may or may not be symmetric, depending on the graph at hand. Think about the relation loves. It is well known that Mickey Mouse loves Minnie and the sentiment is reciprocal, and that Donald Duck loves Daisy Duck and the sentiment is reciprocal. We can conclude this particular graph is symmetric:
[7]:
from sciprog import draw_adj
draw_adj({
'Donald Duck' : ['Daisy Duck'],
'Daisy Duck' : ['Donald Duck'],
'Mickey Mouse' : ['Minnie'],
'Minnie' : ['Mickey Mouse']
})

But what about this one? Donald Duck is not the only duck in town and sometimes a contender shows up: Gladstone Gander (Gastone in Italian) would also like the attention of Daisy (never mind that in some comics he actually gets it when Donald Duck messes up big time):
[8]:
from sciprog import draw_adj
draw_adj({
'Donald Duck' : ['Daisy Duck'],
'Daisy Duck' : ['Donald Duck'],
'Mickey Mouse' : ['Minnie'],
'Minnie' : ['Mickey Mouse'],
'Gladstone Gander' : ['Daisy Duck']
})

is_symmetric_mat¶
✪✪ Implement an automated procedure to check whether or not a graph is symmetric. Implement this function for matrices:
[9]:
def is_symmetric_mat(mat):
    """ RETURN True if nxn boolean matrix mat as list of lists is symmetric, False otherwise.
        A graph is symmetric when for all nodes, if a node A links to another node B,
        there is also a link from node B to A.
    """
    #jupman-raise
    n = len(mat)
    for i in range(n):
        for j in range(n):
            if mat[i][j] and not mat[j][i]:
                return False
    return True
    #/jupman-raise
assert is_symmetric_mat([
[False]
]) == True # m1
assert is_symmetric_mat([
[True]
]) == True # m2
assert is_symmetric_mat([
[False, False],
[False, False],
]) == True # m3
assert is_symmetric_mat([
[True, True],
[True, True],
]) == True # m4
assert is_symmetric_mat([
[True, True],
[False, True],
]) == False # m5
assert is_symmetric_mat([
[True, False],
[True, True],
]) == False # m6
assert is_symmetric_mat([
[True, True],
[True, False],
]) == True # m7
assert is_symmetric_mat([
[False, True],
[True, True],
]) == True # m8
assert is_symmetric_mat([
[False, True],
[True, False],
]) == True # m9
assert is_symmetric_mat([
[False, False],
[True, False],
]) == False # m10
assert is_symmetric_mat([
[False, True, True],
[True, False, False],
[True, True, True],
]) == False # m11
assert is_symmetric_mat([
[False, True, True],
[True, False, True],
[True, True, True],
]) == True # m12
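Since the condition always involves a pair of cells (i, j) and (j, i), one possible refinement (our own variant, not required by the exercise) visits only the cells above the diagonal and compares each with its mirror, roughly halving the number of checks. Self-loops on the diagonal never influence symmetry, so they can be skipped:

```python
def is_symmetric_mat_half(mat):
    """ Same contract as is_symmetric_mat, but visits only cells above the diagonal. """
    n = len(mat)
    for i in range(n):
        for j in range(i + 1, n):        # j > i: each unordered pair is checked once
            if mat[i][j] != mat[j][i]:   # a link in only one direction breaks symmetry
                return False
    return True

print(is_symmetric_mat_half([[True, True], [False, True]]))   # False
```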
is_symmetric_adj¶
✪✪ Now implement the same as before but for a dictionary of adjacency lists:
[10]:
def is_symmetric_adj(d):
    """ RETURN True if given dictionary of adjacency lists is symmetric, False otherwise.
        Assume all the nodes are represented in the keys.
        A graph is symmetric when for all nodes, if a node A links to another node B,
        there is also a link from node B to A.
    """
    #jupman-raise
    for k in d:
        for v in d[k]:
            if k not in d[v]:
                return False
    return True
    #/jupman-raise
assert is_symmetric_adj({
'a':[]
}) == True # d1
assert is_symmetric_adj({
'a':['a']
}) == True # d2
assert is_symmetric_adj({
'a' : [],
'b' : []
}) == True # d3
assert is_symmetric_adj({
'a' : ['a','b'],
'b' : ['a','b']
}) == True # d4
assert is_symmetric_adj({
'a' : ['a','b'],
'b' : ['b']
}) == False # d5
assert is_symmetric_adj({
'a' : ['a'],
'b' : ['a','b']
}) == False # d6
assert is_symmetric_adj({
'a' : ['a','b'],
'b' : ['a']
}) == True # d7
assert is_symmetric_adj({
'a' : ['b'],
'b' : ['a','b']
}) == True # d8
assert is_symmetric_adj({
'a' : ['b'],
'b' : ['a']
}) == True # d9
assert is_symmetric_adj({
'a' : [],
'b' : ['a']
}) == False # d10
assert is_symmetric_adj({
'a' : ['b', 'c'],
'b' : ['a'],
'c' : ['a','b','c']
}) == False # d11
assert is_symmetric_adj({
'a' : ['b', 'c'],
'b' : ['a','c'],
'c' : ['a','b','c']
}) == True # d12
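A possible variant of the adjacency-list check (is_symmetric_adj_sets is a name we made up) first collects every edge as a (source, target) pair in a set, so that each reverse-edge lookup is an average O(1) set membership test instead of a scan through a list. As a side effect, a target missing from the keys makes it return False instead of raising a KeyError:

```python
def is_symmetric_adj_sets(d):
    """ Same contract as is_symmetric_adj, based on a set of (source, target) edges. """
    edges = {(k, v) for k in d for v in d[k]}
    # every edge k -> v needs its reverse v -> k
    return all((v, k) in edges for (k, v) in edges)

print(is_symmetric_adj_sets({'a': ['a', 'b'], 'b': ['b']}))   # False
```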
surjective¶
✪✪ If we consider a graph as an nxn binary relation where the domain is the same as the codomain, such a relation is called surjective if every node is reached by at least one edge.
For example, G1 here is surjective, because there is at least one edge reaching into each node (self-loops, as on node 0, also count as incoming edges):
[11]:
G1 = [
[True, True, False, False],
[False, False, False, True],
[False, True, True, False],
[False, True, True, True],
]
[12]:
draw_mat(G1)

G2 down here instead does not represent a surjective relation, as there is at least one node (2 in our case) which does not have any incoming edge:
[13]:
G2 = [
[True, True, False, False],
[False, False, False, True],
[False, True, False, False],
[False, True, False, False],
]
[14]:
draw_mat(G2)

[15]:
def surjective(mat):
    """ RETURN True if provided graph mat as list of boolean lists is an
        nxn surjective binary relation, otherwise return False
    """
    #jupman-raise
    n = len(mat)
    c = 0  # number of columns (nodes) found with at least one incoming edge
    for j in range(n):  # go column by column
        for i in range(n):  # go row by row
            if mat[i][j]:
                c += 1
                break  # as soon as you find the first incoming edge, increment c and stop searching this column
    return c == n
    #/jupman-raise
m1 = [
[False]
]
assert surjective(m1) == False
m2 = [
[True]
]
assert surjective(m2) == True
m3 = [
[True, False],
[False, False],
]
assert surjective(m3) == False
m4 = [
[False, True],
[False, False],
]
assert surjective(m4) == False
m5 = [
[False, False],
[True, False],
]
assert surjective(m5) == False
m6 = [
[False, False],
[False, True],
]
assert surjective(m6) == False
m7 = [
[True, False],
[True, False],
]
assert surjective(m7) == False
m8 = [
[True, False],
[False, True],
]
assert surjective(m8) == True
m9 = [
[True, True],
[False, True],
]
assert surjective(m9) == True
m10 = [
[True, True, False, False],
[False, False, False, True],
[False, True, False, False],
[False, True, False, False],
]
assert surjective(m10) == False
m11 = [
[True, True, False, False],
[False, False, False, True],
[False, True, True, False],
[False, True, True, True],
]
assert surjective(m11) == True
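The same column-by-column scan can be sketched with built-ins: zip(*mat) transposes the matrix into its columns, any() tells whether a column holds at least one incoming edge, and all() requires this for every node. surjective_compact is a name we made up for this alternative:

```python
def surjective_compact(mat):
    """ Same contract as surjective: True iff every column holds at least one True. """
    # zip(*mat) yields the columns of mat; column j collects the edges entering node j
    return all(any(column) for column in zip(*mat))

print(surjective_compact([[True, False], [False, True]]))   # True
print(surjective_compact([[True, False], [True, False]]))   # False
```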
Further resources¶
Rule Based Design by Lex Wedemeijer, Stef Joosten, Jaap van der Woude: a very readable text on how to represent information using only binary relations with boolean matrices. This is a theoretical book with no Python exercises, so it is not mandatory reading; it only gives context and practical applications for some of the material on graphs presented during the course.
OOP¶
What to do¶
Unzip the exercises into a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|- oop
|- oop.ipynb
|- ComplexNumber_solution.py
|- ComplexNumber_exercise.py
This time you will not write in the notebook; instead, you will edit .py files in Visual Studio Code.
Now proceed reading.
1. Abstract Data Types (ADT) Theory¶
1.1. Intro¶
Theory from the slides:
Data structures slides (first slides until slide 13 ‘Comments’)
Object Oriented programming in the book (in particular the Fraction class; in this course we won’t focus on inheritance)
1.2. Complex number theory¶
1.3. Datatypes the old way¶
From the definition we see that to identify a complex number we need two float values. One number is for the real part, and another number is for the imaginary part.
How can we represent this in Python? So far, you have seen there are many ways to put two numbers together. One way could be to put the numbers in a list of two elements, and implicitly assume the first one is the real part and the second the imaginary part:
[2]:
c = [3.0, 5.0]
Or we could use a tuple:
[3]:
c = (3.0, 5.0)
A problem with the previous representations is that a casual observer might not know exactly the meaning of the two numbers. We could be more explicit and store the values into a dictionary, using keys to identify the two parts:
[4]:
c = {'real': 3.0, 'imaginary': 5.0}
[5]:
print(c)
{'real': 3.0, 'imaginary': 5.0}
[6]:
print(c['real'])
3.0
[7]:
print(c['imaginary'])
5.0
Now, writing the whole record {'real': 3.0, 'imaginary': 5.0} each time we want to create a complex number might be annoying and error prone. To help us, we can create a little shortcut function named complex_number that creates and returns the dictionary:
[8]:
def complex_number(real, imaginary):
    d = {}
    d['real'] = real
    d['imaginary'] = imaginary
    return d
[9]:
c = complex_number(3.0, 5.0)
[10]:
print(c)
{'real': 3.0, 'imaginary': 5.0}
To do something with our dictionary, we would then define functions, like for example complex_str to show them nicely:
[11]:
def complex_str(cn):
    return str(cn['real']) + " + " + str(cn['imaginary']) + "i"
[12]:
c = complex_number(3.0, 5.0)
print(complex_str(c))
3.0 + 5.0i
We could do something more complex, like defining the phase of the complex number, which returns a float:
IMPORTANT: In these exercises, we care about programming, not complex numbers theory. There’s no need to break your head over formulas!
[13]:
import math
def phase(cn):
    """ Returns a float which is the phase (that is, the vector angle) of the complex number
        See definition: https://en.wikipedia.org/wiki/Complex_number#Absolute_value_and_argument
    """
    return math.atan2(cn['imaginary'], cn['real'])
[14]:
c = complex_number(3.0, 5.0)
print(phase(c))
1.0303768265243125
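For comparison, Python already ships a built-in complex type (with fields .real and .imag) and the standard cmath module, whose cmath.phase computes the same angle. A quick sanity check of the dictionary-based phase against it:

```python
import cmath
import math

def phase(cn):
    # same dictionary-based phase as above
    return math.atan2(cn['imaginary'], cn['real'])

z = complex(3.0, 5.0)   # built-in complex number 3.0 + 5.0i (Python writes it as 3+5j)
print(z.real, z.imag)   # 3.0 5.0
print(math.isclose(phase({'real': 3.0, 'imaginary': 5.0}), cmath.phase(z)))   # True
```

Here we build our own datatype anyway, because the point of the chapter is learning how to define one.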
We could even define functions that take the complex number and some other parameter; for example, we could define the log of complex numbers, which returns another complex number (mathematically there would be infinitely many, but we just pick the principal one):
[15]:
import math
def log(cn, base):
    """ Returns another complex number which is the logarithm of this complex number
        See definition (adapted for generic base b):
        https://en.wikipedia.org/wiki/Complex_number#Natural_logarithm
    """
    return {'real': math.log(cn['real']) / math.log(base),
            'imaginary': phase(cn) / math.log(base)}
[16]:
print(log(c,2))
{'real': 1.5849625007211563, 'imaginary': 1.4865195378735334}
You see we got back our dictionary representing a complex number. If we want a nicer display, we can call on it the complex_str function we defined:
[17]:
print(complex_str(log(c,2)))
1.5849625007211563 + 1.4865195378735334i
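Note that the linked Wikipedia definition takes the logarithm of the magnitude |z| for the real part (ln z = ln|z| + i·arg z), while log above uses math.log(cn['real']); the two agree only when the imaginary part is zero, which is why the real part printed above equals log2(3.0). A sketch following the definition (log_by_definition is a name we made up), cross-checked against the standard cmath.log, which accepts an optional base:

```python
import cmath
import math

def log_by_definition(cn, base):
    """ Logarithm in the given base of a dictionary complex number, following
        ln z = ln|z| + i*arg(z), then dividing both parts by ln(base). """
    magnitude = math.hypot(cn['real'], cn['imaginary'])   # |z|
    angle = math.atan2(cn['imaginary'], cn['real'])        # arg(z), i.e. the phase
    return {'real': math.log(magnitude) / math.log(base),
            'imaginary': angle / math.log(base)}

c = {'real': 3.0, 'imaginary': 5.0}
result = log_by_definition(c, 2)
expected = cmath.log(complex(3.0, 5.0), 2)   # principal value from the standard library
print(math.isclose(result['real'], expected.real))        # True
print(math.isclose(result['imaginary'], expected.imag))   # True
```

As the notebook says, the focus here is on programming rather than complex number theory, so the simpler version above is fine for following the rest of the chapter.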
1.4. Finding the pattern¶
So, what have we done so far?
* Decided a data format for the complex number, and saw that the dictionary is quite convenient
* Defined a function to quickly create the dictionary:
  def complex_number(real, imaginary):
* Defined some functions like phase and log to do stuff on the complex number:
  def phase(cn):
  def log(cn, base):
* Defined a function complex_str to express the complex number as a readable string:
  def complex_str(cn):
Notice that:
* all functions above take a cn complex number dictionary as first parameter
* the functions phase and log are quite peculiar to complex numbers, and to know what they do you need to have deep knowledge of what a complex number is
* the function complex_str is more intuitive, because it covers the common need of giving a nice string representation to the data format we just defined. Also, we used the word str as part of the name to hint to the reader that the function probably behaves in a way similar to the Python function str().
When we encounter a new datatype in our programs, we often follow the thinking procedure listed above. Such a procedure is so common that software engineers thought it convenient to provide a specific programming paradigm to represent it, called Object Oriented programming. We are now going to rewrite the complex number example using this paradigm.
1.5. Object Oriented Programming¶
In Object Oriented Programming, we usually:
* Introduce new datatypes by declaring a class, named for example ComplexNumber
* Are given a dictionary and define how data is stored in the dictionary (i.e. in the fields real and imaginary)
* Define a way to construct specific instances, like 3 + 2i, 5 + 6i (instances are also called objects)
* Define some methods to operate on the instances (like phase)
* Define some special methods to customize how Python treats instances (for example, for displaying them as strings when printing)
Let’s now create our first class.
2. ComplexNumber class¶
2.1. Class declaration¶
A minimal class declaration will at least declare the class name and the __init__ method:
[18]:
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary
Here we declare to Python that we are starting to define a template for a new class called ComplexNumber. This template will hold a collection of functions (called methods) that manipulate instances of complex numbers (instances are 1.0 + 2.0i, 3.0 + 4.0i, …).
IMPORTANT: Although classes can have any name (e.g. complex_number, complexNumber, …), by convention you SHOULD use a CamelCase name like ComplexNumber, with capital letters as initials and no underscores.
2.2. Constructor __init__¶
With the dictionary model, to create complex numbers, remember we defined that small utility function complex_number, where inside we were creating the dictionary:
def complex_number(real, imaginary):
    d = {}
    d['real'] = real
    d['imaginary'] = imaginary
    return d
With classes, to create objects we instead have to define a so-called constructor method called __init__:
[19]:
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary
__init__ is a very special method, whose job is to initialize an instance of a complex number. It has three important features:
* it is defined like a function, inside the ComplexNumber declaration (as usual, indentation matters!)
* it always takes as first parameter self, which is an instance of a special kind of dictionary that will hold the fields of the complex number. Inside the previous complex_number function, we were creating a dictionary d. In the __init__ method, the dictionary is instead automatically created by Python and given to us in the form of the parameter self
* __init__ does not return anything: this is different from the previous complex_number function, where instead we were returning the dictionary d.
Later we will explain better these properties. For now, let’s just concentrate on the names of things we see in the declaration.
WARNING: There can be only one constructor method per class, and it MUST be named __init__
WARNING: __init__ MUST take at least one parameter, which by convention is usually named self
IMPORTANT: self is just a name we give to the first parameter. It could be any name our fantasy suggests and the program would behave exactly the same!
If the editor you are using highlights it in some special color, it is because it is aware of the convention, not because self is some special Python keyword.
IMPORTANT: In general, any of the __init__ parameters can have completely arbitrary names, so for example the following code snippet would work exactly the same as the initial definition:
[20]:
class ComplexNumber:
    def __init__(donald_duck, mickey_mouse, goofy):
        donald_duck.real = mickey_mouse
        donald_duck.imaginary = goofy
Once the __init__ method is defined, we can create a specific ComplexNumber instance with a call like this:
[21]:
c = ComplexNumber(3.0,5.0)
print(c)
<__main__.ComplexNumber object at 0x7f0c4c380f60>
What happened here?
init 2.2.1) We told Python we want to create a new particular instance of the template defined by class ComplexNumber. As parameters for the instance we indicated 3.0 and 5.0.
WARNING: to create the instance, we used the name of the class ComplexNumber followed by an open round parenthesis and parameters, like a function call: c = ComplexNumber(3.0, 5.0)
Writing just: c = ComplexNumber
would NOT instantiate anything, and we would end up messing with the template ComplexNumber, which is a collection of functions for complex numbers.
init 2.2.2) Python created a new special dictionary for the instance
init 2.2.3) Python passed the special dictionary as first parameter of the method __init__, so it will be bound to the parameter self. As second and third arguments it passed 3.0 and 5.0, which will be bound respectively to the parameters real and imaginary
WARNING: When instantiating an object with a call like c=ComplexNumber(3.0,5.0) you don’t need to pass a dictionary as first parameter! Python will implicitly create it and pass it as first parameter to __init__
init 2.2.4) In the __init__ method, the instructions
self.real = real
self.imaginary = imaginary
first create a key in the dictionary called real, associating to it the value of the parameter real (3.0 in the call). Then the value 5.0 is bound to the key imaginary.
IMPORTANT: we said Python provides __init__ with a special kind of dictionary as first parameter. One of the reasons it is special is that you can access keys using the dot, like self.my_key. With ordinary dictionaries you would have to write brackets, like self["my_key"]
IMPORTANT: like with dictionaries, we can arbitrarily choose the name of the keys, and which values to associate to them.
IMPORTANT: In the following, we will often refer to keys of the self dictionary with the terms field, and/or attribute.
Now one important word of wisdom:
!!!!!! COMMANDMENT 5: YOU SHALL NEVER EVER REASSIGN self !!!!!!!
Since self is a kind of dictionary, you might be tempted to do something like this:
[22]:
class EvilComplexNumber:
    def __init__(self, real, imaginary):
        self = {'real':real, 'imaginary':imaginary}
But to the outside world this will have no effect. For example, let’s say somebody from outside makes a call like this:
[23]:
ce = EvilComplexNumber(3.0, 5.0)
At the first attempt to access any field, you would get an error, because after the initialization ce will point to the still untouched self created by Python, and not to your dictionary (which at this point will simply be lost):
print(ce.real)
AttributeError: 'EvilComplexNumber' object has no attribute 'real'
In general, you DO NOT reassign self to anything. Here are other examples of DON’Ts:
self = ['666'] # self is only supposed to be a sort of dictionary which is passed by Python
self = 6 # self is only supposed to be a sort of dictionary which is passed by Python
init 2.2.5) Python automatically returns from __init__ the special dictionary self
WARNING: __init__ must NOT return anything (no return statement with a value)! Python will implicitly give back self!
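Python actually enforces this rule: if __init__ returns anything other than None, the call that creates the instance fails with a TypeError. A small demonstration (BadComplexNumber is a made-up name):

```python
class BadComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary
        return self   # WRONG: __init__ must not return a value

error_message = None
try:
    BadComplexNumber(3.0, 5.0)
except TypeError as e:
    error_message = str(e)   # CPython explains that __init__ should return None
print(error_message)
```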
init 2.2.6) The result of the call (so the special dictionary) is bound to the external variable c:
c = ComplexNumber(3.0, 5.0)
init 2.2.7) You can then start using c as any variable
[24]:
print(c)
<__main__.ComplexNumber object at 0x7f0c4c380f60>
From the output, you see we have indeed an instance of the class ComplexNumber. To see the difference between instance and class, you can try printing the class instead:
[25]:
print(ComplexNumber)
<class '__main__.ComplexNumber'>
IMPORTANT: You can create an infinite number of different instances (e.g. ComplexNumber(1.0, 1.0), ComplexNumber(2.0, 2.0), ComplexNumber(3.0, 3.0), …), but you will have only one class definition for them (ComplexNumber).
We can now access the fields of the special dictionary by using the dot notation, as we were doing with self:
[26]:
print(c.real)
3.0
[27]:
print(c.imaginary)
5.0
If we want, we can also change them:
[28]:
c.real = 6.0
print(c.real)
6.0
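The “special dictionary” picture is close to how Python stores ordinary instance attributes: for a plain class like this one (no __slots__), they live in a real dict, which the built-in vars() (or the __dict__ attribute) exposes. A quick look, assuming the minimal ComplexNumber from section 2.1:

```python
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

c = ComplexNumber(3.0, 5.0)
c.real = 6.0
# vars(c) returns the instance's attribute dictionary, the very same object as c.__dict__
print(vars(c))                 # {'real': 6.0, 'imaginary': 5.0}
print(vars(c) is c.__dict__)   # True
```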
2.3. Defining methods¶
2.3.1 phase¶
Let’s make our class more interesting by adding the method phase(self) to operate on the complex number:
[29]:
import math

class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def phase(self):
        """ Returns a float which is the phase (that is, the vector angle) of the complex number
            This method is something we introduce by ourselves, according to the definition:
            https://en.wikipedia.org/wiki/Complex_number#Absolute_value_and_argument
        """
        return math.atan2(self.imaginary, self.real)
The method takes as first parameter self, which again is a special dictionary. We expect the dictionary to have already been initialized with some values for the real and imaginary fields. We can access them with the dot notation as we did before:
return math.atan2(self.imaginary, self.real)
How can we call the method on instances of complex numbers? We can access the method name from an instance using the dot notation as we did with other keys:
[30]:
c = ComplexNumber(3.0,5.0)
print(c.phase())
1.0303768265243125
What happens here?
By writing c.phase(), we call the method phase(self) which we just defined. The method expects as first parameter self a class instance, but in the call c.phase() apparently we don’t provide any parameter. Here some magic is going on: Python implicitly passes as first parameter the special dictionary bound to c. Then it executes the method and returns the desired float.
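The implicit passing of self can be made explicit: calling the method through the class and passing the instance by hand gives the same result as c.phase(). A minimal check, with the class trimmed to __init__ and phase:

```python
import math

class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def phase(self):
        return math.atan2(self.imaginary, self.real)

c = ComplexNumber(3.0, 5.0)
implicit = c.phase()                # Python passes c as self for us
explicit = ComplexNumber.phase(c)   # we pass c as self by hand
print(implicit == explicit)         # True
```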
WARNING: Put round parentheses in method calls!
When calling a method, you MUST put the round parentheses after the method name, like in c.phase()! If you just write c.phase without parentheses, you will get back a bound method object instead of its result; its printout shows the memory address of the instance it is bound to:
>>> c.phase
<bound method ComplexNumber.phase of <__main__.ComplexNumber object at 0xb465a4cc>>
2.3.2 log¶
We can also define methods that take more than one parameter, and also that create and return ComplexNumber instances, like for example the method log(self, base):
[31]:
import math

class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def phase(self):
        """ Returns a float which is the phase (that is, the vector angle) of the complex number
            This method is something we introduce by ourselves, according to the definition:
            https://en.wikipedia.org/wiki/Complex_number#Absolute_value_and_argument
        """
        return math.atan2(self.imaginary, self.real)

    def log(self, base):
        """ Returns another ComplexNumber which is the logarithm of this complex number
            This method is something we introduce by ourselves, according to the definition:
            (adapted for generic base b)
            https://en.wikipedia.org/wiki/Complex_number#Natural_logarithm
        """
        return ComplexNumber(math.log(self.real) / math.log(base), self.phase() / math.log(base))
WARNING: ALL METHODS MUST HAVE AT LEAST ONE PARAMETER, WHICH BY CONVENTION IS NAMED self !
To call log, you can do as with phase, but this time you will also need to pass one parameter for the base parameter; in this case we use Euler's number math.e:
[32]:
c = ComplexNumber(3.0, 5.0)
logarithm = c.log(math.e)
WARNING: As before for phase, notice we didn’t pass any dictionary as first parameter! Python will implicitly pass as first argument the instance c as self, and math.e as base
[33]:
print(logarithm)
<__main__.ComplexNumber object at 0x7f0c4c39e470>
To see if the method worked and we got back a different complex number, we can print the single fields:
[34]:
print(logarithm.real)
1.0986122886681098
[35]:
print(logarithm.imaginary)
1.0303768265243125
2.3.3 __str__ for printing¶
As we said, printing is not so informative:
[36]:
print(ComplexNumber(3.0, 5.0))
<__main__.ComplexNumber object at 0x7f0c4c3f53c8>
It would be nice to instruct Python to express the number like “3.0 + 5.0i” whenever we want to see the ComplexNumber represented as a string. How can we do it? Luckily for us, it is enough to define the __str__(self) method (see the bottom of the class definition).
WARNING: There are two underscores _ before and two underscores _ after in __str__!
[37]:
import math

class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def phase(self):
        """ Returns a float which is the phase (that is, the vector angle) of the complex number
            This method is something we introduce by ourselves, according to the definition:
            https://en.wikipedia.org/wiki/Complex_number#Absolute_value_and_argument
        """
        return math.atan2(self.imaginary, self.real)

    def log(self, base):
        """ Returns another ComplexNumber which is the logarithm of this complex number
            This method is something we introduce by ourselves, according to the definition:
            (adapted for generic base b)
            https://en.wikipedia.org/wiki/Complex_number#Natural_logarithm
        """
        return ComplexNumber(math.log(self.real) / math.log(base), self.phase() / math.log(base))

    def __str__(self):
        return str(self.real) + " + " + str(self.imaginary) + "i"
IMPORTANT: all methods starting and ending with a double underscore __ have a special meaning in Python: depending on their name, they override some default behaviour. In this case, with __str__ we are overriding how Python represents a ComplexNumber instance as a string.
WARNING: Since we are overriding Python default behaviour, it is very important that we follow the specs of the method we are overriding to the letter. In our case, the specs for __str__ obviously state you MUST return a string. Do read them!
[38]:
c = ComplexNumber(3.0, 5.0)
We can also pretty print the whole complex number. Internally, the print function will look if the class ComplexNumber has defined a method named __str__. If so, it will pass to the method the instance c as the first argument, which in our methods will end up in the self parameter:
[39]:
print(c)
3.0 + 5.0i
[40]:
print(c.log(2))
1.5849625007211563 + 1.4865195378735334i
Special Python methods are like any other method, so if we wish, we can also call them directly:
[41]:
c.__str__()
[41]:
'3.0 + 5.0i'
EXERCISE: There is another method for getting a string representation of a Python object, called __repr__. Read the __repr__ documentation carefully and implement the method. To try it and see whether any difference appears with respect to str, call the standard Python functions repr and str like this:
c = ComplexNumber(3,5)
print(repr(c))
print(str(c))
QUESTION: Would 3.0 + 5.0i be a valid Python expression? Should we return it with __repr__? Read the __str__ documentation again too.
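One possible answer, sketched under the convention in the __repr__ documentation that the result should, where feasible, look like a valid Python expression that recreates the object. 3.0 + 5.0i is not valid Python (Python's own complex literals use a j suffix, as in 3.0 + 5.0j), so __repr__ can return a constructor call instead, while __str__ keeps the friendly form. The class is trimmed to the relevant methods:

```python
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def __str__(self):
        # friendly form for humans
        return str(self.real) + " + " + str(self.imaginary) + "i"

    def __repr__(self):
        # unambiguous form which is itself a valid Python expression
        return "ComplexNumber(" + repr(self.real) + ", " + repr(self.imaginary) + ")"

c = ComplexNumber(3, 5)
print(str(c))    # 3 + 5i
print(repr(c))   # ComplexNumber(3, 5)
```

With this choice, eval(repr(c)) builds a new ComplexNumber with the same fields.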
2.4. ComplexNumber code skeleton¶
We are now ready to write methods on our own. Open Visual Studio Code (no Jupyter in part B!) and proceed by editing the file ComplexNumber_exercise.py
To see how to test, try running this in the console; tests should pass (if the system doesn’t find python3, write python):
python3 -m unittest ComplexNumber_test.ComplexNumberTest
2.5. Complex numbers magnitude¶
Implement the magnitude method, using this signature:
def magnitude(self):
    """ Returns a float which is the magnitude (that is, the absolute value) of the complex number
        This method is something we introduce by ourselves, according to the definition:
        https://en.wikipedia.org/wiki/Complex_number#Absolute_value_and_argument
    """
    raise Exception("TODO implement me!")
To test it, check that this test in the MagnitudeTest class passes (notice the Almost in assertAlmostEqual!!!):
def test_01_magnitude(self):
    self.assertAlmostEqual(ComplexNumber(3.0,4.0).magnitude(),5, delta=0.001)
To run the test, in the console type:
python3 -m unittest ComplexNumber_test.MagnitudeTest
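One possible way to fill in magnitude, sketched on a trimmed-down class (not the official solution file): the magnitude is the length of the vector (real, imaginary), that is sqrt(real**2 + imaginary**2), which the standard math.hypot computes directly. With 3.0 and 4.0 it gives 5.0, so the test above passes:

```python
import math

class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def magnitude(self):
        """ Returns a float which is the magnitude (that is, the absolute value) of the complex number """
        # sqrt(real**2 + imaginary**2); math.hypot avoids overflow in the intermediate squares
        return math.hypot(self.real, self.imaginary)

print(ComplexNumber(3.0, 4.0).magnitude())   # 5.0
```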
2.6. Complex numbers equality¶
Here we will try to give you a glimpse of some aspects related to Python equality, and of respecting interfaces when overriding methods. Equality can be a nasty subject; here we will treat it in a simplified form.
First of all, try to execute this command; you should get back False:
[42]:
ComplexNumber(1,2) == ComplexNumber(1,2)
[42]:
False
How come we get False? The reason is that whenever we write ComplexNumber(1,2) we are creating a new object in memory. Such an object gets a unique address in memory, and by default equality between class instances is calculated considering only their memory addresses. In this case we create one object on the left of the expression and another one on the right. So far we didn’t tell Python how to deal with equality for ComplexNumber instances, so the default equality test is used, which compares memory addresses; they are different, so we get False.
To get True
as we expect, we need to implement __eq__
special method. This method should tell Python to compare the fields within the objects, and not just the memory address.
REMEMBER: like all methods starting and ending with a double underscore __, __eq__ has a special meaning in Python: methods with such names override some default behaviour. In this case, with __eq__ we are overriding how Python checks equality. Please review the __eq__ documentation before continuing.
QUESTION: What is the return type of __eq__?
Implement equality for ComplexNumber more or less as it was done for Fraction. Use this method signature:
def __eq__(self, other):
Since __eq__ is a binary operation, here self will represent the object to the left of the ==, and other the object to the right.
Use this simple test case to check for equality in class EqTest:
def test_01_integer_equality(self):
    """
    Note all other tests depend on this test !
    We want also to test the constructor, so in c we set stuff by hand
    """
    c = ComplexNumber(0,0)
    c.real = 1
    c.imaginary = 2
    self.assertEqual(c, ComplexNumber(1,2))
To run the test, in the console type:
python3 -m unittest ComplexNumber_test.EqTest
Beware: ‘equality’ is tricky in Python for float numbers! Rule of thumb: when overriding __eq__, use ‘dumb’ equality, where two things are the same only if their parts are literally equal. If instead you need to determine whether two objects are similar, define other ‘closeness’ functions.
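A minimal sketch of ‘dumb’ equality, assuming the same real and imaginary fields used by EqTest. Returning NotImplemented for objects of other types lets Python fall back to its default behaviour instead of crashing. The last line shows why the float warning matters: 0.1 + 0.2 is not literally 0.3 in binary floating point.

```python
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def __eq__(self, other):
        """ 'Dumb' equality: True only if both parts are literally equal """
        if not isinstance(other, ComplexNumber):
            # we don't know how to compare: let Python try other strategies
            return NotImplemented
        return self.real == other.real and self.imaginary == other.imaginary


print(ComplexNumber(1, 2) == ComplexNumber(1, 2))            # True
print(ComplexNumber(1, 2) != ComplexNumber(1, 2))            # False
print(ComplexNumber(0.1 + 0.2, 0) == ComplexNumber(0.3, 0))  # False
```

In Python 3 the default __ne__ inverts the result of __eq__, which is why != behaves sensibly here without extra code. Note also that defining __eq__ makes Python set __hash__ to None, so such objects can no longer be put in a set or used as dict keys unless you also define __hash__.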
Once done, run this command again and see what happens; this time it should give back True:
ComplexNumber(1,2) == ComplexNumber(1,2)
QUESTION: What about ComplexNumber(1,2) != ComplexNumber(1,2)? Does it behave as expected?
(Non-mandatory read) If you are interested in the gory details of equality, see
2.7. Complex numbers isclose¶
Complex numbers can be represented as vectors, so intuitively we can determine if a complex number is close to another by checking that the distance between the tips of their vectors is less than a given delta. There are more precise ways to calculate it, but here we prefer keeping the example simple.
Given two complex numbers \(z_1 = a + bi\) and \(z_2 = c + di\), we can consider them close if they satisfy this condition:
\(\sqrt{(a - c)^2 + (b - d)^2} < \delta\)
Implement the method in the ComplexNumber class:
def isclose(self, c, delta):
    """ Returns True if the complex number is within a delta distance from complex number c.
    """
    raise Exception("TODO Implement me!")
Check that this test case in the IsCloseTest class passes:
def test_01_isclose(self):
    """ Notice we use `assertTrue` because we expect `isclose` to return a `bool` value, and
        we also test a case where we expect `False`
    """
    self.assertTrue(ComplexNumber(1.0,1.0).isclose(ComplexNumber(1.0,1.1), 0.2))
    self.assertFalse(ComplexNumber(1.0,1.0).isclose(ComplexNumber(10.0,10.0), 0.2))
To run the test, in the console type:
python3 -m unittest ComplexNumber_test.IscloseTest
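A minimal sketch of isclose that implements the distance condition above, again assuming the real and imaginary fields. math.hypot(x, y) computes \(\sqrt{x^2 + y^2}\), so it is exactly the distance between the two vector tips.

```python
import math


class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def isclose(self, c, delta):
        """ True if the distance between the tips of the two vectors is less than delta """
        distance = math.hypot(self.real - c.real, self.imaginary - c.imaginary)
        return distance < delta


print(ComplexNumber(1.0, 1.0).isclose(ComplexNumber(1.0, 1.1), 0.2))    # True
print(ComplexNumber(1.0, 1.0).isclose(ComplexNumber(10.0, 10.0), 0.2))  # False
```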
REMEMBER: Equality with __eq__ and closeness functions like isclose are very different things. Equality should check if two objects have the same memory address or, alternatively, if they contain the same things, while closeness functions should check if two objects are similar. You should never use functions like isclose inside __eq__ methods, unless you really know what you’re doing.
2.8. Complex numbers addition¶
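The sum of two complex numbers is \((a + bi) + (c + di) = (a + c) + (b + d)i\), where a and c correspond to real, and b and d correspond to imaginary.
Implement addition for ComplexNumber more or less as it was done for Fraction in the theory slides, and write some tests as well!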
Use this definition:
def __add__(self, other):
    raise Exception("TODO implement me!")
Check that these two tests in the AddTest class pass:
def test_01_add_zero(self):
    self.assertEqual(ComplexNumber(1,2) + ComplexNumber(0,0), ComplexNumber(1,2))
def test_02_add_numbers(self):
    self.assertEqual(ComplexNumber(1,2) + ComplexNumber(3,4), ComplexNumber(4,6))
To run the tests, in the console type:
python3 -m unittest ComplexNumber_test.AddTest
2.9. Adding a scalar¶
We defined addition among ComplexNumbers, but what about addition between a ComplexNumber and an int or a float?
Will this work?
ComplexNumber(3,4) + 5
What about this?
ComplexNumber(3,4) + 5.0
Try to add the following method to your class, and check whether it works with scalars:
[43]:
def __add__(self, other):
    # checks other object is instance of the class ComplexNumber
    if isinstance(other, ComplexNumber):
        return ComplexNumber(self.real + other.real,self.imaginary + other.imaginary)
    # else checks the basic type of other is int or float
    elif type(other) is int or type(other) is float:
        return ComplexNumber(self.real + other, self.imaginary)
    # other is of some type we don't know how to process.
    # In this case the Python data model says we should return 'NotImplemented'
    else:
        return NotImplemented
Hopefully now you have a better add. But what about this? Will this work?
5 + ComplexNumber(3,4)
Answer: it won’t, Python needs further instructions. Python first tries to see if the class of the object on the left of the expression defines addition for the operand to its right. In this case on the left we have an int number, and int numbers don’t define any way to deal with your very own ComplexNumber class on their right (int.__add__ returns NotImplemented). So as a last resort Python checks whether your ComplexNumber class also defines a way to deal with operands to the left of the ComplexNumber, by looking for the method __radd__, which means reverse addition. Here we implement it:
def __radd__(self, other):
    """ Returns the result of expressions like other + self """
    if (type(other) is int or type(other) is float):
        return ComplexNumber(self.real + other, self.imaginary)
    else:
        return NotImplemented
To check it is working and everything is in order for addition, check that these tests in the RaddTest class pass:
def test_01_add_scalar_right(self):
    self.assertEqual(ComplexNumber(1,2) + 3, ComplexNumber(4,2))
def test_02_add_scalar_left(self):
    self.assertEqual(3 + ComplexNumber(1,2), ComplexNumber(4,2))
def test_03_add_negative(self):
    self.assertEqual(ComplexNumber(-1,0) + ComplexNumber(0,-1), ComplexNumber(-1,-1))
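To see the two dispatch steps described above in isolation, this sketch puts together the __add__ and __radd__ shown in this section, plus a minimal __eq__ and a __repr__ (our own additions, only to make the output readable). Calling (5).__add__ directly shows that int gives up with NotImplemented, and only then does Python try ComplexNumber.__radd__.

```python
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def __eq__(self, other):
        if not isinstance(other, ComplexNumber):
            return NotImplemented
        return self.real == other.real and self.imaginary == other.imaginary

    def __repr__(self):
        return "ComplexNumber(%r, %r)" % (self.real, self.imaginary)

    def __add__(self, other):
        if isinstance(other, ComplexNumber):
            return ComplexNumber(self.real + other.real, self.imaginary + other.imaginary)
        elif type(other) is int or type(other) is float:
            return ComplexNumber(self.real + other, self.imaginary)
        else:
            return NotImplemented

    def __radd__(self, other):
        # only reached for other + self, after other's __add__ returned NotImplemented
        if type(other) is int or type(other) is float:
            return ComplexNumber(self.real + other, self.imaginary)
        else:
            return NotImplemented


# step 1: int.__add__ does not know how to handle a ComplexNumber
print((5).__add__(ComplexNumber(3, 4)))  # NotImplemented
# step 2: Python falls back to ComplexNumber.__radd__
print(5 + ComplexNumber(3, 4))           # ComplexNumber(8, 4)
```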
2.10. Complex numbers multiplication¶
Implement multiplication for ComplexNumber, taking inspiration from the previous __add__ implementation. Can you extend multiplication to work with scalars (both left and right) as well?
To implement __mul__, add this definition to the ComplexNumber class:
def __mul__(self, other):
    raise Exception("TODO Implement me!")
and make sure these test cases in the MulTest class pass:
def test_01_mul_by_zero(self):
    self.assertEqual(ComplexNumber(0,0) * ComplexNumber(1,2), ComplexNumber(0,0))
def test_02_mul_just_real(self):
    self.assertEqual(ComplexNumber(1,0) * ComplexNumber(2,0), ComplexNumber(2,0))
def test_03_mul_just_imaginary(self):
    self.assertEqual(ComplexNumber(0,1) * ComplexNumber(0,2), ComplexNumber(-2,0))
def test_04_mul_scalar_right(self):
    self.assertEqual(ComplexNumber(1,2) * 3, ComplexNumber(3,6))
def test_05_mul_scalar_left(self):
    self.assertEqual(3 * ComplexNumber(1,2), ComplexNumber(3,6))
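One possible sketch of multiplication, following the same pattern as __add__ and __radd__. It uses the standard product \((a + bi)(c + di) = (ac - bd) + (ad + bc)i\), multiplies both parts by a scalar, and handles scalar-on-the-left through __rmul__. The minimal __eq__ is only there so the results can be compared.

```python
class ComplexNumber:
    def __init__(self, real, imaginary):
        self.real = real
        self.imaginary = imaginary

    def __eq__(self, other):
        if not isinstance(other, ComplexNumber):
            return NotImplemented
        return self.real == other.real and self.imaginary == other.imaginary

    def __mul__(self, other):
        if isinstance(other, ComplexNumber):
            a, b = self.real, self.imaginary
            c, d = other.real, other.imaginary
            return ComplexNumber(a * c - b * d, a * d + b * c)
        elif type(other) is int or type(other) is float:
            # a scalar scales both parts
            return ComplexNumber(self.real * other, self.imaginary * other)
        else:
            return NotImplemented

    def __rmul__(self, other):
        # scalar * self: multiplication by a scalar is commutative, so reuse __mul__
        if type(other) is int or type(other) is float:
            return self.__mul__(other)
        return NotImplemented


print(ComplexNumber(0, 1) * ComplexNumber(0, 2) == ComplexNumber(-2, 0))  # True
print(3 * ComplexNumber(1, 2) == ComplexNumber(3, 6))                     # True
```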
3. MultiSet¶
You are going to implement a class called MultiSet, where you are only given the class skeleton, and you will need to determine which basic Python data structures, like list, set, dict (or combinations thereof), are best suited to actually hold the data.
In math a multiset (or bag) generalizes a set by allowing multiple instances of the multiset’s elements.
The multiplicity of an element is the number of instances of the element in a specific multiset.
For example:
The multiset a, b contains only elements a and b, each having multiplicity 1
In multiset a, a, b, a has multiplicity 2 and b has multiplicity 1
In multiset a, a, a, b, b, b, a and b both have multiplicity 3
NOTE: order of insertion does not matter, so a, a, b and a, b, a are the same multiset, where a has multiplicity 2 and b has multiplicity 1.
[44]:
from multiset_solution import *
3.1 __init__, add and get¶
Now implement all of the following methods: __init__, add and get:
def __init__(self):
    """ Initializes the MultiSet as empty."""
    raise Exception("TODO IMPLEMENT ME !!!")
def add(self, el):
    """ Adds one instance of element el to the multiset
        NOTE: MUST work in O(1)
    """
    raise Exception("TODO IMPLEMENT ME !!!")
def get(self, el):
    """ Returns the multiplicity of element el in the multiset.
        If no instance of el is present, return 0.
        NOTE: MUST work in O(1)
    """
    raise Exception("TODO IMPLEMENT ME !!!")
Testing
Once done, running this will run only the tests in the AddGetTest class, and hopefully they will pass. Notice that multiset_test is followed by a dot and the test class name .AddGetTest:
python3 -m unittest multiset_test.AddGetTest
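One possible representation is a dict mapping each element to its multiplicity, because dict reads and writes take O(1) on average. This is a sketch under that assumption: the field name _counts is our own invention and the lab's solution may store the data differently. It also assumes elements are hashable, as dict keys must be.

```python
class MultiSet:
    def __init__(self):
        """ Initializes the MultiSet as empty. """
        # element -> multiplicity; absent elements have multiplicity 0
        self._counts = {}

    def add(self, el):
        """ Adds one instance of element el, in O(1) on average """
        self._counts[el] = self._counts.get(el, 0) + 1

    def get(self, el):
        """ Returns the multiplicity of el (0 if absent), in O(1) on average """
        return self._counts.get(el, 0)


m = MultiSet()
for el in ['a', 'b', 'a']:  # order of insertion does not matter
    m.add(el)
print(m.get('a'), m.get('b'), m.get('z'))  # 2 1 0
```

The standard library already offers collections.Counter for this job, but the point of the exercise is to build the structure yourself.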
3.2 removen¶
Implement the following removen method:
def removen(self, el, n):
    """ Removes n instances of element el from the multiset (that is, reduces el multiplicity by n)
        If n is negative, raises ValueError.
        If n represents a multiplicity bigger than the current multiplicity, raises LookupError
        NOTE: multiset multiplicities are never negative
        NOTE: MUST work in O(1)
    """
Testing: python3 -m unittest multiset_test.RemovenTest
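Continuing the dict-based sketch (same assumed _counts field), removen only needs one lookup and one write, so it stays O(1) on average. Removing the key when the multiplicity drops to zero is a design choice of ours: it keeps get returning 0 and stops the dict from growing with dead entries.

```python
class MultiSet:
    def __init__(self):
        self._counts = {}

    def add(self, el):
        self._counts[el] = self._counts.get(el, 0) + 1

    def get(self, el):
        return self._counts.get(el, 0)

    def removen(self, el, n):
        """ Removes n instances of el; multiplicities never become negative """
        if n < 0:
            raise ValueError("n must be non-negative, got %s" % n)
        current = self._counts.get(el, 0)
        if n > current:
            raise LookupError("Cannot remove %s instances of %r, only %s present" % (n, el, current))
        if n == current:
            # multiplicity drops to zero: forget the element entirely
            self._counts.pop(el, None)
        else:
            self._counts[el] = current - n
```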
Sorting¶
Introduction¶
What to do¶
Unzip the exercises in a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|-sorting
|- sorting.ipynb
|- selection_sort_exercise.py
|- selection_sort_test.py
|- selection_sort_solution.py
Open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm): you will edit the files ending in _exercise.py. Go on reading this notebook, and follow the instructions inside.
List performance¶
Python lists are generic containers: they are useful in a variety of scenarios, but sometimes their performance can be disappointing, so it’s best to know and avoid potentially expensive operations. See the table in Chapter 2.6: Lists of the book.
Fast or not?¶
x = ["a", "b", "c"]
x[2]
x[2] = "d"
x.append("d")
x.insert(0, "d")
x[3:5]
x.sort()
What about len(x)? If you don’t know the answer, try googling it!
Sublist iteration performance
The get slice time complexity is O(k), but what about memory? It’s the same! So if you want to iterate over a part of a list, beware of slicing! For example, slicing a list like this can occupy much more memory than necessary:
[2]:
x = list(range(1000))
print([2*y for y in x[100:200]])
[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]
The reason is that, in CPython (the standard interpreter), slicing a list like x[100:200] at loop start creates a new list. If we want to explicitly tell Python we just want to iterate through part of the list, we can use the itertools module. In particular, the islice function is handy: with it we can rewrite the list comprehension above like this:
[3]:
import itertools
print([2*y for y in itertools.islice(x, 100, 200)])
[200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398]
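This small check makes the difference concrete: the slice is a brand new list, while islice gives a lazy iterator that produces the same values one at a time. sys.getsizeof only measures the container object itself (not the items it refers to), which is enough to see that the slice allocates room for 100 references while the islice object stays small.

```python
import itertools
import sys

x = list(range(1000))  # a real list: slicing a range would just return another range

sliced = x[100:200]                   # builds a NEW list with 100 references
lazy = itertools.islice(x, 100, 200)  # iterator, nothing copied up front

print(type(sliced).__name__, len(sliced))  # list 100
print(type(lazy).__name__)                 # islice

doubled_slice = [2 * y for y in sliced]
doubled_lazy = [2 * y for y in lazy]
print(doubled_slice == doubled_lazy)  # True

print(sys.getsizeof(sliced) > sys.getsizeof(itertools.islice(x, 100, 200)))  # True
```

One caveat: islice cannot jump ahead, it still steps over the first 100 items before yielding anything. So it saves memory, not the time needed to reach the start position.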
Exercises¶
1 Selection Sort¶
We will try to implement Selection Sort on our own. Montresor slides already contain the Python solution, but don’t look at them (we will implement a slightly different solution anyway). In this exercise, you will only be allowed to look at this picture:
To start with, open selection_sort_exercise.py in an editor of your choice.
Now proceed reading.
1.1 Implement swap¶
[4]:
def swap(A, i, j):
    """ MODIFIES the array A by swapping the elements at position i and j
    """
    raise Exception("TODO implement me!")
In order to succeed in this part of the course, you are strongly invited to first think hard about a function, and then code! So to start with, pay particular attention to the required inputs and expected outputs of functions. Before you start coding, answer these questions:
QUESTION 1.1.1: What are the input types of swap? In particular:
What is the type of the elements in A? Can we have both strings and floats inside A?
What is the type of i and j?
COMMANDMENT 2: You shall also write on paper!
Help yourself by drawing a representation of the input array. Staring at the monitor doesn’t always work, so draw a representation of the state of the program. Tables, nodes, arrows: all can help figuring out a solution for the problem.
QUESTION 1.1.2: What should be the result of the three prints here? Should the function swap return something at all? Try to answer this question before proceeding.
A = ['a','b','c']
print(A)
print(swap(A, 0, 2))
print(A)
HINT: Remember this:
COMMANDMENT 7: You shall use the return command only if you see return written in the function description!
If there is no return in the function description, the function is intended to return None.
QUESTION 1.1.3: Try to answer this question before proceeding:
What is the result of the first and second print down here?
What is the result of the final print if we have arbitrary indices \(i\) and \(j\) with \(0 \leq i,j \leq 2\)?
A = ['a','b','c']
swap(A, 0, 2)
print(A)
swap(A, 0, 2)
print(A)
QUESTION 1.1.3: Try to answer this question before proceeding:
What is the result of the first and second print down here?
What is the result of the final print if we have arbitrary indices \(i\) and \(j\) with \(0 \leq i,j \leq 2\)?
A = ['a','b','c']
swap(A, 0, 2)
print(A)
swap(A, 2, 0)
print(A)
QUESTION 1.1.4: What is the result of the final print here? Try to answer this question before proceeding:
A = ['a','b','c']
swap(A, 1, 1)
print(A)
QUESTION 1.1.5:
In the same file selection_sort.py, copy at the end the test code at the end of this question. Read carefully all the test cases, in particular test_swap_property and test_double_swap. They show two important properties of the swap function that you should have discovered while answering the previous questions. Why should these tests succeed with the implemented code? Make sure to answer.
EXERCISE: implement swap
Proceed by implementing the swap function.
To test the function, run:
python3 -m unittest selection_sort_test.SwapTest
Notice that:
In the command above there is no .py at the end of selection_sort_test
We are executing the command in the operating system shell, not in Python (there must not be >>> at the beginning)
At the end of the filename, there is a dot followed by a test class name SwapTest, which means Python will only execute tests contained in SwapTest. Of course, in this case those are all the tests we have, but if we add many test classes to our file, it will be useful to be able to filter executed tests.
Depending on your distribution (e.g. Anaconda), you might need to write python instead of python3
QUESTION 1.1.6: Read the Error kinds section in Testing. Suppose you will be the only one calling swap, and you suspect that somewhere your program is calling swap with wrong parameters. Which kind of error would that be? Add some appropriate precondition checking to swap.
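A minimal sketch of swap with precondition checks, assuming that (as in the question) only your own code calls it, so a wrong index means a bug in your program. assert is a common way to catch such internal errors; keep in mind that Python skips assert statements when run with the -O flag. Note the function changes A in place and has no return.

```python
def swap(A, i, j):
    """ MODIFIES the list A by swapping the elements at positions i and j.
        Returns nothing (that is, None).
    """
    # preconditions: a wrong index here is a bug in our own code
    assert 0 <= i < len(A), "i out of range: %s" % i
    assert 0 <= j < len(A), "j out of range: %s" % j
    A[i], A[j] = A[j], A[i]


A = ['a', 'b', 'c']
print(swap(A, 0, 2))  # None
print(A)              # ['c', 'b', 'a']
```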
1.2 Implement argmin¶
Try to code and test the partial argmin function, which returns a position:
[5]:
def argmin(A, i):
    """ RETURN the *index* of the element in list A which is less than or equal
        to all other elements in A that start from index i included
        - MUST execute in O(n) where n is the length of A
    """
    raise Exception("TODO implement me!")
QUESTION 1.2.1: What are the input types of argmin? In particular:
What could be the type of the elements in A? Can we have both strings and floats inside A?
What is the type of i? What is the range of i?
QUESTION 1.2.2: Should the function argmin return something? What would be the result type? Try to answer this question before proceeding.
QUESTION 1.2.3: Look again at the selection_sort matrix, and compare it to the argmin function definition:
Can you understand the meaning of orange and white boxes?
What does the yellow box represent?
QUESTION 1.2.4:
Draw a matrix like the above for the array A = ['b','a','c'], adding the corresponding row and column numbers for i and j.
What should be the result of the prints here?
A = ['a','b','c']
print(argmin(A,0))
print(argmin(A,1))
print(argmin(A,2))
print(A)
EXERCISE 1.2.5: Copy the following test code at the end of the file selection_sort.py, and start coding a solution.
To test the function, run:
python3 -m unittest selection_sort_test.ArgminTest
Notice how now we are appending .ArgminTest at the end of the command.
Warning: Don’t use slices! Remember their computational complexity, and that in these labs we do care about performance!
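A sketch of argmin that respects the no-slicing rule: it walks the indices from i + 1 to the end, remembering the position of the smallest element seen so far, so it takes O(n) time and O(1) extra memory. When there are ties it returns the first minimal position, which satisfies "less than or equal to all other elements".

```python
def argmin(A, i):
    """ RETURN the index of a minimal element among A[i], A[i+1], ..., A[len(A)-1] """
    assert 0 <= i < len(A), "i out of range: %s" % i
    min_pos = i
    for k in range(i + 1, len(A)):  # indices only, no slices
        if A[k] < A[min_pos]:
            min_pos = k
    return min_pos


print(argmin(['b', 'a', 'c'], 0))  # 1
```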
1.3: Full selection_sort¶
Let’s talk about implementing the selection_sort function in selection_sort_exercise.py:
[6]:
def selection_sort(A):
    """ Sorts the list A in-place in O(n^2) time this way:
        1. Looks at minimal element in the array [i:n],
           and swaps it with the first element of [i:n].
        2. Repeats step 1, but considering the subarray [i+1:n]
        Remember selection sort has complexity O(n^2) where n is the
        size of the list.
    """
    raise Exception("TODO implement me!")
Note: on the book website there is an implementation of selection sort with a nice animated histogram showing the sorting process. Unlike the slides, instead of selecting the minimal element, the algorithm in the book selects the maximal element and puts it at the right end of the array.
QUESTION 1.3.1:
What is the expected return type? Does it return anything at all?
What is the meaning of ‘Sorts the list A in-place’ ?
QUESTION 1.3.2:
At the beginning, which array indices are considered?
At the end, which array indices are considered? Is A[len(A) - 1:len(A)] ever considered?
EXERCISE 1.3.3:
Try now to implement selection_sort in selection_sort_exercise.py, using the two previously defined functions swap and argmin.
HINT: If you are having trouble because your selection sort passes wrong arguments to either swap or argmin, feel free to add further assertions to both. They are much more effective than prints!
To test the function, run:
python3 -m unittest selection_sort_test.SelectionSortTest
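Once swap and argmin work, selection sort is one loop that, for each position i, brings the minimum of the remaining part A[i:] (conceptually, without slicing) to position i. This sketch repeats compact versions of the two helpers so it runs on its own. The loop stops at len(A) - 2 because a one-element tail is already sorted, which is one way to answer QUESTION 1.3.2.

```python
def swap(A, i, j):
    A[i], A[j] = A[j], A[i]


def argmin(A, i):
    min_pos = i
    for k in range(i + 1, len(A)):
        if A[k] < A[min_pos]:
            min_pos = k
    return min_pos


def selection_sort(A):
    """ Sorts the list A in-place in O(n^2) time """
    # n - 1 rounds, each one scanning the unsorted tail: O(n^2) comparisons
    for i in range(len(A) - 1):
        swap(A, i, argmin(A, i))


A = [3, 8, 1, 9, 2]
selection_sort(A)
print(A)  # [1, 2, 3, 8, 9]
```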
2 Insertion sort¶
Insertion sort is a basic sorting algorithm. This animation gives you an idea of how it works:
From the animation, you can see these things are going on:
The red square selects one number starting from the leftmost (question: does it actually need to be the leftmost? Can we save one iteration?). Let’s say it starts at position i.
While the number in the red square is less than the previous one, it is shifted back one position at a time.
The red square now selects the number immediately following the previous starting point of the red square, that is, it selects position i + 1.
From the analysis above:
How many loops do we need? One, two, three?
Are they nested?
Is there a loop with a fixed number of iterations? Is there one with an unknown number of iterations?
What is the worst-case complexity of the algorithm?
As always, if you have trouble finding a generic solution, take a fixed list and manually write down all the steps of the algorithm. Here we give a sketch:
#    0,1,2,3,4,5
A = [3,7,8,9,6,2]
Let’s say we have the red square at i=4:
i = 4
red = A[4]    # in red we put the value in A[4], which is 6
              #  0,1,2,3,4,5
              # [3,7,8,9,6,2] start
A[4] = A[3]   # [3,7,8,9,9,2]
A[3] = A[2]   # [3,7,8,8,9,2]
A[2] = A[1]   # [3,7,7,8,9,2]
A[1] = red    # [3,6,7,8,9,2]  A[0] = 3 is less than red, so we stop shifting
We can generalize the A index with a j:
i = 4
red = A[4]
j = 4
while ...
    A[j] = A[j-1]
    j -= 1
A[j] = red
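For checking your own attempt afterwards, here is one possible way to fill in the while condition of the sketch above and wrap it in the outer loop. The red square starts at position 1 because a single element is already sorted, and the inner loop stops either at the start of the list or as soon as the previous element is not greater than red. Using > rather than >= keeps equal elements in their original order.

```python
def insertion_sort(A):
    """ Sorts in-place list A with insertion sort """
    # A[0] alone is already sorted, so the red square can start at position 1
    for i in range(1, len(A)):
        red = A[i]
        j = i
        # shift back the greater elements one position at a time
        while j > 0 and A[j - 1] > red:
            A[j] = A[j - 1]
            j -= 1
        A[j] = red


A = [3, 7, 8, 9, 6, 2]
insertion_sort(A)
print(A)  # [2, 3, 6, 7, 8, 9]
```

In the worst case (a list in decreasing order) the inner loop runs i times for each i, giving O(n^2).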
Start editing the file insertion_sort_exercise.py and implement insertion_sort without looking at the theory slides.
def insertion_sort(A):
    """ Sorts in-place list A with insertion sort
    """
3 Merge sort¶
With merge sort we model the lists to order as stacks, so it is important to understand how to take elements from the end of a list and how to reverse a list to change its order.
Taking last element¶
To take the last element of a list you can use [-1]:
[7]:
[9,7,8][-1]
[7]:
8
Reversing a list¶
REMEMBER: the .reverse() method MODIFIES the list it is called on and returns None!
[8]:
lst = [9,7,8]
lst.reverse()
Notice how above Jupyter did not show anything, because the result of the call was implicitly None. Still, the call had an effect: lst is now reversed:
[9]:
lst
[9]:
[8, 7, 9]
If you want a reversed version of a list without actually changing it, you can use the reversed function:
[10]:
lst = [9,7,8]
reversed(lst)
[10]:
<list_reverseiterator at 0x7f1848121198>
The returned value is an iterator, that is, something able to produce the elements of the list in reversed order, but which is not a list itself. If you actually want to get back a list, you need to explicitly convert it with list:
[11]:
lst = [9,7,8]
list(reversed(lst))
[11]:
[8, 7, 9]
Notice lst itself was not changed:
[12]:
lst
[12]:
[9, 7, 8]
Removing last element with .pop()¶
To remove an element, you can use the .pop() method, which does two things:
if not given any argument, removes the last element in \(O(1)\) time
returns it to the caller of the method, so for example we can conveniently store it in a variable
[13]:
A = [9,7,8]
x = A.pop()
[14]:
print(A)
[9, 7]
[15]:
print(x)
8
WARNING: internal deletion is expensive!
If you pay attention to performance (and in this part of the course you do), whenever you have to remove elements from a Python list be very careful about the complexity! Removal at the end is a very fast O(1), but internal removal is O(n)!
Costly internal del¶
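You can remove an internal element with del:
NOTE: del is a statement, not a function, so it doesn’t give anything back (unlike pop, you don’t get the removed element)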
[16]:
lst = [9,5,6,7]
del lst[2] # internal delete is O(n)
[17]:
lst
[17]:
[9, 5, 7]
Costly internal pop¶
You can remove an internal element with pop(i)
[18]:
lst = [9,5,6,7]
lst.pop(2) # internal pop is O(n)
[18]:
6
[19]:
lst
[19]:
[9, 5, 7]
3.1 merge1¶
Start editing merge_sort_exercise.py.
merge1 takes two already ordered lists of size n and m and returns a new one made with the elements of both, in \(O(n+m)\) time. For example:
[20]:
from merge_sort_solution import *
merge1([3,6,9,13,15], [2,4,8,9])
[20]:
[2, 3, 4, 6, 8, 9, 9, 13, 15]
To implement it, keep comparing the last elements of the two lists, and at each round pop the greatest and append it to a temporary list, which you shall return at the end of the function (remember to reverse it!).
Example:
If we imagine the numbers as ordered card decks, we can picture them like this:
                2       15
                4       13
                4       10
                6       9
15              8       8
13      10      9       6
9       8       10      4
6       4       13      4
4       2       15      2
A       B       TMP     RESULT
As Python lists, they would look like:
A=[4,6,9,13,15]
B=[2,4,8,10]
TMP=[15,13,10,9,8,6,4,4,2]
RESULT=[2,4,4,6,8,9,10,13,15]
The algorithm would:
compare 15 and 10, pop 15 and put it in TMP
compare 13 and 10, pop 13 and put it in TMP
compare 9 and 10, pop 10 and put it in TMP
compare 9 and 8, pop 9 and put it in TMP
etc …
finally return a reversed TMP
It remains to decide what to do when one of the two lists becomes empty, but this is up to you.
To test:
python3 -m unittest merge_sort_test.Merge1Test
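A sketch of merge1 following the card-deck picture: pop the greater of the two tops into TMP, then move whatever is left over, then reverse TMP. Working on copies of the inputs is our own design choice, so the caller's lists are not emptied; the copies cost O(n+m), which keeps the total within \(O(n+m)\). Check whether the lab's tests expect the inputs to be consumed instead.

```python
def merge1(A, B):
    """ RETURN a new ordered list with the elements of ordered lists A and B """
    a = list(A)  # copies, so the caller's lists are left untouched
    b = list(B)
    tmp = []
    while a and b:
        # the greatest of the two tops goes on the TMP deck
        if a[-1] >= b[-1]:
            tmp.append(a.pop())
        else:
            tmp.append(b.pop())
    # one deck is empty: what remains in the other is smaller than all of TMP
    while a:
        tmp.append(a.pop())
    while b:
        tmp.append(b.pop())
    tmp.reverse()  # TMP holds the elements from greatest to smallest
    return tmp


print(merge1([4, 6, 9, 13, 15], [2, 4, 8, 10]))  # [2, 4, 4, 6, 8, 9, 10, 13, 15]
```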
3.2 merge2¶
merge2 takes A and B, two ordered lists (from smallest to greatest) of (possibly negative) integers of size n and m respectively, and RETURNS a NEW list composed of the items in A and B ordered from smallest to greatest.
MUST RUN IN O(m+n)
In this version, do NOT use .pop() on the input lists to reduce their size. Instead, use indices to track at which point you are, starting at zero and putting the minimal elements in the result list, so this time you don’t even need a temporary list.
8                       15
7                       13
6                       10
5                       9
4       15              8
3       13      10      6
2       9       8       4
1       6       4       4
0       4       2       2
index   A       B       RESULT
Sketch:
set i=0 (left index) and j=0 (right index)
compare 4 and 2, put 2 in RESULT, set i=0, j=1
compare 4 and 4, put 4 in RESULT, set i=1, j=1
compare 6 and 4, put 4 in RESULT, set i=1, j=2
compare 6 and 8, put 6 in RESULT, set i=2, j=2
etc …
finally return RESULT
To test:
python3 -m unittest merge_sort_test.Merge2Test
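A sketch of merge2 that follows the index-based steps above: i and j only move forward, so every element is looked at once and the inputs are never modified. On ties it takes from A first, which matches the second step of the sketch (compare 4 and 4, then advance i).

```python
def merge2(A, B):
    """ RETURN a NEW ordered list with the items of ordered lists A and B, in O(n+m) """
    i, j = 0, 0
    result = []
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            result.append(A[i])
            i += 1
        else:
            result.append(B[j])
            j += 1
    # at most one of these loops does any work: copy the leftovers
    while i < len(A):
        result.append(A[i])
        i += 1
    while j < len(B):
        result.append(B[j])
        j += 1
    return result


print(merge2([4, 6, 9, 13, 15], [2, 4, 8, 10]))  # [2, 4, 4, 6, 8, 9, 10, 13, 15]
```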
4 quick sort¶
Quick sort is a widely used sorting algorithm and in this exercise you will implement it following the pseudo code.
IMPORTANT: Array A in the pseudo code has indices starting from zero included
IMPORTANT: The functions pivot and quicksort operate on a subarray that goes from index first included to index last included !!!
Start editing the file quick_sort_exercise.py:
4.1 pivot¶
Try to look at this pseudocode and implement the pivot function.
IMPORTANT: If something goes wrong (it will), find the problem using the debugger !
def pivot(A, first, last):
    """ MODIFIES in-place the slice of the array A with indices between first included
        and last **included**. RETURN the new pivot index.
    """
    raise Exception("TODO IMPLEMENT ME!")
You can run tests only for pivot with this command:
python3 -m unittest quick_sort_test.PivotTest
4.2 quicksort and qs¶
Implement the quicksort and qs functions:
def quicksort(A, first, last):
    """
    Sorts in-place the slice of the array A with indices between
    first included and last included.
    """
    raise Exception("TODO IMPLEMENT ME !")
def qs(A):
    """
    Sorts in-place the array A by calling quicksort function on the
    full array.
    """
    raise Exception("TODO IMPLEMENT ME !")
You can run only the tests for quicksort and qs with this command:
python3 -m unittest quick_sort_test.QuicksortTest
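The pseudocode this section refers to is not reproduced in the text, so this sketch uses one common partition scheme (Lomuto-style, taking A[first] as the pivot value) and may differ from it; if PivotTest expects a specific final arrangement, follow the pseudocode. What any correct pivot guarantees is shown in the comments: smaller elements end up to the left of the returned index, the others to the right. Both bounds are inclusive, as required above.

```python
def pivot(A, first, last):
    """ Partitions A[first..last] (both included) around p = A[first].
        RETURN the final index of p.
    """
    p = A[first]
    j = first  # A[first+1..j] holds the elements found to be < p
    for i in range(first + 1, last + 1):  # last + 1 because last is included
        if A[i] < p:
            j += 1
            A[i], A[j] = A[j], A[i]
    # put p between the smaller and the greater-or-equal elements
    A[first], A[j] = A[j], A[first]
    return j


def quicksort(A, first, last):
    """ Sorts in-place A[first..last], both included """
    if first < last:
        k = pivot(A, first, last)
        quicksort(A, first, k - 1)
        quicksort(A, k + 1, last)


def qs(A):
    """ Sorts in-place the whole list A """
    quicksort(A, 0, len(A) - 1)


A = [5, 3, 8, 1, 9, 2]
qs(A)
print(A)  # [1, 2, 3, 5, 8, 9]
```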
5. chaining¶
You will be doing exercises about chainable lists, using plain old Python lists. This time we don’t actually care about sorting, we just want to detect duplicates and chain sequences fast.
Start editing the file exerciseB2.py and read the following.
5.1 has_duplicates¶
Implement the function has_duplicates:
def has_duplicates(external_list):
    """
    Returns True if internal lists inside external_list contain duplicates,
    False otherwise. For more info see exam and tests.
    INPUT: a list of lists of strings, possibly containing repetitions, like:
    [
        ['ab', 'c', 'de'],
        ['v', 'a'],
        ['c', 'de', 'b']
    ]
    OUTPUT: Boolean (in the example above it would be True)
    """
MUST RUN IN \(O(m*n)\), where \(m\) is the number of internal lists and \(n\) is the length of the longest internal list (just to calculate complexity think about the scenario where all lists have equal size)
HINT: Given the above constraint, whenever you find an item, you cannot start another for loop to check if the item exists elsewhere: that would cost around \(O(m^2 n^2)\). Instead, you need to keep track of found items with some other data structure of your choice, which must allow fast reads and writes.
Testing: python3 -m unittest chains_test.TestHasDuplicates
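A sketch of has_duplicates that follows the hint: a set remembers every string seen so far, and set membership tests and insertions take O(1) on average, so the whole scan is \(O(m \cdot n)\). As an assumption, a string repeated inside the same internal list also counts as a duplicate here; check the tests to confirm that is the intended meaning.

```python
def has_duplicates(external_list):
    """ True if some string appears more than once across all internal lists """
    seen = set()
    for internal in external_list:
        for item in internal:
            if item in seen:  # O(1) on average
                return True
            seen.add(item)
    return False


print(has_duplicates([['ab', 'c', 'de'], ['v', 'a'], ['c', 'de', 'b']]))  # True
```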
5.2 chain¶
Implement the function chain:
def chain(external_list):
    """
    Takes a list of lists of strings and returns a list containing all the strings
    from external_list in sequence, joined by the ending and starting strings
    of the internal lists. For more info see exam and tests.
    INPUT: a list of lists of strings, like:
    [
        ['ab', 'c', 'de'],
        ['gh', 'i'],
        ['de', 'f', 'gh']
    ]
    OUTPUT: a list of strings, like ['ab', 'c', 'de', 'f', 'gh', 'i']
    """
It is assumed that:
external_list always contains at least one internal list
internal lists always contain at least two strings
no string is duplicated among all internal lists
Output sequence is constructed as follows:
it starts with all the items from the first internal list
successive items are taken from an internal list which starts with a string equal to the last string of the previously taken internal list
the sequence must not contain repetitions (so joint strings are taken only once)
all internal lists must be used. If this is not possible (because there are no joint strings), raise ValueError
Be careful that:
MUST BE WRITTEN WITH STANDARD PYTHON FUNCTIONS
MUST RUN IN \(O(m * n)\), where \(m\) is the number of internal lists and \(n\) is the length of the longest internal list (just to calculate complexity think about the scenario where all lists have equal size)
HINT: Given the above constraint, whenever you find a string, you cannot start another for loop to check if the string exists elsewhere (that would likely introduce a quadratic \(m^2\) factor). Instead, you need to first keep track of both starting strings and the list they are contained within, using some other data structure of your choice which must allow fast reads and writes.
If possible, avoid slicing (which doubles memory usage) and use itertools.islice instead.
Testing: python3 -m unittest chains_test.TestChain
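A sketch of chain that follows the hint: a dict maps each starting string to the index of the internal list it begins, so finding the next list is O(1) on average and the whole construction is \(O(m \cdot n)\). itertools.islice(current, 1, None) skips the joint string without building a sliced copy. Tracking the used indices in a set is our own safeguard against looping back onto a list already taken.

```python
import itertools


def chain(external_list):
    """ RETURN the chained sequence of strings, raising ValueError if impossible """
    # starting string -> index of the internal list it starts
    starts = {}
    for idx, internal in enumerate(external_list):
        starts[internal[0]] = idx
    used = {0}
    current = external_list[0]
    result = list(current)
    while len(used) < len(external_list):
        idx = starts.get(current[-1])
        if idx is None or idx in used:
            raise ValueError("Cannot chain after string %r" % current[-1])
        used.add(idx)
        current = external_list[idx]
        # skip the joint string, already at the end of result
        result.extend(itertools.islice(current, 1, None))
    return result


print(chain([['ab', 'c', 'de'], ['gh', 'i'], ['de', 'f', 'gh']]))
# ['ab', 'c', 'de', 'f', 'gh', 'i']
```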
6 SwapArray¶
NOTE: This exercise was given at an exam. Solving it could have been quite easy, if students had just read the book (which is available when doing the exam)!
Interpret it as a warning that reading these worksheets alone is not enough to pass the exam.
You are given a class SwapArray that models an array where the only modification you can do is to swap an element with the successive one.
[21]:
from swap_array_solution import *
To create a SwapArray, just call it passing a Python list:
[22]:
sarr = SwapArray([7,8,6])
print(sarr)
SwapArray: [7, 8, 6]
Then you can query it in \(O(1)\) by calling get() and get_last():
[23]:
sarr.get(0)
[23]:
7
[24]:
sarr.get(1)
[24]:
8
[25]:
sarr.get_last()
[25]:
6
You can get the size in \(O(1)\) with the size() method:
[26]:
sarr.size()
[26]:
3
As we said, the only modification you can do to the internal array is to call the swap_next method:
def swap_next(self, i):
    """ Swaps the elements at indices i and i + 1
        If index is negative or greater than or equal to the last index, raises
        an IndexError
    """
For example:
[27]:
sarr = SwapArray([7,8,6,3])
print(sarr)
SwapArray: [7, 8, 6, 3]
[28]:
sarr.swap_next(2)
print(sarr)
SwapArray: [7, 8, 3, 6]
[29]:
sarr.swap_next(0)
print(sarr)
SwapArray: [8, 7, 3, 6]
Now start editing the file swap_array_exercise.py:
6.1 is_sorted¶
Implement the is_sorted function, which is a function external to the class SwapArray:
def is_sorted(sarr):
    """ Returns True if the provided SwapArray sarr is sorted, False otherwise
        NOTE: Here you are a user of SwapArray, so you *MUST NOT* access
              directly the field _arr.
        NOTE: MUST run in O(n) where n is the length of the array
    """
    raise Exception("TODO IMPLEMENT ME !")
Once done, running this will run only the tests in the IsSortedTest class, and hopefully they will pass.
python3 -m unittest swap_array_test.IsSortedTest
Example usage:
[30]:
is_sorted(SwapArray([8,5,6]))
[30]:
False
[31]:
is_sorted(SwapArray([5,6,6,8]))
[31]:
True
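A sketch of is_sorted that uses only the public methods size() and get(). The SwapArray class below is a hypothetical stand-in written from the behaviour described in this section, only so the example runs; the real one lives in swap_array_solution and may be implemented differently. Each position is read exactly once, keeping the previous value in a variable.

```python
class SwapArray:
    """ Hypothetical stand-in for the lab's SwapArray, for illustration only """

    def __init__(self, lst):
        self._arr = list(lst)

    def get(self, i):
        return self._arr[i]

    def get_last(self):
        return self._arr[-1]

    def size(self):
        return len(self._arr)

    def swap_next(self, i):
        if i < 0 or i >= len(self._arr) - 1:
            raise IndexError("Wrong index: %s" % i)
        self._arr[i], self._arr[i + 1] = self._arr[i + 1], self._arr[i]

    def __str__(self):
        return "SwapArray: %s" % self._arr


def is_sorted(sarr):
    """ True if sarr is sorted, reading each position once: O(n) """
    n = sarr.size()
    if n == 0:
        return True
    prev = sarr.get(0)
    for i in range(1, n):
        cur = sarr.get(i)
        if prev > cur:
            return False
        prev = cur
    return True


print(is_sorted(SwapArray([8, 5, 6])))     # False
print(is_sorted(SwapArray([5, 6, 6, 8])))  # True
```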
6.2 max_to_right¶
Implement the max_to_right function, which is a function external to the class SwapArray. There are two ways to implement it; try to minimize the reads from the SwapArray.
def max_to_right(sarr,i):
    """ Modifies the provided SwapArray sarr so that its biggest element
        in the subarray from 0 to i is moved to index i.
        Elements *after* i are *not* considered.
        The order in which the other elements will be after a call
        to this function is left unspecified (so it could be any).
        NOTE: Here you are a user of SwapArray, so you *MUST NOT* access
              directly the field _arr. To do changes, you can only use
              the method swap_next(self, i).
        NOTE: does *not* return anything!
        NOTE: MUST run in O(n) where n is the length of the array
    """
Testing: python3 -m unittest swap_array_test.MaxToRightTest
Example usage:
[32]:
sarr = SwapArray([7, 9, 6, 5, 8])
print(sarr)
SwapArray: [7, 9, 6, 5, 8]
[33]:
max_to_right(sarr,4) # 4 is an *index*
print(sarr)
SwapArray: [7, 6, 5, 8, 9]
[34]:
sarr = SwapArray([7, 9, 6, 5, 8])
print(sarr)
SwapArray: [7, 9, 6, 5, 8]
[35]:
max_to_right(sarr,3)
print(sarr)
SwapArray: [7, 6, 5, 9, 8]
[36]:
sarr = SwapArray([7, 9, 6, 5, 8])
print(sarr)
SwapArray: [7, 9, 6, 5, 8]
[37]:
max_to_right(sarr,1)
print(sarr)
SwapArray: [7, 9, 6, 5, 8]
[38]:
sarr = SwapArray([7, 9, 6, 5, 8])
print(sarr)
SwapArray: [7, 9, 6, 5, 8]
[39]:
max_to_right(sarr,0) # changes nothing
print(sarr)
SwapArray: [7, 9, 6, 5, 8]
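A sketch of one low-read way to write max_to_right: it keeps the current maximum in a variable, and since after swap_next(j) that maximum sits at position j + 1, each position is read only once (i + 1 reads, instead of about 2i when comparing get(j) and get(j + 1) at every step). The SwapArray here is again a hypothetical stand-in built from the described behaviour, not the lab's class. On the examples above it produces the same arrangements.

```python
class SwapArray:
    """ Hypothetical stand-in for the lab's SwapArray, for illustration only """

    def __init__(self, lst):
        self._arr = list(lst)

    def get(self, i):
        return self._arr[i]

    def size(self):
        return len(self._arr)

    def swap_next(self, i):
        if i < 0 or i >= len(self._arr) - 1:
            raise IndexError("Wrong index: %s" % i)
        self._arr[i], self._arr[i + 1] = self._arr[i + 1], self._arr[i]

    def __str__(self):
        return "SwapArray: %s" % self._arr


def max_to_right(sarr, i):
    """ Moves the biggest element of positions 0..i to position i, in O(n) """
    if i <= 0:
        return  # a single element is already in place
    current = sarr.get(0)  # biggest value seen so far, at position j
    for j in range(i):
        nxt = sarr.get(j + 1)
        if current > nxt:
            sarr.swap_next(j)  # the maximum moves on to position j + 1
        else:
            current = nxt      # a new maximum, already at position j + 1


sarr = SwapArray([7, 9, 6, 5, 8])
max_to_right(sarr, 4)
print(sarr)  # SwapArray: [7, 6, 5, 8, 9]
```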
6.3 swapsort¶
When you know how to push a maximum element to the rightmost position of an array, you almost have a sorting algorithm. So now you can try to implement the swapsort function, taking inspiration from max_to_right. Note swapsort is a function external to the class SwapArray:
def swapsort(sarr):
    """ Sorts in-place provided SwapArray.
        NOTE: Here you are a user of SwapArray, so you *MUST NOT* access
              directly the field _arr. To do changes, you can only use
              the method swap_next(self, i).
        NOTE: does *not* return anything!
        NOTE: MUST execute in O(n^2), where n is the length of the array
    """
    raise Exception("TODO IMPLEMENT ME !")
You can run the tests for swapsort only with this command:
python3 -m unittest swap_array_test.SwapSortTest
Example usage:
[40]:
sar = SwapArray([8,4,2,4,2,7,3])
[41]:
swapsort(sar)
[42]:
print(sar)
SwapArray: [2, 2, 3, 4, 4, 7, 8]
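A possible swapsort sketch, again on a hypothetical SwapArray stand-in (get and size are assumptions). For brevity it uses the simpler two-reads version of max_to_right: each call with index i leaves the maximum of the prefix 0..i at index i, which is its final position, so n - 1 calls costing O(n) each give O(n^2) overall.

```python
class SwapArray:
    """Hypothetical stand-in: get/size are assumptions, swap_next comes from the text."""

    def __init__(self, lst):
        self._arr = list(lst)

    def get(self, i):
        return self._arr[i]

    def size(self):
        return len(self._arr)

    def swap_next(self, i):
        self._arr[i], self._arr[i + 1] = self._arr[i + 1], self._arr[i]

    def __str__(self):
        return "SwapArray: " + str(self._arr)


def max_to_right(sarr, i):
    # O(i): bubbles the biggest element among 0..i up to index i
    for j in range(i):
        if sarr.get(j) > sarr.get(j + 1):
            sarr.swap_next(j)


def swapsort(sarr):
    """Sorts sarr in place using only swap_next: O(n^2) overall."""
    # after each call, index i holds its final value, so the unsorted prefix shrinks by one
    for i in range(sarr.size() - 1, 0, -1):
        max_to_right(sarr, i)


sar = SwapArray([8, 4, 2, 4, 2, 7, 3])
swapsort(sar)
print(sar)  # SwapArray: [2, 2, 3, 4, 4, 7, 8]
```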
Linked lists¶
0 Introduction¶
In these exercises, you will implement several versions of a LinkedList, improving its performance with each new version.
References¶
Theory slides (Monodirectional list)
LinkedList Abstract Data Type on the book
Implementing LinkedLists on the book
NOTE: What the book calls UnorderedList, in this lab is just called LinkedList. It may look confusing, but in the wild you will never find code called UnorderedList, so let’s get rid of the weird name right now!
What to do¶
Unzip the exercises into a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|-linked-lists
|- linked-lists.ipynb
|- linked_list_test.py
|- linked_list_exercise.py
|- linked_list_solution.py
|- linked_list_v2_sol.py
|- linked_list_v2_test_sol.py
|- linked_list_v3_sol.py
|- linked_list_v3_test_sol.py
Open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm): you will edit the files ending in _exercise.py
Go on reading this notebook, and follow the instructions inside.
0.1 Initialization¶
A LinkedList for us is a linked list starting with a pointer called head that points to the first Node (if the list is empty the pointer points to None). Think of the list as a chain where each Node can contain some data retrievable with the Node.get_data() method, and you can access one Node at a time by calling the method Node.get_next() on each node.
Let’s see how a LinkedList should behave:
[2]:
from linked_list_solution import *
[3]:
ll = LinkedList()
At the beginning the LinkedList is empty:
[4]:
print(ll)
LinkedList:
NOTE: print calls the __str__ method, which in our implementation was overridden to produce the nice string you’ve just seen. Still, we did not override the __repr__ method, which is the default one used by Jupyter when displaying an object without using print, so if you omit print you won’t get a nice display:
[5]:
ll
[5]:
<linked_list_solution.LinkedList at 0x7f1fec5cf748>
0.2 Growing¶
The main way to grow a LinkedList is by using the .add method, which executes in constant time \(O(1)\):
[6]:
ll.add('a')
Internally, each time you call .add a new Node object is created, which will hold the actual data that you are passing. In this implementation, users of the class are never supposed to get instances of Node: they will only be able to see the actual data contained in the Nodes:
[7]:
print(ll)
LinkedList: a
Notice that .add actually inserts nodes at the beginning:
[8]:
ll.add('b')
[9]:
print(ll)
LinkedList: b,a
[10]:
ll.add('c')
[11]:
print(ll)
LinkedList: c,b,a
Our basic LinkedList instance will only hold a pointer to the first Node of the chain (such pointer is called _next). When you add an element:
a new Node is created
the provided data is stored inside the new node
the new node’s _next field is set to point to the current first Node
the new node becomes the first node of the LinkedList, by setting LinkedList._next to the new node
0.3 Visiting¶
Any method that needs to visit the LinkedList will have to start from the first Node pointed by LinkedList._next and then follow the chain of _next links from one Node to the next one. This is why the data structure is called ‘linked’. While insertion at the beginning is very fast, retrieving an element at an arbitrary position requires a linear scan, which in the worst case costs \(O(n)\).
1 v1: a slow LinkedList¶
Implement the missing methods in linked_list_exercise.py, in the order they are presented in the skeleton. Before implementing, carefully read all of point 1) and its subsections (1.a, b and c).
1.a) Testing¶
You will have two files to look at: the code in linked_list_exercise.py and the test code in a separate linked_list_test.py file:
linked_list_exercise.py
linked_list_test.py
You can run tests with this shell command:
python3 -m unittest linked_list_test
Let’s look inside the first lines of the linked_list_test.py code; you will see a structure like this:
from linked_list_exercise import *
import unittest
class LinkedListTest(unittest.TestCase):
    def myAssert(self, linked_list, python_list):
        ##### etc #####
class AddTest(LinkedListTest):
    def test_01_init(self):
        ##### etc #####
    def test_04_add(self):
        ##### etc #####
class SizeTest(LinkedListTest):
    ##### etc #####
Note:
the test automatically imports everything from the first module, linked_list_exercise, so when you run the test, it automatically loads the file you will be working on:
from linked_list_exercise import *
there is a base class for testing called LinkedListTest
there are many classes for testing individual methods, each class inherits from LinkedListTest
You will be writing several versions of the linked list. For the first one, you won’t need myAssert
This time there is not much Python code to be found around: you should rely solely on the theory from the slides and the book, the method definitions and your intuition
1.b) Differences with the book¶
We don’t assume the list has all different values
We used more pythonic names for properties and methods, so for example the private attribute Node.data was renamed to Node._data and the accessor method Node.getData() was renamed to Node.get_data(). There are nicer ways to handle this kind of getter/setter pairs, called ‘properties’, but we won’t address them here.
In boundary cases like removing a non-existing element we prefer to raise a LookupError with the command
raise LookupError("Some error occurred!")
Raising an exception in such cases is also what regular Python lists do, although they use more specific errors: for example, indexing out of range raises IndexError (a subclass of LookupError), while list.remove on a missing item raises ValueError.
1.c) Please remember…¶
WARNING: Methods of the class LinkedList are never supposed to return instances of Node. If you see them returned in the tests, then you are making some mistake. Users of LinkedList should only be able to access the items inside the Node data fields.
WARNING: Do not use a Python list to hold data inside the data structure. Differently from the CappedStack exercise, here you can only use the Node class. Each Node can hold in its _data field only one element, which is provided by the user of the class, and we don’t care about the type of the value the user gives us (so it can be an int, a float, a string, or even a Python list!)
COMMANDMENT 2: You shall also draw lists on paper: it helps a lot in avoiding mistakes
COMMANDMENT 5: You shall never ever reassign self:
Never ever write horrors such as:
class MyClass:
    def my_method(self, x, y):
        self = {'a': 666}  # since self is a kind of dictionary, you might be tempted to do like this
                           # but to the outside world this will bring no effect.
                           # For example, let's say somebody from outside makes a call like this:
                           #     mc = MyClass()
                           #     mc.my_method(1, 2)
                           # after the call mc will not point to {'a': 666}
        self = ['666']  # self is only supposed to be a sort of dictionary and passed from outside
        self = 6        # self is only supposed to be a sort of dictionary and passed from outside
COMMANDMENT 7: You shall use the return command only if you see return written in the function description!
If there is no return in the function description, the function is intended to return None. In this case you don’t even need to write return None, as Python will do it implicitly for you.
2 v2 faster size¶
2.1 Save a copy of your work¶
You already wrote a lot of code, and you don’t want to lose it, right? Since we are going to make many modifications, when you reach a point where the code does something useful, it is good practice to save a copy of what you have done somewhere, so if you later screw up something, you can always restore the copy.
Copy the whole folder linked-lists into a new folder linked-lists-v1
Add also in the copied folder a separate README.txt file, writing inside the version (like 1.0), the date, and a description of the main features you implemented (for example “Simple linked list, not particularly performant”).
Backing up the work is a form of so-called versioning: there are much better ways to do it (like using git) but we don’t address them here.
WARNING: DO NOT SKIP THIS STEP!
No matter how smart you are, you will fail, and a backup may be the only way out.
WARNING: HAVE YOU READ WHAT I JUST WROTE ????
Just. Copy. The. Folder.
2.2. Improve size¶
Once you have saved your precious work in the copy folder linked-lists-v1, you can more freely improve the current folder linked-lists, being sure your previous efforts are not going to get lost!
As a first step, in linked-lists/linked_list_exercise.py implement a size() method that works in O(1). To make this work without going through the whole list each time, we will need a new _size field that keeps track of the size. When the list is mutated with methods like add, append, etc. you will also need to update the _size field accordingly. Proceed like this:
2.2.1) add a new field _size in the class constructor and initialize it to zero
2.2.2) modify the size() method to just return the _size field.
2.2.3) The data structure starts to be complex, and we need better testing. If you look at the tests, very often there are lines of code like self.assertEquals(to_py(ul), ['a', 'b']) in the test_add method:
def test_add(self):
    ul = LinkedList()
    self.myAssert(ul, [])
    ul.add('b')
    self.assertEquals(to_py(ul), ['b'])
    ul.add('a')
    self.assertEquals(to_py(ul), ['a', 'b'])
The last line checks that our linked list ul contains a sequence of linked nodes that, once transformed to a Python list, actually equals ['a', 'b']. Since in the new implementation we are going to mutate the _size field a lot, it would be smart to also check that ul.size() equals len(['a', 'b']). Repeating this check in every test method could be quite verbose. Instead, we can do a smarter thing, and develop in the LinkedListTest class a new assertion method on our own:
If you noticed, there is a method myAssert in the LinkedListTest class (in the current exercises/linked-lists/linked_list_test.py file) which we never used so far, and which performs a more thorough check:
class LinkedListTest(unittest.TestCase):
    def myAssert(self, linked_list, python_list):
        """ Checks provided linked_list can be represented as the given python_list. Since v2.
        """
        self.assertEquals(to_py(linked_list), python_list)
        # check this new invariant about the size
        self.assertEquals(linked_list.size(), len(python_list))
WARNING: the name of the method myAssert must not start with test, otherwise unittest will run it as a test!
2.2.4) Now, how to use this powerful new myAssert method? In the test class, just replace every occurrence of
self.assertEquals(to_py(ul), ['a', 'b'])
with calls like this:
self.myAssert(ul, ['a', 'b'])
WARNING: Notice that the to_py( ) enclosing ul is gone.
2.2.5) Actually update _size in the various methods where data is mutated, like add, insert, etc.
2.2.6) Run the tests and hope for the best ;-)
python3 -m unittest linked_list_test
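A minimal v2 sketch, assuming a remove(item) method that raises LookupError as described in 1.b (the skeleton’s exact method list may differ). The _head name is illustrative, check is a plain-assert analogue of myAssert, and to_python stands in for the test file’s to_py.

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data = data
        self._next = next_node

    def get_data(self):
        return self._data

    def get_next(self):
        return self._next

    def set_next(self, node):
        self._next = node


class LinkedList:
    def __init__(self):
        self._head = None
        self._size = 0          # 2.2.1: new field, starts at zero

    def size(self):
        return self._size       # 2.2.2: O(1), no scan needed

    def add(self, item):
        self._head = Node(item, self._head)
        self._size += 1         # every mutation must keep _size in sync

    def remove(self, item):
        """Removes the first occurrence of item, raising LookupError if missing."""
        prev, current = None, self._head
        while current is not None:
            if current.get_data() == item:
                if prev is None:
                    self._head = current.get_next()
                else:
                    prev.set_next(current.get_next())
                self._size -= 1
                return
            prev, current = current, current.get_next()
        raise LookupError("Item not found: %r" % (item,))

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out


def check(linked_list, python_list):
    """Plain-assert version of the myAssert idea: contents AND size invariant."""
    assert linked_list.to_python() == python_list
    assert linked_list.size() == len(python_list)


ul = LinkedList()
check(ul, [])
ul.add('b')
ul.add('a')
check(ul, ['a', 'b'])
ul.remove('b')
check(ul, ['a'])
```

If you run the tests on Python 3.12 or newer, note that assertEquals, used in the test snippets above, is a long-deprecated alias that was removed in that release; assertEqual is the supported name.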
3 v3 Faster append¶
We are now better equipped to make further improvements. Once you’re done implementing the above and have made sure everything works, you can implement an append method that works in \(O(1)\) by adding an additional pointer in the data structure that always points at the last node. To further exploit the pointer, you can also add a fast last(self) method that returns the last value in the list. Proceed like this:
3.1 Save a copy of your work¶
Copy the whole folder linked-lists into a new folder linked-lists-v2
Add also in the copied folder a separate README.txt file, writing inside the version (like 2.0), the date, and a description of the main features you implemented (for example “Linked list with size in O(1)”).
WARNING: DO NOT SKIP THIS STEP!
3.2 add _last field¶
Work on linked_list_exercise.py and simply add an additional pointer called _last in the constructor.
3.3 add method skeleton¶
Copy this method last into the class. Just copy it, don’t implement it for now.
def last(self):
    """ Returns the last element in the list, in O(1).
        - If list is empty, raises a ValueError. Since v3.
    """
    raise ValueError("TODO implement me!")
3.4 test driven development¶
Let’s do some so-called test driven development, that is, first we write the tests, then we write the implementation.
WARNING: During the exam you may be asked to write tests, so don’t skip writing them now !!
3.4.1 LastTest¶
Create a class LastTest which inherits from LinkedListTest, and implement a test for the last() method by adding this method to it:
def test_01_last(self):
    raise Exception("TODO IMPLEMENT ME !")
In the method, create a list and add elements using only calls to the add method, checking the results of last() and using the myAssert method. When done, ask your instructor if the test is correct (or look at the proposed solution): it is important you get it right, otherwise you won’t be able to properly test your code.
3.4.2 improve myAssert¶
You already have a test for the append() method, but how can you be sure the _last pointer is updated correctly throughout the code? When you implemented the fast size() method you wrote an invariant in the myAssert method. We can do the same this time, too. Find the invariant and add the corresponding check to the myAssert method. When done, ask your instructor if the invariant is correct (or look at the proposed solution): it is important you get it right, otherwise you won’t be able to properly test your code.
3.5 update methods that mutate the LinkedList¶
Update the methods that mutate the data structure (add, insert, remove, …) so they keep _last pointing to the last element. If the list is empty, _last will point to None. Take particular care of corner cases such as the empty list and the one-element list.
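A sketch of the v3 changes, under the same naming assumption as before (_head for the first-node pointer). The check function shows one possible _last invariant: walking from the head, the final node reached must be exactly the node _last points to (both are None for an empty list).

```python
class Node:
    def __init__(self, data):
        self._data = data
        self._next = None

    def get_data(self):
        return self._data

    def get_next(self):
        return self._next

    def set_next(self, node):
        self._next = node


class LinkedList:
    def __init__(self):
        self._head = None
        self._last = None   # None exactly when the list is empty
        self._size = 0

    def size(self):
        return self._size

    def add(self, item):
        node = Node(item)
        node.set_next(self._head)
        self._head = node
        if self._last is None:      # corner case: the list was empty
            self._last = node
        self._size += 1

    def append(self, item):
        """Inserts item at the end in O(1), thanks to _last."""
        node = Node(item)
        if self._last is None:      # empty list: node is both first and last
            self._head = node
        else:
            self._last.set_next(node)
        self._last = node
        self._size += 1

    def last(self):
        """Returns the last element in O(1); raises ValueError if empty."""
        if self._last is None:
            raise ValueError("Empty list!")
        return self._last.get_data()

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out


def check(linked_list, python_list):
    """Plain-assert analogue of myAssert, with one possible _last invariant."""
    assert linked_list.to_python() == python_list
    assert linked_list.size() == len(python_list)
    node = linked_list._head           # tests may peek at private fields
    while node is not None and node.get_next() is not None:
        node = node.get_next()
    assert linked_list._last is node   # both None when the list is empty


ul = LinkedList()
ul.append('b')
ul.add('a')
ul.append('c')
check(ul, ['a', 'b', 'c'])
print(ul.last())  # c
```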
4 v4 Go bidirectional¶
Our list so far has links that allow us to traverse it fast in one direction. But what if we want fast traversal in the reverse direction, from the last to the first element? What if we want a pop() that works in \(O(1)\)? To speed up these operations we could add backward links to each Node. Note that no solution is provided for this part (yet).
Proceed in the following way:
4.1 Save your work¶
Once you’re done with the previous points, save the version you have in a folder linked-lists-v3 somewhere, adding in the README.txt comments about the improvements done so far, the version number (like 3.0) and the date. Then start working on a new copy.
4.2 Node backlinks¶
In the Node class, add backlinks by adding the attribute _prev and the methods get_prev(self) and set_prev(self, pointer).
4.3 Better str¶
Improve the __str__ method so it shows the presence or absence of links, along with the size of the list (note you might need to adapt the test for the str method):
next pointers presence must be represented with the > character, absence with the * character. They must be put after the item representation.
prev pointers presence must be represented with the < character, absence with the * character. They must be put before the item representation.
For example, for the list ['a','b','c'], you would have the following representation:
LinkedList(size=3):*a><b><c*
As a special case for empty list you should print the following:
LinkedList(size=0):**
Other examples of proper lists, with 3, 2, and 1 element can be:
LinkedList(size=3):*a><b><c*
LinkedList(size=2):*a><b*
LinkedList(size=1):*a*
This new __str__ method should help you spot broken lists like the following, where some pointers are not correct:
Broken list, all prev pointers are missing:
LinkedList(size=3):*a>*b>*c*
Broken list, size = 3 but shows only one element with next pointer set to None:
LinkedList(size=3):*a*
Broken list, first backward pointer points to something other than None:
LinkedList(size=3):<a>*b><c*
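The representation rules can be tried out on hand-built nodes before touching LinkedList. links_str is a hypothetical helper holding the logic a v4 __str__ could use; it receives the first node and the size, which in the real class would come from the list’s own fields.

```python
class Node:
    def __init__(self, data):
        self._data = data
        self._next = None
        self._prev = None  # backlink added in 4.2

    def get_data(self):
        return self._data

    def get_next(self):
        return self._next

    def set_next(self, node):
        self._next = node

    def get_prev(self):
        return self._prev

    def set_prev(self, node):
        self._prev = node


def links_str(head, size):
    """Hypothetical helper: renders the chain starting at head with * < > markers."""
    parts = []
    current = head
    while current is not None:
        before = '<' if current.get_prev() is not None else '*'
        after = '>' if current.get_next() is not None else '*'
        parts.append(before + str(current.get_data()) + after)
        current = current.get_next()
    body = ''.join(parts) if parts else '**'  # special case: empty list
    return "LinkedList(size=%d):%s" % (size, body)


# a correctly linked list a <-> b <-> c
a, b, c = Node('a'), Node('b'), Node('c')
a.set_next(b)
b.set_prev(a)
b.set_next(c)
c.set_prev(b)
print(links_str(a, 3))  # LinkedList(size=3):*a><b><c*
```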
4.4 Modify add¶
Update the LinkedList add method to take into account that you now have backlinks. Take particular care of the boundary cases when the list is empty or has one element, and of the nodes at the head and at the tail of the list.
4.5 Add to_python_reversed¶
Implement the to_python_reversed method with a linear scan by using the newly added backlinks:
def to_python_reversed(self):
    """ Returns a regular Python list with the elements in reverse order,
        from last to first. Since v3. """
    raise Exception("TODO implement me")
Also add this test, and make sure it passes:
def test_to_python_reversed(self):
    ul = LinkedList()
    ul.add('c')
    ul.add('b')
    ul.add('a')
    pr = to_py(ul)
    pr.reverse()  # we are reversing pr with Python's 'reverse()' method
    self.assertEquals(pr, ul.to_python_reversed())
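A sketch of 4.4 and 4.5 together, assuming the list keeps _head, _last and _size fields from the previous versions (the names are illustrative). add must set the backlink of the old first node, and to_python_reversed walks from _last following get_prev.

```python
class Node:
    def __init__(self, data):
        self._data = data
        self._next = None
        self._prev = None

    def get_data(self):
        return self._data

    def get_next(self):
        return self._next

    def set_next(self, node):
        self._next = node

    def get_prev(self):
        return self._prev

    def set_prev(self, node):
        self._prev = node


class LinkedList:
    def __init__(self):
        self._head = None
        self._last = None
        self._size = 0

    def add(self, item):
        """Inserts item at the beginning, in O(1), keeping backlinks consistent."""
        node = Node(item)
        node.set_next(self._head)
        if self._head is None:
            self._last = node          # empty list: the new node is also the last one
        else:
            self._head.set_prev(node)  # old first node now points back to the new one
        self._head = node
        self._size += 1

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def to_python_reversed(self):
        """Returns the elements from last to first, following the backlinks."""
        out, current = [], self._last
        while current is not None:
            out.append(current.get_data())
            current = current.get_prev()
        return out


ul = LinkedList()
for item in ('c', 'b', 'a'):
    ul.add(item)
print(ul.to_python())           # ['a', 'b', 'c']
print(ul.to_python_reversed())  # ['c', 'b', 'a']
```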
4.6 Add invariant¶
By using the method to_python_reversed(), add a new invariant to the myAssert method. If implemented correctly, this will spot many possible errors in the code.
4.7 Modify other methods¶
Modify all other methods that mutate the data structure (insert, remove, etc.) so that they update the backward links properly.
4.8 Run the tests¶
If you wrote meaningful tests and all pass, congrats!
5 EqList¶
Open the file eqlist_exercise.py, which contains a simple linked list, and start editing the following methods.
5.1 eq¶
Implement the method __eq__ (with TWO underscores before and TWO underscores after ‘eq’)!
def __eq__(self, other):
    """ Returns True if self is equal to other, that is, if all the data elements in the respective
        nodes are the same. Otherwise, return False.
        NOTE: compares the *data* in the nodes, NOT the nodes themselves !
    """
Testing: python3 -m unittest eqlist_test.EqTest
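A sketch of __eq__ that walks both chains in parallel, on a minimal Node/LinkedList pair where the _head field and the list-building constructor are assumptions for illustration. The final check matters: lists of different lengths must not compare equal just because one is a prefix of the other.

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def __eq__(self, other):
        """True if both lists hold equal data, in the same order."""
        if not isinstance(other, LinkedList):
            return NotImplemented                   # let Python handle foreign types
        cur1, cur2 = self._head, other._head
        while cur1 is not None and cur2 is not None:
            if cur1.get_data() != cur2.get_data():  # compares data, not Node objects
                return False
            cur1, cur2 = cur1.get_next(), cur2.get_next()
        return cur1 is None and cur2 is None        # equal only if both ended together


print(LinkedList(['a', 'b']) == LinkedList(['a', 'b']))  # True
print(LinkedList(['a', 'b']) == LinkedList(['a']))       # False
```

Defining __eq__ without __hash__ makes Python set __hash__ to None, so instances become unhashable; for a mutable container this is usually what you want.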
5.2 remsub¶
Implement the method remsub:
def remsub(self, rem):
    """ Removes the first elements found in this LinkedList that match subsequence rem
        Parameter rem is the subsequence to eliminate, which is also a LinkedList.
        Examples:
            aabca remsub ac = aba
            aabca remsub cxa = aaba  # when we find a never matching character in rem like 'x' here,
                                     # the rest of rem after 'x' is not considered.
            aabca remsub ba = aac
            aabca remsub a = abca
            abcbab remsub bb = acab
    """
Testing: python3 -m unittest eqlist_test.RemsubTest
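Reading the docstring examples, remsub behaves like a greedy scan: walk this list once, and whenever the current node matches the next still-unmatched element of rem, unlink it and advance in rem. A sketch under that reading, with an assumed _head field and a hypothetical constructor taking a Python iterable:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next
    def set_next(self, node): self._next = node


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def remsub(self, rem):
        rem_node = rem._head             # next element of rem still to be matched
        prev, current = None, self._head
        while current is not None and rem_node is not None:
            if current.get_data() == rem_node.get_data():
                # unlink current; prev stays where it is
                if prev is None:
                    self._head = current.get_next()
                else:
                    prev.set_next(current.get_next())
                rem_node = rem_node.get_next()
            else:
                prev = current
            current = current.get_next()


lst = LinkedList("aabca")
lst.remsub(LinkedList("ac"))
print(''.join(lst.to_python()))  # aba
```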
6 Cloning¶
Start editing the file cloning_exercise.py, which contains a simplified LinkedList.
6.1 rev¶
Implement the method rev(self) that you find in the skeleton and check that the provided tests pass.
Testing: python3 -m unittest cloning_test.RevTest
6.2 clone¶
Implement the method clone(self) that you find in the skeleton and check that the provided tests pass.
Testing: python3 -m unittest cloning_test.CloneTest
7 More exercises¶
Start editing the file more_exercise.py, which contains a simplified LinkedList.
7.1 occurrences¶
Implement this method:
def occurrences(self, item):
    """
    Returns the number of occurrences of item in the list.
    - MUST execute in O(n) where 'n' is the length of the list.
    """
Testing: python3 -m unittest more_test.OccurrencesTest
Examples:
[17]:
from more_solution import *
ul = LinkedList()
ul.add('a')
ul.add('c')
ul.add('b')
ul.add('a')
print(ul)
LinkedList: a,b,c,a
[18]:
print(ul.occurrences('a'))
2
[19]:
print(ul.occurrences('c'))
1
[20]:
print(ul.occurrences('z'))
0
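occurrences is a plain linear scan. A sketch on a stripped-down list, where the _head field and the constructor from an iterable are assumptions, not the skeleton’s API:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def occurrences(self, item):
        """Counts the nodes whose data equals item, in O(n)."""
        count = 0
        current = self._head
        while current is not None:
            if current.get_data() == item:
                count += 1
            current = current.get_next()
        return count


ul = LinkedList(['a', 'b', 'c', 'a'])
print(ul.occurrences('a'))  # 2
```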
7.2 shrink¶
Implement this method in the LinkedList class:
def shrink(self):
    """
    Removes from this LinkedList all nodes at odd indices (1, 3, 5, ...),
    supposing that the first node has index zero, the second node
    has index one, and so on.
    So if the LinkedList is
        'a','b','c','d','e'
    a call to shrink will transform the LinkedList into
        'a','c','e'
    - MUST execute in O(n) where 'n' is the length of the list.
    - Does *not* return anything.
    """
    raise Exception("TODO IMPLEMENT ME!")
Testing: python3 -m unittest more_test.ShrinkTest
[21]:
ul = LinkedList()
ul.add('e')
ul.add('d')
ul.add('c')
ul.add('b')
ul.add('a')
print(ul)
LinkedList: a,b,c,d,e
[22]:
ul.shrink()
print(ul)
LinkedList: a,c,e
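shrink can be done in one pass by letting each kept node skip the node right after it. A sketch with the same kind of assumed minimal classes (the _head field and the constructor from an iterable are illustrative):

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next
    def set_next(self, node): self._next = node


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def shrink(self):
        current = self._head                 # always an even-indexed node
        while current is not None and current.get_next() is not None:
            # skip the odd-indexed node right after current
            current.set_next(current.get_next().get_next())
            current = current.get_next()


ul = LinkedList(['a', 'b', 'c', 'd', 'e'])
ul.shrink()
print(ul.to_python())  # ['a', 'c', 'e']
```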
7.3 dup_first¶
Implement the method dup_first:
def dup_first(self):
    """ MODIFIES this list by adding a duplicate of first node right after it.
        For example, the list 'a','b','c' should become 'a','a','b','c'.
        An empty list remains unmodified.
        - DOES NOT RETURN ANYTHING !!!
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest more_test.DupFirstTest
7.4 dup_all¶
Implement the method dup_all:
def dup_all(self):
    """ Modifies this list by adding a duplicate of each node right after it.
        For example, the list 'a','b','c' should become 'a','a','b','b','c','c'.
        An empty list remains unmodified.
        - MUST PERFORM IN O(n) WHERE n is the length of the list.
        - DOES NOT RETURN ANYTHING !!!
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest more_test.DupAllTest
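Both duplication methods only need to splice a new Node after an existing one. In dup_all the loop must jump over the freshly inserted copy, otherwise it would keep duplicating the copy forever. A sketch with the usual assumed minimal classes:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next
    def set_next(self, node): self._next = node


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def dup_first(self):
        if self._head is not None:           # an empty list stays unmodified
            copy = Node(self._head.get_data(), self._head.get_next())
            self._head.set_next(copy)

    def dup_all(self):
        current = self._head
        while current is not None:
            copy = Node(current.get_data(), current.get_next())
            current.set_next(copy)
            current = copy.get_next()        # jump over the copy, O(n) overall


ul = LinkedList(['a', 'b', 'c'])
ul.dup_all()
print(ul.to_python())  # ['a', 'a', 'b', 'b', 'c', 'c']
```

The copies reference the same data objects as the originals, which is fine for the strings used in the examples.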
7.5 mirror¶
Implement the following mirror function. NOTE: the function is external to the class LinkedList.
def mirror(lst):
    """ Returns a new LinkedList having double the nodes of provided lst
        First nodes will have same elements of lst, following nodes will
        have the same elements but in reversed order.
        For example:
            >>> mirror(['a'])
            LinkedList: a,a
            >>> mirror(['a','b'])
            LinkedList: a,b,b,a
            >>> mirror(['a','c','b'])
            LinkedList: a,c,b,b,c,a
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest more_test.MirrorTest
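Since add prepends, mirror can be built with add alone: adding the items of lst leaves them reversed, and then adding them again from last to first puts lst, in order, in front. The sketch assumes lst is a regular Python list, as in the docstring examples, and a minimal LinkedList whose __str__ mimics the one shown there:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self):
        self._head = None

    def add(self, item):
        self._head = Node(item, self._head)   # O(1) insertion at the front

    def __str__(self):
        items, current = [], self._head
        while current is not None:
            items.append(str(current.get_data()))
            current = current.get_next()
        return "LinkedList: " + ",".join(items)


def mirror(lst):
    result = LinkedList()
    for x in lst:
        result.add(x)        # result now holds lst reversed
    for x in reversed(lst):
        result.add(x)        # prepending from last to first puts lst, in order, in front
    return result


print(mirror(['a', 'c', 'b']))  # LinkedList: a,c,b,b,c,a
```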
7.6 norep¶
Implement the method norep:
def norep(self):
    """ MODIFIES this list by removing all the consecutive
        repetitions from it.
        - MUST perform in O(n), where n is the list size.
        For example, after calling norep:
            'a','a','b','c','c','c' will become 'a','b','c'
            'a','a','b','a' will become 'a','b','a'
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest more_test.NorepTest
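norep needs a single pointer: when the next node repeats the current data, unlink it and stay put, otherwise move forward. A sketch with the usual assumed minimal classes (the _head field and the constructor from an iterable are illustrative):

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next
    def set_next(self, node): self._next = node


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def norep(self):
        current = self._head
        while current is not None and current.get_next() is not None:
            if current.get_data() == current.get_next().get_data():
                current.set_next(current.get_next().get_next())  # drop the repeat, stay here
            else:
                current = current.get_next()


ul = LinkedList(['a', 'a', 'b', 'c', 'c', 'c'])
ul.norep()
print(ul.to_python())  # ['a', 'b', 'c']
```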
7.8 find_couple¶
Implement the following find_couple method.
def find_couple(self, a, b):
    """ Search the list for the first two consecutive elements having data equal to
        provided a and b, respectively. If such elements are found, the position
        of the first one is returned, otherwise raises LookupError.
        - MUST run in O(n), where n is the size of the list.
        - Returned indices start from 0 included
    """
Testing: python3 -m unittest more_test.FindCoupleTest
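find_couple looks at each node together with its successor while counting positions. A sketch with the assumed minimal classes (the _head field and the constructor from an iterable are illustrative):

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def find_couple(self, a, b):
        index = 0
        current = self._head
        while current is not None and current.get_next() is not None:
            if current.get_data() == a and current.get_next().get_data() == b:
                return index                 # position of the first element of the couple
            current = current.get_next()
            index += 1
        raise LookupError("Couple not found: %r, %r" % (a, b))


print(LinkedList(['x', 'a', 'b', 'a', 'b']).find_couple('a', 'b'))  # 1
```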
7.9 swap¶
Implement the method swap:
def swap(self, i, j):
    """
    Swap the data of nodes at index i and j. Indices start from 0 included.
    If any of the indices is out of bounds, raises IndexError.
    NOTE: You MUST implement this function with a single scan of the list.
    """
Testing: python3 -m unittest more_test.SwapTest
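For swap, a single scan can remember the nodes at i and j as it passes them, then exchange their data afterwards. A sketch with the assumed minimal classes (Node.set_data, the _head field and the constructor from an iterable are assumptions):

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def set_data(self, data): self._data = data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def swap(self, i, j):
        node_i = node_j = None
        index = 0
        current = self._head
        while current is not None:           # single scan, remembering both nodes
            if index == i:
                node_i = current
            if index == j:
                node_j = current
            current = current.get_next()
            index += 1
        if node_i is None or node_j is None:  # also covers negative indices
            raise IndexError("Index out of bounds: %s, %s" % (i, j))
        tmp = node_i.get_data()
        node_i.set_data(node_j.get_data())
        node_j.set_data(tmp)


ul = LinkedList(['a', 'b', 'c', 'd'])
ul.swap(0, 3)
print(ul.to_python())  # ['d', 'b', 'c', 'a']
```

The loop could also stop as soon as index exceeds max(i, j); this version keeps scanning for simplicity, which is still a single pass.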
7.10 gaps¶
Given a linked list \(L\) of size \(n\) which only contains integers, a gap is an index \(i\), \(0<i<n\), such that \(L[i-1]<L[i]\). For the purpose of this exercise, we assume an empty list or a list with one element has zero gaps.
Example:
data: 9 7 6 8 9 2 2 5
index: 0 1 2 3 4 5 6 7
contains three gaps [3,4,7] because:
number 8 at index 3 is greater than previous number 6 at index 2
number 9 at index 4 is greater than previous number 8 at index 3
number 5 at index 7 is greater than previous number 2 at index 6
Implement this method:
def gaps(self):
    """ Assuming all the data in the linked list is made by numbers,
        finds the gaps in the LinkedList and return them as a Python list.
        - we assume empty list and list of one element have zero gaps
        - MUST perform in O(n) where n is the length of the list
        NOTE: gaps to return are *indices*, *not* data!!!!
    """
Testing: python3 -m unittest more_test.GapsTest
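gaps compares each node with the previous one while tracking the index. A sketch with the assumed minimal classes (the _head field and the constructor from an iterable are illustrative), checked against the example above:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def gaps(self):
        result = []
        if self._head is None:
            return result                    # empty list: zero gaps
        prev, current, index = self._head, self._head.get_next(), 1
        while current is not None:
            if prev.get_data() < current.get_data():
                result.append(index)         # store the index, not the data
            prev, current = current, current.get_next()
            index += 1
        return result


print(LinkedList([9, 7, 6, 8, 9, 2, 2, 5]).gaps())  # [3, 4, 7]
```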
7.11 flatv¶
Suppose a LinkedList only contains integer numbers, say 3,8,8,7,5,8,6,3,9. Implement the method flatv, which scans the list: when it finds the first occurrence of a node containing a number which is less than the previous one and less than the successive one, it inserts after the current node another node with the same data as the current one, and exits.
Example:
for the linked list 3,8,8,7,5,8,6,3,9
calling flatv should modify the linked list so that it becomes
3,8,8,7,5,5,8,6,3,9
Note that it only modifies the first occurrence found, turning 7,5,8 into 7,5,5,8, while the successive sequence 6,3,9 is not altered.
Implement this method:
def flatv(self):
Testing: python3 -m unittest more_test.FlatvTest
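flatv needs a window of three consecutive nodes. A sketch with the assumed minimal classes (the _head field and the constructor from an iterable are illustrative); it returns None, since the description does not mention a return value:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next
    def set_next(self, node): self._next = node


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def flatv(self):
        if self._head is None:
            return
        prev, current = self._head, self._head.get_next()
        while current is not None and current.get_next() is not None:
            nxt = current.get_next()
            d = current.get_data()
            if d < prev.get_data() and d < nxt.get_data():
                current.set_next(Node(d, nxt))  # duplicate the first "valley" and exit
                return
            prev, current = current, nxt


ul = LinkedList([3, 8, 8, 7, 5, 8, 6, 3, 9])
ul.flatv()
print(ul.to_python())  # [3, 8, 8, 7, 5, 5, 8, 6, 3, 9]
```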
7.12 bubble_sort¶
You will implement bubble sort on a LinkedList.
def bubble_sort(self):
    """ Sorts in-place this linked list using the method of bubble sort
        - MUST execute in O(n^2) where n is the length of the linked list
    """
As a reference, you can look at the example_bubble implementation below, which operates on regular Python lists. Basically, you will have to translate the for cycles into two suitable while cycles and use node pointers.
NOTE: this version of the algorithm is inefficient as we do not use j in the inner loop: your linked list implementation can have this inefficiency as well.
Testing: python3 -m unittest more_test.BubbleSortTest
[23]:
def example_bubble(plist):
    for j in range(len(plist)):
        for i in range(len(plist)):
            if i + 1 < len(plist) and plist[i] > plist[i+1]:
                temp = plist[i]
                plist[i] = plist[i+1]
                plist[i+1] = temp
my_list = [23, 34, 55, 32, 7777, 98, 3, 2, 1]
example_bubble(my_list)
print(my_list)
[1, 2, 3, 23, 32, 34, 55, 98, 7777]
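A possible translation of example_bubble to node pointers, with the assumed minimal classes (the _head field, Node.set_data and the constructor from an iterable are illustrative). The outer while plays the role of for j (and, as in the reference, never uses it), the inner one walks adjacent pairs. This sketch swaps the data inside nodes rather than relinking the nodes:

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def set_data(self, data): self._data = data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def bubble_sort(self):
        outer = self._head
        while outer is not None:                 # like 'for j': runs n times
            current = self._head
            while current is not None and current.get_next() is not None:  # like 'for i'
                nxt = current.get_next()
                if current.get_data() > nxt.get_data():
                    tmp = current.get_data()
                    current.set_data(nxt.get_data())
                    nxt.set_data(tmp)
                current = nxt
            outer = outer.get_next()


ul = LinkedList([23, 34, 55, 32, 7777, 98, 3, 2, 1])
ul.bubble_sort()
print(ul.to_python())  # [1, 2, 3, 23, 32, 34, 55, 98, 7777]
```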
7.13 merge¶
Implement this method:
def merge(self, l2):
    """ Assumes this linkedlist and l2 linkedlist contain integer numbers
        sorted in ASCENDING order, and RETURN a NEW LinkedList with
        all the numbers from this and l2 sorted in DESCENDING order
        IMPORTANT 1: *MUST* EXECUTE IN O(n1+n2) TIME where n1 and n2 are
                     the sizes of this and l2 linked_list, respectively
        IMPORTANT 2: *DO NOT* attempt to convert linked lists to
                     python lists!
    """
Testing: python3 -m unittest more_test.MergeTest
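merge can exploit the fact that add prepends: consuming both ascending inputs in ascending order (a classic two-pointer merge) and adding each value to the front produces a descending result in O(n1 + n2). A sketch with the assumed minimal classes (the _head field and the constructor from an iterable are illustrative; to_python is only used to display the result):

```python
class Node:
    def __init__(self, data, next_node=None):
        self._data, self._next = data, next_node
    def get_data(self): return self._data
    def get_next(self): return self._next


class LinkedList:
    def __init__(self, items=()):
        # hypothetical convenience constructor: chains the items, first one at the head
        self._head = None
        for x in reversed(list(items)):
            self._head = Node(x, self._head)

    def add(self, item):
        self._head = Node(item, self._head)

    def to_python(self):
        out, current = [], self._head
        while current is not None:
            out.append(current.get_data())
            current = current.get_next()
        return out

    def merge(self, l2):
        result = LinkedList()
        cur1, cur2 = self._head, l2._head
        while cur1 is not None or cur2 is not None:
            # take the smaller head; add() prepends, so the result ends up descending
            if cur2 is None or (cur1 is not None and cur1.get_data() <= cur2.get_data()):
                result.add(cur1.get_data())
                cur1 = cur1.get_next()
            else:
                result.add(cur2.get_data())
                cur2 = cur2.get_next()
        return result


print(LinkedList([1, 3, 5]).merge(LinkedList([2, 3, 8])).to_python())  # [8, 5, 3, 3, 2, 1]
```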
Stacks¶
0. Introduction¶
References¶
and following sections :
Balanced Symbols - a General Case
What to do¶
Unzip the exercises into a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|-stacks
|- stacks.ipynb
|- capped_stack_exercise.py
|- capped_stack_solution.py
|- capped_stack_test.py
|- ...
Open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm): you will edit the files ending in _exercise.py
Go on reading this notebook, and follow the instructions inside.
1. CappedStack¶
You will try to implement a so-called capped stack, which has a limit called cap over which elements are discarded.
Your internal implementation will use Python lists
Please name internal variables that you don’t want to expose to class users by prepending them with one underscore '_', like _elements or _cap
The underscore is just a convention: class users will still be able to get internal variables by accessing them with field accessors like mystack._elements
If users manipulate private fields and complain something is not working, you can tell them it’s their fault!
Try to write robust code. In general, when implementing code in the real world you might need to think more about boundary cases. In this case, we add the additional constraint that if you pass a negative or zero cap to the stack, your class initialization is expected to fail and raise a ValueError.
For easier inspection of the stack, also implement a __str__ method so that calls to print show text like CappedStack: cap=4 elements=['a', 'b']
IMPORTANT: you can exploit any Python feature you deem correct to implement the data structure. For example, internally you could represent the elements as a list , and use its own methods to grow it.
QUESTION: If we already have Python lists that can more or less do the job of the stack, why do we need to wrap them inside a Stack? Can’t we just give our users a Python list?
QUESTION: When would you not use a Python list to hold the data in the stack?
Notice that:
We tried to use pythonic names for methods, so for example isEmpty was renamed to is_empty
In this case, when this stack is required to pop or peek but it is found to be empty, an IndexError is raised
CappedStack Examples¶
To get an idea of the class to be made, in the terminal you may run the python interpreter and load the solution module like we are doing here:
[2]:
from capped_stack_solution import *
[3]:
s = CappedStack(2)
[4]:
print(s)
CappedStack: cap=2 elements=[]
[5]:
s.push('a')
[6]:
print(s)
CappedStack: cap=2 elements=['a']
[7]:
s.peek()
[7]:
'a'
[8]:
s.push('b')
[9]:
s.peek()
[9]:
'b'
[10]:
print(s)
CappedStack: cap=2 elements=['a', 'b']
[11]:
s.peek()
[11]:
'b'
[12]:
s.push('c') # exceeds cap, gets silently discarded
[13]:
print(s) # no c here ...
CappedStack: cap=2 elements=['a', 'b']
[14]:
s.pop()
[14]:
'b'
[15]:
print(s)
CappedStack: cap=2 elements=['a']
[16]:
s.pop()
[16]:
'a'
s.pop() # can't pop empty stack
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-41-c88c8c48122b> in <module>()
----> 1 s.pop()
~/Da/prj/datasciprolab/prj/exercises/stacks/capped_stack_solution.py in pop(self)
63 #jupman-raise
64 if len(self._elements) == 0:
---> 65 raise IndexError("Empty stack !")
66 else:
67 return self._elements.pop()
IndexError: Empty stack !
s.peek() # can't peek empty stack
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-18-f056e7e54f5d> in <module>()
----> 1 s.peek()
~/Da/prj/datasciprolab/prj/exercises/stacks/capped_stack_solution.py in peek(self)
77 #jupman-raise
78 if len(self._elements) == 0:
---> 79 raise IndexError("Empty stack !")
80
81 return self._elements[-1]
IndexError: Empty stack !
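A compact CappedStack sketch consistent with the session above. It assumes cap is exposed as a method cap() (the test names suggest so, but check the skeleton) and uses the end of the Python list as the top of the stack:

```python
class CappedStack:
    def __init__(self, cap):
        if cap <= 0:
            raise ValueError("Cap must be positive, got %s" % cap)
        self._cap = cap
        self._elements = []     # a Python list: its end is the top of the stack

    def cap(self):
        return self._cap

    def size(self):
        return len(self._elements)

    def is_empty(self):
        return len(self._elements) == 0

    def push(self, item):
        if len(self._elements) < self._cap:   # over the cap: silently discarded
            self._elements.append(item)

    def peek(self):
        if self.is_empty():
            raise IndexError("Empty stack !")
        return self._elements[-1]

    def pop(self):
        if self.is_empty():
            raise IndexError("Empty stack !")
        return self._elements.pop()

    def __str__(self):
        return "CappedStack: cap=%s elements=%s" % (self._cap, self._elements)


s = CappedStack(2)
s.push('a')
s.push('b')
s.push('c')   # exceeds cap, gets silently discarded
print(s)      # CappedStack: cap=2 elements=['a', 'b']
```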
Capped Stack basic methods¶
Now open capped_stack_exercise.py and start implementing the methods in the order you find them.
All basic methods are grouped within the CappedStackTest class: to execute single tests you can put the test method name after the test class name, see the examples below.
1.1 __init__¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_01_init
1.2 cap¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_02_cap
1.3 size¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_03_size
1.4 __str__¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_04_str
1.5 is_empty¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_05_is_empty
1.6 push¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_06_push
1.7 peek¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_07_peek
1.8 pop¶
Test: python3 -m unittest capped_stack_test.CappedStackTest.test_08_pop
1.9 peekn¶
Implement the peekn method:
def peekn(self, n):
    """
    RETURN a list with the n top elements, in the order in which they
    were pushed. For example, if the stack is the following:
        e
        d
        c
        b
        a
    peekn(3) will return the list ['c','d','e']
    - If there aren't enough elements to peek, raises IndexError
    - If n is negative, raises an IndexError
    """
    raise Exception("TODO IMPLEMENT ME!")
Test: python3 -m unittest capped_stack_test.PeeknTest
1.10 popn¶
Implement the popn method:
def popn(self, n):
    """ Pops the top n elements, and RETURN them as a list, in the order in
        which they were pushed. For example, with the following stack:
            e
            d
            c
            b
            a
        popn(3)
        will give back ['c','d','e'], and stack will become:
            b
            a
        - If there aren't enough elements to pop, raises an IndexError
        - If n is negative, raises an IndexError
    """
Test: python3 -m unittest capped_stack_test.PopnTest
1.11 set_cap¶
Implement the set_cap method:
def set_cap(self, cap):
    """ MODIFIES the cap, setting its value to the provided cap.
        If the cap is less than the stack size, all the elements above
        the cap are removed from the stack.
        If cap < 1, raises an IndexError
        Does *not* return anything!
        For example, with the following stack, and cap at position 7:
            cap ->  7
                    6
                    5  e
                    4  d
                    3  c
                    2  b
                    1  a
        calling method set_cap(3) will change the stack to this:
            cap ->  3  c
                    2  b
                    1  a
    """
Test: python3 -m unittest capped_stack_test.SetCapTest
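The three advanced methods are mostly list slicing. The hypothetical helpers below work on a plain list whose end is the top of the stack; inside the class they would read self._elements, and set_cap would also store the new value in self._cap. Note the explicit start index len(elements) - n: writing elements[-n:] would break for n = 0, since elements[-0:] is the whole list.

```python
def peekn(elements, n):
    """Hypothetical helper: the top n items, in push order."""
    if n < 0 or n > len(elements):
        raise IndexError("Can't peek %s elements" % n)
    return elements[len(elements) - n:]


def popn(elements, n):
    """Hypothetical helper: removes and returns the top n items, in push order."""
    result = peekn(elements, n)          # same checks and same slice
    del elements[len(elements) - n:]     # then drop those items in place
    return result


def cut_to_cap(elements, cap):
    """Hypothetical helper for set_cap: drops the items above the new cap."""
    if cap < 1:
        raise IndexError("Cap must be at least 1, got %s" % cap)
    del elements[cap:]                   # no-op when cap >= len(elements)


stack = ['a', 'b', 'c', 'd', 'e']
print(peekn(stack, 3))  # ['c', 'd', 'e']
print(popn(stack, 3))   # ['c', 'd', 'e']
print(stack)            # ['a', 'b']
```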
2. SortedStack¶
You are given a class SortedStack that models a simple stack. This stack is similar to the CappedStack you already saw, the differences being:
it can only contain integers: trying to put other types of values will raise a ValueError
integers must be inserted sorted in the stack, either ascending or descending
there is no cap
Example:
Ascending: Descending
8 3
5 5
3 8
[17]:
from sorted_stack_solution import *
To create a SortedStack sorted in ascending order, just call it passing True:
[18]:
s = SortedStack(True)
print(s)
SortedStack (ascending): elements=[]
[19]:
s.push(5)
print(s)
SortedStack (ascending): elements=[5]
[20]:
s.push(7)
print(s)
SortedStack (ascending): elements=[5, 7]
[21]:
print(s.pop())
7
[22]:
print(s)
SortedStack (ascending): elements=[5]
[23]:
print(s.pop())
5
[24]:
print(s)
SortedStack (ascending): elements=[]
For descending order, pass False when you create it:
[25]:
sd = SortedStack(False)
sd.push(7)
sd.push(5)
sd.push(4)
print(sd)
SortedStack (descending): elements=[7, 5, 4]
2.1 transfer¶
Now implement the transfer function.
NOTE: the function is external to class SortedStack, so you must NOT access fields which begin with an underscore (like _elements), which are meant to be private!!
def transfer(s):
    """ Takes as input a SortedStack s (either ascending or descending) and
    returns a new SortedStack with the same elements of s, but in reverse order.
    At the end of the call s will be empty.
    Example:
        s     result
        2       5
        3       3
        5       2
    """
    raise Exception("TODO IMPLEMENT ME !!")
Testing
Once done, running this will run only the tests in the TransferTest class, and hopefully they will pass.
Notice that sorted_stack_test is followed by a dot and the test class name .TransferTest:
python -m unittest sorted_stack_test.TransferTest
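A sketch of transfer that touches only public methods. It relies on a hypothetical minimal SortedStackSketch, and the method names `is_ascending()`, `is_empty()`, `push()` and `pop()` are assumptions, so check the real class for the names it actually provides. The key observation: popping a sorted stack yields its elements from the top down, which is exactly the order a stack with the opposite ordering accepts them in.

```python
class SortedStackSketch:
    """Hypothetical minimal sorted stack of ints (top at the end of _elements)."""

    def __init__(self, ascending):
        self._ascending = ascending
        self._elements = []

    def is_ascending(self):
        return self._ascending

    def is_empty(self):
        return len(self._elements) == 0

    def push(self, item):
        if type(item) is not int:
            raise ValueError("Only integers allowed, got %r" % (item,))
        if self._elements:
            top = self._elements[-1]
            if (self._ascending and item < top) or (not self._ascending and item > top):
                raise ValueError("Pushing %s would break the ordering" % item)
        self._elements.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._elements.pop()


def transfer(s):
    """Empty s into a new stack with the opposite ordering (public methods only)."""
    result = SortedStackSketch(not s.is_ascending())
    while not s.is_empty():
        result.push(s.pop())   # top-down order fits the opposite ordering
    return result


s = SortedStackSketch(False)   # descending: 5 at the bottom, 2 on top
for x in [5, 3, 2]:
    s.push(x)
result = transfer(s)
print(result.is_ascending(), result._elements)  # True [2, 3, 5]
```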
2.2 merge¶
Implement the following merge function. NOTE: the function is external to class SortedStack.
def merge(s1,s2):
    """ Takes as input two SortedStacks having both ascending order,
    and returns a new SortedStack sorted in descending order, which will be the sorted merge
    of the two input stacks. MUST run in O(n1 + n2) time, where n1 and n2 are s1 and s2 sizes.
    If input stacks are not both ascending, raises ValueError.
    At the end of the call the input stacks will be empty.
    Example:
        s1 (asc)   s2 (asc)   result (desc)
           5          7            2
           4          3            3
           2                       4
                                   5
                                   7
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python -m unittest sorted_stack_test.MergeTest
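The O(n1 + n2) bound comes from the fact that the top of an ascending stack is its maximum: repeatedly moving the larger of the two tops onto the result builds a stack whose bottom is the largest value, that is, a descending stack. The sketch below uses plain Python lists as stand-ins for the SortedStacks (top at the end) and omits the check that both inputs are ascending:

```python
def merge(s1, s2):
    """Merge two ascending stacks (plain lists, top at the end) into a new
    descending one, emptying the inputs. Each element moves once: O(n1 + n2)."""
    result = []
    while s1 and s2:
        # always move the larger of the two tops
        if s1[-1] >= s2[-1]:
            result.append(s1.pop())
        else:
            result.append(s2.pop())
    # at most one input still has elements, all <= everything already moved
    while s1:
        result.append(s1.pop())
    while s2:
        result.append(s2.pop())
    return result


s1 = [2, 4, 5]   # top is 5
s2 = [3, 7]      # top is 7
print(merge(s1, s2))  # [7, 5, 4, 3, 2]: the top is 2, so the stack is descending
print(s1, s2)         # [] []
```

With the real class the same loop would compare `peek()`-style tops and push into a `SortedStack(False)`, which also re-validates the ordering on every push.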
3. WStack¶
Using a text editor, open file wstack_exercise.py. You will find a WStack class skeleton which represents a simple stack that can only contain integers.
3.1 implement class WStack¶
Fill in the missing methods in class WStack in the order they are presented, so as to have a .weight() method that returns the total sum of integers in the stack in O(1) time.
Example:
[26]:
from wstack_solution import *
[27]:
s = WStack()
[28]:
print(s)
WStack: weight=0 elements=[]
[29]:
s.push(7)
[30]:
print(s)
WStack: weight=7 elements=[7]
[31]:
s.push(4)
[32]:
print(s)
WStack: weight=11 elements=[7, 4]
[33]:
s.push(2)
[34]:
s.pop()
[34]:
2
[35]:
print(s)
WStack: weight=11 elements=[7, 4]
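The O(1) requirement rules out summing the list inside weight(); instead a running total is updated on every push and pop. A minimal sketch, where WStackSketch and its field names are hypothetical and raising ValueError for non-integers is an assumption about what the skeleton expects:

```python
class WStackSketch:
    """Hypothetical stand-in for WStack: a stack of ints whose total weight
    is kept in a field, so weight() is O(1)."""

    def __init__(self):
        self._elements = []
        self._weight = 0

    def push(self, item):
        if type(item) is not int:
            raise ValueError("Only integers allowed, got %r" % (item,))
        self._elements.append(item)
        self._weight += item          # update the running total on every push

    def pop(self):
        if not self._elements:
            raise IndexError("pop from empty stack")
        item = self._elements.pop()
        self._weight -= item          # ... and on every pop
        return item

    def weight(self):
        return self._weight           # no scan of the list needed

    def size(self):
        return len(self._elements)

    def is_empty(self):
        return len(self._elements) == 0

    def __str__(self):
        return "WStack: weight=%s elements=%s" % (self._weight, self._elements)


s = WStackSketch()
s.push(7)
s.push(4)
s.push(2)
s.pop()
print(s)  # WStack: weight=11 elements=[7, 4]
```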
3.2 accumulate¶
Implement function accumulate:
def accumulate(stack1, stack2, min_amount):
    """ Pushes on stack2 elements taken from stack1 until the weight of
    stack2 equals or exceeds the given min_amount
    - if the given min_amount cannot possibly be reached because
      stack1 does not have enough weight, raises ValueError early,
      without changing stack1.
    - DO NOT access internal fields of stacks, only use class methods.
    - MUST perform in O(n) where n is the size of stack1
    - NOTE: this function is defined *outside* the class !
    """
Testing: python -m unittest wstack_test.AccumulateTest
Example:
[36]:
s1 = WStack()
print(s1)
WStack: weight=0 elements=[]
[37]:
s1.push(2)
s1.push(9)
s1.push(5)
s1.push(3)
[38]:
print(s1)
WStack: weight=19 elements=[2, 9, 5, 3]
[39]:
s2 = WStack()
print(s2)
WStack: weight=0 elements=[]
[40]:
s2.push(1)
s2.push(7)
s2.push(4)
[41]:
print(s2)
WStack: weight=12 elements=[1, 7, 4]
[42]:
# attempts to reach in s2 a weight of at least 17
[43]:
accumulate(s1,s2,17)
[44]:
print(s1)
WStack: weight=11 elements=[2, 9]
Two top elements were taken from s1 and now s2 has a weight of 20, which is >= 17
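A sketch of accumulate using only push, pop and weight. Because weight() is O(1), the "unreachable" check can be done up front without touching stack1. The sketch assumes the stacks hold non-negative integers (otherwise moving elements could lower stack2's weight) and uses a hypothetical minimal WStackSketch in place of the real class:

```python
class WStackSketch:
    """Hypothetical minimal WStack: push, pop and an O(1) weight()."""

    def __init__(self):
        self._elements = []
        self._weight = 0

    def push(self, item):
        self._elements.append(item)
        self._weight += item

    def pop(self):
        item = self._elements.pop()
        self._weight -= item
        return item

    def weight(self):
        return self._weight


def accumulate(stack1, stack2, min_amount):
    """Move elements from the top of stack1 onto stack2 until
    stack2.weight() >= min_amount. Assumes non-negative integers."""
    # weight() is O(1), so this early check is cheap and changes nothing
    if stack1.weight() + stack2.weight() < min_amount:
        raise ValueError("Can't reach %s: stack1 weighs only %s"
                         % (min_amount, stack1.weight()))
    # each iteration moves one element, so at most len(stack1) steps: O(n)
    while stack2.weight() < min_amount:
        stack2.push(stack1.pop())


s1 = WStackSketch()
for x in [2, 9, 5, 3]:
    s1.push(x)
s2 = WStackSketch()
for x in [1, 7, 4]:
    s2.push(x)
accumulate(s1, s2, 17)
# fields read here only for display
print(s1._elements, s2._elements, s2.weight())  # [2, 9] [1, 7, 4, 3, 5] 20
```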
4. Backpack¶
Open a text editor and edit file backpack_solution.py
We can model a backpack as a stack of elements, each being a tuple with a name and a weight.
A sensible strategy to fill a backpack is to place the heaviest elements at the bottom, so our backpack will allow pushing an element only if its weight is less than or equal to the weight of the current topmost element.
The backpack also has a maximum weight: you can put in as many items as you want, as long as the maximum weight is not exceeded.
Example
[45]:
from backpack_solution import *
bp = Backpack(30) # max_weight = 30
bp.push('a',10) # item 'a' with weight 10
DEBUG: Pushing (a,10)
[46]:
print(bp)
Backpack: weight=10 max_weight=30
elements=[('a', 10)]
[47]:
bp.push('b',8)
DEBUG: Pushing (b,8)
[48]:
print(bp)
Backpack: weight=18 max_weight=30
elements=[('a', 10), ('b', 8)]
>>> bp.push('c', 11)
DEBUG: Pushing (c,11)
ValueError: ('Pushing weight greater than top element weight! %s > %s', (11, 8))
[49]:
bp.push('c', 7)
DEBUG: Pushing (c,7)
[50]:
print(bp)
Backpack: weight=25 max_weight=30
elements=[('a', 10), ('b', 8), ('c', 7)]
>>> bp.push('d', 6)
DEBUG: Pushing (d,6)
ValueError: Can't exceed max_weight ! (31 > 30)
4.1 class¶
✪✪ Implement the methods in the class Backpack, in the order they are shown. If you want, you can add debug prints by calling the debug function.
IMPORTANT: the data structure should provide the total current weight in O(1), so make sure to add and update an appropriate field to meet this constraint.
Testing: python3 -m unittest backpack_test.BackpackTest
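A sketch of the class under the constraints above: a `_weight` field kept up to date on every push and pop gives O(1) total weight. The class name, the method set (`weight`, `max_weight`, `size`, `is_empty`, `peek`, `push`, `pop`) and raising IndexError on an empty backpack are assumptions; follow the skeleton in the exercise file for the real ones. Debug prints are omitted.

```python
class BackpackSketch:
    """Hypothetical stand-in: a stack of (name, weight) tuples with
    non-increasing weights from bottom to top and a total weight limit."""

    def __init__(self, max_weight):
        if max_weight <= 0:
            raise ValueError("max_weight must be positive, got %s" % max_weight)
        self._elements = []
        self._max_weight = max_weight
        self._weight = 0          # running total, so weight() is O(1)

    def weight(self):
        return self._weight

    def max_weight(self):
        return self._max_weight

    def size(self):
        return len(self._elements)

    def is_empty(self):
        return len(self._elements) == 0

    def peek(self):
        if self.is_empty():
            raise IndexError("empty backpack")
        return self._elements[-1]

    def push(self, name, w):
        # heavier items must stay below lighter ones
        if self._elements and w > self._elements[-1][1]:
            raise ValueError("Pushing weight greater than top element weight! %s > %s"
                             % (w, self._elements[-1][1]))
        if self._weight + w > self._max_weight:
            raise ValueError("Can't exceed max_weight ! (%s > %s)"
                             % (self._weight + w, self._max_weight))
        self._elements.append((name, w))
        self._weight += w

    def pop(self):
        if self.is_empty():
            raise IndexError("empty backpack")
        item = self._elements.pop()
        self._weight -= item[1]
        return item
```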
4.2 remove¶
✪✪ Implement function remove:
# NOTE: this function is implemented *outside* the class !
def remove(backpack, el):
    """
    Remove topmost occurrence of el found in the backpack,
    and RETURN it (as a tuple name, weight)
    - if el is not found, raises ValueError
    - DO *NOT* DIRECTLY ACCESS FIELDS OF BACKPACK !!!
      Instead, just call methods of the class!
    - MUST perform in O(n), where n is the backpack size
    - HINT: To remove el, you need to call Backpack.pop() until
      the top element is what you are looking for. You need
      to save somewhere the popped items except the one to
      remove, and then push them back again.
    """
Testing: python3 -m unittest backpack_test.RemoveTest
Example:
[51]:
bp = Backpack(50)
bp.push('a',9)
bp.push('b',8)
bp.push('c',8)
bp.push('b',8)
bp.push('d',7)
bp.push('e',5)
bp.push('f',2)
DEBUG: Pushing (a,9)
DEBUG: Pushing (b,8)
DEBUG: Pushing (c,8)
DEBUG: Pushing (b,8)
DEBUG: Pushing (d,7)
DEBUG: Pushing (e,5)
DEBUG: Pushing (f,2)
[52]:
print(bp)
Backpack: weight=47 max_weight=50
elements=[('a', 9), ('b', 8), ('c', 8), ('b', 8), ('d', 7), ('e', 5), ('f', 2)]
[53]:
remove(bp, 'b')
DEBUG: Popping ('f', 2)
DEBUG: Popping ('e', 5)
DEBUG: Popping ('d', 7)
DEBUG: Popping ('b', 8)
DEBUG: Pushing (d,7)
DEBUG: Pushing (e,5)
DEBUG: Pushing (f,2)
[53]:
('b', 8)
[54]:
print(bp)
Backpack: weight=39 max_weight=50
elements=[('a', 9), ('b', 8), ('c', 8), ('d', 7), ('e', 5), ('f', 2)]
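The hint above can be sketched as follows. The popped items are saved in an ordinary Python list and pushed back in reverse order, which restores their original relative order; since they were lighter than or equal to the removed item, they still satisfy the non-increasing constraint. If el is missing, this sketch restores the backpack before raising (our choice). BackpackSketch is a hypothetical minimal stand-in for the real class:

```python
class BackpackSketch:
    """Minimal hypothetical stand-in for the exercise's Backpack."""

    def __init__(self, max_weight):
        self._elements = []
        self._max_weight = max_weight
        self._weight = 0

    def is_empty(self):
        return not self._elements

    def push(self, name, w):
        if self._elements and w > self._elements[-1][1]:
            raise ValueError("weight %s greater than top weight" % w)
        if self._weight + w > self._max_weight:
            raise ValueError("can't exceed max_weight")
        self._elements.append((name, w))
        self._weight += w

    def pop(self):
        name, w = self._elements.pop()
        self._weight -= w
        return (name, w)


def remove(backpack, el):
    """Remove the topmost item named el and return it as (name, weight),
    using only public methods. Raises ValueError if el is absent. O(n)."""
    saved = []                 # items popped while searching, top first
    found = None
    while not backpack.is_empty():
        item = backpack.pop()
        if item[0] == el:
            found = item
            break
        saved.append(item)
    # push the saved items back, bottom-most first
    for name, w in reversed(saved):
        backpack.push(name, w)
    if found is None:
        raise ValueError("%s not found in backpack" % el)
    return found
```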
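Going back to the accumulate example of section 3.2, after the call s2 contains the two elements taken from s1:
[55]:
print(s2)
WStack: weight=20 elements=[1, 7, 4, 3, 5]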
5. Tasks¶
Very often, you begin to do a task just to discover it requires doing 3 other tasks, so you start carrying them out one at a time and discover one of them actually requires doing yet another two subtasks…
To represent the fact a task may have subtasks, we will use a dictionary mapping a task label to a list of subtasks, each represented as a label. For example:
[56]:
subtasks = {
'a':['b','g'],
'b':['c','d','e'],
'c':['f'],
'd':['g'],
'e':[],
'f':[],
'g':[]
}
Task a requires subtasks b and g to be carried out (in this order), but task b requires subtasks c, d and e to be done. c requires f to be done, and d requires g.
You will have to implement a function called do and use a Stack data structure, which is already provided and you don’t need to implement. Let’s see an example of execution.
IMPORTANT: In the execution example, there are many prints just to help you understand what’s going on, but the only thing we actually care about is the final list returned by the function!
IMPORTANT: notice subtasks are scheduled in reversed order, so the item on top of the stack will be the first to get executed !
[57]:
from tasks_solution import *
do('a', subtasks)
DEBUG: Stack: elements=['a']
DEBUG: Doing task a, scheduling subtasks ['b', 'g']
DEBUG: Stack: elements=['g', 'b']
DEBUG: Doing task b, scheduling subtasks ['c', 'd', 'e']
DEBUG: Stack: elements=['g', 'e', 'd', 'c']
DEBUG: Doing task c, scheduling subtasks ['f']
DEBUG: Stack: elements=['g', 'e', 'd', 'f']
DEBUG: Doing task f, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=['g', 'e', 'd']
DEBUG: Doing task d, scheduling subtasks ['g']
DEBUG: Stack: elements=['g', 'e', 'g']
DEBUG: Doing task g, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=['g', 'e']
DEBUG: Doing task e, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=['g']
DEBUG: Doing task g, scheduling subtasks []
DEBUG: Nothing else to do!
DEBUG: Stack: elements=[]
[57]:
['a', 'b', 'c', 'f', 'd', 'g', 'e', 'g']
The Stack you must use is simple and supports push, pop, and is_empty operations:
[58]:
s = Stack()
[59]:
print(s)
Stack: elements=[]
[60]:
s.is_empty()
[60]:
True
[61]:
s.push('a')
[62]:
print(s)
Stack: elements=['a']
[63]:
s.push('b')
[64]:
print(s)
Stack: elements=['a', 'b']
[65]:
s.pop()
[65]:
'b'
[66]:
print(s)
Stack: elements=['a']
5.1 do¶
Now open tasks_exercise.py and implement function do:
def do(task, subtasks):
    """ Takes a task to perform and a dictionary of subtasks,
    and RETURN a list of performed tasks
    - To implement it, inside create a Stack instance and a while loop.
    - DO *NOT* use a recursive function
    - Inside the function, you can use a print like "I'm doing task a",
      but that is only to help yourself in debugging, only the
      list returned by the function will be considered in the evaluation!
    """
Testing: python3 -m unittest tasks_test.DoTest
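A sketch of do with an explicit stack instead of recursion. Subtasks are pushed in reverse so that the first listed subtask ends up on top and is executed first, which reproduces the trace above. The Stack class here is a minimal stand-in for the provided one, assumed to offer only push, pop and is_empty:

```python
class Stack:
    """Minimal stand-in for the provided Stack (top at the end of the list)."""

    def __init__(self):
        self._elements = []

    def push(self, item):
        self._elements.append(item)

    def pop(self):
        return self._elements.pop()

    def is_empty(self):
        return len(self._elements) == 0


def do(task, subtasks):
    """Return the list of performed tasks, visiting subtasks depth-first
    with an explicit Stack instead of recursion."""
    done = []
    stack = Stack()
    stack.push(task)
    while not stack.is_empty():
        current = stack.pop()
        done.append(current)
        # push in reverse so that the first listed subtask ends up on top
        for sub in reversed(subtasks[current]):
            stack.push(sub)
    return done


subtasks = {'a': ['b', 'g'], 'b': ['c', 'd', 'e'], 'c': ['f'],
            'd': ['g'], 'e': [], 'f': [], 'g': []}
print(do('a', subtasks))  # ['a', 'b', 'c', 'f', 'd', 'g', 'e', 'g']
```

Note that g appears twice in the result: the dictionary describes dependencies, and nothing prevents a task from being required by two different parents.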
5.2 do_level¶
In this exercise, you are asked to implement a slightly more complex version of the previous function, where on the Stack you push two-valued tuples containing the task label and the associated level. The first task has level 0, its immediate subtasks have level 1, the subtasks of those subtasks have level 2, and so on. In the list returned by the function, you will put such tuples.
One possible use is to display the executed tasks as an indented tree, where the indentation is determined by the level. Here we see an example:
IMPORTANT: Again, the prints are only to let you understand what’s going on, and you are not required to code them. The only thing that really matters is the list the function must return !
[67]:
subtasks = {
'a':['b','g'],
'b':['c','d','e'],
'c':['f'],
'd':['g'],
'e':[],
'f':[],
'g':[]
}
do_level('a', subtasks)
DEBUG: Stack: elements=[('a', 0)]
DEBUG: I'm doing a level=0 Stack: elements=[('g', 1), ('b', 1)]
DEBUG: I'm doing b level=1 Stack: elements=[('g', 1), ('e', 2), ('d', 2), ('c', 2)]
DEBUG: I'm doing c level=2 Stack: elements=[('g', 1), ('e', 2), ('d', 2), ('f', 3)]
DEBUG: I'm doing f level=3 Stack: elements=[('g', 1), ('e', 2), ('d', 2)]
DEBUG: I'm doing d level=2 Stack: elements=[('g', 1), ('e', 2), ('g', 3)]
DEBUG: I'm doing g level=3 Stack: elements=[('g', 1), ('e', 2)]
DEBUG: I'm doing e level=2 Stack: elements=[('g', 1)]
DEBUG: I'm doing g level=1 Stack: elements=[]
[67]:
[('a', 0),
('b', 1),
('c', 2),
('f', 3),
('d', 2),
('g', 3),
('e', 2),
('g', 1)]
Now implement the function:
def do_level(task, subtasks):
    """ Takes a task to perform and a dictionary of subtasks,
    and RETURN a list of performed tasks, as tuples (task label, level)
    - To implement it, use a Stack and a while loop
    - DO *NOT* use a recursive function
    - Inside the function, you can use a print like "I'm doing task a",
      but that is only to help yourself in debugging, only the
      list returned by the function will be considered in the evaluation
    """
Testing: python3 -m unittest tasks_test.DoLevelTest
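The only change with respect to do is that each stack entry carries its level, and children get the parent's level plus one. A sketch, again with a minimal stand-in for the provided Stack, followed by the indented-tree display mentioned above:

```python
class Stack:
    """Minimal stand-in for the provided Stack (top at the end of the list)."""

    def __init__(self):
        self._elements = []

    def push(self, item):
        self._elements.append(item)

    def pop(self):
        return self._elements.pop()

    def is_empty(self):
        return len(self._elements) == 0


def do_level(task, subtasks):
    """Like do, but return (label, level) tuples; the root has level 0."""
    done = []
    stack = Stack()
    stack.push((task, 0))
    while not stack.is_empty():
        current, level = stack.pop()
        done.append((current, level))
        for sub in reversed(subtasks[current]):
            stack.push((sub, level + 1))   # children are one level deeper
    return done


subtasks = {'a': ['b', 'g'], 'b': ['c', 'd', 'e'], 'c': ['f'],
            'd': ['g'], 'e': [], 'f': [], 'g': []}
for label, level in do_level('a', subtasks):
    print('  ' * level + label)   # indentation proportional to the level
```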
6. Stacktris¶
Open a text editor and edit file stacktris_exercise.py
A Stacktris is a data structure that operates like the famous game Tetris, with some restrictions:
Falling pieces can be either of length 1 or 2. We call them 1-block and 2-block respectively
The pit has a fixed width of 3 columns
2-blocks can only be horizontal
We print a Stacktris like this:
\ j 012
 i
 4 | 11|  # two 1-blocks
 3 | 22|  # one 2-block
 2 | 1 |  # one 1-block
 1 |22 |  # one 2-block
 0 |1 1|  # on the ground there are two 1-blocks
In Python, we model the Stacktris as a class holding in the variable _stack a list of lists of integers, which models the pit:
class Stacktris:
    def __init__(self):
        """ Creates a Stacktris
        """
        self._stack = []
So in the situation above the _stack variable would look like this (notice that the row order is inverted with respect to the print):
[
[1,0,1],
[2,2,0],
[0,1,0],
[0,2,2],
[0,1,1],
]
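To see how the list of rows maps to the printed pit, here is a small hypothetical helper (not part of the exercise) that renders _stack the way it is printed above: rows are visited in reverse, because row 0 is the ground, and zeroes become spaces:

```python
def render(stack):
    """Return the printed lines of a pit given as a list of rows (row 0 = ground)."""
    lines = []
    for row in reversed(stack):   # the top row is printed first
        cells = ''.join(' ' if cell == 0 else str(cell) for cell in row)
        lines.append('|' + cells + '|')
    return lines


pit = [
    [1, 0, 1],
    [2, 2, 0],
    [0, 1, 0],
    [0, 2, 2],
    [0, 1, 1],
]
print('\n'.join(render(pit)))
```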
The class has three methods of interest which you will implement: drop1(j), drop2h(j) and _shorten
Example
Let’s see an example:
[68]:
from stacktris_solution import *
st = Stacktris()
At the beginning the pit is empty:
[69]:
st
[69]:
Stacktris:
EMPTY
We can start by dropping from the ceiling a block of dimension 1 into the last column at index j=2. By doing so, a new row will be created, and it will be a list containing the numbers [0,0,1]
IMPORTANT: zeroes are not displayed
[70]:
st.drop1(2)
DEBUG: Stacktris:
| 1|
[70]:
[]
Now we drop a horizontal block of dimension 2 (a 2-block) having the leftmost block at column j=1. Since below in the pit there is already the 1-block we previously put, the new block will fall and stay upon it. Internally, we will add a new row as a Python list containing the numbers [0,2,2]
[71]:
st.drop2h(1)
DEBUG: Stacktris:
| 22|
| 1|
[71]:
[]
We see the zeroth column is empty, so if we drop a 1-block there it will fall to the ground. Internally, the zeroth list will become [1,0,1]:
[72]:
st.drop1(0)
DEBUG: Stacktris:
| 22|
|1 1|
[72]:
[]
Now we drop again a 2-block with the leftmost block at column j=1, on top of the previously laid one. This will add a new row as list [0,2,2].
[73]:
st.drop2h(1)
DEBUG: Stacktris:
| 22|
| 22|
|1 1|
[73]:
[]
In the game Tetris, when a row becomes completely filled it disappears. So if we drop a 1-block in the leftmost column, the middle row should be removed.
NOTE: The messages on the console are just debug prints; the function drop1 only returns the extracted line [1,2,2]:
[74]:
st.drop1(0)
DEBUG: Stacktris:
| 22|
|122|
|1 1|
DEBUG: POPPING [1, 2, 2]
DEBUG: Stacktris:
| 22|
|1 1|
[74]:
[1, 2, 2]
Now we insert another 2-block starting at j=0. It will fall upon the previously laid one:
[75]:
st.drop2h(0)
DEBUG: Stacktris:
|22 |
| 22|
|1 1|
[75]:
[]
We can complete the topmost row by dropping a 1-block in the rightmost column. As a result, the row will be removed from the stack and returned by the call to drop1:
[76]:
st.drop1(2)
DEBUG: Stacktris:
|221|
| 22|
|1 1|
DEBUG: POPPING [2, 2, 1]
DEBUG: Stacktris:
| 22|
|1 1|
[76]:
[2, 2, 1]
Another line completion with a drop1 at column j=0:
[77]:
st.drop1(0)
DEBUG: Stacktris:
|122|
|1 1|
DEBUG: POPPING [1, 2, 2]
DEBUG: Stacktris:
|1 1|
[77]:
[1, 2, 2]
We can finally empty the Stacktris by dropping a 1-block in the middle column:
[78]:
st.drop1(1)
DEBUG: Stacktris:
|111|
DEBUG: POPPING [1, 1, 1]
DEBUG: Stacktris:
EMPTY
[78]:
[1, 1, 1]
6.1 _shorten¶
Start by implementing this private method:
def _shorten(self):
    """ Scans the Stacktris from top to bottom searching for a completely filled line:
    - if found, remove it from the Stacktris and return it as a list.
    - if not found, return an empty list.
    """
If you wish, you can add debug prints but they are not mandatory
Testing: python3 -m unittest stacktris_test.ShortenTest
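The scan can be sketched as a standalone function on a list of rows (row 0 = ground), which inside the class would operate on self._stack. Scanning "from top to bottom" means iterating indices from the last row down to 0; `list.pop(i)` both removes the row and returns it:

```python
def shorten(stack):
    """Scan rows from top (last row) to bottom; remove and return the first
    completely filled row, or return [] if there is none."""
    for i in range(len(stack) - 1, -1, -1):
        if all(cell != 0 for cell in stack[i]):
            return stack.pop(i)   # removes the row and returns it
    return []


pit = [[1, 0, 1], [1, 2, 2], [0, 2, 2]]
print(shorten(pit))  # [1, 2, 2]
print(pit)           # [[1, 0, 1], [0, 2, 2]]
print(shorten(pit))  # []
```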
6.2 drop1¶
Once you are done with the previous function, implement the drop1 method:
NOTE: In the implementation, feel free to call the previously implemented _shorten method.
def drop1(self, j):
    """ Drops a 1-block on column j.
    - If another block is found, place the 1-block on top of that block,
      otherwise place it on the ground.
    - If, after the 1-block is placed, a row results completely filled, removes
      the row and RETURN it. Otherwise, RETURN an empty list.
    - if index `j` is outside bounds, raises ValueError
    """
Testing: python3 -m unittest stacktris_test.Drop1Test
6.3 drop2h¶
Once you are done with the previous function, implement the drop2h method:
def drop2h(self, j):
    """ Drops a 2-block horizontally with left block on column j,
    - If another block is found, place the 2-block on top of that block,
      otherwise place it on the ground.
    - If, after the 2-block is placed, a row results completely filled,
      removes the row and RETURN it. Otherwise, RETURN an empty list.
    - if index `j` is outside bounds, raises ValueError
    """
Testing: python3 -m unittest stacktris_test.Drop2hTest
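Both drops share one step: find the row where the falling block comes to rest, which is one above the highest occupied cell in any of the columns it covers, opening a new row on top when needed. The sketch below factors that into a hypothetical helper `_landing_row` on a stand-in class; treating j == 2 as out of bounds for drop2h (its right half would fall outside the pit) is our reading of "outside bounds". It reproduces the example sequence above.

```python
class StacktrisSketch:
    """Hypothetical stand-in: _stack is a list of rows (row 0 = ground),
    each row a list of 3 ints where 0 means empty."""
    WIDTH = 3

    def __init__(self):
        self._stack = []

    def _shorten(self):
        for i in range(len(self._stack) - 1, -1, -1):
            if all(cell != 0 for cell in self._stack[i]):
                return self._stack.pop(i)
        return []

    def _landing_row(self, columns):
        """Index of the row where a block covering `columns` comes to rest:
        one above the highest occupied cell in any of those columns."""
        highest = -1
        for i, row in enumerate(self._stack):
            if any(row[j] != 0 for j in columns):
                highest = i
        row_index = highest + 1
        if row_index == len(self._stack):
            self._stack.append([0] * self.WIDTH)   # open a new row on top
        return row_index

    def drop1(self, j):
        if not 0 <= j < self.WIDTH:
            raise ValueError("column %s out of bounds" % j)
        self._stack[self._landing_row([j])][j] = 1
        return self._shorten()

    def drop2h(self, j):
        if not 0 <= j < self.WIDTH - 1:   # the right half needs column j + 1
            raise ValueError("column %s out of bounds" % j)
        i = self._landing_row([j, j + 1])
        self._stack[i][j] = 2
        self._stack[i][j + 1] = 2
        return self._shorten()


st = StacktrisSketch()
st.drop1(2)
st.drop2h(1)
st.drop1(0)
print(st._stack)  # [[1, 0, 1], [0, 2, 2]]
```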
Queues¶
Introduction¶
In these exercises, you will be implementing several queues.
See theory slides
See Queue Abstract Data Type on the book
See Implementing a Queue in Python on the book
What to do¶
Unzip the exercises in a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
|-queues
|- queues.ipynb
|- circular_queue_exercise.py
|- circular_queue_test.py
|- circular_queue_solution.py
|- ...
Open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm); you will edit the files ending in _exercise.py
Go on reading this notebook, and follow the instructions inside.
1. LinkedQueue¶
Open linked_queue_exercise.py.
You are given a queue implemented as a LinkedList, with the usual _head pointer plus an additional _tail pointer and a _size counter.
Data is enqueued at the right, in the tail
Data is dequeued at the left, removing it from the head
Example, where the arrows represent _next pointers:
_head _tail
a -> b -> c -> d -> e -> f
In this exercise you will implement the methods enqn(lst) and deqn(n), which respectively enqueue a Python list of n elements and dequeue n elements, returning a Python list of them.
Here we show an example usage; see the next points for detailed instructions.
Example:
[2]:
from linked_queue_solution import *
[3]:
q = LinkedQueue()
[4]:
print(q)
LinkedQueue:
[5]:
q.enqn(['a','b','c'])
Returns nothing, queue becomes:
_head _tail
a -> b -> c
[6]:
q.enqn(['d'])
Returns nothing, queue becomes:
_head _tail
a -> b -> c -> d
[7]:
q.enqn(['e','f'])
Returns nothing, queue becomes:
_head _tail
a -> b -> c -> d -> e -> f
[8]:
q.deqn(3)
[8]:
['a', 'b', 'c']
Returns ['a', 'b', 'c'] and queue becomes:
_head     _tail
d -> e -> f
[9]:
q.deqn(1)
[9]:
['d']
Returns ['d'] and queue becomes:
_head _tail
e -> f
q.deqn(5)
---------------------------------------------------------------------------
LookupError Traceback (most recent call last)
<ipython-input-55-e68c2e9949d0> in <module>()
1
----> 2 q.deqn(5)
~/Da/prj/datasciprolab/prj/exercises/queues/linked_queue_solution.py in deqn(self, n)
202 #jupman-raise
203 if n > self._size:
--> 204 raise LookupError('Asked to dequeue %s elements, but only %s are available!' % (n, self._size))
205
206 ret = []
LookupError: Asked to dequeue 5 elements, but only 2 are available!
Raises LookupError
as there aren’t enough elements to remove
1.1 enqn¶
Implement the method enqn:
def enqn(self, lst):
    """ Enqueues provided list of elements at the tail of the queue
    - Required complexity: O(len(lst))
    - NOTE: remember to update the _size and _tail
    Example: supposing arrows represent _next pointers:
        _head     _tail
        a -> b -> c
    Calling
        q.enqn(['d', 'e', 'f', 'g'])
    will produce the queue:
        _head                         _tail
        a -> b -> c -> d -> e -> f -> g
    """
Testing: python3 -m unittest linked_queue_test.EnqnTest
1.2 deqn¶
Implement the method deqn:
def deqn(self, n):
    """ Removes n elements from the head, and return them as a Python list,
    where the first element that was enqueued will appear at the
    *beginning* of the returned Python list.
    - if n is greater than the size of the queue, raises a LookupError.
    - required complexity: O(n)
    NOTE 1: return a list of the *DATA* in the nodes, *NOT* the nodes
            themselves
    NOTE 2: DO NOT try to convert the whole queue to a Python
            list for playing with slices.
    NOTE 3: remember to update _size, _head and _tail when needed.
    For example, supposing arrows represent _next pointers:
        _head                         _tail
        a -> b -> c -> d -> e -> f -> g
    q.deqn(3) will return the Python list ['a', 'b', 'c']
    After the call, the queue will be like this:
        _head          _tail
        d -> e -> f -> g
    """
Testing: python3 -m unittest linked_queue_test.DeqnTest
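A sketch of both methods on a hypothetical minimal LinkedQueueSketch (the Node fields `_data` and `_next` are assumptions; the exercise's Node may expose getters instead). enqn links one node per element after the current tail; deqn walks the head forward n times, and when the queue becomes empty the tail must be reset as well. Rejecting a negative n with LookupError is our own choice, not stated in the docstring.

```python
class Node:
    """Hypothetical minimal linked-list node."""

    def __init__(self, data):
        self._data = data
        self._next = None


class LinkedQueueSketch:
    """Queue as a singly linked list: enqueue at _tail, dequeue at _head."""

    def __init__(self):
        self._head = None
        self._tail = None
        self._size = 0

    def enqn(self, lst):
        """Append every element of lst at the tail: O(len(lst))."""
        for v in lst:
            node = Node(v)
            if self._tail is None:      # empty queue: node is also the head
                self._head = node
            else:
                self._tail._next = node
            self._tail = node
            self._size += 1

    def deqn(self, n):
        """Remove n elements from the head, returning their data: O(n)."""
        if n < 0 or n > self._size:
            raise LookupError('Asked to dequeue %s elements, but only %s are available!'
                              % (n, self._size))
        ret = []
        for _ in range(n):
            ret.append(self._head._data)   # data, not the node itself
            self._head = self._head._next
        self._size -= n
        if self._head is None:             # queue emptied: reset the tail too
            self._tail = None
        return ret


q = LinkedQueueSketch()
q.enqn(['a', 'b', 'c'])
q.enqn(['d'])
q.enqn(['e', 'f'])
print(q.deqn(3))  # ['a', 'b', 'c']
print(q.deqn(1))  # ['d']
```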
2. CircularQueue¶
A circular queue is a data structure which, when initialized, occupies a fixed amount of memory called capacity. Typically, fixed-size data structures are found in systems programming (e.g. programming drivers), where space is constrained and you want results to be as predictable as possible. For us, it will be an example of modular arithmetic usage. In our implementation, to store data we will use a Python list, which we initialize with a number of empty cells equal to the capacity. During initialization, it doesn’t matter what we actually put inside the cells; in this case we will use None. Note that the capacity never changes, and cells are never added to or removed from the list. What varies during execution is the actual content of the cells, the index pointing to the head of the queue (from which elements are dequeued) and another number we call size, which tells us how many elements are present in the queue. Summing the head and size numbers will allow us to determine where to enqueue elements at the tail of the queue; to avoid running past the end of the list, we will have to take the modulus of the sum. Keep reading for details.
To implement the circular queue you can use this pseudo code:
QUESTION 2.1: Pseudo code is meant to give a general overview of the algorithms, and can often leave out implementation details, such as defining what to do when things don’t work as expected. If you were to implement this in a real life scenario, do you see any particular problem?
In our implementation, we will:
use more pythonic names, with underscores instead of camelcase
explicitly handle exceptions and corner cases
be able to insert any kind of object in the queue
The initial queue will be populated with None objects, and will have length set to the provided capacity
_size is the current dimension of the queue, which is different from the initially provided capacity.
we consider capacity as fixed: it will never change during execution. For this reason, since we use a Python list to represent the data, we don’t need an extra variable to hold it: just getting the list length will suffice.
_head is an index pointing to the next element to be dequeued
elements are inserted at the position pointed to by (_head + _size) % capacity(), and dequeued from the position pointed to by _head. The modulo operator % allows using a list as if it were circular: if an index apparently falls outside the list, the modulus transforms it into a small index. Since _size can never exceed capacity(), the formula (_head + _size) % capacity() never points to an element not yet dequeued, except when _size==capacity() (the queue is full); this case and the empty queue (_size==0) are to be treated as special.
enqueuing and dequeuing operations don’t modify the list length!
QUESTION 2.2: If we can insert any kind of object in the queue, including None, are we going to have trouble with definitions like top() above?
2.1 Implementation¶
Implement the methods in file circular_queue_exercise.py in the order they are presented, and test them with circular_queue_test.py:
python3 -m unittest circular_queue_test
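A sketch of the scheme described above, under assumptions: the method names (`capacity`, `size`, `is_empty`, `top`, `enqueue`, `dequeue`) and raising LookupError on a full or empty queue are our guesses, so follow circular_queue_exercise.py for the real interface. Emptiness is decided by `_size`, never by looking for None in the cells, and the list length never changes:

```python
class CircularQueueSketch:
    """Fixed-capacity queue stored in a Python list used as a ring buffer."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be >= 1, got %s" % capacity)
        self._data = [None] * capacity   # length never changes afterwards
        self._head = 0                   # index of the next element to dequeue
        self._size = 0                   # number of elements currently stored

    def capacity(self):
        return len(self._data)

    def size(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def top(self):
        if self.is_empty():
            raise LookupError("Empty queue!")
        return self._data[self._head]

    def enqueue(self, v):
        if self._size == self.capacity():   # full: the slot would be _head
            raise LookupError("Queue is full!")
        self._data[(self._head + self._size) % self.capacity()] = v
        self._size += 1

    def dequeue(self):
        if self.is_empty():
            raise LookupError("Empty queue!")
        v = self._data[self._head]
        self._data[self._head] = None        # optional: drop the reference
        self._head = (self._head + 1) % self.capacity()
        self._size -= 1
        return v


q = CircularQueueSketch(3)
for x in ['a', 'b', 'c']:
    q.enqueue(x)
print(q.dequeue())   # a
q.enqueue('d')       # wraps around into index 0
print(q._data)       # ['d', 'b', 'c']
```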
3. ItalianQueue¶
You will implement an ItalianQueue, modelled as a LinkedList with two pointers, a _head and a _tail. Each element is enqueued together with a group:
an element is enqueued by scanning from _head until a matching group is found, in which case it is inserted after (that is, to the right of) the matching group; otherwise the element is appended at the _tail
an element is dequeued from the _head
3.1 Slow v1¶
To gain some understanding about the data structure, look at the following excerpts.
Excerpt from Node:
class Node:
    """ A Node of an ItalianQueue.
    Holds both data and group provided by the user.
    """
    def __init__(self, initdata, initgroup): ...
    def get_data(self): ...
    def get_group(self): ...
    def get_next(self): ...
    # etc ..
Excerpt from ItalianQueue class:
class ItalianQueue:
    """ An Italian queue, v1.
    - Implemented as a LinkedList
    - Worst case enqueue is O(n)
    - has extra methods, for accessing groups and tail:
        - top_group()
        - tail()
        - tail_group()
    Each element is assigned a group; during enqueuing, queue is scanned
    from head to tail to find if there is another element with a
    matching group.
    - If there is, element to be enqueued is inserted after the last
      element in the same group sequence (that is, to the right of
      the group)
    - otherwise the element is inserted at the end of the queue
    """
    def __init__(self):
        """ Initializes the queue. Note there is no capacity as parameter
        - MUST run in O(1)
        """
Example:
[10]:
from italian_queue_solution import *
q = ItalianQueue()
print(q)
ItalianQueue:
_head: None
_tail: None
[11]:
q.enqueue('a','x') # 'a' is the element,'x' is the group
[12]:
print(q)
ItalianQueue: a
x
_head: Node(a,x)
_tail: Node(a,x)
[13]:
q.enqueue('c','y') # 'c' belongs to new group 'y', goes to the end of the queue
[14]:
print(q)
ItalianQueue: a->c
x y
_head: Node(a,x)
_tail: Node(c,y)
[15]:
q.enqueue('d','y') # 'd' belongs to existing group 'y', goes to the end of the group
[16]:
print(q)
ItalianQueue: a->c->d
x y y
_head: Node(a,x)
_tail: Node(d,y)
[17]:
q.enqueue('b','x') # 'b' belongs to existing group 'x', goes to the end of the group
[18]:
print(q)
ItalianQueue: a->b->c->d
x x y y
_head: Node(a,x)
_tail: Node(d,y)
[19]:
q.enqueue('f','z') # 'f' belongs to new group, goes to the end of the queue
[20]:
print(q)
ItalianQueue: a->b->c->d->f
x x y y z
_head: Node(a,x)
_tail: Node(f,z)
[21]:
q.enqueue('e','y') # 'e' belongs to an existing group 'y', goes to the end of the group
[22]:
print(q)
ItalianQueue: a->b->c->d->e->f
x x y y y z
_head: Node(a,x)
_tail: Node(f,z)
[23]:
q.enqueue('g','z') # 'g' belongs to an existing group 'z', goes to the end of the group
[24]:
print(q)
ItalianQueue: a->b->c->d->e->f->g
x x y y y z z
_head: Node(a,x)
_tail: Node(g,z)
[25]:
q.enqueue('h','z') # 'h' belongs to an existing group 'z', goes to the end of the group
[26]:
print(q)
ItalianQueue: a->b->c->d->e->f->g->h
x x y y y z z z
_head: Node(a,x)
_tail: Node(h,z)
Dequeue is always from the head, without taking the group into consideration:
[27]:
q.dequeue()
[27]:
'a'
[28]:
print(q)
ItalianQueue: b->c->d->e->f->g->h
x y y y z z z
_head: Node(b,x)
_tail: Node(h,z)
[29]:
q.dequeue()
[29]:
'b'
[30]:
print(q)
ItalianQueue: c->d->e->f->g->h
y y y z z z
_head: Node(c,y)
_tail: Node(h,z)
[31]:
q.dequeue()
[31]:
'c'
[32]:
print(q)
ItalianQueue: d->e->f->g->h
y y z z z
_head: Node(d,y)
_tail: Node(h,z)
3.1.1 init¶
Implement the methods in file italian_queue_exercise.py in the order they are presented, up until enqueue (excluded)
Testing: python3 -m unittest italian_queue_test.InitEmptyTest
3.1.2 Slow enqueue¶
Implement version 1 of enqueue, running in \(O(n)\) where \(n\) is the queue size.
def enqueue(self, v, g):
    """ Enqueues provided element v having group g, with the following
    criteria:
    Queue is scanned from head to find if there is another element
    with a matching group:
    - if there is, v is inserted after the last element in the
      same group sequence (so to the right of the group)
    - otherwise v is inserted at the end of the queue
    - MUST run in O(n)
    """
Testing: python3 -m unittest italian_queue_test.EnqueueTest
QUESTION: The ItalianQueue was implemented as a LinkedList. Even if this time we don’t care much about performance, if we wanted an efficient enqueue operation, could we start with a circular data structure? Or would you prefer improving a LinkedList?
3.1.3 dequeue¶
Implement version 1 of dequeue, running in \(O(1)\)
def dequeue(self):
    """ Removes head element and returns it.
    - If the queue is empty, raises a LookupError.
    - MUST run in O(1)
    """
Testing: python3 -m unittest italian_queue_test.DequeueTest
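A sketch of the v1 operations. The node here uses plain attributes (`data`, `group`, `next`) instead of the getters in the excerpt, and `to_list()` is a hypothetical helper for inspection; both are assumptions. The enqueue relies on the invariant that elements of the same group are always contiguous, so once the first member of group g is found, it walks forward to the last member and links the new node after it:

```python
class Node:
    """Hypothetical minimal node: plain attributes instead of getters."""

    def __init__(self, data, group):
        self.data = data
        self.group = group
        self.next = None


class ItalianQueueSketch:
    def __init__(self):
        self._head = None
        self._tail = None
        self._size = 0

    def enqueue(self, v, g):
        """O(n): scan from the head looking for group g."""
        new = Node(v, g)
        cur = self._head
        while cur is not None and cur.group != g:
            cur = cur.next
        if cur is None:                  # no element with group g: append at tail
            if self._tail is None:
                self._head = new
            else:
                self._tail.next = new
            self._tail = new
        else:                            # walk to the last member of the group
            while cur.next is not None and cur.next.group == g:
                cur = cur.next
            new.next = cur.next
            cur.next = new
            if cur is self._tail:
                self._tail = new
        self._size += 1

    def dequeue(self):
        """O(1): always remove the head, regardless of its group."""
        if self._head is None:
            raise LookupError("Can't dequeue from an empty queue!")
        node = self._head
        self._head = node.next
        if self._head is None:
            self._tail = None
        self._size -= 1
        return node.data

    def to_list(self):
        """Helper for inspection: data from head to tail."""
        ret, cur = [], self._head
        while cur is not None:
            ret.append(cur.data)
            cur = cur.next
        return ret
```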
3.2 Fast v2¶
3.2.1 Save a copy¶
You already wrote a lot of code, and you don’t want to lose it, right? Since we are going to make many modifications, when you reach a point where the code does something useful, it is good practice to save a copy of what you have done somewhere, so if you later screw up something, you can always restore the copy.
Copy the whole folder queues into a new folder queues_v1
Also add in the copied folder a separate README.txt file, writing inside the version (like 1.0), the date, and a description of the main features you implemented (for example “Simple Italian Queue, not particularly performant”).
Backing up the work is a form of the so-called versioning: there are much better ways to do it (like using git) but we don’t address them here.
WARNING: DO NOT SKIP THIS STEP!
No matter how smart you are, you will fail, and a backup may be the only way out.
WARNING: NOT CONVINCED YET?
If you still don’t understand why you should spend time with this copy bureaucracy, to help you enter the right mood imagine tomorrow is demo day with your best client and you screw up the only working version: your boss will skin you alive.
3.2.2 Improve enqueue¶
Improve enqueue so it works in \(O(1)\)
HINT:
You will need an extra data structure that keeps track of the starting points of each group and how they are ordered
You will also need to update this data structure as enqueue and dequeue calls are made
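One possible approach, sketched under our own assumptions and different from the hint: instead of tracking where each group starts, keep a dictionary mapping each group to the *last* node of its contiguous run. With a singly linked list, the last node is exactly where a new member must be linked, so enqueue becomes O(1). On dequeue, if the removed head was the last node of its group, that group has no members left and its key is deleted, so a later element of that group goes to the tail again. Node attributes and `to_list()` are hypothetical, as in the v1 sketch.

```python
class Node:
    """Hypothetical minimal node: plain attributes instead of getters."""

    def __init__(self, data, group):
        self.data = data
        self.group = group
        self.next = None


class FastItalianQueueSketch:
    def __init__(self):
        self._head = None
        self._tail = None
        self._size = 0
        self._last_of_group = {}   # group -> last node of that group's run

    def enqueue(self, v, g):
        """O(1): jump straight to the last node of group g, if any."""
        new = Node(v, g)
        last = self._last_of_group.get(g)
        if last is None:                 # new group: append at the tail
            if self._tail is None:
                self._head = new
            else:
                self._tail.next = new
            self._tail = new
        else:                            # link right after the group's last node
            new.next = last.next
            last.next = new
            if last is self._tail:
                self._tail = new
        self._last_of_group[g] = new
        self._size += 1

    def dequeue(self):
        """O(1): remove the head and keep the dictionary consistent."""
        if self._head is None:
            raise LookupError("Can't dequeue from an empty queue!")
        node = self._head
        self._head = node.next
        if self._head is None:
            self._tail = None
        if self._last_of_group[node.group] is node:   # group is now empty
            del self._last_of_group[node.group]
        self._size -= 1
        return node.data

    def to_list(self):
        ret, cur = [], self._head
        while cur is not None:
            ret.append(cur.data)
            cur = cur.next
        return ret
```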
4. Supermarket queues¶
In this exercise, you will try to model a supermarket containing several cash queues.
CashQueue¶
WARNING: DO *NOT* MODIFY CashQueue CLASS
For us, a CashQueue is a simple queue of clients represented as strings. A CashQueue supports the enqueue, dequeue, size and is_empty operations:
Clients are enqueued at the right, in the tail
Clients are dequeued from the left, removing them from the head
For example:
q = CashQueue()
q.is_empty() # True
q.enqueue('a') # a
q.enqueue('b') # a,b
q.enqueue('c') # a,b,c
q.size() # 3
q.dequeue() # returns: a
# queue becomes: [b,c]
q.dequeue() # returns: b
# queue becomes: [c]
q.dequeue() # returns: c
# queue becomes: []
q.dequeue() # raises LookupError as there aren't enough elements to remove
Supermarket¶
A Supermarket contains several cash queues. It is possible to initialize a Supermarket by providing queues as simple Python lists, where the first clients to arrive are on the left, and the last clients are on the right.
For example, by calling:
s = Supermarket([
['a','b','c'], # <------ clients arrive from right
['d'],
['f','g']
])
internally three CashQueue objects are created. Looking at the first queue with clients ['a','b','c'], a at the head arrived first and c at the tail arrived last:
>>> print(s)
Supermarket
0 CashQueue: ['a', 'b', 'c']
1 CashQueue: ['d']
2 CashQueue: ['f', 'g']
Note a supermarket must have at least one queue, which may be empty:
s = Supermarket( [[]] )
>>> print(s)
Supermarket
0 CashQueue: []
Supermarket as a queue¶
Our Supermarket should maximize the number of served clients (we assume each client is served in an equal amount of time). To do so, the whole supermarket itself can be seen as a particular kind of queue, which allows the enqueue and dequeue operations described as follows:
by calling supermarket.enqueue(client), a client gets enqueued in the shortest CashQueue.
by calling supermarket.dequeue(), all clients which are at the heads of non-empty CashQueues are dequeued all at once, and their list is returned (this simulates parallelism).
Implementation¶
Now start editing supermarket_exercise.py, implementing the methods in the following points.
4.1 Supermarket size¶
Implement Supermarket.size:
def size(self):
    """ Return the total number of clients present in all cash queues.
    """
Testing: python3 -m unittest supermarket_test.SizeTest
4.2 Supermarket dequeue¶
Implement Supermarket.dequeue:
def dequeue(self):
    """ Dequeue all the clients which are at the heads of non-empty cash queues,
    and return a list of such clients.
    - clients are returned in the same order as found in the queues
    - if supermarket is empty, an empty list is returned
    For example, suppose we have following supermarket:
        0  ['a','b','c']
        1  []
        2  ['d','e']
        3  ['f']
    A call to dequeue() will return ['a','d','f']
    and the supermarket will now look like this:
        0  ['b','c']
        1  []
        2  ['e']
        3  []
    """
Testing: python3 -m unittest supermarket_test.DequeueTest
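The dequeue rule amounts to one left-to-right scan over the queues, popping the head of each non-empty one. Below is a minimal sketch of that idea. It assumes each CashQueue can be replaced by a plain collections.deque, because the real CashQueue API is not shown here, and it uses a standalone function with the hypothetical name supermarket_dequeue rather than the method you have to write:

```python
from collections import deque

def supermarket_dequeue(queues):
    """Hypothetical helper: pop the head of every non-empty queue.

    `queues` is a list of collections.deque objects standing in for
    CashQueue instances (an assumption made only for this sketch)."""
    served = []
    for q in queues:              # scan queues by increasing index
        if q:                     # empty queues are skipped
            served.append(q.popleft())
    return served

queues = [deque(['a', 'b', 'c']), deque(), deque(['d', 'e']), deque(['f'])]
print(supermarket_dequeue(queues))   # ['a', 'd', 'f']
print([list(q) for q in queues])     # [['b', 'c'], [], ['e'], []]
```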
4.3 Supermarket enqueue¶
Implement Supermarket.enqueue:
def enqueue(self, client):
    """ Enqueue provided client in the cash queue with minimal length.
        If more than one minimal length cash queue is available, the one
        with smallest index is chosen.
        For example:
        If we have supermarket
        0 ['a','b','c']
        1 ['d','e','f','g']
        2 ['h','i']
        3 ['m','n']
        since queues 2 and 3 both have minimal length 2,
        supermarket.enqueue('z') will enqueue the client on queue 2:
        0 ['a','b','c']
        1 ['d','e','f','g']
        2 ['h','i','z']
        3 ['m','n']
    """
Testing: python3 -m unittest supermarket_test.EnqueueTest
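The tie-breaking rule fits Python's built-in min, which returns the first of several equal minima. Scanning indices in increasing order therefore picks the smallest index among the shortest queues. This is a sketch under the same assumption as before: plain deques stand in for CashQueue objects, and supermarket_enqueue is a hypothetical name. For convenience it also returns the chosen index:

```python
from collections import deque

def supermarket_enqueue(queues, client):
    """Hypothetical helper: append client to the shortest queue.

    min() returns the first of several equal minima, so ties are
    broken in favour of the smallest index."""
    target = min(range(len(queues)), key=lambda i: len(queues[i]))
    queues[target].append(client)     # clients enter from the right
    return target

queues = [deque('abc'), deque('defg'), deque('hi'), deque('mn')]
print(supermarket_enqueue(queues, 'z'))   # 2
print(list(queues[2]))                    # ['h', 'i', 'z']
```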
5. Shopping mall queues¶
In this exercise, you will model a shopping mall containing several shops and clients.
Client¶
WARNING: DO *NOT* MODIFY Client CLASS
For us, a Client is composed of a name (in the exercise we will use a, b, c, …) and a list of shops he wants to visit. We will identify the shops with letters such as x, y, z, …
Note: the shops to visit are a Python list used as a stack, so the first shop to visit is at the end (top) of the list
Example:
c = Client('f', ['y','x','z'])
creates a Client named f who wants to visit first the shop z, then x, and finally y.
Methods:
>>> print(c.name())
f
>>> print(c.to_visit())
['y', 'x', 'z']
Shop¶
WARNING: DO *NOT* MODIFY Shop CLASS
For us, a Shop is a class with a name and a queue of clients. A Shop supports the name, enqueue, dequeue, size and is_empty operations:
Clients are enqueued at the right, in the tail
Clients are dequeued from the left, removing them from the head
For example:
s = Shop('x')     # creates a shop named 'x'
print(s.name())   # prints x
s.is_empty()      # True
s.enqueue('a')    # enqueues client 'a', queue becomes: [a]
s.enqueue('b')    # queue becomes: [a,b]
s.enqueue('c')    # queue becomes: [a,b,c]
s.size()          # 3
s.dequeue()       # returns: a
                  # queue becomes: [b,c]
s.dequeue()       # returns: b
                  # queue becomes: [c]
s.dequeue()       # returns: c
                  # queue becomes: []
s.dequeue()       # raises LookupError as there are no elements left to remove
Mall¶
A shopping Mall contains several shops and clients. It is possible to initialize a Mall by providing:
shops as a flat list of alternating values shop name, client list, where the first clients to arrive are on the left and the last clients are on the right
clients as a flat list of alternating values client name, list of shops to visit
For example, by calling:
m = Mall(
    [
        'x', ['a','b','c'],   # <------ clients arrive from right
        'y', ['d'],
        'z', ['f','g']
    ],
    [
        'a', ['y','x'],
        'b', ['x'],
        'c', ['x'],
        'd', ['z','y'],       # IMPORTANT: shops to visit stack grows from right, so
        'f', ['y','x','z'],   # client 'f' wants to visit first shop 'z', then 'x', and finally 'y'
        'g', ['x','z']
    ])
Internally:
three Shop objects are created in an OrderedDict. Looking at the first queue with clients ['a','b','c'], a at the head arrived first and c at the tail arrived last.
6 Client objects are created in an OrderedDict. Note that if a client is in a particular shop queue, that shop must be the top desired shop in his stack of shops to visit.
>>> print(m)
Mall
Shop x: ['a', 'b', 'c']
Shop y: ['d']
Shop z: ['f', 'g']
Client a: ['y','x']
Client b: ['x']
Client c: ['x']
Client d: ['z','y']
Client f: ['y','x','z']
Client g: ['x','z']
Methods:
>>> m.shops()
OrderedDict([
    ('x', Shop x: ['a', 'b', 'c']),
    ('y', Shop y: ['d']),
    ('z', Shop z: ['f', 'g'])
])
>>> m.clients()
OrderedDict([
    ('a', Client a: ['y','x']),
    ('b', Client b: ['x']),
    ('c', Client c: ['x']),
    ('d', Client d: ['z','y']),
    ('f', Client f: ['y','x','z']),
    ('g', Client g: ['x','z'])
])
Note that a mall must have at least one shop and may have zero clients:
m = Mall( {'x':[]}, {} )
>>> print(m)
Mall
Shop x: []
Mall as a queue¶
Our Mall should maximize the number of served clients (we assume each client is served in the same amount of time). To do so, the whole mall itself can be seen as a particular kind of queue, which allows the enqueue and dequeue operations described as follows:
by calling mall.enqueue(client), a client gets enqueued in the top Shop he wants to visit (his list of shops to visit doesn't change)
by calling mall.dequeue():
all clients at the heads of non-empty Shops are dequeued all at once
their top desired shop to visit is removed
if a client has any shop left to visit, he is automatically enqueued in that Shop
the list of clients with no shops left to visit is returned (this simulates parallelism)
Implementation¶
Now start editing mall_exercise.py, implementing the methods in the following points.
6.1 Mall enqueue¶
Implement Mall.enqueue method:
def enqueue(self, client):
    """ Enqueue provided client in the top shop he wants to visit
        - If client is already in the mall, raise ValueError
        - If client has no shop to visit, raise ValueError
        - If any of the shops to visit are not in the mall, raise ValueError
        For example:
        If we have this mall:
        Mall
        Shop x: ['a','b']
        Shop y: ['c']
        Client a: ['y','x']
        Client b: ['x']
        Client c: ['x','y']
        mall.enqueue(Client('d',['x','y'])) will enqueue the client in Shop y:
        Mall
        Shop x: ['a','b']
        Shop y: ['c','d']
        Client a: ['y','x']
        Client b: ['x']
        Client c: ['x','y']
        Client d: ['x','y']
    """
Testing: python3 -m unittest mall_test.EnqueueTest
6.2 Mall dequeue¶
Implement Mall.dequeue method:
def dequeue(self):
    """ Dequeue all the clients which are at the heads of non-empty
        shop queues, enqueue clients in their next shop to visit and return
        a list of names of clients that exit the mall.
        In detail:
        - the shop list is scanned, and all clients which are at the heads
          of non-empty Shops are dequeued
          VERY IMPORTANT HINT: FIRST put all these clients in a list,
          THEN using that list do all of the following
        - for each dequeued client, his top desired shop is removed from
          his visit list
        - if a client has a shop to visit left, he is automatically
          enqueued in that Shop
        - clients are enqueued in the same order they were dequeued
          from shops
        - the list of clients with no shops to visit anymore
          is returned (this simulates parallelism)
        - clients are returned in the same order they were dequeued
          from shops
        - if mall has no clients, an empty list is returned
    """
Testing: python3 -m unittest mall_test.DequeueTest
For example, suppose we have the following mall:
[33]:
from mall_solution import *
[34]:
m = Mall([
        'x', ['a', 'b', 'c'],
        'y', ['d'],
        'z', ['f', 'g']
    ],
    [
        'a', ['y', 'x'],
        'b', ['x'],
        'c', ['x'],
        'd', ['z','y'],
        'f', ['y','x','z'],
        'g', ['x','z']
    ])
[35]:
print(m)
Mall
Shop x : ['a', 'b', 'c']
Shop y : ['d']
Shop z : ['f', 'g']
Client a : ['y', 'x']
Client b : ['x']
Client c : ['x']
Client d : ['z', 'y']
Client f : ['y', 'x', 'z']
Client g : ['x', 'z']
[36]:
m.dequeue() # first call
[36]:
[]
Clients ‘a’, ‘d’ and ‘f’ change shop, while the others stay in their current shop. The mall will now look like this:
[37]:
print(m)
Mall
Shop x : ['b', 'c', 'f']
Shop y : ['a']
Shop z : ['g', 'd']
Client a : ['y']
Client b : ['x']
Client c : ['x']
Client d : ['z']
Client f : ['y', 'x']
Client g : ['x', 'z']
[38]:
m.dequeue() # second call
[38]:
['b', 'a']
because client ‘b’ was at the head of the first shop in the list (x), ‘a’ at the head of the second (y), and both clients had nothing else to visit. Client ‘g’ changes shop, while the others remain in their current shop.
The mall will now look like this:
[39]:
print(m) # Clients a and b are gone
Mall
Shop x : ['c', 'f', 'g']
Shop y : []
Shop z : ['d']
Client c : ['x']
Client d : ['z']
Client f : ['y', 'x']
Client g : ['x']
[40]:
m.dequeue() # third call
[40]:
['c', 'd']
[41]:
print(m)
Mall
Shop x : ['f', 'g']
Shop y : []
Shop z : []
Client f : ['y', 'x']
Client g : ['x']
[42]:
m.dequeue() # fourth call
[42]:
[]
[43]:
print(m)
Mall
Shop x : ['g']
Shop y : ['f']
Shop z : []
Client f : ['y']
Client g : ['x']
[44]:
m.dequeue() # fifth call
[44]:
['g', 'f']
[45]:
print(m)
Mall
Shop x : []
Shop y : []
Shop z : []
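The "VERY IMPORTANT HINT" in the docstring is about ordering. Suppose a client were moved to his next shop immediately. If that shop is scanned later in the same step, the client would be dequeued twice. In the fourth call above, for example, f would be popped from x, pushed into the empty y, and then popped again from y. The sketch below avoids this by collecting all heads first and moving clients only afterwards. It is a hedged illustration on plain containers, not the Mall class. The assumptions are an OrderedDict of deques for shops, an OrderedDict of lists (stack top = last item) for clients, and the hypothetical function name mall_dequeue. Running it replays the five calls shown above, plus one more on the empty mall:

```python
from collections import OrderedDict, deque

def mall_dequeue(shops, clients):
    """Hypothetical sketch of one Mall.dequeue step on plain containers.

    shops:   OrderedDict shop name -> deque of client names (head on the left)
    clients: OrderedDict client name -> list of shops to visit (top = last item)
    """
    # Phase 1: FIRST collect the heads of all non-empty shops, in shop order
    heads = [queue.popleft() for queue in shops.values() if queue]
    exited = []
    # Phase 2: THEN move each collected client, in the same order
    for name in heads:
        to_visit = clients[name]
        to_visit.pop()                         # remove the shop just visited
        if to_visit:
            shops[to_visit[-1]].append(name)   # enqueue in next desired shop
        else:
            del clients[name]                  # nothing left: exits the mall
            exited.append(name)
    return exited

shops = OrderedDict([('x', deque(['a', 'b', 'c'])), ('y', deque(['d'])),
                     ('z', deque(['f', 'g']))])
clients = OrderedDict([('a', ['y', 'x']), ('b', ['x']), ('c', ['x']),
                       ('d', ['z', 'y']), ('f', ['y', 'x', 'z']),
                       ('g', ['x', 'z'])])
for step in range(6):
    print(mall_dequeue(shops, clients))
# prints, one per line: [] ['b', 'a'] ['c', 'd'] [] ['g', 'f'] []
```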
6. Company queues¶
We can model a company as a list of many employees ordered by their rank, the highest ranking being the first in the list. We assume all employees have different ranks. Each employee has a name, a rank, and a queue of tasks to perform (as a Python deque).
When a new employee arrives, he is inserted in the list in the right position according to his rank:
[46]:
from company_solution import *
c = Company()
print(c)
Company:
name rank tasks
[47]:
c.add_employee('x',9)
[48]:
print(c)
Company:
name rank tasks
x 9 deque([])
[49]:
c.add_employee('z',2)
[50]:
print(c)
Company:
name rank tasks
x 9 deque([])
z 2 deque([])
[51]:
c.add_employee('y',6)
[52]:
print(c)
Company:
name rank tasks
x 9 deque([])
y 6 deque([])
z 2 deque([])
7.1 add_employee¶
Implement this method:
def add_employee(self, name, rank):
    """
        Adds employee with name and rank to the company, maintaining
        the _employees list sorted by rank (higher rank comes first)
        Represent the employee as a dictionary with keys 'name', 'rank'
        and 'tasks' (a Python deque)
        - here we don't care about complexity, feel free to use a
          linear scan and .insert
        - If an employee of the same rank already exists, raise ValueError
        - If an employee of the same name already exists, raise ValueError
    """
Testing: python3 -m unittest company_test.AddEmployeeTest
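A linear scan followed by list.insert is enough here. First reject duplicate ranks or names, then skip every employee who ranks higher and insert at that point. This is a sketch that works on a bare list of employee dictionaries rather than on the Company class, and add_employee is used here as a plain function for illustration only:

```python
from collections import deque

def add_employee(employees, name, rank):
    """Hypothetical sketch: insert {'name', 'rank', 'tasks'} into the list
    `employees`, keeping it sorted by decreasing rank (linear scan + insert)."""
    for emp in employees:
        if emp['rank'] == rank:
            raise ValueError("An employee with rank %s already exists" % rank)
        if emp['name'] == name:
            raise ValueError("An employee named %s already exists" % name)
    i = 0
    while i < len(employees) and employees[i]['rank'] > rank:
        i += 1                         # skip employees with a higher rank
    employees.insert(i, {'name': name, 'rank': rank, 'tasks': deque()})

employees = []
for name, rank in [('x', 9), ('z', 2), ('y', 6)]:
    add_employee(employees, name, rank)
print([(e['name'], e['rank']) for e in employees])  # [('x', 9), ('y', 6), ('z', 2)]
```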
7.2 add_task¶
Each employee has a queue of tasks to perform. Tasks enter from the right and leave from the left. Each task has an associated rank required to perform it, but when it is assigned to an employee, the required rank may exceed the employee's rank or be far below it. Still, when the company receives a task, it is scheduled in the given employee's queue, ignoring the task rank.
[53]:
c.add_task('a',3,'x')
[54]:
c
[54]:
Company:
name rank tasks
x 9 deque([('a', 3)])
y 6 deque([])
z 2 deque([])
[55]:
c.add_task('b',5,'x')
[56]:
c
[56]:
Company:
name rank tasks
x 9 deque([('a', 3), ('b', 5)])
y 6 deque([])
z 2 deque([])
[57]:
c.add_task('c',12,'x')
c.add_task('d',1,'x')
c.add_task('e',8,'y')
c.add_task('f',2,'y')
c.add_task('g',8,'y')
c.add_task('h',10,'z')
[58]:
c
[58]:
Company:
name rank tasks
x 9 deque([('a', 3), ('b', 5), ('c', 12), ('d', 1)])
y 6 deque([('e', 8), ('f', 2), ('g', 8)])
z 2 deque([('h', 10)])
Implement this method:
def add_task(self, task_name, task_rank, employee_name):
    """ Append the task as a (name, rank) tuple to the tasks of
        given employee
        - If employee does not exist, raise ValueError
    """
Testing: python3 -m unittest company_test.AddTaskTest
7.3 work¶
Work in the company is produced in work steps. Each work step produces a list of the names of all tasks executed by the company in that work step.
A work step is done this way:
For each employee, starting from the highest ranking one, dequeue his current task (from the left), and then compare the task required rank with the employee rank according to these rules:
When an employee discovers a task requires a rank strictly greater than his rank, he will append the task to the tasks of his supervisor (the next higher ranking employee). Note the highest ranking employee has no supervisor, so he may be forced to do tasks that are greater than his rank.
When an employee discovers he should do a task requiring a rank strictly less than his, he will check whether the next lower ranking employee can do the task (that is, whether the task rank is lower than or equal to that employee's rank), and if so append the task to that employee's tasks.
When an employee can pass the task neither to the supervisor nor to the next lower ranking employee, he actually executes the task, adding it to the work step list.
Example:
[59]:
c
[59]:
Company:
name rank tasks
x 9 deque([('a', 3), ('b', 5), ('c', 12), ('d', 1)])
y 6 deque([('e', 8), ('f', 2), ('g', 8)])
z 2 deque([('h', 10)])
[60]:
c.work()
DEBUG: Employee x gives task ('a', 3) to employee y
DEBUG: Employee y gives task ('e', 8) to employee x
DEBUG: Employee z gives task ('h', 10) to employee y
DEBUG: Total performed work this step: []
[60]:
[]
[61]:
c
[61]:
Company:
name rank tasks
x 9 deque([('b', 5), ('c', 12), ('d', 1), ('e', 8)])
y 6 deque([('f', 2), ('g', 8), ('a', 3), ('h', 10)])
z 2 deque([])
[62]:
c.work()
DEBUG: Employee x gives task ('b', 5) to employee y
DEBUG: Employee y gives task ('f', 2) to employee z
DEBUG: Employee z executes task ('f', 2)
DEBUG: Total performed work this step: ['f']
[62]:
['f']
[63]:
c
[63]:
Company:
name rank tasks
x 9 deque([('c', 12), ('d', 1), ('e', 8)])
y 6 deque([('g', 8), ('a', 3), ('h', 10), ('b', 5)])
z 2 deque([])
[64]:
c.work()
DEBUG: Employee x executes task ('c', 12)
DEBUG: Employee y gives task ('g', 8) to employee x
DEBUG: Total performed work this step: ['c']
[64]:
['c']
[65]:
c
[65]:
Company:
name rank tasks
x 9 deque([('d', 1), ('e', 8), ('g', 8)])
y 6 deque([('a', 3), ('h', 10), ('b', 5)])
z 2 deque([])
[66]:
c.work()
DEBUG: Employee x gives task ('d', 1) to employee y
DEBUG: Employee y executes task ('a', 3)
DEBUG: Total performed work this step: ['a']
[66]:
['a']
[67]:
c
[67]:
Company:
name rank tasks
x 9 deque([('e', 8), ('g', 8)])
y 6 deque([('h', 10), ('b', 5), ('d', 1)])
z 2 deque([])
[68]:
c.work()
DEBUG: Employee x executes task ('e', 8)
DEBUG: Employee y gives task ('h', 10) to employee x
DEBUG: Total performed work this step: ['e']
[68]:
['e']
[69]:
c
[69]:
Company:
name rank tasks
x 9 deque([('g', 8), ('h', 10)])
y 6 deque([('b', 5), ('d', 1)])
z 2 deque([])
[70]:
c.work()
DEBUG: Employee x executes task ('g', 8)
DEBUG: Employee y executes task ('b', 5)
DEBUG: Total performed work this step: ['g', 'b']
[70]:
['g', 'b']
[71]:
c
[71]:
Company:
name rank tasks
x 9 deque([('h', 10)])
y 6 deque([('d', 1)])
z 2 deque([])
[72]:
c.work()
DEBUG: Employee x executes task ('h', 10)
DEBUG: Employee y gives task ('d', 1) to employee z
DEBUG: Employee z executes task ('d', 1)
DEBUG: Total performed work this step: ['h', 'd']
[72]:
['h', 'd']
[73]:
c
[73]:
Company:
name rank tasks
x 9 deque([])
y 6 deque([])
z 2 deque([])
Now implement this method:
def work(self):
    """ Performs a work step and RETURN a list of performed task names.
        For each employee, dequeue its current task from the left and:
        - if the task rank is greater than the rank of the
          current employee, append the task to his supervisor queue
          (the highest ranking employee must execute the task)
        - if the task rank is lower than or equal to the rank of the
          next lower ranking employee, append the task to that employee
          queue
        - otherwise, add the task name to the list of
          performed tasks to return
    """
Testing: python3 -m unittest company_test.WorkTest
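The trace above shows a subtle point. Employees are processed one after the other on the live queues. So a task passed down to a lower employee with an empty queue can be executed within the same step, as with task f in the second step. The sketch below follows the docstring rules literally. It assumes employees are dictionaries in a list sorted by decreasing rank, and it uses the hypothetical names work and make_employee. It is not the reference solution, but replaying the example gives the same seven results as the trace:

```python
from collections import deque

def work(employees):
    """Hypothetical sketch of one work step on a list of employee dicts
    sorted by decreasing rank; returns the names of executed tasks."""
    performed = []
    for i, emp in enumerate(employees):
        if not emp['tasks']:
            continue                                  # nothing to do this step
        task = emp['tasks'].popleft()
        task_name, task_rank = task
        if task_rank > emp['rank'] and i > 0:
            employees[i - 1]['tasks'].append(task)    # up to the supervisor
        elif i + 1 < len(employees) and task_rank <= employees[i + 1]['rank']:
            employees[i + 1]['tasks'].append(task)    # down to next lower rank
        else:
            performed.append(task_name)               # executed here
    return performed

def make_employee(name, rank, tasks):
    return {'name': name, 'rank': rank, 'tasks': deque(tasks)}

company = [make_employee('x', 9, [('a', 3), ('b', 5), ('c', 12), ('d', 1)]),
           make_employee('y', 6, [('e', 8), ('f', 2), ('g', 8)]),
           make_employee('z', 2, [('h', 10)])]
steps = [work(company) for _ in range(7)]
print(steps)   # [[], ['f'], ['c'], ['a'], ['e'], ['g', 'b'], ['h', 'd']]
```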
7. Concert¶
Start editing file concert_exercise.py.
When there are events with lots of potential visitors such as concerts, to speed up check-in there are at least two queues: one for cash where tickets are sold, and one for the actual entrance at the event.
Each visitor may or may not have a ticket. Also, since people usually attend in groups (couples, families, and so on), in the queues each group tends to move as a whole.
In Python, we will model a Person as a class you can create like this:
[74]:
from concert_solution import *
[75]:
Person('a', 'x', False)
[75]:
Person(a,x,False)
'a' is the name, 'x' is the group, and False indicates the person doesn’t have a ticket.
To model the two queues, in the Concert class we have these fields and methods:
from collections import deque

class Concert:

    def __init__(self):
        self._cash = deque()
        self._entrance = deque()

    def enqc(self, person):
        """ Enqueues at the cash from the right """
        self._cash.append(person)

    def enqe(self, person):
        """ Enqueues at the entrance from the right """
        self._entrance.append(person)
7.1 dequeue¶
✪✪✪ Implement dequeue. If you want, you can add debug prints by calling the debug function.
def dequeue(self):
    """ RETURN the names of people admitted to concert
        Dequeuing for the whole queue system is done in groups, that is,
        with a _single_ call to dequeue, these steps happen, in order:
        1. entrance queue: all people belonging to the same group at
           the front of entrance queue who have the ticket exit the queue
           and are admitted to concert. People in the group without the
           ticket are sent to cash.
        2. cash queue: all people belonging to the same group at the front
           of cash queue are given a ticket, and are queued at the entrance queue
    """
Testing: python3 -m unittest concert_test.DequeueTest
Example:
[76]:
con = Concert()
con.enqc(Person('a','x',False)) # a,b,c belong to same group x
con.enqc(Person('b','x',False))
con.enqc(Person('c','x',False))
con.enqc(Person('d','y',False)) # d belongs to group y
con.enqc(Person('e','z',False)) # e,f belongs to group z
con.enqc(Person('f','z',False))
con.enqc(Person('g','w',False)) # g belongs to group w
[77]:
con
[77]:
Concert:
cash: deque([Person(a,x,False),
Person(b,x,False),
Person(c,x,False),
Person(d,y,False),
Person(e,z,False),
Person(f,z,False),
Person(g,w,False)])
entrance: deque([])
The first time we dequeue, the entrance queue is empty, so no one enters the concert, while at the cash queue the people in group x are given a ticket and enqueued at the entrance queue.
NOTE: the messages on the console are just debug prints; the function dequeue only returns the names of people admitted to the concert.
[78]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: giving ticket to a (group x)
DEBUG: giving ticket to b (group x)
DEBUG: giving ticket to c (group x)
DEBUG: Concert:
cash: deque([Person(d,y,False),
Person(e,z,False),
Person(f,z,False),
Person(g,w,False)])
entrance: deque([Person(a,x,True),
Person(b,x,True),
Person(c,x,True)])
[78]:
[]
[79]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: a (group x) admitted to concert
DEBUG: b (group x) admitted to concert
DEBUG: c (group x) admitted to concert
DEBUG: giving ticket to d (group y)
DEBUG: Concert:
cash: deque([Person(e,z,False),
Person(f,z,False),
Person(g,w,False)])
entrance: deque([Person(d,y,True)])
[79]:
['a', 'b', 'c']
[80]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: d (group y) admitted to concert
DEBUG: giving ticket to e (group z)
DEBUG: giving ticket to f (group z)
DEBUG: Concert:
cash: deque([Person(g,w,False)])
entrance: deque([Person(e,z,True),
Person(f,z,True)])
[80]:
['d']
[81]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: e (group z) admitted to concert
DEBUG: f (group z) admitted to concert
DEBUG: giving ticket to g (group w)
DEBUG: Concert:
cash: deque([])
entrance: deque([Person(g,w,True)])
[81]:
['e', 'f']
[82]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: g (group w) admitted to concert
DEBUG: Concert:
cash: deque([])
entrance: deque([])
[82]:
['g']
[83]:
# calling dequeue on empty lines gives empty list:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: Concert:
cash: deque([])
entrance: deque([])
[83]:
[]
Special dequeue case: broken group¶
In the special case when there is a group at the entrance with one or more members without a ticket, it is assumed that the group gets broken, so whoever has the ticket enters and the others get enqueued at the cash.
[84]:
con = Concert()
con.enqe(Person('a','x',True))
con.enqe(Person('b','x',False))
con.enqe(Person('c','x',True))
con.enqc(Person('f','y',False))
con
[84]:
Concert:
cash: deque([Person(f,y,False)])
entrance: deque([Person(a,x,True),
Person(b,x,False),
Person(c,x,True)])
[85]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: a (group x) admitted to concert
DEBUG: b (group x) has no ticket! Sending to cash
DEBUG: c (group x) admitted to concert
DEBUG: giving ticket to f (group y)
DEBUG: Concert:
cash: deque([Person(b,x,False)])
entrance: deque([Person(f,y,True)])
[85]:
['a', 'c']
[86]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: f (group y) admitted to concert
DEBUG: giving ticket to b (group x)
DEBUG: Concert:
cash: deque([])
entrance: deque([Person(b,x,True)])
[86]:
['f']
[87]:
con.dequeue()
DEBUG: DEQUEUING ..
DEBUG: b (group x) admitted to concert
DEBUG: Concert:
cash: deque([])
entrance: deque([])
[87]:
['b']
[88]:
con
[88]:
Concert:
cash: deque([])
entrance: deque([])
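Both examples, including the broken group, follow from two loops that each consume "the group at the front" of one queue. The steps must run in the stated order. A ticketless person sent back to an empty cash queue in step 1 is then served by step 2 in the same call. The sketch below assumes a hypothetical dataclass Person in place of the exercise's class, and uses two bare deques with the hypothetical function name concert_dequeue instead of the Concert object. Debug printing is omitted:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Person:                 # hypothetical stand-in for the exercise's Person
    name: str
    group: str
    ticket: bool

def concert_dequeue(cash, entrance):
    admitted = []
    # 1. entrance: the whole group at the front leaves the queue together
    if entrance:
        group = entrance[0].group
        while entrance and entrance[0].group == group:
            person = entrance.popleft()
            if person.ticket:
                admitted.append(person.name)
            else:
                cash.append(person)        # broken group: back to the cash
    # 2. cash: the whole group at the front gets tickets, moves to the entrance
    if cash:
        group = cash[0].group
        while cash and cash[0].group == group:
            person = cash.popleft()
            person.ticket = True
            entrance.append(person)
    return admitted

cash = deque([Person('f', 'y', False)])
entrance = deque([Person('a', 'x', True), Person('b', 'x', False),
                  Person('c', 'x', True)])
print(concert_dequeue(cash, entrance))   # ['a', 'c']
print(concert_dequeue(cash, entrance))   # ['f']
print(concert_dequeue(cash, entrance))   # ['b']
```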
[89]:
m.dequeue() # the mall m from the previous section: no clients left
[89]:
[]
Trees¶
0. Introduction¶
We will deal with both binary and generic trees.
What to do¶
Unzip the exercises into a folder; you should get something like this:
-jupman.py
-sciprog.py
-exercises
     |-trees
          |- trees.ipynb
          |- bin_tree_test.py
          |- bin_tree_exercise.py
          |- bin_tree_solution.py
          |- gen_tree_test.py
          |- gen_tree_exercise.py
          |- gen_tree_solution.py
Open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm); you will edit the files ending in _exercise.py.
Go on reading this notebook, and follow the instructions inside.
BT 0. Binary Tree Introduction¶
BT 0.2 Terminology - relations¶
BT 0.3 Terminology - levels¶
BT 0.4 Terminology - shapes¶
In this worksheet we are first going to provide an implementation of a BinaryTree class:
Unlike the LinkedList, which actually had two classes, Node and LinkedList (the latter pointing to the first node), in this case we just have one BinaryTree class.
Each BinaryTree instance may have a left BinaryTree instance and may have a right BinaryTree instance, while the absence of a branch is marked with None. This reflects the recursive nature of trees.
To grow a tree, first you need to create an instance of BinaryTree, and then you call the .insert_left or .insert_right methods on it, passing data. Keep reading to see how to do it.
BT 0.2 Code skeleton¶
Look at the files:
exercises/trees/bin_tree_exercise.py: the exercise to edit
exercises/trees/bin_tree_test.py: the tests to run. Do not modify this file.
Before starting to implement methods in the BinaryTree class, read all the following subsections (starting with ‘0.x’)
BT 0.3 Building trees¶
Let’s learn how to build a BinaryTree. For these trials, feel free to launch a Python 3 interpreter and load this module:
[2]:
from bin_tree_solution import *
BT 0.3.1 Pointers¶
A BinaryTree class holds 2 pointers that link it to other nodes: _left and _right.
It also holds a value _data, which is provided by the user to store arbitrary data (could be ints, strings, lists, even other trees, we don’t care):
class BinaryTree:
    def __init__(self, data):
        self._data = data
        self._left = None
        self._right = None
NOTE: BinaryTree as defined here is unidirectional, that is, it has no backlinks (so no _parent field).
Formally, a tree as described in discrete mathematics books is always unidirectional (it can’t have any cycle) and every node can have at most one incoming link. When we program, though, for convenience we may decide whether or not to have backlinks (later, with GenericTree, we will see an example).
To create a BinaryTree of one node, just call the constructor passing whatever you want like this:
[3]:
tblah = BinaryTree("blah")
tn = BinaryTree(5)
Note that with the provided constructor you can’t pass children.
BT 0.3.2 Building with insert_left¶
To grow a BinaryTree, as a basic building block you will have to implement insert_left:
def insert_left(self, data):
    """ Takes as input DATA (*NOT* a node !!) and MODIFIES current
        node this way:
        - First creates a new BinaryTree (let's call it B) into which
          provided data is wrapped.
        - Then:
          - if there is no left node in self, new node B is attached to
            the left of self
          - if there already is a left node L, it is substituted by
            new node B, and L becomes the left node of B
    """
You can call it like this:
[4]:
t = BinaryTree('a')
t.insert_left('c')
[5]:
print(t)
a
├c
└
[6]:
t.insert_left('b')
[7]:
print(t)
a
├b
│├c
│└
└
[8]:
t.left().data()
[8]:
'b'
[9]:
t.left().left().data()
[9]:
'c'
BT 0.3.3 Building with bt¶
If you need to test your data structure, we provide you with this handy function bt in the bin_tree_test module, which allows you to easily construct trees from other trees.
WARNING: DO NOT USE bt inside your implementation code !!!! bt is just meant for testing.
def bt(*args):
    """ Shorthand function that returns a BinaryTree containing the provided
        data and children. First parameter is the data, the following ones are the children.
    """
[10]:
from bin_tree_test import bt
bt('a')
print(bt('a'))
a
[11]:
print(bt('a', None, bt('b')))
a
├
└b
[12]:
print(bt('a', bt('b'), bt('c')))
a
├b
└c
[13]:
print(bt('a', bt('b'), bt('c', bt('d'), None)) )
a
├b
└c
├d
└
BT 1. Insertions¶
BT 1.1 insert_left¶
Implement insert_left:
def insert_left(self, data):
    """ Takes as input DATA (*NOT* a node !!) and MODIFIES current node
        this way:
        - First creates a new BinaryTree (let's call it B) into which
          provided data is wrapped.
        - Then:
          - if there is no left node in self, new node B is attached to
            the left of self
          - if there already is a left node L, it is substituted by
            new node B, and L becomes the left node of B
    """
Testing: python3 -m unittest bin_tree_test.InsertLeftTest
BT 1.2 insert_right¶
Implement insert_right:
def insert_right(self, data):
    """ Takes as input DATA (*NOT* a node !!) and MODIFIES current node
        this way:
        - First creates a new BinaryTree (let's call it B) into which
          provided data is wrapped.
        - Then:
          - if there is no right node in self, new node B is attached
            to the right of self
          - if there already is a right node L, it is substituted by
            new node B, and L becomes the right node of B
    """
Testing: python3 -m unittest bin_tree_test.InsertRightTest
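Both insertions need only pointer juggling. Create B, hang the old child (possibly None) below B on the same side, then make B the new child. The sketch below uses a hypothetical class SketchTree that mirrors the fields and accessors shown above, so it does not stand in for your BinaryTree. It replays the 'a', 'c', 'b' example from section BT 0.3.2:

```python
class SketchTree:
    """Hypothetical minimal tree mirroring the BinaryTree fields shown above."""

    def __init__(self, data):
        self._data = data
        self._left = None
        self._right = None

    def data(self):
        return self._data

    def left(self):
        return self._left

    def right(self):
        return self._right

    def insert_left(self, data):
        node = SketchTree(data)      # B wraps the provided data
        node._left = self._left      # old left child L (possibly None) goes below B
        self._left = node            # B becomes the new left child of self

    def insert_right(self, data):
        node = SketchTree(data)
        node._right = self._right    # old right child goes below B, on its right
        self._right = node

t = SketchTree('a')
t.insert_left('c')
t.insert_left('b')
print(t.left().data(), t.left().left().data())   # b c
```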
BT 2. Recursive visit¶
In these exercises, we are going to implement methods which make recursive calls. Before doing it, we should ask ourselves why. Recursion is typical of functional languages. Is Python one of them? Python is a general purpose language that allows writing imperative and object-oriented code, and also sports some, but not all, functional programming features. Unfortunately, one notably missing feature is the capability to efficiently perform recursive calls. If too many nested recursive calls happen, you will probably get a RecursionError (‘maximum recursion depth exceeded’). So why should we bother?
It turns out that recursive code is often much shorter and more elegant than the corresponding imperative code (which would often use stacks). So to gain a first understanding of problems, it might be beneficial to think about a recursive solution. After that, we may increase efficiency by explicitly using a stack instead of recursive calls.
BT 2.1 sum_rec¶
Supposing all nodes hold a number, let’s see how to write a method that returns the sum of all numbers in the tree. We can define sum recursively:
if a node has no children: the sum is equal to the node data.
if a node has only left child: the sum is equal to the node data plus the (recursive) sum of left child
if a node has only right child: the sum is equal to the node data plus the (recursive) sum of right child
if a node has both left and right child: the sum is equal to the node data plus the (recursive) sum of left child and the (recursive) sum of the right child
Example: black numbers are node data, purple numbers are the respective sums.
Let’s look at the node with black number 10: its sum is 23, which is given by its data 10, plus 1 (the recursive sum of the left child 1), plus 12 (the recursive sum of the right child 7).
def sum_rec(self):
    """ Supposing the tree holds integer numbers in all nodes,
        RETURN the sum of the numbers.
        - implement it as a recursive Depth First Search (DFS) traversal
          NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.ContainsRecTest
Code example:
[14]:
t = bt(3,
       bt(10,
          bt(1),
          bt(7,
             bt(5))),
       bt(9,
          bt(6,
             bt(2,
                None,
                bt(4)),
             bt(8))))
print(t)
3
├10
│├1
│└7
│ ├5
│ └
└9
├6
│├2
││├
││└4
│└8
└
[15]:
t.sum_rec()
[15]:
55
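The four cases of the recursive definition collapse into "own data, plus the sum of each child that exists". The sketch below shows this. It assumes a hypothetical Node class whose constructor accepts children (unlike the exercise's BinaryTree) and uses a plain function in place of the sum_rec method. The tree is the same as in the example above:

```python
class Node:
    """Hypothetical stand-in for BinaryTree: data plus optional children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def sum_rec(node):
    """Sum of all data in the subtree rooted at node (recursive DFS)."""
    total = node.data
    if node.left is not None:
        total += sum_rec(node.left)      # recursive sum of the left child
    if node.right is not None:
        total += sum_rec(node.right)     # recursive sum of the right child
    return total

t = Node(3,
         Node(10, Node(1), Node(7, Node(5))),
         Node(9, Node(6, Node(2, None, Node(4)), Node(8))))
print(sum_rec(t))        # 55
print(sum_rec(t.left))   # 23, as for node 10 in the text
```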
BT 2.2 height_rec¶
Let’s say we want to know the height of a tree, which is defined as ‘the maximum depth of all the leaves’. We can think recursively:
the height of a node without children is 0
the height of a node with only a left child is the height of the left node plus one
the height of a node with only a right child is the height of the right node plus one
the height of a node with both left and right children is the maximum of the height of the left node and height of the right node, plus one
Look at the example and try to convince yourself this makes sense:
in purple you see the height corresponding to each node
notice how all leaves have height 0
def height_rec(self):
    """ RETURN an integer which is the height of the tree
        - implement it as recursive call which does NOT modify the tree
          NOTE: with big trees a recursive solution would surely exceed
          the call stack, but here we don't mind
        - A tree with only one node has height zero.
    """
Testing: python3 -m unittest bin_tree_test.HeightRecTest
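The rules above reduce to one line each: a leaf has height 0, and any other node is one more than its tallest existing child. This is a sketch with the same hypothetical Node class as before (children passed to the constructor) and a plain function height_rec. It is tried on the tree used for depth_rec below:

```python
class Node:
    """Hypothetical stand-in for BinaryTree: data plus optional children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def height_rec(node):
    # heights of the children that actually exist
    heights = [height_rec(child) for child in (node.left, node.right)
               if child is not None]
    return 1 + max(heights) if heights else 0   # a leaf has height 0

t = Node('a', Node('b', Node('c')), Node('d', None, Node('e', Node('f'))))
print(height_rec(t))   # 3
```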
BT 2.3 depth_rec¶
def depth_rec(self, level):
    """
        - MODIFIES the tree by putting in the data field the provided
          value level (which is an integer),
          and recursively calls itself on left and right nodes
          (if present) passing level + 1
        - implement it as a recursive Depth First Search (DFS) traversal
          NOTE: with big trees a recursive solution would surely exceed
          the call stack, but here we don't mind
        - The root of a tree has depth zero.
        - does not return anything
    """
Testing: python3 -m unittest bin_tree_test.DepthDfsTest
For example, if we take this tree:
[16]:
t = bt('a', bt('b', bt('c'), None), bt('d', None, bt('e', bt('f'))))
print(t)
a
├b
│├c
│└
└d
├
└e
├f
└
After a call to depth_rec on t, passing 0 as the starting level, all letters will be replaced by the tree depth at that point:
[17]:
t.depth_rec(0)
[18]:
print(t)
0
├1
│├2
│└
└1
├
└2
├3
└
BT 2.4 contains_rec¶
def contains_rec(self, item):
    """ RETURN True if at least one node in the tree has data equal
        to item, otherwise RETURN False.
        - implement it as a recursive Depth First Search (DFS) traversal
          NOTE: with big trees a recursive solution would surely exceed
          the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.ContainsRecTest
Example:
[19]:
t = bt('a',
       bt('b',
          bt('c'),
          bt('d',
             None,
             bt('e'))),
       bt('f',
          bt('g',
             bt('h')),
          bt('i')))
[20]:
print(t)
a
├b
│├c
│└d
│ ├
│ └e
└f
├g
│├h
│└
└i
[21]:
t.contains_rec('g')
[21]:
True
[22]:
t.contains_rec('z')
[22]:
False
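A recursive search can stop as soon as the item is found. In the sketch below, Python's any() provides that short-circuit over the existing children. The sketch assumes the same hypothetical Node class (children in the constructor) and a plain function contains_rec, applied to the example tree:

```python
class Node:
    """Hypothetical stand-in for BinaryTree: data plus optional children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def contains_rec(node, item):
    if node.data == item:
        return True
    # any() stops at the first child subtree that contains the item
    return any(contains_rec(child, item)
               for child in (node.left, node.right) if child is not None)

t = Node('a',
         Node('b', Node('c'), Node('d', None, Node('e'))),
         Node('f', Node('g', Node('h')), Node('i')))
print(contains_rec(t, 'g'), contains_rec(t, 'z'))   # True False
```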
BT 2.5 join_rec¶
def join_rec(self):
    """ Supposing the tree nodes hold a character each, RETURN a STRING
        holding all characters IN-ORDER
        - implement it as a recursive Depth First Search (DFS) traversal
          NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.JoinRecTest
[23]:
t = bt('e',
       bt('b',
          bt('a'),
          bt('c',
             None,
             bt('d'))),
       bt('h',
          bt('g',
             bt('f')),
          bt('i')))
[24]:
print(t)
e
├b
│├a
│└c
│ ├
│ └d
└h
├g
│├f
│└
└i
[25]:
t.join_rec()
[25]:
'abcdefghi'
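In-order means "left subtree, then the node itself, then the right subtree". A missing child contributes the empty string. This is a sketch with the hypothetical Node class and a plain function join_rec, on the same tree as above. For very large trees, collecting the pieces into a list and calling ''.join once would avoid repeated string concatenation:

```python
class Node:
    """Hypothetical stand-in for BinaryTree: data plus optional children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def join_rec(node):
    left = join_rec(node.left) if node.left is not None else ''
    right = join_rec(node.right) if node.right is not None else ''
    return left + node.data + right      # in-order: left, node, right

t = Node('e',
         Node('b', Node('a'), Node('c', None, Node('d'))),
         Node('h', Node('g', Node('f')), Node('i')))
print(join_rec(t))   # abcdefghi
```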
BT 2.6 fun_rec¶
def fun_rec(self):
    """ Supposing the tree nodes hold expressions which can either be
        functions or single variables, RETURN a string holding
        the complete formula with needed parentheses.
        - implement it as a recursive Depth First Search (DFS)
          PRE-ORDER visit
        - NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.FunRecTest
Example:
[26]:
t = bt('f',
       bt('g',
          bt('x'),
          bt('y')),
       bt('f',
          bt('h',
             bt('z')),
          bt('w')))
[27]:
print(t)
f
├g
│├x
│└y
└f
├h
│├z
│└
└w
[28]:
t.fun_rec()
[28]:
'f(g(x,y),f(h(z),w))'
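In pre-order the node's own text comes first. Its existing children become the comma-separated arguments, and a node without children is a plain variable. This is a sketch with the hypothetical Node class and a plain function fun_rec, checked against the example output:

```python
class Node:
    """Hypothetical stand-in for BinaryTree: data plus optional children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def fun_rec(node):
    args = [fun_rec(child) for child in (node.left, node.right)
            if child is not None]
    if not args:
        return node.data                              # a single variable
    return node.data + '(' + ','.join(args) + ')'     # function application

t = Node('f',
         Node('g', Node('x'), Node('y')),
         Node('f', Node('h', Node('z')), Node('w')))
print(fun_rec(t))   # f(g(x,y),f(h(z),w))
```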
BT 2.7 bin_search_rec¶
You are given a so-called binary search tree, which holds numbers as data, and all nodes respect these constraints:
if a node A holds a number strictly less than the number held by its parent node B, then node A must be a left child of B
if a node C holds a number greater than or equal to the number held by its parent node B, then node C must be a right child of B
[29]:
t = bt(7,
       bt(3,
          bt(2),
          bt(6)),
       bt(12,
          bt(8,
             None,
             bt(11,
                bt(9))),
          bt(14,
             bt(13))))
print(t)
7
├3
│├2
│└6
└12
├8
│├
│└11
│ ├9
│ └
└14
├13
└
Implement the following method:
def bin_search_rec(self, m):
    """ Assuming the tree is a binary search tree of integer numbers,
        RETURN True if m is present in the tree, False otherwise
        - MUST EXECUTE IN O(height(t))
        - NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
    raise Exception("TODO IMPLEMENT ME !")
QUESTION: what is the complexity in the worst-case scenario?
QUESTION: what is the complexity when the tree is balanced?
Testing: python3 -m unittest bin_tree_test.BinSearchRecTest
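The search constraint lets each call descend into at most one child, so the number of calls is bounded by the height of the tree plus one. Letting the function accept None removes the "is there a child?" checks. This is a sketch with the hypothetical Node class and a plain function bin_search_rec, on the example tree:

```python
class Node:
    """Hypothetical stand-in for BinaryTree: data plus optional children."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def bin_search_rec(node, m):
    if node is None:
        return False                          # fell off the tree: not present
    if m == node.data:
        return True
    if m < node.data:
        return bin_search_rec(node.left, m)   # smaller values live on the left
    return bin_search_rec(node.right, m)      # greater or equal on the right

t = Node(7,
         Node(3, Node(2), Node(6)),
         Node(12, Node(8, None, Node(11, Node(9))), Node(14, Node(13))))
print(bin_search_rec(t, 9), bin_search_rec(t, 10))   # True False
```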
BT 2.8 bin_insert_rec¶
def bin_insert_rec(self, m):
    """ Assuming the tree is a binary search tree of integer numbers,
        MODIFIES the tree by inserting a new node with the value m
        in the appropriate position. Node is always added as a leaf.
        - MUST EXECUTE IN O(height(t))
        - NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.BinInsertRecTest
Example:
[30]:
t = bt(7)
print(t)
7
[31]:
t.bin_insert_rec(3)
print(t)
7
├3
└
[32]:
t.bin_insert_rec(6)
print(t)
7
├3
│├
│└6
└
[33]:
t.bin_insert_rec(2)
print(t)
7
├3
│├2
│└6
└
[34]:
t.bin_insert_rec(12)
print(t)
7
├3
│├2
│└6
└12
[35]:
t.bin_insert_rec(14)
print(t)
7
├3
│├2
│└6
└12
├
└14
[36]:
t.bin_insert_rec(13)
print(t)
7
├3
│├2
│└6
└12
├
└14
├13
└
[37]:
t.bin_insert_rec(8)
print(t)
7
├3
│├2
│└6
└12
├8
└14
├13
└
[38]:
t.bin_insert_rec(11)
print(t)
7
├3
│├2
│└6
└12
├8
│├
│└11
└14
├13
└
[39]:
t.bin_insert_rec(9)
print(t)
7
├3
│├2
│└6
└12
├8
│├
│└11
│ ├9
│ └
└14
├13
└
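A sketch of the insertion under the same assumptions (hypothetical `Node` class, non-empty tree): it follows the same single path as the search and hangs the new leaf where the path ends; equal values go right, as the constraint requires. The `in_order` helper is only there to check the result, since an in-order visit of a binary search tree yields sorted data.

```python
class Node:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def bin_insert_rec(node, m):
    """Insert m as a new leaf of the non-empty BST rooted at node."""
    if m < node.data:
        if node.left is None:
            node.left = Node(m)
        else:
            bin_insert_rec(node.left, m)
    else:                                  # greater or equal: go right
        if node.right is None:
            node.right = Node(m)
        else:
            bin_insert_rec(node.right, m)


def in_order(node):
    """Hypothetical checking helper: left subtree, node, right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.data] + in_order(node.right)


t = Node(7)
for v in [3, 6, 2, 12, 14, 13, 8, 11, 9]:   # same order as the cells above
    bin_insert_rec(t, v)
print(in_order(t))   # [2, 3, 6, 7, 8, 9, 11, 12, 13, 14]
```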
BT 2.9 univalued_rec
def univalued_rec(self):
    """ RETURN True if the tree is univalued, otherwise RETURN False.
        - a tree is univalued when all nodes have the same value as data
        - MUST execute in O(n) where n is the number of nodes of the tree
        - NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
    """
Testing: python3 -m unittest bin_tree_test.UnivaluedRecTest
Example:
[40]:
t = bt(3, bt(3), bt(3, bt(3, bt(3, None, bt(3)))))
print(t)
3
├3
└3
├3
│├3
││├
││└3
│└
└
[41]:
t.univalued_rec()
[41]:
True
[42]:
t = bt(2, bt(3), bt(6, bt(3, bt(3, None, bt(3)))))
print(t)
2
├3
└6
├3
│├3
││├
││└3
│└
└
[43]:
t.univalued_rec()
[43]:
False
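One way to sketch it, assuming a hypothetical `Node` class: every node is compared against the root's data, and each node is visited at most once, which gives O(n); the `and` short-circuits as soon as a different value is found.

```python
class Node:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def univalued_rec(root):
    """True if every node of the (non-empty) tree holds the root's data."""
    def helper(node):
        if node is None:
            return True                    # an empty subtree breaks nothing
        return (node.data == root.data
                and helper(node.left)
                and helper(node.right))
    return helper(root)


t1 = Node(3, Node(3), Node(3, Node(3, Node(3, None, Node(3)))))
t2 = Node(2, Node(3), Node(6, Node(3, Node(3, None, Node(3)))))
print(univalued_rec(t1), univalued_rec(t2))   # True False
```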
BT 2.10 same_rec
def same_rec(self, other):
    """ RETURN True if this binary tree is equal to other binary tree,
        otherwise return False.
        - MUST execute in O(n) where n is the number of nodes of the tree
        - NOTE: with big trees a recursive solution would surely
          exceed the call stack, but here we don't mind
        - HINT: defining a helper function
              def helper(t1, t2):
          which recursively calls itself and assumes both of the
          inputs can be None may reduce the number of ifs to write.
    """
Testing: python3 -m unittest bin_tree_test.SameRecTest
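Following the HINT, a sketch with a helper that accepts `None` on either side (the `Node` class is a hypothetical stand-in): if at least one input is `None`, the two subtrees are equal only when both are `None`, which removes most of the separate `if` cases.

```python
class Node:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def same_rec(t1, t2):
    """True if both trees have the same shape and the same data everywhere."""
    def helper(a, b):
        if a is None or b is None:
            return a is b                  # equal only if both are None
        return (a.data == b.data
                and helper(a.left, b.left)
                and helper(a.right, b.right))
    return helper(t1, t2)


print(same_rec(Node(1, Node(2)), Node(1, Node(2))))         # True
print(same_rec(Node(1, Node(2)), Node(1, None, Node(2))))   # False
```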
BT 3. Stack visit
To avoid the RecursionError ('maximum recursion depth exceeded') that Python raises when recursion goes too deep, instead of using recursion we can implement tree operations with a while loop and a stack (or a queue, depending on the case).
Typically, in these algorithms you follow this recipe:
at the beginning, you put inside the stack the node on which the method is called
you keep executing the while loop until the stack is empty
inside the while loop, you pop the stack and do some processing on the popped node data
if the node has children, you push them on the stack
We will now reimplement in this way some of the methods we’ve already seen.
BT 3.1 sum_stack
Implement sum_stack:
def sum_stack(self):
    """ Supposing the tree holds integer numbers in all nodes,
        RETURN the sum of the numbers.
        - DO *NOT* use recursion
        - implement it with a while and a stack (as a python list)
        - In the stack place nodes to process
    """
Testing: python3 -m unittest bin_tree_test.SumStackTest
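The recipe from the stack visit section, sketched for the sum (the `Node` class is a hypothetical stand-in for the lab's tree): the stack is a plain Python list used with `append` and `pop`, and only existing children are pushed.

```python
class Node:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def sum_stack(root):
    """Sum of all node data, using a while loop and a stack of nodes."""
    total = 0
    stack = [root]                  # start with the node the method is called on
    while stack:                    # until there is nothing left to process
        node = stack.pop()
        total += node.data
        if node.left is not None:   # push only children that exist
            stack.append(node.left)
        if node.right is not None:
            stack.append(node.right)
    return total


print(sum_stack(Node(1, Node(2, Node(4)), Node(3))))   # 10
```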
BT 3.2 height_stack
The idea of this function is not that different from the Tasks do_level exercise we’ve seen in the lab about stacks.
def height_stack(self):
    """ RETURN an integer which is the height of the tree
        - A tree with only one node has height zero.
        - DO *NOT* use recursion
        - implement it with a while and a stack (as a python list).
        - In the stack place *tuples* holding a node *and* its level
    """
Testing: python3 -m unittest bin_tree_test.HeightStackTest
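A sketch that stores `(node, level)` tuples on the stack, as the docstring suggests, with a hypothetical `Node` class: each child is pushed one level deeper than its parent, and the height is the largest level ever popped.

```python
class Node:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def height_stack(root):
    """Height of the tree (a single node has height 0), without recursion."""
    max_level = 0
    stack = [(root, 0)]                       # the root sits at level 0
    while stack:
        node, level = stack.pop()
        max_level = max(max_level, level)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, level + 1))
    return max_level


t = Node(7,
         Node(3, Node(2), Node(6)),
         Node(12,
              Node(8, None, Node(11, Node(9))),
              Node(14, Node(13))))
print(height_stack(t))   # 4, along the path 7, 12, 8, 11, 9
```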
BT 3.3 others
Hopefully you got an idea of how stack-based visits work: now you could try to reimplement by yourself the recursive functions defined earlier, this time using a while loop and a stack (or a queue, depending on what you are trying to achieve).
BT Further resources
See the Trees exercises on LeetCode (sort by easy difficulty), for example:
same_tree (give a recursive solution)
GT 0. Generic Tree Introduction
See Luca Bianco’s Generic Tree theory
In this worksheet we are going to provide an implementation of a GenericTree class.
Why GenericTree? Because many object hierarchies in real life have interlinked pointers like these, in one form or another.
Unlike the LinkedList, which actually had two classes, Node and LinkedList (the latter pointing to the first node), in this case we have just one GenericTree class. So to grow a tree like the one in the picture, for each of the boxes that you see we will need to create one instance of GenericTree and link it to the other instances.
Ordinary simple trees just hold pointers to the children. In this case, we have an enriched tree which also holds pointers up to the parent and right to the siblings. Whenever we manipulate the tree, we need to take good care of updating these pointers.
Do we need sidelinks and backlinks?
Here we use sidelinks and backlinks like _sibling and _parent for exercise purposes, but keep in mind that such extra links need to be properly managed when you write algorithms, and thus increase the likelihood of introducing bugs.
As a general rule of thumb, if you are to design a data structure, always start by making it unidirectional (like for example the BinaryTree we’ve seen before). Then, if you notice you really need extra links (for example to quickly traverse a tree from a node up to the root), you can always add them in a later development iteration.
**ROOT NODE**: In this context, we call a node _root_
if it has no incoming edges _and_ it has neither a parent nor siblings
**DETACHING A NODE**: In this context, when we _detach_ a node from a tree,
the node becomes the _root_ of a new tree, which means it will no longer
have any link with the tree it was in.
GT 0.2 Code skeleton
Look at the files:
exercises/trees/gen_tree_exercise.py: the exercise to edit
exercises/trees/gen_tree_test.py: the tests to run. Do not modify this file.
Before starting to implement methods in the GenericTree class, read all the following subsections (starting with ‘0.x’)
GT 0.3 Building trees
Let’s learn how to build a GenericTree. For these trials, feel free to launch a Python 3 interpreter and load this module:
[44]:
from gen_tree_solution import *
GT 0.3.1 Pointers
A GenericTree class holds 3 pointers that link it to the other nodes: _child, _sibling and _parent. So this time we have to manage more pointers; in particular, beware of the _parent one, which as a matter of fact creates cycles in the structure.
It also holds a value _data, which is provided by the user to store arbitrary data (could be ints, strings, lists, even other trees, we don’t care):
class GenericTree:

    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None
To create a tree of one node, just call the constructor passing whatever you want like this:
[45]:
tblah = GenericTree("blah")
tn = GenericTree(5)
Note that with the provided constructor you can’t pass children.
GT 0.3.2 Building with insert_child
To grow a GenericTree, as a basic building block you will have to implement insert_child:
def insert_child(self, new_child):
    """ Inserts new_child at the beginning of the children sequence. """
WARNING: here we insert a node !!
Unlike the BinaryTree, this time instead of passing data we pass a node. This can cause more trouble than before: when we add a new_child, we must be careful that it doesn’t have wrong pointers. For example, think of the case when you insert node B as a child of node A, but by mistake you had previously set the _child field of B to point to A. Such a cycle would not be a tree anymore and would basically disrupt any algorithm you would try to run.
You can call it like this:
[46]:
ta = GenericTree('a')
print(ta) # 'a' is the root
a
[47]:
tb = GenericTree('b')
ta.insert_child(tb)
print(ta)
a
└b
a      # 'a' is the root
└b     # 'b' is the child. The '└' just means that it is also the last child of the siblings sequence
[48]:
tc = GenericTree('c')
ta.insert_child(tc)
print(ta)
a
├c
└b
a      # 'a' is the root
├c     # 'c' is inserted as the first child (would be shown on the left in the graph image)
└b     # 'b' is now the next sibling of c. The '└' just means that it
       # is also the last child of the siblings sequence
[49]:
td = GenericTree('d')
tc.insert_child(td)
print(ta)
a
├c
│└d
└b
a      # 'a' is the root
├c     # 'c' is the first child of 'a'
│└d    # 'd' is the first child of 'c'
└b     # 'b' is the next sibling of c
GT 0.3.3 Building with gt
If you need to test your data structure, we provide you with this handy function gt in the gen_tree_test module, which lets you easily build trees out of other trees.
WARNING: DO NOT USE gt inside your implementation code !!!! gt is just meant for testing.
def gt(*args):
    """ Shorthand function that returns a GenericTree containing the provided
        data and children. First parameter is the data, the following ones are the children.
    """
[50]:
# first remember to import it from gen_tree_test:
from gen_tree_test import gt
# NOTE: this function is _not_ a class method, you can directly invoke it like this:
print(gt('a'))
a
[51]:
# NOTE: the external call gt('a', ......... ) INCLUDES gt('b') and gt('c') in the parameters !
print(gt('a', gt('b'), gt('c')))
a
├b
└c
GT 0.4 Displaying trees side by side with str_trees
If you have a couple of trees, like the actual one you get from your method calls and the one you expect, it might be useful to display them side by side with the str_trees function in the gen_tree_test module:
[52]:
# first remember to import it:
from gen_tree_test import str_trees
# NOTE: this function is _not_ a class method, you can directly invoke it like this:
print(str_trees(gt('a', gt('b')), gt('x', gt('y'), gt('z'))))
ACTUAL    EXPECTED
a         x
└b        ├y
          └z
GT 0.5 Look at the tests
Have a look at the gen_tree_test.py file header, and notice it imports the GenericTree class from the exercises file gen_tree_exercise:
from gen_tree_exercise import *
import unittest
GT 0.6 Look at gen_tree_test.GenericTreeTest
Have a quick look at the GenericTreeTest definitions inside gen_tree_test:
class GenericTreeTest(unittest.TestCase):

    def assertReturnNone(self, ret, function_name):
        """ Asserts method result ret equals None """

    def assertRoot(self, t):
        """ Checks provided node t is a root, if not raises Exception """

    def assertTreeEqual(self, t1, t2):
        """ Asserts the trees t1 and t2 are equal """
Note we added extra assertions, which you will later find used throughout the test methods. Of these, the most important is assertTreeEqual: when you have complex data structures like trees, it is helpful to be able to compare the tree you obtain from your method calls to the tree you expect. This assertion provides a way to quickly display such differences.
GT 1 Implement basic methods
Start editing gen_tree_exercise.py, implementing the methods in GenericTree in the order you find them in the next points.
IMPORTANT: All methods and functions that do not contain raise Exception("TODO IMPLEMENT ME!") are already provided and you don’t need to edit them !
GT 1.1 insert_child
Implement the method insert_child, which is the basic building block for our GenericTree:
WARNING: here we insert a node !!
Unlike the BinaryTree, this time instead of passing data we pass a node. This implies that inside the insert_child method you will have to take care of the pointers of new_child: for example, you will need to set the _parent pointer of new_child to point to the current node you are attaching it to (that is, self)
def insert_child(self, new_child):
    """ Inserts new_child at the beginning of the children sequence. """
IMPORTANT: before proceeding, make sure the tests for it pass by running:
python3 -m unittest gen_tree_test.InsertChildTest
QUESTION: Look at the tests, they are quite thorough and verbose. Why?
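A minimal sketch of the pointer updates, assuming `new_child` arrives as a root (no parent, no sibling); the real tests may also expect checks on malformed inputs, which this sketch omits. The constructor is the one shown in GT 0.3.1, while `children_data` is a hypothetical helper used only to inspect the result.

```python
class GenericTree:
    def __init__(self, data):              # constructor as shown in GT 0.3.1
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        """ Sketch: assumes new_child is a root. """
        new_child._parent = self            # backlink up to the new parent
        new_child._sibling = self._child    # old first child becomes the next sibling
        self._child = new_child             # new_child is now the first child

    def children_data(self):
        """ Hypothetical helper: data of the children, left to right. """
        result = []
        node = self._child
        while node is not None:
            result.append(node._data)
            node = node._sibling
        return result


ta = GenericTree('a')
tb = GenericTree('b')
tc = GenericTree('c')
ta.insert_child(tb)
ta.insert_child(tc)
print(ta.children_data())   # ['c', 'b']
```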
GT 1.2 insert_children
Implement insert_children:
def insert_children(self, new_children):
    """ Takes a list of children and inserts them at the beginning of the
        current children sequence.
        NOTE: in the new sequence, new_children appear in the order they
        are passed to the function!
        For example:
        >>> t = gt('a', gt('b'), gt('c'))
        >>> print(t)
        a
        ├b
        └c
        >>> t.insert_children([gt('d'), gt('e')])
        >>> print(t)
        a
        ├d
        ├e
        ├b
        └c
    """
HINT 1: try to reuse insert_child, but note it only inserts on the left: if you call it on the input items in the order they are given, you get them in reversed order in the tree.
WARNING: the function description does not say anything about changing the input new_children, so users calling your method don’t expect you to modify it ! However, you can internally produce a new Python list out of the input one, if you wish to.
Testing: python3 -m unittest gen_tree_test.InsertChildrenTest
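Following HINT 1, a sketch that walks the input backwards with `reversed()`, which only iterates and leaves the caller's list untouched. It reuses the `insert_child` sketch from the previous point under the same root-input assumption; `children_data` is again a hypothetical inspection helper.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def insert_children(self, new_children):
        # insert_child prepends, so insert the last item first
        for c in reversed(new_children):
            self.insert_child(c)

    def children_data(self):
        """ Hypothetical helper: data of the children, left to right. """
        result = []
        node = self._child
        while node is not None:
            result.append(node._data)
            node = node._sibling
        return result


t = GenericTree('a')
t.insert_children([GenericTree('b'), GenericTree('c')])
new = [GenericTree('d'), GenericTree('e')]
t.insert_children(new)
print(t.children_data())   # ['d', 'e', 'b', 'c']
```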
GT 1.3 insert_sibling
Implement insert_sibling:
def insert_sibling(self, new_sibling):
    """ Inserts new_sibling as the *immediate* next sibling.
        If self is a root, raises an Exception
    """
Testing: python3 -m unittest gen_tree_test.InsertSiblingTest
Examples:
[53]:
tb = gt('b')
ta = gt('a', tb, gt('c'))
print(ta)
a
├b
└c
[54]:
tx = gt('x', gt('y'))
print(tx)
x
└y
[55]:
tb.insert_sibling(tx)
print(ta)
a
├b
├x
│└y
└c
QUESTION: if you call insert_sibling on a root node such as ta, you should get an Exception. Why? Does it make sense to have parentless siblings?
ta.insert_sibling(gt('z'))
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-35-a1e4ba8b1ee5> in <module>()
----> 1 ta.insert_sibling(gt('z'))
~/Da/prj/sciprolab2/prj/exercises/trees/tree_solution.py in insert_sibling(self, new_sibling)
128 """
129 if (self.is_root()):
--> 130 raise Exception("Can't add siblings to a root node !!")
131
132 new_sibling._parent = self._parent
Exception: Can't add siblings to a root node !!
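A sketch of the method: the `is_root` check, the exception message and the `_parent` assignment mirror the lines visible in the traceback above, while the two remaining pointer updates are our assumption of how the rest goes (`new_sibling` is again assumed to be a root).

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def is_root(self):
        return self._parent is None

    def insert_sibling(self, new_sibling):
        if self.is_root():
            raise Exception("Can't add siblings to a root node !!")
        new_sibling._parent = self._parent      # siblings share the same parent
        new_sibling._sibling = self._sibling    # take over self's old next sibling
        self._sibling = new_sibling             # become self's immediate next sibling


ta = GenericTree('a')
tb = GenericTree('b')
tc = GenericTree('c')
ta.insert_child(tc)
ta.insert_child(tb)          # children of a: b, c
tx = GenericTree('x')
tb.insert_sibling(tx)        # children of a: b, x, c

try:
    ta.insert_sibling(GenericTree('z'))
except Exception as e:
    print(e)                 # Can't add siblings to a root node !!
```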
GT 1.4 insert_siblings
Testing: python3 -m unittest gen_tree_test.InsertSiblingsTest
GT 1.5 detach_child
QUESTION: does a detached child still have any parent or sibling?
Testing: python3 -m unittest gen_tree_test.DetachChildTest
GT 1.6 detach_sibling
Testing: python3 -m unittest gen_tree_test.DetachSiblingTest
GT 1.7 detach
Testing: python3 -m unittest gen_tree_test.DetachTest
GT 1.8 ancestors
Implement ancestors:
def ancestors(self):
    """ Return the ancestors up until the root as a Python list.
        First item in the list will be the parent of this node.
        NOTE: this function returns the *nodes*, not the data.
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest gen_tree_test.AncestorsTest
Examples:
ancestors of p: f, b, a
ancestors of h: c, a
ancestors of a: empty list
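A possible sketch: since each node has a `_parent` backlink, a single while loop climbing the chain is enough, and it collects nodes (not data) starting from the parent. The `insert_child` used to build the example is the sketch from GT 1.1.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def ancestors(self):
        """ Nodes from the parent up to the root. """
        result = []
        node = self._parent
        while node is not None:     # the root has no parent: stop there
            result.append(node)
            node = node._parent
        return result


a, b, f, p = (GenericTree(x) for x in 'abfp')
a.insert_child(b)
b.insert_child(f)
f.insert_child(p)
print([n._data for n in p.ancestors()])   # ['f', 'b', 'a']
```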
GT 2 Implement more complex functions
Once you have understood and implemented the previous methods, you can continue with the following ones:
GT 2.1 grandchildren
Implement the grandchildren method. NOTE: it returns the data inside the nodes, NOT the nodes !!!!!
def grandchildren(self):
    """ Returns a python list containing the data of all the
        grandchildren of this node.
        - Data must be in left-to-right order in the tree horizontal
          representation (or top-to-bottom in the vertical representation).
        - If there are no grandchildren, returns an empty list.
        For example, for this tree:
        a
        ├b
        │├c
        │└d
        │ └g
        ├e
        └f
         └h
        Returns ['c','d','h']
    """
Testing: python3 -m unittest gen_tree_test.GrandchildrenTest
Examples:
[56]:
ta = gt('a', gt('b', gt('c')))
print(ta)
a
└b
└c
[57]:
print(ta.grandchildren())
['c']
[58]:
ta = gt('a', gt('b'))
print(ta)
a
└b
[59]:
print(ta.grandchildren())
[]
[60]:
ta = gt('a', gt('b', gt('c'), gt('d')), gt('e', gt('f')) )
print(ta)
a
├b
│├c
│└d
└e
└f
[61]:
print(ta.grandchildren())
['c', 'd', 'f']
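A sketch with two nested while loops: the outer one walks the children along `_sibling`, the inner one walks each child's own children, so data comes out left to right. `gt_like` is a hypothetical stand-in for `gt`, built on the `insert_child` sketch.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def grandchildren(self):
        result = []
        child = self._child
        while child is not None:            # each child, left to right
            grand = child._child
            while grand is not None:        # each child of that child
                result.append(grand._data)
                grand = grand._sibling
            child = child._sibling
        return result


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):            # insert_child prepends
        node.insert_child(c)
    return node


ta = gt_like('a', gt_like('b', gt_like('c'), gt_like('d')), gt_like('e', gt_like('f')))
print(ta.grandchildren())   # ['c', 'd', 'f']
```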
GT 2.2 Zig Zag
Here you will be visiting a generic tree in various ways.
GT 2.2.1 zig
The method zig must return a list with the data of the root and of all the nodes in the chain of _child attributes. Basically, you just have to follow the red lines and gather data in a list, until there are no more red lines to follow.
Testing: python3 -m unittest gen_tree_test.ZigTest
Examples: in the labeled tree in the image, these would be the results of calling zig on various nodes:
From a: ['a','b', 'e']
From b: ['b', 'e']
From c: ['c', 'g']
From h: ['h']
From q: ['q']
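Since the labeled image is not available here, this sketch checks zig on a small tree built to match some of the examples (a, b, e along the `_child` chain, and c with first child g); `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def zig(self):
        result = []
        node = self
        while node is not None:       # follow the _child chain down
            result.append(node._data)
            node = node._child
        return result


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


t = gt_like('a', gt_like('b', gt_like('e')), gt_like('c', gt_like('g')), gt_like('d'))
print(t.zig())   # ['a', 'b', 'e']
```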
GT 2.2.2 zag
This function is quite similar to zig, but this time it gathers data going right, along the _sibling arrows.
Testing: python3 -m unittest gen_tree_test.ZagTest
Examples: in the labeled tree in the image, these would be the results of calling zag on various nodes:
From a : ['a']
From b : ['b', 'c', 'd']
From o : ['o', 'p']
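The same loop as zig with `_sibling` in place of `_child`, sketched on a small hypothetical tree where b, c, d are siblings; a root such as a has no siblings, so only its own data comes back.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def zag(self):
        result = []
        node = self
        while node is not None:       # follow the _sibling chain to the right
            result.append(node._data)
            node = node._sibling
        return result


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


t = gt_like('a', gt_like('b'), gt_like('c'), gt_like('d'))
print(t.zag(), t._child.zag())   # ['a'] ['b', 'c', 'd']
```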
GT 2.2.3 zigzag
As you are surely thinking, zig and zag alone are boring. So let’s mix the concepts and go zigzagging. This time you will write a function zigzag that first zigs, collecting data along the child vertical red chain as much as it can. Then, if the last node links to at least one sibling, the method continues to collect data along the siblings horizontal chain as much as it can. At this point, if it finds a child, it goes zigging again along the child vertical red chain as much as it can, then zagging horizontally, and so on. It continues zig-zagging like this until it reaches a node that has neither a child nor a sibling: when this happens, it returns the list of data found so far.
Testing: python3 -m unittest gen_tree_test.ZigZagTest
Examples: in the labeled tree in the image, these would be the results of calling zigzag on various nodes:
From a: ['a', 'b', 'e', 'f', 'o']
From c: ['c', 'g', 'h', 'i', 'q'] NOTE: if node h had a child z, the process would still proceed to i
From d: ['d', 'm', 'n']
From o: ['o', 'p']
From n: ['n']
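A sketch that keeps a direction flag: it moves along the current direction while it can, switches direction when that pointer is `None`, and stops when neither pointer leads anywhere. The test tree reproduces the "From c" example, including a child z under h, to show that zagging does not stop at h; `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def zigzag(self):
        result = [self._data]
        node = self
        down = True                        # True: zig along _child, False: zag along _sibling
        while True:
            nxt = node._child if down else node._sibling
            if nxt is None:
                down = not down            # current direction exhausted: switch
                nxt = node._child if down else node._sibling
                if nxt is None:            # neither child nor sibling: done
                    return result
            result.append(nxt._data)
            node = nxt


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


tc = gt_like('c', gt_like('g'), gt_like('h', gt_like('z')), gt_like('i', gt_like('q')))
print(tc.zigzag())   # ['c', 'g', 'h', 'i', 'q']
```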
GT 2.3 uncles
Implement the uncles method:
def uncles(self):
    """ RETURN a python list containing the data of all the uncles
        of this node (that is, *all* the siblings of its parent).
        NOTE: also returns the father's siblings which come *BEFORE*
        the father !!
        - Data must be in left-to-right order in the tree horizontal
          representation (or top-to-bottom in the vertical representation)
        - If there are no uncles, returns an empty list.
        For example, for this tree:
        a
        ├b
        │├c
        │└d
        │ └g
        ├e
        │└h
        └f
        calling this method on 'h' returns ['b','f']
    """
Testing: python3 -m unittest gen_tree_test.UnclesTest
Example usages:
[62]:
td = gt('d')
tb = gt('b')
ta = gt('a', tb, gt('c', td), gt('e'))
print(ta)
a
├b
├c
│└d
└e
[63]:
print(td.uncles())
['b', 'e']
[64]:
print(tb.uncles())
[]
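A possible sketch: since siblings only link rightwards, the uncles that come before the father are reached by going up to the grandparent and scanning all of its children from the first one, skipping the father itself. `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def uncles(self):
        parent = self._parent
        if parent is None or parent._parent is None:
            return []                        # no parent or no grandparent
        result = []
        node = parent._parent._child         # first child of the grandparent
        while node is not None:
            if node is not parent:           # the father is not an uncle
                result.append(node._data)
            node = node._sibling
        return result


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


td = gt_like('d')
tb = gt_like('b')
ta = gt_like('a', tb, gt_like('c', td), gt_like('e'))
print(td.uncles(), tb.uncles())   # ['b', 'e'] []
```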
GT 2.4 common_ancestor
Implement the method common_ancestor:
def common_ancestor(self, gt2):
    """ RETURN the first common ancestor of the current node and the provided
        gt2 node
        - If gt2 is not a node of the same tree, raises LookupError
        NOTE: this function returns a *node*, not the data.
        Ideally, this method should perform in O(h) where h is the height
        of the tree.
        HINT: you should use a Python set. If you can't figure out how
        to make it that fast, try to make it at worst O(h^2)
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest gen_tree_test.CommonAncestorTest
Examples:
common_ancestor of g and i: tree rooted at c
common_ancestor of g and q: tree rooted at c
common_ancestor of e and d: tree rooted at a
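A sketch of the O(h) idea from the HINT: first store the ancestors of the current node in a set, then climb from gt2 and return the first node found in the set. It stores `id()` values so it does not depend on how GenericTree defines equality, and it considers only proper ancestors (parents, grandparents, ...), like ancestors does; both are our assumptions. The test tree is a small hypothetical one consistent with the examples above.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def common_ancestor(self, gt2):
        seen = set()
        node = self._parent
        while node is not None:              # O(h): ancestors of self
            seen.add(id(node))
            node = node._parent
        node = gt2._parent
        while node is not None:              # O(h): climb from gt2
            if id(node) in seen:
                return node
            node = node._parent
        raise LookupError("No common ancestor: is gt2 in the same tree?")


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


te, tg, td = gt_like('e'), gt_like('g'), gt_like('d')
ti = gt_like('i', gt_like('q'))
ta = gt_like('a', gt_like('b', te), gt_like('c', tg, gt_like('h'), ti), td)
print(te.common_ancestor(td)._data)   # a
```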
GT 2.5 mirror
def mirror(self):
    """ Modifies this tree by mirroring it, that is, reverses the order
        of all children of this node and of all its descendants
        - MUST work in O(n) where n is the number of nodes
        - MUST change the order of nodes, NOT the data (so don't touch the
          data !)
        - DON'T create new nodes
        - It is acceptable to use a recursive method.
        Example:
        a     <- Becomes:    a
        ├b                   ├i
        │├c                  ├e
        │└d                  │├h
        ├e                   │├g
        │├f                  │└f
        │├g                  └b
        │└h                   ├d
        └i                    └c
    """
Testing: python3 -m unittest gen_tree_test.MirrorTest
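A sketch that reverses each sibling chain in place, like reversing a linked list, and recurses into every child: no node is created, no data is touched, and each node is visited once, so it runs in O(n). `_parent` pointers stay valid because every node keeps the same parent. `gt_like` and `preorder` are hypothetical helpers for building and checking the tree.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def mirror(self):
        prev = None
        child = self._child
        while child is not None:
            nxt = child._sibling
            child._sibling = prev        # point back to the previous sibling
            child.mirror()               # mirror the child's own subtree
            prev = child
            child = nxt
        self._child = prev               # the old last child is now the first


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


def preorder(node):
    """ Hypothetical checking helper: data in pre-order. """
    out = [node._data]
    c = node._child
    while c is not None:
        out.extend(preorder(c))
        c = c._sibling
    return out


t = gt_like('a', gt_like('b', gt_like('c'), gt_like('d')),
            gt_like('e', gt_like('f'), gt_like('g'), gt_like('h')), gt_like('i'))
t.mirror()
print(preorder(t))   # ['a', 'i', 'e', 'h', 'g', 'f', 'b', 'd', 'c']
```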
GT 2.6 clone
Implement the method clone:
def clone(self):
    """ Clones this tree, by returning an *entirely* new tree which is an
        exact copy of this tree (so returned node and *all* its descendants
        must be new).
        - MUST run in O(n) where n is the number of nodes
        - a recursive method is acceptable.
    """
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest gen_tree_test.CloneTest
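A recursive sketch: each node creates one new GenericTree, clones its children left to right, then attaches the clones in reverse through `insert_child` (which prepends), so every node is copied once and the order is preserved. Note the data objects themselves are shared, not copied; whether the lab expects deep copies of data is an open assumption here. `gt_like` and `preorder` are hypothetical helpers.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def clone(self):
        copy = GenericTree(self._data)       # new node, same data object
        clones = []
        c = self._child
        while c is not None:
            clones.append(c.clone())
            c = c._sibling
        for cc in reversed(clones):          # insert_child prepends
            copy.insert_child(cc)
        return copy


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


def preorder(node):
    """ Hypothetical checking helper: data in pre-order. """
    out = [node._data]
    c = node._child
    while c is not None:
        out.extend(preorder(c))
        c = c._sibling
    return out


t = gt_like('a', gt_like('b', gt_like('c')), gt_like('d'))
c = t.clone()
print(preorder(c))   # ['a', 'b', 'c', 'd']
```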
GT 2.7 rightmost
In the example above, the rightmost branch of a is given by the node sequence a, d, n
Implement this method:
def rightmost(self):
    """ RETURN a list containing the *data* of the nodes
        in the *rightmost* branch of the tree.
        Example:
        a
        ├b
        ├c
        │└e
        └d
         ├f
         └g
          ├h
          └i
        should give
        ['a','d','g','i']
    """
Testing: python3 -m unittest gen_tree_test.RightmostTest
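A sketch: since there is no direct pointer to the last child, at each level the inner loop walks the sibling chain to its end, then the outer loop descends into that last child. `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def rightmost(self):
        result = []
        node = self
        while node is not None:
            result.append(node._data)
            last = node._child
            while last is not None and last._sibling is not None:
                last = last._sibling         # walk to the last child
            node = last                      # None when node is a leaf
        return result


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


t = gt_like('a', gt_like('b'), gt_like('c', gt_like('e')),
            gt_like('d', gt_like('f'), gt_like('g', gt_like('h'), gt_like('i'))))
print(t.rightmost())   # ['a', 'd', 'g', 'i']
```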
GT 2.8 fill_left
Open gen_tree_exercise.py and implement the fill_left method:
def fill_left(self, stuff):
    """ MODIFIES the tree by filling the leftmost branch data
        with values from the provided list 'stuff'
        - if there aren't enough nodes to fill, raise ValueError
        - root data is not modified
        - *DO NOT* use recursion
    """
Testing: python3 -m unittest gen_tree_test.FillLeftTest
Example:
[65]:
from gen_tree_test import gt
from gen_tree_solution import *
[66]:
t = gt('a',
        gt('b',
            gt('e',
                gt('f'),
                gt('g',
                    gt('i')),
                gt('h')),
            gt('c'),
            gt('d')))
[67]:
print(t)
a
└b
├e
│├f
│├g
││└i
│└h
├c
└d
[68]:
t.fill_left(['x','y'])
[69]:
print(t)
a
└x
├y
│├f
│├g
││└i
│└h
├c
└d
[70]:
t.fill_left(['W','V','T'])
print(t)
a
└W
├V
│├T
│├g
││└i
│└h
├c
└d
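A non-recursive sketch that first collects the nodes of the leftmost branch below the root and checks there are enough of them, and only then writes the data; raising before any modification (so a failing call leaves the tree untouched) is our design choice, the docstring does not require it. `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def fill_left(self, stuff):
        nodes = []
        node = self._child                   # the root is never modified
        while node is not None and len(nodes) < len(stuff):
            nodes.append(node)
            node = node._child
        if len(nodes) < len(stuff):
            raise ValueError("Only %s nodes to fill, got %s values" % (len(nodes), len(stuff)))
        for n, value in zip(nodes, stuff):
            n._data = value


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


t = gt_like('a', gt_like('b', gt_like('e', gt_like('f'), gt_like('g', gt_like('i')), gt_like('h')),
                         gt_like('c'), gt_like('d')))
t.fill_left(['x', 'y'])
t.fill_left(['W', 'V', 'T'])
try:
    t.fill_left(['1', '2', '3', '4'])        # only 3 nodes below the root on that branch
except ValueError as e:
    print(e)
```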
GT 2.9 follow
Open gen_tree_exercise.py and implement the follow method:
def follow(self, positions):
    """
    RETURN a list of node data, representing a branch from the
    root down to a certain depth.
    The path to follow is determined by the given positions, which
    is a list of integer indices, see example.
    - if the provided indices lead to non-existing nodes, raise ValueError
    - IMPORTANT: *DO NOT* use recursion, use a couple of while loops instead.
    - IMPORTANT: *DO NOT* attempt to convert siblings to
      a python list !!!! Doing so will give you fewer points!
    """
Example:
level 01234
      a
      ├b
      ├c
      │└e
      │ ├f
      │ ├g
      │ │└i
      │ └h
      └d
                       RETURNS
t.follow([])           [a]          root data is always present
t.follow([0])          [a,b]        b is the 0-th child of a
t.follow([2])          [a,d]        d is the 2-nd child of a
t.follow([1,0,2])      [a,c,e,h]    c is the 1-st child of a
                                    e is the 0-th child of c
                                    h is the 2-nd child of e
t.follow([1,0,1,0])    [a,c,e,g,i]  c is the 1-st child of a
                                    e is the 0-th child of c
                                    g is the 1-st child of e
                                    i is the 0-th child of g
Testing: python3 -m unittest gen_tree_test.FollowTest
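A sketch with the couple of while loops requested: the outer one consumes the positions one depth at a time, the inner one steps right along `_sibling` without ever building a Python list of siblings. Rejecting negative positions is our own assumption. `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def follow(self, positions):
        result = [self._data]
        node = self
        depth = 0
        while depth < len(positions):
            pos = positions[depth]
            if pos < 0:
                raise ValueError("Negative position: %s" % pos)
            child = node._child
            i = 0
            while child is not None and i < pos:   # move pos steps right
                child = child._sibling
                i += 1
            if child is None:
                raise ValueError("No child at position %s, depth %s" % (pos, depth))
            result.append(child._data)
            node = child
            depth += 1
        return result


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


t = gt_like('a', gt_like('b'),
            gt_like('c', gt_like('e', gt_like('f'), gt_like('g', gt_like('i')), gt_like('h'))),
            gt_like('d'))
print(t.follow([1, 0, 2]))   # ['a', 'c', 'e', 'h']
```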
GT 2.10 is_triangle
A triangle is a node which has exactly two children.
Let’s see some examples:
        a
       / \
      /   \
     b-----c
    /|\   /
   d-e-f g
        / \
       h---i
          /
         l
The tree above can also be represented like this:
a
├b
│├d
│├e
│└f
└c
 └g
  ├h
  └i
   └l
node a is a triangle because it has exactly two children, b and c (note it doesn’t matter whether b or c have children)
b is not a triangle (it has 3 children)
c and i are not triangles (they have only 1 child)
g is a triangle, as it has exactly two children, h and i
d, e, f, h and l are not triangles, because they have zero children
Now implement this method:
def is_triangle(self, elems):
    """ RETURN True if this node is a triangle matching the data
        given by list elems.
        In order to match:
        - the first list item must be equal to this node's data
        - the second list item must be equal to this node's first child data
        - the third list item must be equal to this node's second child data
        - if elems has fewer than three elements, raises ValueError
    """
Testing: python3 -m unittest gen_tree_test.IsTriangleTest
Examples:
[71]:
from gen_tree_test import gt
[72]:
# this is the tree from the example above
tb = gt('b', gt('d'), gt('e'), gt('f'))
tg = gt('g', gt('h'), gt('i', gt('l')))
ta = gt('a', tb, gt('c', tg))
ta.is_triangle(['a','b','c'])
[72]:
True
[73]:
ta.is_triangle(['b','c','a'])
[73]:
False
[74]:
tb.is_triangle(['b','d','e'])
[74]:
False
[75]:
tg.is_triangle(['g','h','i'])
[75]:
True
[76]:
tg.is_triangle(['g','i','h'])
[76]:
False
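A sketch that first checks the shape (a first child with exactly one following sibling) and then compares the data; the test tree follows the picture, with d, e, f all children of b. `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def is_triangle(self, elems):
        if len(elems) < 3:
            raise ValueError("Expected at least three elements, got %s" % len(elems))
        first = self._child
        if first is None or first._sibling is None or first._sibling._sibling is not None:
            return False                     # not exactly two children
        second = first._sibling
        return (self._data == elems[0]
                and first._data == elems[1]
                and second._data == elems[2])


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


tb = gt_like('b', gt_like('d'), gt_like('e'), gt_like('f'))
tg = gt_like('g', gt_like('h'), gt_like('i', gt_like('l')))
ta = gt_like('a', tb, gt_like('c', tg))
print(ta.is_triangle(['a', 'b', 'c']), tg.is_triangle(['g', 'i', 'h']))   # True False
```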
GT 2.11 has_triangle
Implement this method:
def has_triangle(self, elems):
    """ RETURN True if this node *or one of its descendants* is a triangle
        matching given elems. Otherwise, return False.
        - a recursive solution is acceptable
    """
Testing: python3 -m unittest gen_tree_test.HasTriangleTest
Examples:
[77]:
# example tree seen at the beginning
tb = gt('b', gt('d'), gt('e'), gt('f'))
tg = gt('g', gt('h'), gt('i', gt('l')))
tc = gt('c', tg)
ta = gt('a', tb, tc)
ta.has_triangle(['a','b','c'])
[77]:
True
[78]:
ta.has_triangle(['a','c','b'])
[78]:
False
[79]:
ta.has_triangle(['b','c','a'])
[79]:
False
[80]:
tb.has_triangle(['b','d','e'])
[80]:
False
[81]:
tg.has_triangle(['g','h','i'])
[81]:
True
[82]:
tc.has_triangle(['g','h','i']) # check recursion
[82]:
True
[83]:
ta.has_triangle(['g','h','i']) # check recursion
[83]:
True
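A recursive sketch on top of the is_triangle sketch: it returns as soon as the current node matches, otherwise it asks each child in turn, so every node is checked at most once. `gt_like` is a hypothetical stand-in for `gt`.

```python
class GenericTree:
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, new_child):
        new_child._parent = self
        new_child._sibling = self._child
        self._child = new_child

    def is_triangle(self, elems):
        if len(elems) < 3:
            raise ValueError("Expected at least three elements, got %s" % len(elems))
        first = self._child
        if first is None or first._sibling is None or first._sibling._sibling is not None:
            return False
        return (self._data, first._data, first._sibling._data) == tuple(elems[:3])

    def has_triangle(self, elems):
        if self.is_triangle(elems):
            return True
        child = self._child
        while child is not None:             # search each subtree in turn
            if child.has_triangle(elems):
                return True
            child = child._sibling
        return False


def gt_like(data, *children):
    """ Hypothetical stand-in for gen_tree_test.gt. """
    node = GenericTree(data)
    for c in reversed(children):
        node.insert_child(c)
    return node


tb = gt_like('b', gt_like('d'), gt_like('e'), gt_like('f'))
tg = gt_like('g', gt_like('h'), gt_like('i', gt_like('l')))
tc = gt_like('c', tg)
ta = gt_like('a', tb, tc)
print(ta.has_triangle(['g', 'h', 'i']))   # True
```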
Graph algorithms
Download exercises zip
(before editing, read the whole introduction section 0.x)
What to do
unzip the exercises in a folder, you should get something like this:
-jupman.py
-sciprog.py
-exercises
     |-graph-algos
         |- graph-algos.ipynb
         |- graph_exercise.py
         |- graph_solution.py
open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm); you will edit the files ending in _exercise.py
Go on reading this notebook, and follow the instructions inside.
Introduction
0.1 Graph theory
In short, a graph is a set of vertices linked by edges.
Longer version:
-
In particular, see Vocabulary and definitions
0.2 Directed graphs
In this worksheet we are going to use so-called Directed Graphs (DiGraph for brevity), that is, graphs with directed edges: each edge can be pictured as an arrow linking source node a to target node b. With such an arrow, you can go from a to b, but you cannot go from b to a unless there is another edge in the reverse direction.
For us, a DiGraph can also have no edges or no vertices at all.
Vertices for us can be anything: strings like ‘abc’, numbers like 3, etc.
In our model, edges simply link vertices and have no weights.
DiGraph is represented as an adjacency list, mapping each vertex to the vertices it is linked to.
QUESTION: is the DiGraph model good for dense or sparse graphs?
0.3 Serious graphs
In this worksheet we follow the Do It Yourself methodology and create graph classes from scratch for didactic purposes. Of course, in the Python world there are already nice libraries entirely devoted to graphs, like networkx, which you can also use for visualizing graphs. If you have huge graphs to process, you might consider big data tools like Spark GraphX (note that GraphX itself exposes a Scala/Java API; from Python, the GraphFrames package is the usual route).
0.4 Code skeleton
First off, download the exercises zip and look at the files:
graph_exercise.py: the exercise to edit
graph_test.py: the tests to run. Do not modify this file.
Before starting to implement methods in the DiGraph class, read all the following subsections (starting with ‘0.x’)
0.5 Building graphs
IMPORTANT: All the functions in section 0 are already provided and you don’t need to implement them !
For now, open a Python 3 interpreter and try out the graph_solution module:
[2]:
from graph_solution import *
0.5.1 Building basics
Let’s look at the constructor __init__ and add_vertex. They are already provided and you don’t need to implement them:
class DiGraph:
    def __init__(self):
        # The class just holds the dictionary _edges: as keys it has the verteces, and
        # to each vertex associates a list with the verteces it is linked to.
        self._edges = {}

    def add_vertex(self, vertex):
        """ Adds vertex to the DiGraph. A vertex can be any object.
            If the vertex already exists, does nothing.
        """
        if vertex not in self._edges:
            self._edges[vertex] = []
You will see that, inside, __init__ just initializes _edges. So the only way to create a DiGraph is with a call like
[3]:
g = DiGraph()
DiGraph provides an __str__ method to have a nice printout:
[4]:
print(g)
DiGraph()
To draw a DiGraph, you can use draw_dig from the sciprog module (in this case it draws nothing, as the graph is empty):
[5]:
from sciprog import draw_dig
draw_dig(g)

You can then add vertices to the graph like so:
[6]:
g.add_vertex('a')
g.add_vertex('b')
g.add_vertex('c')
[7]:
print(g)
a: []
b: []
c: []
Let’s draw it again with draw_dig from the sciprog module:
[8]:
from sciprog import draw_dig
draw_dig(g)

Adding a vertex twice does nothing:
[9]:
g.add_vertex('a')
print(g)
a: []
b: []
c: []
Once you have added the vertices, you can start adding directed edges among them with the method add_edge:
def add_edge(self, vertex1, vertex2):
    """ Adds an edge to the graph, from vertex1 to vertex2
        If verteces don't exist, raises an Exception.
        If there is already such an edge, exits silently.
    """
    if not vertex1 in self._edges:
        raise Exception("Couldn't find source vertex:" + str(vertex1))
    if not vertex2 in self._edges:
        raise Exception("Couldn't find target vertex:" + str(vertex2))
    if not vertex2 in self._edges[vertex1]:
        self._edges[vertex1].append(vertex2)
[10]:
g.add_edge('a', 'c')
print(g)
a: ['c']
b: []
c: []
[11]:
draw_dig(g)

[12]:
g.add_edge('a', 'b')
print(g)
a: ['c', 'b']
b: []
c: []
[13]:
draw_dig(g)

Adding an edge twice makes no difference:
[14]:
g.add_edge('a', 'b')
print(g)
a: ['c', 'b']
b: []
c: []
Notice a DiGraph can have self-loops too (also called caps):
[15]:
g.add_edge('b', 'b')
print(g)
a: ['c', 'b']
b: ['b']
c: []
[16]:
draw_dig(g)

0.5.2 dig()
dig() is a shortcut to build graphs; it is already provided and you don’t need to implement it.
USE IT ONLY WHEN TESTING, *NOT* IN THE ``DiGraph`` CLASS CODE !!!!
First of all, remember to import it from the graph_test module:
[17]:
from graph_test import dig
With an empty dict, it prints the empty graph:
[18]:
print(dig({}))
DiGraph()
To build more complex graphs, provide a dictionary mapping each source vertex to its list of target vertices, like in the following examples:
[19]:
print(dig({'a':['b','c']}))
a: ['b', 'c']
b: []
c: []
[20]:
print(dig({'a': ['b','c'],
'b': ['b'],
'c': ['a']}))
a: ['b', 'c']
b: ['b']
c: ['a']
0.6 Equality
Graphs for us are equal regardless of the order in which elements in adjacency lists are specified. So for example these two graphs will be considered equal:
[21]:
dig({'a': ['c', 'b']}) == dig({'a': ['b', 'c']})
[21]:
True
0.7 Basic querying
There are some provided methods to query the DiGraph: adj, verteces, is_empty
0.7.1 adj
To obtain the edges, you can use the method adj(self, vertex). It is already provided and you don’t need to implement it:
def adj(self, vertex):
    """ Returns the verteces adjacent to vertex.
        NOTE: verteces are returned in a NEW list.
        Modifying the list will have NO effect on the graph!
    """
    if not vertex in self._edges:
        raise Exception("Couldn't find a vertex " + str(vertex))
    return self._edges[vertex][:]
[22]:
g2 = dig({'a': ['b', 'c'],
          'b': ['c']})
lst = g2.adj('a')
print(lst)
['b', 'c']
Let’s check we actually get back a new list (so modifying it won’t change the graph):
[23]:
lst.append('d')
print(lst)
['b', 'c', 'd']
[24]:
print(g2.adj('a'))
['b', 'c']
NOTE: This technique of giving back copies is also called defensive copying: it prevents users from modifying the internal data structures of a class instance in an uncontrolled manner. For example, if we allowed them direct access to the internal vertices list, they could add duplicate edges, which we don’t allow in our model. If instead we only allow users to add edges by calling add_edge, we are sure the constraints of our model will always remain satisfied.
0.7.2 is_empty()
We can check if a DiGraph is empty. It is already provided and you don’t need to implement it:
def is_empty(self):
    """ A DiGraph for us is empty if it has no verteces and no edges """
    return len(self._edges) == 0
[25]:
print(dig({}).is_empty())
True
[26]:
print(dig({'a':[]}).is_empty())
False
0.7.3 verteces()
To obtain the vertices, you can use the method verteces. (NOTE for Italians: the method is called verteces, with two es !!!). It is already provided and you don’t need to implement it:
def verteces(self):
    """ Returns a set of the graph verteces. Verteces can be any object. """
    # NOTE: in Python 3 dict.keys() returns a view, not a set,
    # so we copy the keys into a new, independent set
    return set(self._edges.keys())
[27]:
g = dig({'a': ['c', 'b'],
'b': ['c']})
print(g.verteces())
{'a', 'c', 'b'}
Notice it returns a set: vertices are stored as keys in a dictionary, so they are not supposed to be in any particular order. When you print the whole graph, though, you see them vertically ordered, for clarity purposes:
[28]:
print(g)
a: ['c', 'b']
b: ['c']
c: []
Verteces in the edges list are instead stored and displayed in the order in which they were inserted.
0.8 Blow up your computer
Try to call the already implemented function graph_test.gen_graphs with small numbers for n, like 1, 2, 3, 4 … Just with 2 we get back a lot of graphs:
def gen_graphs(n):
    """ Returns a list with all the possible 2^(n^2) graphs of size n
        Verteces will be identified with numbers from 1 to n
    """
[29]:
from graph_test import gen_graphs
print(gen_graphs(2))
[
1: []
2: []
,
1: []
2: [2]
,
1: []
2: [1]
,
1: []
2: [1, 2]
,
1: [2]
2: []
,
1: [2]
2: [2]
,
1: [2]
2: [1]
,
1: [2]
2: [1, 2]
,
1: [1]
2: []
,
1: [1]
2: [2]
,
1: [1]
2: [1]
,
1: [1]
2: [1, 2]
,
1: [1, 2]
2: []
,
1: [1, 2]
2: [2]
,
1: [1, 2]
2: [1]
,
1: [1, 2]
2: [1, 2]
]
QUESTION: What happens if you call gen_graphs(10)? How many graphs do you get back?
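The 2^(n^2) in the docstring comes from a counting argument: on n verteces there are n*n ordered pairs (u, v), self-loops included, and each pair is either an edge or not. The sketch below is not the code of graph_test.gen_graphs (which is not shown here): it is a hypothetical generator `gen_adjacency` producing the same kind of graphs as plain dictionaries, plus a hypothetical `count_graphs` helper, so you can check the count on small n before trying bigger ones.

```python
from itertools import product


def count_graphs(n):
    # n*n ordered pairs (u, v), each one either present or absent
    return 2 ** (n * n)


def gen_adjacency(n):
    """Hypothetical generator of all digraphs on verteces 1..n,
    each one as a dict vertex -> list of successors."""
    pairs = [(u, v) for u in range(1, n + 1) for v in range(1, n + 1)]
    for choice in product([False, True], repeat=len(pairs)):
        adj = {u: [] for u in range(1, n + 1)}
        for (u, v), present in zip(pairs, choice):
            if present:
                adj[u].append(v)
        yield adj


print(len(list(gen_adjacency(2))), count_graphs(2))  # 16 16
```

Being a generator, gen_adjacency builds one graph at a time, while a function that returns a list, like gen_graphs, has to keep all of them in memory at once.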
1. Implement building¶
Enough talking! Let's implement graph building.
1.1 has_edge¶
Implement this method in DiGraph:
def has_edge(self, source, target):
    """ Returns True if there is an edge between source vertex and target vertex.
        Otherwise returns False.
        If either source, target or both verteces don't exist raises an Exception.
    """
    raise Exception("TODO IMPLEMENT ME!")
Testing: python3 -m unittest graph_test.HasEdgeTest
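As a sketch of the expected behavior, here is a hypothetical standalone function working on a plain dictionary that maps each vertex to its list of successors (the same shape as the `_edges` field). It raises LookupError, a subclass of Exception, when a vertex is missing; inside DiGraph you would read `self._edges` instead of taking the dictionary as a parameter.

```python
def has_edge_adj(adj, source, target):
    """Hypothetical standalone version of has_edge on a dict vertex -> successors."""
    if source not in adj or target not in adj:
        raise LookupError(f"Missing vertex: {source!r} or {target!r}")
    # Scanning one adjacency list costs O(out-degree of source)
    return target in adj[source]


adj = {'a': ['b'], 'b': []}
print(has_edge_adj(adj, 'a', 'b'))  # True
print(has_edge_adj(adj, 'b', 'a'))  # False
```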
1.2 full_graph¶
Implement this function outside the class definition. It is not a method of DiGraph!
def full_graph(verteces):
    """ Returns a DiGraph which is a full graph with provided verteces list.
        In a full graph all verteces link to all other verteces (including themselves!).
    """
    raise Exception("TODO IMPLEMENT ME!")
Testing: python3 -m unittest graph_test.FullGraphTest
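A minimal sketch on a plain dict (vertex -> list of successors), using a hypothetical name `full_graph_adj`: every vertex gets the whole vertex list as successors. Note the design choice of calling `list(verteces)` for each vertex: writing `{v: verteces for v in verteces}` would make all adjacency lists the very same object, so adding an edge to one vertex would silently add it to all of them.

```python
def full_graph_adj(verteces):
    """Hypothetical sketch: every vertex links to every vertex, itself included."""
    # list(verteces) builds a separate list for each vertex, so the
    # adjacency lists never share the same list object
    return {v: list(verteces) for v in verteces}


print(full_graph_adj(['a', 'b']))  # {'a': ['a', 'b'], 'b': ['a', 'b']}
```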
1.3 dag¶
Implement this function outside the class definition. It is not a method of DiGraph!
def dag(verteces):
    """ Returns a DiGraph which is a DAG (Directed Acyclic Graph) made out of provided verteces list
        Provided list is intended to be in topological order.
        NOTE: a DAG is ACYCLIC, so caps (self-loops) are not allowed !!
    """
    raise Exception("TODO IMPLEMENT ME!")
Testing: python3 -m unittest graph_test.DagTest
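The docstring leaves the exact edges open; one natural reading, which is an assumption here and may differ from what DagTest expects, is that each vertex links to every vertex that comes after it in the given topological order. Edges then only ever point forward in the list, so no cycle (and no self-loop) can appear. Sketch with a hypothetical name `dag_adj` on a plain dict:

```python
def dag_adj(verteces):
    """Hypothetical sketch: each vertex links to every vertex after it."""
    # Slicing builds a new list and only looks forward, so no cycles or self-loops
    return {v: verteces[i + 1:] for i, v in enumerate(verteces)}


print(dag_adj(['a', 'b', 'c']))  # {'a': ['b', 'c'], 'b': ['c'], 'c': []}
```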
1.4 list_graph¶
Implement this function outside the class definition. It is not a method of DiGraph!
def list_graph(n):
    """ Return a graph of n verteces arranged like a
        one-directional list: 1 -> 2 -> 3 -> ... -> n
        Each vertex is a number i, 1 <= i <= n, and has only one edge connecting it
        to the following one in the sequence.
        If n = 0, return the empty graph.
        If n < 0, raise an Exception.
    """
    raise Exception("TODO IMPLEMENT ME!")
Testing: python3 -m unittest graph_test.ListGraphTest
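A sketch on a plain dict, with a hypothetical name `list_graph_adj`: every vertex except the last one links to its successor. It raises ValueError, a subclass of Exception, for negative n.

```python
def list_graph_adj(n):
    """Hypothetical sketch of list_graph as a dict: 1 -> 2 -> ... -> n."""
    if n < 0:
        raise ValueError(f"n must be non-negative, got {n}")
    # every vertex except the last one links to the following one
    return {i: ([i + 1] if i < n else []) for i in range(1, n + 1)}


print(list_graph_adj(3))  # {1: [2], 2: [3], 3: []}
```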
1.5 star_graph¶
Implement this function outside the class definition. It is not a method of DiGraph!
def star_graph(n):
    """ Returns a graph which is a star with n nodes.
        The first node is the center of the star and it is labeled with 1. This node is linked
        to all the others. For example, for n=4 you would have a graph like this:

                3
                ^
                |
           2 <- 1 -> 4

        If n = 0, the empty graph is returned.
        If n < 0, raises an Exception.
    """
    raise Exception("TODO IMPLEMENT ME!")
Testing: python3 -m unittest graph_test.StarGraphTest
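A sketch on a plain dict, with a hypothetical name `star_graph_adj`: all verteces start with no edges, then the center 1 gets an edge to each of 2..n.

```python
def star_graph_adj(n):
    """Hypothetical sketch of star_graph as a dict: center 1 links to 2..n."""
    if n < 0:
        raise ValueError(f"n must be non-negative, got {n}")
    adj = {i: [] for i in range(1, n + 1)}  # empty dict when n == 0
    if n > 0:
        adj[1] = list(range(2, n + 1))
    return adj


print(star_graph_adj(4))  # {1: [2, 3, 4], 2: [], 3: [], 4: []}
```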
1.6 odd_line¶
Implement this function outside the class definition. It is not a method of DiGraph!
def odd_line(n):
    """ Returns a DiGraph with n verteces, arranged like a line of odd numbers.
        Each vertex is an odd number i, for 1 <= i < 2n. For example, for
        n=4 verteces are arranged like this:

        1 -> 3 -> 5 -> 7

        For n = 0, return the empty graph
    """
Testing: python3 -m unittest graph_test.OddLineTest
Example usage:
[30]:
odd_line(0)
[30]:
DiGraph()
[31]:
odd_line(1)
[31]:
1: []
[32]:
odd_line(2)
[32]:
1: [3]
3: []
[33]:
odd_line(3)
[33]:
1: [3]
3: [5]
5: []
[34]:
odd_line(4)
[34]:
1: [3]
3: [5]
5: [7]
7: []
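The outputs above can be reproduced on a plain dict with the following sketch (hypothetical name `odd_line_adj`): list the odd numbers first, then pair each one with the next using `zip`.

```python
def odd_line_adj(n):
    """Hypothetical sketch of odd_line as a dict: 1 -> 3 -> ... -> 2n-1."""
    odds = [2 * k + 1 for k in range(n)]  # 1, 3, ..., 2n-1 (empty when n == 0)
    adj = {v: [] for v in odds}
    for a, b in zip(odds, odds[1:]):      # consecutive pairs (1, 3), (3, 5), ...
        adj[a].append(b)
    return adj


print(odd_line_adj(4))  # {1: [3], 3: [5], 5: [7], 7: []}
```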
1.7 even_line¶
Implement this function outside the class definition. It is not a method of DiGraph!
def even_line(n):
    """ Returns a DiGraph with n verteces, arranged like a line of even numbers.
        Each vertex is an even number i, for 2 <= i <= 2n. For example, for
        n=4 verteces are arranged like this:

        2 <- 4 <- 6 <- 8

        For n = 0, return the empty graph
    """
Testing: python3 -m unittest graph_test.EvenLineTest
Example usage:
[35]:
even_line(0)
[35]:
DiGraph()
[36]:
even_line(1)
[36]:
2: []
[37]:
even_line(2)
[37]:
2: []
4: [2]
[38]:
even_line(3)
[38]:
2: []
4: [2]
6: [4]
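Here the arrows point towards smaller numbers, so in a plain-dict sketch (hypothetical name `even_line_adj`) each bigger number gets the previous even number as successor:

```python
def even_line_adj(n):
    """Hypothetical sketch of even_line as a dict: 2 <- 4 <- ... <- 2n."""
    evens = [2 * k for k in range(1, n + 1)]  # 2, 4, ..., 2n (empty when n == 0)
    adj = {v: [] for v in evens}
    for smaller, bigger in zip(evens, evens[1:]):
        adj[bigger].append(smaller)           # edges point backward
    return adj


print(even_line_adj(3))  # {2: [], 4: [2], 6: [4]}
```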
1.8 quads¶
Implement this function outside the class definition. It is not a method of DiGraph!
def quads(n):
    """ Returns a DiGraph with 2n verteces, arranged like a strip of quads.
        Each vertex is a number i, 1 <= i <= 2n.
        For example, for n = 4, verteces are arranged like this:

        1 -> 3 -> 5 -> 7
        ^    |    ^    |
        |    ;    |    ;
        2 <- 4 <- 6 <- 8

        where

        ^                                        |
        |  represents an upward arrow, while     ;  represents a downward arrow
    """
Testing: python3 -m unittest graph_test.QuadsTest
Example usage:
[39]:
quads(0)
[39]:
DiGraph()
[40]:
quads(1)
[40]:
1: []
2: [1]
[41]:
quads(2)
[41]:
1: [3]
2: [1]
3: [4]
4: [2]
[42]:
quads(3)
[42]:
1: [3]
2: [1]
3: [5, 4]
4: [2]
5: []
6: [4, 5]
[43]:
quads(4)
[43]:
1: [3]
2: [1]
3: [5, 4]
4: [2]
5: [7]
6: [4, 5]
7: [8]
8: [6]
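Reading the outputs above column by column: column k holds top vertex 2k-1 and bottom vertex 2k, the top row points right, the bottom row points left, and the vertical edges alternate, going up in odd columns and down in even ones. In each adjacency list the horizontal edge comes before the vertical one. This sketch on a plain dict (hypothetical name `quads_adj`) follows exactly that reading and reproduces the outputs shown:

```python
def quads_adj(n):
    """Hypothetical sketch of quads(): top row odd numbers, bottom row even."""
    adj = {v: [] for v in range(1, 2 * n + 1)}
    for k in range(1, n + 1):                   # k-th column of the strip
        top, bottom = 2 * k - 1, 2 * k
        if k < n:
            adj[top].append(top + 2)            # top row points right
            adj[bottom + 2].append(bottom)      # bottom row points left
        if k % 2 == 1:
            adj[bottom].append(top)             # odd columns point up
        else:
            adj[top].append(bottom)             # even columns point down
    return adj


print(quads_adj(2))  # {1: [3], 2: [1], 3: [4], 4: [2]}
```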
1.9 pie¶
Implement this function outside the class definition. It is not a method of DiGraph!
def pie(n):
    """
    Returns a DiGraph with n+1 verteces, arranged like a polygon with a perimeter
    of n verteces progressively numbered from 1 to n, where each perimeter vertex
    links to the next one and vertex n links back to 1.
    A central vertex numbered zero has outgoing edges to all other verteces.
    For n = 0, return the empty graph.
    For n = 1, return vertex zero connected to node 1, and node 1 has a self-loop.
    """
Testing: python3 -m unittest graph_test.PieTest
Example usage:
For n=5, the function creates this graph:
[44]:
pie(5)
[44]:
0: [1, 2, 3, 4, 5]
1: [2]
2: [3]
3: [4]
4: [5]
5: [1]
Degenerate cases:
[45]:
pie(0)
[45]:
DiGraph()
[46]:
pie(1)
[46]:
0: [1]
1: [1]
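The perimeter is a cycle, and the modulo operator handles the wrap-around, including the degenerate n = 1 case where `1 % 1 + 1` gives back 1, that is, the self-loop. Sketch on a plain dict with a hypothetical name `pie_adj`, assuming n >= 0:

```python
def pie_adj(n):
    """Hypothetical sketch of pie(): hub 0 plus a cycle 1 -> 2 -> ... -> n -> 1.
    Assumes n >= 0."""
    if n == 0:
        return {}
    adj = {0: list(range(1, n + 1))}  # the hub points to every perimeter vertex
    for i in range(1, n + 1):
        adj[i] = [i % n + 1]          # next perimeter vertex, wrapping n back to 1
    return adj


print(pie_adj(5))  # {0: [1, 2, 3, 4, 5], 1: [2], 2: [3], 3: [4], 4: [5], 5: [1]}
```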
1.10 Flux Capacitor¶
A Flux Capacitor is a plutonium-powered device that enables time travel. During the 80s it was installed on a DeLorean car and successfully used to carry humans back and forth across centuries.
In this exercise you will build a Flux Capacitor model as a Y-shaped DiGraph, created according to a parameter depth. Examples at different depths are shown below.
Implement this function outside the class definition. It is not a method of DiGraph!
def flux(depth):
    r""" Returns a DiGraph with 1 + (depth * 3) numbered verteces arranged like a Flux Capacitor:
        - from a central node numbered 0, three branches depart
        - all edges are directed outward
        - on each branch there are 'depth' verteces.
        - if depth < 0, raises a ValueError
        For example, for depth=2 we get the following graph (suppose arrows point outward):

            4       5
             \     /
              1   2
               \ /
                0
                |
                3
                |
                6
    """
Testing: python3 -m unittest graph_test.FluxTest
Example usage:
[47]:
flux(0)
[47]:
0: []
[48]:
flux(1)
[48]:
0: [1, 2, 3]
1: []
2: []
3: []
[49]:
flux(2)
[49]:
0: [1, 2, 3]
1: [4]
2: [5]
3: [6]
4: []
5: []
6: []
[50]:
flux(3)
[50]:
0: [1, 2, 3]
1: [4]
2: [5]
3: [6]
4: [7]
5: [8]
6: [9]
7: []
8: []
9: []
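From the outputs above, the center 0 links to 1, 2, 3, and after that every vertex v links to v + 3 until the branches reach the requested depth. A sketch on a plain dict with a hypothetical name `flux_adj`:

```python
def flux_adj(depth):
    """Hypothetical sketch of flux(): three outward branches of 'depth' verteces."""
    if depth < 0:
        raise ValueError(f"depth must be non-negative, got {depth}")
    adj = {v: [] for v in range(3 * depth + 1)}
    if depth > 0:
        adj[0] = [1, 2, 3]                  # one first vertex per branch
    for v in range(1, 3 * (depth - 1) + 1):
        adj[v].append(v + 3)                # next vertex on the same branch
    return adj


print(flux_adj(2))  # {0: [1, 2, 3], 1: [4], 2: [5], 3: [6], 4: [], 5: [], 6: []}
```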
2. Manipulate graphs¶
You will now implement some methods to manipulate graphs.
2.1 remove_vertex¶
def remove_vertex(self, vertex):
    """ Removes the provided vertex and returns it
        If the vertex is not found, raises an Exception.
    """
Testing: python3 -m unittest graph_test.RemoveVertexTest
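Removing a vertex also means removing every edge that points to it, and in the adjacency-list model those edges can sit in any list. So a straightforward approach scans all lists, costing O(|V| + |E|). Sketch on a plain dict with a hypothetical name `remove_vertex_adj`, raising LookupError (a subclass of Exception):

```python
def remove_vertex_adj(adj, vertex):
    """Hypothetical sketch of remove_vertex on a dict vertex -> successors."""
    if vertex not in adj:
        raise LookupError(f"Vertex not found: {vertex!r}")
    del adj[vertex]                      # drops the outgoing edges
    for targets in adj.values():
        # drop incoming edges; slice assignment keeps the same list objects
        targets[:] = [t for t in targets if t != vertex]
    return vertex


g = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}
print(remove_vertex_adj(g, 'a'), g)  # a {'b': [], 'c': []}
```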
2.2 transpose¶
def transpose(self):
    """ Reverses the direction of all the edges
        - MUST perform in O(|V|+|E|)
          Note in the adjacency lists model we suppose there are only few edges per node,
          so if you end up with an algorithm which is O(|V|^2) you are ending up with a
          complexity usually reserved for matrix representations !!
        NOTE: this method changes the graph in-place: it does **not** create a new instance
              and does *not* return anything !!
        NOTE: To implement it, *avoid* modifying the existing _edges dictionary (it would
              probably cause more problems than anything else).
              Instead, create a new dictionary, fill it with the required
              verteces and edges and then set _edges to point to the new dictionary.
    """
Testing: python3 -m unittest graph_test.TransposeTest
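The O(|V|+|E|) bound follows from touching each vertex once to create its empty list and each edge once to flip it. Sketch on a plain dict with a hypothetical name `transpose_adj`: it returns the new dictionary, whereas the method would instead assign it to `self._edges`. It assumes every edge target is also a key, as for DiGraph verteces.

```python
def transpose_adj(adj):
    """Hypothetical sketch: returns a new dict with every edge reversed."""
    new_adj = {v: [] for v in adj}          # O(|V|): keep every vertex
    for source, targets in adj.items():     # O(|E|): flip each edge once
        for target in targets:
            new_adj[target].append(source)
    return new_adj


print(transpose_adj({'a': ['b', 'c'], 'b': ['c'], 'c': []}))
# {'a': [], 'b': ['a'], 'c': ['a', 'b']}
```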
2.3 has_self_loops¶
def has_self_loops(self):
    """ Returns True if the graph has any self loop (a.k.a. cap), False otherwise """
Testing: python3 -m unittest graph_test.HasSelfLoopsTest
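A self-loop is a vertex appearing in its own adjacency list, so a sketch on a plain dict (hypothetical name `has_self_loops_adj`) only needs one check per vertex; `any` stops at the first loop found.

```python
def has_self_loops_adj(adj):
    """Hypothetical sketch: True if some vertex lists itself as successor."""
    return any(v in targets for v, targets in adj.items())


print(has_self_loops_adj({'a': ['b'], 'b': ['b']}))  # True
```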
2.4 remove_self_loops¶
def remove_self_loops(self):
    """ Removes all of the self-loop edges (a.k.a. caps)
        NOTE: Removes just the edges, not the verteces!
    """
Testing: python3 -m unittest graph_test.RemoveSelfLoopsTest
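Sketch on a plain dict with a hypothetical name `remove_self_loops_adj`: each vertex keeps all its successors except itself, and the vertex itself stays in the dictionary.

```python
def remove_self_loops_adj(adj):
    """Hypothetical sketch: removes self-loop edges in place, keeps the verteces."""
    for v, targets in adj.items():
        targets[:] = [t for t in targets if t != v]


g = {'a': ['a', 'b'], 'b': ['b']}
remove_self_loops_adj(g)
print(g)  # {'a': ['b'], 'b': []}
```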
2.5 undir¶
def undir(self):
    """ Return a *NEW* undirected version of this graph, that is, if an edge a->b exists in this graph,
        the returned graph must also have both edges a->b and b->a
        *DO NOT* modify the current graph, just return an entirely new one.
    """
Testing: python3 -m unittest graph_test.UndirTest
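Sketch on a plain dict with a hypothetical name `undir_adj`: first copy every adjacency list (so the original is never touched), then add the reverse of each edge unless it is already there, which also keeps self-loops from being doubled. It assumes every edge target is also a key.

```python
def undir_adj(adj):
    """Hypothetical sketch: new dict with both a->b and b->a for every edge a->b."""
    new_adj = {v: list(targets) for v, targets in adj.items()}  # copies, not aliases
    for source, targets in adj.items():
        for target in targets:
            if source not in new_adj[target]:   # avoid duplicate edges
                new_adj[target].append(source)
    return new_adj


g = {'a': ['b'], 'b': []}
print(undir_adj(g), g)  # {'a': ['b'], 'b': ['a']} {'a': ['b'], 'b': []}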
3. Query graphs¶
You can query graphs the do-it-yourself way with Depth First Search (DFS) or Breadth First Search (BFS).
Let’s make a simple example:
[51]:
g = dig({'a': ['a','b', 'c'],
'b': ['c'],
'd': ['e']})
from sciprog import draw_dig
draw_dig(g)

[52]:
g.dfs('a')
DEBUG: Stack is: ['a']
DEBUG: popping from stack: a
DEBUG: not yet visited
DEBUG: Scheduling for visit: a
DEBUG: Scheduling for visit: b
DEBUG: Scheduling for visit: c
DEBUG: Stack is : ['a', 'b', 'c']
DEBUG: popping from stack: c
DEBUG: not yet visited
DEBUG: Stack is : ['a', 'b']
DEBUG: popping from stack: b
DEBUG: not yet visited
DEBUG: Scheduling for visit: c
DEBUG: Stack is : ['a', 'c']
DEBUG: popping from stack: c
DEBUG: already visited!
DEBUG: popping from stack: a
DEBUG: already visited!
Compare it with the example for the BFS:
[53]:
draw_dig(g)

[54]:
g.bfs('a')
DEBUG: Removed from queue: a
DEBUG: Found neighbor: a
DEBUG: already visited
DEBUG: Found neighbor: b
DEBUG: not yet visited, enqueueing ..
DEBUG: Found neighbor: c
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['b', 'c']
DEBUG: Removed from queue: b
DEBUG: Found neighbor: c
DEBUG: already visited
DEBUG: Queue is: ['c']
DEBUG: Removed from queue: c
DEBUG: Queue is: []
Predictably, results are different.
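The DEBUG traces suggest how the provided visits work; the following is a sketch reconstructed from those traces, not the lab's actual dfs and bfs code. The hypothetical functions `dfs_order` and `bfs_order` take a plain dict (vertex -> successors) and return the visit order instead of printing. The DFS pops a vertex, skips it if already visited, otherwise visits it and pushes all its neighbors. The BFS marks a vertex as visited when it is enqueued.

```python
from collections import deque


def dfs_order(adj, source):
    """Hypothetical iterative DFS returning the order in which verteces are visited."""
    visited = set()
    order = []
    stack = [source]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        stack.extend(adj.get(v, []))  # verteces without outgoing edges may not be keys
    return order


def bfs_order(adj, source):
    """Hypothetical BFS returning the order in which verteces leave the queue."""
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj.get(v, []):
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order


g = {'a': ['a', 'b', 'c'], 'b': ['c'], 'd': ['e']}
print(dfs_order(g, 'a'))  # ['a', 'c', 'b']
print(bfs_order(g, 'a'))  # ['a', 'b', 'c']
```

Because the stack returns the last pushed neighbor first, the DFS reaches c before b, matching the trace, while the BFS processes neighbors in insertion order. Neither visit reaches d or e from a.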
3.1 distances()¶
Implement this method of DiGraph:
def distances(self, source):
    """
    Returns a dictionary where the keys are verteces, and each vertex v is associated
    to the *minimal* distance in number of edges required to go from the source
    vertex to vertex v. If a vertex is unreachable, its distance will be -1.
    Source has distance zero from itself.
    Verteces immediately connected to source have distance one.
    - if source is not a vertex, raises a LookupError
    - MUST execute in O(|V| + |E|)
    - HINT: implement this using bfs search.
    """
If you look at the following graph, you can see an example of the distances to associate to each vertex, supposing that the source is a. Note that a itself is at distance zero from itself, and that unreachable nodes like f and g will be at distance -1.
[55]:
import sciprog
sciprog.draw_nx(sciprog.show_distances())

distances('a') called on this graph would return a map like this:
{
'a':0,
'b':1,
'c':1,
'd':2,
'e':3,
'f':-1,
'g':-1,
}
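BFS fits this task because it processes verteces in non-decreasing order of distance, so the first time a vertex is reached, it is reached along a shortest path. Each vertex is enqueued at most once and each edge is scanned once, which gives O(|V| + |E|). Sketch on a plain dict with a hypothetical name `distances_adj`, tried on a small graph of our own choosing (the graph drawn above comes from sciprog and is not listed here):

```python
from collections import deque


def distances_adj(adj, source):
    """Hypothetical BFS-based sketch of distances() on a dict vertex -> successors."""
    # collect every vertex, including those appearing only as edge targets
    verteces = set(adj) | {t for targets in adj.values() for t in targets}
    if source not in verteces:
        raise LookupError(f"Vertex not found: {source!r}")
    dist = {v: -1 for v in verteces}  # -1 means "not reached"
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if dist[w] == -1:         # first discovery is along a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': ['e'], 'f': ['g']}
print(dict(sorted(distances_adj(g, 'a').items())))
# {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3, 'f': -1, 'g': -1}
```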
3.2 equidistances()¶
Implement this method of DiGraph:
def equidistances(self, va, vb):
    """ RETURN a dictionary holding the nodes which
        are equidistant from input verteces va and vb.
        The dictionary values will be the distances of the nodes.
        Nodes unreachable from both va and vb are not included.
        - if va or vb are not present in the graph, raises LookupError
        - MUST execute in O(|V| + |E|)
        - HINT: To implement this, you can use the previously defined distances() method
    """
Example:
[56]:
G = dig({'a': ['b','e'],
'b': ['d'],
'c': ['d'],
'd': ['f'],
'e': ['d','b'],
'f': ['g','h'],
'g': ['e']})
draw_dig(G, options={'graph':{'size':'15,3!', 'rankdir':'LR'}})

Consider a and g. They both:
- can reach e in one step
- can reach d in two steps
- can reach f in three steps
- can reach h in four steps

Moreover:
- c is unreachable by both a and g, so it won't be present in the output
- b is reached from a in one step, and from g in two steps, so it won't be included in the output
[57]:
G.equidistances('a','g')
[57]:
{'e': 1, 'd': 2, 'f': 3, 'h': 4}
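The hint suggests reusing distances(); to keep this sketch self-contained it instead uses a hypothetical helper `bfs_levels` that returns distances only for reachable verteces, so "reachable from both at the same distance" becomes a single dictionary lookup. Two BFS runs keep the whole thing O(|V| + |E|). The names `bfs_levels` and `equidistances_adj` are hypothetical, and the graph is the G defined above written as a plain dict.

```python
from collections import deque


def bfs_levels(adj, source):
    """Hypothetical helper: distances of the verteces reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist  # unreachable verteces are simply absent


def equidistances_adj(adj, va, vb):
    """Hypothetical sketch of equidistances() on a dict vertex -> successors."""
    verteces = set(adj) | {t for targets in adj.values() for t in targets}
    if va not in verteces or vb not in verteces:
        raise LookupError(f"Missing vertex: {va!r} or {vb!r}")
    da, db = bfs_levels(adj, va), bfs_levels(adj, vb)
    # keep only verteces reached from both, at the same distance
    return {v: d for v, d in da.items() if db.get(v) == d}


G = {'a': ['b', 'e'], 'b': ['d'], 'c': ['d'], 'd': ['f'],
     'e': ['d', 'b'], 'f': ['g', 'h'], 'g': ['e']}
print(equidistances_adj(G, 'a', 'g'))  # {'e': 1, 'd': 2, 'f': 3, 'h': 4}
```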
3.3 Play with dfs and bfs¶
Create small graphs (like linked lists a->b->c, triangles, mini full graphs, trees; you can also use the functions you defined to create graphs, like full_graph, dag, list_graph, star_graph) and try to predict the visit sequence (verteces order, with discovery and finish times) you would get by running a dfs or bfs. Then write tests that assert you actually get those sequences when running the provided dfs and bfs.
3.4 Exits graph¶
There is a place near Trento called Silent Hill, where people always study and do little else. Unfortunately, one day an unethical biotech AI experiment goes wrong and a buggy cyborg is left free to roam the building. To avoid panic, you are quickly asked to devise an evacuation plan. The place is a well-known labyrinth, with endless corridors that also loop into cycles. But you know you can model this network as a digraph, and you decide to represent crossings as nodes. When a crossing has a door to leave the building, its label starts with the letter e, while when there is no such door the label starts with the letter n.
In the example below, there are three exits e1, e2, and e3. Given a node, say n1, you want to tell the crowd in that node the shortest paths leading to the three exits. To avoid congestion, one third of the crowd may be told to go to e2, one third to reach e1, and the remaining third to go to e3, even if they are farther than e2.
In Python terms, we would like to obtain a dictionary of paths like the following, where the keys are the exits and the values are the shortest sequences of nodes from n1 leading to each exit:
{
'e1': ['n1', 'n2', 'e1'],
'e2': ['n1', 'e2'],
'e3': ['n1', 'e2', 'n3', 'e3']
}
[58]:
from sciprog import draw_dig
from graph_solution import *
from graph_test import dig
[59]:
G = dig({'n1':['n2','e2'],
'n2':['e1'],
'e1':['n1'],
'e2':['n2','n3', 'n4'],
'n3':['e3'],
'n4':['n1']})
draw_dig(G)

You will solve the exercise in steps, so open exits_solution.py and read on through the following points.
3.4.1 Exits graph cp¶
Implement this method:
def cp(self, source):
    """ Performs a BFS search starting from provided node label source and
        RETURN a dictionary of nodes representing the visit tree in the
        child-to-parent format, that is, each key is a node label and as value
        has the node label from which it was discovered for the first time.
        So if node "n2" was discovered for the first time while
        inspecting the neighbors of "n1", then in the output dictionary there
        will be the pair "n2":"n1".
        The source node will have None as parent, so if source is "n1" in the
        output dictionary there will be the pair "n1": None
        - MUST execute in O(|V| + |E|)
        - NOTE: This method must *NOT* distinguish between exits
                and normal nodes, in the tests we label them n1, e1 etc just
                because we will reuse them in the next exercise
        - NOTE: You are allowed to put debug prints, but the only thing that
                matters for the evaluation and tests to pass is the returned
                dictionary
    """
Testing: python3 -m unittest graph_test.CpTest
Example:
[60]:
G.cp('n1')
DEBUG: Removed from queue: n1
DEBUG: Found neighbor: n2
DEBUG: not yet visited, enqueueing ..
DEBUG: Found neighbor: e2
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['n2', 'e2']
DEBUG: Removed from queue: n2
DEBUG: Found neighbor: e1
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['e2', 'e1']
DEBUG: Removed from queue: e2
DEBUG: Found neighbor: n2
DEBUG: already visited
DEBUG: Found neighbor: n3
DEBUG: not yet visited, enqueueing ..
DEBUG: Found neighbor: n4
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['e1', 'n3', 'n4']
DEBUG: Removed from queue: e1
DEBUG: Found neighbor: n1
DEBUG: already visited
DEBUG: Queue is: ['n3', 'n4']
DEBUG: Removed from queue: n3
DEBUG: Found neighbor: e3
DEBUG: not yet visited, enqueueing ..
DEBUG: Queue is: ['n4', 'e3']
DEBUG: Removed from queue: n4
DEBUG: Found neighbor: n1
DEBUG: already visited
DEBUG: Queue is: ['e3']
DEBUG: Removed from queue: e3
DEBUG: Queue is: []
[60]:
{'n1': None,
'n2': 'n1',
'e2': 'n1',
'e1': 'n2',
'n3': 'e2',
'n4': 'e2',
'e3': 'n3'}
Basically, the dictionary above represents this visit tree:
        n1
       /  \
     n2    e2
      \   /  \
      e1 n3   n4
          |
          e3
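The child-to-parent dictionary is exactly what a BFS builds if, instead of a plain visited set, it records for each newly discovered node the node it was discovered from. Sketch on a plain dict with a hypothetical name `cp_adj` (no debug prints), run on the same G written as a plain dict:

```python
from collections import deque


def cp_adj(adj, source):
    """Hypothetical BFS sketch returning the visit tree as child -> parent."""
    parent = {source: None}  # also acts as the 'visited' set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if w not in parent:  # the first discovery fixes the parent
                parent[w] = v
                queue.append(w)
    return parent


G = {'n1': ['n2', 'e2'], 'n2': ['e1'], 'e1': ['n1'],
     'e2': ['n2', 'n3', 'n4'], 'n3': ['e3'], 'n4': ['n1']}
print(cp_adj(G, 'n1'))
# {'n1': None, 'n2': 'n1', 'e2': 'n1', 'e1': 'n2', 'n3': 'e2', 'n4': 'e2', 'e3': 'n3'}
```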
3.4.2 Exit graph exits¶
Implement this function. NOTE: the function is external to class DiGraph.
def exits(cp):
    """
    INPUT: a dictionary of nodes representing a visit tree in the
           child-to-parent format, that is, each key is a node label and
           as value has its parent as a node label. The root has
           associated None as parent.
    OUTPUT: a dictionary mapping node labels of exits to a list
            of node labels representing the shortest path from
            the root to the exit (root and exit included)
    - MUST execute in O(|V| + |E|)
    """
Testing: python3 -m unittest graph_test.ExitsTest
Example:
[61]:
# as example we can use the same dictionary outputted by the cp call in the previous exercise
visit_cp = { 'e1': 'n2',
'e2': 'n1',
'e3': 'n3',
'n1': None,
'n2': 'n1',
'n3': 'e2',
'n4': 'e2'
}
exits(visit_cp)
[61]:
{'e1': ['n1', 'n2', 'e1'], 'e2': ['n1', 'e2'], 'e3': ['n1', 'e2', 'n3', 'e3']}
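Since in a BFS tree the path from the root to any node is a shortest path, it is enough to climb the parent links from each exit up to the root and then reverse the collected list. Sketch with a hypothetical name `exits_adj`, assuming, as in the story, that exit labels are strings starting with 'e'. Its cost is proportional to the total length of the returned paths.

```python
def exits_adj(cp):
    """Hypothetical sketch: exit label -> path from the root to that exit."""
    paths = {}
    for node in cp:
        if node.startswith('e'):        # assumption: exits are labeled 'e...'
            path = []
            current = node
            while current is not None:  # climb parent links up to the root
                path.append(current)
                current = cp[current]
            path.reverse()              # collected exit -> root, we want root -> exit
            paths[node] = path
    return paths


visit_cp = {'e1': 'n2', 'e2': 'n1', 'e3': 'n3', 'n1': None,
            'n2': 'n1', 'n3': 'e2', 'n4': 'e2'}
print(exits_adj(visit_cp))
# {'e1': ['n1', 'n2', 'e1'], 'e2': ['n1', 'e2'], 'e3': ['n1', 'e2', 'n3', 'e3']}
```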
3.5 connected components¶
Implement cc:
def cc(self):
    """ Finds the connected components of the graph, returning a dict object
        which associates to the verteces the corresponding connected component
        number id, where 1 <= id <= |V|
        IMPORTANT: ASSUMES THE GRAPH IS UNDIRECTED !
                   ON DIRECTED GRAPHS, THE RESULT IS UNPREDICTABLE !
        To develop this function, implement also ccdfs
        HINT: store 'counter' as field in Visit object
    """
Which in turn uses the FUNCTION ccdfs, also to be implemented INSIDE the method cc:
def ccdfs(counter, source, ids):
    """
    Performs a DFS from source vertex
    HINT: Copy in here the method from DFS and adapt it as needed
    HINT: store the connected component id in VertexLog objects
    """
Testing: python3 -m unittest graph_test.CCTest
NOTE: In tests, to keep code compact, graphs are created with a call to udig():
[62]:
from graph_test import udig
udig({'a': ['b'],
'c': ['d']})
[62]:
a: ['b']
b: ['a']
c: ['d']
d: ['c']
which makes sure the resulting graph is undirected as the CC algorithm requires (so if there is an edge a->b there will also be an edge b->a).
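The idea behind cc can be sketched as follows: scan the verteces, and every time one has no id yet, a new component starts there, so increment the counter and label everything reachable from it. The hypothetical `cc_adj` below works on a plain dict and uses an iterative stack-based DFS, so it is not the recursive ccdfs the exercise asks for. It assumes the graph is undirected and every vertex is a key, as udig guarantees.

```python
def cc_adj(adj):
    """Hypothetical iterative sketch of cc() on an undirected dict graph."""
    ids = {}
    counter = 0
    for start in adj:
        if start in ids:
            continue
        counter += 1          # a new, still unlabeled component starts here
        stack = [start]
        while stack:
            v = stack.pop()
            if v in ids:
                continue
            ids[v] = counter  # 1 <= counter <= |V|
            stack.extend(adj.get(v, []))
    return ids


ug = {'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': ['c']}
print(cc_adj(ug))  # {'a': 1, 'b': 1, 'c': 2, 'd': 2}
```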
3.6 has_cycle¶
Implement has_cycle method for directed graphs:
```python
def has_cycle(self):
    """ Return True if this directed graph has a cycle, return False otherwise.
        - To develop this function, implement also has_cycle_rec(u) inside this method
        - Inside has_cycle_rec, to reference variables of has_cycle you need to
          declare them as nonlocal like
              nonlocal clock, dt, ft
        - MUST be able to also detect self-loops
    """
```
and also has_cycle_rec inside has_cycle:
def has_cycle_rec(u):
    raise Exception("TODO IMPLEMENT ME !")
Testing: python3 -m unittest graph_test.HasCycleTest
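A directed graph has a cycle exactly when a DFS finds a back edge, that is, an edge towards a vertex still on the current recursion path. The sketch below tracks that with three colors (white = unvisited, gray = on the current path, black = finished), a common alternative to the dt/ft discovery and finish times the docstring mentions. Names `has_cycle_adj` and `visit` are hypothetical. The color dictionary is only mutated, never reassigned, so no nonlocal is needed. A self-loop shows up as a gray vertex pointing to itself. Being recursive, it is bounded by Python's recursion limit on very deep graphs.

```python
def has_cycle_adj(adj):
    """Hypothetical three-color DFS sketch of has_cycle on a dict vertex -> successors."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}  # missing vertex means WHITE

    def visit(u):
        color[u] = GRAY
        for w in adj.get(u, []):
            c = color.get(w, WHITE)
            if c == GRAY:                # back edge (self-loops included)
                return True
            if c == WHITE and visit(w):
                return True
        color[u] = BLACK                 # all descendants explored
        return False

    for u in adj:
        if color.get(u, WHITE) == WHITE and visit(u):
            return True
    return False


print(has_cycle_adj({'a': ['a']}))                     # True
print(has_cycle_adj({'a': ['b'], 'b': ['a']}))         # True
print(has_cycle_adj({'a': ['b', 'c'], 'b': ['c']}))    # False
```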
3.7 top_sort¶
Look at Montresor's slides on topological sort. Keep in mind these points:
- topological sort works on DAGs, that is, Directed Acyclic Graphs
- given a graph, there can be more than one valid topological sort
- it also works on DAGs having disconnected components, in which case the nodes of one component can be interspersed with the nodes of other components at will, provided the relative order required by the edges within each component is preserved.
EXERCISE: Before coding, try by hand to find all the topological sorts of the following graphs. For all of them, you will find the solutions listed in the tests.
[63]:
G = dig({'a':['c'],
'b':['c']})
draw_dig(G)

[64]:
G = dig({'a':['b'], 'c':[]})
draw_dig(G)

[65]:
G = dig({'a':['b'], 'c':['d']})
draw_dig(G)

[66]:
G = dig({'a':['b','c'], 'b':['d'], 'c':['d']})
draw_dig(G)

[67]:
G = dig({'a':['b','c','d'], 'b':['e'], 'c':['e'], 'd':['e']})
draw_dig(G)

[68]:
G = dig({'a':['b','c','d'], 'b':['c','d'], 'c':['d'], 'd':[]})
draw_dig(G)

Now implement this method:
def top_sort(self):
    """ RETURN a topological sort of the graph. To implement this code,
        feel free to adapt Montresor's algorithm
        - implement Stack S as a list
        - implement visited as a set
        - NOTE: differently from Montresor's code, for tests to pass
                you will need to return a reversed list. Why ?
    """
Testing: python3 -m unittest graph_test.TopSortTest
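One common DFS-based approach, sketched here with the hypothetical name `top_sort_adj` on a plain dict, appends each vertex to a list when it finishes (all its descendants are done) and then reverses the list: a vertex always finishes after everything reachable from it, so in the reversed list it comes before all of them. This is one way to read the "reversed list" hint, not necessarily the exact structure of Montresor's code. It assumes the input is a DAG.

```python
def top_sort_adj(adj):
    """Hypothetical DFS sketch of a topological sort on a DAG given as a dict."""
    visited = set()
    finished = []  # verteces in order of DFS finish time

    def visit(u):
        visited.add(u)
        for w in adj.get(u, []):
            if w not in visited:
                visit(w)
        finished.append(u)  # u finishes after all its descendants

    for u in adj:
        if u not in visited:
            visit(u)
    finished.reverse()
    return finished


print(top_sort_adj({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))
# ['a', 'c', 'b', 'd']
```

Since several topological sorts can be valid, checking that every edge u->v has u before v in the result is a more robust test than comparing against one fixed list.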
Note: in the tests there is the method self.assertIn(el, elements), which checks el is in elements. We use it because a graph can have many valid topological sorts, and we want the test to be independent of your particular implementation.