Over the last few years, machine learning techniques have become increasingly popular and are now widely used in both academia and industry.
The BSE Data Science Summer School provides an overview of the state-of-the-art tools employed in machine learning.
"Foundations of Data Science" is an introductory course that provides a broad overview of the main methodologies used to analyze data in data science. "Statistical Machine Learning for Large and Unstructured Data" and "Deep Learning and Applications" are advanced courses focusing on specialized topics.
The teaching style of the courses is hands-on. Special attention will be devoted to applying the methodologies introduced in class to empirical data using Jupyter notebooks coded in either Python or R.
Overall, the courses of the BSE Summer School in Data Science give participants the tools to apply modern machine learning techniques, from data exploration to building predictive models and extracting insights.
Course list for 2023
June 25, 2023 (Face-to-face)
- Coding Bootcamp in Python and R
Instructor: Paul Rognon (UPF and Universitat Politècnica de Catalunya)
This course is held in an intensive one-day format.
Week of June 26 - June 30, 2023 (Face-to-face)
- Foundations of Data Science
Instructor: André B.M. Souza (ESADE)
Week of July 3 - 7, 2023 (Face-to-face)
- Statistical Machine Learning for Large and Unstructured Data
Instructors: Lorenzo Cappello (UPF and BSE) and Stephen Hansen (University College London and BSE)
Week of July 10 - 14, 2023 (Face-to-face)
- Deep Learning and Applications
Instructors: Vicenç Gómez (UPF Artificial Intelligence and Machine Learning group) and Anders Jonsson (UPF Artificial Intelligence and Machine Learning group)
Program director
Apply to Summer School 2023
There is no fee to apply. Submit your application online in a few easy steps!
Apply to Summer School courses
10% Early-bird discount deadline: April 14, 2023
Last day to apply: June 1, 2023
Fees and discounts
Fees vary by course. You may be eligible for one or more available Summer School discounts. Our staff can provide a personalized quote for you.
Request a quote or get more information
Contact our Summer School Team
Let us design a course for your employees at any time of year.
Coding Bootcamp in Python and R
Course Overview
The “Coding Bootcamp in Python and R” provides basic training in programming with Python and R for data analysis and machine learning. This is an intensive 8-hour course based on a hands-on approach using Jupyter notebooks.
This course has been specially designed to provide you with essential coding skills that come in handy when initiating or continuing your training in Data Science. If you are interested in mastering this area and learning the tools to apply modern machine learning techniques, from data exploration to building predictive models and extracting insights, this intensive 8-hour course is the ideal preparation to get started.
The bootcamp will also give you the required preparation for other courses of the BSE Summer School in Data Science.
Prerequisites
Although not mandatory, some knowledge of Python, Jupyter notebooks, and matrix algebra is recommended. Students are encouraged to install R and Python on their own laptop or desktop computer. This is also optional, since the course is based on Jupyter notebooks that can be run with Google Colab.
Course Outline
The course is organized around the following thematic units:
1. Programming with Python
- Running Python with Jupyter
- Basic variables: Numbers and strings
- Basic operations
- Loops, control flow
- Data structures: Lists, maps, reductions
- Functions and classes
- Inputs and outputs
- String manipulation
- Date/time manipulation
- Special packages: pandas and NumPy
2. Programming with R
- Running R with Jupyter
- Basic variables: Numbers and strings
- Basic operations
- Loops, control flow
- Data structures: Lists, vectors, matrices, data frames
- Functions
- Inputs and outputs
- String manipulation
- Date/time manipulation
- Basic data visualization
- Special packages: ggplot2, dplyr
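As a rough illustration of the Python unit above (data structures, maps, reductions, string and date/time manipulation), the following minimal sketch uses only the standard library. The `courses` list is a hypothetical structure built from the dates and hours listed on this page; it is not course material.

```python
from datetime import date
from functools import reduce

# Hypothetical records: (title, start date as ISO string, hours)
courses = [
    ("coding bootcamp in python and r", "2023-06-25", 8),
    ("foundations of data science", "2023-06-26", 20),
    ("deep learning and applications", "2023-07-10", 20),
]

# Map: apply a string transformation to every element of a list
titles = list(map(lambda c: c[0].title(), courses))

# Reduction: fold the list into a single value (total contact hours)
total_hours = reduce(lambda acc, c: acc + c[2], courses, 0)

# List comprehension with a condition: titles of courses longer than 8 hours
long_courses = [c[0] for c in courses if c[2] > 8]

# Date/time manipulation: parse ISO strings and compute a difference in days
starts = [date.fromisoformat(c[1]) for c in courses]
days_between = (starts[-1] - starts[0]).days

# weekday() counts from Monday = 0, so Sunday is 6
bootcamp_weekday = starts[0].weekday()

print(titles, total_hours, long_courses, days_between, bootcamp_weekday)
```

The same operations are typically written with pandas once the data live in a data frame; the plain-Python version is shown here only to keep the sketch self-contained.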
References
An Introduction to Statistical Learning: with Applications in R. G. James, D. Witten, T. Hastie, and R. Tibshirani. Springer Texts in Statistics, Springer, New York, NY, USA (2021)
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructor
Paul Rognon is a PhD student in Statistics and Machine Learning at Universitat Pompeu Fabra and Universitat Politècnica de Catalunya. His background is in applied mathematics and mathematical statistics. His research currently focuses on graphical and high-dimensional statistical theory. He has previously worked on applications ranging from financial risk and functional genomics to text mining and extreme weather risk.
Foundations of Data Science
Course Overview
"Foundations of Data Science" is a hands-on course that exposes participants to state-of-the-art tools employed in data science. The course is composed of three units that guide participants through the process of converting raw data into actionable insights:
- Data handling and visualization
- Supervised learning: the course covers some of the most relevant supervised learning tools, ranging from linear models such as LASSO, Ridge and Elastic Net to nonlinear models, such as Decision Trees, Random Forests and Boosting.
- Unsupervised learning: participants are introduced to the main concepts and tools for dealing with unsupervised learning problems, such as clustering algorithms and Principal Components Analysis.
Classes combine a rigorous treatment of the topics with practical sessions in which participants learn how to apply these techniques to real data sets. Built around Jupyter Notebooks, the course offers participants the opportunity to improve their programming skills in both Python and R, the most widely used programming languages in Data Science.
After completing this course, you will:
- Be able to work with and extract valuable insights from real data
- Have improved programming skills in two of the most used programming languages in data science
- Understand some of the key methods used by data scientists, as well as their limitations
- Have gained practical experience in applying these methods to large and heterogeneous data
- Have learned to work with large datasets
Prerequisites
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra and programming with Python and R.
Students who do not have significant programming experience will be admitted provided that they attend the 8-hour “Coding Bootcamp in Python and R” that will be held prior to the “Foundations of Data Science” course.
You can check your own skill level by downloading the following script. (This script is not part of the application process; it will not be reviewed by the BSE admissions team or the instructors.)
Course Outline
The course lasts 5 days. At the end of each lecture, students will be given projects to complete on a voluntary basis. The projects will be reviewed in the following lecture.
The course is organized around the following units:
1. Data visualization
- Elements of data visualization
- Exploration plots: scatter, lines, barplots, boxplots
- Advanced plots: correlation, regression, biplots
- Special plots
- Reporting using visualization
Keywords: seaborn, plotly
2. Data Handling
- Handling missing data: imputation methods
- Feature transformation and engineering: normalization, dimensionality reduction, category encoding
Keywords: sklearn
3. Supervised learning
3.1. Linear models for regression
- Linear models and non-linear feature maps
- Model evaluation
- Bias-Variance tradeoff
- Penalized regression
- Cross validation and model selection
3.2. Linear models for classification
- Logistic regression
- Misclassification, ROC, AUC
- Class imbalance
3.3. Nonlinear models: decision trees
- Decision trees
- Variable selection
- Random Forests
- Bagging and boosting
Keywords: sklearn
4. Unsupervised learning
- Clustering
- Principal components
Keywords: clustering, principal components, spectral clustering
5. Data handling, supervised and unsupervised learning with R
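To make the "Penalized regression" and "Cross validation and model selection" items in unit 3.1 more concrete, here is a minimal sketch in plain Python. It assumes a one-feature ridge model without intercept and synthetic data; the function names `ridge_slope` and `cv_mse` and the grid of penalties are illustrative choices, not part of the course material (which lists sklearn as its tool).

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-feature model y ~ beta * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def cv_mse(xs, ys, lam, k=5):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        beta = ridge_slope(train_x, train_y, lam)
        errs = [(ys[i] - beta * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k

# Synthetic data (assumption): true slope 2.0 plus Gaussian noise
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

# Model selection: pick the penalty with the lowest cross-validated error
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

Larger penalties shrink the estimated slope toward zero, trading variance for bias; cross-validation is one common way to choose where to stop along that trade-off.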
References
The Elements of Statistical Learning. T. Hastie, R. Tibshirani, and J. Friedman. Springer Series in Statistics, Springer, New York, NY, USA (2001)
An Introduction to Statistical Learning: with Applications in R. G. James, D. Witten, T. Hastie, and R. Tibshirani. Springer Texts in Statistics, Springer, New York, NY, USA (2021)
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructor
The course will be delivered by André B.M. Souza, Assistant Professor in the Department of Economics, Finance and Accounting at ESADE Business School. His research interests include forecasting, financial econometrics, machine learning, empirical finance and empirical macroeconomics. He has taught several econometrics and statistics courses for the Master's programs at the BSE.
Statistical Machine Learning for Large and Unstructured Data
Course Overview
We introduce a combination of cutting-edge data analysis methods for analyzing large and unstructured data. The goal is that attendees become familiar with data analysis methods lying at the frontier of research and applications and how to apply them in practice. We focus on problems where one wishes to go beyond point prediction and wants to obtain statistically valid inferences on parameters, hypotheses, or probabilistic forecasts. We emphasize Bayesian methodology as a natural probabilistic machine learning framework for such tasks. Recent advances in the statistical, Bayesian, and machine learning literature will be the core of the lectures.
We start by discussing topics in high-dimensional statistics, focusing on penalized likelihood and Bayesian methods for high-dimensional regression. Then, we discuss how to adapt them for treatment effect estimation in causal inference. Third, we discuss probabilistic machine learning methods such as non-parametric regression and probabilistic forecasts. Alternative ways to quantify uncertainty, such as conformal inference, will also be touched upon. Finally, we discuss latent discrete variable models that are helpful in analyzing unstructured data such as count-based outcomes or text. We also introduce modern computational methods and programming software that allow one to efficiently deploy the methods discussed, and we showcase how to do so via case studies. We will emphasize applications in Economics and the Social Sciences, although the presented ideas are widely applicable to other fields, and we also present examples from disciplines such as Biomedicine.
Prerequisites
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra, programming, and data analysis with Python and R, as well as fundamental concepts in statistical learning. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Course Outline
The program combines a series of lectures by leading researchers in the development of Bayesian methods and their applications with practical sessions of hands-on data analysis. The course is 20 hours, split into ten sessions of 2 hours each. Theory and practice will alternate within each session, and three practice sessions are devoted to hands-on data analysis.
- Regression with many covariates I. Penalized likelihood and the Bayesian approach to model selection and averaging.
- Regression with many covariates II. Computation basics (Laplace approximations, Markov Chain Monte Carlo).
- Causal inference. Treatment effects with many variables, double machine learning, Bayesian views on these topics.
- Practice Session. Regression with many covariates and linear treatment effects.
- Probabilistic machine learning. Non-parametric regression. Probabilistic forecasts, conformal inference.
- Practice Session. Probabilistic machine learning
- Latent variable models. Principal components analysis, mixture and mixed-membership models, latent Dirichlet allocation.
- Scalable Bayesian inference. Hamiltonian Monte Carlo, Black Box Variational Inference.
- Applications. Overview of latent variable models applied to problems in economics and related disciplines.
- Practice Session. Implementation of scalable Bayesian inference methods.
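Among the topics in the outline above, conformal inference is easy to sketch in a few lines. The following minimal example shows split conformal prediction in plain Python, assuming synthetic data and a deliberately simple least-squares predictor; the helper name `split_conformal_radius` and the data-generating process are hypothetical and only serve the illustration.

```python
import math
import random

def split_conformal_radius(residuals, alpha):
    """Interval half-width from absolute calibration residuals (split conformal)."""
    n = len(residuals)
    # Finite-sample corrected rank of the residual used as the radius
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(residuals)[rank - 1]

rng = random.Random(1)

def draw(n):
    """Synthetic data (assumption): y = 3x + standard Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

# 1) Fit any point predictor on a training split (here: least squares through the origin)
x_tr, y_tr = draw(200)
beta = sum(x * y for x, y in zip(x_tr, y_tr)) / sum(x * x for x in x_tr)

def predict(x):
    return beta * x

# 2) Compute absolute residuals on a separate calibration split
x_cal, y_cal = draw(200)
residuals = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
radius = split_conformal_radius(residuals, alpha=0.1)

# 3) Prediction intervals are predict(x) +/- radius; check coverage on fresh data
x_te, y_te = draw(2000)
covered = sum(abs(y - predict(x)) <= radius for x, y in zip(x_te, y_te))
coverage = covered / len(x_te)
print(radius, coverage)
```

The interval is valid on average regardless of how good the point predictor is, which is why the fitted model here can be so crude; a better predictor would simply yield a smaller radius.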
References
- The Elements of Statistical Learning. T. Hastie, R. Tibshirani, and J. Friedman. Springer Series in Statistics, Springer, New York, NY, USA (2001)
- An Introduction to Statistical Learning: with Applications in R. G. James, D. Witten, T. Hastie, and R. Tibshirani. Springer Texts in Statistics, Springer, New York, NY, USA (2021)
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructors
Lorenzo Cappello is an assistant professor in the Statistics group at Universitat Pompeu Fabra and an affiliated professor at the Barcelona School of Economics. He was previously a postdoctoral scholar in the Department of Statistics at Stanford University, mentored by Prof. Julia A. Palacios. He earned his PhD in Statistics in 2018 from the Department of Decision Sciences at Bocconi University, under the supervision of Prof. Stephen G. Walker and Prof. Sonia Petrone.
Stephen Hansen, PhD, London School of Economics, is an Associate Professor of Economics at Imperial College Business School. His research focuses on organizational economics and monetary policy, relying on unstructured data sources and machine learning methods to address questions in these areas.
Deep Learning and Applications
Course Overview
Deep learning is a key technology behind most of the recent successful applications in Artificial Intelligence, including driverless cars, online recommender systems, scientific discoveries, and chatbots, to name a few. Deep learning differs from other machine learning techniques because it can learn representations that generalize to unseen data with little user supervision. To do so, deep learning techniques make use of appropriate architectures, large corpora of data, and powerful graphics processing units.
This course is beneficial to data scientists interested in understanding the mathematical and algorithmic principles behind deep learning, as well as in learning about recent advances in the field. Students will be able to apply existing pre-trained models and to fine-tune new models for different application domains, including image processing and natural language processing.
Prerequisites
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra, programming, and data analysis with Python and R, as well as fundamental concepts in statistical learning. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Course Outline
Each day will consist of two parts: a morning lecture introducing the main theoretical concepts and methods, and a practical session in the afternoon:
- Day 1: An overall introduction to Deep Learning, including the basic building blocks and optimization algorithms for training deep models.
- Day 2: Computer Vision, with focus on neural architectures such as convolutional networks, generative models, or normalizing flows, for tasks such as image classification, object detection or image captioning.
- Day 3: Natural Language Processing, with focus on word/document embeddings, recurrent neural networks, and large-scale pre-trained language models based on transformers.
- Day 4: Deep Learning on Graphs, with focus on learning architectures designed for inputs structured as graphs, such as social networks, web graphs, or protein networks.
- Day 5: Deep Reinforcement Learning, with focus on how deep learning can be used in sequential decision-making problems such as board and video games.
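The "basic building blocks and optimization algorithms" mentioned for Day 1 can be previewed with a minimal sketch: a single sigmoid unit trained by full-batch gradient descent on the cross-entropy loss. This is plain Python on a toy dataset (logical OR), chosen here as an assumption for illustration; the course itself points to PyTorch, and real deep models stack many such units and rely on automatic differentiation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (assumption): inputs and targets of the logical OR function
data = [
    ((0.0, 0.0), 0.0),
    ((0.0, 1.0), 1.0),
    ((1.0, 0.0), 1.0),
    ((1.0, 1.0), 1.0),
]

w = [0.0, 0.0]  # weights of the single unit
b = 0.0         # bias
lr = 0.5        # learning rate

def forward(x1, x2):
    return sigmoid(w[0] * x1 + w[1] * x2 + b)

def mean_loss():
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for (x1, x2), y in data:
        p = forward(x1, x2)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

initial_loss = mean_loss()

for epoch in range(2000):
    gw = [0.0, 0.0]
    gb = 0.0
    for (x1, x2), y in data:
        # For sigmoid + cross-entropy, the gradient w.r.t. the pre-activation is p - y
        err = forward(x1, x2) - y
        gw[0] += err * x1
        gw[1] += err * x2
        gb += err
    n = len(data)
    # Gradient descent step on the mean loss
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

final_loss = mean_loss()
predictions = [round(forward(x1, x2)) for (x1, x2), _ in data]
print(initial_loss, final_loss, predictions)
```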
References
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. Jeremy Howard, Sylvain Gugger. Publisher: O'Reilly Media
- Deep Learning with PyTorch Step-by-Step: A Beginner's Guide.
- Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron Courville, Pascal Vincent
- Distilling the Knowledge in a Neural Network. Geoffrey Hinton, Oriol Vinyals, Jeff Dean
- PyTorch Tutorials, official PyTorch documentation (version 1.9.1+cu102)
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructors
Vicenç Gómez is currently a tenure-track faculty member under a Ramón y Cajal fellowship in the Artificial Intelligence and Machine Learning group at UPF, which he joined with a transnational academic career grant (FP7 Marie Curie Actions) in 2014. Prior to this, he worked for more than six years as a research scientist in the machine learning group at Radboud University Nijmegen, the Netherlands. His main research interests are probabilistic inference and reinforcement learning. He works on developing novel machine learning methods derived from first principles and understanding their theoretical properties, as well as their application for modeling, understanding and improving the functioning of networked systems.
Anders Jonsson is a professor in the ICT Department of UPF and director of the Artificial Intelligence and Machine Learning group. He received his MSc in Engineering Physics from the Royal Institute of Technology in Stockholm, Sweden, and his PhD in Computer Science from the University of Massachusetts Amherst, USA. His research is mainly focused on sequential decision problems, either in the form of reinforcement learning or in the form of automated planning. In particular, his work focuses on finding and exploiting the structure of these types of problems in order to simplify their solution. He has participated in numerous international projects.
Who will benefit from this program?
- Those who finished their Master's 10 years ago and want to get up to speed with the latest developments in data science
- Those who don’t have the time to do a Master's in data science but want to get an introduction to the area
- Those who manage a team of data scientists, don’t know much about data science but want to get acquainted with the basic techniques
- In particular, this program is useful for those who want to learn how to do data analysis
Requirements for Foundations course
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra and programming with Python and R. Students who do not have significant programming experience will be admitted provided that they attend the 8-hour "Coding Bootcamp in Python and R" that will be held prior to the "Foundations of Data Science" course.
Requirements for the other courses
Besides the basic entry requirements, participants in these courses are expected to be familiar with the fundamental concepts in statistical learning and programming with Python and R. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Credit transfers (ECTS)
Students will deliver a short final project one week after the summer school finishes. It will consist of solving a final problem that includes the practical and empirical issues worked on in class.
Consult the Credit Transfer page for more information about this option.
Certificate of attendance
Participants not interested in credit transfer will instead receive a Certificate of Attendance, stating the courses and number of hours completed. These students will be neither evaluated nor graded. There is no fee for the certificate.
Fees
Multiple course discounts are available. Fees for courses in other Summer School programs may vary.
Course | Modality | Hours | ECTS | Regular Fee | Reduced Fee* |
---|---|---|---|---|---|
Coding Bootcamp in Python and R | Face-to-face | 8 | 0 | 575€ | 325€ |
Deep Learning and Applications | Face-to-face | 20 | 2 | 1475€ | 850€ |
Foundations of Data Science | Face-to-face | 20 | 2 | 1475€ | 850€ |
Statistical Machine Learning for Large and Unstructured Data | Face-to-face | 20 | 2 | 1475€ | 850€ |
* Reduced Fee applies for PhD or Master's students, Alumni of BSE Master's programs, and participants who are unemployed.
** Flexible cancelation policy: cancelations made on or before June 1, 2023, will receive a 100% refund.
See more information about available discounts or request a personalized discount quote by email.
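Day / Time | Sun* | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|---|
9:00 - 11:00 | Coding Bootcamp in Python and R** | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science |
13:00 - 14:00 | | | | | | |
14:00 - 16:00 | Coding Bootcamp in Python and R** | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science |
16:00 - 18:00 | Coding Bootcamp in Python and R** | | | | | |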
*This week starts on Sunday with the one-day course "Coding Bootcamp in Python and R"
**There will be a lunch break from 13:00 - 14:00 on Sunday
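Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data |
14:00 - 16:00 | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data | Statistical Machine Learning for Large and Unstructured Data |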
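Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications |
14:00 - 16:00 | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications |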
Mix and match your summer courses!
Remember that you can combine Data Science courses with courses in any of the other BSE Summer School programs (schedule permitting).