Over the last few years, machine learning techniques have become increasingly popular and widely used in both academia and industry.
The BSE Data Science Summer School provides an overview of the state-of-the-art tools employed in machine learning.
"Foundations of Data Science" is an introductory course that provides a broad overview of the main methodologies used to analyze data in data science. "Harnessing Language Models: Your Path to NLP Expert" and "Statistical Machine Learning for Large and Unstructured Data" are advanced courses focusing on more specialized topics.
The teaching style of the courses is hands-on. Special attention will be devoted to applying the methodologies introduced in class to empirical data using Jupyter notebooks coded in either Python or R.
Overall, the courses of the BSE Summer School in Data Science give participants the tools to apply modern machine learning techniques, from data exploration to building predictive models and extracting insights.
Course list for 2024
June 24, 2024 (Face-to-face)
- Coding Bootcamp in Python and R
Instructor: Déborah Sulem (UPF and BSE)
This course is held in an intensive one-day format.
Week of June 25 - June 29, 2024 (Face-to-face)
- Foundations of Data Science
Instructor: André B.M. Souza (ESADE)
Week of July 1 - 5, 2024 (Face-to-face)
- Harnessing Language Models: Your Path to NLP Expert
Instructors: Hannes Mueller (IAE-CSIC and BSE) and Arnault Gombert (Freelance NLP Data Scientist)
Week of July 8 - 12, 2024 (Face-to-face)
- Statistical Machine Learning for Large and Unstructured Data
Instructor: Lorenzo Cappello (UPF and BSE)
Program director
Apply to Summer School 2024
There is no fee to apply. Submit your application online in a few easy steps!
Apply to Summer School courses
10% Early-bird discount deadline: April 14, 2023
Last day to apply: June 1, 2023
Fees and discounts
Fees vary by course. You may be eligible for one or more available Summer School discounts. Our staff can provide a personalized quote for you.
Request a quote or get more information
Contact our Summer School Team
Let us design a course for your employees at any time of year.
Coding Bootcamp in Python and R
Course Overview
This is an 8-hour course that provides an introduction to programming with Python and R for data analysis and machine learning. The course is taught with a hands-on approach via Jupyter notebooks.
This course has been specifically designed to provide you with the coding skills that are essential for training in data science. If you are interested in mastering this area and learning the tools to apply modern machine learning techniques, from data exploration to building predictive models and extracting insights, this course is the ideal preparation to get started.
Get professional training in the most popular programming languages for modern data analysis and machine learning
You will also get the required preparation for the other courses in our Data Science Summer School, such as Foundations of Data Science.
After successful completion of this course you will have:
- Learned the two main languages used in data science: Python and R
- Learned how to use state-of-the-art packages in Python and R for data manipulation
- Gained practical experience in manipulating data in Python and R
Get up to speed on the latest developments in data science in a short time
This is an interactive course where you will be encouraged to engage with the material, experiment, and interact directly with the lecturer.
Prerequisites
Although not mandatory, some knowledge of Python, Jupyter notebooks, and matrix algebra is recommended. Students are encouraged to install R and Python on their own laptop or desktop computer. This is also not mandatory, since the course is based on Jupyter notebooks that can be run in Google Colab.
Course Outline
The course is organized into the following thematic units:
1. Programming with Python
- Running Python with Jupyter
- Basic variables: Numbers and strings
- Basic operations
- Loops, control flow
- Data structures: Lists, maps, reductions
- Functions and classes
- Inputs and outputs
- String manipulation
- Date/time manipulation
- Special packages: pandas and NumPy
2. Programming with R
- Running R with Jupyter
- Basic variables: Numbers and strings
- Basic operations
- Loops, control flow
- Data structures: Lists, vectors, matrices, data frames
- Functions
- Inputs and outputs
- String manipulation
- Date/time manipulation
- Basic data visualization
- Special packages: ggplot2, dplyr
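As a rough illustration of the kind of material listed under the Python unit (control flow, data structures, reductions, functions, string and date/time manipulation), a minimal sketch could look like the following. It is not taken from the course notebooks; the `records` data and the `mean` helper are hypothetical and use only the Python standard library.

```python
from datetime import date, timedelta
from functools import reduce


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return reduce(lambda acc, x: acc + x, values) / len(values)


# Hypothetical toy records: (city, ISO date string, temperature in Celsius).
records = [
    ("Barcelona", "2024-06-24", 27.0),
    ("Barcelona", "2024-06-25", 29.0),
    ("Madrid", "2024-06-24", 33.0),
]

# Control flow and data structures: group temperatures by city.
by_city = {}
for city, _, temp in records:
    by_city.setdefault(city, []).append(temp)

# Map and reduction: average temperature per city.
averages = {city: mean(temps) for city, temps in by_city.items()}

# String manipulation: build a one-line report, sorted by city name.
report = ", ".join(
    f"{city.upper()}: {avg:.1f}" for city, avg in sorted(averages.items())
)

# Date/time manipulation: parse a date and shift it by one week.
start = date.fromisoformat(records[0][1])
next_week = start + timedelta(days=7)

print(report)                 # BARCELONA: 28.0, MADRID: 33.0
print(next_week.isoformat())  # 2024-07-01
```

In practice, the same grouping and averaging would usually be done with pandas, which the outline lists among the special packages.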
References
An Introduction to Statistical Learning: with Applications in R. G. James, D. Witten, T. Hastie, and R. Tibshirani. Springer Texts in Statistics, Springer, New York, NY, USA (2021).
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructor
Déborah Sulem is a post-doctoral researcher at the Barcelona School of Economics and Universitat Pompeu Fabra, in the Statistics group, working with Prof. David Rossell and Prof. Gabor Lugosi.
Her research interests lie around networks, point processes, Bayesian inference, and high-dimensional and nonparametric statistics. She works on different problems from applied probability to machine learning, and recently became interested in network archaeology, tree algorithms, and graphical models.
She is also interested in the interpretability, robustness, and fairness of statistical and machine learning methods, and has notably worked with counterfactual and Bayesian analysis.
Déborah Sulem
Post-doctoral Researcher at UPF and BSE
Foundations of Data Science
Course Overview
"Foundations of Data Science" is a hands-on course that exposes participants to state-of-the-art tools employed in data science. The course is composed of three units that guide participants through the process of converting raw data into actionable insights:
- Data handling and visualization
- Supervised learning: the course covers some of the most relevant supervised learning tools, ranging from linear models such as LASSO, Ridge and Elastic Net to nonlinear models, such as Decision Trees, Random Forests and Boosting.
- Unsupervised learning: participants are introduced to the main concepts and tools for dealing with unsupervised learning problems, such as clustering algorithms and Principal Components Analysis.
Classes consist of a rigorous treatment of the topics and practical sessions in which participants learn how to apply these techniques to real data sets. Built around Jupyter Notebooks, the course offers participants the opportunity to improve their programming skills in both Python and R, the most widely used programming languages in data science.
After completing this course, you will:
- Be able to work with and extract valuable insights from real data
- Have improved programming skills in two of the most used programming languages in data science
- Understand some of the key methods used by data scientists, as well as their limitations
- Have gained practical experience in applying these methods to large and heterogeneous data
- Have learned to work with large datasets
Prerequisites
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra and programming with Python and R.
Students who do not have significant programming experience will be admitted provided that they attend the 8-hour "Coding Bootcamp in Python and R" that will be held prior to the "Foundations of Data Science" course.
You can check your own skill level by downloading the following script. This script is not part of the application process, and it will not be reviewed by the BSE admissions team or the instructors.
Course Outline
The course lasts 5 days. At the end of each lecture, students will be given projects to complete on a voluntary basis. The projects will be reviewed in the following lecture.
The course is organized around the following units:
1. Data visualization
- Elements of data visualization
- Exploration plots: scatter, lines, barplots, boxplots
- Advanced plots: correlation, regression, biplots
- Special plots
- Reporting using visualization
Keywords: seaborn, plotly
2. Data Handling
- Handling missing data: imputation methods
- Feature transformation and engineering: normalization, dimensionality reduction, category encoding
Keywords: sklearn
3. Supervised learning
3.1. Linear models for regression
- Linear models and non-linear feature maps
- Model evaluation
- Bias-Variance tradeoff
- Penalized regression
- Cross validation and model selection
3.2. Linear models for classification
- Logistic regression
- Misclassification, ROC, AUC
- Class imbalance
3.3. Nonlinear models: decision trees
- Decision trees
- Variable selection
- Random Forests
- Bagging and boosting
Keywords: sklearn
4. Unsupervised learning
- Clustering
- Principal components
Keywords: clustering, principal components, spectral clustering
5. Data handling, Supervised and Unsupervised Learning with R.
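To make the "Misclassification, ROC, AUC" item in unit 3.2 more concrete, the sketch below computes both quantities by hand on a toy example. It is an illustration only, not course material: the function names and the data are hypothetical, and in the course itself these metrics would more likely come from sklearn. The AUC is computed as the share of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def misclassification_rate(labels, scores, threshold=0.5):
    """Share of examples whose thresholded score disagrees with the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    errors = sum(1 for y, p in zip(labels, predictions) if y != p)
    return errors / len(labels)


# Hypothetical toy data: true labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(labels, scores))                 # 0.75
print(misclassification_rate(labels, scores))  # 0.25
```

The misclassification rate depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds, which is one reason both are reported when classes are imbalanced.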
References
The Elements of Statistical Learning. T. Hastie, R. Tibshirani, and J. Friedman. Springer Series in Statistics, Springer, New York, NY, USA (2001).
An Introduction to Statistical Learning: with Applications in R. G. James, D. Witten, T. Hastie, and R. Tibshirani. Springer Texts in Statistics, Springer, New York, NY, USA (2021).
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructor
The course will be delivered by André B.M. Souza, Assistant Professor in the Department of Economics, Finance and Accounting at ESADE Business School. His research interests include forecasting, financial econometrics, machine learning, empirical finance and empirical macroeconomics. He has taught several econometrics and statistics courses for the Master's programs at the BSE.
André B.M. Souza
Assistant Professor, ESADE
Harnessing Language Models: Your Path to NLP Expert
Course Overview
In an era marked by rapid technological advancements and a growing reliance on digital communication, the role of Natural Language Processing (NLP) and Large Language Models (LLMs) is becoming increasingly pivotal in shaping our future work environments. LLMs have revolutionized the way we interact with, analyze, and derive insights from vast amounts of text data. In this context, mastering the art of working with these models is not just a skill; it's a strategic advantage that can elevate the productivity of an organization and transform the way we work.
NLP and LLM mastery redefine the digital frontier, opening doors to limitless possibilities
To embark on this journey, we will commence by unraveling the fundamentals of transformers and BERT, demystifying their inner workings to provide you with a solid foundation of how deep learning is used for NLP. As the course progresses, we will delve deeper into the world of Large Language Models (LLMs), focusing on practical applications and hands-on exercises that enable you to harness the power of LLMs.
Upon successful completion of the course, you will have acquired a comprehensive understanding of:
- NLP Methods: Understand the most important concepts of NLP and how LLMs compare to other NLP methods.
- Transformers and BERT: You will dive deep into transformer architecture, with a particular focus on BERT and its practical applications in NLP.
- Large Language Models (LLMs): Mastery of LLM fundamentals, their roles in NLP, and practical uses such as sentiment analysis, language translation, and text summarization.
- Hugging Face Tools: Proficiency in using Hugging Face's cutting-edge tools, libraries, and APIs will empower you to harness the full potential of LLMs for tasks like text generation, question-answering, and more.
- Real-World Use Cases: You will develop the capability to identify and apply NLP and LLM techniques in real-world scenarios.
- Ethical Concerns: Understanding the ethical considerations surrounding LLMs is critical in today's data-driven world.
In the digital age, NLP and LLM expertise are the keys to innovation and success
You can also expect to develop critical skills such as:
- Fine-Tuning LLMs
- Few-Shot Learning
- Prompt Engineering
- Problem Solving
The course aims to equip students with the expertise needed to stay relevant in today's data-driven landscape and thrive and lead in future work environments where LLMs will undoubtedly play a pivotal role.
Course Learning Methods
In this course, students will learn through:
- Lectures and Presentations: Engaging lectures and presentations to provide foundational knowledge of NLP concepts, LLMs, and tools.
- Hands-On Labs: Practical, hands-on labs and exercises where students can apply their knowledge by working on real-world NLP projects.
- Case Studies: In-depth case studies of NLP applications in various industries, allowing students to analyze and understand real-world scenarios.
- Real-Time Demos: Live demonstrations of NLP tools and techniques to illustrate practical implementation.
Prerequisites
Competence in Python, R, Jupyter Notebooks, and matrix algebra is recommended for optimal course comprehension. Participants with limited or no experience with programming in Python and R are encouraged to register for the "Coding Bootcamp in Python and R" course.
While a graduate-level background in Statistics, Machine Learning, or Data Science is not mandatory to attend the course, it is highly desirable. Participants with limited experience in these fields are encouraged to register for the "Foundations of Data Science" course.
Course Outline
- Class 1: "Foundations of NLP and LLMs"
An introductory overview of NLP methods, including key concepts and comparisons with other NLP techniques. Discussion of ethical concerns.
- Class 2: "Transformers, BERT, and Large Language Models"
A deep dive into transformer architecture, with a particular focus on BERT and its practical applications in NLP. Explore the fundamentals and practical roles of Large Language Models (LLMs).
- Class 3: "Few-Shot Learning and Hugging Face Tools"
Learn how to efficiently apply few-shot learning techniques for fast problem-solving in NLP. Gain proficiency in using Hugging Face's tools, libraries, and APIs for tasks related to Large Language Models (LLMs).
- Class 4: "Prompt Engineering for LLMs"
Develop expertise in crafting precise prompts that effectively extract desired information from Large Language Models (LLMs).
- Class 5: "Fine-Tuning LLMs and Adaptation"
Master the art of customizing pre-trained Large Language Models (LLMs) for specific tasks and domains. Learn how to adapt these models to your unique data and requirements.
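As a small illustration of the few-shot learning and prompt engineering topics in Classes 3 and 4, the sketch below assembles a few-shot prompt for a sentiment task as plain text. It is an assumption about what such an exercise might look like, not the course's own material: the `build_few_shot_prompt` helper, the "Text:" / "Sentiment:" template, and the examples are all hypothetical, and no model is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, labeled examples, then the query."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final block leaves the label empty for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical labeled examples for a sentiment task.
examples = [
    ("The lectures were clear and well paced.", "positive"),
    ("The room was too cold and noisy.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    examples,
    "I learned a lot from the practical sessions.",
)
print(prompt)
```

The resulting string could then be passed to any text-generation model; the number and choice of examples is typically what one varies when experimenting with few-shot prompts.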
References
- Vaswani et al. (2017). Attention Is All You Need
- Devlin et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Alammar, Jay (2018). The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
- Alammar, Jay (2018). The Illustrated Transformer
- Wolf et al. (2019). HuggingFace's Transformers: State-of-the-art Natural Language Processing
- Sun et al. (2019). How to Fine-Tune BERT for Text Classification?
- Brown et al. (2020). Language Models are Few-Shot Learners
- Gao, Tianyu (2021). Prompting: Better Ways of Using Language Models for NLP Tasks
- Schick, Timo and Schütze, Hinrich (2021). Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference
- Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- Strubell et al. (2019). Energy and Policy Considerations for Deep Learning in NLP
- Dodge et al. (2022). Measuring the Carbon Intensity of AI in Cloud Instances
- Sheng et al. (2019). The Woman Worked as a Babysitter: On Biases in Language Generation
Software / Hardware
Students are required to have their own laptop or desktop computer and a stable Internet connection to fully engage in and benefit from the course.
About the Instructors
Hannes Mueller is a tenured researcher at the Institute for Economic Analysis (IAE-CSIC) and an Associate Research Professor at the BSE. He is the director of the Data Science for Decision Making Program and head of department at the IAE-CSIC. He is also affiliated to MOVE and the CEPR.
Prof. Mueller's fields of interest are Machine Learning, Political Economy, Development Economics and Conflict Studies with a particular focus on the effect of violent conflict on the economy. In his most recent projects Prof. Mueller adopts supervised and unsupervised machine learning techniques to forecast and nowcast violence using large archives of newspaper articles and satellite images.
He has published in leading journals in Economics and Political Science such as the American Economic Review (AER), the American Political Science Review (APSR), the Journal of the European Economic Association (JEEA) and the American Economic Journal: Macroeconomics (AEJ: Macro).
Professor Mueller is involved in research projects targeting public policy problems. He has contributed reports for the World Bank and International Monetary Fund on fragility and the economic effects of conflict, for a joint UN/World Bank study on conflict prevention and for the UN Economic Commission for Africa on structural change in Northern Africa. He is currently involved in several projects with the Banco de España developing techniques for nowcasting and forecasting economic conditions and political risks with text. He has also conducted capacity building missions for staff in international organizations, private firms and governments.
Arnault Gombert is a senior data scientist and head of the data sciences department at Citibeats. He is in charge of operational research at Citibeats, focusing mainly on Natural Language Processing algorithms applied to social network data. He works on detecting users' demographic characteristics from posts and/or profiles in order to sample representative segments of the population from social media. He also works on semi-supervised learning methods to adapt large language models. He has also worked as a freelancer in Natural Language Processing in different fields such as the food-processing industry, the teaching assistance sector, and legaltech.
He works at the border between machine learning research and applications, developing solutions for start-ups that are inspired by state-of-the-art research. One main concern is making machine learning useful for all rather than restricting it to a small part of the population. A second main concern is bringing an environmental impact perspective to the data science field, so that models are not judged by performance metrics alone.
Arnault studied at ENSAE and holds an engineering diploma in statistics and economics. He also holds a research master's degree in quantitative economics from University of Paris Saclay, HEC, École Polytechnique and ENSAE.
Statistical Machine Learning for Large and Unstructured Data
Course Overview
We introduce a combination of cutting-edge data analysis methods for analyzing large and unstructured data. The goal is that attendees become familiar with data analysis methods lying at the frontier of research and applications and how to apply them in practice. We focus on problems where one wishes to go beyond point prediction and wants to obtain statistically valid inferences on parameters, hypotheses, or probabilistic forecasts. We emphasize Bayesian methodology as a natural probabilistic machine learning framework for such tasks. Recent advances in the statistical, Bayesian, and machine learning literature will be the core of the lectures.
We start by discussing topics in high-dimensional statistics, focusing on penalized likelihood and Bayesian methods for high-dimensional regression. Then, we discuss how to adapt them for treatment effect estimation in causal inference. Third, we discuss probabilistic machine learning methods such as non-parametric regression and probabilistic forecasts. Alternative ways to quantify uncertainty, such as conformal inference, will also be touched upon. Finally, we discuss latent discrete variable models that are helpful in analyzing unstructured data such as count-based outcomes or text. We also introduce modern computational methods and programming software that allow one to efficiently deploy the methods discussed, and we showcase how to do so via case studies. We will emphasize applications in Economics and the Social Sciences, although the presented ideas are widely applicable to other fields, and we also present examples from disciplines such as Biomedicine.
Prerequisites
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra, programming, and data analysis with Python and R, as well as fundamental concepts in statistical learning. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Course Outline
The program combines a series of lectures by leading researchers in the development of Bayesian methods and their applications and practical sessions with hands-on data analysis. The course is 20 hours, split into ten sessions of 2 hours each. Theory and practice will alternate within each session, but two practical sessions focus on hands-on data analysis.
- Regression with many covariates I. Penalized likelihood and the Bayesian approach to model selection and averaging.
- Regression with many covariates II. Computation basics (Laplace approximations, Markov Chain Monte Carlo).
- Causal inference. Treatment effects with many variables, Double machine learning, Bayesian views on topics.
- Practice Session. Regression with many covariates and linear treatment effects
- Probabilistic machine learning. Non-parametric regression. Probabilistic forecasts, conformal inference.
- Practice Session. Probabilistic machine learning
- Latent Variable Models. Principal components analysis, mixture and mixed-membership models, latent Dirichlet allocation
- Scalable Bayesian inference. Hamiltonian Monte Carlo, Black Box Variational Inference.
- Applications. Overview of latent variable models applied to problems in economics and related disciplines
- Practice Session. Implementation of scalable Bayesian inference methods.
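Conformal inference, listed in the probabilistic machine learning session, can be illustrated with a short split conformal sketch. The version below is a minimal example under stated assumptions, not the course's implementation: the data are hypothetical, the point predictor is a simple least squares line, and the helper names `fit_line` and `conformal_half_width` are introduced only for illustration. The interval half-width is the ceil((n + 1)(1 - alpha))-th smallest absolute residual on a held-out calibration set.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def conformal_half_width(residuals, alpha):
    """Conformal quantile of absolute calibration residuals.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual, the usual
    finite-sample correction in split conformal prediction.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[rank - 1]


# Hypothetical toy data, split into a training part and a calibration part.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9]
calib_x, calib_y = [0.5, 1.5, 2.5, 3.5], [0.7, 1.4, 2.2, 3.8]

# Fit the point predictor on the training part only.
intercept, slope = fit_line(train_x, train_y)

# Absolute residuals on the calibration part act as conformity scores.
residuals = [abs(y - (intercept + slope * x)) for x, y in zip(calib_x, calib_y)]
half_width = conformal_half_width(residuals, alpha=0.25)

# Prediction interval for a new point with nominal coverage 1 - alpha.
x_new = 2.0
prediction = intercept + slope * x_new
print((prediction - half_width, prediction + half_width))
```

The coverage guarantee relies on the calibration and new observations being exchangeable; it does not depend on the fitted line being a correct model.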
References
- The Elements of Statistical Learning. T. Hastie, R. Tibshirani, and J. Friedman. Springer Series in Statistics, Springer, New York, NY, USA (2001).
- An Introduction to Statistical Learning: with Applications in R. G. James, D. Witten, T. Hastie, and R. Tibshirani. Springer Texts in Statistics, Springer, New York, NY, USA (2021).
Software / Hardware
The course is based on Jupyter Notebooks that can be run in Google Colab. A laptop is required.
About the Instructors
Lorenzo Cappello is an assistant professor in the Statistics group at Universitat Pompeu Fabra and an affiliated professor at the Barcelona School of Economics. He was previously a postdoctoral scholar in the Department of Statistics at Stanford University, mentored by Prof. Julia A. Palacios. He earned his PhD in Statistics in 2018 from the Department of Decision Sciences at Bocconi University, under the supervision of Prof. Stephen G. Walker and Prof. Sonia Petrone.
Stephen Hansen, PhD, London School of Economics, is an Associate Professor of Economics at Imperial College Business School. His research focuses on organizational economics and monetary policy, relying on unstructured data sources and machine learning methods to address questions in these areas.
Who will benefit from this program?
This course is specifically aimed at:
- Students: graduate students looking to build a strong foundation in machine learning.
- Professionals: professionals and consultants who want to transition into data science and machine learning roles and are interested in expanding their machine learning skills for predictive modeling and data-driven decision-making.
- Researchers: Researchers, academics, and PhD students seeking to incorporate machine learning methods into their research projects
Requirements for Foundations course
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra and programming with Python and R. Students who do not have significant programming experience will be admitted provided that they attend the 8-hour "Coding Bootcamp in Python and R" that will be held prior to the "Foundations of Data Science" course.
Requirements for the other courses
Besides the basic entry requirements, the participants of these courses are expected to be familiar with the fundamental concepts in statistical learning and programming with Python and R. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Credit transfers (ECTS)
Students will deliver a short final project one week after the summer school finishes. It will consist of solving a final problem that includes the practical and empirical issues worked on in class.
Consult the Credit Transfer page for more information about this option.
Certificate of attendance
Participants not interested in credit transfer will instead receive a Certificate of Attendance, stating the courses and number of hours completed. These students will be neither evaluated nor graded. There is no fee for the certificate.
Fees
Multiple course discounts are available. Fees for courses in other Summer School programs may vary.
Course | Modality | Hours | ECTS | Regular Fee | Reduced Fee* |
---|---|---|---|---|---|
Coding Bootcamp in Python and R | Face-to-face | 8 | 0 | 600€ | 350€ |
Foundations of Data Science | Face-to-face | 17.5 | 1 | 1500€ | 875€ |
Harnessing Language Models: Your Path to NLP Expert | Face-to-face | 17.5 | 1 | 1500€ | 875€ |
Statistical Machine Learning for Large and Unstructured Data | Face-to-face | 17.5 | 1 | 1500€ | 875€ |
* Reduced Fee applies for PhD or Master's students, Alumni of BSE Master's programs, and participants who are unemployed.
** Flexible cancelation policy: view the BSE Summer School Policies
See more information about available discounts or request a personalized discount quote by email.
Day / Time | Mon* | Tue | Wed | Thu | Fri | Sat |
---|---|---|---|---|---|---|
9:00 - 11:00 | Coding Bootcamp in Python and R** | Foundations of Data Science (Lecture)*** | ||||
13:00 - 14:00 | ||||||
14:30 - 16:00 | | Foundations of Data Science (Practical) | ||||
16:00 - 18:00 | ||||||
*This week starts on Monday with the one-day course "Coding Bootcamp in Python and R"
**There will be a lunch break from 13:00 - 14:00 on Monday
***This course requires participants to bring their own portable laptop
Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Harnessing Language Models: Your Path to NLP Expert (Lecture)* | ||||
14:30 - 16:00 | Harnessing Language Models: Your Path to NLP Expert (Practical)* |
*This course requires participants to bring their own portable laptop
Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Statistical Machine Learning for Large and Unstructured Data (Lecture)* | ||||
14:30 - 16:00 | Statistical Machine Learning for Large and Unstructured Data (Practical) |
*This course requires participants to bring their own portable laptop
Mix and match your summer courses!
Remember that you can combine Data Science courses with courses in any of the other BSE Summer School programs (schedule permitting).