Over the last few years, machine learning techniques have become increasingly popular and are now widely used in both academia and industry.
The BSE Data Science Summer School provides an overview of the state-of-the-art tools employed in machine learning.
"Foundations of Data Science" is an introductory course that provides a broad overview of the main methodologies used to analyze data in data science. "Using Text as Data: Methods and Applications" and "Deep Learning and Applications" are advanced courses focusing on specialized topics.
The teaching style of the courses is hands-on. Special attention will be devoted to applying the methodologies introduced in class to empirical data using Jupyter notebooks coded in either Python or R.
Overall, the courses of the BSE Summer School in Data Science give participants the tools to apply modern machine learning techniques, from data exploration to building predictive models and extracting insights.
Course list for 2022
June 26, 2022 (Face-to-face)
- Coding Bootcamp in Python and R
Instructor: Laura Battaglia
This course is held in an intensive one-day format.
Week of June 27 - July 1, 2022 (Face-to-face)
- Foundations of Data Science
Instructor: André B.M. Souza (ESADE)
Week of July 4-8, 2022 (Face-to-face)
- Using Text as Data: Methods and Applications
Instructors: Rubén Durante (ICREA-UPF and BSE) and Hannes Mueller (IAE-CSIC and BSE)
Week of July 11-15, 2022 (Face-to-face)
- Deep Learning and Applications
Instructors: Vicenç Gómez (UPF Artificial Intelligence and Machine Learning group) and Anders Jonsson (UPF Artificial Intelligence and Machine Learning group)
Apply to Summer School 2022
There is no fee to apply. Submit your application online in a few easy steps!
Apply to Summer School courses
Early-bird registration deadline: April 4, 2022
Last day to apply: June 16, 2022
Fees and discounts
Fees vary by course. You may be eligible for one or more available Summer School discounts. Our staff can provide a personalized quote for you.
Applications will open soon!
Very soon you'll be able to apply to the 2023 edition of the BSE Summer Schools.
See you in Summer 2023!
Courses for the 2023 edition of the BSE Summer Schools will be announced later this year. We look forward to meeting you here in Barcelona!
Coding Bootcamp in Python and R
Overview and Objectives
The course provides basic training in programming with Python and R for data analysis and machine learning. This is an intensive 8-hour course based on a hands-on approach using Jupyter notebooks.
Prerequisites to Enroll
Although not mandatory, some knowledge of Python, Jupyter notebooks, and matrix algebra is recommended. Students are encouraged to install R and Python on their own laptop or desktop computer. This is also not mandatory, since the course is based on Jupyter notebooks that can be run with Google Colab.
Course Outline
The course is organized around the following thematic units:
1. Programming with Python
- Running Python with Jupyter
- Basic variables: Numbers and strings
- Basic operations
- Loops, control flow
- Data structures: Lists, maps, reductions
- Functions and classes
- Inputs and outputs
- String manipulation
- Date/time manipulation
- Special packages: pandas and NumPy
2. Programming with R
- Running R with Jupyter
- Basic variables: Numbers and strings
- Basic operations
- Loops, control flow
- Data structures: Lists, vectors, matrices, data frames
- Functions
- Inputs and outputs
- String manipulation
- Date/time manipulation
- Basic data visualization
- Special packages: ggplot2, dplyr
About the instructor
Laura Battaglia is a Research Assistant in Statistics and Machine Learning at the Barcelona School of Economics. She has an Economics, Finance, and Data Science background, and extensive work experience at the European Central Bank.
Deep Learning and Applications
Overview and Objectives
In this course, we will introduce several aspects of modern machine learning, deep learning, and its applications. The learning objectives are:
- To understand the basic techniques of deep learning at an analytical and programming level.
- To be able to use some of these techniques in popular application domains.
Entry Requirements
Applicants to all Summer School programs should meet the basic entry requirements. Participants in this course are expected to be familiar with the fundamentals of linear algebra, programming, and data analysis with Python and R, as well as fundamental concepts in statistical learning. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Course Outline
Each day will consist of two parts: a morning lecture introducing the main theoretical concepts and methods, and a practical session in the afternoon:
- Day 1: An overall introduction to Deep Learning, including the basic building blocks and optimization algorithms for training deep models.
- Day 2: Computer Vision, with a focus on neural architectures such as convolutional networks, generative models, or normalizing flows, for tasks such as image classification, object detection, or image captioning.
- Day 3: Natural Language Processing, with a focus on word/document embeddings, recurrent neural networks, and large-scale pre-trained language models based on transformers.
- Day 4: Deep Learning on Graphs, with a focus on learning architectures designed for inputs structured as graphs, such as social networks, web graphs, or protein networks.
- Day 5: Deep Reinforcement Learning, with a focus on how deep learning can be used in sequential decision-making problems such as board games and video games.
About the Instructors
Vicenç Gómez is currently a tenure-track faculty member under a Ramón y Cajal fellowship in the Artificial Intelligence and Machine Learning group at UPF, which he joined with a transnational academic career grant (FP7 Marie Curie Actions) in 2014. Prior to this, he worked for more than six years as a research scientist in the machine learning group at Radboud University Nijmegen, the Netherlands. His main research interests are probabilistic inference and reinforcement learning. He works on developing novel machine learning methods derived from first principles and understanding their theoretical properties, as well as on applying them to modeling, understanding, and improving the functioning of networked systems.
Anders Jonsson is a professor in the ICT Department of UPF and the director of the Artificial Intelligence and Machine Learning group. He received his MSc in Engineering Physics from the Royal Institute of Technology in Stockholm, Sweden, and his PhD in Computer Science from the University of Massachusetts Amherst, USA. His research is mainly focused on sequential decision problems, either in the form of reinforcement learning or in the form of automatic planning. In particular, his work focuses on finding and exploiting the structure of these types of problems in order to simplify their solution. He has participated in numerous international projects.
Reading Material:
1. Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. Jeremy Howard, Sylvain Gugger. Publisher: O'Reilly Media
2. Deep Learning with PyTorch Step-by-Step: A Beginner's Guide.
3. Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron Courville, Pascal Vincent
4. Distilling the Knowledge in a Neural Network. Geoffrey Hinton, Oriol Vinyals, Jeff Dean
5. PyTorch Tutorials (PyTorch documentation, version 1.9.1+cu102)
Foundations of Data Science
Overview and Objectives
This is an intensive 20-hour hands-on course in the foundations of data science. The course provides basic training in data analysis and machine learning with Python and R. The material covered is motivated by specific data analysis questions, and each unit concludes with a project to be done by the students.
Entry Requirements
Applicants to all Summer School programs should meet the basic entry requirements. The participants of this course are expected to be familiar with the fundamentals of linear algebra and programming with Python and R.
Students who do not have significant programming experience will be admitted provided that they attend the 8-hour "Coding Bootcamp in Python and R", which is held prior to the "Foundations of Data Science" course.
You can check your own skill level by downloading the following script. This script is not part of the application process and will not be reviewed by the BSE admissions team or the instructors.
Course Outline
The course lasts 5 days. At the end of each lecture, students will be given projects to complete on a voluntary basis. The projects will be reviewed in the following lecture.
The course is organized around the following units:
1. Data visualization
- Elements of data visualization
- Exploration plots: scatter, lines, barplots, boxplots
- Advanced plots: correlation, regression, biplots
- Special plots
- Reporting using visualization
Keywords: seaborn, plotly
2. Data Handling
- Handling missing data: imputation methods
- Feature transformation and engineering: normalization, dimensionality reduction, category encoding
Keywords: sklearn
3. Supervised learning
3.1. Linear models for regression
- Linear models and non-linear feature maps
- Model evaluation
- Bias-Variance tradeoff
- Penalized regression
- Cross validation and model selection
3.2. Linear models for classification
- Logistic regression
- Misclassification, ROC, AUC
- Class imbalance
3.3. Nonlinear models: decision trees
- Decision trees
- Variable selection
- Random Forests
- Bagging and boosting
Keywords: sklearn
4. Unsupervised learning
- Clustering
- Principal components
Keywords: clustering, principal components, spectral clustering
5. Data handling, Supervised and Unsupervised Learning with R.
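To give a flavor of two topics from unit 3 (penalized regression, and cross validation for model selection), here is a minimal sketch in plain Python. It is not course material: the one-feature model without intercept, the toy data, the penalty grid, and all function names are assumptions chosen for illustration. In practice these steps would typically be done with a library such as sklearn, and the data would be shuffled before splitting.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y = b * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_mse(xs, ys, lam, k=5):
    """Average squared prediction error on held-out folds for penalty lam."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        b = ridge_slope(train_x, train_y, lam)
        errors.extend((ys[i] - b * xs[i]) ** 2 for i in held_out)
    return sum(errors) / len(errors)


def select_penalty(xs, ys, grid, k=5):
    """Pick the penalty in grid with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))


# Toy data (made up): y is roughly 2 * x with a small alternating disturbance.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
grid = [0.0, 0.1, 1.0, 10.0]

best = select_penalty(xs, ys, grid)
print("selected penalty:", best)
print("slope at selected penalty:", ridge_slope(xs, ys, best))
```

The penalty shrinks the estimated slope toward zero, and cross validation compares penalties by their error on data not used for fitting.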
About the Instructors
The course will be delivered by André B.M. Souza, Assistant Professor in the Department of Economics, Finance and Accounting at ESADE Business School. His research interests include forecasting, financial econometrics, machine learning, empirical finance, and empirical macroeconomics. He has taught several econometrics and statistics courses for the Master's programs at the BSE.
Using Text as Data: Methods and Applications
Overview and Objectives
An ever-increasing share of human communication is recorded as digital text. Analyzing and making sense of this vast amount of data is increasingly important for research in the social sciences.
This course provides an accelerated introduction to the theory and practice of text analysis by surveying methods for systematically extracting quantitative information from text, from classical content analysis and dictionary-based methods, to classification methods, scaling methods, and topic models. The course introduces the theoretical foundations for text analysis but mainly takes a practical approach, illustrating the methods through state-of-the-art applications to research questions in economics, political science, and finance.
Lectures will cover a series of case studies from economics and related fields, and will be complemented by hands-on programming sessions in Python. By the end of the course, students will be able to: i) convert vast archives of text into a format that can be used for data analysis, and ii) use the data generated from text to tackle research and policy problems relevant to their interests and organizations.
Entry Requirements
Applicants to all Summer School programs should meet the basic entry requirements. Participants in this course are expected to be familiar with the fundamentals of linear algebra, programming, and data analysis with Python and R, as well as fundamental concepts in statistical learning. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Course schedule
1. From text to data
- Introduction to documents, metadata, corpora
- Word counts, document-feature matrix
- Collocation and n-grams
2. Important notions
- TF-IDF
- Measuring text length, diversity, and complexity
- Measuring similarity between documents
3. Machine Learning
- Introduction to statistical learning theory
- Supervised and unsupervised learning
4. Statistical methods
- Dictionary-based methods
- Penalized linear models
- Dimension reduction and feature selection
- Nonlinear text regression
- Random forests
5. Generative language models: unsupervised methods
- Latent semantic analysis
- Topic models and Latent Dirichlet Allocation (LDA)
- K-means clustering
6. Word Embeddings
- Word2Vec
- Doc2Vec
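As an illustration of the first two units of the schedule (word counts, the document-feature matrix, TF-IDF, and similarity between documents), the following is a minimal sketch in plain Python. It is not taken from the course: the toy documents, the whitespace tokenizer, the idf variant log(N / df), and all function names are assumptions made for this example.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def document_feature_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def tf_idf(rows):
    """Weight raw counts by idf = log(N / df), one common variant."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_terms)]
    return [[r[j] * math.log(n_docs / df[j]) for j in range(n_terms)] for r in rows]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus (made up for illustration).
docs = [
    "The central bank raised interest rates.",
    "The bank lowered rates.",
    "Voters discussed the election.",
]
vocab, rows = document_feature_matrix(docs)
weights = tf_idf(rows)

print(vocab)
print("similarity of doc 0 and doc 1:", cosine(weights[0], weights[1]))
print("similarity of doc 0 and doc 2:", cosine(weights[0], weights[2]))
```

Under this idf variant, a term that appears in every document (here "the") receives weight zero, so documents that share only such terms have zero similarity.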
About the instructors
Ruben Durante is ICREA Research Professor at UPF. He is also Affiliated Professor of the Barcelona GSE and of the Barcelona Institute for Political Economy and Governance, Research Affiliate of the Centre for Economic Policy Research, and associate editor of the European Economic Review. He holds a Ph.D. from Brown University and master's degrees from Brown and the Sorbonne. His main area of interest is political economy, with a focus on the functioning and impact of traditional and new media in democratic societies. His work has been featured extensively in the popular press and published in top economics journals including the Journal of Political Economy, the American Economic Review, Management Science, the American Economic Journal: Applied Economics, the Journal of the European Economic Association, the Economic Journal, and the Journal of Public Economics, among others. He is the recipient of several awards and grants, including a five-year Starting Grant from the European Research Council. He has worked as a consultant for the World Bank, the Inter-American Development Bank, and the UN.
Hannes Mueller is a tenured researcher at the Institute for Economic Analysis (IAE-CSIC) and an Associate Research Professor at the BSE. His fields of interest are Political Economy, Development Economics, and Conflict Studies, with a particular focus on the effect of violent conflict on the economy. Most recently, Prof. Mueller has been working to adopt supervised and unsupervised machine learning techniques for economics and political science research. He has published in leading journals in Economics and Political Science such as the American Economic Review (AER), the American Political Science Review (APSR), the Journal of the European Economic Association (JEEA), and the American Economic Journal: Macroeconomics (AEJ: Macro). He has contributed reports for the International Growth Centre (UK government) and the World Bank on the economic effects of conflict, to a joint UN/World Bank study on conflict prevention, and for the UN Economic Commission for Africa on structural change in Northern Africa. He is currently involved in projects with the Banco de España developing techniques for nowcasting and forecasting economic conditions with text.
Main readings:
- Gentzkow, M., Kelly, B. T., and Taddy, M., 2017, Text as Data. NBER Working Paper #23276.
- Grimmer, J. and Stewart, B., 2013, Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts, Political Analysis, vol. 21, n. 3, pp. 267-297.
- Manning, C. D., Raghavan, P., and Schütze, H., 2008, An Introduction to Information Retrieval, Cambridge University Press.
- Bengfort, B., Ojeda, T., and Bilbro, R., 2018, Applied Text Analysis with Python, O'Reilly Media.
- Krippendorff, K., 2013, Content Analysis: An Introduction to Its Methodology, Sage.
Applications:
- Ash, E., Chen, D., and Naidu, S., 2018, Ideas Have Consequences: The Impact of Law and Economics on American Justice, Working Paper.
- Baker, S. R., Bloom, N., and Davis, S. J., 2016, Measuring Economic Policy Uncertainty, Quarterly Journal of Economics, vol. 131, n. 4, pp. 1593-1636.
- Bandiera, O., Prat, A., Hansen, S., and Sadun, R., 2020, CEO Behavior and Firm Performance, Journal of Political Economy, vol. 128, n. 4, pp. 1325-1369.
- Blei, D., Ng, A., and Jordan, M., 2003, Latent Dirichlet Allocation, Journal of Machine Learning Research, vol. 3, pp. 993–1022.
- Blei, D., and Lafferty, J., 2006, Dynamic Topic Models, In Proceedings of the 23rd International Conference on Machine Learning.
- Cagé, J., Hervé, N., and Viaud, M.-L., The Production of Information in an Online World, forthcoming, Review of Economic Studies.
- Garg, N., Schiebinger, L., Jurafsky, D., and Zou, J., 2018, Word embeddings quantify 100 years of gender and ethnic stereotypes, Proceedings of the National Academy of Sciences, vol. 115, n. 16, pp. 3635-3644.
- Gentzkow, M. and Shapiro, J., 2010, What Drives Media Slant?, Econometrica, vol. 78, n. 1, pp. 35-71.
- Hassan, T., Hollander, S., van Lent, L., and Tahoun, A., 2018, Firm-Level Political Risk: Measurement and Effects, Working Paper.
- Hansen, S., McMahon, M., and Prat, A., 2018, Transparency and Deliberation within the FOMC: a Computational Linguistics Approach, Quarterly Journal of Economics, vol. 133, n. 2, pp. 801-870.
- Kelly, B. T., Papanikolaou, D., Seru, A., and Taddy, M., Measuring Technological Innovation Over the Long Run, forthcoming, American Economic Review: Insights.
- Loughran, T. and McDonald, B., 2011, When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks, The Journal of Finance, vol. 66, n. 1, pp. 35-65.
- Tetlock, P. C., 2007, Giving Content to Investor Sentiment: The Role of Media in the Stock Market, Journal of Finance, vol. 62, n. 3, pp. 1139-1168.
Requirements for Foundations course
Applicants to all Summer School programs should meet the basic entry requirements. Participants in this course are expected to be familiar with the fundamentals of linear algebra and programming with Python and R. Students who do not have significant programming experience will be admitted provided that they attend the 8-hour "Coding Bootcamp in Python and R", which is held prior to the "Foundations of Data Science" course.
Requirements for the other courses
Besides meeting the basic entry requirements, participants in these courses are expected to be familiar with the fundamental concepts in statistical learning and programming with Python and R. Overall, this is the type of material covered in the "Coding Bootcamp in Python and R" and "Foundations of Data Science" courses.
Credit transfers (ECTS)
This BSE Summer School program offers participants the possibility of being assessed for the purpose of requesting official credit transfers (ECTS). There is an administrative fee of 25€ per credit.
Participants who wish to join the Summer School under this scheme will be asked to make an online request and pay the administrative fees during the standard admissions process.
Consult the Credit Transfer page for more information about this option.
Certificate of attendance
Participants not interested in credit transfer will instead receive a Certificate of Attendance, stating the courses and number of hours completed. These students will be neither evaluated nor graded. There is no fee for the certificate.
Fees
Multiple course discounts are available. Fees for courses in other Summer School programs may vary.
Course | Modality | Hours | ECTS | Regular Fee | Reduced Fee* |
---|---|---|---|---|---|
Coding Bootcamp in Python and R | Face-to-face | 8 | 0 | 525€ | 300€ |
Deep Learning and Applications | Face-to-face | 20 | 2 | 1400€ | 800€ |
Foundations of Data Science | Face-to-face | 20 | 2 | 1400€ | 800€ |
Using Text as Data: Methods and Applications | Face-to-face | 20 | 2 | 1400€ | 800€ |
* Reduced Fee applies to PhD or Master's students, alumni of BSE Master's programs, and participants who are unemployed.
See more information about available discounts or request a personalized discount quote by email.
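Day / Time | Sun* | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|---|
9:00 - 11:00 | Coding Bootcamp in Python and R | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science |
11:00 - 13:00 | Coding Bootcamp in Python and R | | | | | |
14:00 - 16:00 | Coding Bootcamp in Python and R | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science | Foundations of Data Science |
16:00 - 18:00 | Coding Bootcamp in Python and R | | | | | |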
* This week starts on Sunday with the one-day course "Coding Bootcamp in Python and R"
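Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications |
14:00 - 16:00 | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications | Using Text as Data: Methods and Applications |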
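Day / Time | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
9:00 - 11:00 | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications |
14:00 - 16:00 | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications | Deep Learning and Applications |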
Mix and match your summer courses!
Remember that you can combine Data Science courses with courses in any of the other BSE Summer School programs (schedule permitting).