Data Science

Statistical Machine Learning for Large and Unstructured Data

Bayesian methods and modern computational techniques for inference and analysis of large data sets.

Duration: 17.5h (5 days)
Price: €850 - €1,450
Format: Face-to-face
Language: English
Program date: July 14 - 18, 2025
Early bird deadline: April 15, 2025
Applications for 2025 Summer School programs are now open!

We introduce a combination of cutting-edge methods for analyzing large and unstructured data. The goal is for attendees to become familiar with methods at the frontier of research and applications and to learn how to apply them in practice. We will discuss the technical details of how these methods and algorithms work. We focus on problems where the primary goal is not prediction but statistically valid inference on parameters, hypotheses, or probabilistic forecasts. We emphasize Bayesian methodology as a natural probabilistic machine learning framework for such tasks. Recent advances in the statistical, Bayesian, and machine learning literature will form the core of the lectures.

We start by introducing the basics of probabilistic modeling, emphasizing how Bayesian learning allows us to build models (painlessly) describing complex dependencies while quantifying uncertainty in the unknown parameters. Then, we discuss the high-dimensional linear regression model, introducing penalized likelihood and Bayesian methods for high-dimensional regression. Rather than using these methods for prediction, we will discuss how to use them for inference. In particular, we will look at treatment effect estimation. Finally, we discuss latent discrete variable models that are helpful for analyzing unstructured data, with a main focus on text.
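As a minimal illustration of what "quantifying uncertainty in the unknown parameters" can mean, the sketch below computes the exact posterior for the mean of normally distributed data under a normal prior, then reports a 95% credible interval. It is not taken from the course materials: the function name, the observations, and the prior settings are all assumptions chosen for illustration, and Python is used only for brevity.

```python
from statistics import NormalDist, fmean


def normal_posterior(y, sigma, prior_mean, prior_sd):
    """Posterior of theta when y_i ~ N(theta, sigma^2) and theta ~ N(prior_mean, prior_sd^2).

    With a known noise level sigma, the model is conjugate, so the posterior
    is again normal and is available in closed form.
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(y) / sigma**2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior mean and sample mean.
    post_mean = (prior_precision * prior_mean + data_precision * fmean(y)) / post_precision
    post_sd = post_precision ** -0.5
    return NormalDist(post_mean, post_sd)


# Hypothetical observations and a weakly informative prior (illustrative values only).
y = [1.2, 0.8, 1.9, 1.4, 0.7]
posterior = normal_posterior(y, sigma=1.0, prior_mean=0.0, prior_sd=10.0)

# A 95% credible interval summarizes the remaining uncertainty about theta.
lower, upper = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)
print(f"posterior mean = {posterior.mean:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

The output is a full distribution for the parameter rather than a single point estimate, which is the feature that makes this framework suited to inference and not only to prediction.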

Alongside the methodologies, we will introduce modern computational methods and software that allow one to efficiently deploy the methods discussed, and we will showcase how to do so via case studies. Here, we will describe the technical details of the algorithms and some publicly available implementations. We will emphasize applications in Economics and the Social Sciences, although the ideas presented are widely applicable to other fields, and we will also present examples from disciplines such as Biomedicine. The practical session will mostly be based on R, but most techniques discussed have a Python implementation as well.

An Advanced Data Science course focused on Machine Learning

This course would be of use for the following profiles:

  • Graduate students looking to build a strong foundation in machine learning
  • Professionals who want to take a graduate-level class in data science and machine learning to explore the technical aspects of the algorithms discussed (see syllabus for more details)
  • Researchers, academics and PhD students seeking to incorporate machine learning methods into their research projects

Learn cutting-edge techniques for analyzing large, unstructured data

Participants of this course will:

  • Apply Bayesian approaches for valid inferences, not just predictions
  • Understand Bayesian modeling to describe dependencies and quantify uncertainty
  • Explore Bayesian methods for inference in high-dimensional regression models
  • Study latent variable models for analyzing unstructured data like text
  • Learn modern tools for implementing advanced analytical methods through case studies
  • Discover applications in Economics, Social Sciences, and Biomedicine, among others
  • Gain hands-on experience using R for implementing analytical techniques

Program Syllabus for Statistical Machine Learning for Large and Unstructured Data

Here is a course outline of what you will cover.

Course Outline

  • Intro to Probabilistic Modeling, Bayesian Inference, and Hierarchical Modeling
  • Computation for Probabilistic Models I: Markov chain Monte Carlo
  • Implementing Complex Bayesian Models and Computational Methods (Stan)
  • Large data: regression with many covariates I. Penalized likelihood and Bayesian approaches, primer on implementing these methods
  • Large data: regression with many covariates II. Bayesian Model Selection. Treatment effects with many variables, Double machine learning, and Bayesian views on topics
  • Linear treatment effects estimators.
  • Computation for Probabilistic Models II: Variational Inference, how it works, and how it is implemented in publicly available software
  • Unstructured data: the case of text data. Exploratory analysis and early latent variable models
  • Unstructured data: the case of text data. Latent Dirichlet allocation, some extensions, and applications of these methods in economics and related disciplines
  • Implementation of latent variable models used for unstructured data (LDA etc.)
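To give a flavor of the Markov chain Monte Carlo item in the outline, the following is a minimal sketch of a random-walk Metropolis sampler. It is an illustration only, not the course's implementation (the outline names Stan for the practical work). The function names, the target density, and the tuning values are assumptions made for this example.

```python
import math
import random
from statistics import fmean, stdev


def random_walk_metropolis(log_density, start, n_draws, step_sd, seed=0):
    """Draw from a 1-D density known up to a constant via random-walk Metropolis."""
    rng = random.Random(seed)
    x, log_p = start, log_density(start)
    draws = []
    for _ in range(n_draws):
        # Propose a symmetric Gaussian step around the current state.
        proposal = x + rng.gauss(0.0, step_sd)
        log_p_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, log_p_prop - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_prop
        # The current state is recorded whether or not the move was accepted.
        draws.append(x)
    return draws


def log_target(theta):
    """Unnormalized log density of N(3, 2^2), standing in for a posterior."""
    return -0.5 * ((theta - 3.0) / 2.0) ** 2


draws = random_walk_metropolis(log_target, start=0.0, n_draws=20_000, step_sd=2.5)
kept = draws[2_000:]  # discard the first draws as burn-in
print(f"estimated mean = {fmean(kept):.2f}, estimated sd = {stdev(kept):.2f}")
```

Because the sampler only needs the density up to a normalizing constant, the same loop applies to posteriors that have no closed form; the estimated mean and standard deviation here should land near 3 and 2, up to Monte Carlo error.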

List of References

Below is a list of references that may help you prepare for the course.

Articles and Books

  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning: with Applications in R. Springer, New York.
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D., Vehtari, A., & Rubin, D. B. (2015). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC.
  • A series of scientific papers describing recent work by leading researchers in Econometrics, Machine Learning and Statistics.

Software / Hardware

  • The course is based on Jupyter Notebooks that can be run in Google Colab
  • The practical session will mostly be based on R, but some techniques discussed also have a Python implementation
  • Students are required to have their own laptop or desktop computer

Why join our Summer School?

All BSE Summer courses are taught to the same high standard as our Master’s programs. Join us to:

  1. Network with like-minded peers
  2. Study in vibrant Barcelona
  3. Learn from world-renowned faculty

Admissions and Requirements

It is the participant’s responsibility to ensure they meet the Admissions criteria.

Program date: July 14 - 18, 2025
Early bird deadline: April 15, 2025

Requirements

Summer School applicants normally demonstrate one or more of the following:

  • A strong background in Economics or a field closely related to the course topic (Statistics, Law, etc.)
  • Postgraduate degree or current Master’s/PhD studies related to the course topic
  • Relevant professional experience

Requirements for this Statistical Machine Learning course

  • Basic knowledge of linear algebra, programming, Python/R for data analysis, and statistical learning is required
  • Students who do not have significant programming experience will be admitted provided that they attend the 8h “Coding Bootcamp”
  • While a graduate-level background in Statistics, Machine Learning, or Data Science is not mandatory to attend the course, it is highly desirable. Participants with limited experience in these fields are encouraged to register for the “Foundations of Data Science” course

Schedule

Here is the schedule for this edition of the BSE Data Science Summer School Statistical Machine Learning course.

Days: Mon 14, Tue 15, Wed 16, Thu 17, Fri 18
09:00 - 11:00: Lecture
14:30 - 16:00: Practical

Credit Transfers (ECTS)

To be eligible for credit transfer, students must complete a final project.

Students will deliver a short final project one week after the summer school finishes. It will consist of solving a final problem that covers the practical and empirical issues worked on in class.

Consult the Summer School Admissions page for more information about this option.

Certificate of Attendance

Participants who attend more than 80% of the course will receive a Certificate of Attendance, free of charge.

Fees

Multiple course discounts are available; see more information about available discounts. Fees for courses in other Summer School programs may vary.

| Course | Modality | Total Hours | ECTS | Regular Fee | Reduced Fee* |
|---|---|---|---|---|---|
| Coding Bootcamp in Python and R (8h day) | Online | 8 | 0 | 600€ | 350€ |
| Foundations of Data Science | Face-to-face | 17.5 | 1 | 1,450€ | 850€ |
| Harnessing Language Models: Your Path to NLP Expert | Face-to-face | 17.5 | 1 | 1,450€ | 850€ |
| Statistical Machine Learning for Large and Unstructured Data | Face-to-face | 17.5 | 1 | 1,450€ | 850€ |

FAQ

Need more information? Check out our most commonly asked questions or contact our Admissions Team.

Can I see the full Summer School calendar?

You can view the full Summer School calendar here.

Is accommodation included in the course fee?

Accommodation is not included in the course fee. Participants are responsible for finding accommodation.

Are the sessions recorded?

Sessions will NOT be recorded; however, the materials provided by the professor will be available for a month after the course has finished.

How much does each Summer School course cost?

Fees for each course may vary. Please consult each course page for accurate information.

Are there any discounts available?

Yes, BSE offers a variety of discounts on its Summer School courses. See more information about available discounts or request a personalized discount quote by email.

Can I take more than one course?

Yes! You can combine any of the Summer School courses (schedule permitting). See the full course calendar.

Cancelation and Refund Policy

Please consult BSE Summer School policies for more information.

Are there any evening activities during the course?

Yes, a social dinner is held once a week for all participants, and it is free to attend.

Contact our Admissions Team

Related Courses

  • Harnessing Language Models: Your Path to NLP Expert (Data Science): July 7 - 11, 2025
  • Foundations of Data Science (Data Science): June 30 - July 4, 2025
  • Coding Bootcamp in Python and R (Data Science): June 29, 2025
© Barcelona Graduate School of Economics. All rights reserved.