The latest generation of Large Language Models (LLMs), such as ChatGPT, can perform state-of-the-art text analysis comparable to human judgments without the need for extensive training data, offering a cost-effective alternative for social and behavioral science research. The findings appear in a new publication in PNAS by Barcelona School of Economics Affiliated Professor Gaël Le Mens (UPF and BSE) and co-authors Balázs Kovács (Yale University), Michael T. Hannan (Stanford University), and Guillem Pros Rius (UPF).
The team's paper focuses on ChatGPT's ability to assess "perceived typicality": the degree to which a text is representative of a given concept or category (for example, how well a book's synopsis fits a literary genre, or how well a political candidate's speeches fit a political party's platform).
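In practice, a typicality measurement of this kind can be obtained by prompting the model directly. Below is a minimal sketch of the general approach, not the authors' exact protocol: it assumes the openai Python package (v1+) with an API key in the environment, and the prompt wording, the 0-100 rating scale, and the typicality_rating helper are illustrative assumptions.

    import re
    from openai import OpenAI  # assumes the openai Python package (v1+) is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def typicality_rating(text: str, category: str) -> float:
        """Ask GPT-4 to rate how typical `text` is of `category` on a 0-100 scale.

        Illustrative sketch only; the prompt and scale are assumptions,
        not the exact protocol used in the paper.
        """
        prompt = (
            f"On a scale from 0 (not at all typical) to 100 (extremely typical), "
            f"how typical is the following text of the category '{category}'? "
            f"Answer with a single number.\n\nText: {text}"
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # reduce randomness for measurement use
        )
        # Extract the first number in the model's reply
        match = re.search(r"\d+(\.\d+)?", response.choices[0].message.content)
        if match is None:
            raise ValueError("Model reply did not contain a numeric rating")
        return float(match.group())

    # Example: how typical is this synopsis of the "science fiction" genre?
    score = typicality_rating(
        "A lone astronaut drifts toward a black hole, haunted by signals from Earth.",
        "science fiction",
    )
    print(score)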
The authors found that ChatGPT's typicality measurements represent a dramatic improvement over those of previous LLMs, producing assessments that closely match human judgments. While they are optimistic about these results, they advise researchers to validate model-based measurements by providing evidence that they correlate with the judgments of human raters.
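That validation step can be as simple as computing a rank correlation between the model's ratings and human ratings of the same texts. The sketch below uses scipy's spearmanr; the paired scores are hypothetical placeholders, not data from the paper.

    from scipy.stats import spearmanr

    # Hypothetical paired typicality ratings for the same texts
    # (illustrative values only, not results from the study)
    human_scores = [72, 15, 88, 40, 95, 23, 67, 51]
    model_scores = [70, 20, 85, 35, 90, 30, 60, 55]

    # Rank correlation between human and model-based measurements
    rho, p_value = spearmanr(human_scores, model_scores)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")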
Citation
Le Mens, Gaël, Balázs Kovács, Michael T. Hannan, and Guillem Pros Rius. "Uncovering the Semantics of Concepts Using GPT-4." Proceedings of the National Academy of Sciences of the United States of America (PNAS), November 30, 2023.
An earlier draft circulated under the title "Uncovering the Semantics of Concepts Using GPT-4 and Other Recent Large Language Models" as BSE Working Paper 1394.