Uncovering the Semantics of Concepts Using GPT-4 and Other Recent Large Language Models

Open Access

Authors: Gaël Le Mens, Balázs Kovács, Michael T. Hannan, and Guillem Pros

PNAS, Vol. 120, No. 49, November 2023

We use GPT-4 to create “typicality measures” that quantify how closely a text document aligns with a given concept or category. Unlike previous methods, which required extensive training on large text datasets, the GPT-4-based measures achieve state-of-the-art correlation with human judgments without any such training. Removing the training step dramatically reduces the data needed to obtain high-performing model-based typicality measures. Our analysis spans two domains: the typicality of books in literary genres and the typicality of tweets with respect to the Democratic and Republican parties. Our results demonstrate that modern large language models (LLMs) can be used for text analysis in the social sciences beyond simple classification or labelling.
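The general idea of an LLM-based typicality measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the 0 to 100 scale, and the names `build_prompt`, `parse_rating`, `typicality`, and `pearson` are all assumptions made for illustration, and the model is passed in as a plain callable (`ask_model`) so that no particular API is implied. The sketch asks the model for a numeric rating of a text against a concept, then compares the ratings with human judgments using a Pearson correlation.

```python
import math
import re
from typing import Callable, Sequence


def build_prompt(text: str, concept: str) -> str:
    """Hypothetical prompt wording; the paper's actual prompt may differ."""
    return (
        "On a scale from 0 to 100, how typical is the following text "
        f"of the concept '{concept}'? Answer with a single number.\n\n"
        f"Text: {text}"
    )


def parse_rating(reply: str) -> float:
    """Take the first number found in the reply and clamp it to [0, 100]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric rating in reply: {reply!r}")
    return min(100.0, max(0.0, float(match.group())))


def typicality(text: str, concept: str, ask_model: Callable[[str], str]) -> float:
    """Rate one text; `ask_model` is any function mapping a prompt to a reply."""
    return parse_rating(ask_model(build_prompt(text, concept)))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between model ratings and human ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)
```

In use, `ask_model` would wrap a call to whichever LLM is available, `typicality` would be applied to each document, and `pearson` would compare the resulting ratings with the average human rating for the same documents.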

This paper originally appeared as Barcelona School of Economics Working Paper 1394.