Authors: Gaël Le Mens, Balázs Kovács, Michael T. Hannan and Guillem Pros
PNAS, Vol. 120, No. 49, November 2023

We use GPT-4 to create "typicality measures" that quantitatively assess how closely text documents align with a specific concept or category. Unlike previous methods, which required extensive training on large text datasets, the GPT-4-based measures achieve state-of-the-art correlation with human judgments without any such training. Because no training data is needed, this approach dramatically reduces the data requirements for obtaining high-performing model-based typicality measures. Our analysis spans two domains: the typicality of books in literary genres and the typicality of tweets with respect to the Democratic and Republican parties. Our results demonstrate that modern Large Language Models (LLMs) can be used for text analysis in the social sciences beyond simple classification or labelling.