Multi-dimensional Classification on Social Media Data for Detailed Reporting with Large Language Models

Abstract

Every day, more and more people use social media platforms to express their thoughts, share information and personal experiences, and engage with others. This knowledge can be transformed into informative reports with the assistance of Large Language Models (LLMs), such as ChatGPT, which leverage deep learning techniques to analyze data and generate comprehensive analyses. Classifying user-generated posts along dimensions such as topic, sentiment, and emotion makes it possible to create more detailed reports, because the large amounts of data collected can be condensed separately along each dimension considered. To tackle this challenge, we have developed an automated approach with two primary goals: (i) categorizing posts across different dimensions using ready-to-use and fine-tuned classifiers; and (ii) generating detailed reports via LLMs that summarize posts with similar characteristics along the defined dimensions. In our analysis, we examined a large and varied set of posts about COVID, classifying them along several dimensions, including topic, content type, expressed sentiment and emotions, and reliability of information. Specifically, by generating one report for each main discussion topic in the dataset, such as allergic reactions or school issues, and using the remaining dimensions to classify the posts within each topic, we created highly detailed and informative reports with ChatGPT. These reports outperformed those generated directly by ChatGPT on both quantitative measures, such as linguistic scores, and qualitative evaluations by field experts.
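
The two-step approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the example posts, their labels, the DIMENSIONS tuple, and the helper names group_by_topic, summarize_dimensions, and build_prompt are all hypothetical, and the labels are assumed to have been produced already by the classifiers. The sketch only groups labeled posts by topic and assembles one prompt per topic; sending the prompt to an LLM is left out.

```python
from collections import Counter, defaultdict

# Hypothetical posts with labels assumed to come from upstream classifiers.
posts = [
    {"text": "Got a rash after my second dose.",
     "topic": "allergic reactions", "sentiment": "negative", "emotion": "fear"},
    {"text": "Mild itching, gone in a day.",
     "topic": "allergic reactions", "sentiment": "neutral", "emotion": "relief"},
    {"text": "Are schools reopening too early?",
     "topic": "school issues", "sentiment": "negative", "emotion": "anger"},
]

# Dimensions (other than topic) used to characterize posts within a topic.
# This choice is an assumption made for the example.
DIMENSIONS = ("sentiment", "emotion")


def group_by_topic(posts):
    """Collect posts under their topic label."""
    groups = defaultdict(list)
    for post in posts:
        groups[post["topic"]].append(post)
    return dict(groups)


def summarize_dimensions(group):
    """Count how often each label occurs along every dimension."""
    return {dim: Counter(post[dim] for post in group) for dim in DIMENSIONS}


def build_prompt(topic, group):
    """Assemble a plain-text prompt for one topic report."""
    stats = summarize_dimensions(group)
    lines = [f"Write a report on the topic '{topic}' based on {len(group)} posts."]
    for dim in DIMENSIONS:
        dist = ", ".join(f"{label}: {n}" for label, n in stats[dim].most_common())
        lines.append(f"{dim.capitalize()} distribution: {dist}")
    lines.append("Posts:")
    lines.extend(f"- {post['text']}" for post in group)
    return "\n".join(lines)


# One prompt per topic, ready to be passed to an LLM of choice.
prompts = {
    topic: build_prompt(topic, group)
    for topic, group in group_by_topic(posts).items()
}
```

Each entry in prompts pairs a topic with a prompt that carries both the label distributions and the post texts, so the LLM summarizes posts that share a topic while being told how they spread across the other dimensions.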

Publication
International Conference on Artificial Intelligence Applications and Innovations (AIAI2024), June 2024, pp. 100-114