Deep Learning and Large Language Models

Riccardo Cantini’s research in deep learning explores the potential of Transformer-based Large Language Models (LLMs), such as BERT and GPT, showcasing their versatility across diverse domains. Sustainability is a central theme in this research area, emphasizing green awareness and promoting the efficient, fair, and trustworthy use of LLMs.

LLM Applications

Riccardo Cantini’s research in this field introduces several contributions, with a particular emphasis on social media contexts and real-world use cases. In the hashtag recommendation field, a novel BERT based methodology was developed, namely HASHET (HAshtag recommendation using Sentence-to-Hashtag Embedding Translation). It employs dual latent spaces to embed the textual content of posts and their corresponding hashtags, learning to map semantic features of text to latent hashtag representations for a more accurate and scalable recommendation process. LLMs were also employed to tackle the spread of false information in online social media discourse. Specifically, a novel methodology was designed, termed TM-FID (Topic-oriented Multimodal False Information Detection), which combines false information detection with neural topic modeling within a semi-supervised multimodal framework. By jointly leveraging textual and visual information in online news, TM-FID provides insights into how false information influences specific discussion topics, enabling a comprehensive and fine-grained understanding of its spread and impact on social media conversations. LLMs were also leveraged in automated reporting from social posts, with applications ranging from analyzing COVID-related discussions to enhancing emergency management through disaster monitoring. To ensure high-quality outputs, LLMs are enhanced with information across dimensions such as sentiment, emotion, and topic, extracted from user-generated content through fine-tuning and in-context learning approaches.

The potential of LLMs has also been explored in the healthcare domain, investigating how explainable AI (XAI) models can present technical results in a human-readable format through seamless integration with generative AI, improving accessibility for healthcare professionals. This approach was applied in two key areas: interpretable breast cancer grade prediction using genomic data and explainable depression detection from social media posts. In the former, explanations are achieved using interpretable-by-design machine learning models based on randomized trees and ensemble distillation. For the latter, explanations are generated using BERT-XDD (BERT-based eXplainable Depression Detection), a novel self-explainable model that integrates BERTweet with a bi-directional Long Short-Term Memory (LSTM) network, producing classifications and explanations via masked attention.

Green-Aware and Sustainable AI

In this context, techniques for efficient fine-tuning are combined with curriculum learning, meta-learning, and knowledge distillation, enabling effective learning from limited data and scarce computing resources. Innovative methodologies have been proposed to enhance cross-architecture knowledge distillation from LLMs to more compute-efficient neural architectures. Notably, DiXtill (XAI-driven Knowledge Distillation) introduces a novel approach to distilling explainable knowledge from a teacher LLM into a lightweight, self- explainable student model. DiXtill combines local explanations with traditional prediction-based supervision, yielding higher accuracy and interpretability compared to standard distillation techniques. Furthermore, it achieves superior compression ratios and speedups compared to approaches like post-training quantization and attention head pruning, enabling efficient deployment on resource-constrained devices and advancing sustainable edge AI applications. Expanding upon online knowledge distillation, a parameter-efficient meta-distillation approach was developed, integrating the effectiveness of the learning-to-teach paradigm with the efficiency of parameter-efficient fine-tuning (PeFT) via low-rank adaptation (LoRA) within a second-order learning framework. LoRA-based PeFT was also used to efficiently fine-tune a set of LLMs collaboratively, introducing a novel knowledge-sharing approach based on transfer learning and transferability estimation, leading to faster convergence and reduced computational costs.

Beyond computational efficiency, Cantini’s research also addresses social sustainability by examining biases and stereotypes in responses generated by LLMs across various scales. Through in-depth adversarial analysis, hidden vulnerabilities in LLMs are probed, quantifying the effectiveness of different jailbreak attacks and exploring how model size influences filtering mechanisms and overall safety. By thoroughly characterizing LLM behavior under bias elicitation in terms of fairness and robustness, this analysis provides critical insights to promote the ethical design of AI systems. Similarly aimed at ensuring fairness in LLM output, retrieval-augmented generation (RAG) systems are explored, combining agentic designs with real-time retrieval and textual entailment to enhance the accuracy, fairness, and reliability of LLM responses. Finally, sustainability-related research encompasses the integration of green awareness into neural architecture search techniques and the development of interpretable energy estimation models for edge AI applications.