Offensive Language Detection in Social Media Using Machine Learning Algorithms

Published in Pontifical Catholic University of Campinas (PUC-Campinas), 2020

Deep neural networks built on general-purpose pre-trained language models, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved remarkable results in a variety of natural language processing tasks. However, little research has focused on fine-tuning these models for text classification, particularly in Portuguese. In addition, producing such models carries high computational costs (memory and processing time) and, consequently, significant energy consumption. At runtime, their large size and long inference time limit their applicability in resource-constrained environments with real-time response requirements. This project aimed to propose and analyze mechanisms that improve the accuracy of Portuguese text classification models derived from general pre-trained language representations while also increasing efficiency in both training and deployment.

Status: Completed; Nature: Research.

Citation: ARAUJO, M. P.; ADÁN COELLO, J. M. (2020). "Identificação de linguagem ofensiva em mídias sociais utilizando algoritmos de aprendizado de máquina." Pontifical Catholic University of Campinas (PUC-Campinas).
Download Paper | Download Certificate | Source repository