Cyberbullying Detection in Spanish for the Theatrical-Script Text Corpus Domain Applied to Social Networks Using Transfer Learning and Adversarial Validation
Date
2024-11-20
Publisher
Pontificia Universidad Católica del Perú
Abstract
What a cyberbullying detection model for social networks learns depends heavily on the dataset it was trained on, which can limit its ability to generalize to other datasets. This study proposes an approach based on transfer learning. A robust cyberbullying detection model was developed from theatrical scripts, which provide rich and varied contexts. To this end, a Spanish-language corpus was built from these scripts and carefully labeled by experts. The model was then trained on this corpus to establish a knowledge base, which was subsequently applied to other social media corpora. The model achieved 83% accuracy in the tests performed. We complemented the model with a validation based on adversarial examples: using data augmentation techniques, we generated additional sentences to strengthen its generalization capacity, improving its performance both on its own corpus and across different cyberbullying domains.
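The abstract does not state which augmentation operators were used to build the adversarial examples. Purely as an illustration, the Python sketch below assumes two common text perturbations: synonym substitution from a toy, hypothetical lexicon (`SYNONYMS`) and an adjacent-character swap that simulates a typo. It then compares a hypothetical keyword-based stand-in classifier (`keyword_classifier`) on clean versus perturbed copies of the same labeled sentences. All names here (`augment`, `swap_adjacent_chars`, `accuracy`, `INSULTS`) are illustrative assumptions, not the thesis's code or model.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical toy synonym table (assumption): the thesis does not say
# which lexical resource or augmentation operators were used.
SYNONYMS: Dict[str, List[str]] = {
    "tonto": ["bobo", "torpe"],
    "feo": ["horrible"],
    "odio": ["detesto"],
}


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Swap two neighbouring characters to simulate a typo."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def augment(sentence: str, n_variants: int = 3, seed: int = 0) -> List[str]:
    """Generate perturbed copies of a sentence: synonym swap plus one typo."""
    rng = random.Random(seed)  # fixed seed so the augmentation is reproducible
    tokens = sentence.lower().split()
    if not tokens:
        return []
    variants = []
    for _ in range(n_variants):
        # Replace every word found in the toy table with a random synonym.
        new_tokens = [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]
        # Inject a character-level typo into one randomly chosen word.
        j = rng.randrange(len(new_tokens))
        new_tokens[j] = swap_adjacent_chars(new_tokens[j], rng)
        variants.append(" ".join(new_tokens))
    return variants


def accuracy(predict: Callable[[str], int], texts: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of texts whose predicted label matches the gold label (labels must be non-empty)."""
    correct = sum(1 for t, y in zip(texts, labels) if predict(t) == y)
    return correct / len(labels)


# Hypothetical stand-in classifier: flags a text as bullying (1) if it
# contains a known insult verbatim. A real system would use a trained model.
INSULTS = {"tonto", "feo", "odio"}


def keyword_classifier(text: str) -> int:
    return int(any(w in INSULTS for w in text.split()))


if __name__ == "__main__":
    texts = ["eres muy tonto", "hoy hace buen tiempo"]
    labels = [1, 0]
    print("clean accuracy:", accuracy(keyword_classifier, texts, labels))
    # Adversarial validation: re-score the same gold labels on perturbed copies.
    adv_texts, adv_labels = [], []
    for t, y in zip(texts, labels):
        for v in augment(t):
            adv_texts.append(v)
            adv_labels.append(y)
    print("adversarial accuracy:", accuracy(keyword_classifier, adv_texts, adv_labels))
```

Because the stand-in classifier depends on exact surface forms, it misses the perturbed bullying sentences, so its accuracy drops on the augmented set. Under these assumptions, this is the kind of gap that adversarial validation exposes and that adding augmented sentences to the training data is meant to reduce.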
Keywords
Cyberbullying, Online social networks, Transfer learning, Machine learning (Artificial intelligence)
Creative Commons license
Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess