Automatic classification of events in soccer videos using deep convolutional networks
Date
2024-06-21
Authors
Publisher
Pontificia Universidad Católica del Perú
Abstract
The way new generations consume and experience sport, especially soccer, has created significant opportunities for distributing sports content on non-traditional platforms and in shorter formats. However, retrieving semantically meaningful information from sporting events presented as video is not a simple task and poses several challenges. In videos of soccer matches these challenges include, among others, the positions of the recording cameras, the overlapping of events or plays, and the enormous number of available frames.

To generate high-quality summaries that are interesting for the fan, this research developed a system based on deep convolutional networks to automatically classify events or plays that occur during a soccer match.

To that end, a database was built from soccer videos downloaded from SoccerNet, containing 1,959 video clips of 5 events: goal kicks, corner kicks, fouls, indirect free kicks, and shots on goal.

For the experiments, video preprocessing techniques and a custom convolutional architecture were used, and transfer learning was applied with models such as ResNet50, EfficientNetB0, Vision Transformers, and Video Vision Transformers.

The best result was obtained with an EfficientNetB0 modified in its first convolutional layer, which reached 91% accuracy, with a per-class precision of 100% for goal kicks, 92% for corner kicks, 90% for fouls, 88% for indirect free kicks, and 89% for shots on goal.
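The abstract does not state which framework was used; purely as an illustration, the following is a minimal PyTorch/torchvision sketch of the kind of transfer-learning setup described: an ImageNet-pretrained EfficientNet-B0 whose first convolutional layer is replaced so a clip can be fed as a stack of frames, and whose classifier head is resized to the 5 event classes. The clip length (FRAMES_PER_CLIP = 16), the stacked-grayscale-frame input, and the 224x224 resolution are assumptions made for this sketch, not details taken from the thesis.

import torch
import torch.nn as nn
from torchvision import models

NUM_EVENTS = 5        # goal kicks, corner kicks, fouls, indirect free kicks, shots on goal
FRAMES_PER_CLIP = 16  # assumed clip length; the thesis does not state this value

# ImageNet-pretrained EfficientNet-B0 as the transfer-learning backbone.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)

# The stock first layer is Conv2d(3, 32, kernel_size=3, stride=2); replace it so the
# network accepts FRAMES_PER_CLIP stacked frames as input channels instead of RGB.
old_conv = model.features[0][0]
model.features[0][0] = nn.Conv2d(
    in_channels=FRAMES_PER_CLIP,
    out_channels=old_conv.out_channels,
    kernel_size=old_conv.kernel_size,
    stride=old_conv.stride,
    padding=old_conv.padding,
    bias=False,
)

# Resize the classifier head to the 5 event classes.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, NUM_EVENTS)

# One preprocessed clip (stacked grayscale frames, 224x224) -> one event prediction.
clip = torch.randn(1, FRAMES_PER_CLIP, 224, 224)
logits = model(clip)                    # shape: (1, NUM_EVENTS)
predicted_event = logits.argmax(dim=1)  # index of the most likely event class

Replacing only the input layer and the classifier head keeps the pretrained intermediate weights intact, which is the usual way to adapt an ImageNet backbone to a new input format and label set with a relatively small dataset.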
Keywords
Soccer, Digital image processing, Neural networks (Computer science)
Creative Commons license
Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess