Generating Images of Specific Actions of a Person Using Deep Learning (original title: Generación de imágenes de acciones específicas de una persona utilizando aprendizaje profundo)
Date
2024-04-16
Publisher
Pontificia Universidad Católica del Perú
Abstract
Since the introduction of generative adversarial networks (GANs), a considerable body of research has examined image generation in tasks such as image synthesis, image-to-image conversion, video synthesis, text-to-image synthesis, and video frame prediction. Most of this work has focused on improving high-resolution image generation and on the reconstruction or prediction of data.

The purpose of this work is to apply GANs to a different task: generating images of entities performing an action. Three human actions were considered, namely glute, abdominal, and cardio exercises. First, images were downloaded from YouTube and processed, yielding a sequence of images for each action. The images were then split into two groups: one showing a single person and one showing different people performing the actions. Second, the InfoGAN model was selected for image generation, with the Inception Score (IS; "Puntuación Inicial (PI)" in the original) as the performance metric. The maximum score obtained was 1.28 for the first group and 1.3 for the second.

In conclusion, the maximum score of 3 for this metric was not reached, which is attributed to the quantity and quality of the images. Even so, the model does distinguish the three types of exercise, although in some cases the legs, arms, and head are rendered incorrectly.
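The maximum score of 3 quoted in the abstract matches how the standard Inception Score behaves with three classes. The sketch below assumes that the evaluator is the standard Inception Score, computed from per-image class probabilities over the three exercise classes. The function name `inception_score` and the toy probability lists are hypothetical illustrations; the classifier and code used in the thesis are not described in this record.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of lists, one probability vector p(y|x) per image.
    Hypothetical helper for illustration; not the thesis implementation.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Sum of KL(p(y|x) || p(y)) over images; terms with p = 0 contribute 0.
    total_kl = 0.0
    for p in probs:
        total_kl += sum(
            pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0
        )
    # The score is the exponential of the mean KL divergence.
    return math.exp(total_kl / n)


# Toy inputs (assumed): one confident prediction per class vs. uniform guesses.
confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uncertain = [[1 / 3, 1 / 3, 1 / 3]] * 3

print(inception_score(confident))
print(inception_score(uncertain))
```

With confident, evenly spread predictions the score approaches the number of classes (3 here), and with uniform predictions it falls to 1. Under this assumption, scores of 1.28 and 1.3 sit close to the lower bound.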
Keywords
Digital image processing, Data processing, Deep learning
Creative Commons license
Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess