Digital Signal and Image Processing (Procesamiento de Señales e Imágenes Digitales)

Permanent URI for this collection: http://98.81.228.127/handle/20.500.12404/5040

  • Efficient algorithms for convolutional dictionary learning via accelerated proximal gradient
    (Pontificia Universidad Católica del Perú, 2019-04-05) Silva Obregón, Gustavo Manuel; Rodríguez Valderrama, Paul Antonio
    Convolutional sparse representations and convolutional dictionary learning are mathematical models that represent a whole signal or image as a sum of convolutions between dictionary filters and coefficient maps. Unlike their patch-based counterparts, these convolutional forms are receiving increasing attention in multiple image processing tasks, since they do not suffer from the usual patch-wise drawbacks such as redundancy, multiple evaluations and lack of translation invariance. In particular, the convolutional dictionary learning (CDL) problem is addressed as an alternating minimization between a coefficient update stage and a dictionary update stage. Many algorithms based on the FISTA (Fast Iterative Shrinkage-Thresholding Algorithm), ADMM (Alternating Direction Method of Multipliers) and ADMM consensus frameworks have been proposed to efficiently solve the most expensive steps of the CDL problem in the frequency domain. However, the use of the existing methods on large sets of images is computationally restricted by the dictionary update stage. The present thesis report is organized in three parts. In the first part, we introduce the general topic of the CDL problem and the state-of-the-art methods used to deal with each stage. In the second part, we propose our first computationally efficient method, which solves the entire CDL problem using the Accelerated Proximal Gradient (APG) framework in both updates. Additionally, a novel update model reminiscent of the Block Gauss-Seidel (BGS) method is incorporated to reduce the number of estimated components during the coefficient update. In the final part, we propose an alternative method for the dictionary update stage based on an APG consensus approach. This last method combines particular strategies of the ADMM consensus framework and of our first APG framework to develop a less complex solution that is decoupled across the training images.
    In general, due to its lower number of operations, our first approach is the better serial option, while our last approach has the advantage of an independent and highly parallelizable structure. In our first set of experimental results, which consists of serial implementations, we show that our first APG approach provides a significant speedup with respect to the standard methods, by a factor of 1.6 to 5.3. A complementary improvement by a factor of 2 is achieved by using the BGS-like model. We also report that the second APG approach is faster than the state-of-the-art consensus algorithm, whether implemented in serial or in parallel. Both proposed methods maintain performance comparable to that of the other methods in terms of reconstruction metrics, such as PSNR, SSIM and sparsity, in denoising and inpainting tasks.
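    The coefficient update stage mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a 1-D signal, circular boundary conditions, fixed dictionary filters and a plain FISTA/APG iteration with a constant step size, and the names `csc_fista`, `objective` and `soft_threshold` are hypothetical. The gradient is evaluated in the frequency domain, as the abstract describes for the expensive steps.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(s, d, x, lam):
    """0.5 * ||sum_k d_k (*) x_k - s||^2 + lam * ||x||_1, circular convolution."""
    D = np.fft.fft(d, n=s.shape[0], axis=1)
    recon = np.real(np.fft.ifft(np.sum(D * np.fft.fft(x, axis=1), axis=0)))
    return 0.5 * np.sum((recon - s) ** 2) + lam * np.sum(np.abs(x))

def csc_fista(s, d, lam, n_iter=200):
    """Sketch of a convolutional sparse coding coefficient update.

    s: signal of shape (N,); d: filters of shape (K, M) with M <= N.
    Returns coefficient maps of shape (K, N).
    """
    K = d.shape[0]
    N = s.shape[0]
    D = np.fft.fft(d, n=N, axis=1)             # filters zero-padded to length N
    S = np.fft.fft(s)
    # Lipschitz constant of the gradient of the quadratic data term
    L = np.max(np.sum(np.abs(D) ** 2, axis=0))
    x = np.zeros((K, N))
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        # Residual and gradient computed in the frequency domain
        R = np.sum(D * np.fft.fft(y, axis=1), axis=0) - S
        grad = np.real(np.fft.ifft(np.conj(D) * R, axis=1))
        x_new = soft_threshold(y - grad / L, lam / L)
        # Momentum (extrapolation) step of FISTA
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

    In a full CDL loop this update would alternate with a dictionary update over the filters `d`; that stage, and the BGS-like and consensus variants proposed in the thesis, are not shown here.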
  • Object detection in videos using principal component pursuit and convolutional neural networks
    (Pontificia Universidad Católica del Perú, 2018-05-03) Tejada Gamero, Enrique David; Rodríguez Valderrama, Paul Antonio
    Object recognition in videos is one of the main challenges in computer vision. Several methods have been proposed for this task, such as background subtraction, temporal differencing, optical flow and particle filtering, among others. Since the introduction of Convolutional Neural Networks (CNN) for object detection in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), their use for image detection and classification has increased and they have become the state of the art for this task, with Faster R-CNN being the preferred model in the latest ILSVRC editions. Moreover, the Faster R-CNN model, with minimal modifications, has been successfully used to detect and classify objects (either static or dynamic) in video sequences; in such a setup, the frames of the video are input “as is”, i.e. without any pre-processing. In this thesis work we propose to use Robust PCA (RPCA, a.k.a. Principal Component Pursuit, PCP) as a video background modeling pre-processing step before the Faster R-CNN model, in order to improve the overall performance of detection and classification of, specifically, the moving objects. We hypothesize that such a pre-processing step, which segments the moving objects from the background, would reduce the number of regions to be analyzed in a given frame and thus (i) improve the classification time and (ii) reduce the classification error for the dynamic objects present in the video. In particular, we use a fully incremental RPCA / PCP algorithm that is suitable for real-time or on-line processing. Furthermore, we present extensive computational results obtained on three different platforms: a high-end server with a Tesla K40m GPU, a desktop with a Tesla K10m GPU and the Jetson TK1 embedded system.
    Our classification results attain competitive or superior performance in terms of F-measure, with an improvement ranging from 3.7% to 97.2% and a mean improvement of 22% when the sparse image was used to detect and classify the object with the neural network, while at the same time reducing the classification time on all architectures by between 2% and 25%.
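    The background modeling step described above can be sketched as follows. This is a simplified batch illustration, not the fully incremental RPCA / PCP algorithm used in the thesis: it assumes the whole video is available as a matrix with one vectorized frame per column, and it alternates a truncated SVD (low-rank background) with soft thresholding (sparse moving objects). The function name `pcp_alternating` and the parameters `rank`, `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pcp_alternating(frames, rank=1, lam=1.0, n_iter=10):
    """Split frames (pixels x n_frames) into low-rank plus sparse parts.

    Returns (background, sparse) with frames ~= background + sparse.
    Assumption: a fixed, small background rank is adequate for the scene.
    """
    sparse = np.zeros_like(frames, dtype=float)
    background = np.zeros_like(frames, dtype=float)
    for _ in range(n_iter):
        # Low-rank update: best rank-`rank` approximation of frames - sparse
        U, sv, Vt = np.linalg.svd(frames - sparse, full_matrices=False)
        background = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        # Sparse update: shrink what the background does not explain
        sparse = soft_threshold(frames - background, lam)
    return background, sparse

# Synthetic example: static background plus one bright pixel that moves
rng = np.random.default_rng(0)
n_pixels, n_frames = 60, 30
video = np.tile(rng.uniform(0.0, 1.0, size=(n_pixels, 1)), (1, n_frames))
for t in range(n_frames):
    video[t + 5, t] += 5.0                     # moving object at pixel t + 5
background, sparse = pcp_alternating(video, rank=1, lam=1.0, n_iter=10)
detected = np.argmax(np.abs(sparse), axis=0)   # strongest sparse pixel per frame
```

    In the pipeline proposed in the thesis, the sparse component of each frame, rather than the raw frame, would then be passed to the Faster R-CNN detector; that stage is not shown here.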