We hear terms such as artificial intelligence (AI), machine learning (ML), and deep learning (DL) more often every day. These terms are frequently used interchangeably, and the relationship between them is not always clear. Artificial intelligence is the intelligence demonstrated by machines to creatively solve problems. Machine learning is a subset of artificial intelligence that enables computer programs to learn from experience without being explicitly programmed. There are many methods within machine learning, and deep learning belongs to a family of methods known as artificial neural networks (ANNs).
One of the limitations of deep learning is that the theory behind these algorithms is only partly understood. Deep learning often behaves like a black box that learns complex correlations and produces a specific output for given inputs. However, the contents of the box, and the causal relationship between the inputs and the outputs, are not entirely clear. As a result, the learned patterns or strategies could be more superficial than they appear.
Given these limitations, it is interesting to encounter a new proposal, based on the widely known sparse coding model, that aims to explain the success of deep learning models while addressing those challenges.
In the article “Theoretical Foundations of Deep Learning via Sparse Representations” by Vardan Papyan, Yaniv Romano, Jeremias Sulam, and Michael Elad, published in IEEE Signal Processing Magazine in July 2018, the authors propose a new theory that explains deep convolutional neural networks using sparse representations as a model. The model is called multi-layered convolutional sparse coding (ML-CSC).
Any proposal to explain deep learning theoretically must consider three aspects: architecture, algorithms, and data. The data aspect is covered by building the model directly from the data. The architecture aspect then emerges from the model, while the instructions for creating such a model form the algorithm aspect. All three can be incorporated using a sparse representation model.
A sparse representation is a generative model, meaning that a signal, which can be of any type, is generated by a combination procedure. Generative models permit a systematic design of algorithms, and a systematic algorithm design in turn facilitates the analysis of the model. In this setting, the ML-CSC model proposed by the authors shares characteristics with a convolutional neural network (CNN), which is a deep learning algorithm.
One fundamental property shared by any meaningful source of data is that it possesses structure. Structure is what makes it possible to make assumptions about the data and therefore to process it; unstructured data (noise) is not interesting. The structure in the data is identified by imposing a model on it that fulfills a series of properties. The choice of properties depends on the application. Applications of sparse representations include signal denoising, audio, image, and video processing, and object recognition.
The sparse coding model, in its simplest form, describes a signal with the following equation:
Y = Ds + E
This model describes any signal Y as a linear combination of basis functions plus noise E. The set of basis functions is known as a dictionary D, a matrix in which the number of columns equals the number of available basis functions and the number of rows equals the length of the signal. The vector s has one entry per dictionary column and is sparse, meaning that most of its entries are zero and only a few are nonzero. Thus, the product of the dictionary and the sparse vector, Ds, describes the estimated structure of the signal.
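To make the shapes in Y = Ds + E concrete, here is a minimal NumPy sketch that synthesizes one signal from a random dictionary. The sizes `n`, `m`, `k`, the Gaussian dictionary, and the noise level are illustrative assumptions, not values from the paper. Note that s needs one entry per column of D for the product Ds to be defined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): signal length, number of atoms, nonzeros.
n, m, k = 20, 50, 3

# Dictionary D: one row per signal sample, one column per basis function.
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)          # scale every column to unit norm

# Sparse vector s: one entry per dictionary column, only k of them nonzero.
s = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
s[support] = rng.standard_normal(k)

# Small additive noise E, then the observed signal Y = Ds + E.
E = 0.01 * rng.standard_normal(n)
Y = D @ s + E

print("signal length:", Y.shape[0])
print("entries in s:", s.shape[0], "nonzero:", np.count_nonzero(s))
```

Only the `k` columns of `D` selected by `support` contribute to `Y`; the remaining columns are multiplied by zero.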
A key advantage of a sparse representation is its ability to describe information faithfully and effectively. This is due to the large number of basis functions that a dictionary can hold. However, generating a sparse representation comes with two major challenges: finding the sparse vector s and finding the dictionary D. There are many algorithms for each challenge, which we will not list here, but we briefly describe the overall procedure for creating a sparse representation.
A sparse representation can be created via an iterative process: first the dictionary is kept fixed while the sparse vector is estimated, and then the sparse vector is kept fixed while the dictionary is updated. This procedure repeats until a stopping criterion is reached, such as a desired signal-to-noise ratio. The result is a learned description of the input signal’s underlying structure.
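The alternating procedure can be sketched as follows. This is only one possible instantiation and not necessarily the one used in the paper: the sparse coding step uses orthogonal matching pursuit (OMP) and the dictionary step uses a least-squares update (often called the method of optimal directions). The function names, the relative-error stopping rule `tol`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse coding: choose up to k atoms of D that best explain y."""
    residual, support = y.copy(), []
    s = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit the coefficients of the chosen atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    s[support] = coef
    return s

def learn_dictionary(Y, m, k, n_iter=20, tol=1e-3, seed=0):
    """Alternate sparse coding (D fixed) and dictionary update (S fixed)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], m))
    D /= np.linalg.norm(D, axis=0)
    errors = []
    for _ in range(n_iter):
        # Step 1: dictionary fixed, one sparse vector per signal (column of Y).
        S = np.column_stack([omp(D, y, k) for y in Y.T])
        # Step 2: sparse vectors fixed, least-squares dictionary update.
        D = Y @ np.linalg.pinv(S)
        # Rescale atoms to unit norm and compensate in S so D @ S is unchanged.
        norms = np.linalg.norm(D, axis=0)
        norms[norms == 0] = 1.0          # unused atoms stay at zero
        D, S = D / norms, S * norms[:, None]
        errors.append(np.linalg.norm(Y - D @ S))
        # Stopping criterion (assumed): small relative reconstruction error.
        if errors[-1] <= tol * np.linalg.norm(Y):
            break
    return D, S, errors

# Usage on synthetic signals that truly are 2-sparse in a hidden dictionary.
rng = np.random.default_rng(1)
D_true = rng.standard_normal((10, 15))
S_true = np.zeros((15, 40))
for col in range(40):
    S_true[rng.choice(15, size=2, replace=False), col] = rng.standard_normal(2)
Y = D_true @ S_true

D, S, errors = learn_dictionary(Y, m=15, k=2, n_iter=5)
print("reconstruction error per iteration:", np.round(errors, 3))
```

The relative-error test plays the role of the stopping criterion mentioned above; a target signal-to-noise ratio could be used in the same place.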
A special case of the sparse model is the convolutional sparse coding (CSC) model. Under this model, a signal is assumed to be built from many instances of small basis functions (filters). More precisely, the flipped version of each function is shifted along all the dimensions of the signal, which is why the model is called convolutional. The structured dictionary matrix thus becomes a union of banded, circulant matrices, one for each small basis function.
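The structure of the convolutional dictionary can be illustrated in one dimension. The sketch below builds one circulant block per filter, concatenates the blocks into a single dictionary, and checks that multiplying by that dictionary gives the same signal as summing circular convolutions. The helper name `circulant_from_filter`, the filter length, and the signal length are assumptions chosen for the example.

```python
import numpy as np

def circulant_from_filter(f, n):
    """n x n matrix whose column j is the zero-padded filter shifted by j."""
    padded = np.zeros(n)
    padded[: len(f)] = f
    return np.column_stack([np.roll(padded, j) for j in range(n)])

rng = np.random.default_rng(0)
n = 16                                                  # signal length (assumed)
filters = [rng.standard_normal(3) for _ in range(2)]    # two small filters

# Global dictionary: one banded circulant block per filter, side by side.
D = np.hstack([circulant_from_filter(f, n) for f in filters])

# Sparse vector: one coefficient map of length n per filter.
s = np.zeros(2 * n)
s[[2, 9, n + 5]] = [1.5, -0.7, 2.0]

# Signal from the matrix form.
y_matrix = D @ s

# Same signal as a sum of circular convolutions, computed with the FFT.
y_conv = np.zeros(n)
for i, f in enumerate(filters):
    padded = np.zeros(n)
    padded[: len(f)] = f
    s_i = s[i * n : (i + 1) * n]
    y_conv += np.real(np.fft.ifft(np.fft.fft(padded) * np.fft.fft(s_i)))

print("dictionary shape:", D.shape)
print("matrix form matches convolution:", np.allclose(y_matrix, y_conv))
```

Each nonzero entry of `s` places one shifted copy of a filter in the signal, which is the "many instances of small basis functions" picture described above.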
Building on this description, the authors propose an extension of the sparse model known as multi-layered convolutional sparse coding (ML-CSC). Under the basic sparse model, a signal emerges from a dictionary and a sparse vector. A further layer can be added in which that sparse vector itself emerges from another dictionary and another sparse vector, and more layers can be added by chaining the same relation. This derivation implies that the intermediate vectors are not all necessarily sparse, with the exception of the vector at the deepest layer.
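Written out, the layered construction looks as follows. The subscripted symbols D_i and s_i and the number of layers K are notation introduced here for illustration, extending the Y = Ds + E form used earlier; the noise term E is omitted for brevity.

```latex
\begin{aligned}
Y       &= D_1 s_1, \\
s_1     &= D_2 s_2, \\
        &\;\;\vdots \\
s_{K-1} &= D_K s_K,
\end{aligned}
\qquad\Longrightarrow\qquad
Y = D_1 D_2 \cdots D_K \, s_K .
```

Substituting each line into the previous one shows that the signal can also be viewed as generated by a single effective dictionary, the product D_1 D_2 ⋯ D_K, applied to the deepest vector s_K.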
The ML-CSC model and deep convolutional neural networks share many similarities. Both use convolution operations, involve sparsifying operations, and depend on data to create a fit. The figure below shows the similarity of their equations: the multi-layered algorithm that creates the sparse representation is equivalent to the forward pass of a convolutional neural network.
Figure 1: Equations of the ML-CSC and CNN algorithms
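The equivalence can be checked numerically on a toy example. The sketch below compares a layer-by-layer thresholding estimate with a plain ReLU forward pass, under the assumption that each layer's weights are the transposed dictionary and each bias is the negated threshold. Small dense random matrices stand in for convolutional dictionaries, and all function names, sizes, and threshold values are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def soft_nonneg_threshold(z, beta):
    """Set entries at or below beta to zero and shrink the rest by beta."""
    return np.where(z > beta, z - beta, 0.0)

def layered_thresholding(y, dictionaries, thresholds):
    """Estimate the vector at each layer by thresholding D_i^T times the previous one."""
    s = y
    for D, beta in zip(dictionaries, thresholds):
        s = soft_nonneg_threshold(D.T @ s, beta)
    return s

def cnn_forward(y, weights, biases):
    """Standard forward pass: affine map followed by ReLU at every layer."""
    s = y
    for W, b in zip(weights, biases):
        s = relu(W @ s + b)
    return s

rng = np.random.default_rng(0)
# Two layers: a 12-sample signal, 20 atoms in layer 1, 30 atoms in layer 2.
D1 = rng.standard_normal((12, 20))
D2 = rng.standard_normal((20, 30))
thresholds = [0.5, 1.0]
y = rng.standard_normal(12)

s_pursuit = layered_thresholding(y, [D1, D2], thresholds)
# Assumed correspondence: weights = D^T, biases = -threshold.
s_forward = cnn_forward(y, [D1.T, D2.T], [-0.5, -1.0])

print("outputs match:", np.allclose(s_pursuit, s_forward))
print("nonzeros in deepest vector:", np.count_nonzero(s_pursuit), "of", s_pursuit.size)
```

The match follows from the identity max(z - β, 0) = ReLU(z - β): subtracting a threshold and clipping at zero is the same operation as adding a negative bias and applying a ReLU.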
The model proposed by the authors provides an alternative way to study CNNs: it permits analyzing the performance of a given architecture while suggesting other architectures based on the sparse decomposition paradigm. Work in these areas is still in progress, in particular on refining the dictionary learning and sparse representation algorithms, as well as on furthering the understanding of the theory behind them and how to put this knowledge into practice.
If you want to read more, the complete paper is available at this link: https://ieeexplore.ieee.org/abstract/document/8398588