A software framework for learning representations from data
The core idea of the Lrn2Cre8 project is that a general computational approach to creativity involves developing representations and concepts of the outside world, rather than taking for granted human-developed concepts. The area of machine learning that concerns representation learning has witnessed great progress in the past decade, particularly the advance of deep learning, providing effective methods for unsupervised learning of hierarchically structured representations. The purpose of the prototype presented here is to gather a selection of these methods in a software framework. Moreover, the framework aims to facilitate practical work flows involving representation learning. It does so by formalising the process of input data preparation, by allowing declarative specifications of a representation learning setup, and by providing functionality for several use cases of learned representations, including visualisation, prediction, and information retrieval. As such, the presented framework reduces the effort required to employ state-of-the-art representation learning methods in the investigation of computational creativity.
This article gives a brief description of the lrn2 framework. For detailed documentation, see the lrn2 web documentation.
Download the lrn2 software framework.
Overview
The structure of lrn2 is designed to facilitate the creation of an end-to-end work flow. This work flow involves the following three phases:
- Reading and preparing data from files
- Defining and training models to learn representations from the data
- Using the learned representations in specific use case scenarios
1. Input data processing
The models for representation learning offered in lrn2 all work on basic representations of the data as vectors of either binary, integer, or real values, depending on the model. The question to be addressed is thus: given a data set stored in one or more digital files, how can the data be converted to the format required for representation learning (i.e. sequences of numeric vectors)?
This conversion is formalised by separating input data processing into two components:
- data loading function: a function for loading data from files
- viewpoints: a set of viewpoint objects that produce a vectorized representation of some aspect of the data returned by the data loading function, in the form of one or more instances
To maintain maximal flexibility, no constraints are placed on the format of the data returned by the data loading function, other than that a viewpoint must be able to produce a vectorized representation of it. Furthermore, in order for viewpoints to be combined, they must produce the same number of data instances for a given raw data object.
For example, in the case of MIDI files, the raw data object may be a matrix with columns representing MIDI pitches, durations, and MIDI velocities for each note in the file. A viewpoint might take the pitches of each note, and represent each pitch as a vector using a one-out-of-K binary representation (where MIDI pitch number i is encoded by setting the i-th bit to 1).
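As a concrete illustration, such a viewpoint could be sketched as follows. This is a toy example, not lrn2's actual API; the function name and the column layout of the raw data matrix are assumptions:

```python
import numpy as np

def pitch_viewpoint(raw_notes, n_pitches=128):
    """Map each note's MIDI pitch to a one-out-of-K binary vector."""
    pitches = raw_notes[:, 0].astype(int)              # assume column 0 holds MIDI pitch
    vectors = np.zeros((len(pitches), n_pitches), dtype=np.int8)
    vectors[np.arange(len(pitches)), pitches] = 1      # set the i-th bit for pitch i
    return vectors

# Three notes as (pitch, duration, velocity) rows
notes = np.array([[60, 1.0, 80],
                  [64, 0.5, 72],
                  [67, 0.5, 90]])
vecs = pitch_viewpoint(notes)
print(vecs.shape)    # (3, 128)
print(vecs[0, 60])   # 1
```

Each note yields one data instance, so this viewpoint can be combined with, say, a duration viewpoint that produces the same number of instances per file.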
Finally, we include n-gram segmentation of the viewpoint data as a standard part of input processing. Static data such as images (where the n-gram approach is not meaningful) are treated as a special case of non-static data, with an n-gram size of one. In this way, there is no need to make separate provisions for static and non-static data.
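The n-gram segmentation step can be sketched as follows (an illustrative sketch of the idea, not lrn2's implementation): each run of n consecutive viewpoint vectors is concatenated into a single input instance, and n = 1 degenerates to the static case.

```python
import numpy as np

def ngram_segments(vectors, n):
    """Concatenate each run of n consecutive viewpoint vectors into one instance."""
    if n == 1:
        return vectors    # static case: every vector is its own instance
    return np.array([np.concatenate(vectors[i:i + n])
                     for i in range(len(vectors) - n + 1)])

frames = np.eye(5)                         # 5 toy viewpoint vectors of size 5
print(ngram_segments(frames, 2).shape)     # (4, 10): bigrams of concatenated frames
print(ngram_segments(frames, 1).shape)     # (5, 5): unchanged, the static case
```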
2. Model definition for representation learning
The current version of lrn2 provides ready-to-use implementations of (stacked) auto-encoders and RBMs, in several variations. Although future versions of lrn2 should offer similarly easy access to models for sequential data, priority has been given to the simpler models for static data. In part, this is because most models that deal with non-static data are extensions, elaborations, or generalizations of those simpler models. Furthermore, using an n-gram approach, the currently included models still allow for modelling sequential data.
A particular model with particular settings can be instantiated by loading a configuration file of the INI format. Such a file may contain an entry like:
[[RBM1]]
type = RBM_STD          # the type of model
n_hidden = 2000         # the number of units in the hidden layer
epochs = 200            # the number of times all data is iterated for training
learning_rate = 0.005   # how fast the parameters of the model adapt
batch_size = 10         # the number of data instances used for a parameter update
n_cd = 1                # the number of Gibbs sampling steps to compute the update
wl1_param = 0.0         # penalty for the L1 norm of the parameters
wl2_param = 0.01        # penalty for the L2 norm of the parameters
momentum = 0.5          # how strongly past updates influence the current update
An RBM with the sparsity constraint of either Goh et al. [2010] or Lee et al. [2008] can be instantiated by specifying the type RBM_GOH or RBM_LEE, respectively. In the same way, RBM variants with Gaussian visible units and with linear hidden units can be instantiated. Finally, there is an RBM implementation that is trained with Persistent Contrastive Divergence [Tieleman, 2008], possibly with the fast weights adaptation [Tieleman and Hinton, 2009]. The training progress of a model is visualized by plotting several aspects of the model (weight/bias distributions, filters, reconstructions of data samples), at intervals specified by the user.
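For reference, the parameter update that the n_cd and learning_rate settings above control can be sketched in plain NumPy. This is a didactic sketch of the standard CD-1 update for a binary RBM, not lrn2's actual code; regularisation, momentum, and mini-batch scheduling are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, bv, bh, v0, lr=0.005):
    """One CD-1 parameter update for a binary RBM on a mini-batch v0."""
    h0 = sigmoid(v0 @ W + bh)                        # positive-phase hidden probabilities
    h0_sample = (rng.random(h0.shape) < h0) * 1.0    # one Gibbs step: sample hiddens
    v1 = sigmoid(h0_sample @ W.T + bv)               # reconstruct visibles
    h1 = sigmoid(v1 @ W + bh)                        # negative-phase hidden probabilities
    n = v0.shape[0]
    W += lr * (v0.T @ h0 - v1.T @ h1) / n            # positive minus negative statistics
    bv += lr * (v0 - v1).mean(axis=0)
    bh += lr * (h0 - h1).mean(axis=0)
    return W, bv, bh

# toy mini-batch: 10 binary visible vectors of size 6, 4 hidden units
W = rng.normal(0, 0.01, (6, 4))
bv, bh = np.zeros(6), np.zeros(4)
v0 = (rng.random((10, 6)) < 0.5) * 1.0
W, bv, bh = cd1_update(W, bv, bh, v0)
print(W.shape)    # (6, 4)
```

Setting n_cd > 1 would simply repeat the Gibbs sampling step before computing the negative-phase statistics.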
Lastly, back-propagation is available as a method for fine-tuning models using supervised data (i.e. data instances with labels).
3. Application of learned features
After defining and training a model architecture to learn representations from data, a user will likely want to apply the trained model to perform a particular task, or possibly just inspect the kind of representations learned. The lrn2 framework provides functionality for a few common use cases for learned representations. We discuss each of them briefly:
- Prediction: One of the major applications of representation learning in general is prediction, usually in the form of classification or regression. In this case the learned representations are used as features to predict some target variable. Since a multitude of classification/regression methods is already available through software like WEKA [Hall et al., 2009], lrn2 facilitates their use by projecting data sets into a learned feature space and exporting the feature data (possibly along with user-specified labels) to an ARFF file, which can be read by WEKA. Note that in such supervised scenarios, any available labelled data can already be used in the model training phase, to fine-tune a model for prediction after unsupervised training (see Section 2).
- Visualization: A question that often arises when learning representations is whether the data, when projected into the learned space, exhibits any meaningful structure. To this end, it is often helpful to project the learned space into a two- or three-dimensional space for visualization. lrn2 uses PCA to realize this, and can either plot the data directly for a given data set, or write the low-dimensional coordinates to a file for visualization with external tools (e.g. gnuplot). To make the low-rank PCA approximation computationally feasible on large data sets and feature spaces, we use a randomized PCA [Compton, 2010], which greedily finds approximations of the strongest components. An alternative visualization is offered by Stochastic Neighbour Embedding (SNE) [van der Maaten, 2013], which is useful in case the neighbour relationships in the feature space cannot be faithfully represented in the low-dimensional embedding.
- Retrieval: Learned representations can also be useful for fast retrieval of items from a database. In this scenario, learned representations are typically used to generate a binary hash code for an item, which can be a document [Salakhutdinov and Hinton, 2009], or a song [Schlüter, 2013], for instance. A crucial aspect of the learned hash codes is that semantically related items have similar hash codes (in terms of their Hamming distance). lrn2 provides a method to create a database from given data sets, such that data instances are indexed by their learned representation (in binarized form). Given a query item, the best matching candidate items are then retrieved from the database. This method is known as semantic hashing [Salakhutdinov and Hinton, 2009].
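The ARFF export in the prediction use case amounts to writing the projected feature vectors and their labels in WEKA's text format. A minimal hand-rolled sketch (the function name and the feat0, feat1, ... attribute naming are assumptions, not lrn2's API):

```python
import numpy as np

def export_arff(path, features, labels, relation="learned_features"):
    """Write feature vectors plus class labels to a WEKA-readable ARFF file."""
    classes = sorted(set(labels))
    with open(path, "w") as f:
        f.write(f"@RELATION {relation}\n\n")
        for i in range(features.shape[1]):               # one numeric attribute per feature
            f.write(f"@ATTRIBUTE feat{i} NUMERIC\n")
        f.write("@ATTRIBUTE class {" + ",".join(classes) + "}\n\n@DATA\n")
        for row, label in zip(features, labels):
            f.write(",".join(f"{x:.6f}" for x in row) + f",{label}\n")

feats = np.array([[0.1, 0.9],
                  [0.8, 0.2]])
export_arff("demo.arff", feats, ["a", "b"])
```

The resulting file can be opened directly in the WEKA Explorer for classification or regression experiments.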
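The randomized PCA used in the visualization use case can be sketched with a standard random range-finder: project the centred data onto a small random subspace, orthonormalize, and take an exact SVD of the resulting small matrix. This is a generic sketch of the technique, not lrn2's implementation:

```python
import numpy as np

def randomized_pca(X, k, rng, oversample=5):
    """Project X onto its k strongest principal components via a random range-finder."""
    Xc = X - X.mean(axis=0)                                   # centre the data
    # sample the range of Xc with a random projection, then orthonormalize it
    Q, _ = np.linalg.qr(Xc @ rng.normal(size=(X.shape[1], k + oversample)))
    # an exact SVD of the small projected matrix recovers approximate components
    _, _, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))     # stand-in for learned feature vectors
coords = randomized_pca(features, 2, rng)
print(coords.shape)    # (200, 2)
# np.savetxt("coords.dat", coords)  # e.g. for plotting with gnuplot
```

The cost is dominated by two passes over the data, rather than a full eigendecomposition of the covariance matrix, which is what makes the approach attractive for large data sets.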
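Semantic hashing as described in the retrieval use case reduces to binarizing hidden-unit activations into hash codes and ranking database items by Hamming distance. A toy sketch (function names and the threshold are assumptions, not lrn2's API):

```python
import numpy as np

def hash_code(hidden_probs, threshold=0.5):
    """Binarize hidden-unit activations into a binary hash code."""
    return (np.asarray(hidden_probs) > threshold).astype(np.uint8)

def retrieve(query_code, db_codes, k=2):
    """Return indices of the k database items closest in Hamming distance."""
    dists = (db_codes != query_code).sum(axis=1)   # Hamming distance per item
    return np.argsort(dists, kind="stable")[:k]

# three database items, hashed from their (toy) hidden activations
db = hash_code([[0.9, 0.1, 0.8, 0.2],
                [0.1, 0.9, 0.2, 0.7],
                [0.8, 0.2, 0.9, 0.1]])
query = hash_code([0.85, 0.15, 0.7, 0.3])
print(retrieve(query, db))    # [0 2]
```

In a real index the codes would be used as keys, so candidates within a small Hamming radius can be found without scanning the whole database.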
Bibliography
Compton, R. (2010). A Randomized Algorithm for PCA. pages 1–4.
Goh, H., Thome, N., and Cord, M. (2010). Biasing restricted Boltzmann machines to manipulate latent selectivity and sparsity. NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1).
Lee, H., Ekanadham, C., and Ng, A. Y. (2008). Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems 20, pages 873–880.
Salakhutdinov, R. and Hinton, G. (2009). Semantic hashing. International Journal of Approximate Reasoning.
Schlüter, J. (2013). Learning binary codes for efficient large-scale music similarity search. In Proceedings of the 14th International Society for Music Information Retrieval Conference, pages 1–6.
Tieleman, T. (2008). Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient. In Proceedings of the 25th International Conference on Machine Learning, pages 1064–1071. ACM, New York, NY, USA.
Tieleman, T. and Hinton, G. (2009). Using Fast Weights to Improve Persistent Contrastive Divergence. In Proceedings of the 26th International Conference on Machine Learning, pages 1033–1040. ACM, New York, NY, USA.
van der Maaten, L. (2013). Barnes-Hut-SNE. arXiv preprint arXiv:1301.3342.