PCA and NMF are both dimension reduction methods that can be used for BIG DATA problems. They use matrix algebra to transform the original feature space of a dataset into a much lower-dimensional space while keeping most of the information in the data. To make this article more vivid, I would like to use image data to show the functionality of these two methods and the differences between them. In fact, PCA and NMF are very natural transformations for image data, because images have a large feature space (width x height x channels) and their pixels are highly correlated.
Let’s take a look at this flower dataset first.
We have roses and sunflowers in this image dataset. Each image is represented as a 128x128x3 numpy array, so if you flatten each image, you get a 1D array with 49,152 features, which is a REALLY LARGE number of features.
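For concreteness, here is a minimal numpy sketch of that flattening step. The `images` array below is a random placeholder standing in for the loaded flower images, since the file-loading code is omitted here.

```python
import numpy as np

# Placeholder for the loaded flower images: (n_samples, 128, 128, 3).
# In practice this would come from reading the image files on disk.
images = np.random.rand(200, 128, 128, 3)

# Flatten each image into a single row of 128 * 128 * 3 = 49,152 features.
X = images.reshape(len(images), -1)
print(X.shape)  # (200, 49152)
```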
But no worries, we can use PCA and NMF to reduce the dimensionality of these images. Let’s start with PCA.
PCA (Principal Component Analysis)
PCA uses an orthogonal transformation to convert the original features into a set of uncorrelated components. Each component is a new feature in the transformed dataset, and each new feature is a linear combination of the original features. The new features (principal components) are ranked in decreasing order of how much variance they explain in the data.
Algorithm
- Center the features (subtract the mean of each column).
- Find the first component that explains the most variation in the data. Specifically, create a new variable z1 as a linear combination of the original features with a vector of weights w1, and choose the weights that maximize the variance of z1 (constraining w1 to have a norm of 1).
- Do this again to find z2 and w2, but this time constrain the new combination to be uncorrelated with the previous one (the orthogonal transformation guarantees uncorrelated components); z2 will explain the second most variation in the data.
- Keep repeating this process until we have p new features (z) and their weights (w). A minimal code sketch of these steps follows below.
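Here is a toy numpy sketch of those steps. It uses the singular value decomposition of the centered data to obtain the weight vectors, which is one standard way to solve the variance-maximization problem; the function and variable names are mine, not from a library.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Toy PCA: center the data, then use the SVD of the centered
    matrix to get the weight vectors (principal directions)."""
    # Step 1: center each column.
    X_centered = X - X.mean(axis=0)

    # Steps 2-4: the right singular vectors of the centered matrix are
    # the weight vectors w1, w2, ..., ordered by explained variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components].T            # weights, one column per component
    Z = X_centered @ W                 # new features z1, ..., z_m

    # Variance explained by each kept component.
    explained_variance = (S[:n_components] ** 2) / (len(X) - 1)
    return Z, W, explained_variance
```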
Apply PCA to the flower image data
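A sketch of how this can be done with scikit-learn's PCA, assuming `X` is the (n_samples, 49152) matrix of flattened images from earlier; keeping 100 components is just an illustrative choice, not a tuned value.

```python
from sklearn.decomposition import PCA

# Keep 100 components as an example; X is the flattened image matrix.
pca = PCA(n_components=100)
Z_pca = pca.fit_transform(X)                 # lower-dimensional features

print(Z_pca.shape)                           # (n_samples, 100)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained

# Reconstruct and reshape an image to inspect the approximation.
X_approx = pca.inverse_transform(Z_pca)
first_image = X_approx[0].reshape(128, 128, 3)
```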
NMF (Non-negative Matrix Factorization)
NMF factorizes a non-negative data matrix X into two non-negative matrices, W and H, such that X ≈ WH.
Algorithm
- Choose the rank of the approximating matrix, denoted by m. This will be the number of columns in W and the number of rows in H. A smaller m uses less data to approximate X, but a larger m gives a better approximation. W is analogous to the principal component scores in PCA, and H is like the matrix of loadings.
- In the optimization, we minimize the Euclidean distance between the original matrix and its approximation, which is the Frobenius norm ||X - WH||, while constraining all values in W and H to be non-negative.
- The resulting matrix W is our new lower-dimensional representation of X, and we can use it as our new features. H is the matrix of loadings that we can dig into to understand the decomposition and to reconstruct the approximation to X. (A toy sketch of the optimization follows below.)
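Below is a toy sketch of this optimization using the classic multiplicative update rules, which are one common way to shrink the Frobenius norm while keeping W and H non-negative (scikit-learn's NMF uses its own, faster solvers). Everything here is illustrative, not a production implementation.

```python
import numpy as np

def nmf_from_scratch(X, m, n_iter=200, eps=1e-9):
    """Toy NMF via multiplicative updates, minimizing the Frobenius
    norm ||X - WH|| with W and H kept non-negative."""
    n_samples, n_features = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n_samples, m))      # lower-dimensional representation
    H = rng.random((m, n_features))     # loadings

    for _ in range(n_iter):
        # Multiplicative updates keep all entries non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)

    reconstruction_error = np.linalg.norm(X - W @ H)
    return W, H, reconstruction_error
```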
Apply NMF to the flower image data
NMF is a slower algorithm than PCA, so I chose to start with a small rank (m = 100). Unlike PCA, NMF doesn't offer an analogous notion of explained variance, but we can look at the reconstruction error, which measures how closely the factorized (decomposed) matrices approximate the original one.
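In code, this could look roughly like the following with scikit-learn, again assuming `X` is the flattened (and non-negative) image matrix from earlier; the `init` and `max_iter` settings are illustrative choices, not the only reasonable ones.

```python
from sklearn.decomposition import NMF

# NMF requires non-negative input; pixel values in [0, 1] satisfy this.
nmf = NMF(n_components=100, init="nndsvda", max_iter=500)
W = nmf.fit_transform(X)       # (n_samples, 100) lower-dimensional features
H = nmf.components_            # (100, 49152) loadings

# Frobenius-norm reconstruction error reported by scikit-learn.
print(nmf.reconstruction_err_)
```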
Similarly, we can take a look at the components and the reconstructed image after NMF transformation.
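One way to do that with matplotlib, continuing from the `W` and `H` computed above; the normalization and clipping are just to keep pixel values in a displayable range.

```python
import numpy as np
import matplotlib.pyplot as plt

# Each row of H is one component; reshape it back to image dimensions.
component_image = H[0].reshape(128, 128, 3)

# Reconstruct the first image from its 100 NMF features.
reconstructed = (W[0] @ H).reshape(128, 128, 3)

fig, axes = plt.subplots(1, 2, figsize=(6, 3))
axes[0].imshow(np.clip(component_image / component_image.max(), 0, 1))
axes[0].set_title("NMF component")
axes[1].imshow(np.clip(reconstructed, 0, 1))
axes[1].set_title("Reconstruction")
plt.show()
```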
Summary & Caution
Both PCA and NMF can be used on any dataset where we want to reduce the number of features. But when you're working with non-negative data such as images or audio, NMF can be the better choice because its components tend to be more interpretable than PCA's when you want to inspect them.
PCA should only be used for numeric data because the transformation doesn't make as much sense if you apply it to categorical data and dummy variables.
Both PCA and NMF are unsupervised learning algorithms, which means they don’t consider the response variable at all. So if you apply them in a supervised learning task, they won't necessarily give you better predictions, because the information they discard might be exactly what matters for the prediction.