
Use Flowers to Analyze PCA and NMF


PCA and NMF are both dimension reduction methods that can be used for BIG DATA problems. They use matrix algebra to transform the original feature space of a dataset into a much lower dimensional space while keeping the majority of the information in the data. To make this article more vivid, I would like to use image data to show the functionality of and the differences between these two methods. In fact, PCA and NMF are very natural transformations for image data, because images have a large feature space (width x height x channels) and their pixels are highly correlated.

Let’s take a look at this flower dataset first.

The flower dataset I used in this article is from Kaggle (https://www.kaggle.com/alxmamaev/flowers-recognition).

We have roses and sunflowers in this image dataset. Each image is represented as a 128x128x3 numpy array. So if you flatten each image, you will get a 1D array with 49152 features, which is a REALLY LARGE number of features.

But no worries, we can use PCA and NMF to reduce the dimensionality of these images. Let’s start with PCA.

PCA (Principal Component Analysis)

PCA uses an orthogonal transformation to convert the original features into a set of uncorrelated components. Each component is a new feature in the transformed dataset, and each new feature is a linear combination of the original features. The new features (principal components) are ranked in decreasing order of how much variance they explain in the data.

Algorithm

  1. Center the features (subtract the mean of each column).
  2. Find the first component that explains the most variation in the data. Specifically, create a new variable z1 using a vector of weights denoted by w1, and choose the weights that maximize the variance of z1 (constrain w1 to have a norm of 1). Remember that z1 is a linear combination of the original features with weights w1.
  3. Do this again to find z2 and w2, but this time constrain the new combination to be uncorrelated with the previous one (the orthogonal transformation ensures the components are uncorrelated), so z2 explains the second most variation in the data.
  4. Keep repeating this process until we have p new features (z) and their weights (w); a minimal sketch of these steps is shown below.
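To make these steps concrete, here is a minimal NumPy sketch of the algorithm above. It is a toy example with my own variable names, not the exact code I used for the flowers; a library implementation like scikit-learn's PCA does the same job more efficiently (typically via SVD).

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Minimal PCA: center the data, find orthogonal unit-norm weight vectors (w),
    and project onto them to get the new features (z)."""
    # Step 1: center each column (feature)
    X_centered = X - X.mean(axis=0)

    # Steps 2-4: the eigenvectors of the covariance matrix are exactly the
    # unit-norm weight vectors that maximize variance, each orthogonal
    # (and therefore uncorrelated) to the previous ones.
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # eigh returns eigenvalues in ascending order; keep the largest ones
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]          # weights, one column per component
    Z = X_centered @ W             # new features z = Xw
    return Z, W

# toy usage: 100 samples, 5 correlated features -> 2 components
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # make two features correlated
Z, W = pca_fit_transform(X, n_components=2)
print(Z.shape, W.shape)   # (100, 2) (5, 2)
```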

Apply PCA to the flower image data

When I applied PCA to the flower data, I used the first 200 principal components as the new features and transformed the original dataset. These 200 principal components explain over 80% of the variance in the data, which means I successfully reduced the feature space from 49152 to 200 while keeping over 80% of the information in the data.
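Here is roughly what this step looks like with scikit-learn. It is a sketch that assumes the photos have already been loaded and resized into a NumPy array called images of shape (n_images, 128, 128, 3); the loading code and file paths are omitted, and the variable names are mine.

```python
import numpy as np
from sklearn.decomposition import PCA

# `images` is assumed to be a (n_images, 128, 128, 3) uint8 array of flower photos
X = images.reshape(len(images), -1).astype(float)   # flatten: (n_images, 49152)

pca = PCA(n_components=200)
X_pca = pca.fit_transform(X)                        # transformed data: (n_images, 200)

explained = pca.explained_variance_ratio_.sum()
print(f"first 200 components explain {explained:.1%} of the variance")
```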

The first 8 principal components 

You may be curious about how much the original data changed after PCA was applied, since in this case I lost about 20% of the information. To answer this question, we can reconstruct one of the flowers from the transformed data and compare it with its original image.
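A quick sketch of that reconstruction, continuing from the pca, X, and X_pca objects in the previous snippet (the flower index 0 is arbitrary, and I assume pixel values are in the 0-255 range):

```python
import numpy as np
import matplotlib.pyplot as plt

# map the 200-dimensional representation back to the original 49152 pixels
X_restored = pca.inverse_transform(X_pca)

original = X[0].reshape(128, 128, 3)
restored = np.clip(X_restored[0], 0, 255).reshape(128, 128, 3)

fig, axes = plt.subplots(1, 2)
axes[0].imshow(original.astype("uint8")); axes[0].set_title("original")
axes[1].imshow(restored.astype("uint8")); axes[1].set_title("after PCA (200 components)")
plt.show()
```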

Compare a flower image before and after the PCA transformation

The reconstructed flower image is blurrier, but we can still see discernible flower patterns in the image after the PCA transformation.

NMF (Non-negative Matrix Factorization)

NMF is another dimension reduction method. It works by taking the data matrix X and decomposing it into two new matrices, W and H, so that X can be approximately represented by their product WH (X ~ WH). So intrinsically, NMF is an optimization problem: find the W and H that best approximate X.

NMF is useful for non-negative data, such as image, audio, or text data.
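To make the shapes concrete: if X is n x p and we choose a rank m, then W is n x m and H is m x p, so their product WH has the same shape as X. A tiny toy sketch (the numbers are arbitrary):

```python
import numpy as np

n, p, m = 6, 10, 3                   # n samples, p features, chosen rank m
rng = np.random.default_rng(0)
W = rng.random((n, m))               # (n, m): the new lower-dimensional representation
H = rng.random((m, p))               # (m, p): the loadings / components
X_approx = W @ H                     # (n, p): same shape as the original X
print(X_approx.shape)                # (6, 10)
```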

Algorithm

  1. Choose the rank of the matrix that will approximate X, denoted by m. This will be the number of columns in W and the number of rows in H. A smaller m uses less information to approximate X, while a larger m gives a better approximation. W plays a role similar to the principal component scores in PCA, and H is like the matrix of loadings.
  2. In the optimization, we minimize the Euclidean distance between the original matrix and the approximation, i.e. the Frobenius norm of X - WH, while constraining all the values in W and H to be non-negative.
  3. The resulting matrix W is our new lower dimensional representation of X, and we can use its columns as features. H is the matrix of loadings that we can dig into to understand the decomposition and to reconstruct the approximation to X; a minimal sketch of this procedure is shown below.
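Here is a minimal sketch of one classic way to solve this optimization, the multiplicative update rules for the Frobenius-norm objective. It is a toy illustration with my own variable names, not necessarily the solver a library like scikit-learn uses under the hood.

```python
import numpy as np

def nmf_multiplicative(X, m, n_iter=200, eps=1e-9):
    """Toy NMF: multiplicative updates that decrease ||X - WH|| (Frobenius norm)
    while keeping every entry of W and H non-negative."""
    n, p = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, m))                       # step 1: pick rank m, init W, H > 0
    H = rng.random((m, p))
    for _ in range(n_iter):                      # step 2: minimize the Frobenius norm
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H                                  # step 3: W = new features, H = loadings

# toy usage on a small non-negative matrix
X = np.random.default_rng(1).random((20, 8))
W, H = nmf_multiplicative(X, m=3)
print(np.linalg.norm(X - W @ H))                 # reconstruction error
```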

Apply NMF to the flower image data

NMF is a slower algorithm than PCA, so I chose to start with a small rank (m = 100). Unlike PCA, NMF doesn't have a comparable notion of explained variance, but we can look at the reconstruction error, which measures how well the factorized matrices approximate the original one.
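In practice I would use scikit-learn here rather than hand-rolled updates. A sketch, reusing the flattened pixel matrix X from the PCA section (raw pixel intensities are already non-negative, which is what NMF requires; the init and iteration settings are my own choices):

```python
from sklearn.decomposition import NMF

# X is the (n_images, 49152) matrix of flattened pixel values, all non-negative
nmf = NMF(n_components=100, init="nndsvda", max_iter=300)
W = nmf.fit_transform(X)            # (n_images, 100): the new features
H = nmf.components_                 # (100, 49152): the loadings shown below

# Frobenius-norm distance between X and the approximation W @ H
print(nmf.reconstruction_err_)
```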

Similarly, we can take a look at the components and the reconstructed image after NMF transformation.
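Below is a rough sketch of how the two figures that follow can be produced from the W and H above (the display scaling is my own choice, assuming 0-255 pixel values):

```python
import numpy as np
import matplotlib.pyplot as plt

# show the first 8 rows of H, each reshaped back into a 128x128x3 image
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for i, ax in enumerate(axes):
    component = H[i].reshape(128, 128, 3)
    ax.imshow(component / component.max())      # rescale each component for display
    ax.axis("off")
plt.show()

# reconstruct one flower from its 100 NMF features: x ~ W[0] @ H
restored = np.clip(W[0] @ H, 0, 255).reshape(128, 128, 3)
plt.imshow(restored.astype("uint8"))
plt.show()
```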

The first 8 components in the H matrix


Compare a flower image before and after the NMF transformation

The reconstructed flower image is very blurry, but we can still see some flower patterns in the image after the NMF transformation. As mentioned above, if we increase the rank, the reconstructed image will look closer to the original.

Summary & Caution

Both PCA and NMF can be used whenever we want to reduce the number of features. But when you're working with non-negative data like images or audio, NMF can be the better choice because its components are often more interpretable than PCA's when you want to inspect them.

PCA should only be used for numeric data because the transformation doesn't make as much sense if you apply it to categorical data and dummy variables.

Both PCA and NMF are unsupervised learning algorithms, which means they don’t consider the response variable at all. So if you apply them in a supervised learning task, they won’t necessarily give you better predictions, because the information they leave out might be exactly what matters for the prediction.




















