Wednesday, July 27, 2011

PCA and image compression with numpy

In the previous post we saw the princomp function. This function performs principal component analysis (PCA) on an n-by-p data matrix and uses all the p principal components to compute the principal component scores. In this new post, we will see a modified version of princomp where the representation of the original data in the principal component space is computed with fewer than p principal components:
from numpy import mean,cov,cumsum,dot,linalg,size,flipud,argsort

def princomp(A,numpc=0):
 # computing eigenvalues and eigenvectors of covariance matrix
 M = (A-mean(A.T,axis=1)).T # subtract the mean (along columns)
 [latent,coeff] = linalg.eig(cov(M))
 p = size(coeff,axis=1)
 idx = argsort(latent) # sorting the eigenvalues
 idx = idx[::-1]       # in descending order
 # sorting eigenvectors according to the sorted eigenvalues
 coeff = coeff[:,idx]
 latent = latent[idx] # sorting eigenvalues
 if numpc < p and numpc >= 0:
  coeff = coeff[:,range(numpc)] # cutting some PCs if needed
 score = dot(coeff.T,M) # projection of the data in the new space
 return coeff,score,latent
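As a quick sanity check, here is a minimal usage sketch on a small random matrix (toy data, not an image) showing how the numpc argument affects the shapes of the returned matrices:
from numpy import random
X = random.rand(100,5) # toy data: 100 observations of 5 variables
coeff, score, latent = princomp(X,numpc=2)
print(coeff.shape)  # (5, 2): only the first 2 eigenvectors are kept
print(score.shape)  # (2, 100): the data projected onto the first 2 PCs
print(latent.shape) # (5,): all the eigenvalues, sorted in descending order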
The following code uses the new version of princomp to compute the PCA of a matrix that represents a grayscale image. The PCA is computed several times, with the number of principal components increasing in steps of 10, and the script shows the images reconstructed using 50 or fewer principal components (out of 200).
from pylab import imread,subplot,imshow,title,gray,figure,show,NullLocator
A = imread('shakira.jpg') # load an image
A = mean(A,2) # to get a 2-D array
full_pc = size(A,axis=1) # total number of principal components (the width of the image)
i = 1
dist = []
for numpc in range(0,full_pc+10,10): # 0 10 20 ... full_pc
 coeff, score, latent = princomp(A,numpc)
 Ar = dot(coeff,score).T+mean(A,axis=0) # image reconstruction
 # difference in Frobenius norm
 dist.append(linalg.norm(A-Ar,'fro'))
 # showing the pics reconstructed with less than 50 PCs
 if numpc <= 50:
  ax = subplot(2,3,i,frame_on=False)
  ax.xaxis.set_major_locator(NullLocator()) # remove ticks
  ax.yaxis.set_major_locator(NullLocator())
  i += 1 
  imshow(flipud(Ar))
  title('PCs # '+str(numpc))
  gray()

figure()
imshow(flipud(A))
title('numpc FULL')
gray()
show()
The resulting images:


We can see that 40 principal components are already enough to obtain a good approximation of the original image.


At the end of this experiment, we can plot the distance of the reconstructed images from the original image in the Frobenius norm (red curve) and the cumulative sum of the eigenvalues (blue curve), both normalized between 0 and 1. Recall that the cumulative sum of the eigenvalues shows the fraction of the total variance accounted for by the corresponding eigenvectors. The x axis shows the number of eigenvalues/eigenvectors used.
from pylab import plot,axis
from numpy import array
figure()
perc = cumsum(latent)/sum(latent) # fraction of variance accounted for
dist = array(dist)/max(dist) # normalized reconstruction error
plot(range(len(perc)),perc,'b',range(0,full_pc+10,10),dist,'r')
axis([0,full_pc,0,1.1])
show()
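As a side note, the same cumulative curve can be used to pick the number of components automatically; here is a minimal sketch where the 90% variance threshold is just an arbitrary example:
from numpy import argmax
# number of components needed to account for at least 90% of the variance
k = argmax(perc >= 0.9) + 1
print(k)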

13 comments:

  1. I know this is an old post (and I appreciated it as a reference), but I was amused at the following:
    In your code you have
    from numpy import mean,cov,cumsum,dot,linalg,size,flipud
    from pylab import *

    The from pylab import * overwrites *everything* you imported from numpy in the global namespace. =)

    Replies
    1. Hi, thanks for the comment. I usually extract my snippets from bigger programs and I have to rewrite the imports before publication.

  2. JustGlowing, you say in your program:

    if numpc < p or numpc >= 0: # then cut some PCs

    Should it be "and" instead of "or" there?

  3. Could you please explain about this line:

    Ar = dot(coeff,score).T+mean(A,axis=0) # image reconstruction

    I am confused about where you add the previously subtracted mean of the original matrix to the dot product. The original data is already projected onto a new vector space, so are we allowed to add the mean that was subtracted from the original data back to the new scores? I also do not understand why we should multiply the scores again by the eigenvectors. By this point we have the sorted scores, and taking the first, second, etc. scores gives us the data projected onto PC1, PC2, ... so why didn't you just take the first 'n' scores in order to show the data in the reduced space?

    Replies
    1. Hi Sinooshka,

      In the line you pointed out we rebuild the original data matrix. The reasoning behind that line of code is the following. To compute the scores we use the equation:

      P = V^T*M (here, P is score, V is coeff, the matrix of eigenvectors, and M is our centered data matrix)

      To get M back, one could use

      M = (V^T)^-1*P

      but, since V is orthonormal (which means that V^-1 = V^T), we can compute M as

      M = V*P (which corresponds to dot(coeff,score); the final .T simply restores the original orientation of the data)

      Now, since PCA requires that the data are centered, the mean is subtracted from all the data points inside princomp. Then, to rebuild the data, we need to add the mean back.
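      For concreteness, here is a small check on toy data showing both facts at once: when all components are kept, the eigenvector matrix is orthonormal, and adding the mean back recovers the original matrix.

      from numpy import allclose, dot, eye, mean, random
      X = random.rand(50,4) # toy data: 50 observations of 4 variables
      coeff, score, latent = princomp(X,numpc=4) # keep all 4 components
      print(allclose(dot(coeff.T,coeff), eye(4))) # True: V^T*V = I, so V is orthonormal
      Xr = dot(coeff,score).T + mean(X,axis=0) # V*P, transposed back, plus the column means
      print(allclose(X, Xr)) # True: the original data is recovered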

  4. Thank you for the clarification, JustGlowing!
    I also have another question:
    What if we want to apply PCA to the original image with all 3 bands? In that case, how are we going to define our latent variables and, consequently, the coefficients?

    Replies
    1. In this case you have three different matrices and you need to apply the PCA on each matrix separately.
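      A rough sketch of this per-band approach (the file name and the number of components kept are only placeholders):

      from pylab import imread
      from numpy import dstack, dot, mean
      rgb = imread('photo.jpg') # hypothetical RGB image, shape (rows, cols, 3)
      bands = []
      for b in range(3):
          band = rgb[:,:,b] # one band as a 2-D matrix
          coeff, score, latent = princomp(band,20) # e.g. keep 20 PCs per band
          bands.append(dot(coeff,score).T + mean(band,axis=0)) # rebuild the band
      rgb_r = dstack(bands) # stack the three reconstructed bands back together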

  5. OK, now let's say that I have a cube (a hyperspectral image) with not 3 bands but, say, 200 bands... now I want to reduce the number of these bands to, say, 5 bands... What I need to do is to roll out each band (2D image) into a 1D vector and create an (n x 200) matrix from them... now I want to do PCA or any other dimension reduction method on this data. The question is: is that a correct way of using PCA on this data?

    Replies
    1. This is a different kind of problem. You should take a look at the literature in this field.

  6. Just in case others have the same question:

    In order to apply conventional PCA to a hypercube, it is necessary to ‘unfold’ the hypercube into a two-dimensional matrix in which each row represents the spectrum of 1 pixel.

    http://onlinelibrary.wiley.com/doi/10.1002/cem.1127/pdf
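    The unfolding itself is a one-line reshape; here is a small sketch with made-up dimensions:

    from numpy import random, reshape
    cube = random.rand(64,64,200) # stand-in for a real hypercube: 64x64 pixels, 200 bands
    rows, cols, nbands = cube.shape
    unfolded = reshape(cube, (rows*cols, nbands)) # each row is the spectrum of one pixel
    coeff, score, latent = princomp(unfolded,5) # e.g. reduce the 200 bands to 5 components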

  7. why do you use

    M = (A-mean(A.T,axis=1)).T

    for example

    A = np.array([[1,2],[2,3],[7,6]])
    array([[1, 2],
           [2, 3],
           [7, 6]])

    M = (A-np.mean(A.T,axis=1)).T

    array([[-2.33333333, -1.33333333,  3.66666667],
           [-1.66666667, -0.66666667,  2.33333333]])

    so the matrix seems to change dimensions

    Replies
    1. Yes, it changes dimensions, but that's just the result of the transposition (see the shape check at the end of this thread).

    2. But why do we need to transpose the matrix?

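    The shape check mentioned above: numpy's cov treats each row as a variable by default, so the data is transposed to put the two variables on the rows before the covariance matrix is computed.

    import numpy as np
    A = np.array([[1,2],[2,3],[7,6]]) # 3 observations of 2 variables
    M = (A - np.mean(A.T,axis=1)).T   # centered and transposed
    print(M.shape)          # (2, 3): one row per variable, one column per observation
    print(np.cov(M).shape)  # (2, 2): the covariance matrix of the 2 variables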