## Wednesday, July 27, 2011

### PCA and image compression with numpy

In the previous post we have seen the princomp function. This function performs principal components analysis (PCA) on the n-by-p data matrix and uses all the p principal component to computed the principal component scores. In this new post, we will see a modified version of the princomp where the representation of the original data in the in the principal component space is computed with less than p principal components:
```from numpy import mean,cov,cumsum,dot,linalg,size,flipud

def princomp(A,numpc=0):
# computing eigenvalues and eigenvectors of covariance matrix
M = (A-mean(A.T,axis=1)).T # subtract the mean (along columns)
[latent,coeff] = linalg.eig(cov(M))
p = size(coeff,axis=1)
idx = argsort(latent) # sorting the eigenvalues
idx = idx[::-1]       # in ascending order
# sorting eigenvectors according to the sorted eigenvalues
coeff = coeff[:,idx]
latent = latent[idx] # sorting eigenvalues
if numpc < p and numpc >= 0:
coeff = coeff[:,range(numpc)] # cutting some PCs if needed
score = dot(coeff.T,M) # projection of the data in the new space
return coeff,score,latent```
The following code uses the new version of the princomp to compute the PCA of a matrix that represents an image in gray scale. The PCA is computed ten times with an increasing number of principal components. The script show the images reconstructed using less than 50 principal components (out of 200).
```from pylab import imread,subplot,imshow,title,gray,figure,show,NullLocator
A = mean(A,2) # to get a 2-D array
full_pc = size(A,axis=1) # numbers of all the principal components
i = 1
dist = []
for numpc in range(0,full_pc+10,10): # 0 10 20 ... full_pc
coeff, score, latent = princomp(A,numpc)
Ar = dot(coeff,score).T+mean(A,axis=0) # image reconstruction
# difference in Frobenius norm
dist.append(linalg.norm(A-Ar,'fro'))
# showing the pics reconstructed with less than 50 PCs
if numpc <= 50:
ax = subplot(2,3,i,frame_on=False)
ax.xaxis.set_major_locator(NullLocator()) # remove ticks
ax.yaxis.set_major_locator(NullLocator())
i += 1
imshow(flipud(Ar))
title('PCs # '+str(numpc))
gray()

figure()
imshow(flipud(A))
title('numpc FULL')
gray()
show()
```
The resulting images:

We can see that 40 principal components are enough to reconstruct the original image.

At the end of this experiment, we can plot the distance of the reconstructed images from the original image in Frobenius norm (red curve) and the cumulative sum of the eigenvalues (blue curve). Recall that the cumulative sum of the eigenvalues shows the level of variance accounted by each of the corresponding eigenvectors. On the x axis there is the number of eigenvalues/eigenvectors used.
```from pylab import plot,axis
figure()
perc = cumsum(latent)/sum(latent)
dist = dist/max(dist)
plot(range(len(perc)),perc,'b',range(0,full_pc+10,10),dist,'r')
axis([0,full_pc,0,1.1])
show()
```

## Friday, July 22, 2011

### Principal Component Analysis with numpy

The following function is a three-line implementation of the Principal Component Analysis (PCA). It is inspired by the function princomp of the matlab's statistics toolbox.
```from numpy import mean,cov,double,cumsum,dot,linalg,array,rank
from pylab import plot,subplot,axis,stem,show,figure

def princomp(A):
""" performs principal components analysis
(PCA) on the n-by-p data matrix A
Rows of A correspond to observations, columns to variables.

Returns :
coeff :
is a p-by-p matrix, each column containing coefficients
for one principal component.
score :
the principal component scores; that is, the representation
of A in the principal component space. Rows of SCORE
correspond to observations, columns to components.
latent :
a vector containing the eigenvalues
of the covariance matrix of A.
"""
# computing eigenvalues and eigenvectors of covariance matrix
M = (A-mean(A.T,axis=1)).T # subtract the mean (along columns)
[latent,coeff] = linalg.eig(cov(M)) # attention:not always sorted
score = dot(coeff.T,M) # projection of the data in the new space
return coeff,score,latent
```
(In this other post you can find an updated version of this function).
In the following test a 2D dataset wil be used. The result of this test is a plot with the two principal components (dashed lines), the original data (blue dots) and the new data (red stars). As we expected the first principal component describes the direction of maximum variance and the second is orthogonal to the first.
```A = array([ [2.4,0.7,2.9,2.2,3.0,2.7,1.6,1.1,1.6,0.9],
[2.5,0.5,2.2,1.9,3.1,2.3,2,1,1.5,1.1] ])

coeff, score, latent = princomp(A.T)

figure()
subplot(121)
# every eigenvector describe the direction
# of a principal component.
m = mean(A,axis=1)
plot([0, -coeff[0,0]*2]+m[0], [0, -coeff[0,1]*2]+m[1],'--k')
plot([0, coeff[1,0]*2]+m[0], [0, coeff[1,1]*2]+m[1],'--k')
plot(A[0,:],A[1,:],'ob') # the data
axis('equal')
subplot(122)
# new data
plot(score[0,:],score[1,:],'*g')
axis('equal')
show()
```

In this second example princomp(.) is tested on a 4D dataset. In this example the matrix of the data is rank deficient and only the first two components are necessary to bring the information of the entry dataset.
```A = array([[-1, 1, 2, 2],
[-2, 3, 1, 0],
[ 4, 0, 3,-1]],dtype=double)

coeff, score, latent = princomp(A)
perc = cumsum(latent)/sum(latent)
figure()
# the following plot shows that first two components
# account for 100% of the variance.
stem(range(len(perc)),perc,'--b')
axis([-0.3,4.3,0,1.3])
show()
print 'the principal component scores'
print score.T # only the first two columns are nonzero
print 'The rank of A is'
print rank(A)  # indeed, the rank of A is 2
```

```Coefficients for principal components
[[  1.464140e+00   1.588382e+00   0.000000e+00  -4.440892e-16]
[  2.768170e+00  -1.292503e+00  -2.775557e-17   6.557254e-16]
[ -4.232310e+00  -2.958795e-01   1.110223e-16  -3.747002e-16]]
The rank of A is
2
```

## Monday, July 18, 2011

### Jukowsi Transformation

The following code applies the Jukowsi transformation to various circles on the complex plane.
```from pylab import plot,show,axis,subplot
import cmath
from numpy import *

"""Returns an array z where every z[i] is a point
on the circle's circumference on the complex plane"""
teta = linspace(0,pi*2,150)
z = []
for a in teta:
z.append(zc)
return array( z, dtype=complex )

def jukowsi(z,L):
return z+L/z # asimmetric Jukowski transformation

# let's try the transformation on three different circles
subplot(131)
z = cplxcircm(1,complex(0.5,0)) #1st circle
w = jukowsi(z,1) # applying the trasformation
plot(real(z),imag(z),'b',real(w),imag(w),'r')
axis('equal')

subplot(132)
z = cplxcircm(0.8,complex(0.4,0.3))
w = jukowsi(z,0.188)
plot(real(z),imag(z),'b',real(w),imag(w),'r')
axis('equal')

subplot(133)
z = cplxcircm(1.1,complex(0.1,0.0))
w = jukowsi(z,1)
plot(real(z),imag(z),'b',real(w),imag(w),'r')
axis('equal')
show()
```
The script shows the following figure. Every subplot shows two curves on the complex plane (x real values, y imaginary values), the blue curve is the starting curve and the red is the transformation.

## Thursday, July 14, 2011

### Polynomial curve fitting

We have seen already how to a fit a given set of points minimizing an error function, now we will see how to find a fitting polynomial for the data using the function polyfit provided by numpy:
```from numpy import *
import pylab

# data to fit
x = random.rand(6)
y = random.rand(6)

# fit the data with a 4th degree polynomial
z4 = polyfit(x, y, 4)
p4 = poly1d(z4) # construct the polynomial

z5 = polyfit(x, y, 5)
p5 = poly1d(z5)

xx = linspace(0, 1, 100)
pylab.plot(x, y, 'o', xx, p4(xx),'-g', xx, p5(xx),'-b')
pylab.legend(['data to fit', '4th degree poly', '5th degree poly'])
pylab.axis([0,1,0,1])
pylab.show()

```
Let's see the two polynomials:

## Tuesday, July 12, 2011

### Dice rolling experiment

If we roll a die a large number of times, and we compute the mean and variance, as exaplained here, we’d expect to obtain a mean = 3.5 and a variance = 2.916. Let's simulate that with a script:
```import pylab
import math

# Rolling the die 1000 times
v = pylab.randint(1,7,size=(1000))

print 'mean',pylab.mean(v)
print 'variance',pylab.var(v)

pylab.hist(v, bins=6) # histogram of the outcoming
pylab.xlim(1,6)
pylab.show()
```
Here's the result:
```mean 3.435
variance 2.781775
```

## Monday, July 11, 2011

### Prime factor decomposition of a number

The following function compute the prime factors (PFs) of an integer.
```from math import floor

def factors(n):
result = []
for i in range(2,n+1): # test all integers between 2 and n
s = 0;
while n/i == floor(n/float(i)): # is n/i an integer?
n = n/float(i)
s += 1
if s > 0:
for k in range(s):
result.append(i) # i is a pf s times
if n == 1:
return result

# test
print factors(90)
```
The result will be
```[2, 3, 3, 5]
```
This means that 90 is equal to 2*3*3*5.

## Wednesday, July 6, 2011

### How to use reflection

A simple code snippet that use reflection to print names of attributes, methods and doc strings:
```class Obj:
""" An object that use reflection """

def __init__(self,name):
""" the constructor of this object """
self.name = name

def print_methods(self):
""" print all the methods of this object and their doc string"""
print '\n* Methods *'
for names in dir(self):
attr = getattr(self,names)
if callable(attr):
print names,':',attr.__doc__

def print_attributes(self):
""" print all the attributes of this object and their value """
print '* Attributes *'
for names in dir(self):
attr = getattr(self,names)
if not callable(attr):
print names,':',attr

def print_all(self):
""" calls all the methods of this object """
for names in dir(self):
attr = getattr(self,names)
if callable(attr) and names != 'print_all' and names != '__init__':
attr() # calling the method

o = Obj('the my object')
o.print_all()
```
Which gives the following output:
```* Attributes *
__doc__ :  An object that use reflection
__module__ : __main__
name : the my object

* Methods *
__init__ :  the constructor of this object
print_all :  calls all the methods of this object
print_my_attributes :  print all the attributes of this object
print_my_methods :  print all the methods of this object
```

## Monday, July 4, 2011

### How to plot biorhythm

The following script plot the biorhythm of a person born in 14/3/1988 in a range of 20 days.
```from datetime import date
import matplotlib.dates
from pylab import *
from numpy import array,sin,pi

t0 = date(1988,3,14).toordinal()
t1 = date.today().toordinal()
t = array(range((t1-10),(t1+10))) # range of 20 days

y = 100*[sin(2*pi*(t-t0)/23),  # Physical
sin(2*pi*(t-t0)/28),  # Emotional
sin(2*pi*(t-t0)/33)]; # Intellectual

# converting ordinals to date
label = []
for p in t:
label.append(date.fromordinal(p))

fig = figure()
ax = fig.gca()
plot(label,y[0],label,y[1],label,y[2])
legend(['Physical', 'Emotional', 'Intellectual'])
# formatting the dates on the x axis
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%d/%b'))

show()
```
The resulting graph:

## Friday, July 1, 2011

### Approximating pi

This script uses the following formula
to approximate the value of pi with a fixed number of correct digits.
```import math
def pi(digits):
k = 0
pi = 0.0
e = 1.0
tol = pow(10,-digits) # is minimum error that we want to accept
while e > tol:
pi_old = pi
pi += (2*pow(-1,k)*pow(3,.5-k))/(2*k+1)
e = abs(pi-pi_old) # current error
k += 1
print '%0.16f %20.16f' % (pi,e)
print '\nerror',e,'\niterations ',k
print '%0.16f' % pi,'result'
print '%0.16f' % math.pi,'real pi'

pi(8) # approximating pi with 8 correcet digits
```
During the execution you can see the current approximated value on the left column and the difference with the approximation at the previous step on the right column. At the end, the difference between the approximated value and the value provided by the math python module will be printed.
```\$ python pi.py
3.4641016151377544   3.4641016151377544
3.0792014356780038   0.3849001794597506
3.1561814715699539   0.0769800358919501
3.1378528915956800   0.0183285799742738
3.1426047456630846   0.0047518540674045
3.1413087854628832   0.0012959602002014
3.1416743126988376   0.0003655272359544
3.1415687159417840   0.0001055967570536
3.1415997738115058   0.0000310578697218
3.1415905109380802   0.0000092628734256
3.1415933045030817   0.0000027935650015
3.1415924542876463   0.0000008502154354
3.1415927150203800   0.0000002607327336
3.1415926345473140   0.0000000804730660
3.1415926595217138   0.0000000249743999
3.1415926517339976   0.0000000077877162

error 7.78771624965e-09
iterations  16
3.1415926517339976 result
3.1415926535897931 real pi
```