The Glowing Python: Distribution fitting with scipy

Friday, July 20, 2012

Distribution fitting with scipy

Distribution fitting is the procedure of selecting the statistical distribution that best fits a dataset generated by some random process. In this post we will see how to fit a distribution using the techniques implemented in the SciPy library.
This is the first snippet:

from scipy.stats import norm
from numpy import linspace
from pylab import plot,show,hist,figure,title

# picking 150 samples from a normal distribution
# with mean 0 and standard deviation 1
samp = norm.rvs(loc=0,scale=1,size=150) 

param = norm.fit(samp) # distribution fitting

# now, param[0] and param[1] are the mean and 
# the standard deviation of the fitted distribution
x = linspace(-5,5,100)
# fitted distribution
pdf_fitted = norm.pdf(x,loc=param[0],scale=param[1])
# original distribution
pdf = norm.pdf(x)

title('Normal distribution')
plot(x,pdf_fitted,'r-',x,pdf,'b-')
hist(samp,density=True,alpha=.3) # density=True replaces the removed normed argument
show()

The result should be as follows

In the code above, a dataset of 150 samples has been created using a normal distribution with mean 0 and standard deviation 1, then a fitting procedure has been applied to the data. In the figure we can see the original distribution (blue curve) and the fitted distribution (red curve), and we can observe that they are very similar.
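The fit can also be checked numerically rather than by eye. The following is a minimal sketch, not part of the original snippet: the fixed seed (`random_state=0`) and the names `mu`, `sigma`, `same_mean` and `same_std` are assumptions introduced here for illustration. For the normal distribution, the maximum likelihood estimates returned by `norm.fit` should coincide with the sample mean and the biased sample standard deviation.

```python
import numpy as np
from scipy.stats import norm

# the seed is an assumption added here so the run is reproducible
samp = norm.rvs(loc=0, scale=1, size=150, random_state=0)

# maximum likelihood estimates of the mean and standard deviation
mu, sigma = norm.fit(samp)

print("fitted mean: %.3f" % mu)
print("fitted standard deviation: %.3f" % sigma)

# compare with the sample statistics
# (np.std uses ddof=0 by default, i.e. the biased estimator)
same_mean = np.isclose(mu, np.mean(samp))
same_std = np.isclose(sigma, np.std(samp))
print(same_mean, same_std)
```

With only 150 samples the estimates will not be exactly 0 and 1, but they should be close to them.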
Let's do the same with a Rayleigh distribution:

from scipy.stats import norm,rayleigh

samp = rayleigh.rvs(loc=5,scale=2,size=150) # samples generation

param = rayleigh.fit(samp) # distribution fitting

x = linspace(5,13,100)
# fitted distribution
pdf_fitted = rayleigh.pdf(x,loc=param[0],scale=param[1])
# original distribution
pdf = rayleigh.pdf(x,loc=5,scale=2)

title('Rayleigh distribution')
plot(x,pdf_fitted,'r-',x,pdf,'b-')
hist(samp,density=True,alpha=.3) # density=True replaces the removed normed argument
show()

The resulting plot:

As expected, the two distributions are very close.
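
The closeness of the two Rayleigh curves can also be checked numerically. The sketch below is not part of the original post: it assumes a fixed seed (`random_state=0`) for reproducibility and uses `scipy.stats.kstest`, a Kolmogorov-Smirnov test swapped in here as one possible way to quantify the agreement. Because the parameters are estimated from the same sample that is being tested, the p-value tends to be optimistic and should be read only as a rough indication.

```python
from scipy.stats import rayleigh, kstest

# the seed is an assumption added here so the run is reproducible
samp = rayleigh.rvs(loc=5, scale=2, size=150, random_state=0)

# rayleigh.fit returns the estimated (loc, scale)
loc, scale = rayleigh.fit(samp)
print("fitted loc: %.3f (true value 5)" % loc)
print("fitted scale: %.3f (true value 2)" % scale)

# Kolmogorov-Smirnov test of the sample against the fitted distribution;
# a small statistic means the empirical and fitted CDFs are close
stat, pvalue = kstest(samp, 'rayleigh', args=(loc, scale))
print("KS statistic: %.3f, p-value: %.3f" % (stat, pvalue))
```

To fit an existing array of measurements instead of random samples, the same `fit` call can be applied directly to that array.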

21 comments:

Anonymous, July 21, 2012 at 1:52 AM
or you could plug your samples into http://zunzun.com/ :D
Anonymous, July 22, 2012 at 12:42 PM
The actual direct link would be:

http://zunzun.com/StatisticalDistributions/1/
David, August 9, 2012 at 2:00 PM
Hurray! Been missing Glowing Python posts. Happy to see a new one, learn something new.
Anonymous, August 7, 2013 at 3:06 PM
I think this does nothing more than calculate the mean and standard deviation of samp:
>>> import numpy as np
>>> from scipy.stats import norm
>>> samp = norm.rvs(loc=0,scale=1,size=150)
>>> param = norm.fit(samp)
>>> mu = np.mean(samp)
>>> sigma = np.std(samp)
>>> mu==param[0]
True
>>> sigma==param[1]
True
Anonymous, December 10, 2013 at 1:58 AM
If it helps, some code for doing this w/o normalizing, which plots the gaussian fit over the real histogram:

from scipy.stats import norm
from numpy import linspace
from pylab import plot,show,hist

def PlotHistNorm(data, log=False):
    # distribution fitting
    param = norm.fit(data)
    mean = param[0]
    sd = param[1]

    # Set large limits
    xlims = [-6*sd+mean, 6*sd+mean]

    # Plot histogram (raw counts, not normalized)
    histdata = hist(data,bins=12,alpha=.3,log=log)

    # Generate X points
    x = linspace(xlims[0],xlims[1],500)

    # Get Y points via Normal PDF with fitted parameters
    pdf_fitted = norm.pdf(x,loc=mean,scale=sd)

    # Get bin edges from the histogram data (the bins are equally wide)
    edges = histdata[1]

    # Bin width is the distance between two consecutive edges
    binwidth = edges[1] - edges[0]

    # Scale the fitted PDF by area of the histogram
    pdf_fitted = pdf_fitted * (len(data) * binwidth)

    # Plot PDF
    plot(x,pdf_fitted,'r-')
gaomrx, June 23, 2015 at 9:52 PM
Is there a way to fit data to an exponential distribution such that it maximizes the entropy H(p_i) = - sum p_i*log(p_i) where p_i is the probability of a given event?
Unknown, June 21, 2019 at 1:49 AM
Hello, I have precipitation data and I am trying to fit a generalized extreme value distribution to it before computing percentiles. Will I get new data if I fit this distribution and then use it for my analysis? Thanks
Unknown, July 8, 2019 at 6:32 PM
Hello, I noticed this code fits a distribution over random data. If we had an array of data that we wanted to fit our distribution over, how would we do that?
Unknown, January 14, 2020 at 5:39 PM
I used this code and wanted to use my own data instead of random data, but I got this picture.
Unknown, January 14, 2020 at 5:41 PM
The normal distribution is the line!