numpy - Kurtosis, skewness of a bar graph? - Python
What is an efficient method for determining the skew/kurtosis of a bar graph in Python? Considering that bar graphs are not binned (unlike histograms), this question may not make a lot of sense, but what I am trying to do is determine the symmetry of a graph's height vs. distance (rather than frequency vs. bins). In other words, given values of heights (y) measured along a distance (x), i.e.
y = [6.18, 10.23, 33.15, 55.25, 84.19, 91.09, 106.6, 105.63, 114.26, 134.24, 137.44, 144.61, 143.14, 150.73, 156.44, 155.71, 145.88, 120.77, 99.81, 85.81, 55.81, 49.81, 37.81, 25.81, 5.81]
x = [0.03, 0.08, 0.14, 0.2, 0.25, 0.31, 0.36, 0.42, 0.48, 0.53, 0.59, 0.64, 0.7, 0.76, 0.81, 0.87, 0.92, 0.98, 1.04, 1.09, 1.15, 1.2, 1.26, 1.32, 1.37]
What is the symmetry of the height (y) distribution (skewness) and its peakedness (kurtosis) as measured over distance (x)? Are skewness/kurtosis appropriate measurements for determining the normal distribution of real values? Or does scipy/numpy offer a similar type of measurement?
I can achieve a skew/kurtosis estimate of the height (y) frequency values binned along distance (x) with the following:
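from itertools import chain
from scipy import stats
from matplotlib.pyplot import hist, xlabel, ylabel

# repeat each x value round(y) times so that height acts as a frequency
freq = list(chain(*[[x_v] * int(round(y_v)) for x_v, y_v in zip(x, y)]))
x.append(x[-1] + x[0])  # add one more bin edge
hist(freq, bins=x)
ylabel("height frequency")
xlabel("distance (km) bins")
print("skewness, kurtosis:", stats.describe(freq)[4:])

skewness, kurtosis: (-0.019354300509997705, -0.7447085398785758)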
In this case the height distribution is symmetrical (skew -0.02) around the midpoint distance and is characterized as a platykurtic (-0.74 kurtosis, i.e. broad) distribution.
Considering that I multiply each occurrence of an x value by its height y to create a frequency, the size of the resulting list can get very large. I was wondering if there is a better method to approach this problem? I suppose one could always try to normalize the dataset y to a range of perhaps 0 - 100 without losing too much information on the dataset's skew/kurtosis.
This isn't a Python question, nor really a programming question, but the answer is simple nonetheless. Instead of skew and kurtosis, let's first consider the easier values based on the lower moments: the mean and the standard deviation. To make it concrete, and to fit your question, let's assume your data looks like:
x = 3, 3, 5, 5, 5, 7 = x1, x2, x3 ....
which would give a "bar graph" that looks like:
{3:2, 5:3, 7:1} = {k1:p1, k2:p2, k3:p3}
The mean, u, is given by
E[x] = (1/n) * (x1 + x2 + x3 + ...) = (1/n) * (3 + 3 + 5 + ...)
Our data, however, has repeated values, so this can be rewritten as
E[x] = (1/n) * (p1*k1 + p2*k2 + ...) = (1/n) * (2*3 + 3*5 + 1*7)
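As a quick check of this reduction, here is a minimal sketch using NumPy. The variable names (`raw`, `k`, `p`) are only illustrative; they mirror the x, k and p of the toy example above.

```python
import numpy as np

raw = np.array([3, 3, 5, 5, 5, 7])   # expanded data x1, x2, ...
k = np.array([3, 5, 7])              # distinct values k1, k2, k3
p = np.array([2, 3, 1])              # how often each value occurs

n = p.sum()                          # total number of data points (6)
mean_raw = raw.sum() / n             # (1/n) * (x1 + x2 + ...)
mean_weighted = (p * k).sum() / n    # (1/n) * (p1*k1 + p2*k2 + ...)

print(mean_raw, mean_weighted)       # both forms give the same mean
```

`np.average(k, weights=p)` computes the same weighted mean in one call.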
The next term, the standard deviation, s, is simply
s = sqrt(E[(x-u)^2]) = sqrt((1/n)*( (x1-u)^2 + (x2-u)^2 + ...))
But we can apply the same reduction to the E[(x-u)^2] term and write it as
E[(x-u)^2] = (1/n)*( p1*(k1-u)^2 + p2*(k2-u)^2 + ... ) = (1/6)*( 2*(3-u)^2 + 3*(5-u)^2 + 1*(7-u)^2 )
which means we don't have to make multiple copies of each data item to do the sum, as was done in the question.
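The same check can be sketched for the second moment. This is only an illustration on the toy data, with hypothetical variable names; `np.var` is used with its default `ddof=0`, which matches the (1/n) definition used here.

```python
import numpy as np

raw = np.array([3, 3, 5, 5, 5, 7])   # expanded data
k = np.array([3, 5, 7])              # distinct values
p = np.array([2, 3, 1])              # counts (bar heights)

n = p.sum()
u = (p * k).sum() / n                          # weighted mean
var_weighted = (p * (k - u) ** 2).sum() / n    # (1/n) * sum(p_i * (k_i - u)^2)
s = np.sqrt(var_weighted)                      # standard deviation

print(var_weighted, np.var(raw))     # the two variances agree
```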
The skew and kurtosis are quite simple at this point:
skew = E[(x-u)^3] / (E[(x-u)^2])^(3/2)
kurtosis = ( E[(x-u)^4] / (E[(x-u)^2])^2 ) - 3
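Putting the pieces together, the formulas above could be sketched as a small helper. The function name `weighted_skew_kurtosis` is hypothetical (it is not a NumPy or SciPy function); it treats the heights as weights on the positions, and the comparison against `scipy.stats.skew` and `scipy.stats.kurtosis` relies on their defaults (biased estimators, excess kurtosis).

```python
import numpy as np
from scipy import stats

def weighted_skew_kurtosis(x, y):
    """Skewness and excess kurtosis of positions x weighted by heights y."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(y, dtype=float)
    w = w / w.sum()                      # normalise weights so they sum to 1
    u = np.sum(w * x)                    # weighted mean
    m2 = np.sum(w * (x - u) ** 2)        # E[(x-u)^2]
    m3 = np.sum(w * (x - u) ** 3)        # E[(x-u)^3]
    m4 = np.sum(w * (x - u) ** 4)        # E[(x-u)^4]
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

k = [3, 5, 7]                            # toy example: distinct values
p = [2, 3, 1]                            # toy example: counts
skew, kurt = weighted_skew_kurtosis(k, p)

raw = np.repeat(k, p)                    # expanded data, for comparison only
print(skew, stats.skew(raw))
print(kurt, stats.kurtosis(raw))
```

For the data in the question, the call would be `weighted_skew_kurtosis(x, y)` with the original 25-element lists. Because the weights are normalised, they do not have to be integers, and multiplying every height by the same constant leaves both results unchanged.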