Loading [MathJax]/jax/output/HTML-CSS/jax.js

Saturday, June 4, 2011

John #Canny's function #summarizer formulae #Pig #MapReduce

Credit for the idea to Ted Dunning, but I wanted to log the final formulae set i used for MapReduce and Pig implementation of these kind of summarizers for my future reference. Here it goes.

Adaptation

s(t)=ket(k1)ekk1t,k>1.

This s(t)  (which Ted Dunning calls 'Canny's filter') has desirable properties of s(0)=0  and s(0)=1.

In our case, it takes form of 

y(x)=kex/α(k1)ekx/α(k1)

Given margin m  and decay time Δt , so that 

y(Δt)=m, 

solving for α, we get estimate for α  assuming margin m1  as 
αΔtln(m/k). 

I take Δt as constructor input with good default values of k=4 and m=0.01.

Iterative update tn+1tn 

{{π0=1,ν0=1,π=e(tn+1tn)/α;ν=ek(tn+1tn)/α(k1);wn+1=1+πwn;vn+1=1+νvn;sn+1=xn+1+πsn;un+1=xn+1+νun;tn+1=tn+1.

Then average μ=ks(k1)ukw(k1)v.

Persisted summarizer state thus consists of parameters {w,v,s,u,t,k,α} .  Parameters k  and α  are set in constructor and thus don't evolve.


 Iterative update in-the-past tn+1<tn

{{π0=1,ν0=1;π=e(tntn+1)/α,ν=ek(tntn+1)/α(k1);wn+1=wn+π,vn+1=vn+ν,sn+1=sn+πxn+1,un+1=un+νxn+1,tn+1=tn.

Combining Canny summarizers

Combining two summarizer states S1,S2  having observed two disjoint sets of original observation superset (for algebraic pig UDF, MapReduce use):

Case t2t1 :

{t=t2;π=e(t2t1)/α;ν=ek(t2t1)/α(k1);s=s2+s1π;u=u2+u1ν;w=w2+w1π;v=v2+v1ν.

Merging in positive samples only for binomial cases

Case t(1)t(0) :

{t=t(1);π=e(t(1)t(0))/α;ν=ek(t(1)t(0))/α(k1);s=s(1)+s(0)π;u=u(1)+u(0);w=w(0)π;v=v(0)ν.

Case t(1)<t(0) :

{t=t(0);(nochange)π=e(t(0)t(1))/α;ν=ek(t(0)t(1))/α(k1);s=s(0)+s(1)π;u=u(0)+u(1)ν;w=w(0);(nochange)v=v(0)(nochange).

Pecularities of Canny rate summarizer implementation

Assuming π0=1,ν0=1 , but otherwise

π=exp(|tn+1tn|α),ν=exp(k|tn+1tn|α(k1)),

for the case of tn+1tn :

{wn+1=πwn+α(1π);vn+1=νvn+α(k1)(1ν)k;

 and for the case of tn+1<tn :

{wn+1=max(wn,α(1π));vn+1=max(vn,α(k1)(1ν)k).

(of course in the latter case if wnα(1π)vnα(k1)(1ν)k , so we can optimize a little based on that).

The logic of advancing sn+1  and un+1 is the same as in Canny average summarizer.

 Pecularities of combining Canny rate summarizers

Assuming both summarizers have nonzero history and α(1)=α(2)=α,  and k(1)=k(2)=k  as a precondition, combining routine looks like

π=exp(|t(2)t(1)|α),ν=exp(k|t(2)t(1)n|α(k1)),{{t=t(1)s=s(1)+πs(2)u=u(1)+νu(2),t(1)t(2);{t=t(2)s=s(2)+πs(1)u=u(2)+νu(1),t(1)<t(2);w=max(w(1),w(2)),allcases;v=max(v(1),v(2)),allcases.

In simple words, here we take longest history of two (expressed by denominator terms w  and v ), and merge numerators based on which history contained the latest observation. Again, we can assume w(1)w(2)v(1)v(2)  to save on comparisons a little bit.

No comments:

Post a Comment