Saturday, June 4, 2011

John #Canny's function #summarizer formulae #Pig #MapReduce

Credit for the idea to Ted Dunning, but I wanted to log the final formulae set i used for MapReduce and Pig implementation of these kind of summarizers for my future reference. Here it goes.

Adaptation

\[s\left(t\right)=ke^{-t}-\left(k-1\right)e^{-\frac{k}{k-1}t},\, k>1. \]

This $s\left(t\right)$  (which Ted Dunning calls 'Canny's filter') has desirable properties of $s'\left(0\right)=0$  and $s\left(0\right)=1 $.

In our case, it takes form of 

\[y\left(x\right)=ke^{-x/\alpha}-\left(k-1\right)e^{-kx/\alpha\left(k-1\right)} \]

Given margin $m$  and decay time $\Delta t$ , so that 

\[y\left(\Delta t\right)=m,\] 

solving for $\alpha$, we get estimate for $\alpha$  assuming margin $m\ll1$  as 
\[\alpha\approx-\frac{\Delta t}{\ln\left(m/k\right)}.\] 

I take $\Delta t$ as constructor input with good default values of $k=4$ and $m=0.01$.

Iterative update $t_{n+1}\geq t_{n}$ 

\[\begin{cases}\begin{cases}\pi_{0}=1,\,\nu_{0}=1,\\\pi=e^{-\left(t_{n+1}-t_{n}\right)/\alpha};\\\nu=e^{-k\left(t_{n+1}-t_{n}\right)/\alpha\left(k-1\right)};\end{cases}\\w_{n+1}=1+\pi w_{n};\\v_{n+1}=1+\nu v_{n};\\s_{n+1}=x_{n+1}+\pi s_{n};\\u_{n+1}=x_{n+1}+\nu u_{n};\\t_{n+1}=t_{n+1}.\end{cases} \]

Then average \[\mu=\frac{ks-\left(k-1\right)u}{kw-\left(k-1\right)v}. \]

Persisted summarizer state thus consists of parameters $\left\{ w,v,s,u,t,k,\alpha\right\}$ .  Parameters $k$  and $\alpha$  are set in constructor and thus don't evolve.


 Iterative update in-the-past $t_{n+1}<t_{n}$

\[\begin{cases}
\begin{cases}
\pi_{0}=1,\,\nu_{0}=1;\\
\pi=e^{-\left(t_{n}-t_{n+1}\right)/\alpha},\\
\nu=e^{-k\left(t_{n}-t_{n+1}\right)/\alpha\left(k-1\right)};
\end{cases}\\
w_{n+1}=w_{n}+\pi,\\
v_{n+1}=v_{n}+\nu,\\
s_{n+1}=s_{n}+\pi x_{n+1},\\
u_{n+1}=u_{n}+\nu x_{n+1},\\
t_{n+1}=t_{n}.
\end{cases} \]

Combining Canny summarizers

Combining two summarizer states $S_{1},\, S_{2}$  having observed two disjoint sets of original observation superset (for algebraic pig UDF, MapReduce use):

Case $t_{2}\geq t_{1}$ :

\[\begin{cases}
t=t_{2};\\
\pi=e^{-\left(t_{2}-t_{1}\right)/\alpha};\\
\nu=e^{-k\left(t_{2}-t_{1}\right)/\alpha\left(k-1\right)};\\
s=s_{2}+s_{1}\pi;\\
u=u_{2}+u_{1}\nu;\\
w=w_{2}+w_{1}\pi;\\
v=v_{2}+v_{1}\nu.
\end{cases} \]

Merging in positive samples only for binomial cases

Case $t^{\left(1\right)}\geq t^{\left(0\right)}$ :

\[\begin{cases}
t=t^{\left(1\right)};\\
\pi=e^{-\left(t^{\left(1\right)}-t^{\left(0\right)}\right)/\alpha};\\
\nu=e^{-k\left(t^{\left(1\right)}-t^{\left(0\right)}\right)/\alpha\left(k-1\right)};\\
s=s^{\left(1\right)}+s^{\left(0\right)}\pi;\\
u=u^{\left(1\right)}+u^{\left(0\right)};\\
w=w^{\left(0\right)}\pi;\\
v=v^{\left(0\right)}\nu.
\end{cases} \]

Case $t^{\left(1\right)}<t^{\left(0\right)}$ :

\[\begin{cases}
t=t^{\left(0\right)}; & \left(no\, change\right)\\
\pi=e^{-\left(t^{\left(0\right)}-t^{\left(1\right)}\right)/\alpha};\\
\nu=e^{-k\left(t^{\left(0\right)}-t^{\left(1\right)}\right)/\alpha\left(k-1\right)};\\
s=s^{\left(0\right)}+s^{\left(1\right)}\pi;\\
u=u^{\left(0\right)}+u^{\left(1\right)}\nu;\\
w=w^{\left(0\right)}; & \left(no\, change\right)\\
v=v^{\left(0\right)} & (no\, change).
\end{cases}  \]

Pecularities of Canny rate summarizer implementation

Assuming $\pi_{0}=1,\,\nu_{0}=1$ , but otherwise

\[\pi=\exp\left(-\frac{\left|t_{n+1}-t_{n}\right|}{\alpha}\right),\]\[\nu=\exp\left(-\frac{k\cdot\left|t_{n+1}-t_{n}\right|}{\alpha\left(k-1\right)}\right),\]

for the case of $t_{n+1}\geq t_{n}$ :

\[\begin{cases}
w_{n+1}=\pi w_{n}+\alpha\left(1-\pi\right);\\
v_{n+1}=\nu v_{n}+\alpha\frac{\left(k-1\right)\left(1-\nu\right)}{k};
\end{cases}\]

 and for the case of $t_{n+1}<t_{n}$ :

\[\begin{cases}
w_{n+1}=\max\left(w_{n},\alpha\left(1-\pi\right)\right);\\
v_{n+1}=\max\left(v_{n},\alpha\frac{\left(k-1\right)\left(1-\nu\right)}{k}\right).
\end{cases} \]

(of course in the latter case if $w_{n}\geq\alpha\left(1-\pi\right)\equiv v_{n}\geq\alpha\frac{\left(k-1\right)\left(1-\nu\right)}{k}$ , so we can optimize a little based on that).

The logic of advancing $s_{n+1}$  and $u_{n+1} $ is the same as in Canny average summarizer.

 Pecularities of combining Canny rate summarizers

Assuming both summarizers have nonzero history and $\alpha^{\left(1\right)}=\alpha^{\left(2\right)}=\alpha$,  and $k^{\left(1\right)}=k^{\left(2\right)}=k$  as a precondition, combining routine looks like

\[\pi=\exp\left(-\frac{\left|t^{\left(2\right)}-t^{(1)}\right|}{\alpha}\right), \]\[\nu=\exp\left(-\frac{k\cdot\left|t^{\left(2\right)}-t_{n}^{\left(1\right)}\right|}{\alpha\left(k-1\right)}\right), \]\[\begin{cases}
\begin{cases}
t=t^{\left(1\right)}\\
s=s^{\left(1\right)}+\pi s^{\left(2\right)}\\
u=u^{\left(1\right)}+\nu u^{\left(2\right)}
\end{cases}, & t^{\left(1\right)}\ge t^{\left(2\right)};\\
\begin{cases}
t=t^{\left(2\right)}\\
s=s^{(2)}+\pi s^{\left(1\right)}\\
u=u^{\left(2\right)}+\nu u^{\left(1\right)}
\end{cases}, & t^{\left(1\right)}<t^{\left(2\right)};\\
w=\max\left(w^{\left(1\right)},w^{\left(2\right)}\right), & \mathrm{all\, cases;}\\
v=\max\left(v^{\left(1\right)},v^{\left(2\right)}\right), & \mathrm{all\, cases.}
\end{cases} \]

In simple words, here we take longest history of two (expressed by denominator terms $w$  and $v$ ), and merge numerators based on which history contained the latest observation. Again, we can assume $w^{\left(1\right)}\ge w^{\left(2\right)}\equiv v^{\left(1\right)}\ge v^{\left(2\right)}$  to save on comparisons a little bit.

No comments:

Post a Comment