Weathering Thru Tech Days: John #Canny's function #summarizer formulae #Pig #MapReduce

Credit for the idea to Ted Dunning, but I wanted to log the final formulae set i used for MapReduce and Pig implementation of these kind of summarizers for my future reference. Here it goes.

Adaptation

$s\left(t\right)=ke^{-t}-\left(k-1\right)e^{-\frac{k}{k-1}t},\, k>1.$

This

$s\left(t\right)$ (which Ted Dunning calls 'Canny's filter') has desirable properties of

$s'\left(0\right)=0$ and

$s\left(0\right)=1$ .

In our case, it takes form of

$y\left(x\right)=ke^{-x/\alpha}-\left(k-1\right)e^{-kx/\alpha\left(k-1\right)}$

Given margin

$m$ and decay time

$\Delta t$ , so that

$y\left(\Delta t\right)=m,$

solving for

$\alpha$ , we get estimate for

$\alpha$ assuming margin

$m\ll1$ as

$\alpha\approx-\frac{\Delta t}{\ln\left(m/k\right)}.$

I take

$\Delta t$ as constructor input with good default values of

$k=4$ and

$m=0.01$ .

Iterative update $t_{n+1}\geq t_{n}$

$\begin{cases}\begin{cases}\pi_{0}=1,\,\nu_{0}=1,\\\pi=e^{-\left(t_{n+1}-t_{n}\right)/\alpha};\\\nu=e^{-k\left(t_{n+1}-t_{n}\right)/\alpha\left(k-1\right)};\end{cases}\\w_{n+1}=1+\pi w_{n};\\v_{n+1}=1+\nu v_{n};\\s_{n+1}=x_{n+1}+\pi s_{n};\\u_{n+1}=x_{n+1}+\nu u_{n};\\t_{n+1}=t_{n+1}.\end{cases}$

Then average

$\mu=\frac{ks-\left(k-1\right)u}{kw-\left(k-1\right)v}.$

Persisted summarizer state thus consists of parameters

$\left\{ w,v,s,u,t,k,\alpha\right\}$ . Parameters

$k$ and

$\alpha$ are set in constructor and thus don't evolve.

Iterative update in-the-past

$t_{n+1}<t_{n}$

$\begin{cases} \begin{cases} \pi_{0}=1,\,\nu_{0}=1;\\ \pi=e^{-\left(t_{n}-t_{n+1}\right)/\alpha},\\ \nu=e^{-k\left(t_{n}-t_{n+1}\right)/\alpha\left(k-1\right)}; \end{cases}\\ w_{n+1}=w_{n}+\pi,\\ v_{n+1}=v_{n}+\nu,\\ s_{n+1}=s_{n}+\pi x_{n+1},\\ u_{n+1}=u_{n}+\nu x_{n+1},\\ t_{n+1}=t_{n}. \end{cases}$

Combining Canny summarizers

Combining two summarizer states

$S_{1},\, S_{2}$ having observed two disjoint sets of original observation superset (for algebraic pig UDF, MapReduce use):

Case

$t_{2}\geq t_{1}$ :

$\begin{cases} t=t_{2};\\ \pi=e^{-\left(t_{2}-t_{1}\right)/\alpha};\\ \nu=e^{-k\left(t_{2}-t_{1}\right)/\alpha\left(k-1\right)};\\ s=s_{2}+s_{1}\pi;\\ u=u_{2}+u_{1}\nu;\\ w=w_{2}+w_{1}\pi;\\ v=v_{2}+v_{1}\nu. \end{cases}$

Merging in positive samples only for binomial cases

Case

$t^{\left(1\right)}\geq t^{\left(0\right)}$ :

$\begin{cases} t=t^{\left(1\right)};\\ \pi=e^{-\left(t^{\left(1\right)}-t^{\left(0\right)}\right)/\alpha};\\ \nu=e^{-k\left(t^{\left(1\right)}-t^{\left(0\right)}\right)/\alpha\left(k-1\right)};\\ s=s^{\left(1\right)}+s^{\left(0\right)}\pi;\\ u=u^{\left(1\right)}+u^{\left(0\right)};\\ w=w^{\left(0\right)}\pi;\\ v=v^{\left(0\right)}\nu. \end{cases}$

Case

$t^{\left(1\right)}<t^{\left(0\right)}$ :

$\begin{cases} t=t^{\left(0\right)}; & \left(no\, change\right)\\ \pi=e^{-\left(t^{\left(0\right)}-t^{\left(1\right)}\right)/\alpha};\\ \nu=e^{-k\left(t^{\left(0\right)}-t^{\left(1\right)}\right)/\alpha\left(k-1\right)};\\ s=s^{\left(0\right)}+s^{\left(1\right)}\pi;\\ u=u^{\left(0\right)}+u^{\left(1\right)}\nu;\\ w=w^{\left(0\right)}; & \left(no\, change\right)\\ v=v^{\left(0\right)} & (no\, change). \end{cases}$

Pecularities of Canny rate summarizer implementation

Assuming

$\pi_{0}=1,\,\nu_{0}=1$ , but otherwise

$\pi=\exp\left(-\frac{\left|t_{n+1}-t_{n}\right|}{\alpha}\right),$

$\nu=\exp\left(-\frac{k\cdot\left|t_{n+1}-t_{n}\right|}{\alpha\left(k-1\right)}\right),$

for the case of

$t_{n+1}\geq t_{n}$ :

$\begin{cases} w_{n+1}=\pi w_{n}+\alpha\left(1-\pi\right);\\ v_{n+1}=\nu v_{n}+\alpha\frac{\left(k-1\right)\left(1-\nu\right)}{k}; \end{cases}$

and for the case of

$t_{n+1}<t_{n}$ :

$\begin{cases} w_{n+1}=\max\left(w_{n},\alpha\left(1-\pi\right)\right);\\ v_{n+1}=\max\left(v_{n},\alpha\frac{\left(k-1\right)\left(1-\nu\right)}{k}\right). \end{cases}$

(of course in the latter case if

$w_{n}\geq\alpha\left(1-\pi\right)\equiv v_{n}\geq\alpha\frac{\left(k-1\right)\left(1-\nu\right)}{k}$ , so we can optimize a little based on that).

The logic of advancing

$s_{n+1}$ and

$u_{n+1}$ is the same as in Canny average summarizer.

Pecularities of combining Canny rate summarizers

Assuming both summarizers have nonzero history and

$\alpha^{\left(1\right)}=\alpha^{\left(2\right)}=\alpha$ , and

$k^{\left(1\right)}=k^{\left(2\right)}=k$ as a precondition, combining routine looks like

$\pi=\exp\left(-\frac{\left|t^{\left(2\right)}-t^{(1)}\right|}{\alpha}\right),$

$\nu=\exp\left(-\frac{k\cdot\left|t^{\left(2\right)}-t_{n}^{\left(1\right)}\right|}{\alpha\left(k-1\right)}\right),$

$\begin{cases} \begin{cases} t=t^{\left(1\right)}\\ s=s^{\left(1\right)}+\pi s^{\left(2\right)}\\ u=u^{\left(1\right)}+\nu u^{\left(2\right)} \end{cases}, & t^{\left(1\right)}\ge t^{\left(2\right)};\\ \begin{cases} t=t^{\left(2\right)}\\ s=s^{(2)}+\pi s^{\left(1\right)}\\ u=u^{\left(2\right)}+\nu u^{\left(1\right)} \end{cases}, & t^{\left(1\right)}<t^{\left(2\right)};\\ w=\max\left(w^{\left(1\right)},w^{\left(2\right)}\right), & \mathrm{all\, cases;}\\ v=\max\left(v^{\left(1\right)},v^{\left(2\right)}\right), & \mathrm{all\, cases.} \end{cases}$

In simple words, here we take longest history of two (expressed by denominator terms

$w$ and

$v$ ), and merge numerators based on which history contained the latest observation. Again, we can assume

$w^{\left(1\right)}\ge w^{\left(2\right)}\equiv v^{\left(1\right)}\ge v^{\left(2\right)}$ to save on comparisons a little bit.

Weathering Thru Tech Days

Saturday, June 4, 2011

John #Canny's function #summarizer formulae #Pig #MapReduce

No comments:

Post a Comment