## Wednesday, June 1, 2011

### On MapReduce application for online exponential rates summarizer.

as a follow-up to Ted Dunning's post exponentially weighted averaging:

Here's the bottom line result from that post:

$\pi=e^{-\left(t_{n}-t_{n-1}\right)/\alpha}$$s_{n}=\pi s_{n-1}+x$$w_{n}=\pi w_{n-1}+\left(t-t_{n}\right)$$t_{n}= t$

Combiner routine:
In most general case, when both summarizers have non-zero history, and $t_{1}>t_{2}$ ,

$\pi=e^{-\left(t_{1}-t_{2}\right)/\alpha},$$w=\max\left(w_{1},\pi w_{2}\right),$$s=s_{1}+\pi s_{2}.$

Case $t_{1}<t_{2}$ is symmetrical w.r.t. indices $\left(\cdot\right)_{1},\,\left(\cdot\right)_{2}$ .

This combiner routine however is not consistent with original formula as time spans are not exponentially compressed into time scale. Therefore, unit test for such combiner produces significant errors in various history combinations.

A combine-consistent routine:
A better formulation that works fine with the combiner is as follows:

Case $t_{n+1}>t_{n}$ :

$\begin{cases} \Delta t=t_{n+1}-t_{n},\\ \begin{cases} \pi_{0}=1,\\ \pi_{n+1}=e^{-\Delta t/\alpha}; \end{cases}\\ w_{n+1}=\pi_{n+1}w_{n}+\alpha\cdot(1-\pi_{n+1});\\ s_{n+1}=x_{n+1}+\pi_{n+1}s_{n};\\ t_{n+1}=t_{n+1}. \end{cases}$

The term $\alpha\left(1-\pi_{n+1}\right)$  really corresponds to the function average of the exponent function $f\left(x\right)=e^{x}$ multiplied by $\Delta t$  as in
$\Delta t\cdot\bar{f}\left(x|\left[-\frac{\Delta t}{\alpha},0\right]\right)=\Delta t\cdot\frac{\alpha}{\Delta t}\cdot\int_{-\Delta t/\alpha}^{0}e^{x}dx=$$=\alpha\cdot\left(e^{0}-e^{-\Delta t/\alpha}\right)=$$=\alpha\left(1-\pi\right).$

Finally, update-in-the-past also looks different for consistency and combining sake:

Case $t_{n+1}<t_{n}$:

$\begin{cases} \Delta t=t_{n}-t_{n+1},\\ \begin{cases} \pi_{0}=1,\\ \pi=e^{-\Delta t/\alpha}, \end{cases}\\ w_{n+1}=\max\left[w_{n},\alpha\cdot\left(1-\pi_{n+1}\right)\right],\\ s_{n+1}=s_{n}+\pi_{n+1}x_{n+1},\\ t_{n+1}=t_{n}. & \left(\mathrm{no\, change}\right) \end{cases}$