Weathering Thru Tech Days: Follow-up for the "Mean Summarizer..." post

This is a small follow-up to the previous post.

In practice, in the MapReduce world (for example in Pig UDFs, which, by the way, I have already put in place), running over observation data often raises two types of problems:

1-- Unordered observations, i.e. cases such as $t_{n+1}<t_{n}$.
2-- Parallel processing of disjoint subsets of a common superset of observations, with the partial results then combined into one (a "combiner" in Hadoop, an "Algebraic UDF" in Pig).

Hence, we need to add those specific cases to our formulas.

1. Updating state with events in the past.
The updated formulas look like this:

Case $t_{n+1}\geq t_{n}$:

$\begin{cases}\begin{cases}\pi_{0}=1,\\\pi_{n+1}=e^{-\left(t_{n+1}-t_{n}\right)/\alpha};\end{cases}\\w_{n+1}=1+\pi_{n+1}w_{n};\\s_{n+1}=x_{n+1}+\pi_{n+1}s_{n};\\t_{n+1}=t_{n+1}.\end{cases}$

Case $t_{n+1}<t_{n}$ (updating in the past):

$\begin{cases}\begin{cases}\pi_{0}=1,\\\pi_{n+1}=e^{-\left(t_{n}-t_{n+1}\right)/\alpha},\end{cases}\\w_{n+1}=w_{n}+\pi_{n+1},\\s_{n+1}=s_{n}+\pi_{n+1}x_{n+1},\\t_{n+1}=t_{n}.\end{cases}$
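
To make the two cases concrete, here is a minimal Python sketch of the update rule. It is not the actual Pig UDF code. The class name `Summarizer`, the `update` and `mean` methods, the explicit handling of an empty state, and reading the mean out as `s / w` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Summarizer:
    """Hypothetical holder for the state (t, s, w) used in the formulas above."""
    alpha: float      # history decay scale, in the same units as t
    t: float = 0.0    # time of the most recent observation seen so far
    s: float = 0.0    # exponentially weighted sum of observations
    w: float = 0.0    # exponentially weighted count of observations

    def update(self, x: float, t: float) -> None:
        if self.w == 0.0:
            # Assumption: w == 0 marks an empty state, so the first
            # observation simply initializes it.
            self.t, self.s, self.w = t, x, 1.0
        elif t >= self.t:
            # In-order case: decay the accumulated history, then add x.
            pi = math.exp(-(t - self.t) / self.alpha)
            self.w = 1.0 + pi * self.w
            self.s = x + pi * self.s
            self.t = t
        else:
            # In-the-past case: decay the late observation instead,
            # and leave the state time unchanged.
            pi = math.exp(-(self.t - t) / self.alpha)
            self.w += pi
            self.s += pi * x

    def mean(self) -> float:
        # Assumption: the summarized mean is the weighted sum over the weight.
        return self.s / self.w
```

Feeding the same observations in any order should leave the state identical up to floating-point rounding, which is the whole point of the second case.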

2. Combining two summarizers that have observed disjoint subsets of the original observation set.
Let $S_{1},\, S_{2}$ be the two summarizer states, with components $t_{i},\, s_{i},\, w_{i}$ for $S_{i}$.

Case $t_{2}\geq t_{1}$:

$\begin{cases}t=t_{2};\\s=s_{2}+s_{1}e^{-\left(t_{2}-t_{1}\right)/\alpha};\\w=w_{2}+w_{1}e^{-\left(t_{2}-t_{1}\right)/\alpha}.\end{cases}$

Case $t_{2}<t_{1}$ is symmetrical w.r.t. the indices $\left(\cdot\right)_{1},\left(\cdot\right)_{2}$. Also, the prerequisite for combining two summarizers is $\alpha_{1}=\alpha_{2}=\alpha$ (the history decay is the same).
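
The combine step can be sketched in Python as well, again as an illustration and not the real combiner or Algebraic UDF code. The `State` tuple, the `combine` function, and the treatment of a zero-weight state as empty are hypothetical names and choices. Both states are assumed to share the same $\alpha$, which is passed in explicitly.

```python
import math
from typing import NamedTuple


class State(NamedTuple):
    """Hypothetical container for one summarizer state."""
    t: float  # time of the most recent observation in this state
    s: float  # exponentially weighted sum
    w: float  # exponentially weighted count


def combine(a: State, b: State, alpha: float) -> State:
    """Merge two states built from disjoint observation subsets."""
    # Assumption: w == 0 marks an empty state, which contributes nothing.
    if a.w == 0.0:
        return b
    if b.w == 0.0:
        return a
    # Order the pair so that `late` carries the more recent timestamp;
    # this covers both the t2 >= t1 case and its symmetric counterpart.
    early, late = (a, b) if a.t <= b.t else (b, a)
    decay = math.exp(-(late.t - early.t) / alpha)
    return State(
        t=late.t,
        s=late.s + early.s * decay,
        w=late.w + early.w * decay,
    )
```

Because the pair is ordered by timestamp before merging, `combine(a, b, alpha)` and `combine(b, a, alpha)` give the same result, which is what a combiner needs when partial results arrive in arbitrary order.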

But enough of midnight oil burning.


Thursday, April 21, 2011
