
Thursday, April 21, 2011

Follow-up to the "Mean Summarizer..." post

This is a small follow-up to the previous post.

In practice, in the MapReduce world (for instance, in Pig UDFs, which I have already put in place), when we run over observation data we often encounter two types of problems:


1-- Unordered observations, i.e. cases where $t_{n+1} < t_n$.
2-- Parallel processing of disjoint subsets of a common superset of observations, followed by combining the partial results into one (a "combiner" in Hadoop, an "Algebraic UDF" in Pig).

Hence, we need to extend our formulas to handle these cases.

1. Updating the state with events in the past.
The updated formulas look like this:

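Case $t_{n+1} \ge t_n$:
$$\begin{cases}\pi_0 = 1,\\ \pi_{n+1} = e^{-(t_{n+1} - t_n)/\alpha};\\ w_{n+1} = 1 + \pi_{n+1} w_n;\\ s_{n+1} = x_{n+1} + \pi_{n+1} s_n;\\ t_{n+1} = t_{n+1}.\end{cases}$$
(On the left, $t_{n+1}$ denotes the summarizer's timestamp after the update; on the right, the time of the new observation $x_{n+1}$.)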
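Unrolling the in-order recursion, assuming the summarizer starts empty ($s_0 = w_0 = 0$), shows what the state holds after $n$ observations:

$$s_n = \sum_{i=1}^{n} x_i\, e^{-(t_n - t_i)/\alpha}, \qquad w_n = \sum_{i=1}^{n} e^{-(t_n - t_i)/\alpha}.$$

Each observation is weighted by its age relative to the latest timestamp $t_n$, and the weighted mean is read off as $s_n / w_n$ (our reading of the summarizer). An observation arriving with $t_{n+1} < t_n$ therefore only has to add $e^{-(t_n - t_{n+1})/\alpha}$ to $w$ and $x_{n+1} e^{-(t_n - t_{n+1})/\alpha}$ to $s$, leaving the timestamp at $t_n$, which is exactly the in-the-past case.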


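Case $t_{n+1} < t_n$ (updating in the past):
$$\begin{cases}\pi_0 = 1,\\ \pi_{n+1} = e^{-(t_n - t_{n+1})/\alpha},\\ w_{n+1} = w_n + \pi_{n+1},\\ s_{n+1} = s_n + \pi_{n+1} x_{n+1},\\ t_{n+1} = t_n.\end{cases}$$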
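A minimal Python sketch of the update rule covering both cases. The class name `MeanSummarizer`, its methods, and starting the empty state directly from the first observation are illustrative assumptions, not the author's Pig UDF; the mean is taken as $s/w$.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MeanSummarizer:
    """Hypothetical exponentially weighted mean summarizer with decay constant alpha.

    s: decayed sum of observations, w: decayed sum of weights,
    t: timestamp the state is currently decayed to (None while empty).
    """
    alpha: float
    s: float = 0.0
    w: float = 0.0
    t: Optional[float] = None

    def update(self, x: float, tx: float) -> None:
        if self.t is None:
            # Empty state: the first observation starts it at (s, w, t) = (x, 1, tx).
            self.s, self.w, self.t = x, 1.0, tx
        elif tx >= self.t:
            # Case t_{n+1} >= t_n: decay the old state forward to tx, add x with weight 1.
            pi = math.exp(-(tx - self.t) / self.alpha)
            self.w = 1.0 + pi * self.w
            self.s = x + pi * self.s
            self.t = tx
        else:
            # Case t_{n+1} < t_n: decay the new observation forward to the current t;
            # the state's timestamp stays where it is.
            pi = math.exp(-(self.t - tx) / self.alpha)
            self.w = self.w + pi
            self.s = self.s + pi * x

    def mean(self) -> float:
        return self.s / self.w
```

Because both branches keep $s$ and $w$ equal to the sums of $x_i e^{-(t - t_i)/\alpha}$ and $e^{-(t - t_i)/\alpha}$ over everything seen so far, with $t$ the latest timestamp, feeding the same observations in any order should yield the same state up to floating-point rounding.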


2. Combining two summarizers that have observed disjoint subsets of the original observation set.
Given two summarizer states $S_1$, $S_2$ that have observed disjoint subsets of the original observation superset, the combined state is:
Case $t_2 \ge t_1$:
$$\begin{cases}t = t_2;\\ s = s_2 + s_1 e^{-(t_2 - t_1)/\alpha};\\ w = w_2 + w_1 e^{-(t_2 - t_1)/\alpha}.\end{cases}$$


The case $t_2 < t_1$ is symmetric with respect to the indices $(\cdot)_1$, $(\cdot)_2$. Also, combining two summarizers requires $\alpha_1 = \alpha_2 = \alpha$ (the same history decay).
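A matching Python sketch of the combine step, i.e. what a combiner stage would do with two partial states. The `State` tuple and the `combine` and `summarize` helpers are hypothetical names; `summarize` builds a state directly from the decayed sums (decayed to the latest timestamp) for brevity, instead of going through incremental updates.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class State(NamedTuple):
    """Hypothetical summarizer state: decayed sum s, decayed weight w, time t, decay alpha."""
    s: float
    w: float
    t: float
    alpha: float


def summarize(obs: Iterable[Tuple[float, float]], alpha: float) -> State:
    """Build a state from (x, t) pairs by decaying every observation to the latest time."""
    obs = list(obs)
    t = max(tx for _, tx in obs)
    s = sum(x * math.exp(-(t - tx) / alpha) for x, tx in obs)
    w = sum(math.exp(-(t - tx) / alpha) for _, tx in obs)
    return State(s, w, t, alpha)


def combine(a: State, b: State) -> State:
    """Merge two states built from disjoint observation subsets."""
    if a.alpha != b.alpha:
        # Prerequisite: both summarizers must use the same history decay.
        raise ValueError("can only combine summarizers with the same alpha")
    # The case t2 < t1 is symmetric: swap so that b is the later state.
    if b.t < a.t:
        a, b = b, a
    # Decay the earlier state forward to the later timestamp, then add the sums.
    pi = math.exp(-(b.t - a.t) / a.alpha)
    return State(s=b.s + pi * a.s, w=b.w + pi * a.w, t=b.t, alpha=a.alpha)
```

Swapping the arguments handles the symmetric case with a single formula, and combining the states of two disjoint subsets should reproduce the state of their union up to floating-point rounding.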

But enough of midnight oil burning.
