Weathering Thru Tech Days: September 2011

I have been working on MAHOUT-796 this week (current work is in https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-796).

Nathan Halko pointed to me that power iterations in practice improve precision and reduce noise.

After a little discussion, that's the route we went :

Original formula

\[\mathbf{Q}=\mbox{qr}\left[\left(\mathbf{AA^{\top}}\right)^q\mathbf{A}\boldsymbol{\Omega}\right].\mathbf{Q},\]

\[\mathbf{B}=\mathbf{Q}^{\top}\mathbf{A}.\]

Modified version

\[\mathbf{Y}=\mathbf{A}\boldsymbol{\Omega},\]

\[\mathbf{B}_{0}=\left[\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\right]^{\top}\mathbf{A},\]

\[\mathbf{B}_{i}=\left[\mbox{qr}\left(\mathbf{AB}_{i-1}^{\top}\right).\mathbf{Q}\right]^{\top}\mathbf{A},\, i\in\left[1..q\right].\]

Notation \(\mbox{qr}\left(\cdot\right).\mathbf{Q}\) means "compute QR decomposition of the argument and retain Q as a result".

Current combination of QJob and BtJob is essentially producing \(\mathbf{B}_{0}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\boldsymbol{\Omega}\right).\mathbf{Q}\). Intermediate QJob results are QR blocks, not a final Q, so QJob is not terribly meaningful without BtJob.

The task boils down to figuring out alternative pipeline modifications necessary to produce \(\mathbf{B}_{i}^{\top}\). After that, algorithm proceeds as before with assumption of \[\mathbf{B}\equiv\mathbf{B}_{q}.\]

The existing processing will be equivalent to \({q}=0\).

\(\mathbf{B}_{i}\) pipeline (some new code)

\(\mathbf{B}_{i}\) pipeline produces \[\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}^{\top}\right).\mathbf{Q}.\]

this is very similar to \(\mathbf{B}_{0}\) pipeline with specfics being full multiplication of \(\mathbf{A}\mathbf{B}^{\top}\) in the first job and first pass qr pushdown to the reducer of the first job:

map1 : \(\mathbf{A}\mathbf{B}^{\top}\) outer products are produced

combiner1 : presumming outer products of \(\mathbf{A}\mathbf{B}^{\top}\).

reducer1 finalizes summing up outer products of \(\mathbf{A}\mathbf{B}^{\top}\) and starts \(\mbox{qrFirstPass}\left(\mathbf{A}\mathbf{B}^{\top}\right)\to\mbox{qr blocks}\). (partitioning must be done by A block # it seems).

mapper2, combiner2, reducer2 proceed exactly as mapper2, combiner2, reducer2 in B pipeline and output final \(\mathbf{B}_{i}^{\top}\) with blocks corresponding to initial splits of A input.

Thus, this pipeline is 2 MR jobs with 2 sorts (map1 + combiner1 + shuffle and sort + reducer1 + map2 + combiner2 + shuffle and sort + reducer2).

Integration of Cholesky trick route for computing power iterations

Ted Dunning pointed out that the whole B pipeline could avoid QR step.

When power iterations are concened, note that whatever route is chosen to calculate \(\mathbf{B}_{i}=g\left(\mathbf{Y}_{i}\right)\), mathematically it should still be valid \(\forall i\in\left[0..q\right]\) for as long as we assume

\[\mathbf{Y}_{0}=\mathbf{A}\boldsymbol{\Omega}, i=0;\]

\[\mathbf{Y}_{i}=\mathbf{A}\mathbf{B}_{i-1}^{\top}, i>{0}.\]

So, if Cholesky trick allows to produce \(\mathbf{B}_{0}\) efficiently, I see no reason why it could not be applied to producing the \(\mathbf{B}_{i}\).

I will be pursuing MR option of running Cholesky decompositon of \(\mathbf{Y}^{\top}\mathbf{Y}\) in MAHOUT-797.

If that holds, then power iterations with Cholesky decomposition will likely be similar.

\(\mathbf{LL}^{\top}=\mathbf{Y}^{\top}\mathbf{Y}=\mathbf{R}^{\top}\mathbf{R}\Rightarrow\) we know how to compute \(\mathbf{R}\);

\[\mathbf{B}_{i}=\left(\mathbf{R}^{-1}\right)^{\top}\mathbf{Y_{i}}^{\top}\mathbf{A};\]

\[\mathbf{Y}_{0}=\mathbf{A}\boldsymbol{\Omega}, i=0;\]

\[\mathbf{Y}_{i}=\mathbf{A}\mathbf{B}_{i-1}^{\top}, i>{0}.\]

Tests

Tests with local MR QR-based solver seem to be encouraging. There's clear improvement on precision.

With surrogate random input with predefined singular values 10,4,1,(0.1...), toy input 2000x1000 and k=3,p=10 I seem to be getting improvement after just one additional power iteration:

q=0:

--SSVD solver singular values:
svs: 9.998472 3.993542 0.990456 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000

q=1: (+2 more MR sequential steps):
--SSVD solver singular values:
svs: 10.000000 4.000000 0.999999 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000

After some tests and cleanup, I guess it is going to become a commit really soon.

Bug fixes

In addition to all this, MAHOUT-796 branch gets a lot of bug fixes (in particular, better handling sparse inputs which seemed to have still been broken in current version).

Weathering Thru Tech Days

Thursday, September 8, 2011

#MAHOUT -796 power iterations doc here

Thursday, September 1, 2011

#Mahout #MapReduce #SSVD version gets power iterations, fixes

Original formula

\[\mathbf{Q}=\mbox{qr}\left[\left(\mathbf{AA^{\top}}\right)^q\mathbf{A}\boldsymbol{\Omega}\right].\mathbf{Q},\]

\[\mathbf{B}=\mathbf{Q}^{\top}\mathbf{A}.\]

Modified version

\[\mathbf{Y}=\mathbf{A}\boldsymbol{\Omega},\]

\[\mathbf{B}_{0}=\left[\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\right]^{\top}\mathbf{A},\]

\[\mathbf{B}_{i}=\left[\mbox{qr}\left(\mathbf{AB}_{i-1}^{\top}\right).\mathbf{Q}\right]^{\top}\mathbf{A},\, i\in\left[1..q\right].\]

Notation \(\mbox{qr}\left(\cdot\right).\mathbf{Q}\) means "compute QR decomposition of the argument and retain Q as a result".

Current combination of QJob and BtJob is essentially producing \(\mathbf{B}_{0}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\boldsymbol{\Omega}\right).\mathbf{Q}\). Intermediate QJob results are QR blocks, not a final Q, so QJob is not terribly meaningful without BtJob.

The task boils down to figuring out alternative pipeline modifications necessary to produce \(\mathbf{B}_{i}^{\top}\). After that, algorithm proceeds as before with assumption of \[\mathbf{B}\equiv\mathbf{B}_{q}.\]

The existing processing will be equivalent to \({q}=0\).

\(\mathbf{B}_{i}\) pipeline (some new code)

\(\mathbf{B}_{i}\) pipeline produces \[\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}^{\top}\right).\mathbf{Q}.\]

this is very similar to \(\mathbf{B}_{0}\) pipeline with specfics being full multiplication of \(\mathbf{A}\mathbf{B}^{\top}\) in the first job and first pass qr pushdown to the reducer of the first job:

Thus, this pipeline is 2 MR jobs with 2 sorts (map1 + combiner1 + shuffle and sort + reducer1 + map2 + combiner2 + shuffle and sort + reducer2).

Integration of Cholesky trick route for computing power iterations

When power iterations are concened, note that whatever route is chosen to calculate \(\mathbf{B}_{i}=g\left(\mathbf{Y}_{i}\right)\), mathematically it should still be valid \(\forall i\in\left[0..q\right]\) for as long as we assume

\[\mathbf{Y}_{0}=\mathbf{A}\boldsymbol{\Omega}, i=0;\]

\[\mathbf{Y}_{i}=\mathbf{A}\mathbf{B}_{i-1}^{\top}, i>{0}.\]

So, if Cholesky trick allows to produce \(\mathbf{B}_{0}\) efficiently, I see no reason why it could not be applied to producing the \(\mathbf{B}_{i}\).

\[\mathbf{Y}_{0}=\mathbf{A}\boldsymbol{\Omega}, i=0;\]

\[\mathbf{Y}_{i}=\mathbf{A}\mathbf{B}_{i-1}^{\top}, i>{0}.\]