## Thursday, September 1, 2011

### #Mahout #MapReduce #SSVD version gets power iterations, fixes

I have been working on MAHOUT-796 this week (current work is in https://github.com/dlyubimov/mahout-commits/tree/MAHOUT-796).

Nathan Halko pointed to me that power iterations in practice improve precision and reduce noise.

After a little discussion, that's the route we went :

### Integration of Cholesky trick route for computing power iterations

Ted Dunning pointed out that the whole B pipeline could avoid QR step.

### So, if Cholesky trick allows to produce $\mathbf{B}_{0}$ efficiently, I see no reason why it could not be applied to producing the $\mathbf{B}_{i}$.

I will be pursuing MR option of running Cholesky decompositon of $\mathbf{Y}^{\top}\mathbf{Y}$ in MAHOUT-797.
If that holds, then power iterations with Cholesky decomposition will likely be similar.
$\mathbf{LL}^{\top}=\mathbf{Y}^{\top}\mathbf{Y}=\mathbf{R}^{\top}\mathbf{R}\Rightarrow$ we know how to compute $\mathbf{R}$;
$\mathbf{B}_{i}=\left(\mathbf{R}^{-1}\right)^{\top}\mathbf{Y_{i}}^{\top}\mathbf{A};$

### $\mathbf{Y}_{i}=\mathbf{A}\mathbf{B}_{i-1}^{\top}, i>{0}.$

Tests
Tests with local MR QR-based solver seem to be encouraging. There's clear improvement on precision.
With surrogate random input with predefined singular values 10,4,1,(0.1...), toy input 2000x1000 and k=3,p=10 I seem to be getting improvement after just one additional power iteration:

q=0:

--SSVD solver singular values:
svs: 9.998472 3.993542 0.990456 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000
q=1: (+2 more MR sequential steps):
--SSVD solver singular values:
svs: 10.000000 4.000000 0.999999 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000
After some tests and cleanup, I guess it is going to become a commit really soon.
Bug fixes
In addition to all this, MAHOUT-796 branch gets a lot of bug fixes (in particular, better handling sparse inputs which seemed to have still been broken in current version).