Friday, March 11, 2016

AM: BMR Book Errata


Section 4.11.1

The `rowSums` formula should read
and the `colSums` formula should read
Note: the naming is a little confusing. Following R semantics, `rowSums()` means "row-wise sum": it returns one sum per row, which is the same as summing the columns of the matrix. Conversely, `colSums()` means "column-wise sum": it returns one sum per column, which can be computed by summing the matrix rows.
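To make the note above concrete, here is a minimal sketch in plain Scala. It uses ordinary nested arrays rather than Mahout matrices, and the object name `SumsSketch` and the type alias `Mx` are made up for this illustration. It only mirrors the R-style semantics described in the note.

```scala
object SumsSketch {
  // Row-major matrix as nested arrays (illustrative stand-in, not a Mahout type).
  type Mx = Array[Array[Double]]

  // "Row-wise sum": one entry per row, i.e. the sum of the matrix columns.
  def rowSums(m: Mx): Array[Double] = m.map(_.sum)

  // "Column-wise sum": one entry per column, i.e. the sum of the matrix rows.
  def colSums(m: Mx): Array[Double] = m.transpose.map(_.sum)

  def main(args: Array[String]): Unit = {
    val m: Mx = Array(
      Array(1.0, 2.0, 3.0),
      Array(4.0, 5.0, 6.0))
    println(rowSums(m).mkString(", ")) // 6.0, 15.0
    println(colSums(m).mkString(", ")) // 5.0, 7.0, 9.0
  }
}
```

For a 2×3 input, `rowSums` has 2 entries and `colSums` has 3, which is a quick way to check which one you are looking at.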

Section 6.1

The dimensions of matrix $$\mathbf{V}$$ are $$\mathbf{V}\in\mathbb{R}^{n\times k}$$.
Formula (6.1) should read
Formulas (6.2)-(6.3) should read
$$\begin{aligned}
\boldsymbol{a} & = \left(\mathbf{V}^{\top}\right)^{-1}\boldsymbol{a}_{pca}+\boldsymbol{\mu} \\
& = \mathbf{V}\boldsymbol{a}_{pca}+\boldsymbol{\mu}.
\end{aligned}$$
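The step from the first line to the second relies on the columns of $$\mathbf{V}$$ being orthonormal, so that $$\mathbf{V}^{\top}\mathbf{V}=\mathbf{I}$$. The following plain-Scala sketch checks the round trip numerically. It assumes the forward transform is $$\boldsymbol{a}_{pca}=\mathbf{V}^{\top}\left(\boldsymbol{a}-\boldsymbol{\mu}\right)$$, which is the usual PCA convention but is not quoted from the book. All names (`PcaRoundTripSketch`, `toPca`, `fromPca`) are made up for the illustration.

```scala
object PcaRoundTripSketch {
  type Vec = Array[Double]
  type Mx = Array[Vec] // row-major, illustrative stand-in for a real matrix type

  def mulVec(m: Mx, v: Vec): Vec =
    m.map(row => row.zip(v).map { case (x, y) => x * y }.sum)

  def add(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x + y }
  def sub(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x - y }

  // ASSUMED forward transform: a_pca = V^T (a - mu).
  def toPca(v: Mx, mu: Vec, a: Vec): Vec = mulVec(v.transpose, sub(a, mu))

  // Inverse transform as in the corrected formula: a = V a_pca + mu.
  def fromPca(v: Mx, mu: Vec, aPca: Vec): Vec = add(mulVec(v, aPca), mu)

  def main(args: Array[String]): Unit = {
    // A 2x2 rotation matrix: its columns are orthonormal (n = k = 2).
    val (c, s) = (math.cos(0.3), math.sin(0.3))
    val v: Mx = Array(Array(c, -s), Array(s, c))
    val mu = Array(1.0, -2.0)
    val a = Array(3.0, 4.0)

    val back = fromPca(v, mu, toPca(v, mu, a))
    // Prints values equal to 3.0 and 4.0 up to floating-point rounding.
    println(back.mkString(", "))
  }
}
```

When $$k<n$$, $$\mathbf{V}\mathbf{V}^{\top}$$ is only a projection onto the retained components, so under the same assumption the round trip recovers $$\boldsymbol{a}$$ exactly only if $$\boldsymbol{a}-\boldsymbol{\mu}$$ lies in the span of the columns of $$\mathbf{V}$$.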


Section 8.3

In Step (1): Setup working directories and acquire data, the URL of the Wikipedia XML dump has since changed. The third command of Step (1) should read:

curl -o $WORK_DIR/wikixml/enwiki-latest-pages-articles.xml.bz2

We keep the Kindle edition updated with the errata. If you bought the print version from Amazon, the Kindle version is free via Amazon MatchBook. If you already have the Kindle version, you should be able to simply reload it to get the updated one.


  1. Hi,
    Firstly, I must say that your book is excellent. Thank you very much for writing a book that guides us in developing many distributed mathematical algorithms. I have a problem with the first example in the book.
    Page 19, Example 2.4: Simulating regression data with a small noise
    When I declare the mxData object, the Scala interpreter reports this error: error: recursive value mxData needs type
    Can you explain why?

    1. Hello,

      Firstly, thanks for such an excellent book.

      Same problem here. When I execute the code from your repository, everything works well. The problem arises when I try to execute the code from the book; I get the following error:

      Error:(26, 32) value checkpoint is not a member of org.apache.mahout.math.Matrix
      val drmXB = (1 cbind drmX).checkpoint()

      Why are the code in your repository and the code in the book so different?


    2. Xavier -- thanks!

      I am not quite sure what you mean by "different". Unless it is a trivial one-, two- or maybe three-line example (which we still verified for errors on actual execution) all code is copied verbatim from the book's github code.

      Of course, many code examples are just fragments being illustrated. Omission of various boilerplate or irrelevant code (import, Scala unit test setup etc. etc.) is customary for printed examples that focus on a specific point being illustrated. That's why we publish the whole code as well so you could refer to it for other details if something doesn't seem to be working quite right.

      Specifically toward your problem: I am guessing you are referring to Example 2.3, which again is found in its entirety here:

      In Example 2.3, the value drmXB (as its prefix implies) is of a distributed matrix type, `DrmLike[_]`. However, the compilation error you are getting implies that it is not a distributed matrix type, but rather an in-memory type (o.a.m.math.Matrix).

      The `checkpoint()` operation is a method on distributed matrices only (it creates an optimizer barrier for the lazily evaluated expression). Chap. 4 goes over these specifics in more detail. You must be doing something different: my guess is that in the line you are trying to run, drmX is not really a DRM but rather an in-memory `Matrix`, which does not have distributed optimizer contracts like `checkpoint`, hence the error you see.

      But I don't know what exactly you may be doing differently, since I do not see the rest of your code.

      Hope that helps.


    3. Aykut, thank you -- and thank you for your question.

      This code is not meant to run in a shell, but rather in an application. The source code for this particular example is here: You should be able to compile it with Maven and run the tests, which pass.

      The book assumes, from chap. 2 on, that you download and compile the corresponding version of Mahout locally, as the examples reuse test artifacts of Mahout which may not have been published to Maven Central for any given release.

      In fact, I always rely on my own recompilation rather than on binary releases (as I often work with a snapshot rather than a release). Even with Spark. :)

      Feel free to check out the example code and follow the steps in chap. 2 to compile Mahout. Make sure that you are compiling the same version the examples use (or adjust the example dependencies to the version you compile).

      Please let me know if you have further difficulties.

    4. Thanks, problem solved: the function I was looking for is in LinearRegression [1]. Sorry I didn't notice. Everything works!


  2. Thanks Dmitriy for your reply. I have created a replica from your code repository and it is working very well.

    Kind regards.

  3. Hello Sir,
    Firstly, thanks for your excellent book. I have a problem with the example of parallel matrix multiplication. My code is here:
    package myMahoutApp.mthread

    // Created by chomon on 11/14/16.
    import org.apache.log4j.{BasicConfigurator, Level}
    import org.scalatest.{FunSuite, Matchers}
    import org.apache.mahout.math._
    import scalabindings._
    import RLikeOps._
    import org.apache.mahout.logging._

    import scala.concurrent.duration.Duration
    import scala.concurrent.{Await, Future}

    class MThreadSuite extends FunSuite with Matchers {
    private[mthread] final implicit val log = getLog(classOf[MThreadSuite])

    test("mthread-mmul") {

    val m = 5000
    val n = 300
    val s = 350

    val mxA = Matrices.symmetricUniformView(m, s, 1234).cloned
    val mxB = Matrices.symmetricUniformView(s, n, 1323).cloned

    // Just to warm up
    mxA %*% mxB
    MMul.mmulParA(mxA, mxB)

    val ntimes = 30

    val controlMsStart = System.currentTimeMillis()
    val mxControlC = mxA %*% mxB
    for (i ← 1 until ntimes) mxA %*% mxB
    val controlMs = System.currentTimeMillis() - controlMsStart

    val cMsStart = System.currentTimeMillis()
    val mxC = MMul.mmulParA(mxA, mxB)
    for (i ← 1 until ntimes) MMul.mmulParA(mxA, mxB)
    val cMs = System.currentTimeMillis() - cMsStart

    debug(f"control: ${controlMs / ntimes.toDouble}%.2f ms.")
    debug(f"mthread: ${cMs / ntimes.toDouble}%.2f ms.")


    (mxControlC - mxC).norm should be < 1e-5

    def mmulParA(mxA: Matrix, mxB: Matrix): Matrix = {
    val result = if (mxA.getFlavor.isDense), mxB.ncol)
    else if (mxB.getFlavor.isDense), mxB.ncol)
    else, mxB.ncol)

    val nsplits = Runtime.getRuntime.availableProcessors() min mxA.nrow
    val ranges = createSplits(mxA.nrow, nsplits)
    val blocks = { r ⇒
    Future {
    r → (mxA(r, ::) %*% mxB)

    Await.result(Future.fold(blocks)(result) {
    case (result, (r, block)) ⇒
    result(r, ::) := block
    }, Duration.Inf)

    def createSplits(nrow: Int, nsplits: Int):
    TraversableOnce[Range] = {
    val step = nrow / nsplits
    val slack = nrow % nsplits
    ((0 until slack * (step + 1) by (step + 1)) ++

    (slack * (step + 1) to nrow by step))
    .sliding(2).map(s => s(0) until s(1))


    Although I defined the mmulParA method, I get "cannot resolve symbol mmulParA".
    How can I solve this?
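For readers following the technique in the snippet above, here is a minimal, self-contained sketch of the same idea: split the rows of the left operand into contiguous ranges, multiply each block in its own `Future`, and stitch the blocks back together in order. It is written against plain nested arrays rather than Mahout matrices, and the names `ParMMulSketch`, `Mx` and `mmulRows` are made up for this illustration, so it is not the book's implementation. One possible cause of the unresolved symbol, offered only as a guess because the pasted code is incomplete, is that the test calls `MMul.mmulParA` while the method is defined as a bare `mmulParA` rather than inside an object named `MMul`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParMMulSketch {
  // Row-major matrix as nested arrays (illustrative stand-in, not a Mahout type).
  type Mx = Array[Array[Double]]

  // Split 0 until nrow into nsplits contiguous ranges whose sizes differ
  // by at most one. Assumes 1 <= nsplits <= nrow.
  def createSplits(nrow: Int, nsplits: Int): Vector[Range] = {
    val step = nrow / nsplits
    val slack = nrow % nsplits
    ((0 until slack * (step + 1) by (step + 1)) ++
      (slack * (step + 1) to nrow by step))
      .sliding(2).map(s => s(0) until s(1)).toVector
  }

  // Sequential product of the selected rows of a with b. Assumes b is non-empty.
  def mmulRows(a: Mx, b: Mx, rows: Range): Mx =
    rows.map { i =>
      Array.tabulate(b(0).length) { j =>
        var sum = 0.0
        var k = 0
        while (k < b.length) { sum += a(i)(k) * b(k)(j); k += 1 }
        sum
      }
    }.toArray

  // Row-block parallel product. Assumes a is non-empty.
  def mmulParA(a: Mx, b: Mx): Mx = {
    val nsplits = Runtime.getRuntime.availableProcessors() min a.length
    val blocks = createSplits(a.length, nsplits).map(r => Future(mmulRows(a, b, r)))
    // Future.sequence keeps the block order, so rows come back in place.
    Await.result(Future.sequence(blocks), Duration.Inf).flatMap(_.toSeq).toArray
  }

  def main(args: Array[String]): Unit = {
    val a: Mx = Array(Array(1.0, 2.0), Array(3.0, 4.0))
    val b: Mx = Array(Array(5.0, 6.0), Array(7.0, 8.0))
    mmulParA(a, b).foreach(row => println(row.mkString(", ")))
  }
}
```

Here the call site and the definition live in the same object, so `mmulParA(a, b)` resolves without a qualifier; from another class it would be called as `ParMMulSketch.mmulParA(a, b)`.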

    1. Cherry:

      Thank you for all your questions. It is great that you are showing an interest in learning.

      (1) All the examples given in the book are available on Andrew's GitHub here: Just clone it; it all compiles and runs. Try to figure out what is different.

      (2) For strictly Mahout-related questions (rather than strictly book-related ones -- I know you asked a few), please be sure to ask on the Mahout mailing list (user or dev, it doesn't matter much). The information on how to subscribe is here:,-irc-and-archives.html. It is OK to refer to the book or book examples; we all have copies, and we will find the reference if you make one.

      As a Mahout PMC member (emeritus), I should note that we at Mahout have certain rules we abide by. One of them is to steer all questions to the mailing list. We (usually) don't answer direct project-related questions; instead we ask people to go to the list and ask the question there. The reasons for this are: (1) You get better served, because there are many more people there who can help you and who may know the answer to your particular question. (2) Other people learn from the same question as well, since all answers are archived and searchable. This, to a certain degree, reduces the burden of answering the same question multiple times. (3) The project activity is measured, among other things, by the user list activity. So the project gains every time you ask on its list.

      If you can do that, that'd be great and we all at Mahout would be very thankful to you for doing this.