Thursday, October 21, 2010

Matrix class is not array of arrays

The same reasons you encapsulate your data structures, and the same reason you check parameters to make sure they are valid.
A few people use [][] despite its limitations, arguing that [][] is better because it is faster or because it uses C-syntax. The problem with the "it's faster" argument is that it's not — at least not on the latest version of two of the world's best known C++ compilers. The problem with the "uses C-syntax" argument is that C++ is not C. Plus, oh yea, the C-syntax makes it harder to change the data structure and harder to check parameter values.
The point of the previous two FAQs is that m(i,j) gives you a clean, simple way to check all the parameters and to hide (and therefore, if you want to, change) the internal data structure. The world already has way too many exposed data structures and way too many out-of-bounds parameters, and those cost way too much money and cause way too many delays and way too many defects.
Now everybody knows that you are different. You are clairvoiant with perfect knowledge of the future, and you know that no one will ever find any benefit from changing your matrix's internal data structure. Plus you are a good programmer, unlike those slobs out there that occasionally pass wrong parameters, so you don't need to worry about pesky little things like parameter checking. But even though you don't need to worry about maintenance costs (no one ever needs to change your code), there might be one or two other programmers who aren't quite perfect yet. For them, maintenance costs are high, defects are real, and requirements change. Believe it or not, every once in a while they need to (better sit down) change their code.
Okay, my thongue wath in my theek. But there was a point. The point was that encapsulation and parameter-checking are not crutches for the weak. It's smart to use techniques that make encapsulation and/or parameter checking easy. The m(i,j) syntax is one of those techniques.
Having said all that, if you find yourself maintaining a billion-line app where the original team used m[i][j], or even if you are writing a brand new app and you just plain want to use m[i][j], you can still encapsulate the data structure and/or check all your parameters. It's not even that hard. However it does require a level of sophistication that, like it or not, the average C++ programmers fears. Fortunately you are not average, so read on.
If you merely want to check parameters, just make sure the outer operator[] returns an object rather than a raw array, then that object's operator[] can check its parameter in the usual way. Beware that this can slow down your program. In particular, if these inner array-like objects end up allocating their own block of memory for their row of the matrix, the performance overhead for creating / destroying your matrix objects can grow dramatically. The theoretical cost is still O(rows*cols), but in practice, the overhead of the memory allocator (new or malloc) can be much larger than anything else, and that overhead can swamp the other costs. For instance, on two of the world's best known C++ compilers, the separate-allocation-per-row technique was 10x slower than the than one-allocation-for-the-entire-matrix technique. 10% is one thing, 10x is another.
If you want to check the parameters without the above overhead and/or if you want to encapsulate (and possibly change) the matrix's internal data structure, follow these steps:
  1. Add operator()(unsigned row, unsigned col) to the Matrix class.
  2. Create nested class Matrix::Row. It should have a ctor with parameters (Matrix& matrix, unsigned row), and it should store those two values in its this object.
  3. Change Matrix::operator[](unsigned row) so it returns an object of class Matrix::Row, e.g., { return Row(*this,row); }.
  4. Class Matrix::Row then defines its own operator[](unsigned col) which turns around and calls, you guessed it, Matrix::operator()(unsigned row, unsigned col). If the Matrix::Row data members are called Matrix& matrix_ and unsigned row_, the code for Matrix::Row::operator[](unsigned col) will be { return matrix_(row_, col); }
Next you will enable const overloading by repeating the above steps. You will create the const version of the various methods, and you will create a new nested class, probably called Matrix::ConstRow. Don't forget to use Matrix const& instead of Matrix&.
Final step: find the joker who failed to read the previous FAQ and thonk him in the noggin.
If you have a decent compiler and if you judiciously use inlining, the compiler should optimize away the temporary objects. In other words, the operator[]-approach above will hopefully not be slower than what it would have been if you had directly called Matrix::operator()(unsigned row, unsigned col) in the first place. Of course you could have made your life simpler and avoided most of the above work by directly calling Matrix::operator()(unsigned row, unsigned col) in the first place. So you might as well directly call Matrix::operator()(unsigned row, unsigned col) in the first place.

No comments:

Post a Comment