2838
Comment:
|
4046
|
Deletions are marked like this. | Additions are marked like this. |
Line 18: | Line 18: |
* implement LQUP decomposition [Clement, MartinAlbrecht] * implement LQUP routine [Clement] * implement TRSM routine [Clement] * implement efficient column swaps/rotations [MartinAlbrecht] * SSE2 might help a lot here * implement memory efficient mzd_addmul_strassen [Martin] |
* implement LQUP decomposition [ClementPernet, MartinAlbrecht] * --(implement LQUP routine)-- [Clement] * implement {{{_mzd_addmul_weird}}} [Clement] * implement LQUP basecase routine based on M4RI {{{mzd_lqup_m4ri}}} [MartinAlbrecht] * dont' zero out in _mzd_gauss_submatrix (!) * don't zero out below * remember rowswaps => P * remember Q[r] = c (translate to Lapack style later :-) ) * --(implement TRSM routine)-- [Clement] * implement inplace triangular matrix inversion {{{mzd_trtri}}} [Clement] * implement triangular triangular matrix multiplication {{{mzd_trtrm}}} [Clement] * --(implement column swaps)-- [MartinAlbrecht] * --(implement column rotations)-- [MartinAlbrecht] * --(implement memory efficient mzd_addmul_strassen)-- [Martin] |
Line 25: | Line 33: |
* implement Arne's asymptotically fast elimination algorithm [MartinAlbrecht] | * --(implement Arne's asymptotically fast elimination algorithm)-- [MartinAlbrecht] * It was agreed that this is not what we want for M4RI, a toy implementation is available at: http://www.informatik.uni-bremen.de/cgi-bin/cgiwrap/malb/blosxom.pl/2008/06/13#gauss-jordan-storjohann |
Line 30: | Line 39: |
* try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1 [MartinAlbrecht] | * --(try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1)-- [MartinAlbrecht] |
Line 32: | Line 41: |
* L1 cache doesn't seem to be the main reason for performance then * try say 16 tables. * try to fit three matrices rather than two into L2 or understand why it works so good for two * detect L1/L2 cache sizes at runtime and choose optimal parameters for them |
* try say 16 tables instead of 8 on the Core2Duo [MartinAlbrecht] * try Bill Hart's idea for L1 cache efficiency on the Core2Duo [MartinAlbrecht] * The idea is: construct 8 gray code tables but use one once it is in L1 a lot to reduce before using the second * --(try to fit three matrices rather than two into L2 or understand why it works so good for two)-- [MartinAlbrecht] * it works since once we've written the date we can go on computing and let a cache handle the rest * a cache miss for reading on the other hand really prevents us from computing * --(detect L1/L2 cache sizes at runtime and choose optimal parameters for them)-- [MartinAlbrecht] * Implement memory management s.t. the sys time on Opterons decreases. |
Dev Days 1: Exact Linear Algebra
Michael Abshoff -> LinBox debianization
MartinAlbrecht -> M4RI GF(2)
Nick Alexander -> cylcotomic linalg
Gregory Bard -> M4RI
- Rob Beezer
- Tom Boothby
Craig Citro -> cyclotomic linalg, p-adic linalg
Burcin Erocal -> nullspace of polynomial matrices
Robert Miller (especially sparse GF(2)) -> sparse GF(2)
David Roe (p-adic linear algebra?) -> cyclotomic linear algebra
Arne Storjohann -> HNF, Sparse GF(2) Linalg
William Stein -> Cyclotomic linear algebra, HNF (LLL)
Ralf-Philipp Weinmann -> Sparse Elimination, Nullspace
Dense GF(2)
implement LQUP decomposition [ClementPernet, MartinAlbrecht]
implement LQUP routine [Clement]
implement _mzd_addmul_weird [Clement]
implement LQUP basecase routine based on M4RI mzd_lqup_m4ri [MartinAlbrecht]
dont' zero out in _mzd_gauss_submatrix
- don't zero out below
remember rowswaps => P
remember Q[r] = c (translate to Lapack style later )
implement TRSM routine [Clement]
implement inplace triangular matrix inversion mzd_trtri [Clement]
implement triangular triangular matrix multiplication mzd_trtrm [Clement]
implement column swaps [MartinAlbrecht]
implement column rotations [MartinAlbrecht]
implement memory efficient mzd_addmul_strassen [Martin]
- See Clement's et al. paper on memory efficient Strassen-Winograd
implement Arne's asymptotically fast elimination algorithm [MartinAlbrecht]
It was agreed that this is not what we want for M4RI, a toy implementation is available at: http://www.informatik.uni-bremen.de/cgi-bin/cgiwrap/malb/blosxom.pl/2008/06/13#gauss-jordan-storjohann
- implement multi-core multiplication with optimal speed-up
- OpenMP seems to be nice and easy
- 2 cores probably main target, but think about 4 cores too
- improve efficiency of M4RM
try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1 [MartinAlbrecht]
- It seems slower to be slower to use 7 tables rather than 8
try say 16 tables instead of 8 on the Core2Duo [MartinAlbrecht]
try Bill Hart's idea for L1 cache efficiency on the Core2Duo [MartinAlbrecht]
- The idea is: construct 8 gray code tables but use one once it is in L1 a lot to reduce before using the second
try to fit three matrices rather than two into L2 or understand why it works so good for two [MartinAlbrecht]
- it works since once we've written the date we can go on computing and let a cache handle the rest
- a cache miss for reading on the other hand really prevents us from computing
detect L1/L2 cache sizes at runtime and choose optimal parameters for them [MartinAlbrecht]
- Implement memory management s.t. the sys time on Opterons decreases.
implement Bill's [http://groups.google.com/group/sage-devel/msg/6279228095b3d9f7 half table idea] and benchmark it
Sparse GF(2) (and other small finite fields)
- [Arne, Ralph, Clément, Rob Miller]
- Sparse Reduced Echelon form (RPW)
- Sparse Elimination:
improve LinBox gauss-domain
- eclib sparse elimination
- ....
- Sparse Elimination:
- ....
Hermite Normal Form
- [Arne, Clément, William]
new algorithm, based on system solving (Arne Storjohann) -> already an implementation
- integrate it in IML
- benchmark it against Sage
- generalization: block vector system solving
- implementation
- benchmark
LLL trick for the existing implementation in Sage (William & Clément)
Polynomial Matrix computations
- [Burcin]
- Nullspace computation (Burcin)
Cyclotomic linear algebra
- [Rob Bradshaw, Craig, William]
P-adic linear algebra
- [David Roe, Craig, David Cohet]
LinBox debianization and 1.1.6 spkgization
- [mabshoff, Clément]
- get rid of gmp++
- generate and test linbox-1.1.6rc0.spkg