Differences between revisions 7 and 20 (spanning 13 versions)

Dev Days 1: Exact Linear Algebra

Michael Abshoff -> LinBox debianization
MartinAlbrecht -> M4RI GF(2)
Nick Alexander -> cylcotomic linalg
Gregory Bard -> M4RI
Rob Beezer
Tom Boothby
Craig Citro -> cyclotomic linalg, p-adic linalg
Burcin Erocal -> nullspace of polynomial matrices
Robert Miller (especially sparse GF(2)) -> sparse GF(2)
David Roe (p-adic linear algebra?) -> cyclotomic linear algebra
Arne Storjohann -> HNF, Sparse GF(2) Linalg
William Stein -> Cyclotomic linear algebra, HNF (LLL)
Ralf-Philipp Weinmann -> Sparse Elimination, Nullspace

Dense GF(2)

implement LQUP decomposition [ClementPernet, MartinAlbrecht]
- implement LQUP routine [Clement]
- implement _mzd_addmul_weird [Clement]
- implement LQUP basecase routine based on M4RI mzd_lqup_m4ri [MartinAlbrecht]
  - dont' zero out in _mzd_gauss_submatrix
  - don't zero out below
  - remember rowswaps => P
  - remember Q[r] = c (translate to Lapack style later )
- implement TRSM routine [Clement]
- implement inplace triangular matrix inversion mzd_trtri [Clement]
- implement triangular triangular matrix multiplication mzd_trtrm [Clement]
- implement column swaps [MartinAlbrecht]
- implement column rotations [MartinAlbrecht]
- implement memory efficient mzd_addmul_strassen [Martin]
  - See Clement's et al. paper on memory efficient Strassen-Winograd
implement Arne's asymptotically fast elimination algorithm [MartinAlbrecht]
- It was agreed that this is not what we want for M4RI, a toy implementation is available at: http://www.informatik.uni-bremen.de/cgi-bin/cgiwrap/malb/blosxom.pl/2008/06/13#gauss-jordan-storjohann
implement multi-core multiplication with optimal speed-up
- OpenMP seems to be nice and easy
- 2 cores probably main target, but think about 4 cores too
improve efficiency of M4RM
- try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1 [MartinAlbrecht]
  - It seems slower to be slower to use 7 tables rather than 8
- try say 16 tables instead of 8 on the Core2Duo [MartinAlbrecht]
- try Bill Hart's idea for L1 cache efficiency on the Core2Duo [MartinAlbrecht]
  - The idea is: construct 8 gray code tables but use one once it is in L1 a lot to reduce before using the second
- try to fit three matrices rather than two into L2 or understand why it works so good for two [MartinAlbrecht]
  - it works since once we've written the date we can go on computing and let a cache handle the rest
  - a cache miss for reading on the other hand really prevents us from computing
- detect L1/L2 cache sizes at runtime and choose optimal parameters for them [MartinAlbrecht]
- Implement memory management s.t. the sys time on Opterons decreases.
- implement Bill's half table idea and benchmark it

Sparse GF(2) (and other small finite fields)

[Arne, Ralph, Clément, Rob Miller]
Sparse Reduced Echelon form (RPW)
- Sparse Elimination:
  - improve LinBox gauss-domain
  - eclib sparse elimination
  - ....
....

Hermite Normal Form

[Arne, Clément, William]
new algorithm, based on system solving (Arne Storjohann) -> already an implementation
- integrate it in IML
- benchmark it against Sage
generalization: block vector system solving
- implementation
- benchmark
LLL trick for the existing implementation in Sage (William & Clément)

Polynomial Matrix computations

[Burcin]
Nullspace computation (Burcin)

Cyclotomic linear algebra

[Rob Bradshaw, Craig, William]

P-adic linear algebra

[David Roe, Craig, David Cohet]

LinBox debianization and 1.1.6 spkgization

[mabshoff, Clément]
get rid of gmp++
generate and test linbox-1.1.6rc0.spkg

-  ⇤ ← Revision 7 as of 2008-06-13 04:11:12 → 
  Size: 1301
  Editor: MartinAlbrecht
  Comment:
+   ← Revision 20 as of 2008-11-14 13:41:56 → ⇥
  Size: 4048
  Editor: anonymous
  Comment: converted to 1.6 markup
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-  * Michael Abshoff
  * Martin Albrecht
  * Nick Alexander
  * Gregory Bard
+  * Michael Abshoff -> LinBox debianization
  * MartinAlbrecht -> M4RI GF(2)
  * Nick Alexander -> cylcotomic linalg
  * Gregory Bard -> M4RI
 Line 9:
-  * Craig Citro
  * Burcin Erocal
  * Robert Miller (especially sparse GF(2))
  * David Roe (p-adic linear algebra?)
  * Arne Storjohann
  * William Stein
  * Ralf-Philipp Weinmann
+  * Craig Citro -> cyclotomic linalg, p-adic linalg
  * Burcin Erocal -> nullspace of polynomial matrices
  * Robert Miller (especially sparse GF(2)) -> sparse GF(2)
  * David Roe (p-adic linear algebra?) -> cyclotomic linear algebra
  * Arne Storjohann -> HNF, Sparse GF(2) Linalg
  * William Stein -> Cyclotomic linear algebra, HNF (LLL)
  * Ralf-Philipp Weinmann -> Sparse Elimination, Nullspace
 Line 17:
-== GF(2) ==
 * implement LQUP decomposition [Clement, Martin]
   * implement LQUP routine [Clement]
   * implement TRSM routine [Clement]
   * implement efficient column swaps/rotations [Martin]
     * SSE2 might help a lot here
   * implement memory efficient mzd_addmul_strassen [Martin]
+== Dense GF(2) ==
 * implement LQUP decomposition [ClementPernet, MartinAlbrecht]
   * --(implement LQUP routine)-- [Clement]
   * implement {{{_mzd_addmul_weird}}} [Clement]
   * implement LQUP basecase routine based on M4RI {{{mzd_lqup_m4ri}}} [MartinAlbrecht]
     * dont' zero out in _mzd_gauss_submatrix (!)
     * don't zero out below
     * remember rowswaps => P
     * remember Q[r] = c (translate to Lapack style later :-) )
   * --(implement TRSM routine)-- [Clement]
   * implement inplace triangular matrix inversion {{{mzd_trtri}}} [Clement]
   * implement triangular triangular matrix multiplication {{{mzd_trtrm}}} [Clement]
   * --(implement column swaps)-- [MartinAlbrecht]
   * --(implement column rotations)-- [MartinAlbrecht]
   * --(implement memory efficient mzd_addmul_strassen)-- [Martin]
-Line 25:
+Line 33:
- * implement Arne's asymptotically fast elimination algorithm [Martin]
+ * --(implement Arne's asymptotically fast elimination algorithm)-- [MartinAlbrecht]
   * It was agreed that this is not what we want for M4RI, a toy implementation is available at: http://www.informatik.uni-bremen.de/cgi-bin/cgiwrap/malb/blosxom.pl/2008/06/13#gauss-jordan-storjohann
-Line 30:
+Line 39:
-   * try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1
   * try to fit three matrices rather than two into L2 or understand why it works so good for two
   * detect L1/L2 cache sizes at runtime and choose optimal parameters for them
   * implement Bill's half table idea and benchmark it
+   * --(try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1)-- [MartinAlbrecht]
     * It seems slower to be slower to use 7 tables rather than 8
   * try say 16 tables instead of 8 on the Core2Duo [MartinAlbrecht]
   * try Bill Hart's idea for L1 cache efficiency on the Core2Duo [MartinAlbrecht]
     * The idea is: construct 8 gray code tables but use one once it is in L1 a lot to reduce before using the second
   * --(try to fit three matrices rather than two into L2 or understand why it works so good for two)-- [MartinAlbrecht]
     * it works since once we've written the date we can go on computing and let a cache handle the rest
     * a cache miss for reading on the other hand really prevents us from computing
   * --(detect L1/L2 cache sizes at runtime and choose optimal parameters for them)-- [MartinAlbrecht]
   * Implement memory management s.t. the sys time on Opterons decreases.
   * implement Bill's [[http://groups.google.com/group/sage-devel/msg/6279228095b3d9f7|half table idea]] and benchmark it 

== Sparse GF(2) (and other small finite fields) ==
 [Arne, Ralph, Clément, Rob Miller]
 * Sparse Reduced Echelon form (RPW)
   * Sparse Elimination: 
     * improve LinBox gauss-domain
     * eclib sparse elimination
     * ....
 * ....


== Hermite Normal Form ==
 [Arne, Clément, William]
 * new algorithm, based on system solving (Arne Storjohann) -> already an implementation
   * integrate it in IML
   * benchmark it against Sage
 * generalization: block vector system solving
   * implementation
   * benchmark
 * LLL trick for the existing implementation in Sage (William & Clément)

== Polynomial Matrix computations ==
 [Burcin]
 * Nullspace computation (Burcin)

== Cyclotomic linear algebra ==
 [Rob Bradshaw, Craig, William]

== P-adic linear algebra ==
 [David Roe, Craig, David Cohet]

== LinBox debianization and 1.1.6 spkgization ==
 [mabshoff, Clément]
 * get rid of gmp++
 * generate and test linbox-1.1.6rc0.spkg

Diff for "dev1/linalg"

Dev Days 1: Exact Linear Algebra

Dense GF(2)

Sparse GF(2) (and other small finite fields)

Hermite Normal Form

Polynomial Matrix computations

Cyclotomic linear algebra

P-adic linear algebra

LinBox debianization and 1.1.6 spkgization