Differences between revisions 14 and 15

Dev Days 1: Exact Linear Algebra

Dense GF(2)

implement LQUP decomposition [Clement, MartinAlbrecht]
- implement LQUP routine [Clement]
- implement TRSM routine [Clement]
- implement efficient column swaps [MartinAlbrecht]
- implement efficient column rotations [MartinAlbrecht]
  - SSE2 might help a lot here
- implement memory efficient mzd_addmul_strassen [Martin]
  - See the paper by Clement et al. on memory-efficient Strassen-Winograd
implement Arne's asymptotically fast elimination algorithm [MartinAlbrecht]
implement multi-core multiplication with optimal speed-up
- OpenMP seems to be nice and easy
- 2 cores probably main target, but think about 4 cores too
improve efficiency of M4RM
- try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1 [MartinAlbrecht]
  - It seems to be slower to use 7 tables rather than 8
- try, say, 16 tables instead of 8 on the Core2Duo [MartinAlbrecht]
- try Bill Hart's idea for L1 cache efficiency on the Core2Duo [MartinAlbrecht]
- try to fit three matrices rather than two into L2 or understand why it works so well for two [MartinAlbrecht]
  - it works since once we've written the data we can go on computing and let the cache handle the rest
  - a cache miss on a read, on the other hand, really prevents us from computing
- detect L1/L2 cache sizes at runtime and choose optimal parameters for them
- implement Bill's [http://groups.google.com/group/sage-devel/msg/6279228095b3d9f7 half table idea] and benchmark it
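
The table-related items above all rest on one idea: for a block of k rows of B, precompute all 2^k XOR combinations in Gray code order, so that each table entry costs a single row XOR, then handle every row of A with one lookup per block. Below is a minimal sketch in C, assuming matrices with exactly 64 columns and each row packed into one uint64_t (bit j of a row is its entry in column j). The names build_gray_table, m4rm_mul_64, naive_mul_64 and the constant K are hypothetical and are not the M4RI API. This toy version also builds one table at a time rather than the 7, 8 or 16 simultaneous tables discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define K 8 /* bits per table lookup (assumed to divide 64); the table has 2^K entries */

/* Fill table[x] with the XOR of the rows b[j] (0 <= j < k) for which bit j of x is set.
 * Consecutive Gray codes differ in exactly one bit, so each entry costs one XOR. */
static void build_gray_table(const uint64_t *b, int k, uint64_t *table)
{
    assert(k >= 1 && k <= 16);
    uint32_t prev = 0;
    table[0] = 0;
    for (uint32_t i = 1; i < (1u << k); i++) {
        uint32_t gray = i ^ (i >> 1);   /* i-th Gray code */
        uint32_t diff = gray ^ prev;    /* exactly one bit is set */
        int j = 0;
        while (!((diff >> j) & 1u))
            j++;
        table[gray] = table[prev] ^ b[j]; /* toggling bit j adds or removes row j */
        prev = gray;
    }
}

/* c[i] = row i of A*B over GF(2). A has m rows, B has 64 rows. */
static void m4rm_mul_64(const uint64_t *a, size_t m, const uint64_t *b, uint64_t *c)
{
    uint64_t table[1u << K];
    for (size_t i = 0; i < m; i++)
        c[i] = 0;
    for (int start = 0; start < 64; start += K) {
        /* one table per block of K rows of B */
        build_gray_table(b + start, K, table);
        for (size_t i = 0; i < m; i++) {
            /* the K bits of row i of A that select rows start..start+K-1 of B */
            uint32_t idx = (uint32_t)((a[i] >> start) & ((1u << K) - 1u));
            c[i] ^= table[idx];
        }
    }
}

/* Reference implementation: one XOR per set bit, no tables. */
static void naive_mul_64(const uint64_t *a, size_t m, const uint64_t *b, uint64_t *c)
{
    for (size_t i = 0; i < m; i++) {
        uint64_t acc = 0;
        for (int j = 0; j < 64; j++)
            if ((a[i] >> j) & 1u)
                acc ^= b[j];
        c[i] = acc;
    }
}
```

In this sketch, changing K (it must divide 64) trades table size, 2^K words, against the number of passes over A. That is the kind of trade-off the cache experiments above probe.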

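For the multi-core multiplication item in the Dense GF(2) list: rows of the product are independent, so the simplest OpenMP approach is to split the row loop across threads. Below is a hedged sketch in C, assuming 64-column matrices with each row packed into one uint64_t (bit j of a row is its entry in column j). The name gf2_mul_64_parallel is hypothetical, and nothing here is tuned or claimed to reach optimal speed-up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c[i] = row i of A*B over GF(2). A has m rows, B has 64 rows. */
static void gf2_mul_64_parallel(const uint64_t *a, long m, const uint64_t *b, uint64_t *c)
{
    assert(m >= 0);
    /* Each iteration writes only c[i], so no locking is needed.
     * A compiler without OpenMP support ignores the pragma and runs serially. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < m; i++) {
        uint64_t acc = 0;
        for (int j = 0; j < 64; j++)
            if ((a[i] >> j) & 1u)
                acc ^= b[j]; /* add row j of B when A[i][j] = 1 */
        c[i] = acc;
    }
}
```

With GCC, build with -fopenmp. The thread count can be set through the OMP_NUM_THREADS environment variable, for example 2 or 4 for the targets mentioned in the list.
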
[Arne, Ralph, Clément, Rob Miller]
Sparse Reduced Echelon form (RPW)
- Sparse Elimination:
  - improve LinBox gauss-domain
  - eclib sparse elimination
  - ....
....

[Arne, Clément, William]
new algorithm, based on system solving (Arne Storjohann); an implementation already exists
- integrate it in IML
- benchmark it against Sage
generalization: block vector system solving
- implementation
- benchmark
LLL trick for the existing implementation in Sage (William & Clément)

- Revision 14 as of 2008-06-16 08:04:49
  Size: 2900
  Editor: MartinAlbrecht
  Comment:
+ Revision 15 as of 2008-06-16 08:08:08
  Size: 3176
  Editor: MartinAlbrecht
  Comment:
 Line 31:
-   * try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1 [MartinAlbrecht]
+   * --(try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1)-- [MartinAlbrecht]
 Line 33:
-     * L1 cache doesn't seem to be the main reason for performance then
     * try say 16 tables.
   * try to fit three matrices rather than two into L2 or understand why it works so good for two
+   * try say 16 tables instead of 8 on the Core2Duo [MartinAlbrecht]
   * try Bill Hart's idea for L1 cache efficiency on the Core2Duo [MartinAlbrecht]
   * --(try to fit three matrices rather than two into L2 or understand why it works so good for two)-- [MartinAlbrecht]
     * it works since once we've written the date we can go on computing and let a cache handle the rest
     * a cache miss for reading on the other hand really prevents us from computing