Differences between revisions 4 and 5
Revision 4 as of 2008-05-20 17:14:53
Size: 693
Editor: RobertMiller
Comment:
Revision 5 as of 2008-06-13 04:00:11
Size: 1271
Comment:
Deletions are marked like this. Additions are marked like this.
Line 18: Line 18:
 * Implement Strassen (variant) for dense matrix reduction. The result should be faster than anything else available. Greg has two strategies in mind. We'll implement both and compare their speed.
 * Improve/implement multi-core matrix multiplication and if feasible multi-core reduction
 * If there is time: sparse matrix techniques
 * implement LQUP decomposition [Clement]
   * implement LQUP routine
   * implement TRSM routine
   * implement efficient column swaps/rotations [Martin]
     * SSE2 might help a lot here
   * implement memory efficient mzd_addmul_strassen [Martin]
     * See Clement's et al. paper on memory efficient Strassen-Winograd
 * implement Arne's asymptotically fast reduction algorithm [Martin]
 * implement multi-core multiplication with optimal speed-up
   * OpenMP seems to be nice and easy
   * 2 cores probably main target, but think about 4 cores too
 * improve efficiency of M4RM
   * try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1
   * try to fit three matrices rather than two into L2 or understand why it works so good for two
   * detect L1/L2 cache sizes at runtime and choose optimal parameters for them
   * implement Bill's half table idea and benchmark it

Dev Days 1: Exact Linear Algebra

  • Michael Abshoff
  • Martin Albrecht
  • Nick Alexander
  • Gregory Bard
  • Rob Beezer
  • Tom Boothby
  • Craig Citro
  • Burcin Erocal
  • Robert Miller (especially sparse GF(2))
  • David Roe (p-adic linear algebra?)
  • Arne Storjohann
  • William Stein
  • Ralf-Philipp Weinmann

GF(2)

  • implement LQUP decomposition [Clement]
    • implement LQUP routine
    • implement TRSM routine
    • implement efficient column swaps/rotations [Martin]
      • SSE2 might help a lot here
    • implement memory efficient mzd_addmul_strassen [Martin]
      • See Clement's et al. paper on memory efficient Strassen-Winograd
  • implement Arne's asymptotically fast reduction algorithm [Martin]
  • implement multi-core multiplication with optimal speed-up
    • OpenMP seems to be nice and easy
    • 2 cores probably main target, but think about 4 cores too
  • improve efficiency of M4RM
    • try 7 instead of 8 Gray code tables to leave room for the actual matrix in L1
    • try to fit three matrices rather than two into L2 or understand why it works so good for two
    • detect L1/L2 cache sizes at runtime and choose optimal parameters for them
    • implement Bill's half table idea and benchmark it

dev1/linalg (last edited 2008-11-14 13:41:56 by anonymous)