MPIR - Parallel Algorithms and CUDA

Present: Carl Witty, Bill Hart, Michael Abshoff, Glenn Tarbox
Virtually present: Jeff Gilchrist, Gonzalo Tornaria

You can chat in a Linux text console by installing "irssi", running "irssi -c irc.freenode.net", and then typing "/join #sage-devel".

Parallel algorithms:

  • Multimodular algorithms
  • Scalar algorithms
  • Peter Montgomery's remainder algorithm for computing a mod b: precompute b1 = B mod b, b2 = B^2 mod b, b3 = B^3 mod b, ..., write a = a0 + a1*B + a2*B^2 + ..., then compute a0 + a1*b1 + a2*b2 + ... and do a final reduction mod b. The multiplications are independent, so they can be done in parallel (see the first sketch after this list).

  • Addition and subtraction can be parallelised using nails - a non-unique (redundant) representation of numbers in which each limb keeps a few high bits free to absorb carries, so limbs can be added independently (see the second sketch after this list).
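
Below is a minimal sketch in C of the precomputed-powers remainder idea, assuming 32-bit limbs (base B = 2^32) and a single-word modulus b < 2^32; the function name and test values are illustrative, not MPIR code. Each limb product is independent, so a real implementation could farm them out to threads or GPU cores; here they are accumulated sequentially for clarity.

  #include <stdint.h>
  #include <stdio.h>

  /* Reduce a = a[0] + a[1]*B + ... + a[n-1]*B^(n-1) modulo b, with
   * limbs in base B = 2^32 and b < 2^32.  The powers B^i mod b would
   * be precomputed once; each product a[i] * (B^i mod b) is independent
   * of the others, which is what makes the method parallelisable. */
  uint32_t mod_precomputed_powers(const uint32_t *a, size_t n, uint32_t b)
  {
      uint64_t B_mod_b = (1ULL << 32) % b;   /* b1 = B mod b */
      uint64_t pw = 1;                       /* B^i mod b, starting at i = 0 */
      uint64_t acc = 0;
      for (size_t i = 0; i < n; i++) {
          acc = (acc + (uint64_t)(a[i] % b) * pw) % b;  /* add a_i * b_i */
          pw = (pw * B_mod_b) % b;           /* advance to B^(i+1) mod b */
      }
      return (uint32_t)acc;                  /* final reduction mod b */
  }

  int main(void)
  {
      uint32_t a[2] = { 3, 2 };              /* a = 3 + 2*2^32 = 8589934595 */
      printf("%u\n", mod_precomputed_powers(a, 2, 1000003));   /* 908828 */
      return 0;
  }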

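And a sketch of the nails idea - again my own illustration rather than actual GMP/MPIR code: with 2 nail bits reserved at the top of each 64-bit word, a few additions can be done limb by limb with no carry chain, and a single sequential pass restores the canonical form afterwards.

  #include <stdint.h>
  #include <stdio.h>

  #define NAIL_BITS 2
  #define PAYLOAD_BITS (64 - NAIL_BITS)             /* 62 data bits per limb */
  #define PAYLOAD_MASK ((1ULL << PAYLOAD_BITS) - 1)

  /* Limb-by-limb addition with no carry chain: any carry out of the 62
   * payload bits lands in the nail bits, so every limb is independent
   * and the loop can be split across threads or GPU cores.  With 2 nail
   * bits, up to three such additions are safe before normalising. */
  void add_no_carry(uint64_t *r, const uint64_t *x, const uint64_t *y, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          r[i] = x[i] + y[i];
  }

  /* Sequential pass that pushes the nail bits into the next limb,
   * restoring the unique (canonical) representation. */
  void normalise(uint64_t *r, size_t n)
  {
      uint64_t carry = 0;
      for (size_t i = 0; i < n; i++) {
          uint64_t t = r[i] + carry;
          r[i] = t & PAYLOAD_MASK;
          carry = t >> PAYLOAD_BITS;
      }
  }

  int main(void)
  {
      uint64_t x[2] = { PAYLOAD_MASK, 1 };   /* x = (2^62 - 1) + 1*2^62 */
      uint64_t y[2] = { 1, 0 };              /* y = 1 */
      uint64_t r[2];
      add_no_carry(r, x, y, 2);              /* parallel-friendly step */
      normalise(r, 2);                       /* expect r = { 0, 2 } */
      printf("%llu %llu\n", (unsigned long long)r[0], (unsigned long long)r[1]);
      return 0;
  }
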
Glenn Tarbox (owner of cuda1, an AMD K10 machine with an NVIDIA CUDA card - expert on large-scale parallelisation)

  • What are the top-level integration issues, e.g. for libraries using MPIR?

Michael Abshoff (Sage release manager)

  • Link into Sage via Cython and link in CUDA

CUDA documentation:

CUDA issues:

  • Memory bandwidth limits algorithms: an n x n matrix has n^2 entries to get in and out of memory but matrix multiplication does O(n^2.7) operations on them, whereas an n-limb integer has n limbs to get in and out but only O(n log n log log n) operations to multiply - for integers the work per word transferred is only polylogarithmic, so the memory bus rather than the arithmetic units becomes the bottleneck (see the sketch below).

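As a rough back-of-envelope illustration of the point above (my own sketch, with the 2.7 exponent taken from the note rather than from any particular algorithm): the work available per word transferred grows like n^0.7 for matrices but only polylogarithmically for integers, so integer multiplication saturates the memory bus long before it saturates the arithmetic units. Compile with -lm.

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      /* Operations per element transferred, for the two cases in the
       * note: an n x n matrix (n^2 entries, ~n^2.7 ops) versus an
       * n-limb integer (n limbs, ~n log n log log n ops). */
      for (double n = 1e3; n <= 1e9; n *= 1e3) {
          double matrix_ops_per_entry = pow(n, 2.7) / (n * n);
          double integer_ops_per_limb = log(n) * log(log(n));
          printf("n = %.0e: matrix ~%.0f ops/entry, integer ~%.0f ops/limb\n",
                 n, matrix_ops_per_entry, integer_ops_per_limb);
      }
      return 0;
  }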