(Preliminary) SEP 2: Optimizing the SAGE library using Parallel Techniques

AUTHOR: William Stein

COPYRIGHT: GNU Free Documentation License, 2007.

Basic Principles

  1. Parallel methods should be viewed as a means to an end -- speedups. Never parallelize any computation except to speed up a calculation beyond what can be done using sequential techniques.
  2. Parallel methods should never completely replace sequential implementations. Parallel algorithms are often very complicated to understand and test, so we need, at a minimum, a randomized test function that compares their output with the output of purely sequential code (see the sketch after this list).
  3. Do not write extremely complicated parallel code that nobody can understand or maintain. Because SAGE is an open source system that is widely developed, it is crucial that it be readable.
  4. It is *crucial* that the implementation of parallel methods in SAGE have the following properties:
    • It can be done incrementally. One must be able to start with almost any specific operation or algorithm in SAGE and make a parallel version without having to drastically change code all over SAGE. Any proposed solution that violates this fails to meet our needs.
    • It doesn't depend on any libraries or tools that are not open source and free, and all dependencies must work on the SAGE target platforms: Linux, OS X, Windows, (and soon Solaris).
    • All dependencies of parallel algorithms must ship as a standard part of SAGE.
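
As a rough illustration of principle 2, here is a minimal sketch of such a randomized comparison test, in plain Python. The names f_sequential and f_parallel are hypothetical stand-ins for a real SAGE function and its parallel counterpart.

    import random

    def f_sequential(mat, vec):
        # Hypothetical stand-in: the trusted sequential matrix-times-vector.
        return [sum(a * x for a, x in zip(row, vec)) for row in mat]

    def f_parallel(mat, vec):
        # Hypothetical stand-in for the parallel implementation under test.
        return f_sequential(mat, vec)

    def random_compare_test(trials=100, n=20, seed=0):
        # Principle 2: compare the parallel output with the sequential
        # output on many randomly generated inputs.
        rng = random.Random(seed)
        for _ in range(trials):
            mat = [[rng.randint(-10, 10) for _ in range(n)] for _ in range(n)]
            vec = [rng.randint(-10, 10) for _ in range(n)]
            assert f_parallel(mat, vec) == f_sequential(mat, vec)

    random_compare_test()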

Architecture

We propose that parallel optimizations for SAGE be carried out (in parallel!) using three complementary approaches: fine (multithreaded), medium (MPI), and coarse (dsage task farming).

COMMENTS:

  1. Collapse the two bottom levels (i.e., thread and message passing)!
  2. Avoid threads if at all possible!
    • But sometimes use them.
    • Parallel databases?!
    • Scheduling of threads is very hard.
    • Scaling not the same for threads... (something is possible but...)
  3. MPI -- harder to write, but maybe less subtle.
  4. Not at all clear how to manage how parallel each component should be.
  5. Be realistic. Don't try to solve the general problem...
  6. If you want to be up and running soon (I do!), the only choice is MPI. Duh.
  7. We definitely want a support library that makes it easy for programmers to use MPI.
  8. SAGE -- target audience = working research mathematicians; quick prototyping; not industrial applications...
  9. Resources -- evil malloc; pthreads can very quickly give your application-level info. We need some sort of "how much 'parallel power' is available" query. Be single threaded....
  10. TWO LEVELS: i.e., collapse the top two.
    • coarse -- proudly parallel
    • fine -- maybe use gasnet.
  11. THREE LEVELS OF USERS:
    1. the task farmer
    2. I do parallel programming, e.g., using SAGE constructs.
    3. I use a function that happens to be parallel (and maybe don't know it).
  12. MPI (see the sketch after this list):
    • you might think it's static from the tutorials...
    • but it's dynamic (!) -- MPI-2.x only, though.
    • parallel programming is hard.
  13. pthreads -- the programming model is "simpler", since you can ask for bazillions of threads... and get a speedup.
  14. Java is nicely parallel already, so why not just write everything in Java?! Because: (1) maintenance issues and frustration, (2)
  15. Cilk.
  16. What will help the target (research mathematicians) the most with their research? (Task farming.) Enable people to do what they would otherwise not do.
  17. Distinguish ideal versus reality -- KISS.
  18. Task farming is the first thing people are interested in. Step 1. BUT, then I will want to *parallelize my algorithms* transparently. Optimize iteratively.
  19. Losing the intermediate level: Python people LOSE something in between the low level and the coarse grain.
  20. Moving around complicated mathematical objects is interesting and subtle.
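
As a rough illustration of comment 12 (dynamic process management is an MPI-2 feature), here is a minimal sketch using mpi4py as the Python MPI binding; mpi4py and the file name worker.py are assumptions of this sketch, not part of the proposal.

    # parent.py -- a parent process dynamically spawns MPI workers (MPI-2).
    import sys
    from mpi4py import MPI

    # Launch 4 fresh Python interpreters running the (hypothetical) worker.py.
    comm = MPI.COMM_SELF.Spawn(sys.executable, args=['worker.py'], maxprocs=4)

    comm.bcast({'task': 'demo'}, root=MPI.ROOT)  # send a work description
    results = comm.gather(None, root=MPI.ROOT)   # collect one reply per worker
    comm.Disconnect()

    # worker.py would mirror this:
    #   from mpi4py import MPI
    #   comm = MPI.Comm.Get_parent()
    #   task = comm.bcast(None, root=0)
    #   comm.gather('done: ' + task['task'], root=0)
    #   comm.Disconnect()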

1. Fine level -- shared memory (mostly multicore desktop/laptop)

Proposed tool: the standard POSIX thread library pthread

Justification:

  • pthread is available on all target platforms, is mature, and is well supported and optimized.
  • With some thought I think we can make it more usable for our applications (macros, preparsing, whatever).

Design issues:

  • Have a global variable nthreads

Problems:

  • That Python is not thread safe means that many natural approaches to optimizing the SAGE library are not viable.
  • It's difficult to decide how many threads to spawn at any given point.
  • (When) Should one use a thread pool?
  • Self-imposed constraint: pthreads can only be used as follows:

        ... arbitrary sagex code ...
        # atomic threaded c-level function call that gets no PyObject*'s and makes no Python/C API calls
        # called with explicit input that gives the max number of threads it is allowed to spawn (-1 = any number)
        ... arbitrary sagex code ...
    • Sample problems: non-generic matrix addition, non-generic scalar multiplication, polynomial arithmetic, L-series coefficients, approximation of infinite sums, matrix times vector (a sketch of the matrix-times-vector case follows).
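
As a rough illustration of this constrained pattern in plain Python (rather than sagex plus pthreads), here is a sketch of the matrix-times-vector sample problem using the standard threading module and NumPy. NumPy's dot typically executes in a C-level BLAS kernel that releases the GIL, so it plays the role of the "atomic threaded c-level function call" above; the global nthreads mirrors the design issue above. Everything here is an illustrative assumption, not the proposed implementation.

    import threading
    import numpy as np

    nthreads = 2  # global thread-count knob, as in the design issue above

    def parallel_matvec(A, v, max_threads=-1):
        # max_threads = -1 means "any number"; fall back to the global default.
        n = nthreads if max_threads == -1 else max_threads
        out = np.empty(A.shape[0])
        # One contiguous block of rows per thread.
        chunks = np.array_split(np.arange(A.shape[0]), n)

        def kernel(rows):
            # The heavy work happens inside np.dot, which releases the GIL,
            # so these threads can genuinely run in parallel.
            out[rows] = A[rows].dot(v)

        threads = [threading.Thread(target=kernel, args=(c,)) for c in chunks]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return out

    A = np.random.rand(1000, 1000)
    v = np.random.rand(1000)
    assert np.allclose(parallel_matvec(A, v), A.dot(v))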

2. Medium level -- homogeneous trusted cluster

Proposed tool: ipython1 with MPI under the hood.

Justification:

  • This is the hardware that the ipython developers use.
  • It's written in Python, well tested, and will be included in SAGE anyway.

Example problems:

  • multi-modular linear algebra algorithms (sketched below)
  • systems of Hecke eigenvalues
  • speeding up very generic matrix operations in some cases
  • optimizing interpreted Python code with various loops, etc., where individual operations don't take long.
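
A minimal sketch of the multi-modular flavor of medium-level parallelism, written against mpi4py to keep it self-contained (the proposal itself suggests ipython1 on top of MPI; mpi4py, the file name, and the toy "work" are assumptions of this sketch). Run with, e.g., mpiexec -n 4 python modular_demo.py.

    # modular_demo.py -- scatter moduli across ranks, gather residues at rank 0.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    N = 2**89 - 1  # a big integer whose residues we want (toy stand-in)

    if rank == 0:
        primes = [101, 103, 107, 109, 113, 127, 131, 137]
        chunks = [primes[i::size] for i in range(size)]  # one sublist per rank
    else:
        chunks = None

    my_primes = comm.scatter(chunks, root=0)
    my_residues = [(p, N % p) for p in my_primes]  # the mod-p "work"

    all_residues = comm.gather(my_residues, root=0)
    if rank == 0:
        print(sorted(pair for part in all_residues for pair in part))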

3. Coarse level -- heterogeneous task farm (both trusted and untrusted)

Proposed tool: dsage

Justification:

  • Written in Python to address specific problems we have.

Example problems:

  • proudly parallel problems
  • integer factorization
  • creation of a wide range of tables (e.g., tables of elliptic curves, modular forms, computing [f(n) for n in range(...)] where f is a function in GAP, PARI, Magma, etc.)
  • computing plots of a collection of functions (especially high quality 3d)
  • search for abc triples :-)
  • lots of *end user* use of parallelism for their own work (a toy task-farm sketch follows)
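
As a toy illustration of coarse task farming in plain Python (dsage is the proposed tool, but its API is not shown here), this sketch farms [f(n) for n in range(...)] out to a pool of worker processes; the function f is a hypothetical stand-in for an expensive call into GAP, PARI, Magma, etc.

    from multiprocessing import Pool

    def f(n):
        # Hypothetical stand-in for an expensive call into GAP, PARI, etc.
        return sum(i * i for i in range(n * 1000))

    if __name__ == '__main__':
        with Pool(processes=4) as pool:
            # Farm the independent tasks out to the workers; order is preserved.
            results = pool.map(f, range(100))
        print(results[:5])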
