10:30 a.m.
Moving Sca/Lapack to Higher Precision (without too much work)
Yozo Hida

Lapack 3.1 was released about 2 months ago. It includes:
-Improved MRRR algorithm for symmetric eigenproblems
-Faster Hessenberg QR
-Faster Hessenberg, tridiagonal, and bidiagonal reductions
-Mixed precision iterative refinement
-Thread safety

Current Activities:
Faster Algorithms
-O(n^2) solver for polynomial roots via companion matrix
More Accurate Algorithms
-Higher precisions
Expanded contents (more Lapack in ScaLapack)
Improve ease of use

What could go into Lapack?
for all linear algebra problems do
  for all matrix types do
    for all data types do
      for all architectures and networks do
        for all programming interfaces do
          Produce fastest algorithm providing acceptable accuracy;
          Produce most accurate algorithm running at acceptable speed;
        od;
      od;
    od;
  od;
od;

This talk will focus on what we can do as far as data types go, especially multiple precision.

Motivation
Lapack and ScaLapack are widely used.
A significant minority of users want multiple precision.
Want to support codes like:
n_bits <- 32
repeat
  n_bits <- n_bits x 2
  Solve with n_bits
until error < tol
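
A minimal sketch of such a driver in C, using MPFR (an assumption; any bignum package would do). The "Solve" step is stood in by a toy computation; a real driver would call a multiple-precision solver and use a residual-based error estimate:

#include <stdio.h>
#include <mpfr.h>

int main(void) {
    mpfr_prec_t n_bits = 32;
    mpfr_t x, prev, err, tol;
    mpfr_inits2(256, prev, err, tol, (mpfr_ptr) 0);
    mpfr_set_str(tol, "1e-30", 10, MPFR_RNDN);
    mpfr_set_ui(prev, 0, MPFR_RNDN);

    do {
        n_bits *= 2;                        /* n_bits <- n_bits x 2 */
        mpfr_init2(x, n_bits);
        mpfr_const_pi(x, MPFR_RNDN);        /* stand-in for "Solve with n_bits" */
        mpfr_sub(err, x, prev, MPFR_RNDN);  /* error estimate: change since last pass */
        mpfr_abs(err, err, MPFR_RNDN);
        mpfr_set(prev, x, MPFR_RNDN);
        mpfr_clear(x);
    } while (mpfr_cmp(err, tol) > 0);       /* until error < tol */

    printf("converged at %ld bits\n", (long) n_bits);
    mpfr_clears(prev, err, tol, (mpfr_ptr) 0);
    return 0;
}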

Fixed Precision:
Compiler supported quad
Double-double, quad-double
Arbitrary Precision:
GMP / MPFR
ARPREC

Double-double / Quad-double Package:
-These represent a higher precision number as an unevaluated sum of 2 or 4 double precision numbers.
-Ex: 2^60 + 1 is represented as (2^60, 1).
-Can represent about 32/64 digits of precision.
-Highly efficient algorithms due to fixed, small size.
-Simple representation. Easy to allocate or to use with MPI.
-Somewhat fuzzy definition of "machine precision" for double-double.
-Exponent range limited by that of double precision.
-C/C++/Fortran 95 interfaces.
-Support for complex data types recently added.
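
The building block behind these packages is the error-free transformation: Knuth's TwoSum computes a rounded sum together with its exact rounding error, which is precisely the unevaluated-sum representation above. A minimal C sketch (the dd struct here is illustrative, not the package's actual type):

#include <stdio.h>

typedef struct { double hi, lo; } dd;   /* unevaluated sum hi + lo */

/* TwoSum: s = fl(a + b) plus the exact error e, so hi + lo == a + b exactly */
static dd two_sum(double a, double b) {
    dd r;
    r.hi = a + b;                             /* rounded sum */
    double bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);      /* exact rounding error */
    return r;
}

int main(void) {
    /* 2^60 + 1 does not fit in one double, but is exact as a pair */
    dd x = two_sum(1152921504606846976.0 /* 2^60 */, 1.0);
    printf("hi = %.17g, lo = %.17g\n", x.hi, x.lo);   /* prints (2^60, 1) */
    return 0;
}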

GMP/MPFR:
-One of the most widely used packages for multiple-precision floating point computation with correct rounding.
-Speed.
-Uses a somewhat complicated data structure, mixing various types in a C structure.

ARPREC:
-Uses a simple floating point array to represent data.
-Message passing is made easier.
-Slower than GMP.

Considerations:
-Code will be easier to maintain if only a single source code is used for varying precision.
-Need to let the user know how much workspace to allocate.
-Allocation of temporary variables.
-Multiple precisions in one instance: allowable?
-How to adjust precision.
-Co-existence of multiple versions?

Workspace allocation:
Specifying size in bytes doesn't work well in Fortran. Specifying size in bignum slots works okay if every bignum in the workspace has the same size.
Allocate it ourselves? That would be nice for cache friendliness and ease of message passing. A sketch of the slot scheme follows.
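
A hedged sketch of the bignum-slot scheme in C: the caller hands over one flat double array (ARPREC-style), and the library carves it into equal fixed-size slots, so an LWORK-style workspace query can be answered in slots. The names SLOT_DOUBLES and ws_slot are illustrative, not from any package:

#include <stdio.h>
#include <stdlib.h>

#define SLOT_DOUBLES 8   /* doubles per bignum at the chosen fixed precision */

/* pointer to the i-th bignum slot in a caller-provided flat workspace */
static double *ws_slot(double *work, size_t i) {
    return work + i * SLOT_DOUBLES;
}

int main(void) {
    size_t n_bignums = 100;   /* answer to a workspace query, in slots */
    double *work = malloc(n_bignums * SLOT_DOUBLES * sizeof *work);
    if (!work) return 1;
    ws_slot(work, 3)[0] = 1.0;                 /* touch the 4th slot */
    printf("slot 3 starts at offset %ld\n", (long)(ws_slot(work, 3) - work));
    free(work);
    return 0;
}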

Temporary Variable Allocation: Memory for temp variables needs to be allocated somewhere.
Could use an external malloc routine or allocate explicitly. With fixed precision, the compiler allocates automatically.
Could be solved through the use of macros, as in the sketch below.
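
One way the macro idea could look in C: temporary declaration and cleanup are hidden behind macros that expand differently per precision build, so a single source text serves both cases. The macro names are illustrative, not from Lapack:

#ifdef FIXED_PRECISION
  /* fixed precision: the compiler allocates the temporary automatically */
  #define DECLARE_TMP(t)  double t = 0.0
  #define FREE_TMP(t)     /* nothing to do */
#else
  #include <mpfr.h>
  /* arbitrary precision: temporary must be allocated and released */
  #define DECLARE_TMP(t)  mpfr_t t; mpfr_init2(t, mpfr_get_default_prec())
  #define FREE_TMP(t)     mpfr_clear(t)
#endif

void kernel(void) {
    DECLARE_TMP(acc);    /* the same source line works in either build */
    /* ... compute with acc ... */
    FREE_TMP(acc);
}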

Memory allocation: how much? When? Where?
Fixed Precision: the easiest approach.
Many compilers provide support for quad precision.
OMF77 supports multi-precision variables (precision is selected at compile time).
Maximum precision:
User specifies the maximum precision at compile time; at run time, the program specifies the precision to be used. Memory allocation issues are handled automatically, but memory can be wasted needlessly (see the sketch below).
Variable precision:
Memory allocation becomes tricky; memory must be allocated dynamically.
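
A hedged sketch of the maximum-precision approach in C: storage is sized at compile time for the maximum, so variables are plain structs that are easy to allocate and to ship over MPI, while the precision actually used is chosen at run time. The unused limbs are the wasted memory mentioned above. MAX_LIMBS and the bignum layout are illustrative assumptions:

#define MAX_LIMBS 16             /* compile-time maximum precision */

typedef struct {
    int    nlimbs;               /* run-time precision actually in use */
    int    exponent;
    double limb[MAX_LIMBS];      /* fixed-size payload, ARPREC-style */
} bignum;

/* ordinary declaration works; no dynamic allocation needed */
bignum a = { .nlimbs = 4, .exponent = 0, .limb = { 0 } };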

A simple replacement of variable types is not enough when working in multiple precision. The compiler must see each use of a variable as a multiple precision number so that every operation is carried out in multiple precision. Also, when dealing with constants not exactly expressible as a double, extra care is needed (see the sketch below).
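
A minimal illustration of the constant pitfall, using MPFR: writing 0.1 in the source yields the double nearest to 0.1, and promoting that double to higher precision only reproduces its rounding error. The constant has to reach the multiple-precision type without passing through double:

#include <stdio.h>
#include <mpfr.h>

int main(void) {
    mpfr_t bad, good;
    mpfr_inits2(200, bad, good, (mpfr_ptr) 0);

    mpfr_set_d(bad, 0.1, MPFR_RNDN);           /* inherits double's rounding of 0.1 */
    mpfr_set_str(good, "0.1", 10, MPFR_RNDN);  /* 0.1 correctly rounded to 200 bits */

    mpfr_printf("bad  = %.40Rf\ngood = %.40Rf\n", bad, good);
    mpfr_clears(bad, good, (mpfr_ptr) 0);
    return 0;
}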

There are naming issues that need to be resolved as more packages are added.

Currently implemented as a Perl script that converts the Lapack sources as needed.

Iterative Refinement:
Use Newton's iteration on r(x) = b - Ax, but compute the residual in twice the working precision.
You can get a more accurate answer, up to a point: the error does not grow with the condition number until the condition number gets too large. A sketch follows.
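
A toy sketch of the idea in C on a 2x2 system: the solve runs in working precision (float here), the residual r = b - Ax is accumulated in double (twice the working precision), and the Newton correction is applied repeatedly. This illustrates the technique under those assumptions; it is not Lapack's actual routine:

#include <stdio.h>

int main(void) {
    double A[2][2] = { {4.0, 1.0}, {1.0, 3.0} };
    double b[2]    = { 1.0, 2.0 };   /* exact solution: (1/11, 7/11) */

    /* working-precision "factorization": here just the explicit 2x2 inverse */
    float det = (float)(A[0][0]*A[1][1] - A[0][1]*A[1][0]);
    float Ai[2][2] = { { (float)A[1][1]/det, (float)-A[0][1]/det },
                       { (float)-A[1][0]/det, (float)A[0][0]/det } };

    /* initial solve in working precision */
    double x[2] = { Ai[0][0]*(float)b[0] + Ai[0][1]*(float)b[1],
                    Ai[1][0]*(float)b[0] + Ai[1][1]*(float)b[1] };

    for (int it = 0; it < 5; it++) {
        /* residual computed in DOUBLE the working precision */
        double r0 = b[0] - (A[0][0]*x[0] + A[0][1]*x[1]);
        double r1 = b[1] - (A[1][0]*x[0] + A[1][1]*x[1]);
        /* correction solved in working precision, applied in double */
        float d0 = Ai[0][0]*(float)r0 + Ai[0][1]*(float)r1;
        float d1 = Ai[1][0]*(float)r0 + Ai[1][1]*(float)r1;
        x[0] += d0;
        x[1] += d1;
        printf("iter %d: x = (%.16g, %.16g)\n", it, x[0], x[1]);
    }
    return 0;
}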

Once the high-precision library is set up appropriately, most of the existing Lapack code can be transformed with little trouble.

Future work:
More comprehensive error handling.
How much performance can we buy from using multi-core BLAS?
Apply this to ScaLapack.

Recent Results in Fast Stable Matrix Computations:
-Coppersmith and Winograd (1990): still the fastest known matrix multiplication, about O(n^2.376).
-Strassen et al. showed that O(n^a) matrix multiplication implies O(n^a) algorithms for other matrix operations.
-Cohn, Kleinberg, Szegedy, Umans (2005): an algorithm that equals or beats Coppersmith and Winograd, depending on finding the right groups. It uses Wedderburn's Theorem to reduce matrix multiplication to FFTs.
New results (Demmel, Dumitriu, Holtz, Kleinberg 2006):
-The CKSU (2005) algorithm is stable.
-Any O(n^a) matrix multiplication algorithm can be converted to a stable one (based on Raz, 2003).
-There are stable algorithms for QR, LU, solve, and determinant costing O(n^(a+e)) for any e > 0.