10:30 a.m.
Moving Sca/Lapack to Higher Precision (without too much work)
Yozo Hida

Lapack 3.1 was released about 2 months ago. It includes:
-Improved MRRR algorithm for symmetric eigenproblems
-Faster Hessenberg QR
-Faster Hessenberg, tridiagonal, and bidiagonal reductions
-Mixed precision iterative refinement
-Thread safety

Current Activities:
Faster Algorithms
-O(n^2) solver for polynomial roots via companion matrix
More Accurate Algorithms
-Higher precisions
Expanded contents (more Lapack in ScaLapack)
Improve ease of use

What could go into Lapack?
for all linear algebra problems do
  for all matrix types do
    for all data types do
      for all architectures and networks do
        for all programming interfaces do
          Produce fastest algorithm providing acceptable accuracy;
          Produce most accurate algorithm running at acceptable speed;
        od;
      od;
    od;
  od;
od;

This talk will focus on what we can do as far as data types go, especially multiple precision.

Motivation
Lapack and ScaLapack are widely used.
A significant minority of users want multiple precision.
Want to support codes like:
n_bits <- 32
repeat
  n_bits <- n_bits x 2
  Solve with n_bits
until error < tol
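
A minimal sketch of such a driver in C, using MPFR (an assumption; any bignum package would do). The "Solve" step is stood in by a toy computation; a real driver would call a multiple-precision solver and use a residual-based error estimate:

#include <stdio.h>
#include <mpfr.h>

int main(void) {
    mpfr_prec_t n_bits = 32;
    mpfr_t x, prev, err, tol;
    mpfr_inits2(256, prev, err, tol, (mpfr_ptr) 0);
    mpfr_set_str(tol, "1e-30", 10, MPFR_RNDN);
    mpfr_set_ui(prev, 0, MPFR_RNDN);

    do {
        n_bits *= 2;                        /* n_bits <- n_bits x 2 */
        mpfr_init2(x, n_bits);
        mpfr_const_pi(x, MPFR_RNDN);        /* stand-in for "Solve with n_bits" */
        mpfr_sub(err, x, prev, MPFR_RNDN);  /* error estimate: change since last pass */
        mpfr_abs(err, err, MPFR_RNDN);
        mpfr_set(prev, x, MPFR_RNDN);
        mpfr_clear(x);
    } while (mpfr_cmp(err, tol) > 0);       /* until error < tol */

    printf("converged at %ld bits\n", (long) n_bits);
    mpfr_clears(prev, err, tol, (mpfr_ptr) 0);
    return 0;
}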

Fixed Precision:
Compiler supported quad
Double-double, quad-double
Arbitrary Precision:
GMP / MPFR
ARPREC

Double-double / Quad-double Package:
-These represent a higher precision number as an unevaluated sum of 2 or 4 double precision numbers.
-Ex: 2^60 + 1 is represented as (2^60, 1).
-Can represent about 32/64 digits of precision.
-Highly efficient algorithms due to fixed, small size.
-Simple representation. Easy to allocate or to use with MPI.
-Somewhat fuzzy definition of "machine precision" for double-double.
-Exponent range limited by that of double precision.
-C/C++/Fortran 95 interfaces.
-Support for complex data types recently added.
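
The building block behind these packages is the error-free transformation: Knuth's TwoSum computes a rounded sum together with its exact rounding error, which is precisely the unevaluated-sum representation above. A minimal C sketch (the dd struct here is illustrative, not the package's actual type):

#include <stdio.h>

typedef struct { double hi, lo; } dd;   /* unevaluated sum hi + lo */

/* TwoSum: s = fl(a + b) plus the exact error e, so hi + lo == a + b exactly */
static dd two_sum(double a, double b) {
    dd r;
    r.hi = a + b;                             /* rounded sum */
    double bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);      /* exact rounding error */
    return r;
}

int main(void) {
    /* 2^60 + 1 does not fit in one double, but is exact as a pair */
    dd x = two_sum(1152921504606846976.0 /* 2^60 */, 1.0);
    printf("hi = %.17g, lo = %.17g\n", x.hi, x.lo);   /* prints (2^60, 1) */
    return 0;
}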

GMP/MPFR:
-One of the most widely used packages for multiple-precision floating point computation with correct rounding.
-Speed.
-Uses a somewhat complicated data structure, mixing various types in a C structure.

ARPREC:
-Uses a simple floating point array to represent data.
-Message passing is made easier.
-Slower than GMP.

Considerations:
-Code will be easier to maintain if only a single source code is used for varying precision.
-Need to let the user know how much workspace to allocate.
-Allocation of temporary variables.
-Multiple precisions in one instance: allowable?
-How to adjust precision.
-Co-existence of multiple versions?

Workspace allocation:
Specifying size in bytes doesn't work well in Fortran. Specifying size in bignum slots works okay if every bignum in the workspace has the same size.
Allocate it ourselves? That would be nice for cache friendliness and ease of message passing. A sketch of the slot scheme follows.
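
A hedged sketch of the bignum-slot scheme in C: the caller hands over one flat double array (ARPREC-style), and the library carves it into equal fixed-size slots, so an LWORK-style workspace query can be answered in slots. The names SLOT_DOUBLES and ws_slot are illustrative, not from any package:

#include <stdio.h>
#include <stdlib.h>

#define SLOT_DOUBLES 8   /* doubles per bignum at the chosen fixed precision */

/* pointer to the i-th bignum slot in a caller-provided flat workspace */
static double *ws_slot(double *work, size_t i) {
    return work + i * SLOT_DOUBLES;
}

int main(void) {
    size_t n_bignums = 100;   /* answer to a workspace query, in slots */
    double *work = malloc(n_bignums * SLOT_DOUBLES * sizeof *work);
    if (!work) return 1;
    ws_slot(work, 3)[0] = 1.0;                 /* touch the 4th slot */
    printf("slot 3 starts at offset %ld\n", (long)(ws_slot(work, 3) - work));
    free(work);
    return 0;
}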

Temporary Variable Allocation: Memory for temp variables needs to be allocated somewhere.
Could use an external malloc routine or allocate explicitly. With fixed precision, the compiler allocates automatically.
Could be solved through the use of macros, as in the sketch below.
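
One way the macro idea could look in C: temporary declaration and cleanup are hidden behind macros that expand differently per precision build, so a single source text serves both cases. The macro names are illustrative, not from Lapack:

#ifdef FIXED_PRECISION
  /* fixed precision: the compiler allocates the temporary automatically */
  #define DECLARE_TMP(t)  double t = 0.0
  #define FREE_TMP(t)     /* nothing to do */
#else
  #include <mpfr.h>
  /* arbitrary precision: temporary must be allocated and released */
  #define DECLARE_TMP(t)  mpfr_t t; mpfr_init2(t, mpfr_get_default_prec())
  #define FREE_TMP(t)     mpfr_clear(t)
#endif

void kernel(void) {
    DECLARE_TMP(acc);    /* the same source line works in either build */
    /* ... compute with acc ... */
    FREE_TMP(acc);
}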

Memory allocation: how much? When? Where?
Fixed Precision: the easiest approach.
Many compilers provide support for quad precision.
OMF77 supports multi-precision variables (precision is selected at compile time).
Maximum precision:
User specifies the maximum precision at compile time; at run time, the program specifies the precision to be used. Memory allocation issues are handled automatically, but memory can be wasted needlessly (see the sketch below).
Variable precision:
Memory allocation becomes tricky; memory must be allocated dynamically.
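
A hedged sketch of the maximum-precision approach in C: storage is sized at compile time for the maximum, so variables are plain structs that are easy to allocate and to ship over MPI, while the precision actually used is chosen at run time. The unused limbs are the wasted memory mentioned above. MAX_LIMBS and the bignum layout are illustrative assumptions:

#define MAX_LIMBS 16             /* compile-time maximum precision */

typedef struct {
    int    nlimbs;               /* run-time precision actually in use */
    int    exponent;
    double limb[MAX_LIMBS];      /* fixed-size payload, ARPREC-style */
} bignum;

/* ordinary declaration works; no dynamic allocation needed */
bignum a = { .nlimbs = 4, .exponent = 0, .limb = { 0 } };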

A simple replacement of variable types is not enough when working in multiple precision. The compiler must see each use of a variable as a multiple precision number so that every operation is carried out in multiple precision. Also, when dealing with constants not exactly expressible as a double, extra care is needed (see the sketch below).
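
A minimal illustration of the constant pitfall, using MPFR: writing 0.1 in the source yields the double nearest to 0.1, and promoting that double to higher precision only reproduces its rounding error. The constant has to reach the multiple-precision type without passing through double:

#include <stdio.h>
#include <mpfr.h>

int main(void) {
    mpfr_t bad, good;
    mpfr_inits2(200, bad, good, (mpfr_ptr) 0);

    mpfr_set_d(bad, 0.1, MPFR_RNDN);           /* inherits double's rounding of 0.1 */
    mpfr_set_str(good, "0.1", 10, MPFR_RNDN);  /* 0.1 correctly rounded to 200 bits */

    mpfr_printf("bad  = %.40Rf\ngood = %.40Rf\n", bad, good);
    mpfr_clears(bad, good, (mpfr_ptr) 0);
    return 0;
}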

There are naming issues that need to be resolved as more packages are added.

Currently implemented as a Perl script that converts the Lapack sources as needed.

Iterative Refinement:
Use Newton's iteration on r(x) = b - Ax, but compute the residual in twice the working precision.
You can get a more accurate answer, up to a point: the error does not grow with the condition number until the condition number gets too large. A sketch follows.
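
A toy sketch of the idea in C on a 2x2 system: the solve runs in working precision (float here), the residual r = b - Ax is accumulated in double (twice the working precision), and the Newton correction is applied repeatedly. This illustrates the technique under those assumptions; it is not Lapack's actual routine:

#include <stdio.h>

int main(void) {
    double A[2][2] = { {4.0, 1.0}, {1.0, 3.0} };
    double b[2]    = { 1.0, 2.0 };   /* exact solution: (1/11, 7/11) */

    /* working-precision "factorization": here just the explicit 2x2 inverse */
    float det = (float)(A[0][0]*A[1][1] - A[0][1]*A[1][0]);
    float Ai[2][2] = { { (float)A[1][1]/det, (float)-A[0][1]/det },
                       { (float)-A[1][0]/det, (float)A[0][0]/det } };

    /* initial solve in working precision */
    double x[2] = { Ai[0][0]*(float)b[0] + Ai[0][1]*(float)b[1],
                    Ai[1][0]*(float)b[0] + Ai[1][1]*(float)b[1] };

    for (int it = 0; it < 5; it++) {
        /* residual computed in DOUBLE the working precision */
        double r0 = b[0] - (A[0][0]*x[0] + A[0][1]*x[1]);
        double r1 = b[1] - (A[1][0]*x[0] + A[1][1]*x[1]);
        /* correction solved in working precision, applied in double */
        float d0 = Ai[0][0]*(float)r0 + Ai[0][1]*(float)r1;
        float d1 = Ai[1][0]*(float)r0 + Ai[1][1]*(float)r1;
        x[0] += d0;
        x[1] += d1;
        printf("iter %d: x = (%.16g, %.16g)\n", it, x[0], x[1]);
    }
    return 0;
}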

Once the high-precision library is set up appropriately, most of the existing Lapack code can be transformed with little trouble.

Future work:
More comprehensive error handling.
How much performance can we buy from using multi-core BLAS?
Apply this to ScaLapack.

Recent Results in Fast Stable Matrix Computations:
-Coppersmith and Winograd (1990): still the fastest known matrix multiplication, about O(n^2.376).
-Strassen et al. showed that O(n^a) matrix multiplication implies O(n^a) algorithms for other matrix operations.
-Cohn, Kleinberg, Szegedy, Umans (2005): an algorithm that equals or beats Coppersmith and Winograd, depending on finding the right groups. It uses Wedderburn's Theorem to reduce matrix multiplication to FFTs.
New results (Demmel, Dumitriu, Holtz, Kleinberg 2006):
-The CKSU (2005) algorithm is stable.
-Any O(n^a) matrix multiplication algorithm can be converted to a stable one (based on Raz, 2003).
-There are stable algorithms for QR, LU, solve, and determinant costing O(n^(a+e)) for any e > 0.