10:30 a.m. Interactive Parallel Computing with Python and IPython
Brian Granger

Python:
-Freely available (BSD license)
-Highly portable: OS X, Windows, Linux, supercomputers
-Can be used interactively (like Matlab, Mathematica, IDL)
-Simple, expressive syntax readable by human beings
-Supports OO, functional, generic and meta programming
-Large community of scientific/HPC users
-Powerful built-in data types and libraries
 -Strings, lists, sets, dictionaries (hash tables)
 -Networking, XML parsing, threading, regular expressions...
-Large number of third-party libraries for scientific computing
-Easy to wrap existing C/C++/Fortran codes

IPython: Enhanced Interactive Python Shell
-Freely available (BSD license): http://ipython.scipy.org/
-Goal: provide an efficient environment for exploratory and interactive scientific computing
-Has become the de facto shell for scientific computing in Python over the past few years
-Available as a standard package on every major Linux distribution. Downloaded over 27k times in 2006 alone
-Serves as the interactive shell for many other projects:
 -Math (SAGE)
 -Astronomy (PyRAF, CASA)
 -Physics (Ganga, PyMAD)
 -Biology (Pymerase)
 -Web frameworks

Capabilities:
-Input/output histories
-Interactive GUI control: enables interactive plotting
-Highly customizable: extensible syntax, error handling, ...
-Interactive control system: magic commands
-Dynamic introspection of nearly everything (objects, help, filesystem, etc.)
-Direct access to the filesystem and shell
-Integrated debugger and profiler support
-Easy to embed: give any program an interactive console with one line of code (a sketch follows this section)
-Interactive parallel/distributed computing...

Traditional Parallel Computing (at least in the realm of physics):

Compiled languages:
-C/C++/Fortran are FAST for computers, SLOW for you
-Everything is low-level; you get nothing for free
-Only primitive data types
-Few built-in libraries
-Manual memory management: bugs and more bugs
-With C/C++ you don't even get built-in high-performance numerical arrays

Message Passing Interface (MPI):
Pros:
-Robust, optimized, standardized, portable, common
-Existing parallel libraries (FFTW, ScaLAPACK, Trilinos, PETSc)
-Runs over Ethernet, InfiniBand, Myrinet
-Great at moving data around fast!
Cons:
-Trivial things are not trivial: lots of boilerplate code (see the MPI sketch after this section)
-Orthogonal to how scientists think and work
-Static: load balancing and fault tolerance are difficult to implement
-Emphasis on compiled languages
-Non-interactive and non-collaborative
-Doesn't play well with other tools: GUIs, plotting, visualization, web
-Labor intensive to learn and use properly

Case study: parallel jobs at NERSC in 2006:
NERSC = DOE supercomputing center at Lawrence Berkeley National Laboratory
Seaborg = IBM SP RS/6000 with 6080 CPUs
-90% of jobs used fewer than 113 CPUs
-Only 0.26% of jobs used more than 2048 CPUs
Jacquard = 712-CPU Opteron system
-50% of jobs used fewer than 15 CPUs
-Only 0.39% of jobs used more than 256 CPUs
Yes, this isn't the entire story; there are other issues involved and other reasons people don't want to use lots of CPUs.

Realities:
-Developing highly parallel codes with these tools is extremely difficult and time consuming
-When it comes to parallel computing, WE are often the bottleneck
-We spend most of our time writing code rather than waiting for those "slow" computers
-With the advent of multi-core CPUs, this problem is coming to a laptop/desktop near you
-Parallel speedups are not guaranteed!
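The "easy to embed" capability above is concrete enough to sketch. This toy example is mine, not from the talk, and uses today's IPython.embed() call (the embedding API at the time of the talk had a different name, though the one-line idea is the same):

    # Minimal sketch of embedding an interactive IPython console in an
    # ordinary program (modern IPython.embed() API; the 2007-era API
    # differed).
    import IPython

    def analyze(data):
        total = sum(data)
        # One line drops the program into an interactive console where
        # 'data' and 'total' can be inspected and modified before returning.
        IPython.embed()
        return total

    if __name__ == "__main__":
        analyze([1, 2, 3])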
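To ground the "trivial things are not trivial" complaint about MPI, here is a minimal sketch of a parallel sum, written with the mpi4py bindings (my choice of bindings, not something named in the talk): even a toy reduction needs explicit communicators, ranks, and a launcher, and there is no interactive session anywhere in sight.

    # Minimal MPI sketch (mpi4py bindings, assumed here for
    # illustration): sum the numbers 0..99 across all processes.
    # Run with e.g.: mpiexec -n 4 python sum_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank sums its own slice of the range.
    local = sum(range(rank, 100, size))

    # Combine the partial sums on rank 0.
    total = comm.reduce(local, op=MPI.SUM, root=0)

    if rank == 0:
        print("total =", total)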
Our Goals with IPython:
-Trivial parallel things should be trivial
-Difficult parallel things should be possible
-Make all stages of parallel computing fully interactive: development, debugging, testing, execution, monitoring, ...
-Make parallel computing more collaborative
-A more dynamic model for load balancing and fault tolerance
-Seamless integration with other tools: plotting/visualization, system shell
-Also want to keep the benefits of traditional approaches:
 -Should be able to use MPI if it is appropriate
 -Should be easy to integrate compiled code and libraries
 -Support many types of parallelism

Computing with Namespaces:

Namespaces:
-Namespace = a container for objects and their unique identifiers. Very common in Python; almost a hash table
-An instruction stream causes a namespace to evolve with time
-Interactive computing: the instruction stream has a human agent as its runtime source at some level; a human is at the controls at some point
-A (namespace, instruction stream) pair is a higher-level abstraction than a process or a thread
-Data in a namespace can be created in place (by instructions) or by external I/O (disk, network)

Important points:
-Requirements for interactive computation:
 -Alice/Bob must be able to send an instruction stream to a namespace
 -Alice/Bob must be able to push/pull objects to/from the namespace (disk, network)
-Requirements for parallel computation:
 -Multiple namespaces and instruction streams (for general MIMD parallelism)
 -Send data between namespaces (MPI is REALLY good at this)
-Requirements for interactive parallel computation:
 -Alice/Bob must be able to send multiple instruction streams to multiple namespaces
 -User must be able to push/pull objects to/from the namespaces
THESE REQUIREMENTS HOLD for any type of parallelism.

IPython Architecture: [architecture diagram shown]

Architecture details:
-The IPython Engine/Controller/Client are typically different processes. Why not threads? (more on that later)
-Can be run in arbitrary configurations on laptops, clusters, supercomputers
-Everything is asynchronous. You can't hack this on as an afterthought: you must deal with long-running commands that block all network traffic
-Dynamic process model: engines and clients can come and go at will at any time (unless you're using MPI)

Mapping Namespaces to Various Models of Parallel Computation:

Key points:
-Most models of parallel/distributed computing can be mapped onto this architecture:
 -Message passing
 -Task farming
 -TupleSpaces
 -BSP (Bulk Synchronous Parallel)
 -Google's MapReduce
 -???
-With IPython's architecture, all of these types of parallel computation can be done interactively and collaboratively
-The mapping of these models onto the architecture is done using interfaces + adapters and requires very little code

Live demo: easy execution of commands on machines, easy to see which machines are connected, output of responses from the machines as they come in (a sketch of such a session follows).
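A hedged sketch of what such a demo session looks like. The method names in these notes come from the IPython1 API of the day; this sketch instead uses ipyparallel, the modern descendant of that code, and assumes a cluster already started with "ipcluster start -n 4".

    # Sketch of the live demo using ipyparallel (modern descendant of
    # the IPython1 API shown in the talk; assumes a running cluster).
    import ipyparallel as ipp

    rc = ipp.Client()          # connect to the controller
    print(rc.ids)              # see which engines are connected

    view = rc[:]               # a view on all engines
    view.block = True          # wait for results (cf. rc.block below)

    view.execute("import numpy")           # run a command everywhere
    ar = view.apply_async(lambda: 2 + 2)   # or submit asynchronously...
    print(ar.get())                        # ...and collect the replies as they arrive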
One way they abstract the asynchronicity is a block=False flag: the call just returns after it submits the command.

Magic commands are handled simply. For example:
-%px: parallel execute on all engines, e.g. %px import numpy
-%pn: execute only on engine n
-%result: get the results of the last commands
-%autopx: automatically send everything typed to all engines
-rc.block=BOOL: choose blocking or non-blocking calls
-rc.pushAll: send data
-rc.pullAll: get data
-Can use dictionary syntax: rc.pushAll(d=3564567) is equivalent to rc['d'] = 3564567
-rc.scatterAll: scatter an array across the engines
-rc.gatherAll: gather the array back
(A consolidated sketch of these calls appears at the end of these notes.)
The parallel time on the example rc.mapAll('lambda x: x**10', range(1000000)) is horrible.
There are lots of tools here, but you still have to think.

Task-Based Computing:
-A common style of parallelism for loosely coupled or independent tasks

Stepping Back a Bit:
-Event-based systems are a nice alternative to threads:
 -Scale to multi-CPU systems
 -Build the asynchronous nature of things in at a low level
 -No deadlocks to worry about
-The networking framework used by IPython and SAGE (Twisted) has an abstraction (called a Deferred) for a result that will arrive at some point in the future (like a promise in E)
-We have an interface that uses Deferreds to encapsulate asynchronous results/errors (see the Deferred sketch at the end of these notes)

Benefits of an event-based system:
-Arbitrary configurations of namespaces are immediately possible without worrying about deadlocks
-Our test suite creates a controller/client and multiple engines in a single process
-Possibilities:
 -Hierarchies of clients/controllers...
 -Recursive systems

Error Propagation:
-Building a distributed system is easy... unless you want to handle errors well
-We have spent a lot of time working on and thinking about this issue. It is not always obvious what should happen
-Our goal: error handling/propagation in a parallel distributed context should be a nice analytic continuation of what happens in a serial context
-Remote exceptions should propagate to the user in a meaningful manner
-Policy of safety: don't ever let errors pass silently

This week:
-All the core developers of IPython's parallel capabilities are here at the workshop.
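A consolidated sketch of the push/pull/scatter/gather/map calls noted above. The *All names are from the IPython1 API of the time; this sketch uses the modern ipyparallel equivalents, again assuming a running cluster.

    # Sketch of the data-movement calls (modern ipyparallel names;
    # the notes' pushAll/pullAll/scatterAll/gatherAll/mapAll are the
    # IPython1-era equivalents).
    import ipyparallel as ipp

    rc = ipp.Client()
    view = rc[:]
    view.block = True

    view.push(dict(d=3564567))          # cf. rc.pushAll: send data
    print(view.pull("d"))               # cf. rc.pullAll: one copy per engine
    view["e"] = 10                      # dictionary syntax for push
    print(view["e"])

    view.scatter("x", list(range(16)))  # cf. rc.scatterAll: partition across engines
    print(view.gather("x"))             # cf. rc.gatherAll: reassemble the pieces

    # cf. rc.mapAll, and note the caveat above: for a cheap function
    # like x**10, communication cost swamps any parallel speedup.
    squares = view.map_sync(lambda x: x**10, range(32))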
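The Deferred abstraction mentioned above can be sketched in a few lines of Twisted; this toy example is mine, not from the talk. A Deferred is a placeholder for a result (or error) that will arrive later, with callbacks chained onto it, which is also the mechanism that lets remote errors propagate instead of passing silently.

    # Toy sketch of Twisted's Deferred: callbacks attached now, result
    # (or error) delivered later.
    from twisted.internet import defer

    def on_result(value):
        print("result:", value)
        return value

    def on_error(failure):
        # Errors travel down the callback chain rather than vanishing.
        print("error:", failure.getErrorMessage())

    d = defer.Deferred()
    d.addCallbacks(on_result, on_error)

    # Later, when the asynchronous work finishes, its owner fires it:
    d.callback(42)   # or d.errback(Exception("boom")) on failure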