DsageNg Scatter Gather Alpha Edition

This is not an official release or anything close... but it seems to work.

Also, you're entirely off the reservation here. This is just for those few who want to play around with the simplest of simple first cases of DsageNg. All comments should come to me for the time being... sage-devel ain't gonna help you with this one

List

Pending...

Code

The tip of my repo is at:

http://noc2.tarbox.org:8080/?p=sage-finance/.git;a=summary

You can grab a tarball of any point. I'll be making an Hg Clone of snapshots. How to use them can be viewed at

GlennTarbox/HgDevelopment

High Level Architecture

There are a few key elements in the proposed architecture

Architecture Diagram

ImageLink(DOC.png, height=500)

The client itself, now has 2 processes. In the example above, the notebook layer supports the browser view into Sage. In Scatter Gather, another process is spawned using PyProcessing and communications between the two processes is maintained through Queues.

  1. The Client Child is passed a set of url's identifying the remote processes for execution. While this is a very general capability, for Scatter Gather, it simply means a process which takes a block of work and returns a result.

  2. The Client Parent generates a Task which in its simplest case (for scatter gather) is simply the argument set of a function call. Fortunately, since we're using Python, this simply means we need to generate an args list and kargs dict for each Task. A Scatter Set is simply a list of args, kargs tuples.

   1 """ Create list of args, kargs for remote method """
   2 inList[]
   3 for x in range(100):
   4     inList.append(([x,3.14159,'hello'],{'distribution':'normal',
   5                            'loc':1,'scale':1,'bins':10}))
  1. This list is simply inserted in the downward queue in the figure. The result of the function call is returned in the upward queue. A nice feature is the ability to check for result availability without blocking. In the case of the notebook, one can continuously add jobs to be processsed while investigating the results of previous jobs.
  2. If desired, you can block waiting for results as well.
  3. The heavy lifting of object marshalling, fortunately, is handled by Sage's dumps() method. Essentially, I can take a tuple(args,kargs) and push it through dumps() and use loads() on the other side. Foolscap provides a simple way to use custom marshalling

   1 class AllWrap(Copyable, RemoteCopy):
   2     """ copious documentation """
   3     
   4     copytype = typeToCopy = "unique-string-AllWrap" # easy to mach
   5 
   6     def __init__(self):
   7         #log.msg("AllWrap __init__")
   8         pass
   9 
  10     def setValue(self, inVal):
  11         #log.msg("AllWrap.setVal: ")        
  12         self.v = inVal
  13         return self
  14 
  15     def getStateToCopy(self):
  16         #log.msg("AllWrap.getStateToCopy")                
  17         d = {}
  18         d['value'] = cPickle.dumps(self.v)
  19         return d
  20 
  21     def setCopyableState(self, d):
  22         #log.msg("AllWrap.setCopyableState")                        
  23         self.v = cPickle.loads(d['value'])
  1. And the worker process is simply invoked like

   1     def remote_execJob(self,wrap):
   2         """ suck args and kargs out of allwrap transport
   3         and wrap em back up for the long trip home """
   4         #log.msg("AllServer.remote_execJob: " + repr(wrap.v))
   5         return AllWrap().setValue(self.local_execJob(*wrap.v[0],**wrap.v[1]))
   6 
   7     def local_execJob(self, sizeSeries,distribution='normal',
   8                        loc=1,scale=1,bins=50):
   9         s='AllServer.local_execJob %d, %s, %d, %d, %d'
  10         #log.msg(s % (sizeSeries, distribution,loc, scale, bins))
  11         ts = TimeSeries(sizeSeries)
  12         ts.randomize(distribution,loc,scale)
  13         return ts.histogram(bins=bins)
  1. The blocks from the trivial test workbook are

   1 %python
   2 urls=[("pbu://ghtmyth:",12345,"/all_series",2)]
   3 #urls.append(("pbu://hq2:",12345,"/all_series",8))
   4 
   5 inList=[]

   1 %python
   2 for x in range(100):
   3     inList.append(([1000],{'distribution':'normal',
   4                            'loc':1,'scale':1,'bins':10}))

   1 %python
   2 f=open('/tmp/client.log','w')
   3 c=sage.all.dsageng.spawnScatterGather(urls,logFile=f)
   4 goVal = c.waitForResult(); goVal

   1 %python
   2 c.addJob(inList)

   1 %python
   2 r=c.waitForResult(); r[0:5]

   1 %python
   2 c.command('refresh')
   3 c.command('die') # kills the child... probably don't wanna do this

Enhancements Comming Soon

First, answers to the questions I expect:

Installation

You'll need foolscap and PyProcessing

source SAGE_ROOT/local/bin/sage-env
cd /tmp
hg clone http://foolscap.lothar.com/repos/trunk foolscap-trunk
cd foolscap-trunk
./setup.py install
sage -i processing-0.52

Then grab the tarball (or the repo... but the tarball is easier) from the Distributed branch

http://noc2.tarbox.org:8080/?p=sage-finance/.git;a=shortlog;h=refs/heads/distributed

Unwind the tarball into SAGE_ROOT/devel/sage-finance

sage -b finance

Now you're mostly good to go.

Insert useful instructions here...


CategoryDsageNg