Differences between revisions 4 and 5
Revision 4 as of 2008-06-27 06:12:53
Size: 3910
Editor: GlennTarbox
Comment:
Revision 5 as of 2008-06-27 07:13:46
Size: 6202
Editor: GlennTarbox
Comment:
Deletions are marked like this. Additions are marked like this.
Line 14: Line 14:
GlennTarbox has the usual contact information = List =

http://groups.google.com/group/sage-dsageng?hl=en
Line 42: Line 44:
 1. The heavy lifting of object marshalling, fortunately, is handled by Sage's dumps() method. Essentially, I can take a tuple(args,kargs) and push it through dumps() and use loads() on the other side. Foolscap provides a simple way to use custom marshalling
Line 43: Line 46:
{{{#!python
class AllWrap(Copyable, RemoteCopy):
    """ copious documentation """
    
    copytype = typeToCopy = "unique-string-AllWrap" # easy to mach
Line 44: Line 52:
    def __init__(self):
        #log.msg("AllWrap __init__")
        pass
Line 45: Line 56:
The child process is a conventional Twisted process. A scheduler continuously checks for jobs having been submitted through a queue     def setValue(self, inVal):
        #log.msg("AllWrap.setVal: ")
        self.v = inVal
        return self

    def getStateToCopy(self):
        #log.msg("AllWrap.getStateToCopy")
        d = {}
        d['value'] = cPickle.dumps(self.v)
        return d

    def setCopyableState(self, d):
        #log.msg("AllWrap.setCopyableState")
        self.v = cPickle.loads(d['value'])

}}}

 1. And the worker process is simply invoked like

{{{#!python
    def remote_execJob(self,wrap):
        """ suck args and kargs out of allwrap transport
        and wrap em back up for the long trip home """
        #log.msg("AllServer.remote_execJob: " + repr(wrap.v))
        return AllWrap().setValue(self.local_execJob(*wrap.v[0],**wrap.v[1]))

    def local_execJob(self, sizeSeries,distribution='normal',
                       loc=1,scale=1,bins=50):
        s='AllServer.local_execJob %d, %s, %d, %d, %d'
        #log.msg(s % (sizeSeries, distribution,loc, scale, bins))
        ts = TimeSeries(sizeSeries)
        ts.randomize(distribution,loc,scale)
        return ts.histogram(bins=bins)
}}}

 1. The blocks from the trivial test workbook are

{{{#!python
%python
urls=[("pbu://ghtmyth:",12345,"/all_series",2)]
#urls.append(("pbu://hq2:",12345,"/all_series",8))

inList=[]
}}}

{{{#!python
%python
for x in range(100):
    inList.append(([1000],{'distribution':'normal',
                           'loc':1,'scale':1,'bins':10}))
}}}

{{{#!python
%python
f=open('/tmp/client.log','w')
c=sage.all.dsageng.spawnScatterGather(urls,logFile=f)
goVal = c.waitForResult(); goVal
}}}

{{{#!python
%python
c.addJob(inList)
}}}

{{{#!python
%python
r=c.waitForResult(); r[0:5]
}}}

{{{#!python
%python
c.command('refresh')
c.command('die') # kills the child... probably don't wanna do this
}}}

DsageNg Scatter Gather Alpha Edition

This is not an official release or anything close... but it seems to work.

Also, you're entirely off the reservation here. This is just for those few who want to play around with the simplest of simple first cases of DsageNg. All comments should come to me for the time being... sage-devel ain't gonna help you with this one

List

http://groups.google.com/group/sage-dsageng?hl=en

High Level Architecture

There are a few key elements in the proposed architecture

  • Twisted for asynchronous communications, event management, scheduling
  • PyProcessing for spawning local processes and IPC

  • Foolscap for Remote Procedure Call (RPC) abstraction over Twisted

Architecture Diagram

ImageLink(DOC.png, height=500)

The client itself, now has 2 processes. In the example above, the notebook layer supports the browser view into Sage. In Scatter Gather, another process is spawned using PyProcessing and communications between the two processes is maintained through Queues.

  1. The Client Child is passed a set of url's identifying the remote processes for execution. While this is a very general capability, for Scatter Gather, it simply means a process which takes a block of work and returns a result.

  2. The Client Parent generates a Task which in its simplest case (for scatter gather) is simply the argument set of a function call. Fortunately, since we're using Python, this simply means we need to generate an args list and kargs dict for each Task. A Scatter Set is simply a list of args, kargs tuples.

   1 """ Create list of args, kargs for remote method """
   2 inList[]
   3 for x in range(100):
   4     inList.append(([x,3.14159,'hello'],{'distribution':'normal',
   5                            'loc':1,'scale':1,'bins':10}))
  1. This list is simply inserted in the downward queue in the figure. The result of the function call is returned in the upward queue. A nice feature is the ability to check for result availability without blocking. In the case of the notebook, one can continuously add jobs to be processsed while investigating the results of previous jobs.
  2. If desired, you can block waiting for results as well.
  3. The heavy lifting of object marshalling, fortunately, is handled by Sage's dumps() method. Essentially, I can take a tuple(args,kargs) and push it through dumps() and use loads() on the other side. Foolscap provides a simple way to use custom marshalling

   1 class AllWrap(Copyable, RemoteCopy):
   2     """ copious documentation """
   3     
   4     copytype = typeToCopy = "unique-string-AllWrap" # easy to mach
   5 
   6     def __init__(self):
   7         #log.msg("AllWrap __init__")
   8         pass
   9 
  10     def setValue(self, inVal):
  11         #log.msg("AllWrap.setVal: ")        
  12         self.v = inVal
  13         return self
  14 
  15     def getStateToCopy(self):
  16         #log.msg("AllWrap.getStateToCopy")                
  17         d = {}
  18         d['value'] = cPickle.dumps(self.v)
  19         return d
  20 
  21     def setCopyableState(self, d):
  22         #log.msg("AllWrap.setCopyableState")                        
  23         self.v = cPickle.loads(d['value'])
  1. And the worker process is simply invoked like

   1     def remote_execJob(self,wrap):
   2         """ suck args and kargs out of allwrap transport
   3         and wrap em back up for the long trip home """
   4         #log.msg("AllServer.remote_execJob: " + repr(wrap.v))
   5         return AllWrap().setValue(self.local_execJob(*wrap.v[0],**wrap.v[1]))
   6 
   7     def local_execJob(self, sizeSeries,distribution='normal',
   8                        loc=1,scale=1,bins=50):
   9         s='AllServer.local_execJob %d, %s, %d, %d, %d'
  10         #log.msg(s % (sizeSeries, distribution,loc, scale, bins))
  11         ts = TimeSeries(sizeSeries)
  12         ts.randomize(distribution,loc,scale)
  13         return ts.histogram(bins=bins)
  1. The blocks from the trivial test workbook are

   1 %python
   2 urls=[("pbu://ghtmyth:",12345,"/all_series",2)]
   3 #urls.append(("pbu://hq2:",12345,"/all_series",8))
   4 
   5 inList=[]

   1 %python
   2 for x in range(100):
   3     inList.append(([1000],{'distribution':'normal',
   4                            'loc':1,'scale':1,'bins':10}))

   1 %python
   2 f=open('/tmp/client.log','w')
   3 c=sage.all.dsageng.spawnScatterGather(urls,logFile=f)
   4 goVal = c.waitForResult(); goVal

   1 %python
   2 c.addJob(inList)

   1 %python
   2 r=c.waitForResult(); r[0:5]

   1 %python
   2 c.command('refresh')
   3 c.command('die') # kills the child... probably don't wanna do this

Enhancements Comming Soon

First, answers to the questions I expect:

  • Interactive Use - This means to most, tab completion and use from the notebook. This does work from the notebook, but the meta-data for tab completion needs to be added. The good news is it appears straightforward
  • Dynamic Code Migration - This is also likely straightforward. Numerous approaches exist:
    1. Send a block of code in a string and have it compiled and resident on the servers
    2. Pickle a python function and send it over (gfurnish has this capability coded up)
    3. Task Objects - send over an object with a well-known interface (this is easily extended to be very general)

Installation

You'll need foolscap and PyProcessing

source SAGE_ROOT/local/bin/sage-env
cd /tmp
hg clone http://foolscap.lothar.com/repos/trunk foolscap-trunk
cd foolscap-trunk
./setup.py install
sage -i processing-0.52

Then grab the tarball (or the repo... but the tarball is easier) from the Distributed branch

http://noc2.tarbox.org:8080/?p=sage-finance/.git;a=shortlog;h=refs/heads/distributed

Unwind the tarball into SAGE_ROOT/devel/sage-finance

sage -b finance

Now you're mostly good to go.

Insert useful instructions here...


CategoryDsageNg

DsageNg/DistributedComputing/ScatterGather (last edited 2008-11-14 13:42:06 by anonymous)