Differences between revisions 1 and 6 (spanning 5 versions)

What happens when some code is evaluated

Here's an example of the sequence of events when a person evaluates the following input:

      print 2
      sleep(10)
      print 3
      plot(sin)
      print "hello"
      graph_editor()


The cast is as follows:

 * USER -- a human or a program controlling a web browser (selenium) or other user interface
 * CLIENT -- a program, possibly written in JavaScript, that displays something
   to the USER
 * SERVER -- a program that handles requests from the CLIENT,
   typically a web server such as flask + mod_wsgi + apache.
 * DATABASE -- stores data
 * DEVICE -- queries the DATABASE for work that needs to be performed,
   does that work, and updates the database in response

 1. The USER types the above into an input object and submits this input.

 2. The CLIENT (e.g., JavaScript) immediately adds some confirmation that the input is being sent, such as a spinning wheel or a green bar.   This CLIENT widget will time out with an error if no output appears after 15 seconds (say).  

 3. The CLIENT sends a message to the SERVER using this URL schema: 
           /home/wstein/17/5/evaluate  
    The request does contain the input to the cell (POST).  Here worksheet_id=17, cell_id=5.

 4. Alternatively, if the input was not changed (e.g., in "evaluate all"), the CLIENT sends a message to the SERVER using the same URL schema: 
           /home/wstein/17/5/evaluate
    The request does *NOT* contain the input to the cell.  The URL is the same, but the absence of the cell input text field means "don't change the stored input". 
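
As a rough sketch of steps 3 and 4, the URL could be split into its parts as below. The regular expression, the function names, and the POST field name "input" are assumptions made for illustration; the walkthrough only fixes the URL layout and the rule that a missing input field leaves the stored input alone.

```python
import re

# Hypothetical pattern for /home/<user>/<worksheet_id>/<cell_id>/<action>.
CELL_URL = re.compile(
    r"^/home/(?P<user>[^/]+)/(?P<worksheet_id>\d+)/(?P<cell_id>\d+)"
    r"/(?P<action>evaluate|update)$"
)


def parse_cell_url(path):
    """Split a cell URL into its parts, or return None if it does not match."""
    m = CELL_URL.match(path)
    if m is None:
        return None
    return {
        "user": m.group("user"),
        "worksheet_id": int(m.group("worksheet_id")),
        "cell_id": int(m.group("cell_id")),
        "action": m.group("action"),
    }


def new_input(old_input, form):
    """Return the cell input to store after an evaluate request.

    `form` stands for the POST fields; the field name "input" is an
    assumption. A missing field means "keep the stored input" (step 4).
    """
    return form.get("input", old_input)
```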

 5. The SERVER receives the above request (let's just assume it is the evaluate one).  
    First it checks (via a DATABASE query) if a session_id has been assigned for this worksheet document.  (It hasn't.)

    The SERVER does a query about device status and load, runs a very
    fancy algorithm to conclude that DEVICE 1 (not DEVICE 0) is the
    way to go.

    The SERVER upserts the following cell document in the DATABASE:
               
             {cell_id:5, worksheet_id:17, 
              input:"""print 2\nsleep(10)\nprint 3\nplot(sin)\nprint "hello"\ngraph_editor()""", 
              status:"needs_work", device:1, user_id:'wstein',
              sequence_number: 0}

   The sequence number is global to the entire worksheet. After inserting this into the database, it returns a message to the CLIENT as follows:

             {cell_id:5, status:'needs_work'}

   The CLIENT receives the message and changes the cell 5 output to "working", and adds 5 to the list of needs_work cells.
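
Step 5 might look roughly like the following on the SERVER side. This is only a sketch: the DATABASE is stood in for by a plain dict, the "very fancy algorithm" is replaced by an assumed least-load rule, the session_id lookup is left out, and all function and parameter names are hypothetical.

```python
import itertools


def choose_device(loads):
    """Pick the device id with the smallest reported load (assumed rule)."""
    return min(loads, key=loads.get)


def handle_evaluate(db, counters, loads, user_id, worksheet_id, cell_id,
                    cell_input=None):
    """Upsert the cell document and return the reply for the CLIENT.

    `db` maps (worksheet_id, cell_id) to cell documents and `counters`
    holds one counter per worksheet, because the sequence number is
    global to the entire worksheet.
    """
    counter = counters.setdefault(worksheet_id, itertools.count())
    doc = db.setdefault((worksheet_id, cell_id),
                        {"cell_id": cell_id, "worksheet_id": worksheet_id})
    if cell_input is not None:  # absent input: keep what is stored
        doc["input"] = cell_input
    doc.update(status="needs_work", device=choose_device(loads),
               user_id=user_id, sequence_number=next(counter))
    return {"cell_id": cell_id, "status": "needs_work"}
```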

 6. DEVICE 1 does a query for all cells that have status "needs_work"
    and for which device:1.  It gets back an iterator with one
    document in it, namely the above inserted document (from step 5).

    It then:

     - Allocates a fresh Python process with id 1974 for evaluation of code in the worksheet: 'wstein/17'
       We have an in-memory table that maps wstein/17 to 1974.

     - Does a DATABASE query to change the cell:

             {status:"working"}

     - Sends a message to the Python process with id 1974 to evaluate """print 2\nsleep(10)\nprint 3\nplot(sin)\nprint "hello"\ngraph_editor()""". 
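
The DEVICE side of step 6 can be sketched in the same style. The dict standing in for the DATABASE, the `spawn` callback that allocates a process, and the function name are all assumptions for illustration; only the query condition, the in-memory worksheet-to-process table, and the status change come from the walkthrough.

```python
def claim_work(db, device_id, processes, spawn):
    """Mark this device's needs_work cells as working and return the jobs.

    `processes` is the in-memory table mapping 'user/worksheet_id' to a
    process id; `spawn` is a hypothetical callable that starts a fresh
    Python process and returns its id.
    """
    jobs = []
    for doc in db.values():
        if doc["status"] != "needs_work" or doc["device"] != device_id:
            continue
        worksheet = "%s/%s" % (doc["user_id"], doc["worksheet_id"])
        # Reuse the worksheet's process if there is one, else allocate it.
        if worksheet not in processes:
            processes[worksheet] = spawn()
        doc["status"] = "working"
        jobs.append((processes[worksheet], doc["input"]))
    return jobs
```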
    
 7. The CLIENT queries the SERVER    

           /home/wstein/17/5/update

    The SERVER does the following:

        - Responds to the CLIENT with nothing much, since nothing happened yet.

                   {cell_id:5, status:'working', sequence_number: 1}        (JSON)

    It happens this time that the CLIENT does *not* get the response message, due to a flaky network.

 8. Meanwhile, DEVICE 1 checks on its message queue with the Sage process and finds the
    following output for process 1974: sys.stdout:"2".  It then does
    this:

        - Update DATABASE cell 5 document:

                  {... output:{stdout_0:{type:'text', order:0, content:'2', state:'open'}}, sequence_number: 2 ...}

 9. Next, again the CLIENT queries for updates on cell 5 via the URL: 

           /home/wstein/17/5/update

    The SERVER queries the DATABASE for info about cell 5 and gets 

           ... {stdout_0:{type:'text', order:0, content:'2', state:'open'}} ...

    and returns the JSON message:

                  {cell_id:5, status:'working', output:{stdout_0:{type:'text', order:0, content:'2', state:'open'} }}

    The CLIENT gets the update back and calls a (javascript) function
    that renders sys.stdout.  It also stores the number of characters
    from the sys.stdout stream that it has received because that
    stream is open.

  10. Now DEVICE 1 notices that more output has appeared from process
  1974, namely "3\n".  A new stream has also started, since Sage's plot command has called the 
  API function to make a new output block, so the DEVICE closes stdout_0 and updates the DATABASE:

             {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'}},
                  sequence_number: 3 ...}

  Incidentally, if there were any files 'foo.png' and 'bar.png' (say)
  created as a side effect (check modification times of files that are *closed* and compare them
  with the time the block started), then we would add them to the database as well.

             {... output:{stdout_0:{type:'text', order:0, 
                          files:{'foo.png':"lkajsfljsd", 'bar.png':"lksjflkjssdlfkj..."}, 
                          content:'2\n3\n', state:'closed'}} ...}
  
  In order to detect these automatically generated files, the code the device 
  actually asks the worker process to execute will look like the following:

                    try:
                        block_api.new_block() # store current time, output sentinel character to stdout, etc.
                        print 2
                        sleep(10)
                        print 3
                        plot(sin) # calls new_block()
                        print "hello"
                        graph_editor() # calls new_block()
                    finally:
                        block_api.close() # check for files created since the beginning of the last block

  Note that the plot(sin) call will itself call
  block_api.new_block().  At that moment, the block_api object
  knows it is already in a block, so it checks the filesystem for all new files
  created up to that point.
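
A minimal sketch of how such a block_api object could find new files is given below. Only the names new_block() and close() appear above; the class name, the `directory` argument, and the `blocks` list are hypothetical. The sketch compares modification times with the time the block started, and it does not attempt the "file is closed" check, which has no portable implementation.

```python
import os
import time


class BlockAPI:
    """Hypothetical helper recording the files written during each block."""

    def __init__(self, directory):
        self.directory = directory
        self.block_start = None
        self.blocks = []  # one list of new file names per finished block

    def _new_files(self):
        """Names of files modified since the current block started."""
        names = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if (os.path.isfile(path)
                    and os.path.getmtime(path) >= self.block_start):
                names.append(name)
        return names

    def new_block(self):
        """Finish the current block, if any, then start a new one."""
        if self.block_start is not None:
            self.blocks.append(self._new_files())
        self.block_start = time.time()

    def close(self):
        """Finish the last block."""
        if self.block_start is not None:
            self.blocks.append(self._new_files())
            self.block_start = None
```

Filesystem timestamps can be coarser than time.time(), so a real implementation would probably need some tolerance around the block start time.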

  This input will actually send 3 streams, including two copies of the file a.png: one displayed above the cosine
  plot and one beneath:

     g = plot(sin)
     g.save('a.png')
     plot(cos)
     g.save('a.png')

  whereas this will display just one sine plot:

     g = plot(sin)
     g.save('a.png')
     print 1+2 
     g.save('a.png')
  
  However, this example would only display one copy of the file test.txt since the file was not closed when
  the first stdout block was ended:

     f=open('test.txt','w')
     f.write('test1')
     plot(cos)
     f.write('after plot')
     f.close()

 
  11. The CLIENT queries for updates on cell 5, sending a parameter 'closed' or the number of characters (or bytes 
      if the stream is not text) for each stream it has received information about.

           /home/wstein/17/5/update?stdout_0=1

    The SERVER queries the database and gets

         output:{stdout_0:{content:'2\n3\n', state:'closed', ...}}

    The SERVER then sends the JSON message to the CLIENT.

        {cell_id:5, status:'working', output:{stdout_0:{content:'\n3\n', state:'closed'}} }
        
    (note that it did not send the first character since the client said it already had the first character)
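
The rule the SERVER applies here can be sketched as a small function. The function name and the dict-based arguments are assumptions; the behaviour (skip streams the client reports as 'closed', send only the unseen tail of a partly received stream, send file names rather than file data) is read off the messages in this walkthrough.

```python
def stream_updates(output, client_state):
    """Return only the parts of `output` the client has not seen yet.

    `output` is the cell's output document from the DATABASE.
    `client_state` holds the query parameters: stream name -> 'closed'
    or the number of characters already received (as a string).
    """
    updates = {}
    for name, stream in output.items():
        seen = client_state.get(name)
        if seen == "closed":
            continue  # the client already has all of this stream
        if seen is None:
            # A stream the client has never heard of: send it whole,
            # but list file names only; the data is fetched by URL.
            doc = dict(stream)
            if "files" in doc:
                doc["files"] = sorted(doc["files"])
            updates[name] = doc
            continue
        rest = stream.get("content", "")[int(seen):]
        if rest or stream["state"] == "closed":
            updates[name] = {"content": rest, "state": stream["state"]}
    return updates
```

Because the reply depends only on the DATABASE contents and on what the client says it has, a dropped response costs nothing: the client repeats the same query and gets the same answer.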

  12. Next DEVICE 1 sees that a plot appeared (in a new chunk of
  output), so it updates this into the DATABASE:

        {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'},
                     plot_0:{type:'plot', order:1, files:{'plot0.png':r"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'}},
             sequence_number: 4  ... }

  13. The CLIENT queries for updates on cell 5.

     # closed means we already got it all, so do not bother sending it again or telling us it is closed.

           /home/wstein/17/5/update?stdout_0=closed  

    The SERVER queries, gets stuff, and responds with a message:

            {cell_id:5, status:'working', output:{plot_0:{type:'plot', order:1, files:["plot0.png"], state:'closed'}} }

    The whole response gets dropped, so the CLIENT sees nothing.

  14. The CLIENT queries for updates on cell 5 again.  It received nothing new in step 13, so the request is unchanged.

     # closed means we already got it all, so do not bother sending it again or telling us it is closed.

           /home/wstein/17/5/update?stdout_0=closed  

    The SERVER queries, gets stuff, and responds with a message:

            {cell_id:5, status:'working', output:{plot_0:{type:'plot', order:1, files:["plot0.png"], state:'closed'}} }

    This time the CLIENT draws (using javascript, somehow) the content.  It gets the actual image file using the URL

            /home/wstein/17/5/plot_0/plot0.png
 
    So, for example, the CLIENT could insert an <img src="/home/wstein/17/5/plot_0/plot0.png"/> tag in the HTML of the page.
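
For illustration, the file URL and the tag could be built as follows. The helper names are hypothetical, and the sketch is in Python rather than the CLIENT's JavaScript so that it matches the other examples; only the URL layout /home/<user>/<worksheet_id>/<cell_id>/<stream>/<filename> comes from the step above.

```python
from html import escape
from urllib.parse import quote


def file_url(user, worksheet_id, cell_id, stream, filename):
    """URL from which the CLIENT fetches one file of an output stream."""
    parts = [user, str(worksheet_id), str(cell_id), stream, filename]
    return "/home/" + "/".join(quote(part, safe="") for part in parts)


def img_tags(user, worksheet_id, cell_id, stream, files):
    """One <img> tag per file name listed in a plot stream."""
    return [
        '<img src="%s"/>' % escape(
            file_url(user, worksheet_id, cell_id, stream, name), quote=True)
        for name in files
    ]
```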

  15. DEVICE 1 sees a marker in the 1974 process stdout which says "another new output block".  Also, it
    sees the output "hello".  It puts this in the DATABASE.

        {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'},
                    plot_0:{type:'plot', order:1, files:{'plot0.png':r"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'},
                    stdout_1:{type:'text', order:2, content:"hello", state:'open'} },
             sequence_number: 5
        }

  16. The CLIENT queries for updates:

           /home/wstein/17/5/update?stdout_0=closed&plot_0=closed

    Gets back this JSON document:

          {cell_id:5, status:'working', output:{stdout_1:{type:'text', order:2, content:'hello', state:'open'}} }

   
  17. Finally, DEVICE 1 sees a marker in the process stating that
      there is a new output block, and that the full computation of that
      cell is done.  It also sees that a new stream called
      "graph_editor_0" with type 'graph_editor' was placed in the
      output along with a payload.
     
      It updates the DATABASE to look like this:

        {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'},
                    plot_0:{type:'plot', order:1, files:{'plot0.png':r"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'} ,
                    stdout_1:{type:'text', order:2, content:"hello", state:'closed'} ,
                    graph_editor_0:{type:'graph_editor', order:3, content:"(^%$*^@S...", state:'closed'} } ,

            status:'done',
            sequence_number: 6
        }

  18. The CLIENT queries for updates:
    
           /home/wstein/17/5/update?stdout_0=closed&plot_0=closed&stdout_1=5

      and gets back this JSON:

        {cell_id:5, status:'done', output:{stdout_1:{content:'',state:'closed'}, 
                   graph_editor_0:{type:'graph_editor', order:3, content:"(^%$*^@S...", state:'closed'}} }

      The CLIENT renders the graph editor, stops painting the cell green, and stops querying for updates.
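
Seen from the CLIENT, the whole exchange is a polling loop over two small operations: build the update URL from what has been received so far, and merge each reply into local state. The sketch below shows both in Python (the real CLIENT would presumably be JavaScript); the function names and the shape of the local `streams` dict are assumptions.

```python
from urllib.parse import urlencode


def update_url(user, worksheet_id, cell_id, streams):
    """Build the update URL from the streams received so far.

    `streams` maps stream name -> {'content': str, 'state': str}, in
    the order the streams first arrived.
    """
    params = [
        (name, "closed" if s["state"] == "closed" else len(s["content"]))
        for name, s in streams.items()
    ]
    url = "/home/%s/%s/%s/update" % (user, worksheet_id, cell_id)
    return url + "?" + urlencode(params) if params else url


def apply_update(streams, message):
    """Merge a SERVER reply into `streams`.

    Returns True while the CLIENT should keep polling.
    """
    for name, part in message.get("output", {}).items():
        local = streams.setdefault(name, {"content": "", "state": "open"})
        local["content"] += part.get("content", "")
        local["state"] = part.get("state", local["state"])
    return message.get("status") != "done"
```

A reply that never arrives simply means apply_update is not called, so the next update_url is identical to the previous one.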

Revision 1 as of 2011-01-13 06:20:15
  Size: 9716
  Editor: was
Revision 6 as of 2011-01-13 08:00:14
  Size: 11300
  Editor: robertwb
 Line 1:
-=What happens when some code is evaluated=
+= What happens when some code is evaluated =
 Line 4:
-== Example: with discussion, simplified ==
-Line 33:
+Line 31:
-           /home/wstein/myws/5/evaluate  
    The request does contain the input to the cell (POST).  Here worksheet_filename='myws', cell_id=5.
+           /home/wstein/17/5/evaluate  
    The request does contain the input to the cell (POST).  Here worksheet_id=17, cell_id=5.
-Line 37:
+Line 35:
-           /home/wstein/myws/5/evaluate
    The request does *NOT* contain the input to the cell.  The same URL, but the absense of the cell input text field means don't change it.
+           /home/wstein/17/5/evaluate
    The request does *NOT* contain the input to the cell.  The same URL, but the absence of the cell input text field means don't change it.
-Line 49:
+Line 47:
-             {cell_id:5, worksheet_filename:'myws', 
              input:"print 2\nsleep(10)\nprint 3\ngraph_editor()", 
              status:"needs_work", device:1, user_id='wstein'}

   After inserting this into the database, it returns a message to the CLIENT as follows:
+             {cell_id:5, worksheet_id:17, 
              input:"""print 2\nsleep(10)\nprint 3\nplot(sin)\nprint "hello"\ngraph_editor()""", 
              status:"needs_work", device:1, user_id='wstein',
              sequence_number: 0}

   The sequence number is global to the entire worksheet. After inserting this into the database, it returns a message to the CLIENT as follows:
-Line 65:
+Line 64:
-     - Allocates a fresh Python process with id 1974 for evaluation of code in the worksheet: 'wstein/myws'
       We have an in-memory table that maps wstein/myws to 1974.
+     - Allocates a fresh Python process with id 1974 for evaluation of code in the worksheet: 'wstein/17'
       We have an in-memory table that maps wstein/17 to 1974.
-Line 72:
+Line 71:
-     - Sends a message to the Python process with id 1974 to evaluate "print 2\nsleep(10)\nprint 3\ngraph_editor()".
+     - Sends a message to the Python process with id 1974 to evaluate """print 2\nsleep(10)\nprint 3\nplot(sin)\nprint "hello"\ngraph_editor()""".
-Line 76:
+Line 75:
-           /home/wstein/myws/5/update
+           /home/wstein/17/5/update
-Line 82:
+Line 81:
-                   {cell_id:5, status:'working'}        (JSON)

     The CLIENT does *not* get the response message, due to a flakie network.

 8. Meanwhile, DEVICE 1 checks on its message queue and finds the
+                   {cell_id:5, status:'working', sequence_number: 1}        (JSON)

    It happens this time that the CLIENT does *not* get the response message, due to a flakie network.

 8. Meanwhile, DEVICE 1 checks on its message queue with the Sage process and finds the
-Line 92:
+Line 91:
-                  {... output:{stdout_0:{type:'text', order:0, content:'2', state:'open'}} ...}
+                  {... output:{stdout_0:{type:'text', order:0, content:'2', state:'open'}}, sequence_number: 2 ...}
-Line 96:
+Line 95:
-           /home/wstein/myws/5/update
+           /home/wstein/17/5/update
-Line 104:
+Line 103:
-                  {cell_id:5, status:'working', output:{stdout_0:{type:'text', order:0, content:'2', state:'open'}}}
+                  {cell_id:5, status:'working', output:{stdout_0:{type:'text', order:0, content:'2', state:'open'} }}
-Line 112:
+Line 111:
-, namely "3\n" and it's closed this stream, since it is about to
  produce a plot, so it updates the DATABASE:

             {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'}} ...}
+, namely "3\n" and a new stream has started, since Sage's plot command has called the 
  api function to make a new output block, so the DEVICE updates the DATABASE:

             {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'}},
                  sequence_number: 3 ...}
 Line 118:
-  created as a side effect (check modification times), then we would
  add them to the database as well.  This is actually done entirely
  by the worker process.
+  created as a side effect (check modification times of files that are *closed* and compare them
  with the time the block started), then we would add them to the database as well.
-Line 123:
+Line 122:
-                          file:{foo.png:"lkajsfljsd", bar.png:"lksjflkjssdlfkj..."},
+                          files:{'foo.png':"lkajsfljsd", 'bar.png':"lksjflkjssdlfkj..."},
-Line 126:
+Line 125:
-  The code the device asks the worker process to execute will look like the following:
+  In order to detect these automatically generated files, the code the device    actually asks the worker process to execute will look like the following:
 Line 129:
-                        block_api.new_block()
+                        block_api.new_block() # store current time, output sentinal character to stdout, etc.
 Line 133:
-                        plot(sin)
+                        plot(sin) # calls new_block()
 Line 135:
-                        graph_editor()
+                        graph_editor() # calls new_block()
 Line 137:
-                        block_api.close()
+                        block_api.close() # check for files created since the beginning of the last block
 Line 141:
-  know its in a block, and check the filesystem for all new files
+  know it's in a block, and check the filesystem for all new files
 Line 144:
- g = plot(sin)
 g.save('a.png')
 plot(sin)
 g.save('a.png')
+  This output will actually send 3 streams, including two copies of the file a.png, one a.png displayed above the cosine
  plot and one beneath:

     g = plot(sin)
     g.save('a.png')
     plot(cos)
     g.save('a.png')

  whereas this will display just one sine plot

     g = plot(sin)
     g.save('a.png')
     print 1+2 
     g.save('a.png')
  
  However, this example would only display one copy of the file test.txt since the file was not closed when
  the first stdout block was ended:

     f=open('test.txt','w')
     f.write('test1')
     plot(cos)
     f.write('after plot')
     f.close()
-Line 149:
+Line 169:
-. The CLIENT queries for updates on cell 5.

           /home/wstein/myws/5/update?stdout_0=1
+. The CLIENT queries for updates on cell 5, sending a parameter 'closed' or the number of characters (or bytes 
      if the stream is not text) for each stream it has received information about.

           /home/wstein/17/5/update?stdout_0=1
-Line 159:
+Line 180:
-        {cell_id:5, status:'working', output:{stdout_0:{content:'\n3\n', state:'closed'}}}
+        {cell_id:5, status:'working', output:{stdout_0:{content:'\n3\n', state:'closed'}} }
      (note that it did not send the first character since the client said it already had the first character)
-Line 165:
+Line 188:
-                  {plot_0:{type:'plot', order:1, files:{filename:"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'}}}}  ... }
+                     plot_0:{type:'plot', order:1, files:{'plot0.png':r"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'}} },
             sequence_number: 4  ... }
-Line 171:
+Line 195:
-           /home/wstein/myws/5/update?stdout_0=closed
+           /home/wstein/17/5/update?stdout_0=closed
-Line 175:
+Line 199:
-            {cell_id:5, plot_0:{type:'plot', order:1, files:["filename"], state:'closed'}}}
+            {cell_id:5, plot_0:{type:'plot', order:1, files:["plot0.png"], state:'closed'}} }
-Line 183:
+Line 207:
-           /home/wstein/myws/5/update?stdout_0=closed
+           /home/wstein/17/5/update?stdout_0=closed
-Line 187:
+Line 211:
-            {cell_id:5, plot_0:{type:'plot', order:1, files:["filename"], state:'closed'}}}

    This time the CLIENT draw (using javascript, somehow) the content.  It gets the actual data by querying the URL

            /home/wstein/myws/5/plot_0/filename
+            {cell_id:5, plot_0:{type:'plot', order:1, files:["plot0.png"], state:'closed'}} }

    This time the CLIENT draws (using javascript, somehow) the content.  It gets the actual image file using the URL

            /home/wstein/17/5/plot_0/plot0.png
 
    So, for example, the CLIENT could insert an <img src="/home/wstein/17/5/plot_0/plot0.png"/> tag in the html of the page
-Line 196:
+Line 222:
-        {... output:{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'},
                    {plot_0:{type:'plot', order:1, files:{filename:"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'}}},
                    {stdout_1:{type:'text', order:2, content:"hello", state:'open'}}}
+        {... output:{{stdout_0:{type:'text', order:0, content:'2\n3\n', state:'closed'},
                    plot_0:{type:'plot', order:1, files:{'plot0.png':r"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'}},
                    stdout_1:{type:'text', order:2, content:"hello", state:'open'} }},
             sequence_number: 5
-Line 203:
+Line 230:
-           /home/wstein/myws/5/update?stdout_0=closed&plot_0=closed
+           /home/wstein/17/5/update?stdout_0=closed&plot_0=closed
-Line 207:
+Line 234:
-          {cell_id:5, status:'working', output:{stdout_1:{type:'text', order:2, content:'hello', state:'open'}}}
+          {cell_id:5, status:'working', output:{stdout_1:{type:'text', order:2, content:'hello', state:'open'}} }
-Line 219:
+Line 246:
-                    {plot_0:{type:'plot', order:1, files:{filename:"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'}}},
                    {stdout_1:{type:'text', order:2, content:"hello", state:'closed'}}},
                    {graph_editor_0:{type:'graph_editor', order:3, content:"(^%$*^@S...", state:'closed'}}},

            status:'done'
+                    plot_0:{type:'plot', order:1, files:{'plot0.png':r"ASDFJAIEAJSJSF@#$#@$@(^!..."}, state:'closed'} ,
                    stdout_1:{type:'text', order:2, content:"hello", state:'closed'} ,
                    graph_editor_0:{type:'graph_editor', order:3, content:"(^%$*^@S...", state:'closed'}} } ,

            status:'done',
            sequence_number: 6
-Line 228:
+Line 256:
-           /home/wstein/myws/5/update?stdout_0=closed&plot_0=closed&stdout_1=5
+           /home/wstein/17/5/update?stdout_0=closed&plot_0=closed&stdout_1=5
-Line 233:
+Line 261:
-                   graph_editor_0:{type:'graph_editor', order:3, content:"(^%$*^@S...", state:'closed'}}}}
+                   graph_editor_0:{type:'graph_editor', order:3, content:"(^%$*^@S...", state:'closed'}} }}
-Line 236:
+Line 264:

Diff for "notebook/scalability/walkthrough"
