Differences between revisions 26 and 27
Revision 26 as of 2017-03-22 01:25:40
Size: 5711
Editor: mrennekamp
Comment:
Revision 27 as of 2022-04-05 02:11:30
Size: 0
Editor: mkoeppe
Comment: outdated sagenb
## page was renamed from Notebook scalability/Database calls
Guiding principle: put as much in the URL as is reasonable.

== Things that happen: ==
=== Notebook: ===

  * download worksheet(s)

  * upload worksheet(s)

  * add user

  * new worksheet

  * set_metadata

  * get_metadata


=== Worksheet: ===
  * evaluate_code(''input'')
      * introspection
      * interacts

  * get_output(''uid'')

  * create cells

  * delete cells

  * join cells

  * split cells

  * promote/demote cells in a cell hierarchy

  * publish

  * share

  * delete

  * user ping

  * get text and HTML representations

  * rename

  * set_metadata: handles settings such as the evaluation system, pretty printing, etc.

  * get_metadata
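To make the cell operations concrete, here is a minimal sketch that models a worksheet as a plain Python list of cell input strings. The names `join_cells` and `split_cell` are hypothetical, and joining with a newline and splitting at a line boundary are assumptions, not necessarily how the notebook behaves.

```python
def join_cells(cells, i):
    """Return a new list in which cell i and cell i + 1 are merged into one cell."""
    if not 0 <= i < len(cells) - 1:
        raise IndexError("no cell after position %d to join with" % i)
    # Assumption: joined inputs are separated by a single newline.
    merged = cells[i] + "\n" + cells[i + 1]
    return cells[:i] + [merged] + cells[i + 2:]


def split_cell(cells, i, line):
    """Return a new list in which cell i is split before the given line number."""
    lines = cells[i].split("\n")
    if not 0 < line < len(lines):
        raise ValueError("split point must fall strictly inside the cell")
    first, second = "\n".join(lines[:line]), "\n".join(lines[line:])
    return cells[:i] + [first, second] + cells[i + 1:]


cells = ["a = 1", "b = 2", "c = 3"]
joined = join_cells(cells, 0)        # ["a = 1\nb = 2", "c = 3"]
restored = split_cell(joined, 0, 1)  # back to the original three cells
```

Returning new lists rather than mutating leaves the caller's worksheet untouched, so joining and then splitting at the joint recovers the original cells.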

  

=== Object ===

  * change input

  * evaluate

  * get_output(''start''=0)

      -- ''start'': offset indicating how much output text the CLIENT has already received



  * update metadata: hide, lock, cell priority...
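The ''start'' offset of get_output lets a client poll repeatedly without receiving the same text twice. A minimal sketch, assuming each cell's output is accumulated as a single string (the class name `CellOutput` is hypothetical):

```python
class CellOutput:
    """Accumulates the output text of one cell as it is produced."""

    def __init__(self):
        self._text = ""

    def append(self, chunk):
        self._text += chunk

    def get_output(self, start=0):
        """Return (text after offset start, offset to pass on the next call)."""
        return self._text[start:], len(self._text)


out = CellOutput()
out.append("2\n")
text, start = out.get_output()       # ("2\n", 2)
out.append("3\n")
text, start = out.get_output(start)  # ("3\n", 4)
```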

JSON <---> HTTP

== HTTP Requests: ==
Structure of URL:

/home/''username''/''worksheet_num''/''cell_id''/..

 * ../update
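Since the guiding principle is to put as much as is reasonable in the URL, the SERVER can recover the target of a request from the path alone. A minimal sketch, assuming exactly the path components shown above followed by an action name such as `update`; the function name `parse_cell_url` is hypothetical and query strings are not handled. (The walkthrough below also places a folder id after the username; this sketch follows the shorter structure given here.)

```python
def parse_cell_url(path):
    """Split /home/<username>/<worksheet_num>/<cell_id>/<action> into its parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "home":
        raise ValueError("unexpected URL: %r" % path)
    _, username, worksheet_num, cell_id, action = parts
    return {
        "username": username,
        "worksheet_num": int(worksheet_num),
        "cell_id": int(cell_id),
        "action": action,
    }


print(parse_cell_url("/home/wstein/17/5/update"))
# {'username': 'wstein', 'worksheet_num': 17, 'cell_id': 5, 'action': 'update'}
```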

== JSON Messages: ==
 * 'id' : ''id''

 * 'status' : 'success' or 'failure'
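A reply carrying the two fields above can be produced with Python's standard `json` module. This is only a sketch: the helper name `make_reply` is hypothetical, and real replies may carry additional fields.

```python
import json


def make_reply(msg_id, ok):
    """Encode a reply with the 'id' and 'status' fields."""
    return json.dumps({"id": msg_id, "status": "success" if ok else "failure"})


print(make_reply(5, True))  # {"id": 5, "status": "success"}
```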

== Database Calls: ==
 * increase_worksheet_state_number()

 * update_cell_input(''cell_id'', ''input_text'')
 
 * evaluate_cell(''cell_id'')

 * insert_after_cell(''cell_id'')
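The database calls can be prototyped against an in-memory store before committing to a real DATABASE. In this sketch the class `WorksheetStore`, the `append_cell` helper, and the status strings are hypothetical; it also assumes that every mutation bumps the worksheet state number, so that a client can tell whether its copy of the worksheet is stale.

```python
class WorksheetStore:
    """In-memory stand-in for the DATABASE calls of one worksheet."""

    def __init__(self):
        self.state_number = 0
        self.cells = {}   # cell_id -> {"input": str, "status": str}
        self.order = []   # cell ids in display order
        self._next_id = 0

    def increase_worksheet_state_number(self):
        self.state_number += 1
        return self.state_number

    def _new_cell(self, input_text=""):
        cell_id = self._next_id
        self._next_id += 1
        self.cells[cell_id] = {"input": input_text, "status": "idle"}
        return cell_id

    def append_cell(self, input_text=""):
        cell_id = self._new_cell(input_text)
        self.order.append(cell_id)
        self.increase_worksheet_state_number()
        return cell_id

    def update_cell_input(self, cell_id, input_text):
        self.cells[cell_id]["input"] = input_text
        self.increase_worksheet_state_number()

    def evaluate_cell(self, cell_id):
        # Only flags the cell; the evaluation itself is left to a separate worker.
        self.cells[cell_id]["status"] = "needs_work"
        self.increase_worksheet_state_number()

    def insert_after_cell(self, cell_id):
        new_id = self._new_cell()
        self.order.insert(self.order.index(cell_id) + 1, new_id)
        self.increase_worksheet_state_number()
        return new_id


store = WorksheetStore()
a = store.append_cell("print 2")
b = store.append_cell("print 3")
c = store.insert_after_cell(a)  # store.order is now [a, c, b]
store.update_cell_input(c, "x = 1")
store.evaluate_cell(c)
```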

== Example ==

Here's an example of the sequence of events when a person evaluates the following input:
{{{
print 2
sleep(10)
print 3
graph_editor()
}}}

The cast is as follows:

 * USER -- a human or a program controlling a web browser or other user interface
 * CLIENT -- a program, possibly written in JavaScript, that displays something
   to the USER
 * SERVER -- a program that handles requests from the CLIENT,
   typically a web server such as Flask + mod_wsgi + Apache.
 * DATABASE -- stores data
 * WORKER -- queries the DATABASE for work that needs to be performed,
   does that work, and updates the database in response
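The hand-off between SERVER, DATABASE, and WORKER in the walkthrough below can be sketched with the DATABASE modeled as a plain list of dictionaries. This is a sketch under several assumptions: the function names are hypothetical; the next cell id is taken to be one more than the largest id already used in the worksheet; the parent field is named `parent_cell`; and, because the container documents carry no folder_id, processes are keyed on (user_id, worksheet_id), whereas the walkthrough notes that the real key must include the folder.

```python
import time


def server_evaluate(db, user_id, worksheet_id, parent_cell, input_text, worker=1):
    """SERVER: store a 'needs_work' container document and build the CLIENT reply."""
    # Assumption: a new cell id is one more than any id used in this worksheet.
    used = [d["cell_id"] for d in db if d["worksheet_id"] == worksheet_id]
    cell_id = max(used + [parent_cell]) + 1
    db.append({
        "type": "container", "input": input_text, "status": "needs_work",
        "worker": worker, "last_update_time": time.time(),
        "cell_id": cell_id, "worksheet_id": worksheet_id,
        "parent_cell": parent_cell, "user_id": user_id,
    })
    return {"cell_id": cell_id, "action": "create", "type": "container",
            "parent_cell_id": parent_cell, "status": "needs_work"}


def worker_pass(db, worker_id, processes, spawn):
    """WORKER: claim this worker's 'needs_work' documents and start evaluating them.

    processes maps a worksheet key to a process handle with a send() method;
    spawn(key) creates a new handle. Returns the cell ids claimed in this pass.
    """
    claimed = []
    for doc in db:
        if doc["status"] != "needs_work" or doc["worker"] != worker_id:
            continue
        key = (doc["user_id"], doc["worksheet_id"])  # real key would add folder_id
        if key not in processes:
            processes[key] = spawn(key)
        # Mark the document as claimed before handing its input to the process.
        doc["status"] = "working"
        doc["last_update_time"] = time.time()
        processes[key].send(doc["input"])
        claimed.append(doc["cell_id"])
    return claimed
```

Here `spawn` can be any callable returning an object with a `send` method, so the same loop works whether the process is real or a test double.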

 1. The USER types the above into an input object and presses shift-enter.

 2. The CLIENT (e.g., JavaScript) immediately displays some confirmation that the input is being sent, such as a spinning wheel or a green bar. This CLIENT widget times out with an error if no output appears within 15 seconds (say).

 3. The CLIENT sends a message to the SERVER using this URL schema:
           /home/wstein/19/17/5/save_and_evaluate
    This request contains the input to the cell. Here 19 is the folder_id, 17 the worksheet_id, and 5 the cell_id.

 4. Alternatively, if the input was not changed (e.g., during "evaluate all"), the CLIENT sends a message to the SERVER using this URL schema:
           /home/wstein/19/17/5/evaluate
    This request does '''not''' contain the input to the cell.

 5. The SERVER receives the above request (assume it is the evaluate one).
    It then inserts the following document into the DATABASE:

        {type:'container', input:"print 2\nsleep(10)\nprint 3\ngraph_editor()",
         status:"needs_work", worker:1, last_update_time:392924082.494,
         cell_id:6, worksheet_id:17, parent_cell:5, user_id:'wstein'}

   The last_update_time is the current time.

   After inserting this into the DATABASE, the SERVER returns a message to the CLIENT as follows:

        {cell_id:6, action:'create', type:'container', parent_cell_id:5, status:'needs_work'}

   The CLIENT receives the message, creates a container cell with id 6 inside the cell with id 5, and displays it. It also adds 6 to its in-memory list of needs_work cells.

 6. WORKER 1 queries for all cells that have status "needs_work" and whose worker is 1. It gets back an iterator with one document in it, namely the document inserted in step 5. It then:

     - Allocates a fresh Python process with id 1974 for evaluating code in the worksheet 'wstein/19/17'. The folder id 19 matters, since the same worksheet linked into another folder has different semantics due to relative paths.

     - Updates the container document in the DATABASE to the following:

        {type:'container', input:"print 2\nsleep(10)\nprint 3\ngraph_editor()",
         status:"working", worker:1, last_update_time:392924082.8,
         cell_id:6, worksheet_id:17, parent_cell:5, user_id:'wstein'}

     - Sends a message to the Python process with id 1974 to evaluate "print 2\nsleep(10)\nprint 3\ngraph_editor()".
    
 7. The CLIENT queries the SERVER

           /home/wstein/19/17/6/updates?sequence=0

    to which the SERVER responds with any updates about cell 6 after the update tagged 0. Specifically, the SERVER does the following:

        - Does a database update to record that the given worksheet is being viewed by a USER:

                  {folder_id:19, worksheet_id:17, user_id:'wstein', last_update_time:392924082.9}
       
        - Responds to the CLIENT with a minimal message, since nothing has happened yet:

                   {cell_id:6, status:'working', next_sequence:1}

        - Records the document [{cell_id:6, status:'working', next_sequence:1}] in the DATABASE, in the list of messages for the cell.
 
        - The CLIENT does *not* receive the response message, due to a flaky network.

 8. Meanwhile, WORKER 1 checks its message queue and finds the following output (to stdout) for process 1974: "2". It then does this:

        - Queries the DATABASE to determine that 7 is the next available cell_id

        - Adds the following document to the DATABASE:
            {cell_id:7, parent_cell:6, worksheet_id:17, user_id:'wstein',