Differences between revisions 25 and 26
Revision 25 as of 2011-01-13 02:30:35
Size: 5649
Editor: was
Comment:
Revision 26 as of 2017-03-22 01:25:40
Size: 5711
Editor: mrennekamp
Comment:
## page was renamed from Notebook scalability/Database calls

Guiding principle: put as much in the URL as is reasonable.

Things that happen:

Notebook:

  • download worksheet(s)
  • upload worksheet(s)
  • add user
  • new worksheet
  • set_metadata
  • get_metadata

Worksheet:

  • evaluate_code(input)

    • introspection
    • interacts
  • get_output(uid)

  • create cells
  • delete cells
  • join cells
  • split cells
  • promote/demote cells in a cell hierarchy
  • publish
  • share
  • delete
  • user ping
  • get text representation, html representation
  • rename
  • set_metadata: deals with system, pretty printing, etc.
  • get_metadata

Object

  • change input
  • evaluate
  • get_output(start=0)

    • -- start - offset telling how much text we've already received

  • update metadata: hide, lock, cell priority...
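
A minimal sketch of how the start offset in get_output(start=0) could work, assuming the output of an object is kept as one growing string. The returned field names (output, next_start) are hypothetical and not part of the design above.

```python
def get_output(buffer: str, start: int = 0) -> dict:
    """Return only the output text the caller has not yet received.

    `start` is the number of characters the caller already has.
    The keys "output" and "next_start" are assumptions for this sketch.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    new_text = buffer[start:]
    return {"output": new_text, "next_start": start + len(new_text)}


# The caller polls twice; the second call only receives the new text.
buffer = "2\n"
first = get_output(buffer)
buffer += "3\n"
second = get_output(buffer, start=first["next_start"])
```

The caller passes back the previous next_start on each poll, so no output is transferred twice.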

JSON <---> HTTP

HTTP Requests:

Structure of URL:

/home/ username / worksheet_num / cell_id /..

  • ../update
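
As an illustration of the URL structure above, here is a sketch of a parser that splits such a path into its parts. The names CellRequest and parse_cell_url are hypothetical. The sketch follows the structure line exactly as written here; the worked example later on this page uses an extra folder id segment, which this sketch does not handle.

```python
from typing import NamedTuple


class CellRequest(NamedTuple):
    """The pieces of a /home/username/worksheet_num/cell_id/action URL."""
    username: str
    worksheet_num: int
    cell_id: int
    action: str


def parse_cell_url(path: str) -> CellRequest:
    # Drop empty segments produced by leading or trailing slashes.
    parts = [p for p in path.split("/") if p]
    if len(parts) != 5 or parts[0] != "home":
        raise ValueError(f"unrecognized URL: {path!r}")
    _, username, worksheet_num, cell_id, action = parts
    return CellRequest(username, int(worksheet_num), int(cell_id), action)


request = parse_cell_url("/home/wstein/17/5/update")
```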

JSON Messages:

  • 'id' : id

  • 'status' : 'success' or 'failure'
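
A small sketch of building and checking a message with the two fields listed above, using Python's standard json module. The helper names make_message and read_message are hypothetical.

```python
import json


def make_message(cell_id: int, ok: bool) -> str:
    """Serialize a message with the 'id' and 'status' fields."""
    return json.dumps({"id": cell_id, "status": "success" if ok else "failure"})


def read_message(raw: str) -> dict:
    """Parse a message and check that both expected fields are present."""
    message = json.loads(raw)
    if not isinstance(message, dict) or "id" not in message:
        raise ValueError("message must be an object with an 'id' field")
    if message.get("status") not in ("success", "failure"):
        raise ValueError("status must be 'success' or 'failure'")
    return message


reply = read_message(make_message(5, ok=True))
```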

Database Calls:

  • increase_worksheet_state_number()
  • update_cell_input(cell_id, input_text)

  • evaluate_cell(cell_id)

  • insert_after_cell(cell_id)
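
To make the four calls above concrete, here is an in-memory sketch. The call names and arguments come from the list; everything else is an assumption: the class name, the append_cell helper, the "idle" status, and the choice to bump the state number on every change.

```python
class InMemoryWorksheet:
    """Hypothetical stand-in for the DATABASE, holding one worksheet."""

    def __init__(self):
        self.state_number = 0
        self.cell_order = []  # cell ids in display order
        self.cells = {}       # cell_id -> {"input": str, "status": str}
        self._next_id = 0

    def increase_worksheet_state_number(self):
        self.state_number += 1
        return self.state_number

    def _new_cell(self):
        cell_id = self._next_id
        self._next_id += 1
        self.cells[cell_id] = {"input": "", "status": "idle"}  # "idle" is assumed
        return cell_id

    def append_cell(self):
        # Hypothetical helper, needed to create the first cell.
        cell_id = self._new_cell()
        self.cell_order.append(cell_id)
        self.increase_worksheet_state_number()
        return cell_id

    def update_cell_input(self, cell_id, input_text):
        self.cells[cell_id]["input"] = input_text
        self.increase_worksheet_state_number()

    def evaluate_cell(self, cell_id):
        # Only marks the cell; a WORKER is assumed to pick it up later.
        self.cells[cell_id]["status"] = "needs_work"
        self.increase_worksheet_state_number()

    def insert_after_cell(self, cell_id):
        position = self.cell_order.index(cell_id) + 1
        new_id = self._new_cell()
        self.cell_order.insert(position, new_id)
        self.increase_worksheet_state_number()
        return new_id


ws = InMemoryWorksheet()
first = ws.append_cell()
ws.update_cell_input(first, "print 2")
second = ws.insert_after_cell(first)
ws.evaluate_cell(first)
```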

Example

Here's an example of the sequence of events when a person evaluates the following input:

print 2
sleep(10)
print 3
graph_editor()

The cast is as follows:

  • USER -- a human or a program controlling a web browser or other user interface
  • CLIENT -- a program, possibly in JavaScript, that displays something to the USER
  • SERVER -- a program that handles requests from the CLIENT, typically a web server such as flask + mod_wsgi + apache
  • DATABASE -- stores data
  • WORKER -- queries the DATABASE for work that needs to be performed, does that work, and updates the DATABASE in response

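
The WORKER role described above can be sketched as a single polling pass over the DATABASE. This is only an illustration under assumptions: the DATABASE is a plain list of dicts, the "needs_work" and "working" statuses are taken from the example documents on this page, and the "done" status, the "output" field, and the function names are hypothetical.

```python
import time


def claim_work(database, worker_id):
    """Return the documents assigned to this worker that still need work."""
    return [
        doc for doc in database
        if doc["status"] == "needs_work" and doc["worker"] == worker_id
    ]


def run_worker_once(database, worker_id, evaluate):
    """Do one polling pass and return how many documents were handled."""
    handled = 0
    for doc in claim_work(database, worker_id):
        doc["status"] = "working"
        doc["last_update_time"] = time.time()
        doc["output"] = evaluate(doc["input"])  # "output" is an assumed field
        doc["status"] = "done"                  # "done" is an assumed status
        doc["last_update_time"] = time.time()
        handled += 1
    return handled


database = [
    {"cell_id": 6, "worker": 1, "status": "needs_work", "input": "print 2"},
    {"cell_id": 9, "worker": 2, "status": "needs_work", "input": "print 3"},
]
# A stand-in evaluator; a real WORKER would hand the input to a Python process.
handled = run_worker_once(database, worker_id=1,
                          evaluate=lambda source: f"evaluated {source!r}")
```

Only the document assigned to worker 1 changes; the other stays in "needs_work" until its own worker polls.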
  • The USER types the above into an input object and presses shift-enter.
  • The CLIENT (e.g., JavaScript) instantly adds some confirmation that the input is being sent, e.g., a spinning wheel or a green bar. This CLIENT widget will time out with an error if no output appears after, say, 15 seconds.
  • The CLIENT sends a message to the SERVER using this URL schema:
    • /home/wstein/19/17/5/save_and_evaluate
    • The request does contain the input to the cell. Here 19=folder_id, 17=worksheet_id, 5=cell_id.
  • Alternatively, if the input was not changed (e.g., in evaluate all), the CLIENT sends a message to the SERVER using this URL schema:
    • /home/wstein/19/17/5/evaluate
    • The request does *NOT* contain the input to the cell.
  • The SERVER receives the above request (let's just assume it is the evaluate one).
    • The SERVER also inserts the following document into the DATABASE:
      • {type:'container', input:"print 2\nsleep(10)\nprint 3\ngraph_editor()", status:"needs_work", worker:1, last_update_time:392924082.494, cell_id:6, worksheet_id:17, parent_cel:5, user_id:'wstein'}
    • The last_update_time is right now. After inserting this into the database, it returns a message to the CLIENT as follows:
      • {cell_id:6, action:'create', type:'container', parent_cell_id:5, status:'needs_work'}
    • The CLIENT receives the message, creates a containing cell with id 6 inside the cell with id 5, and displays it. It also adds 6 to its in-memory list of needs_work cells.
  • WORKER 1 does a query for all cells that have status "needs_work" and for which worker is 1. It gets back an iterator with one document in it, namely the above inserted document (from step 5). It then:
    • Allocates a fresh Python process with id 1974 for evaluation of code in the worksheet 'wstein/19/17'. (The folder id=19 does matter, since the same worksheet linked to another folder has totally different semantics due to relative paths.)
    • Does a database query to change the container document to the following:
      • {type:'container', input:"print 2\nsleep(10)\nprint 3\ngraph_editor()", status:"working", worker:1, last_update_time:392924082.8, cell_id:6, worksheet_id:17, parent_cel:5, user_id:'wstein'}
    • Sends a message to the Python process with id 1974 to evaluate "print 2\nsleep(10)\nprint 3\ngraph_editor()".
  • The CLIENT queries the SERVER
    • /home/wstein/19/17/6/updates?sequence=0
    • to which the SERVER can respond with updates about cell 6, after the update tagged 0. The SERVER does the following:
      • Does a database update to record that the given worksheet is being viewed by a USER:
        • {folder_id:19, worksheet_id:17, user_id:'wstein', last_update_time:392924082.9}
      • Responds to the CLIENT with nothing much, since nothing has happened yet:
        • {cell_id:6, status:'working', next_sequence:1}
      • Records in the DATABASE the document [{cell_id:6, status:'working', next_sequence:1}] in the list of messages for the cell.
      • The CLIENT does *not* get the response message, due to a flaky network.
  • Meanwhile, WORKER 1 checks on its message queue and finds the following output (to stdout) for process 1974: "2". It then does this:
    • Queries the DATABASE to determine that 7 is the next available cell_id.
    • Adds the following document to the DATABASE:
      • {cell_id:7, parent_cel:6, worksheet_id:17, user_id='wstein',