## page was renamed from Notebook scalability/Database calls
Guiding principle: put as much in the URL as is reasonable.
Things that happen:
Notebook:
- download worksheet(s)
- upload worksheet(s)
- add user
- new worksheet
- set_metadata
- get_metadata
Worksheet:
evaluate_code(input)
- introspection
- interacts
get_output(uid)
- create cells
- delete cells
- join cells
- split cells
- promote/demote cells in a cell hierarchy
- publish
- share
- delete
- user ping
- get text representation, html representation
- rename
- set_metadata: deals with system, pretty printing, etc.
- get_metadata
Object:
- change input
- evaluate
get_output(start=0)
-- start - offset telling how much text we've already received
- update metadata: hide, lock, cell priority...
JSON <---> HTTP
HTTP Requests:
Structure of URL:
/home/ username / worksheet_num / cell_id /..
- ../update
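As a rough illustration of the guiding principle, here is a minimal sketch of how a SERVER might pull the pieces back out of such a URL. It assumes the six-segment form used in the example further down this page (/home/username/folder_id/worksheet_id/cell_id/action); the names `CellRequest` and `parse_cell_url` are made up for this sketch and are not part of any existing code.

```python
from typing import NamedTuple
from urllib.parse import parse_qs, urlsplit


class CellRequest(NamedTuple):
    """Pieces of a notebook URL (hypothetical container for this sketch)."""
    username: str
    folder_id: int
    worksheet_id: int
    cell_id: int
    action: str
    params: dict


def parse_cell_url(url: str) -> CellRequest:
    """Split '/home/<user>/<folder>/<worksheet>/<cell>/<action>?<query>'."""
    split = urlsplit(url)
    parts = split.path.strip("/").split("/")
    if len(parts) != 6 or parts[0] != "home":
        raise ValueError(f"unrecognised notebook URL: {url!r}")
    _, username, folder_id, worksheet_id, cell_id, action = parts
    # Keep only the first value of each query parameter, e.g. sequence=0.
    params = {key: values[0] for key, values in parse_qs(split.query).items()}
    return CellRequest(username, int(folder_id), int(worksheet_id),
                       int(cell_id), action, params)
```

For example, `parse_cell_url("/home/wstein/19/17/5/evaluate")` yields folder 19, worksheet 17, cell 5 and the action `evaluate`, so the request body only needs to carry what cannot reasonably go in the URL.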
JSON Messages:
'id' : id
'status' : 'success' or 'failure'
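A minimal sketch of building and checking such a message with Python's standard json module. Only the 'id' and 'status' fields come from the notes above; the helper names `make_reply` and `reply_succeeded` are assumptions for illustration.

```python
import json


def make_reply(id_, ok):
    """Serialise a reply carrying the two fields listed above."""
    return json.dumps({"id": id_, "status": "success" if ok else "failure"})


def reply_succeeded(text):
    """Decode a reply and report whether its status is 'success'."""
    message = json.loads(text)
    if message.get("status") not in ("success", "failure"):
        raise ValueError("reply has no valid 'status' field")
    return message["status"] == "success"
```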
Database Calls:
- increase_worksheet_state_number()
update_cell_input(cell_id, input_text)
evaluate_cell(cell_id)
insert_after_cell(cell_id)
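To make the intent of these calls concrete, here is a small in-memory stand-in. It is only a sketch: the notes do not define what each call does, so the behaviour below is an assumption (every change bumps the state number, and evaluate_cell just marks the cell as "needs_work"). The class name `WorksheetStore`, the `append_cell` helper and the "idle" status are hypothetical.

```python
class WorksheetStore:
    """In-memory stand-in for one worksheet's rows in the DATABASE."""

    def __init__(self):
        self.state_number = 0
        self.cells = {}    # cell_id -> {"input": str, "status": str}
        self.order = []    # cell ids in display order
        self._next_id = 0

    def increase_worksheet_state_number(self):
        # Assumed meaning: a counter that changes whenever the worksheet does.
        self.state_number += 1
        return self.state_number

    def _new_cell(self):
        cell_id = self._next_id
        self._next_id += 1
        self.cells[cell_id] = {"input": "", "status": "idle"}
        return cell_id

    def append_cell(self):
        # Hypothetical helper so the sketch has a first cell to work with.
        cell_id = self._new_cell()
        self.order.append(cell_id)
        self.increase_worksheet_state_number()
        return cell_id

    def update_cell_input(self, cell_id, input_text):
        self.cells[cell_id]["input"] = input_text
        self.increase_worksheet_state_number()

    def evaluate_cell(self, cell_id):
        # Assumed meaning: queue the cell for a WORKER rather than run it here.
        self.cells[cell_id]["status"] = "needs_work"
        self.increase_worksheet_state_number()

    def insert_after_cell(self, cell_id):
        new_id = self._new_cell()
        self.order.insert(self.order.index(cell_id) + 1, new_id)
        self.increase_worksheet_state_number()
        return new_id
```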
Example
Here's an example of the sequence of events when a person evaluates the following input:
    print 2
    sleep(10)
    print 3
    graph_editor()
The cast is as follows:
- USER -- a human or a program controlling a web browser or other user interface
- CLIENT -- a program, possibly in JavaScript, that displays something to the USER
- SERVER -- a program that handles requests from the CLIENT, typically a web server such as flask + mod_wsgi + apache
- DATABASE -- stores data
- WORKER -- queries the DATABASE for work that needs to be performed, does that work, and updates the DATABASE in response
- The USER types the above into an input object and presses shift-enter.
- The CLIENT (e.g., JavaScript) immediately shows some confirmation that the input is being sent, e.g., a spinning wheel or a green bar. This CLIENT widget will time out with an error if no output appears after, say, 15 seconds.
- The CLIENT sends a message to the SERVER using this URL schema:
- /home/wstein/19/17/5/save_and_evaluate
- The request contains the input to the cell. Here 19=folder_id, 17=worksheet_id, 5=cell_id.
- Alternatively, if the input was not changed (e.g., in evaluate all), the CLIENT sends a message to the SERVER using this URL schema:
- /home/wstein/19/17/5/evaluate
- The request does *NOT* contain the input to the cell.
- The SERVER receives the above request (let's just assume it is the evaluate one).
- The SERVER also inserts the following document into the DATABASE:
- {type:'container', input:"print 2\nsleep(10)\nprint 3\ngraph_editor()", status:"needs_work", worker:1, last_update_time:392924082.494, cell_id:6, worksheet_id:17, parent_cel:5, user_id:'wstein'}
- The last_update_time is the current time. After inserting this document into the DATABASE, the SERVER returns the following message to the CLIENT:
- {cell_id:6, action:'create', type:'container', parent_cell_id:5, status:'needs_work'}
- WORKER 1 does a query for all cells that have status "needs_work" and for which worker is 1. It gets back an iterator with one document in it, namely the above inserted document (from step 5). It then:
- - Allocates a fresh Python process with id 1974 for evaluation of code in the worksheet 'wstein/19/17' (the folder id=19 does matter, since the same worksheet linked into another folder has totally different semantics due to relative paths).
- - Does a database query to change the container document to the following:
- {type:'container', input:"print 2\nsleep(10)\nprint 3\ngraph_editor()", status:"working", worker:1, last_update_time:392924082.8, cell_id:6, worksheet_id:17, parent_cel:5, user_id:'wstein'}
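The WORKER's query-and-update step can be sketched as follows. This is only an illustration: a plain Python list of dicts stands in for the DATABASE, the field names are the ones from the documents above, and the function name `claim_work` is made up. A real DATABASE would do the selection and the status change as queries.

```python
import time


def claim_work(documents, worker_id, now=None):
    """Find this worker's "needs_work" documents and mark them "working"."""
    now = time.time() if now is None else now
    claimed = []
    for doc in documents:
        if doc["status"] == "needs_work" and doc["worker"] == worker_id:
            doc["status"] = "working"
            doc["last_update_time"] = now
            claimed.append(doc)
    return claimed


# Stand-in DATABASE holding the container document from the example.
database = [
    {"type": "container", "status": "needs_work", "worker": 1,
     "last_update_time": 392924082.494, "cell_id": 6, "worksheet_id": 17},
]
work = claim_work(database, worker_id=1, now=392924082.8)
```

After the call, `work` holds the one container document, now with status "working" and the newer last_update_time; a second call for the same worker returns nothing.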
- The CLIENT queries the SERVER
- /home/wstein/19/17/6/updates?sequence=0
- to which the SERVER can respond with updates about cell 6, after the update tagged 0. The SERVER does the following:
- - Does a database update to record that the given worksheet is being viewed by a USER:
- {folder_id:19, worksheet_id:17, user_id:'wstein', last_update_time:392924082.9}
- - Responds to the CLIENT with the current status of the cell:
- {cell_id:6, status:'working', next_sequence:1}
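One possible reading of the sequence parameter, sketched in Python: the SERVER keeps an ordered list of updates per cell, sequence is the index of the first update the CLIENT has not yet seen, and next_sequence is the value to send on the next poll. The notes do not pin this down, so the indexing, the `update_log` structure and the name `updates_since` are all assumptions.

```python
def updates_since(update_log, cell_id, sequence):
    """Build a reply for /<cell_id>/updates?sequence=<sequence>."""
    pending = update_log.get(cell_id, [])[sequence:]
    reply = {"cell_id": cell_id, "next_sequence": sequence + len(pending)}
    # Merge the unseen updates in order, so later values win.
    for update in pending:
        reply.update(update)
    return reply


# One update has been recorded for cell 6 so far.
update_log = {6: [{"status": "working"}]}
first_reply = updates_since(update_log, cell_id=6, sequence=0)
```

Under these assumptions `first_reply` matches the message above, and polling again with sequence=1 returns only the cell_id and an unchanged next_sequence until the WORKER records something new.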
- Meanwhile, WORKER 1 checks on its message queue and finds the following output (to stdout) for process 1974: "2". It then does this:
- - Queries the DATABASE to determine that 7 is the next available cell_id.
- - Adds the following document to the DATABASE:
- {cell_id:7, parent_cel:6, worksheet_id:17, user_id:'wstein',