Differences between revisions 79 and 105 (spanning 26 versions)

Databases

Other SAGE Days 4 Projects

SAGE interface to sqlite: creating your own database using sqlite in sage

Robert Miller, Emily Kirkman, Tom Boothby, David Joyner, Yi Qiang

SAGE Enhancement Proposal

Create two classes for interacting with sql databases: the base class will be an immutable database object, and a mutable database object will extend that one with modification functions. Create a separate Query class that points to a specific database, keeping track of a query string and a set of join instructions, with logico-arithmetic operations like and, or, etc.
1. class SQLDatabase(SageObject) is the base.
  - idea - a different class for databases that are included with sage
  - instance field
    - filename -- the database itself
    - connection -- a pysqlite connection to 'filename'
    - dict of tables -- keyed by table name, entries are
      - dict of columns -- keyed by column name, keeps track of indices, primary key state, data type
  - functions
    - init
      - D = SQLDatabase('graphs.db') returns a database on graphs up to 1,000,000 vertices, which is queryable, but cannot be modified.
    - copy
      - creates new db, mutable by default
      - skeleton option will create a new mutable database with empty tables but the same column/index/primary key structure
    - save
      - D.save('my.db') should copy the class's database file to my.db.
      - option to close connection with old file and connect to new file, but default behavior shouldn't do this. motivation:
        D = Database() D.add( something great! ) # If save automatically connects D.save('im_rich.db') # to saved file, results could D.remove( the great ) # easily be lost. D.add( useless ) D.save('crap.db') # Oh crap!
    - print
      - should print a string describing the skeleton of the database
    - vacuum
2. class MutableSQLDatabase(SQLDatabase) -- seems counterintuitive, to avoid issues relating to stupid people accidentally messing up an important database in a public setting (if possible) -- best way to avoid this is to not have mutation functions at all
  - functions
    - init
      - D = MutableSQLDatabase() creates a new temp db, ready to go
      - D = MutableSQLDatabase('mydb.db') opens a connection to mydb.db, ready to go
      - D = MutableSQLDatabase('yourdb.db') if called on a protected database, just make a temp copy and connect to that
    - create/drop table
    - create/drop column( column name, col type, table, bool index=False, bool primary key=False )
      - if no table specified, raise an error and educate user about sql
    - create/drop index( column, table(s) )
    - create/drop primary key( column, table )
      - QUESTION: should we allow multiple tables? bounce this off someone who knows (i.e. is it possible to do this with sqlite?)
    - create/drop row( table, dict )
      - create accepts a dictionary keyed by column name
        def foo(table=None, **kwds): if table is None: raise KeyError('Table must be specified') print kwds sage: foo(table='table_name', stuff='poo', dog='cat') {'stuff':'poo', 'dog':'cat'} sage: foo(stuff='poo', dog='cat') Exception (click to the left for traceback): ... KeyError: 'Table must be specified'
    - for later? add data from whatever (e.g. quickly via sql file): magic function to deal with other ways to add data? think about this more later.
3. class AdvancedQuery(SageObject) -- simply a pointer to an SQLDatabase object (recall, this could be mutable), and a string. When run_query is called, query the database and return results.
4. class Query(AdvancedQuery) -- a naive query class for the n00b in us all...
  - comes with operations intersect, union and complement
  - creating a new query: specify one table, columns, and simple search string (i.e. the "WHERE" clause)
  - A dict with entries for the "SELECT" clause, the "FROM" clause, and the "WHERE" clause
  - allows for recursive searching quickly, since we can simply modify the string, and wait until we have to execute the query
  - deals with crazy stuff like joins automatically, informing user of what is required of him/her
Persistent objects via ZODB. Only stores data for objects that have been declared to the db to be immutable, and only for classes that instantiate the proper functionality. Do this with at least one class to provide an example.
- perhaps a global option of whether to cache anything available or not, as well as class-level option for same thing
- different classes of objects would have different tables, defined by their classes.
- any object would be required to be set immutable. when it is set to mutable again, the entry in the database corresponding to the object would be dropped.
- export these tables to files, along with a pickle of each object, if desired.
  - note- this should not necessarily be depended on! pickling isn't quite stable yet. As William says, the safest is to store things in a text file, together with a function that can recreate the pickles (with a current version so pickles are up to date).
Create a web server modeled on the new notebook that uses Zope to serve either sql databases (for query type webpages like Grout's) or ZODB databases (e.g. for webpages such as WS's Modular Forms Database).
Automated Caching
- idea - when you do an expensive calculation on an object, the result is automatically stored in a global database.
- ZODB may be very good for object persistence
  - caveats: sometimes you need to tell ZODB that you have changed some data or else it won't remember...
  - ZODB never shrinks a file, unless you call a vacuum-like command
  - pickling is dangerous without a plain text backup, since pickling can fail when you bridge versions...
    - therefore, it is best to have a plain human-readable text file, and a function that will read in the text and convert the data to current pickled cucumbers...
  - real-world data is great for recursive databasing, but algebraic objects can be too complicated...
  - it's like a dictionary...
Online databases
- (*) host these same portable mergable files online, in a killer 2.0 web app that serves the data
- modeled on the notebook server, there could be a database server... Tom?... Can you hear me?... Apparently there's a SQLite server for Windoze called SQLite_on_sockets. That should be useful...
- these databases could also serve sobj's, and depending on how ambitious people are, it could seamlessly interface with the notebook...

Unresolved issues

lists as entries
- storing a whole list in one entry in the database: this is feasible via Emily's rawking regexp skills, IF we can find a delimiter guaranteed not to be in pickle strings. It is optimal for speed and memory.
- storing a list as an array of rows (experience tells us that this is inefficient)
Licensing: \exists DBGPL?
Noticing identical database entries [added by David Joyner -- I think this can be resolved using a query. Is that correct Tom?]
[added by David Joyner] I'd like the issue of "uniformity" to be discussed. For example, I think having an "AUTHORS" and "REFERENCES" field (like the OEIS) would be great. For example, when Kate Minolta computes a new Conway polynomial, not only is the entry added by she gets proper credit.

-  ⇤ ← Revision 79 as of 2007-06-16 09:39:05 → 
  Size: 5852
  Editor: anonymous
  Comment:
+   ← Revision 105 as of 2008-11-14 13:42:11 → ⇥
  Size: 7767
  Editor: anonymous
  Comment: converted to 1.6 markup
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-[:days4/projects/: Other SAGE Days 4 Projects]
+[[days4/projects/| Other SAGE Days 4 Projects]]

SAGE [[http://www.initd.org/pub/software/pysqlite/doc/usage-guide.html|interface]] to sqlite: [[sqlite-tutorial| creating your own database using sqlite in sage]]
-Line 7:
+Line 9:
-GOAL: SAGE Enhancement Proposal.
+== SAGE Enhancement Proposal ==
-Line 9:
+Line 11:
- * Unresolved issues:
  * require primary key to be the same in each table? else, how to join?
  * lists as entries: this is feasible via Emily's rawking regexp skills, IF we can find a delimiter guaranteed not to be in pickle strings (or pickled cucumbers, whichever comes first)
+. Create two classes for interacting with sql databases: the base class will be an immutable database object, and a mutable database object will extend that one with modification functions. Create a separate Query class that points to a specific database, keeping track of a query string and a set of join instructions, with logico-arithmetic operations like and, or, etc.
 Line 13:
- * Database class
  * instance field
   * filename -- the database itself
   * connection -- a pysqlite connection to 'filename'
   * query -- this would be the following two objects:
    * a string representing the 'where clause'.
    * a dict representing what data to return (default everything), {{{ {'table1':['col1', 'col2'], 'table2':['col9'] } }}}
    * allows for recursive searching quickly, since we can simply modify the string, and wait until we have to execute the query
    * recursive searching is strictly an 'intersection'-- both of data returned (design decision) and of where clauses (definition of recursive search)
   * dict of tables -- keyed by table name, entries are
    * dict of columns -- keyed by column name, keeps track of indices, primary key state
   * mutable -- boolean
  * functions
   * init
    * {{{A = Database()}}} creates a new (obviously mutable) temp database and opens a connection to it
    * {{{B = Database()}}} creates a new (different) temp database and opens a connection to it
    * {{{C = Database('existing.db')}}} opens a connection to 'existing.db'
    * {{{X = Database('special.db')}}} where 'special.db' is one of the databases included in sage, which will usually be treated as immutable
     * border case, not likely to happen, but what SHOULD happen
     * issue warning/error
     * create mutable Database instance
      * "Because hey, if you want to rm -rf your own hard drive, go for it!" (R. Miller)
     * create immutable Database instance
      * public notebook, BAD
      * avoidable with chroot jails?
      * is this even an issue?
      * can't someone mess with this via other methods?
   
   * (mut'ble only) create/drop table
   * (mut'ble only) create/drop column( column name, col type, table, bool index=False, bool primary key=False )
    * if no table specified, raise an error and educate user about sql
   * (mut'ble only) create/drop index( column, table(s) )
   * (mut'ble only) create/drop primary key( column, table )
    * QUESTION: should we allow multiple tables? bounce this off someone who knows (i.e. is it possible to do this with sqlite?)
   * (mut'ble only) create/drop row( table, dict )
    * create accepts a dictionary keyed by column name
+  A. class SQLDatabase(!SageObject) is the base.
   * idea - a different class for databases that are included with sage
   * instance field
    * filename -- the database itself
    * connection -- a pysqlite connection to 'filename'
    * dict of tables -- keyed by table name, entries are
     * dict of columns -- keyed by column name, keeps track of indices, primary key state, data type
   * functions
    * init
     * {{{D = SQLDatabase('graphs.db')}}} returns a database on graphs up to 1,000,000 vertices, which is queryable, but cannot be modified.
    * copy
     * creates new db, mutable by default
     * skeleton option will create a new mutable database with empty tables but the same column/index/primary key structure
    * save
     * {{{ D.save('my.db') }}} should copy the class's database file to my.db.
     * option to close connection with old file and connect to new file, but default behavior shouldn't do this. motivation:
-Line 51:
+Line 30:
+D = Database()
D.add( something great! ) # If save automatically connects
D.save('im_rich.db')      # to saved file, results could
D.remove( the great )     # easily be lost.
D.add( useless )
D.save('crap.db')         # Oh crap!
}}}
    * print
     * should print a string describing the skeleton of the database
    * vacuum
  
  A. class MutableSQLDatabase(SQLDatabase) -- seems counterintuitive, to avoid issues relating to stupid people accidentally messing up an important database in a public setting (if possible) -- best way to avoid this is to not have mutation functions at all
   * functions
    * init
     * {{{D = MutableSQLDatabase()}}} creates a new temp db, ready to go
     * {{{D = MutableSQLDatabase('mydb.db')}}} opens a connection to mydb.db, ready to go
     * {{{D = MutableSQLDatabase('yourdb.db')}}} if called on a protected database, just make a temp copy and connect to that
    * create/drop table
    * create/drop column( column name, col type, table, bool index=False, bool primary key=False )
     * if no table specified, raise an error and educate user about sql
    * create/drop index( column, table(s) )
    * create/drop primary key( column, table )
     * QUESTION: should we allow multiple tables? bounce this off someone who knows (i.e. is it possible to do this with sqlite?)
    * create/drop row( table, dict )
     * create accepts a dictionary keyed by column name
       {{{
-Line 56:
+Line 61:
-sage: foo(table='table_name', shit='poo', dog='cat')
# works
sage: foo(shit='poo', dog='cat')
+sage: foo(table='table_name', stuff='poo', dog='cat')
{'stuff':'poo', 'dog':'cat'}
sage: foo(stuff='poo', dog='cat')
-Line 63:
+Line 68:
+    * for later? add data from whatever (e.g. quickly via sql file): magic function to deal with other ways to add data? think about this more later.
-Line 64:
+Line 70:
-   * (mut'ble only) add data from whatever (e.g. quickly via sql file): magic function to deal with other ways to add data? think about this more later.
   * set_mutable
   * copy
    * since it is [http://www.mail-archive.com/[email protected]/msg24725.html possible] to have two connections to the same db file, whether to create a new db may be a good option
     * pros? cons?
    * mutable by default
    * skeleton option will create a new mutable database with empty tables but the same column/index/primary key structure
    * preserves the query string?
    * executes the query string first?
   * save
    * {{{ D.save('my.db') }}} should copy the class's database file to my.db.
    * this should not execute queries at all
    * option to close connection with old file and connect to new file, but default behavior shouldn't do this. motivation:
      {{{
D = Database()
D.add_a_bunch_of_shit()...
D.save('bunch of shit')
D.remove_shit()
D.add_piss_and_vinegar()...
D.save('piss and vinegar')
}}}
   * print
    * should probably just print a string with database name, number of tables...?
   * show
    * for extensions to the Database class, this can be anything, e.g. graph database prints nice table
    * for __default__ database class, execute any queries and print the data- this isn't that much of an issue since everything will be int, real, bool or string...
    * notebook will print a nice html table
    * for command line, output returned by sqlite is a pretty good template
   * fn to update query data (see under instance field)
    * inplace option
    * how to do a join?
   * vacuum
   * clear queries
 * !ImmutableDatabase class
  * idea - a different class for databases that are included with sage, to avoid issues relating to stupid people accidentally fucking up an important database in a public setting (if possible)
  * god damn cheeky -- have database class extend immutable database! although a database won't technically be an immutable database to a person, it is a good way to do this, since we can add on the property 'mutable', as well as all the modification functions
+  A. class !AdvancedQuery(!SageObject) -- simply a pointer to an SQLDatabase object (recall, this could be mutable), and a string. When run_query is called, query the database and return results.
-Line 101:
+Line 72:
-----------------------------------------------------------------------
+  A. class Query(!AdvancedQuery) -- a naive query class for the n00b in us all...
    * comes with operations intersect, union and complement
    * creating a new query: specify one table, columns, and simple search string (i.e. the "WHERE" clause)
    * A dict with entries for the "SELECT" clause, the "FROM" clause, and the "WHERE" clause
    * allows for recursive searching quickly, since we can simply modify the string, and wait until we have to execute the query
    * deals with crazy stuff like joins automatically, informing user of what is required of him/her

 1. Persistent objects via ZODB. Only stores data for objects that have been declared to the db to be immutable, and only for classes that instantiate the proper functionality. Do this with at least one class to provide an example.

  * perhaps a global option of whether to cache anything available or not, as well as class-level option for same thing
  * different classes of objects would have different tables, defined by their classes.
  * any object would be required to be set immutable. when it is set to mutable again, the entry in the database corresponding to the object would be dropped.
  * export these tables to files, along with a pickle of each object, if desired.
    * note- this should not necessarily be depended on! pickling isn't quite stable yet. As William says, the safest is to store things in a text file, together with a function that can recreate the pickles (with a current version so pickles are up to date).

 1. Create a web server modeled on the new notebook that uses Zope to serve either sql databases (for query type webpages like Grout's) or ZODB databases ([[http://modular.fas.harvard.edu:8080/mfd/index.html|e.g.]] for webpages such as WS's Modular Forms Database).
-Line 105:
+Line 91:
-   * Online databases
 
     * cgi web-accessible...
+ * Automated Caching
  * idea - when you do an expensive calculation on an object, the result is automatically stored in a global database.
  * ZODB may be very good for object persistence
     * caveats: sometimes you need to tell ZODB that you have changed some data or else it won't remember...
     * ZODB never shrinks a file, unless you call a vacuum-like command
     * pickling is dangerous without a plain text backup, since pickling can fail when you bridge versions...
      * therefore, it is best to have a plain human-readable text file, and a function that will read in the text and convert the data to current pickled cucumbers...
     * real-world data is great for recursive databasing, but algebraic objects can be too complicated...
     * it's like a dictionary...
-Line 109:
+Line 101:
-     * accessible from SAGE, perhaps hosted at sage.math...
+ * Online databases
  * (*) host these same portable mergable files online, in a killer 2.0 web app that serves the data
  * modeled on the notebook server, there could be a database server... Tom?... Can you hear me?... Apparently there's a SQLite server for Windoze called SQLite_on_sockets. That should be useful...
  * these databases could also serve sobj's, and depending on how ambitious people are, it could seamlessly interface with the notebook...
-Line 111:
+Line 106:
-     * SQLite servers?
-Line 113:
+Line 107:
-   * Automagic caching: keep track of computation results
-Line 115:
+Line 108:
-     * keeping those results in portable objects, making these objects merge-able, etc...

   * Licensing: $\exists$ DBGPL?

   * Noticing identical database entries

   * SAGE [http://www.initd.org/pub/software/pysqlite/doc/usage-guide.html interface] to sqlite [:sqlite-tutorial: creating your own database using sqlite in sage]
+== Unresolved issues ==
  1. lists as entries
     * storing a whole list in one entry in the database: this is feasible via Emily's rawking regexp skills, IF we can find a delimiter guaranteed not to be in pickle strings. It is optimal for speed and memory.
     * storing a list as an array of rows (experience tells us that this is inefficient)
  1. Licensing: $\exists$ DBGPL?
  1. Noticing identical database entries [added by David Joyner -- I think this can be resolved using a query. Is that correct Tom?]
  1. [added by David Joyner] I'd like the issue of "uniformity" to be discussed. For example, I think having an "AUTHORS" and "REFERENCES" field (like the [[http://www.research.att.com/~njas/sequences/|OEIS]]) would be great. For example, when Kate Minolta computes a new Conway polynomial, not only is the entry added by she gets proper credit.

Diff for "days4/projects/database"

Databases

SAGE Enhancement Proposal

Unresolved issues