SEP: Migrate to modern DVCS-based development workflow

This is a proposal to migrate Sage from our current (2012-02-13) development model to a more modern workflow based on distributed version control.

We hope the following is understandable even to very new Sage developers. Note that this SEP does not apply to "spinoff" projects such as the Sage Notebook Server (i.e. the web-application interface to Sage), even though it is packaged with Sage.

If you are in a hurry, you can skip straight to the actual proposal.

Current workflow

At the moment, we have the following workflow.

Trac tickets and patches

The release manager (a person, currently Jeroen Demeyer) maintains four major Mercurial repositories, namely the root, library, scripts, and extcode repositories, which are located in ., devel/sage-main, local/bin, and devel/ext-main respectively (relative to the base path of the Sage installation, a.k.a. $SAGE_ROOT).

Whenever Sage releases a new stable version, both its binary and source tarballs contain all four of these repositories, with all the history up to that point.

When a developer wants to make a change to Sage code - usually in the library or scripts repositories - he must first open a ticket on the Sage trac issue tracker. Then he must provide a patch, or a series of patches, which demonstrate the changes he would like to make to the code, and upload the patches to the trac ticket page as attachments. Generally these patches will be generated automatically by Mercurial (which is shipped with Sage), specifically its Mercurial Queues extension.

Once the patches are uploaded to the trac ticket, other Sage developers must review the code. Often some deficiency will be pointed out by a commenter, and the code must be changed. Usually the author of the patches will simply make the changes, and use Mercurial to update the patch or set of patches. Sometimes the author will instead create a new patch, to be applied on top of the already uploaded ones, which implements the changes requested.

In either case, the new patch or patches must be uploaded to the trac ticket. At this point the previously existing patches on the trac ticket may be out of date. To clarify which patches are still relevant, developers are required to mention in the ticket description the exact list of patches they would like to apply to Sage and in what order. This description is not required to be in any particular machine-readable format.

As the release manager prepares to create a new stable release of Sage, he builds an ordered list of tickets which contain code changes which have been positively reviewed but not yet incorporated into Sage. The order of this list should be such that the patches from a ticket later on in the list will apply cleanly on top of tickets found earlier in the list.

If no such order can be found easily by the release manager due to conflicts between two tickets, he may request one of the authors to "rebase" his code changes on the other ticket's code (upload a new set of patches which does apply cleanly on the other ticket's code). Developers can also specify on the trac ticket what other tickets' patches must be earlier on the list than the ticket in question, by using the "dependencies" field.
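
The ordering described above is essentially a topological sort over the "dependencies" field. A minimal sketch, assuming the dependencies have already been read from Trac into a plain dictionary (the ticket numbers and the `order_tickets` helper are hypothetical, and this covers only declared dependencies, not undeclared textual conflicts between patches):

```python
from graphlib import CycleError, TopologicalSorter


def order_tickets(dependencies):
    """Return ticket numbers so that every ticket follows its dependencies.

    `dependencies` maps a ticket number to the set of ticket numbers
    whose patches must be applied before it.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        # static_order() is lazy, so cycles are only detected while iterating.
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError(f"circular ticket dependencies: {err.args[1]}") from err


# Hypothetical tickets: 12003 needs both others, 12002 needs 12001.
deps = {12001: set(), 12002: {12001}, 12003: {12001, 12002}}
ordered = order_tickets(deps)
```

When two tickets conflict without declaring a dependency, no such automatic ordering helps, which is why the release manager has to ask for a rebase.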

As the time between releases is not very short, the release manager periodically releases "development versions" of Sage, which are named by appending "beta" or "rc" (release candidate) followed by a number to the version number of the next Sage version that will be released. These development versions contain copies of the four major repositories onto which the list of patches so far has been applied. These applied patches appear in the history of the repository, but this history has no future, as eventually when the ordered list of tickets is finalized, they will be applied all over again to the old stable release to produce the new stable release.

The purpose of development releases is to allow developers to base their patches on a partially complete patch list, to make it easier to ensure that a cleanly-applying ordering of patches exists by the time the next stable release comes around. However, many developers continue to base their patches on the latest stable release instead of the latest development release anyway.

This is partly because the extra history found in each development release is not carried over into the next development release, which causes the sage --upgrade command to break when used from a development release. To use development releases you therefore have to build them from scratch every time, which can be very time-consuming.

Sage-Combinat

Sage-Combinat is a project founded in 2008 by former developers of the MuPAD package MuPAD-Combinat. It has converted MuPAD-Combinat into a part of the Sage library and aims to continue development of combinatorics-related sections of the Sage library. Sage-Combinat deserves special mention here because they have their own development method which takes the above patch-based method to extremes.

Sage-Combinat developers have their own mailing list, sage-combinat-devel, where they coordinate their development. In order to conform to the above development workflow of Sage, the Sage-Combinat developers must write and perfect single patches that implement certain features or bugfixes. Since these patches all generally involve the combinatorics section of the Sage library, they often conflict with each other.

To preemptively avoid the eventual problems that would result from two conflicting patches being accepted, Sage-Combinat keeps a centralized list of all their patches in an order that guarantees that they will apply properly. Since Combinat patches often remain in progress for a relatively long time, there is a very large number of patches in this list. The list even contains patches that update quite old versions of the Sage library to the current version, for the benefit of Sage-Combinat developers who have not upgraded yet.

This list is maintained under Mercurial version control, primarily by Nicolas Thiéry and Florent Hivert, in the combinat patches repository.

Packages

Apart from the four major repositories, Sage as a distribution of mathematical software also has a package installation system, which uses packages called SPKGs. Each SPKG is a .tar.bz2 or .tar archive containing both the vanilla source code for some piece of software and ancillary files which are used to patch, customize, build, and install the software for Sage's specific purposes.

Each SPKG contains its own Mercurial repository which tracks the ancillary files but not the vanilla source code. When a developer wants to modify these ancillary files, he must commit his changes to the repository inside the archive, and simultaneously document those changes in the SPKG.txt file in the archive. Then he must upload the new SPKG, with a bumped version number, to some website (for example the spkg-upload Google Code project exists solely for this purpose), and provide a link to it on the trac ticket.

It is up to the developers to figure out how to coordinate their work on the SPKG, if indeed multiple people are working on the SPKG. However, this happens only rarely.

Patchbot

Thanks to Robert Bradshaw, there is a bot running on the Sage cluster at the University of Washington which periodically trawls through Sage trac and looks for tickets with new code on them. When it sees new code, it puts the ticket in a queue for testing. Testing a ticket involves downloading the patch files from the ticket, figuring out which patches to apply, what order to apply them in, and what version of Sage to apply them to, doing so, and then running the full doctest suite on them, i.e. checking that all the examples in the documentation strings in the Python/Cython source code indeed produce the output shown when run.
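
To illustrate what "running the doctest suite" means, here is a minimal sketch using Python's standard doctest module on a single toy function. It is not the patchbot's code and not Sage's own doctest runner; the names `square` and `run_examples` are made up for this example.

```python
import doctest


def square(n):
    """Return n squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return n * n


def run_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # module=False skips module lookup; globs gives the examples access to func.
    tests = finder.find(func, name=func.__name__, module=False,
                        globs={func.__name__: func})
    for test in tests:
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


failed, attempted = run_examples(square)
```

A ticket passes this kind of check only if every example produces exactly the output written in the docstring.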

Problems

There are several problems with the current workflow. Here is a list of some of them, in approximately the same order as the contents of the above current workflow section.

  1. Sage has four major repositories and arbitrarily many SPKG repositories, instead of one repository like most software. This adds to complexity and may confuse new developers.
  2. Requiring human developers to manually create and upload patch files adds to the maintenance burden for coordinators.
  3. The lack of a standardized machine-readable format in which to specify on a ticket which patches to apply where and in what order causes the patchbot to often guess the answers to these questions incorrectly, and causes developers to be uncertain as to how to influence the patchbot's guesses.
  4. The common practice of continually updating patches with new versions is confusing because one ends up with a soup of patches on a trac ticket, only the latest few of which are actually relevant anymore.

    This is especially bad since there is no uniform naming scheme for trac attachments which could give a clue about the correct ordering or about which attachments should be ignored. Also, an old attachment is often overwritten entirely when a new attachment with the same name is uploaded, leaving no trace of the old version.

  5. In that vein, continually updating patches (as opposed to only adding new patches on top of existing ones) encourages history rewriting, which leads to a loss of granularity and larger individual commits in the final Mercurial history of the Sage repositories.

    This is bad because it makes automated rebasing of patches more difficult. When a patch is based on an old version of Sage and must be rebased on a newer version of Sage, it is necessary to reconcile any changes the patch makes with any changes to the same locations in files which have occurred between the old version of Sage and the new version of Sage.

    If these changes are presented in small pieces, there is more semantic information about what has happened and what lines have moved where, which often allows version control systems to perform rebases automatically. If the changes are presented in giant blocks, this becomes more difficult, leading to more work for developers as they must do the rebasing manually.

  6. Patch files by nature provide no information about what revision they should be applied to. This means that reviewers and the patchbot are forced to guess the correct revision to use.
  7. If it becomes necessary to rebase a patch file on another patch file, it is often difficult to do so manually. Mercurial can help you rebase commits on other commits, but if neither of the patches is actually in the released Sage codebase already, you cannot have them both applied at the same time in order to take advantage of this functionality, unless you start committing permanent changes to the Sage repositories, which will then screw up sage --upgrade in the future.
  8. The fact that development versions of Sage have throwaway commits in them is extremely confusing and a bad practice, as commits that have been publicized (in a full alpha/beta/rc tarball no less, not just on a repository website) should not be rescinded if at all possible.
  9. The impossibility of upgrading from such a development version of Sage is a problem in and of itself.
  10. The maintenance burden of Sage-Combinat's patch queue is excessive. It would be nice if it could be simplified somehow.
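
To make problems 3 and 6 concrete: even a very simple fixed line format in the ticket description would remove the guesswork. The sketch below assumes a purely hypothetical convention of one "Apply:" line and one "Base:" line; neither the format nor the `parse_ticket_description` helper is something Sage or the patchbot defines.

```python
import re

# Hypothetical directives, each on its own line of the ticket description:
#   Apply: first.patch, second.patch
#   Base: sage-5.0.beta3
APPLY_RE = re.compile(r"^Apply:[ \t]*(.+)$", re.MULTILINE)
BASE_RE = re.compile(r"^Base:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def parse_ticket_description(text):
    """Return (base_version, ordered_patch_names) found in `text`.

    base_version is None and the list is empty when a directive is absent.
    """
    apply_match = APPLY_RE.search(text)
    base_match = BASE_RE.search(text)
    patches = []
    if apply_match:
        names = apply_match.group(1).split(",")
        patches = [name.strip() for name in names if name.strip()]
    base = base_match.group(1) if base_match else None
    return base, patches


description = (
    "Fix a bug in some module.\n"
    "Apply: trac_1-fix.patch, trac_1-review.patch\n"
    "Base: sage-5.0.beta3\n"
)
base, patches = parse_ticket_description(description)
```

The proposal below sidesteps the issue entirely, since a branch already records both its commits and the revision they are based on.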

Proposal

Warning

This is a work in progress!

We propose to improve the workflow of Sage development by moving away from using patch files to communicate changes to the Sage library and ancillary structures, and instead using the modern DVCS (distributed version control system) method of lightweight branching and merging. We also propose various other improvements to the experience of writing code for Sage.

Primary goals:

  • Switch from patches to branches

    • Consolidate all Sage repositories into a single repository

      • Initially this will be the four core Sage repositories, but as SPKGs are updated, the installer/patch repositories should be merged as well

      • The src/ directory of the non-core Sage SPKGs will be separated from the rest of the SPKG (which is under version control) and placed in a different location.

      • This requires a new directory structure layout; proposed new layout:

        sage_root/
            sage          # the binary
            Makefile      # top level Makefile
            (configure)   # perhaps, eventually
            ...           # other standard top level files (README, etc.)
            build/        # sage's build system
                deps
                install
                ...
                pkgs/     # install, patch, and metadata from spkgs
            src/
                setup.py
                module_list.py
                ...
                sage/     # sage library, i.e. devel/sage-main/sage
                ext/      # sage_extcode, i.e. devel/ext-main
                mac-app/  # would no longer have to awkwardly be in extcode
                bin/      # sage_scripts, i.e. the scripts in local/bin that are tracked
            upstream/     # (stripped) tarballs of upstream sources (not tracked)
            local/        # installed binaries and compile artifacts (not tracked)
    • Switch to git for version control

    • Implement and use something similar to ccache for Cython, so that building will be faster when switching branches

  • Implement a better review system on Trac

    • Make Trac aware of users' personal repositories and read new commits from them into its own overarching repository on demand
    • Implement "attaching" of branches to a ticket
      • By "attaching" we mean that there is an easy method to add a link to the list of new changesets not already in the development branch.
    • Make it easy to view source code, commits, changesets, and hopefully even diffs between arbitrary pairs of commits on Trac
      • Trac already has this functionality
    • Customize Trac to allow for line-by-line comments on changesets
      • Also allow for line-by-line comments on patch files that currently exist on Trac
  • Make a script, sage dev, which wraps the limited subset of git functionality needed for our new workflow, so that developers can use the workflow without being git experts. The script also provides a command line interface for adding, modifying, reviewing, viewing, or commenting on a ticket on Trac.

    • It will know about Trac, and handle any branching or merging required
    • User is hand-held through everything they need to do - i.e. a wizard for development
      • User configurable to allow disabling parts of the wizard.
  • FUTURE: Implement "live development" from sagenb.org or other public notebook servers
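
The proposed directory layout above implies a mapping from old paths to new ones. A minimal sketch of that mapping, using only the three correspondences stated in the layout comments (the `new_location` helper is hypothetical, paths are relative to $SAGE_ROOT, and anything not covered is assumed to stay where it is):

```python
from pathlib import PurePosixPath

# Old location -> proposed new location, taken from the layout comments.
# Note that only the tracked scripts in local/bin are meant to move.
RELOCATIONS = [
    (PurePosixPath("devel/sage-main/sage"), PurePosixPath("src/sage")),
    (PurePosixPath("devel/ext-main"), PurePosixPath("src/ext")),
    (PurePosixPath("local/bin"), PurePosixPath("src/bin")),
]


def new_location(old_path):
    """Translate a path in the old layout to the proposed layout."""
    old = PurePosixPath(old_path)
    for old_prefix, new_prefix in RELOCATIONS:
        try:
            rest = old.relative_to(old_prefix)
        except ValueError:
            continue  # old_path is not under this prefix
        return new_prefix / rest
    return old  # assumption: unlisted paths keep their location


moved = new_location("devel/sage-main/sage/rings/integer.pyx")
```

A table like this would also be the starting point for rewriting existing patches so that they apply to the consolidated repository.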

See also our brainstorming notes on the wiki page for Review Days 2, which is where most of these ideas came together.

WorkflowSEP (last edited 2013-03-24 19:22:05 by rohana)