Differences between revisions 9 and 10
Revision 9 as of 2013-03-24 19:22:05
Size: 16299
Editor: rohana
Comment:
Revision 10 as of 2022-04-05 05:21:18
Size: 0
Editor: mkoeppe
Comment: Outdated
{{{#!rst

======================================================
SEP: Migrate to modern DVCS-based development workflow
======================================================

This is a proposal to migrate Sage from our current (2012-02-13)
development model to a more modern workflow based on distributed version
control.

The following should be understandable even to very new Sage
developers. Note that this SEP does not apply to "spinoff" projects such
as the Sage Notebook Server (i.e., the web-application interface to
Sage), even though it is packaged with Sage.

Readers in a hurry can skip directly to the actual
`proposal`_.

Current workflow
================

At the moment, we have the following workflow.

Trac tickets and patches
------------------------

The release manager (a person, currently Jeroen Demeyer) maintains four
major Mercurial repositories, namely the root, library, scripts, and
extcode repositories, which are located in ``.``, ``devel/sage-main``,
``local/bin``, and ``devel/ext-main`` respectively (relative to the base
path of the Sage installation, a.k.a. ``$SAGE_ROOT``).

Whenever Sage releases a new stable version, both its binary and source
tarballs contain all four of these repositories, with all the history up
to that point.

When a developer wants to change Sage code (usually in the library or
scripts repository), he must first open a ticket on the `Sage trac`_
issue tracker. He must then provide a patch, or a series of patches,
implementing the changes he would like to make, and upload the patches
to the trac ticket page as attachments. These patches are usually
generated with Mercurial (which is shipped with Sage), specifically its
Mercurial Queues extension.

Once the patches are uploaded to the trac ticket, other Sage developers
must review the code. Often a reviewer points out some deficiency, and
the code must be changed. Usually the author of the patches simply makes
the changes and uses Mercurial to update the patch or set of patches.
Sometimes the author instead creates a new patch, to be applied on top
of the already uploaded ones, which implements the requested changes.

In either case, the new patch or patches must be uploaded to the trac
ticket. At this point the previously existing patches on the trac ticket
may be out of date. To clarify which patches are still relevant,
developers are required to mention in the ticket description the exact
list of patches they would like to apply to Sage and in what order. This
description is not required to be in any particular machine-readable
format.

As the release manager prepares a new stable release of Sage, he builds
an ordered list of tickets whose code changes have been positively
reviewed but not yet incorporated into Sage. The list must be ordered so
that the patches from each ticket apply cleanly on top of the patches
from all tickets earlier in the list.

If conflicts between two tickets keep the release manager from easily
finding such an order, he may ask one of the authors to "rebase" his
code changes on the other ticket's code (that is, upload a new set of
patches that applies cleanly on top of the other ticket's patches).
Developers can also use the "dependencies" field of a trac ticket to
specify which other tickets' patches must come earlier in the list than
the ticket in question.
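
Ordering tickets by their "dependencies" fields is a topological sort.
The following minimal Python sketch shows the idea. It assumes Python
3.9+ for ``graphlib``, and the function name ``release_order`` and the
ticket numbers are hypothetical. It only accounts for declared
dependencies. Textual conflicts between tickets that declare no
dependency on each other are exactly the cases that still require a
rebase.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def release_order(dependencies):
    """Order tickets so each one comes after every ticket it depends on.

    ``dependencies`` maps a ticket number to the set of ticket numbers
    listed in its "dependencies" field (hypothetical data model).
    """
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical ticket numbers, for illustration only.
deps = {
    12345: {12001},
    12001: set(),
    12500: {12345, 12001},
}
order = release_order(deps)  # [12001, 12345, 12500]
```

A circular dependency makes ``static_order()`` raise
``graphlib.CycleError``, which corresponds to a set of tickets that
cannot be ordered at all.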

Because the time between stable releases is fairly long, the release
manager periodically publishes "development versions" of Sage. These are
named by appending "beta" or "rc" (release candidate) and a number to
the version number of the upcoming stable release. Development versions
contain copies of the four major repositories with the patches from the
list so far applied. The applied patches appear in the history of the
repositories, but that history is a dead end. Once the ordered list of
tickets is finalized, all the patches are applied again to the previous
stable release to produce the new stable release.
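
As an illustration, here is a small Python sketch of how such version
names sort. It assumes names of the form ``5.8.beta2`` or ``5.8.rc0``
(with a dot before the stage). The helper ``version_key`` is
hypothetical and not part of Sage.

```python
import re

# Assumed format: "5.8" for a stable release, "5.8.beta2" or "5.8.rc0" for
# development releases leading up to it.
VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:\.(beta|rc)(\d+))?$")

# Within one version number: betas first, then release candidates, then stable.
STAGE_RANK = {"beta": 0, "rc": 1, None: 2}


def version_key(version):
    """Sort key placing development releases before their stable release."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    base, stage, number = match.groups()
    numbers = tuple(int(part) for part in base.split("."))
    return (numbers, STAGE_RANK[stage], int(number or 0))


releases = ["5.8", "5.8.rc0", "5.7", "5.8.beta10", "5.8.beta2"]
ordered = sorted(releases, key=version_key)
# ['5.7', '5.8.beta2', '5.8.beta10', '5.8.rc0', '5.8']
```

Comparing the stage numbers as integers keeps ``beta10`` after
``beta2``, which plain string sorting would get wrong.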

The purpose of development releases is to allow developers to base their
patches on a partially complete patch list. This makes it easier to
ensure that a cleanly applying ordering of patches exists by the time
the next stable release comes around. However, many developers still
base their patches on the latest stable release rather than the latest
development release.

This is partly because the extra history in each development release is
missing from the next development release. As a result, the
``sage --upgrade`` command breaks when used in a development release. To
use development releases, you therefore have to build each one from
scratch, which can be very time consuming.
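
A simplified model, not how ``sage --upgrade`` is actually implemented,
shows why. Pulling new history can only add changesets, never remove
them. Because a Mercurial changeset ID depends on the changeset's
parents and metadata such as the commit date, re-applying the same
patches generally produces new IDs. In the sketch below, histories are
plain sets of hypothetical changeset IDs, and ``can_upgrade`` is a
hypothetical helper.

```python
def can_upgrade(installed, target):
    """Return True if every changeset in ``installed`` also appears in ``target``.

    In this model, pulling only adds changesets, so a clean upgrade requires
    the installed history to be a subset of the target release's history.
    """
    return set(installed) <= set(target)


# Hypothetical changeset IDs, for illustration only.
stable = {"s1", "s2"}
beta0 = stable | {"p1", "p2"}        # ticket patches applied on top of stable
beta1 = stable | {"q1", "q2", "q3"}  # patches re-applied, so the IDs differ

print(can_upgrade(stable, beta0))  # True
print(can_upgrade(beta0, beta1))   # False: p1 and p2 are missing from beta1
```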

.. _Sage trac: http://trac.sagemath.org/sage_trac/


Sage-Combinat
-------------

`Sage-Combinat`_ is a project founded in 2008 by former developers of
the MuPAD package MuPAD-Combinat. It has converted MuPAD-Combinat into
a part of the Sage library and aims to continue development of the
combinatorics-related sections of the Sage library. Sage-Combinat
deserves special mention here because its developers have their own
development method, which takes the patch-based method above to an
extreme.

Sage-Combinat developers have their own mailing list,
`sage-combinat-devel`_, where they coordinate their development. To fit
the Sage development workflow described above, Sage-Combinat developers
must write and perfect single patches that implement individual features
or bugfixes. Since these patches generally all touch the combinatorics
section of the Sage library, they often conflict with each other.

To avoid in advance the problems that would result from accepting two
conflicting patches, Sage-Combinat keeps `a centralized
list of all their patches`_ in an order that guarantees that they will
apply properly. Since Combinat patches often remain in progress for
a relatively long time, this list contains a very large number of
patches. It even contains patches that update quite old versions of
the Sage library to the current version, for the benefit of
Sage-Combinat developers who have not upgraded yet.

This list is maintained under Mercurial version control, primarily by
Nicolas Thiéry and Florent Hivert, in the `combinat patches
repository`_.
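
The list is a Mercurial Queues ``series`` file. It has one patch name
per line, in application order, and ``#`` starts a comment. mq also lets
a line carry *guards* such as ``#+name`` or ``#-name``, which include or
skip a patch depending on which guards are selected. The Python sketch
below parses such a file following those mq conventions. The helper
names, patch names, and guard names are hypothetical, and the real
Combinat series may rely on features not covered here.

```python
import re

# A guard is "#+name" (apply only if "name" is selected) or "#-name"
# (skip if "name" is selected), written after the patch name.
GUARD_RE = re.compile(r"#([-+][^-+#\s][^#\s]*)")


def parse_series(text):
    """Return (patch_name, guards) pairs from the text of a series file."""
    entries = []
    for line in text.splitlines():
        pos = line.find("#")
        if pos == 0:
            continue  # whole-line comment
        patch, comment = (line, "") if pos == -1 else (line[:pos], line[pos:])
        patch = patch.strip()
        if patch:  # skip blank lines
            entries.append((patch, GUARD_RE.findall(comment)))
    return entries


def patches_to_apply(entries, selected):
    """Return the patch names to apply, in order, given the selected guards."""
    result = []
    for patch, guards in entries:
        positive = [g[1:] for g in guards if g[0] == "+"]
        negative = [g[1:] for g in guards if g[0] == "-"]
        if any(g in selected for g in negative):
            continue  # explicitly excluded by a selected guard
        if positive and not any(g in selected for g in positive):
            continue  # restricted to guards that are not selected
        result.append(patch)
    return result


# Hypothetical excerpt; the patch and guard names are made up.
SERIES = """\
# patches in application order
trac_1234-partitions.patch
trac_5678-crystals.patch #+experimental
trac_9999-old-fix.patch #-sage-5.8
"""
entries = parse_series(SERIES)
```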

.. _Sage-Combinat: http://combinat.sagemath.org/
.. _sage-combinat-devel:
   http://groups.google.com/group/sage-combinat-devel
.. _a centralized list of all their patches:
   http://combinat.sagemath.org/patches/file/tip/series
.. _combinat patches repository: http://combinat.sagemath.org/patches/


Packages
--------

Apart from the four major repositories, Sage as a distribution of
mathematical software also has a package installation system, which uses
packages called SPKGs. Each SPKG is a ``.tar.bz2`` or ``.tar`` archive
containing both the vanilla source code for some piece of software and
ancillary files which are used to patch, customize, build, and install
the software for Sage's specific purposes.
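
Since an SPKG is an ordinary tar archive, Python's ``tarfile`` module
can separate the vanilla sources from the ancillary files. This sketch
assumes the upstream code lives under ``<name>-<version>/src/`` inside
the archive and treats every other regular file as ancillary. The
``split_spkg`` helper is hypothetical.

```python
import tarfile


def split_spkg(path):
    """Split an SPKG's files into vanilla upstream sources and ancillary files.

    Assumes the layout "<name>-<version>/src/..." for upstream code; every
    other regular file (SPKG.txt, build scripts, patches, the Mercurial
    repository's own files, ...) is counted as ancillary.
    """
    vanilla, ancillary = [], []
    # Mode "r:*" lets tarfile detect compression, so both plain .tar and
    # .tar.bz2 archives work regardless of the file's extension.
    with tarfile.open(path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            parts = member.name.split("/")
            if len(parts) > 2 and parts[1] == "src":
                vanilla.append(member.name)
            else:
                ancillary.append(member.name)
    return vanilla, ancillary
```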

Each SPKG contains its own Mercurial repository which tracks the
ancillary files but not the vanilla source code. When a developer wants
to modify these ancillary files, he must commit his changes to the
repository inside the archive, and simultaneously document those changes
in the ``SPKG.txt`` file in the archive. Then he must upload the new
SPKG, with a bumped version number, to some website (for example, the
`spkg-upload Google Code project`_ exists solely for this purpose), and
provide a link to it on the trac ticket.
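
Sage SPKG file names commonly mark Sage-specific revisions of an
unchanged upstream version with a ``.pN`` suffix. For example, a name
like ``foo-1.2.p1.spkg`` would be followed by ``foo-1.2.p2.spkg``.
Assuming that convention, bumping the version might look like the
following hypothetical helper.

```python
import re

# Assumed naming convention: "<name>-<upstream version>[.p<N>].spkg", where
# ".p<N>" counts Sage-specific revisions of the same upstream version.
SPKG_RE = re.compile(r"^(?P<stem>.+?-\d[^-]*?)(?:\.p(?P<level>\d+))?\.spkg$")


def bump_spkg(filename):
    """Return ``filename`` with its Sage patch level incremented.

    In this sketch a name without a ".pN" suffix becomes ".p0".
    """
    match = SPKG_RE.match(filename)
    if match is None:
        raise ValueError(f"not an SPKG file name: {filename!r}")
    level = match.group("level")
    new_level = int(level) + 1 if level is not None else 0
    return f"{match.group('stem')}.p{new_level}.spkg"
```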

It is up to the developers to figure out how to coordinate their work
on an SPKG when several people are working on it at once. However, this
happens only rarely.

.. _spkg-upload Google Code project:
   http://code.google.com/p/spkg-upload


Patchbot
--------

Thanks to Robert Bradshaw, a bot (the patchbot) runs on the Sage cluster
at the University of Washington and periodically trawls through `Sage
trac`_ for tickets with new code. When it finds new code, it puts the
ticket in a queue for testing. Testing a ticket involves several steps:
downloading the patch files from the ticket; working out which patches
to apply, in what order, and to which version of Sage; applying them;
and then running the full doctest suite. That last step checks that all
the examples in the documentation strings in the Python/Cython source
code actually produce the output shown.
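
Sage runs doctests with its own framework and ``sage:`` prompts, but
the idea is the same as in Python's standard ``doctest`` module, which
the following sketch uses. The function ``triangular`` is a made-up
example, and ``run_doctests`` is a hypothetical helper that returns the
number of failed and attempted examples.

```python
import doctest


def triangular(n):
    """Return the n-th triangular number.

    EXAMPLES::

        >>> triangular(4)
        10
        >>> [triangular(k) for k in range(5)]
        [0, 1, 3, 6, 10]
    """
    return n * (n + 1) // 2


def run_doctests(obj):
    """Run the examples in ``obj``'s docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass the globals explicitly so the examples can call ``obj`` by name.
    for test in finder.find(obj, globs={obj.__name__: obj}):
        runner.run(test)  # failures are reported on stdout
    return runner.summarize(verbose=False)


print(run_doctests(triangular))  # TestResults(failed=0, attempted=2)
```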



Problems
========

There are several problems with the current workflow. Here is a list of
some of them, in approximately the same order as the contents of the
above `current workflow`_ section.

#. Sage has four major repositories and arbitrarily many SPKG
    repositories, instead of one repository like most software. This
    adds complexity and may confuse new developers.

#. Requiring human developers to manually create and upload patch files
    adds to the maintenance burden for coordinators.

#. The lack of a standardized machine-readable format in which to
    specify on a ticket which patches to apply where and in what order
    causes the patchbot to often guess the answers to these questions
    incorrectly, and causes developers to be uncertain as to how to
    influence the patchbot's guesses.

#. The common practice of continually updating patches with new
    versions is confusing because one ends up with a soup of patches on
    a trac ticket, only the latest few of which are still relevant.

    This is especially bad since there is no uniform naming scheme for
    trac attachments that would hint at the correct ordering or at which
    attachments should be ignored. Also, an old attachment is often
    overwritten entirely when a new attachment with the same name is
    uploaded, leaving behind no trace of its existence.

#. In that vein, continually updating patches (as opposed to only
    adding new patches on top of existing ones) encourages history
    rewriting, which leads to a loss of granularity and larger
    individual commits in the final Mercurial history of the Sage
    repositories.

    This is bad because it makes automated rebasing of patches more
    difficult. When a patch is based on an old version of Sage and must
    be rebased on a newer version of Sage, it is necessary to reconcile
    any changes the patch makes with any changes to the same locations
    in files which have occurred between the old version of Sage and the
    new version of Sage.

    If these changes are presented in small pieces, there is more
    semantic information about what has happened and what lines have
    moved where, which often allows version control systems to perform
    rebases automatically. If the changes are presented in giant blocks,
    this becomes more difficult, leading to more work for developers as
    they must do the rebasing manually.

#. Patch files by nature provide no information about what revision
    they should be applied to. This means that reviewers and the
    patchbot are forced to guess the correct revision to use.

#. If a patch file must be rebased on another patch file, doing so
    manually is often difficult. Mercurial can help you rebase commits
    on other commits, but if neither patch is in the released Sage
    codebase yet, you cannot have both applied at the same time in order
    to use this functionality. The only way around this is to commit
    permanent changes to your Sage repositories, which will then break
    ``sage --upgrade`` in the future.

#. The fact that development versions of Sage contain throwaway commits
    is extremely confusing and bad practice. Commits that have been
    published (in a full alpha/beta/rc tarball, no less, not just on a
    repository website) should *not* be rescinded if at all possible.

#. The impossibility of upgrading from such a development version of
    Sage is a problem in and of itself.

#. The maintenance burden of Sage-Combinat's patch queue is excessive.
    It would be nice if it could be simplified somehow.


Proposal
========
.. warning:: This is a work in progress!

We propose to improve the workflow of Sage development by moving away
from patch files as the means of communicating changes to the Sage
library and ancillary structures, and instead adopting the modern DVCS
(distributed version control system) method of lightweight branching
and merging. We also propose several other improvements for developers
writing code for Sage.

Primary goals:

* Switch from patches to branches

  - Consolidate *all* Sage repositories into a single repository

    + Initially this will be the four core Sage repositories, but as
      SPKGs are updated, the installer/patch repositories should be
      merged as well

    + The ``src/`` directory of the non-core Sage SPKGs will be separated
      from the rest of the SPKG (which is under version control) and
      placed in a different location.

    + This requires a new directory structure layout; proposed new
      layout::

        sage_root/
            sage # the binary
            Makefile # top level Makefile
            (configure) # perhaps, eventually
            ... # other standard top level files (README, etc.)
            build/ # sage's build system
                deps
                install
                ...
                pkgs/ # install, patch, and metadata from spkgs
            src/
                setup.py
                module_list.py
                ...
                sage/ # sage library, i.e. devel/sage-main/sage
                ext/ # sage_extcode, i.e. devel/ext-main
                mac-app/ # would no longer have to awkwardly be in extcode
                bin/ # sage_scripts, i.e. the scripts in local/bin that are tracked
            upstream/ # (stripped) tarballs of upstream sources (not tracked)
            local/ # installed binaries and compile artifacts (not tracked)

  - Switch to git for version control

  - Implement and use something similar to ccache for Cython, so that
    building will be faster when switching branches

* Implement a better review system on Trac

  - Make Trac aware of users' personal repositories and read new commits
    from them into its own overarching repository on demand

  - Implement "attaching" of branches to a ticket

    + By "attaching" we mean that there is an easy way to link a ticket
      to the list of new changesets that are not already in the
      development branch.

  - Make it easy to view source code, commits, changesets, and hopefully
    even diffs between arbitrary pairs of commits on Trac

    + Trac already has this functionality

  - Customize Trac to allow for line-by-line comments on changesets

    + Also allow for line-by-line comments on patch files that currently
      exist on Trac

* Make a script, ``sage dev``, that wraps the limited subset of git
  functionality needed for our new workflow, so that developers can use
  the workflow without being git experts. The script also provides a
  command-line interface for creating, modifying, reviewing, viewing,
  and commenting on Trac tickets.

  - It will know about Trac, and handle any branching or merging
    required

  - The user is guided through every step they need to take, i.e. a
    wizard for development

    + Parts of the wizard can be disabled through user configuration.

* FUTURE: Implement "live development" from sagenb.org or other public
  notebook servers
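
The ccache-like tool for Cython mentioned among the primary goals could
work by keying generated C files on a hash of the Cython source and the
compiler options. Switching back to a branch would then reuse earlier
output instead of re-running Cython. The sketch below is hypothetical:
``cythonize_cached`` and ``cache_key`` are made-up names, and
``compile_fn`` stands in for the actual Cython invocation. A real tool
would also have to hash every ``.pxd``/``.pxi`` file a module depends
on.

```python
import hashlib
import shutil
from pathlib import Path


def cache_key(pyx_source: bytes, options: tuple) -> str:
    """Hash the Cython source together with the options that affect output."""
    h = hashlib.sha256()
    h.update(pyx_source)
    h.update(repr(options).encode())
    return h.hexdigest()


def cythonize_cached(pyx_path, c_path, cache_dir, compile_fn, options=()):
    """Produce ``c_path`` from ``pyx_path``, reusing a cached result if possible.

    ``compile_fn(pyx_path, c_path)`` is a stand-in for running Cython.
    Returns True on a cache hit and False if ``compile_fn`` had to run.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = cache_key(Path(pyx_path).read_bytes(), tuple(options))
    cached = cache_dir / f"{key}.c"
    if cached.exists():
        shutil.copyfile(cached, c_path)  # hit: same source seen before
        return True
    compile_fn(pyx_path, c_path)         # miss: run the real compiler
    shutil.copyfile(c_path, cached)      # remember the output for next time
    return False
```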

See also our `brainstorming page`_ from Review Days 2, where most of
these ideas came together.

.. _brainstorming page:
    http://wiki.sagemath.org/review2/Projects/SystemProposals
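
The set of changesets that "attaching" a branch to a ticket would list
is what git selects with ``git log <development-branch>..<ticket-branch>``.
That is every commit reachable from the ticket branch head but not from
the development branch head. A minimal Python sketch over a toy commit
graph follows. The commit IDs and the helper names ``ancestors`` and
``new_changesets`` are hypothetical.

```python
def ancestors(parents, head):
    """Return every commit reachable from ``head``, including ``head`` itself.

    ``parents`` maps each commit ID to the list of its parent commit IDs.
    """
    seen, stack = set(), [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def new_changesets(parents, branch_head, development_head):
    """Changesets on the ticket branch not yet in the development branch."""
    return ancestors(parents, branch_head) - ancestors(parents, development_head)


# Hypothetical history: a ticket branch forks from d1, and its author later
# merges the updated development branch (d2) back into it.
history = {
    "d1": [],
    "d2": ["d1"],
    "t1": ["d1"],
    "t2": ["t1"],
    "m1": ["t2", "d2"],
}
print(sorted(new_changesets(history, "m1", "d2")))  # ['m1', 't1', 't2']
```

``d2`` is excluded even though it is an ancestor of the merge ``m1``,
because it is already in the development branch. An author can
therefore merge the latest development code into a ticket branch without
cluttering the ticket's list of changesets.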

}}}