Packaging and distributing SageMath

a Sage days 77 project

Part of the days 77 workshop was dedicated to studying possible improvements to SageMath's package system and availability in popular distributions.

There is a cyclic interest in these issues. Past Sage Days that have also dealt with them were days 4 and days 7.

When talking about modularization, packaging, distribution, etc., Sage devs may mean several different things at the same time:

Although these are separate problems, the interactions are non-trivial.

Package maintainers from several distributions were present, in person, or remotely, at Days 77:

The workshop was the occasion to share knowledge and concerns on packaging SageMath for Linux distributions. Given that some packagers are not involved in SageMath development proper, it was decided to create a separate sage-packaging mailing list exclusively devoted to packaging issues.

Gentoo

At the time of writing, SageMath ebuilds (packages) have existed for Gentoo for six years. The layout follows closely the Sage layout:

They are distributed as an overlay, though some individual ebuilds have made it into the standard distro.

At each new SageMath beta, the Sage-on-Gentoo overaly is updated, usually in less than 24h. This requires a couple of hours of manual work on average:

Additional notes

Debian

There have been repeated efforts to package SageMath for debian, as shown by this (outdated) wiki page. At some point in 2009, a Debian package existed in stable, installing the monolithic sage in /opt/sage; the package became unmaintained and was eventually removed.

The DebianScience project maintains experimental packages for SageMath, as documented in this wiki page. It also tracks dependency versions discrepancies for Debian vs Sagemath's master branch and develop branch. (source code for the tracker)

At Days 77, Julien and others worked on:

More info available on Logilab's blog.

Arch

Antonio Rojas maintains a very up-to-date SageMath package for Arch/community (v7.1 at the time of writing). Arch stastics show that >3% of users (who report package statistics) have the package installed.

SageMath is split into separate packages, and some patches are applied.

Current issues

Note: it would be hard to have a stable package that always work for all Arch environments. There are distributions based on Arch that aim at slower but more stable progress (e.g. Antergos). Aiming for a stable package for such would be more reasonable.

Guix/Nix

Guix and Nix are two very similar distribution agnostic functional package managers. Both projects focus on reproducible builds, in particular building/installing a package twice should give the same result byte-for-byte.

Efforts were started at Days 77 to package SageMath for Guix (Andreas Enge) and Nix (Julian Rüth). Andreas managed to make a Guix recipe for SageMath, and run it, but at the moment it is not up to Guix standards.

Citing Julian: "I am not sure that Nix/Guix will in the end help much with the problem of distributing or developing Sage. Anyway, if we can get to the point where an unpatched Sage builds in the very restrictive setting those two impose, then it should be relatively easy to build for any distribution."

Anaconda

Anaconda is a user-space distribution and package manager for scientific software. Born in the python ecosystem, it is becoming a de facto standard for scientific software.

Anaconda is mostly oriented towards binary packages, though Erik noted that nothing prevents shipping source packages with it. At the moment, no serious experiment with packaging SageMath for Anaconda has been done yet, but there was a consensus at the Days 77 that having SageMath in Anaconda is highly desirable, because of the overlapping interests with the Anaconda community, and because it has the potential to bridge the fracture between pure and applied math communities.

Common obstacles to packaging SageMath for distributions

SageMath as a distribution, candidates to replace the `spkg-install` system

The second topic addressed at Sage Days 77 was internal packaging. SageMath has its own package system, with its pros and cons. Here's some common complaints:

The workshop investigated possible alternatives to the packaging system. Two, mostly orthogonal, goals for such a system are:

The two goals are not necessarily achieved by the same system. For example, Anaconda is a very good candidate for the first one, but it does very little for the second (and, potentially, it makes it worse). However, nothing prevents having two systems complementing each other (except that two such systems might not exist yet).

Wanted features for an packaging replacement are:

Desirable features:

The following systems where considered at Days 77: #Pip.2FPyPI, #Anaconda-1, #Guix.2FNix-1, #Gentoo_prefix, #HashDist.

Pip/PyPI

Pip is NOT a package manager. Pip is just a Python module installer, it does very little to help install non-Python dependencies, and is not very smart about version handling.

However, many components of SageMath are stock python modules available on PyPI, and they could be installed by pip install. Work on this is underway, see #20218.

A common wish in the community is that more SageMath components which would be useful outside SageMath be shipped as separate Python/Cython modules on PyPI, so they benefit a larger community. This has recently happend with CySignals, and is happening with the Pari interface.

To some extent, pip+PyPI already offer a way for users to distribute SageMath code via sage -pip. However this is not well documented, and not explicitly supported.

Anaconda

Anaconda is a user-space distribution and package manager for scientific software. Born in the python ecosystem, it is becoming a de facto standard for scientific software.

Its most interesting features are

- Supports Linux, Windows, Mac. - There is a condahub (binstar) where people can submit their packages, and communities create channels (see also Anaconda cloud). - Very advanced dependency handling. Its developers say "Anaconda has solved the packaging problem".

However, as it is put here:

"Of course, solving the packaging problem and removing it are different things. Conda does not make it easier to compile difficult packages. It only makes it so that fewer people have to do it. And there is still work to be done before Conda really takes over the world."

Anaconda being mostly oriented towards binary packages, it does very little to help developers handle a modular distribution such as SageMath (it is possible in principle to package sources for Anaconda, though). Some people have explored options to make it easy to compile complex distributions, while (semi-)automatically generating Anaconda packages. Some pointers here :

although this seems to have stalled for the moment.

To some extent, with respect to its host system, Anaconda is a monolith as much as SageMath, albeit with larger adoption, better integration, and a better packaging system. Among other things, transitioning to Anaconda would shift the "monolith blame" from SageMath to Anaconda.

Gentoo prefix

https://wiki.gentoo.org/wiki/Project:Prefix brings Gentoo portage package manager to other Linux distributions. Packages are installed in a prefix, rather than in the system paths, allowing users to have different versions of the same library at the same time. Following Gentoo's philosophy, it targets source packages, although it is possible in principle to make it handle binaries.

On the plus side, SageMath being already packaged for gentoo, it is relatively easy to package it for prefix. Indeed, this was already done by Timo Kluck at Days 47. Prefix supports Linux and OS X (although the usual breakages happen on OS X updates).

On the minus side, prefix has no support for Windows. Being very powerful, it may look daunting at first. There are some concerns on stability, maturity and sustainability, and it is almost a one man show (Fabian Groffen).

Guix/Nix

Guix and Nix are two very similar distribution agnostic functional package managers. Both projects focus on reproducible builds, in particular building/installing a package twice should give the same result byte-for-byte. Like Gentoo prefix, the learning curve can be steep.

Both Guix and Nix can be used as system-level package mangers, or as user-space package managers. In user-space, they install everything starting from the build chain, only taking the kernel from the system.

Like Gentoo prefix, they do isolated builds, allowing the user to have many version of the same library. They also support binaries, but need to be root, or use a user-space chroot such as https://github.com/proot-me/PRoot/blob/master/doc/proot/manual.txt or https://github.com/lethalman/nix-user-chroot.

Differences between Guix and Nix:

Some more notes on Nix:

HashDist

HashDist is an environment management tool whose promise is to "build only once". It uses concepts similar to Guix and Nix, but its focus is on build management, rather than reproducible builds (although it CAN do reproducible builds). A succinct abstract on HashDist's goals can be found here.

HashStack, HashDist's companion, is a collection of software profiles for HashDist.

HashDist works as a userspace tool, defining software stacks which allow isolated builds (like in prefix, guix, nix). Contrary to guix and nix, it does not insist on building a stack from the build chain up, but obviously one gives up reproducibility by relying on the system build tools. Any other library can be swapped between HashDist built libraries and system ones.

To avoid unneeded recompilations, HashDist computes hashes of the build steps, in a similar way to Guix/Nix.

HashDist has no support for binary packages, but has experimental support to generate Anaconda packages. It works on Linux and OS X, and has some experimental support for Cygwin.

Our own Volker is part of the HashDist community. He made an attempt in 2014 to auto-convert spkg-install scripts to hashdist.

Discussions outside Days 77

Roughly in parallel with Days 77, a great deal of discussion on packaging-related topics took place in sage-devel:

It would be extremely useful to summarize these discussion, but a better place for this would be a separate wiki page.