David Harvey: would like to experiment further with speeding up object construction

Email announcing the project from David Harvey

hi people,

Just want to float an idea for discussion and possibly a coding sprint at SD3.

Some background: Today I did some work on speeding up getting coefficients out of NTL objects, specifically polynomials in Z[x]. It's much improved now; when you request a coefficient of an NTL ZZX object, it copies the bytes directly into a new Integer object, instead of what it used to do (which went via a C string in decimal, and a python string, and a python long, etc etc).

But still what is taking a lot of time is constructing the Integer object. In fact, it's quite embarrasing: it takes us about half as long to construct 100,000 Integer objects as NTL takes to *multiply* two polynomials with 100,000 small coefficients:

sage: time for i in range(100000): pass
CPU times: user 0.05 s, sys: 0.00 s, total: 0.05 s
 
sage: time for i in range(100000): x = None
CPU times: user 0.09 s, sys: 0.00 s, total: 0.09 s
 
sage: time for i in range(100000): x = int()
CPU times: user 0.16 s, sys: 0.00 s, total: 0.17 s
 
sage: time for i in range(100000): x = Integer()
CPU times: user 0.36 s, sys: 0.00 s, total: 0.36 s
 
sage: f = PolynomialRing(ZZ, "x")([ZZ.random_element() for _ in range 
(100000)])
sage: time g = f*f
CPU times: user 0.76 s, sys: 0.02 s, total: 0.79 s

This is despite all the work we put into this at SD2.

It would be good to be able to optimise object construction in general. Unfortunately I think the general case is a very difficult problem. Anyone who worked on this at SD2 will agree, I'm sure

On the other hand, I would wager that construction of Integer objects is by far the most important. So perhaps we should give up some beauty and unity of code to just get Integers working damn fast. So here's what I propose: at SD3, let's try writing an experimental pure C function for constructing Integers that gets inserted into whatever tp_xyz slot is appropriate. I don't care if it has to deal directly with mangled pyrex names or whatever. From memory, all it needs to do is: (1) reference counting on the integer ring (ha ha we could even skip this if we could guarantee that no-one else ever resets the parent, and that there is always at least one reference to the integer ring lying around somewhere) (2) malloc some space for the actual python object (3) fill in some fields, like the TypeObject* (4) mpz_init

Let's skip all the function calls, all the crap that pyrex puts in, etc etc. Basically the only stuff left that will really suck up time is the two malloc calls. We could even try writing a buffering system for mallocing space for a whole bunch of Integers at once, if that proves to be taking up time.

Similarly we would need a destructor.

I'm sure it's not quite as simple as what I've put in the above list, but let's just see what we can do. I would like it to be as close as possible to the int() construction time. We still are a factor of > 2 away.

David

Email back from William

> (1) reference counting on the integer ring (ha ha we could even skip
> this if we could guarantee that no-one else ever resets the parent,
> and that there is always at least one reference to the integer ring
> lying around somewhere)

There is my integer with the integer ring -- it should be immutable and
create at module load time and there should only ever be exactly one
copy of it.  I think we should definitely be allowed to forgot about
reference counting for it. 

> (2) malloc some space for the actual python object
> (3) fill in some fields, like the TypeObject*
> (4) mpz_init

You should put 
  (0) or (5) object pool
as an important step -- this "object pool" idea is one of the tricks
that Python uses for its ints.  For example, in your benchmark:

  time for i in range(10^5): x = int()

Python is looking up and returning exactly the same int (the 0) every time:
  sage: int() is int()
  True
  sage: a = 999038r; b=999038r
  sage: a is b
  True

In contrast, when you do Integer(), so is creating a new integer object
every time, and probably (?) also freeing one:
  sage: Integer() is Integer()
  False
  sage: a = 999038; b=999038
  sage: a is b
  False


> Let's skip all the function calls, all the crap that pyrex puts in,
> etc etc. Basically the only stuff left that will really suck up time
> is the two malloc calls. We could even try writing a buffering system
> for mallocing space for a whole bunch of Integers at once, if that
> proves to be taking up time.
>
> Similarly we would need a destructor.

Or return objects to the pool -- this can also speed up desctructing,
since you just don't do it!

> I'm sure it's not quite as simple as what I've put in the above list,
> but let's just see what we can do. I would like it to be as close as
> possible to the int() construction time. We still are a factor of > 2
> away.

This project is very very well worth pursing. 

  -- William