This page gathers ideas for refactorization of sage.combinat.words and implementation of tilings. You can subscribe to the associated mailing-list to discuss about this.

How do I implement my language ? my tiling ?

There are different places to look at for examples:

For shifts and tilings, there is (up to now) almost nothing:


The refactorization of the current code should go in the patch #12224 which is almost done. Up to now the code is a bit dissaminated everywhere in Sage..

General overview

Classes for finite words

The base class for all finite words is currently in sage.combinat.words.finite_word and is called FiniteWord. Some generic algorithms are implemented in the category of languages and factorial languages. The specific classes are


What is bad/nice with categories:

What do we keep? What categories do we create? Do we provide a default _element_constructor_ in the category (if yes, it is highly incompatible with facade)?

In a next future we should think about mutability/immutability.

Algorithmic and naming conventions

== Behavior of algorithms with infinite input data =

What to do for equality of infinite words ?

What should do

sage: w1 == w2

Two possibilities:

  1. test the first XXX letters for finding a difference. If find one then returns False otherwise raise an error, "seems to be equal use .is_equal(force=True) to launch the infinite test".
  2. test all letters and never return True in case of equality

Other suggestions ?

Parikh vector, evaluation, abelianization

The name abelianization is the most generic one. Parikh vector is standard in word combinatorics. Evaluation is mainly used in combinatorics.

Notice that evaluation is formally a composition, in other words the alphabet should be finite and ordered.

Pattern matching

Algorithms for pattern matching may be optimized in subclasses. Hence, we should pay attention for function of low and high level.

Vincent proposes to use the following conventions for low-level routines. As it is questions to implement them in C in a near future, this question is crucial:

There is also the question of multiple matchings and more generally about regular expressions.

The actual implementation of pattern matching uses Boyer-Moore algorithm which needs some precomputation for y: last_position_dict, prefix_function_table, good_suffix_table. All theses precomputations are cached_method of a word which may be memory consuming and not very efficient as the following code actually calls twice the precomputation:

sage: w1 = Word('abbabaabaaababa', alphabet='ab')
sage: w2 = Word('abababaaaa', alphabet='ab')
sage: w1.find('aa')
sage: w2.find('aa')

Vincent proposes to move all precomputation in a module dedicated to pattern matching and to not use caching except if the user want to perform many search of y in many different x. In which case we should do something like that:

sage: w = Word('ab', alphabet='ab')
sage: f = Finder(w)
sage: f.match(Word('abbababaababbbbababab', alphabet='ab'))

Repetitions and exponents

see also #

Actual names


Finite languages and factor set

Most of it was implemented by Franco (suffix tree and suffix trie). We would like to enhance it and make a specific data structure (called Rauzy castle) for FiniteFactorialLanguages. See #12225.

Substitutive and adic languages

There are many algorithms for languages described by a sequence of substitutions (called a directive word). The particular case of morphic and purely morphic languages correspond respectively to periodic and purely_periodic directive words.

Eventually periodic languages / words

They will be useful to define eventually periodic directive words for adic languages. See #12228.

TODO list

for #12224:

for other tickets:


LanguagesAndTilings (last edited 2014-03-19 13:30:06 by vdelecroix)