Strings and the Burrows-Wheeler Transform -- Thematic Tutorials v4.7.rc0
system:sage


<div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index" accesskey="I">index</a></li>
  
    
      <a href="../../index.html"><img src="../_static/sagelogo.png" style="vertical-align: middle" title="Sage Logo"></a>
    
  
  
        <li><a href="../index.html">Thematic Tutorials v4.7.rc0</a> &raquo;</li>
 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="strings-and-the-burrows-wheeler-transform">
<span id="siena-tutorials-worksheet07-stringsandthebwt"></span><h1>Strings and the Burrows-Wheeler Transform<a class="headerlink" href="#strings-and-the-burrows-wheeler-transform" title="Permalink to this headline">¶</a></h1>
<p><em>Author: Franco Saliola &lt;saliola at gmail.com&gt;</em></p>
<p>Sage/Python includes a builtin datastructure from strings.</p>
<p>There are several ways to input strings. You can input a string using single
quotes (&#8216;) or double quotes (&#8221;):</p>
<div class="highlight-python">

{{{id=0|
s = &quot;This is a string!&quot;
s
///
'This is a string!'
}}}

</div>
<div class="highlight-python">

{{{id=1|
t = &#39;So is this!&#39;
print t
///
So is this!
}}}

</div>
<p>You can also input a string using three quotes (&#8220;&#8221;&#8221; or &#8216;&#8217;&#8216;). This is useful if you want to use both &#8221; and &#8216; in your string, or you want your string to span multiple lines:</p>
<div class="highlight-python">

{{{id=2|
s = &quot;&quot;&quot;
This is a multi-line
           string
that includes &#39;single quotes&#39;
             and &quot;double quotes&quot;.
&quot;&quot;&quot;
print s
///
This is a multi-line
        string
that includes 'single quotes'
          and &quot;double quotes&quot;.
}}}

</div>
<p><strong>Exercises</strong></p>
<ol class="arabic">
<li><p class="first">Create and print the following string</p>
<div class="highlight-python"><pre>  \ | ( | ) / /
_________________
|               |
|               |
|  I &lt;3 Coffee! /--\
|               |  |
 \             /\--/
  \___________/</pre>
</div>
</li>
<li><p class="first">Without using cut-and-paste(!) <em>replace</em> the substring  <tt class="docutils literal"><span class="pre">I</span> <span class="pre">&lt;3</span> <span class="pre">Coffee!</span></tt>
with the substring  <tt class="docutils literal"><span class="pre">I</span> <span class="pre">&lt;3</span> <span class="pre">Tea!</span></tt>.</p>
</li>
<li><p class="first">Print a copy of your string with all the letters capitalized (upercase).</p>
</li>
</ol>
<div class="section" id="operations-on-strings">
<h2>Operations on strings<a class="headerlink" href="#operations-on-strings" title="Permalink to this headline">¶</a></h2>
<p>Strings behave very much like lists. The table below summarizes their common
operations.</p>
<blockquote>
<table border="1" class="docutils">
<colgroup>
<col width="33%">
<col width="31%">
<col width="36%">
</colgroup>
<thead valign="bottom">
<tr><th class="head">Operation</th>
<th class="head">Syntax for lists</th>
<th class="head">Syntax for strings</th>
</tr>
</thead>
<tbody valign="top">
<tr><td>Accessing a letter</td>
<td><tt class="docutils literal"><span class="pre">list[3]</span></tt></td>
<td><tt class="docutils literal"><span class="pre">string[3]</span></tt></td>
</tr>
<tr><td>Slicing</td>
<td><tt class="docutils literal"><span class="pre">list[3:17:2]</span></tt></td>
<td><tt class="docutils literal"><span class="pre">string[3:17:2]</span></tt></td>
</tr>
<tr><td>Concatenation</td>
<td><tt class="docutils literal"><span class="pre">list1</span> <span class="pre">+</span> <span class="pre">list2</span></tt></td>
<td><tt class="docutils literal"><span class="pre">string1</span> <span class="pre">+</span> <span class="pre">sting2</span></tt></td>
</tr>
<tr><td>A copy</td>
<td><tt class="docutils literal"><span class="pre">list[:]</span></tt></td>
<td><tt class="docutils literal"><span class="pre">string[:]</span></tt></td>
</tr>
<tr><td>A reversed copy</td>
<td><tt class="docutils literal"><span class="pre">list[::-1]</span></tt></td>
<td><tt class="docutils literal"><span class="pre">string[::-1]</span></tt></td>
</tr>
<tr><td>Length</td>
<td><tt class="docutils literal"><span class="pre">len(list)</span></tt></td>
<td><tt class="docutils literal"><span class="pre">len(string)</span></tt></td>
</tr>
</tbody>
</table>
</blockquote>
<p><strong>Exercises</strong></p>
<ol class="arabic">
<li><p class="first">The factors of length 2 of &#8216;rhubarb&#8217; are</p>
<blockquote>
<div class="line-block">
<div class="line">rh</div>
<div class="line">hu</div>
<div class="line">ub</div>
<div class="line">ba</div>
<div class="line">ar</div>
<div class="line">rb</div>
</div>
</blockquote>
<p>Write a function called  <tt class="docutils literal"><span class="pre">factors</span></tt> that returns a list of the factors of
length  <tt class="docutils literal"><span class="pre">l</span></tt>  of  <tt class="docutils literal"><span class="pre">s</span></tt> , and list all the factors of length 3 of &#8216;rhubarb&#8217;.</p>
<div class="highlight-python">

{{{id=3|
# edit here
///
}}}

</div>
</li>
<li><p class="first">What happens if you apply your function  <tt class="docutils literal"><span class="pre">factors</span></tt> to the list
<tt class="docutils literal"><span class="pre">[0,1,1,0,1,0,0,1]</span></tt> ? If it doesn&#8217;t work for both lists and strings, go
back and modify your function so that it does work for both.</p>
<div class="highlight-python">

{{{id=4|
# edit here
///
}}}

</div>
</li>
</ol>
<div class="section" id="run-length-encoding">
<h3>Run-length encoding<a class="headerlink" href="#run-length-encoding" title="Permalink to this headline">¶</a></h3>
<p>The string</p>
<blockquote>
<tt class="docutils literal"><span class="pre">WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW</span></tt></blockquote>
<p>begins with <tt class="docutils literal"><span class="pre">W</span></tt> 12 times, then <tt class="docutils literal"><span class="pre">B</span></tt> once, then <tt class="docutils literal"><span class="pre">W</span></tt> 12 times, then <tt class="docutils literal"><span class="pre">B</span></tt> 3
times, then <tt class="docutils literal"><span class="pre">W</span></tt> 24 times, then <tt class="docutils literal"><span class="pre">B</span></tt> once and then <tt class="docutils literal"><span class="pre">W</span></tt> 14 times. Thus, it
can be encoded by the tuples:</p>
<div class="highlight-python"><div class="highlight"><pre>(W, 12), (B, 1), (W, 12), (B, 3), (W, 24), (B, 1), (W, 14)
</pre></div></div>
<p>This is called the  <em>run-length encoding</em> of the string.</p>
<p><strong>Exercises</strong></p>
<ol class="arabic">
<li><p class="first">Write a function that returns the run-length encoding of a string. Does your
function work for lists as well as strings? If not, then can you make it so
that it works for both strings and lists? Use your function to compute the
run-length encoding of the list:</p>
<p><tt class="docutils literal"><span class="pre">[0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0]</span></tt></p>
<div class="highlight-python">

{{{id=5|
# edit here
///
}}}

</div>
</li>
</ol>
</div>
<div class="section" id="rotations">
<h3>Rotations<a class="headerlink" href="#rotations" title="Permalink to this headline">¶</a></h3>
<p>The <em>rotations</em> of the string &#8216;bananas&#8217; are:</p>
<blockquote>
<div class="line-block">
<div class="line">bananas</div>
<div class="line">ananasb</div>
<div class="line">nanasba</div>
<div class="line">anasban</div>
<div class="line">nasbana</div>
<div class="line">asbanan</div>
<div class="line">sbanana</div>
</div>
</blockquote>
<p>and if we sort these alphabetically, then we get:</p>
<blockquote>
<div class="line-block">
<div class="line">ananasb</div>
<div class="line">anasban</div>
<div class="line">asbanan</div>
<div class="line">bananas</div>
<div class="line">nanasba</div>
<div class="line">nasbana</div>
<div class="line">sbanana</div>
</div>
</blockquote>
<p><strong>Exercises</strong></p>
<ol class="arabic">
<li><p class="first">Define a function  <tt class="docutils literal"><span class="pre">print_sorted_rotations</span></tt>  that sorts all the rotations
of a string and prints them in an array as above. Print the sorted rotations
of the strings &#8216;ananas&#8217;  and &#8216;cocomero&#8217;.</p>
<div class="highlight-python">

{{{id=6|
# edit here
///
}}}

</div>
</li>
</ol>
</div>
<div class="section" id="the-burrows-wheeler-transform">
<h3>The Burrows-Wheeler Transform<a class="headerlink" href="#the-burrows-wheeler-transform" title="Permalink to this headline">¶</a></h3>
<p>The  <em>Burrows-Wheeler Transform</em>  (BWT) of a string  <tt class="docutils literal"><span class="pre">s</span></tt>  sorts all the
rotations of  <tt class="docutils literal"><span class="pre">s</span></tt>  and then returns the last column.</p>
<p>For example, if we sort the rotations of &#8216;bananas&#8217;:</p>
<blockquote>
<div class="line-block">
<div class="line">ananasb</div>
<div class="line">anasban</div>
<div class="line">asbanan</div>
<div class="line">bananas</div>
<div class="line">nanasba</div>
<div class="line">nasbana</div>
<div class="line">sbanana</div>
</div>
</blockquote>
<p>then the last column is  <em>bnnsaaa</em> , so the BWT of  <em>bananas</em> is <em>bnnsaaa</em>.</p>
<p><strong>Exercises</strong></p>
<ol class="arabic">
<li><p class="first">Write a function that returns the BWT of a string. Compute the BWT of
<em>bananas</em> ,  <em>ananas</em>  and  <em>cocomero</em> . (<em>Hint:</em>  You can return you answer
as a list, but if you want to return a string, then you might want to use
the  <tt class="docutils literal"><span class="pre">join</span></tt>  method for strings.)</p>
<div class="highlight-python">

{{{id=7|
# edit here
///
}}}

</div>
</li>
<li><p class="first">Combine the functions you defined above to create an <tt class="docutils literal"><span class="pre">&#64;interact</span></tt> object
that takes a string  <tt class="docutils literal"><span class="pre">s</span></tt>  and prints:</p>
<ul class="simple">
<li>the sorted rotations of  <tt class="docutils literal"><span class="pre">s</span></tt></li>
<li>the run-length encoding of  <tt class="docutils literal"><span class="pre">s</span></tt></li>
<li>the BWT of  <tt class="docutils literal"><span class="pre">s</span></tt></li>
<li>the run-length encoding of the BWT of  <tt class="docutils literal"><span class="pre">s</span></tt></li>
</ul>
<p>(<em>Hint:</em>  String formatting can be done using the  <tt class="docutils literal"><span class="pre">%</span></tt>  operator. Here is
an example:</p>
<div class="highlight-python">

{{{id=8|
print &#39;The sum of %s and %s is %s.&#39; % (3,2,3+2)
///
The sum of 3 and 2 is 5.
}}}

</div>
<p>If you are familiar with  <em>C</em>  then you will notice that string formating
is very similar to  <em>C</em> &#8216;s  <tt class="docutils literal"><span class="pre">sprintf</span></tt>  statement.)</p>
<div class="highlight-python">

{{{id=9|
# edit here
///
}}}

</div>
</li>
<li><p class="first">Use your interact object to explore this transformation, and to answer
the following questions.</p>
<ol class="loweralpha simple">
<li>Compute the BWT of the following.<ul>
<li><tt class="docutils literal"><span class="pre">xxyxyxyxyxyxyxyxyxxyxyxyxyxyxyxyxyxy</span></tt></li>
<li><tt class="docutils literal"><span class="pre">01101001100101101001011001101001100101100110100101</span></tt></li>
<li><tt class="docutils literal"><span class="pre">cdccdcdccdccdcdccdcdccdccdcdccdccdcdccdcdccdccdcdc</span></tt></li>
</ul>
</li>
<li>Do you notice any patterns in the BWT of a string?</li>
<li>Can you think of an application for this transformation?</li>
<li>Find 3 other strings that have a &#8216;nice&#8217; image under the BWT.</li>
<li>Is the Burrows-Wheeler transformation invertible? (That is, can you find
two strings that have the same BWT?)</li>
</ol>
<div class="highlight-python">

{{{id=10|
# edit here
///
}}}

</div>
</li>
<li><p class="first">By comparing the BWT of a string with the first column of the array of
sorted rotations of a string  <tt class="docutils literal"><span class="pre">s</span></tt> , devise and implement an algorithm
that reconstructs the first column of the array from the BWT of  <tt class="docutils literal"><span class="pre">s</span></tt> .</p>
<div class="highlight-python">

{{{id=11|
# edit here
///
}}}

</div>
</li>
<li><p class="first">By examining the first  <em>two</em>  columns of the array, devise and implement an
algorithm that reconstructs the first  <em>two</em>  columns of the array from the
BWT of a string. ( <em>Hint:</em>  compare the last and first column with the first
two columns.)</p>
<div class="highlight-python">

{{{id=12|
# edit here
///
}}}

</div>
</li>
<li><p class="first">By examining the first  <em>three</em>  columns of the array, devise and implement
an algorithm that reconstructs the first  <em>three</em>  columns of the array from
the BWT of a string.</p>
<div class="highlight-python">

{{{id=13|
# edit here
///
}}}

</div>
</li>
<li><p class="first">Write a function that reconstructs the entire array of sorted rotations of a
string from the BWT of the string.</p>
<div class="highlight-python">

{{{id=14|
# edit here
///
}}}

</div>
</li>
<li><p class="first">A  <em>Lyndon word</em>  is a word w that comes first in alphabetical order among
all its rotations. Is the BWT invertible on Lyndon words?</p>
<div class="highlight-python">

{{{id=15|
# edit here
///
}}}

</div>
</li>
<li><p class="first">Explain how one can modify the BWT to make it invertible on arbitrary words.
Implement your modified transformation and the inverse transformation.</p>
<div class="highlight-python">

{{{id=16|
# edit here
///
}}}

</div>
</li>
</ol>
</div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
            <h3><a href="../index.html">Table Of Contents</a></h3>
            <ul>
<li><a class="reference internal" href="#">Strings and the Burrows-Wheeler Transform</a><ul>
<li><a class="reference internal" href="#operations-on-strings">Operations on strings</a><ul>
<li><a class="reference internal" href="#run-length-encoding">Run-length encoding</a></li>
<li><a class="reference internal" href="#rotations">Rotations</a></li>
<li><a class="reference internal" href="#the-burrows-wheeler-transform">The Burrows-Wheeler Transform</a></li>
</ul>
</li>
</ul>
</li>
</ul>

            <h3>This Page</h3>
            <ul class="this-page-menu">
              <li><a href="../_sources/siena_tutorials/Worksheet07-StringsAndTheBWT.txt" rel="nofollow">Show Source</a></li>
            </ul>
          <div id="searchbox" style="display: none">
            <h3>Quick search</h3>
              <p class="searchtip" style="font-size: 90%">
              Enter search terms or a module, class or function name.
              </p>
          </div>
          <script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index">index</a></li>
  
    
      <a href="../../index.html"><img src="../_static/sagelogo.png" style="vertical-align: middle" title="Sage Logo"></a>
    
  
  
        <li><a href="../index.html">Thematic Tutorials v4.7.rc0</a> &raquo;</li>
 
      </ul>
    </div>
    
    <div class="footer">
        &copy; Copyright 2005--2011, The Sage Development Team.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.4.
    </div>
    <script type="text/javascript">
/*global jQuery, window */
/* Sphinx sidebar toggle.  Putting this code at the end of the body
 * enables the toggle for the live, static, and offline docs.  Note:
 * sage.misc.html.math_parse() eats jQuery's dollar-sign shortcut. */
var jq = jQuery;  
jq(document).ready(function () {
    var bar, bod, bg, fg, key, tog, wid_old, wid_new, resize, get_state, set_state;
    bod = jq('div.bodywrapper');
    bar = jq('div.sphinxsidebar');
    tog = jq('<div class="sphinxsidebartoggle"></div>');
    
    /* Delayed resize helper.  Not perfect but good enough. */
    resize = function () {
        setTimeout(function () {
            tog.height(bod.height());
        }, 100);
    };
    jq(window).resize(function () {
        resize();
    });
    
    /* Setup and add the toggle. See Sphinx v0.5.1 default.css. */
    fg = jq('div.sphinxsidebar p a').css('color') || 'rgb(152, 219, 204)';
    bg = jq('div.document').css('background-color') || 'rgb(28, 78, 99)';
    wid_old = '230px';
    wid_new = '5px';
    tog.css('background-color', bg)
        .css('border-width', '0px')
        .css('border-right', wid_new + ' ridge ' + bg)
        .css('cursor', 'pointer')
        .css('position', 'absolute')
        .css('left', '-' + wid_new)
        .css('top', '0px')
        .css('width', wid_new);
    bod.css('position', 'relative');
    bod.prepend(tog);
    resize();
    
    /* Cookie helpers. */
    key = 'sphinxsidebar=';
    set_state = function (s) {
        var date = new Date();
        /* Expiry in 7 days. */
        date.setTime(date.getTime() + (7 * 24 * 3600 * 1000));
        document.cookie = key + encodeURIComponent(s) + '; expires=' +
            date.toUTCString() + '; path=/';
    };
    get_state = function () {
        var i, c, crumbs = document.cookie.split(';');
        for (i = 0; i < crumbs.length; i += 1) {
            c = crumbs[i].replace(/^\s+/, '');
            if (c.indexOf(key) === 0) {
                return decodeURIComponent(c.substring(key.length, c.length));
            }
        }
        return null;
    };
    
    /* Event handlers. */
    tog.mouseover(function (ev) {
        tog.css('border-right-color', fg);
    }).mouseout(function (ev) {
        tog.css('border-right-color', bg);
    }).click(function (ev) {
        if (bod.hasClass('wide')) {
            bod.removeClass('wide');
            bod.css('margin-left', wid_old);
            bar.css('width', wid_old);
            bar.show();
            set_state('visible');
        } else {
            set_state('hidden');
            bar.hide();
            bar.css('width', '0px');
            bod.css('margin-left', wid_new);
            bod.addClass('wide');
        }
        resize();
    });
    
    /* Hide the normally visible sidebar? */
    if (get_state() === 'hidden') {
        tog.trigger('click');
    } else {
        set_state('visible');
    }
});
    </script>