Exploring Grok

Friday 22 January 2010

WyPy January 2010

This month it was my turn to give a talk at the West Yorkshire Python Usergroup. In attendance were Simon Davy, Bernie Czenkusz, Peter Russell, Ed Saxton and myself. The aim of the talk was to present the structure of the Zope Toolkit, its place and role in other frameworks, and to explain the technical details of the Zope Component Architecture in a balanced and fair way.

Overall I was happy with the talk: people seemed to already have a lot of knowledge about the various topics I was presenting, which made for some good questions and discussion. If I could criticise any of it, I'd say it was probably too long, weighing in at around 1hr 40mins, though perhaps that was because there were a lot of questions. I expected it to be about an hour in total!

To prepare the images for the slides I used Xara on Linux. It's very good for drawing and manipulating vector diagrams and ideal for illustrations. I put the slides together in Open Office Presentation and exported them as a PDF. This worked extremely well and I can highly recommend this approach for simple presentations such as mine. I've made them available here.

Looking forward to next month: Peter Russell committed to a talk on the XML and HTML parsing library 'lxml'. Bernie tried to pin some of us down for writing a Python Rag article for March, although I'm not totally sure what the outcome of that was. Simon pitched the idea of setting up a 'WyPy Blog', which we would all contribute to regularly. Perhaps Bernie could cherry-pick the best of the bunch for his Python Rag magazine?

Thursday 24 December 2009

Article in Python Rag

Next month's Python Rag contains an article written by me on the topic of namespaces. Knowing what namespaces are and where they are found helps you understand how the language works and explains how simple its concepts of functions, classes and modules are.
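
As a small illustration of that idea (a minimal sketch in modern Python 3, not an excerpt from the article), a module, a class and a running function each expose their namespace as a plain mapping of names. The `Circle` class and its `names_in_scope` method are made up for this example:

```python
import math


class Circle:
    sides = 0

    def names_in_scope(self, radius):
        pi = math.pi
        # locals() is the function's own namespace while it is running.
        return sorted(locals())


# A module's namespace is a dictionary of the names defined in it.
module_names = vars(math)

# A class body is a namespace too; filter out the double-underscore
# entries that Python adds automatically.
class_names = sorted(n for n in vars(Circle) if not n.startswith("__"))

local_names = Circle().names_in_scope(2)

print("pi" in module_names)
print(class_names)
print(local_names)
```

The three lookups use the same mechanism at different levels of nesting, which is what makes the containment picture of a Python program possible.
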
On balance I'm happy with the article; I've never written anything like this before and I think I've communicated the point. However, I've definitely identified areas that could be improved. For example:

The article doesn't flow. This is because I've tried to back each assertion I make with proof at the Python interpreter. As a result the narrative gets over-punctuated with example excerpts, and becomes difficult to read and hard to follow. Perhaps I'm too used to writing doctests! Instead, I think I should have dedicated a larger, more creative piece of narrative to each point, and offered bigger and fuller examples to back things up.

Not enough visuals. Many of us are visual thinkers, tending to draw pictures in our minds representing abstract concepts. One thing that I think is important about understanding namespaces is that they allow you to visualize Python programs as a sort of containment hierarchy. I really think I could have drawn some attractive images to help communicate my point a little better.

Not creative or imaginative. Although the article is technical in nature, it doesn't hurt to use a few metaphors here and there, or provide some historical context to certain features or throw in some personal reflection. Perhaps I was preoccupied with being technically accurate, so the result is a little dry.

Fortunately, next month (Feb), I'm writing a follow-up on Python scopes which ties in quite nicely with namespaces. This is the perfect opportunity to address these issues and write a better article!

Wednesday 16 December 2009

WyPy December 2009

Last week I attended the West Yorkshire Python User Group for the second time. There were two talks, which I'll give an overview of here.

The first, by Wavy (Simon) Davy, was a tour of his experience of Python networking frameworks, specifically Twisted. He contrasted thread-based against non-blocking asynchronous approaches, and demonstrated how the latter afforded significant improvements for a project he was working on.

The second, from Peter Russell, was pretty hardcore: he talked about Issue 4753 (I believe), http://bugs.python.org/issue4753. In summary, early versions of Python used a large switch statement to inspect each opcode in Python bytecode and dispatch to the code that processes it. He then showed how a recent reworking of the code allowed CPUs to do cleverer branch prediction in the execution pipeline!

I promised to provide a talk to the guys next week regarding Zope components, via Grok. Looking forward to it! :-)

Thursday 22 October 2009

Custom Doctest Syntax with Manuel

Python doctests allow you to execute code embedded within a single document. The special syntax '>>>' denotes Python statements that are to be executed within the interpreter, and '...' denotes continuation lines, such as the body of a compound statement. Using these tools, you can write anything that is available to you in a standard Python interpreter session:


>>> 2 + 2
4
>>> def func():
...    pass
>>>
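
To see the mechanics in action, here is a minimal sketch (modern Python 3, standard library only) that feeds a small document to the `doctest` module and runs the examples it finds. The `DOCUMENT` text and the `double` function are made up for illustration:

```python
import doctest

# A small document mixing prose with interpreter examples.
DOCUMENT = """
Adding two numbers:

    >>> 2 + 2
    4

A compound statement uses '...' for its body:

    >>> def double(n):
    ...     return n * 2
    >>> double(21)
    42
"""

# Extract the examples from the text, then execute them one by one,
# comparing the real output with the expected output in the document.
parser = doctest.DocTestParser()
test = parser.get_doctest(DOCUMENT, {}, "example", "<example>", 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, "failed out of", results.attempted)
```

All the examples share one dictionary of globals, which is why `double` is still defined when the last example calls it.
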

That said, there is no easy way to define a module in a Python doctest and have its namespace available in subsequent doctest interactions. Many packages (or maybe meta-packages??) themselves operate on modules; one good example is martian, which 'scans' them for configuration information.

The martian project's solution was to write the module's body within a class that inherited from 'FakeModule'. The FakeModule's metaclass then extracted the namespace out of your class, and pushed it into a module (updating all the __module__ variables on the way). A cut down version of this (from the martian README):


>>> class templating(FakeModule):
...
...   class InterpolationTemplate(Template):
...      "Use %(foo)s for dictionary interpolation."
...      extension('.txt')
...      def __init__(self, text):
...          self.text = text
...
...
...   # the registry, empty to start with
...   extension_handlers = {}
...
...   def render(data, extension, **kw):
...      # this hasn't changed
...      template = extension_handlers[extension](data)
...      return template.render(**kw)
>>> from martiantest.fake import templating

As you can see, this is quite ugly!
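
For the curious, the metaclass trick described above can be sketched roughly as follows. This is an illustration in modern Python 3 and not martian's actual code: the `fakepkg` package name, the `FakeModuleMeta` class and the `greetings` example are all hypothetical.

```python
import sys
import types

PACKAGE = "fakepkg"  # hypothetical package that will hold the fake modules


def _ensure_package(name):
    """Return the (possibly newly created) in-memory parent package."""
    package = sys.modules.get(name)
    if package is None:
        package = types.ModuleType(name)
        package.__path__ = []  # marks the module object as a package
        sys.modules[name] = package
    return package


class FakeModuleMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not bases:
            return  # this is the FakeModule base class itself
        fullname = PACKAGE + "." + name
        module = types.ModuleType(fullname)
        for key, value in namespace.items():
            if key.startswith("__") and key.endswith("__"):
                continue  # skip the bookkeeping entries of the class body
            try:
                # Make functions and classes report the fake module as home.
                value.__module__ = fullname
            except (AttributeError, TypeError):
                pass  # plain data such as dicts and strings has no __module__
            setattr(module, key, value)
        sys.modules[fullname] = module
        setattr(_ensure_package(PACKAGE), name, module)


class FakeModule(metaclass=FakeModuleMeta):
    pass


class greetings(FakeModule):
    default_name = "world"

    def greet(name):
        return "hello " + name


from fakepkg import greetings as greetings_module

print(greetings_module.greet(greetings_module.default_name))
print(greetings_module.greet.__module__)
```

One limitation of this sketch is that functions written in the class body still look up global names in the file that defined the class, so they cannot see each other the way functions in a real module would.
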

Fortunately, Benji York has written Manuel, a system for defining and executing your own custom doctest syntax. As a simple example, take a look at the footnote example:


Here we reference a footnote. [1]_

    >>> x
    42

.. [1] This is a test footnote definition.

    >>> x = 42

So this is a doctest with executable footnotes! Manuel scans the document for footnotes defined with syntax looking like:

'[?]_'

and then executes the matching footnote's body! Check out the Manuel page for details of other extensions and syntax.

Fake Modules

Using this system, I've written an extension to Manuel that recognises module blocks and makes them available in the rest of your doctests. We use the 'module-block' directive to define our modules. Take a look at the doctest example:


.. module-block:: some_module
     import pprint

     a = ['10', '20', '30']

     def some_function(bar):
        pprint.pprint(bar)

     def get_my_module():
        print get_my_module.__module__

An unindented line like this marks the end of the module 
definition. We can, of course, mix classic doctest tests 
in with our module-blocks. Let's test that our module 
was set up correctly:

  >>> some_module.some_function(some_module.a)
  ['10', '20', '30']

We'll have a poke around and see what's in `some_module`:

  >>> dir(some_module)
  ['__builtins__', '__doc__', '__file__', '__name__', 
'a', 'get_my_module', 'pprint', 'some_function']

Modules must have a file location, this is just a dummy/
unique namespace::

  >>> print some_module.__file__
  /manueltest/fake/some_module

Are functions aware of their module?::

  >>> print some_module.some_function.__module__
  manueltest.fake.some_module

Externally, yes. What about from within the module::

  >>> some_module.get_my_module()
  manueltest.fake.some_module

As you can see, this is a much cleaner and more readable way of defining modules within your doctests!

I intend to release this as a separate plugin package 'manuelpi.fake_module'.
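
The core idea behind a module block can be sketched without Manuel at all. What follows is a rough illustration in modern Python 3, not the extension itself: the `load_module_blocks` helper is hypothetical, and the `manueltest.fake` prefix is borrowed from the output shown above.

```python
import re
import textwrap
import types

BLOCK_START = re.compile(r"^\.\. module-block:: (?P<name>\w+)\s*$")


def load_module_blocks(document, package="manueltest.fake"):
    """Build one in-memory module per module-block found in the document."""
    modules = {}
    lines = document.splitlines()
    i = 0
    while i < len(lines):
        match = BLOCK_START.match(lines[i])
        i += 1
        if not match:
            continue
        # Collect the indented (or blank) lines that form the module body;
        # the first unindented line ends the block.
        body = []
        while i < len(lines) and (not lines[i].strip() or lines[i][0].isspace()):
            body.append(lines[i])
            i += 1
        fullname = package + "." + match.group("name")
        module = types.ModuleType(fullname)
        module.__file__ = "/" + fullname.replace(".", "/")  # dummy location
        # Run the body with the module's own dictionary as its globals.
        exec(textwrap.dedent("\n".join(body)), module.__dict__)
        modules[match.group("name")] = module
    return modules


DOCUMENT = """\
.. module-block:: some_module
     a = ['10', '20', '30']

     def get_my_module():
         return get_my_module.__module__

An unindented line like this marks the end of the module definition.
"""

some_module = load_module_blocks(DOCUMENT)["some_module"]
print(some_module.a)
print(some_module.get_my_module())
print(some_module.__file__)
```

Because the body is executed with the module's dictionary as its globals, functions defined in it pick up the module's name automatically, which is why they know which module they belong to.
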

Thursday 8 October 2009

Grok 1.0 Released!!

Grok 1.0 has been released today!

Over the past 9 months I've been watching the development of Grok progress, and it has been a great learning experience. I've only been able to contribute a single module and some documentation, but I've written quite a bit of code on top of it, and have been mightily impressed with its power and versatility.

Hopefully over the next year I'll have more time to dedicate to it....

Tuesday 22 September 2009

Generator for distributing calls

At work I was confronted with the problem of 'How to distribute calls between 2 call centres in a fair and even way according to some yet-to-be-determined percentage split ratio?' Taking a 30% split of 100 calls, we cannot simply send the first 30 to one call centre and the remaining 70 to the other. The calls must be distributed as evenly as possible throughout; for example, a 30% split of 10 calls would give [1,2,1,1,2,1,1,2,1,1] and not [1,1,1,1,1,1,1,2,2,2]!

When thinking about these kinds of problems, it helps to abstract away from the context of calls and call centres and think about the underlying problem: distribution over a discrete choice space. From here, we might find a solution in another domain that solves our problem...

It turns out that the problem of call distribution distils to the same problem that Bresenham tackled when trying to draw a line on a computer screen matrix. Imagine that you have a graph of cells 100 pixels on the X axis and 100 pixels on the Y axis. Now imagine you are using the line drawing algorithm to paint a line from the X/Y origin to, say, the coordinates '100,30'.

The line drawing algorithm will, after 100 iterations, have spread 30 steps in the Y direction evenly among the 100 steps in the X direction (it always moves towards X). This is exactly the same as sending 30 calls to one call centre and the rest to the other, interleaved as evenly as possible.

To model this, it makes sense to write a Python generator that 'yields' the correct call centre when asked:


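class CallDistributor(object):
    """Yield call centre 1 or 2; roughly `split` percent of calls go to centre 2."""

    def __init__(self, split=50):
        # Validate before storing: split is a percentage, 0 < split <= 100.
        if split > 100 or split <= 0:
            raise ValueError("split must be greater than 0 and at most 100")
        self.split = split

    def _execute(self):
        # Bresenham-style error accumulator.
        error = 0.0
        delta_err = self.split / 100.0
        # The very first call always goes to call centre 1.
        yield 1
        while True:
            error += delta_err
            if abs(error) >= 0.5:
                yield 2
                error -= 1.0
            else:
                yield 1

    def __iter__(self):
        # A generator is already an iterator, so no extra iter() is needed.
        return self._execute()


if __name__ == "__main__":
    dist1 = CallDistributor(split=50)
    an_iter = iter(dist1)
    for i in range(10):
        # print() and next() work on Python 2.6+ as well as Python 3.
        print(next(an_iter))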

If I get the chance I'd like to expand this to take a 'number of call centres' parameter, rather than having it restricted to 2.
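
One possible direction for that, sketched here in modern Python 3, is to swap the Bresenham accumulator for smooth weighted round-robin, a different technique that handles any number of destinations. The `distribute_calls` function and its `weights` mapping are hypothetical names for this illustration, and I have not tried this against real call traffic.

```python
from collections import Counter
from itertools import islice


def distribute_calls(weights):
    """Yield call-centre ids forever, interleaved in proportion to weights.

    `weights` maps a call-centre id to its positive share of the calls.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty mapping of positive numbers")
    total = sum(weights.values())
    credit = dict.fromkeys(weights, 0)
    while True:
        # Every centre earns its weight in credit for each incoming call...
        for centre, weight in weights.items():
            credit[centre] += weight
        # ...and the centre with the most credit takes the call and pays
        # back the total, which keeps the long-run proportions exact.
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        yield chosen


calls = list(islice(distribute_calls({1: 70, 2: 30}), 100))
counts = Counter(calls)
print(counts[1], counts[2])
```

With integer weights the credits return to zero after `total` calls, so the pattern repeats and each centre receives exactly its share within every cycle.
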

SqlSoup is cool

I've been playing with our internal wiki's database, and with SqlSoup. I've found that SqlSoup did exactly what I wanted it to do: read a database, infer its table layout, create classes that map attributes to those table columns, and allow me to manipulate instances of those classes before writing the modifications back.

This code prints out all the rows in the 'People' table:


import pprint

from sqlalchemy.ext.sqlsoup import SqlSoup

db = SqlSoup("mysql://root@localhost/movements")
people = db.People.all()
people.sort()
pprint.pprint(people)

How simple is that! :-)