Friday, October 07, 2005

My Take on Python Concurrency

The mailing lists and blogs have been abuzz lately with discussions, arguments, and good and bad ideas about how to deal with concurrency in Python. There is a general distaste for vanilla threads, due to the difficulty of managing shared resources. Generators are an increasingly popular option, but they are not pre-emptive, if you are into that sort of thing. And, of course, the Global Interpreter Lock is bashed and many pray for its removal, but that will only happen if a real concurrency mechanism is added not just as some library, but into the language itself.

Now, off the bat I'll say that I don't hate threads, but I really wish there were a better alternative. I do want to see the GIL go away. Finally, I don't like having to explicitly suspend my task every so often to let something else run for a while. This is tedious and does not always reflect the nature of the task well.

Active Objects seem like the most Pythonic approach, and like they could work well with existing, non-concurrent code. With Active Objects, you have objects designed to work concurrently. Calls to their methods are typically queued so that only one method of a given active object executes at a time. Thus, the active object does not need to synchronize access to its internal state. An extended loop in a method could get greedy, but that is a small matter.
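
To make that concrete, here is a minimal sketch of the pattern in modern Python. The ActiveObject base class, the send method, and the Counter example are my own illustration, not an established API: each object owns a queue and a private worker thread, and method calls are enqueued rather than invoked directly.

    import queue
    import threading

    class ActiveObject:
        """Sketch of an active object: calls are queued and executed
        one at a time on a private worker thread, so the object's
        internal state never needs explicit locking."""

        def __init__(self):
            self._calls = queue.Queue()
            worker = threading.Thread(target=self._run, daemon=True)
            worker.start()

        def _run(self):
            while True:
                func, args, kwargs = self._calls.get()
                func(*args, **kwargs)   # one call at a time, in order
                self._calls.task_done()

        def send(self, func, *args, **kwargs):
            """Enqueue a call instead of invoking it directly."""
            self._calls.put((func, args, kwargs))

        def wait(self):
            """Block until every queued call has finished."""
            self._calls.join()

    class Counter(ActiveObject):
        def __init__(self):
            super().__init__()
            self.value = 0      # touched only by the worker thread

        def increment(self, amount=1):
            self.value += amount

    c = Counter()
    for _ in range(1000):
        c.send(c.increment)     # safe to do from any thread
    c.wait()
    print(c.value)              # 1000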

The biggest issue is dealing with shared objects, which happens constantly in Python's pass-everything-by-reference world. Methods can be crafted to be safer, such as by taking only basic types (integers and strings) and containers of those or of other containers (lists of strings, tuples of dictionaries mapping strings to booleans, etc.). But, inevitably, you will have to pass something more to a method.
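
One way to enforce that restriction, sketched here with a hypothetical is_safe helper of my own naming, is to check arguments recursively at the call boundary and reject anything that is not a basic type or a container of basic types:

    def is_safe(value):
        """True if value is a basic immutable type, or a container
        (list, tuple, set, dict) holding only safe values."""
        if isinstance(value, (int, float, str, bytes, bool, type(None))):
            return True
        if isinstance(value, (list, tuple, set, frozenset)):
            return all(is_safe(item) for item in value)
        if isinstance(value, dict):
            return all(is_safe(k) and is_safe(v) for k, v in value.items())
        return False

    assert is_safe(["a", "b", ({"x": True},)])   # containers of basics: fine
    assert not is_safe([open])                   # arbitrary objects: rejected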

One approach could be to copy objects before passing them, even deep-copying them all. Another would be to create special shallow objects that represent the state of the original without copying all of it. But perhaps the best choice is to simply use more active objects. If we pass one active object to the method of another, and access all data through their "active methods", then synchronization is generally assured. Perhaps it would also be beneficial to define a common locking API for all active objects.
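
Building on the ActiveObject sketch above, one way to combine the copying and active-object ideas (again, only an illustrative sketch with hypothetical names) is to deep-copy every argument at the call boundary except other active objects, which are already safe to share because their own queues serialize access:

    import copy

    class IsolatingActiveObject(ActiveObject):
        """Variant of the earlier sketch: arguments are deep-copied at
        the call boundary, except other active objects, which stay
        shared because their queues already serialize access to them."""

        def send(self, func, *args, **kwargs):
            args = tuple(_isolate(a) for a in args)
            kwargs = {k: _isolate(v) for k, v in kwargs.items()}
            super().send(func, *args, **kwargs)

    def _isolate(value):
        if isinstance(value, ActiveObject):
            return value                # shared safely via its own queue
        return copy.deepcopy(value)     # everything else gets a private copy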

Python is a wonderful, dynamic language, but there is some speculation that the very dynamic rules that make Python great are also what make it so difficult to settle on a good concurrency mechanism and to allow safe, concurrent code. Many Pythoneers scoff at the idea of true protected and private attributes, but maybe that's just what we need.

The ability to modify everything may be useful, but maybe it should be controlled a little more. We should be able to execute code in the context of protections on certain objects, like read-only versions of modules or write-locked lists. This breaks down when you realize you can just call object.__getattribute__ directly, but I have a possible solution: a protected class. It would be a new built-in class, and anything inheriting from it would be given special treatment: object's methods would honor its access rules (properties would be immutable except through the functions defined for them, for example) even when using object's methods directly or other introspection mechanisms. Combine this with crafted globals and built-ins for 'jailed' code, and you could have a successful way to execute multiple lines of code in parallel without worrying that they will step on each others' toes.
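
No such protected built-in exists today, but a rough userland approximation of a read-only view, using a proxy class I'll call ReadOnly, shows the flavor of the idea, and also its weakness: anyone holding a direct reference to the wrapped object can still mutate it, which is exactly why interpreter-level support would be needed.

    class ReadOnly:
        """Proxy exposing an object's attributes while rejecting writes.
        Only an approximation: code holding the original object, or
        willing to dig it out of the proxy, can still mutate it."""

        def __init__(self, target):
            object.__setattr__(self, '_target', target)

        def __getattr__(self, name):
            return getattr(object.__getattribute__(self, '_target'), name)

        def __setattr__(self, name, value):
            raise AttributeError('read-only view: cannot set %r' % name)

        def __delattr__(self, name):
            raise AttributeError('read-only view: cannot delete %r' % name)

    import math
    frozen = ReadOnly(math)
    print(frozen.pi)        # reads pass through to the module
    # frozen.pi = 3.0       # raises AttributeError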
