Brian Lloyd
Thursday, April 13, 2006
 
Python + .NET: Meta-model mashup
I'm happy to say it's been a pretty busy month in Python for .NET-land ;^)
There is now a 1.0 branch for the .NET 1.x compatible version of the bridge, and development for the .NET 2.x compatible version is proceeding on the trunk. The goals for the 1.x and 2.x finals are solid stability, support for the features of the respective runtime versions and code-compatibility with IronPython, which has been making great strides lately. Some improvements that have now landed on both branches:
  • Refactored import syntax: you can now use un-prefixed namespace names ("from System import *") instead of the old "magic CLR module" syntax ("from CLR.System..."). The old "CLR." syntax will remain supported until 3.0, but is now officially deprecated. This was the main compatibility problem with code targeted at IronPython (IP).
  • Method overload selection using [] syntax. You can now select a specific overload of a method using syntax like System.Console.WriteLine[System.String](...) or System.Console.WriteLine[System.Boolean](...) rather than relying on the default "find the best match" behavior for method invocation. For IP compatibility, both branches also support mapping builtin Python type names to the corresponding .NET types in this case (System.Console.WriteLine[str](...), System.Console.WriteLine[bool](...)).
  • A few leaks fixed and various obvious performance improvements.
The trunk (.NET 2.x compatible) also contains basic support for generic methods and generic types. There is still a little bit of work to do on this - I hope to be able to make an RC3 release of 1.0 and a beta of 2.0 quite soon.
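To make the [] overload selection from the list above concrete, here is a rough sketch in plain Python. It is not the bridge's actual implementation: the OverloadedMethod class, its PY_TO_CLR table and the write_line stand-in are hypothetical names invented for illustration, and the "overloads" are ordinary Python callables rather than real .NET methods.

```python
class OverloadedMethod:
    """Hypothetical stand-in for a .NET method that has several overloads."""

    # Assumed mapping of builtin Python types to .NET type names
    # (only the two pairs mentioned in the text are included).
    PY_TO_CLR = {str: "System.String", bool: "System.Boolean"}

    def __init__(self, name, overloads):
        self.name = name
        self.overloads = overloads  # dict: tuple of type names -> callable

    def __getitem__(self, key):
        # method[T] and method[T1, T2] both arrive here; normalize to a tuple.
        keys = key if isinstance(key, tuple) else (key,)
        signature = tuple(self.PY_TO_CLR.get(k, k) for k in keys)
        try:
            return self.overloads[signature]
        except KeyError:
            raise TypeError(
                f"no overload of {self.name} matches {signature}"
            ) from None


# Illustrative stand-in for System.Console.WriteLine with two overloads.
write_line = OverloadedMethod(
    "WriteLine",
    {
        ("System.String",): lambda value: f"String overload: {value}",
        ("System.Boolean",): lambda value: f"Boolean overload: {value}",
    },
)

print(write_line["System.String"]("spam"))
print(write_line[bool](True))  # builtin type name mapped to System.Boolean
```

The point of the sketch is only that subscripting returns a callable bound to one signature, so the usual best-match search is skipped.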

Implementing support for generics has posed some interesting issues. From the beginning of this project, I've really been pleasantly surprised and impressed with the symmetry of the meta-models of Python and the CLR (though the details are, of course, a bit different). Like Jim's anecdote about starting IronPython, I also started Python for .NET with the mindset of "how far can I go with this before I hit the inevitable brick wall?"

To my surprise (and pleasure), I kept finding that each time I hit an apparent wall it meant that I just wasn't being clever enough and I would eventually find a way to make things work, work well, work fast, and remain Pythonic (yes, I'm hell-bent for aesthetics).

But with 2.x generics, there appear to be some trade-offs to be made. This is the first must-support feature that doesn't have an obvious counterpart or perfect solution in the Python meta-model. Jim and his team came up with the nifty subscript syntax for binding generic types (Dictionary[string, int]) which works well in the common case, but which raises a bunch of issues for (not-so) edge cases.

The heart of the issue is that CLR uses name mangling to disambiguate generic types. In C#, when you say

"System.Collections.Generic.Dictionary<TKey, TValue>"

it is translated to the type name

"System.Collections.Generic.Dictionary`2"

in IL. This is a perfectly good and reasonable design for compiled CLR-targeted languages, but poses a problem for Python.

A concrete example in the BCL 2.x: there is a "System.Nullable<T>" generic type and a "System.Nullable" (non-generic) type in the same namespace.
When you say "from System import Nullable", which did you mean? The <> syntax obviously won't work in Python (nor can you import or reference the type by its mangled name, which is illegal as a Python name), and there aren't many tricks left to use to disambiguate here.
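A small sketch of the naming problem, in plain Python. The mangled_name helper is a hypothetical illustration of the "name plus backtick plus arity" convention described above, not a function from the bridge or from the CLR.

```python
def mangled_name(name, arity):
    """Return the metadata-style name for a type with `arity` generic parameters.

    Hypothetical helper: non-generic types (arity 0) keep their plain name.
    """
    return name if arity == 0 else f"{name}`{arity}"


generic = mangled_name("Nullable", 1)
plain = mangled_name("Nullable", 0)

# The mangled name contains a backtick, so it cannot be a Python identifier,
# while the plain name is shared by both types.
print(generic, generic.isidentifier())  # Nullable`1 False
print(plain, plain.isidentifier())      # Nullable True
```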

One approach would be to have whatever represents the name "Nullable" be willing to support both direct invocation (Nullable(), to instantiate the non-generic type) and the [] syntax (Nullable[int], to bind and return a type Nullable of Int32). This is what I have at the moment, but it still leaves some edge cases unresolved, if we assume that we'd like to disambiguate constructors in the same way that we disambiguate methods.
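The "one name, two behaviors" approach can be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: TypeName and its arguments are hypothetical, and strings stand in for the real .NET type and instance objects.

```python
class TypeName:
    """Hypothetical object bound to a name shared by a generic and a non-generic type."""

    def __init__(self, name, non_generic=None, generics=None):
        self.name = name
        self.non_generic = non_generic  # callable that builds the plain type
        self.generics = generics or {}  # arity -> factory taking type arguments

    def __call__(self, *args):
        # Name(...) instantiates the non-generic type, if there is one.
        if self.non_generic is None:
            raise TypeError(f"{self.name} has no non-generic form")
        return self.non_generic(*args)

    def __getitem__(self, type_args):
        # Name[T] binds the generic definition with matching arity.
        if not isinstance(type_args, tuple):
            type_args = (type_args,)
        try:
            factory = self.generics[len(type_args)]
        except KeyError:
            raise TypeError(
                f"{self.name} has no generic form with {len(type_args)} parameter(s)"
            ) from None
        return factory(*type_args)


# Strings stand in for the real instance and the closed generic type.
Nullable = TypeName(
    "Nullable",
    non_generic=lambda: "Nullable instance",
    generics={1: lambda t: f"Nullable`1[{t.__name__}]"},
)

print(Nullable())     # Nullable instance
print(Nullable[int])  # Nullable`1[int]
```

Note that the sketch already spends [] on generic binding, which is exactly why using the same syntax to pick a constructor overload becomes ambiguous.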

Example: say in the namespace "System.Spam" we have a non-generic class with two constructors (Spam(int) and Spam(string)). In the same namespace we also have a generic type "System.Spam`1" (System.Spam<T>). What does "System.Spam[int]" mean from Python?

It could mean either:
  • construct an instance of the non-generic "Spam" with an argument of type "int"
  • produce a closed generic type from the generic type definition "Spam<T>" with the type param "int"
Likewise, say that you have a (non-generic) type "System.Spam" that defines a (non-generic) method "SayHello(int)" and a generic method "SayHello<T>(T arg)". If you say from Python "object.SayHello[int](...)" - what did you mean? The generic, or the non-generic? The method implementations could be semantically different, but the [] syntax does not give us a way to clearly state what we want.

This is a pretty tough problem, since there are few other options (within existing Python syntax) that we can exploit, yet the [] syntax still leaves holes in the new meta-model mashup.

Update: Thanks to Seo for pointing out to me that this seems to be mostly resolved with IP beta 5 (I'd been using b4). So now the unadorned [] syntax is used only for binding generic types and methods, and a magic "__overloads__" attribute is used to select methods / ctors by signature.

From the IP workspace site:
    public class Foo {
        public void Bar(int arg) {}
        public void Bar<T>(int arg) {}
    }

We can call the non-generic version with any of:
    foo.Bar(1)
    foo.Bar.__overloads__[int](1)

And the generic one with any of:
    foo.Bar[str](1)
    foo.Bar.__overloads__[int][str](1)
    foo.Bar[str].__overloads__[int](1)
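The split between the two selectors can be modeled in plain Python along these lines. This is a sketch, not IronPython's implementation: MethodGroup, _SignatureSelector and the bar object are hypothetical, and the two Bar methods are replaced by lambdas that just report which one was reached.

```python
def _as_tuple(key):
    return key if isinstance(key, tuple) else (key,)


class MethodGroup:
    """Hypothetical model of a method name with plain and generic overloads."""

    def __init__(self, plain, generic, type_args=None, signature=None):
        self.plain = plain          # signature tuple -> callable(*args)
        self.generic = generic      # signature tuple -> callable(type_args, *args)
        self.type_args = type_args  # set once [] has bound generic arguments
        self.signature = signature  # set once __overloads__[] has picked one

    def __getitem__(self, type_args):
        # Unadorned [] only ever binds generic type arguments.
        return MethodGroup(self.plain, self.generic,
                           _as_tuple(type_args), self.signature)

    @property
    def __overloads__(self):
        # Signature selection lives behind a separate attribute.
        return _SignatureSelector(self)

    def __call__(self, *args):
        signature = self.signature or tuple(type(a) for a in args)
        table = self.plain if self.type_args is None else self.generic
        try:
            impl = table[signature]
        except KeyError:
            raise TypeError(f"no overload for {signature}") from None
        if self.type_args is None:
            return impl(*args)
        return impl(self.type_args, *args)


class _SignatureSelector:
    def __init__(self, group):
        self.group = group

    def __getitem__(self, signature):
        g = self.group
        return MethodGroup(g.plain, g.generic, g.type_args, _as_tuple(signature))


# Stand-in for foo.Bar from the C# class above.
bar = MethodGroup(
    plain={(int,): lambda arg: f"Bar({arg})"},
    generic={(int,): lambda targs, arg: f"Bar<{targs[0].__name__}>({arg})"},
)

print(bar(1))                          # Bar(1)
print(bar.__overloads__[int](1))       # Bar(1)
print(bar[str](1))                     # Bar<str>(1)
print(bar.__overloads__[int][str](1))  # Bar<str>(1)
print(bar[str].__overloads__[int](1))  # Bar<str>(1)
```

Because the two selections are kept as separate pieces of state, they can be applied in either order, which matches the last two call forms listed above.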






 
