Planet Parrot

March 10, 2010


Andrew Whitworth: GSoC Idea: GMP Bindings

This conversation happened yesterday on IRC, with some off-topic things edited out:

darbelo: That reminds me. I hate our bignums and want them to die...
whiteknight: darbelo: I agree with the bignums thing 100%. I want bignums out of the repo and moved to their own project
whiteknight: There's no sense keeping them when I suspect a majority of users can't use them because they don't have GMP installed
darbelo: Actually, I wouldn't mind them being in the core if they weren't dependant on a lib I don't have.
darbelo: I actually had started to write a stand-alone BigInteger PMC after last year's SoC.
whiteknight: that would make an awesome project too.
whiteknight: I think we should have lots of projects like that, and for developers to be able to pick which solution they want
whiteknight: as we are now, it's easier to force BigInt to pretend to do what we need instead of just using the best solution, which might be DecNumber or something else
darbelo: Maybe, but GMP is much, much more than just bignums. It's a pretty big library.
darbelo: Our PMCs don't even start to scratch the surface of what GMP can do.
whiteknight: darbelo: so moving those PMCs out to a separate library and adding wrappers for other functionality might be nice
darbelo: I would consider a GMP binding much more valuble to parrot than our current use of the lib, yes.
bubaflub: last year i worked a bit with GMP library and suggested to dukeleto we work on a GMP binding for parrot
bubaflub: last year with GSOC and perl 5
darbelo: bubaflub: That would be nice to have.
bubaflub: though we used an existing perl 5 binding (Math::GMPz)
whiteknight: yes, that would be a wonderful project
bubaflub: but we could nab the test suite and what not
whiteknight: exactly. We have the two PMC types, and we could write wrappers for the rest of the library and get all sorts of additional power
bubaflub: i think access to the GMP library in general would be nice; the stuff i worked on last year was setting some foundational stuff for cryptography libraries

Parrot has two PMC types that wrap GMP: BigInt and BigNum. I think, and apparently a few people agree, that these two types have no business being in the core Parrot repository and should be moved to another project. The immediate benefit of this would be that the bindings for GMP could be improved and expanded independently, instead of only providing what little functionality Parrot actually makes direct use of.

A good GSoC project for this year would be to move (or fork) the current BigInt and BigNum PMC types to a new project and use them as the cornerstone for writing a more comprehensive interface for the GMP library. This could include other PMC types, NCI function wrappers, PMC methods, ops, and other things to allow access to the power of the GMP library. Adding custom Integer-like and Float-like PMCs that autopromote to their Big- counterparts would be nice too.

For more info about this project, you could probably get in touch with me, darbelo, or bubaflub.

by Whiteknight (noreply@blogger.com) at March 10, 2010 16:00 UTC

March 09, 2010


Andrew Whitworth: Weekend Hackathon

This weekend we were supposed to have a hackathon to get the new PCC branch up and running. The purpose of this branch is, as I have discussed before, to rearrange the call sequence so return values are processed after the function invocation instead of before. In the grand scheme of things, especially in comparison to the previous PCC refactors, this ends up being a minor change characterized mostly by massive code deletions instead of needing to write huge new functions or rewrite tons of existing functions. A few bugs stymied completion of the branch, but I have high hopes that the remaining bugs will get worked out soon. This branch was worked on primarily by allison, though I lent an eye as time permitted and chromatic lent some major debugging support as well.

Very few people ended up working on the PCC branch, even though that was the "official" target of the hackathon. A large amount of effort instead went to work on other branches. I'm certainly not complaining about the division of effort. In fact, I want to celebrate it. I'm extremely happy to see other worthwhile projects getting extra manhours devoted to them. It's very good to get people working on Parrot in any capacity, and as I mentioned above the PCC work was not a huge project that would have required a dozen developers focusing on it anyway.

cotto and bacek focused their considerable talents on the ops_pct branch, which aims to replace the Perl5-based ops parser with a bootstrapped version written with PCT. I'm not sure about the exact status of that branch, but there was a huge flurry of commits and I have to believe things are progressing rapidly.

plobsing started a new branch to tackle ticket #1015. Using some of the new mechanisms he's developed to find and prevent cycles in the freeze/thaw code, he decided to try and fix the problems with cycles in deep clones as well. I don't know the current status, but last I saw his work was going well.

Coke has started a new branch to continue the makefile cleanups, this time focusing on the recursive makefile for the dynops. He seems to be running into some bugs this morning, but hopefully nothing that cannot be quickly overcome.

Overall I would label the hackathon a great success. A lot of people came out to IRC to follow progress and work on various projects, and it is all much-appreciated.

by Whiteknight (noreply@blogger.com) at March 09, 2010 12:01 UTC


Andrew Whitworth: GSoC: Parrot in Summer 2010

Jonathan Leto sent out a great email to the list today about Parrot's involvement in GSoC this year. Parrot will be combining with The Perl Foundation again and entering as a single organization. I very much like this arrangement, under the blind assumption that we do better together in terms of student allotment than we do apart. I have no reason to doubt that.

Mentors: If you want to sign up to be a potential mentor, you can do it on the Perl foundation wiki.

Project Ideas: If you have any project ideas (I know I do!), list them on the Perl foundation wiki. If you tell me about the ideas as well, I'll feature them in a blog post and hopefully drum up some interest among prospective students.

by Whiteknight (noreply@blogger.com) at March 09, 2010 09:10 UTC


Jonathan Leto: Google Summer of Code 2010

I am working on the application for The Perl Foundation and Parrot
Foundation to participate in Google Summer of Code 2010. GSoC is a
program where Google funds eligible students to hack on open source
projects for a summer. It is a great opportunity for the students and
the communities that mentor them. You also may be interested in this
summary of our involvement last year. Our application will be
submitted by the end of this week.

Please join us in getting prepared for this year. There is a page for
possible mentors to volunteer as well as a page for project ideas.
If you would like to help with the wiki, our main GSoC page is the
best place to start. You are also invited to join our mailing list
and come ask questions in #soc-help on irc.perl.org.

by Jonathan Leto at March 09, 2010 07:24 UTC

March 02, 2010


Andrew Whitworth: Difference between PMCs and Objects

There has been lots of talk and activity lately that has to deal with Parrot Objects. My rant about exceptions in Parrot has incited Tene to begin a flurry of development on that system, and Austin's Kakapo project has been regularly pushing the boundaries of what kinds of operations are and should be possible (and finding lots of bugs along the way!). Other people have been bringing up the topic as well, and lots of people are asking lots of questions about the implementation. I'm going to use this post to explain a bit about how Objects and PMCs work in Parrot, and maybe later I'll devote a post or two to ideas for fixing this system.

PMCs are basically objects, though extremely simple, flexible, and low-level. PMCs are interacted with, primarily, through the VTABLE interface. VTABLEs in Parrot are long lists of C function pointers that implement various behaviors. Calling the in-place addition VTABLE, add_i, is done like this in C:

VTABLE_add_i(interp, pmc, 5);

...Which translates to this:

pmc->vtable->add_i(interp, pmc, 5);

By pointing to a per-type VTABLE structure, PMCs with the same type can access a common list of function behaviors without overlapping or needing to do expensive switch/cases over a list of direct function calls. Likewise, determining the type of a PMC means finding the type of the VTABLE it points to:

pmc->vtable->base_type; // type number
pmc->vtable->whoami; // type name (Parrot STRING)
pmc->vtable->class; // Class or PMCProxy PMC for the type

Also, if we have the type number, we can look up the particular VTABLE in an array:

VTABLE * tbl = interp->vtable[index];

In a sense, that's all there is to a PMC. All interactions with a PMC happen through this interface of about 185 function pointers. A PMC, by itself, doesn't have things that we would normally associate with "objects" in higher-level systems: Attributes and Methods. Sure, PMCs do have a way to associate a C structure, and therefore maintain a list of what we call "attributes", but those aren't directly accessible from PIR without adding some kind of lookup routine to find them and maybe wrap them into one of the Parrot register types (INTVAL, FLOATVAL, STRING, PMC). PMCs also appear to have methods, but this really isn't the case when you look at it closely.
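The dispatch pattern described above can be modeled in a few lines of plain C. This is only a toy sketch and not Parrot's real definitions: ToyPMC, ToyVTable, the TOY_VTABLE_* macros, and the two example types are all hypothetical names invented for illustration. It shows a per-type table of function pointers, shared by every instance of the type, with a macro that expands to the pointer-chasing call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, NOT Parrot's actual structures. */
typedef struct ToyPMC ToyPMC;

typedef struct ToyVTable {
    int         base_type;                       /* type number         */
    const char *whoami;                          /* type name           */
    void      (*add_i)(ToyPMC *self, long n);    /* in-place addition   */
    long      (*get_integer)(ToyPMC *self);      /* read the value back */
} ToyVTable;

struct ToyPMC {
    const ToyVTable *vtable;   /* shared by every PMC of the same type */
    long             value;    /* per-instance data                    */
};

static void int_add_i(ToyPMC *self, long n)   { self->value += n; }
static long int_get_integer(ToyPMC *self)     { return self->value; }

/* A second "type" with different in-place add behavior: caps at 100. */
static void clamp_add_i(ToyPMC *self, long n)
{
    self->value += n;
    if (self->value > 100)
        self->value = 100;
}

enum { TOY_INTEGER = 0, TOY_CLAMPED = 1, TOY_TYPE_COUNT };

/* The type number indexes straight into the array of vtables. */
static const ToyVTable toy_vtables[TOY_TYPE_COUNT] = {
    { TOY_INTEGER, "ToyInteger", int_add_i,   int_get_integer },
    { TOY_CLAMPED, "ToyClamped", clamp_add_i, int_get_integer },
};

/* The macros hide the pointer chasing, as VTABLE_add_i does above. */
#define TOY_VTABLE_add_i(pmc, n)     ((pmc)->vtable->add_i((pmc), (n)))
#define TOY_VTABLE_get_integer(pmc)  ((pmc)->vtable->get_integer((pmc)))

static ToyPMC toy_pmc_new(int type, long initial)
{
    assert(type >= 0 && type < TOY_TYPE_COUNT);
    ToyPMC pmc = { &toy_vtables[type], initial };
    return pmc;
}
```

The caller never switches on the type: TOY_VTABLE_add_i on a ToyInteger and on a ToyClamped runs different functions purely because the two values point at different tables.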

As I described in a previous post, the long way to invoke a method on a PMC is like this:

$P0 = new ['Foo']
$P1 = find_method $P0, "bar"
callmethodcc $P0, $P1

The find_method opcode is a thin wrapper around the VTABLE_find_method interface function. If I translate this to an extremely condensed and wildly inaccurate pseudo-C listing, we get:

PMC * p0 = Parrot_pmc_new(interp, type_Foo);
PMC * p1 = VTABLE_find_method(interp, p0, "bar");
setup_method_call(interp, p0);
VTABLE_invoke(interp, p1);

This is obviously an extremely inaccurate listing, but should do well to illustrate my point. The method is actually a separate PMC type. It can be either a Sub (a .sub written in PIR) or an NCI (a wrapper type around a C function call). To make the call we set up the argument list (the invocant, $P0, is treated sort of like an argument but is kept distinct) and then invoke the method.

Before they are invoked, methods are stored inside either a Class or PMCProxy PMC associated with that type. When we call VTABLE_find_method(interp, p0, "bar"), we go through this machination:

PMC * class = pmc->vtable->class;
PMC * methods = class->data->methods;
PMC * method = VTABLE_get_pmc_keyed_str(interp, methods, "bar");

What we think of as an "object" and a "class" is actually a small collection of interoperating PMCs. The PMC itself contains a long list of VTABLEs and a small amount of data stored in a C structure, which cannot be directly accessed from PIR code. The PMCProxy PMC (like Class, which I will describe later, but designed to work with PMC types written in C) contains a hash of methods and a variety of other data. Methods themselves are their own PMCs, complete with their own type data. To really blow your mind consider that, as a PMC, you can call a method on a method, or even a method on a method on a method.
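To make the "collection of interoperating pieces" idea concrete, here is a small self-contained C sketch. All names in it (ToyObj, ToyClass, ToyMethod, toy_find_method, toy_call_method) are hypothetical and the method table is a plain array rather than a Parrot Hash; it only illustrates the shape of the lookup: the instance points at its class, the class owns the method table, and the method is a separate value that is invoked with the invocant passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not Parrot's Class/PMCProxy/Sub. */
typedef struct ToyObj ToyObj;
typedef long (*ToyMethodBody)(ToyObj *invocant);

/* A method is its own value, separate from both object and class. */
typedef struct ToyMethod {
    const char   *name;
    ToyMethodBody body;
} ToyMethod;

/* The class-like metaobject owns the method table. */
typedef struct ToyClass {
    const char      *name;
    const ToyMethod *methods;
    size_t           method_count;
} ToyClass;

/* The instance holds only its data and a pointer to its class. */
struct ToyObj {
    const ToyClass *klass;
    long            value;
};

/* Step 1: find the method by name in the invocant's class. */
static const ToyMethod *toy_find_method(const ToyObj *obj, const char *name)
{
    assert(obj != NULL && name != NULL);
    const ToyClass *klass = obj->klass;
    for (size_t i = 0; i < klass->method_count; i++) {
        if (strcmp(klass->methods[i].name, name) == 0)
            return &klass->methods[i];
    }
    return NULL;   /* no such method */
}

/* Step 2: invoke the method, passing the invocant explicitly. */
static long toy_call_method(ToyObj *invocant, const ToyMethod *method)
{
    assert(method != NULL);
    return method->body(invocant);
}

static long foo_bar(ToyObj *self)    { return self->value * 2; }
static long foo_value(ToyObj *self)  { return self->value; }

static const ToyMethod foo_methods[] = {
    { "bar",   foo_bar   },
    { "value", foo_value },
};
static const ToyClass foo_class = { "Foo", foo_methods, 2 };
```

With a ToyObj whose klass is &foo_class, calling toy_find_method(&obj, "bar") followed by toy_call_method(&obj, method) mirrors the find_method / callmethodcc pair in the PIR listing.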

In short, a PMC is sort of like the building block that is used to create objects and a type system, though the PMCs themselves are not what we normally think of as "objects". The only way to interact with a PMC is through VTABLEs, not attributes or methods. Luckily, VTABLEs exist that allow us to query the object for related attributes and methods, though the PMC itself may not necessarily respond to these requests.

Using PMCs, Parrot does provide a proper Object system through the use of two special PMC types: Object and Class. Class, as can be guessed, is a "metaobject" that defines type information for objects of a single type. The Class uses a series of PMCs internally to manage things like method PMCs and attributes. The Object PMC is the basic building block of a class instance object. It provides a series of default vtables that allow it to interact with Class the way we expect (to find methods that are stored in the class reliably, for instance) and to provide a set of attributes that are available for access from PIR. PMCs are the almost formless building blocks; Object is a very specific PMC type that provides the behaviors we expect from an OO type system.

Now that we've covered basic definitions, what are the big operational differences between the two systems? Here's a short list:

  1. Object types are defined by Class PMCs. PMCs are defined by PMCProxy PMCs
  2. Class PMCs are created whenever we do a "newclass" or "subclass" operation from PIR. PMCProxy PMCs are created lazily, only when we actually need to introspect a built-in PMC type.
  3. Objects must be created from a Class, which means the Class PMC must exist before any Objects of that type can be created. PMCs can be created by themselves and generally don't require instantiation from another PMC.
  4. Objects have very regimented behavior: You can (and should) expect certain things when you access a named attribute or named Method. In a PMC these behaviors may be overridden to do different and unexpected things. Specifically, it can be very difficult to get access to named attributes on a PMC unless they are explicitly made visible from PIR (which can be a lot of work, and not a lot of PMC types do it completely)
  5. Inheritance between PMCs happens at the C level, so C-level attribute structures are merged together and made visible from C code. Inheritance between objects happens at the PIR level, method and attribute lists are combined and made visible as expected when accessed from PIR code. Inheritance from a PMC to an object is almost always broken, if you expect the attributes and methods from the PMC to magically become visible as attributes and methods on the Object. I've never seen inheritance from an Object to a PMC subclass, but I suspect it is broken even worse.
  6. The VTABLEs in the Object PMC all provide an option to use a PIR-based override routine to implement the behavior. To do this, every VTABLE function in the Object PMC searches the associated Class for a similarly named VTABLE Sub PMC and, if one is found, calls that. PMC types almost never search for an override in the Proxy, and if you define one it will never be called (unless you specifically implement the logic to search for and execute it). On a related note the VTABLEs of an Object, because they are stored as PMCs in a Hash in the Class, can be modified at runtime. The VTABLEs of a PMC cannot be (well, I guess you could change the pointer to call a different function if your C-fu is strong, but I would prepare for fire and brimstone. Also, I won't fix any "bugs" that arise from this misguided behavior).

I estimate at least 10% of reported bugs or feature requests in Parrot come from the "this sucks worse than I would expect" behavior of subclassing Objects from PMCs. If you can get away with it, it is almost always better to delegate to a built-in type instead of inheriting from it directly. But, I can talk more about problems and workaround solutions like this in another post.

So there you have a guide to the differences between Objects and PMCs. PMCs are the low-level building blocks of an object system, and Objects are combinations of several PMCs and a large number of default VTABLEs to implement an expected set of OO behaviors. In a sense, Objects are PMCs, but in another sense they really aren't.

by Whiteknight (noreply@blogger.com) at March 02, 2010 08:00 UTC


chromatic: Perl 6 Design Minutes for 24 February 2010

The Perl 6 design team met by phone on 24 February 2010. Larry, Allison, Patrick, and chromatic attended.

Larry:

  • my work last week was almost entirely responsive to various discussions on irc and p6l, even when it doesn't seem like it
  • clarified that LEAVE-style phasers do not trip till after an exception is handled (and not resumed)
  • the implementation of take is specifically before unwinding even if implemented with a control exception
  • simplified series operator by moving generator function to the left side (any function on right side will now be a limiting conditional)
  • a * is no longer required to intuit the series on the left; the absence of generator before the ... operator is sufficient
  • first argument on the right of ... is now always a limiter argument
  • for convenience and consistency, added a new ...^ form to exclude a literal limiter from the generated series
  • unlike ranges, however, there is no leading exclusion ^... or ^...^
  • series is a list associative list infix, and each ... pays attention only to the portion of the list immediately to its left (plus the limit from the right)
  • an "impossible" limit can terminate a monotonic intuited series even if the limit can never match exactly
  • variables now default to a type of Any, and must explicitly declare Mu or Junction type to hold junctions
  • this is to reduce pressure to duplicate many functions like == with Mu arguments; most of our failure values should be derived from Any in any case
  • a Mu result is more indicative of a major malfunction now, and is caught at first assignment to an Any variable
  • Instant/Duration types are biased away from Num and towards Rat/FatRat semantics
  • Instant is now completely opaque; we no longer pretend to be the same as TAI, numerically speaking
  • Instants are now considered a more basic type than epochs, which are just particular named instants
  • all culturally aware time can be based on calculations involving instants and durations
  • list associative operators now treat non-matching op names as non-associative rather than right-associative, forcing parens
  • Whatever semantics now autocurry any prefix, postfix, or infix operator that doesn't explicitly declare that it handles whateverness itself
  • WhateverCode objects now take a signature to keep clear how many args are not yet curried
  • so *+* is now more like WhateverCode:($x,$y)
  • autocurrying is still transitive so multiple ops can curry themselves around a *
  • added semilists as Slicel type to go with Parcel
  • this allows us to bind @array[1,2,3] differently from @array[1,2,3;4,5,6], for instance
  • the Matcher type now excludes Bool arguments to prevent accidental binding to outer $_ when closure is needed
  • when and ~~ will now warn of always/never matching on direct use of True or False names as matcher
  • STD generalizes \w lookahead to all twigils now
  • STD now treats non-matching list associatives as non-associative
  • things like 1 min 2 max 3 are now illegal, and require parenthesization for clarity
  • STD now treats invocant colon as just a comma variant so it does not fall afoul of the list associativity change
  • CORE now recognizes the TrigBase enumeration

Patrick:

  • first release of the new branch of Rakudo last week
  • passing ~25,000 tests at the release
  • thanks to optimizations from chromatic, Jonathan, and Vasily, Rakudo has a lot of speed improvements
  • in particular, it can run those tests in under 10 minutes, non-parallel, depending on your hardware
  • older releases took 25 minutes and more
  • the regex tests will slow things down
  • ultimately, we're seeing a big speed improvement over the past releases
  • cleaned up lists and slices, now they work pretty well
  • worked with Solomon Foster and others to speed up trig operations
  • fixed a bug related to lexicals declared in classes
  • fixed the long-standing and often recurring problem with curlies ending a line/statement causing the next statement to be a statement modifier
  • easy to fix in the new grammar
  • that was nice
  • made an initial implementation of the sort method
  • it's very short, because Parrot provides one
  • there are a few bugs in Rakudo there still, but I'll get them
  • planning for the Copenhagen hackathon on March 5 - 9
  • Jonathan and I have been updating the Rakudo roadmap
  • will check that in in the next couple of hours
  • so far, every time we review it, we surprise ourselves at how much we've accomplished
  • we're meeting all of the top priority goals without making any heroic efforts
  • we'll put those goals in as well as timelines
  • most of the major tasks from previous roadmaps have happened

Allison:

  • working on Python this week
  • attended Python VM summit, Python language summit, and PyCon
  • Parrot's on good track to support what Python needs
  • useful to make community connections
  • when I reviewed Pynie, I was surprised to see how close it is to supporting the whole Python syntax
  • some of those features are big, like objects
  • but we should support them soon
  • Debian packages delayed by the absence of a sponsor
  • they should go into Debian soon though
  • I put in a request for feature-freeze exception for Ubuntu 10.04
  • Parrot 2.0 should go in
  • haven't made any commits to the PCC branch
  • that'll be a top priority for next week

c:

  • fixed a Parrot GC bug for last week's Rakudo release
  • made some optimizations in Rakudo and Parrot
  • helped Jonathan find a few more
  • fixed a long-standing math MMD bug
  • still working on HLL subclassing; more tricky than you think
  • may be some conflicting design goals about vtable overriding and MMD

Allison:

  • Patrick, do we need an explicit deprecation for old PGE and NQP?

Patrick:

  • I think Will already added one for NQP
  • we can add one for PGE if we need
  • they don't necessarily have to disappear at the next release
  • but no one's planning to maintain them

Allison:

  • no reason not to put in the notice now
  • we don't have to remove them at the earliest possible date

by chromatic at March 02, 2010 05:12 UTC

March 01, 2010


Andrew Whitworth: NQ-NQP Blog

A few days ago I mentioned an interesting new project called NQ-NQP, an implementation of the NQP language with a flex/bison frontend and an LLVM code generating backend. I've heard tonight that its author has started blogging about it. Anybody who is interested in NQP or LLVM stuff might do well to give it a read.

by Whiteknight (noreply@blogger.com) at March 01, 2010 19:07 UTC

February 28, 2010


Andrew Whitworth: Proposal to Change find_method

Austin Hastings, as part of his Kakapo project (which I now have a commit bit to!) has started creating a mock object framework. We were talking about how to implement expected method calls, so I took a look at the find_method VTABLE of the Object PMC for some inspiration. What I saw was absolutely horrible, so I promptly created a branch to fix it. However, the more I looked and edited, the bigger I found the problems to be. I'll talk more about Kakapo in another post.

When I do code like this:

$P0 = new ['Foo']
$P0.'Bar'()

What is really happening is something similar to this:

$P0 = new ['Foo']
$P1 = find_method $P0, 'Bar'
callmethodcc $P0, $P1

Internally, the find_method opcode calls the VTABLE_find_method function on the given object. The object itself is then expected to walk the method resolution order (MRO) of its inheritance hierarchy to find a suitable method and return it. Along the way, the Object PMC needs to completely violate the encapsulation of the Class PMC to gather information about the MRO and then to search the list of methods in the Class for an entry with the given name. In short, the C code from Object.find_method looks like this:

PMC *cur_class;
int num_classes = VTABLE_elements(interp, class->all_parents);
int i;
for (i = 0; i < num_classes; i++) {
    /* visit each class in the MRO in order; the first match wins */
    cur_class = VTABLE_get_pmc_keyed_int(interp, class->all_parents, i);
    if (VTABLE_exists_keyed_str(interp, cur_class->methods, name))
        return VTABLE_get_pmc_keyed_str(interp, cur_class->methods, name);
}

So Object reads the attributes of its Class PMC directly, and manually traverses the MRO looking for the proper method. This causes a few problems. First, as a mostly stylistic point, this completely breaks encapsulation. We can't make a change to the MRO or the method storage and lookup mechanism in Class without likewise changing the behavior in Object.
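For readers who want something they can compile, the flat MRO walk can be sketched in standalone C. This is a simplification under stated assumptions, not the real Object PMC: MroClass, mro_classes, and mro_find_method are hypothetical names, classes are identified by array index, and each class defines exactly one method so the table stays short. Note how the lookup function has to reach into each class's fields directly, which is exactly the encapsulation problem described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical names, not the real Object/Class PMCs. */
enum { MRO_FOO, MRO_BAR, MRO_BAZ, MRO_CLASS_COUNT };

typedef struct MroClass {
    const char *name;
    const char *method;        /* the single method this class defines */
    const int  *all_parents;   /* linearized MRO, starting with itself */
    size_t      parent_count;
} MroClass;

static const int mro_foo_parents[] = { MRO_FOO, MRO_BAR, MRO_BAZ };
static const int mro_bar_parents[] = { MRO_BAR, MRO_BAZ };
static const int mro_baz_parents[] = { MRO_BAZ };

static const MroClass mro_classes[MRO_CLASS_COUNT] = {
    { "Foo", "foo_only", mro_foo_parents, 3 },
    { "Bar", "bar_only", mro_bar_parents, 2 },
    { "Baz", "baz_only", mro_baz_parents, 1 },
};

/* Walk the linearized MRO of `cls` and return the index of the first
 * class that defines `name`, or -1 if no class in the MRO does. The
 * loop reads the fields of every class directly, as Object does. */
static int mro_find_method(int cls, const char *name)
{
    assert(cls >= 0 && cls < MRO_CLASS_COUNT);
    const MroClass *klass = &mro_classes[cls];
    for (size_t i = 0; i < klass->parent_count; i++) {
        int cur = klass->all_parents[i];
        if (strcmp(mro_classes[cur].method, name) == 0)
            return cur;
    }
    return -1;
}
```

Looking up "baz_only" starting from Foo walks Foo, then Bar, then Baz, and returns MRO_BAZ; looking up "foo_only" starting from Bar returns -1 because Foo is not in Bar's MRO.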

Second point, since Object needs to know how to traverse the MRO and look up methods, and requires intimate internal knowledge of the classes in the MRO, we are extremely limited in the types of objects that can be in the inheritance hierarchy. That is, we can't define our own metaobject types, we must use Class or PMCProxy, or a subclass thereof (and a careful reading of the code suggests that even subclasses will not work). This seems to be a remarkable limitation when you consider some of the diverse high-level languages that Parrot aims to support.

One thing I tried to do was create a find_method VTABLE in the Class PMC, and then delegate traversal of the MRO to Class instead of Object. This helped improve encapsulation greatly, but created another problem: Now I couldn't call methods on Class itself. Here's example code that broke:

$P0 = getclass 'Foo'
$P0.'add_vtable_override'("bar")

What we want to do is call a method on the class object itself, but what we end up doing is finding a method on objects of that type, and then trying to call that method on the class object. Problems.

Let's recap some issues:

  1. find_method searches for a method to use on a given invocant
  2. The Class type has methods that need to be accessible through find_method
  3. Object has to break encapsulation and monkey around in Class's internals, which means we can only use Class objects, and objects strictly isomorphic to Class (like PMCProxy) in an MRO
  4. We cannot delegate the method lookup operation to the Class object, where it arguably belongs.

With these things in mind, I had an idea that I sent to the list which aims to fix all this: Create a new VTABLE function that searches for a method in a metaobject, instead of searching for the method on the invocant (like find_method does now). In terms of PIR, I'm thinking of enabling this kind of sequence:

$P0 = new ['Foo']
$P1 = getclass 'Foo'
$P2 = find_class_method $P1, 'Bar'
callmethodcc $P0, $P2

I don't want to remove find_method or change it in any way. But what I want to have is a way to delegate method lookup to the Class object as well. I think we will find that when we have a way to delegate lookup to the Class object that we will use it much more frequently and to greater effect than we use find_method now. I also think we will find that find_method can eventually be deprecated entirely, but that's another issue for another time.

One other problem that I failed to mention above is that every class has its own completely linearized resolution order. So if Foo is a Bar, and Bar is a Baz, the Foo class has the MRO ("Foo", "Bar", "Baz"), Bar would have the MRO ("Bar", "Baz"), and Baz would have the MRO ("Baz"). Asking the Foo Class object for a method "Frobulate" would look in Bar, which would ask Baz. Then, Foo would move to the next item in its MRO, Baz, and ask it. The net result is that Baz would be queried twice, since the Foo Class item doesn't necessarily know that Baz is in Bar's MRO, and Bar doesn't know that it is being queried from Foo (maybe Bar was being queried directly). So what we need is some way to keep track of the MRO up front, and avoid re-defining the search MRO for each new delegation.
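The double query is easy to reproduce with a counter. The following C sketch is a hypothetical model of naive delegation (NvClass, nv_find_class_method, and nv_queries are invented for this illustration): each class checks itself and then asks every later entry of its own MRO to run a full lookup of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: naive per-class delegation with query counters. */
enum { NV_FOO, NV_BAR, NV_BAZ, NV_CLASS_COUNT };

typedef struct NvClass {
    const char *name;
    const char *method;   /* the single method this class defines */
    const int  *mro;      /* linearized MRO, starting with itself  */
    size_t      mro_len;
} NvClass;

static const int nv_foo_mro[] = { NV_FOO, NV_BAR, NV_BAZ };
static const int nv_bar_mro[] = { NV_BAR, NV_BAZ };
static const int nv_baz_mro[] = { NV_BAZ };

static const NvClass nv_classes[NV_CLASS_COUNT] = {
    { "Foo", "foo_only", nv_foo_mro, 3 },
    { "Bar", "bar_only", nv_bar_mro, 2 },
    { "Baz", "baz_only", nv_baz_mro, 1 },
};

/* How many times each class has been asked for a method. */
static int nv_queries[NV_CLASS_COUNT];

/* Check this class, then delegate a full lookup to every other class
 * in this class's own MRO. Returns the defining class index or -1. */
static int nv_find_class_method(int cls, const char *name)
{
    assert(cls >= 0 && cls < NV_CLASS_COUNT);
    const NvClass *c = &nv_classes[cls];

    nv_queries[cls]++;
    if (strcmp(c->method, name) == 0)
        return cls;

    for (size_t i = 1; i < c->mro_len; i++) {   /* index 0 is the class itself */
        int found = nv_find_class_method(c->mro[i], name);
        if (found >= 0)
            return found;
    }
    return -1;
}
```

Searching from Foo for a method nobody defines leaves the counters at Foo 1, Bar 1, Baz 2: Baz is reached once through Bar and once directly from Foo. With deeper hierarchies the repeated work grows quickly.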

I think we could solve this issue if we defined a new VTABLE like this:

VTABLE PMC * find_class_method(STRING *name, PMC *mro_iterator)

In this conception, SELF would be the metaobject currently being searched, name would be the string name of the method to find, and mro_iterator would be an iterator object for the MRO list. When we do the PIR code:

$P0 = getclass "Foo"
$P1 = find_class_method $P0, "Frobulate"

The first call to the Foo class object would be VTABLE_find_class_method("Frobulate", NULL). Foo would then create an iterator over its MRO (removing itself from the front of the list to avoid direct recursion) and pass that MRO iterator to Bar, which then calls the next item on the list (Baz). This has a few major advantages which are not necessarily obvious up front. First, any object that defines find_class_method can be inserted into the MRO. This includes things that aren't really classes like Roles, Mixins, extension methods, and even autoloaders. Second, we gain more flexibility to modify the MRO of a class, because that class (and its super-classes) can add additional search parents to the iterator as needed. Third, we would gain more manual control over the MRO, because we could add a find_class_method_p_p_s_p op variant that also takes an existing MRO iterator. This would enable us to better implement something like a super() call, where we take the MRO iterator, manually pop the top item off it, and then call find_class_method with it. I've got several bonus points available to whoever can explain how to call a method in a super class when it's overridden in the subclass, without having to hard-code in the name of the parent class. With the new VTABLE and a new op, this becomes trivial.
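Here is a rough C sketch of the iterator-passing idea, under the same toy assumptions as before. Nothing here is an existing Parrot API: ItClass, ItMroIter, it_find_class_method, and it_find_super_method are hypothetical names, and the iterator is just an array plus a position. A NULL iterator marks the first call, which builds the iterator from its own MRO with itself removed; every later call reuses that one iterator, so each class is asked at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: one shared MRO iterator is threaded through the calls. */
enum { IT_FOO, IT_BAR, IT_BAZ, IT_CLASS_COUNT };

typedef struct ItClass {
    const char *name;
    const char *method;   /* the single method this class defines */
    const int  *mro;      /* linearized MRO, starting with itself  */
    size_t      mro_len;
} ItClass;

typedef struct ItMroIter {
    const int *items;
    size_t     len;
    size_t     pos;       /* index of the next class to ask */
} ItMroIter;

static const int it_foo_mro[] = { IT_FOO, IT_BAR, IT_BAZ };
static const int it_bar_mro[] = { IT_BAR, IT_BAZ };
static const int it_baz_mro[] = { IT_BAZ };

/* Foo overrides "greet", which Bar also defines. */
static const ItClass it_classes[IT_CLASS_COUNT] = {
    { "Foo", "greet",    it_foo_mro, 3 },
    { "Bar", "greet",    it_bar_mro, 2 },
    { "Baz", "describe", it_baz_mro, 1 },
};

static int it_queries[IT_CLASS_COUNT];

/* Returns the index of the class that defines `name`, or -1. */
static int it_find_class_method(int cls, const char *name, ItMroIter *iter)
{
    assert(cls >= 0 && cls < IT_CLASS_COUNT);
    const ItClass *c = &it_classes[cls];
    ItMroIter fresh;

    if (iter == NULL) {
        /* First call: iterate over our own MRO, skipping ourselves. */
        fresh.items = c->mro;
        fresh.len   = c->mro_len;
        fresh.pos   = 1;
        iter = &fresh;
    }

    it_queries[cls]++;
    if (strcmp(c->method, name) == 0)
        return cls;
    if (iter->pos >= iter->len)
        return -1;   /* MRO exhausted */

    int next = iter->items[iter->pos++];
    return it_find_class_method(next, name, iter);
}

/* super()-style lookup: skip `cls` itself and start at the next MRO
 * entry, without naming the parent class anywhere in the caller. */
static int it_find_super_method(int cls, const char *name)
{
    assert(cls >= 0 && cls < IT_CLASS_COUNT);
    const ItClass *c = &it_classes[cls];
    if (c->mro_len < 2)
        return -1;   /* no parents to search */
    ItMroIter iter = { c->mro, c->mro_len, 2 };
    return it_find_class_method(c->mro[1], name, &iter);
}
```

In this model a failed lookup from Foo asks Foo, Bar, and Baz exactly once each, and it_find_super_method(IT_FOO, "greet") resolves to Bar even though Foo overrides greet. Other kinds of metaobject could take part as long as they answer the same call, which is the extensibility argument made above.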

So that's my idea for method lookups. I've sent a mail to the list with the idea, and I'm going to raise the idea at #ps if I can make it to the meeting. I think it has a lot of merit, enables a few cool new abilities and doesn't take away any existing functionality. I would like to hear any other ideas, but I'm becoming convinced that this one is a winner.

by Whiteknight (noreply@blogger.com) at February 28, 2010 16:50 UTC

February 25, 2010


chromatic: Perl 6 Design Minutes for 17 February 2010

The Perl 6 design team met by phone on 17 February 2010. Larry, Allison, Patrick, and chromatic attended.

Larry:

  • much work clarifying relationship of parcels to everything else (<a b>, assignment, arguments, captures, parameters, signatures, gather/take, and loop returns)
  • we now list all scope declarators in one spot
  • conjectured some ideas on how to handle the allomorphism of literals more dwimmily
  • had already specced some of this behavior for literals found inside qw angles.
  • literals that exceed a Rat64's denominator automatically keep the string form around for coercion to other types
  • clarified that anon declarator allows a name but simply doesn't install it in the symbol table
  • respecced the trig functions to use a pragma to import fast curried functions
  • still uses enum second argument for the general case (rakudo is still stuck on slow strings there)
  • on iterators, renamed .getobj to .getarg since arguments are the typical positional/slicey usage
  • signatures are never bound against parcels anymore, only against captures
  • we now use "argument" as a technical term meaning either a real parcel or an object that can be used independent of context as an argument
  • anything that would stay discrete when bound to a positional, basically
  • return, take, and loop return objects are also arguments in that sense
  • they all return either a parcel or anything that can stand on its own as an argument
  • STD now adds a shortname alias on adverbialized names, ignores collisions on the shortname for now, which is okay for multis
  • STD now complains about longname (adverbialized) collisions
  • STD no longer carps about duplicate anonymous routine declarations
  • made the undeclared type message the same for parameters as for other declarations
  • clarify the error message about anonymous variables
  • no longer report a $) variable error where ) is the $*GOAL
  • add WHAT etc. to list of functions that require an argument

Allison:

  • working on two HLL implementations
  • one is Pynie, the other is Camle
  • nothing to do with Caml or ML
  • I've noticed huge improvements in NQP-rx from the previous NQP
  • can't say which feature improvements make the most difference, but I'll migrate Pynie pretty soon to take advantage of the new version
  • continuing to shepherd Debian and Ubuntu packages

Patrick:

  • essentially all I did was unify things
  • previously it had been two or three tools
  • it's just one

Allison:

  • even the syntax seems more regular

Patrick:

  • there are more pieces available in NQP-rx
  • Rakudo's -ng is now master
  • the old master is now -alpha
  • we took a big hit on spectests, but they seem to be coming back quickly
  • 5000 tests pass on trunk now
  • we have 16k or 17k we haven't re-enabled; they make the spectest slower
  • Jonathan thinks we may pass 25,000 tests now
  • that's great, considering where we were a week ago
  • I redid Rakudo's container, value, and assignment module
  • previously variables held values directly
  • now they contain reference PMCs
  • that cleaned up many things
  • we use more PMCs, but now we don't clone and copy as much
  • we move references around more
  • seems closer to how Perl 6 handles things
  • was much easier than I expected
  • updated the NQP-rx regex engine and built in constant types
  • handles Unicode character names
  • reclaims plenty of tests
  • answered lots of questions for people adding things into Rakudo
  • prioritizing other people writing code over writing code
  • increases our developer pool; seems to be working well
  • new release of Rakudo planned for tomorrow
  • don't know how many tests we'll pass, but it should go well
  • plan to put in a few things like sort and grammars over the next week
  • then I'll review the RT queue to find bugs and (hopefully) closeable bugs

c:

  • working on GC tuning
  • also working on String PMC tuning
  • working on built-in types and their behavior as classes and parent classes
  • the multidispatch bugs in particular I hope to solve

by chromatic at February 25, 2010 00:27 UTC

February 24, 2010

Andrew WhitworthPDD23 Exceptions Critique

Following my post a few days ago, I would like to take a more in-depth look at PDD23, which lays the specification for the exceptions subsystem. I hadn't intended to go through line-by-line, but in a lot of places I have to.

[Update: I wrote this post at the same time as I wrote the last one on the topic, but I delayed in posting this one until now. In the interim time, Austin created a page on the wiki to plan out a major refactor of the system and Tene started a branch to do some work. I'll post updates on both those things as they happen.]

Exceptions are indications by running code that something unusual -- an "exception" to the normal processing -- has occurred. When code detects an exceptional condition, it throws an exception object. Before this occurs, code can register exception handlers, which are functions (or closures) which may (but are not obligated to) handle the exception. Some exceptions permit continued execution immediately after the throw; some don't.

Exceptions transfer control to a piece of code outside the normal flow of control. They are mainly used for error reporting or cleanup tasks.

This is, essentially, the preamble to the rest of the document and already shows some disconnect with reality. High level languages are already using exceptions to handle normal control flow in some cases. In this case they are less "exception" and more "expection". I could go on and talk about how bad an idea it is to use exceptions for normal control flow for a variety of reasons, but I won't. I know that Parrot's control flow model still isn't mature enough to tackle all the cases that HLLs have been digging up, so exceptions are the only available mechanism to implement some structures. Also, as far as I am aware, no exception prevents resuming after the point of the throw. I believe that determination is left up to the handler.

When an exception is thrown, Parrot walks up the stack of active exception handlers, invoking each one in turn, but still in the dynamic context of the exception (i.e. the call stack is not unwound first).

I need to carefully read through some of the code again, but I'm pretty certain that this is patently false. ExceptionHandlers are implemented as Continuations which do rewind the call stack and are executed in the dynamic context of the function that contains the handler. Again, I need to look at all the code and the semantics in greater detail, but at the very least this is highly suspect.

Exception handlers can resume execution after handling the exception by invoking the continuation stored in the 'resume' slot of the exception object. That continuation must be invoked with no parameters; in other words, throw never returns a value.

Not a problem here so much as a little nit. Why can't exception resumes return values? In the common uses of exceptions in popular programming languages, this capability is never used. But when you consider that exceptions in Parrot are currently used, as I mention above, to implement complex control flow, you start to see that there is maybe some utility to it. Slightly more to the point, what if the resume object wasn't just a continuation pointing to the opcode after the throw instruction, but was instead a Sub object representing a lexically-scoped finally{} block that needed to be invoked? I can come up with a few ideas of places where the functionality to pass parameters to the resume continuation might be nice to have. It's interesting to consider that maybe we resume to a multi-sub, which dispatches to a post-handler routine based on signature? I have several ideas like this, and while they are all a little bit off the radar of current programming languages they are by no means unthinkable or undesirable in the long run. If it's possible to provide this, and Parrot's internal mechanics should certainly make it so, I don't see why we would artificially limit it.

The die opcode throws an exception of type exception;death and severity except_error with a payload of message. The exception payload is a string PMC containing message.

I have been accused of being anti-Perl, and I maintain that I am not. Maybe I'll devote another blog post to the topic later. But I don't think Parrot needs a "die" op that does what it does here. I can understand and appreciate that Perl is very motivated by linguistic factors, and that Parrot has been traditionally very influenced by Perl. But Parrot's opcodes represent an assembly language, and using these kinds of linguistic features seems a little bit out of place. Why have "die", when we have "exit"?

The routines to search the op library are not linear. I think it uses a skip list, but I haven't studied the implementation enough to be able to say so definitively. What I do know is that the time it takes to search the oplist for a valid op name is proportional in some measure to the number of op "short" names. I think it's O(log n). As an example, die_s, die_p, and die_i_i all have the short name "die". IMCC, during lexical analysis, looks to determine whether an opcode exists in the library using its short name. Later in the process, IMCC hunts down the exact long name of the op, which again uses the same algorithm (skip list?) but looks at long names instead of short names. I'll spare more details on this point, but the lesson is clear: Having fewer ops is better for IMCC's code generation performance. Having fewer short names (even if the number of ops remains the same) improves parsing performance in IMCC. For a PIR-based benchmark, we would see some improvement (though admittedly it would be very small) if we did nothing besides rename all "die" opcodes to "exit" instead.
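The two-stage lookup described above can be sketched roughly as follows. This is only an illustration, not IMCC's code: a binary search over a sorted list stands in for whatever structure the op library really uses (a skip list is only a guess), and the `contains` and `resolve` helpers and the tiny op table are made up for the example.

```python
from bisect import bisect_left

# Toy op table (an assumption; the real library has far more ops).
LONG_NAMES = sorted(["die_i_i", "die_p", "die_s", "exit_i", "throw_p", "throw_p_p"])
# Short name = everything before the first underscore.
SHORT_NAMES = sorted({name.split("_", 1)[0] for name in LONG_NAMES})

def contains(sorted_names, name):
    """O(log n) membership test on a sorted list."""
    i = bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name

def resolve(short_name, arg_types):
    """Stage 1: is the short name known? Stage 2: find the exact long name."""
    if not contains(SHORT_NAMES, short_name):
        return None
    long_name = "_".join([short_name, *arg_types])
    return long_name if contains(LONG_NAMES, long_name) else None

print(resolve("die", ["i", "i"]))  # die_i_i
print(SHORT_NAMES)                 # ['die', 'exit', 'throw']
```

In this toy table three long names share the short name `die`, so renaming them to `exit` would shrink the short-name list from three entries to two without changing the number of ops.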

When I see the word "die", it seems to me like it should do what it says: Kill the program. Do not pass go. Do not collect 200 dollars. Die. I can't imagine having any other preconception about it. Why would the "die" opcode not make the program...die? At least, why not without an explicitly-defined mechanism to prevent it, such as how Perl5 uses eval()? So you can imagine my surprise that die seems to throw just another exception that can be caught. You can imagine how perplexed I was when calling

die 'Program is closing'

or

exit 0

didn't exit my program! Instead, I had to use

die 5, 0

to tell the system that yes, I actually wanted the program to shut down. Of course, now I can't supply a helpful message about why we need to die. It's also surprising to me that, for some reason, the exit opcode seems to have the same general behavior. It doesn't actually exit if you have a handler active, and doesn't have an overload that lets you manually specify a severity that forces an exit. So that seems pointless to me. Again, what else could the word "exit" mean besides "get out of my damn program"?

All exceptions will have at least message, severity, resume, and payload attributes.

There are three forms of the die opcode: die_s, die_p, and die_i_i. The first two basically throw a normal, catchable exception with the given argument treated as the string message to display to the user. The third form throws a normal, catchable exception with a user-definable severity and error code. The exit opcode has form exit_i, which throws a normal, catchable exception with only the given error code. The throw opcode has flavors throw_p, and throw_p_p, which let you throw a given pre-constructed exception, optionally with a given resume continuation. This all seems like a hugely redundant waste of opcodes which all essentially do the same thing but each of which only lets you specify a subset of the parameters that every exception object is supposed to provide. None of the opcodes allow you to specify a payload, even though the spec suggests (as I will discuss below) the payload should be used for type filtering by HLLs, and the current implementation prevents proper type subclassing!

"die" lets you specify a message or a severity and error code, but doesn't actually make the program die. "exit" lets you specify an error code only, and doesn't necessarily make the program exit. "throw" lets you specify a pre-built exception and optionally a custom resume continuation only. Considering that every exception must have a message, severity, resume, and payload, this assortment of opcodes really doesn't make any sense at all.

I won't harp on opcodes any further in this post, but I think I've made my point: The ops we do have are a stupid mish-mash of the kinds of ops we need to work with exceptions. If every Exception must have resume, severity, type, and payload, why do our ops not support that? Why do we have die, when we have throw, rethrow, and exit? I highly suggest we slim down these opcodes. I think an exit_i opcode is fine, if it forces an exit in the absence of a specifically-defined exit-handler. That is, most handlers would not handle exit events by default, allowing the exit op to do what we expect. To catch and handle these types, which would be necessary in some places involving embedding or nesting, we could specifically define an exit-handler type that is capable of catching them.

I think a throw_p opcode is all we really need to throw other types of exceptions. Maybe, if we were worried about writing out all the PIR for constructing elaborate Exception objects, we could have a throw_p_s_i_i, which would set all four required attributes at once, and throw it.

Anyway, that's enough on this particular subtopic. But, as a tangent, I would like to suggest again that we try to find a good way to specify aggregate literals in PIR code. In this way we could specify exception constants (or proto-exception initializer objects) to reduce the runtime cost of constructing exceptions where things like the severity, type, and message are the same. The ability to specify ExceptionHandler constants in the code likewise would create a huge performance savings, especially when you consider that in a normally-operating program more ExceptionHandlers are created and registered than Exceptions.

count_eh Return the quantity of currently active exception handlers.

I'm not certain that we need an opcode for this, especially since I think it's used pretty infrequently. A method call on the current context object could provide the same info. A series of methods would allow fine-grained manipulation of the handler stack, which would be even better.

If no handler is found, and the exception is non-fatal (such as a warning), and there is a continuation in the exception record (because the throwing opcode was throw), invoke the continuation (resume execution). Whether to resume or die when an exception isn't handled is determined by the severity of the exception.


I'm not sure if the implementation follows the letter of the spec in regards to the "exception record". As far as I am aware, an unhandled exception doesn't automatically cause the program to resume normal control flow no matter what type it is. I need to check on this, but I have never witnessed this behavior. If it does exist, I apologize for not knowing about it, of course.

typedef enum {
EXCEPT_normal = 0,
EXCEPT_warning = 1,
EXCEPT_error = 2,
EXCEPT_severe = 3,
EXCEPT_fatal = 4,
EXCEPT_doomed = 5,
EXCEPT_exit = 6
} exception_severity;

As Austin mentioned, there are way too many of these. Also, as I've found out experimentally, only EXCEPT_doomed actually causes Parrot to exit despite other severities having harmful-sounding names like "fatal", and "exit". In my mind we need only four severities, at most: Trivial, Normal, Fatal and Control. Anything else is superfluous, not just in theory but also in the code as it currently exists. Trivial exceptions can automatically resume if unhandled. Normal exceptions are ones that represent an error. They can be handled by any default handler, but cause a program exit when unhandled. Fatal exceptions mark an error that is typically unrecoverable unless a special exit handler has been specifically configured to catch such events. Control exceptions bypass the error-reporting system and are used to implement non-error control flow. I'm hard-pressed to come up with any other designations we would ever need for this mechanism.
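To make the four proposed severities concrete, here is a rough model of how dispatch might behave under them. Everything in it is an assumption made for illustration: the handler kinds ("default", "exit", "control"), the `CATCHABLE` table and the function names are invented, and nothing here reflects how Parrot currently works.

```python
from enum import Enum

class Severity(Enum):
    TRIVIAL = 1
    NORMAL = 2
    FATAL = 3
    CONTROL = 4

# Which severities each (hypothetical) handler kind is allowed to catch.
CATCHABLE = {
    "default": {Severity.TRIVIAL, Severity.NORMAL},
    "exit": {Severity.FATAL},
    "control": {Severity.CONTROL},
}

def find_handler(severity, handler_stack):
    """Return the first handler kind on the stack that may catch this severity."""
    for kind in handler_stack:
        if severity in CATCHABLE[kind]:
            return kind
    return None

def dispatch(severity, handler_stack):
    kind = find_handler(severity, handler_stack)
    if kind is not None:
        return "handled by " + kind
    # Unhandled: only trivial exceptions resume automatically.
    # Treating an unhandled CONTROL exception as an exit is an assumption;
    # the proposal does not say what should happen in that case.
    return "resume" if severity is Severity.TRIVIAL else "exit"

print(dispatch(Severity.FATAL, ["default"]))          # exit
print(dispatch(Severity.FATAL, ["default", "exit"]))  # handled by exit
```

The point of the model is that a fatal exception slips past ordinary handlers and is stopped only by a handler registered specifically for it.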

typedef enum {
EXCEPTION_BAD_BUFFER_SIZE,
EXCEPTION_MISSING_ENCODING_NAME,
EXCEPTION_INVALID_STRING_REPRESENTATION,
EXCEPTION_ICU_ERROR,
EXCEPTION_UNIMPLEMENTED,
EXCEPTION_NULL_REG_ACCESS,
EXCEPTION_NO_REG_FRAMES,
EXCEPTION_SUBSTR_OUT_OF_STRING,
EXCEPTION_ORD_OUT_OF_STRING,
...
} exception_type_enum;

There are a huge number of exception types, and they really seem superfluous when you consider that every exception must contain a message field with a human-readable message that describes it and a payload field that can contain any arbitrary object with additional data. I know that the intention with this huge list is to implement exception types without using subclasses. The reason for this is that subclasses can be expensive because each subclass needs to have its own VTABLE and other information, which can become prohibitive if we want to have more than a few types. I've recently put forward an idea for allowing extremely inexpensive subclasses which was inspired by exactly this problem. My idea was not without its caveats, of course, but it's not the only possible route to take to make the subclassing operation less expensive. That said...

The payload more specifically identifies the detailed cause/nature of the exception. Each exception class will have its own specific payload type(s). See the table of standard exception classes for examples.


So every Exception has a payload, which can be a user-defined object type with information about the exception type, and it needs to have one of these dozens of enum values that indicates its type? This is all highly redundant, and there are at least two paths we could follow to make this system sane:
  1. Only have one type of Exception PMC with no subclasses. Get rid of the type enums. The Exception "type" can be determined from the user-specified payload, if any. Add opcodes or methods that better facilitate throwing an exception with a custom payload. We're likely going to need to define several "Payload" PMC types to handle those exceptions thrown by core. This would require implementing cheap subclasses, but has the benefit that built-in types can be overridden by HLL types if needed.
  2. Have many subclasses of Exception. Get rid of type enums. We only need a throw_p opcode and can construct "new ['ICUError']" objects or whatever we need. This is going to require implementation of cheap subclasses, and will allow HLL type overrides if needed.
Either way, a major improvement over what we have now.
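The difference between the two paths is easiest to see side by side. The sketch below models both in Python; `ParrotException`, `ICUErrorPayload` and `ICUError` are hypothetical stand-ins invented for this example, not Parrot types.

```python
class ParrotException(Exception):
    """Stand-in for the Exception PMC: a message plus an arbitrary payload."""
    def __init__(self, message, payload=None):
        super().__init__(message)
        self.message = message
        self.payload = payload

# Path 1: one exception type; the payload's type says what went wrong.
class ICUErrorPayload:
    def __init__(self, code):
        self.code = code

def classify_by_payload(exc):
    if isinstance(exc.payload, ICUErrorPayload):
        return "icu"
    return "other"

# Path 2: many subclasses; handlers filter on the exception's own class.
class ICUError(ParrotException):
    pass

def classify_by_subclass(thunk):
    try:
        thunk()
    except ICUError:
        return "icu"
    except ParrotException:
        return "other"
    return "none"

def raise_icu():
    raise ICUError("ICU failure")

def raise_plain():
    raise ParrotException("something else")

print(classify_by_payload(ParrotException("bad", ICUErrorPayload(7))))  # icu
print(classify_by_subclass(raise_icu))                                  # icu
```

In both versions the numeric type enum is gone; the only question is whether the handler inspects the payload or the exception's class.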

Exceptions have been incorporated into built-in opcodes in a limited way. For the most part, they're used when the return value is either impractical to check (perhaps because we don't want to add that many error checks in line), or where the output type is unable to represent an error state (e.g. the output I register of the ord opcode).


Color me stupid, but isn't consistency of interface a good thing? How do we know, without having to memorize the behavior of all 1302 ops, which throw exceptions to signal errors and which do not?

Other opcodes respond to an errorson setting to decide whether to throw an exception or return an error value.


I think this should be the default behavior. All ops should throw exceptions on error if "ops throw exceptions" is turned on. Otherwise, no ops do. This setting is cheap enough to toggle.

{{ TODO: "errorson" as specified is dynamically rather than lexically scoped; is this good? Probably not good. Let's revisit it when we get the basic exceptions functionality implemented. }}


Good point! Maybe an opcode for this isn't a great idea. Methods on the ParrotInterpreter object (to set global settings) and methods on the CallContext PMC (to set local settings) would be a good alternative. When is the basic implementation expected?

{{ NOTE: There are a couple of different factors here. One is the ability to globally define the severity of certain exceptions or categories of exceptions without needing to define a handler for each one. (e.g. Perl 6 may have pragmas to set how severe type-checking errors are. A simple "incompatible type" error may be fatal under one pragma, a resumable warning under another pragma, and completely silent under a third pragma.) Another is the ability to "defang" opcodes so they return error codes instead of throwing exceptions. We might provide a very simple interface to catch an exception and capture its payload without the full complexity of manually defining exception handlers (though it would still be implemented as an exception handler internally) }}


Another warning in the same vein as the previous note. The point here is that we may want to say that some opcodes throw exceptions, but that we may want those exceptions to have different effects under different "pragmas". This kind of system can be hugely expensive if every error-capable opcode needs to check not only whether to return an error code or throw an exception, but also what the severity of that exception is depending on a series of pragmata that, most likely, would need to be lexically-scoped anyway. Way too complicated. Far better is to enable cheap subclasses of Exception, and have the HLL hot-swap type-maps at runtime with different behaviors such as different severities. Or better yet, forget hot-swapping and instead introspect on the Exception subclasses' Class object to change the default severity values and behaviors. That way when the new Exception object is created, the initialization routine sets a different default severity, the op throws it no matter what, and the exceptions system handles things like it is supposed to do.

So that's my in-depth critique of the Exceptions PDD. I may make it a regular feature to go through other PDDs as well, and I'm sure I'll post other ideas, proposals, and insights for this system in the future as well.

by Whiteknight (noreply@blogger.com) at February 24, 2010 08:08 UTC

February 23, 2010

Andrew WhitworthCheap Subclasses

I had an idea the other night when reading over PDD23. That PDD talks about the intention to have an entire hierarchy of exception types, but then mentions a caveat that having too many types is expensive. That got me to thinking, does it really have to be so expensive to make subtypes?

In Parrot when we create a subtype we first create a new VTABLE struct. This struct contains function pointers to all the VTABLE interface functions, plus a small amount of metadata about the class. The VTABLE structure contains a string that is the class name, and a pointer to the Class or PMCProxy PMC that defines the type. There are several function pointers in the VTABLE structure. On a very quick count tonight it looks like there are about 184 of them, and before the vtable_massacre branch merged there were significantly more. Plus other fields, there are over 200 pointers (or fields with equivalent size) in that structure. It's a huge amount of memory to hold for every type, especially if HLLs are expecting to be able to create large amounts of their own types.
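As a back-of-the-envelope check on that cost, the arithmetic can be written out. The figure of roughly 200 pointer-sized fields comes from the count above; the 8-byte pointer size and the figure of 1000 types are assumptions chosen only to make the numbers concrete.

```python
def vtable_bytes(slots=200, pointer_size=8):
    """Approximate size of one VTABLE struct, in bytes."""
    return slots * pointer_size

def total_vtable_bytes(num_types, slots=200, pointer_size=8):
    """Approximate VTABLE memory for num_types distinct types."""
    return num_types * vtable_bytes(slots, pointer_size)

print(vtable_bytes())            # 1600
print(total_vtable_bytes(1000))  # 1600000
```

Under those assumptions each type costs about 1.6 KB in VTABLE data alone, so a thousand user-defined types would spend about 1.6 MB on tables that are mostly identical.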

Now, consider a case like what is described in PDD23, where we have several exception subtypes which appear to differ from each other only by name. It's a huge waste to give each of these subtypes its own 184-pointer VTABLE structure, when they are all going to be mostly identical. It's absurd to do it that way, and this is probably a big reason why we don't support the subtypes as described in PDD23.

Consider now the case of user-defined classes and subclasses. This is, I suspect, the largest set of types for most applications. Every PIR-defined object type is an Object PMC, which means the VTABLE structure in C for every user-defined type is 99% identical to the VTABLE structure of Object. All the function pointers, all 184 of them, are identical. The associated NameSpace PMC (after chromatic's refactor the Class PMC instead) contains a list of all the :vtable and :method Sub PMCs. The VTABLEs in Object all search the NameSpace for an override and then launch that override if provided. So for types defined in PIR, we don't need the whole VTABLE struct: just the pointer to the Class PMC that contains the info. We can point the VTABLE pointer to Object's VTABLE and use it without needing an expensive copy.

Instead of creating a Class PMC and a VTABLE structure with over 200 pointers, we only define the Class and the handful of defined overrides that we already define anyway. This is significant memory savings for applications that define many types.
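A small model may help show why sharing works. In the sketch below every object points at one shared table, and the table's entry looks for an override on the object's class before falling back to a default. The names `ClassPMC`, `ObjectPMC` and `OBJECT_VTABLE` are illustrative only and the real C structures look nothing like Python dictionaries.

```python
class ClassPMC:
    """Holds the type's name and only the overrides it actually defines."""
    def __init__(self, name, overrides=None):
        self.name = name
        self.overrides = dict(overrides or {})

def object_get_string(obj):
    # Object's vtable entry: search the class for an override first.
    override = obj.klass.overrides.get("get_string")
    if override is not None:
        return override(obj)
    return "Object of class " + obj.klass.name

# One table, shared by every PIR-defined type (never copied per type).
OBJECT_VTABLE = {"get_string": object_get_string}

class ObjectPMC:
    def __init__(self, klass):
        self.vtable = OBJECT_VTABLE
        self.klass = klass

    def get_string(self):
        return self.vtable["get_string"](self)

plain = ObjectPMC(ClassPMC("Plain"))
fancy = ObjectPMC(ClassPMC("Fancy", {"get_string": lambda obj: "fancy!"}))

print(plain.get_string())            # Object of class Plain
print(fancy.get_string())            # fancy!
print(plain.vtable is fancy.vtable)  # True
```

Adding a new type here costs one small class record and its overrides, not a fresh copy of the whole table.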

There are two options to implement this kind of idea:
  1. Add a PMC* pointer to every PMC that points to the Class or PMCProxy object that controls it. This could create a mess in GC if Class and PMCProxies weren't marked constant.
  2. Define a new "PMCType" structure. PMCType would contain pointers like a string name, a Class PMC pointer, and maybe a VTABLE pointer. If we add this structure, PMCs get larger by one pointer. If we replace the VTABLE struct and include a pointer to a VTABLE in the PMCType, we have to suffer an additional pointer dereference per VTABLE call (with opportunities to cache).
So this system is not without its tradeoffs. With it in place, though, we gain the ability to define large numbers of cheap subclasses of built-in types, like those specified in PDD23, and we also significantly simplify the process of creating new classes in PIR and reduce the amount of memory required for each type.

by Whiteknight (noreply@blogger.com) at February 23, 2010 20:00 UTC

Andrew WhitworthHaskell with LLVM

[Update 23 Feb 2010: I've been informed that this was not a JIT, but instead a native-code generation backend for LLVM demonstrating LLVM's aggressive optimization potential. These numbers are not representative of JIT performance.]

Several people sent me a link to a very interesting blog post yesterday about using LLVM to provide native code generation for Haskell in GHC. I recommend it as an interesting read.

One thing I will point out is that the blog post doesn't really explain the whole situation. He shows plenty of examples where LLVM improved performance, but only mentions briefly that this isn't typical of larger programs and that most programs won't experience as much speed up, if any. So, to anybody who reads this: remember the caveat that the results aren't typical. JIT speeds some things up and slows some things down. It's not a magic bullet that makes everything better.

by Whiteknight (noreply@blogger.com) at February 23, 2010 08:41 UTC

Andrew WhitworthParrot's Exceptions System

I've been vaguely unhappy with the exceptions system for a while now. Everybody knows that the implementation really hasn't caught up with the spec, and until now I've been pretty happy to write off all my problems as being an artifact of an incomplete implementation. Plus, I've seen some of the great work that some of our developers have done fixing various bugs and implementing various changes, and I'm always willing to let problems slide under the rug if I know good minds are working on them. Today, however, I was talking to Austin and he expressed some criticisms on IRC that really do a great job of expressing the thoughts I (and others) have had, and show that maybe it's the spec that's the problem, not the implementation:

I was going to embark on a rant about this, but then I read the PDD, and i realized the entire exception subsystem is a farce.

That which is documented is inadequate and poorly thought out. And that which is implemented doesn't do even remotely what is documented.

The pdd makes the assumption that exception filtering will be done based on 'type', but provides no mechanism for extending the 'types'. The logical (and widely popular) alternative is to filter based on subclass. The pdd's answer to that is that you can throw anything, if you just stuff it in the payload. So naturally, the parameters to the exception handler objects are the...

...exception and it's *message*.

The throw/rethrow ops differ in that rethrow marks the exception unhandled. IMO, rethrow should be transparent - particularly, the exception backtrace should still point at the original location where the exception occured. The pdd makes nothing of this, and naturally parrot gets it wrong.

There are too many categories of severity, too many attributes (backtrace versus resume versus thrower; severity versus exit code versus type versus class).


So there you have it, a pretty succinct criticism of Parrot's exception system. I'll be elaborating on some of these ideas in the next few days.

by Whiteknight (noreply@blogger.com) at February 23, 2010 08:00 UTC

February 22, 2010

Andrew WhitworthParrotProjects: February 2010 Edition

I haven't posted a ParrotProjects update in a while, but that doesn't mean development of new projects has slowed down at all. Quite the contrary, there are plenty of new projects popping up left and right.

Fun

I can't speak towards how enjoyable it might be, but Fun is an implementation of the Joy language on Parrot. It's still early in development, but it is exciting to have more functional languages targeting Parrot like this.

Digest-Dynpmcs

The Parrot repo currently contains a few dynamic PMCs (dynpmcs) for calculating digests such as MD2, MD4, MD5, and various SHA sums. It has been decided that these kinds of things should find a new home, so our own enterprising developer darbelo has forked them to a new home on Gitorious. Copies of these PMCs are still in the repo pending a deprecation cycle, but after the 2.3 release they will only live on Gitorious.

ParrotSDL

If you need to write any SDL applications, you might be excited to hear that bindings for the multimedia library for Parrot are in active development. ParrotSDL is still a new project and is navigating through some difficulties with Parrot's NCI system. The lead developer, Parrot newcomer kthakore, is also working on SDL bindings for Perl5, so he's very familiar with the whole system.

kthakore is looking for PIR coders to help with the project. Chat about ParrotSDL and the Perl port happens at irc://irc.perl.org/#sdl

NQ-NQP

You might think of NQP as being a language that only runs on Parrot, but you'd be wrong. Developer ash_ has been working on a variant of NQP that runs on top of LLVM instead. This compiler, which is not quite NQP, is a very interesting project and may help to inform our future use of LLVM for Parrot's JIT system.

by Whiteknight (noreply@blogger.com) at February 22, 2010 12:14 UTC

February 21, 2010

Andrew WhitworthOpcode and OpLib PMCs

A few days ago, after some discussion with NotFound and others on #parrot, I started a small branch to experiment with some new PMC types. The results of that work were the two new experimental PMC types Opcode and OpLib. The branch merged into trunk shortly after the 2.1.0 release, so now they are available--experimentally--for people to test and use.

OpLib provides an introspective accessor layer over the interpreter's op table. The OpLib allows us to get a current count of the number of opcodes currently loaded in the system. It can also be used to return the index number of an opcode specified by name, or the name of an opcode given by its number. On one hand it's important to hide these kinds of details from the average PIR user for reasons of backwards-compatibility and encapsulation. However, for the people writing PIR assemblers and disassemblers in PIR, the information is vital.
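The three queries described above amount to a read-only, two-way mapping between op names and op numbers. The toy model below shows the shape of that interface; `OpLibModel` and its method names are invented for illustration and are not the actual OpLib PMC interface, and the op numbers are made up.

```python
class OpLibModel:
    """Read-only model of an op table: count, name -> number, number -> name."""
    def __init__(self, names):
        self._names = list(names)
        self._index = {name: i for i, name in enumerate(self._names)}

    def __len__(self):
        return len(self._names)

    def index_of(self, name):
        # -1 for an unknown op is a choice made for this sketch.
        return self._index.get(name, -1)

    def name_of(self, number):
        return self._names[number]

lib = OpLibModel(["end", "noop", "die"])
print(len(lib))              # 3
print(lib.index_of("noop"))  # 1
print(lib.name_of(2))        # die
```

An assembler written in PIR would use the name-to-number direction, and a disassembler the reverse.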

These PMC types are read-only types. You can use them to read information about the opcodes in the system, but you can't manipulate that information. However, I'm not against that capability entirely. Imagine the ability to remap an op number to a new custom opcode at runtime. This would allow us to write tools that can attach to live PIR code such as memory usage analyzers, profilers, watchdog monitors, etc. Of course, in most cases this capability would horribly crash the program if used incorrectly, but in the right hands it has much potential. This, if it happens at all, is a long way off.

These two PMC types are still immature but they, along with the ever-improving Packfile PMCs, are already starting to enable some cool new applications. We don't quite expose all the information yet that we need to do complete compilation or decompilation, and some improvements are needed in Parrot itself to fill in some of the remaining gaps, but we are getting closer.

Before the 3.0 release I think we will have a PIR/PASM compiler that runs on top of Parrot natively. This could be written in PIR, of course, or one of the other cool developing languages such as NQP, Winxed, or something else. With this, we could cut IMCC out of the loop almost entirely if we wanted. We could also easily come up with new assembly languages or language dialects for interacting with Parrot. My dislike for PIR is not a secret, so the ability to come up with another, better, assembly language for working with Parrot is an idea that makes me very happy.

by Whiteknight (noreply@blogger.com) at February 21, 2010 14:05 UTC

February 20, 2010

Jonathan WorthingtonUnpacking data structures with signatures

My signature improvements Hague Grant is pretty much wrapped up. I wrote a couple of posts already about the new signature binder and also about signature introspection. In this post I want to talk about some of the other cool stuff I've been working on as part of it.

First, a little background. When you make a call in Perl 6, the arguments are packaged up into a data structure called a capture. A capture contains an arrayish part (for positional parameters) and a hashish part (for smok^Wnamed parameters). The thing you're calling has a signature, which essentially describes where we want the data from a capture to end up. The signature binder is the chunk of code that takes a capture and a signature as inputs, and maps things in the capture to - most of the time, anyway - variables in the lexpad, according to the names given in the signature.
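For readers who know Python better than Perl 6, the standard library offers a loose analogy to this binding step: `inspect.signature(...).bind(...)` takes positional and named arguments and maps them onto parameter names. The sketch below uses it to play the role of the binder; the `bind` and `can_bind` helpers and the `greet` routine are made up for the example, and the analogy ignores most of what the Perl 6 binder does (types, constraints, slurpies and so on).

```python
import inspect

def bind(routine, positional, named):
    """Map a 'capture' (positional + named parts) onto a routine's signature."""
    signature = inspect.signature(routine)
    bound = signature.bind(*positional, **named)  # raises TypeError on mismatch
    return dict(bound.arguments)

def can_bind(routine, positional, named):
    try:
        bind(routine, positional, named)
        return True
    except TypeError:
        return False

def greet(name, *, punctuation="!"):
    return "Hello, " + name + punctuation

lexpad = bind(greet, ["world"], {"punctuation": "?"})
print(lexpad)                   # {'name': 'world', 'punctuation': '?'}
print(can_bind(greet, [], {}))  # False
```

The resulting dictionary plays the part of the lexpad: names from the signature mapped to values from the capture.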

Where things get interesting is that if you take a parameter and coerce it to a Capture, then you can bind that too against a signature. And it so turns out that Perl 6 allows you to write a signature within another signature just for this very purpose. Let's take a look.

multi quicksort([$pivot, *@values]) {
    my @before = @values.grep({ $^n < $pivot });
    my @after = @values.grep({ $^n >= $pivot });
    (quicksort(@before), $pivot, quicksort(@after))
}
multi quicksort( [] ) { () }

Here, instead of writing an array in the signature, we use [...] to specify we want a sub-signature. The binder takes the incoming array and coerces it into a Capture, which essentially flattens it out. We then bind the sub-signature against it, which puts the first item in the incoming array into $pivot and the rest into @values. We then just partition the values and recurse.

The second multi candidate has a nested empty signature, which binds only if the capture is empty. Thus when we have an empty list, we end up there, since the first candidate requires at least one item to bind to $pivot. Multi-dispatch is smart enough to know about sub-signatures and treat them like constraints, which means that you can now use multi-dispatch to distinguish between the deeper structure of your incoming parameters. So, to try it out...

my @unsorted = 1, 9, 28, 3, -9, 10;
my @sorted = quicksort(@unsorted);
say @sorted.perl; # [-9, 1, 3, 9, 10, 28]
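The binding and dispatch behaviour described above can be modelled in a few lines of Python. This is a sketch under loud assumptions, not how Rakudo implements it: `Capture`, `Param`, `bind`, and `dispatch` are hypothetical names, only positional parameters are handled, types are ignored, and candidates are tried in declaration order, whereas a real multi-dispatcher sorts them by narrowness.

```python
from dataclasses import dataclass, field


@dataclass
class Capture:
    positional: list = field(default_factory=list)
    named: dict = field(default_factory=dict)


@dataclass
class Param:
    name: str = None            # variable to bind, or None if anonymous
    slurpy: bool = False        # like *@values
    sub_signature: list = None  # nested signature, like [$pivot, *@values]


class BindError(Exception):
    pass


def bind(signature, capture, lexpad=None):
    """Map a capture's positional part onto the names in a signature."""
    lexpad = {} if lexpad is None else lexpad
    remaining = list(capture.positional)
    for param in signature:
        if param.slurpy:
            value, remaining = remaining, []
        elif remaining:
            value = remaining.pop(0)
        else:
            raise BindError(f"no argument for {param.name!r}")
        if param.name is not None:
            lexpad[param.name] = value
        if param.sub_signature is not None:
            # Coerce the argument to a capture (flattening the list) and recurse.
            bind(param.sub_signature, Capture(positional=list(value)), lexpad)
    if remaining:
        raise BindError("too many positional arguments")
    return lexpad


def dispatch(candidates, *args):
    """Call the first candidate whose signature, sub-signatures included, binds."""
    for signature, body in candidates:
        try:
            lexpad = bind(signature, Capture(positional=list(args)))
        except BindError:
            continue
        return body(**lexpad)
    raise BindError("no candidate matched")


def sort_nonempty(pivot, values):
    before = [n for n in values if n < pivot]
    after = [n for n in values if n >= pivot]
    return quicksort(before) + [pivot] + quicksort(after)


def sort_empty():
    return []


QUICKSORT = [
    # multi quicksort([$pivot, *@values])
    ([Param(sub_signature=[Param("pivot"), Param("values", slurpy=True)])], sort_nonempty),
    # multi quicksort([])
    ([Param(sub_signature=[])], sort_empty),
]


def quicksort(values):
    return dispatch(QUICKSORT, values)


sorted_values = quicksort([1, 9, 28, 3, -9, 10])
print(sorted_values)
```

An empty list fails to bind against the first candidate because nothing is left for `pivot`, so dispatch falls through to the second. That is the sense in which a sub-signature acts as a constraint.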

It's not just for lists either. An incoming hash can be unpacked as if it had named parameters; for that, write the nested signature in (...) rather than [...] (we could have used (...) above too, but [...] implies we expect to be passed a Positional). For any other object, we coerce to a capture by looking at all of the public attributes (things declared has $.foo) up the class hierarchy and making those available as named parameters. Here's an example.

class TreeNode { has $.left; has $.right; }
sub unpack(TreeNode $node (:$left, :$right)) {
    say "Node has L: $left, R: $right";
}
unpack(TreeNode.new(left => 42, right => 99));

This outputs:

Node has L: 42, R: 99
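The coercion step for hashes and objects can be sketched the same way in Python. This is an illustration only: `to_named_capture` and `unpack` are hypothetical helpers, Python's leading-underscore convention stands in for the public/private attribute distinction, and a missing named parameter simply becomes `None`.

```python
class TreeNode:
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self._cache = None  # leading underscore: treated as private in this sketch


def to_named_capture(value):
    """Coerce a hash-like value or a plain object into named arguments."""
    if isinstance(value, dict):
        return dict(value)
    # vars() sees instance attributes set anywhere in the class hierarchy.
    return {name: attr for name, attr in vars(value).items()
            if not name.startswith("_")}


def unpack(value, *names):
    """Bind the requested named parameters; absent ones become None."""
    named = to_named_capture(value)
    return {name: named.get(name) for name in names}


lexpad = unpack(TreeNode(left=42, right=99), "left", "right")
message = "Node has L: {left}, R: {right}".format(**lexpad)
print(message)
```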

You can probably imagine that a multi and some constraints on the branches give you some interesting possibilities for writing tree traversals. Another fun thing is that you can unpack return values too. When you write things like:

my ($a, $b) = foo();

Then you get list assignment. No surprises there. What may surprise you a bit is that Perl 6 actually parses a signature after the my, not just a list of variables. There are a few reasons for that, not least that you can put different type constraints on the variables too. I've referred to signature binding a lot, and it turns out that if you write the binding operator instead of the assignment operator, you get signature binding semantics. Which means...you can do unpacks on return values too. So assuming the same TreeNode class:

sub foo() {
    return TreeNode.new(left => 'lol', right => 'rofl');
}
my ($node (:$left, :$right)) := foo();
say "Node has L: $left, R: $right";

This, as you might have guessed, outputs:

Node has L: lol, R: rofl

Note that if you didn't need the $node, you could just omit it (but keep the things that follow nested in another level of parentheses).

It works for some built-in types with accessors too:

sub frac() { return 2/3; }
my ((:$numerator, :$denominator)) := frac();
say "$numerator, $denominator";

Have fun, be creative, submit bugs. :-)

by JonathanWorthington at February 20, 2010 00:21 UTC

February 19, 2010

Andrew Whitworth
Argument Passing Refactors

On Tuesday it was decided that the next round of PCC refactors should start this sprint. Allison created a branch for the task, after having created a detailed tasklist for it in the previous weeks. To understand the point of the refactor, I first need to describe the system as it is now.

When we make a function or method call in Parrot, we use fancy-schmancy PIR that looks like this:

($P0, $I0) = foo(1, 2.0, $S0)

This looks all well and good, and certainly makes the programmers happy to see familiar syntax. Internally, this call is anything but pretty. In PIR, we can construct a call using a more verbose syntax with some compiler directives:

.const 'Sub' foo = 'foo'
.begin_call
.set_arg 1
.set_arg 2.0
.set_arg $S0
.call foo
.get_result $P0
.get_result $I0
.end_call

This is much worse in terms of syntax and verbosity, but at least it makes good explicit sense: We find the sub object, we get the arguments, we call the function, then we get the result values. This seems all well and good, but this isn't the bottom layer of the cake. These things above are IMCC compiler directives, not actual bytecode. The actual bytecode of the file looks much more like this:

$P97 = find_name "foo"
$P98 = new ['String']
$P98 = "0x0010,0x0013,0x0001"
set_args $P98, 1, 2.0, $S0
$P99 = new ['FixedIntegerArray']
$P99[0] = 0x02
$P99[1] = 0x00
get_results $P99, $P0, $I0
invokecc $P97

There are a few things we can immediately see about this code listing that are a little bit obnoxious. I'll list them out in no particular order:
  1. get_results is called before invokecc. This means we are preparing to retrieve results before we've even called the function. The actual process of copying returns from the callee into the caller happens inside the callee. This creates a fundamental disconnect in a system that is supposed to be continuation-based.
  2. set_args takes a String PMC containing a string of hex values, which hold the flags corresponding to each argument. Inside set_args, that string needs to be painstakingly parsed to get a proper array of flags.
  3. The set_args and get_results opcodes both take variadic argument lists. It's impossible for something like a bytecode disassembler to figure out how much memory the opcode takes up without reading the first argument and determining how many flags are specified.
Allison's current branch is intending to address #1. She's going to reverse the logic so that results are collected after the returns are passed. This will allow us to unify the code paths that handle function calls and returns into a single function. Hopefully this will lead to a few optimizations.
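Points #2 and #3 can be made concrete with a short Python sketch. Every name and the flag layout here are assumptions, inferred only from the listing above: the low four bits are taken as the register type, 0x10 as a "constant" marker, and the length formula assumes one word each for the opcode, the flags operand, and every argument.

```python
# Assumed flag layout, inferred from the listing (not taken from Parrot headers).
ARG_TYPE_MASK = 0x0F
ARG_CONSTANT = 0x10
TYPE_NAMES = {0x0: "int", 0x1: "string", 0x2: "pmc", 0x3: "float"}


def parse_flag_string(text):
    """Parse the comma-separated hex string form that set_args receives."""
    return [int(piece, 16) for piece in text.split(",")] if text else []


def describe(flag):
    """Turn one flag word into a readable description."""
    kind = TYPE_NAMES[flag & ARG_TYPE_MASK]
    return f"{kind} constant" if flag & ARG_CONSTANT else f"{kind} register"


def op_length(flags):
    """Words used by a variadic op: opcode + flags operand + one per argument."""
    return 2 + len(flags)


# set_args carries its flags as a string that has to be parsed on each call...
arg_flags = parse_flag_string("0x0010,0x0013,0x0001")
# ...while get_results already carries them as an integer array.
result_flags = [0x02, 0x00]

arg_kinds = [describe(flag) for flag in arg_flags]
result_kinds = [describe(flag) for flag in result_flags]
print(arg_kinds, op_length(arg_flags))
print(result_kinds, op_length(result_flags))
```

Under these assumptions the decoded flags line up with the call `($P0, $I0) = foo(1, 2.0, $S0)`. A disassembler cannot know an op's length until it has looked inside the flags operand.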

#2 and #3 above are a little disconcerting for a variety of reasons. First, we have all the necessary information about the call at compile time. We have the number and types of the arguments, and all the associated flags that govern what they are and how they are used. All this information is passed directly to set_args, which uses it to build a CallContext PMC.

To recap, we have all the information we need to build the CallContext PMC at compile time.

So let's ignore for a second how stupid it is to iterate character-by-character over a String PMC to get the flags, when it's obvious that the results mechanism uses a much better suited integer array for the same purpose. The question isn't how we store the flags in the bytecode; it's why we're bothering to store them separately at all. Why don't we create a CallContext PMC constant, or maybe some new kind of "CallArguments" PMC constant at compile time, cache it in the bytecode in exactly the form we need the data to be in, and use that when performing calls?
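A rough Python sketch of that proposal follows: do the parsing once at compile time, store the finished object in a constant table, and have the runtime do nothing but a lookup. `CallSignature`, `ConstantTable`, `compile_call`, and `run_call` are hypothetical names, and the sketch says nothing about how Parrot's packfile constants are actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSignature:
    """Pre-digested call information, built once per call site."""
    flags: tuple


class ConstantTable:
    """Toy constant pool that stores each distinct constant only once."""

    def __init__(self):
        self._constants = []
        self._index = {}

    def intern(self, constant):
        if constant not in self._index:
            self._index[constant] = len(self._constants)
            self._constants.append(constant)
        return self._index[constant]

    def __getitem__(self, index):
        return self._constants[index]


def compile_call(table, flag_text):
    """Compile time: parse the flags once and cache the finished signature."""
    flags = tuple(int(piece, 16) for piece in flag_text.split(","))
    return table.intern(CallSignature(flags))


def run_call(table, signature_index, args):
    """Run time: no string parsing, just a lookup that pairs flags with arguments."""
    signature = table[signature_index]
    return list(zip(signature.flags, args))


table = ConstantTable()
first = compile_call(table, "0x0010,0x0013,0x0001")
second = compile_call(table, "0x0010,0x0013,0x0001")  # same shape, same constant
bound = run_call(table, first, [1, 2.0, "hello"])
print(first, second, bound)
```

Call sites with the same argument shape end up sharing one constant, so the per-call cost drops to an index lookup.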

The question is a rhetorical one, and I've opened a ticket to suggest we bring a little bit of sanity to this code and maybe see some serious performance wins as well. Since Allison is already working on this code, it should be pretty easy to build on that momentum and fix the last major wart that the calling code has.

by Whiteknight (noreply@blogger.com) at February 19, 2010 13:01 UTC

Andrew Whitworth
Special Release: Parrot 2.1.1

Earlier today we got a bug report from the #perl6 folks that Parrot was leaking memory. chromatic put a fix together and it was decided to cut an emergency bug-fix release. Since Rakudo bases its releases on the previous Parrot release, and they can't really put out a release that is known to leak memory, they now have the special Parrot 2.1.1.

So if you were using 2.1.0 for your application and would like to plug a memory leak in long-running programs, please update to 2.1.1 instead.

by Whiteknight (noreply@blogger.com) at February 19, 2010 11:41 UTC

chromatic
Perl 6 Design Minutes for 10 February 2010

The Perl 6 design team met by phone on 10 February 2010. Larry, Patrick, Will, Jerry, and chromatic attended.

Will:

  • working on simplifying Parrot's build process
  • trying to remove an invocation of Perl 5 for every compilation
  • it's old and a waste of many things
  • hope to have that removed by the end of the week

Jerry:

  • the new #ps time should help me to attend
  • looking forward to a Parrot/Rakudo workshop, possibly at YAPC::NA
  • already working on artwork
  • would like to get the command-line done for Rakudo *
  • lacking tuits
  • need some time with Patrick over the next few days
  • weekends should free up after next week

Larry:

  • refined the specified semantics of bitwise operators
  • changed ugly **() special form to prefix:<||> by analogy to prefix:<|>, and relationship of ** to *.
  • STD now accepts prefix || for slice interpolation
  • deleted old p5=> that masak++ noticed
  • added explicit copyright notices to STD files
  • spruced up error message on -> in postfix position (either pointy block or Perl 5 method dereference)
  • mostly just served as Chief Resident Oracle on IRC

Patrick:

  • had a nice vacation in Florida
  • didn't have as much hacking time, due to plane delays
  • should get back to coding later today
  • working on the Rakudo hackathon in Copenhagen on March 6 and 7
  • core hackers session on 8th and 9th there
  • looking forward to that

c:

  • fixed a couple of bugs
  • did a bit of optimization
  • wrote out a GC optimization plan
  • wrote plan for a sweep free GC
  • think we can get those both going in the next week

Jerry:

  • noticing a lot of new branches and removals and new things in Parrot recently
  • are these following the roadmap?
  • are people going off on their own?

Will:

  • the deprecation stuff is all documented and seems reasonable
  • Andrew's discussion today is new stuff, but a reasonable discussion to have
  • I'm working on cleanup stuff
  • having a roadmap and trying to force people to stick to it is always... impossible
  • people will work on what they find shiny or what blocks them
  • if it's not on the roadmap, it's okay if it's not hurting the project

Jerry:

  • we've changed our deprecation cycle
  • was that change enough to unstick people to do something?
  • was it beneficial to our users and our core developers?

Will:

  • definitely a positive

Jerry:

  • still not a lot of mailing list discussion
  • how is Parrot meeting Rakudo's goals for the Rakudo * release?

Patrick:

  • as it stands today, it's adequate for what we need
  • if it weren't, you'd be hearing about it
  • the next thing for us is performance
  • any performance improvements are welcome
  • the biggest thing there is GC, and that's an area of focus
  • no big pushes I need to make lately
  • have noticed Andrew's desire to remove some Parrot features
  • they're useful from an HLL perspective
  • I do worry about changes to core Parrot divorced from HLL concerns
  • I don't know who's going to be the traffic cop for those changes
  • I don't have time to do it

Will:

  • based on the discussion in channel today
  • making Parrot leaner, faster, smaller may not necessarily jibe with keeping the features as they exist now
  • he's not trying to remove features
  • he's trying to get the same effect with a faster Parrot

Patrick:

  • I agree with those motives

Will:

  • even if we do rewrite things, they have to work more or less as they do right now

Patrick:

  • reviewing the roadmap from December....
  • GC work is happening
  • no one seems to work on subroutine leave semantics
  • Stephen Weeks is the best one to look at that
  • performance is our biggest need right now
  • but the -ng branch performs better for various reasons
  • has anyone built -ng against the latest Parrot?

Will:

  • I think Vasily has checked his branches against Rakudo
  • not sure if that was against -ng

Patrick:

  • master and -ng are pretty close together in terms of the Parrot core
  • we'll make -ng the master branch very soon
  • unless I get bogged down on iterators again

by chromatic at February 19, 2010 03:31 UTC

February 18, 2010

chromatic
Perl 6 Design Minutes for 03 February 2010

The Perl 6 design team met by phone on 03 February 2010. Larry, Patrick, and chromatic attended.

Larry:

  • more cleanup of iteration semantics
  • no longer signal end with Nil, but with special EMPTY failure
  • this can support either unthrown or thrown exception styles
  • added in batching iterator interface
  • proposed new E operator for efficient list end detection; gathering feedback
  • detangling of sigils from contexts; for example, @ no longer implies flattening
  • coercions all defined to take parcels so they don't flatten accidentally
  • more cleanup of various types (captures,lists) that should be considered parcels
  • forcibly amputated the @@ sigil; have fixed up most of the bloody stumps
  • instead of *@@ parameters, we now have a ** slice marker on parameters
  • removed references to [;] reduction since it wouldn't work (because of return parcel embedding)
  • new **() interpolator instead
  • clarified that function calls in a list are called eagerly, but their results are potentially lazy
  • (also mentioned ways to make the call lazy too)
  • renamed iterator methods for more clarity, removing contradictory usages of "item"
  • iterators now iterated with get, getobj, batch, and batchobj
  • specced that a missing maximum allows the iterator to decide batch size.
  • get and getobj must be atomic under multi-threading so message queues work (but maybe that's backwards, and push should be atomic)
  • slice now defined to turn subparcels into Seq objects
  • spec that most of the work of flat and slice are done by binding to *@ or **@
  • new flat operator detangles flattening semantics from normal unmarked list semantics
  • for all specced functions, *@@ parameters changed to **@
  • multiple dimensions now defined in terms of nested parcels, not feeds, to avoid implying multithreading on every subscript
  • either range or series iterator now autotruncates in a subscript
  • no autotruncation on left end of a subscript anymore
  • did some cleanup of feeds; more is needed to have clearer target semantics
  • feeds no longer take a whatever target with implicit semantics; just use an explicit target
  • not much hacking, but edited tests to change @@ to something else appropriate
  • tracked name changes in CORE
  • wrote a long screed on why Perl 6 has one-pass parsing and why typenames must be pre-declared

Patrick:

  • working on iterators and lists in the -ng branch
  • brought up a few issues with Larry as appropriate
  • took issue with others, as appropriate
  • happy with our progress there
  • expect to make this branch the new master in the next day or so
  • will be some regressions, but it's time to do it
  • there's no development taking place on other branches, so let's commit and do it
  • people will be comfortable about doing their own work and not having it lost on some other branch

c:

  • looking into GC tuning and ideas
  • still working on getting methods out of namespaces
  • need four uninterrupted hours

by chromatic at February 18, 2010 04:58 UTC

Jonathan Worthington
The first release from ng is coming!

Tomorrow's regularly scheduled Rakudo release is the first one since the long-running "ng" branch became master. It represents both a huge step forward and at the same time a fairly major regression. Internally, the changes are enormous; some of the biggest include:

  • We're parsing using a new implementation of Perl 6 regexes by pmichaud++. It is a huge improvement, supporting amongst other things protoregexes, a basic form of LTM, variable declarations - including contextuals - inside regexes and more. The AST it generates is part of the PAST tree rather than having a distinct AST, which is a neater, more hackable approach. The issues with lexical scopes and regexes are resolved. Closures in regexes work.
  • NQP has also been rebuilt on top of this. It incorporates regex and grammar support, so now we run both grammar and actions through the one compiler. It's bootstrapped.
  • In light of those major changes, we started putting the grammar back together from scratch. A large part of this was copy and paste - from STD.pm. The grammar we have now is far, far closer to STD than what we had before. Operator precedence parsing is handled in the same kind of way. We've started to incorporate some of the nice STD error detection bits, and catch and nicely report some Perl 5-isms.
  • Since the grammar got re-done, we've been taking the same approach with the actions (the methods that take parse tree nodes and make AST nodes). Thanks to contextual variable support and other improvements, a lot of stuff got WAY cleaner.
  • The list/array implementation has been done over, and this time it's lazy. There's certainly rough edges, but it's getting better every day. The work to implement laziness has led to many areas of the spec getting fleshed out, too - a consequence of being the first implementation on the scene I guess.
  • All class and role construction is done through a meta-model rather than "magic". The Parrot role composition algorithm is no longer relied upon; instead we have our own implementation, mostly written in NQP.
  • The assignment model was improved to do much less copying, so we should potentially perform a bit better there.
  • Lexical handling was refactored somewhat, and the changes should eliminate a common source of those pesky Null PMC Access errors.

Every one of these - and some others I didn't mention - is important for getting us towards the Rakudo * release. The downside is that since we've essentially taken Rakudo apart and put it back together again - albeit on far, far better foundations - we're still some way from getting back all of the language constructs, built-in types and functions that we had before. It's often not just a case of copy-paste; many of the list-related things now have to be written with laziness in mind, for example.

So anyway, if you download tomorrow's release and your code doesn't compile or run, this post should explain - at least at a high level - why. After a slower December and January, Rakudo development has once again picked up an incredible pace, and the last couple of weeks' efforts by many Rakudo hackers have made this release far better than I had feared it was going to be. If we can keep this up, the March release should be a very exciting one.

by JonathanWorthington at February 18, 2010 01:18 UTC

February 16, 2010

Andrew Whitworth
Parrot 2.1.0 Released

Just a few moments ago Daniel Arbelo (darbelo) released Parrot 2.1.0 "As Scheduled". It's always nice to see a software project released on time and under budget!

Parrot 2.2.0 is scheduled for 16 March, and will be released by Christoph Otto (cotto).

by Whiteknight (noreply@blogger.com) at February 16, 2010 14:52 UTC

(Last updated: March 13, 2010 12:00 GMT)