On Topic
Exploring "topic" and "topicalizers" in Perl 6
Conference: YAPC::NA
Location: Saint Louis, Missouri
Slides: PDF
Transcript: A version of this talk was published as an article on Perl.com. An archive copy is preserved below.
On Topic
30 October 2002
A few concepts in Perl 6 are strange at first sight. They seem hard to understand, but it’s only because they’re new and different. They aren’t deep mystical concepts known only to Tibetan lamas. Anyone can understand them, but it helps to start with a common-sense explanation.
This article looks at the concepts of “topic” and “topicalizer”. The words aren’t quotes from a particularly nasty bit of Vogon poetry. They’re actually common terms from the field of linguistics … which some might say is even worse. Still, the best way to understand topic in Perl is to understand its source.
Topic in Linguistics
Every larger unit of human language has a topic – whether it’s a sentence, a paragraph, a conversation or some other sizable chunk. The topic is the central idea of the unit. It’s the focus of what’s communicated. Native speakers usually have no trouble figuring out what the current topic is, when they think about it. If two little old ladies were talking over a cup of tea:
“I saw Lister yesterday.”
“Really? What’s he up to these days?”
“Oh, you know, drunk again, and mooning over that awful Krissy Kochanski.”
etc …
and someone asked an observer what the conversation was about, they would instantly reply “Lister”.
Topicalizers in Linguistics
A topicalizer is simply a word that marks some thing or some idea as the current topic. In English, we have topicalizers such as “for”, “given” and “regarding”:
“For our first trick tonight, ladies and gentlemen, my partner Kryten will attempt to eat a boiled egg.”
“Given that God is infinite, and that the universe is also infinite, would you like a toasted tea-cake?”
“Regarding topicalizers, I should point out that this sentence starts with one.”
Topic in Perl
Now we need to adapt the linguistic definition of topic to Perl. In Perl, the topic is the most important variable in a block of code. It can be any variable: a scalar, array, hash, object. To be more accurate, it’s the underlying storage location that’s the topic. This might sound a little too abstract, but it’s an important distinction. Variables are really only names we use to get at stored values. A single storage location can have multiple names. In English, “Rimmer”, “he” and “the hologram” could all appear in a text meaning the same person. In Perl $_
, $name
, %characters{‘title'}
and any number of other variables could appear in a section of code as different ways of accessing a single value. And if that value is the current topic then all the variables connected to it are too. This will be important later.
At this point, the average reader is thinking “That’s very interesting, but why do I care what the topic is? I’ve gotten along just fine without it all these years. Why start now?”
And the answer is: it’s not required. No one has to understand topic to use it, any more than they have to understand gravity to catch a ball.
Why? It’s a really simple rule. We’ll call it the first law of topics: Topic is $_
. Any time a value becomes the current topic, $_
becomes another one of the names for that value. We say $_
is aliased to it. So, all it takes to use topic is to use $_
either explicitly or implicitly in all the old familiar places, like chomp
, print
and substitutions, and in a few new places, like when
statements and with unary dot.
Even so, it’s still a good idea to understand topic. Understanding gravity makes a number of things that seem unrelated suddenly fit. Things like apples falling, planes crashing, the way the moon and sun move, baseball, and rollercoasters. It’s the same with topic. Any programmer can use $_
without understanding topic. But when they understand topic, it becomes a logical system instead of a random collection of “things $_
does”. I like logical systems.
Topicalizers in Perl
A topicalizer in Perl is a keyword or construct that makes a value become the current topic. The current topicalizers are given
, for
, method
, rule
, ->
, certain bare closures, and CATCH
blocks, but the cast of characters keeps growing.
Coal and Switches
Perl 6’s switch, the given
construct, is the prime example of a topicalizer. Its sole purpose is to make the value passed to it the current topic within the block.
given $name {
when "Lister" { print "I'm a curryaholic." }
when "Cat" { print "Orange?! With this suit?!" }
when "Rimmer" { print "4,691 irradiated haggis." }
}
So, in this example, the given
makes $name
the current topic by aliasing it to $_
. Then the when
statements compare against $_
. After the block, $_
regains the value in the outer scope. In Perl 6, $_
is just an ordinary lexical variable and every topicalizer creates its own $_
variable, lexically scoped to its associated block.
Fruit Loops and M&M’s
The for
loop is the classic topicalizer. It was topicalizing long before most of us had a word for the activity. for
is similar to given
, but instead of creating a single topic, it creates a series of topics, one for each iteration.
for @orders {
when /scone/ {
print "Would you like some toast?";
}
when /croissant/ {
print "Hot, buttered, scrummy toast?";
}
when /toast/ {
print "Really? How about a muffin?";
}
}
Just like given
, for
takes a value, in this case the current element of the array, and makes it the topic.
In simple cases like these, both for
and given
create the $_
alias read-write. This is the same as Perl 5: any change to $_
inside the block modifies the original value.
Bow and Arrow
The new and improved arrow (->
) is the most flexible topicalizer. It appears in a variety of different contexts. By itself ->
creates an anonymous subroutine just like sub
.
-> $param { ... }
# is the same as:
sub ($param) { ... }
The only differences are that ->
doesn’t require parentheses around its parameter list, and that ->
topicalizes its first parameter.
In the following example, the first expression creates an anonymous sub and stores it in $cleanup
. When the sub stored in $cleanup
executes, the $line parameter takes the string argument and becomes the current topic, so both $line
and $_
are aliased to the value of $intro
. The usual suspects, chomp
, substitution and print
then use the topic as default.
$cleanup = -> $line is rw {
s:w/Captain Rimmer!/the bloke/;
$line _= " who cleans the soup machine!";
print;
}
$intro = "Fear not, I'm Captain Rimmer!";
$cleanup($intro);
Unlike the simple for
and given
, the arrow creates its aliases read-only by default. The is rw
property marks both the named alias and the $_
alias as read-write. Without the property attached, any statements within the block that modify $line
or $_
cause a compile-time error just as if they had been explicitly flagged is constant
.
The arrow isn’t limited to working alone. It can also combine with other topicalizers. When it does, it creates a named alias for the current topic.
for @lines -> $line is rw {
s:w/Captain Rimmer!/the bloke/;
$line _= " who cleans the soup machine!";
print;
}
As the for
iterates over the array it aliases every element in turn to $line
and to $_
. This takes the place of the Perl 5 way of aliasing a loop variable:
# Perl 5
for my $line (@lines) {
$line =~ s/Captain Rimmer!/the bloke/;
$line .= " who cleans the soup machine!";
print $line;
}
The Perl 6 way has some added benefits, though. Since the arrow aliases both $line
and $_
to the current value, it works with the defaulting constructs, like print
, but also provides a more meaningful name than $_
when an explicitly named variable is necessary.
The first example of for
and the example of the anonymous sub reference are fascinatingly similar. The only difference is one is stored in a variable to be called later and one is tacked onto a for
. Really, all the for
example has done is replace the loop’s block with an anonymous sub reference. This is the second advantage of the Perl 6 way. Because $line
is now the parameter of a subroutine, it’s automatically lexically scoped to the block. The my
happens implicitly.
The arrow also combines with constructs that aren’t topicalizers, like if
and while
, and allows them to topicalize.
if %people{$name}{'details'}{'age'} -> $age {
print "$age already?\n";
if $age > 3000000 {
print "How was your stasis?\n";
} elsif $age < 10 {
print "How 'bout a muffin?\n";
}
}
As the if
tests the truth value of the element of the data structure, the arrow also aliases that value to $age
and to $_
. The example could have accessed the hash of hashes of hashes directly each time it needed the age value, but the short alias is much more convenient.
This feature is really only useful with simple truth tests. The truth value tested in the following example isn’t 3
or $counter
, it’s the result of a complex conditional, $counter > 3
.
if $counter > 3 -> $value {
# do something with $value
}
The result will be true or false, but if it’s false, the block will never execute. In fact, when the truth test is false, $value
is never aliased at all. It simply doesn’t exist. A lexically scoped variable with a true value would have the same effect.
if $counter > 3 {
my $value = 1;
# do something with $value
}
So the arrow isn’t a Ronco plum-pitter, yoghurt-squirter, do-everything tool. When it’s useful, it’s very, very useful, but when it’s not… well… don’t use it. :)
Method in my Madness
Methods topicalize their invocant. An invocant is the object on which a method is called. After saying that 10 times fast, anyone can see why it needed a name. The design team chose “invocant”. Methods topicalize the invocant when it’s a named parameter like $self
:
method sub_ether ($self: $message) {
.transmit( .encode($message) );
}
and when it’s left implicit:
method sub_ether {
.transmit( .encoded_message );
}
method sub_ether (: $message) {
.transmit( .encode($message) );
}
This is handy in short methods. The unary dot is just another defaulting construct. Any method call without an explicit object executes on the current topic. In the previous examples, .transmit
is exactly the same as $self.transmit
and $_.transmit
.
The Sub of All Fears
Unlike methods and the arrow, ordinary subs don’t topicalize, at least not by default. The following example will either print nothing, or else print a stray $_
that is in lexical scope wherever the sub is defined.
sub eddy ($space, $time) {
print;
}
Subs can topicalize, though, with a little help from the is topic
property. The property flags a parameter as the topic within the subroutine’s block. It can be attached to any parameter in the list, but not to more than one parameter at a time.
sub eddy ($space, $time is topic) {
print;
}
Built-in functions like print
that default to the current topic when they’re called are incredibly useful. Wouldn’t it be great to have user-defined subroutines that behaved the same way? But, since $_
is now just an ordinary lexical variable, a subroutine generally can’t access the topic from its caller. It can only access variables in the scope where it is defined.
The is given
property gives subroutines access to their caller’s topic. It accesses the topic within the caller’s scope and binds it to the property’s parameter.
sub print_quotes is given($default) {
print "Random Quote: ";
print $default;
}
...
given $quote {
print_quotes;
}
The is given
property can appear on subroutines with a full parameter list as well. The only restriction is that the property’s parameter can’t have the same name as a parameter in the full list.
sub print_quotes (*@quotes) is given($default) {
print "Random Quote: ";
if ( @quotes.length > 0 ) {
print @quotes;
} else {
print $default;
}
}
(A true simulation of the built-in print function would use multimethod dispatch, but that’s outside the scope of this article.)
The property’s parameter can have any name, but a parameter named $_
or one that has the is topic
property attached will set the caller’s topic as the topic within the subroutine as well.
sub print_quotes is given($_) {
# alias $_ to caller's $_
print; # prints the value of caller's $_
}
# or
sub print_quotes is given($default is topic) {
# alias $default to caller's $_
# and to $_ within the subroutine
print; # prints $default, the value of caller's $_
}
Perl Rules!
Grammar rules and closures within a rule topicalize their state object. This is convenient because it means methods on the state object can use the unary dot syntax.
m:each/ aardvark { print .pos } /
The state object for a rule is similar to the $self object for a method. It’s an instance of a grammar class. Named rules are really just named methods called on the state object and anonymous rules and closures within a rule are really just anonymous methods called on the state object. Unfortunately, that’s just enough information to be tantalizing without actually being useful, but a full explanation could take up an entire article. Still, knowing that the state object is like a $self object is a step in the right direction.
The CATCH
-er in the try
CATCH
blocks always topicalize the error variable, $!
. This streamlines the exception catching syntax because the CATCH
block acts as its own switch statement.
CATCH {
when Err::WrongUniverse {
try_new_universe();
}
}
This is much tidier than the equivalent:
CATCH {
if $!.isa(Err::WrongUniverse) {
try_new_universe();
}
}
The Bare Truth
Bare closures topicalize their first argument. If the block uses placeholder variables, the topic is also aliased to the Unicode-abetically first variable. The topic is lexically scoped to the block, but it is a read-write parameter, so modifications to $_
within the block modify the original value.
%commands = (
add => { $^a + $^b },
incr => { $_ + 1 },
);
Constructs like grep
and map
are no longer special because they use $_
within the block argument. They simply benefit from the normal behavior of bare blocks.
@names = map { chomp; split /\s*/; } @input;
Nesting Instinct
Nested topicalizers are a little more complicated. The following example starts with $name
as the topic and the first case matches against it. Within the case, the print
also defaults to the current topic. The second case is a little more complicated; it contains a loop. Within the loop is another print
statement. This one also defaults to the current topic which is… Hmmmm… is the topic $name
or $quote
?
given $name {
when /Rimmer/ {
print;
print rimmer_quote();
}
when /Kryten/ {
for kryten_quotes() -> $quote {
print;
}
}
}
The answer falls out of a few simple rules:
- There is only one topic at a time.
A series of nested topicalizers doesn’t create a collection of topics. The interpreter doesn’t have to sort through a complicated set of options to know what the current topic is. There will never be any ambiguity. A script or module may have a series of different topics, but only one at a time.
- Topic obeys the lexical scope of topicalizers.
For the programmer, determining the current topic is never any more complex than tracing outward to the nearest topicalizer. The scope of the topic is always restricted to the scope of the topicalizer that created it. So the example above will print the inner topic, $quote
.
An equivalent pair of nested topicalizers in Perl 5 would have printed the outer topic instead. That’s because topicalizers never created a $_
alias at the same time as a named alias. It’s a fairly common trick in Perl 5 to use a named alias with a for
loop to avoid clobbering $_
. That trick doesn’t work anymore. Which brings us to the third rule:
- To keep outer topics, use a named alias.
It’s just the old trick inverted. Instead of using a named alias with the inner topicalizer to avoid clobbering the outer topic, it uses a named alias with the outer topicalizer to access it after the topic has changed.
when /Kryten/ {
for kryten_quotes() -> $quote {
print $name;
print;
}
This really fits better with the way humans think. It makes more sense to give a name to the thing worth keeping than to give a name to the thing to be thrown away.
Methods have the same problem, but in their case it means that even though the unary dot is really handy in simple methods, in more complex methods with nested topicalizers it’s better to use a named parameter for the invocant.
method locate ($self: *@characters) {
.cleanup_names(@characters);
for @characters -> $name {
.display_location($name);
}
.change_location('Holly');
}
The .display_location
method won’t call a method on $self
, it will try to call a method on $name
, and fail (unless $name is an object with a .display_location
method). The code will have to call the method as $self.display_location()
instead. It would be clearer to add $self
in front of the other method calls as well, but that’s entirely a matter of style.
Multiple aliases
Another complication is that topicalizers aren’t restricted to a single alias. A for
loop might iterate through an array several parameters at a time:
for @characters -> $role1, $role2, $role3 {
...
}
iterate through several arrays, one after the other, taking a few parameters at a time:
for @humans, @betelgeusians, @vogons -> $role1, $role2 {
...
}
or iterate through several arrays at the same time:
for @characters; @locations -> $name; $place {
...
}
But no matter how complicated the code gets, the topic stays the same. The rules of the game are:
- There is only one topic.
This rule should look familiar. It might even deserve to be called “the second law of topics”. With nested topicalizers, the restriction means that two topics from different scopes will never be accessible at the same time. With multiple aliases, it means that while a topicalizer can create more than one alias, only one of the aliases can be the topic.
- The topic is the first parameter.
This rule makes it easy to pick out the topic. In the first two examples, the topic is $role1
, and in the third example it’s $name
.
There is one exception to the rule. The is topic
property can select any parameter as topic in place of the default first parameter.
for @characters -> $role1, $role2 is topic, $role3 {
...
}
The Final Frontier
That’s pretty much all there is to topic. Hopefully, this article has pushed one more thing into the “Gee, that’s easy!” category. But, if not, carry away one idea: that first law, “Topic is $_
”. The next time conversation turns to Perl 6 and topic, that simple translation will make it understandable.