Bob Lied

Posted on Dec 22 • Edited on Dec 24

PWC 353 To each (array) his own

#pwc #perlweeklychallenge #perl

This week's challenges were early and on the easy side, which was appreciated as the holidays loom. The musical interlude this week ignores the challenge tasks in favor of seasonal light jazz; I have Spotify playing "Winter's Songs: A Windham Hill Christmas."

Task 1: Max Words

The Task

You are given an array of sentences. Write a script to return the maximum number of words that appear in a single sentence.

Example 1:
- Input: @sentences = ("Hello world", "This is a test", "Perl is great")
- Output: 4

The Solution

An easy one -- an early Xmas gift? Split each sentence into words and count them, then take the maximum. Completely ignoring all the complexities of punctuation and internationalization, it's a one-liner:

sub maxWords(@sentence)
{
    use List::Util qw/max/;
    return max map { scalar(split(" ", $_)) } @sentence;
}

The only thing mildly tricky here is transforming the list of words into a count. Inside the body of the map block, we're in array context, so split will return a list. Coercing it with scalar gives the count.

Task 2: Validate Coupon

The Task

You are given three arrays, @codes, @names and @status. Write a script to validate codes in the given array. A code is valid when the following conditions are true:

codes[i] is non-empty and consists only of alphanumeric characters (a-z, A-Z, 0-9) and underscores (_).
names[i] is one of the following four categories: "electronics", "grocery", "pharmacy", "restaurant".
status[i] is true.

Return an array of booleans indicating validity: output[i] is true if and only if codes[i], names[i] and status[i] are all valid.

Example 1:
- Input:
  - @codes = ("A123", "B_456", "C789", "D@1", "E123")
  - @names = ("electronics", "restaurant", "electronics", "pharmacy", "grocery")
  - @status = ("true", "false", "true", "true", "true")
- Output: (true, false, true, false, true)

Thoughts

The volume of text for this one was daunting at first glance, but it turns out to be mostly example data and actually quite easy.

The validation is straight-forward. @codes is a regular expression match; @names is a keyword lookup from a dictionary table (hash); and @status is just a string compare.

We'll be operating on parallel arrays. I see two ways of doing it. First, we could use a variable to loop over the indexes. That's going to yield code with a lot of array subscripts. Or second, we could use each_array from List::More::Utils to do the indexing for us. I'm curious which is more efficient.

To the code-sleigh, Rudolph!

Loop with `indexed`

sub valid($code, $name, $status)
{
    state %ValidName = ( electronics => true, grocery => true,
                         pharmacy => true, restaurant => true );
    my @valid = (true) x $code->@*;

    for my ($i, $c) ( indexed $code->@* )
    {
        $valid[$i] &&= ( $code->[$i] !~ m/[^a-zA-Z0-9_]/ );
        $valid[$i] &&= ( exists $ValidName{$name->[$i]} );
        $valid[$i] &&= $status->[$i] eq "true";
    }
    return \@valid;
}

Notes:

Passing multiple arrays as subroutine parameters implies using references. We could use traditional prototype attributes to make it possible to pass arrays, but let's not.
state %ValidName -- These are valid keywords for the @names input array. Using state makes this a static table, initialized only the first time the subroutine is entered.
@valid -- Our output array, same size as the input array. Assume they're all valid and turn elements false if any check fails.
indexed -- My favorite recent Perl improvement. Returns both the index and the value from an array during a for loop.
valid->[$i] &&= ... -- Any failing test will turn the value to false.
$code->[$i] !~ m/[^a-zA-Z0-9_]/ -- The code has to consist entirely of valid characters, or conversely, it must not match (!~) anything that contains a character other than those. Later I went back to look up exactly what \w and \W mean in Perl regular expressions and realized that I could have used that instead of the character classes.
exists $ValidName{...} -- Use exists explicitly to return a boolean value. A simple lookup that fails would return undef, and will make the result of the expression undef instead of false. Although undef is treated as false in many contexts, it failed the unit test.
return \@valid -- Return a reference. This is convenient for the is() function from the Test2::V0 unit testing framework, and returning a reference also improves performance by avoiding an array copy.

Loop with array iterator

sub validEach($code, $name, $status)
{
    state %ValidName = ( electronics => true, grocery => true,
                         pharmacy => true, restaurant => true );
    my @valid;

    use List::MoreUtils qw/each_arrayref/;
    my $ea = each_arrayref($code, $name, $status);
    while ( my ($c, $n, $s) = $ea->() )
    {
        push @valid, ( $c !~ m/[^a-zA-Z0-9_]/ )
                  && ( exists $ValidName{$n}    )
                  && ( $s eq "true"             );
    }
    return \@valid;
}

Notes:

This is basically the same function, except that the explicit indexing has been replaced by an iterator.
each_arrayref -- There's an each_array function, but since we have array references, this one is convenient.
my $ea = ... -- Creates an iterator over the three arrays.
ea->() -- Call the iterator. ea is the equivalent of a function pointer, and this is function dereference syntax.
while ( my ($c, $n, $s) = ea->() ) -- Each call to the iterator returns a tuple of the next element from each of the $code, $name, and $status arrays.
push @valid, ... -- The same three checks as before, but building the list instead of modifying it, because we don't have an index in hand.

Performance check

I think the iterator version is a little cleaner, because it eliminates the clutter of repeated array subscripts. But I was curious how it compared in efficiency, so I set up a quick benchmark using the examples.

It turns out that the explicit indexing is about twice as efficient as the each_arrayref iterator. Still gonna use it, though.

$ perl ch-2.pl -b 100000
             Rate iterator forindex
iterator  72464/s       --     -47%
forindex 136986/s      89%       --

To do the benchmarking, I set up the examples as an array of test cases, and used Benchmark::cmpthese to do a lot of iterations.

sub runBenchmark($repeat)
{
    use Benchmark qw/cmpthese/;

    cmpthese($repeat, {
            forindex => sub {
                for ( @case )
                {
                    valid($_->{codes}, $_->{names}, $_->{status});
                }
            },
            iterator => sub {
                for ( @case )
                {
                    validEach($_->{codes}, $_->{names}, $_->{status});
                }
            },
        });
}

And the input data looks like this:

my @case = (
    { id     => 'Example 1',
      codes  => [ 'A123',        'B_456',      'C789',        'D@1',      'E123' ],
      names  => [ 'electronics', 'restaurant', 'electronics', 'pharmacy', 'grocery' ],
      status => [ 'true',        'false',      'true',        'true',     'true' ],
      expect => [ true, false, true, false, true ],
    },
    { id     => 'Example 2',
      codes  => [ 'Z_9',      'AB_12',       'G01',     'X99',         'test' ],
      names  => [ 'pharmacy', 'electronics', 'grocery', 'electronics', 'unknown' ],
      status => [ 'true',     'true',        'false',   'true',        'true' ],
      expect => [ true, true, false, true, false ],
    },
    { id     => 'Example 3',
      codes  => [ '_123',       '123',         '',            'Coupon_A', 'Alpha' ],
      names  => [ 'restaurant', 'electronics', 'electronics', 'pharmacy', 'grocery' ],
      status => [ 'true',       'true',        'false',       'true',     'true' ],
      expect => [ true, true, false, true, true ],
    },
    { id     => 'Example 4',
      codes  => [ 'ITEM_1',      'ITEM_2',      'ITEM_3',  'ITEM_4' ],
      names  => [ 'electronics', 'electronics', 'grocery', 'grocery' ],
      status => [ 'true',        'true',        'true',    'true' ],
      expect => [ true, true, true, true ],
    },
    { id     => 'Example 5',
      codes  => [ 'CAFE_X',     'ELEC_100',    'FOOD_1',  'DRUG_A',   'ELEC_99' ],
      names  => [ 'restaurant', 'electronics', 'grocery', 'pharmacy', 'electronics' ],
      status => [ 'true',       'true',        'true',    'true',     'false' ],
      expect => [ true, true, true, true, false ],
    },
);

DEV Community

PWC 353 To each (array) his own

Task 1: Max Words

The Task

The Solution

Task 2: Validate Coupon

The Task

Thoughts

To the code-sleigh, Rudolph!

Loop with `indexed`

Loop with array iterator

Performance check

Top comments (0)

The Task

The Solution

The Task

Thoughts

To the code-sleigh, Rudolph!

Loop with indexed

Loop with array iterator

Performance check

Loop with `indexed`