AoC 2023 D1P2: Convert Words to Digits

I can’t link to part two of today’s problem because you have to solve part 1 to unlock the second half of the web page, but I’m sure it’s been reposted countless times by now, so I’ll summarize it:

Oops, some of the digits that we want to add up have been written as English words instead of numerals. Recognize the words one, two, three, four, five, six, seven, eight, and nine as digits also. And we have new sample input:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

Which yields 29 + 83 + 13 + 24 + 42 + 14 + 76 = 281.

So … we need to convert number-words into digits. But:

  • Not zero.
  • Only need to convert the first and last number-words of a line, not every occurrence.
  • Only need to convert number words that are closer to the start or end of a line than any digits.
  • Not specifically stated in the problem: Watch out for lines ending in overlapping words like oneight, where one and eight share the letter e.

Tricksy hobbitses!

So I cloned my part 1 solution and added to it:

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

I create an associative array (a hash) that lets me look up the value of every number-word this problem asks us to replace. I then extract the keys of that hash (the number-words) and join them into one string separated by | pipe characters, the regular-expression “or”. The keys come out in arbitrary order and I don’t sort them, because the order doesn’t matter here: no number-word is a prefix of another, so whichever alternative the regex tries first, only one of them can match at a given position.

This gives me a string like three|one|four|five|nine|... that has all of the number-words listed as alternatives.
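A quick way to see what the alternation looks like is to print it. This is only a sketch: it sorts the keys, which the program in this post does not do, purely so the printout is repeatable, and it uses one of the sample lines to show each matched word mapping back to its digit.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);

# Sorting is only for a repeatable printout; the regex does not need it.
my $wordre = join("|", sort keys %value);
print "$wordre\n";

# Collect every non-overlapping number-word in a sample line
# and map each one back to its digit through the hash.
my @found = "xtwone3four" =~ /($wordre)/g;
print join(",", map { "$_=$value{$_}" } @found), "\n";
```

Note that a plain global match like this skips the one hidden inside twone, because the two has already consumed the shared o.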

Then in the loop to process each line, we need to do some conversions:

# Substitute the first number word if no digits before it.
s/^([^\d]*?)($wordre)/$1$value{$2}/;

If a line starts (^) with zero or more (*) characters that aren’t digits ([^\d]), followed by a number-word (($wordre)), keep everything that was before the number-word and replace the number-word with its value, the digit.

Note that for this problem, I don’t actually need the text $1 that was before the number-word so I don’t have to preserve that; but it’s habit. If this were time-sensitive (computationally-heavy or big-data input), I’d be more careful about extracting only needed information with minimal manipulation.

Note also that the ? operator after the * says to match the fewest non-digit characters possible. Without that, a line like onetwothreeboohoohoo7 would be converted to onetwo3boohoohoo7 (because one and two are non-digits that we’re accepting as many of as possible) instead of the correct 1twothreeboohoohoo7. A normal * is called a “greedy” operator, eating as much of the string as possible; adding ? after it makes it non-greedy, eating as little of the string as possible.
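To make the greedy versus non-greedy difference concrete, here is a small side-by-side sketch. The two helper names are made up for this comparison; the only difference between them is the ? after the *.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical helper: non-greedy prefix, as used in this post.
sub first_word_lazy {
    my ($line) = @_;
    $line =~ s/^([^\d]*?)($wordre)/$1$value{$2}/;
    return $line;
}

# Hypothetical helper: same substitution with a greedy prefix.
sub first_word_greedy {
    my ($line) = @_;
    $line =~ s/^([^\d]*)($wordre)/$1$value{$2}/;
    return $line;
}

for my $line ("onetwothreeboohoohoo7", "abcone2threexyz") {
    printf "%-22s lazy: %-22s greedy: %s\n",
        $line, first_word_lazy($line), first_word_greedy($line);
}
```

The second line comes out the same either way, since it has only one number-word before its first digit; the two versions only disagree when several number-words precede the first digit.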

# Substitute the last number word if no digits after it.
s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;

Here we eat up as much of the line as possible (.*) and remember it for later, then match a number-word ($wordre) and remember that too, then require that only non-digits ([^\d]*) follow it up to the end of the line ($), remembering those as well. If all of that matches, we rebuild the line with the value of the number in place of the word.

There should be a cleaner way to write that regular expression than starting with .* to eat as much as possible. But it isn’t as simple as making the final [^\d]* non-greedy (that gives a different output), and I’m lazy enough to have stopped once I had the correct output and a regular expression whose behavior I understand.
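One way to see why the leading .* matters: a regex engine reports the leftmost match it can find, and the greedy .* is what pushes the number-word match as far right as possible. The sketch below assumes the tempting alternative is to drop the .* and make the trailing [^\d]* non-greedy; both helper names are made up for the comparison.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical helper: the substitution used in this post.
sub last_word_greedy_prefix {
    my ($line) = @_;
    $line =~ s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
    return $line;
}

# Hypothetical helper: no leading .*, non-greedy tail.
# This finds the leftmost number-word that has no digit after it.
sub last_word_no_prefix {
    my ($line) = @_;
    $line =~ s/($wordre)([^\d]*?)$/$value{$1}$2/;
    return $line;
}

for my $line ("eightwothree", "zoneight", "4nineeightseven2") {
    printf "%-18s with .*: %-18s without: %s\n",
        $line, last_word_greedy_prefix($line), last_word_no_prefix($line);
}
```

On eightwothree the first version replaces three, while the second replaces eight, which is the wrong end of the line. Both leave 4nineeightseven2 alone because a digit follows every number-word.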

With these conversions done, the rest of my part 1 program does the remaining work.

Note that if I were writing part 2 without already having written part 1, I’d be much more likely to write something that extracts the first digit or number-word from the string and the last digit or number-word from the string. But when you already have tested code that extracts the first and last digits from a string, it’s super-convenient to convert number-words to digits in place and then reuse the existing extraction code. Again, subject to performance constraints.
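For comparison, the extract-only approach might look something like the sketch below. It is not the program from this post: calibration_value is a made-up name, the sample lines are hard-coded instead of read from input, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical helper: returns the two-digit value for one line,
# or nothing when the line has no digit or number-word at all.
sub calibration_value {
    my ($line) = @_;

    # First token: the leftmost digit or number-word.
    my ($first) = $line =~ /(\d|$wordre)/ or return;

    # Last token: the greedy .* pushes the match as far right as possible.
    my ($last) = $line =~ /.*(\d|$wordre)/ or return;

    # Digits stand for themselves; words go through the lookup table.
    return 10 * ($value{$first} // $first) + ($value{$last} // $last);
}

my @sample = qw(
    two1nine eightwothree abcone2threexyz xtwone3four
    4nineeightseven2 zoneight234 7pqrstsixteen
);

my $sum = 0;
$sum += calibration_value($_) // 0 for @sample;
print "sum is $sum\n";
```

Because this version never modifies the line, a digit-free line such as oneight comes out as 18; substituting in place would first turn it into 1ight, which loses the eight.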

The Whole Program

#!/usr/bin/perl

use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

my $sum = 0;

while (<>) {
    # Substitute the first number word if no digits before it.
    s/^([^\d]*?)($wordre)/$1$value{$2}/;

    # Substitute the last number word if no digits after it.
    s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;

    # Extract first and last digits (if separate) and add to running total.
    /^[^\d]*(\d).*(\d)[^\d]*$/ and $sum += 10 * $1 + $2;

    # Extract lone digit and add value to running total.
    /^[^\d]*(\d)[^\d]*$/ and $sum += 10 * $1 + $1;
}

print "sum is $sum\n";
