I can’t link to part 2 of today’s problem because you have to solve part 1 to unlock the second half of the web page; but I’m sure it’s been reposted countless times by now, so I’ll summarize it:

Oops, some of the digits that we want to add up have been written as English words instead of numerals. Recognize the words `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, and `nine` as digits also. And we have new sample input:

```
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
```

Which yields 29 + 83 + 13 + 24 + 42 + 14 + 76 = 281.

So … we need to convert number-words into digits. But:

- Not zero.
- Only need to convert the first and last number-words of a line, not every occurrence.
- Only need to convert number words that are closer to the start or end of a line than any digits.
- Not specifically stated in the problem: Watch out for lines ending in things like `oneight`.

Tricksy hobbitses!

So I cloned my part 1 solution and added to it:

```
my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);
```

I create an associative array / hash that will let me look up the value of every number-word that this problem asks us to replace. I then extract the keys of that hash (the number-words) and join them into a single string separated by `|` characters (pipe, the regular-expression "or"). Note that the keys are extracted in arbitrary order and I don’t sort them because I don’t actually care about the order: no number-word is a prefix of another, so the alternation matches the same way whatever order it is written in.

This gives me a string like `three|one|four|five|nine|...` that has all of the number-words listed as alternatives.
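A small standalone sketch makes the alternation visible. The `sort` is added here only so the printed string is the same on every run; the solution itself does not need it.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);

# sort is used only to make the printed order stable for this demo.
my $wordre = join("|", sort keys %value);
print "$wordre\n";    # eight|five|four|nine|one|seven|six|three|two

# The string interpolates into a pattern; the captured word is then
# looked up in the hash to get its digit.
my $found = "";
if ("abcone2threexyz" =~ /($wordre)/) {
    $found = "$1 => $value{$1}";
    print "$found\n";    # one => 1
}
```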

Then in the loop to process each line, we need to do some conversions:

```
    # Substitute the first number word if no digits before it.
    s/^([^\d]*?)($wordre)/$1$value{$2}/;
```

If a line starts (`^`) with things that aren’t digits (`[^\d]`), zero or more of them (`*`), and is then followed by a number-word (`($wordre)`), preserve everything that was before the number-word and replace the number-word with the value of that word: the digit.

Note that for this problem, I don’t actually need the text `$1` that was before the number-word so I don’t *have* to preserve that; but it’s habit. If this were time-sensitive (computationally-heavy or big-data input), I’d be more careful about extracting only needed information with minimal manipulation.

Note also that the `?` operator after the `*` says to match the *fewest* non-digit characters possible. Without that, a line like `onetwothreeboohoohoo7` would be converted to `onetwo3boohoohoo7` (because `one` and `two` are non-digits that we’re accepting as many of as possible) instead of the correct `1twothreeboohoohoo7`. A normal `*` is called a “greedy” operator, eating as much of the string as possible; adding `?` after it makes it non-greedy, eating as little of the string as possible.
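The difference is easy to check side by side. This is a minimal sketch that runs both forms of the first substitution on the example line; the variable names `$lazy` and `$greedy` are just labels for this demo.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

my $line = "onetwothreeboohoohoo7";

# Non-greedy: skip as few non-digits as possible before a number-word.
(my $lazy = $line) =~ s/^([^\d]*?)($wordre)/$1$value{$2}/;

# Greedy: skip as many non-digits as possible, then back off until a
# number-word fits, which lands on the last word before the digit.
(my $greedy = $line) =~ s/^([^\d]*)($wordre)/$1$value{$2}/;

print "non-greedy: $lazy\n";      # 1twothreeboohoohoo7
print "greedy:     $greedy\n";    # onetwo3boohoohoo7
```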

```
    # Substitute the last number word if no digits after it.
    s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
```

Here, we eat up as much of the line as possible with `(.*)` and remember it for later; then match a number-word with `($wordre)` and remember it for later; that must be followed only by things that aren’t digits, `([^\d]*)` (also remembered for later), and then the end of the line, `$`. If all of that matches, we rebuild the line, substituting the value of the number in place of the word.

There should be a cleaner way to write that regular expression than starting with `.*` to eat as much as possible, but it is not as simple as making the final `[^\d]*` non-greedy (that gives a different output). I’m lazy enough to have stopped once I had correct output and a regular expression whose behavior I understand.
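To watch the greedy `.*` back off until it finds the last number-word, here is the second substitution on its own, run against a few lines. `replace_last_word` is a helper name made up for this sketch, not part of the program.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Hypothetical helper: apply only the "last number-word" substitution.
sub replace_last_word {
    my ($line) = @_;
    $line =~ s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;
    return $line;
}

for my $line (qw(5oneight xtwone3four abcone2threexyz zoneight234)) {
    printf "%-16s -> %s\n", $line, replace_last_word($line);
}
```

The first line becomes `5on8`: the `.*` gives back characters until `eight` fits, so the overlapping `one` is never touched. The last line is left unchanged because digits follow every number-word in it.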

With these conversions done, the remainder of my part 1 program does the remainder of the work.

Note that if I were writing part 2 without already having written part 1, I’d be much more likely to write something that extracts the first digit or number-word from the string and the last digit or number-word from the string. But when you already have tested code that extracts the first and last digits from a string, it’s super-convenient to convert number-words to digits in place and then reuse the existing extraction code. Again, subject to performance constraints.
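For comparison, a from-scratch version along those lines might look like the following. It is only a sketch of the alternative described above, not the program used here; `token_value` and `calibration_value` are hypothetical helper names, and the sample lines are hard-coded instead of being read from input.

```perl
use strict;
use warnings;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

# Map a matched token (a digit or a number-word) to its numeric value.
sub token_value {
    my ($token) = @_;
    return exists $value{$token} ? $value{$token} : $token;
}

# Hypothetical helper: two-digit value of one line, or undef if the
# line contains no digit or number-word at all.
sub calibration_value {
    my ($line) = @_;
    # The leftmost match is the first digit or number-word.
    my ($first) = $line =~ /(\d|$wordre)/;
    return unless defined $first;
    # A greedy .* pushes the capture as far right as it can go.
    my ($last) = $line =~ /.*(\d|$wordre)/;
    return 10 * token_value($first) + token_value($last);
}

my $sum = 0;
for my $line (qw(two1nine eightwothree abcone2threexyz xtwone3four
                 4nineeightseven2 zoneight234 7pqrstsixteen)) {
    $sum += calibration_value($line);
}
print "sum is $sum\n";    # sum is 281
```

Because this version never rewrites the line, overlapping words do not interfere with each other: a digit-free line such as `twone` gives 21 here, whereas substituting in place first turns it into `2ne`, which leaves no last word to find.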

### The Whole Program

```
#!/usr/bin/perl
use warnings;
use strict;

my %value = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9,
);
my $wordre = join("|", keys %value);

my $sum;
while (<>) {
    # Substitute the first number word if no digits before it.
    s/^([^\d]*?)($wordre)/$1$value{$2}/;

    # Substitute the last number word if no digits after it.
    s/(.*)($wordre)([^\d]*)$/$1$value{$2}$3/;

    # Extract first and last digits (if separate) and add to running total.
    /^[^\d]*(\d).*(\d)[^\d]*$/ and $sum += 10 * $1 + $2;

    # Extract lone digit and add value to running total.
    /^[^\d]*(\d)[^\d]*$/ and $sum += 10 * $1 + $1;
}
print "sum is $sum\n";
```