The Python Challenge on Perl

http://www.pythonchallenge.com/

Solving level 2 of the site above in Perl. Major spoilers ahead.

Level 2

http://www.pythonchallenge.com/pc/def/ocr.html

recognize the characters. maybe they are in the book,
but MAYBE they are in the page source.

Looking at the page source, we find:

<!--
%%$@_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{_@#_^{*
(rest omitted)

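The task is to pull only the "rare characters" out of a comment that is mixed with junk like the above.
Here I take "characters" to mean literal letters, that is, a-z or A-Z. It surely does not mean a "rare character" in the video-game sense of a rare creature! Or so I believe.
Save the page source locally and run it through the following script *1.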



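use strict;
use warnings;

my $in_comment = 0;  # true while the current line is inside an HTML comment

while (<>) {
  $in_comment = 1 if /<!--/;
  $in_comment = 0 if /-->/;
  # split the line into single characters; keep letters, spaces, and colons
  print grep { /[a-zA-Z :]/ } split(//) if $in_comment;
}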
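The line-by-line flag above prints nothing for a comment that opens and closes on the same line, because the flag is cleared again before the print. As a rough sketch of an alternative, assuming the whole page fits in memory, a single non-greedy regex can capture each comment body instead. The helper name `comment_letters` and the sample string are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the letters,
# spaces, and colons found inside every HTML comment of $html.
sub comment_letters {
    my ($html) = @_;
    my $out = '';
    # /s lets "." match newlines; ".*?" stops at the first "-->"
    while ($html =~ /<!--(.*?)-->/sg) {
        my $body = $1;
        # in list context, /g returns every single-character match
        $out .= join '', $body =~ /[a-zA-Z :]/g;
    }
    return $out;
}

# Made-up sample input, not the real page
my $sample = "<p>x</p><!-- a%b -->\n<!--\n#c^\n:d\n-->";
print comment_letters($sample), "\n";
```

To feed it the saved page instead of the sample, the file could be slurped first, for example with `my $html = do { local $/; <> };`.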

Run it.



$ perl chop.pl < ocr.html

find rare characters in the mess below:equality

OK. The URL of the next problem is "equality.html".

*1: ":" and " " are not "rare characters" in the sense described above, but without them the words run together. Some junk may slip through, but that is better than losing information.
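As a cross-check on the character class chosen in *1, "rare" could also be read literally as "occurring rarely". The sketch below only counts how often each non-whitespace character appears and lists the characters from rarest to most common. The helper name `char_counts` and the sample string are assumptions for illustration. Whether the letters on the real page are in fact the least frequent characters is not verified here.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each non-whitespace character occurs.
sub char_counts {
    my ($text) = @_;
    my %count;
    $count{$_}++ for grep { /\S/ } split //, $text;
    return \%count;
}

# Made-up sample, not the real comment body
my $sample = '%%$@_$^a%%$@_b$^__%@';
my $counts = char_counts($sample);

# List characters from rarest to most common (ties broken by character order)
for my $ch (sort { $counts->{$a} <=> $counts->{$b} or $a cmp $b } keys %$counts) {
    printf "%s %d\n", $ch, $counts->{$ch};
}
```

If the rarest entries turn out to be letters, that would support the a-z/A-Z reading without having to guess the character class in advance.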