TODO:
* find out where this script lives and who maintains it
* possibly submit an internationalization patch
* in that case it has to be done via binmode utf-8 and lc, not by hacking it in with tr! ;-)
* the weight on 4-grams really is absolute => sentences of three words score 0

sub NormalizeText {
    my ($norm_text) = @_;

    # language-independent part:
    $norm_text =~ s/<skipped>//g; # strip "skipped" tags
    $norm_text =~ s/-\n//g; # strip end-of-line hyphenation and join lines
    ...

    # language-dependent part (assuming Western languages):
    $norm_text = " $norm_text ";
    $norm_text =~ tr/[A-Z]/[a-z]/ unless $preserve_case;
    # BEWARE! I ADDED THIS BY HAND !!!
    $norm_text =~ tr/ÁÉÍÓÚŮČĎĚŘŠŤŽ/áéíóúůčďěřšťž/ unless $preserve_case; # BEWARE! I ADDED THIS BY HAND !!!
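
The third TODO item (binmode utf-8 plus lc instead of tr) could look roughly like the sketch below. It is only an illustration under stated assumptions: the text is assumed to be already decoded into Perl characters, and `lowercase_text` is a hypothetical helper name that does not exist in the original script.

```perl
use strict;
use warnings;
use utf8;                                # this source file contains UTF-8 literals
use feature 'unicode_strings';           # Unicode rules for lc on every string (Perl 5.12+)

binmode(STDOUT, ':encoding(UTF-8)');     # encode characters as UTF-8 on output

# Hypothetical helper (not in the original script): lowercases decoded text
# with lc, so no per-language tr table has to be maintained by hand.
sub lowercase_text {
    my ($text, $preserve_case) = @_;
    return $preserve_case ? $text : lc $text;
}

print lowercase_text("Žluťoučký KŮŇ ÚPĚL", 0), "\n";   # prints "žluťoučký kůň úpěl"
```

Unlike the hand-written tr table, lc also covers letters the table leaves out (for example Ý and Ň). Input read from files would need the same `:encoding(UTF-8)` layer on the input handle so that lc sees characters rather than bytes.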