Author Topic: yakra's collaborator thread  (Read 5884 times)

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2315
  • Last Login:Today at 12:39:58 pm
Re: yakra's collaborator thread
« Reply #15 on: September 22, 2019, 11:30:36 pm »
grep-o-roonie
inna USA/CAN stylee

1. Potentially malformed city suffixes, including directional
grep '^[A-Z][A-Z][0-9]\{1,3\}[A-Z][a-z][a-z]_[NEWS]' */*/*.wpt | grep -v '[A-Z][A-Z][0-9]\{1,3\}Alt_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Bus_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Byp_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Trk_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}His_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Spr_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Con_[NEWS]'

2. Potentially malformed city suffixes, more complete list
grep '^[A-Z][A-Z][0-9]\{1,3\}[A-Z][a-z][a-z]' */*/*.wpt | grep -v '[A-Z][A-Z][0-9]\{1,3\}Alt\|[A-Z][A-Z][0-9]\{1,3\}Bus\|[A-Z][A-Z][0-9]\{1,3\}Byp\|[A-Z][A-Z][0-9]\{1,3\}Trk\|[A-Z][A-Z][0-9]\{1,3\}His\|[A-Z][A-Z][0-9]\{1,3\}Spr\|[A-Z][A-Z][0-9]\{1,3\}Con'
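The seven alternatives in the exclusion pattern differ only in the banner, so an extended-regex group may be easier to maintain. A minimal sketch, assuming GNU grep; the function name `bad_city_suffixes` is made up for illustration, and the pattern is meant to select the same lines as command 2, not to replace it:

```shell
# Hypothetical helper: read .wpt files given as arguments (or stdin when
# there are none) and print lines whose label looks like XX<1-3 digits><Abc>
# where <Abc> is not one of the known banners.
bad_city_suffixes() {
  LC_ALL=C grep -E '^[A-Z]{2}[0-9]{1,3}[A-Z][a-z]{2}' "$@" |
    LC_ALL=C grep -Ev '[A-Z]{2}[0-9]{1,3}(Alt|Bus|Byp|Trk|His|Spr|Con)'
}
```

Usage would be along the lines of `bad_city_suffixes */*/*.wpt`. Appending `_[NEWS]` to both patterns should give the directional variant from command 1.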

3. "Good Alabama"
grep _ AL/*/*.wpt | grep -v '_[NEWS] \|_[A-Za-z]\{4\}'
A few false positives.

4. "Bad Maine"
grep 'US[0-9]\{1,3\}Alt.\+ ' ME/*/*.wpt | grep -v 'US[0-9]\{1,3\}Alt_[NEWS] '

5. 4-character LONG_UNDERSCORE datacheck
grep -i '_[A-Z]\{3\}[abcdfghijklmopqrtuvxyz] ' */*/*.wpt
Ubuntu: grep '_...[A-DF-MO-RTUVXYZa-z] ' */*/*.wpt
Error results: for rg in `ls`; do grep -s '_...[A-DF-MO-RTUVXYZa-z] ' $rg/*/*.wpt; done
Error counts: for rg in `ls`; do echo -en "$rg\t"; grep -s '_...[A-DF-MO-RTUVXYZa-z] ' $rg/*/*.wpt | wc -l; done
Good results: for rg in `ls`; do grep -s '_...[NEWS] ' $rg/*/*.wpt; done
Good counts: for rg in `ls`; do echo -en "$rg\t"; grep -s '_...[NEWS] ' $rg/*/*.wpt | wc -l; done
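The per-region count loops can be wrapped in one function that globs for directories instead of parsing `ls` output. This is only a sketch: `region_counts` is a hypothetical name, and `LC_ALL=C` is my addition so that ranges like `[A-D]` behave the same in every locale.

```shell
# Hypothetical helper: print "REGION<TAB>COUNT" for every region folder in
# the current directory. $1 is an extended regex matched against .wpt lines.
region_counts() {
  for rg in */; do
    rg=${rg%/}                      # strip the trailing slash from the glob
    # cat keeps filename prefixes out of the match; 2>/dev/null plays the
    # role of grep -s for folders that contain no .wpt files.
    n=$(cat "$rg"/*/*.wpt 2>/dev/null | LC_ALL=C grep -Ec -e "$1")
    printf '%s\t%s\n' "$rg" "$n"
  done
}
```

For example, `region_counts '_...[A-DF-MO-RTUVXYZa-z] '` should give the error counts and `region_counts '_...[NEWS] '` the good counts.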

6. Old highway designation labels that may have a current visible name
grep '^Old..[0-9]' */*/*.wpt
This one will have a lot of false positives.
« Last Edit: September 26, 2019, 11:03:06 pm by yakra »

yakra
Re: yakra's collaborator thread
« Reply #16 on: September 27, 2019, 03:21:53 am »
Operation Good Labels
MyRegions='CT MA ME NH RI KS NE OK NJ NY TX AB MB NB NS PE NL'
NewEngland='CT MA ME NH RI'
CentralStates='KS NE OK'
Canada='AB MB NB NS PE NL'
MiscRegions='NJ NY TX'


Operation 4-Char Suffix (with NEWS)
CT(14) MA(6) ME(33) NH(12) RI(12)
for rg in $NewEngland; do grep '_.... ' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '_....$'; done
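The two-step sed in these loops cuts each grep hit down to `file:LABEL`, so that the final grep only sees the primary label and not alternate labels or the URL. A shorter way to get the same effect, sketched under the assumption that paths contain no spaces (`primary_label` is a made-up name):

```shell
# Hypothetical helper: reduce "file:LABEL +AltLabel http://..." to
# "file:LABEL" by deleting everything from the first space onward.
primary_label() {
  sed 's/ .*//'
}
```

It would slot in as `grep '_.... ' $rg/*/*.wpt | primary_label | grep '_....$'`. Unlike the `%` placeholder approach, it is not thrown off by a literal `%` later in the line.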

Are all city suffixes appropriate?
Excluding 4-char cases with directional "subfix", as those are covered above
CT(16) MA(41) ME(181) NH(93) RI(7) (338)
KS(123) NE(103) OK(68) (294)
NJ(28) NY(81) TX(14) (123)
AB(28) MB(5) NB(18) NS(51) NL(6) (108)
for rg in $MyRegions; do grep '_... ' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':+' | grep '_...$'; done

Operation 2-Char Suffix
Excludes 4-char cases with directional "subfix"; those are covered above
CT(1) KS(3) MA(1) ME(11) NB(2)
for rg in $MyRegions; do grep '_.. ' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':+' | grep '_..$'; done

Operation Too Many Words
CT(5) MA(1) ME(4) NE(1) NH(5) NY(6) NB(16) NL(29)
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done
Not sufficient to match all cases. Refine regex & redo.

FooBarBRd
CT(1) MA(2) ME(2) NH(12) (17)
KS(3) NE(7) OK(10) (20)
NJ(5) NY(83) TX(9) (97)
AB(3) NB(25) NS(6) PE(1) NL(3) (38)
for rg in $MyRegions; do grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v '[NPS]P$' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]*'; done
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v '[NPS]P$' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+'; done

FooBBazRd
CT(1) MA(1) ME(7) NH(1) (10)
NJ(2) NY(8) TX(1) (11)
OK(1) (1)
NB(6) NL(4) (10)
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':Mc[A-Z][A-Z]' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done

FBarBazRd
CT(3) MA(1) ME(2) NH(3) RI(1) (10)
NY(28) TX(2) (30)
NB(16) NS(2) PE(1) NL(1) (20)
NE(1) (unprocessed WPT deleted) (1)
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done

"All of the Above" mammoth regex
for rg in `ls`; do grep -s . $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v '[NPS]P$\|:[A-Z]*[a-z]*Mc[A-Z][A-Z][a-z]*$\|:Mc[A-Z][A-Z]\+[a-z]*[A-Z][a-z]*$\|:[A-Z]*Mc[A-Z]\+[a-z]\?$' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*\|[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+\|[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]*\|[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done
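The four alternatives in the final grep are built from two pieces: a full word (`Foo`) and a word cut down to an initial (`B`). A sketch that spells this out with extended regexes; the variable and function names are mine, and the Mc/NP/PP/SP exclusions from the `grep -v` stage are deliberately left out:

```shell
# Hypothetical helper: read labels on stdin, print those that look like
# four or more run-together words (FooBarBazRd, FooBarBRd, FooBBazRd, FBarBazRd).
many_words() {
  W='[A-Z]+[a-z]+'     # a full word, e.g. Foo
  w='[A-Z]+[a-z]?'     # an initial, optionally with one lowercase letter
  T='[A-Z]+[a-z]*'     # trailing piece, e.g. Rd
  LC_ALL=C grep -E "${W}${W}${W}${T}|${W}${W}${w}${W}|${W}${w}${W}${T}|${w}${W}${W}${T}"
}
```

Feeding it labels already reduced to the primary label (as the sed step does) avoids matching on the rest of the line.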

Operation _End
MA(2) ME(3) NE(1) NH(2) OK(1) RI(2) AB(1)
for rg in $MyRegions; do grep _End $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep _End; done

Operation Predir
This got its own post

Operation Postdir
CT(6) ME(7) NH(5) (18)
KS(16) NE(1) (17)
NJ(3) NY(10) TX(6) (19)
MB(2) NB(9) NS(3) NL(5) (19)
for rg in $MyRegions; do grep '[NEWS]' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '[NEWS]$' | grep -v ':+\|_\|[0-9]\{1,3\}[NEWS]$\|:[A-Z][A-Z]/[A-Z][A-Z]$\|:CR[A-Z]$\|:Rd[A-Z]$\|:Ave[A-Z]$\|:I-[0-9]\{1,3\}BS$'; done

Operation Long Words
ME(1) NL(1)
for rg in $MyRegions; do grep . $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':+' | grep ':.*[a-z]\{4\}'; done

Operation McPotrzebie
MA(1) ME(2)
AB(2)
NJ(6) NY(6) TX(4)
for rg in $MyRegions; do grep 'Mc[A-Z][a-z]' $rg/*/*.wpt; done
for rg in $Canada; do grep Mac $rg/*/*.wpt; done


Tpke -> Tpk
CT(7) MA(1) NH(10) RI(1)
OK(2)
NJ(4) NY(19) TX(1)

for rg in $MyRegions; do grep -s Tpke $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep 'Tpke'; done
« Last Edit: October 17, 2019, 11:06:26 am by yakra »


yakra
Re: yakra's collaborator thread
« Reply #18 on: Today at 02:10:33 am »
abbrev_as_suffix
These can show up in graph labels. 31 examples in:
grep -Ei '([A-Z]{3})_\1' waypointsimplification.log
grep -Ei '([A-Z]{3})_\1' tm-master-simple.tmg
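The `\1` backreference is what makes this search work: it requires the three letters after the underscore to repeat the three letters just before it. A small sketch for trying the pattern on sample labels, assuming GNU grep (backreferences in extended regexes are an extension, not POSIX); the function name and the sample labels in the usage note are invented:

```shell
# Hypothetical helper: print lines where a three-letter abbreviation is
# immediately repeated as the suffix after an underscore (Abc_Abc).
abbrev_as_suffix_candidates() {
  LC_ALL=C grep -Ei '([A-Z]{3})_\1' "$@"
}
```

For instance, `printf 'US1BusWat_Wat\nUS1_Wat\n' | abbrev_as_suffix_candidates` should keep only the first label, since `US1` before the underscore is not three letters.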

I may have some data to fix; there's stuff in my regions

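datacheck-style pseudocode, written out as a Python sketch
(flag() and the p.route fields stand in for the real datacheck structures)
import re

def check_label(label, colocated, flag):
    if not colocated:
        return
    underscore = label.find('_')
    if underscore != -1:
        # Suffix after the underscore starts with a colocated route's abbrev.
        for p in colocated:
            abbrev = p.route.abbrev
            if abbrev and label[underscore + 1:].startswith(abbrev):
                flag('abbrev_as_suffix')
                break
    else:
        # Digits followed by three letters that are neither banner nor abbrev.
        m = re.search(r'([0-9]+)([A-Za-z]{3})', label)
        if not m:
            return
        number, letters = m.groups()
        for p in colocated:
            if (p.route.route.endswith(number)
                    and not p.route.banner.startswith(letters)
                    and not p.route.abbrev.startswith(letters)):
                flag('no_banner_or_abbrev')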
« Last Edit: Today at 03:55:22 am by yakra »