Author Topic: yakra's collaborator thread  (Read 6333 times)

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2364
  • Last Login:Today at 12:59:52 pm
Re: yakra's collaborator thread
« Reply #15 on: September 22, 2019, 11:30:36 pm »
grep-o-roonie
inna USA/CAN stylee

1. Potential malformed city suffixes, including directional
grep '^[A-Z][A-Z][0-9]\{1,3\}[A-Z][a-z][a-z]_[NEWS]' */*/*.wpt | grep -v '[A-Z][A-Z][0-9]\{1,3\}Alt_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Bus_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Byp_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Trk_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}His_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Spr_[NEWS]\|[A-Z][A-Z][0-9]\{1,3\}Con_[NEWS]'

2. Potential malformed city suffixes, more complete list
grep '^[A-Z][A-Z][0-9]\{1,3\}[A-Z][a-z][a-z]' */*/*.wpt | grep -v '[A-Z][A-Z][0-9]\{1,3\}Alt\|[A-Z][A-Z][0-9]\{1,3\}Bus\|[A-Z][A-Z][0-9]\{1,3\}Byp\|[A-Z][A-Z][0-9]\{1,3\}Trk\|[A-Z][A-Z][0-9]\{1,3\}His\|[A-Z][A-Z][0-9]\{1,3\}Spr\|[A-Z][A-Z][0-9]\{1,3\}Con'

3. "Good Alabama"
grep _ AL/*/*.wpt | grep -v '_[NEWS] \|_[A-Za-z]\{4\}'
A few false positives.

4. "Bad Maine"
grep 'US[0-9]\{1,3\}Alt.\+ ' ME/*/*.wpt | grep -v 'US[0-9]\{1,3\}Alt_[NEWS] '

5. 4-character LONG_UNDERSCORE datacheck
grep -i '_[A-Z]\{3\}[abcdfghijklmopqrtuvxyz] ' */*/*.wpt
Ubuntu: grep '_...[A-DF-MO-RTUVXYZa-z] ' */*/*.wpt
Error results: for rg in `ls`; do grep -s '_...[A-DF-MO-RTUVXYZa-z] ' $rg/*/*.wpt; done
Error counts: for rg in `ls`; do echo -en "$rg\t"; grep -s '_...[A-DF-MO-RTUVXYZa-z] ' $rg/*/*.wpt | wc -l; done
Good results: for rg in `ls`; do grep -s '_...[NEWS] ' $rg/*/*.wpt; done
Good counts: for rg in `ls`; do echo -en "$rg\t"; grep -s '_...[NEWS] ' $rg/*/*.wpt | wc -l; done

6. Old highway designation labels that may have a current visible name
grep '^Old..[0-9]' */*/*.wpt
This one will have a lot of false positives.
« Last Edit: September 26, 2019, 11:03:06 pm by yakra »

Offline yakra

Re: yakra's collaborator thread
« Reply #16 on: September 27, 2019, 03:21:53 am »
Operation Good Labels
MyRegions='CT MA ME NH RI KS NE OK NJ NY TX AB MB NB NS PE NL'
NewEngland='CT MA ME NH RI'
CentralStates='KS NE OK'
Canada='AB MB NB NS PE NL'
MiscRegions='NJ NY TX'


Operation 4-Char Suffix (with NEWS)
CT(14) MA(6) ME(33) NH(12) RI(12)
for rg in $NewEngland; do grep '_.... ' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '_....$'; done
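A note on the idiom used in these loops: the sed pair turns the first space on each line into % and then trims from the % onward, leaving only path:label. The trailing grep re-tests the pattern so that hits in alternate labels or URLs drop out. Below is a minimal sketch of the same idea using cut, assuming each WPT line starts with the primary label followed by a space. The helper names and the sample lines are made up for illustration.

```shell
# Hypothetical helper: keep only the text before the first space on each
# line, i.e. path:primary_label in the output of "grep pattern files...".
primary_labels() {
  cut -d' ' -f1
}

# Hypothetical helper: primary labels that end in an underscore followed
# by exactly four characters.
four_char_suffix() {
  primary_labels | grep -E '_.{4}$'
}

# Made-up sample lines in the shape grep prints for multiple files.
# The second line only has the suffix in an alternate label, so it is dropped.
printf '%s\n' \
  'ME/usame/me.us001.wpt:US1_Port http://example.invalid/?lat=43.6&lon=-70.3' \
  'ME/usame/me.us001.wpt:ME9 +US1_Bidd http://example.invalid/?lat=43.5&lon=-70.4' \
  | four_char_suffix
```

With real data, the sample printf would be replaced by something like grep -s '_.... ' $rg/*/*.wpt piped into four_char_suffix.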

Are all city suffixes appropriate?
Excluding 4-char cases with directional "subfix", as those are covered above
CT(16) MA(41) ME(181) NH(93) RI(7) (338)
KS(123) NE(103) OK(68) (294)
NJ(28) NY(81) TX(14) (123)
AB(28) MB(5) NB(18) NS(51) NL(6) (108)
for rg in $MyRegions; do grep '_... ' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':+' | grep '_...$'; done

Operation 2-Char Suffix
Excludes 4-char cases with directional "subfix"; those are covered above
CT(1) KS(3) MA(1) ME(11) NB(2)
for rg in $MyRegions; do grep '_.. ' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':+' | grep '_..$'; done

Operation Too Many Words
CT(5) MA(1) ME(4) NE(1) NH(5) NY(6) NB(16) NL(29)
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done
Not sufficient to match all cases. Refine regex & redo.

FooBarBRd
CT(1) MA(2) ME(2) NH(12) (17)
KS(3) NE(7) OK(10) (20)
NJ(5) NY(83) TX(9) (97)
AB(3) NB(25) NS(6) PE(1) NL(3) (38)
for rg in $MyRegions; do grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v '[NPS]P$' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]*'; done
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v '[NPS]P$' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+'; done

FooBBazRd
CT(1) MA(1) ME(7) NH(1) (10)
NJ(2) NY(8) TX(1) (11)
OK(1) (1)
NB(6) NL(4) (10)
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':Mc[A-Z][A-Z]' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done

FBarBazRd
CT(3) MA(1) ME(2) NH(3) RI(1) (10)
NY(28) TX(2) (30)
NB(16) NS(2) PE(1) NL(1) (20)
NE(1) (unprocessed WPT deleted) (1)
for rg in $MyRegions; do grep -s '[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done

"All of the Above" mammoth regex
for rg in `ls`; do grep -s . $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v '[NPS]P$\|:[A-Z]*[a-z]*Mc[A-Z][A-Z][a-z]*$\|:Mc[A-Z][A-Z]\+[a-z]*[A-Z][a-z]*$\|:[A-Z]*Mc[A-Z]\+[a-z]\?$' | grep '[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*\|[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+\|[A-Z]\+[a-z]\+[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]*\|[A-Z]\+[a-z]\?[A-Z]\+[a-z]\+[A-Z]\+[a-z]\+[A-Z]\+[a-z]*'; done

Operation _End
MA(2) ME(3) NE(1) NH(2) OK(1) RI(2) AB(1)
for rg in $MyRegions; do grep _End $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep _End; done

Operation Predir
This got its own post

Operation Postdir
CT(6) ME(7) NH(5) (18)
KS(16) NE(1) (17)
NJ(3) NY(10) TX(6) (19)
MB(2) NB(9) NS(3) NL(5) (19)
for rg in $MyRegions; do grep '[NEWS]' $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep '[NEWS]$' | grep -v ':+\|_\|[0-9]\{1,3\}[NEWS]$\|:[A-Z][A-Z]/[A-Z][A-Z]$\|:CR[A-Z]$\|:Rd[A-Z]$\|:Ave[A-Z]$\|:I\-[0-9]\{1,3\}BS$'; done

Operation Long Words
ME(1) NL(1)
for rg in $MyRegions; do grep . $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep -v ':+' | grep ':.*[a-z]\{4\}'; done

Operation McPotrzebie
MA(1) ME(2)
AB(2)
NJ(6) NY(6) TX(4)
for rg in $MyRegions; do grep 'Mc[A-Z][a-z]' $rg/*/*.wpt; done
for rg in $Canada; do grep Mac $rg/*/*.wpt; done


Tpke -> Tpk
CT(7) MA(1) NH(10) RI(1)
OK(2)
NJ(4) NY(19) TX(1)

for rg in $MyRegions; do grep -s Tpke $rg/*/*.wpt | sed -e 's/ /%/' -e 's/\(.\+\)%.\+/\1/' | grep 'Tpke'; done
« Last Edit: October 17, 2019, 11:06:26 am by yakra »


Offline yakra

Re: yakra's collaborator thread
« Reply #18 on: November 20, 2019, 02:10:33 am »
Ideas to reject. Or not.

abbrev_as_suffix
These can show up in graph labels. 31 examples in:
egrep -i '([A-Z]{3})_\1' waypointsimplification.log
egrep -i '([A-Z]{3})_\1' tm-master-simple.tmg

I may have some data to fix; there's stuff in my regions
30 cases of 3+ intersection + one Exit/Intersection out of 430031 collapsed vertices. May well be fixable on the waypoint simplification side.

datacheck-style pseudocode
if (underscore != string::npos)
  for (Waypoint *p : *(w->colocated))
  { if (p == w) continue; // skip the waypoint itself
    // flag labels of the form <name_no_abbrev>_<abbrev>... for a colocated route
    if ( p->route->abbrev.size()
         && w->label.substr(underscore+1, p->route->abbrev.size()) == p->route->abbrev
         && w->label.substr(0, underscore) == p->route->name_no_abbrev()
       ) { datacheckerrors->add(&r, w->label, "", "", "ABBREV_AS_SUFFIX", p->route->abbrev);
           break;
         }
  }
/*TODO: else for (Waypoint *p : *(w->colocated))
  { if (p == w) continue;
    if (regex("[0-9]+[A-Za-z]{3}") in label)
      if ( p->route->route ends with same [0-9]+ \
           and not p.route.banner.startswith(same [A-Za-z]{3}) \
           and not p.route.abbrev.startswith(same [A-Za-z]{3})
         ) { datacheckerrors->add(&r, w->label, "", "", "FALSE_ABBREV", "Lorem Ipsum");
             break;
           }
  }//*/
      }
Not really worth doing as a datacheck. A child route can intersect or overlap its parent midway, resulting in a city_suffix: see MA122@MA122A_Wor. Those cases alone, plus the desire to stay consistent with otherwise-same-style labels and enough of a gray area in the manual, mean abbrev_as_suffix should be accepted as normal, albeit awkward for graph labels.
false_abbrev or unexpected_abbrev, OTOH... ;)
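
For trying the ABBREV_AS_SUFFIX comparison outside siteupdate, here is a minimal shell sketch of the same test. The function name and argument order are made up for illustration. The sample values come from the MA122@MA122A_Wor case, on the assumption that the colocated route is named MA122A with abbrev Wor.

```shell
# Hypothetical helper, not part of siteupdate.
# $1 = waypoint label
# $2 = colocated route's name without its abbrev (assumed to contain no "_")
# $3 = colocated route's abbrev
# Succeeds when the label is <name>_<abbrev> followed by anything.
abbrev_as_suffix() {
  # routes without an abbrev can never trigger the check
  [ -n "$3" ] || return 1
  case $1 in
    "${2}_${3}"*) return 0 ;;
    *) return 1 ;;
  esac
}

if abbrev_as_suffix MA122A_Wor MA122A Wor; then
  echo "ABBREV_AS_SUFFIX: MA122A_Wor"
fi
```

As in the pseudocode, this only looks at one colocated route at a time; looping over every colocated point is left to the caller.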



LABEL_SELFREF
yakra@BiggaTomato:~/TravelMapping/yakra/DataProcessing/siteupdate/cplusplus/logs$ diff waypointsimplification.old waypointsimplification.i5 | egrep '^< Straightforward concurrency'
< Straightforward concurrency: L3238Her@L3238&L3238Mar@L3238 -> L3238Her/L3238Mar@L3238
< Straightforward concurrency: B279@B279&B279Hol@B279 -> B279/B279Hol@B279
< Straightforward concurrency: A329@A329_E&A329Rea@A329_E -> A329/A329Rea@A329_E
yakra@BiggaTomato:~/TravelMapping/yakra/DataProcessing/siteupdate/cplusplus/logs$ egrep 'L3238Her@L3238&L3238Mar@L3238|B279@B279&B279Hol@B279|A329@A329_E&A329Rea@A329_E' waypointsimplification.i5
Straightforward intersection: L3238Her@L3238&L3238Mar@L3238 -> L3238/L3238
Straightforward intersection: B279@B279&B279Hol@B279 -> B279/B279
Straightforward intersection: A329@A329_E&A329Rea@A329_E -> A329_E/A329_E
[yakra@noreaster /home/terescoj/travelmapping/HighwayData]$ egrep -n 'L3238|B279|A329' datacheckfps.csv
4030:deuby.b279;B279;;;LABEL_SELFREF;
4031:deuby.b279hol;B279;;;LABEL_SELFREF;
4109:deuhe.l3238her;L3238;;;LABEL_SELFREF;
4110:deuhe.l3238mar;L3238;;;LABEL_SELFREF;
4425:eng.a0329;A329_E;;;LABEL_SELFREF;
4426:eng.a0329;A329_W;;;LABEL_SELFREF;
4427:eng.a0329rea;A329_E;;;LABEL_SELFREF;
4428:eng.a0329rea;A329_W;;;LABEL_SELFREF;

Should arguably have the abbrev included...



banner_before_number
http://forum.travelmapping.net/index.php?topic=2601.msg15607#msg15607

closed_open_coloc
for label in $labels; do echo $label | tr '@&' ';;'; done
Not always a true error.
Upon adding to colocated list: check if (label[0] == '*') == (colocated[0].label[0] == '*')
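
A rough way to try this check against existing output before touching siteupdate, sketched under assumptions: each input line is one colocation group written as route@label&route@label (the shape seen in waypointsimplification.log), and closed points keep their leading * in that text. The function name is made up.

```shell
# Hypothetical helper: read one colocation group per line and print the
# groups that mix closed labels (leading "*") with open ones.
mixed_closed_open() {
  awk -F'&' '{
    nclosed = 0; nopen = 0
    for (i = 1; i <= NF; i++) {
      label = $i
      sub(/^[^@]*@/, "", label)   # drop "route@", keep the label
      if (substr(label, 1, 1) == "*") nclosed++; else nopen++
    }
    if (nclosed > 0 && nopen > 0) print
  }'
}

# Made-up sample groups: only the first mixes closed and open labels.
printf '%s\n' \
  'xx.rt1@*OldRd&xx.rt2@OldRd' \
  'xx.rt1@MainSt&xx.rt2@MainSt' \
  | mixed_closed_open
```

Every group printed is only a candidate, since a closed/open mix is not always a true error.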

only_1_intersecting
Points with a '/' in label colocated with only 1 (or 0?) other
Lots of FP potential. Borders! Don't flag when...
• points are in different regions?
• point is at beginning or end of route?
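
A first-pass sketch of the raw check, before any of the exclusions above. It assumes one colocation group per line in route@label&route@label form, with singleton points also listed; the function name is made up. The region and route-endpoint exceptions are deliberately not handled here.

```shell
# Hypothetical helper: print groups of at most two points in which some
# label contains a slash.
only_1_intersecting() {
  awk -F'&' 'NF <= 2 {
    for (i = 1; i <= NF; i++) {
      label = $i
      sub(/^[^@]*@/, "", label)   # drop "route@", keep the label
      if (index(label, "/") > 0) { print; break }
    }
  }'
}

# Made-up sample groups: the first has a slashed label with one partner,
# the second has two partners and is left alone.
printf '%s\n' \
  'xx.rt1@US1/US2&xx.rt2@US1' \
  'xx.rt1@US1/US2&xx.rt2@US1&xx.rt3@US2' \
  | only_1_intersecting
```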

underscore_numeral
numeral immediately follows underscore
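
A minimal grep sketch for this one, in the same style as the other checks here. It assumes input in the path:label form that grep prints for multiple files, looks only at the primary label, and skips hidden points. The function name and sample lines are made up.

```shell
# Hypothetical helper: primary labels with a digit right after an underscore.
underscore_numeral() {
  cut -d' ' -f1 | grep -v ':+' | grep -E ':.*_[0-9]'
}

# Made-up sample lines: only the first has a numeral after the underscore
# in its primary label.
printf '%s\n' \
  'NY/usany/ny.ny005.wpt:NY5_2 http://example.invalid/?lat=42.9&lon=-78.8' \
  'NY/usany/ny.ny005.wpt:NY5_N http://example.invalid/?lat=42.8&lon=-78.7' \
  'NY/usany/ny.ny005.wpt:NY7 +NY7_3 http://example.invalid/?lat=42.7&lon=-78.6' \
  | underscore_numeral
```

With real data, feed it something like grep . */*/*.wpt.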

extraneous_suffix
either parenthetical or underscored, when only one jct with route
this could be a little tricky

interstate_no_hyphen
for rg in AK AL AR AS AZ CA CO CT DC DE FL GA GU HI IA ID IL IN KS KY LA MA MD ME MI MN MO MP MS MT NC ND NE NH NJ NM NV NY OH OK OR PA PR RI SC SD TN TX UT VA VI VT WA WI WV WY; do egrep -v '^\+|[HMRVW]I[0-9]' $rg/*/*.wpt | cut -f1 -d' ' | grep 'I[0-9]'; done
False Positives:
MN/usamn/mn.mn089.wpt:BAI3
OH/usaoh/oh.oh694.wpt:CoHwyI17


slashed designation, unless: (number, optionally following single letter)
grep -v '^+' */*/*.wpt | cut -f1 -d' ' | egrep -iv ':.*/[A-Z]?[0-9]|:\*?[A-Z]+/[A-Z]+$' | egrep ':.*/'
overlap with graph-based unexpected_slash?

parens_long_prefix
grep -v '^+' */*/*.wpt | cut -f1 -d' ' | grep -v '([A-Z]\?[0-9]' | grep '(..[0-9]'

I-designation + unexpected letter, excluding common banners
egrep -v '^\+|^\*?I\-[0-9]+B[LS]|^\*?I\-[0-9]+Fut|^\*?I\-[0-9]+Spr|^\*?I\-[0-9]+Trk|^\*?I\-35[EW]|^\*?I-69[CEW]' */*/*.wpt | cut -f1 -d' ' | egrep 'I\-[0-9]+[A-Za-z]'
A couple "Con" examples, incl 2 in NJ

$rg/Can, Can/$rg, Mex/$rg, $rg/Mex border labels in USA
for rg in AK AL AR AS AZ CA CO CT DC DE FL GA GU HI IA ID IL IN KS KY LA MA MD ME MI MN MO MP MS MT NC ND NE NH NJ NM NV NY OH OK OR PA PR RI SC SD TN TX UT VA VI VT WA WI WV WY; do egrep -i "^$rg/Can|^Can/$rg|^Mex/$rg|^$rg/Mex" $rg/*/*.wpt; done

Old routes with false abbrevs that should be after the underscore
grep -v '^+' */*/*.wpt | cut -f1 -d' ' | egrep -i ':.*Old.*[0-9]+[A-Z]{3}'
« Last Edit: December 03, 2019, 02:48:26 am by yakra »

Offline yakra

Re: yakra's collaborator thread
« Reply #19 on: November 28, 2019, 02:56:10 pm »
mysql> select distinct banner from routes left join systems on systems.systemName = routes.systemName where countryCode = 'USA';
+--------+
| banner |
+--------+
|        |
| Bus    |
| Trk    |
| His    |
| Alt    |
| Con    |
| Lp     |
| Byp    |
| Spr    |
| BL     |
| BS     |
| Fut    |
| AltTrk |
| Sce    |
| BusTrk |
| AltBus |
| AltByp |
| Wye    |
+--------+
18 rows in set (0.07 sec)