Topic: Using grep and shell scripts to search highway data, waypoint labels, etc.


Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4422
  • Last Login:November 11, 2024, 12:50:03 pm
  • I like C++
grep is a powerful tool for searching text files for regular expressions. It runs on Unix-like systems such as Linux, FreeBSD (e.g. noreaster) and macOS. With a little practice, you can put together expressions that search for very specific types of text strings. In this post, I'll focus on waypoint labels.

Commands were executed from the hwy_data/ directory.
They're designed so you can easily substitute a different region into the $rg variable and run the same command again to search another region:
rg=PA
grep whatever $rg/*/*.wpt
rg=DE
grep whatever $rg/*/*.wpt

et cetera. You can also wrap them up inside loops to search multiple regions in one go:
for rg in PA DE; do grep whatever $rg/*/*.wpt; done
or
MyRegions='PA DE'
for rg in $MyRegions; do grep whatever $rg/*/*.wpt; done

et cetera.

The cut command was included to strip out AltLabels and URLs from the .wpt file lines, to ensure that we're only searching within primary labels.
If you have questions about the various tokens, expressions or commands, feel free to ask.
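For anyone running these a lot, here's a minimal sketch of the "loop over regions, drop hidden points, keep only the primary label" front end wrapped in a shell function. The name primary_labels is my own invention, not anything in the TM tools, and it assumes you're in hwy_data/ with the usual REGION/system/file.wpt layout. One design choice worth noting: it adds grep -H, because grep omits the filename prefix when a glob matches only one file, and several of the searches below rely on that "path:" colon to find the start of the label.

```shell
# primary_labels REGION...
# Print "path:label" for every visible (non-hidden) waypoint in the regions.
# Hypothetical helper; run it from the hwy_data/ directory.
primary_labels() {
  for rg in "$@"; do
    # -H: always prefix the file path, even if the glob matches one file.
    # -s: stay quiet when a region has no .wpt files (glob left unexpanded).
    # -v '^[+]': drop hidden points; [+] is a literal + in BRE and ERE alike.
    grep -Hsv '^[+]' "$rg"/*/*.wpt | cut -f1 -d' '
  done
}
```

Usage would look like primary_labels PA DE | grep -E ':[A-Z]{2}[0-9]{1,3}[A-Z][a-z]{2}', with any of the label filters below on the right-hand side of the pipe.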

Some example search results in Pennsylvania and Delaware are posted here.



Potential no-underscore malformed city suffixes
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep -v '[A-Z][A-Z][0-9]{1,3}(Alt|Bus|Byp|Trk|His|Spr|Con)' | egrep ':[A-Z]{2}[0-9]{1,3}[A-Z][a-z]{2}'
This one can get confusing.
Edit, 2019-11-18 01:32 EST: Only check primary labels; avoid false negatives.
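To make the two filters easier to read, here's a sketch that applies them to "path:label" lines on stdin (e.g. the output of the grep | cut front end). The seven banner exclusions are folded into one group, (Alt|Bus|Byp|Trk|His|Spr|Con), which should match exactly the same labels as seven separate alternatives. The function name is hypothetical; grep -E is the same thing as egrep, and LC_ALL=C is an assumption I've added so the letter ranges are plain ASCII ranges whatever your locale.

```shell
# city_suffix_suspects: read "path:label" lines on stdin.
# 1st grep: drop labels with a banner right after the route number
#           (US40BusWhi etc.), where a no-underscore city suffix is correct.
# 2nd grep: keep labels shaped like XX + 1-3 digits + a capitalized 3-letter
#           suffix with no underscore, e.g. PA23Har.
city_suffix_suspects() {
  LC_ALL=C grep -Ev '[A-Z][A-Z][0-9]{1,3}(Alt|Bus|Byp|Trk|His|Spr|Con)' |
    LC_ALL=C grep -E ':[A-Z]{2}[0-9]{1,3}[A-Z][a-z]{2}'
}
```

Fed US40BusWhi, PA23Har, PA23_Har and US322, only PA23Har comes out.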

Intersections with visibly numbered highways
Quote
US40: US40BusWhi | US73: US40BusWhi US40BusTho | A3: A3Zur
Distinguish two different same-bannered same-numbered routes as needed with the 3-letter city abbreviations. Also use the city abbreviation for bannerless same-designation spurs or branches, such as the Zurich A3 spur intersecting the main A3.
While we do want to use city abbreviations for routes (in the HB) that have them...

Distinguishing otherwise identical waypoints (not for exit numbers)
Quote
US95: NV57_Ren | NV57_Tah
If more than two points for the same non-exit-numbered cross road are needed, there are two options which can be used in combination with or ignoring the previous options for pairs of identical labels.
1. Use alphabetical suffixes _A, _B, _C, etc.
2. Choose 3-letter suffixes for nearby towns if they are fairly close. The 3-letter suffix should be the first 3 letters of the town name. Add a suffix with an underscore and those 3 letters.
...disambiguating multiple intersections with a single route, or with roads not in the HB, is done with an underscore.
A lot of the time, people use the former (a city abbreviation with no underscore) when they intended the latter (an underscored suffix to disambiguate multiple intersections).



Underscored 4-character city suffixes not ending in N, E, W, or S
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep '_...[A-DF-MO-RTUVXYZa-z]$'
Distinguishing otherwise identical waypoints (not for exit numbers)
Quote
NV88_PitS | NV88_PitN
If you need the same town twice, add a 4th letter that is a direction letter (_PitS and _PitN for southern and northern junctions near Pittston). If the town suffixes are not useful, are confusing, or require further elaboration (3+ junctions with same town), use the alphabetical suffixes instead.
Edit, 2019-11-17 20:58 EST: Only check primary labels.
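A quick way to see what that character class does: [A-DF-MO-RTUVXYZa-z] is every capital except E, N, S and W, plus every lowercase letter, so only suffixes ending in a direction letter, like _PitS or _PitN, get through unflagged. A minimal sketch with a hypothetical function name, reading "path:label" lines on stdin:

```shell
# bad_4char_suffixes: print labels ending in "_" plus exactly 4 characters
# where the 4th is not a direction letter (N, E, W or S).
# LC_ALL=C keeps the letter ranges in plain ASCII order.
bad_4char_suffixes() {
  LC_ALL=C grep '_...[A-DF-MO-RTUVXYZa-z]$'
}
```

Given NV88_PitS, NV88_PitN, NV88_Pitt, NV88_PitX and NV88_Pit, it prints NV88_Pitt and NV88_PitX; the 3-letter NV88_Pit isn't matched at all.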



2-character city suffixes
egrep -sv '^\+' $rg/*/*.wpt | cut -f1 -d' ' | grep -v '_U[0-9]$' | grep '_..$'
Edit, 2019-11-17 22:08 EST: Exclude false positives caused by numbered U-turn ramps.
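Here's the same pair of filters as a sketch on "path:label" lines (hypothetical function name). Numbered U-turn ramp labels like _U1 are legitimate, so they're dropped first; anything left that ends in an underscore plus exactly two characters is flagged.

```shell
# two_char_suffixes: flag labels ending in "_" plus exactly two characters,
# e.g. PA23_Ha, after skipping numbered U-turn ramps such as I-80_U1.
two_char_suffixes() {
  grep -v '_U[0-9]$' | grep '_..$'
}
```

From PA23_Ha, I-80_U1, PA23_Har and US1_A, only PA23_Ha is reported.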



Old highway designation labels
grep -s '^Old..[0-9]' $rg/*/*.wpt
Waypoints for roads that no longer have a name or no longer exist as a road
Quote
MainSt
If the old highway has a posted name or number, use that name or number (don't make one up), and label the waypoint according to the usual rules.
If US 30/Main Street becomes Main Street, then the label is MainSt, not something like OldUS30.

OldUS40
If the old highway has a posted name that mentions the old designation, then applying the above rule will result in a label like OldUS40 for "Old Route 40" if the route was formerly US 40.
There will be a good number of false-positive, legitimate labels here.
I'm not sure how closely these rules have been followed (or whether they're a requirement and not just an option); there are a lot of examples where "OldRt" or "OldHwy", etc. are used when a road is signed that way in the field, or when that's its official name. (Although Tim may have done it that way before these specific guidelines were codified...) This may be something worth discussing in the forum.



Long words, >4 characters
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep ':.*[a-z]{4}'
Intersections with named highways
Quote
FaiRd PapMillRd
For up to two other (specifying) words, truncate the word as follows:
1-4 letters - use whole word
5+ letters - use the first 3 letters. Don't use a made-up abbreviation. Fairchild Road becomes FaiRd, not FrchldRd or FchRd or anything else.
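Why ':.*[a-z]{4}' finds words longer than 4 letters: each word in a label normally starts with a capital, so four lowercase letters in a row usually mean a word of 5+ letters that should have been cut to 3. The leading colon matters too; it keeps matches out of the path, where something like usapa would otherwise trip it. A sketch on "path:label" lines (hypothetical function name; grep -E is the same as egrep):

```shell
# long_words: flag labels containing 4 consecutive lowercase letters after
# the "path:" colon, i.e. (usually) a capitalized word of 5+ letters that
# wasn't truncated to its first 3 letters.
long_words() {
  LC_ALL=C grep -E ':.*[a-z]{4}'
}
```

FaiRd and PapMillRd pass (Mill is a 4-letter word, kept whole), while PaperMillRd and FairchildRd are flagged.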



Too many words
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep -v ':[A-Z]*[a-z]*Mc[A-Z][A-Z][a-z]*$|:Mc[A-Z]{2,}[a-z]*[A-Z][a-z]*$|:[A-Z]*Mc[A-Z]+[a-z]?$' | egrep '[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*|[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+|[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]*|[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*'
Intersections with named highways
Quote
MarKingBlvd MLKingBlvd
If the cross road name has more than 3 words, use one of two options:
1. Pick out the two most important words besides the road type and use only those: Martin Luther King Boulevard becomes MarKingBlvd. Three words in total are included in shortened form.
2. Pick out one important word besides the road type and use it and the initials of the other words: Martin Luther King Boulevard becomes MLKingBlvd. Two words in total are included in shortened form along with initials of the rest.
An older style for 3-word road names was to use two truncated words and one initial, e.g. Lisbon Falls Village Rd -> LisFalVRd.
This was later deprecated in favor of the rule quoted above: LisFalRd, LFVilRd, etc.
Edit, 2019-11-17 21:40 EST: Use egrep for BSD (e.g. noreaster) compatibility due to + token.
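The four alternatives in the second egrep are easier to follow with examples. Each one needs four runs of capitals, which is how it counts "words"; the [a-z]? in alternatives 2-4 (and the trailing [a-z]* in the first) lets one of those words be a bare initial, like the V in LisFalVRd. Here are the same two expressions wrapped in a function (hypothetical name; grep -E in place of egrep), fed labels from the quoted rules:

```shell
# too_many_words: read "path:label" lines on stdin.
# 1st grep: drop Mc-style names (McDonald and friends) that would otherwise
#           look like extra words.
# 2nd grep: flag labels with 4+ capitalized runs, i.e. more than the allowed
#           three shortened words.
too_many_words() {
  LC_ALL=C grep -Ev ':[A-Z]*[a-z]*Mc[A-Z][A-Z][a-z]*$|:Mc[A-Z]{2,}[a-z]*[A-Z][a-z]*$|:[A-Z]*Mc[A-Z]+[a-z]?$' |
    LC_ALL=C grep -E '[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*|[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+|[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]*|[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*'
}
```

MarKingBlvd and MLKingBlvd (both allowed forms) pass; the old-style LisFalVRd and the four-word MarLutKingBlvd are flagged.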



McDonRd
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep 'Mc[A-Z][a-z]'
In the "Too many words" search above, the Mc[A-Z] tokens are included to filter out "Old McDonald Farm Rd" style false positives.
OTOH, McDonald is one word, so it makes sense to truncate it to McD rather than treat it as two words (Mc + Don) for McDon.



Tpke for Turnpike
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep 'Tpke'
CHM standardized on "Tpk" for turnpike, standard abbreviations notwithstanding...



Directional prefixes
egrep -sv "^\+|^.[a-z]|^$rg[0-9]+" $rg/*/*.wpt | cut -f1 -d' ' | grep ':[NEWS]' | egrep -v ":[A-Z][A-Z]/[A-Z][A-Z]$|:[NEWS]End$"
Intersections with named highways
Quote
6thSt | 33rdAve | SeeLn
Ignore any non-essential direction specifier. N. 6th St becomes 6thSt. 33rd Avenue SW becomes 33rdAve. W. Seedy Lane becomes SeeLn.

NorPkwy | SouBlvd
But keep directions that are the main part of the road name, such as NorPkwy for Northern Parkway or SouBlvd for Southeast Boulevard.
What's a "non-essential direction specifier" and what are "directions that are the main part of the road name" can get wibbly-wobbly at times; use your best judgment. Many of these may be legit, but they're worth checking out.
IMO though, N/E/W/S are common enough abbreviations for their respective directions, and there are enough examples of this throughout the data, that I'm not going to call for every WPondRd to be changed to a WestPondRd.
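A sketch of the directional-prefix search as a function, to show what each filter is for. It's hypothetical (the name and the REGION argument are mine) and assumes the same hwy_data/REGION/system/file.wpt layout; grep -HsEv is the original egrep -sv plus -H, so the file prefix and its colon are always present.

```shell
# directional_prefixes REGION: flag primary labels in REGION's .wpt files
# that start with N, E, W or S followed by anything but a lowercase letter,
# e.g. WPondRd.
# 1st grep (on raw .wpt lines): drop hidden points (^[+]), labels whose 2nd
#   character is lowercase (NorPkwy, SouBlvd: the direction is the name
#   itself), and labels starting with the region's own prefix (PA23 in PA).
# 3rd grep: keep labels starting with a direction letter.
# 4th grep: skip region lines (NY/PA) and route ends (NEnd, SEnd, ...).
directional_prefixes() {
  rg=$1
  LC_ALL=C grep -HsEv "^[+]|^.[a-z]|^$rg[0-9]+" "$rg"/*/*.wpt | cut -f1 -d' ' |
    LC_ALL=C grep ':[NEWS]' |
    LC_ALL=C grep -Ev ':[A-Z][A-Z]/[A-Z][A-Z]$|:[NEWS]End$'
}
```

Run from hwy_data/ as directional_prefixes PA, it should list candidates like WPondRd along with the false positives mentioned above.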



Similarly, a search for suffixed directional specifiers:
egrep -sv '^\+' $rg/*/*.wpt | cut -f1 -d' ' | grep '[NEWS]$' | egrep -v '_|[0-9]{1,3}[NEWS]$|:[A-Z][A-Z]/[A-Z][A-Z]$|:CR[A-Z]$|:Rd[A-Z]$|:Ave[A-Z]$|:I-[0-9]{1,3}BS$'
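Likewise for the suffix search, here's a sketch on "path:label" lines (hypothetical function name). It writes I- without a backslash: - isn't special outside brackets, and recent GNU grep versions warn about a stray \ before it.

```shell
# suffixed_directions: flag labels ending in N, E, W or S, minus the usual
# legitimate cases: underscore suffixes, numbers + direction (US1N, 23S),
# region lines (MD/DE), lettered CRs/roads/avenues (CRN, RdS, AveE) and
# business spurs like I-80BS.
suffixed_directions() {
  LC_ALL=C grep '[NEWS]$' |
    LC_ALL=C grep -Ev '_|[0-9]{1,3}[NEWS]$|:[A-Z][A-Z]/[A-Z][A-Z]$|:CR[A-Z]$|:Rd[A-Z]$|:Ave[A-Z]$|:I-[0-9]{1,3}BS$'
}
```

Of 33rdAveSW, US1N, MD/DE, AveS, PA23_S, I-80BS and MainSt, only 33rdAveSW is reported.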

Edit, 2019-11-17 20:11 EST: Use cut instead of sed to isolate primary labels. More concise, clear, fast.
Edit, 2019-11-17 21:08 EST: Use grep -s to suppress error messages for regions with no WPTs.
Edit, 2019-11-18 01:07 EST: Use egrep for more human-readable expression strings, particularly with | and {}
Edit, 2019-11-18 01:53 EST: Exclude hidden points from all searches.
« Last Edit: December 23, 2022, 08:54:42 am by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

Re: Using grep to search highway data, waypoint labels, etc.
« Reply #1 on: October 06, 2019, 01:12:13 pm »
Rte(Number) style labels on US Interstates
for file in */usai/*.wpt; do results=`grep -H . $file | cut -f1 -d' ' | grep 'I-[0-9][0-9]*(..*)' | wc -l`; if [ $results -gt 2 ]; then grep -H . $file | cut -f1 -d' ' | grep 'I-[0-9][0-9]*(..*)'; fi; done
One point of confusion can be the two styles of waypoint labeling for concurrencies with Interstate Highways and other exit numbered routes.

Interchanges on exit-numbered highways
Quote
I-80: 56 | 87(75) | 89(75) | 63
In multiplexes where the concurrency uses exit numbers from the other highway, put the highway number in parentheses. Drop the letter prefix of the concurrent highway if it is more than one character long: I-75 becomes (75). A5 can stay as (A5).
The vast majority of Interstates (with rare exceptions, most obviously WY I-180) will be exit-numbered routes, and thus fall into this category.

Waypoint labels for multiplexes
Quote
US90: LA76 | I-49(45) | I-49(47) | I-49(52) | US40
For non-exit-numbered routes concurrent with a numbered, exit-numbered route, use the concurrent highway designation with the exit numbers in parentheses.
We'll sometimes see this style used erroneously on exit-numbered routes; this basic (Interstate-to-Interstate only) search is designed to pick up on those cases.
It only prints results for routes with at least 3 such labels, because a route can legitimately have one or two: some routes, such as NY I-490 or ME I-295, have both ends at unnumbered interchanges with the parent route, necessitating this style of label, and some routes, such as TX I-69, are not fully exit-numbered yet and contain some crossroad-style labels.
Right now, it reports some malformed labels in ms.i055.wpt & wi.i041.wpt.
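Here's the same idea restructured as a function, mainly to show the threshold logic: grab the matching labels once, count them, and only print a file's labels if there are more than 2. It's a sketch, not a drop-in replacement: the function name is hypothetical, and it takes the .wpt files as arguments (e.g. */usai/*.wpt from hwy_data/).

```shell
# rte_number_labels FILE...: print "file:label" for Interstate labels in the
# I-49(45) style, but only for files with more than 2 of them (one or two can
# be legitimate, e.g. unnumbered interchanges at both ends of a route).
rte_number_labels() {
  for file in "$@"; do
    [ -r "$file" ] || continue
    # "|| true" keeps a no-match grep from aborting the loop under "set -e".
    matches=$(cut -f1 -d' ' "$file" | grep 'I-[0-9][0-9]*(..*)') || true
    # grep -c . counts non-empty lines, so no matches gives 0.
    count=$(printf '%s\n' "$matches" | grep -c .) || true
    if [ "$count" -gt 2 ]; then
      printf '%s\n' "$matches" | while IFS= read -r label; do
        printf '%s:%s\n' "$file" "$label"
      done
    fi
  done
}
```

A file with three I-49(nn) labels is listed in full; one with only two such labels is skipped.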



How far down this rabbit hole can we go?
An experiment. A somewhat unwieldy one...
rm -f exit_numbers.log looks_unnumbered.log looks_numbered.log
touch exit_numbers.log looks_unnumbered.log looks_numbered.log
for rg in `ls -F | grep '/$' | cut -f1 -d/`; do
  for sys in `ls -F $rg | grep '/$' | cut -f1 -d/`; do
    for file in `ls $rg/$sys | grep '\.wpt$'`; do
      # This gets messed up by files with spaces in their names, and ignores them.
      if [ -r $rg/$sys/$file ]; then
        LooksNumbered=`cut -f1 -d' ' $rg/$sys/$file | grep -c -m 1 '^[0-9][0-9]*$'`
        if [ $LooksNumbered == 0 ]; then
          results=`cut -f1 -d' ' $rg/$sys/$file | grep '^\*\?[0-9][0-9]*[A-Z]\?([0-9A-Za-z][0-9A-Za-z]*)' | wc -l`
          if (( $results != 0 && $results != `wc -l < $rg/$sys/$file` )); then
            echo -e "\n$rg/$sys/$file looks like a non-exit-numbered route" \
            | tee -a exit_numbers.log | tee -a looks_unnumbered.log
            cut -f1 -d' ' $rg/$sys/$file \
            | grep '^\*\?[0-9][0-9]*[A-Z]\?([0-9A-Za-z][0-9A-Za-z]*)' \
            | tee -a exit_numbers.log | tee -a looks_unnumbered.log
          fi
        fi
        if [ $LooksNumbered == 1 ]; then
          results=`cut -f1 -d' ' $rg/$sys/$file | grep '^\*\?[A-Z][A-Z]*[A-Za-z]*-\?[0-9]*[A-Za-z]*([A-Z]*[0-9]*[A-Z]*)$' | wc -l`
          if [ $results -gt 2 ]; then
            echo -e "\n$rg/$sys/$file looks like an exit-numbered highway" \
            | tee -a exit_numbers.log | tee -a looks_numbered.log
            cut -f1 -d' ' $rg/$sys/$file \
            | grep '^\*\?[A-Z][A-Z]*[A-Za-z]*-\?[0-9]*[A-Za-z]*([A-Z]*[0-9]*[A-Z]*)$' \
            | tee -a exit_numbers.log | tee -a looks_numbered.log
          fi
        fi
      fi
    done
  done
done
lbc=`cut -f1 -d' ' exit_numbers.log | grep -v '\.wpt$' | wc -w`
rtc=`cut -f1 -d' ' exit_numbers.log | grep '\.wpt$' | wc -l`
rgc=`cut -f1 -d' ' exit_numbers.log | grep '\.wpt$' | cut -f1 -d/ | uniq | wc -l`
lbw=`echo $lbc | wc -c`
rtw=`echo $rtc | wc -c`
rgw=`echo $rgc | wc -c`
printf   'looks   numbered: %'"$lbw"'d labels in %'"$rtw"'d routes in %'"$rgw"'d regions\n' \
   `cut -f1 -d' ' looks_numbered.log | grep -v '\.wpt$' | wc -w` \
   `cut -f1 -d' ' looks_numbered.log | grep '\.wpt$' | wc -l` \
   `cut -f1 -d' ' looks_numbered.log | grep '\.wpt$' | cut -f1 -d/ | uniq | wc -l`
printf   'looks unnumbered: %'"$lbw"'d labels in %'"$rtw"'d routes in %'"$rgw"'d regions\n' \
   `cut -f1 -d' ' looks_unnumbered.log | grep -v '\.wpt$' | wc -w` \
   `cut -f1 -d' ' looks_unnumbered.log | grep '\.wpt$' | wc -l` \
   `cut -f1 -d' ' looks_unnumbered.log | grep '\.wpt$' | cut -f1 -d/ | uniq | wc -l`
printf   '        combined: %'"$lbw"'d labels in %'"$rtw"'d routes in %'"$rgw"'d regions\n' $lbc $rtc $rgc

This originally worked on my Linux machines but not on noreaster, for reasons I hadn't looked into. Fixed!
looks   numbered:    600 labels in   58 routes in   26 regions
looks unnumbered:  12425 labels in  703 routes in  102 regions
        combined:  13025 labels in  761 routes in  112 regions
...Back away slowly.
« Last Edit: November 17, 2019, 04:36:11 pm by yakra »

Offline yakra

Re: Using grep to search highway data, waypoint labels, etc.
« Reply #2 on: November 18, 2019, 03:16:35 am »
Label lacks generic highway type
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep 'Old[0-9]+_|Old[0-9]+$'
Label should be OldUS42, OldHwy42, OldRte42, OldME42, etc., not "Old42". This was a datacheck on CHM, but although a comment describing the siteupdate program's DatacheckEntry class mentions a LACKS_GENERIC datacheck code, there's nothing actually implementing it yet.
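A small sketch of the check on "path:label" lines, in case it's handy for prototyping that LACKS_GENERIC datacheck. The function name is hypothetical and this is not siteupdate code. Note the pattern isn't anchored to the start of the label, so it also catches labels like CROld50 and *Old70.

```shell
# lacks_generic: flag labels with "Old" directly followed by a number, either
# at the end of the label or before an underscore suffix (Old42, Old42_Pit),
# instead of OldUS42, OldHwy42, etc.
lacks_generic() {
  grep -E 'Old[0-9]+_|Old[0-9]+$'
}
```

OldUS42 and OldHwy42 pass; Old42, Old42_Pit, *Old70 and CROld50 are flagged.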

There are only 5 results (in 4 routes) project-wide. Not bad.
FL/usaus/fl.us027.wpt: CROld50 (should be WasSt?)
NOR/eure/nor.e18.wpt: *Old70
NOR/eure/nor.e18.wpt: *Old67
NOR/eursf/nor.oslokrimotkri.wpt: *Old70
ON/canon/on.on127.wpt: Old127

Offline neroute2

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 1141
  • Last Login:Today at 04:16:10 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #3 on: November 18, 2019, 08:45:36 am »
FL/usaus/fl.us027.wpt: CROld50 (should be WasSt?)
It's a numbered county road with Old 50 or Old Hwy 50 in the shield. False positive.

Offline michih

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4849
  • Last Login:Yesterday at 11:21:09 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #4 on: November 18, 2019, 11:50:36 am »
NOR/eure/nor.e18.wpt: *Old70
NOR/eure/nor.e18.wpt: *Old67
NOR/eursf/nor.oslokrimotkri.wpt: *Old70

Former location of exit 70 and exit 67. Fine by me.

Offline yakra

Re: Using grep to search highway data, waypoint labels, etc.
« Reply #5 on: November 18, 2019, 01:17:40 pm »
It's a numbered county road with Old 50 or Old Hwy 50 in the shield. False positive.
Oh wows. I did look for something like that, just not hard enough. :D

Offline rickmastfan67

  • TM Collaborator (A)
  • Hero Member
  • *****
  • Posts: 2064
  • Gender: Male
  • Last Login:Today at 12:07:50 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #6 on: November 21, 2019, 04:28:56 am »
It's a numbered county road with Old 50 or Old Hwy 50 in the shield. False positive.
Oh wows. I did look for something like that, just not hard enough. :D

Welcome to the world of insanity in FL.  :P

Offline yakra

Re: Using grep to search highway data, waypoint labels, etc.
« Reply #7 on: November 22, 2019, 01:19:22 am »
Could that be considered a form of bannered route?  ;D

Offline cvoight

  • Newbie
  • *
  • Posts: 18
  • Last Login:September 03, 2021, 10:39:18 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #8 on: January 10, 2020, 02:09:49 pm »
The closest native equivalent in Windows PowerShell is Select-String, which is decently powerful, though no grep. Executing from the hwy_data directory, use Get-ChildItem -Include "*.wpt" -Recurse to feed all the files into Select-String. When searching waypoint labels, your pattern will always start with ^, which anchors the match to the start of the line (where the primary label is), serving a similar purpose to the cut -f1 -d " " portion of yakra's commands.

For example, executing the following command finds any waypoint labels that start with US- (i.e. US routes with a hyphen between the US and the route number).

Code:
Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US-"
Slightly more complicated: find all the waypoints in the US that start with I followed directly by a number. [Note: this is not quite restricted to the US as there are some non-US regions with highway directories that contain the string "usa", but it eliminates most regions and (currently) all false positives. See the code section in this post for a regex that matches only US states and territories.]

Code:
Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^I[0-9]"
« Last Edit: January 10, 2020, 04:27:55 pm by cvoight »

Offline cvoight

Re: Using grep to search highway data, waypoint labels, etc.
« Reply #9 on: January 10, 2020, 08:51:57 pm »
Pursuant to a point by neroute2 that I had overlooked:

Quote
FaiRd, PapMillRd
Truncation rules: abbreviate the generic road type (Rd for Road, Blvd for Boulevard, etc.) if it's one of the very common types. Otherwise, use the first three letters: Uli for Ulica. Skip the final period.

For up to two other (specifying) words, truncate the word as follows:
1-4 letters - use whole word
5+ letters - use the first 3 letters. Don't use a made-up abbreviation. Fairchild Road becomes FaiRd, not FrchldRd or FchRd or anything else.

First, find all waypoints with a capital letter followed by 4 or more lowercase letters, executing from the hwy_data directory: Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[A-Z][a-z]{4,}" (In practice, I piped this to a text file by appending | Out-File .\four_lowercase.txt)

Scanning this file, I filter the results by deleting: many odd results from ITA\ita.nsa (2).wpt; the Minnesota subway lines (MN\usamnmt\BlueLine.wpt & MN\usamnmt\GreenLine.wpt); all lines that contain a + [presumably a waypoint alias] (in Sublime Text, run a regex search for ^.*\+.*$, select Find All, and press Delete twice); and results from the _boundaries directory. This leaves 151 results, the vast majority of which are genuine hits. (Some look like simple miscapitalizations, e.g. CA\usaca\ca.ca001.wpt:299:Fulst; others are more involved. There are quite a few in preview systems, but still many in active systems.)

region\system\wpt file : line number : result

Code:
AL\usaal\al.al005.wpt:69:PrattHwy http://www.openstreetmap.org/?lat=33.577266&lon=-86.913711
AL\usaal\al.al013.wpt:44:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaal\al.al013.wpt:47:RoyalAve http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaal\al.al017.wpt:115:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaal\al.al017.wpt:118:RoyalAve http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaal\al.al041.wpt:68:WaterAve http://www.openstreetmap.org/?lat=32.412348&lon=-86.995336
AL\usaal\al.al053.wpt:13:TN7Truck http://www.openstreetmap.org/?lat=34.991991&lon=-86.844633
AL\usaus\al.us043.wpt:2:CraftHwy http://www.openstreetmap.org/?lat=30.761632&lon=-88.074029
AL\usaus\al.us043.wpt:12:SteelDr http://www.openstreetmap.org/?lat=31.139821&lon=-88.009876
AL\usaus\al.us043.wpt:58:CedarAve http://www.openstreetmap.org/?lat=32.518402&lon=-87.835168
AL\usaus\al.us043.wpt:150:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaus\al.us043.wpt:153:RoyalAve http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaus\al.us072.wpt:6:HaleyDr http://www.openstreetmap.org/?lat=34.734590&lon=-87.897191
AL\usaus\al.us072.wpt:20:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaus\al.us072.wpt:23:RoyalAv http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaus\al.us072.wpt:40:ClintSt http://www.openstreetmap.org/?lat=34.788594&lon=-86.964217
AL\usaus\al.us072.wpt:51:NanceRd http://www.openstreetmap.org/?lat=34.755622&lon=-86.731852
AL\usaus\al.us072.wpt:65:BrockRd http://www.openstreetmap.org/?lat=34.734947&lon=-86.430259
AL\usaus\al.us078.wpt:38:PrattHwy http://www.openstreetmap.org/?lat=33.577266&lon=-86.913711
AL\usaus\al.us080.wpt:38:WaterAve http://www.openstreetmap.org/?lat=32.412348&lon=-86.995336
AL\usaus\al.us082.wpt:27:CampusDr http://www.openstreetmap.org/?lat=33.212584&lon=-87.525823
AL\usaus\al.us098.wpt:12:I-65Ramps http://www.openstreetmap.org/?lat=30.707093&lon=-88.122593
AL\usaus\al.us231.wpt:168:WinchRd http://www.openstreetmap.org/?lat=34.792814&lon=-86.577448
AL\usaus\al.us278.wpt:42:RidgeRd http://www.openstreetmap.org/?lat=34.193062&lon=-87.194286
AL\usaus\al.us278.wpt:82:CleveAv http://www.openstreetmap.org/?lat=34.019661&lon=-86.086024
AL\usaus\al.us280.wpt:9:CahabaRd http://www.openstreetmap.org/?lat=33.483357&lon=-86.779590
AL\usaus\al.us280.wpt:10:HollyBlvd http://www.openstreetmap.org/?lat=33.482355&lon=-86.778045
AL\usaus\al.us431.wpt:127:EBroadSt http://www.openstreetmap.org/?lat=33.986730&lon=-85.965711
AL\usaus\al.us431.wpt:135:CleveAv http://www.openstreetmap.org/?lat=34.019661&lon=-86.086024
AL\usaus\al.us431.wpt:185:WinchRd http://www.openstreetmap.org/?lat=34.792814&lon=-86.577448
ARG\argrn\arg.rn003.wpt:182:AveoseIng http://www.openstreetmap.org/?lat=-45.824216&lon=-67.466655
BLZ\blzar\blz.ar001.wpt:15:ValofPeaRd http://www.openstreetmap.org/?lat=17.258452&lon=-88.804291
CA\usaca\ca.ca001.wpt:299:Fulst http://www.openstreetmap.org/?lat=37.773133&lon=-122.471791
CA\usaca\ca.ca002.wpt:6:AveStars http://www.openstreetmap.org/?lat=34.061137&lon=-118.418541
CA\usaca\ca.ca029.wpt:5:Tenst http://www.openstreetmap.org/?lat=38.109986&lon=-122.254744
CA\usaush\ca.us066hishol.wpt:10:AveStars http://www.openstreetmap.org/?lat=34.061137&lon=-118.418541
CO\usaus\co.us285.wpt:67:LightLn http://www.openstreetmap.org/?lat=39.530608&lon=-105.305586
COD\afrrtr\cod.rtr05.wpt:206:Kimbi http://www.openstreetmap.org/?lat=-5.141853&lon=19.044714
COD\afrtah\cod.tah10.wpt:206:Kimbi http://www.openstreetmap.org/?lat=-5.141853&lon=19.044714
COD\codn\cod.n001.wpt:191:Kimbi http://www.openstreetmap.org/?lat=-5.141853&lon=19.044714
FL\usaus\fl.us001.wpt:274:LewSpdwy http://www.openstreetmap.org/?lat=29.940358&lon=-81.335014
GA\usaga\ga.ga003.wpt:87:AMSpeed http://www.openstreetmap.org/?lat=33.389291&lon=-84.307837
GA\usaus\ga.us019.wpt:87:AMSpeed http://www.openstreetmap.org/?lat=33.389291&lon=-84.307837
GA\usaus\ga.us041.wpt:134:AMSpeed http://www.openstreetmap.org/?lat=33.389291&lon=-84.307837
IA\usaia\ia.ia028.wpt:2:Prole http://www.openstreetmap.org/?lat=41.409261&lon=-93.726897
IA\usaia\ia.ia148.wpt:3:PearlSt http://www.openstreetmap.org/?lat=40.668872&lon=-94.721396
IA\usaia\ia.ia163.wpt:13:Monroe http://www.openstreetmap.org/?lat=41.544304&lon=-93.130245
IA\usaia\ia.ia163.wpt:17:Otley http://www.openstreetmap.org/?lat=41.464405&lon=-93.038921
ID\usaid\id.id004.wpt:5:Burke http://www.openstreetmap.org/?lat=47.520635&lon=-115.818837
ID\usaid\id.id013buskoo.wpt:2:Brdway_E http://www.openstreetmap.org/?lat=46.145176&lon=-115.971272
ID\usaid\id.id014.wpt:16:Golden http://www.openstreetmap.org/?lat=45.811881&lon=-115.682401
ID\usaid\id.id021.wpt:7:HilndVlySmt http://www.openstreetmap.org/?lat=43.570123&lon=-116.031404
ID\usaid\id.id022.wpt:12:SmallCO http://www.openstreetmap.org/?lat=44.172718&lon=-112.418217
ID\usaid\id.id051.wpt:14:Blkstn http://www.openstreetmap.org/?lat=42.440221&lon=-115.887000
IRL\eurtr\irl.waw.wpt:64:Veagh http://www.openstreetmap.org/?lat=54.974363&lon=-7.578678
ISL\islth\isl.th001.wpt:105:ThverDal http://www.openstreetmap.org/?lat=65.524878&lon=-19.797192
ISL\islth\isl.th083.wpt:14:AegisSid http://www.openstreetmap.org/?lat=65.946612&lon=-18.185484
ISL\islth\isl.th326.wpt:3:Haell http://www.openstreetmap.org/?lat=64.063079&lon=-20.238147
ISL\islth\isl.th413.wpt:8:ThingTor http://www.openstreetmap.org/?lat=64.101128&lon=-21.780245
ISL\islth\isl.th635.wpt:2:Arsel http://www.openstreetmap.org/?lat=65.892119&lon=-22.307868
ISL\islth\isl.th643.wpt:50:Arnes http://www.openstreetmap.org/?lat=66.011236&lon=-21.510061
ISL\islth\isl.th645.wpt:16:Bakki http://www.openstreetmap.org/?lat=65.776642&lon=-21.520329
LA\usala\la.la0428.wpt:3:GendeGDr_E http://www.openstreetmap.org/?lat=29.922734&lon=-90.017268
LA\usala\la.la0454.wpt:3:ABPorterRd http://www.openstreetmap.org/?lat=31.189230&lon=-92.248641
LA\usala\la.la0608.wpt:6:DavisLdg http://www.openstreetmap.org/?lat=32.036520&lon=-91.136130
LA\usala\la.la1168.wpt:3:BoisdArcRd http://www.openstreetmap.org/?lat=30.656146&lon=-92.228916
MEX-BC\mexd\mexbc.mex002d.wpt:2:BlvrJosexPon http://www.openstreetmap.org/?lat=32.536697&lon=-116.930344
MEX-DF\mexsf\mexdf.aniper.wpt:44:AveConst http://www.openstreetmap.org/?lat=19.410897&lon=-99.193870
MEX-DF\mexsf\mexdf.aniper.wpt:54:AveConsc http://www.openstreetmap.org/?lat=19.444180&lon=-99.218023
MEX-DF\mexsf\mexdf.auturbnte.wpt:3:AveConst http://www.openstreetmap.org/?lat=19.410897&lon=-99.193870
MEX-DF\mexsf\mexdf.auturbnte.wpt:8:AveConsc http://www.openstreetmap.org/?lat=19.444180&lon=-99.218023
MEX-YUC\mexsf\mexyuc.yuc01.wpt:24:Chenku http://www.openstreetmap.org/?lat=21.018874&lon=-89.661055
MO\usai\mo.i064.wpt:28:FortyDr http://www.openstreetmap.org/?lat=38.631539&lon=-90.378728
MO\usamo\mo.mo001.wpt:5:ChoTrfwy http://www.openstreetmap.org/?lat=39.193940&lon=-94.548683
MO\usamo\mo.mo125.wpt:9:GladTopTrail http://www.openstreetmap.org/?lat=36.660123&lon=-92.851810
MO\usamo\mo.mo269.wpt:1:FrontSt http://www.openstreetmap.org/?lat=39.130816&lon=-94.522848
MO\usaus\mo.us040.wpt:126:FortyDr http://www.openstreetmap.org/?lat=38.631539&lon=-90.378728
MP\usamp\mp.mp101.wpt:6:Isang http://www.openstreetmap.org/?lat=14.149185&lon=145.162826
MS\usaus\ms.us045.wpt:45:Lauder http://www.openstreetmap.org/?lat=32.519754&lon=-88.517261
MS\usaus\ms.us049e.wpt:26:Sidon http://www.openstreetmap.org/?lat=33.408159&lon=-90.204127
MS\usaus\ms.us051.wpt:87:I55Front http://www.openstreetmap.org/?lat=32.405110&lon=-90.145912
MS\usaus\ms.us051.wpt:98:UnionSt http://www.openstreetmap.org/?lat=32.599397&lon=-90.035834
MS\usaus\ms.us051.wpt:163:PearlSt http://www.openstreetmap.org/?lat=33.786370&lon=-89.809906
MS\usaus\ms.us051.wpt:182:HentzRd http://www.openstreetmap.org/?lat=34.225748&lon=-89.939361
MS\usaus\ms.us061.wpt:7:SligoSt http://www.openstreetmap.org/?lat=31.084565&lon=-91.304219
MS\usaus\ms.us061.wpt:22:SmithRd http://www.openstreetmap.org/?lat=31.304112&lon=-91.358829
MS\usaus\ms.us061.wpt:94:BowieRd http://www.openstreetmap.org/?lat=32.386739&lon=-90.827322
MS\usaus\ms.us061.wpt:111:KelsoRd http://www.openstreetmap.org/?lat=32.633701&lon=-90.863714
MS\usaus\ms.us061.wpt:113:DixieRd http://www.openstreetmap.org/?lat=32.654190&lon=-90.869765
MS\usaus\ms.us061.wpt:115:OmegaRd http://www.openstreetmap.org/?lat=32.669943&lon=-90.879121
MS\usaus\ms.us061.wpt:122:MooreRd http://www.openstreetmap.org/?lat=32.778327&lon=-90.933881
MS\usaus\ms.us061.wpt:153:HeadsRd http://www.openstreetmap.org/?lat=33.470507&lon=-90.846505
MS\usaus\ms.us061.wpt:170:MoodyRd http://www.openstreetmap.org/?lat=33.908783&lon=-90.742393
MS\usaus\ms.us072.wpt:41:SalemRd http://www.openstreetmap.org/?lat=34.909021&lon=-88.491526
MS\usaus\ms.us080.wpt:43:GuldeRd http://www.openstreetmap.org/?lat=32.294026&lon=-89.876533
MS\usaus\ms.us080.wpt:82:AdamsSt http://www.openstreetmap.org/?lat=32.326161&lon=-88.931065
MS\usaus\ms.us084.wpt:2:CanalSt http://www.openstreetmap.org/?lat=31.552062&lon=-91.412044
MS\usaus\ms.us084.wpt:20:Roxie http://www.openstreetmap.org/?lat=31.507435&lon=-91.064987
MS\usaus\ms.us084.wpt:22:KirbyRd http://www.openstreetmap.org/?lat=31.495177&lon=-91.013274
MS\usaus\ms.us084.wpt:107:MaxeyRd http://www.openstreetmap.org/?lat=31.695492&lon=-89.214649
MS\usaus\ms.us084.wpt:109:FlyntRd http://www.openstreetmap.org/?lat=31.694178&lon=-89.185038
MS\usaus\ms.us278.wpt:11:HeadsRd http://www.openstreetmap.org/?lat=33.470507&lon=-90.846505
MS\usaus\ms.us278.wpt:28:MoodyRd http://www.openstreetmap.org/?lat=33.908783&lon=-90.742393
MS\usaus\ms.us278.wpt:135:WolfeRd http://www.openstreetmap.org/?lat=33.883225&lon=-88.274760
MS\usaus\ms.us425.wpt:3:CanalSt http://www.openstreetmap.org/?lat=31.552062&lon=-91.412044
MS\usausb\ms.us051sprjac.wpt:7:EndMaint http://www.openstreetmap.org/?lat=32.285273&lon=-90.183120
MT\usausb\mt.us093busmis.wpt:3:MountAve http://www.openstreetmap.org/?lat=46.855936&lon=-114.013120
MT\usausb\mt.us093busmis.wpt:5:RiverSt http://www.openstreetmap.org/?lat=46.869683&lon=-114.003443
NC\usanc\nc.nc126.wpt:4:LakeJamesStaPk http://www.openstreetmap.org/?lat=35.728239&lon=-81.901703
NC\usaus\nc.us064.wpt:142:SpeMtnrd http://www.openstreetmap.org/?lat=35.716768&lon=-79.910329
NC\usaus\nc.us176.wpt:4:SlakeSumRd http://www.openstreetmap.org/?lat=35.233669&lon=-82.391018
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973
NM\usaus\nm.us054.wpt:25:Polly http://www.openstreetmap.org/?lat=33.570846&lon=-105.981481
ON\cannf\on.rr174.wpt:3:Traway http://www.openstreetmap.org/?lat=45.435736&lon=-75.598018
OR\usaor\or.or003.wpt:15:FloraLn http://www.openstreetmap.org/?lat=45.891054&lon=-117.262316
OR\usaor\or.or018buswil.wpt:3:Mainst http://www.openstreetmap.org/?lat=45.078445&lon=-123.486414
OR\usaor\or.or034.wpt:82:Colst http://www.openstreetmap.org/?lat=44.555846&lon=-123.079598
OR\usaor\or.or099.wpt:163:SixthAve http://www.openstreetmap.org/?lat=43.395140&lon=-123.314978
OR\usaor\or.or099.wpt:234:Brdwy http://www.openstreetmap.org/?lat=44.049927&lon=-123.086561
OR\usaor\or.or126busspr.wpt:3:Brdwy http://www.openstreetmap.org/?lat=44.049927&lon=-123.086561
OR\usaor\or.or138.wpt:64:CleFalls http://www.openstreetmap.org/?lat=43.247891&lon=-122.233865
OR\usaor\or.or173.wpt:11:TimLodge http://www.openstreetmap.org/?lat=45.330555&lon=-121.709290
OR\usaor\or.or219.wpt:17:NorthSt http://www.openstreetmap.org/?lat=45.304369&lon=-122.972760
OR\usaor\or.or223.wpt:4:MaxcreRd http://www.openstreetmap.org/?lat=44.695048&lon=-123.432212
OR\usaor\or.or238.wpt:16:Mainst http://www.openstreetmap.org/?lat=42.325412&lon=-122.943449
OR\usaor\or.or281.wpt:4:ClearCreRd http://www.openstreetmap.org/?lat=45.519564&lon=-121.596680
OR\usaor\or.or551.wpt:3:NEArdntRd http://www.openstreetmap.org/?lat=45.259603&lon=-122.769470
OR\usaush\or.us030hiscol.wpt:6:StarkSt http://www.openstreetmap.org/?lat=45.515610&lon=-122.361689
PR\usapr\pr.pr0003.wpt:83:CllLMRivera_E http://www.openstreetmap.org/?lat=18.007422&lon=-65.899344
QC\canqc\qc.qc198.wpt:12:AvMiller http://www.openstreetmap.org/?lat=48.951245&lon=-65.494206
ROU\roudj\rou.dj682e.wpt:3:Igrfry http://www.openstreetmap.org/?lat=46.123455&lon=20.786380
SCT\eurtr\sct.fifcoatr.wpt:50:BackStile http://www.openstreetmap.org/?lat=56.297586&lon=-2.658691
SD\usasd\sd.sd053.wpt:10:Mosher http://www.openstreetmap.org/?lat=43.474894&lon=-100.294061
SD\usasd\sd.sd471.wpt:2:Rumford http://www.openstreetmap.org/?lat=43.131025&lon=-103.700895
SD\usasd\sd.sd471.wpt:7:Provo http://www.openstreetmap.org/?lat=43.192255&lon=-103.833761
SD\usaus\sd.us016.wpt:11:AveofChiefs http://www.openstreetmap.org/?lat=43.820316&lon=-103.640041
SD\usaus\sd.us385.wpt:29:AveofChiefs http://www.openstreetmap.org/?lat=43.820316&lon=-103.640041
SD\usausb\sd.us014altdea.wpt:4:Savoy http://www.openstreetmap.org/?lat=44.352716&lon=-103.931608
SK\cansk\sk.sk010.wpt:33:Barvas http://www.openstreetmap.org/?lat=51.210211&lon=-102.099724
UT\usausb\ut.us191bushel.wpt:2:PopSt PoplarSt http://www.openstreetmap.org/?lat=39.682635&lon=-110.854765 [should this have a + before PoplarSt? it appears correctly in Highway Browser.]
UT\usausb\ut.us191bushel.wpt:3:JanSt JanetSt http://www.openstreetmap.org/?lat=39.688473&lon=-110.854679 [should this have a + before JanetSt? it appears correctly in Highway Browser.]
UT\usaut\ut.ut024.wpt:137:OldStateHwy http://www.openstreetmap.org/?lat=38.873870&lon=-110.359243
UT\usaut\ut.ut257.wpt:11:Bloom http://www.openstreetmap.org/?lat=38.937581&lon=-112.808647
VA\usaus\va.us050.wpt:50:QueenSt http://www.openstreetmap.org/?lat=38.889655&lon=-77.077965
WI\usaus\wi.us010.wpt:159:Ferry http://www.openstreetmap.org/?lat=44.090020&lon=-87.650449
WI\usausb\wi.us041altosh.wpt:7:SnellRd http://www.openstreetmap.org/?lat=44.068409&lon=-88.543153
WI\usawi\wi.wi080.wpt:64:Sprague http://www.openstreetmap.org/?lat=44.147633&lon=-90.131750
WI\usawi\wi.wi169.wpt:8:Gurney http://www.openstreetmap.org/?lat=46.470734&lon=-90.508289
WI\usawi\wi.wi175.wpt:7:LloydSt http://www.openstreetmap.org/?lat=43.054684&lon=-87.971778
ZWE\zwep\zwe.p011.wpt:22:Gokwe http://www.openstreetmap.org/?lat=-18.217632&lon=28.942280
« Last Edit: January 11, 2020, 09:15:13 am by cvoight »

Offline oscar

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 1584
  • Last Login:Today at 02:29:04 am
    • Hot Springs and Highways pages
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #10 on: January 10, 2020, 09:51:45 pm »
Many of these errors are in in-development/preview systems, rather than active systems. Many errors in those systems will be picked up and fixed in the peer review process, to get the systems ready for promotion to active status.

In the preview systems I manage, peer review has already picked up all the listed errors, and fixes are in my queue. Of the two errors in my active systems, one was already in my queue, and the other has just been added.

Offline michih

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4849
  • Last Login:Yesterday at 11:21:09 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #11 on: January 11, 2020, 04:32:22 am »
@cvoight, thanks for your effort.

We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php

I just wonder why some are not detected, e.g.:

Code: [Select]
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973

@yakra, any idea? Should I open a GitHub issue?

It's an active system and I'll fix it.

Offline cvoight

  • Newbie
  • *
  • Posts: 18
  • Last Login:September 03, 2021, 10:39:18 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #12 on: January 15, 2020, 03:49:25 pm »
Parenthesizing your Select-String query and appending .length allows you to find the total number of results. For example, to find the number of waypoint labels with a US route as the primary highway (31,265):

Code: [Select]
PS ..\hwy_data> (Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US").length
31265
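For anyone following along in bash instead of PowerShell, a rough grep counterpart of that count might look like the sketch below. It assumes the hwy_data/REGION/SYSTEM/*.wpt layout and that it's run from hwy_data/; count_us_primary is just an illustrative name. Two differences to keep in mind: Select-String ignores case by default while grep doesn't, and the glob */usa*/*.wpt only matches system directories whose names begin with usa, so the total won't necessarily come out to exactly 31,265.

```shell
# Count lines (primary waypoint labels) that begin with "US" in the given
# .wpt files. Assumes each .wpt line starts with its primary label.
count_us_primary() {
  # -h drops the file-name prefixes, -s hides errors for unmatched globs,
  # and -e marks the pattern so "--" can safely end the options.
  # grep is case-sensitive, so labels starting "Us" or "us" are not counted.
  grep -hs -e '^US' -- "$@" | wc -l | tr -d ' '
}

# Usage, from hwy_data/:
#   count_us_primary */usa*/*.wpt
```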

Quote
US80/42, A5/A6, I-5/6, I-80/90, I-80/6
Putting two highways in a waypoint label: Put the primary highway first, followed by a slash, followed by the second highway. Drop the prefix of the second highway if it is more than one character long. A5/A6 becomes A5/A6. I-5/I-6 becomes I-5/6. I-25/US50 becomes I-25/50.

edit: I redid this post as I did not read the label style guide properly. In fact, I quite bungled it in just about every way! My apologies for any confusion! Real results can be found below.

I actually find the waypoints with prefixes on both sides of the slash more attractive, but this does make for an interesting regex query. First, let's check and make sure we can return results for a general query by seeing if there are any waypoint labels with US routes on both sides of the slash: US<something>/US<something>.

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US.*\/US"

NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NV\usai\nv.i580.wpt:1:US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919
SC\usaus\sc.us521.wpt:47:US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326

Sure enough, but this regex gives us two false positives (NV and SC). yakra's grep queries use cut -f1 -d " " so that only the primary waypoint label is matched (US50/395 and US521Trk/601Trk_N here). The naive regex I've been using looks for US at the start of a primary waypoint label, but then matches zero or more of any character (.*) until it finds a /US, so the match can run past the primary label into the alternate labels (+US50/US395, +US521Trk/US601Trk_N). Changing .* to [^\s]* (zero or more non-whitespace characters) eliminates the issue, as the match can no longer extend past the end of the primary waypoint label. Success!

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US[^\s]*\/US"

NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
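The same idea works with grep, with one gotcha: inside a POSIX bracket expression a backslash is just a literal character, so [^\s] in grep means "anything except a backslash or an s", not "non-whitespace". This sketch uses [^ ] instead, assuming (as yakra's cut -d' ' does) that the fields on a .wpt line are separated by spaces; [^[:space:]] would also work. us_slash_us is just an illustrative name.

```shell
# Print FILE:LINE:TEXT for lines whose primary label starts with "US" and
# contains "/US" before the first space. [^ ]* cannot cross the space, so
# alternate labels such as +US50/US395 are never reached.
us_slash_us() {
  grep -Hn -e '^US[^ ]*/US' -- "$@"
}

# Usage, from hwy_data/ (same $rg convention as the first post):
#   us_slash_us $rg/*/*.wpt
```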

The style guide says to drop the secondary highway's prefix after the slash if it's more than one character long, which makes for a fairly simple query. Let us assume two things: (1) the only valid characters for highway prefixes are capital letters and hyphens; (2) a highway waypoint label is composed of a highway prefix followed by a number. So we can start by matching, from the start of a primary waypoint label, one or more characters that are neither whitespace nor a + ([^\s+]+; the + is for hidden waypoint labels or something, not sure on the terminology) until we find a forward slash \/, then look for any instances where two or more uppercase letters or hyphens are followed by a digit: [A-Z-]{2,}[0-9]. Tying it all together: Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s+]+\/[A-Z-]{2,}[0-9]" (88 results using .length). The -CaseSensitive switch matters here, because Select-String ignores case by default and [A-Z] would otherwise match lowercase letters too.
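A grep sketch of the same check, under the same two assumptions about prefixes (prefix2_after_slash is an illustrative name). grep is case-sensitive by default, so no -CaseSensitive equivalent is needed, and [^ +]+ plays the role of [^\s+]+ for space-separated .wpt lines. LC_ALL=C is there because in some locales a range like [A-Z] can also match lowercase letters.

```shell
# Print FILE:LINE:TEXT for primary labels whose secondary highway keeps a
# prefix of two or more uppercase letters/hyphens after the slash, e.g.
# I-25/US50 (style guide: I-25/50) or I-5/I-6 (style guide: I-5/6).
# Lines starting with "+" are skipped because [^ +] rejects them.
prefix2_after_slash() {
  LC_ALL=C grep -Hn -E -e '^[^ +]+/[A-Z-]{2,}[0-9]' -- "$@"
}
```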

This is somewhat less interesting than one of the queries I was attempting because I misread the examples: which waypoint labels include the same highway prefix on both sides of the slash? (I had also failed to note that only secondary highway prefixes of 2 or more characters get dropped, bah!) The reason it's interesting is that we get to use a capture group and backreferences! (Backreferences actually make regular expressions... not regular, but that's why they're fun!) The pattern we want to construct should match two or more valid highway prefix characters, followed by any number of non-whitespace characters, followed by a forward slash, followed by our first match. A naive construction (([A-Z-]{2,})[^\s]*\/\1) quickly runs into problems due to the way backreferences work (see the section Backtracking Into Capturing Groups from the link): the engine is free to try other starting positions and shorter captures, so if any run of two or more capital letters before the slash reappears immediately after it, a match results. You might see where this is going: borders look like valid highway prefixes to a regex, and several border pairings yield matches: BGR/GRC, ALM/ALA, IRN/IRQ, etc. (I like to start testing regexes live and then use test data with regexr to iterate more quickly.)

So, matching highway prefixes alone is not enough, we have to be sure that the waypoint label is a highway. The first thing we do is change [^\s]* to [^\s]+. The + requires that the primary highway prefix be followed by one or more non-whitespace characters instead of zero or more. This screens out the BGR/GRC matches, but not many of the other border matches. If we assume that a highway is distinguished by the presence of a number after the highway prefix, we can screen out the rest of the false matches by requiring a number to appear in the secondary highway label after the secondary highway prefix. So, our final regex is: ^([A-Z-]{2,})[^\s]+\/\1[0-9]. If we tried to use non-whitespace characters [^\s] instead of a number [0-9] after the secondary highway prefix, the borders would still match because letters are non-whitespace. Likewise, trying to match only numbers after the primary highway prefix wouldn't work because many highways have suffixes that don't have to be numbers. (I'm sure this is well-understood information to anyone reading this, but I find it a useful practice to go through the reasoning.) Lastly, we can change our assumptions about what highway prefixes look like and use [^0-9] instead of [A-Z-]: assume any non-digit character can be a highway prefix.

note: I have not evaluated active vs. preview systems for these results, nor do I know whether any of these results need to be changed.

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^([^0-9]{2,})[^\s]+\/\1[0-9]"

BC\canbc\bc.bc003.wpt:224:BC93/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc093.wpt:22:BC3/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc095.wpt:33:BC3/BC93 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
CO\usaus\co.us085.wpt:119:I-25/I-70 +I-25(214A) http://www.openstreetmap.org/?lat=39.780113&lon=-104.989429
DEU-BY\deub\deuby.b013.wpt:16:St2256/St2419 http://www.openstreetmap.org/?lat=49.543987&lon=10.233706
DEU-BY\deub\deuby.b022bam.wpt:2:St2271/St2450 http://www.openstreetmap.org/?lat=49.796849&lon=10.222435
DNK\eurtr\dnk.mrsja.wpt:214:SR207/SR211 http://www.openstreetmap.org/?lat=55.845085&lon=12.094746
IN\usain\in.in001.wpt:68:I-69/I-469 http://www.openstreetmap.org/?lat=41.168382&lon=-85.104218
IND-PB\index\indpb.acexpy.wpt:8:NH5/NH7 http://www.openstreetmap.org/?lat=30.658329&lon=76.820827
ISL\islth\isl.th041.wpt:3:TH45/TH429 http://www.openstreetmap.org/?lat=64.002911&lon=-22.602568
ITA\eure\ita.e78.wpt:24:SS73/SS715 +SS326 http://www.openstreetmap.org/?lat=43.326457&lon=11.549098
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us301.wpt:76:NC43/NC48_S http://www.openstreetmap.org/?lat=35.973595&lon=-77.802172
OH\usaoh\oh.oh012.wpt:1:OH115/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh115.wpt:7:OH12/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh189.wpt:13:OH12/OH115 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaus\oh.us023.wpt:13:OH32/OH124 http://www.openstreetmap.org/?lat=39.047869&lon=-83.023725
OR\usaor\or.or099.wpt:253:OR99W/OR99E http://www.openstreetmap.org/?lat=44.229772&lon=-123.204675
OR\usaor\or.or211.wpt:1:OR99E/OR214 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
OR\usaor\or.or214.wpt:4:OR99E/OR211 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
POL\eure\pol.e67.wpt:228:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
POL\poldk\pol.dk008klo.wpt:19:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
ROU\eure\rou.e60.wpt:186:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
ROU\roudj\rou.dj222.wpt:15:DN22/DN22D http://www.openstreetmap.org/?lat=44.775267&lon=28.688680
ROU\roudj\rou.dj601e.wpt:1:DJ401A/DJ601 http://www.openstreetmap.org/?lat=44.448668&lon=25.778433
ROU\roudn\rou.dn001.wpt:19:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
WY\usawy\wy.wy530.wpt:13:FR7/FR157 http://www.openstreetmap.org/?lat=41.209445&lon=-109.632559
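For grep users, here is a rough sketch of the [A-Z-] version of the final regex (^([A-Z-]{2,})[^\s]+\/\1[0-9]). POSIX only guarantees backreferences in basic regular expressions (GNU grep also accepts \1 with -E, as an extension), so this uses BRE syntax, where groups and intervals are written \( \) and \{ \}, and [^ ][^ ]* stands in for [^\s]+ because + isn't a POSIX BRE operator. same_prefix_both_sides is an illustrative name. Since it uses [A-Z-] and is case-sensitive, it won't necessarily reproduce the list above; for example St2256/St2419 has a lowercase t, so it would not match.

```shell
# Print FILE:LINE:TEXT for primary labels that repeat the primary highway's
# prefix right after the slash, followed by a digit (US13/US17_S, OR99W/OR99E).
# \([A-Z-]\{2,\}\) captures the prefix; \1 demands the same text after "/".
# The trailing [0-9] screens out border labels such as ALM/ALA or IRN/IRQ.
same_prefix_both_sides() {
  LC_ALL=C grep -Hn -e '^\([A-Z-]\{2,\}\)[^ ][^ ]*/\1[0-9]' -- "$@"
}
```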
« Last Edit: January 15, 2020, 10:26:49 pm by cvoight »

Offline cvoight

  • Newbie
  • *
  • Posts: 18
  • Last Login:September 03, 2021, 10:39:18 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #13 on: January 15, 2020, 04:53:15 pm »
Quote
@cvoight, thanks for your effort.

We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php

I just wonder why some are not detected, e.g.:

Code: [Select]
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973

@yakra, any idea? Should I open a GitHub issue?

Not yakra, but I did peruse siteupdate.py (a warning for those browsing by phone: this link freezes my mobile browser, as there's too much data), specifically the calls to datacheckerrors.append, which I believe are a comprehensive accounting of all the data checks. Generally, the waypoint label checks currently implemented don't test for consistency with the labeling style guidelines, but rather for what I would consider more fundamental errors*, such as more than one slash, unbalanced parentheses, and invalid characters. There are a few consistency checks, e.g. using "Bus" instead of "BL" or "BS" for business Interstates, or the LONG_UNDERSCORE checks (which flag _Abcd, _AbcdN type stuff), but nothing comprehensive. It would probably be useful to produce public documentation on the exact rules that throw flags (or maybe this exists and I couldn't find it at first glance).

* for what it's worth, I may be alone in drawing a distinction here, and you may consider these checks to be the same kind of thing.
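To get a feel for two of those "fundamental" checks, here is a rough shell/awk sketch that flags primary labels containing more than one slash, or whose counts of ( and ) differ. This is not how siteupdate.py implements its datachecks; flag_label_errors and the MULTIPLE_SLASHES / UNBALANCED_PARENS tags are made-up names, and comparing counts will miss mis-ordered pairs such as )(.

```shell
# Print FILE:LINE:LABEL:TAG for suspicious primary labels (the first
# whitespace-separated field of each .wpt line).
flag_label_errors() {
  awk '{
    label = $1
    # gsub returns the number of matches; replacing each character with
    # itself leaves label unchanged, so these are just character counts.
    slashes = gsub(/\//, "/", label)
    opens   = gsub(/[(]/, "(", label)
    closes  = gsub(/[)]/, ")", label)
    if (slashes > 1)     print FILENAME ":" FNR ":" $1 ":MULTIPLE_SLASHES"
    if (opens != closes) print FILENAME ":" FNR ":" $1 ":UNBALANCED_PARENS"
  }' "$@"
}

# Usage, from hwy_data/:
#   flag_label_errors $rg/*/*.wpt
```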
« Last Edit: January 15, 2020, 10:01:29 pm by cvoight »

Offline cvoight

  • Newbie
  • *
  • Posts: 18
  • Last Login:September 03, 2021, 10:39:18 am
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #14 on: January 16, 2020, 12:37:12 pm »
Regions for the United States and Canada don't have a country prefix, but they can be isolated using an exhaustive OR'd regex. The results can then be piped into Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "<pattern_goes_here>" to scan waypoint labels. The (?: indicates this is a non-capturing group, which doesn't matter much in this instance and could be changed to just (. (We have to group the entire OR'd query so that the beginning of line symbol ^ and end of line symbol $ match properly. Without those symbols, MB would match Zambia ZMB or M[ADEINOPST] would match all of the Mexico MEX- regions.)
Code: [Select]
USA: Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"}
CAN: Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)$"}
This is tidier than filtering on a unique system-directory wildcard with Get-ChildItem -Directory "<system_wildcard_goes_here>" -Recurse, which isn't possible for the US because the AUS A route directories also contain the string usa.
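In bash, a rough counterpart of the Where-Object filter is to list the subdirectories of hwy_data/ and keep the names that match the same anchored, OR'd pattern with grep -E. This is a sketch: regions_matching, USA_REGIONS and CAN_REGIONS are illustrative names, and the region lists are simply copied from the PowerShell above.

```shell
# Anchored, OR'd region patterns copied from the PowerShell queries.
# Without ^( ... )$, MB would also match ZMB, and M[ADEINOPST] would
# match the MEX- region directories.
USA_REGIONS='^(A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$'
CAN_REGIONS='^(AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)$'

# Print the names of subdirectories of the current directory matching $1.
regions_matching() {
  local d
  for d in */; do
    [ -d "$d" ] || continue    # skip the literal "*/" if no directory exists
    printf '%s\n' "${d%/}"     # strip the trailing slash
  done | grep -E -e "$1"
}
```

Its output can feed the $rg loops from the first post, e.g. for rg in $(regions_matching "$USA_REGIONS"); do grep -Hn -e '^I[0-9]' -- "$rg"/*/*.wpt; done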

Quote
I-80, A-73
USA Interstates and Quebec Autoroutes retain hyphens.

We can redo our previous search for primary waypoint labels of Interstates that lack a hyphen (I already posted those results, so I won't repost them):
Code: [Select]
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^I[0-9]"
To check the primary waypoint labels for Quebec autoroutes, we can slim down our Where-Object query because we only care about QC:
Code: [Select]
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^QC$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^A[0-9]"

No results, so all the primary waypoint labels for QC Autoroutes (450 results) meet the style guidelines.
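Both hyphen checks can be sketched with grep as well. missing_hyphen is a hypothetical helper that prints primary labels where the given prefix is followed directly by a digit; the prefix is dropped into the regex as-is, so it's assumed to be plain letters. Note that grep is case-sensitive while Select-String is not by default, so the results could differ slightly.

```shell
# Print FILE:LINE:TEXT for primary labels where PREFIX is immediately
# followed by a digit, e.g. I80 instead of I-80 or A73 instead of A-73.
# Usage: missing_hyphen PREFIX FILE...
missing_hyphen() {
  local prefix=$1
  shift
  grep -Hn -e "^${prefix}[0-9]" -- "$@"
}

# From hwy_data/:
#   missing_hyphen I PA/*/*.wpt DE/*/*.wpt   # Interstates in PA and DE
#   missing_hyphen A QC/*/*.wpt              # Quebec Autoroutes
```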
« Last Edit: January 16, 2020, 12:40:04 pm by cvoight »