Travel Mapping

User Discussions => How To? => Topic started by: yakra on October 04, 2019, 04:41:33 pm

Title: Using grep and shell scripts to search highway data, waypoint labels, etc.
Post by: yakra on October 04, 2019, 04:41:33 pm
grep (https://ss64.com/bash/grep.html) is a powerful tool for searching text files with regular expressions (https://en.wikipedia.org/wiki/Regular_expression). It runs on Unix-like systems such as Linux, FreeBSD (e.g. noreaster), or macOS. With a little practice, you can put together expressions to search for very specific types of text strings. In this post, I'll focus on waypoint labels.

Commands were executed from the hwy_data/ directory.
They're designed so you can easily substitute different regions into the $rg variable and run the same command repeatedly to search different regions.
rg=PA
grep whatever $rg/*/*.wpt
rg=DE
grep whatever $rg/*/*.wpt

et cetera. You can also wrap them up inside loops to search multiple regions in one go:
for rg in PA DE; do grep whatever $rg/*/*.wpt; done
or
MyRegions='PA DE'
for rg in $MyRegions; do grep whatever $rg/*/*.wpt; done

et cetera.

The cut (https://ss64.com/bash/cut.html) command was included to strip out AltLabels and URLs from the .wpt file lines, to ensure that we're only searching within primary labels.
If you have questions about the various tokens, expressions or commands, feel free to ask.
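
A minimal sketch of what the `grep -v '^+'` and `cut` stages do, run against made-up sample data in a scratch directory (the labels, AltLabel and coordinates here are invented for illustration, not taken from the real hwy_data):

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p PA/usapa
u='http://www.openstreetmap.org/?lat=40.000000&lon=-76.000000'
# One visible point with an AltLabel, one hidden point, one plain visible point.
printf '%s\n' "PA10 +OldLabel $u" "+X123456 $u" > PA/usapa/pa.pa010.wpt
printf '%s\n' "US222 $u" > PA/usapa/pa.us030.wpt

rg=PA
# grep -v '^+' drops hidden points (lines that start with +);
# cut keeps only the text before the first space, i.e. the primary label.
primary=$(grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ')
echo "$primary"
```

This prints `PA/usapa/pa.pa010.wpt:PA10` and `PA/usapa/pa.us030.wpt:US222`. Note the `path:` prefix: grep adds it whenever the glob expands to more than one file, and several of the patterns in this post key on that colon. If a region only has a single .wpt file, adding -H to the first grep forces the prefix.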

Some example search results in Pennsylvania and Delaware are posted here (http://forum.travelmapping.net/index.php?topic=3246).



Potential no-underscore malformed city suffixes
grep -v '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep -v '[A-Z][A-Z][0-9]{1,3}Alt|[A-Z][A-Z][0-9]{1,3}Bus|[A-Z][A-Z][0-9]{1,3}Byp|[A-Z][A-Z][0-9]{1,3}Trk|[A-Z][A-Z][0-9]{1,3}His|[A-Z][A-Z][0-9]{1,3}Spr|[A-Z][A-Z][0-9]{1,3}Con' | egrep ':[A-Z]{2}[0-9]{1,3}[A-Z][a-z]{2}'
This one can get confusing.
Edit, 2019-11-18 01:32 EST: Only check primary labels; avoid false negatives.

Intersections with visibly numbered highways (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
US40: US40BusWhi | US73: US40BusWhi US40BusTho | A3: A3Zur
Distinguish two different same-bannered same-numbered routes as needed with the 3-letter city abbreviations. Also use the city abbreviation for bannerless same-designation spurs or branches, such as the Zurich A3 spur intersecting the main A3.
While we do want to use city abbreviations for routes (in the HB) that have them...

Distinguishing otherwise identical waypoints (not for exit numbers) (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
US95: NV57_Ren | NV57_Tah
If more than two points for the same non-exit-numbered cross road are needed, there are two options which can be used in combination with or ignoring the previous options for pairs of identical labels.
1. Use alphabetical suffixes _A, _B, _C, etc.
2. Choose 3-letter suffixes for nearby towns if they are fairly close. The 3-letter suffix should be the first 3 letters of the town name. Add a suffix with an underscore and those 3 letters.
...disambiguating multiple intersections with a single route, or with roads not in the HB, is done with an underscore.
A lot of the time, people may use the former when they intended to use the latter.
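
To make the three stages of the no-underscore search easier to follow, here is a sketch on invented labels. It uses `grep -E` in place of egrep and folds the seven banner alternatives into one group, which should be equivalent; the sample files and labels are assumptions for illustration only.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p PA/usapa
u='http://www.openstreetmap.org/?lat=40.000000&lon=-76.000000'
printf '%s\n' "US30BusLan $u" "PA23Lan $u" "PA23_Lan $u" "+X1 $u" > PA/usapa/pa.us030.wpt
printf '%s\n' "US30 $u" > PA/usapa/pa.pa023.wpt

rg=PA
# Stage 1: visible points only, primary label only.
# Stage 2: drop labels where the letters after the number are a route banner.
# Stage 3: keep labels of the form XX123Abc (a city suffix with no underscore).
flagged=$(grep -v '^+' $rg/*/*.wpt | cut -f1 -d' ' \
  | grep -Ev '[A-Z][A-Z][0-9]{1,3}(Alt|Bus|Byp|Trk|His|Spr|Con)' \
  | grep -E ':[A-Z]{2}[0-9]{1,3}[A-Z][a-z]{2}')
echo "$flagged"
```

Only `PA/usapa/pa.us030.wpt:PA23Lan` comes out. US30BusLan would match the last pattern too (Bus looks like a 3-letter suffix), which is why the banner exclusion runs first; PA23_Lan never matches because of the underscore.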



Underscored 4-character city suffixes not ending in N, E, W, or S
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep '_...[A-DF-MO-RTUVXYZa-z]$'
Distinguishing otherwise identical waypoints (not for exit numbers) (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
NV88_PitS | NV88_PitN
If you need the same town twice, add a 4th letter that is a direction letter (_PitS and _PitN for southern and northern junctions near Pittston). If the town suffixes are not useful, are confusing, or require further elaboration (3+ junctions with same town), use the alphabetical suffixes instead.
Edit, 2019-11-17 20:58 EST: Only check primary labels.
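
A quick sketch of how the character class behaves, using invented labels modeled on the manual's NV88 example (the file name and the _Pitt label are assumptions):

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p NV/usanv
u='http://www.openstreetmap.org/?lat=39.000000&lon=-119.000000'
printf '%s\n' "NV88_PitS $u" "NV88_PitN $u" "NV88_Pitt $u" "NV88_Pit $u" "NV88_A $u" \
  > NV/usanv/nv.nv088.wpt

rg=NV
# The class lists every letter except uppercase N, E, W and S, so a 4-character
# suffix is flagged unless its last character is one of those direction letters.
flagged=$(grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep '_...[A-DF-MO-RTUVXYZa-z]$')
echo "$flagged"
```

Only NV88_Pitt is printed. LC_ALL=C is set so that bracket ranges like A-D mean plain ASCII ranges; in some other locales ranges can behave differently.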



2-character city suffixes
egrep -sv '^\+' $rg/*/*.wpt | cut -f1 -d' ' | grep -v '_U[0-9]$' | grep '_..$'
Edit, 2019-11-17 22:08 EST: Exclude false positives caused by numbered U-turn ramps.



Old highway designation labels
grep -s '^Old..[0-9]' $rg/*/*.wpt
Waypoints for roads that no longer have a name or no longer exist as a road (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
MainSt
If the old highway has a posted name or number, use that name or number (don't make one up), and label the waypoint according to the usual rules.
If US 30/Main Street becomes Main Street, then the label is MainSt, not something like OldUS30.

OldUS40
If the old highway has a posted name that mentions the old designation, then applying the above rule will result in a label like OldUS40 for "Old Route 40" if the route was formerly US 40.
There will be a good number of false-positive, legitimate labels here.
I'm not sure how closely these rules (if it's a requirement & not an option) have been followed; there are a lot of examples where "OldRt" or "OldHwy", etc. are used if a road is signed as such in the field, or if that's the official name. (Although, Tim may have done it that way before these specific guidelines were codified...) May be something worth discussing in the forum.
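
For reference, a small sketch of which labels the `^Old..[0-9]` pattern picks up. The labels are invented, and the trailing cut is added here only to keep the output short; the search above prints whole lines.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p PA/usapa
u='http://www.openstreetmap.org/?lat=40.000000&lon=-76.000000'
printf '%s\n' "OldUS40 $u" "OldPA5 $u" "OldHwy42 $u" "MainSt $u" "+OldUS1 $u" \
  > PA/usapa/pa.us040.wpt

rg=PA
# "Old", then any two characters, then a digit, anchored at the start of the line.
matches=$(grep -s '^Old..[0-9]' $rg/*/*.wpt | cut -f1 -d' ')
echo "$matches"
```

OldUS40 and OldPA5 match. OldHwy42 does not, because the third character after Old is a letter, and the hidden +OldUS1 point is skipped because the line starts with +.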



Long words, >4 characters
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep ':.*[a-z]{4}'
Intersections with named highways (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
FaiRd PapMillRd
For up to two other (specifying) words, truncate the word as follows:
1-4 letters - use whole word
5+ letters - use the first 3 letters. Don't use a made-up abbreviation. Fairchild Road becomes FaiRd, not FrchldRd or FchRd or anything else.
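
A sketch of the long-word search on invented labels, with `grep -E` standing in for egrep. Two files are used so that grep emits the `path:` prefix the pattern relies on.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p PA/usapa
u='http://www.openstreetmap.org/?lat=40.000000&lon=-76.000000'
printf '%s\n' "FaiRd $u" "FairchildRd $u" "PapMillRd $u" > PA/usapa/pa.pa010.wpt
printf '%s\n' "FairRd $u" > PA/usapa/pa.us030.wpt

rg=PA
# Four lowercase letters in a row, anywhere after the colon, means some word
# runs to at least 5 characters once its leading capital is counted.
flagged=$(grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep -E ':.*[a-z]{4}')
echo "$flagged"
```

Only `PA/usapa/pa.pa010.wpt:FairchildRd` is flagged. FairRd and PapMillRd pass because their longest lowercase runs are 3 letters, i.e. whole 4-letter words.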



Too many words
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep -v ':[A-Z]*[a-z]*Mc[A-Z][A-Z][a-z]*$|:Mc[A-Z]{2,}[a-z]*[A-Z][a-z]*$|:[A-Z]*Mc[A-Z]+[a-z]?$' | egrep '[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*|[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+|[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]*|[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*'
Intersections with named highways (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
MarKingBlvd MLKingBlvd
If the cross road name has more than 3 words, use one of two options:
1. Pick out the two most important words besides the road type and use only those: Martin Luther King Boulevard becomes MarKingBlvd. Three words in total are included in shortened form.
2. Pick out one important word besides the road type and use it and the initials of the other words: Martin Luther King Boulevard becomes MLKingBlvd. Two words in total are included in shortened form along with initials of the rest.
An older style for 3-word road names used two truncated words and one initial, e.g. Lisbon Falls Village Rd -> LisFalVRd.
This was later deprecated in favor of the rule quoted above, which gives LisFalRd, LFVilRd, etc.
Edit, 2019-11-17 21:40 EST: Use egrep for BSD (e.g. noreaster) compatibility due to + token.
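
A sketch of the second (matching) stage of this search on invented labels. The Mc exclusion stage is left out for brevity, and `grep -E` stands in for egrep. Each of the four alternatives needs four groups of capitals, with lowercase letters required in at least two of the gaps between them.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p ME/usame
u='http://www.openstreetmap.org/?lat=44.000000&lon=-70.000000'
printf '%s\n' "LisFalVRd $u" "LisFalVilRd $u" "MarKingBlvd $u" "MLKingBlvd $u" "FaiRd $u" \
  > ME/usame/me.me009.wpt

rg=ME
words='[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*'
words="$words"'|[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+'
words="$words"'|[A-Z]+[a-z]+[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]*'
words="$words"'|[A-Z]+[a-z]?[A-Z]+[a-z]+[A-Z]+[a-z]+[A-Z]+[a-z]*'
flagged=$(grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep -E "$words")
echo "$flagged"
```

LisFalVRd and LisFalVilRd are flagged. MarKingBlvd has only three capitals, and MLKingBlvd has four capitals but only two separate runs of them (MLK and B), so both manual-approved forms pass.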



McDonRd
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep 'Mc[A-Z][a-z]'
In the above search, the Mc[A-Z] tokens are included to filter out "Old McDonald Farm Rd" style false positives.
OTOH, McDonald is one word, so it makes sense to truncate it to McD, rather than treat it as 2 for McDon.



Tpke for Turnpike
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep 'Tpke'
CHM standardized on "Tpk" for turnpike, standard abbreviations (https://pe.usps.com/text/pub28/28apc_002.htm) notwithstanding...



Directional prefixes
egrep -sv "^\+|^.[a-z]|^$rg[0-9]+" $rg/*/*.wpt | cut -f1 -d' ' | grep ':[NEWS]' | egrep -v ":[A-Z][A-Z]/[A-Z][A-Z]$|:[NEWS]End$"
Intersections with named highways (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
6thSt | 33rdAve | SeeLn
Ignore any non-essential direction specifier. N. 6th St becomes 6thSt. 33rd Avenue SW becomes 33rdAve. W. Seedy Lane becomes SeeLn.

NorPkwy | SouBlvd
But keep directions that are the main part of the road name, such as NorPkwy for Northern Parkway or SouBlvd for Southeast Boulevard.
What's a "non-essential direction specifier" and what are "directions that are the main part of the road name" can get wibbly-wobbly at times; use your best judgment. Many of these may be legit, but they're worth checking out.
IMO though, N/E/W/S are common enough abbreviations for their respective directions, and there are enough examples of this throughout the data, that I'm not going to call for every WPondRd to be changed to a WestPondRd.
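
A sketch of the directional-prefix search with rg=NE, on invented labels. `grep -E` stands in for egrep, and `${rg}` is written with braces so the variable name cannot be misread next to the bracket expression.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p NE/usane
u='http://www.openstreetmap.org/?lat=41.000000&lon=-100.000000'
printf '%s\n' "US6 $u" "WPondRd $u" > NE/usane/ne.ne002.wpt
printf '%s\n' "NE2 $u" "NorPkwy $u" "NE/SD $u" "N6thSt $u" "WEnd $u" > NE/usane/ne.us006.wpt

rg=NE
# Stage 1 drops hidden points, labels whose 2nd character is lowercase (NorPkwy),
#         and the region's own numbered routes (NE2).
# Stage 2 keeps labels starting with N, E, W or S.
# Stage 3 drops border labels (NE/SD) and NEnd/EEnd/WEnd/SEnd.
flagged=$(grep -Esv "^\+|^.[a-z]|^${rg}[0-9]+" $rg/*/*.wpt | cut -f1 -d' ' \
  | grep ':[NEWS]' \
  | grep -Ev ':[A-Z][A-Z]/[A-Z][A-Z]$|:[NEWS]End$')
echo "$flagged"
```

What's left is `NE/usane/ne.ne002.wpt:WPondRd` and `NE/usane/ne.us006.wpt:N6thSt`, the two labels that start with a bare direction letter.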



Similarly, a search for suffixed directional specifiers:
egrep -sv '^\+' $rg/*/*.wpt | cut -f1 -d' ' | grep '[NEWS]$' | egrep -v '_|[0-9]{1,3}[NEWS]$|:[A-Z][A-Z]/[A-Z][A-Z]$|:CR[A-Z]$|:Rd[A-Z]$|:Ave[A-Z]$|:I\-[0-9]{1,3}BS$'

Edit, 2019-11-17 20:11 EST: Use cut instead of sed to isolate primary labels. More concise, clear, fast.
Edit, 2019-11-17 21:08 EST: Use grep -s to suppress error messages for regions with no WPTs.
Edit, 2019-11-18 01:07 EST: Use egrep for more human-readable expression strings, particularly with | and {}
Edit, 2019-11-18 01:53 EST: Exclude hidden points from all searches.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on October 06, 2019, 01:12:13 pm
Rte(Number) style labels on US Interstates
for file in */usai/*.wpt; do results=`grep -H . $file | cut -f1 -d' ' | grep 'I-[0-9][0-9]*(..*)' | wc -l`; if [ $results -gt 2 ]; then grep -H . $file | cut -f1 -d' ' | grep 'I-[0-9][0-9]*(..*)'; fi; done
One point of confusion can be the two styles of waypoint labeling for concurrencies with Interstate Highways and other exit numbered routes.

Interchanges on exit-numbered highways (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
I-80: 56 | 87(75) | 89(75) | 63
In multiplexes where the concurrency uses exit numbers from the other highway, put the highway number in parentheses. Drop the letter prefix of the concurrent highway if it is more than one character long: I-75 becomes (75). A5 can stay as (A5).
The vast majority of Interstates (with rare exceptions, most obviously WY I-180) will be exit-numbered routes, and thus fall into this category.

Waypoint labels for multiplexes (http://travelmapping.net/devel/manual/wayptlabels.php)
Quote
US90: LA76 | I-49(45) | I-49(47) | I-49(52) | US40
For non-exit-numbered routes concurrent with a numbered, exit-numbered route, use the concurrent highway designation with the exit numbers in parentheses.
We'll sometimes see this style used erroneously on exit-numbered routes; this basic (Interstate-to-Interstate only) search is designed to pick up on these cases.
It only prints results for routes with at least 3 such labels: Some routes, such as NY I-490 or ME I-295, have both ends at unnumbered interchanges with the parent route, necessitating this style label. Some routes, such as TX I-69, are not fully exit-numbered yet, and contain some crossroad-style labels.
Right now, it reports some malformed labels in ms.i055.wpt & wi.i041.wpt.
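
The threshold logic can be sketched on invented data as follows. The XX region, file names and labels are hypothetical, and `grep -c` is used for the count instead of `wc -l`, which should give the same number.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p XX/usai
u='http://www.openstreetmap.org/?lat=40.000000&lon=-90.000000'
# Three route(exit)-style labels: should be reported.
printf '%s\n' "1 $u" "I-2(10) $u" "I-2(11) $u" "I-2(12) $u" "5 $u" > XX/usai/xx.i001.wpt
# Only one such label: below the threshold, stays quiet.
printf '%s\n' "I-1(4) $u" "2 $u" > XX/usai/xx.i003.wpt

reported=$(
  for file in */usai/*.wpt; do
    # grep -H . prefixes every non-empty line with the file name.
    results=$(grep -H . "$file" | cut -f1 -d' ' | grep -c 'I-[0-9][0-9]*(..*)')
    if [ "$results" -gt 2 ]; then
      grep -H . "$file" | cut -f1 -d' ' | grep 'I-[0-9][0-9]*(..*)'
    fi
  done
)
echo "$reported"
```

Only the three I-2(...) labels from xx.i001.wpt are printed; the single I-1(4) in xx.i003.wpt is treated as a probable unnumbered end interchange and skipped.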



How far down this rabbit hole can we go?
An experiment. A somewhat unwieldy one...
rm -f exit_numbers.log looks_unnumbered.log looks_numbered.log
touch exit_numbers.log looks_unnumbered.log looks_numbered.log
for rg in `ls -F | grep '/$' | cut -f1 -d/`; do
  for sys in `ls -F $rg | grep '/$' | cut -f1 -d/`; do
    for file in `ls $rg/$sys | grep '\.wpt$'`; do
      # This gets messed up by files with spaces in their names, and ignores them.
      if [ -r $rg/$sys/$file ]; then
        LooksNumbered=`cut -f1 -d' ' $rg/$sys/$file | grep -c -m 1 '^[0-9][0-9]*$'`
        if [ $LooksNumbered == 0 ]; then
          results=`cut -f1 -d' ' $rg/$sys/$file | grep '^\*\?[0-9][0-9]*[A-Z]\?([0-9A-Za-z][0-9A-Za-z]*)' | wc -l`
          if (( $results != 0 && $results != `cat $rg/$sys/$file | wc -l` )); then
            echo -e "\n$rg/$sys/$file looks like a non-exit-numbered route" \
            | tee -a exit_numbers.log | tee -a looks_unnumbered.log
            cut -f1 -d' ' $rg/$sys/$file \
            | grep '^\*\?[0-9][0-9]*[A-Z]\?([0-9A-Za-z][0-9A-Za-z]*)' \
            | tee -a exit_numbers.log | tee -a looks_unnumbered.log
          fi
        fi
        if [ $LooksNumbered == 1 ]; then
          results=`cut -f1 -d' ' $rg/$sys/$file | grep '^\*\?[A-Z][A-Z]*[A-Za-z]*-\?[0-9]*[A-Za-z]*([A-Z]*[0-9]*[A-Z]*)$' | wc -l`
          if [ $results -gt 2 ]; then
            echo -e "\n$rg/$sys/$file looks like an exit-numbered highway" \
            | tee -a exit_numbers.log | tee -a looks_numbered.log
            cut -f1 -d' ' $rg/$sys/$file \
            | grep '^\*\?[A-Z][A-Z]*[A-Za-z]*-\?[0-9]*[A-Za-z]*([A-Z]*[0-9]*[A-Z]*)$' \
            | tee -a exit_numbers.log | tee -a looks_numbered.log
          fi
        fi
      fi
    done
  done
done
lbc=`cut -f1 -d' ' exit_numbers.log | grep -v '\.wpt$' | wc -w`
rtc=`cut -f1 -d' ' exit_numbers.log | grep '\.wpt$' | wc -l`
rgc=`cut -f1 -d' ' exit_numbers.log | grep '\.wpt$' | cut -f1 -d/ | uniq | wc -l`
lbw=`echo $lbc | wc -c`
rtw=`echo $rtc | wc -c`
rgw=`echo $rgc | wc -c`
printf   'looks   numbered: %'"$lbw"'d labels in %'"$rtw"'d routes in %'"$rgw"'d regions\n' \
   `cut -f1 -d' ' looks_numbered.log | grep -v '\.wpt$' | wc -w` \
   `cut -f1 -d' ' looks_numbered.log | grep '\.wpt$' | wc -l` \
   `cut -f1 -d' ' looks_numbered.log | grep '\.wpt$' | cut -f1 -d/ | uniq | wc -l`
printf   'looks unnumbered: %'"$lbw"'d labels in %'"$rtw"'d routes in %'"$rgw"'d regions\n' \
   `cut -f1 -d' ' looks_unnumbered.log | grep -v '\.wpt$' | wc -w` \
   `cut -f1 -d' ' looks_unnumbered.log | grep '\.wpt$' | wc -l` \
   `cut -f1 -d' ' looks_unnumbered.log | grep '\.wpt$' | cut -f1 -d/ | uniq | wc -l`
printf   '        combined: %'"$lbw"'d labels in %'"$rtw"'d routes in %'"$rgw"'d regions\n' $lbc $rtc $rgc

Originally this worked on my Linux machines but not on noreaster, for reasons I hadn't looked into. Fixed!
looks   numbered: (http://yakra.teresco.org/logs/looks_numbered.log)    600 labels in   58 routes in   26 regions
looks unnumbered: (http://yakra.teresco.org/logs/looks_unnumbered.log)  12425 labels in  703 routes in  102 regions
        combined: (http://yakra.teresco.org/logs/exit_numbers.log)  13025 labels in  761 routes in  112 regions
...Back away slowly.
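
For anyone reading the script above, the core heuristic is the LooksNumbered test: a file is treated as exit-numbered if at least one primary label is purely numeric. A sketch of just that step, wrapped in a helper function (the function name and sample files are made up for illustration):

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
u='http://www.openstreetmap.org/?lat=40.000000&lon=-90.000000'
printf '%s\n' "1 $u" "2A $u" "I-75(10) $u" > numbered.wpt
printf '%s\n' "US1 $u" "MainSt $u" > unnumbered.wpt

# Prints 1 if any primary label consists only of digits, else 0.
# -m 1 stops at the first hit, so the count is never more than 1.
looks_numbered() {
  cut -f1 -d' ' "$1" | grep -c -m 1 '^[0-9][0-9]*$'
}

a=$(looks_numbered numbered.wpt)
b=$(looks_numbered unnumbered.wpt)
echo "numbered.wpt: $a, unnumbered.wpt: $b"
```

The script then branches on that value: files that look unnumbered are checked for exit(route)-style labels such as 87(75), and files that look numbered are checked for route(exit)-style labels such as I-49(45).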
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on November 18, 2019, 03:16:35 am
Label lacks generic highway type
grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | egrep 'Old[0-9]+_|Old[0-9]+$'
Label should be OldUS42, OldHwy42, OldRte42, OldME42, etc., not "Old42". This was a datacheck on CHM, but although a comment describing the siteupdate program's DatacheckEntry class mentions a LACKS_GENERIC datacheck code, there's nothing actually implementing it yet.

There are only 4 results project-wide. Not bad.
FL/usaus/fl.us027.wpt: CROld50 (should be WasSt (https://www.google.com/maps/@28.575797,-81.7483546,3a,15y,200.25h,97.21t/data=!3m9!1e1!3m7!1s953PPxSLiSPx89EeljnU_A!2e0!7i16384!8i8192!9m2!1b1!2i40)?)
NOR/eure/nor.e18.wpt: *Old70
NOR/eure/nor.e18.wpt: *Old67
NOR/eursf/nor.oslokrimotkri.wpt: *Old70
ON/canon/on.on127.wpt: Old127
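
A sketch showing why labels like CROld50 and *Old70 turn up in the results above: the pattern is not anchored at the start of the label. The XX region and file are hypothetical, and `grep -E` stands in for egrep.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p XX/xxsys
u='http://www.openstreetmap.org/?lat=40.000000&lon=-90.000000'
printf '%s\n' "Old42 $u" "OldUS42 $u" "Old42_N $u" "CROld50 $u" "*Old70 $u" \
  > XX/xxsys/xx.r042.wpt

rg=XX
# "Old" followed directly by digits, then either an underscore or end of label.
flagged=$(grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' | grep -E 'Old[0-9]+_|Old[0-9]+$')
echo "$flagged"
```

Old42, Old42_N, CROld50 and *Old70 are all printed; only OldUS42 passes. Anchoring the pattern after the optional leading * would narrow it, at the cost of missing prefixed cases that might still deserve a look.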
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: neroute2 on November 18, 2019, 08:45:36 am
Quote
FL/usaus/fl.us027.wpt: CROld50 (should be WasSt (https://www.google.com/maps/@28.575797,-81.7483546,3a,15y,200.25h,97.21t/data=!3m9!1e1!3m7!1s953PPxSLiSPx89EeljnU_A!2e0!7i16384!8i8192!9m2!1b1!2i40)?)
It's a numbered county road with Old 50 (https://www.google.com/maps/@28.577435,-81.7494863,3a,27.4y,195.47h,91.53t/data=!3m9!1e1!3m7!1sOWbH4xqT9eQ-WbPxP_XTpA!2e0!7i16384!8i8192!9m2!1b1!2i40) or Old Hwy 50 (https://www.google.com/maps/@28.5753137,-81.7420551,3a,17.7y,185.63h,83.92t/data=!3m6!1e1!3m4!1sBTU78ReCUkP7QKdZJ_04bQ!2e0!7i13312!8i6656) in the shield. False positive.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: michih on November 18, 2019, 11:50:36 am
Quote
NOR/eure/nor.e18.wpt: *Old70
NOR/eure/nor.e18.wpt: *Old67
NOR/eursf/nor.oslokrimotkri.wpt: *Old70

Former location of exit 70 and exit 67. Fine to me.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on November 18, 2019, 01:17:40 pm
Quote
It's a numbered county road with Old 50 (https://www.google.com/maps/@28.577435,-81.7494863,3a,27.4y,195.47h,91.53t/data=!3m9!1e1!3m7!1sOWbH4xqT9eQ-WbPxP_XTpA!2e0!7i16384!8i8192!9m2!1b1!2i40) or Old Hwy 50 (https://www.google.com/maps/@28.5753137,-81.7420551,3a,17.7y,185.63h,83.92t/data=!3m6!1e1!3m4!1sBTU78ReCUkP7QKdZJ_04bQ!2e0!7i13312!8i6656) in the shield. False positive.
Oh wows. I did look for something like that, just not hard enough. :D
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: rickmastfan67 on November 21, 2019, 04:28:56 am
Quote
It's a numbered county road with Old 50 (https://www.google.com/maps/@28.577435,-81.7494863,3a,27.4y,195.47h,91.53t/data=!3m9!1e1!3m7!1sOWbH4xqT9eQ-WbPxP_XTpA!2e0!7i16384!8i8192!9m2!1b1!2i40) or Old Hwy 50 (https://www.google.com/maps/@28.5753137,-81.7420551,3a,17.7y,185.63h,83.92t/data=!3m6!1e1!3m4!1sBTU78ReCUkP7QKdZJ_04bQ!2e0!7i13312!8i6656) in the shield. False positive.
Oh wows. I did look for something like that, just not hard enough. :D

Welcome to the world of insanity in FL.  :P
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on November 22, 2019, 01:19:22 am
Could that be considered a form of bannered route?  ;D
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 10, 2020, 02:09:49 pm
The closest equivalent native to Windows PowerShell is Select-String, which is decently powerful though no grep. Executing from the hwy_data directory, you use Get-ChildItem -Include "*.wpt" -Recurse to feed all the files into Select-String. Your pattern will always start with ^ if searching waypoint labels as that signifies the start of a line, functioning similarly to the cut -f1 -d " " portion of yakra's commands.

For example, executing the following command finds any waypoint labels that start with US- (i.e. US routes with a hyphen between the US and the route number).

Code: [Select]
Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US-"
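
For comparison with the grep-based posts earlier in the thread, a rough grep counterpart of the command above might look like this. It assumes a grep that supports -r and --include (GNU and BSD grep both do), and the sample file is invented.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
mkdir -p PA/usapa
u='http://www.openstreetmap.org/?lat=40.000000&lon=-76.000000'
printf '%s\n' "US30 $u" "US-22 $u" > PA/usapa/pa.pa010.wpt

# -r recurses from the current directory, --include limits it to .wpt files,
# -n adds line numbers like Select-String does; cut trims the URL for display.
hits=$(grep -rn --include='*.wpt' '^US-' . | cut -f1 -d' ')
echo "$hits"
```

The output has the same path:line:label shape as the Select-String results, here ending in `pa.pa010.wpt:2:US-22`.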
Slightly more complicated: find all the waypoints in the US that start with I followed directly by a number. [Note: this is not quite restricted to the US as there are some non-US regions with highway directories that contain the string "usa", but it eliminates most regions and (currently) all false positives. See the code section in this post (http://forum.travelmapping.net/index.php?topic=3455.msg17451#msg17451) for a regex that matches only US states and territories.]

Code: [Select]
Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^I[0-9]"
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 10, 2020, 08:51:57 pm
Pursuant to a point by neroute2 (http://forum.travelmapping.net/index.php?topic=3473.msg17450#msg17450) that I had overlooked:

Quote
FaiRd, PapMillRd
Truncation rules (http://travelmapping.net/devel/manual/wayptlabels.php#truncate): abbreviate the generic road type (Rd for Road, Blvd for Boulevard, etc.) if it's one of the very common types. Otherwise, use the first three letters: Uli for Ulica. Skip the final period.

For up to two other (specifying) words, truncate the word as follows:
1-4 letters - use whole word
5+ letters - use the first 3 letters. Don't use a made-up abbreviation. Fairchild Road becomes FaiRd, not FrchldRd or FchRd or anything else.

First, find all waypoints with a capital letter followed by 4 or more lowercase letters, executed from the hwy_data directory: Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[A-Z][a-z]{4,}" (In actuality, I piped this to a text file by appending | Out-File .\four_lowercase.txt.)

Scanning this file, I filtered the results by deleting: many odd results from ITA\ita.nsa (2).wpt; the Minnesota subway lines (MN\usamnmt\BlueLine.wpt & MN\usamnmt\GreenLine.wpt); all lines that contain a + [presumably a waypoint alias] (using Sublime Text, execute a regex search for ^.*\+.*$, select find all, and press delete twice); and results from the _boundaries directory. This leaves 151 results, of which the vast majority are valid. (Some look like simple miscapitalizations, e.g. CA\usaca\ca.ca001.wpt:299:Fulst; others are more involved. There are quite a few in preview systems, but still many in active systems.)
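
The manual filtering described above could probably be done in one pass with grep on a Unix-like system. This is only a sketch: it assumes four_lowercase.txt has been saved as plain ASCII/UTF-8 text (Out-File in Windows PowerShell writes UTF-16 by default), and all sample lines except the first are invented.

```shell
export LC_ALL=C
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' \
  'AL\usaal\al.al005.wpt:69:PrattHwy http://www.openstreetmap.org/?lat=33.577266&lon=-86.913711' \
  'MN\usamnmt\BlueLine.wpt:3:Hypothetical http://example.invalid/' \
  'XX\xxsys\xx.r001.wpt:5:MainSt +MainStreet http://example.invalid/' \
  '_boundaries\xx.wpt:1:Hypothetical http://example.invalid/' \
  > four_lowercase.txt

# -F treats each -e argument as a fixed string; -v drops any line containing one.
kept=$(grep -vF -e '+' -e '_boundaries' -e 'usamnmt' -e 'ita.nsa (2).wpt' four_lowercase.txt)
echo "$kept"
```

Only the AL\usaal\al.al005.wpt line survives. Filtering on a bare + is as blunt as the Sublime Text regex: it removes every line with an AltLabel, not just hidden points.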

region\system\wpt file : line number : result

Code: [Select]
AL\usaal\al.al005.wpt:69:PrattHwy http://www.openstreetmap.org/?lat=33.577266&lon=-86.913711
AL\usaal\al.al013.wpt:44:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaal\al.al013.wpt:47:RoyalAve http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaal\al.al017.wpt:115:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaal\al.al017.wpt:118:RoyalAve http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaal\al.al041.wpt:68:WaterAve http://www.openstreetmap.org/?lat=32.412348&lon=-86.995336
AL\usaal\al.al053.wpt:13:TN7Truck http://www.openstreetmap.org/?lat=34.991991&lon=-86.844633
AL\usaus\al.us043.wpt:2:CraftHwy http://www.openstreetmap.org/?lat=30.761632&lon=-88.074029
AL\usaus\al.us043.wpt:12:SteelDr http://www.openstreetmap.org/?lat=31.139821&lon=-88.009876
AL\usaus\al.us043.wpt:58:CedarAve http://www.openstreetmap.org/?lat=32.518402&lon=-87.835168
AL\usaus\al.us043.wpt:150:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaus\al.us043.wpt:153:RoyalAve http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaus\al.us072.wpt:6:HaleyDr http://www.openstreetmap.org/?lat=34.734590&lon=-87.897191
AL\usaus\al.us072.wpt:20:CourtSt http://www.openstreetmap.org/?lat=34.797488&lon=-87.674160
AL\usaus\al.us072.wpt:23:RoyalAv http://www.openstreetmap.org/?lat=34.805329&lon=-87.664483
AL\usaus\al.us072.wpt:40:ClintSt http://www.openstreetmap.org/?lat=34.788594&lon=-86.964217
AL\usaus\al.us072.wpt:51:NanceRd http://www.openstreetmap.org/?lat=34.755622&lon=-86.731852
AL\usaus\al.us072.wpt:65:BrockRd http://www.openstreetmap.org/?lat=34.734947&lon=-86.430259
AL\usaus\al.us078.wpt:38:PrattHwy http://www.openstreetmap.org/?lat=33.577266&lon=-86.913711
AL\usaus\al.us080.wpt:38:WaterAve http://www.openstreetmap.org/?lat=32.412348&lon=-86.995336
AL\usaus\al.us082.wpt:27:CampusDr http://www.openstreetmap.org/?lat=33.212584&lon=-87.525823
AL\usaus\al.us098.wpt:12:I-65Ramps http://www.openstreetmap.org/?lat=30.707093&lon=-88.122593
AL\usaus\al.us231.wpt:168:WinchRd http://www.openstreetmap.org/?lat=34.792814&lon=-86.577448
AL\usaus\al.us278.wpt:42:RidgeRd http://www.openstreetmap.org/?lat=34.193062&lon=-87.194286
AL\usaus\al.us278.wpt:82:CleveAv http://www.openstreetmap.org/?lat=34.019661&lon=-86.086024
AL\usaus\al.us280.wpt:9:CahabaRd http://www.openstreetmap.org/?lat=33.483357&lon=-86.779590
AL\usaus\al.us280.wpt:10:HollyBlvd http://www.openstreetmap.org/?lat=33.482355&lon=-86.778045
AL\usaus\al.us431.wpt:127:EBroadSt http://www.openstreetmap.org/?lat=33.986730&lon=-85.965711
AL\usaus\al.us431.wpt:135:CleveAv http://www.openstreetmap.org/?lat=34.019661&lon=-86.086024
AL\usaus\al.us431.wpt:185:WinchRd http://www.openstreetmap.org/?lat=34.792814&lon=-86.577448
ARG\argrn\arg.rn003.wpt:182:AveoseIng http://www.openstreetmap.org/?lat=-45.824216&lon=-67.466655
BLZ\blzar\blz.ar001.wpt:15:ValofPeaRd http://www.openstreetmap.org/?lat=17.258452&lon=-88.804291
CA\usaca\ca.ca001.wpt:299:Fulst http://www.openstreetmap.org/?lat=37.773133&lon=-122.471791
CA\usaca\ca.ca002.wpt:6:AveStars http://www.openstreetmap.org/?lat=34.061137&lon=-118.418541
CA\usaca\ca.ca029.wpt:5:Tenst http://www.openstreetmap.org/?lat=38.109986&lon=-122.254744
CA\usaush\ca.us066hishol.wpt:10:AveStars http://www.openstreetmap.org/?lat=34.061137&lon=-118.418541
CO\usaus\co.us285.wpt:67:LightLn http://www.openstreetmap.org/?lat=39.530608&lon=-105.305586
COD\afrrtr\cod.rtr05.wpt:206:Kimbi http://www.openstreetmap.org/?lat=-5.141853&lon=19.044714
COD\afrtah\cod.tah10.wpt:206:Kimbi http://www.openstreetmap.org/?lat=-5.141853&lon=19.044714
COD\codn\cod.n001.wpt:191:Kimbi http://www.openstreetmap.org/?lat=-5.141853&lon=19.044714
FL\usaus\fl.us001.wpt:274:LewSpdwy http://www.openstreetmap.org/?lat=29.940358&lon=-81.335014
GA\usaga\ga.ga003.wpt:87:AMSpeed http://www.openstreetmap.org/?lat=33.389291&lon=-84.307837
GA\usaus\ga.us019.wpt:87:AMSpeed http://www.openstreetmap.org/?lat=33.389291&lon=-84.307837
GA\usaus\ga.us041.wpt:134:AMSpeed http://www.openstreetmap.org/?lat=33.389291&lon=-84.307837
IA\usaia\ia.ia028.wpt:2:Prole http://www.openstreetmap.org/?lat=41.409261&lon=-93.726897
IA\usaia\ia.ia148.wpt:3:PearlSt http://www.openstreetmap.org/?lat=40.668872&lon=-94.721396
IA\usaia\ia.ia163.wpt:13:Monroe http://www.openstreetmap.org/?lat=41.544304&lon=-93.130245
IA\usaia\ia.ia163.wpt:17:Otley http://www.openstreetmap.org/?lat=41.464405&lon=-93.038921
ID\usaid\id.id004.wpt:5:Burke http://www.openstreetmap.org/?lat=47.520635&lon=-115.818837
ID\usaid\id.id013buskoo.wpt:2:Brdway_E http://www.openstreetmap.org/?lat=46.145176&lon=-115.971272
ID\usaid\id.id014.wpt:16:Golden http://www.openstreetmap.org/?lat=45.811881&lon=-115.682401
ID\usaid\id.id021.wpt:7:HilndVlySmt http://www.openstreetmap.org/?lat=43.570123&lon=-116.031404
ID\usaid\id.id022.wpt:12:SmallCO http://www.openstreetmap.org/?lat=44.172718&lon=-112.418217
ID\usaid\id.id051.wpt:14:Blkstn http://www.openstreetmap.org/?lat=42.440221&lon=-115.887000
IRL\eurtr\irl.waw.wpt:64:Veagh http://www.openstreetmap.org/?lat=54.974363&lon=-7.578678
ISL\islth\isl.th001.wpt:105:ThverDal http://www.openstreetmap.org/?lat=65.524878&lon=-19.797192
ISL\islth\isl.th083.wpt:14:AegisSid http://www.openstreetmap.org/?lat=65.946612&lon=-18.185484
ISL\islth\isl.th326.wpt:3:Haell http://www.openstreetmap.org/?lat=64.063079&lon=-20.238147
ISL\islth\isl.th413.wpt:8:ThingTor http://www.openstreetmap.org/?lat=64.101128&lon=-21.780245
ISL\islth\isl.th635.wpt:2:Arsel http://www.openstreetmap.org/?lat=65.892119&lon=-22.307868
ISL\islth\isl.th643.wpt:50:Arnes http://www.openstreetmap.org/?lat=66.011236&lon=-21.510061
ISL\islth\isl.th645.wpt:16:Bakki http://www.openstreetmap.org/?lat=65.776642&lon=-21.520329
LA\usala\la.la0428.wpt:3:GendeGDr_E http://www.openstreetmap.org/?lat=29.922734&lon=-90.017268
LA\usala\la.la0454.wpt:3:ABPorterRd http://www.openstreetmap.org/?lat=31.189230&lon=-92.248641
LA\usala\la.la0608.wpt:6:DavisLdg http://www.openstreetmap.org/?lat=32.036520&lon=-91.136130
LA\usala\la.la1168.wpt:3:BoisdArcRd http://www.openstreetmap.org/?lat=30.656146&lon=-92.228916
MEX-BC\mexd\mexbc.mex002d.wpt:2:BlvrJosexPon http://www.openstreetmap.org/?lat=32.536697&lon=-116.930344
MEX-DF\mexsf\mexdf.aniper.wpt:44:AveConst http://www.openstreetmap.org/?lat=19.410897&lon=-99.193870
MEX-DF\mexsf\mexdf.aniper.wpt:54:AveConsc http://www.openstreetmap.org/?lat=19.444180&lon=-99.218023
MEX-DF\mexsf\mexdf.auturbnte.wpt:3:AveConst http://www.openstreetmap.org/?lat=19.410897&lon=-99.193870
MEX-DF\mexsf\mexdf.auturbnte.wpt:8:AveConsc http://www.openstreetmap.org/?lat=19.444180&lon=-99.218023
MEX-YUC\mexsf\mexyuc.yuc01.wpt:24:Chenku http://www.openstreetmap.org/?lat=21.018874&lon=-89.661055
MO\usai\mo.i064.wpt:28:FortyDr http://www.openstreetmap.org/?lat=38.631539&lon=-90.378728
MO\usamo\mo.mo001.wpt:5:ChoTrfwy http://www.openstreetmap.org/?lat=39.193940&lon=-94.548683
MO\usamo\mo.mo125.wpt:9:GladTopTrail http://www.openstreetmap.org/?lat=36.660123&lon=-92.851810
MO\usamo\mo.mo269.wpt:1:FrontSt http://www.openstreetmap.org/?lat=39.130816&lon=-94.522848
MO\usaus\mo.us040.wpt:126:FortyDr http://www.openstreetmap.org/?lat=38.631539&lon=-90.378728
MP\usamp\mp.mp101.wpt:6:Isang http://www.openstreetmap.org/?lat=14.149185&lon=145.162826
MS\usaus\ms.us045.wpt:45:Lauder http://www.openstreetmap.org/?lat=32.519754&lon=-88.517261
MS\usaus\ms.us049e.wpt:26:Sidon http://www.openstreetmap.org/?lat=33.408159&lon=-90.204127
MS\usaus\ms.us051.wpt:87:I55Front http://www.openstreetmap.org/?lat=32.405110&lon=-90.145912
MS\usaus\ms.us051.wpt:98:UnionSt http://www.openstreetmap.org/?lat=32.599397&lon=-90.035834
MS\usaus\ms.us051.wpt:163:PearlSt http://www.openstreetmap.org/?lat=33.786370&lon=-89.809906
MS\usaus\ms.us051.wpt:182:HentzRd http://www.openstreetmap.org/?lat=34.225748&lon=-89.939361
MS\usaus\ms.us061.wpt:7:SligoSt http://www.openstreetmap.org/?lat=31.084565&lon=-91.304219
MS\usaus\ms.us061.wpt:22:SmithRd http://www.openstreetmap.org/?lat=31.304112&lon=-91.358829
MS\usaus\ms.us061.wpt:94:BowieRd http://www.openstreetmap.org/?lat=32.386739&lon=-90.827322
MS\usaus\ms.us061.wpt:111:KelsoRd http://www.openstreetmap.org/?lat=32.633701&lon=-90.863714
MS\usaus\ms.us061.wpt:113:DixieRd http://www.openstreetmap.org/?lat=32.654190&lon=-90.869765
MS\usaus\ms.us061.wpt:115:OmegaRd http://www.openstreetmap.org/?lat=32.669943&lon=-90.879121
MS\usaus\ms.us061.wpt:122:MooreRd http://www.openstreetmap.org/?lat=32.778327&lon=-90.933881
MS\usaus\ms.us061.wpt:153:HeadsRd http://www.openstreetmap.org/?lat=33.470507&lon=-90.846505
MS\usaus\ms.us061.wpt:170:MoodyRd http://www.openstreetmap.org/?lat=33.908783&lon=-90.742393
MS\usaus\ms.us072.wpt:41:SalemRd http://www.openstreetmap.org/?lat=34.909021&lon=-88.491526
MS\usaus\ms.us080.wpt:43:GuldeRd http://www.openstreetmap.org/?lat=32.294026&lon=-89.876533
MS\usaus\ms.us080.wpt:82:AdamsSt http://www.openstreetmap.org/?lat=32.326161&lon=-88.931065
MS\usaus\ms.us084.wpt:2:CanalSt http://www.openstreetmap.org/?lat=31.552062&lon=-91.412044
MS\usaus\ms.us084.wpt:20:Roxie http://www.openstreetmap.org/?lat=31.507435&lon=-91.064987
MS\usaus\ms.us084.wpt:22:KirbyRd http://www.openstreetmap.org/?lat=31.495177&lon=-91.013274
MS\usaus\ms.us084.wpt:107:MaxeyRd http://www.openstreetmap.org/?lat=31.695492&lon=-89.214649
MS\usaus\ms.us084.wpt:109:FlyntRd http://www.openstreetmap.org/?lat=31.694178&lon=-89.185038
MS\usaus\ms.us278.wpt:11:HeadsRd http://www.openstreetmap.org/?lat=33.470507&lon=-90.846505
MS\usaus\ms.us278.wpt:28:MoodyRd http://www.openstreetmap.org/?lat=33.908783&lon=-90.742393
MS\usaus\ms.us278.wpt:135:WolfeRd http://www.openstreetmap.org/?lat=33.883225&lon=-88.274760
MS\usaus\ms.us425.wpt:3:CanalSt http://www.openstreetmap.org/?lat=31.552062&lon=-91.412044
MS\usausb\ms.us051sprjac.wpt:7:EndMaint http://www.openstreetmap.org/?lat=32.285273&lon=-90.183120
MT\usausb\mt.us093busmis.wpt:3:MountAve http://www.openstreetmap.org/?lat=46.855936&lon=-114.013120
MT\usausb\mt.us093busmis.wpt:5:RiverSt http://www.openstreetmap.org/?lat=46.869683&lon=-114.003443
NC\usanc\nc.nc126.wpt:4:LakeJamesStaPk http://www.openstreetmap.org/?lat=35.728239&lon=-81.901703
NC\usaus\nc.us064.wpt:142:SpeMtnrd http://www.openstreetmap.org/?lat=35.716768&lon=-79.910329
NC\usaus\nc.us176.wpt:4:SlakeSumRd http://www.openstreetmap.org/?lat=35.233669&lon=-82.391018
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973
NM\usaus\nm.us054.wpt:25:Polly http://www.openstreetmap.org/?lat=33.570846&lon=-105.981481
ON\cannf\on.rr174.wpt:3:Traway http://www.openstreetmap.org/?lat=45.435736&lon=-75.598018
OR\usaor\or.or003.wpt:15:FloraLn http://www.openstreetmap.org/?lat=45.891054&lon=-117.262316
OR\usaor\or.or018buswil.wpt:3:Mainst http://www.openstreetmap.org/?lat=45.078445&lon=-123.486414
OR\usaor\or.or034.wpt:82:Colst http://www.openstreetmap.org/?lat=44.555846&lon=-123.079598
OR\usaor\or.or099.wpt:163:SixthAve http://www.openstreetmap.org/?lat=43.395140&lon=-123.314978
OR\usaor\or.or099.wpt:234:Brdwy http://www.openstreetmap.org/?lat=44.049927&lon=-123.086561
OR\usaor\or.or126busspr.wpt:3:Brdwy http://www.openstreetmap.org/?lat=44.049927&lon=-123.086561
OR\usaor\or.or138.wpt:64:CleFalls http://www.openstreetmap.org/?lat=43.247891&lon=-122.233865
OR\usaor\or.or173.wpt:11:TimLodge http://www.openstreetmap.org/?lat=45.330555&lon=-121.709290
OR\usaor\or.or219.wpt:17:NorthSt http://www.openstreetmap.org/?lat=45.304369&lon=-122.972760
OR\usaor\or.or223.wpt:4:MaxcreRd http://www.openstreetmap.org/?lat=44.695048&lon=-123.432212
OR\usaor\or.or238.wpt:16:Mainst http://www.openstreetmap.org/?lat=42.325412&lon=-122.943449
OR\usaor\or.or281.wpt:4:ClearCreRd http://www.openstreetmap.org/?lat=45.519564&lon=-121.596680
OR\usaor\or.or551.wpt:3:NEArdntRd http://www.openstreetmap.org/?lat=45.259603&lon=-122.769470
OR\usaush\or.us030hiscol.wpt:6:StarkSt http://www.openstreetmap.org/?lat=45.515610&lon=-122.361689
PR\usapr\pr.pr0003.wpt:83:CllLMRivera_E http://www.openstreetmap.org/?lat=18.007422&lon=-65.899344
QC\canqc\qc.qc198.wpt:12:AvMiller http://www.openstreetmap.org/?lat=48.951245&lon=-65.494206
ROU\roudj\rou.dj682e.wpt:3:Igrfry http://www.openstreetmap.org/?lat=46.123455&lon=20.786380
SCT\eurtr\sct.fifcoatr.wpt:50:BackStile http://www.openstreetmap.org/?lat=56.297586&lon=-2.658691
SD\usasd\sd.sd053.wpt:10:Mosher http://www.openstreetmap.org/?lat=43.474894&lon=-100.294061
SD\usasd\sd.sd471.wpt:2:Rumford http://www.openstreetmap.org/?lat=43.131025&lon=-103.700895
SD\usasd\sd.sd471.wpt:7:Provo http://www.openstreetmap.org/?lat=43.192255&lon=-103.833761
SD\usaus\sd.us016.wpt:11:AveofChiefs http://www.openstreetmap.org/?lat=43.820316&lon=-103.640041
SD\usaus\sd.us385.wpt:29:AveofChiefs http://www.openstreetmap.org/?lat=43.820316&lon=-103.640041
SD\usausb\sd.us014altdea.wpt:4:Savoy http://www.openstreetmap.org/?lat=44.352716&lon=-103.931608
SK\cansk\sk.sk010.wpt:33:Barvas http://www.openstreetmap.org/?lat=51.210211&lon=-102.099724
UT\usausb\ut.us191bushel.wpt:2:PopSt PoplarSt http://www.openstreetmap.org/?lat=39.682635&lon=-110.854765 [should this have a + before PoplarSt? it appears correctly in Highway Browser.]
UT\usausb\ut.us191bushel.wpt:3:JanSt JanetSt http://www.openstreetmap.org/?lat=39.688473&lon=-110.854679 [should this have a + before JanetSt? it appears correctly in Highway Browser.]
UT\usaut\ut.ut024.wpt:137:OldStateHwy http://www.openstreetmap.org/?lat=38.873870&lon=-110.359243
UT\usaut\ut.ut257.wpt:11:Bloom http://www.openstreetmap.org/?lat=38.937581&lon=-112.808647
VA\usaus\va.us050.wpt:50:QueenSt http://www.openstreetmap.org/?lat=38.889655&lon=-77.077965
WI\usaus\wi.us010.wpt:159:Ferry http://www.openstreetmap.org/?lat=44.090020&lon=-87.650449
WI\usausb\wi.us041altosh.wpt:7:SnellRd http://www.openstreetmap.org/?lat=44.068409&lon=-88.543153
WI\usawi\wi.wi080.wpt:64:Sprague http://www.openstreetmap.org/?lat=44.147633&lon=-90.131750
WI\usawi\wi.wi169.wpt:8:Gurney http://www.openstreetmap.org/?lat=46.470734&lon=-90.508289
WI\usawi\wi.wi175.wpt:7:LloydSt http://www.openstreetmap.org/?lat=43.054684&lon=-87.971778
ZWE\zwep\zwe.p011.wpt:22:Gokwe http://www.openstreetmap.org/?lat=-18.217632&lon=28.942280
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: oscar on January 10, 2020, 09:51:45 pm
Many of these errors are in in-development/preview systems, rather than active systems. Many errors in those systems will be picked up and fixed in the peer review process, to get the systems ready for promotion to active status.

In the preview systems I manage, peer review has already picked up all the listed errors, and fixes are in my queue. Of the two errors in my active systems, one was already in my queue, and the other has just been added.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: michih on January 11, 2020, 04:32:22 am
@cvoight, thanks for your effort.

We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php

I just wonder, why some are not detected, e.g.:

Code: [Select]
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973

@yakra? Any idea, should I open a Github issue?

It's an active system and I'll fix it.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 15, 2020, 03:49:25 pm
Parenthesizing your Select-String query and appending .length allows you to find the total number of results. For example, to find the number of waypoint labels with a US route as the primary highway (31,265):

Code: [Select]
PS ..\hwy_data> (Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US").length
31265

Quote
US80/42, A5/A6, I-5/6, I-80/90, I-80/6
Putting two highways in a waypoint label (http://travelmapping.net/devel/manual/wayptlabels.php#2highways): Put the primary highway first, followed by a slash, followed by the second highway. Drop the prefix of the second highway if it is more than one character long. A5/A6 becomes A5/A6. I-5/I-6 becomes I-5/6. I-25/US50 becomes I-25/50.

edit: I redid this post as I did not read the label style guide properly. In fact, I quite bungled it in just about every way! My apologies for any confusion! Real results can be found below.

I actually find the waypoints with prefixes on both sides of the slash more attractive, but this does make for an interesting regex query. First, let's check and make sure we can return results for a general query by seeing if there are any waypoint labels with US routes on both sides of the slash: US<something>/US<something>.

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US.*\/US"

NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NV\usai\nv.i580.wpt:1:US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919
SC\usaus\sc.us521.wpt:47:US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326

Sure enough, but this regex gives us two false positives (NV and SC). yakra's grep queries use cut -f1 -d " " so that only the primary waypoint label is matched (US50/395 and US521Trk/601Trk_N here). The naive regex I've been using looks for US at the start of a primary waypoint label, but then matches zero or more of any character (.*) until it finds a /US, so it can run past the primary label and into an alternate label. Changing .* to [^\s]* (zero or more non-whitespace characters) eliminates the issue because the match can no longer extend beyond the primary waypoint label. Success!

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US[^\s]*\/US"

NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
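
For anyone following along without PowerShell, the difference between the two patterns can be reproduced with Python's re module. This is only a sketch: the sample lines are copied from the results above with the file prefix stripped, and the variable names are my own.

```python
import re

# Sample .wpt lines copied from the results above (file prefix removed).
lines = [
    "US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397",
    "US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919",
    "US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326",
]

naive = re.compile(r"^US.*/US")        # .* can run past the primary label
strict = re.compile(r"^US[^\s]*/US")   # [^\s]* cannot cross the first space

# Report only the primary label (first space-separated field) of each hit.
naive_hits = [ln.split(" ")[0] for ln in lines if naive.search(ln)]
strict_hits = [ln.split(" ")[0] for ln in lines if strict.search(ln)]

print(naive_hits)   # all three lines, including the NV and SC false positives
print(strict_hits)  # only US13/US17_S
```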

The style guide says to drop the secondary highway's prefix after the slash if it's more than one character long, which is a somewhat simple query. Let us assume two things: (1) the only valid characters for highway prefixes are capital letters and hyphens; (2) a highway waypoint label is composed of a highway prefix followed by a number. So we can start by matching one or more characters that are neither whitespace nor a + (hidden waypoint label or something, not sure on the terminology) from the start of a primary waypoint label ^[^\s+]+ until we find a forward slash \/, then look for any instances where two or more uppercase letters or hyphens are followed by a number [A-Z-]{2,}[0-9]. Tying it all together: Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s+]+\/[A-Z-]{2,}[0-9]" (88 results using .length).
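
A quick way to sanity-check that pattern is to run it over a few labels in Python. This is a sketch under the same two assumptions as above; the first three labels come from the style guide quote, the fourth from the NV result earlier, and the names redundant_prefix, samples and flagged are mine.

```python
import re

# PowerShell needs -CaseSensitive for this; Python's re is
# case-sensitive by default.
redundant_prefix = re.compile(r"^[^\s+]+/[A-Z-]{2,}[0-9]")

samples = [
    "I-25/US50",               # style guide says this should be I-25/50
    "I-25/50",
    "A5/A6",                   # one-character prefix is kept
    "US50/395 +US50/US395",    # the alternate label must not trigger a match
]

flagged = [s for s in samples if redundant_prefix.search(s)]
print(flagged)  # only I-25/US50
```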

This is somewhat less interesting than one of the queries I was attempting because I misread the examples: what are the waypoint labels that include the same highway prefix on both sides of the slash? (I had also failed to note that the secondary highway prefix must be 2 characters, bah!) The reason it's interesting is that we get to use a capture group and backreferences (https://www.regular-expressions.info/backref.html)! (backreferences actually make regular expressions...not regular, but that's why they're fun!) The pattern we want to construct should match two or more valid highway prefix characters, followed by any number of non-whitespace characters, followed by a forward slash, followed by our first match. A naive construction (([A-Z-]{2,})[^\s]*\/\1) quickly runs into problems due to the way backreferences work (see the section Backtracking Into Capturing Groups from the link): if two different sets of 3+ capital letters are on opposite sides of the slash and any subset of 2 characters is in the same order, a match results. You might see where this is going: borders look like valid highway prefixes to a regex and several border pairings yield matches: BGR/GRC, ALM/ALA, IRN/IRQ, etc. (I like to start testing regexes live and then use test data with regexr (https://regexr.com/4sain) to iterate more quickly.)

So, matching highway prefixes alone is not enough, we have to be sure that the waypoint label is a highway. The first thing we do is change [^\s]* to [^\s]+. The + requires that the primary highway prefix be followed by one or more non-whitespace characters instead of zero or more. This screens out the BGR/GRC matches, but not many of the other border matches. If we assume that a highway is distinguished by the presence of a number after the highway prefix, we can screen out the rest of the false matches by requiring a number to appear in the secondary highway label after the secondary highway prefix. So, our final regex is: ^([A-Z-]{2,})[^\s]+\/\1[0-9]. If we tried to use non-whitespace characters [^\s] instead of a number [0-9] after the secondary highway prefix, the borders would still match because letters are non-whitespace. Likewise, trying to match only numbers after the primary highway prefix wouldn't work because many highways have suffixes that don't have to be numbers. (I'm sure this is well-understood information to anyone reading this, but I find it a useful practice to go through the reasoning.) Lastly, we can change our assumptions about what highway prefixes look like and use [^0-9] instead of [A-Z-]: assume any non-digit character can be a highway prefix.
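
To see the backreference behaviour outside of regexr, the naive and final patterns can be compared in Python. A minimal sketch: the border labels and highway labels are the ones mentioned in this post, and the variable names are my own.

```python
import re

naive = re.compile(r"([A-Z-]{2,})[^\s]*/\1")
final = re.compile(r"^([^0-9]{2,})[^\s]+/\1[0-9]")

labels = ["BGR/GRC", "ALM/ALA", "IRN/IRQ", "BC93/BC95", "I-69/I-469", "US50/395"]

# The naive pattern lets the capture group shrink or start mid-label,
# so GR, AL and IR are enough to make the border labels match.
naive_hits = [l for l in labels if naive.search(l)]

# The final pattern is anchored and demands a digit after the repeated prefix.
final_hits = [l for l in labels if final.search(l)]

print(naive_hits)
print(final_hits)
```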

note: I have not evaluated active vs. preview systems for these results, nor do I know whether any of these results need to be changed.

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^([^0-9]{2,})[^\s]+\/\1[0-9]"

BC\canbc\bc.bc003.wpt:224:BC93/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc093.wpt:22:BC3/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc095.wpt:33:BC3/BC93 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
CO\usaus\co.us085.wpt:119:I-25/I-70 +I-25(214A) http://www.openstreetmap.org/?lat=39.780113&lon=-104.989429
DEU-BY\deub\deuby.b013.wpt:16:St2256/St2419 http://www.openstreetmap.org/?lat=49.543987&lon=10.233706
DEU-BY\deub\deuby.b022bam.wpt:2:St2271/St2450 http://www.openstreetmap.org/?lat=49.796849&lon=10.222435
DNK\eurtr\dnk.mrsja.wpt:214:SR207/SR211 http://www.openstreetmap.org/?lat=55.845085&lon=12.094746
IN\usain\in.in001.wpt:68:I-69/I-469 http://www.openstreetmap.org/?lat=41.168382&lon=-85.104218
IND-PB\index\indpb.acexpy.wpt:8:NH5/NH7 http://www.openstreetmap.org/?lat=30.658329&lon=76.820827
ISL\islth\isl.th041.wpt:3:TH45/TH429 http://www.openstreetmap.org/?lat=64.002911&lon=-22.602568
ITA\eure\ita.e78.wpt:24:SS73/SS715 +SS326 http://www.openstreetmap.org/?lat=43.326457&lon=11.549098
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us301.wpt:76:NC43/NC48_S http://www.openstreetmap.org/?lat=35.973595&lon=-77.802172
OH\usaoh\oh.oh012.wpt:1:OH115/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh115.wpt:7:OH12/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh189.wpt:13:OH12/OH115 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaus\oh.us023.wpt:13:OH32/OH124 http://www.openstreetmap.org/?lat=39.047869&lon=-83.023725
OR\usaor\or.or099.wpt:253:OR99W/OR99E http://www.openstreetmap.org/?lat=44.229772&lon=-123.204675
OR\usaor\or.or211.wpt:1:OR99E/OR214 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
OR\usaor\or.or214.wpt:4:OR99E/OR211 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
POL\eure\pol.e67.wpt:228:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
POL\poldk\pol.dk008klo.wpt:19:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
ROU\eure\rou.e60.wpt:186:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
ROU\roudj\rou.dj222.wpt:15:DN22/DN22D http://www.openstreetmap.org/?lat=44.775267&lon=28.688680
ROU\roudj\rou.dj601e.wpt:1:DJ401A/DJ601 http://www.openstreetmap.org/?lat=44.448668&lon=25.778433
ROU\roudn\rou.dn001.wpt:19:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
WY\usawy\wy.wy530.wpt:13:FR7/FR157 http://www.openstreetmap.org/?lat=41.209445&lon=-109.632559
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 15, 2020, 04:53:15 pm
Quote
@cvoight, thanks for your effort.

We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php

I just wonder, why some are not detected, e.g.:

Code: [Select]
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973

@yakra? Any idea, should I open a Github issue?

not yakra, but I did peruse siteupdate.py (https://github.com/TravelMapping/DataProcessing/blob/master/siteupdate/python-teresco/siteupdate.py) (this link freezes my mobile browser for those browsing by phone, too much data), specifically the results for datacheckerrors.append which I believe is a comprehensive accounting of all the data checks. Generally, the waypoint label checks currently implemented don't test for consistency with the labeling style guidelines, but rather what I would consider more fundamental errors*, such as more than one slash, unbalanced parentheses, and invalid characters. There are a few checks for consistency, e.g. using "Bus" instead of "BL" or "BS" for business interstates, or the LONG_UNDERSCORE checks (flags _Abcd, _AbcdN type stuff), but nothing comprehensive. It would probably be useful to produce public documentation on the exact rules that throw flags (or maybe this exists and I couldn't find it at first glance).

* for what it's worth, I may be alone in drawing a distinction here, and you may consider these checks to be the same kind of thing.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 16, 2020, 12:37:12 pm
Regions for the United States and Canada don't have a country prefix, but they can be isolated using an exhaustive OR'd regex. The results can then be piped into Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "<pattern_goes_here>" to scan waypoint labels. The (?: indicates this is a non-capturing group, which doesn't matter much in this instance and could be changed to just (. (We have to group the entire OR'd query so that the beginning of line symbol ^ and end of line symbol $ match properly. Without those symbols, MB would match Zambia ZMB or M[ADEINOPST] would match all of the Mexico MEX- regions.)
Code: [Select]
USA: Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"}
CAN: Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)$"}
This is more seemly than using a unique system wildcard with Get-ChildItem -Directory "<system_wildcard_goes_here>" -Recurse, which does not exist in the case of the US because AUS A route directories contain the string usa.
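
The effect of the ^ and $ anchors is easy to demonstrate in Python. This is a sketch only: the Canadian alternation is copied from above, M[ADEINOPST] is just the one fragment of the US regex that matters here, and MEX-XXX is a placeholder rather than a real region directory name.

```python
import re

can = r"(?:AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)"
names = ["MB", "ZMB", "ON", "QC"]

# Without anchors, MB is found inside ZMB.
loose_can = [n for n in names if re.search(can, n)]
# With anchors, the whole directory name has to match.
strict_can = [n for n in names if re.search("^" + can + "$", n)]

# Same problem for the US regex: ME is found at the start of MEX-...
us_fragment = r"(?:M[ADEINOPST])"
loose_mex = bool(re.search(us_fragment, "MEX-XXX"))
strict_mex = bool(re.search("^" + us_fragment + "$", "MEX-XXX"))

print(loose_can, strict_can, loose_mex, strict_mex)
```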

Quote
I-80, A-73
USA Interstates and Quebec Autoroutes retain hyphens. (http://travelmapping.net/devel/manual/wayptlabels.php#hyphen)

We can redo our previous search for primary waypoint labels for interstates that lack a hyphen (already posted up, so I won't repost the results of the query):
Code: [Select]
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^I[0-9]"
To check the primary waypoint labels for Quebec autoroutes, we can slim down our Where-Object query because we only care about QC:
Code: [Select]
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^QC$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^A[0-9]"
No results, so all the primary waypoint labels for QC autoroutes (450 results) meet the style guidelines.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 16, 2020, 01:52:09 pm
Quote
RayWinSP, GreSmoNP
Highway ends at park (http://travelmapping.net/devel/manual/wayptlabels.php#park): For ends at a park or other non-commercial endpoint, abbreviate the park name as if it were a named highway (see rules above). Use NP for national park, PP for provincial park, SP for state park.

First let's establish some baseline numbers. Park suffixes should be preceded by at least one lowercase letter and followed by a space and at some point an OSM URL. This screens out borders with Spain (ESP) and any wpt files that aren't true wpt files (some stuff in ITA).
Code: [Select]
PS ..\hwy_data> (Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+NP\s.*http").length
75
PS ..\hwy_data> (Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+PP\s.*http").length
163
PS ..\hwy_data> (Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+SP\s.*http").length
225

Theoretically the US will not have provincial parks PP, so let's see if there are any waypoint labels for provincial parks in the US:
Code: [Select]
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+PP\s.*http").length
11
A surprising number, but upon investigation (http://travelmapping.net/hb/?units=miles&u=null&r=mo.mo023) they are all MOsPP -- waypoints for Missouri supplemental routes, which are interesting (https://en.wikipedia.org/wiki/Missouri_supplemental_route). [brief, old forums discussion] (http://forum.travelmapping.net/index.php?topic=127.msg600#msg600)

We can screen these out by requiring two or more lowercase letters [a-z]{2,} before a NP/PP/SP suffix and see what happens. NP stays at 75, SP decreases to 219 (-6), and PP decreases to 151 (-12). The 11 Missouri results account for all but 1 of the 12 PP results that dropped out, so let's see if we can find the odd one out:
Code: [Select]
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[A-Z][a-z]{1}PP\s.*http"
BC\canbc\bc.bc097.wpt:284:WhiPtPP http://www.openstreetmap.org/?lat=54.908111&lon=-122.933021
It turns out that BC97 (http://travelmapping.net/hb/?units=miles&u=null&r=bc.bc097) has many waypoints with the suffix PP, which is not something the style guidelines mention.
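
Here is why the stricter pattern drops both the Missouri labels and WhiPtPP, sketched in Python. Assumptions: the URL is a placeholder with dummy coordinates, MOsPP is used in the form cited above, and ExaParPP is a made-up label that follows the documented three-letter abbreviation style.

```python
import re

URL = "http://www.openstreetmap.org/?lat=0&lon=0"  # placeholder, not real coordinates
lines = [
    "MOsPP " + URL,      # Missouri supplemental routes, form cited above
    "WhiPtPP " + URL,    # the BC97 label found above
    "ExaParPP " + URL,   # made-up label in the documented style
    "RayWinSP " + URL,   # style guide example with a different suffix
]

loose = re.compile(r"[a-z]+PP\s.*http")      # one or more lowercase before PP
strict = re.compile(r"[a-z]{2,}PP\s.*http")  # two or more lowercase before PP

loose_hits = [ln.split(" ")[0] for ln in lines if loose.search(ln)]
strict_hits = [ln.split(" ")[0] for ln in lines if strict.search(ln)]

print(loose_hits)   # MOsPP and WhiPtPP have only one lowercase letter before PP
print(strict_hits)
```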

To see what's going on, we could attempt to construct a search query that finds all the NP/PP/SP waypoints that aren't in the first or last line of a file. But it's a lot easier to conduct a cursory perusal of the results for SP and PP and look for files with more than two hits since you can't have more than two ends in a highway wpt file. What we find is that this nomenclature is widespread, and may merit a specific mention in the style guidelines.
Code: [Select]
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]{2,}SP\s.*http"
[...]
CA\usaca\ca.ca001.wpt:7:CryCoveSP http://www.openstreetmap.org/?lat=33.564211&lon=-117.826000
CA\usaca\ca.ca001.wpt:183:HarHeaSP http://www.openstreetmap.org/?lat=35.478241&lon=-120.991826
CA\usaca\ca.ca001.wpt:187:SanSimSP http://www.openstreetmap.org/?lat=35.593590&lon=-121.124847
CA\usaca\ca.ca001.wpt:201:LimBeaSP http://www.openstreetmap.org/?lat=36.008557&lon=-121.518209
CA\usaca\ca.ca001.wpt:205:JulPfeSP http://www.openstreetmap.org/?lat=36.158883&lon=-121.670644
CA\usaca\ca.ca001.wpt:208:BigSurSP http://www.openstreetmap.org/?lat=36.252768&lon=-121.787332
CA\usaca\ca.ca001.wpt:209:AndMolSP http://www.openstreetmap.org/?lat=36.288382&lon=-121.844172
CA\usaca\ca.ca001.wpt:353:RusGulSP http://www.openstreetmap.org/?lat=39.329785&lon=-123.804808
[...]
OR\usaus\or.us026.wpt:2:KloCreSP http://www.openstreetmap.org/?lat=45.921155&lon=-123.895140
OR\usaus\or.us026.wpt:15:HumCreRd +SunHwySP http://www.openstreetmap.org/?lat=45.889471&lon=-123.625449
OR\usaus\or.us026.wpt:177:OchLakeSP http://www.openstreetmap.org/?lat=44.307679&lon=-120.698118
OR\usaus\or.us026.wpt:228:ClyHolSP http://www.openstreetmap.org/?lat=44.417344&lon=-119.090466
[...]
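
Instead of eyeballing the list, the "more than two hits per file" idea could also be tallied mechanically. A rough Python sketch, assuming the Select-String output has been saved as text in the file:line:label form shown above; only a handful of the lines above are included, and the names hits, per_file and suspects are mine.

```python
from collections import Counter

# A few of the result lines above, trimmed to file:line:label.
hits = [
    r"CA\usaca\ca.ca001.wpt:7:CryCoveSP",
    r"CA\usaca\ca.ca001.wpt:183:HarHeaSP",
    r"CA\usaca\ca.ca001.wpt:187:SanSimSP",
    r"OR\usaus\or.us026.wpt:2:KloCreSP",
    r"OR\usaus\or.us026.wpt:177:OchLakeSP",
]

# The file name is everything before the first colon (relative paths only).
per_file = Counter(h.split(":", 2)[0] for h in hits)

# A highway has at most two ends, so more than two hits means
# the suffix is also being used on mid-route waypoints.
suspects = {f: n for f, n in per_file.items() if n > 2}
print(suspects)
```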
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: michih on January 16, 2020, 02:42:47 pm
Quote
Quote
@cvoight, thanks for your effort.

We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php

I just wonder, why some are not detected, e.g.:

Code: [Select]
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973

@yakra? Any idea, should I open a Github issue?

not yakra, but I did peruse siteupdate.py (https://github.com/TravelMapping/DataProcessing/blob/master/siteupdate/python-teresco/siteupdate.py), specifically the results for datacheckerrors.append which I believe is a comprehensive accounting of all the data checks. Generally, the waypoint label checks currently implemented don't test for consistency with the labeling style guidelines

True, we just check long underscores (https://github.com/TravelMapping/DataProcessing/issues/250).
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: cvoight on January 20, 2020, 03:42:40 pm
A variation on potential no-underscore malformed city suffixes from the OP.

Quote
US40: US40BusWhi; US73: US40BusWhi, US40BusTho; A3: A3Zur
Intersections with visibly numbered highways (http://travelmapping.net/devel/manual/wayptlabels.php#visiblynumbered): Distinguish two different same-bannered same-numbered routes as needed with the 3-letter city abbreviations. Also use the city abbreviation for bannerless same-designation spurs or branches, such as the Zurich A3 spur intersecting the main A3.

Start by identifying some candidates for inclusion: find all the US primary waypoint labels that have a non-underscored suffix where 1 uppercase character is followed by 3 or more lowercase characters (38 results). (Similar idea holds across the project, it's just a lot easier to focus on the US for posting results.)

Code: [Select]
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+[A-Z][a-z]{3,}[A-Z]*\s.*http"

--AL\usaal\al.al025.wpt:13:CR16Hale http://www.openstreetmap.org/?lat=32.600940&lon=-87.592294
--AL\usaal\al.al025.wpt:27:CR49Hale http://www.openstreetmap.org/?lat=32.888960&lon=-87.426410
--AL\usaal\al.al025.wpt:30:CR16Bibb http://www.openstreetmap.org/?lat=32.903288&lon=-87.313542
--AL\usaal\al.al051.wpt:13:CR11Dale http://www.openstreetmap.org/?lat=31.577896&lon=-85.735438
--AL\usaal\al.al053.wpt:13:TN7Truck http://www.openstreetmap.org/?lat=34.991991&lon=-86.844633
AL\usaus\al.us011.wpt:6:CR2York http://www.openstreetmap.org/?lat=32.482895&lon=-88.312300
AL\usaus\al.us011.wpt:7:CR19York http://www.openstreetmap.org/?lat=32.487959&lon=-88.299555
AL\usaus\al.us011.wpt:16:CR20Epes http://www.openstreetmap.org/?lat=32.689060&lon=-88.128548
AL\usaus\al.us011.wpt:17:CR21Epes http://www.openstreetmap.org/?lat=32.689755&lon=-88.122915
AL\usaus\al.us011.wpt:99:AL75Conn http://www.openstreetmap.org/?lat=33.587713&lon=-86.699638
AL\usaus\al.us011.wpt:135:AL35Conn http://www.openstreetmap.org/?lat=34.430120&lon=-85.732729
AL\usaus\al.us029.wpt:21:CR25Rome http://www.openstreetmap.org/?lat=31.141818&lon=-86.668611
AL\usaus\al.us084.wpt:109:CR31Dale http://www.openstreetmap.org/?lat=31.268918&lon=-85.673044
AL\usaus\al.us098.wpt:12:I-65Ramps http://www.openstreetmap.org/?lat=30.707093&lon=-88.122593
--AR\usaar\ar.ar125.wpt:18:Hwy125Park http://www.openstreetmap.org/?lat=36.490224&lon=-92.778170
--AR\usaar\ar.ar300.wpt:8:Hwy300Spur http://www.openstreetmap.org/?lat=34.937503&lon=-92.586571
KS\usaus\ks.us183.wpt:53:17Terr http://www.openstreetmap.org/?lat=39.390206&lon=-99.296091
MN\usamn\mn.mn371.wpt:48:CR38Cass http://www.openstreetmap.org/?lat=47.121790&lon=-94.613078
MS\usaus\ms.us051.wpt:87:I55Front http://www.openstreetmap.org/?lat=32.405110&lon=-90.145912
MS\usaus\ms.us084.wpt:26:MS184Mead http://www.openstreetmap.org/?lat=31.465422&lon=-90.906458
MS\usaus\ms.us084.wpt:30:MS184Bude http://www.openstreetmap.org/?lat=31.469521&lon=-90.841870
MS\usaus\ms.us084.wpt:119:MS184Cleo http://www.openstreetmap.org/?lat=31.700203&lon=-89.037237
NC\usanc\nc.nc159.wpt:4:NC159Spur http://www.openstreetmap.org/?lat=35.633211&lon=-79.782693
NC\usaus\nc.us001.wpt:80:US158/1Conn http://www.openstreetmap.org/?lat=36.364915&lon=-78.361954
NC\usaus\nc.us023.wpt:16:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usaus\nc.us070.wpt:161:US70Conn http://www.openstreetmap.org/?lat=36.083329&lon=-79.148244
NC\usaus\nc.us074.wpt:36:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usaus\nc.us301.wpt:30:US301Conn http://www.openstreetmap.org/?lat=35.121901&lon=-78.763411
NC\usausb\nc.us019trkway.wpt:11:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usausb\nc.us023bussyl.wpt:2:US23Conn http://www.openstreetmap.org/?lat=35.373978&lon=-83.225909
NC\usausb\nc.us064trkhen.wpt:10:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usausb\nc.us220busash.wpt:8:US220Conn http://www.openstreetmap.org/?lat=35.736951&lon=-79.808335
NE\usane\ne.ne008.wpt:52:645Blvd http://www.openstreetmap.org/?lat=40.058998&lon=-95.720047
NE\usaus\ne.us030.wpt:137:15Blvd http://www.openstreetmap.org/?lat=41.452731&lon=-96.626269
OH\usaus\oh.us052.wpt:93:US23Spur http://www.openstreetmap.org/?lat=38.486542&lon=-82.638752
SC\usasc\sc.sc133.wpt:2:12MileRA http://www.openstreetmap.org/?lat=34.704488&lon=-82.833384
VA\usava\va.va365.wpt:3:VA365East http://www.openstreetmap.org/?lat=36.957932&lon=-81.072166
WA\usawa\wa.wa100.wpt:5:WA100Spur +BeaHolSP http://www.openstreetmap.org/?lat=46.289990&lon=-124.056281

Some of these are in preview (prefixed with --). Some of these are mystifying and I simply have no clue whether there is an issue, e.g. the AL US routes. And some we can do a quick analysis to find out whether they are consistently formatted across the project or are outliers, e.g. Conn and Spur. (Turns out, the latter.)
Code: [Select]
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Con\s.*http").length
300 (Con suffix)
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Conn\s.*http").length
11 (Conn suffix)

PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Spr\s.*http").length
342 (Spr suffix)
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Spur\s.*http").length
4 (Spur suffix)
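
The four counts above could also be gathered in one pass by capturing the suffix and tallying it. A Python sketch with assumptions spelled out: the URL is a placeholder, the Conn and Spur labels come from the results above, and the US999 labels are made up purely to exercise the other branches.

```python
import re
from collections import Counter

URL = "http://www.openstreetmap.org/?lat=0&lon=0"  # placeholder
lines = [
    "US23Conn " + URL,    # from the results above
    "US70Conn " + URL,    # from the results above
    "NC159Spur " + URL,   # from the results above
    "US999Con " + URL,    # made-up
    "US999Spr " + URL,    # made-up
    "US999Bus " + URL,    # made-up, not one of the tallied suffixes
]

# Same shape as the patterns above, with the suffix captured.
suffix = re.compile(r"^[^\s]+[0-9]+(Conn?|Spu?r)\s.*http")
tally = Counter(m.group(1) for m in map(suffix.search, lines) if m)
print(tally)
```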
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on March 15, 2020, 11:29:56 pm
Quote
Slightly more complicated: find all the waypoints in the US that start with I followed directly by a number. [Note: this is not quite restricted to the US as there are some non-US regions with highway directories that contain the string "usa", but it eliminates most regions and (currently) all false positives. See the code section in this post (http://forum.travelmapping.net/index.php?topic=3455.msg17451#msg17451) for a regex that matches only US states and territories.]

I have a git stash on my desktop implementing an INTERSTATE_NO_HYPHEN datacheck (http://travelmapping.net/devel/datacheck.php) in the C++ siteupdate program. Haven't tackled Python yet.
Code: [Select]
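inline void Waypoint::interstate_no_hyphen(DatacheckEntryList *datacheckerrors)
{	// Flag USA labels such as I95 or ToI95 that lack the hyphen after the I.
	if (route->system->country->first == "USA" && label.size() >= 2)
	{	// data() is null-terminated (C++11), so reads after skipping "To" stop at '\0'
		const char *c = label.data();
		if (c[0] == 'T' && c[1] == 'o') c += 2;	// skip a leading "To"
		if (c[0] == 'I' && isdigit(c[1]))
			datacheckerrors->add(route, label, "", "", "INTERSTATE_NO_HYPHEN", "");
	}
}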
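
For comparison, the same test can be sketched in Python. To be clear, this is not the siteupdate.py implementation: the function takes a plain country code and label instead of Waypoint and route objects, and it returns a bool rather than adding a datacheck entry.

```python
def interstate_no_hyphen(country, label):
    """Return True if a USA label looks like I95 or ToI95 (no hyphen)."""
    if country != "USA" or len(label) < 2:
        return False
    # Skip a leading "To", as the C++ version does.
    rest = label[2:] if label.startswith("To") else label
    # ASCII digits only, to mirror isdigit() in the C++ code.
    return len(rest) >= 2 and rest[0] == "I" and rest[1] in "0123456789"


print(interstate_no_hyphen("USA", "I95"))
print(interstate_no_hyphen("USA", "I-95"))
```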

Quote
First, find all waypoints with a capital letter followed by 4 or more lowercase letters

Quote
We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php
I just wonder, why some are not detected, e.g.:
Code: [Select]
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973
@yakra? Any idea, should I open a Github issue?

I implemented a LABEL_LONG_WORD datacheck on my own TM mirror (not currently online). Didn't yet merge the changes back into the TravelMapping/DataProcessing:master -- as cvoight records, there are still plenty of results. (More below...)
Probably should though  ;), if we're expecting to see this datacheck now, and not seeing it causes confusion.

Quote
executed from the hwy_data directory: Get-ChildItem -Include "*.wpt" -Recurse
...
Scanning this file, I filter the results: delete many odd results from ITA\ita.nsa (2).wpt; delete Minnesota subway lines (MN\usamnmt\BlueLine.wpt & MN\usamnmt\GreenLine.wpt);

How about Get-ChildItem -Include "*/*/*.wpt" ?

Quote
Select-String -CaseSensitive -Pattern "[A-Z][a-z]{4,}"

Or even just "[a-z]{4,}"

Quote
all lines that contain a + [presumably waypoint alias] -- using Sublime Text, execute a regex search for ^.*\+.*$, select find all, and press delete twice;

We still want to search before the + sign though. Hence the | cut -f1 -d' ' on Unix-like systems.
Sounds like you know this per later posts though. :)
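
For those without grep and cut handy, the "keep only the primary label" step amounts to something like the following Python. A minimal sketch that mimics grep -v '^+' | cut -f1 -d' ' for one raw .wpt line at a time; the function name primary_label is my own.

```python
def primary_label(line):
    """Mimic: grep -v '^+' | cut -f1 -d' ' for a single .wpt line."""
    line = line.rstrip("\n")
    if line.startswith("+"):
        return None              # line dropped by grep -v '^+'
    return line.split(" ")[0]    # first space-delimited field, like cut


print(primary_label("US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919"))
```

Anything after the first space (alternate labels and the URL) is discarded, so a later regex only ever sees the primary label.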

Parenthesizing your Select-String query and appending .length allows you to find the total number of results. For example, to find the number of waypoint labels with a US route as the primary highway (31,265):

Code:
PS ..\hwy_data> (Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US").length
31265
In Unix, we'd pipe the results to wc and do a line count:
grep '^US' */usa*/*.wpt | wc -l
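A small sketch of the counting step on a throwaway directory tree, in case it helps to see the numbers. The file names, labels and URLs are all invented. It also shows grep -c, which gives a per-file breakdown rather than one grand total:

```shell
# Throwaway tree with two made-up .wpt files (paths, labels and URLs are invented).
tmp=$(mktemp -d)
mkdir -p "$tmp/PA/usaus" "$tmp/DE/usaus"
printf '%s\n' 'US30 http://example.invalid/?lat=1&lon=1' \
              'PA41 http://example.invalid/?lat=2&lon=2' > "$tmp/PA/usaus/pa.us999.wpt"
printf '%s\n' 'US13 http://example.invalid/?lat=3&lon=3' \
              'US40 http://example.invalid/?lat=4&lon=4' > "$tmp/DE/usaus/de.us999.wpt"

# Grand total: one output line per match, counted by wc.
total=$(cd "$tmp" && grep '^US' */usa*/*.wpt | wc -l | tr -d ' ')
# Per-file breakdown: grep -c prints filename:count for each file.
per_file=$(cd "$tmp" && grep -c '^US' */usa*/*.wpt)
rm -r "$tmp"

echo "$total"
echo "$per_file"
```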

I like the discussion on searching for extraneous highway prefixes. Did I experiment with a regex for this at some point? OK, yes. (http://forum.travelmapping.net/index.php?topic=62.msg16459#msg16459) But I hadn't hit the mark yet. Now working with
grep -v '^+' */*/*.wpt | cut -f1 -d' ' | egrep -iv ':\*?[A-Z]+/[A-Z]+$' | egrep ':.*/[A-Za-z\-]{2,}[0-9]'
we get 123 results. I don't see anything that should be a false positive, other than arguably the Texas entries; I think the original idea behind those was to consider the "Lp" or "Spr" part of the designation & not part of the prefix, i.e. TX95 + TXLp363 = drop the 2nd "TX" = TX95/Lp363. Whatever the case though, I don't like those and plan on fixing them.
We can expand on this a little bit and drop the requirement for numeral(s) after the slash:
grep -v '^+' */*/*.wpt | cut -f1 -d' ' | grep ':.*[0-9]' | egrep -i ':.*/[A-Z\-]{2,}' | egrep -v '/[A-Z]{2}$'
This brings some local road names, named roads, and lettered county routes with extraneous "CR" back into the mix. Looks like these wouldn't be FPs either:
http://travelmapping.net/devel/manual/wayptlabels.php#slash
http://travelmapping.net/devel/manual/wayptlabels.php#dropnamed
Code:
AL/usaal/al.al013.wpt:US72/ALT72
AL/usaal/al.al017.wpt:US72/ALT72
AL/usaus/al.us043.wpt:US72/ALT72
AL/usaus/al.us072.wpt:US43/ALT72
AL/usaus/al.us090.wpt:I10/US98_W
AL/usaus/al.us090.wpt:I10/US98_E
AL/usaus/al.us098.wpt:I10/US90_W
AL/usaus/al.us098.wpt:I10/US90_E
BC/canbc/bc.bc003.wpt:BC93/BC95
BC/canbc/bc.bc093.wpt:BC3/BC95
BC/canbc/bc.bc095.wpt:BC3/BC93
CA/usaush/ca.us099hisind.wpt:1stSt/Hef
CHE/eurtr/che.gts.wpt:H10/AutGalsIns
CO/usaco/co.co030.wpt:6th/HavSt
CO/usaus/co.us085.wpt:I-25/I-70
CZE/eure/cze.e442.wpt:I30/II613
CZE/eure/cze.e461.wpt:I43/II640
CZE/eure/cze.e461.wpt:I42/II640
CZE/eure/cze.e551.wpt:I34/II634
CZE/eure/cze.e65.wpt:MO/II243
DEU-BY/deub/deuby.b013.wpt:St2256/St2419
DEU-BY/deub/deuby.b022bam.wpt:St2271/St2450
DEU-BY/eurtr/deuby.romstr.wpt:B466/St2212
DEU-BY/eurtr/deuby.romstr.wpt:B300/St2051
DEU-BY/eurtr/deuby.romstr.wpt:B17/St2014
DEU-BY/eurtr/deuby.romstr.wpt:B23/St2059
DEU-BY/eurtr/deuby.romstr.wpt:B17/St2059
DEU-BY/eurtr/deuby.romstrwur.wpt:B8/St2300
DNK/eurtr/dnk.mraal.wpt:PR55/SR180
DNK/eurtr/dnk.mresb.wpt:PR11/SR175
DNK/eurtr/dnk.mresb.wpt:E20/PR24
DNK/eurtr/dnk.mresb.wpt:PR15/SR181
DNK/eurtr/dnk.mrnyk.wpt:PR9/SR297
DNK/eurtr/dnk.mrnyk.wpt:PR9/SR289
DNK/eurtr/dnk.mrode.wpt:PR16/SR563
DNK/eurtr/dnk.mrode.wpt:PR21/SR563
DNK/eurtr/dnk.mrode.wpt:PR52/SR451
DNK/eurtr/dnk.mrode.wpt:PR8/SR167
DNK/eurtr/dnk.mrsja.wpt:SR231/PR57
DNK/eurtr/dnk.mrsja.wpt:SR207/SR211
DNK/eurtr/dnk.mrsja.wpt:SR211/PR16
DNK/eurtr/dnk.mrsja.wpt:PR16/SR205
DNK/eurtr/dnk.mrsja.wpt:PR14/SR155
DNK/eurtr/dnk.mrsve.wpt:SR305/PR9
DNK/eurtr/dnk.mrvib.wpt:PR26/SR186
DNK/eurtr/dnk.mr.wpt:PR26/SR545
DNK/eurtr/dnk.mr.wpt:PR26/SR181
DNK/eurtr/dnk.mr.wpt:E39/SR597
DNK/eurtr/dnk.mr.wpt:E45/SR585
ESP-AN/espa/espan.ca035.wpt:AP4/CA32
ESP-CM/espa/espcm.ap041.wpt:A40/TO22
ESP-CM/espn/espcm.n403.wpt:TO21/CM40
ESP-MC/espmc/espmc.rm015.wpt:A7/MU30
FRA-GES/fragesd55/frages.d069455.wpt:N135/NVS
FRA-IDF/eure/fraidf.e15.wpt:A3/BlvdPer
FRA-IDF/eure/fraidf.e15.wpt:A6b/BlvdPer
FRA-IDF/eure/fraidf.e50.wpt:A6b/BlvdPer
FRA-IDF/eure/fraidf.e50.wpt:A4/BlvdPer
FRA-IDF/eure/fraidf.e5.wpt:A13/BlvdPer
FRA-IDF/eure/fraidf.e5.wpt:A6a/BlvdPer
IDN/asiahr/idn.ah152.wpt:N3/JTP
IND-PB/index/indpb.acexpy.wpt:NH5/NH7
IN/usain/in.in001.wpt:I-69/I-469
ISL/islth/isl.th041.wpt:TH45/TH429
ITA/eure/ita.e78.wpt:SS73/SS715
ITA/eure/ita.e80.wpt:A91/AGRA
ITA/eure/ita.e80.wpt:A24/AGRA
MA/usama/ma.ma002acam.wpt:US3/MA3
MEX-JAL/mexdn/mexjal.mex080d.wpt:MEX90D/GUA10D
MEX-JAL/mexdn/mexjal.mex090d.wpt:MEX80D/GUA10D
NC/usanc/nc.nc087.wpt:US311/NC770
NC/usanc/nc.nc308trkwin.wpt:US13/US17_S
NC/usaus/nc.us017.wpt:US13/US17BypWin_S
NC/usaus/nc.us158.wpt:US29Bus/NC87
NC/usaus/nc.us301.wpt:NC43/NC48_S
OH/usaoh/oh.oh012.wpt:OH115/OH189
OH/usaoh/oh.oh115.wpt:OH12/OH189
OH/usaoh/oh.oh189.wpt:OH12/OH115
OH/usaus/oh.us023.wpt:OH32/OH124
ON/canon/on.on058tho.wpt:ON20/RR20
OR/usaor/or.or010.wpt:US26/OR99W
OR/usaor/or.or019.wpt:I-84/US30
OR/usaor/or.or035.wpt:I-84/US30
OR/usaor/or.or038.wpt:I-5/OR99
OR/usaor/or.or039.wpt:US97/OR39Bus
OR/usaor/or.or074.wpt:I-84/US30
OR/usaor/or.or099e.wpt:I-84/US30
OR/usaor/or.or099.wpt:US199/OR238
OR/usaor/or.or099.wpt:OR99W/OR99E
OR/usaor/or.or131trktil.wpt:US101/OR6
OR/usaor/or.or131.wpt:US101/OR6
OR/usaor/or.or140.wpt:I-5/OR99
OR/usaor/or.or207.wpt:I-84/US30
OR/usaor/or.or211.wpt:OR99E/OR214
OR/usaor/or.or214.wpt:OR99E/OR211
OR/usaor/or.or320.wpt:I-84/US30/395
OR/usaus/or.us097.wpt:US97Bus/OR126
POL/eure/pol.e67.wpt:DW382/DW385
POL/poldk/pol.dk008klo.wpt:DW382/DW385
PRT/eure/prt.e801.wpt:A24/IP3
ROU/eure/rou.e60.wpt:DN1A/DNVO1K
ROU/eure/rou.e60.wpt:DJ100B/DJ101E
ROU/eure/rou.e81.wpt:A2/DN22C
ROU/roudj/rou.dj222.wpt:DN22/DN22D
ROU/roudj/rou.dj601e.wpt:DJ401A/DJ601
ROU/roudj/rou.dj793a.wpt:DJ792A/DJ793
ROU/roudn/rou.dn001.wpt:DJ100B/DJ101E
ROU/roudn/rou.dn001.wpt:DN1A/DNVO1K
ROU/roudn/rou.dn007.wpt:A1/DN73
RUS/asiah/rus.ah008.wpt:A181/ZSD
RUS/asiah/rus.ah008.wpt:A118/ZSD
RUS/eure/rus.e18.wpt:A181/ZSD
RUS/eure/rus.e18.wpt:A118/ZSD
TX/usai/tx.i369.wpt:US59/Lp151
TX/usatxl/tx.lp0012.wpt:TXLp354/Spr482
TX/usatxl/tx.lp0020.wpt:TX359/Spr260
TX/usatxl/tx.lp0340.wpt:TX6/Lp484
TX/usatxl/tx.lp0354.wpt:TXLp12/Spr482
TX/usatxl/tx.lp0484.wpt:TX6/Lp340
TX/usatxs/tx.sp0339.wpt:TX103/Lp287
TX/usatxs/tx.sp0450.wpt:TX302/Lp338
TX/usatx/tx.tx019.wpt:TX19Bus/Lp7
TX/usatx/tx.tx019.wpt:TX154/Lp301
TX/usatx/tx.tx021.wpt:TX95/Lp150
TX/usatx/tx.tx036.wpt:TX95/Lp363
TX/usatx/tx.tx036.wpt:TX53/Lp363
TX/usatx/tx.tx046.wpt:TX46Bus/Lp337
TX/usatx/tx.tx053.wpt:TX36/Lp363
TX/usatx/tx.tx095.wpt:TX21/Lp150
TX/usatx/tx.tx154.wpt:TX19/Lp301
TX/usatx/tx.tx158.wpt:TX158Bus/Lp250
TX/usaus/tx.us059.wpt:US59Bus/Lp20
TX/usaus/tx.us059.wpt:US59Bus/Lp224
TX/usaus/tx.us059.wpt:TX93/Lp151
TX/usaus/tx.us083.wpt:US83Bus/Lp322
TX/usaus/tx.us084.wpt:US83Bus/Lp322
TX/usaus/tx.us190.wpt:TX95/Lp363
TX/usaus/tx.us380.wpt:TX207/Lp46
UT/usaut/ut.ut186.wpt:300N/ColSt
UT/usaut/ut.ut186.wpt:300N/StaSt
UT/usaut/ut.ut227.wpt:200W/StaSt
WI/usausb/wi.us012busbar.wpt:US12/CRW
WI/usawi/wi.wi035.wpt:WI64/CRE
WI/usawi/wi.wi064.wpt:WI35/CRE
WY/usaus/wy.us026.wpt:1stSt/RamSt
WY/usaus/wy.us085.wpt:Rd128/CadRd
WY/usaus/wy.us287.wpt:1stSt/RamSt
WY/usawy/wy.wy159.wpt:TeaRd/Rd94
WY/usawy/wy.wy530.wpt:FR7/FR157
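As a sanity check on the first of the two filters above (the one that requires numerals after the slash), here is a minimal sketch that feeds it a handful of already-cut filename:label lines. The first, second and fourth lines are taken from the listing above; the other two are invented to show what gets dropped. grep -E stands in for egrep, which is the same thing.

```shell
# First grep: drop labels that are purely letters/letters (two named roads).
# Second grep: keep labels where a slash is followed by 2+ letters and then a digit.
extraneous_prefix() {
  grep -Eiv ':\*?[A-Z]+/[A-Z]+$' | grep -E ':.*/[A-Za-z\-]{2,}[0-9]'
}

flagged=$(printf '%s\n' \
  'AL/usaus/al.us072.wpt:US43/ALT72' \
  'UT/usaut/ut.ut186.wpt:300N/StaSt' \
  'XX/xxx/xx.rd.wpt:Main/Elm' \
  'OR/usaor/or.or320.wpt:I-84/US30/395' \
  'XX/xxx/xx.rd.wpt:US1/9' \
  | extraneous_prefix)
echo "$flagged"
```

Only US43/ALT72 and I-84/US30/395 come through. 300N/StaSt has no numeral after the slash, so only the expanded search picks it up.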

but I did peruse siteupdate.py (https://github.com/TravelMapping/DataProcessing/blob/master/siteupdate/python-teresco/siteupdate.py) ... for datacheckerrors.append which I believe is a comprehensive accounting of all the data checks.
Correct, datacheckerrors.append will show all the datachecks, even a couple that are commented out. The live ones are compiled all in one place for convenience here (https://github.com/TravelMapping/DataProcessing/blob/3e1b7d954f6b8655609679368fb158b2d2caa421/siteupdate/python-teresco/siteupdate.py#L1247-L1273).

Generally, the waypoint label checks currently implemented don't test for consistency with the labeling style guidelines, but rather what I would consider more fundamental errors*,
...
[* for what it's worth, I may be alone in drawing a distinction here, and you may consider these checks to be the same kind of thing.]
...
such as more than one slash, unbalanced parentheses, and invalid characters.
I'd categorize more than one slash (LABEL_SLASHES) as more of a style guideline, but I getcha.

There are a few checks for consistency, e.g. using "Bus" instead of "BL" or "BS" for business interstates, or the LONG_UNDERSCORE checks (flags _Abcd, _AbcdN type stuff), but nothing comprehensive.
Other than what you mentioned, for style guide stuff we have LABEL_UNDERSCORES (too many) and NONTERMINAL_UNDERSCORE (slash after an underscore).
I may have a go at adding US_BANNER US_LETTER back in; that's easy. Edit: Done!
I've brainstormed (http://forum.travelmapping.net/index.php?topic=62.msg16459#msg16459) some other potential datachecks, still mostly in regex form recorded here on the forum for now, but have implemented some of these as actual datachecks, either partially or fully, and left saved as a git stash. I just worry about dropping too many bombs on the datacheck page too fast, and generating fatigue and ill will... even if we should be labeling stuff right! ;)

It would probably be useful to produce public documentation on the exact rules that throw flags (or maybe this exists and I couldn't find it at first glance).
https://github.com/yakra/Web/issues/13 wasn't too helpful/visible; replaced with https://github.com/TravelMapping/Web/issues/403 .

conduct a cursory perusal of the results for SP and PP and look for files with more than two hits since you can't have more than two ends in a highway wpt file. What we find is that this nomenclature is widespread, and may merit a specific mention in the style guidelines.
"Road (not driveway/parking lot) to a national or state-level park, major airport, or popular tourist attraction." (http://travelmapping.net/devel/manual/includepts.php) The (not driveway/parking lot) directive has more recently been relaxed in practice.
If a localized national park or tourist attraction is immediately served by the cross road, a truncated version of its name could also suffice. (http://travelmapping.net/devel/manual/wayptlabels.php#nearbytown)
Clarification on NP/PP/SP abbreviations may be more useful here than in the section on highway ends.
https://github.com/TravelMapping/Web/issues/404

Start by identifying some candidates for inclusion: find all the US primary waypoint labels that have a non-underscored suffix where 1 uppercase character is followed by 3 or more lowercase characters (38 results).
This will only look for 4-character strings; in the OP I was also interested in finding cases where a more standard 3-character string should have an underscore but lacks it; these will be much more common.
Beware false positives! Do the search in the command for all regions (*/*/*.wpt), and you'll see a FP right away, on AB748 (http://travelmapping.net/hb/?r=ab.ab748&lat=53.599203&lon=-116.384654&zoom=14). Since we can't have two "AB748"s to use in list files, one of them becomes AB748Eds, hence we'll see labels like this sometimes.
• No underscore when the city abbrev is part of the highway's .list name.
• Underscore when appending a city suffix to disambiguate 2+ intersections with the same route, that would otherwise have the same label.

Weeding out false positives would be better handled as a datacheck, where we have access to info on other routes occupying that same waypoint's coordinates.
My search in the OP is pretty North America centric; it returns a lot of FPs for Italy, which has a number of frequently used banners that I didn't include in that big regex.

I have an eye toward eventually implementing a number of the searches posted in the OP as datachecks, where practicable.
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on November 18, 2020, 09:18:40 pm
Who gets hurt by exit renumberings?

MA I-195 has been renumbered, sequential -> mileage-based. (https://forum.travelmapping.net/index.php?topic=7.msg20881#msg20881)
What if I want to tag people who maintain their .lists via GitHub, so GitHub can send them a notification?
I've got a list of the exit numbers in use that got moved to a new location... Where do I go from there?

cd $HOME/TravelMapping/UserData/list_files/

First, I set up my search terms. Change these around, and we can generalize the scripts below to other similar applications.
rg=MA
rte=I-195
root=ma.i195
labels='5 10 12 13 15 22'


A few more variables to simplify some of the shell commands:
r=$(echo -en "\r")
t=$(echo -en "\t")
d="[ $t]+"



Build a list of travelers using any of the specified labels:
AllBroken=\
$(for label in $labels; do
    grep -v '^#' * \
    | sed -r -e 's~#.*~~' -e "s~:[ $t]+~:~" -e "s~[ $t$r]+$~~" \
    | egrep -i ":$rg$d$rte$d$label$d.*|:$rg$d$rte$d.*$d$label$|:$rg$d$rte$d$label$d.*$d.*$d.*|:.*$d.*$d.*$d$rg$d$rte$d$label$"
  done | cut -f1 -d. | sort | uniq)

I can use the echo command to print out this list and slap it in the forum. (https://forum.travelmapping.net/index.php?topic=7.msg20881#msg20881)
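For anyone puzzling over the sed step in the AllBroken pipeline, here is a minimal sketch of what it does to one line of grep output before the big regex sees it. The username, spacing and trailing comment are invented, printf is used instead of echo -en to build the tab and carriage return, and sed -E is used in place of sed -r.

```shell
t=$(printf '\t')
r=$(printf '\r')
d="[ $t]+"

# Mimics one line of `grep -v '^#' *` output: "filename:" followed by the .list line,
# with leading spaces, mixed tab/space delimiters, a trailing comment and a DOS line ending.
raw="someuser.list:  MA${t}I-195 5  10 # old numbering${r}"

# Strip the comment, the whitespace after the filename colon, and trailing whitespace/CR.
normalize() {
  sed -E -e 's~#.*~~' -e "s~:[ $t]+~:~" -e "s~[ $t$r]+\$~~"
}

clean=$(printf '%s\n' "$raw" | normalize)
printf '%s\n' "$clean"

# With the line cleaned up, the first alternative of the search regex matches.
if printf '%s\n' "$clean" | grep -Eiq ":MA${d}I-195${d}5${d}.*"; then
  echo "uses label 5"
fi
```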


Who updates using GitHub? Which files have commits by someone other than Jim?
GitHub=\
$(for u in $AllBroken; do
    echo -en "$u\t"
    echo \
    $(git log $u.list \
      | grep -v Teresco \
      | grep -B 1 -m 1 'Author:' \
      | head -n 1 | cut -f2 -d' ')
  done | egrep "$t[0-9a-f]{40}$" | cut -f1)

This is a list of TM usernames. We still need to figure out GitHub usernames, so we'll set it aside for later.


Back up a step, convert the names to URLs, and visit their GitHub commit history at the last commit not by Jim:
for u in $AllBroken; do
  echo -en "$u\t"
  echo \
  $(git log $u.list \
    | grep -v Teresco \
    | grep -B 1 -m 1 'Author:' \
    | head -n 1 | cut -f2 -d' ')
done \
| egrep "$t[0-9a-f]{40}$" \
| sed -r "s~(.*)$t(.*)~https://github.com/TravelMapping/UserData/commits/\2/list_files/\1.list~" \
| xargs firefox

This opens up 24 browser tabs. From there I just clicked on the users' profiles & copied their names the old-fashioned way, but this could possibly be automated too...


Who's left over who updates by email, and could use a heads-up from Jim?
diff <(echo $AllBroken | tr ' ' '\n') <(echo $GitHub | tr ' ' '\n') | grep '<' | sed 's~< ~~'
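The diff | grep '<' combination is effectively a set difference: names in the first list but not the second. An alternative sketch, not what I actually ran, uses comm -23 on sorted input. The usernames here are made up, and temp files are used instead of process substitution so it also runs in a plain POSIX sh.

```shell
# Made-up usernames standing in for $AllBroken and $GitHub.
AllBroken='alice bob carol dave'
GitHub='bob dave'

tmp=$(mktemp -d)
echo $AllBroken | tr ' ' '\n' | sort > "$tmp/all"
echo $GitHub    | tr ' ' '\n' | sort > "$tmp/github"

# comm -23 suppresses columns 2 and 3, leaving lines found only in the first file.
email_only=$(comm -23 "$tmp/all" "$tmp/github")
rm -r "$tmp"
echo "$email_only"
```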


Who's fine?
What if, for giggles, I want to look at the travels of people whose .list lines didn't get broken?
First, find everyone who marked off any segment of MA I-195...
AllListed=\
$(grep -v '^#' * \
  | sed -r -e 's~#.*~~' -e "s~:[ $t]+~:~" -e "s~[ $t$r]+$~~" \
  | egrep -i ":$rg$d$rte$d.*$d.*|:$rg$d$rte$d.*$d.*$d.*$d.*|:.*$d.*$d.*$d$rg$d$rte$d.*" \
  | cut -f1 -d. | sort | uniq)



Then filter out the broken .lists, and open up what's left in the HB.
diff <(echo $AllListed | tr ' ' '\n') <(echo $AllBroken | tr ' ' '\n') | grep '<' \
| sed -e 's~< ~~' -e "s~.*~https://travelmapping.net/hb/showroute.php?r=$root\&u=&~" \
| xargs firefox



Was there anyone with only concurrencies, not explicitly .listing MA I-195 itself?
We can search user logs for anyone with MA I-195 mileage, and filter out our previous list.
cd $HOME/TravelMapping/DataProcessing/siteupdate/python-teresco/logs/users
diff <(grep -m 1 "$rg $rte" * | cut -f1 -d. | sort) <(echo $AllListed | tr ' ' '\n') | grep '<' \
| sed -r "s~< (.*)~https://travelmapping.net/user/mapview.php?rg=$rg\&u=\1~" \
| xargs firefox
Title: Re: Using grep to search highway data, waypoint labels, etc.
Post by: yakra on December 05, 2020, 01:51:41 pm
Open all of a route's intersecting/concurrent routes in the HB
...in however many separate browser tabs centered at each intersecting/concurrent waypoint.
Run this in the appropriate regional subdirectory of hwy_data/:
file=usama/ma.ma003.wpt #or whatever
for url in `cat $file | sed 's~.* ~~'`; do grep $url */*.wpt | grep -v "^$file"; done | sed -r 's~.*/(.*)\.wpt:.*(lat=.*)~https://travelmapping.net/hb/showroute.php?r=\1\&\2\&zoom=16~' | xargs firefox
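The last sed in that pipeline is what turns a grep hit into an HB link. A minimal sketch of just that conversion, on one invented grep output line (the route file, label and coordinates are hypothetical), with sed -E in place of sed -r:

```shell
# \1 captures the route root between the last directory slash and ".wpt:",
# \2 captures everything from "lat=" onward.
to_hb_url() {
  sed -E 's~.*/(.*)\.wpt:.*(lat=.*)~https://travelmapping.net/hb/showroute.php?r=\1\&\2\&zoom=16~'
}

line='usama/ma.ma999.wpt:US3 http://www.openstreetmap.org/?lat=42.500000&lon=-71.200000'
url=$(printf '%s\n' "$line" | to_hb_url)
echo "$url"
```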




Named designation with a slash
for rg in `ls | grep -v _`; do
  grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' \
  | egrep -v '_|Alt_?[NEWS]?$|Bus_?[NEWS]?$|Byp_?[NEWS]?$|Spr_?[NEWS]?$|Trk_?[NEWS]?$' \
  | grep ':.*[0-9].*/.*[A-Z][a-z]'
done

for rg in `ls | grep -v _`; do
  grep -sv '^+' $rg/*/*.wpt | cut -f1 -d' ' \
  | egrep -v 'Alt_?[NEWS]?$|Bus_?[NEWS]?$|Byp_?[NEWS]?$|Spr_?[NEWS]?$|Trk_?[NEWS]?$|:[A-Za-z]{3,4}/[A-Za-z]{3,4}$' \
  | grep ':.*[0-9a-z].*[A-Z].*/.*[A-Z][a-z]'
done

Putting two highways in a waypoint label (https://travelmapping.net/devel/manual/wayptlabels.php#dropnamed)
Quote
If you encounter the need for using both a named and numbered designation in the waypoint label, or the need for two named designations, pick only one of the two for brevity. I-95/New Jersey Turnpike becomes I-95.



Potential continuing highways that should be "End"
for rg in `ls | grep -v _`; do
  head -n 1 -v $rg/*/*.wpt 2>/dev/null | tr '\n>' ' \n' | sed -e 's~ ~~' -e 's~ <== ~:~' | cut -f1 -d' ' | grep -v ':.*[0-9]' | grep '_[NEWS]$'
  tail -n 1 -v $rg/*/*.wpt 2>/dev/null | tr '\n>' ' \n' | sed -e 's~ ~~' -e 's~ <== ~:~' | cut -f1 -d' ' | grep -v ':.*[0-9]' | grep '_[NEWS]$'
done

Highway ends at non-intersections or non-borders (https://travelmapping.net/devel/manual/wayptlabels.php#continuing)
Quote
For sudden ends at no particular intersection or landmark, the name of the continuing highway can be used if it begins where the highway in question ends, i.e., is not concurrent.
Plenty of false positives & some false negatives.
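The head and tail lines lean on the ==> filename <== headers that head -v prints, and use tr and sed to reshape them into grep-style filename:label lines. A minimal sketch of just the head half, on two throwaway files with invented names and labels, assuming a head that supports -v (GNU coreutils does):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'MainSt_N http://example.invalid/?lat=1&lon=1' \
              'End http://example.invalid/?lat=2&lon=2' > "$tmp/r1.wpt"
printf '%s\n' 'US1 http://example.invalid/?lat=3&lon=3' \
              'End http://example.invalid/?lat=4&lon=4' > "$tmp/r2.wpt"

# tr turns newlines into spaces and each > into a newline, so every header
# lands on the same line as the waypoint that follows it.
# sed then trims the leading space and rewrites " <== " as ":".
first_labels=$(cd "$tmp" && head -n 1 -v *.wpt \
  | tr '\n>' ' \n' \
  | sed -e 's~ ~~' -e 's~ <== ~:~' \
  | cut -f1 -d' ' \
  | grep -v ':.*[0-9]' | grep '_[NEWS]$')
rm -r "$tmp"
echo "$first_labels"
```

r2.wpt drops out because its first label contains a numeral; r1.wpt is flagged because it starts with a named road plus a direction suffix.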
Title: Re: Using grep and shell scripts to search highway data, waypoint labels, etc.
Post by: yakra on August 27, 2021, 03:13:40 pm
List routes in a region sorted by number of shaping points
Run this in the appropriate regional subdirectory of hwy_data/:

for f in */*.wpt; do printf '%03i\t' $(grep -c '^+' $f); echo $f; done | grep -v '^000' | sort
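To show what the output looks like, here is a minimal sketch that runs the same loop on a throwaway directory. The system name, file names and waypoint lines are invented.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/usaxx"
printf '%s\n' 'A u' '+X1 u' '+X2 u' 'B u' > "$tmp/usaxx/xx.one.wpt"   # 2 shaping points
printf '%s\n' 'A u' 'B u'                 > "$tmp/usaxx/xx.two.wpt"   # none
printf '%s\n' 'A u' '+X1 u' 'B u'         > "$tmp/usaxx/xx.three.wpt" # 1 shaping point

# Zero-padded counts sort correctly as plain text; files with no shaping points are dropped.
ranking=$(cd "$tmp" && for f in */*.wpt; do
    printf '%03i\t' "$(grep -c '^+' "$f")"; echo "$f"
  done | grep -v '^000' | sort)
rm -r "$tmp"
echo "$ranking"
```

Appending -r to sort would put the routes with the most shaping points first.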
Title: Re: Using grep and shell scripts to search highway data, waypoint labels, etc.
Post by: yakra on October 08, 2021, 01:55:27 am
When you've edited some files and want to find intersecting/concurrent points in other files not edited yet
Helpful for avoiding breaking concurrencies. Changes must be unstaged for this to work.
Run this from HighwayData/ or hwy_data/, or in the regional subdirectory if you didn't go near any borders:

fgrep -rf <(git diff | grep ^- | grep -v '\.wpt$' | sed 's~.* ~~') . | fgrep -vf <(git diff --name-only | cut -f4 -d/)

Downsides:
• Doesn't help with brand new points added to one route in a concurrency.
• Won't find http://www.openstreetmap.org/?lat=42.094226&lon=-75.75703 when you're looking for http://www.openstreetmap.org/?lat=42.094226&lon=-75.757030.
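One possible workaround for the trailing-zero mismatch, as a sketch only and not something I've folded into the command above: strip trailing zeros from the decimal part of each coordinate before comparing. This assumes a coordinate always ends at an & or at the end of the line; normalize_coords is a made-up helper name.

```shell
# First expression: drop zeros after the last nonzero decimal digit.
# Second expression: drop an all-zero decimal part, including the point.
normalize_coords() {
  sed -E -e 's~(\.[0-9]*[1-9])0+(&|$)~\1\2~g' -e 's~\.0+(&|$)~\1~g'
}

a=$(printf '%s\n' 'http://www.openstreetmap.org/?lat=42.094226&lon=-75.75703'  | normalize_coords)
b=$(printf '%s\n' 'http://www.openstreetmap.org/?lat=42.094226&lon=-75.757030' | normalize_coords)
echo "$a"
echo "$b"
```

Both sides of the comparison would need to go through the same normalization for the two URLs to line up.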