Parenthesizing your Select-String query and appending .length allows you to find the total number of results. For example, to find the number of waypoint labels with a US route as the primary highway (31,265):
PS ..\hwy_data> (Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US").length
31265
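For anyone without PowerShell handy, here is a rough Python sketch of the same count. It assumes the hwy_data layout seen in this post (region folder, then a system folder such as usanc, then .wpt files). It mimics two Select-String defaults: matching ignores case unless asked otherwise, and each matching line counts once. The function name count_matching_lines is my own. The folder check only approximates -Directory "*usa*" -Recurse and is not a faithful port.

```python
import re
from pathlib import Path


def count_matching_lines(root, pattern, dir_substring=None, case_sensitive=False):
    """Count lines in *.wpt files under `root` that match `pattern`.

    Rough analogue of (Get-ChildItem ... | Select-String -Pattern ...).length:
    case-insensitive unless case_sensitive=True, and at most one hit per line.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    root = Path(root)
    total = 0
    for path in root.rglob("*.wpt"):
        if dir_substring is not None:
            # Approximates -Directory "*usa*": keep the file only if some folder
            # between root and the file has dir_substring in its name.
            folders = path.relative_to(root).parts[:-1]
            if not any(dir_substring.lower() in name.lower() for name in folders):
                continue
        with path.open(encoding="utf-8", errors="replace") as f:
            total += sum(1 for line in f if regex.search(line))
    return total
```

Usage: count_matching_lines("hwy_data", r"^US", dir_substring="usa"), run from the folder that contains hwy_data. The result should be comparable to the PowerShell count, but treat it as an approximation.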
US80/42, A5/A6, I-5/6, I-80/90, I-80/6
Putting two highways in a waypoint label: Put the primary highway first, followed by a slash, followed by the second highway. Drop the prefix of the second highway if it is more than one character long. A5/A6 becomes A5/A6. I-5/I-6 becomes I-5/6. I-25/US50 becomes I-25/50.
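The rule above can be expressed in a few lines. This is a minimal Python sketch that assumes everything before the first digit of a highway name is its prefix ("A", "I-", "US"). combine_labels is a hypothetical helper and is not part of any Travel Mapping tooling.

```python
import re

# Assumption: everything before the first digit is the prefix ("I-", "US", "A").
_PREFIX_AND_REST = re.compile(r"^(\D*)(.*)$")


def combine_labels(primary, secondary):
    """Join two highway names per the quoted style rule (hypothetical helper)."""
    prefix, rest = _PREFIX_AND_REST.match(secondary).groups()
    if len(prefix) > 1:
        secondary = rest  # drop multi-character prefixes such as "I-" or "US"
    return f"{primary}/{secondary}"


for a, b in [("A5", "A6"), ("I-5", "I-6"), ("I-25", "US50")]:
    print(combine_labels(a, b))  # A5/A6, I-5/6, I-25/50
```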
edit: I redid this post because I did not read the label style guide properly. In fact, I bungled it in just about every way! My apologies for any confusion! The real results can be found below.
I actually find the waypoints with prefixes on both sides of the slash more attractive, but this rule does make for an interesting regex query. First, let's make sure a general query returns results by checking whether there are any waypoint labels with US routes on both sides of the slash, i.e. labels of the form US<something>/US<something>:
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US.*\/US"
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NV\usai\nv.i580.wpt:1:US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919
SC\usaus\sc.us521.wpt:47:US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326
Sure enough, but this regex gives us two false positives (NV and SC). yakra's grep queries use cut -f1 -d " " so that only the primary waypoint label is matched. Here the primary labels are US50/395 and US521Trk/601Trk_N, and the /US matches come from the alternate labels that follow them on the same line. The naive regex I've been using looks for US at the start of a primary waypoint label, but then matches zero or more of any character (.*) until it finds a /US, even if that means running past the primary label into the alternate labels. Changing .* to [^\s]* (zero or more non-whitespace characters) eliminates the issue, since the match cannot extend past the space that ends the primary waypoint label. Success!
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US[^\s]*\/US"
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
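To see the difference side by side, here is a small Python sketch run against the four lines returned above, with the file names and line numbers stripped. It compares the naive .* pattern, the [^\s]* fix, and a cut-style approach that trims each line to its first field before matching. re.IGNORECASE stands in for Select-String's default case-insensitivity. The variable and function names are mine.

```python
import re

# Lines from the Select-String output above (file name and line number removed).
lines = [
    "US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397",
    "US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397",
    "US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919",
    "US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326",
]

naive = re.compile(r"^US.*\/US", re.IGNORECASE)      # .* may run past the space into alt labels
fixed = re.compile(r"^US[^\s]*\/US", re.IGNORECASE)  # [^\s]* stops at the end of the primary label


def primary_label(line):
    """Like `cut -f1 -d " "`: keep only the first space-separated field."""
    return line.split(" ")[0]


naive_hits = [primary_label(line) for line in lines if naive.search(line)]
fixed_hits = [primary_label(line) for line in lines if fixed.search(line)]
cut_hits = [primary_label(line) for line in lines if naive.search(primary_label(line))]

print("naive:", naive_hits)  # all four, including the NV and SC false positives
print("fixed:", fixed_hits)  # only the two NC labels
print("cut:  ", cut_hits)    # the same two NC labels
```

Both fixes land in the same place. The regex version avoids an extra processing step, which is handy inside a single Select-String call.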
The style guide says to drop the secondary highway's prefix after the slash if it's more than one character long, which makes for a fairly simple query. Let us assume two things: (1) the only valid characters for highway prefixes are capital letters and hyphens; (2) a highway waypoint label is composed of a highway prefix followed by a number. We can start at the beginning of a primary waypoint label and match one or more characters that are neither whitespace nor a + (hidden waypoint label or something, not sure on the terminology), [^\s+]+, until we find a forward slash, \/. Then we look for any instances of two or more uppercase letters or hyphens followed by a number, [A-Z-]{2,}[0-9]. The -CaseSensitive switch matters here: Select-String ignores case by default, so without it [A-Z-] would also match lowercase letters. Tying it all together (88 results using .length):
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s+]+\/[A-Z-]{2,}[0-9]"
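Here is a quick Python check of that pattern against the examples from the style guide excerpt, plus one of the alternate-label lines from earlier. The pattern is compiled without re.IGNORECASE to match -CaseSensitive. keeps_long_prefix is a name I made up for illustration.

```python
import re

# Case-sensitive on purpose, matching Select-String -CaseSensitive.
PREFIX_NOT_DROPPED = re.compile(r"^[^\s+]+\/[A-Z-]{2,}[0-9]")


def keeps_long_prefix(line):
    """True if the primary label has a 2+ character prefix right after the slash."""
    return PREFIX_NOT_DROPPED.search(line) is not None


for label in ["I-25/US50", "I-5/I-6", "I-25/50", "A5/A6", "US50/395 +US50/US395", "+US50/US395"]:
    print(label, keeps_long_prefix(label))
```

Because "I-" is two characters, I-5/I-6 is flagged just like I-25/US50. The one-character prefix in A5/A6 is not. The hidden +US50/US395 alternate label is never reached, because the leading [^\s+]+ can neither start with + nor cross the space.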
This is somewhat less interesting than the query I was attempting when I misread the examples: which waypoint labels include the same highway prefix on both sides of the slash? (I had also failed to note that the secondary highway prefix is only dropped when it is at least 2 characters long, bah!) The reason it's interesting is that we get to use a capture group and backreferences! (Backreferences actually make regular expressions...not regular, but that's why they're fun!) The pattern we want to construct should match two or more valid highway prefix characters, followed by any number of non-whitespace characters, followed by a forward slash, followed by our first match. A naive construction (([A-Z-]{2,})[^\s]*\/\1) quickly runs into problems due to the way backreferences work (see the section Backtracking Into Capturing Groups from the link). The engine can backtrack into the capture group to try a shorter run of letters, and since the pattern is unanchored it can also start the group later in the label. So if any run of two or more capital letters before the slash also begins the text after the slash, a match results. You might see where this is going: borders look like valid highway prefixes to a regex, and several border pairings yield matches: BGR/GRC, ALM/ALA, IRN/IRQ, etc. (I like to start testing regexes live and then use test data with regexr to iterate more quickly.)
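The backtracking is easier to see if you print what the capture group ends up holding and where the match starts. This Python sketch runs the naive pattern (unanchored, case-sensitive) on the border labels mentioned above and on one real highway label. naive_capture is a hypothetical helper name.

```python
import re

# The naive pattern: unanchored and case-sensitive, so the engine may start the
# capture anywhere and backtrack into it to try shorter runs of letters.
NAIVE = re.compile(r"([A-Z-]{2,})[^\s]*\/\1")


def naive_capture(label):
    """Return (captured text, start index) for the first match, or None."""
    m = NAIVE.search(label)
    return (m.group(1), m.start()) if m else None


for label in ["BGR/GRC", "ALM/ALA", "IRN/IRQ", "US13/US17_S"]:
    print(label, naive_capture(label))
```

For ALM/ALA and IRN/IRQ, the group first grabs all three letters, fails on the backreference, and backtracks to two letters (AL, IR), which do begin the other side. For BGR/GRC, every attempt starting at B fails. The match only succeeds when the search restarts one character later and captures GR.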
So, matching highway prefixes alone is not enough; we have to be sure that the waypoint label is a highway. The first thing we do is change [^\s]* to [^\s]+. The + requires that the primary highway prefix be followed by one or more non-whitespace characters instead of zero or more. This screens out the BGR/GRC match, because the only run of capitals that begins GRC (GR) sits immediately before the slash. It does not screen out most of the other border matches. If we assume that a highway is distinguished by the presence of a number after the highway prefix, we can screen out the rest of the false matches by requiring a number right after the secondary highway prefix. Also anchoring the capture group to the start of the label with ^, our final regex is: ^([A-Z-]{2,})[^\s]+\/\1[0-9]. If we tried to use a non-whitespace character [^\s] instead of a number [0-9] after the secondary highway prefix, the borders would still match because letters are non-whitespace. Likewise, trying to match only numbers after the primary highway prefix wouldn't work, because many highways have suffixes that aren't numbers (OR99W, for example). (I'm sure this is well understood by anyone reading this, but I find it a useful practice to go through the reasoning.) Lastly, we can relax our assumption about what highway prefixes look like and use [^0-9] instead of [A-Z-]: assume any non-digit character can be part of a highway prefix.
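To check each refinement step, here is a Python sketch that runs the intermediate and final patterns over a handful of labels taken from this post. The last pattern uses re.IGNORECASE because the PowerShell command for it omits -CaseSensitive. The dictionary keys are just my labels for the steps.

```python
import re

STEPS = {
    # [^\s]+ instead of [^\s]*: something must sit between the capture and the slash
    "plus": re.compile(r"([A-Z-]{2,})[^\s]+\/\1"),
    # anchored at the label start and requiring a digit after the repeated prefix
    "final": re.compile(r"^([A-Z-]{2,})[^\s]+\/\1[0-9]"),
    # any non-digit characters may form the prefix; case-insensitive like Select-String's default
    "non_digit": re.compile(r"^([^0-9]{2,})[^\s]+\/\1[0-9]", re.IGNORECASE),
}

LABELS = ["BGR/GRC", "ALM/ALA", "IRN/IRQ", "US13/US17_S", "OR99W/OR99E",
          "US50/395", "St2256/St2419", "I-25/I-70"]

matches = {name: [s for s in LABELS if rx.search(s)] for name, rx in STEPS.items()}
for name, hits in matches.items():
    print(f"{name:10} {hits}")
```

The "plus" step drops only BGR/GRC. The digit requirement in "final" removes the remaining borders. Switching to [^0-9] picks up mixed-case prefixes such as St, which [A-Z-] cannot match.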
note: I have not evaluated active vs. preview systems for these results, nor do I know whether any of these results need to be changed.
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^([^0-9]{2,})[^\s]+\/\1[0-9]"
BC\canbc\bc.bc003.wpt:224:BC93/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc093.wpt:22:BC3/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc095.wpt:33:BC3/BC93 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
CO\usaus\co.us085.wpt:119:I-25/I-70 +I-25(214A) http://www.openstreetmap.org/?lat=39.780113&lon=-104.989429
DEU-BY\deub\deuby.b013.wpt:16:St2256/St2419 http://www.openstreetmap.org/?lat=49.543987&lon=10.233706
DEU-BY\deub\deuby.b022bam.wpt:2:St2271/St2450 http://www.openstreetmap.org/?lat=49.796849&lon=10.222435
DNK\eurtr\dnk.mrsja.wpt:214:SR207/SR211 http://www.openstreetmap.org/?lat=55.845085&lon=12.094746
IN\usain\in.in001.wpt:68:I-69/I-469 http://www.openstreetmap.org/?lat=41.168382&lon=-85.104218
IND-PB\index\indpb.acexpy.wpt:8:NH5/NH7 http://www.openstreetmap.org/?lat=30.658329&lon=76.820827
ISL\islth\isl.th041.wpt:3:TH45/TH429 http://www.openstreetmap.org/?lat=64.002911&lon=-22.602568
ITA\eure\ita.e78.wpt:24:SS73/SS715 +SS326 http://www.openstreetmap.org/?lat=43.326457&lon=11.549098
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us301.wpt:76:NC43/NC48_S http://www.openstreetmap.org/?lat=35.973595&lon=-77.802172
OH\usaoh\oh.oh012.wpt:1:OH115/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh115.wpt:7:OH12/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh189.wpt:13:OH12/OH115 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaus\oh.us023.wpt:13:OH32/OH124 http://www.openstreetmap.org/?lat=39.047869&lon=-83.023725
OR\usaor\or.or099.wpt:253:OR99W/OR99E http://www.openstreetmap.org/?lat=44.229772&lon=-123.204675
OR\usaor\or.or211.wpt:1:OR99E/OR214 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
OR\usaor\or.or214.wpt:4:OR99E/OR211 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
POL\eure\pol.e67.wpt:228:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
POL\poldk\pol.dk008klo.wpt:19:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
ROU\eure\rou.e60.wpt:186:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
ROU\roudj\rou.dj222.wpt:15:DN22/DN22D http://www.openstreetmap.org/?lat=44.775267&lon=28.688680
ROU\roudj\rou.dj601e.wpt:1:DJ401A/DJ601 http://www.openstreetmap.org/?lat=44.448668&lon=25.778433
ROU\roudn\rou.dn001.wpt:19:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
WY\usawy\wy.wy530.wpt:13:FR7/FR157 http://www.openstreetmap.org/?lat=41.209445&lon=-109.632559