Using grep and shell scripts to search highway data, waypoint labels, etc.
oscar:
Many of these errors are in in-development/preview systems, rather than active systems. Many errors in those systems will be picked up and fixed in the peer review process, to get the systems ready for promotion to active status.
In the preview systems I manage, peer review has already picked up all the listed errors, and fixes are in my queue. Of the two errors in my active systems, one was already in my queue and the other has just been added.
michih:
@cvoight, thanks for your effort.
We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php
I just wonder why some are not detected, e.g.:
--- Code: ---NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973
--- End code ---
@yakra? Any idea? Should I open a GitHub issue?
It's an active system and I'll fix it.
cvoight:
Parenthesizing your Select-String query and appending .length allows you to find the total number of results. For example, to find the number of waypoint labels with a US route as the primary highway (31,265):
--- Code: ---PS ..\hwy_data> (Get-ChildItem -Directory "*usa*" -Recurse | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US").length
31265
--- End code ---
--- Quote ---US80/42, A5/A6, I-5/6, I-80/90, I-80/6
Putting two highways in a waypoint label: Put the primary highway first, followed by a slash, followed by the second highway. Drop the prefix of the second highway if it is more than one character long. A5/A6 becomes A5/A6. I-5/I-6 becomes I-5/6. I-25/US50 becomes I-25/50.
--- End quote ---
edit: I redid this post as I did not read the label style guide properly. In fact, I quite bungled it in just about every way! My apologies for any confusion! Real results can be found below.
I actually find the waypoints with prefixes on both sides of the slash more attractive, but this does make for an interesting regex query. First, let's check and make sure we can return results for a general query by seeing if there are any waypoint labels with US routes on both sides of the slash: US<something>/US<something>.
--- Code: ---PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US.*\/US"
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NV\usai\nv.i580.wpt:1:US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919
SC\usaus\sc.us521.wpt:47:US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326
--- End code ---
Sure enough, but this regex gives us two false positives (NV and SC). yakra's grep queries use cut -f1 -d " " so that only the primary waypoint label is matched (US50/395 and US521Trk/601Trk_N here). The naive regex I've been using looks for US at the start of a primary waypoint label, but then matches zero or more of any character (.*) until it finds a /US, so it can run past the primary label and match inside the alternate label that follows. Changing .* to [^\s]* (zero or more non-whitespace characters) eliminates the issue, because the match can no longer cross the space that ends the primary waypoint label. Success!
--- Code: ---PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^US[^\s]*\/US"
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
--- End code ---
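For anyone who wants to try the same comparison outside PowerShell, here is a minimal Python sketch. It assumes Python's re module treats these two patterns the same way Select-String does, apart from case sensitivity (Select-String ignores case by default, Python does not). The three sample lines are copied from the results above.

```python
import re

# Sample .wpt lines copied from the Select-String results above.
lines = [
    "US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397",
    "US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919",
    "US521Trk/601Trk_N +US521Trk/US601Trk_N http://www.openstreetmap.org/?lat=34.290499&lon=-80.611326",
]

# .* may cross the space after the primary label and find /US in the alternate label.
greedy = re.compile(r"^US.*/US")
# [^\s]* stops at the first whitespace, so only the primary label is examined.
bounded = re.compile(r"^US[^\s]*/US")

# Report the primary label (first space-separated field) of every matching line.
greedy_hits = [line.split()[0] for line in lines if greedy.search(line)]
bounded_hits = [line.split()[0] for line in lines if bounded.search(line)]

print(greedy_hits)   # all three primary labels, two of them false positives
print(bounded_hits)  # only US13/US17_S
```

The first list includes the NV and SC labels only because of their +alternate labels; the second list drops them.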
The style guide says to drop the secondary highway's prefix after the slash if it's more than one character long, which makes for a fairly simple query. Let us assume two things: (1) the only valid characters for highway prefixes are capital letters and hyphens; (2) a highway waypoint label is composed of a highway prefix followed by a number. So we can start by matching one or more characters that are neither whitespace nor a + (hidden waypoint label or something, not sure on the terminology) from the start of a primary waypoint label [^\s+]+ until we find a forward slash \/, then look for any instances where two or more uppercase letters or hyphens are followed by a digit [A-Z-]{2,}[0-9]. Select-String ignores case by default, so -CaseSensitive is needed for [A-Z] to mean uppercase only. Tying it all together: Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s+]+\/[A-Z-]{2,}[0-9]" (88 results using .length).
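A rough Python sketch of this check, for experimenting without the full hwy_data tree. The sample lines are taken from results elsewhere in this thread, plus the bare label A5/A6 from the style guide quote. The re.IGNORECASE variant is only an approximation of what Select-String would do without -CaseSensitive.

```python
import re

# Same pattern as the Select-String query; the slash needs no escaping in Python.
PATTERN = r"^[^\s+]+/[A-Z-]{2,}[0-9]"
strict = re.compile(PATTERN)                # comparable to using -CaseSensitive
loose = re.compile(PATTERN, re.IGNORECASE)  # approximates Select-String's default

lines = [
    "US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397",
    "I-25/I-70 +I-25(214A) http://www.openstreetmap.org/?lat=39.780113&lon=-104.989429",
    "US50/395 +US50/US395 http://www.openstreetmap.org/?lat=39.120747&lon=-119.771919",
    "St2256/St2419 http://www.openstreetmap.org/?lat=49.543987&lon=10.233706",
    "A5/A6",  # style guide example: a one-character prefix is kept
]

# Report the primary label (first space-separated field) of every matching line.
strict_hits = [line.split()[0] for line in lines if strict.search(line)]
loose_hits = [line.split()[0] for line in lines if loose.search(line)]

print(strict_hits)
print(loose_hits)
```

With case sensitivity on, St2256/St2419 is not flagged because the t in St is lowercase. With case ignored it is flagged, which shows how much assumption (1) about prefixes shapes the results.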
That query is somewhat less interesting than one I was attempting because I misread the examples: which waypoint labels include the same highway prefix on both sides of the slash? (I had also failed to note that the rule only applies when the secondary highway prefix is 2 or more characters long, bah!) The reason it's interesting is that we get to use a capture group and backreferences! (Backreferences actually make regular expressions...not regular, but that's why they're fun!) The pattern we want to construct should match two or more valid highway prefix characters, followed by any number of non-whitespace characters, followed by a forward slash, followed by our first match.
A naive construction (([A-Z-]{2,})[^\s]*\/\1) quickly runs into problems due to the way backreferences work (see the section Backtracking Into Capturing Groups from the link): the capture group can backtrack and settle on any run of 2 or more capital letters to the left of the slash, so if the right side begins with that same run, a match results. You might see where this is going: borders look like valid highway prefixes to a regex, and several border pairings yield matches: BGR/GRC, ALM/ALA, IRN/IRQ, etc. (I like to start testing regexes live and then use test data with regexr to iterate more quickly.)
So matching highway prefixes alone is not enough; we have to be sure that the waypoint label is a highway. The first thing we do is change [^\s]* to [^\s]+. The + requires that the primary highway prefix be followed by one or more non-whitespace characters instead of zero or more. This screens out the BGR/GRC matches, but not many of the other border matches. If we assume that a highway is distinguished by the presence of a digit after the highway prefix, we can screen out the rest of the false matches by requiring a digit to appear in the secondary highway label right after the secondary highway prefix. So, our final regex is: ^([A-Z-]{2,})[^\s]+\/\1[0-9]. If we tried to use non-whitespace characters [^\s] instead of a digit [0-9] after the secondary highway prefix, the borders would still match because letters are non-whitespace. Likewise, trying to match only digits after the primary highway prefix wouldn't work because many highways have suffixes that don't have to be digits. (I'm sure this is well-understood information to anyone reading this, but I find it a useful practice to go through the reasoning.) Lastly, we can change our assumptions about what highway prefixes look like and use [^0-9] instead of [A-Z-]: assume any non-digit character can be part of a highway prefix.
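The three stages of that reasoning can be replayed with a short Python sketch. It assumes Python's re backreferences behave like the .NET ones for these patterns. The labels are the border examples mentioned above plus two highway labels from results in this thread.

```python
import re

labels = ["BGR/GRC", "ALM/ALA", "IRN/IRQ", "OH115/OH189", "US50/395"]

patterns = {
    # Stage 1: zero or more characters may sit between the capture and the slash.
    "naive": r"([A-Z-]{2,})[^\s]*/\1",
    # Stage 2: at least one character is required between the capture and the slash.
    "plus": r"([A-Z-]{2,})[^\s]+/\1",
    # Stage 3: anchored, and a digit must follow the repeated prefix.
    "final": r"^([A-Z-]{2,})[^\s]+/\1[0-9]",
}

# For each stage, collect the labels that the pattern matches.
hits = {
    name: [label for label in labels if re.search(pattern, label)]
    for name, pattern in patterns.items()
}

for name, matched in hits.items():
    print(name, matched)
```

In stage 1, BGR/GRC matches because the group captures GR and \1 finds GR again right after the slash. Stage 2 removes it, but ALM/ALA and IRN/IRQ survive because one letter sits between the captured pair and the slash. Only the digit requirement in stage 3 leaves OH115/OH189 alone.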
note: I have not evaluated active vs. preview systems for these results, nor do I know whether any of these results need to be changed.
--- Code: ---PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^([^0-9]{2,})[^\s]+\/\1[0-9]"
BC\canbc\bc.bc003.wpt:224:BC93/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc093.wpt:22:BC3/BC95 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
BC\canbc\bc.bc095.wpt:33:BC3/BC93 http://www.openstreetmap.org/?lat=49.574442&lon=-115.682755
CO\usaus\co.us085.wpt:119:I-25/I-70 +I-25(214A) http://www.openstreetmap.org/?lat=39.780113&lon=-104.989429
DEU-BY\deub\deuby.b013.wpt:16:St2256/St2419 http://www.openstreetmap.org/?lat=49.543987&lon=10.233706
DEU-BY\deub\deuby.b022bam.wpt:2:St2271/St2450 http://www.openstreetmap.org/?lat=49.796849&lon=10.222435
DNK\eurtr\dnk.mrsja.wpt:214:SR207/SR211 http://www.openstreetmap.org/?lat=55.845085&lon=12.094746
IN\usain\in.in001.wpt:68:I-69/I-469 http://www.openstreetmap.org/?lat=41.168382&lon=-85.104218
IND-PB\index\indpb.acexpy.wpt:8:NH5/NH7 http://www.openstreetmap.org/?lat=30.658329&lon=76.820827
ISL\islth\isl.th041.wpt:3:TH45/TH429 http://www.openstreetmap.org/?lat=64.002911&lon=-22.602568
ITA\eure\ita.e78.wpt:24:SS73/SS715 +SS326 http://www.openstreetmap.org/?lat=43.326457&lon=11.549098
NC\usanc\nc.nc308trkwin.wpt:3:US13/US17_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us017.wpt:83:US13/US17BypWin_S http://www.openstreetmap.org/?lat=35.984342&lon=-76.957397
NC\usaus\nc.us301.wpt:76:NC43/NC48_S http://www.openstreetmap.org/?lat=35.973595&lon=-77.802172
OH\usaoh\oh.oh012.wpt:1:OH115/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh115.wpt:7:OH12/OH189 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaoh\oh.oh189.wpt:13:OH12/OH115 http://www.openstreetmap.org/?lat=40.881556&lon=-84.150386
OH\usaus\oh.us023.wpt:13:OH32/OH124 http://www.openstreetmap.org/?lat=39.047869&lon=-83.023725
OR\usaor\or.or099.wpt:253:OR99W/OR99E http://www.openstreetmap.org/?lat=44.229772&lon=-123.204675
OR\usaor\or.or211.wpt:1:OR99E/OR214 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
OR\usaor\or.or214.wpt:4:OR99E/OR211 http://www.openstreetmap.org/?lat=45.151311&lon=-122.831290
POL\eure\pol.e67.wpt:228:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
POL\poldk\pol.dk008klo.wpt:19:DW382/DW385 http://www.openstreetmap.org/?lat=50.580430&lon=16.803718
ROU\eure\rou.e60.wpt:186:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
ROU\roudj\rou.dj222.wpt:15:DN22/DN22D http://www.openstreetmap.org/?lat=44.775267&lon=28.688680
ROU\roudj\rou.dj601e.wpt:1:DJ401A/DJ601 http://www.openstreetmap.org/?lat=44.448668&lon=25.778433
ROU\roudn\rou.dn001.wpt:19:DJ100B/DJ101E http://www.openstreetmap.org/?lat=44.784738&lon=26.098564
WY\usawy\wy.wy530.wpt:13:FR7/FR157 http://www.openstreetmap.org/?lat=41.209445&lon=-109.632559
--- End code ---
cvoight:
--- Quote from: michih on January 11, 2020, 04:32:22 am ---@cvoight, thanks for your effort.
We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php
I just wonder why some are not detected, e.g.:
--- Code: ---NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973
--- End code ---
@yakra? Any idea? Should I open a GitHub issue?
--- End quote ---
not yakra, but I did peruse siteupdate.py (this link freezes my mobile browser for those browsing by phone, too much data), specifically the results for datacheckerrors.append which I believe is a comprehensive accounting of all the data checks. Generally, the waypoint label checks currently implemented don't test for consistency with the labeling style guidelines, but rather what I would consider more fundamental errors*, such as more than one slash, unbalanced parentheses, and invalid characters. There are a few checks for consistency, e.g. using "Bus" instead of "BL" or "BS" for business interstates, or the LONG_UNDERSCORE checks (flags _Abcd, _AbcdN type stuff), but nothing comprehensive. It would probably be useful to produce public documentation on the exact rules that throw flags (or maybe this exists and I couldn't find it at first glance).
* for what it's worth, I may be alone in drawing a distinction here, and you may consider these checks to be the same kind of thing.
cvoight:
Regions for the United States and Canada don't have a country prefix, but they can be isolated using an exhaustive OR'd regex. The results can then be piped into Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "<pattern_goes_here>" to scan waypoint labels. The (?: indicates this is a non-capturing group, which doesn't matter much in this instance and could be changed to just (. (We have to group the entire OR'd query so that the beginning of line symbol ^ and end of line symbol $ apply to every alternative rather than only the first and last. Without those symbols, MB would match Zambia ZMB or M[ADEINOPST] would match all of the Mexico MEX- regions.)
--- Code: ---USA: Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"}
CAN: Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)$"}
--- End code ---
This is more seemly than using a unique system wildcard with Get-ChildItem -Directory "<system_wildcard_goes_here>" -Recurse. No such unique wildcard exists in the case of the US, because the directories for the Australian (AUS) A routes also contain the string usa.
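To see why the grouping and anchors matter, here is a small Python sketch. The directory names and the two-alternative pattern are cut down for illustration (MEX-AGU in particular is only an assumed example of a MEX- region name), and Python's re.search stands in for PowerShell's -Match, which also ignores case by default.

```python
import re

# Illustrative directory names: two wanted regions and two that should be rejected.
names = ["MB", "ME", "ZMB", "MEX-AGU"]

patterns = {
    # No anchors: any name containing a matching pair is accepted.
    "unanchored": r"M[ADEINOPST]|MB",
    # Anchors without grouping: ^ binds only to the first alternative, $ only to the last.
    "ungrouped": r"^M[ADEINOPST]|MB$",
    # Anchors around a non-capturing group: the whole name must be one alternative.
    "grouped": r"^(?:M[ADEINOPST]|MB)$",
}

# For each pattern, collect the directory names it accepts.
hits = {
    label: [name for name in names if re.search(pattern, name)]
    for label, pattern in patterns.items()
}

for label, matched in hits.items():
    print(label, matched)
```

Only the grouped pattern rejects both ZMB and the MEX- name; the other two accept all four.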
--- Quote ---I-80, A-73
USA Interstates and Quebec Autoroutes retain hyphens.
--- End quote ---
We can redo our previous search for primary waypoint labels for interstates that lack a hyphen (already posted up, so I won't repost the results of the query):
--- Code: ---PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^I[0-9]"
--- End code ---
To check the primary waypoint labels for Quebec autoroutes, we can slim down our Where-Object query because we only care about QC:
--- Code: ---PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^QC$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -Pattern "^A[0-9]"
--- End code ---
No results, so all the primary waypoint labels for QC autoroutes (450 results) meet the style guidelines.