Topic: Using grep to search highway data, waypoint labels, etc.


cvoight
Re: Using grep to search highway data, waypoint labels, etc.
« Reply #15 on: January 16, 2020, 01:52:09 pm »
Quote
RayWinSP, GreSmoNP
Highway ends at park: For ends at a park or other non-commercial endpoint, abbreviate the park name as if it were a named highway (see rules above). Use NP for national park, PP for provincial park, SP for state park.

First, let's establish some baseline numbers. A park suffix should be preceded by at least one lowercase letter and followed by a space and, later on the same line, an OSM URL. This screens out borders with Spain (in ESP the suffix follows an uppercase letter) and any wpt files that aren't true wpt files (some stuff in ITA).
Code:
PS ..\hwy_data> (Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+NP\s.*http").length
75
PS ..\hwy_data> (Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+PP\s.*http").length
163
PS ..\hwy_data> (Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+SP\s.*http").length
225
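For anyone not on PowerShell, here is a rough Python equivalent of the three counts above. It is a sketch, not the method used in this thread: Python's `re` stands in for `Select-String -CaseSensitive`, `Path.rglob` stands in for `Get-ChildItem -Include "*.wpt" -Recurse`, and the sample lines are made up (placeholder coordinates).

```python
import re
from pathlib import Path

# Case-sensitive patterns mirroring the Select-String calls above: one or
# more lowercase letters, the park suffix, whitespace, and later on the
# same line a URL.
PARK_PATTERNS = {
    "NP": re.compile(r"[a-z]+NP\s.*http"),
    "PP": re.compile(r"[a-z]+PP\s.*http"),
    "SP": re.compile(r"[a-z]+SP\s.*http"),
}


def count_matches(lines, pattern):
    """Count matching lines, like (... | Select-String ...).length."""
    return sum(1 for line in lines if pattern.search(line))


def wpt_lines(root):
    """Yield every line of every .wpt file below root, recursively."""
    for path in Path(root).rglob("*.wpt"):
        with open(path, encoding="utf-8", errors="replace") as f:
            yield from f


# Made-up sample lines, not real data. FRA/ESP is a hypothetical border label.
SAMPLE = [
    "GreSmoNP http://www.openstreetmap.org/?lat=0&lon=0",
    "FRA/ESP http://www.openstreetmap.org/?lat=0&lon=0",
    "RayWinSP http://www.openstreetmap.org/?lat=0&lon=0",
]

for suffix, pattern in PARK_PATTERNS.items():
    print(suffix, count_matches(SAMPLE, pattern))  # NP 1, PP 0, SP 1

# Against a local checkout (path is an assumption):
# count_matches(wpt_lines("hwy_data"), PARK_PATTERNS["NP"])
```

The border label is not counted because the "SP" in "ESP" follows an uppercase "E", which fails `[a-z]+`.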

In theory, the US should have no provincial parks (PP), so let's check whether any US waypoint labels carry the PP suffix:
Code:
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]+PP\s.*http").length
11
That is a surprising number, but on investigation they are all MOsPP: waypoints for Missouri supplemental routes, which are interesting. [brief, old forums discussion] They match because the single lowercase "s" in MOsPP satisfies [a-z]+.
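The US directory filter in the `Where-Object` clause can be sketched in Python as well. This version is illustrative only: the regex is copied from the command above, and `re.IGNORECASE` is added because PowerShell's `-Match` operator is case-insensitive by default. The directory names in the example are just test inputs.

```python
import re

# Directory-name filter copied from the Where-Object clause above: US states,
# DC, and territories. PowerShell's -Match is case-insensitive, so
# re.IGNORECASE reproduces that behavior here.
US_DIRS = re.compile(
    r"^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]"
    r"|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$",
    re.IGNORECASE,
)


def us_dirs(names):
    """Keep only directory names that look like US region codes."""
    return [name for name in names if US_DIRS.match(name)]


print(us_dirs(["AL", "BC", "MO", "NLD", "PR", "ESP"]))  # ['AL', 'MO', 'PR']
```

Country directories such as NLD or ESP are excluded by the `$` anchor, since every alternative is exactly two characters.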

We can screen these out by requiring two or more lowercase letters ([a-z]{2,}) before an NP/PP/SP suffix and see what happens. NP stays at 75, SP drops to 219 (-6), and PP drops to 151 (-12). Only 11 of the PP drops are the Missouri labels, so one other result went missing. Let's find the odd one out:
Code:
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:AB|BC|MB|N[BLST]|ON|PE|QC|SK|YT)$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[A-Z][a-z]{1}PP\s.*http"
BC\canbc\bc.bc097.wpt:284:WhiPtPP http://www.openstreetmap.org/?lat=54.908111&lon=-122.933021
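The odd one out is WhiPtPP, whose "Pt" abbreviation has only one lowercase letter. It also turns out that BC97 has many waypoints with the PP suffix, not just at its ends, which is not something the style guidelines mention.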
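The effect of tightening `[a-z]+` to `[a-z]{2,}` can be checked on a few labels. The Python sketch below is illustrative only. MOsPP and WhiPtPP come from the results above, while ManPP is a hypothetical label and the coordinates are placeholders. It also runs the odd-one-out pattern from the BC search. That pattern matches "OsPP" inside MOsPP too, which is presumably why that search was limited to Canadian directories.

```python
import re

ONE_PLUS = re.compile(r"[a-z]+PP\s.*http")      # original baseline rule
TWO_PLUS = re.compile(r"[a-z]{2,}PP\s.*http")   # stricter rule
ODD_ONE = re.compile(r"[A-Z][a-z]PP\s.*http")   # pattern used to find WhiPtPP

LINES = [
    "MOsPP http://www.openstreetmap.org/?lat=0&lon=0",    # Missouri supplemental route
    "WhiPtPP http://www.openstreetmap.org/?lat=0&lon=0",  # "Pt" has one lowercase letter
    "ManPP http://www.openstreetmap.org/?lat=0&lon=0",    # hypothetical park label
]

for line in LINES:
    label = line.split()[0]
    print(label,
          bool(ONE_PLUS.search(line)),
          bool(TWO_PLUS.search(line)),
          bool(ODD_ONE.search(line)))
# MOsPP True False True
# WhiPtPP True False True
# ManPP True True False
```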

To see what's going on, we could try to construct a search query that finds all the NP/PP/SP waypoints that aren't on the first or last line of a file. But it's a lot easier to skim the SP and PP results for files with more than two hits, since a highway wpt file can't have more than two ends. What we find is that this nomenclature is widespread and may merit a specific mention in the style guidelines.
Code:
PS ..\hwy_data> Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "[a-z]{2,}SP\s.*http"
[...]
CA\usaca\ca.ca001.wpt:7:CryCoveSP http://www.openstreetmap.org/?lat=33.564211&lon=-117.826000
CA\usaca\ca.ca001.wpt:183:HarHeaSP http://www.openstreetmap.org/?lat=35.478241&lon=-120.991826
CA\usaca\ca.ca001.wpt:187:SanSimSP http://www.openstreetmap.org/?lat=35.593590&lon=-121.124847
CA\usaca\ca.ca001.wpt:201:LimBeaSP http://www.openstreetmap.org/?lat=36.008557&lon=-121.518209
CA\usaca\ca.ca001.wpt:205:JulPfeSP http://www.openstreetmap.org/?lat=36.158883&lon=-121.670644
CA\usaca\ca.ca001.wpt:208:BigSurSP http://www.openstreetmap.org/?lat=36.252768&lon=-121.787332
CA\usaca\ca.ca001.wpt:209:AndMolSP http://www.openstreetmap.org/?lat=36.288382&lon=-121.844172
CA\usaca\ca.ca001.wpt:353:RusGulSP http://www.openstreetmap.org/?lat=39.329785&lon=-123.804808
[...]
OR\usaus\or.us026.wpt:2:KloCreSP http://www.openstreetmap.org/?lat=45.921155&lon=-123.895140
OR\usaus\or.us026.wpt:15:HumCreRd +SunHwySP http://www.openstreetmap.org/?lat=45.889471&lon=-123.625449
OR\usaus\or.us026.wpt:177:OchLakeSP http://www.openstreetmap.org/?lat=44.307679&lon=-120.698118
OR\usaus\or.us026.wpt:228:ClyHolSP http://www.openstreetmap.org/?lat=44.417344&lon=-119.090466
[...]
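The "not on the first or last line" query described above is less awkward in a general-purpose language. Here is a minimal Python sketch. It assumes that a route's two endpoints are its first and last non-blank lines, and it applies the stricter two-lowercase-letter rule to a label anywhere on the line, whether primary or alternate. `scan` and the `hwy_data` path are illustrative assumptions.

```python
import re
from pathlib import Path

# Park-suffixed label anywhere on the line (primary or alternate), using the
# stricter two-lowercase-letter rule.
PARK = re.compile(r"[a-z]{2,}(?:NP|PP|SP)\s.*http")


def interior_park_hits(lines):
    """Return (line_number, text) for park-suffixed lines that are neither the
    first nor the last non-blank line, i.e. not a highway endpoint."""
    numbered = [(n, line) for n, line in enumerate(lines, start=1) if line.strip()]
    return [(n, line.strip()) for n, line in numbered[1:-1] if PARK.search(line)]


def scan(root):
    """Map each .wpt file under root to its interior park hits, if any."""
    results = {}
    for path in sorted(Path(root).rglob("*.wpt")):
        with open(path, encoding="utf-8", errors="replace") as f:
            hits = interior_park_hits(f)
        if hits:
            results[str(path)] = hits
    return results

# Usage against a local checkout (path is an assumption):
# for path, hits in scan("hwy_data").items():
#     print(path, hits)
```

Line numbers are counted before blank lines are dropped, so they match what `Select-String` reports.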
« Last Edit: January 16, 2020, 01:55:02 pm by cvoight »

michih (TM Collaborator)
« Reply #16 on: January 16, 2020, 02:42:47 pm »
@cvoight, thanks for your effort.

We have automated checks which are reported and updated with every site update: http://travelmapping.net/devel/datacheck.php

I just wonder why some are not detected, e.g.:

Code:
NLD\nlda\nld.a325.wpt:2:Elden http://www.openstreetmap.org/?lat=51.958363&lon=5.888801
NLD\nlda\nld.a325.wpt:3:BurgMatsl http://www.openstreetmap.org/?lat=51.950772&lon=5.878973

@yakra? Any idea? Should I open a GitHub issue?
Not yakra, but I did peruse siteupdate.py, specifically the occurrences of datacheckerrors.append, which I believe give a comprehensive accounting of all the data checks. Generally, the waypoint label checks currently implemented don't test for consistency with the labeling style guidelines.

True, we just check long underscores.

cvoight
« Reply #17 on: January 20, 2020, 03:42:40 pm »
A variation on the potentially malformed city suffixes without underscores from the OP.

Quote
US40: US40BusWhi; US73: US40BusWhi, US40BusTho; A3: A3Zur
Intersections with visibly numbered highways: Distinguish two different same-bannered same-numbered routes as needed with the 3-letter city abbreviations. Also use the city abbreviation for bannerless same-designation spurs or branches, such as the Zurich A3 spur intersecting the main A3.

Start by identifying some candidates for inclusion: find all the US primary waypoint labels (the first label on each line) that have a non-underscored suffix directly after a number, in which one uppercase letter is followed by three or more lowercase letters (38 results). (The same idea applies across the project. Focusing on the US just makes the results easier to post.)
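The criterion is a single regular expression: `^[^\s]+[0-9]+[A-Z][a-z]{3,}[A-Z]*\s.*http`, the same pattern the `Select-String` command uses. The Python sketch below is only an illustration of which labels it accepts. The sample lines are made up apart from label names taken from the guidelines and the results, and the coordinates are placeholders.

```python
import re

# The first token on the line (the primary label) must end in digits followed
# by one uppercase letter, three or more lowercase letters, and optionally
# more uppercase letters. [^\s]+ cannot cross a space, so alternate labels
# never match.
CANDIDATE = re.compile(r"^[^\s]+[0-9]+[A-Z][a-z]{3,}[A-Z]*\s.*http")


def candidate_labels(lines):
    """Return the primary labels of lines that match CANDIDATE."""
    return [line.split()[0] for line in lines if CANDIDATE.search(line)]


SAMPLE = [
    "US23Conn http://www.openstreetmap.org/?lat=0&lon=0",
    "12MileRA http://www.openstreetmap.org/?lat=0&lon=0",
    "US40BusWhi http://www.openstreetmap.org/?lat=0&lon=0",    # "Bus" is too short
    "US23Con http://www.openstreetmap.org/?lat=0&lon=0",       # "Con" is too short
    "I-80 +US23Conn http://www.openstreetmap.org/?lat=0&lon=0",  # alternate label only
]

print(candidate_labels(SAMPLE))  # ['US23Conn', '12MileRA']
```

Guideline-style labels such as US40BusWhi fall outside the pattern because "Bus" and the 3-letter city abbreviation each have only two lowercase letters. The search therefore surfaces longer, spelled-out suffixes.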

Code:
PS ..\hwy_data> Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+[A-Z][a-z]{3,}[A-Z]*\s.*http"

--AL\usaal\al.al025.wpt:13:CR16Hale http://www.openstreetmap.org/?lat=32.600940&lon=-87.592294
--AL\usaal\al.al025.wpt:27:CR49Hale http://www.openstreetmap.org/?lat=32.888960&lon=-87.426410
--AL\usaal\al.al025.wpt:30:CR16Bibb http://www.openstreetmap.org/?lat=32.903288&lon=-87.313542
--AL\usaal\al.al051.wpt:13:CR11Dale http://www.openstreetmap.org/?lat=31.577896&lon=-85.735438
--AL\usaal\al.al053.wpt:13:TN7Truck http://www.openstreetmap.org/?lat=34.991991&lon=-86.844633
AL\usaus\al.us011.wpt:6:CR2York http://www.openstreetmap.org/?lat=32.482895&lon=-88.312300
AL\usaus\al.us011.wpt:7:CR19York http://www.openstreetmap.org/?lat=32.487959&lon=-88.299555
AL\usaus\al.us011.wpt:16:CR20Epes http://www.openstreetmap.org/?lat=32.689060&lon=-88.128548
AL\usaus\al.us011.wpt:17:CR21Epes http://www.openstreetmap.org/?lat=32.689755&lon=-88.122915
AL\usaus\al.us011.wpt:99:AL75Conn http://www.openstreetmap.org/?lat=33.587713&lon=-86.699638
AL\usaus\al.us011.wpt:135:AL35Conn http://www.openstreetmap.org/?lat=34.430120&lon=-85.732729
AL\usaus\al.us029.wpt:21:CR25Rome http://www.openstreetmap.org/?lat=31.141818&lon=-86.668611
AL\usaus\al.us084.wpt:109:CR31Dale http://www.openstreetmap.org/?lat=31.268918&lon=-85.673044
AL\usaus\al.us098.wpt:12:I-65Ramps http://www.openstreetmap.org/?lat=30.707093&lon=-88.122593
--AR\usaar\ar.ar125.wpt:18:Hwy125Park http://www.openstreetmap.org/?lat=36.490224&lon=-92.778170
--AR\usaar\ar.ar300.wpt:8:Hwy300Spur http://www.openstreetmap.org/?lat=34.937503&lon=-92.586571
KS\usaus\ks.us183.wpt:53:17Terr http://www.openstreetmap.org/?lat=39.390206&lon=-99.296091
MN\usamn\mn.mn371.wpt:48:CR38Cass http://www.openstreetmap.org/?lat=47.121790&lon=-94.613078
MS\usaus\ms.us051.wpt:87:I55Front http://www.openstreetmap.org/?lat=32.405110&lon=-90.145912
MS\usaus\ms.us084.wpt:26:MS184Mead http://www.openstreetmap.org/?lat=31.465422&lon=-90.906458
MS\usaus\ms.us084.wpt:30:MS184Bude http://www.openstreetmap.org/?lat=31.469521&lon=-90.841870
MS\usaus\ms.us084.wpt:119:MS184Cleo http://www.openstreetmap.org/?lat=31.700203&lon=-89.037237
NC\usanc\nc.nc159.wpt:4:NC159Spur http://www.openstreetmap.org/?lat=35.633211&lon=-79.782693
NC\usaus\nc.us001.wpt:80:US158/1Conn http://www.openstreetmap.org/?lat=36.364915&lon=-78.361954
NC\usaus\nc.us023.wpt:16:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usaus\nc.us070.wpt:161:US70Conn http://www.openstreetmap.org/?lat=36.083329&lon=-79.148244
NC\usaus\nc.us074.wpt:36:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usaus\nc.us301.wpt:30:US301Conn http://www.openstreetmap.org/?lat=35.121901&lon=-78.763411
NC\usausb\nc.us019trkway.wpt:11:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usausb\nc.us023bussyl.wpt:2:US23Conn http://www.openstreetmap.org/?lat=35.373978&lon=-83.225909
NC\usausb\nc.us064trkhen.wpt:10:US23Conn http://www.openstreetmap.org/?lat=35.381729&lon=-83.222433
NC\usausb\nc.us220busash.wpt:8:US220Conn http://www.openstreetmap.org/?lat=35.736951&lon=-79.808335
NE\usane\ne.ne008.wpt:52:645Blvd http://www.openstreetmap.org/?lat=40.058998&lon=-95.720047
NE\usaus\ne.us030.wpt:137:15Blvd http://www.openstreetmap.org/?lat=41.452731&lon=-96.626269
OH\usaus\oh.us052.wpt:93:US23Spur http://www.openstreetmap.org/?lat=38.486542&lon=-82.638752
SC\usasc\sc.sc133.wpt:2:12MileRA http://www.openstreetmap.org/?lat=34.704488&lon=-82.833384
VA\usava\va.va365.wpt:3:VA365East http://www.openstreetmap.org/?lat=36.957932&lon=-81.072166
WA\usawa\wa.wa100.wpt:5:WA100Spur +BeaHolSP http://www.openstreetmap.org/?lat=46.289990&lon=-124.056281

Some of these are in preview (prefixed with --). Some are mystifying, and I simply have no clue whether there is an issue, e.g. the AL US routes. For others, a quick analysis shows whether they are formatted consistently across the project or are outliers, e.g. Conn and Spur. (They turn out to be outliers: Con and Spr are far more common.)
Code:
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Con\s.*http").length
300 (Con suffix)
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Conn\s.*http").length
11 (Conn suffix)

PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Spr\s.*http").length
342 (Spr suffix)
PS ..\hwy_data> (Get-ChildItem -Directory | Where-Object {$_.name -Match "^(?:A[KLRSZ]|C[AOT]|D[CE]|FL|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEINOPST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$"} | Get-ChildItem -Include "*.wpt" -Recurse | Select-String -CaseSensitive -Pattern "^[^\s]+[0-9]+Spur\s.*http").length
4 (Spur suffix)
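The four counts above need four separate passes over the files. A single pass could tally all the variants at once. This hedged Python sketch uses one alternation in place of the four `Select-String` patterns. The sample lines are made up and are not real data.

```python
import re
from collections import Counter

# One pattern for all four variants. If "Con" is followed by "n", the regex
# backtracks and tries "Conn", so each label is counted under the right key.
SUFFIX = re.compile(r"^[^\s]+[0-9]+(Con|Conn|Spr|Spur)\s.*http")


def tally_suffixes(lines):
    """Count primary labels ending in each suffix variant."""
    counts = Counter()
    for line in lines:
        m = SUFFIX.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


SAMPLE = [
    "US23Conn http://www.openstreetmap.org/?lat=0&lon=0",
    "NC54Con http://www.openstreetmap.org/?lat=0&lon=0",
    "US1Spr http://www.openstreetmap.org/?lat=0&lon=0",
    "OH7Spur http://www.openstreetmap.org/?lat=0&lon=0",
    "US23Conn",  # no URL, so not counted
]

print(tally_suffixes(SAMPLE))
```

Feeding it the lines from all US .wpt files would reproduce the four counts in one run, assuming the same directory filter is applied first.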