Travel Mapping

User Discussions => Other Discussion => Topic started by: bejacob on June 14, 2017, 08:44:00 pm

Title: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: bejacob on June 14, 2017, 08:44:00 pm
Something odd is going on here http://tm.teresco.org/user/region.php?units=miles&u=bejacob&rg=ABW (http://tm.teresco.org/user/region.php?units=miles&u=bejacob&rg=ABW)

I picked ABW as a region because there is only one system so it's easy to see the issue. I'm guessing the same thing may be happening in other regions, though with multiple systems, it might be difficult to tell. This is a low priority fix, it's just something I noticed.

The first table "Overall ABW Region Statistics" shows a total of 53.76 clinchable miles

The second table "Statistics by System" shows 71.88 clinchable miles. I'm assuming concurrencies are not being correctly calculated on this table (i.e. the total is just the sum of the mileage from the route in the next table "Statistics by Route" without concern for concurrencies).

The last table once again has 53.76 total miles that could be clinched.

It's probably not a big deal that one part of this page is wrong, but it would be nice if all tables on this page show the same total possible mileage that could be clinched.

Again, low priority, just something I saw and thought to bring up.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on June 14, 2017, 08:56:27 pm
Definitely something to check, out thanks.

https://github.com/TravelMapping/Web/issues/186
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: the_spui_ninja on June 24, 2017, 11:24:08 am
The second table "Statistics by System" shows 71.88 clinchable miles. I'm assuming concurrencies are not being correctly calculated on this table (i.e. the total is just the sum of the mileage from the route in the next table "Statistics by Route" without concern for concurrencies).
I think that's the case; I noticed the same thing in Idaho where adding up the statistics by system is more than the total clinchable mileage.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: new_friends_gr on July 22, 2017, 03:11:59 pm
I noticed a similar issue discrepancy with some of my travel data. We'll use the Interstate highways in Ohio as my example.

When I look at my table under Statistics by Region, it says that I have 15.39% of Ohio's highways traveled (I assume this incorporates all systems in Ohio). For the Interstate system within Ohio, it says I have 1476.33 miles of 1736.43 miles clinched (85.02%).

On the other hand, when I look at my table under Statistics by System, then go to Interstate Highways, and then Ohio as the region, it says I have 1317.50 of 1577.61 miles clinched (83.51%). I guess I'm not sure why these numbers don't match the ones above. Any idea?
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Ib3kii on July 23, 2017, 03:27:09 am
I believe it has to do with concurring routes. I big example in Ohio is I-80/90, the two routes concur together for over 100 miles so I believe concurrences are where those extra miles come from.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: michih on July 23, 2017, 04:25:32 am
I big example in Ohio is I-80/90, the two routes concur together for over 100 miles so I believe concurrences are where those extra miles come from.

I-80/I-90 concurrency is NOT broken in Ohio.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: si404 on July 23, 2017, 08:48:43 am
I big example in Ohio is I-80/90, the two routes concur together for over 100 miles so I believe concurrences are where those extra miles come from.

I-80/I-90 concurrency is NOT broken in Ohio.
The issue is with the processing of the data - one figure single counts the concurrency, the other double counts it.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: michih on July 23, 2017, 09:25:52 am
The issue is with the processing of the data - one figure single counts the concurrency, the other double counts it.

I've added my region mileages and it's exactly my total mileage (53524.17mi). If there is really a general bug that concurrent routes are (always?) counted twice, all stats would be... quite incorrect...
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: mapmikey on July 23, 2017, 10:31:27 am
It appears to me this is what happens, using my Georgia file as an example http://tm.teresco.org/user/region.php?units=miles&u=mapmikey&rg=GA

The distance traveled at the top is the actual road mileage in the TM for Georgia, not the route mileage, meaning no concurrencies are counted twice or more.

I have not updated my Georgia file very far for the state route system, so most of my state route clinches in TM are associated with the US route overlays.  So most of my state route clinches do not contribute to the overall region total.

 Therefore looking at the state route clinched mileage it will include any state route's road mileage (includes concurrency routings with US routes), not the route mileage (concurrencies with other state routes).  If I go to the individual state route listings I would then get the mileage of that individual's route including any concurrencies.

TM at the region level or overall stats level is summing road miles, not route miles.





Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on July 23, 2017, 01:55:20 pm
Thanks for the bump on this.  I just need to trace through where this is all calculated (possibly some combination of site update and web DB queries) to see where the double-counting is coming in.  Might be able to look tonight.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on July 23, 2017, 09:19:32 pm
region.php has code that pulls from route-by-route mileages from the DB and adds them up without considering possible concurrencies.  system.php pulls the sums from the DB which does account for concurrencies during the site update process.  I need to change region.php's queries to use the numbers already computed and stored in the DB.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on July 23, 2017, 10:16:09 pm
I have a possible fix for region.php on the tmtest server now.  It no longer will list devel systems, but I never thought they made sense on the stat pages anyway since we can't clinch them.  Please let me know how it looks.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: new_friends_gr on July 24, 2017, 07:24:59 am
As of this morning, the discrepancy still exists between the two places where I'm looking at my mileage on Ohio's interstates. But I think the fact that one of those calculations counts the concurrencies separately and the other calculation does not seems to explain it. The difference is about 158 miles which seems like a fairly accurate number of miles of Interstate highways in Ohio that have concurrencies.

Thanks for looking into this!
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: michih on July 24, 2017, 07:34:37 am
I have a possible fix for region.php on the tmtest server now.

http://tmtest.teresco.org/stat.php
http://tm.teresco.org/stat.php

Both output exactly the same figures. Single user stats for regions are also identical.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on July 24, 2017, 09:34:13 am
As of this morning, the discrepancy still exists between the two places where I'm looking at my mileage on Ohio's interstates. But I think the fact that one of those calculations counts the concurrencies separately and the other calculation does not seems to explain it. The difference is about 158 miles which seems like a fairly accurate number of miles of Interstate highways in Ohio that have concurrencies.

Looks like they match when I bring them up.  Maybe you're looking at the production server instead of the test server. Here are the pages I'm looking at:

http://tmtest.teresco.org/user/region.php?units=miles&u=new_friends_gr&rg=OH

http://tmtest.teresco.org/user/system.php?units=miles&u=new_friends_gr&sys=usai

I'll push this to the production server some time today unless I hear of problems.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on July 24, 2017, 02:54:57 pm
The fix is live on the main site now.  Please report here or in GitHub if it looks like this is still problematic.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: bejacob on July 25, 2017, 09:53:52 am
I have a possible fix for region.php on the tmtest server now.  It no longer will list devel systems, but I never thought they made sense on the stat pages anyway since we can't clinch them.  Please let me know how it looks.

I just checked my ABW routes, which was how I identified the issue in the first place, and the discrepancy is gone. This appears to have fixed the problem. Thank you.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: new_friends_gr on July 29, 2017, 02:35:01 pm
The problem I had is fixed now too. I even looked at one of my other states for which I have Interstate highway mileage where there's concurrencies in that state, and the separate pages match now in terms of Interstate mileage.

Thanks, Jim!
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Markkos1992 on February 15, 2022, 08:58:03 am
I was looking for the thread that the oscar .01 mileage difference issue was mentioned in, but I could not find it.

I bring it up because my log file (https://travelmapping.net/logs/users/markkos1992.log) is currently showing myself having 11290.95 miles in PA while my stats page (https://travelmapping.net/user/?u=markkos1992) shows myself having 11290.90 miles.

The overall mileage of 42044.44 miles in active status and 42059.50 miles in active and preview is the same on both pages.  I added preview mileage on my last trip because of the Korean War Veterans Pkwy being in the maybe activated soon usanyp instead of usasf.

I did not see any discrepancies in my other regions.

EDIT:  There is also a difference between the total mileage on my stats (17653.20 miles) and my log file (17653.22 miles).
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 10:32:44 am
Running a site update on lab2 to see if the C++ siteupdate program produces different results.
Edit 2: Same results.

Edit 1: Rounding errors & point-to-point great circle distance: the curious case of Oscar's mileage on ME103 (https://forum.travelmapping.net/index.php?topic=3014)
That was the smallest of freak occurrences, and produced the only diff in anyone's stats sitewide at the time.
Whatever's going on here has got to be something different. Multiple errors that far down in the sig figs won't add up to a discrepancy this big.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 12:10:20 pm
The 11290.9 figure is ostensibly stored directly in the DB (the tm_convert_distance (https://github.com/TravelMapping/Web/blob/bc57123b057001f313818023038872cf33efdd82/lib/tmphpfuncs.php#L457) function then adds back one more sig fig):
Code: [Select]
mysql> describe clinchedOverallMileageByRegion;
+----------------------+-------------+------+-----+---------+-------+
| Field                | Type        | Null | Key | Default | Extra |
+----------------------+-------------+------+-----+---------+-------+
| region               | varchar(8)  | YES  |     | NULL    |       |
| traveler             | varchar(48) | YES  |     | NULL    |       |
| activeMileage        | float       | YES  |     | NULL    |       |
| activePreviewMileage | float       | YES  |     | NULL    |       |
+----------------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> select * from clinchedOverallMileageByRegion where region = 'PA' and traveler = 'markkos1992';
+--------+-------------+---------------+----------------------+
| region | traveler    | activeMileage | activePreviewMileage |
+--------+-------------+---------------+----------------------+
| PA     | markkos1992 |       11290.9 |              11290.9 |
+--------+-------------+---------------+----------------------+
1 row in set (0.02 sec)

But the line from the .sql file itself is:
,('PA','markkos1992','11290.9455438252','11290.9455438252')

Is this a MySQL bug? Why is this getting rounded so far down?
Other regions are rounded to different precisions that make little to no sense:

,('VA','markkos1992','7032.97920062242','7032.97920062242')
Code: [Select]
mysql> select * from clinchedOverallMileageByRegion where region = 'VA' and traveler = 'markkos1992';
+--------+-------------+---------------+----------------------+
| region | traveler    | activeMileage | activePreviewMileage |
+--------+-------------+---------------+----------------------+
| VA     | markkos1992 |       7032.98 |              7032.98 |
+--------+-------------+---------------+----------------------+
1 row in set (0.01 sec)

,('NH','markkos1992','2.86282452101549','2.86282452101549')
Code: [Select]
mysql> select * from clinchedOverallMileageByRegion where region = 'NH' and traveler = 'markkos1992';
+--------+-------------+---------------+----------------------+
| region | traveler    | activeMileage | activePreviewMileage |
+--------+-------------+---------------+----------------------+
| NH     | markkos1992 |       2.86282 |              2.86282 |
+--------+-------------+---------------+----------------------+
1 row in set (0.01 sec)

Jim, what results do you get for the same SQL queries on noreaster?

I get comparable results with grep "'$rg','markkos1992'" "home/tmp/tm/TravelMapping-2022-02-14@22:27:17.sql"
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: michih on February 15, 2022, 12:55:20 pm
I was curious. 2 out of the my first 10 regions were different, so I checked all 86 regions. 172 values. But didn't find more.

First line log, second line web (in log format so that the diffs can be seen):

Code: [Select]
DEU-BY: 11219.47 of 13849.34 mi (81.01%), 11265.43 of 13930.47 mi (80.87%)
DEU-BY: 11219.50 of 13849.30 mi (81.01%), 11265.40 of 13930.50 mi (80.87%)

Active traveled is off by 0.03mi.
Active total is off by 0.04mi.
Preview traveled is off by 0.03mi.
Preview total is off by 0.03mi.


Code: [Select]
DEU-HE: 2563.86 of 6598.61 mi (38.85%), 2563.86 of 6598.61 mi (38.85%)
DEU-HE: 2563.85 of 6598.61 mi (38.85%), 2563.85 of 6598.61 mi (38.85%)

Both travel values are off by 0.01mi

The only conspicuity is that BY and HE are neighboring regions....
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 01:06:09 pm
grep 'DEU-BY','michih' /home/tmp/tm/TravelMapping-2022-02-14@22:27:17.sql
NEer: ,('DEU-BY','michih','11219.469861011894','11265.430332183509')
lab2: ,('DEU-BY','michih','11219.4698610119','11265.4303321835')
Code: [Select]
mysql> select * from clinchedOverallMileageByRegion where region = 'DEU-BY' and traveler = 'michih';
+--------+----------+---------------+----------------------+
| region | traveler | activeMileage | activePreviewMileage |
+--------+----------+---------------+----------------------+
| DEU-BY | michih   |       11219.5 |              11265.4 |
+--------+----------+---------------+----------------------+
1 row in set (0.02 sec)

grep 'DEU-BY','michih' /home/tmp/tm/TravelMapping-2022-02-14@22:27:17.sql
NEer: ,('DEU-HE','michih','2563.855005560839','2563.855005560839')
lab2: ,('DEU-HE','michih','2563.85500556084','2563.85500556084')
Code: [Select]
mysql> select * from clinchedOverallMileageByRegion where region = 'DEU-HE' and traveler = 'michih';
+--------+----------+---------------+----------------------+
| region | traveler | activeMileage | activePreviewMileage |
+--------+----------+---------------+----------------------+
| DEU-HE | michih   |       2563.85 |              2563.85 |
+--------+----------+---------------+----------------------+
1 row in set (0.02 sec)

What's with DEU-BY getting rounded up while DEU-HE gets truncated?
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on February 15, 2022, 01:15:33 pm
Maybe we need to try making it a DOUBLE instead of FLOAT for those entries, as they seem to be limited to 6 digits of precision whether before or after the decimal point (though a float should do much better than that!).
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: michih on February 15, 2022, 01:19:03 pm
Maybe we need to try making it a DOUBLE instead of FLOAT for those entries

I also thought about a data type issue.... but why does it only affect so less values?

And why do we mostly face diffs only for traveled mileage - only BY totals are different so far?
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 01:20:20 pm
I wonder if under the hood, that's fixed, or text, or something.
In .dbf files, numeric fields are actually just stored as text. The more you know...

I can give DOUBLEs a try on lab2 and see what I come up with.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: michih on February 15, 2022, 01:29:29 pm
And why do we mostly face diffs only for traveled mileage - only BY totals are different so far?

nope. CZE has also diffs for active and active+preview totals (12875.58 vs 12875.60), Finland for active (16873.6 vs. 16873.62),.... I just didn't compare them with the first run :D
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 01:31:45 pm
Crossing my fingers & trying out a
sed 's~FLOAT~DOUBLE~g' 'TravelMapping-2022-02-15@10:32:20.sql' | mysql --defaults-group-suffix=travmapadmin -u travmapadmin TravelMapping

Edit: This appears to fix it. Shall I go ahead & make the changes to siteupdate?
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on February 15, 2022, 01:45:27 pm
Sure, let's change that where needed.  More memory/disk but we have plenty of that.  I don't think we want a blanket FLOAT->DOUBLE, just placed where numbers are likely to get larger.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 02:08:46 pm
Locally converted: overallMileageByRegion & clinchedOverallMileageByRegion

Less sure about systemMileageByRegion & clinchedSystemMileageByRegion.
Shell script, mumble mumble, curl... see if differences...
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 03:33:24 pm
systemMileageByRegion & clinchedSystemMileageByRegion
Code: [Select]
server=$1

echo `date`": Downloading user logs"
mkdir -p userlogs.tmp
for u in `curl $server/stats/allbyregionactivepreview.csv 2>/dev/null | cut -f1 -d, | tail -n +2 | head -n -1`; do
  wget -O userlogs.tmp/$u.log "$server/logs/users/$u.log" 2>/dev/null
  echo -n "$u "
done; echo
echo `date`": Done."

for sys in `tail -n +2 /home/yakra/tm/HighwayData/systems.csv | egrep -v '^#|^$|;devel$' | cut -f1 -d';'`; do
  data=`curl $server/stats/$sys-all.csv 2>/dev/null`
  regions=`echo "$data" | head -n 1 | sed -e 's~^Traveler,Total,~~' -e 's~,~ ~g'`
  users=`echo "$data" | head -n -1 | tail -n +2 | cut -f1 -d,`
  ut=`echo $users | wc -w`
  for rg in $regions; do
    un=0
    for u in $users; do
      echo -en "$sys in $rg for $u                "

      # web
      web=$(curl "$server/user/system.php?units=miles&u=$u&sys=$sys&rg=$rg" 2>/dev/null \
            | egrep '[0-9.]+ of [0-9.]+ miles' \
            | sed -r 's~.*>([0-9.]+ of [0-9.]+) miles.*~\1~')
      web1=`echo "$web" | cut -f1 -d' '`
      web2=`echo "$web" | cut -f3 -d' '`

      # log
      if [ `echo $regions | wc -w` == 1 ]; then
        log=$(grep "^System $sys ([a-z]\+) overall: " userlogs.tmp/$u.log | sed -r 's~.*: (.*) mi \(.*~\1~')
      else
        beg=$(grep -m 1 -n "^System $sys by region:$" userlogs.tmp/$u.log | cut -f1 -d:)
        end=$(grep -m 1 -n "^System $sys by route (traveled routes only):$" userlogs.tmp/$u.log | cut -f1 -d:)
        log=$(tail -n +$beg userlogs.tmp/$u.log | head -n $(expr $end - $beg) | grep "^  $rg: " | sed -r 's~.*: (.*) mi \(.*~\1~')
      fi
      log1=`echo "$log" | cut -f1 -d' '`
      log2=`echo "$log" | cut -f3 -d' '`

      #compare
      if   [ "$web1" != "$log1" ]; then
        echo -e "\n  web: $web"
        echo -e "  log: $log"
      elif [ "$web2" != "$log2" ]; then
        echo -e "\n  web: $web"
        echo -e "  log: $log"
        break
      else echo -en '\r'
      fi

      un=`expr $un + 1`
    done
  done
done
echo `date`": Finished."
I'm going to do 3 2 runs of this:
noreaster (normal .sql file) <-- a site update is going to happen, and invalidate the results. Just lab2 will be sufficient.
• lab2, tm.conf -> tm.conf.updating (normal .sql file)
• lab2, tm.conf -> tm.conf.standard (all FLOAT -> DOUBLE)
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on February 15, 2022, 09:51:25 pm
We might want DECIMAL instead of FLOAT or DOUBLE here.  I did not know about it, but I think we can force a certain number of digits after the decimal point that way.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: yakra on February 15, 2022, 10:28:10 pm
Would that require further changes to the tables, like setting the precision or number of decimal places, etc?

Sure, let's change that where needed.  More memory/disk but we have plenty of that.  I don't think we want a blanket FLOAT->DOUBLE, just placed where numbers are likely to get larger.
Good news is, just changing overallMileageByRegion, clinchedOverallMileageByRegion, systemMileageByRegion & clinchedSystemMileageByRegion appears to do the trick.
Larger numbers will be here, and with 58573 rows between the 4 tables, that's way less than the (clinched)(Connected)Routes tables, with 961038 combined.
I could script around for the unlikely diff there... but that'd take literally days. :D Edit: (I really like saying Edit:) I won't attempt this (thankfully), because there's no way to disambiguate some ConnectedRoutes, such as I-265FutLou in KY & IN.
Title: Re: Regional Statistics - Discrepancy on Total Clinchable Mileage
Post by: Jim on February 16, 2022, 07:09:37 am
Yes, it looks like we would want something like DECIMAL(7,2) but since I just learned of this option I don't know for sure.  If some changes to DOUBLE have fixed the immediate problem, I'd say let's open an Issue (if there's not one) to remind to investigate further which columns might be more appropriate as a DECIMAL.