Travel Mapping
Web Design Discussion => General Web Design Discussion => Topic started by: mapcat on January 21, 2023, 11:37:42 am
-
The table is responding strangely when multiple users have the same distance traveled.
https://travelmapping.net/user/region.php?units=miles&u=mapcat&rg=dc (https://travelmapping.net/user/region.php?units=miles&u=mapcat&rg=dc)
In the DC region, 17 users have clinched everything. The table "Travelers in Region dc" shows two users ranked 1, one ranked 3, and varying numbers of other users ranked 4, 5, 7, 10, 13, and 16. All of these users have 47.64 miles in the region and until recently all of them were ranked 1.
It's not just happening with users at the top of the rankings. In the same table, the two users ranked 33 and the one ranked 35 have traveled the same routes. And it's not just affecting DC.
What changed?
-
I just checked DE, and it has the same issue. (https://travelmapping.net/user/region.php?u=markkos1992&rg=DE&units=miles)
-
Yep, it is a general issue. LIE is a very small region, which might help with debugging: https://travelmapping.net/user/region.php?rg=lie
-
I suppose this could be a data processing issue. Mapcat, how recently were they all ranked number one? Did you notice this after December 28th?
-
I'll update lab 2 with the same commits when I get home and check the db.
It could be a rounding issue; if it is data processing I have a good idea of what's probably causing it.
I'll load up both the C++ and python flavors and see if the issue happens with both.
-
Yes, it looks like rounding before populating the DB:
> select * from clinchedoverallmileagebyregion where region='LIE' order by activemileage;
+--------+----------------+--------------------+----------------------+
| region | traveler | activeMileage | activePreviewMileage |
+--------+----------------+--------------------+----------------------+
| LIE | Aseluit | 0.9841289652476017 | 0.9841289652476017 |
| LIE | panda80 | 1.1540176816881598 | 1.1540176816881598 |
| LIE | charliezeb | 1.742166048369596 | 1.742166048369596 |
| LIE | M3200 | 2.0687608407804086 | 2.0687608407804086 |
| LIE | dantheman | 4.626547316794324 | 4.626547316794324 |
| LIE | spinoza | 7.1963766741696755 | 7.1963766741696755 |
| LIE | griffith | 7.196376674169677 | 7.196376674169677 |
| LIE | martin0102 | 7.196376674169677 | 7.196376674169677 |
| LIE | sneezy | 7.196376674169677 | 7.196376674169677 |
| LIE | the_spui_ninja | 7.94667527598344 | 7.94667527598344 |
| LIE | bickendan | 8.180505639417278 | 8.180505639417278 |
| LIE | cougar1989 | 8.180505639417278 | 8.180505639417278 |
| LIE | dave1693 | 8.46395000128351 | 8.46395000128351 |
| LIE | philimon | 8.46395000128351 | 8.46395000128351 |
| LIE | niels | 9.100692957671601 | 9.100692957671601 |
| LIE | bartpetat | 9.846927579160567 | 9.846927579160567 |
| LIE | cinx | 13.727240274465924 | 13.727240274465924 |
| LIE | michih | 13.727240274465927 | 13.727240274465927 |
+--------+----------------+--------------------+----------------------+
-
OK. There's this one part of the code that's a bit of a kluge; the logs look okay (since they're rounded to two decimal places), but I've always felt kind of nervous about it. For good reason, as it turns out; I'd failed to take the database into account.
I do have an idea for a fix (and I believe there's a GitHub issue already open as well); should be easy to do but may take some unexpected tinkering.
Your choice I guess whether to revert to Python in the interim.
-
I'm willing to revert to Python for a bit if people find it enough of an annoyance. It's definitely not a critical error; as best I can tell, it only affects rankings, not the actual mileages (at least not anywhere near the precision we display).
-
I suppose this could be a data processing issue. Mapcat, how recently were they all ranked number one? Did you notice this after December 28th?
I did not notice this on January 1. After that, I haven't paid attention until today.
-
system.php also displays the same behavior mapcat describes in the OP.
https://travelmapping.net/user/system.php?u=sys=usadc
https://travelmapping.net/user/system.php?u=sys=usanh
https://travelmapping.net/user/system.php?u=rg=NH&sys=usame
-
This is the post where I mumble to myself
about what variables are at play, how they're populated, what's iterated & how, yadda yadda...
-
Jim, I believe this may be a problem with only multithreaded siteupdate, and not siteupdateST.
ISTR the new commonupdate uses siteupdateST for -t 1, yes?
Let's give that a try for tonight's update.
Edit: grepping thru archived sql files.
TravelMapping-2022-11-29@21:17:10.sql would be the first update done with C++, using siteupdateST I believe. There are 4 travelers with exactly 7.1963766741696755 mi.
Schmedit: Oops wrong month. I wanted December.
When was the first multithreaded siteupdate performed, as opposed to using siteupdateST?
-
Just finishing up a Python update for tonight. I can run ST tomorrow if you'd like, or re-run tonight's if it's helpful to see it sooner.
-
DE and DC are fixed now.
-
Python is fine, thanks.
I've got a good handle on what the problem is, and will post soon to describe it.
I've a basic idea on what to do for a fix, but implementation may get a little tricky if I want to avoid data races and also maintain good performance. In particular I'm thinking of the "messy dictionary of dictionaries" (or messy unordered_map of unordered_maps in C++ese (https://www.google.com/search?q=inglese+kliban&tbm=isch)).
-
Last night's update ran Python. I'll do one now with single-threaded C++.
-
With this morning's one-thread C++:
> select * from clinchedoverallmileagebyregion where region='LIE' order by activemileage;
+--------+----------------+--------------------+----------------------+
| region | traveler | activeMileage | activePreviewMileage |
+--------+----------------+--------------------+----------------------+
| LIE | Aseluit | 0.9841289652476017 | 0.9841289652476017 |
| LIE | panda80 | 1.1540176816881598 | 1.1540176816881598 |
| LIE | charliezeb | 1.742166048369596 | 1.742166048369596 |
| LIE | M3200 | 2.068760840780408 | 2.068760840780408 |
| LIE | dantheman | 4.626547316794324 | 4.626547316794324 |
| LIE | griffith | 7.1963766741696755 | 7.1963766741696755 |
| LIE | martin0102 | 7.1963766741696755 | 7.1963766741696755 |
| LIE | sneezy | 7.1963766741696755 | 7.1963766741696755 |
| LIE | spinoza | 7.1963766741696755 | 7.1963766741696755 |
| LIE | the_spui_ninja | 7.9466752759834405 | 7.9466752759834405 |
| LIE | bickendan | 8.180505639417277 | 8.180505639417277 |
| LIE | cougar1989 | 8.180505639417277 | 8.180505639417277 |
| LIE | dave1693 | 8.463950001283509 | 8.463950001283509 |
| LIE | philimon | 8.463950001283509 | 8.463950001283509 |
| LIE | niels | 9.100692957671601 | 9.100692957671601 |
| LIE | bartpetat | 9.846927579160567 | 9.846927579160567 |
| LIE | cinx | 13.727240274465924 | 13.727240274465924 |
| LIE | michih | 13.727240274465924 | 13.727240274465924 |
+--------+----------------+--------------------+----------------------+
-
beg=`grep -n 'INSERT INTO clinchedOverallMileageByRegion' $file | cut -f1 -d:`
end=`grep -n 'CREATE TABLE clinchedSystemMileageByRegion' $file | cut -f1 -d:`
tail -n +$beg $file | head -n $(expr $end - $beg) | egrep "^,?\('LIE',"
TravelMapping-2022-12-29@21:42:10.sql, the first update done with siteupdateST, looked fine.
TravelMapping-2023-01-02@21:53:23.sql (are we missing 2023-01-01?), the first update with multi-threaded siteupdate, has the differences in the least significant digits.
The concept at play here is that of cumulative rounding errors when working with finite-precision floating-point numbers.
Another thing affecting the result in the least significant digits is the order in which numbers are added together.
A simplified example -- if we can only ever have 3 sig figs each step of the way:
1.67 + 1.67 + 30 = 33.3
30 + 1.67 + 1.67 = 33.4
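The same effect can be reproduced with 32-bit floats (values picked purely for illustration, not taken from real route data):

```cpp
// Summing identical values in two different orders gives two
// different rounded results, because each intermediate sum is
// rounded to the nearest representable float.
float sum_small_first() {
    float s = 0.0f;
    for (int i = 0; i < 10; ++i) s += 0.1f;  // ten small addends first
    return s + 1000000.0f;                   // large addend last
}
float sum_large_first() {
    float s = 1000000.0f;                    // large addend first
    for (int i = 0; i < 10; ++i) s += 0.1f;  // each 0.1f now rounds against
                                             // the coarser spacing near 1e6
    return s;
}
```

Summed small-first, the ten 0.1s accumulate at full precision before meeting the big number; summed large-first, each 0.1 gets rounded against the much coarser float spacing near 1,000,000, so the two results disagree in the last places.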
The values for the clinchedSystemMileageByRegion DB table are affected by this.
They're added together in whatever order they come out of a hash table.
The problem is with this little function (https://github.com/TravelMapping/DataProcessing/blob/871b4ff000934b554e48abde0784363e2f9681fa/siteupdate/cplusplus/classes/HighwaySegment/compute_stats_t.cpp) right here:
void HighwaySegment::compute_stats_t(TravelerList* t)
{ // credit active+preview for this region, which it must be
// if this segment is clinched by anyone
t->active_preview_mileage_by_region[route->region] += length/active_preview_concurrency_count;
// credit active only for this region
if (route->system->active())
t->active_only_mileage_by_region[route->region] += length/active_only_concurrency_count;
// credit this system in this region in the messy unordered_map of unordered_maps
t->system_region_mileages[route->system][route->region] += length/system_concurrency_count;
}
More precisely, not in the function itself, but in what's going on when it gets called...
siteupdateST behaves essentially the same way siteupdate.py does:
We iterate through each HighwaySystem, then each Route, then each HighwaySegment, then call the function for each traveler it's clinched_by. (https://github.com/TravelMapping/DataProcessing/blob/871b4ff000934b554e48abde0784363e2f9681fa/siteupdate/cplusplus/siteupdate.cpp#L482-L494) Travelers get their segments credited in the same order (system->route->segment) every time, and the cumulative rounding error is always the same.
Threaded siteupdate works differently:
In order to avoid having to wait for mutexes (https://en.wikipedia.org/wiki/Mutex) while multiple threads processing different HighwaySystems compete for access to TravelerList objects, each thread (https://github.com/TravelMapping/DataProcessing/blob/871b4ff000934b554e48abde0784363e2f9681fa/siteupdate/cplusplus/threads/CompStatsTThread.cpp) gets a TravelerList to process, iterates through their clinched_segments & calls compute_stats_t. Problem is, these segments are stored in an unordered_set (https://github.com/TravelMapping/DataProcessing/blob/871b4ff000934b554e48abde0784363e2f9681fa/siteupdate/cplusplus/classes/TravelerList/TravelerList.h#L34). The name says it all; when we iterate through this container, its elements are processed in whatever order, each traveler's regional totals get rounded off a little differently at each stage of the game, and the cumulative rounding errors come out a little different in the end.
What to do:
- The simpler solution is to undo the changes made in ComputeStatsThread parallelism phase 1 (https://github.com/TravelMapping/DataProcessing/pull/386/commits/31ccd30c985c95960a3b5b57850d89e7f463ecd8#diff-e218e3601adc91d7945af69a28b2860a004c1f915becce52fd3bc3bf66865377). We'd iterate the same way as before (system->route->segment), lock travelers' regional totals with mutexes, and just live with the performance hit. It'd not be as bad as the charts at #386 (https://github.com/TravelMapping/DataProcessing/pull/386) show, since we'd still keep the changes from ComputeStatsThread parallelism phase 2 (https://github.com/TravelMapping/DataProcessing/pull/386/commits/ae31cc6748918bf0056709f92dbe936608521239).
- Another solution that could reduce lock contention... First, during Route construction, add each Route to a list in its Region. A small amount of overhead here, but not much. Then, when Computing stats per traveler, have each thread process a Region. This way, segments are processed in the same order -- system->route->segment. For the most part, we'd not need mutexes, as each thread would be reading/writing different areas of travelers' hash tables. The main snag is when a given region isn't in a traveler's tables yet, and we have to create a new key/value pair. I've an idea what to do here. It'd involve a little lock contention (and overhead), but not as much as in solution #1.
- Potentially, implement #1 near-term and then work on #2.
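A minimal sketch of solution #2's threading shape, with simplified stand-in types (not the real siteupdate classes): threads claim whole Regions from a shared counter, so a region's segments are always summed in the same order no matter the thread count.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Region and its routes' segment lengths.
struct Region { std::vector<double> segment_lengths; double total = 0; };

// Each thread claims the next unprocessed Region from an atomic counter
// and sums its segments in a fixed order. No mutex on the totals: only
// one thread ever touches a given region.
std::vector<double> region_totals(size_t nregions, unsigned nthreads) {
    std::vector<Region> regions(nregions);
    for (auto& r : regions) r.segment_lengths = {1.0, 2.0, 3.0};
    std::atomic<size_t> next{0};
    auto worker = [&] {
        for (size_t i; (i = next++) < regions.size(); )
            for (double len : regions[i].segment_lengths)
                regions[i].total += len;   // deterministic order per region
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    std::vector<double> totals;
    for (auto& r : regions) totals.push_back(r.total);
    return totals;
}
```

Because each region's additions happen in a fixed order within one thread, the totals come out bit-identical regardless of how many threads run.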
system.php
Just finishing up a Python update for tonight. I can run ST tomorrow if you'd like, or re-run tonight's if it's helpful to see it sooner.
Python is fine, thanks.
Last night's update ran Python. I'll do one now with single-threaded C++.
Excellent, thanks! I was just about to request that.
system.php, when displaying mileage in all regions, sums up multiple values from the clinchedSystemMileageByRegion table (https://github.com/TravelMapping/Web/blob/b487bd0b6b867f5d03f6d4cccc2b514cf88b9e7a/user/system.php#L160-L166). Surely in whatever order they come out of the table. But these data aren't inserted into the table in a deterministic order, again due to being stored in an unordered_map. It might take a bit of searching to find this effect.
Edit: I did some searching. In 26 different systems covering multiple regions. Didn't observe the problem with the rankings, but I still believe it can happen.
Maybe I'll make 100 different .list files all clinching usaif or something, and update lab2. :)
for n in {100..199}; do
for f in `tail -n +2 usaif.csv | tr ' ' '%' | sed -r 's~usaif;([A-Z]{2});.*;(.*);~\1/usaif/\2.wpt~'`; do
echo -n $f | sed -r -e 's~.*([a-z]{2})\.i0*~\1 i-~' -e 's~\.wpt$~ ~'
head -n 1 ../$f | cut -f1 -d' ' | tr '\n' ' '
tail -n 1 ../$f | cut -f1 -d' '
done > ~/tm/UserData/list_files/melvin$n.list
done
Schmedit: LOL still nothing. I guess fixing this is low priority? 8)
Potential solution:
- In "the messy unordered_map of unordered_maps (https://github.com/TravelMapping/DataProcessing/blob/871b4ff000934b554e48abde0784363e2f9681fa/siteupdate/cplusplus/classes/TravelerList/TravelerList.h#L38)", replace the outer unordered_map (https://cplusplus.com/reference/unordered_map/unordered_map/) with an ordered map (https://cplusplus.com/reference/map/map/). As long as systems get sorted the same way for everybody, sort criterion isn't important. Why not keep it fast & simple, and use the HighwaySystem's memory address.
Potential oddities: HighwaySystem objects could be ordered differently in memory in separate runs of siteupdate. This would mean that while people who have the same mileage in a system have the same mileage within each individual site update, mileage could fluctuate a tiny little bit from one update to the next. In practice this won't be observable, as mileages are rounded to 2 decimal places on the userpages and in userlogs & stats CSVs. Unless there's another freak occurrence like what happened with Oscar's mileage on ME103 (https://forum.travelmapping.net/index.php?topic=3014), where a difference of 1/3 the width of a human hair made the difference between rounding off to one semi-trailer and the next. This is possible, but not probable. I don't believe I've ever observed it when comparing (https://github.com/yakra/tmtools/blob/master/shell_scripts/compare_all.sh) the output of different siteupdate versions.
- Longer-term, we can eliminate the (remote) possibility of that happening by storing HighwaySystem objects sequentially in memory rather than allocating them one-by-one on the heap. Eventually, I'd like to do that for a lot more of the classes for performance purposes. There are complications in doing so, but not insurmountable ones.
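A quick sketch of the ordered-map idea, using a hypothetical stand-in HighwaySystem (not the real class): a std::map keyed on pointers sorts by std::less, so within a run every traveler iterates systems in the same (address) order, regardless of insertion order.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <vector>

struct HighwaySystem { double mileage; };  // hypothetical stand-in

// Insert in any order; iteration always follows pointer (address) order,
// since std::map's default comparator std::less gives pointers a total order.
std::vector<HighwaySystem*> iteration_order(HighwaySystem* a,
                                            HighwaySystem* b,
                                            HighwaySystem* c) {
    std::map<HighwaySystem*, double> m;
    m[c] = 3.0; m[b] = 2.0; m[a] = 1.0;    // insertion order is irrelevant
    std::vector<HighwaySystem*> order;
    for (auto& kv : m) order.push_back(kv.first);
    return order;                           // sorted by address, every time
}
```

Addresses may differ between runs (hence the fluctuation caveat above), but within one run the summation order, and thus the cumulative rounding error, is the same for everybody.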
-
WRT Option #2:
when Computing stats per traveler, have each thread process a Region. This way, segments are processed in the same order -- system->route->segment. For the most part, we'd not need mutexes, as each thread would be reading/writing different areas of travelers' hash tables. The main snag is when a given region isn't in a traveler's tables yet, and we have to create a new key/value pair. I've an idea what to do here. It'd involve a little lock contention (and overhead), but not as much as in solution #1.
I knew this'd be complicated, but it's more complicated than I first thought.
Let's say we start by trying to find (https://cplusplus.com/reference/unordered_map/unordered_map/find/) the region. If it's not found, then lock the mutex, insert a new element, and unlock.
Between the time we fail to find the region and the time we acquire the lock, another thread may have inserted a value for that region. (No surprises here; most existing threads already need to check (https://github.com/TravelMapping/DataProcessing/blob/b3f2440d24dd49e2fafa02ed080270b4a67e99ec/siteupdate/cplusplus/threads/ReadWptThread.cpp#L6-L8) for these lightning-quick state changes.) No problem; just use the [] operator (https://cplusplus.com/reference/unordered_map/unordered_map/operator%5B%5D/). There'd be a little redundant searching & hashing, but that's the price we pay for thread safety.
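Roughly what I mean, with simplified stand-in types (one traveler's table and one mutex, not the real siteupdate classes):

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>

struct Region {};                              // hypothetical stand-in

std::unordered_map<Region*, double> mileage;   // one traveler's totals
std::mutex mtx;

// Find-then-lock: lock-free fast path when the region already exists,
// mutex-guarded insertion otherwise.
double& mileage_for(Region* r) {
    auto it = mileage.find(r);
    if (it != mileage.end())
        return it->second;          // hazard: under concurrent insertions,
                                    // a rehash can invalidate `it` between
                                    // find() and this dereference
    std::lock_guard<std::mutex> lk(mtx);
    return mileage[r];              // operator[] re-checks: it only inserts
}                                   // if another thread didn't beat us to it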
But what if we DO find the region? We'd have an iterator we'd need to dereference. Which may become invalidated between when we find it and when we attempt to dereference:
On most cases, all iterators in the container remain valid after the insertion. The only exception being when this function inserts a new element and this forces a rehash. In this case, all iterators in the container are invalidated.
Oy vey.
This would mean using a mutex here too and getting a new reference or iterator that we know is valid. At which point things are less efficient than option #1 in my last post.
References to elements in the unordered_map (https://cplusplus.com/unordered_map) container remain valid in all cases, even after a rehash.
Part of me wants to cowboy thru the whole process, no-mutex, using the [] operator (which returns a reference).
Probably a bad idea though -- I could see multiple threads trying to do an insertion at the same time, and getting the container into a bad state.
May still try it though as a learning exercise; see what I can see. :pan:
Edit: It crashes. No big surprise here. :)
Assuming that's a no-go, what are our options?
- I've implemented Option #1 locally, improved relative to before #386 by going from 3 mutexes in the TravelerList class to 1. Decreased overhead outweighs increased contention.
Takes 55% longer than no-build @ 4 threads on BiggaTomato. As always I'm curious to see how this scales to higher thread counts on the other machines, along with 2-mutex and 3-mutex flavors. :)
A few more ideas to further reduce contention:
- 1A: Move the compute_stats_t calls into the region->mtx.lock (https://github.com/TravelMapping/DataProcessing/blob/487b0423a0ec23b32f4e79c46f9bec96fe86dbdb/siteupdate/cplusplus/classes/Route/compute_stats_r.cpp#L36)? No, wait. Multiple threads could be trying to insert new regions into a traveler's hash tables at once. Probably still a bit too cowboy. Scratch that! Plus it introduces unnecessary contention, locking the region for all travelers at once.
- 1B: For a more threadsafe variant of the same idea, create a temporary unordered_map<TravelerList*,double[3]> while processing each Route, to store each traveler's mileage on it. No mutex required. Once we have the totals for the route, iterate through each traveler's totals and add them to the per-traveler hash tables. This still requires a mutex in the TravelerList class, but it's locked & unlocked once per Route instead of once per HighwaySegment. Decreased overhead. Same basic idea as in ComputeStatsThread parallelism phase 2 (https://github.com/TravelMapping/DataProcessing/pull/386/commits/ae31cc6748918bf0056709f92dbe936608521239#diff-cfffce5fd934b05229e6ec96117555b4725f1bd31d1e84b5d90c64bdab6c1406).
- 9,865,256 hash table lookups (the length of the clinched DB table + 4*the length of the clinchedRoutes DB table).
- (as of HD @ 4488c228 / UD @ 4b879ee)
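A sketch of the 1B idea, with simplified stand-in types (not the real TravelerList/HighwaySegment, and a plain double instead of the double[3]):

```cpp
#include <mutex>
#include <unordered_map>
#include <vector>

struct Traveler {                  // hypothetical stand-in for TravelerList
    std::mutex mtx;
    double total = 0;
};
struct Segment {                   // hypothetical stand-in for HighwaySegment
    double length;
    std::vector<Traveler*> clinched_by;
};

// Accumulate per-traveler subtotals in a route-local temporary (no lock),
// then take each traveler's mutex once per Route instead of per segment.
void credit_route(const std::vector<Segment>& route_segments) {
    std::unordered_map<Traveler*, double> subtotal;
    for (const Segment& s : route_segments)
        for (Traveler* t : s.clinched_by)
            subtotal[t] += s.length;        // local: no contention here
    for (auto& [t, miles] : subtotal) {
        std::lock_guard<std::mutex> lk(t->mtx);
        t->total += miles;                  // one lock/unlock per traveler
    }
}
```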
- 2: As proposed above, iterate thru regions and cowboy thru the whole thing mutex-free after all. With one key difference: set up all the key/value pairs beforehand, most likely in Route::store_traveled_segments (https://github.com/TravelMapping/DataProcessing/blob/07db7cbdf65c389ae9c4c7d253c1dbf74d2c13a5/siteupdate/cplusplus/classes/Route/Route.cpp#L218-L227). This way, we wouldn't have to worry about multiple threads performing insertions, altering the container, and invalidating iterators.
How would this perform vs 1B?
- Pro: No mutexes at all. No contention, no overhead.
- Con: Way more hash table lookups: 33,721,660 (4*the number of times store_traveled_segments is called + 4*the length of the clinched DB table).
- Con: A wee bit of overhead during route construction.
- 3: Best-of-both-worlds approach. Set up key/value pairs beforehand as in #2. No mutex. Update traveler mileage tables once per route instead of every segment.
- 11,607,588 hash table lookups (4*the number of times store_traveled_segments is called + the length of the clinched DB table + 4*the length of the clinchedRoutes DB table).
- Refactoring? The per-traveler regional mileage tables are stored in the TravelerList class, with Region*s as keys. They could in theory be stored in the Region class, with TravelerList*s as keys. This would remove the need to set up the tables in advance, but would slow down userlogs and stats CSVs. (The messy table of tables would have to be set up in advance no matter what.) Maybe a net loss in performance altogether; low priority.
-
Option 1B is slower than even the 3-mutex flavor of Option 1. Probably due to one or both of:
- The hash table lookups themselves may be fairly cheap, while insertions are expensive, leading to the creation of the temporary hash table taking more time.
- The expense of iterating through an unordered_map, however much that is.
This probably means Option 3 isn't viable.
Meanwhile, Option 2 crashes. No idea why yet.
-
Meanwhile, Option 2 crashes. No idea why yet.
set up all the key/value pairs beforehand, most likely in Route::store_traveled_segments (https://github.com/TravelMapping/DataProcessing/blob/07db7cbdf65c389ae9c4c7d253c1dbf74d2c13a5/siteupdate/cplusplus/classes/Route/Route.cpp#L218-L227).
Just doing it here is insufficient.
AB AB1 ComRd BC/AB
AB AB93 BC/AB AB1_E
That's it for Alberta. More importantly, no TCH anywhere.
That means, no slot in their system_region_mileages for cantch.
When it comes time to call compute_stats_t:
• Using the [] operator means inserting new elements, means data races, means bad data in userlogs, stats CSVs, and the DB.
• Replacing it with unordered_map::at (https://cplusplus.com/reference/unordered_map/unordered_map/at/) leads to unhandled exceptions, crashes, and the ability to diagnose the problem. ;)
Solution: Set up additional key/value pairs when Augmenting travelers for detected concurrent segments.
How many more hash table lookups is this? 4 for every occurrence of "Concurrency augment for traveler" in concurrencies.log. 3,788,896. For a total of 37,510,556. :'(
On the plus side, we're right back down 947,224 lookups/insertions, no longer having to insert into clinched_segments. Nix creating, maintaining, iterating & destroying the temporary to_add list. (O HAY! I can remove that from Alternative #1 too!)
-
Jim, as a programmer and CS prof, maybe you'll find this stuff interesting :D
there's this one part of the code that's a bit of a kluge; ... I've always felt kind of nervous about it.
The kluge is here (https://github.com/TravelMapping/DataProcessing/commit/31ccd30c985c95960a3b5b57850d89e7f463ecd8#diff-8a323ef0bce4aba54cfd7eef56fdfeda18216c23c3f5135c13b00a89e5449386).
An early revision of "ComputeStatsThread Parallelism Phase 1" broke userlogs; the number of active & preview systems was consistently under-reported -- often being 0.
The culprit was the difference in rounding errors from adding up a series of doubles in different orders.
A HighwaySystem's total mileage is calculated by iterating through HighwaySegments in system -> route -> segment order.
A user's traveled mileage on that system went from being summed the same way to being summed by iterating thru TravelerList::clinched_segments (https://github.com/TravelMapping/DataProcessing/issues/366#issuecomment-732743745). And thus, the difference in rounding errors.
I justified this kluge-around (http://www.catb.org/esr/jargon/html/K/kluge-around.html) on the basis that in practice there won't be any single segment whose mileage is lower than the minimum precision of a 32-bit float. But it never sat right with me. It's not the right thing to do!
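For illustration, here's the kind of difference the float conversion masks (the exact comparison in the real code may differ), using two of the LIE activeMileage values from the tables upthread:

```cpp
// Two totals from the LIE tables above: same routes traveled, but the
// cumulative rounding error differs in the last bits of a double.
const double before = 7.1963766741696755;
const double after  = 7.196376674169677;

// The kluge, roughly: compare as 32-bit floats, so differences far below
// float precision (~1e-7 relative) vanish before the comparison.
inline bool same_mileage(double a, double b) {
    return (float)a == (float)b;
}
```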
But wait... shouldn't this have been broken before (https://github.com/TravelMapping/DataProcessing/tree/a20ee6dcccbac73135a6b73488251c4b4652817f/siteupdate/cplusplus)?
After all, the two totals being compared are calculated by iterating through the HighwaySystem::mileage_by_region (https://github.com/TravelMapping/DataProcessing/blob/a20ee6dcccbac73135a6b73488251c4b4652817f/siteupdate/cplusplus/classes/HighwaySystem.cpp#L131-L135) and TravelerList::system_region_mileages (https://github.com/TravelMapping/DataProcessing/blob/a20ee6dcccbac73135a6b73488251c4b4652817f/siteupdate/cplusplus/classes/TravelerList/TravelerList.cpp#L119-L124) unordered_maps.
Unordered maps. The name says it all, right? Won't data be coming out of these in any old order at all? Or is it deterministic?
I got curious and pasted
HighwaySystem* h;
for (auto& hs : HighwaySystem::syslist)
if (hs->systemname == "usai")
{ h = hs;
break;
}
for (std::pair<Region* const, double>& rm : h->mileage_by_region)
std::cout << rm.first->code << '\t' << rm.first->name << '\t' << rm.second << std::endl;
std::cout << std::endl;
TravelerList* yakra;
for (auto& t : TravelerList::allusers)
if (t->traveler_name == "yakra")
{ yakra = t;
break;
}
for (auto& rm : yakra->system_region_mileages.at(h))
std::cout << rm.first->code << '\t' << rm.first->name << '\t' << rm.second << std::endl;
return 0;
into yakra:master (still at bc04537 (https://github.com/yakra/DataProcessing/tree/bc04537138af75f558bd9b4481e1fbe324c5e04a); I've not yet FFWDed to your more recent merge commit) here (https://github.com/yakra/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/siteupdate.cpp#L495).
It spits out two tables:
- Mileage for all regions on usai
The order of the regions differed from one machine to the next, but stayed the same in successive runs on each.
- My mileage in my regions on usai
The order of the regions differed between successive runs using multiple threads. It stayed the same if I either:
• ran siteupdate with only 1 thread, or
• pasted the same code into my local WIP branch that sets up the hash tables in advance in Route::store_traveled_segments (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/Route/Route.cpp#L218-L227) and ConcAugThread (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/threads/ConcAugThread.cpp) as noted above.
This leads me to believe that the sequence of items in an unordered_map depends on the order in which they're inserted, along with the hash of the key itself.
- HighwaySystem::mileage_by_region gets its insertions (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/Route/compute_stats_r.cpp#L44) in a deterministic order, as we iterate thru systems & routes.
- In the WIP branch, TravelerList::system_region_mileages got its insertions in the same order the system+region combos were encountered in yakra.list.
- In the master branch, TravelerList::system_region_mileages gets its insertions in the order the HighwaySegment*s come out of TravelerList::clinched_segments.
This looks pretty stable with just one thread, but multiple threads have a big impact on where these objects are allocated in memory, and thus their positions in the clinched_segments set.
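For what it's worth, even though iteration order is implementation-defined and insertion-order-dependent, the maps' *contents* still compare equal; it's only the traversal order (and thus the float accumulation order) that can drift:

```cpp
#include <unordered_map>

// Build the same key/value pairs in forward or reverse insertion order.
std::unordered_map<int, double> build(bool reversed) {
    std::unordered_map<int, double> m;
    if (reversed) { for (int k = 5; k >= 1; --k) m[k] = k * 0.1; }
    else          { for (int k = 1; k <= 5; ++k) m[k] = k * 0.1; }
    return m;
    // operator== ignores iteration order, so these compare equal; but
    // iterating each map may visit elements in different sequences,
    // which is exactly what perturbs cumulative rounding.
}
```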
OK. So now that segments are all summed up in system->route->segment order, including for traveler's regional totals again, can we remove the float conversion and the logs will be OK? Well, not quite. For two reasons:
- For travelers, the segment lengths are added up one by one. For the overall region totals, since ComputeStatsThread parallelism phase 2 (https://github.com/TravelMapping/DataProcessing/commit/ae31cc6748918bf0056709f92dbe936608521239#diff-cfffce5fd934b05229e6ec96117555b4725f1bd31d1e84b5d90c64bdab6c1406), segment lengths are summed into a subtotal for each route, which are then added to the regional totals. This was done in order to greatly reduce mutex usage. So, the rounding errors will be a little different. (I see a way to fix/undo this, but that's off-topic for the moment.) Even still...
- As discussed above, HighwaySystem::mileage_by_region and TravelerList::system_region_mileages are no longer populated, and thus no longer iterated, in the same order.
OTOH, we can get rid of the float conversion after all if we don't compare traveled vs total mileage, and instead see if num_con_rtes_clinched == everything in the system. As an added bonus, this eliminates a call (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/TravelerList/userlog.cpp#L40) to HighwaySystem::total_mileage (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/HighwaySystem/HighwaySystem.cpp#L118-L123) and saves us a wee bit of time.
WRT ComputeStatsThread parallelism phase 2:
Forget what I said upthread (https://forum.travelmapping.net/index.php?topic=5379.msg30468#msg30468) about "since we'd still keep the changes from ComputeStatsThread parallelism phase 2." Iterating by region isn't just for CompStatsTThread; it can benefit CompStatsRThread too.
- We'd no longer need a mutex here (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/Route/compute_stats_r.cpp#L36-L40), and can then simplify away that block of code (and these temporaries (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/Route/compute_stats_r.cpp#L2-L4)) completely, instead adding the segment lengths directly to the regional totals.
- We would need a mutex around this stuff (https://github.com/TravelMapping/DataProcessing/blob/bc04537138af75f558bd9b4481e1fbe324c5e04a/siteupdate/cplusplus/classes/Route/compute_stats_r.cpp#L41-L45) since each thread no longer handles a HighwaySystem, unless the tables are set up in advance. Do whichever performs better.
- This raises a question... Chasing a few pointers and doing a little arithmetic is pretty inexpensive. What would perform better, having that or a hash table lookup/insertion in a mutex?
Again, though, there's the option to set up the tables in advance. HighwaySystem construction seems a natural place to do this, but that's still single-threaded for the time being. Multi-threaded options include ReadWptThread and RteIntThread. Got some ideas up my sleeve on how to minimize lookups while doing this.
- We could add to system->mileage_by_region[region] segment-by-segment without having to muck about with a per-route system_mileage subtotal OR multiple hash table lookups if we obtain a reference before the for loop, and add to that reference.
- Iterating both CompStatsRThread and CompStatsTThread by region means we could simplify the code and reunify them, effectively undoing all of #386 (https://github.com/TravelMapping/DataProcessing/pull/386). This eliminates the need to store concurrency counts in the HighwaySegment object (https://github.com/TravelMapping/DataProcessing/commit/31ccd30c985c95960a3b5b57850d89e7f463ecd8#diff-ce66db9042c3dc2a120cdcfb0da256aec25ae77500ba9c139e2a3c9292ac608d) for future use, saving us 16 bytes per segment.
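The reference-before-the-loop idea in miniature (simplified stand-in names, not the real classes):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<std::string, double> mileage_by_region;  // stand-in table

// One hash lookup per route instead of one per segment: grab a reference
// into the map up front and accumulate into it.
void credit_route(const std::string& region,
                  const std::vector<double>& segment_lengths) {
    double& total = mileage_by_region[region]; // single lookup/insertion;
                                               // references stay valid even
                                               // across later rehashes
    for (double len : segment_lengths)
        total += len;                          // no hash lookup per segment
}
```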
Benchmarking
TODO DONE: make a list of tasks affected by all the changes. Some will be faster; some will be slower. (https://forum.travelmapping.net/index.php?topic=5379.msg30547#msg30547)
And for completeness' sake, tasks following changed ones, LOL. As observed once upon a time before, changes in what goes on in one task can affect the number of cache misses (and thus performance) in the next, even if the code in these subsequent tasks is itself unchanged.
-
Wow, thanks again for all of this digging, @yakra. I think next time I teach my Parallel Processing or OS or other classes where I talk about the dangers of concurrency ("what could possibly go wrong?"), I'll just use some of your threads on the topic here and on GitHub as assigned readings...
-
Woe be to those who try to follow my incoherent ramblings :)
Thankfully, there's more than one way to skin this cat. Now it's just a matter of finding which one is fastest. After all, the C++ siteupdate program is all about The Quest For Speed :)
-
Tasks to benchmark
- Total run time (w/early termination to save time)
  Reading region|stats csv
Primary tasks whose efficiency is directly changed:
- Reading systems list
  Reading systems list|Finding all
  Slower in v.2 due to populating Region::routes
- Reading waypoints for all routes
  ReadWpt
  Slower in v.2s* due to HighwaySystem::mileage_by_region setup
- Processing traveler list files
  ReadList
  Slower in v.2 due to traveler mileage table setup
- Augmenting travelers for detected concurrent segments
  ConcAug
  Slower in v.2 due to traveler mileage table setup
  Faster in both: to_add removed
- Computing stats
  CompStatsB
  Slower in v.1 due to mutexes and temporary table creation in v.1B
  Faster in v.2 due to prior hash table setup
Secondary tasks, themselves unchanged, that may be indirectly affected by cache misses due to changes in prior tasks:
- Finding all .wpt files
  Finding all|Reading waypoints
- Sorting waypoints in Quadtree
  QtSort
- Searching for near-miss points
  NmpSearch
- Writing route and label logs
  Writing route|augmenting
- Writing highway data stats log file (highwaydatastats.log)
  highwaydatastats|per-traveler
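The `task A|task B` lines above read like grep -E alternations used to bracket each task in siteupdate.log (match the task's start line and the next task's start line, then subtract the timestamps). A hypothetical sketch of that extraction; the log format and filename here are assumptions, not the real log layout:

```shell
#!/bin/sh
# Hypothetical: fabricate a tiny siteupdate.log in the assumed
# "[elapsed] Task description." shape, then pull out the two lines
# bracketing the "Reading systems list" task with a grep -E alternation.
log=siteupdate.log
cat > "$log" <<'EOF'
[  0.0] Reading systems list.
[  1.2] Finding all .wpt files.
[  3.4] Reading waypoints for all routes.
EOF
grep -E 'Reading systems list|Finding all' "$log"
```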
-
Out of curiosity, has single-threaded always been the default for C++ in the new common siteupdate.sh, even before this bug was discovered?
I was looking at the blame for siteupdate.sh and that appears to be the case, but maybe I missed something.
-
Out of curiosity, has single-threaded always been the default for C++ in the new common siteupdate.sh, even before this bug was discovered?
I was looking at the blame for siteupdate.sh and that appears to be the case, but maybe I missed something.
I was running with 8 threads before this problem was noticed, but it's possible I had never committed that version.
-
TODO: Make sure all hash table pre-setup is conditioned out in siteupdateST -- or else done in siteupdate in the thread functions only.
Cleaner and easier than conditioning out mutexes (https://github.com/TravelMapping/DataProcessing/issues/393).
All this iff one of the 'S' alternatives is selected; hopefully this will be the case.
-
LOL, things got way out of hand. I have 25 build alternatives, all a little bit different.
Benchmark results from the last machine, lab3, won't be in for another 4 hours, but it looks like I'll be dropping 5 of these alternatives from consideration right away.
Then, my task over the next few days will be to JUST PICK ONE.
- Of the remaining 20 options, if all goes well I should be able to take one look through the data & eliminate 10 options, then take another pass through and eliminate another 5.
- Of the remaining 5, 3 look like slightly worse performers, though I've yet to take a closer look.
- For whatever's left at the very end, the difference in processing time will be close to zero, and I'll probably be looking at more noise than signal in the data. It might not even be possible to pick one definitive "best performer", and I may just go with the version that has the cleanest code.
When I originally drafted the first 9 alternatives 3 weeks ago, 4 of them (which have since ballooned into the 20 still in consideration) clearly outperformed no-build.
Since then, what "no-build" is has changed, and I've rebased my in-dev branches onto DataProcessing#579 (https://github.com/TravelMapping/DataProcessing/pull/579) and more recent commits.
Now, no-build is clearly fastest -- though it does need to change in order to fix this bug.
Why are the fixes slower now? I see two major components:
- The fixes all remove a component of Augmenting travelers for detected concurrent segments that's a big drain on efficiency. Meanwhile, #579 sped that component up, so removing it means a smaller improvement.
- For Computing stats per traveler specifically, that went from iterating thru Travelerlist::clinched_segments to iterating thru HighwaySegment::clinched_by.
Originally, each was an unordered_set, so iteration took the same amount of time. Now, #579 changed the former to a vector, meaning the fixes get slower.
Future developments are a bit of a wild card on that 2nd point. HighwaySegment::clinched_by will eventually be converted to a custom TMBitset class. As currently drafted, iterating through this class should be quite slow, but there are some options for optimization that should reduce that impact. Where it'd ultimately stand relative to an unordered_set remains to be seen.
-
Jim, I have a bleeding-edge executable at /home/yakra/ytm/DataProcessing/siteupdate/cplusplus/siteupdate. Wanna give it a try tonight, multi-threaded?
-
Jim, I have a bleeding-edge executable at /home/yakra/ytm/DataProcessing/siteupdate/cplusplus/siteupdate. Wanna give it a try tonight, multi-threaded?
Running now with 8 threads.
-
Jim, I have a bleeding-edge executable at /home/yakra/ytm/DataProcessing/siteupdate/cplusplus/siteupdate. Wanna give it a try tonight, multi-threaded?
Running now with 8 threads.
Completed.
-
TLDR, no real reason to waste your time reading this.
Mostly talking to myself here; decided to save all this rather than just bin it.
The shootout!
it looks like I'll be dropping 5 of these alternatives from consideration right away.
Only 4 are out; 1 is back in, at least temporarily.
Turns out I was grabbing the wrong timestamps from the siteupdate.log files, and not measuring one of the tasks that slowed down in order to speed other tasks up. This made the newer alternatives artificially look better in comparison to siteupdate_2.
Meanwhile, 10 more alternatives were added, making 35 in total. :P More on this below.
siteupdate_3mtx, 2mtx & 1mtx were 3 different flavors of Option 1 listed under "What to do" here (https://forum.travelmapping.net/index.php?topic=5379.msg30468#msg30468). The idea was to experiment with having fewer mutexes than the original 3, balancing reduced lock contention against reduced lock overhead. Less overhead wins; siteupdate_1mtx generally performed best, but the siteupdate_2+ alternatives outperformed all of these; they were 3 of the 4 options dropped from consideration.
siteupdate_1B was proposed (https://forum.travelmapping.net/index.php?topic=5379.msg30502#msg30502) to decrease mutex overhead by creating a temporary hash table. This one was absolute bottom of the barrel everywhere except on bsdlab, where its performance fell between siteupdate_3mtx & 2mtx.
siteupdate_2 as revised here (https://forum.travelmapping.net/index.php?topic=5379.msg30502#msg30502), when Computing stats per traveler, iterates with each thread processing a Region, and avoids bringing back per-traveler mutexes at the cost of setting up hash tables' key/value pairs in a threadsafe manner in another function beforehand. CompStatsRThread/compute_stats_r & CompStatsTThread/compute_stats_t are still separate entities. This is the alternative that was mistakenly removed from consideration then put back on the table; I'll weigh it against whatever gets selected from The Big 30 in the final round.
The Big 30:
These all re-unify CompStatsRThread & CompStatsTThread into a single CompStatsThread that, again, iterates with each thread processing a Region, as outlined in the last set of bullet points at the bottom of this post (https://forum.travelmapping.net/index.php?topic=5379.msg30521#msg30521).
So there are 30 remaining alternatives. 2 times 3 times 5.
For those unfamiliar with brace expansion (https://www.google.com/search?q="brace+expansion"), hopefully this will still make some sense from context:
siteupdate_{3,4}{m,s,S}{t,ta,r,pa,ra}
The 4 family attempted to improve on 3 by storing a few variables to eliminate some redundant dereferencing & arithmetic. In practice, this didn't have much effect.
The 3xx family was previously referred to as 2xx variants upthread, confusingly. :(
m vs s: Right now, this code (https://github.com/TravelMapping/DataProcessing/blob/7ec7e9a5aa13e8eed9cbe8ca6a2798c268d47652/siteupdate/cplusplus/classes/Route/compute_stats_r.cpp#L41-L45) is threadsafe, with each thread processing a HighwaySystem. However, switching to processing a Region will change that, and we'll either need a mutex, or to set up the key/value pairs in advance.
The capital S family is where the 10 new alternatives came in -- m combined the HighwaySystem mutexes as proposed in yakra#210 (https://github.com/yakra/DataProcessing/issues/210) to preserve my sanity & code organization/readability, while s didn't really need to do that & went no-build. After noticing some counterintuitive benchmark results on lab2, I decided to go for a more apples-to-apples comparison, and S combines the mutexes too. Prior benchmarks suggested a slight speedup anyway.
t is more of a no-build compared to r -- it continues to keep a running system_mileage subtotal, adding it to r->system->mileage_by_region[this] afterwards, while r defines system_mileage as a reference to r->system->mileage_by_region[this] and adds to it directly. That gives us one less arithmetic operation per route, and slightly cleaner code. No reason to use t, really. :P
ta, ra, pa: 'a' is for "at". Background is at #517 (https://github.com/TravelMapping/DataProcessing/pull/517); search for "No-Build". Leaving this ugly/clunky block of code as it was performed better than changing to a more concise
system->mileage_by_region[region] += system_mileage;
It's possible that, two days later, using the clang -O3 flag (https://github.com/TravelMapping/DataProcessing/pull/520) optimized away any/most of the difference. I don't remember testing that at the time; gonna have to do that now. :D Annnd... it does appear to optimize that away, yes.
ta is no-build here, still adding/assigning the total in the try/catch blocks.
Would-be ra is less straightforward -- the reference can't be declared in the try/catch blocks due to scope limitations, and can't be declared before that as it must be assigned upon declaration.
pa declares a pointer first, assigns it in the try/catch blocks, and dereferences it when adding to *system_mileage.
ra declares a pointer first, then uses it to get a reference right after the try/catch.
The winner: siteupdate_3Sr!
S slightly outperforms s, and has cleaner code.
S slightly outperforms m, though it's very close. I just like the idea of going mutex-free too. :)
No significant difference between 3 & 4. Going with 3 for cleaner code.
For the remaining "Big 5", no real observable difference in speed; r has the cleanest code, and in theory is faster than t even if not noticeably so in practice.
Taking another look at 2 compared to 3Sr, it's in the same ballpark. Slower on 6/7 machines, faster by about 12 ms on average on lab1. Just noise in the data.
Disadvantage of 3Sr: key/value pair setup for HighwaySystem::mileage_by_region during ReadWptThread.
Advantages of 3Sr: Less HighwaySystem mutex overhead; less pointer-chasing & arithmetic; no region mutexes; half the iterating thru routes because only one threaded ComputeStats task, not 2.
Plus, cleaner code -- fewer threads, fewer functions, fewer files, and fewer lines in what's left.
3Sr has the 2nd-cleanest code overall, with only 5 more lines than 3mr.
Anyway! Enough faffing about; I've wasted a staggering amount of time. Time to delete all the other experimental HighwayData branches and get to work on committing this.
Dropping this here as I delete dead branches, in case I wanna refer back to it in the future:
3mtx 7b090e1be7a05fb173f27f0ccb1d83951c502707
2mtx bc71d69b7697ce22974b94137d13011c6fc6334f
1mtx b031cd744eab66d35078df29971ffd59e2019d77
1B 1e9fd5803803e8a3442f290d0ac65b08816e06fd
2 1b6f07d72a205a7aace79e9f9dea71412d537892
3mt 5b4794db55e8a0f64d499d720622b67cd2773ae9
3mta eadec347d03e9c05280f2e4938ea4edfe964ad75
3mr 4f91059cf83d628c0aa350829304a83a58dd29f4
3mpa 43c6d014f8781c0c30ed4f65b845542743530186
3mra 62ea0e698519b795c6a6edb23572943028eec134
3st 836a5f4b203d06bb53634441c51eae3d161ba3bd
3sta e29c2d9f6bc188d8866a23d94726fe0711eede86
3sr 2f394b13510d6d9c69e0e72d19be6bf72b0f546f
3spa 1e8bd750e89849796b3079ca094a9a7df86703e3
3sra a2cb69eb94a5be1038c1116dc6a688f263f45c61
3St 8bb3052c31e40f888006781eb397cf0bb2d6f707
3Sta a368b2ef656727b5bd5e84bee45561a9512b3a73
3Sr 9198035f49ae24842941eaec8be17d898e1baf5b
3Spa a1c39077211e693f5a67b328bcccb962e28a715b
3Sra 372ea9ed3ee8f184a0ab16d4ff17df0067fd5544
4mt 933ce42c93d37970d7d0c53c9042f7545ebb768b
4mta 7582a07b53fd04df0d6072581d208135fc0ecd32
4mr df840c78da7e75088db08c6584a45a4ffb691299
4mpa d01b65d995eb76f928e061a2b7de312fff4f68f2
4mra 1dbfcacaab3089f3db0e835bffe8a1f32cb46dff
4st 50492dada8521bdd5b8fe449dd50bdf4a06d6993
4sta d58e3edcfaf3afb81f273311296baff1b86340bb
4sr 518210b631e62b2f438a8ae9c4d547d12d7d46f9
4spa e1cf7b2f28cba203c60163dea11ca5d76d0389b7
4sra cc82288859fe56981f2af06f5d7eb71b76becf5c
4St 4eac70852d19573e7d56e80ef93054351753b205
4Sta c5ebe2ebf78071533528b13cc0bbddf995afc84c
4Sr 63d085cb27463438e8937debc53fff09ac92dd96
4Spa fe5d268886959791fa9cb29ed39031447a4a5987
4Sra 95e5cf58413d77ffad00d5cb40279304c60c0987
CompStats3 0eaf81ae77e613f3802f15089584c1e83f81ae7d
2sr 83f42cfd4c86eabe5ded700cd9fb0bc81db6aad4
2mr 0112222c87860fe89f9a94256f5ab981bec096e8
CompStats2 1e9fd5803803e8a3442f290d0ac65b08816e06fd
y210_rebase 62b6dc6a0e9526532708a3b7d8ac0c49a100aaac
Final ToDo List:
Check wording in comments & fix if needed, incl CompStatsThread.cpp
Make sure all hash table pre-setup is conditioned out in siteupdateST -- or else done in siteupdate in the thread functions only.
Make sure siteupdateST still gives the same output.
Probably implement the rest of yakra#210 (https://github.com/yakra/DataProcessing/issues/210), closing the issue.
Userlogs: Don't compare traveled vs total mileage, and instead see if num_con_rtes_clinched == everything in the system. Python too?
Housekeeping: refactor datachecks into HighwaySystem::route_integrity. Python too, to keep things similar.
Implement a few items from #574 (https://github.com/TravelMapping/DataProcessing/issues/574) since we already need to recompile things.
Clear TravelerList::clinched_segments after concurrency augments? Leave it in in case I make changes to DB population (https://github.com/TravelMapping/DataProcessing/issues/580)? Decide later.
Interactive rebase / squash
-
Going back to the single-threaded one for tonight, @yakra let me know if you'd like me to use your current version for further testing on production site updates.
-
Should be good until the push to production, thanks.
In the meantime, I had some fun intentionally creating a bug I knew in advance how to fix:
I removed the float conversion kluge-around (https://github.com/TravelMapping/DataProcessing/commit/31ccd30c985c95960a3b5b57850d89e7f463ecd8#diff-8a323ef0bce4aba54cfd7eef56fdfeda18216c23c3f5135c13b00a89e5449386) mentioned upthread (https://forum.travelmapping.net/index.php?topic=5379.msg30521#msg30521), and saw what I could see.
OK. So now that segments are all summed up in system->route->segment order, including for traveler's regional totals again, can we remove the float conversion and the logs will be OK? Well, not quite. For two reasons:
- For travelers, the segment lengths are added up one by one. For the overall region totals, since ComputeStatsThread parallelism phase 2 (https://github.com/TravelMapping/DataProcessing/commit/ae31cc6748918bf0056709f92dbe936608521239#diff-cfffce5fd934b05229e6ec96117555b4725f1bd31d1e84b5d90c64bdab6c1406), segment lengths are summed into a subtotal for each route, which is then added to the regional totals. This was done in order to greatly reduce mutex usage. So, the rounding errors will be a little different. (I see a way to fix/undo this, but that's off-topic for the moment.) Even still...
This fix was done, and now for overall regional totals, segment lengths are added up one by one, same as for per-traveler regional totals. In the same order.
This makes the "missing clinched system" bug far less frequent, but still possible. The only factor we have to worry about is...
- As discussed above, HighwaySystem::mileage_by_region and TravelerList::system_region_mileages are no longer populated, and thus no longer iterated, in the same order.
Here's a list of systems where the bug can occur, those with 3 or more regions, clinched by 1 or more travelers. (Nota bene: not the most recent commits on BiggaTomato. HD @ 59afa75, UD @ 2126e76.)
system regions clinchers
gbnam 3 1
gbnm 3 1
usany 3 2
usawv 3 2
usaif 13 2
deua 16 1
Of course, the more regions in a system, the more probable it is for the error to occur. I never observed it in any of the 3-region systems.
Every time, the buggy siteupdate version failed to recognize michih's clinch of deua.
For usaif, things get a little more interesting.
Almost every time, siteupdate fails to recognize mapcat's clinch.
2 or 3 times though, mapcat's clinch was recognized, and Oscar's was not.
This fits in with my theory...
that the sequence of items in an unordered_map depends on the order in which they're inserted, along with the hash of the key itself.
What happens most of the time is something like this -- here's the data from the last run:
VA FL AR NC LA AZ CA TN KY NY IN SC PA is the order of regions in usaif's overall mileage table.
SC NY CA TN KY NC FL AR AZ IN LA VA PA is the order of regions in Oscar's traveled mileage table.
VA SC LA NY KY IN NC PA FL AR AZ TN CA is the order of regions in mapcat's traveled mileage table.
For Oscar, his mileages summed up in that order magically come out to the same number, as a 64-bit double-precision float, as usaif's overall mileage the majority of the time.
676.61164962090924746
For mapcat, no such luck most of the time; for his region order the rounding error works out a little different, the numbers aren't equal, and no clinch is detected.
676.61164962090913377
LOL, meanwhile michih's mileage of 8235.36727394201443531 is greater than deua's total of 8235.36727394201261632. Not equal = no clinch. That's not fair now is it!
The order of the regions' insertion into the table is deterministic. The hash of the key, not quite. The key is the Region object's memory address. The objects are dynamically allocated on the heap, and their addresses can change between runs. Seems fairly stable for the most part though; while I've only observed insertion order having an effect previously (https://forum.travelmapping.net/index.php?topic=5379.msg30521#msg30521), now it's confirmed -- changing memory addresses affect key hashes, which affect the order of storage in the unordered_map, which affects the order in which mileages are summed, which affects cumulative rounding errors, which affect whether a user's traveled mileage == a system's mileage, which affects whether a system is flagged as clinched.
Well. That was a bit of fun now wasn't it!
And now back to the actual programming -- to remove the kluge, and speed things up a tiny bit in the process, without introducing a bug.
after (https://github.com/TravelMapping/DataProcessing/blob/525c2b6e2a29c6e1b34e500834164b589ce9e5cb/siteupdate/cplusplus/classes/TravelerList/userlog.cpp#L14):
std::ofstream dbg(Args::logfilepath+"/users/"+traveler_name+".dbg");
change (https://github.com/TravelMapping/DataProcessing/blob/525c2b6e2a29c6e1b34e500834164b589ce9e5cb/siteupdate/cplusplus/classes/TravelerList/userlog.cpp#L42):
{ active_systems_clinched++;
dbg << h->systemname << '\n';
}
in here (https://github.com/TravelMapping/DataProcessing/blob/525c2b6e2a29c6e1b34e500834164b589ce9e5cb/siteupdate/cplusplus/classes/TravelerList/userlog.cpp#L44):
// having fun with some debugging even though I created the bug on purpose and know how to fix it!
if (traveler_name == "michih" && h->systemname == "deua"
|| traveler_name == "mapcat" && h->systemname == "usaif"
|| traveler_name == "oscar" && h->systemname == "usaif")
{ auto hit = h->mileage_by_region.begin();
auto tit = system_region_mileages.at(h).begin();
while(hit != h->mileage_by_region.end())
{ sprintf(fstr, "%.17f", tit->second); dbg << tit->first->code << '\t' << fstr << '\t';
sprintf(fstr, "%.17f", hit->second); dbg << hit->first->code << '\t' << fstr << '\n';
++tit;
++hit;
}
}
-
https://github.com/TravelMapping/DataProcessing/pull/584