Author Topic: Unusual new behavior for region.php  (Read 529 times)

0 Members and 1 Guest are viewing this topic.

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2532
  • Last Login:Today at 08:00:06 am
Re: Unusual new behavior for region.php
« Reply #15 on: January 22, 2023, 09:01:59 am »
Last night's update ran Python.  I'll do one now with single-threaded C++.

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2532
  • Last Login:Today at 08:00:06 am
Re: Unusual new behavior for region.php
« Reply #16 on: January 22, 2023, 09:50:39 am »
With this morning's one-thread C++:

Code: [Select]
> select * from clinchedoverallmileagebyregion where region='LIE' order by activemileage;
+--------+----------------+--------------------+----------------------+
| region | traveler       | activeMileage      | activePreviewMileage |
+--------+----------------+--------------------+----------------------+
| LIE    | Aseluit        | 0.9841289652476017 |   0.9841289652476017 |
| LIE    | panda80        | 1.1540176816881598 |   1.1540176816881598 |
| LIE    | charliezeb     |  1.742166048369596 |    1.742166048369596 |
| LIE    | M3200          |  2.068760840780408 |    2.068760840780408 |
| LIE    | dantheman      |  4.626547316794324 |    4.626547316794324 |
| LIE    | griffith       | 7.1963766741696755 |   7.1963766741696755 |
| LIE    | martin0102     | 7.1963766741696755 |   7.1963766741696755 |
| LIE    | sneezy         | 7.1963766741696755 |   7.1963766741696755 |
| LIE    | spinoza        | 7.1963766741696755 |   7.1963766741696755 |
| LIE    | the_spui_ninja | 7.9466752759834405 |   7.9466752759834405 |
| LIE    | bickendan      |  8.180505639417277 |    8.180505639417277 |
| LIE    | cougar1989     |  8.180505639417277 |    8.180505639417277 |
| LIE    | dave1693       |  8.463950001283509 |    8.463950001283509 |
| LIE    | philimon       |  8.463950001283509 |    8.463950001283509 |
| LIE    | niels          |  9.100692957671601 |    9.100692957671601 |
| LIE    | bartpetat      |  9.846927579160567 |    9.846927579160567 |
| LIE    | cinx           | 13.727240274465924 |   13.727240274465924 |
| LIE    | michih         | 13.727240274465924 |   13.727240274465924 |
+--------+----------------+--------------------+----------------------+

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #17 on: January 22, 2023, 10:51:33 am »
Code: [Select]
beg=`grep -n 'INSERT INTO clinchedOverallMileageByRegion' $file | cut -f1 -d:`
end=`grep -n 'CREATE TABLE clinchedSystemMileageByRegion' $file | cut -f1 -d:`
tail -n +$beg $file | head -n $(expr $end - $beg) | egrep "^,?\('LIE',"
TravelMapping-2022-12-29@21:42:10.sql, the first update done with siteupdateST, looked fine.
TravelMapping-2023-01-02@21:53:23.sql (are we missing 2023-01-01?), the first update with multi-threaded siteupdate, has the differences in the least significant digits.

The concept at play here is that of cumulative rounding errors when working with finite-precision floating-point numbers.
Another thing affecting the result in the least significant digits is the order in which numbers are added together.

A simplified example -- if we can only ever have 3 sig figs each step of the way:
1.67 + 1.67 + 30 = 33.3
30 + 1.67 + 1.67 = 33.4

The values for the clinchedSystemMileageByRegion DB table are affected by this.
They're added together in whatever order they come out of a hash table.

The problem is with this little function right here:
Code: [Select]
void HighwaySegment::compute_stats_t(TravelerList* t)
{ // credit active+preview for this region, which it must be
// if this segment is clinched by anyone
t->active_preview_mileage_by_region[route->region] += length/active_preview_concurrency_count;

// credit active only for this region
if (route->system->active())
t->active_only_mileage_by_region[route->region] += length/active_only_concurrency_count;

// credit this system in this region in the messy unordered_map of unordered_maps
t->system_region_mileages[route->system][route->region] += length/system_concurrency_count;
}
More precisely, not in the function itself, but in what's going on when it gets called...

siteupdateST behaves essentially the same way siteupdate.py does:
We iterate through each HighwaySystem, then each Route, then each HighwaySegment, then call the function for each traveler it's clinched_by. Travelers get their segments credited in the same order (system->route->segment) every time, and the cumulative rounding error is always the same.

Threaded siteupdate works differently:
In order to avoid having to wait for mutexes while multiple threads processing different HighwaySystems compete for access to TravelerList objects, each thread gets a TravelerList to process, iterates through their clinched_segments & calls compute_stats_t. Problem is, these segments are stored in an unordered_set. The name says is all; when we iterate through this container, its elements are processed in whatever order, each traveler's regional totals get rounded off a little differently at each stage of the game, and the cumulative rounding errors come out a little different in the end.

What to do:
  • The simpler solution is to undo the changes made in ComputeStatsThread parallelism phase 1. We'd iterate the same way as before (system->route->segment), lock travelers' regional totals with mutexes, and just live with the performance hit. It'd not be as bad as the charts at #386 show, since we'd still keep the changes from ComputeStatsThread parallelism phase 2.
  • Another solution that could reduce lock contention... First, during Route construction, add each Route to a list in its Region. A small amount of overhead here, but not much. Then, when Computing stats per traveler, have each thread process a Region. This way, segments are processed in the same order -- system->route->segment. For the most part, we'd not need mutexes, as each thread would be reading/writing different areas of travelers' hash tables. The main snag is when a given region isn't in a traveler's tables yet, and we have to create a new key/value pair. I've an idea what to do here. It'd involve a little lock contention (and overhead), but not as much as in solution #1.
  • Potentially, implement #1 near-term and then work on #2.



system.php

Just finishing up a Python update for tonight.  I can run ST tomorrow if you'd like, or re-run tonight's if it's helpful to see it sooner.
Python is fine, thanks.
Last night's update ran Python.  I'll do one now with single-threaded C++.
Excellent, thanks! I was just about to request that.

system.php, when displaying mileage in all regions, sums up multiple values from the clinchedSystemMileageByRegion table. Surely in whatever order they come out of the table. But these data aren't inserted into the table in a deterministic order, again due to being stored in an unordered_map. It might take a bit of searching to find this effect.

Edit: I did some searching. In 26 different systems covering multiple regions. Didn't observe the problem with the rankings, but I still believe it can happen.
Maybe I'll make 100 different .list files all clinching usaif or something, and update lab2. :)
for n in {100..199}; do
  for f in `tail -n +2 usaif.csv | tr ' ' '%' | sed -r 's~usaif;([A-Z]{2});.*;(.*);~\1/usaif/\2.wpt~'`; do
    echo -n $f | sed -r -e 's~.*([a-z]{2})\.i0*~\1 i-~' -e 's~\.wpt$~ ~'
    head -n 1 ../$f | cut -f1 -d' ' | tr '\n' ' '
    tail -n 1 ../$f | cut -f1 -d' '
  done > ~/tm/UserData/list_files/melvin$n.list
done

Schmedit: LOL still nothing. I guess fixing this is low priority? 8)

Potential soution:
  • In "the messy unordered_map of unordered_maps", replace the outer unordered_map with an ordered map. As long as systems get sorted the same way for everybody, sort criterion isn't important. Why not keep it fast & simple, and use the HighwaySystem's memory address.
    Potential oddities: HighwaySystem objects could be ordered differently in memory in separate runs of siteupdate. This would mean that while people who have the same mileage in a system have the same mileage in each individual site update, mileage could fluctuate a tiny little bit from one update to the next. In practice this won't be obsevable, as mileages are rounded to 2 decimal places on the userpages and in userlogs & stats CSVs. Unless there's another freak occurrence like what happened with Oscar's mileage on ME103 where a difference 1/3 the width of a human hair made the difference between rounding off to one semi-trailer and the next. This is possible, but not probable. I don't believe I've ever observed it when comparing the output of different siteupdate versions.
  • Longer-term, we can eliminate the (remote) possibility of that happening by storing HighwaySystem objects sequentially in memory rather than allocating them one-by-one on the heap. Eventually, I'd like to do that for a lot more of the classes for performance purposes. There are complications in doing so, but not insurmountable ones.
« Last Edit: January 22, 2023, 04:39:34 pm by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #18 on: January 25, 2023, 09:06:27 am »
WRT Option #2:
when Computing stats per traveler, have each thread process a Region. This way, segments are processed in the same order -- system->route->segment. For the most part, we'd not need mutexes, as each thread would be reading/writing different areas of travelers' hash tables. The main snag is when a given region isn't in a traveler's tables yet, and we have to create a new key/value pair. I've an idea what to do here. It'd involve a little lock contention (and overhead), but not as much as in solution #1.
I knew this'd be complicated, but it's more complicated than I first thought.

Let's say we start by trying to find the region. If it's not found, then lock the mutex, insert a new element, and unlock.
Between the time we fail to find the region and the time we acquire the lock, another thread may have inserted a value for that region. (No surprises here; most existing threads already need to check for these lightning-quick state changes.) No problem; just use the [] operator. There'd be a little redundant searching & hashing, but that's the price we pay for thread safety.

But what if we DO find the region? We'd have an iterator we'd need to dereference. Which may become invalidated between when we find it and when we attempt to dereference:
On most cases, all iterators in the container remain valid after the insertion. The only exception being when this function inserts a new element and this forces a rehash. In this case, all iterators in the container are invalidated.
Oy vey.
This would mean using a mutex here too and getting a new reference or iterator that we know is valid. At which point things are less efficient than option #1 in my last post.

References to elements in the unordered_map container remain valid in all cases, even after a rehash.
Part of me wants to cowboy thru the whole process, no-mutex, using the [] operator (which returns a reference).
Probably a bad idea though -- I could see multiple threads trying to do an insertion at the same time, and getting the container into a bad state.
May still try it though as a learning exercise; see what I can see. :pan:
Edit: It crashes. No big surprise here. :)

Assuming that's a no-go, what are our options?
  • I've implemented Option #1 locally, improved relative to before #386 by going from 3 mutexes in the TravelerList class to 1. Decreased overhead outweighs increased contention.
    Takes 55% longer than no-build @ 4 threads on BiggaTomato. As always I'm curious to see how this scales to higher thread counts on the other machines, along with 2-mutex and 3-mutex flavors. :)
    A few more ideas to further reduce contention:
  • 1A: Move the compute_stats_t calls into the region->mtx.lock? No, wait. Multiple threads could be trying to insert new regions into a traveler's hash tables at once. Probably still a bit too cowboy. Scratch that! Plus it introduces unnecessary contention, locking the region for all travelers at once.
  • 1B: For a more threadsafe variant of the same idea, create a temporary unordered_map<TravelerList*,double[3]> while processing each Route, to store each traveler's mileage on it. No mutex required. Once we have the totals for the route, iterate through each traveler's totals and add them to the per-traveler hash tables. This still requires a mutex in the TravelerList class, but it's locked & unlocked once per Route instead of once per HighwaySegment. Decreased overhead. Same basic idea as in ComputeStatsThread parallelism phase 2.
    • 9,865,256 hash table lookups (the length of the clinched DB table + 4*the length of the clinchedRoutes DB table).
    • (as of HD @ 4488c228 / UD @ 4b879ee)
  • 2: As proposed above, iterate thru regions and cowboy thru the whole thing mutex-free after all. With one key difference: set up all the key/value pairs beforehand, most likely in Route::store_traveled_segments. This way, we wouldn't have to worry about multiple threads performing insertions, altering the container, and invalidating iterators.
    How would this perform vs 1B?
    • Pro: No mutexes at all. No contention, no overhead.
    • Con: Way more hash table lookups: 33,721,660 (4*the number of times store_traveled_segments is called + 4*the length of the clinched DB table).
    • Con: A wee bit of overhead during route construction.
  • 3: Best-of-both-worlds approach. Set up key/value pairs beforehand as in #2. No mutex. Update traveler mileage tables once per route instead of every segment.
    • 11,607,588 hash table lookups (4*the number of times store_traveled_segments is called + the length of the clinched DB table + 4*the length of the clinchedRoutes DB table).
  • Refactoring? The per-traveler regional mileage tables are stored in the TravelerList class, with Region*s as keys. They could in theory be stored in the Region class, with TravelerList*s as keys. This would remove the need to set up the tables in advance, but would slow down userlogs and stats CSVs. (The messy table of tables would have to be set up in advance no matter what.) Maybe a net loss in performance altogether; low priority.
« Last Edit: January 25, 2023, 09:29:31 am by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #19 on: January 25, 2023, 11:00:55 am »
Option 1B is slower than even the 3-mutex flavor of Option 1. Probably due to one or both of:
  • The hash table lookups themselves may be fairly cheap, while insertions are expensive, leading to the creation of the temporary hash table taking more time.
  • The expense of iterating through an unordered_map, however much that is.
This probably means Option 3 isn't viable.
Meanwhile, Option 2 crashes. No idea why yet.
« Last Edit: January 25, 2023, 11:19:17 am by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #20 on: January 25, 2023, 08:47:03 pm »
Meanwhile, Option 2 crashes. No idea why yet.
set up all the key/value pairs beforehand, most likely in Route::store_traveled_segments.
Just doing it here is insufficient.

Quote from: niels.list
AB AB1 ComRd BC/AB
AB AB93 BC/AB AB1_E

That's it for Alberta. More importantly, no TCH anywhere.
That means, no slot in their system_region_mileages for cantch.
When it comes time to call compute_stats_t:
• Using the [] operator means inserting new elements, means data races, means bad data in userlogs, stats CSVs, and the DB.
• Replacing it with unordered_map::at leads to unhandled exceptions, crashes, and the ability to diagnose the problem. ;)
Solution: Set up additional key/value pairs when Augmenting travelers for detected concurrent segments.
How many more hash table lookups is this? 4 for every occurrence of "Concurrency augment for traveler" in concurrencies.log. 3,788,896. For a total of 37,510,556. :'(
On the plus side, we're right back down 947,224 lookups/insertions, no longer having to insert into clinched_segments. Nix creating, maintaining, iterating & destroying the temporary to_add list. (O HAY! I can remove that from Alternative #1 too!)
« Last Edit: Today at 09:40:36 am by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #21 on: January 27, 2023, 10:42:49 am »
Jim, as a programmer and CS prof, maybe you'll find this stuff interesting :D

there's this one part of the code that's a bit of a kluge; ... I've always felt kind of nervous about it.
The kluge is here.
An early revision of "ComputeStatsThread Parallelism Phase 1" broke userlogs; the number of active & preview systems was consistently under-reported -- often being 0.
The culprit was the difference in rounding errors from adding up a series of doubles in different orders.
A HighwaySystem's total mileage is calculated by iterating through HighwaySegments in system -> route -> segment order.
A user's traveled mileage on that system went from the same to iterating thru TravelerList::clinched_segments. And thus, the difference in rounding errors.
I justified this kluge-around on that basis that in practice there won't be any single segment whose mileage is lower than the minimum precision of a 32-bit float. But it never sat right with me. It's not the right thing to do!

But wait... shouldn't this have been broken before?
After all, the two totals being compared are calculated by iterating through the HighwaySystem::mileage_by_region and TravelerList::system_region_mileages unordered_maps.
Unordered maps. The name says it all right? Won't data be coming out of these in any old order at all? Or is it deterministic?
I got curious and pasted
Code: [Select]
HighwaySystem* h;
for (auto& hs : HighwaySystem::syslist)
  if (hs->systemname == "usai")
  { h = hs;
break;
  }

for (std::pair<Region* const, double>& rm : h->mileage_by_region)
std::cout << rm.first->code << '\t' << rm.first->name << '\t' << rm.second << std::endl;

std::cout << std::endl;

TravelerList* yakra;
for (auto& t : TravelerList::allusers)
  if (t->traveler_name == "yakra")
  { yakra = t;
break;
  }

for (auto& rm : yakra->system_region_mileages.at(h))
std::cout << rm.first->code << '\t' << rm.first->name << '\t' << rm.second << std::endl;

return 0;

into yakra:master (still at bc04537; I've not yet FFWDed to your more recent merge commit) here.
It spits out two tables:
  • Mileage for all regions on usai
    The order of the regions differed from one machine to the next, but stayed the same in successive runs on each.
  • My mileage in my regions on usai
    The order of the regions differed between successive runs using multiple threads. It stayed the same if I either:
    • ran siteupdate with only 1 thread, or
    • pasted the same code into my local WIP branch that sets up the hash tables in advance in Route::store_traveled_segments and ConcAugThread as noted above.
This leads me to believe that the sequence of items in an unordered_map depends on the order in which they're inserted, along with the hash of the key itself.
  • HighwaySystem::mileage_by_region gets its insertions in a deterministic order, as we iterate thru systems & routes.
  • In the WIP branch, TravelerList::system_region_mileages got its insertions in the same order the system+region combos were encountered in yakra.list.
  • In the master branch, TravelerList::system_region_mileages gets its insertions in the order the HighwaySegment*s come out of TravelerList::clinched_segments.
    This looks pretty stable with just one thread, but multiple threads have a big impact on where these objects are allocated in memory, and thus their positions in the clinched_segments set.
OK. So now that segments are all summed up in system->route->segment order, including for traveler's regional totals again, can we remove the float conversion and the logs will be OK? Well, not quite. For two reasons:
  • For travelers, the segment lengths are added up one by one. For the overall region totals, since ComputeStatsThread parallelism phase 2, segment lengths are summed into a subtotal for each route, when are then added to the regional totals. This was done in order to greatly reduce mutex usage. So, the rounding errors will be a little different. (I see away to fix/undo this, but that's off-topic for the moment.) Even still...
  • As discussed above, HighwaySystem::mileage_by_region and TravelerList::system_region_mileages are no longer populated, and thus no longer iterated, in the same order.
OTOH, we can get rid of the float conversion after all if we don't compare traveled vs total mileage, and instead see if num_con_rtes_clinched == everything in the system. As an added bonus, this eliminates a call to HighwaySystem::total_mileage and saves us a wee bit of time.



WRT ComputeStatsThread parallelism phase 2:
Forget what I said upthread about "since we'd still keep the changes from ComputeStatsThread parallelism phase 2." Iterating by region isn't just for CompStatsTThread; it can benefit CompStatsRThread too.
  • We'd no longer need a mutex here, and can then simplify away that block of code (and these temporaries) completely, instead adding the segment lengths directly to the regional totals.
  • We would need a mutex around this stuff since each thread no longer handles a HighwaySystem, unless the tables are set up in advance. Do whichever performs better.
  • This raises a question... Chasing a few pointers and doing a little arithmetic is pretty inexpensive. What would perform better, having that or a hash table lookup/insertion in a mutex?
    Again, though, there's the option to set up the tables in advance. HighwaySystem construction seems a natural place to do this, but that's still single-threaded for the time being. Multi-threaded options include ReadWptThread and RteIntThread. Got some ideas up my sleeve on how to minimize lookups while doing this.
  • We could add to system->mileage_by_region[region] segment-by-segment without having to muck about with a per-route system_mileage subtotal OR multiple hash table lookups if we obtain a reference before the for loop, and add to that reference.
  • Iterating both CompStatsRThread and CompStatsTThread by region means we could simplify the code and reunify them, effectively undoing all of #386. This eliminates the need to store concurrency counts in the HighwaySegment object for future use, saving us 16 bytes per segment.



Benchmarking
TODO DONE: make a list of tasks affected by all the changes. Some will be faster; some will be slower.
And for completeness' sake, tasks following changed ones, LOL. As observed once upon a time before, changes in what goes on in one task can affect the number of cache misses (and thus performance) in the next, even if the code in these subsequent tasks is itself unchanged.
« Last Edit: January 31, 2023, 08:39:59 pm by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2532
  • Last Login:Today at 08:00:06 am
Re: Unusual new behavior for region.php
« Reply #22 on: January 27, 2023, 02:08:53 pm »
Wow, thanks again for all of this digging, @yakra.  I think next time I teach my Parallel Processing or OS or other classes where I talk about the dangers of concurrency ("what could possibly go wrong?"), I'll just use some of your threads on the topic here and on GitHub as assigned readings...

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #23 on: January 27, 2023, 08:34:33 pm »
Woe be to those who try to follow my incoherent ramblings :)

Thankfully, there's more than one way to skin this cat. Now it's just a matter of finding which one is fastest. After all, the C++ siteupdate program is all about The Quest For Speed :)
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4051
  • Last Login:Today at 10:39:13 am
Re: Unusual new behavior for region.php
« Reply #24 on: January 29, 2023, 05:10:26 pm »
Tasks to benchmark
  • Total run time (w/early termination to save time)
        Reading region|per-traveler
Primary tasks with their efficiency directly changed:
  • Reading systems list
        Reading systems list|Finding all
        Slower in v.2 due to populating Region::routes
  • Reading waypoints for all routes
        ReadWpt
        Slower in v.2s* due to HighwaySystem::mileage_by_region setup
  • Processing traveler list files
        ReadList
        Slower in v.2 due to traveler mileage table setup
  • Augmenting travelers for detected concurrent segments
        ConcAug
        Slower in v.2 due to traveler mileage table setup
        Faster in both: to_add removed
  • Computing stats
        CompStatsB
        Slower in v.1 due to mutexes and temporary table creation in v.1B
        Faster in v.2 due to prior hash table setup
Secondary tasks themselves unchanged that may be indirectly affected by cache misses due to changes in prior tasks:
  • Finding all .wpt files
        Finding all|Reading waypoints
  • Sorting waypoints in Quadtree
        QtSort
  • Searching for near-miss points
        NmpSearch
  • Writing route and label logs
        Writing route|augmenting
  • Writing highway data stats log file (highwaydatastats.log)
        highwaydatastats|per-traveler
« Last Edit: January 29, 2023, 10:33:09 pm by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca