Author Topic: Unusual new behavior for region.php  (Read 12245 times)

0 Members and 1 Guest are viewing this topic.

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2732
  • Last Login:Today at 10:27:44 am
Re: Unusual new behavior for region.php
« Reply #30 on: March 10, 2023, 09:43:52 pm »
Jim, I have a bleeding-edge executable at /home/yakra/ytm/DataProcessing/siteupdate/cplusplus/siteupdate. Wanna give it a try tonight, multi-threaded?

Running now with 8 threads.

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2732
  • Last Login:Today at 10:27:44 am
Re: Unusual new behavior for region.php
« Reply #31 on: March 10, 2023, 10:02:47 pm »
Jim, I have a bleeding-edge executable at /home/yakra/ytm/DataProcessing/siteupdate/cplusplus/siteupdate. Wanna give it a try tonight, multi-threaded?

Running now with 8 threads.

Completed.

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4234
  • Last Login:February 13, 2024, 07:19:36 pm
  • I like C++
Re: Unusual new behavior for region.php
« Reply #32 on: March 12, 2023, 05:13:53 pm »
TLDR, no real reason to waste your time reading this.
Mostly talking to myself here; decided to save all this rather than just bin it.

The shootout!

it looks like I'll be dropping 5 of these alternatives from consideration right away.
Only 4 are out; 1 is back in, at least temporarily.
Turns out I was grabbing the wrong timestamps from the siteupdate.log files, and not measuring one of the tasks that slowed down in order to speed other tasks up. This made the newer alternatives artificially look better in comparison to siteupdate_2.
Meanwhile, 10 more alternatives were added, making 35 in total. :P More on this below.

siteupdate_3mtx, 2mtx & 1mtx were 3 different flavors of Option 1 listed under "What to do" here. The idea was to experiment with having fewer mutexes than the original 3, and balance reducing lock contention with reducing lock overhead. Less overhead wins; siteupdate_1mtx generally performed best, but the siteupdate_2+ alternatives outperformed all these; they were 3 of the 4 options dropped from considertion.

siteupdate_1B was proposed to decrease mutex overhead by creating a temporary hash table. This one was absolute bottom of the barrel everywhere except on bsdlab, where its performance fell between siteupdate_3mtx & 2mtx.

siteupdate_2 as revised here, when Computing stats per traveler, iterates with each thread processing a Region, and avoids bringing back per-traveler mutexes at the cost of setting up hash tables' key/value pairs in a threadsafe manner in another function beforehand. CompStatsRThread/compute_stats_r & CompStatsTThread/compute_stats_t are still separate entities. This is the alternative that was mistakenly removed from consideration then put back on the table; I'll weigh it against whatever gets selected from The Big 30 in the final round.

The Big 30:
These all re-unify CompStatsRThread & CompStatsTThread into a single CompStatsThread that, again, iterates with each thread processing a Region, as outlined in the last set of bullet points at the bottom of this post.

So there are 30 remaining alternatives. 2 times 3 times 5.
For those unfamiliar with brace expansion, hopefully this will still make some sense from context:
siteupdate_{3,4}{m,s,S}{t,ta,r,pa,ra}

The 4 family attempted to improve on 3 by storing a few variables to eliminate some redundant dereferencing & arithmetic. In practice, this didn't have much effect.
The 3xx family was previously referred to as 2xx variants upthread, confusingly. :(

m vs s: Right now, this code is threadsafe, with each thread processing a HighwaySystem. However, switching to processing a Region will change that, and we'll either need a mutex, or to setup the key/value pairs in advance.
The capital S family is where the 10 new alternatives came in -- m combined the HighwaySystem mutexes as proposed in yakra#210 to preserve my sanity & code organization/readability, while s didn't really need to do that & went no-build. After noticing some counterintuitive benchmark results on lab2, I decided to go for a more apples-to-apples comparison, and S combines the mutexes too. Prior benchmarks suggested a slight speedup anyway.

t is more of a no-build compared to r -- it continues to keep a running system_mileage subtotal, and add it to r->system->mileage_by_region[this] afterwards, while r defines system_mileage as a reference to r->system->mileage_by_region[this] and adds to it directly. So we have one less arithmetic operation per route, and slightly cleaner code. No reason to use t, really. :P

ta, ra, pa: 'a' is for "at". Background is at #517; search for "No-Build". Leaving this ugly/clunky block of code as it was performed better than changing to a more concise
system->mileage_by_region[region] += system_mileage;
It's possible that, two days later, using the clang -O3 flag optimized away any/most of the difference. I don't remember testing that at the time; gonna have to do that now. :D Annnd... it does appear to optimize that away, yes.
ta is no-build here, still adding/assigning the total in the try/catch blocks.
Would-be ra is less straightforward -- the reference can't be declared in the try/catch blocks due to scope limitations, and can't be declared before that as it must be assigned upon declaration.
pa declares a pointer first, assigns it in the try/catch blocks, and dereferences if when adding to *system_mileage.
ra declares a pointer first, then uses it to get a reference right after the try/catch.

The winner: siteupdate_3Sr!

S slightly outperforms s, and has cleaner code.
S slightly outperforms m, though it's very close. I just like the idea of going mutex-free too. :)
No significant difference between 3 & 4. Going with 3 for cleaner code.
For the remaining "Big 5", no real observable difference in speed; r has the cleanest code, and in theory is faster than t even if not noticeably so in practice.

Taking another look at 2 compared to 3Sr, it's in the same ballpark. Slower on 6/7 machines, faster by about 12 ms on average on lab1. Just noise in the data.
Disadvantage of 3Sr: key/value pair setup for HighwaySystem::mileage_by_region during ReadWptThread.
Advantages of 3Sr: Less HighwaySystem mutex overhead; less pointer-chasing & arithmetic; no region mutexes; half the iterating thru routes because only one threaded ComputeStats task, not 2.
Plus, cleaner code -- fewer threads, fewer functions, fewer files, and fewer lines in what's left.

3Sr has the 2nd-cleanest code overall, with only 5 more lines than 3mr.

Anyway! Enough faffing about; I've waster a staggering amount of time. Time to delete all the other experimental HighwayData branches and get to work on committing this.



Dropping this here as I delete dead branches, in care I wanna refer back to it in the future:
Code: [Select]
3mtx 7b090e1be7a05fb173f27f0ccb1d83951c502707
2mtx bc71d69b7697ce22974b94137d13011c6fc6334f
1mtx b031cd744eab66d35078df29971ffd59e2019d77
1B 1e9fd5803803e8a3442f290d0ac65b08816e06fd
2 1b6f07d72a205a7aace79e9f9dea71412d537892
3mt 5b4794db55e8a0f64d499d720622b67cd2773ae9
3mta eadec347d03e9c05280f2e4938ea4edfe964ad75
3mr 4f91059cf83d628c0aa350829304a83a58dd29f4
3mpa 43c6d014f8781c0c30ed4f65b845542743530186
3mra 62ea0e698519b795c6a6edb23572943028eec134
3st 836a5f4b203d06bb53634441c51eae3d161ba3bd
3sta e29c2d9f6bc188d8866a23d94726fe0711eede86
3sr 2f394b13510d6d9c69e0e72d19be6bf72b0f546f
3spa 1e8bd750e89849796b3079ca094a9a7df86703e3
3sra a2cb69eb94a5be1038c1116dc6a688f263f45c61
3St 8bb3052c31e40f888006781eb397cf0bb2d6f707
3Sta a368b2ef656727b5bd5e84bee45561a9512b3a73
3Sr 9198035f49ae24842941eaec8be17d898e1baf5b
3Spa a1c39077211e693f5a67b328bcccb962e28a715b
3Sra 372ea9ed3ee8f184a0ab16d4ff17df0067fd5544
4mt 933ce42c93d37970d7d0c53c9042f7545ebb768b
4mta 7582a07b53fd04df0d6072581d208135fc0ecd32
4mr df840c78da7e75088db08c6584a45a4ffb691299
4mpa d01b65d995eb76f928e061a2b7de312fff4f68f2
4mra 1dbfcacaab3089f3db0e835bffe8a1f32cb46dff
4st 50492dada8521bdd5b8fe449dd50bdf4a06d6993
4sta d58e3edcfaf3afb81f273311296baff1b86340bb
4sr 518210b631e62b2f438a8ae9c4d547d12d7d46f9
4spa e1cf7b2f28cba203c60163dea11ca5d76d0389b7
4sra cc82288859fe56981f2af06f5d7eb71b76becf5c
4St 4eac70852d19573e7d56e80ef93054351753b205
4Sta c5ebe2ebf78071533528b13cc0bbddf995afc84c
4Sr 63d085cb27463438e8937debc53fff09ac92dd96
4Spa fe5d268886959791fa9cb29ed39031447a4a5987
4Sra 95e5cf58413d77ffad00d5cb40279304c60c0987

CompStats3 0eaf81ae77e613f3802f15089584c1e83f81ae7d
2sr 83f42cfd4c86eabe5ded700cd9fb0bc81db6aad4
2mr 0112222c87860fe89f9a94256f5ab981bec096e8
CompStats2 1e9fd5803803e8a3442f290d0ac65b08816e06fd
y210_rebase 62b6dc6a0e9526532708a3b7d8ac0c49a100aaac



Final ToDo List:
Check wording in comments & fix if needed, incl CompStatsThread.cpp
Make sure all hash table pre-setup is conditioned out in siteupdateST -- or else done in siteupdate in the thread functions only.
Make sure siteupdateST still gives the same output.
Probably implement the rest of yakra#210, closing the issue.
Userlogs: Don't compare traveled vs total mileage, and instead see if num_con_rtes_clinched == everything in the system. Python too?
Housekeeping: refactor datachecks into HighwaySystem::route_integrity. Python too, to keep things similar.
Implement a few items from #574 since we already need to recompile things.

Clear TravelerList::clinched_segments after concurrency augments? Leave it in in case I make changes to DB population? Decide later.
Interactive rebase / squash
« Last Edit: March 28, 2023, 12:19:24 am by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2732
  • Last Login:Today at 10:27:44 am
Re: Unusual new behavior for region.php
« Reply #33 on: March 13, 2023, 09:30:44 pm »
Going back to the single-threaded one for tonight, @yakra let me know if you'd like me to use your current version for further testing on production site updates.

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4234
  • Last Login:February 13, 2024, 07:19:36 pm
  • I like C++
Re: Unusual new behavior for region.php
« Reply #34 on: March 19, 2023, 12:54:56 pm »
Should be good until the push to production, thanks.

In the meantime, I had some fun intentionally creating a bug I knew in advance how to fix:
I removed the float conversion kluge-around mentioned upthread, and saw what I could see.

OK. So now that segments are all summed up in system->route->segment order, including for traveler's regional totals again, can we remove the float conversion and the logs will be OK? Well, not quite. For two reasons:
  • For travelers, the segment lengths are added up one by one. For the overall region totals, since ComputeStatsThread parallelism phase 2, segment lengths are summed into a subtotal for each route, when are then added to the regional totals. This was done in order to greatly reduce mutex usage. So, the rounding errors will be a little different. (I see a way to fix/undo this, but that's off-topic for the moment.) Even still...
This fix was done, and now for overall regional totals, segment lengths are added up one by one, same as for per-traveler regional totals. In the same order.
This makes the "missing clinched system" bug far less frequent, but still possible. The only factor we have to worry about is...

  • As discussed above, HighwaySystem::mileage_by_region and TravelerList::system_region_mileages are no longer populated, and thus no longer iterated, in the same order.
Here's a list of systems where the bug can occur, those with 3 or more regions, clinched by 1 or more travelers. (Nota bene: not the most recent commits on BiggaTomato. HD @ 59afa75, UD @ 2126e76.)

system  regions clinchers
gbnam   3       1
gbnm    3       1
usany   3       2
usawv   3       2
usaif   13      2
deua    16      1


Of course, the more regions in a system, the more probable it is for the error to occur. I never observed it in any of the 3-region systems.
Every time, the buggy siteupdate version failed to recognize michih's clinch of deua.

For usaif, things get a little more interesting.
Almost every time, siteupdate fails to recognize mapcat's clinch.
2 or 3 times though, mapcat's clinch was recognized, and Oscar's was not.

This fits in with my theory...
that the sequence of items in an unordered_map depends on the order in which they're inserted, along with the hash of the key itself.
What happens most of the time is something like this -- here's the data from the last run:
VA FL AR NC LA AZ CA TN KY NY IN SC PA is the order of regions in usaif's overall mileage table.
SC NY CA TN KY NC FL AR AZ IN LA VA PA is the order of regions in Oscar's traveled mileage table.
VA SC LA NY KY IN NC PA FL AR AZ TN CA is the order of regions in mapcat's traveled mileage table.

For Oscar, his mileages summed up in that order magically come out to the same number, as a 64-bit double-precision float, as usaif's overall mileage the majority of the time.
676.61164962090924746
For mapcat, no such luck most of the time; for his region order the rounding error works out a little different, the numbers aren't equal, and no clinch is detected.
676.61164962090913377
LOL, meanwhile michih's mileage of 8235.36727394201443531 is greater than deua's total of 8235.36727394201261632. Not equal = no clinch. That's not fair now is it!

The order of the regions' insertion into the table is deterministic. The hash of the key, not quite. The key is the Region object's memory address. The objects are dynamically allocated on the heap, and their addresses can change between runs. Seems fairly stable for the most part though; while I've only observed insertion order having an effect previously, now it's confirmed -- changing memory addresses affect key hashes, affect the order of storage in the unordered_map, affects order in which mileages are summed, affects cumulative rounding errors, affect whether a user's traveled mileage == a system's mileage, affects whether a system is flagged as clinched.

Well. That was a bit of fun now wasn't it!
And now back to the actual programming -- to remove the kluge, and speed things up a tiny bit in the process, without introducing a bug.



after:
Code: [Select]
std::ofstream dbg(Args::logfilepath+"/users/"+traveler_name+".dbg");
change:
Code: [Select]
{ active_systems_clinched++;
dbg << h->systemname << '\n';
}

in here:
Code: [Select]
// having fun with some debugging even though I created the bug on purpose and know how to fix it!
if (traveler_name == "michih" && h->systemname == "deua"
 || traveler_name == "mapcat" && h->systemname == "usaif"
 || traveler_name == "oscar"  && h->systemname == "usaif")
{ auto hit = h->mileage_by_region.begin();
auto tit = system_region_mileages.at(h).begin();
while(hit != h->mileage_by_region.end())
{ sprintf(fstr, "%.17f", tit->second); dbg << tit->first->code << '\t' << fstr << '\t';
sprintf(fstr, "%.17f", hit->second); dbg << hit->first->code << '\t' << fstr << '\n';
++tit;
++hit;
}
}
« Last Edit: March 19, 2023, 01:04:01 pm by yakra »
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4234
  • Last Login:February 13, 2024, 07:19:36 pm
  • I like C++
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca