Jim, as a programmer and CS prof, maybe you'll find this stuff interesting
there's this one part of the code that's a bit of a kluge; ... I've always felt kind of nervous about it.
The kluge is here.
An early revision of "ComputeStatsThread Parallelism Phase 1" broke userlogs; the number of active & preview systems was consistently under-reported -- often being 0.
The culprit was the difference in rounding errors from adding up a series of doubles in different orders. A HighwaySystem's total mileage is calculated by iterating through HighwaySegments in system -> route -> segment order. A user's traveled mileage on that system changed from doing the same to iterating thru TravelerList::clinched_segments instead. And thus, the difference in rounding errors.
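To put the problem in miniature (toy values here, nothing to do with real segment lengths): the same doubles summed in two different orders can land a bit apart, and squeezing both totals through a 32-bit float, as the kluge's comparison does, makes the difference vanish.

```cpp
#include <iostream>
#include <iomanip>

int main()
{
	// IEEE 754 addition isn't associative: the same values summed in two
	// different orders can give totals that differ in their last bits.
	double forward = (0.1 + 0.2) + 0.3;
	double reverse = (0.3 + 0.2) + 0.1;
	std::cout << std::setprecision(17)
		  << "forward: " << forward << '\n'	// 0.60000000000000009
		  << "reverse: " << reverse << '\n';	// 0.59999999999999998
	std::cout << std::boolalpha
		  << (forward == reverse) << '\n'			// false as doubles
		  << ((float)forward == (float)reverse) << std::endl;	// true: the float
		  // conversion discards the differing low-order bits
}
```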
I justified this kluge-around on the basis that in practice there won't be any single segment whose mileage is lower than the minimum precision of a 32-bit float. But it never sat right with me. It's not the right thing to do!
But wait... shouldn't this have been broken before?
After all, the two totals being compared are calculated by iterating through the
HighwaySystem::mileage_by_region and
TravelerList::system_region_mileages unordered_maps.
Unordered maps. The name says it all, right? Won't data be coming out of these in any old order at all? Or is it deterministic?
I got curious and pasted
HighwaySystem* h;
// find the usai system
for (auto& hs : HighwaySystem::syslist)
  if (hs->systemname == "usai")
  { h = hs;
    break;
  }
// dump overall mileage for every region in usai
for (std::pair<Region* const, double>& rm : h->mileage_by_region)
  std::cout << rm.first->code << '\t' << rm.first->name << '\t' << rm.second << std::endl;
std::cout << std::endl;

TravelerList* yakra;
// find my TravelerList
for (auto& t : TravelerList::allusers)
  if (t->traveler_name == "yakra")
  { yakra = t;
    break;
  }
// dump my mileage in each of my regions on usai
for (auto& rm : yakra->system_region_mileages.at(h))
  std::cout << rm.first->code << '\t' << rm.first->name << '\t' << rm.second << std::endl;
return 0;
into yakra:master
(still at bc04537; I've not yet FFWDed to your more recent merge commit) here.
It spits out two tables:
- Mileage for all regions on usai
  The order of the regions differed from one machine to the next, but stayed the same in successive runs on each.
- My mileage in my regions on usai
  The order of the regions differed between successive runs using multiple threads. It stayed the same if I either:
  • ran siteupdate with only 1 thread, or
  • pasted the same code into my local WIP branch that sets up the hash tables in advance in Route::store_traveled_segments and ConcAugThread, as noted above.
This leads me to believe that the sequence of items in an
unordered_map depends on the order in which they're inserted, along with the hash of the key itself.
- HighwaySystem::mileage_by_region gets its insertions in a deterministic order, as we iterate thru systems & routes.
- In the WIP branch, TravelerList::system_region_mileages got its insertions in the same order the system+region combos were encountered in yakra.list.
- In the master branch, TravelerList::system_region_mileages gets its insertions in the order the HighwaySegment*s come out of TravelerList::clinched_segments.
This looks pretty stable with just one thread, but multiple threads have a big impact on where these objects are allocated in memory, and thus their positions in the clinched_segments set.
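Here's a quick standalone version of the same idea with plain int keys (exact output is implementation-defined, so take it as a demonstration rather than a guarantee; with pointer keys like Region*, the hash is essentially the pointer value itself, which is how memory allocation order gets into the act):

```cpp
#include <iostream>
#include <unordered_map>

int main()
{
	// Two maps with identical contents, built in opposite insertion orders.
	std::unordered_map<int, double> fwd, rev;
	for (int k = 1; k <= 8; k++) fwd[k] = k * 1.5;
	for (int k = 8; k >= 1; k--) rev[k] = k * 1.5;

	// operator== compares contents without regard to order...
	std::cout << std::boolalpha << (fwd == rev) << '\n';	// true

	// ...but iteration can visit the elements in different sequences,
	// so summing the mapped values accumulates rounding error differently.
	for (auto& kv : fwd) std::cout << kv.first << ' ';
	std::cout << '\n';
	for (auto& kv : rev) std::cout << kv.first << ' ';
	std::cout << '\n';
}
```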
OK. So now that segments are all summed up in system->route->segment order, including for travelers' regional totals again, can we remove the float conversion and have the logs come out OK? Well, not quite. For two reasons:
- For travelers, the segment lengths are added up one by one. For the overall region totals, since ComputeStatsThread parallelism phase 2, segment lengths are summed into a subtotal for each route, which is then added to the regional totals (see the sketch after this list). This was done in order to greatly reduce mutex usage. So, the rounding errors will be a little different. (I see a way to fix/undo this, but that's off-topic for the moment.) Even still...
- As discussed above, HighwaySystem::mileage_by_region and TravelerList::system_region_mileages are no longer populated, and thus no longer iterated, in the same order.
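Roughly the two accumulation patterns I'm talking about, sketched with placeholder names (region_mtx, region_total, etc.) rather than the real siteupdate members:

```cpp
#include <mutex>
#include <vector>

std::mutex region_mtx;		// guards the shared regional total (placeholder)
double region_total = 0;	// shared across ComputeStats threads (placeholder)

// Phase-2 style for the overall regional totals: sum each route into a local
// subtotal, then fold that into the regional total under a single lock.
// One mutex acquisition per route instead of one per segment.
void add_route(const std::vector<double>& segment_lengths)
{
	double subtotal = 0;
	for (double len : segment_lengths)
		subtotal += len;	// rounding accumulates in the subtotal...
	std::lock_guard<std::mutex> lock(region_mtx);
	region_total += subtotal;	// ...plus one more rounding step here
}

// Traveler style: every segment length is added straight onto the running
// total, so the intermediate roundings happen at different magnitudes and
// the final double can differ in its last bits.
void add_segments(double& total, const std::vector<double>& segment_lengths)
{
	for (double len : segment_lengths)
		total += len;
}
```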
OTOH, we can get rid of the float conversion after all if we don't compare traveled vs total mileage, and instead check whether num_con_rtes_clinched equals the number of connected routes in the system. As an added bonus, this eliminates a call to HighwaySystem::total_mileage and saves us a wee bit of time.
WRT ComputeStatsThread parallelism phase 2:
Forget what I said upthread about "since we'd still keep the changes from ComputeStatsThread parallelism phase 2." Iterating by region isn't just for CompStatsTThread; it can benefit CompStatsRThread too.
- We'd no longer need a mutex here, and can then simplify away that block of code (and these temporaries) completely, instead adding the segment lengths directly to the regional totals.
- We would need a mutex around this stuff since each thread no longer handles a HighwaySystem, unless the tables are set up in advance. Do whichever performs better.
- This raises a question... Chasing a few pointers and doing a little arithmetic is pretty inexpensive. What would perform better, having that or a hash table lookup/insertion in a mutex?
  Again, though, there's the option to set up the tables in advance. HighwaySystem construction seems a natural place to do this, but that's still single-threaded for the time being. Multi-threaded options include ReadWptThread and RteIntThread. Got some ideas up my sleeve on how to minimize lookups while doing this.
- We could add to system->mileage_by_region[region] segment-by-segment without having to muck about with a per-route system_mileage subtotal OR multiple hash table lookups, if we obtain a reference before the for loop and add to that reference (see the sketch after this list).
- Iterating both CompStatsRThread and CompStatsTThread by region means we could simplify the code and reunify them, effectively undoing all of #386. This eliminates the need to store concurrency counts in the HighwaySegment object for future use, saving us 16 bytes per segment.
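And the reference trick from a couple bullets up, sketched with placeholder types and member names (the key detail being that references into an unordered_map aren't invalidated by later insertions or rehashes, only by erasing that element, so one grabbed before the loop is safe to keep adding to):

```cpp
#include <unordered_map>
#include <vector>

// Stand-in types just for the sketch; not the real siteupdate classes.
struct Region {};
struct HighwaySegment { double length; };
struct Route { Region* region; std::vector<HighwaySegment*> segments; };

void add_route_mileage(std::unordered_map<Region*, double>& mileage_by_region, Route* r)
{
	// One hash lookup/insertion before the loop; no per-route subtotal,
	// no repeated lookups inside the loop.
	double& total = mileage_by_region[r->region];
	for (HighwaySegment* s : r->segments)
		total += s->length;
	// In the multithreaded setup this still needs either a mutex around the
	// lookup/insertion or the tables populated in advance, as discussed above.
}
```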
Benchmarking
TODO DONE: make a list of tasks affected by all the changes. Some will be faster; some will be slower. And for completeness' sake, tasks following changed ones, LOL. As observed once upon a time before, changes in what goes on in one task can affect the number of cache misses (and thus performance) in the next, even if the code in these subsequent tasks is itself unchanged.