Site update now using C++ version
yakra:
--- Quote from: michih on December 29, 2022, 04:07:13 pm ---The initial loading of the classes takes about 30s
--- End quote ---
The good news here is that this step only happens when the siteupdate code has changed. gmake, the program datacheck.sh runs to build siteupdate, keeps track of this for us.
When it does need to be recompiled, only the modules that have changed need to be recompiled, so the process should be a lot faster on future runs.
It'll only get near that original 30s again when files (like Waypoint.h) that are essential building blocks of the program and #included in a lot of different modules have changed.
The less-good news is that Waypoint.h gets updated fairly often; it's where all the datacheck function prototypes are. :) It won't make everything recompile -- just a lot of it.
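To get a rough feel for how much a change to one header would trigger, here is a minimal sketch that lists the source files directly including it. The function name includers is made up for illustration, and it only sees direct #include lines; gmake's real dependency tracking also follows headers that include other headers, so treat the count as a lower bound.

```shell
#!/bin/sh
# includers HEADER DIR
# Hypothetical helper: lists files under DIR that directly #include HEADER,
# either by bare name or via a relative path ending in /HEADER.
# Transitive includes are NOT followed, so this undercounts what gmake rebuilds.
includers() {
    # Escape dots so "Waypoint.h" is matched literally in the regex.
    h=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -rlE "^[[:space:]]*#[[:space:]]*include[[:space:]]*[\"<]([^\">]*/)?$h[\">]" "$2" | sort
}

# Example, run from the directory that holds the C++ sources:
#   includers Waypoint.h . | wc -l
```

The bigger that number is relative to the total number of modules, the closer a rebuild after touching that header gets to a full recompile.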
--- Quote from: michih on December 29, 2022, 04:07:13 pm ---my first data check using C++ took just 44s, the 2nd one 40s.
--- End quote ---
Good news here is, we can easily do even better than this!
datacheck.sh and localupdate.sh use siteupdateST, a single-threaded variant. This was done back before getting the standard multi-threaded version to compile & run on FreeBSD.
Using the multi-threaded version will make things even faster.
How many threads to use? Things to consider:
* Where's the sweet spot for running a production site update, before efficiency decreases and total run time increases? My tests have been using the old-school magnetic HD. Would results be better writing to /fast?
* Should we use a slightly smaller number, below some point of diminishing returns, and leave some cores free for noreaster to perform its other duties?
* Where's the sweet spot for running siteupdate --errorcheck? This mode skips nmp_merged writing and subgraph generation, which don't scale well, so siteupdate --errorcheck should scale more gracefully to a larger number of threads.
* Should we use a lower number of threads anyway, to avoid hogging all noreaster's resources? How many contributors are likely to run datacheck.sh simultaneously?
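One way to turn timing runs into an answer to the questions above is sketched below. It assumes you have already timed siteupdate at several thread counts and written one "threads seconds" pair per line; the function name pick_threads and the tolerance idea are illustrative assumptions, not something the siteupdate scripts provide. It prints the smallest thread count whose run time is within a given percentage of the fastest run, which leaves the remaining cores free for other work.

```shell
#!/bin/sh
# pick_threads TOLERANCE_PERCENT
# Hypothetical helper: reads "<threads> <seconds>" lines on stdin and prints
# the smallest thread count whose time is within TOLERANCE_PERCENT of the
# fastest measured run. Exits non-zero if no data was read.
pick_threads() {
    awk -v tol="$1" '
        NF >= 2 {
            threads[++n] = $1 + 0
            secs[n] = $2 + 0
            if (n == 1 || secs[n] < best) best = secs[n]
        }
        END {
            if (n == 0) exit 1
            limit = best * (1 + tol / 100)
            pick = -1
            for (i = 1; i <= n; i++)
                if (secs[i] <= limit && (pick < 0 || threads[i] < pick))
                    pick = threads[i]
            print pick
        }'
}

# Example with made-up timings (NOT real measurements from noreaster):
#   printf '%s\n' '1 40' '2 24' '4 15' '8 14.5' '16 15.2' | pick_threads 5
```

With those made-up numbers and a 5% tolerance, the fastest run is 14.5s at 8 threads, but 4 threads is already within the tolerance, so the sketch prints 4.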
--- Quote from: michih on December 29, 2022, 04:07:13 pm ---@yakra + @Jim: Will you maintain the Python version in the future
--- End quote ---
--- Quote from: Jim on December 29, 2022, 07:03:21 pm ---In the short term, I'd like to keep the two versions aligned. But once I get a little more familiar with the C++ code and we've been using it for a while with no problems, we'll probably retire the Python version.
--- End quote ---
My thoughts as well. I'll continue to maintain the Python version, for a while at least. It'll be good to have it around as a fallback in case something goes wrong.
--- Quote from: michih on December 29, 2022, 04:07:13 pm ---should all (hwy data managers) switch to C++?
--- End quote ---
Yes!
* It's way way faster!
* It will be the production version of the code, with Python to be retired at some point.
* Gimme bug reports! ;D Hopefully bugs should be few and far between, but one never knows. I test out the program when updating it, but usually with known good commits, and/or whatever's in the master HighwayData branch at the time. There's always the possibility that with just the right dataset, with something that fails (or even should pass) datacheck, something unexpected could occur that I wouldn't happen to observe on my own. The more eyes on it the better.
michih:
--- Quote from: yakra on December 29, 2022, 10:56:47 pm ---
--- Quote from: michih on December 29, 2022, 04:07:13 pm ---should all (hwy data managers) switch to C++?
--- End quote ---
Yes!
* It's way way faster!
* It will be the production version of the code, with Python to be retired at some point.
* Gimme bug reports! ;D Hopefully bugs should be few and far between, but one never knows. I test out the program when updating it, but usually with known good commits, and/or whatever's in the master HighwayData branch at the time. There's always the possibility that with just the right dataset, with something that fails (or even should pass) datacheck, something unexpected could occur that I wouldn't happen to observe on my own. The more eyes on it the better.
--- End quote ---
Highway data managers should run data verification (running the site update program to generate the same logs, stats, and database file that are produced as part of the regular site update process) before submitting a pull request. Otherwise the production site update may fail, and then a) Jim has to deal with the errors, b) the site will not be updated that day, or c) the site is "down" for hours until Jim can address the issue.
Thus, I think it is essential that all highway data managers switch to the C++ version with their next pull request, because the Python and C++ code is not 100% the same. A successful data check with Python is no longer relevant, since the production site update now runs the C++ version. We only have to change one command.
--- Quote from: yakra on December 29, 2022, 03:45:49 pm ---For those who want to try out the C++ datacheck.sh on noreaster, just change your directory to siteupdate/cplusplus instead of siteupdate/python-teresco.
--- End quote ---
--- Quote from: michih on December 29, 2022, 04:07:13 pm ---I simply run:
--- Code: ---cd ~/DataProcessing/siteupdate/cplusplus
git pull
sh datacheck.sh
--- End code ---
--- End quote ---
If bugs arise, so what. They will also arise on the production server.
yakra:
Topic split: Crash/core dump and optimal number of threads
nezinscot:
I tried using the cplusplus version of datacheck this morning and received this error:
--- Code: ---[xxx ~]$ cd DataProcessing/siteupdate/cplusplus
[xxx ~/DataProcessing/siteupdate/cplusplus]$ sh datacheck.sh
datacheck.sh: updating TM repositories
Already up to date.
Already up to date.
datacheck.sh: creating directories
datacheck.sh: Building latest site update program...
gmake: 'siteupdateST' is up to date.
datacheck.sh: launching siteupdateST
ld-elf.so.1: /usr/local/lib/gcc10/libstdc++.so.6: version GLIBCXX_3.4.29 required by /home/xxx/DataProcessing/siteupdate/cplusplus/siteupdateST not found
--- End code ---
My sandbox seems to have an obsolete (or no) version of GLIBCXX. Do I need to recreate my sandbox, or did I miss a step?
yakra:
You have an old revision of the DataProcessing repo. Newer revisions build the siteupdate program with a different compiler.
--- Code: ---[xxx ~]$ cd DataProcessing/siteupdate/cplusplus
[xxx ~/DataProcessing/siteupdate/cplusplus]$ git pull
[xxx ~/DataProcessing/siteupdate/cplusplus]$ gmake clean #(you'll only need to do this once)
[xxx ~/DataProcessing/siteupdate/cplusplus]$ sh datacheck.sh
--- End code ---
(Jim, speaking of gmake clean: https://github.com/TravelMapping/DataProcessing/pull/569)
As an aside, it's weird to see the GLIBCXX error with siteupdateST. I thought that was only a thing with the multithreaded version. Either way, this will be fixed by recompiling with clang instead of GCC.
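For anyone who wants to confirm this kind of mismatch themselves, a minimal sketch follows. It assumes the strings utility is available on the machine; the helper name has_version is made up for illustration. The idea is to list the printable strings in the shared library named in the error and check whether the required version tag is among them.

```shell
#!/bin/sh
# has_version VERSION
# Hypothetical helper: reads a list of strings on stdin (one per line) and
# succeeds only if VERSION appears as a whole line. Whole-line, fixed-string
# matching means GLIBCXX_3.4.2 does not count as a match for GLIBCXX_3.4.29.
has_version() {
    grep -Fqx -- "$1"
}

# Example, using the library path and version tag from the error message:
#   strings /usr/local/lib/gcc10/libstdc++.so.6 | has_version GLIBCXX_3.4.29 \
#       && echo "provided" || echo "missing"
```

If the tag is missing, the binary was built against a newer libstdc++ than the one the runtime linker is finding, which fits the explanation above that newer revisions build with a different compiler.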