Travel Mapping

User Discussions => Welcome & Notices => Topic started by: Jim on December 28, 2022, 10:10:42 pm

Title: Site update now using C++ version
Post by: Jim on December 28, 2022, 10:10:42 pm
Tonight's production site update was the first done with the C++ site update program written by @yakra.  It's much more efficient than the Python version, which I developed initially and which he has extended and improved greatly over the last few years, in parallel with the C++ development.

First, thanks to @yakra for taking this on and getting it all to this point.

Second, there's of course some chance that bugs will show up.  If you notice anything that looks like it's not quite right, let us know.
Title: Re: Site update now using C++ version
Post by: yakra on December 29, 2022, 03:45:49 pm
For those who want to try out the C++ datacheck.sh on noreaster, just change your directory to
siteupdate/cplusplus instead of siteupdate/python-teresco.
Then, type sh datacheck.sh as you normally would.
Title: Re: Site update now using C++ version
Post by: michih on December 29, 2022, 04:07:13 pm
Thanks! The data check is very, very quick! The initial loading of the classes takes about 30s but my first data check using C++ took just 44s, the 2nd one 40s. That helps a lot when we have to fix errors :)

@yakra + @Jim: Will you maintain the Python version in the future, or should all (hwy data managers) switch to C++?
In any case, the GitHub documentation (https://github.com/TravelMapping/DataProcessing/blob/master/SETUP.md) needs to be updated. It's linked from devel.php (https://travelmapping.net/devel/devel.php#tools).

Edit: I simply run:
Code:
cd ~/DataProcessing/siteupdate/cplusplus
git pull
sh datacheck.sh
Title: Re: Site update now using C++ version
Post by: Jim on December 29, 2022, 07:03:21 pm
In the short term, I'd like to keep the two versions aligned.  But once I get a little more familiar with the C++ code and we've been using it for a while with no problems, we'll probably retire the Python version.
Title: Re: Site update now using C++ version
Post by: Jim on December 29, 2022, 07:10:31 pm
Also: https://github.com/TravelMapping/DataProcessing/issues/548
Title: Re: Site update now using C++ version
Post by: yakra on December 29, 2022, 10:56:47 pm
Quote from: michih on December 29, 2022, 04:07:13 pm
The initial loading of the classes takes about 30s
The good news here is that this step only happens when the siteupdate code has changed. The program that datacheck.sh runs to build siteupdate (gmake) keeps track of this for us.
When it does need to be recompiled, only the modules that have changed need to be recompiled, so the process should be a lot faster on future runs.
It'll only get near that original 30s again when files (like Waypoint.h (https://github.com/TravelMapping/DataProcessing/blob/master/siteupdate/cplusplus/classes/Waypoint/Waypoint.h)) that are essential building blocks of the program and #included in a lot of different modules have changed.
The less-good news is that Waypoint.h gets updated fairly often; it's where all the datacheck function prototypes are. :) It won't make everything recompile -- just a lot of it.
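
To illustrate the timestamp rule that this kind of incremental build rests on, here is a rough sketch. It is not the project's actual Makefile logic: needs_rebuild is a hypothetical helper, and the file names in the usage comment are examples only. The idea is that a target gets rebuilt when it is missing or older than any file it depends on, which is why a change to a widely #included header triggers so much recompilation.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not part of the repo):
# mimics make's basic rule of rebuilding a target when it is missing or
# older than any of its prerequisites.
# Usage: needs_rebuild TARGET PREREQ...
needs_rebuild() {
    target=$1
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        # -nt is true when the prerequisite was modified more recently
        # than the target (supported by bash, dash, and FreeBSD sh).
        if [ "$prereq" -nt "$target" ]; then
            return 0
        fi
    done
    # Target exists and nothing it depends on is newer.
    return 1
}

# Example (file names are illustrative):
# needs_rebuild Waypoint.o Waypoint.cpp Waypoint.h && echo "would recompile"
```

A header that many modules list as a prerequisite makes this check succeed for each of them at once, so one edit fans out into many recompiles.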

Quote from: michih on December 29, 2022, 04:07:13 pm
my first data check using C++ took just 44s, the 2nd one 40s.
Good news here is, we can easily do even better than this!
datacheck.sh and localupdate.sh use siteupdateST, a single-threaded variant. This was done back before getting the standard multi-threaded version to compile & run on FreeBSD.
Using the multi-threaded version will make things even faster.

How many threads to use? Things to consider:

Quote from: michih on December 29, 2022, 04:07:13 pm
@yakra + @Jim: Will you maintain the Python version in the future
Quote from: Jim on December 29, 2022, 07:03:21 pm
In the short term, I'd like to keep the two versions aligned.  But once I get a little more familiar with the C++ code and we've been using it for a while with no problems, we'll probably retire the Python version.
My thoughts as well. I'll continue to maintain the Python version, for a while at least. It'll be good to have it around as a fallback in case something goes wrong.

Quote from: michih on December 29, 2022, 04:07:13 pm
should all (hwy data managers) switch to C++?
Yes!
Title: Re: Site update now using C++ version
Post by: michih on December 30, 2022, 02:20:23 am
Quote from: yakra on December 29, 2022, 10:56:47 pm
should all (hwy data managers) switch to C++?
Yes!
  • It's way way faster!
  • It will be the production version of the code, with Python to be retired at some point.
  • Gimme bug reports! ;D Hopefully bugs should be few and far between, but one never knows. I test out the program when updating it, but usually with known good commits, and/or whatever's in the master HighwayData branch at the time. There's always the possibility that with just the right dataset, with something that fails (or even should pass) datacheck, something unexpected could occur that I wouldn't happen to observe on my own. The more eyes on it the better.

Highway data managers should run data verification ("Run site update program to generate the same logs, stats, and database file that are produced as part of the regular site update process") (https://travelmapping.net/devel/devel.php#tools) before submitting a pull request. Otherwise the production site update may fail, and then a) Jim has to deal with the errors, b) the site will not be updated that day, or c) the site is "down" for hours until Jim can deal with the issue.
Thus, I think it is essential that all highway data managers switch to the C++ version with their next pull request, because the Python and C++ code is not 100% the same. A successful data check with Python no longer has any relevance or advantage, since the production site update is now run with the C++ version. We only have to change one command.

Quote from: yakra on December 29, 2022, 03:45:49 pm
For those who want to try out the C++ datacheck.sh on noreaster, just change your directory to
siteupdate/cplusplus instead of siteupdate/python-teresco.

Quote from: michih on December 29, 2022, 04:07:13 pm
I simply run:
Code:
cd ~/DataProcessing/siteupdate/cplusplus
git pull
sh datacheck.sh

If bugs arise, so what. They will also arise on the production server.
Title: Re: Site update now using C++ version
Post by: yakra on December 30, 2022, 08:18:51 pm
Topic split: Crash/core dump and optimal number of threads (https://forum.travelmapping.net/index.php?topic=5342)
Title: Re: Site update now using C++ version
Post by: nezinscot on January 19, 2023, 09:28:21 am
I tried using the cplusplus version of datacheck this morning and received this error:

[xxx ~]$ cd DataProcessing/siteupdate/cplusplus
[xxx ~/DataProcessing/siteupdate/cplusplus]$ sh datacheck.sh

datacheck.sh: updating TM repositories
Already up to date.
Already up to date.
datacheck.sh: creating directories
datacheck.sh: Building latest site update program...
gmake: 'siteupdateST' is up to date.
datacheck.sh: launching siteupdateST
ld-elf.so.1: /usr/local/lib/gcc10/libstdc++.so.6: version GLIBCXX_3.4.29 required by /home/xxx/DataProcessing/siteupdate/cplusplus/siteupdateST not found

My sandbox seems to have an obsolete (or no) version of GLIBCXX.  Do I need to recreate my sandbox, or did I miss a step?



Title: Re: Site update now using C++ version
Post by: yakra on January 19, 2023, 10:23:32 am
You have an old revision of the DataProcessing repo. Newer revisions build the siteupdate program with a different compiler.

[xxx ~]$ cd DataProcessing/siteupdate/cplusplus
[xxx ~/DataProcessing/siteupdate/cplusplus]$ git pull
[xxx ~/DataProcessing/siteupdate/cplusplus]$ gmake clean #(you'll only need to do this once)
[xxx ~/DataProcessing/siteupdate/cplusplus]$ sh datacheck.sh

(Jim, speaking of gmake clean: https://github.com/TravelMapping/DataProcessing/pull/569)



As an aside, it's weird to see the GLIBCXX error with siteupdateST. I thought that was only a thing with the multithreaded version. Either way, this will be fixed by recompiling with clang instead of GCC.
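
For anyone who wants to confirm this kind of mismatch before rebuilding, here is a minimal sketch. It lists the GLIBCXX version tags embedded in a libstdc++ file and checks for the one the error message names. has_glibcxx is a hypothetical helper, not part of the DataProcessing repo, and the library path in the usage comment is simply the one from the error message above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): succeeds if the
# given libstdc++ file contains the requested GLIBCXX version tag.
# Usage: has_glibcxx LIBRARY_PATH VERSION_TAG
has_glibcxx() {
    lib=$1
    tag=$2
    # -a treats the binary file as text and -o prints only the matched tags,
    # one per line. The second grep then requires an exact whole-line match
    # (-x) on the fixed string (-F), so GLIBCXX_3.4.2 does not match
    # GLIBCXX_3.4.29.
    grep -a -o -E 'GLIBCXX_[0-9]+(\.[0-9]+)*' "$lib" | grep -x -F "$tag" >/dev/null
}

# Example with the values from the error message:
# has_glibcxx /usr/local/lib/gcc10/libstdc++.so.6 GLIBCXX_3.4.29 \
#     || echo "this library does not provide GLIBCXX_3.4.29"
```

If the check fails for the tag named in the error, the binary was built against a newer libstdc++ than the one being loaded at run time, and rebuilding as described above is the way out.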