Author Topic: Site update now using C++ version  (Read 3562 times)

Offline Jim

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 2732
  • Last Login:Today at 08:47:31 am
Site update now using C++ version
« on: December 28, 2022, 10:10:42 pm »
Tonight's production site update was the first done with the C++ site update program written by @yakra.  It's much more efficient than the Python version, which I developed initially and which he has extended and improved greatly over the last few years in parallel with the C++ development.

First, thanks to @yakra for taking this on and getting it all to this point.

Second, there's of course some chance that bugs will show up.  If you notice anything that looks like it's not quite right, let us know.

Offline yakra

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4234
  • Last Login:February 13, 2024, 07:19:36 pm
  • I like C++
Re: Site update now using C++ version
« Reply #1 on: December 29, 2022, 03:45:49 pm »
For those who want to try out the C++ datacheck.sh on noreaster, just change your directory to
siteupdate/cplusplus instead of siteupdate/python-teresco.
Then, type sh datacheck.sh as you normally would.
Sri Syadasti Syadavaktavya Syadasti Syannasti Syadasti Cavaktavyasca Syadasti Syannasti Syadavatavyasca Syadasti Syannasti Syadavaktavyasca

Offline michih

  • TM Collaborator
  • Hero Member
  • *****
  • Posts: 4555
  • Last Login:Yesterday at 04:04:16 pm
Re: Site update now using C++ version
« Reply #2 on: December 29, 2022, 04:07:13 pm »
Thanks! The data check is very, very quick! The initial loading of the classes takes about 30s but my first data check using C++ took just 44s, the 2nd one 40s. That helps a lot when we have to fix errors :)

@yakra + @Jim: Will you maintain the Python version in the future, or should all (hwy data managers) switch to C++?
Either way, the GitHub documentation needs to be updated. It's linked from devel.php.

Edit: I simply run:
Code:
cd ~/DataProcessing/siteupdate/cplusplus
git pull
sh datacheck.sh
« Last Edit: December 29, 2022, 04:09:23 pm by michih »

Offline Jim

Re: Site update now using C++ version
« Reply #3 on: December 29, 2022, 07:03:21 pm »
In the short term, I'd like to keep the two versions aligned.  But once I get a little more familiar with the C++ code and we've been using it for a while with no problems, we'll probably retire the Python version.

Offline yakra

Re: Site update now using C++ version
« Reply #5 on: December 29, 2022, 10:56:47 pm »
Quote from: michih
"The initial loading of the classes takes about 30s"
The good news here is that this bit only happens when the siteupdate code has changed. The program that datacheck.sh runs to build siteupdate (gmake) keeps track of this for us.
When it does need to be recompiled, only the modules that have changed need to be recompiled, so the process should be a lot faster on future runs.
It'll only get near that original 30s again when files (like Waypoint.h) that are essential building blocks of the program and #included in a lot of different modules have changed.
The less-good news is that Waypoint.h gets updated fairly often; it's where all the datacheck function prototypes are. :) It won't make everything recompile -- just a lot of it.

Quote from: michih
"my first data check using C++ took just 44s, the 2nd one 40s."
Good news here is, we can easily do even better than this!
datacheck.sh and localupdate.sh use siteupdateST, a single-threaded variant. This was done back before getting the standard multi-threaded version to compile & run on FreeBSD.
Using the multi-threaded version will make things even faster.

How many threads to use? Things to consider:
  • Where's the sweet spot for running a production site update, before efficiency decreases and total run time increases?
    My tests have been using the old-school magnetic HD. Would results be better writing to /fast?
  • Should we use a slightly smaller number, below some point of diminishing returns, and leave some cores free for noreaster to perform its other duties?
  • Where's the sweet spot for running siteupdate --errorcheck? This mode skips nmp_merged writing and subgraph generation, which don't scale well, so siteupdate --errorcheck should scale more gracefully to a larger number of threads.
  • Should we use a lower number of threads anyway, to avoid hogging all noreaster's resources? How many contributors are likely to run datacheck.sh simultaneously?
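To make the "leave some cores free" idea concrete, here is a minimal shell sketch. It is only an illustration: pick_threads and detect_cores are hypothetical helpers, not part of datacheck.sh, and the reserve of 2 cores is an arbitrary assumption rather than a measured sweet spot. How the resulting number gets passed to siteupdate is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helpers, not part of datacheck.sh: suggest a thread count
# that leaves some cores free for the machine's other duties.

# pick_threads TOTAL RESERVE -> prints TOTAL - RESERVE, but never less than 1.
pick_threads() {
    total=$1
    reserve=$2
    threads=$((total - reserve))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Best-effort core count: try getconf first, then sysctl (FreeBSD),
# and fall back to 1 if neither yields a plain number.
detect_cores() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null)
    case "$cores" in
        ''|*[!0-9]*) cores=1 ;;
    esac
    echo "$cores"
}

# Assumed reserve of 2 cores; tune this after timing real runs.
echo "suggested threads: $(pick_threads "$(detect_cores)" 2)"
```

The arithmetic is kept separate from the core detection so that different reserve values can be tried without touching the detection logic. The actual sweet spot still has to come from timing real runs.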

Quote from: michih
"@yakra + @Jim: Will you maintain the Python version in the future"
Quote from: Jim
"In the short term, I'd like to keep the two versions aligned.  But once I get a little more familiar with the C++ code and we've been using it for a while with no problems, we'll probably retire the Python version."
My thoughts as well. I'll continue to maintain the Python version, for a while at least. It'll be good to have it around as a fallback in case something goes wrong.

Quote from: michih
"should all (hwy data managers) switch to C++?"
Yes!
  • It's way way faster!
  • It will be the production version of the code, with Python to be retired at some point.
  • Gimme bug reports! ;D Hopefully bugs should be few and far between, but one never knows. I test out the program when updating it, but usually with known good commits, and/or whatever's in the master HighwayData branch at the time. There's always the possibility that with just the right dataset, with something that fails (or even should pass) datacheck, something unexpected could occur that I wouldn't happen to observe on my own. The more eyes on it the better.

Offline michih

Re: Site update now using C++ version
« Reply #6 on: December 30, 2022, 02:20:23 am »
Quote from: michih
"should all (hwy data managers) switch to C++?"
Quote from: yakra
Yes!
  • It's way way faster!
  • It will be the production version of the code, with Python to be retired at some point.
  • Gimme bug reports! ;D Hopefully bugs should be few and far between, but one never knows. I test out the program when updating it, but usually with known good commits, and/or whatever's in the master HighwayData branch at the time. There's always the possibility that with just the right dataset, with something that fails (or even should pass) datacheck, something unexpected could occur that I wouldn't happen to observe on my own. The more eyes on it the better.

Highway data managers should run data verification ("Run site update program to generate the same logs, stats, and database file that are produced as part of the regular site update process") before submitting a pull request. Otherwise the production site update may fail, and then a) Jim has to deal with the errors, b) the site is not updated that day, or c) the site is "down" for hours until Jim can address the issue.
Thus, I think it is essential that all highway data managers switch to the C++ version with their next pull request, because the Python and C++ code is not 100% identical. A successful data check with Python is no longer relevant, since the production site update now runs the C++ version. We only have to change one command.

Quote from: yakra
"For those who want to try out the C++ datacheck.sh on noreaster, just change your directory to siteupdate/cplusplus instead of siteupdate/python-teresco."

I simply run:
Code:
cd ~/DataProcessing/siteupdate/cplusplus
git pull
sh datacheck.sh

If bugs arise, so what? They would arise on the production server too.

Offline yakra

Re: Site update now using C++ version
« Reply #7 on: December 30, 2022, 08:18:51 pm »

Offline nezinscot

  • TM Collaborator
  • Full Member
  • *****
  • Posts: 110
  • Last Login:March 23, 2024, 05:33:52 pm
Re: Site update now using C++ version
« Reply #8 on: January 19, 2023, 09:28:21 am »
I tried using the cplusplus version of datacheck this morning and received this error:

[xxx ~]$ cd DataProcessing/siteupdate/cplusplus
[xxx ~/DataProcessing/siteupdate/cplusplus]$ sh datacheck.sh

datacheck.sh: updating TM repositories
Already up to date.
Already up to date.
datacheck.sh: creating directories
datacheck.sh: Building latest site update program...
gmake: 'siteupdateST' is up to date.
datacheck.sh: launching siteupdateST
ld-elf.so.1: /usr/local/lib/gcc10/libstdc++.so.6: version GLIBCXX_3.4.29 required by /home/xxx/DataProcessing/siteupdate/cplusplus/siteupdateST not found

My sandbox seems to have an obsolete (or no) version of GLIBCXX.  Do I need to recreate my sandbox, or did I miss a step?




Offline yakra

Re: Site update now using C++ version
« Reply #9 on: January 19, 2023, 10:23:32 am »
You have an old revision of the DataProcessing repo. Newer revisions build the siteupdate program with a different compiler.

[xxx ~]$ cd DataProcessing/siteupdate/cplusplus
[xxx ~/DataProcessing/siteupdate/cplusplus]$ git pull
[xxx ~/DataProcessing/siteupdate/cplusplus]$ gmake clean #(you'll only need to do this once)

[xxx ~/DataProcessing/siteupdate/cplusplus]$ sh datacheck.sh

(Jim, speaking of gmake clean: https://github.com/TravelMapping/DataProcessing/pull/569)



As an aside, it's weird to see the GLIBCXX error with siteupdateST. I thought that was only a thing with the multithreaded version. Either way, this will be fixed by recompiling with clang instead of GCC.
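For anyone who wants to confirm that the library really lacks the version the binary asks for, a rough diagnostic sketch follows. has_glibcxx is a hypothetical helper, not part of the repo; the library path and version string are copied from the error message quoted above. It only inspects the file. The actual fix is still the git pull and gmake clean shown earlier.

```shell
#!/bin/sh
# Sketch only: has_glibcxx is a hypothetical helper, not part of datacheck.sh.

# has_glibcxx LIBRARY VERSION
# Succeeds if LIBRARY contains the exact version tag VERSION.
has_glibcxx() {
    # -a treats the binary as text; -o prints each matched tag on its own line.
    # The second grep requires a whole-line match, so asking for
    # GLIBCXX_3.4.2 is not satisfied by a GLIBCXX_3.4.29 entry.
    grep -ao 'GLIBCXX_[0-9.]*' "$1" 2>/dev/null | grep -qxF "$2"
}

# Path and version are taken from the error message in the earlier post.
lib=/usr/local/lib/gcc10/libstdc++.so.6
want=GLIBCXX_3.4.29

if has_glibcxx "$lib" "$want"; then
    echo "$lib provides $want"
else
    echo "$lib does not provide $want (or could not be read)"
fi
```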
« Last Edit: January 19, 2023, 10:43:18 am by yakra »