Hm. Maybe disk access affects the Python execution time a little bit, but I think it may be less than it appears...
Last night (with current repos) I got
Total run time: [482.0] single-threaded, off the normal disk.
For comparison,
2020-08-24@21:53:51/logs/siteupdate.log:Total run time: [480.7]. Noreaster runs multithreaded, which will make it a bit slower; the repos were, however, much smaller in August, which will make it a bit faster.
siteupdate.py appears mostly CPU-bound even when reading .wpts -- lab2 stays at roughly 65-85% CPU use while reading.
The threaded C++ version, OTOH, shows CPU use close to zero when reading .wpts that aren't in the disk's cache. So yes, SSDs will definitely help there.
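A quick way to sanity-check that kind of observation without staring at top (a sketch, not part of siteupdate.py): compare process CPU time against wall-clock time around the read. A ratio near 1.0 means CPU-bound; a ratio near zero means the process was mostly waiting, e.g. on uncached disk reads.

```python
import time

def cpu_fraction(fn, *args):
    """Run fn and return CPU-seconds / wall-seconds for the call.

    ~1.0 -> CPU-bound (like lab2 parsing .wpts at 65-85% CPU);
    ~0.0 -> I/O-bound (like uncached reads, where an SSD helps).
    """
    wall0 = time.perf_counter()   # wall-clock time
    cpu0 = time.process_time()    # CPU time of this process only
    fn(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall if wall > 0 else 0.0

# e.g. cpu_fraction(read_all_wpts, wpt_dir)  # read_all_wpts is hypothetical
```

Wrapping just the .wpt-reading phase with something like this would separate "parsing is slow" from "the disk is slow" more cleanly than eyeballing CPU percentages.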
RE .sql ingestion, based on my experiences....
The way lab2 is set up right now, CPU use is pretty near 100%, so it looks CPU-bound.
It appears the bottleneck isn't disk access unless logging is set up to make it so. If that's what your logging needs are, then SSDs can help.
I'll certainly agree that SSDs would help with regular DB access though.
The bigger improvement would probably come from switching to the C++ site update program, which can make use of multiple processing cores.
It'd be interesting to see what noreaster can do, and where its sweet spot would be.
Few tasks can scale all the way up to 24 threads, though with noreaster's presumably greater memory bandwidth (& larger L3 cache) that could change.
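One way to find that sweet spot empirically (a sketch, assuming the work splits into independent chunks): time the same batch of tasks at increasing worker counts and watch where the curve flattens. Note that in Python this only pays off for I/O-bound tasks because of the GIL, which is one more argument for the C++ program on CPU-bound work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def time_at_width(task, n_tasks, workers):
    """Run n_tasks copies of task across `workers` threads; return wall seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # consume the iterator so we actually wait for every task
        list(pool.map(lambda _: task(), range(n_tasks)))
    return time.perf_counter() - start

def sweet_spot(task, n_tasks, widths=(1, 2, 4, 8, 16, 24)):
    """Map worker count -> wall time; the sweet spot is where times stop shrinking."""
    return {w: time_at_width(task, n_tasks, w) for w in widths}

# e.g. sweet_spot(lambda: read_one_wpt(some_path), 240)
# (read_one_wpt is a hypothetical per-file task, not a real siteupdate function)
```

Running something like this on noreaster at 1, 2, 4, ... 24 workers would show directly how far the extra cores and memory bandwidth carry before returns diminish.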
What's noreaster have for RAM?
top says 13G free, so... 16? If just 3 people log in and run datacheck.sh, that could break the bank...
Combine that with faster I/O both in latency and transfer times, and we might be on our way to running automated updates.
I'll refrain from getting into the whats & hows for now, and say... neato!