Monday, March 22, 2021

Further Tweaking of Other Python3 Scripts

What a day!

I spent much of the morning watching more Linus Tech Tips and other videos from my YouTube playlist. Subsequently, it was more Cyberpunk 2077 again.

Remember the offhand comment in a previous post about getting bored waiting 22 s for the minification process? Well, before my meet-up with a friend at Casuarina Curry, I decided to sit down and refactor the script to make use of multiprocessing. It wasn't hard, but it did take a little bit of time. I think the refactored code is a little cleaner than before. After tuning the number of processes to spawn in the pool (the best value is 3, for some reason, with the files pre-sorted in decreasing order of size), I managed to get the run-time for this segment down to around 12.102 s from the original 28.455 s (which means that the 22 s figure in the original post was quite off).
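For illustration only, here is a minimal sketch of the general shape of such a refactoring. It is not the actual script: `minify_file` is a hypothetical stand-in that just strips trailing whitespace and blank lines, and the names `largest_first` and `minify_all` are made up for this example. The pool size of 3 and the largest-first ordering are the only details taken from the post.

```python
import multiprocessing
import os
from pathlib import Path

POOL_SIZE = 3  # the tuned value from the post; other machines and workloads may differ


def minify_file(path):
    """Hypothetical stand-in for a real minifier: drop trailing whitespace and blank lines."""
    source = Path(path)
    text = source.read_text(encoding="utf-8")
    lines = [line.rstrip() for line in text.splitlines()]
    source.write_text("\n".join(line for line in lines if line), encoding="utf-8")
    return str(path)


def largest_first(paths):
    """Order files by decreasing size so that the biggest jobs are handed out first."""
    return sorted(paths, key=os.path.getsize, reverse=True)


def minify_all(paths, pool_size=POOL_SIZE):
    """Minify every file using a fixed-size pool of worker processes."""
    ordered = largest_first(paths)
    with multiprocessing.Pool(processes=pool_size) as pool:
        # chunksize=1 hands out one file at a time, so the largest-first order is respected.
        return list(pool.imap_unordered(minify_file, ordered, chunksize=1))
```

On platforms where worker processes are started by spawning a fresh interpreter, the call to `minify_all` needs to sit under an `if __name__ == "__main__":` guard, and the worker function must be defined at module level so that it can be pickled.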

I had enough time left after converting the minifier to use multiprocessing that I decided to do the same for the sitemap generator as well. This was a little bit overkill because the run-time was already 1.835 s, which I had thought was as fast as it gets. It turns out that I could still gain some wins through multiprocessing and tuning (3 is the magic pool size again), leading to a run-time of 1.144 s.

Roughly, the final run-time is about 12.617 s against 30.092 s, or about 42% of the original run-time. Put another way, the processing is now 2.4× faster than before. That was rather surprising.
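The two ratios can be checked with a few lines of Python. The variable names are just for this example; the timings are the ones quoted above.

```python
before_s = 30.092  # total run-time before the refactoring, in seconds
after_s = 12.617   # total run-time after the refactoring, in seconds

fraction = after_s / before_s  # share of the original run-time that remains
speedup = before_s / after_s   # how many times faster the new version is

print(f"{fraction:.0%} of the original run-time, {speedup:.1f}x faster")
# prints "42% of the original run-time, 2.4x faster"
```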

The timings include overhead from the bash shell that orchestrates the whole execution. For reference, without the bash overhead, the tuned version of the sitemap generator was at around 0.8 s.

Considering that Eileen-II has 6 physical cores and 12 logical cores, it is a little bit surprising that the best number of processes to spawn is neither of those two numbers for this particular workload. Upon a little more reflection, it makes sense: each time we spawn a new process in Python3 using the pool workers, it effectively reloads the module, possibly in another instance of the interpreter. We are processing a moderate list of small files, so the cost of spawning processes may end up dominating the run-time. Thus a more controlled (and smaller) number of processes works better.
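One way to find such a number is to time the same workload with a few candidate pool sizes and keep the fastest. The following is a rough sketch under that assumption, not the procedure actually used for the post; `time_with_pool` and `sweep_pool_sizes` are hypothetical names. Pool start-up is deliberately included in the measurement, since that is the cost suspected of dominating here.

```python
import multiprocessing
import time


def time_with_pool(worker, items, pool_size):
    """Time one full run, including the cost of starting and stopping the pool."""
    start = time.perf_counter()
    with multiprocessing.Pool(processes=pool_size) as pool:
        pool.map(worker, items, chunksize=1)
    return time.perf_counter() - start


def sweep_pool_sizes(worker, items, candidates, timer=time_with_pool):
    """Return the fastest pool size and the full timing table for the candidates."""
    timings = {size: timer(worker, items, size) for size in candidates}
    best = min(timings, key=timings.get)
    return best, timings
```

A call such as `sweep_pool_sizes(worker, files, [1, 2, 3, 4, 6, 12])` would cover the physical and logical core counts mentioned above. A single run per size is noisy, so repeating the sweep a few times and comparing the results is advisable.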

The more astute might ask why I am using processes for [data] parallelisation instead of threads. Well, for CPython (the Python3 interpreter I am using), there exists a global interpreter lock that basically allows only a single thread to execute Python byte-code at a time. This limitation may not matter if the work is I/O-bound, but using multiprocessing is a quick and dirty way to get the "embarrassingly dumb" data parallelisation to work, as compared to threading. I cannot guarantee that the workloads are completely I/O-bound, since both the minification and the sitemap generation processes do include some in-memory parsing/compression.
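To make the trade-off concrete, here is a small sketch using the standard `concurrent.futures` module, where switching between threads and processes is a one-line change. This is an illustration of the idea rather than the code behind the scripts in this post; `run_parallel` and `count_words` are made-up names, and `count_words` merely stands in for some CPU-side work on a piece of text.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_words(text):
    """Stand-in for CPU-side work such as parsing or compressing a document."""
    return len(text.split())


def run_parallel(worker, items, use_processes=True, max_workers=3):
    """Apply worker to every item using either a process pool or a thread pool."""
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map returns the results in the same order as the inputs.
        return list(executor.map(worker, items))
```

With `use_processes=False` the workers share one interpreter and take turns holding the global interpreter lock, which is fine when they mostly wait on disk or network. With `use_processes=True` each worker has its own interpreter, at the cost of process start-up and of pickling the arguments and results.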

For now, I am happy with the results.

------

I met up with a friend I made when she was working at my favourite bar (last featured at this entry). She's in between jobs as well, though her domain of interest/expertise is vastly different from mine. We went to Casuarina Curry, located at 136, 138 Casuarina Road (Off Upper Thomson), Singapore 579524. I like Casuarina Curry because of the consistency of their prata, and that I have been there relatively recently to confirm the quality.

Back in the old days, I used to eat rather regularly at Thasevi Food out in Jalan Kayu. But I've not been there for a very, very long time, especially since the road to Jalan Kayu was tweaked when the whole Seletar Aerospace Park became a thing.

Before Seletar Aerospace Park, it was a nice, quiet, quasi-abandoned version of the old Seletar Base. I served my national service there, enjoying many a cycle about in the evenings after work and before nightfall. I don't think it is possible to cycle there safely now, because of the rather wide high-speed roads that have been built there.

Anyway, we caught up and had a chat. It was nice to meet a familiar face that was not from my old crowd, not because I dislike my old crowd, but because it is important to start developing newer memories through newer and different social connections.

This is, of course, another attempt at helping with the re-alignment process. I can't get ready for the future if my mind is always full of information about the past.

I am running some experiments to extend a point that I raised earlier in this post, but I will just put this up first and put up a separate post when the results of the experiments are ready.

Stay tuned.
