Tuesday, March 23, 2021

Promised Experiment Results

Okay, I mentioned in the last post that I was running an experiment, and I have now completed it to the level that I care about. Before I continue, here are the results:
Now the thing to observe here is that as the number of processes increases to one less than the total number of logical processors, the run-time for the page counting script steadily decreases. The run-time is estimated as the average of the list of readings, ignoring the two outliers (the maximum and the minimum, highlighted in red). The ratio of the [sample] standard deviation to the [arithmetic] mean is the coefficient of variation, and can be thought of as a type of "normalised" measure that hints at the precision of the estimated reading.
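For anyone who wants to reproduce the arithmetic, here is a minimal sketch of the estimate described above. The function names are made up for illustration, and it assumes that the standard deviation is taken over the same trimmed list as the mean, which may not match exactly how the numbers in the results were produced.

```python
import statistics


def drop_extremes(readings):
    """Return the readings with one maximum and one minimum removed."""
    if len(readings) < 4:
        # Need at least two readings left over for a sample standard deviation.
        raise ValueError("need at least four readings")
    return sorted(readings)[1:-1]


def summarise(readings):
    """Return (mean, sample standard deviation, coefficient of variation).

    Assumption: all three are computed on the trimmed readings.
    """
    kept = drop_extremes(readings)
    mean = statistics.mean(kept)      # arithmetic mean
    stdev = statistics.stdev(kept)    # sample standard deviation (n - 1)
    return mean, stdev, stdev / mean
```

A smaller coefficient of variation means the readings that were kept sit closer together relative to their mean, so comparing it across rows is fair even when the run-times themselves differ a lot.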

The total number of raw bytes to be read is in the 3.1G range, which completely dominates the 18.6k bytes of the script and the one or two megabytes of support libraries/interpreters. This means that the overhead of CPython's implementation of process-based data parallelisation is negligible compared with the gains from the parallelism.

So the only tuning I did here was to assign the larger files for processing first to improve the pipeline.
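The largest-first idea can be sketched as follows. This is not the actual page counting script: count_pages is a hypothetical stand-in that just returns the file size, and the choice of multiprocessing.Pool with imap_unordered is an assumption about how such a script might be put together.

```python
import os
from multiprocessing import Pool


def largest_first(paths, size_of=os.path.getsize):
    """Order the paths so that the biggest files are handed out first."""
    return sorted(paths, key=size_of, reverse=True)


def default_workers():
    """One less than the number of logical processors, but at least one."""
    return max(1, (os.cpu_count() or 2) - 1)


def count_pages(path):
    """Hypothetical stand-in for the real per-file work."""
    return path, os.path.getsize(path)


def run(paths, workers=None):
    """Process every file in parallel, starting with the largest ones."""
    ordered = largest_first(paths)
    with Pool(processes=workers or default_workers()) as pool:
        # chunksize=1 hands out one file at a time, so whichever worker
        # finishes first simply picks up the next largest remaining file.
        return list(pool.imap_unordered(count_pages, ordered, chunksize=1))
```

On platforms that start worker processes by spawning, run(...) should be called from under an if __name__ == "__main__": guard. Starting the big files early means a long job is less likely to be left running alone at the end while the other processes sit idle.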

Finally, I just want to add that I am now done with page 505/704 of OpenStax College: Organisational Behaviour. I feel like I need yet another shower before I turn in for the night.

Till the next update.
