Thursday, March 11, 2021

Python Takes the Tedium Out of C

Thursday came and is just about gone.

I read a little more of On Playing the Flute: The Classic of Baroque Music Instruction (2nd Edition), and spent a little time watching various YouTube videos, but I had an idea for something to work on at the back of my mind.

A long time ago (the end of November 2014), I started on a small side project that was tentatively called filemanip. The side project was supposed to be an exploration of various compression techniques, using a ``universal'' wrapper that would provide backward compatibility with any of the previous compression algorithms. Part of the reason for starting on that project was to explore the use of online machine learning techniques to provide better compression.

However, the last commit to that code base was some time in September 2016.

Part of the reason was that I was writing it in C99, and even though I love the C programming language, it was rather... cumbersome for something that was more research-y.

That was why, even as of the last commit, all I had was run-length encoding with some bitstream I/O using Elias gamma coding for certain unbounded integers. And even then, only the RLE encoder existed; there was no decoder.
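For reference, here is a minimal Python sketch of the two techniques named above: byte-wise run-length encoding, and Elias gamma coding for positive integers of any size. It is not the C99 code from filemanip. The function names are hypothetical, and representing the bitstream as a string of '0' and '1' characters is an assumption made purely for readability.

```python
def elias_gamma_encode(n):
    """Return the Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for positive integers")
    binary = bin(n)[2:]                       # e.g. 9 -> '1001'
    # Prefix with (length - 1) zeros so a decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def rle_encode(data):
    """Collapse a bytes object into a list of (byte value, run length) pairs."""
    runs = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            runs[-1] = (byte, runs[-1][1] + 1)
        else:
            runs.append((byte, 1))
    return runs


def rle_decode(runs):
    """Inverse of rle_encode: expand the runs back into bytes."""
    return b"".join(bytes([value]) * length for value, length in runs)
```

Run lengths are a natural fit for Elias gamma because they are always at least 1 and have no upper bound, which is presumably why the two were paired.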

In short, it was a bloody mess.

So I am taking a different tack. This time, I'm just going to rely on good old Python3 as the main language, writing the algorithms in some abstract way before doing bit-stream I/O using the struct module. And to start things rolling, I just completed the code for generalised Huffman coding, or at least, the tree-building part. It runs fast enough, even when we have ridiculous input (2^65536 counter sizes anyone?). It's currently missing the generation of the canonical Huffman code, as well as other support modules to feed information in and out of the encoder/decoder.
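As an illustration of the tree-building step, here is a minimal sketch that uses a heap to repeatedly merge the two lightest subtrees. It is not the code described above: it assumes a plain binary tree (the ``generalised'' variant may well differ), the name huffman_code_lengths is hypothetical, and it returns only the code length per symbol, since lengths are all a canonical code would need later.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Map each symbol to the length of its code in a binary Huffman tree."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be representable.
        return {symbol: 1 for symbol in freqs}

    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), [symbol]) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)

    while len(heap) > 1:
        w1, _, symbols1 = heapq.heappop(heap)
        w2, _, symbols2 = heapq.heappop(heap)
        merged = symbols1 + symbols2
        # Every symbol under the new parent node sits one level deeper.
        for symbol in merged:
            lengths[symbol] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))

    return lengths
```

Because Python integers are arbitrary precision, counts as large as 2**65536 need no special handling here; the comparisons and additions simply get slower.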

Part of working with struct is a way of getting my hands dirty with the idea of using Python3 as a tool for lower-level programming. Most of the Python3 programming that I have done is very high-level in comparison---think writing RESTful services and/or orchestrating numerical computations. Most of the I/O that I have done is very text-centric---the concept of low-level byte or even bit I/O is something that I am not so familiar with in Python3.
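To give a flavour of what bit-level I/O on top of struct might look like, here is a small sketch. Everything in it is an assumption rather than an actual design: the BitWriter class and read_bits function are hypothetical names, and the layout (a 4-byte big-endian bit count followed by the zero-padded payload) is invented for the example.

```python
import struct


class BitWriter:
    """Accumulate bits, most significant first, and flush them as bytes."""

    def __init__(self):
        self._bytes = bytearray()
        self._acc = 0      # bits collected for the byte in progress
        self._nbits = 0    # how many bits _acc currently holds

    def write(self, value, width):
        """Append the low `width` bits of value."""
        for shift in range(width - 1, -1, -1):
            self._acc = (self._acc << 1) | ((value >> shift) & 1)
            self._nbits += 1
            if self._nbits == 8:
                self._bytes.append(self._acc)
                self._acc = 0
                self._nbits = 0

    def getvalue(self):
        """Return a 4-byte big-endian bit count followed by the padded payload."""
        total_bits = len(self._bytes) * 8 + self._nbits
        payload = bytes(self._bytes)
        if self._nbits:
            # Left-align the final partial byte and pad it with zero bits.
            payload += bytes([self._acc << (8 - self._nbits)])
        return struct.pack(">I", total_bits) + payload


def read_bits(blob):
    """Recover the list of bits written by BitWriter.getvalue()."""
    (total_bits,) = struct.unpack_from(">I", blob, 0)
    payload = blob[4:]
    return [(payload[i // 8] >> (7 - i % 8)) & 1 for i in range(total_bits)]
```

Storing the bit count up front is one simple way to let the reader ignore the padding in the last byte; an end-of-stream symbol would be another.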

But re-implementing bit I/O when I already have that set up in the C version seems like re-inventing the wheel. Perhaps what might happen is that I do some hybrid thing, where the C parts provide the lowest-level operations, but they call Python3 modules via an embedded Python3 interpreter to run the algorithms with the associated niceties.

Hmmm. That actually sounds fun. But maybe I will look into that if performance becomes more of an issue.

Anyway, that's all I have for today. Tomorrow's going to be a Friday---perhaps I should go cycling; I have not done so in a while thanks to really bad skin at the ankles where my sandal strap rubs.

If it rains though... then it's back to being at home all over again.

Ah well, till the next update then.
