Monday, February 22, 2021

[Short] Fiction LaTeX Toolchain Update and Other Stories

Ah... so close to completing The Outer Worlds! If only I didn't actually spend time working on other things... but I believe I am jumping the gun here.

I finally spent some time reading Deep Learning, and I have something to admit---the more I read it, the more I agree with Brian that I am wasting my time. It is not that the concept of deep learning is bad; it is just that, unlike more ``stupid'' algorithms like, say, decision trees, or even some kind of Bayesian belief network, there seems to be nothing innately interesting about deep learning. I get that it attacks the machine learning problem by using function composition to build complexity out of algorithmically simpler units, but there is almost nothing intuitive about it, just really dry tensor mathematics with a tinge of function optimisation in disguise.

Maybe it's because I'm only [nearly] halfway through the 802-page e-book (I was on page 387 at the time of writing). But honestly, it doesn't look like it is going to get any better. I can probably understand what goes into a deep learning system, but it is unlikely to draw me in the way other learning architectures do.

Heck, even SVMs are more appealing from the intuitive and theoretical aspect.

I think part of the reason is that a large part of deep learning is not amenable to proper human understanding, at least in the way it is being presented. It's a case of ``here, pick/design some deep learning architecture (equivalent to deciding on the basis functions, and the space of function composition to operate in), chuck enough data and time at it, and we can universally approximate the unknown function to `compress' the data representation for future prediction''.

But I should really only draw conclusions after finishing the book. As I said, though, it really doesn't seem to be getting any more interesting. If this is the kind of work that people expect data science to be, wow, it would probably be the most hypocritical thing for me to take up those kinds of jobs.

------

Deep Learning aside, I accidentally spent some time doing more fixes on my various toolchains, not necessarily related to the maintenance of my personal domain. I was actually debating whether to work on Print Me today when I realised that I still had Elizabeth left to complete.

For those who aren't familiar with my blog-based scribbles, I have a tendency to create nicely typeset versions of such ``serialised'' works after I have completed them. That I hadn't generated one for Elizabeth was a hint that I hadn't finished the story, despite the 25 parts I have written thus far.

I tried building the PDF from the current draft so that I could read the story as a whole and see what is left to write, and found LaTeX tossing warnings at me---they had something to do with the weird margins I was using. I fiddled about before finally realising that I had already fixed the style sheet for my fiction back in 2015---I just hadn't propagated the changes. And so, I spent some time updating the typesetting pipeline for each of these standalone stories, and regenerated their PDFs to replace the old [and bad] ones. The dates on the stories are still the same as before, but the margins are definitely more consistent.

Naturally, I didn't get to read Elizabeth to determine the amount of work left. I didn't work on Print Me either. Maybe tomorrow.

------

On one final note, I finally brought myself to write a quick tool to identify duplicate images by visual similarity. The situation is this: I have a set of a thousand or so image files that I have been collecting over time to use as desktop backgrounds. They tend to be high-resolution and come from many sources, so one of the first things I did was write a script that renames them according to a schema that keeps their order and resolution ``obvious''.
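For the curious, here is a minimal sketch of what such a renaming script could look like. The post doesn't give the actual schema, so the `0007_3840x2160.jpg` pattern (zero-padded position, then width x height), the extension list and every function name below are hypothetical; it assumes Pillow for reading image sizes.

```python
from pathlib import Path

# Hypothetical set of wallpaper extensions; adjust to taste.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_images(folder):
    """Return the image files directly inside folder."""
    return [p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS]


def target_name(index, width, height, ext):
    """Hypothetical schema: zero-padded position, then resolution, e.g. 0007_3840x2160.jpg."""
    return f"{index:04d}_{width}x{height}{ext.lower()}"


def image_size(path):
    """Read (width, height) with Pillow; imported lazily so the naming logic works without it."""
    from PIL import Image

    with Image.open(path) as im:
        return im.size


def plan_renames(paths, size_of=image_size):
    """Map each path to its new path, numbering files in their current sorted-name order."""
    plan = {}
    for index, path in enumerate(sorted(paths), start=1):
        width, height = size_of(path)
        plan[path] = path.with_name(target_name(index, width, height, path.suffix))
    return plan


def apply_renames(plan):
    """Rename in two passes so a new name never clobbers a file not yet renamed."""
    staged = {}
    for i, (src, dst) in enumerate(plan.items()):
        tmp = src.with_name(f".renaming-{i}{src.suffix}")
        src.rename(tmp)
        staged[tmp] = dst
    for tmp, dst in staged.items():
        tmp.rename(dst)
```

Once the files already follow the schema, sorting by name preserves their order, so re-running `apply_renames(plan_renames(list_images(folder)))` after deleting some files closes the gaps without shuffling anything. The two-pass rename is there because a target name may still belong to a file that hasn't been renamed yet.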

Some images are just too lovely, and after a long enough time, I ended up with some duplicates from accidentally redownloading them for storage again.

Spotting the similar ones by eye in Windows Explorer alone was stupidly impossible, so it was obvious that I needed automation to help, specifically some kind of similarity comparison.

Armed with Pillow, I wrote a Python 3 script that converts each image into a 20-dimensional row vector. Each entry comes from one pixel of a Lanczos-resampled grayscale version of the original image, and is a number in [-1, +1], with -1 representing absolute black and +1 absolute white.
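A minimal sketch of that vectorisation step, assuming Pillow is installed. The post only specifies 20 dimensions, so the 5x4 thumbnail shape is an assumption, as are the function names. Mapping a pixel value p in 0..255 to p / 127.5 - 1 puts black at -1 and white at +1.

```python
# Assumption: 20 dimensions = a 5-wide, 4-tall thumbnail (the shape isn't stated).
THUMB_SIZE = (5, 4)


def pixels_to_vector(pixels):
    """Map 8-bit grayscale values (0 = black, 255 = white) onto [-1.0, +1.0]."""
    return [p / 127.5 - 1.0 for p in pixels]


def image_to_vector(path_or_file, size=THUMB_SIZE):
    """Grayscale, Lanczos-downsample to `size`, and flatten into a row vector."""
    from PIL import Image  # Pillow; imported here so pixels_to_vector works without it

    with Image.open(path_or_file) as im:
        # "L" is Pillow's 8-bit grayscale mode; convert first, then resample.
        small = im.convert("L").resize(size, Image.LANCZOS)
        # For an "L" image, tobytes() yields one byte (0..255) per pixel, row by row.
        return pixels_to_vector(small.tobytes())
```

Squashing every wallpaper onto the same small grid ignores aspect ratio. That is fine for catching copies of the same image, but two differently cropped versions of a picture would probably not line up.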

Similarity was then computed with a variation of cosine similarity: instead of the cosine itself, I used the angle between two vectors in radians, and treated two images as similar when that angle fell under a threshold of 0.01.
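The angle-based comparison could look roughly like this sketch, where the angle is the arc cosine of the ordinary cosine similarity. The function names are hypothetical, and the zero-vector guard is an added assumption rather than something the post mentions.

```python
import math

ANGLE_THRESHOLD = 0.01  # radians, the threshold from the post


def angle_between(a, b):
    """Angle in radians between two vectors: arccos of their cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector has no direction, so treat it as matching nothing.
        return math.pi / 2
    cosine = dot / (norm_a * norm_b)
    # Clamp: rounding can push the cosine slightly outside [-1, 1], where acos raises ValueError.
    return math.acos(max(-1.0, min(1.0, cosine)))


def is_similar(a, b, threshold=ANGLE_THRESHOLD):
    """True when the two vectors point in (almost) the same direction."""
    return angle_between(a, b) < threshold
```

Working with the angle keeps the threshold readable: 0.01 radians corresponds to a cosine similarity of about 0.99995. The [-1, +1] mapping also helps here, since with raw 0..255 values every vector would sit in the non-negative orthant and no two images could ever be more than pi/2 apart.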

Similar images were physically dumped into a directory for their cluster, while images with nothing similar to them were simply copied over wholesale.
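Neither the grouping rule nor the directory layout is spelled out, so this sketch assumes one: any pair within the 0.01-radian threshold is merged into the same cluster (transitively, via union-find), each multi-image cluster is copied into its own hypothetical `cluster_NNN` folder, and singletons are copied straight into the output folder. Copying rather than moving leaves the originals untouched.

```python
import math
import shutil
from pathlib import Path


def angle_between(a, b):
    """Angle in radians between two vectors (arc cosine of cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        return math.pi / 2  # assumption: a zero vector matches nothing
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def cluster_indices(vectors, threshold=0.01):
    """Union-find over all pairs: indices linked by a chain of similar pairs end up together."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if angle_between(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def dump_clusters(paths, clusters, out_dir):
    """Copy clusters into numbered sub-folders and singletons into out_dir itself."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    n_clusters = 0
    for group in clusters:
        if len(group) == 1:
            shutil.copy2(paths[group[0]], out_dir)
            continue
        n_clusters += 1
        cluster_dir = out_dir / f"cluster_{n_clusters:03d}"
        cluster_dir.mkdir(exist_ok=True)
        for i in group:
            shutil.copy2(paths[i], cluster_dir)
    return n_clusters
```

With about a thousand images this is roughly half a million pairwise comparisons of 20-number vectors, which is still manageable in pure Python. Because merging is transitive, a chain of near-matches could in principle pull three or more images into one cluster, though with a threshold this tight that seems unlikely to matter.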

From there, I could do the visual inspection (there were 10 clusters of similar images) and weed out what I didn't want, before re-running the renaming script on the unique-ified images.

Was it overkill? Maybe, especially since the images flagged as similar turned out to be exact copies; it wasn't even the case that one was a higher-resolution version of another. But since I had no way of knowing beforehand which kinds of duplicates were present, it was just faster to use a more general algorithm.
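For comparison, if every duplicate really is a byte-for-byte copy, a plain content hash would be enough. The sketch below uses only the standard library (function names are hypothetical) and groups files by SHA-256 digest. It would miss any resized or re-encoded copy, which is exactly the case the more general angle-based approach covers.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(paths):
    """Return groups (two or more files each) whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(Path(path))
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```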

The job was done, and I am happy.

------

I think that I have been writing a little too much about writing random scripts/updating various toolchains that I use to operate my cyberspace affairs. I suppose it is just the season---it seems that I am currently in the ``hot'' phase of programming useful scripts/fixing old ones.

There is still one more thing that is niggling at the back of my mind that I would want to do before I am ``programmed out'', and that involves music.

I may also want to do some lighter reading instead of Deep Learning, perhaps a short anthology of short stories, or even poems. I don't know which just yet, but I have some decent ideas. And they are likely to be dead-tree versions too, just for variety.

I think that's all I have for today. Maybe after I shower, I will try and finish up one of the endings for The Outer Worlds.

Maybe.

Anyway, till the next update.
