Friday, July 30, 2021

More Tuning of Eileen-II to Beat the Heat and Clock

Okay, I've run this configuration for more than 2 days, and I think that things are mostly stable. So let's talk about what I did, and how it all turned out.

First, recall the specifications of Eileen-II. Next, recall the use of power limiting as a means of controlling the thermal load so that the keyboard doesn't melt my fingers. The CPU can handle 100°C, but my fingers don't like the resulting key temperatures (less than 100°C for sure, but I don't have a working infrared sensor to report what they actually are).

Recall also statements made about stability issues when CPU VID was less than 0.500 V, and the associated setting of the offset to −35.2 mV (on CPU Core and CPU Cache) to stop that weird screen-blanking nonsense.

Finally, recall the most recent summary of the experiment, and the one preceding it. The more interesting point there was that the lowest observed CPU VID was 0.486 V (it has since dropped to 0.480 V without crashing), which is less than 0.500 V.

Now, with all the background work linked in, let us begin.

The current fairly stable configuration with tolerable subjective keyboard temperature is the following (in ThrottleStop 9.3):
  1. Setting of two profiles at −78.1 mV offsets for all components (CPU Core, CPU Cache, System Agent, Intel GPU, and iGPU Unslice);
  2. Turbo ratio limits for active core counts for the ``Bursty Performance'' profile (profile #1) set to 50×, 49×, 47×, 46×, 45×, and 43×;
  3. Turbo ratio limits for active core counts for the ``Continuous Performance'' profile (profile #2) set to 37×, 34×, 32×, 31×, 29×, and 26×, as determined from experimentation with a final more aggressive nerfing on the all-core multiplier;
  4. Disabling the ``Thermal Velocity Boost'' option for the ``Continuous Performance'' profile while leaving that option on for the ``Bursty Performance'' profile;
  5. Setting both profiles' ``Speed Shift---EPP'' values to 255;
  6. Setting ``More Data'' option on;
  7. Enabling ``Nvidia GPU'' in the Options button;
  8. Checking the ``Alarm'' checkbox, and filling in 16 for the ``DTS'' box, and 87 for the ``GPU °C'' box, while leaving both ``Use Profile'' options to 2;
  9. Setting PL1 to 25 without clamping, PL2 to 35 with clamping, and the Turbo Time Limit to 28, all under the TPL section.
The idea here is to let the default setting (``Bursty Performance'' profile) be the primary one, and to use the Alarm capability of ThrottleStop to nerf the clock speeds hard when the temperatures go beyond the [much lowered] thresholds past which the keyboard heats up uncomfortably. With the ``More Data'' option set, the check for switching profiles occurs much more frequently, which allows a higher average clock speed than using either profile alone, while giving better control over the thermal behaviour. This is necessary because the cooling system of Eileen-II is shared between the CPU and GPU, and of the two, the CPU generates the bulk of the heat, so it becomes important to treat the entire system holistically from a thermal control perspective. I've only set this to happen when plugged in, so all the extra CPU time used to enact this switching is easily justified from the power budget perspective.
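The switching rule can be sketched roughly as follows. This is only my reading of the Alarm settings, not ThrottleStop's actual logic. It assumes that DTS is the distance in °C below the CPU's throttle point (so a smaller number means hotter), that the six ratio values map to 1 through 6 active cores in order, and that the base clock is 100 MHz. The names `select_profile` and `max_clock_mhz` are made up for illustration.

```python
# Rough sketch of the Alarm-driven profile switching; NOT ThrottleStop's real code.
# Assumptions: DTS is the distance (in °C) below the CPU throttle point, so a
# lower value means a hotter CPU; the base clock is 100 MHz.

BCLK_MHZ = 100  # assumed base clock

# Turbo ratio limits by active core count (assumed order: 1 to 6 cores).
PROFILES = {
    1: {"name": "Bursty Performance", "ratios": [50, 49, 47, 46, 45, 43]},
    2: {"name": "Continuous Performance", "ratios": [37, 34, 32, 31, 29, 26]},
}

ALARM_DTS = 16     # value in the "DTS" box
ALARM_GPU_C = 87   # value in the "GPU °C" box
ALARM_PROFILE = 2  # value of both "Use Profile" options


def select_profile(cpu_dts, gpu_temp_c):
    """Return the profile number to use for the given sensor readings."""
    if cpu_dts <= ALARM_DTS or gpu_temp_c >= ALARM_GPU_C:
        return ALARM_PROFILE
    return 1


def max_clock_mhz(profile, active_cores):
    """Turbo clock ceiling for a profile with the given number of active cores."""
    return PROFILES[profile]["ratios"][active_cores - 1] * BCLK_MHZ


if __name__ == "__main__":
    for dts, gpu in [(40, 60), (16, 60), (30, 87)]:
        p = select_profile(dts, gpu)
        print(dts, gpu, PROFILES[p]["name"], max_clock_mhz(p, 6), "MHz all-core")
```

Under these assumptions, the all-core ceiling drops from 43× to 26× whenever either alarm threshold is crossed, and returns to 43× once both readings recover.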

While many articles on undervolting report that the CPU Cache and CPU Core values are the ``only'' ones that should be undervolted for the best cooling results, it turns out that doing that alone leads to general instability. Careful tests with undervolting all the other parts of the CPU package equally have let me use much lower offsets with nearly-equal stability.
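As a side note on the offset values themselves: the odd-looking numbers (−35.2 mV, −78.1 mV) are consistent with the offset being adjusted in steps of 1/1024 V, roughly 0.977 mV each. That step size is an assumption on my part about how the offset is encoded, and `offset_mv` and `nearest_steps` are just illustrative helpers.

```python
STEP_V = 1 / 1024  # ASSUMED size of one offset step, in volts


def offset_mv(steps):
    """Convert a number of offset steps to millivolts, rounded to one decimal."""
    return round(steps * STEP_V * 1000, 1)


def nearest_steps(target_mv):
    """Number of whole steps closest to a target offset in millivolts."""
    return round(target_mv / (STEP_V * 1000))


if __name__ == "__main__":
    for target in (-35.2, -78.1):
        steps = nearest_steps(target)
        print(target, "mV is about", steps, "steps, i.e.", offset_mv(steps), "mV")
```

If the assumption holds, the two offsets used so far correspond to 36 and 80 steps respectively.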

That is not enough though, because the Intel graphics driver likes to crash randomly. As noted in a previous quick summary, I updated the old version 27.x drivers to the newer version 30.x ones. I don't think that is sufficient though, and one of my hypotheses is that when undervolted, if the integrated GPU doesn't get enough work, the voltage drops, and it becomes less responsive, which panics the device driver. So one other thing that I did was to pull up the ``Power Options'' from Windows Settings and, in the current power plan's advanced settings, change the ``Minimum processor state'' for ``Plugged in'' from the original 5% to 15%.
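The same change can probably be scripted rather than clicked through. The sketch below builds the `powercfg` commands for setting the plugged-in minimum processor state on the active power plan. It assumes the `SCHEME_CURRENT`, `SUB_PROCESSOR`, and `PROCTHROTTLEMIN` aliases are available on the machine (`powercfg /aliases` lists them), and it may need an elevated prompt. The function names are mine, and I did the actual change through the GUI.

```python
import subprocess


def min_state_commands(percent):
    """Build the powercfg commands that set the AC minimum processor state."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", str(percent)],
        # Re-activate the current plan so that the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_min_state(percent):
    """Run the commands built above (Windows only)."""
    for cmd in min_state_commands(percent):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; call apply_min_state(15) to actually run them.
    for cmd in min_state_commands(15):
        print(" ".join(cmd))
```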

So far, I have not experienced any BSOD crashes, and this is from more than 2 days of continuous operation with workloads ranging from idling [at night], to watching YouTube videos [continuously], to playing Grim Dawn while watching YouTube videos [for a few hours].

What I did observe though is that sometimes, ``Desktop Window Manager'' decides to grow really fat in terms of memory use, often hitting the 5+ GiB range. This is a known problem, and all the ``fixes'' described are, in my humble opinion, pretty bullshitty.

I do have a working workaround that does not involve doing very strange and scary things. It does, however, require one to get Process Explorer. Once it is downloaded, run procexp64.exe as Administrator, and in the application, look for dwm.exe. Right click on that, and select Restart. There will be some confirmation dialogue, but just choose the option that allows one to proceed. That rogue process will get killed off and automatically restarted, giving it a much lower (~100+ MiB) memory use, all without having to restart or reinstall Windows, or roll back to an old version of the Intel graphics drivers and what-not. Process Explorer can be closed after that, since its job is done once the dwm.exe process has been restarted. This same workaround can be used as many times as necessary; this particular experiment run has seen me do this at least twice at the time of writing.
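To know when the restart is due without eyeballing Task Manager, something like the following could check dwm.exe's memory use. It is only a sketch: it does not restart anything, it assumes that `tasklist /FO CSV /NH` reports memory in the fifth column as a number of kilobytes with separators and a trailing ``K'' (the formatting may differ by locale), and the helper names and the 5 GiB threshold constant are my own choices.

```python
import csv
import io
import subprocess

THRESHOLD_KIB = 5 * 1024 * 1024  # 5 GiB, the range where dwm.exe gets "fat"


def parse_mem_kib(tasklist_csv):
    """Return {pid: memory in KiB} from `tasklist /FO CSV /NH` output."""
    usage = {}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5:
            continue  # skip blank lines and "no tasks running" messages
        # Keep only the digits, dropping thousands separators and the unit.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if digits:
            usage[int(row[1])] = int(digits)
    return usage


def dwm_is_bloated(tasklist_csv, threshold_kib=THRESHOLD_KIB):
    """True if any listed process is at or above the threshold."""
    return any(kib >= threshold_kib
               for kib in parse_mem_kib(tasklist_csv).values())


def query_dwm():
    """Ask Windows for dwm.exe's current entry (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq dwm.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True)
    return result.stdout
```

On Windows, `dwm_is_bloated(query_dwm())` would then say whether it is time to pull up Process Explorer again.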

And so, now I have a slightly more performant Eileen-II without burning off my finger tips even as the bloody weather gets increasingly hotter and more humid.

That's all I wanted to write for this entry. Till the next update then.
