As it turns out, victim scheduling priority elevation has always been
broken for two reasons:

1. The minimum valid RT priority is 1, not 0. As a result,
   sched_setscheduler_nocheck() always fails with -EINVAL.

2. The thread within a victim thread group which happens to hold the mm
   is not necessarily the only thread with references to the mm, and
   isn't necessarily the thread which will release the final mm
   reference. As a result, victim threads which hold mm references may
   take a while to release them, and the unlucky thread which puts the
   final mm reference may take a very long time to release all memory if
   it doesn't have RT scheduling priority.

These issues cause victims to often take a very long time to release
their memory, possibly up to several seconds depending on system load.
This, in turn, causes Simple LMK to constantly hit the reclaim timeout
and kill more processes, with Simple LMK being rather ineffective since
victims may not release any memory for several seconds.

Fix the broken scheduling priority elevation by changing the RT priority
to the valid lowest priority of 1 and applying it to all threads in the
thread group, instead of just the thread which holds the mm.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
parent 2e24da16e2
commit 4736a011ba
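The first reason given in the message can be checked from userspace. The following is a minimal sketch and not the kernel code from this commit: it uses the POSIX calls sched_get_priority_min() and sched_get_priority_max() to ask the running kernel which priorities are valid for a policy. The helper names rt_priority_is_valid() and lowest_rt_priority() are hypothetical and exist only for this illustration. On Linux, SCHED_FIFO and SCHED_RR accept priorities from 1 to 99, so a priority of 0 is out of range for them.

```c
#include <sched.h>

/*
 * Hypothetical helper (not part of the patch): returns 1 if prio lies
 * within the range the kernel reports for the given policy, else 0.
 */
int rt_priority_is_valid(int policy, int prio)
{
	int lo = sched_get_priority_min(policy);
	int hi = sched_get_priority_max(policy);

	/* Both calls return -1 if the policy itself is not recognized. */
	if (lo < 0 || hi < 0)
		return 0;

	return prio >= lo && prio <= hi;
}

/*
 * Hypothetical helper (not part of the patch): returns the lowest
 * priority the kernel accepts for the given policy, or -1 on error.
 */
int lowest_rt_priority(int policy)
{
	return sched_get_priority_min(policy);
}
```

Calling rt_priority_is_valid(SCHED_RR, 0) reports an invalid priority on Linux, while lowest_rt_priority(SCHED_RR) yields the value the fix switches to. The second half of the fix, applying that priority to every thread in the victim's thread group, happens inside the kernel and is not modeled here.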