Hard-coding adj ranges to search for victims results in a few problems.
Firstly, the hard-coded adjs must be vigilantly updated to match what
userspace uses, which makes long-term support a headache. Secondly, a
full traversal of every running process must be done for each adj range,
which can turn out to be quite expensive, especially if userspace
assigns many different adj values and we want to enumerate them all.
This leads us to the final problem, which is that processes with
different adjs within the same hard-coded adj range will be treated the
same, even though they're not: the process with a higher adj is less
important, and the process with a lower adj is more important. This
could be fixed by enumerating every possible adj, but again, that would
necessitate several scans through the active process list, which is bad
for performance, especially since latency is critical here.
Since adjs are only 16 bits, and we only care about positive adjs, that
leaves us with 15 bits of the adj that matter. This is a relatively
small number of potential adjs (32,768), which makes it possible to
allocate a static array that's indexed using the adj. Each entry in this
array is a pointer to the first task_struct in a singly-linked list of
task_structs sharing an adj. A `simple_lmk_next` member is added to
task_struct to accommodate this linked list. The victim finder now
iterates downward through the array searching for linked lists of tasks,
starting from the highest adj found, so that the lowest-priority
processes are always considered first for reclaim. This fixes all of the
problems mentioned above, and now there is only one traversal through
every running process. The array itself only takes up 256 KiB of memory
on 64-bit, which is a very small price to pay for the advantages gained.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>