commit 4675ff05de2d76d167336b368bd07f3fef6ed5a6 upstream.

Fix up makefiles, remove references, and git rm kmemcheck.

Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tirimbino
parent b9870f8581
commit f369f14861
@@ -1,733 +0,0 @@
Getting started with kmemcheck |
||||
============================== |
||||
|
||||
Vegard Nossum <vegardno@ifi.uio.no> |
||||
|
||||
|
||||
Introduction |
||||
------------ |
||||
|
||||
kmemcheck is a debugging feature for the Linux kernel. More specifically, it
is a dynamic checker that detects and warns about some uses of uninitialized
memory.
||||
|
||||
Userspace programmers might be familiar with Valgrind's memcheck. The main |
||||
difference between memcheck and kmemcheck is that memcheck works for userspace |
||||
programs only, and kmemcheck works for the kernel only. The implementations |
||||
are of course vastly different. Because of this, kmemcheck is not as accurate |
||||
as memcheck, but it turns out to be good enough in practice to discover real |
||||
programmer errors that the compiler is not able to find through static |
||||
analysis. |
||||
|
||||
Enabling kmemcheck on a kernel will probably slow it down to the extent that
the machine will not be usable for normal workloads such as an interactive
desktop. kmemcheck will also cause the kernel to use about twice as much
memory as normal. For this reason, kmemcheck is strictly a debugging feature.
||||
|
||||
|
||||
Downloading |
||||
----------- |
||||
|
||||
As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel. |
||||
|
||||
|
||||
Configuring and compiling |
||||
------------------------- |
||||
|
||||
kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of |
||||
configuration variables must have specific settings in order for the kmemcheck |
||||
menu to even appear in "menuconfig". These are: |
||||
|
||||
- ``CONFIG_CC_OPTIMIZE_FOR_SIZE=n`` |
||||
This option is located under "General setup" / "Optimize for size". |
||||
|
||||
Without this, gcc will use certain optimizations that usually lead to |
||||
false positive warnings from kmemcheck. An example of this is a 16-bit |
||||
field in a struct, where gcc may load 32 bits, then discard the upper |
||||
16 bits. kmemcheck sees only the 32-bit load, and may trigger a |
||||
warning for the upper 16 bits (if they're uninitialized). |
||||
|
||||
- ``CONFIG_SLAB=y`` or ``CONFIG_SLUB=y`` |
||||
This option is located under "General setup" / "Choose SLAB |
||||
allocator". |
||||
|
||||
- ``CONFIG_FUNCTION_TRACER=n`` |
||||
This option is located under "Kernel hacking" / "Tracers" / "Kernel
Function Tracer".
||||
|
||||
When function tracing is compiled in, gcc emits a call to another |
||||
function at the beginning of every function. This means that when the |
||||
page fault handler is called, the ftrace framework will be called |
||||
before kmemcheck has had a chance to handle the fault. If ftrace then |
||||
modifies memory that was tracked by kmemcheck, the result is an |
||||
endless recursive page fault. |
||||
|
||||
- ``CONFIG_DEBUG_PAGEALLOC=n`` |
||||
This option is located under "Kernel hacking" / "Memory Debugging" |
||||
/ "Debug page memory allocations". |
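The prerequisites above can be checked mechanically before opening
"menuconfig". The following is a minimal sketch and not part of kmemcheck:
the function names are hypothetical, and it assumes the usual ``.config``
convention where an enabled option appears as ``CONFIG_FOO=y`` and a disabled
one is either absent or written as ``# CONFIG_FOO is not set``.

```python
import re

# Options from the list above that must be off, and the allocator pair of
# which at least one must be on.
MUST_BE_OFF = (
    "CONFIG_CC_OPTIMIZE_FOR_SIZE",
    "CONFIG_FUNCTION_TRACER",
    "CONFIG_DEBUG_PAGEALLOC",
)
ONE_MUST_BE_ON = ("CONFIG_SLAB", "CONFIG_SLUB")


def enabled_options(config_text):
    """Return the set of options set to 'y' in .config-style text."""
    return set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))


def kmemcheck_prereq_problems(config_text):
    """List the prerequisites that the given configuration violates."""
    on = enabled_options(config_text)
    problems = [f"{opt} must be disabled" for opt in MUST_BE_OFF if opt in on]
    if not any(opt in on for opt in ONE_MUST_BE_ON):
        problems.append("one of CONFIG_SLAB / CONFIG_SLUB must be enabled")
    return problems
```

An empty list means the kmemcheck menu should be reachable; each string
otherwise names one setting to change first.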
||||
|
||||
In addition, I highly recommend turning on ``CONFIG_DEBUG_INFO=y``. This is also |
||||
located under "Kernel hacking". With this, you will be able to get line number |
||||
information from the kmemcheck warnings, which is extremely valuable in |
||||
debugging a problem. This option is not mandatory, however, because it slows |
||||
down the compilation process and produces a much bigger kernel image. |
||||
|
||||
Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory |
||||
Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows |
||||
a description of the kmemcheck configuration variables: |
||||
|
||||
- ``CONFIG_KMEMCHECK`` |
||||
This must be enabled in order to use kmemcheck at all... |
||||
|
||||
- ``CONFIG_KMEMCHECK_``[``DISABLED`` | ``ENABLED`` | ``ONESHOT``]``_BY_DEFAULT`` |
||||
This option controls the status of kmemcheck at boot-time. "Enabled" |
||||
will enable kmemcheck right from the start, "disabled" will boot the |
||||
kernel as normal (but with the kmemcheck code compiled in, so it can |
||||
be enabled at run-time after the kernel has booted), and "one-shot" is |
||||
a special mode which will turn kmemcheck off automatically after |
||||
detecting the first use of uninitialized memory. |
||||
|
||||
If you are using kmemcheck to actively debug a problem, then you |
||||
probably want to choose "enabled" here. |
||||
|
||||
The one-shot mode is mostly useful in automated test setups because it
can prevent floods of warnings and increase the chances of the machine
surviving in case something is really wrong. In other cases, the one-shot
mode could actually be counter-productive because it would turn itself off
at the very first error (even a false positive), and this would get in the
way of debugging the specific problem you were interested in.
||||
|
||||
If you would like to use your kernel as normal, but with a chance to |
||||
enable kmemcheck in case of some problem, it might be a good idea to |
||||
choose "disabled" here. When kmemcheck is disabled, most of the run- |
||||
time overhead is not incurred, and the kernel will be almost as fast |
||||
as normal. |
||||
|
||||
- ``CONFIG_KMEMCHECK_QUEUE_SIZE`` |
||||
Select the maximum number of error reports to store in an internal |
||||
(fixed-size) buffer. Since errors can occur virtually anywhere and in |
||||
any context, we need a temporary storage area which is guaranteed not |
||||
to generate any other page faults when accessed. The queue will be |
||||
emptied as soon as a tasklet may be scheduled. If the queue is full, |
||||
new error reports will be lost. |
||||
|
||||
The default value of 64 is probably fine. If some code produces more |
||||
than 64 errors within an irqs-off section, then the code is likely to |
||||
produce many, many more, too, and these additional reports seldom give |
||||
any more information (the first report is usually the most valuable |
||||
anyway). |
||||
|
||||
This number might have to be adjusted if you are not using a serial
console or similar to capture the kernel log. If you are using the
"dmesg" command to save the log, then getting a lot of kmemcheck
warnings might overflow the kernel log itself, and the earlier reports
will be lost that way instead. Try setting this to 10 or so on
such a setup.
||||
|
||||
- ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT`` |
||||
Select the number of shadow bytes to save along with each entry of the |
||||
error-report queue. These bytes indicate what parts of an allocation |
||||
are initialized, uninitialized, etc. and will be displayed when an |
||||
error is detected to help the debugging of a particular problem. |
||||
|
||||
The number entered here is actually the logarithm of the number of |
||||
bytes that will be saved. So if you pick for example 5 here, kmemcheck |
||||
will save 2^5 = 32 bytes. |
||||
|
||||
The default value should be fine for debugging most problems. It also |
||||
fits nicely within 80 columns. |
||||
|
||||
- ``CONFIG_KMEMCHECK_PARTIAL_OK`` |
||||
This option (when enabled) works around certain GCC optimizations that |
||||
produce 32-bit reads from 16-bit variables where the upper 16 bits are |
||||
thrown away afterwards. |
||||
|
||||
The default value (enabled) is recommended. This may of course hide |
||||
some real errors, but disabling it would probably produce a lot of |
||||
false positives. |
||||
|
||||
- ``CONFIG_KMEMCHECK_BITOPS_OK`` |
||||
This option silences warnings that would be generated for bit-field |
||||
accesses where not all the bits are initialized at the same time. This |
||||
may also hide some real bugs. |
||||
|
||||
This option is probably obsolete, or it should be replaced with |
||||
the kmemcheck-/bitfield-annotations for the code in question. The |
||||
default value is therefore fine. |
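As a quick illustration of the ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT``
arithmetic described above, here is a small sketch. The helper names are
hypothetical, and the two-characters-per-byte width is an assumption read off
the sample report in the "Debugging" section, not something taken from the
kmemcheck source.

```python
def shadow_copy_bytes(shift):
    """Number of shadow bytes saved per report: 2 to the power of `shift`."""
    return 1 << shift


def dump_line_width(shift, chars_per_byte=2):
    """Approximate printed width of one dump line.

    Assumption: every saved byte takes two characters on the line, as in the
    sample report (two hex digits, or a flag letter plus a space).
    """
    return shadow_copy_bytes(shift) * chars_per_byte
```

With the value 5 used as the example above, this gives 32 bytes and a
64-character line. One step higher would no longer fit within 80 columns.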
||||
|
||||
Now compile the kernel as usual. |
||||
|
||||
|
||||
How to use |
||||
---------- |
||||
|
||||
Booting |
||||
~~~~~~~ |
||||
|
||||
First some information about the command-line options. There is only one |
||||
option specific to kmemcheck, and this is called "kmemcheck". It can be used |
||||
to override the default mode as chosen by the ``CONFIG_KMEMCHECK_*_BY_DEFAULT`` |
||||
option. Its possible settings are: |
||||
|
||||
- ``kmemcheck=0`` (disabled) |
||||
- ``kmemcheck=1`` (enabled) |
||||
- ``kmemcheck=2`` (one-shot mode) |
||||
|
||||
If SLUB debugging has been enabled in the kernel, it may take precedence over |
||||
kmemcheck in such a way that the slab caches which are under SLUB debugging |
||||
will not be tracked by kmemcheck. In order to ensure that this doesn't happen |
||||
(even though it shouldn't by default), use SLUB's boot option ``slub_debug``, |
||||
like this: ``slub_debug=-`` |
||||
|
||||
In fact, this option may also be used for fine-grained control over SLUB vs.
kmemcheck. For example, if the command line includes
``kmemcheck=1 slub_debug=,dentry``, then SLUB debugging will be used only
for the "dentry" slab cache, and kmemcheck will track all the other
caches. This is advanced usage, however, and is not generally recommended.
||||
|
||||
|
||||
Run-time enable/disable |
||||
~~~~~~~~~~~~~~~~~~~~~~~ |
||||
|
||||
When the kernel has booted, it is possible to enable or disable kmemcheck at
run-time. WARNING: This feature is still experimental and may cause false
positive warnings to appear. Therefore, try not to use this. If you find that
it doesn't work properly (e.g. you see an unreasonable number of warnings), I
will be happy to take bug reports.
||||
|
||||
Use the file ``/proc/sys/kernel/kmemcheck`` for this purpose, e.g.:: |
||||
|
||||
$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck |
||||
|
||||
The numbers are the same as for the ``kmemcheck=`` command-line option. |
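Since the boot option and the proc file share the same three numbers, a test
harness can treat them uniformly. The sketch below is only an illustration:
the helper names are hypothetical, and the rule that the last ``kmemcheck=``
on a command line wins is an assumption, not something this document states.

```python
import re

# The three settings accepted by both kmemcheck= and the proc file.
MODES = {0: "disabled", 1: "enabled", 2: "one-shot"}


def mode_from_cmdline(cmdline):
    """Return the mode named by kmemcheck=N on a kernel command line, or None."""
    values = re.findall(r"(?:^|\s)kmemcheck=(\d+)(?=\s|$)", cmdline)
    if not values:
        # No override: the CONFIG_KMEMCHECK_*_BY_DEFAULT choice applies.
        return None
    # Assumption: if the option is repeated, the last occurrence wins.
    return MODES.get(int(values[-1]))


def set_mode(value, path="/proc/sys/kernel/kmemcheck"):
    """Write a mode number to the proc file (needs root and a kmemcheck kernel)."""
    if value not in MODES:
        raise ValueError(f"unknown kmemcheck mode: {value}")
    with open(path, "w") as f:
        f.write(f"{value}\n")
```

For example, ``mode_from_cmdline(open("/proc/cmdline").read())`` reports what
was requested at boot, and ``set_mode(0)`` has the same effect as the ``echo``
command shown above.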
||||
|
||||
|
||||
Debugging |
||||
~~~~~~~~~ |
||||
|
||||
A typical report will look something like this:: |
||||
|
||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) |
||||
80000000000000000000000000000000000000000088ffff0000000000000000 |
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u |
||||
^ |
||||
|
||||
Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A |
||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 |
||||
RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002 |
||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 |
||||
RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84 |
||||
RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000 |
||||
R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e |
||||
R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8 |
||||
FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000 |
||||
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 |
||||
CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0 |
||||
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 |
||||
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 |
||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 |
||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 |
||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 |
||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17 |
||||
[<ffffffffffffffff>] 0xffffffffffffffff |
||||
|
||||
The single most valuable piece of information in this report is the RIP (or
EIP on 32-bit) value. This will help us pinpoint exactly which instruction
caused the warning.
||||
|
||||
If your kernel was compiled with ``CONFIG_DEBUG_INFO=y``, then all we have to do |
||||
is give this address to the addr2line program, like this:: |
||||
|
||||
$ addr2line -e vmlinux -i ffffffff8104ede8 |
||||
arch/x86/include/asm/string_64.h:12 |
||||
include/asm-generic/siginfo.h:287 |
||||
kernel/signal.c:380 |
||||
kernel/signal.c:410 |
||||
|
||||
The "``-e vmlinux``" tells addr2line which file to look in. **IMPORTANT:** |
||||
This must be the vmlinux of the kernel that produced the warning in the |
||||
first place! If not, the line number information will almost certainly be |
||||
wrong. |
||||
|
||||
The "``-i``" tells addr2line to also print the line numbers of inlined |
||||
functions. In this case, the flag was very important, because otherwise, |
||||
it would only have printed the first line, which is just a call to |
||||
``memcpy()``, which could be called from a thousand places in the kernel, and |
||||
is therefore not very useful. These inlined functions would not show up in |
||||
the stack trace above, simply because the kernel doesn't load the extra |
||||
debugging information. This technique can of course be used with ordinary |
||||
kernel oopses as well. |
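Copying addresses out of a report by hand gets tedious, so here is a minimal
sketch that pulls them out of the report text and builds the ``addr2line``
invocation used above. The function names are hypothetical, and the parser
assumes the ``[<address>]`` notation and the ``RIP:``/``EIP:`` prefixes seen
in the sample report; other kernel versions may format reports differently.

```python
import re

# Addresses appear as [<hex>] in both the RIP line and the stack trace.
ADDRESS = re.compile(r"\[<([0-9a-f]+)>\]")


def report_addresses(report):
    """Return (rip, stack) where stack lists the trace addresses in order."""
    rip, stack = None, []
    for line in report.splitlines():
        found = ADDRESS.findall(line)
        if not found:
            continue
        if line.lstrip().startswith(("RIP:", "EIP:")):
            rip = found[0]
        elif set(found[0]) != {"f"}:
            # Skip the all-ones terminator entry at the end of the trace.
            stack.append(found[0])
    return rip, stack


def addr2line_command(addresses, vmlinux="vmlinux"):
    """Build the argument list for: addr2line -e vmlinux -i ADDR..."""
    return ["addr2line", "-e", vmlinux, "-i", *addresses]
```

The resulting list can be handed to ``subprocess.run()``. As noted above,
``vmlinux`` must be the image of the kernel that produced the warning.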
||||
|
||||
In this case, it's the caller of ``memcpy()`` that is interesting, and it can be |
||||
found in ``include/asm-generic/siginfo.h``, line 287:: |
||||
|
||||
281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from) |
||||
282 { |
||||
283 if (from->si_code < 0) |
||||
284 memcpy(to, from, sizeof(*to)); |
||||
285 else |
||||
286 /* _sigchld is currently the largest know union member */ |
||||
287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld)); |
||||
288 } |
||||
|
||||
Since this was a read (kmemcheck usually warns about reads only, though it can |
||||
warn about writes to unallocated or freed memory as well), it was probably the |
||||
"from" argument which contained some uninitialized bytes. Following the chain |
||||
of calls, we move upwards to see where "from" was allocated or initialized, |
||||
``kernel/signal.c``, line 380:: |
||||
|
||||
359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info) |
||||
360 { |
||||
... |
||||
367 list_for_each_entry(q, &list->list, list) { |
||||
368 if (q->info.si_signo == sig) { |
||||
369 if (first) |
||||
370 goto still_pending; |
||||
371 first = q; |
||||
... |
||||
377 if (first) { |
||||
378 still_pending: |
||||
379 list_del_init(&first->list); |
||||
380 copy_siginfo(info, &first->info); |
||||
381 __sigqueue_free(first); |
||||
... |
||||
392 } |
||||
393 } |
||||
|
||||
Here, it is ``&first->info`` that is being passed on to ``copy_siginfo()``. The |
||||
variable ``first`` was found on a list -- passed in as the second argument to |
||||
``collect_signal()``. We continue our journey through the stack, to figure out |
||||
where the item on "list" was allocated or initialized. We move to line 410:: |
||||
|
||||
395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask, |
||||
396 siginfo_t *info) |
||||
397 { |
||||
... |
||||
410 collect_signal(sig, pending, info); |
||||
... |
||||
414 } |
||||
|
||||
Now we need to follow the ``pending`` pointer, since that is being passed on to |
||||
``collect_signal()`` as ``list``. At this point, we've run out of lines from the |
||||
"addr2line" output. Not to worry, we just paste the next addresses from the |
||||
kmemcheck stack dump, i.e.:: |
||||
|
||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170 |
||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390 |
||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0 |
||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17 |
||||
|
||||
$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \ |
||||
ffffffff8100b87d ffffffff8100c7b5 |
||||
kernel/signal.c:446 |
||||
kernel/signal.c:1806 |
||||
arch/x86/kernel/signal.c:805 |
||||
arch/x86/kernel/signal.c:871 |
||||
arch/x86/kernel/entry_64.S:694 |
||||
|
||||
Remember that since these addresses were found on the stack and not as the |
||||
RIP value, they actually point to the _next_ instruction (they are return |
||||
addresses). This becomes obvious when we look at the code for line 446:: |
||||
|
||||
422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info) |
||||
423 { |
||||
... |
||||
431 signr = __dequeue_signal(&tsk->signal->shared_pending, |
||||
432 mask, info); |
||||
433 /* |
||||
434 * itimer signal ? |
||||
435 * |
||||
436 * itimers are process shared and we restart periodic |
||||
437 * itimers in the signal delivery path to prevent DoS |
||||
438 * attacks in the high resolution timer case. This is |
||||
439 * compliant with the old way of self restarting |
||||
440 * itimers, as the SIGALRM is a legacy signal and only |
||||
441 * queued once. Changing the restart behaviour to |
||||
442 * restart the timer in the signal dequeue path is |
||||
443 * reducing the timer noise on heavy loaded !highres |
||||
444 * systems too. |
||||
445 */ |
||||
446 if (unlikely(signr == SIGALRM)) { |
||||
... |
||||
489 } |
||||
|
||||
So instead of looking at 446, we should be looking at 431, which is the line |
||||
that executes just before 446. Here we see that what we are looking for is |
||||
``&tsk->signal->shared_pending``. |
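A common way to avoid this off-by-one-statement effect is to subtract one from
each return address before looking it up, so that the address falls inside the
call instruction itself. The helper below is a sketch of that trick with a
hypothetical name; I have not checked that ``addr2line`` then reports line 431
for this particular kernel image.

```python
def call_site(return_address, width=16):
    """Map a return address (hex string) to an address inside the call before it.

    Any address within the call instruction belongs to the calling line, so
    subtracting one byte from the return address is enough.
    """
    return format(int(return_address, 16) - 1, f"0{width}x")
```

This adjustment should only be applied to addresses taken from the stack
trace. The RIP value already points at the instruction of interest.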
||||
|
||||
Our next task is now to figure out which function puts items on this
``shared_pending`` list. A crude but effective tool is ``git grep``::
||||
|
||||
$ git grep -n 'shared_pending' kernel/ |
||||
... |
||||
kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending; |
||||
kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending; |
||||
... |
||||
|
||||
There were more results, but none of them were related to list operations, |
||||
and these were the only assignments. We inspect the line numbers more closely |
||||
and find that this is indeed where items are being added to the list:: |
||||
|
||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, |
||||
817 int group) |
||||
818 { |
||||
... |
||||
828 pending = group ? &t->signal->shared_pending : &t->pending; |
||||
... |
||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && |
||||
852 (is_si_special(info) || |
||||
853 info->si_code >= 0))); |
||||
854 if (q) { |
||||
855 list_add_tail(&q->list, &pending->list); |
||||
... |
||||
890 } |
||||
|
||||
and:: |
||||
|
||||
1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group) |
||||
1310 { |
||||
.... |
||||
1339 pending = group ? &t->signal->shared_pending : &t->pending; |
||||
1340 list_add_tail(&q->list, &pending->list); |
||||
.... |
||||
1347 } |
||||
|
||||
In the first case, the list element we are looking for, ``q``, is being |
||||
returned from the function ``__sigqueue_alloc()``, which looks like an |
||||
allocation function. Let's take a look at it:: |
||||
|
||||
187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags, |
||||
188 int override_rlimit) |
||||
189 { |
||||
190 struct sigqueue *q = NULL; |
||||
191 struct user_struct *user; |
||||
192 |
||||
193 /* |
||||
194 * We won't get problems with the target's UID changing under us |
||||
195 * because changing it requires RCU be used, and if t != current, the |
||||
196 * caller must be holding the RCU readlock (by way of a spinlock) and |
||||
197 * we use RCU protection here |
||||
198 */ |
||||
199 user = get_uid(__task_cred(t)->user); |
||||
200 atomic_inc(&user->sigpending); |
||||
201 if (override_rlimit || |
||||
202 atomic_read(&user->sigpending) <= |
||||
203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) |
||||
204 q = kmem_cache_alloc(sigqueue_cachep, flags); |
||||
205 if (unlikely(q == NULL)) { |
||||
206 atomic_dec(&user->sigpending); |
||||
207 free_uid(user); |
||||
208 } else { |
||||
209 INIT_LIST_HEAD(&q->list); |
||||
210 q->flags = 0; |
||||
211 q->user = user; |
||||
212 } |
||||
213 |
||||
214 return q; |
||||
215 } |
||||
|
||||
We see that this function initializes ``q->list``, ``q->flags``, and |
||||
``q->user``. It seems that now is the time to look at the definition of |
||||
``struct sigqueue``, e.g.:: |
||||
|
||||
14 struct sigqueue { |
||||
15 struct list_head list; |
||||
16 int flags; |
||||
17 siginfo_t info; |
||||
18 struct user_struct *user; |
||||
19 }; |
||||
|
||||
And, you might remember, it was a ``memcpy()`` on ``&first->info`` that |
||||
caused the warning, so this makes perfect sense. It also seems reasonable |
||||
to assume that it is the caller of ``__sigqueue_alloc()`` that has the |
||||
responsibility of filling out (initializing) this member. |
||||
|
||||
But just which fields of the struct were uninitialized? Let's look at |
||||
kmemcheck's report again:: |
||||
|
||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024) |
||||
80000000000000000000000000000000000000000088ffff0000000000000000 |
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u |
||||
^ |
||||
|
||||
These first two lines are the memory dump of the memory object itself, and |
||||
the shadow bytemap, respectively. The memory object itself is in this case |
||||
``&first->info``. Just beware that the start of this dump is NOT the start |
||||
of the object itself! The position of the caret (^) corresponds with the |
||||
address of the read (ffff88003e4a2024). |
||||
|
||||
The shadow bytemap dump legend is as follows: |
||||
|
||||
- i: initialized |
||||
- u: uninitialized |
||||
- a: unallocated (memory has been allocated by the slab layer, but has not |
||||
yet been handed off to anybody) |
||||
- f: freed (memory has been allocated by the slab layer, but has been freed |
||||
by the previous owner) |
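To make the two dump lines easier to read, they can be paired up byte by byte.
The following sketch uses hypothetical helper names and assumes the layout of
the sample report: one line of hex digits, and one line of space-separated
flag letters with one letter per byte.

```python
def parse_dump(hex_line, shadow_line):
    """Return (data bytes, list of shadow flags) for one report."""
    data = bytes.fromhex(hex_line.strip())
    flags = shadow_line.split()
    if len(data) != len(flags):
        raise ValueError("dump and shadow lines describe different lengths")
    return data, flags


def flagged_ranges(flags, flag="u"):
    """Return half-open (start, end) byte ranges, relative to the start of the
    dump (not of the object), whose shadow flag equals `flag`."""
    ranges, start = [], None
    for i, f in enumerate(flags + [None]):  # sentinel closes a trailing run
        if f == flag and start is None:
            start = i
        elif f != flag and start is not None:
            ranges.append((start, i))
            start = None
    return ranges
```

For the report above this yields two uninitialized runs, bytes 4 to 7 and
bytes 16 to 31 of the dump. Remember that these offsets are relative to the
start of the dump, which is not the start of the object.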
||||
|
||||
In order to figure out where (relative to the start of the object) the |
||||
uninitialized memory was located, we have to look at the disassembly. For |
||||
that, we'll need the RIP address again:: |
||||
|
||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190 |
||||
|
||||
$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8: |
||||
ffffffff8104edc8: mov %r8,0x8(%r8) |
||||
ffffffff8104edcc: test %r10d,%r10d |
||||
ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168> |
||||
ffffffff8104edd5: mov %rax,%rdx |
||||
ffffffff8104edd8: mov $0xc,%ecx |
||||
ffffffff8104eddd: mov %r13,%rdi |
||||
ffffffff8104ede0: mov $0x30,%eax |
||||
ffffffff8104ede5: mov %rdx,%rsi |
||||
ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi) |
||||
ffffffff8104edea: test $0x2,%al |
||||
ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0> |
||||
ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi) |
||||
ffffffff8104edf0: test $0x1,%al |
||||
ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5> |
||||
ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi) |
||||
ffffffff8104edf5: mov %r8,%rdi |
||||
ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free> |
||||
|
||||
As expected, it's the "``rep movsl``" instruction from the ``memcpy()``
that causes the warning. ``REP MOVSL`` uses the register ``RCX`` to count
the number of remaining iterations. By taking a look at the register dump
again (from the kmemcheck report), we can figure out how many iterations
were left when the warning triggered::
||||
|
||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009 |
||||
|
||||
By looking at the disassembly, we also see that ``%ecx`` is being loaded |
||||
with the value ``$0xc`` just before (ffffffff8104edd8), so we are very |
||||
lucky. Keep in mind that this is the number of iterations, not bytes. And |
||||
since this is a "long" operation, we need to multiply by 4 to get the |
||||
number of bytes. So this means that the uninitialized value was encountered |
||||
at 4 * (0xc - 0x9) = 12 bytes from the start of the object. |
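The same calculation, written out as a tiny helper (the function name is
hypothetical, the numbers are the ones from the disassembly and register dump
above):

```python
def rep_movs_offset(initial_count, remaining_count, element_size=4):
    """Byte offset, from the start of the copy, of the element being moved.

    initial_count:   value loaded into the count register before the copy
    remaining_count: value of the count register when the warning triggered
    element_size:    4 for MOVSL, 2 for MOVSW, 1 for MOVSB
    """
    if not 0 <= remaining_count <= initial_count:
        raise ValueError("remaining count must lie between 0 and the initial count")
    return element_size * (initial_count - remaining_count)


offset = rep_movs_offset(0xc, 0x9)  # 4 * (0xc - 0x9)
```

As a sanity check, the full copy is 4 * 0xc = 0x30 bytes, which matches the
``$0x30`` loaded into ``%eax`` in the disassembly and the RAX value in the
register dump.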
||||
|
||||
We can now try to figure out which field of the "``struct siginfo``" was
not initialized. This is the beginning of the struct::
||||
|
||||
40 typedef struct siginfo { |
||||
41 int si_signo; |
||||
42 int si_errno; |
||||
43 int si_code; |
||||
44 |
||||
45 union { |
||||
.. |
||||
92 } _sifields; |
||||
93 } siginfo_t; |
||||
|
||||
On 64-bit, an int is 4 bytes long, so the three int members occupy bytes 0
through 11, and it must be the union member (at offset 12) that has not been
initialized. We can verify this using gdb::
||||
|
||||
$ gdb vmlinux |
||||
... |
||||
(gdb) p &((struct siginfo *) 0)->_sifields |
||||
$1 = (union {...} *) 0x10 |
||||
|
||||
Actually, it seems that the union member is located at offset 0x10 -- which |
||||
means that gcc has inserted 4 bytes of padding between the members ``si_code`` |
||||
and ``_sifields``. We can now get a fuller picture of the memory dump:: |
||||
|
||||
_----------------------------=> si_code |
||||
/ _--------------------=> (padding) |
||||
| / _------------=> _sifields(._kill._pid) |
||||
| | / _----=> _sifields(._kill._uid) |
||||
| | | / |
||||
-------|-------|-------|-------| |
||||
80000000000000000000000000000000000000000088ffff0000000000000000 |
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u |
||||
|
||||
This allows us to realize another important fact: ``si_code`` contains the |
||||
value 0x80. Remember that x86 is little endian, so the first 4 bytes |
||||
"80000000" are really the number 0x00000080. With a bit of research, we |
||||
find that this is actually the constant ``SI_KERNEL`` defined in |
||||
``include/asm-generic/siginfo.h``:: |
||||
|
||||
144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */ |
||||
|
||||
This macro is used in exactly one place in the x86 kernel: In ``send_signal()`` |
||||
in ``kernel/signal.c``:: |
||||
|
||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t, |
||||
817 int group) |
||||
818 { |
||||
... |
||||
828 pending = group ? &t->signal->shared_pending : &t->pending; |
||||
... |
||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && |
||||
852 (is_si_special(info) || |
||||
853 info->si_code >= 0))); |
||||
854 if (q) { |
||||
855 list_add_tail(&q->list, &pending->list); |
||||
856 switch ((unsigned long) info) { |
||||
... |
||||
865 case (unsigned long) SEND_SIG_PRIV: |
||||
866 q->info.si_signo = sig; |
||||
867 q->info.si_errno = 0; |
||||
868 q->info.si_code = SI_KERNEL; |
||||
869 q->info.si_pid = 0; |
||||
870 q->info.si_uid = 0; |
||||
871 break; |
||||
... |
||||
890 } |
||||
|
||||
Not only does this match with the ``.si_code`` member, it also matches the place |
||||
we found earlier when looking for where siginfo_t objects are enqueued on the |
||||
``shared_pending`` list. |
||||
|
||||
So to sum up: It seems that it is the padding introduced by the compiler |
||||
between two struct fields that is uninitialized, and this gets reported when |
||||
we do a ``memcpy()`` on the struct. This means that we have identified a false |
||||
positive warning. |
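The padding can also be reproduced without a kernel image. The sketch below
uses Python's ``ctypes`` with a simplified stand-in for ``struct siginfo``:
the class names are hypothetical, and the union has only two members, one of
them a pointer, which is an assumption that is enough to force pointer
alignment. It is not the real kernel definition.

```python
import ctypes


class KillFields(ctypes.Structure):
    # Stand-in for _sifields._kill: a pid and a uid.
    _fields_ = [("_pid", ctypes.c_int), ("_uid", ctypes.c_uint)]


class SigFields(ctypes.Union):
    # Simplified: the pointer member gives the union pointer alignment.
    _fields_ = [("_kill", KillFields), ("_ptr", ctypes.c_void_p)]


class SigInfo(ctypes.Structure):
    _fields_ = [
        ("si_signo", ctypes.c_int),
        ("si_errno", ctypes.c_int),
        ("si_code", ctypes.c_int),
        ("_sifields", SigFields),
    ]


def padding_before(struct, field):
    """Bytes of compiler padding between `field` and the member before it."""
    names = [name for name, _ in struct._fields_]
    index = names.index(field)
    if index == 0:
        raise ValueError("the first member has no predecessor")
    prev = getattr(struct, names[index - 1])
    return getattr(struct, field).offset - (prev.offset + prev.size)
```

On a 64-bit build, ``padding_before(SigInfo, "_sifields")`` gives the same 4
bytes that gdb revealed, because the union has to start on an 8-byte boundary.
On a 32-bit build there would be no padding at this position.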
||||
|
||||
Normally, kmemcheck will not report uninitialized accesses in ``memcpy()`` calls |
||||
when both the source and destination addresses are tracked. (Instead, we copy |
||||
the shadow bytemap as well). In this case, the destination address clearly |
||||
was not tracked. We can dig a little deeper into the stack trace from above:: |
||||
|
||||
arch/x86/kernel/signal.c:805 |
||||
arch/x86/kernel/signal.c:871 |
||||
arch/x86/kernel/entry_64.S:694 |
||||
|
||||
And we clearly see that the destination siginfo object is located on the |
||||
stack:: |
||||
|
||||
782 static void do_signal(struct pt_regs *regs) |
||||
783 { |
||||
784 struct k_sigaction ka; |
||||
785 siginfo_t info; |
||||
... |
||||
804 signr = get_signal_to_deliver(&info, &ka, regs, NULL); |
||||
... |
||||
854 } |
||||
|
||||
And this ``&info`` is what eventually gets passed to ``copy_siginfo()`` as the |
||||
destination argument. |
||||
|
||||
Now, even though we didn't find an actual error here, the example is still a
good one, because it shows how one would go about finding out what the report
was all about.
||||
|
||||
|
||||
Annotating false positives |
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~ |
||||
|
||||
There are a few different ways to make annotations in the source code that |
||||
will keep kmemcheck from checking and reporting certain allocations. Here |
||||
they are: |
||||
|
||||
- ``__GFP_NOTRACK_FALSE_POSITIVE`` |
||||
This flag can be passed to ``kmalloc()`` or ``kmem_cache_alloc()``
(therefore also to other functions that end up calling one of
these) to indicate that the allocation should not be tracked
because it would lead to a false positive report. This is a "big
hammer" way of silencing kmemcheck; after all, even if the false
positive pertains to a particular field in a struct, for example, we
will now lose the ability to find (real) errors in other parts of
the same struct.
||||
|
||||
Example:: |
||||
|
||||
/* No warnings will ever trigger on accessing any part of x */ |
||||
x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE); |
||||
|
||||
- ``kmemcheck_bitfield_begin(name)``/``kmemcheck_bitfield_end(name)`` and |
||||
``kmemcheck_annotate_bitfield(ptr, name)`` |
||||
The first two of these three macros can be used inside struct |
||||
definitions to signal, respectively, the beginning and end of a |
||||
bitfield. Additionally, this will assign the bitfield a name, which |
||||
is given as an argument to the macros. |
||||
|
||||
Having used these markers, one can later use
``kmemcheck_annotate_bitfield()`` at the point of allocation, to indicate
which parts of the allocation are part of a bitfield.
||||
|
||||
Example:: |
||||
|
||||
struct foo { |
||||
int x; |
||||
|
||||
kmemcheck_bitfield_begin(flags); |
||||
int flag_a:1; |
||||
int flag_b:1; |
||||
kmemcheck_bitfield_end(flags); |
||||
|
||||
int y; |
||||
}; |
||||
|
||||
struct foo *x = kmalloc(sizeof *x, GFP_KERNEL);
||||
|
||||
/* No warnings will trigger on accessing the bitfield of x */ |
||||
kmemcheck_annotate_bitfield(x, flags); |
||||
|
||||
Note that ``kmemcheck_annotate_bitfield()`` can be used even before the |
||||
return value of ``kmalloc()`` is checked -- in other words, passing NULL |
||||
as the first argument is legal (and will do nothing). |
||||
|
||||
|
||||
Reporting errors |
||||
---------------- |
||||
|
||||
As we have seen, kmemcheck will produce false positive reports. Therefore, it |
||||
is not very wise to blindly post kmemcheck warnings to mailing lists and |
||||
maintainers. Instead, I encourage maintainers and developers to find errors |
||||
in their own code. If you get a warning, you can try to work around it, try |
||||
to figure out if it's a real error or not, or simply ignore it. Most |
||||
developers know their own code and will quickly and efficiently determine the |
||||
root cause of a kmemcheck report. This is therefore also the most efficient |
||||
way to work with kmemcheck. |
||||
|
||||
That said, we (the kmemcheck maintainers) will always be on the lookout for |
||||
false positives that we can annotate and silence. So whatever you find, |
||||
please drop us a note privately! Kernel configs and steps to reproduce (if |
||||
available) are of course a great help too. |
||||
|
||||
Happy hacking! |
||||
|
||||
|
||||
Technical description |
||||
--------------------- |
||||
|
||||
kmemcheck works by marking memory pages non-present. This means that whenever |
||||
somebody attempts to access the page, a page fault is generated. The page |
||||
fault handler notices that the page was in fact only hidden, and so it calls |
||||
on the kmemcheck code to make further investigations. |
||||
|
||||
When the investigations are completed, kmemcheck "shows" the page by marking |
||||
it present (as it would be under normal circumstances). This way, the |
||||
interrupted code can continue as usual. |
||||
|
||||
But after the instruction has been executed, we should hide the page again, so |
||||
that we can catch the next access too! Now kmemcheck makes use of a debugging |
||||
feature of the processor, namely single-stepping. When the processor has |
||||
finished the one instruction that generated the memory access, a debug |
||||
exception is raised. From here, we simply hide the page again and continue |
||||
execution, this time with the single-stepping feature turned off. |
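The show / single-step / hide cycle described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct sim_page`, `sim_fault()`, `sim_trap()` and `sim_access()` are hypothetical names invented for this illustration, and the boolean fields merely stand in for the present bit, the "hidden" marker and the single-step flag. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one tracked page; not kernel code. */
struct sim_page {
    bool present;        /* stands in for the "present" bit of the PTE */
    bool hidden;         /* page is tracked and was hidden on purpose */
    bool single_step;    /* stands in for the CPU single-step flag */
    unsigned int faults; /* number of simulated page faults */
    unsigned int traps;  /* number of simulated debug exceptions */
};

/* Page fault path: if the page was only hidden, show it and arm stepping. */
static bool sim_fault(struct sim_page *p)
{
    if (p->present || !p->hidden)
        return false; /* not a fault caused by hiding the page */

    p->present = true;
    p->single_step = true;
    p->faults++;
    return true;
}

/* Debug exception path: hide the page again and disarm stepping. */
static void sim_trap(struct sim_page *p)
{
    assert(p->single_step);
    p->present = false;
    p->single_step = false;
    p->traps++;
}

/* One memory access: fault, run the single instruction, trap. */
static int sim_access(struct sim_page *p, const int *mem)
{
    bool stepped = false;
    int value;

    if (!p->present)
        stepped = sim_fault(p);

    value = *mem; /* the one instruction that touched the page */

    if (stepped)
        sim_trap(p);

    return value;
}
```

Calling `sim_access()` twice on a page that starts out hidden produces two faults and two traps, and the page is hidden again after each access, which is the property that lets every access be observed.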
||||
|
||||
kmemcheck requires some assistance from the memory allocator in order to work. |
||||
The memory allocator needs to |
||||
|
||||
1. Tell kmemcheck about newly allocated pages and pages that are about to |
||||
be freed. This allows kmemcheck to set up and tear down the shadow memory |
||||
for the pages in question. The shadow memory stores the status of each |
||||
byte in the allocation proper, e.g. whether it is initialized or |
||||
uninitialized. |
||||
|
||||
2. Tell kmemcheck which parts of memory should be marked uninitialized. |
||||
There are actually a few more states, such as "not yet allocated" and |
||||
"recently freed". |
||||
|
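The role of the shadow memory in the two points above can be sketched in userspace as one status byte per data byte. The names below (`enum sim_shadow`, `struct sim_object`, `sim_mark()`, `sim_test_all()`, `sim_write()`) and the fixed 16-byte object are assumptions made for this illustration; only the idea of per-byte states such as uninitialized, initialized and freed comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-byte states, modelled on the ones named in the text. */
enum sim_shadow {
    SIM_UNALLOCATED,
    SIM_UNINITIALIZED,
    SIM_INITIALIZED,
    SIM_FREED,
};

#define SIM_SIZE 16

/* An allocation together with its shadow: one status byte per data byte. */
struct sim_object {
    unsigned char data[SIM_SIZE];
    unsigned char shadow[SIM_SIZE];
};

/* Set the status of n bytes starting at offset off. */
static void sim_mark(struct sim_object *o, size_t off, size_t n,
                     enum sim_shadow status)
{
    assert(off + n <= SIM_SIZE);
    memset(o->shadow + off, status, n);
}

/* Return the status of the first byte that is not initialized, if any. */
static enum sim_shadow sim_test_all(const struct sim_object *o,
                                    size_t off, size_t n)
{
    size_t i;

    assert(off + n <= SIM_SIZE);
    for (i = 0; i < n; ++i) {
        if (o->shadow[off + i] != SIM_INITIALIZED)
            return (enum sim_shadow) o->shadow[off + i];
    }
    return SIM_INITIALIZED;
}

/* A write stores the data and marks the written bytes as initialized. */
static void sim_write(struct sim_object *o, size_t off,
                      const void *src, size_t n)
{
    assert(off + n <= SIM_SIZE);
    memcpy(o->data + off, src, n);
    sim_mark(o, off, n, SIM_INITIALIZED);
}
```

In this model the allocator would call `sim_mark()` with `SIM_UNINITIALIZED` when handing out the object and with `SIM_FREED` when it is released; a read is reported whenever `sim_test_all()` returns anything other than `SIM_INITIALIZED`.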
||||
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return |
||||
memory that can take page faults because of kmemcheck. |
||||
|
||||
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still |
||||
request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags. |
||||
This does not prevent the page faults from occurring, however, but marks the |
||||
object in question as being initialized so that no warnings will ever be |
||||
produced for this object. |
||||
|
||||
Currently, the SLAB and SLUB allocators are supported by kmemcheck. |
@ -1 +0,0 @@ |
||||
obj-y := error.o kmemcheck.o opcode.o pte.o selftest.o shadow.o
|
@ -1,658 +0,0 @@ |
||||
/**
|
||||
* kmemcheck - a heavyweight memory checker for the linux kernel |
||||
* Copyright (C) 2007, 2008 Vegard Nossum <vegardno@ifi.uio.no> |
||||
* (With a lot of help from Ingo Molnar and Pekka Enberg.) |
||||
* |
||||
* This program is free software; you can redistribute it and/or modify |
||||
* it under the terms of the GNU General Public License (version 2) as |
||||
* published by the Free Software Foundation. |
||||
*/ |
||||
|
||||
#include <linux/init.h> |
||||
#include <linux/interrupt.h> |
||||
#include <linux/kallsyms.h> |
||||
#include <linux/kernel.h> |
||||
#include <linux/kmemcheck.h> |
||||
#include <linux/mm.h> |
||||
#include <linux/page-flags.h> |
||||
#include <linux/percpu.h> |
||||
#include <linux/ptrace.h> |
||||
#include <linux/string.h> |
||||
#include <linux/types.h> |
||||
|
||||
#include <asm/cacheflush.h> |
||||
#include <asm/kmemcheck.h> |
||||
#include <asm/pgtable.h> |
||||
#include <asm/tlbflush.h> |
||||
|
||||
#include "error.h" |
||||
#include "opcode.h" |
||||
#include "pte.h" |
||||
#include "selftest.h" |
||||
#include "shadow.h" |
||||
|
||||
|
||||
#ifdef CONFIG_KMEMCHECK_DISABLED_BY_DEFAULT |
||||
# define KMEMCHECK_ENABLED 0 |
||||
#endif |
||||
|
||||
#ifdef CONFIG_KMEMCHECK_ENABLED_BY_DEFAULT |
||||
# define KMEMCHECK_ENABLED 1 |
||||
#endif |
||||
|
||||
#ifdef CONFIG_KMEMCHECK_ONESHOT_BY_DEFAULT |
||||
# define KMEMCHECK_ENABLED 2 |
||||
#endif |
||||
|
||||
int kmemcheck_enabled = KMEMCHECK_ENABLED; |
||||
|
||||
int __init kmemcheck_init(void) |
||||
{ |
||||
#ifdef CONFIG_SMP |
||||
/*
|
||||
* Limit SMP to use a single CPU. We rely on the fact that this code |
||||
* runs before SMP is set up. |
||||
*/ |
||||
if (setup_max_cpus > 1) { |
||||
printk(KERN_INFO |
||||
"kmemcheck: Limiting number of CPUs to 1.\n"); |
||||
setup_max_cpus = 1; |
||||
} |
||||
#endif |
||||
|
||||
if (!kmemcheck_selftest()) { |
||||
printk(KERN_INFO "kmemcheck: self-tests failed; disabling\n"); |
||||
kmemcheck_enabled = 0; |
||||
return -EINVAL; |
||||
} |
||||
|
||||
printk(KERN_INFO "kmemcheck: Initialized\n"); |
||||
return 0; |
||||
} |
||||
|
||||
early_initcall(kmemcheck_init); |
||||
|
||||
/*
|
||||
* We need to parse the kmemcheck= option before any memory is allocated. |
||||
*/ |
||||
static int __init param_kmemcheck(char *str) |
||||
{ |
||||
int val; |
||||
int ret; |
||||
|
||||
if (!str) |
||||
return -EINVAL; |
||||
|
||||
ret = kstrtoint(str, 0, &val); |
||||
if (ret) |
||||
return ret; |
||||
kmemcheck_enabled = val; |
||||
return 0; |
||||
} |
||||
|
||||
early_param("kmemcheck", param_kmemcheck); |
||||
|
||||
int kmemcheck_show_addr(unsigned long address) |
||||
{ |
||||
pte_t *pte; |
||||
|
||||
pte = kmemcheck_pte_lookup(address); |
||||
if (!pte) |
||||
return 0; |
||||
|
||||
set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); |
||||
__flush_tlb_one(address); |
||||
return 1; |
||||
} |
||||
|
||||
int kmemcheck_hide_addr(unsigned long address) |
||||
{ |
||||
pte_t *pte; |
||||
|
||||
pte = kmemcheck_pte_lookup(address); |
||||
if (!pte) |
||||
return 0; |
||||
|
||||
set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); |
||||
__flush_tlb_one(address); |
||||
return 1; |
||||
} |
||||
|
||||
struct kmemcheck_context { |
||||
bool busy; |
||||
int balance; |
||||
|
||||
/*
|
||||
* There can be at most two memory operands to an instruction, but |
||||
* each address can cross a page boundary -- so we may need up to |
||||
* four addresses that must be hidden/revealed for each fault. |
||||
*/ |
||||
unsigned long addr[4]; |
||||
unsigned long n_addrs; |
||||
unsigned long flags; |
||||
|
||||
/* Data size of the instruction that caused a fault. */ |
||||
unsigned int size; |
||||
}; |
||||
|
||||
static DEFINE_PER_CPU(struct kmemcheck_context, kmemcheck_context); |
||||
|
||||
bool kmemcheck_active(struct pt_regs *regs) |
||||
{ |
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
|
||||
return data->balance > 0; |
||||
} |
||||
|
||||
/* Save an address that needs to be shown/hidden */ |
||||
static void kmemcheck_save_addr(unsigned long addr) |
||||
{ |
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
|
||||
BUG_ON(data->n_addrs >= ARRAY_SIZE(data->addr)); |
||||
data->addr[data->n_addrs++] = addr; |
||||
} |
||||
|
||||
static unsigned int kmemcheck_show_all(void) |
||||
{ |
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
unsigned int i; |
||||
unsigned int n; |
||||
|
||||
n = 0; |
||||
for (i = 0; i < data->n_addrs; ++i) |
||||
n += kmemcheck_show_addr(data->addr[i]); |
||||
|
||||
return n; |
||||
} |
||||
|
||||
static unsigned int kmemcheck_hide_all(void) |
||||
{ |
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
unsigned int i; |
||||
unsigned int n; |
||||
|
||||
n = 0; |
||||
for (i = 0; i < data->n_addrs; ++i) |
||||
n += kmemcheck_hide_addr(data->addr[i]); |
||||
|
||||
return n; |
||||
} |
||||
|
||||
/*
|
||||
* Called from the #PF handler. |
||||
*/ |
||||
void kmemcheck_show(struct pt_regs *regs) |
||||
{ |
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
|
||||
BUG_ON(!irqs_disabled()); |
||||
|
||||
if (unlikely(data->balance != 0)) { |
||||
kmemcheck_show_all(); |
||||
kmemcheck_error_save_bug(regs); |
||||
data->balance = 0; |
||||
return; |
||||
} |
||||
|
||||
/*
|
||||
* None of the addresses actually belonged to kmemcheck. Note that |
||||
* this is not an error. |
||||
*/ |
||||
if (kmemcheck_show_all() == 0) |
||||
return; |
||||
|
||||
++data->balance; |
||||
|
||||
/*
|
||||
* The IF needs to be cleared as well, so that the faulting |
||||
* instruction can run "uninterrupted". Otherwise, we might take |
||||
* an interrupt and start executing that before we've had a chance |
||||
* to hide the page again. |
||||
* |
||||
* NOTE: In the rare case of multiple faults, we must not override |
||||
* the original flags: |
||||
*/ |
||||
if (!(regs->flags & X86_EFLAGS_TF)) |
||||
data->flags = regs->flags; |
||||
|
||||
regs->flags |= X86_EFLAGS_TF; |
||||
regs->flags &= ~X86_EFLAGS_IF; |
||||
} |
||||
|
||||
/*
|
||||
* Called from the #DB handler. |
||||
*/ |
||||
void kmemcheck_hide(struct pt_regs *regs) |
||||
{ |
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
int n; |
||||
|
||||
BUG_ON(!irqs_disabled()); |
||||
|
||||
if (unlikely(data->balance != 1)) { |
||||
kmemcheck_show_all(); |
||||
kmemcheck_error_save_bug(regs); |
||||
data->n_addrs = 0; |
||||
data->balance = 0; |
||||
|
||||
if (!(data->flags & X86_EFLAGS_TF)) |
||||
regs->flags &= ~X86_EFLAGS_TF; |
||||
if (data->flags & X86_EFLAGS_IF) |
||||
regs->flags |= X86_EFLAGS_IF; |
||||
return; |
||||
} |
||||
|
||||
if (kmemcheck_enabled) |
||||
n = kmemcheck_hide_all(); |
||||
else |
||||
n = kmemcheck_show_all(); |
||||
|
||||
if (n == 0) |
||||
return; |
||||
|
||||
--data->balance; |
||||
|
||||
data->n_addrs = 0; |
||||
|
||||
if (!(data->flags & X86_EFLAGS_TF)) |
||||
regs->flags &= ~X86_EFLAGS_TF; |
||||
if (data->flags & X86_EFLAGS_IF) |
||||
regs->flags |= X86_EFLAGS_IF; |
||||
} |
||||
|
||||
void kmemcheck_show_pages(struct page *p, unsigned int n) |
||||
{ |
||||
unsigned int i; |
||||
|
||||
for (i = 0; i < n; ++i) { |
||||
unsigned long address; |
||||
pte_t *pte; |
||||
unsigned int level; |
||||
|
||||
address = (unsigned long) page_address(&p[i]); |
||||
pte = lookup_address(address, &level); |
||||
BUG_ON(!pte); |
||||
BUG_ON(level != PG_LEVEL_4K); |
||||
|
||||
set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); |
||||
set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_HIDDEN)); |
||||
__flush_tlb_one(address); |
||||
} |
||||
} |
||||
|
||||
bool kmemcheck_page_is_tracked(struct page *p) |
||||
{ |
||||
/* This will also check the "hidden" flag of the PTE. */ |
||||
return kmemcheck_pte_lookup((unsigned long) page_address(p)); |
||||
} |
||||
|
||||
void kmemcheck_hide_pages(struct page *p, unsigned int n) |
||||
{ |
||||
unsigned int i; |
||||
|
||||
for (i = 0; i < n; ++i) { |
||||
unsigned long address; |
||||
pte_t *pte; |
||||
unsigned int level; |
||||
|
||||
address = (unsigned long) page_address(&p[i]); |
||||
pte = lookup_address(address, &level); |
||||
BUG_ON(!pte); |
||||
BUG_ON(level != PG_LEVEL_4K); |
||||
|
||||
set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); |
||||
set_pte(pte, __pte(pte_val(*pte) | _PAGE_HIDDEN)); |
||||
__flush_tlb_one(address); |
||||
} |
||||
} |
||||
|
||||
/* Access may NOT cross page boundary */ |
||||
static void kmemcheck_read_strict(struct pt_regs *regs, |
||||
unsigned long addr, unsigned int size) |
||||
{ |
||||
void *shadow; |
||||
enum kmemcheck_shadow status; |
||||
|
||||
shadow = kmemcheck_shadow_lookup(addr); |
||||
if (!shadow) |
||||
return; |
||||
|
||||
kmemcheck_save_addr(addr); |
||||
status = kmemcheck_shadow_test(shadow, size); |
||||
if (status == KMEMCHECK_SHADOW_INITIALIZED) |
||||
return; |
||||
|
||||
if (kmemcheck_enabled) |
||||
kmemcheck_error_save(status, addr, size, regs); |
||||
|
||||
if (kmemcheck_enabled == 2) |
||||
kmemcheck_enabled = 0; |
||||
|
||||
/* Don't warn about it again. */ |
||||
kmemcheck_shadow_set(shadow, size); |
||||
} |
||||
|
||||
bool kmemcheck_is_obj_initialized(unsigned long addr, size_t size) |
||||
{ |
||||
enum kmemcheck_shadow status; |
||||
void *shadow; |
||||
|
||||
shadow = kmemcheck_shadow_lookup(addr); |
||||
if (!shadow) |
||||
return true; |
||||
|
||||
status = kmemcheck_shadow_test_all(shadow, size); |
||||
|
||||
return status == KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
|
||||
/* Access may cross page boundary */ |
||||
static void kmemcheck_read(struct pt_regs *regs, |
||||
unsigned long addr, unsigned int size) |
||||
{ |
||||
unsigned long page = addr & PAGE_MASK; |
||||
unsigned long next_addr = addr + size - 1; |
||||
unsigned long next_page = next_addr & PAGE_MASK; |
||||
|
||||
if (likely(page == next_page)) { |
||||
kmemcheck_read_strict(regs, addr, size); |
||||
return; |
||||
} |
||||
|
||||
/*
|
||||
* What we do is basically to split the access across the |
||||
* two pages and handle each part separately. Yes, this means |
||||
* that we may now see reads that are 3 + 5 bytes, for |
||||
* example (and if both are uninitialized, there will be two |
||||
* reports), but it makes the code a lot simpler. |
||||
*/ |
||||
kmemcheck_read_strict(regs, addr, next_page - addr); |
||||
kmemcheck_read_strict(regs, next_page, next_addr - next_page); |
||||
} |
||||
|
||||
static void kmemcheck_write_strict(struct pt_regs *regs, |
||||
unsigned long addr, unsigned int size) |
||||
{ |
||||
void *shadow; |
||||
|
||||
shadow = kmemcheck_shadow_lookup(addr); |
||||
if (!shadow) |
||||
return; |
||||
|
||||
kmemcheck_save_addr(addr); |
||||
kmemcheck_shadow_set(shadow, size); |
||||
} |
||||
|
||||
static void kmemcheck_write(struct pt_regs *regs, |
||||
unsigned long addr, unsigned int size) |
||||
{ |
||||
unsigned long page = addr & PAGE_MASK; |
||||
unsigned long next_addr = addr + size - 1; |
||||
unsigned long next_page = next_addr & PAGE_MASK; |
||||
|
||||
if (likely(page == next_page)) { |
||||
kmemcheck_write_strict(regs, addr, size); |
||||
return; |
||||
} |
||||
|
||||
/* See comment in kmemcheck_read(). */ |
||||
kmemcheck_write_strict(regs, addr, next_page - addr); |
||||
kmemcheck_write_strict(regs, next_page, next_addr - next_page); |
||||
} |
||||
|
||||
/*
|
||||
* Copying is hard. We have two addresses, each of which may be split across |
||||
* a page (and each page will have different shadow addresses). |
||||
*/ |
||||
static void kmemcheck_copy(struct pt_regs *regs, |
||||
unsigned long src_addr, unsigned long dst_addr, unsigned int size) |
||||
{ |
||||
uint8_t shadow[8]; |
||||
enum kmemcheck_shadow status; |
||||
|
||||
unsigned long page; |
||||
unsigned long next_addr; |
||||
unsigned long next_page; |
||||
|
||||
uint8_t *x; |
||||
unsigned int i; |
||||
unsigned int n; |
||||
|
||||
BUG_ON(size > sizeof(shadow)); |
||||
|
||||
page = src_addr & PAGE_MASK; |
||||
next_addr = src_addr + size - 1; |
||||
next_page = next_addr & PAGE_MASK; |
||||
|
||||
if (likely(page == next_page)) { |
||||
/* Same page */ |
||||
x = kmemcheck_shadow_lookup(src_addr); |
||||
if (x) { |
||||
kmemcheck_save_addr(src_addr); |
||||
for (i = 0; i < size; ++i) |
||||
shadow[i] = x[i]; |
||||
} else { |
||||
for (i = 0; i < size; ++i) |
||||
shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
} else { |
||||
n = next_page - src_addr; |
||||
BUG_ON(n > sizeof(shadow)); |
||||
|
||||
/* First page */ |
||||
x = kmemcheck_shadow_lookup(src_addr); |
||||
if (x) { |
||||
kmemcheck_save_addr(src_addr); |
||||
for (i = 0; i < n; ++i) |
||||
shadow[i] = x[i]; |
||||
} else { |
||||
/* Not tracked */ |
||||
for (i = 0; i < n; ++i) |
||||
shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
|
||||
/* Second page */ |
||||
x = kmemcheck_shadow_lookup(next_page); |
||||
if (x) { |
||||
kmemcheck_save_addr(next_page); |
||||
for (i = n; i < size; ++i) |
||||
shadow[i] = x[i - n]; |
||||
} else { |
||||
/* Not tracked */ |
||||
for (i = n; i < size; ++i) |
||||
shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
} |
||||
|
||||
page = dst_addr & PAGE_MASK; |
||||
next_addr = dst_addr + size - 1; |
||||
next_page = next_addr & PAGE_MASK; |
||||
|
||||
if (likely(page == next_page)) { |
||||
/* Same page */ |
||||
x = kmemcheck_shadow_lookup(dst_addr); |
||||
if (x) { |
||||
kmemcheck_save_addr(dst_addr); |
||||
for (i = 0; i < size; ++i) { |
||||
x[i] = shadow[i]; |
||||
shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
} |
||||
} else { |
||||
n = next_page - dst_addr; |
||||
BUG_ON(n > sizeof(shadow)); |
||||
|
||||
/* First page */ |
||||
x = kmemcheck_shadow_lookup(dst_addr); |
||||
if (x) { |
||||
kmemcheck_save_addr(dst_addr); |
||||
for (i = 0; i < n; ++i) { |
||||
x[i] = shadow[i]; |
||||
shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
} |
||||
|
||||
/* Second page */ |
||||
x = kmemcheck_shadow_lookup(next_page); |
||||
if (x) { |
||||
kmemcheck_save_addr(next_page); |
||||
for (i = n; i < size; ++i) { |
||||
x[i - n] = shadow[i]; |
||||
shadow[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
||||
} |
||||
} |
||||
|
||||
status = kmemcheck_shadow_test(shadow, size); |
||||
if (status == KMEMCHECK_SHADOW_INITIALIZED) |
||||
return; |
||||
|
||||
if (kmemcheck_enabled) |
||||
kmemcheck_error_save(status, src_addr, size, regs); |
||||
|
||||
if (kmemcheck_enabled == 2) |
||||
kmemcheck_enabled = 0; |
||||
} |
||||
|
||||
enum kmemcheck_method { |
||||
KMEMCHECK_READ, |
||||
KMEMCHECK_WRITE, |
||||
}; |
||||
|
||||
static void kmemcheck_access(struct pt_regs *regs, |
||||
unsigned long fallback_address, enum kmemcheck_method fallback_method) |
||||
{ |
||||
const uint8_t *insn; |
||||
const uint8_t *insn_primary; |
||||
unsigned int size; |
||||
|
||||
struct kmemcheck_context *data = this_cpu_ptr(&kmemcheck_context); |
||||
|
||||
/* Recursive fault -- ouch. */ |
||||
if (data->busy) { |
||||
kmemcheck_show_addr(fallback_address); |
||||
kmemcheck_error_save_bug(regs); |
||||
return; |
||||
} |
||||
|
||||
data->busy = true; |
||||
|
||||
insn = (const uint8_t *) regs->ip; |
||||
insn_primary = kmemcheck_opcode_get_primary(insn); |
||||
|
||||
kmemcheck_opcode_decode(insn, &size); |
||||
|
||||
switch (insn_primary[0]) { |
||||
#ifdef CONFIG_KMEMCHECK_BITOPS_OK |
||||
/* AND, OR, XOR */ |
||||
/*
|
||||
* Unfortunately, these instructions have to be excluded from |
||||
* our regular checking since they access only some (and not |
||||
* all) bits. This clears out "bogus" bitfield-access warnings. |
||||
*/ |
||||
case 0x80: |
||||
case 0x81: |
||||
case 0x82: |
||||
case 0x83: |
||||
switch ((insn_primary[1] >> 3) & 7) { |
||||
/* OR */ |
||||
case 1: |
||||
/* AND */ |
||||
case 4: |
||||
/* XOR */ |
||||
case 6: |
||||
kmemcheck_write(regs, fallback_address, size); |
||||
goto out; |
||||
|
||||
/* ADD */ |
||||
case 0: |
||||
/* ADC */ |
||||
case 2: |
||||
/* SBB */ |
||||
case 3: |
||||
/* SUB */ |
||||
case 5: |
||||
/* CMP */ |
||||
case 7: |
||||
break; |
||||
} |
||||
break; |
||||
#endif |
||||
|
||||
/* MOVS, MOVSB, MOVSW, MOVSD */ |
||||
case 0xa4: |
||||
case 0xa5: |
||||
/*
|
||||
* These instructions are special because they take two |
||||
* addresses, but we only get one page fault. |
||||
*/ |
||||
kmemcheck_copy(regs, regs->si, regs->di, size); |
||||
goto out; |
||||
|
||||
/* CMPS, CMPSB, CMPSW, CMPSD */ |
||||
case 0xa6: |
||||
case 0xa7: |
||||
kmemcheck_read(regs, regs->si, size); |
||||
kmemcheck_read(regs, regs->di, size); |
||||
goto out; |
||||
} |
||||
|
||||
/*
|
||||
* If the opcode isn't special in any way, we use the data from the |
||||
* page fault handler to determine the address and type of memory |
||||
* access. |
||||
*/ |
||||
switch (fallback_method) { |
||||
case KMEMCHECK_READ: |
||||
kmemcheck_read(regs, fallback_address, size); |
||||
goto out; |
||||
case KMEMCHECK_WRITE: |
||||
kmemcheck_write(regs, fallback_address, size); |
||||
goto out; |
||||
} |
||||
|
||||
out: |
||||
data->busy = false; |
||||
} |
||||
|
||||
bool kmemcheck_fault(struct pt_regs *regs, unsigned long address, |
||||
unsigned long error_code) |
||||
{ |
||||
pte_t *pte; |
||||
|
||||
/*
|
||||
* XXX: Is it safe to assume that memory accesses from virtual 86 |
||||
* mode or non-kernel code segments will _never_ access kernel |
||||
* memory (e.g. tracked pages)? For now, we need this to avoid |
||||
* invoking kmemcheck for PnP BIOS calls. |
||||
*/ |
||||
if (regs->flags & X86_VM_MASK) |
||||
return false; |
||||
if (regs->cs != __KERNEL_CS) |
||||
return false; |
||||
|
||||
pte = kmemcheck_pte_lookup(address); |
||||
if (!pte) |
||||
return false; |
||||
|
||||
WARN_ON_ONCE(in_nmi()); |
||||
|
||||
if (error_code & 2) |
||||
kmemcheck_access(regs, address, KMEMCHECK_WRITE); |
||||
else |
||||
kmemcheck_access(regs, address, KMEMCHECK_READ); |
||||
|
||||
kmemcheck_show(regs); |
||||
return true; |
||||
} |
||||
|
||||
bool kmemcheck_trap(struct pt_regs *regs) |
||||
{ |
||||
if (!kmemcheck_active(regs)) |
||||
return false; |
||||
|
||||
/* We're done. */ |
||||
kmemcheck_hide(regs); |
||||
return true; |
||||
} |
@ -1,173 +0,0 @@ |
||||
#include <linux/kmemcheck.h> |
||||
#include <linux/export.h> |
||||
#include <linux/mm.h> |
||||
|
||||
#include <asm/page.h> |
||||
#include <asm/pgtable.h> |
||||
|
||||
#include "pte.h" |
||||
#include "shadow.h" |
||||
|
||||
/*
|
||||
* Return the shadow address for the given address. Returns NULL if the |
||||
* address is not tracked. |
||||
* |
||||
* We need to be extremely careful not to follow any invalid pointers, |
||||
* because this function can be called for *any* possible address. |
||||
*/ |
||||
void *kmemcheck_shadow_lookup(unsigned long address) |
||||
{ |
||||
pte_t *pte; |
||||
struct page *page; |
||||
|
||||
if (!virt_addr_valid(address)) |
||||
return NULL; |
||||
|
||||
pte = kmemcheck_pte_lookup(address); |
||||
if (!pte) |
||||
return NULL; |
||||
|
||||
page = virt_to_page(address); |
||||
if (!page->shadow) |
||||
return NULL; |
||||
return page->shadow + (address & (PAGE_SIZE - 1)); |
||||
} |
||||
|
||||
static void mark_shadow(void *address, unsigned int n, |
||||
enum kmemcheck_shadow status) |
||||
{ |
||||
unsigned long addr = (unsigned long) address; |
||||
unsigned long last_addr = addr + n - 1; |
||||
unsigned long page = addr & PAGE_MASK; |
||||
unsigned long last_page = last_addr & PAGE_MASK; |
||||
unsigned int first_n; |
||||
void *shadow; |
||||
|
||||
/* If the memory range crosses a page boundary, stop there. */ |
||||
if (page == last_page) |
||||
first_n = n; |
||||
else |
||||
first_n = page + PAGE_SIZE - addr; |
||||
|
||||
shadow = kmemcheck_shadow_lookup(addr); |
||||
if (shadow) |
||||
memset(shadow, status, first_n); |
||||
|
||||
addr += first_n; |
||||
n -= first_n; |
||||
|
||||
/* Do full-page memset()s. */ |
||||
while (n >= PAGE_SIZE) { |
||||
shadow = kmemcheck_shadow_lookup(addr); |
||||
if (shadow) |
||||
memset(shadow, status, PAGE_SIZE); |
||||
|
||||
addr += PAGE_SIZE; |
||||
n -= PAGE_SIZE; |
||||
} |
||||
|
||||
/* Do the remaining page, if any. */ |
||||
if (n > 0) { |
||||
shadow = kmemcheck_shadow_lookup(addr); |
||||
if (shadow) |
||||
memset(shadow, status, n); |
||||
} |
||||
} |
||||
|
||||
void kmemcheck_mark_unallocated(void *address, unsigned int n) |
||||
{ |
||||
mark_shadow(address, n, KMEMCHECK_SHADOW_UNALLOCATED); |
||||
} |
||||
|
||||
void kmemcheck_mark_uninitialized(void *address, unsigned int n) |
||||
{ |
||||
mark_shadow(address, n, KMEMCHECK_SHADOW_UNINITIALIZED); |
||||
} |
||||
|
||||
/*
|
||||
* Fill the shadow memory of the given address such that the memory at that |
||||
* address is marked as being initialized. |
||||
*/ |
||||
void kmemcheck_mark_initialized(void *address, unsigned int n) |
||||
{ |
||||
mark_shadow(address, n, KMEMCHECK_SHADOW_INITIALIZED); |
||||
} |
||||
EXPORT_SYMBOL_GPL(kmemcheck_mark_initialized); |
||||
|
||||
void kmemcheck_mark_freed(void *address, unsigned int n) |
||||
{ |
||||
mark_shadow(address, n, KMEMCHECK_SHADOW_FREED); |
||||
} |
||||
|
||||
void kmemcheck_mark_unallocated_pages(struct page *p, unsigned int n) |
||||
{ |
||||
unsigned int i; |
||||
|
||||
for (i = 0; i < n; ++i) |
||||
kmemcheck_mark_unallocated(page_address(&p[i]), PAGE_SIZE); |
||||
} |
||||
|
||||
void kmemcheck_mark_uninitialized_pages(struct page *p, unsigned int n) |
||||
{ |
||||
unsigned int i; |
||||
|
||||
for (i = 0; i < n; ++i) |
||||
kmemcheck_mark_uninitialized(page_address(&p[i]), PAGE_SIZE); |
||||
} |
||||
|
||||
void kmemcheck_mark_initialized_pages(struct page *p, unsigned int n) |
||||
{ |
||||
unsigned int i; |
||||
|
||||
for (i = 0; i < n; ++i) |
||||
kmemcheck_mark_initialized(page_address(&p[i]), PAGE_SIZE); |
||||
} |
||||
|
||||
enum kmemcheck_shadow kmemcheck_shadow_test(void *shadow, unsigned int size) |
||||
{ |
||||
#ifdef CONFIG_KMEMCHECK_PARTIAL_OK |
||||
uint8_t *x; |
||||
unsigned int i; |
||||
|
||||
x = shadow; |
||||
|
||||
/*
|
||||
* Make sure _some_ bytes are initialized. Gcc frequently generates |
||||
* code to access neighboring bytes. |
||||
*/ |
||||
for (i = 0; i < size; ++i) { |
||||
if (x[i] == KMEMCHECK_SHADOW_INITIALIZED) |
||||
return x[i]; |
||||
} |
||||
|
||||
return x[0]; |
||||
#else |
||||
return kmemcheck_shadow_test_all(shadow, size); |
||||
#endif |
||||
} |
||||
|
||||
enum kmemcheck_shadow kmemcheck_shadow_test_all(void *shadow, unsigned int size) |
||||
{ |
||||
uint8_t *x; |
||||
unsigned int i; |
||||
|
||||
x = shadow; |
||||
|
||||
/* All bytes must be initialized. */ |
||||
for (i = 0; i < size; ++i) { |
||||
if (x[i] != KMEMCHECK_SHADOW_INITIALIZED) |
||||
return x[i]; |
||||
} |
||||
|
||||
return x[0]; |
||||
} |
||||
|
||||
void kmemcheck_shadow_set(void *shadow, unsigned int size) |
||||
{ |
||||
uint8_t *x; |
||||
unsigned int i; |
||||
|
||||
x = shadow; |
||||
for (i = 0; i < size; ++i) |
||||
x[i] = KMEMCHECK_SHADOW_INITIALIZED; |
||||
} |
@ -1,94 +0,0 @@ |
||||
config HAVE_ARCH_KMEMCHECK |
||||
bool |
||||
|
||||
if HAVE_ARCH_KMEMCHECK |
||||
|
||||
menuconfig KMEMCHECK |
||||
bool "kmemcheck: trap use of uninitialized memory" |
||||
depends on DEBUG_KERNEL |
||||
depends on !X86_USE_3DNOW |
||||
depends on SLUB || SLAB |
||||
depends on !CC_OPTIMIZE_FOR_SIZE |
||||
depends on !FUNCTION_TRACER |
||||
select FRAME_POINTER |
||||
select STACKTRACE |
||||
default n |
||||
help |
||||
This option enables tracing of dynamically allocated kernel memory |
||||
to see if memory is used before it has been given an initial value. |
||||
Be aware that this requires half of your memory for bookkeeping and |
||||
will insert extra code at *every* read and write to tracked memory |
||||
thus slowing down the kernel code (but user code is unaffected). |
||||
|
||||
The kernel may be started with kmemcheck=0 or kmemcheck=1 to disable |
||||
or enable kmemcheck at boot-time. If the kernel is started with |
||||
kmemcheck=0, the large memory and CPU overhead is not incurred. |
||||
|
||||
choice |
||||
prompt "kmemcheck: default mode at boot" |
||||
depends on KMEMCHECK |
||||
default KMEMCHECK_ONESHOT_BY_DEFAULT |
||||
help |
||||
This option controls the default behaviour of kmemcheck when the |
||||
kernel boots and no kmemcheck= parameter is given. |
||||
|
||||
config KMEMCHECK_DISABLED_BY_DEFAULT |
||||
bool "disabled" |
||||
depends on KMEMCHECK |
||||
|
||||
config KMEMCHECK_ENABLED_BY_DEFAULT |
||||
bool "enabled" |
||||
depends on KMEMCHECK |
||||
|
||||
config KMEMCHECK_ONESHOT_BY_DEFAULT |
||||
bool "one-shot" |
||||
depends on KMEMCHECK |
||||
help |
||||
In one-shot mode, only the first error detected is reported before |
||||
kmemcheck is disabled. |
||||
|
||||
endchoice |
||||
|
||||
config KMEMCHECK_QUEUE_SIZE |
||||
int "kmemcheck: error queue size" |
||||
depends on KMEMCHECK |
||||
default 64 |
||||
help |
||||
Select the maximum number of errors to store in the queue. Since |
||||
errors can occur virtually anywhere and in any context, we need a |
||||
temporary storage area which is guaranteed not to generate any |
||||
other faults. The queue will be emptied as soon as a tasklet may |
||||
be scheduled. If the queue is full, new error reports will be |
||||
lost. |
||||
|
||||
config KMEMCHECK_SHADOW_COPY_SHIFT |
||||
int "kmemcheck: shadow copy size (5 => 32 bytes, 6 => 64 bytes)" |
||||
depends on KMEMCHECK |
||||
range 2 8 |
||||
default 5 |
||||
help |
||||
Select the number of shadow bytes to save along with each entry of |
||||
the queue. These bytes indicate what parts of an allocation are |
||||
initialized, uninitialized, etc. and will be displayed when an |
||||
error is detected to help the debugging of a particular problem. |
||||
|
||||
config KMEMCHECK_PARTIAL_OK |
||||
bool "kmemcheck: allow partially uninitialized memory" |
||||
depends on KMEMCHECK |
||||
default y |
||||
help |
||||
This option works around certain GCC optimizations that produce |
||||
32-bit reads from 16-bit variables where the upper 16 bits are |
||||
thrown away afterwards. This may of course also hide some real |
||||
bugs. |
||||
|
||||
config KMEMCHECK_BITOPS_OK |
||||
bool "kmemcheck: allow bit-field manipulation" |
||||
depends on KMEMCHECK |
||||
default n |
||||
help |
||||
This option silences warnings that would be generated for bit-field |
||||
accesses where not all the bits are initialized at the same time. |
||||
This may also hide some real bugs. |
||||
|
||||
endif |