Currently, get_user_pages() returns fully coherent pages to the kernel for
anything other than anonymous pages. This is a problem for things like
fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA
to anonymous pages passed in by users.
The fix is to add a new memory management API: flush_anon_page() which
is used in get_user_pages() to make anonymous pages coherent.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It has been discovered that the remove_proc_entry has a race in the removing
of entries in the proc file system that are siblings. There's no protection
around the traversing and removing of elements that belong in the same
subdirectory.
This subdirectory list is protected in other areas by the BKL. So the BKL was
at first used to protect this area too, but unfortunately, remove_proc_entry
may be called with spinlocks held. The BKL may schedule, so this was not a
solution.
The final solution was to add a new global spin lock to protect this list,
called proc_subdir_lock. This lock now protects the list in
remove_proc_entry, and I also went around looking for other areas that this
list is modified and added this protection there too. Care must be taken
since these locations call several functions that may also schedule.
Since I don't see any location that these functions that modify the
subdirectory list are called by interrupts, the irqsave/restore versions of
the spin lock was _not_ used.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Warn if free_irq() is called in IRQ context - free_irq() can execute /proc
VFS work, which must not be done in IRQ context.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_irq() should not be executed from softirq context.
Found by the lock validator. The fix is to push fd_free_irq() into
keventd. The code validates fine with this patch applied.
(akpm: this is revolting, but so is floppy.c)
[akpm@osdl.org: added flush_scheduled_work()]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 3030/2: fix permission check in the obscur cmpxchg syscall
[ARM] nommu: rename compressed/head.S symbols to a new style
[ARM] select TLS_REG_EMUL and NEEDS_SYSCALL_FOR_CMPXCHG
[ARM] nommu: Move hardware page table definitions to pgtable-hwdef.h
[ARM] Move read of processor ID out of lookup_processor_type()
[ARM] Fix typo in tlbflush.h
[ARM] noMMU: removes TLB codes in nommu mode
[ARM] noMMU: block sys_fork in nommu mode
[ARM] 3399/1: Fix link problem when CONFIG_PRINTK is disabled
[ARM] 3398/1: Fix the VFP registers loading/storing base address
[ARM] 3397/1: AT91RM9200 Header update
[ARM] 3385/1: Battery support for sharp zaurus sl-5500 (collie)
[ARM] SMP: don't set cpu_*_map in smp_prepare_boot_cpu
include/linux/clk.h is betraying its ARM origins
[ARM] Move enable_irq and disable_irq to assembler.h
[ARM] 3391/1: use PLAT8250_DEV_PLATFORM{,1} for platform device id instead of 0/1
Broken earlier by me by a x86-64 patch.
The code was optimized away, but the compiler still complained about an
undeclared function.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Lennert Buytenhek
Add a PLAT8250_DEV_PLATFORM2, and convert the two ixdp2x01 CPLD serial
ports to use platform serial devices with ids PLAT8250_DEV_PLATFORM[12].
(The on-chip xscale UART is PLAT8250_DEV_PLATFORM, id #0.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Nicolas Pitre
Quoting RMK:
|pte_write() just says that the page _may_ be writable. It doesn't say
|that the MMU is programmed to allow writes. If pte_dirty() doesn't
|return true, that means that the page is _not_ writable from userspace.
|If you write to it from kernel mode (without using put_user) you'll
|bypass the MMU read-only protection and may end up writing to a page
|owned by two separate processes.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Malcolm Parsons
Printking a backtrace requires printk, so disable backtrace code
when printk is disabled.
Without this patch, a kernel with CONFIG_PRINTK disabled does not link:
arch/arm/lib/lib.a(backtrace.o): In function `c_backtrace':
arch/arm/lib/backtrace.S:(.text+0x108): undefined reference to `printk'
arch/arm/lib/backtrace.S:(.text+0x11c): undefined reference to `printk'
arch/arm/lib/lib.a(backtrace.o):(.fixup+0x8): undefined reference to `printk'
Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Catalin Marinas
The current VFP code corrupts the VFP registers (including the control
ones) if more than one floating point application is executed at the same
time. This patch fixes the updating of the load/store base addresses for
the VFP registers.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Andrew Victor
This patch updates the hardware header to include definitions for the
Memory Controller registers.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Pavel Machek
This adds support for battery reading on collie. Collie slowly charges
battery even with charging disabled, so I did not yet enable fast
charge.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The recent addition of boot_cpu_init() implements the initialisation
of the online, present and possible cpu maps for the boot CPU, so
there is no reason to duplicate this in the architecture
smp_prepare_boot_cpu() hook.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
include/linux/clk.h is betraying its ARM origins.
Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With the combination of PNPACPI and 8250_pnp, we no longer need 8250_acpi.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The icom driver uses request_firmware()
and thus needs to select FW_LOADER.
Signed-off-by: maximilian attems <maks@sternwelten.at>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/aoe-2.6:
[PATCH] aoe [3/3]: update version to 22
[PATCH] aoe [2/3]: don't request ATA device ID on ATA error
[PATCH] aoe [1/3]: support multiple AoE listeners
[PATCH] aoe: do not stop retransmit timer when device goes down
[PATCH] aoe [8/8]: update driver version number
[PATCH] aoe [7/8]: update driver compatibility string
[PATCH] aoe [6/8]: update device information on last close
[PATCH] aoe [5/8]: allow network interface migration on packet retransmit
[PATCH] aoe [4/8]: use less confusing driver name
[PATCH] aoe [3/8]: increase allowed outstanding packets
[PATCH] aoe [2/8]: support dynamic resizing of AoE devices
[PATCH] aoe [1/8]: zero packet data after skb allocation
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
LOCKD: Make nlmsvc_traverse_shares return void
LOCKD: nlmsvc_traverse_blocks return is unused
SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
SUNRPC: Fix memory barriers for req->rq_received
NFS: Fix a race in nfs_sync_inode()
NFS: Clean up nfs_flush_list()
NFS: Fix a race with PG_private and nfs_release_page()
NFSv4: Ensure the callback daemon flushes signals
SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
NFS, NLM: Allow blocking locks to respect signals
NFS: Make nfs_fhget() return appropriate error values
NFSv4: Fix an oops in nfs4_fill_super
lockd: blocks should hold a reference to the nlm_file
NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
NFSv4: Send the delegation stateid for SETATTR calls
...
* master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
[AGPGART] x86_64: Enable VIA AGP driver on x86-64 for VIA P4 chipsets
[AGPGART] x86_64: Fix wrong PCI ID for ALI M1695 AGP bridge
[AGPGART] ATI RS350 support.
[AGPGART] Lots of CodingStyle/whitespace cleanups.
DEBUG_KERNEL is often enabled just for sysrq, but this doesn't
mean the user wants more heavyweight debugging information.
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
tcsh is not happy with the -9999 error code.
Suggested by Ernie Petrides
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I got an oops on a dual core system because the lost tick handler
called cpufreq_get() on core 1 and powernow tried to follow
a NULL powernow_data[] pointer there.
Initialize powernow_data for all cores of a CPU.
Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need to restrict to power of two here.
TBD needs more double checking
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pfn_to_page() and others need to access both memnode_shift and the very
first bytes of memnodemap[]. If we force memnode_shift to be just before the
memnodemap array, we can reduce the memory footprint to one cache line
instead of two for most setups. This patch introduce a 'memnode' structure
where shift and map[] are carefully placed.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
register_die_notifier is exported twice, once in traps.c and once in
x8664_ksyms.c. This results in a warning on build.
Signed-off-by: Kevin Winchester <kwin@ns.sympatico.ca>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
arch/x86_64/kernel/aperture.c: The search for the AGP bridge has been
extended to search for all the 256 buses instead of the first 32. This
is required since on a some systems, the bridge may be located on a bus
much farther than the first 32. By searching all 256 buses, we guarantee
that the search succeeds on such systems.
arch/x86_64/kernel/pci-gart.c: The search for the Northbridge is not
limited to just bus 0 anymore. This is required because on certain
systems, we may not find one on bus 0.
Signed-off-by: Navin Boppuri <navin.boppuri@newisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Have the GART_IOMMU help text specify that this is the hardware IOMMU in
amd64 processors. This will be significant if/when other IOMMUs are
added to the x86-64 architecture. :-)
Also, note that the previous help text stated that IOMMU was needed for
>3GB memory instead of >4GB. This is fixed in the newer version.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It was a failed experiment - all benchmarks done with it on both AMD
and Intel showed it was a loss. That was probably because the store
buffers of the CPUs for write combining traffic weren't large enough.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_bootmem_node expects a physical address to be passed in, but
__alloc_bootmem_node returns a virtual one. That address needs to be
translated to physical.
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o check_timer() routine fails while second kernel is booting after a crash
on an opetron box. Problem happens because timer vector (0x31) seems to be
locked.
o After a system crash, it is not safe to service interrupts any more, hence
interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
sends these interrupts to the CPU during early boot of second kernel. Other
pending interrupts are discarded saying unexpected trap but timer interrupt
is serviced and CPU does not issue an LAPIC EOI because it think this
interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
locking as LAPIC does not clear respective ISR and keeps on waiting for
EOI.
o This patch issues extra EOI for the pending interrupts who have ISR set.
o Though today only timer seems to be the special case because in early
boot it thinks interrupts are coming from i8259 and uses
mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But
probably doing it in generic manner for all vectors makes sense.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cpu_vm_mask is of type cpumask_t, so use the proper bitops.
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This means i386 processes compiled with a recent compiler will get non
executable heap by default now. This is the same default as a 32bit PAE
kernel would use on a NX enabled CPU.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reorders the mmu_state int in the pda, such that there is no more
padding (there currently is 4 bytes of padding). Boot tested.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because 256 causes overflows in some code that stores them in 8 bit
fields and the x86 APIC architecture cannot handle more than 255
anyways.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When x86_64 timer init messages were changed to use apic verbosity
levels, two messages were missed and one got the wrong level. This
causes the last word of a suppressed message to print on a line by
itself. Fix that so either the entire message prints or none of it
does.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Only log data in microcode driver when something is changed Otherwise it
was far too noisy on large systems.
Also remove the printk when it is unloaded.
Cc: tigran@veritas.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch puts the infrastructure in place to allow for a reordering of
functions based inside the vmlinux. The general idea is that it is possible
to put all "common" functions into the first 2Mb of the code, so that they
are covered by one TLB entry. This as opposed to the current situation where
a typical vmlinux covers about 3.5Mb (on x86-64) and thus 2 TLB entries.
This is done by enabling the -ffunction-sections flag in gcc, which puts
each function in its own ELF section, so that the linker can then order them
in a way defined by the linker script.
As per previous discussions, Linus said he wanted a "static" list for this,
eg a list provided by the kernel tarbal, so that most people have the same
ordering at least. A script is provided to create this list based on
readprofile(1) output. The included list is provisional, and entirely biased
on my own testbox and me running a few kernel compiles and some other
things.
I think that to get to a better list we need to invite people to submit
their own profiles, and somehow add those all up and base the final list on
that. I'm willing to do that effort if this is ends up being the prefered
approach. Such an effort probably needs to be repeated like once a year or
so to adopt to the changing nature of the kernel.
Made it a CONFIG with default n because it increases link times
dramatically.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I tested it on a couple of chipsets and it worked everywhere so it
should be ok as default for now.
So far I haven't done the great purge of the useless old check_timer
code yet though.
Can be overwritten with enable_8254_timer in the worst case
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>