arm64 provides an always-working implementation of futex_atomic_cmpxchg_inatomic(),
so there is no need to check it at runtime.
Change-Id: Id4b9ba07d979fddbdac9f2aaa5250b1487ff9042
Reported-by: Piyush swami <Piyush.swami@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: atndko <z1281552865@gmail.com>
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
It is probably safe to assume that all Armv8-A implementations have a
multiplier whose efficiency is comparable to or better than a sequence of
three or so register-dependent arithmetic instructions. Select
ARCH_HAS_FAST_MULTIPLIER to get ever-so-slightly nicer codegen in the
few dusty old corners which care.
In a contrived benchmark calling hweight64() in a loop, this does indeed
turn out to be a small win overall, with no measurable impact on
Cortex-A57 but about 5% performance improvement on Cortex-A53.
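For the curious, the difference shows up in the generic hweight path; a
simplified sketch of the lib/hweight.c idiom (kernel u64 type assumed):

static inline unsigned int hweight64_sketch(u64 w)
{
        /* Pairwise reduction: per-2-bit, per-4-bit, then per-byte counts. */
        w -= (w >> 1) & 0x5555555555555555ULL;
        w  = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
        w  = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
        /* One multiply sums the eight byte counts into the top byte. */
        return (w * 0x0101010101010101ULL) >> 56;
#else
        /* Otherwise: a few more register-dependent shifts and adds. */
        w += w >> 8;
        w += w >> 16;
        return (w + (w >> 32)) & 0xff;
#endif
}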
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
about 1.3x-2x faster across a range of microarchitecture/compiler
combinations. Not much in absolute terms, but every little helps.
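As a rough illustration of the idiom (not the actual arm64 routine): the
64-bit halves of both addresses, plus the length/protocol words, are added
into a 128-bit accumulator, and the carries are folded only once at the end:

static inline unsigned short csum128_fold_sketch(unsigned __int128 sum)
{
        unsigned long long lo = (unsigned long long)sum;
        unsigned long long hi = (unsigned long long)(sum >> 64);

        lo += hi;               /* add the high half into the low half... */
        lo += (lo < hi);        /* ...with end-around carry */
        lo  = (lo & 0xffffffff) + (lo >> 32);   /* fold 64 -> 32 */
        lo  = (lo & 0xffffffff) + (lo >> 32);
        lo  = (lo & 0xffff) + (lo >> 16);       /* fold 32 -> 16 */
        lo  = (lo & 0xffff) + (lo >> 16);
        return (unsigned short)~lo;             /* one's-complement result */
}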
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In validating the checksumming results of the new routine, I sadly
neglected to test its not-checksumming results. Thus it slipped through
that the one case where @buff is already dword-aligned and @len = 0
manages to defeat the tail-masking logic and behave as if @len = 8.
For a zero length it doesn't make much sense to dereference @buff anyway,
so just add an early return (which has essentially zero impact on
performance).
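The fix amounts to a guard at the top of the routine, along these lines:

        if (unlikely(len == 0)) /* nothing to sum: don't touch @buff */
                return 0;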
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrinsics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.
The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).
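A minimal sketch of the underlying idiom (not the actual routine, which
also handles unaligned heads/tails and unrolls the main loop): accumulate
64-bit loads with end-around carry, then fold the result down to 16 bits.

#include <stdint.h>
#include <stddef.h>

/* One's-complement add: fold the carry out of the top back into bit 0. */
static inline uint64_t accumulate(uint64_t sum, uint64_t data)
{
        sum += data;
        return sum + (sum < data);
}

/* Sketch: assumes an 8-byte-aligned buffer whose length is a multiple of 8. */
static uint16_t csum_sketch(const uint64_t *p, size_t len)
{
        uint64_t sum = 0;

        for (; len >= 8; len -= 8)
                sum = accumulate(sum, *p++);

        sum = (sum & 0xffffffff) + (sum >> 32); /* fold 64 -> 32 */
        sum = (sum & 0xffffffff) + (sum >> 32);
        sum = (sum & 0xffff) + (sum >> 16);     /* fold 32 -> 16 */
        sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
}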
Reported-by: Lingyan Huang <huanglingyan2@huawei.com>
Tested-by: Lingyan Huang <huanglingyan2@huawei.com>
Change-Id: I42f718428ee872541006b3932dc010dd3f8b0f28
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
A lot of CPU time is wasted on allocating, populating, and copying
debug names back and forth with userspace when they're not actually
needed. We can't just remove the name buffers from the various sync data
structures though, because we must preserve ABI compatibility with
userspace; instead we can just pretend the name fields of the
user-shared structs aren't there. This massively reduces the size of the
memory allocated for these data structures and the amount of data copied
to and from userspace, and eliminates a kzalloc() entirely from
sync_file_ioctl_fence_info(), thus improving graphics performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
This kernel won't be used on devices with 7nm, 14nm, or 28nm PLLs, nor
will it be used with the 10nm DisplayPort PLL, since our only display is
connected via DSI.
Don't compile support for PLLs we won't use.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
[dereference23: Adapted for atoll]
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
This kernel will not be used on devices with other GPUs.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Change-Id: I205e625dc1520c8baef622bc6fd303f714c6f7d6
This unused debug print wastes CPU time when writing to registers,
resulting in perf top reporting a decent chunk of time spent inside
sde_reg_write(). Removing the debug print gets sde_reg_write() off perf
top's radar.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
These debug logs are everywhere and not only bloat the driver, but add
latency everywhere they're used because they're not compiled out. Since
they serve no purpose for us as we're not debugging SDE, compile them
out.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
A measurably significant amount of CPU time is spent in these routines
while the camera is open. These are also responsible for a grotesque
amount of dmesg spam, so let's nuke them.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
While Android userspace (e.g. storaged) does use iostats via
/proc/diskstats, init will explicitly enable iostats for the devices on
which it is primarily used - sda and sdf. Avoid the 0.5-1% overhead for
block devices that do not need it.
Signed-off-by: kdrag0n <dragon@khronodragon.com>
CRC errors on the SPI bus usually mean there is something wrong with the
hardware (unstable voltage, wiring, etc.).
Disable SPI CRC in favor of improving performance, as the cost of
detecting hardware errors this way is too high and the detection is not
all that useful.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Change-Id: I5d7ef9dedbddf8d7f4c4911788051a7753eb67d8
Binder code is very hot, so checking frequently to see if a debug
message should be printed is a waste of cycles. We're not debugging
binder, so just stub out the debug prints to compile them out entirely.
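The stubbing boils down to something like the following (illustrative; the
real macro also keys off a debug mask), so format strings are still
type-checked but dead-code elimination removes the call entirely:

#define binder_debug(mask, fmt, ...)                             \
        do {                                                     \
                if (0)                                           \
                        pr_info_ratelimited(fmt, ##__VA_ARGS__); \
        } while (0)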
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Sometimes, it may be desirable to use CPU frequency tables different
from the ones in the hardware's OSM LUTs. This commit adds support for
overriding each CPU's frequency table with a list of allowed frequencies
defined in the OSM driver's DT node.
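A hedged sketch of the DT plumbing; the property name
"qcom,cpufreq-table-0" and the helper below are illustrative, not
necessarily what the patch uses (needs <linux/of.h> and <linux/slab.h>):

/* Returns the number of override entries, 0 if none, or a -errno. */
static int osm_read_override_table(struct device_node *np, u32 **freqs)
{
        int n = of_property_count_u32_elems(np, "qcom,cpufreq-table-0");
        int ret;

        if (n <= 0)
                return 0;       /* no override: keep the hardware LUT */

        *freqs = kcalloc(n, sizeof(**freqs), GFP_KERNEL);
        if (!*freqs)
                return -ENOMEM;

        ret = of_property_read_u32_array(np, "qcom,cpufreq-table-0", *freqs, n);
        return ret ? ret : n;
}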
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Joel Fernandes found that synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:
# cd /sys/kernel/tracing
# while [ 1 ]; do x=1; done &
# echo '__schedule_bug:traceon' > set_ftrace_filter
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;
real 0m1.064s
user 0m0.000s
sys 0m0.004s
Where it takes a little over a second to perform the synchronize,
because there's a loop that waits 1 second at a time for tasks to get
through their quiescent points when there's a task that must be waited
for.
After discussion we came up with a simple way to wait for holdouts:
increase the wait time on each iteration of the loop, but by no more
than a full second.
With the new patch we have:
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;
real 0m0.131s
user 0m0.000s
sys 0m0.004s
Which drops it down to 13% of what the original wait time was.
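Schematically, the waiting loop now backs off gradually instead of
sleeping a fixed second per pass (simplified; the per-task holdout scan is
elided into a hypothetical check_holdouts() helper):

        /* Start with a HZ/10 wait and slowly back off to a 1-second wait. */
        int fract = 10;

        while (!list_empty(&rcu_tasks_holdouts)) {
                schedule_timeout_interruptible(HZ / fract);
                if (fract > 1)
                        fract--;
                check_holdouts();       /* report/clear quiescent tasks */
        }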
Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Make sure the compiler optimises away conditions that are always false
since commit b91319892e ("cpufreq: schedutil: Don't jump to max frequency for RT tasks").
Change-Id: I7a108ff1a4ba09f2cb82ea8a82bd15967e724709
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Currently, the raw cache is reset when next_f is changed after get_next_freq, for correctness. However, this may cost extra cycles on later frequency updates. This patch maintains the cached value instead of dropping it.
Bug: 159936782
Bug: 158863204
Signed-off-by: Wei Wang <wvw@google.com>
[dereference23: Backport to 4.14]
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Change-Id: I519ca02dd2e6038e3966e1f68fee641628827c82
The cpufreq_schedutil governor keeps a cache of the last
raw frequency that was mapped to a supported device frequency.
If the next request for a frequency matches the cached
value, the policy's next_freq value is reused. But there
are paths that can update the raw cached value without
updating the next_freq value, and there are paths that
can set the next_freq value without setting the raw
cached value. On those paths, the cached value
must be reset.
The case that has been observed is when a frequency request
reaches sugov_update_commit but is then rejected by the
sugov_up_down_rate_limit check.
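Schematically, the fix clears the cached value on that rejection path
(field names follow schedutil; the exact placement is illustrative):

        if (sugov_up_down_rate_limit(sg_policy, time, next_freq)) {
                /* next_freq was never applied, so the cached raw value
                 * no longer corresponds to sg_policy->next_freq. */
                sg_policy->cached_raw_freq = 0;
                return;
        }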
Bug: 116279565
Change-Id: I7c585339a04ff1732054d6e5b36a57e2d41266aa
Signed-off-by: John Dias <joaodias@google.com>
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Part of the fix from commit d86ab9cff8 ("cpufreq: schedutil: use now
as reference when aggregating shared policy requests") is reversed in
commit 05d2ca2420 ("cpufreq: schedutil: Ignore CPU load older than
WALT window size") due to a porting mistake. Restore it while keeping
the relevant change from the latter patch.
Bug: 117438867
Test: build & boot
Change-Id: I21399be760d7c8e2fff6c158368a285dc6261647
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Bug: 120438505
Change-Id: I59e3675a320ce71c3c90be3904756b125300ba6b
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
This reverts commit b8b6f565c0.
CAF's hispeed boost and predicted load features aren't any good. Remove
them entirely to prevent userspace from trying to enable them
(specifically pl) and to reduce useless overhead in schedutil, since it
runs *very* often.
Change-Id: I0446b49a59e5dce8e1b7712bdb654c9a5e6ff0ed
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
This is tuned to match energy model characteristics and scheduler
efficiency enhancements.
Change-Id: Ia60e1ea888457fa1c0c0273cdd4b0180f0a87abf
Co-authored-by: Diep Quynh <remilia.1505@gmail.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Qualcomm's LLCC controller does not have an error IRQ line on lito and
instead polls to check memory banks for errors every 5 seconds, which is
inefficient and will add to system jitter.
The generic Kryo CPU cache controller does have error IRQ lines so it
doesn't need to use polling, but EDAC in general is fairly useless in
its current state anyway because Google disabled the option to panic on
uncorrectable error. Let's follow their decision and just disable EDAC
entirely, as well as its placeholder RAS dependency.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
__rmqueue(), __rmqueue_fallback(), __rmqueue_smallest() and
__rmqueue_cma_fallback() are all in the page allocator's hot path and
should complete as quickly as possible. One way to make them faster is
by making them inline. But as Andrew Morton and Andi Kleen pointed out:
https://lkml.org/lkml/2017/10/10/1252
https://lkml.org/lkml/2017/10/10/1279
To make sure they are inlined, we should use __always_inline for them.
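The mechanics are just a stronger hint. Illustratively (the real patch
applies this to the four helpers above; <linux/compiler.h> provides the
attribute):

/*
 * "inline" is only a suggestion that the compiler may ignore at -O2/-Os;
 * __always_inline forces expansion at every call site.
 */
static __always_inline unsigned long buddy_hot_helper(unsigned int order)
{
        return 1UL << order;    /* illustrative stand-in for __rmqueue() etc. */
}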
With the will-it-scale/page_fault1/process benchmark, when using nr_cpu
processes to stress buddy, the results for will-it-scale.processes with
and without the patch are:
On a 2-sockets Intel-Skylake machine:
compiler base head
gcc-4.4.7 6496131 6911823 +6.4%
gcc-4.9.4 7225110 7731072 +7.0%
gcc-5.4.1 7054224 7688146 +9.0%
gcc-6.2.0 7059794 7651675 +8.4%
On a 4-sockets Intel-Skylake machine:
compiler base head
gcc-4.4.7 13162890 13508193 +2.6%
gcc-4.9.4 14997463 15484353 +3.2%
gcc-5.4.1 14708711 15449805 +5.0%
gcc-6.2.0 14574099 15349204 +5.3%
The above 4 compilers are used because I've done the tests through
Intel's Linux Kernel Performance (LKP) infrastructure and they are the
compilers available there.
The benefit is smaller on the 4-socket machine because the lock
contention there (perf-profile/native_queued_spin_lock_slowpath = 81%)
is less severe than on the 2-socket machine (85%).
What the benchmark does is: it forks nr_cpu processes and then each
process does the following:
1 mmap() 128M anonymous space;
2 writes to each page there to trigger actual page allocation;
3 munmap() it.
in a loop.
https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c
Binary size wise, I have locally built them with different compilers:
[aaron@aaronlu obj]$ size */*/mm/page_alloc.o
text data bss dec hex filename
37409 9904 8524 55837 da1d gcc-4.9.4/base/mm/page_alloc.o
38273 9904 8524 56701 dd7d gcc-4.9.4/head/mm/page_alloc.o
37465 9840 8428 55733 d9b5 gcc-5.5.0/base/mm/page_alloc.o
38169 9840 8428 56437 dc75 gcc-5.5.0/head/mm/page_alloc.o
37573 9840 8428 55841 da21 gcc-6.4.0/base/mm/page_alloc.o
38261 9840 8428 56529 dcd1 gcc-6.4.0/head/mm/page_alloc.o
36863 9840 8428 55131 d75b gcc-7.2.0/base/mm/page_alloc.o
37711 9840 8428 55979 daab gcc-7.2.0/head/mm/page_alloc.o
Text size increased about 800 bytes for mm/page_alloc.o.
[aaron@aaronlu obj]$ size */*/vmlinux
text data bss dec hex filename
10342757 5903208 17723392 33969357 20654cd gcc-4.9.4/base/vmlinux
10342757 5903208 17723392 33969357 20654cd gcc-4.9.4/head/vmlinux
10332448 5836608 17715200 33884256 2050860 gcc-5.5.0/base/vmlinux
10332448 5836608 17715200 33884256 2050860 gcc-5.5.0/head/vmlinux
10094546 5836696 17715200 33646442 201676a gcc-6.4.0/base/vmlinux
10094546 5836696 17715200 33646442 201676a gcc-6.4.0/head/vmlinux
10018775 5828732 17715200 33562707 2002053 gcc-7.2.0/base/vmlinux
10018775 5828732 17715200 33562707 2002053 gcc-7.2.0/head/vmlinux
Text size for vmlinux has no change though, probably due to function
alignment.
Link: http://lkml.kernel.org/r/20171013063111.GA26032@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Kemi Wang <kemi.wang@intel.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Commit 60a3cdd063 ("x86: add optimized inlining") introduced
CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
The idea is obviously arch-agnostic. This commit moves the config entry
from arch/x86/Kconfig.debug to lib/Kconfig.debug so that all
architectures can benefit from it.
This can make a huge difference in kernel image size especially when
CONFIG_OPTIMIZE_FOR_SIZE is enabled.
For example, I got a 3.5% smaller arm64 kernel for v5.1-rc1.
dec file
18983424 arch/arm64/boot/Image.before
18321920 arch/arm64/boot/Image.after
This also slightly improves the "Kernel hacking" Kconfig menu, as
e61aca5158 ("Merge branch 'kconfig-diet' from Dave Hansen") suggested;
this config option would be a good fit in the "compiler option" menu.
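In simplified terms, the option gates how the kernel defines "inline" (a
sketch; the exact attribute list in include/linux/compiler_types.h varies
by version):

#ifdef CONFIG_OPTIMIZE_INLINING
/* Trust the compiler's own inlining heuristics (smaller images with -Os). */
#define inline  inline notrace
#else
/* Historical behavior: every "inline" function is forcibly inlined. */
#define inline  inline __attribute__((__always_inline__)) notrace
#endif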
Link: http://lkml.kernel.org/r/20190423034959.13525-12-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Brezillon <bbrezillon@kernel.org>
Cc: Brian Norris <computersforpeace@gmail.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[kdrag0n: Backported to k4.14]
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
-O3 is much more stable with modern compilers these days than it was a
decade ago. Using -O3 on the kernel results in significantly improved
hackbench performance, which is a sign that overall performance in the
kernel is improved. It works especially well in conjunction with LTO.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
There's plenty of room on the stack for a few more inlined bytes here
and there. The measured stack usage at runtime is still safe without
this, and performance is surely improved at a microscopic level, so
remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
SD720G doesn't have a 2400000 kHz frequency; it only goes up to 2323200.
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
[dereference23: Adapted for atoll]
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Now that pstore_register() can correctly pass max_reason to the kmsg
dump facility, introduce a new "max_reason" module parameter and
"max-reason" Device Tree field.
The "dump_oops" module parameter and "dump-oops" Device
Tree field are now considered deprecated, but are automatically
converted to their corresponding max_reason values when present, though
the new max_reason setting has precedence.
For struct ramoops_platform_data, the "dump_oops" member is entirely
replaced by a new "max_reason" member, with the only existing user
updated in place.
Additionally remove the "reason" filter logic from ramoops_pstore_write(),
as that is not specifically needed anymore, though technically
this is a change in behavior for any ramoops users also setting the
printk.always_kmsg_dump boot param, which will cause ramoops to behave as
if max_reason was set to KMSG_DUMP_MAX.
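The compatibility rule boils down to something like this in the ramoops
parameter parsing (a sketch; variable names are illustrative and -1 means
"not set"):

        if (ramoops_max_reason >= 0)            /* new knob takes precedence */
                pdata.max_reason = ramoops_max_reason;
        else if (ramoops_dump_oops >= 0)        /* deprecated knob as fallback */
                pdata.max_reason = ramoops_dump_oops ? KMSG_DUMP_OOPS
                                                     : KMSG_DUMP_PANIC;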
Co-developed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/lkml/20200515184434.8470-6-keescook@chromium.org/
Signed-off-by: Kees Cook <keescook@chromium.org>
Add a new member to struct pstore_info for passing information about
the kmsg dump maximum reason. This allows finer control of which kmsg
dumps are sent to pstore storage backends.
Those backends that do not explicitly set this field (keeping it equal to
0) get the default behavior: store only Oopses and Panics, or everything
if the printk.always_kmsg_dump boot param is set.
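The default then amounts to roughly the following in the pstore core (a
sketch, not the verbatim code; always_kmsg_dump stands for the
printk.always_kmsg_dump boot parameter):

        /* Backends that leave max_reason at 0 keep the historical policy. */
        if (psinfo->max_reason <= 0)
                psinfo->max_reason = always_kmsg_dump ? KMSG_DUMP_MAX
                                                      : KMSG_DUMP_OOPS;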
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/lkml/20200515184434.8470-5-keescook@chromium.org/
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Since the header has a small, fixed maximum size, just use a stack variable
to avoid memory allocation in the write path.
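A sketch of the idea (buffer size and format are illustrative of the
ramoops header, not copied verbatim):

static size_t write_kmsg_hdr_sketch(struct persistent_ram_zone *prz,
                                    struct pstore_record *record)
{
        char hdr[36];   /* "====" + seconds + '.' + microseconds + "-C\n" */
        size_t len;

        len = scnprintf(hdr, sizeof(hdr), "====%lld.%06lu-%c\n",
                        (long long)record->time.tv_sec,
                        (unsigned long)(record->time.tv_nsec / 1000),
                        record->compressed ? 'C' : 'D');
        persistent_ram_write(prz, hdr, len);
        return len;
}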
Signed-off-by: Kees Cook <keescook@chromium.org>
If a zero-length header is produced in ramoops_write_kmsg_hdr(), we will
not be able to read back the dmesg record later, since it will be
treated as an invalid header in ramoops_pstore_read(). So we should not
execute the following code but return an error instead.
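In other words, the dmesg write path should bail out rather than append a
record body after a failed header write; roughly:

        hlen = ramoops_write_kmsg_hdr(prz, record);
        if (!hlen)
                return -ENOMEM;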
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Since only a single ramoops area is allowed at a time, additional probes
(e.g. from the device tree) are meaningless and only waste CPU
resources. So let's check whether it is already initialized first.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Sometimes pstore_console_write() will write zero-size records to the
persistent ram zone, which is unnecessary and only increases resource
consumption. Also adjust ramoops_write_kmsg_hdr() to apply the same
logic if memory allocation fails.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>