Currently, when calculating the boosted util for a cpu, a fixed value
of 1024 is used. So when top-app tasks are moved to LC, which has much
lower capacity than BC, the calculated freq will be high even when the
cpu util is low. This results in higher power consumption, especially
on architectures which have more little cores than big cores.
Replacing the fixed value of 1024 with the actual cpu capacity reduces
the freq calculated on LC.
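A minimal sketch of the idea for a positive boost (function shape and
names assumed, not the literal android code): the headroom above the
current signal is scaled by the cpu's own capacity rather than the
constant SCHED_CAPACITY_SCALE (1024).

	static unsigned long schedtune_margin(int cpu, unsigned long signal,
					      unsigned long boost)
	{
		/* Real capacity of this cpu, not the fixed 1024 */
		unsigned long cap = capacity_orig_of(cpu);
		unsigned long margin = 0;

		if (cap > signal)
			margin = cap - signal;

		/* boost is a percentage of the remaining headroom */
		return margin * boost / 100;
	}

On a little core with cap ~ 400, a util of 300 now yields a margin of
boost% of 100 instead of boost% of 724, so the requested freq stays
proportionate to what the core can actually deliver.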
Bug: 152925197
Test: boosted util reduced on little cores
Signed-off-by: Rick Yiu <rickyiu@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Change-Id: I80cdd08a2c7fa5e674c43bfc132584d85c14622b
PELT doesn't account for real-time task utilization in cpu_util(). As
a result, a CPU busy running an RT task is considered low-utilization
by the scheduler. Fix this by taking real-time loading into account.
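One plausible shape of the fix, mirroring the upstream rt_rq PELT
signal (rq->avg_rt exists upstream since 4.19 and in android
backports; treat the exact field names here as an assumption):

	static inline unsigned long cpu_util_rt(struct rq *rq)
	{
		return READ_ONCE(rq->avg_rt.util_avg);
	}

	static inline unsigned long cpu_util(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);

		/* Add RT pressure on top of the CFS-only signal */
		util += cpu_util_rt(rq);

		return min_t(unsigned long, util, capacity_orig_of(cpu));
	}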
Bug: 147385228
Test: boot to home and run audio test
Change-Id: Ie4412b186608b9a618f0d35cee9a7310db481f7c
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Commit 20017f3383 ("sched/fair: Only kick nohz balance when runqueue
has more than 1 task") disabled the nohz kick for LB when a rq has a
misfit task. The assumption is that this would be addressed in the
forced up-migration path. However, this path is WALT-specific, so
disabling the nohz kick breaks PELT.
Fix it by re-enabling the nohz_kick when there is a misfit task on the
rq.
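The fix amounts to one extra condition in the nohz kick decision (a
sketch; the surrounding function is nohz_kick_needed() on this kernel
version):

	if (rq->nr_running >= 2 || rq->misfit_task_load) {
		/* A misfit task also needs the idle load balancer */
		return true;
	}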
Bug: 143472450
Test: 10/10 iterations of eas_small_to_big ended up up-migrating
Fixes: 20017f3383 ("sched/fair: Only kick nohz balance when runqueue
has more than 1 task")
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I9f708eb7661a9e82afdd4e99b878995c33703a45
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
update_cpu_capacity() updates cpu_capacity_orig capped with
thermal_cap; in the non-WALT case, thermal_cap is the previous
cpu_capacity_orig. This causes cpu_capacity_orig to be capped
incorrectly.
Test: Build
Bug: 144143594
Change-Id: I1ff9d9c87554c2d2395d46b215276b7ab50585c0
Signed-off-by: Wei Wang <wvw@google.com>
(cherry picked from commit dac65a5a494f8d0c80101acc5d482d94cda6f158)
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
None of these functions does what its name implies when
CONFIG_SCHED_WALT=n. While all are currently unused, future patches
could introduce subtle bugs by calling any of them from non-WALT
specific code. Delete the functions so it's obvious if new callers are
added.
Test: build kernel
Change-Id: Ib7552afb5668b48fe2ae56307016e98716e00e63
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
With CONFIG_SCHED_WALT disabled, is_min_capacity_cpu() is defined to
always return true, which breaks the intended behavior of
task_fits_max(). Revise is_min_capacity_cpu() to return correct
results.
An earlier version of this patch failed to handle the case when
min_cap_orig_cpu == -1 while sched domains are being updated due to
hotplug. Add a check for this case.
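A sketch of the revised helper, including the hotplug guard (where
min_cap_orig_cpu lives, and what to report during the hotplug window,
are assumptions):

	static inline bool is_min_capacity_cpu(int cpu)
	{
		int min_cap_cpu = cpu_rq(cpu)->rd->min_cap_orig_cpu;

		/* Sched domains mid-rebuild: no valid answer yet */
		if (unlikely(min_cap_cpu < 0))
			return false;

		return capacity_orig_of(cpu) ==
		       capacity_orig_of(min_cap_cpu);
	}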
Test: trace shows increased top-app placement on medium cores
Bug: 117499098
Bug: 128477368
Bug: 130756111
Change-Id: Ia2b41aa7c57f071c997bcd0e9cdfd0808f6a2bf9
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
The estimated utilization for a task:
util_est = max(util_avg, est.enqueued, est.ewma)
is defined based on:
- util_avg: the PELT-defined utilization
- est.enqueued: the util_avg at the end of the last activation
- est.ewma: an exponential moving average of the est.enqueued
  samples
According to this definition, when a task suddenly changes its
bandwidth requirements from small to big, the EWMA will need to
collect multiple samples before converging up to track the new big
utilization.
This slow convergence towards bigger utilization values is not
aligned with the default scheduler behavior, which is to optimize for
performance. Moreover, the est.ewma component fails to compensate for
temporary utilization drops which span just a few est.enqueued samples.
To let util_est do a better job in the scenario depicted above, change
its definition by making util_est directly follow upward motion and
only decay est.ewma on the downward path.
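The gist of the change in the util_est dequeue path, as posted (gated
behind the UTIL_EST_FASTUP sched feature):

	/*
	 * Reset EWMA on utilization increases; the moving average is
	 * then used only to smooth utilization decreases.
	 */
	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
	if (sched_feat(UTIL_EST_FASTUP)) {
		if (ue.ewma < ue.enqueued) {
			ue.ewma = ue.enqueued;
			goto done;
		}
	}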
Signed-off-by: Patrick Bellasi <patrick.bellasi@matbug.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
(am from https://lkml.org/lkml/2019/10/23/1071)
Change-Id: Ifbde836af2e903815904b1dbf44c782b7b66f9ce
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
mark_reserved() is WALT-specific but is called from load_balance(),
which leads to build breakage when WALT is disabled. Execute the
function only if CONFIG_SCHED_WALT is enabled.
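The guard is a plain ifdef at the load_balance() call site (sketch):

	#ifdef CONFIG_SCHED_WALT
		mark_reserved(this_cpu);
	#endif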
Bug: 144142283
Test: Build and boot to home
Change-Id: I5cc3e3ece6a28c6cdabbe6964f6a6032ff2ea809
Signed-off-by: Kyle Lin <kylelin@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Change-Id: If25bddeb70670d0fcaf93088ebf55ab3dc80b4e3
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Change-Id: I65d0d1ae7b633969a88e20a39750fff6279db460
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime
between CPUs, allowing a CPU to run a real-time task up to 100% of the
time while leaving more space for non-real-time tasks to run on the
CPUs that lent rt_runtime.
The problem is that a CPU can easily borrow enough rt_runtime to allow
a spinning rt-task to run forever, starving per-cpu tasks like kworkers,
which are non-real-time by design.
This patch disables RT_RUNTIME_SHARE by default, avoiding this problem.
The feature will still be present for users that want to enable it,
though.
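The change itself is a one-line default flip in
kernel/sched/features.h:

	-SCHED_FEAT(RT_RUNTIME_SHARE, true)
	+SCHED_FEAT(RT_RUNTIME_SHARE, false)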
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Wei Wang <wvw@google.com>
Link: https://lkml.kernel.org/r/b776ab46817e3db5d8ef79175fa0d71073c051c7.1600697903.git.bristot@redhat.com
(cherry picked from commit 2586af1ac187f6b3a50930a4e33497074e81762d)
Change-Id: Ibb1b185d512130783ac9f0a29f0e20e9828c86fd
Bug: 169673278
Test: build, boot and check the trace with RT task
Signed-off-by: Kyle Lin <kylelin@google.com>
Change-Id: Iffede8107863b02ad4a0cb902fc8119416931bdb
This reverts commit 6f58caae21.
It's not present in newer CAF kernels and Google removed it on their
4.14 devices as well.
Change-Id: I3675cbfe4a37ae9ed31bf3659a545965a0d59c6f
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
The previous definitions, as well as the creation of this, are also
locked behind CONFIG_ZRAM_LRU_WRITEBACK.
Change-Id: I869b5595f69cc481e93ca6862b460594762d9b25
# Conflicts:
# drivers/block/zram/zram_drv.c
drivers/cpuidle/lpm-levels.o: In function `lpm_suspend_prepare':
/home/risen/android/ascendia/out/../drivers/cpuidle/lpm-levels.c:1750: undefined reference to `debug_masterstats_show'
/home/risen/android/ascendia/out/../drivers/cpuidle/lpm-levels.c:1751: undefined reference to `debug_rpmstats_show'
drivers/cpuidle/lpm-levels.o: In function `lpm_suspend_wake':
/home/risen/android/ascendia/out/../drivers/cpuidle/lpm-levels.c:1773: undefined reference to `debug_rpmstats_show'
/home/risen/android/ascendia/out/../drivers/cpuidle/lpm-levels.c:1774: undefined reference to `debug_masterstats_show'
make[1]: *** [/home/risen/android/ascendia/Makefile:1190: vmlinux] Error 1
- Return proper values when write wrappers aren't bypassed
- Revise Kconfig description
- Improve overall code style
- Don't write colocate and sched_boost_no_override values when WALT is
disabled
- Mark static data as static
- Improve readability of log messages
- Propagate cftype struct in write wrappers
- Use task_is_booster helper rather than hard-coded "init" check
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
[0ctobot: Squash kdrag0n/proton_zf6@12d005c with
kdrag0n/proton_zf6@eb73f2f]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
This implements a mechanism by which default SchedTune parameters
can be configured in-kernel, circumventing userspace, and
mitigating reliance on ramdisk modification in the context of
custom kernels.
[2.5V]: This version adds proper protection
from userspace (mainly init) trying to write lame
boost values and gives full control to developer
and user (sh is not blocked).
[V3.0]: Use a struct to store all the values.
[0ctobot: Update for msm-4.9 and improve coding style]
[YaroST12: Update for msm-4.14]
Co-authored-by: Adam W. Willis <return.of.octobot@gmail.com>
Co-authored-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Change-Id: I70b676014d580b7df0f2962a989579376e261d49
Doing a tight loop on every pkt addition is wasteful. When the hotspot
is turned on, I noticed that the htt_htc_misc_pkt_list_trim() function
consumes at least 5% of CPU time. Cache the head of the pkt queue and
free multiple pkts at once to reduce CPU consumption.
Signed-off-by: Julian Liu <wlootlxt123@gmail.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
POPP constantly attempts to lower the GPU's frequency behind the
governor's back in order to save power; however, the GPU governor in use
(msm-adreno-tz) is very good at determining the GPU's load and selecting
an appropriate frequency to run the GPU at.
POPP was created long ago, perhaps when msm-adreno-tz didn't exist or
didn't work so well, so it is clearly deprecated. Remove it.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Currently, the kgsl worker thread is erroneously ranked right below
Android's audio threads in terms of priority.
The kgsl worker thread is in the critical path for rendering frames to
the display, so increase its priority to match the priority of the
display commit threads.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
In order to prevent redundant entry creation by racing against itself,
mb_cache_entry_create scans through a large hash-list of all current
entries in order to see if another allocation for the requested new
entry has been made. Furthermore, it allocates memory for a new entry
before scanning through this hash-list, which results in that allocated
memory being discarded when the requested new entry is already present.
This happens more than half the time.
Speed up cache entry creation by keeping a small linked list of
requested new entries in progress, and scanning through that first
instead of the large hash-list. Additionally, don't bother allocating
memory for a new entry until it's known that the allocated memory will
be used.
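A sketch of the reworked fast path (not the literal patch; the
pending-list plumbing and names are hypothetical, mb_entry_cache is
the existing kmem cache in fs/mbcache.c):

	static LIST_HEAD(pending);
	static DEFINE_SPINLOCK(pending_lock);

	/* The in-flight list is much shorter than the hash list, and
	 * nothing has been allocated yet when we walk it. */
	spin_lock(&pending_lock);
	list_for_each_entry(tmp, &pending, list) {
		if (tmp->key == key && tmp->value == value) {
			/* Duplicate creation already in progress */
			spin_unlock(&pending_lock);
			return -EBUSY;
		}
	}
	/* Publish our request, then allocate memory only now that we
	 * know it will actually be used. */
	list_add(&req.list, &pending);
	spin_unlock(&pending_lock);

	entry = kmem_cache_alloc(mb_entry_cache, mask);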
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
There is noticeable scheduling latency and heavy zone lock contention
stemming from rmqueue_bulk's single hold of the zone lock while doing
its work, as seen with the preemptoff tracer. There's no actual need for
rmqueue_bulk() to hold the zone lock the entire time; it only does so
for supposed efficiency. As such, we can relax the zone lock and even
reschedule when IRQs are enabled in order to keep the scheduling delays
and zone lock contention at bay. Forward progress is still guaranteed,
as the zone lock can only be relaxed after page removal.
With this change, rmqueue_bulk() no longer appears as a serious offender
in the preemptoff tracer, and system latency is noticeably improved.
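A sketch of the relaxed loop (simplified; the real function also
maintains the per-migratetype lists and counters):

	for (i = 0; i < count; ++i) {
		struct page *page;

		spin_lock(&zone->lock);
		page = __rmqueue(zone, order, migratetype);
		spin_unlock(&zone->lock);
		if (!page)
			break;

		list_add_tail(&page->lru, list);

		/* The page is already off the freelist, so it is safe
		 * to let other CPUs at the zone and even resched. */
		if (!irqs_disabled())
			cond_resched();
	}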
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Boost the DDR bus for 58 ms when requested in order to reduce jitter.
The 3879 frequency step was determined empirically to be the minimum
needed to sustain acceptably low jitter in UIBench.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
This driver boosts enumerated devfreq devices upon input, and allows for
boosting specific devfreq devices on other custom events. The boost
frequencies for this driver should be set so that frame drops are
near-zero at the boosted frequencies and power consumption is minimized
at said frequencies. The goal of this driver is to provide an interface
to achieve optimal device performance by requesting boosts on key
events, such as when a frame is ready to be rendered to the display.
Currently, support is only present for boosting the cpu-llcc-ddr-bw
devfreq device, but the driver is structured in a way that makes it
easy to add support for new boostable devfreq devices in the future.
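Consumers request boosts through a small kick API (the names below
follow this driver's interface; durations are in milliseconds):

	/* Brief input-style boost of the DDR bus */
	devfreq_boost_kick(DEVFREQ_CPU_LLCC_DDR_BW);

	/* Hold the device at its max boost freq for 100 ms */
	devfreq_boost_kick_max(DEVFREQ_CPU_LLCC_DDR_BW, 100);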
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
An IRQ affinity notifier getting overwritten can point to some annoying
issues which need to be resolved, like multiple pm_qos objects being
registered to the same IRQ. Print out a warning when this happens to aid
debugging.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
On ARM, IRQs are executed on the first CPU inside the affinity mask, so
setting an affinity mask with more than one CPU set is deceptive and
causes issues with pm_qos. To fix this, only set the CPU0 bit inside the
affinity mask, since that's where IRQs will run by default.
This is a follow-up to "kernel: Don't allow IRQ affinity masks to have
more than one CPU".
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Even with an affinity mask that has multiple CPUs set, IRQs always run
on the first CPU in their affinity mask. Drivers that register an IRQ
affinity notifier (such as pm_qos) will therefore have an incorrect
assumption of where an IRQ is affined.
Fix the IRQ affinity mask deception by forcing it to only contain one
set CPU.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Giving userspace intimate control over CPU latency requirements is
nonsense. Userspace can't even stop itself from being preempted, so
there's no reason for it to have access to a mechanism primarily used to
eliminate CPU delays on the order of microseconds.
Remove userspace's ability to send pm_qos requests so that it can't hurt
power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
It isn't guaranteed a CPU will idle upon calling lpm_cpuidle_enter(),
since it could abort early at the need_resched() check. In this case,
it's possible for an IPI to be sent to this "idle" CPU needlessly, thus
wasting power. For the same reason, it's also wasteful to keep a CPU
marked idle even after it's woken up.
Reduce the window that CPUs are marked idle to as small as it can be in
order to improve power consumption.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
The pm_qos callback currently suffers from a number of pitfalls: it
sends IPIs to CPUs that may not be idle, waits for those IPIs to finish
propagating while preemption is disabled (resulting in a long busy wait
for the pm_qos_update_target() caller), and needlessly calls a no-op
function when the IPIs are processed.
Optimize the pm_qos notifier by only sending IPIs to CPUs that are
idle, and by using arch_send_wakeup_ipi_mask() instead of
smp_call_function_many(). Using IPI_WAKEUP instead of IPI_CALL_FUNC,
which is what smp_call_function_many() uses behind the scenes, has the
benefit of doing zero work upon receipt of the IPI; IPI_WAKEUP is
designed purely for sending an IPI without a payload, whereas
IPI_CALL_FUNC does unwanted extra work just to run the empty
smp_callback() function.
Determining which CPUs are idle is done efficiently with an atomic
bitmask instead of using the wake_up_if_idle() API, which checks the
CPU's runqueue in an RCU read-side critical section and under a spin
lock; that is not very efficient compared to a simple, atomic bitwise
operation. A cpumask isn't needed for this because NR_CPUS is
guaranteed to fit within a word.
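A sketch of the bitmask bookkeeping (symbol names hypothetical; the
single-word trick relies on NR_CPUS <= BITS_PER_LONG as noted above):

	static atomic_long_t cpus_in_idle;

	/* on idle entry */
	atomic_long_or(BIT(cpu), &cpus_in_idle);

	/* on wakeup, as early as possible */
	atomic_long_andnot(BIT(cpu), &cpus_in_idle);

	/* in the pm_qos notifier: wake only CPUs that are really idle */
	unsigned long idle = atomic_long_read(&cpus_in_idle);

	arch_send_wakeup_ipi_mask(to_cpumask(&idle));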
Change-Id: Ic4dd7e4781172bb8e3b6eb13417a814256d44cf0
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
An empty IPI is useful for cpuidle to wake sleeping CPUs without causing
them to do unnecessary work upon receipt of the IPI. IPI_WAKEUP fills
this use-case nicely, so let it be used outside of the ACPI parking
protocol.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
This allows pm_qos votes of, say, 100 us to select power levels with
exit latencies equal to 100 us. The extra microsecond of exit latency
doesn't hurt.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Currently most of the assembly files that use architecture extensions
enable them using the .arch directive but crc32.S uses .cpu instead. Move
that over to .arch for consistency.
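The swap is a single directive change in arch/arm64/lib/crc32.S (the
exact architecture string below is an assumption about the merged
form):

	-	.cpu		generic+crc
	+	.arch		armv8-a+crc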
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20200414182843.31664-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers here because crc32_le_base/__crc32c_le_base
aren't __pure while their target crc32_le/__crc32c_le are.
These aliases are used by architectures as a fallback in accelerated
versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32()
routines weak so they can be overridden").
Therefore, being fallbacks, it is likely that even if the aliases
were called from C, there wouldn't be any optimizations possible.
Currently, the only user is arm64, which calls this from asm.
Still, marking the aliases as __pure makes sense and is a good idea
for documentation purposes and possible future optimizations,
which also silences the warning.
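With the attribute applied, the alias declarations in lib/crc32.c
mirror their targets:

	u32 __pure crc32_le_base(u32, unsigned char const *, size_t)
		__alias(crc32_le);
	u32 __pure __crc32c_le_base(u32, unsigned char const *, size_t)
		__alias(__crc32c_le);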
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Allow architectures to drop in accelerated CRC32 routines by making
the crc32_le/__crc32c_le entry points weak, and exposing non-weak
aliases for them that may be used by the accelerated versions as
fallbacks in case the instructions they rely upon are not available.
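The pattern in lib/crc32.c: the generic implementation becomes weak,
and a non-weak alias keeps it reachable as a fallback (abridged):

	u32 __pure __weak crc32_le(u32 crc, unsigned char const *p,
				   size_t len)
	{
		return crc32_le_generic(crc, p, len, crc32table_le,
					CRC32_POLY_LE);
	}

	/* Non-weak entry point for accelerated code to fall back to */
	u32 __pure crc32_le_base(u32, unsigned char const *, size_t)
		__alias(crc32_le);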
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.
Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.
Tested using the following test program:
  #include <stdlib.h>

  /* kernel CRC32 routine under test (real signature returns the CRC) */
  extern unsigned int crc32_le(unsigned int crc, unsigned char const *p,
                               unsigned long len);

  int main(void)
  {
          static const unsigned char buf[4096];

          srand(20181126);

          for (int i = 0; i < 100 * 1000 * 1000; i++)
                  crc32_le(0, buf, rand() % 1024);

          return 0;
  }
On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from
$ time ./crc32
real 0m10.149s
user 0m10.149s
sys 0m0.000s
to
$ time ./crc32
real 0m7.915s
user 0m7.915s
sys 0m0.000s
Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Unlike crc32c(), which is wired up to the crypto API internally so the
optimal driver is selected based on the platform's capabilities,
crc32_le() is implemented as a library function using a slice-by-8 table
based C implementation. Even though few of the call sites may be
bottlenecks, calling a time variant implementation with a non-negligible
D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates
support for the CRC32 instructions that were optional in ARMv8.0, but are
already widely available, even on the Cortex-A53 based Raspberry Pi.
So implement routines that use these instructions if available, and fall
back to the existing generic routines otherwise. The selection is based
on alternatives patching.
Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
this just codifies the status quo.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The current delay implementation uses the yield instruction, which is a
hint that it is beneficial to schedule another thread. As this is a hint,
it may be implemented as a NOP, causing all delays to be busy loops. This
is the case for many existing CPUs.
Taking advantage of the generic timer sending periodic events to all
cores, we can use WFE during delays to reduce power consumption. This is
beneficial only for delays longer than the period of the timer event
stream.
If the timer event stream is not enabled, delays will behave as
yield/busy loops.
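The resulting delay loop (arch/arm64/lib/delay.c, lightly abridged):
wait in WFE event-stream granularity while a whole period still fits,
then trim the remainder with the traditional busy loop:

	void __delay(unsigned long cycles)
	{
		cycles_t start = get_cycles();

		if (arch_timer_evtstrm_available()) {
			const cycles_t timer_evt_period =
				USECS_TO_CYCLES(ARCH_TIMER_EVT_STREAM_PERIOD_US);

			while ((get_cycles() - start + timer_evt_period)
			       < cycles)
				wfe();
		}

		while ((get_cycles() - start) < cycles)
			cpu_relax();
	}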
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The arch timer configuration for a CPU might get reset after suspending
said CPU.
In order to reliably use the event stream in the kernel (e.g. for delays),
we keep track of the state where we can safely consider the event stream as
properly configured. After writing to cntkctl, we issue an ISB to ensure
that subsequent delay loops can rely on the event stream being enabled.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>