kernel_samsung_sm7125

jenna

Author	SHA1	Message	Date
Alessio Balsini	91c67c3ace	FROMLIST: fuse: Passthrough initialization and release Implement the FUSE passthrough ioctl that associates the lower (passthrough) file system file with the fuse_file. The file descriptor passed to the ioctl by the FUSE daemon is used to retrieve the corresponding file pointer, which is copied to the fuse_file data structure to consolidate the link between the FUSE and lower file system. To enable the passthrough mode, user space triggers the FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl and, if the call succeeds, receives back an identifier that will be used at open/create response time in the fuse_open_out field to associate the FUSE file with the lower file system file. The value returned by the ioctl to user space can be: - > 0: success, the identifier can be used as part of an open/create reply. - <= 0: an error occurred. The value 0 represents an error to preserve backward compatibility: the fuse_open_out field that is used to pass the passthrough_fh back to the kernel uses the same bits that were previously used as struct padding, and is commonly zero-initialized (e.g., in the libfuse implementation). Excluding 0 from the valid values removes the ambiguity between the case in which 0 corresponds to a real passthrough_fh, a missing implementation of FUSE passthrough, or a request for a normal FUSE file, simplifying the user space implementation. For the passthrough mode to be successfully activated, the lower file system file must implement both read_iter and write_iter file operations. This extra check prevents special pseudo files from being targeted by this feature. Passthrough comes with another limitation: no further file system stacking is allowed for those FUSE file systems using passthrough.
Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-5-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I4d8290012302fb4547bce9bb261a03cc4f66b5aa Signed-off-by: Alessio Balsini <balsini@google.com>	2 months ago
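The return-value convention of FUSE_DEV_IOC_PASSTHROUGH_OPEN described in this commit can be modeled with a small user-space helper. This is a minimal sketch: `classify_passthrough_ret()`, `passthrough_fh_for_reply()` and the enum are hypothetical names, and the ioctl call itself is left out because its exact uapi definition is not reproduced in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of the value the passthrough ioctl returns
 * to the FUSE daemon, following the convention in the commit:
 *   > 0  -> a valid identifier for fuse_open_out's passthrough_fh
 *   <= 0 -> error; 0 is never a valid identifier, because the field reuses
 *           former padding that daemons commonly zero-initialize. */
enum passthrough_result {
    PASSTHROUGH_ID_VALID,
    PASSTHROUGH_ERROR,
};

static enum passthrough_result classify_passthrough_ret(int ret)
{
    return ret > 0 ? PASSTHROUGH_ID_VALID : PASSTHROUGH_ERROR;
}

/* Value to place in the open/create reply: 0 means "no passthrough",
 * i.e. a normal FUSE file, which is exactly what a zero-initialized reply
 * from a daemon without passthrough support already contains. */
static uint32_t passthrough_fh_for_reply(int ret)
{
    return classify_passthrough_ret(ret) == PASSTHROUGH_ID_VALID ? (uint32_t)ret : 0;
}
```

Because 0 doubles as "normal FUSE file", a daemon can fall back to regular FUSE I/O simply by leaving the field zeroed when the ioctl fails.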
Alessio Balsini	13f0ce4417	BACKPORT: fuse: Definitions and ioctl for passthrough Expose the FUSE_PASSTHROUGH interface to user space and declare all the basic data structures and functions as the skeleton on top of which the FUSE passthrough functionality will be built. As part of this, introduce the new FUSE passthrough ioctl, which allows the FUSE daemon to specify a direct connection between a FUSE file and a lower file system file. This ioctl requires user space to pass the file descriptor of one of its opened files through the fuse_passthrough_out data structure introduced in this patch. This structure includes extra fields for possible future extensions. Also, add the passthrough functions for the set-up and tear-down of the data structures and locks that will be used when fuse_conns and fuse_files are created and deleted. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-4-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I732532581348adadda5b5048a9346c2b0868d539 Signed-off-by: Alessio Balsini <balsini@google.com>	2 months ago
Alessio Balsini	fd0425bdcf	FROMLIST: fuse: 32-bit user space ioctl compat for fuse device With a 64-bit kernel build the FUSE device cannot handle ioctl requests coming from 32-bit user space. This is due to the ioctl command translation that generates different command identifiers that thus cannot be used for direct comparisons without proper manipulation. Explicitly extract type and number from the ioctl command to enable 32-bit user space compatibility on 64-bit kernel builds. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-3-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I595517c54d551be70e83c7fcb4b62397a3615004 Signed-off-by: Alessio Balsini <balsini@google.com>	2 months ago
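The compat problem comes from comparing whole encoded command values. One way the same logical command can end up with two identifiers is that the argument size is part of the encoding, so a structure whose size differs between 32-bit and 64-bit ABIs yields two values. A minimal sketch using the standard `_IOW`, `_IOC_TYPE` and `_IOC_NR` macros from <linux/ioctl.h>; the magic number, command number and helper names are hypothetical.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical ioctl magic and command number, for illustration only. */
#define DEMO_IOC_MAGIC 229
#define DEMO_IOC_NR    1

/* The same logical command encoded with two argument sizes, as can happen
 * when a struct's size differs between 32-bit and 64-bit ABIs. The size
 * is part of the encoded command, so the two values differ. */
#define DEMO_CMD_64 _IOW(DEMO_IOC_MAGIC, DEMO_IOC_NR, uint64_t)
#define DEMO_CMD_32 _IOW(DEMO_IOC_MAGIC, DEMO_IOC_NR, uint32_t)

/* Fragile: compares the whole encoded command, including the size bits. */
static int matches_exact(unsigned int cmd)
{
    return cmd == DEMO_CMD_64;
}

/* Robust: compares only the type (magic) and number fields. */
static int matches_type_nr(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == DEMO_IOC_MAGIC && _IOC_NR(cmd) == DEMO_IOC_NR;
}
```

Dispatching on type and number alone means a handler must then validate the argument itself rather than relying on the encoded size.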
Alessio Balsini	f8bfc96b81	BACKPORT: fs: Generic function to convert iocb to rw flags OverlayFS implements its own function to translate iocb flags into rw flags, so that they can be passed into another vfs call. With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags") Jens created a 1:1 matching between the iocb flags and rw flags, simplifying the conversion. Reduce the OverlayFS code by making the flag conversion function generic and reusable. Bug: 179164095 Link: https://lore.kernel.org/lkml/20210125153057.3623715-2-balsini@android.com/ Signed-off-by: Alessio Balsini <balsini@android.com> Change-Id: I74aefeafd6ebbda2fbabee9024474dfe4cc6c2a7 Signed-off-by: Alessio Balsini <balsini@google.com>	2 months ago
Alessio Balsini	c967ad58be	BACKPORT: fs: align IOCB_* flags with RWF_* flags We have a set of flags that are shared between the two and inherited in kiocb_set_rw_flags(), but we check and set these individually. Reorder the IOCB flags so that the bottom part of the space is synced with the RWF flag space, and then we can do them all in one mask and set operation. The only exception is RWF_SYNC, which needs to mark IOCB_SYNC and IOCB_DSYNC. Do that one separately. This shaves 15 bytes of text from kiocb_set_rw_flags() for me. (cherry picked from commit ce71bfea207b4d7c21d36f24ec37618ffcea1da8) Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Change-Id: Ib6316ae5cb3f8a14fabef5492e79783c9e6d3c4d Signed-off-by: Alessio Balsini <balsini@google.com>	2 months ago
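The two flag commits above rely on the low IOCB_* bits having the same values as the matching RWF_* bits. The sketch below shows why that makes conversion a single mask in each direction, with RWF_SYNC as the one special case. The DEMO_* names are hypothetical; the RWF values mirror our understanding of uapi <linux/fs.h>, and DEMO_IOCB_EVENTFD is an invented kernel-only bit placed above the shared range.

```c
#include <assert.h>

/* Per-I/O flag values as we understand them from uapi <linux/fs.h>. */
#define DEMO_RWF_HIPRI  0x01
#define DEMO_RWF_DSYNC  0x02
#define DEMO_RWF_SYNC   0x04
#define DEMO_RWF_NOWAIT 0x08
#define DEMO_RWF_APPEND 0x10
#define DEMO_RWF_SUPPORTED (DEMO_RWF_HIPRI | DEMO_RWF_DSYNC | DEMO_RWF_SYNC | \
                            DEMO_RWF_NOWAIT | DEMO_RWF_APPEND)

/* After the reordering, the low IOCB bits share values with RWF bits... */
#define DEMO_IOCB_HIPRI  DEMO_RWF_HIPRI
#define DEMO_IOCB_DSYNC  DEMO_RWF_DSYNC
#define DEMO_IOCB_SYNC   DEMO_RWF_SYNC
#define DEMO_IOCB_NOWAIT DEMO_RWF_NOWAIT
#define DEMO_IOCB_APPEND DEMO_RWF_APPEND
/* ...while kernel-only flags live above the shared range (invented value). */
#define DEMO_IOCB_EVENTFD 0x10000

/* RWF -> IOCB: one mask-and-set, plus the RWF_SYNC special case, since a
 * synchronous write must also set the data-sync flag. */
static int demo_rw_to_iocb_flags(int iocb_flags, int rwf)
{
    iocb_flags |= rwf & DEMO_RWF_SUPPORTED;
    if (rwf & DEMO_RWF_SYNC)
        iocb_flags |= DEMO_IOCB_DSYNC;
    return iocb_flags;
}

/* IOCB -> RWF, in the spirit of the generic helper: with a 1:1 layout the
 * conversion is just masking out the kernel-only bits. */
static int demo_iocb_to_rw_flags(int iocb_flags, int mask)
{
    return iocb_flags & mask;
}
```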
Jürg Billeter	0c5aefe308	fs: add RWF_APPEND This is the per-I/O equivalent of O_APPEND to support atomic append operations on any open file. If a file is opened with O_APPEND, pwrite() ignores the offset and always appends data to the end of the file. RWF_APPEND enables atomic append and pwrite() with offset on a single file descriptor. Change-Id: I18cb7fa871e6b55bfe7890a633a4014135bf361e Signed-off-by: Jürg Billeter <j@bitron.ch> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2 months ago
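RWF_APPEND can be exercised from user space through pwritev2(2). A minimal sketch, assuming Linux 4.16 or later and glibc 2.26 or later (for the pwritev2() wrapper): the write passes offset 0, yet the data is appended because of the per-call flag, while the descriptor itself is not opened with O_APPEND. The helper names and the temporary-file template are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_APPEND
#define RWF_APPEND 0x00000010 /* uapi value; older libc headers may lack it */
#endif

/* Writes `data` with RWF_APPEND at offset 0: the offset is ignored and the
 * data lands at end of file. Returns bytes written, or -1 on error (e.g. a
 * kernel that predates RWF_APPEND). */
static ssize_t append_write(int fd, const char *data)
{
    struct iovec iov = { .iov_base = (void *)data, .iov_len = strlen(data) };
    return pwritev2(fd, &iov, 1, 0, RWF_APPEND);
}

/* Demo: write "abc" at offset 0, append "def", read the file back into
 * `out`. Returns the final length, or -1 on any failure. */
static long demo_append(char *out, size_t outlen)
{
    char path[] = "/tmp/rwf_append_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* file disappears once fd is closed */
    long result = -1;
    if (pwrite(fd, "abc", 3, 0) == 3 && append_write(fd, "def") == 3) {
        ssize_t n = pread(fd, out, outlen - 1, 0);
        if (n >= 0) {
            out[n] = '\0';
            result = (long)n;
        }
    }
    close(fd);
    return result;
}
```

The same descriptor can still do positioned pwrite() calls at arbitrary offsets, which is the combination O_APPEND alone does not allow.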
Tim Zimmermann	5da8cf70c7	syscall: Increase bpf fake uname to 5.4 Change-Id: I50bfa0d35d81f1c8cc21530ea0524a6752d0d34c	2 months ago
Toke Høiland-Jørgensen	de8461f386	BACKPORT: devmap: Allow map lookups from eBPF We don't currently allow lookups into a devmap from eBPF, because the map lookup returns a pointer directly to the dev->ifindex, which shouldn't be modifiable from eBPF. However, being able to do lookups in devmaps is useful to know (e.g.) whether forwarding to a specific interface is enabled. Currently, programs work around this by keeping a shadow map of another type which indicates whether a map index is valid. Since we now have a flag to make maps read-only from the eBPF side, we can simply lift the lookup restriction if we make sure this flag is always set. Change-Id: I42b1430605c6837710fd903a0c8abf2c7dc13f16 Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Toke Høiland-Jørgensen	88c96444ad	BACKPORT: xdp: Add devmap_hash map type for looking up devices by hashed index A common pattern when using xdp_redirect_map() is to create a device map where the lookup key is simply ifindex. Because device maps are arrays, this leaves holes in the map, and the map has to be sized to fit the largest ifindex, regardless of how many devices are actually needed in the map. This patch adds a second type of device map where the key is looked up using a hashmap, instead of being used as an array index. This allows maps to be densely packed, so they can be smaller. Change-Id: I6155de499a47fb45bac1a39319f0ad979032fd6d Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2 months ago
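The sizing argument for devmap_hash can be illustrated outside BPF. This toy C sketch is not the kernel implementation: `array_map_slots()` computes how many slots an ifindex-indexed array would need, and the small open-addressing table (hypothetical `toy_*` names, fixed capacity of 8) holds only the entries actually inserted.

```c
#include <assert.h>
#include <stddef.h>

/* An array map indexed directly by ifindex needs max_ifindex + 1 slots,
 * however few devices are actually stored. */
static size_t array_map_slots(const int *ifindexes, size_t n)
{
    int max = 0;
    for (size_t i = 0; i < n; i++)
        if (ifindexes[i] > max)
            max = ifindexes[i];
    return (size_t)max + 1;
}

/* A hash-keyed map only needs room for its entries. Minimal open
 * addressing with linear probing; key 0 marks an empty slot, since
 * ifindex 0 is never a valid interface index. */
#define TOY_HASH_SLOTS 8 /* hypothetical capacity, power of two */

struct toy_devmap_hash {
    int key[TOY_HASH_SLOTS];   /* ifindex */
    int value[TOY_HASH_SLOTS]; /* e.g. a redirect target id */
};

static int toy_insert(struct toy_devmap_hash *m, int ifindex, int value)
{
    for (unsigned i = 0; i < TOY_HASH_SLOTS; i++) {
        unsigned slot = ((unsigned)ifindex + i) & (TOY_HASH_SLOTS - 1);
        if (m->key[slot] == 0 || m->key[slot] == ifindex) {
            m->key[slot] = ifindex;
            m->value[slot] = value;
            return 0;
        }
    }
    return -1; /* table full */
}

static int toy_lookup(const struct toy_devmap_hash *m, int ifindex, int *value)
{
    for (unsigned i = 0; i < TOY_HASH_SLOTS; i++) {
        unsigned slot = ((unsigned)ifindex + i) & (TOY_HASH_SLOTS - 1);
        if (m->key[slot] == 0)
            return -1; /* reached an empty slot: key not present */
        if (m->key[slot] == ifindex) {
            *value = m->value[slot];
            return 0;
        }
    }
    return -1;
}
```

With ifindexes 2 and 4000, the array form needs 4001 slots while the hash form fits both entries in 8.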
Tim Zimmermann	54882c4ecc	kernel: bpf: devmap: Create __dev_map_alloc_node * Like in `fca16e5107` Change-Id: I95915b56f381c8f5851b6e7827eed6064aefe586	2 months ago
Andrey Ignatov	83fc9b709f	BACKPORT: bpf: Post-hooks for sys_bind "Post-hooks" are hooks that are called right before returning from sys_bind. At this time IP and port are already allocated and no further changes to `struct sock` can happen before returning from sys_bind but BPF program has a chance to inspect the socket and change sys_bind result. Specifically it can e.g. inspect what port was allocated and if it doesn't satisfy some policy, BPF program can force sys_bind to fail and return EPERM to user. Another example of usage is recording the IP:port pair to some map to use it in later calls to sys_connect. E.g. if some TCP server inside cgroup was bound to some IP:port_n, it can be recorded to a map. And later when some TCP client inside same cgroup is trying to connect to 127.0.0.1:port_n, BPF hook for sys_connect can override the destination and connect application to IP:port_n instead of 127.0.0.1:port_n. This helps force all applications inside a cgroup to use the desired IP without breaking those applications if they e.g. use localhost to communicate with each other. == Implementation details == Post-hooks are implemented as two new attach types `BPF_CGROUP_INET4_POST_BIND` and `BPF_CGROUP_INET6_POST_BIND` for existing prog type `BPF_PROG_TYPE_CGROUP_SOCK`. Separate attach types for IPv4 and IPv6 are introduced to avoid access to IPv6 field in `struct sock` from `inet_bind()` and to IPv4 field from `inet6_bind()` since those fields might not make sense in such cases. Change-Id: Ibef21eed069c37684321b2401e5bb52f689ab8e7 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Andrey Ignatov	267d5fd235	BACKPORT: bpf: Hooks for sys_connect == The problem == See description of the problem in the initial patch of this patch set. == The solution == The patch provides much more reliable in-kernel solution for the 2nd part of the problem: making outgoing connection from desired IP. It adds new attach types `BPF_CGROUP_INET4_CONNECT` and `BPF_CGROUP_INET6_CONNECT` for program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both source and destination of a connection at connect(2) time. Local end of connection can be bound to desired IP using newly introduced BPF-helper `bpf_bind()`. It only allows binding to an IP, though, and doesn't support binding to a port, i.e. leverages `IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this: * looking for a free port is expensive and can affect performance significantly; * there is no use-case for port. As for remote end (`struct sockaddr *` passed by user), both parts of it can be overridden, remote IP and remote port. It's useful if an application inside cgroup wants to connect to another application inside same cgroup or to itself, but knows nothing about IP assigned to the cgroup. Support is added for IPv4 and IPv6, for TCP and UDP. IPv4 and IPv6 have separate attach types for same reason as sys_bind hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound. == Implementation notes == The patch introduces new field in `struct proto`: `pre_connect` that is a pointer to a function with same signature as `connect` but is called before it. The reason is in some cases BPF hooks should be called way before control is passed to `sk->sk_prot->connect`. Specifically `inet_dgram_connect` autobinds socket before calling `sk->sk_prot->connect` and there is no way to call `bpf_bind()` from hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since it'd cause double-bind.
On the other hand `proto.pre_connect` provides a flexible way to add BPF hooks for connect only for necessary `proto` and call them at desired time before `connect`. Since `bpf_bind()` is allowed to bind only to IP and autobind in `inet_dgram_connect` binds only port, there is no chance of double-bind. bpf_bind() sets `force_bind_address_no_port` to bind to only IP regardless of the value of the `bind_address_no_port` socket field. bpf_bind() sets `with_lock` to `false` when calling __inet_bind() and __inet6_bind() since all call-sites, where bpf_bind() is called, already hold the socket lock. Change-Id: I03eb513369c630b203466621d1fbdb9b29c8333c Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
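The destination rewrite described for the connect hooks (127.0.0.1:port_n redirected to IP:port_n) can be modeled in plain user-space C. This is not a BPF program: a real BPF_CGROUP_INET4_CONNECT program would read and write the context passed to it rather than a `struct sockaddr_in`, and `rewrite_loopback_dst()` and the cgroup IP are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* User-space model of the decision a connect hook could make: if an
 * application in the cgroup connects to 127.0.0.1:port, redirect it to the
 * cgroup's IP on the same port. Addresses and ports stay in network byte
 * order, as in sockaddr. Returns 1 if the destination was rewritten. */
static int rewrite_loopback_dst(struct sockaddr_in *dst, struct in_addr cgroup_ip)
{
    if (dst->sin_family != AF_INET)
        return 0; /* an IPv4 hook only handles IPv4 destinations */
    if (dst->sin_addr.s_addr != htonl(INADDR_LOOPBACK))
        return 0; /* leave non-loopback destinations alone */
    dst->sin_addr = cgroup_ip; /* port is kept as-is */
    return 1;
}
```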
Andrey Ignatov	055e9f1c2c	BACKPORT: net: Introduce __inet_bind() and __inet6_bind Refactor `bind()` code to make it ready to be called from BPF helper function `bpf_bind()` (will be added soon). Implementation of `inet_bind()` and `inet6_bind()` is separated into `__inet_bind()` and `__inet6_bind()` respectively. These functions can be used from both `sk_prot->bind` and `bpf_bind()` contexts. New functions have two additional arguments. `force_bind_address_no_port` forces binding to IP only w/o checking `inet_sock.bind_address_no_port` field. It'll allow binding the local end of a connection to the desired IP in `bpf_bind()` w/o changing `bind_address_no_port` field of a socket. It's useful since `bpf_bind()` can return an error and we'd need to restore original value of `bind_address_no_port` in that case if we changed this before calling to the helper. `with_lock` specifies whether to lock socket when working with `struct sk` or not. The argument is set to `true` for `sk_prot->bind`, i.e. old behavior is preserved. But it will be set to `false` for `bpf_bind()` use-case. The reason is all call-sites, where `bpf_bind()` will be called, already hold that socket lock. Change-Id: I3cd102acdb2b3c14946ef8452fd7afb763e8215f Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Andrey Ignatov	fbada2a266	BACKPORT: bpf: Hooks for sys_bind == The problem == There is a use-case when all processes inside a cgroup should use one single IP address on a host that has multiple IP configured. Those processes should use the IP for both ingress and egress, for TCP and UDP traffic. So TCP/UDP servers should be bound to that IP to accept incoming connections on it, and TCP/UDP clients should make outgoing connections from that IP. It should not require changing application code since it's often not possible. Currently it's solved by intercepting glibc wrappers around syscalls such as `bind(2)` and `connect(2)`. It's done by a shared library that is preloaded for every process in a cgroup so that whenever TCP/UDP server calls `bind(2)`, the library replaces IP in sockaddr before passing arguments to syscall. When application calls `connect(2)` the library transparently binds the local end of connection to that IP (`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty). Shared library approach is fragile though, e.g.: * some applications clear env vars (incl. `LD_PRELOAD`); * `/etc/ld.so.preload` doesn't help since some applications are linked with option `-z nodefaultlib`; * other applications don't use glibc and there is nothing to intercept. == The solution == The patch provides much more reliable in-kernel solution for the 1st part of the problem: binding TCP/UDP servers on desired IP. It does not depend on application environment and implementation details (whether glibc is used or not). It adds new eBPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` and attach types `BPF_CGROUP_INET4_BIND` and `BPF_CGROUP_INET6_BIND` (similar to already existing `BPF_CGROUP_INET_SOCK_CREATE`). The new program type is intended to be used with sockets (`struct sock`) in a cgroup and provided by user `struct sockaddr`. Pointers to both of them are parts of the context passed to programs of newly added types. 
The new attach types provide hooks in the `bind(2)` system call for both IPv4 and IPv6 so that one can write a program to override IP addresses and ports a user program tries to bind to and apply such a program to a whole cgroup. == Implementation notes == [1] Separate attach types for `AF_INET` and `AF_INET6` are added intentionally to prevent reading/writing to offsets that don't make sense for corresponding socket family. E.g. if user passes `sockaddr_in` it doesn't make sense to read from / write to `user_ip6[]` context fields. [2] The write access to `struct bpf_sock_addr_kern` is implemented using special field as an additional "register". There are just two registers in `sock_addr_convert_ctx_access`: `src` with value to write and `dst` with pointer to context that can't be changed without breaking later instructions. But the fields, allowed to write to, are not available directly and to access them address of corresponding pointer has to be loaded first. To get additional register the 1st not used by `src` and `dst` one is taken, its content is saved to `bpf_sock_addr_kern.tmp_reg`, then the register is used to load address of pointer field, and finally the register's content is restored from the temporary field after writing `src` value. Change-Id: I47b4cd565cb7cd3bcf3ecf80ddf2586ee81868fb Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
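The bind rewrite, and the reason for separate AF_INET/AF_INET6 attach types, can be modeled the same way in user-space C. The sketch checks the address family and length before touching any field, since an IPv6 address field lies outside a `sockaddr_in`. `rewrite_bind_addr()` and the per-cgroup addresses are hypothetical; in the kernel this logic would live in a BPF_PROG_TYPE_CGROUP_SOCK_ADDR program.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Force servers in a cgroup onto one IP while keeping the requested port.
 * The family and length are checked first: writing sin6_addr through a
 * buffer that only holds a sockaddr_in would go out of bounds, which is
 * the hazard per-family attach types avoid. Returns 1 if rewritten. */
static int rewrite_bind_addr(struct sockaddr *sa, socklen_t len,
                             struct in_addr cgroup_ip4,
                             const struct in6_addr *cgroup_ip6)
{
    if (sa->sa_family == AF_INET && len >= (socklen_t)sizeof(struct sockaddr_in)) {
        ((struct sockaddr_in *)sa)->sin_addr = cgroup_ip4;
        return 1;
    }
    if (sa->sa_family == AF_INET6 && len >= (socklen_t)sizeof(struct sockaddr_in6)) {
        ((struct sockaddr_in6 *)sa)->sin6_addr = *cgroup_ip6;
        return 1;
    }
    return 0; /* unknown family or short buffer: leave it untouched */
}
```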
Alexei Starovoitov	e7fd51d77c	BACKPORT: bpf: introduce BPF_PROG_QUERY command introduce BPF_PROG_QUERY command to retrieve a set of either attached programs to given cgroup or a set of effective programs that will execute for events within a cgroup Change-Id: I05e0ed5f6eddc30f4a18216d4541448816fd1ae5 Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2 months ago
Andrey Ignatov	e228d7b7d6	BACKPORT: bpf: Check attach type at prog load time == The problem == There are use-cases where a program of some type can be attached to multiple attach points and those attach points must have different permissions to access context or to call helpers. E.g. context structure may have fields for both IPv4 and IPv6 but it doesn't make sense to read from / write to IPv6 field when attach point is somewhere in IPv4 stack. Same applies to BPF-helpers: it may make sense to call some helper from some attach point, but not from another for the same prog type. == The solution == Introduce `expected_attach_type` field in `struct bpf_attr` for `BPF_PROG_LOAD` command. If scenario described in "The problem" section is the case for some prog type, the field will be checked twice: 1) At load time prog type is checked to see if attach type for it must be known to validate program permissions correctly. Prog will be rejected with EINVAL if it's the case and `expected_attach_type` is not specified or has invalid value. 2) At attach time `attach_type` is compared with `expected_attach_type`, if prog type requires to have one, and, if they differ, attach will be rejected with EINVAL. The `expected_attach_type` is now available as part of `struct bpf_prog` in both `bpf_verifier_ops->is_valid_access()` and `bpf_verifier_ops->get_func_proto()` and can be used to check context accesses and calls to helpers respectively. Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and Daniel Borkmann <daniel@iogearbox.net> here: https://marc.info/?l=linux-netdev&m=152107378717201&w=2 Change-Id: Idead9c9cb4251bf5bd843b68bcb83072d5746226 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
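The two-stage check introduced by this commit can be sketched as follows. The `demo_*` types and enum values are hypothetical stand-ins for `struct bpf_prog` and the real attach-type constants; only the control flow (validate at load, compare at attach, fail with EINVAL) follows the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative attach types, not the uapi constants. */
enum demo_attach_type { DEMO_ATTACH_NONE = 0, DEMO_INET4_BIND, DEMO_INET6_BIND };

struct demo_prog {
    bool needs_attach_type;                     /* prog type with per-attach-point rules */
    enum demo_attach_type expected_attach_type; /* supplied at load time */
};

/* 1) Load time: a prog type that needs an attach type must name a valid one. */
static int demo_check_load(const struct demo_prog *p)
{
    if (!p->needs_attach_type)
        return 0;
    if (p->expected_attach_type != DEMO_INET4_BIND &&
        p->expected_attach_type != DEMO_INET6_BIND)
        return -EINVAL;
    return 0;
}

/* 2) Attach time: the actual attach point must match the declared one. */
static int demo_check_attach(const struct demo_prog *p, enum demo_attach_type at)
{
    if (p->needs_attach_type && p->expected_attach_type != at)
        return -EINVAL;
    return 0;
}
```

Checking at load time is what lets the verifier validate context accesses against one known attach point; the attach-time comparison then guarantees that assumption still holds.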
Jakub Kicinski	e7d650f158	bpf: offload: rename the ifindex field bpf_target_prog seems long and clunky, rename it to prog_ifindex. We don't want to call this field just ifindex, because maps may need a similar field in the future and bpf_attr members for programs and maps are unnamed. Change-Id: I5473ea6721193bcf616ac3a1056c808446af9c8d Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Jakub Kicinski	8df005bf97	BACKPORT: bpf: offload: add infrastructure for loading programs for a specific netdev The fact that we don't know which device the program is going to be used on is quite limiting in current eBPF infrastructure. We have to reverse or limit the changes which the kernel makes to the loaded bytecode if we want it to be offloaded to a networking device. We also have to invent new APIs for debugging and troubleshooting support. Make it possible to load programs for a specific netdev. This helps us to bring the debug information closer to the core eBPF infrastructure (e.g. we will be able to reuse the verifier log in device JIT). It allows device JITs to perform translation on the original bytecode. __bpf_prog_get() when called to get a reference for an attachment point will now refuse to give it if program has a device assigned. Following patches will add a version of that function which passes the expected netdev in. @type argument in __bpf_prog_get() is renamed to attach_type to make it clearer that it's only set on attachment. All calls to ndo_bpf are protected by rtnl, only verifier callbacks are not. We need a wait queue to make sure netdev doesn't get destroyed while verifier is still running and calling its driver. Change-Id: Iba7b96574abc005ad3351d6db2528eb534e47561 Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2 months ago
Jakub Kicinski	a7dc67603b	BACKPORT: net: bpf: rename ndo_xdp to ndo_bpf ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Change-Id: I08c456c9afa712ce0b7a98c24b6f46545e69f3cc Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2 months ago
Tim Zimmermann	a1da9dde53	bpf: Update logging functions to work with BTF * Based on `430e68d10b`, `77d2e05abd` and `a2a7d57010` Change-Id: I27e2c804726078646ca9beda31cbae2a745dfd47	2 months ago
Lorenz Bauer	65b93fbf9b	bpf: btf: fix truncated last_member_type_id in btf_struct_resolve [ Upstream commit a37a32583e282d8d815e22add29bc1e91e19951a ] When trying to finish resolving a struct member, btf_struct_resolve saves the member type id in a u16 temporary variable. This truncates the 32 bit type id value if it exceeds UINT16_MAX. As a result, structs that have members with type ids > UINT16_MAX and which need resolution will fail with a message like this: [67414] STRUCT ff_device size=120 vlen=12 effect_owners type_id=67434 bits_offset=960 Member exceeds struct_size Fix this by changing the type of last_member_type_id to u32. Fixes: a0791f0df7d2 ("bpf: fix BTF limits") Reviewed-by: Stanislav Fomichev <sdf@google.com> Change-Id: I3a3db7bd5dc8836dd2aa2ba572169aa2a0629eca Signed-off-by: Lorenz Bauer <oss@lmb.io> Link: https://lore.kernel.org/r/20220910110120.339242-1-oss@lmb.io Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2 months ago
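The truncation is easy to reproduce in plain C. With the type id from the error message above, 67434, a u16 temporary keeps only 67434 - 65536 = 1898, so anything that later uses that temporary refers to a different type. `through_u16()` and `through_u32()` are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: a 32-bit BTF type id stored in a 16-bit temporary. */
static uint32_t through_u16(uint32_t type_id)
{
    uint16_t last_member_type_id = (uint16_t)type_id; /* drops the high bits */
    return last_member_type_id;
}

/* Fixed pattern: the temporary is as wide as the type id. */
static uint32_t through_u32(uint32_t type_id)
{
    uint32_t last_member_type_id = type_id;
    return last_member_type_id;
}
```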
Yoshiki Komachi	2096a49efb	bpf/btf: Fix BTF verification of enum members in struct/union commit da6c7faeb103c493e505e87643272f70be586635 upstream. btf_enum_check_member() always assumed that "enum" type members in a struct/union have the size of "int", even when the enum is packed. This patch fixes BTF enum verification to use the actual size of the member in BPF programs. Fixes: 179cde8cef7e ("bpf: btf: Check members of struct/union") Change-Id: Idd2f710477e7abbc6cf541fe3d9fecfe0d4ce594 Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1583825550-18606-2-git-send-email-komachi.yoshiki@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2 months ago
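Why the old assumption fails: with GCC and Clang, a packed enum can be smaller than int, and the next member then starts right after it. This sketch relies on the `__attribute__((packed))` compiler extension (not standard C); the type and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Packed enum: the compiler may use the smallest type that fits. */
enum __attribute__((packed)) small_state { STATE_OFF, STATE_ON };
/* Ordinary enum: typically int-sized. */
enum plain_state { PLAIN_OFF, PLAIN_ON };

struct with_packed_enum {
    enum small_state s; /* typically 1 byte */
    char tag;           /* so this follows at offset 1, not 4 */
};

static size_t packed_enum_size(void) { return sizeof(enum small_state); }
static size_t plain_enum_size(void) { return sizeof(enum plain_state); }
```

A member check that assumed sizeof(int) for `s` would believe it overlaps `tag`, which is the kind of false rejection the fix avoids.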
Alexei Starovoitov	d906b31859	bpf: fix BTF limits [ Upstream commit a0791f0df7d212c245761538b17a9ea93607b667 ] vmlinux BTF has more than 64k types. Its string section is also at the offset larger than 64k. Adjust both limits to make in-kernel BTF verifier successfully parse in-kernel BTF. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Change-Id: I921037306001847bb0afac797a5b33f625bf65d8 Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2 months ago
Martin Lau	4c6e08f9b0	bpf, btf: fix a missing check bug in btf_parse [ Upstream commit 4a6998aff82a20a1aece86a186d8e5263f8b2315 ] Wenwen Wang reported: In btf_parse(), the header of the user-space btf data 'btf_data' is firstly parsed and verified through btf_parse_hdr(). In btf_parse_hdr(), the header is copied from user-space 'btf_data' to kernel-space 'btf->hdr' and then verified. If no error happens during the verification process, the whole data of 'btf_data', including the header, is then copied to 'data' in btf_parse(). It is obvious that the header is copied twice here. More importantly, no check is enforced after the second copy to make sure the headers obtained in these two copies are same. Given that 'btf_data' resides in the user space, a malicious user can race to modify the header between these two copies. By doing so, the user can inject inconsistent data, which can cause undefined behavior of the kernel and introduce potential security risk. This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf: btf: Fix a missing check bug"). To fix it, this patch copies the user 'btf_data' before parsing / verifying the BTF header. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Change-Id: I36ea252fc676d49cbb51e71c1c72ca7ebdd179cd Signed-off-by: Martin KaFai Lau <kafai@fb.com> Co-developed-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2 months ago
Wenwen Wang	f148ef58bb	bpf: btf: Fix a missing check bug [ Upstream commit 8af03d1ae2e154a8be3631e8694b87007e1bdbc2 ] In btf_parse_hdr(), the length of the btf data header is firstly copied from the user space to 'hdr_len' and checked to see whether it is larger than 'btf_data_size'. If yes, an error code EINVAL is returned. Otherwise, the whole header is copied again from the user space to 'btf->hdr'. However, after the second copy, there is no check between 'btf->hdr->hdr_len' and 'hdr_len' to confirm that the two copies get the same value. Given that the btf data is in the user space, a malicious user can race to change the data between the two copies. By doing so, the user can provide malicious data to the kernel and cause undefined behavior. This patch adds a necessary check after the second copy, to make sure 'btf->hdr->hdr_len' has the same value as 'hdr_len'. Otherwise, an error code EINVAL will be returned. Change-Id: I8b01374ab866f1917b3f21486eae42403b520711 Signed-off-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2 months ago
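Both header fixes address a double fetch from user memory. The sketch below follows the approach of the btf_parse() fix: copy the whole untrusted buffer once, then parse and validate only the private copy. `struct demo_hdr` and `copy_then_parse()` are hypothetical and much simpler than the real BTF header, and memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header, loosely shaped like a BTF header prefix. */
struct demo_hdr {
    uint16_t magic;
    uint8_t version;
    uint8_t flags;
    uint32_t hdr_len;
};

/* Copy the untrusted buffer exactly once, then parse and validate only the
 * private copy. Because `untrusted` is never read again, a concurrent
 * writer cannot make the validated header differ from the header that is
 * used later. Returns the private copy (caller frees) or NULL. */
static unsigned char *copy_then_parse(const unsigned char *untrusted, size_t size)
{
    if (size < sizeof(struct demo_hdr))
        return NULL;
    unsigned char *data = malloc(size);
    if (!data)
        return NULL;
    memcpy(data, untrusted, size); /* the single fetch */

    struct demo_hdr hdr;
    memcpy(&hdr, data, sizeof(hdr)); /* parse from the private copy */
    if (hdr.hdr_len < sizeof(hdr) || hdr.hdr_len > size) {
        free(data);
        return NULL;
    }
    return data;
}
```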
Martin KaFai Lau	ef6539cc86	bpf: btf: Fix end boundary calculation for type section The end boundary math for type section is incorrect in btf_check_all_metas(). It just happens that hdr->type_off is always 0 for now because there are only two sections (type and string) and string section must be at the end (ensured in btf_parse_str_sec). However, type_off may not be 0 if a new section would be added later. This patch fixes it. Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header") Reported-by: Dmitry Vyukov <dvyukov@google.com> Change-Id: Ic748f3764714643b4002f1459a63a34a3af9317b Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
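A simplified model of the boundary fix, assuming section offsets are relative to the data following the header: the type section ends at `type_off + type_len`, and using `type_len` alone is only correct while `type_off` happens to be 0. The struct and function names are hypothetical, and the exact expression in btf_check_all_metas() is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a BTF-style section header. */
struct demo_sec_hdr {
    uint32_t type_off; /* start of the type section */
    uint32_t type_len; /* its length in bytes */
};

/* Buggy: measures the end from the base instead of the section start. */
static uint32_t type_end_buggy(const struct demo_sec_hdr *h)
{
    return h->type_len;
}

/* Fixed: the section ends at its start offset plus its length. */
static uint32_t type_end_fixed(const struct demo_sec_hdr *h)
{
    return h->type_off + h->type_len;
}
```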
Daniel Borkmann	b5dd3dda19	bpf: fix bpf_skb_load_bytes_relative pkt length check The len > skb_headlen(skb) cannot be used as a maximum upper bound for the packet length since it does not have any relation to the full linear packet length when filtering is used from upper layers (e.g. in case of reuseport BPF programs) as by then skb->data, skb->len already got mangled through __skb_pull() and others. Fixes: 4e1ec56cdc59 ("bpf: add skb_load_bytes_relative helper") Change-Id: Ic72959d61a393dc411f7654697d39b5fabc56604 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com>	2 months ago
Martin KaFai Lau	204068d6ed	bpf: btf: Ensure the member->offset is in the right order This patch ensures the member->offset of a struct is in the correct order (i.e. a later member's offset cannot go backward). The current "pahole -J" BTF encoder does not generate such output. However, checking this ensures that future encoders will not violate this rule. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Change-Id: I07772d4be8072c45c751389a804431032c535358 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
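The ordering rule can be expressed as a single pass over the member bit offsets. A minimal sketch with a hypothetical `check_member_offsets()`: equal offsets are accepted here and only a decrease is rejected, matching the "cannot go backward" wording.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if member bit offsets never decrease, -1 otherwise. */
static int check_member_offsets(const uint32_t *bit_offsets, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (bit_offsets[i] < bit_offsets[i - 1])
            return -1; /* a later member starts before an earlier one */
    return 0;
}
```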
Martin KaFai Lau	53ca9e2635	bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h This patch shrinks the BTF_INT_BITS() mask. The current btf_int_check_meta() ensures the nr_bits of an integer cannot exceed 64. Hence, it is mostly an uapi cleanup. The actual btf usage (i.e. seq_show()) is also modified to use u8 instead of u16. The verification (e.g. btf_int_check_meta()) path stays as is to deal with invalid BTF situation. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Change-Id: I870f4579152bf26f29925382e44e397e35ebb344 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
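After this cleanup, an integer type's info word packs the encoding, bit offset and bit count into separate fields, and the bit count needs only 8 bits because it cannot exceed 64. The accessors below are local DEMO_* copies written to match our reading of uapi <linux/btf.h>; treat the masks as an assumption, and `demo_int_info()` and `demo_int_bits_valid()` as hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Local accessors mirroring (as we read it) the post-cleanup uapi layout. */
#define DEMO_INT_ENCODING(v) (((v) & 0x0f000000u) >> 24)
#define DEMO_INT_OFFSET(v)   (((v) & 0x00ff0000u) >> 16)
#define DEMO_INT_BITS(v)     ((v) & 0x000000ffu)

/* Build an info word from its parts. */
static uint32_t demo_int_info(uint32_t encoding, uint32_t offset, uint32_t bits)
{
    return ((encoding & 0x0fu) << 24) | ((offset & 0xffu) << 16) | (bits & 0xffu);
}

/* The verifier-side rule from the commit: nr_bits may not exceed 64. */
static int demo_int_bits_valid(uint32_t info)
{
    return DEMO_INT_BITS(info) <= 64;
}
```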
Okash Khawaja	64f7712a10	bpf: btf: Fix bitfield extraction for big endian When extracting a bitfield from a number, btf_int_bits_seq_show() builds a mask and accesses the least significant byte of the number in a way specific to little-endian. This patch fixes that by checking the endianness of the machine and then shifting out the unneeded bits to the left and right. Thanks to Martin Lau for the help in navigating potential pitfalls when dealing with endianness and for the final solution. Fixes: b00b8daec828 ("bpf: btf: Add pretty print capability for data with BTF type info") Change-Id: Ib5586230ad33de1e0af301b4c3790355426450af Signed-off-by: Okash Khawaja <osk@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
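The shift-based approach can be shown on a plain 64-bit value. This is a simplified model, not the kernel function: `extract_bits()` is hypothetical, and the kernel code, as we read it, works on a byte buffer copied from the data and selects the left-shift amount according to bitfield endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Extract nr_bits starting at bit_offset (counted from the least
 * significant bit). The left shift discards the unwanted high bits, the
 * right shift discards the unwanted low bits; because this operates on the
 * integer value rather than on its bytes, byte order does not matter.
 * Requires 1 <= nr_bits and bit_offset + nr_bits <= 64. */
static uint64_t extract_bits(uint64_t v, unsigned bit_offset, unsigned nr_bits)
{
    unsigned left = 64 - (bit_offset + nr_bits);
    unsigned right = 64 - nr_bits;
    return (v << left) >> right;
}
```

For example, bits 4 to 11 of 0xABCD are 0xBC: the value is shifted left by 52 and then right by 56.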
Martin KaFai Lau	c9f070767d	bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD The t->type in BTF_KIND_FWD is not used. It must be 0. This patch ensures that and also adds a test case in test_btf.c Change-Id: I3a12680100b4379cc69989e9d0e48a9142d1e6e6 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2 months ago
Martin KaFai Lau	cc6426010b	bpf: btf: Check array t->size This patch ensures array's t->size is 0. The array size is decided by its individual elem's size and the number of elements. Hence, t->size is not used and it must be 0. A test case is added to test_btf.c Change-Id: I5f1c299322dbd2172b24439ebd62473f245a513c Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2 months ago
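Both checks (t->type == 0 for BTF_KIND_FWD, t->size == 0 for arrays) reject a nonzero value in a field that the kind does not use. A minimal sketch with a hypothetical, trimmed-down type record; in real BTF, `size` and `type` share one union, which is modeled here as a single field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum demo_kind { DEMO_KIND_ARRAY, DEMO_KIND_FWD, DEMO_KIND_OTHER };

struct demo_btf_type {
    enum demo_kind kind;
    uint32_t size_or_type; /* models the size/type union */
};

/* Fields a kind does not use must be zero. */
static int demo_check_unused_zero(const struct demo_btf_type *t)
{
    switch (t->kind) {
    case DEMO_KIND_FWD:   /* t->type is unused for forward declarations */
    case DEMO_KIND_ARRAY: /* t->size follows from elem size and nelems */
        return t->size_or_type == 0 ? 0 : -EINVAL;
    default:
        return 0;
    }
}
```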
Arnd Bergmann	557f07fe07	bpf: btf: avoid -Wreturn-type warning gcc warns about a noreturn function possibly returning in some configurations: kernel/bpf/btf.c: In function 'env_type_is_resolve_sink': kernel/bpf/btf.c:729:1: error: control reaches end of non-void function [-Werror=return-type] Using BUG() instead of BUG_ON() avoids that warning and otherwise does the exact same thing. Fixes: eb3f595dab40 ("bpf: btf: Validate type reference") Change-Id: I389a395a727a75301b4342e60da6dca10f930ace Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Martin KaFai Lau	cc8be5a24d	bpf: btf: Avoid variable length array Sparse warning: kernel/bpf/btf.c:1985:34: warning: Variable length array is used. This patch directly uses ARRAY_SIZE(). Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header") Change-Id: I0457ce16acef2df07f61cff0303f7cf1cba8c8c5 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
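A minimal sketch of the sparse warning and its fix, using hypothetical names: in C, a `const` local initialised from ARRAY_SIZE() is not an integer constant expression, so an array sized by it is a variable length array, while ARRAY_SIZE() applied directly to a non-VLA array is a compile-time constant.

```c
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical section descriptor and offset table. */
struct sec_info_sketch {
	unsigned int off;
	unsigned int len;
};

static const size_t sec_offsets_sketch[] = { 8, 16 };

/* ARRAY_SIZE() of a non-VLA array is an integer constant expression,
 * which is why it is accepted in a static assertion. */
_Static_assert(ARRAY_SIZE(sec_offsets_sketch) == 2, "two sections expected");

static size_t count_sections(void)
{
	/* 'const size_t n = ARRAY_SIZE(sec_offsets_sketch);' followed by
	 * 'struct sec_info_sketch secs[n];' would declare a VLA, because a
	 * const-qualified variable is not a constant expression in C.
	 * Using ARRAY_SIZE() directly in the declarator keeps the bound fixed. */
	struct sec_info_sketch secs[ARRAY_SIZE(sec_offsets_sketch)];

	return ARRAY_SIZE(secs);
}
```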
Martin KaFai Lau	4b8c820067	bpf: btf: Remove unused bits from uapi/linux/btf.h This patch does the following: 1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k. We can raise them later. 2. Remove BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID, which are currently encoded in the highest bit of a u32. They are removed because the current use case does not require supporting a parent type (i.e. a type_id referring to a type in another BTF file), nor does it support referring to a string in the ELF. The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK, which are defined in btf.c instead of uapi/linux/btf.h. 3. Limit BTF_INFO_KIND from 5 bits to 4 bits, which is enough. This leaves headroom in the unused bits if it is ever needed later. 4. Remove the root bit in BTF_INFO, because it is not used in the current use case. 5. Remove BTF_INT_VARARGS, since the func type is not supported now, and limit BTF_INT_ENCODING to 4 bits instead of 8 bits. All of the above can be added back later because the verifier ensures the unused bits are zero. Change-Id: I1046de7b41054f007572fec5ca7fc62c3fd66440 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Martin KaFai Lau	009a19c8b0	bpf: btf: Check array->index_type Instead of ignoring the array->index_type field, enforce that it must be a BTF_KIND_INT of size 1/2/4/8 bytes. Change-Id: Ibfcef45f9df9ed1149eb7c521bece3f333ea0007 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
Martin KaFai Lau	5e21c5cbdc	bpf: btf: Change how section is supported in btf_header There are currently unused section descriptions in the btf_header. Those sections are here to support future BTF use cases. For example, the func section (func_off) is meant to support function signatures (e.g. the BPF prog function signature). Instead of spelling out all potential sections up front in the btf_header, this patch changes btf_header so that it can be extended (e.g. by adding a section) later. The unused sections can be removed for now and added back later. This patch: 1. Adds a hdr_len to the btf_header, which will allow adding sections (and other info like parent_label and parent_name) later. The check is similar to the existing one for bpf_attr: if a user passes in a longer hdr_len, the kernel ensures the extra trailing bytes are 0. 2. Allows the section order in the BTF object to differ from the sec_off order in btf_header. 3. Follows each sec_off with a sec_len. There must be no gaps or overlaps between sections. The string section is ensured to be at the end due to the 4-byte alignment requirement of the type section. The above changes will allow enough flexibility to add new sections (and other info) to the btf_header later. This patch also removes an unnecessary !err check at the end of btf_parse(). Change-Id: I8e7d8673d7c4cc6a5f5a0bccc64492de5f64a30a Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2 months ago
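The hdr_len and section-layout rules in this commit can be sketched in C as follows. This is a minimal illustration with hypothetical names (struct sec_sketch, check_sections(), check_hdr_tail()), not the code in btf.c: sections are sorted by offset so their declaration order does not matter, each non-empty section must start exactly where the previous one ended, and together they must cover all data after the header. The sketch does not enforce that the string section comes last.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical section descriptor, offsets relative to the end of the header. */
struct sec_sketch {
	uint32_t off;
	uint32_t len;
};

static int cmp_sec_off(const void *a, const void *b)
{
	const struct sec_sketch *x = a, *y = b;

	return (x->off > y->off) - (x->off < y->off);
}

/* Returns 0 if the non-empty sections tile [0, total) exactly, -1 on a gap,
 * an overlap or a size mismatch. Sorts 'secs' in place. */
static int check_sections(struct sec_sketch *secs, size_t nr_secs,
			  uint32_t total)
{
	uint32_t expected_off = 0;

	qsort(secs, nr_secs, sizeof(*secs), cmp_sec_off);
	for (size_t i = 0; i < nr_secs; i++) {
		if (secs[i].len == 0)
			continue;         /* empty sections occupy no space */
		if (secs[i].off != expected_off)
			return -1;        /* gap (off too high) or overlap (too low) */
		if (total - expected_off < secs[i].len)
			return -1;        /* section runs past the end of the data */
		expected_off += secs[i].len;
	}
	return expected_off == total ? 0 : -1; /* unclaimed bytes at the end */
}

/* Accept a header longer than the struct this code knows about only if the
 * extra trailing bytes are all zero, so unknown future fields are unset. */
static int check_hdr_tail(const unsigned char *hdr, size_t hdr_len,
			  size_t known_len)
{
	for (size_t i = known_len; i < hdr_len; i++)
		if (hdr[i] != 0)
			return -1;
	return 0;
}
```

Sorting first is what lets the declared sec_off order differ from the physical order while still making gap and overlap detection a single linear pass.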
Martin KaFai Lau	0c4a032a8c	bpf: Fix compiler warning on info.map_ids for 32bit platform This patch uses u64_to_user_ptr() to cast info.map_ids to a userspace ptr. It also tags the user_map_ids with '__user' for sparse check. Fixes: cb4d2b3f03d8 ("bpf: Add name, load_time, uid and map_ids to bpf_prog_info") Change-Id: I18907e003d20295d9e375eb1493c848d795a7a16 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2 months ago
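A userspace sketch of why the cast goes through uintptr_t: converting a 64-bit integer directly to a pointer triggers a "cast to pointer from integer of different size" warning when pointers are 32 bits wide. The helper names are hypothetical; in the kernel, u64_to_user_ptr() additionally type-checks its argument and marks the result __user for sparse.

```c
#include <stdint.h>

/* Integer-to-pointer: narrow explicitly via uintptr_t first, so 32-bit
 * builds see an integer-to-integer conversion rather than a size mismatch. */
static inline void *u64_to_ptr_sketch(uint64_t v)
{
	return (void *)(uintptr_t)v;
}

/* Pointer-to-integer, as userspace does when filling a u64 field such as
 * info.map_ids with the address of its buffer. */
static inline uint64_t ptr_to_u64_sketch(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```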
Tim Zimmermann	d977579d49	fixup! bpf: Update logging functions to work with BTF * Upstream calls bpf_verifier_vlog() directly and calling bpf_verifier_log_write() here can sometimes break format args and cause kernel panics Change-Id: I5f7dde9e83b8ef5a2bd1d2739bc08dd2ce69c41d	2 months ago
Tim Zimmermann	488c6af7b4	syscall: Fake uname to 4.19 also for netbpfload * This is required for U QPR2 Change-Id: I0321c64f77fccf74ff2472c3abd29e8b6b4be1ce	2 months ago
Ruchit	d211284296	defconfig: Bump kernel version Change-Id: I392f399ea7e1626c016d5d6ca2b7e5d4de8c9b90	2 months ago
Ruchit	57838f422b	Rip out samsung debugging These fuckers really hardcoded this shit deep	2 months ago
Satya Durga Srinivasu Prabhala	2c345471fd	sched: Fix compilation issues for !CONFIG_SCHED_WALT The following compilation issues are observed when CONFIG_SCHED_WALT is disabled: 1. kernel/sched/cpufreq_schedutil.c:408:23: error: implicit declaration of function 'boosted_cpu_util' 2. kernel/sched/core_ctl.c:1291:2: error: implicit declaration of function 'for_each_sched_cluster' Fix these compilation issues by adding or updating the proper checks and dependencies as needed. Change-Id: I59d3714a9fca0ff58758ec974f50eb5f3f00ae98 Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>	2 months ago
Alexander Winkowski	5986cfb014	cpufreq: schedutil: Give explicit hints to compiler Make sure the compiler optimises away conditions that are always false since commit `b91319892e` (cpufreq: schedutil: Don't jump to max frequency for RT tasks). Change-Id: I7a108ff1a4ba09f2cb82ea8a82bd15967e724709 Signed-off-by: Alexander Winkowski <dereference23@outlook.com>	2 months ago
Wei Wang	a4df9f22eb	ANDROID: cpufreq: schedutil: maintain raw cache when next_f is not changed Currently, the raw cache will be reset when next_f is changed after get_next_freq for correctness. However, it may introduce more cycles in those cases. This patch changes it to maintain the cached value instead of dropping it. Bug: 159936782 Bug: 158863204 Signed-off-by: Wei Wang <wvw@google.com> [dereference23: Backport to 4.14] Signed-off-by: Alexander Winkowski <dereference23@outlook.com> Change-Id: I519ca02dd2e6038e3966e1f68fee641628827c82	2 months ago
John Dias	3a327db6e2	cpufreq: schedutil: clear cached_raw_freq when invalidated The cpufreq_schedutil governor keeps a cache of the last raw frequency that was mapped to a supported device frequency. If the next request for a frequency matches the cached value, the policy's next_freq value is reused. But there are paths that can update the raw cached value without updating the next_freq value, and there are paths that can set the next_freq value without setting the raw cached value. On those paths, the cached value must be reset. The case that has been observed is when a frequency request reaches sugov_update_commit but is then rejected by the sugov_up_down_rate_limit check. Bug: 116279565 Change-Id: I7c585339a04ff1732054d6e5b36a57e2d41266aa Signed-off-by: John Dias <joaodias@google.com> Signed-off-by: Miguel de Dios <migueldedios@google.com> Signed-off-by: Alexander Winkowski <dereference23@outlook.com>	2 months ago
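The invalidation problem described in this commit can be illustrated with a toy model. Everything here is hypothetical (struct sugov_sketch, the round-up-to-100 resolve step, and the rate_limited flag standing in for the sugov_up_down_rate_limit check); it only shows why a cached raw value must be cleared when the commit path rejects the frequency it was resolved to.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified governor state. */
struct sugov_sketch {
	unsigned int cached_raw_freq; /* last raw request that was resolved */
	unsigned int next_freq;       /* frequency actually committed */
	unsigned int resolve_calls;   /* counts lookups, for illustration */
};

/* Stand-in for mapping a raw request to a supported frequency:
 * round up to a multiple of 100 (arbitrary units). */
static unsigned int resolve_freq(struct sugov_sketch *sg, unsigned int raw)
{
	sg->resolve_calls++;
	return (raw + 99) / 100 * 100;
}

static unsigned int get_next_freq(struct sugov_sketch *sg, unsigned int raw)
{
	/* Cache hit: assume next_freq is what this raw value resolved to. */
	if (raw == sg->cached_raw_freq)
		return sg->next_freq;
	sg->cached_raw_freq = raw;
	return resolve_freq(sg, raw);
}

/* Commit path: a rate-limited request leaves next_freq untouched, so the
 * cached raw value no longer matches next_freq and must be invalidated. */
static void update_commit(struct sugov_sketch *sg, unsigned int freq,
			  bool rate_limited)
{
	if (rate_limited) {
		sg->cached_raw_freq = 0;
		return;
	}
	sg->next_freq = freq;
}
```

Without the reset in update_commit(), repeating a request whose frequency was just rejected would hit the cache and return the stale next_freq instead of resolving it again.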
Alexander Winkowski	8cbdd90438	sched: fair: Modify capacity margins for atoll This is tuned to match energy model characteristics and scheduler efficiency enhancements. Change-Id: Ia60e1ea888457fa1c0c0273cdd4b0180f0a87abf Co-authored-by: Diep Quynh <remilia.1505@gmail.com> Signed-off-by: Alexander Winkowski <dereference23@outlook.com>	2 months ago
Sultan Alsawaf	2229fdeb2c	sched/core: Skip superfluous acquire barrier in ttwu ttwu_remote() unconditionally locks the task's runqueue lock, which implies a full barrier across the lock and unlock, so the acquire barrier after the control dependency is only needed when the task isn't on the runqueue. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Change-Id: Ibb988fe341ba3109d381682a8725fbd2e7a648e3	2 months ago
Sultan Alsawaf	ab03afc235	sched/fair: Compile out NUMA code entirely when NUMA is disabled Scheduler code is very hot and every little optimization counts. Instead of constantly checking sched_numa_balancing when NUMA is disabled, compile it out. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Change-Id: I7334594fbe835f615a199cfe02ee526135abab06	2 months ago
Mel Gorman	d89f338fa1	sched/fair: Do not migrate if the prev_cpu is idle wake_affine_idle() prefers to move a task to the current CPU if the wakeup is due to an interrupt. The expectation is that the interrupt data is cache hot and relevant to the waking task as well as avoiding a search. However, there is no way to determine if there was cache hot data on the previous CPU that may exceed the interrupt data. Furthermore, round-robin delivery of interrupts can migrate tasks around a socket where each CPU is under-utilised. This can interact badly with cpufreq, which makes decisions based on per-cpu data. It has been observed on machines with HWP that p-states are not boosted to their maximum levels even though the workload is latency and throughput sensitive. This patch uses the previous CPU for the task if it's idle and cache-affine with the current CPU, even if the current CPU is idle due to the wakeup being related to the interrupt. This reduces migrations at the cost of the interrupt data not being cache hot when the task wakes. A variety of workloads were tested on various machines and no adverse impact outside of noise was noticed. dbench on ext4 on UMA showed a roughly 10% reduction in the number of CPU migrations, and it is a case where interrupts are frequent for IO completions. In most cases, the difference in performance is quite small but variability is often reduced. For example, this is the result for pgbench running on a UMA machine with different numbers of clients.
                        4.15.0-rc9              4.15.0-rc9
                          baseline               waprev-v1
Hmean      1   22096.28 (   0.00%)     22734.86 (   2.89%)
Hmean      4   74633.42 (   0.00%)     75496.77 (   1.16%)
Hmean      7  115017.50 (   0.00%)    113030.81 (  -1.73%)
Hmean     12  126209.63 (   0.00%)    126613.40 (   0.32%)
Hmean     16  131886.91 (   0.00%)    130844.35 (  -0.79%)
Stddev     1     636.38 (   0.00%)       417.11 (  34.46%)
Stddev     4     614.64 (   0.00%)       583.24 (   5.11%)
Stddev     7     542.46 (   0.00%)       435.45 (  19.73%)
Stddev    12     173.93 (   0.00%)       171.50 (   1.40%)
Stddev    16     671.42 (   0.00%)       680.30 (  -1.32%)
CoeffVar   1       2.88 (   0.00%)         1.83 (  36.26%)
Note that the difference in performance is marginal, but for low utilisation there is less variability. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Change-Id: I28ccbec4a55ff8114aa7e8ce92e5e2c48806361d Signed-off-by: Danny Lin <danny@kdrag0n.dev> Signed-off-by: Alexander Winkowski <dereference23@outlook.com>	2 months ago
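The wakeup decision this commit describes can be sketched as a small pure function. This is a simplification under stated assumptions: plain booleans replace the kernel's idleness and cache-sharing helpers, other cases the real wake_affine_idle() handles (such as sync wakeups) are omitted, and NO_PREFERENCE is a hypothetical sentinel meaning "let the caller decide".

```c
#include <stdbool.h>

#define NO_PREFERENCE (-1) /* hypothetical "no decision here" value */

/* this_idle: the waking CPU is idle, taken to mean the wakeup came from an
 * interrupt. share_cache: this_cpu and prev_cpu share a cache. */
static int wake_affine_idle_sketch(int this_cpu, int prev_cpu,
				   bool this_idle, bool prev_idle,
				   bool share_cache)
{
	if (this_idle && share_cache)
		/* Stay on prev_cpu if it is idle: no migration is needed, and
		 * its cache may be as hot as the interrupt data on this_cpu. */
		return prev_idle ? prev_cpu : this_cpu;
	return NO_PREFERENCE;
}
```

Before the change, the equivalent of the inner branch returned this_cpu unconditionally; preferring an idle prev_cpu is what keeps load concentrated for cpufreq.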

762295 Commits (urubino)
