whiterose

linux unikernel
git clone https://git.ne02ptzero.me/git/whiterose

commit 1a29e857507046e413ca7a4a7c9cd32fed9ea255
parent c4703acd6d4a58dc4b31ad2a8f8b14becb898d25
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sat,  9 Mar 2019 09:56:17 -0800

Merge tag 'docs-5.1' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A fairly routine cycle for docs - lots of typo fixes, some new
  documents, and more translations. There's also some LICENSES
  adjustments from Thomas"

* tag 'docs-5.1' of git://git.lwn.net/linux: (74 commits)
  docs: Bring some order to filesystem documentation
  Documentation/locking/lockdep: Drop last two chars of sample states
  doc: rcu: Suspicious RCU usage is a warning
  docs: driver-api: iio: fix errors in documentation
  Documentation/process/howto: Update for 4.x -> 5.x versioning
  docs: Explicitly state that the 'Fixes:' tag shouldn't split lines
  doc: security: Add kern-doc for lsm_hooks.h
  doc: sctp: Merge and clean up rst files
  Docs: Correct /proc/stat path
  scripts/spdxcheck.py: fix C++ comment style detection
  doc: fix typos in license-rules.rst
  Documentation: fix admin-guide/README.rst minimum gcc version requirement
  doc: process: complete removal of info about -git patches
  doc: translations: sync translations 'remove info about -git patches'
  perf-security: wrap paragraphs on 72 columns
  perf-security: elaborate on perf_events/Perf privileged users
  perf-security: document collected perf_events/Perf data categories
  perf-security: document perf_events/Perf resource control
  sysfs.txt: add note on available attribute macros
  docs: kernel-doc: typo "if ... if" -> "if ... is"
  ...

Diffstat:
M Documentation/DMA-API.txt | 6 +++---
M Documentation/DMA-ISA-LPC.txt | 4 ++--
M Documentation/RCU/lockdep-splat.txt | 12 ++++++------
M Documentation/admin-guide/README.rst | 2 +-
M Documentation/admin-guide/kernel-parameters.txt | 13 ++++++++++---
M Documentation/admin-guide/perf-security.rst | 253 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
M Documentation/admin-guide/tainted-kernels.rst | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
M Documentation/cgroup-v1/memory.txt | 7 ++++---
M Documentation/core-api/kernel-api.rst | 4 ----
M Documentation/core-api/memory-allocation.rst | 10 ++++++----
M Documentation/core-api/mm-api.rst | 2 +-
M Documentation/dev-tools/kcov.rst | 2 +-
M Documentation/doc-guide/kernel-doc.rst | 17 +++++++++++++++--
M Documentation/doc-guide/sphinx.rst | 12 ++++++------
M Documentation/driver-api/dmaengine/client.rst | 7 +++++++
M Documentation/driver-api/iio/buffers.rst | 2 +-
M Documentation/driver-api/iio/core.rst | 6 +++---
M Documentation/driver-api/iio/hw-consumer.rst | 2 +-
M Documentation/driver-api/iio/triggers.rst | 2 +-
M Documentation/fault-injection/fault-injection.txt | 2 +-
A Documentation/filesystems/api-summary.rst | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A Documentation/filesystems/binderfs.rst | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M Documentation/filesystems/index.rst | 389 ++++++-------------------------------------------------------------------------
A Documentation/filesystems/journalling.rst | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M Documentation/filesystems/path-lookup.rst | 39 +++++++++++++++++++++++++++------------
A Documentation/filesystems/splice.rst | 22 ++++++++++++++++++++++
M Documentation/filesystems/sysfs.txt | 21 +++++++++++++++++++++
M Documentation/hwmon/f71882fg | 2 +-
M Documentation/index.rst | 1 +
M Documentation/input/devices/xpad.rst | 2 +-
M Documentation/laptops/lg-laptop.rst | 4 +++-
M Documentation/locking/lockdep-design.txt | 4 ++--
M Documentation/misc-devices/ibmvmc.rst | 1 +
A Documentation/misc-devices/index.rst | 17 +++++++++++++++++
A Documentation/networking/checksum-offloads.rst | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
D Documentation/networking/checksum-offloads.txt | 122 -------------------------------------------------------------------------------
M Documentation/networking/index.rst | 3 +++
A Documentation/networking/scaling.rst | 523 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
D Documentation/networking/scaling.txt | 484 -------------------------------------------------------------------------------
A Documentation/networking/segmentation-offloads.rst | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
D Documentation/networking/segmentation-offloads.txt | 170 -------------------------------------------------------------------------------
M Documentation/process/coding-style.rst | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
M Documentation/process/howto.rst | 59 +++++++++++++++++++++++------------------------------------
M Documentation/process/kernel-docs.rst | 2 +-
M Documentation/process/license-rules.rst | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
M Documentation/process/stable-api-nonsense.rst | 15 +++++++--------
M Documentation/process/stable-kernel-rules.rst | 9 ++++++---
M Documentation/process/submitting-patches.rst | 6 ++++--
D Documentation/security/LSM-sctp.rst | 175 -------------------------------------------------------------------------------
M Documentation/security/LSM.rst | 5 ++++-
A Documentation/security/SCTP.rst | 343 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
D Documentation/security/SELinux-sctp.rst | 158 -------------------------------------------------------------------------------
M Documentation/security/index.rst | 3 +--
M Documentation/static-keys.txt | 2 +-
M Documentation/sysctl/kernel.txt | 50 ++++++++++++++++++++++----------------------------
M Documentation/sysctl/vm.txt | 2 +-
M Documentation/timers/highres.txt | 2 +-
M Documentation/translations/it_IT/doc-guide/sphinx.rst | 2 ++
M Documentation/translations/it_IT/process/applying-patches.rst | 12 +++++++-----
M Documentation/translations/it_IT/process/changes.rst | 487 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M Documentation/translations/it_IT/process/coding-style.rst | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
M Documentation/translations/it_IT/process/howto.rst | 13 +------------
M Documentation/translations/it_IT/process/stable-api-nonsense.rst | 202 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
M Documentation/translations/it_IT/process/submit-checklist.rst | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
M Documentation/translations/it_IT/process/submitting-drivers.rst | 8 ++++++--
M Documentation/translations/it_IT/process/submitting-patches.rst | 862 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M Documentation/translations/ja_JP/howto.rst | 12 +-----------
M Documentation/translations/ko_KR/howto.rst | 56 ++++++++++++++++++--------------------------------------
M Documentation/translations/zh_CN/HOWTO | 9 ---------
M Documentation/translations/zh_CN/coding-style.rst | 57 +++++++++++++++++++++++++++++++++++++--------------------
M Documentation/vm/index.rst | 2 +-
M Documentation/vm/slub.rst | 4 ++--
A LICENSES/exceptions/GCC-exception-2.0 | 18 ++++++++++++++++++
M include/linux/module.h | 18 +++++++++++++++++-
M include/linux/skbuff.h | 2 +-
M samples/Kconfig | 7 +++++++
M samples/Makefile | 2 +-
A samples/binderfs/Makefile | 1 +
A samples/binderfs/binderfs_example.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M scripts/checkpatch.pl | 13 -------------
M scripts/kernel-doc | 2 +-
M scripts/spdxcheck.py | 8 +++++++-
M security/selinux/hooks.c | 2 +-
M tools/Makefile | 14 ++++++++------
A tools/debugging/Makefile | 16 ++++++++++++++++
A tools/debugging/kernel-chktaint | 202 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
86 files changed, 4501 insertions(+), 1894 deletions(-)
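
Reader's note on the tainted-kernels.rst change in this merge: the rewritten text treats the value in /proc/sys/kernel/tainted as a bitfield and tabulates bits 0 to 17. The Python sketch below shows one way such a value might be decoded. It is not the tools/debugging/kernel-chktaint script added by this commit; the names `TAINT_FLAGS` and `decode_taint` are hypothetical, and the letters and reasons are copied from the table in the diff.

```python
# Hypothetical helper, not part of the kernel tree: decode a taint value
# using the bit/letter/reason table from tainted-kernels.rst in this commit.
# List index == bit number.
TAINT_FLAGS = [
    ("P", "proprietary module was loaded"),
    ("F", "module was force loaded"),
    ("S", "SMP kernel oops on an officially SMP incapable processor"),
    ("R", "module was force unloaded"),
    ("M", "processor reported a Machine Check Exception (MCE)"),
    ("B", "bad page referenced or some unexpected page flags"),
    ("U", "taint requested by userspace application"),
    ("D", "kernel died recently, i.e. there was an OOPS or BUG"),
    ("A", "ACPI table overridden by user"),
    ("W", "kernel issued warning"),
    ("C", "staging driver was loaded"),
    ("I", "workaround for bug in platform firmware applied"),
    ("O", 'externally-built ("out-of-tree") module was loaded'),
    ("E", "unsigned module was loaded"),
    ("L", "soft lockup occurred"),
    ("K", "kernel has been live patched"),
    ("X", "auxiliary taint, defined for and used by distros"),
    ("T", "kernel was built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return a (bit, letter, reason) tuple for every bit set in value."""
    return [
        (bit, letter, reason)
        for bit, (letter, reason) in enumerate(TAINT_FLAGS)
        if (value >> bit) & 1
    ]


# 4609 is the raw taint value shown in the sample kernel-chktaint output.
for bit, letter, reason in decode_taint(4609):
    print(f"bit {bit:2d} ({letter}): {reason}")
```

On a running Linux system the integer could instead be read from /proc/sys/kernel/tainted and passed to `decode_taint()`.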

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt @@ -530,8 +530,8 @@ that simply cannot make consistent memory. dma_free_attrs(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs) -Free memory allocated by the dma_alloc_attrs(). All parameters common -parameters must identical to those otherwise passed to dma_fre_coherent, +Free memory allocated by the dma_alloc_attrs(). All common +parameters must be identical to those otherwise passed to dma_free_coherent, and the attrs argument must be identical to the attrs passed to dma_alloc_attrs(). @@ -717,7 +717,7 @@ dma-api/num_free_entries The current number of free dma_debug_entries dma-api/nr_total_entries The total number of dma_debug_entries in the allocator, both free and used. -dma-api/driver-filter You can write a name of a driver into this file +dma-api/driver_filter You can write a name of a driver into this file to limit the debug output to requests from that particular driver. Write an empty string to that file to disable the filter and see diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt @@ -52,8 +52,8 @@ Address translation ------------------- To translate the virtual address to a bus address, use the normal DMA -API. Do _not_ use isa_virt_to_phys() even though it does the same -thing. The reason for this is that the function isa_virt_to_phys() +API. Do _not_ use isa_virt_to_bus() even though it does the same +thing. The reason for this is that the function isa_virt_to_bus() will require a Kconfig dependency to ISA, not just ISA_DMA_API which is really all you need. Remember that even though the DMA controller has its origins in ISA it is used elsewhere. diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.txt @@ -14,9 +14,9 @@ being the real world and all that. 
So let's look at an example RCU lockdep splat from 3.0-rc5, one that has long since been fixed: -=============================== -[ INFO: suspicious RCU usage. ] -------------------------------- +============================= +WARNING: suspicious RCU usage +----------------------------- block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage! other info that might help us debug this: @@ -24,11 +24,11 @@ other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 3 locks held by scsi_scan_6/1552: - #0: (&shost->scan_mutex){+.+.+.}, at: [<ffffffff8145efca>] + #0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>] scsi_scan_host_selected+0x5a/0x150 - #1: (&eq->sysfs_lock){+.+...}, at: [<ffffffff812a5032>] + #1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>] elevator_exit+0x22/0x60 - #2: (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffff812b6233>] + #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>] cfq_exit_queue+0x43/0x190 stack backtrace: diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst @@ -251,7 +251,7 @@ Configuring the kernel Compiling the kernel -------------------- - - Make sure you have at least gcc 3.2 available. + - Make sure you have at least gcc 4.6 available. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. Please note that you can still run a.out user programs with this kernel. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt @@ -1197,9 +1197,10 @@ arch/x86/kernel/cpu/cpufreq/elanfreq.c. elevator= [IOSCHED] - Format: {"cfq" | "deadline" | "noop"} - See Documentation/block/cfq-iosched.txt and - Documentation/block/deadline-iosched.txt for details. + Format: { "mq-deadline" | "kyber" | "bfq" } + See Documentation/block/deadline-iosched.txt, + Documentation/block/kyber-iosched.txt and + Documentation/block/bfq-iosched.txt for details. 
elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390] Specifies physical address of start of kernel core @@ -1996,6 +1997,12 @@ Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, the default is off. + kpti= [ARM64] Control page table isolation of user + and kernel address spaces. + Default: enabled on cores which need mitigation. + 0: force disabled + 1: force enabled + kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst @@ -6,83 +6,211 @@ Perf Events and tool security Overview -------- -Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ can -impose a considerable risk of leaking sensitive data accessed by monitored -processes. The data leakage is possible both in scenarios of direct usage of -perf_events system call API [2]_ and over data files generated by Perf tool user -mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that -perf_events performance monitoring units (PMU) [2]_ collect and expose for -performance analysis. Having that said perf_events/Perf performance monitoring -is the subject for security access control management [5]_ . +Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ +can impose a considerable risk of leaking sensitive data accessed by +monitored processes. The data leakage is possible both in scenarios of +direct usage of perf_events system call API [2]_ and over data files +generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk +depends on the nature of data that perf_events performance monitoring +units (PMU) [2]_ and Perf collect and expose for performance analysis. +Collected system and performance data may be split into several +categories: + +1. 
System hardware and software configuration data, for example: a CPU + model and its cache configuration, an amount of available memory and + its topology, used kernel and Perf versions, performance monitoring + setup including experiment time, events configuration, Perf command + line parameters, etc. + +2. User and kernel module paths and their load addresses with sizes, + process and thread names with their PIDs and TIDs, timestamps for + captured hardware and software events. + +3. Content of kernel software counters (e.g., for context switches, page + faults, CPU migrations), architectural hardware performance counters + (PMC) [8]_ and machine specific registers (MSR) [9]_ that provide + execution metrics for various monitored parts of the system (e.g., + memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe) + uncore counters) without direct attribution to any execution context + state. + +4. Content of architectural execution context registers (e.g., RIP, RSP, + RBP on x86_64), process user and kernel space memory addresses and + data, content of various architectural MSRs that capture data from + this category. + +Data that belong to the fourth category can potentially contain +sensitive process data. If PMUs in some monitoring modes capture values +of execution context registers or data from process memory then access +to such monitoring capabilities requires to be ordered and secured +properly. So, perf_events/Perf performance monitoring is the subject for +security access control management [5]_ . perf_events/Perf access control ------------------------------- -To perform security checks, the Linux implementation splits processes into two -categories [6]_ : a) privileged processes (whose effective user ID is 0, referred -to as superuser or root), and b) unprivileged processes (whose effective UID is -nonzero). 
Privileged processes bypass all kernel security permission checks so -perf_events performance monitoring is fully available to privileged processes -without access, scope and resource restrictions. - -Unprivileged processes are subject to a full security permission check based on -the process's credentials [5]_ (usually: effective UID, effective GID, and -supplementary group list). - -Linux divides the privileges traditionally associated with superuser into -distinct units, known as capabilities [6]_ , which can be independently enabled -and disabled on per-thread basis for processes and files of unprivileged users. - -Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated as -privileged processes with respect to perf_events performance monitoring and -bypass *scope* permissions checks in the kernel. - -Unprivileged processes using perf_events system call API is also subject for -PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome -determines whether monitoring is permitted. So unprivileged processes provided -with CAP_SYS_PTRACE capability are effectively permitted to pass the check. - -Other capabilities being granted to unprivileged processes can effectively -enable capturing of additional data required for later performance analysis of -monitored processes or a system. For example, CAP_SYSLOG capability permits -reading kernel space memory addresses from /proc/kallsyms file. +To perform security checks, the Linux implementation splits processes +into two categories [6]_ : a) privileged processes (whose effective user +ID is 0, referred to as superuser or root), and b) unprivileged +processes (whose effective UID is nonzero). Privileged processes bypass +all kernel security permission checks so perf_events performance +monitoring is fully available to privileged processes without access, +scope and resource restrictions. 
+ +Unprivileged processes are subject to a full security permission check +based on the process's credentials [5]_ (usually: effective UID, +effective GID, and supplementary group list). + +Linux divides the privileges traditionally associated with superuser +into distinct units, known as capabilities [6]_ , which can be +independently enabled and disabled on per-thread basis for processes and +files of unprivileged users. + +Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated +as privileged processes with respect to perf_events performance +monitoring and bypass *scope* permissions checks in the kernel. + +Unprivileged processes using perf_events system call API is also subject +for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose +outcome determines whether monitoring is permitted. So unprivileged +processes provided with CAP_SYS_PTRACE capability are effectively +permitted to pass the check. + +Other capabilities being granted to unprivileged processes can +effectively enable capturing of additional data required for later +performance analysis of monitored processes or a system. For example, +CAP_SYSLOG capability permits reading kernel space memory addresses from +/proc/kallsyms file. + +perf_events/Perf privileged users +--------------------------------- + +Mechanisms of capabilities, privileged capability-dumb files [6]_ and +file system ACLs [10]_ can be used to create a dedicated group of +perf_events/Perf privileged users who are permitted to execute +performance monitoring without scope limits. The following steps can be +taken to create such a group of privileged Perf users. + +1. 
Create perf_users group of privileged Perf users, assign perf_users + group to Perf tool executable and limit access to the executable for + other users in the system who are not in the perf_users group: + +:: + + # groupadd perf_users + # ls -alhF + -rwxr-xr-x 2 root root 11M Oct 19 15:12 perf + # chgrp perf_users perf + # ls -alhF + -rwxr-xr-x 2 root perf_users 11M Oct 19 15:12 perf + # chmod o-rwx perf + # ls -alhF + -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf + +2. Assign the required capabilities to the Perf tool executable file and + enable members of perf_users group with performance monitoring + privileges [6]_ : + +:: + + # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + perf: OK + # getcap perf + perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep + +As a result, members of perf_users group are capable of conducting +performance monitoring by using functionality of the configured Perf +tool executable that, when executes, passes perf_events subsystem scope +checks. + +This specific access control management is only available to superuser +or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ +capabilities. perf_events/Perf unprivileged users ----------------------------------- -perf_events/Perf *scope* and *access* control for unprivileged processes is -governed by perf_event_paranoid [2]_ setting: +perf_events/Perf *scope* and *access* control for unprivileged processes +is governed by perf_event_paranoid [2]_ setting: -1: - Impose no *scope* and *access* restrictions on using perf_events performance - monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is - ignored when allocating memory buffers for storing performance data. - This is the least secure mode since allowed monitored *scope* is - maximized and no perf_events specific limits are imposed on *resources* - allocated for performance monitoring. 
+ Impose no *scope* and *access* restrictions on using perf_events + performance monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ + locking limit is ignored when allocating memory buffers for storing + performance data. This is the least secure mode since allowed + monitored *scope* is maximized and no perf_events specific limits + are imposed on *resources* allocated for performance monitoring. >=0: *scope* includes per-process and system wide performance monitoring - but excludes raw tracepoints and ftrace function tracepoints monitoring. - CPU and system events happened when executing either in user or - in kernel space can be monitored and captured for later analysis. - Per-user per-cpu perf_event_mlock_kb locking limit is imposed but - ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability. + but excludes raw tracepoints and ftrace function tracepoints + monitoring. CPU and system events happened when executing either in + user or in kernel space can be monitored and captured for later + analysis. Per-user per-cpu perf_event_mlock_kb locking limit is + imposed but ignored for unprivileged processes with CAP_IPC_LOCK + [6]_ capability. >=1: - *scope* includes per-process performance monitoring only and excludes - system wide performance monitoring. CPU and system events happened when - executing either in user or in kernel space can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only and + excludes system wide performance monitoring. CPU and system events + happened when executing either in user or in kernel space can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. 
>=2: - *scope* includes per-process performance monitoring only. CPU and system - events happened when executing in user space only can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only. CPU and + system events happened when executing in user space only can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. + +perf_events/Perf resource control +--------------------------------- + +Open file descriptors ++++++++++++++++++++++ + +The perf_events system call API [2]_ allocates file descriptors for +every configured PMU event. Open file descriptors are a per-process +accountable resource governed by the RLIMIT_NOFILE [11]_ limit +(ulimit -n), which is usually derived from the login shell process. When +configuring Perf collection for a long list of events on a large server +system, this limit can be easily hit preventing required monitoring +configuration. RLIMIT_NOFILE limit can be increased on per-user basis +modifying content of the limits.conf file [12]_ . Ordinarily, a Perf +sampling session (perf record) requires an amount of open perf_event +file descriptors that is not less than the number of monitored events +multiplied by the number of monitored CPUs. + +Memory allocation ++++++++++++++++++ + +The amount of memory available to user processes for capturing +performance monitoring data is governed by the perf_event_mlock_kb [2]_ +setting. This perf_event specific resource setting defines overall +per-cpu limits of memory allowed for mapping by the user processes to +execute performance monitoring. 
The setting essentially extends the +RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped +specifically for capturing monitored performance events and related data. + +For example, if a machine has eight cores and perf_event_mlock_kb limit +is set to 516 KiB, then a user process is provided with 516 KiB * 8 = +4128 KiB of memory above the RLIMIT_MEMLOCK limit (ulimit -l) for +perf_event mmap buffers. In particular, this means that, if the user +wants to start two or more performance monitoring processes, the user is +required to manually distribute the available 4128 KiB between the +monitoring processes, for example, using the --mmap-pages Perf record +mode option. Otherwise, the first started performance monitoring process +allocates all available 4128 KiB and the other processes will fail to +proceed due to the lack of memory. + +RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored +for processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf +privileged users can be provided with memory above the constraints for +perf_events/Perf performance monitoring purpose by providing the Perf +executable with CAP_IPC_LOCK capability. Bibliography ------------ @@ -94,4 +222,9 @@ Bibliography .. [5] `<https://www.kernel.org/doc/html/latest/security/credentials.html>`_ .. [6] `<http://man7.org/linux/man-pages/man7/capabilities.7.html>`_ .. [7] `<http://man7.org/linux/man-pages/man2/ptrace.2.html>`_ +.. [8] `<https://en.wikipedia.org/wiki/Hardware_performance_counter>`_ +.. [9] `<https://en.wikipedia.org/wiki/Model-specific_register>`_ +.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_ +.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_ +.. 
[12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_ diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst @@ -1,59 +1,164 @@ Tainted kernels --------------- -Some oops reports contain the string **'Tainted: '** after the program -counter. This indicates that the kernel has been tainted by some -mechanism. The string is followed by a series of position-sensitive -characters, each representing a particular tainted value. - - 1) ``G`` if all modules loaded have a GPL or compatible license, ``P`` if +The kernel will mark itself as 'tainted' when something occurs that might be +relevant later when investigating problems. Don't worry too much about this, +most of the time it's not a problem to run a tainted kernel; the information is +mainly of interest once someone wants to investigate some problem, as its real +cause might be the event that got the kernel tainted. That's why bug reports +from tainted kernels will often be ignored by developers, hence try to reproduce +problems with an untainted kernel. + +Note the kernel will remain tainted even after you undo what caused the taint +(i.e. unload a proprietary kernel module), to indicate the kernel remains not +trustworthy. That's also why the kernel will print the tainted state when it +notices an internal problem (a 'kernel bug'), a recoverable error +('kernel oops') or a non-recoverable error ('kernel panic') and writes debug +information about this to the logs ``dmesg`` outputs. It's also possible to +check the tainted state at runtime through a file in ``/proc/``. 
+ + +Tainted flag in bugs, oops or panics messages +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You find the tainted state near the top in a line starting with 'CPU:'; if or +why the kernel was tainted is shown after the Process ID ('PID:') and a shortened +name of the command ('Comm:') that triggered the event:: + + BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 + Oops: 0002 [#1] SMP PTI + CPU: 0 PID: 4424 Comm: insmod Tainted: P W O 4.20.0-0.rc6.fc30 #1 + Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 + RIP: 0010:my_oops_init+0x13/0x1000 [kpanic] + [...] + +You'll find a 'Not tainted: ' there if the kernel was not tainted at the +time of the event; if it was, then it will print 'Tainted: ' and characters +either letters or blanks. In above example it looks like this:: + + Tainted: P W O + +The meaning of those characters is explained in the table below. In tis case +the kernel got tainted earlier because a proprietary Module (``P``) was loaded, +a warning occurred (``W``), and an externally-built module was loaded (``O``). +To decode other letters use the table below. + + +Decoding tainted state at runtime +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +At runtime, you can query the tainted state by reading +``cat /proc/sys/kernel/tainted``. If that returns ``0``, the kernel is not +tainted; any other number indicates the reasons why it is. 
The easiest way to +decode that number is the script ``tools/debugging/kernel-chktaint``, which your +distribution might ship as part of a package called ``linux-tools`` or +``kernel-tools``; if it doesn't you can download the script from +`git.kernel.org <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/debugging/kernel-chktaint>`_ +and execute it with ``sh kernel-chktaint``, which would print something like +this on the machine that had the statements in the logs that were quoted earlier:: + + Kernel is Tainted for following reasons: + * Proprietary module was loaded (#0) + * Kernel issued warning (#9) + * Externally-built ('out-of-tree') module was loaded (#12) + See Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel or + https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html for + a more details explanation of the various taint flags. + Raw taint value as int/string: 4609/'P W O ' + +You can try to decode the number yourself. That's easy if there was only one +reason that got your kernel tainted, as in this case you can find the number +with the table below. If there were multiple reasons you need to decode the +number, as it is a bitfield, where each bit indicates the absence or presence of +a particular type of taint. 
It's best to leave that to the aforementioned +script, but if you need something quick you can use this shell command to check +which bits are set:: + + $ for i in $(seq 18); do echo $(($i-1)) $(($(cat /proc/sys/kernel/tainted)>>($i-1)&1));done + +Table for decoding tainted state +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +=== === ====== ======================================================== +Bit Log Number Reason that got the kernel tainted +=== === ====== ======================================================== + 0 G/P 1 proprietary module was loaded + 1 _/F 2 module was force loaded + 2 _/S 4 SMP kernel oops on an officially SMP incapable processor + 3 _/R 8 module was force unloaded + 4 _/M 16 processor reported a Machine Check Exception (MCE) + 5 _/B 32 bad page referenced or some unexpected page flags + 6 _/U 64 taint requested by userspace application + 7 _/D 128 kernel died recently, i.e. there was an OOPS or BUG + 8 _/A 256 ACPI table overridden by user + 9 _/W 512 kernel issued warning + 10 _/C 1024 staging driver was loaded + 11 _/I 2048 workaround for bug in platform firmware applied + 12 _/O 4096 externally-built ("out-of-tree") module was loaded + 13 _/E 8192 unsigned module was loaded + 14 _/L 16384 soft lockup occurred + 15 _/K 32768 kernel has been live patched + 16 _/X 65536 auxiliary taint, defined for and used by distros + 17 _/T 131072 kernel was built with the struct randomization plugin +=== === ====== ======================================================== + +Note: The character ``_`` is representing a blank in this table to make reading +easier. + +More detailed explanation for tainting +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + 0) ``G`` if all modules loaded have a GPL or compatible license, ``P`` if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary. 
- 2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all + 1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all modules were loaded normally. - 3) ``S`` if the oops occurred on an SMP kernel running on hardware that + 2) ``S`` if the oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable. - 4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all + 3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all modules were unloaded normally. - 5) ``M`` if any processor has reported a Machine Check Exception, + 4) ``M`` if any processor has reported a Machine Check Exception, ``' '`` if no Machine Check Exceptions have occurred. - 6) ``B`` if a page-release function has found a bad page reference or - some unexpected page flags. + 5) ``B`` if a page-release function has found a bad page reference or some + unexpected page flags. This indicates a hardware problem or a kernel bug; + there should be other information in the log indicating why this tainting + occurred. - 7) ``U`` if a user or user application specifically requested that the + 6) ``U`` if a user or user application specifically requested that the Tainted flag be set, ``' '`` otherwise. - 8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG. + 7) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG. - 9) ``A`` if the ACPI table has been overridden. + 8) ``A`` if an ACPI table has been overridden. - 10) ``W`` if a warning has previously been issued by the kernel. + 9) ``W`` if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.) - 11) ``C`` if a staging driver has been loaded. + 10) ``C`` if a staging driver has been loaded.
- 12) ``I`` if the kernel is working around a severe bug in the platform + 11) ``I`` if the kernel is working around a severe bug in the platform firmware (BIOS or similar). - 13) ``O`` if an externally-built ("out-of-tree") module has been loaded. + 12) ``O`` if an externally-built ("out-of-tree") module has been loaded. - 14) ``E`` if an unsigned module has been loaded in a kernel supporting + 13) ``E`` if an unsigned module has been loaded in a kernel supporting module signature. - 15) ``L`` if a soft lockup has previously occurred on the system. + 14) ``L`` if a soft lockup has previously occurred on the system. + + 15) ``K`` if the kernel has been live patched. - 16) ``K`` if the kernel has been live patched. + 16) ``X`` Auxiliary taint, defined for and used by Linux distributors. -The primary reason for the **'Tainted: '** string is to tell kernel -debuggers if this is a clean kernel or if anything unusual has -occurred. Tainting is permanent: even if an offending module is -unloaded, the tainted value remains to indicate that the kernel is not -trustworthy. + 17) ``T`` Kernel was built with the randstruct plugin, which can intentionally + produce extremely unusual kernel structure layouts (even performance + pathological ones), which is important to know when debugging. Set at + build time. diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt @@ -70,7 +70,7 @@ Brief summary of control files. memory.soft_limit_in_bytes # set/show soft limit of memory usage memory.stat # show various statistics memory.use_hierarchy # set/show hierarchical account enabled - memory.force_empty # trigger forced move charge to parent + memory.force_empty # trigger forced page reclaim memory.pressure_level # set memory pressure notifications memory.swappiness # set/show swappiness parameter of vmscan (See sysctl's vm.swappiness) @@ -459,8 +459,9 @@ About use_hierarchy, see Section 6.
the cgroup will be reclaimed and as many pages reclaimed as possible. The typical use case for this interface is before calling rmdir(). - Because rmdir() moves all pages to parent, some out-of-use page caches can be - moved to the parent. If you want to avoid that, force_empty will be useful. + Although rmdir() offlines the memcg, the memcg may still stay there due to + charged file caches. Some out-of-use page caches may stay charged until + memory pressure happens. If you want to avoid that, force_empty will be useful. Also, note that when memory.kmem.limit_in_bytes is set the charges due to kernel pages will still be seen. This is not considered a failure and the diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst @@ -356,10 +356,6 @@ Read-Copy Update (RCU) .. kernel-doc:: include/linux/rcupdate.h -.. kernel-doc:: include/linux/rcupdate_wait.h - -.. kernel-doc:: include/linux/rcutree.h - .. kernel-doc:: kernel/rcu/tree.c .. kernel-doc:: kernel/rcu/tree_plugin.h diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst @@ -1,4 +1,4 @@ -.. _memory-allocation: +.. _memory_allocation: ======================= Memory Allocation Guide @@ -113,9 +113,11 @@ see :c:func:`kvmalloc_node` reference documentation. Note that If you need to allocate many identical objects you can use the slab cache allocator. The cache should be set up with -:c:func:`kmem_cache_create` before it can be used. Afterwards -:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate -memory from that cache. +:c:func:`kmem_cache_create` or :c:func:`kmem_cache_create_usercopy` +before it can be used. The second function should be used if a part of +the cache might be copied to userspace. After the cache is +created :c:func:`kmem_cache_alloc` and its convenience wrappers can +allocate memory from that cache. When the allocated memory is no longer needed it must be freed.
You can use :c:func:`kvfree` for the memory allocated with `kmalloc`, diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst @@ -35,7 +35,7 @@ users will want to use a plain ``GFP_KERNEL``. :doc: Reclaim modifiers .. kernel-doc:: include/linux/gfp.h - :doc: Common combinations + :doc: Useful GFP flag combinations The Slab Cache ============== diff --git a/Documentation/dev-tools/kcov.rst b/Documentation/dev-tools/kcov.rst @@ -22,7 +22,7 @@ Configure the kernel with:: CONFIG_KCOV=y -CONFIG_KCOV requires gcc built on revision 231296 or later. +CONFIG_KCOV requires gcc 6.1.0 or later. If the comparison operands need to be collected, set:: diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst @@ -490,7 +490,7 @@ doc: *title* functions: *[ function ...]* Include documentation for each *function* in *source*. - If no *function* if specified, the documentaion for all functions + If no *function* is specified, the documentation for all functions and types in the *source* will be included. Examples:: @@ -517,4 +517,17 @@ How to use kernel-doc to generate man pages If you just want to use kernel-doc to generate man pages you can do this from the kernel git tree:: - $ scripts/kernel-doc -man $(git grep -l '/\*\*' -- :^Documentation :^tools) | scripts/split-man.pl /tmp/man + $ scripts/kernel-doc -man \ + $(git grep -l '/\*\*' -- :^Documentation :^tools) \ + | scripts/split-man.pl /tmp/man + +Some older versions of git do not support some of the variants of syntax for +path exclusion. One of the following commands may work for those versions:: + + $ scripts/kernel-doc -man \ + $(git grep -l '/\*\*' -- . ':!Documentation' ':!tools') \ + | scripts/split-man.pl /tmp/man + + $ scripts/kernel-doc -man \ + $(git grep -l '/\*\*' -- . 
":(exclude)Documentation" ":(exclude)tools") \ + | scripts/split-man.pl /tmp/man diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst @@ -27,8 +27,8 @@ Sphinx Install ============== The ReST markups currently used by the Documentation/ files are meant to be -built with ``Sphinx`` version 1.3 or upper. If you're desiring to build -PDF outputs, it is recommended to use version 1.4.6 or upper. +built with ``Sphinx`` version 1.3 or higher. If you desire to build +PDF output, it is recommended to use version 1.4.6 or higher. There's a script that checks for the Sphinx requirements. Please see :ref:`sphinx-pre-install` for further details. @@ -37,15 +37,15 @@ Most distributions are shipped with Sphinx, but its toolchain is fragile, and it is not uncommon that upgrading it or some other Python packages on your machine would cause the documentation build to break. -A way to get rid of that is to use a different version than the one shipped -on your distributions. In order to do that, it is recommended to install +A way to avoid that is to use a different version than the one shipped +with your distributions. In order to do so, it is recommended to install Sphinx inside a virtual environment, using ``virtualenv-3`` or ``virtualenv``, depending on how your distribution packaged Python 3. .. note:: #) Sphinx versions below 1.5 don't work properly with Python's - docutils version 0.13.1 or upper. So, if you're willing to use + docutils version 0.13.1 or higher. So, if you're willing to use those versions, you should run ``pip install 'docutils==0.12'``. #) It is recommended to use the RTD theme for html output. Depending @@ -82,7 +82,7 @@ output. PDF and LaTeX builds -------------------- -Such builds are currently supported only with Sphinx versions 1.4 and upper. +Such builds are currently supported only with Sphinx versions 1.4 and higher. For PDF and LaTeX output, you'll also need ``XeLaTeX`` version 3.14159265. 
diff --git a/Documentation/driver-api/dmaengine/client.rst b/Documentation/driver-api/dmaengine/client.rst @@ -168,6 +168,13 @@ The details of these operations are: dmaengine_submit() will not start the DMA operation, it merely adds it to the pending queue. For this, see step 5, dma_async_issue_pending. + .. note:: + + After calling ``dmaengine_submit()`` the submitted transfer descriptor + (``struct dma_async_tx_descriptor``) belongs to the DMA engine. + Consequently, the client must consider the pointer to that + descriptor invalid. + 5. Issue pending DMA requests and wait for callback notification The transactions in the pending queue can be activated by calling the diff --git a/Documentation/driver-api/iio/buffers.rst b/Documentation/driver-api/iio/buffers.rst @@ -26,7 +26,7 @@ IIO buffer setup ================ The meta information associated with a channel reading placed in a buffer is -called a scan element . The important bits configuring scan elements are +called a scan element. The important bits configuring scan elements are exposed to userspace applications via the :file:`/sys/bus/iio/iio:device{X}/scan_elements/*` directory. This file contains attributes of the following form: diff --git a/Documentation/driver-api/iio/core.rst b/Documentation/driver-api/iio/core.rst @@ -2,8 +2,8 @@ Core elements ============= -The Industrial I/O core offers a unified framework for writing drivers for -many different types of embedded sensors. a standard interface to user space +The Industrial I/O core offers both a unified framework for writing drivers for +many different types of embedded sensors and a standard interface to user space applications manipulating sensors.
The implementation can be found under :file:`drivers/iio/industrialio-*` @@ -11,7 +11,7 @@ Industrial I/O Devices ---------------------- * struct :c:type:`iio_dev` - industrial I/O device -* :c:func:`iio_device_alloc()` - alocate an :c:type:`iio_dev` from a driver +* :c:func:`iio_device_alloc()` - allocate an :c:type:`iio_dev` from a driver * :c:func:`iio_device_free()` - free an :c:type:`iio_dev` from a driver * :c:func:`iio_device_register()` - register a device with the IIO subsystem * :c:func:`iio_device_unregister()` - unregister a device from the IIO diff --git a/Documentation/driver-api/iio/hw-consumer.rst b/Documentation/driver-api/iio/hw-consumer.rst @@ -1,7 +1,7 @@ =========== HW consumer =========== -An IIO device can be directly connected to another device in hardware. in this +An IIO device can be directly connected to another device in hardware. In this case the buffers between IIO provider and IIO consumer are handled by hardware. The Industrial I/O HW consumer offers a way to bond these IIO devices without software buffer for data. The implementation can be found under diff --git a/Documentation/driver-api/iio/triggers.rst b/Documentation/driver-api/iio/triggers.rst @@ -38,7 +38,7 @@ There are two locations in sysfs related to triggers: * :file:`/sys/bus/iio/devices/iio:device{X}/trigger/*`, this directory is created once the device supports a triggered buffer. We can associate a - trigger with our device by writing the trigger's name in the + trigger with our device by writing the trigger's name in the :file:`current_trigger` file. IIO trigger setup diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt @@ -195,7 +195,7 @@ o #include <linux/fault-inject.h> o define the fault attributes - DECLARE_FAULT_INJECTION(name); + DECLARE_FAULT_ATTR(name); Please see the definition of struct fault_attr in fault-inject.h for details. 
diff --git a/Documentation/filesystems/api-summary.rst b/Documentation/filesystems/api-summary.rst @@ -0,0 +1,150 @@ +============================= +Linux Filesystems API summary +============================= + +This section contains API-level documentation, mostly taken from the source +code itself. + +The Linux VFS +============= + +The Filesystem types +-------------------- + +.. kernel-doc:: include/linux/fs.h + :internal: + +The Directory Cache +------------------- + +.. kernel-doc:: fs/dcache.c + :export: + +.. kernel-doc:: include/linux/dcache.h + :internal: + +Inode Handling +-------------- + +.. kernel-doc:: fs/inode.c + :export: + +.. kernel-doc:: fs/bad_inode.c + :export: + +Registration and Superblocks +---------------------------- + +.. kernel-doc:: fs/super.c + :export: + +File Locks +---------- + +.. kernel-doc:: fs/locks.c + :export: + +.. kernel-doc:: fs/locks.c + :internal: + +Other Functions +--------------- + +.. kernel-doc:: fs/mpage.c + :export: + +.. kernel-doc:: fs/namei.c + :export: + +.. kernel-doc:: fs/buffer.c + :export: + +.. kernel-doc:: block/bio.c + :export: + +.. kernel-doc:: fs/seq_file.c + :export: + +.. kernel-doc:: fs/filesystems.c + :export: + +.. kernel-doc:: fs/fs-writeback.c + :export: + +.. kernel-doc:: fs/block_dev.c + :export: + +.. kernel-doc:: fs/anon_inodes.c + :export: + +.. kernel-doc:: fs/attr.c + :export: + +.. kernel-doc:: fs/d_path.c + :export: + +.. kernel-doc:: fs/dax.c + :export: + +.. kernel-doc:: fs/direct-io.c + :export: + +.. kernel-doc:: fs/file_table.c + :export: + +.. kernel-doc:: fs/libfs.c + :export: + +.. kernel-doc:: fs/posix_acl.c + :export: + +.. kernel-doc:: fs/stat.c + :export: + +.. kernel-doc:: fs/sync.c + :export: + +.. kernel-doc:: fs/xattr.c + :export: + +The proc filesystem +=================== + +sysctl interface +---------------- + +.. kernel-doc:: kernel/sysctl.c + :export: + +proc filesystem interface +------------------------- + +.. 
kernel-doc:: fs/proc/base.c + :internal: + +Events based on file descriptors +================================ + +.. kernel-doc:: fs/eventfd.c + :export: + +The Filesystem for Exporting Kernel Objects +=========================================== + +.. kernel-doc:: fs/sysfs/file.c + :export: + +.. kernel-doc:: fs/sysfs/symlink.c + :export: + +The debugfs filesystem +====================== + +debugfs interface +----------------- + +.. kernel-doc:: fs/debugfs/inode.c + :export: + +.. kernel-doc:: fs/debugfs/file.c + :export: diff --git a/Documentation/filesystems/binderfs.rst b/Documentation/filesystems/binderfs.rst @@ -0,0 +1,68 @@ +.. SPDX-License-Identifier: GPL-2.0 + +The Android binderfs Filesystem +=============================== + +Android binderfs is a filesystem for the Android binder IPC mechanism. It +allows to dynamically add and remove binder devices at runtime. Binder devices +located in a new binderfs instance are independent of binder devices located in +other binderfs instances. Mounting a new binderfs instance makes it possible +to get a set of private binder devices. + +Mounting binderfs +----------------- + +Android binderfs can be mounted with:: + + mkdir /dev/binderfs + mount -t binder binder /dev/binderfs + +at which point a new instance of binderfs will show up at ``/dev/binderfs``. +In a fresh instance of binderfs no binder devices will be present. There will +only be a ``binder-control`` device which serves as the request handler for +binderfs. Mounting another binderfs instance at a different location will +create a new and separate instance from all other binderfs mounts. This is +identical to the behavior of e.g. ``devpts`` and ``tmpfs``. The Android +binderfs filesystem can be mounted in user namespaces. + +Options +------- +max + binderfs instances can be mounted with a limit on the number of binder + devices that can be allocated. The ``max=<count>`` mount option serves as + a per-instance limit. 
If ``max=<count>`` is set then only ``<count>`` number + of binder devices can be allocated in this binderfs instance. + +Allocating binder Devices +------------------------- + +.. _ioctl: http://man7.org/linux/man-pages/man2/ioctl.2.html + +To allocate a new binder device in a binderfs instance a request needs to be +sent through the ``binder-control`` device node. A request is sent in the form +of an `ioctl() <ioctl_>`_. + +What a program needs to do is to open the ``binder-control`` device node and +send a ``BINDER_CTL_ADD`` request to the kernel. Users of binderfs need to +tell the kernel which name the new binder device should get. By default a name +can only contain up to ``BINDERFS_MAX_NAME`` chars including the terminating +zero byte. + +Once the request is made via an `ioctl() <ioctl_>`_ passing a ``struct +binder_device`` with the name to the kernel it will allocate a new binder +device and return the major and minor number of the new device in the struct +(this is necessary because binderfs allocates a major device number +dynamically). After the `ioctl() <ioctl_>`_ returns there will be a new +binder device located under /dev/binderfs with the chosen name. + +Deleting binder Devices +----------------------- + +.. _unlink: http://man7.org/linux/man-pages/man2/unlink.2.html +.. _rm: http://man7.org/linux/man-pages/man1/rm.1.html + +Binderfs binder devices can be deleted via `unlink() <unlink_>`_. This means +that the `rm() <rm_>`_ tool can be used to delete them. Note that the +``binder-control`` device cannot be deleted since this would make the binderfs +instance unusable. The ``binder-control`` device will be deleted when the +binderfs instance is unmounted and all references to it have been dropped.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst @@ -1,382 +1,43 @@ -===================== -Linux Filesystems API -===================== +=============================== +Filesystems in the Linux kernel +=============================== -The Linux VFS -============= +This under-development manual will, some glorious day, provide +comprehensive information on how the Linux virtual filesystem (VFS) layer +works, along with the filesystems that sit below it. For now, what we have +can be found below. -The Filesystem types --------------------- - -.. kernel-doc:: include/linux/fs.h - :internal: - -The Directory Cache -------------------- - -.. kernel-doc:: fs/dcache.c - :export: - -.. kernel-doc:: include/linux/dcache.h - :internal: - -Inode Handling --------------- - -.. kernel-doc:: fs/inode.c - :export: - -.. kernel-doc:: fs/bad_inode.c - :export: - -Registration and Superblocks ----------------------------- - -.. kernel-doc:: fs/super.c - :export: - -File Locks ----------- - -.. kernel-doc:: fs/locks.c - :export: - -.. kernel-doc:: fs/locks.c - :internal: - -Other Functions ---------------- - -.. kernel-doc:: fs/mpage.c - :export: - -.. kernel-doc:: fs/namei.c - :export: - -.. kernel-doc:: fs/buffer.c - :export: - -.. kernel-doc:: block/bio.c - :export: - -.. kernel-doc:: fs/seq_file.c - :export: - -.. kernel-doc:: fs/filesystems.c - :export: - -.. kernel-doc:: fs/fs-writeback.c - :export: - -.. kernel-doc:: fs/block_dev.c - :export: - -.. kernel-doc:: fs/anon_inodes.c - :export: - -.. kernel-doc:: fs/attr.c - :export: - -.. kernel-doc:: fs/d_path.c - :export: - -.. kernel-doc:: fs/dax.c - :export: - -.. kernel-doc:: fs/direct-io.c - :export: - -.. kernel-doc:: fs/file_table.c - :export: - -.. kernel-doc:: fs/libfs.c - :export: - -.. kernel-doc:: fs/posix_acl.c - :export: - -.. kernel-doc:: fs/stat.c - :export: - -.. kernel-doc:: fs/sync.c - :export: - -.. 
kernel-doc:: fs/xattr.c - :export: - -The proc filesystem -=================== - -sysctl interface ----------------- - -.. kernel-doc:: kernel/sysctl.c - :export: - -proc filesystem interface -------------------------- - -.. kernel-doc:: fs/proc/base.c - :internal: - -Events based on file descriptors -================================ - -.. kernel-doc:: fs/eventfd.c - :export: - -The Filesystem for Exporting Kernel Objects -=========================================== - -.. kernel-doc:: fs/sysfs/file.c - :export: - -.. kernel-doc:: fs/sysfs/symlink.c - :export: - -The debugfs filesystem +Core VFS documentation ====================== -debugfs interface ------------------ +See these manuals for documentation about the VFS layer itself and how its +algorithms work. -.. kernel-doc:: fs/debugfs/inode.c - :export: +.. toctree:: + :maxdepth: 2 -.. kernel-doc:: fs/debugfs/file.c - :export: + path-lookup.rst + api-summary + splice -The Linux Journalling API +Filesystem support layers ========================= -Overview --------- - -Details -~~~~~~~ - -The journalling layer is easy to use. You need to first of all create a -journal_t data structure. There are two calls to do this dependent on -how you decide to allocate the physical media on which the journal -resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in -filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used -for journal stored on a raw device (in a continuous range of blocks). A -journal_t is a typedef for a struct pointer, so when you are finally -finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up -any used kernel memory. - -Once you have got your journal_t object you need to 'mount' or load the -journal file. The journalling layer expects the space for the journal -was already allocated and initialized properly by the userspace tools. -When loading the journal you must call :c:func:`jbd2_journal_load` to process -journal contents. 
If the client file system detects the journal contents -does not need to be processed (or even need not have valid contents), it -may call :c:func:`jbd2_journal_wipe` to clear the journal contents before -calling :c:func:`jbd2_journal_load`. - -Note that jbd2_journal_wipe(..,0) calls -:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding -transactions in the journal and similarly :c:func:`jbd2_journal_load` will -call :c:func:`jbd2_journal_recover` if necessary. I would advise reading -:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage. - -Now you can go ahead and start modifying the underlying filesystem. -Almost. - -You still need to actually journal your filesystem changes, this is done -by wrapping them into transactions. Additionally you also need to wrap -the modification of each of the buffers with calls to the journal layer, -so it knows what the modifications you are actually making are. To do -this use :c:func:`jbd2_journal_start` which returns a transaction handle. - -:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`, -which indicates the end of a transaction are nestable calls, so you can -reenter a transaction if necessary, but remember you must call -:c:func:`jbd2_journal_stop` the same number of times as -:c:func:`jbd2_journal_start` before the transaction is completed (or more -accurately leaves the update phase). Ext4/VFS makes use of this feature to -simplify handling of inode dirtying, quota support, etc. - -Inside each transaction you need to wrap the modifications to the -individual buffers (blocks). Before you start to modify a buffer you -need to call :c:func:`jbd2_journal_get_create_access()` / -:c:func:`jbd2_journal_get_write_access()` / -:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the -journalling layer to copy the unmodified -data if it needs to. After all the buffer may be part of a previously -uncommitted transaction. 
At this point you are at last ready to modify a -buffer, and once you are have done so you need to call -:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a -buffer you now know is now longer required to be pushed back on the -device you can call :c:func:`jbd2_journal_forget` in much the same way as you -might have used :c:func:`bforget` in the past. - -A :c:func:`jbd2_journal_flush` may be called at any time to commit and -checkpoint all your transactions. - -Then at umount time , in your :c:func:`put_super` you can then call -:c:func:`jbd2_journal_destroy` to clean up your in-core journal object. - -Unfortunately there a couple of ways the journal layer can cause a -deadlock. The first thing to note is that each task can only have a -single outstanding transaction at any one time, remember nothing commits -until the outermost :c:func:`jbd2_journal_stop`. This means you must complete -the transaction at the end of each file/inode/address etc. operation you -perform, so that the journalling system isn't re-entered on another -journal. Since transactions can't be nested/batched across differing -journals, and another filesystem other than yours (say ext4) may be -modified in a later syscall. - -The second case to bear in mind is that :c:func:`jbd2_journal_start` can block -if there isn't enough space in the journal for your transaction (based -on the passed nblocks param) - when it blocks it merely(!) needs to wait -for transactions to complete and be committed from other tasks, so -essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid -deadlocks you must treat :c:func:`jbd2_journal_start` / -:c:func:`jbd2_journal_stop` as if they were semaphores and include them in -your semaphore ordering rules to prevent -deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking -behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as -easily as on :c:func:`jbd2_journal_start`. 
- -Try to reserve the right number of blocks the first time. ;-). This will -be the maximum number of blocks you are going to touch in this -transaction. I advise having a look at at least ext4_jbd.h to see the -basis on which ext4 uses to make these decisions. - -Another wriggle to watch out for is your on-disk block allocation -strategy. Why? Because, if you do a delete, you need to ensure you -haven't reused any of the freed blocks until the transaction freeing -these blocks commits. If you reused these blocks and crash happens, -there is no way to restore the contents of the reallocated blocks at the -end of the last fully committed transaction. One simple way of doing -this is to mark blocks as free in internal in-memory block allocation -structures only after the transaction freeing them commits. Ext4 uses -journal commit callback for this purpose. - -With journal commit callbacks you can ask the journalling layer to call -a callback function when the transaction is finally committed to disk, -so that you can do some of your own management. You ask the journalling -layer for calling the callback by simply setting -``journal->j_commit_callback`` function pointer and that function is -called after each transaction commit. You can also use -``transaction->t_private_list`` for attaching entries to a transaction -that need processing when the transaction commits. - -JBD2 also provides a way to block all transaction updates via -:c:func:`jbd2_journal_lock_updates()` / -:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a -window with a clean and stable fs for a moment. E.g. - -:: - - - jbd2_journal_lock_updates() //stop new stuff happening.. - jbd2_journal_flush() // checkpoint everything. - ..do stuff on stable fs - jbd2_journal_unlock_updates() // carry on with filesystem use. - -The opportunities for abuse and DOS attacks with this should be obvious, -if you allow unprivileged userspace to trigger codepaths containing -these calls. 
- -Summary -~~~~~~~ - -Using the journal is a matter of wrapping the different context changes, -being each mount, each modification (transaction) and each changed -buffer to tell the journalling layer about them. - -Data Types ----------- - -The journalling layer uses typedefs to 'hide' the concrete definitions -of the structures used. As a client of the JBD2 layer you can just rely -on the using the pointer as a magic cookie of some sort. Obviously the -hiding is not enforced as this is 'C'. - -Structures -~~~~~~~~~~ - -.. kernel-doc:: include/linux/jbd2.h - :internal: - -Functions ---------- - -The functions here are split into two groups those that affect a journal -as a whole, and those which are used to manage transactions - -Journal Level -~~~~~~~~~~~~~ - -.. kernel-doc:: fs/jbd2/journal.c - :export: - -.. kernel-doc:: fs/jbd2/recovery.c - :internal: - -Transasction Level -~~~~~~~~~~~~~~~~~~ - -.. kernel-doc:: fs/jbd2/transaction.c - -See also --------- - -`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen -Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__ - -`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen -Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__ - -splice API -========== - -splice is a method for moving blocks of data around inside the kernel, -without continually transferring them between the kernel and user space. - -.. kernel-doc:: fs/splice.c - -pipes API -========= - -Pipe interfaces are all for in-kernel (builtin image) use. They are not -exported for use by modules. - -.. kernel-doc:: include/linux/pipe_fs_i.h - :internal: - -.. kernel-doc:: fs/pipe.c - -Encryption API -============== - -A library which filesystems can hook into to support transparent -encryption of files and directories. +Documentation for the support code within the filesystem layer for use in +filesystem implementations. .. 
toctree:: - :maxdepth: 2 - - fscrypt - -Pathname lookup -=============== - - -This write-up is based on three articles published at lwn.net: + :maxdepth: 2 -- <https://lwn.net/Articles/649115/> Pathname lookup in Linux -- <https://lwn.net/Articles/649729/> RCU-walk: faster pathname lookup in Linux -- <https://lwn.net/Articles/650786/> A walk among the symlinks + journalling + fscrypt -Written by Neil Brown with help from Al Viro and Jon Corbet. -It has subsequently been updated to reflect changes in the kernel -including: +Filesystem-specific documentation +================================= -- per-directory parallel name lookup. +Documentation for individual filesystem types can be found here. .. toctree:: :maxdepth: 2 - path-lookup.rst + binderfs.rst diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst @@ -0,0 +1,184 @@ +The Linux Journalling API +========================= + +Overview +-------- + +Details +~~~~~~~ + +The journalling layer is easy to use. You need to first of all create a +journal_t data structure. There are two calls to do this dependent on +how you decide to allocate the physical media on which the journal +resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in +filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used +for journal stored on a raw device (in a continuous range of blocks). A +journal_t is a typedef for a struct pointer, so when you are finally +finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up +any used kernel memory. + +Once you have got your journal_t object you need to 'mount' or load the +journal file. The journalling layer expects the space for the journal +was already allocated and initialized properly by the userspace tools. +When loading the journal you must call :c:func:`jbd2_journal_load` to process +journal contents. 
If the client file system detects the journal contents +does not need to be processed (or even need not have valid contents), it +may call :c:func:`jbd2_journal_wipe` to clear the journal contents before +calling :c:func:`jbd2_journal_load`. + +Note that jbd2_journal_wipe(..,0) calls +:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding +transactions in the journal and similarly :c:func:`jbd2_journal_load` will +call :c:func:`jbd2_journal_recover` if necessary. I would advise reading +:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage. + +Now you can go ahead and start modifying the underlying filesystem. +Almost. + +You still need to actually journal your filesystem changes, this is done +by wrapping them into transactions. Additionally you also need to wrap +the modification of each of the buffers with calls to the journal layer, +so it knows what the modifications you are actually making are. To do +this use :c:func:`jbd2_journal_start` which returns a transaction handle. + +:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`, +which indicates the end of a transaction are nestable calls, so you can +reenter a transaction if necessary, but remember you must call +:c:func:`jbd2_journal_stop` the same number of times as +:c:func:`jbd2_journal_start` before the transaction is completed (or more +accurately leaves the update phase). Ext4/VFS makes use of this feature to +simplify handling of inode dirtying, quota support, etc. + +Inside each transaction you need to wrap the modifications to the +individual buffers (blocks). Before you start to modify a buffer you +need to call :c:func:`jbd2_journal_get_create_access()` / +:c:func:`jbd2_journal_get_write_access()` / +:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the +journalling layer to copy the unmodified +data if it needs to. After all the buffer may be part of a previously +uncommitted transaction. 
At this point you are at last ready to modify a +buffer, and once you are have done so you need to call +:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a +buffer you now know is now longer required to be pushed back on the +device you can call :c:func:`jbd2_journal_forget` in much the same way as you +might have used :c:func:`bforget` in the past. + +A :c:func:`jbd2_journal_flush` may be called at any time to commit and +checkpoint all your transactions. + +Then at umount time , in your :c:func:`put_super` you can then call +:c:func:`jbd2_journal_destroy` to clean up your in-core journal object. + +Unfortunately there a couple of ways the journal layer can cause a +deadlock. The first thing to note is that each task can only have a +single outstanding transaction at any one time, remember nothing commits +until the outermost :c:func:`jbd2_journal_stop`. This means you must complete +the transaction at the end of each file/inode/address etc. operation you +perform, so that the journalling system isn't re-entered on another +journal. Since transactions can't be nested/batched across differing +journals, and another filesystem other than yours (say ext4) may be +modified in a later syscall. + +The second case to bear in mind is that :c:func:`jbd2_journal_start` can block +if there isn't enough space in the journal for your transaction (based +on the passed nblocks param) - when it blocks it merely(!) needs to wait +for transactions to complete and be committed from other tasks, so +essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid +deadlocks you must treat :c:func:`jbd2_journal_start` / +:c:func:`jbd2_journal_stop` as if they were semaphores and include them in +your semaphore ordering rules to prevent +deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking +behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as +easily as on :c:func:`jbd2_journal_start`. 
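The call order described above (start a handle, request write access, modify the buffer, mark it dirty, stop the handle) and the nesting rule can be sketched in a form that compiles outside the kernel. This is a minimal sketch, not the jbd2 implementation: the types and the four `jbd2_journal_*` functions below are simplified userspace stand-ins that only mirror the kernel signatures, and `update_one_block()`, `nested_demo()`, `task_handle`, `current_handle` and `updates_closed` are hypothetical names introduced for illustration. In kernel code `jbd2_journal_start()` can fail and returns an `ERR_PTR()`, which the mock does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins so the call order can be compiled and run outside the
 * kernel. The real types and functions live in include/linux/jbd2.h and
 * fs/jbd2/; everything in this mock section is a simplification. */
typedef struct journal_s { int updates_closed; } journal_t;
typedef struct jbd2_journal_handle { journal_t *journal; int h_ref; } handle_t;
struct buffer_head { int write_access; int dirty; };

static handle_t task_handle;       /* the task's single outstanding handle */
static handle_t *current_handle;   /* NULL when no transaction is open */

handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
{
    (void)nblocks;                 /* journal space is not modelled here */
    if (current_handle) {          /* nested start: reuse the open handle */
        /* transactions cannot be nested across different journals */
        assert(current_handle->journal == journal);
        current_handle->h_ref++;
        return current_handle;
    }
    task_handle.journal = journal;
    task_handle.h_ref = 1;
    current_handle = &task_handle;
    return current_handle;
}

int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh)
{
    (void)handle;
    bh->write_access = 1;          /* journal may now copy the old contents */
    return 0;
}

int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
{
    (void)handle;
    assert(bh->write_access);      /* access must be requested before modifying */
    bh->dirty = 1;
    return 0;
}

int jbd2_journal_stop(handle_t *handle)
{
    if (--handle->h_ref > 0)       /* inner stop: still in the update phase */
        return 0;
    handle->journal->updates_closed++;  /* outermost stop ends the update phase */
    current_handle = NULL;
    return 0;
}

/* One filesystem operation that modifies a single metadata buffer. */
int update_one_block(journal_t *journal, struct buffer_head *bh)
{
    handle_t *handle = jbd2_journal_start(journal, 1);  /* reserve one block */
    int err = jbd2_journal_get_write_access(handle, bh);

    if (!err) {
        /* ... modify the buffer contents here ... */
        err = jbd2_journal_dirty_metadata(handle, bh);
    }
    jbd2_journal_stop(handle);     /* one stop for every start */
    return err;
}

/* A caller that already holds a handle re-enters the transaction. */
int nested_demo(void)
{
    journal_t j = { 0 };
    struct buffer_head bh = { 0, 0 };
    handle_t *outer = jbd2_journal_start(&j, 2);
    int closed_inside;

    update_one_block(&j, &bh);
    closed_inside = j.updates_closed;   /* inner stop closed nothing */
    jbd2_journal_stop(outer);
    return closed_inside == 0 && j.updates_closed == 1 && bh.dirty == 1;
}
```

`nested_demo()` returns 1: the inner `jbd2_journal_stop()` only drops a reference, and the update phase ends at the outermost stop. The assertion in the mock `jbd2_journal_start()` reflects the rule that a task must finish its transaction on one journal before entering another.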
+ +Try to reserve the right number of blocks the first time. ;-). This will +be the maximum number of blocks you are going to touch in this +transaction. I advise having a look at at least ext4_jbd.h to see the +basis on which ext4 uses to make these decisions. + +Another wriggle to watch out for is your on-disk block allocation +strategy. Why? Because, if you do a delete, you need to ensure you +haven't reused any of the freed blocks until the transaction freeing +these blocks commits. If you reused these blocks and crash happens, +there is no way to restore the contents of the reallocated blocks at the +end of the last fully committed transaction. One simple way of doing +this is to mark blocks as free in internal in-memory block allocation +structures only after the transaction freeing them commits. Ext4 uses +journal commit callback for this purpose. + +With journal commit callbacks you can ask the journalling layer to call +a callback function when the transaction is finally committed to disk, +so that you can do some of your own management. You ask the journalling +layer for calling the callback by simply setting +``journal->j_commit_callback`` function pointer and that function is +called after each transaction commit. You can also use +``transaction->t_private_list`` for attaching entries to a transaction +that need processing when the transaction commits. + +JBD2 also provides a way to block all transaction updates via +:c:func:`jbd2_journal_lock_updates()` / +:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a +window with a clean and stable fs for a moment. E.g. + +:: + + + jbd2_journal_lock_updates() //stop new stuff happening.. + jbd2_journal_flush() // checkpoint everything. + ..do stuff on stable fs + jbd2_journal_unlock_updates() // carry on with filesystem use. + +The opportunities for abuse and DOS attacks with this should be obvious, +if you allow unprivileged userspace to trigger codepaths containing +these calls. 
+ +Summary +~~~~~~~ + +Using the journal is a matter of wrapping the different context changes, +being each mount, each modification (transaction) and each changed +buffer to tell the journalling layer about them. + +Data Types +---------- + +The journalling layer uses typedefs to 'hide' the concrete definitions +of the structures used. As a client of the JBD2 layer you can just rely +on the using the pointer as a magic cookie of some sort. Obviously the +hiding is not enforced as this is 'C'. + +Structures +~~~~~~~~~~ + +.. kernel-doc:: include/linux/jbd2.h + :internal: + +Functions +--------- + +The functions here are split into two groups those that affect a journal +as a whole, and those which are used to manage transactions + +Journal Level +~~~~~~~~~~~~~ + +.. kernel-doc:: fs/jbd2/journal.c + :export: + +.. kernel-doc:: fs/jbd2/recovery.c + :internal: + +Transasction Level +~~~~~~~~~~~~~~~~~~ + +.. kernel-doc:: fs/jbd2/transaction.c + +See also +-------- + +`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen +Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__ + +`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen +Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__ + diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst @@ -1,3 +1,18 @@ +=============== +Pathname lookup +=============== + +This write-up is based on three articles published at lwn.net: + +- <https://lwn.net/Articles/649115/> Pathname lookup in Linux +- <https://lwn.net/Articles/649729/> RCU-walk: faster pathname lookup in Linux +- <https://lwn.net/Articles/650786/> A walk among the symlinks + +Written by Neil Brown with help from Al Viro and Jon Corbet. +It has subsequently been updated to reflect changes in the kernel +including: + +- per-directory parallel name lookup. 
Introduction to pathname lookup =============================== @@ -344,7 +359,7 @@ In particular it is held while scanning chains in the dcache hash table, and the mount point hash table. Bringing it together with ``struct nameidata`` --------------------------------------------- +---------------------------------------------- .. _First edition Unix: http://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/u2.s @@ -355,7 +370,7 @@ converts a "name" to an "inode". ``struct nameidata`` contains (among other fields): ``struct path path`` -~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~ A ``path`` contains a ``struct vfsmount`` (which is embedded in a ``struct mount``) and a ``struct dentry``. Together these @@ -366,13 +381,13 @@ step. A reference through ``d_lockref`` and ``mnt_count`` is always held. ``struct qstr last`` -~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~ This is a string together with a length (i.e. _not_ ``nul`` terminated) that is the "next" component in the pathname. ``int last_type`` -~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~ This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or ``LAST_BIND``. The ``last`` field is only valid if the type is @@ -381,7 +396,7 @@ components of the symlink have been processed yet. Others should be fairly self-explanatory. ``struct path root`` -~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~ This is used to hold a reference to the effective root of the filesystem. Often that reference won't be needed, so this field is @@ -510,7 +525,7 @@ potentially interesting things about these dentries corresponding to three different flags that might be set in ``dentry->d_flags``: ``DCACHE_MANAGE_TRANSIT`` -~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~ If this flag has been set, then the filesystem has requested that the ``d_manage()`` dentry operation be called before handling any possible @@ -529,7 +544,7 @@ filesystem, which will then give it a special pass through ``d_manage()`` by returning ``-EISDIR``. 
``DCACHE_MOUNTED`` -~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~ This flag is set on every dentry that is mounted on. As Linux supports multiple filesystem namespaces, it is possible that the @@ -542,7 +557,7 @@ If this flag is set, and ``d_manage()`` didn't return ``-EISDIR``, and a new ``dentry`` (both with counted references). ``DCACHE_NEED_AUTOMOUNT`` -~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~ If ``d_manage()`` allowed us to get this far, and ``lookup_mnt()`` didn't find a mount point, then this flag causes the ``d_automount()`` dentry @@ -698,7 +713,7 @@ With that little refresher on seqlocks out of the way we can look at the bigger picture of how RCU-walk uses seqlocks. ``mount_lock`` and ``nd->m_seq`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We already met the ``mount_lock`` seqlock when REF-walk used it to ensure that crossing a mount point is performed safely. RCU-walk uses @@ -727,7 +742,7 @@ results would have been the same. This ensures the invariant holds, at least for vfsmount structures. ``dentry->d_seq`` and ``nd->seq`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In place of taking a count or lock on ``d_reflock``, RCU-walk samples the per-dentry ``d_seq`` seqlock, and stores the sequence number in the @@ -774,7 +789,7 @@ getting a counted reference to the new dentry before dropping that for the old dentry which we saw in REF-walk. No ``inode->i_rwsem`` or even ``rename_lock`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A semaphore is a fairly heavyweight lock that can only be taken when it is permissible to sleep. As ``rcu_read_lock()`` forbids sleeping, @@ -796,7 +811,7 @@ locking. This neatly handles all cases, so adding extra checks on rename_lock would bring no significant value. 
``unlazy walk()`` and ``complete_walk()`` -------------------------------------- +----------------------------------------- That "dropping down to REF-walk" typically involves a call to ``unlazy_walk()``, so named because "RCU-walk" is also sometimes diff --git a/Documentation/filesystems/splice.rst b/Documentation/filesystems/splice.rst @@ -0,0 +1,22 @@ +================ +splice and pipes +================ + +splice API +========== + +splice is a method for moving blocks of data around inside the kernel, +without continually transferring them between the kernel and user space. + +.. kernel-doc:: fs/splice.c + +pipes API +========= + +Pipe interfaces are all for in-kernel (builtin image) use. They are not +exported for use by modules. + +.. kernel-doc:: include/linux/pipe_fs_i.h + :internal: + +.. kernel-doc:: fs/pipe.c diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt @@ -116,6 +116,27 @@ static struct device_attribute dev_attr_foo = { .store = store_foo, }; +Note as stated in include/linux/kernel.h "OTHER_WRITABLE? Generally +considered a bad idea." so trying to set a sysfs file writable for +everyone will fail reverting to RO mode for "Others". + +For the common cases sysfs.h provides convenience macros to make +defining attributes easier as well as making code more concise and +readable. The above case could be shortened to: + +static struct device_attribute dev_attr_foo = __ATTR_RW(foo); + +the list of helpers available to define your wrapper function is: +__ATTR_RO(name): assumes default name_show and mode 0444 +__ATTR_WO(name): assumes a name_store only and is restricted to mode + 0200 that is root write access only. +__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently + only use case is the EFI System Resource Table + (see drivers/firmware/efi/esrt.c) +__ATTR_RW(name): assumes default name_show, name_store and setting + mode to 0644. 
+__ATTR_NULL: which sets the name to NULL and is used as end of list + indicator (see: kernel/workqueue.c) Subsystem-Specific Callbacks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg @@ -94,7 +94,7 @@ Note that the lowest numbered temperature zone trip point corresponds to to the border between the highest and one but highest temperature zones, and vica versa. So the temperature zone trip points 1-4 (or 1-2) go from high temp to low temp! This is how things are implemented in the IC, and the driver -mimicks this. +mimics this. There are 2 modes to specify the speed of the fan, PWM duty cycle (or DC voltage) mode, where 0-100% duty cycle (0-100% of 12V) is specified. And RPM diff --git a/Documentation/index.rst b/Documentation/index.rst @@ -90,6 +90,7 @@ needed). filesystems/index vm/index bpf/index + misc-devices/index Architecture-specific documentation ----------------------------------- diff --git a/Documentation/input/devices/xpad.rst b/Documentation/input/devices/xpad.rst @@ -218,7 +218,7 @@ References .. [1] http://euc.jp/periphs/xbox-controller.ja.html (ITO Takayuki) .. [2] http://xpad.xbox-scene.com/ .. [3] http://www.markosweb.com/www/xboxhackz.com/ -.. [4] http://lxr.free-electrons.com/ident?i=xpad_device +.. [4] https://elixir.bootlin.com/linux/latest/ident/xpad_device Historic Edits diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst @@ -1,4 +1,5 @@ .. SPDX-License-Identifier: GPL-2.0+ + LG Gram laptop extra features ============================= @@ -9,6 +10,7 @@ Hotkeys ------- The following FN keys are ignored by the kernel without this driver: + - FN-F1 (LG control panel) - Generates F15 - FN-F5 (Touchpad toggle) - Generates F13 - FN-F6 (Airplane mode) - Generates RFKILL @@ -16,7 +18,7 @@ The following FN keys are ignored by the kernel without this driver: This key also changes keyboard backlight mode. 
- FN-F9 (Reader mode) - Generates F14 -The rest of the FN key work without a need for a special driver. +The rest of the FN keys work without a need for a special driver. Reader mode diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt @@ -45,10 +45,10 @@ When locking rules are violated, these state bits are presented in the locking error messages, inside curlies. A contrived example: modprobe/2287 is trying to acquire lock: - (&sio_locks[i].lock){-.-...}, at: [<c02867fd>] mutex_lock+0x21/0x24 + (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 but task is already holding lock: - (&sio_locks[i].lock){-.-...}, at: [<c02867fd>] mutex_lock+0x21/0x24 + (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 The bit position indicates STATE, STATE-read, for each of the states listed diff --git a/Documentation/misc-devices/ibmvmc.rst b/Documentation/misc-devices/ibmvmc.rst @@ -1,4 +1,5 @@ .. SPDX-License-Identifier: GPL-2.0+ + ====================================================== IBM Virtual Management Channel Kernel Driver (IBMVMC) ====================================================== diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst @@ -0,0 +1,17 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================ +Assorted Miscellaneous Devices Documentation +============================================ + +This documentation contains information for assorted devices that do not +fit into other categories. + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + ibmvmc diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst @@ -0,0 +1,143 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +================= +Checksum Offloads +================= + + +Introduction +============ + +This document describes a set of techniques in the Linux networking stack to +take advantage of checksum offload capabilities of various NICs. + +The following technologies are described: + +* TX Checksum Offload +* LCO: Local Checksum Offload +* RCO: Remote Checksum Offload + +Things that should be documented here but aren't yet: + +* RX Checksum Offload +* CHECKSUM_UNNECESSARY conversion + + +TX Checksum Offload +=================== + +The interface for offloading a transmit checksum to a device is explained in +detail in comments near the top of include/linux/skbuff.h. + +In brief, it allows to request the device fill in a single ones-complement +checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset. +The device should compute the 16-bit ones-complement checksum (i.e. the +'IP-style' checksum) from csum_start to the end of the packet, and fill in the +result at (csum_start + csum_offset). + +Because csum_offset cannot be negative, this ensures that the previous value of +the checksum field is included in the checksum computation, thus it can be used +to supply any needed corrections to the checksum (such as the sum of the +pseudo-header for UDP or TCP). + +This interface only allows a single checksum to be offloaded. Where +encapsulation is used, the packet may have multiple checksum fields in +different header layers, and the rest will have to be handled by another +mechanism such as LCO or RCO. + +CRC32c can also be offloaded using this interface, by means of filling +skb->csum_start and skb->csum_offset as described above, and setting +skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. + +No offloading of the IP header checksum is performed; it is always done in +software. This is OK because when we build the IP header, we obviously have it +in cache, so summing it isn't expensive. 
It's also rather short. + +The requirements for GSO are more complicated, because when segmenting an +encapsulated packet both the inner and outer checksums may need to be edited or +recomputed for each resulting segment. See the skbuff.h comment (section 'E') +for more details. + +A driver declares its offload capabilities in netdev->hw_features; see +Documentation/networking/netdev-features.txt for more. Note that a device +which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and +csum_offset given in the SKB; if it tries to deduce these itself in hardware +(as some NICs do) the driver should check that the values in the SKB match +those which the hardware will deduce, and if not, fall back to checksumming in +software instead (with skb_csum_hwoffload_help() or one of the +skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in +include/linux/skbuff.h). + +The stack should, for the most part, assume that checksum offload is supported +by the underlying device. The only place that should check is +validate_xmit_skb(), and the functions it calls directly or indirectly. That +function compares the offload features requested by the SKB (which may include +other offloads besides TX Checksum Offload) and, if they are not supported or +enabled on the device (determined by netdev->features), performs the +corresponding offload in software. In the case of TX Checksum Offload, that +means calling skb_csum_hwoffload_help(skb, features). + + +LCO: Local Checksum Offload +=========================== + +LCO is a technique for efficiently computing the outer checksum of an +encapsulated datagram when the inner checksum is due to be offloaded. + +The ones-complement sum of a correctly checksummed TCP or UDP packet is equal +to the complement of the sum of the pseudo header, because everything else gets +'cancelled out' by the checksum field. This is because the sum was +complemented before being written to the checksum field. 
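The cancellation property stated above can be checked with a small userspace sketch. It assumes big-endian 16-bit words and an even-length segment; `ones_sum16()`, `device_fill_checksum()` and `sum_after_offload()` are hypothetical helper names, not kernel functions, and the 8-byte segment is made-up test data. The checksum field is seeded with a stand-in pseudo-header sum, the "device" fills in the complemented sum, and the sum over the finished segment is then compared with the complement of the seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement sum of len bytes (big-endian words) with
 * end-around carry. len must be even in this sketch. */
uint16_t ones_sum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len % 2 == 0);
    for (i = 0; i < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    while (sum >> 16)                        /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Model of the device side of TX Checksum Offload: sum from csum_start to
 * the end of the packet (including whatever is already in the checksum
 * field), complement, and store the result at csum_start + csum_offset.
 * seg points at csum_start. */
void device_fill_checksum(uint8_t *seg, size_t len, size_t csum_offset)
{
    uint16_t csum = (uint16_t)~ones_sum16(seg, len);

    seg[csum_offset] = (uint8_t)(csum >> 8);
    seg[csum_offset + 1] = (uint8_t)(csum & 0xff);
}

/* Build a toy 8-byte segment, seed its checksum field with pseudo_sum,
 * let the modelled device fill it in, and return the ones-complement sum
 * of the finished segment. */
uint16_t sum_after_offload(uint16_t pseudo_sum)
{
    uint8_t seg[8] = { 0x12, 0x34, 0x56, 0x78, 0, 0, 0x9a, 0xbc };
    const size_t csum_offset = 4;

    seg[csum_offset] = (uint8_t)(pseudo_sum >> 8);
    seg[csum_offset + 1] = (uint8_t)(pseudo_sum & 0xff);
    device_fill_checksum(seg, sizeof(seg), csum_offset);
    return ones_sum16(seg, sizeof(seg));
}
```

With a seed of 0x0f0f the function returns 0xf0f0, the complement of the seed, regardless of the other bytes in the segment. That independence from the payload is what lets an outer checksum be computed without summing the inner packet.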
+ +More generally, this holds in any case where the 'IP-style' ones complement +checksum is used, and thus any checksum that TX Checksum Offload supports. + +That is, if we have set up TX Checksum Offload with a start/offset pair, we +know that after the device has filled in that checksum, the ones complement sum +from csum_start to the end of the packet will be equal to the complement of +whatever value we put in the checksum field beforehand. This allows us to +compute the outer checksum without looking at the payload: we simply stop +summing when we get to csum_start, then add the complement of the 16-bit word +at (csum_start + csum_offset). + +Then, when the true inner checksum is filled in (either by hardware or by +skb_checksum_help()), the outer checksum will become correct by virtue of the +arithmetic. + +LCO is performed by the stack when constructing an outer UDP header for an +encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the +IPv6 equivalents, in udp6_set_csum(). + +It is also performed when constructing an IPv4 GRE header, in +net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when +constructing an IPv6 GRE header; the GRE checksum is computed over the whole +packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use +LCO here as IPv6 GRE still uses an IP-style checksum. + +All of the LCO implementations use a helper function lco_csum(), in +include/linux/skbuff.h. + +LCO can safely be used for nested encapsulations; in this case, the outer +encapsulation layer will sum over both its own header and the 'middle' header. +This does mean that the 'middle' header will get summed multiple times, but +there doesn't seem to be a way to avoid that without incurring bigger costs +(e.g. in SKB bloat). + + +RCO: Remote Checksum Offload +============================ + +RCO is a technique for eliding the inner checksum of an encapsulated datagram, +allowing the outer checksum to be offloaded. 
It does, however, involve a +change to the encapsulation protocols, which the receiver must also support. +For this reason, it is disabled by default. + +RCO is detailed in the following Internet-Drafts: + +* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 +* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 + +In Linux, RCO is implemented individually in each encapsulation protocol, and +most tunnel types have flags controlling its use. For instance, VXLAN has the +flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be +used when transmitting to a given remote destination. diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt @@ -1,122 +0,0 @@ -Checksum Offloads in the Linux Networking Stack - - -Introduction -============ - -This document describes a set of techniques in the Linux networking stack - to take advantage of checksum offload capabilities of various NICs. - -The following technologies are described: - * TX Checksum Offload - * LCO: Local Checksum Offload - * RCO: Remote Checksum Offload - -Things that should be documented here but aren't yet: - * RX Checksum Offload - * CHECKSUM_UNNECESSARY conversion - - -TX Checksum Offload -=================== - -The interface for offloading a transmit checksum to a device is explained - in detail in comments near the top of include/linux/skbuff.h. -In brief, it allows to request the device fill in a single ones-complement - checksum defined by the sk_buff fields skb->csum_start and - skb->csum_offset. The device should compute the 16-bit ones-complement - checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the - packet, and fill in the result at (csum_start + csum_offset). 
-Because csum_offset cannot be negative, this ensures that the previous - value of the checksum field is included in the checksum computation, thus - it can be used to supply any needed corrections to the checksum (such as - the sum of the pseudo-header for UDP or TCP). -This interface only allows a single checksum to be offloaded. Where - encapsulation is used, the packet may have multiple checksum fields in - different header layers, and the rest will have to be handled by another - mechanism such as LCO or RCO. -CRC32c can also be offloaded using this interface, by means of filling - skb->csum_start and skb->csum_offset as described above, and setting - skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. -No offloading of the IP header checksum is performed; it is always done in - software. This is OK because when we build the IP header, we obviously - have it in cache, so summing it isn't expensive. It's also rather short. -The requirements for GSO are more complicated, because when segmenting an - encapsulated packet both the inner and outer checksums may need to be - edited or recomputed for each resulting segment. See the skbuff.h comment - (section 'E') for more details. - -A driver declares its offload capabilities in netdev->hw_features; see - Documentation/networking/netdev-features.txt for more. Note that a device - which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start - and csum_offset given in the SKB; if it tries to deduce these itself in - hardware (as some NICs do) the driver should check that the values in the - SKB match those which the hardware will deduce, and if not, fall back to - checksumming in software instead (with skb_csum_hwoffload_help() or one of - the skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in - include/linux/skbuff.h). - -The stack should, for the most part, assume that checksum offload is - supported by the underlying device. 
The only place that should check is - validate_xmit_skb(), and the functions it calls directly or indirectly. - That function compares the offload features requested by the SKB (which - may include other offloads besides TX Checksum Offload) and, if they are - not supported or enabled on the device (determined by netdev->features), - performs the corresponding offload in software. In the case of TX - Checksum Offload, that means calling skb_csum_hwoffload_help(skb, features). - - -LCO: Local Checksum Offload -=========================== - -LCO is a technique for efficiently computing the outer checksum of an - encapsulated datagram when the inner checksum is due to be offloaded. -The ones-complement sum of a correctly checksummed TCP or UDP packet is - equal to the complement of the sum of the pseudo header, because everything - else gets 'cancelled out' by the checksum field. This is because the sum was - complemented before being written to the checksum field. -More generally, this holds in any case where the 'IP-style' ones complement - checksum is used, and thus any checksum that TX Checksum Offload supports. -That is, if we have set up TX Checksum Offload with a start/offset pair, we - know that after the device has filled in that checksum, the ones - complement sum from csum_start to the end of the packet will be equal to - the complement of whatever value we put in the checksum field beforehand. - This allows us to compute the outer checksum without looking at the payload: - we simply stop summing when we get to csum_start, then add the complement of - the 16-bit word at (csum_start + csum_offset). -Then, when the true inner checksum is filled in (either by hardware or by - skb_checksum_help()), the outer checksum will become correct by virtue of - the arithmetic. - -LCO is performed by the stack when constructing an outer UDP header for an - encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for - the IPv6 equivalents, in udp6_set_csum(). 
-It is also performed when constructing an IPv4 GRE header, in - net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when - constructing an IPv6 GRE header; the GRE checksum is computed over the - whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be - possible to use LCO here as IPv6 GRE still uses an IP-style checksum. -All of the LCO implementations use a helper function lco_csum(), in - include/linux/skbuff.h. - -LCO can safely be used for nested encapsulations; in this case, the outer - encapsulation layer will sum over both its own header and the 'middle' - header. This does mean that the 'middle' header will get summed multiple - times, but there doesn't seem to be a way to avoid that without incurring - bigger costs (e.g. in SKB bloat). - - -RCO: Remote Checksum Offload -============================ - -RCO is a technique for eliding the inner checksum of an encapsulated - datagram, allowing the outer checksum to be offloaded. It does, however, - involve a change to the encapsulation protocols, which the receiver must - also support. For this reason, it is disabled by default. -RCO is detailed in the following Internet-Drafts: -https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 -https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 -In Linux, RCO is implemented individually in each encapsulation protocol, - and most tunnel types have flags controlling its use. For instance, VXLAN - has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that - RCO should be used when transmitting to a given remote destination. diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst @@ -36,6 +36,9 @@ Contents: alias bridge snmp_counter + checksum-offloads + segmentation-offloads + scaling .. only:: subproject diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst @@ -0,0 +1,523 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +===================================== +Scaling in the Linux Networking Stack +===================================== + + +Introduction +============ + +This document describes a set of complementary techniques in the Linux +networking stack to increase parallelism and improve performance for +multi-processor systems. + +The following technologies are described: + +- RSS: Receive Side Scaling +- RPS: Receive Packet Steering +- RFS: Receive Flow Steering +- Accelerated Receive Flow Steering +- XPS: Transmit Packet Steering + + +RSS: Receive Side Scaling +========================= + +Contemporary NICs support multiple receive and transmit descriptor queues +(multi-queue). On reception, a NIC can send different packets to different +queues to distribute processing among CPUs. The NIC distributes packets by +applying a filter to each packet that assigns it to one of a small number +of logical flows. Packets for each flow are steered to a separate receive +queue, which in turn can be processed by separate CPUs. This mechanism is +generally known as “Receive-side Scaling” (RSS). The goal of RSS and +the other scaling techniques is to increase performance uniformly. +Multi-queue distribution can also be used for traffic prioritization, but +that is not the focus of these techniques. + +The filter used in RSS is typically a hash function over the network +and/or transport layer headers-- for example, a 4-tuple hash over +IP addresses and TCP ports of a packet. The most common hardware +implementation of RSS uses a 128-entry indirection table where each entry +stores a queue number. The receive queue for a packet is determined +by masking out the low order seven bits of the computed hash for the +packet (usually a Toeplitz hash), taking this number as a key into the +indirection table and reading the corresponding value. + +Some advanced NICs allow steering packets to queues based on +programmable filters. 
For example, webserver bound TCP port 80 packets +can be directed to their own receive queue. Such “n-tuple” filters can +be configured from ethtool (--config-ntuple). + + +RSS Configuration +----------------- + +The driver for a multi-queue capable NIC typically provides a kernel +module parameter for specifying the number of hardware queues to +configure. In the bnx2x driver, for instance, this parameter is called +num_queues. A typical RSS configuration would be to have one receive queue +for each CPU if the device supports enough queues, or otherwise at least +one for each memory domain, where a memory domain is a set of CPUs that +share a particular memory level (L1, L2, NUMA node, etc.). + +The indirection table of an RSS device, which resolves a queue by masked +hash, is usually programmed by the driver at initialization. The +default mapping is to distribute the queues evenly in the table, but the +indirection table can be retrieved and modified at runtime using ethtool +commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the +indirection table could be done to give different queues different +relative weights. + + +RSS IRQ Configuration +~~~~~~~~~~~~~~~~~~~~~ + +Each receive queue has a separate IRQ associated with it. The NIC triggers +this to notify a CPU when new packets arrive on the given queue. The +signaling path for PCIe devices uses message signaled interrupts (MSI-X), +that can route each interrupt to a particular CPU. The active mapping +of queues to IRQs can be determined from /proc/interrupts. By default, +an IRQ may be handled on any CPU. Because a non-negligible part of packet +processing takes place in receive interrupt handling, it is advantageous +to spread receive interrupts between CPUs. To manually adjust the IRQ +affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems +will be running irqbalance, a daemon that dynamically optimizes IRQ +assignments and as a result may override any manual settings. 
+ + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +RSS should be enabled when latency is a concern or whenever receive +interrupt processing forms a bottleneck. Spreading load between CPUs +decreases queue length. For low latency networking, the optimal setting +is to allocate as many queues as there are CPUs in the system (or the +NIC maximum, if lower). The most efficient high-rate configuration +is likely the one with the smallest number of receive queues where no +receive queue overflows due to a saturated CPU, because in default +mode with interrupt coalescing enabled, the aggregate number of +interrupts (and thus work) grows with each additional queue. + +Per-cpu load can be observed using the mpstat utility, but note that on +processors with hyperthreading (HT), each hyperthread is represented as +a separate CPU. For interrupt handling, HT has shown no benefit in +initial tests, so limit the number of queues to the number of CPU cores +in the system. + + +RPS: Receive Packet Steering +============================ + +Receive Packet Steering (RPS) is logically a software implementation of +RSS. Being in software, it is necessarily called later in the datapath. +Whereas RSS selects the queue and hence CPU that will run the hardware +interrupt handler, RPS selects the CPU to perform protocol processing +above the interrupt handler. This is accomplished by placing the packet +on the desired CPU’s backlog queue and waking up the CPU for processing. +RPS has some advantages over RSS: + +1) it can be used with any NIC +2) software filters can easily be added to hash over new protocols +3) it does not increase hardware device interrupt rate (although it does + introduce inter-processor interrupts (IPIs)) + +RPS is called during bottom half of the receive interrupt handler, when +a driver sends a packet up the network stack with netif_rx() or +netif_receive_skb(). These call the get_rps_cpu() function, which +selects the queue that should process a packet. 
+ +The first step in determining the target CPU for RPS is to calculate a +flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash +depending on the protocol). This serves as a consistent hash of the +associated flow of the packet. The hash is either provided by hardware +or will be computed in the stack. Capable hardware can pass the hash in +the receive descriptor for the packet; this would usually be the same +hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in +skb->hash and can be used elsewhere in the stack as a hash of the +packet’s flow. + +Each receive hardware queue has an associated list of CPUs to which +RPS may enqueue packets for processing. For each received packet, +an index into the list is computed from the flow hash modulo the size +of the list. The indexed CPU is the target for processing the packet, +and the packet is queued to the tail of that CPU’s backlog queue. At +the end of the bottom half routine, IPIs are sent to any CPUs for which +packets have been queued to their backlog queue. The IPI wakes backlog +processing on the remote CPU, and any queued packets are then processed +up the networking stack. + + +RPS Configuration +----------------- + +RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on +by default for SMP). Even when compiled in, RPS remains disabled until +explicitly configured. The list of CPUs to which RPS may forward traffic +can be configured for each receive queue using a sysfs file entry:: + + /sys/class/net/<dev>/queues/rx-<n>/rps_cpus + +This file implements a bitmap of CPUs. RPS is disabled when it is zero +(the default), in which case packets are processed on the interrupting +CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to +the bitmap. + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +For a single queue device, a typical RPS configuration would be to set +the rps_cpus to the CPUs in the same memory domain of the interrupting +CPU. 
If NUMA locality is not an issue, this could also be all CPUs in +the system. At high interrupt rate, it might be wise to exclude the +interrupting CPU from the map since that already performs much work. + +For a multi-queue system, if RSS is configured so that a hardware +receive queue is mapped to each CPU, then RPS is probably redundant +and unnecessary. If there are fewer hardware queues than CPUs, then +RPS might be beneficial if the rps_cpus for each queue are the ones that +share the same memory domain as the interrupting CPU for that queue. + + +RPS Flow Limit +-------------- + +RPS scales kernel receive processing across CPUs without introducing +reordering. The trade-off to sending all packets from the same flow +to the same CPU is CPU load imbalance if flows vary in packet rate. +In the extreme case a single flow dominates traffic. Especially on +common server workloads with many concurrent connections, such +behavior indicates a problem such as a misconfiguration or spoofed +source Denial of Service attack. + +Flow Limit is an optional RPS feature that prioritizes small flows +during CPU contention by dropping packets from large flows slightly +ahead of those from small flows. It is active only when an RPS or RFS +destination CPU approaches saturation. Once a CPU's input packet +queue exceeds half the maximum queue length (as set by sysctl +net.core.netdev_max_backlog), the kernel starts a per-flow packet +count over the last 256 packets. If a flow exceeds a set ratio (by +default, half) of these packets when a new packet arrives, then the +new packet is dropped. Packets from other flows are still only +dropped once the input packet queue reaches netdev_max_backlog. +No packets are dropped when the input packet queue length is below +the threshold, so flow limit does not sever connections outright: +even large flows maintain connectivity. + + +Interface +~~~~~~~~~ + +Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not +turned on. 
It is implemented for each CPU independently (to avoid lock +and cache contention) and toggled per CPU by setting the relevant bit +in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU +bitmap interface as rps_cpus (see above) when called from procfs:: + + /proc/sys/net/core/flow_limit_cpu_bitmap + +Per-flow rate is calculated by hashing each packet into a hashtable +bucket and incrementing a per-bucket counter. The hash function is +the same that selects a CPU in RPS, but as the number of buckets can +be much larger than the number of CPUs, flow limit has finer-grained +identification of large flows and fewer false positives. The default +table has 4096 buckets. This value can be modified through sysctl:: + + net.core.flow_limit_table_len + +The value is only consulted when a new table is allocated. Modifying +it does not update active tables. + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +Flow limit is useful on systems with many concurrent connections, +where a single connection taking up 50% of a CPU indicates a problem. +In such environments, enable the feature on all CPUs that handle +network rx interrupts (as set in /proc/irq/N/smp_affinity). + +The feature depends on the input packet queue length to exceed +the flow limit threshold (50%) + the flow history length (256). +Setting net.core.netdev_max_backlog to either 1000 or 10000 +performed well in experiments. + + +RFS: Receive Flow Steering +========================== + +While RPS steers packets solely based on hash, and thus generally +provides good load distribution, it does not take into account +application locality. This is accomplished by Receive Flow Steering +(RFS). The goal of RFS is to increase datacache hitrate by steering +kernel processing of packets to the CPU where the application thread +consuming the packet is running. RFS relies on the same RPS mechanisms +to enqueue packets onto the backlog of another CPU and to wake up that +CPU. 
+ +In RFS, packets are not forwarded directly by the value of their hash, +but the hash is used as index into a flow lookup table. This table maps +flows to the CPUs where those flows are being processed. The flow hash +(see RPS section above) is used to calculate the index into this table. +The CPU recorded in each entry is the one which last processed the flow. +If an entry does not hold a valid CPU, then packets mapped to that entry +are steered using plain RPS. Multiple table entries may point to the +same CPU. Indeed, with many flows and few CPUs, it is very likely that +a single application thread handles flows with many different flow hashes. + +rps_sock_flow_table is a global flow table that contains the *desired* CPU +for flows: the CPU that is currently processing the flow in userspace. +Each table value is a CPU index that is updated during calls to recvmsg +and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() +and tcp_splice_read()). + +When the scheduler moves a thread to a new CPU while it has outstanding +receive packets on the old CPU, packets may arrive out of order. To +avoid this, RFS uses a second flow table to track outstanding packets +for each flow: rps_dev_flow_table is a table specific to each hardware +receive queue of each device. Each table value stores a CPU index and a +counter. The CPU index represents the *current* CPU onto which packets +for this flow are enqueued for further kernel processing. Ideally, kernel +and userspace processing occur on the same CPU, and hence the CPU index +in both tables is identical. This is likely false if the scheduler has +recently migrated a userspace thread while the kernel still has packets +enqueued for kernel processing on the old CPU. + +The counter in rps_dev_flow_table values records the length of the current +CPU's backlog when a packet in this flow was last enqueued. Each backlog +queue has a head counter that is incremented on dequeue. 
A tail counter +is computed as head counter + queue length. In other words, the counter +in rps_dev_flow[i] records the last element in flow i that has +been enqueued onto the currently designated CPU for flow i (of course, +entry i is actually selected by hash and multiple flows may hash to the +same entry i). + +And now the trick for avoiding out of order packets: when selecting the +CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table +and the rps_dev_flow table of the queue that the packet was received on +are compared. If the desired CPU for the flow (found in the +rps_sock_flow table) matches the current CPU (found in the rps_dev_flow +table), the packet is enqueued onto that CPU’s backlog. If they differ, +the current CPU is updated to match the desired CPU if one of the +following is true: + + - The current CPU's queue head counter >= the recorded tail counter + value in rps_dev_flow[i] + - The current CPU is unset (>= nr_cpu_ids) + - The current CPU is offline + +After this check, the packet is sent to the (possibly updated) current +CPU. These rules aim to ensure that a flow only moves to a new CPU when +there are no packets outstanding on the old CPU, as the outstanding +packets could arrive later than those about to be processed on the new +CPU. + + +RFS Configuration +----------------- + +RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on +by default for SMP). The functionality remains disabled until explicitly +configured. The number of entries in the global flow table is set through:: + + /proc/sys/net/core/rps_sock_flow_entries + +The number of entries in the per-queue flow table are set through:: + + /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +Both of these need to be set before RFS is enabled for a receive queue. +Values for both are rounded up to the nearest power of two. 
The +suggested flow count depends on the expected number of active connections +at any given time, which may be significantly less than the number of open +connections. We have found that a value of 32768 for rps_sock_flow_entries +works fairly well on a moderately loaded server. + +For a single queue device, the rps_flow_cnt value for the single queue +would normally be configured to the same value as rps_sock_flow_entries. +For a multi-queue device, the rps_flow_cnt for each queue might be +configured as rps_sock_flow_entries / N, where N is the number of +queues. So for instance, if rps_sock_flow_entries is set to 32768 and there +are 16 configured receive queues, rps_flow_cnt for each queue might be +configured as 2048. + + +Accelerated RFS +=============== + +Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load +balancing mechanism that uses soft state to steer flows based on where +the application thread consuming the packets of each flow is running. +Accelerated RFS should perform better than RFS since packets are sent +directly to a CPU local to the thread consuming the data. The target CPU +will either be the same CPU where the application runs, or at least a CPU +which is local to the application thread’s CPU in the cache hierarchy. + +To enable accelerated RFS, the networking stack calls the +ndo_rx_flow_steer driver function to communicate the desired hardware +queue for packets matching a particular flow. The network stack +automatically calls this function every time a flow entry in +rps_dev_flow_table is updated. The driver in turn uses a device specific +method to program the NIC to steer the packets. + +The hardware queue for a flow is derived from the CPU recorded in +rps_dev_flow_table. The stack consults a CPU to hardware queue map which +is maintained by the NIC driver. This is an auto-generated reverse map of +the IRQ affinity table shown by /proc/interrupts. 
Drivers can use +functions in the cpu_rmap (“CPU affinity reverse map”) kernel library +to populate the map. For each CPU, the corresponding queue in the map is +set to be one whose processing CPU is closest in cache locality. + + +Accelerated RFS Configuration +----------------------------- + +Accelerated RFS is only available if the kernel is compiled with +CONFIG_RFS_ACCEL and support is provided by the NIC device and driver. +It also requires that ntuple filtering is enabled via ethtool. The map +of CPU to queues is automatically deduced from the IRQ affinities +configured for each receive queue by the driver, so no additional +configuration should be necessary. + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +This technique should be enabled whenever one wants to use RFS and the +NIC supports hardware acceleration. + + +XPS: Transmit Packet Steering +============================= + +Transmit Packet Steering is a mechanism for intelligently selecting +which transmit queue to use when transmitting a packet on a multi-queue +device. This can be accomplished by recording two kinds of maps, either +a mapping of CPU to hardware queue(s) or a mapping of receive queue(s) +to hardware transmit queue(s). + +1. XPS using CPUs map + +The goal of this mapping is usually to assign queues +exclusively to a subset of CPUs, where the transmit completions for +these queues are processed on a CPU within this set. This choice +provides two benefits. First, contention on the device queue lock is +significantly reduced since fewer CPUs contend for the same queue +(contention can be eliminated completely if each CPU has its own +transmit queue). Secondly, cache miss rate on transmit completion is +reduced, in particular for data cache lines that hold the sk_buff +structures. + +2. XPS using receive queues map + +This mapping is used to pick transmit queue based on the receive +queue(s) map configuration set by the administrator. 
A set of receive +queues can be mapped to a set of transmit queues (many:many), although +the common use case is a 1:1 mapping. This will enable sending packets +on the same queue associations for transmit and receive. This is useful for +busy polling multi-threaded workloads where there are challenges in +associating a given CPU to a given application thread. The application +threads are not pinned to CPUs and each thread handles packets +received on a single queue. The receive queue number is cached in the +socket for the connection. In this model, sending the packets on the same +transmit queue corresponding to the associated receive queue has benefits +in keeping the CPU overhead low. Transmit completion work is locked into +the same queue-association that a given application is polling on. This +avoids the overhead of triggering an interrupt on another CPU. When the +application cleans up the packets during the busy poll, transmit completion +may be processed along with it in the same thread context and so result in +reduced latency. + +XPS is configured per transmit queue by setting a bitmap of +CPUs/receive-queues that may use that queue to transmit. The reverse +mapping, from CPUs to transmit queues or from receive-queues to transmit +queues, is computed and maintained for each network device. When +transmitting the first packet in a flow, the function get_xps_queue() is +called to select a queue. This function uses the ID of the receive queue +for the socket connection for a match in the receive queue-to-transmit queue +lookup table. Alternatively, this function can also use the ID of the +running CPU as a key into the CPU-to-queue lookup table. If the +ID matches a single queue, that is used for transmission. If multiple +queues match, one is selected by using the flow hash to compute an index +into the set. 
When selecting the transmit queue based on receive queue(s) +map, the transmit device is not validated against the receive device as it +requires expensive lookup operation in the datapath. + +The queue chosen for transmitting a particular flow is saved in the +corresponding socket structure for the flow (e.g. a TCP connection). +This transmit queue is used for subsequent packets sent on the flow to +prevent out of order (ooo) packets. The choice also amortizes the cost +of calling get_xps_queues() over all packets in the flow. To avoid +ooo packets, the queue for a flow can subsequently only be changed if +skb->ooo_okay is set for a packet in the flow. This flag indicates that +there are no outstanding packets in the flow, so the transmit queue can +change without the risk of generating out of order packets. The +transport layer is responsible for setting ooo_okay appropriately. TCP, +for instance, sets the flag when all data for a connection has been +acknowledged. + +XPS Configuration +----------------- + +XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by +default for SMP). The functionality remains disabled until explicitly +configured. To enable XPS, the bitmap of CPUs/receive-queues that may +use a transmit queue is configured using the sysfs file entry: + +For selection based on CPUs map:: + + /sys/class/net/<dev>/queues/tx-<n>/xps_cpus + +For selection based on receive-queues map:: + + /sys/class/net/<dev>/queues/tx-<n>/xps_rxqs + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +For a network device with a single transmission queue, XPS configuration +has no effect, since there is no choice in this case. In a multi-queue +system, XPS is preferably configured so that each CPU maps onto one queue. +If there are as many queues as there are CPUs in the system, then each +queue can also map onto one CPU, resulting in exclusive pairings that +experience no contention. 
If there are fewer queues than CPUs, then the +best CPUs to share a given queue are probably those that share the cache +with the CPU that processes transmit completions for that queue +(transmit interrupts). + +For transmit queue selection based on receive queue(s), XPS has to be +explicitly configured mapping receive-queue(s) to transmit queue(s). If the +user configuration for receive-queue map does not apply, then the transmit +queue is selected based on the CPUs map. + + +Per TX Queue rate limitation +============================ + +These are rate-limitation mechanisms implemented by HW, where currently +a max-rate attribute is supported, by setting a Mbps value to:: + + /sys/class/net/<dev>/queues/tx-<n>/tx_maxrate + +A value of zero means disabled, and this is the default. + + +Further Information +=================== +RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into +2.6.38. Original patches were submitted by Tom Herbert +(therbert@google.com) + +Accelerated RFS was introduced in 2.6.35. Original patches were +submitted by Ben Hutchings (bwh@kernel.org) + +Authors: + +- Tom Herbert (therbert@google.com) +- Willem de Bruijn (willemb@google.com) diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt @@ -1,484 +0,0 @@ -Scaling in the Linux Networking Stack - - -Introduction -============ - -This document describes a set of complementary techniques in the Linux -networking stack to increase parallelism and improve performance for -multi-processor systems. - -The following technologies are described: - - RSS: Receive Side Scaling - RPS: Receive Packet Steering - RFS: Receive Flow Steering - Accelerated Receive Flow Steering - XPS: Transmit Packet Steering - - -RSS: Receive Side Scaling -========================= - -Contemporary NICs support multiple receive and transmit descriptor queues -(multi-queue). On reception, a NIC can send different packets to different -queues to distribute processing among CPUs. 
The NIC distributes packets by -applying a filter to each packet that assigns it to one of a small number -of logical flows. Packets for each flow are steered to a separate receive -queue, which in turn can be processed by separate CPUs. This mechanism is -generally known as “Receive-side Scaling” (RSS). The goal of RSS and -the other scaling techniques is to increase performance uniformly. -Multi-queue distribution can also be used for traffic prioritization, but -that is not the focus of these techniques. - -The filter used in RSS is typically a hash function over the network -and/or transport layer headers-- for example, a 4-tuple hash over -IP addresses and TCP ports of a packet. The most common hardware -implementation of RSS uses a 128-entry indirection table where each entry -stores a queue number. The receive queue for a packet is determined -by masking out the low order seven bits of the computed hash for the -packet (usually a Toeplitz hash), taking this number as a key into the -indirection table and reading the corresponding value. - -Some advanced NICs allow steering packets to queues based on -programmable filters. For example, webserver bound TCP port 80 packets -can be directed to their own receive queue. Such “n-tuple” filters can -be configured from ethtool (--config-ntuple). - -==== RSS Configuration - -The driver for a multi-queue capable NIC typically provides a kernel -module parameter for specifying the number of hardware queues to -configure. In the bnx2x driver, for instance, this parameter is called -num_queues. A typical RSS configuration would be to have one receive queue -for each CPU if the device supports enough queues, or otherwise at least -one for each memory domain, where a memory domain is a set of CPUs that -share a particular memory level (L1, L2, NUMA node, etc.). - -The indirection table of an RSS device, which resolves a queue by masked -hash, is usually programmed by the driver at initialization. 
The -default mapping is to distribute the queues evenly in the table, but the -indirection table can be retrieved and modified at runtime using ethtool -commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the -indirection table could be done to give different queues different -relative weights. - -== RSS IRQ Configuration - -Each receive queue has a separate IRQ associated with it. The NIC triggers -this to notify a CPU when new packets arrive on the given queue. The -signaling path for PCIe devices uses message signaled interrupts (MSI-X), -that can route each interrupt to a particular CPU. The active mapping -of queues to IRQs can be determined from /proc/interrupts. By default, -an IRQ may be handled on any CPU. Because a non-negligible part of packet -processing takes place in receive interrupt handling, it is advantageous -to spread receive interrupts between CPUs. To manually adjust the IRQ -affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems -will be running irqbalance, a daemon that dynamically optimizes IRQ -assignments and as a result may override any manual settings. - -== Suggested Configuration - -RSS should be enabled when latency is a concern or whenever receive -interrupt processing forms a bottleneck. Spreading load between CPUs -decreases queue length. For low latency networking, the optimal setting -is to allocate as many queues as there are CPUs in the system (or the -NIC maximum, if lower). The most efficient high-rate configuration -is likely the one with the smallest number of receive queues where no -receive queue overflows due to a saturated CPU, because in default -mode with interrupt coalescing enabled, the aggregate number of -interrupts (and thus work) grows with each additional queue. - -Per-cpu load can be observed using the mpstat utility, but note that on -processors with hyperthreading (HT), each hyperthread is represented as -a separate CPU. 
For interrupt handling, HT has shown no benefit in -initial tests, so limit the number of queues to the number of CPU cores -in the system. - - -RPS: Receive Packet Steering -============================ - -Receive Packet Steering (RPS) is logically a software implementation of -RSS. Being in software, it is necessarily called later in the datapath. -Whereas RSS selects the queue and hence CPU that will run the hardware -interrupt handler, RPS selects the CPU to perform protocol processing -above the interrupt handler. This is accomplished by placing the packet -on the desired CPU’s backlog queue and waking up the CPU for processing. -RPS has some advantages over RSS: 1) it can be used with any NIC, -2) software filters can easily be added to hash over new protocols, -3) it does not increase hardware device interrupt rate (although it does -introduce inter-processor interrupts (IPIs)). - -RPS is called during bottom half of the receive interrupt handler, when -a driver sends a packet up the network stack with netif_rx() or -netif_receive_skb(). These call the get_rps_cpu() function, which -selects the queue that should process a packet. - -The first step in determining the target CPU for RPS is to calculate a -flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash -depending on the protocol). This serves as a consistent hash of the -associated flow of the packet. The hash is either provided by hardware -or will be computed in the stack. Capable hardware can pass the hash in -the receive descriptor for the packet; this would usually be the same -hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in -skb->hash and can be used elsewhere in the stack as a hash of the -packet’s flow. - -Each receive hardware queue has an associated list of CPUs to which -RPS may enqueue packets for processing. For each received packet, -an index into the list is computed from the flow hash modulo the size -of the list. 
The indexed CPU is the target for processing the packet, -and the packet is queued to the tail of that CPU’s backlog queue. At -the end of the bottom half routine, IPIs are sent to any CPUs for which -packets have been queued to their backlog queue. The IPI wakes backlog -processing on the remote CPU, and any queued packets are then processed -up the networking stack. - -==== RPS Configuration - -RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on -by default for SMP). Even when compiled in, RPS remains disabled until -explicitly configured. The list of CPUs to which RPS may forward traffic -can be configured for each receive queue using a sysfs file entry: - - /sys/class/net/<dev>/queues/rx-<n>/rps_cpus - -This file implements a bitmap of CPUs. RPS is disabled when it is zero -(the default), in which case packets are processed on the interrupting -CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to -the bitmap. - -== Suggested Configuration - -For a single queue device, a typical RPS configuration would be to set -the rps_cpus to the CPUs in the same memory domain of the interrupting -CPU. If NUMA locality is not an issue, this could also be all CPUs in -the system. At high interrupt rate, it might be wise to exclude the -interrupting CPU from the map since that already performs much work. - -For a multi-queue system, if RSS is configured so that a hardware -receive queue is mapped to each CPU, then RPS is probably redundant -and unnecessary. If there are fewer hardware queues than CPUs, then -RPS might be beneficial if the rps_cpus for each queue are the ones that -share the same memory domain as the interrupting CPU for that queue. - -==== RPS Flow Limit - -RPS scales kernel receive processing across CPUs without introducing -reordering. The trade-off to sending all packets from the same flow -to the same CPU is CPU load imbalance if flows vary in packet rate. -In the extreme case a single flow dominates traffic. 
Especially on -common server workloads with many concurrent connections, such -behavior indicates a problem such as a misconfiguration or spoofed -source Denial of Service attack. - -Flow Limit is an optional RPS feature that prioritizes small flows -during CPU contention by dropping packets from large flows slightly -ahead of those from small flows. It is active only when an RPS or RFS -destination CPU approaches saturation. Once a CPU's input packet -queue exceeds half the maximum queue length (as set by sysctl -net.core.netdev_max_backlog), the kernel starts a per-flow packet -count over the last 256 packets. If a flow exceeds a set ratio (by -default, half) of these packets when a new packet arrives, then the -new packet is dropped. Packets from other flows are still only -dropped once the input packet queue reaches netdev_max_backlog. -No packets are dropped when the input packet queue length is below -the threshold, so flow limit does not sever connections outright: -even large flows maintain connectivity. - -== Interface - -Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not -turned on. It is implemented for each CPU independently (to avoid lock -and cache contention) and toggled per CPU by setting the relevant bit -in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU -bitmap interface as rps_cpus (see above) when called from procfs: - - /proc/sys/net/core/flow_limit_cpu_bitmap - -Per-flow rate is calculated by hashing each packet into a hashtable -bucket and incrementing a per-bucket counter. The hash function is -the same that selects a CPU in RPS, but as the number of buckets can -be much larger than the number of CPUs, flow limit has finer-grained -identification of large flows and fewer false positives. The default -table has 4096 buckets. This value can be modified through sysctl - - net.core.flow_limit_table_len - -The value is only consulted when a new table is allocated. Modifying -it does not update active tables. 
-
-== Suggested Configuration
-
-Flow limit is useful on systems with many concurrent connections,
-where a single connection taking up 50% of a CPU indicates a problem.
-In such environments, enable the feature on all CPUs that handle
-network rx interrupts (as set in /proc/irq/N/smp_affinity).
-
-The feature depends on the input packet queue length to exceed
-the flow limit threshold (50%) + the flow history length (256).
-Setting net.core.netdev_max_backlog to either 1000 or 10000
-performed well in experiments.
-
-
-RFS: Receive Flow Steering
-==========================
-
-While RPS steers packets solely based on hash, and thus generally
-provides good load distribution, it does not take into account
-application locality. This is accomplished by Receive Flow Steering
-(RFS). The goal of RFS is to increase datacache hitrate by steering
-kernel processing of packets to the CPU where the application thread
-consuming the packet is running. RFS relies on the same RPS mechanisms
-to enqueue packets onto the backlog of another CPU and to wake up that
-CPU.
-
-In RFS, packets are not forwarded directly by the value of their hash,
-but the hash is used as index into a flow lookup table. This table maps
-flows to the CPUs where those flows are being processed. The flow hash
-(see RPS section above) is used to calculate the index into this table.
-The CPU recorded in each entry is the one which last processed the flow.
-If an entry does not hold a valid CPU, then packets mapped to that entry
-are steered using plain RPS. Multiple table entries may point to the
-same CPU. Indeed, with many flows and few CPUs, it is very likely that
-a single application thread handles flows with many different flow hashes.
-
-rps_sock_flow_table is a global flow table that contains the *desired* CPU
-for flows: the CPU that is currently processing the flow in userspace.
-Each table value is a CPU index that is updated during calls to recvmsg
-and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
-and tcp_splice_read()).
-
-When the scheduler moves a thread to a new CPU while it has outstanding
-receive packets on the old CPU, packets may arrive out of order. To
-avoid this, RFS uses a second flow table to track outstanding packets
-for each flow: rps_dev_flow_table is a table specific to each hardware
-receive queue of each device. Each table value stores a CPU index and a
-counter. The CPU index represents the *current* CPU onto which packets
-for this flow are enqueued for further kernel processing. Ideally, kernel
-and userspace processing occur on the same CPU, and hence the CPU index
-in both tables is identical. This is likely false if the scheduler has
-recently migrated a userspace thread while the kernel still has packets
-enqueued for kernel processing on the old CPU.
-
-The counter in rps_dev_flow_table values records the length of the current
-CPU's backlog when a packet in this flow was last enqueued. Each backlog
-queue has a head counter that is incremented on dequeue. A tail counter
-is computed as head counter + queue length. In other words, the counter
-in rps_dev_flow[i] records the last element in flow i that has
-been enqueued onto the currently designated CPU for flow i (of course,
-entry i is actually selected by hash and multiple flows may hash to the
-same entry i).
-
-And now the trick for avoiding out of order packets: when selecting the
-CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
-and the rps_dev_flow table of the queue that the packet was received on
-are compared. If the desired CPU for the flow (found in the
-rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
-table), the packet is enqueued onto that CPU’s backlog.
If they differ,
-the current CPU is updated to match the desired CPU if one of the
-following is true:
-
-- The current CPU's queue head counter >= the recorded tail counter
-  value in rps_dev_flow[i]
-- The current CPU is unset (>= nr_cpu_ids)
-- The current CPU is offline
-
-After this check, the packet is sent to the (possibly updated) current
-CPU. These rules aim to ensure that a flow only moves to a new CPU when
-there are no packets outstanding on the old CPU, as the outstanding
-packets could arrive later than those about to be processed on the new
-CPU.
-
-==== RFS Configuration
-
-RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on
-by default for SMP). The functionality remains disabled until explicitly
-configured. The number of entries in the global flow table is set through:
-
- /proc/sys/net/core/rps_sock_flow_entries
-
-The number of entries in the per-queue flow table are set through:
-
- /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
-
-== Suggested Configuration
-
-Both of these need to be set before RFS is enabled for a receive queue.
-Values for both are rounded up to the nearest power of two. The
-suggested flow count depends on the expected number of active connections
-at any given time, which may be significantly less than the number of open
-connections. We have found that a value of 32768 for rps_sock_flow_entries
-works fairly well on a moderately loaded server.
-
-For a single queue device, the rps_flow_cnt value for the single queue
-would normally be configured to the same value as rps_sock_flow_entries.
-For a multi-queue device, the rps_flow_cnt for each queue might be
-configured as rps_sock_flow_entries / N, where N is the number of
-queues. So for instance, if rps_sock_flow_entries is set to 32768 and there
-are 16 configured receive queues, rps_flow_cnt for each queue might be
-configured as 2048.
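The sizing rule in the removed "Suggested Configuration" text above (global table size divided by the number of receive queues, each value rounded up to a power of two) can be worked through with a short calculation. This is a sketch under stated assumptions: the function names `round_up_pow2` and `suggested_rfs_settings` and the device name `eth0` are hypothetical, and the code only computes values for the two paths named in the text; it does not write to procfs or sysfs.

```python
def round_up_pow2(n):
    """Return the smallest power of two that is >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def suggested_rfs_settings(dev, num_rx_queues, rps_sock_flow_entries=32768):
    """Map each RFS configuration path to a suggested value (illustrative only)."""
    if num_rx_queues < 1:
        raise ValueError("need at least one receive queue")
    # Global table: mirror the rounding the text describes.
    entries = round_up_pow2(rps_sock_flow_entries)
    # Per-queue table: rps_sock_flow_entries / N, again rounded up.
    per_queue = round_up_pow2(max(1, entries // num_rx_queues))
    settings = {"/proc/sys/net/core/rps_sock_flow_entries": entries}
    for n in range(num_rx_queues):
        settings[f"/sys/class/net/{dev}/queues/rx-{n}/rps_flow_cnt"] = per_queue
    return settings
```

With the defaults from the text, `suggested_rfs_settings("eth0", 16)` gives 32768 for the global table and 2048 for each of the 16 queues, matching the example in the paragraph.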
-
-
-Accelerated RFS
-===============
-
-Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load
-balancing mechanism that uses soft state to steer flows based on where
-the application thread consuming the packets of each flow is running.
-Accelerated RFS should perform better than RFS since packets are sent
-directly to a CPU local to the thread consuming the data. The target CPU
-will either be the same CPU where the application runs, or at least a CPU
-which is local to the application thread’s CPU in the cache hierarchy.
-
-To enable accelerated RFS, the networking stack calls the
-ndo_rx_flow_steer driver function to communicate the desired hardware
-queue for packets matching a particular flow. The network stack
-automatically calls this function every time a flow entry in
-rps_dev_flow_table is updated. The driver in turn uses a device specific
-method to program the NIC to steer the packets.
-
-The hardware queue for a flow is derived from the CPU recorded in
-rps_dev_flow_table. The stack consults a CPU to hardware queue map which
-is maintained by the NIC driver. This is an auto-generated reverse map of
-the IRQ affinity table shown by /proc/interrupts. Drivers can use
-functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
-to populate the map. For each CPU, the corresponding queue in the map is
-set to be one whose processing CPU is closest in cache locality.
-
-==== Accelerated RFS Configuration
-
-Accelerated RFS is only available if the kernel is compiled with
-CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
-It also requires that ntuple filtering is enabled via ethtool. The map
-of CPU to queues is automatically deduced from the IRQ affinities
-configured for each receive queue by the driver, so no additional
-configuration should be necessary.
-
-== Suggested Configuration
-
-This technique should be enabled whenever one wants to use RFS and the
-NIC supports hardware acceleration.
-
-XPS: Transmit Packet Steering
-=============================
-
-Transmit Packet Steering is a mechanism for intelligently selecting
-which transmit queue to use when transmitting a packet on a multi-queue
-device. This can be accomplished by recording two kinds of maps, either
-a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
-to hardware transmit queue(s).
-
-1. XPS using CPUs map
-
-The goal of this mapping is usually to assign queues
-exclusively to a subset of CPUs, where the transmit completions for
-these queues are processed on a CPU within this set. This choice
-provides two benefits. First, contention on the device queue lock is
-significantly reduced since fewer CPUs contend for the same queue
-(contention can be eliminated completely if each CPU has its own
-transmit queue). Secondly, cache miss rate on transmit completion is
-reduced, in particular for data cache lines that hold the sk_buff
-structures.
-
-2. XPS using receive queues map
-
-This mapping is used to pick transmit queue based on the receive
-queue(s) map configuration set by the administrator. A set of receive
-queues can be mapped to a set of transmit queues (many:many), although
-the common use case is a 1:1 mapping. This will enable sending packets
-on the same queue associations for transmit and receive. This is useful for
-busy polling multi-threaded workloads where there are challenges in
-associating a given CPU to a given application thread. The application
-threads are not pinned to CPUs and each thread handles packets
-received on a single queue. The receive queue number is cached in the
-socket for the connection. In this model, sending the packets on the same
-transmit queue corresponding to the associated receive queue has benefits
-in keeping the CPU overhead low. Transmit completion work is locked into
-the same queue-association that a given application is polling on. This
-avoids the overhead of triggering an interrupt on another CPU.
When the
-application cleans up the packets during the busy poll, transmit completion
-may be processed along with it in the same thread context and so result in
-reduced latency.
-
-XPS is configured per transmit queue by setting a bitmap of
-CPUs/receive-queues that may use that queue to transmit. The reverse
-mapping, from CPUs to transmit queues or from receive-queues to transmit
-queues, is computed and maintained for each network device. When
-transmitting the first packet in a flow, the function get_xps_queue() is
-called to select a queue. This function uses the ID of the receive queue
-for the socket connection for a match in the receive queue-to-transmit queue
-lookup table. Alternatively, this function can also use the ID of the
-running CPU as a key into the CPU-to-queue lookup table. If the
-ID matches a single queue, that is used for transmission. If multiple
-queues match, one is selected by using the flow hash to compute an index
-into the set. When selecting the transmit queue based on receive queue(s)
-map, the transmit device is not validated against the receive device as it
-requires expensive lookup operation in the datapath.
-
-The queue chosen for transmitting a particular flow is saved in the
-corresponding socket structure for the flow (e.g. a TCP connection).
-This transmit queue is used for subsequent packets sent on the flow to
-prevent out of order (ooo) packets. The choice also amortizes the cost
-of calling get_xps_queues() over all packets in the flow. To avoid
-ooo packets, the queue for a flow can subsequently only be changed if
-skb->ooo_okay is set for a packet in the flow. This flag indicates that
-there are no outstanding packets in the flow, so the transmit queue can
-change without the risk of generating out of order packets. The
-transport layer is responsible for setting ooo_okay appropriately. TCP,
-for instance, sets the flag when all data for a connection has been
-acknowledged.
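The queue selection described in the removed text above can be sketched in a few lines. This is a simplified Python model, not the kernel's get_xps_queue(): the dictionary-based maps, the function `select_tx_queue`, and the use of a plain modulo to turn the flow hash into an index are assumptions made for illustration. What it keeps from the text is the lookup order (receive-queue map first, CPU map as the fallback), the hash-based choice among several matching queues, and the rule that a cached queue may only change when ooo_okay is set.

```python
def get_xps_queue(cpu, rx_queue, flow_hash, cpu_map, rxq_map):
    """Return a transmit queue index, or None if neither map applies."""
    queues = rxq_map.get(rx_queue) if rx_queue is not None else None
    if not queues:
        # Receive-queue map does not apply: fall back to the CPUs map.
        queues = cpu_map.get(cpu)
    if not queues:
        return None
    if len(queues) == 1:
        return queues[0]
    # Several candidates: use the flow hash to index into the set
    # (modulo is an assumed reduction, chosen for simplicity).
    return queues[flow_hash % len(queues)]


def select_tx_queue(cached_queue, ooo_okay, cpu, rx_queue, flow_hash,
                    cpu_map, rxq_map):
    """Keep the queue cached for the flow unless ooo_okay allows a change."""
    if cached_queue is not None and not ooo_okay:
        return cached_queue
    new_queue = get_xps_queue(cpu, rx_queue, flow_hash, cpu_map, rxq_map)
    return new_queue if new_queue is not None else cached_queue
```

For example, with `cpu_map = {0: [0], 1: [1, 2]}` and `rxq_map = {3: [2]}`, a flow running on CPU 1 with no recorded receive queue is spread over queues 1 and 2 by its hash, while a flow whose socket recorded receive queue 3 always uses transmit queue 2.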
-
-==== XPS Configuration
-
-XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
-default for SMP). The functionality remains disabled until explicitly
-configured. To enable XPS, the bitmap of CPUs/receive-queues that may
-use a transmit queue is configured using the sysfs file entry:
-
-For selection based on CPUs map:
-/sys/class/net/<dev>/queues/tx-<n>/xps_cpus
-
-For selection based on receive-queues map:
-/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
-
-== Suggested Configuration
-
-For a network device with a single transmission queue, XPS configuration
-has no effect, since there is no choice in this case. In a multi-queue
-system, XPS is preferably configured so that each CPU maps onto one queue.
-If there are as many queues as there are CPUs in the system, then each
-queue can also map onto one CPU, resulting in exclusive pairings that
-experience no contention. If there are fewer queues than CPUs, then the
-best CPUs to share a given queue are probably those that share the cache
-with the CPU that processes transmit completions for that queue
-(transmit interrupts).
-
-For transmit queue selection based on receive queue(s), XPS has to be
-explicitly configured mapping receive-queue(s) to transmit queue(s). If the
-user configuration for receive-queue map does not apply, then the transmit
-queue is selected based on the CPUs map.
-
-Per TX Queue rate limitation:
-=============================
-
-These are rate-limitation mechanisms implemented by HW, where currently
-a max-rate attribute is supported, by setting a Mbps value to
-
-/sys/class/net/<dev>/queues/tx-<n>/tx_maxrate
-
-A value of zero means disabled, and this is the default.
-
-Further Information
-===================
-RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
-2.6.38. Original patches were submitted by Tom Herbert
-(therbert@google.com)
-
-Accelerated RFS was introduced in 2.6.35.
Original patches were
-submitted by Ben Hutchings (bwh@kernel.org)
-
-Authors:
-Tom Herbert (therbert@google.com)
-Willem de Bruijn (willemb@google.com)
diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
@@ -0,0 +1,184 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Segmentation Offloads
+=====================
+
+
+Introduction
+============
+
+This document describes a set of techniques in the Linux networking stack
+to take advantage of segmentation offload capabilities of various NICs.
+
+The following technologies are described:
+ * TCP Segmentation Offload - TSO
+ * UDP Fragmentation Offload - UFO
+ * IPIP, SIT, GRE, and UDP Tunnel Offloads
+ * Generic Segmentation Offload - GSO
+ * Generic Receive Offload - GRO
+ * Partial Generic Segmentation Offload - GSO_PARTIAL
+ * SCTP accelleration with GSO - GSO_BY_FRAGS
+
+
+TCP Segmentation Offload
+========================
+
+TCP segmentation allows a device to segment a single frame into multiple
+frames with a data payload size specified in skb_shinfo()->gso_size.
+When TCP segmentation requested the bit for either SKB_GSO_TCPV4 or
+SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
+skb_shinfo()->gso_size should be set to a non-zero value.
+
+TCP segmentation is dependent on support for the use of partial checksum
+offload. For this reason TSO is normally disabled if the Tx checksum
+offload for a given device is disabled.
+
+In order to support TCP segmentation offload it is necessary to populate
+the network and transport header offsets of the skbuff so that the device
+drivers will be able determine the offsets of the IP or IPv6 header and the
+TCP header. In addition as CHECKSUM_PARTIAL is required csum_start should
+also point to the TCP header of the packet.
+
+For IPv4 segmentation we support one of two types in terms of the IP ID.
+The default behavior is to increment the IP ID with every segment.
If the
+GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
+ID and all segments will use the same IP ID. If a device has
+NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
+and we will either increment the IP ID for all frames, or leave it at a
+static value based on driver preference.
+
+
+UDP Fragmentation Offload
+=========================
+
+UDP fragmentation offload allows a device to fragment an oversized UDP
+datagram into multiple IPv4 fragments. Many of the requirements for UDP
+fragmentation offload are the same as TSO. However the IPv4 ID for
+fragments should not increment as a single IPv4 datagram is fragmented.
+
+UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
+still receive them from tuntap and similar devices. Offload of UDP-based
+tunnel protocols is still supported.
+
+
+IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
+========================================================
+
+In addition to the offloads described above it is possible for a frame to
+contain additional headers such as an outer tunnel. In order to account
+for such instances an additional set of segmentation offload types were
+introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
+SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
+cases where there are more than just 1 set of headers. For example in the
+case of IPIP and SIT we should have the network and transport headers moved
+from the standard list of headers to "inner" header offsets.
+
+Currently only two levels of headers are supported. The convention is to
+refer to the tunnel headers as the outer headers, while the encapsulated
+data is normally referred to as the inner headers.
Below is the list of
+calls to access the given headers:
+
+IPIP/SIT Tunnel::
+
+             Outer                  Inner
+  MAC        skb_mac_header
+  Network    skb_network_header     skb_inner_network_header
+  Transport  skb_transport_header
+
+UDP/GRE Tunnel::
+
+             Outer                  Inner
+  MAC        skb_mac_header         skb_inner_mac_header
+  Network    skb_network_header     skb_inner_network_header
+  Transport  skb_transport_header   skb_inner_transport_header
+
+In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
+SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
+fact that the outer header also requests to have a non-zero checksum
+included in the outer header.
+
+Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
+header has requested a remote checksum offload. In this case the inner
+headers will be left with a partial checksum and only the outer header
+checksum will be computed.
+
+
+Generic Segmentation Offload
+============================
+
+Generic segmentation offload is a pure software offload that is meant to
+deal with cases where device drivers cannot perform the offloads described
+above. What occurs in GSO is that a given skbuff will have its data broken
+out over multiple skbuffs that have been resized to match the MSS provided
+via skb_shinfo()->gso_size.
+
+Before enabling any hardware segmentation offload a corresponding software
+offload is required in GSO. Otherwise it becomes possible for a frame to
+be re-routed between devices and end up being unable to be transmitted.
+
+
+Generic Receive Offload
+=======================
+
+Generic receive offload is the complement to GSO. Ideally any frame
+assembled by GRO should be segmented to create an identical sequence of
+frames using GSO, and any sequence of frames segmented by GSO should be
+able to be reassembled back to the original by GRO. The only exception to
+this is IPv4 ID in the case that the DF bit is set for a given IP header.
+If the value of the IPv4 ID is not sequentially incrementing it will be
+altered so that it is when a frame assembled via GRO is segmented via GSO.
+
+
+Partial Generic Segmentation Offload
+====================================
+
+Partial generic segmentation offload is a hybrid between TSO and GSO. What
+it effectively does is take advantage of certain traits of TCP and tunnels
+so that instead of having to rewrite the packet headers for each segment
+only the inner-most transport header and possibly the outer-most network
+header need to be updated. This allows devices that do not support tunnel
+offloads or tunnel offloads with checksum to still make use of segmentation.
+
+With the partial offload what occurs is that all headers excluding the
+inner transport header are updated such that they will contain the correct
+values for if the header was simply duplicated. The one exception to this
+is the outer IPv4 ID field. It is up to the device drivers to guarantee
+that the IPv4 ID field is incremented in the case that a given header does
+not have the DF bit set.
+
+
+SCTP accelleration with GSO
+===========================
+
+SCTP - despite the lack of hardware support - can still take advantage of
+GSO to pass one large packet through the network stack, rather than
+multiple small packets.
+
+This requires a different approach to other offloads, as SCTP packets
+cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
+IP segments, padding respected. So unlike regular GSO, SCTP can't just
+generate a big skb, set gso_size to the fragmentation point and deliver it
+to IP layer.
+
+Instead, the SCTP protocol layer builds an skb with the segments correctly
+padded and stored as chained skbs, and skb_segment() splits based on those.
+To signal this, gso_size is set to the special value GSO_BY_FRAGS.
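The difference between size-based segmentation and the GSO_BY_FRAGS case described above can be illustrated with a toy model. This Python sketch is not skb_segment(): the function name `segment`, the representation of a packet as a list of byte strings, and the sentinel value chosen for `GSO_BY_FRAGS` are assumptions made for the example.

```python
GSO_BY_FRAGS = 0xFFFF  # sentinel standing in for the kernel constant (assumed value)


def segment(frags, gso_size):
    """Split a packet, given as a list of pre-built chunks, into segments."""
    if gso_size == GSO_BY_FRAGS:
        # SCTP-style: every pre-built, already padded chunk becomes
        # exactly one segment; boundaries are never moved.
        return [bytes(f) for f in frags]
    # Regular GSO-style: treat the payload as one stream and cut it
    # every gso_size bytes (gso_size must be non-zero).
    data = b"".join(frags)
    return [data[i:i + gso_size] for i in range(0, len(data), gso_size)]
```

With `frags = [b"aaaa", b"bb", b"cccccc"]`, a `gso_size` of 5 cuts across chunk boundaries, whereas `GSO_BY_FRAGS` returns the three chunks unchanged, which is the property SCTP relies on.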
+
+Therefore, any code in the core networking stack must be aware of the
+possibility that gso_size will be GSO_BY_FRAGS and handle that case
+appropriately.
+
+There are some helpers to make this easier:
+
+- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
+  an skb is an SCTP GSO skb.
+
+- For size checks, the skb_gso_validate_*_len family of helpers correctly
+  considers GSO_BY_FRAGS.
+
+- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
+  will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
+
+This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
+set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt
@@ -1,170 +0,0 @@
-Segmentation Offloads in the Linux Networking Stack
-
-Introduction
-============
-
-This document describes a set of techniques in the Linux networking stack
-to take advantage of segmentation offload capabilities of various NICs.
-
-The following technologies are described:
- * TCP Segmentation Offload - TSO
- * UDP Fragmentation Offload - UFO
- * IPIP, SIT, GRE, and UDP Tunnel Offloads
- * Generic Segmentation Offload - GSO
- * Generic Receive Offload - GRO
- * Partial Generic Segmentation Offload - GSO_PARTIAL
- * SCTP accelleration with GSO - GSO_BY_FRAGS
-
-TCP Segmentation Offload
-========================
-
-TCP segmentation allows a device to segment a single frame into multiple
-frames with a data payload size specified in skb_shinfo()->gso_size.
-When TCP segmentation requested the bit for either SKB_GSO_TCPV4 or
-SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
-skb_shinfo()->gso_size should be set to a non-zero value.
-
-TCP segmentation is dependent on support for the use of partial checksum
-offload. For this reason TSO is normally disabled if the Tx checksum
-offload for a given device is disabled.
-
-In order to support TCP segmentation offload it is necessary to populate
-the network and transport header offsets of the skbuff so that the device
-drivers will be able determine the offsets of the IP or IPv6 header and the
-TCP header. In addition as CHECKSUM_PARTIAL is required csum_start should
-also point to the TCP header of the packet.
-
-For IPv4 segmentation we support one of two types in terms of the IP ID.
-The default behavior is to increment the IP ID with every segment. If the
-GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
-ID and all segments will use the same IP ID. If a device has
-NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
-and we will either increment the IP ID for all frames, or leave it at a
-static value based on driver preference.
-
-UDP Fragmentation Offload
-=========================
-
-UDP fragmentation offload allows a device to fragment an oversized UDP
-datagram into multiple IPv4 fragments. Many of the requirements for UDP
-fragmentation offload are the same as TSO. However the IPv4 ID for
-fragments should not increment as a single IPv4 datagram is fragmented.
-
-UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
-still receive them from tuntap and similar devices. Offload of UDP-based
-tunnel protocols is still supported.
-
-IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
-========================================================
-
-In addition to the offloads described above it is possible for a frame to
-contain additional headers such as an outer tunnel. In order to account
-for such instances an additional set of segmentation offload types were
-introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
-SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
-cases where there are more than just 1 set of headers.
For example in the
-case of IPIP and SIT we should have the network and transport headers moved
-from the standard list of headers to "inner" header offsets.
-
-Currently only two levels of headers are supported. The convention is to
-refer to the tunnel headers as the outer headers, while the encapsulated
-data is normally referred to as the inner headers. Below is the list of
-calls to access the given headers:
-
-IPIP/SIT Tunnel:
-           Outer                  Inner
-MAC        skb_mac_header
-Network    skb_network_header     skb_inner_network_header
-Transport  skb_transport_header
-
-UDP/GRE Tunnel:
-           Outer                  Inner
-MAC        skb_mac_header         skb_inner_mac_header
-Network    skb_network_header     skb_inner_network_header
-Transport  skb_transport_header   skb_inner_transport_header
-
-In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
-SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
-fact that the outer header also requests to have a non-zero checksum
-included in the outer header.
-
-Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
-header has requested a remote checksum offload. In this case the inner
-headers will be left with a partial checksum and only the outer header
-checksum will be computed.
-
-Generic Segmentation Offload
-============================
-
-Generic segmentation offload is a pure software offload that is meant to
-deal with cases where device drivers cannot perform the offloads described
-above. What occurs in GSO is that a given skbuff will have its data broken
-out over multiple skbuffs that have been resized to match the MSS provided
-via skb_shinfo()->gso_size.
-
-Before enabling any hardware segmentation offload a corresponding software
-offload is required in GSO. Otherwise it becomes possible for a frame to
-be re-routed between devices and end up being unable to be transmitted.
-
-Generic Receive Offload
-=======================
-
-Generic receive offload is the complement to GSO.
Ideally any frame
-assembled by GRO should be segmented to create an identical sequence of
-frames using GSO, and any sequence of frames segmented by GSO should be
-able to be reassembled back to the original by GRO. The only exception to
-this is IPv4 ID in the case that the DF bit is set for a given IP header.
-If the value of the IPv4 ID is not sequentially incrementing it will be
-altered so that it is when a frame assembled via GRO is segmented via GSO.
-
-Partial Generic Segmentation Offload
-====================================
-
-Partial generic segmentation offload is a hybrid between TSO and GSO. What
-it effectively does is take advantage of certain traits of TCP and tunnels
-so that instead of having to rewrite the packet headers for each segment
-only the inner-most transport header and possibly the outer-most network
-header need to be updated. This allows devices that do not support tunnel
-offloads or tunnel offloads with checksum to still make use of segmentation.
-
-With the partial offload what occurs is that all headers excluding the
-inner transport header are updated such that they will contain the correct
-values for if the header was simply duplicated. The one exception to this
-is the outer IPv4 ID field. It is up to the device drivers to guarantee
-that the IPv4 ID field is incremented in the case that a given header does
-not have the DF bit set.
-
-SCTP accelleration with GSO
-===========================
-
-SCTP - despite the lack of hardware support - can still take advantage of
-GSO to pass one large packet through the network stack, rather than
-multiple small packets.
-
-This requires a different approach to other offloads, as SCTP packets
-cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
-IP segments, padding respected. So unlike regular GSO, SCTP can't just
-generate a big skb, set gso_size to the fragmentation point and deliver it
-to IP layer.
-
-Instead, the SCTP protocol layer builds an skb with the segments correctly
-padded and stored as chained skbs, and skb_segment() splits based on those.
-To signal this, gso_size is set to the special value GSO_BY_FRAGS.
-
-Therefore, any code in the core networking stack must be aware of the
-possibility that gso_size will be GSO_BY_FRAGS and handle that case
-appropriately.
-
-There are some helpers to make this easier:
-
- - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
-   an skb is an SCTP GSO skb.
-
- - For size checks, the skb_gso_validate_*_len family of helpers correctly
-   considers GSO_BY_FRAGS.
-
- - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
-   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
-
-This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
-set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
@@ -443,7 +443,7 @@ In function prototypes, include parameter names with their data types.
 Although this is not required by the C language, it is preferred in Linux
 because it is a simple way to add valuable information for the reader.
 
-Do not use the `extern' keyword with function prototypes as this makes
+Do not use the ``extern`` keyword with function prototypes as this makes
 lines longer and isn't strictly necessary.
 
 
@@ -595,26 +595,43 @@ values.
To do the latter, you can stick the following in your .emacs file:
       (* (max steps 1)
          c-basic-offset)))
 
-  (add-hook 'c-mode-common-hook
-            (lambda ()
-              ;; Add kernel style
-              (c-add-style
-               "linux-tabs-only"
-               '("linux" (c-offsets-alist
-                          (arglist-cont-nonempty
-                           c-lineup-gcc-asm-reg
-                           c-lineup-arglist-tabs-only))))))
-
-  (add-hook 'c-mode-hook
-            (lambda ()
-              (let ((filename (buffer-file-name)))
-                ;; Enable kernel mode for the appropriate files
-                (when (and filename
-                           (string-match (expand-file-name "~/src/linux-trees")
-                                         filename))
-                  (setq indent-tabs-mode t)
-                  (setq show-trailing-whitespace t)
-                  (c-set-style "linux-tabs-only")))))
+  (dir-locals-set-class-variables
+   'linux-kernel
+   '((c-mode . (
+          (c-basic-offset . 8)
+          (c-label-minimum-indentation . 0)
+          (c-offsets-alist . (
+                  (arglist-close . c-lineup-arglist-tabs-only)
+                  (arglist-cont-nonempty .
+                      (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only))
+                  (arglist-intro . +)
+                  (brace-list-intro . +)
+                  (c . c-lineup-C-comments)
+                  (case-label . 0)
+                  (comment-intro . c-lineup-comment)
+                  (cpp-define-intro . +)
+                  (cpp-macro . -1000)
+                  (cpp-macro-cont . +)
+                  (defun-block-intro . +)
+                  (else-clause . 0)
+                  (func-decl-cont . +)
+                  (inclass . +)
+                  (inher-cont . c-lineup-multi-inher)
+                  (knr-argdecl-intro . 0)
+                  (label . -1000)
+                  (statement . 0)
+                  (statement-block-intro . +)
+                  (statement-case-intro . +)
+                  (statement-cont . +)
+                  (substatement . +)
+                  ))
+          (indent-tabs-mode . t)
+          (show-trailing-whitespace . t)
+          ))))
+
+  (dir-locals-set-directory-class
+   (expand-file-name "~/src/linux-trees")
+   'linux-kernel)
 
 This will make emacs go better with the kernel coding style for C
 files below ``~/src/linux-trees``.
@@ -921,7 +938,37 @@ result. Typical examples would be functions that return pointers; they use
 NULL or the ERR_PTR mechanism to report failure.
 
 
-17) Don't re-invent the kernel macros
+17) Using bool
+--------------
+
+The Linux kernel bool type is an alias for the C99 _Bool type.
bool values can
+only evaluate to 0 or 1, and implicit or explicit conversion to bool
+automatically converts the value to true or false. When using bool types the
+!! construction is not needed, which eliminates a class of bugs.
+
+When working with bool values the true and false definitions should be used
+instead of 1 and 0.
+
+bool function return types and stack variables are always fine to use whenever
+appropriate. Use of bool is encouraged to improve readability and is often a
+better option than 'int' for storing boolean values.
+
+Do not use bool if cache line layout or size of the value matters, as its size
+and alignment varies based on the compiled architecture. Structures that are
+optimized for alignment and size should not use bool.
+
+If a structure has many true/false values, consider consolidating them into a
+bitfield with 1 bit members, or using an appropriate fixed width type, such as
+u8.
+
+Similarly for function arguments, many true/false values can be consolidated
+into a single bitwise 'flags' argument and 'flags' can often be a more
+readable alternative if the call-sites have naked true/false constants.
+
+Otherwise limited use of bool in structures and arguments can improve
+readability.
+
+18) Don't re-invent the kernel macros
 -------------------------------------

 The header file include/linux/kernel.h contains a number of macros that
@@ -944,7 +991,7 @@ need them. Feel free to peruse that header file to see what else is already
 defined that you shouldn't reproduce in your code.


-18) Editor modelines and other cruft
+19) Editor modelines and other cruft
 ------------------------------------

 Some editors can interpret configuration information embedded in source files,
@@ -978,7 +1025,7 @@ own custom mode, or may have some other magic method for making indentation
 work correctly.
-19) Inline assembly
+20) Inline assembly
 -------------------

 In architecture-specific code, you may need to use inline assembly to interface
@@ -1010,7 +1057,7 @@ the next instruction in the assembly output:
              : /* outputs */ : /* inputs */ : /* clobbers */);


-20) Conditional Compilation
+21) Conditional Compilation
 ---------------------------

 Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c
diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
@@ -225,7 +225,7 @@ Cross-Reference project, which is able to present source code in a
 self-referential, indexed webpage format. An excellent up-to-date
 repository of the kernel code may be found at:

-        http://lxr.free-electrons.com/
+        https://elixir.bootlin.com/


 The development process
@@ -235,23 +235,21 @@ Linux kernel development process currently consists of a few different
 main kernel "branches" and lots of different subsystem-specific kernel
 branches. These different branches are:

-  - main 4.x kernel tree
-  - 4.x.y -stable kernel tree
-  - 4.x -git kernel patches
-  - subsystem specific kernel trees and patches
-  - the 4.x -next kernel tree for integration tests
+  - Linus's mainline tree
+  - Various stable trees with multiple major numbers
+  - Subsystem-specific trees
+  - linux-next integration testing tree

-4.x kernel tree
-~~~~~~~~~~~~~~~
+Mainline tree
+~~~~~~~~~~~~~

-4.x kernels are maintained by Linus Torvalds, and can be found on
-https://kernel.org in the pub/linux/kernel/v4.x/ directory. Its development
-process is as follows:
+Mainline tree are maintained by Linus Torvalds, and can be found at
+https://kernel.org or in the repo. Its development process is as follows:

   - As soon as a new kernel is released a two weeks window is open,
     during this period of time maintainers can submit big diffs to
     Linus, usually the patches that have already been included in the
-    -next kernel for a few weeks. The preferred way to submit big changes
+    linux-next for a few weeks.
The preferred way to submit big changes
     is using git (the kernel's source management tool, more information
     can be found at https://git-scm.com/) but plain patches are also just
     fine.
@@ -278,21 +276,19 @@ mailing list about kernel releases:
         released according to perceived bug status, not according to a
         preconceived timeline."*

-4.x.y -stable kernel tree
-~~~~~~~~~~~~~~~~~~~~~~~~~
+Various stable trees with multiple major numbers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Kernels with 3-part versions are -stable kernels. They contain
 relatively small and critical fixes for security problems or significant
-regressions discovered in a given 4.x kernel.
+regressions discovered in a given major mainline release, with the first
+2-part of version number are the same correspondingly.

 This is the recommended branch for users who want the most recent stable
 kernel and are not interested in helping test development/experimental
 versions.

-If no 4.x.y kernel is available, then the highest numbered 4.x
-kernel is the current stable kernel.
-
-4.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
+Stable trees are maintained by the "stable" team <stable@vger.kernel.org>, and
 are released as needs dictate. The normal release period is approximately
 two weeks, but it can be longer if there are no pressing problems. A
 security-related problem, instead, can cause a release to happen almost
@@ -302,17 +298,8 @@ The file :ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rule
 in the kernel tree documents what kinds of changes are acceptable for
 the -stable tree, and how the release process works.

-4.x -git patches
-~~~~~~~~~~~~~~~~
-
-These are daily snapshots of Linus' kernel tree which are managed in a
-git repository (hence the name.) These patches are usually released
-daily and represent the current state of Linus' tree.
They are more
-experimental than -rc kernels since they are generated automatically
-without even a cursory glance to see if they are sane.
-
-Subsystem Specific kernel trees and patches
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Subsystem-specific trees
+~~~~~~~~~~~~~~~~~~~~~~~~

 The maintainers of the various kernel subsystems --- and also many
 kernel subsystem developers --- expose their current state of
@@ -336,19 +323,19 @@ revisions to it, and maintainers can mark patches as under review,
 accepted, or rejected. Most of these patchwork sites are listed at
 https://patchwork.kernel.org/.

-4.x -next kernel tree for integration tests
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+linux-next integration testing tree
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Before updates from subsystem trees are merged into the mainline 4.x
-tree, they need to be integration-tested. For this purpose, a special
+Before updates from subsystem trees are merged into the mainline tree,
+they need to be integration-tested. For this purpose, a special
 testing repository exists into which virtually all subsystem trees are
 pulled on an almost daily basis:

         https://git.kernel.org/?p=linux/kernel/git/next/linux-next.git

-This way, the -next kernel gives a summary outlook onto what will be
+This way, the linux-next gives a summary outlook onto what will be
 expected to go into the mainline kernel at the next merge period.

-Adventurous testers are very welcome to runtime-test the -next kernel.
+Adventurous testers are very welcome to runtime-test the linux-next.


 Bug Reporting
diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst
@@ -565,7 +565,7 @@ Miscellaneous

     * Name: **Cross-Referencing Linux**

-      :URL: http://lxr.free-electrons.com/
+      :URL: https://elixir.bootlin.com/
       :Keywords: Browsing source code.
       :Description: Another web-based Linux kernel source code browser.
       Lots of cross references to variables and functions.
You can see
diff --git a/Documentation/process/license-rules.rst b/Documentation/process/license-rules.rst
@@ -62,7 +62,7 @@ License identifier syntax

    The SPDX license identifier in kernel files shall be added at the first
    possible line in a file which can contain a comment. For the majority
-   or files this is the first line, except for scripts which require the
+   of files this is the first line, except for scripts which require the
    '#!PATH_TO_INTERPRETER' in the first line. For those scripts the SPDX
    identifier goes into the second line.

@@ -368,7 +368,69 @@ kernel, can be broken down into:


 All SPDX license identifiers and exceptions must have a corresponding file
-in the LICENSE subdirectories. This is required to allow tool
+in the LICENSES subdirectories. This is required to allow tool
 verification (e.g. checkpatch.pl) and to have the licenses ready to read
 and extract right from the source, which is recommended by various FOSS
 organizations, e.g. the `FSFE REUSE initiative <https://reuse.software/>`_.
+
+_`MODULE_LICENSE`
+-----------------
+
+   Loadable kernel modules also require a MODULE_LICENSE() tag. This tag is
+   neither a replacement for proper source code license information
+   (SPDX-License-Identifier) nor in any way relevant for expressing or
+   determining the exact license under which the source code of the module
+   is provided.
+
+   The sole purpose of this tag is to provide sufficient information
+   whether the module is free software or proprietary for the kernel
+   module loader and for user space tools.
+
+   The valid license strings for MODULE_LICENSE() are:
+
+    ============================= =============================================
+    "GPL"                         Module is licensed under GPL version 2. This
+                                  does not express any distinction between
+                                  GPL-2.0-only or GPL-2.0-or-later. The exact
+                                  license information can only be determined
+                                  via the license information in the
+                                  corresponding source files.
+
+    "GPL v2"                      Same as "GPL". It exists for historic
+                                  reasons.
+
+    "GPL and additional rights"   Historical variant of expressing that the
+                                  module source is dual licensed under a
+                                  GPL v2 variant and MIT license. Please do
+                                  not use in new code.
+
+    "Dual MIT/GPL"                The correct way of expressing that the
+                                  module is dual licensed under a GPL v2
+                                  variant or MIT license choice.
+
+    "Dual BSD/GPL"                The module is dual licensed under a GPL v2
+                                  variant or BSD license choice. The exact
+                                  variant of the BSD license can only be
+                                  determined via the license information
+                                  in the corresponding source files.
+
+    "Dual MPL/GPL"                The module is dual licensed under a GPL v2
+                                  variant or Mozilla Public License (MPL)
+                                  choice. The exact variant of the MPL
+                                  license can only be determined via the
+                                  license information in the corresponding
+                                  source files.
+
+    "Proprietary"                 The module is under a proprietary license.
+                                  This string is solely for proprietary third
+                                  party modules and cannot be used for modules
+                                  which have their source code in the kernel
+                                  tree. Modules tagged that way are tainting
+                                  the kernel with the 'P' flag when loaded and
+                                  the kernel module loader refuses to link such
+                                  modules against symbols which are exported
+                                  with EXPORT_SYMBOL_GPL().
+    ============================= =============================================
+
+
+
diff --git a/Documentation/process/stable-api-nonsense.rst b/Documentation/process/stable-api-nonsense.rst
@@ -169,14 +169,13 @@ driver for every different kernel version for every distribution is a
 nightmare, and trying to keep up with an ever changing kernel interface
 is also a rough job.

-Simple, get your kernel driver into the main kernel tree (remember we
-are talking about GPL released drivers here, if your code doesn't fall
-under this category, good luck, you are on your own here, you leech
-<insert link to leech comment from Andrew and Linus here>.) If your
-driver is in the tree, and a kernel interface changes, it will be fixed
-up by the person who did the kernel change in the first place.
This
-ensures that your driver is always buildable, and works over time, with
-very little effort on your part.
+Simple, get your kernel driver into the main kernel tree (remember we are
+talking about drivers released under a GPL-compatible license here, if your
+code doesn't fall under this category, good luck, you are on your own here,
+you leech). If your driver is in the tree, and a kernel interface changes,
+it will be fixed up by the person who did the kernel change in the first
+place. This ensures that your driver is always buildable, and works over
+time, with very little effort on your part.

 The very good side effects of having your driver in the main kernel tree
 are:
diff --git a/Documentation/process/stable-kernel-rules.rst b/Documentation/process/stable-kernel-rules.rst
@@ -38,6 +38,9 @@ Procedure for submitting patches to the -stable tree
  - If the patch covers files in net/ or drivers/net please follow netdev stable
    submission guidelines as described in
    :ref:`Documentation/networking/netdev-FAQ.rst <netdev-FAQ>`
+   after first checking the stable networking queue at
+   https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive=
+   to ensure the requested patch is not already queued up.
  - Security patches should not be handled (solely) by the -stable review
    process but should follow the procedures in
    :ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>`.
@@ -98,9 +101,9 @@ text, like this:

     commit <sha1> upstream.

-Additionally, some patches submitted via Option 1 may have additional patch
-prerequisites which can be cherry-picked. This can be specified in the following
-format in the sign-off area:
+Additionally, some patches submitted via :ref:`option_1` may have additional
+patch prerequisites which can be cherry-picked. This can be specified in the
+following format in the sign-off area:

 ..
code-block:: none

diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
@@ -182,9 +182,11 @@ change five years from now.

 If your patch fixes a bug in a specific commit, e.g. you found an issue using
 ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
-the SHA-1 ID, and the one line summary. For example::
+the SHA-1 ID, and the one line summary. Do not split the tag across multiple
+lines, tags are exempt from the "wrap at 75 columns" rule in order to simplify
+parsing scripts. For example::

-        Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
+        Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed")

 The following ``git config`` settings can be used to add a pretty format for
 outputting the above style in the ``git log`` or ``git show`` commands::
diff --git a/Documentation/security/LSM-sctp.rst b/Documentation/security/LSM-sctp.rst
@@ -1,175 +0,0 @@
-SCTP LSM Support
-================
-
-For security module support, three SCTP specific hooks have been implemented::
-
-    security_sctp_assoc_request()
-    security_sctp_bind_connect()
-    security_sctp_sk_clone()
-
-Also the following security hook has been utilised::
-
-    security_inet_conn_established()
-
-The usage of these hooks are described below with the SELinux implementation
-described in ``Documentation/security/SELinux-sctp.rst``
-
-
-security_sctp_assoc_request()
------------------------------
-Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
-security module. Returns 0 on success, error on failure.
-::
-
-    @ep - pointer to sctp endpoint structure.
-    @skb - pointer to skbuff of association packet.
-
-
-security_sctp_bind_connect()
------------------------------
-Passes one or more ipv4/ipv6 addresses to the security module for validation
-based on the ``@optname`` that will result in either a bind or connect
-service as shown in the permission check tables below.
-Returns 0 on success, error on failure.
-::
-
-    @sk      - Pointer to sock structure.
-    @optname - Name of the option to validate.
-    @address - One or more ipv4 / ipv6 addresses.
-    @addrlen - The total length of address(s). This is calculated on each
-               ipv4 or ipv6 address using sizeof(struct sockaddr_in) or
-               sizeof(struct sockaddr_in6).
-
-  ------------------------------------------------------------------
-  |                        BIND Type Checks                        |
-  |       @optname             |         @address contains         |
-  |----------------------------|-----------------------------------|
-  | SCTP_SOCKOPT_BINDX_ADD     | One or more ipv4 / ipv6 addresses |
-  | SCTP_PRIMARY_ADDR          | Single ipv4 or ipv6 address       |
-  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address       |
-  ------------------------------------------------------------------
-
-  ------------------------------------------------------------------
-  |                      CONNECT Type Checks                       |
-  |       @optname             |         @address contains         |
-  |----------------------------|-----------------------------------|
-  | SCTP_SOCKOPT_CONNECTX      | One or more ipv4 / ipv6 addresses |
-  | SCTP_PARAM_ADD_IP          | One or more ipv4 / ipv6 addresses |
-  | SCTP_SENDMSG_CONNECT       | Single ipv4 or ipv6 address       |
-  | SCTP_PARAM_SET_PRIMARY     | Single ipv4 or ipv6 address       |
-  ------------------------------------------------------------------
-
-A summary of the ``@optname`` entries is as follows::
-
-    SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be
-                             associated after (optionally) calling
-                             bind(3).
-                             sctp_bindx(3) adds a set of bind
-                             addresses on a socket.
-
-    SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple
-                            addresses for reaching a peer
-                            (multi-homed).
-                            sctp_connectx(3) initiates a connection
-                            on an SCTP socket using multiple
-                            destination addresses.
-
-    SCTP_SENDMSG_CONNECT - Initiate a connection that is generated by a
-                           sendmsg(2) or sctp_sendmsg(3) on a new asociation.
-
-    SCTP_PRIMARY_ADDR - Set local primary address.
-
-    SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as
-                                 association primary.
-
-    SCTP_PARAM_ADD_IP          - These are used when Dynamic Address
-    SCTP_PARAM_SET_PRIMARY     - Reconfiguration is enabled as explained below.
-
-
-To support Dynamic Address Reconfiguration the following parameters must be
-enabled on both endpoints (or use the appropriate **setsockopt**\(2))::
-
-    /proc/sys/net/sctp/addip_enable
-    /proc/sys/net/sctp/addip_noauth_enable
-
-then the following *_PARAM_*'s are sent to the peer in an
-ASCONF chunk when the corresponding ``@optname``'s are present::
-
-          @optname                      ASCONF Parameter
-         ----------                    ------------------
-    SCTP_SOCKOPT_BINDX_ADD     ->   SCTP_PARAM_ADD_IP
-    SCTP_SET_PEER_PRIMARY_ADDR ->   SCTP_PARAM_SET_PRIMARY
-
-
-security_sctp_sk_clone()
--------------------------
-Called whenever a new socket is created by **accept**\(2)
-(i.e. a TCP style socket) or when a socket is 'peeled off' e.g userspace
-calls **sctp_peeloff**\(3).
-::
-
-    @ep - pointer to current sctp endpoint structure.
-    @sk - pointer to current sock structure.
-    @sk - pointer to new sock structure.
-
-
-security_inet_conn_established()
----------------------------------
-Called when a COOKIE ACK is received::
-
-    @sk  - pointer to sock structure.
-    @skb - pointer to skbuff of the COOKIE ACK packet.
-
-
-Security Hooks used for Association Establishment
-=================================================
-The following diagram shows the use of ``security_sctp_bind_connect()``,
-``security_sctp_assoc_request()``, ``security_inet_conn_established()`` when
-establishing an association.
-::
-
-      SCTP endpoint "A"                                SCTP endpoint "Z"
-      =================                                =================
-    sctp_sf_do_prm_asoc()
- Association setup can be initiated
- by a connect(2), sctp_connectx(3),
- sendmsg(2) or sctp_sendmsg(3).
- These will result in a call to
- security_sctp_bind_connect() to
- initiate an association to
- SCTP peer endpoint "Z".
-         INIT --------------------------------------------->
-                                                   sctp_sf_do_5_1B_init()
-                                                 Respond to an INIT chunk.
-                                             SCTP peer endpoint "A" is
-                                             asking for an association. Call
-                                             security_sctp_assoc_request()
-                                             to set the peer label if first
-                                             association.
-                                             If not first association, check
-                                             whether allowed, IF so send:
-          <----------------------------------------------- INIT ACK
-          |                                  ELSE audit event and silently
-          |                                       discard the packet.
-          |
-    COOKIE ECHO ------------------------------------------>
-                                                          |
-                                                          |
-                                                          |
-          <------------------------------------------- COOKIE ACK
-          |                                               |
-    sctp_sf_do_5_1E_ca                                    |
- Call security_inet_conn_established()                    |
- to set the peer label.                                   |
-          |                                               |
-          |                               If SCTP_SOCKET_TCP or peeled off
-          |                               socket security_sctp_sk_clone() is
-          |                               called to clone the new socket.
-          |                                               |
-      ESTABLISHED                                    ESTABLISHED
-          |                                               |
-    ------------------------------------------------------------------
-    |                     Association Established                    |
-    ------------------------------------------------------------------
-
-
diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst
@@ -11,4 +11,7 @@ that end users and distros can make a more informed decision about which
 LSMs suit their requirements.

 For extensive documentation on the available LSM hook interfaces, please
-see ``include/linux/lsm_hooks.h``.
+see ``include/linux/lsm_hooks.h`` and associated structures:
+
+.. kernel-doc:: include/linux/lsm_hooks.h
+   :internal:
diff --git a/Documentation/security/SCTP.rst b/Documentation/security/SCTP.rst
@@ -0,0 +1,343 @@
+..
SPDX-License-Identifier: GPL-2.0
+
+====
+SCTP
+====
+
+SCTP LSM Support
+================
+
+Security Hooks
+--------------
+
+For security module support, three SCTP specific hooks have been implemented::
+
+    security_sctp_assoc_request()
+    security_sctp_bind_connect()
+    security_sctp_sk_clone()
+
+Also the following security hook has been utilised::
+
+    security_inet_conn_established()
+
+The usage of these hooks are described below with the SELinux implementation
+described in the `SCTP SELinux Support`_ chapter.
+
+
+security_sctp_assoc_request()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
+security module. Returns 0 on success, error on failure.
+::
+
+    @ep - pointer to sctp endpoint structure.
+    @skb - pointer to skbuff of association packet.
+
+
+security_sctp_bind_connect()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Passes one or more ipv4/ipv6 addresses to the security module for validation
+based on the ``@optname`` that will result in either a bind or connect
+service as shown in the permission check tables below.
+Returns 0 on success, error on failure.
+::
+
+    @sk      - Pointer to sock structure.
+    @optname - Name of the option to validate.
+    @address - One or more ipv4 / ipv6 addresses.
+    @addrlen - The total length of address(s). This is calculated on each
+               ipv4 or ipv6 address using sizeof(struct sockaddr_in) or
+               sizeof(struct sockaddr_in6).
+
+  ------------------------------------------------------------------
+  |                        BIND Type Checks                        |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_BINDX_ADD     | One or more ipv4 / ipv6 addresses |
+  | SCTP_PRIMARY_ADDR          | Single ipv4 or ipv6 address       |
+  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+  ------------------------------------------------------------------
+  |                      CONNECT Type Checks                       |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_CONNECTX      | One or more ipv4 / ipv6 addresses |
+  | SCTP_PARAM_ADD_IP          | One or more ipv4 / ipv6 addresses |
+  | SCTP_SENDMSG_CONNECT       | Single ipv4 or ipv6 address       |
+  | SCTP_PARAM_SET_PRIMARY     | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+A summary of the ``@optname`` entries is as follows::
+
+    SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be
+                             associated after (optionally) calling
+                             bind(3).
+                             sctp_bindx(3) adds a set of bind
+                             addresses on a socket.
+
+    SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple
+                            addresses for reaching a peer
+                            (multi-homed).
+                            sctp_connectx(3) initiates a connection
+                            on an SCTP socket using multiple
+                            destination addresses.
+
+    SCTP_SENDMSG_CONNECT - Initiate a connection that is generated by a
+                           sendmsg(2) or sctp_sendmsg(3) on a new asociation.
+
+    SCTP_PRIMARY_ADDR - Set local primary address.
+
+    SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as
+                                 association primary.
+
+    SCTP_PARAM_ADD_IP          - These are used when Dynamic Address
+    SCTP_PARAM_SET_PRIMARY     - Reconfiguration is enabled as explained below.
+
+
+To support Dynamic Address Reconfiguration the following parameters must be
+enabled on both endpoints (or use the appropriate **setsockopt**\(2))::
+
+    /proc/sys/net/sctp/addip_enable
+    /proc/sys/net/sctp/addip_noauth_enable
+
+then the following *_PARAM_*'s are sent to the peer in an
+ASCONF chunk when the corresponding ``@optname``'s are present::
+
+          @optname                      ASCONF Parameter
+         ----------                    ------------------
+    SCTP_SOCKOPT_BINDX_ADD     ->   SCTP_PARAM_ADD_IP
+    SCTP_SET_PEER_PRIMARY_ADDR ->   SCTP_PARAM_SET_PRIMARY
+
+
+security_sctp_sk_clone()
+~~~~~~~~~~~~~~~~~~~~~~~~
+Called whenever a new socket is created by **accept**\(2)
+(i.e. a TCP style socket) or when a socket is 'peeled off' e.g userspace
+calls **sctp_peeloff**\(3).
+::
+
+    @ep - pointer to current sctp endpoint structure.
+    @sk - pointer to current sock structure.
+    @sk - pointer to new sock structure.
+
+
+security_inet_conn_established()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Called when a COOKIE ACK is received::
+
+    @sk  - pointer to sock structure.
+    @skb - pointer to skbuff of the COOKIE ACK packet.
+
+
+Security Hooks used for Association Establishment
+-------------------------------------------------
+
+The following diagram shows the use of ``security_sctp_bind_connect()``,
+``security_sctp_assoc_request()``, ``security_inet_conn_established()`` when
+establishing an association.
+::
+
+      SCTP endpoint "A"                                SCTP endpoint "Z"
+      =================                                =================
+    sctp_sf_do_prm_asoc()
+ Association setup can be initiated
+ by a connect(2), sctp_connectx(3),
+ sendmsg(2) or sctp_sendmsg(3).
+ These will result in a call to
+ security_sctp_bind_connect() to
+ initiate an association to
+ SCTP peer endpoint "Z".
+         INIT --------------------------------------------->
+                                                   sctp_sf_do_5_1B_init()
+                                                 Respond to an INIT chunk.
+                                             SCTP peer endpoint "A" is
+                                             asking for an association. Call
+                                             security_sctp_assoc_request()
+                                             to set the peer label if first
+                                             association.
+                                             If not first association, check
+                                             whether allowed, IF so send:
+          <----------------------------------------------- INIT ACK
+          |                                  ELSE audit event and silently
+          |                                       discard the packet.
+          |
+    COOKIE ECHO ------------------------------------------>
+                                                          |
+                                                          |
+                                                          |
+          <------------------------------------------- COOKIE ACK
+          |                                               |
+    sctp_sf_do_5_1E_ca                                    |
+ Call security_inet_conn_established()                    |
+ to set the peer label.                                   |
+          |                                               |
+          |                               If SCTP_SOCKET_TCP or peeled off
+          |                               socket security_sctp_sk_clone() is
+          |                               called to clone the new socket.
+          |                                               |
+      ESTABLISHED                                    ESTABLISHED
+          |                                               |
+    ------------------------------------------------------------------
+    |                     Association Established                    |
+    ------------------------------------------------------------------
+
+
+SCTP SELinux Support
+====================
+
+Security Hooks
+--------------
+
+The `SCTP LSM Support`_ chapter above describes the following SCTP security
+hooks with the SELinux specifics expanded below::
+
+    security_sctp_assoc_request()
+    security_sctp_bind_connect()
+    security_sctp_sk_clone()
+    security_inet_conn_established()
+
+
+security_sctp_assoc_request()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
+security module. Returns 0 on success, error on failure.
+::
+
+    @ep - pointer to sctp endpoint structure.
+    @skb - pointer to skbuff of association packet.
+
+The security module performs the following operations:
+     IF this is the first association on ``@ep->base.sk``, then set the peer
+     sid to that in ``@skb``. This will ensure there is only one peer sid
+     assigned to ``@ep->base.sk`` that may support multiple associations.
+
+     ELSE validate the ``@ep->base.sk peer_sid`` against the ``@skb peer sid``
+     to determine whether the association should be allowed or denied.
+
+     Set the sctp ``@ep sid`` to socket's sid (from ``ep->base.sk``) with
+     MLS portion taken from ``@skb peer sid``.
This will be used by SCTP
+     TCP style sockets and peeled off connections as they cause a new socket
+     to be generated.
+
+     If IP security options are configured (CIPSO/CALIPSO), then the ip
+     options are set on the socket.
+
+
+security_sctp_bind_connect()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Checks permissions required for ipv4/ipv6 addresses based on the ``@optname``
+as follows::
+
+  ------------------------------------------------------------------
+  |                     BIND Permission Checks                     |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_BINDX_ADD     | One or more ipv4 / ipv6 addresses |
+  | SCTP_PRIMARY_ADDR          | Single ipv4 or ipv6 address       |
+  | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+  ------------------------------------------------------------------
+  |                   CONNECT Permission Checks                    |
+  |       @optname             |         @address contains         |
+  |----------------------------|-----------------------------------|
+  | SCTP_SOCKOPT_CONNECTX      | One or more ipv4 / ipv6 addresses |
+  | SCTP_PARAM_ADD_IP          | One or more ipv4 / ipv6 addresses |
+  | SCTP_SENDMSG_CONNECT       | Single ipv4 or ipv6 address       |
+  | SCTP_PARAM_SET_PRIMARY     | Single ipv4 or ipv6 address       |
+  ------------------------------------------------------------------
+
+
+`SCTP LSM Support`_ gives a summary of the ``@optname``
+entries and also describes ASCONF chunk processing when Dynamic Address
+Reconfiguration is enabled.
+
+
+security_sctp_sk_clone()
+~~~~~~~~~~~~~~~~~~~~~~~~
+Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style
+socket) or when a socket is 'peeled off' e.g userspace calls
+**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new
+sockets sid and peer sid to that contained in the ``@ep sid`` and
+``@ep peer sid`` respectively.
+::
+
+    @ep - pointer to current sctp endpoint structure.
+    @sk - pointer to current sock structure.
+    @sk - pointer to new sock structure.
+
+
+security_inet_conn_established()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Called when a COOKIE ACK is received where it sets the connection's peer sid
+to that in ``@skb``::
+
+    @sk  - pointer to sock structure.
+    @skb - pointer to skbuff of the COOKIE ACK packet.
+
+
+Policy Statements
+-----------------
+The following class and permissions to support SCTP are available within the
+kernel::
+
+    class sctp_socket inherits socket { node_bind }
+
+whenever the following policy capability is enabled::
+
+    policycap extended_socket_class;
+
+SELinux SCTP support adds the ``name_connect`` permission for connecting
+to a specific port type and the ``association`` permission that is explained
+in the section below.
+
+If userspace tools have been updated, SCTP will support the ``portcon``
+statement as shown in the following example::
+
+    portcon sctp 1024-1036 system_u:object_r:sctp_ports_t:s0
+
+
+SCTP Peer Labeling
+------------------
+An SCTP socket will only have one peer label assigned to it. This will be
+assigned during the establishment of the first association. Any further
+associations on this socket will have their packet peer label compared to
+the sockets peer label, and only if they are different will the
+``association`` permission be validated. This is validated by checking the
+socket peer sid against the received packets peer sid to determine whether
+the association should be allowed or denied.
+
+NOTES:
+   1) If peer labeling is not enabled, then the peer context will always be
+      ``SECINITSID_UNLABELED`` (``unlabeled_t`` in Reference Policy).
+
+   2) As SCTP can support more than one transport address per endpoint
+      (multi-homing) on a single socket, it is possible to configure policy
+      and NetLabel to provide different peer labels for each of these. As the
+      socket peer label is determined by the first associations transport
+      address, it is recommended that all peer labels are consistent.
+
+   3) **getpeercon**\(3) may be used by userspace to retrieve the sockets peer
+      context.
+
+   4) While not SCTP specific, be aware when using NetLabel that if a label
+      is assigned to a specific interface, and that interface 'goes down',
+      then the NetLabel service will remove the entry. Therefore ensure that
+      the network startup scripts call **netlabelctl**\(8) to set the required
+      label (see **netlabel-config**\(8) helper script for details).
+
+   5) The NetLabel SCTP peer labeling rules apply as discussed in the following
+      set of posts tagged "netlabel" at: http://www.paul-moore.com/blog/t.
+
+   6) CIPSO is only supported for IPv4 addressing: ``socket(AF_INET, ...)``
+      CALIPSO is only supported for IPv6 addressing: ``socket(AF_INET6, ...)``
+
+      Note the following when testing CIPSO/CALIPSO:
+         a) CIPSO will send an ICMP packet if an SCTP packet cannot be
+            delivered because of an invalid label.
+         b) CALIPSO does not send an ICMP packet, just silently discards it.
+
+   7) IPSEC is not supported as RFC 3554 - sctp/ipsec support has not been
+      implemented in userspace (**racoon**\(8) or **ipsec_pluto**\(8)),
+      although the kernel supports SCTP/IPSEC.
diff --git a/Documentation/security/SELinux-sctp.rst b/Documentation/security/SELinux-sctp.rst
@@ -1,158 +0,0 @@
-SCTP SELinux Support
-=====================
-
-Security Hooks
-===============
-
-``Documentation/security/LSM-sctp.rst`` describes the following SCTP security
-hooks with the SELinux specifics expanded below::
-
-    security_sctp_assoc_request()
-    security_sctp_bind_connect()
-    security_sctp_sk_clone()
-    security_inet_conn_established()
-
-
-security_sctp_assoc_request()
------------------------------
-Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the
-security module. Returns 0 on success, error on failure.
-::
-
-    @ep - pointer to sctp endpoint structure.
-    @skb - pointer to skbuff of association packet.
- -The security module performs the following operations: - IF this is the first association on ``@ep->base.sk``, then set the peer - sid to that in ``@skb``. This will ensure there is only one peer sid - assigned to ``@ep->base.sk`` that may support multiple associations. - - ELSE validate the ``@ep->base.sk peer_sid`` against the ``@skb peer sid`` - to determine whether the association should be allowed or denied. - - Set the sctp ``@ep sid`` to socket's sid (from ``ep->base.sk``) with - MLS portion taken from ``@skb peer sid``. This will be used by SCTP - TCP style sockets and peeled off connections as they cause a new socket - to be generated. - - If IP security options are configured (CIPSO/CALIPSO), then the ip - options are set on the socket. - - -security_sctp_bind_connect() ------------------------------ -Checks permissions required for ipv4/ipv6 addresses based on the ``@optname`` -as follows:: - - ------------------------------------------------------------------ - | BIND Permission Checks | - | @optname | @address contains | - |----------------------------|-----------------------------------| - | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses | - | SCTP_PRIMARY_ADDR | Single ipv4 or ipv6 address | - | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address | - ------------------------------------------------------------------ - - ------------------------------------------------------------------ - | CONNECT Permission Checks | - | @optname | @address contains | - |----------------------------|-----------------------------------| - | SCTP_SOCKOPT_CONNECTX | One or more ipv4 / ipv6 addresses | - | SCTP_PARAM_ADD_IP | One or more ipv4 / ipv6 addresses | - | SCTP_SENDMSG_CONNECT | Single ipv4 or ipv6 address | - | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address | - ------------------------------------------------------------------ - - -``Documentation/security/LSM-sctp.rst`` gives a summary of the ``@optname`` -entries and also describes 
ASCONF chunk processing when Dynamic Address -Reconfiguration is enabled. - - -security_sctp_sk_clone() -------------------------- -Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style -socket) or when a socket is 'peeled off' e.g userspace calls -**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new -sockets sid and peer sid to that contained in the ``@ep sid`` and -``@ep peer sid`` respectively. -:: - - @ep - pointer to current sctp endpoint structure. - @sk - pointer to current sock structure. - @sk - pointer to new sock structure. - - -security_inet_conn_established() ---------------------------------- -Called when a COOKIE ACK is received where it sets the connection's peer sid -to that in ``@skb``:: - - @sk - pointer to sock structure. - @skb - pointer to skbuff of the COOKIE ACK packet. - - -Policy Statements -================== -The following class and permissions to support SCTP are available within the -kernel:: - - class sctp_socket inherits socket { node_bind } - -whenever the following policy capability is enabled:: - - policycap extended_socket_class; - -SELinux SCTP support adds the ``name_connect`` permission for connecting -to a specific port type and the ``association`` permission that is explained -in the section below. - -If userspace tools have been updated, SCTP will support the ``portcon`` -statement as shown in the following example:: - - portcon sctp 1024-1036 system_u:object_r:sctp_ports_t:s0 - - -SCTP Peer Labeling -=================== -An SCTP socket will only have one peer label assigned to it. This will be -assigned during the establishment of the first association. Any further -associations on this socket will have their packet peer label compared to -the sockets peer label, and only if they are different will the -``association`` permission be validated. 
This is validated by checking the -socket peer sid against the received packets peer sid to determine whether -the association should be allowed or denied. - -NOTES: - 1) If peer labeling is not enabled, then the peer context will always be - ``SECINITSID_UNLABELED`` (``unlabeled_t`` in Reference Policy). - - 2) As SCTP can support more than one transport address per endpoint - (multi-homing) on a single socket, it is possible to configure policy - and NetLabel to provide different peer labels for each of these. As the - socket peer label is determined by the first associations transport - address, it is recommended that all peer labels are consistent. - - 3) **getpeercon**\(3) may be used by userspace to retrieve the sockets peer - context. - - 4) While not SCTP specific, be aware when using NetLabel that if a label - is assigned to a specific interface, and that interface 'goes down', - then the NetLabel service will remove the entry. Therefore ensure that - the network startup scripts call **netlabelctl**\(8) to set the required - label (see **netlabel-config**\(8) helper script for details). - - 5) The NetLabel SCTP peer labeling rules apply as discussed in the following - set of posts tagged "netlabel" at: http://www.paul-moore.com/blog/t. - - 6) CIPSO is only supported for IPv4 addressing: ``socket(AF_INET, ...)`` - CALIPSO is only supported for IPv6 addressing: ``socket(AF_INET6, ...)`` - - Note the following when testing CIPSO/CALIPSO: - a) CIPSO will send an ICMP packet if an SCTP packet cannot be - delivered because of an invalid label. - b) CALIPSO does not send an ICMP packet, just silently discards it. - - 7) IPSEC is not supported as RFC 3554 - sctp/ipsec support has not been - implemented in userspace (**racoon**\(8) or **ipsec_pluto**\(8)), - although the kernel supports SCTP/IPSEC. 
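The peer labeling rule in the "SCTP Peer Labeling" section above can be sketched as a small model. This is only an illustration of the rule as the text states it, not kernel or SELinux code: the class `SctpSocketModel`, the method `on_association` and the `association_allowed` callback are hypothetical names, and the policy lookup is reduced to a caller-supplied function. The model accepts the first association unconditionally; any other checks the kernel performs at that point are out of scope here.

```python
from typing import Callable, Optional


class SctpSocketModel:
    """Toy model of one SCTP socket's peer label (hypothetical, not kernel code)."""

    def __init__(self, association_allowed: Callable[[str, str], bool]) -> None:
        # No peer label exists until the first association is established.
        self.peer_sid: Optional[str] = None
        # Stand-in for the policy check of the ``association`` permission.
        self._association_allowed = association_allowed

    def on_association(self, packet_peer_sid: str) -> bool:
        """Return True if the association would be accepted in this model."""
        if self.peer_sid is None:
            # First association: its packet label becomes the socket peer label.
            self.peer_sid = packet_peer_sid
            return True
        if packet_peer_sid == self.peer_sid:
            # Same label as the socket: no ``association`` check is needed.
            return True
        # Labels differ: the outcome depends on the ``association`` permission.
        return self._association_allowed(self.peer_sid, packet_peer_sid)
```

Note that the socket keeps the label of the first association even when a later association with a different label is accepted, which is why note 2 recommends consistent peer labels across transport addresses.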
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst @@ -9,7 +9,6 @@ Security Documentation IMA-templates keys/index LSM - LSM-sctp - SELinux-sctp + SCTP self-protection tpm/index diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt @@ -159,7 +159,7 @@ particularly the CPU hotplug lock (in order to avoid races against CPUs being brought in the kernel while the kernel is getting patched). Calling the static key API from within a hotplug notifier is thus a sure deadlock recipe. In order to still allow use of the -functionnality, the following functions are provided: +functionality, the following functions are provided: static_key_enable_cpuslocked() static_key_disable_cpuslocked() diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt @@ -95,7 +95,7 @@ show up in /proc/sys/kernel: - stop-a [ SPARC only ] - sysrq ==> Documentation/admin-guide/sysrq.rst - sysctl_writes_strict -- tainted +- tainted ==> Documentation/admin-guide/tainted-kernels.rst - threads-max - unknown_nmi_panic - watchdog @@ -1031,39 +1031,33 @@ compilation sees a 1% slowdown, other systems and workloads may vary. 1: kernel stack erasing is enabled (default), it is performed before returning to the userspace at the end of syscalls. - ============================================================== -tainted: +tainted Non-zero if the kernel has been tainted. Numeric values, which can be ORed together. The letters are seen in "Tainted" line of Oops reports. - 1 (P): A module with a non-GPL license has been loaded, this - includes modules with no license. - Set by modutils >= 2.4.9 and module-init-tools. - 2 (F): A module was force loaded by insmod -f. - Set by modutils >= 2.4.9 and module-init-tools. - 4 (S): Unsafe SMP processors: SMP with CPUs not designed for SMP. - 8 (R): A module was forcibly unloaded from the system by rmmod -f. - 16 (M): A hardware machine check error occurred on the system. 
- 32 (B): A bad page was discovered on the system. - 64 (U): The user has asked that the system be marked "tainted". This - could be because they are running software that directly modifies - the hardware, or for other reasons. - 128 (D): The system has died. - 256 (A): The ACPI DSDT has been overridden with one supplied by the user - instead of using the one provided by the hardware. - 512 (W): A kernel warning has occurred. - 1024 (C): A module from drivers/staging was loaded. - 2048 (I): The system is working around a severe firmware bug. - 4096 (O): An out-of-tree module has been loaded. - 8192 (E): An unsigned module has been loaded in a kernel supporting module - signature. - 16384 (L): A soft lockup has previously occurred on the system. - 32768 (K): The kernel has been live patched. - 65536 (X): Auxiliary taint, defined and used by for distros. -131072 (T): The kernel was built with the struct randomization plugin. + 1 (P): proprietary module was loaded + 2 (F): module was force loaded + 4 (S): SMP kernel oops on an officially SMP incapable processor + 8 (R): module was force unloaded + 16 (M): processor reported a Machine Check Exception (MCE) + 32 (B): bad page referenced or some unexpected page flags + 64 (U): taint requested by userspace application + 128 (D): kernel died recently, i.e. there was an OOPS or BUG + 256 (A): an ACPI table was overridden by user + 512 (W): kernel issued warning + 1024 (C): staging driver was loaded + 2048 (I): workaround for bug in platform firmware applied + 4096 (O): externally-built ("out-of-tree") module was loaded + 8192 (E): unsigned module was loaded + 16384 (L): soft lockup occurred + 32768 (K): kernel has been live patched + 65536 (X): Auxiliary taint, defined and used by for distros +131072 (T): The kernel was built with the struct randomization plugin + +See Documentation/admin-guide/tainted-kernels.rst for more information. 
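Because the taint values are ORed together, a single number can carry several flags. The sketch below decodes such a number using the values and letters from the new table in this hunk. It is not part of the commit: the names `TAINT_FLAGS` and `decode_taint` are made up for illustration, and the descriptions are shortened from the table.

```python
# Hypothetical helper, not part of the kernel tree: maps each taint bit
# value to its letter and a short description taken from the table above.
TAINT_FLAGS = [
    (1, "P", "proprietary module was loaded"),
    (2, "F", "module was force loaded"),
    (4, "S", "SMP kernel oops on an officially SMP incapable processor"),
    (8, "R", "module was force unloaded"),
    (16, "M", "processor reported a Machine Check Exception (MCE)"),
    (32, "B", "bad page referenced or some unexpected page flags"),
    (64, "U", "taint requested by userspace application"),
    (128, "D", "kernel died recently, i.e. there was an OOPS or BUG"),
    (256, "A", "an ACPI table was overridden by user"),
    (512, "W", "kernel issued warning"),
    (1024, "C", "staging driver was loaded"),
    (2048, "I", "workaround for bug in platform firmware applied"),
    (4096, "O", "externally-built (out-of-tree) module was loaded"),
    (8192, "E", "unsigned module was loaded"),
    (16384, "L", "soft lockup occurred"),
    (32768, "K", "kernel has been live patched"),
    (65536, "X", "auxiliary taint, defined for and used by distros"),
    (131072, "T", "kernel was built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return the (bit value, letter, description) entries set in `value`."""
    return [entry for entry in TAINT_FLAGS if value & entry[0]]


# Example: 4097 = 4096 + 1, i.e. an out-of-tree and a proprietary module.
for bit, letter, description in decode_taint(4097):
    print(f"{bit:>6} ({letter}): {description}")
```

On a running system the input would typically be the integer read from `/proc/sys/kernel/tainted`; the function itself takes a plain number so it can be tried without touching `/proc`.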
============================================================== diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt @@ -237,7 +237,7 @@ used: cat (1234): drop_caches: 3 These are informational only. They do not mean that anything is wrong -with your system. To disable them, echo 4 (bit 3) into drop_caches. +with your system. To disable them, echo 4 (bit 2) into drop_caches. ============================================================== diff --git a/Documentation/timers/highres.txt b/Documentation/timers/highres.txt @@ -231,7 +231,7 @@ in the idle period to make sure that jiffies are up to date and the interrupt handler has not to deal with an eventually stale jiffy value. The dynamic tick feature provides statistical values which are exported to -userspace via /proc/stats and can be made available for enhanced power +userspace via /proc/stat and can be made available for enhanced power management control. The implementation leaves room for further development like full tickless diff --git a/Documentation/translations/it_IT/doc-guide/sphinx.rst b/Documentation/translations/it_IT/doc-guide/sphinx.rst @@ -3,6 +3,8 @@ .. note:: Per leggere la documentazione originale in inglese: :ref:`Documentation/doc-guide/index.rst <doc_guide>` +.. _it_sphinxdoc: + Introduzione ============ diff --git a/Documentation/translations/it_IT/process/applying-patches.rst b/Documentation/translations/it_IT/process/applying-patches.rst @@ -1,13 +1,15 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/applying-patches.rst <applying_patches>` - +:Translator: Federico Vaga <federico.vaga@vaga.pv.it> .. _it_applying_patches: -Applicare modifiche al kernel Linux -=================================== +Applicare patch al kernel Linux ++++++++++++++++++++++++++++++++ -.. warning:: +.. note:: - TODO ancora da tradurre + Questo documento è obsoleto. Nella maggior parte dei casi, piuttosto + che usare ``patch`` manualmente, vorrete usare Git. 
Per questo motivo + il documento non verrà tradotto. diff --git a/Documentation/translations/it_IT/process/changes.rst b/Documentation/translations/it_IT/process/changes.rst @@ -1,12 +1,495 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/changes.rst <changes>` +:Translator: Federico Vaga <federico.vaga@vaga.pv.it> .. _it_changes: Requisiti minimi per compilare il kernel ++++++++++++++++++++++++++++++++++++++++ -.. warning:: +Introduzione +============ - TODO ancora da tradurre +Questo documento fornisce una lista dei software necessari per eseguire i +kernel 4.x. + +Questo documento è basato sul file "Changes" del kernel 2.0.x e quindi le +persone che lo scrissero meritano credito (Jared Mauch, Axel Boldt, +Alessandro Sigala, e tanti altri nella rete). + +Requisiti minimi correnti +************************* + +Prima di pensare d'avere trovato un baco, aggiornate i seguenti programmi +**almeno** alla versione indicata! Se non siete certi della versione che state +usando, il comando indicato dovrebbe dirvelo. + +Questa lista presume che abbiate già un kernel Linux funzionante. In aggiunta, +non tutti gli strumenti sono necessari ovunque; ovviamente, se non avete un +modem ISDN, per esempio, probabilmente non dovreste preoccuparvi di +isdn4k-utils. 
+ +====================== ================= ======================================== + Programma Versione minima Comando per verificare la versione +====================== ================= ======================================== +GNU C 4.6 gcc --version +GNU make 3.81 make --version +binutils 2.20 ld -v +flex 2.5.35 flex --version +bison 2.0 bison --version +util-linux 2.10o fdformat --version +kmod 13 depmod -V +e2fsprogs 1.41.4 e2fsck -V +jfsutils 1.1.3 fsck.jfs -V +reiserfsprogs 3.6.3 reiserfsck -V +xfsprogs 2.6.0 xfs_db -V +squashfs-tools 4.0 mksquashfs -version +btrfs-progs 0.18 btrfsck +pcmciautils 004 pccardctl -V +quota-tools 3.09 quota -V +PPP 2.4.0 pppd --version +isdn4k-utils 3.1pre1 isdnctrl 2>&1|grep version +nfs-utils 1.0.5 showmount --version +procps 3.2.0 ps --version +oprofile 0.9 oprofiled --version +udev 081 udevd --version +grub 0.93 grub --version || grub-install --version +mcelog 0.6 mcelog --version +iptables 1.4.2 iptables -V +openssl & libcrypto 1.0.0 openssl version +bc 1.06.95 bc --version +Sphinx\ [#f1]_ 1.3 sphinx-build --version +====================== ================= ======================================== + +.. [#f1] Sphinx è necessario solo per produrre la documentazione del Kernel + +Compilazione del kernel +*********************** + +GCC +--- + +La versione necessaria di gcc potrebbe variare a seconda del tipo di CPU nel +vostro calcolatore. + +Make +---- + +Per compilare il kernel vi servirà GNU make 3.81 o successivo. + +Binutils +-------- + +Il sistema di compilazione, dalla versione 4.13, per la produzione dei passi +intermedi, si è convertito all'uso di *thin archive* (`ar T`) piuttosto che +all'uso del *linking* incrementale (`ld -r`). Questo richiede binutils 2.20 o +successivo. + +pkg-config +---------- + +Il sistema di compilazione, dalla versione 4.18, richiede pkg-config per +verificare l'esistenza degli strumenti kconfig e per determinare le +impostazioni da usare in 'make {g,x}config'. 
Precedentemente pkg-config +veniva usato ma non verificato o documentato. + +Flex +---- + +Dalla versione 4.16, il sistema di compilazione, durante l'esecuzione, genera +un analizzatore lessicale. Questo richiede flex 2.5.35 o successivo. + +Bison +----- + +Dalla versione 4.16, il sistema di compilazione, durante l'esecuzione, genera +un parsificatore. Questo richiede bison 2.0 o successivo. + +Perl +---- + +Per compilare il kernel vi servirà perl 5 e i seguenti moduli ``Getopt::Long``, +``Getopt::Std``, ``File::Basename``, e ``File::Find``. + +BC +-- + +Vi servirà bc per compilare i kernel dal 3.10 in poi. + +OpenSSL +------- + +Il programma OpenSSL e la libreria crypto vengono usati per la firma dei moduli +e la gestione dei certificati; sono usati per la creazione della chiave e +la generazione della firma. + +Se la firma dei moduli è abilitata, allora vi servirà openssl per compilare il +kernel 3.7 e successivi. Vi serviranno anche i pacchetti di sviluppo di +openssl per compilare il kernel 4.3 o successivi. + + +Strumenti di sistema +******************** + +Modifiche architetturali +------------------------ + +DevFS è stato reso obsoleto da udev +(http://www.kernel.org/pub/linux/utils/kernel/hotplug/) + +Il supporto per UID a 32-bit è ora disponibile. Divertitevi! + +La documentazione delle funzioni in Linux è una fase di transizione +verso una documentazione integrata nei sorgenti stessi usando dei commenti +formattati in modo speciale e posizionati vicino alle funzioni che descrivono. +Al fine di arricchire la documentazione, questi commenti possono essere +combinati con i file ReST presenti in Documentation/; questi potranno +poi essere convertiti in formato PostScript, HTML, LaTex, ePUB o PDF. +Per convertire i documenti da ReST al formato che volete, avete bisogno di +Sphinx. 
+ +Util-linux +---------- + +Le versioni più recenti di util-linux: forniscono il supporto a ``fdisk`` per +dischi di grandi dimensioni; supportano le nuove opzioni di mount; riconoscono +più tipi di partizioni; hanno un fdformat che funziona con i kernel 2.4; +e altre chicche. Probabilmente vorrete aggiornarlo. + +Ksymoops +-------- + +Se l'impensabile succede e il kernel va in oops, potrebbe servirvi lo strumento +ksymoops per decodificarlo, ma nella maggior parte dei casi non vi servirà. +Generalmente è preferibile compilare il kernel con l'opzione ``CONFIG_KALLSYMS`` +cosicché venga prodotto un output più leggibile che può essere usato così com'è +(produce anche un output migliore di ksymoops). Se per qualche motivo il +vostro kernel non è stato compilato con ``CONFIG_KALLSYMS`` e non avete modo di +ricompilarlo e riprodurre l'oops con quell'opzione abilitata, allora potete +usare ksymoops per decodificare l'oops. + +Mkinitrd +-------- + +I cambiamenti della struttura in ``/lib/modules`` necessita l'aggiornamento di +mkinitrd. + +E2fsprogs +--------- + +L'ultima versione di ``e2fsprogs`` corregge diversi bachi in fsck e debugfs. +Ovviamente, aggiornarlo è una buona idea. + +JFSutils +-------- + +Il pacchetto ``jfsutils`` contiene programmi per il file-system JFS. +Sono disponibili i seguenti strumenti: + +- ``fsck.jfs`` - avvia la ripetizione del log delle transizioni, e verifica e + ripara una partizione formattata secondo JFS + +- ``mkfs.jfs`` - crea una partizione formattata secondo JFS + +- sono disponibili altri strumenti per il file-system. + +Reiserfsprogs +------------- + +Il pacchetto reiserfsprogs dovrebbe essere usato con reiserfs-3.6.x (Linux +kernel 2.4.x). Questo è un pacchetto combinato che contiene versioni +funzionanti di ``mkreiserfs``, ``resize_reiserfs``, ``debugreiserfs`` e +``reiserfsck``. Questi programmi funzionano sulle piattaforme i386 e alpha. 
+ +Xfsprogs +-------- + +L'ultima versione di ``xfsprogs`` contiene, fra i tanti, i programmi +``mkfs.xfs``, ``xfs_db`` e ``xfs_repair`` per il file-system XFS. +Dipendono dell'architettura e qualsiasi versione dalla 2.0.0 in poi +dovrebbe funzionare correttamente con la versione corrente del codice +XFS nel kernel (sono raccomandate le versioni 2.6.0 o successive per via +di importanti miglioramenti). + +PCMCIAutils +----------- + +PCMCIAutils sostituisce ``pcmica-cs``. Serve ad impostare correttamente i +connettori PCMCIA all'avvio del sistema e a caricare i moduli necessari per +i dispositivi a 16-bit se il kernel è stato modularizzato e il sottosistema +hotplug è in uso. + +Quota-tools +----------- + +Il supporto per uid e gid a 32 bit richiedono l'uso della versione 2 del +formato quota. La versione 3.07 e successive di quota-tools supportano +questo formato. Usate la versione raccomandata nella lista qui sopra o una +successiva. + +Micro codice per Intel IA32 +--------------------------- + +Per poter aggiornare il micro codice per Intel IA32, è stato aggiunto un +apposito driver; il driver è accessibile come un normale dispositivo a +caratteri (misc). Se non state usando udev probabilmente sarà necessario +eseguire i seguenti comandi come root prima di poterlo aggiornare:: + + mkdir /dev/cpu + mknod /dev/cpu/microcode c 10 184 + chmod 0644 /dev/cpu/microcode + +Probabilmente, vorrete anche il programma microcode_ctl da usare con questo +dispositivo. + +udev +---- + +``udev`` è un programma in spazio utente il cui scopo è quello di popolare +dinamicamente la cartella ``/dev`` coi dispositivi effettivamente presenti. +``udev`` sostituisce le funzionalità base di devfs, consentendo comunque +nomi persistenti per i dispositivi. + +FUSE +---- + +Serve libfuse 2.4.0 o successiva. Il requisito minimo assoluto è 2.3.0 ma +le opzioni di mount ``direct_io`` e ``kernel_cache`` non funzioneranno. 
+ + +Rete +**** + +Cambiamenti generali +-------------------- + +Se per quanto riguarda la configurazione di rete avete esigenze di un certo +livello dovreste prendere in considerazione l'uso degli strumenti in ip-route2. + +Filtro dei pacchetti / NAT +-------------------------- + +Il codice per filtraggio dei pacchetti e il NAT fanno uso degli stessi +strumenti come nelle versioni del kernel antecedenti la 2.4.x (iptables). +Include ancora moduli di compatibilità per 2.2.x ipchains e 2.0.x ipdwadm. + +PPP +--- + +Il driver per PPP è stato ristrutturato per supportare collegamenti multipli e +per funzionare su diversi livelli. Se usate PPP, aggiornate pppd almeno alla +versione 2.4.0. + +Se non usate udev, dovete avere un file /dev/ppp che può essere creato da root +col seguente comando:: + + mknod /dev/ppp c 108 0 + +Isdn4k-utils +------------ + +Per via della modifica del campo per il numero di telefono, il pacchetto +isdn4k-utils dev'essere ricompilato o (preferibilmente) aggiornato. + +NFS-utils +--------- + +Nei kernel più antichi (2.4 e precedenti), il server NFS doveva essere +informato sui clienti ai quali si voleva fornire accesso via NFS. Questa +informazione veniva passata al kernel quando un cliente montava un file-system +mediante ``mountd``, oppure usando ``exportfs`` all'avvio del sistema. +exportfs prende le informazioni circa i clienti attivi da ``/var/lib/nfs/rmtab``. + +Questo approccio è piuttosto delicato perché dipende dalla correttezza di +rmtab, che non è facile da garantire, in particolare quando si cerca di +implementare un *failover*. Anche quando il sistema funziona bene, ``rmtab`` +ha il problema di accumulare vecchie voci inutilizzate. + +Sui kernel più recenti il kernel ha la possibilità di informare mountd quando +arriva una richiesta da una macchina sconosciuta, e mountd può dare al kernel +le informazioni corrette per l'esportazione. 
Questo rimuove la dipendenza con +``rmtab`` e significa che il kernel deve essere al corrente solo dei clienti +attivi. + +Per attivare questa funzionalità, dovete eseguire il seguente comando prima di +usare exportfs o mountd:: + + mount -t nfsd nfsd /proc/fs/nfsd + +Dove possibile, raccomandiamo di proteggere tutti i servizi NFS dall'accesso +via internet mediante un firewall. + +mcelog +------ + +Quando ``CONFIG_x86_MCE`` è attivo, il programma mcelog processa e registra +gli eventi *machine check*. Gli eventi *machine check* sono errori riportati +dalla CPU. Incoraggiamo l'analisi di questi errori. + + +Documentazione del kernel +************************* + +Sphinx +------ + +Per i dettaglio sui requisiti di Sphinx, fate riferimento a :ref:`it_sphinx_install` +in :ref:`Documentation/translations/it_IT/doc-guide/sphinx.rst <it_sphinxdoc>` + +Ottenere software aggiornato +============================ + +Compilazione del kernel +*********************** + +gcc +--- + +- <ftp://ftp.gnu.org/gnu/gcc/> + +Make +---- + +- <ftp://ftp.gnu.org/gnu/make/> + +Binutils +-------- + +- <https://www.kernel.org/pub/linux/devel/binutils/> + +Flex +---- + +- <https://github.com/westes/flex/releases> + +Bison +----- + +- <ftp://ftp.gnu.org/gnu/bison/> + +OpenSSL +------- + +- <https://www.openssl.org/> + +Strumenti di sistema +******************** + +Util-linux +---------- + +- <https://www.kernel.org/pub/linux/utils/util-linux/> + +Kmod +---- + +- <https://www.kernel.org/pub/linux/utils/kernel/kmod/> +- <https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git> + +Ksymoops +-------- + +- <https://www.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/> + +Mkinitrd +-------- + +- <https://code.launchpad.net/initrd-tools/main> + +E2fsprogs +--------- + +- <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz> + +JFSutils +-------- + +- <http://jfs.sourceforge.net/> + +Reiserfsprogs +------------- + +- <http://www.kernel.org/pub/linux/utils/fs/reiserfs/> + +Xfsprogs 
+-------- + +- <ftp://oss.sgi.com/projects/xfs/> + +Pcmciautils +----------- + +- <https://www.kernel.org/pub/linux/utils/kernel/pcmcia/> + +Quota-tools +----------- + +- <http://sourceforge.net/projects/linuxquota/> + + +Microcodice Intel P6 +-------------------- + +- <https://downloadcenter.intel.com/> + +udev +---- + +- <http://www.freedesktop.org/software/systemd/man/udev.html> + +FUSE +---- + +- <https://github.com/libfuse/libfuse/releases> + +mcelog +------ + +- <http://www.mcelog.org/> + +Rete +**** + +PPP +--- + +- <ftp://ftp.samba.org/pub/ppp/> + +Isdn4k-utils +------------ + +- <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/> + +NFS-utils +--------- + +- <http://sourceforge.net/project/showfiles.php?group_id=14> + +Iptables +-------- + +- <http://www.iptables.org/downloads.html> + +Ip-route2 +--------- + +- <https://www.kernel.org/pub/linux/utils/net/iproute2/> + +OProfile +-------- + +- <http://oprofile.sf.net/download/> + +NFS-Utils +--------- + +- <http://nfs.sourceforge.net/> + +Documentazione del kernel +************************* + +Sphinx +------ + +- <http://www.sphinx-doc.org/> diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst @@ -449,6 +449,9 @@ Nonostante questo non sia richiesto dal linguaggio C, in Linux viene preferito perché è un modo semplice per aggiungere informazioni importanti per il lettore. +Non usate la parola chiave ``extern`` coi prototipi di funzione perché +rende le righe più lunghe e non è strettamente necessario. 
+ 7) Centralizzare il ritorno delle funzioni ------------------------------------------ @@ -600,26 +603,43 @@ segue nel vostro file .emacs: (* (max steps 1) c-basic-offset))) - (add-hook 'c-mode-common-hook - (lambda () - ;; Add kernel style - (c-add-style - "linux-tabs-only" - '("linux" (c-offsets-alist - (arglist-cont-nonempty - c-lineup-gcc-asm-reg - c-lineup-arglist-tabs-only)))))) - - (add-hook 'c-mode-hook - (lambda () - (let ((filename (buffer-file-name))) - ;; Enable kernel mode for the appropriate files - (when (and filename - (string-match (expand-file-name "~/src/linux-trees") - filename)) - (setq indent-tabs-mode t) - (setq show-trailing-whitespace t) - (c-set-style "linux-tabs-only"))))) + (dir-locals-set-class-variables + 'linux-kernel + '((c-mode . ( + (c-basic-offset . 8) + (c-label-minimum-indentation . 0) + (c-offsets-alist . ( + (arglist-close . c-lineup-arglist-tabs-only) + (arglist-cont-nonempty . + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (arglist-intro . +) + (brace-list-intro . +) + (c . c-lineup-C-comments) + (case-label . 0) + (comment-intro . c-lineup-comment) + (cpp-define-intro . +) + (cpp-macro . -1000) + (cpp-macro-cont . +) + (defun-block-intro . +) + (else-clause . 0) + (func-decl-cont . +) + (inclass . +) + (inher-cont . c-lineup-multi-inher) + (knr-argdecl-intro . 0) + (label . -1000) + (statement . 0) + (statement-block-intro . +) + (statement-case-intro . +) + (statement-cont . +) + (substatement . +) + )) + (indent-tabs-mode . t) + (show-trailing-whitespace . t) + )))) + + (dir-locals-set-directory-class + (expand-file-name "~/src/linux-trees") + 'linux-kernel) Questo farà funzionare meglio emacs con lo stile del kernel per i file che si trovano nella cartella ``~/src/linux-trees``. @@ -929,7 +949,40 @@ qualche valore fuori dai limiti. Un tipico esempio è quello delle funzioni che ritornano un puntatore; queste utilizzano NULL o ERR_PTR come meccanismo di notifica degli errori. 
-17) Non reinventate le macro del kernel +17) L'uso di bool +----------------- + +Nel kernel Linux il tipo bool deriva dal tipo _Bool dello standard C99. +Un valore bool può assumere solo i valori 0 o 1, e implicitamente o +esplicitamente la conversione a bool converte i valori in vero (*true*) o +falso (*false*). Quando si usa un tipo bool il costrutto !! non sarà più +necessario, e questo va ad eliminare una certa serie di bachi. + +Quando si usano i valori booleani, dovreste utilizzare le definizioni di true +e false al posto dei valori 1 e 0. + +Per il valore di ritorno delle funzioni e per le variabili sullo stack, l'uso +del tipo bool è sempre appropriato. L'uso di bool viene incoraggiato per +migliorare la leggibilità e spesso è molto meglio di 'int' nella gestione di +valori booleani. + +Non usate bool se per voi sono importanti l'ordine delle righe di cache o +la loro dimensione; la dimensione e l'allineamento cambia a seconda +dell'architettura per la quale è stato compilato. Le strutture che sono state +ottimizzate per l'allineamento o la dimensione non dovrebbero usare bool. + +Se una struttura ha molti valori true/false, considerate l'idea di raggrupparli +in un intero usando campi da 1 bit, oppure usate un tipo dalla larghezza fissa, +come u8. + +Come per gli argomenti delle funzioni, molti valori true/false possono essere +raggruppati in un singolo argomento a bit denominato 'flags'; spesso 'flags' è +un'alternativa molto più leggibile se si hanno valori costanti per true/false. + +Detto ciò, un uso parsimonioso di bool nelle strutture dati e negli argomenti +può migliorare la leggibilità. + +18) Non reinventate le macro del kernel --------------------------------------- Il file di intestazione include/linux/kernel.h contiene un certo numero @@ -953,7 +1006,7 @@ rigido sui tipi. Sentitevi liberi di leggere attentamente questo file d'intestazione per scoprire cos'altro è stato definito che non dovreste reinventare nel vostro codice. 
-18) Linee di configurazione degli editor e altre schifezze +19) Linee di configurazione degli editor e altre schifezze ----------------------------------------------------------- Alcuni editor possono interpretare dei parametri di configurazione integrati @@ -987,8 +1040,8 @@ d'indentazione e di modalità d'uso. Le persone potrebbero aver configurato una modalità su misura, oppure potrebbero avere qualche altra magia per far funzionare bene l'indentazione. -19) Inline assembly ---------------------- +20) Inline assembly +------------------- Nel codice specifico per un'architettura, potreste aver bisogno di codice *inline assembly* per interfacciarvi col processore o con una funzionalità @@ -1020,7 +1073,7 @@ al fine di allineare correttamente l'assembler che verrà generato: "more_magic %reg2, %reg3" : /* outputs */ : /* inputs */ : /* clobbers */); -20) Compilazione sotto condizione +21) Compilazione sotto condizione --------------------------------- Ovunque sia possibile, non usate le direttive condizionali del preprocessore diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst @@ -234,7 +234,7 @@ il progetto Linux Cross-Reference, che è in grado di presentare codice sorgente in un formato autoreferenziale ed indicizzato. Un eccellente ed aggiornata fonte di consultazione del codice del kernel la potete trovare qui: - http://lxr.free-electrons.com/ + https://elixir.bootlin.com/ Il processo di sviluppo @@ -244,7 +244,6 @@ e di molti altri rami per specifici sottosistemi. Questi rami sono: - I sorgenti kernel 4.x - I sorgenti stabili del kernel 4.x.y -stable - - Le modifiche in 4.x -git - Sorgenti dei sottosistemi del kernel e le loro modifiche - Il kernel 4.x -next per test d'integrazione @@ -313,16 +312,6 @@ Il file Documentation/process/stable-kernel-rules.rst (nei sorgenti) documenta quali tipologie di modifiche sono accettate per i sorgenti -stable, e come avviene il processo di rilascio. 
-Le modifiche in 4.x -git -~~~~~~~~~~~~~~~~~~~~~~~~ - -Queste sono istantanee quotidiane del kernel di Linus e sono gestite in -una repositorio git (da qui il nome). Queste modifiche sono solitamente -rilasciate giornalmente e rappresentano l'attuale stato dei sorgenti di -Linus. Queste sono da considerarsi più sperimentali di un -rc in quanto -generate automaticamente senza nemmeno aver dato una rapida occhiata -per verificarne lo stato. - Sorgenti dei sottosistemi del kernel e le loro patch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/translations/it_IT/process/stable-api-nonsense.rst b/Documentation/translations/it_IT/process/stable-api-nonsense.rst @@ -1,13 +1,209 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/stable-api-nonsense.rst <stable_api_nonsense>` - +:Translator: Federico Vaga <federico.vaga@vaga.pv.it> .. _it_stable_api_nonsense: L'interfaccia dei driver per il kernel Linux ============================================ -.. warning:: +(tutte le risposte alle vostre domande e altro) + +Greg Kroah-Hartman <greg@kroah.com> + +Questo è stato scritto per cercare di spiegare perché Linux **non ha +un'interfaccia binaria, e non ha nemmeno un'interfaccia stabile**. + +.. note:: + + Questo articolo parla di interfacce **interne al kernel**, non delle + interfacce verso lo spazio utente. + + L'interfaccia del kernel verso lo spazio utente è quella usata dai + programmi, ovvero le chiamate di sistema. Queste interfacce sono **molto** + stabili nel tempo e non verranno modificate. Ho vecchi programmi che sono + stati compilati su un kernel 0.9 (circa) e tuttora funzionano sulle versioni + 2.6 del kernel. Queste interfacce sono quelle che gli utenti e i + programmatori possono considerare stabili. + +Riepilogo generale +------------------ + +Pensate di volere un'interfaccia del kernel stabile, ma in realtà non la +volete, e nemmeno sapete di non volerla. 
Quello che volete è un driver +stabile che funzioni, e questo può essere ottenuto solo se il driver si trova +nei sorgenti del kernel. Ci sono altri vantaggi nell'avere il proprio driver +nei sorgenti del kernel, ognuno dei quali hanno reso Linux un sistema operativo +robusto, stabile e maturo; questi sono anche i motivi per cui avete scelto +Linux. + +Introduzione +------------ + +Solo le persone un po' strambe vorrebbero scrivere driver per il kernel con +la costante preoccupazione per i cambiamenti alle interfacce interne. Per il +resto del mondo, queste interfacce sono invisibili o non di particolare +interesse. + +Innanzitutto, non tratterò **alcun** problema legale riguardante codice +chiuso, nascosto, avvolto, blocchi binari, o qualsia altra cosa che descrive +driver che non hanno i propri sorgenti rilasciati con licenza GPL. Per favore +fate riferimento ad un avvocato per qualsiasi questione legale, io sono un +programmatore e perciò qui vi parlerò soltanto delle questioni tecniche (non +per essere superficiali sui problemi legali, sono veri e dovete esserne a +conoscenza in ogni circostanza). + +Dunque, ci sono due tematiche principali: interfacce binarie del kernel e +interfacce stabili nei sorgenti. Ognuna dipende dall'altra, ma discuteremo +prima delle cose binarie per toglierle di mezzo. + +Interfaccia binaria del kernel +------------------------------ + +Supponiamo d'avere un'interfaccia stabile nei sorgenti del kernel, di +conseguenza un'interfaccia binaria dovrebbe essere anche'essa stabile, giusto? +Sbagliato. Prendete in considerazione i seguenti fatti che riguardano il +kernel Linux: + + - A seconda della versione del compilatore C che state utilizzando, diverse + strutture dati del kernel avranno un allineamento diverso, e possibilmente + un modo diverso di includere le funzioni (renderle inline oppure no). + L'organizzazione delle singole funzioni non è poi così importante, ma la + spaziatura (*padding*) nelle strutture dati, invece, lo è. 
+ + - In base alle opzioni che sono state selezionate per generare il kernel, + un certo numero di cose potrebbero succedere: + + - strutture dati differenti potrebbero contenere campi differenti + - alcune funzioni potrebbero non essere implementate (per esempio, + alcuni *lock* spariscono se compilati su sistemi mono-processore) + - la memoria interna del kernel può essere allineata in differenti modi + a seconda delle opzioni di compilazione. + + - Linux funziona su una vasta gamma di architetture di processore. Non esiste + alcuna possibilità che il binario di un driver per un'architettura funzioni + correttamente su un'altra. + +Alcuni di questi problemi possono essere risolti compilando il proprio modulo +con la stessa identica configurazione del kernel, ed usando la stessa versione +del compilatore usato per compilare il kernel. Questo è sufficiente se volete +fornire un modulo per uno specifico rilascio su una specifica distribuzione +Linux. Ma moltiplicate questa singola compilazione per il numero di +distribuzioni Linux e il numero dei rilasci supportati da quest'ultime e vi +troverete rapidamente in un incubo fatto di configurazioni e piattaforme +hardware (differenti processori con differenti opzioni); dunque, anche per il +singolo rilascio di un modulo, dovreste creare differenti versioni dello +stesso. + +Fidatevi, se tenterete questa via, col tempo, diventerete pazzi; l'ho imparato +a mie spese molto tempo fa... + + +Interfaccia stabile nei sorgenti del kernel +------------------------------------------- + +Se parlate con le persone che cercano di mantenere aggiornato un driver per +Linux ma che non si trova nei sorgenti, allora per queste persone l'argomento +sarà "ostico". + +Lo sviluppo del kernel Linux è continuo e viaggia ad un ritmo sostenuto, e non +rallenta mai. Perciò, gli sviluppatori del kernel trovano bachi nelle +interfacce attuali, o trovano modi migliori per fare le cose. Se le trovano, +allora le correggeranno per migliorarle. 
In questo frangente, i nomi delle +funzioni potrebbero cambiare, le strutture dati potrebbero diventare più grandi +o più piccole, e gli argomenti delle funzioni potrebbero essere ripensati. +Se questo dovesse succedere, nello stesso momento, tutte le istanze dove questa +interfaccia viene utilizzata verranno corrette, garantendo che tutto continui +a funzionare senza problemi. + +Portiamo ad esempio l'interfaccia interna per il sottosistema USB che ha subito +tre ristrutturazioni nel corso della sua vita. Queste ristrutturazioni furono +fatte per risolvere diversi problemi: + + - È stato fatto un cambiamento da un flusso di dati sincrono ad uno + asincrono. Questo ha ridotto la complessità di molti driver e ha + aumentato la capacità di trasmissione di tutti i driver fino a raggiungere + quasi la velocità massima possibile. + - È stato fatto un cambiamento nell'allocazione dei pacchetti da parte del + sottosistema USB per conto dei driver, cosicché ora i driver devono fornire + più informazioni al sottosistema USB al fine di correggere un certo numero + di stalli. + +Questo è completamente l'opposto di quello che succede in alcuni sistemi +operativi proprietari che hanno dovuto mantenere, nel tempo, il supporto alle +vecchie interfacce USB. I nuovi sviluppatori potrebbero usare accidentalmente +le vecchie interfacce e sviluppare codice nel modo sbagliato, portando, di +conseguenza, all'instabilità del sistema. + +In entrambe gli scenari, gli sviluppatori hanno ritenuto che queste importanti +modifiche erano necessarie, e quindi le hanno fatte con qualche sofferenza. +Se Linux avesse assicurato di mantenere stabile l'interfaccia interna, si +sarebbe dovuto procedere alla creazione di una nuova, e quelle vecchie, e +mal funzionanti, avrebbero dovuto ricevere manutenzione, creando lavoro +aggiuntivo per gli sviluppatori del sottosistema USB. 
Dato che gli +sviluppatori devono dedicare il proprio tempo a questo genere di lavoro, +chiedergli di dedicarne dell'altro, senza benefici, magari gratuitamente, non +è contemplabile. + +Le problematiche relative alla sicurezza sono molto importanti per Linux. +Quando viene trovato un problema di sicurezza viene corretto in breve tempo. +A volte, per prevenire il problema di sicurezza, si sono dovute cambiare +delle interfacce interne al kernel. Quando è successo, allo stesso tempo, +tutti i driver che usavano quelle interfacce sono stati aggiornati, garantendo +la correzione definitiva del problema senza doversi preoccupare di rivederlo +per sbaglio in futuro. Se non si fossero cambiate le interfacce interne, +sarebbe stato impossibile correggere il problema e garantire che non si sarebbe +più ripetuto. + +Nel tempo le interfacce del kernel subiscono qualche ripulita. Se nessuno +sta più usando un'interfaccia, allora questa verrà rimossa. Questo permette +al kernel di rimanere il più piccolo possibile, e garantisce che tutte le +potenziali interfacce sono state verificate nel limite del possibile (le +interfacce inutilizzate sono impossibili da verificare). + + +Cosa fare +--------- + +Dunque, se avete un driver per il kernel Linux che non si trova nei sorgenti +principali del kernel, come sviluppatori, cosa dovreste fare? Rilasciare un +file binario del driver per ogni versione del kernel e per ogni distribuzione, +è un incubo; inoltre, tenere il passo con tutti i cambiamenti del kernel è un +brutto lavoro. + +Semplicemente, fate sì che il vostro driver per il kernel venga incluso nei +sorgenti principali (ricordatevi, stiamo parlando di driver rilasciati secondo +una licenza compatibile con la GPL; se il vostro codice non ricade in questa +categoria: buona fortuna, arrangiatevi, siete delle sanguisughe) + +Se il vostro driver è nei sorgenti del kernel e un'interfaccia cambia, il +driver verrà corretto immediatamente dalla persona che l'ha modificata. 
Questo +garantisce che sia sempre possibile compilare il driver, che funzioni, e tutto +con un minimo sforzo da parte vostra. + +Avere il proprio driver nei sorgenti principali del kernel ha i seguenti +vantaggi: + + - La qualità del driver aumenterà e i costi di manutenzione (per lo + sviluppatore originale) diminuiranno. + - Altri sviluppatori aggiungeranno nuove funzionalità al vostro driver. + - Altri persone troveranno e correggeranno bachi nel vostro driver. + - Altri persone troveranno degli aggiustamenti da fare al vostro driver. + - Altri persone aggiorneranno il driver quando è richiesto da un cambiamento + di un'interfaccia. + - Il driver sarà automaticamente reso disponibile in tutte le distribuzioni + Linux senza dover chiedere a nessuna di queste di aggiungerlo. + +Dato che Linux supporta più dispositivi di qualsiasi altro sistema operativo, +e che girano su molti più tipi di processori di qualsiasi altro sistema +operativo; ciò dimostra che questo modello di sviluppo qualcosa di giusto, +dopo tutto, lo fa :) + + + +------ - TODO ancora da tradurre +Dei ringraziamenti vanno a Randy Dunlap, Andrew Morton, David Brownell, +Hanna Linder, Robert Love, e Nishanth Aravamudan per la loro revisione +e per i loro commenti sulle prime bozze di questo articolo. diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst @@ -1,12 +1,131 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/submit-checklist.rst <submitchecklist>` +:Translator: Federico Vaga <federico.vaga@vaga.pv.it> .. _it_submitchecklist: -Lista delle cose da fare per inviare una modifica al kernel Linux -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Lista delle verifiche da fare prima di inviare una patch per il kernel Linux +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. 
warning:: +Qui troverete una lista di cose che uno sviluppatore dovrebbe fare per +vedere le proprie patch accettate più rapidamente. - TODO ancora da tradurre +Tutti questi punti integrano la documentazione fornita riguardo alla +sottomissione delle patch, in particolare +:ref:`Documentation/translations/it_IT/process/submitting-patches.rst <it_submittingpatches>`. + +1) Se state usando delle funzionalità del kernel allora includete (#include) + i file che le dichiarano/definiscono. Non dipendente dal fatto che un file + d'intestazione include anche quelli usati da voi. + +2) Compilazione pulita: + + a) con le opzioni ``CONFIG`` negli stati ``=y``, ``=m`` e ``=n``. Nessun + avviso/errore di ``gcc`` e nessun avviso/errore dal linker. + + b) con ``allnoconfig``, ``allmodconfig`` + + c) quando si usa ``O=builddir`` + +3) Compilare per diverse architetture di processore usando strumenti per + la cross-compilazione o altri. + +4) Una buona architettura per la verifica della cross-compilazione è la ppc64 + perché tende ad usare ``unsigned long`` per le quantità a 64-bit. + +5) Controllate lo stile del codice della vostra patch secondo le direttive + scritte in :ref:`Documentation/translations/it_IT/process/coding-style.rst <it_codingstyle>`. + Prima dell'invio della patch, usate il verificatore di stile + (``script/checkpatch.pl``) per scovare le violazioni più semplici. + Dovreste essere in grado di giustificare tutte le violazioni rimanenti nella + vostra patch. + +6) Le opzioni ``CONFIG``, nuove o modificate, non scombussolano il menu + di configurazione e sono preimpostate come disabilitate a meno che non + soddisfino i criteri descritti in ``Documentation/kbuild/kconfig-language.txt`` + alla punto "Voci di menu: valori predefiniti". + +7) Tutte le nuove opzioni ``Kconfig`` hanno un messaggio di aiuto. + +8) La patch è stata accuratamente revisionata rispetto alle più importanti + configurazioni ``Kconfig``. 
Questo è molto difficile da fare + correttamente - un buono lavoro di testa sarà utile. + +9) Verificare con sparse. + +10) Usare ``make checkstack`` e ``make namespacecheck`` e correggere tutti i + problemi rilevati. + + .. note:: + + ``checkstack`` non evidenzia esplicitamente i problemi, ma una funzione + che usa più di 512 byte sullo stack è una buona candidata per una + correzione. + +11) Includete commenti :ref:`kernel-doc <kernel_doc>` per documentare API + globali del kernel. Usate ``make htmldocs`` o ``make pdfdocs`` per + verificare i commenti :ref:`kernel-doc <kernel_doc>` ed eventualmente + correggerli. + +12) La patch è stata verificata con le seguenti opzioni abilitate + contemporaneamente: ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``, + ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``, + ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``, + ``CONFIG_PROVE_RCU`` e ``CONFIG_DEBUG_OBJECTS_RCU_HEAD``. + +13) La patch è stata compilata e verificata in esecuzione con, e senza, + le opzioni ``CONFIG_SMP`` e ``CONFIG_PREEMPT``. + +14) Se la patch ha effetti sull'IO dei dischi, eccetera: allora dev'essere + verificata con, e senza, l'opzione ``CONFIG_LBDAF``. + +15) Tutti i percorsi del codice sono stati verificati con tutte le funzionalità + di lockdep abilitate. + +16) Tutti i nuovi elementi in ``/proc`` sono documentati in ``Documentation/``. + +17) Tutti i nuovi parametri d'avvio del kernel sono documentati in + ``Documentation/admin-guide/kernel-parameters.rst``. + +18) Tutti i nuovi parametri dei moduli sono documentati con ``MODULE_PARM_DESC()``. + +19) Tutte le nuove interfacce verso lo spazio utente sono documentate in + ``Documentation/ABI/``. Leggete ``Documentation/ABI/README`` per maggiori + informazioni. Le patch che modificano le interfacce utente dovrebbero + essere inviate in copia anche a linux-api@vger.kernel.org. 
+ +20) Verifica che il kernel passi con successo ``make headers_check`` + +21) La patch è stata verificata con l'iniezione di fallimenti in slab e + nell'allocazione di pagine. Vedere ``Documentation/fault-injection/``. + + Se il nuovo codice è corposo, potrebbe essere opportuno aggiungere + l'iniezione di fallimenti specifici per il sottosistema. + +22) Il nuovo codice è stato compilato con ``gcc -W`` (usate + ``make EXTRA_CFLAGS=-W``). Questo genererà molti avvisi, ma è ottimo + per scovare bachi come "warning: comparison between signed and unsigned". + +23) La patch è stata verificata dopo essere stata inclusa nella serie di patch + -mm; questo al fine di assicurarsi che continui a funzionare assieme a + tutte le altre patch in coda e i vari cambiamenti nei sottosistemi VM, VFS + e altri. + +24) Tutte le barriere di sincronizzazione {per esempio, ``barrier()``, + ``rmb()``, ``wmb()``} devono essere accompagnate da un commento nei + sorgenti che ne spieghi la logica: cosa fanno e perché. + +25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate + ``Documentation/ioctl/ioctl-number.txt``. + +26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o + funzionalità del kernel che è associata a uno dei seguenti simboli + ``Kconfig``, allora verificate che il kernel compili con diverse + configurazioni dove i simboli sono disabilitati e/o ``=m`` (se c'è la + possibilità) [non tutti contemporaneamente, solo diverse combinazioni + casuali]: + + ``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, + ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``, + ``CONFIG_NET``, ``CONFIG_INET=n`` (ma l'ultimo con ``CONFIG_NET=y``). diff --git a/Documentation/translations/it_IT/process/submitting-drivers.rst b/Documentation/translations/it_IT/process/submitting-drivers.rst @@ -1,12 +1,16 @@ .. 
include:: ../disclaimer-ita.rst

 :Original: :ref:`Documentation/process/submitting-drivers.rst <submittingdrivers>`
+:Translator: Federico Vaga <federico.vaga@vaga.pv.it>

 .. _it_submittingdrivers:

 Sottomettere driver per il kernel Linux
 =======================================

-.. warning::
+.. note::

-    TODO ancora da tradurre
+   Questo documento è vecchio e negli ultimi anni non è stato più aggiornato;
+   dovrebbe essere aggiornato, o forse meglio, rimosso. La maggior parte di
+   quello che viene detto qui può essere trovato anche negli altri documenti
+   dedicati allo sviluppo. Per questo motivo il documento non verrà tradotto.
diff --git a/Documentation/translations/it_IT/process/submitting-patches.rst b/Documentation/translations/it_IT/process/submitting-patches.rst
@@ -1,13 +1,867 @@
 .. include:: ../disclaimer-ita.rst

 :Original: :ref:`Documentation/process/submitting-patches.rst <submittingpatches>`
-
+:Translator: Federico Vaga <federico.vaga@vaga.pv.it>

 .. _it_submittingpatches:

-Sottomettere modifiche: la guida essenziale per vedere il vostro codice nel kernel
-==================================================================================
+Inviare patch: la guida essenziale per vedere il vostro codice nel kernel
+=========================================================================
+
+Una persona o un'azienda che volesse inviare una patch al kernel potrebbe
+sentirsi scoraggiata dal processo di sottomissione, specialmente quando manca
+una certa familiarità col "sistema". Questo testo è una raccolta di
+suggerimenti che aumenteranno significativamente le probabilità di vedere le
+vostre patch accettate.
+
+Questo documento contiene un vasto numero di suggerimenti concisi. Per
+maggiori dettagli su come funziona il processo di sviluppo del kernel leggete
+:ref:`Documentation/translations/it_IT/process <it_development_process_main>`.
+Leggete anche :ref:`Documentation/translations/it_IT/process/submit-checklist.rst <it_submitchecklist>` +per una lista di punti da verificare prima di inviare del codice. Se state +inviando un driver, allora leggete anche :ref:`Documentation/translations/it_IT/process/submitting-drivers.rst <it_submittingdrivers>`; +per delle patch relative alle associazioni per Device Tree leggete +Documentation/devicetree/bindings/submitting-patches.txt. + +Molti di questi passi descrivono il comportamento di base del sistema di +controllo di versione ``git``; se utilizzate ``git`` per preparare le vostre +patch molto del lavoro più ripetitivo lo troverete già fatto per voi, tuttavia +dovete preparare e documentare un certo numero di patch. Generalmente, l'uso +di ``git`` renderà la vostra vita di sviluppatore del kernel più facile. + +0) Ottenere i sorgenti attuali +------------------------------ + +Se non avete un repositorio coi sorgenti del kernel più recenti, allora usate +``git`` per ottenerli. Vorrete iniziare col repositorio principale che può +essere recuperato col comando:: + + git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git + +Notate, comunque, che potreste non voler sviluppare direttamente coi sorgenti +principali del kernel. La maggior parte dei manutentori hanno i propri +sorgenti e desiderano che le patch siano preparate basandosi su di essi. +Guardate l'elemento **T:** per un determinato sottosistema nel file MAINTANERS +che troverete nei sorgenti, o semplicemente chiedete al manutentore nel caso +in cui i sorgenti da usare non siano elencati il quel file. + +Esiste ancora la possibilità di scaricare un rilascio del kernel come archivio +tar (come descritto in una delle prossime sezioni), ma questa è la via più +complicata per sviluppare per il kernel. + +1) ``diff -up`` +--------------- + +Se dovete produrre le vostre patch a mano, usate ``diff -up`` o ``diff -uprN`` +per crearle. 
Git produce di base le patch in questo formato; se state
+usando ``git``, potete saltare interamente questa sezione.
+
+Tutte le modifiche al kernel Linux avvengono mediate patch, come descritte
+in :manpage:`diff(1)`. Quando create la vostra patch, assicuratevi di
+crearla nel formato "unified diff", come l'argomento ``-u`` di
+:manpage:`diff(1)`.
+Inoltre, per favore usate l'argomento ``-p`` per mostrare la funzione C
+alla quale si riferiscono le diverse modifiche - questo rende il risultato
+di ``diff`` molto più facile da leggere. Le patch dovrebbero essere basate
+sulla radice dei sorgenti del kernel, e non sulle sue sottocartelle.
+
+Per creare una patch per un singolo file, spesso è sufficiente fare::
+
+    SRCTREE= linux
+    MYFILE= drivers/net/mydriver.c
+
+    cd $SRCTREE
+    cp $MYFILE $MYFILE.orig
+    vi $MYFILE # make your change
+    cd ..
+    diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch
+
+Per creare una patch per molteplici file, dovreste spacchettare i sorgenti
+"vergini", o comunque non modificati, e fare un ``diff`` coi vostri.
+Per esempio::
+
+    MYSRC= /devel/linux
+
+    tar xvfz linux-3.19.tar.gz
+    mv linux-3.19 linux-3.19-vanilla
+    diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \
+        linux-3.19-vanilla $MYSRC > /tmp/patch
+
+``dontdiff`` è una lista di file che sono generati durante il processo di
+compilazione del kernel; questi dovrebbero essere ignorati in qualsiasi
+patch generata con :manpage:`diff(1)`.
+
+Assicuratevi che la vostra patch non includa file che non ne fanno veramente
+parte. Al fine di verificarne la correttezza, assicuratevi anche di
+revisionare la vostra patch -dopo- averla generata con :manpage:`diff(1)`.
+
+Se le vostre modifiche producono molte differenze, allora dovrete dividerle
+in patch indipendenti che modificano le cose in passi logici; leggete
+:ref:`split_changes`. Questo faciliterà la revisione da parte degli altri
+sviluppatori, il che è molto importante se volete che la patch venga accettata.
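As a rough illustration of the "unified diff" format that the added section asks for (``diff -u``), the sketch below builds a small patch with Python's standard-library ``difflib`` module. The file contents and the ``make_unified_patch`` helper are hypothetical and exist only for this example. ``difflib`` does not emulate the ``-p`` option, so no C function name appears in the hunk header.

```python
import difflib


def make_unified_patch(original: str, modified: str, path: str) -> str:
    """Return a unified diff between two versions of a single file.

    The file names mirror the SRCTREE/MYFILE layout used in the quoted
    example: the patch is expressed relative to the directory that
    contains the source tree, not relative to a subdirectory of it.
    """
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=f"linux/{path}.orig",
        tofile=f"linux/{path}",
    )
    return "".join(diff_lines)


# Hypothetical before/after contents of a driver source file.
old = "int probe(void)\n{\n\treturn 0;\n}\n"
new = "int probe(void)\n{\n\treturn -ENODEV;\n}\n"

patch = make_unified_patch(old, new, "drivers/net/mydriver.c")
print(patch)
```

The printed patch starts with the ``---``/``+++`` file headers, followed by one ``@@`` hunk in which removed lines are prefixed with ``-`` and added lines with ``+``. This is the same layout used by the diff shown on this page.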
+ +Se state utilizzando ``git``, ``git rebase -i`` può aiutarvi nel procedimento. +Se non usate ``git``, un'alternativa popolare è ``quilt`` +<http://savannah.nongnu.org/projects/quilt>. + +.. _it_describe_changes: + +2) Descrivete le vostre modifiche +--------------------------------- + +Descrivete il vostro problema. Esiste sempre un problema che via ha spinto +ha fare il vostro lavoro, che sia la correzione di un baco da una riga o una +nuova funzionalità da 5000 righe di codice. Convincete i revisori che vale +la pena risolvere il vostro problema e che ha senso continuare a leggere oltre +al primo paragrafo. + +Descrivete ciò che sarà visibile agli utenti. Chiari incidenti nel sistema +e blocchi sono abbastanza convincenti, ma non tutti i bachi sono così evidenti. +Anche se il problema è stato scoperto durante la revisione del codice, +descrivete l'impatto che questo avrà sugli utenti. Tenete presente che +la maggior parte delle installazioni Linux usa un kernel che arriva dai +sorgenti stabili o dai sorgenti di una distribuzione particolare che prende +singolarmente le patch dai sorgenti principali; quindi, includete tutte +le informazioni che possono essere utili a capire le vostre modifiche: +le circostanze che causano il problema, estratti da dmesg, descrizioni di +un incidente di sistema, prestazioni di una regressione, picchi di latenza, +blocchi, eccetera. + +Quantificare le ottimizzazioni e i compromessi. Se affermate di aver +migliorato le prestazioni, il consumo di memoria, l'impatto sollo stack, +o la dimensione del file binario, includete dei numeri a supporto della +vostra dichiarazione. Ma ricordatevi di descrivere anche eventuali costi +che non sono ovvi. Solitamente le ottimizzazioni non sono gratuite, ma sono +un compromesso fra l'uso di CPU, la memoria e la leggibilità; o, quando si +parla di ipotesi euristiche, fra differenti carichi. 
Descrivete i lati +negativi che vi aspettate dall'ottimizzazione cosicché i revisori possano +valutare i costi e i benefici. + +Una volta che il problema è chiaro, descrivete come lo risolvete andando +nel dettaglio tecnico. È molto importante che descriviate la modifica +in un inglese semplice cosicché i revisori possano verificare che il codice si +comporti come descritto. + +I manutentori vi saranno grati se scrivete la descrizione della patch in un +formato che sia compatibile con il gestore dei sorgenti usato dal kernel, +``git``, come un "commit log". Leggete :ref:`it_explicit_in_reply_to`. + +Risolvete solo un problema per patch. Se la vostra descrizione inizia ad +essere lunga, potrebbe essere un segno che la vostra patch necessita d'essere +divisa. Leggete :ref:`split_changes`. + +Quando inviate o rinviate una patch o una serie, includete la descrizione +completa delle modifiche e la loro giustificazione. Non limitatevi a dire che +questa è la versione N della patch (o serie). Non aspettatevi che i +manutentori di un sottosistema vadano a cercare le versioni precedenti per +cercare la descrizione da aggiungere. In pratica, la patch (o serie) e la sua +descrizione devono essere un'unica cosa. Questo aiuta i manutentori e i +revisori. Probabilmente, alcuni revisori non hanno nemmeno ricevuto o visto +le versioni precedenti della patch. + +Descrivete le vostro modifiche usando l'imperativo, per esempio "make xyzzy +do frotz" piuttosto che "[This patch] makes xyzzy do frotz" or "[I] changed +xyzzy to do frotz", come se steste dando ordini al codice di cambiare il suo +comportamento. + +Se la patch corregge un baco conosciuto, fare riferimento a quel baco inserendo +il suo numero o il suo URL. 
Se la patch è la conseguenza di una discussione
+su una lista di discussione, allora fornite l'URL all'archivio di quella
+discussione; usate i collegamenti a https://lkml.kernel.org/ con il
+``Message-Id``, in questo modo vi assicurerete che il collegamento non diventi
+invalido nel tempo.
+
+Tuttavia, cercate di rendere la vostra spiegazione comprensibile anche senza
+far riferimento a fonti esterne. In aggiunta ai collegamenti a bachi e liste
+di discussione, riassumente i punti più importanti della discussione che hanno
+portato alla creazione della patch.
+
+Se volete far riferimento a uno specifico commit, non usate solo
+l'identificativo SHA-1. Per cortesia, aggiungete anche la breve riga
+riassuntiva del commit per rendere la chiaro ai revisori l'oggetto.
+Per esempio::
+
+    Commit e21d2170f36602ae2708 ("video: remove unnecessary
+    platform_set_drvdata()") removed the unnecessary
+    platform_set_drvdata(), but left the variable "dev" unused,
+    delete it.
+
+Dovreste anche assicurarvi di usare almeno i primi 12 caratteri
+dell'identificativo SHA-1. Il repositorio del kernel ha *molti* oggetti e
+questo rende possibile la collisione fra due identificativi con pochi
+caratteri. Tenete ben presente che anche se oggi non ci sono collisioni con il
+vostro identificativo a 6 caratteri, potrebbero essercene fra 5 anni da oggi.
+
+Se la vostra patch corregge un baco in un commit specifico, per esempio avete
+trovato un problema usando ``git bisect``, per favore usate l'etichetta
+'Fixes:' indicando i primi 12 caratteri dell'identificativo SHA-1 seguiti
+dalla riga riassuntiva. Per esempio::
+
+    Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()")
+
+La seguente configurazione di ``git config`` può essere usata per formattare
+i risultati dei comandi ``git log`` o ``git show`` come nell'esempio
+precedente::
+
+    [core]
+        abbrev = 12
+    [pretty]
+        fixes = Fixes: %h (\"%s\")
+
+..
_it_split_changes: + +3) Separate le vostre modifiche +------------------------------- + +Separate ogni **cambiamento logico** in patch distinte. + +Per esempio, se i vostri cambiamenti per un singolo driver includono +sia delle correzioni di bachi che miglioramenti alle prestazioni, +allora separateli in due o più patch. Se i vostri cambiamenti includono +un aggiornamento dell'API e un nuovo driver che lo sfrutta, allora separateli +in due patch. + +D'altro canto, se fate una singola modifica su più file, raggruppate tutte +queste modifiche in una singola patch. Dunque, un singolo cambiamento logico +è contenuto in una sola patch. + +Il punto da ricordare è che ogni modifica dovrebbe fare delle modifiche +che siano facilmente comprensibili e che possano essere verificate dai revisori. +Ogni patch dovrebbe essere giustificabile di per sé. + +Se al fine di ottenere un cambiamento completo una patch dipende da un'altra, +va bene. Semplicemente scrivete una nota nella descrizione della patch per +farlo presente: **"this patch depends on patch X"**. + +Quando dividete i vostri cambiamenti in una serie di patch, prestate +particolare attenzione alla verifica di ogni patch della serie; per ognuna +il kernel deve compilare ed essere eseguito correttamente. Gli sviluppatori +che usano ``git bisect`` per scovare i problemi potrebbero finire nel mezzo +della vostra serie in un punto qualsiasi; non vi saranno grati se nel mezzo +avete introdotto dei bachi. + +Se non potete condensare la vostra serie di patch in una più piccola, allora +pubblicatene una quindicina alla volta e aspettate che vengano revisionate +ed integrate. + + +4) Verificate lo stile delle vostre modifiche +--------------------------------------------- + +Controllate che la vostra patch non violi lo stile del codice, maggiori +dettagli sono disponibili in :ref:`Documentation/translations/it_IT/process/coding-style.rst <it_codingstyle>`. 
+Non farlo porta semplicemente a una perdita di tempo da parte dei revisori e +voi vedrete la vostra patch rifiutata, probabilmente senza nemmeno essere stata +letta. + +Un'eccezione importante si ha quando del codice viene spostato da un file +ad un altro -- in questo caso non dovreste modificare il codice spostato +per nessun motivo, almeno non nella patch che lo sposta. Questo separa +chiaramente l'azione di spostare il codice e il vostro cambiamento. +Questo aiuta enormemente la revisione delle vere differenze e permette agli +strumenti di tenere meglio la traccia della storia del codice. + +Prima di inviare una patch, verificatene lo stile usando l'apposito +verificatore (scripts/checkpatch.pl). Da notare, comunque, che il verificator +di stile dovrebbe essere visto come una guida, non come un sostituto al +giudizio umano. Se il vostro codice è migliore nonostante una violazione +dello stile, probabilmente è meglio lasciarlo com'è. + +Il verificatore ha tre diversi livelli di severità: + - ERROR: le cose sono molto probabilmente sbagliate + - WARNING: le cose necessitano d'essere revisionate con attenzione + - CHECK: le cose necessitano di un pensierino + +Dovreste essere in grado di giustificare tutte le eventuali violazioni rimaste +nella vostra patch. + + +5) Selezionate i destinatari della vostra patch +----------------------------------------------- + +Dovreste sempre inviare una copia della patch ai manutentori dei sottosistemi +interessati dalle modifiche; date un'occhiata al file MAINTAINERS e alla storia +delle revisioni per scoprire chi si occupa del codice. Lo script +scripts/get_maintainer.pl può esservi d'aiuto. Se non riuscite a trovare un +manutentore per il sottosistema su cui state lavorando, allora Andrew Morton +(akpm@linux-foundation.org) sarà la vostra ultima possibilità. + +Normalmente, dovreste anche scegliere una lista di discussione a cui inviare +la vostra serie di patch. 
La lista di discussione linux-kernel@vger.kernel.org +è proprio l'ultima spiaggia, il volume di email su questa lista fa si che +diversi sviluppatori non la seguano. Guardate nel file MAINTAINERS per trovare +la lista di discussione dedicata ad un sottosistema; probabilmente lì la vostra +patch riceverà molta più attenzione. Tuttavia, per favore, non spammate le +liste di discussione che non sono interessate al vostro lavoro. + +Molte delle liste di discussione relative al kernel vengono ospitate su +vger.kernel.org; potete trovare un loro elenco alla pagina +http://vger.kernel.org/vger-lists.html. Tuttavia, ci sono altre liste di +discussione ospitate altrove. + +Non inviate più di 15 patch alla volta sulle liste di discussione vger!!! + +L'ultimo giudizio sull'integrazione delle modifiche accettate spetta a +Linux Torvalds. Il suo indirizzo e-mail è <torvalds@linux-foundation.org>. +Riceve moltissime e-mail, e, a questo punto, solo poche patch passano +direttamente attraverso il suo giudizio; quindi, dovreste fare del vostro +meglio per -evitare di- inviargli e-mail. + +Se avete una patch che corregge un baco di sicurezza che potrebbe essere +sfruttato, inviatela a security@kernel.org. Per bachi importanti, un breve +embargo potrebbe essere preso in considerazione per dare il tempo alle +distribuzioni di prendere la patch e renderla disponibile ai loro utenti; +in questo caso, ovviamente, la patch non dovrebbe essere inviata su alcuna +lista di discussione pubblica. + +Patch che correggono bachi importanti su un kernel già rilasciato, dovrebbero +essere inviate ai manutentori dei kernel stabili aggiungendo la seguente riga:: + + Cc: stable@vger.kernel.org + +nella vostra patch, nell'area dedicata alle firme (notate, NON come destinatario +delle e-mail). 
In aggiunta a questo file, dovreste leggere anche +:ref:`Documentation/translations/it_IT/process/stable-kernel-rules.rst <it_stable_kernel_rules>` + +Tuttavia, notate, che alcuni manutentori di sottosistema preferiscono avere +l'ultima parola su quali patch dovrebbero essere aggiunte ai kernel stabili. +La rete di manutentori, in particolare, non vorrebbe vedere i singoli +sviluppatori aggiungere alle loro patch delle righe come quella sopracitata. + +Se le modifiche hanno effetti sull'interfaccia con lo spazio utente, per favore +inviate una patch per le pagine man ai manutentori di suddette pagine (elencati +nel file MAINTAINERS), o almeno una notifica circa la vostra modifica, +cosicché l'informazione possa trovare la sua strada nel manuale. Le modifiche +all'API dello spazio utente dovrebbero essere inviate in copia anche a +linux-api@vger.kernel.org. + +Per le piccole patch potreste aggiungere in CC l'indirizzo +*Trivial Patch Monkey trivial@kernel.org* che ha lo scopo di raccogliere +le patch "banali". Date uno sguardo al file MAINTAINERS per vedere chi +è l'attuale amministratore. + +Le patch banali devono rientrare in una delle seguenti categorie: + +- errori grammaticali nella documentazione +- errori grammaticali negli errori che potrebbero rompere :manpage:`grep(1)` +- correzione di avvisi di compilazione (riempirsi di avvisi inutili è negativo) +- correzione di errori di compilazione (solo se correggono qualcosa sul serio) +- rimozione di funzioni/macro deprecate +- sostituzione di codice non potabile con uno portabile (anche in codice + specifico per un'architettura, dato che le persone copiano, fintanto che + la modifica sia banale) +- qualsiasi modifica dell'autore/manutentore di un file (in pratica + "patch monkey" in modalità ritrasmissione) + + +6) Niente: MIME, links, compressione, allegati. 
Solo puro testo +---------------------------------------------------------------- + +Linus e gli altri sviluppatori del kernel devono poter commentare +le modifiche che sottomettete. Per uno sviluppatore è importante +essere in grado di "citare" le vostre modifiche, usando normali +programmi di posta elettronica, cosicché sia possibile commentare +una porzione specifica del vostro codice. + +Per questa ragione tutte le patch devono essere inviate via e-mail +come testo. .. warning:: - TODO ancora da tradurre + Se decidete di copiare ed incollare la patch nel corpo dell'e-mail, state + attenti che il vostro programma non corrompa il contenuto con andate + a capo automatiche. + +La patch non deve essere un allegato MIME, compresso o meno. Molti +dei più popolari programmi di posta elettronica non trasmettono un allegato +MIME come puro testo, e questo rende impossibile commentare il vostro codice. +Inoltre, un allegato MIME rende l'attività di Linus più laboriosa, diminuendo +così la possibilità che il vostro allegato-MIME venga accettato. + +Eccezione: se il vostro servizio di posta storpia le patch, allora qualcuno +potrebbe chiedervi di rinviarle come allegato MIME. + +Leggete :ref:`Documentation/translations/it_IT/process/email-clients.rst <it_email_clients>` +per dei suggerimenti sulla configurazione del programmi di posta elettronica +per l'invio di patch intatte. + +7) Dimensione delle e-mail +-------------------------- + +Le grosse modifiche non sono adatte ad una lista di discussione, e nemmeno +per alcuni manutentori. Se la vostra patch, non compressa, eccede i 300 kB +di spazio, allora caricatela in una spazio accessibile su internet fornendo +l'URL (collegamento) ad essa. Ma notate che se la vostra patch eccede i 300 kB +è quasi certo che necessiti comunque di essere spezzettata. + +8) Rispondere ai commenti di revisione +-------------------------------------- + +Quasi certamente i revisori vi invieranno dei commenti su come migliorare +la vostra patch. 
Dovete rispondere a questi commenti; ignorare i revisori +è un ottimo modo per essere ignorati. Riscontri o domande che non conducono +ad una modifica del codice quasi certamente dovrebbero portare ad un commento +nel changelog cosicché il prossimo revisore potrà meglio comprendere cosa stia +accadendo. + +Assicuratevi di dire ai revisori quali cambiamenti state facendo e di +ringraziarli per il loro tempo. Revisionare codice è un lavoro faticoso e che +richiede molto tempo, e a volte i revisori diventano burberi. Tuttavia, anche +in questo caso, rispondete con educazione e concentratevi sul problema che +hanno evidenziato. + +9) Non scoraggiatevi - o impazientitevi +--------------------------------------- + +Dopo che avete inviato le vostre modifiche, siate pazienti e aspettate. +I revisori sono persone occupate e potrebbero non ricevere la vostra patch +immediatamente. + +Un tempo, le patch erano solite scomparire nel vuoto senza alcun commento, +ma ora il processo di sviluppo funziona meglio. Dovreste ricevere commenti +in una settimana o poco più; se questo non dovesse accadere, assicuratevi di +aver inviato le patch correttamente. Aspettate almeno una settimana prima di +rinviare le modifiche o sollecitare i revisori - probabilmente anche di più +durante la finestra d'integrazione. + +10) Aggiungete PATCH nell'oggetto +--------------------------------- + +Dato l'alto volume di e-mail per Linus, e la lista linux-kernel, è prassi +prefiggere il vostro oggetto con [PATCH]. Questo permette a Linus e agli +altri sviluppatori del kernel di distinguere facilmente le patch dalle altre +discussioni. 
+
+
+11) Firmate il vostro lavoro - Il certificato d'origine dello sviluppatore
+--------------------------------------------------------------------------
+
+Per migliorare la tracciabilità su "chi ha fatto cosa", specialmente per
+quelle patch che per raggiungere lo stadio finale passano attraverso
+diversi livelli di manutentori, abbiamo introdotto la procedura di "firma"
+delle patch che vengono inviate per e-mail.
+
+La firma è una semplice riga alla fine della descrizione della patch che
+certifica che l'avete scritta voi o che avete il diritto di pubblicarla
+come patch open-source. Le regole sono abbastanza semplici: se potete
+certificare quanto segue:
+
+Il certificato d'origine dello sviluppatore 1.1
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Contribuendo a questo progetto, io certifico che:
+
+        (a) Il contributo è stato creato interamente, o in parte, da me e che
+            ho il diritto di inviarlo in accordo con la licenza open-source
+            indicata nel file; oppure
+
+        (b) Il contributo è basato su un lavoro precedente che, nei limiti
+            della mia conoscenza, è coperto da un'appropriata licenza
+            open-source che mi da il diritto di modificarlo e inviarlo,
+            le cui modifiche sono interamente o in parte mie, in accordo con
+            la licenza open-source (a meno che non abbia il permesso di usare
+            un'altra licenza) indicata nel file; oppure
+
+        (c) Il contributo mi è stato fornito direttamente da qualcuno che
+            ha certificato (a), (b) o (c) e non l'ho modificata.
+
+        (d) Capisco e concordo col fatto che questo progetto e i suoi
+            contributi sono pubblici e che un registro dei contributi (incluse
+            tutte le informazioni personali che invio con essi, inclusa la mia
+            firma) verrà mantenuto indefinitamente e che possa essere
+            ridistribuito in accordo con questo progetto o le licenze
+            open-source coinvolte.
+ +poi dovete solo aggiungere una riga che dice:: + + Signed-off-by: Random J Developer <random@developer.example.org> + +usando il vostro vero nome (spiacenti, non si accettano pseudonimi o +contributi anonimi). + +Alcune persone aggiungono delle etichette alla fine. Per ora queste verranno +ignorate, ma potete farlo per meglio identificare procedure aziendali interne o +per aggiungere dettagli circa la firma. + +Se siete un manutentore di un sottosistema o di un ramo, qualche volta dovrete +modificare leggermente le patch che avete ricevuto al fine di poterle +integrare; questo perché il codice non è esattamente lo stesso nei vostri +sorgenti e in quelli dei vostri contributori. Se rispettate rigidamente la +regola (c), dovreste chiedere al mittente di rifare la patch, ma questo è +controproducente e una totale perdita di tempo ed energia. La regola (b) +vi permette di correggere il codice, ma poi diventa davvero maleducato cambiare +la patch di qualcuno e addossargli la responsabilità per i vostri bachi. +Per risolvere questo problema dovreste aggiungere una riga, fra l'ultimo +Signed-off-by e il vostro, che spiega la vostra modifica. Nonostante non ci +sia nulla di obbligatorio, un modo efficace è quello di indicare il vostro +nome o indirizzo email fra parentesi quadre, seguito da una breve descrizione; +questo renderà abbastanza visibile chi è responsabile per le modifiche +dell'ultimo minuto. Per esempio:: + + Signed-off-by: Random J Developer <random@developer.example.org> + [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h] + Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org> + +Questa pratica è particolarmente utile se siete i manutentori di un ramo +stabile ma al contempo volete dare credito agli autori, tracciare e integrare +le modifiche, e proteggere i mittenti dalle lamentele. 
Notate che in nessuna +circostanza è permessa la modifica dell'identità dell'autore (l'intestazione +From), dato che è quella che appare nei changelog. + +Un appunto speciale per chi porta il codice su vecchie versioni. Sembra che +sia comune l'utile pratica di inserire un'indicazione circa l'origine della +patch all'inizio del messaggio di commit (appena dopo la riga dell'oggetto) +al fine di migliorare la tracciabilità. Per esempio, questo è quello che si +vede nel rilascio stabile 3.x-stable:: + + Date: Tue Oct 7 07:26:38 2014 -0400 + + libata: Un-break ATA blacklist + + commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream. + +E qui quello che potrebbe vedersi su un kernel più vecchio dove la patch è +stata applicata:: + + Date: Tue May 13 22:12:27 2008 +0200 + + wireless, airo: waitbusy() won't delay + + [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a] + +Qualunque sia il formato, questa informazione fornisce un importante aiuto +alle persone che vogliono seguire i vostri sorgenti, e quelle che cercano +dei bachi. + + +12) Quando utilizzare Acked-by:, Cc:, e Co-developed-by: +-------------------------------------------------------- + +L'etichetta Signed-off-by: indica che il firmatario è stato coinvolto nello +sviluppo della patch, o che era nel suo percorso di consegna. + +Se una persona non è direttamente coinvolta con la preparazione o gestione +della patch ma desidera firmare e mettere agli atti la loro approvazione, +allora queste persone possono chiedere di aggiungere al changelog della patch +una riga Acked-by:. + +Acked-by: viene spesso utilizzato dai manutentori del sottosistema in oggetto +quando quello stesso manutentore non ha contribuito né trasmesso la patch. + +Acked-by: non è formale come Signed-off-by:. Questo indica che la persona ha +revisionato la patch e l'ha trovata accettabile. 
Per cui, a volte, chi +integra le patch convertirà un "sì, mi sembra che vada bene" in un Acked-by: +(ma tenete presente che solitamente è meglio chiedere esplicitamente). + +Acked-by: non indica l'accettazione di un'intera patch. Per esempio, quando +una patch ha effetti su diversi sottosistemi e ha un Acked-by: da un +manutentore di uno di questi, significa che il manutentore accetta quella +parte di codice relativa al sottosistema che mantiene. Qui dovremmo essere +giudiziosi. Quando si hanno dei dubbi si dovrebbe far riferimento alla +discussione originale negli archivi della lista di discussione. + +Se una persona ha avuto l'opportunità di commentare la patch, ma non lo ha +fatto, potete aggiungere l'etichetta ``Cc:`` alla patch. Questa è l'unica +etichetta che può essere aggiunta senza che la persona in questione faccia +alcunché - ma dovrebbe indicare che la persona ha ricevuto una copia della +patch. Questa etichetta documenta che terzi potenzialmente interessati sono +stati inclusi nella discussione. + +L'etichetta Co-developed-by: indica che la patch è stata scritta dall'autore in +collaborazione con un altro sviluppatore. Qualche volta questo è utile quando +più persone lavorano sulla stessa patch. Notate, questa persona deve avere +nella patch anche una riga Signed-off-by:. + + +13) Utilizzare Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: e Fixes: +----------------------------------------------------------------------------- + +L'etichetta Reported-by da credito alle persone che trovano e riportano i bachi +e si spera che questo possa ispirarli ad aiutarci nuovamente in futuro. +Rammentate che se il baco è stato riportato in privato, dovrete chiedere il +permesso prima di poter utilizzare l'etichetta Reported-by. + +L'etichetta Tested-by: indica che la patch è stata verificata con successo +(su un qualche sistema) dalla persona citata. 
Questa etichetta informa i +manutentori che qualche verifica è stata fatta, fornisce un mezzo per trovare +persone che possano verificare il codice in futuro, e garantisce che queste +stesse persone ricevano credito per il loro lavoro. + +Reviewd-by:, invece, indica che la patch è stata revisionata ed è stata +considerata accettabile in accordo con la dichiarazione dei revisori: + +Dichiarazione di svista dei revisori +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Offrendo la mia etichetta Reviewed-by, dichiaro quanto segue: + + (a) Ho effettuato una revisione tecnica di questa patch per valutarne + l'adeguatezza ai fini dell'inclusione nel ramo principale del + kernel. + + (b) Tutti i problemi e le domande riguardanti la patch sono stati + comunicati al mittente. Sono soddisfatto dalle risposte + del mittente. + + (c) Nonostante ci potrebbero essere cose migliorabili in queste + sottomissione, credo che sia, in questo momento, (1) una modifica + di interesse per il kernel, e (2) libera da problemi che + potrebbero metterne in discussione l'integrazione. + + (d) Nonostante abbia revisionato la patch e creda che vada bene, + non garantisco (se non specificato altrimenti) che questa + otterrà quello che promette o funzionerà correttamente in tutte + le possibili situazioni. + +L'etichetta Reviewed-by è la dichiarazione di un parere sulla bontà di +una modifica che si ritiene appropriata e senza alcun problema tecnico +importante. Qualsiasi revisore interessato (quelli che lo hanno fatto) +possono offrire il proprio Reviewed-by per la patch. Questa etichetta serve +a dare credito ai revisori e a informare i manutentori sul livello di revisione +che è stato fatto sulla patch. L'etichetta Reviewd-by, quando fornita da +revisori conosciuti per la loro conoscenza sulla materia in oggetto e per la +loro serietà nella revisione, accrescerà le probabilità che la vostra patch +venga integrate nel kernel. 
+
+L'etichetta Suggested-by: indica che l'idea della patch è stata suggerita
+dalla persona nominata e le da credito. Tenete a mente che questa etichetta
+non dovrebbe essere aggiunta senza un permesso esplicito, specialmente se
+l'idea non è stata pubblicata in un forum pubblico. Detto ciò, dando credito
+a chi ci fornisce delle idee, si spera di poterli ispirare ad aiutarci
+nuovamente in futuro.
+
+L'etichetta Fixes: indica che la patch corregge un problema in un commit
+precedente. Serve a chiarire l'origine di un baco, il che aiuta la revisione
+del baco stesso. Questa etichetta è di aiuto anche per i manutentori dei
+kernel stabili al fine di capire quale kernel deve ricevere la correzione.
+Questo è il modo suggerito per indicare che un baco è stato corretto nella
+patch. Per maggiori dettagli leggete :ref:`it_describe_changes`
+
+
+14) Il formato canonico delle patch
+-----------------------------------
+
+Questa sezione descrive il formato che dovrebbe essere usato per le patch.
+Notate che se state usando un repositorio ``git`` per salvare le vostre patch
+potere usare il comando ``git format-patch`` per ottenere patch nel formato
+appropriato. Lo strumento non crea il testo necessario, per cui, leggete
+le seguenti istruzioni.
+
+L'oggetto di una patch canonica è la riga::
+
+    Subject: [PATCH 001/123] subsystem: summary phrase
+
+Il corpo di una patch canonica contiene i seguenti elementi:
+
+  - Una riga ``from`` che specifica l'autore della patch, seguita
+    da una riga vuota (necessaria soltanto se la persona che invia la
+    patch non ne è l'autore).
+
+  - Il corpo della spiegazione, con linee non più lunghe di 75 caratteri,
+    che verrà copiato permanentemente nel changelog per descrivere la patch.
+
+  - Una riga vuota
+
+  - Le righe ``Signed-off-by:``, descritte in precedenza, che finiranno
+    anch'esse nel changelog.
+
+  - Una linea di demarcazione contenente semplicemente ``---``.
+
+  - Qualsiasi altro commento che non deve finire nel changelog.
+ + - Le effettive modifiche al codice (il prodotto di ``diff``). + +Il formato usato per l'oggetto permette ai programmi di posta di usarlo +per ordinare le patch alfabeticamente - tutti i programmi di posta hanno +questa funzionalità - dato che al numero sequenziale si antepongono degli zeri; +in questo modo l'ordine numerico ed alfabetico coincidono. + +Il ``subsystem`` nell'oggetto dell'email dovrebbe identificare l'area +o il sottosistema modificato dalla patch. + +La ``summary phrase`` nell'oggetto dell'email dovrebbe descrivere brevemente +il contenuto della patch. La ``summary phrase`` non dovrebbe essere un nome +di file. Non utilizzate la stessa ``summary phrase`` per tutte le patch in +una serie (dove una ``serie di patch`` è una sequenza ordinata di diverse +patch correlate). + +Ricordatevi che la ``summary phrase`` della vostra email diventerà un +identificatore globale ed unico per quella patch. Si propaga fino al +changelog ``git``. La ``summary phrase`` potrà essere usata in futuro +dagli sviluppatori per riferirsi a quella patch. Le persone vorranno +cercare la ``summary phrase`` su internet per leggere le discussioni che la +riguardano. Potrebbe anche essere l'unica cosa che le persone vedranno +quando, in due o tre mesi, riguarderanno centinaia di patch usando strumenti +come ``gitk`` o ``git log --oneline``. + +Per queste ragioni, dovrebbe essere lunga fra i 70 e i 75 caratteri, e deve +descrivere sia cosa viene modificato, sia il perché sia necessario. Essere +brevi e descrittivi è una bella sfida, ma questo è quello che fa un riassunto +ben scritto. + +La ``summary phrase`` può avere un'etichetta (*tag*) di prefisso racchiusa fra +le parentesi quadre "Subject: [PATCH <tag>...] <summary phrase>". +Le etichette non verranno considerate come parte della frase riassuntiva, ma +indicano come la patch dovrebbe essere trattata. 
Fra le etichette più comuni +ci sono quelle di versione che vengono usate quando una patch è stata inviata +più volte (per esempio, "v1, v2, v3"); oppure "RFC" per indicare che si +attendono dei commenti (*Request For Comments*). Se ci sono quattro patch +nella serie, queste dovrebbero essere enumerate così: 1/4, 2/4, 3/4, 4/4. +Questo assicura che gli sviluppatori capiranno l'ordine in cui le patch +dovrebbero essere applicate, e per tracciare quelle che hanno revisionate o +che hanno applicato. + +Un paio di esempi di oggetti:: + + Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching + Subject: [PATCH v2 01/27] x86: fix eflags tracking + +La riga ``from`` dev'essere la prima nel corpo del messaggio ed è nel +formato: + + From: Original Author <author@example.com> + +La riga ``from`` indica chi verrà accreditato nel changelog permanente come +l'autore della patch. Se la riga ``from`` è mancante, allora per determinare +l'autore da inserire nel changelog verrà usata la riga ``From`` +nell'intestazione dell'email. + +Il corpo della spiegazione verrà incluso nel changelog permanente, per cui +deve aver senso per un lettore esperto che è ha dimenticato i dettagli della +discussione che hanno portato alla patch. L'inclusione di informazioni +sui problemi oggetto dalla patch (messaggi del kernel, messaggi di oops, +eccetera) è particolarmente utile per le persone che potrebbero cercare fra +i messaggi di log per la patch che li tratta. Se la patch corregge un errore +di compilazione, non sarà necessario includere proprio _tutto_ quello che +è uscito dal compilatore; aggiungete solo quello che è necessario per far si +che la vostra patch venga trovata. Come nella ``summary phrase``, è importante +essere sia brevi che descrittivi. + +La linea di demarcazione ``---`` serve essenzialmente a segnare dove finisce +il messaggio di changelog. 
+ +Aggiungere il ``diffstat`` dopo ``---`` è un buon uso di questo spazio, per +mostrare i file che sono cambiati, e il numero di file aggiunto o rimossi. +Un ``diffstat`` è particolarmente utile per le patch grandi. Altri commenti +che sono importanti solo per i manutentori, quindi inadatti al changelog +permanente, dovrebbero essere messi qui. Un buon esempio per questo tipo +di commenti potrebbe essere quello di descrivere le differenze fra le versioni +della patch. + +Se includete un ``diffstat`` dopo ``---``, usate le opzioni ``-p 1 -w70`` +cosicché i nomi dei file elencati non occupino troppo spazio (facilmente +rientreranno negli 80 caratteri, magari con qualche indentazione). +(``git`` genera di base dei diffstat adatti). + +Maggiori dettagli sul formato delle patch nei riferimenti qui di seguito. + +.. _it_explicit_in_reply_to: + +15) Usare esplicitamente In-Reply-To nell'intestazione +------------------------------------------------------ + +Aggiungere manualmente In-Reply-To: nell'intestazione dell'e-mail +potrebbe essere d'aiuto per associare una patch ad una discussione +precedente, per esempio per collegare la correzione di un baco con l'e-mail +che lo riportava. Tuttavia, per serie di patch multiple è generalmente +sconsigliato l'uso di In-Reply-To: per collegare precedenti versioni. +In questo modo versioni multiple di una patch non diventeranno un'ingestibile +giungla di riferimenti all'interno dei programmi di posta. Se un collegamento +è utile, potete usare https://lkml.kernel.org/ per ottenere i collegamenti +ad una versione precedente di una serie di patch (per esempio, potete usarlo +per l'email introduttiva alla serie). + +16) Inviare richieste ``git pull`` +---------------------------------- + +Se avete una serie di patch, potrebbe essere più conveniente per un manutentore +tirarle dentro al repositorio del sottosistema attraverso l'operazione +``git pull``. 
Comunque, tenete presente che prendere patch da uno sviluppatore +in questo modo richiede un livello di fiducia più alto rispetto a prenderle da +una lista di discussione. Di conseguenza, molti manutentori sono riluttanti +ad accettare richieste di *pull*, specialmente dagli sviluppatori nuovi e +quindi sconosciuti. Se siete in dubbio, potete fare una richiesta di *pull* +come messaggio introduttivo ad una normale pubblicazione di patch, così +il manutentore avrà la possibilità di scegliere come integrarle. + +Una richiesta di *pull* dovrebbe avere nell'oggetto [GIT] o [PULL]. +La richiesta stessa dovrebbe includere il nome del repositorio e quello del +ramo su una singola riga; dovrebbe essere più o meno così:: + + Please pull from + + git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus + + to get these changes: + +Una richiesta di *pull* dovrebbe includere anche un messaggio generico +che dica cos'è incluso, una lista delle patch usando ``git shortlog``, e una +panoramica sugli effetti della serie di patch con ``diffstat``. Il modo più +semplice per ottenere tutte queste informazioni è, ovviamente, quello di +lasciar fare tutto a ``git`` con il comando ``git request-pull``. + +Alcuni manutentori (incluso Linus) vogliono vedere le richieste di *pull* +da commit firmati con GPG; questo fornisce una maggiore garanzia sul fatto +che siate stati proprio voi a fare la richiesta. In assenza di tale etichetta +firmata Linus, in particolare, non prenderà alcuna patch da siti pubblici come +GitHub. + +Il primo passo verso la creazione di questa etichetta firmata è quello di +creare una chiave GNUPG ed averla fatta firmare da uno o più sviluppatori +principali del kernel. Questo potrebbe essere difficile per i nuovi +sviluppatori, ma non ci sono altre vie. Andare alle conferenze potrebbe +essere un buon modo per trovare sviluppatori che possano firmare la vostra +chiave. 
+
+Una volta che avete preparato la vostra serie di patch in ``git``, e volete che
+qualcuno le prenda, create una etichetta firmata col comando ``git tag -s``.
+Questo creerà una nuova etichetta che identifica l'ultimo commit della serie
+contenente una firma creata con la vostra chiave privata. Avrete anche
+l'opportunità di aggiungere un messaggio di changelog all'etichetta; questo è
+il posto ideale per descrivere gli effetti della richiesta di *pull*.
+
+Se i sorgenti da cui il manutentore prenderà le patch non sono gli stessi del
+repositorio su cui state lavorando, allora non dimenticatevi di caricare
+l'etichetta firmata anche sui sorgenti pubblici.
+
+Quando generate una richiesta di *pull*, usate l'etichetta firmata come
+obiettivo. Un comando come il seguente farà il suo dovere::
+
+  git request-pull master git://my.public.tree/linux.git my-signed-tag
+
+
+Riferimenti
+-----------
+
+Andrew Morton, "La patch perfetta" (tpp).
+  <http://www.ozlabs.org/~akpm/stuff/tpp.txt>
+
+Jeff Garzik, "Formato per la sottomissione di patch per il kernel Linux"
+  <http://linux.yyz.us/patch-format.html>
+
+Greg Kroah-Hartman, "Come scocciare un manutentore di un sottosistema"
+  <http://www.kroah.com/log/linux/maintainer.html>
+
+  <http://www.kroah.com/log/linux/maintainer-02.html>
+
+  <http://www.kroah.com/log/linux/maintainer-03.html>
+
+  <http://www.kroah.com/log/linux/maintainer-04.html>
+
+  <http://www.kroah.com/log/linux/maintainer-05.html>
+
+  <http://www.kroah.com/log/linux/maintainer-06.html>
+
+No!!!! Basta gigantesche bombe patch alle persone sulla lista linux-kernel@vger.kernel.org!
+  <https://lkml.org/lkml/2005/7/11/336>
+
+Kernel Documentation/translations/it_IT/process/coding-style.rst:
+  :ref:`Documentation/translations/it_IT/process/coding-style.rst <it_codingstyle>`
+
+E-mail di Linus Torvalds sul formato canonico di una patch:
+  <http://lkml.org/lkml/2005/4/7/183>
+
+Andi Kleen, "Su come sottomettere patch del kernel"
+  Alcune strategie su come sottomettere modifiche toste o controverse.
+
+  http://halobates.de/on-submitting-patches.pdf
diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst
@@ -245,7 +245,7 @@ Linux カーネルソースツリーの中に含まれる、きれいにし、
 できます。この最新の素晴しいカーネルコードのリポジトリは以下で見つかり
 ます -

-    http://lxr.free-electrons.com/
+    https://elixir.bootlin.com/

 開発プロセス
 ------------
@@ -256,7 +256,6 @@ Linux カーネルの開発プロセスは現在幾つかの異なるメイン

   - メインの 4.x カーネルツリー
   - 4.x.y -stable カーネルツリー
-  - 4.x -git カーネルパッチ
   - サブシステム毎のカーネルツリーとパッチ
   - 統合テストのための 4.x -next カーネルツリー

@@ -319,15 +318,6 @@ Documentation/process/stable-kernel-rules.rst ファイルにはどのような
 類の変更が -stable ツリーに受け入れ可能か、またリリースプロセスがどう
 動くかが記述されています。

-4.x -git パッチ
-~~~~~~~~~~~~~~~
-
-git リポジトリで管理されているLinus のカーネルツリーの毎日のスナップ
-ショットがあります。(だから -git という名前がついています)。これらのパッ
-チはおおむね毎日リリースされており、Linus のツリーの現状を表します。こ
-れは -rc カーネルと比べて、パッチが大丈夫かどうかも確認しないで自動的
-に生成されるので、より実験的です。
-
 サブシステム毎のカーネルツリーとパッチ
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
@@ -77,10 +77,12 @@ Documentation/process/howto.rst

 리눅스 커널 소스 코드는 GPL로 배포(release)되었다. 소스트리의 메인
 디렉토리에 있는 라이센스에 관하여 상세하게 쓰여 있는 COPYING이라는
-파일을 봐라. 여러분이 라이센스에 관한 더 깊은 문제를 가지고 있다면
-리눅스 커널 메일링 리스트에 묻지말고 변호사와 연락하라. 메일링
-리스트들에 있는 사람들은 변호사가 아니기 때문에 법적 문제에 관하여
-그들의 말에 의지해서는 안된다.
+파일을 봐라. 리눅스 커널 라이센싱 규칙과 소스 코드 안의 `SPDX
+<https://spdx.org/>`_ 식별자 사용법은
+:ref:`Documentation/process/license-rules.rst <kernel_licensing>` 에 설명되어
+있다. 여러분이 라이센스에 관한 더 깊은 문제를 가지고 있다면 리눅스 커널 메일링
+리스트에 묻지말고 변호사와 연락하라. 메일링 리스트들에 있는 사람들은 변호사가
+아니기 때문에 법적 문제에 관하여 그들의 말에 의지해서는 안된다.
GPL에 관한 잦은 질문들과 답변들은 다음을 참조하라. @@ -99,7 +101,7 @@ mtk.manpages@gmail.com의 메인테이너에게 보낼 것을 권장한다. 다음은 커널 소스 트리에 있는 읽어야 할 파일들의 리스트이다. - README + :ref:`Documentation/admin-guide/README.rst <readme>` 이 파일은 리눅스 커널에 관하여 간단한 배경 설명과 커널을 설정하고 빌드하기 위해 필요한 것을 설명한다. 커널에 입문하는 사람들은 여기서 시작해야 한다. @@ -220,13 +222,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된 가지고 있지 않다면 다음에 무엇을 해야할지에 관한 방향을 배울 수 있을 것이다. -여러분들이 이미 커널 트리에 반영하길 원하는 코드 묶음을 가지고 있지만 -올바른 포맷으로 포장하는데 도움이 필요하다면 그러한 문제를 돕기 위해 -만들어진 kernel-mentors 프로젝트가 있다. 그곳은 메일링 리스트이며 -다음에서 참조할 수 있다. - - https://selenic.com/mailman/listinfo/kernel-mentors - 리눅스 커널 코드에 실제 변경을 하기 전에 반드시 그 코드가 어떻게 동작하는지 이해하고 있어야 한다. 코드를 분석하기 위하여 특정한 툴의 도움을 빌려서라도 코드를 직접 읽는 것보다 좋은 것은 없다(대부분의 @@ -235,7 +230,7 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된 소스코드를 인덱스된 웹 페이지들의 형태로 보여준다. 최신의 멋진 커널 코드 저장소는 다음을 통하여 참조할 수 있다. - http://lxr.free-electrons.com/ + https://elixir.bootlin.com/ 개발 프로세스 @@ -247,7 +242,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된 - main 4.x 커널 트리 - 4.x.y - 안정된 커널 트리 - - 4.x -git 커널 패치들 - 서브시스템을 위한 커널 트리들과 패치들 - 4.x - 통합 테스트를 위한 next 커널 트리 @@ -303,17 +297,9 @@ Andrew Morton의 글이 있다. 4.x.y는 "stable" 팀<stable@vger.kernel.org>에 의해 관리되며 거의 매번 격주로 배포된다. -커널 트리 문서들 내에 Documentation/process/stable-kernel-rules.rst 파일은 어떤 -종류의 변경들이 -stable 트리로 들어왔는지와 배포 프로세스가 어떻게 -진행되는지를 설명한다. - -4.x -git 패치들 -~~~~~~~~~~~~~~~ - -git 저장소(그러므로 -git이라는 이름이 붙음)에는 날마다 관리되는 Linus의 -커널 트리의 snapshot 들이 있다. 이 패치들은 일반적으로 날마다 배포되며 -Linus의 트리의 현재 상태를 나타낸다. 이 패치들은 정상적인지 조금도 -살펴보지 않고 자동적으로 생성된 것이므로 -rc 커널들 보다도 더 실험적이다. +커널 트리 문서들 내의 :ref:`Documentation/process/stable-kernel-rules.rst <stable_kernel_rules>` +파일은 어떤 종류의 변경들이 -stable 트리로 들어왔는지와 +배포 프로세스가 어떻게 진행되는지를 설명한다. 
서브시스템 커널 트리들과 패치들 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -360,9 +346,10 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 https://bugzilla.kernel.org/page.cgi?id=faq.html -메인 커널 소스 디렉토리에 있는 admin-guide/reporting-bugs.rst 파일은 커널 버그라고 생각되는 -것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 추적하기 위해서 커널 -개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 있다. +메인 커널 소스 디렉토리에 있는 :ref:`admin-guide/reporting-bugs.rst <reportingbugs>` +파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 +추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 +있다. 버그 리포트들의 관리 @@ -377,15 +364,7 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다. 이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org -를 참조하라. 여러분이 앞으로 생겨날 버그 리포트들의 조언자가 되길 원한다면 -bugme-new 메일링 리스트나(새로운 버그 리포트들만이 이곳에서 메일로 전해진다) -bugme-janitor 메일링 리스트(bugzilla에 모든 변화들이 여기서 메일로 전해진다) -에 등록하면 된다. - - https://lists.linux-foundation.org/mailman/listinfo/bugme-new - - https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors - +를 참조하라. 메일링 리스트들 @@ -430,7 +409,8 @@ bugme-janitor 메일링 리스트(bugzilla에 모든 변화들이 여기서 메 "John 커널해커는 작성했다...."를 유지하며 여러분들의 의견을 그 메일의 윗부분에 작성하지 말고 각 인용한 단락들 사이에 넣어라. -여러분들이 패치들을 메일에 넣는다면 그것들은 Documentation/process/submitting-patches.rst에 +여러분들이 패치들을 메일에 넣는다면 그것들은 +:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` 에 나와있는데로 명백히(plain) 읽을 수 있는 텍스트여야 한다. 커널 개발자들은 첨부파일이나 압축된 패치들을 원하지 않는다. 
그들은 여러분들의 패치의 각 라인 단위로 코멘트를 하길 원하며 압축하거나 첨부하지 않고 보내는 것이
diff --git a/Documentation/translations/zh_CN/HOWTO b/Documentation/translations/zh_CN/HOWTO
@@ -192,7 +192,6 @@ Linux内核代码中包含有大量的文档。这些文档对于学习如何与
 些分支包括:
   - 2.6.x主内核源码树
   - 2.6.x.y -stable内核源码树
-  - 2.6.x -git内核补丁集
   - 2.6.x -mm内核补丁集
   - 子系统相关的内核源码树和补丁集

@@ -240,14 +239,6 @@ kernel.org网站的pub/linux/kernel/v2.6/目录下找到它。它的开发遵循
 版内核接受的修改类型以及发布的流程。


-2.6.x -git补丁集
-----------------
-Linus的内核源码树的每日快照,这个源码树是由git工具管理的(由此得名)。这
-些补丁通常每天更新以反映Linus的源码树的最新状态。它们比-rc版本的内核源码
-树更具试验性质,因为这个补丁集是全自动生成的,没有任何人来确认其是否真正
-健全。
-
-
 2.6.x -mm补丁集
 --------------
 这是由Andrew Morton维护的试验性内核补丁集。Andrew将所有子系统的内核源码
diff --git a/Documentation/translations/zh_CN/coding-style.rst b/Documentation/translations/zh_CN/coding-style.rst
@@ -535,26 +535,43 @@ Documentation/doc-guide/ 和 scripts/kernel-doc 以获得详细信息。
       (* (max steps 1)
          c-basic-offset)))

-  (add-hook 'c-mode-common-hook
-            (lambda ()
-              ;; Add kernel style
-              (c-add-style
-               "linux-tabs-only"
-               '("linux" (c-offsets-alist
-                          (arglist-cont-nonempty
-                           c-lineup-gcc-asm-reg
-                           c-lineup-arglist-tabs-only))))))
-
-  (add-hook 'c-mode-hook
-            (lambda ()
-              (let ((filename (buffer-file-name)))
-                ;; Enable kernel mode for the appropriate files
-                (when (and filename
-                           (string-match (expand-file-name "~/src/linux-trees")
-                                         filename))
-                  (setq indent-tabs-mode t)
-                  (setq show-trailing-whitespace t)
-                  (c-set-style "linux-tabs-only")))))
+  (dir-locals-set-class-variables
+   'linux-kernel
+   '((c-mode . (
+          (c-basic-offset . 8)
+          (c-label-minimum-indentation . 0)
+          (c-offsets-alist . (
+                  (arglist-close . c-lineup-arglist-tabs-only)
+                  (arglist-cont-nonempty .
+                      (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only))
+                  (arglist-intro . +)
+                  (brace-list-intro . +)
+                  (c . c-lineup-C-comments)
+                  (case-label . 0)
+                  (comment-intro . c-lineup-comment)
+                  (cpp-define-intro . +)
+                  (cpp-macro . -1000)
+                  (cpp-macro-cont . +)
+                  (defun-block-intro . +)
+                  (else-clause . 0)
+                  (func-decl-cont . +)
+                  (inclass . +)
+                  (inher-cont . c-lineup-multi-inher)
+                  (knr-argdecl-intro . 0)
+                  (label . -1000)
+                  (statement . 0)
+                  (statement-block-intro . +)
+                  (statement-case-intro . +)
+                  (statement-cont . +)
+                  (substatement . +)
+                  ))
+          (indent-tabs-mode . t)
+          (show-trailing-whitespace . t)
+          ))))
+
+  (dir-locals-set-directory-class
+   (expand-file-name "~/src/linux-trees")
+   'linux-kernel)

 这会让 emacs 在 ``~/src/linux-trees`` 下的 C 源文件获得更好的内核代码风格。

diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
@@ -4,7 +4,7 @@ Linux Memory Management Documentation

 This is a collection of documents about the Linux memory management (mm)
 subsystem. If you are looking for advice on simply allocating memory,
-see the :ref:`memory-allocation`.
+see the :ref:`memory_allocation`.

 User guides for MM features
 ===========================
diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
@@ -66,7 +66,7 @@ Trying to find an issue in the dentry cache? Try::
 to only enable debugging on the dentry cache. You may use an asterisk at the
 end of the slab name, in order to cover all slabs with the same prefix. For
 example, here's how you can poison the dentry cache as well as all kmalloc
-slabs:
+slabs::

 	slub_debug=P,kmalloc-*,dentry

@@ -141,7 +141,7 @@ can be influenced by kernel parameters:
 	(list_lock) where contention may occur.

 ``slub_min_order``
-	specifies a minim order of slabs. A similar effect like
+	specifies a minimum order of slabs. A similar effect like
 	``slub_min_objects``.

 ``slub_max_order``
diff --git a/LICENSES/exceptions/GCC-exception-2.0 b/LICENSES/exceptions/GCC-exception-2.0
@@ -0,0 +1,18 @@
+SPDX-Exception-Identifier: GCC-exception-2.0
+SPDX-URL: https://spdx.org/licenses/GCC-exception-2.0.html
+SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-2.0-only, GPL-2.0-or-later
+Usage-Guide:
+  This exception is used together with one of the above SPDX-Licenses to
+  allow linking the compiled version of code to non GPL compliant code.
+ To use this exception add it with the keyword WITH to one of the + identifiers in the SPDX-Licenses tag: + SPDX-License-Identifier: <SPDX-License> WITH GCC-exception-2.0 +License-Text: + +In addition to the permissions in the GNU Library General Public License, +the Free Software Foundation gives you unlimited permission to link the +compiled version of this file into combinations with other programs, and to +distribute those programs without any restriction coming from the use of +this file. (The General Public License restrictions do apply in other +respects; for example, they cover modification of the file, and +distribution when not linked into another program.) diff --git a/include/linux/module.h b/include/linux/module.h @@ -172,7 +172,7 @@ extern void cleanup_module(void); * The following license idents are currently accepted as indicating free * software modules * - * "GPL" [GNU Public License v2 or later] + * "GPL" [GNU Public License v2] * "GPL v2" [GNU Public License v2] * "GPL and additional rights" [GNU Public License v2 rights and more] * "Dual BSD/GPL" [GNU Public License v2 @@ -186,6 +186,22 @@ extern void cleanup_module(void); * * "Proprietary" [Non free products] * + * Both "GPL v2" and "GPL" (the latter also in dual licensed strings) are + * merely stating that the module is licensed under the GPL v2, but are not + * telling whether "GPL v2 only" or "GPL v2 or later". The reason why there + * are two variants is a historic and failed attempt to convey more + * information in the MODULE_LICENSE string. For module loading the + * "only/or later" distinction is completely irrelevant and does neither + * replace the proper license identifiers in the corresponding source file + * nor amends them in any way. The sole purpose is to make the + * 'Proprietary' flagging work and to refuse to bind symbols which are + * exported with EXPORT_SYMBOL_GPL when a non free module is loaded. + * + * In the same way "BSD" is not a clear license information. 
It merely + * states, that the module is licensed under one of the compatible BSD + * license variants. The detailed and correct license information is again + * to be found in the corresponding source files. + * * There are dual licensed components, but when running with Linux it is the * GPL that is relevant so this is a non issue. Similarly LGPL linked with GPL * is a GPL combined work. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h @@ -4323,7 +4323,7 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb) /* Local Checksum Offload. * Compute outer checksum based on the assumption that the * inner checksum will be offloaded later. - * See Documentation/networking/checksum-offloads.txt for + * See Documentation/networking/checksum-offloads.rst for * explanation of how this works. * Fill in outer checksum adjustment (e.g. with sum of outer * pseudo-header) before calling. diff --git a/samples/Kconfig b/samples/Kconfig @@ -147,6 +147,13 @@ config SAMPLE_VFIO_MDEV_MBOCHS Specifically it does *not* include any legacy vga stuff. Device looks a lot like "qemu -device secondary-vga". +config SAMPLE_ANDROID_BINDERFS + bool "Build Android binderfs example" + depends on CONFIG_ANDROID_BINDERFS + help + Builds a sample program to illustrate the use of the Android binderfs + filesystem. 
+ config SAMPLE_STATX bool "Build example extended-stat using code" depends on BROKEN diff --git a/samples/Makefile b/samples/Makefile @@ -3,4 +3,4 @@ obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \ hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \ configfs/ connector/ v4l/ trace_printk/ \ - vfio-mdev/ statx/ qmi/ + vfio-mdev/ statx/ qmi/ binderfs/ diff --git a/samples/binderfs/Makefile b/samples/binderfs/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs_example.o diff --git a/samples/binderfs/binderfs_example.c b/samples/binderfs/binderfs_example.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <sched.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/mount.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <unistd.h> +#include <linux/android/binder.h> +#include <linux/android/binderfs.h> + +int main(int argc, char *argv[]) +{ + int fd, ret, saved_errno; + size_t len; + struct binderfs_device device = { 0 }; + + ret = unshare(CLONE_NEWNS); + if (ret < 0) { + fprintf(stderr, "%s - Failed to unshare mount namespace\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0); + if (ret < 0) { + fprintf(stderr, "%s - Failed to mount / as private\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = mkdir("/dev/binderfs", 0755); + if (ret < 0 && errno != EEXIST) { + fprintf(stderr, "%s - Failed to create binderfs mountpoint\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = mount(NULL, "/dev/binderfs", "binder", 0, 0); + if (ret < 0) { + fprintf(stderr, "%s - Failed to mount binderfs\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + memcpy(device.name, "my-binder", strlen("my-binder")); + + fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC); + if (fd < 0) { + fprintf(stderr, "%s - Failed to open 
binder-control device\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = ioctl(fd, BINDER_CTL_ADD, &device); + saved_errno = errno; + close(fd); + errno = saved_errno; + if (ret < 0) { + fprintf(stderr, "%s - Failed to allocate new binder device\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + printf("Allocated new binder device with major %d, minor %d, and name %s\n", + device.major, device.minor, device.name); + + ret = unlink("/dev/binderfs/my-binder"); + if (ret < 0) { + fprintf(stderr, "%s - Failed to delete binder device\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + /* Cleanup happens when the mount namespace dies. */ + exit(EXIT_SUCCESS); +} diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl @@ -6396,19 +6396,6 @@ sub process { } } -# check for bool bitfields - if ($sline =~ /^.\s+bool\s*$Ident\s*:\s*\d+\s*;/) { - WARN("BOOL_BITFIELD", - "Avoid using bool as bitfield. Prefer bool bitfields as unsigned int or u<8|16|32>\n" . $herecurr); - } - -# check for bool use in .h files - if ($realfile =~ /\.h$/ && - $sline =~ /^.\s+bool\s*$Ident\s*(?::\s*d+\s*)?;/) { - CHK("BOOL_MEMBER", - "Avoid using bool structure members because of possible alignment issues - see: https://lkml.org/lkml/2017/11/21/384\n" . 
$herecurr); - } - # check for semaphores initialized locked if ($line =~ /^.\s*sema_init.+,\W?0\W?\)/) { WARN("CONSIDER_COMPLETION", diff --git a/scripts/kernel-doc b/scripts/kernel-doc @@ -1474,7 +1474,7 @@ sub push_parameter($$$$) { if (!defined $parameterdescs{$param} && $param !~ /^#/) { $parameterdescs{$param} = $undescribed; - if (show_warnings($type, $declaration_name)) { + if (show_warnings($type, $declaration_name) && $param !~ /\./) { print STDERR "${file}:$.: warning: Function parameter or member '$param' not described in '$declaration_name'\n"; ++$warnings; diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py @@ -175,7 +175,13 @@ class id_parser(object): self.lines_checked += 1 if line.find("SPDX-License-Identifier:") < 0: continue - expr = line.split(':')[1].replace('*/', '').strip() + expr = line.split(':')[1].strip() + # Remove trailing comment closure + if line.strip().endswith('*/'): + expr = expr.rstrip('*/').strip() + # Special case for SH magic boot code files + if line.startswith('LIST \"'): + expr = expr.rstrip('\"').strip() self.parse(expr) self.spdx_valid += 1 # diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c @@ -4472,7 +4472,7 @@ err_af: } /* This supports connect(2) and SCTP connect services such as sctp_connectx(3) - * and sctp_sendmsg(3) as described in Documentation/security/LSM-sctp.rst + * and sctp_sendmsg(3) as described in Documentation/security/SCTP.rst */ static int selinux_socket_connect_helper(struct socket *sock, struct sockaddr *address, int addrlen) diff --git a/tools/Makefile b/tools/Makefile @@ -12,6 +12,7 @@ help: @echo ' acpi - ACPI tools' @echo ' cgroup - cgroup tools' @echo ' cpupower - a tool for all things x86 CPU power' + @echo ' debugging - tools for debugging' @echo ' firewire - the userspace part of nosy, an IEEE-1394 traffic sniffer' @echo ' firmware - Firmware tools' @echo ' freefall - laptop accelerometer program for disk protection' @@ -61,7 +62,7 @@ acpi: FORCE cpupower: FORCE $(call 
descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci firmware debugging: FORCE $(call descend,$@) liblockdep: FORCE @@ -96,7 +97,8 @@ kvm_stat: FORCE all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ - tmon freefall iio objtool kvm_stat wmi pci + tmon freefall iio objtool kvm_stat wmi \ + pci debugging acpi_install: $(call descend,power/$(@:_install=),install) @@ -104,7 +106,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) -cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -130,7 +132,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install + wmi_install pci_install debugging_install acpi_clean: $(call descend,power/acpi,clean) @@ -138,7 +140,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean firmware_clean debugging_clean: $(call 
descend,$(@:_clean=),clean) liblockdep_clean: @@ -176,6 +178,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ - gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean + gpio_clean objtool_clean leds_clean wmi_clean pci_clean firmware_clean debugging_clean .PHONY: FORCE diff --git a/tools/debugging/Makefile b/tools/debugging/Makefile @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for debugging tools + +PREFIX ?= /usr +BINDIR ?= bin +INSTALL ?= install + +TARGET = kernel-chktaint + +all: $(TARGET) + +clean: + +install: kernel-chktaint + $(INSTALL) -D -m 755 $(TARGET) $(DESTDIR)$(PREFIX)/$(BINDIR)/$(TARGET) + diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint @@ -0,0 +1,202 @@ +#! /bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Randy Dunlap <rdunlap@infradead.org>, 2018 +# Thorsten Leemhuis <linux@leemhuis.info>, 2018 + +usage() +{ + cat <<EOF +usage: ${0##*/} + ${0##*/} <int> + +Call without parameters to decode /proc/sys/kernel/tainted. + +Call with a positive integer as parameter to decode a value you +retrieved from /proc/sys/kernel/tainted on another system. + +EOF +} + +if [ "$1"x != "x" ]; then + if [ "$1"x == "--helpx" ] || [ "$1"x == "-hx" ] ; then + usage + exit 1 + elif [ $1 -ge 0 ] 2>/dev/null ; then + taint=$1 + else + echo "Error: Parameter '$1' not a positive interger. Aborting." >&2 + exit 1 + fi +else + TAINTFILE="/proc/sys/kernel/tainted" + if [ ! 
-r $TAINTFILE ]; then
+		echo "No file: $TAINTFILE"
+		exit
+	fi
+
+	taint=`cat $TAINTFILE`
+fi
+
+if [ $taint -eq 0 ]; then
+	echo "Kernel not Tainted"
+	exit
+else
+	echo "Kernel is \"tainted\" for the following reasons:"
+fi
+
+T=$taint
+out=
+
+addout() {
+	out=$out$1
+}
+
+if [ `expr $T % 2` -eq 0 ]; then
+	addout "G"
+else
+	addout "P"
+	echo " * proprietary module was loaded (#0)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "F"
+	echo " * module was force loaded (#1)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "S"
+	echo " * SMP kernel oops on an officially SMP incapable processor (#2)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "R"
+	echo " * module was force unloaded (#3)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "M"
+	echo " * processor reported a Machine Check Exception (MCE) (#4)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "B"
+	echo " * bad page referenced or some unexpected page flags (#5)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "U"
+	echo " * taint requested by userspace application (#6)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "D"
+	echo " * kernel died recently, i.e.
there was an OOPS or BUG (#7)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "A"
+	echo " * an ACPI table was overridden by user (#8)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "W"
+	echo " * kernel issued warning (#9)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "C"
+	echo " * staging driver was loaded (#10)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "I"
+	echo " * workaround for bug in platform firmware applied (#11)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "O"
+	echo " * externally-built ('out-of-tree') module was loaded (#12)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "E"
+	echo " * unsigned module was loaded (#13)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "L"
+	echo " * soft lockup occurred (#14)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "K"
+	echo " * kernel has been live patched (#15)"
+fi
+
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "X"
+	echo " * auxiliary taint, defined for and used by distros (#16)"
+
+fi
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "T"
+	echo " * kernel was built with the struct randomization plugin (#17)"
+fi
+
+echo "For a more detailed explanation of the various taint flags see"
+echo " Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel sources"
+echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html"
+echo "Raw taint value as int/string: $taint/'$out'"
+#EOF#
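
Note (not part of the commit): the kernel-chktaint script added above decodes the taint value one bit at a time by repeatedly dividing by two. The same decoding can be sketched with a lookup table and bit tests. This is only an illustration under stated assumptions: the flag letters and reason strings are copied from the script in this diff, while `TAINT_FLAGS` and `decode_taint` are hypothetical names that do not exist in the kernel tree.

```python
# Illustrative sketch only; mirrors the flag table of tools/debugging/kernel-chktaint.
# TAINT_FLAGS and decode_taint() are hypothetical names, not kernel code.

# Index in this list == bit number in /proc/sys/kernel/tainted.
TAINT_FLAGS = [
    ("P", "proprietary module was loaded"),
    ("F", "module was force loaded"),
    ("S", "SMP kernel oops on an officially SMP incapable processor"),
    ("R", "module was force unloaded"),
    ("M", "processor reported a Machine Check Exception (MCE)"),
    ("B", "bad page referenced or some unexpected page flags"),
    ("U", "taint requested by userspace application"),
    ("D", "kernel died recently, i.e. there was an OOPS or BUG"),
    ("A", "an ACPI table was overridden by user"),
    ("W", "kernel issued warning"),
    ("C", "staging driver was loaded"),
    ("I", "workaround for bug in platform firmware applied"),
    ("O", "externally-built ('out-of-tree') module was loaded"),
    ("E", "unsigned module was loaded"),
    ("L", "soft lockup occurred"),
    ("K", "kernel has been live patched"),
    ("X", "auxiliary taint, defined for and used by distros"),
    ("T", "kernel was built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return (flag_string, reasons) for a non-negative integer taint value."""
    if value < 0:
        raise ValueError("taint value must be non-negative")
    letters = []
    reasons = []
    for bit, (letter, reason) in enumerate(TAINT_FLAGS):
        if value & (1 << bit):
            letters.append(letter)
            reasons.append(" * %s (#%d)" % (reason, bit))
        else:
            # As in the script: a clear bit 0 prints "G", any other clear bit a space.
            letters.append("G" if bit == 0 else " ")
    return "".join(letters), reasons


if __name__ == "__main__":
    taint = 4097  # example value: bits 0 and 12 set
    flags, reasons = decode_taint(taint)
    print("\n".join(reasons))
    print("Raw taint value as int/string: %d/'%s'" % (taint, flags))
```

A table keeps the bit number, letter, and description on one line, so adding a future taint flag means adding one entry instead of another copy of the if/else block. Bits above 17 are ignored here, just as the script stops after bit 17.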