System means general functions used to support and manage other kernel functionality. Synonym: infrastructure.
Booting and Halting
Kernel booting
The kernel boots in two stages. In the first stage, the kernel (as a compressed image file) is loaded into memory and decompressed, and a few fundamentals such as essential hardware and basic memory management (memory paging) are set up. Control then passes one final time to the main kernel startup code, which calls start_kernel. start_kernel performs the majority of system setup (interrupts, the rest of memory management, device and driver initialization, etc.) before separately starting the idle process and scheduler and spawning the init process (which is executed in user space).
Kernel loading stage
The kernel as loaded is typically an image file in zImage or bzImage format, compressed with gzip (zlib) by default; other algorithms such as xz, LZ4, or zstd can be selected at build time. A routine at the head of the image does a minimal amount of hardware setup, decompresses the image fully into high memory, and takes note of any RAM disk if configured. It then starts the kernel proper via startup_64 (on the x86_64 architecture).
- arch/x86/boot/compressed/vmlinux.lds.S src - linker script, defines the entry point startup_64 id in
  - arch/x86/boot/compressed/head_64.S src - assembly of the extractor
    - extract_kernel id - extractor written in C
      - prints "Decompressing Linux... done. Booting the kernel."
Kernel startup stage
The startup function for the kernel (also called the swapper or process 0) establishes memory management (page tables and memory paging), detects the type of CPU and any additional functionality such as floating point capabilities, and then switches to non-architecture-specific Linux kernel functionality via a call to start_kernel.
- arch/x86/kernel/vmlinux.lds.S src - linker script
- arch/x86/kernel/head_64.S src - assembly of uncompressed startup code
- arch/x86/kernel/head64.c src - platform-dependent startup
- init/main.c src - main initialization code
  - start_kernel id - 200 SLOC
    - rest_init id
      - kernel_init id - deferred kernel thread #1
        - kernel_init_freeable id - this and the following functions are defined with the attribute __init id
        - run_init_process id - runs the first process, man 1 init
      - kthreadd id - deferred kernel thread #2
      - cpu_startup_entry id
start_kernel executes a wide range of initialization functions. It sets up interrupt handling (IRQs) and further configures memory, then calls rest_init, which spawns the kernel thread that eventually executes man 1 init (the first user-space process) and finally enters the idle task via cpu_startup_entry.
Notably, the kernel startup process also mounts the initial RAM disk (man 8 initrd), loaded earlier during the boot phase, as the temporary root file system. The initrd allows driver modules to be loaded directly from memory, without reliance upon other devices (e.g. a hard disk) and the drivers that are needed to access them (e.g. a SATA driver). Splitting drivers between those statically compiled into the kernel and those loaded from the initrd allows for a smaller kernel. Once the real root file system is accessible, the root is switched via man 8 pivot_root / man 2 pivot_root, which makes the real file system the new root and moves the temporary one to a mount point beneath it, from where it can be unmounted. The memory used by the temporary root file system is then reclaimed.
References
- Article about booting of the kernel
- Initial RAM disk doc
- Linux startup process
- init
- http://tldp.org/HOWTO/Linux-i386-Boot-Code-HOWTO/
- http://www.tldp.org/LDP/lki/lki-1.html
- http://www.tldp.org/HOWTO/KernelAnalysis-HOWTO-4.html
- IBM description of Linux BIOS boot process
- Linux (U)EFI boot process
- Linux kernel boot parameters
- kernel parameters doc
- Kernel booting process
- Kernel initialization process
Halting and rebooting
TODO
API: sys_reboot id calls machine_restart id, machine_halt id, or machine_power_off id
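As a rough illustration of the dispatch named above, the sketch below maps the command values accepted by man 2 reboot to the machine_* function each one is expected to reach. The constant values are the ones documented in man 2 reboot; the lookup table and the helper name machine_function_for are hypothetical, and nothing here actually invokes a reboot.

```python
# Values documented in reboot(2) and include/uapi/linux/reboot.h.
LINUX_REBOOT_MAGIC1 = 0xFEE1DEAD
LINUX_REBOOT_MAGIC2 = 672274793
LINUX_REBOOT_CMD_RESTART = 0x01234567
LINUX_REBOOT_CMD_HALT = 0xCDEF0123
LINUX_REBOOT_CMD_POWER_OFF = 0x4321FEDC

# Illustrative table (not kernel code): the machine_* function that each
# command is expected to end up in.
_DISPATCH = {
    LINUX_REBOOT_CMD_RESTART: "machine_restart",
    LINUX_REBOOT_CMD_HALT: "machine_halt",
    LINUX_REBOOT_CMD_POWER_OFF: "machine_power_off",
}


def machine_function_for(cmd):
    """Return the machine_* function name for a reboot command, or None."""
    return _DISPATCH.get(cmd)


if __name__ == "__main__":
    # Only prints the table; it never calls reboot(2).
    for cmd, name in _DISPATCH.items():
        print(f"{cmd:#010x} -> {name}")
```

The real system call additionally checks the two magic numbers and the caller's privileges before it dispatches on the command.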
User space communication
References
More
- include/uapi src - shared with user space API headers
System calls
Internals:
- linux/syscalls.h inc
- syscall_init id
- entry_SYSCALL_64 id
- do_syscall_64 id
- man 2 syscall
- man 2 syscalls
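To make the path from user space into the kernel a little more concrete, here is a minimal sketch that issues getpid through the generic libc wrapper described in man 2 syscall instead of the dedicated getpid wrapper. The syscall numbers in the table are assumptions that hold for x86_64 and aarch64 only; the helper name raw_getpid is hypothetical.

```python
import ctypes
import os
import platform

# getpid syscall numbers for two common 64-bit architectures. Other
# architectures use different numbers and are skipped by this sketch.
_SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Call getpid via libc's syscall() wrapper.

    Returns None when the platform is not covered by the table above.
    """
    if platform.system() != "Linux":
        return None
    if ctypes.sizeof(ctypes.c_void_p) != 8:
        return None  # a 32-bit process would need a different table
    number = _SYS_GETPID.get(platform.machine())
    if number is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(number))


if __name__ == "__main__":
    print("raw syscall:", raw_getpid(), "os.getpid():", os.getpid())
```

On a supported platform both values should match, since both calls reach the same kernel entry point.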
References
- System call
- Directory of system calls, man section 2
- Anatomy of a system call, part 1 and part 2
- syscalls ltp
Historical
Administration
TODO
References
procfs
The proc filesystem (procfs) is a special filesystem that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory. Typically, it is mapped to a mount point named /proc at boot time. The proc file system acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system and to change certain kernel parameters at runtime.
/proc includes a directory for each running process, including kernel threads, named /proc/PID, where PID is the process number. Each directory contains information about one process, including, among much else:
- /proc/PID/cmdline - the command that originally started the process
- /proc/PID/environ - the names and values of its environment variables
- /proc/PID/cwd - a symlink to its working directory
- /proc/PID/exe - a symlink to the original executable file, if it still exists
- /proc/PID/fd - a directory with a symlink for each open file descriptor
- /proc/PID/fdinfo - a directory with the status (position, flags, ...) of each open file descriptor
- /proc/PID/maps - information about mapped files and blocks like heap and stack
- /proc/PID/mem - a binary image representing the process's virtual memory
- /proc/PID/root - a symlink to the root path as seen by the process
- /proc/PID/task - a directory with one subdirectory per thread of the process
- /proc/PID/status - basic information about the process, including its run state and memory usage
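A minimal sketch of reading a few of these per-process entries follows. It assumes the usual layout (cmdline as NUL-separated bytes, status as "Key:<tab>value" lines); the function names are hypothetical and chosen for this example only.

```python
import os


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/PID/cmdline into arguments."""
    if not raw:
        return []  # kernel threads have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminator of the last argument
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/PID/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def describe_process(pid="self", proc_root="/proc"):
    """Collect command line, working directory, name and run state."""
    base = os.path.join(proc_root, str(pid))
    with open(os.path.join(base, "cmdline"), "rb") as f:
        cmdline = parse_cmdline(f.read())
    with open(os.path.join(base, "status")) as f:
        status = parse_status(f.read())
    return {
        "cmdline": cmdline,
        "cwd": os.readlink(os.path.join(base, "cwd")),
        "name": status.get("Name"),
        "state": status.get("State"),
    }


if __name__ == "__main__":
    if os.path.isdir("/proc/self"):
        print(describe_process())
```

Reading entries of other users' processes (cwd, exe, environ) generally requires matching credentials or elevated privileges, so the sketch defaults to /proc/self.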
References
sysfs
sysfs is a pseudo-file system that exports information about various kernel subsystems, hardware devices, and associated device drivers from the kernel's device model to user space through virtual files. In addition to providing this information, the exported virtual files are also used to configure those devices and subsystems. Sysfs is designed to export the information present in the device tree, which then no longer needs to clutter up procfs.
Sysfs is mounted under the /sys mount point.
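The "virtual files" are mostly small text attributes, typically one value per file, so reading and configuring can be sketched with ordinary file I/O. The helper names below are hypothetical, and the use of /sys/class/net/lo in the demo is an assumption about the running system.

```python
import os


def read_attributes(directory):
    """Read every readable regular file in a sysfs-style directory."""
    values = {}
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # skip symlinks and subdirectories
        try:
            with open(path) as f:
                values[entry] = f.read().strip()
        except (OSError, UnicodeDecodeError):
            pass  # write-only, binary, or otherwise unreadable attribute
    return values


def write_attribute(directory, name, value):
    """Configure something by writing a text value to an attribute file."""
    with open(os.path.join(directory, name), "w") as f:
        f.write(f"{value}\n")


if __name__ == "__main__":
    demo = "/sys/class/net/lo"  # assumption: a loopback interface exists
    if os.path.isdir(demo):
        attrs = read_attributes(demo)
        print({k: attrs[k] for k in ("mtu", "address") if k in attrs})
```

Writing to real attributes usually requires root, and which values are accepted depends entirely on the driver behind the file.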
API:
References
devtmpfs
devtmpfs is a hybrid kernel/user-space device filesystem: the kernel creates the device nodes itself, so that they exist before udev runs for the first time.
References
Driver Model (or Device Model)
Classes
A class is a higher-level view of a device that abstracts out low-level implementation details. Drivers may see an NVMe drive or a SATA drive, but at the class level they are all simply block_class devices. Classes allow user space to work with devices based on what they do, rather than how they are connected or how they work.
API:
- ls /sys/class/
- class_register id registers class id
- linux/device/class.h inc
Examples: input_class id, block_class id, net_class id
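From user space, the class view can be explored by walking /sys/class. The sketch below lists the entries registered under each class and resolves one of them to the device directory it points at; the function names are hypothetical, and it assumes that class entries are symlinks into /sys/devices, which is the common case but not true of every file found there.

```python
import os


def devices_by_class(sys_class="/sys/class"):
    """Map each class name to the entry names found under it."""
    result = {}
    if not os.path.isdir(sys_class):
        return result
    for cls in sorted(os.listdir(sys_class)):
        cls_dir = os.path.join(sys_class, cls)
        if os.path.isdir(cls_dir):
            result[cls] = sorted(os.listdir(cls_dir))
    return result


def device_path(cls, name, sys_class="/sys/class"):
    """Resolve a class entry to the device directory it links to."""
    return os.path.realpath(os.path.join(sys_class, cls, name))


if __name__ == "__main__":
    for cls, names in devices_by_class().items():
        print(f"{cls}: {len(names)} entries")
```

For example, an NVMe namespace and a SATA disk would both appear under the block class, while device_path shows the very different bus paths behind them.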
Buses
A bus is a channel between the processor and one or more devices. For the purposes of the device model, all devices are connected via a bus, even if it is an internal, virtual platform bus (platform_bus_type). Buses can plug into each other: a USB controller is usually a PCI device, for example. The device model represents the actual connections between buses and the devices they control. A bus is represented by the bus_type structure, which contains the name, the default attributes, the bus's methods, PM operations, and the driver core's private data.
API:
- ls /sys/bus/
- bus_register id registers bus_type id
- linux/device/bus.h inc
Examples: usb_bus_type id, hid_bus_type id, pci_bus_type id, scsi_bus_type id, platform_bus_type id
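The connections between a bus, its devices, and their drivers are visible under /sys/bus/NAME/devices and /sys/bus/NAME/drivers. As a sketch, the hypothetical helper below reports which driver (if any) each device on a bus is bound to, assuming the usual layout in which a bound device directory contains a driver symlink.

```python
import os


def bus_bindings(bus, sys_bus="/sys/bus"):
    """Return {device name: bound driver name or None} for one bus."""
    devices_dir = os.path.join(sys_bus, bus, "devices")
    bindings = {}
    if not os.path.isdir(devices_dir):
        return bindings
    for dev in sorted(os.listdir(devices_dir)):
        driver_link = os.path.join(devices_dir, dev, "driver")
        if os.path.islink(driver_link):
            # The link target ends in .../drivers/<driver name>.
            bindings[dev] = os.path.basename(os.readlink(driver_link))
        else:
            bindings[dev] = None  # no driver bound to this device
    return bindings


if __name__ == "__main__":
    for dev, drv in bus_bindings("pci").items():
        print(dev, "->", drv)
```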
Drivers
API:
- ls /sys/bus/*/drivers/
- module_driver id - simple common driver initializer, used for example in module_pci_driver id
- driver_register id registers device_driver id - the basic device driver structure, one shared by all device instances
- linux/device/driver.h inc
Examples: hid_generic id, usb_register_device_driver id
Platform drivers
- module_platform_driver id registers platform_driver id (platform wrapper of device_driver id) with platform_bus_type id
- linux/platform_device.h inc
Examples: gpio_mouse_device_driver id
Devices
API:
- ls /sys/devices/
- device_register id registers device id - the basic device structure, one per device instance
- linux/device.h inc
Examples: platform_bus id, mousedev_create
Platform devices
- platform_device id - platform wrapper of struct device (the basic device structure doc); contains the resources associated with the device
- It can be created dynamically by platform_device_register_simple id or platform_device_alloc id, or registered with platform_device_register id.
- platform_device_unregister id - releases the device and associated resources
Examples: add_pcspkr id
API: TODO
- platform_device_info platform_device_id platform_device_register_full platform_device_add
- platform_device_add_data platform_device_register_data platform_device_add_resources
- attribute_group dev_pm_ops
Internals:
References
Devices
Classic UNIX devices are character (char) devices, used as byte streams and controlled with man 2 ioctl.
API:
ls /dev
cat /proc/devices
cat /proc/misc
Examples: misc_fops id, usb_fops id, memory_fops id
- drivers/char src - actually byte stream devices
- Chapter 13. I/O Architecture and Device Drivers
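The output of cat /proc/devices lists the registered major numbers in two sections, one for character devices and one for block devices. A minimal parser for that layout might look as follows; the function name is hypothetical and the sample format is the one commonly seen on current kernels.

```python
import os


def parse_proc_devices(text):
    """Return {'character': {major: [names]}, 'block': {major: [names]}}."""
    tables = {"character": {}, "block": {}}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line == "Character devices:":
            current = tables["character"]
        elif line == "Block devices:":
            current = tables["block"]
        elif line and current is not None:
            major, name = line.split(None, 1)
            # Several drivers may share one major number.
            current.setdefault(int(major), []).append(name)
    return tables


if __name__ == "__main__":
    if os.path.exists("/proc/devices"):
        with open("/proc/devices") as f:
            tables = parse_proc_devices(f.read())
        print("character majors:", sorted(tables["character"]))
        print("block majors:", sorted(tables["block"]))
```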
DMA
API:
- dma_alloc_coherent id dma_pool_create id dma_map_sg id dma_alloc_pages id pin_user_pages id
- linux/dma-mapping.h inc
- linux/gfp.h inc
- linux/dmapool.h inc
- DMA-able memory: __get_free_page id kmalloc id kmem_cache_alloc id
- get_user_pages id pins user pages in memory
Internals
References
- Dynamic DMA mapping Guide doc
- Dynamic DMA mapping using the generic device doc
- LWM: get_user_pages, pinned pages, and DAX
- pin_user_pages() and related calls doc
Historical
SAC Single Address Cycle
Modules
lsmod
cat /proc/modules
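lsmod is essentially a formatted view of /proc/modules. The sketch below parses the first five whitespace-separated fields of each line (name, size, reference count, users, state); the Module tuple and the function name are hypothetical, and the field layout is assumed from typical output.

```python
import os
from collections import namedtuple

Module = namedtuple("Module", "name size refcount used_by state")


def parse_proc_modules(text):
    """Parse /proc/modules lines into Module tuples."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a module line
        name, size, refcount, users, state = fields[:5]
        modules.append(Module(
            name=name,
            size=int(size),
            # "-" appears when the kernel does not track unload counts.
            refcount=int(refcount) if refcount.isdigit() else None,
            # Users are comma-separated with a trailing comma, or "-".
            used_by=[] if users == "-" else [u for u in users.split(",") if u],
            state=state,
        ))
    return modules


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            for mod in parse_proc_modules(f.read())[:5]:
                print(mod.name, mod.size, mod.used_by)
```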
References
- LDD3: Building and Running Modules
- http://www.xml.com/ldd/chapter/book/ch02.html
- http://www.tldp.org/LDP/tlk/modules/modules.html
- http://www.tldp.org/LDP/lkmpg/2.6/html/ The Linux Kernel Module Programming Guide
Peripheral buses
Shell interface:
ls /proc/bus/
ls /sys/bus/
API: see Buses under Driver Model
Input: keyboard and mouse
Querying information from shell:
cat /proc/bus/input/devices
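The file is organized as blank-line-separated blocks, one per input device, with tagged lines such as N: for the name and H: for the attached handlers. A small sketch that extracts just those two fields is shown below; the function name is hypothetical and the line format is assumed from typical output.

```python
import os


def parse_input_devices(text):
    """Return a list of {'name': str, 'handlers': [str]} dicts."""
    devices = []
    for block in text.strip().split("\n\n"):
        dev = {"name": None, "handlers": []}
        for line in block.splitlines():
            tag, _, rest = line.partition(": ")
            if tag == "N" and rest.startswith("Name="):
                dev["name"] = rest[len("Name="):].strip('"')
            elif tag == "H" and rest.startswith("Handlers="):
                dev["handlers"] = rest[len("Handlers="):].split()
        if dev["name"] is not None:
            devices.append(dev)
    return devices


if __name__ == "__main__":
    path = "/proc/bus/input/devices"
    if os.path.exists(path):
        with open(path) as f:
            for dev in parse_input_devices(f.read()):
                print(dev["name"], dev["handlers"])
```

The handler names (for example event0 or mouse0) correspond to the device nodes that user space opens under /dev/input.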
PCI
Querying information from shell:
lspci -vv
column -t /proc/bus/pci/devices
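Each line of /proc/bus/pci/devices starts with three hexadecimal fields: the bus number and device/function byte packed together, the vendor and device IDs packed together, and the IRQ. The sketch below decodes only these leading fields and ignores the resource columns that follow; the function name is hypothetical, and the field layout is an assumption based on typical output.

```python
import os


def parse_pci_line(line):
    """Decode the first three fields of a /proc/bus/pci/devices line."""
    fields = line.split()
    bus_devfn = int(fields[0], 16)
    ids = int(fields[1], 16)
    return {
        "bus": bus_devfn >> 8,
        "slot": (bus_devfn >> 3) & 0x1F,   # upper 5 bits of devfn
        "function": bus_devfn & 0x07,      # lower 3 bits of devfn
        "vendor": ids >> 16,
        "device": ids & 0xFFFF,
        "irq": int(fields[2], 16),
    }


if __name__ == "__main__":
    path = "/proc/bus/pci/devices"
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                d = parse_pci_line(line)
                print(f"{d['bus']:02x}:{d['slot']:02x}.{d['function']} "
                      f"{d['vendor']:04x}:{d['device']:04x}")
```

The printed bus:slot.function and vendor:device pairs can be compared with the output of lspci -n.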
Main article: PCI
USB
Querying information from shell:
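lsusb -v
cat /sys/kernel/debug/usb/devices
(/proc/bus/usb/devices existed only with the legacy usbfs; on current kernels the same listing is provided through debugfs.)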
Other
Hardware interfaces
I/O ports and registers
API:
Functions for memory mapped registers:
The {in,out}[bwl] macros are for emulating x86-style PCI/ISA IO space:
Hardware Device Drivers
Keywords: kobjects, sysfs, buses, devices, drivers, classes, firmware, hotplug, gpio, pin, clock, mux
Internals:
References