Embedded Linux boot optimization: from seconds to milliseconds

In the world of professional embedded systems, boot time is not a simple performance parameter but an element that directly affects the quality of the product, the user experience and, in many cases, even the functional requirements of the system.

If on a desktop computer waiting a few seconds during startup is perfectly acceptable, on an embedded device the perspective changes radically. An HMI panel, an industrial gateway, a control system or a medical device are perceived as dedicated tools, not as general-purpose computers. The user expects an almost immediate response, an operational availability that is more reminiscent of a household appliance than a workstation.

Reducing embedded Linux boot time to under two seconds is a realistic goal today, but only if the problem is addressed with a systemic approach. There is no single magic parameter and no universal patch. Boot speed is the result of a chain of architectural decisions, conscious trade-offs and rigorous analysis of system dependencies.

The Real Nature of Embedded Linux Boot

One of the most common simplifications is to consider boot as a monolithic event: device power, kernel load, system ready. In reality, the boot process of an embedded Linux system is a complex sequence of phases, each with different characteristics, time costs and critical issues.

When the device is turned on, control initially passes to a bootstrap logic resident in the SoC ROM, which has the task of identifying the boot media and loading the bootloader. This first phase, although invisible to the application software, already contributes to the overall time.

The bootloader, typically U-Boot or a variant thereof, is the first component the designer can truly configure. It loads the kernel, the device tree and the boot parameters. The Linux kernel is then decompressed, initializes its internal subsystems, probes the available hardware and mounts the root filesystem. Only at this point do the init system and user-space come into play.

The boot, therefore, is not "Linux starting", but a pipeline of events in which each stage can introduce significant latencies.

Practical example: pipeline visible on the serial log

If you have the serial console active, you can already distinguish “bootloader → kernel → user-space” by looking at what is printed and when. In kernel messages, the timestamps in square brackets are seconds since the kernel started:

dmesg | head -n 80

Explanation: if the kernel mounts the root filesystem at ~1.6s but the device only “seems” ready after 8s, the slowness is almost certainly in user-space (systemd / services / apps) or in the stages that run before the kernel clock starts (boot ROM and bootloader), which these timestamps do not cover.
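To put a number on the kernel share, one option is to extract the timestamp of the line where the kernel hands control to init. The sketch below assumes the kernel logs "Run <path> as init process", which recent kernels do, but the exact message and init path vary, so the pattern is an assumption to adapt. The helper name kernel_handoff_time is hypothetical.

```shell
#!/bin/sh
# Print the kernel timestamp (in seconds) at which control is handed to init.
# Reads dmesg-style text on stdin.
# Assumption: the kernel logs "Run <path> as init process"; if that line is
# missing (older kernels, quiet logs), nothing is printed.
kernel_handoff_time() {
    awk '
        /^\[ *[0-9]+\.[0-9]+\] Run .* as init process/ {
            t = $0
            sub(/^\[ */, "", t)    # drop the leading "[" and padding
            sub(/\].*/, "", t)     # drop "]" and the message text
            print t
            exit
        }'
}

# Usage on the target:
#   dmesg | kernel_handoff_time
```

Everything after the printed value is user-space time; everything before the kernel's zero point still has to be measured externally.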

Why Many Embedded Linux Systems Are Slow to Start Up

Linux, by its nature, is designed to be extremely flexible. It supports a huge variety of hardware, file systems, drivers and configurations. This versatility is a tremendous benefit, but it comes with a side effect: an unoptimized system tends to bring with it unnecessary functionality.

It is common to encounter embedded products built starting from generic Board Support Packages or reference configurations designed for development, not production. In such contexts, the kernel includes drivers for missing devices, the bootloader probes unused interfaces, and the init system starts superfluous services.

The result is a boot time inflated not by intrinsic inefficiencies, but by unfiltered choices. Every driver initialized, every device queried, every service started has a time cost. In a dedicated embedded system, what is not needed should simply not exist.

Practical example: drivers that introduce "holes" in the boot

When a driver tries to initialize missing hardware it can create gaps of hundreds of milliseconds (or seconds). You can highlight them automatically by searching the log for large jumps between consecutive timestamps:

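dmesg | awk '
  /^\[ *[0-9]+\.[0-9]+\]/ {
    # extract the timestamp without touching brackets inside the message
    t = $0; sub(/^\[ */, "", t); sub(/\].*/, "", t);
    msg = $0; sub(/^\[[^]]*\] ?/, "", msg);
    if (prev != "") {
      dt = t - prev;
      # report gaps longer than 200 ms, with the message that follows the gap
      if (dt > 0.2) printf("GAP %.3fs -> %s\n", dt, msg);
    }
    prev = t;
  }' | head -n 50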

Explanation: If you see recurring gaps (e.g. 0.6–1.2s) around probing messages of a specific bus or driver, you have an immediate candidate to remove or reconfigure in kernel config or device tree.

Measuring Boot Time: A Too Often Ignored Step

A surprisingly common methodological error consists in attempting optimizations without a clear temporal breakdown of the boot. Without data, every intervention becomes a hypothesis.

From an engineering point of view, correctly measuring boot means distinguishing the various transitions: from power-on to bootloader, from bootloader to kernel, from kernel to user-space and finally to the operational application. Tools such as time-stamped serial logs or hardware-based techniques, such as toggling a GPIO observed with an oscilloscope, allow reliable and reproducible measurements to be obtained.

This analysis often reveals that the largest delays are not where one would intuitively expect. A bootloader with an active debug timeout or a driver that waits for non-existent hardware can introduce seconds of latency without the kernel or application actually being responsible.
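Once each transition has a timestamp, whether from a timestamped serial capture or from scope markers, the breakdown is simple subtraction. As an illustrative sketch, the hypothetical helper phase_breakdown reads lines of the form "label seconds-since-power-on" in chronological order and prints how long each phase lasted. The labels are free-form and not tied to any tool.

```shell
#!/bin/sh
# Read "label seconds_since_power_on" lines on stdin (one per transition,
# in chronological order) and print the duration of each phase.
phase_breakdown() {
    awk '
        NR == 1 { first_t = $2 }
        NR > 1  { printf("%s -> %s: %.3f s\n", prev_label, $1, $2 - prev_t) }
                { prev_label = $1; prev_t = $2 }
        END     { if (NR > 0) printf("total: %.3f s\n", prev_t - first_t) }'
}

# Usage (markers.txt is a hypothetical file with one marker per line):
#   phase_breakdown < markers.txt
```

Keeping the markers in a plain text file makes it easy to compare runs before and after each optimization.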

Practical example: measure “power-on → app-ready” with GPIO and oscilloscope

A robust method that leaves little room for dispute is to mark the APP READY event by raising a GPIO and capturing the edge with an oscilloscope or logic analyzer:

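#!/bin/sh
# /usr/local/bin/boot_marker.sh
# Raises a GPIO to mark "APP READY". Uses the legacy sysfs GPIO interface
# (requires CONFIG_GPIO_SYSFS); GPIO 23 is an example, the number is board-specific.
GPIO=23
# Export the pin; ignore the error if it is already exported
echo "$GPIO" > /sys/class/gpio/export 2>/dev/null
echo out > "/sys/class/gpio/gpio$GPIO/direction"
echo 1 > "/sys/class/gpio/gpio$GPIO/value"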

Explanation: When the app is ready, run this script (via systemd or from inside the app). The rising edge of the GPIO, measured against the power rail on a second channel, becomes your “physical timestamp”, covering ROM + bootloader + kernel + user-space. It is the most defensible way to claim "we're under 2 seconds".

The Critical Role of the Bootloader

The bootloader represents the first stage on which the designer can intervene directly. In the development phase it is normal to favor flexibility and debugging tools: interactive consoles, autoboot delays, dynamic device scans. In a final product, these same features often become useless.

Reducing or eliminating the autoboot delay is one of the most immediate interventions. A wait of a few seconds, irrelevant during debugging, becomes a systematic cost in production. Similarly, removing probes and initializations of unused devices avoids timeouts and cumulative latencies.

A bootloader designed for fast-booting systems tends to take on minimalist logic: deterministic boot path, limited output, no non-essential waiting.

Practical example: U-Boot — eliminate bootdelay “for free”

If the bootloader is U-Boot, the countdown is often a major source of “fixed” latency.

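At build time (board header on older U-Boot trees; recent versions set CONFIG_BOOTDELAY=0 in the board defconfig through Kconfig):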

/* include/configs/<board>.h */
#define CONFIG_BOOTDELAY 0

Or via environment (if enabled):

setenv bootdelay 0
saveenv

Explanation: 2–3 seconds of boot delay = 2–3 seconds wasted on every power-on. In production, it should almost always be zero.

Practical example: U-Boot — deterministic path, no unnecessary scans

If the product always boots from eMMC, there is no point in trying USB or network.

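# conceptual example of a "direct" bootcmd
# (<board>.dtb is a placeholder: use the device tree blob of your board)
setenv bootcmd 'mmc dev 0; load mmc 0:1 ${kernel_addr_r} Image; load mmc 0:1 ${fdt_addr_r} <board>.dtb; booti ${kernel_addr_r} - ${fdt_addr_r}'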
saveenv

Explanation: Fallbacks often introduce invisible timeouts (USB scan, DHCP, tftp…). A deterministic boot path is faster and more reliable.

Linux Kernel: Optimization by Subtraction

The Linux kernel is an extremely sophisticated component, capable of supporting heterogeneous configurations. However, in a dedicated embedded system, the priority is not generality but specificity.

A custom-compiled kernel, with only the strictly necessary drivers and subsystems, significantly reduces initialization time. Beyond binary size, what matters is the number of components the kernel actually has to initialize and probe. Superfluous drivers can add probing delays or initialization timeouts.

The choice of kernel compression algorithm also has a concrete impact. Aggressive compressions reduce storage space but increase decompression time. In many modern embedded scenarios, a slightly larger but quickly decompressible kernel is preferable.

Practical example: reduce kernel console noise and overhead from the command line

quiet loglevel=3

Explanation: less printing means less time wasted on UART (which in embedded can be a surprising bottleneck).
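A back-of-the-envelope estimate shows why. With the common 8N1 framing, each byte costs 10 bits on the wire (start + 8 data + stop), so a console at 115200 baud moves at most 11520 bytes per second. The hypothetical helper below computes the transmission time for a given amount of log text. The baud rate in the usage comment and the idea that all of the log goes out through the serial console are assumptions, so treat the result as a rough estimate rather than a measurement.

```shell
#!/bin/sh
# Estimate the seconds needed to push N bytes through a serial console.
# Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits/byte.
# Usage: uart_seconds <bytes> <baud>
uart_seconds() {
    awk -v bytes="$1" -v baud="$2" 'BEGIN { printf("%.3f\n", bytes * 10 / baud) }'
}

# Example: size of the current kernel log, sent at 115200 baud
#   uart_seconds "$(dmesg | wc -c)" 115200
```

If the estimate is a noticeable fraction of your boot budget, quiet and a lower loglevel are likely to pay off.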

Practical example: LZ4 kernel for faster decompression

In the kernel's .config:

CONFIG_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_XZ is not set

Explanation: The kernel may get a little bigger, but decompression is typically faster and more consistent. If your constraint is boot time, that's often a better choice.

Practical example: profiling slow initcalls with initcall_debug

Add to cmdline:

initcall_debug

Then read:

dmesg | grep -E "initcall.*returned" | head -n 60

Explanation: you get the duration of every driver/subsystem initcall. If a driver takes 400–800ms, you find out right away and can decide whether to remove it, build it as a module, or fix its probing/DT.
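With hundreds of initcalls in the log, sorting by duration is usually more useful than reading in order. The sketch below ranks the slowest entries. It assumes the usual initcall_debug line format, "initcall <symbol>+<offset>/<size> returned <code> after <N> usecs"; the helper name slowest_initcalls is hypothetical.

```shell
#!/bin/sh
# List the slowest initcalls from an initcall_debug log read on stdin.
# Assumes lines of the form:
#   [    0.412345] initcall foo_init+0x0/0x40 returned 0 after 1234 usecs
# Output: "<usecs> <symbol>", slowest first.
# Usage: slowest_initcalls [count]   (default: 10)
slowest_initcalls() {
    awk '
        /initcall .* returned .* after [0-9]+ usecs/ {
            name = ""; usecs = ""
            for (i = 1; i < NF; i++) {
                if ($i == "initcall") name = $(i + 1)
                if ($i == "after")    usecs = $(i + 1)
            }
            sub(/\+0x.*/, "", name)      # strip "+offset/size"
            printf("%d %s\n", usecs, name)
        }' | sort -rn | head -n "${1:-10}"
}

# Usage on the target:
#   dmesg | slowest_initcalls 20
```

The first few lines of the output are normally where the kernel-side savings are.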

File System and Storage: Invisible but Dominant Latencies

The file system is one of the elements that most influence boot time, although it often remains out of the main focus. Mounting the root filesystem, journaling operations and any integrity checks can introduce non-trivial delays.

On systems with static root filesystems, read-only solutions like SquashFS allow extremely fast and predictable mounts. When writing is necessary, file systems such as ext4 must be configured carefully, avoiding expensive startup operations or overly aggressive checks.

Storage latencies arise not only from the file system, but also from access patterns during initialization. Scripts or services that perform I/O-intensive operations early in the boot can significantly degrade overall speed.

Practical example: ext4 — reduce fsck and frequent checks

Check parameters:

tune2fs -l /dev/mmcblk0p2 | grep -E "Mount count|Maximum mount count|Check interval"

Set less frequent checks (example):

tune2fs -c 50 -i 6m /dev/mmcblk0p2

Explanation: Aggressive checks can add seconds. The logic here is to keep every boot from paying a "systematic" cost when it is not necessary, while always weighing the reliability requirements.

Practical example: SquashFS read-only + overlay for controlled writes

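Conceptual example of fstab (in practice, an overlay used as the root filesystem is usually assembled in an initramfs or early init script, before fstab is processed):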

/dev/mmcblk0p2  /ro   squashfs  ro,defaults  0 0
/dev/mmcblk0p3  /rw   ext4      rw,noatime   0 0
overlay         /     overlay   lowerdir=/ro,upperdir=/rw/upper,workdir=/rw/work  0 0

Explanation: “immutable” and fast root (SquashFS) + writes where needed (overlay on ext4). Typical industrial appliance pattern.

Practical example: mount options to reduce unnecessary writes

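/dev/mmcblk0p3 /rw ext4 rw,noatime,commit=30 0 2

Explanation: noatime avoids access-time updates (it already implies nodiratime), and commit=30 flushes the journal every 30 seconds instead of the ext4 default of 5. This limits frequent writes that can degrade I/O during boot and runtime, at the cost of losing up to 30 seconds of recent data on sudden power loss.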

Init System and Dependencies: The Weight of Configuration Choices

The init system governs the transition to the operational user-space. Modern frameworks like systemd offer advanced parallelization and dependency management mechanisms, but their effectiveness depends entirely on configuration.

Many slow boots are not caused by the init system itself, but by unoptimized dependency chains. A network-dependent service, which in turn depends on DHCP, can block entire portions of the boot due to non-critical delays. In systems designed for fast booting, only truly essential components affect initial availability.

Practical example: systemd — measure and find the critical chain

systemd-analyze time
systemd-analyze blame | head -n 20
systemd-analyze critical-chain

Explanation: critical-chain tells you which units “block” the target. It is the most useful information when you want to reduce real seconds.
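To keep regressions from creeping back in, the total reported by systemd-analyze time can be checked against a budget in a test script. The sketch below assumes the usual first-line format ("Startup finished in ... = 3.575s") and handles only the "min", "s" and "ms" units; the helper names and the 2.0 second budget are hypothetical. Note that this figure covers what systemd can see, so it complements a physical power-on to app-ready measurement and does not replace it.

```shell
#!/bin/sh
# Convert the total from the first line of `systemd-analyze time` to seconds.
# Assumes the usual format, for example:
#   Startup finished in 1.204s (kernel) + 2.371s (userspace) = 3.575s
# Only the "min", "s" and "ms" units are handled.
startup_total_seconds() {
    awk '
        /^Startup finished in/ {
            sub(/.*= */, "")                 # keep only the total
            total = 0
            for (i = 1; i <= NF; i++) {
                if      ($i ~ /min$/) total += $i * 60      # e.g. "1min"
                else if ($i ~ /ms$/)  total += $i / 1000    # e.g. "870ms"
                else if ($i ~ /s$/)   total += $i           # e.g. "3.575s"
            }
            printf("%.3f\n", total)
            exit
        }'
}

# Succeed (exit status 0) only if the measured total is within the budget.
# Usage: systemd-analyze time | within_boot_budget 2.0
within_boot_budget() {
    total=$(startup_total_seconds)
    [ -n "$total" ] || return 1              # no parsable line: treat as failure
    awk -v t="$total" -v b="$1" 'BEGIN { exit !(t + 0 <= b + 0) }'
}
```

Run after each image build, a check like this turns the boot-time target into a pass/fail criterion.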

Practical example: move a non-critical service off the boot chain

# /etc/systemd/system/telemetry.service
[Unit]
Description=Telemetry (non-critical)
After=multi-user.target
Wants=multi-user.target

[Service]
Type=simple
ExecStart=/usr/local/bin/telemetry

[Install]
WantedBy=multi-user.target

Explanation: The idea is simple: if telemetry is not needed to make the device operational, it should not block apps/UI.

Practical example: avoiding blocks on network-online.target

Many systems waste seconds waiting for “network online” (DHCP, link, etc.). If you don't need network to start, avoid dependencies:

# avoid (unless indispensable):
After=network-online.target
Wants=network-online.target

Explanation: network-online.target can single-handedly ruin a sub-2-second boot if it ends up in critical units without real need.
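A quick audit helps find where such dependencies come from. The sketch below lists the unit files under a directory that order themselves after, or pull in, network-online.target. The helper name is hypothetical and the default directory is only one of the places units can live (distribution units are typically under /lib/systemd/system or /usr/lib/systemd/system), so run it on each location that applies to your image.

```shell
#!/bin/sh
# List unit files under a directory that reference network-online.target
# in After=, Wants= or Requires=.
# Usage: units_waiting_for_network [dir]   (default: /etc/systemd/system)
units_waiting_for_network() {
    dir="${1:-/etc/systemd/system}"
    grep -rlE '^(After|Wants|Requires)=.*network-online\.target' "$dir" 2>/dev/null | sort
}

# Usage on the target or on the unpacked root filesystem:
#   units_waiting_for_network /etc/systemd/system
```

Each file in the output is a candidate for review: either the service really needs a configured network, or the dependency can be dropped.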

Application-Centric Boot: Rethinking the Purpose

A fundamental, often overlooked concept is to distinguish between “fully initialized Linux” and “functionally operational system”. In many embedded products, what matters is not the availability of all services, but the readiness of the primary function.

A user interface can be made available extremely quickly while other subsystems are initialized in the background. This approach radically changes the perception of device responsiveness and represents a key strategy in fast boot systems.

Practical example: app systemd unit — start as soon as possible

# /etc/systemd/system/app.service
[Unit]
Description=Main Application
After=basic.target
Wants=basic.target

[Service]
Type=simple
ExecStart=/usr/local/bin/my_app
Restart=always
RestartSec=0

[Install]
WantedBy=default.target

Explanation: the app starts after basic.target, without waiting for "convenient" services (online network, advanced logging, etc.). This is one of the most effective ways to reduce perceived time.

Practical example: APP READY marker inside the app (minimal C)

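// my_app_boot_marker.c
#include <stdio.h>
#include <unistd.h>

// Raise the "APP READY" marker. Assumes GPIO 23 has already been exported
// and configured as an output (for example by boot_marker.sh).
static void mark_app_ready(void) {
    FILE *f = fopen("/sys/class/gpio/gpio23/value", "w");
    if (f) {
        fputs("1", f);   // written directly, without spawning a shell
        fclose(f);
    }
}

int main(void) {
    // ... minimal init, UI ready ...
    mark_app_ready();    // "APP READY" marker
    // ... secondary init ...
    for (;;) pause();
}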

Explanation: When the main UI/function is ready, you raise the GPIO. This makes the “app-ready” metric measurable and testable on the bench.

Boot Time as an Architectural Problem

Reducing boot time is not an isolated activity but a design discipline. Each component — bootloader, kernel, file system, init system, application — contributes to the final result. The fastest embedded Linux systems are not the result of late optimizations, but of decisions made early in the design phase.

The most effective engineering principle remains surprisingly simple: eliminate what is not strictly necessary. Each removal reduces complexity, dependencies, and latency.

Conclusion

Embedded Linux can be extremely fast at startup, but only when the system is treated as a dedicated product and not as a general-purpose computer. Boot time is an architectural result, not a random parameter.

In professional embedded contexts, startup speed is not achieved by "optimizing Linux", but by consciously designing the system.

