Hardware-Accelerated AES-CCM for Bluetooth 5.4 Security: Register-Level Configuration and Firmware Integration for LE Secure Connections

The Bluetooth Core Specification, from version 5.4 through later releases such as 6.2, places increasing emphasis on robust, low-latency security mechanisms. Encrypted Low Energy (LE) links, including those paired with LE Secure Connections, protect Link Layer traffic with AES-CCM (Counter with CBC-MAC) using a 128-bit key. Software implementations exist, but they add latency and power overhead, especially on constrained IoT devices. Modern Bluetooth LE SoCs, such as those from Silicon Labs' Series 3 platform (e.g., SiBG301), integrate dedicated hardware accelerators for AES-CCM. This article takes a detailed look at register-level configuration and firmware integration of such accelerators, with a focus on performance for LE Secure Connections in Bluetooth 5.4 and later.

The discussion assumes familiarity with the Bluetooth 5.4 security manager protocol, pairing procedures, and the mathematical underpinnings of AES-CCM. We will focus on the practical aspects of offloading encryption to hardware, covering key derivation, nonce generation, and the critical timing constraints for connection events.

1. The Role of AES-CCM in LE Secure Connections

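LE Secure Connections, introduced in Bluetooth 4.2 and refined in 5.x and beyond, is the pairing method that establishes the Long Term Key (LTK); the Link Layer then uses AES-CCM to both encrypt and authenticate data packets. The algorithm has two parts. First, a CBC-MAC is computed over an initial block that carries the nonce and the payload length, then over the additional authenticated data (the first header octet with the NESN, SN, and MD bits masked to zero), then over the payload, yielding a 4-byte Message Integrity Check (MIC). Second, the payload and the MIC are encrypted with AES in CTR mode, using the same nonce and a block counter. The 13-byte nonce is built from the 39-bit packet counter, a direction bit, and the 8-byte initialization vector (IV) exchanged when encryption is started. It does not contain a device address.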
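The nonce layout can be sketched in a few lines of C. This is a minimal illustration, not code from any vendor SDK: the function name `ble_ccm_build_nonce` and its parameters are hypothetical, and the IV is assumed to be supplied already in the octet order used by the Link Layer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLE_CCM_NONCE_LEN 13u
#define BLE_CCM_IV_LEN     8u

/*
 * Build the 13-byte CCM nonce for an LE data channel PDU.
 *   octets 0..3  : packet counter bits 0..31, least significant octet first
 *   octet  4     : bits 0..6 = packet counter bits 32..38, bit 7 = direction
 *   octets 5..12 : the 8-byte IV agreed when encryption was started
 * The direction bit is 1 for PDUs sent by the Central, 0 for the Peripheral.
 */
static void ble_ccm_build_nonce(uint8_t nonce[BLE_CCM_NONCE_LEN],
                                uint64_t packet_counter,
                                bool sent_by_central,
                                const uint8_t iv[BLE_CCM_IV_LEN])
{
    for (unsigned i = 0; i < 4u; i++) {
        nonce[i] = (uint8_t)(packet_counter >> (8u * i));
    }
    nonce[4] = (uint8_t)((packet_counter >> 32) & 0x7Fu);
    if (sent_by_central) {
        nonce[4] |= 0x80u;
    }
    memcpy(&nonce[5], iv, BLE_CCM_IV_LEN);
}
```

Because the packet counter is kept per direction and incremented for every new encrypted PDU, the firmware has to store two 39-bit counters per connection alongside the IV.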

In a typical software implementation, each AES-CCM operation requires several AES block cipher calls. The CBC-MAC needs one call for the initial block, one for the authenticated header block, and ceil(Payload_Length / 16) for the payload. CTR mode needs one call to mask the MIC plus ceil(Payload_Length / 16) for the payload. For a 27-byte payload (the default maximum Link Layer data payload), this comes to 7 AES operations per packet. In a high-throughput scenario (e.g., back-to-back packets on the 1 Mbps PHY at a 7.5 ms connection interval), this work competes with the Link Layer for CPU time inside the connection event, which can lead to missed connection events or increased power consumption. Hardware acceleration reduces this to a single, autonomous DMA-driven operation.
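The block count can be checked with a small helper. This is a sketch under the generic CCM block layout (RFC 3610); the function names are hypothetical, and the additional authenticated data is assumed to be shorter than 0xFF00 bytes so that its length prefix is 2 bytes (for Bluetooth LE it is a single byte).

```c
#include <stddef.h>

/* Number of 16-byte blocks needed to hold n bytes. */
static size_t blocks16(size_t n)
{
    return (n + 15u) / 16u;
}

/* Count AES block cipher calls for one CCM encrypt-and-authenticate pass. */
static size_t ccm_aes_block_calls(size_t aad_len, size_t payload_len)
{
    size_t cbc_mac = 1u;                      /* B0: flags, nonce, length   */
    if (aad_len > 0u) {
        cbc_mac += blocks16(aad_len + 2u);    /* 2-byte AAD length + AAD    */
    }
    cbc_mac += blocks16(payload_len);         /* payload blocks             */

    size_t ctr = 1u + blocks16(payload_len);  /* A0 masks the MIC, A1..An   */
    return cbc_mac + ctr;
}
```

With a 1-byte AAD, a 27-byte payload needs 7 calls and a 251-byte payload needs 35, which shows how quickly the software cost grows once longer PDUs are in use.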

2. Hardware Accelerator Architecture: A Register-Level View

Modern Bluetooth LE SoCs integrate a dedicated AES-CCM accelerator peripheral. While register maps vary by vendor (e.g., Silicon Labs EFR32 series, Nordic nRF52, or TI CC13xx/CC26xx), the logical architecture is consistent. The accelerator typically includes the following key registers and control blocks:

  • Key Register (128-bit): A write-only register that holds the session key, which is derived from the LTK when encryption is started on the link. This register is often locked after writing to prevent software reads.
  • Nonce Register (96-bit or 104-bit): Stores the nonce (packet counter + direction bit + IV). For Bluetooth LE, this is exactly 13 bytes (104 bits).
  • Data Input FIFO (or DMA Buffer): A memory-mapped region or FIFO for writing the plaintext packet header and payload.
  • Data Output FIFO: For reading the encrypted payload and computed MIC.
  • Control Register: Commands to start encryption, start decryption, or compute MIC only. Also includes flags for key size (128-bit only for LE Secure Connections) and direction (encrypt vs. decrypt).
  • Status Register: Indicates operation completion, error conditions (e.g., key not loaded), or FIFO underflow/overflow.

For example, consider a hypothetical register layout on a Silicon Labs Series 3 device:

// Register offsets (example, not actual Silicon Labs register map)
#define AESCCM_BASE_ADDR         0x4000C000u
#define AESCCM_KEY0              (AESCCM_BASE_ADDR + 0x00u)  // 4 x 32-bit words (0x00-0x0F)
#define AESCCM_NONCE0            (AESCCM_BASE_ADDR + 0x10u)  // 4 x 32-bit words (0x10-0x1F), 13 bytes used
#define AESCCM_CTRL              (AESCCM_BASE_ADDR + 0x20u)  // placed after the fourth nonce word
#define AESCCM_STATUS            (AESCCM_BASE_ADDR + 0x24u)
#define AESCCM_DATA_IN           (AESCCM_BASE_ADDR + 0x28u)  // 32-bit write port
#define AESCCM_DATA_OUT          (AESCCM_BASE_ADDR + 0x2Cu)  // 32-bit read port

// Control register bit fields
#define AESCCM_CTRL_START        (1u << 0)
#define AESCCM_CTRL_MODE_ENCRYPT (0u << 1)
#define AESCCM_CTRL_MODE_DECRYPT (1u << 1)
#define AESCCM_CTRL_KEY_LOAD     (1u << 2)
#define AESCCM_CTRL_DMA_ENABLE   (1u << 3)

// Status register bits (assumed write-1-to-clear)
#define AESCCM_STATUS_DONE       (1u << 0)
#define AESCCM_STATUS_ERROR      (1u << 1)

3. Firmware Integration: Step-by-Step Configuration

Integrating the hardware accelerator into a Bluetooth LE stack requires careful sequencing, especially during the pairing and connection phases. The following outlines a typical firmware flow:

3.1 Key Loading After Pairing

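During LE Secure Connections pairing, the Security Manager (SM) derives a 128-bit Long Term Key (LTK) from the ECDH-based key exchange. The LTK is not used to encrypt packets directly: when encryption is started, the two devices exchange a session key diversifier (SKD), and each side derives the 128-bit session key by encrypting the SKD with the LTK using AES-128. The firmware must load this session key into the accelerator's key register. This should be done atomically, with interrupts disabled to prevent race conditions.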

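void aesccm_load_key(const uint8_t *session_key) {
    // Save the current interrupt mask (CMSIS-Core intrinsics), then disable
    // interrupts so the key load cannot be preempted
    uint32_t primask = __get_PRIMASK();
    __disable_irq();

    // Write 128-bit key as four 32-bit words (little-endian for Cortex-M).
    // Each byte is cast to uint32_t before shifting: a byte promoted to int
    // and shifted left by 24 can overflow a signed int.
    for (int i = 0; i < 4; i++) {
        uint32_t word = ((uint32_t)session_key[i*4]   << 0)  |
                        ((uint32_t)session_key[i*4+1] << 8)  |
                        ((uint32_t)session_key[i*4+2] << 16) |
                        ((uint32_t)session_key[i*4+3] << 24);
        *(volatile uint32_t *)(AESCCM_KEY0 + i*4) = word;
    }

    // Trigger key load operation (hardware may internally latch)
    *(volatile uint32_t *)AESCCM_CTRL = AESCCM_CTRL_KEY_LOAD;

    // Wait for key load completion (typically instant)
    while (!(*(volatile uint32_t *)AESCCM_STATUS & AESCCM_STATUS_DONE));

    // Clear status
    *(volatile uint32_t *)AESCCM_STATUS = AESCCM_STATUS_DONE;

    // Restore the previous interrupt state rather than enabling unconditionally,
    // so a caller that already had interrupts masked is not disturbed
    __set_PRIMASK(primask);
}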
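Before the key can be loaded, the firmware has to assemble the inputs that the encryption start procedure produces. The sketch below shows only the concatenation step; the struct and function names are hypothetical. Each device contributes 8 bytes of session key diversifier and 4 bytes of IV in LL_ENC_REQ and LL_ENC_RSP, with the Central's contribution forming the least significant octets. The AES-128 step that turns the SKD into the session key (LTK as key, SKD as plaintext) is deliberately left out, and so is any byte reversal that a particular AES engine may need.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t skd[16];  /* octets 0..7 from the Central, 8..15 from the Peripheral */
    uint8_t iv[8];    /* octets 0..3 from the Central, 4..7 from the Peripheral  */
} ble_enc_material_t;

/* Concatenate the per-device halves exchanged in LL_ENC_REQ / LL_ENC_RSP. */
static void ble_enc_assemble(ble_enc_material_t *out,
                             const uint8_t skd_central[8],
                             const uint8_t iv_central[4],
                             const uint8_t skd_peripheral[8],
                             const uint8_t iv_peripheral[4])
{
    memcpy(&out->skd[0], skd_central, 8);
    memcpy(&out->skd[8], skd_peripheral, 8);
    memcpy(&out->iv[0], iv_central, 4);
    memcpy(&out->iv[4], iv_peripheral, 4);
}
```

The resulting `iv` feeds the nonce for every packet on the connection, while `skd` is consumed once to derive the key that is written to the accelerator.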

3.2 Packet Encryption for a Connection Event

When a connection event requires data transmission, the Link Layer must encrypt the packet payload and compute the MIC. The accelerator can be configured to process the entire packet in one go. The nonce is constructed from the connection's 8-byte IV (random values exchanged in LL_ENC_REQ and LL_ENC_RSP, not derived from the LTK), the 39-bit packet counter for the sending direction, and the direction bit (1 for packets sent by the Central, 0 for packets sent by the Peripheral).

// Returns 0 on success, -1 if the accelerator reports an error
int aesccm_encrypt_packet(const uint8_t *header, uint8_t *payload, uint16_t len,
                          const uint8_t *nonce, uint8_t *mic_out) {
    // Clear any stale completion or error flag from a previous operation
    // (status bits are assumed to be write-1-to-clear)
    *(volatile uint32_t *)AESCCM_STATUS = AESCCM_STATUS_DONE | AESCCM_STATUS_ERROR;

    // Write nonce (13 bytes) as 3 x 32-bit words + 1 byte, zero-padded.
    // Bytes are cast to uint32_t before shifting to avoid signed overflow.
    uint32_t nonce_word0 = (uint32_t)nonce[0] | ((uint32_t)nonce[1] << 8) |
                           ((uint32_t)nonce[2] << 16) | ((uint32_t)nonce[3] << 24);
    uint32_t nonce_word1 = (uint32_t)nonce[4] | ((uint32_t)nonce[5] << 8) |
                           ((uint32_t)nonce[6] << 16) | ((uint32_t)nonce[7] << 24);
    uint32_t nonce_word2 = (uint32_t)nonce[8] | ((uint32_t)nonce[9] << 8) |
                           ((uint32_t)nonce[10] << 16) | ((uint32_t)nonce[11] << 24);
    uint32_t nonce_word3 = nonce[12]; // Only 8 bits used

    *(volatile uint32_t *)(AESCCM_NONCE0) = nonce_word0;
    *(volatile uint32_t *)(AESCCM_NONCE0 + 4) = nonce_word1;
    *(volatile uint32_t *)(AESCCM_NONCE0 + 8) = nonce_word2;
    *(volatile uint32_t *)(AESCCM_NONCE0 + 12) = nonce_word3;

    // Write header (2 bytes) and payload to data input FIFO.
    // The header includes the LLID and length fields. Only the first header
    // octet, with NESN, SN and MD masked to zero, is authenticated; this
    // sketch assumes the hardware applies that mask itself.
    *(volatile uint32_t *)AESCCM_DATA_IN = (uint32_t)header[0] | ((uint32_t)header[1] << 8);
    for (uint16_t i = 0; i < len; i += 4) {
        uint32_t word = (uint32_t)payload[i] |
                       (i+1 < len ? (uint32_t)payload[i+1] << 8 : 0u) |
                       (i+2 < len ? (uint32_t)payload[i+2] << 16 : 0u) |
                       (i+3 < len ? (uint32_t)payload[i+3] << 24 : 0u);
        *(volatile uint32_t *)AESCCM_DATA_IN = word;
    }

    // Start encryption (CTR mode + CBC-MAC)
    *(volatile uint32_t *)AESCCM_CTRL = AESCCM_CTRL_MODE_ENCRYPT | AESCCM_CTRL_START;

    // Wait for completion or error (should be fast, < 10 us for typical packets)
    uint32_t status;
    do {
        status = *(volatile uint32_t *)AESCCM_STATUS;
    } while (!(status & (AESCCM_STATUS_DONE | AESCCM_STATUS_ERROR)));

    if (status & AESCCM_STATUS_ERROR) {
        // Leave the caller's buffers untouched and report the failure
        *(volatile uint32_t *)AESCCM_STATUS = AESCCM_STATUS_ERROR;
        return -1;
    }

    // Read encrypted payload (overwrite original buffer)
    for (uint16_t i = 0; i < len; i += 4) {
        uint32_t word = *(volatile uint32_t *)AESCCM_DATA_OUT;
        payload[i] = word & 0xFF;
        if (i+1 < len) payload[i+1] = (word >> 8) & 0xFF;
        if (i+2 < len) payload[i+2] = (word >> 16) & 0xFF;
        if (i+3 < len) payload[i+3] = (word >> 24) & 0xFF;
    }

    // Read MIC (4 bytes)
    uint32_t mic_word = *(volatile uint32_t *)AESCCM_DATA_OUT;
    mic_out[0] = mic_word & 0xFF;
    mic_out[1] = (mic_word >> 8) & 0xFF;
    mic_out[2] = (mic_word >> 16) & 0xFF;
    mic_out[3] = (mic_word >> 24) & 0xFF;

    // Clear status
    *(volatile uint32_t *)AESCCM_STATUS = AESCCM_STATUS_DONE;
    return 0;
}
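A busy-wait on the status register will hang the Link Layer if the accelerator never signals completion, for example when its clock is gated. One way to guard against that is a bounded poll that also checks the error flag. The following is a sketch only: `aesccm_wait_done`, its result codes, and the spin limit are hypothetical, and the status bits are assumed to be write-1-to-clear as in the example register map. The register is passed as a pointer so the helper can be exercised on a host with an ordinary variable.

```c
#include <stdint.h>

#ifndef AESCCM_STATUS_DONE
#define AESCCM_STATUS_DONE   (1u << 0)
#endif
#ifndef AESCCM_STATUS_ERROR
#define AESCCM_STATUS_ERROR  (1u << 1)
#endif

typedef enum {
    AESCCM_OK          = 0,
    AESCCM_ERR_HW      = -1,  /* accelerator raised its error flag       */
    AESCCM_ERR_TIMEOUT = -2   /* neither DONE nor ERROR within the limit */
} aesccm_result_t;

/* Poll the status register at most max_spins times. */
static aesccm_result_t aesccm_wait_done(volatile uint32_t *status_reg,
                                        uint32_t max_spins)
{
    for (uint32_t i = 0; i < max_spins; i++) {
        uint32_t status = *status_reg;
        if (status & AESCCM_STATUS_ERROR) {
            *status_reg = AESCCM_STATUS_ERROR;  /* acknowledge the error */
            return AESCCM_ERR_HW;
        }
        if (status & AESCCM_STATUS_DONE) {
            *status_reg = AESCCM_STATUS_DONE;   /* acknowledge completion */
            return AESCCM_OK;
        }
    }
    return AESCCM_ERR_TIMEOUT;
}
```

On target, the call would look like `aesccm_wait_done((volatile uint32_t *)AESCCM_STATUS, 1000u)`, with the limit chosen from the worst-case operation time at the CPU clock in use.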

4. Performance Analysis and Timing Constraints

Hardware acceleration dramatically reduces encryption latency. A software AES-CCM implementation on a 48 MHz Cortex-M4 typically takes 30-50 microseconds for a 27-byte packet, though the figure depends heavily on how the AES core is implemented. The hardware accelerator, by contrast, completes the same operation in 2-5 microseconds, including DMA overhead. This matters at short connection intervals (7.5 ms is the minimum in Bluetooth 5.4) and for the new LE Test Mode Enhancements in Core Spec 6.2, which require deterministic packet timing.

Consider the timing budget for a connection event with encryption enabled:

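  • Radio ramp-up: ~150 us (vendor-dependent)
  • Packet transmission (27-byte payload @ 1 Mbps): ~328 us on air, counting preamble, access address, header, MIC, and CRC
  • Inter-frame spacing (T_IFS): 150 us
  • Packet reception + MIC verification: ~328 us + decryption time

These items already add up to roughly 956 us for a single packet exchange. With software encryption on top, the event approaches 1 ms and any decryption that must finish inside T_IFS eats into a fixed 150 us window, leaving little margin for sleep. With hardware acceleration, the encryption overhead is negligible, allowing the radio to enter sleep mode sooner and reducing average current consumption by an estimated 20-30%.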
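The on-air part of this budget can be reproduced with a short calculation. The sketch below assumes the uncoded LE 1M PHY (8 us per octet) and counts every field that is transmitted: 1 octet of preamble, 4 of access address, 2 of header, the payload, an optional 4-octet MIC, and 3 of CRC. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LE_T_IFS_US 150u

/* On-air duration of one data channel PDU on the LE 1M PHY. */
static uint32_t le_1m_airtime_us(uint32_t payload_len, bool has_mic)
{
    uint32_t octets = 1u + 4u + 2u + payload_len + (has_mic ? 4u : 0u) + 3u;
    return octets * 8u;  /* 1 Mbps means 8 us per octet */
}

/* One packet in each direction, separated by T_IFS. */
static uint32_t le_1m_exchange_us(uint32_t tx_payload_len,
                                  uint32_t rx_payload_len,
                                  bool has_mic)
{
    return le_1m_airtime_us(tx_payload_len, has_mic)
         + LE_T_IFS_US
         + le_1m_airtime_us(rx_payload_len, has_mic);
}
```

A 27-byte encrypted payload occupies 328 us, so a symmetric exchange takes 806 us before radio ramp-up is added. Empty PDUs carry no MIC even on an encrypted link, so `has_mic` should be false for them.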

5. Integration with Bluetooth 5.4 and Future Specs

The Bluetooth Core Specification 6.2 introduces Channel Sounding amplitude-based attack resilience and shorter connection intervals. These features demand even faster encryption to maintain link stability. Hardware-accelerated AES-CCM is not just a performance optimization; it is a prerequisite for meeting the timing requirements of these advanced features. The HCI USB LE Isochronous Support in Spec 6.2 also benefits from deterministic encryption latency for isochronous channels.

For firmware developers, the key takeaway is to abstract the hardware accelerator behind a well-defined API that the Link Layer can call without blocking. Using DMA and interrupt-driven completion (rather than polling) further reduces CPU overhead. The register-level code shown above can be adapted to any vendor's hardware by adjusting the register addresses and bit fields.
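A non-blocking API of this kind can be sketched as a small job object plus an interrupt handler. Everything here is hypothetical: the names `aesccm_job_t`, `aesccm_job_begin`, and `aesccm_irq_handler`, the write-1-to-clear status bits, and the callback signature. Programming the DMA descriptors and setting the start bit are left as a comment because they are specific to each vendor's hardware.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef AESCCM_STATUS_DONE
#define AESCCM_STATUS_DONE   (1u << 0)
#endif
#ifndef AESCCM_STATUS_ERROR
#define AESCCM_STATUS_ERROR  (1u << 1)
#endif

typedef void (*aesccm_done_cb_t)(int error, void *ctx);

typedef struct {
    volatile uint32_t *status_reg;  /* accelerator status register      */
    aesccm_done_cb_t   on_done;     /* called from interrupt context    */
    void              *ctx;         /* opaque pointer handed to on_done */
    volatile int       busy;        /* nonzero while a job is in flight */
} aesccm_job_t;

/* Claim the accelerator and return immediately; -1 if it is already in use. */
static int aesccm_job_begin(aesccm_job_t *job, aesccm_done_cb_t cb, void *ctx)
{
    if (job->busy) {
        return -1;
    }
    job->on_done = cb;
    job->ctx = ctx;
    job->busy = 1;
    /* A real driver would program the DMA channels and set START here. */
    return 0;
}

/* Call from the accelerator's interrupt vector. */
static void aesccm_irq_handler(aesccm_job_t *job)
{
    uint32_t status = *job->status_reg;
    uint32_t flags = status & (AESCCM_STATUS_DONE | AESCCM_STATUS_ERROR);
    if (flags == 0u || !job->busy) {
        return;  /* spurious interrupt or no job pending */
    }
    *job->status_reg = flags;  /* acknowledge (write-1-to-clear assumed) */
    job->busy = 0;
    if (job->on_done != NULL) {
        job->on_done((status & AESCCM_STATUS_ERROR) ? -1 : 0, job->ctx);
    }
}

/* Example callback: records 1 on success and -1 on failure. */
static void example_on_done(int error, void *ctx)
{
    *(int *)ctx = error ? -1 : 1;
}
```

The Link Layer calls `aesccm_job_begin` when it queues a packet and continues with radio setup; the callback then marks the buffer as ready for transmission. Keeping the callback short matters because it runs in interrupt context.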

6. Conclusion

Hardware-accelerated AES-CCM is a cornerstone of modern Bluetooth LE security, enabling the low-power, high-performance requirements of LE Secure Connections. By understanding the register-level configuration and integrating it tightly with the firmware stack, developers can achieve sub-5-microsecond encryption latency, reduce power consumption, and ensure compliance with Bluetooth 5.4 and future specifications. As the Bluetooth Core Spec evolves (e.g., 6.2's Channel Sounding enhancements), the role of dedicated cryptographic hardware will only grow more critical. Developers should leverage vendor-specific hardware abstraction layers (HALs) but also maintain the ability to configure registers directly for maximum control and performance.

Frequently Asked Questions

Q: What is the primary benefit of using hardware-accelerated AES-CCM over software implementations for Bluetooth 5.4 LE Secure Connections?

A: Hardware-accelerated AES-CCM significantly reduces latency and power consumption compared to software implementations. For example, a software AES-CCM operation for a 27-byte payload requires 7 AES block cipher calls, which competes with the Link Layer for CPU time in high-throughput scenarios (e.g., 1 Mbps PHY with 7.5 ms connection intervals) and can lead to missed connection events or increased power draw. Hardware acceleration offloads this to a single, autonomous DMA-driven operation, ensuring timely processing and lower energy usage in constrained IoT devices.

Q: What are the key registers typically found in a hardware AES-CCM accelerator for Bluetooth LE SoCs, and how are they used?

A: The accelerator usually includes a write-only 128-bit Key Register for the session key derived from the LTK when encryption is started, a Nonce Register that holds the 13-byte nonce (packet counter, direction bit, and IV), Data-In/Out Registers for packet payloads, and Control and Status Registers to trigger operations and check completion. Additionally, a DMA interface may be present to automate data transfer, reducing CPU intervention.

Q: How does the nonce construction for AES-CCM in LE Secure Connections affect register-level configuration?

A: The nonce is built from the 39-bit packet counter, a direction bit, and the 8-byte initialization vector (IV) exchanged when encryption is started. None of these fields travels in the packet header, so the firmware must keep a packet counter per direction for each connection and write the resulting 13 bytes into the accelerator's Nonce Register before initiating encryption or MIC computation. This ensures the hardware uses the correct cryptographic context for each packet, maintaining security and compliance with the Bluetooth Core Specification.

Q: What timing constraints must firmware consider when integrating a hardware AES-CCM accelerator for Bluetooth 5.4 connection events?

A: Firmware must ensure that the hardware accelerator completes encryption and MIC generation within the time available inside the connection event, typically a few hundred microseconds, and that any decryption needed before a response fits within the 150 us inter-frame spacing. This requires careful scheduling of register writes and DMA transfers to avoid delaying the Link Layer's response. The accelerator's interrupt or polling mechanism should be used to confirm completion before the packet is transmitted, preventing missed events or retransmissions.

Q: How does the hardware accelerator handle the two-phase operation of AES-CCM (CBC-MAC for MIC and CTR mode for encryption) in a single DMA-driven process?

A: The accelerator internally sequences the two phases: it first computes the CBC-MAC over the nonce-bearing initial block, the masked header octet, and the payload to generate the 4-byte MIC, then encrypts the payload and the MIC using AES in CTR mode with the same nonce and a block counter. The DMA controller streams the packet data through the accelerator in one pass, with the hardware managing the state machine and counter increments. The final output includes the encrypted payload and MIC, which the firmware can read from the output buffer for transmission.
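For readers who want to check an accelerator's output against a software model, the blocks that the two phases operate on can be formatted as follows. This sketch follows the generic CCM layout from RFC 3610 with the Bluetooth LE parameters (4-byte MIC, 2-byte length field, 1 byte of authenticated data); the function names are hypothetical and the AES calls themselves are omitted.

```c
#include <stdint.h>
#include <string.h>

#define BLE_CCM_NONCE_LEN     13u
#define BLE_CCM_HDR_AAD_MASK  0xE3u  /* clears NESN, SN and MD (bits 2..4) */

/* B0: first CBC-MAC block. Flags 0x49 = AAD present, 4-byte MIC, 2-byte length. */
static void ble_ccm_format_b0(uint8_t b0[16],
                              const uint8_t nonce[BLE_CCM_NONCE_LEN],
                              uint8_t payload_len)
{
    b0[0] = 0x49u;
    memcpy(&b0[1], nonce, BLE_CCM_NONCE_LEN);
    b0[14] = 0x00u;         /* payload length, big-endian, high byte */
    b0[15] = payload_len;   /* low byte (LE payloads fit in one byte) */
}

/* B1: AAD block. 2-byte AAD length (1), the masked header octet, zero padding. */
static void ble_ccm_format_b1(uint8_t b1[16], uint8_t header_octet0)
{
    memset(b1, 0, 16);
    b1[1] = 0x01u;
    b1[2] = (uint8_t)(header_octet0 & BLE_CCM_HDR_AAD_MASK);
}

/* A_i: CTR-mode counter block. A_0 masks the MIC, A_1.. encrypt the payload. */
static void ble_ccm_format_ctr(uint8_t a[16],
                               const uint8_t nonce[BLE_CCM_NONCE_LEN],
                               uint16_t index)
{
    a[0] = 0x01u;           /* flags: 2-byte counter field */
    memcpy(&a[1], nonce, BLE_CCM_NONCE_LEN);
    a[14] = (uint8_t)(index >> 8);
    a[15] = (uint8_t)(index & 0xFFu);
}
```

Masking the header octet is what lets a packet be retransmitted with updated NESN, SN, or MD bits without recomputing the MIC.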
