
Cryptography On Embedded Devices 101

When talking about “crypto on embedded devices”, the conversation almost always collapses into two questions: what are my physical constraints, and what security guarantees do I need to reach? It turns out that picking a primitive is usually the easy part once you know the device you have been handed, and getting to choose your primitive at all is a luxury. The hard part is everything before and around it: where the keys live, whether the chip can give you a decent random number, how many bytes you are allowed to put on the wire, and whether the chip in front of you already comes with an accelerator you are expected to use whether you like it or not.

Context

Cryptography is not new and the goals are well known: confidentiality, integrity, and authenticity. Confidentiality comes from encryption (symmetric ciphers like AES, DES and triple DES, or asymmetric schemes like RSA), while integrity and authenticity mostly come from hashing (SHA-3 for instance) and from digital signatures.

On a server, for example, you reach for whatever your TLS stack exposes, and most of the time that’s it. On an embedded system, you rarely get off that easily. Most of the time the security technology used in desktop and enterprise computing cannot simply be run on embedded systems, and the problem space is in fact larger, not smaller, because the device touches the physical world.

Types of embedded devices that need cryptography

So when you have to produce code for embedded systems (in general, and not only for cryptography), you have to answer two questions: what kind of device am I on, and what is it actually capable of?

“Embedded” covers an enormous range, and so do the cryptographic applications. In fact, cryptographic components show up in the automotive industry, banking, intellectual property protection, satellite communications, law enforcement technology, supply chain management, device tracking, healthcare and more.

We can choose to group them as such:

Payment and identity smart cards
- EMVCo compliant cards and ATM encrypting pin pads. The Multos Step/One card is a good example of a very low cost device, and it was built around 3DES rather than AES.
RFID and contactless tags
- The historical example for “too small for AES”, although that has changed. Many RFID products now ship an AES engine.
Low power wireless sensors
- Battery powered or energy harvesting nodes talking over Bluetooth Low Energy, LoRaWAN or Sigfox. These are the devices NIST had in mind for lightweight cryptography: IoT devices, embedded systems, and low power sensors.
General purpose microcontrollers
- The STM32 family, the AVR ATmega, the ESP32 family. Plenty of compute for symmetric crypto, often with an on chip accelerator, frequently tight on RAM for public key work and certificate parsing.
Application chips with a security subsystem
- Phones, tablets and laptops where a dedicated secure coprocessor (Apple’s Secure Enclave for instance) holds keys and drives an inline storage encryption engine.

Constraints of embedded devices

This is, in my opinion, the most important part of implementing lightweight crypto. As we said earlier, most current cryptographic algorithms were designed for desktop and server environments and do not fit into constrained devices.

The constraints that actually hurt are these:

First, you should care about memory. Symmetric primitives are rarely the problem. Public key operations can be impossible, and a full TLS stack can be too large even with constrained friendly implementations like ARM’s mbedTLS, wolfSSL or BearSSL, because of everything around the handshake (certificate parsing, ASN.1, the standard’s surface area). For example, PureEdDSA combines the public key and the data when signing, so a verifier must buffer an entire certificate before it can extract the key, which can exceed RAM on a device that is otherwise perfectly capable of Ed25519.

Then you should know if you are able to use a stateful cipher. Not every platform can reliably and persistently store a counter, a seed, or any context across reboots. Keys written during the personalization phase of production may not be modifiable afterward. Any design that assumes a monotonic counter (hello, AES-GCM nonces) has to be aware of this.
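One common way to survive reboots without writing to non-volatile memory on every message is to reserve counter values in batches. The sketch below is only an illustration of that idea: `nvm_load`, `nvm_store`, `NONCE_RESERVE` and the `nonce_ctr` type are hypothetical names, and the storage is simulated with a static variable. On a real part you would also have to handle write failures, wear, and power loss in the middle of a write.

```c
#include <stdint.h>

/* Hypothetical non-volatile storage, simulated here with a static variable.
   On a real target this would be a flash page, an EEPROM cell or a
   fuse-backed counter, and the write could fail or be interrupted. */
static uint64_t nvm_cell = 0;
static uint64_t nvm_load(void) { return nvm_cell; }
static void nvm_store(uint64_t v) { nvm_cell = v; }

#define NONCE_RESERVE 64u   /* counter values reserved per NVM write (assumed) */

typedef struct {
    uint64_t next;   /* next counter value to hand out */
    uint64_t limit;  /* first value NOT covered by the stored reservation */
} nonce_ctr;

/* Call once at boot: resume from the stored reservation, never from RAM. */
void nonce_ctr_boot(nonce_ctr *c)
{
    c->next  = nvm_load();
    c->limit = c->next;      /* nothing reserved yet in this boot */
}

/* Returns a value that was never returned before, including across reboots,
   provided nvm_store() is durable by the time it returns. */
uint64_t nonce_ctr_next(nonce_ctr *c)
{
    if (c->next == c->limit) {
        /* Persist the new upper bound BEFORE using any value below it. */
        c->limit = c->next + NONCE_RESERVE;
        nvm_store(c->limit);
    }
    return c->next++;
}
```

A reboot skips whatever was left of the current batch, so some counter values are wasted, but none is ever reused. The trade-off is one NVM write per `NONCE_RESERVE` messages instead of one per message.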

Now for randomness, the platform may offer only a low entropy, non cryptographic generator, or no trustworthy generator of random numbers at all. This becomes a severe limitation especially if you discover it after proudly finishing an implementation.

The network can also be a problem. Round trips to your device may be hard, or very slow. For instance, devices behind satellites, on ships, or on smart meters polled by a passing vehicle may have long blackout windows. If your protocol needs even moderate synchronization between devices, it may become unusable.

Finally, the whole point of our implementation is to deal with a physical device. Because of that, the attacker often holds the device, and side channel and fault attacks matter: power consumption and electromagnetic emission analysis, timing attacks, and so on. We will return to these in Part 2, but they orient implementation choices today, which is why the good accelerators advertise countermeasures.
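As a small illustration of how timing concerns shape even trivial code, here is a minimal sketch of a comparison routine for authentication tags whose running time does not depend on where the inputs differ. The name `ct_equal` is an assumption made for this example. An optimizing compiler is free to transform such code, so a vetted library routine is preferable when one is available.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two buffers are equal, 0 otherwise. The loop always
   visits all len bytes, so there is no early exit on the first mismatch. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate any differing bit */

    /* Map diff == 0 to 1 and any other value to 0 without branching:
       (0 - 1) wraps to 0xFFFFFFFF, while 1..255 minus 1 stays below 256. */
    return (int)(1u & (((uint32_t)diff - 1u) >> 8));
}
```

A plain `memcmp` typically returns as soon as it finds a mismatch, which can tell an attacker how many leading bytes of a forged tag were correct.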

Note: cost is the real constraint behind all of these. Embedded systems are highly cost sensitive, and that single fact explains why a 1998 era cipher is still sitting in payment terminals in 2026.

Existing primitives on various devices

Let’s look at what current systems can give you. This is not an exhaustive list, just enough to get a good grasp of what modern hardware is capable of.

Randomness engines

Good keys, nonces and seeds all start with good entropy, and entropy is the resource constrained devices most often lack. There are two things people mean by random number generator:

A true random number generator, which harvests physical entropy (thermal noise, oscillator jitter, or lava lamps…).

A pseudo random number generator, which stretches a seed using a cryptographic primitive.

A well designed engine uses both: a TRNG to seed a PRNG. Apple’s Secure Enclave is a clean reference implementation of this pattern. Per the Apple Platform Security guide, the Secure Enclave’s TRNG generates secure random data whenever the system needs a random cryptographic key, key seed, or other entropy, and it is built from multiple ring oscillators post processed with CTR_DRBG, a PRNG based on a block cipher in counter mode. The same TRNG is even used during manufacturing to generate the per device UID and write it into fuses, so that the root secret is never visible outside the chip.

The lesson for your own designs is simple: if you cannot trust the platform’s randomness, do not depend on it. Use constructions that are robust to weak or repeated randomness. Deterministic ECDSA and EdDSA may be a solution for signatures, and AES-SIV gives you nonce misuse resistant authenticated encryption. That last property is exactly why Aumasson and Vennard report choosing AES-SIV as their default. A quick way to sanity check what your platform exposes on Linux based embedded targets:


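#include <sys/random.h>
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

// Fills buf with len bytes from the kernel CSPRNG. Returns 0 on success, -1 on error.
int fill_random(uint8_t *buf, size_t len)
{
    size_t off = 0;
    while (off < len)
    {
        // getrandom() blocks until the kernel CSPRNG is seeded,
        // which is exactly what you want on first boot.
        ssize_t n = getrandom(buf + off, len - off, 0);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;   // interrupted by a signal, retry
            return -1;      // [1]
        }
        off += (size_t)n;
    }
    return 0;
}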

[1] On a bare metal or RTOS target there is no getrandom(), and this is the moment to find out whether your microcontroller has a hardware RNG peripheral or whether you are about to seed a generator from an uninitialized variable.

AES engines

There are two flavors of hardware help for AES, and they are not the same thing.

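The first is an instruction set extension, where the CPU adds opcodes that compute AES rounds directly. Intel proposed AES-NI in March 2008 and shipped it first in Westmere. The instruction set, documented on the AES instruction set reference, is small:

- AESENC: one round of encryption
- AESENCLAST: the last round of encryption
- AESDEC: one round of decryption
- AESDECLAST: the last round of decryption
- AESKEYGENASSIST: helper for round key generation
- AESIMC: InvMixColumns on a round key, used to prepare decryption keys

Having the algorithm implemented directly in silicon greatly improves the bytes per cycle rate for encryption and decryption. You also no longer do the table lookups yourself, which removes the cache timing leaks of table based software AES.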

AES round instructions are now broadly available: AMD from Bulldozer onward and all Zen cores, the ARMv8-A Cryptographic Extension (announced 2011), and the RISC-V scalar and vector cryptography extensions ratified in 2022 and 2023 respectively. AVX-512 widens AES-NI further with the vectorized VAES instructions.

Here is a way to use AES-128 single block encryption with AES-NI:


#include <wmmintrin.h> 
#include <stdint.h>

static __m128i key_expand(__m128i key, __m128i kgen) 
{
    kgen = _mm_shuffle_epi32(kgen, _MM_SHUFFLE(3, 3, 3, 3));
    key  = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key  = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key  = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, kgen);
}

void aes128_key_schedule(const uint8_t *k, __m128i rk[11]) 
{
    rk[0]  = _mm_loadu_si128((const __m128i *)k);
    rk[1]  = key_expand(rk[0],  _mm_aeskeygenassist_si128(rk[0],  0x01));
    rk[2]  = key_expand(rk[1],  _mm_aeskeygenassist_si128(rk[1],  0x02));
    rk[3]  = key_expand(rk[2],  _mm_aeskeygenassist_si128(rk[2],  0x04));
    rk[4]  = key_expand(rk[3],  _mm_aeskeygenassist_si128(rk[3],  0x08));
    rk[5]  = key_expand(rk[4],  _mm_aeskeygenassist_si128(rk[4],  0x10));
    rk[6]  = key_expand(rk[5],  _mm_aeskeygenassist_si128(rk[5],  0x20));
    rk[7]  = key_expand(rk[6],  _mm_aeskeygenassist_si128(rk[6],  0x40));
    rk[8]  = key_expand(rk[7],  _mm_aeskeygenassist_si128(rk[7],  0x80));
    rk[9]  = key_expand(rk[8],  _mm_aeskeygenassist_si128(rk[8],  0x1b));
    rk[10] = key_expand(rk[9],  _mm_aeskeygenassist_si128(rk[9],  0x36));
}

void aes128_encrypt_block(const __m128i rk[11], const uint8_t in[16], uint8_t out[16])
{
    __m128i b = _mm_loadu_si128((const __m128i *)in);
    b = _mm_xor_si128(b, rk[0]);              // initial AddRoundKey
    for (int r = 1; r < 10; r++)
    {
        b = _mm_aesenc_si128(b, rk[r]);      // [1] rounds 1..9
    }
    b = _mm_aesenclast_si128(b, rk[10]);      // [2]
    _mm_storeu_si128((__m128i *)out, b);      // write the ciphertext block to out
}

[1] Each _mm_aesenc_si128 performs SubBytes, ShiftRows, MixColumns and AddRoundKey in one instruction.

[2] The last round drops MixColumns, which is why AES-NI has a dedicated AESENCLAST. Note this is ECB on a single block and is shown only to illustrate the instructions. Real use needs an authenticated mode such as AES-GCM, and you would gate the call behind a runtime check that AES-NI is present (via `CPUID`) before dispatching to this path.
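The runtime check mentioned in [2] can be sketched as follows. This is a minimal example assuming GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`. The function name `cpu_has_aesni` is made up for this illustration, and on any other compiler or architecture the sketch simply reports that AES-NI is absent.

```c
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Returns 1 if the CPU reports AES-NI, 0 otherwise (including non-x86 builds). */
int cpu_has_aesni(void)
{
#if HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* CPUID leaf 1 returns the feature flags.
       __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((ecx >> 25) & 1u);   /* ECX bit 25 is the AES flag */
#else
    return 0;
#endif
}
```

A caller would typically evaluate this once at startup and store a function pointer to either the AES-NI path or a portable software implementation, so the check is not repeated for every block.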

The second flavor is an accelerator block sitting beside the CPU rather than inside it. This is the common case in the embedded world. One example is the Atmel XMEGA on chip AES accelerator, a peripheral that runs in parallel with the CPU rather than a CPU instruction. STM32 parts also commonly expose a hardware AES peripheral with several modes of operation ready to use.

Apple’s Secure Enclave shows how far the dedicated approach can be pushed. The guide describes a dedicated AES-256 engine placed directly in the DMA path between the NAND flash storage and main memory, so files are encrypted and decrypted inline as they are read and written. Its keys are derived from the per device UID or GID, stay inside the engine, and are never exposed to software, not even to the secure coprocessor’s own OS. The engine is designed to resist timing and Static Power Analysis, and from the A9 onward it adds Dynamic Power Analysis countermeasures. The same chip’s Memory Protection Engine even uses AES in XEX mode plus a CMAC tag and an anti-replay value to protect the enclave’s own DRAM.

Note: AES rounds have become such a good hardware primitive that newer algorithms are built to reuse them. The AEGIS family of authenticated ciphers is constructed on AES rounds and runs at least twice the speed of AES on hardware that has the instructions. SM4, Camellia and ARIA have also been accelerated through AES-NI by way of an affine transform between their S-boxes.

3DES encrypting pin-pads

Even if AES is THE standard and is well supported, encrypting pin pads still use triple DES. Why? Because payment infrastructure runs on it. Aumasson and Vennard note that ATM encrypting pin pads frequently use 3DES, and that the EMVCo compliant Multos Step/One was built around 3DES rather than AES.

Their reading, which matches the wider industry, is that the continued use of 3DES has more to do with compatibility and the cost of replacement than with any prohibitive cost of running a wider block cipher. The hardware could do AES, but the whole ecosystem is not easy to change.

If you are building a new device, you do not choose 3DES. If you are integrating with an existing payment estate, you may have no choice.

Implementing cryptography depending on your needs and devices

Now let’s see how to choose a scheme according to what you need and what you can do. The guiding principle, again, is that choosing primitives is a luxury. More often you are handed a device that exposes, say, AES and SHA-256 only, perhaps because they are standardized or already hardware accelerated and you HAVE to build your security layer from those.

Note that a platform may expose AES-GCM as a sealed mode while not having a bare AES core, which then prevents you from building something like AES-SIV on top.

Here is a good flow to determine how to implement your crypto:

First, make the inventory of your hardware. Does the chip/controller have an AES instruction set, a dedicated AES accelerator, a TRNG, a public key accelerator? Read the datasheet and the security peripheral section before committing to a protocol. If an accelerator exists, you are almost always expected to use it, both for speed and for its side channel countermeasures.

Then for symmetric use, if AES is accelerated, use AES in an authenticated mode: AES-GCM if you have a reliable nonce source, AES-SIV if you do not. Keep in mind that AES-CCM has the same nonce uniqueness requirement as AES-GCM, so it does not help when nonces are the problem. If the device is genuinely too constrained for AES to perform well, this is where you should look at the NIST lightweight recommendations. Specifically, NIST SP 800-232 standardizes the Ascon family precisely as a viable alternative for when AES may not perform optimally. We look at Ascon next.

For asymmetric crypto, most of the time it will be too expensive for constrained embedded devices, because the underlying group operations are far more computationally expensive than symmetric primitives. However, if you can afford it, prefer a deterministic scheme so that signing does not depend on an RNG your device may not provide, for example deterministic ECDSA (RFC 6979) or EdDSA.

Finally, if you are stuck with a legacy cipher like 3DES because of the infrastructure, isolate your reliance on it as much as possible, and do not let it become the default for any new piece of software.
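The flow above can be condensed into a small decision function. This is only a sketch of the reasoning, not a policy to ship. The `hw_caps` fields, the `cipher_choice` values and the function name are all hypothetical, and "no AES acceleration" is used here as a simplified stand-in for "too constrained for AES to perform well".

```c
#include <stdbool.h>

/* Hypothetical inventory of what the datasheet says the platform offers. */
typedef struct {
    bool aes_accel;            /* AES instructions or a dedicated accelerator */
    bool bare_aes_core;        /* raw block operation exposed, not only sealed modes */
    bool reliable_nonce;       /* trustworthy RNG or persistent counter available */
    bool legacy_3des_required; /* imposed by existing infrastructure */
} hw_caps;

typedef enum {
    CHOICE_AES_GCM,
    CHOICE_AES_SIV,
    CHOICE_ASCON_AEAD128,
    CHOICE_LEGACY_3DES_ISOLATED,
    CHOICE_NEEDS_REVIEW        /* no option in this sketch fits safely */
} cipher_choice;

cipher_choice choose_cipher(const hw_caps *hw)
{
    /* Legacy infrastructure leaves no choice: use it, but keep it contained. */
    if (hw->legacy_3des_required)
        return CHOICE_LEGACY_3DES_ISOLATED;

    if (hw->aes_accel) {
        if (hw->reliable_nonce)
            return CHOICE_AES_GCM;
        /* AES-SIV has to be built on the raw block cipher. */
        if (hw->bare_aes_core)
            return CHOICE_AES_SIV;
        return CHOICE_NEEDS_REVIEW;
    }

    /* Ascon-AEAD128 is nonce based too, so it needs the same guarantee. */
    if (hw->reliable_nonce)
        return CHOICE_ASCON_AEAD128;

    return CHOICE_NEEDS_REVIEW;
}
```

The `CHOICE_NEEDS_REVIEW` outcome is deliberate: a platform with only a sealed AES-GCM mode and no reliable nonce source has no clean answer in this flow, and that is worth finding out before committing to a protocol.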

Now for the Ascon point, since it is the freshest piece and the one most likely to show up in new lightweight designs: Ascon was selected by NIST in February 2023 after a multi round process that began from 57 submissions in 2019, and standardized in August 2025.

The standard specifies four functions: the authenticated encryption scheme Ascon-AEAD128, the hash function Ascon-Hash256, and two extendable output functions, Ascon-XOF128 and Ascon-CXOF128. All of them are built on a 320 bit permutation organized as five 64 bit words, using only bitwise operations, rotations, and a 5 bit S-box, which is what makes it conveniently cheap in hardware. Ascon-AEAD128 is a nonce based sponge with a 128 bit rate and 192 bit capacity, offering 128 bit security, with 128 bit key, nonce and tag.

The main reason to reach for it, as we said before, is a device with no AES accelerator: Ascon in software is far friendlier than AES in software, and you still get a standardized, well analyzed AEAD.
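To make the "cheap operations" point concrete, here is a sketch of the Ascon permutation as I understand it from the public specification, written for readability rather than speed. The function names are my own, the sketch covers at most 12 rounds, and it deliberately leaves out the AEAD and hash modes built on top. Check it against the official test vectors before relying on it.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (0 < n < 64). */
static inline uint64_t ror64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* Substitution layer: the 5-bit S-box applied to all 64 bit columns at once
   (bitsliced), using only XOR, AND and NOT on the five words. */
void ascon_sbox_layer(uint64_t x[5])
{
    uint64_t t0, t1, t2, t3, t4;
    x[0] ^= x[4]; x[4] ^= x[3]; x[2] ^= x[1];
    t0 = ~x[0] & x[1]; t1 = ~x[1] & x[2]; t2 = ~x[2] & x[3];
    t3 = ~x[3] & x[4]; t4 = ~x[4] & x[0];
    x[0] ^= t1; x[1] ^= t2; x[2] ^= t3; x[3] ^= t4; x[4] ^= t0;
    x[1] ^= x[0]; x[0] ^= x[4]; x[3] ^= x[2]; x[2] = ~x[2];
}

/* Linear diffusion layer: each word is XORed with two rotations of itself. */
void ascon_linear_layer(uint64_t x[5])
{
    x[0] ^= ror64(x[0], 19) ^ ror64(x[0], 28);
    x[1] ^= ror64(x[1], 61) ^ ror64(x[1], 39);
    x[2] ^= ror64(x[2], 1)  ^ ror64(x[2], 6);
    x[3] ^= ror64(x[3], 10) ^ ror64(x[3], 17);
    x[4] ^= ror64(x[4], 7)  ^ ror64(x[4], 41);
}

/* Applies the last `rounds` rounds of the 12-round permutation to the state.
   Values above 12 are clamped in this sketch. */
void ascon_permute(uint64_t x[5], unsigned rounds)
{
    static const uint8_t rc[12] = {
        0xf0, 0xe1, 0xd2, 0xc3, 0xb4, 0xa5,
        0x96, 0x87, 0x78, 0x69, 0x5a, 0x4b
    };
    if (rounds > 12)
        rounds = 12;
    for (unsigned i = 12 - rounds; i < 12; i++) {
        x[2] ^= rc[i];            /* round constant goes into word 2 */
        ascon_sbox_layer(x);
        ascon_linear_layer(x);
    }
}
```

There are no lookup tables and no data dependent branches, which is a large part of why a software implementation of this permutation is easier to keep constant time than a table based software AES.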

Table recap

References

NIST, SP 800-232 (initial public draft and final), Ascon-Based Lightweight Cryptography Standards for Constrained Devices: https://csrc.nist.gov/pubs/sp/800/232/ipd and https://csrc.nist.gov/pubs/sp/800/232/final
NIST, Lightweight Cryptography project overview and timeline: https://csrc.nist.gov/projects/lightweight-cryptography
J. P. Aumasson and A. Vennard (Teserakt AG), Cryptography in industrial embedded systems: our experience of needs and constraints, NIST Lightweight Cryptography Workshop 2019: https://csrc.nist.gov/CSRC/media/Events/lightweight-cryptography-workshop-2019/documents/papers/cryptography-in-industrial-embedded-systems-lwc2019.pdf
Y. Qian, Applied Cryptography in Embedded Systems, M.Sc. thesis, University of Vaasa, 2013: https://osuva.uwasa.fi/server/api/core/bitstreams/3b839d4b-00fe-443d-b7dc-e1d7cde2ba85/content
S. S. Kalluri (Cadence), Embedded Security Using Cryptography, Semiconductor Engineering: https://semiengineering.com/embedded-security-using-cryptography/
Apple, Apple Platform Security guide: https://help.apple.com/pdf/security/en_US/apple-platform-security-guide.pdf
AES instruction set, Wikipedia: https://en.wikipedia.org/wiki/AES_instruction_set
NIST SP 800-131A Revision 2, Transitioning the Use of Cryptographic Algorithms and Key Lengths (3DES status): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
