OpenSSL Conference

Optimizing OpenSSL's AES-CFB128 with Vector AES: Performance Gains and Hard-Won Lessons
2025-10-09, Belvedere II / Community, Contribution & the Future

Cryptographic implementations demand both correctness and security, so how do you optimize an algorithm like AES-CFB128 for modern CPUs without compromising either? This case study explores the evolution of OpenSSL’s AES-CFB128 implementation, from a sequential AES-NI baseline to a high-performance VAES-optimized version (openssl#26902).

We’ll dive deep into:
- SIMD and compiler optimization techniques
- Performance measurement and characterization
- Tooling and debugging challenges
- Security considerations
- Lessons learned as an external contributor


AES encryption is the backbone of modern data security, and its performance is greatly improved by AES-NI instructions. In this deep dive, we'll explore how we optimized AES-CFB128 (Cipher Feedback mode) in OpenSSL by leveraging AVX-512 and VAES, the vector version of AES-NI.

We'll start with the basics:
- How AES-CFB works and why encryption is inherently sequential while decryption is embarrassingly parallel
- The role of AES-NI instructions
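
To make the sequential-versus-parallel point concrete, here is a minimal sketch of CFB128's data flow. It is not OpenSSL's code: `toy_block_encrypt` is a hypothetical, insecure stand-in for the AES block function, used only so the example is self-contained, and the function names are assumptions made for illustration. Note that CFB uses the block cipher's forward (encrypt) direction for both encryption and decryption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Toy stand-in for the AES block encryption E_K. NOT AES and NOT secure:
 * it only mixes the key into the input so the data flow can be shown. */
static void toy_block_encrypt(const uint8_t key[BLOCK],
                              const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
    for (size_t i = 0; i < BLOCK; i++) {
        uint8_t x = (uint8_t)(in[i] ^ key[i]);
        out[i] = (uint8_t)(((x << 3) | (x >> 5)) + key[(i + 1) % BLOCK]);
    }
}

/* Encryption: C_i = P_i XOR E_K(C_{i-1}), with C_0 = IV.
 * Each iteration needs the ciphertext produced by the previous one,
 * so the block encryptions form one long dependency chain. */
static void cfb128_encrypt(const uint8_t key[BLOCK], const uint8_t iv[BLOCK],
                           const uint8_t *pt, uint8_t *ct, size_t len)
{
    uint8_t feedback[BLOCK], ks[BLOCK];

    assert(len % BLOCK == 0);          /* whole blocks only in this sketch */
    memcpy(feedback, iv, BLOCK);
    for (size_t off = 0; off < len; off += BLOCK) {
        toy_block_encrypt(key, feedback, ks);
        for (size_t i = 0; i < BLOCK; i++)
            ct[off + i] = (uint8_t)(pt[off + i] ^ ks[i]);
        memcpy(feedback, ct + off, BLOCK);   /* feeds the next iteration */
    }
}

/* Decryption: P_i = C_i XOR E_K(C_{i-1}).
 * Every input to E_K is already available in the ciphertext buffer, so
 * the iterations are independent and can run in any order or in parallel.
 * Assumes ct and pt do not overlap. */
static void cfb128_decrypt(const uint8_t key[BLOCK], const uint8_t iv[BLOCK],
                           const uint8_t *ct, uint8_t *pt, size_t len)
{
    uint8_t ks[BLOCK];

    assert(len % BLOCK == 0);
    for (size_t off = 0; off < len; off += BLOCK) {
        const uint8_t *prev = (off == 0) ? iv : ct + off - BLOCK;
        toy_block_encrypt(key, prev, ks);
        for (size_t i = 0; i < BLOCK; i++)
            pt[off + i] = (uint8_t)(ct[off + i] ^ ks[i]);
    }
}
```

The only structural difference between the two loops is where the block function's input comes from: a value computed in the previous iteration (encryption) versus a value already sitting in memory (decryption).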

Then, we'll work through the optimization journey:
- Baseline AES-NI implementation and its bottlenecks
- Short dependency chains that exploit superscalar and out-of-order execution
- How we used VAES to process multiple blocks in parallel
- Other optimization techniques to reduce overhead
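
As a rough, portable illustration of the multi-block idea (not the code from the talk or the pull request), the sketch below decrypts CFB128 four blocks per step. Four is chosen because a 512-bit register holds four 128-bit blocks, which a single VAES instruction processes independently. The names `toy_e` and `cfb128_decrypt_x4` are hypothetical, and `toy_e` is an insecure stand-in for AES so the example runs without special CPU features.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16
#define LANES 4   /* a 512-bit register holds four 128-bit blocks */

/* Toy stand-in for AES block encryption; NOT AES and NOT secure. */
static void toy_e(const uint8_t key[BLK], const uint8_t in[BLK],
                  uint8_t out[BLK])
{
    for (size_t i = 0; i < BLK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 5 + key[(i + 7) % BLK]);
}

/* Hypothetical helper: CFB128-decrypt nblocks whole blocks, up to LANES
 * at a time. Assumes ct and pt do not overlap. */
static void cfb128_decrypt_x4(const uint8_t key[BLK], const uint8_t iv[BLK],
                              const uint8_t *ct, uint8_t *pt, size_t nblocks)
{
    uint8_t in[LANES][BLK], ks[LANES][BLK];
    size_t b = 0;

    assert(ct != pt);
    while (b < nblocks) {
        /* Full batch of LANES blocks, or a shorter tail at the end. */
        size_t n = (nblocks - b < LANES) ? nblocks - b : LANES;

        /* Gather inputs: block j is keyed off C_{j-1}, or the IV for j = 0,
         * so a batch is just the ciphertext shifted by one block. */
        for (size_t l = 0; l < n; l++)
            memcpy(in[l], (b + l == 0) ? iv : ct + (b + l - 1) * BLK, BLK);

        /* n independent block encryptions. With VAES, each AES round of
         * all lanes would be a single vector instruction. */
        for (size_t l = 0; l < n; l++)
            toy_e(key, in[l], ks[l]);

        /* XOR the keystream into the ciphertext to recover the plaintext. */
        for (size_t l = 0; l < n; l++)
            for (size_t i = 0; i < BLK; i++)
                pt[(b + l) * BLK + i] =
                    (uint8_t)(ct[(b + l) * BLK + i] ^ ks[l][i]);
        b += n;
    }
}
```

The same batching cannot be applied directly to encryption of a single stream, because each block's input is the previous block's output.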

Along the way we'll cover:
- Performance measurement and characterization
- Tooling and debugging challenges
- Security aspects: zeroization and data-oblivious algorithms
- Lessons from upstreaming to OpenSSL as an external contributor
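
For readers unfamiliar with the two security terms, the following is a small generic sketch, not taken from the talk. Zeroization means wiping key material so it does not linger in memory. A data-oblivious routine is one whose control flow and memory access pattern do not depend on secret values. The helper names `wipe` and `ct_equal` are hypothetical. In OpenSSL itself the corresponding facilities are `OPENSSL_cleanse` and `CRYPTO_memcmp`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zeroization: write through a volatile pointer, a common idiom to keep
 * the compiler from discarding the stores as dead code. */
static void wipe(void *buf, size_t len)
{
    volatile uint8_t *p = (volatile uint8_t *)buf;

    assert(buf != NULL || len == 0);
    while (len--)
        *p++ = 0;
}

/* Data-oblivious comparison: returns 1 if the buffers are equal, else 0.
 * It always reads every byte and has no secret-dependent branch, so the
 * running time does not reveal where the first mismatch is. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;

    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    /* diff == 0 wraps to 0xFFFFFFFF, whose bit 8 is set; any other value
     * of diff leaves bit 8 clear. */
    return (int)((((uint32_t)diff - 1) >> 8) & 1);
}
```

A naive comparison with an early `return` on the first differing byte would be functionally identical but would leak the mismatch position through timing, which is why the branch-free form is preferred for secrets.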