2025-10-09, Krakow / Business Value & Enterprise Adoption
With the help of readily available OpenSSL callback functions, it is possible to accurately measure the CPU time spent in the various stages of a TLS v1.3 session establishment (ClientHello, ServerHello, CertificateVerify, etc.). Accurate timings for the computation of the cryptographic algorithms used during those stages can then be extracted. While standalone and/or synthetic performance measurements of crypto algorithms are broadly available, the presented in-situ approach not only characterizes the actual algorithm implementations as used in OpenSSL, but also accounts for the collateral performance penalties incurred when the algorithms are used inside the TLS v1.3 protocol (e.g., memory allocations, data copying, algorithm loading, merging of hybrid secrets, etc.). This presentation provides an overview of the measurement technique and shows detailed measured results, with a focus on various combinations of ML-KEM and ML-DSA, which were introduced in OpenSSL v3.5.
Data confidentiality is of growing concern, and data communication is therefore preferably encrypted end-to-end ('e2e'). The e2e aspect also requires authenticating the endpoint(s) to ensure that the communication happens between the two intended endpoints and not via a rogue intermediary ('man in the middle'). Consequently, confidentiality and authenticity are important features of a modern communication protocol.
A sizable number of protocols exist to coordinate the flow of data in the above manner, with Transport Layer Security (TLS) in widespread use. The present discussion therefore focuses entirely on TLS; it implicitly covers other protocols which reuse the initial part of the TLS protocol for session establishment, while all other protocols are beyond its scope. Several versions of TLS exist, with version 1.3, standardized in RFC 8446 and related standards, being the most modern at the time of writing. This latest version is also capable of using cryptographic algorithms which are resistant against attacks with future quantum computers.
While synthetic performance analyses (e.g., the theoretical number of CPU cycles required for execution) as well as standalone measurements exist for the various cryptographic algorithms involved in a TLS session establishment, the performance of those algorithms when used inside a complex protocol like TLS has, to the best of our knowledge, never been investigated in detail. We would like to emphasize that measuring 'in situ', i.e. deep inside the TLS library, ensures that the actual data sizes processed during the TLS session establishment phases impact the measured performance of sign/verify or encrypt/decrypt operations, and that any overhead during key generation is accounted for. This notably includes all required bookkeeping of the data, such as memory allocation, placement of memory (stack vs. heap), data copying, jumps to sub-routines, merging of hybrid secrets, or any other protocol- and/or implementation-induced overhead. These aspects must be considered to reveal the real-world performance impact of cryptographic algorithms when used inside a protocol.
We can establish a list of the cryptographic operation types used in the establishment of a communication session with TLS v1.3, and we want performance measurements for all of them as a function of the involved cryptographic algorithm(s) (the first two operation types are illustrated with a small sketch after the list):
- Generation of a key pair
- Generation of a common key
- Encryption and decryption of a message record
- Generation of a message digest
- Signature of a message digest
- Verification of a message digest signature
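To make these operation types concrete, the sketch below (our own illustration, not part of the measured code) shows the first two of them at the level of OpenSSL's EVP API: generating a key pair and deriving a common key. The classical X25519 group is used purely as an example; during an actual TLS v1.3 handshake, OpenSSL performs the equivalent steps internally for whichever (possibly post-quantum or hybrid) group is negotiated.

```c
/* Minimal sketch (illustrative only): key-pair generation and derivation of a
 * common key with OpenSSL's EVP API, using the classical X25519 group as an
 * example. Requires OpenSSL >= 3.0. */
#include <stdio.h>
#include <openssl/evp.h>

int main(void)
{
    /* Key-pair generation for both sides (operation type 1). */
    EVP_PKEY *client_key = EVP_PKEY_Q_keygen(NULL, NULL, "X25519");
    EVP_PKEY *server_key = EVP_PKEY_Q_keygen(NULL, NULL, "X25519");
    if (client_key == NULL || server_key == NULL)
        return 1;

    /* Derivation of the common (shared) key on the client side
     * (operation type 2). */
    EVP_PKEY_CTX *dctx = EVP_PKEY_CTX_new(client_key, NULL);
    unsigned char secret[64];
    size_t secret_len = sizeof(secret);
    if (dctx == NULL
            || EVP_PKEY_derive_init(dctx) <= 0
            || EVP_PKEY_derive_set_peer(dctx, server_key) <= 0
            || EVP_PKEY_derive(dctx, secret, &secret_len) <= 0)
        return 1;

    printf("derived %zu-byte shared secret\n", secret_len);

    EVP_PKEY_CTX_free(dctx);
    EVP_PKEY_free(client_key);
    EVP_PKEY_free(server_key);
    return 0;
}
```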
OpenSSL, with its libraries and additional providers, can essentially be seen as an SDK (software development kit) providing all required APIs (application programming interfaces) to establish (and maintain) a TLS session.
In addition to providing the functionality required to establish a TLS session, OpenSSL offers a means to register a callback function which is then automatically executed at predefined phases of the protocol. This makes it possible to generate timestamps for each phase of the session establishment and to gather information about the amount of data exchanged during those phases. Specifically, 'SSL_set_msg_callback' in combination with 'SSL_set_msg_callback_arg' is used to gather insights about the above-mentioned timings and data amounts. We would like to stress that this way, no further instrumentation (and hence no code changes) of the used libraries is required, therefore preserving the actual processing timings.
As we are interested in the CPU load rather than the wall-clock duration of the session establishment, we use the 'clock_gettime' function with 'CLOCK_THREAD_CPUTIME_ID' as the timing source. This allows comparing the 'number-crunching requirement' of processing the algorithms, expressed in milliseconds (ms) or microseconds (us), which on otherwise idle servers largely corresponds to CPU cycles.
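A minimal sketch of how such an instrumentation hook might look is given below; the structure 'msg_trace' and the recorded fields are our own illustrative choices, while 'SSL_set_msg_callback', 'SSL_set_msg_callback_arg' and 'CLOCK_THREAD_CPUTIME_ID' are the standard OpenSSL and POSIX facilities mentioned above.

```c
/* Sketch (illustrative, not the actual measurement code): register a message
 * callback that records, for every protocol message seen, its direction,
 * content type, length and a thread-CPU timestamp. */
#include <time.h>
#include <openssl/ssl.h>

struct msg_trace {                 /* hypothetical per-connection trace buffer */
    int count;
    struct {
        int write_p;               /* 1 = sent, 0 = received */
        int content_type;          /* e.g. handshake, alert, application data */
        size_t len;                /* message size in bytes */
        struct timespec cpu;       /* thread CPU time consumed so far */
    } ev[64];
};

static void msg_cb(int write_p, int version, int content_type,
                   const void *buf, size_t len, SSL *ssl, void *arg)
{
    struct msg_trace *t = arg;
    (void)version; (void)buf; (void)ssl;

    if (t->count < 64) {
        t->ev[t->count].write_p = write_p;
        t->ev[t->count].content_type = content_type;
        t->ev[t->count].len = len;
        /* CPU time of this thread only, so time spent blocked on the
         * network does not contribute. */
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t->ev[t->count].cpu);
        t->count++;
    }
}

/* Usage, given an already created SSL object 'ssl':
 *     static struct msg_trace trace;
 *     SSL_set_msg_callback(ssl, msg_cb);
 *     SSL_set_msg_callback_arg(ssl, &trace);
 */
```

Because the callback fires for both sent and received messages ('write_p' distinguishes the direction), client- and server-side phases can be separated from the same trace.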
We recognize that the duration of any phase of the TLS protocol processing, which is determined by the variable execution time of one or several cryptographic algorithms plus some implementation-dependent protocol processing overhead, can be measured by calculating the time difference between a first timestamp taken when all data required for the operation is available, and a second timestamp taken when the result of the operation is available. As the latter is, in terms of CPU cycles, equivalent to the start of the next phase of the TLS protocol processing, we have timestamps (turned into per-phase durations in the sketch after the following list) for when
- a fresh, potentially encrypted TLS record is available
- the (plain) payload data of this TLS record is available
- the processing of the payload data has been completed
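Per-phase CPU-time durations then follow simply from the differences between consecutive recorded timestamps, as in the sketch below (again our own illustration; 'ts' and 'labels' are hypothetical placeholders for data collected by a message callback such as the one sketched above).

```c
/* Sketch: derive per-phase CPU-time durations from consecutive thread-CPU
 * timestamps, one per recorded protocol event. */
#include <stdio.h>
#include <time.h>

static void report_phases(const struct timespec *ts,
                          const char *const *labels, int n)
{
    /* Phase i starts when its input data is available (timestamp i) and ends
     * when its result is available (timestamp i + 1, which is also the start
     * of the next phase). */
    for (int i = 0; i + 1 < n; i++) {
        double us = (ts[i + 1].tv_sec  - ts[i].tv_sec)  * 1e6
                  + (ts[i + 1].tv_nsec - ts[i].tv_nsec) / 1e3;
        printf("%-32s %10.1f us of thread CPU time\n", labels[i], us);
    }
}
```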
Based on the above, a way to measure the timings and to collect data-size information for the various phases of a TLS v1.3 session establishment is available. As an example, the time between a client-side "connect" call and the availability of the "ClientHello" record is mostly spent generating at least one speculative key pair intended to be used for a Diffie-Hellman key agreement with the server, and it changes with the computational complexity of generating such a key pair, which can be steered via the configured key-share groups (see the sketch below).
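The following sketch shows one way to steer which speculative key pair(s) the client generates before its ClientHello; the helper name is our own, and "X25519MLKEM768" is, to our understanding, the hybrid post-quantum group name supported by OpenSSL v3.5 (an assumption to verify against the actual build).

```c
/* Sketch (illustrative): restrict the offered groups so that the CPU time
 * between the connect call and the ClientHello record reflects the key-pair
 * generation cost of the chosen group. */
#include <openssl/ssl.h>

static int restrict_key_share(SSL_CTX *ctx, int use_hybrid_pq)
{
    /* "X25519MLKEM768" is assumed to be the hybrid PQ group name in
     * OpenSSL v3.5; "X25519" serves as a purely classical baseline. */
    const char *groups = use_hybrid_pq ? "X25519MLKEM768" : "X25519";

    /* Only the listed group(s) are offered; OpenSSL generates its speculative
     * ClientHello key share(s) from the front of this list. Returns 1 on
     * success. */
    return SSL_CTX_set1_groups_list(ctx, groups);
}
```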
The presentation will show detailed measurement results for the above-mentioned phases of a TLS session establishment as a function of the used crypto algorithms, for both the client and the server side.
Dr. Martin Schmatz is a Principal Researcher at IBM Research Europe in Zurich, Switzerland. He studied electrical engineering and received his doctorate from ETH Zurich on the topic of "Noise Parameter Measurements." After joining IBM in 1999, he led research in the areas of I/O link technology and, from 2012, key components of modern IT server systems. Since 2017, he has focused his work on key management systems (KMS) for Cloud applications and, since 2020, in particular on the migration to quantum-secure communication. He holds an MBA, has authored over 50 scientific publications, and holds over 100 patents.