Ever since the Sandy Bridge microarchitecture, Intel CPUs have been coming with hardware-accelerated AES support (aka "AES-NI", new instructions). I figured it would be interesting see a comparison between AES with and without the hardware acceleration on my Intel Core i5-3317U CPU (Ivy Bridge) on Arch Linux.
According to a post on the OpenSSL Users mailing list, you can force openssl
to avoid hardware AES instructions using the OPENSSL_ia32cap
environment variable.
Benchmarks
First, with AES-NI enabled (the default, on hardware that supports it):
$ openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 57196857 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 15343650 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 3897351 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 978726 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 122310 aes-128-cbc's in 3.00s
OpenSSL 1.0.1e 11 Feb 2013
built on: Sun Oct 20 14:49:13 CEST 2013
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 305049.90k 327331.20k 332573.95k 334071.81k 333987.84k
Then, setting the capability mask to turn off the hardware AES features:
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 27883366 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 7736907 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1949328 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 498847 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 62446 aes-128-cbc's in 3.00s
OpenSSL 1.0.1e 11 Feb 2013
built on: Sun Oct 20 14:49:13 CEST 2013
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 148711.29k 165054.02k 166342.66k 170273.11k 170519.21k
You can see that hardware-accelerated AES is pretty consistently twice as fast as the implementation without aesni. So it's not an exponential win, but getting twice the performance is certainly very serious! This is great for not only for servers using AES encryption (SSL/TLS, hello!), but also for consumers wanting to connect to said servers as well as things like full-disk encryption.
Note: It seems Arch Linux's OpenSSL is built with AES-NI support but not as an engine, so openssl speed
could be misleading (ie, you'd see no difference with or without the capabilities masked). To get the AES-NI support you need to use -evp
("envelope") mode, which is some sort of high-level interface for crypto functions in OpenSSL.
This was originally posted on on my personal blog; re-posted here for posterity.