- Erik Boss
- Ruhr-Universität Bochum
- Embedded Security Group (EMSEC)
- ECRYPT-NET ESR

- Large-scale (exhaustive) search problems
- For crypto

- Approach
- Problems, solutions and observations

Finding S-Boxes with Efficient Masking in Hardware

- Physical/implementation attacks
- Countermeasures
- S-Boxes
- Inherently maskable
- By construction
- Efficient
*and*secure

Whole Lotta Searchin' Goin' On

"That thing that AES uses."

- Intuition: n-bit to n-bit lookup tables
- n=8

- Common in symmetric ciphers
- Non-linear

"That thing that DES has."

- Round function is a 4-bit to 4-bit function.
- \(2^{64}\) candidates

\(H(x) = L(F(x)) \oplus A(x)\)

- \(L(x)\):
**linear**function - \(A(x)\):
**affine**function - \(F(x)\): function corresp. to 4713 equivalence classes

\(2^{16} \cdot 2^{20} \cdot 4713 \approx 2^{48}\)

*But we can reduce to* \(\approx 2^{46.5}\)

Minimize search space.

– First rule of Find Club

Minimize redundancy.

– Third rule of Find Club

"What's in the box!?"

- Algebraic Degree
- AES: 7

- Linearity, \(\le 64\)
- AES: 32

**Differential Uniformity**, \(\le 16\)- AES: 4

"Good luck storming the castle."

- Iterate over all L, A, F.
- Embarrassingly parallel

- Filter
- On?

"Power! Unlimited power!"

- GPUs to the rescue
- CUDA/OpenCL

Per thread/work item/core/…:

- Compute S-Box \(S\) for \(L\), \(F\), \(A\) corresp. to thread
- Compute a diff(\(S\)) check (\(\le\) threshold)
- Compute auxiliary properties
- Output those results that meet all criteria

"The root of all evil."

Let's talk about rule #3 again.

- \(L\) & \(A\) are fairly small.
- A metric !@#$-ton of memory on our GPUs.
- Precompute tables for \(L\) & \(A\)
- Better: \(L \circ F\) and \(A\) for fixed \(F\)
- (Even better: store in cached memory)

Precompute, precompute, precompute.

– Fourth rule of Find Club

Kids, don't try this at home.

Idea:

- Merge computation and filter(s)
- Terminate early
- Instruction divergence

**Better**: but within the same parallel context.

sbox = [...] maximum = 0 for alpha in range(1, 256): hist = [0] * 256 for x in range(0, 256): beta = sbox[x] ^ sbox[x ^ alpha] hist[beta]++ maximum = max(maximum, hist[beta])

- Terminate after `maximum` reaches threshold
- Remove redundancy by:
- \(S(x) \oplus S(x \oplus 128) = S(x \oplus 128) \oplus S((x \oplus 128) \oplus 128)\)

- Merge by interleaving filter and S-box computation
- Observation: most iterations terminate after \(\alpha = 128\)

Terminate early, in concert.

– Fifth rule of Find Club

Optimize for throughput.

– Sixth rule of Find Club

Mind the latency.

– Seventh rule of Find Club

- Search time is approx. 2 weeks per iteration.
- For \(n \le 5\), a \(n\) iteration Feistel network with identical round functions will not give you better properties than:
- Degree 7, linearity 56, diff. uniformity 8, round function degree 2.

- Example of this class (selected for having a relatively low AND-gate complexity)
- F: 0001024704638EAD
- L: 028A9B1346CEDF57
- A: 6273627351405140

Welcome to Find Club. Enjoy your stay.