Very Poor Multi-Threaded Performance of lib TCrypto Hashing Functions

Hello,

We've been investigating performance issues when using the hashing functions provided by the tcrypto library.
We have isolated this issue to the way the IPP cryptography primitives are being used: https://github.com/intel/cryptography-primitives/issues/93

Taking SHA-256 as an example, every call to `sgx_sha256_msg` calls `ippsHashMethod_SHA256_TT` to obtain the method argument it passes to `ippsHashMessage_rmf`:

https://github.com/intel/linux-sgx/blob/7385e10ce1106215d15f874a024ca224c7417eea/sdk/tlibcrypto/ipp/sgx_sha256_msg.cpp#L49-L68

This seems innocent enough, since the call should just return a static structure holding function pointers to the implementation of that hashing primitive. However, the `_TT` methods support dynamic dispatching to the NI implementations of those hashing primitives: https://www.intel.com/content/www/us/en/docs/ipp-crypto/developer-guide-reference/2021-9/one-way-hash-primitives.html

Because of the way this is implemented, on a platform supporting SHA-NI every call to `ippsHashMethod_SHA256_TT` first sets the global `method.hashUpdate` function pointer to the normal implementation and then overwrites it with the NI one:

https://github.com/intel/cryptography-primitives/blob/59a3c2e80c8fccd0d37b7a58020671c5468ec49b/sources/ippcp/hash/sha256/pcphashmethod_sha256_tt.c#L49-L75

Since this structure is static and shared across all threads, calling this method from different threads keeps rewriting the function pointer for every thread involved, with devastating consequences for the memory caches. On the pure ippcp sample, without SGX, perf showed a massive memory bottleneck due to L1D and L3 cache misses.
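To make the write pattern concrete, here is a simplified model of it. All names in it (`ModelHashMethod`, `model_method_tt`, `count_stores`, and so on) are hypothetical stand-ins, not ippcp symbols, and the real field is a plain pointer rather than a `std::atomic`. It only illustrates that a "getter" written this way stores to shared memory on every call:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model of the pattern; none of these names are ippcp symbols.
using UpdateFn = void (*)(const unsigned char* msg, std::size_t len);

void update_generic(const unsigned char*, std::size_t) {}  // portable path
void update_sha_ni(const unsigned char*, std::size_t) {}   // SHA-NI path

struct ModelHashMethod {
    // The real field is a plain pointer; std::atomic keeps this model
    // free of data races if it is called from several threads.
    std::atomic<UpdateFn> hashUpdate{nullptr};
};

ModelHashMethod g_method;            // one static instance shared by every thread
std::atomic<long> g_store_count{0};  // instrumentation: writes to g_method

// Mirrors the described behavior: every call rewrites the shared pointer,
// first to the generic routine and then, if SHA-NI is present, to the NI one.
const ModelHashMethod* model_method_tt(bool has_sha_ni) {
    g_method.hashUpdate.store(update_generic, std::memory_order_relaxed);
    g_store_count.fetch_add(1, std::memory_order_relaxed);
    if (has_sha_ni) {
        g_method.hashUpdate.store(update_sha_ni, std::memory_order_relaxed);
        g_store_count.fetch_add(1, std::memory_order_relaxed);
    }
    return &g_method;
}

// Calls the getter `calls` times and reports how many shared writes that cost.
long count_stores(int calls, bool has_sha_ni) {
    assert(calls >= 0);
    const long before = g_store_count.load(std::memory_order_relaxed);
    for (int i = 0; i < calls; ++i) {
        model_method_tt(has_sha_ni);
    }
    return g_store_count.load(std::memory_order_relaxed) - before;
}
```

In this model, 1000 calls with SHA-NI enabled perform 2000 stores to the same shared location. With several threads doing that concurrently, the cache line holding the structure would be expected to bounce between cores, which is consistent with the cache-miss profile we saw.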

I'm unsure what the correct fix is for the libtcrypto functions. I have seen some internal code that implements CPU dispatching directly using ippcp internal functions: https://github.com/intel/linux-sgx/blob/7385e10ce1106215d15f874a024ca224c7417eea/sdk/tlibcrypto/ipp/ipp_disp/intel64/ippsHashMessage_rmf.c#L61-L76

For our use case, we will use ippcp directly and make sure to call `ippsHashMethod_SHA256_TT` only once, caching the returned pointer.
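A minimal sketch of that "call once and cache" workaround, assuming a C++11 toolchain where function-local statics are initialized exactly once in a thread-safe way. To keep it buildable without ippcp, `standin_method_sha256_tt` and `StandInHashMethod` are hypothetical stand-ins for `ippsHashMethod_SHA256_TT` and `IppsHashMethod`; `cached_sha256_method` and `hash_many` are illustrative names as well:

```cpp
#include <cassert>

// Hypothetical stand-in for IppsHashMethod so the sketch builds without ippcp.
struct StandInHashMethod {
    int digest_bits;
};

int g_getter_calls = 0;  // instrumentation only: how often the getter ran

// Plays the role of ippsHashMethod_SHA256_TT(): returns a pointer to a
// static structure. In the real library this is the call that rewrites
// the shared function pointer.
const StandInHashMethod* standin_method_sha256_tt() {
    static StandInHashMethod method{256};
    ++g_getter_calls;
    return &method;
}

// The cache: the initializer of a function-local static runs exactly once,
// and C++11 guarantees that initialization is thread-safe.
const StandInHashMethod* cached_sha256_method() {
    static const StandInHashMethod* const cached = standin_method_sha256_tt();
    return cached;
}

// Stand-in for a hot hashing loop: each iteration only reads the cached
// pointer instead of calling the getter again.
int hash_many(int n) {
    assert(n >= 0);
    int used = 0;
    for (int i = 0; i < n; ++i) {
        const StandInHashMethod* m = cached_sha256_method();
        used += (m->digest_bits == 256) ? 1 : 0;
    }
    return used;
}
```

With the real library, the cached pointer would be passed as the last argument of `ippsHashMessage_rmf` in place of the inline `ippsHashMethod_SHA256_TT()` call. Whether function-local static initialization behaves the same inside an enclave build is something we have not checked, so treat this as a sketch for the plain ippcp case.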

	sgx_status_t sgx_sha256_msg(const uint8_t *p_src, uint32_t src_len, sgx_sha256_hash_t *p_hash)
	{
	    if ((p_src == NULL) || (p_hash == NULL))
	    {
	        return SGX_ERROR_INVALID_PARAMETER;
	    }

	    fips_self_test_hash256();

	    IppStatus ipp_ret = ippStsNoErr;
	    ipp_ret = ippsHashMessage_rmf((const Ipp8u *) p_src, src_len, (Ipp8u *)p_hash, ippsHashMethod_SHA256_TT());
	    switch (ipp_ret)
	    {
	    case ippStsNoErr: return SGX_SUCCESS;
	    case ippStsMemAllocErr: return SGX_ERROR_OUT_OF_MEMORY;
	    case ippStsNullPtrErr:
	    case ippStsLengthErr: return SGX_ERROR_INVALID_PARAMETER;
	    default: return SGX_ERROR_UNEXPECTED;
	    }
	}

	IPPFUN(IppStatus, sgx_disp_ippsHashMessage_rmf,(const Ipp8u* pMsg, int len, Ipp8u* pMD, const IppsHashMethod* pMethod))
	{
	    Ipp64u _features;
	    _features = ippcpGetEnabledCpuFeatures();

	    if( AVX3I_FEATURES == ( _features & AVX3I_FEATURES )) {
	        return k1_ippsHashMessage_rmf( pMsg, len, pMD, pMethod );
	    } else
	    if( ippCPUID_AVX2 == ( _features & ippCPUID_AVX2 )) {
	        return l9_ippsHashMessage_rmf( pMsg, len, pMD, pMethod );
	    } else
	    if( ippCPUID_SSE42 == ( _features & ippCPUID_SSE42 )) {
	        return y8_ippsHashMessage_rmf( pMsg, len, pMD, pMethod );
	    } else
	        return ippStsCpuNotSupportedErr;
	}

Very Poor Multi-Threaded Performance of lib TCrypto Hashing Functions #1073
