AES hardware acceleration not working for STM32F439xI #4928

Closed
andresag01 opened this issue Aug 17, 2017 · 32 comments · Fixed by #5018
Comments

@andresag01

andresag01 commented Aug 17, 2017

Description

  • Type: Bug
  • Priority: Major

Bug

Our automated tests for the tls-client example in the mbed-os-example-tls repository fail with the following error message printed in the serial console (target UBLOX_EVK_ODIN_W2):

mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer

When we enable debug printing, we observe that the TLS connection terminates prematurely because the server sent the tls-client a fatal alert message as the MAC of a TLS record does not check out:

...
ssl_tls.c:3961: |2| got an alert message, type: [2:20]
ssl_tls.c:3969: |1| is a fatal alert message (msg 20)
ssl_tls.c:3744: |1| mbedtls_ssl_handle_message_type() returned -30592 (-0x7780)
ssl_cli.c:3184: |1| mbedtls_ssl_read_record() returned -30592 (-0x7780)
ssl_tls.c:6354: |2| <= handshake
mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer
...
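
The `[2:20]` pair in the log above is the alert level followed by the alert description. A minimal sketch of decoding it, assuming the TLS 1.2 alert numbering from RFC 5246; the function names are illustrative and are not part of Mbed TLS, and the description table is deliberately incomplete:

```c
#include <stdio.h>

/* Alert levels defined by TLS 1.2 (RFC 5246, section 7.2). */
static const char *tls_alert_level_name(int level)
{
    switch (level) {
        case 1:  return "warning";
        case 2:  return "fatal";
        default: return "unknown";
    }
}

/* A few alert descriptions only; this is not the complete table. */
static const char *tls_alert_description_name(int description)
{
    switch (description) {
        case 0:  return "close_notify";
        case 10: return "unexpected_message";
        case 20: return "bad_record_mac";
        case 40: return "handshake_failure";
        default: return "other";
    }
}

/* Writes "level/description" into buf and returns snprintf's result. */
static int tls_alert_format(char *buf, size_t len, int level, int description)
{
    return snprintf(buf, len, "%s/%s",
                    tls_alert_level_name(level),
                    tls_alert_description_name(description));
}
```

Calling `tls_alert_format(buf, sizeof buf, 2, 20)` produces `fatal/bad_record_mac`, which matches the observation that the MAC of a TLS record did not check out.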

We investigated the problem and found that disabling the AES hardware acceleration code fixes it. To test this, we used the following diff:

diff --git a/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h b/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h
index dfbc820..2c2fff8 100644
--- a/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h
+++ b/features/mbedtls/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F439xI/mbedtls_device.h
@@ -20,8 +20,6 @@
 #ifndef MBEDTLS_DEVICE_H
 #define MBEDTLS_DEVICE_H
 
-#define MBEDTLS_AES_ALT
-
 #define MBEDTLS_SHA256_ALT
 
 #define MBEDTLS_SHA1_ALT
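
Removing `MBEDTLS_AES_ALT` makes the build fall back to the portable software AES. The toy sketch below shows the general compile-time switch pattern; `DEMO_AES_ALT` and `demo_aes_uses_hardware` are hypothetical stand-ins and are not Mbed TLS or STM32 HAL code:

```c
/* Uncomment to select the (pretend) hardware-backed implementation,
 * analogous to defining MBEDTLS_AES_ALT in mbedtls_device.h. */
/* #define DEMO_AES_ALT */

#if defined(DEMO_AES_ALT)
/* Alternative implementation: would call into the hardware driver. */
static int demo_aes_uses_hardware(void) { return 1; }
#else
/* Default implementation: portable software path. */
static int demo_aes_uses_hardware(void) { return 0; }
#endif
```

With the macro left undefined, as in the diff above, `demo_aes_uses_hardware()` returns 0 and only the software path is compiled in.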

Target
STM32F439xI family of devices with hardware acceleration enabled

Toolchain:
GCC_ARM

mbed-os sha:
Git tag mbed-os-5.5.5

Expected behavior
The tls-client example should succeed.

Actual behavior
The tls-client example fails with error:

mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer

Steps to reproduce
Run the tls-client example from the mbed-os-example-tls repository (with the mbed-os-5.5.4 tag) using the GCC_ARM toolchain on the UBLOX_EVK_ODIN_W2 target. The failure message can be observed in the serial output.

@andresag01
Author

cc @RonEld @Patater @0xc0170

@RonEld
Contributor

RonEld commented Aug 17, 2017

Hi @andresag01, thanks for raising this. I don't understand the description: is the target device STM32F439xI or UBLOX_EVK_ODIN_W2?
Also, do you happen to know whether the AES used here is AES-192, by any chance?

@Patater
Contributor

Patater commented Aug 17, 2017

@RonEld UBLOX_EVK_ODIN_W2 is a TARGET_STM32F439xI, so it is affected.

@RonEld
Contributor

RonEld commented Aug 17, 2017

I see, thanks.

@0xc0170
Contributor

0xc0170 commented Aug 17, 2017

@andresag01
Author

cc @adustm

@Patater
Contributor

Patater commented Aug 17, 2017

This issue isn't affecting only u-blox targets. This issue affects at least all STM32F439xI-family targets that support AES hardware acceleration.

@andresag01
Author

@RonEld: The ciphersuite used for this specific server and example is TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256, so I suppose it's AES-128.

@adustm
Member

adustm commented Aug 18, 2017

Hello, Thanks for reporting. I have reproduced the issue and will look at it.

@JanneKiiskila
Contributor

Seems that using the HW acceleration for crypto also breaks the SD-cards init.

Patater added a commit to Patater/mbed-os that referenced this issue Aug 18, 2017
STM32F439xI-family AES hardware acceleration occasionally produces
incorrect output (ARMmbed#4928).

Don't enable AES HW acceleration on STM32F439xI-family targets by
default until issue ARMmbed#4928 is fixed.
@adustm
Member

adustm commented Aug 18, 2017

Hello, I have a question. Is it possible that once the issue happens ('TLS handshake failure'), the server refuses a new connection from my IP address for a while ?
It looks like it is difficult to reconnect after pressing the reset button several times in a row:

Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Starting the TLS handshake...
mbedtls_ssl_handshake() failed: -0x7780 (-30592): SSL - A fatal alert message was received from our peer
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009
Using Ethernet LWIP
Client IP Address is 192.168.1.100
Connecting with developer.mbed.org
Failed to connect
MBED: Socket Error: -3009

@sg-
Contributor

sg- commented Aug 18, 2017

Seems that using the HW acceleration for crypto also breaks the SD-cards init.

@JanneKiiskila Why would you suggest this? AES and SD/SPI should be completely unrelated. Do you have an application example of this failure?

@andresag01
Author

@adustm: I looked up the error number -3009 and found this in mbed-os/features/netsocket/nsapi_types.h:

NSAPI_ERROR_DNS_FAILURE         = -3009,     /*!< DNS failed to complete successfully */ 

Also, from the tls-client app error message you got, it seems the failure was in line tls-client/main.cpp:205:

        mbedtls_printf("Connecting with %s\r\n", _domain);
        ret = _tcpsocket->connect(_domain, _port);
        if (ret != NSAPI_ERROR_OK) {
            mbedtls_printf("Failed to connect\r\n");
            printf("MBED: Socket Error: %d\r\n", ret);
            _tcpsocket->close();
            return;
        }

It looks to me like the device is not able to resolve the DNS? Perhaps the device is not in the network or the server is somehow unreachable? Perhaps there is some network configuration that is causing your device to return this error when it is reset quickly too many times? It seems that there are multiple functions in ./features/netsocket/nsapi_dns.h that could return that specific error code, you could try looking there.

I suppose that it is also possible for servers to refuse connections from the same IP in quick succession, but I would expect the error to have a different value. Of course, I could be wrong...
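
To make such logs easier to read, the numeric code could be turned into a name before printing. A small sketch, assuming only the value quoted above (`NSAPI_ERROR_DNS_FAILURE`) plus `NSAPI_ERROR_OK`, which is 0; the `demo_` names are hypothetical and the real enum in `nsapi_types.h` has many more entries:

```c
/* Local mirror of two values from nsapi_types.h; not the full enum. */
enum demo_nsapi_error {
    DEMO_NSAPI_ERROR_OK          = 0,
    DEMO_NSAPI_ERROR_DNS_FAILURE = -3009
};

/* Maps a known error code to a short description. */
static const char *demo_nsapi_strerror(int err)
{
    switch (err) {
        case DEMO_NSAPI_ERROR_OK:
            return "no error";
        case DEMO_NSAPI_ERROR_DNS_FAILURE:
            return "DNS failed to complete successfully";
        default:
            return "other NSAPI error";
    }
}
```

In the connect path shown earlier, the `printf` call could then print `demo_nsapi_strerror(ret)` alongside the raw number.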

@andresag01
Author

I just wanted to quickly ask if there were any updates regarding this issue...

@adustm
Member

adustm commented Aug 22, 2017

Dear all,
I've tried disabling and re-enabling interrupts during the HW processing, removing the AES_FORCE_RESET in the aes_free function, and some other things. No clue at the moment...
Attached are the log files of the TLS handshake (teratermaes_sw.txt is the working version without AES HW acceleration; teraterm_aes_alt.txt is the failing version with AES HW acceleration). Both were captured with debug level 4.

Could someone look at that ?
At line 762 of the log files, we can see that the failing version receives a message length of 2 and not 202.

teratermaes_sw.txt
teraterm_aes_alt.txt

Kind regards
Armelle

@RonEld
Contributor

RonEld commented Aug 22, 2017

Hi @adustm, as you can see, it's not only a different message length; it's a different message. The msgtype is 21 (alert message) instead of 22 (handshake message). The TLS failure is caused by a fatal alert message received from the server. We need to investigate the reason for the alert, and why the server rejects the connection when HW acceleration is enabled. I suggest you test AES-GCM with and without HW-accelerated AES; perhaps something is wrong with this part of the message.
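
The message type and length discussed here come from the five-byte TLS record header. A minimal sketch of reading it, assuming the TLS 1.2 record layout from RFC 5246 (content type, two version bytes, big-endian length); the struct and function names are illustrative only and are not Mbed TLS APIs:

```c
#include <stddef.h>
#include <stdint.h>

/* Content types from RFC 5246, section 6.2.1. */
enum {
    TLS_CT_CHANGE_CIPHER_SPEC = 20,
    TLS_CT_ALERT              = 21,
    TLS_CT_HANDSHAKE          = 22,
    TLS_CT_APPLICATION_DATA   = 23
};

struct tls_record_header {
    uint8_t  content_type;
    uint8_t  version_major;
    uint8_t  version_minor;
    uint16_t length;          /* payload length in bytes */
};

/* Returns 0 on success, -1 if arguments are invalid or fewer than 5 bytes are available. */
static int tls_parse_record_header(const uint8_t *buf, size_t len,
                                   struct tls_record_header *out)
{
    if (buf == NULL || out == NULL || len < 5) {
        return -1;
    }
    out->content_type  = buf[0];
    out->version_major = buf[1];
    out->version_minor = buf[2];
    out->length        = (uint16_t)(((unsigned)buf[3] << 8) | buf[4]);
    return 0;
}
```

An unencrypted alert payload is exactly two bytes (level and description), which would be consistent with the length of 2 seen in the failing log instead of the 202-byte handshake message.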

@JanneKiiskila
Contributor

JanneKiiskila commented Aug 22, 2017

The SD-card issue for us is related to the fact that we encrypt the SD-card content, so it seems the HW crypto block doesn't work reliably. With the mbed-os-example-client we see the TLS failure.

@RonEld
Contributor

RonEld commented Aug 22, 2017

Hi @adustm,
Could you check whether HAL_CRYP_AESECB_Encrypt has failed on this device, and since mbedtls_aes_encrypt doesn't return error, the driver's error wasn't surfaced up?
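
The concern here is that a `void` wrapper cannot report a driver failure to its caller. A self-contained sketch of the pattern, using a fake driver call; `fake_hal_encrypt`, `demo_encrypt_void`, and `demo_encrypt_checked` are hypothetical and do not reflect the real HAL or Mbed TLS signatures:

```c
#include <stdint.h>
#include <string.h>

static int fake_hal_should_fail = 0;   /* test knob for the pretend driver */

/* Pretend driver: copies input to output, or fails and leaves output untouched. */
static int fake_hal_encrypt(const uint8_t in[16], uint8_t out[16])
{
    if (fake_hal_should_fail) {
        return 1;                      /* nonzero means driver error */
    }
    memcpy(out, in, 16);
    return 0;
}

/* void wrapper: the driver status is dropped, so the caller sees nothing. */
static void demo_encrypt_void(const uint8_t in[16], uint8_t out[16])
{
    (void)fake_hal_encrypt(in, out);
}

/* Checked wrapper: the driver status is propagated to the caller. */
static int demo_encrypt_checked(const uint8_t in[16], uint8_t out[16])
{
    return fake_hal_encrypt(in, out) == 0 ? 0 : -1;
}
```

With the `void` variant, a failed operation silently leaves stale data in the output buffer; the checked variant gives the caller a chance to abort instead of continuing with bad ciphertext.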

@Patater
Contributor

Patater commented Aug 23, 2017

Hi @adustm,

Code freeze for Mbed OS 5.5.6 is tomorrow (2017-08-24). Will a fix be ready by then? If not, could you please review #4934 ?

Thanks

@adustm
Member

adustm commented Aug 25, 2017

Hello @RonEld

Could you check whether HAL_CRYP_AESECB_Encrypt has failed on this device, and since mbedtls_aes_encrypt doesn't return error, the driver's error wasn't surfaced up?

No error was returned by HAL_CRYP_AESECB_Encrypt .

GCM selftest is also fine (tested with both master branch and mbed-os-5.5 branch).
test case: 'mbedtls_gcm_self_test' ........................................................... OK in 2.58 sec

Would you like to suggest another test ?
Kind regards
Armelle

@adustm
Member

adustm commented Aug 25, 2017

I have modified gcm.c so that it can test 2 ctx instances in parallel, and it's all OK. It looks like the AES hardware handles saving and restoring the context correctly.

Would someone have a multi-threaded AES example that I could work on?

Kind regards
Armelle

@RonEld
Contributor

RonEld commented Aug 27, 2017

Hi @adustm
The alert message that is received is MBEDTLS_SSL_ALERT_MSG_BAD_RECORD_MAC, so I am quite positive that the GCM result is not what the server expects. Probably the key used on both sides is different. Since GCM uses AES, I would focus on the AES part, as you are doing.
I think your direction on multi-threading is correct.

Regards,
Ron

@adustm
Member

adustm commented Aug 28, 2017

Hello @RonEld
I have rewritten the gcm_selftest in order to launch 5 GCM threads in parallel (see the attached main.txt file; rename it to main.cpp if you want to test it).

main.txt

It's all OK.

+-----------------------+---------------+----------------------+--------+--------------------+-------------+
| target                | platform_name | test suite           | result | elapsed_time (sec) | copy_method |
+-----------------------+---------------+----------------------+--------+--------------------+-------------+
| NUCLEO_F439ZI-GCC_ARM | NUCLEO_F439ZI | tests-mbedtls-thread | OK     | 19.89              | shell       |
+-----------------------+---------------+----------------------+--------+--------------------+-------------+

Any other idea ?

@adustm
Member

adustm commented Aug 28, 2017

@JanneKiiskila could I access your program to test it ?

@RonEld
Contributor

RonEld commented Aug 28, 2017

Hi @adustm, at the moment my best guess is that some preemption caused the HW to load a different key. Perhaps it only shows up in a combined GCM + AES multi-threading scenario.
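
This hypothesis can be illustrated with a toy model in which one hardware key register is shared by several software contexts. It is only a sketch: the XOR "cipher", `hw_key_reg`, and the `demo_*` functions are invented for illustration and are unrelated to the actual STM32 CRYP driver or to any eventual fix:

```c
#include <stdint.h>

static uint8_t hw_key_reg;   /* single shared "hardware" key register */

struct demo_ctx {
    uint8_t key;             /* software copy of this context's key */
};

/* Stores the key in the context and loads it into the shared register. */
static void demo_setkey(struct demo_ctx *ctx, uint8_t key)
{
    ctx->key = key;
    hw_key_reg = key;
}

/* Buggy: trusts that the hardware still holds this context's key. */
static uint8_t demo_encrypt_stale(const struct demo_ctx *ctx, uint8_t block)
{
    (void)ctx;
    return (uint8_t)(block ^ hw_key_reg);
}

/* Safer: reloads the context's key before every operation. */
static uint8_t demo_encrypt_reload(const struct demo_ctx *ctx, uint8_t block)
{
    hw_key_reg = ctx->key;
    return (uint8_t)(block ^ hw_key_reg);
}
```

If context A sets key 0x11 and context B then sets key 0x22, `demo_encrypt_stale(&a, 0x00)` returns 0x22 (B's key was used), while `demo_encrypt_reload(&a, 0x00)` returns 0x11. In a real multi-threaded system, the reload and the operation would also need to be protected by a mutex or similar, so that another thread cannot change the register in between.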

@JanneKiiskila
Contributor

JanneKiiskila commented Aug 28, 2017

Hi,

@adustm - I know STM is a member of mbed Cloud Partners, you have access to these repositories which contain the SW we are running.

Email was sent with a bit more details.

@JanneKiiskila
Contributor

Can we raise this to blocker, please.

@0xc0170
Contributor

0xc0170 commented Aug 31, 2017

Can we raise this to blocker, please.

The fix will get CI once CI is back running.

@RobMeades
Contributor

The fix will get CI once CI is back running.

@0xc0170 can you add a reference here to the fixing PR?

@JanneKiiskila
Contributor

I think this is the PR: #4934

@RobMeades
Contributor

That's not the fix, though, that's a workaround. I guess ST, maybe @adustm, is still fighting the problem?

@adustm
Member

adustm commented Sep 5, 2017

Hello,
The fix is finally here, in PR #5018 (with explanations).
Kind regards
Armelle

adbridge pushed a commit that referenced this issue Sep 12, 2017
STM32F439xI-family AES hardware acceleration occasionally produces
incorrect output (#4928).

Don't enable AES HW acceleration on STM32F439xI-family targets by
default until issue #4928 is fixed.
adbridge pushed a commit that referenced this issue Sep 13, 2017
STM32F439xI-family AES hardware acceleration occasionally produces
incorrect output (#4928).

Don't enable AES HW acceleration on STM32F439xI-family targets by
default until issue #4928 is fixed.