Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIL testing issues on different targets #1524

Open
13 of 24 tasks
SergioGasquez opened this issue Apr 29, 2024 · 18 comments
Open
13 of 24 tasks

HIL testing issues on different targets #1524

SergioGasquez opened this issue Apr 29, 2024 · 18 comments
Assignees
Labels
bug Something isn't working tests Unit, Integration, or Hardware-in-Loop Testing
Milestone

Comments

@SergioGasquez
Copy link
Member

SergioGasquez commented Apr 29, 2024

Failing/Disabled Tests

Solved issues

@jessebraham jessebraham added bug Something isn't working tests Unit, Integration, or Hardware-in-Loop Testing labels Apr 29, 2024
@MabezDev
Copy link
Member

There are FIFO access issues on the S2 with uart, maybe there is something similar here?

@SergioGasquez
Copy link
Member Author

There are FIFO access issues on the S2 with uart, maybe there is something similar here?

They could definitely be related, is there a tracking issue for that?

@jessebraham
Copy link
Member

There have been a string of failures for the ESP32-S3, so to keep things moving I have removed the requirement for this check from the merge queue configuration. Once the issue has been resolved we will have to re-add this requirement.

@SergioGasquez
Copy link
Member Author

There have been a string of failures for the ESP32-S3, so to keep things moving I have removed the requirement for this check from the merge queue configuration. Once the issue has been resolved we will have to re-add this requirement.

Did some testing locally and, looks like spi_full_duplex_dma tests fails in S3 after #1859, the following changes makes the test pass in main:

❯ git diff
error: cannot run less: No such file or directory
diff --git a/esp-hal/src/spi/master.rs b/esp-hal/src/spi/master.rs
index e6f407a6..a006fc63 100644
--- a/esp-hal/src/spi/master.rs
+++ b/esp-hal/src/spi/master.rs
@@ -2036,7 +2036,6 @@ where
         self.update();
 
         reset_dma_before_load_dma_dscr(reg_block);
-        self.clear_dma_interrupts();
         tx_chain.fill_for_tx(false, write_buffer_ptr, write_buffer_len)?;
         tx.prepare_transfer_without_start(self.dma_peripheral(), tx_chain)
             .and_then(|_| tx.start_transfer())?;
@@ -2044,6 +2043,7 @@ where
         rx.prepare_transfer_without_start(self.dma_peripheral(), rx_chain)
             .and_then(|_| rx.start_transfer())?;
 
+        self.clear_dma_interrupts();
         reset_dma_before_usr_cmd(reg_block);
 
         reg_block.cmd().modify(|_, w| w.usr().set_bit());

@Dominaezzz
Copy link
Contributor

That's odd, when I run it on my S3 it works just fine.

What the output when you run spi_full_duplex_dma?

@SergioGasquez
Copy link
Member Author

SergioGasquez commented Jul 29, 2024

Here is the full log:

Full log
     Running tests/spi_full_duplex_dma.rs (target/xtensa-esp32s3-none-elf/release/deps/spi_full_duplex_dma-e539faf5a49e6c4d)
      Erasing ✔ [00:00:00] [#########################################################################################################################################] 192.00 KiB/192.00 KiB @ 323.19 KiB/s (eta 0s )
  Programming ✔ [00:00:00] [############################################################################################################################################] 45.08 KiB/45.08 KiB @ 59.17 KiB/s (eta 0s )    Finished in 1.368s

running 6 tests
test tests::test_symmetric_dma_transfer             ... ok
test tests::test_asymmetric_dma_transfer            ... ok
test tests::test_symmetric_dma_transfer_huge_buffer ... ERROR ====================== PANIC ======================
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR panicked at /home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/defmt-0.3.8/src/lib.rs:367:5:
explicit panic
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR Backtrace:
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x42005816
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x4200580a
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x4200332a
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x420025ca
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x42001aee
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x42001361
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x4200259f
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x42005c5e
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x420024fa
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
ERROR 0x420081a2
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25  
Frame 0: syscall_readonly @ 0x4200746d inline
       /home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/sys/arm_compat/syscall/xtensa.rs:29:9
Frame 1: sys_exit_extended @ 0x000000004200746d inline
       /home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/sys/arm_compat/mod.rs:196:9
Frame 2: exit @ 0x0000000042007461 inline
       /home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/sys/arm_compat/mod.rs:165:5
Frame 3: exit @ 0x0000000042007461
       /home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/process.rs:14:5
Frame 4: <unknown function @ 0x82007490> @ 0x82007490
FAILED
test tests::test_try_using_non_dma_memory_tx_buffer ... ok
test tests::test_try_using_non_dma_memory_rx_buffer ... ok
test tests::test_symmetric_dma_transfer_owned       ... ok

failures:

---- tests::test_symmetric_dma_transfer_huge_buffer ----
Test should Pass but it did Panic


failures:
    tests::test_symmetric_dma_transfer_huge_buffer

test result: FAILED. 5 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.53s

Error: Some tests failed
error: test failed, to rerun pass `--test spi_full_duplex_dma`

Caused by:
  process didn't exit successfully: `probe-rs run /home/sergio/Documents/Espressif/esp-rs/esp-hal/hil-test/target/xtensa-esp32s3-none-elf/release/deps/spi_full_duplex_dma-e539faf5a49e6c4d --chip esp32s3 -Zbuild-std=core,alloc` (exit status: 1)

The error occurs both when running the whole test suite and only the spi_full_duplex_dma test. I'm running espflash erase-flash before running the tests and I'm using a ESP32-S3-DevKitC-1

@Dominaezzz
Copy link
Contributor

Ah the stack doesn't show where the panic happened. I may have some time this evening to look further.

It's rather odd that this only fails on the S3 but only sometimes, and undoing it breaks async spi on the Esp32

@Dominaezzz
Copy link
Contributor

I was able to reproduce the failing HIL test and have done a bit of digging. (sorry for the wall of text, I wasn't planning on writing this much 😬 )

The quick fix for the HIL tests is to do PeripheralClockControl::reset(Peripheral::Gdma); right after the PeripheralClockControl::enable(Peripheral::Gdma); in the Dma::new implementation.
(This should probably be done in all drivers in general)

This was hinting that there was some kind of lingering broken state in the SPI/DMA driver.
Similar to what you found in this PR @SergioGasquez, it appears that even erasing the flash isn't enough to reset this broken state.
Without the PeripheralClockControl::reset(Peripheral::Gdma);, I needed to flash the ESP32-S3 with unrelated code, power cycle (unplug and replug USB) and then rerun the tests.
I don't know enough about ESP32 resets/hardware to make a suggestion yet but whatever reset procedure the HIL tests are using isn't cutting it. I'm able to observe lingering interrupt bits from previous runs.
(This may be the cause for some of the issues in the first comment)

I would've made a PR for the quick fix but it's not quite appropriate. Even if we make the HIL tests pass by resetting the GDMA peripheral, the reality is users don't have this luxury. Once they enter the broken state the spi driver stops working as expected.

I've narrowed down the broken state to be coming from test_asymmetric_dma_transfer. Removing that test makes everything run fine (even without PeripheralClockControl::reset(Peripheral::Gdma);).
My current hypothesis for why it leaves the DMA in a broken state is based on the fact that the SPI peripheral doesn't support asymmetric transfers.
The test tries to send 4 bytes and receives 2 bytes. The driver instructs the hardware to transfer 4 bytes. Unfortunately the SPI reads 4 bytes (and emits an EOF) and RX half of the DMA channel is only expecting 2 bytes, which means it's unable to receive the EOF. I know this because the SPI_DMA_INFIFO_FULL_ERR_INT_RAW bit is high.
I'm not certain why this puts the DMA channel in this weird state but I reckon it has something to do with the asymmetric transfer.

My current thinking for a proper solution is to remove support for asymmetric transfers in the base driver. The hardware doesn't support it directly. An asymmetric transfer should be implemented as two transfers, one full duplex symmetric and one half duplex. i.e. Transfer + Write or Transfer + Read. (This also makes error handling a bit better since asymmetric transfers trigger the error interrupts)

Thoughts?

@bjoernQ
Copy link
Contributor

bjoernQ commented Jul 30, 2024

One thing I discovered while looking into another (unrelated) future HIL test PR: apparently JTAG reset (which is done by the HIL tests) doesn't reset peripherals on all chips (it does on e.g. ESP32-C6/ESP32-C3 but doesn't on ESP32-S3 - probably it's different for Xtensa and RISC-V based chips).

The quick fix for the HIL tests is to do PeripheralClockControl::reset(Peripheral::Gdma); right after the PeripheralClockControl::enable(Peripheral::Gdma); in the Dma::new implementation.
(This should probably be done in all drivers in general)

I added the reset also for I2C which tends to get into weird states on errors. I agree we probably should do it for all drivers - I'll open an issue for that

@bugadani
Copy link
Contributor

probe-rs doesn't yet do a proper system reset for Xtensa targets, that can explain why peripherals aren't reset. It's coming, but I need someone to merge changes over there to progress :)

@SergioGasquez
Copy link
Member Author

Just updated the initial list, only two issues left!

@bugadani
Copy link
Contributor

Don't worry @SergioGasquez we'll have more once ESP32 is on line 🙃

@bugadani
Copy link
Contributor

ESP32-C3/ESP32-C2: uart::test_send_receive_different_baud_rates_and_clock_sources
When using the RcFast clock source, the code just hangs in https://github.com/esp-rs/esp-hal/blob/main/esp-hal/src/uart.rs#L1108 and then, the test times out. The reg_update should: Software write 1 would synchronize registers into UART Core clock domain and would be cleared by hardware after synchronization is done. but looks like its never cleared because it's not synchronized.

The issue is, RcFast is not running. Fixed in #2170

@bugadani
Copy link
Contributor

ESP32-S2: #2098

The main issue is PDMA writing more data than expected. If the buffer's size is not 1 mod 4, PDMA will transfer a whole word, regardless of how many bytes we asked for.

"Fixed" by #2179, by ensuring that DMA buffers are always a) aligned and b) a multiple of 4 bytes, regardless of what the user specified.

@bugadani
Copy link
Contributor

ESP32: spi_full_duplex_dma_async and spi_full_duplex_dma_pcnt

I'm not sure what's wrong here, the tests are enabled and passing on ESP32

@JurajSadel
Copy link
Contributor

ESP32: spi_full_duplex_dma_async and spi_full_duplex_dma_pcnt

I'm not sure what's wrong here, the tests are enabled and passing on ESP32

They were fix with #2131, I will tick them.

@bugadani
Copy link
Contributor

I2S on ESP32-S2 just needed enabling, done in #2181.

@bugadani
Copy link
Contributor

I2S on ESP32 generates two wrong clock pulses. This made the receiver reject the first sample pair, causing the test to fail. Issue worked around in #2194 by assigning the first halfword to the right channel and by outputting the right channel first.

Receivers may still get freaked out by the 2 extra clock pulses they encounter before the first sample, but I'm not sure what to do about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tests Unit, Integration, or Hardware-in-Loop Testing
Projects
Status: Todo
Development

No branches or pull requests

8 participants