Profile math operations running in C code #395

Open
npetersen2 opened this issue May 30, 2024 · 0 comments

Introduction

This issue documents @Known4225's AMDC platform onboarding project. This project will span C code firmware development, Python interface to AMDC, and creating a new page on the docs.amdc.dev website which summarizes your findings.

Goal: Quantify how long various math operations take to run on the AMDC real-time digital signal processor ("DSP").
Outcome: Report written in Markdown and published on the docs website under GETTING STARTED / User Guide / Math Operations.

Background

The AMDC is used for real-time control of motor drive systems: every X seconds, the AMDC samples various sensor inputs, performs some math on the sampled values, and then updates the PWM outputs based on the results. In the default firmware, the value of X is 100 microseconds, i.e., a control rate of 10 kHz. For this all to work correctly, the firmware must compute the required math operations in a short time, i.e., much less than 100 us.

The AMDC uses a PicoZed system-on-module for its "brains". This module has an AMD Xilinx Zynq-7000 system-on-chip as its main processor, which combines a dual-core DSP with an FPGA. The code which computes the math operations described above runs on the DSP. The DSP is a standard ARM Cortex-A9 core, which is a relatively powerful processor.

We are interested in understanding how long various math operations take to complete on the Cortex-A9 processor. For example, sin(), sqrt(), /, etc. Your job is to create a framework to measure this, gather the data, and report it in a new docs web page.

Method

I envision this project using 3 core pieces of the AMDC system:

  1. Command handler (and optional state machine) which actually computes the math operation and records time stats
  2. Python scripts which run various tests on the AMDC to collect data and make plots
  3. Markdown file in the website which presents the findings

Command Handler

To collect the timing data from the AMDC, I recommend a system as follows:

First, come up with a full collection of supported math operations to profile. This should ideally be all supported standard math functions, i.e., those from the <math.h> header, for example, see here or here.

Then, write a new command handler which allows the user to run the math function and record how long it takes. This should have the following command signature:

math <num_ops> <func> <args>

where <num_ops> is an integer which tells the code how many times to evaluate the function before returning the average run-time, <func> is the math function to use, and <args> are the arguments to the function.

Some examples:

math 50 sin 0 -- compute $sin(0)$ function 50 times and report the average run-time duration
math 10 atan 10 -- compute $atan(10)$ function 10 times and report ....
math 1 atan2 1 2 -- compute $atan2(1, 2)$ function 1 time and report ....
math 100 sqrt rand -- compute $sqrt()$ function 100 times, each with a random input
...

To implement this generally as described will require a somewhat "complex" command handler, but shouldn't be too hard.

To keep track of the run-time, I recommend something like the following (with drv/cpu_timer):

// N (number of evaluations) and in (the input value) come from the command arguments
uint32_t total_time_ticks = 0;

for (int i = 0; i < N; i++) {
  uint32_t t0 = cpu_timer_now();
  volatile double out = cos(in); // volatile so the compiler keeps the call
  uint32_t t1 = cpu_timer_now();
  total_time_ticks += t1 - t0;
}

double total_time_us = cpu_timer_ticks_to_usec(total_time_ticks);
double time_per_op_us = total_time_us / N;

You can also think about using the sys/statistics module to have more complete stats, like mean, max, min, std dev, etc.

A few notes:

  • Include test support for basic math: +, -, *, and /.
  • Some math ops require two input arguments.
  • Consider implementing this for different data types: double, float, and int. The AMDC DSP natively supports double precision, so I do not think float will be much faster than double, but it would be very interesting to find out.
  • You will probably need to make sure all inputs and outputs of the math are volatile so that the compiler actually performs the math. Make sure to do a sanity check at some point to ensure things are working as expected and the code is actually computing the right numbers.
  • Limit the total ops per command to a "small" number, like 100 or so, to ensure the total test time is short enough. This is needed because of the cooperative scheduler on the AMDC.

Python Data Collection

Now that the AMDC firmware has the handler to measure the run-time, automate the data collection using the Python host interface and a Jupyter notebook.

For example, collect all data automatically as:

import time

# amdc is assumed to be an already-connected AMDC host interface object
funcs_to_run = ["sin", "cos", "exp", "sqrt", "log", "pow", "floor"]

for func in funcs_to_run:
    # Run the test
    resp = amdc.cmd("math 20 %s rand" % func)
    print("Measured time:", resp[2])

    # Give AMDC a break between tests
    time.sleep(0.1)

Then, generate a plot of the findings.

Website report

Follow the instructions on the docs.amdc.dev repo to set up the Sphinx build system to build the website locally. Then, add a new page for the report of this work. @codecubepi or @npetersen2 can give you support on getting the docs website build system up and running.

Make the report read as a self-contained document that explains the purpose, background, and test procedure, and gives the results.

Present results in graphs whenever possible, rendered with matplotlib directly from the Jupyter notebook above. Include them as SVG files in the website (see other docs website pages for examples).

Bonus challenge: code acceleration

Using all your results, come up with a couple of complicated, slow math operations which can be accelerated by using a different code implementation. I can help you with this once you have the results for each math operation.

For example, one complicated math operation is to compute the normalized 2D cross-product of two vectors to find the angle error between them. This involves normalizing each vector to a length of 1 (while preserving its direction), and then computing the actual cross product. This is quite slow and can probably be sped up by using only "fast" math operations.

Another example is a 2D vector rotation, for example, written in complex notation, out = in * exp(j * theta). This will end up implemented as cos/sin ops and multiply/accumulates. What is the fastest way to write the code to do this?
