Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 #332

Draft
wants to merge 8 commits into
base: msm8916/6.6-rc5
Choose a base branch
from

Conversation

chr-chr
Copy link

@chr-chr chr-chr commented Oct 15, 2023

Hi,

My aim was simply to pretty up the gpu frequencies table. That raised bugs.

  • update MSM8916 gpu frequency table (dtsi)
  • fix msm_gpu PM clk handling
  • fix gcc-msm8916 clk
  • fix a3xx gpu busy counter
  • new msm_gpu_devfreq load meter (simple avg)

Need more testing on other devices. May uncover or fix bugs on other devices:

adreno 1c00000.gpu: _opp_config_clk_single: failed to set clock rate: -22 EINVAL

<...>: rcg didn't update its configuration

Please check if your device gpu ever uses the between frequencies:

cd /sys/class/devfreq/1c00000.gpu
watch -d grep . *

Apply devfreq patch, for the load meter!

The gpu busy counter was broken for me, wrong reg.
I fixed it for a3xx
a4xx may be broken aswell?
a5xx ... pls test
a6xx ... pls test

Before I got:

available_frequencies:19200000 400000000
available_governors:userspace powersave performance simple_ondemand
cur_freq:400000000
governor:simple_ondemand
max_freq:400000000
min_freq:19200000
name:1c00000.gpu
polling_interval:50
target_freq:400000000
timer:delayed
trans_stat:     From  :   To
trans_stat:           :  19200000 400000000   time(ms)
trans_stat:   19200000:         0         2         0
trans_stat:* 400000000:         2         0    925683
trans_stat:Total transition : 4

after:

available_governors:userspace powersave performance simple_ondemand
cur_freq:200000000
governor:simple_ondemand
load:65.08 %
max_freq:465000000
min_freq:19200000
name:1c00000.gpu
polling_interval:50
target_freq:200000000
timer:delayed
trans_stat:     From  :   To
trans_stat:           :  19200000 100000000 200000000 310000000 400000000 465000000   time(ms
)
trans_stat:   19200000:         0         0         0         0         0         1       623
trans_stat:  100000000:         0         0         0         0         0         3      4036
trans_stat:* 200000000:         0         3         0         0         0         2     11750
trans_stat:  310000000:         0         0         6         0         0         0     22790
trans_stat:  400000000:         0         0         0         5         0         0     10583
trans_stat:  465000000:         0         0         0         1         5         0     19733
trans_stat:Total transition : 26

The overclocking is optional ;)

testing cheat sheet:

full fps test:
vblank_mode=1 glxgears -fullscreen

and in vsync mode we want 300 frames/5s 60pfs:
glxgears -fullscreen
works well with 200MHz only (X11).

posh/wayland wants 400MHz for 60fps
note about a possible frame drop to 30fps/200MHz in wayland:
fine tune in /sys/kernel/debug/dri/0/devfreq *)
On the other hand: power cons. dropps ~150mW (!)

*) downdifferential < upthreshold < 100 (%)

Setting bus speed "opp-peak-kBps" needs interconnect frame work, which is broken. I dropped the settings as comments.

@stephan-gh
Copy link
Member

  • AFAIK the 465 MHz GPU is basically only meant for MSM8916v2 (aka Snapdragon 412). Which is pretty much only used in the BQ Aquaris X5 (longcheer-l8910). I doubt you will find the speedbin set on any other device.
  • Can you clarify why the opp-suspend is necessary? I don't quite understand it yet.

@chr-chr
Copy link
Author

chr-chr commented Oct 16, 2023

  • AFAIK the 465 MHz GPU is basically only meant for MSM8916v2 (aka Snapdragon 412). Which is pretty much only used in the BQ Aquaris X5 (longcheer-l8910). I doubt you will find the speedbin set on any other device.

Well, at least it works. My devices shows up:

adreno 1c00000.gpu: speed-bin version: 1 value: 0

I was looking at an LineageOS-custom ROM I 'm using and I realized they are happily overclocking CPU and GPU .... wtf?
However my intention was to engage the in between values to get power savings. However, actually they don't get picked: gpu+devfreq/governor is broken. To test set max value.

* Can you clarify why the `opp-suspend` is necessary? I don't quite understand it yet.

It's a hint for devfreq_suspend_device / resume handlers, they will call then devfreq_set_target()

And I must hotfix that ... bug: with my patch the gpu may not ramp up at all from suspend

@stephan-gh
Copy link
Member

Note that implementing/extending frequency scaling does not necessarily result in power saving. It's always a balance between powering off [the GPU] completely when it is idle vs. keeping it on at a lower frequency. When [the GPU] is powered off (suspended), the frequency shouldn't matter because the clock should get disabled entirely. In powered off/suspended state it should also consume the least amount of power (ideally none).

This means there are two possibilities if you consider a lightly loaded system (with not much to do for the CPU or GPU):

  1. You can run the GPU at maximum frequency and try to finish the work quickly so that you can quickly power off the GPU entirely again.
  2. You can reduce the GPU frequency. This means the GPU will consume less power, but it will also take longer to finish the work.

It's hard to generalize which one of these is better. I would expect (2) is better if you have a lightly loaded system with frequent small periodic tasks, which makes it too expensive to power off the GPU inbetween. Otherwise (1) could also be better in many cases.

If you want to be sure that you are actually improving power consumption I would strongly recommend that you try to measure the power consumption with and without your changes. Otherwise you might end up making power consumption worse. :')

When the GPU is powered off (suspended), the frequency shouldn't matter because the clock should get disabled entirely.

BTW, given ^, I'm still not sure why you need the opp-suspend. The frequency of the GPU shouldn't matter if it gets powered off/suspended. I would expect that you are more likely to slow down the suspend/resume procedure which would waste power.

@chr-chr
Copy link
Author

chr-chr commented Oct 16, 2023

Hi, thanks for the comments

If you want to be sure that you are actually improving power consumption I would strongly recommend that you try to measure the power consumption with and without your changes. Otherwise you might end up making power consumption worse. :')

True, this is why I had to make brightness control working because it's consuming the most power:
Starting glxgears in fullscreen surprisingly dropped the power consumption ... which reminds me that's an OLED display :)

Actually lowering freq seems to reduce power consumption a bit: 200MHz are ok for 60fps glxgears (synced).
Still there's the option to switch the bus speed, that call is actually a nop because this interconnect framework isn't enabled .... and it's broken :/

I'm still not sure why you need the opp-suspend. The frequency of the GPU shouldn't matter if it gets powered off/suspended.

True, but even beforehand the clock was set without opp-suspend in suspend/resume handlers. So I just kept it that way. And the mistake was that setting the freq on a disabled clock locks up.

Remember, this is a draft, I'm still not 100% happy with that ;)

@chr-chr chr-chr changed the title SM-J5: gpu fixes a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 Oct 17, 2023
@chr-chr
Copy link
Author

chr-chr commented Oct 18, 2023

Good news!

The busy counter is broken and I found out why and got it fixed!
Pushing soon ... there's still just one issue ;)

Meanwhile my 2 cents on this:

  1. You can run the GPU at maximum frequency and try to finish the work quickly so that you can quickly power off the GPU entirely again.

Thats the way it's implemented.
Why? My assumtions:
No other way for powersaving! Why?
Because there were just 2 steps available: min and max. Why??
Well, I smell they were facing the same lockups and OOpses I got while trying ...

Also the governor on_demand wasn't made for this. To cheat around is ugly.
I'd use the passive instead or:

  1. You can reduce the GPU frequency. This means the GPU will consume less power, but it will also take longer to finish the work.

And I assume this will make a better power saving. My rought measures seems to prove -99mW.
Actually 1) turns /sys/class/power_supply/sm5703_fg/current_now into a great source of entropy.

Christoph Rudorff added 8 commits October 21, 2023 18:24
Signed-off-by: Christoph Rudorff <[email protected]>
On my device that calc. was always >100% load

got this from upstream, and a4xx may be bugged aswell

the /sys/kernel/debug/dri/0/perf does work ...

Signed-off-by: Christoph Rudorff <[email protected]>
Signed-off-by: Christoph Rudorff <[email protected]>
adreno 1c00000.gpu: _opp_config_clk_single: failed to set clock rate: -22 EINVAL
a3xx_set_supported_hw() set opp-supported-hw from fuse value,
defaults to 0x1 in case the value wasn't set.

Signed-off-by: Christoph Rudorff <[email protected]>
gfx3d_clk_src rcg didn't update its configuration.

set clk_enable before set_rate
set "suspend rate" before clk_disable

Signed-off-by: Christoph Rudorff <[email protected]>
for fine tuning:
/sys/kernel/debug/dri/0/devfreq/

Signed-off-by: Christoph Rudorff <[email protected]>
@chr-chr
Copy link
Author

chr-chr commented Oct 21, 2023

Ok, here is my state of the art, tested and works very well on my device, finally.

So, the impact is greater than I expected.
Looking at history, I'm not the first one making this more crazy. oO Am I?

All in all this reverts msm_gpu_devfreq pretty down to some "more cleverness", according to the logs. Am I?

@stephan-gh
Copy link
Member

Nice! I'm afraid at this point you would need to convince the upstream freedreno people about this though, since the changes are a bit too extensive to maintain downstream. There are some minor cosmetic things I could point out but aside from that I don't have enough experience with the Adreno GPU drivers to review this properly.

@chr-chr
Copy link
Author

chr-chr commented Oct 21, 2023

Nice! I'm afraid at this point you would need to convince the upstream freedreno people about this though

Well, most convincing would be having more positive test reports.

I do see power savings.

TravMurav pushed a commit that referenced this pull request Jul 17, 2024
Add a test case which replaces an active ingress qdisc while keeping the
miniq in-tact during the transition period to the new clsact qdisc.

  # ./vmtest.sh -- ./test_progs -t tc_link
  [...]
  ./test_progs -t tc_link
  [    3.412871] bpf_testmod: loading out-of-tree module taints kernel.
  [    3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
  #332     tc_links_after:OK
  #333     tc_links_append:OK
  #334     tc_links_basic:OK
  #335     tc_links_before:OK
  #336     tc_links_chain_classic:OK
  #337     tc_links_chain_mixed:OK
  #338     tc_links_dev_chain0:OK
  #339     tc_links_dev_cleanup:OK
  #340     tc_links_dev_mixed:OK
  #341     tc_links_ingress:OK
  #342     tc_links_invalid:OK
  #343     tc_links_prepend:OK
  #344     tc_links_replace:OK
  #345     tc_links_revision:OK
  Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants