gpu-operator creates ci using mig Insufficient Resources #29

Open
asskss opened this issue Dec 5, 2023 · 3 comments

Comments

asskss commented Dec 5, 2023

mig-config.yaml

    mig-configs:
      custom-config: 
        - devices: [0]
          mig-enabled: false
        - devices: [1]     
          mig-enabled: true   
          mig-devices:
            "7g.80gb": 1    
        - devices: [2]
          mig-enabled: true
          mig-devices:
            "2g.20gb": 3
        - devices: [3]      
          mig-enabled: true 
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1    
        - devices: [4]      
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1    
        - devices: [5]      
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1
        - devices: [6]
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1 
        - devices: [7]
          mig-enabled: true
          mig-devices:
            "1g.10gb": 1
            "2g.20gb": 1
            "4g.40gb": 1 

nvidia-smi mig -lcip -gi 0

+--------------------------------------------------------------------------------------+
| Compute instance profiles:                                                           |
| GPU     GPU       Name             Profile  Instances   Exclusive       Shared       |
|       Instance                       ID     Free/Total     SM       DEC   ENC   OFA  |
|         ID                                                          CE    JPEG       |
|======================================================================================|
|   0      0       MIG 1c.7g.80gb       0      0/7           14        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 2c.7g.80gb       1      0/3           28        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 3c.7g.80gb       2      0/2           42        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 4c.7g.80gb       3      0/1           56        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 7g.80gb          4*     0/1           98        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 1c.7g.80gb       0      0/7           14        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 2c.7g.80gb       1      0/3           28        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 3c.7g.80gb       2      0/2           42        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 4c.7g.80gb       3      0/1           56        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 7g.80gb          4*     0/1           98        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+

nvidia-smi mig -lci -gi 0

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      0       MIG 7g.80gb          4         0          0:7     |
+--------------------------------------------------------------------+
|   1      0       MIG 7g.80gb          4         0          0:7     |
+--------------------------------------------------------------------+

nvidia-smi mig -cci 2 -gi 0

Unable to create a compute instance on GPU  0 GPU instance ID  0 using profile 2: Insufficient Resources
Failed to create compute instances: Insufficient Resources

I want to create a MIG 3c.7g.80gb compute instance, but the command fails with "Insufficient Resources". How can I solve this?

Member

elezar commented Dec 5, 2023

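Note that, as the nvidia-smi mig -lci output shows, both GPUs already have [7c.]7g.80gb partitions created on them. Each of these has placement 0:7, so it occupies all seven compute slices of its GPU instance, which is why every profile is listed with 0 free instances. This means that the ADDITIONAL 3c.7g.80gb partition cannot be created.

Also, since you mention the GPU Operator: partitions where c != g are not currently supported there, but they should be supported in mig-parted.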

Could you give more details on your use case?
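
The point about the existing 7g.80gb partition can be illustrated with simple slice accounting. The following is a minimal sketch, not how the driver actually decides: it assumes a compute instance of profile Nc needs N free compute slices inside a 7-slice GPU instance and ignores any alignment rules. The helper names `used_slices` and `free_instances` are hypothetical and exist only for this illustration.

```python
# Simplified slice accounting for one GPU instance (GI).
# Assumption: an Nc compute instance (CI) needs N free compute slices;
# real placement and alignment rules are ignored.
GI_SLICES = 7  # a 7g.80gb GPU instance spans 7 compute slices


def used_slices(placements):
    """Sum the sizes of existing CIs given as (start, size) placements."""
    return sum(size for _start, size in placements)


def free_instances(profile_slices, placements, gi_slices=GI_SLICES):
    """Return how many more CIs of the given size would still fit."""
    free = gi_slices - used_slices(placements)
    return max(free, 0) // profile_slices


existing = [(0, 7)]  # the 7g.80gb CI with placement 0:7 from `-lci`
print(free_instances(3, existing))  # 3c.7g.80gb with the full-size CI present -> 0
print(free_instances(3, []))        # 3c.7g.80gb once that CI is gone -> 2
```

Under this model the free counts match the "Free/Total" column in the profile listing above (0/2 for profile 2 while the 7g.80gb compute instance exists, 2 available once it is gone).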

Author

asskss commented Dec 11, 2023

Thanks. My use case is Kubernetes.

  1. What I need: I would like to deploy two models side by side on one GPU, one on a 4c.7g.80gb compute instance and the other on a 3c.7g.80gb compute instance. Do the compute instances share GPU memory in this mode?
  2. If two models are deployed on one card this way, are they scheduled fairly, or can they preempt each other freely when using CUDA?
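
For reference, the manual route to a 4c.7g.80gb plus 3c.7g.80gb split could look roughly like the sketch below. It only builds and prints the nvidia-smi commands rather than running them. The profile IDs 3 (4c.7g.80gb) and 2 (3c.7g.80gb) are taken from the profile listing earlier in this issue; the helper name `ci_swap_commands` is hypothetical. It assumes nothing is running on the existing compute instance, that the commands are run with sufficient privileges on the node, and that this is done outside the GPU Operator, which does not currently support c != g.

```python
import shlex


def ci_swap_commands(gpu, gi, old_ci, new_profiles):
    """Build (but do not run) commands that replace one CI with others."""
    profiles = ",".join(str(p) for p in new_profiles)
    target = ["-i", str(gpu), "-gi", str(gi)]
    return [
        # Destroy the existing compute instance so its slices are freed.
        ["nvidia-smi", "mig", "-dci", *target, "-ci", str(old_ci)],
        # Create the new compute instances by profile ID.
        ["nvidia-smi", "mig", "-cci", profiles, *target],
        # List compute instances to check the result.
        ["nvidia-smi", "mig", "-lci", *target],
    ]


# GPU 0, GPU instance 0, existing CI 0; profiles 3 (4c) and 2 (3c).
for cmd in ci_swap_commands(gpu=0, gi=0, old_ci=0, new_profiles=[3, 2]):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything on a node that is serving workloads.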

Author

asskss commented Dec 11, 2023

I also found this forum thread, but I am not sure whether it is relevant:
https://forums.developer.nvidia.com/t/error-creating-cis-with-mig-on-nvidia-a30/241955
