Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ways to import PrometheusConnect without also importing **huge** pandas and matplotlib #225

Open
buttonfly1000 opened this issue Nov 4, 2021 · 11 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. todo 🗒️

Comments

@buttonfly1000
Copy link

Is your feature request related to a problem? Please describe.

I found this simple import

from prometheus_api_client import PrometheusConnect

not only import PrometheusConnect itself, but also pandas and possibly matplotlib, which take about 50MB more unnecessary memory when I don't want to use DataFrames and plot them.

Is there any way to only import PrometheusConnect without also importing huge pandas and matplotlib?

@4n4nd 4n4nd added good first issue Good for newcomers help wanted Extra attention is needed todo 🗒️ labels Dec 14, 2021
@chauhankaranraj
Copy link
Collaborator

not only import PrometheusConnect itself, but also pandas and possibly matplotlib, which take about 50MB more unnecessary memory when I don't want to use DataFrames and plot them.

Hi @thetaprimeprime, that's an great observation! I did some memory profiling and can confirm that the additional pandas and matplotlib imports do indeed increase the memory usage by about ~45MB.

Is there any way to only import PrometheusConnect without also importing huge pandas and matplotlib?

At the moment, I don't think so. But I believe this would be a nice and welcome improvement 😃 Is this something you'd like to work on, or would you rather someone from our team do it?

One way to accomplish this could be to refactor this python module into submodules, something like this:

prometheus_api_client
├── core
│   ├── __init__.py
│   └── prometheus_connect.py
├── exceptions
│   ├── base_exception.py
│   └── __init__.py
├── __init__.py --> only import core.* here, to avoid importing mpl, pandas
├── parsers
│   ├── __init__.py
│   ├── metric.py
│   ├── metric_range_df.py
│   ├── metrics_list.py
│   └── metric_snapshot_df.py
└── utils
    ├── datetime_utils.py
    ├── __init__.py
    └── print_utils.py

Just a suggestion off the top of my head, we should explore other ideas as well.

/cc @4n4nd

@sesheta
Copy link
Contributor

sesheta commented Mar 23, 2022

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 23, 2022
@sesheta
Copy link
Contributor

sesheta commented Apr 22, 2022

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@sesheta sesheta added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 22, 2022
@sesheta
Copy link
Contributor

sesheta commented May 22, 2022

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@sesheta sesheta closed this as completed May 22, 2022
@sesheta
Copy link
Contributor

sesheta commented May 22, 2022

@sesheta: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@raqbit
Copy link

raqbit commented Mar 28, 2023

This is also making it hard to use prometheus-api-client-python in alpine-based docker images, as both matplotlib & pandas need to build wheels.

@4n4nd 4n4nd reopened this Mar 28, 2023
@4n4nd
Copy link
Owner

4n4nd commented Mar 28, 2023

@raqbit do you have any suggestions on how we could improve this?

@raqbit
Copy link

raqbit commented Mar 28, 2023

@raqbit do you have any suggestions on how we could improve this?

Any way that the pandas & matplotlib dependencies can be avoided with a 'non-development' install would help with this. I'm not very familiar with how this is usually done, but I think setuptools optional dependencies will do the job.

@chauhankaranraj
Copy link
Collaborator

This is also making it hard to use prometheus-api-client-python in alpine-based docker images, as both mathplotlib & pandas need to build wheels.

Hi @raqbit, could you please describe your issue in a bit more detail i.e. why is hard to use in alpine based container images? Are you unable to install these dependencies in the container? Or do they take a longer time to install? Or do they bloat the image?

Basically, I think OP’s issue here is with slowness of the imports at runtime (which imo can be easily solved by a restructure). Whereas your issue sounds a bit more about installation? (which can't be solved by restructure, and is a bit more involved). So just wanted to get some clarity and prioritize accordingly :)

@raqbit
Copy link

raqbit commented Mar 31, 2023

This is also making it hard to use prometheus-api-client-python in alpine-based docker images, as both mathplotlib & pandas need to build wheels.

Hi @raqbit, could you please describe your issue in a bit more detail i.e. why is hard to use in alpine based container images? Are you unable to install these dependencies in the container? Or do they take a longer time to install? Or do they bloat the image?

Basically, I think OP’s issue here is with slowness of the imports at runtime (which imo can be easily solved by a restructure). Whereas your issue sounds a bit more about installation? (which can't be solved by restructure, and is a bit more involved). So just wanted to get some clarity and prioritize accordingly :)

Yes. Both pandas and matplotlib (Plus, numpy which is pulled in by pandas) have native components written in C. Normally this means that the Python Wheel (pre-compiled binary component) is downloaded and all is well (except for the longer import times mentioned by OP). For Alpine Linux, however, there is no such wheel available as Alpine uses the musl C standard library implementation instead of the more common glibc. This, in turn, will cause Pip to try to compile the C code for these native components during the installation process as it is unable to find suitable wheels in the package from pypi.

This requires a complete c compiler toolchain to be available, and in my case, was taking 20+ minutes to complete (minutes of compiling C-code with all available cores, making the computer unusable for other tasks).

The bloat of the compiler toolchain can be avoided by uninstalling them after running pip install, but the compilation step cannot be avoided without using a different container base-image such as debian.

@chauhankaranraj
Copy link
Collaborator

Yes. Both pandas and mathplotlib (Plus, numpy which is pulled in by pandas) have native components written in C. Normally this means that the Python Wheel (pre-compiled binary component) is downloaded and all is well (except for the longer import times mentioned by OP). For Alpine Linux, however, there is no such wheel available as Alpine uses the musl C standard library implementation instead of the more common glibc. This, in turn, will cause Pip to try to compile the C code for these native components during the installation process as it is unable to find suitable wheels in the package from pypi.

This requires a complete c compiler toolchain to be available, and in my case, was taking 20+ minutes to complete (minutes of compiling C-code with all available cores, making the computer unusable for other tasks).

The bloat of the compiler toolchain can be avoided by uninstalling them after running pip install, but the compilation step cannot be avoided without using a different container base-image such as debian.

Got it, thanks for the details! In theory yes we could set up installation so that pip install prometheus-api-client-python just installs the "core" library components (PrometheusConnect). And pandas and matplotlib can be listed as dependencies in extras_require to be installed with something like pip install prometheus-api-client-python[full], installing the other componenets (Metric, MetricSnapshotDataFrame, etc). Note that numpy is still required by PrometheusConnect, so it'd require additional changes to move that to extras_require.

However, wouldn't doing so potentially break installations for existing users? I'm in favor of this change if it causes minimal disruption for users while maximizing the benefit. Or if we can map out a rollout procedure to ensure so. But I currently don't have any supporting artifacts to form an opinion on this. Maybe @4n4nd can weigh in here as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. todo 🗒️
Projects
None yet
Development

No branches or pull requests

5 participants