This repository has been archived by the owner on Jun 9, 2023. It is now read-only.

When GitHub API crashes, it does not save any extracted data in srcopsmetrics #573

Open
suppathak opened this issue Jun 2, 2022 · 7 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/user-experience Issues or PRs related to the User Experience of our Services, Tools, and Libraries. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@suppathak

suppathak commented Jun 2, 2022

Bug description

While extracting PR information with srcopsmetrics, if the GitHub API crashes partway through, the data extracted so far is not saved by srcopsmetrics, and the extraction has to start all over again:
!python -m srcopsmetrics.cli -clr $org/$repo -e PullRequest

Actual behavior

Extracted data gets lost whenever there is some interruption.

Screenshot from 2022-06-02 05-41-02

Expected behavior

The extracted information should be saved as a JSON file in the srcopsmetrics folder.

Screenshot from 2022-06-02 05-32-35

Environment information

srcopsmetrics = "*"
pydriller = "*"
@suppathak suppathak added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/...` label and requires one. labels Jun 2, 2022
@goern
Member

goern commented Jun 22, 2022

/assign @xtuchyna
/priority important-soon
/triage accepted

@sesheta sesheta added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jun 22, 2022
@goern goern removed the needs-triage Indicates an issue or PR lacks a `triage/...` label and requires one. label Jun 22, 2022
@xtuchyna
Member

Hey @suppathak, the data should be saved whenever you terminate mi while the GitHub API is still working (i.e. the API rate limit has not been exceeded).

But I think what you mean is that mi does not save the data when you terminate it while it is waiting for the GitHub API rate limit to be refreshed, as in this screenshot:
Screenshot 2022-06-23 at 14-08-14

Is that right?

@suppathak
Author

Hey @xtuchyna,
I experienced this issue while extracting data from a large repo with ~20000 pull requests. I ran into both cases:

  • The GitHub API limit is reached, so the program pauses and waits. If the server breaks down during this time, or something else terminates the code, the data extracted so far is not saved automatically.
  • The GitHub API limit is not reached yet and the program is still running. If something stops the running program during this time, the data extracted so far is not saved automatically either.

In short, I experience this in all cases, except when the program actually finishes extracting all the PRs from a repo.

However, I am now extracting data from a relatively small repo (~10 PRs), and here the code works properly: even if the code terminates partway through, the data extracted so far is saved in the srcopsmetrics folder as a JSON file.
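
For the first case above (termination while the program waits for the rate limit to reset), one possible approach is to flush the collected data before the wait starts. The following is only a sketch under assumptions: `wait_for_rate_limit_reset`, `save`, `reset_at` and `step` are hypothetical names for illustration, not part of srcopsmetrics.

```python
import time
from typing import Callable


def wait_for_rate_limit_reset(
    reset_at: float,
    save: Callable[[], None],
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
    step: float = 30.0,
) -> None:
    """Persist collected data, then sleep until `reset_at` (epoch seconds).

    Hypothetical helper: `save` stands for whatever writes the data
    collected so far to disk.
    """
    # Flush first, so a termination during the wait loses nothing.
    save()
    while True:
        remaining = reset_at - now()
        if remaining <= 0:
            return
        # Sleep in short slices so the process stays responsive to signals.
        sleep(min(step, remaining))
```

Because the save happens before the first sleep, killing the process at any point during the wait would leave the last flushed state on disk.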

@xtuchyna
Member

Thanks for the clarification!
I am going to inspect the issue. It seems there could also be some incremental data saving process behind the extraction, so that in the worst cases (e.g. device shutdown or SIGTERM) there would be a backup to use.
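
A minimal sketch of what such incremental saving could look like, assuming a plain loop over items. It is not the srcopsmetrics implementation: `save_atomically`, `extract_with_checkpoints`, `fetch` and `every` are hypothetical names. The periodic checkpoint covers abrupt failures such as a device shutdown, and the SIGTERM handler turns the signal into an exception so that the final save still runs.

```python
import json
import os
import signal
import tempfile
import threading
from pathlib import Path


def save_atomically(data: dict, path: Path) -> None:
    """Write JSON to a temp file, then rename it over the target,
    so a crash mid-write never leaves a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(data, tmp)
    os.replace(tmp_name, path)


def extract_with_checkpoints(items, fetch, path: Path, every: int = 100) -> dict:
    """Hypothetical extraction loop that checkpoints every `every` items."""
    results = {}

    def on_sigterm(signum, frame):
        # Convert SIGTERM into an exception so the finally block runs.
        raise SystemExit(1)

    # Signal handlers can only be installed from the main thread.
    in_main = threading.current_thread() is threading.main_thread()
    previous = signal.signal(signal.SIGTERM, on_sigterm) if in_main else None
    try:
        for count, item in enumerate(items, start=1):
            results[str(item)] = fetch(item)
            if count % every == 0:
                save_atomically(results, path)
    finally:
        # Runs on normal exit, on exceptions, on Ctrl+C and on SIGTERM.
        save_atomically(results, path)
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return results
```

SIGKILL and power loss cannot be intercepted, so in those cases at most the last `every` items would be lost.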

@sesheta
Member

sesheta commented Sep 21, 2022

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 21, 2022
@goern goern removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2022
@xtuchyna
Member

I could not reproduce the issue. It seems that either it was fixed by some other PR or I have a different setup.

@suppathak I guess you used the default settings and did not set any envvars differently, correct?

@oindrillac

@xtuchyna I am having the same issue. I am using srcopsmetrics in a GitHub Actions workflow job, which for large repositories with many PRs gets terminated on its own after 6 hours of running. This is because GitHub Actions limits jobs to 6 hours, and once the job is terminated the collected data does not get saved to S3.

Screen Shot 2022-11-08 at 11 44 18 AM

As described in aicoe-aiops/ocp-ci-analysis#603 (comment), it would be great if srcopsmetrics, when run in the default (save to S3) mode, could upload the intermediate data to S3 as well, so that I could split these longer jobs into multiple jobs and not waste the data collected by the first job.
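
The split-into-multiple-jobs idea could be sketched roughly as follows. This is an illustration under assumptions, not existing srcopsmetrics behavior: `collect_within_budget`, `fetch`, `upload` and `budget_seconds` are hypothetical names, and `upload` is any callable that ships the checkpoint file somewhere (for S3 it might wrap boto3's `upload_file`). Each job resumes from the checkpoint, stops before its time budget runs out, and uploads what it has.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable


def collect_within_budget(
    ids: Iterable,
    fetch: Callable,
    checkpoint: Path,
    upload: Callable[[Path], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if every id is collected, False if the budget ran out."""
    # Resume from whatever an earlier job already collected.
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    deadline = clock() + budget_seconds
    finished = True
    try:
        for item in ids:
            key = str(item)
            if key in done:
                continue  # already collected by a previous job
            if clock() >= deadline:
                finished = False
                break
            done[key] = fetch(item)
    finally:
        # Save and upload the intermediate data even if fetch raised.
        checkpoint.write_text(json.dumps(done))
        upload(checkpoint)
    return finished
```

With a budget set below the 6 hour limit, a follow-up job could call the same function again with the downloaded checkpoint and continue where the first one stopped.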

@codificat codificat added the sig/user-experience Issues or PRs related to the User Experience of our Services, Tools, and Libraries. label Jan 30, 2023
6 participants