Add functionality to train TTM on multiple repos #603

Open
oindrillac opened this issue Oct 19, 2022 · 6 comments

Comments

@oindrillac
Member

Add functionality to train the time-to-merge (TTM) model on multiple repositories or on a GitHub organization.

@oindrillac
Member Author

Currently the workflow also fails for repositories with more than ~120 PRs (redhat-et/time-to-merge-tool#4), even when the GitHub API rate limit is not reached. Explore replacing the data collection notebook with a script, to rule out timeouts caused by Jupyter notebook cells running for too long.

@oindrillac
Member Author

Also evaluate the size of the generated dataset and the storage space available on the GitHub workflow worker.
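A minimal sketch of that evaluation, assuming the collected data is written under some local directory on the worker (the directory is whatever the workflow chooses; nothing here comes from srcopsmetrics). It uses only the standard library: `os.walk` to total the dataset size and `shutil.disk_usage` to read the free space on the filesystem holding it.

```python
import os
import shutil


def dataset_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path`, ignoring symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def storage_report(dataset_dir: str) -> dict:
    """Report the dataset size next to the space left on the filesystem holding it."""
    usage = shutil.disk_usage(dataset_dir)  # named tuple (total, used, free), in bytes
    return {
        "dataset_bytes": dataset_size_bytes(dataset_dir),
        "free_bytes": usage.free,
        "total_bytes": usage.total,
    }


if __name__ == "__main__":
    import sys
    # Usage: python storage_report.py <dataset_dir>
    print(storage_report(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Running this as a step after data collection (and printing the result to the job log) would give a per-repo data point for comparing dataset size against the worker's free disk space.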

@oindrillac
Member Author

Currently the workflow succeeds on repos with ~400 PRs. During the run, the GitHub API token got rate limited, but the workflow kept running and the data download resumed when the rate limit reset an hour later.

I have triggered workflows on larger repos with ~750 PRs and am awaiting their results.

As a next step, I will modify the data collection step to collect PRs across an organization and monitor a sample workflow.
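One way to collect across an organization is to first enumerate its repositories and then run the existing per-repo collection for each one. The sketch below assumes the repo list is fetched directly from the GitHub REST endpoint `GET /orgs/{org}/repos` (not through srcopsmetrics) and follows the `Link` response header for pagination. The function names `org_repos`, `next_page_url`, and `urllib_fetcher` are hypothetical; the HTTP call is injected so it can be swapped or stubbed.

```python
import json
import re
import urllib.request
from typing import Callable, Iterator, List, Optional, Tuple

API = "https://api.github.com"

# A fetcher takes a URL and returns (parsed JSON body, value of the Link header or None).
Fetcher = Callable[[str], Tuple[List[dict], Optional[str]]]


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a GitHub `Link` response header, if any."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def urllib_fetcher(token: str) -> Fetcher:
    """Build a fetcher that calls the GitHub REST API with the given token."""
    def fetch(url: str) -> Tuple[List[dict], Optional[str]]:
        req = urllib.request.Request(url, headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        })
        with urllib.request.urlopen(req) as resp:
            return json.load(resp), resp.headers.get("Link")
    return fetch


def org_repos(org: str, fetch: Fetcher) -> Iterator[str]:
    """Yield "owner/name" for every repository in `org`, following pagination."""
    url: Optional[str] = f"{API}/orgs/{org}/repos?per_page=100"
    while url:
        page, link = fetch(url)
        for repo in page:
            yield repo["full_name"]
        url = next_page_url(link)
```

Each yielded name could then be handed to the per-repo collection step. With `per_page=100` (the API maximum), listing even a few hundred repos costs only a handful of requests against the rate limit.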

@oindrillac
Member Author

Jobs on repos with a large number of PRs, for example thoth-station/kebechet with ~750 PRs, fail every time the workflow is submitted.

While the workflow runs, it pauses several times upon reaching the rate limit: `INFO:srcopsmetrics.github_handling:API rate limit REACHED, will now wait for 58 minutes.`

But each time, it failed after running for around 6 hrs with `Error: The operation was canceled.`, while the workflow was paused on the rate limit.
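Since a rate-limit pause of almost an hour eats directly into the job's time, a collector could check, before sleeping, whether the wait would push it past the job limit, and checkpoint and exit cleanly instead of being canceled mid-wait. A minimal sketch, assuming the collector can read GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` (UTC epoch seconds) response headers; the function names and the 15-minute safety margin are hypothetical choices, not srcopsmetrics behavior.

```python
from typing import Mapping

JOB_LIMIT_SECONDS = 6 * 60 * 60  # per-job execution limit on GitHub-hosted runners


def rate_limit_wait(headers: Mapping[str, str], now: float) -> float:
    """Seconds to wait before the next request, from GitHub rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)


def should_checkpoint_instead(wait: float, job_started: float, now: float,
                              safety_margin: float = 15 * 60) -> bool:
    """True if sleeping `wait` seconds would run past the job limit minus a margin."""
    deadline = job_started + JOB_LIMIT_SECONDS - safety_margin
    return now + wait >= deadline
```

For example, 5.5 hrs into a job, a 58-minute wait would overrun the limit, so the collector should save what it has and let a follow-up job continue, whereas the same wait 1 hr into the job is safe.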

@oindrillac
Member Author

The workflow fails each time after 6 hrs, even though the job is still running, e.g.:
https://github.com/drillachat/ttmtool/actions/runs/3414370231/jobs/5682237181

This seems consistent with GitHub's usage limits, which do not allow a job within a workflow to run longer than 6 hrs on GitHub-hosted runners.

  • srcopsmetrics currently does not save intermediate results from the collected data, which means that if the job gets canceled, none of the data collected so far is saved.
  • Since even collecting data from a single repo (~750 PRs) can take > 6 hrs, we need to split the collection into multiple jobs within the workflow and save each job's intermediate results on S3.
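A minimal sketch of the splitting step, assuming the PR numbers for a repo are listed up front and each job receives its own shard index (for instance from a workflow `strategy.matrix` value exposed as an environment variable; that wiring is hypothetical). `shard` gives each job a contiguous slice whose sizes differ by at most one, so every PR is handled by exactly one job.

```python
from typing import List, Sequence


def shard(items: Sequence[int], num_shards: int, index: int) -> List[int]:
    """Return the contiguous slice of `items` assigned to shard `index` (0-based)."""
    if num_shards <= 0 or not 0 <= index < num_shards:
        raise ValueError("need num_shards > 0 and 0 <= index < num_shards")
    base, extra = divmod(len(items), num_shards)
    # The first `extra` shards take one additional item each.
    start = index * base + min(index, extra)
    end = start + base + (1 if index < extra else 0)
    return list(items[start:end])
```

Note that jobs using the same API token also share its rate limit, so sharding mainly keeps each job under the 6-hour limit rather than increasing total throughput; running the shards one at a time (e.g. `max-parallel: 1`) avoids them competing for the same quota.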

@oindrillac
Member Author

Even if we split a workflow into multiple workflows or jobs, we will need to save the intermediate results from the first job, which srcopsmetrics does not currently do in its default mode. We can try downloading the data locally and manually saving it to S3 incrementally, per repo.

However, for repos such as thoth-station/kebechet, which has over 700 PRs, this does not help: even in local mode, srcopsmetrics saves one file per repo rather than one JSON per PR. Thus we will not be able to use srcopsmetrics to collect data for such repos within a GitHub workflow job.
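If the collection were done outside srcopsmetrics, a per-PR layout would make resuming straightforward. A minimal sketch, assuming a hypothetical `fetch_pr` callable that returns one PR's data as a dict (for example by wrapping the GitHub REST call `GET /repos/{owner}/{repo}/pulls/{number}`): each PR is written to its own JSON file, PRs already on disk are skipped, and every file is written via a temporary name and an atomic rename so a canceled job never leaves half-written output.

```python
import json
import os
from typing import Callable, Iterable


def collect_with_checkpoints(pr_numbers: Iterable[int], out_dir: str,
                             fetch_pr: Callable[[int], dict]) -> int:
    """Write one JSON file per PR, skipping PRs saved by an earlier job.

    Returns the number of PRs fetched in this call.
    """
    os.makedirs(out_dir, exist_ok=True)
    fetched = 0
    for number in pr_numbers:
        path = os.path.join(out_dir, f"pr-{number}.json")
        if os.path.exists(path):
            continue  # already collected by a previous (possibly canceled) job
        data = fetch_pr(number)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic rename on the same filesystem
        fetched += 1
    return fetched
```

Between jobs, the output directory could be pushed to and pulled from S3 with `aws s3 sync` (bucket and prefix would be workflow-specific), so each new job resumes from the PRs the previous one finished.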
