Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first pass at exploring infra flakes #448

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

antter
Copy link
Contributor

@antter antter commented Nov 17, 2021

Related Issues and Dependencies

#447

This introduces a breaking change

  • Yes
  • [x ] No

This Pull Request implements

Infra flakes are likely not detectable in an honest matter with methods by just looking at grid data; more analysis will be needed with different data, e.g. the logs of the outputs. This is a notebook that works to detect anomalous columns in the testgrid data just using very simple probability calculations.

@sesheta sesheta added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2021
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@sesheta
Copy link
Contributor

sesheta commented Nov 17, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign oindrillac after the PR has been reviewed.
You can assign the PR to them by writing /assign @oindrillac in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sesheta sesheta added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 17, 2021
@antter antter force-pushed the infra_flakes branch 2 times, most recently from d99c734 to 2a16d7e Compare November 22, 2021 18:45
@antter antter changed the title [WIP] first pass at exploring infra flakes first pass at exploring infra flakes Nov 24, 2021
@sesheta sesheta removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2021
@Shreyanand Shreyanand self-requested a review December 9, 2021 14:00
@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure that the x-axis here are in days? If iirc they were successive builds. I could be wrong here, we should double check.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're right I just copy pasted this graph from some other place

@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a count of the NA values? If it's 1% it wouldn't be a problem, if it's 99% we'd have to think this through.


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the NA value counting go before the array is fillna'ed with zeros?

@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the last line to better explain the example. "So if a column has 2 tests that failed, and the first had a row score of 2/5...."


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please describe the intuition behind the approach in a bit more detail? E.g. what does sum of entropies signify? why are we looking at entropies of row scores for failed tests?

@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why just 50?


Reply via ReviewNB

@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this output mean?


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these outputs are meant to function as a "progress bar", I'd suggest just using tqdm instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tqdm has never worked on the MOC jupyterhub ):

@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you bringing the score into the 0-1 range? What does 0 mean and what does 1 mean?


Reply via ReviewNB

@@ -0,0 +1,973 @@
{
Copy link
Member

@Shreyanand Shreyanand Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to know the answer to this question?


Reply via ReviewNB

@@ -0,0 +1,973 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you print some examples of jobs that have a high unexpected failure rate according to this method? And some that are expected failures?


Reply via ReviewNB

Copy link
Member

@Shreyanand Shreyanand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a creative method @antter 👏
Left some comments on the notebook.

@@ -0,0 +1,973 @@
{
Copy link
Member

@chauhankaranraj chauhankaranraj Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, adding a legend to this chart might help read it better :)


Reply via ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants