Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] No subclasses refactor v2 #437

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

[WIP] No subclasses refactor v2 #437

wants to merge 16 commits into from

Conversation

cgarciae
Copy link
Contributor

The idea of this PR is to:

  1. Remove the classes LuxDataFrame, LuxSeries, etc
  2. Entirely avoid strong modifications to the pandas library such as overriding pandas.DataFrame class (and all other internal references to it) with their Lux counterpart e.g.:
 pandas.DataFrame = LuxDataFrame

Current behaviour can be surprising to the user and it makes supporting new versions harder as you might have to deal with more edge cases because the impact of the modifications is so large. Instead the idea is to some lightweight monkey patching of fiew core methods (which mostly do bookkeeping) and move the rest to out of the DataFrame class to avoid possible conflicts. A lot of discussion and possible iteration is needed. These are the current efforts:

Ongoing changes

  • LuxDataFrame as a class is removed, instead key methods of interest form pandas.DataFrame are patched.
  • The type LuxDataFrame is now a Protocol that is there only for possible validation and type inference purposes.
  • Core methods are patched using a new @patch(cls) decorator, this replaces the method on the class but the original method is still available as ._super_{method_name}, e.g:
@patch(DataFrame)
def _set_axis(self, axis, labels):
    self._super_set_axis(axis, labels)
    ...
  • To reduce the surface area in which Lux impacts Pandas, a single .lux property that includes all lux extension methods is created. Its very similar to the existing .str and .dt functionality already present in pandas, here is an example usage:
df.lux.save_as_html(...)

@cgarciae
Copy link
Contributor Author

cgarciae commented Nov 23, 2021

Hey @dorisjlee, I created this new PR which aims at a cleaner refactor in that it does as few modifications possible to the existing structure e.g. doesn't mess with imports. It has the following differences with the previous:

  • LuxDataFrameMethods and LuxSeriesMethods now inherit from LuxMethods which contains common fields and a utility class method called from_lux_object that facilitates copying fields.
  • Patching is performed in the lux/patch/{frame, series}.py scripts.

This time I am starting bottom up in that I created a new test file called tests/test_basic.py that I am using as a playground to tryout very simple stuff. Currently it only has 1 test that sets an intent to a dataframe and checks that its propagated to a series. Feel free to play around with this I think the basic structure is there.

Next Steps

  • Change all {dataframe}.{lux_attribute} calls to {dataframe}.lux.{lux_attribute}, specifically code that has the form ldf.{lux_attribute} can be replaced in mass to ldf.lux.{lux_attribute} for all known attributes.
  • For all LuxMethods code, change {lux_methods}.{dataframe_attribute} to {lux_methods}.df.{dataframe_attribute}.

@dorisjlee
Copy link
Member

Thanks @cgarciae! This makes sense, we can slowly add the tests back in to test a subset of the functionality one at a time.

for attr in self._metadata:
groupby_obj.__dict__[attr] = getattr(self, attr, None)
if history_flag:
groupby_obj._history = groupby_obj._history.copy()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic has to change since groupby_obj no longer has ._history.

@cgarciae
Copy link
Contributor Author

cgarciae commented Dec 6, 2021

Hey @dorisjlee! I've been fixing tests and solving bugs.
For tomorrow I'd like to discuss is the implementation of groupby, logic has to be adjusted so feedback would be appreciated.

@cgarciae
Copy link
Contributor Author

cgarciae commented Dec 14, 2021

@dorisjlee In this test, after df.lux.intent = ["Attrition"] it results in df.current_vis being non-empty, new_df copies current_vis and on the last assertion its still not empty. At which point should current_vis be emptied?

def test_metadata_propogate_invalid_intent():
    df = pd.read_csv(
        "https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/employee.csv")
    df.lux.intent = ["Attrition"]
    new_df = df.groupby("BusinessTravel").mean()
    assert new_df.lux.intent[0].attribute == "Attrition", "User-specified intent is retained"
    assert new_df.lux._inferred_intent == [], "Invalid inferred intent is cleared"
    new_df._ipython_display_()
    assert new_df.lux.current_vis == [] # <<--- not being emptied

@cgarciae
Copy link
Contributor Author

cgarciae commented Dec 15, 2021

Just as a recap, I ran all the tests an dcurrently these are the results:

125 failed, 103 passed

@dorisjlee
Copy link
Member

@dorisjlee In this test, after df.lux.intent = ["Attrition"] it results in df.current_vis being non-empty, new_df copies current_vis and on the last assertion its still not empty. At which point should current_vis be emptied?

def test_metadata_propogate_invalid_intent():
    df = pd.read_csv(
        "https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/employee.csv")
    df.lux.intent = ["Attrition"]
    new_df = df.groupby("BusinessTravel").mean()
    assert new_df.lux.intent[0].attribute == "Attrition", "User-specified intent is retained"
    assert new_df.lux._inferred_intent == [], "Invalid inferred intent is cleared"
    new_df._ipython_display_()
    assert new_df.lux.current_vis == [] # <<--- not being emptied

Hey @cgarciae, The current_vis is cleared when we call clear_intent, which calls set_intent. Inside the set_intent, when we do _parse_validate_compile_intent the current_vis gets recomputed based on the now empty intent, and so becomes empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants