Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes slicing of feasibility table in keep_form_with_max_profit() in Developer.py #206

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

apdjustino
Copy link

Developer.pick() allows a list of forms passed to the forms parameter, which will slice the feasibility dataframe to include the forms in the list. When Pandas slices a df with a MultiIndex, it keeps all of the levels from the original df, even if they are not used. Since all the levels remain in MultiIndex of columns the sliced feasibility df, calling stack(level=0) on the sliced df will put forms that we are trying to exclude in the stacked output.

[Refer to the pandas docs] https://pandas.pydata.org/pandas-docs/stable/advanced.html#defined-levels) for more information about Defined Levels in pandas MultiIndex

… from feasibility table. This happens when user passes list in forms parameter to pick() in developer
…vels() to get sliced data from feasibility table.
@coveralls
Copy link

coveralls commented Oct 4, 2017

Coverage Status

Coverage decreased (-0.05%) to 94.29% when pulling bc84eb3 on apdjustino:fix-defined-levels-developer into 2497039 on UDST:master.

@pksohn pksohn self-assigned this Oct 6, 2017
@pksohn
Copy link
Contributor

pksohn commented Oct 6, 2017

@apdjustino Thanks for the PR.

I'm having trouble duplicating your issue: I have a DataFrame with residential, retail, and office forms. I passed ['residential', 'office'] (excluding retail) to the keep_form_with_max_profit method, and indeed after the slice, f.columns is a MultiIndex which still includes retail in the first level. However, f.stack(0) doesn't include retail in its row index after this, and the final DF returned from the method doesn't seem to be affected. Am I understanding the issue correctly? Is this just meant to be a performance improvement?

Also, looks like the remove_unused_columns method of MultiIndex isn't available until Pandas version 0.20 and I'd prefer to leave the required version of Pandas as-is. Looks like your first commit doesn't require that method and still does the same thing. Mind if we roll back to that commit?

@apdjustino
Copy link
Author

hi @pksohn,

Here's a little info on what I'm seeing: I am feeding the developer.pick() method the feasibility table calculated through the proforma feasibility model. I calculate feasibility for all forms, and the resulting table has multi-indexed columns with each form (retail, residential, industrial, office, mixedoffice, mixedresidential). When I run the developer.pick() method and i pass the list ['retail', 'office', industrial'], the command new_buildings.forms.value_counts() on the output table indicates that retail, residential, and office are the forms picked by the developer model. Residential should not be in that list, if it's not passed?

I do see some inconsistencies with this too; different combinations of forms will sometimes be correct and sometime incorrect. With the changes I submitted, I found that the forms in the new_buildings output were consistently matching the forms in the list passed to pick() method

@pksohn
Copy link
Contributor

pksohn commented Oct 17, 2017

@apdjustino Interesting. Thanks for the additional info.

I tried to use a simple test case to reproduce that behavior; let me know if I'm missing something here. I'm still getting the expected/correct result here:

Added to test_developer.py:

def test_developer_multiple_forms(simple_dev_inputs):
    pf = sqpf.SqFtProForma()
    inputs = pd.DataFrame(simple_dev_inputs)
    # Override previous prices; residential price is 40
    inputs.office = 80
    inputs.retail = 40

    dev = developer.Developer(
        {'residential': pf.lookup('residential', inputs),
         'office': pf.lookup('office', inputs),
         'retail': pf.lookup('retail', inputs)})
    target_units = 100
    parcel_size = pd.Series([1000, 1000, 1000], index=['a', 'b', 'c'])
    ave_unit_size = pd.Series([650, 650, 650], index=['a', 'b', 'c'])
    current_units = pd.Series([0, 0, 0], index=['a', 'b', 'c'])
    bldgs = dev.pick(["retail", "office"], target_units,
                     parcel_size, ave_unit_size, current_units,
                     residential=False)
    assert len(bldgs.form.value_counts()) == 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants