Docs follow up #200

Open
andrewgsavage opened this issue Sep 5, 2023 · 2 comments
Comments

@andrewgsavage
Collaborator

A page or example for:

A key to understanding the behavior of PintArrays within Pandas DataFrames is that PintArrays are ExtensionArrays, and Pandas supports only 1-dimensional ExtensionArrays. In a 2-dimensional DataFrame, PintArrays are only ever columns. If you ask for a single row (via .loc, .iloc, or similar), you will get back a Series that may have a PintArray as its values (for example, when the entire DataFrame is homogeneous in its units). But that Series is constructed on the fly for your convenience; rows in the DataFrame itself are not PintArrays.

also for

Pandas and PintArrays
Pandas makes it easy to put data into rows and columns, and most novice users do not need to understand any details beyond the fact that a DataFrame has both rows and columns, whereas a Series is a 1-dimensional array (which could be a row of data or a column of data). But when using advanced features of Pandas like ExtensionArrays, it is helpful to understand a few additional details.

When working with basic numerical data, Pandas uses NumPy data structures, which are well-suited to vectorization and other performance optimizations. Pandas ExtensionArrays provide almost the full range of Pandas functionality when operating on user-defined 1-dimensional arrays. A PintArray is an ExtensionArray that is filled with Pint Quantities (and which can be optimized for performance).

A 1-dimensional Pandas Series can use a PintArray to hold its values. Columns in a 2-dimensional Pandas DataFrame can contain PintArrays, with all the efficiency the ExtensionArray APIs provide, but rows are a special case. If all elements of the row have the same units, the row will be returned as a Series backed by a PintArray with those units. But if the units are heterogeneous, the row will be returned as a Series consisting of discrete Quantities (or raw data if the column values don't have units). All Quantity data within such a Series will follow Pint rules of unit conversion and will give error messages when units are not compatible, but some error messages may lose information as Pandas tries to align two incompatible Quantities to non-unitized magnitude values. To get the greatest benefit from Pint-pandas (and Pandas in general), build your columns from data with homogeneous units and let your rows contain the heterogeneous data when necessary.

When examining DataFrames that contain units: if you see units printed alongside the values in your DataFrame or Series, the Pandas object is not using PintArrays, although it is still holding Pint Quantities (typically in a column of dtype object). If you see only magnitudes when you print your DataFrame, and you see pint[units] as the dtype of the column or Series, the Pandas object is using PintArrays.

API page for all Series and DataFrame accessors. Would need to set up docstrings.

In Common Issues, add code to illustrate the following:

Quantity objects within Pandas DataFrames (or Series) will behave like Quantities, meaning that they are subject to unit conversion rules and will raise errors when incompatible units are mixed. But these loose Quantities don't offer the elegance or performance optimizations that come from using PintArrays. And they may give strange error messages as Pandas tries to convert incompatible units to dimensionless magnitudes (which is often prohibited by Pint) rather than naming the incompatibility between the two Quantities in question.

In Common Issues, expand on "Creating DataFrames from Series".

@andrewgsavage
Collaborator Author

@MichaelTiemannOSC, now that it's merged, it should be easier to make a PR and view the changed pages yourself.
I think you need to make an account on Read the Docs to get it to build for you.

@MichaelTiemannOSC
Collaborator

OK...I've created an account via my GitHub ID. I'll see what sorts of issues I run into from there.
