cannot display pint-pandas DataFrame in Streamlit #181

Open
mkaut opened this issue May 23, 2023 · 2 comments

@mkaut

mkaut commented May 23, 2023

I tried using pint-pandas-enhanced DataFrames in a Streamlit app, but displaying one leads to an error from pyarrow.

My code:

import streamlit as st
import pandas as pd
import pint
import pint_pandas  # registers the "pint[...]" dtype with pandas

df = pd.DataFrame({
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
})
df.dtypes  # Streamlit "magic" renders bare expressions in the app
df

The penultimate line displays the dtypes and confirms that the dataframe is created correctly,
but the last line fails with the following error message in the app:

ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]')

and in the terminal:

2023-05-23 10:57:26.467 Serialization of dataframe to Arrow table was unsuccessful due to: ('Could not convert pint[foot * force_pound] with type PintType: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object'). Applying automatic fixes for column types to make the dataframe Arrow-compatible.
2023-05-23 10:57:27.067 Serialization of dataframe to Arrow table was unsuccessful due to: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]'). Applying automatic fixes for column types to make the dataframe Arrow-compatible.
2023-05-23 10:57:27.067 Uncaught app exception
Traceback (most recent call last):
  File "[...]\.py311\Lib\site-packages\streamlit\type_util.py", line 757, in data_frame_to_bytes
    table = pa.Table.from_pandas(df)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\table.pxi", line 3681, in pyarrow.lib.Table.from_pandas
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 611, in dataframe_to_arrays
    arrays = [convert_column(c, f)
             ^^^^^^^^^^^^^^^^^^^^^
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 611, in <listcomp>
    arrays = [convert_column(c, f)
              ^^^^^^^^^^^^^^^^^^^^
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 598, in convert_column
    raise e
  File "[...]\.py311\Lib\site-packages\pyarrow\pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\array.pxi", line 323, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 79, in pyarrow.lib._ndarray_to_array
  File "pyarrow\array.pxi", line 67, in pyarrow.lib._ndarray_to_type
  File "pyarrow\error.pxi", line 123, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column torque with type pint[foot * force_pound]')

To me, it looks like Streamlit serializes the dataframe to an Arrow table for display, and Arrow does not recognize the pint dtypes.

Is there some way to fix it, or get around it?
And should it be reported to Streamlit developers, or is it pint-pandas' responsibility?

@andrewgsavage
Collaborator

> Is there some way to fix it, or get around it?

Not that I know of.

> And should it be reported to Streamlit developers, or is it pint-pandas' responsibility?

You can try asking Streamlit or pyarrow and see if anyone is interested. It isn't anyone's responsibility.

@scanzy

scanzy commented Jul 12, 2023

Hello @mkaut, I have the same problem :(

As a temporary workaround, I am converting everything to string.
It's not ideal, but at least it shows something, with the units!

df = pd.DataFrame({
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
}, dtype=str)  # conversion to string here

If you want to make the dataframe editable, one option is to keep the data as separate pd.Series and update them from the editor state, st.session_state["my_data_editor"]["edited_rows"].

import pandas as pd
import streamlit as st
import pint
import pint_pandas  # registers the "pint[...]" dtype with pandas

# sets compact unit formatting
u = pint.UnitRegistry()
u.default_format = '~P'
pint.set_application_registry(u)

# initial data
columns = {
    "torque": pd.Series([1., 2., 2., 3.], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1., 2., 2., 3.], dtype="pint[rpm]"),
}

# shows data editor widget, with string data type
st.data_editor(pd.DataFrame(columns, dtype=str), key="my_data_editor")

# gets edited rows: {row position: {column name: new text}}
edited_rows = st.session_state.get("my_data_editor", {}).get("edited_rows", {})
st.write(edited_rows)

# parses edited strings as pint quantities; pint-pandas stores them in the column's units
for row_index, edit_data in edited_rows.items():
    for col_name, new_value in edit_data.items():
        columns[col_name].iloc[row_index] = u.Quantity(new_value)

# shows edited data
for col_name, col_series in columns.items():
    st.write(col_name, col_series.tolist())

This solution converts values correctly even when they are entered in different units.
For example, entering a value in rad/s (instead of rpm) in angular_velocity still stores it in rpm.

P.S. If you opened an issue about this on Streamlit and/or Arrow, please post the link here.
