float("nan") not always converted to pd.NA inside series with pint dtype #238

Open
scanzy opened this issue Jun 26, 2024 · 2 comments

@scanzy

scanzy commented Jun 26, 2024

Hello,
I am running into this issue when building a pd.Series with a pint dtype.

  1. When float("nan") is the only value, it stays float("nan").
  2. When float("nan") appears together with other values, it is converted to pd.NA.

This is not evident when printing the series (the formatting always shows nan), but .values or .tolist() reveals the difference.

```python
import pint as pt
import pandas as pd
import pint_pandas  # registers the "pint[...]" dtype with pandas

# case 1: float nan alone
print(pd.Series([float("nan")], dtype="pint[MW]").tolist())
# gives: [<Quantity(nan, 'megawatt')>]

# case 2: float nan with other values
print(pd.Series([float("nan"), 0.0], dtype="pint[MW]").tolist())
# gives: [<Quantity(<NA>, 'megawatt')>, <Quantity(0.0, 'megawatt')>]
```
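
Printing hides the difference because pandas treats both markers as missing. However, they are different objects, so a check on each magnitude tells them apart. The following pandas-only sketch shows this. The helper `missing_kind` is hypothetical and not part of pint or pint-pandas.

```python
import math

import pandas as pd


def missing_kind(magnitude):
    """Classify a magnitude as "NA", "nan", or "value" (hypothetical helper)."""
    if magnitude is pd.NA:
        return "NA"
    # np.float64 subclasses float, so NumPy nan values are caught here too
    if isinstance(magnitude, float) and math.isnan(magnitude):
        return "nan"
    return "value"


# Both markers count as missing for pandas, which is why printing looks the same...
print(pd.isna(float("nan")), pd.isna(pd.NA))
# ...but they are distinct objects, so inspecting the magnitudes separates them.
print([missing_kind(m) for m in [float("nan"), pd.NA, 0.0]])
```

With pint-pandas installed, `[missing_kind(q.magnitude) for q in s.tolist()]` should make case 1 and case 2 distinguishable, since `.magnitude` is the plain value inside each `Quantity`.
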

I assumed that float("nan") was the default value meaning "magnitude not set".
Having nan converted to pd.NA depending on the other values in the series seems a bit tricky to me: is this intended?

I am looking for a way to keep unset values consistent (either all float("nan") or all pd.NA), but:

  1. Trying to convert pd.NA to float("nan") has no effect.
  2. Trying to convert float("nan") to pd.NA raises a ValueError.

```python
# test 1: trying to convert pd.NA to nan
s = pd.Series([float("nan"), 0.0], dtype="pint[MW]")
print(s.tolist())
# gives: [<Quantity(<NA>, 'megawatt')>, <Quantity(0, 'megawatt')>]

print(s.fillna(float("nan")).tolist())
# gives the same: [<Quantity(<NA>, 'megawatt')>, <Quantity(0, 'megawatt')>]


# test 2: trying to convert nan to pd.NA
s = pd.Series([float("nan")], dtype="pint[MW]")
print(s.tolist())
# gives: [<Quantity(nan, 'megawatt')>]

s.fillna(pd.NA)
# gives: ValueError: float() argument must be a string or a real number, not 'NAType'
```
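
Until the behaviour is settled, one workaround is to normalize the magnitudes yourself before building the series. The following is a minimal pandas-only sketch. It assumes that the magnitudes are available as a plain list or array. The helper names are hypothetical and are not pint-pandas API. This may also be the reason test 1 has no visible effect. If the magnitudes sit in a nullable Float64 array (as the maintainer's reply below shows for the two-element case), filling with float("nan") likely just marks the entry as missing again.

```python
import numpy as np
import pandas as pd


def to_nan_magnitudes(values):
    """Return a plain float64 NumPy array with every missing entry as nan."""
    # pd.array(..., dtype="Float64") accepts both nan and pd.NA as missing;
    # to_numpy then writes np.nan wherever the missing-value mask is set.
    return pd.array(values, dtype="Float64").to_numpy(dtype="float64", na_value=np.nan)


def to_na_magnitudes(values):
    """Return a nullable Float64 array with every missing entry as pd.NA."""
    return pd.array(values, dtype="Float64")


mixed = [float("nan"), pd.NA, 0.0]
print(to_nan_magnitudes(mixed))
print(list(to_na_magnitudes(mixed)))
```

This sketch does not show how to rebuild a pint series from the normalized magnitudes. It also does not establish whether pint-pandas then keeps them unchanged. Both points need checking against the installed version.
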

Versions:
- Python 3.11.2
- pandas 2.2.2
- Pint 0.24.1
- Pint-Pandas 0.6
@andrewgsavage
Collaborator

The difference is due to the dtype of the underlying data array (s.values.data), which holds the magnitudes:

```python
>>> s = pd.Series([float("nan"), 0.0], dtype="pint[MW]")
>>> s.values.data
<FloatingArray>
[<NA>, 0.0]
Length: 2, dtype: Float64

>>> s = pd.Series([float("nan")], dtype="pint[MW]")
>>> s.values.data
<NumpyExtensionArray>
[nan]
Length: 1, dtype: float64
```
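
To reproduce the two containers above without pint, the following sketch builds them directly with pandas. It only mirrors what `s.values.data` shows. It makes no assumption about which code path pint-pandas uses to choose one container over the other.

```python
import math

import numpy as np
import pandas as pd

# Nullable container: nan is recorded in the mask, so the entry reads back as pd.NA.
masked = pd.array([float("nan"), 0.0], dtype="Float64")

# Plain NumPy-backed container: nan is an ordinary float64 value and is kept as-is.
plain = pd.array(np.array([float("nan")]), dtype="float64")

print(type(masked).__name__, list(masked))
print(type(plain).__name__, list(plain))
```

Both containers report the entry as missing through `isna()`. Only the Float64 array returns `pd.NA` when you read the value, which matches the `tolist()` difference in the report.
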

@andrewgsavage
Collaborator

I think pint-pandas should:

  1. By default, convert data to a FloatingArray
  2. Have an option to change the conversion to some other dtype
  3. Have an option to prevent conversion, allowing any dtype as the underlying data dtype. In this case, specify the underlying dtype in the pint dtype, e.g. 'pint[MW][Float64]'.
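
The following sketch is a rough illustration of how these three options could fit together. It parses the proposed 'pint[MW][Float64]' form and converts magnitudes to the chosen dtype. The dtype grammar, the helper names, and the default are hypothetical and are not part of pint-pandas. Only `pd.array` is real pandas API.

```python
import re

import pandas as pd

# Hypothetical syntax from the proposal: "pint[<unit>]" or "pint[<unit>][<subdtype>]".
_PINT_DTYPE = re.compile(r"^pint\[(?P<unit>[^\]]+)\](?:\[(?P<subdtype>[^\]]+)\])?$")

DEFAULT_SUBDTYPE = "Float64"  # option 1: convert to a FloatingArray by default


def parse_pint_dtype(name):
    """Split a dtype string into (unit, subdtype); subdtype falls back to the default."""
    match = _PINT_DTYPE.match(name)
    if match is None:
        raise ValueError(f"not a pint dtype string: {name!r}")
    return match.group("unit"), match.group("subdtype") or DEFAULT_SUBDTYPE


def coerce_magnitudes(values, subdtype=DEFAULT_SUBDTYPE):
    """Store magnitudes in the requested pandas dtype (options 2 and 3)."""
    return pd.array(values, dtype=subdtype)


unit, subdtype = parse_pint_dtype("pint[MW][Float64]")
magnitudes = coerce_magnitudes([float("nan")], subdtype)
print(unit, subdtype, list(magnitudes))
```

With Float64 as the default, a lone float("nan") would be stored as pd.NA, just like the two-element case. This gives the consistency the original report asks for. Passing "float64" instead would keep plain nan.
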

andrewgsavage mentioned this issue Aug 5, 2024