Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up slice fallback and remove bit64 slice hacks #1707

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DavisVaughan
Copy link
Member

I originally intended to just remove the old bit64 hacks around slicing with NA_integer_, but trying to do so exposed some strange bugs with the way our fallback slicing was working. I'll add some inline comments below.

The first commit cleans up the fallback slicing approach. The second commit removes the bit64 hacks.

- Handles dim case
- Always restores as needed
- Which means R level shaped dispatch no longer proxy/restores
- R level shaped dispatch also no longer unclasses, which defeated the purpose of dispatching!
- Fixes `vec_is_restored()` to also ignore `dim` and `dimnames`
Comment on lines 112 to -114
# Called when `x` has dimensions
vec_slice_fallback <- function(x, i) {
out <- unclass(vec_proxy(x))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of this vec_slice_fallback() path was to handle shaped fallback vectors when calling vec_slice(). It was intended to give the [ methods for shaped vectors a chance to run, which we'd then restore on the backside if needed with vec_is_restored() and vec_restore() (because base [ methods drop the class).

However, we were unclassing the input before calling [, which kind of defeats the whole point!

This was a problem with bit64, because the [.integer64 method is required to correctly handle NA_integer_ indices.

library(vctrs)

`[.myclass` <- function(x, ...) {
  print("wow!")
  NextMethod()
}

x <- matrix(1:6, nrow = 3)
class(x) <- "myclass"

# Going through current `vec_slice_fallback()`.
# Unclassing so we don't get to call our `[` method and we don't see "wow!".
vec_slice(x, 1)
#>      [,1] [,2]
#> [1,]    1    4
#> attr(,"class")
#> [1] "myclass"

Created on 2022-09-30 by the reprex package (v2.0.1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, this function does not need to proxy/restore, because if a vec_proxy() method was defined then we would not have gone through the fallback path to begin with!

I've trimmed it down to only calling [, which feels more correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly, it was given a rather unfortunate name, because it made us think this was the fallback slice method that we should be using everywhere, when really it was just intended for shaped vectors.

It was actually being used in other places, like the fallback paths of list_unchop() and vec_chop(), when really we should have been using a more generic slice fallback function that actually handles that "maybe restore" part on the backside.

I'm fairly certain this was the reason we tried to make it unclass/proxy/restore all in the same function.

It now has the name bracket_shaped_dispatch() to go along with the C level bracket_dispatch() which calls x[i] in the non-shaped case.

@@ -296,6 +296,10 @@ static SEXP chop_shaped(SEXP x, SEXP indices, struct vctrs_chop_info info) {
}

static SEXP chop_fallback(SEXP x, SEXP indices, struct vctrs_chop_info info) {
// TODO: Do we really care about micro performance that much here? Can we just
// merge with `chop_shaped_fallback()` now that `vec_slice_fallback()`
// handles shaped an unshaped vectors?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chop_fallback() only runs on S3/S4 objects that don't have a vec_proxy() method defined for them (i.e. vec_requires_fallback() is true).

I can't imagine we care about the performance of those cases that much. I think I'd rather simplify this code by a lot by merging it with chop_fallback_shaped(), which just allows vec_slice_fallback() to handle everything related to setting up the environment and doing vec_is_restored(). The less places we have that vec_is_restored() check lying around, the better IMO.

Comment on lines +210 to +225
static
r_obj* bracket_dispatch(r_obj* x, r_obj* subscript) {
return vctrs_dispatch2(
syms_bracket, fns_bracket,
syms_x, x,
syms_i, subscript
);
}
static
r_obj* bracket_shaped_dispatch(r_obj* x, r_obj* subscript) {
return vctrs_dispatch2(
syms.bracket_shaped_dispatch, fns.bracket_shaped_dispatch,
syms_x, x,
syms_i, subscript
);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have these nice two parallel functions. bracket_dispatch() calls x[i] and bracket_shaped_dispatch() just calls x[i, , , drop = FALSE] with as many , as needed.

syms_x, x,
syms_i, subscript
);
}

r_obj* vec_slice_fallback(r_obj* x, r_obj* subscript) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of vec_slice_fallback() is now to completely handle the fallback slice case - which is basically how we have been trying to use it this whole time.

It first hands off to [ and then optionally calls vec_restore() if it looks like we needed to. This allows us to localize the "maybe restore" logic into one place, so we don't forget to do it whenever we use vec_slice_fallback() elsewhere.

Comment on lines -388 to +386
if (r_node_tag(node) == r_syms.names) {
r_obj* tag = r_node_tag(node);

if (tag == r_syms.names ||
tag == r_syms.dim ||
tag == r_syms.dim_names) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vec_is_restored() had this bug where it said "you don't need to restore" if the object had dim or dimname attributes, which definitely isn't true. Those are two of the attributes that base R [ methods will propagate, like names (but unlike the class).

# are requesting native vctrs slicing), then you need to declare a
# `vec_proxy()` method as well to tell vctrs what it needs to be natively
# slicing.

local_methods(
`[.vctrs_foobar` = function(x, i, ...) vec_slice(x, i)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so lets talk about this. This infloop check is for:

  • A class that has clearly opted into using vctrs, because they call vec_slice()
  • But isn't using new_vctr(), which has a vec_proxy() method defined
  • And isn't defining their own vec_proxy() method

In that case, vec_slice() tries to fall back to the [ method, which causes the infloop.

I don't think that avoiding this infloop is our responsibility. The way we tried to do this was by calling unclass() in that original vec_slice_fallback() function, but as I showed earlier that defeats the whole purpose of the function.

It it also 100% possible to do this already with non-shaped vectors. i.e. this is with CRAN vctrs:

library(vctrs)
`[.vctrs_foobar` <- function(x, i, ...) vec_slice(x, i)
x <- structure(1:4, class = "vctrs_foobar")
# infloop
x[1]

I think that if you are telling vctrs that you want [ to forward to vec_slice(), then you have to tell vctrs what it should be natively slicing by providing a vec_proxy() method (if you aren't extending vctrs_vctr).

@@ -351,6 +363,7 @@ test_that("vec_restore() is called after slicing data frames", {

test_that("additional subscripts are forwarded to `[`", {
local_methods(
vec_proxy.vctrs_foobar = function(x, ...) x,
`[.vctrs_foobar` = function(x, i, ...) vec_index(x, i, ...)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise produces the infloop from before, but we are just trying to test vec_index() properties here (used by vctrs_vctr) so providing a proxy method like vctrs_vctr to avoid the infloop seems reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant