Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up slice fallback and remove bit64 slice hacks #1707

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ Imports:
lifecycle (>= 1.0.2),
rlang (>= 1.0.6)
Suggests:
bit64,
bit64 (>= 4.0.5),
covr,
crayon,
dplyr (>= 0.8.5),
Expand Down
7 changes: 7 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
# vctrs (development version)

* Internal code around `vec_slice()` fallbacks to `[` has been cleaned up
(#1707).

* We have removed some internal hacks required for very old versions of bit64
regarding `NA_integer_` indices and slicing. If you notice any issues related
to this after updating vctrs, you likely need a newer version of bit64.

* `validate_list_of()` has been removed. It hasn't proven to be practically
useful, and isn't used by any packages on CRAN (#1697).

Expand Down
59 changes: 4 additions & 55 deletions R/slice.R
Original file line number Diff line number Diff line change
Expand Up @@ -110,64 +110,13 @@ vec_slice <- function(x, i) {
}

# Called when `x` has dimensions
vec_slice_fallback <- function(x, i) {
out <- unclass(vec_proxy(x))
Comment on lines 112 to -114
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of this vec_slice_fallback() path was to handle shaped fallback vectors when calling vec_slice(). It was intended to give the [ methods for shaped vectors a chance to run, which we'd then restore on the backside if needed with vec_is_restored() and vec_restore() (because base [ methods drop the class).

However, we were unclassing the input before calling [, which kind of defeats the whole point!

This was a problem with bit64, because the [.integer64 method is required to correctly handle NA_integer_ indices.

library(vctrs)

`[.myclass` <- function(x, ...) {
  print("wow!")
  NextMethod()
}

x <- matrix(1:6, nrow = 3)
class(x) <- "myclass"

# Going through current `vec_slice_fallback()`.
# Unclassing so we don't get to call our `[` method and we don't see "wow!".
vec_slice(x, 1)
#>      [,1] [,2]
#> [1,]    1    4
#> attr(,"class")
#> [1] "myclass"

Created on 2022-09-30 by the reprex package (v2.0.1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, this function does not need to proxy/restore, because if a vec_proxy() method was defined then we would not have gone through the fallback path to begin with!

I've trimmed it down to only calling [, which feels more correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly, it was given a rather unfortunate name, because it made us think this was the fallback slice method that we should be using everywhere, when really it was just intended for shaped vectors.

It was actually being used in other places, like the fallback paths of list_unchop() and vec_chop(), when really we should have been using a more generic slice fallback function that actually handles that "maybe restore" part on the backside.

I'm fairly certain this was the reason we tried to make it unclass/proxy/restore all in the same function.

It now has the name bracket_shaped_dispatch() to go along with the C level bracket_dispatch() which calls x[i] in the non-shaped case.

vec_assert(out)

d <- vec_dim_n(out)
if (d == 2) {
out <- out[i, , drop = FALSE]
} else {
miss_args <- rep(list(missing_arg()), d - 1)
out <- eval_bare(expr(out[i, !!!miss_args, drop = FALSE]))
}

vec_restore(out, x)
}

vec_slice_fallback_integer64 <- function(x, i) {
bracket_shaped_dispatch <- function(x, i) {
d <- vec_dim_n(x)

if (d == 2) {
out <- x[i, , drop = FALSE]
} else {
miss_args <- rep(list(missing_arg()), d - 1)
out <- eval_bare(expr(x[i, !!!miss_args, drop = FALSE]))
}

is_na <- is.na(i)

if (!any(is_na)) {
return(out)
}

if (d == 2) {
out[is_na,] <- bit64::NA_integer64_
} else {
eval_bare(expr(out[is_na, !!!miss_args] <- bit64::NA_integer64_))
}

out
}

# bit64::integer64() objects do not have support for `NA_integer_`
# slicing. This manually replaces the garbage values that are created
# any time a slice with `NA_integer_` is made.
vec_slice_dispatch_integer64 <- function(x, i) {
out <- x[i]

is_na <- is.na(i)

if (!any(is_na)) {
return(out)
}

out[is_na] <- bit64::NA_integer64_

out
miss_args <- rep(list(missing_arg()), d - 1)
x <- eval_bare(expr(x[i, !!!miss_args, drop = FALSE]))
x
}


#' @rdname vec_slice
#' @export
`vec_slice<-` <- function(x, i, value) {
Expand Down
8 changes: 2 additions & 6 deletions src/globals.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,7 @@ struct syms {
r_obj* to_arg;
r_obj* value_arg;
r_obj* vec_default_cast;
r_obj* vec_slice_dispatch_integer64;
r_obj* vec_slice_fallback;
r_obj* vec_slice_fallback_integer64;
r_obj* bracket_shaped_dispatch;
r_obj* x_arg;
r_obj* y_arg;
};
Expand All @@ -36,9 +34,7 @@ struct chrs {
};

struct fns {
r_obj* vec_slice_dispatch_integer64;
r_obj* vec_slice_fallback;
r_obj* vec_slice_fallback_integer64;
r_obj* bracket_shaped_dispatch;
};

struct vec_args {
Expand Down
18 changes: 7 additions & 11 deletions src/slice-chop.c
Original file line number Diff line number Diff line change
Expand Up @@ -296,6 +296,10 @@ static SEXP chop_shaped(SEXP x, SEXP indices, struct vctrs_chop_info info) {
}

static SEXP chop_fallback(SEXP x, SEXP indices, struct vctrs_chop_info info) {
// TODO: Do we really care about micro performance that much here? Can we just
// merge with `chop_shaped_fallback()` now that `vec_slice_fallback()`
// handles shaped an unshaped vectors?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chop_fallback() only runs on S3/S4 objects that don't have a vec_proxy() method defined for them (i.e. vec_requires_fallback() is true).

I can't imagine we care about the performance of those cases that much. I think I'd rather simplify this code by a lot by merging it with chop_fallback_shaped(), which just allows vec_slice_fallback() to handle everything related to setting up the environment and doing vec_is_restored(). The less places we have that vec_is_restored() check lying around, the better IMO.


// Evaluate in a child of the global environment to allow dispatch
// to custom functions. We define `[` to point to its base
// definition to ensure consistent look-up. This is the same logic
Expand All @@ -306,16 +310,8 @@ static SEXP chop_fallback(SEXP x, SEXP indices, struct vctrs_chop_info info) {
Rf_defineVar(syms_i, info.index, env);

// Construct call with symbols, not values, for performance.
// TODO - Remove once bit64 is updated on CRAN. Special casing integer64
// objects to ensure correct slicing with `NA_integer_`.
SEXP call;
if (is_integer64(x)) {
call = PROTECT(Rf_lang3(syms.vec_slice_dispatch_integer64, syms_x, syms_i));
Rf_defineVar(syms.vec_slice_dispatch_integer64, fns.vec_slice_dispatch_integer64, env);
} else {
call = PROTECT(Rf_lang3(syms_bracket, syms_x, syms_i));
Rf_defineVar(syms_bracket, fns_bracket, env);
}
SEXP call = PROTECT(Rf_lang3(syms_bracket, syms_x, syms_i));
Rf_defineVar(syms_bracket, fns_bracket, env);

for (R_len_t i = 0; i < info.out_size; ++i) {
if (info.has_indices) {
Expand All @@ -329,6 +325,7 @@ static SEXP chop_fallback(SEXP x, SEXP indices, struct vctrs_chop_info info) {

SEXP elt = PROTECT(Rf_eval(call, env));

// Same logic as `vec_slice_fallback()`
if (!vec_is_restored(elt, x)) {
elt = vec_restore(elt, x, vec_owned(elt));
}
Expand All @@ -349,7 +346,6 @@ static r_obj* chop_fallback_shaped(r_obj* x, r_obj* indices, struct vctrs_chop_i
++(*info.p_index);
}

// `vec_slice_fallback()` will also `vec_restore()` for us
r_obj* elt = PROTECT(vec_slice_fallback(x, info.index));

SET_VECTOR_ELT(info.out, i, elt);
Expand Down
81 changes: 37 additions & 44 deletions src/slice.c
Original file line number Diff line number Diff line change
Expand Up @@ -207,34 +207,39 @@ r_obj* df_slice(r_obj* x, r_obj* subscript) {
return out;
}

static
r_obj* bracket_dispatch(r_obj* x, r_obj* subscript) {
return vctrs_dispatch2(
syms_bracket, fns_bracket,
syms_x, x,
syms_i, subscript
);
}
static
r_obj* bracket_shaped_dispatch(r_obj* x, r_obj* subscript) {
return vctrs_dispatch2(
syms.bracket_shaped_dispatch, fns.bracket_shaped_dispatch,
syms_x, x,
syms_i, subscript
);
}
Comment on lines +210 to +225
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have these nice two parallel functions. bracket_dispatch() calls x[i] and bracket_shaped_dispatch() just calls x[i, , , drop = FALSE] with as many , as needed.


r_obj* vec_slice_fallback(r_obj* x, r_obj* subscript) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of vec_slice_fallback() is now to completely handle the fallback slice case - which is basically how we have been trying to use it this whole time.

It first hands off to [ and then optionally calls vec_restore() if it looks like we needed to. This allows us to localize the "maybe restore" logic into one place, so we don't forget to do it whenever we use vec_slice_fallback() elsewhere.

// TODO - Remove once bit64 is updated on CRAN. Special casing integer64
// objects to ensure correct slicing with `NA_integer_`.
if (is_integer64(x)) {
return vctrs_dispatch2(syms.vec_slice_fallback_integer64, fns.vec_slice_fallback_integer64,
syms_x, x,
syms_i, subscript);
}
r_obj* out = r_null;

return vctrs_dispatch2(syms.vec_slice_fallback, fns.vec_slice_fallback,
syms_x, x,
syms_i, subscript);
}
if (has_dim(x)) {
out = KEEP(bracket_shaped_dispatch(x, subscript));
} else {
out = KEEP(bracket_dispatch(x, subscript));
}

static
r_obj* vec_slice_dispatch(r_obj* x, r_obj* subscript) {
// TODO - Remove once bit64 is updated on CRAN. Special casing integer64
// objects to ensure correct slicing with `NA_integer_`.
if (is_integer64(x)) {
return vctrs_dispatch2(syms.vec_slice_dispatch_integer64, fns.vec_slice_dispatch_integer64,
syms_x, x,
syms_i, subscript);
// Take over attribute restoration only if there is no `[` method
if (!vec_is_restored(out, x)) {
out = vec_restore(out, x, vec_owned(out));
}

return vctrs_dispatch2(syms_bracket, fns_bracket,
syms_x, x,
syms_i, subscript);
FREE(1);
return out;
}

bool vec_requires_fallback(r_obj* x, struct vctrs_proxy_info info) {
Expand Down Expand Up @@ -300,18 +305,7 @@ r_obj* vec_slice_unsafe(r_obj* x, r_obj* subscript) {
subscript = KEEP_N(compact_materialize(subscript), &nprot);
}

r_obj* out;

if (has_dim(x)) {
out = KEEP_N(vec_slice_fallback(x, subscript), &nprot);
} else {
out = KEEP_N(vec_slice_dispatch(x, subscript), &nprot);
}

// Take over attribute restoration only if there is no `[` method
if (!vec_is_restored(out, x)) {
out = vec_restore(out, x, vec_owned(out));
}
r_obj* out = vec_slice_fallback(x, subscript);

FREE(nprot);
return out;
Expand Down Expand Up @@ -381,11 +375,15 @@ bool vec_is_restored(r_obj* x, r_obj* to) {
return false;
}

// Class is restored if it contains any other attributes than names.
// We might want to add support for data frames later on.
// Class is restored if it contains any other attributes than names, dim, or
// dimnames. We might want to add support for data frames later on.
r_obj* node = attrib;
while (node != r_null) {
if (r_node_tag(node) == r_syms.names) {
r_obj* tag = r_node_tag(node);

if (tag == r_syms.names ||
tag == r_syms.dim ||
tag == r_syms.dim_names) {
Comment on lines -388 to +386
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vec_is_restored() had this bug where it said "you don't need to restore" if the object had dim or dimname attributes, which definitely isn't true. Those are two of the attributes that base R [ methods will propagate, like names (but unlike the class).

node = r_node_cdr(node);
continue;
}
Expand Down Expand Up @@ -482,11 +480,6 @@ r_obj* ffi_slice_rep(r_obj* x, r_obj* ffi_i, r_obj* ffi_n) {


void vctrs_init_slice(r_obj* ns) {
syms.vec_slice_dispatch_integer64 = r_sym("vec_slice_dispatch_integer64");
syms.vec_slice_fallback = r_sym("vec_slice_fallback");
syms.vec_slice_fallback_integer64 = r_sym("vec_slice_fallback_integer64");

fns.vec_slice_dispatch_integer64 = r_eval(syms.vec_slice_dispatch_integer64, ns);
fns.vec_slice_fallback = r_eval(syms.vec_slice_fallback, ns);
fns.vec_slice_fallback_integer64 = r_eval(syms.vec_slice_fallback_integer64, ns);
syms.bracket_shaped_dispatch = r_sym("bracket_shaped_dispatch");
fns.bracket_shaped_dispatch = r_eval(syms.bracket_shaped_dispatch, ns);
}
5 changes: 0 additions & 5 deletions src/utils.c
Original file line number Diff line number Diff line change
Expand Up @@ -965,11 +965,6 @@ r_obj* colnames2(r_obj* x) {
}
}

// [[ include("utils.h") ]]
bool is_integer64(SEXP x) {
return TYPEOF(x) == REALSXP && Rf_inherits(x, "integer64");
}

// [[ include("utils.h") ]]
bool lgl_any_na(SEXP x) {
R_xlen_t size = Rf_xlength(x);
Expand Down
2 changes: 0 additions & 2 deletions src/utils.h
Original file line number Diff line number Diff line change
Expand Up @@ -188,8 +188,6 @@ bool is_compact(SEXP x);
SEXP compact_materialize(SEXP x);
R_len_t vec_subscript_size(SEXP x);

bool is_integer64(SEXP x);

bool lgl_any_na(SEXP x);

SEXP colnames(SEXP x);
Expand Down
35 changes: 35 additions & 0 deletions tests/testthat/test-slice.R
Original file line number Diff line number Diff line change
Expand Up @@ -272,6 +272,18 @@ test_that("vec_slice() unclasses input before calling `vec_restore()`", {
})

test_that("can call `vec_slice()` from `[` methods with shaped objects without infloop", {
skip("Until infloop is talked about")
# TODO: I don't think it is our job to prevent this.
# You can already make this infloop today with non-shaped objects, i.e
# `[.vctrs_foobar` = function(x, i, ...) vec_slice(x, i)
# x <- structure(1:4, class = "vctrs_foobar")
# x[1]
# So we shouldn't be special casing shaped objects.
# I believe that if your `[` method is going to call `vec_slice()` (i.e. you
# are requesting native vctrs slicing), then you need to declare a
# `vec_proxy()` method as well to tell vctrs what it needs to be natively
# slicing.

local_methods(
`[.vctrs_foobar` = function(x, i, ...) vec_slice(x, i)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so lets talk about this. This infloop check is for:

  • A class that has clearly opted into using vctrs, because they call vec_slice()
  • But isn't using new_vctr(), which has a vec_proxy() method defined
  • And isn't defining their own vec_proxy() method

In that case, vec_slice() tries to fall back to the [ method, which causes the infloop.

I don't think that avoiding this infloop is our responsibility. The way we tried to do this was by calling unclass() in that original vec_slice_fallback() function, but as I showed earlier that defeats the whole purpose of the function.

It it also 100% possible to do this already with non-shaped vectors. i.e. this is with CRAN vctrs:

library(vctrs)
`[.vctrs_foobar` <- function(x, i, ...) vec_slice(x, i)
x <- structure(1:4, class = "vctrs_foobar")
# infloop
x[1]

I think that if you are telling vctrs that you want [ to forward to vec_slice(), then you have to tell vctrs what it should be natively slicing by providing a vec_proxy() method (if you aren't extending vctrs_vctr).

)
Expand All @@ -284,6 +296,28 @@ test_that("can call `vec_slice()` from `[` methods with shaped objects without i
expect_identical(x[1], exp)
})

test_that("slicing shaped S3 objects that don't have a proxy method actually calls the `[` method (#1707)", {
# In particular, this is needed for `bit64::integer64()` to allow their `[`
# method to handle `NA_integer_` correctly.
called <- NULL

local_methods(
`[.vctrs_foobar` = function(x, ...) {
called <<- TRUE
foobar(NextMethod())
}
)

x <- foobar(1:6)
dim(x) <- c(3, 2)

expect <- foobar(c(1L, 4L))
dim(expect) <- c(1, 2)

expect_identical(vec_slice(x, 1), expect)
expect_true(called)
})

test_that("vec_slice() restores attributes on shaped S3 objects correctly", {
x <- factor(c("a", "b", "c", "d", "e", "f"))
dim(x) <- c(3, 2)
Expand Down Expand Up @@ -351,6 +385,7 @@ test_that("vec_restore() is called after slicing data frames", {

test_that("additional subscripts are forwarded to `[`", {
local_methods(
vec_proxy.vctrs_foobar = function(x, ...) x,
`[.vctrs_foobar` = function(x, i, ...) vec_index(x, i, ...)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise produces the infloop from before, but we are just trying to test vec_index() properties here (used by vctrs_vctr) so providing a proxy method like vctrs_vctr to avoid the infloop seems reasonable.

)

Expand Down