
Discuss matchCohorts() when there is more than one record per person within a cohort #204

Open
martaalcalde opened this issue Jun 5, 2024 · 1 comment


@martaalcalde
Collaborator

Currently, matchCohorts() assumes that there is only one record per person within a cohort. If that assumption does not hold, each record is treated independently, so the same person can end up with different matches. See the example below:

```r
library(CohortConstructor)
library(dplyr)

cdm <- mockCohortConstructor(nPerson = 1000, seed = 0)
cdm$cohort1 |>
  dplyr::filter(subject_id == 3) |>
  matchCohorts(name = "new_cohort")
#> Starting matching
#> ℹ Creating copy of target cohort.
#> • 1 cohort to be matched.
#> ℹ Creating controls cohorts.
#> ℹ Excluding cases from controls
#> • Matching by gender_concept_id and year_of_birth
#> • Removing controls that were not in observation at index date
#> • Excluding target records whose pair is not in observation
#> • Adjusting ratio
#> Binding both cohorts
#> ✔ Done
#> # Source:   table<main.new_cohort> [4 x 5]
#> # Database: DuckDB v0.10.1 [root@Darwin 23.5.0:R 4.3.2/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date cluster_id
#>                  <int>      <int> <date>            <date>               <dbl>
#> 1                    1          3 2015-03-20        2015-03-29               1
#> 2                    1          3 2015-03-30        2015-04-05               2
#> 3                    2        147 2015-03-20        2016-01-10               1
#> 4                    2        509 2015-03-30        2017-05-20               2
```
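The problem is visible in the printed result: subject 3 appears in two clusters (1 and 2), each paired with a different control (147 and 509). A minimal base-R sketch of how one might flag this after matching, re-entering the four printed rows as a local data frame. It assumes, as the output suggests, that `cohort_definition_id` 1 holds the target records and 2 holds the matched controls; the variable names are illustrative only.

```r
# The four rows printed above, re-entered by hand as a local data frame.
# Assumption: cohort_definition_id 1 = target records, 2 = matched controls.
matched <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L, 2L),
  subject_id = c(3L, 3L, 147L, 509L),
  cluster_id = c(1, 2, 1, 2)
)

# Keep only the target records
targets <- matched[matched$cohort_definition_id == 1L, ]

# Number of distinct clusters (i.e. independent matches) per target subject
clustersPerSubject <- tapply(
  targets$cluster_id,
  targets$subject_id,
  function(x) length(unique(x))
)

# Target subjects that were matched more than once
names(clustersPerSubject)[clustersPerSubject > 1]
#> [1] "3"
```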

It would be good to discuss whether this is the behaviour we expect, or whether we should throw an error when a person appears more than once in a cohort. For now, I've implemented the following warning message:

Warning: Multiple records per person detected. The matchCohorts() function is designed to operate under the assumption that there is only one record per person within each cohort. If this assumption is not met, each record will be treated independently. As a result, the same individual may be matched multiple times, leading to inconsistent and potentially misleading results.
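A check along these lines could run on the input cohort before matching. The sketch below is base R on a local data frame; the helper name `checkOneRecordPerPerson()` is hypothetical and is not CohortConstructor's actual implementation, which operates on database-backed cohort tables.

```r
# Hypothetical helper (not part of CohortConstructor): warns when a subject
# has more than one record within the same cohort definition.
checkOneRecordPerPerson <- function(cohort) {
  # TRUE for every row whose (cohort_definition_id, subject_id) pair
  # has already appeared in an earlier row
  repeated <- duplicated(cohort[, c("cohort_definition_id", "subject_id")])
  if (any(repeated)) {
    warning(
      "Multiple records per person detected. Each record will be treated ",
      "independently, so the same individual may be matched multiple times.",
      call. = FALSE
    )
  }
  invisible(any(repeated))
}

# Subject 3's two records from the example above
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L),
  subject_id = c(3L, 3L),
  cohort_start_date = as.Date(c("2015-03-20", "2015-03-30")),
  cohort_end_date = as.Date(c("2015-03-29", "2015-04-05"))
)

checkOneRecordPerPerson(cohort)       # emits the warning
checkOneRecordPerPerson(cohort[1, ])  # single record: no warning
```

Swapping `warning()` for `stop()` would turn this into a hard error, which is the alternative raised above.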

@edward-burn @catalamarti

Created on 2024-06-05 with reprex v2.1.0

@edward-burn
Collaborator

Thanks for spotting this, @martaalcalde. For the open PR, let's just add this warning for now, and then discuss the behaviour further.
