Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicates in some selections with GROUP+HAVING #666

Open
stepapo opened this issue Apr 10, 2024 · 6 comments
Open

Duplicates in some selections with GROUP+HAVING #666

stepapo opened this issue Apr 10, 2024 · 6 comments
Labels

Comments

@stepapo
Copy link

stepapo commented Apr 10, 2024

Describe the bug

Sometimes ORM adds columns to GROUP statement, which leads to retrieving duplicate rows.

To Reproduce

Looking for books with any of Tags ID 2 or 3, which at the same time have more than one tag. Should be two books, but results in three, including one duplicate.

$books = $this->orm->books->findBy([
    ICollection::AND,
    [CompareGreaterThanFunction::class, [CountAggregateFunction::class, 'tags->id'], 1],
    ['tags->id' => [2, 3]],
]);
Assert::same($books->count(), 2);
SELECT "books".* 
FROM "books" AS "books" 
LEFT JOIN "books_x_tags" AS "books_x_tags__COUNT" ON ("books"."id" = "books_x_tags__COUNT"."book_id") 
LEFT JOIN "tags" AS "tags__COUNT" ON ("books_x_tags__COUNT"."tag_id" = "tags__COUNT"."id") 
LEFT JOIN "books_x_tags" AS "books_x_tags_any" ON ("books"."id" = "books_x_tags_any"."book_id") 
LEFT JOIN "tags" AS "tags_any" ON ("books_x_tags_any"."tag_id" = "tags_any"."id") 
GROUP BY "books"."id", "tags_any"."id" 
HAVING ((COUNT("tags__COUNT"."id") > 1) AND (("tags_any"."id" IN (2, 3))))

Expected behavior

Adding "tags_any"."id" IN (2, 3) to HAVING requires column "tags_any"."id" to be added in GROUP, which leads to duplicates in selection. In MySQL, the column could be moved from GROUP to SELECT, but in Postgres, that does not do the trick. In both it would be probably ok to SELECT DISTINCT, or to remove column from GROUP and move condition to WHERE (as it was in 4.0 version).

Versions

  • Database: PostgreSQL 13.13.0
  • Orm: dev-main
  • Dbal: 5.0.0-rc4
@stepapo stepapo added the bug label Apr 10, 2024
@hrach
Copy link
Member

hrach commented Apr 10, 2024

Thank you for another great report. I hope this is the last one, not because you would stop, but because it is that way :D

However, this time the fix... is complicated.

@stepapo
Copy link
Author

stepapo commented Apr 10, 2024

Undestood. I also think it's the last one. I found only one more and it was already in v4.0. Now it's fixed, sort of, by NotSupportedException "Relationship cannot be fetched...". I'm worried that the same is going to happen to this one too :-D

hrach added a commit that referenced this issue Apr 11, 2024
@hrach
Copy link
Member

hrach commented Apr 11, 2024

Hm, MySQL is not working correctly (#667) with DISTINCT. Sadly, I cannot move it from having to where, because it would stop working with OR.

@stepapo
Copy link
Author

stepapo commented Apr 12, 2024

I cannot move it from having to where, because it would stop working with OR.

Seems to me that OR is already solved by including condition "tags_any"."id" IN (2, 3) in LEFT JOIN ... ON. Works well. Maybe the same solution could be used for AND?

$books = $this->orm->books->findBy([
    ICollection::OR,
    [CompareGreaterThanFunction::class, [CountAggregateFunction::class, 'tags->id'], 1],
    ['tags->id' => [2, 3]],
])->fetchAll();
SELECT "books".* 
FROM "books" AS "books" 
LEFT JOIN "books_x_tags" AS "books_x_tags__COUNT" ON ("books"."id" = "books_x_tags__COUNT"."book_id") 
LEFT JOIN "tags" AS "tags__COUNT" ON ("books_x_tags__COUNT"."tag_id" = "tags__COUNT"."id") 
LEFT JOIN "books_x_tags" AS "books_x_tags_any" ON ("books"."id" = "books_x_tags_any"."book_id")
LEFT JOIN "tags" AS "tags_any" ON (
    ("books_x_tags_any"."tag_id" = "tags_any"."id") 
    AND "tags_any"."id" IN (2, 3)
) 
GROUP BY "books"."id" 
HAVING ((COUNT("tags__COUNT"."id") > 1) OR ((COUNT("tags_any"."id") > 0)))

@stepapo
Copy link
Author

stepapo commented Apr 15, 2024

Just for reference, I tried simply commenting out this line and both OR and AND cases now work as well as all other tests in MySQL and Postgres. However, it brings down performance significantly for simple AND filters that use AnyAggregator.

I figured that current DbalExpressionResult probably does not allow combining both WHERE and HAVING conditions within a single query. Without that, it might be hard to achieve correct results and optimal performance at the same time. Understandably, complex filters need to be all in one or the other.

It is still a great job making most of this work correctly and this case is not a "must have" for me.

@hrach
Copy link
Member

hrach commented Apr 16, 2024

Thanks for exploring, yes, I had a few ideas but didn't have a chance to explore them. I hope I'll get to it at the end of this week :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants