Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Balanced Full-Text Search for Diary Entries #5156

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kcne
Copy link
Contributor

@kcne kcne commented Sep 3, 2024

This PR adds search functionality to the diary entries page, addressing the need for users to find entries by title and body #3289

Key Changes

  • Model Updates:

    • Integrated the pg_search gem to enable full-text search on diary entries with relevance ranking.
    • Added a searchable column using a tsvector for efficient searching and created an index for performance optimization.
  • Controller Enhancements:

    • Modified the index action in DiaryEntriesController to support search queries, allowing users to filter results by keywords in titles and bodies, with optional language filtering.
  • Testing:

    • Added tests to verify the search functionality, including keyword search, language filtering, and basic edge cases.

Context

Previous methods explored for implementing search functionality, such as the PostgreSQL LIKE operator and pg_trgm search, were either too fast but lacked relevance ranking or too slow to be practical, especially with large datasets like the diary entries (600,000+ records). For instance, LIKE provided speed but no relevance, while pg_trgm could take over 40 seconds to run on a dataset of this size, making it unfeasible. The pg_search gem, utilizing tsearch, offers a balance between relevance and performance, though it required additional database migrations and optimizations. Despite the need for these migrations, this approach serves as a "golden middle," providing a reasonable trade-off between speed and search result quality IMHO.

Commit Summary

  • Add pg_search gem: Introduced pg_search for full-text search capabilities.
  • Add searchable column to diary entries migration: Created a tsvector column for storing precomputed search data.
  • Add index to searchable diary entries migration: Indexed the searchable column for performance improvements.
  • Add search query logic to diary entry model and controller: Integrated search functionality into the model and controller.
  • Add search functionality tests to DiaryEntriesController: Added basic tests to verify the new functionality.

Work in Progress

This is a draft PR. I want to confirm if I'm heading in the right direction. If the approach is approved, I'll add more tests for better coverage and handling of edge cases. Any comments and recommendations welcome.

@kcne kcne changed the title Diary search Implement Balanced Full-Text Search for Diary Entries Sep 3, 2024
@kcne
Copy link
Contributor Author

kcne commented Sep 4, 2024

Any guidance on this would be highly appreciated @tomhughes @gravitystorm @AntonKhorev

@tomhughes
Copy link
Member

Well waiting more than 18 hours before chasing would be my first advice...

The first question really, is whether there is need for this. Other than that one request has there been any other interest in this that you're aware of?

@kcne
Copy link
Contributor Author

kcne commented Sep 4, 2024

Thanks for the feedback. I commented on the issue earlier and thought I'd get a response there, but I realize now I should have tagged you. Apologies for pinging without waiting.

Beyond the issue and comments, I haven’t noticed any additional demand. However, I think this could improve the user experience, especially for those looking for specific content in diaries.

If it’s not a priority right now, I’m happy to pause or adjust as needed.

Also, out of context, but from a contributor perspective, maintaining tags like "help needed" or priority tags on issues would greatly help in understanding which issues are a priority and which are still open for discussion.

@tomhughes
Copy link
Member

I mean if we're going to do it then this is probably the right approach though I'll need to brush up on freetext searching in postgres before I can comment on the details.

We do already have support for freetext searching on notes as a model though it may not be a good one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants