Skip to content

Switch from elasticsearch to PostgreSQL text search #465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

enricostano
Copy link
Contributor

@enricostano enricostano commented Feb 21, 2019

WAT

Elasticsearch is a dependency difficult to maintain and having to maintain a mirrored copy of the resources is also difficult, as you can see looking at the code deleted by this PR.

We already have PostgreSQL as a dependency and since our needs are quite simple we'll use the PostgreSQL built in Full Text Search.

The gem that I'm proposing is https://github.com/Casecommons/pg_search

Depends on

coopdevs/timeoverflow-provisioning#62

Extra

Closes #293

@enricostano enricostano self-assigned this Feb 21, 2019
@enricostano enricostano force-pushed the feature/switch-from-elastic-to-pg-search-la-buena branch 2 times, most recently from ea948e5 to 0ffd3ae Compare February 21, 2019 22:30
Copy link
Collaborator

@markets markets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@markets markets self-requested a review February 22, 2019 00:10
@sseerrggii
Copy link
Contributor

sseerrggii commented Feb 22, 2019

I was playing with it 😉

❌ I noticed that you can't search part of a word. If I search "pequeña" I don't have Posts with "pequeñas" word inside, this a difference with the current usage with ES

Worked fine:

  • Case insensitive ✔️
  • Accent insensitive ✔️
  • Only show active Posts ✔️
  • Edit post, changing keyword and search ✔️

Tested:

Search logged out:

Extra (the current behaviour with ES don't cover this)

When user is logged out we want to filter by organization https://staging.timeoverflow.org/offers?org=2&q=prueba but don't apply org filter (only https://staging.timeoverflow.org/offers?org=2 https://staging.timeoverflow.org/offers?org=1 is filtering by organization).

Analogously near search box you can filter by category ?cat=1 we want to accumulate the filters like https://staging.timeoverflow.org/offers?org=2&cat=1&q=pruebas search Posts with keyword 'pruebas' in organization 2 and category 1

Search logged in in org 1:

Search logged in in org 2:

Same URLs test, only to ensure that org filter is working, and it is ✔️

@markets
Copy link
Collaborator

markets commented Mar 1, 2019

@enricostano I think we can solve that issue by using one of the following settings: https://github.com/Casecommons/pg_search#searching-using-different-search-features (some of them need extra PostgreSQL extensions). I'd try with :tsearch first, and then (if no luck) with :trigram (smarter, but requires an extension).

@markets
Copy link
Collaborator

markets commented Mar 7, 2019

@sseerrggii @enricostano Pushed #467 with partial word search working, via :tsearch as commented above.

@markets
Copy link
Collaborator

markets commented Mar 11, 2019

@sseerrggii @enricostano I think we can leave the "Extra" requirements proposed by Sergi for another branch, so we can move on this one (which simplifies our infra a lot 👌). What do you think?

@markets
Copy link
Collaborator

markets commented Mar 14, 2019

@enricostano @sseerrggii updated with develop and fixed conflicts!

@sseerrggii
Copy link
Contributor

🍏 Tested 🍏

@enricostano
Copy link
Contributor Author

I would like to test the performance of this new approach using a production db dump, with hundreds of posts to search from.

@enricostano enricostano removed the wip label Mar 15, 2019
@sseerrggii
Copy link
Contributor

sseerrggii commented Mar 18, 2019

Speed Test (time to load page in chrome console)

Logged in, search "angles" (search only in one organization's posts)

  • With Elastic Search: 467m
  • PostgreSQL text search: 644ms

Logged in, search "las" (search only in one organization's posts)

  • With Elastic Search: 718ms
  • PostgreSQL text search: 566ms

Logged out, search "angles" (search in all posts)

  • With Elastic Search: 416ms
  • PostgreSQL text search: 2,16s

Logged out, search "las" (search in all posts)

  • With Elastic Search: 735ms
  • PostgreSQL text search: 2,11s

@Morantron
Copy link
Collaborator

might be interesting to play with this https://www.postgresql.org/docs/9.5/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

@enricostano
Copy link
Contributor Author

might be interesting to play with this https://www.postgresql.org/docs/9.5/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX

Yep, I just wanted to do some benchmark first.

Even if "logged out search" (hitting the whole Post table) is not used by users (@sseerrggii 's says... I would like some numbers instead), I would look into indexes in any case.

@enricostano
Copy link
Contributor Author

enricostano commented Mar 23, 2019

Relevant blog posts in order to tune our config:

https://www.compose.com/articles/indexing-for-full-text-search-in-postgresql/
http://rachbelaid.com/postgres-full-text-search-is-good-enough/ (multi language)

EDIT: for indexing look also at https://github.com/Casecommons/pg_search/wiki/Building-indexes (maybe a bit outdated)
EDIT 2: more info on GiST and GIN indexes

@enricostano
Copy link
Contributor Author

Basically in order to improve performance we need to:

  1. store the ts_vector instead of generating it every time
  2. create an index for the ts_vector field

@enricostano
Copy link
Contributor Author

enricostano commented May 23, 2019

Some more benchmarking:

  1. without index
  2. with index

@enricostano enricostano force-pushed the feature/switch-from-elastic-to-pg-search-la-buena branch from e4a4b54 to aab779c Compare May 23, 2019 23:24
@enricostano
Copy link
Contributor Author

Post.active.of_active_members.search_by_query('angles').explain

@sauloperez 👆 I'll try to add a couple of indexes on posts.active and members.active

@enricostano enricostano force-pushed the feature/switch-from-elastic-to-pg-search-la-buena branch from f6dcd78 to 5d0ec9e Compare May 24, 2019 19:20
@enricostano
Copy link
Contributor Author

I tried adding those indexes but since both columns are boolean PG never use them....

@enricostano
Copy link
Contributor Author

@sseerrggii says that's tested enough 😅

Let's merge it!

@enricostano enricostano merged commit 9f7b99a into develop May 24, 2019
@enricostano enricostano deleted the feature/switch-from-elastic-to-pg-search-la-buena branch May 24, 2019 19:44
@markets
Copy link
Collaborator

markets commented May 25, 2019

Nice job @enricostano and @sseerrggii 👏 this is a very good infra and project change!

enricostano added a commit that referenced this pull request Sep 20, 2019
…stic-to-pg-search-la-buena"

This reverts commit 9f7b99a, reversing
changes made to ad9a8f0.
@sauloperez sauloperez mentioned this pull request Dec 4, 2019
@markets markets restored the feature/switch-from-elastic-to-pg-search-la-buena branch October 6, 2020 18:52
@markets markets deleted the feature/switch-from-elastic-to-pg-search-la-buena branch October 6, 2020 18:56
@markets markets restored the feature/switch-from-elastic-to-pg-search-la-buena branch October 6, 2020 18:58
@markets markets deleted the feature/switch-from-elastic-to-pg-search-la-buena branch October 6, 2020 19:08
markets added a commit that referenced this pull request Oct 6, 2020
…from-elastic-to-pg-search-la-buena""

This reverts commit 096fcce.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make tests not to rely on the network
5 participants