Last active
December 24, 2018 09:02
-
-
Save yozzz/da9023a322e660270b3c7fa2571da636 to your computer and use it in GitHub Desktop.
Elasticsearch suggestions. Example of Postgresql function that split text into the separated phrases
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| -- If have to deal with elasticsearch suggestoins | |
| -- there is nice article about them https://hackernoon.com/elasticsearch-using-completion-suggester-to-build-autocomplete-e9c120cf6d87 | |
| -- but you may face one tricky issue about them | |
| -- e.g. you have completion field 'album' with value "The Prodigy: Reunion" | |
| -- if you will search with suggestion "The Prodigy" it will return result for you | |
| -- but if you will try to search by just "Prodigy" you will get no results | |
| -- this is beacause copmpletion suggester start matchig only from the start of the string | |
| -- https://hackernoon.com/elasticsearch-using-completion-suggester-to-build-autocomplete-e9c120cf6d87#2f5e | |
| -- to fix this problem we can split string into the three separated phrases | |
| -- 1) "The Prodigy: Reunion" | |
| -- 2) "Prodigy: Reunion" | |
| -- 3) "Reunion" | |
| -- and put them inside completion field into index as an array ["The Prodigy Reunion", "Prodigy Reunion", "Reunion"] | |
| CREATE OR REPLACE FUNCTION split_words (words CHAR) | |
| RETURNS text[] AS $$ | |
| DECLARE | |
| arr TEXT[] := regexp_split_to_array(words, E'\\s+'); | |
| total_words INTEGER := array_length(arr, 1); | |
| arr2 TEXT[]; | |
| words TEXT; | |
| iterator INTEGER := 0; | |
| iterator2 INTEGER := 0; | |
| step INTEGER := 1; | |
| words_count INTEGER := 0; | |
| BEGIN | |
| FOR i IN 1 .. total_words | |
| LOOP | |
| words_count := total_words - iterator; | |
| iterator2 := step; | |
| iterator := iterator + 1; | |
| words := NULL; | |
| FOR j IN 1 .. words_count | |
| LOOP | |
| CASE | |
| WHEN j < words_count | |
| THEN words := CONCAT(words, arr[iterator2], ' '); | |
| WHEN j = words_count | |
| THEN words := CONCAT(words, arr[iterator2]); | |
| END CASE; | |
| iterator2 := iterator2 + 1; | |
| END LOOP; | |
| arr2 := array_append(arr2, words); | |
| step := step + 1; | |
| END LOOP ; | |
| RETURN arr2 ; | |
| END ; | |
| $$ LANGUAGE plpgsql; | |
| SELECT split_words('The Prodigy: Reunion'); | |
| -- {The Prodigy Reunion,Prodigy Reunion,Reunion} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment