meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
bors[bot]	afc10acd19	Merge #596 596: Filter operators: NOT + IN[..] r=irevoire a=loiclec # Pull Request ## What does this PR do? Implements the changes described in https://github.com/meilisearch/meilisearch/issues/2580 It is based on top of #556 Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-18 11:24:32 +00:00
Loïc Lecrenier	9b6602cba2	Avoid cloning FilterCondition in filter array parsing	2022-08-18 13:06:57 +02:00
Loïc Lecrenier	c51dcad51b	Don't recompute filterable fields in evaluation of IN[] filter	2022-08-18 10:59:21 +02:00
Irevoire	4aae07d5f5	expose the size methods	2022-08-17 17:07:38 +02:00
Irevoire	e96b852107	bump heed	2022-08-17 17:05:50 +02:00
bors[bot]	087da5621a	Merge #587 587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test. ## Why is it faster? Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations: 1. reusing allocations 2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word 3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist. ## What bug was fixed? When reindexing, the `new_prefix_fst_words` prefixes may look like: ``` ["ant", "axo", "bor"] ``` which we group by first letter: ``` [["ant", "axo"], ["bor"]] ``` Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`. ## StrStrU8Codec I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change. ## Requests when reviewing this PR I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not. I also have a couple questions regarding the code itself: - Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified. - I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead? ## Benchmark Results We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%. ``` group indexing_main_ce90fc62 indexing_word-prefix-pair-proximity-docids-refactor_cbad2023 ----- ---------------------- ------------------------------------------------------------ indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.00 1893.0±233.03µs ? ?/sec 1.01 1921.2±260.79µs ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.05 9.4±3.51ms ? ?/sec 1.00 9.0±2.14ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.22 18.3±11.42ms ? ?/sec 1.00 15.0±5.79ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 41.4±4.20ms ? ?/sec 1.28 53.0±13.97ms ? ?/sec indexing/-wiki-delete-searchable- 1.00 285.6±18.12ms ? ?/sec 1.03 293.1±16.09ms ? ?/sec indexing/Indexing geo_point 1.03 60.8±0.45s ? ?/sec 1.00 58.8±0.68s ? ?/sec indexing/Indexing movies in three batches 1.14 16.5±0.30s ? ?/sec 1.00 14.5±0.24s ? ?/sec indexing/Indexing movies with default settings 1.11 13.7±0.07s ? ?/sec 1.00 12.3±0.28s ? ?/sec indexing/Indexing nested movies with default settings 1.10 10.6±0.11s ? ?/sec 1.00 9.6±0.15s ? ?/sec indexing/Indexing nested movies without any facets 1.11 9.4±0.15s ? ?/sec 1.00 8.5±0.10s ? ?/sec indexing/Indexing songs in three batches with default settings 1.18 66.2±0.39s ? ?/sec 1.00 56.0±0.67s ? ?/sec indexing/Indexing songs with default settings 1.07 58.7±1.26s ? ?/sec 1.00 54.7±1.71s ? ?/sec indexing/Indexing songs without any facets 1.08 53.1±0.88s ? ?/sec 1.00 49.3±1.43s ? ?/sec indexing/Indexing songs without faceted numbers 1.08 57.7±1.33s ? ?/sec 1.00 53.3±0.98s ? ?/sec indexing/Indexing wiki 1.06 1051.1±21.46s ? ?/sec 1.00 989.6±24.55s ? ?/sec indexing/Indexing wiki in three batches 1.20 1184.8±8.93s ? ?/sec 1.00 989.7±7.06s ? ?/sec indexing/Reindexing geo_point 1.04 67.5±0.75s ? ?/sec 1.00 64.9±0.32s ? ?/sec indexing/Reindexing movies with default settings 1.12 13.9±0.17s ? ?/sec 1.00 12.4±0.13s ? ?/sec indexing/Reindexing songs with default settings 1.05 60.6±0.84s ? ?/sec 1.00 57.5±0.99s ? ?/sec indexing/Reindexing wiki 1.07 1725.0±17.92s ? ?/sec 1.00 1611.4±9.90s ? ?/sec ``` Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 14:06:12 +00:00
bors[bot]	fb95e67a2a	Merge #608 608: Fix soft deleted documents r=ManyTheFish a=ManyTheFish When we replaced or updated some documents, the indexing was skipping the replaced documents. Related to https://github.com/meilisearch/meilisearch/issues/2672 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-08-17 13:38:10 +00:00
bors[bot]	e4a52e6e45	Merge #594 594: Fix(Search): Fix phrase search candidates computation r=Kerollmops a=ManyTheFish This bug is an old bug but was hidden by the proximity criterion, Phrase searches were always returning an empty candidates list when the proximity criterion is deactivated. Before the fix, we were trying to find any words[n] near words[n] instead of finding any words[n] near words[n+1], for example: for a phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world". Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-08-17 13:22:52 +00:00
ManyTheFish	8c3f1a9c39	Remove useless lifetime declaration	2022-08-17 15:20:43 +02:00
ManyTheFish	e9e2349ce6	Fix typo in comment	2022-08-17 15:09:48 +02:00
ManyTheFish	2668f841d1	Fix update indexing	2022-08-17 15:03:37 +02:00
ManyTheFish	7384650d85	Update test to showcase the bug	2022-08-17 15:03:08 +02:00
bors[bot]	39869be23b	Merge #590 590: Optimise facets indexing r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Fixes #589 ## Notes I added documentation for the whole module which attempts to explain the shape of the databases and their purpose. However, I realise there is already some documentation about this, so I am not sure if we want to keep it. ## Benchmarks We get a ~1.15x speed up on the geo_point benchmark. ``` group indexing_main_57042355 indexing_optimise-facets-indexation_5728619a ----- ---------------------- -------------------------------------------- indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.00 1862.7±294.45µs ? ?/sec 1.58 2.9±1.32ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.11 8.9±2.44ms ? ?/sec 1.00 8.0±1.42ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 12.8±3.32ms ? ?/sec 1.32 16.9±6.98ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.09 43.8±4.78ms ? ?/sec 1.00 40.3±3.79ms ? ?/sec indexing/-wiki-delete-searchable- 1.08 287.4±28.72ms ? ?/sec 1.00 264.9±9.46ms ? ?/sec indexing/Indexing geo_point 1.14 61.2±0.39s ? ?/sec 1.00 53.8±0.57s ? ?/sec indexing/Indexing movies in three batches 1.00 16.6±0.12s ? ?/sec 1.00 16.5±0.10s ? ?/sec indexing/Indexing movies with default settings 1.00 14.1±0.30s ? ?/sec 1.00 14.0±0.28s ? ?/sec indexing/Indexing nested movies with default settings 1.10 10.9±0.50s ? ?/sec 1.00 10.0±0.10s ? ?/sec indexing/Indexing nested movies without any facets 1.01 9.6±0.23s ? ?/sec 1.00 9.5±0.06s ? ?/sec indexing/Indexing songs in three batches with default settings 1.07 66.3±0.55s ? ?/sec 1.00 61.8±0.63s ? ?/sec indexing/Indexing songs with default settings 1.03 58.8±0.82s ? ?/sec 1.00 57.1±1.22s ? ?/sec indexing/Indexing songs without any facets 1.00 53.6±1.09s ? ?/sec 1.01 54.0±0.58s ? ?/sec indexing/Indexing songs without faceted numbers 1.02 58.0±1.29s ? ?/sec 1.00 57.1±1.43s ? ?/sec indexing/Indexing wiki 1.00 1064.1±21.20s ? ?/sec 1.00 1068.0±20.49s ? ?/sec indexing/Indexing wiki in three batches 1.00 1182.5±9.62s ? ?/sec 1.01 1191.2±10.96s ? ?/sec indexing/Reindexing geo_point 1.12 68.0±0.21s ? ?/sec 1.00 60.5±0.82s ? ?/sec indexing/Reindexing movies with default settings 1.01 14.1±0.21s ? ?/sec 1.00 14.0±0.26s ? ?/sec indexing/Reindexing songs with default settings 1.04 61.6±0.57s ? ?/sec 1.00 59.2±0.87s ? ?/sec indexing/Reindexing wiki 1.00 1734.0±11.38s ? ?/sec 1.01 1746.6±22.48s ? ?/sec ``` Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 11:46:55 +00:00
Loïc Lecrenier	6cc975704d	Add some documentation to facets.rs	2022-08-17 12:59:52 +02:00
Loïc Lecrenier	93252769af	Apply review suggestions	2022-08-17 12:41:22 +02:00
Loïc Lecrenier	196f79115a	Run cargo fmt	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	d10d78d520	Add integration tests for the IN filter	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	ca97cb0eda	Implement the IN filter operator	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	cc7415bb31	Simplify FilterCondition code, made possible by the new NOT operator	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	44744d9e67	Implement the simplified NOT operator	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	01675771d5	Reimplement `!=` filter to select all docids not selected by `=`	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	258c3dd563	Make AND+OR filters n-ary (store a vector of subfilters instead of 2) NOTE: The token_at_depth is method is a bit useless now, as the only cases where there would be a toke at depth 1000 are the cases where the parser already stack-overflowed earlier. Example: (((((... (x=1) ...)))))	2022-08-17 12:28:33 +02:00
Loïc Lecrenier	39687908f1	Add documentation and comments to facets.rs	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	8d4b21a005	Switch string facet levels indexation to new algo Write the algorithm once for both numbers and strings	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	cf0cd92ed4	Refactor Facets::execute to increase performance	2022-08-17 12:26:49 +02:00
bors[bot]	cd2635ccfc	Merge #602 602: Use mimalloc as the default allocator r=Kerollmops a=loiclec ## What does this PR do? Use mimalloc as the global allocator for milli's benchmarks on macOS. ## Why? On Linux, we use jemalloc, which is a very fast allocator. But on macOS, we currently use the system allocator, which is very slow. In practice, this difference in allocator speed means that it is difficult to gain insight into milli's performance by running benchmarks locally on the Mac. By using mimalloc, which is another excellent allocator, we reduce the speed difference between the two platforms. Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 10:26:13 +00:00
Loïc Lecrenier	78d9f0622d	cargo fmt	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	4f9edf13d7	Remove commented-out function	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	405555b401	Add some documentation to PrefixTrieNode	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	1bc4788e59	Remove cached Allocations struct from wpppd indexing	2022-08-17 12:18:22 +02:00
Loïc Lecrenier	ef75a77464	Fix undefined behaviour caused by reusing key from the database New full snapshot: --- source: milli/src/update/word_prefix_pair_proximity_docids.rs --- 5 a 1 [101, ] 5 a 2 [101, ] 5 am 1 [101, ] 5 b 4 [101, ] 5 be 4 [101, ] am a 3 [101, ] amazing a 1 [100, ] amazing a 2 [100, ] amazing a 3 [100, ] amazing an 1 [100, ] amazing an 2 [100, ] amazing b 2 [100, ] amazing be 2 [100, ] an a 1 [100, ] an a 2 [100, 202, ] an am 1 [100, ] an an 2 [100, ] an b 3 [100, ] an be 3 [100, ] and a 2 [100, ] and a 3 [100, ] and a 4 [100, ] and am 2 [100, ] and an 3 [100, ] and b 1 [100, ] and be 1 [100, ] at a 1 [100, 202, ] at a 2 [100, 101, ] at a 3 [100, ] at am 2 [100, 101, ] at an 1 [100, 202, ] at an 3 [100, ] at b 3 [101, ] at b 4 [100, ] at be 3 [101, ] at be 4 [100, ] beautiful a 2 [100, ] beautiful a 3 [100, ] beautiful a 4 [100, ] beautiful am 3 [100, ] beautiful an 2 [100, ] beautiful an 4 [100, ] bell a 2 [101, ] bell a 4 [101, ] bell am 4 [101, ] extraordinary a 2 [202, ] extraordinary a 3 [202, ] extraordinary an 2 [202, ] house a 3 [100, 202, ] house a 4 [100, 202, ] house am 4 [100, ] house an 3 [100, 202, ] house b 2 [100, ] house be 2 [100, ] rings a 1 [101, ] rings a 3 [101, ] rings am 3 [101, ] rings b 2 [101, ] rings be 2 [101, ] the a 3 [101, ] the b 1 [101, ] the be 1 [101, ]	2022-08-17 12:17:45 +02:00
Loïc Lecrenier	7309111433	Don't run block code in doc tests of word_pair_proximity_docids	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	f6f8f543e1	Run cargo fmt	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	34c991ea02	Add newlines in documentation of word_prefix_pair_proximity_docids	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	06f3fd8c6d	Add more comments to WordPrefixPairProximityDocids::execute	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	474500362c	Update wpppd snapshots New snapshot (yes, it's wrong as well, it will get fixed later): --- source: milli/src/update/word_prefix_pair_proximity_docids.rs --- 5 a 1 [101, ] 5 a 2 [101, ] 5 am 1 [101, ] 5 b 4 [101, ] 5 be 4 [101, ] am a 3 [101, ] amazing a 1 [100, ] amazing a 2 [100, ] amazing a 3 [100, ] amazing an 1 [100, ] amazing an 2 [100, ] amazing b 2 [100, ] amazing be 2 [100, ] an a 1 [100, ] an a 2 [100, 202, ] an am 1 [100, ] an b 3 [100, ] an be 3 [100, ] and a 2 [100, ] and a 3 [100, ] and a 4 [100, ] and b 1 [100, ] and be 1 [100, ] d\0 0 [100, 202, ] an an 2 [100, ] and am 2 [100, ] and an 3 [100, ] at a 2 [100, 101, ] at a 3 [100, ] at am 2 [100, 101, ] at an 1 [100, 202, ] at an 3 [100, ] at b 3 [101, ] at b 4 [100, ] at be 3 [101, ] at be 4 [100, ] beautiful a 2 [100, ] beautiful a 3 [100, ] beautiful a 4 [100, ] beautiful am 3 [100, ] beautiful an 2 [100, ] beautiful an 4 [100, ] bell a 2 [101, ] bell a 4 [101, ] bell am 4 [101, ] extraordinary a 2 [202, ] extraordinary a 3 [202, ] extraordinary an 2 [202, ] house a 4 [100, 202, ] house a 4 [100, ] house am 4 [100, ] house an 3 [100, 202, ] house b 2 [100, ] house be 2 [100, ] rings a 1 [101, ] rings a 3 [101, ] rings am 3 [101, ] rings b 2 [101, ] rings be 2 [101, ] the a 3 [101, ] the b 1 [101, ] the be 1 [101, ]	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	ea4a96761c	Move content of readme for WordPrefixPairProximityDocids into the code	2022-08-17 12:05:37 +02:00
Loïc Lecrenier	220921628b	Simplify and document WordPrefixPairProximityDocIds::execute	2022-08-17 11:59:19 +02:00
Loïc Lecrenier	044356d221	Optimise WordPrefixPairProximityDocIds merge operation	2022-08-17 11:59:18 +02:00
Loïc Lecrenier	d350114159	Add tests for WordPrefixPairProximityDocIds	2022-08-17 11:59:15 +02:00
Loïc Lecrenier	86807ca848	Refactor word prefix pair proximity indexation further	2022-08-17 11:59:13 +02:00
Loïc Lecrenier	306593144d	Refactor word prefix pair proximity indexation	2022-08-17 11:59:00 +02:00
Loïc Lecrenier	20be69e1b9	Always use mimalloc as the global allocator	2022-08-16 20:09:36 +02:00
Loïc Lecrenier	dea00311b6	Add type annotations to remove compiler error	2022-08-16 09:19:30 +02:00
Loïc Lecrenier	6f49126223	Fix db_snap macro with inline parameter	2022-08-10 15:55:22 +02:00
Loïc Lecrenier	12920f2a4f	Fix paths of snapshot tests	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	4b7fd4dfae	Update insta version	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	ce560fdcb5	Add documentation for `db_snap!`	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	748bb86b5b	cargo fmt	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	051f24f674	Switch to snapshot tests for search/matches/mod.rs	2022-08-10 15:53:46 +02:00

1 2 3 4 5 ...

995 Commits