meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-27 12:35:05 +08:00

Author	SHA1	Message	Date
Loïc Lecrenier	b7f2428961	Fix formatting and warning after rebasing from main	2022-10-26 13:49:33 +02:00
Loïc Lecrenier	14ca8048a8	Add some documentation on how to run the facet db fuzzer	2022-10-26 13:48:01 +02:00
Loïc Lecrenier	f198b20c42	Add facet deletion tests that use both the incremental and bulk methods + update deletion snapshots to the new database format	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	e3ba1fc883	Make deletion tests for both soft-deletion and hard-deletion	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	ab5e56fd16	Add document deletion snapshot tests and tests for hard-deletion	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	d885de1600	Add option to avoid soft deletion of documents	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	2295e0e3ce	Use real delete function in facet indexing fuzz tests, deleting multiple docids at once instead of one by one	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	acc8caebe6	Add link to GitHub PR to document of update/facet module	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	a034a1e628	Move StrRefCodec and ByteSliceRefCodec to their own files	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	1165ba2171	Make facet deletion incremental	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	51961e1064	Polish some details	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	cb8442a119	Further unify facet databases of f64s and strings	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	3baa34d842	Fix compiler errors/warnings	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	86d9f50b9c	Fix bugs in incremental facet indexing with variable parameters, e.g. add one facet value incrementally with group_size = X and then add another one with group_size = Y. It is not actually possible to do so with the public API of milli, but I wanted to make sure the algorithm worked well in those cases anyway. The bugs were found by fuzzing the code with fuzzcheck, which I've added to milli as a conditional dev-dependency, but it can be removed later.	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	985a94adfc	cargo fmt	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	b1ab09196c	Remove outdated TODOs	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	27454e9828	Document and refine facet indexing algorithms	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	bee3c23b45	Add comparison benchmark between bulk and incremental facet indexing	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	b2f01ad204	Refactor facet database tests	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	9026867d17	Give same interface to bulk and incremental facet indexing types + cargo fmt, oops, sorry for the bad history :(	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	330c9eb1b2	Rename facet codecs and refine FacetsUpdate API	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	485a72306d	Refactor facet-related codecs	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	9b55e582cd	Add FacetsUpdate type that wraps incremental and bulk indexing methods	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	3d145d7f48	Merge the two <facettype>_faceted_documents_ids methods into one	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	079ed4a992	Add more snapshots	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	afdf87f6f7	Fix bugs in asc/desc criterion and facet indexing	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	a7201ece04	cargo fmt	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	36296bbb20	Add facet incremental indexing snapshot tests + fix bug	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	07ff92c663	Add more snapshots from facet tests	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	61252248fb	Fix some facet indexing bugs	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	68cbcdf08b	Fix compile errors/warnings in http-ui and infos	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	85824ee203	Try to make facet indexing incremental	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	d30c89e345	Fix compile error+warnings in new tests	2022-10-26 13:46:46 +02:00
Loïc Lecrenier	e8a156d682	Reorganise facets database indexing code	2022-10-26 13:46:46 +02:00
Loïc Lecrenier	bd2c0e1ab6	Remove unused code	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	39a4a0a362	Reintroduce filter range search and facet extractors	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	22d80eeaf9	Reintroduce facet deletion functionality	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	6cc91824c1	Remove unused heed codec files	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	63ef0aba18	Start porting facet distribution and sort to new database structure	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	7913d6365c	Update Facets indexing to be compatible with new database structure	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	c3f49f766d	Prepare refactor of facets database	2022-10-26 13:46:14 +02:00
bors[bot]	c8f16530d5	Merge #616 616: Introduce an indexation abortion function when indexing documents r=Kerollmops a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-10-26 11:41:18 +00:00
Ewan Higgs	2ce025a906	Fixes after rebase to fix new issues.	2022-10-25 20:58:31 +02:00
Ewan Higgs	17f7922bfc	Remove unneeded lifetimes.	2022-10-25 20:49:04 +02:00
Ewan Higgs	6b2fe94192	Fixes for clippy bringing us down to 18 remaining issues. This brings us a step closer to enforcing clippy on each build.	2022-10-25 20:49:02 +02:00
Loïc Lecrenier	9a569d73d1	Minor code style change	2022-10-24 15:30:43 +02:00
Loïc Lecrenier	d76d0cb1bf	Merge branch 'main' into word-pair-proximity-docids-refactor	2022-10-24 15:23:00 +02:00
Loïc Lecrenier	a983129613	Apply suggestions from code review	2022-10-20 09:49:37 +02:00
Loïc Lecrenier	ab2f6f3aa4	Refine some details in word_prefix_pair_proximity indexing code	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	178d00f93a	Cargo fmt	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	072b576514	Fix proximity value in keys of prefix_word_pair_proximity_docids	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	6c3a5d69e1	Update snapshots	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	a7de4f5b85	Don't add swapped word pairs to the word_pair_proximity_docids db	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	264a04922d	Add prefix_word_pair_proximity database. Similar to the word_prefix_pair_proximity one, but the keys are instead (proximity, prefix, word2)	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	1dbbd8694f	Rename StrStrU8Codec to U8StrStrCodec and reorder its fields	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	bdeb47305e	Change encoding of word_pair_proximity DB to (proximity, word1, word2). Same for word_prefix_pair_proximity	2022-10-18 10:37:34 +02:00
Kerollmops	6603437cb1	Introduce an indexation abortion function when indexing documents	2022-10-17 17:28:03 +02:00
Ewan Higgs	beb987d3d1	Fixing piles of clippy errors. Most of these are calling clone when the struct supports Copy. Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed. I tried to stay away from actual changes or places where I'd have to name fresh variables.	2022-10-13 22:02:54 +02:00
msvaljek	762e320c35	Add proximity calculation for the same word	2022-10-07 12:59:12 +02:00
vishalsodani	00c02d00f3	Add missing logging timer to extractors	2022-09-30 22:17:06 +05:30
bors[bot]	15d478cf4d	Merge #635 635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing. In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used. Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-09-14 12:00:52 +00:00
Loïc Lecrenier	3794962330	Use an unstable algorithm for grenad::Sorter when possible	2022-09-13 14:49:53 +02:00
Kerollmops	d4d7c9d577	We avoid skipping errors in the indexing pipeline	2022-09-13 14:03:00 +02:00
Kerollmops	fe3973a51c	Make sure that long words are correctly skipped	2022-09-07 15:03:32 +02:00
Kerollmops	c83c3cd796	Add a test to make sure that long words are correctly skipped	2022-09-07 14:12:36 +02:00
ManyTheFish	5391e3842c	replace optional_words by term_matching_strategy	2022-08-22 17:47:19 +02:00
ManyTheFish	9640976c79	Rename TermMatchingPolicies	2022-08-18 17:36:08 +02:00
Irevoire	4aae07d5f5	expose the size methods	2022-08-17 17:07:38 +02:00
Irevoire	e96b852107	bump heed	2022-08-17 17:05:50 +02:00
bors[bot]	087da5621a	Merge #587
587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec

# Pull Request

## What does this PR do?
Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test.

## Why is it faster?
Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations:

1. reusing allocations
2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word
3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist.

## What bug was fixed?
When reindexing, the `new_prefix_fst_words` prefixes may look like:
```
["ant", "axo", "bor"]
```
which we group by first letter:
```
[["ant", "axo"], ["bor"]]
```
Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`.

## StrStrU8Codec
I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change.

## Requests when reviewing this PR
I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not.

I also have a couple questions regarding the code itself:
- Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified.
- I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead?

## Benchmark Results
We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%.
```
group   indexing_main_ce90fc62   indexing_word-prefix-pair-proximity-docids-refactor_cbad2023
-----   ----------------------   ------------------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-   1.00 1893.0±233.03µs ? ?/sec   1.01 1921.2±260.79µs ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-   1.05 9.4±3.51ms ? ?/sec   1.00 9.0±2.14ms ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-   1.22 18.3±11.42ms ? ?/sec   1.00 15.0±5.79ms ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-   1.00 41.4±4.20ms ? ?/sec   1.28 53.0±13.97ms ? ?/sec
indexing/-wiki-delete-searchable-   1.00 285.6±18.12ms ? ?/sec   1.03 293.1±16.09ms ? ?/sec
indexing/Indexing geo_point   1.03 60.8±0.45s ? ?/sec   1.00 58.8±0.68s ? ?/sec
indexing/Indexing movies in three batches   1.14 16.5±0.30s ? ?/sec   1.00 14.5±0.24s ? ?/sec
indexing/Indexing movies with default settings   1.11 13.7±0.07s ? ?/sec   1.00 12.3±0.28s ? ?/sec
indexing/Indexing nested movies with default settings   1.10 10.6±0.11s ? ?/sec   1.00 9.6±0.15s ? ?/sec
indexing/Indexing nested movies without any facets   1.11 9.4±0.15s ? ?/sec   1.00 8.5±0.10s ? ?/sec
indexing/Indexing songs in three batches with default settings   1.18 66.2±0.39s ? ?/sec   1.00 56.0±0.67s ? ?/sec
indexing/Indexing songs with default settings   1.07 58.7±1.26s ? ?/sec   1.00 54.7±1.71s ? ?/sec
indexing/Indexing songs without any facets   1.08 53.1±0.88s ? ?/sec   1.00 49.3±1.43s ? ?/sec
indexing/Indexing songs without faceted numbers   1.08 57.7±1.33s ? ?/sec   1.00 53.3±0.98s ? ?/sec
indexing/Indexing wiki   1.06 1051.1±21.46s ? ?/sec   1.00 989.6±24.55s ? ?/sec
indexing/Indexing wiki in three batches   1.20 1184.8±8.93s ? ?/sec   1.00 989.7±7.06s ? ?/sec
indexing/Reindexing geo_point   1.04 67.5±0.75s ? ?/sec   1.00 64.9±0.32s ? ?/sec
indexing/Reindexing movies with default settings   1.12 13.9±0.17s ? ?/sec   1.00 12.4±0.13s ? ?/sec
indexing/Reindexing songs with default settings   1.05 60.6±0.84s ? ?/sec   1.00 57.5±0.99s ? ?/sec
indexing/Reindexing wiki   1.07 1725.0±17.92s ? ?/sec   1.00 1611.4±9.90s ? ?/sec
```
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 14:06:12 +00:00
bors[bot]	fb95e67a2a	Merge #608 608: Fix soft deleted documents r=ManyTheFish a=ManyTheFish When we replaced or updated some documents, the indexing was skipping the replaced documents. Related to https://github.com/meilisearch/meilisearch/issues/2672 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-08-17 13:38:10 +00:00
ManyTheFish	e9e2349ce6	Fix typo in comment	2022-08-17 15:09:48 +02:00
ManyTheFish	2668f841d1	Fix update indexing	2022-08-17 15:03:37 +02:00
ManyTheFish	7384650d85	Update test to showcase the bug	2022-08-17 15:03:08 +02:00
Loïc Lecrenier	6cc975704d	Add some documentation to facets.rs	2022-08-17 12:59:52 +02:00
Loïc Lecrenier	93252769af	Apply review suggestions	2022-08-17 12:41:22 +02:00
Loïc Lecrenier	39687908f1	Add documentation and comments to facets.rs	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	8d4b21a005	Switch string facet levels indexation to new algo. Write the algorithm once for both numbers and strings	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	cf0cd92ed4	Refactor Facets::execute to increase performance	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	78d9f0622d	cargo fmt	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	4f9edf13d7	Remove commented-out function	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	405555b401	Add some documentation to PrefixTrieNode	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	1bc4788e59	Remove cached Allocations struct from wpppd indexing	2022-08-17 12:18:22 +02:00
Loïc Lecrenier	ef75a77464	Fix undefined behaviour caused by reusing key from the database New full snapshot: --- source: milli/src/update/word_prefix_pair_proximity_docids.rs --- 5 a 1 [101, ] 5 a 2 [101, ] 5 am 1 [101, ] 5 b 4 [101, ] 5 be 4 [101, ] am a 3 [101, ] amazing a 1 [100, ] amazing a 2 [100, ] amazing a 3 [100, ] amazing an 1 [100, ] amazing an 2 [100, ] amazing b 2 [100, ] amazing be 2 [100, ] an a 1 [100, ] an a 2 [100, 202, ] an am 1 [100, ] an an 2 [100, ] an b 3 [100, ] an be 3 [100, ] and a 2 [100, ] and a 3 [100, ] and a 4 [100, ] and am 2 [100, ] and an 3 [100, ] and b 1 [100, ] and be 1 [100, ] at a 1 [100, 202, ] at a 2 [100, 101, ] at a 3 [100, ] at am 2 [100, 101, ] at an 1 [100, 202, ] at an 3 [100, ] at b 3 [101, ] at b 4 [100, ] at be 3 [101, ] at be 4 [100, ] beautiful a 2 [100, ] beautiful a 3 [100, ] beautiful a 4 [100, ] beautiful am 3 [100, ] beautiful an 2 [100, ] beautiful an 4 [100, ] bell a 2 [101, ] bell a 4 [101, ] bell am 4 [101, ] extraordinary a 2 [202, ] extraordinary a 3 [202, ] extraordinary an 2 [202, ] house a 3 [100, 202, ] house a 4 [100, 202, ] house am 4 [100, ] house an 3 [100, 202, ] house b 2 [100, ] house be 2 [100, ] rings a 1 [101, ] rings a 3 [101, ] rings am 3 [101, ] rings b 2 [101, ] rings be 2 [101, ] the a 3 [101, ] the b 1 [101, ] the be 1 [101, ]	2022-08-17 12:17:45 +02:00
Loïc Lecrenier	7309111433	Don't run block code in doc tests of word_pair_proximity_docids	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	f6f8f543e1	Run cargo fmt	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	34c991ea02	Add newlines in documentation of word_prefix_pair_proximity_docids	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	06f3fd8c6d	Add more comments to WordPrefixPairProximityDocids::execute	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	474500362c	Update wpppd snapshots New snapshot (yes, it's wrong as well, it will get fixed later): --- source: milli/src/update/word_prefix_pair_proximity_docids.rs --- 5 a 1 [101, ] 5 a 2 [101, ] 5 am 1 [101, ] 5 b 4 [101, ] 5 be 4 [101, ] am a 3 [101, ] amazing a 1 [100, ] amazing a 2 [100, ] amazing a 3 [100, ] amazing an 1 [100, ] amazing an 2 [100, ] amazing b 2 [100, ] amazing be 2 [100, ] an a 1 [100, ] an a 2 [100, 202, ] an am 1 [100, ] an b 3 [100, ] an be 3 [100, ] and a 2 [100, ] and a 3 [100, ] and a 4 [100, ] and b 1 [100, ] and be 1 [100, ] d\0 0 [100, 202, ] an an 2 [100, ] and am 2 [100, ] and an 3 [100, ] at a 2 [100, 101, ] at a 3 [100, ] at am 2 [100, 101, ] at an 1 [100, 202, ] at an 3 [100, ] at b 3 [101, ] at b 4 [100, ] at be 3 [101, ] at be 4 [100, ] beautiful a 2 [100, ] beautiful a 3 [100, ] beautiful a 4 [100, ] beautiful am 3 [100, ] beautiful an 2 [100, ] beautiful an 4 [100, ] bell a 2 [101, ] bell a 4 [101, ] bell am 4 [101, ] extraordinary a 2 [202, ] extraordinary a 3 [202, ] extraordinary an 2 [202, ] house a 4 [100, 202, ] house a 4 [100, ] house am 4 [100, ] house an 3 [100, 202, ] house b 2 [100, ] house be 2 [100, ] rings a 1 [101, ] rings a 3 [101, ] rings am 3 [101, ] rings b 2 [101, ] rings be 2 [101, ] the a 3 [101, ] the b 1 [101, ] the be 1 [101, ]	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	ea4a96761c	Move content of readme for WordPrefixPairProximityDocids into the code	2022-08-17 12:05:37 +02:00
Loïc Lecrenier	220921628b	Simplify and document WordPrefixPairProximityDocIds::execute	2022-08-17 11:59:19 +02:00
Loïc Lecrenier	044356d221	Optimise WordPrefixPairProximityDocIds merge operation	2022-08-17 11:59:18 +02:00
Loïc Lecrenier	d350114159	Add tests for WordPrefixPairProximityDocIds	2022-08-17 11:59:15 +02:00
Loïc Lecrenier	86807ca848	Refactor word prefix pair proximity indexation further	2022-08-17 11:59:13 +02:00
Loïc Lecrenier	306593144d	Refactor word prefix pair proximity indexation	2022-08-17 11:59:00 +02:00
Loïc Lecrenier	12920f2a4f	Fix paths of snapshot tests	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	8ac24d3114	Cargo fmt + fix compiler warnings/error	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	6066256689	Add snapshot tests for indexing of word_prefix_pair_proximity_docids	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	3a734af159	Add snapshot tests for Facets::execute	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	58cb1c1bda	Simplify unit tests in facet/filter.rs	2022-08-04 12:03:44 +02:00
Loïc Lecrenier	acff17fb88	Simplify indexing tests	2022-08-04 12:03:13 +02:00
bors[bot]	21284cf235	Merge #556
556: Add EXISTS filter r=loiclec a=loiclec

## What does this PR do?
Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo. It creates a `field EXISTS` filter which selects all documents containing the `field` key. For example, with the following documents:
```json
[{
    "id": 0,
    "colour": []
},
{
    "id": 1,
    "colour": ["blue", "green"]
},
{
    "id": 2,
    "colour": 145238
},
{
    "id": 3,
    "colour": null
},
{
    "id": 4,
    "colour": {
        "green": []
    }
},
{
    "id": 5,
    "colour": {}
},
{
    "id": 6
}]
```
Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`.

## Details
There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists. To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place.

There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:
```json
{
    "id": 0,
    "colour": [],
    "size": {}
}
```
would be flattened to:
```json
{
    "id": 0
}
```
prior to being given to the extraction pipeline. This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value.

For example:
```json
{
    "id": 0,
    "colour": {
        "green": [],
        "blue": 1
    },
    "size": {}
}
```
becomes
```json
{
    "id": 0,
    "colour": [],
    "colour.green": [],
    "colour.blue": 1,
    "size": []
}
```
Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-08-04 09:46:06 +00:00
bors[bot]	50f6524ff2	Merge #579
579: Stop reindexing already indexed documents r=ManyTheFish a=irevoire
```
% ./compare.sh indexing_stop-reindexing-unchanged-documents_cb5a1669.json indexing_main_eeba1960.json
group   indexing_main_eeba1960   indexing_stop-reindexing-unchanged-documents_cb5a1669
-----   ----------------------   -----------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-   1.03 2.0±0.22ms ? ?/sec   1.00 1955.4±336.24µs ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-   1.08 11.0±2.93ms ? ?/sec   1.00 10.2±4.04ms ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-   1.00 15.1±3.89ms ? ?/sec   1.14 17.1±5.18ms ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-   1.26 59.2±12.01ms ? ?/sec   1.00 47.1±8.52ms ? ?/sec
indexing/-wiki-delete-searchable-   1.08 316.6±31.53ms ? ?/sec   1.00 293.6±17.00ms ? ?/sec
indexing/Indexing geo_point   1.01 60.9±0.31s ? ?/sec   1.00 60.6±0.36s ? ?/sec
indexing/Indexing movies in three batches   1.04 20.0±0.30s ? ?/sec   1.00 19.2±0.25s ? ?/sec
indexing/Indexing movies with default settings   1.02 19.1±0.18s ? ?/sec   1.00 18.7±0.24s ? ?/sec
indexing/Indexing nested movies with default settings   1.02 26.2±0.29s ? ?/sec   1.00 25.9±0.22s ? ?/sec
indexing/Indexing nested movies without any facets   1.02 25.3±0.32s ? ?/sec   1.00 24.7±0.26s ? ?/sec
indexing/Indexing songs in three batches with default settings   1.00 66.7±0.41s ? ?/sec   1.01 67.1±0.86s ? ?/sec
indexing/Indexing songs with default settings   1.00 58.3±0.90s ? ?/sec   1.01 58.8±1.32s ? ?/sec
indexing/Indexing songs without any facets   1.00 54.5±1.43s ? ?/sec   1.01 55.2±1.29s ? ?/sec
indexing/Indexing songs without faceted numbers   1.00 57.9±1.20s ? ?/sec   1.01 58.4±0.93s ? ?/sec
indexing/Indexing wiki   1.00 1052.0±10.95s ? ?/sec   1.02 1069.4±20.38s ? ?/sec
indexing/Indexing wiki in three batches   1.00 1193.1±8.83s ? ?/sec   1.00 1189.5±9.40s ? ?/sec
indexing/Reindexing geo_point   3.22 67.5±0.73s ? ?/sec   1.00 21.0±0.16s ? ?/sec
indexing/Reindexing movies with default settings   3.75 19.4±0.28s ? ?/sec   1.00 5.2±0.05s ? ?/sec
indexing/Reindexing songs with default settings   8.90 61.4±0.91s ? ?/sec   1.00 6.9±0.07s ? ?/sec
indexing/Reindexing wiki   1.00 1748.2±35.68s ? ?/sec   1.00 1750.5±18.53s ? ?/sec
```
tldr: We do not lose any performance on the normal indexing benchmark, but we get between 3 and 8 times faster on the reindexing benchmarks 👍
Co-authored-by: Tamo <tamo@meilisearch.com>	2022-08-04 08:10:37 +00:00
ManyTheFish	d6f9a60a32	fix: Remove whitespace trimming during document id validation fix #592	2022-08-03 11:38:40 +02:00
Tamo	7fc35c5586	remove the useless prints	2022-08-02 10:31:22 +02:00
Tamo	f156d7dd3b	Stop reindexing already indexed documents	2022-08-02 10:31:20 +02:00
Loïc Lecrenier	07003704a8	Merge branch 'filter/field-exist'	2022-07-21 14:51:41 +02:00
Loïc Lecrenier	1506683705	Avoid using too much memory when indexing facet-exists-docids	2022-07-19 14:42:35 +02:00
Loïc Lecrenier	aed8c69bcb	Refactor indexation of the "facet-id-exists-docids" database. The idea is to directly create a sorted and merged list of bitmaps in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating a grenad::Reader where the keys are field_id and the values are docids. Then we send that BTreeMap to the thing that handles TypedChunks, which inserts its content into the database.	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	1eb1e73bb3	Add integration tests for the EXISTS filter	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	80b962b4f4	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	c17d616250	Refactor index_documents_check_exists_database tests	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	30bd4db0fc	Simplify indexing task for facet_exists_docids database	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	392472f4bb	Apply suggestions from code review Co-authored-by: Tamo <tamo@meilisearch.com>	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	453d593ce8	Add a database containing the docids where each field exists	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	fc9f3f31e7	Change DocumentsBatchReader to access cursor and index at the same time. Otherwise it is not possible to iterate over all documents while using the fields index at the same time.	2022-07-18 16:08:14 +02:00
Loïc Lecrenier	ab1571cdec	Simplify Transform::read_documents, enabled by enriched documents reader	2022-07-18 12:45:47 +02:00
Kerollmops	448114cc1c	Fix the benchmarks with the new indexation API	2022-07-12 15:22:09 +02:00
Kerollmops	25e768f31c	Fix another issue with the nested primary key selector	2022-07-12 15:14:07 +02:00
Kerollmops	192793ee38	Add some tests to check for the nested documents ids	2022-07-12 15:14:07 +02:00
Kerollmops	dc61105554	Fix the nested document id fetching function	2022-07-12 15:14:06 +02:00
Kerollmops	2eec290424	Check the validity of the latitude and longitude numbers	2022-07-12 15:14:06 +02:00
Kerollmops	5d149d631f	Remove tests for a function that no longer exists	2022-07-12 15:14:06 +02:00
Kerollmops	0bbcc7b180	Expose the `DocumentId` struct to be sure to inject the generated ids	2022-07-12 15:14:06 +02:00
Kerollmops	d1a4da9812	Generate a real UUIDv4 when ids are auto-generated	2022-07-12 15:14:06 +02:00
Kerollmops	c8ebf0de47	Rename the validate function as an enriching function	2022-07-12 15:14:06 +02:00
Kerollmops	905af2a2e9	Use the primary key and external id in the transform	2022-07-12 15:14:05 +02:00
Kerollmops	742543091e	Constify the default primary key name	2022-07-12 14:55:52 +02:00
Kerollmops	5f1bfb73ee	Extract the primary key name and make it accessible	2022-07-12 14:55:52 +02:00
Kerollmops	6a0a0ae94f	Make the Transform read from an EnrichedDocumentsBatchReader	2022-07-12 14:55:52 +02:00
Kerollmops	8ebf5eed0d	Make the nested primary key work	2022-07-12 14:55:52 +02:00
Kerollmops	19eb3b4708	Make sure that we do not accept floats as documents ids	2022-07-12 14:55:52 +02:00
Kerollmops	2ceeb51c37	Support the auto-generated ids when validating documents	2022-07-12 14:55:51 +02:00
Kerollmops	399eec5c01	Fix the indexation tests	2022-07-12 14:55:51 +02:00
Kerollmops	fcfc4caf8c	Move the Object type in the lib.rs file and use it everywhere	2022-07-12 14:55:51 +02:00
Kerollmops	0146175fe6	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	bdc4263883	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	e8297ad27e	Fix the tests for the new DocumentsBatchBuilder/Reader	2022-07-12 14:52:56 +02:00
bors[bot]	ebddfdb9a3	Merge #578 578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584). Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-05 14:56:08 +00:00
Kerollmops	1bfdcfc84f	Bump uuid to 1.1.2	2022-07-05 16:23:36 +02:00
Tamo	250be9fe6c	put the threshold back to 10k	2022-07-05 15:57:44 +02:00
Tamo	eaf28b0628	Apply review suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-05 15:30:33 +02:00
Tamo	3b309f654a	Speed up the document deletion. When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search and all the other endpoints.	2022-07-05 15:30:33 +02:00
Kerollmops	d7c248042b	Rename the limitedTo parameter into maxTotalHits	2022-06-22 12:00:48 +02:00
ManyTheFish	177154828c	Extends deletion tests	2022-06-13 17:34:16 +02:00
Kerollmops	445d5474cc	Add the pagination_limited_to setting to the database	2022-06-08 18:14:27 +02:00
Kerollmops	69931e50d2	Add the max_values_by_facet setting to the database	2022-06-08 17:54:56 +02:00
Kerollmops	52a494bd3b	Add the new pagination.limited_to and faceting.max_values_per_facet settings	2022-06-08 17:15:36 +02:00
Tamo	d0aaa7ff00	Fix wrong internal ids assignments	2022-06-07 15:49:33 +02:00
ad hoc	31776fdc3f	add failing test	2022-06-07 15:49:33 +02:00
ManyTheFish	86ac8568e6	Use Charabia in milli	2022-06-02 16:59:11 +02:00
ad hoc	8993fec8a3	return optional exact words	2022-05-24 09:15:49 +02:00
bors[bot]	08c6d50cd1	Merge #531 531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire port #529 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 16:06:36 +00:00
bors[bot]	cf3e574cb4	Merge #530 530: fix the searchable fields bug when a field is nested r=Kerollmops a=irevoire port #528 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 15:52:30 +00:00
Tamo	0af399a6d7	fix the mixed dataset geosearch indexing bug	2022-05-16 17:37:45 +02:00
Tamo	f586028f9a	fix the searchable fields bug when a field is nested Update milli/src/index.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-05-16 17:24:36 +02:00
bors[bot]	e1e85267fd	Merge #526 526: remove useless comment r=irevoire a=MarinPostma Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-16 10:01:43 +00:00
bors[bot]	65e6aa0de2	Merge #523 523: Improve geosearch error messages r=irevoire a=irevoire Improve the geosearch error messages (#488). And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-04 13:36:11 +00:00
Tamo	c55368ddd4	apply code suggestion Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-05-04 14:11:03 +02:00
ad hoc	5ad5d56f7e	remove useless comment	2022-05-04 10:43:54 +02:00
bors[bot]	0c2c8af44e	Merge #520 520: fix mistake in Settings initialization r=irevoire a=MarinPostma fix settings not being correctly initialized and add a test to make sure that they are in the future. fix https://github.com/meilisearch/meilisearch/issues/2358 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-03 15:32:18 +00:00
Kerollmops	211c8763b9	Make sure that we do not generate too long keys	2022-05-03 10:03:15 +02:00
Kerollmops	7e47031bdc	Add a test for long keys in LMDB	2022-05-03 10:03:13 +02:00
Tamo	3cb1f6d0a1	improve geosearch error messages	2022-05-02 19:20:47 +02:00
ad hoc	1ee3d6ae33	fix mistake in Settings initialization	2022-04-29 16:24:25 +02:00
Tamo	f19d2dc548	Only flatten the required fields apply review comments Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-04-26 12:33:46 +02:00
bors[bot]	8010eca9c7	Merge #505 505: normalize exact words r=curquiza a=MarinPostma Normalize the exact words, as specified in the specification. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-25 09:35:32 +00:00
ad hoc	2e0089d5ff	normalize exact words	2022-04-21 15:38:40 +02:00
ad hoc	3a2451fcba	add test normalize exact words	2022-04-21 13:52:09 +02:00
Clément Renault	eb5830aa40	Add a test to make sure that long words are handled	2022-04-21 13:45:28 +02:00
ad hoc	8b14090927	fix min-word-len-for-typo not reset properly	2022-04-19 15:20:16 +02:00
Tamo	00f78d6b5a	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-04-14 11:14:08 +02:00
Tamo	399fba16bb	only flatten an object if it's nested	2022-04-14 11:14:08 +02:00
Tamo	ee64f4a936	Use smartstring to store the external id in our hashmap. We need to store all the external ids (primary keys) in a hashmap, associated with their internal ids. The smartstring removes heap allocations, reduces memory usage, and should improve cache locality.	2022-04-13 21:22:07 +02:00
Irevoire	4f3ce6d9cd	nested fields	2022-04-07 16:58:46 +02:00
ad hoc	b799f3326b	rename merge_nothing to merge_ignore_values	2022-04-05 18:44:35 +02:00
ad hoc	201fea0fda	limit extract_word_docids memory usage	2022-04-05 14:14:15 +02:00
ad hoc	b85cd4983e	remove field_id_from_position	2022-04-05 09:50:34 +02:00
ad hoc	ab185a59b5	fix infos	2022-04-05 09:46:56 +02:00
ad hoc	1810927dbd	rephrase exact_attributes doc	2022-04-04 21:04:49 +02:00
ad hoc	b7694c34f5	remove println	2022-04-04 21:00:07 +02:00
ad hoc	6cabd47c32	fix typo in comment	2022-04-04 20:59:20 +02:00
ad hoc	6b2c2509b2	fix bug in exact search	2022-04-04 20:54:03 +02:00
ad hoc	e8f06f6c06	extract exact_word_prefix_docids	2022-04-04 20:54:03 +02:00
ad hoc	6dd2e4ffbd	introduce exact_word_prefix database in index	2022-04-04 20:54:03 +02:00
ad hoc	ba0bb29cd8	refactor WordPrefixDocids to take dbs instead of indexes	2022-04-04 20:54:02 +02:00
ad hoc	c4c6e35352	query exact_word_docids in resolve_query_tree	2022-04-04 20:54:02 +02:00
ad hoc	8d46a5b0b5	extract exact word docids	2022-04-04 20:54:02 +02:00
ad hoc	0a77be4ec0	introduce exact_word_docids db	2022-04-04 20:54:02 +02:00
ad hoc	5f9f82757d	refactor spawn_extraction_task	2022-04-04 20:54:02 +02:00
ad hoc	f82d4b36eb	introduce exact attribute setting	2022-04-04 20:54:02 +02:00
ad hoc	8b1e5d9c6d	add test for exact words	2022-04-04 20:10:55 +02:00
ad hoc	9bbffb8fee	add exact words setting	2022-04-04 20:10:54 +02:00
ad hoc	1941072bb2	implement Copy on Setting	2022-04-04 10:41:46 +02:00
ad hoc	66020cd923	rename min_word_len* to use plain letter numbers	2022-04-04 10:41:46 +02:00
ad hoc	4c4b336ecb	rename min word len for typo error	2022-04-01 11:17:03 +02:00
ad hoc	286dd7b2e4	rename min_word_len_2_typo	2022-04-01 11:17:03 +02:00
ad hoc	55af85db3c	add tests for min_word_len_for_typo	2022-04-01 11:17:02 +02:00
ad hoc	5a24e60572	introduce word len for typo setting	2022-04-01 11:17:02 +02:00
ad hoc	3e34981d9b	add test for authorize_typos in update	2022-03-31 14:12:00 +02:00
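
The prefix-grouping bug described in the message of merge commit 087da5621a (#587) can be reproduced in a few lines. The sketch below is illustrative only: every function name in it is hypothetical and none of it is milli's code. The "fixed" lookup simply selects the group by first byte, whereas the PR itself reports switching to a prefix trie. It assumes the prefixes are sorted and non-empty.

```rust
/// Group sorted, non-empty prefixes by their first byte,
/// e.g. ["ant", "axo", "bor"] -> [["ant", "axo"], ["bor"]].
/// Hypothetical helper, written only to illustrate the commit message.
fn group_by_first_byte<'a>(prefixes: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<Vec<&'a str>> = Vec::new();
    for &p in prefixes {
        // Does `p` share its first byte with the group currently being built?
        let same_group = groups
            .last()
            .map_or(false, |g| g[0].as_bytes().first() == p.as_bytes().first());
        if same_group {
            groups.last_mut().unwrap().push(p);
        } else {
            groups.push(vec![p]);
        }
    }
    groups
}

/// Buggy lookup, as described in the PR text: a group is selected only if
/// the word starts with the FIRST prefix of that group.
fn prefixes_of_buggy<'a>(groups: &[Vec<&'a str>], word: &str) -> Vec<&'a str> {
    groups
        .iter()
        .find(|g| word.starts_with(g[0]))
        .map(|g| g.iter().copied().filter(|p| word.starts_with(*p)).collect::<Vec<_>>())
        .unwrap_or_default()
}

/// One possible correction (an assumption, not the PR's actual fix):
/// select the group by first byte, then test every prefix in it.
fn prefixes_of_fixed<'a>(groups: &[Vec<&'a str>], word: &str) -> Vec<&'a str> {
    groups
        .iter()
        .find(|g| g[0].as_bytes().first() == word.as_bytes().first())
        .map(|g| g.iter().copied().filter(|p| word.starts_with(*p)).collect::<Vec<_>>())
        .unwrap_or_default()
}
```

With the groups built from `["ant", "axo", "bor"]`, the buggy lookup finds no prefix for `"axolotl"` because `"axolotl".starts_with("ant")` is false, while the first-byte lookup returns `["axo"]`.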

566 Commits
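
The `EXISTS` filter introduced by merge commit 21284cf235 (#556) is described there as a database whose keys are field ids and whose values are bitmaps of the document ids containing that field. The sketch below models that idea with standard-library types only. It is a simplified assumption, not milli's implementation: field names stand in for field ids, `BTreeSet<u32>` stands in for a `RoaringBitmap`, and all names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the `facet-id-exists-docids` database:
/// field name -> ids of the documents in which that key is present.
type ExistsDb = BTreeMap<String, BTreeSet<u32>>;

/// Record, for each document, every key it contains.
/// A key counts as present whatever its value is (null, [], {} included).
fn build_exists_db(docs: &[(u32, Vec<&str>)]) -> ExistsDb {
    let mut db = ExistsDb::new();
    for (docid, fields) in docs {
        for field in fields {
            db.entry(field.to_string()).or_default().insert(*docid);
        }
    }
    db
}

/// `field EXISTS`: all documents that contain the key.
fn exists(db: &ExistsDb, field: &str) -> BTreeSet<u32> {
    db.get(field).cloned().unwrap_or_default()
}

/// `field NOT EXISTS`: every known document minus those that contain the key.
fn not_exists(db: &ExistsDb, all_docids: &BTreeSet<u32>, field: &str) -> BTreeSet<u32> {
    all_docids.difference(&exists(db, field)).copied().collect()
}
```

Fed with the seven documents from the PR description (ids 0 to 5 carry a `colour` key, id 6 does not), `exists(&db, "colour")` yields ids 0 through 5 and `not_exists` yields only 6, matching the behaviour stated in the commit message.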