meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-24 02:55:06 +08:00

Author	SHA1	Message	Date
Tamo	895ab2906c	apply review suggestions	2023-02-16 18:42:47 +01:00
Tamo	8fb7b1d10f	bump deserr	2023-02-14 20:04:30 +01:00
Tamo	74dcfe9676	Fix a bug when you update a document that was already present in the db, deleted and then inserted again in the same transform	2023-02-14 19:09:40 +01:00
Tamo	1b1703a609	make a small optimization to merge obkvs a little bit faster	2023-02-14 18:32:41 +01:00
Tamo	fb5e4957a6	fix and test the early exit in case a grenad ends with a deletion	2023-02-14 18:23:57 +01:00
Tamo	8de3c9f737	Update milli/src/update/index_documents/transform.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-02-14 17:57:14 +01:00
Tamo	43a19d0709	document the operation enum + the grenads	2023-02-14 17:55:26 +01:00
Tamo	746b31c1ce	makes clippy happy	2023-02-09 12:23:01 +01:00
Tamo	93db755d57	add a test to ensure we handle correctly a deletion of multiple time the same document	2023-02-08 21:03:34 +01:00
Tamo	93f130a400	fix all warnings	2023-02-08 20:57:35 +01:00
Tamo	421a9cf05e	provide a new method on the transform to remove documents	2023-02-08 16:06:09 +01:00
Tamo	8f64fba1ce	rewrite the current transform to handle a new byte specifying the kind of operation it's merging	2023-02-08 12:53:38 +01:00
Kerollmops	fbec48f56e	Merge remote-tracking branch 'milli/main' into bring-v1-changes	2023-02-06 16:48:10 +01:00
ManyTheFish	064158e4e2	Update test	2023-02-01 15:34:01 +01:00
Loïc Lecrenier	a2690ea8d4	Reduce incremental indexing time of `words_prefix_position_docids` DB This database can easily contain millions of entries. Thus, iterating over it can be very expensive. For regular `documentAdditionOrUpdate` tasks, `del_prefix_fst_words` will always be empty. Thus, we can save a significant amount of time by adding this `if !del_prefix_fst_words.is_empty()` condition. The code's behaviour remains completely unchanged.	2023-01-31 11:42:24 +01:00
f3r10	7681be5367	Format code	2023-01-31 11:28:05 +01:00
f3r10	50bc156257	Fix tests	2023-01-31 11:28:05 +01:00
f3r10	d8207356f4	Skip script,language insertion if language is undetected	2023-01-31 11:28:05 +01:00
f3r10	fd60a39f1c	Format code	2023-01-31 11:28:05 +01:00
f3r10	369c05732e	Add test checking if from script_language_docids database were removed deleted docids	2023-01-31 11:28:05 +01:00
f3r10	a27f329e3a	Add tests for checking that detected script and language associated with document(s) were stored during indexing	2023-01-31 11:28:05 +01:00
f3r10	b216ddba63	Delete and clear data from the new database	2023-01-31 11:28:05 +01:00
f3r10	d97fb6117e	Extract and index data	2023-01-31 11:28:05 +01:00
Louis Dureuil	20f05efb3c	clippy: needless_lifetimes	2023-01-31 11:12:59 +01:00
Louis Dureuil	cbf029f64c	clippy: --fix	2023-01-31 11:12:59 +01:00
Louis Dureuil	3296cf7ae6	clippy: remove needless lifetimes	2023-01-31 09:32:40 +01:00
Louis Dureuil	89675e5f15	clippy: Replace seek 0 by rewind	2023-01-31 09:32:40 +01:00
Tamo	de3c4f1986	throw an error on unknown fields specified in the _geo field	2023-01-24 12:23:24 +01:00
Philipp Ahlner	f5ca421227	Superfluous test removed	2023-01-19 15:39:21 +01:00
Philipp Ahlner	a2cd7214f0	Fixes error message when lat/lng are unparseable	2023-01-19 10:10:26 +01:00
ManyTheFish	d1fc42b53a	Use compatibility decomposition normalizer in facets	2023-01-18 15:02:13 +01:00
Clément Renault	1b78231e18	Make clippy happy	2023-01-17 18:25:54 +01:00
Loïc Lecrenier	f073a86387	Update deserr to latest version	2023-01-17 11:28:19 +01:00
Loïc Lecrenier	02fd06ea0b	Integrate deserr	2023-01-11 13:56:47 +01:00
bors[bot]	c3f4835e8e	Merge #733 733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec # Pull Request ## Related issue Somewhat fixes (until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3118 ## What does this PR do? When a query ends with a word and a prefix, such as: ``` word pr ``` Then we first determine whether `pre` could possibly be in the proximity prefix database before querying it. There are then three possibilities: 1. `pr` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pre` through the FST and query the regular proximity databases. 2. `pr` is in the prefix cache but cannot be found in the proximity prefix databases. In this case, we partially disable the proximity ranking rule for the pair `word pre`. This is done as follows: 1. Only find the documents where `word` is in proximity to `pre` exactly (no derivations) 2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8 3. `pr` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases. Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either. Therefore, the impact on relevancy is: 1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8 2. For common prefixes of more than two letters: we no longer distinguish between any proximities 3. For uncommon prefixes: nothing changes Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset): ```json [ { "text": "I heard there is a faster proximity criterion" }, { "text": "I heard there is a faster but less relevant proximity criterion" } ] ``` Regarding (2), it means that two documents would be considered equally relevant according to the proximity rule for the query "faster pro": ```json [ { "text": "I heard there is a faster but less relevant proximity criterion" } { "text": "I heard there is a faster proximity criterion" }, ] ``` But the following document would be considered more relevant than the two documents above: ```json { "text": "I heard there is a faster swimmer who is competing in the pro section of the competition " } ``` Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything. --- ## Performance I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset. ``` 1. 10x 'a': - 640ms ⟹ 630ms = no significant difference 2. 10x 'b': - set-based: 4.47s ⟹ 7.42 = bad, ~2x regression - dynamic: 1s ⟹ 870 ms = no significant difference 3. 'Someone I l': - set-based: 250ms ⟹ 12 ms = very good, x20 speedup - dynamic: 21ms ⟹ 11 ms = good, x2 speedup 4. 'billie e': - set-based: 623ms ⟹ 2ms = very good, x300 speedup - dynamic: ~4ms ⟹ 4ms = no difference 5. 'billie ei': - set-based: 57ms ⟹ 20ms = good, ~2x speedup - dynamic: ~4ms ⟹ ~2ms. = no significant difference 6. 'i am getting o' - set-based: 300ms ⟹ 60ms = very good, 5x speedup - dynamic: 30ms ⟹ 6ms = very good, 5x speedup 7. 'prologue 1 a 1: - set-based: 3.36s ⟹ 120ms = very good, 30x speedup - dynamic: 200ms ⟹ 30ms = very good, 6x speedup 8. 'prologue 1 a 10': - set-based: 590ms ⟹ 18ms = very good, 30x speedup - dynamic: 82ms ⟹ 35ms = good, ~2x speedup ``` Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-01-04 09:00:50 +00:00
bors[bot]	6a10e85707	Merge #736 736: Update charabia r=curquiza a=ManyTheFish Update Charabia to the last version. > We are now Romanizing Chinese characters into Pinyin. > Note that we keep the accent because they are in fact never typed directly by the end-user, moreover, changing an accent leads to a different Chinese character, and I don't have sufficient knowledge to forecast the impact of removing accents in this context. Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-01-03 15:44:41 +00:00
Loïc Lecrenier	777b387dc4	Avoid a prefix-related worst-case scenario in the proximity criterion	2022-12-22 12:08:00 +01:00
Louis Dureuil	4b166bea2b	Add primary_key_inference test	2022-12-21 15:13:38 +01:00
Louis Dureuil	5943100754	Fix existing tests	2022-12-21 15:13:38 +01:00
Louis Dureuil	b24def3281	Add logging when inference took place. Displays log message in the form: ``` [2022-12-21T09:19:42Z INFO milli::update::index_documents::enrich] Primary key was not specified in index. Inferred to 'id' ```	2022-12-21 15:13:38 +01:00
Louis Dureuil	402dcd6b2f	Simplify primary key inference	2022-12-21 15:13:38 +01:00
Louis Dureuil	13c95d25aa	Remove uses of UserError::MissingPrimaryKey not related to inference	2022-12-21 15:13:36 +01:00
Loïc Lecrenier	fc0e7382fe	Fix hard-deletion of an external id that was soft-deleted	2022-12-20 15:33:31 +01:00
Tamo	69edbf9f6d	Update milli/src/update/delete_documents.rs	2022-12-19 18:23:50 +01:00
Louis Dureuil	916c23e7be	Tests: rename snapshots	2022-12-19 10:07:17 +01:00
Louis Dureuil	ad9937c755	Fix tests after adding DeletionStrategy	2022-12-19 10:07:17 +01:00
Louis Dureuil	171c942282	Soft-deletion computation no longer takes into account the mapsize Implemented solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824	2022-12-19 10:07:17 +01:00
Louis Dureuil	e2ae3b24aa	Hard or soft delete according to the deletion strategy	2022-12-19 10:00:13 +01:00
Louis Dureuil	fc7618d49b	Add DeletionStrategy	2022-12-19 09:49:58 +01:00
ManyTheFish	7f88c4ff2f	Fix #1714 test	2022-12-15 18:22:28 +01:00
Loïc Lecrenier	be3b00350c	Apply review suggestions: naming and documentation	2022-12-13 10:15:22 +01:00
Loïc Lecrenier	e3ee553dcc	Remove soft deleted ids from ExternalDocumentIds during document import If the document import replaces a document using hard deletion	2022-12-12 14:16:09 +01:00
Loïc Lecrenier	303d740245	Prepare fix within facet range search By creating snapshots and updating the format of the existing snapshots. The next commit will apply the fix, which will show its effects cleanly on the old and new snapshot tests	2022-12-07 14:38:10 +01:00
Loïc Lecrenier	a993b68684	Cargo fmt >:-(	2022-12-06 15:22:10 +01:00
Loïc Lecrenier	80c7a00567	Fix compilation error in tests of settings update	2022-12-06 15:19:26 +01:00
Loïc Lecrenier	67d8cec209	Fix bug in handling of soft deleted documents when updating settings	2022-12-06 15:09:19 +01:00
Loïc Lecrenier	cda4ba2bb6	Add document import tests	2022-12-05 12:02:49 +01:00
Loïc Lecrenier	f2cf981641	Add more tests and allow disabling of soft-deletion outside of tests Also allow disabling soft-deletion in the IndexDocumentsConfig	2022-12-05 10:51:01 +01:00
bors[bot]	d3731dda48	Merge #706 706: Limit the reindexing caused by updating settings when not needed r=curquiza a=GregoryConrad ## What does this PR do? When updating index settings using `update::Settings`, sometimes a `reindex` of `update::Settings` is triggered when it doesn't need to be. This PR aims to prevent those unnecessary `reindex` calls. For reference, here is a snippet from the current `execute` method in `update::Settings`: ```rust // ... if stop_words_updated \|\| faceted_updated \|\| synonyms_updated \|\| searchable_updated \|\| exact_attributes_updated { self.reindex(&progress_callback, &should_abort, old_fields_ids_map)?; } ``` - [x] `faceted_updated` - looks good as-is ✅ - [x] `stop_words_updated` - looks good as-is ✅ - [x] `synonyms_updated` - looks good as-is ✅ - [x] `searchable_updated` - fixed in this PR - [x] `exact_attributes_updated` - fixed in this PR ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>	2022-12-01 13:58:02 +00:00
bors[bot]	5e754b3ee0	Merge #708 708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec # Pull Request ## Related issue Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115 ## What does this PR do? 1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built. 2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-11-30 17:47:34 +00:00
Loïc Lecrenier	9dd4b33a9a	Fix bulk facet indexing bug	2022-11-30 14:27:36 +01:00
Loïc Lecrenier	8d0ace2d64	Avoid creating a MatchingWord for words that exceed the length limit	2022-11-28 10:20:13 +01:00
Gregory Conrad	e0d24104a3	refactor: Rewrite another method chain to be more readable	2022-11-26 13:33:19 -05:00
Gregory Conrad	2db738dbac	refactor: rewrite method chain to be more readable	2022-11-26 13:26:39 -05:00
Gregory Conrad	ed29cceae9	perf: Prevent reindex in searchable set case when not needed	2022-11-23 22:33:06 -05:00
Gregory Conrad	bb9e33bf85	perf: Prevent reindex in searchable reset case when not needed	2022-11-23 22:01:46 -05:00
Gregory Conrad	d19c8672bb	perf: limit reindex to when exact_attributes changes	2022-11-23 15:50:53 -05:00
bors[bot]	57c9f03e51	Merge #697 697: Fix bug in prefix DB indexing r=loiclec a=loiclec Where the batch's information was not properly updated in cases where only the proximity changed between two consecutive word pair proximities. Closes partially https://github.com/meilisearch/meilisearch/issues/3043 Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-11-17 15:22:01 +00:00
Loïc Lecrenier	777eb3fa00	Add insta-snaps for test of bug 3043	2022-11-17 12:21:27 +01:00
Loïc Lecrenier	0caadedd3b	Make clippy happy	2022-11-17 12:17:53 +01:00
Loïc Lecrenier	ac3baafbe8	Truncate facet values that are too long before indexing them	2022-11-17 11:29:42 +01:00
Loïc Lecrenier	990a861241	Add test for indexing a document with a long facet value	2022-11-17 11:29:42 +01:00
Loïc Lecrenier	d95d02cb8a	Fix Facet Indexing bugs 1. Handle keys with variable length correctly This fixes https://github.com/meilisearch/meilisearch/issues/3042 and is easily reproducible with the updated fuzz tests, which now generate keys with variable lengths. 2. Prevent adding facets to the database if their encoded value does not satisfy `valid_lmdb_key`. This fixes an indexing failure when a document had a filterable attribute containing a value whose length is higher than ~500 bytes.	2022-11-17 11:29:42 +01:00
Loïc Lecrenier	f00108d2ec	Fix name of bug in reproduction test	2022-11-17 11:29:18 +01:00
Loïc Lecrenier	f7c8730d09	Fix bug in prefix DB indexing Where the batch's information was not properly updated in cases where only the proximity changed between two consecutive word pair proximities. Closes https://github.com/meilisearch/meilisearch/issues/3043	2022-11-17 11:29:18 +01:00
Kerollmops	37b3c5c323	Fix transform to use all_documents and ignore soft_deleted documents	2022-11-08 14:23:16 +01:00
Kerollmops	1b1ad1923b	Add a test to check that we take care of soft deleted documents	2022-11-08 14:23:14 +01:00
unvalley	abf1cf9cd5	Fix clippy errors	2022-11-04 09:27:46 +09:00
unvalley	70465aa5ce	Execute cargo fmt	2022-11-04 08:59:58 +09:00
unvalley	3009981d31	Fix clippy errors Add clippy job Add clippy job to CI	2022-11-04 08:58:14 +09:00
unvalley	d55f0e2e53	Execute cargo fmt	2022-10-28 23:42:23 +09:00
unvalley	d53a80b408	Fix clippy error	2022-10-28 23:41:35 +09:00
unvalley	a1d7ed1258	fix clippy error and remove clippy job from ci Remove clippy job Fix clippy error type_complexity Restore ambiguous change	2022-10-28 22:33:50 +09:00
unvalley	f4ec1abb9b	Fix all clippy error after conflicts	2022-10-27 23:58:13 +09:00
unvalley	c7322f704c	Fix cargo clippy errors Dont apply clippy for tests for now Fix clippy warnings of filter-parser package parent 8352febd646ec4bcf56a44161e5c4dce0e55111f author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900 committer unvalley <kirohi.code@gmail.com> 1666791316 +0900 Update .github/workflows/rust.yml Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com> Allow clippy lint too_many_argments Allow clippy lint needless_collect Allow clippy lint too_many_arguments and type_complexity Fix for clippy warnings comparison_chains Fix for clippy warnings vec_init_then_push Allow clippy lint should_implement_trait Allow clippy lint drop_non_drop Fix lifetime clipy warnings in filter-paprser Execute cargo fmt Fix clippy remaining warnings Fix clippy remaining warnings again and allow lint on each place	2022-10-27 01:04:23 +09:00
unvalley	811f156031	Execute cargo clippy --fix	2022-10-27 01:00:00 +09:00
Loïc Lecrenier	54c0cf93fe	Merge remote-tracking branch 'origin/main' into facet-levels-refactor	2022-10-26 15:13:34 +02:00
bors[bot]	365f44c39b	Merge #668 668: Fix many Clippy errors part 2 r=ManyTheFish a=ehiggs This brings us a step closer to enforcing clippy on each build. # Pull Request ## Related issue This does not fix any issue outright, but it is a second round of fixes for clippy after https://github.com/meilisearch/milli/pull/665. This should contribute to fixing https://github.com/meilisearch/milli/pull/659. ## What does this PR do? Satisfies many issues for clippy. The complaints are mostly: * Passing reference where a variable is already a reference. * Using clone where a struct already implements `Copy` * Using `ok_or_else` when it is a closure that returns a value instead of using the closure to call function (hence we use `ok_or`) * Unambiguous lifetimes don't need names, so we can just use `'_` * Using `return` when it is not needed as we are on the last expression of a function. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>	2022-10-26 12:16:24 +00:00
Loïc Lecrenier	2741756248	Merge remote-tracking branch 'origin/main' into facet-levels-refactor	2022-10-26 14:03:23 +02:00
Loïc Lecrenier	b7f2428961	Fix formatting and warning after rebasing from main	2022-10-26 13:49:33 +02:00
Loïc Lecrenier	14ca8048a8	Add some documentation on how to run the facet db fuzzer	2022-10-26 13:48:01 +02:00
Loïc Lecrenier	f198b20c42	Add facet deletion tests that use both the incremental and bulk methods + update deletion snapshots to the new database format	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	e3ba1fc883	Make deletion tests for both soft-deletion and hard-deletion	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	ab5e56fd16	Add document deletion snapshot tests and tests for hard-deletion	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	d885de1600	Add option to avoid soft deletion of documents	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	2295e0e3ce	Use real delete function in facet indexing fuzz tests By deleting multiple docids at once instead of one-by-one	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	acc8caebe6	Add link to GitHub PR to document of update/facet module	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	a034a1e628	Move StrRefCodec and ByteSliceRefCodec to their own files	2022-10-26 13:47:46 +02:00
Loïc Lecrenier	1165ba2171	Make facet deletion incremental	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	51961e1064	Polish some details	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	cb8442a119	Further unify facet databases of f64s and strings	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	3baa34d842	Fix compiler errors/warnings	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	86d9f50b9c	Fix bugs in incremental facet indexing with variable parameters e.g. add one facet value incrementally with a group_size = X and then add another one with group_size = Y It is not actually possible to do so with the public API of milli, but I wanted to make sure the algorithm worked well in those cases anyway. The bugs were found by fuzzing the code with fuzzcheck, which I've added to milli as a conditional dev-dependency. But it can be removed later.	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	985a94adfc	cargo fmt	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	b1ab09196c	Remove outdated TODOs	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	27454e9828	Document and refine facet indexing algorithms	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	bee3c23b45	Add comparison benchmark between bulk and incremental facet indexing	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	b2f01ad204	Refactor facet database tests	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	9026867d17	Give same interface to bulk and incremental facet indexing types + cargo fmt, oops, sorry for the bad history :(	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	330c9eb1b2	Rename facet codecs and refine FacetsUpdate API	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	485a72306d	Refactor facet-related codecs	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	9b55e582cd	Add FacetsUpdate type that wraps incremental and bulk indexing methods	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	3d145d7f48	Merge the two <facetttype>_faceted_documents_ids methods into one	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	079ed4a992	Add more snapshots	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	afdf87f6f7	Fix bugs in asc/desc criterion and facet indexing	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	a7201ece04	cargo fmt	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	36296bbb20	Add facet incremental indexing snapshot tests + fix bug	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	07ff92c663	Add more snapshots from facet tests	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	61252248fb	Fix some facet indexing bugs	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	68cbcdf08b	Fix compile errors/warnings in http-ui and infos	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	85824ee203	Try to make facet indexing incremental	2022-10-26 13:47:04 +02:00
Loïc Lecrenier	d30c89e345	Fix compile error+warnings in new tests	2022-10-26 13:46:46 +02:00
Loïc Lecrenier	e8a156d682	Reorganise facets database indexing code	2022-10-26 13:46:46 +02:00
Loïc Lecrenier	bd2c0e1ab6	Remove unused code	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	39a4a0a362	Reintroduce filter range search and facet extractors	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	22d80eeaf9	Reintroduce facet deletion functionality	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	6cc91824c1	Remove unused heed codec files	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	63ef0aba18	Start porting facet distribution and sort to new database structure	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	7913d6365c	Update Facets indexing to be compatible with new database structure	2022-10-26 13:46:14 +02:00
Loïc Lecrenier	c3f49f766d	Prepare refactor of facets database Prepare refactor of facets database	2022-10-26 13:46:14 +02:00
bors[bot]	c8f16530d5	Merge #616 616: Introduce an indexation abortion function when indexing documents r=Kerollmops a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-10-26 11:41:18 +00:00
Ewan Higgs	2ce025a906	Fixes after rebase to fix new issues.	2022-10-25 20:58:31 +02:00
Ewan Higgs	17f7922bfc	Remove unneeded lifetimes.	2022-10-25 20:49:04 +02:00
Ewan Higgs	6b2fe94192	Fixes for clippy bringing us down to 18 remaining issues. This brings us a step closer to enforcing clippy on each build.	2022-10-25 20:49:02 +02:00
Loïc Lecrenier	9a569d73d1	Minor code style change	2022-10-24 15:30:43 +02:00
Loïc Lecrenier	d76d0cb1bf	Merge branch 'main' into word-pair-proximity-docids-refactor	2022-10-24 15:23:00 +02:00
Loïc Lecrenier	a983129613	Apply suggestions from code review	2022-10-20 09:49:37 +02:00
Loïc Lecrenier	ab2f6f3aa4	Refine some details in word_prefix_pair_proximity indexing code	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	178d00f93a	Cargo fmt	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	072b576514	Fix proximity value in keys of prefix_word_pair_proximity_docids	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	6c3a5d69e1	Update snapshots	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	a7de4f5b85	Don't add swapped word pairs to the word_pair_proximity_docids db	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	264a04922d	Add prefix_word_pair_proximity database Similar to the word_prefix_pair_proximity one but instead the keys are: (proximity, prefix, word2)	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	1dbbd8694f	Rename StrStrU8Codec to U8StrStrCodec and reorder its fields	2022-10-18 10:37:34 +02:00
Loïc Lecrenier	bdeb47305e	Change encoding of word_pair_proximity DB to (proximity, word1, word2) Same for word_prefix_pair_proximity	2022-10-18 10:37:34 +02:00
Kerollmops	6603437cb1	Introduce an indexation abortion function when indexing documents	2022-10-17 17:28:03 +02:00
Ewan Higgs	beb987d3d1	Fixing piles of clippy errors. Most of these are calling clone when the struct supports Copy. Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed. I tried to stay away from actual changes or places where I'd have to name fresh variables.	2022-10-13 22:02:54 +02:00
msvaljek	762e320c35	Add proximity calculation for the same word	2022-10-07 12:59:12 +02:00
vishalsodani	00c02d00f3	Add missing logging timer to extractors	2022-09-30 22:17:06 +05:30
bors[bot]	15d478cf4d	Merge #635 635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing. In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used. Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-09-14 12:00:52 +00:00
Loïc Lecrenier	3794962330	Use an unstable algorithm for grenad::Sorter when possible	2022-09-13 14:49:53 +02:00
Kerollmops	d4d7c9d577	We avoid skipping errors in the indexing pipeline	2022-09-13 14:03:00 +02:00
Kerollmops	fe3973a51c	Make sure that long words are correctly skipped	2022-09-07 15:03:32 +02:00
Kerollmops	c83c3cd796	Add a test to make sure that long words are correctly skipped	2022-09-07 14:12:36 +02:00
ManyTheFish	5391e3842c	replace optional_words by term_matching_strategy	2022-08-22 17:47:19 +02:00
ManyTheFish	9640976c79	Rename TermMatchingPolicies	2022-08-18 17:36:08 +02:00
Irevoire	4aae07d5f5	expose the size methods	2022-08-17 17:07:38 +02:00
Irevoire	e96b852107	bump heed	2022-08-17 17:05:50 +02:00
bors[bot]	087da5621a	Merge #587 587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test. ## Why is it faster? Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations: 1. reusing allocations 2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word 3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist. ## What bug was fixed? When reindexing, the `new_prefix_fst_words` prefixes may look like: ``` ["ant", "axo", "bor"] ``` which we group by first letter: ``` [["ant", "axo"], ["bor"]] ``` Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`. ## StrStrU8Codec I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change. ## Requests when reviewing this PR I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not. I also have a couple questions regarding the code itself: - Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified. - I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead? ## Benchmark Results We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%. ``` group indexing_main_ce90fc62 indexing_word-prefix-pair-proximity-docids-refactor_cbad2023 ----- ---------------------- ------------------------------------------------------------ indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.00 1893.0±233.03µs ? ?/sec 1.01 1921.2±260.79µs ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.05 9.4±3.51ms ? ?/sec 1.00 9.0±2.14ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.22 18.3±11.42ms ? ?/sec 1.00 15.0±5.79ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 41.4±4.20ms ? ?/sec 1.28 53.0±13.97ms ? ?/sec indexing/-wiki-delete-searchable- 1.00 285.6±18.12ms ? ?/sec 1.03 293.1±16.09ms ? ?/sec indexing/Indexing geo_point 1.03 60.8±0.45s ? ?/sec 1.00 58.8±0.68s ? ?/sec indexing/Indexing movies in three batches 1.14 16.5±0.30s ? ?/sec 1.00 14.5±0.24s ? ?/sec indexing/Indexing movies with default settings 1.11 13.7±0.07s ? ?/sec 1.00 12.3±0.28s ? ?/sec indexing/Indexing nested movies with default settings 1.10 10.6±0.11s ? ?/sec 1.00 9.6±0.15s ? ?/sec indexing/Indexing nested movies without any facets 1.11 9.4±0.15s ? ?/sec 1.00 8.5±0.10s ? ?/sec indexing/Indexing songs in three batches with default settings 1.18 66.2±0.39s ? ?/sec 1.00 56.0±0.67s ? ?/sec indexing/Indexing songs with default settings 1.07 58.7±1.26s ? ?/sec 1.00 54.7±1.71s ? ?/sec indexing/Indexing songs without any facets 1.08 53.1±0.88s ? ?/sec 1.00 49.3±1.43s ? ?/sec indexing/Indexing songs without faceted numbers 1.08 57.7±1.33s ? ?/sec 1.00 53.3±0.98s ? ?/sec indexing/Indexing wiki 1.06 1051.1±21.46s ? ?/sec 1.00 989.6±24.55s ? ?/sec indexing/Indexing wiki in three batches 1.20 1184.8±8.93s ? ?/sec 1.00 989.7±7.06s ? ?/sec indexing/Reindexing geo_point 1.04 67.5±0.75s ? ?/sec 1.00 64.9±0.32s ? ?/sec indexing/Reindexing movies with default settings 1.12 13.9±0.17s ? ?/sec 1.00 12.4±0.13s ? ?/sec indexing/Reindexing songs with default settings 1.05 60.6±0.84s ? ?/sec 1.00 57.5±0.99s ? ?/sec indexing/Reindexing wiki 1.07 1725.0±17.92s ? ?/sec 1.00 1611.4±9.90s ? ?/sec ``` Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 14:06:12 +00:00
bors[bot]	fb95e67a2a	Merge #608 608: Fix soft deleted documents r=ManyTheFish a=ManyTheFish When we replaced or updated some documents, the indexing was skipping the replaced documents. Related to https://github.com/meilisearch/meilisearch/issues/2672 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-08-17 13:38:10 +00:00
ManyTheFish	e9e2349ce6	Fix typo in comment	2022-08-17 15:09:48 +02:00
ManyTheFish	2668f841d1	Fix update indexing	2022-08-17 15:03:37 +02:00
ManyTheFish	7384650d85	Update test to showcase the bug	2022-08-17 15:03:08 +02:00
Loïc Lecrenier	6cc975704d	Add some documentation to facets.rs	2022-08-17 12:59:52 +02:00
Loïc Lecrenier	93252769af	Apply review suggestions	2022-08-17 12:41:22 +02:00
Loïc Lecrenier	39687908f1	Add documentation and comments to facets.rs	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	8d4b21a005	Switch string facet levels indexation to new algo Write the algorithm once for both numbers and strings	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	cf0cd92ed4	Refactor Facets::execute to increase performance	2022-08-17 12:26:49 +02:00
Loïc Lecrenier	78d9f0622d	cargo fmt	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	4f9edf13d7	Remove commented-out function	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	405555b401	Add some documentation to PrefixTrieNode	2022-08-17 12:21:24 +02:00
Loïc Lecrenier	1bc4788e59	Remove cached Allocations struct from wpppd indexing	2022-08-17 12:18:22 +02:00
Loïc Lecrenier	ef75a77464	Fix undefined behaviour caused by reusing key from the database New full snapshot: --- source: milli/src/update/word_prefix_pair_proximity_docids.rs --- 5 a 1 [101, ] 5 a 2 [101, ] 5 am 1 [101, ] 5 b 4 [101, ] 5 be 4 [101, ] am a 3 [101, ] amazing a 1 [100, ] amazing a 2 [100, ] amazing a 3 [100, ] amazing an 1 [100, ] amazing an 2 [100, ] amazing b 2 [100, ] amazing be 2 [100, ] an a 1 [100, ] an a 2 [100, 202, ] an am 1 [100, ] an an 2 [100, ] an b 3 [100, ] an be 3 [100, ] and a 2 [100, ] and a 3 [100, ] and a 4 [100, ] and am 2 [100, ] and an 3 [100, ] and b 1 [100, ] and be 1 [100, ] at a 1 [100, 202, ] at a 2 [100, 101, ] at a 3 [100, ] at am 2 [100, 101, ] at an 1 [100, 202, ] at an 3 [100, ] at b 3 [101, ] at b 4 [100, ] at be 3 [101, ] at be 4 [100, ] beautiful a 2 [100, ] beautiful a 3 [100, ] beautiful a 4 [100, ] beautiful am 3 [100, ] beautiful an 2 [100, ] beautiful an 4 [100, ] bell a 2 [101, ] bell a 4 [101, ] bell am 4 [101, ] extraordinary a 2 [202, ] extraordinary a 3 [202, ] extraordinary an 2 [202, ] house a 3 [100, 202, ] house a 4 [100, 202, ] house am 4 [100, ] house an 3 [100, 202, ] house b 2 [100, ] house be 2 [100, ] rings a 1 [101, ] rings a 3 [101, ] rings am 3 [101, ] rings b 2 [101, ] rings be 2 [101, ] the a 3 [101, ] the b 1 [101, ] the be 1 [101, ]	2022-08-17 12:17:45 +02:00
Loïc Lecrenier	7309111433	Don't run block code in doc tests of word_pair_proximity_docids	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	f6f8f543e1	Run cargo fmt	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	34c991ea02	Add newlines in documentation of word_prefix_pair_proximity_docids	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	06f3fd8c6d	Add more comments to WordPrefixPairProximityDocids::execute	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	474500362c	Update wpppd snapshots New snapshot (yes, it's wrong as well, it will get fixed later): --- source: milli/src/update/word_prefix_pair_proximity_docids.rs --- 5 a 1 [101, ] 5 a 2 [101, ] 5 am 1 [101, ] 5 b 4 [101, ] 5 be 4 [101, ] am a 3 [101, ] amazing a 1 [100, ] amazing a 2 [100, ] amazing a 3 [100, ] amazing an 1 [100, ] amazing an 2 [100, ] amazing b 2 [100, ] amazing be 2 [100, ] an a 1 [100, ] an a 2 [100, 202, ] an am 1 [100, ] an b 3 [100, ] an be 3 [100, ] and a 2 [100, ] and a 3 [100, ] and a 4 [100, ] and b 1 [100, ] and be 1 [100, ] d\0 0 [100, 202, ] an an 2 [100, ] and am 2 [100, ] and an 3 [100, ] at a 2 [100, 101, ] at a 3 [100, ] at am 2 [100, 101, ] at an 1 [100, 202, ] at an 3 [100, ] at b 3 [101, ] at b 4 [100, ] at be 3 [101, ] at be 4 [100, ] beautiful a 2 [100, ] beautiful a 3 [100, ] beautiful a 4 [100, ] beautiful am 3 [100, ] beautiful an 2 [100, ] beautiful an 4 [100, ] bell a 2 [101, ] bell a 4 [101, ] bell am 4 [101, ] extraordinary a 2 [202, ] extraordinary a 3 [202, ] extraordinary an 2 [202, ] house a 4 [100, 202, ] house a 4 [100, ] house am 4 [100, ] house an 3 [100, 202, ] house b 2 [100, ] house be 2 [100, ] rings a 1 [101, ] rings a 3 [101, ] rings am 3 [101, ] rings b 2 [101, ] rings be 2 [101, ] the a 3 [101, ] the b 1 [101, ] the be 1 [101, ]	2022-08-17 12:17:18 +02:00
Loïc Lecrenier	ea4a96761c	Move content of readme for WordPrefixPairProximityDocids into the code	2022-08-17 12:05:37 +02:00
Loïc Lecrenier	220921628b	Simplify and document WordPrefixPairProximityDocIds::execute	2022-08-17 11:59:19 +02:00
Loïc Lecrenier	044356d221	Optimise WordPrefixPairProximityDocIds merge operation	2022-08-17 11:59:18 +02:00
Loïc Lecrenier	d350114159	Add tests for WordPrefixPairProximityDocIds	2022-08-17 11:59:15 +02:00
Loïc Lecrenier	86807ca848	Refactor word prefix pair proximity indexation further	2022-08-17 11:59:13 +02:00
Loïc Lecrenier	306593144d	Refactor word prefix pair proximity indexation	2022-08-17 11:59:00 +02:00
Loïc Lecrenier	12920f2a4f	Fix paths of snapshot tests	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	8ac24d3114	Cargo fmt + fix compiler warnings/error	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	6066256689	Add snapshot tests for indexing of word_prefix_pair_proximity_docids	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	3a734af159	Add snapshot tests for Facets::execute	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	58cb1c1bda	Simplify unit tests in facet/filter.rs	2022-08-04 12:03:44 +02:00
Loïc Lecrenier	acff17fb88	Simplify indexing tests	2022-08-04 12:03:13 +02:00
bors[bot]	21284cf235	Merge #556 556: Add EXISTS filter r=loiclec a=loiclec ## What does this PR do? Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo. It creates a `field EXISTS` filter which selects all documents containing the `field` key. For example, with the following documents: ```json [{ "id": 0, "colour": [] }, { "id": 1, "colour": ["blue", "green"] }, { "id": 2, "colour": 145238 }, { "id": 3, "colour": null }, { "id": 4, "colour": { "green": [] } }, { "id": 5, "colour": {} }, { "id": 6 }] ``` Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`. ## Details There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists. To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place. There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON: ```json { "id": 0, "colour": [], "size": {} } ``` would be flattened to: ```json { "id": 0 } ``` prior to being given to the extraction pipeline. This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example: ```json { "id": 0, "colour": { "green": [], "blue": 1 }, "size": {} } ``` becomes ```json { "id": 0, "colour": [], "colour.green": [], "colour.blue": 1, "size": [] } ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-08-04 09:46:06 +00:00
bors[bot]	50f6524ff2	Merge #579 579: Stop reindexing already indexed documents r=ManyTheFish a=irevoire ``` % ./compare.sh indexing_stop-reindexing-unchanged-documents_cb5a1669.json indexing_main_eeba1960.json group indexing_main_eeba1960 indexing_stop-reindexing-unchanged-documents_cb5a1669 ----- ---------------------- ----------------------------------------------------- indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.03 2.0±0.22ms ? ?/sec 1.00 1955.4±336.24µs ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.08 11.0±2.93ms ? ?/sec 1.00 10.2±4.04ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 15.1±3.89ms ? ?/sec 1.14 17.1±5.18ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.26 59.2±12.01ms ? ?/sec 1.00 47.1±8.52ms ? ?/sec indexing/-wiki-delete-searchable- 1.08 316.6±31.53ms ? ?/sec 1.00 293.6±17.00ms ? ?/sec indexing/Indexing geo_point 1.01 60.9±0.31s ? ?/sec 1.00 60.6±0.36s ? ?/sec indexing/Indexing movies in three batches 1.04 20.0±0.30s ? ?/sec 1.00 19.2±0.25s ? ?/sec indexing/Indexing movies with default settings 1.02 19.1±0.18s ? ?/sec 1.00 18.7±0.24s ? ?/sec indexing/Indexing nested movies with default settings 1.02 26.2±0.29s ? ?/sec 1.00 25.9±0.22s ? ?/sec indexing/Indexing nested movies without any facets 1.02 25.3±0.32s ? ?/sec 1.00 24.7±0.26s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 66.7±0.41s ? ?/sec 1.01 67.1±0.86s ? ?/sec indexing/Indexing songs with default settings 1.00 58.3±0.90s ? ?/sec 1.01 58.8±1.32s ? ?/sec indexing/Indexing songs without any facets 1.00 54.5±1.43s ? ?/sec 1.01 55.2±1.29s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 57.9±1.20s ? ?/sec 1.01 58.4±0.93s ? ?/sec indexing/Indexing wiki 1.00 1052.0±10.95s ? ?/sec 1.02 1069.4±20.38s ? ?/sec indexing/Indexing wiki in three batches 1.00 1193.1±8.83s ? ?/sec 1.00 1189.5±9.40s ? ?/sec indexing/Reindexing geo_point 3.22 67.5±0.73s ? ?/sec 1.00 21.0±0.16s ? ?/sec indexing/Reindexing movies with default settings 3.75 19.4±0.28s ? ?/sec 1.00 5.2±0.05s ? ?/sec indexing/Reindexing songs with default settings 8.90 61.4±0.91s ? ?/sec 1.00 6.9±0.07s ? ?/sec indexing/Reindexing wiki 1.00 1748.2±35.68s ? ?/sec 1.00 1750.5±18.53s ? ?/sec ``` tldr: We do not lose any performance on the normal indexing benchmark, but we get between 3 and 8 times faster on the reindexing benchmarks 👍 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-08-04 08:10:37 +00:00
ManyTheFish	d6f9a60a32	fix: Remove whitespace trimming during document id validation fix #592	2022-08-03 11:38:40 +02:00
Tamo	7fc35c5586	remove the useless prints	2022-08-02 10:31:22 +02:00
Tamo	f156d7dd3b	Stop reindexing already indexed documents	2022-08-02 10:31:20 +02:00
Loïc Lecrenier	07003704a8	Merge branch 'filter/field-exist'	2022-07-21 14:51:41 +02:00
Loïc Lecrenier	1506683705	Avoid using too much memory when indexing facet-exists-docids	2022-07-19 14:42:35 +02:00
Loïc Lecrenier	aed8c69bcb	Refactor indexation of the "facet-id-exists-docids" database The idea is to directly create a sorted and merged list of bitmaps in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating a grenad::Reader where the keys are field_id and the values are docids. Then we send that BTreeMap to the thing that handles TypedChunks, which inserts its content into the database.	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	1eb1e73bb3	Add integration tests for the EXISTS filter	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	80b962b4f4	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	c17d616250	Refactor index_documents_check_exists_database tests	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	30bd4db0fc	Simplify indexing task for facet_exists_docids database	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	392472f4bb	Apply suggestions from code review Co-authored-by: Tamo <tamo@meilisearch.com>	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	453d593ce8	Add a database containing the docids where each field exists	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	fc9f3f31e7	Change DocumentsBatchReader to access cursor and index at same time Otherwise it is not possible to iterate over all documents while using the fields index at the same time.	2022-07-18 16:08:14 +02:00
Loïc Lecrenier	ab1571cdec	Simplify Transform::read_documents, enabled by enriched documents reader	2022-07-18 12:45:47 +02:00
Kerollmops	448114cc1c	Fix the benchmarks with the new indexation API	2022-07-12 15:22:09 +02:00
Kerollmops	25e768f31c	Fix another issue with the nested primary key selector	2022-07-12 15:14:07 +02:00
Kerollmops	192793ee38	Add some tests to check for the nested documents ids	2022-07-12 15:14:07 +02:00
Kerollmops	dc61105554	Fix the nested document id fetching function	2022-07-12 15:14:06 +02:00
Kerollmops	2eec290424	Check the validity of the latitute and longitude numbers	2022-07-12 15:14:06 +02:00
Kerollmops	5d149d631f	Remove tests for a function that no more exists	2022-07-12 15:14:06 +02:00
Kerollmops	0bbcc7b180	Expose the `DocumentId` struct to be sure to inject the generated ids	2022-07-12 15:14:06 +02:00
Kerollmops	d1a4da9812	Generate a real UUIDv4 when ids are auto-generated	2022-07-12 15:14:06 +02:00
Kerollmops	c8ebf0de47	Rename the validate function as an enriching function	2022-07-12 15:14:06 +02:00
Kerollmops	905af2a2e9	Use the primary key and external id in the transform	2022-07-12 15:14:05 +02:00
Kerollmops	742543091e	Constify the default primary key name	2022-07-12 14:55:52 +02:00
Kerollmops	5f1bfb73ee	Extract the primary key name and make it accessible	2022-07-12 14:55:52 +02:00
Kerollmops	6a0a0ae94f	Make the Transform read from an EnrichedDocumentsBatchReader	2022-07-12 14:55:52 +02:00
Kerollmops	8ebf5eed0d	Make the nested primary key work	2022-07-12 14:55:52 +02:00
Kerollmops	19eb3b4708	Make sur that we do not accept floats as documents ids	2022-07-12 14:55:52 +02:00
Kerollmops	2ceeb51c37	Support the auto-generated ids when validating documents	2022-07-12 14:55:51 +02:00
Kerollmops	399eec5c01	Fix the indexation tests	2022-07-12 14:55:51 +02:00
Kerollmops	fcfc4caf8c	Move the Object type in the lib.rs file and use it everywhere	2022-07-12 14:55:51 +02:00
Kerollmops	0146175fe6	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	bdc4263883	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	e8297ad27e	Fix the tests for the new DocumentsBatchBuilder/Reader	2022-07-12 14:52:56 +02:00
bors[bot]	ebddfdb9a3	Merge #578 578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584). Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-05 14:56:08 +00:00
Kerollmops	1bfdcfc84f	Bump uuid to 1.1.2	2022-07-05 16:23:36 +02:00
Tamo	250be9fe6c	put the threshold back to 10k	2022-07-05 15:57:44 +02:00
Tamo	eaf28b0628	Apply review suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-05 15:30:33 +02:00
Tamo	3b309f654a	Fasten the document deletion When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search, and all the other endpoints.	2022-07-05 15:30:33 +02:00
Kerollmops	d7c248042b	Rename the limitedTo parameter into maxTotalHits	2022-06-22 12:00:48 +02:00
ManyTheFish	177154828c	Extends deletion tests	2022-06-13 17:34:16 +02:00
Kerollmops	445d5474cc	Add the pagination_limited_to setting to the database	2022-06-08 18:14:27 +02:00
Kerollmops	69931e50d2	Add the max_values_by_facet setting to the database	2022-06-08 17:54:56 +02:00
Kerollmops	52a494bd3b	Add the new pagination.limited_to and faceting.max_values_per_facet settings	2022-06-08 17:15:36 +02:00
Tamo	d0aaa7ff00	Fix wrong internal ids assignments	2022-06-07 15:49:33 +02:00
ad hoc	31776fdc3f	add failing test	2022-06-07 15:49:33 +02:00
ManyTheFish	86ac8568e6	Use Charabia in milli	2022-06-02 16:59:11 +02:00
ad hoc	8993fec8a3	return optional exact words	2022-05-24 09:15:49 +02:00
bors[bot]	08c6d50cd1	Merge #531 531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire port #529 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 16:06:36 +00:00
bors[bot]	cf3e574cb4	Merge #530 530: fix the searchable fields bug when a field is nested r=Kerollmops a=irevoire port #528 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 15:52:30 +00:00
Tamo	0af399a6d7	fix the mixed dataset geosearch indexing bug	2022-05-16 17:37:45 +02:00
Tamo	f586028f9a	fix the searchable fields bug when a field is nested Update milli/src/index.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-05-16 17:24:36 +02:00
bors[bot]	e1e85267fd	Merge #526 526: remove useless comment r=irevoire a=MarinPostma Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-16 10:01:43 +00:00
bors[bot]	65e6aa0de2	Merge #523 523: Improve geosearch error messages r=irevoire a=irevoire Improve the geosearch error messages (#488). And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-04 13:36:11 +00:00
Tamo	c55368ddd4	apply code suggestion Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-05-04 14:11:03 +02:00
ad hoc	5ad5d56f7e	remove useless comment	2022-05-04 10:43:54 +02:00
bors[bot]	0c2c8af44e	Merge #520 520: fix mistake in Settings initialization r=irevoire a=MarinPostma fix settings not being correctly initialized and add a test to make sure that they are in the future. fix https://github.com/meilisearch/meilisearch/issues/2358 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-03 15:32:18 +00:00
Kerollmops	211c8763b9	Make sure that we do not generate too long keys	2022-05-03 10:03:15 +02:00
Kerollmops	7e47031bdc	Add a test for long keys in LMDB	2022-05-03 10:03:13 +02:00
Tamo	3cb1f6d0a1	improve geosearch error messages	2022-05-02 19:20:47 +02:00
ad hoc	1ee3d6ae33	fix mistake in Settings initialization	2022-04-29 16:24:25 +02:00
Tamo	f19d2dc548	Only flatten the required fields apply review comments Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-04-26 12:33:46 +02:00
bors[bot]	8010eca9c7	Merge #505 505: normalize exact words r=curquiza a=MarinPostma Normalize the exact words, as specified in the specification. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-25 09:35:32 +00:00
ad hoc	2e0089d5ff	normalize exact words	2022-04-21 15:38:40 +02:00
ad hoc	3a2451fcba	add test normalize exact words	2022-04-21 13:52:09 +02:00
Clément Renault	eb5830aa40	Add a test to make sure that long words are handled	2022-04-21 13:45:28 +02:00
ad hoc	8b14090927	fix min-word-len-for-typo not reset properly	2022-04-19 15:20:16 +02:00
Tamo	00f78d6b5a	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-04-14 11:14:08 +02:00
Tamo	399fba16bb	only flatten an object if it's nested	2022-04-14 11:14:08 +02:00
Tamo	ee64f4a936	Use smartstring to store the external id in our hashmap We need to store all the external id (primary key) in a hashmap associated to their internal id during. The smartstring remove heap allocation / memory usage and should improve the cache locality.	2022-04-13 21:22:07 +02:00
Irevoire	4f3ce6d9cd	nested fields	2022-04-07 16:58:46 +02:00
ad hoc	b799f3326b	rename merge_nothing to merge_ignore_values	2022-04-05 18:44:35 +02:00
ad hoc	201fea0fda	limit extract_word_docids memory usage	2022-04-05 14:14:15 +02:00
ad hoc	b85cd4983e	remove field_id_from_position	2022-04-05 09:50:34 +02:00
ad hoc	ab185a59b5	fix infos	2022-04-05 09:46:56 +02:00
ad hoc	1810927dbd	rephrase exact_attributes doc	2022-04-04 21:04:49 +02:00
ad hoc	b7694c34f5	remove println	2022-04-04 21:00:07 +02:00
ad hoc	6cabd47c32	fix typo in comment	2022-04-04 20:59:20 +02:00
ad hoc	6b2c2509b2	fix bug in exact search	2022-04-04 20:54:03 +02:00
ad hoc	e8f06f6c06	extract exact_word_prefix_docids	2022-04-04 20:54:03 +02:00
ad hoc	6dd2e4ffbd	introduce exact_word_prefix database in index	2022-04-04 20:54:03 +02:00
ad hoc	ba0bb29cd8	refactor WordPrefixDocids to take dbs instead of indexes	2022-04-04 20:54:02 +02:00
ad hoc	c4c6e35352	query exact_word_docids in resolve_query_tree	2022-04-04 20:54:02 +02:00
ad hoc	8d46a5b0b5	extract exact word docids	2022-04-04 20:54:02 +02:00
ad hoc	0a77be4ec0	introduce exact_word_docids db	2022-04-04 20:54:02 +02:00
ad hoc	5f9f82757d	refactor spawn_extraction_task	2022-04-04 20:54:02 +02:00
ad hoc	f82d4b36eb	introduce exact attribute setting	2022-04-04 20:54:02 +02:00
ad hoc	8b1e5d9c6d	add test for exact words	2022-04-04 20:10:55 +02:00
ad hoc	9bbffb8fee	add exact words setting	2022-04-04 20:10:54 +02:00
ad hoc	1941072bb2	implement Copy on Setting	2022-04-04 10:41:46 +02:00
ad hoc	66020cd923	rename min_word_len* to use plain letter numbers	2022-04-04 10:41:46 +02:00
ad hoc	4c4b336ecb	rename min word len for typo error	2022-04-01 11:17:03 +02:00
ad hoc	286dd7b2e4	rename min_word_len_2_typo	2022-04-01 11:17:03 +02:00
ad hoc	55af85db3c	add tests for min_word_len_for_typo	2022-04-01 11:17:02 +02:00
ad hoc	5a24e60572	introduce word len for typo setting	2022-04-01 11:17:02 +02:00
ad hoc	3e34981d9b	add test for authorize_typos in update	2022-03-31 14:12:00 +02:00
ad hoc	c4653347fd	add authorize typo setting	2022-03-31 10:05:44 +02:00
bors[bot]	8efac33b53	Merge #467 467: optimize prefix database r=Kerollmops a=MarinPostma This pr introduces two optimizations that greatly improve the speed of computing prefix databases. - The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST. - We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-15 16:14:35 +00:00
ad hoc	d127c57f2d	review edits	2022-03-15 17:12:48 +01:00
ad hoc	d633ac5b9d	optimize word prefix pair	2022-03-15 16:37:22 +01:00
ad hoc	d68fe2b3c7	optimize word prefix fst	2022-03-15 16:36:48 +01:00
Kerollmops	21ec334dcc	Fix the compilation error of the dependency versions	2022-03-15 11:17:45 +01:00
Kerollmops	1ae13c1374	Avoid iterating on big databases when useless	2022-03-09 15:43:54 +01:00
Kerollmops	d5b8b5a2f8	Replace the ugly unwraps by clean if let Somes	2022-02-28 16:31:33 +01:00
Kerollmops	8d26f3040c	Remove a useless grenad file merging	2022-02-28 16:31:33 +01:00
Clément Renault	04b1bbf932	Reintroduce appending sorted entries when possible	2022-02-24 14:50:45 +01:00
bors[bot]	25123af3b8	Merge #436 436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops This PR depends on the fixes done in #431 and must be merged after it. In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and, `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs much fewer CPU cycles, an era where this update structure can use the, previously computed, set of new word docids from the newly indexed batch of documents. --- The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked for. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of documents ids which contains the pair of words and prefix at the given proximity. The origin of the performances issue that this struct brings is related to the fact that it starts its job from the beginning, it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-16 15:41:14 +00:00
Clément Renault	ff8d7a810d	Change the behavior of the as_cloneable_grenad by taking a ref	2022-02-16 15:40:08 +01:00
Clément Renault	f367cc2e75	Finally bump grenad to v0.4.1	2022-02-16 15:28:48 +01:00
Irevoire	48542ac8fd	get rid of chrono in favor of time	2022-02-15 11:41:55 +01:00
Many	d59bcea749	Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""	2022-02-02 17:01:13 +01:00
Kerollmops	fb79c32430	Compute the new, common and, deleted prefix words fst once	2022-01-27 11:00:18 +01:00
Clément Renault	51d1e64b23	Remove, now useless, the WriteMethod enum	2022-01-27 10:08:35 +01:00
Clément Renault	e9c02173cf	Rework the WordsPrefixPositionDocids update to compute a subset of the database	2022-01-27 10:08:35 +01:00
Clément Renault	dbba5fd461	Create a function to simplify the word prefix pair proximity docids compute	2022-01-27 10:08:35 +01:00
Clément Renault	e760e02737	Fix the computation of the newly added and common prefix pair proximity words	2022-01-27 10:08:35 +01:00
Clément Renault	d59e559317	Fix the computation of the newly added and common prefix words	2022-01-27 10:08:34 +01:00
Clément Renault	2ec8542105	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	28692f65be	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	5404bc02dd	Move the fst_stream_into_hashset method in the helper methods	2022-01-27 10:06:00 +01:00
Clément Renault	c90fa95f93	Only compute the word prefix pairs on the created word pair proximities	2022-01-27 10:06:00 +01:00
Clément Renault	822f67e9ad	Bring the newly created word pair proximity docids	2022-01-27 10:06:00 +01:00
Clément Renault	d28f18658e	Retrieve the previous version of the words prefixes FST	2022-01-27 10:05:59 +01:00
Clément Renault	f9b214f34e	Apply suggestions from code review Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2022-01-26 11:28:11 +01:00
Clément Renault	f04cd19886	Introduce a max prefix length parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	1514dfa1b7	Introduce a max proximity parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	23ea3ad738	Remove the useless threshold when computing the word prefix pair proximity	2022-01-25 17:04:23 +01:00
Clément Renault	e3c34684c6	Fix a bug where we were skipping most of the prefix pairs	2022-01-25 17:04:23 +01:00
bors[bot]	fd177b63f8	Merge #423 423: Remove an unused file r=irevoire a=irevoire This empty file is not included anywhere Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-19 14:18:05 +00:00
Marin Postma	0c84a40298	document batch support reusable transform rework update api add indexer config fix tests review changes Co-authored-by: Clément Renault <clement@meilisearch.com> fmt	2022-01-19 12:40:20 +01:00
Tamo	98a365aaae	store the geopoint in three dimensions	2021-12-14 12:21:24 +01:00
Tamo	d671d6f0f1	remove an unused file	2021-12-13 19:27:34 +01:00
Clément Renault	ef59762d8e	Prefer returning None instead of the Empty Filter state	2021-12-09 11:57:52 +01:00
many	8970246bc4	Sort positions before iterating over them during word pair proximity extraction	2021-11-22 18:16:54 +01:00
Marin Postma	6e977dd8e8	change visibility of DocumentDeletionResult	2021-11-22 15:44:44 +01:00
Marin Postma	6eb47ab792	remove update_id in UpdateBuilder	2021-11-16 13:07:04 +01:00
Marin Postma	09b4281cff	improve document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned metaimprove document addition returned meta	2021-11-10 14:08:36 +01:00
Marin Postma	721fc294be	improve document deletion returned meta returns both the remaining number of documents and the number of deleted documents.	2021-11-10 14:08:18 +01:00
Tamo	6831c23449	merge with main	2021-11-06 16:34:30 +01:00
Tamo	b249989bef	fix most of the tests	2021-11-06 01:32:12 +01:00
many	3599df77f0	Change some error messages	2021-10-27 19:33:01 +02:00
marin postma	baddd80069	implement review suggestions	2021-10-25 18:29:12 +02:00
marin postma	430e9b13d3	add csv builder tests	2021-10-25 10:26:43 +02:00
marin postma	2e62925a6e	fix tests	2021-10-25 10:26:42 +02:00
marin postma	0f86d6b28f	implement csv serialization	2021-10-25 10:26:42 +02:00
marin postma	8d70b01714	optimize document deserialization	2021-10-25 10:26:42 +02:00
bors[bot]	aa5e099718	Merge #390 390: Add helper methods on the settings r=Kerollmops a=irevoire This would be a good addition to look at the content of a setting without consuming it. It’s useful for analytics. Co-authored-by: Irevoire <tamo@meilisearch.com>	2021-10-13 20:36:30 +00:00
bors[bot]	c7db4176f3	Merge #384 384: Replace memmap with memmap2 r=Kerollmops a=palfrey [memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values. Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>	2021-10-13 13:47:23 +00:00
Irevoire	a3e7c468cd	add helper methods on the settings	2021-10-13 13:05:07 +02:00
bors[bot]	6e3b869e6a	Merge #388 388: fix primary key inference r=MarinPostma a=MarinPostma The primary key is was infered from a hashtable index of the field. For this reason the order in which the fields were interated upon was not deterministic, and the primary key was chosed ffrom the first field containing "id". This fix sorts the the index by field_id when infering the primary key. Co-authored-by: mpostma <postma.marin@protonmail.com>	2021-10-12 09:25:16 +00:00
mpostma	86ead92ed5	infer primary key on sorted fields	2021-10-12 11:15:11 +02:00
mpostma	9a266a531b	test correct primary key inference	2021-10-12 11:08:53 +02:00
many	c5a6075484	Make max_position_per_attributes changable	2021-10-12 10:10:50 +02:00
many	360c5ff3df	Remove limit of 1000 position per attribute Instead of using an arbitrary limit we encode the absolute position in a u32 using one strong u16 for the field id and a weak u16 for the relative position in the attribute.	2021-10-12 10:10:50 +02:00
Tom Parker-Shemilt	2dfe24f067	memmap -> memmap2	2021-10-10 22:47:12 +01:00
many	3296bb243c	Simplify word level position DB into a word position DB	2021-10-05 12:15:02 +02:00
Many	26b5dad042	Revert "Change chunk size to 4MiB to fit more the end user usage"	2021-09-29 15:08:39 +02:00
Tamo	f65153ad64	stop casting integer docids to string	2021-09-28 18:35:54 +02:00
many	1988416295	Add failing test related to Meilisearch#1714	2021-09-28 12:05:11 +02:00
many	b188063869	Change chunk size to 4MiB to fit more the end user usage	2021-09-27 14:26:21 +02:00
many	551df0cb77	Add test checking the bug reported in meilisearch issue 1716	2021-09-23 15:55:39 +02:00
mpostma	aa6c5df0bc	Implement documents format document reader transform remove update format support document sequences fix document transform clean transform improve error handling add documents! macro fix transform bug fix tests remove csv dependency Add comments on the transform process replace search cli fmt review edits fix http ui fix clippy warnings Revert "fix clippy warnings" This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620. fix review comments remove smallvec in transform loop review edits	2021-09-21 16:58:33 +02:00
bors[bot]	31c8de1cca	Merge #322 322: Geosearch r=ManyTheFish a=irevoire This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so. ### What we will have to do on the indexing part: - [x] Index the `_geo` fields from the documents. - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process. - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module. - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree` - [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification. - [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file. - [x] save a Roaring bitmap of all documents containing the `_geo` field ### What we will have to do on the query part: - [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range. - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum. - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents. - [x] Add the `_geoRadius` function in the pest parser. - [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too! - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule. ----------- - On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned. Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-20 19:04:57 +00:00
many	26deeb45a3	Add lacking parameter to word level position builder	2021-09-09 17:49:04 +02:00
Irevoire	a84f3a8b31	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-09 15:09:35 +02:00
Tamo	bad8ea47d5	edit the two lasts TODO comments	2021-09-08 18:24:09 +02:00
Tamo	bd4c248292	improve the error handling in general and introduce the concept of reserved keywords	2021-09-08 18:24:09 +02:00
Tamo	5bb175fc90	only index _geo if it's set as sortable OR filterable and only allow the filters if geo was set to filterable	2021-09-08 17:51:08 +02:00
Tamo	f73273d71c	only call the extractor if needed	2021-09-08 17:51:08 +02:00
Irevoire	ea2f2ecf96	create a new database containing all the documents that were geo-faceted	2021-09-08 17:51:08 +02:00
Irevoire	216a8aa3b2	add a tests for the indexation of the geosearch	2021-09-08 17:51:07 +02:00
Irevoire	a21c854790	handle errors	2021-09-08 17:51:07 +02:00
Irevoire	70ab2c37c5	remove multiple bugs	2021-09-08 17:51:07 +02:00
Irevoire	b4b6ba6d82	rename all the ’long’ into ’lng’ like written in the specification	2021-09-08 17:51:07 +02:00
Irevoire	3b9f1db061	implement the clear of the rtree	2021-09-08 17:51:07 +02:00
Irevoire	d344489c12	implement the deletion of geo points	2021-09-08 17:51:07 +02:00
Irevoire	44d6b6ae9e	Index the geo points	2021-09-08 17:51:07 +02:00
many	e54280fbfc	Skip empty normalized words	2021-09-08 15:25:23 +02:00
many	d18ee58ab9	Check if key are not empty in validator	2021-09-08 15:25:23 +02:00
many	9961b78b06	Drop sorter before creating a new one	2021-09-08 13:30:26 +02:00
many	741a4444a9	Remove log in chunk generator	2021-09-02 16:57:46 +02:00
many	7f7fafb857	Make document_chunk_size settable from update builder	2021-09-02 15:25:39 +02:00
many	db0c681bae	Fix Pr comments	2021-09-02 15:17:52 +02:00
many	4860fd4529	Ignore empty facet values	2021-09-01 16:48:40 +02:00
many	b3a22f31f6	Fix memory consuption in word pair proximity extractor	2021-09-01 16:48:40 +02:00
many	9452fabfb2	Optimize cbo roaring bitmaps merge	2021-09-01 16:48:40 +02:00
many	8f702828ca	Ignore errors comming from crossbeam channel senders	2021-09-01 16:48:40 +02:00
many	e09eec37bc	Handle distance addition with hard separators	2021-09-01 16:48:40 +02:00
many	fc7cc770d4	Add logging timers	2021-09-01 16:48:40 +02:00
many	a2f59a28f7	Remove unwrap sending errors in channel	2021-09-01 16:48:40 +02:00
many	5c962c03dd	Fix and optimize word_prefix_pair_proximity_docids database	2021-09-01 16:48:40 +02:00
many	2d1727697d	Take stop word in account	2021-09-01 16:48:40 +02:00
many	823da19745	Fix test and use progress callback	2021-09-01 16:48:39 +02:00
many	1d314328f0	Plug new indexer	2021-09-01 16:48:36 +02:00
bors[bot]	d6bba0663a	Merge #334 334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin Resolves https://github.com/meilisearch/milli/issues/263 Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>	2021-08-31 17:38:54 +00:00
Alexey Shekhirin	0b02eb456c	chore(update): wrap long values into BStr for warn logs	2021-08-31 20:28:16 +03:00
Kerollmops	f230ae6fd5	Introduce the reset_sortable_fields Settings method	2021-08-25 17:44:16 +02:00
Clément Renault	89d0758713	Revert "Revert "Sort at query time""	2021-08-24 11:55:16 +02:00
Clément Renault	c084f7f731	Fix the facet string docids filterable deletion bug	2021-08-23 10:50:39 +02:00
Clémentine Urquizar	922f9fd4d5	Revert "Sort at query time"	2021-08-20 18:09:17 +02:00
Kerollmops	71602e0f1b	Add the sortable fields into the settings and in the index	2021-08-18 15:04:07 +02:00
Kerollmops	5b88df508e	Use the new Asc/Desc syntax everywhere	2021-08-17 14:15:22 +02:00
bors[bot]	89b9b61840	Merge #300 300: Fix prefix level position docids database r=curquiza a=ManyTheFish The prefix search was inverted when we generated the DB. Instead of searching if word had a prefix in prefix fst, we were searching if the word was a prefix of a prefix contained in the prefix fst. The indexer, now, iterate over prefix contained in the fst and search them by prefix in the word-level-position-docids database, aggregating matches in a sorter. Fix #299 Co-authored-by: many <maxime@meilisearch.com>	2021-08-04 16:52:09 +00:00
many	cdeb07f0fd	Fix prefix level position docids database The prefix search was inverted when we generated the DB. Instead of searching if word had a prefix in prefix fst, we were searching if the word was a prefix of a prefix contained in the prefix fst. The indexer, now, iterate over prefix contained in the fst and search them by prefix in the word-level-position-docids database, aggregating matches in a sorter. Fix #299	2021-08-04 14:11:49 +02:00
bors[bot]	200e98c211	Merge #293 293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops Fix https://github.com/meilisearch/meilisearch/issues/1505. fix https://github.com/meilisearch/MeiliSearch/issues/1529 Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-27 16:04:52 +00:00
Kerollmops	dc2b63abdf	Introduce an empty FilterCondition variant to support unknown fields	2021-07-27 16:34:04 +02:00
Kerollmops	7aa6cc9b04	Do not insert fields in the map when changing the settings	2021-07-22 18:40:12 +02:00
bors[bot]	ee3a49cfba	Merge #291 291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released. It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏 ---- This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it. As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`. I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes. <details> <summary>The backtrace leading us to a panic in grenad.</summary> ``` meilisearch_1 \| thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17 meilisearch_1 \| stack backtrace: meilisearch_1 \| 0: rust_begin_unwind meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5 meilisearch_1 \| 1: core::panicking::panic_fmt meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14 meilisearch_1 \| 2: core::panicking::panic meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5 meilisearch_1 \| 3: grenad::block_builder::BlockBuilder::insert meilisearch_1 \| at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17 meilisearch_1 \| 4: grenad::writer::Writer<W>::insert meilisearch_1 \| at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12 meilisearch_1 \| 5: milli::update::words_level_positions::write_level_entry meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5 meilisearch_1 \| 6: milli::update::words_level_positions::compute_positions_levels meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13 meilisearch_1 \| 7: milli::update::words_level_positions::WordsLevelPositions::execute meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23 meilisearch_1 \| 8: milli::update::index_documents::IndexDocuments::execute_raw meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9 meilisearch_1 \| 9: milli::update::index_documents::IndexDocuments::execute meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9 meilisearch_1 \| 10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30 meilisearch_1 \| 11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22 meilisearch_1 \| 12: meilisearch_http::index::update_handler::UpdateHandler::handle_update meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18 meilisearch_1 \| 13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}} meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35 meilisearch_1 \| 14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21 meilisearch_1 \| 15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17 meilisearch_1 \| 16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9 meilisearch_1 \| 17: tokio::runtime::task::core::CoreStage<T>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13 meilisearch_1 \| 18: tokio::runtime::task::harness::poll_future::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23 meilisearch_1 \| 19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9 meilisearch_1 \| 20: std::panicking::try::do_call meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40 meilisearch_1 \| 21: std::panicking::try meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19 meilisearch_1 \| 22: std::panic::catch_unwind meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14 meilisearch_1 \| 23: tokio::runtime::task::harness::poll_future meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19 meilisearch_1 \| 24: tokio::runtime::task::harness::Harness<T,S>::poll_inner meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9 meilisearch_1 \| 25: tokio::runtime::task::harness::Harness<T,S>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15 meilisearch_1 \| 26: tokio::runtime::task::raw::RawTask::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18 meilisearch_1 \| 27: tokio::runtime::task::Notified<S>::run meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9 meilisearch_1 \| 28: tokio::runtime::blocking::pool::Inner::run meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17 meilisearch_1 \| 29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17 meilisearch_1 \| note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. ``` </details> Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-22 16:14:35 +00:00
Kerollmops	92c0a2cdc1	Add a test that triggers a panic when indexing zeroes	2021-07-22 17:14:44 +02:00
Kerollmops	aa02a7fdd8	Add a test to check that we indeed impact the relevancy	2021-07-22 17:04:38 +02:00
Clément Renault	0227254a65	Return the original string values for the inverted facet index database	2021-07-21 16:59:39 +02:00
Kerollmops	03a01166ba	Display the original facet string value from the linear facet database	2021-07-21 16:59:39 +02:00
Clément Renault	5676b204dd	Fix the facet string levels codecs	2021-07-21 16:59:38 +02:00
Kerollmops	8c86348119	Indexing the facet strings levels	2021-07-21 16:59:38 +02:00
Kerollmops	757b2b502a	Remove the FacetValueStringCodec	2021-07-21 16:59:38 +02:00
Kerollmops	9f8095c069	Make sure that we don't keep a reference on the LMDB key when using put_current	2021-07-21 10:35:35 +02:00
Kerollmops	a9553af635	Add a test to check that we can index more that 256 fields	2021-07-06 11:58:03 +02:00
Kerollmops	838ed1cd32	Use an u16 field id instead of one byte	2021-07-06 11:58:03 +02:00
bors[bot]	b4dcdbf00d	Merge #269 #271 269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops This PR fixes #268. The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted. The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html). 271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-06-30 12:34:55 +00:00
Kerollmops	32b7bd366f	Remove the roaring operation functions warnings	2021-06-30 14:12:56 +02:00
Kerollmops	c92ef54466	Add a test for when we insert a previously deleted document	2021-06-30 14:00:01 +02:00
Clément Renault	bdc5599b73	Bump heed to use the git repo with v0.12.0	2021-06-28 18:26:20 +02:00
Clément Renault	0013236e5d	Fix the LMDB and heed invalid interactions. It is undefined behavior to keep a reference to the database while modifying it, we were keeping references in the database and also feeding the heed put_current methods with keys referenced inside the database itself. https://github.com/Kerollmops/heed/pull/108	2021-06-28 16:19:02 +02:00
Kerollmops	9e5f9a8a10	Add a test for the words level positions generation bug	2021-06-28 16:08:31 +02:00
Kerollmops	4fc8f06791	Rename faceted_fields into filterable_fields	2021-06-23 17:26:54 +02:00
Kerollmops	c31cadb54f	Do not consider the searchable field as filterable	2021-06-23 17:26:54 +02:00
bors[bot]	5b6adc6d96	Merge #245 245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops Closes #191, and resolves #140. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-06-22 12:10:52 +00:00
Kerollmops	51dbb2e06d	Warn for when a key is too large for LMDB	2021-06-22 11:51:36 +02:00
Kerollmops	0cca2ea24f	Return a MissingDocumentId when a document doesn't have one	2021-06-22 11:22:33 +02:00
Kerollmops	481b0bf277	Warn for when a facet key is too large for LMDB	2021-06-22 10:57:46 +02:00
Clémentine Urquizar	daef43f504	Rename FieldsDistribution into FieldDistribution	2021-06-21 15:57:41 +02:00
Tamo	d08cfda796	convert the field_distribution to a BTreeMap and avoid counting twice the same documents	2021-06-17 18:31:54 +02:00
Tamo	969adaefdf	rename fields_distribution in field_distribution	2021-06-17 15:16:20 +02:00
Tamo	9716fb3b36	format the whole project	2021-06-16 18:33:33 +02:00
many	ce0315a10f	Close write transaction in test	2021-06-16 11:03:37 +02:00
Kerollmops	713acc408b	Introduce the primary key to the Settings builder structure	2021-06-16 11:03:36 +02:00
Kerollmops	a7d6930905	Replace the panicking expect by tracked Errors	2021-06-15 11:51:32 +02:00
Kerollmops	28c004aa2c	Prefer using constant for the database names	2021-06-15 11:13:04 +02:00
Kerollmops	312c2d1d8e	Use the Error enum everywhere in the project	2021-06-14 16:58:38 +02:00
Kerollmops	d2b1ecc885	Remove a lot of serialization unreachable errors	2021-06-14 16:48:51 +02:00
Kerollmops	65b1d09d55	Move the obkv merging functions into the merge_function module	2021-06-14 16:48:51 +02:00
Kerollmops	ab727e428b	Remove the docid_word_positions_merge method that must never be called	2021-06-14 16:48:51 +02:00
Kerollmops	93a8633f18	Remove the documents_merge method that must never be called	2021-06-14 16:48:51 +02:00
Kerollmops	cfc7314bd1	Prefer using an explicit merge function name	2021-06-14 16:48:50 +02:00
Kerollmops	93978ec38a	Serializing a RoaringBitmap into a Vec cannot fail	2021-06-14 16:48:50 +02:00
Kerollmops	ff9414a6ba	Use the out of the compute_primary_key_pair function	2021-06-14 16:48:50 +02:00
Kerollmops	0bf4f3f48a	Modify a test to check that criteria additions change the fields ids map	2021-06-08 18:14:34 +02:00
Kerollmops	82df524e09	Make sure that we register the field when setting criteria	2021-06-08 18:14:33 +02:00
Kerollmops	133ab98260	Use the index primary key when deleting documents	2021-06-08 17:33:29 +02:00
bors[bot]	39ed133f9f	Merge #193 193: Fix primary key behavior r=Kerollmops a=MarinPostma this pr: - Adds early returns on empty document additions, avoiding error messages to be returned when adding no documents and no primary key was set. - Changes the primary key inference logic to match that of legacy meilisearch. close #194 Co-authored-by: Marin Postma <postma.marin@protonmail.com> Co-authored-by: marin postma <postma.marin@protonmail.com>	2021-06-03 10:24:21 +00:00
marin postma	57898d8a90	fix silent deserialize error	2021-06-03 10:42:55 +02:00
Kerollmops	3c304c89d4	Make sure that we generate the faceted database when required	2021-06-02 16:24:58 +02:00
Kerollmops	b0c0490e85	Make sure that we can add a Asc/Desc field without it being filterable	2021-06-02 16:24:58 +02:00
Kerollmops	6476827d3a	Fix the indexer to be sure that distinct and Asc/Desc are also faceted	2021-06-02 16:24:58 +02:00
Kerollmops	2a3f9b32ff	Rename the faceted fields into filterable fields	2021-06-02 16:24:57 +02:00
Many	ab2cf69e8d	Update milli/src/update/delete_documents.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-01 17:04:10 +02:00
Many	8e6d1ff0dc	Update milli/src/update/index_documents/store.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-01 17:04:02 +02:00
many	4ddf008be2	add field id word count database	2021-05-31 16:27:28 +02:00
Kerollmops	1c0a5cd136	Resolve code modification suggestions	2021-05-31 15:22:50 +02:00
Clément Renault	3a4a150ef0	Fix the tests and remaining warnings	2021-05-25 11:31:06 +02:00
Clément Renault	bd7b285bae	Split the update side to use the number and the strings facet databases	2021-05-25 11:30:00 +02:00
Clément Renault	837c1041c7	Clear and delete the documents from the facet database	2021-05-25 11:28:36 +02:00
Marin Postma	eeb0c70ea2	meilisearch compatible primary key inference	2021-05-06 22:42:32 +02:00
Marin Postma	313c362461	early return on empty document addition	2021-05-06 18:14:16 +02:00
Alexey Shekhirin	f8d0f5265f	fix(update): fields distribution after documents merge	2021-05-04 22:12:20 +03:00
Alexey Shekhirin	d81c0e8bba	feat(update): disable autogenerate_docids by default	2021-04-30 21:41:34 +03:00
Marin Postma	e8e32e0ba1	make document addition number visible	2021-04-29 20:05:07 +02:00
Kerollmops	e65bad16cc	Compute the words prefixes at the end of an update	2021-04-27 14:39:52 +02:00
Clément Renault	0ad9499b93	Fix an indexing bug in the words level positions	2021-04-27 14:35:43 +02:00
Clément Renault	7aa5753ed2	Make the attribute positions range bounds to be fixed	2021-04-27 14:35:43 +02:00
Kerollmops	89ee2cf576	Introduce the TreeLevel struct	2021-04-27 14:25:35 +02:00
Kerollmops	bd1a371c62	Compute the WordsLevelPositions only once	2021-04-27 14:25:34 +02:00
Kerollmops	8bd4f5d93e	Compute the biggest values of the words_level_positions_docids	2021-04-27 14:25:34 +02:00
Kerollmops	f713828406	Implement the clear and delete documents for the word-level-positions database	2021-04-27 14:25:34 +02:00
Kerollmops	3069bf4f4a	Fix and improve the words-level-positions computation	2021-04-27 14:25:34 +02:00
Kerollmops	3a25137ee4	Expose and use the WordsLevelPositions update	2021-04-27 14:25:34 +02:00
Kerollmops	c765f277a3	Introduce the WordsLevelPositions update	2021-04-27 14:25:34 +02:00
Kerollmops	9242f2f1d4	Store the first word positions levels	2021-04-27 14:25:34 +02:00
Kerollmops	b0a417f342	Introduce the word_level_position_docids Index database	2021-04-27 14:25:34 +02:00
Kerollmops	c9b2d3ae1a	Warn instead of returning an error when a conversion fails	2021-04-20 10:23:31 +02:00
Kerollmops	51767725b2	Simplify integer and float functions trait bounds	2021-04-20 10:23:31 +02:00
Alexey Shekhirin	33860bc3b7	test(update, settings): set & reset synonyms fixes after review more fixes after review	2021-04-18 11:24:17 +03:00
Alexey Shekhirin	e39aabbfe6	feat(search, update): synonyms	2021-04-18 11:24:17 +03:00
Marin Postma	45c45e11dd	implement distinct attribute distinct can return error facet distinct on numbers return distinct error review fixes make get_facet_value more generic fixes	2021-04-15 16:25:55 +02:00
tamo	dcb00b2e54	test a new implementation of the stop_words	2021-04-12 18:35:33 +02:00
Alexey Shekhirin	84c1dda39d	test(http): setting enum serialize/deserialize	2021-04-08 17:03:40 +03:00
Alexey Shekhirin	dc636d190d	refactor(http, update): introduce setting enum	2021-04-08 17:03:40 +03:00
Alexey Shekhirin	2658c5c545	feat(index): update fields distribution in clear & delete operations fixes after review bump the version of the tokenizer implement a first version of the stop_words The front must provide a BTreeSet containing the stop words The stop_words are set at None if an empty Set is provided add the stop-words in the http-ui interface Use maplit in the test and remove all the useless drop(rtxn) at the end of all tests Integrate the stop_words in the querytree remove the stop_words from the querytree except if it was a prefix or a typo more fixes after review	2021-04-01 19:12:35 +03:00
Alexey Shekhirin	27c7ab6e00	feat(index): store fields distribution in index	2021-04-01 18:35:19 +03:00
tamo	a2f46029c7	implement a first version of the stop_words The front must provide a BTreeSet containing the stop words The stop_words are set at None if an empty Set is provided add the stop-words in the http-ui interface Use maplit in the test and remove all the useless drop(rtxn) at the end of all tests	2021-04-01 13:57:55 +02:00
Alexey Shekhirin	9205b640a4	feat(index): introduce fields_ids_distribution	2021-03-31 18:44:47 +03:00
mpostma	615fe095e1	update index updated at on index writes	2021-03-15 14:05:47 +01:00
Kerollmops	f51eb46c69	Use the RoaringBitmapLenCodec to retrieve the count of documents	2021-03-09 10:25:39 +01:00
Clément Renault	e5bb96bc3b	Fix the searchable settings test	2021-03-06 12:48:41 +01:00
Kerollmops	07784c8990	Tune the words prefixes threshold to compute for 1/1000 instead	2021-03-03 15:51:28 +01:00
Kerollmops	f376c6a728	Make sure we retrieve the docid word positions	2021-03-03 15:45:03 +01:00
many	246286f0eb	take hard separator into account	2021-03-03 15:45:03 +01:00
mpostma	e08b6b3ec7	add primary key to fields_id_map when not present	2021-03-01 16:10:16 +01:00
Clément Renault	c318373b88	Expose the WordsPrefixes update on the UpdateBuilder	2021-02-21 12:15:35 +01:00
Kerollmops	a4a48be923	Run the words prefixes update inside of the indexing documents update	2021-02-17 11:22:26 +01:00
Kerollmops	616ed8f73c	Clean up the word prefix pair proximities when deleting documents	2021-02-17 11:22:26 +01:00
Clément Renault	ea37fd821d	Clean up the words prefixes when deleting documents and words	2021-02-17 11:22:25 +01:00
Clément Renault	62eee9c69e	Introduce the sorter_into_lmdb_database helper function	2021-02-17 11:12:39 +01:00
Clément Renault	b5b89990eb	Compute and write the word prefix pair proximities database	2021-02-17 11:12:38 +01:00
Kerollmops	9b03b0a1b2	Introduce the word prefix pair proximity docids database	2021-02-17 11:12:38 +01:00
Clément Renault	f365de636f	Compute and write the word-prefix-docids database	2021-02-17 11:12:38 +01:00
Clément Renault	ee5a60e1c5	Clear the words prefixes when clearing an index	2021-02-17 10:45:17 +01:00
Clément Renault	b3a21d5a50	Introduce the getters and setters for the words prefixes FST	2021-02-17 10:45:17 +01:00
Clément Renault	89ce4e74fe	Do not change the primary key type when we serialize documents	2021-02-15 21:24:36 +01:00
Clément Renault	69acdd437e	Deserialize documents ids into JSON Values on deletion	2021-02-15 21:24:36 +01:00
Clément Renault	b3776598d8	Add a test to check deletion of documents with number as primary key	2021-02-15 21:24:35 +01:00
Clément Renault	e8639517da	Change the project to become a workspace with milli as a default-member	2021-02-12 16:15:09 +01:00

... 9 10 11 12 13 ...

1005 Commits