meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-12-12 14:36:01 +08:00

Author	SHA1	Message	Date
bors[bot]	941af58239	Merge #561 561: Enriched documents batch reader r=curquiza a=Kerollmops ~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~ This PR contains the work in #555 and can be merged on main as soon as reviewed and approved. - [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id. - [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`. - [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`. - [x] Remove the `update_primary_key` from the _transform.rs_ file. - [x] Really generate the auto-generated documents ids. - [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-21 07:08:50 +00:00
Loïc Lecrenier	41a0ce07cb	Add a code comment, as suggested in PR review Co-authored-by: Many the fish <many@meilisearch.com>	2022-07-20 16:20:35 +02:00
Loïc Lecrenier	1506683705	Avoid using too much memory when indexing facet-exists-docids	2022-07-19 14:42:35 +02:00
Loïc Lecrenier	d0eee5ff7a	Fix compiler error	2022-07-19 13:54:30 +02:00
Loïc Lecrenier	aed8c69bcb	Refactor indexation of the "facet-id-exists-docids" database The idea is to directly create a sorted and merged list of bitmaps in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating a grenad::Reader where the keys are field_id and the values are docids. Then we send that BTreeMap to the thing that handles TypedChunks, which inserts its content into the database.	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	1eb1e73bb3	Add integration tests for the EXISTS filter	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	4f0bd317df	Remove custom implementation of BytesEncode/Decode for the FieldId	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	80b962b4f4	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	c17d616250	Refactor index_documents_check_exists_database tests	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	30bd4db0fc	Simplify indexing task for facet_exists_docids database	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	392472f4bb	Apply suggestions from code review Co-authored-by: Tamo <tamo@meilisearch.com>	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	0388b2d463	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	dc64170a69	Improve syntax of EXISTS filter, allow “value NOT EXISTS”	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	72452f0cb2	Implements the EXIST filter operator	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	453d593ce8	Add a database containing the docids where each field exists	2022-07-19 10:07:33 +02:00
Many the fish	2d79720f5d	Update milli/src/search/matches/mod.rs	2022-07-18 17:48:04 +02:00
Many the fish	8ddb4e750b	Update milli/src/search/matches/mod.rs	2022-07-18 17:47:39 +02:00
Many the fish	a277daa1f2	Update milli/src/search/matches/mod.rs	2022-07-18 17:47:13 +02:00
Many the fish	fb794c6b5e	Update milli/src/search/matches/mod.rs	2022-07-18 17:46:00 +02:00
Many the fish	1237cfc249	Update milli/src/search/matches/mod.rs	2022-07-18 17:45:37 +02:00
Many the fish	d7fd5c58cd	Update milli/src/search/matches/mod.rs	2022-07-18 17:45:06 +02:00
Loïc Lecrenier	fc9f3f31e7	Change DocumentsBatchReader to access cursor and index at same time Otherwise it is not possible to iterate over all documents while using the fields index at the same time.	2022-07-18 16:08:14 +02:00
Loïc Lecrenier	ab1571cdec	Simplify Transform::read_documents, enabled by enriched documents reader	2022-07-18 12:45:47 +02:00
Many the fish	e261ef64d7	Update milli/src/search/matches/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-18 10:18:51 +02:00
Many the fish	1da4ab5918	Update milli/src/search/matches/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-18 10:18:03 +02:00
Kerollmops	448114cc1c	Fix the benchmarks with the new indexation API	2022-07-12 15:22:09 +02:00
Kerollmops	25e768f31c	Fix another issue with the nested primary key selector	2022-07-12 15:14:07 +02:00
Kerollmops	192793ee38	Add some tests to check for the nested documents ids	2022-07-12 15:14:07 +02:00
Kerollmops	a892a4a79c	Introduce a function to extend from a JSON array of objects	2022-07-12 15:14:06 +02:00
Kerollmops	dc61105554	Fix the nested document id fetching function	2022-07-12 15:14:06 +02:00
Kerollmops	2eec290424	Check the validity of the latitude and longitude numbers	2022-07-12 15:14:06 +02:00
Kerollmops	5d149d631f	Remove tests for a function that no longer exists	2022-07-12 15:14:06 +02:00
Kerollmops	0bbcc7b180	Expose the `DocumentId` struct to be sure to inject the generated ids	2022-07-12 15:14:06 +02:00
Kerollmops	d1a4da9812	Generate a real UUIDv4 when ids are auto-generated	2022-07-12 15:14:06 +02:00
Kerollmops	c8ebf0de47	Rename the validate function as an enriching function	2022-07-12 15:14:06 +02:00
Kerollmops	905af2a2e9	Use the primary key and external id in the transform	2022-07-12 15:14:05 +02:00
Kerollmops	742543091e	Constify the default primary key name	2022-07-12 14:55:52 +02:00
Kerollmops	5f1bfb73ee	Extract the primary key name and make it accessible	2022-07-12 14:55:52 +02:00
Kerollmops	6a0a0ae94f	Make the Transform read from an EnrichedDocumentsBatchReader	2022-07-12 14:55:52 +02:00
Kerollmops	dc3f092d07	Do not leak an internal grenad Error	2022-07-12 14:55:52 +02:00
Kerollmops	8ebf5eed0d	Make the nested primary key work	2022-07-12 14:55:52 +02:00
Kerollmops	19eb3b4708	Make sure that we do not accept floats as documents ids	2022-07-12 14:55:52 +02:00
Kerollmops	2ceeb51c37	Support the auto-generated ids when validating documents	2022-07-12 14:55:51 +02:00
Kerollmops	399eec5c01	Fix the indexation tests	2022-07-12 14:55:51 +02:00
Kerollmops	fcfc4caf8c	Move the Object type in the lib.rs file and use it everywhere	2022-07-12 14:55:51 +02:00
Kerollmops	0146175fe6	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	cefffde9af	Improve the .gitignore of the fuzz crate	2022-07-12 14:55:51 +02:00
Kerollmops	bdc4263883	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	6d0498df24	Fix the fuzz tests	2022-07-12 14:52:56 +02:00
Kerollmops	e8297ad27e	Fix the tests for the new DocumentsBatchBuilder/Reader	2022-07-12 14:52:56 +02:00
Kerollmops	419ce3966c	Rework the DocumentsBatchBuilder/Reader to use grenad	2022-07-12 14:52:55 +02:00
Kerollmops	eb63af1f10	Update grenad to 0.4.2	2022-07-12 14:52:55 +02:00
Kerollmops	048e174efb	Do not allocate when parsing CSV headers	2022-07-12 14:52:55 +02:00
ManyTheFish	5d79617a56	Chores: Enhance smart-crop code comments	2022-07-07 16:28:09 +02:00
bors[bot]	ebddfdb9a3	Merge #578 578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584). Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-05 14:56:08 +00:00
Kerollmops	1bfdcfc84f	Bump uuid to 1.1.2	2022-07-05 16:23:36 +02:00
Tamo	250be9fe6c	put the threshold back to 10k	2022-07-05 15:57:44 +02:00
Tamo	b61efd09fc	Makes the internal soft deleted error a UserError	2022-07-05 15:34:45 +02:00
Tamo	eaf28b0628	Apply review suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-05 15:30:33 +02:00
Tamo	3b309f654a	Speed up the document deletion When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search, and all the other endpoints.	2022-07-05 15:30:33 +02:00
Tamo	446439e8be	bump charabia	2022-07-05 12:19:30 +02:00
Dmytro Gordon	3ff03a3f5f	Fix not equal filter when field contains both number and strings	2022-06-27 15:55:17 +03:00
Kerollmops	cc48992e79	Bump the milli version to 0.31.1	2022-06-22 17:05:51 +02:00
Kerollmops	238692a8e7	Introduce the copy_to_path method on the Index	2022-06-22 16:49:47 +02:00
bors[bot]	290a40b7a5	Merge #564 564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter into `maxTotalHits`. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-22 13:48:33 +00:00
bors[bot]	d546f6f40e	Merge #563 563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way. The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and take too much time to answer. When we deduplicate the documents by their distinct value we must do it on the fly: every time we see a new document we check that its distinct value doesn't collide with that of an already returned document. The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned. If we wanted to have a correct `estimatedNbHits` every time we would have to do a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, which could cost a lot of CPU cycles. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-22 12:39:44 +00:00
Kerollmops	f5c3b951bc	Bump the milli version to 0.31.0	2022-06-22 12:08:16 +02:00
Kerollmops	d7c248042b	Rename the limitedTo parameter into maxTotalHits	2022-06-22 12:00:48 +02:00
Kerollmops	d2f84a9d9e	Improve the estimatedNbHits when distinct is enabled	2022-06-22 11:39:21 +02:00
bors[bot]	4f547eff02	Merge #560 560: Update version for next release (v0.30.0) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-20 12:37:01 +00:00
Clémentine Urquizar	31f749b5d8	Update version for next release (v0.30.0)	2022-06-20 12:09:57 +02:00
ManyTheFish	a0ab90a4d7	Avoid having an ending separator before crop marker	2022-06-16 18:23:57 +02:00
ManyTheFish	177154828c	Extends deletion tests	2022-06-13 17:34:16 +02:00
ManyTheFish	0d1d354052	Ensure that Index methods are not bypassed by Meilisearch	2022-06-13 17:34:11 +02:00
bors[bot]	f1d848bb9a	Merge #552 552: Fix escaped quotes in filter r=Kerollmops a=irevoire Will fix https://github.com/meilisearch/meilisearch/issues/2380 The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token. To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-06-09 14:56:44 +00:00
Tamo	676187ba43	bump milli version	2022-06-09 16:53:32 +02:00
Tamo	90afde435b	fix escaped quotes in filter	2022-06-09 16:03:49 +02:00
Kerollmops	445d5474cc	Add the pagination_limited_to setting to the database	2022-06-08 18:14:27 +02:00
Kerollmops	69931e50d2	Add the max_values_by_facet setting to the database	2022-06-08 17:54:56 +02:00
Kerollmops	52a494bd3b	Add the new pagination.limited_to and faceting.max_values_per_facet settings	2022-06-08 17:15:36 +02:00
bors[bot]	9580b9de79	Merge #549 549: Bump the version to 0.29.2 r=curquiza a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-08 14:29:47 +00:00
Kerollmops	56ee9cc21f	Bump the version to 0.29.2	2022-06-08 16:00:06 +02:00
Kerollmops	2a505503b3	Change the number of facet values returned by default to 100	2022-06-08 15:58:57 +02:00
Kerollmops	bae4007447	Remove the hard limit on the number of facet values returned	2022-06-08 15:58:57 +02:00
bors[bot]	7313d6c533	Merge #547 547: Update version for next release (v0.29.1) r=Kerollmops a=curquiza A new milli version will be released once this PR is merged https://github.com/meilisearch/milli/pull/543 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-08 10:20:24 +00:00
Clémentine Urquizar	478dbfa45a	Update version for next release (v0.29.1)	2022-06-07 18:59:33 +02:00
Tamo	d0aaa7ff00	Fix wrong internal ids assignments	2022-06-07 15:49:33 +02:00
ad hoc	31776fdc3f	add failing test	2022-06-07 15:49:33 +02:00
bors[bot]	05ae6dbfa4	Merge #541 541: Update version for next release (v0.29.0) r=ManyTheFish a=curquiza Need to update the version since #540 was merged and breaking Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-02 16:53:28 +00:00
ManyTheFish	d212dc6b8b	Remove useless newline	2022-06-02 18:22:56 +02:00
Clémentine Urquizar	6ce1c6487a	Update version for next release (v0.29.0)	2022-06-02 18:07:55 +02:00
ManyTheFish	7aabe42ae0	Refactor matching words	2022-06-02 17:59:04 +02:00
ManyTheFish	86ac8568e6	Use Charabia in milli	2022-06-02 16:59:11 +02:00
ManyTheFish	192e024ada	Add Charabia in Cargo.toml	2022-06-02 16:59:07 +02:00
Clémentine Urquizar	c19c17eddb	Update version to v0.28.1	2022-06-01 18:31:02 +02:00
bors[bot]	74d1914a64	Merge #535 535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops This PR reintroduces the max values by facet limit this is related to https://github.com/meilisearch/meilisearch/issues/2349. ~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~ I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help! Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-01 14:30:50 +00:00
bors[bot]	582930dbbb	Merge #538 538: speedup exact words r=Kerollmops a=MarinPostma This PR makes `exact_words` return an `Option` instead of an empty set, since set creation is costly, as noticed by `@kerollmops`. I was not convinced that this was the cause of all of the performance drop we measured, and then realized that the methods that initialized it were called recursively, which caused initialization times to add up. While the first fix solves the issue when not using exact words, using exact words remained way more expensive than it should be. To address this issue, the exact words are cached into the `Context`, so they are only initialized once. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-30 08:20:34 +00:00
ad hoc	25fc576696	review changes	2022-05-24 14:15:33 +02:00
ad hoc	69dc4de80f	change &Option<Set> to Option<&Set>	2022-05-24 12:14:55 +02:00
ad hoc	ac975cc747	cache context's exact words	2022-05-24 09:43:17 +02:00
ad hoc	8993fec8a3	return optional exact words	2022-05-24 09:15:49 +02:00
Matthias Wright	754f48a4fb	Improves ranking rules error message	2022-05-20 21:25:43 +02:00
Kerollmops	cd7c6e19ed	Reintroduce the max values by facet limit	2022-05-18 15:57:57 +02:00
ManyTheFish	895f5d8a26	Bump milli version	2022-05-18 10:37:12 +02:00
ManyTheFish	137434a1c8	Add some implementation on MatchBounds	2022-05-17 15:57:09 +02:00
bors[bot]	08c6d50cd1	Merge #531 531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire port #529 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 16:06:36 +00:00
bors[bot]	cf3e574cb4	Merge #530 530: fix the searchable fields bug when a field is nested r=Kerollmops a=irevoire port #528 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 15:52:30 +00:00
Tamo	0af399a6d7	fix the mixed dataset geosearch indexing bug	2022-05-16 17:37:45 +02:00
Tamo	f586028f9a	fix the searchable fields bug when a field is nested Update milli/src/index.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-05-16 17:24:36 +02:00
bors[bot]	e1e85267fd	Merge #526 526: remove useless comment r=irevoire a=MarinPostma Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-16 10:01:43 +00:00
bors[bot]	51809eb260	Merge #525 525: Simplify the error creation with thiserror r=irevoire a=irevoire I introduced [`thiserror`](https://docs.rs/thiserror/latest/thiserror/) to implement all the `Display` traits and most of the `impl From<xxx> for yyy` in far fewer lines. And then I introduced a cute macro to implement the `impl<X, Y, Z> From<X> for Z where Y: From<X>, Z: From<X>` more easily. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-04 15:47:32 +00:00
Tamo	484a9ddb27	Simplify the error creation with thiserror and a smol friendly macro	2022-05-04 17:24:00 +02:00
bors[bot]	65e6aa0de2	Merge #523 523: Improve geosearch error messages r=irevoire a=irevoire Improve the geosearch error messages (#488). And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-04 13:36:11 +00:00
Tamo	c55368ddd4	apply code suggestion Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-05-04 14:11:03 +02:00
ad hoc	5ad5d56f7e	remove useless comment	2022-05-04 10:43:54 +02:00
bors[bot]	0c2c8af44e	Merge #520 520: fix mistake in Settings initialization r=irevoire a=MarinPostma fix settings not being correctly initialized and add a test to make sure that they are in the future. fix https://github.com/meilisearch/meilisearch/issues/2358 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-03 15:32:18 +00:00
Kerollmops	211c8763b9	Make sure that we do not generate too long keys	2022-05-03 10:03:15 +02:00
Kerollmops	7e47031bdc	Add a test for long keys in LMDB	2022-05-03 10:03:13 +02:00
Tamo	3cb1f6d0a1	improve geosearch error messages	2022-05-02 19:20:47 +02:00
ad hoc	1ee3d6ae33	fix mistake in Settings initialization	2022-04-29 16:24:25 +02:00
bors[bot]	9db86aac51	Merge #518 518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue when Meilisearch is up-to-date with this PR. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-04-28 09:04:36 +00:00
Kerollmops	a4d343aade	Add a test to check for the returned facet distribution	2022-04-26 18:12:58 +02:00
bors[bot]	c2bd94c871	Merge #511 511: Update version in every workspace r=curquiza a=curquiza Checked with `@Kerollmops` - Update the version into every workspace (the current version is v0.27.0, but I forgot to update it for the previous release) - add `publish = false` except in `milli` workspace. Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-26 16:06:47 +00:00
Kerollmops	7d1c2d97bf	Return facets even when there is no value associated to it	2022-04-26 17:59:53 +02:00
bors[bot]	d388ea0f9d	Merge #506 506: fix cargo warnings r=Kerollmops a=MarinPostma fix cargo warnings Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-26 15:45:20 +00:00
ad hoc	5c29258e8e	fix cargo warnings	2022-04-26 17:33:11 +02:00
bors[bot]	2fdf520271	Merge #514 514: Stop flattening every field r=Kerollmops a=irevoire When we need to flatten a document: * The primary key contains a `.`. * Some fields need to be flattened Instead of flattening the whole object and thus creating a lot of allocations with the `serde_json_flatten_crate`, we instead generate a minimal sub-object containing only the fields that need to be flattened. That should create fewer allocations and thus index faster. --------- ``` group indexing_main_e1e362fa indexing_stop-flattening-every-field_40d1bd6b ----- ---------------------- --------------------------------------------- indexing/Indexing geo_point 1.99 23.7±0.23s ? ?/sec 1.00 11.9±0.21s ? ?/sec indexing/Indexing movies in three batches 1.00 18.2±0.24s ? ?/sec 1.01 18.3±0.29s ? ?/sec indexing/Indexing movies with default settings 1.00 17.5±0.09s ? ?/sec 1.01 17.7±0.26s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 64.8±0.47s ? ?/sec 1.00 65.1±0.49s ? ?/sec indexing/Indexing songs with default settings 1.00 54.9±0.99s ? ?/sec 1.01 55.7±1.34s ? ?/sec indexing/Indexing songs without any facets 1.00 50.6±0.62s ? ?/sec 1.01 50.9±1.05s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 54.0±1.14s ? ?/sec 1.01 54.7±1.13s ? ?/sec indexing/Indexing wiki 1.00 996.2±8.54s ? ?/sec 1.02 1021.1±30.63s ? ?/sec indexing/Indexing wiki in three batches 1.00 1136.8±9.72s ? ?/sec 1.00 1138.6±6.59s ? ?/sec ``` So basically everything slowed down a liiiiiittle bit except the dataset with a nested field which got twice faster Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-26 11:50:33 +00:00
Tamo	f19d2dc548	Only flatten the required fields apply review comments Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-04-26 12:33:46 +02:00
Clémentine Urquizar	d138b3c704	Update version	2022-04-25 18:43:46 +02:00
Tamo	fa6f495662	fix the indexing fuzzer	2022-04-25 18:32:06 +02:00
bors[bot]	8010eca9c7	Merge #505 505: normalize exact words r=curquiza a=MarinPostma Normalize the exact words, as specified in the specification. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-25 09:35:32 +00:00
ad hoc	2e0089d5ff	normalize exact words	2022-04-21 15:38:40 +02:00
ad hoc	3a2451fcba	add test normalize exact words	2022-04-21 13:52:09 +02:00
Clément Renault	eb5830aa40	Add a test to make sure that long words are handled	2022-04-21 13:45:28 +02:00
ad hoc	8b14090927	fix min-word-len-for-typo not reset properly	2022-04-19 15:20:16 +02:00
bors[bot]	ea4bb9402f	Merge #483 483: Enhance matching words r=Kerollmops a=ManyTheFish # Summary Enhance milli word-matcher making it handle match computing and cropping. # Implementation ## Computing best matches for cropping Before, we were considering that the first match of the attribute was the best one; this was accurate when only one word was searched but was missing the target when more than one word was searched. Now we are searching for the best matches interval to crop around, the chosen interval is the one: 1) that has the highest count of unique matches > for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better. 2) that has the minimum distance between matches > for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better. 3) that has the highest count of ordered matches > for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words whereas the interval `split the world` has 3. So the interval `split the world` is considered better. ## Cropping around the best matches interval Before, we were cropping around the interval without checking the context. Now we are cropping around words in the same context as matching words. This means that we prefer to keep words that are farther from the matching words but in the same phrase over words that are nearer but separated by a dot. > For instance, for the matching word `Split` the text: `Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.` will be cropped like: `…. Split The World is a book written by Emily Henry. …` and not like: `Natalie risk her future. Split The World is a book …` Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-04-19 11:42:32 +00:00
ManyTheFish	f1115e274f	Use Copy impl of FormatOption instead of cloning	2022-04-19 10:35:50 +02:00
Clémentine Urquizar	8d630a6f62	Update version for the next release (v0.26.1)	2022-04-14 11:44:06 +02:00
Tamo	00f78d6b5a	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-04-14 11:14:08 +02:00
Tamo	399fba16bb	only flatten an object if it's nested	2022-04-14 11:14:08 +02:00
Tamo	ee64f4a936	Use smartstring to store the external id in our hashmap We need to store all the external ids (primary key) in a hashmap associated to their internal id during. The smartstring removes heap allocation / memory usage and should improve the cache locality.	2022-04-13 21:22:07 +02:00
ad hoc	dda28d7415	exclude excluded candidates from search result candidates	2022-04-13 12:10:35 +02:00
ad hoc	cd83014fff	add test for distinct nb hits	2022-04-13 12:10:35 +02:00
ad hoc	bbb6728d2f	add distinct attributes to cli	2022-04-13 12:10:35 +02:00
ManyTheFish	5809d3ae0d	Add first benchmarks on formatting	2022-04-12 16:31:58 +02:00
ManyTheFish	827cedcd15	Add format option structure	2022-04-12 13:42:14 +02:00
ManyTheFish	011f8210ed	Make compute_matches more rust idiomatic	2022-04-12 10:19:02 +02:00
ManyTheFish	a16de5de84	Simplify format and remove intermediate function	2022-04-08 11:20:41 +02:00
ManyTheFish	a769e09dfa	Make token_crop_bounds more rust idiomatic	2022-04-07 20:15:14 +02:00
bors[bot]	9ac2fd1c37	Merge #487 487: Update version (v0.26.0) r=Kerollmops a=curquiza breaking because of #458 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-07 17:10:24 +00:00
Tamo	bab898ce86	move the flatten-serde-json crate inside of milli	2022-04-07 18:20:44 +02:00
ManyTheFish	c8ed1675a7	Add some documentation	2022-04-07 17:32:13 +02:00
ManyTheFish	b1905dfa24	Make split_best_frequency returns references instead of owned data	2022-04-07 17:05:44 +02:00
Tamo	ab458d8840	fix tests after rebase	2022-04-07 17:00:00 +02:00
Irevoire	4f3ce6d9cd	nested fields	2022-04-07 16:58:46 +02:00
Clémentine Urquizar	ee1d627803	Update version (v0.26.0)	2022-04-07 15:56:10 +02:00
bors[bot]	4ae7aea3b2	Merge #486 486: Update version (v0.25.0) r=curquiza a=curquiza v0.25.0 will be released once #478 is merged Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-06 11:40:41 +00:00
ad hoc	b799f3326b	rename merge_nothing to merge_ignore_values	2022-04-05 18:44:35 +02:00
ManyTheFish	fa7d3a37c0	Make some cleaning and add comments	2022-04-05 17:48:56 +02:00
ManyTheFish	3bb1e35ada	Fix match count	2022-04-05 17:48:45 +02:00
ManyTheFish	56e0edd621	Put crop markers directly around words	2022-04-05 17:41:32 +02:00
ManyTheFish	a93cd8c61c	Fix prefix highlight with special chars	2022-04-05 17:41:32 +02:00
ManyTheFish	b3f0f39106	Make some cleaning	2022-04-05 17:41:32 +02:00
ManyTheFish	6dc345bc53	Test and Fix prefix highlight	2022-04-05 17:41:32 +02:00
ManyTheFish	bd30ee97b8	Keep separators at start of the cropped string	2022-04-05 17:41:32 +02:00
ManyTheFish	29c5f76d7f	Use new matcher in http-ui	2022-04-05 17:41:32 +02:00
ManyTheFish	734d0899d3	Publish Matcher	2022-04-05 17:41:32 +02:00
ManyTheFish	4428cb5909	Add some tests and fix some corner cases	2022-04-05 17:41:32 +02:00
ManyTheFish	844f546a8b	Add matches algorithm V1	2022-04-05 17:41:32 +02:00
ManyTheFish	3be1790803	Add crop algorithm with naive match algorithm	2022-04-05 17:41:32 +02:00
ManyTheFish	d96e72e5dc	Create formatter with some tests	2022-04-05 17:41:32 +02:00
ad hoc	201fea0fda	limit extract_word_docids memory usage	2022-04-05 14:14:15 +02:00
ad hoc	5cfd3d8407	add exact attributes documentation	2022-04-05 14:10:22 +02:00
Clémentine Urquizar	9eec44dd98	Update version (v0.25.0)	2022-04-05 12:06:42 +02:00
ad hoc	b85cd4983e	remove field_id_from_position	2022-04-05 09:50:34 +02:00
ad hoc	ab185a59b5	fix infos	2022-04-05 09:46:56 +02:00
ad hoc	59e41d98e3	add comments to integration test	2022-04-04 21:17:06 +02:00
ad hoc	1810927dbd	rephrase exact_attributes doc	2022-04-04 21:04:49 +02:00
ad hoc	b7694c34f5	remove println	2022-04-04 21:00:07 +02:00
ad hoc	6cabd47c32	fix typo in comment	2022-04-04 20:59:20 +02:00
ad hoc	c8d3a09af8	add integration test for disable typo on attributes	2022-04-04 20:54:03 +02:00
ad hoc	6b2c2509b2	fix bug in exact search	2022-04-04 20:54:03 +02:00
ad hoc	56b4f5dce2	add exact prefix to query_docids	2022-04-04 20:54:03 +02:00
ad hoc	21ae4143b1	add exact_word_prefix to Context	2022-04-04 20:54:03 +02:00
ad hoc	e8f06f6c06	extract exact_word_prefix_docids	2022-04-04 20:54:03 +02:00
ad hoc	6dd2e4ffbd	introduce exact_word_prefix database in index	2022-04-04 20:54:03 +02:00
ad hoc	ba0bb29cd8	refactor WordPrefixDocids to take dbs instead of indexes	2022-04-04 20:54:02 +02:00
ad hoc	c4c6e35352	query exact_word_docids in resolve_query_tree	2022-04-04 20:54:02 +02:00
ad hoc	8d46a5b0b5	extract exact word docids	2022-04-04 20:54:02 +02:00
ad hoc	5451c64d5d	increase criteria asc desc test map size	2022-04-04 20:54:02 +02:00
ad hoc	0a77be4ec0	introduce exact_word_docids db	2022-04-04 20:54:02 +02:00
ad hoc	5f9f82757d	refactor spawn_extraction_task	2022-04-04 20:54:02 +02:00
ad hoc	f82d4b36eb	introduce exact attribute setting	2022-04-04 20:54:02 +02:00
ad hoc	c882d8daf0	add test for exact words	2022-04-04 20:54:01 +02:00
ad hoc	7e9d56a9e7	disable typos on exact words	2022-04-04 20:54:01 +02:00
ad hoc	3e67d8818c	fix typo in test comment	2022-04-04 20:34:23 +02:00
ad hoc	284d8a24e0	add integration test for disabled typo on word	2022-04-04 20:15:51 +02:00
ad hoc	30a2711bac	rename serde module to serde_impl module needed because of issues with rustfmt	2022-04-04 20:10:55 +02:00
ad hoc	0fd55db21c	fmt	2022-04-04 20:10:55 +02:00
ad hoc	559e46be5e	fix bad rebase bug	2022-04-04 20:10:55 +02:00
ad hoc	8b1e5d9c6d	add test for exact words	2022-04-04 20:10:55 +02:00
ad hoc	774fa8f065	disable typos on exact words	2022-04-04 20:10:55 +02:00
ad hoc	9bbffb8fee	add exact words setting	2022-04-04 20:10:54 +02:00
ad hoc	853b4a520f	fmt	2022-04-04 10:41:46 +02:00
ad hoc	2cb71dff4a	add typo integration tests	2022-04-04 10:41:46 +02:00
ad hoc	1941072bb2	implement Copy on Setting	2022-04-04 10:41:46 +02:00
ad hoc	fdaf45aab2	replace hardcoded value with constant in TestContext	2022-04-04 10:41:46 +02:00
ad hoc	950a740bd4	refactor typos for readability	2022-04-04 10:41:46 +02:00
ad hoc	66020cd923	rename min_word_len* to use plain letter numbers	2022-04-04 10:41:46 +02:00
ad hoc	4c4b336ecb	rename min word len for typo error	2022-04-01 11:17:03 +02:00
ad hoc	286dd7b2e4	rename min_word_len_2_typo	2022-04-01 11:17:03 +02:00
ad hoc	55af85db3c	add tests for min_word_len_for_typo	2022-04-01 11:17:02 +02:00
ad hoc	9102de5500	fix error message	2022-04-01 11:17:02 +02:00
ad hoc	a1a3a49bc9	dynamic minimum word len for typos in query tree builder	2022-04-01 11:17:02 +02:00
ad hoc	5a24e60572	introduce word len for typo setting	2022-04-01 11:17:02 +02:00
ad hoc	9fe40df960	add word derivations tests	2022-04-01 11:05:18 +02:00
ad hoc	d5ddc6b080	fix 2 typos word derivation bug	2022-04-01 10:51:22 +02:00
ad hoc	3e34981d9b	add test for authorize_typos in update	2022-03-31 14:12:00 +02:00
ad hoc	6ef3bb9d83	fmt	2022-03-31 14:06:23 +02:00
ad hoc	f782fe2062	add authorize_typo_test	2022-03-31 10:08:39 +02:00
ad hoc	c4653347fd	add authorize typo setting	2022-03-31 10:05:44 +02:00
Clémentine Urquizar	ddf78a735b	Update version (v0.24.1)	2022-03-24 16:39:45 +01:00
Irevoire	86dd88698d	bump tokenizer	2022-03-23 14:25:58 +01:00
Irevoire	5dc464b9a7	rollback meilisearch-tokenizer version	2022-03-21 17:29:10 +01:00
bors[bot]	90276d9a2d	Merge #472 472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish I was going through the plane sweep algorithm to find some inspiration, and I discovered that we have useless variables that were not detected because of the recursive function. Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-03-16 15:33:11 +00:00
ManyTheFish	49d59d88c2	Remove useless variables in proximity	2022-03-16 16:12:52 +01:00
Bruno Casali	adc71742c8	Move string concat to the struct instead of in the calling	2022-03-16 10:26:12 -03:00
Bruno Casali	4822fe1beb	Add a better error message when the filterable attrs are empty Fixes https://github.com/meilisearch/meilisearch/issues/2140	2022-03-15 18:13:59 -03:00
bors[bot]	f04ab67083	Merge #466 466: Bump version to 0.23.1 r=curquiza a=Kerollmops This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-03-15 17:19:05 +00:00
bors[bot]	ad4c982c68	Merge #439 439: Optimize typo criterion r=Kerollmops a=MarinPostma This pr implements a couple of optimization for the typo criterion: - clamp max typo on concatenated query words to 1: By considering that a concatenated query word is a typo, we clamp the max number of typos allowed o it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2 typo words. - Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word. ## benches ``` group main typo ----- ---- ---- smol-songs.csv: asc + default/Notstandskomitee 2.51 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec smol-songs.csv: asc + default/charles 2.48 3.0±0.01ms ? ?/sec 1.00 1190.9±1.29µs ? ?/sec smol-songs.csv: asc + default/charles mingus 5.56 10.8±0.01ms ? ?/sec 1.00 1935.3±1.00µs ? ?/sec smol-songs.csv: asc + default/david 1.65 3.9±0.00ms ? ?/sec 1.00 2.4±0.01ms ? ?/sec smol-songs.csv: asc + default/david bowie 3.34 12.5±0.02ms ? ?/sec 1.00 3.7±0.00ms ? ?/sec smol-songs.csv: asc + default/john 1.00 1849.7±3.74µs ? ?/sec 1.01 1875.1±4.65µs ? ?/sec smol-songs.csv: asc + default/marcus miller 4.32 15.7±0.01ms ? ?/sec 1.00 3.6±0.01ms ? ?/sec smol-songs.csv: asc + default/michael jackson 3.31 12.5±0.01ms ? ?/sec 1.00 3.8±0.00ms ? ?/sec smol-songs.csv: asc + default/tamo 1.05 565.4±0.86µs ? ?/sec 1.00 539.3±1.22µs ? ?/sec smol-songs.csv: asc + default/thelonious monk 3.49 11.5±0.01ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec smol-songs.csv: asc/Notstandskomitee 2.59 5.6±0.02ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec smol-songs.csv: asc/charles 6.05 2.1±0.00ms ? ?/sec 1.00 347.8±0.60µs ? 
?/sec smol-songs.csv: asc/charles mingus 14.46 9.4±0.01ms ? ?/sec 1.00 649.2±0.97µs ? ?/sec smol-songs.csv: asc/david 3.87 2.4±0.00ms ? ?/sec 1.00 618.2±0.69µs ? ?/sec smol-songs.csv: asc/david bowie 10.14 9.8±0.01ms ? ?/sec 1.00 970.8±1.55µs ? ?/sec smol-songs.csv: asc/john 1.00 546.5±1.10µs ? ?/sec 1.00 547.1±2.11µs ? ?/sec smol-songs.csv: asc/marcus miller 11.45 10.4±0.06ms ? ?/sec 1.00 907.9±1.37µs ? ?/sec smol-songs.csv: asc/michael jackson 10.56 9.7±0.01ms ? ?/sec 1.00 919.6±1.03µs ? ?/sec smol-songs.csv: asc/tamo 1.03 43.3±0.18µs ? ?/sec 1.00 42.2±0.23µs ? ?/sec smol-songs.csv: asc/thelonious monk 4.16 10.7±0.02ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 95.7±0.20µs ? ?/sec 1.15 109.6±10.40µs ? ?/sec smol-songs.csv: basic filter: <=/charles 1.00 27.8±0.15µs ? ?/sec 1.01 27.9±0.18µs ? ?/sec smol-songs.csv: basic filter: <=/charles mingus 1.72 119.2±0.67µs ? ?/sec 1.00 69.1±0.13µs ? ?/sec smol-songs.csv: basic filter: <=/david 1.00 22.3±0.33µs ? ?/sec 1.05 23.4±0.19µs ? ?/sec smol-songs.csv: basic filter: <=/david bowie 1.59 86.9±0.79µs ? ?/sec 1.00 54.5±0.31µs ? ?/sec smol-songs.csv: basic filter: <=/john 1.00 17.9±0.06µs ? ?/sec 1.06 18.9±0.15µs ? ?/sec smol-songs.csv: basic filter: <=/marcus miller 1.65 102.7±1.63µs ? ?/sec 1.00 62.3±0.18µs ? ?/sec smol-songs.csv: basic filter: <=/michael jackson 1.76 128.2±1.85µs ? ?/sec 1.00 72.9±0.19µs ? ?/sec smol-songs.csv: basic filter: <=/tamo 1.00 17.9±0.13µs ? ?/sec 1.05 18.7±0.20µs ? ?/sec smol-songs.csv: basic filter: <=/thelonious monk 1.53 157.5±2.38µs ? ?/sec 1.00 102.8±0.88µs ? ?/sec smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 100.9±4.36µs ? ?/sec 1.04 105.0±8.25µs ? ?/sec smol-songs.csv: basic filter: TO/charles 1.00 28.4±0.36µs ? ?/sec 1.03 29.4±0.33µs ? ?/sec smol-songs.csv: basic filter: TO/charles mingus 1.71 118.1±1.08µs ? ?/sec 1.00 68.9±0.26µs ? ?/sec smol-songs.csv: basic filter: TO/david 1.00 24.0±0.26µs ? ?/sec 1.03 24.6±0.43µs ? 
?/sec smol-songs.csv: basic filter: TO/david bowie 1.72 95.2±0.30µs ? ?/sec 1.00 55.2±0.14µs ? ?/sec smol-songs.csv: basic filter: TO/john 1.00 18.8±0.09µs ? ?/sec 1.06 19.8±0.17µs ? ?/sec smol-songs.csv: basic filter: TO/marcus miller 1.61 102.4±1.65µs ? ?/sec 1.00 63.4±0.24µs ? ?/sec smol-songs.csv: basic filter: TO/michael jackson 1.77 132.1±1.41µs ? ?/sec 1.00 74.5±0.59µs ? ?/sec smol-songs.csv: basic filter: TO/tamo 1.00 18.2±0.14µs ? ?/sec 1.05 19.2±0.46µs ? ?/sec smol-songs.csv: basic filter: TO/thelonious monk 1.49 150.8±1.92µs ? ?/sec 1.00 101.3±0.44µs ? ?/sec smol-songs.csv: basic placeholder/ 1.00 27.3±0.07µs ? ?/sec 1.03 28.0±0.05µs ? ?/sec smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 122.4±0.17µs ? ?/sec 1.03 125.6±0.16µs ? ?/sec smol-songs.csv: basic with quote/"charles" 1.00 88.8±0.30µs ? ?/sec 1.00 88.4±0.15µs ? ?/sec smol-songs.csv: basic with quote/"charles" "mingus" 1.00 685.2±0.74µs ? ?/sec 1.01 689.4±6.07µs ? ?/sec smol-songs.csv: basic with quote/"david" 1.00 161.6±0.42µs ? ?/sec 1.01 162.6±0.17µs ? ?/sec smol-songs.csv: basic with quote/"david" "bowie" 1.00 731.7±0.73µs ? ?/sec 1.02 743.1±0.77µs ? ?/sec smol-songs.csv: basic with quote/"john" 1.00 267.1±0.33µs ? ?/sec 1.01 270.9±0.33µs ? ?/sec smol-songs.csv: basic with quote/"marcus" "miller" 1.00 138.7±0.31µs ? ?/sec 1.02 140.9±0.13µs ? ?/sec smol-songs.csv: basic with quote/"michael" "jackson" 1.01 841.4±0.72µs ? ?/sec 1.00 833.8±0.92µs ? ?/sec smol-songs.csv: basic with quote/"tamo" 1.01 189.2±0.26µs ? ?/sec 1.00 188.2±0.71µs ? ?/sec smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1100.5±1.36µs ? ?/sec 1.01 1111.7±2.17µs ? ?/sec smol-songs.csv: basic without quote/Notstandskomitee 3.40 7.9±0.02ms ? ?/sec 1.00 2.3±0.02ms ? ?/sec smol-songs.csv: basic without quote/charles 2.57 494.4±0.89µs ? ?/sec 1.00 192.5±0.18µs ? ?/sec smol-songs.csv: basic without quote/charles mingus 1.29 2.8±0.02ms ? ?/sec 1.00 2.1±0.01ms ? 
?/sec smol-songs.csv: basic without quote/david 1.95 623.8±0.90µs ? ?/sec 1.00 319.2±1.22µs ? ?/sec smol-songs.csv: basic without quote/david bowie 1.12 5.9±0.00ms ? ?/sec 1.00 5.2±0.00ms ? ?/sec smol-songs.csv: basic without quote/john 1.24 1340.9±2.25µs ? ?/sec 1.00 1084.7±7.76µs ? ?/sec smol-songs.csv: basic without quote/marcus miller 7.97 14.6±0.01ms ? ?/sec 1.00 1826.0±6.84µs ? ?/sec smol-songs.csv: basic without quote/michael jackson 1.19 3.9±0.00ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec smol-songs.csv: basic without quote/tamo 1.65 737.7±3.58µs ? ?/sec 1.00 446.7±0.51µs ? ?/sec smol-songs.csv: basic without quote/thelonious monk 1.16 4.5±0.02ms ? ?/sec 1.00 3.9±0.04ms ? ?/sec smol-songs.csv: big filter/Notstandskomitee 3.27 7.6±0.02ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec smol-songs.csv: big filter/charles 8.26 1957.5±1.37µs ? ?/sec 1.00 236.8±0.34µs ? ?/sec smol-songs.csv: big filter/charles mingus 18.49 11.2±0.06ms ? ?/sec 1.00 607.7±3.03µs ? ?/sec smol-songs.csv: big filter/david 3.78 2.4±0.00ms ? ?/sec 1.00 622.8±0.80µs ? ?/sec smol-songs.csv: big filter/david bowie 9.00 12.0±0.01ms ? ?/sec 1.00 1336.0±3.17µs ? ?/sec smol-songs.csv: big filter/john 1.00 554.2±0.95µs ? ?/sec 1.01 560.4±0.79µs ? ?/sec smol-songs.csv: big filter/marcus miller 18.09 12.0±0.01ms ? ?/sec 1.00 664.7±0.60µs ? ?/sec smol-songs.csv: big filter/michael jackson 8.43 12.0±0.01ms ? ?/sec 1.00 1421.6±1.37µs ? ?/sec smol-songs.csv: big filter/tamo 1.00 86.3±0.14µs ? ?/sec 1.01 87.3±0.21µs ? ?/sec smol-songs.csv: big filter/thelonious monk 5.55 14.3±0.02ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec smol-songs.csv: desc + default/Notstandskomitee 2.52 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec smol-songs.csv: desc + default/charles 3.04 2.7±0.01ms ? ?/sec 1.00 893.4±1.08µs ? ?/sec smol-songs.csv: desc + default/charles mingus 6.77 10.3±0.01ms ? ?/sec 1.00 1520.8±1.90µs ? ?/sec smol-songs.csv: desc + default/david 1.39 5.7±0.00ms ? ?/sec 1.00 4.1±0.00ms ? 
?/sec smol-songs.csv: desc + default/david bowie 2.34 15.8±0.02ms ? ?/sec 1.00 6.7±0.01ms ? ?/sec smol-songs.csv: desc + default/john 1.00 2.5±0.00ms ? ?/sec 1.02 2.6±0.01ms ? ?/sec smol-songs.csv: desc + default/marcus miller 5.06 14.5±0.02ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec smol-songs.csv: desc + default/michael jackson 2.64 14.1±0.05ms ? ?/sec 1.00 5.4±0.00ms ? ?/sec smol-songs.csv: desc + default/tamo 1.00 567.0±0.65µs ? ?/sec 1.00 565.7±0.97µs ? ?/sec smol-songs.csv: desc + default/thelonious monk 3.55 11.6±0.02ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec smol-songs.csv: desc/Notstandskomitee 2.58 5.6±0.02ms ? ?/sec 1.00 2.2±0.02ms ? ?/sec smol-songs.csv: desc/charles 6.04 2.1±0.00ms ? ?/sec 1.00 348.1±0.57µs ? ?/sec smol-songs.csv: desc/charles mingus 14.51 9.4±0.01ms ? ?/sec 1.00 646.7±0.99µs ? ?/sec smol-songs.csv: desc/david 3.86 2.4±0.00ms ? ?/sec 1.00 620.7±2.46µs ? ?/sec smol-songs.csv: desc/david bowie 10.10 9.8±0.01ms ? ?/sec 1.00 973.9±3.31µs ? ?/sec smol-songs.csv: desc/john 1.00 545.5±0.78µs ? ?/sec 1.00 547.2±0.48µs ? ?/sec smol-songs.csv: desc/marcus miller 11.39 10.3±0.01ms ? ?/sec 1.00 903.7±0.95µs ? ?/sec smol-songs.csv: desc/michael jackson 10.51 9.7±0.01ms ? ?/sec 1.00 924.7±2.02µs ? ?/sec smol-songs.csv: desc/tamo 1.01 43.2±0.33µs ? ?/sec 1.00 42.6±0.35µs ? ?/sec smol-songs.csv: desc/thelonious monk 4.19 10.8±0.03ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec smol-songs.csv: prefix search/a 1.00 1008.7±1.00µs ? ?/sec 1.00 1005.5±0.91µs ? ?/sec smol-songs.csv: prefix search/b 1.00 885.0±0.70µs ? ?/sec 1.01 890.6±1.11µs ? ?/sec smol-songs.csv: prefix search/i 1.00 1051.8±1.25µs ? ?/sec 1.00 1056.6±4.12µs ? ?/sec smol-songs.csv: prefix search/s 1.00 724.7±1.77µs ? ?/sec 1.00 721.6±0.59µs ? ?/sec smol-songs.csv: prefix search/x 1.01 212.4±0.21µs ? ?/sec 1.00 210.9±0.38µs ? ?/sec smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 18.55 48.5±0.09ms ? ?/sec 1.00 2.6±0.03ms ? ?/sec smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 8.41 56.7±0.45ms ? 
?/sec 1.00 6.7±0.05ms ? ?/sec smol-songs.csv: proximity/Under Great Northern Lights 15.74 38.9±0.14ms ? ?/sec 1.00 2.5±0.00ms ? ?/sec smol-songs.csv: proximity/black saint sinner lady 11.82 40.1±0.13ms ? ?/sec 1.00 3.4±0.02ms ? ?/sec smol-songs.csv: proximity/les dangeureuses 1960 6.90 26.1±0.13ms ? ?/sec 1.00 3.8±0.04ms ? ?/sec smol-songs.csv: typo/Arethla Franklin 14.93 5.8±0.01ms ? ?/sec 1.00 390.1±1.89µs ? ?/sec smol-songs.csv: typo/Disnaylande 3.18 7.3±0.01ms ? ?/sec 1.00 2.3±0.00ms ? ?/sec smol-songs.csv: typo/dire straights 5.55 15.2±0.02ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec smol-songs.csv: typo/fear of the duck 28.03 20.0±0.03ms ? ?/sec 1.00 713.3±1.54µs ? ?/sec smol-songs.csv: typo/indochie 19.25 1851.4±2.38µs ? ?/sec 1.00 96.2±0.13µs ? ?/sec smol-songs.csv: typo/indochien 14.66 1887.7±3.18µs ? ?/sec 1.00 128.8±0.18µs ? ?/sec smol-songs.csv: typo/klub des loopers 37.73 18.0±0.02ms ? ?/sec 1.00 476.7±0.73µs ? ?/sec smol-songs.csv: typo/michel depech 10.17 5.8±0.01ms ? ?/sec 1.00 565.8±1.16µs ? ?/sec smol-songs.csv: typo/mongus 15.33 1897.4±3.44µs ? ?/sec 1.00 123.8±0.13µs ? ?/sec smol-songs.csv: typo/stromal 14.63 1859.3±2.40µs ? ?/sec 1.00 127.1±0.29µs ? ?/sec smol-songs.csv: typo/the white striper 10.83 9.4±0.01ms ? ?/sec 1.00 866.0±0.98µs ? ?/sec smol-songs.csv: typo/thelonius monk 14.40 3.8±0.00ms ? ?/sec 1.00 261.5±1.30µs ? ?/sec smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 5.54 70.8±0.09ms ? ?/sec 1.00 12.8±0.03ms ? ?/sec smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 3.48 119.8±0.14ms ? ?/sec 1.00 34.4±0.04ms ? ?/sec smol-songs.csv: words/The Disneyland Children's Sing-Alone song 8.98 71.9±0.12ms ? ?/sec 1.00 8.0±0.01ms ? ?/sec smol-songs.csv: words/les liaisons dangeureuses 1793 11.88 37.4±0.07ms ? ?/sec 1.00 3.1±0.01ms ? ?/sec smol-songs.csv: words/seven nation mummy 22.86 23.4±0.04ms ? ?/sec 1.00 1024.8±1.57µs ? 
?/sec smol-songs.csv: words/the black saint and the sinner lady and the good doggo 2.76 124.4±0.15ms ? ?/sec 1.00 45.1±0.09ms ? ?/sec smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 2.52 107.0±0.23ms ? ?/sec 1.00 42.4±0.66ms ? ?/sec group main-wiki typo-wiki ----- --------- --------- smol-wiki-articles.csv: basic placeholder/ 1.02 13.7±0.02µs ? ?/sec 1.00 13.4±0.03µs ? ?/sec smol-wiki-articles.csv: basic with quote/"film" 1.02 409.8±0.67µs ? ?/sec 1.00 402.6±0.48µs ? ?/sec smol-wiki-articles.csv: basic with quote/"france" 1.00 325.9±0.91µs ? ?/sec 1.00 326.4±0.49µs ? ?/sec smol-wiki-articles.csv: basic with quote/"japan" 1.00 218.4±0.26µs ? ?/sec 1.01 220.5±0.20µs ? ?/sec smol-wiki-articles.csv: basic with quote/"machine" 1.00 143.0±0.12µs ? ?/sec 1.04 148.8±0.21µs ? ?/sec smol-wiki-articles.csv: basic with quote/"miles" "davis" 1.00 11.7±0.06ms ? ?/sec 1.00 11.8±0.01ms ? ?/sec smol-wiki-articles.csv: basic with quote/"mingus" 1.00 4.4±0.03ms ? ?/sec 1.00 4.4±0.00ms ? ?/sec smol-wiki-articles.csv: basic with quote/"rock" "and" "roll" 1.00 43.5±0.08ms ? ?/sec 1.01 43.8±0.06ms ? ?/sec smol-wiki-articles.csv: basic with quote/"spain" 1.00 137.3±0.35µs ? ?/sec 1.05 144.4±0.23µs ? ?/sec smol-wiki-articles.csv: basic without quote/film 1.00 125.3±0.30µs ? ?/sec 1.06 133.1±0.37µs ? ?/sec smol-wiki-articles.csv: basic without quote/france 1.21 1782.6±1.65µs ? ?/sec 1.00 1477.0±1.39µs ? ?/sec smol-wiki-articles.csv: basic without quote/japan 1.28 1363.9±0.80µs ? ?/sec 1.00 1064.3±1.79µs ? ?/sec smol-wiki-articles.csv: basic without quote/machine 1.73 760.3±0.81µs ? ?/sec 1.00 439.6±0.75µs ? ?/sec smol-wiki-articles.csv: basic without quote/miles davis 1.03 17.0±0.03ms ? ?/sec 1.00 16.5±0.02ms ? ?/sec smol-wiki-articles.csv: basic without quote/mingus 1.07 5.3±0.01ms ? ?/sec 1.00 5.0±0.00ms ? ?/sec smol-wiki-articles.csv: basic without quote/rock and roll 1.01 63.9±0.18ms ? ?/sec 1.00 63.0±0.07ms ? 
?/sec smol-wiki-articles.csv: basic without quote/spain 2.07 667.4±0.93µs ? ?/sec 1.00 322.8±0.29µs ? ?/sec smol-wiki-articles.csv: prefix search/c 1.00 343.1±0.47µs ? ?/sec 1.00 344.0±0.34µs ? ?/sec smol-wiki-articles.csv: prefix search/g 1.00 374.4±3.42µs ? ?/sec 1.00 374.1±0.44µs ? ?/sec smol-wiki-articles.csv: prefix search/j 1.00 359.9±0.31µs ? ?/sec 1.00 361.2±0.79µs ? ?/sec smol-wiki-articles.csv: prefix search/q 1.01 102.0±0.12µs ? ?/sec 1.00 101.4±0.32µs ? ?/sec smol-wiki-articles.csv: prefix search/t 1.00 536.7±1.39µs ? ?/sec 1.00 534.3±0.84µs ? ?/sec smol-wiki-articles.csv: prefix search/x 1.00 400.9±1.00µs ? ?/sec 1.00 399.5±0.45µs ? ?/sec smol-wiki-articles.csv: proximity/april paris 3.86 14.4±0.01ms ? ?/sec 1.00 3.7±0.01ms ? ?/sec smol-wiki-articles.csv: proximity/diesel engine 12.98 10.4±0.01ms ? ?/sec 1.00 803.5±1.13µs ? ?/sec smol-wiki-articles.csv: proximity/herald sings 1.00 12.7±0.06ms ? ?/sec 5.29 67.1±0.09ms ? ?/sec smol-wiki-articles.csv: proximity/tea two 6.48 1452.1±2.78µs ? ?/sec 1.00 224.1±0.38µs ? ?/sec smol-wiki-articles.csv: typo/Disnaylande 3.89 8.5±0.01ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec smol-wiki-articles.csv: typo/aritmetric 3.78 10.3±0.01ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec smol-wiki-articles.csv: typo/linax 8.91 1426.7±0.97µs ? ?/sec 1.00 160.1±0.18µs ? ?/sec smol-wiki-articles.csv: typo/migrosoft 7.48 1417.3±5.84µs ? ?/sec 1.00 189.5±0.88µs ? ?/sec smol-wiki-articles.csv: typo/nympalidea 3.96 7.2±0.01ms ? ?/sec 1.00 1810.1±2.03µs ? ?/sec smol-wiki-articles.csv: typo/phytogropher 3.71 7.2±0.01ms ? ?/sec 1.00 1934.3±6.51µs ? ?/sec smol-wiki-articles.csv: typo/sisan 6.44 1497.2±1.38µs ? ?/sec 1.00 232.7±0.94µs ? ?/sec smol-wiki-articles.csv: typo/the fronce 6.92 2.9±0.00ms ? ?/sec 1.00 418.0±1.76µs ? ?/sec smol-wiki-articles.csv: words/Abraham machin 16.63 10.8±0.01ms ? ?/sec 1.00 649.7±1.08µs ? ?/sec smol-wiki-articles.csv: words/Idaho Bellevue pizza 27.15 25.6±0.03ms ? ?/sec 1.00 944.2±5.07µs ? 
?/sec smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk 26.87 40.7±0.05ms ? ?/sec 1.00 1515.3±2.73µs ? ?/sec smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli 11.99 48.8±0.10ms ? ?/sec 1.00 4.1±0.02ms ? ?/sec smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo 4.90 110.0±0.15ms ? ?/sec 1.00 22.4±0.03ms ? ?/sec ``` Co-authored-by: mpostma <postma.marin@protonmail.com> Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-15 16:43:36 +00:00
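The acceptance rule described in #439, (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo), can be illustrated with a brute-force check. This is only a sketch: it uses a plain Levenshtein distance on two strings, whereas the PR itself works on FST automatons, and the function names `levenshtein` and `accepted_with_two_typos` are hypothetical, not part of milli.

```rust
/// Plain Levenshtein distance over Unicode scalar values (illustrative helper).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the current prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1) // deletion
                .min(cur[j] + 1); // insertion
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Hypothetical predicate: a typo on the first letter counts as two typos,
/// so a candidate with a wrong first letter is accepted only when its total
/// distance is at most 1, while a correct first letter allows up to 2.
fn accepted_with_two_typos(query: &str, candidate: &str) -> bool {
    let same_first_letter = query.chars().next() == candidate.chars().next();
    let distance = levenshtein(query, candidate);
    if same_first_letter {
        distance <= 2
    } else {
        distance <= 1
    }
}
```

Under this rule `mingus` still matches `mangos` (two typos, first letter intact) and `pingus` (one typo on the first letter), but no longer matches `pingas`.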
ad hoc	3f24555c3d	custom fst automatons	2022-03-15 17:38:35 +01:00
ad hoc	628c835a22	fix tests	2022-03-15 17:38:34 +01:00
bors[bot]	8efac33b53	Merge #467 467: optimize prefix database r=Kerollmops a=MarinPostma This pr introduces two optimizations that greatly improve the speed of computing prefix databases. - The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST. - We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-15 16:14:35 +00:00
ad hoc	d127c57f2d	review edits	2022-03-15 17:12:48 +01:00
ad hoc	d633ac5b9d	optimize word prefix pair	2022-03-15 16:37:22 +01:00
ad hoc	d68fe2b3c7	optimize word prefix fst	2022-03-15 16:36:48 +01:00
Kerollmops	08a06b49f0	Bump version to 0.23.1	2022-03-15 15:50:28 +01:00
Clément Renault	0c5f4ed7de	Apply suggestions Co-authored-by: Many <many@meilisearch.com>	2022-03-15 14:18:29 +01:00
Kerollmops	21ec334dcc	Fix the compilation error of the dependency versions	2022-03-15 11:17:45 +01:00
Kerollmops	63682c2c9a	Upgrade the dependencies	2022-03-15 11:17:44 +01:00
Kerollmops	288a879411	Remove three useless dependencies	2022-03-15 11:17:44 +01:00
psvnl sai kumar	5e08fac729	fixes for rustfmt pass	2022-03-14 19:22:41 +05:30
psvnl sai kumar	92e2e09434	exporting heed to avoid having different versions of Heed in Meilisearch	2022-03-14 01:01:58 +05:30
Kerollmops	1ae13c1374	Avoid iterating on big databases when useless	2022-03-09 15:43:54 +01:00
Bruno Casali	66c6d5e1ef	Add a new error message when the `valid_fields` is empty > "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes." > "Attribute `{}` is not sortable. Available sortable attributes are: `{}`." coexist in the error handling	2022-03-05 10:38:18 -03:00
Clémentine Urquizar	d9ed9de2b0	Update heed link in cargo toml	2022-03-01 19:45:29 +01:00
Kerollmops	d5b8b5a2f8	Replace the ugly unwraps by clean if let Somes	2022-02-28 16:31:33 +01:00
Kerollmops	8d26f3040c	Remove a useless grenad file merging	2022-02-28 16:31:33 +01:00
Clément Renault	04b1bbf932	Reintroduce appending sorted entries when possible	2022-02-24 14:50:45 +01:00
bors[bot]	25123af3b8	Merge #436 436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops This PR depends on the fixes done in #431 and must be merged after it. In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs far fewer CPU cycles, an era where this update structure can use the previously computed set of new word docids from the newly indexed batch of documents. --- The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked to. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix, e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of document ids that contain the pair of word and prefix at the given proximity. The performance issue that this struct brings comes from the fact that it starts its job from the beginning: it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-16 15:41:14 +00:00
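The data shape described in #436 can be sketched in memory as follows. This is an illustration under assumptions, not milli's code: the real structure reads and writes LMDB databases and uses `RoaringBitmap`s, while this sketch uses `BTreeMap` and `BTreeSet` stand-ins, and the function name `word_prefix_pair_proximity_docids` is hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Stand-in for a RoaringBitmap of document ids (assumption for this sketch).
type DocIds = BTreeSet<u32>;

/// From entries keyed by (word1, word2, proximity), derive entries keyed by
/// (word1, prefix, proximity): every document that contains `word1` at the
/// given proximity from a word starting with `prefix`.
fn word_prefix_pair_proximity_docids(
    word_pairs: &BTreeMap<(String, String, u8), DocIds>,
    prefixes: &[&str],
) -> BTreeMap<(String, String, u8), DocIds> {
    let mut out: BTreeMap<(String, String, u8), DocIds> = BTreeMap::new();
    for ((word1, word2, proximity), docids) in word_pairs {
        for prefix in prefixes {
            if word2.starts_with(*prefix) {
                // Merge the docids of every second word sharing this prefix.
                out.entry((word1.clone(), (*prefix).to_string(), *proximity))
                    .or_insert_with(DocIds::new)
                    .extend(docids.iter().copied());
            }
        }
    }
    out
}
```

Rebuilding this map from scratch on every update is the cost the PR sets out to avoid; restricting `word_pairs` to the pairs created by the newly indexed batch is one way to read the "subset of the database" commits listed below.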
Clément Renault	ff8d7a810d	Change the behavior of the as_cloneable_grenad by taking a ref	2022-02-16 15:40:08 +01:00
Clément Renault	f367cc2e75	Finally bump grenad to v0.4.1	2022-02-16 15:28:48 +01:00
Irevoire	0defeb268c	bump milli	2022-02-16 13:27:41 +01:00
Irevoire	48542ac8fd	get rid of chrono in favor of time	2022-02-15 11:41:55 +01:00
Clémentine Urquizar	d03b3ceb58	Update version for the next release (v0.22.1)	2022-02-07 18:39:29 +01:00
bors[bot]	5d58cb7449	Merge #442 442: fix phrase search r=curquiza a=MarinPostma Run the exact match search on 7-word windows instead of only two. This makes false positives very unlikely, and impossible on phrase queries shorter than seven words. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-07 16:18:20 +00:00
ad hoc	bd2262ceea	allow null values in csv	2022-02-03 16:03:01 +01:00
ad hoc	13de251047	rewrite word pair distance gathering	2022-02-03 15:57:20 +01:00
Many	d59bcea749	Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""	2022-02-02 17:01:13 +01:00
mpostma	7541ab99cd	review changes	2022-02-02 12:59:01 +01:00
mpostma	d0aabde502	optimize 2 typos case	2022-02-02 12:56:09 +01:00
mpostma	55e6cb9c7b	typos on first letter counts as 2	2022-02-02 12:56:09 +01:00
mpostma	642c01d0dc	set max typos on ngram to 1	2022-02-02 12:56:08 +01:00
ad hoc	d852dc0d2b	fix phrase search	2022-02-01 20:21:33 +01:00
Kerollmops	fb79c32430	Compute the new, common and, deleted prefix words fst once	2022-01-27 11:00:18 +01:00
Clément Renault	51d1e64b23	Remove, now useless, the WriteMethod enum	2022-01-27 10:08:35 +01:00
Clément Renault	e9c02173cf	Rework the WordsPrefixPositionDocids update to compute a subset of the database	2022-01-27 10:08:35 +01:00
Clément Renault	dbba5fd461	Create a function to simplify the word prefix pair proximity docids compute	2022-01-27 10:08:35 +01:00
Clément Renault	e760e02737	Fix the computation of the newly added and common prefix pair proximity words	2022-01-27 10:08:35 +01:00
Clément Renault	d59e559317	Fix the computation of the newly added and common prefix words	2022-01-27 10:08:34 +01:00
Clément Renault	2ec8542105	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	28692f65be	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	5404bc02dd	Move the fst_stream_into_hashset method in the helper methods	2022-01-27 10:06:00 +01:00
Clément Renault	c90fa95f93	Only compute the word prefix pairs on the created word pair proximities	2022-01-27 10:06:00 +01:00
Clément Renault	822f67e9ad	Bring the newly created word pair proximity docids	2022-01-27 10:06:00 +01:00
Clément Renault	d28f18658e	Retrieve the previous version of the words prefixes FST	2022-01-27 10:05:59 +01:00
bors[bot]	38d23546a5	Merge #431 431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method take more time, and we were trying to reduce the time spent in it. With `@ManyTheFish` we found out that we could skip some of the work we were doing by: - discarding the prefixes that were shorter than a specific threshold (default: 2). - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4). - removing the unused threshold that was specifying a minimum amount of word docids to merge. We will take more time to do some more optimization, like no longer clearing and recomputing the database from scratch: we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does. I keep this PR as a draft as I want to further test whether the real gain is sufficient and whether the change is valid. I advise reviewers to review commit by commit to see the changes bit by bit, as reviewing the whole PR can be hard. Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-27 07:04:56 +00:00
Clément Renault	f9b214f34e	Apply suggestions from code review Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2022-01-26 11:28:11 +01:00
bors[bot]	e1cc025cbd	Merge #440 440: fix(fuzzer): fix the fuzzer after #430 r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-25 16:33:57 +00:00
Clément Renault	f04cd19886	Introduce a max prefix length parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	1514dfa1b7	Introduce a max proximity parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	23ea3ad738	Remove the useless threshold when computing the word prefix pair proximity	2022-01-25 17:04:23 +01:00
Clément Renault	e3c34684c6	Fix a bug where we were skipping most of the prefix pairs	2022-01-25 17:04:23 +01:00
Tamo	fb51d511be	fix(fuzzer): fix the fuzzer after #430	2022-01-25 12:08:47 +01:00
bors[bot]	9f2ff71581	Merge #434 434: bump milli to v0.22.0 r=curquiza a=irevoire This is breaking because of this PR: `98a365aaae` Should we do a special branch to only release the [patch](https://github.com/meilisearch/milli/pull/433) for https://github.com/meilisearch/MeiliSearch/issues/2082 (which is non-breaking)? Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-24 17:31:20 +00:00
bors[bot]	fd177b63f8	Merge #423 423: Remove an unused file r=irevoire a=irevoire This empty file is not included anywhere Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-19 14:18:05 +00:00
Marin Postma	0c84a40298	document batch support reusable transform rework update api add indexer config fix tests review changes Co-authored-by: Clément Renault <clement@meilisearch.com> fmt	2022-01-19 12:40:20 +01:00
Tamo	01968d7ca7	ensure we get no documents and no error when filtering on an empty db	2022-01-18 11:40:30 +01:00
Tamo	367f403693	bump milli	2022-01-17 16:41:34 +01:00
bors[bot]	8f4499090b	Merge #433 433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there are currently zero documents containing this field, thus we return an empty RoaringBitmap instead of throwing an internal error Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 14:06:53 +00:00
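The two fixes in #433 amount to a case-sensitive lookup that treats an unknown field as "no documents". A minimal sketch, assuming in-memory maps: `HashMap` and `BTreeSet` stand in for milli's field id map and `RoaringBitmap`, and `docids_for_field` is a hypothetical name.

```rust
use std::collections::{BTreeSet, HashMap};

type FieldId = u16;
// Stand-in for a RoaringBitmap of document ids (assumption for this sketch).
type DocIds = BTreeSet<u32>;

/// Look the field up exactly as written (no lowercasing). An unknown field
/// means no document contains it, so the result is an empty set rather than
/// an internal error.
fn docids_for_field(
    fields_ids_map: &HashMap<String, FieldId>,
    facet_docids: &HashMap<FieldId, DocIds>,
    field: &str,
) -> DocIds {
    match fields_ids_map.get(field) {
        Some(field_id) => facet_docids.get(field_id).cloned().unwrap_or_default(),
        None => DocIds::new(),
    }
}
```

With a map that only contains `Genre`, a filter on `genre` or on a field that was never indexed yields an empty set in this sketch.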
bors[bot]	4c516c00da	Merge #426 426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2 # Pull Request ## What does this PR do? Fixes https://github.com/meilisearch/MeiliSearch/issues/1480 <!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. --> ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? ## Changes The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes. Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched ## Questions Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API? Thank you very much `@ManyTheFish` for helping 😄 Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>	2022-01-17 13:39:00 +00:00
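The byte-versus-character distinction behind #426 can be shown with the standard library alone. This sketch counts Unicode scalar values (`char`s) rather than grapheme clusters, which the PR obtains through the `unicode-segmentation` crate; both function names are hypothetical and are not the tokenizer's API.

```rust
/// Convert a matched byte length into a number of characters.
/// Returns `None` when `num_bytes` does not fall on a character boundary.
fn chars_in_byte_prefix(word: &str, num_bytes: usize) -> Option<usize> {
    word.get(..num_bytes).map(|prefix| prefix.chars().count())
}

/// Wrap only the first `num_chars` characters of `word` in a `<mark>` tag.
fn highlight_prefix(word: &str, num_chars: usize) -> String {
    // Byte offset where the highlighted part ends (the whole word if shorter).
    let split = word
        .char_indices()
        .nth(num_chars)
        .map_or(word.len(), |(index, _)| index);
    format!("<mark>{}</mark>{}", &word[..split], &word[split..])
}
```

For `électricité`, a two-byte match covers a single character, and highlighting four characters marks `élec` only, instead of the entire word.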
Tamo	d1ac40ea14	fix(filter): Fix two bugs. - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there are currently zero documents containing this field, thus we return an empty RoaringBitmap instead of throwing an internal error	2022-01-17 13:51:46 +01:00
Samyak S Sarnayak	2d7607734e	Run cargo fmt on matching_words.rs	2022-01-17 13:04:33 +05:30
Samyak S Sarnayak	5ab505be33	Fix highlight by replacing num_graphemes_from_bytes num_graphemes_from_bytes has been renamed in the tokenizer to num_chars_from_bytes. Highlight now works correctly!	2022-01-17 13:02:55 +05:30
Samyak S Sarnayak	c10f58b7bd	Update tokenizer to v0.2.7	2022-01-17 13:02:00 +05:30
Samyak S Sarnayak	e752bd06f7	Fix matching_words tests to compile successfully The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59	2022-01-17 11:37:45 +05:30
Samyak S Sarnayak	30247d70cd	Fix search highlight for non-unicode chars The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function returns the number of matching grapheme clusters instead of bytes. Should this function be renamed then? Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched	2022-01-17 11:37:44 +05:30
Tamo	0605c0ac68	apply review comments	2022-01-13 18:51:08 +01:00
Tamo	b22c80106f	add some settings to the fuzzed milli and use the published version of arbitrary json	2022-01-13 15:35:24 +01:00
Tamo	c94952e25d	update the readme + dependencies	2022-01-12 18:30:11 +01:00
Tamo	e1053989c0	add a fuzzer on milli	2022-01-12 17:57:54 +01:00
Tamo	98a365aaae	store the geopoint in three dimensions	2021-12-14 12:21:24 +01:00
Tamo	d671d6f0f1	remove an unused file	2021-12-13 19:27:34 +01:00
Clément Renault	25faef67d0	Remove the database setup in the filter_depth test	2021-12-09 11:57:53 +01:00
Clément Renault	65519bc04b	Test that empty filters return a None	2021-12-09 11:57:53 +01:00
Clément Renault	ef59762d8e	Prefer returning None instead of the Empty Filter state	2021-12-09 11:57:52 +01:00
Clément Renault	ee856a7a46	Limit the max filter depth to 2000	2021-12-07 17:36:45 +01:00
Clément Renault	32bd9f091f	Detect the filters that are too deep and return an error	2021-12-07 17:20:11 +01:00
Clément Renault	90f49eab6d	Check the filter max depth limit and reject the invalid ones	2021-12-07 16:32:48 +01:00
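The filter depth commits above reject filters whose nesting exceeds a fixed limit. A minimal sketch of such a check, assuming a simplified `Filter` enum: the type, the `check_filter` function and the error string are hypothetical, and only the limit of 2000 comes from the commit messages.

```rust
// Simplified filter tree (assumption; the real parser has more variants).
enum Filter {
    Condition(String),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
}

// Limit taken from the commit "Limit the max filter depth to 2000".
const MAX_FILTER_DEPTH: usize = 2000;

/// Returns true when the filter nests deeper than `remaining` levels.
/// The recursion stops as soon as the budget is spent, so the check itself
/// never descends further than the limit.
fn exceeds_depth(filter: &Filter, remaining: usize) -> bool {
    match filter {
        Filter::Condition(_) => false,
        Filter::And(left, right) | Filter::Or(left, right) => {
            remaining == 0
                || exceeds_depth(left, remaining - 1)
                || exceeds_depth(right, remaining - 1)
        }
    }
}

/// Reject filters that are too deep instead of evaluating them.
fn check_filter(filter: &Filter) -> Result<(), String> {
    if exceeds_depth(filter, MAX_FILTER_DEPTH) {
        Err(format!("filter depth exceeds the limit of {}", MAX_FILTER_DEPTH))
    } else {
        Ok(())
    }
}
```

A bare condition has depth 0 in this sketch, and each `And` or `Or` adds one level.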
many	1b3923b5ce	Update all packages to 0.21.0	2021-11-29 12:17:59 +01:00
many	8970246bc4	Sort positions before iterating over them during word pair proximity extraction	2021-11-22 18:16:54 +01:00
Marin Postma	6e977dd8e8	change visibility of DocumentDeletionResult	2021-11-22 15:44:44 +01:00
many	35f9499638	Export tokenizer from milli	2021-11-18 16:57:12 +01:00
many	64ef5869d7	Update tokenizer v0.2.6	2021-11-18 16:56:05 +01:00
Marin Postma	6eb47ab792	remove update_id in UpdateBuilder	2021-11-16 13:07:04 +01:00
Marin Postma	09b4281cff	improve document addition returned meta	2021-11-10 14:08:36 +01:00
Marin Postma	721fc294be	improve document deletion returned meta returns both the remaining number of documents and the number of deleted documents.	2021-11-10 14:08:18 +01:00
Tamo	f28600031d	Rename the filter_parser crate into filter-parser Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-11-09 16:41:10 +01:00
Irevoire	0ea0146e04	implement deref &str on the tokens	2021-11-09 11:34:10 +01:00
Tamo	7483c7513a	fix the filterable fields	2021-11-07 01:52:19 +01:00
Tamo	e5af3ac65c	rename the filter_condition.rs to filter.rs	2021-11-06 16:37:55 +01:00
Tamo	6831c23449	merge with main	2021-11-06 16:34:30 +01:00
Tamo	b249989bef	fix most of the tests	2021-11-06 01:32:12 +01:00
Tamo	27a6a26b4b	makes the parse function part of the filter_parser	2021-11-05 10:46:54 +01:00
Tamo	76d961cc77	implements the last errors	2021-11-04 17:42:06 +01:00
Tamo	8234f9fdf3	recreate most filter error except for the geosearch	2021-11-04 17:24:55 +01:00
Tamo	07a5ffb04c	update http-ui	2021-11-04 15:52:22 +01:00
Tamo	a58bc5bebb	update milli with the new parser_filter	2021-11-04 15:02:36 +01:00
many	743ed9f57f	Bump milli version	2021-11-04 14:04:21 +01:00
many	7b3bac46a0	Change Attribute and Ranking rules errors	2021-11-04 13:19:32 +01:00
many	702589104d	Update version for the next release (v0.20.1)	2021-11-03 14:20:01 +01:00
many	0c0038488c	Change last error messages	2021-11-03 11:24:06 +01:00
Tamo	76a2adb7c3	re-enable the tests in the parser and start the creation of an error type	2021-11-02 17:35:17 +01:00
bors[bot]	5a6d22d4ec	Merge #407 407: Update version for the next release (v0.20.0) r=curquiza a=curquiza Breaking because of #405 and #406 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-10-28 13:43:48 +00:00
bors[bot]	08ae47e475	Merge #405 405: Change some error messages r=ManyTheFish a=ManyTheFish Co-authored-by: many <maxime@meilisearch.com>	2021-10-28 13:35:55 +00:00
Clémentine Urquizar	056ff13c4d	Update version for the next release (v0.20.0)	2021-10-28 14:52:57 +02:00
many	9f1e0d2a49	Refine asc/desc error messages	2021-10-28 14:47:17 +02:00
many	ed6db19681	Fix PR comments	2021-10-28 11:18:32 +02:00
marin postma	183d3dada7	return document count from builder	2021-10-28 10:33:04 +02:00
many	2be755ce75	Lower error check, already check in meilisearch	2021-10-27 19:50:41 +02:00
many	3599df77f0	Change some error messages	2021-10-27 19:33:01 +02:00
bors[bot]	d7943fe225	Merge #402 402: Optimize document transform r=MarinPostma a=MarinPostma This PR optimizes the transform of document additions into the obkv format. Instead of accepting any serializable object, we treat JSON and CSV specifically: - For JSON, we build a serde `Visitor` that transforms the JSON straight into obkv without an intermediate representation. - For CSV, we directly write the lines into the obkv, applying other optimizations as well. Co-authored-by: marin postma <postma.marin@protonmail.com>	2021-10-26 09:55:28 +00:00
marin postma	baddd80069	implement review suggestions	2021-10-25 18:29:12 +02:00
marin postma	f9445c1d90	return float parsing error context in csv	2021-10-25 17:27:10 +02:00
bors[bot]	15c29cdd9b	Merge #401 401: Update version for the next release (v0.19.0) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-10-25 12:49:53 +00:00
Clémentine Urquizar	208903ddde	Revert "Replacing pest with nom "	2021-10-25 11:58:00 +02:00
Clémentine Urquizar	679fe18b17	Update version for the next release (v0.19.0)	2021-10-25 11:52:17 +02:00
marin postma	3fcccc31b5	add document builder example	2021-10-25 10:26:43 +02:00
marin postma	430e9b13d3	add csv builder tests	2021-10-25 10:26:43 +02:00
marin postma	53c79e85f2	document errors	2021-10-25 10:26:43 +02:00
marin postma	2e62925a6e	fix tests	2021-10-25 10:26:42 +02:00
marin postma	0f86d6b28f	implement csv serialization	2021-10-25 10:26:42 +02:00
marin postma	8d70b01714	optimize document deserialization	2021-10-25 10:26:42 +02:00
Tamo	1327807caa	add some error messages	2021-10-22 19:00:33 +02:00
Tamo	c8d03046bf	add a check on the fid in the geosearch	2021-10-22 18:08:18 +02:00
Tamo	3942b3732f	re-implement the geosearch	2021-10-22 18:03:39 +02:00
Tamo	7cd9109e2f	lowercase value extracted from Token	2021-10-22 17:50:15 +02:00
Tamo	e25ca9776f	start updating the exposed function to makes other modules happy	2021-10-22 17:23:22 +02:00
Tamo	6c9165b6a8	provide a helper to parse the token but to not handle the errors	2021-10-22 16:52:13 +02:00
Tamo	efb2f8b325	convert the errors	2021-10-22 16:38:35 +02:00
Tamo	c27870e765	integrate a first version without any error handling	2021-10-22 14:33:18 +02:00
Tamo	01dedde1c9	update some names and move some parser out of the lib.rs	2021-10-22 01:59:38 +02:00
Tamo	c634d43ac5	add a simple test on the filters with an integer	2021-10-21 17:10:27 +02:00
Tamo	6c15f50899	rewrite the parser logic	2021-10-21 16:45:42 +02:00
Tamo	e1d81342cf	add test on the or and and operator	2021-10-21 13:01:25 +02:00
Tamo	423baac08b	fix the tests	2021-10-21 12:45:40 +02:00
Tamo	36281a653f	write all the simple tests	2021-10-21 12:40:11 +02:00
Clémentine Urquizar	f8fe9316c0	Update version for the next release (v0.18.1)	2021-10-21 11:56:14 +02:00
Tamo	661bc21af5	Fix the filter parser And add a bunch of tests on the filter::from_array	2021-10-21 11:45:03 +02:00
Clémentine Urquizar	2209acbfe2	Update version for the next release (v0.18.2)	2021-10-18 13:45:48 +02:00
bors[bot]	59cc59e93e	Merge #358 358: Replacing pest with nom r=Kerollmops a=CNLHC Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>	2021-10-16 20:44:38 +00:00
刘瀚骋	7666e4f34a	follow the suggestions	2021-10-14 21:37:59 +08:00
刘瀚骋	2ea2f7570c	use nightly cargo to format the code	2021-10-14 16:46:13 +08:00
刘瀚骋	e750465e15	check logic for geolocation.	2021-10-14 16:12:00 +08:00
bors[bot]	aa5e099718	Merge #390 390: Add helper methods on the settings r=Kerollmops a=irevoire This would be a good addition to look at the content of a setting without consuming it. It’s useful for analytics. Co-authored-by: Irevoire <tamo@meilisearch.com>	2021-10-13 20:36:30 +00:00
bors[bot]	c7db4176f3	Merge #384 384: Replace memmap with memmap2 r=Kerollmops a=palfrey [memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values. Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>	2021-10-13 13:47:23 +00:00
Irevoire	a3e7c468cd	add helper methods on the settings	2021-10-13 13:05:07 +02:00
刘瀚骋	cd359cd96e	WIP: extract the error trait bound to new trait.	2021-10-13 18:04:15 +08:00
刘瀚骋	5de5dd80a3	WIP: remove '_nom' suffix/redundant error enum/...	2021-10-13 11:06:15 +08:00
刘瀚骋	2c65781d91	format	2021-10-12 22:20:22 +08:00
bors[bot]	6e3b869e6a	Merge #388 388: fix primary key inference r=MarinPostma a=MarinPostma The primary key was inferred from a hashtable index of the field. For this reason the order in which the fields were iterated upon was not deterministic, and the primary key was chosen from the first field containing "id". This fix sorts the index by field_id when inferring the primary key. Co-authored-by: mpostma <postma.marin@protonmail.com>	2021-10-12 09:25:16 +00:00
mpostma	86ead92ed5	infer primary key on sorted fields	2021-10-12 11:15:11 +02:00
mpostma	9a266a531b	test correct primary key inference	2021-10-12 11:08:53 +02:00
many	c5a6075484	Make max_position_per_attributes changable	2021-10-12 10:10:50 +02:00
many	360c5ff3df	Remove limit of 1000 position per attribute Instead of using an arbitrary limit we encode the absolute position in a u32 using one strong u16 for the field id and a weak u16 for the relative position in the attribute.	2021-10-12 10:10:50 +02:00
刘瀚骋	d323e35001	add a test case	2021-10-12 13:30:40 +08:00
刘瀚骋	70f576d5d3	error handling	2021-10-12 13:30:40 +08:00
刘瀚骋	28f9be8d7c	support syntax	2021-10-12 13:30:40 +08:00
刘瀚骋	469d92c569	tweak error handling	2021-10-12 13:30:40 +08:00
刘瀚骋	7a90a101ee	reorganize parser logic	2021-10-12 13:30:40 +08:00
刘瀚骋	f7796edc7e	remove everything about pest	2021-10-12 13:30:40 +08:00
刘瀚骋	ac1df9d9d7	fix typo and remove pest	2021-10-12 13:30:40 +08:00
刘瀚骋	50ad750ec1	enhance error handling	2021-10-12 13:30:40 +08:00
刘瀚骋	8748df2ca4	draft without error handling	2021-10-12 13:30:40 +08:00
bors[bot]	07fb6d64e5	Merge #386 386: fix obkv document r=curquiza a=MarinPostma When serializing a document, the serializer resolved the field_id of the current field and immediately added it to the obkv document under construction. The issue with that is that obkv expects the fields to be inserted in order, and when a document with out of order fields was added, obkv failed to insert the field. The current fix first resolves each field_id, and adds all the fields to a temporary `BTreeMap`, until `end` is called on the map serializer, where all the fields are added to the obkv at once, and in order. Co-authored-by: mpostma <postma.marin@protonmail.com>	2021-10-11 13:45:04 +00:00
Clémentine Urquizar	dd56e82dba	Update version for the next release (v0.17.2)	2021-10-11 15:20:35 +02:00
mpostma	99889a0ed0	add obkv document serialization test	2021-10-11 15:13:17 +02:00
mpostma	799f3d43c8	fix serialization to obkv format	2021-10-11 15:04:47 +02:00
Tom Parker-Shemilt	2dfe24f067	memmap -> memmap2	2021-10-10 22:47:12 +01:00
Irevoire	b65aa7b5ac	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-10-07 17:51:52 +02:00
Tamo	11dfe38761	Update the check on the latitude and longitude Latitudes are not supposed to go beyond 90 degrees or below -90. The same goes for longitudes with 180 or -180. This was badly implemented in the filters, and was not implemented for the AscDesc rules.	2021-10-07 16:10:43 +02:00
many	085bc6440c	Apply PR comments	2021-10-06 11:12:26 +02:00
many	1bd15d849b	Reduce candidates threshold	2021-10-05 18:52:14 +02:00
many	ea4bd29d14	Apply PR comments	2021-10-05 17:35:07 +02:00
many	3296bb243c	Simplify word level position DB into a word position DB	2021-10-05 12:15:02 +02:00
many	75d341d928	Re-implement set based algorithm for attribute criterion	2021-10-05 12:14:50 +02:00
Clémentine Urquizar	05d8a33a28	Update version for the next release (v0.17.1)	2021-10-02 16:21:31 +02:00
Tamo	d9eba9d145	improve and test the sort error message	2021-09-30 14:38:27 +02:00
Tamo	0ee67bb7d1	improve the reserved keyword error message for the filters	2021-09-30 14:38:27 +02:00
bors[bot]	22551d0941	Merge #379 379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish Reverts meilisearch/milli#370 Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2021-09-29 13:20:53 +00:00
Many	26b5dad042	Revert "Change chunk size to 4MiB to fit more the end user usage"	2021-09-29 15:08:39 +02:00
Many	2e49230ca2	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-29 14:49:45 +02:00
Many	7ad0214089	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-29 14:49:41 +02:00
many	1df5b8712b	Hotfix meilisearch#1707	2021-09-29 14:41:56 +02:00
bors[bot]	68c758a533	Merge #376 376: Stop casting integer docids to string r=Kerollmops a=irevoire When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-29 08:32:48 +00:00
Clémentine Urquizar	0e8665bf18	Update version for the next release (v0.17.0)	2021-09-28 19:38:12 +02:00
Tamo	f65153ad64	stop casting integer docids to string	2021-09-28 18:35:54 +02:00
Vishnu Gt	785c1372f2	Change "settings" to "setting" Co-authored-by: Clément Renault <renault.cle@gmail.com>	2021-09-28 20:11:32 +05:30
Vishnu Ganesan	3580b2d803	Fixes #365	2021-09-28 19:30:23 +05:30
bors[bot]	3a12f5887e	Merge #373 373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire `@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬 Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-28 12:39:32 +00:00
Tamo	a80dcfd4a3	improve error message for bad sort syntax with geosearch	2021-09-28 14:32:24 +02:00
bors[bot]	b2a332599e	Merge #372 372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish The bug comes from the typo tolerance, to know how many typos are accepted we were counting bytes instead of characters in a word. On Chinese Script characters, we were allowing 2 typos on 3 characters words. We are now counting the number of char instead of counting bytes to assign the typo tolerance. Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714) Co-authored-by: many <maxime@meilisearch.com>	2021-09-28 11:59:45 +00:00
many	8046ae4bd5	Count the number of char instead of counting bytes to assign the typo tolerance	2021-09-28 12:10:43 +02:00
many	1988416295	Add failing test related to Meilisearch#1714	2021-09-28 12:05:11 +02:00
Tamo	c7cb816ae1	simplify the error handling of the sort syntax for meilisearch	2021-09-27 19:07:22 +02:00
many	b188063869	Change chunk size to 4MiB to fit more the end user usage	2021-09-27 14:26:21 +02:00
many	551df0cb77	Add test checking the bug reported in meilisearch issue 1716	2021-09-23 15:55:39 +02:00
bors[bot]	87dd441a3a	Merge #367 367: Update version for the next release (v0.16.0) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-22 15:20:20 +00:00
Clémentine Urquizar	1eacab2169	Update version for the next release (v0.15.1)	2021-09-22 17:18:54 +02:00
Irevoire	218f0a6661	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-22 17:00:27 +02:00
Tamo	47ee93b0bd	return an error when _geoPoint is used but _geo is not sortable	2021-09-22 16:37:41 +02:00
Tamo	1e5e3d57e2	auto convert AscDescError into CriterionError	2021-09-22 16:37:41 +02:00
Tamo	023446ecf3	create a smaller and easier to maintain CriterionError type	2021-09-22 16:37:41 +02:00
Tamo	86e272856a	create an asc_desc error type that is never supposed to be returned to the end user	2021-09-22 16:37:41 +02:00
Tamo	257e621d40	create an asc_desc module	2021-09-22 16:37:41 +02:00
Tamo	113a061bee	fix the error handling on the criterion side	2021-09-22 15:09:07 +02:00
Tamo	78b0bce9a1	fix the returned error when asc desc fails to be parsed	2021-09-22 11:37:05 +02:00
Clémentine Urquizar	f8ecbc28e2	Update version for the next release (v0.15.0)	2021-09-21 18:09:14 +02:00
mpostma	aa6c5df0bc	Implement documents format document reader transform remove update format support document sequences fix document transform clean transform improve error handling add documents! macro fix transform bug fix tests remove csv dependency Add comments on the transform process replace search cli fmt review edits fix http ui fix clippy warnings Revert "fix clippy warnings" This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620. fix review comments remove smallvec in transform loop review edits	2021-09-21 16:58:33 +02:00
bors[bot]	94764e5c7c	Merge #360 360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza Release containing the geosearch, cf #322 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-21 08:43:27 +00:00
bors[bot]	31c8de1cca	Merge #322 322: Geosearch r=ManyTheFish a=irevoire This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so. ### What we will have to do on the indexing part: - [x] Index the `_geo` fields from the documents. - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process. - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module. - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree` - [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification. - [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file. - [x] save a Roaring bitmap of all documents containing the `_geo` field ### What we will have to do on the query part: - [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range. 
- [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum. - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents. - [x] Add the `_geoRadius` function in the pest parser. - [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too! - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule. ----------- - On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned. Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-20 19:04:57 +00:00
Irevoire	0d104a0fce	Update milli/src/criterion.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 18:13:17 +02:00
Clémentine Urquizar	3f1453f470	Update version for the next release (v0.14.0)	2021-09-20 18:12:23 +02:00
Tamo	f4b8e5675d	move the reserved keyword logic for the criterion and sort + add test	2021-09-20 17:21:02 +02:00
Irevoire	3b7a2cdbce	fix typo Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-20 16:10:39 +02:00
Tamo	c695a1ffd2	add the possibility to sort by descending order on geoPoint	2021-09-15 11:49:58 +02:00
Tamo	91ce4d1721	Stop iterating through the whole list of points We stop when there is no possible candidates left	2021-09-15 11:49:58 +02:00
Clémentine Urquizar	f167f7b412	Update version for the next release (v0.13.1)	2021-09-10 09:48:17 +02:00
Tamo	cfc62a1c15	use geoutils instead of haversine	2021-09-09 18:11:38 +02:00
many	26deeb45a3	Add lacking parameter to word level position builder	2021-09-09 17:49:04 +02:00
Tamo	3fc145c254	if we have no rtree we return all other provided documents	2021-09-09 17:44:09 +02:00
Irevoire	a84f3a8b31	Apply suggestions from code review Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-09-09 15:09:35 +02:00
Tamo	c81ff22c5b	delete the invalid criterion name error in favor of invalid ranking rule name	2021-09-08 19:17:00 +02:00
Tamo	bad8ea47d5	edit the two lasts TODO comments	2021-09-08 18:24:09 +02:00
Tamo	b15c77ebc4	return an error in case a user try to sort with :desc	2021-09-08 18:24:09 +02:00
Tamo	4b618b95e4	rebase on main	2021-09-08 18:24:09 +02:00
Tamo	2988d3c76d	tests the geo filters	2021-09-08 18:24:09 +02:00
Tamo	e5ef0cad9a	use meters in the filters	2021-09-08 18:24:09 +02:00
Tamo	4f69b190bc	remove the distance from the search, the computation of the distance will be made on meilisearch side	2021-09-08 18:24:09 +02:00
Tamo	7ae2a7341c	introduce the reserved keywords in the filters	2021-09-08 18:24:09 +02:00
Tamo	6d5762a6c8	handle the case where you forgot entirely the parenthesis	2021-09-08 18:24:09 +02:00
Tamo	ebf82ac28c	improve the error messages and add tests for the filters	2021-09-08 18:24:09 +02:00
Tamo	bd4c248292	improve the error handling in general and introduce the concept of reserved keywords	2021-09-08 18:24:09 +02:00
Tamo	e8c093c1d0	fix the error handling in the filters	2021-09-08 18:24:09 +02:00
Tamo	f0b74637dc	fix all the tests	2021-09-08 18:24:09 +02:00
Tamo	b1bf7d4f40	reformat	2021-09-08 18:24:09 +02:00
Tamo	aca707413c	remove the memory leak	2021-09-08 18:24:09 +02:00
Tamo	a8a1f5bd55	move the geosearch criteria out of asc_desc.rs	2021-09-08 18:24:09 +02:00
Tamo	dc84ecc40b	fix a bug	2021-09-08 18:24:09 +02:00
Tamo	4820ac71a6	allow spaces in a geoRadius	2021-09-08 18:24:09 +02:00
Tamo	13c78e5aa2	Implement the _geoPoint in the sortable	2021-09-08 18:24:09 +02:00
Tamo	5bb175fc90	only index _geo if it's set as sortable OR filterable and only allow the filters if geo was set to filterable	2021-09-08 17:51:08 +02:00
Tamo	f73273d71c	only call the extractor if needed	2021-09-08 17:51:08 +02:00
Irevoire	ea2f2ecf96	create a new database containing all the documents that were geo-faceted	2021-09-08 17:51:08 +02:00
Irevoire	4b459768a0	create the _geoRadius filter	2021-09-08 17:51:07 +02:00
Irevoire	6d70978edc	update the facet filter grammar	2021-09-08 17:51:07 +02:00
Irevoire	216a8aa3b2	add a tests for the indexation of the geosearch	2021-09-08 17:51:07 +02:00
Irevoire	a21c854790	handle errors	2021-09-08 17:51:07 +02:00
Irevoire	70ab2c37c5	remove multiple bugs	2021-09-08 17:51:07 +02:00
Irevoire	b4b6ba6d82	rename all the ’long’ into ’lng’ like written in the specification	2021-09-08 17:51:07 +02:00
Irevoire	3b9f1db061	implement the clear of the rtree	2021-09-08 17:51:07 +02:00
Irevoire	d344489c12	implement the deletion of geo points	2021-09-08 17:51:07 +02:00
Irevoire	44d6b6ae9e	Index the geo points	2021-09-08 17:51:07 +02:00
Irevoire	8d9c2c4425	create a new db with getters and setters	2021-09-08 17:51:07 +02:00
bors[bot]	b22aac92ac	Merge #342 342: Let the caller decide what kind of error they want to returns when parsing `AscDesc` r=Kerollmops a=irevoire This is one possible fix for #339 We would then need to patch these lines https://github.com/meilisearch/MeiliSearch/blob/main/meilisearch-http/src/index/search.rs#L110-L114 to return the error we want. Another solution would be to add a parameter to the `from_str` to specify which context we are in. Co-authored-by: Tamo <tamo@meilisearch.com>	2021-09-08 14:18:57 +00:00
Tamo	932998f5cc	let the caller decide if they want to return an invalidSortName or an invalidCriterionName error	2021-09-08 16:17:31 +02:00
bors[bot]	86c3b0c8c2	Merge #350 350: Fix mdb val size error r=Kerollmops a=ManyTheFish Related to [#1677](https://github.com/meilisearch/MeiliSearch/issues/1677) Co-authored-by: many <maxime@meilisearch.com>	2021-09-08 13:32:15 +00:00
many	e54280fbfc	Skip empty normalized words	2021-09-08 15:25:23 +02:00
many	d18ee58ab9	Check if key are not empty in validator	2021-09-08 15:25:23 +02:00
Kerollmops	8a088fb99e	Bump grenad to v0.3.1	2021-09-08 14:08:55 +02:00
Kerollmops	20ad43b908	Enable the grenad tempfile feature back	2021-09-08 14:06:28 +02:00
bors[bot]	772e55d174	Merge #347 347: Update version for the next release (v0.13.0) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-09-08 11:41:15 +00:00
many	9961b78b06	Drop sorter before creating a new one	2021-09-08 13:30:26 +02:00
Clémentine Urquizar	eb7b9d9dbf	Update version for the next release (v0.13.0)	2021-09-08 10:59:30 +02:00
bors[bot]	48d211b8b0	Merge #344 344: Move the sort ranking rule before the exactness ranking rule r=ManyTheFish a=Kerollmops This PR moves the sort ranking rule at the 5th position by default, right before the exactness one. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-09-07 15:47:15 +00:00
bors[bot]	720becb5e8	Merge #341 341: Throw a query time error when a sort parameter is used but the sort ranking rule is missing r=Kerollmops a=Kerollmops This PR makes the engine throw an error for when the ranking rules don't contain the `sort` rule, the `sortable_fields` are correctly set but the user tries to use the `sort` query parameter. Doing so will have no effect on the returned documents so we preferred returning an error to help debug this. That's breaking on the MeiliSearch side as we added a new variant to the `UserError` enum. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-09-07 14:45:05 +00:00
Kerollmops	e2cefc9b4f	Move the sort ranking rule before the exactness ranking rule	2021-09-07 16:41:33 +02:00
mpostma	cd043d4461	remove unused grenad default features	2021-09-07 16:21:46 +02:00
Kerollmops	5989528833	Add a test to make sure we throw the right error message	2021-09-07 11:02:00 +02:00
Kerollmops	fd3daa4423	Throw a query time error when a sort param is used but sort ranking rule is missing	2021-09-07 11:02:00 +02:00
Kerollmops	8dca36433c	Introduce the new SortRankingRuleMissing user error variant	2021-09-07 11:01:59 +02:00
Alexey Shekhirin	0be09555f1	test(search): asc/desc criteria for large datasets	2021-09-03 18:00:08 +03:00
Alexey Shekhirin	c2517e7d5f	fix(facet): string fields sorting	2021-09-03 11:58:26 +03:00
bors[bot]	5cbe879325	Merge #308 308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish Rewrite the indexer: - enhance memory consumption control - optimize parallelism using rayon and crossbeam channel - factorize the different parts and make new DB implementation easier - optimize and fix prefix databases Co-authored-by: many <maxime@meilisearch.com>	2021-09-02 15:03:52 +00:00
many	741a4444a9	Remove log in chunk generator	2021-09-02 16:57:46 +02:00
many	7f7fafb857	Make document_chunk_size settable from update builder	2021-09-02 15:25:39 +02:00
many	db0c681bae	Fix Pr comments	2021-09-02 15:17:52 +02:00
Clémentine Urquizar	285849e3a6	Update version for the next release (v0.12.0)	2021-09-02 10:08:41 +02:00
many	4860fd4529	Ignore empty facet values	2021-09-01 16:48:40 +02:00
many	b3a22f31f6	Fix memory consuption in word pair proximity extractor	2021-09-01 16:48:40 +02:00
many	9452fabfb2	Optimize cbo roaring bitmaps merge	2021-09-01 16:48:40 +02:00
many	8f702828ca	Ignore errors comming from crossbeam channel senders	2021-09-01 16:48:40 +02:00
many	e09eec37bc	Handle distance addition with hard separators	2021-09-01 16:48:40 +02:00
many	fc7cc770d4	Add logging timers	2021-09-01 16:48:40 +02:00
many	a2f59a28f7	Remove unwrap sending errors in channel	2021-09-01 16:48:40 +02:00
many	5c962c03dd	Fix and optimize word_prefix_pair_proximity_docids database	2021-09-01 16:48:40 +02:00
many	2d1727697d	Take stop word in account	2021-09-01 16:48:40 +02:00
many	823da19745	Fix test and use progress callback	2021-09-01 16:48:39 +02:00
many	1d314328f0	Plug new indexer	2021-09-01 16:48:36 +02:00
many	3aaf1d62f3	Publish grenad CompressionType type in milli	2021-09-01 16:42:08 +02:00
Alexey Shekhirin	0e379558a1	fix(search): get sortable_fields only if criteria present	2021-08-31 21:35:41 +03:00
bors[bot]	d6bba0663a	Merge #334 334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin Resolves https://github.com/meilisearch/milli/issues/263 Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>	2021-08-31 17:38:54 +00:00
Alexey Shekhirin	0b02eb456c	chore(update): wrap long values into BStr for warn logs	2021-08-31 20:28:16 +03:00
Kerollmops	f230ae6fd5	Introduce the reset_sortable_fields Settings method	2021-08-25 17:44:16 +02:00
Kerollmops	af65485ba7	Reexport the grenad CompressionType from milli	2021-08-24 18:15:31 +02:00
Kerollmops	f2e1591826	Remove the unused tinytemplate dependency	2021-08-24 18:10:58 +02:00
Kerollmops	2f20257070	Update milli to the v0.11.0	2021-08-24 18:10:11 +02:00
Clément Renault	89d0758713	Revert "Revert "Sort at query time""	2021-08-24 11:55:16 +02:00
Clémentine Urquizar	88f6c18665	Update version for the next release (v0.10.2)	2021-08-23 11:33:30 +02:00
Clément Renault	c084f7f731	Fix the facet string docids filterable deletion bug	2021-08-23 10:50:39 +02:00
Clémentine Urquizar	922f9fd4d5	Revert "Sort at query time"	2021-08-20 18:09:17 +02:00
bors[bot]	41fc0dcb62	Merge #309 309: Sort at query time r=Kerollmops a=Kerollmops This PR: - Makes the `Asc/Desc` criteria work with strings too, it first returns documents ordered by numbers then by strings, and finally the documents that can't be ordered. Note that it is lexicographically ordered and not ordered by character, which means that it doesn't know about wide and short characters i.e. `a`, `丹`, `▲`. - Changes the syntax for the `Asc/Desc` criterion by now using a colon to separate the name and the order i.e. `title:asc`, `price:desc`. - Add the `Sort` criterion at the third position in the ranking rules by default. - Add the `sort_criteria` method to the `Search` builder struct to let the users define the `Asc/Desc` sortable attributes they want to use at query time. Note that we need to check that the fields are registered in the sortable attributes before performing the search. - Introduce a new `InvalidSortableAttribute` user error that is raised when the sort criteria declared at query time are not part of the sortable attributes. - `@ManyTheFish` introduced integration tests for the dynamic Sort criterion. Fixes #305. Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: many <maxime@meilisearch.com>	2021-08-18 16:55:32 +00:00
many	d1df0d20f9	Add integration test of SortBy criterion	2021-08-18 16:21:51 +02:00
Kerollmops	1b7f6ea1e7	Return a new error when the sort criteria is not sortable	2021-08-18 15:04:07 +02:00
Kerollmops	71602e0f1b	Add the sortable fields into the settings and in the index	2021-08-18 15:04:07 +02:00
Kerollmops	407f53872a	Add a sort_criteria method to the Search builder struct	2021-08-18 15:04:07 +02:00
Kerollmops	687cd2e205	Introduce the new Sort criterion and AscDesc enum	2021-08-18 15:04:07 +02:00
bors[bot]	198c416bd8	Merge #312 312: Update milli version to v0.10.1 r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-08-18 12:08:04 +00:00
Clémentine Urquizar	6cb9c3b81f	Update milli version to v0.10.1	2021-08-18 13:46:27 +02:00
Clémentine Urquizar	42cf847a63	Update tokenizer version to v0.2.5	2021-08-18 13:37:41 +02:00
Kerollmops	5b88df508e	Use the new Asc/Desc syntax everywhere	2021-08-17 14:15:22 +02:00
Kerollmops	fcedff95e8	Change the Asc/Desc criterion syntax to use a colon (:)	2021-08-17 14:03:21 +02:00
Kerollmops	e9ada44509	AscDesc criterion returns documents ordered by numbers then by strings	2021-08-17 13:21:31 +02:00
Kerollmops	110bf6b778	Make the FacetStringIter work in both, ascending and descending orders	2021-08-17 11:18:40 +02:00
Kerollmops	22ebd2658f	Introduce the EitherString/RevRange private aliases	2021-08-17 10:47:15 +02:00
Kerollmops	7a5889bc5a	Introduce the highest_reverse_iter private method	2021-08-17 10:45:26 +02:00
Kerollmops	ad0d311f8a	Introduce the FacetStringLevelZeroRevRange struct	2021-08-17 10:44:43 +02:00
Kerollmops	6214c38da9	Introduce the FacetStringGroupRevRange struct	2021-08-17 10:44:27 +02:00
Kerollmops	1c604de158	Introduce the highest_iter private method on the FacetStringIter struct	2021-08-17 10:41:11 +02:00
Kerollmops	64df159057	Introduce the new_reducing constructor on the FacetStringIter struct	2021-08-17 10:35:06 +02:00
Kerollmops	01a4052828	Move the FacetStringIter creation logic into a private new method	2021-08-17 10:29:43 +02:00
bors[bot]	51581d14f8	Merge #307 307: Update version for the next release (v0.10.0) r=Kerollmops a=curquiza Replaces https://github.com/meilisearch/milli/pull/304 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-08-16 10:33:53 +00:00
Clémentine Urquizar	fcc520e49a	Update version for the next release (v0.10.0)	2021-08-16 12:00:28 +02:00
many	7dbefae1e3	Make facet string iterator non reducing	2021-08-12 17:23:39 +02:00
many	8fdf860c17	Remove max values by facet limit for facet distribution	2021-08-12 11:29:20 +02:00
bors[bot]	2102e0da6b	Merge #302 302: Update milli to v0.9.0 r=curquiza a=curquiza Updating the minor and not patch since #300 seems to be breaking: it involves a re-indexation to get the fix, so it involves an additional step from the users, not only downloading the latest version. Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-08-05 08:38:15 +00:00
bors[bot]	89b9b61840	Merge #300 300: Fix prefix level position docids database r=curquiza a=ManyTheFish The prefix search was inverted when we generated the DB. Instead of checking whether a word had a prefix in the prefix FST, we were checking whether the word was a prefix of a prefix contained in the prefix FST. The indexer now iterates over the prefixes contained in the FST and searches for them by prefix in the word-level-position-docids database, aggregating matches in a sorter. Fix #299 Co-authored-by: many <maxime@meilisearch.com>	2021-08-04 16:52:09 +00:00
Clémentine Urquizar	7f26c75610	Update milli to v0.9.0	2021-08-04 16:04:55 +02:00
many	cdeb07f0fd	Fix prefix level position docids database The prefix search was inverted when we generated the DB. Instead of checking whether a word had a prefix in the prefix FST, we were checking whether the word was a prefix of a prefix contained in the prefix FST. The indexer now iterates over the prefixes contained in the FST and searches for them by prefix in the word-level-position-docids database, aggregating matches in a sorter. Fix #299	2021-08-04 14:11:49 +02:00
bors[bot]	1290edd58a	Merge #297 297: Bump milli to v0.8.1 r=curquiza a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-29 14:19:41 +00:00
Kerollmops	341c244965	Bump milli to v0.8.1	2021-07-29 15:56:36 +02:00
Kerollmops	90514e03d1	Fix invalid faceted documents ids buffer size	2021-07-29 15:49:23 +02:00
bors[bot]	200e98c211	Merge #293 293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops Fix https://github.com/meilisearch/meilisearch/issues/1505. fix https://github.com/meilisearch/MeiliSearch/issues/1529 Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-27 16:04:52 +00:00
Clémentine Urquizar	6a141694da	Update version for the next release (v0.8.0)	2021-07-27 16:38:42 +02:00
Kerollmops	dc2b63abdf	Introduce an empty FilterCondition variant to support unknown fields	2021-07-27 16:34:04 +02:00
Kerollmops	b12738cfe9	Use the right DB prefixes to store the faceted fields	2021-07-22 19:18:22 +02:00
Kerollmops	7aa6cc9b04	Do not insert fields in the map when changing the settings	2021-07-22 18:40:12 +02:00
bors[bot]	ee3a49cfba	Merge #291 291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released. It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏 ---- This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it. As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was triggered after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`. I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes. 
<details> <summary>The backtrace leading us to a panic in grenad.</summary> ``` meilisearch_1 \| thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17 meilisearch_1 \| stack backtrace: meilisearch_1 \| 0: rust_begin_unwind meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5 meilisearch_1 \| 1: core::panicking::panic_fmt meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14 meilisearch_1 \| 2: core::panicking::panic meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5 meilisearch_1 \| 3: grenad::block_builder::BlockBuilder::insert meilisearch_1 \| at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17 meilisearch_1 \| 4: grenad::writer::Writer<W>::insert meilisearch_1 \| at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12 meilisearch_1 \| 5: milli::update::words_level_positions::write_level_entry meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5 meilisearch_1 \| 6: milli::update::words_level_positions::compute_positions_levels meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13 meilisearch_1 \| 7: milli::update::words_level_positions::WordsLevelPositions::execute meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23 meilisearch_1 \| 8: milli::update::index_documents::IndexDocuments::execute_raw meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9 meilisearch_1 \| 9: milli::update::index_documents::IndexDocuments::execute meilisearch_1 \| at 
./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9 meilisearch_1 \| 10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30 meilisearch_1 \| 11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22 meilisearch_1 \| 12: meilisearch_http::index::update_handler::UpdateHandler::handle_update meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18 meilisearch_1 \| 13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}} meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35 meilisearch_1 \| 14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21 meilisearch_1 \| 15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17 meilisearch_1 \| 16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9 meilisearch_1 \| 17: tokio::runtime::task::core::CoreStage<T>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13 meilisearch_1 \| 18: tokio::runtime::task::harness::poll_future::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23 meilisearch_1 \| 19: <std::panic::AssertUnwindSafe<F> as 
core::ops::function::FnOnce<()>>::call_once meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9 meilisearch_1 \| 20: std::panicking::try::do_call meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40 meilisearch_1 \| 21: std::panicking::try meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19 meilisearch_1 \| 22: std::panic::catch_unwind meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14 meilisearch_1 \| 23: tokio::runtime::task::harness::poll_future meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19 meilisearch_1 \| 24: tokio::runtime::task::harness::Harness<T,S>::poll_inner meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9 meilisearch_1 \| 25: tokio::runtime::task::harness::Harness<T,S>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15 meilisearch_1 \| 26: tokio::runtime::task::raw::RawTask::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18 meilisearch_1 \| 27: tokio::runtime::task::Notified<S>::run meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9 meilisearch_1 \| 28: tokio::runtime::blocking::pool::Inner::run meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17 meilisearch_1 \| 29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17 meilisearch_1 \| note: Some details are omitted, run with 
`RUST_BACKTRACE=full` for a verbose backtrace. ``` </details> Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-22 16:14:35 +00:00
Kerollmops	0353fbb5df	Bump the tokenizer version to v0.2.4	2021-07-22 17:14:45 +02:00
Kerollmops	92c0a2cdc1	Add a test that triggers a panic when indexing zeroes	2021-07-22 17:14:44 +02:00
Kerollmops	aa02a7fdd8	Add a test to check that we indeed impact the relevancy	2021-07-22 17:04:38 +02:00
Clément Renault	0227254a65	Return the original string values for the inverted facet index database	2021-07-21 16:59:39 +02:00
Kerollmops	03a01166ba	Display the original facet string value from the linear facet database	2021-07-21 16:59:39 +02:00
Clément Renault	d23c250ad5	Fix a bound error in the facet string range construction	2021-07-21 16:59:39 +02:00
Clément Renault	081278dfd6	Use the facet string levels when computing the facet distribution	2021-07-21 16:59:39 +02:00
Clément Renault	5676b204dd	Fix the facet string levels codecs	2021-07-21 16:59:38 +02:00
Kerollmops	8c86348119	Indexing the facet strings levels	2021-07-21 16:59:38 +02:00
Kerollmops	a7ae552ba7	Fix the FacetStringLevelZeroRange range when unbounded	2021-07-21 16:59:38 +02:00
Kerollmops	757b2b502a	Remove the FacetValueStringCodec	2021-07-21 16:59:38 +02:00
Kerollmops	adfd4da24c	Introduce the FacetStringIter iterator	2021-07-21 16:59:38 +02:00
Kerollmops	a79661c6dc	Introduce a lot of facet string helper iterators	2021-07-21 16:59:38 +02:00
Kerollmops	851f979039	Describe the way we want to group the facet strings	2021-07-21 16:59:38 +02:00
Kerollmops	f858f64b1f	Move the facet number iterators into their own module	2021-07-21 16:59:37 +02:00
Kerollmops	9f8095c069	Make sure that we don't keep a reference on the LMDB key when using put_current	2021-07-21 10:35:35 +02:00
Kerollmops	a9553af635	Add a test to check that we can index more than 256 fields	2021-07-06 11:58:03 +02:00
Kerollmops	838ed1cd32	Use an u16 field id instead of one byte	2021-07-06 11:58:03 +02:00
Kerollmops	91c5d0c042	Use the AlwaysFreePages flag when opening an index	2021-07-05 16:36:13 +02:00
Kerollmops	a6b4069172	Bump to v0.7.2	2021-07-05 10:54:53 +02:00
many	9f62149b94	Fix matching length in matching_words	2021-07-01 19:03:28 +02:00
Clémentine Urquizar	3c149d8a43	Update tokenizer version to v0.2.3	2021-06-30 18:41:35 +02:00
bors[bot]	b4dcdbf00d	Merge #269 #271 269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops This PR fixes #268. The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted. The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html). 271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-06-30 12:34:55 +00:00
Kerollmops	32b7bd366f	Remove the roaring operation functions warnings	2021-06-30 14:12:56 +02:00
Kerollmops	c92ef54466	Add a test for when we insert a previously deleted document	2021-06-30 14:00:01 +02:00
Kerollmops	28782ff99d	Fix ExternalDocumentsIds struct when inserting previously deleted ids	2021-06-30 14:00:01 +02:00
Clémentine Urquizar	b489515f4d	Update milli version to v0.7.1	2021-06-30 13:52:46 +02:00
Kerollmops	54889813ce	Implement some debug functions on the ExternalDocumentsIds struct	2021-06-30 11:29:41 +02:00
Kerollmops	4bce66d5ff	Make the Index::delete_* method private	2021-06-30 10:07:31 +02:00
Irevoire	6044b80362	Update milli/src/search/matching_words.rs Co-authored-by: Clément Renault <renault.cle@gmail.com>	2021-06-30 00:35:26 +02:00
Tamo	be75e738b1	add more tests	2021-06-29 16:24:58 +02:00
Tamo	56fceb1928	re-implement the Damerau-Levenshtein used for the highlighting	2021-06-29 15:36:03 +02:00
Clément Renault	80c6aaf1fd	Bump milli to 0.7.0	2021-06-28 18:31:56 +02:00
Clément Renault	bdc5599b73	Bump heed to use the git repo with v0.12.0	2021-06-28 18:26:20 +02:00
Clément Renault	0013236e5d	Fix the LMDB and heed invalid interactions. It is undefined behavior to keep a reference to the database while modifying it, we were keeping references in the database and also feeding the heed put_current methods with keys referenced inside the database itself. https://github.com/Kerollmops/heed/pull/108	2021-06-28 16:19:02 +02:00
Kerollmops	9e5f9a8a10	Add a test for the words level positions generation bug	2021-06-28 16:08:31 +02:00
Kerollmops	98285b4b18	Bump milli to 0.6.0	2021-06-23 17:30:26 +02:00
Kerollmops	4fc8f06791	Rename faceted_fields into filterable_fields	2021-06-23 17:26:54 +02:00
Kerollmops	c31cadb54f	Do not consider the searchable field as filterable	2021-06-23 17:26:54 +02:00
bors[bot]	2ab24c4f49	Merge #256 256: Update version for the next release (v0.5.1) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-06-23 12:29:57 +00:00
Clémentine Urquizar	9885fb4159	Update version for the next release (v0.5.1)	2021-06-23 14:05:20 +02:00
Kerollmops	a6218a20ae	Introduce a new InvalidFacetsDistribution user error	2021-06-23 13:56:19 +02:00
Kerollmops	2364777838	Return an error for when a field distribution cannot be done	2021-06-23 11:50:49 +02:00
Kerollmops	aeaac743ff	Replace an if let some by a match	2021-06-23 11:33:30 +02:00
Tamo	8d2a0b43ff	run the formatter on the whole project a second time	2021-06-22 15:36:22 +02:00
Tamo	3d90b03d7b	fix the limit There was no check on the limit and thus, if a user specified a very large number, this line could cause a panic	2021-06-22 14:52:13 +02:00
bors[bot]	5b6adc6d96	Merge #245 245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops Closes #191, and resolves #140. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-06-22 12:10:52 +00:00
Kerollmops	51dbb2e06d	Warn for when a key is too large for LMDB	2021-06-22 11:51:36 +02:00
Kerollmops	aecbd14761	Improve the error message for InvalidDocumentId	2021-06-22 11:31:58 +02:00
Kerollmops	0cca2ea24f	Return a MissingDocumentId when a document doesn't have one	2021-06-22 11:22:33 +02:00
Kerollmops	481b0bf277	Warn for when a facet key is too large for LMDB	2021-06-22 10:57:46 +02:00
bors[bot]	b073fd49ea	Merge #244 244: Update version for the next release (v0.5.0) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-06-21 14:27:10 +00:00
Clémentine Urquizar	320670f8fe	Update version for the next release (v0.5.0)	2021-06-21 15:59:17 +02:00
Clémentine Urquizar	daef43f504	Rename FieldsDistribution into FieldDistribution	2021-06-21 15:57:41 +02:00
Clémentine Urquizar	35fcc351a0	Update version for the next release (v0.4.2)	2021-06-20 17:37:24 +02:00
bors[bot]	5b19dd23d9	Merge #240 240: Field distribution r=Kerollmops a=irevoire closes #199 closes #198 Co-authored-by: Tamo <tamo@meilisearch.com>	2021-06-19 10:14:25 +00:00
Tamo	d08cfda796	convert the field_distribution to a BTreeMap and avoid counting twice the same documents	2021-06-17 18:31:54 +02:00
bors[bot]	a9e552ab18	Merge #238 238: Integration tests on filters and distinct r=Kerollmops a=ManyTheFish Fix #216 Fix #120 Co-authored-by: many <maxime@meilisearch.com>	2021-06-17 15:00:51 +00:00
many	6cb1102bdb	Fix PR comments	2021-06-17 15:19:03 +02:00
Tamo	969adaefdf	rename fields_distribution in field_distribution	2021-06-17 15:16:20 +02:00
Kerollmops	ccd6f13793	Update version to the next release (0.4.1)	2021-06-17 15:01:20 +02:00
many	f496cd320d	Add distinct integration tests	2021-06-17 14:33:18 +02:00
many	9f4184208e	Add test on filters	2021-06-17 13:56:09 +02:00
marin	70bee7d405	re-export remaining error types Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-17 11:49:03 +02:00
marin postma	abbebad669	change sub errors visibility	2021-06-17 11:44:01 +02:00
Tamo	9716fb3b36	format the whole project	2021-06-16 18:33:33 +02:00
Clémentine Urquizar	f5ff3e8e19	Update version for the next release (v0.4.0)	2021-06-16 14:01:05 +02:00
many	ce0315a10f	Close write transaction in test	2021-06-16 11:03:37 +02:00
Kerollmops	7ac441e473	Fix small typos	2021-06-16 11:03:37 +02:00
Kerollmops	adf0c389c5	Rename FilterParsing into InvalidFilter	2021-06-16 11:03:36 +02:00
Kerollmops	8cfe3e1ec0	Rename DatabaseSizeReached into MaxDatabaseSizeReached	2021-06-16 11:03:36 +02:00
Kerollmops	4eda438f6f	Add a new Error for when a user uses a non-filtered attribute in a filter	2021-06-16 11:03:36 +02:00
Kerollmops	713acc408b	Introduce the primary key to the Settings builder structure	2021-06-16 11:03:36 +02:00
Kerollmops	a7d6930905	Replace the panicking expect by tracked Errors	2021-06-15 11:51:32 +02:00
Kerollmops	f0e804afd5	Rename the FieldIdMapMissingEntry from_db_name field into process	2021-06-15 11:13:04 +02:00
Kerollmops	28c004aa2c	Prefer using constant for the database names	2021-06-15 11:13:04 +02:00
Kerollmops	312c2d1d8e	Use the Error enum everywhere in the project	2021-06-14 16:58:38 +02:00
Kerollmops	ca78cb5aca	Introduce more variants to the error module enums	2021-06-14 16:58:38 +02:00
Kerollmops	456541e921	Implement the Display trait on the Error type	2021-06-14 16:48:51 +02:00
Kerollmops	44c353fafd	Introduce some way to construct an Error	2021-06-14 16:48:51 +02:00
Kerollmops	23fcf7920e	Introduce a basic version of the InternalError struct	2021-06-14 16:48:51 +02:00
Kerollmops	d2b1ecc885	Remove a lot of serialization unreachable errors	2021-06-14 16:48:51 +02:00

1525 Commits