meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-27 04:25:06 +08:00

Author	SHA1	Message	Date
Kerollmops	399eec5c01	Fix the indexation tests	2022-07-12 14:55:51 +02:00
Kerollmops	fcfc4caf8c	Move the Object type in the lib.rs file and use it everywhere	2022-07-12 14:55:51 +02:00
Kerollmops	0146175fe6	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	cefffde9af	Improve the .gitignore of the fuzz crate	2022-07-12 14:55:51 +02:00
Kerollmops	bdc4263883	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	6d0498df24	Fix the fuzz tests	2022-07-12 14:52:56 +02:00
Kerollmops	e8297ad27e	Fix the tests for the new DocumentsBatchBuilder/Reader	2022-07-12 14:52:56 +02:00
Kerollmops	419ce3966c	Rework the DocumentsBatchBuilder/Reader to use grenad	2022-07-12 14:52:55 +02:00
Kerollmops	eb63af1f10	Update grenad to 0.4.2	2022-07-12 14:52:55 +02:00
Kerollmops	048e174efb	Do not allocate when parsing CSV headers	2022-07-12 14:52:55 +02:00
ManyTheFish	5d79617a56	Chores: Enhance smart-crop code comments	2022-07-07 16:28:09 +02:00
bors[bot]	ebddfdb9a3	Merge #578 578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584). Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-05 14:56:08 +00:00
Kerollmops	1bfdcfc84f	Bump uuid to 1.1.2	2022-07-05 16:23:36 +02:00
Tamo	250be9fe6c	put the threshold back to 10k	2022-07-05 15:57:44 +02:00
Tamo	b61efd09fc	Makes the internal soft deleted error a UserError	2022-07-05 15:34:45 +02:00
Tamo	eaf28b0628	Apply review suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-05 15:30:33 +02:00
Tamo	3b309f654a	Fasten the document deletion When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search, and all the other endpoints.	2022-07-05 15:30:33 +02:00
Tamo	446439e8be	bump charabia	2022-07-05 12:19:30 +02:00
Dmytro Gordon	3ff03a3f5f	Fix not equal filter when field contains both number and strings	2022-06-27 15:55:17 +03:00
Kerollmops	cc48992e79	Bump the milli version to 0.31.1	2022-06-22 17:05:51 +02:00
Kerollmops	238692a8e7	Introduce the copy_to_path method on the Index	2022-06-22 16:49:47 +02:00
bors[bot]	290a40b7a5	Merge #564 564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter into `maxTotalHits`. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-22 13:48:33 +00:00
bors[bot]	d546f6f40e	Merge #563 563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way. The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document. The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned. If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-22 12:39:44 +00:00
Kerollmops	f5c3b951bc	Bump the milli version to 0.31.0	2022-06-22 12:08:16 +02:00
Kerollmops	d7c248042b	Rename the limitedTo parameter into maxTotalHits	2022-06-22 12:00:48 +02:00
Kerollmops	d2f84a9d9e	Improve the estimatedNbHits when distinct is enabled	2022-06-22 11:39:21 +02:00
bors[bot]	4f547eff02	Merge #560 560: Update version for next release (v0.30.0) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-20 12:37:01 +00:00
Clémentine Urquizar	31f749b5d8	Update version for next release (v0.30.0)	2022-06-20 12:09:57 +02:00
ManyTheFish	a0ab90a4d7	Avoid having an ending separator before crop marker	2022-06-16 18:23:57 +02:00
ManyTheFish	177154828c	Extends deletion tests	2022-06-13 17:34:16 +02:00
ManyTheFish	0d1d354052	Ensure that Index methods are not bypassed by Meilisearch	2022-06-13 17:34:11 +02:00
bors[bot]	f1d848bb9a	Merge #552 552: Fix escaped quotes in filter r=Kerollmops a=irevoire Will fix https://github.com/meilisearch/meilisearch/issues/2380 The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token. To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-06-09 14:56:44 +00:00
Tamo	676187ba43	bump milli version	2022-06-09 16:53:32 +02:00
Tamo	90afde435b	fix escaped quotes in filter	2022-06-09 16:03:49 +02:00
Kerollmops	445d5474cc	Add the pagination_limited_to setting to the database	2022-06-08 18:14:27 +02:00
Kerollmops	69931e50d2	Add the max_values_by_facet setting to the database	2022-06-08 17:54:56 +02:00
Kerollmops	52a494bd3b	Add the new pagination.limited_to and faceting.max_values_per_facet settings	2022-06-08 17:15:36 +02:00
bors[bot]	9580b9de79	Merge #549 549: Bump the version to 0.29.2 r=curquiza a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-08 14:29:47 +00:00
Kerollmops	56ee9cc21f	Bump the version to 0.29.2	2022-06-08 16:00:06 +02:00
Kerollmops	2a505503b3	Change the number of facet values returned by default to 100	2022-06-08 15:58:57 +02:00
Kerollmops	bae4007447	Remove the hard limit on the number of facet values returned	2022-06-08 15:58:57 +02:00
bors[bot]	7313d6c533	Merge #547 547: Update version for next release (v0.29.1) r=Kerollmops a=curquiza A new milli version will be released once this PR is merged https://github.com/meilisearch/milli/pull/543 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-08 10:20:24 +00:00
Clémentine Urquizar	478dbfa45a	Update version for next release (v0.29.1)	2022-06-07 18:59:33 +02:00
Tamo	d0aaa7ff00	Fix wrong internal ids assignments	2022-06-07 15:49:33 +02:00
ad hoc	31776fdc3f	add failing test	2022-06-07 15:49:33 +02:00
bors[bot]	05ae6dbfa4	Merge #541 541: Update version for next release (v0.29.0) r=ManyTheFish a=curquiza Need to update the version since #540 was merged and breaking Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-02 16:53:28 +00:00
ManyTheFish	d212dc6b8b	Remove useless newline	2022-06-02 18:22:56 +02:00
Clémentine Urquizar	6ce1c6487a	Update version for next release (v0.29.0)	2022-06-02 18:07:55 +02:00
ManyTheFish	7aabe42ae0	Refactor matching words	2022-06-02 17:59:04 +02:00
ManyTheFish	86ac8568e6	Use Charabia in milli	2022-06-02 16:59:11 +02:00
ManyTheFish	192e024ada	Add Charabia in Cargo.toml	2022-06-02 16:59:07 +02:00
Clémentine Urquizar	c19c17eddb	Update version to v0.28.1	2022-06-01 18:31:02 +02:00
bors[bot]	74d1914a64	Merge #535 535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops This PR reintroduces the max values by facet limit this is related to https://github.com/meilisearch/meilisearch/issues/2349. ~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~ I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help! Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-01 14:30:50 +00:00
bors[bot]	582930dbbb	Merge #538 538: speedup exact words r=Kerollmops a=MarinPostma This PR make `exact_words` return an `Option` instead of an empty set, since set creation is costly, as noticed by `@kerollmops.` I was not convinces that this was the cause for all of the performance drop we measured, and then realized that methods that initialized it were called recursively which caused initialization times to add up. While the first fix solves the issue when not using exact words, using exact word remained way more expensive that it should be. To address this issue, the exact words are cached into the `Context`, so they are only initialized once. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-30 08:20:34 +00:00
ad hoc	25fc576696	review changes	2022-05-24 14:15:33 +02:00
ad hoc	69dc4de80f	change &Option<Set> to Option<&Set>	2022-05-24 12:14:55 +02:00
ad hoc	ac975cc747	cache context's exact words	2022-05-24 09:43:17 +02:00
ad hoc	8993fec8a3	return optional exact words	2022-05-24 09:15:49 +02:00
Matthias Wright	754f48a4fb	Improves ranking rules error message	2022-05-20 21:25:43 +02:00
Kerollmops	cd7c6e19ed	Reintroduce the max values by facet limit	2022-05-18 15:57:57 +02:00
ManyTheFish	895f5d8a26	Bump milli version	2022-05-18 10:37:12 +02:00
ManyTheFish	137434a1c8	Add some implementation on MatchBounds	2022-05-17 15:57:09 +02:00
bors[bot]	08c6d50cd1	Merge #531 531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire port #529 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 16:06:36 +00:00
bors[bot]	cf3e574cb4	Merge #530 530: fix the searchable fields bug when a field is nested r=Kerollmops a=irevoire port #528 to main Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-16 15:52:30 +00:00
Tamo	0af399a6d7	fix the mixed dataset geosearch indexing bug	2022-05-16 17:37:45 +02:00
Tamo	f586028f9a	fix the searchable fields bug when a field is nested Update milli/src/index.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-05-16 17:24:36 +02:00
bors[bot]	e1e85267fd	Merge #526 526: remove useless comment r=irevoire a=MarinPostma Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-16 10:01:43 +00:00
bors[bot]	51809eb260	Merge #525 525: Simplify the error creation with thiserror r=irevoire a=irevoire I introduced [`thiserror`](https://docs.rs/thiserror/latest/thiserror/) to implements all the `Display` trait and most of the `impl From<xxx> for yyy` in way less lines. And then I introduced a cute macro to implements the `impl<X, Y, Z> From<X> for Z where Y: From<X>, Z: From<X>` more easily. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-04 15:47:32 +00:00
Tamo	484a9ddb27	Simplify the error creation with thiserror and a smol friendly macro	2022-05-04 17:24:00 +02:00
bors[bot]	65e6aa0de2	Merge #523 523: Improve geosearch error messages r=irevoire a=irevoire Improve the geosearch error messages (#488). And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-05-04 13:36:11 +00:00
Tamo	c55368ddd4	apply code suggestion Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-05-04 14:11:03 +02:00
ad hoc	5ad5d56f7e	remove useless comment	2022-05-04 10:43:54 +02:00
bors[bot]	0c2c8af44e	Merge #520 520: fix mistake in Settings initialization r=irevoire a=MarinPostma fix settings not being correctly initialized and add a test to make sure that they are in the future. fix https://github.com/meilisearch/meilisearch/issues/2358 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-05-03 15:32:18 +00:00
Kerollmops	211c8763b9	Make sure that we do not generate too long keys	2022-05-03 10:03:15 +02:00
Kerollmops	7e47031bdc	Add a test for long keys in LMDB	2022-05-03 10:03:13 +02:00
Tamo	3cb1f6d0a1	improve geosearch error messages	2022-05-02 19:20:47 +02:00
ad hoc	1ee3d6ae33	fix mistake in Settings initialization	2022-04-29 16:24:25 +02:00
bors[bot]	9db86aac51	Merge #518 518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue when Meilisearch is up-to-date with this PR. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-04-28 09:04:36 +00:00
Kerollmops	a4d343aade	Add a test to check for the returned facet distribution	2022-04-26 18:12:58 +02:00
bors[bot]	c2bd94c871	Merge #511 511: Update version in every workspace r=curquiza a=curquiza Checked with `@Kerollmops` - Update the version into every workspace (the current version is v0.27.0, but I forgot to update it for the previous release) - add `publish = false` except in `milli` workspace. Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-26 16:06:47 +00:00
Kerollmops	7d1c2d97bf	Return facets even when there is no values associated to it	2022-04-26 17:59:53 +02:00
bors[bot]	d388ea0f9d	Merge #506 506: fix cargo warnings r=Kerollmops a=MarinPostma fix cargo warnings Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-26 15:45:20 +00:00
ad hoc	5c29258e8e	fix cargo warnings	2022-04-26 17:33:11 +02:00
bors[bot]	2fdf520271	Merge #514 514: Stop flattening every field r=Kerollmops a=irevoire When we need to flatten a document: * The primary key contains a `.`. * Some fields need to be flattened Instead of flattening the whole object and thus creating a lot of allocations with the `serde_json_flatten_crate`, we instead generate a minimal sub-object containing only the fields that need to be flattened. That should create fewer allocations and thus index faster. --------- ``` group indexing_main_e1e362fa indexing_stop-flattening-every-field_40d1bd6b ----- ---------------------- --------------------------------------------- indexing/Indexing geo_point 1.99 23.7±0.23s ? ?/sec 1.00 11.9±0.21s ? ?/sec indexing/Indexing movies in three batches 1.00 18.2±0.24s ? ?/sec 1.01 18.3±0.29s ? ?/sec indexing/Indexing movies with default settings 1.00 17.5±0.09s ? ?/sec 1.01 17.7±0.26s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 64.8±0.47s ? ?/sec 1.00 65.1±0.49s ? ?/sec indexing/Indexing songs with default settings 1.00 54.9±0.99s ? ?/sec 1.01 55.7±1.34s ? ?/sec indexing/Indexing songs without any facets 1.00 50.6±0.62s ? ?/sec 1.01 50.9±1.05s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 54.0±1.14s ? ?/sec 1.01 54.7±1.13s ? ?/sec indexing/Indexing wiki 1.00 996.2±8.54s ? ?/sec 1.02 1021.1±30.63s ? ?/sec indexing/Indexing wiki in three batches 1.00 1136.8±9.72s ? ?/sec 1.00 1138.6±6.59s ? ?/sec ``` So basically everything slowed down a liiiiiittle bit except the dataset with a nested field which got twice faster Co-authored-by: Tamo <tamo@meilisearch.com>	2022-04-26 11:50:33 +00:00
Tamo	f19d2dc548	Only flatten the required fields apply review comments Co-authored-by: Kerollmops <kero@meilisearch.com>	2022-04-26 12:33:46 +02:00
Clémentine Urquizar	d138b3c704	Update version	2022-04-25 18:43:46 +02:00
Tamo	fa6f495662	fix the indexing fuzzer	2022-04-25 18:32:06 +02:00
bors[bot]	8010eca9c7	Merge #505 505: normalize exact words r=curquiza a=MarinPostma Normalize the exact words, as specified in the specification. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-04-25 09:35:32 +00:00
ad hoc	2e0089d5ff	normalize exact words	2022-04-21 15:38:40 +02:00
ad hoc	3a2451fcba	add test normalize exact words	2022-04-21 13:52:09 +02:00
Clément Renault	eb5830aa40	Add a test to make sure that long words are handled	2022-04-21 13:45:28 +02:00
ad hoc	8b14090927	fix min-word-len-for-typo not reset properly	2022-04-19 15:20:16 +02:00
bors[bot]	ea4bb9402f	Merge #483 483: Enhance matching words r=Kerollmops a=ManyTheFish # Summary Enhance milli word-matcher making it handle match computing and cropping. # Implementation ## Computing best matches for cropping Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched. Now we are searching for the best matches interval to crop around, the chosen interval is the one: 1) that have the highest count of unique matches > for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better. 2) that have the minimum distance between matches > for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better. 3) that have the highest count of ordered matches > for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better. ## Cropping around the best matches interval Before we were cropping around the interval without checking the context. Now we are cropping around words in the same context as matching words. This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot. > For instance, for the matching word `Split` the text: `Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.` will be cropped like: `…. Split The World is a book written by Emily Henry. …` and not like: `Natalie risk her future. Split The World is a book …` Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-04-19 11:42:32 +00:00
ManyTheFish	f1115e274f	Use Copy impl of FormatOption instead of clonning	2022-04-19 10:35:50 +02:00
Clémentine Urquizar	8d630a6f62	Update version for the next release (v0.26.1)	2022-04-14 11:44:06 +02:00
Tamo	00f78d6b5a	Apply code suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-04-14 11:14:08 +02:00
Tamo	399fba16bb	only flatten an object if it's nested	2022-04-14 11:14:08 +02:00
Tamo	ee64f4a936	Use smartstring to store the external id in our hashmap We need to store all the external id (primary key) in a hashmap associated to their internal id during. The smartstring remove heap allocation / memory usage and should improve the cache locality.	2022-04-13 21:22:07 +02:00
ad hoc	dda28d7415	exclude excluded canditates from search result candidates	2022-04-13 12:10:35 +02:00
ad hoc	cd83014fff	add test for disctinct nb hits	2022-04-13 12:10:35 +02:00
ad hoc	bbb6728d2f	add distinct attributes to cli	2022-04-13 12:10:35 +02:00
ManyTheFish	5809d3ae0d	Add first benchmarks on formatting	2022-04-12 16:31:58 +02:00
ManyTheFish	827cedcd15	Add format option structure	2022-04-12 13:42:14 +02:00
ManyTheFish	011f8210ed	Make compute_matches more rust idiomatic	2022-04-12 10:19:02 +02:00
ManyTheFish	a16de5de84	Symplify format and remove intermediate function	2022-04-08 11:20:41 +02:00
ManyTheFish	a769e09dfa	Make token_crop_bounds more rust idiomatic	2022-04-07 20:15:14 +02:00
bors[bot]	9ac2fd1c37	Merge #487 487: Update version (v0.26.0) r=Kerollmops a=curquiza breaking because of #458 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-07 17:10:24 +00:00
Tamo	bab898ce86	move the flatten-serde-json crate inside of milli	2022-04-07 18:20:44 +02:00
ManyTheFish	c8ed1675a7	Add some documentation	2022-04-07 17:32:13 +02:00
ManyTheFish	b1905dfa24	Make split_best_frequency returns references instead of owned data	2022-04-07 17:05:44 +02:00
Tamo	ab458d8840	fix tests after rebase	2022-04-07 17:00:00 +02:00
Irevoire	4f3ce6d9cd	nested fields	2022-04-07 16:58:46 +02:00
Clémentine Urquizar	ee1d627803	Update version (v0.26.0)	2022-04-07 15:56:10 +02:00
bors[bot]	4ae7aea3b2	Merge #486 486: Update version (v0.25.0) r=curquiza a=curquiza v0.25.0 will be released once #478 is merged Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-04-06 11:40:41 +00:00
ad hoc	b799f3326b	rename merge_nothing to merge_ignore_values	2022-04-05 18:44:35 +02:00
ManyTheFish	fa7d3a37c0	Make some cleaning and add comments	2022-04-05 17:48:56 +02:00
ManyTheFish	3bb1e35ada	Fix match count	2022-04-05 17:48:45 +02:00
ManyTheFish	56e0edd621	Put crop markers direclty around words	2022-04-05 17:41:32 +02:00
ManyTheFish	a93cd8c61c	Fix prefix highlight with special chars	2022-04-05 17:41:32 +02:00
ManyTheFish	b3f0f39106	Make some cleaning	2022-04-05 17:41:32 +02:00
ManyTheFish	6dc345bc53	Test and Fix prefix highlight	2022-04-05 17:41:32 +02:00
ManyTheFish	bd30ee97b8	Keep separators at start of the croped string	2022-04-05 17:41:32 +02:00
ManyTheFish	29c5f76d7f	Use new matcher in http-ui	2022-04-05 17:41:32 +02:00
ManyTheFish	734d0899d3	Publish Matcher	2022-04-05 17:41:32 +02:00
ManyTheFish	4428cb5909	Add some tests and fix some corner cases	2022-04-05 17:41:32 +02:00
ManyTheFish	844f546a8b	Add matches algorithm V1	2022-04-05 17:41:32 +02:00
ManyTheFish	3be1790803	Add crop algorithm with naive match algorithm	2022-04-05 17:41:32 +02:00
ManyTheFish	d96e72e5dc	Create formater with some tests	2022-04-05 17:41:32 +02:00
ad hoc	201fea0fda	limit extract_word_docids memory usage	2022-04-05 14:14:15 +02:00
ad hoc	5cfd3d8407	add exact attributes documentation	2022-04-05 14:10:22 +02:00
Clémentine Urquizar	9eec44dd98	Update version (v0.25.0)	2022-04-05 12:06:42 +02:00
ad hoc	b85cd4983e	remove field_id_from_position	2022-04-05 09:50:34 +02:00
ad hoc	ab185a59b5	fix infos	2022-04-05 09:46:56 +02:00
ad hoc	59e41d98e3	add comments to integration test	2022-04-04 21:17:06 +02:00
ad hoc	1810927dbd	rephrase exact_attributes doc	2022-04-04 21:04:49 +02:00
ad hoc	b7694c34f5	remove println	2022-04-04 21:00:07 +02:00
ad hoc	6cabd47c32	fix typo in comment	2022-04-04 20:59:20 +02:00
ad hoc	c8d3a09af8	add integration test for disabel typo on attributes	2022-04-04 20:54:03 +02:00
ad hoc	6b2c2509b2	fix bug in exact search	2022-04-04 20:54:03 +02:00
ad hoc	56b4f5dce2	add exact prefix to query_docids	2022-04-04 20:54:03 +02:00
ad hoc	21ae4143b1	add exact_word_prefix to Context	2022-04-04 20:54:03 +02:00
ad hoc	e8f06f6c06	extract exact_word_prefix_docids	2022-04-04 20:54:03 +02:00
ad hoc	6dd2e4ffbd	introduce exact_word_prefix database in index	2022-04-04 20:54:03 +02:00
ad hoc	ba0bb29cd8	refactor WordPrefixDocids to take dbs instead of indexes	2022-04-04 20:54:02 +02:00
ad hoc	c4c6e35352	query exact_word_docids in resolve_query_tree	2022-04-04 20:54:02 +02:00
ad hoc	8d46a5b0b5	extract exact word docids	2022-04-04 20:54:02 +02:00
ad hoc	5451c64d5d	increase criteria asc desc test map size	2022-04-04 20:54:02 +02:00
ad hoc	0a77be4ec0	introduce exact_word_docids db	2022-04-04 20:54:02 +02:00
ad hoc	5f9f82757d	refactor spawn_extraction_task	2022-04-04 20:54:02 +02:00
ad hoc	f82d4b36eb	introduce exact attribute setting	2022-04-04 20:54:02 +02:00
ad hoc	c882d8daf0	add test for exact words	2022-04-04 20:54:01 +02:00
ad hoc	7e9d56a9e7	disable typos on exact words	2022-04-04 20:54:01 +02:00
ad hoc	3e67d8818c	fix typo in test comment	2022-04-04 20:34:23 +02:00
ad hoc	284d8a24e0	add intergration test for disabled typon on word	2022-04-04 20:15:51 +02:00
ad hoc	30a2711bac	rename serde module to serde_impl module needed because of issues with rustfmt	2022-04-04 20:10:55 +02:00
ad hoc	0fd55db21c	fmt	2022-04-04 20:10:55 +02:00
ad hoc	559e46be5e	fix bad rebase bug	2022-04-04 20:10:55 +02:00
ad hoc	8b1e5d9c6d	add test for exact words	2022-04-04 20:10:55 +02:00
ad hoc	774fa8f065	disable typos on exact words	2022-04-04 20:10:55 +02:00
ad hoc	9bbffb8fee	add exact words setting	2022-04-04 20:10:54 +02:00
ad hoc	853b4a520f	fmt	2022-04-04 10:41:46 +02:00
ad hoc	2cb71dff4a	add typo integration tests	2022-04-04 10:41:46 +02:00
ad hoc	1941072bb2	implement Copy on Setting	2022-04-04 10:41:46 +02:00
ad hoc	fdaf45aab2	replace hardcoded value with constant in TestContext	2022-04-04 10:41:46 +02:00
ad hoc	950a740bd4	refactor typos for readability	2022-04-04 10:41:46 +02:00
ad hoc	66020cd923	rename min_word_len* to use plain letter numbers	2022-04-04 10:41:46 +02:00
ad hoc	4c4b336ecb	rename min word len for typo error	2022-04-01 11:17:03 +02:00
ad hoc	286dd7b2e4	rename min_word_len_2_typo	2022-04-01 11:17:03 +02:00
ad hoc	55af85db3c	add tests for min_word_len_for_typo	2022-04-01 11:17:02 +02:00
ad hoc	9102de5500	fix error message	2022-04-01 11:17:02 +02:00
ad hoc	a1a3a49bc9	dynamic minimum word len for typos in query tree builder	2022-04-01 11:17:02 +02:00
ad hoc	5a24e60572	introduce word len for typo setting	2022-04-01 11:17:02 +02:00
ad hoc	9fe40df960	add word derivations tests	2022-04-01 11:05:18 +02:00
ad hoc	d5ddc6b080	fix 2 typos word derivation bug	2022-04-01 10:51:22 +02:00
ad hoc	3e34981d9b	add test for authorize_typos in update	2022-03-31 14:12:00 +02:00
ad hoc	6ef3bb9d83	fmt	2022-03-31 14:06:23 +02:00
ad hoc	f782fe2062	add authorize_typo_test	2022-03-31 10:08:39 +02:00
ad hoc	c4653347fd	add authorize typo setting	2022-03-31 10:05:44 +02:00
Clémentine Urquizar	ddf78a735b	Update version (v0.24.1)	2022-03-24 16:39:45 +01:00
Irevoire	86dd88698d	bump tokenizer	2022-03-23 14:25:58 +01:00
Irevoire	5dc464b9a7	rollback meilisearch-tokenizer version	2022-03-21 17:29:10 +01:00
bors[bot]	90276d9a2d	Merge #472 472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function. Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-03-16 15:33:11 +00:00
ManyTheFish	49d59d88c2	Remove useless variables in proximity	2022-03-16 16:12:52 +01:00
Bruno Casali	adc71742c8	Move string concat to the struct instead of in the calling	2022-03-16 10:26:12 -03:00
Bruno Casali	4822fe1beb	Add a better error message when the filterable attrs are empty Fixes https://github.com/meilisearch/meilisearch/issues/2140	2022-03-15 18:13:59 -03:00
bors[bot]	f04ab67083	Merge #466 466: Bump version to 0.23.1 r=curquiza a=Kerollmops This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-03-15 17:19:05 +00:00
bors[bot]	ad4c982c68	Merge #439 439: Optimize typo criterion r=Kerollmops a=MarinPostma This pr implements a couple of optimization for the typo criterion: - clamp max typo on concatenated query words to 1: By considering that a concatenated query word is a typo, we clamp the max number of typos allowed o it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2 typo words. - Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word. ## benches ``` group main typo ----- ---- ---- smol-songs.csv: asc + default/Notstandskomitee 2.51 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec smol-songs.csv: asc + default/charles 2.48 3.0±0.01ms ? ?/sec 1.00 1190.9±1.29µs ? ?/sec smol-songs.csv: asc + default/charles mingus 5.56 10.8±0.01ms ? ?/sec 1.00 1935.3±1.00µs ? ?/sec smol-songs.csv: asc + default/david 1.65 3.9±0.00ms ? ?/sec 1.00 2.4±0.01ms ? ?/sec smol-songs.csv: asc + default/david bowie 3.34 12.5±0.02ms ? ?/sec 1.00 3.7±0.00ms ? ?/sec smol-songs.csv: asc + default/john 1.00 1849.7±3.74µs ? ?/sec 1.01 1875.1±4.65µs ? ?/sec smol-songs.csv: asc + default/marcus miller 4.32 15.7±0.01ms ? ?/sec 1.00 3.6±0.01ms ? ?/sec smol-songs.csv: asc + default/michael jackson 3.31 12.5±0.01ms ? ?/sec 1.00 3.8±0.00ms ? ?/sec smol-songs.csv: asc + default/tamo 1.05 565.4±0.86µs ? ?/sec 1.00 539.3±1.22µs ? ?/sec smol-songs.csv: asc + default/thelonious monk 3.49 11.5±0.01ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec smol-songs.csv: asc/Notstandskomitee 2.59 5.6±0.02ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec smol-songs.csv: asc/charles 6.05 2.1±0.00ms ? ?/sec 1.00 347.8±0.60µs ? ?/sec smol-songs.csv: asc/charles mingus 14.46 9.4±0.01ms ? ?/sec 1.00 649.2±0.97µs ? ?/sec smol-songs.csv: asc/david 3.87 2.4±0.00ms ? ?/sec 1.00 618.2±0.69µs ? ?/sec smol-songs.csv: asc/david bowie 10.14 9.8±0.01ms ? ?/sec 1.00 970.8±1.55µs ? ?/sec smol-songs.csv: asc/john 1.00 546.5±1.10µs ? ?/sec 1.00 547.1±2.11µs ? ?/sec smol-songs.csv: asc/marcus miller 11.45 10.4±0.06ms ? ?/sec 1.00 907.9±1.37µs ? ?/sec smol-songs.csv: asc/michael jackson 10.56 9.7±0.01ms ? ?/sec 1.00 919.6±1.03µs ? ?/sec smol-songs.csv: asc/tamo 1.03 43.3±0.18µs ? ?/sec 1.00 42.2±0.23µs ? ?/sec smol-songs.csv: asc/thelonious monk 4.16 10.7±0.02ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 95.7±0.20µs ? ?/sec 1.15 109.6±10.40µs ? ?/sec smol-songs.csv: basic filter: <=/charles 1.00 27.8±0.15µs ? ?/sec 1.01 27.9±0.18µs ? ?/sec smol-songs.csv: basic filter: <=/charles mingus 1.72 119.2±0.67µs ? ?/sec 1.00 69.1±0.13µs ? ?/sec smol-songs.csv: basic filter: <=/david 1.00 22.3±0.33µs ? ?/sec 1.05 23.4±0.19µs ? ?/sec smol-songs.csv: basic filter: <=/david bowie 1.59 86.9±0.79µs ? ?/sec 1.00 54.5±0.31µs ? ?/sec smol-songs.csv: basic filter: <=/john 1.00 17.9±0.06µs ? ?/sec 1.06 18.9±0.15µs ? ?/sec smol-songs.csv: basic filter: <=/marcus miller 1.65 102.7±1.63µs ? ?/sec 1.00 62.3±0.18µs ? ?/sec smol-songs.csv: basic filter: <=/michael jackson 1.76 128.2±1.85µs ? ?/sec 1.00 72.9±0.19µs ? ?/sec smol-songs.csv: basic filter: <=/tamo 1.00 17.9±0.13µs ? ?/sec 1.05 18.7±0.20µs ? ?/sec smol-songs.csv: basic filter: <=/thelonious monk 1.53 157.5±2.38µs ? ?/sec 1.00 102.8±0.88µs ? ?/sec smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 100.9±4.36µs ? ?/sec 1.04 105.0±8.25µs ? ?/sec smol-songs.csv: basic filter: TO/charles 1.00 28.4±0.36µs ? ?/sec 1.03 29.4±0.33µs ? ?/sec smol-songs.csv: basic filter: TO/charles mingus 1.71 118.1±1.08µs ? ?/sec 1.00 68.9±0.26µs ? ?/sec smol-songs.csv: basic filter: TO/david 1.00 24.0±0.26µs ? ?/sec 1.03 24.6±0.43µs ? ?/sec smol-songs.csv: basic filter: TO/david bowie 1.72 95.2±0.30µs ? ?/sec 1.00 55.2±0.14µs ? ?/sec smol-songs.csv: basic filter: TO/john 1.00 18.8±0.09µs ? ?/sec 1.06 19.8±0.17µs ? ?/sec smol-songs.csv: basic filter: TO/marcus miller 1.61 102.4±1.65µs ? ?/sec 1.00 63.4±0.24µs ? ?/sec smol-songs.csv: basic filter: TO/michael jackson 1.77 132.1±1.41µs ? ?/sec 1.00 74.5±0.59µs ? ?/sec smol-songs.csv: basic filter: TO/tamo 1.00 18.2±0.14µs ? ?/sec 1.05 19.2±0.46µs ? ?/sec smol-songs.csv: basic filter: TO/thelonious monk 1.49 150.8±1.92µs ? ?/sec 1.00 101.3±0.44µs ? ?/sec smol-songs.csv: basic placeholder/ 1.00 27.3±0.07µs ? ?/sec 1.03 28.0±0.05µs ? ?/sec smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 122.4±0.17µs ? ?/sec 1.03 125.6±0.16µs ? ?/sec smol-songs.csv: basic with quote/"charles" 1.00 88.8±0.30µs ? ?/sec 1.00 88.4±0.15µs ? ?/sec smol-songs.csv: basic with quote/"charles" "mingus" 1.00 685.2±0.74µs ? ?/sec 1.01 689.4±6.07µs ? ?/sec smol-songs.csv: basic with quote/"david" 1.00 161.6±0.42µs ? ?/sec 1.01 162.6±0.17µs ? ?/sec smol-songs.csv: basic with quote/"david" "bowie" 1.00 731.7±0.73µs ? ?/sec 1.02 743.1±0.77µs ? ?/sec smol-songs.csv: basic with quote/"john" 1.00 267.1±0.33µs ? ?/sec 1.01 270.9±0.33µs ? ?/sec smol-songs.csv: basic with quote/"marcus" "miller" 1.00 138.7±0.31µs ? ?/sec 1.02 140.9±0.13µs ? ?/sec smol-songs.csv: basic with quote/"michael" "jackson" 1.01 841.4±0.72µs ? ?/sec 1.00 833.8±0.92µs ? ?/sec smol-songs.csv: basic with quote/"tamo" 1.01 189.2±0.26µs ? ?/sec 1.00 188.2±0.71µs ? ?/sec smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1100.5±1.36µs ? ?/sec 1.01 1111.7±2.17µs ? ?/sec smol-songs.csv: basic without quote/Notstandskomitee 3.40 7.9±0.02ms ? ?/sec 1.00 2.3±0.02ms ? ?/sec smol-songs.csv: basic without quote/charles 2.57 494.4±0.89µs ? ?/sec 1.00 192.5±0.18µs ? ?/sec smol-songs.csv: basic without quote/charles mingus 1.29 2.8±0.02ms ? ?/sec 1.00 2.1±0.01ms ? ?/sec smol-songs.csv: basic without quote/david 1.95 623.8±0.90µs ? ?/sec 1.00 319.2±1.22µs ? ?/sec smol-songs.csv: basic without quote/david bowie 1.12 5.9±0.00ms ? ?/sec 1.00 5.2±0.00ms ? ?/sec smol-songs.csv: basic without quote/john 1.24 1340.9±2.25µs ? ?/sec 1.00 1084.7±7.76µs ? ?/sec smol-songs.csv: basic without quote/marcus miller 7.97 14.6±0.01ms ? ?/sec 1.00 1826.0±6.84µs ? ?/sec smol-songs.csv: basic without quote/michael jackson 1.19 3.9±0.00ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec smol-songs.csv: basic without quote/tamo 1.65 737.7±3.58µs ? ?/sec 1.00 446.7±0.51µs ? ?/sec smol-songs.csv: basic without quote/thelonious monk 1.16 4.5±0.02ms ? ?/sec 1.00 3.9±0.04ms ? ?/sec smol-songs.csv: big filter/Notstandskomitee 3.27 7.6±0.02ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec smol-songs.csv: big filter/charles 8.26 1957.5±1.37µs ? ?/sec 1.00 236.8±0.34µs ? ?/sec smol-songs.csv: big filter/charles mingus 18.49 11.2±0.06ms ? ?/sec 1.00 607.7±3.03µs ? ?/sec smol-songs.csv: big filter/david 3.78 2.4±0.00ms ? ?/sec 1.00 622.8±0.80µs ? ?/sec smol-songs.csv: big filter/david bowie 9.00 12.0±0.01ms ? ?/sec 1.00 1336.0±3.17µs ? ?/sec smol-songs.csv: big filter/john 1.00 554.2±0.95µs ? ?/sec 1.01 560.4±0.79µs ? ?/sec smol-songs.csv: big filter/marcus miller 18.09 12.0±0.01ms ? ?/sec 1.00 664.7±0.60µs ? ?/sec smol-songs.csv: big filter/michael jackson 8.43 12.0±0.01ms ? ?/sec 1.00 1421.6±1.37µs ? ?/sec smol-songs.csv: big filter/tamo 1.00 86.3±0.14µs ? ?/sec 1.01 87.3±0.21µs ? ?/sec smol-songs.csv: big filter/thelonious monk 5.55 14.3±0.02ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec smol-songs.csv: desc + default/Notstandskomitee 2.52 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec smol-songs.csv: desc + default/charles 3.04 2.7±0.01ms ? ?/sec 1.00 893.4±1.08µs ? ?/sec smol-songs.csv: desc + default/charles mingus 6.77 10.3±0.01ms ? ?/sec 1.00 1520.8±1.90µs ? ?/sec smol-songs.csv: desc + default/david 1.39 5.7±0.00ms ? ?/sec 1.00 4.1±0.00ms ? ?/sec smol-songs.csv: desc + default/david bowie 2.34 15.8±0.02ms ? ?/sec 1.00 6.7±0.01ms ? ?/sec smol-songs.csv: desc + default/john 1.00 2.5±0.00ms ? ?/sec 1.02 2.6±0.01ms ? ?/sec smol-songs.csv: desc + default/marcus miller 5.06 14.5±0.02ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec smol-songs.csv: desc + default/michael jackson 2.64 14.1±0.05ms ? ?/sec 1.00 5.4±0.00ms ? ?/sec smol-songs.csv: desc + default/tamo 1.00 567.0±0.65µs ? ?/sec 1.00 565.7±0.97µs ? ?/sec smol-songs.csv: desc + default/thelonious monk 3.55 11.6±0.02ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec smol-songs.csv: desc/Notstandskomitee 2.58 5.6±0.02ms ? ?/sec 1.00 2.2±0.02ms ? ?/sec smol-songs.csv: desc/charles 6.04 2.1±0.00ms ? ?/sec 1.00 348.1±0.57µs ? ?/sec smol-songs.csv: desc/charles mingus 14.51 9.4±0.01ms ? ?/sec 1.00 646.7±0.99µs ? ?/sec smol-songs.csv: desc/david 3.86 2.4±0.00ms ? ?/sec 1.00 620.7±2.46µs ? ?/sec smol-songs.csv: desc/david bowie 10.10 9.8±0.01ms ? ?/sec 1.00 973.9±3.31µs ? ?/sec smol-songs.csv: desc/john 1.00 545.5±0.78µs ? ?/sec 1.00 547.2±0.48µs ? ?/sec smol-songs.csv: desc/marcus miller 11.39 10.3±0.01ms ? ?/sec 1.00 903.7±0.95µs ? ?/sec smol-songs.csv: desc/michael jackson 10.51 9.7±0.01ms ? ?/sec 1.00 924.7±2.02µs ? ?/sec smol-songs.csv: desc/tamo 1.01 43.2±0.33µs ? ?/sec 1.00 42.6±0.35µs ? ?/sec smol-songs.csv: desc/thelonious monk 4.19 10.8±0.03ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec smol-songs.csv: prefix search/a 1.00 1008.7±1.00µs ? ?/sec 1.00 1005.5±0.91µs ? ?/sec smol-songs.csv: prefix search/b 1.00 885.0±0.70µs ? ?/sec 1.01 890.6±1.11µs ? ?/sec smol-songs.csv: prefix search/i 1.00 1051.8±1.25µs ? ?/sec 1.00 1056.6±4.12µs ? ?/sec smol-songs.csv: prefix search/s 1.00 724.7±1.77µs ? ?/sec 1.00 721.6±0.59µs ? ?/sec smol-songs.csv: prefix search/x 1.01 212.4±0.21µs ? ?/sec 1.00 210.9±0.38µs ? ?/sec smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 18.55 48.5±0.09ms ? ?/sec 1.00 2.6±0.03ms ? ?/sec smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 8.41 56.7±0.45ms ? ?/sec 1.00 6.7±0.05ms ? ?/sec smol-songs.csv: proximity/Under Great Northern Lights 15.74 38.9±0.14ms ? ?/sec 1.00 2.5±0.00ms ? ?/sec smol-songs.csv: proximity/black saint sinner lady 11.82 40.1±0.13ms ? ?/sec 1.00 3.4±0.02ms ? ?/sec smol-songs.csv: proximity/les dangeureuses 1960 6.90 26.1±0.13ms ? ?/sec 1.00 3.8±0.04ms ? ?/sec smol-songs.csv: typo/Arethla Franklin 14.93 5.8±0.01ms ? ?/sec 1.00 390.1±1.89µs ? ?/sec smol-songs.csv: typo/Disnaylande 3.18 7.3±0.01ms ? ?/sec 1.00 2.3±0.00ms ? ?/sec smol-songs.csv: typo/dire straights 5.55 15.2±0.02ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec smol-songs.csv: typo/fear of the duck 28.03 20.0±0.03ms ? ?/sec 1.00 713.3±1.54µs ? ?/sec smol-songs.csv: typo/indochie 19.25 1851.4±2.38µs ? ?/sec 1.00 96.2±0.13µs ? ?/sec smol-songs.csv: typo/indochien 14.66 1887.7±3.18µs ? ?/sec 1.00 128.8±0.18µs ? ?/sec smol-songs.csv: typo/klub des loopers 37.73 18.0±0.02ms ? ?/sec 1.00 476.7±0.73µs ? ?/sec smol-songs.csv: typo/michel depech 10.17 5.8±0.01ms ? ?/sec 1.00 565.8±1.16µs ? ?/sec smol-songs.csv: typo/mongus 15.33 1897.4±3.44µs ? ?/sec 1.00 123.8±0.13µs ? ?/sec smol-songs.csv: typo/stromal 14.63 1859.3±2.40µs ? ?/sec 1.00 127.1±0.29µs ? ?/sec smol-songs.csv: typo/the white striper 10.83 9.4±0.01ms ? ?/sec 1.00 866.0±0.98µs ? ?/sec smol-songs.csv: typo/thelonius monk 14.40 3.8±0.00ms ? ?/sec 1.00 261.5±1.30µs ? ?/sec smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 5.54 70.8±0.09ms ? ?/sec 1.00 12.8±0.03ms ? ?/sec smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 3.48 119.8±0.14ms ? ?/sec 1.00 34.4±0.04ms ? ?/sec smol-songs.csv: words/The Disneyland Children's Sing-Alone song 8.98 71.9±0.12ms ? ?/sec 1.00 8.0±0.01ms ? ?/sec smol-songs.csv: words/les liaisons dangeureuses 1793 11.88 37.4±0.07ms ? ?/sec 1.00 3.1±0.01ms ? ?/sec smol-songs.csv: words/seven nation mummy 22.86 23.4±0.04ms ? ?/sec 1.00 1024.8±1.57µs ? ?/sec smol-songs.csv: words/the black saint and the sinner lady and the good doggo 2.76 124.4±0.15ms ? ?/sec 1.00 45.1±0.09ms ? ?/sec smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 2.52 107.0±0.23ms ? ?/sec 1.00 42.4±0.66ms ? ?/sec group main-wiki typo-wiki ----- --------- --------- smol-wiki-articles.csv: basic placeholder/ 1.02 13.7±0.02µs ? ?/sec 1.00 13.4±0.03µs ? ?/sec smol-wiki-articles.csv: basic with quote/"film" 1.02 409.8±0.67µs ? ?/sec 1.00 402.6±0.48µs ? ?/sec smol-wiki-articles.csv: basic with quote/"france" 1.00 325.9±0.91µs ? ?/sec 1.00 326.4±0.49µs ? ?/sec smol-wiki-articles.csv: basic with quote/"japan" 1.00 218.4±0.26µs ? ?/sec 1.01 220.5±0.20µs ? ?/sec smol-wiki-articles.csv: basic with quote/"machine" 1.00 143.0±0.12µs ? ?/sec 1.04 148.8±0.21µs ? ?/sec smol-wiki-articles.csv: basic with quote/"miles" "davis" 1.00 11.7±0.06ms ? ?/sec 1.00 11.8±0.01ms ? ?/sec smol-wiki-articles.csv: basic with quote/"mingus" 1.00 4.4±0.03ms ? ?/sec 1.00 4.4±0.00ms ? ?/sec smol-wiki-articles.csv: basic with quote/"rock" "and" "roll" 1.00 43.5±0.08ms ? ?/sec 1.01 43.8±0.06ms ? ?/sec smol-wiki-articles.csv: basic with quote/"spain" 1.00 137.3±0.35µs ? ?/sec 1.05 144.4±0.23µs ? ?/sec smol-wiki-articles.csv: basic without quote/film 1.00 125.3±0.30µs ? ?/sec 1.06 133.1±0.37µs ? ?/sec smol-wiki-articles.csv: basic without quote/france 1.21 1782.6±1.65µs ? ?/sec 1.00 1477.0±1.39µs ? ?/sec smol-wiki-articles.csv: basic without quote/japan 1.28 1363.9±0.80µs ? ?/sec 1.00 1064.3±1.79µs ? ?/sec smol-wiki-articles.csv: basic without quote/machine 1.73 760.3±0.81µs ? ?/sec 1.00 439.6±0.75µs ? ?/sec smol-wiki-articles.csv: basic without quote/miles davis 1.03 17.0±0.03ms ? ?/sec 1.00 16.5±0.02ms ? ?/sec smol-wiki-articles.csv: basic without quote/mingus 1.07 5.3±0.01ms ? ?/sec 1.00 5.0±0.00ms ? ?/sec smol-wiki-articles.csv: basic without quote/rock and roll 1.01 63.9±0.18ms ? ?/sec 1.00 63.0±0.07ms ? ?/sec smol-wiki-articles.csv: basic without quote/spain 2.07 667.4±0.93µs ? ?/sec 1.00 322.8±0.29µs ? ?/sec smol-wiki-articles.csv: prefix search/c 1.00 343.1±0.47µs ? ?/sec 1.00 344.0±0.34µs ? ?/sec smol-wiki-articles.csv: prefix search/g 1.00 374.4±3.42µs ? ?/sec 1.00 374.1±0.44µs ? ?/sec smol-wiki-articles.csv: prefix search/j 1.00 359.9±0.31µs ? ?/sec 1.00 361.2±0.79µs ? ?/sec smol-wiki-articles.csv: prefix search/q 1.01 102.0±0.12µs ? ?/sec 1.00 101.4±0.32µs ? ?/sec smol-wiki-articles.csv: prefix search/t 1.00 536.7±1.39µs ? ?/sec 1.00 534.3±0.84µs ? ?/sec smol-wiki-articles.csv: prefix search/x 1.00 400.9±1.00µs ? ?/sec 1.00 399.5±0.45µs ? ?/sec smol-wiki-articles.csv: proximity/april paris 3.86 14.4±0.01ms ? ?/sec 1.00 3.7±0.01ms ? ?/sec smol-wiki-articles.csv: proximity/diesel engine 12.98 10.4±0.01ms ? ?/sec 1.00 803.5±1.13µs ? ?/sec smol-wiki-articles.csv: proximity/herald sings 1.00 12.7±0.06ms ? ?/sec 5.29 67.1±0.09ms ? ?/sec smol-wiki-articles.csv: proximity/tea two 6.48 1452.1±2.78µs ? ?/sec 1.00 224.1±0.38µs ? ?/sec smol-wiki-articles.csv: typo/Disnaylande 3.89 8.5±0.01ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec smol-wiki-articles.csv: typo/aritmetric 3.78 10.3±0.01ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec smol-wiki-articles.csv: typo/linax 8.91 1426.7±0.97µs ? ?/sec 1.00 160.1±0.18µs ? ?/sec smol-wiki-articles.csv: typo/migrosoft 7.48 1417.3±5.84µs ? ?/sec 1.00 189.5±0.88µs ? ?/sec smol-wiki-articles.csv: typo/nympalidea 3.96 7.2±0.01ms ? ?/sec 1.00 1810.1±2.03µs ? ?/sec smol-wiki-articles.csv: typo/phytogropher 3.71 7.2±0.01ms ? ?/sec 1.00 1934.3±6.51µs ? ?/sec smol-wiki-articles.csv: typo/sisan 6.44 1497.2±1.38µs ? ?/sec 1.00 232.7±0.94µs ? ?/sec smol-wiki-articles.csv: typo/the fronce 6.92 2.9±0.00ms ? ?/sec 1.00 418.0±1.76µs ? ?/sec smol-wiki-articles.csv: words/Abraham machin 16.63 10.8±0.01ms ? ?/sec 1.00 649.7±1.08µs ? ?/sec smol-wiki-articles.csv: words/Idaho Bellevue pizza 27.15 25.6±0.03ms ? ?/sec 1.00 944.2±5.07µs ? ?/sec smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk 26.87 40.7±0.05ms ? ?/sec 1.00 1515.3±2.73µs ? ?/sec smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli 11.99 48.8±0.10ms ? ?/sec 1.00 4.1±0.02ms ? ?/sec smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo 4.90 110.0±0.15ms ? ?/sec 1.00 22.4±0.03ms ? ?/sec ``` Co-authored-by: mpostma <postma.marin@protonmail.com> Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-15 16:43:36 +00:00
ad hoc	3f24555c3d	custom fst automatons	2022-03-15 17:38:35 +01:00
ad hoc	628c835a22	fix tests	2022-03-15 17:38:34 +01:00
bors[bot]	8efac33b53	Merge #467 467: optimize prefix database r=Kerollmops a=MarinPostma This pr introduces two optimizations that greatly improve the speed of computing prefix databases. - The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST. - We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-15 16:14:35 +00:00
ad hoc	d127c57f2d	review edits	2022-03-15 17:12:48 +01:00
ad hoc	d633ac5b9d	optimize word prefix pair	2022-03-15 16:37:22 +01:00
ad hoc	d68fe2b3c7	optimize word prefix fst	2022-03-15 16:36:48 +01:00
Kerollmops	08a06b49f0	Bump version to 0.23.1	2022-03-15 15:50:28 +01:00
Clément Renault	0c5f4ed7de	Apply suggestions Co-authored-by: Many <many@meilisearch.com>	2022-03-15 14:18:29 +01:00
Kerollmops	21ec334dcc	Fix the compilation error of the dependency versions	2022-03-15 11:17:45 +01:00
Kerollmops	63682c2c9a	Upgrade the dependencies	2022-03-15 11:17:44 +01:00
Kerollmops	288a879411	Remove three useless dependencies	2022-03-15 11:17:44 +01:00
psvnl sai kumar	5e08fac729	fixes for rustfmt pass	2022-03-14 19:22:41 +05:30
psvnl sai kumar	92e2e09434	exporting heed to avoid having different versions of Heed in Meilisearch	2022-03-14 01:01:58 +05:30
Kerollmops	1ae13c1374	Avoid iterating on big databases when useless	2022-03-09 15:43:54 +01:00
Bruno Casali	66c6d5e1ef	Add a new error message when the `valid_fields` is empty > "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes." > "Attribute `{}` is not sortable. Available sortable attributes are: `{}`." coexist in the error handling	2022-03-05 10:38:18 -03:00
Clémentine Urquizar	d9ed9de2b0	Update heed link in cargo toml	2022-03-01 19:45:29 +01:00
Kerollmops	d5b8b5a2f8	Replace the ugly unwraps by clean if let Somes	2022-02-28 16:31:33 +01:00
Kerollmops	8d26f3040c	Remove a useless grenad file merging	2022-02-28 16:31:33 +01:00
Clément Renault	04b1bbf932	Reintroduce appending sorted entries when possible	2022-02-24 14:50:45 +01:00
bors[bot]	25123af3b8	Merge #436 436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops This PR depends on the fixes done in #431 and must be merged after it. In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and, `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs much fewer CPU cycles, an era where this update structure can use the, previously computed, set of new word docids from the newly indexed batch of documents. --- The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked for. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of documents ids which contains the pair of words and prefix at the given proximity. The origin of the performances issue that this struct brings is related to the fact that it starts its job from the beginning, it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-16 15:41:14 +00:00
Clément Renault	ff8d7a810d	Change the behavior of the as_cloneable_grenad by taking a ref	2022-02-16 15:40:08 +01:00
Clément Renault	f367cc2e75	Finally bump grenad to v0.4.1	2022-02-16 15:28:48 +01:00
Irevoire	0defeb268c	bump milli	2022-02-16 13:27:41 +01:00
Irevoire	48542ac8fd	get rid of chrono in favor of time	2022-02-15 11:41:55 +01:00
Clémentine Urquizar	d03b3ceb58	Update version for the next release (v0.22.1)	2022-02-07 18:39:29 +01:00
bors[bot]	5d58cb7449	Merge #442 442: fix phrase search r=curquiza a=MarinPostma Run the exact match search on 7 words windows instead of only two. This makes false positive very very unlikely, and impossible on phrase query that are less than seven words. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-07 16:18:20 +00:00
ad hoc	bd2262ceea	allow null values in csv	2022-02-03 16:03:01 +01:00
ad hoc	13de251047	rewrite word pair distance gathering	2022-02-03 15:57:20 +01:00
Many	d59bcea749	Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""	2022-02-02 17:01:13 +01:00
mpostma	7541ab99cd	review changes	2022-02-02 12:59:01 +01:00
mpostma	d0aabde502	optimize 2 typos case	2022-02-02 12:56:09 +01:00
mpostma	55e6cb9c7b	typos on first letter counts as 2	2022-02-02 12:56:09 +01:00
mpostma	642c01d0dc	set max typos on ngram to 1	2022-02-02 12:56:08 +01:00
ad hoc	d852dc0d2b	fix phrase search	2022-02-01 20:21:33 +01:00
Kerollmops	fb79c32430	Compute the new, common and, deleted prefix words fst once	2022-01-27 11:00:18 +01:00
Clément Renault	51d1e64b23	Remove, now useless, the WriteMethod enum	2022-01-27 10:08:35 +01:00
Clément Renault	e9c02173cf	Rework the WordsPrefixPositionDocids update to compute a subset of the database	2022-01-27 10:08:35 +01:00
Clément Renault	dbba5fd461	Create a function to simplify the word prefix pair proximity docids compute	2022-01-27 10:08:35 +01:00
Clément Renault	e760e02737	Fix the computation of the newly added and common prefix pair proximity words	2022-01-27 10:08:35 +01:00
Clément Renault	d59e559317	Fix the computation of the newly added and common prefix words	2022-01-27 10:08:34 +01:00
Clément Renault	2ec8542105	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	28692f65be	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	5404bc02dd	Move the fst_stream_into_hashset method in the helper methods	2022-01-27 10:06:00 +01:00
Clément Renault	c90fa95f93	Only compute the word prefix pairs on the created word pair proximities	2022-01-27 10:06:00 +01:00
Clément Renault	822f67e9ad	Bring the newly created word pair proximity docids	2022-01-27 10:06:00 +01:00
Clément Renault	d28f18658e	Retrieve the previous version of the words prefixes FST	2022-01-27 10:05:59 +01:00
bors[bot]	38d23546a5	Merge #431 431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it. With `@ManyTheFish` we found out that we could skip some of the work we were doing by: - discarding the prefixes that were shorter than a specific threshold (default: 2). - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4). - remove the unused threshold that was specifying a minimum amount of word docids to merge. We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does. I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard. Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-27 07:04:56 +00:00
Clément Renault	f9b214f34e	Apply suggestions from code review Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2022-01-26 11:28:11 +01:00
bors[bot]	e1cc025cbd	Merge #440 440: fix(fuzzer): fix the fuzzer after #430 r=Kerollmops a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-25 16:33:57 +00:00
Clément Renault	f04cd19886	Introduce a max prefix length parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	1514dfa1b7	Introduce a max proximity parameter to the word prefix pair proximity update	2022-01-25 17:04:23 +01:00
Clément Renault	23ea3ad738	Remove the useless threshold when computing the word prefix pair proximity	2022-01-25 17:04:23 +01:00
Clément Renault	e3c34684c6	Fix a bug where we were skipping most of the prefix pairs	2022-01-25 17:04:23 +01:00
Tamo	fb51d511be	fix(fuzzer): fix the fuzzer after #430	2022-01-25 12:08:47 +01:00
bors[bot]	9f2ff71581	Merge #434 434: bump milli to v0.22.0 r=curquiza a=irevoire This is breaking because of this PR: `98a365aaae` Should we do a special branch to only release the [patch](https://github.com/meilisearch/milli/pull/433) for https://github.com/meilisearch/MeiliSearch/issues/2082 (which is non-breaking)? Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-24 17:31:20 +00:00
bors[bot]	fd177b63f8	Merge #423 423: Remove an unused file r=irevoire a=irevoire This empty file is not included anywhere Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-19 14:18:05 +00:00
Marin Postma	0c84a40298	document batch support reusable transform rework update api add indexer config fix tests review changes Co-authored-by: Clément Renault <clement@meilisearch.com> fmt	2022-01-19 12:40:20 +01:00
Tamo	01968d7ca7	ensure we get no documents and no error when filtering on an empty db	2022-01-18 11:40:30 +01:00
Tamo	367f403693	bump milli	2022-01-17 16:41:34 +01:00
bors[bot]	8f4499090b	Merge #433 433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there is currently zero documents containing this field thus we return an empty RoaringBitmap instead of throwing an internal error Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released Co-authored-by: Tamo <tamo@meilisearch.com>	2022-01-17 14:06:53 +00:00
bors[bot]	4c516c00da	Merge #426 426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2 # Pull Request ## What does this PR do? Fixes https://github.com/meilisearch/MeiliSearch/issues/1480 <!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. --> ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? ## Changes The `matching_bytes` function takes a `&Token` now and: - gets the number of bytes to highlight (unchanged). - uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight. In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes. Added proper highlighting in the HTTP UI: - requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens - `<mark>` tag is put around only the matched part - before this change, the entire word was highlighted even if only a part of it matched ## Questions Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API? Thank you very much `@ManyTheFish` for helping 😄 Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>	2022-01-17 13:39:00 +00:00
Tamo	d1ac40ea14	fix(filter): Fix two bugs. - Stop lowercasing the field when looking in the field id map - When a field id does not exist it means there is currently zero documents containing this field thus we returns an empty RoaringBitmap instead of throwing an internal error	2022-01-17 13:51:46 +01:00
Samyak S Sarnayak	2d7607734e	Run cargo fmt on matching_words.rs	2022-01-17 13:04:33 +05:30

... 3 4 5 6 7 ...

1082 Commits