meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-27 04:25:06 +08:00

Author	SHA1	Message	Date
bors[bot]	21284cf235	Merge #556 556: Add EXISTS filter r=loiclec a=loiclec ## What does this PR do? Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo. It creates a `field EXISTS` filter which selects all documents containing the `field` key. For example, with the following documents: ```json [{ "id": 0, "colour": [] }, { "id": 1, "colour": ["blue", "green"] }, { "id": 2, "colour": 145238 }, { "id": 3, "colour": null }, { "id": 4, "colour": { "green": [] } }, { "id": 5, "colour": {} }, { "id": 6 }] ``` Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`. ## Details There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists. To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place. There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON: ```json { "id": 0, "colour": [], "size": {} } ``` would be flattened to: ```json { "id": 0 } ``` prior to being given to the extraction pipeline. This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example: ```json { "id": 0, "colour": { "green": [], "blue": 1 }, "size": {} } ``` becomes ```json { "id": 0, "colour": [], "colour.green": [], "colour.blue": 1, "size": [] } ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-08-04 09:46:06 +00:00
bors[bot]	50f6524ff2	Merge #579 579: Stop reindexing already indexed documents r=ManyTheFish a=irevoire ``` % ./compare.sh indexing_stop-reindexing-unchanged-documents_cb5a1669.json indexing_main_eeba1960.json group indexing_main_eeba1960 indexing_stop-reindexing-unchanged-documents_cb5a1669 ----- ---------------------- ----------------------------------------------------- indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.03 2.0±0.22ms ? ?/sec 1.00 1955.4±336.24µs ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.08 11.0±2.93ms ? ?/sec 1.00 10.2±4.04ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 15.1±3.89ms ? ?/sec 1.14 17.1±5.18ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.26 59.2±12.01ms ? ?/sec 1.00 47.1±8.52ms ? ?/sec indexing/-wiki-delete-searchable- 1.08 316.6±31.53ms ? ?/sec 1.00 293.6±17.00ms ? ?/sec indexing/Indexing geo_point 1.01 60.9±0.31s ? ?/sec 1.00 60.6±0.36s ? ?/sec indexing/Indexing movies in three batches 1.04 20.0±0.30s ? ?/sec 1.00 19.2±0.25s ? ?/sec indexing/Indexing movies with default settings 1.02 19.1±0.18s ? ?/sec 1.00 18.7±0.24s ? ?/sec indexing/Indexing nested movies with default settings 1.02 26.2±0.29s ? ?/sec 1.00 25.9±0.22s ? ?/sec indexing/Indexing nested movies without any facets 1.02 25.3±0.32s ? ?/sec 1.00 24.7±0.26s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 66.7±0.41s ? ?/sec 1.01 67.1±0.86s ? ?/sec indexing/Indexing songs with default settings 1.00 58.3±0.90s ? ?/sec 1.01 58.8±1.32s ? ?/sec indexing/Indexing songs without any facets 1.00 54.5±1.43s ? ?/sec 1.01 55.2±1.29s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 57.9±1.20s ? ?/sec 1.01 58.4±0.93s ? ?/sec indexing/Indexing wiki 1.00 1052.0±10.95s ? ?/sec 1.02 1069.4±20.38s ? ?/sec indexing/Indexing wiki in three batches 1.00 1193.1±8.83s ? ?/sec 1.00 1189.5±9.40s ? ?/sec indexing/Reindexing geo_point 3.22 67.5±0.73s ? ?/sec 1.00 21.0±0.16s ? ?/sec indexing/Reindexing movies with default settings 3.75 19.4±0.28s ? ?/sec 1.00 5.2±0.05s ? ?/sec indexing/Reindexing songs with default settings 8.90 61.4±0.91s ? ?/sec 1.00 6.9±0.07s ? ?/sec indexing/Reindexing wiki 1.00 1748.2±35.68s ? ?/sec 1.00 1750.5±18.53s ? ?/sec ``` tldr: We do not lose any performance on the normal indexing benchmark, but we get between 3 and 8 times faster on the reindexing benchmarks 👍 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-08-04 08:10:37 +00:00
ManyTheFish	d6f9a60a32	fix: Remove whitespace trimming during document id validation fix #592	2022-08-03 11:38:40 +02:00
Tamo	7fc35c5586	remove the useless prints	2022-08-02 10:31:22 +02:00
Tamo	f156d7dd3b	Stop reindexing already indexed documents	2022-08-02 10:31:20 +02:00
Loïc Lecrenier	07003704a8	Merge branch 'filter/field-exist'	2022-07-21 14:51:41 +02:00
Clémentine Urquizar	d5e9b7305b	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
ManyTheFish	cbb3b25459	Fix(Search): Fix phrase search candidates computation This bug is an old bug but was hidden by the proximity criterion, Phrase search were always returning an empty candidates list. Before the fix, we were trying to find any words[n] near words[n] instead of finding any words[n] near words[n+1], for example: for a phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world".	2022-07-21 10:04:30 +02:00
bors[bot]	941af58239	Merge #561 561: Enriched documents batch reader r=curquiza a=Kerollmops ~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~ This PR contains the work in #555 and can be merged on main as soon as reviewed and approved. - [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id. - [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`. - [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`. - [x] Remove the `update_primary_key` from the _transform.rs_ file. - [x] Really generate the auto-generated documents ids. - [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-21 07:08:50 +00:00
Loïc Lecrenier	41a0ce07cb	Add a code comment, as suggested in PR review Co-authored-by: Many the fish <many@meilisearch.com>	2022-07-20 16:20:35 +02:00
Loïc Lecrenier	1506683705	Avoid using too much memory when indexing facet-exists-docids	2022-07-19 14:42:35 +02:00
Loïc Lecrenier	d0eee5ff7a	Fix compiler error	2022-07-19 13:54:30 +02:00
Loïc Lecrenier	aed8c69bcb	Refactor indexation of the "facet-id-exists-docids" database The idea is to directly create a sorted and merged list of bitmaps in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating a grenad::Reader where the keys are field_id and the values are docids. Then we send that BTreeMap to the thing that handles TypedChunks, which inserts its content into the database.	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	1eb1e73bb3	Add integration tests for the EXISTS filter	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	4f0bd317df	Remove custom implementation of BytesEncode/Decode for the FieldId	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	80b962b4f4	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	c17d616250	Refactor index_documents_check_exists_database tests	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	30bd4db0fc	Simplify indexing task for facet_exists_docids database	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	392472f4bb	Apply suggestions from code review Co-authored-by: Tamo <tamo@meilisearch.com>	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	0388b2d463	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	dc64170a69	Improve syntax of EXISTS filter, allow “value NOT EXISTS”	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	72452f0cb2	Implements the EXIST filter operator	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	453d593ce8	Add a database containing the docids where each field exists	2022-07-19 10:07:33 +02:00
Many the fish	2d79720f5d	Update milli/src/search/matches/mod.rs	2022-07-18 17:48:04 +02:00
Many the fish	8ddb4e750b	Update milli/src/search/matches/mod.rs	2022-07-18 17:47:39 +02:00
Many the fish	a277daa1f2	Update milli/src/search/matches/mod.rs	2022-07-18 17:47:13 +02:00
Many the fish	fb794c6b5e	Update milli/src/search/matches/mod.rs	2022-07-18 17:46:00 +02:00
Many the fish	1237cfc249	Update milli/src/search/matches/mod.rs	2022-07-18 17:45:37 +02:00
Many the fish	d7fd5c58cd	Update milli/src/search/matches/mod.rs	2022-07-18 17:45:06 +02:00
Loïc Lecrenier	fc9f3f31e7	Change DocumentsBatchReader to access cursor and index at same time Otherwise it is not possible to iterate over all documents while using the fields index at the same time.	2022-07-18 16:08:14 +02:00
Loïc Lecrenier	ab1571cdec	Simplify Transform::read_documents, enabled by enriched documents reader	2022-07-18 12:45:47 +02:00
Many the fish	e261ef64d7	Update milli/src/search/matches/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-18 10:18:51 +02:00
Many the fish	1da4ab5918	Update milli/src/search/matches/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-18 10:18:03 +02:00
Kerollmops	448114cc1c	Fix the benchmarks with the new indexation API	2022-07-12 15:22:09 +02:00
Kerollmops	25e768f31c	Fix another issue with the nested primary key selector	2022-07-12 15:14:07 +02:00
Kerollmops	192793ee38	Add some tests to check for the nested documents ids	2022-07-12 15:14:07 +02:00
Kerollmops	a892a4a79c	Introduce a function to extend from a JSON array of objects	2022-07-12 15:14:06 +02:00
Kerollmops	dc61105554	Fix the nested document id fetching function	2022-07-12 15:14:06 +02:00
Kerollmops	2eec290424	Check the validity of the latitute and longitude numbers	2022-07-12 15:14:06 +02:00
Kerollmops	5d149d631f	Remove tests for a function that no more exists	2022-07-12 15:14:06 +02:00
Kerollmops	0bbcc7b180	Expose the `DocumentId` struct to be sure to inject the generated ids	2022-07-12 15:14:06 +02:00
Kerollmops	d1a4da9812	Generate a real UUIDv4 when ids are auto-generated	2022-07-12 15:14:06 +02:00
Kerollmops	c8ebf0de47	Rename the validate function as an enriching function	2022-07-12 15:14:06 +02:00
Kerollmops	905af2a2e9	Use the primary key and external id in the transform	2022-07-12 15:14:05 +02:00
Kerollmops	742543091e	Constify the default primary key name	2022-07-12 14:55:52 +02:00
Kerollmops	5f1bfb73ee	Extract the primary key name and make it accessible	2022-07-12 14:55:52 +02:00
Kerollmops	6a0a0ae94f	Make the Transform read from an EnrichedDocumentsBatchReader	2022-07-12 14:55:52 +02:00
Kerollmops	dc3f092d07	Do not leak an internal grenad Error	2022-07-12 14:55:52 +02:00
Kerollmops	8ebf5eed0d	Make the nested primary key work	2022-07-12 14:55:52 +02:00
Kerollmops	19eb3b4708	Make sur that we do not accept floats as documents ids	2022-07-12 14:55:52 +02:00
Kerollmops	2ceeb51c37	Support the auto-generated ids when validating documents	2022-07-12 14:55:51 +02:00
Kerollmops	399eec5c01	Fix the indexation tests	2022-07-12 14:55:51 +02:00
Kerollmops	fcfc4caf8c	Move the Object type in the lib.rs file and use it everywhere	2022-07-12 14:55:51 +02:00
Kerollmops	0146175fe6	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	cefffde9af	Improve the .gitignore of the fuzz crate	2022-07-12 14:55:51 +02:00
Kerollmops	bdc4263883	Introduce the validate_documents_batch function	2022-07-12 14:55:51 +02:00
Kerollmops	6d0498df24	Fix the fuzz tests	2022-07-12 14:52:56 +02:00
Kerollmops	e8297ad27e	Fix the tests for the new DocumentsBatchBuilder/Reader	2022-07-12 14:52:56 +02:00
Kerollmops	419ce3966c	Rework the DocumentsBatchBuilder/Reader to use grenad	2022-07-12 14:52:55 +02:00
Kerollmops	eb63af1f10	Update grenad to 0.4.2	2022-07-12 14:52:55 +02:00
Kerollmops	048e174efb	Do not allocate when parsing CSV headers	2022-07-12 14:52:55 +02:00
ManyTheFish	5d79617a56	Chores: Enhance smart-crop code comments	2022-07-07 16:28:09 +02:00
bors[bot]	ebddfdb9a3	Merge #578 578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584). Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-05 14:56:08 +00:00
Kerollmops	1bfdcfc84f	Bump uuid to 1.1.2	2022-07-05 16:23:36 +02:00
Tamo	250be9fe6c	put the threshold back to 10k	2022-07-05 15:57:44 +02:00
Tamo	b61efd09fc	Makes the internal soft deleted error a UserError	2022-07-05 15:34:45 +02:00
Tamo	eaf28b0628	Apply review suggestions Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-07-05 15:30:33 +02:00
Tamo	3b309f654a	Fasten the document deletion When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search, and all the other endpoints.	2022-07-05 15:30:33 +02:00
Tamo	446439e8be	bump charabia	2022-07-05 12:19:30 +02:00
Dmytro Gordon	3ff03a3f5f	Fix not equal filter when field contains both number and strings	2022-06-27 15:55:17 +03:00
Kerollmops	cc48992e79	Bump the milli version to 0.31.1	2022-06-22 17:05:51 +02:00
Kerollmops	238692a8e7	Introduce the copy_to_path method on the Index	2022-06-22 16:49:47 +02:00
bors[bot]	290a40b7a5	Merge #564 564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter into `maxTotalHits`. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-22 13:48:33 +00:00
bors[bot]	d546f6f40e	Merge #563 563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way. The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document. The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned. If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-22 12:39:44 +00:00
Kerollmops	f5c3b951bc	Bump the milli version to 0.31.0	2022-06-22 12:08:16 +02:00
Kerollmops	d7c248042b	Rename the limitedTo parameter into maxTotalHits	2022-06-22 12:00:48 +02:00
Kerollmops	d2f84a9d9e	Improve the estimatedNbHits when distinct is enabled	2022-06-22 11:39:21 +02:00
bors[bot]	4f547eff02	Merge #560 560: Update version for next release (v0.30.0) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-20 12:37:01 +00:00
Clémentine Urquizar	31f749b5d8	Update version for next release (v0.30.0)	2022-06-20 12:09:57 +02:00
ManyTheFish	a0ab90a4d7	Avoid having an ending separator before crop marker	2022-06-16 18:23:57 +02:00
ManyTheFish	177154828c	Extends deletion tests	2022-06-13 17:34:16 +02:00
ManyTheFish	0d1d354052	Ensure that Index methods are not bypassed by Meilisearch	2022-06-13 17:34:11 +02:00
bors[bot]	f1d848bb9a	Merge #552 552: Fix escaped quotes in filter r=Kerollmops a=irevoire Will fix https://github.com/meilisearch/meilisearch/issues/2380 The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token. To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-06-09 14:56:44 +00:00
Tamo	676187ba43	bump milli version	2022-06-09 16:53:32 +02:00
Tamo	90afde435b	fix escaped quotes in filter	2022-06-09 16:03:49 +02:00
Kerollmops	445d5474cc	Add the pagination_limited_to setting to the database	2022-06-08 18:14:27 +02:00
Kerollmops	69931e50d2	Add the max_values_by_facet setting to the database	2022-06-08 17:54:56 +02:00
Kerollmops	52a494bd3b	Add the new pagination.limited_to and faceting.max_values_per_facet settings	2022-06-08 17:15:36 +02:00
bors[bot]	9580b9de79	Merge #549 549: Bump the version to 0.29.2 r=curquiza a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-06-08 14:29:47 +00:00
Kerollmops	56ee9cc21f	Bump the version to 0.29.2	2022-06-08 16:00:06 +02:00
Kerollmops	2a505503b3	Change the number of facet values returned by default to 100	2022-06-08 15:58:57 +02:00
Kerollmops	bae4007447	Remove the hard limit on the number of facet values returned	2022-06-08 15:58:57 +02:00
bors[bot]	7313d6c533	Merge #547 547: Update version for next release (v0.29.1) r=Kerollmops a=curquiza A new milli version will be released once this PR is merged https://github.com/meilisearch/milli/pull/543 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-08 10:20:24 +00:00
Clémentine Urquizar	478dbfa45a	Update version for next release (v0.29.1)	2022-06-07 18:59:33 +02:00
Tamo	d0aaa7ff00	Fix wrong internal ids assignments	2022-06-07 15:49:33 +02:00
ad hoc	31776fdc3f	add failing test	2022-06-07 15:49:33 +02:00
bors[bot]	05ae6dbfa4	Merge #541 541: Update version for next release (v0.29.0) r=ManyTheFish a=curquiza Need to update the version since #540 was merged and breaking Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-06-02 16:53:28 +00:00
ManyTheFish	d212dc6b8b	Remove useless newline	2022-06-02 18:22:56 +02:00
Clémentine Urquizar	6ce1c6487a	Update version for next release (v0.29.0)	2022-06-02 18:07:55 +02:00
ManyTheFish	7aabe42ae0	Refactor matching words	2022-06-02 17:59:04 +02:00

1 2 3 4 5 ...

983 Commits