meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
Loïc Lecrenier	86807ca848	Refactor word prefix pair proximity indexation further	2022-08-17 11:59:13 +02:00
Loïc Lecrenier	306593144d	Refactor word prefix pair proximity indexation	2022-08-17 11:59:00 +02:00
Loïc Lecrenier	5d59bfde8a	Sort Cargo.toml dependencies	2022-08-17 11:46:56 +02:00
bors[bot]	f55034ed54	Merge #606 606: Make binaries faster on release profile through better compile options r=Kerollmops a=loiclec Using `codegen-units = 1` and `lto = 'thin'` makes the compile time a bit longer, but also produces faster binaries. I'd like to run milli's benchmark with these options, so that we can see whether it is worth enabling on meilisearch. Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 08:57:24 +00:00
Loïc Lecrenier	03e679b634	Make binaries faster on release profile through better compile options	2022-08-17 10:29:33 +02:00
Loïc Lecrenier	f20e588ec1	Make sure there is one newline at eof in cargo.toml	2022-08-17 07:44:33 +02:00
Loïc Lecrenier	20be69e1b9	Always use mimalloc as the global allocator	2022-08-16 20:09:36 +02:00
bors[bot]	293a246af8	Merge #601 601: Introduce snapshot tests r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Introduce snapshot tests into milli, by using the `insta` crate. This implements the idea described by #597 See: [insta.rs](https://insta.rs) ## Design There is now a new file, `snapshot_tests.rs`, which is compiled only under `#[cfg(test)]`. It exposes the `db_snap!` macro, which is used to snapshot the content of a database. When running `cargo test`, `insta` will check that the value of the current snapshot is the same as the previous one (on the file system). If they are the same, the test passes. If they are different, the test fails and you are asked to review the new snapshot to approve or reject it. We don't want to save very large snapshots to the file system, because it will pollute the git repository and increase its size too much. Instead, we only save their `md5` hashes under the name `<snapshot_name>.hash.snap`. There is a new environment variable called `MILLI_TEST_FULL_SNAPS` which can be set to `true` in order to also save the full content of the snapshot under the name `<snapshot_name>.full.snap`. However, snapshots with the extension `.full.snap` are never saved to the git repository. ## Example ```rust // In e.g. facets.rs #[test] fn my_test() { // create an index let index = TempIndex::new(): index.add_documents(...); index.update_settings(\|settings\| ...); // then snapshot the content of one of its databases // the snapshot will be saved at the current folder under facets.rs/my_test/facet_id_string_docids.snap db_snap!(index, facet_id_string_docids); index.add_documents(...); // we can also name the snapshot to ensure there is no conflict // this snapshot will be saved at facets.rs/my_test/updated/facet_id_string_docids.snap db_snap!(index, facet_id_string, docids, "updated"); // and we can also use "inline" snapshots, which insert their content in the given string literal db_snap!(index, field_distributions, `@"");` // once the snapshot is approved, it will automatically get transformed to, e.g.: // db_snap!(index, field_distributions, `@"` // my_facet 21 // other_field 3 // "); // now let's add many documents index.add_documents(...); // because the snapshot is too big, its hash is saved instead // if the MILLI_TEST_FULL_SNAPS env variable is set to true, then the full snapshot will also be saved // at facets.rs/my_test/large/facet_id_string_docids.full.snap db_snap!(index, facet_id_string_docids, "large", `@"5348bbc46b5384455b6a900666d2a502");` } ``` Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-16 11:57:09 +00:00
Loïc Lecrenier	dea00311b6	Add type annotations to remove compiler error	2022-08-16 09:19:30 +02:00
Loïc Lecrenier	fb2b6c0c28	Use mimalloc for benchmarks on all platforms	2022-08-10 16:56:42 +02:00
Loïc Lecrenier	6f49126223	Fix db_snap macro with inline parameter	2022-08-10 15:55:22 +02:00
Loïc Lecrenier	12920f2a4f	Fix paths of snapshot tests	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	4b7fd4dfae	Update insta version	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	ce560fdcb5	Add documentation for `db_snap!`	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	748bb86b5b	cargo fmt	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	051f24f674	Switch to snapshot tests for search/matches/mod.rs	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	d2e01528a6	Switch to snapshot tests for search/criteria/typo.rs	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	a9c7d82693	Switch to snapshot tests for search/criteria/attribute.rs	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	4bba2f41d7	Switch to snapshot tests for query_tree.rs	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	8ac24d3114	Cargo fmt + fix compiler warnings/error	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	6066256689	Add snapshot tests for indexing of word_prefix_pair_proximity_docids	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	3a734af159	Add snapshot tests for Facets::execute	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	b9907997e4	Remove old snapshot tests code	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	ef889ade5d	Refactor snapshot tests	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	334098a7e0	Add index snapshot test helper function	2022-08-10 15:53:46 +02:00
Loïc Lecrenier	8f73251012	Use mimalloc for benchmarks on macOS	2022-08-10 13:30:56 +02:00
ManyTheFish	b389be48a0	Factorize phrase computation	2022-08-08 10:37:31 +02:00
bors[bot]	950d8e4c44	Merge #600 600: Simplify some unit tests r=ManyTheFish a=loiclec # Pull Request ## What does this PR do? Simplify the code that is used in unit tests to create and modify an index. Basically, the following code: ```rust let path = tempfile::tempdir().unwrap(); let mut options = EnvOpenOptions::new(); options.map_size(10 * 1024 * 1024); // 10 MB let index = Index::new(options, &path).unwrap(); let mut wtxn = index.write_txn().unwrap(); let content = documents!([ { "id": 0, "name": "kevin" }, ]); let config = IndexerConfig::default(); let indexing_config = IndexDocumentsConfig::default(); let builder = IndexDocuments::new(&mut wtxn, &index, &config, indexing_config, \|_\| ()).unwrap(); let (builder, user_error) = builder.add_documents(content).unwrap(); user_error.unwrap(); builder.execute().unwrap(); wtxn.commit.unwrap(); let mut wtxn = index.write_txn().unwrap(); let config = IndexerConfig::default(); let mut builder = Settings::new(&mut wtxn, &index, &config); builder.set_primary_key(S("docid")); builder.set_filterable_fields(hashset! { S("label") }); builder.execute(\|_\| ()).unwrap(); wtxn.commit().unwrap(); ``` becomes: ```rust let index = TempIndex::new(): index.add_documents(documents!( { "id": 0, "name": "kevin" }, )).unwrap(); index.update_settings(\|settings\| { settings.set_primary_key(S("docid")); settings.set_filterable_fields(hashset! { S("label") }); }).unwrap(); ``` Then there is a bunch of options to modify the indexing configs, the map size, to reuse a transaction, etc. For example: ```rust let mut index = TempIndex::new_with_map_size(1000 * 4096 * 10); index.index_documents_config.autogenerate_docids = true; let mut wtxn = index.write_txn().unwrap(); index.update_settings_using_wtxn(&mut wtxn, \|settings\| { settings.set_primary_key(S("docids")); }).unwrap(); wtxn.commit().unwrap(); ``` Co-authored-by: Loïc Lecrenier <loic@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com> Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>	2022-08-04 10:19:42 +00:00
Loïc Lecrenier	58cb1c1bda	Simplify unit tests in facet/filter.rs	2022-08-04 12:03:44 +02:00
Loïc Lecrenier	acff17fb88	Simplify indexing tests	2022-08-04 12:03:13 +02:00
bors[bot]	21284cf235	Merge #556 556: Add EXISTS filter r=loiclec a=loiclec ## What does this PR do? Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo. It creates a `field EXISTS` filter which selects all documents containing the `field` key. For example, with the following documents: ```json [{ "id": 0, "colour": [] }, { "id": 1, "colour": ["blue", "green"] }, { "id": 2, "colour": 145238 }, { "id": 3, "colour": null }, { "id": 4, "colour": { "green": [] } }, { "id": 5, "colour": {} }, { "id": 6 }] ``` Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`. ## Details There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists. To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place. There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON: ```json { "id": 0, "colour": [], "size": {} } ``` would be flattened to: ```json { "id": 0 } ``` prior to being given to the extraction pipeline. This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example: ```json { "id": 0, "colour": { "green": [], "blue": 1 }, "size": {} } ``` becomes ```json { "id": 0, "colour": [], "colour.green": [], "colour.blue": 1, "size": [] } ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-08-04 09:46:06 +00:00
bors[bot]	50f6524ff2	Merge #579 579: Stop reindexing already indexed documents r=ManyTheFish a=irevoire ``` % ./compare.sh indexing_stop-reindexing-unchanged-documents_cb5a1669.json indexing_main_eeba1960.json group indexing_main_eeba1960 indexing_stop-reindexing-unchanged-documents_cb5a1669 ----- ---------------------- ----------------------------------------------------- indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.03 2.0±0.22ms ? ?/sec 1.00 1955.4±336.24µs ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.08 11.0±2.93ms ? ?/sec 1.00 10.2±4.04ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 15.1±3.89ms ? ?/sec 1.14 17.1±5.18ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.26 59.2±12.01ms ? ?/sec 1.00 47.1±8.52ms ? ?/sec indexing/-wiki-delete-searchable- 1.08 316.6±31.53ms ? ?/sec 1.00 293.6±17.00ms ? ?/sec indexing/Indexing geo_point 1.01 60.9±0.31s ? ?/sec 1.00 60.6±0.36s ? ?/sec indexing/Indexing movies in three batches 1.04 20.0±0.30s ? ?/sec 1.00 19.2±0.25s ? ?/sec indexing/Indexing movies with default settings 1.02 19.1±0.18s ? ?/sec 1.00 18.7±0.24s ? ?/sec indexing/Indexing nested movies with default settings 1.02 26.2±0.29s ? ?/sec 1.00 25.9±0.22s ? ?/sec indexing/Indexing nested movies without any facets 1.02 25.3±0.32s ? ?/sec 1.00 24.7±0.26s ? ?/sec indexing/Indexing songs in three batches with default settings 1.00 66.7±0.41s ? ?/sec 1.01 67.1±0.86s ? ?/sec indexing/Indexing songs with default settings 1.00 58.3±0.90s ? ?/sec 1.01 58.8±1.32s ? ?/sec indexing/Indexing songs without any facets 1.00 54.5±1.43s ? ?/sec 1.01 55.2±1.29s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 57.9±1.20s ? ?/sec 1.01 58.4±0.93s ? ?/sec indexing/Indexing wiki 1.00 1052.0±10.95s ? ?/sec 1.02 1069.4±20.38s ? ?/sec indexing/Indexing wiki in three batches 1.00 1193.1±8.83s ? ?/sec 1.00 1189.5±9.40s ? ?/sec indexing/Reindexing geo_point 3.22 67.5±0.73s ? ?/sec 1.00 21.0±0.16s ? ?/sec indexing/Reindexing movies with default settings 3.75 19.4±0.28s ? ?/sec 1.00 5.2±0.05s ? ?/sec indexing/Reindexing songs with default settings 8.90 61.4±0.91s ? ?/sec 1.00 6.9±0.07s ? ?/sec indexing/Reindexing wiki 1.00 1748.2±35.68s ? ?/sec 1.00 1750.5±18.53s ? ?/sec ``` tldr: We do not lose any performance on the normal indexing benchmark, but we get between 3 and 8 times faster on the reindexing benchmarks 👍 Co-authored-by: Tamo <tamo@meilisearch.com>	2022-08-04 08:10:37 +00:00
bors[bot]	e8987cf5aa	Merge #599 599: fix: Remove whitespace trimming during document id validation r=ManyTheFish a=ManyTheFish fix #592 related to https://github.com/meilisearch/meilisearch/issues/2640 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-08-03 14:55:25 +00:00
ManyTheFish	d6f9a60a32	fix: Remove whitespace trimming during document id validation fix #592	2022-08-03 11:38:40 +02:00
Tamo	7fc35c5586	remove the useless prints	2022-08-02 10:31:22 +02:00
Tamo	f156d7dd3b	Stop reindexing already indexed documents	2022-08-02 10:31:20 +02:00
Loïc Lecrenier	1fe224f2c6	Update filter-parser/fuzz/.gitignore Co-authored-by: Many the fish <many@meilisearch.com>	2022-07-21 16:12:01 +02:00
Loïc Lecrenier	07003704a8	Merge branch 'filter/field-exist'	2022-07-21 14:51:41 +02:00
bors[bot]	e1bc610d27	Merge #595 595: Update version for next release (v0.32.0) r=ManyTheFish a=curquiza In order to release on `main` (for v0.29.0, not v0.28.1) <img width="1014" alt="Capture d’écran 2022-07-21 à 13 20 35" src="https://user-images.githubusercontent.com/20380692/180178725-381fbdf1-c0fb-4fa9-9954-452aec5a1574.png"> Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-07-21 11:07:42 +00:00
Clémentine Urquizar	d5e9b7305b	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
ManyTheFish	cbb3b25459	Fix(Search): Fix phrase search candidates computation This bug is an old bug but was hidden by the proximity criterion, Phrase search were always returning an empty candidates list. Before the fix, we were trying to find any words[n] near words[n] instead of finding any words[n] near words[n+1], for example: for a phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world".	2022-07-21 10:04:30 +02:00
bors[bot]	941af58239	Merge #561 561: Enriched documents batch reader r=curquiza a=Kerollmops ~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~ This PR contains the work in #555 and can be merged on main as soon as reviewed and approved. - [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id. - [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`. - [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`. - [x] Remove the `update_primary_key` from the _transform.rs_ file. - [x] Really generate the auto-generated documents ids. - [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-07-21 07:08:50 +00:00
Loïc Lecrenier	41a0ce07cb	Add a code comment, as suggested in PR review Co-authored-by: Many the fish <many@meilisearch.com>	2022-07-20 16:20:35 +02:00
Loïc Lecrenier	1506683705	Avoid using too much memory when indexing facet-exists-docids	2022-07-19 14:42:35 +02:00
Loïc Lecrenier	d0eee5ff7a	Fix compiler error	2022-07-19 13:54:30 +02:00
Loïc Lecrenier	aed8c69bcb	Refactor indexation of the "facet-id-exists-docids" database The idea is to directly create a sorted and merged list of bitmaps in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating a grenad::Reader where the keys are field_id and the values are docids. Then we send that BTreeMap to the thing that handles TypedChunks, which inserts its content into the database.	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	1eb1e73bb3	Add integration tests for the EXISTS filter	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	4f0bd317df	Remove custom implementation of BytesEncode/Decode for the FieldId	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	80b962b4f4	Run cargo fmt	2022-07-19 10:07:33 +02:00
Loïc Lecrenier	ea0642c32d	Make filter parser more strict regarding spacing around operators OR, AND, NOT, TO must now be followed by spaces	2022-07-19 10:07:33 +02:00

... 3 4 5 6 7 ...

2134 Commits