meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 18:45:06 +08:00

Author	SHA1	Message	Date
bors[bot]	200e98c211	Merge #293 293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops Fix https://github.com/meilisearch/meilisearch/issues/1505. fix https://github.com/meilisearch/MeiliSearch/issues/1529 Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-27 16:04:52 +00:00
Clémentine Urquizar	6a141694da	Update version for the next release (v0.8.0)	2021-07-27 16:38:42 +02:00
Kerollmops	dc2b63abdf	Introduce an empty FilterCondition variant to support unknown fields	2021-07-27 16:34:04 +02:00
Kerollmops	b12738cfe9	Use the right DB prefixes to store the faceted fields	2021-07-22 19:18:22 +02:00
Kerollmops	7aa6cc9b04	Do not insert fields in the map when changing the settings	2021-07-22 18:40:12 +02:00
bors[bot]	ee3a49cfba	Merge #291 291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released. It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏 ---- This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it. As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was triggered after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`. I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes. 
<details> <summary>The backtrace leading us to a panic in grenad.</summary> ``` meilisearch_1 \| thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17 meilisearch_1 \| stack backtrace: meilisearch_1 \| 0: rust_begin_unwind meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5 meilisearch_1 \| 1: core::panicking::panic_fmt meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14 meilisearch_1 \| 2: core::panicking::panic meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5 meilisearch_1 \| 3: grenad::block_builder::BlockBuilder::insert meilisearch_1 \| at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17 meilisearch_1 \| 4: grenad::writer::Writer<W>::insert meilisearch_1 \| at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12 meilisearch_1 \| 5: milli::update::words_level_positions::write_level_entry meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5 meilisearch_1 \| 6: milli::update::words_level_positions::compute_positions_levels meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13 meilisearch_1 \| 7: milli::update::words_level_positions::WordsLevelPositions::execute meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23 meilisearch_1 \| 8: milli::update::index_documents::IndexDocuments::execute_raw meilisearch_1 \| at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9 meilisearch_1 \| 9: milli::update::index_documents::IndexDocuments::execute meilisearch_1 \| at 
./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9 meilisearch_1 \| 10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30 meilisearch_1 \| 11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22 meilisearch_1 \| 12: meilisearch_http::index::update_handler::UpdateHandler::handle_update meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18 meilisearch_1 \| 13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}} meilisearch_1 \| at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35 meilisearch_1 \| 14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21 meilisearch_1 \| 15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17 meilisearch_1 \| 16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9 meilisearch_1 \| 17: tokio::runtime::task::core::CoreStage<T>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13 meilisearch_1 \| 18: tokio::runtime::task::harness::poll_future::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23 meilisearch_1 \| 19: <std::panic::AssertUnwindSafe<F> as 
core::ops::function::FnOnce<()>>::call_once meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9 meilisearch_1 \| 20: std::panicking::try::do_call meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40 meilisearch_1 \| 21: std::panicking::try meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19 meilisearch_1 \| 22: std::panic::catch_unwind meilisearch_1 \| at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14 meilisearch_1 \| 23: tokio::runtime::task::harness::poll_future meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19 meilisearch_1 \| 24: tokio::runtime::task::harness::Harness<T,S>::poll_inner meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9 meilisearch_1 \| 25: tokio::runtime::task::harness::Harness<T,S>::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15 meilisearch_1 \| 26: tokio::runtime::task::raw::RawTask::poll meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18 meilisearch_1 \| 27: tokio::runtime::task::Notified<S>::run meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9 meilisearch_1 \| 28: tokio::runtime::blocking::pool::Inner::run meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17 meilisearch_1 \| 29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} meilisearch_1 \| at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17 meilisearch_1 \| note: Some details are omitted, run with 
`RUST_BACKTRACE=full` for a verbose backtrace. ``` </details> Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-07-22 16:14:35 +00:00
Kerollmops	0353fbb5df	Bump the tokenizer version to v0.2.4	2021-07-22 17:14:45 +02:00
Kerollmops	92c0a2cdc1	Add a test that triggers a panic when indexing zeroes	2021-07-22 17:14:44 +02:00
Kerollmops	aa02a7fdd8	Add a test to check that we indeed impact the relevancy	2021-07-22 17:04:38 +02:00
Clément Renault	0227254a65	Return the original string values for the inverted facet index database	2021-07-21 16:59:39 +02:00
Kerollmops	03a01166ba	Display the original facet string value from the linear facet database	2021-07-21 16:59:39 +02:00
Clément Renault	d23c250ad5	Fix a bound error in the facet string range construction	2021-07-21 16:59:39 +02:00
Clément Renault	081278dfd6	Use the facet string levels when computing the facet distribution	2021-07-21 16:59:39 +02:00
Clément Renault	5676b204dd	Fix the facet string levels codecs	2021-07-21 16:59:38 +02:00
Kerollmops	8c86348119	Indexing the facet strings levels	2021-07-21 16:59:38 +02:00
Kerollmops	a7ae552ba7	Fix the FacetStringLevelZeroRange range when unbounded	2021-07-21 16:59:38 +02:00
Kerollmops	757b2b502a	Remove the FacetValueStringCodec	2021-07-21 16:59:38 +02:00
Kerollmops	adfd4da24c	Introduce the FacetStringIter iterator	2021-07-21 16:59:38 +02:00
Kerollmops	a79661c6dc	Introduce a lot of facet string helper iterators	2021-07-21 16:59:38 +02:00
Kerollmops	851f979039	Describe the way we want to group the facet strings	2021-07-21 16:59:38 +02:00
Kerollmops	f858f64b1f	Move the facet number iterators into their own module	2021-07-21 16:59:37 +02:00
Kerollmops	9f8095c069	Make sure that we don't keep a reference on the LMDB key when using put_current	2021-07-21 10:35:35 +02:00
Kerollmops	a9553af635	Add a test to check that we can index more than 256 fields	2021-07-06 11:58:03 +02:00
Kerollmops	838ed1cd32	Use an u16 field id instead of one byte	2021-07-06 11:58:03 +02:00
Kerollmops	91c5d0c042	Use the AlwaysFreePages flag when opening an index	2021-07-05 16:36:13 +02:00
Kerollmops	a6b4069172	Bump to v0.7.2	2021-07-05 10:54:53 +02:00
many	9f62149b94	Fix matching length in matching_words	2021-07-01 19:03:28 +02:00
Clémentine Urquizar	3c149d8a43	Update tokenizer version to v0.2.3	2021-06-30 18:41:35 +02:00
bors[bot]	b4dcdbf00d	Merge #269 #271 269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops This PR fixes #268. The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted. The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html). 271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-06-30 12:34:55 +00:00
Kerollmops	32b7bd366f	Remove the roaring operation functions warnings	2021-06-30 14:12:56 +02:00
Kerollmops	c92ef54466	Add a test for when we insert a previously deleted document	2021-06-30 14:00:01 +02:00
Kerollmops	28782ff99d	Fix ExternalDocumentsIds struct when inserting previously deleted ids	2021-06-30 14:00:01 +02:00
Clémentine Urquizar	b489515f4d	Update milli version to v0.7.1	2021-06-30 13:52:46 +02:00
Kerollmops	54889813ce	Implement some debug functions on the ExternalDocumentsIds struct	2021-06-30 11:29:41 +02:00
Kerollmops	4bce66d5ff	Make the Index::delete_* method private	2021-06-30 10:07:31 +02:00
Irevoire	6044b80362	Update milli/src/search/matching_words.rs Co-authored-by: Clément Renault <renault.cle@gmail.com>	2021-06-30 00:35:26 +02:00
Tamo	be75e738b1	add more tests	2021-06-29 16:24:58 +02:00
Tamo	56fceb1928	re-implement the Damerau-Levenshtein used for the highlighting	2021-06-29 15:36:03 +02:00
Clément Renault	80c6aaf1fd	Bump milli to 0.7.0	2021-06-28 18:31:56 +02:00
Clément Renault	bdc5599b73	Bump heed to use the git repo with v0.12.0	2021-06-28 18:26:20 +02:00
Clément Renault	0013236e5d	Fix the LMDB and heed invalid interactions. It is undefined behavior to keep a reference to the database while modifying it, we were keeping references in the database and also feeding the heed put_current methods with keys referenced inside the database itself. https://github.com/Kerollmops/heed/pull/108	2021-06-28 16:19:02 +02:00
Kerollmops	9e5f9a8a10	Add a test for the words level positions generation bug	2021-06-28 16:08:31 +02:00
Kerollmops	98285b4b18	Bump milli to 0.6.0	2021-06-23 17:30:26 +02:00
Kerollmops	4fc8f06791	Rename faceted_fields into filterable_fields	2021-06-23 17:26:54 +02:00
Kerollmops	c31cadb54f	Do not consider the searchable field as filterable	2021-06-23 17:26:54 +02:00
bors[bot]	2ab24c4f49	Merge #256 256: Update version for the next release (v0.5.1) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-06-23 12:29:57 +00:00
Clémentine Urquizar	9885fb4159	Update version for the next release (v0.5.1)	2021-06-23 14:05:20 +02:00
Kerollmops	a6218a20ae	Introduce a new InvalidFacetsDistribution user error	2021-06-23 13:56:19 +02:00
Kerollmops	2364777838	Return an error for when a field distribution cannot be done	2021-06-23 11:50:49 +02:00
Kerollmops	aeaac743ff	Replace an if let some by a match	2021-06-23 11:33:30 +02:00
Tamo	8d2a0b43ff	run the formatter on the whole project a second time	2021-06-22 15:36:22 +02:00
Tamo	3d90b03d7b	fix the limit There was no check on the limit and thus, if a user specified a very large number, this line could cause a panic	2021-06-22 14:52:13 +02:00
bors[bot]	5b6adc6d96	Merge #245 245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops Closes #191, and resolves #140. Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-06-22 12:10:52 +00:00
Kerollmops	51dbb2e06d	Warn for when a key is too large for LMDB	2021-06-22 11:51:36 +02:00
Kerollmops	aecbd14761	Improve the error message for InvalidDocumentId	2021-06-22 11:31:58 +02:00
Kerollmops	0cca2ea24f	Return a MissingDocumentId when a document doesn't have one	2021-06-22 11:22:33 +02:00
Kerollmops	481b0bf277	Warn for when a facet key is too large for LMDB	2021-06-22 10:57:46 +02:00
bors[bot]	b073fd49ea	Merge #244 244: Update version for the next release (v0.5.0) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-06-21 14:27:10 +00:00
Clémentine Urquizar	320670f8fe	Update version for the next release (v0.5.0)	2021-06-21 15:59:17 +02:00
Clémentine Urquizar	daef43f504	Rename FieldsDistribution into FieldDistribution	2021-06-21 15:57:41 +02:00
Clémentine Urquizar	35fcc351a0	Update version for the next release (v0.4.2)	2021-06-20 17:37:24 +02:00
bors[bot]	5b19dd23d9	Merge #240 240: Field distribution r=Kerollmops a=irevoire closes #199 closes #198 Co-authored-by: Tamo <tamo@meilisearch.com>	2021-06-19 10:14:25 +00:00
Tamo	d08cfda796	convert the field_distribution to a BTreeMap and avoid counting twice the same documents	2021-06-17 18:31:54 +02:00
bors[bot]	a9e552ab18	Merge #238 238: Integration tests on filters and distinct r=Kerollmops a=ManyTheFish Fix #216 Fix #120 Co-authored-by: many <maxime@meilisearch.com>	2021-06-17 15:00:51 +00:00
many	6cb1102bdb	Fix PR comments	2021-06-17 15:19:03 +02:00
Tamo	969adaefdf	rename fields_distribution in field_distribution	2021-06-17 15:16:20 +02:00
Kerollmops	ccd6f13793	Update version to the next release (0.4.1)	2021-06-17 15:01:20 +02:00
many	f496cd320d	Add distinct integration tests	2021-06-17 14:33:18 +02:00
many	9f4184208e	Add test on filters	2021-06-17 13:56:09 +02:00
marin	70bee7d405	re-export remaining error types Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-17 11:49:03 +02:00
marin postma	abbebad669	change sub errors visibility	2021-06-17 11:44:01 +02:00
Tamo	9716fb3b36	format the whole project	2021-06-16 18:33:33 +02:00
Clémentine Urquizar	f5ff3e8e19	Update version for the next release (v0.4.0)	2021-06-16 14:01:05 +02:00
many	ce0315a10f	Close write transaction in test	2021-06-16 11:03:37 +02:00
Kerollmops	7ac441e473	Fix small typos	2021-06-16 11:03:37 +02:00
Kerollmops	adf0c389c5	Rename FilterParsing into InvalidFilter	2021-06-16 11:03:36 +02:00
Kerollmops	8cfe3e1ec0	Rename DatabaseSizeReached into MaxDatabaseSizeReached	2021-06-16 11:03:36 +02:00
Kerollmops	4eda438f6f	Add a new Error for when a user uses a non-filtered attribute in a filter	2021-06-16 11:03:36 +02:00
Kerollmops	713acc408b	Introduce the primary key to the Settings builder structure	2021-06-16 11:03:36 +02:00
Kerollmops	a7d6930905	Replace the panicking expect by tracked Errors	2021-06-15 11:51:32 +02:00
Kerollmops	f0e804afd5	Rename the FieldIdMapMissingEntry from_db_name field into process	2021-06-15 11:13:04 +02:00
Kerollmops	28c004aa2c	Prefer using constant for the database names	2021-06-15 11:13:04 +02:00
Kerollmops	312c2d1d8e	Use the Error enum everywhere in the project	2021-06-14 16:58:38 +02:00
Kerollmops	ca78cb5aca	Introduce more variants to the error module enums	2021-06-14 16:58:38 +02:00
Kerollmops	456541e921	Implement the Display trait on the Error type	2021-06-14 16:48:51 +02:00
Kerollmops	44c353fafd	Introduce some way to construct an Error	2021-06-14 16:48:51 +02:00
Kerollmops	23fcf7920e	Introduce a basic version of the InternalError struct	2021-06-14 16:48:51 +02:00
Kerollmops	d2b1ecc885	Remove a lot of serialization unreachable errors	2021-06-14 16:48:51 +02:00
Kerollmops	65b1d09d55	Move the obkv merging functions into the merge_function module	2021-06-14 16:48:51 +02:00
Kerollmops	ab727e428b	Remove the docid_word_positions_merge method that must never be called	2021-06-14 16:48:51 +02:00
Kerollmops	93a8633f18	Remove the documents_merge method that must never be called	2021-06-14 16:48:51 +02:00
Kerollmops	cfc7314bd1	Prefer using an explicit merge function name	2021-06-14 16:48:50 +02:00
Kerollmops	93978ec38a	Serializing a RoaringBitmap into a Vec cannot fail	2021-06-14 16:48:50 +02:00
Kerollmops	ff9414a6ba	Use the out of the compute_primary_key_pair function	2021-06-14 16:48:50 +02:00
Many	f4cab080a6	Update milli/src/search/query_tree.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-10 11:30:51 +02:00
Many	36715f571c	Update milli/src/search/criteria/proximity.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-10 11:30:33 +02:00
many	e923a3ed6a	Replace Consecutive by Phrase in query tree Replace Consecutive by Phrase in query tree in order to remove theoretical bugs due to the Consecutive enum type.	2021-06-10 11:16:16 +02:00
Clémentine Urquizar	dc64e139b9	Update version for the next release (v0.3.1)	2021-06-09 14:39:21 +02:00
bors[bot]	afb4133bd2	Merge #212 #222 #223 212: Introduce integration test on criteria r=Kerollmops a=ManyTheFish - add pre-ranked dataset - test each criterion 1 by 1 - test all criteria in several orders 222: Move the `UpdateStore` into the http-ui crate r=Kerollmops a=Kerollmops We no longer need to have the `UpdateStore` inside of the milli crate as this is the job of the caller to stack the updates and sequentially give them to milli. 223: Update dataset links r=Kerollmops a=curquiza Co-authored-by: many <maxime@meilisearch.com> Co-authored-by: Many <legendre.maxime.isn@gmail.com> Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-06-09 08:47:19 +00:00
bors[bot]	6faa87302c	Merge #220 220: Make hard separators split phrase query r=Kerollmops a=ManyTheFish hard separators will now split a phrase query as two sequential phrases (double-quoted strings): the query `"Radioactive (Imagine Dragons)"` would be considered equivalent to `"Radioactive" "Imagine Dragons"` which has the small disadvantage of not keeping the order of the two (or more) separate phrases. Fix #208 Co-authored-by: many <maxime@meilisearch.com> Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2021-06-09 08:22:58 +00:00
Many	f4ff30e99d	Update milli/tests/search/mod.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-09 10:12:24 +02:00
Many	ab696f6a23	Update milli/tests/search/query_criteria.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-09 10:12:17 +02:00
Kerollmops	0bf4f3f48a	Modify a test to check that criteria additions change the fields ids map	2021-06-08 18:14:34 +02:00
Kerollmops	82df524e09	Make sure that we register the field when setting criteria	2021-06-08 18:14:33 +02:00
Kerollmops	103dddba2f	Move the UpdateStore into the http-ui crate	2021-06-08 17:59:51 +02:00
Many	faf148d297	Update milli/src/search/query_tree.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-08 17:52:37 +02:00
Kerollmops	133ab98260	Use the index primary key when deleting documents	2021-06-08 17:33:29 +02:00
many	b489d699ce	Make hard separators split phrase query hard separators will now split a phrase query as double double-quotes Fix #208	2021-06-08 17:29:38 +02:00
Many	afb09c914d	Update milli/tests/search/query_criteria.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-08 16:53:56 +02:00
many	b64cd2a3e3	Resolve PR comments	2021-06-08 14:14:34 +02:00
many	1fcc5f73ac	Factorize tests using macro_rules	2021-06-08 12:33:02 +02:00
many	10882bcbce	Introduce integration test on criteria	2021-06-03 14:44:53 +02:00
bors[bot]	a32236c80c	Merge #211 211: Update Cargo.toml for next release v0.3.0 r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-06-03 10:42:52 +00:00
Clémentine Urquizar	3b2b3aeea9	Update Cargo.toml for next release v0.3.0	2021-06-03 12:24:27 +02:00
bors[bot]	39ed133f9f	Merge #193 193: Fix primary key behavior r=Kerollmops a=MarinPostma this pr: - Adds early returns on empty document additions, avoiding error messages to be returned when adding no documents and no primary key was set. - Changes the primary key inference logic to match that of legacy meilisearch. close #194 Co-authored-by: Marin Postma <postma.marin@protonmail.com> Co-authored-by: marin postma <postma.marin@protonmail.com>	2021-06-03 10:24:21 +00:00
marin postma	57898d8a90	fix silent deserialize error	2021-06-03 10:42:55 +02:00
bors[bot]	834504aec0	Merge #204 204: Decorrelate Distinct, Asc/Desc, Filterable fields from the faceted fields r=Kerollmops a=Kerollmops This PR decorrelates the fields that need to be stored in facet databases (big inverted indexes for fast access) from the filterable fields, the previously named faceted fields are now named filterable fields and are the union of the distinct attribute, all the Asc/Desc criteria and, the filterable fields. I added two tests to make sure that the engine was correctly generating the faceted databases when a distinct attribute or an Asc/Desc criteria were added, and one to make sure that it was impossible to filter on a non-filterable field even if it was a faceted one. Note that the `AttributesForFacetting` has also been renamed into `FilterableAttributes`. But it will be the Transplant's job to do that on the API, this change is only visible to the milli's library users. - Related to https://github.com/meilisearch/transplant/issues/187. - Fixes #161 by returning the documents that don't have the Asc/Desc field at the end of the bucket. - Fixes #168. - Fixes #152. Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Marin Postma <postma.marin@protonmail.com> Co-authored-by: many <maxime@meilisearch.com>	2021-06-02 15:43:39 +00:00
many	26a9974667	Make asc/desc criterion return resting documents Fix #161.2	2021-06-02 17:41:48 +02:00
Kerollmops	3c304c89d4	Make sure that we generate the faceted database when required	2021-06-02 16:24:58 +02:00
Kerollmops	b0c0490e85	Make sure that we can add an Asc/Desc field without it being filterable	2021-06-02 16:24:58 +02:00
Kerollmops	3b1cd4c4b4	Rename the FacetCondition into FilterCondition	2021-06-02 16:24:58 +02:00
Kerollmops	c2afdbb1fb	Move and comment some internal facet_condition helper functions	2021-06-02 16:24:58 +02:00
Kerollmops	6476827d3a	Fix the indexer to be sure that distinct and Asc/Desc are also faceted	2021-06-02 16:24:58 +02:00
Marin Postma	1e366dae3e	remove useless lifetime on Distinct Trait	2021-06-02 16:24:58 +02:00
Kerollmops	187c713de5	Remove the MapDistinct struct as now distinct attributes are faceted	2021-06-02 16:24:57 +02:00
Kerollmops	ff440c1d9d	Introduce the faceted fields method to retrieve those that need faceting	2021-06-02 16:24:57 +02:00
Kerollmops	2a3f9b32ff	Rename the faceted fields into filterable fields	2021-06-02 16:24:57 +02:00
tamo	06c414a753	move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli	2021-06-02 11:11:50 +02:00
tamo	3c84075d2d	uses an env variable to find the datasets	2021-06-02 11:05:07 +02:00
tamo	4969abeaab	update the facets for the benchmarks	2021-06-02 11:05:07 +02:00
tamo	e5dfde88fd	fix the facets conditions	2021-06-02 11:05:07 +02:00
tamo	7c7fba4e57	remove the time limitation to let criterion do what it wants	2021-06-02 11:05:07 +02:00
tamo	5d5d115608	reformat all the files	2021-06-02 11:05:07 +02:00
tamo	7086009f93	improve the base search	2021-06-02 11:05:07 +02:00
tamo	d0b44c380f	add benchmarks on a wiki dataset	2021-06-02 11:05:07 +02:00
tamo	beae843766	add a missing space	2021-06-02 11:05:07 +02:00
tamo	5132a106a1	refactor everything related to the songs dataset into a songs benchmark file	2021-06-02 11:05:07 +02:00
tamo	136efd6b53	fix the benches	2021-06-02 11:05:07 +02:00
tamo	4b78ef31b6	add the configuration of the searchable fields and displayed fields and a default configuration for the songs	2021-06-02 11:05:07 +02:00
tamo	ea0c6d8c40	add a bunch of queries and start the introduction of the filters and the new dataset	2021-06-02 11:05:07 +02:00
tamo	3def42abd8	merge all the criterion only benchmarks in one file	2021-06-02 11:05:07 +02:00
tamo	a2bff68c1a	remove the optional words for the typo criterion	2021-06-02 11:05:07 +02:00
tamo	aee49bb3cd	add the proximity criterion	2021-06-02 11:05:07 +02:00
tamo	49e4cc3daf	add the words criterion to the bench	2021-06-02 11:05:07 +02:00
tamo	15cce89a45	update the README with instructions to download the dataset	2021-06-02 11:05:07 +02:00
tamo	e425f70ef9	let criterion decide how much iteration it wants to do in 10s	2021-06-02 11:05:07 +02:00
tamo	4fdbfd6048	push a first version of the benchmark for the typo	2021-06-02 11:05:07 +02:00
bors[bot]	270da98c46	Merge #202 202: Add field id word count docids database r=Kerollmops a=LegendreM This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute with a list of document ids. This relation is limited to attributes that contain less than 11 words. This database is used by the exactness criterion to know if a document has an attribute that contains exactly the query without any additional word. Fix #165 Fix #196 Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36) Co-authored-by: many <maxime@meilisearch.com> Co-authored-by: Many <legendre.maxime.isn@gmail.com>	2021-06-01 16:09:48 +00:00
many	e857ca4d7d	Fix PR comments	2021-06-01 18:06:46 +02:00
Many	ab2cf69e8d	Update milli/src/update/delete_documents.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-01 17:04:10 +02:00
Many	8e6d1ff0dc	Update milli/src/update/index_documents/store.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-06-01 17:04:02 +02:00
bors[bot]	7d36d664a7	Merge #203 203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm. Fix #138 Co-authored-by: many <maxime@meilisearch.com>	2021-06-01 12:00:33 +00:00
many	225ae6fd25	Resolve PR comments	2021-06-01 11:53:09 +02:00
Marin Postma	984dc7c1ed	rewrite roaring codec without byteorder.	2021-05-31 22:15:39 +02:00
Marin Postma	1373637da1	optimize roaring codec	2021-05-31 22:15:35 +02:00
many	1df68d342a	Make the MatchingWords return the number of matching bytes	2021-05-31 18:22:29 +02:00
many	c701f8bf36	Use field id word count database in exactness criterion	2021-05-31 16:27:28 +02:00
many	4ddf008be2	add field id word count database	2021-05-31 16:27:28 +02:00
bors[bot]	2f5e61bacb	Merge #184 184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facets values, numbers and strings are now in dedicated databases and the user no longer needs to define the type of the fields. No more conversion between the two types is done, numbers (floats and integers converted to f64) go to the facet float database and strings go to the strings facet database. There is one related issue that I found regarding CSVs, the values in a CSV are always considered to be strings, [meilisearch/specifications#28](`d916b57d74/text/0028-indexing-csv.md`) fixes this issue by allowing the user to define the fields types using `:` in the "CSV Formatting Rules" section. All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good! Fixes #192. Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2021-05-31 13:32:58 +00:00
Kerollmops	1c0a5cd136	Resolve code modification suggestions	2021-05-31 15:22:50 +02:00
many	a5e98cf46d	Fix plane sweep algorithm	2021-05-25 18:21:55 +02:00
Clément Renault	3a4a150ef0	Fix the tests and remaining warnings	2021-05-25 11:31:06 +02:00
Clément Renault	02c655ff1a	Refine the facet distribution to use both databases	2021-05-25 11:30:00 +02:00
Clément Renault	79efded841	Refine the FacetCondition from_array constructor	2021-05-25 11:30:00 +02:00
Clément Renault	f7efde11d9	Refine the facet condition to use both facet databases	2021-05-25 11:30:00 +02:00
Clément Renault	e62b89a2ed	Make the facet distinct work with the new split facets	2021-05-25 11:30:00 +02:00
Clément Renault	bd7b285bae	Split the update side to use the number and the strings facet databases	2021-05-25 11:30:00 +02:00
Clément Renault	038e03a4e4	Use both facet databases in the FacetIter type	2021-05-25 11:30:00 +02:00
Clément Renault	597144b0b9	Use both number and string facet databases in the distinct system	2021-05-25 11:29:59 +02:00
Clément Renault	837c1041c7	Clear and delete the documents from the facet database	2021-05-25 11:28:36 +02:00
Clément Renault	a56c46b6f1	Explode the string and f64 facet databases into two	2021-05-25 11:28:36 +02:00
Clément Renault	df7a32e3d0	Move the creation date initialization into a function	2021-05-25 11:28:35 +02:00
many	a3944a7083	Introduce a filtered_candidates field	2021-05-11 11:37:40 +02:00
many	efba662ca6	Fix clippy warnings in criteria	2021-05-10 10:27:18 +02:00
many	e923d51b8f	Make bucket candidates optionals	2021-05-10 10:27:04 +02:00
Marin Postma	eeb0c70ea2	meilisearch compatible primary key inference	2021-05-06 22:42:32 +02:00
Marin Postma	313c362461	early return on empty document addition	2021-05-06 18:14:16 +02:00
Many	44b6843de7	Fix pull request reviews Update milli/src/fields_ids_map.rs Update milli/src/search/criteria/exactness.rs Update milli/src/search/criteria/mod.rs	2021-05-06 14:31:03 +02:00
many	c1ce4e4ca9	Introduce mocked ExactAttribute step in exactness criterion	2021-05-06 14:28:31 +02:00
many	a3f8686fbf	Introduce exactness criterion	2021-05-06 14:28:30 +02:00
bors[bot]	25f75d4d03	Merge #189 189: Update version for the next release (v0.2.1) r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2021-05-05 15:28:56 +00:00
Clémentine Urquizar	1e11578ef0	Update version for the next release (v0.2.1)	2021-05-05 14:57:34 +02:00
Alexey Shekhirin	f8d0f5265f	fix(update): fields distribution after documents merge	2021-05-04 22:12:20 +03:00
tamo	d61566787e	provide an iterator over all the documents in a milli index	2021-05-04 11:23:51 +02:00
Clémentine Urquizar	a8680887d8	Upgrade Milli version (v0.2.0)	2021-05-03 14:50:47 +02:00
Clémentine Urquizar	34e02aba42	Upgrade Tokenizer version (v0.2.2)	2021-05-03 10:55:55 +02:00
Alexey Shekhirin	d81c0e8bba	feat(update): disable autogenerate_docids by default	2021-04-30 21:41:34 +03:00
Marin Postma	e8e32e0ba1	make document addition number visible	2021-04-29 20:05:07 +02:00
many	ee09e50e7f	Remove excluded document in criteria iterations - pass excluded document to criteria to remove them in higher levels of the bucket-sort - merge already returned document with excluded documents to avoid duplicates Related to #125 and #112 Fix #170	2021-04-29 12:09:38 +02:00
many	31607bf9cd	Add a threshold on proximity when choosing between linear/set algorithm	2021-04-28 14:57:22 +02:00
many	3b7e6afb55	Make some refacto and add documentation	2021-04-28 13:53:27 +02:00
Many	0add4d735c	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:40:34 +02:00
Many	3794ffc952	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:39:23 +02:00
Many	329bd4a1bb	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:39:03 +02:00
Many	3b1358b62f	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:32:19 +02:00
Many	c862b1bc6b	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:32:10 +02:00
Many	e92d137676	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:31:42 +02:00
Many	b3d6c6a9a0	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:31:13 +02:00
Many	498c2b298c	Update milli/src/search/criteria/attribute.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:30:02 +02:00
Many	0e4e6dfada	Update milli/src/search/criteria/proximity.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 17:29:52 +02:00
Many	47d780b8ce	Update milli/src/search/criteria/mod.rs Co-authored-by: Irevoire <tamo@meilisearch.com>	2021-04-27 14:39:53 +02:00
Many	0daa0e170a	Fix PR comments Co-authored-by: Clément Renault <clement@meilisearch.com>	2021-04-27 14:39:53 +02:00
many	0d7d3ce802	Update roaring package	2021-04-27 14:39:53 +02:00
many	71740805a7	Fix forgotten typo tests	2021-04-27 14:39:53 +02:00
many	e77291a6f3	Optimize Attribute criterion on big requests	2021-04-27 14:39:53 +02:00
many	716c8e22b0	Add style and comments	2021-04-27 14:39:52 +02:00
many	f853790016	Use the LCM of 10 first numbers to compute attribute rank	2021-04-27 14:39:52 +02:00
many	2b036449be	Fix the return of equal candidates in different pages	2021-04-27 14:39:52 +02:00
many	0efa011e09	Make a small code clean-up	2021-04-27 14:39:52 +02:00
many	17c8c6f945	Make set algorithm return None when nothing can be returned	2021-04-27 14:39:52 +02:00
many	b3e2280bb9	Debug attribute criterion * debug folding when initializing iterators	2021-04-27 14:39:52 +02:00
many	1eee0029a8	Make attribute criterion typo/prefix tolerant	2021-04-27 14:39:52 +02:00
many	59f58c15f7	Implement attribute criterion * Implement WordLevelIterator * Implement QueryLevelIterator * Implement set algorithm based on iterators Not tested + Some TODO to fix	2021-04-27 14:39:52 +02:00
Clément Renault	361193099f	Reduce the amount of branches when query tree flattened	2021-04-27 14:39:52 +02:00
Kerollmops	e65bad16cc	Compute the words prefixes at the end of an update	2021-04-27 14:39:52 +02:00
many	ab92c814c3	Fix attributes score	2021-04-27 14:35:43 +02:00
Clément Renault	0ad9499b93	Fix an indexing bug in the words level positions	2021-04-27 14:35:43 +02:00
Clément Renault	7aa5753ed2	Make the attribute positions range bounds to be fixed	2021-04-27 14:35:43 +02:00
Clément Renault	658f316511	Introduce the Initial Criterion	2021-04-27 14:35:43 +02:00
Kerollmops	89ee2cf576	Introduce the TreeLevel struct	2021-04-27 14:25:35 +02:00
Kerollmops	bd1a371c62	Compute the WordsLevelPositions only once	2021-04-27 14:25:34 +02:00
Kerollmops	8bd4f5d93e	Compute the biggest values of the words_level_positions_docids	2021-04-27 14:25:34 +02:00
Kerollmops	f713828406	Implement the clear and delete documents for the word-level-positions database	2021-04-27 14:25:34 +02:00
Kerollmops	3069bf4f4a	Fix and improve the words-level-positions computation	2021-04-27 14:25:34 +02:00
Kerollmops	3a25137ee4	Expose and use the WordsLevelPositions update	2021-04-27 14:25:34 +02:00
Kerollmops	c765f277a3	Introduce the WordsLevelPositions update	2021-04-27 14:25:34 +02:00
Kerollmops	9242f2f1d4	Store the first word positions levels	2021-04-27 14:25:34 +02:00
Kerollmops	b0a417f342	Introduce the word_level_position_docids Index database	2021-04-27 14:25:34 +02:00
many	75e7b1e3da	Implement test Context methods	2021-04-27 14:25:34 +02:00
many	4ff67ec2ee	Implement attribute criterion for small amounts of candidates	2021-04-27 14:25:34 +02:00
Kerollmops	0f4c0beffd	Introduce the Attribute criterion	2021-04-27 14:25:34 +02:00
tamo	f8dee1b402	[makes clippy happy] search/criteria/proximity.rs	2021-04-21 12:36:45 +02:00
Alexey Shekhirin	6fa00c61d2	feat(search): support words_limit	2021-04-20 12:22:04 +03:00
Kerollmops	c9b2d3ae1a	Warn instead of returning an error when a conversion fails	2021-04-20 10:23:31 +02:00
Kerollmops	2aeef09316	Remove debug logs while iterating through the facet levels	2021-04-20 10:23:31 +02:00
Kerollmops	51767725b2	Simplify integer and float functions trait bounds	2021-04-20 10:23:31 +02:00
Kerollmops	efbfa81fa7	Merge the Float and Integer enum variant into the Number one	2021-04-20 10:23:30 +02:00
Clémentine Urquizar	127d3d028e	Update version for the next release (v0.1.1)	2021-04-19 14:48:13 +02:00
Alexey Shekhirin	33860bc3b7	test(update, settings): set & reset synonyms fixes after review more fixes after review	2021-04-18 11:24:17 +03:00
Alexey Shekhirin	e39aabbfe6	feat(search, update): synonyms	2021-04-18 11:24:17 +03:00
Marin Postma	9c4660d3d6	add tests	2021-04-15 16:25:56 +02:00
Marin Postma	75464a1baa	review fixes	2021-04-15 16:25:56 +02:00
Marin Postma	2f73fa55ae	add documentation	2021-04-15 16:25:55 +02:00
Marin Postma	45c45e11dd	implement distinct attribute distinct can return error facet distinct on numbers return distinct error review fixes make get_facet_value more generic fixes	2021-04-15 16:25:55 +02:00
Clémentine Urquizar	2c5c79d68e	Update Tokenizer version to v0.2.1	2021-04-14 18:54:04 +02:00
tamo	dcb00b2e54	test a new implementation of the stop_words	2021-04-12 18:35:33 +02:00
tamo	da036dcc3e	Revert "Integrate the stop_words in the querytree" This reverts commit `12fb509d84`. We revert this commit because it's causing the bug #150. The initial algorithm we implemented for the stop_words was: 1. remove the stop_words from the dataset 2. keep the stop_words in the query to see if we can generate new words by integrating typos or if the word was a prefix => This was causing the bug since, in the case of “The hobbit”, we were always looking for something starting with “t he” or “th e” instead of ignoring the word completely. For now we are going to fix the bug by completely ignoring the stop_words in the query. This could cause another problem where someone mistyped a normal word and ended up typing a stop_word. For example imagine someone searching for the music “Won't he do it”. If that person misplaces one space and writes “Won' the do it” then we will lose a part of the request. One fix would be to update our query tree to something like that: --------------------- OR OR TOLERANT hobbit # the first option is to ignore the stop_word AND CONSECUTIVE # the second option is to do as we are doing EXACT t # currently EXACT he TOLERANT hobbit --------------------- This would drastically increase the size of our query tree on requests with a lot of stop_words. For example think of “The Lord Of The Rings”. For now, however, we decided we were going to ignore this problem and consider that it doesn't reduce the relevancy of the search too much while it improves performance.	2021-04-12 18:35:33 +02:00
Alexey Shekhirin	84c1dda39d	test(http): setting enum serialize/deserialize	2021-04-08 17:03:40 +03:00
Alexey Shekhirin	dc636d190d	refactor(http, update): introduce setting enum	2021-04-08 17:03:40 +03:00
tamo	0a4bde1f2f	update the default ordering of the criterion	2021-04-01 19:45:31 +02:00

563 Commits