meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-11-23 02:27:40 +08:00

Author	SHA1	Message	Date
Louis Dureuil	dde3a04679	WIP arroy integration	2023-12-14 16:07:49 +01:00
Louis Dureuil	13c2c6c16b	Small commit to add hybrid search and autoembedding	2023-12-14 16:07:48 +01:00
Louis Dureuil	21bcf32109	Add candle and hg_hub, updating a lot of deps in the process	2023-12-14 16:07:48 +01:00
Clément Renault	170e063b80	Remove the actix-web dependency from milli	2023-11-28 17:19:57 +01:00
Clément Renault	d32eb11329	Move to the v0.20.0-alpha.9 of heed	2023-11-27 11:52:22 +01:00
Clément Renault	0d4482625a	Make the changes to use heed v0.20-alpha.6	2023-11-23 11:43:58 +01:00
Clément Renault	56a0d91ecd	Update the heed dependency and lock file	2023-11-22 15:11:09 +01:00
Clément Renault	7cb7e37ba8	Merge branch 'main' into tmp-release-v1.5.0	2023-11-21 16:30:46 +01:00
Clément Renault	b10c060bf7	Cleanup TOML	2023-11-01 14:03:04 +01:00
Clément Renault	c71b1d33ae	Sort entries using rayon in the transform sorters	2023-11-01 11:07:16 +01:00
ManyTheFish	17b647dfe5	Wip	2023-10-30 11:13:08 +01:00
ManyTheFish	4c6fddb1cb	update charabia	2023-10-26 17:01:10 +02:00
meili-bors[bot]	ccf3ba3f32	Merge #4019 4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com> Co-authored-by: irevoire <irevoire@users.noreply.github.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-08-28 12:14:11 +00:00
Kerollmops	717b069907	Bump charabia to 0.8.3	2023-08-22 16:25:00 +02:00
ManyTheFish	cab27c2ab4	upgrade indexmap = "2.0.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	624fa9052f	upgrade deserr = "0.6.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	60c11dbdbd	upgrade rstar = "0.11.0"	2023-08-10 18:09:02 +02:00
ManyTheFish	dacee40ebc	upgrade memmap2 = "0.7.1"	2023-08-10 18:09:02 +02:00
ManyTheFish	cc2c19d4c3	upgrade itertools = "0.10.5"	2023-08-10 18:09:02 +02:00
ManyTheFish	b45c36cd71	Merge branch 'main' into tmp-release-v1.3.0	2023-08-01 15:05:17 +02:00
meili-bors[bot]	151c31c18f	Merge #3963 3963: Fix the milli crate r=ManyTheFish a=irevoire Milli was using the serde feature of either without enabling it first; thus, it wasn't working. It was working in meilisearch, though, because `meilisearch-types` was using the feature which enables it globally for all the other crates. ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/3962 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-07-31 09:32:08 +00:00
Tamo	a8ad0902d3	Fix the milli crate Milli was using the serde feature of either without enabling it first, thus it wasn't working	2023-07-31 11:08:27 +02:00
ThatOneCalculator	ba919b6123	fix: ⬆️ up mimalloc	2023-07-28 20:35:47 -07:00
Clément Renault	d8b47b689e	Use the new read-txn-no-tls heed feature	2023-07-26 15:45:15 +02:00
Kerollmops	29ab54b259	Replace the hnsw crate by the instant-distance one	2023-07-25 12:37:35 +02:00
ManyTheFish	0497f93494	Update Charabia to the last version	2023-07-19 15:19:32 +02:00
Kerollmops	eef95de30e	First iteration on exposing puffin profiling	2023-07-18 17:38:13 +02:00
ManyTheFish	c106906f8f	deactivate camelCase segmentation	2023-07-13 12:06:27 +02:00
Kerollmops	a442af6a7c	Update the features of the either dependency to compile milli successfully	2023-07-03 18:51:43 +02:00
meili-bors[bot]	661d1f90dc	Merge #3866 3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish # Pull Request Update Charabia: - enhance Japanese segmentation - enhance Latin Tokenization - words containing `_` are now properly segmented into several words - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule - fixes #3815 - fixes #3778 - fixes [product#151](https://github.com/meilisearch/product/discussions/151) > Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22` Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-06-29 15:24:36 +00:00
ManyTheFish	84845de9ef	Update Charabia	2023-06-29 15:56:32 +02:00
Kerollmops	a385642ec3	Replace the BTreeMap by an IndexMap to return values in order	2023-06-29 14:33:31 +02:00
Kerollmops	c79e82c62a	Move back to the hnsw crate This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.	2023-06-27 12:32:39 +02:00
Kerollmops	268a9ef416	Move to the hgg crate	2023-06-27 12:32:38 +02:00
Clément Renault	4571e512d2	Store the vectors in an HNSW in LMDB	2023-06-27 12:32:38 +02:00
Clément Renault	34349faeae	Create a new _vector extractor	2023-06-27 12:32:37 +02:00
meili-bors[bot]	45636d315c	Merge #3670 3670: Fix addition deletion bug r=irevoire a=irevoire The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed. It fixes https://github.com/meilisearch/meilisearch/issues/3440. ### What was happening? The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents. Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible. The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced: 1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB. 2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand 3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform. ### Other things introduced in this PR Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug. It's not perfect, but it's easy to improve in the future. It'll also run for as long as possible on every merge on the main branch. Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>	2023-06-19 09:09:30 +00:00
Tamo	6c6387d05e	move the fuzzer to its own crate	2023-05-29 12:27:39 +02:00
Tamo	602ad98cb8	improve the way we handle the fsts	2023-05-22 11:15:14 +02:00
Tamo	4391cba6ca	fix the addition + deletion bug	2023-05-17 18:28:57 +02:00
Kerollmops	1a79fd0c3c	Use the new heed v0.12.6	2023-05-15 11:42:30 +02:00
Kerollmops	c4a40e7110	Use the writemap flag to reduce the memory usage	2023-05-15 10:15:33 +02:00
Jakub Jirutka	13f1277637	Allow to disable specialized tokenizations (again) In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai` feature flags to allow meilisearch to be built without huge specialized tokenizations that took up 90% of the meilisearch binary size. Unfortunately, due to some recent changes, this doesn't work anymore. The problem lies in excessive use of the `default` feature flag, which infects the dependency graph. Instead of adding `default-features = false` here and there, it's easier and more future-proof to not declare `default` in `milli` and `meilisearch-types`. I've renamed it to `all-tokenizers`, which also makes it a bit clearer what it's about.	2023-05-04 15:45:40 +02:00
Louis Dureuil	90bc230820	Merge remote-tracking branch 'origin/main' into search-refactor Conflicts \| resolution ----------\|----------- Cargo.lock \| added mimalloc Cargo.toml \| took origin/main version milli/src/search/criteria/exactness.rs \| deleted after checking it was only clippy changes milli/src/search/query_tree.rs \| deleted after checking it was only clippy changes	2023-05-03 12:19:06 +02:00
ManyTheFish	249053e514	Update feature flags	2023-04-26 14:59:25 +02:00
ManyTheFish	ff2cf2a5ae	Update charabia in milli	2023-04-26 14:56:54 +02:00
Kerollmops	a109802d45	Upgrade the incompatible versions of the dependencies	2023-04-24 17:50:57 +02:00
Kerollmops	47b66e49b8	Upgrade the compatible versions of the dependencies	2023-04-24 17:50:52 +02:00
Loïc Lecrenier	6c659dc12f	Use MiMalloc in milli tests	2023-03-20 09:41:37 +01:00
bors[bot]	39407885c2	Merge #3347 3347: Enhance language detection r=irevoire a=ManyTheFish ## Summary Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query. This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search. ## Detail - Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected - Fill the newly created database during indexing - Create an allow-list with this database and pass it to Charabia - Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese ## Related issues Fixes #2403 Fixes #3513 Co-authored-by: f3r10 <frledesma@outlook.com> Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Many the fish <many@meilisearch.com>	2023-02-21 10:52:13 +00:00

250 Commits