meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2024-12-24 20:36:21 +08:00

Author	SHA1	Message	Date
bors[bot]	0c7251475d	Merge #2150 2150: Bump milli to v0.22.1 r=curquiza a=curquiza Fixes https://github.com/meilisearch/meilisearch/issues/2138 and https://github.com/meilisearch/meilisearch/issues/2123 by bumping milli to [v0.22.1](https://github.com/meilisearch/milli/releases/tag/v0.22.1) Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-02-08 15:05:06 +00:00
Clémentine Urquizar	1a87b2f37d	Bump milli to v0.22.1	2022-02-08 11:21:44 +01:00
bors[bot]	ea15ad6c34	Merge #447 447: Update version for the next release (v0.22.1) r=curquiza a=curquiza Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-02-07 17:44:09 +00:00
Clémentine Urquizar	d03b3ceb58	Update version for the next release (v0.22.1)	2022-02-07 18:39:29 +01:00
bors[bot]	5d58cb7449	Merge #442 442: fix phrase search r=curquiza a=MarinPostma Run the exact match search on 7 words windows instead of only two. This makes false positive very very unlikely, and impossible on phrase query that are less than seven words. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-07 16:18:20 +00:00
bors[bot]	752a0e13ad	Merge #2136 2136: Refactoring CI regarding ARM binary publish r=curquiza a=curquiza Fixes https://github.com/meilisearch/meilisearch/issues/1909 - Remove CI file to publish aarch64 binary and put the logic into `publish-binary.yml` - Remove the job to publish armv8 binary - Fix download-latest script accordingly - Adapt dowload-latest with the specific case of the MacOS m1 Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com> Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>	2022-02-07 16:07:46 +00:00
Clémentine Urquizar	ccaca33446	Add --fail-with-body flag to curl in script	2022-02-07 16:16:49 +01:00
Clémentine Urquizar	2a90e805a2	Fix script	2022-02-07 16:05:48 +01:00
Clémentine Urquizar	c4a2d70d19	Fix error handler for curl command in script	2022-02-07 16:01:50 +01:00
meili-bot	f7e4a0177d	Update download-latest.sh Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-02-07 13:53:30 +01:00
bors[bot]	cca65499de	Merge #2145 2145: Update LICENSE r=meili-bot a=curquiza Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-02-07 12:51:50 +00:00
bors[bot]	c5a996aa78	Merge #446 446: Update LICENSE r=Kerollmops a=curquiza Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-02-07 09:47:39 +00:00
Clémentine Urquizar - curqui	80fa7dbbfa	Update LICENSE	2022-02-05 18:29:47 +01:00
Clémentine Urquizar - curqui	1279c38ac9	Update LICENSE	2022-02-05 18:29:11 +01:00
bors[bot]	c24b1e5250	Merge #2135 2135: bug(auth): Make API keys accept Null descriptions r=curquiza a=ManyTheFish Fix #2116 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-02-03 15:26:11 +00:00
bors[bot]	267d14c28d	Merge #445 445: allow null values in csv r=Kerollmops a=MarinPostma This pr allows null values in csv: - if the field is of type string, then an empty field is considered null (`,,`), anything other is turned into a string (i.e `, ,` is a single whitespace string) - if the field is of type number, when the trimmed field is empty, we consider the value null (i.e `,,`, `, ,` are both null numbers) otherwise we try to parse the number. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-03 15:11:32 +00:00
ad hoc	bd2262ceea	allow null values in csv	2022-02-03 16:03:01 +01:00
ad hoc	13de251047	rewrite word pair distance gathering	2022-02-03 15:57:20 +01:00
Clémentine Urquizar	78cf8f1f9f	Fix typo	2022-02-02 19:32:20 +01:00
bors[bot]	fda4f229bb	Merge #417 417: Change chunk size to 4MiB to fit more the end user usage r=Kerollmops a=ManyTheFish Reverts meilisearch/milli#379 We made several indexing tests using different sizes of datasets (5 datasets from 9MiB to 100MiB) on several typologies of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`). The result of these tests shows that the `4MiB` chunk size seems to be the best size compared to other chunk sizes (`2Mib`, `4MiB`, `8Mib`, `16Mib`, `32Mib`, `64Mib`, `128Mib`). below is the average time per chunk size: ![Capture d’écran 2021-09-27 à 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png) <details> <summary>Detailled data</summary> <br> ![Capture d’écran 2021-09-27 à 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png) </br> </details> Co-authored-by: Many <many@meilisearch.com>	2022-02-02 18:30:59 +00:00
Clémentine Urquizar	1da7277817	Fix dowload-latest.sh according to the new name of the binary	2022-02-02 19:25:52 +01:00
Clémentine Urquizar	c71c95feb0	Refactor CIs to publish aaarch64 binary	2022-02-02 19:25:28 +01:00
bors[bot]	2468ebb76b	Merge #444 444: Fix the parsing of ndjson requests to index more than the first line r=Kerollmops a=Kerollmops This PR correctly uses the `BufRead` trait to read every line of the content instead of just the first one. This bug was only affecting the http-ui test crate. Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-02 17:59:44 +00:00
ManyTheFish	3bee31e6c7	bug(auth): Make API keys accept Null descriptions	2022-02-02 18:18:17 +01:00
Kerollmops	9142ba9dd4	Fix the parsing of ndjson requests to index more than the first line	2022-02-02 17:55:13 +01:00
Many	d59bcea749	Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""	2022-02-02 17:01:13 +01:00
mpostma	7541ab99cd	review changes	2022-02-02 12:59:01 +01:00
mpostma	d0aabde502	optimize 2 typos case	2022-02-02 12:56:09 +01:00
mpostma	55e6cb9c7b	typos on first letter counts as 2	2022-02-02 12:56:09 +01:00
mpostma	642c01d0dc	set max typos on ngram to 1	2022-02-02 12:56:08 +01:00
bors[bot]	9448ca58aa	Merge #2005 2005: auto batching r=MarinPostma a=MarinPostma This pr implements auto batching. The basic functioning of this is that all updates that can be batched together are batched together while the previous batch is being processed. For now, the only updates that can be batched together are the document addition updates (both update and replace), for a single index. The batching is disabled by default for multiple reasons: - We need more experimentation with the scheduling techniques - Right now, if one task fails in a batch, the whole batch fails. We need more permissive error handling when processing document indexation. There are four CLI options, for now, to interact with how the batch is scheduled: - `enable-autobatching`: enable the autobatching feature. - `debounce-duration-sec`: When an update is received, wait that number of seconds before batching and performing the updates. Defaults to 0s. - `max-batch-size`: the maximum number of tasks per batch, defaults to unlimited. - `max-documents-per-batch`: the maximum number of documents in a batch, defaults to unlimited. The batch will always contain a least 1 task, no matter the number of documents in that task. # Implementation The current implementation is made of 3 major components: ## TaskStore The `TaskStore` contains all the tasks. When a task is pushed, it is directly registered to the task store. ## Scheduler The scheduler is in charge of making the batches. At its core, there is a `TaskQueue` and a job queue. `Job`s are always processed first. They are volatile tasks, that is, they don't have a TaskId and are not persisted to disk. Snapshots and dumps are examples of Jobs. If no `Job` is available for processing, then the scheduler attempts to make a `Task` batch from the `TaskQueue`. The first step is to gather new tasks from the `TaskStore` to populate the `TaskQueue`. When this is done, we can prepare our batch. The `TaskQueue` is itself a `BinaryHeap` of `Tasklist`. Each `index_uid` is associated with a `TaskList` that contains all the updates associated with that index uid. Each `TaskList` in the `TaskQueue` is ordered by the id of its first task. When preparing a batch, the `TaskList` at the top of the `TaskQueue` is popped, and the tasks are popped from the list to make the next batch. If there are remaining tasks in the list, the list is inserted back in the `TaskQueue`. ## UpdateLoop The `UpdateLoop` role is to perform batch sequentially. Each time updates are pushed to the update store, the scheduler is notified, and will in turn notify the update loop that work can be performed. When notified, the update loop waits some time to wait for more incoming update and then asks the scheduler for the next batch to perform and perform it. When it is done, the status of the task is put back into the store, and the next batch is processed. Co-authored-by: mpostma <postma.marin@protonmail.com>	2022-02-02 11:04:30 +00:00
ad hoc	d852dc0d2b	fix phrase search	2022-02-01 20:21:33 +01:00
mpostma	c9a236b0af	feat(lib): auto-batching	2022-02-01 18:06:20 +01:00
bors[bot]	622c15e825	Merge #2096 2096: feat(auth): Tenant token r=Kerollmops a=ManyTheFish Make meilisearch support JWT authentication signed with meilisearch API keys using HS256, HS384 or HS512 algorithms. Related spec: [specifications#89](https://github.com/meilisearch/specifications/pull/89) [rendered](https://github.com/meilisearch/specifications/blob/scoped-api-keys/text/0089-tenant-tokens.md) Fix #1991 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-01-27 10:38:41 +00:00
Kerollmops	fb79c32430	Compute the new, common and, deleted prefix words fst once	2022-01-27 11:00:18 +01:00
bors[bot]	054598734a	Merge #2120 2120: Bring `stable` into `main` r=curquiza a=curquiza I forgot to do it, tell me `@Kerollmops` or `@irevoire` if it's useful or not. I would say yes, otherwise I will have conflict when I will try to bring `main` into `stable` for the next release. Maybe I'm wrong Co-authored-by: Irevoire <tamo@meilisearch.com> Co-authored-by: mpostma <postma.marin@protonmail.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com> Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-01-27 09:35:21 +00:00
Clément Renault	51d1e64b23	Remove, now useless, the WriteMethod enum	2022-01-27 10:08:35 +01:00
Clément Renault	e9c02173cf	Rework the WordsPrefixPositionDocids update to compute a subset of the database	2022-01-27 10:08:35 +01:00
Clément Renault	dbba5fd461	Create a function to simplify the word prefix pair proximity docids compute	2022-01-27 10:08:35 +01:00
Clément Renault	e760e02737	Fix the computation of the newly added and common prefix pair proximity words	2022-01-27 10:08:35 +01:00
Clément Renault	d59e559317	Fix the computation of the newly added and common prefix words	2022-01-27 10:08:34 +01:00
Clément Renault	2ec8542105	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	28692f65be	Rework the WordPrefixDocids update to compute a subset of the database	2022-01-27 10:08:34 +01:00
Clément Renault	5404bc02dd	Move the fst_stream_into_hashset method in the helper methods	2022-01-27 10:06:00 +01:00
Clément Renault	c90fa95f93	Only compute the word prefix pairs on the created word pair proximities	2022-01-27 10:06:00 +01:00
Clément Renault	822f67e9ad	Bring the newly created word pair proximity docids	2022-01-27 10:06:00 +01:00
Clément Renault	d28f18658e	Retrieve the previous version of the words prefixes FST	2022-01-27 10:05:59 +01:00
ManyTheFish	7ca647f0d0	feat(auth): Implement Tenant token Make meilisearch support JWT authentication signed with meilisearch API keys using HS256, HS384 or HS512 algorithms. Related spec: https://github.com/meilisearch/specifications/pull/89 Fix #1991	2022-01-27 08:25:39 +01:00
bors[bot]	38d23546a5	Merge #431 431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it. With `@ManyTheFish` we found out that we could skip some of the work we were doing by: - discarding the prefixes that were shorter than a specific threshold (default: 2). - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4). - remove the unused threshold that was specifying a minimum amount of word docids to merge. We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does. I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard. Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-01-27 07:04:56 +00:00
Clémentine Urquizar - curqui	aa50fcb1f0	Merge branch 'main' into stable	2022-01-26 20:17:41 +01:00
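
The `Merge #2005` entry above describes how the auto-batching scheduler builds batches from a `TaskQueue` that is a `BinaryHeap` of per-index `TaskList`s. The Python sketch below models only that queue, under assumptions that are not taken from the Meilisearch source: task ids increase monotonically, the list with the smallest first task id is served first, consecutive document additions for one index are the only batchable tasks, and `Task`, `kind`, `"document_addition"` and `"settings_update"` are hypothetical names. The job queue, the debounce delay, and the `UpdateLoop` are not modelled.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical kind that may be grouped with its neighbours into one batch.
BATCHABLE_KINDS = {"document_addition"}


@dataclass
class Task:
    id: int          # assumed to increase monotonically as tasks are registered
    index_uid: str
    kind: str        # e.g. "document_addition" or "settings_update" (hypothetical)
    documents: int = 0


class TaskQueue:
    """Min-heap of per-index task lists, keyed by each list's first task id."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []
        self._lists: Dict[str, Deque[Task]] = {}

    def push(self, task: Task) -> None:
        tasks = self._lists.get(task.index_uid)
        if tasks is None:
            # First pending task for this index: create its list and enqueue it.
            self._lists[task.index_uid] = deque([task])
            heapq.heappush(self._heap, (task.id, task.index_uid))
        else:
            # Appending does not change the list's first id, so the heap key stays valid.
            tasks.append(task)

    def next_batch(self, max_batch_size: Optional[int] = None,
                   max_documents: Optional[int] = None) -> List[Task]:
        if not self._heap:
            return []
        _, index_uid = heapq.heappop(self._heap)
        tasks = self._lists[index_uid]
        # A batch always contains at least one task, whatever its document count.
        batch = [tasks.popleft()]
        documents = batch[0].documents
        if batch[0].kind in BATCHABLE_KINDS:
            while tasks and tasks[0].kind in BATCHABLE_KINDS:
                candidate = tasks[0]
                if max_batch_size is not None and len(batch) >= max_batch_size:
                    break
                if max_documents is not None and documents + candidate.documents > max_documents:
                    break
                batch.append(tasks.popleft())
                documents += candidate.documents
        if tasks:
            # Re-insert the remaining tasks, keyed by their new first task id.
            heapq.heappush(self._heap, (tasks[0].id, index_uid))
        else:
            del self._lists[index_uid]
        return batch
```

Python's `heapq` is a min-heap, so keying each entry by the first task id serves the index with the oldest pending task first; a Rust `BinaryHeap` is a max-heap and would need a reversed ordering to behave the same way.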

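The `Merge #445` entry above spells out how null values in CSV are handled. A minimal Python sketch of those rules, assuming each column has already been assigned a type of either `"string"` or `"number"` and that numbers are parsed as floats; `parse_field` and `parse_row` are hypothetical helper names, not milli functions:

```python
import csv
from typing import List, Optional, Union

# A parsed cell is either null (None), a string, or a number.
Value = Optional[Union[str, float]]


def parse_field(raw: str, kind: str) -> Value:
    if kind == "string":
        # Only a truly empty field (`,,`) is null; `, ,` stays a one-space string.
        return None if raw == "" else raw
    if kind == "number":
        # Whitespace-only fields are null numbers; anything else must parse.
        trimmed = raw.strip()
        if trimmed == "":
            return None
        return float(trimmed)  # raises ValueError on malformed input
    raise ValueError(f"unknown column kind: {kind!r}")


def parse_row(line: str, kinds: List[str]) -> List[Value]:
    # csv.reader handles quoting; it yields exactly one row for one input line.
    (row,) = csv.reader([line])
    return [parse_field(raw, kind) for raw, kind in zip(row, kinds)]
```

The asymmetry is deliberate in the rules quoted above: whitespace can be meaningful content in a string column, but never in a number column.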

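The `Merge #2096` entry above adds tenant tokens: JWTs signed with a Meilisearch API key using HS256, HS384 or HS512. The sketch below shows generic HMAC-based JWT signing and verification using only the Python standard library. It assumes the API key string is used directly as the HMAC secret and leaves the payload claims to the caller; the claims Meilisearch actually reads, and checks such as expiry, are defined in the linked specification and are not reproduced here. `sign` and `verify` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json

# JWT "alg" values supported by the tenant-token feature, mapped to hash functions.
ALGORITHMS = {
    "HS256": hashlib.sha256,
    "HS384": hashlib.sha384,
    "HS512": hashlib.sha512,
}


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def sign(payload: dict, api_key: str, alg: str = "HS256") -> str:
    header = {"alg": alg, "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    digest = hmac.new(api_key.encode("utf-8"), signing_input.encode("ascii"),
                      ALGORITHMS[alg]).digest()
    return f"{signing_input}.{_b64url(digest)}"


def verify(token: str, api_key: str) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    alg = header.get("alg")
    if alg not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {alg}")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(api_key.encode("utf-8"), signing_input, ALGORITHMS[alg]).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(_b64url_decode(payload_b64))
```

Because the signature is an HMAC keyed by the API key, anyone holding that key can mint tokens for it, and the server can check them without storing the tokens themselves.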
9692 Commits