mirror of https://github.com/meilisearch/meilisearch.git synced 2025-03-13 20:31:17 +08:00

Merge #2005

2005: auto batching r=MarinPostma a=MarinPostma

This PR implements auto batching. The basic idea is that, while the previous batch is being processed, all incoming updates that can be batched together are grouped into the next batch.

For now, the only updates that can be batched together are the document addition updates (both update and replace), for a single index.

The batching is disabled by default for multiple reasons:
- We need more experimentation with the scheduling techniques
- Right now, if one task fails in a batch, the whole batch fails. We need more permissive error handling when processing document indexation.

There are four CLI options, for now, to interact with how the batch is scheduled:
- `enable-autobatching`: enable the autobatching feature.
- `debounce-duration-sec`: When an update is received, wait that number of seconds before batching and performing the updates. Defaults to 0s.
- `max-batch-size`: the maximum number of tasks per batch, defaults to unlimited.
- `max-documents-per-batch`: the maximum number of documents in a batch, defaults to unlimited. The batch will always contain at least 1 task, no matter the number of documents in that task.
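
To illustrate how the two limits could interact, here is a minimal sketch, not the code from this PR. `PendingTask`, `BatchLimits`, and `take_batch` are hypothetical names, and the sketch assumes that a task is left for the next batch when adding it would push the document total past the limit. The first task is always taken, which matches the "at least 1 task" rule above.

```rust
/// Hypothetical task record: only the fields needed for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingTask {
    pub id: u64,
    pub document_count: usize,
}

/// Hypothetical limits mirroring `max-batch-size` and `max-documents-per-batch`.
/// `None` stands for "unlimited", the documented default.
#[derive(Debug, Clone, Copy, Default)]
pub struct BatchLimits {
    pub max_batch_size: Option<usize>,
    pub max_documents_per_batch: Option<usize>,
}

/// Removes tasks from the front of `pending` until a limit would be exceeded.
/// The first task is always taken, whatever its document count.
pub fn take_batch(pending: &mut Vec<PendingTask>, limits: BatchLimits) -> Vec<PendingTask> {
    let mut taken = 0;
    let mut documents = 0usize;
    for task in pending.iter() {
        // Would taking this task exceed the task-count limit?
        let over_size = limits.max_batch_size.map_or(false, |max| taken >= max);
        // Would taking this task exceed the document-count limit? (assumed semantics)
        let over_docs = limits
            .max_documents_per_batch
            .map_or(false, |max| documents + task.document_count > max);
        if taken > 0 && (over_size || over_docs) {
            break;
        }
        taken += 1;
        documents += task.document_count;
    }
    pending.drain(..taken).collect()
}
```

With `max_documents_per_batch` set to 100, a first task holding 500 documents still forms a batch on its own, and two following tasks of 10 documents each are grouped into the next batch.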

# Implementation

The current implementation is made of 3 major components:

## TaskStore
The `TaskStore` contains all the tasks. When a task is pushed, it is directly registered to the task store.

## Scheduler
The scheduler is in charge of making the batches. At its core, there is a `TaskQueue` and a job queue. `Job`s are always processed first. They are *volatile* tasks, that is, they don't have a TaskId and are not persisted to disk. Snapshots and dumps are examples of Jobs.
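
The priority of jobs over tasks can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheduler from this PR: the `Job`, `Work`, and `Scheduler` types and their methods are hypothetical, and the task side is reduced to a plain queue of task ids that is emptied into a single batch.

```rust
use std::collections::VecDeque;

/// Hypothetical volatile jobs: no task id, nothing persisted to disk.
#[derive(Debug, PartialEq)]
pub enum Job {
    Snapshot,
    Dump,
}

/// What the scheduler hands out next (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Work {
    Job(Job),
    Batch(Vec<u64>),
}

/// Simplified scheduler: a job queue plus a queue of pending task ids.
#[derive(Default)]
pub struct Scheduler {
    jobs: VecDeque<Job>,
    pending_task_ids: VecDeque<u64>,
}

impl Scheduler {
    pub fn register_job(&mut self, job: Job) {
        self.jobs.push_back(job);
    }

    pub fn register_task(&mut self, task_id: u64) {
        self.pending_task_ids.push_back(task_id);
    }

    /// Jobs always win; a task batch is only built when no job is waiting.
    pub fn next_work(&mut self) -> Option<Work> {
        if let Some(job) = self.jobs.pop_front() {
            return Some(Work::Job(job));
        }
        if self.pending_task_ids.is_empty() {
            return None;
        }
        Some(Work::Batch(self.pending_task_ids.drain(..).collect()))
    }
}
```

In this sketch, a dump registered after two tasks is still returned first by `next_work`, and the two tasks come back as one batch on the following call.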

If no `Job` is available for processing, then the scheduler attempts to make a `Task` batch from the `TaskQueue`. The first step is to gather new tasks from the `TaskStore` to populate the `TaskQueue`. When this is done, we can prepare our batch. The `TaskQueue` is itself a `BinaryHeap` of `TaskList`. Each `index_uid` is associated with a `TaskList` that contains all the updates associated with that index uid. Each `TaskList` in the `TaskQueue` is ordered by the id of its first task.

When preparing a batch, the `TaskList` at the top of the `TaskQueue` is popped, and the tasks are popped from the list to make the next batch. If there are remaining tasks in the list, the list is inserted back in the `TaskQueue`.
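
The pop-and-reinsert step described above can be sketched with `std::collections::BinaryHeap`. This is an illustration under assumptions, not the PR's code: `TaskList` is reduced to an index uid and a queue of task ids, and `next_batch` with its `max_batch_size` parameter is a hypothetical helper. Since `BinaryHeap` is a max-heap, the comparison is reversed so that the list with the smallest first task id ends up on top.

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, VecDeque};

/// Simplified task list: the pending task ids of one index, oldest first.
#[derive(Debug, PartialEq, Eq)]
pub struct TaskList {
    pub index_uid: String,
    pub task_ids: VecDeque<u64>,
}

impl Ord for TaskList {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed comparison: the lexicographic order on `task_ids` looks at
        // the first task id first, and the smallest one must be the heap's top.
        other
            .task_ids
            .cmp(&self.task_ids)
            .then_with(|| other.index_uid.cmp(&self.index_uid))
    }
}

impl PartialOrd for TaskList {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Pops the top list, takes up to `max_batch_size` tasks (at least one) from
/// it, and puts the list back in the queue if tasks remain.
pub fn next_batch(
    queue: &mut BinaryHeap<TaskList>,
    max_batch_size: usize,
) -> Option<(String, Vec<u64>)> {
    let mut list = queue.pop()?;
    let take = max_batch_size.max(1).min(list.task_ids.len());
    let batch: Vec<u64> = list.task_ids.drain(..take).collect();
    let index_uid = list.index_uid.clone();
    if !list.task_ids.is_empty() {
        queue.push(list);
    }
    Some((index_uid, batch))
}
```

Because a list is reinserted under the id of its new first task, an index with a long backlog does not starve the others: once its older tasks are batched, lists whose first task is older move ahead of it.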

## UpdateLoop
The `UpdateLoop`'s role is to perform batches sequentially. Each time updates are pushed to the update store, the scheduler is notified, and it in turn notifies the update loop that work can be performed. When notified, the update loop waits for some time to let more updates come in, then asks the scheduler for the next batch and performs it. When the batch is done, the status of the tasks is put back into the store, and the next batch is processed.
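
The notify, wait, then process cycle can be sketched with a standard-library channel. This is only an illustration of the control flow, not the PR's implementation, which may well be asynchronous: `run_update_loop` and its parameters are hypothetical, the scheduler is replaced by a `next_batch` closure, and batches are plain lists of task ids.

```rust
use std::sync::mpsc::Receiver;
use std::thread;
use std::time::Duration;

/// Runs until every sender of `wakeups` has been dropped.
pub fn run_update_loop<N, P>(
    wakeups: Receiver<()>,
    debounce: Duration,
    mut next_batch: N,
    mut process: P,
) where
    N: FnMut() -> Option<Vec<u64>>,
    P: FnMut(Vec<u64>),
{
    // `recv` blocks until a notification arrives and fails once all senders are gone.
    while wakeups.recv().is_ok() {
        // Debounce: give more updates a chance to arrive before batching.
        thread::sleep(debounce);
        // Notifications received in the meantime are covered by this round.
        while wakeups.try_recv().is_ok() {}
        // Batches are processed one after the other, never concurrently.
        while let Some(batch) = next_batch() {
            process(batch);
        }
    }
}
```

A caller would keep the `Sender<()>` half on the scheduler side and send `()` whenever new updates are registered. In this sketch, several notifications that arrive during the debounce window trigger a single round of processing.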

Co-authored-by: mpostma <postma.marin@protonmail.com>

2022-02-02 11:04:30 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| .github | Update MeiliSearch into Meilisearch everywhere | 2022-01-26 17:43:16 +01:00 |
| assets | Replace logo | 2022-01-26 17:34:31 +01:00 |
| meilisearch-auth | feat(auth): Implement Tenant token | 2022-01-27 08:25:39 +01:00 |
| meilisearch-error | Merge branch 'main' into stable | 2022-01-26 20:17:41 +01:00 |
| meilisearch-http | feat(lib): auto-batching | 2022-02-01 18:06:20 +01:00 |
| meilisearch-lib | feat(lib): auto-batching | 2022-02-01 18:06:20 +01:00 |
| .dockerignore | import .git to docker to fix vergen | 2021-07-28 19:12:40 +02:00 |
| .gitignore | ignore the snapshots and dumps in the gitignore (#1449) | 2021-07-01 14:41:53 +02:00 |
| bors.toml | Use pr_status isntead of status | 2021-10-14 14:21:42 +02:00 |
| Cargo.lock | feat(lib): auto-batching | 2022-02-01 18:06:20 +01:00 |
| Cargo.toml | chore(all): fix rust edition | 2022-01-12 11:14:50 +01:00 |
| CODE_OF_CONDUCT.md | Create CODE_OF_CONDUCT.md | 2020-04-30 20:16:02 +02:00 |
| CONTRIBUTING.md | Update CONTRIBUTING.md | 2022-01-26 17:51:38 +01:00 |
| Cross.toml | Cross build with action-rs | 2021-10-10 02:21:30 +08:00 |
| Dockerfile | Revert docker as non root PR | 2021-12-21 16:59:15 +01:00 |
| download-latest.sh | Fix some typo | 2022-01-26 17:56:18 +01:00 |
| LICENSE | Update LICENSE | 2021-01-19 00:18:52 +01:00 |
| README.md | Update README.md | 2022-01-26 17:54:51 +01:00 |
| SECURITY.md | Update MeiliSearch into Meilisearch everywhere | 2022-01-26 17:43:16 +01:00 |

README.md

Meilisearch

Website | Roadmap | Blog | LinkedIn | Twitter | Documentation | FAQ

⚡ Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine 🔍

Meilisearch is a powerful, fast, open-source, easy to use and deploy search engine. Both searching and indexing are highly customizable. Features such as typo-tolerance, filters, and synonyms are provided out-of-the-box. For more information about features go to our documentation.

✨ Features

Search-as-you-type experience (answers < 50 milliseconds)
Full-text search
Typo tolerant (understands typos and misspellings)
Faceted search and filters
Supports hanzi (Chinese characters)
Supports synonyms
Easy to install, deploy, and maintain
Whole documents are returned
Highly customizable
RESTful API

Getting started

Deploy the Server

Homebrew (Mac OS)

brew update && brew install meilisearch
meilisearch

Docker

docker run -p 7700:7700 -v "$(pwd)/data.ms:/data.ms" getmeili/meilisearch

Announcing a cloud-hosted Meilisearch

Join the closed beta by filling out this form.

Try Meilisearch in our Sandbox

Create a Meilisearch instance in Meilisearch Sandbox. This instance is free, and will be active for 48 hours.

Run on Digital Ocean

Deploy on Platform.sh

APT (Debian & Ubuntu)

echo "deb [trusted=yes] https://apt.fury.io/meilisearch/ /" > /etc/apt/sources.list.d/fury.list
apt update && apt install meilisearch-http
meilisearch

Download the binary (Linux & Mac OS)

curl -L https://install.meilisearch.com | sh
./meilisearch

Compile and run it from sources

If you have the latest stable Rust toolchain installed on your local system, clone the repository and make it your working directory.

git clone https://github.com/meilisearch/meilisearch.git
cd meilisearch
cargo run --release

Create an Index and Upload Some Documents

Let's create an index! If you need a sample dataset, use this movie database. You can also find it in the datasets/ directory.

curl -L 'https://bit.ly/2PAcw9l' -o movies.json

Now, you're ready to index some data.

curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
  --header 'content-type: application/json' \
  --data-binary @movies.json

Search for Documents

In command line

The search engine is now aware of your documents and can serve those via an HTTP server.

The jq command-line tool can greatly help you read the server responses.

curl 'http://127.0.0.1:7700/indexes/movies/search?q=botman+robin&limit=2' | jq

{
  "hits": [
    {
      "id": "415",
      "title": "Batman & Robin",
      "poster": "https://image.tmdb.org/t/p/w1280/79AYCcxw3kSKbhGpx1LiqaCAbwo.jpg",
      "overview": "Along with crime-fighting partner Robin and new recruit Batgirl, Batman battles the dual threat of frosty genius Mr. Freeze and homicidal horticulturalist Poison Ivy. Freeze plans to put Gotham City on ice, while Ivy tries to drive a wedge between the dynamic duo.",
      "release_date": 866768400
    },
    {
      "id": "411736",
      "title": "Batman: Return of the Caped Crusaders",
      "poster": "https://image.tmdb.org/t/p/w1280/GW3IyMW5Xgl0cgCN8wu96IlNpD.jpg",
      "overview": "Adam West and Burt Ward returns to their iconic roles of Batman and Robin. Featuring the voices of Adam West, Burt Ward, and Julie Newmar, the film sees the superheroes going up against classic villains like The Joker, The Riddler, The Penguin and Catwoman, both in Gotham City… and in space.",
      "release_date": 1475888400
    }
  ],
  "nbHits": 8,
  "exhaustiveNbHits": false,
  "query": "botman robin",
  "limit": 2,
  "offset": 0,
  "processingTimeMs": 2
}

Use the Web Interface

We also deliver an out-of-the-box web interface in which you can test Meilisearch interactively.

You can access the web interface in your web browser at the root of the server. The default URL is http://127.0.0.1:7700. All you need to do is open your web browser and enter Meilisearch’s address to visit it. This will lead you to a web page with a search bar that will allow you to search in the selected index.

| See the gif above

Documentation

Now that your Meilisearch server is up and running, you can learn more about how to tune your search engine in the documentation.

Contributing

Hey! We're glad you're thinking about contributing to Meilisearch! Feel free to pick an issue labeled as good first issue, and to ask any questions you have. Some points might not be clear, and we are available to help you!

Also, we recommend following the CONTRIBUTING guide to create your PR.

Core engine and tokenizer

The code in this repository is only concerned with managing multiple indexes, handling the update store, and exposing an HTTP API.

Search and indexation are the domain of our core engine, milli, while tokenization is handled by our tokenizer library.

Telemetry

Meilisearch collects anonymous data regarding general usage. This helps us better understand developers' usage of Meilisearch features.

To find out more on what information we're retrieving, please see our documentation on Telemetry.

This program is optional; you can disable these analytics by using the MEILI_NO_ANALYTICS env variable.

Feature request

The feature requests are not managed in this repository. Please visit our dedicated repository to see our work about the Meilisearch product.

If you have a feature request or any feedback about an existing feature, please open a discussion. Also, feel free to participate in the current discussions; we are looking forward to reading your comments.

💌 Contact

Please visit this page.

Meilisearch is developed by Meili, a young company. To learn more about us, you can read our blog. Any suggestions or feedback are highly appreciated. Thank you for your support!
