A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
Go to file
meili-bors[bot] 1afde4fea5
Merge #3542
3542: Refactor of the search algorithms r=dureuill a=loiclec

This PR refactors a large part of the search logic (related to https://github.com/meilisearch/meilisearch/issues/3547)

- The "query tree" is replaced by a "query graph", which describes the different ways in which the search query can be interpreted and precomputes the word derivations for each query term. Example:

<img width="1162" alt="Screenshot 2023-02-27 at 10 26 50" src="https://user-images.githubusercontent.com/6040237/221525270-87917cc0-60d1-473f-847f-2c5a7de9e370.png">

- The control flow between the ~criterions~ ranking rules is managed in a single place instead of being independently implemented by each ranking rule.

- The set of document candidates is determined greedily from the beginning. It is often referred as the "universe" in the code.

- The ranking rules  `proximity`, `attribute`, `typo`, and (maybe) `exactness` are or will be implemented using a K-shortest path graph algorithm. This minimises the number of database and bitmap operations we need to do to compute each ranking rule bucket. It also simplifies the code a lot since a lot of ranking rules will share a large part of their implementation.

- Pointers to database values are stored in a cache to avoid searching in the LMDB databases needlessly.

- The result of some roaring bitmap operations are also stored in a cache, although we'll need to measure the memory pressure this puts on the system and maybe deactivate this cache later on.

- Search requests can be visually logged and debugged in tests.

TODO:
- [ ] Reintroduce search benchmarks
- [x] Implement `disableOnWords` and `disableOnAttributes` settings of typo tolerance
- [x] Implement "exhaustive number of hits
- [x] Implement `attribute` ranking rule
   - [x] Indexing changes: split into `word_fid_docids` and `word_position_docids` (with bucketed position)
   - [x] Ranking rule implementations
- [ ] Implement `exactness` ranking rule
  - [x] Initial implementation
  - [ ] Correct implementation when followed by `Words`
- [ ] Implement `geosort` ranking rule
- [ ] Add tests
   - [x] Typo tolerance `disableOnWords`/`disableOnAttributes`
   - [ ] Geosort
   - [x] Exactness
   - [ ] Attribute/Position
   - [ ] Interactions between ranking rules:
     - [x] Typo/Proximity/Attribute not preceded by Words
     - [x] Exactness not preceded by Words
     - [x] Exactness -> Words (+ check universe correctness)
     - [x] Exactness -> Typo, etc.
     - [ ] Sort -> Words (performance tests)
     - [ ] Attribute/Position -> Typo
     - [ ] Attribute/Position -> Proximity
     - [x] Typo -> Exactness 
     - [x] Typo -> Proximity
     - [x] Proximity -> Typo
   - [x] Words 
   - [x] Typo
   - [x] Proximity
   - [x] Sort
   - [x] Ngrams
   - [x] Split words
   - [x] Ngram + Split Words
   - [x] Term matching strategy
   - [x] Distinct attribute
   - [x] Phrase Search
   - [x] Placeholder search
   - [x] Highlighter 
- [x] Limit the number of word derivations in a search query
- [x] Compute the initial universe correctly according to the terms matching strategy
- [x] Implement placeholder search
- [x] Get the list of ranking rules from the settings 
- [x] Implement `distinct`
- [x] Determine what to do when one of `attribute`, `proximity`, `typo`, or `exactness` is placed before `words`
- [x] Make sure the correct number of allowed typos is used for each word, including the prefix one
- [x] Make sure stop words are treated correctly (e.g. correct position in query graph), including in phrases
- [x] Support phrases correctly
- [x] Support synonyms
- [x] Support split words
- [x] Support combination of ngram + split-words (e.g. `whiteh orse` -> `"white horse"`)
- [x] Implement `typo` ranking rule
- [x] Implement `sort` ranking rule
- [x] Use existing `Search` interface to use the new search algorithms
- [x] Remove old code


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-03 13:42:51 +00:00
.github Add SDKs test in a CI 2023-05-02 11:53:28 +02:00
assets Add a README to the milli crate 2023-01-16 16:25:12 +01:00
benchmarks Merge remote-tracking branch 'origin/main' into search-refactor 2023-05-03 12:19:06 +02:00
dump Fix the clippy warnings 2023-04-25 16:40:32 +02:00
file-store Upgrade the compatible versions of the dependencies 2023-04-24 17:50:52 +02:00
filter-parser Merge #3571 2023-04-27 13:14:00 +00:00
flatten-serde-json Fix the tests to make flattening work 2023-03-15 14:12:34 +01:00
grafana-dashboards Add suffix describing the unit when needed; Replace MeiliSearch by Meilisearch; Precised some metrics name 2022-08-23 17:09:27 +02:00
index-scheduler remove the unused snapshot files 2023-04-25 17:55:27 +02:00
json-depth-checker Use the workspace inheritance feature of rust 1.64 2023-02-15 13:51:07 +01:00
meili-snap Upgrade the compatible versions of the dependencies 2023-04-24 17:50:52 +02:00
meilisearch Merge #3702 #3710 2023-05-02 17:27:57 +00:00
meilisearch-auth Fix the compilation 2023-04-24 17:50:58 +02:00
meilisearch-types Merge #3702 #3710 2023-05-02 17:27:57 +00:00
milli Update exactness tests following charabia camelCase tokenization 2023-05-03 14:45:09 +02:00
permissive-json-pointer Use the workspace inheritance feature of rust 1.64 2023-02-15 13:51:07 +01:00
.dockerignore Update .dockerignore 2023-04-25 16:48:25 +02:00
.gitignore edit gitignore to ignore .idea and .vscode folders 2023-02-10 11:42:19 +04:00
.rustfmt.toml Introduce a rustfmt file 2022-10-27 11:35:05 +02:00
bors.toml Remove macos-latest and windows-latest usages 2022-12-20 11:10:09 +01:00
Cargo.lock Merge remote-tracking branch 'origin/main' into search-refactor 2023-05-03 12:19:06 +02:00
Cargo.toml Update version for the next release (v1.1.1) in Cargo.toml 2023-04-13 13:25:10 +00:00
CODE_OF_CONDUCT.md Create CODE_OF_CONDUCT.md 2020-04-30 20:16:02 +02:00
config.toml config: case experimental_enable_metrics in snake_case 2023-02-27 17:14:06 +01:00
CONTRIBUTING.md Change a the text of a link 2023-05-02 13:53:36 +02:00
Cross.toml Cross build with action-rs 2021-10-10 02:21:30 +08:00
Dockerfile revert mount 2023-04-18 15:15:33 +09:00
download-latest.sh Updated messages pointing to the docs website 2023-04-28 20:52:03 +00:00
LICENSE Update LICENSE 2022-02-15 15:54:45 +01:00
README.md Fix the README.md broken links 2023-05-02 13:51:50 +02:00
SECURITY.md docs(security): Fix Supported 2022-05-31 14:21:34 -05:00

Website | Roadmap | Blog | Documentation | FAQ | Discord

Dependency status License Bors enabled

A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍

Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.

A bright colored application for finding movies screening near the user A dark colored application for finding movies screening near the user

🔥 Try it! 🔥

Features

  • Search-as-you-type: find search results in less than 50 milliseconds
  • Typo tolerance: get relevant matches even when queries contain typos and misspellings
  • Filtering and faceted search: enhance your user's search experience with custom filters and build a faceted search interface in a few lines of code
  • Sorting: sort results based on price, date, or pretty much anything else your users need
  • Synonym support: configure synonyms to include more relevant content in your search results
  • Geosearch: filter and sort documents based on geographic data
  • Extensive language support: search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
  • Security management: control which users can access what data with API keys that allow fine-grained permissions handling
  • Multi-Tenancy: personalize search results for any number of application tenants
  • Highly Customizable: customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
  • RESTful API: integrate Meilisearch in your technical stack with our plugins and SDKs
  • Easy to install, deploy, and maintain

📖 Documentation

You can consult Meilisearch's documentation at https://meilisearch.com/docs.

🚀 Getting started

For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our Quick Start guide.

You may also want to check out Meilisearch 101 for an introduction to some of Meilisearch's most popular features.

☁️ Meilisearch cloud

Let us manage your infrastructure so you can focus on integrating a great search experience. Try Meilisearch Cloud today.

🧰 SDKs & integration tools

Install one of our SDKs in your project for seamless integration between Meilisearch and your favorite language or framework!

Take a look at the complete Meilisearch integration list.

Logos belonging to different languages and frameworks supported by Meilisearch, including React, Ruby on Rails, Go, Rust, and PHP

⚙️ Advanced usage

Experienced users will want to keep our API Reference close at hand.

We also offer a wide range of dedicated guides to all Meilisearch features, such as filtering, sorting, geosearch, API keys, and tenant tokens.

Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as documents and indexes.

📊 Telemetry

Meilisearch collects anonymized data from users to help us improve our product. You can deactivate this whenever you want.

To request deletion of collected data, please write to us at privacy@meilisearch.com. Don't forget to include your Instance UID in the message, as this helps us quickly find and delete your data.

If you want to know more about the kind of data we collect and what we use it for, check the telemetry section of our documentation.

📫 Get in touch!

Meilisearch is a search engine created by Meili, a software development company based in France and with team members all over the world. Want to know more about us? Check out our blog!

🗞 Subscribe to our newsletter if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.

💌 Want to make a suggestion or give feedback? Here are some of the channels where you can reach us:

Thank you for your support!

👩‍💻 Contributing

Meilisearch is, and will always be, open-source! If you want to contribute to the project, please take a look at our contribution guidelines.

📦 Versioning

Meilisearch releases and their associated binaries are available in this GitHub page.

The binaries are versioned following SemVer conventions. To know more, read our versioning policy.

Differently from the binaries, crates in this repository are not currently available on crates.io and do not follow SemVer conventions.