meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-01-18 08:48:32 +08:00

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow


bors[bot] 249e051cd4 2022-12-20 16:13:20 +00:00

Merge #750

750: Fix hard-deletion of an external id that was soft-deleted and then reimported - main r=irevoire a=loiclec

# Pull Request

## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021

## What does this PR do?
There was a bug happening when:
1. Documents were added
2. Some of these documents were replaced using soft-deletion
3. A deletion of another non-replaced document takes place and triggers a hard-deletion
4. Documents with the same identifiers as the replaced documents are added again

Then, search results would return duplicate documents. No crash would happen at any time (this is the reason it wasn't caught by the previous fuzz test. I have updated the new one such that it also checks the result of a placeholder search request, which then finds the bug immediately).

The cause of the bug is:
1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id.
2. Then, we take this list of external document ids and remove each of them from the `ExternalDocumentsIds` structure.
3. However, this is not correct in case an existing (non-deleted) document shares the external id of a soft-deleted document.

## Implementation of the fix
1. Before we process a permanent deletion, we update the list of soft-deleted document ids.
2. Then, the permanent deletion's job is to remove the soft-deleted documents from all data structures. Therefore, to update `ExternalDocumentsIds`, we can simply call the `delete_soft_deleted_documents_ids_from_fsts` method, which is faster and simpler.

## Correctness
A unit test was added to reproduce the bug. The new fuzz test, when adjusted to check the correctness of a placeholder search, could also instantly reproduce the bug, but now does not find any other problem.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
.github	Merge #715	2022-12-01 10:39:38 +00:00
assets	chore: move logo to (new) assets folder	2022-10-04 12:20:24 +02:00
benchmarks	Update version for the next release (v0.38.0) in Cargo.toml files	2022-12-19 16:35:38 +00:00
cli	Update version for the next release (v0.38.0) in Cargo.toml files	2022-12-19 16:35:38 +00:00
filter-parser	Update version for the next release (v0.38.0) in Cargo.toml files	2022-12-19 16:35:38 +00:00
flatten-serde-json	Update version for the next release (v0.38.0) in Cargo.toml files	2022-12-19 16:35:38 +00:00
json-depth-checker	Update version for the next release (v0.38.0) in Cargo.toml files	2022-12-19 16:35:38 +00:00
milli	Fix hard-deletion of an external id that was soft-deleted	2022-12-20 15:33:31 +01:00
script	format the whole project	2021-06-16 18:33:33 +02:00
.gitignore	Ignore files generated by fuzzcheck	2022-10-26 13:47:46 +02:00
.rustfmt.toml	format the whole project	2021-06-16 18:33:33 +02:00
bors.toml	Add clippy job	2022-11-04 08:58:12 +09:00
Cargo.toml	Optimize a few performance sensitive dependencies on debug builds	2022-10-12 09:22:05 +02:00
CONTRIBUTING.md	add a sentence about installing rust-nightly	2022-12-07 12:31:43 +01:00
LICENSE	Update LICENSE	2022-02-15 15:52:50 +01:00
README.md	chore: move logo to (new) assets folder	2022-10-04 12:20:24 +02:00

README.md

a concurrent indexer combined with fast and relevant search algorithms

Introduction

This repository contains the core engine used in Meilisearch.

It contains a library that manages exactly one index; Meilisearch itself handles multiple indexes. Milli does not store pending updates: that is the job of a layer above it, which is why it can only process one update at a time.
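
The split of responsibilities described above can be sketched as follows. This is a minimal illustration using only the standard library, not milli's or Meilisearch's actual code: `SingleIndex`, `IndexManager`, `enqueue`, and `process_next` are hypothetical names invented for this sketch.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a single-index engine such as milli:
/// it knows nothing about other indexes and applies one update at a time.
#[derive(Default)]
struct SingleIndex {
    documents: Vec<String>,
}

impl SingleIndex {
    /// Apply exactly one update (here: append a batch of documents).
    fn apply(&mut self, update: Vec<String>) {
        self.documents.extend(update);
    }
}

/// Hypothetical outer layer (the role Meilisearch plays): it owns several
/// indexes and stores pending updates, which the engine itself does not do.
#[derive(Default)]
struct IndexManager {
    indexes: HashMap<String, SingleIndex>,
    pending: VecDeque<(String, Vec<String>)>,
}

impl IndexManager {
    /// Store an update for later; nothing is indexed yet.
    fn enqueue(&mut self, index_name: &str, update: Vec<String>) {
        self.pending.push_back((index_name.to_string(), update));
    }

    /// Pop the oldest update and hand it to the matching index.
    /// Returns false when the queue is empty.
    fn process_next(&mut self) -> bool {
        match self.pending.pop_front() {
            Some((name, update)) => {
                self.indexes.entry(name).or_default().apply(update);
                true
            }
            None => false,
        }
    }
}
```

Calling `enqueue` twice and then `process_next` in a loop applies the updates strictly in arrival order, one at a time, which is the constraint the paragraph above describes.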

This repository also contains crates that support the engine and help debug it quickly:

There are benchmarks located in the benchmarks crate.
The cli crate is a simple command-line interface that helps run flamegraph on the engine.
The filter-parser crate contains the parser for the Meilisearch filter syntax.
The flatten-serde-json crate contains the library that flattens serde-json Value objects the way Elasticsearch does.
The json-depth-checker crate is used to indicate whether a JSON document must be flattened.
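
To make the last two entries concrete, here is a minimal sketch of what "flattening" and "checking whether flattening is needed" can mean. It is an assumption-laden illustration, not the code of flatten-serde-json or json-depth-checker: the `Json` enum and the functions `needs_flattening` and `flatten` are hypothetical, the real crates work on serde-json values, and arrays are deliberately left out.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value, defined here only to keep the sketch free of
/// dependencies. Arrays, booleans, and null are omitted on purpose.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(i64),
    Text(String),
    Object(BTreeMap<String, Json>),
}

/// Hypothetical depth check: a document needs flattening as soon as one of
/// its top-level fields holds a nested object.
fn needs_flattening(doc: &BTreeMap<String, Json>) -> bool {
    doc.values().any(|value| matches!(value, Json::Object(_)))
}

/// Hypothetical flattening: nested keys are joined with dots, so
/// {"a": {"b": 1}} becomes {"a.b": 1}.
fn flatten(doc: &BTreeMap<String, Json>) -> BTreeMap<String, Json> {
    let mut out = BTreeMap::new();
    flatten_into(doc, "", &mut out);
    out
}

/// Walk the object recursively, carrying the dotted path built so far.
fn flatten_into(doc: &BTreeMap<String, Json>, prefix: &str, out: &mut BTreeMap<String, Json>) {
    for (key, value) in doc {
        let path = if prefix.is_empty() { key.clone() } else { format!("{prefix}.{key}") };
        match value {
            Json::Object(inner) => flatten_into(inner, &path, out),
            leaf => {
                out.insert(path, leaf.clone());
            }
        }
    }
}
```

Running the cheap check first lets a caller skip the flattening pass entirely for documents that are already flat, which is presumably why the two concerns live in separate crates.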

How to use it?

Milli is a search library that must be embedded in a program. You can build and open its documentation with cargo doc --open.

Here is an example usage of the library where we insert documents into the engine and search for one of them right after.

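// Import paths are assumed from milli's crate layout; adjust them to your version.
use milli::heed::EnvOpenOptions;
use milli::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig};
use milli::{Index, Search};

// Create the index in a temporary directory, with a 10 MiB memory map.
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MiB
let index = Index::new(options, &path).unwrap();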

// Open a write transaction and describe the documents to index.
let mut wtxn = index.write_txn().unwrap();
let content = documents!([
    {
        "id": 2,
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "genre": "romance",
        "price$": "3.5$",
    },
    {
        "id": 456,
        "title": "Le Petit Prince",
        "author": "Antoine de Saint-Exupéry",
        "genre": "adventure",
        "price$": "10.0$",
    },
    {
        "id": 1,
        "title": "Wonderland",
        "author": "Lewis Carroll",
        "genre": "fantasy",
        "price$": "25.99$",
    },
    {
        "id": 4,
        "title": "Harry Potter and the Half-Blood Prince",
        "author": "J. K. Rowling",
        "genre": "fantasy",
    },
]);

let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let mut builder =
    IndexDocuments::new(&mut wtxn, &index, &config, indexing_config.clone(), |_| ())
        .unwrap();
builder.add_documents(content).unwrap();
builder.execute().unwrap();
wtxn.commit().unwrap();


// You can search in the index now!
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
// "horry" is a deliberate typo: the single expected hit is the "Harry Potter" document.
search.query("horry");
search.limit(10);

let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 1);
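
The query `horry` does not appear verbatim in any document above, so the final assertion presumably relies on the engine tolerating a one-character typo and matching `Harry`. The sketch below shows the underlying notion of edit distance with a textbook dynamic-programming function. It is not milli's implementation, and the name `edit_distance` is introduced here only for illustration.

```rust
/// Classic Levenshtein edit distance (insertions, deletions, substitutions),
/// computed with a rolling row of the dynamic-programming table.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // Distance from the empty prefix of `a` to every prefix of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for (i, ca) in a.iter().enumerate() {
        // Distance from the first i + 1 chars of `a` to the empty prefix of `b`.
        let mut current = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = current[j] + 1;
            current.push(substitution.min(deletion).min(insertion));
        }
        prev = current;
    }

    prev[b.len()]
}
```

With this definition, `edit_distance("horry", "harry")` is 1 (a single substitution), which is the kind of near-match the example query depends on.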

Contributing

We're glad you're thinking about contributing to this repository! Feel free to pick an issue and to ask any questions you have. Some points might not be clear, and we are available to help you!

Also, we recommend following CONTRIBUTING.md when creating your PR.