meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-02-20 17:45:54 +08:00


747: Soft-deletion computation no longer depends on the mapsize r=irevoire a=dureuill

# Pull Request

## Related issue

Related to https://github.com/meilisearch/meilisearch/issues/3231: After removing `--max-index-size`, the `mapsize` will always be unrelated to the actual max size the user wants for their DB, so it doesn't make sense to use these values any longer.

This implements solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824

## What does this PR do?

### User-visible

- Soft-deleted documents are no longer purged when less than 10% of the mapsize is available or when they take up more than 10% of the mapsize.
- Instead, they are purged when there are more soft-deleted documents than regular documents, or when they take up more than 1GiB of disk space (estimated).

### Implementation standpoint

1. Adds a `DeletionStrategy` enum to replace the boolean `disable_soft_deletion` that we had up until now. This enum allows us to specify that we want "always hard", "always soft", or to use the dynamic soft-deletion strategy (default).
2. Uses the current strategy when deleting documents, with the new heuristics being used in the `DeletionStrategy::Dynamic` variant.
3. Updates the tests to use the appropriate DeletionStrategy whenever needed (one of `AlwaysHard` or `AlwaysSoft` depending on the test)
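
A minimal sketch of how the three strategies and the dynamic heuristic described above could fit together. The enum and variant names come from this description; the method name, its parameters, and the constant are hypothetical and are not taken from the actual diff.

```rust
/// Variant names follow the PR description; everything else is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DeletionStrategy {
    /// Decide between soft and hard deletion from the heuristics below (default).
    #[default]
    Dynamic,
    /// Always purge documents immediately.
    AlwaysHard,
    /// Never purge, only mark documents as deleted.
    AlwaysSoft,
}

/// Assumed threshold: 1 GiB of (estimated) space used by soft-deleted documents.
const SOFT_DELETED_SIZE_LIMIT: u64 = 1024 * 1024 * 1024;

impl DeletionStrategy {
    /// Hypothetical helper: returns `true` when soft-deleted documents should be purged.
    pub fn should_hard_delete(
        self,
        soft_deleted_count: u64,
        regular_count: u64,
        estimated_soft_deleted_bytes: u64,
    ) -> bool {
        match self {
            DeletionStrategy::AlwaysHard => true,
            DeletionStrategy::AlwaysSoft => false,
            // Purge when soft-deleted documents outnumber regular ones,
            // or when they are estimated to use more than 1 GiB.
            DeletionStrategy::Dynamic => {
                soft_deleted_count > regular_count
                    || estimated_soft_deleted_bytes > SOFT_DELETED_SIZE_LIMIT
            }
        }
    }
}
```

Note that neither condition refers to the mapsize, which is the point of the change.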

Note to reviewers: this PR is optimized for a commit-by-commit review.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>

2022-12-19 17:46:18 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| .github | Merge #715 | 2022-12-01 10:39:38 +00:00 |
| assets | chore: move logo to (new) assets folder | 2022-10-04 12:20:24 +02:00 |
| benchmarks | Update version for the next release (v0.38.0) in Cargo.toml files | 2022-12-19 16:35:38 +00:00 |
| cli | Update version for the next release (v0.38.0) in Cargo.toml files | 2022-12-19 16:35:38 +00:00 |
| filter-parser | Update version for the next release (v0.38.0) in Cargo.toml files | 2022-12-19 16:35:38 +00:00 |
| flatten-serde-json | Update version for the next release (v0.38.0) in Cargo.toml files | 2022-12-19 16:35:38 +00:00 |
| json-depth-checker | Update version for the next release (v0.38.0) in Cargo.toml files | 2022-12-19 16:35:38 +00:00 |
| milli | Merge #747 | 2022-12-19 17:46:18 +00:00 |
| script | format the whole project | 2021-06-16 18:33:33 +02:00 |
| .gitignore | Ignore files generated by fuzzcheck | 2022-10-26 13:47:46 +02:00 |
| .rustfmt.toml | format the whole project | 2021-06-16 18:33:33 +02:00 |
| bors.toml | Add clippy job | 2022-11-04 08:58:12 +09:00 |
| Cargo.toml | Optimize a few performance sensitive dependencies on debug builds | 2022-10-12 09:22:05 +02:00 |
| CONTRIBUTING.md | add a sentence about installing rust-nightly | 2022-12-07 12:31:43 +01:00 |
| LICENSE | Update LICENSE | 2022-02-15 15:52:50 +01:00 |
| README.md | chore: move logo to (new) assets folder | 2022-10-04 12:20:24 +02:00 |

README.md

a concurrent indexer combined with fast and relevant search algorithms

Introduction

This repository contains the core engine used in Meilisearch.

It contains a library that manages one and only one index; Meilisearch handles multiple indexes itself. Milli does not store pending updates: queuing them is the job of the layer above, which is why Milli can only process one update at a time.
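
To illustrate this division of labour, here is a minimal sketch of a layer that stores pending updates and feeds them to a single-index engine one at a time. All names (`Update`, `Engine`, `UpdateQueue`) are hypothetical stand-ins and do not reflect the actual Milli or Meilisearch types.

```rust
use std::collections::VecDeque;

/// Hypothetical update type; real updates would carry documents or settings.
#[derive(Debug, PartialEq)]
pub enum Update {
    AddDocuments(Vec<String>),
    DeleteDocuments(Vec<u32>),
}

/// Hypothetical stand-in for the engine: it applies exactly one update per call
/// and keeps no queue of its own.
#[derive(Default)]
pub struct Engine {
    pub applied: usize,
}

impl Engine {
    pub fn apply(&mut self, _update: Update) {
        self.applied += 1;
    }
}

/// Layer above the engine: it stores pending updates and hands them over in order.
#[derive(Default)]
pub struct UpdateQueue {
    pending: VecDeque<Update>,
}

impl UpdateQueue {
    /// Stores an update until the engine is ready for it.
    pub fn register(&mut self, update: Update) {
        self.pending.push_back(update);
    }

    /// Number of updates still waiting to be processed.
    pub fn pending(&self) -> usize {
        self.pending.len()
    }

    /// Applies the oldest pending update; returns `false` if the queue was empty.
    pub fn process_next(&mut self, engine: &mut Engine) -> bool {
        match self.pending.pop_front() {
            Some(update) => {
                engine.apply(update);
                true
            }
            None => false,
        }
    }
}
```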

This repository contains crates to quickly debug the engine:

- There are benchmarks located in the `benchmarks` crate.
- The `cli` crate is a simple command-line interface that helps run flamegraph on top of it.
- The `filter-parser` crate contains the parser for the Meilisearch filter syntax.
- The `flatten-serde-json` crate contains the library that flattens serde-json Value objects like Elasticsearch does.
- The `json-depth-checker` crate is used to indicate if a JSON must be flattened.
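
As a rough illustration of what flattening and depth checking mean here, the sketch below turns nested objects into dotted keys and reports whether a document contains any nested object. It uses a small hypothetical `Json` enum instead of `serde_json::Value`, ignores arrays, and is not the API of either crate.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical; arrays, numbers, etc. are left out).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Leaf(String),
    Object(BTreeMap<String, Json>),
}

/// Returns `true` if any top-level field holds a nested object,
/// i.e. the document would change when flattened.
pub fn must_be_flattened(doc: &BTreeMap<String, Json>) -> bool {
    doc.values().any(|value| matches!(value, Json::Object(_)))
}

/// Flattens nested objects into dotted keys, e.g. `{"author": {"name": "x"}}`
/// becomes `{"author.name": "x"}`.
pub fn flatten(doc: &BTreeMap<String, Json>) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for (key, value) in doc {
        insert_flat(key.clone(), value, &mut out);
    }
    out
}

/// Recursively walks `value`, extending the dotted `prefix` at each nesting level.
fn insert_flat(prefix: String, value: &Json, out: &mut BTreeMap<String, String>) {
    match value {
        Json::Leaf(text) => {
            out.insert(prefix, text.clone());
        }
        Json::Object(fields) => {
            for (key, inner) in fields {
                insert_flat(format!("{prefix}.{key}"), inner, out);
            }
        }
    }
}
```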

How to use it?

Milli is a search library and must be embedded in a program. You can build and open its documentation with `cargo doc --open`.

Here is an example usage of the library where we insert documents into the engine and search for one of them right after.

let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MiB
let index = Index::new(options, &path).unwrap();

let mut wtxn = index.write_txn().unwrap();
let content = documents!([
    {
        "id": 2,
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "genre": "romance",
        "price$": "3.5$",
    },
    {
        "id": 456,
        "title": "Le Petit Prince",
        "author": "Antoine de Saint-Exupéry",
        "genre": "adventure",
        "price$": "10.0$",
    },
    {
        "id": 1,
        "title": "Wonderland",
        "author": "Lewis Carroll",
        "genre": "fantasy",
        "price$": "25.99$",
    },
    {
        "id": 4,
        "title": "Harry Potter and the Half-Blood Prince",
        "author": "J. K. Rowling",
        "genre": "fantasy",
    },
]);

let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let mut builder =
    IndexDocuments::new(&mut wtxn, &index, &config, indexing_config.clone(), |_| ())
        .unwrap();
builder.add_documents(content).unwrap();
builder.execute().unwrap();
wtxn.commit().unwrap();


// You can search in the index now!
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
search.query("horry");
search.limit(10);

let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 1);

Contributing

We're glad you're thinking about contributing to this repository! Feel free to pick an issue, and to ask any question you need. Some points might not be clear and we are available to help you!

Also, we recommend following the CONTRIBUTING.md to create your PR.