meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-02-21 01:55:52 +08:00

bors[bot] 2fdf520271

514: Stop flattening every field r=Kerollmops a=irevoire

A document needs to be flattened when:
* The primary key contains a `.`.
* Some fields need to be flattened.

Instead of flattening the whole object, and thus creating a lot of allocations in the `flatten-serde-json` crate, we now generate a minimal sub-object containing only the fields that need to be flattened.
That should create fewer allocations and thus make indexing faster.
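
To make the idea concrete, here is a minimal sketch of selecting only the fields that need flattening. It is not the code from this PR: the `Value` enum is a small stand-in for `serde_json::Value`, and `must_be_flattened` and `fields_to_flatten` are hypothetical names chosen for illustration. The primary-key case (a key containing a `.`) is ignored here.

```rust
use std::collections::BTreeMap;

/// A tiny stand-in for a JSON value (assumption: the real code works on
/// `serde_json::Value`, which is not used here to keep the sketch dependency-free).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Returns true when a field value contains a nested object, directly or
/// inside (possibly nested) arrays, and would therefore change shape if flattened.
pub fn must_be_flattened(value: &Value) -> bool {
    match value {
        Value::Object(_) => true,
        Value::Array(items) => items.iter().any(must_be_flattened),
        _ => false,
    }
}

/// Builds the minimal sub-object: a copy of only those top-level fields
/// that need flattening. Scalar fields are never cloned.
pub fn fields_to_flatten(doc: &BTreeMap<String, Value>) -> BTreeMap<String, Value> {
    doc.iter()
        .filter(|(_, value)| must_be_flattened(value))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

For a document with `id`, `title`, and a nested `_geo` object, only `_geo` would end up in the sub-object, so the flattening step has far less data to copy.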

---------

```
group                                                             indexing_main_e1e362fa                 indexing_stop-flattening-every-field_40d1bd6b
-----                                                             ----------------------                 ---------------------------------------------
indexing/Indexing geo_point                                       1.99      23.7±0.23s        ? ?/sec    1.00      11.9±0.21s        ? ?/sec
indexing/Indexing movies in three batches                         1.00      18.2±0.24s        ? ?/sec    1.01      18.3±0.29s        ? ?/sec
indexing/Indexing movies with default settings                    1.00      17.5±0.09s        ? ?/sec    1.01      17.7±0.26s        ? ?/sec
indexing/Indexing songs in three batches with default settings    1.00      64.8±0.47s        ? ?/sec    1.00      65.1±0.49s        ? ?/sec
indexing/Indexing songs with default settings                     1.00      54.9±0.99s        ? ?/sec    1.01      55.7±1.34s        ? ?/sec
indexing/Indexing songs without any facets                        1.00      50.6±0.62s        ? ?/sec    1.01      50.9±1.05s        ? ?/sec
indexing/Indexing songs without faceted numbers                   1.00      54.0±1.14s        ? ?/sec    1.01      54.7±1.13s        ? ?/sec
indexing/Indexing wiki                                            1.00     996.2±8.54s        ? ?/sec    1.02   1021.1±30.63s        ? ?/sec
indexing/Indexing wiki in three batches                           1.00    1136.8±9.72s        ? ?/sec    1.00    1138.6±6.59s        ? ?/sec
```

So basically everything slowed down very slightly (at most about 2%), except the geo_point dataset, which has a nested field and became about twice as fast (23.7 s to 11.9 s).

Co-authored-by: Tamo <tamo@meilisearch.com>

2022-04-26 11:50:33 +00:00

.github

Enforce labelling for the PRs

2022-04-09 23:47:06 +02:00

benchmarks

Get rid of the threshold when comparing benchmarks

2022-04-19 15:39:58 +02:00

cli

Update the list of milli's subcrates

2022-04-25 15:55:38 +02:00

filter-parser

Update the list of milli's subcrates

2022-04-25 15:55:38 +02:00

flatten-serde-json

improve the fuzzer of the flatten crate

2022-04-20 16:11:23 +02:00

helpers

Update the list of milli's subcrates

2022-04-25 15:55:38 +02:00

http-ui

Merge #483

2022-04-19 11:42:32 +00:00

infos

Update version for the next release (v0.26.1)

2022-04-14 11:44:06 +02:00

json-depth-checker

Update the list of milli's subcrates

2022-04-25 15:55:38 +02:00

milli

Merge #514

2022-04-26 11:50:33 +00:00

script

format the whole project

2021-06-16 18:33:33 +02:00

.gitignore

Change the project to become a workspace with milli as a default-member

2021-02-12 16:15:09 +01:00

.rustfmt.toml

format the whole project

2021-06-16 18:33:33 +02:00

bors.toml

Remove pr_status from bors settings

2022-04-25 13:39:45 +02:00

Cargo.toml

create the json-depth-checker crate

2022-04-14 11:14:08 +02:00

CONTRIBUTING.md

First version of new CONTRIBUTING.md

2022-04-21 19:02:22 +02:00

LICENSE

Update LICENSE

2022-02-15 15:52:50 +01:00

README.md

Update README.md

2022-04-25 18:14:43 +02:00

README.md

a concurrent indexer combined with fast and relevant search algorithms

Introduction

This repository contains the core engine used in Meilisearch.

It contains a library that manages one, and only one, index; Meilisearch handles multiple indexes itself. Milli cannot store pending updates: that is the job of a layer above it, which is why it can only process one update at a time.
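
The division of labor described above can be sketched as follows. This is an illustration only: `Update`, `Engine`, and `UpdateQueue` are hypothetical types, not milli's or Meilisearch's API. The point is that the engine applies one update at a time, while the layer above stores pending updates and feeds them in order.

```rust
use std::collections::VecDeque;

/// Hypothetical update kinds; these are not milli's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Update {
    AddDocuments(Vec<String>),
    ClearDocuments,
}

/// Hypothetical single-index engine standing in for milli.
#[derive(Debug, Default)]
pub struct Engine {
    pub documents: Vec<String>,
}

impl Engine {
    /// Applies exactly one update. Taking `&mut self` means two updates
    /// can never run at the same time on the same engine.
    pub fn apply(&mut self, update: Update) {
        match update {
            Update::AddDocuments(mut docs) => self.documents.append(&mut docs),
            Update::ClearDocuments => self.documents.clear(),
        }
    }
}

/// The layer above the engine: it stores pending updates in arrival order.
#[derive(Debug, Default)]
pub struct UpdateQueue {
    pending: VecDeque<Update>,
}

impl UpdateQueue {
    /// Records an update to be processed later.
    pub fn register(&mut self, update: Update) {
        self.pending.push_back(update);
    }

    /// Hands the oldest pending update to the engine.
    /// Returns false when there was nothing left to process.
    pub fn process_next(&mut self, engine: &mut Engine) -> bool {
        match self.pending.pop_front() {
            Some(update) => {
                engine.apply(update);
                true
            }
            None => false,
        }
    }
}
```

A caller would register updates as they arrive and call `process_next` in a loop until it returns `false`.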

This repository also contains crates to quickly debug the engine:

* There are benchmarks located in the `benchmarks` crate.
* The `cli` crate is a simple command-line interface that helps run flamegraph on top of it.
* The `filter-parser` crate contains the parser for the Meilisearch filter syntax.
* The `flatten-serde-json` crate contains the library that flattens serde-json `Value` objects like Elasticsearch does.
* The `helpers` crate is only used to do operations on the database.
* The `http-ui` crate is a simple HTTP dashboard to test the features like for real!
* The `infos` crate is used to dump the internal data-structure and ensure correctness.
* The `json-depth-checker` crate is used to indicate if a JSON must be flattened.
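
As a rough illustration of what flattening means here, the sketch below turns nested objects into dotted keys, so `{"_geo": {"lat": 1.5}}` becomes `{"_geo.lat": 1.5}`. It is a simplified assumption of the behavior, not the `flatten-serde-json` implementation: the `Value` enum is a stand-in for `serde_json::Value`, the function names are hypothetical, and arrays are left out entirely.

```rust
use std::collections::BTreeMap;

/// A reduced stand-in for a JSON value (assumption: no arrays, booleans or nulls).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Flattens nested objects into a single-level map with dotted keys.
pub fn flatten(object: &BTreeMap<String, Value>) -> BTreeMap<String, Value> {
    let mut out = BTreeMap::new();
    flatten_into(object, None, &mut out);
    out
}

/// Walks `object` recursively, prefixing each key with its parent's path.
fn flatten_into(
    object: &BTreeMap<String, Value>,
    prefix: Option<&str>,
    out: &mut BTreeMap<String, Value>,
) {
    for (key, value) in object {
        let full_key = match prefix {
            Some(parent) => format!("{}.{}", parent, key),
            None => key.clone(),
        };
        match value {
            // Nested object: recurse instead of storing the object itself.
            Value::Object(inner) => flatten_into(inner, Some(&full_key), out),
            // Scalar: store it under its full dotted path.
            other => {
                out.insert(full_key, other.clone());
            }
        }
    }
}
```

A document that contains no nested object comes out unchanged, which is why checking first whether flattening is needed at all can save work.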

How to use it?

Milli is a search library: it must be embedded in a program. You can build and open its documentation with `cargo doc --open`.

Here is an example usage of the library where we insert documents into the engine and search for one of them right after.

```rust
// Import paths are assumed from milli's module layout at this revision.
use milli::heed::EnvOpenOptions;
use milli::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig};
use milli::{Index, Search};
// `documents!` is milli's helper macro for building a batch of documents.

// Open an index in a temporary directory.
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();

// Build the batch of documents to index.
let mut wtxn = index.write_txn().unwrap();
let content = documents!([
    {
        "id": 2,
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "genre": "romance",
        "price": "3.5$",
    },
    {
        "id": 456,
        "title": "Le Petit Prince",
        "author": "Antoine de Saint-Exupéry",
        "genre": "adventure",
        "price": "10.0$",
    },
    {
        "id": 1,
        "title": "Wonderland",
        "author": "Lewis Carroll",
        "genre": "fantasy",
        "price": "25.99$",
    },
    {
        "id": 4,
        "title": "Harry Potter and the Half-Blood Prince",
        "author": "J. K. Rowling",
        "genre": "fantasy",
    },
]);

// Index the documents inside the write transaction, then commit it.
let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let mut builder =
    IndexDocuments::new(&mut wtxn, &index, &config, indexing_config.clone(), |_| ())
        .unwrap();
builder.add_documents(content).unwrap();
builder.execute().unwrap();
wtxn.commit().unwrap();

// You can search in the index now!
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
// "horry" is deliberately misspelled: typo tolerance should still match "Harry".
search.query("horry");
search.limit(10);

let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 1);
```

Contributing

We're glad you're thinking about contributing to this repository! Feel free to pick an issue and ask any questions you have. Some points might not be clear, and we are available to help!

Also, we recommend following CONTRIBUTING.md when creating your PR.