meilisearch

mirror of https://github.com/meilisearch/meilisearch.git synced 2025-01-18 08:48:32 +08:00

A lightning-fast search API that fits effortlessly into your apps, websites, and workflow

bors[bot] 21284cf235 Merge #556	2022-08-04 09:46:06 +00:00

556: Add EXISTS filter r=loiclec a=loiclec

## What does this PR do?

Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo. It creates a `field EXISTS` filter which selects all documents containing the `field` key.

For example, with the following documents:

```json
[{ "id": 0, "colour": [] },
 { "id": 1, "colour": ["blue", "green"] },
 { "id": 2, "colour": 145238 },
 { "id": 3, "colour": null },
 { "id": 4, "colour": { "green": [] } },
 { "id": 5, "colour": {} },
 { "id": 6 }]
```

Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`.

## Details

There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists.

To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place.

There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:

```json
{ "id": 0, "colour": [], "size": {} }
```

would be flattened to:

```json
{ "id": 0 }
```

prior to being given to the extraction pipeline. This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value.

For example:

```json
{ "id": 0, "colour": { "green": [], "blue": 1 }, "size": {} }
```

becomes

```json
{ "id": 0, "colour": [], "colour.green": [], "colour.blue": 1, "size": [] }
```

Co-authored-by: Kerollmops <clement@meilisearch.com>

.github	deny warnings in CI	2022-04-28 15:35:12 +02:00
benchmarks	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
cli	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
filter-parser	Update filter-parser/fuzz/.gitignore	2022-07-21 16:12:01 +02:00
flatten-serde-json	Merge branch 'filter/field-exist'	2022-07-21 14:51:41 +02:00
helpers	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
http-ui	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
infos	Merge branch 'filter/field-exist'	2022-07-21 14:51:41 +02:00
json-depth-checker	Update version for next release (v0.32.0)	2022-07-21 13:20:02 +04:00
milli	Merge #556	2022-08-04 09:46:06 +00:00
script	format the whole project	2021-06-16 18:33:33 +02:00
.gitignore	Change the project to become a workspace with milli as a default-member	2021-02-12 16:15:09 +01:00
.rustfmt.toml	format the whole project	2021-06-16 18:33:33 +02:00
bors.toml	Update bors toml	2022-04-26 17:36:04 +02:00
Cargo.toml	create the json-depth-checker crate	2022-04-14 11:14:08 +02:00
CONTRIBUTING.md	Remove the wip section part of the contributing file	2022-05-04 14:44:51 +02:00
LICENSE	Update LICENSE	2022-02-15 15:52:50 +01:00
README.md	Update README.md	2022-04-25 18:14:43 +02:00

README.md

a concurrent indexer combined with fast and relevant search algorithms

Introduction

This repository contains the core engine used in Meilisearch.

It contains a library that manages exactly one index; Meilisearch itself handles multiple indexes. Milli does not keep pending updates in a store of its own: that is the job of the layer above it, which is why Milli can only process one update at a time.
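To make that division of responsibilities concrete, here is a minimal sketch of what a layer above the engine might look like. Every name in it (`Update`, `SingleIndex`, `UpdateQueue`) is hypothetical and uses only the standard library; it is not Milli's or Meilisearch's actual API, only an illustration of "store the updates above, apply them one at a time below".

```rust
use std::collections::VecDeque;

/// Hypothetical update kinds; real engine updates are richer than this.
#[derive(Debug, Clone, PartialEq)]
enum Update {
    AddDocuments(Vec<String>),
    ClearDocuments,
}

/// Stand-in for a single index: it only knows how to apply one update.
#[derive(Debug, Default)]
struct SingleIndex {
    documents: Vec<String>,
}

impl SingleIndex {
    fn apply(&mut self, update: Update) {
        match update {
            Update::AddDocuments(mut docs) => self.documents.append(&mut docs),
            Update::ClearDocuments => self.documents.clear(),
        }
    }
}

/// The layer above the engine: it stores pending updates and feeds
/// them to the index strictly one at a time, in arrival order.
#[derive(Debug, Default)]
struct UpdateQueue {
    pending: VecDeque<Update>,
}

impl UpdateQueue {
    /// Remembers an update until the index is ready for it.
    fn register(&mut self, update: Update) {
        self.pending.push_back(update);
    }

    /// Applies the oldest pending update, if any.
    /// Returns `true` when an update was processed.
    fn process_next(&mut self, index: &mut SingleIndex) -> bool {
        match self.pending.pop_front() {
            Some(update) => {
                index.apply(update);
                true
            }
            None => false,
        }
    }
}
```

A caller would `register` updates as they arrive and call `process_next` in a loop; because the queue owns the backlog, the index itself never needs to know that more than one update exists.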

This repository contains crates to quickly debug the engine:

The benchmarks crate contains the benchmarks.
The cli crate is a simple command-line interface that is convenient for profiling the engine with flamegraph.
The filter-parser crate contains the parser for the Meilisearch filter syntax.
The flatten-serde-json crate contains the library that flattens serde_json Value objects the way Elasticsearch does.
The helpers crate is only used to perform operations on the database.
The http-ui crate is a simple HTTP dashboard for trying out the features interactively.
The infos crate is used to dump the internal data structures and check their correctness.
The json-depth-checker crate is used to indicate whether a JSON document must be flattened.
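As an illustration of what flattening means here, the sketch below follows the behaviour described in the commit message for #556: nested object keys are joined with `.`, and any key that was encountered but ends up absent from the flattened object is added back with an empty array. It is a simplified stand-in, not the flatten-serde-json implementation: the `Json` enum and both functions are hypothetical, only the standard library is used, and arrays are kept as leaf values rather than being traversed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, minimal JSON value (a stand-in for `serde_json::Value`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Number(f64),
    Text(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Flattens nested objects into dotted keys, then re-adds every key that
/// was seen during the traversal but produced no entry of its own.
fn flatten(object: &BTreeMap<String, Json>) -> BTreeMap<String, Json> {
    let mut flat = BTreeMap::new();
    let mut seen = BTreeSet::new();
    flatten_into(object, None, &mut flat, &mut seen);
    for key in seen {
        // Keys such as "size" (empty object) or "colour" (nested object)
        // get an empty array so that their existence is not lost.
        flat.entry(key).or_insert_with(|| Json::Array(Vec::new()));
    }
    flat
}

fn flatten_into(
    object: &BTreeMap<String, Json>,
    prefix: Option<&str>,
    flat: &mut BTreeMap<String, Json>,
    seen: &mut BTreeSet<String>,
) {
    for (key, value) in object {
        let path = match prefix {
            Some(prefix) => format!("{prefix}.{key}"),
            None => key.clone(),
        };
        seen.insert(path.clone());
        match value {
            // Objects are traversed; they never become entries themselves.
            Json::Object(inner) => flatten_into(inner, Some(path.as_str()), flat, seen),
            // Everything else (including arrays, in this sketch) is a leaf.
            leaf => {
                flat.insert(path, leaf.clone());
            }
        }
    }
}
```

Applied to the commit's example `{ "id": 0, "colour": { "green": [], "blue": 1 }, "size": {} }`, this sketch yields the five keys `id`, `colour`, `colour.green`, `colour.blue` and `size`, with `colour` and `size` mapped to empty arrays.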

How to use it?

Milli is a search library; it must be embedded in a program. You can generate its documentation with cargo doc --open.

Here is an example usage of the library where we insert documents into the engine and search for one of them right after.

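// Import paths are assumed from the crate layout at this version; adjust if they differ.
use milli::heed::EnvOpenOptions;
use milli::update::{IndexDocuments, IndexDocumentsConfig, IndexerConfig};
use milli::{Index, Search};

// Create the index in a temporary directory, with a 10 MB memory map.
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();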

let mut wtxn = index.write_txn().unwrap();
let content = documents!([
    {
        "id": 2,
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "genre": "romance",
        "price": "3.5$",
    },
    {
        "id": 456,
        "title": "Le Petit Prince",
        "author": "Antoine de Saint-Exupéry",
        "genre": "adventure",
        "price": "10.0$",
    },
    {
        "id": 1,
        "title": "Wonderland",
        "author": "Lewis Carroll",
        "genre": "fantasy",
        "price": "25.99$",
    },
    {
        "id": 4,
        "title": "Harry Potter and the Half-Blood Prince",
        "author": "J. K. Rowling",
        "genre": "fantasy",
    },
]);

let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let mut builder =
    IndexDocuments::new(&mut wtxn, &index, &config, indexing_config.clone(), |_| ())
        .unwrap();
builder.add_documents(content).unwrap();
builder.execute().unwrap();
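// Nothing is visible to readers until the write transaction is committed.
wtxn.commit().unwrap();

// You can search in the index now!
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
// The query is deliberately misspelled; the example expects it to match one document.
search.query("horry");
search.limit(10);

let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 1);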
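The latest commit listed above (#556) adds a `field EXISTS` filter backed by a `facet-id-exists-docids` database that maps each field to the set of document ids containing it. The sketch below models only that idea and is not Milli's code: `ExistsIndex` and its methods are hypothetical, fields are keyed by name rather than by field id, and a `BTreeSet<u32>` stands in for the bitmap.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory model of the "field exists" database.
#[derive(Debug, Default)]
struct ExistsIndex {
    /// Every document id known to the index.
    all_docids: BTreeSet<u32>,
    /// For each field, the ids of the documents that contain that key.
    field_docids: BTreeMap<String, BTreeSet<u32>>,
}

impl ExistsIndex {
    /// Records a document together with the keys it contains.
    /// The value of a key is irrelevant: `[]`, `null` and `{}` all count.
    fn add_document(&mut self, docid: u32, keys: &[&str]) {
        self.all_docids.insert(docid);
        for key in keys {
            self.field_docids
                .entry(key.to_string())
                .or_default()
                .insert(docid);
        }
    }

    /// `field EXISTS`: ids of all documents containing `field`.
    fn exists(&self, field: &str) -> BTreeSet<u32> {
        self.field_docids.get(field).cloned().unwrap_or_default()
    }

    /// `field NOT EXISTS`: every other document.
    fn not_exists(&self, field: &str) -> BTreeSet<u32> {
        let present = self.exists(field);
        self.all_docids.difference(&present).copied().collect()
    }
}
```

With the seven documents from the commit message, where ids 0 to 5 carry a `colour` key and id 6 does not, `exists("colour")` returns ids 0 through 5 and `not_exists("colour")` returns only id 6.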

Contributing

We're glad you're thinking about contributing to this repository! Feel free to pick an issue and to ask any questions you have. Some points might not be clear, and we are available to help you!

Also, we recommend following the CONTRIBUTING.md to create your PR.