Commit Graph

8585 Commits

Author SHA1 Message Date
Clément Renault
15a4c05379
Store the facet string values in multiple FSTs 2023-06-28 14:58:41 +02:00
meili-bors[bot]
9deeec88e0
Merge #3861
3861: Add "meilisearch" prefix to last metrics that were missing it r=Kerollmops a=dureuill

# Pull Request

## Related issue
Related to #3790 

## What does this PR do?
- change implementation to follow the spec on metrics name
- regenerate grafana dashboard from the code

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-28 09:28:31 +00:00
Louis Dureuil
167ac55a2d
Update dashboard generated from grafana 2023-06-28 11:22:16 +02:00
Louis Dureuil
ea68ccd034
prefix http_* metrics by meilisearch 2023-06-28 11:21:50 +02:00
meili-bors[bot]
d4f10800f2
Merge #3834
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on. 

## Trying the prototype

A dedicated docker image has been released for this feature:

#### last prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### others prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-28 08:19:23 +00:00
meili-bors[bot]
dc293911ad
Merge #3745
3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu

# Pull Request
Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message.
This was requested in this PR https://github.com/meilisearch/meilisearch/pull/3739

## Related issue
https://github.com/meilisearch/meilisearch/pull/3739

## What does this PR do?
- Adds requested test

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Filip Bachul <filipbachul@gmail.com>
2023-06-27 14:58:23 +00:00
meili-bors[bot]
9d68e6969e
Merge #3859
3859: Merge all analytics events pertaining to updating the experimental features r=Kerollmops a=dureuill

Follow-up to #3850 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-27 13:26:01 +00:00
Louis Dureuil
b4b686d253
Merge all analytics events pertaining to updating the experimental features 2023-06-27 15:16:23 +02:00
meili-bors[bot]
98ec476198
Merge #3855
3855: Change and add links to the Cloud r=Kerollmops a=dureuill

- add cloud link in banner
- add utm to existing links following https://github.com/meilisearch/integration-guides/issues/277#issuecomment-1592054536

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-27 12:29:36 +00:00
Louis Dureuil
c47b8a8bfe
Fix typo
Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2023-06-27 14:27:54 +02:00
Louis Dureuil
054f81a021
Make message consistent with the one in integration repos 2023-06-27 14:20:55 +02:00
meili-bors[bot]
d8ea688481
Merge #3825
3825: Accept semantic vectors and allow users to query nearest neighbors r=Kerollmops a=Kerollmops

This Pull Request brings a new feature to the current API. The engine accepts a new `_vectors` field akin to the `_geo` one. This vector is stored in Meilisearch and can be retrieved via search. This work is the first step toward hybrid search, bringing the best of both worlds: keyword and semantic search ❤️‍🔥

## ToDo
 - [x] Make it possible to get the `limit` nearest neighbors from a user-generated vector by using the `vector` field of search route.
 - [x] Delete the documents and vectors from the HNSW-related data structures.
     - [x] Do it the slow and ugly way (we need to be able to iterate over all the values).
     - [ ] Do it the efficient way (Wait for a new method or implement it myself).
 - [ ] ~~Move from the `hnsw` crate to the hgg one~~ The hgg crate is too slow.
   Meilisearch takes approximately 88s to answer a query. It is related to the time it takes to deserialize the `Hgg` data structure or search in it. I didn't take the time to measure precisely. We moved back to the hnsw crate which takes approximately 40ms to answer.
   - [ ] ~~Wait for a fix for https://github.com/rust-cv/hgg/issues/4.~~
 - [x] Fix the current dot product function.
 - [x] Fill in the other `SearchResult` fields.
 - [x] Remove the `hnsw` dependency of the meilisearch crate.
 - [x] Fix the pages by taking the offset into account.
 - [x] Release a first prototype https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647
 - [x] Make the pagination and filtering faster and more correct.
 - [x] Return the original vector in the output search results (like `query`).
 - [x] Return an `_semanticSimilarity` field in the documents (it's a dot product)
   - [x] Return this score even if the `_vectors` field is not displayed
   - [x] Rename the field `_semanticScore`.
   - [ ] Return the `_geoDistance` value even if the `_geo` field is not displayed
 - [x] Store the HNSW on possibly multiple LMDB values.
   - [ ] Measure it and make it faster if needed
   - [ ] Export the `ReadableSlices` type into a small external crate
 - [x] Accept an `_vectors` field instead of the `_vector` one.
 - [x] Normalize all vectors.
 - [ ] Remove the `_vectors` field from the default searchable attributes (as we do with `_geo`?).
 - [ ] Correctly compute the candidates by remembering the documents having a valid `_vectors` field.
 - [ ] Return the right errors:
     - [ ] Return an error when the query vector is not the same length as the vectors in the HNSW.
     - [ ] We must return the user document id that triggered the vector dimension issue.
     - [x] If an indexation error occurs.
     - [ ] Fix the error codes when using the search route.
 - [ ] ~~Introduce some settings:~~
    We currently ensure that the vector length is consistent over the whole set of documents and return an error for when a vector dimension doesn't follow the current number of dimensions.
     - [ ] The length of the vector the user will provide.
     - [ ] The distance function (we only support dot as of now).
 - [ ] Introduce other distance functions
    - [ ] Euclidean
    - [ ] Dot Product
    - [ ] Cosine
    - [ ] Make them SIMD optimized
    - [ ] Give credit to qdrant
 - [ ] Add tests.
 - [ ] Write a mini spec.
 - [ ] Release it in v1.3 as an experimental feature.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-06-27 11:17:07 +00:00
Clément Renault
e69be93e42
Log warn about using both q and vector field parameters 2023-06-27 12:32:44 +02:00
Clément Renault
b2b413db12
Return all the _semanticScore values in the documents 2023-06-27 12:32:43 +02:00
Clément Renault
30741d17fa
Change the TODO message 2023-06-27 12:32:43 +02:00
Clément Renault
ebad1f396f
Remove the useless euclidean distance implementation 2023-06-27 12:32:43 +02:00
Clément Renault
29d8268c94
Fix the vector query part by using the correct universe 2023-06-27 12:32:43 +02:00
Clément Renault
63bfe1cee2
Ignore when there are too many vectors 2023-06-27 12:32:43 +02:00
Clément Renault
f3e4d70638
Send analytics about the query vector length 2023-06-27 12:32:43 +02:00
Kerollmops
eecf20f109
Introduce a new invalid_vector_store 2023-06-27 12:32:42 +02:00
Kerollmops
816d7ed174
Update the Vector Store product feature link 2023-06-27 12:32:42 +02:00
Louis Dureuil
864ad2a23c
Check that vector store feature is enabled 2023-06-27 12:32:42 +02:00
Kerollmops
66fb5c150c
Rename _semanticSimilarity into _semanticScore 2023-06-27 12:32:42 +02:00
Kerollmops
7c2f5f77b8
Make clippy and fmt happy 2023-06-27 12:32:42 +02:00
Kerollmops
66b8cfd8c8
Introduce a way to store the HNSW on multiple LMDB entries 2023-06-27 12:32:42 +02:00
Kerollmops
ff3664431f
Make rustfmt happy 2023-06-27 12:32:42 +02:00
Kerollmops
531748c536
Return a user error when the _vectors type is invalid 2023-06-27 12:32:41 +02:00
Kerollmops
7aa1275337
Display the _semanticSimilarity even if the _vectors field is not displayed 2023-06-27 12:32:41 +02:00
Kerollmops
737aec1705
Expose an _semanticSimilarity as a dot product in the documents 2023-06-27 12:32:41 +02:00
Kerollmops
3e3c743392
Make Rustfmt happy 2023-06-27 12:32:41 +02:00
Kerollmops
5c5a4e075d
Make clippy happy 2023-06-27 12:32:41 +02:00
Kerollmops
ab9f2269aa
Normalize the vectors during indexation and search 2023-06-27 12:32:41 +02:00
Kerollmops
321ec5f3fa
Accept multiple vectors by documents using the _vectors field 2023-06-27 12:32:40 +02:00
Kerollmops
1b2923f7c0
Return the vector in the output of the search routes 2023-06-27 12:32:40 +02:00
Kerollmops
717d4fddd4
Remove the unused distance 2023-06-27 12:32:40 +02:00
Kerollmops
a7e0f0de89
Introduce a new error message for invalid vector dimensions 2023-06-27 12:32:40 +02:00
Kerollmops
3b560ef7d0
Make clippy happy 2023-06-27 12:32:40 +02:00
Kerollmops
2cf747cb89
Fix the tests 2023-06-27 12:32:40 +02:00
Kerollmops
3c31e1cdd1
Support more pages but in an ugly way 2023-06-27 12:32:39 +02:00
Kerollmops
23eaaf1001
Change the name of the distance module 2023-06-27 12:32:39 +02:00
Kerollmops
c2a402f3ae
Implement an ugly deletion of values in the HNSW 2023-06-27 12:32:39 +02:00
Kerollmops
436a10bef4
Replace the euclidean with a dot product 2023-06-27 12:32:39 +02:00
Kerollmops
8debf6fe81
Use a basic euclidean distance function 2023-06-27 12:32:39 +02:00
Kerollmops
c79e82c62a
Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
Kerollmops
aca305bb77
Log more to make sure we insert vectors in the hgg data-structure 2023-06-27 12:32:38 +02:00
Kerollmops
5816008139
Introduce an optimized version of the euclidean distance function 2023-06-27 12:32:38 +02:00
Kerollmops
268a9ef416
Move to the hgg crate 2023-06-27 12:32:38 +02:00
Clément Renault
642b0f3a1b
Expose a new vector field on the search route 2023-06-27 12:32:38 +02:00
Clément Renault
cad90e8cbc
Add a vector field to the search routes 2023-06-27 12:32:38 +02:00
Clément Renault
4571e512d2
Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00