400: Rewrite the filter parser and add a lot of tests r=irevoire a=irevoire

This PR is a complete rewrite of #358, which was reverted in #403.
You can already try this PR in Meilisearch here https://github.com/meilisearch/MeiliSearch/pull/1880.

Since writing a parser is quite complicated, I moved all the logic to another workspace called `filter_parser`.
This workspace knows nothing about milli, the filterable fields, or field IDs.
As you can see in its `Cargo.toml`, it has only two dependencies, both entirely focused on parsing:
```
nom = "7.0.0"
nom_locate = "4.0.0"
```

But introducing this new workspace made some changes necessary on the “AST”. Now the parser only returns `Tokens` (a simple `&str` with a bit of context). Everything is interpreted when we execute the filter later in milli.
This crate provides a new error type for all filter related errors.
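
As a rough illustration of the "tokens now, interpretation later" idea, here is a minimal sketch. `SketchToken` and its fields are hypothetical names chosen for this example; the crate's actual `Token` wraps a `nom_locate` span instead of a plain offset.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the crate's `Token`: a borrowed fragment of the
/// filter plus the character offset at which that fragment starts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchToken<'a> {
    pub fragment: &'a str,
    pub offset: usize,
}

impl<'a> SketchToken<'a> {
    /// Interpret the fragment only once the caller knows which type it needs.
    /// On failure, the offset is reported so the error can point into the filter.
    pub fn parse<T>(&self) -> Result<T, String>
    where
        T: FromStr,
        T::Err: std::fmt::Display,
    {
        self.fragment
            .parse::<T>()
            .map_err(|e| format!("{} (at offset {})", e, self.offset))
    }
}
```

With this shape, the parser never decides whether `1000` is a number or a string; the code that executes the filter calls `parse::<f64>()` (or keeps the raw `&str`) depending on the operator.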

---------
## Errors

Currently, we have multiple kinds of errors. Sometimes we generate errors that look like this (for `name = truc`):
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
```
At other times, pest generated errors that look like this:
```
Invalid syntax for the filter parameter: ` --> 1:7
  |
1 | name =
  |       ^---
  |
  = expected word`.
```

Most people saw it on a single line, like this (for `name =`):
```
Invalid syntax for the filter parameter: ` --> 1:7\n  |\n1 | name =\n  |       ^---\n  |\n  = expected word`.
```

-----------

With this PR, the error format is unified between all errors.
All errors follow this more straightforward format:
```
The error message.
[from char]:[to char] filter
```

This should be much easier for a human to read when embedded in JSON. It should also make the errors easy to parse, so a frontend playground could, for example, highlight the offending part of the filter.
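
To give an idea of how a client could consume this format, here is a minimal sketch that splits an error into its message and location and builds a caret line for highlighting. The names `ErrorLocation`, `split_error` and `underline` are hypothetical, and the sketch assumes `from` and `to` are 1-based character columns with `to` treated as exclusive.

```rust
/// Location of the offending part of a filter (assumed 1-based columns).
#[derive(Debug, PartialEq, Eq)]
pub struct ErrorLocation<'a> {
    pub from: usize,
    pub to: usize,
    pub filter: &'a str,
}

/// Split a two-line error into its message and its location line.
/// Returns `None` when the text does not follow the `from:to filter` layout.
pub fn split_error(error: &str) -> Option<(&str, ErrorLocation<'_>)> {
    let (message, location) = error.split_once('\n')?;
    let (range, filter) = location.split_once(' ')?;
    let (from, to) = range.split_once(':')?;
    Some((
        message,
        ErrorLocation { from: from.parse().ok()?, to: to.parse().ok()?, filter },
    ))
}

/// Build a caret line covering columns `from..to`, with at least one caret
/// so that empty ranges such as `7:7` still point somewhere.
pub fn underline(location: &ErrorLocation<'_>) -> String {
    let width = location.to.saturating_sub(location.from).max(1);
    format!("{}{}", " ".repeat(location.from.saturating_sub(1)), "^".repeat(width))
}
```

A playground could print `location.filter` followed by `underline(&location)` on the next line to show where the problem is.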

Here is an example of the two previous errors with the new format:
For `name = truc`:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.
1:4 name = truc
```
Or in one line:
```
Attribute `name` is not filterable. Available filterable attributes are: ``.\n1:4 name = truc
```

And for `name =`:
```
Was expecting a value but instead got nothing.
7:7 name =
```
Or in one line:
```
Was expecting a value but instead got nothing.\n7:7 name =
```

Also, since we now control the parser, we can generate more explicit error messages, so a lot of new errors have been created. I tried to be as helpful as possible for the user; here is a short overview of the new error messages you can get when misusing a filter:
```
Expression `"truc` is missing the following closing delimiter: `"`.
8:13 name = "truc
```
```
The `_geoRadius` filter is an operation and can't be used as a value.
8:30 name = _geoRadius(12, 13, 14)
```
etc

## Tests
A lot of tests have been written in the `filter_parser` crate. I think there is a unit test for every part of the syntax.
But since we can never be sure we covered every case, I also fuzzed the new parser A LOT (for about 8 hours on 20 threads). The code to fuzz the parser is included in the workspace, so if one day we need to change something in the syntax, we can reuse it by simply running:
```
cargo fuzz run --release parse
```

## Milli
I renamed the type and module `filter_condition.rs` / `FilterCondition` to `filter.rs` / `Filter`.

Co-authored-by: Tamo <tamo@meilisearch.com>
This commit is contained in:
bors[bot] 2021-11-09 16:09:34 +00:00 committed by GitHub
commit 8dff08d772
67 changed files with 1742 additions and 1031 deletions

View File

@ -1,5 +1,5 @@
[workspace]
members = ["milli", "http-ui", "benchmarks", "infos", "helpers", "cli"]
members = ["milli", "filter-parser", "http-ui", "benchmarks", "infos", "helpers", "cli"]
default-members = ["milli"]
[profile.dev]

View File

@ -9,7 +9,7 @@ use criterion::BenchmarkId;
use heed::EnvOpenOptions;
use milli::documents::DocumentBatchReader;
use milli::update::{IndexDocumentsMethod, Settings, UpdateBuilder};
use milli::{FilterCondition, Index};
use milli::{Filter, Index};
use serde_json::{Map, Value};
pub struct Conf<'a> {
@ -117,7 +117,7 @@ pub fn run_benches(c: &mut criterion::Criterion, confs: &[Conf]) {
let mut search = index.search(&rtxn);
search.query(query).optional_words(conf.optional_words);
if let Some(filter) = conf.filter {
let filter = FilterCondition::from_str(&rtxn, &index, filter).unwrap();
let filter = Filter::from_str(filter).unwrap();
search.filter(filter);
}
if let Some(sort) = &conf.sort {

View File

@ -250,7 +250,7 @@ impl Search {
}
if let Some(ref filter) = self.filter {
let condition = milli::FilterCondition::from_str(&txn, &index, filter)?;
let condition = milli::Filter::from_str(filter)?;
search.filter(condition);
}

10
filter-parser/Cargo.toml Normal file
View File

@ -0,0 +1,10 @@
[package]
name = "filter-parser"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
nom = "7.0.0"
nom_locate = "4.0.0"

36
filter-parser/README.md Normal file
View File

@ -0,0 +1,36 @@
# Filter parser
This workspace is dedicated to the parsing of the MeiliSearch filters.
Most of the code and explanations are in [`lib.rs`](./src/lib.rs); in particular, the BNF of the filters is at the top of that file.
The parser uses [nom](https://docs.rs/nom/) to do most of its work and [nom-locate](https://docs.rs/nom_locate/) to keep track of what we were doing when we encountered an error.
## Cli
A simple main is provided to quickly test whether a filter can be parsed without bringing in milli.
It takes one argument and tries to parse it.
```
cargo run -- 'field = value' # success
cargo run -- 'field = "doggo' # error => missing closing delimiter "
```
## Fuzz
The workspace has been fuzzed with [cargo-fuzz](https://rust-fuzz.github.io/book/cargo-fuzz.html).
### Setup
You'll need rust-nightly to execute the fuzzer.
```
cargo install cargo-fuzz
```
### Run
When the fuzzer runs the filter parser, it triggers a stack overflow very quickly. We can avoid this problem by limiting the `max_len` of [libfuzzer](https://llvm.org/docs/LibFuzzer.html) to 500 bytes.
```
cargo fuzz run parse -- -max_len=500
```
## What to do if you find a bug in the parser
- Write a test at the end of the [`lib.rs`](./src/lib.rs) to ensure it never happens again.
- Add a file in [the corpus directory](./fuzz/corpus/parse/) with your filter to help the fuzzer find new bugs. Since this directory is going to be heavily polluted by the execution of the fuzzer, it's in the gitignore and you'll need to force-add your new test (`git add -f`).

View File

@ -0,0 +1,25 @@
[package]
name = "filter-parser-fuzz"
version = "0.0.0"
authors = ["Automatically generated"]
publish = false
edition = "2018"
[package.metadata]
cargo-fuzz = true
[dependencies]
libfuzzer-sys = "0.4"
[dependencies.filter-parser]
path = ".."
# Prevent this from interfering with workspaces
[workspace]
members = ["."]
[[bin]]
name = "parse"
path = "fuzz_targets/parse.rs"
test = false
doc = false

View File

@ -0,0 +1 @@
channel = Ponce

View File

@ -0,0 +1 @@
channel != ponce

View File

@ -0,0 +1 @@
NOT channel = ponce

View File

@ -0,0 +1 @@
subscribers < 1000

View File

@ -0,0 +1 @@
subscribers > 1000

View File

@ -0,0 +1 @@
subscribers <= 1000

View File

@ -0,0 +1 @@
subscribers >= 1000

View File

@ -0,0 +1 @@
NOT subscribers < 1000

View File

@ -0,0 +1 @@
NOT subscribers > 1000

View File

@ -0,0 +1 @@
NOT subscribers <= 1000

View File

@ -0,0 +1 @@
NOT subscribers >= 1000

View File

@ -0,0 +1 @@
subscribers = 12

View File

@ -0,0 +1 @@
subscribers 100 TO 1000

View File

@ -0,0 +1 @@
NOT subscribers 100 TO 1000

View File

@ -0,0 +1 @@
_geoRadius(12, 13, 14)

View File

@ -0,0 +1 @@
NOT _geoRadius(12, 13, 14)

View File

@ -0,0 +1 @@
channel = ponce AND 'dog race' != 'bernese mountain'

View File

@ -0,0 +1 @@
channel = ponce OR 'dog race' != 'bernese mountain'

View File

@ -0,0 +1 @@
channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000

View File

@ -0,0 +1 @@
channel = ponce AND ( 'dog race' != 'bernese mountain' OR subscribers > 1000 )

View File

@ -0,0 +1 @@
(channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000) AND _geoRadius(12, 13, 14)

View File

@ -0,0 +1 @@
channel = Ponce = 12

View File

@ -0,0 +1 @@
channel = 'Mister Mv'

View File

@ -0,0 +1 @@
channel =

View File

@ -0,0 +1 @@
channel = 🐻

View File

@ -0,0 +1 @@
OR

View File

@ -0,0 +1 @@
AND

View File

@ -0,0 +1 @@
channel Ponce

View File

@ -0,0 +1 @@
channel = Ponce OR

View File

@ -0,0 +1 @@
_geoRadius

View File

@ -0,0 +1 @@
_geoRadius = 12

View File

@ -0,0 +1 @@
_geoPoint(12, 13, 14)

View File

@ -0,0 +1 @@
position <= _geoPoint(12, 13, 14)

View File

@ -0,0 +1 @@
channel = "Mister Mv"

View File

@ -0,0 +1 @@
position <= _geoRadius(12, 13, 14)

View File

@ -0,0 +1 @@
channel = 'ponce

View File

@ -0,0 +1 @@
channel = "ponce

View File

@ -0,0 +1 @@
channel = mv OR (followers >= 1000

View File

@ -0,0 +1 @@
'dog race' = Borzoi

View File

@ -0,0 +1 @@
"dog race" = Chusky

View File

@ -0,0 +1 @@
"dog race" = "Bernese Mountain"

View File

@ -0,0 +1 @@
'dog race' = 'Bernese Mountain'

View File

@ -0,0 +1 @@
"dog race" = 'Bernese Mountain'

View File

@ -0,0 +1,18 @@
#![no_main]
use filter_parser::{ErrorKind, FilterCondition};
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
if let Ok(s) = std::str::from_utf8(data) {
// When we are fuzzing the parser we can get a stack overflow very easily.
// But since this doesn't happen with a normal build we are just going to limit the fuzzer to inputs shorter than 500 bytes.
if s.len() < 500 {
match FilterCondition::parse(s) {
Err(e) if matches!(e.kind(), ErrorKind::InternalError(_)) => {
panic!("Found an internal error: `{:?}`", e)
}
_ => (),
}
}
}
});

View File

@ -0,0 +1,73 @@
//! BNF grammar:
//!
//! ```text
//! condition = value ("=" | ">" ...) value
//! to = value value TO value
//! ```
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::combinator::cut;
use nom::sequence::tuple;
use Condition::*;
use crate::{parse_value, FilterCondition, IResult, Span, Token};
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Condition<'a> {
GreaterThan(Token<'a>),
GreaterThanOrEqual(Token<'a>),
Equal(Token<'a>),
NotEqual(Token<'a>),
LowerThan(Token<'a>),
LowerThanOrEqual(Token<'a>),
Between { from: Token<'a>, to: Token<'a> },
}
impl<'a> Condition<'a> {
/// This method can return two operations in case it must express
/// an OR operation for the between case (i.e. `TO`).
pub fn negate(self) -> (Self, Option<Self>) {
match self {
GreaterThan(n) => (LowerThanOrEqual(n), None),
GreaterThanOrEqual(n) => (LowerThan(n), None),
Equal(s) => (NotEqual(s), None),
NotEqual(s) => (Equal(s), None),
LowerThan(n) => (GreaterThanOrEqual(n), None),
LowerThanOrEqual(n) => (GreaterThan(n), None),
Between { from, to } => (LowerThan(from), Some(GreaterThan(to))),
}
}
}
/// condition = value ("=" | ">" ...) value
pub fn parse_condition(input: Span) -> IResult<FilterCondition> {
let operator = alt((tag("<="), tag(">="), tag("!="), tag("<"), tag(">"), tag("=")));
let (input, (fid, op, value)) = tuple((parse_value, operator, cut(parse_value)))(input)?;
let condition = match *op.fragment() {
"<=" => FilterCondition::Condition { fid, op: LowerThanOrEqual(value) },
">=" => FilterCondition::Condition { fid, op: GreaterThanOrEqual(value) },
"!=" => FilterCondition::Condition { fid, op: NotEqual(value) },
"<" => FilterCondition::Condition { fid, op: LowerThan(value) },
">" => FilterCondition::Condition { fid, op: GreaterThan(value) },
"=" => FilterCondition::Condition { fid, op: Equal(value) },
_ => unreachable!(),
};
Ok((input, condition))
}
/// to = value value TO value
pub fn parse_to(input: Span) -> IResult<FilterCondition> {
let (input, (key, from, _, to)) =
tuple((parse_value, parse_value, tag("TO"), cut(parse_value)))(input)?;
Ok((
input,
FilterCondition::Condition {
fid: key.into(),
op: Between { from: from.into(), to: to.into() },
},
))
}
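
The negation rules above can be checked in isolation. The following is a simplified sketch that replaces `Token` with plain `f64` bounds and uses hypothetical `Op` and `Expr` types; it assumes `Between` is inclusive on both ends, which is why its negation becomes `< from OR > to`.

```rust
/// Simplified operators over numeric bounds (hypothetical, not the crate's
/// `Condition<'a>`, which carries `Token`s).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Gt(f64),
    Ge(f64),
    Eq(f64),
    Ne(f64),
    Lt(f64),
    Le(f64),
    Between(f64, f64),
}

impl Op {
    /// Same shape as `Condition::negate`: `Between` needs two operations.
    pub fn negate(self) -> (Op, Option<Op>) {
        match self {
            Op::Gt(n) => (Op::Le(n), None),
            Op::Ge(n) => (Op::Lt(n), None),
            Op::Eq(n) => (Op::Ne(n), None),
            Op::Ne(n) => (Op::Eq(n), None),
            Op::Lt(n) => (Op::Ge(n), None),
            Op::Le(n) => (Op::Gt(n), None),
            Op::Between(from, to) => (Op::Lt(from), Some(Op::Gt(to))),
        }
    }

    /// Evaluate the operator against a concrete value.
    fn holds(self, v: f64) -> bool {
        match self {
            Op::Gt(n) => v > n,
            Op::Ge(n) => v >= n,
            Op::Eq(n) => v == n,
            Op::Ne(n) => v != n,
            Op::Lt(n) => v < n,
            Op::Le(n) => v <= n,
            Op::Between(from, to) => from <= v && v <= to,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Cond(String, Op),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Push the negation down to the leaves using De Morgan's laws.
    pub fn negate(self) -> Expr {
        match self {
            Expr::Cond(field, op) => match op.negate() {
                (op, None) => Expr::Cond(field, op),
                (a, Some(b)) => Expr::Or(
                    Box::new(Expr::Cond(field.clone(), a)),
                    Box::new(Expr::Cond(field, b)),
                ),
            },
            Expr::Or(a, b) => Expr::And(Box::new((*a).negate()), Box::new((*b).negate())),
            Expr::And(a, b) => Expr::Or(Box::new((*a).negate()), Box::new((*b).negate())),
        }
    }

    /// Evaluate the expression, looking field values up through `get`.
    pub fn eval(&self, get: &dyn Fn(&str) -> f64) -> bool {
        match self {
            Expr::Cond(field, op) => op.holds(get(field.as_str())),
            Expr::Or(a, b) => a.eval(get) || b.eval(get),
            Expr::And(a, b) => a.eval(get) && b.eval(get),
        }
    }
}
```

Because the negation is resolved at parse time, the resulting tree never contains a `NOT` node, so the code that executes the filter only has to handle conditions, `And` and `Or`.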

158
filter-parser/src/error.rs Normal file
View File

@ -0,0 +1,158 @@
use std::fmt::Display;
use nom::error::{self, ParseError};
use nom::Parser;
use crate::{IResult, Span};
pub trait NomErrorExt<E> {
fn is_failure(&self) -> bool;
fn map_err<O: FnOnce(E) -> E>(self, op: O) -> nom::Err<E>;
fn map_fail<O: FnOnce(E) -> E>(self, op: O) -> nom::Err<E>;
}
impl<E> NomErrorExt<E> for nom::Err<E> {
fn is_failure(&self) -> bool {
matches!(self, Self::Failure(_))
}
fn map_err<O: FnOnce(E) -> E>(self, op: O) -> nom::Err<E> {
match self {
e @ Self::Failure(_) => e,
e => e.map(|e| op(e)),
}
}
fn map_fail<O: FnOnce(E) -> E>(self, op: O) -> nom::Err<E> {
match self {
e @ Self::Error(_) => e,
e => e.map(|e| op(e)),
}
}
}
/// cut a parser and map the error
pub fn cut_with_err<'a, O>(
mut parser: impl FnMut(Span<'a>) -> IResult<O>,
mut with: impl FnMut(Error<'a>) -> Error<'a>,
) -> impl FnMut(Span<'a>) -> IResult<O> {
move |input| match parser.parse(input) {
Err(nom::Err::Error(e)) => Err(nom::Err::Failure(with(e))),
rest => rest,
}
}
#[derive(Debug)]
pub struct Error<'a> {
context: Span<'a>,
kind: ErrorKind<'a>,
}
#[derive(Debug)]
pub enum ErrorKind<'a> {
ReservedGeo(&'a str),
Geo,
MisusedGeo,
InvalidPrimary,
ExpectedEof,
ExpectedValue,
MissingClosingDelimiter(char),
Char(char),
InternalError(error::ErrorKind),
External(String),
}
impl<'a> Error<'a> {
pub fn kind(&self) -> &ErrorKind<'a> {
&self.kind
}
pub fn context(&self) -> &Span<'a> {
&self.context
}
pub fn new_from_kind(context: Span<'a>, kind: ErrorKind<'a>) -> Self {
Self { context, kind }
}
pub fn new_from_external(context: Span<'a>, error: impl std::error::Error) -> Self {
Self::new_from_kind(context, ErrorKind::External(error.to_string()))
}
pub fn char(self) -> char {
match self.kind {
ErrorKind::Char(c) => c,
_ => panic!("Internal filter parser error"),
}
}
}
impl<'a> ParseError<Span<'a>> for Error<'a> {
fn from_error_kind(input: Span<'a>, kind: error::ErrorKind) -> Self {
let kind = match kind {
error::ErrorKind::Eof => ErrorKind::ExpectedEof,
kind => ErrorKind::InternalError(kind),
};
Self { context: input, kind }
}
fn append(_input: Span<'a>, _kind: error::ErrorKind, other: Self) -> Self {
other
}
fn from_char(input: Span<'a>, c: char) -> Self {
Self { context: input, kind: ErrorKind::Char(c) }
}
}
impl<'a> Display for Error<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let input = self.context.fragment();
// When printing our error message we want to escape all `\n` to be sure we keep our format with the
// first line being the diagnostic and the second line being the incriminated filter.
let escaped_input = input.escape_debug();
match self.kind {
ErrorKind::ExpectedValue if input.trim().is_empty() => {
writeln!(f, "Was expecting a value but instead got nothing.")?
}
ErrorKind::MissingClosingDelimiter(c) => {
writeln!(f, "Expression `{}` is missing the following closing delimiter: `{}`.", escaped_input, c)?
}
ErrorKind::ExpectedValue => {
writeln!(f, "Was expecting a value but instead got `{}`.", escaped_input)?
}
ErrorKind::InvalidPrimary if input.trim().is_empty() => {
writeln!(f, "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `TO` or `_geoRadius` but instead got nothing.")?
}
ErrorKind::InvalidPrimary => {
writeln!(f, "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `TO` or `_geoRadius` at `{}`.", escaped_input)?
}
ErrorKind::ExpectedEof => {
writeln!(f, "Found unexpected characters at the end of the filter: `{}`. You probably forgot an `OR` or an `AND` rule.", escaped_input)?
}
ErrorKind::Geo => {
writeln!(f, "The `_geoRadius` filter expects three arguments: `_geoRadius(latitude, longitude, radius)`.")?
}
ErrorKind::ReservedGeo(name) => {
writeln!(f, "`{}` is a reserved keyword and thus can't be used as a filter expression. Use the `_geoRadius(latitude, longitude, distance) built-in rule to filter on `_geo` coordinates.", name.escape_debug())?
}
ErrorKind::MisusedGeo => {
writeln!(f, "The `_geoRadius` filter is an operation and can't be used as a value.")?
}
ErrorKind::Char(c) => {
panic!("Tried to display a char error with `{}`", c)
}
ErrorKind::InternalError(kind) => writeln!(
f,
"Encountered an internal `{:?}` error while parsing your filter. Please file an issue.", kind
)?,
ErrorKind::External(ref error) => writeln!(f, "{}", error)?,
}
let base_column = self.context.get_utf8_column();
let size = self.context.fragment().chars().count();
write!(f, "{}:{} {}", base_column, base_column + size, self.context.extra)
}
}
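
The location line written at the end of `fmt` can be reproduced without nom. The sketch below is only an approximation under stated assumptions: `render_error` and `column_of` are hypothetical helpers, and `column_of` stands in for what `get_utf8_column` provides, namely a 1-based column counted in characters rather than bytes.

```rust
/// Render a diagnostic in the two-line layout: the message, then
/// `from:to filter`. `column` is the 1-based character column at which
/// `fragment` starts inside `filter`.
pub fn render_error(message: &str, column: usize, fragment: &str, filter: &str) -> String {
    // Count characters, not bytes, so multi-byte input keeps columns aligned.
    let size = fragment.chars().count();
    format!("{}\n{}:{} {}", message, column, column + size, filter)
}

/// Hypothetical helper: 1-based character column of a byte offset in `filter`.
/// Panics if `byte_offset` does not fall on a character boundary.
pub fn column_of(filter: &str, byte_offset: usize) -> usize {
    filter[..byte_offset].chars().count() + 1
}
```

Counting with `chars()` is what keeps the range meaningful for inputs such as `channel = 🐻`, where byte offsets and character columns differ.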

587
filter-parser/src/lib.rs Normal file
View File

@ -0,0 +1,587 @@
//! BNF grammar:
//!
//! ```text
//! filter = expression ~ EOF
//! expression = or
//! or = and (~ "OR" ~ and)*
//! and = not (~ "AND" ~ not)*
//! not = ("NOT" ~ not) | primary
//! primary = (WS* ~ "(" expression ")" ~ WS*) | geoRadius | condition | to
//! condition = value ("=" | ">" ...) value
//! to = value value TO value
//! value = WS* ~ ( word | singleQuoted | doubleQuoted) ~ WS*
//! singleQuoted = "'" .* all but quotes "'"
//! doubleQuoted = "\"" .* all but double quotes "\""
//! word = (alphanumeric | _ | - | .)+
//! geoRadius = WS* ~ "_geoRadius(" ~ WS* ~ float ~ WS* ~ "," ~ WS* ~ float ~ WS* ~ "," ~ WS* ~ float ~ WS* ~ ")"
//! ```
//!
//! Other BNF grammar used to handle some specific errors:
//! ```text
//! geoPoint = WS* ~ "_geoPoint(" ~ (float ~ ",")* ~ ")"
//! ```
//!
//! Specific errors:
//! ================
//! - If a user tries to use a geoPoint, as a primary OR as a value, we must throw an error.
//! ```text
//! field = _geoPoint(12, 13, 14)
//! field < 12 AND _geoPoint(1, 2)
//! ```
//!
//! - If a user tries to use a geoRadius as a value, we must throw an error.
//! ```text
//! field = _geoRadius(12, 13, 14)
//! ```
//!
mod condition;
mod error;
mod value;
use std::fmt::Debug;
use std::ops::Deref;
use std::str::FromStr;
pub use condition::{parse_condition, parse_to, Condition};
use error::{cut_with_err, NomErrorExt};
pub use error::{Error, ErrorKind};
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::character::complete::{char, multispace0};
use nom::combinator::{cut, eof, map};
use nom::multi::{many0, separated_list1};
use nom::number::complete::recognize_float;
use nom::sequence::{delimited, preceded, terminated, tuple};
use nom::Finish;
use nom_locate::LocatedSpan;
pub(crate) use value::parse_value;
pub type Span<'a> = LocatedSpan<&'a str, &'a str>;
type IResult<'a, Ret> = nom::IResult<Span<'a>, Ret, Error<'a>>;
#[derive(Debug, Clone, Eq)]
pub struct Token<'a>(Span<'a>);
impl<'a> Deref for Token<'a> {
type Target = &'a str;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<'a> PartialEq for Token<'a> {
fn eq(&self, other: &Self) -> bool {
self.0.fragment() == other.0.fragment()
}
}
impl<'a> Token<'a> {
pub fn new(position: Span<'a>) -> Self {
Self(position)
}
pub fn as_external_error(&self, error: impl std::error::Error) -> Error<'a> {
Error::new_from_external(self.0, error)
}
pub fn parse<T>(&self) -> Result<T, Error>
where
T: FromStr,
T::Err: std::error::Error,
{
self.0.parse().map_err(|e| self.as_external_error(e))
}
}
impl<'a> From<Span<'a>> for Token<'a> {
fn from(span: Span<'a>) -> Self {
Self(span)
}
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FilterCondition<'a> {
Condition { fid: Token<'a>, op: Condition<'a> },
Or(Box<Self>, Box<Self>),
And(Box<Self>, Box<Self>),
GeoLowerThan { point: [Token<'a>; 2], radius: Token<'a> },
GeoGreaterThan { point: [Token<'a>; 2], radius: Token<'a> },
Empty,
}
impl<'a> FilterCondition<'a> {
pub fn negate(self) -> FilterCondition<'a> {
use FilterCondition::*;
match self {
Condition { fid, op } => match op.negate() {
(op, None) => Condition { fid, op },
(a, Some(b)) => Or(
Condition { fid: fid.clone(), op: a }.into(),
Condition { fid, op: b }.into(),
),
},
Or(a, b) => And(a.negate().into(), b.negate().into()),
And(a, b) => Or(a.negate().into(), b.negate().into()),
Empty => Empty,
GeoLowerThan { point, radius } => GeoGreaterThan { point, radius },
GeoGreaterThan { point, radius } => GeoLowerThan { point, radius },
}
}
pub fn parse(input: &'a str) -> Result<Self, Error> {
if input.trim().is_empty() {
return Ok(Self::Empty);
}
let span = Span::new_extra(input, input);
parse_filter(span).finish().map(|(_rem, output)| output)
}
}
/// remove OPTIONAL whitespaces before AND after the provided parser.
fn ws<'a, O>(inner: impl FnMut(Span<'a>) -> IResult<O>) -> impl FnMut(Span<'a>) -> IResult<O> {
delimited(multispace0, inner, multispace0)
}
/// or = and (~ "OR" ~ and)*
fn parse_or(input: Span) -> IResult<FilterCondition> {
let (input, lhs) = parse_and(input)?;
// if we found an `OR` then we MUST find something next
let (input, ors) = many0(preceded(ws(tag("OR")), cut(parse_and)))(input)?;
let expr = ors
.into_iter()
.fold(lhs, |acc, branch| FilterCondition::Or(Box::new(acc), Box::new(branch)));
Ok((input, expr))
}
/// and = not (~ "AND" ~ not)*
fn parse_and(input: Span) -> IResult<FilterCondition> {
let (input, lhs) = parse_not(input)?;
// if we found an `AND` then we MUST find something next
let (input, ands) = many0(preceded(ws(tag("AND")), cut(parse_not)))(input)?;
let expr = ands
.into_iter()
.fold(lhs, |acc, branch| FilterCondition::And(Box::new(acc), Box::new(branch)));
Ok((input, expr))
}
/// not = ("NOT" ~ not) | primary
/// We can have multiple consecutive not, eg: `NOT NOT channel = mv`.
/// If we parse a `NOT` we MUST parse something behind.
fn parse_not(input: Span) -> IResult<FilterCondition> {
alt((map(preceded(tag("NOT"), cut(parse_not)), |e| e.negate()), parse_primary))(input)
}
/// geoRadius = WS* ~ "_geoRadius(" ~ float ~ "," ~ float ~ "," ~ float ~ ")"
/// If we parse `_geoRadius` we MUST parse the rest of the expression.
fn parse_geo_radius(input: Span) -> IResult<FilterCondition> {
// we allow whitespace BEFORE `_geoRadius` but not between it and the opening parenthesis
let parsed = preceded(
tuple((multispace0, tag("_geoRadius"))),
// if we were able to parse `_geoRadius` and can't parse the rest of the input we return a failure
cut(delimited(char('('), separated_list1(tag(","), ws(recognize_float)), char(')'))),
)(input)
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::Geo)));
let (input, args) = parsed?;
if args.len() != 3 {
return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::Geo)));
}
let res = FilterCondition::GeoLowerThan {
point: [args[0].into(), args[1].into()],
radius: args[2].into(),
};
Ok((input, res))
}
/// geoPoint = WS* ~ "_geoPoint(" ~ (float ~ ",")* ~ ")"
fn parse_geo_point(input: Span) -> IResult<FilterCondition> {
// we allow whitespace BEFORE `_geoPoint` but not between it and the opening parenthesis
tuple((
multispace0,
tag("_geoPoint"),
// if we were able to parse `_geoPoint` we are going to return a Failure whatever happens next.
cut(delimited(char('('), separated_list1(tag(","), ws(|c| recognize_float(c))), char(')'))),
))(input)
.map_err(|e| e.map(|_| Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoPoint"))))?;
// if we succeeded we still return a `Failure` because geoPoints are not allowed
Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::ReservedGeo("_geoPoint"))))
}
/// primary = (WS* ~ "(" expression ")" ~ WS*) | geoRadius | condition | to
fn parse_primary(input: Span) -> IResult<FilterCondition> {
alt((
// if we find a first parenthesis, then we must parse an expression and find the closing parenthesis
delimited(
ws(char('(')),
cut(parse_expression),
cut_with_err(ws(char(')')), |c| {
Error::new_from_kind(input, ErrorKind::MissingClosingDelimiter(c.char()))
}),
),
parse_geo_radius,
parse_condition,
parse_to,
// the next lines are only for error handling and are written at the end to have the least possible performance impact
parse_geo_point,
))(input)
// if the inner parsers did not match enough information to return an accurate error
.map_err(|e| e.map_err(|_| Error::new_from_kind(input, ErrorKind::InvalidPrimary)))
}
/// expression = or
pub fn parse_expression(input: Span) -> IResult<FilterCondition> {
parse_or(input)
}
/// filter = expression ~ EOF
pub fn parse_filter(input: Span) -> IResult<FilterCondition> {
terminated(parse_expression, eof)(input)
}
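
To see how the layering of `parse_or`, `parse_and`, `parse_not` and `parse_primary` yields operator precedence, here is a dependency-free sketch of the same recursive-descent structure. It is not the nom-based implementation: it works on whitespace-separated tokens, does not support quoted values, `TO` or `_geoRadius`, keeps `NOT` as an explicit node instead of calling `negate`, and returns a fully parenthesized string. All names in it are hypothetical.

```rust
type Tokens<'a> = std::iter::Peekable<std::vec::IntoIter<&'a str>>;

/// Parse `filter` and return it with explicit parentheses around every
/// `OR`, `AND` and `NOT`, making the grouping chosen by the grammar visible.
pub fn group(filter: &str) -> Result<String, String> {
    // Isolate parentheses so that `split_whitespace` yields them as tokens.
    let spaced = filter.replace('(', " ( ").replace(')', " ) ");
    let mut tokens: Tokens<'_> = spaced.split_whitespace().collect::<Vec<_>>().into_iter().peekable();
    let out = parse_or(&mut tokens)?;
    match tokens.next() {
        None => Ok(out),
        Some(t) => Err(format!("unexpected `{}` at the end of the filter", t)),
    }
}

/// or = and ("OR" and)*
fn parse_or(tokens: &mut Tokens<'_>) -> Result<String, String> {
    let mut lhs = parse_and(tokens)?;
    while tokens.peek() == Some(&"OR") {
        tokens.next();
        lhs = format!("({} OR {})", lhs, parse_and(tokens)?);
    }
    Ok(lhs)
}

/// and = not ("AND" not)*
fn parse_and(tokens: &mut Tokens<'_>) -> Result<String, String> {
    let mut lhs = parse_not(tokens)?;
    while tokens.peek() == Some(&"AND") {
        tokens.next();
        lhs = format!("({} AND {})", lhs, parse_not(tokens)?);
    }
    Ok(lhs)
}

/// not = ("NOT" not) | primary
fn parse_not(tokens: &mut Tokens<'_>) -> Result<String, String> {
    if tokens.peek() == Some(&"NOT") {
        tokens.next();
        return Ok(format!("(NOT {})", parse_not(tokens)?));
    }
    parse_primary(tokens)
}

/// primary = "(" or ")" | field operator value
fn parse_primary(tokens: &mut Tokens<'_>) -> Result<String, String> {
    if tokens.peek() == Some(&"(") {
        tokens.next();
        let inner = parse_or(tokens)?;
        return match tokens.next() {
            Some(")") => Ok(inner),
            _ => Err("missing closing delimiter `)`".to_string()),
        };
    }
    let field = tokens.next().ok_or("expected a field")?;
    let op = tokens.next().ok_or("expected an operator")?;
    let value = tokens.next().ok_or("expected a value")?;
    if !["=", "!=", "<", "<=", ">", ">="].contains(&op) {
        return Err(format!("unknown operator `{}`", op));
    }
    Ok(format!("{} {} {}", field, op, value))
}
```

Because `parse_or` only ever calls `parse_and`, and `parse_and` only calls `parse_not`, `AND` binds tighter than `OR` without any explicit precedence table; parentheses re-enter the grammar at the top through `parse_primary`.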
#[cfg(test)]
pub mod tests {
use super::*;
/// Create a raw [Token]. You must specify the string that appears BEFORE your element, followed by your element.
pub fn rtok<'a>(before: &'a str, value: &'a str) -> Token<'a> {
// if the string is empty we still need to return 1 for the line number
let lines = before.is_empty().then(|| 1).unwrap_or_else(|| before.lines().count());
let offset = before.chars().count();
// the extra field is not checked in the tests so we can set it to nothing
unsafe { Span::new_from_raw_offset(offset, lines as u32, value, "") }.into()
}
#[test]
fn parse() {
use FilterCondition as Fc;
let test_case = [
// simple test
(
"channel = Ponce",
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::Equal(rtok("channel = ", "Ponce")),
},
),
(
"subscribers = 12",
Fc::Condition {
fid: rtok("", "subscribers"),
op: Condition::Equal(rtok("subscribers = ", "12")),
},
),
// test all the quotes and simple quotes
(
"channel = 'Mister Mv'",
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::Equal(rtok("channel = '", "Mister Mv")),
},
),
(
"channel = \"Mister Mv\"",
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::Equal(rtok("channel = \"", "Mister Mv")),
},
),
(
"'dog race' = Borzoi",
Fc::Condition {
fid: rtok("'", "dog race"),
op: Condition::Equal(rtok("'dog race' = ", "Borzoi")),
},
),
(
"\"dog race\" = Chusky",
Fc::Condition {
fid: rtok("\"", "dog race"),
op: Condition::Equal(rtok("\"dog race\" = ", "Chusky")),
},
),
(
"\"dog race\" = \"Bernese Mountain\"",
Fc::Condition {
fid: rtok("\"", "dog race"),
op: Condition::Equal(rtok("\"dog race\" = \"", "Bernese Mountain")),
},
),
(
"'dog race' = 'Bernese Mountain'",
Fc::Condition {
fid: rtok("'", "dog race"),
op: Condition::Equal(rtok("'dog race' = '", "Bernese Mountain")),
},
),
(
"\"dog race\" = 'Bernese Mountain'",
Fc::Condition {
fid: rtok("\"", "dog race"),
op: Condition::Equal(rtok("\"dog race\" = '", "Bernese Mountain")),
},
),
// test all the operators
(
"channel != ponce",
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::NotEqual(rtok("channel != ", "ponce")),
},
),
(
"NOT channel = ponce",
Fc::Condition {
fid: rtok("NOT ", "channel"),
op: Condition::NotEqual(rtok("NOT channel = ", "ponce")),
},
),
(
"subscribers < 1000",
Fc::Condition {
fid: rtok("", "subscribers"),
op: Condition::LowerThan(rtok("subscribers < ", "1000")),
},
),
(
"subscribers > 1000",
Fc::Condition {
fid: rtok("", "subscribers"),
op: Condition::GreaterThan(rtok("subscribers > ", "1000")),
},
),
(
"subscribers <= 1000",
Fc::Condition {
fid: rtok("", "subscribers"),
op: Condition::LowerThanOrEqual(rtok("subscribers <= ", "1000")),
},
),
(
"subscribers >= 1000",
Fc::Condition {
fid: rtok("", "subscribers"),
op: Condition::GreaterThanOrEqual(rtok("subscribers >= ", "1000")),
},
),
(
"NOT subscribers < 1000",
Fc::Condition {
fid: rtok("NOT ", "subscribers"),
op: Condition::GreaterThanOrEqual(rtok("NOT subscribers < ", "1000")),
},
),
(
"NOT subscribers > 1000",
Fc::Condition {
fid: rtok("NOT ", "subscribers"),
op: Condition::LowerThanOrEqual(rtok("NOT subscribers > ", "1000")),
},
),
(
"NOT subscribers <= 1000",
Fc::Condition {
fid: rtok("NOT ", "subscribers"),
op: Condition::GreaterThan(rtok("NOT subscribers <= ", "1000")),
},
),
(
"NOT subscribers >= 1000",
Fc::Condition {
fid: rtok("NOT ", "subscribers"),
op: Condition::LowerThan(rtok("NOT subscribers >= ", "1000")),
},
),
(
"subscribers 100 TO 1000",
Fc::Condition {
fid: rtok("", "subscribers"),
op: Condition::Between {
from: rtok("subscribers ", "100"),
to: rtok("subscribers 100 TO ", "1000"),
},
},
),
(
"NOT subscribers 100 TO 1000",
Fc::Or(
Fc::Condition {
fid: rtok("NOT ", "subscribers"),
op: Condition::LowerThan(rtok("NOT subscribers ", "100")),
}
.into(),
Fc::Condition {
fid: rtok("NOT ", "subscribers"),
op: Condition::GreaterThan(rtok("NOT subscribers 100 TO ", "1000")),
}
.into(),
),
),
(
"_geoRadius(12, 13, 14)",
Fc::GeoLowerThan {
point: [rtok("_geoRadius(", "12"), rtok("_geoRadius(12, ", "13")],
radius: rtok("_geoRadius(12, 13, ", "14"),
},
),
(
"NOT _geoRadius(12, 13, 14)",
Fc::GeoGreaterThan {
point: [rtok("NOT _geoRadius(", "12"), rtok("NOT _geoRadius(12, ", "13")],
radius: rtok("NOT _geoRadius(12, 13, ", "14"),
},
),
// test simple `or` and `and`
(
"channel = ponce AND 'dog race' != 'bernese mountain'",
Fc::And(
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::Equal(rtok("channel = ", "ponce")),
}
.into(),
Fc::Condition {
fid: rtok("channel = ponce AND '", "dog race"),
op: Condition::NotEqual(rtok(
"channel = ponce AND 'dog race' != '",
"bernese mountain",
)),
}
.into(),
),
),
(
"channel = ponce OR 'dog race' != 'bernese mountain'",
Fc::Or(
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::Equal(rtok("channel = ", "ponce")),
}
.into(),
Fc::Condition {
fid: rtok("channel = ponce OR '", "dog race"),
op: Condition::NotEqual(rtok(
"channel = ponce OR 'dog race' != '",
"bernese mountain",
)),
}
.into(),
),
),
(
"channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000",
Fc::Or(
Fc::And(
Fc::Condition {
fid: rtok("", "channel"),
op: Condition::Equal(rtok("channel = ", "ponce")),
}
.into(),
Fc::Condition {
fid: rtok("channel = ponce AND '", "dog race"),
op: Condition::NotEqual(rtok(
"channel = ponce AND 'dog race' != '",
"bernese mountain",
)),
}
.into(),
)
.into(),
Fc::Condition {
fid: rtok(
"channel = ponce AND 'dog race' != 'bernese mountain' OR ",
"subscribers",
),
op: Condition::GreaterThan(rtok(
"channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > ",
"1000",
)),
}
.into(),
),
),
// test parenthesis
(
"channel = ponce AND ( 'dog race' != 'bernese mountain' OR subscribers > 1000 )",
Fc::And(
Fc::Condition { fid: rtok("", "channel"), op: Condition::Equal(rtok("channel = ", "ponce")) }.into(),
Fc::Or(
Fc::Condition { fid: rtok("channel = ponce AND ( '", "dog race"), op: Condition::NotEqual(rtok("channel = ponce AND ( 'dog race' != '", "bernese mountain"))}.into(),
Fc::Condition { fid: rtok("channel = ponce AND ( 'dog race' != 'bernese mountain' OR ", "subscribers"), op: Condition::GreaterThan(rtok("channel = ponce AND ( 'dog race' != 'bernese mountain' OR subscribers > ", "1000")) }.into(),
).into()),
),
(
"(channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000) AND _geoRadius(12, 13, 14)",
Fc::And(
Fc::Or(
Fc::And(
Fc::Condition { fid: rtok("(", "channel"), op: Condition::Equal(rtok("(channel = ", "ponce")) }.into(),
Fc::Condition { fid: rtok("(channel = ponce AND '", "dog race"), op: Condition::NotEqual(rtok("(channel = ponce AND 'dog race' != '", "bernese mountain")) }.into(),
).into(),
Fc::Condition { fid: rtok("(channel = ponce AND 'dog race' != 'bernese mountain' OR ", "subscribers"), op: Condition::GreaterThan(rtok("(channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > ", "1000")) }.into(),
).into(),
Fc::GeoLowerThan { point: [rtok("(channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000) AND _geoRadius(", "12"), rtok("(channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000) AND _geoRadius(12, ", "13")], radius: rtok("(channel = ponce AND 'dog race' != 'bernese mountain' OR subscribers > 1000) AND _geoRadius(12, 13, ", "14") }.into()
)
)
];
for (input, expected) in test_case {
let result = Fc::parse(input);
assert!(
result.is_ok(),
"Filter `{:?}` was supposed to be parsed but failed with the following error: `{}`",
input,
result.unwrap_err()
);
let filter = result.unwrap();
assert_eq!(filter, expected, "Filter `{}` failed.", input);
}
}
#[test]
fn error() {
use FilterCondition as Fc;
let test_case = [
// simple test
("channel = Ponce = 12", "Found unexpected characters at the end of the filter: `= 12`. You probably forgot an `OR` or an `AND` rule."),
("channel = ", "Was expecting a value but instead got nothing."),
("channel = 🐻", "Was expecting a value but instead got `🐻`."),
("channel = 🐻 AND followers < 100", "Was expecting a value but instead got `🐻`."),
("OR", "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `TO` or `_geoRadius` at `OR`."),
("AND", "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `TO` or `_geoRadius` at `AND`."),
("channel Ponce", "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `TO` or `_geoRadius` at `channel Ponce`."),
("channel = Ponce OR", "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `TO` or `_geoRadius` but instead got nothing."),
("_geoRadius", "The `_geoRadius` filter expects three arguments: `_geoRadius(latitude, longitude, radius)`."),
("_geoRadius = 12", "The `_geoRadius` filter expects three arguments: `_geoRadius(latitude, longitude, radius)`."),
("_geoPoint(12, 13, 14)", "`_geoPoint` is a reserved keyword and thus can't be used as a filter expression. Use the `_geoRadius(latitude, longitude, distance) built-in rule to filter on `_geo` coordinates."),
("position <= _geoPoint(12, 13, 14)", "`_geoPoint` is a reserved keyword and thus can't be used as a filter expression. Use the `_geoRadius(latitude, longitude, distance) built-in rule to filter on `_geo` coordinates."),
("position <= _geoRadius(12, 13, 14)", "The `_geoRadius` filter is an operation and can't be used as a value."),
("channel = 'ponce", "Expression `\\'ponce` is missing the following closing delimiter: `'`."),
("channel = \"ponce", "Expression `\\\"ponce` is missing the following closing delimiter: `\"`."),
("channel = mv OR (followers >= 1000", "Expression `(followers >= 1000` is missing the following closing delimiter: `)`."),
("channel = mv OR followers >= 1000)", "Found unexpected characters at the end of the filter: `)`. You probably forgot an `OR` or an `AND` rule."),
];
for (input, expected) in test_case {
let result = Fc::parse(input);
assert!(
result.is_err(),
"Filter `{}` wasn't supposed to be parsed but it did with the following result: `{:?}`",
input,
result.unwrap()
);
let filter = result.unwrap_err().to_string();
assert!(filter.starts_with(expected), "Filter `{:?}` was supposed to return the following error:\n{}\n, but instead returned\n{}\n.", input, expected, filter);
}
}
}
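
The `error` test above compares with `starts_with` rather than full equality, which fits the unified layout from the PR description: the message comes first and a `[from char]:[to char] filter` line follows. Below is a minimal sketch of that layout, assuming the two parts are separated by a newline. `render_filter_error` and `parse_location` are hypothetical helpers, not part of `filter_parser`, and the offsets in the usage note are illustrative only.

```rust
/// Hypothetical helper: renders a message followed by a location line,
/// mirroring the `[from char]:[to char] filter` layout from the PR description.
fn render_filter_error(message: &str, filter: &str, from: usize, to: usize) -> String {
    format!("{}\n{}:{} {}", message, from, to, filter)
}

/// Hypothetical helper: splits a location line back into `(from, to, filter)`.
/// Returns `None` when the line does not start with `from:to `.
fn parse_location(line: &str) -> Option<(usize, usize, &str)> {
    let (range, filter) = line.split_once(' ')?;
    let (from, to) = range.split_once(':')?;
    Some((from.parse().ok()?, to.parse().ok()?, filter))
}
```

A frontend could call something like `parse_location` on the second line of the rendered error to highlight the faulty range, for example on the output of `render_filter_error("Was expecting a value but instead got nothing.", "channel = ", 11, 11)`.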

filter-parser/src/main.rs Normal file

@ -0,0 +1,16 @@
fn main() {
let input = std::env::args().nth(1).expect("You must provide a filter to test");
println!("Trying to execute the following filter:\n{}\n", input);
match filter_parser::FilterCondition::parse(&input) {
Ok(filter) => {
println!("✅ Valid filter");
println!("{:#?}", filter);
}
Err(e) => {
println!("❎ Invalid filter");
println!("{}", e);
}
}
}

filter-parser/src/value.rs Normal file

@ -0,0 +1,147 @@
use nom::branch::alt;
use nom::bytes::complete::{take_till, take_while, take_while1};
use nom::character::complete::{char, multispace0};
use nom::combinator::cut;
use nom::sequence::{delimited, terminated};
use crate::error::NomErrorExt;
use crate::{parse_geo_point, parse_geo_radius, Error, ErrorKind, IResult, Span, Token};
/// value = WS* ~ ( word | singleQuoted | doubleQuoted) ~ WS*
pub fn parse_value(input: Span) -> IResult<Token> {
// To get better diagnostic messages we strip the leading whitespace from the input right now.
let (input, _) = take_while(char::is_whitespace)(input)?;
// Then we check whether the user is misusing a geo expression.
// This parser can't finish without an error.
// We only return that error to the caller when it is a failure.
if let Err(err) = parse_geo_point(input) {
if err.is_failure() {
return Err(err);
}
}
match parse_geo_radius(input) {
Ok(_) => return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::MisusedGeo))),
// If we encountered a failure it means the user wrote a malformed `_geoRadius` filter.
// But instead of showing them how to fix the syntax we tell them this filter should not be used as a value.
Err(e) if e.is_failure() => {
return Err(nom::Err::Failure(Error::new_from_kind(input, ErrorKind::MisusedGeo)))
}
_ => (),
}
// singleQuoted = "'" .* all but quotes "'"
let simple_quoted = take_till(|c: char| c == '\'');
// doubleQuoted = "\"" (word | spaces)* "\""
let double_quoted = take_till(|c: char| c == '"');
// word = (alphanumeric | _ | - | .)+
let word = take_while1(is_value_component);
// This parser is only used when an error is encountered; it parses the
// largest string possible that does not contain any “language” syntax.
// If we try to parse `name = 🦀 AND language = rust` we want to return an
// error saying we could not parse `🦀`, not that no value was found or that
// we could not parse `🦀 AND language = rust`.
// We remove the spaces before entering the alt because, if we don't,
// the errors created from the output of the alt contain spaces everywhere.
let error_word = take_till::<_, _, Error>(is_syntax_component);
terminated(
alt((
delimited(char('\''), cut(simple_quoted), cut(char('\''))),
delimited(char('"'), cut(double_quoted), cut(char('"'))),
word,
)),
multispace0,
)(input)
.map(|(s, t)| (s, t.into()))
// if we found nothing in the alt it means the user specified something that was not recognized as a value
.map_err(|e: nom::Err<Error>| {
e.map_err(|_| Error::new_from_kind(error_word(input).unwrap().1, ErrorKind::ExpectedValue))
})
// if we encountered a failure it means the user really tried to input a value, but had an unmatched quote
.map_err(|e| {
e.map_fail(|c| Error::new_from_kind(input, ErrorKind::MissingClosingDelimiter(c.char())))
})
}
fn is_value_component(c: char) -> bool {
c.is_alphanumeric() || ['_', '-', '.'].contains(&c)
}
fn is_syntax_component(c: char) -> bool {
c.is_whitespace() || ['(', ')', '=', '<', '>', '!'].contains(&c)
}
#[cfg(test)]
pub mod test {
use nom::Finish;
use super::*;
use crate::tests::rtok;
#[test]
fn name() {
let test_case = [
("channel", rtok("", "channel")),
(".private", rtok("", ".private")),
("I-love-kebab", rtok("", "I-love-kebab")),
("but_snakes_is_also_good", rtok("", "but_snakes_is_also_good")),
("parens(", rtok("", "parens")),
("parens)", rtok("", "parens")),
("not!", rtok("", "not")),
(" channel", rtok(" ", "channel")),
("channel ", rtok("", "channel")),
(" channel ", rtok(" ", "channel")),
("'channel'", rtok("'", "channel")),
("\"channel\"", rtok("\"", "channel")),
("'cha)nnel'", rtok("'", "cha)nnel")),
("'cha\"nnel'", rtok("'", "cha\"nnel")),
("\"cha'nnel\"", rtok("\"", "cha'nnel")),
("\" some spaces \"", rtok("\"", " some spaces ")),
("\"cha'nnel\"", rtok("'", "cha'nnel")),
("\"cha'nnel\"", rtok("'", "cha'nnel")),
("I'm tamo", rtok("'m tamo", "I")),
];
for (input, expected) in test_case {
let input = Span::new_extra(input, input);
let result = parse_value(input);
assert!(
result.is_ok(),
"Filter `{:?}` was supposed to be parsed but failed with the following error: `{}`",
expected,
result.unwrap_err()
);
let value = result.unwrap().1;
assert_eq!(value, expected, "Filter `{}` failed.", input);
}
}
#[test]
fn diagnostic() {
let test_case = [
("🦀", "🦀"),
(" 🦀", "🦀"),
("🦀 AND crab = truc", "🦀"),
("🦀_in_name", "🦀_in_name"),
(" (name = ...", ""),
];
for (input, expected) in test_case {
let input = Span::new_extra(input, input);
let result = parse_value(input);
assert!(
result.is_err(),
"Filter `{}` wasn't supposed to be parsed but it did with the following result: `{:?}`",
expected,
result.unwrap()
);
// get the inner string referenced in the error
let value = *result.finish().unwrap_err().context().fragment();
assert_eq!(value, expected, "Filter `{}` was supposed to fail with the following value: `{}`, but it failed with: `{}`.", input, expected, value);
}
}
}
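
The value grammar in `value.rs` (a bare word, a single-quoted string, or a double-quoted string, with surrounding whitespace ignored) can be illustrated without `nom`. The sketch below is a simplified, standard-library-only stand-in: `scan_value` is a hypothetical name, it returns plain `&str` slices instead of `Token`s, and it leaves out the `_geoPoint` / `_geoRadius` misuse checks and the `error_word` diagnostic. Only `is_value_component` is taken from the file above.

```rust
/// Characters allowed in an unquoted word, as in `is_value_component` above.
fn is_value_component(c: char) -> bool {
    c.is_alphanumeric() || ['_', '-', '.'].contains(&c)
}

/// Hypothetical, simplified stand-in for `parse_value`.
/// Returns `(value, rest_of_input)` or a short error description.
fn scan_value(input: &str) -> Result<(&str, &str), String> {
    // Strip leading whitespace first, like the real parser does.
    let input = input.trim_start();
    // Quoted values: everything up to the matching closing quote.
    for quote in ['\'', '"'] {
        if let Some(body) = input.strip_prefix(quote) {
            return match body.find(quote) {
                Some(end) => Ok((&body[..end], body[end + 1..].trim_start())),
                None => Err(format!("missing closing delimiter `{}`", quote)),
            };
        }
    }
    // Bare word: the longest prefix made of value components.
    let end = input.find(|c: char| !is_value_component(c)).unwrap_or(input.len());
    if end == 0 {
        return Err("expected a value".to_string());
    }
    Ok((&input[..end], input[end..].trim_start()))
}
```

With this sketch, `scan_value("I'm tamo")` stops at the quote and yields `I`, which matches the last case of the `name` test, while `scan_value("🦀")` fails because the emoji is not a value component.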


@ -23,7 +23,8 @@ use milli::documents::DocumentBatchReader;
use milli::update::UpdateIndexingStep::*;
use milli::update::{IndexDocumentsMethod, Setting, UpdateBuilder};
use milli::{
obkv_to_json, CompressionType, FilterCondition, Index, MatchingWords, SearchResult, SortError,
obkv_to_json, CompressionType, Filter as MilliFilter, FilterCondition, Index, MatchingWords,
SearchResult, SortError,
};
use once_cell::sync::OnceCell;
use rayon::ThreadPool;
@ -735,31 +736,37 @@ async fn main() -> anyhow::Result<()> {
search.query(query);
}
let filters = match query.filters {
let filters = match query.filters.as_ref() {
Some(condition) if !condition.trim().is_empty() => {
Some(FilterCondition::from_str(&rtxn, &index, &condition).unwrap())
Some(MilliFilter::from_str(condition).unwrap())
}
_otherwise => None,
};
let facet_filters = match query.facet_filters {
let facet_filters = match query.facet_filters.as_ref() {
Some(array) => {
let eithers = array.into_iter().map(Into::into);
FilterCondition::from_array(&rtxn, &index, eithers).unwrap()
let eithers = array.iter().map(|either| match either {
UntaggedEither::Left(l) => {
Either::Left(l.iter().map(|s| s.as_str()).collect::<Vec<&str>>())
}
UntaggedEither::Right(r) => Either::Right(r.as_str()),
});
MilliFilter::from_array(eithers).unwrap()
}
_otherwise => None,
};
let condition = match (filters, facet_filters) {
(Some(filters), Some(facet_filters)) => {
Some(FilterCondition::And(Box::new(filters), Box::new(facet_filters)))
}
(Some(condition), None) | (None, Some(condition)) => Some(condition),
(Some(filters), Some(facet_filters)) => Some(FilterCondition::And(
Box::new(filters.into()),
Box::new(facet_filters.into()),
)),
(Some(condition), None) | (None, Some(condition)) => Some(condition.into()),
_otherwise => None,
};
if let Some(condition) = condition {
search.filter(condition);
search.filter(condition.into());
}
if let Some(limit) = query.limit {
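
The hunk above switches from consuming `query.facet_filters` to borrowing it, so that the resulting filter can hold `&str` slices pointing into the query. A minimal sketch of that borrowing conversion follows. It assumes local stand-ins for `either::Either` and for the http-ui `UntaggedEither` type (both redefined here so the block compiles on its own), and `borrow_facet_filters` is a hypothetical helper name.

```rust
/// Local stand-in for `either::Either`.
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Local stand-in for the `UntaggedEither` used by the http-ui query type.
#[derive(Debug)]
enum UntaggedEither<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical helper: borrows owned strings as `&str` without cloning them.
fn borrow_facet_filters<'a>(
    array: &'a [UntaggedEither<Vec<String>, String>],
) -> Vec<Either<Vec<&'a str>, &'a str>> {
    array
        .iter()
        .map(|either| match either {
            UntaggedEither::Left(l) => Either::Left(l.iter().map(|s| s.as_str()).collect()),
            UntaggedEither::Right(r) => Either::Right(r.as_str()),
        })
        .collect()
}
```

Because the output only borrows from `array`, the owned query has to outlive the borrowed filters, which is consistent with the `'a` lifetime now carried by `Filter<'a>`.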


@ -38,9 +38,7 @@ smallvec = "1.6.1"
tempfile = "3.2.0"
uuid = { version = "0.8.2", features = ["v4"] }
# facet filter parser
pest = { git = "https://github.com/pest-parser/pest.git", rev = "51fd1d49f1041f7839975664ef71fe15c7dcaf67" }
pest_derive = "2.1.0"
filter-parser = { path = "../filter-parser" }
# documents words self-join
itertools = "0.10.0"


@ -7,7 +7,6 @@ use heed::{Error as HeedError, MdbError};
use rayon::ThreadPoolBuildError;
use serde_json::{Map, Value};
use crate::search::ParserRule;
use crate::{CriterionError, DocumentId, FieldId, SortError};
pub type Object = Map<String, Value>;
@ -59,8 +58,8 @@ pub enum UserError {
DocumentLimitReached,
InvalidDocumentId { document_id: Value },
InvalidFacetsDistribution { invalid_facets_name: BTreeSet<String> },
InvalidFilter(FilterError),
InvalidGeoField { document_id: Value, object: Value },
InvalidFilter(String),
InvalidSortableAttribute { field: String, valid_fields: BTreeSet<String> },
SortRankingRuleMissing,
InvalidStoreFile,
@ -74,13 +73,6 @@ pub enum UserError {
UnknownInternalDocumentId { document_id: DocumentId },
}
#[derive(Debug)]
pub enum FilterError {
InvalidAttribute { field: String, valid_fields: BTreeSet<String> },
ReservedKeyword { field: String, context: Option<String> },
Syntax(pest::error::Error<ParserRule>),
}
impl From<io::Error> for Error {
fn from(error: io::Error) -> Error {
// TODO must be improved and more precise
@ -165,12 +157,6 @@ impl From<UserError> for Error {
}
}
impl From<FilterError> for Error {
fn from(error: FilterError) -> Error {
Error::UserError(UserError::InvalidFilter(error))
}
}
impl From<SerializationError> for Error {
fn from(error: SerializationError) -> Error {
Error::InternalError(InternalError::Serialization(error))
@ -219,6 +205,7 @@ impl StdError for InternalError {}
impl fmt::Display for UserError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Self::InvalidFilter(error) => f.write_str(error),
Self::AttributeLimitReached => f.write_str("A document cannot contain more than 65,535 fields."),
Self::CriterionError(error) => write!(f, "{}", error),
Self::DocumentLimitReached => f.write_str("Maximum number of documents reached."),
@ -231,7 +218,6 @@ impl fmt::Display for UserError {
name_list
)
}
Self::InvalidFilter(error) => error.fmt(f),
Self::InvalidGeoField { document_id, object } => {
let document_id = match document_id {
Value::String(id) => id.clone(),
@ -293,40 +279,6 @@ ranking rules settings to use the sort parameter at search time.",
}
}
impl fmt::Display for FilterError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Self::InvalidAttribute { field, valid_fields } => write!(
f,
"Attribute `{}` is not filterable. Available filterable attributes are: `{}`.",
field,
valid_fields
.clone()
.into_iter()
.reduce(|left, right| left + "`, `" + &right)
.unwrap_or_default()
),
Self::ReservedKeyword { field, context: Some(context) } => {
write!(
f,
"`{}` is a reserved keyword and thus can't be used as a filter expression. {}",
field, context
)
}
Self::ReservedKeyword { field, context: None } => {
write!(
f,
"`{}` is a reserved keyword and thus can't be used as a filter expression.",
field
)
}
Self::Syntax(syntax_helper) => {
write!(f, "Invalid syntax for the filter parameter: `{}`.", syntax_helper)
}
}
}
}
impl StdError for UserError {}
impl fmt::Display for FieldIdMapMissingEntry {


@ -1,6 +1,3 @@
#[macro_use]
extern crate pest_derive;
#[macro_use]
pub mod documents;
@ -20,6 +17,7 @@ use std::collections::{BTreeMap, HashMap};
use std::convert::{TryFrom, TryInto};
use std::hash::BuildHasherDefault;
pub use filter_parser::{Condition, FilterCondition};
use fxhash::{FxHasher32, FxHasher64};
pub use grenad::CompressionType;
use serde_json::{Map, Value};
@ -37,7 +35,7 @@ pub use self::heed_codec::{
RoaringBitmapLenCodec, StrBEU32Codec, StrStrU8Codec,
};
pub use self::index::Index;
pub use self::search::{FacetDistribution, FilterCondition, MatchingWords, Search, SearchResult};
pub use self::search::{FacetDistribution, Filter, MatchingWords, Search, SearchResult};
pub type Result<T> = std::result::Result<T, error::Error>;


@ -0,0 +1,589 @@
use std::fmt::{Debug, Display};
use std::ops::Bound::{self, Excluded, Included};
use std::ops::Deref;
use either::Either;
pub use filter_parser::{Condition, Error as FPError, FilterCondition, Span, Token};
use heed::types::DecodeIgnore;
use log::debug;
use roaring::RoaringBitmap;
use super::FacetNumberRange;
use crate::error::{Error, UserError};
use crate::heed_codec::facet::{
FacetLevelValueF64Codec, FacetStringLevelZeroCodec, FacetStringLevelZeroValueCodec,
};
use crate::{distance_between_two_points, CboRoaringBitmapCodec, FieldId, Index, Result};
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Filter<'a> {
condition: FilterCondition<'a>,
}
#[derive(Debug)]
enum FilterError<'a> {
AttributeNotFilterable { attribute: &'a str, filterable: String },
BadGeo(&'a str),
BadGeoLat(f64),
BadGeoLng(f64),
Reserved(&'a str),
InternalError,
}
impl<'a> std::error::Error for FilterError<'a> {}
impl<'a> Display for FilterError<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Self::AttributeNotFilterable { attribute, filterable } => write!(
f,
"Attribute `{}` is not filterable. Available filterable attributes are: `{}`.",
attribute,
filterable,
),
Self::Reserved(keyword) => write!(
f,
"`{}` is a reserved keyword and thus can't be used as a filter expression.",
keyword
),
Self::BadGeo(keyword) => write!(f, "`{}` is a reserved keyword and thus can't be used as a filter expression. Use the _geoRadius(latitude, longitude, distance) built-in rule to filter on _geo field coordinates.", keyword),
Self::BadGeoLat(lat) => write!(f, "Bad latitude `{}`. Latitude must be contained between -90 and 90 degrees. ", lat),
Self::BadGeoLng(lng) => write!(f, "Bad longitude `{}`. Longitude must be contained between -180 and 180 degrees. ", lng),
Self::InternalError => write!(f, "Internal error while executing this filter."),
}
}
}
impl<'a> From<FPError<'a>> for Error {
fn from(error: FPError<'a>) -> Self {
Self::UserError(UserError::InvalidFilter(error.to_string()))
}
}
impl<'a> From<Filter<'a>> for FilterCondition<'a> {
fn from(f: Filter<'a>) -> Self {
f.condition
}
}
impl<'a> Filter<'a> {
pub fn from_array<I, J>(array: I) -> Result<Option<Self>>
where
I: IntoIterator<Item = Either<J, &'a str>>,
J: IntoIterator<Item = &'a str>,
{
let mut ands: Option<FilterCondition> = None;
for either in array {
match either {
Either::Left(array) => {
let mut ors = None;
for rule in array {
let condition = Self::from_str(rule.as_ref())?.condition;
ors = match ors.take() {
Some(ors) => {
Some(FilterCondition::Or(Box::new(ors), Box::new(condition)))
}
None => Some(condition),
};
}
if let Some(rule) = ors {
ands = match ands.take() {
Some(ands) => {
Some(FilterCondition::And(Box::new(ands), Box::new(rule)))
}
None => Some(rule),
};
}
}
Either::Right(rule) => {
let condition = Self::from_str(rule.as_ref())?.condition;
ands = match ands.take() {
Some(ands) => {
Some(FilterCondition::And(Box::new(ands), Box::new(condition)))
}
None => Some(condition),
};
}
}
}
Ok(ands.map(|ands| Self { condition: ands }))
}
pub fn from_str(expression: &'a str) -> Result<Self> {
let condition = match FilterCondition::parse(expression) {
Ok(fc) => Ok(fc),
Err(e) => Err(Error::UserError(UserError::InvalidFilter(e.to_string()))),
}?;
Ok(Self { condition })
}
}
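
`Filter::from_array` above combines the outer array with `AND` and each inner array with `OR`, skipping empty inner arrays. The sketch below reproduces only that folding on a toy expression type. `Expr`, `Group` and `fold_groups` are hypothetical names, and the leaves are kept as unparsed strings instead of being run through `FilterCondition::parse`.

```rust
/// Toy expression tree standing in for `FilterCondition`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Leaf(String),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Hypothetical stand-in for `Either<J, &str>`: an inner array or a single rule.
enum Group<'a> {
    AnyOf(Vec<&'a str>),
    Single(&'a str),
}

/// Folds inner arrays with OR, then folds the outer array with AND.
fn fold_groups(groups: Vec<Group<'_>>) -> Option<Expr> {
    let mut ands: Option<Expr> = None;
    for group in groups {
        let rule = match group {
            // An empty inner array reduces to `None` and is skipped below.
            Group::AnyOf(rules) => rules
                .into_iter()
                .map(|r| Expr::Leaf(r.to_string()))
                .reduce(|acc, next| Expr::Or(Box::new(acc), Box::new(next))),
            Group::Single(rule) => Some(Expr::Leaf(rule.to_string())),
        };
        if let Some(rule) = rule {
            ands = Some(match ands.take() {
                Some(acc) => Expr::And(Box::new(acc), Box::new(rule)),
                None => rule,
            });
        }
    }
    ands
}
```

Both folds are left-associative, so three outer rules `a`, `b`, `c` become `And(And(a, b), c)`.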
impl<'a> Filter<'a> {
/// Aggregates the documents ids that are part of the specified range automatically
/// going deeper through the levels.
fn explore_facet_number_levels(
rtxn: &heed::RoTxn,
db: heed::Database<FacetLevelValueF64Codec, CboRoaringBitmapCodec>,
field_id: FieldId,
level: u8,
left: Bound<f64>,
right: Bound<f64>,
output: &mut RoaringBitmap,
) -> Result<()> {
match (left, right) {
// If the request is an exact value we must go directly to the deepest level.
(Included(l), Included(r)) if l == r && level > 0 => {
return Self::explore_facet_number_levels(
rtxn, db, field_id, 0, left, right, output,
);
}
// lower TO upper when lower > upper must return no result
(Included(l), Included(r)) if l > r => return Ok(()),
(Included(l), Excluded(r)) if l >= r => return Ok(()),
(Excluded(l), Excluded(r)) if l >= r => return Ok(()),
(Excluded(l), Included(r)) if l >= r => return Ok(()),
(_, _) => (),
}
let mut left_found = None;
let mut right_found = None;
// We must create a custom iterator to be able to iterate over the
// requested range as the range iterator cannot express some conditions.
let iter = FacetNumberRange::new(rtxn, db, field_id, level, left, right)?;
debug!("Iterating between {:?} and {:?} (level {})", left, right, level);
for (i, result) in iter.enumerate() {
let ((_fid, level, l, r), docids) = result?;
debug!("{:?} to {:?} (level {}) found {} documents", l, r, level, docids.len());
*output |= docids;
// We save the leftmost and rightmost bounds we actually found at this level.
if i == 0 {
left_found = Some(l);
}
right_found = Some(r);
}
// Can we go deeper?
let deeper_level = match level.checked_sub(1) {
Some(level) => level,
None => return Ok(()),
};
// We must refine the left and right bounds of this range by retrieving the
// missing part in a deeper level.
match left_found.zip(right_found) {
Some((left_found, right_found)) => {
// If the bound is satisfied we avoid calling this function again.
if !matches!(left, Included(l) if l == left_found) {
let sub_right = Excluded(left_found);
debug!(
"calling left with {:?} to {:?} (level {})",
left, sub_right, deeper_level
);
Self::explore_facet_number_levels(
rtxn,
db,
field_id,
deeper_level,
left,
sub_right,
output,
)?;
}
if !matches!(right, Included(r) if r == right_found) {
let sub_left = Excluded(right_found);
debug!(
"calling right with {:?} to {:?} (level {})",
sub_left, right, deeper_level
);
Self::explore_facet_number_levels(
rtxn,
db,
field_id,
deeper_level,
sub_left,
right,
output,
)?;
}
}
None => {
// If we found nothing at this level it means that we must find
// the same bounds but at a deeper, more precise level.
Self::explore_facet_number_levels(
rtxn,
db,
field_id,
deeper_level,
left,
right,
output,
)?;
}
}
Ok(())
}
fn evaluate_operator(
rtxn: &heed::RoTxn,
index: &Index,
numbers_db: heed::Database<FacetLevelValueF64Codec, CboRoaringBitmapCodec>,
strings_db: heed::Database<FacetStringLevelZeroCodec, FacetStringLevelZeroValueCodec>,
field_id: FieldId,
operator: &Condition<'a>,
) -> Result<RoaringBitmap> {
// Make sure we always bound the ranges with the field id and the level,
// as the facets values are all in the same database and prefixed by the
// field id and the level.
let (left, right) = match operator {
Condition::GreaterThan(val) => (Excluded(val.parse()?), Included(f64::MAX)),
Condition::GreaterThanOrEqual(val) => (Included(val.parse()?), Included(f64::MAX)),
Condition::LowerThan(val) => (Included(f64::MIN), Excluded(val.parse()?)),
Condition::LowerThanOrEqual(val) => (Included(f64::MIN), Included(val.parse()?)),
Condition::Between { from, to } => (Included(from.parse()?), Included(to.parse()?)),
Condition::Equal(val) => {
let (_original_value, string_docids) =
strings_db.get(rtxn, &(field_id, &val.to_lowercase()))?.unwrap_or_default();
let number = val.parse::<f64>().ok();
let number_docids = match number {
Some(n) => {
let n = Included(n);
let mut output = RoaringBitmap::new();
Self::explore_facet_number_levels(
rtxn,
numbers_db,
field_id,
0,
n,
n,
&mut output,
)?;
output
}
None => RoaringBitmap::new(),
};
return Ok(string_docids | number_docids);
}
Condition::NotEqual(val) => {
let number = val.parse::<f64>().ok();
let all_numbers_ids = if number.is_some() {
index.number_faceted_documents_ids(rtxn, field_id)?
} else {
RoaringBitmap::new()
};
let all_strings_ids = index.string_faceted_documents_ids(rtxn, field_id)?;
let operator = Condition::Equal(val.clone());
let docids = Self::evaluate_operator(
rtxn, index, numbers_db, strings_db, field_id, &operator,
)?;
return Ok((all_numbers_ids | all_strings_ids) - docids);
}
};
// Ask for the biggest value that can exist for this specific field. If it exists
// that's fine; if it doesn't, the value just before it will be returned instead.
let biggest_level = numbers_db
.remap_data_type::<DecodeIgnore>()
.get_lower_than_or_equal_to(rtxn, &(field_id, u8::MAX, f64::MAX, f64::MAX))?
.and_then(|((id, level, _, _), _)| if id == field_id { Some(level) } else { None });
match biggest_level {
Some(level) => {
let mut output = RoaringBitmap::new();
Self::explore_facet_number_levels(
rtxn,
numbers_db,
field_id,
level,
left,
right,
&mut output,
)?;
Ok(output)
}
None => Ok(RoaringBitmap::new()),
}
}
pub fn evaluate(&self, rtxn: &heed::RoTxn, index: &Index) -> Result<RoaringBitmap> {
let numbers_db = index.facet_id_f64_docids;
let strings_db = index.facet_id_string_docids;
match &self.condition {
FilterCondition::Condition { fid, op } => {
let filterable_fields = index.filterable_fields(rtxn)?;
if filterable_fields.contains(&fid.to_lowercase()) {
let field_ids_map = index.fields_ids_map(rtxn)?;
if let Some(fid) = field_ids_map.id(&fid) {
Self::evaluate_operator(rtxn, index, numbers_db, strings_db, fid, &op)
} else {
return Err(fid.as_external_error(FilterError::InternalError))?;
}
} else {
match *fid.deref() {
attribute @ "_geo" => {
return Err(fid.as_external_error(FilterError::BadGeo(attribute)))?;
}
attribute if attribute.starts_with("_geoPoint(") => {
return Err(fid.as_external_error(FilterError::BadGeo("_geoPoint")))?;
}
attribute @ "_geoDistance" => {
return Err(fid.as_external_error(FilterError::Reserved(attribute)))?;
}
attribute => {
return Err(fid.as_external_error(
FilterError::AttributeNotFilterable {
attribute,
filterable: filterable_fields
.into_iter()
.collect::<Vec<_>>()
.join(" "),
},
))?;
}
}
}
}
FilterCondition::Or(lhs, rhs) => {
let lhs = Self::evaluate(&(lhs.as_ref().clone()).into(), rtxn, index)?;
let rhs = Self::evaluate(&(rhs.as_ref().clone()).into(), rtxn, index)?;
Ok(lhs | rhs)
}
FilterCondition::And(lhs, rhs) => {
let lhs = Self::evaluate(&(lhs.as_ref().clone()).into(), rtxn, index)?;
let rhs = Self::evaluate(&(rhs.as_ref().clone()).into(), rtxn, index)?;
Ok(lhs & rhs)
}
FilterCondition::Empty => Ok(RoaringBitmap::new()),
FilterCondition::GeoLowerThan { point, radius } => {
let filterable_fields = index.filterable_fields(rtxn)?;
if filterable_fields.contains("_geo") {
let base_point: [f64; 2] = [point[0].parse()?, point[1].parse()?];
if !(-90.0..=90.0).contains(&base_point[0]) {
return Err(
point[0].as_external_error(FilterError::BadGeoLat(base_point[0]))
)?;
}
if !(-180.0..=180.0).contains(&base_point[1]) {
return Err(
point[1].as_external_error(FilterError::BadGeoLng(base_point[1]))
)?;
}
let radius = radius.parse()?;
let rtree = match index.geo_rtree(rtxn)? {
Some(rtree) => rtree,
None => return Ok(RoaringBitmap::new()),
};
let result = rtree
.nearest_neighbor_iter(&base_point)
.take_while(|point| {
distance_between_two_points(&base_point, point.geom()) < radius
})
.map(|point| point.data)
.collect();
Ok(result)
} else {
return Err(point[0].as_external_error(FilterError::AttributeNotFilterable {
attribute: "_geo",
filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
}))?;
}
}
FilterCondition::GeoGreaterThan { point, radius } => {
let result = Self::evaluate(
&FilterCondition::GeoLowerThan { point: point.clone(), radius: radius.clone() }
.into(),
rtxn,
index,
)?;
let geo_faceted_doc_ids = index.geo_faceted_documents_ids(rtxn)?;
Ok(geo_faceted_doc_ids - result)
}
}
}
}
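
`evaluate_operator` above turns each numeric comparison into a pair of `std::ops::Bound` values before walking the facet levels, using `f64::MIN` and `f64::MAX` as the open ends. The sketch below isolates that mapping. `NumericOp` is a hypothetical, already-parsed stand-in for `Condition` (the real code parses the token text at this point), and `in_bounds` is a hypothetical helper added only to make the mapping observable.

```rust
use std::ops::Bound::{self, Excluded, Included};

/// Hypothetical stand-in for the numeric variants of `Condition`.
#[derive(Debug, Clone, Copy)]
enum NumericOp {
    GreaterThan(f64),
    GreaterThanOrEqual(f64),
    LowerThan(f64),
    LowerThanOrEqual(f64),
    Between { from: f64, to: f64 },
}

/// Same mapping as in `evaluate_operator`: strict comparisons exclude the value,
/// and `TO` includes both ends.
fn to_bounds(op: NumericOp) -> (Bound<f64>, Bound<f64>) {
    match op {
        NumericOp::GreaterThan(v) => (Excluded(v), Included(f64::MAX)),
        NumericOp::GreaterThanOrEqual(v) => (Included(v), Included(f64::MAX)),
        NumericOp::LowerThan(v) => (Included(f64::MIN), Excluded(v)),
        NumericOp::LowerThanOrEqual(v) => (Included(f64::MIN), Included(v)),
        NumericOp::Between { from, to } => (Included(from), Included(to)),
    }
}

/// Hypothetical helper: checks a single value against a pair of bounds.
fn in_bounds(value: f64, bounds: (Bound<f64>, Bound<f64>)) -> bool {
    let above = match bounds.0 {
        Included(l) => value >= l,
        Excluded(l) => value > l,
        Bound::Unbounded => true,
    };
    let below = match bounds.1 {
        Included(r) => value <= r,
        Excluded(r) => value < r,
        Bound::Unbounded => true,
    };
    above && below
}
```

For instance, `subscribers > 1000` maps to `(Excluded(1000.0), Included(f64::MAX))`, so a value of exactly `1000` is rejected.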
impl<'a> From<FilterCondition<'a>> for Filter<'a> {
fn from(fc: FilterCondition<'a>) -> Self {
Self { condition: fc }
}
}
#[cfg(test)]
mod tests {
use big_s::S;
use either::Either;
use heed::EnvOpenOptions;
use maplit::hashset;
use super::*;
use crate::update::Settings;
use crate::Index;
#[test]
fn from_array() {
// Simple array with Left
let condition = Filter::from_array(vec![Either::Left(["channel = mv"])]).unwrap().unwrap();
let expected = Filter::from_str("channel = mv").unwrap();
assert_eq!(condition, expected);
// Simple array with Right
let condition = Filter::from_array::<_, Option<&str>>(vec![Either::Right("channel = mv")])
.unwrap()
.unwrap();
let expected = Filter::from_str("channel = mv").unwrap();
assert_eq!(condition, expected);
// Array with Left and escaped quote
let condition =
Filter::from_array(vec![Either::Left(["channel = \"Mister Mv\""])]).unwrap().unwrap();
let expected = Filter::from_str("channel = \"Mister Mv\"").unwrap();
assert_eq!(condition, expected);
// Array with Right and escaped quote
let condition =
Filter::from_array::<_, Option<&str>>(vec![Either::Right("channel = \"Mister Mv\"")])
.unwrap()
.unwrap();
let expected = Filter::from_str("channel = \"Mister Mv\"").unwrap();
assert_eq!(condition, expected);
// Array with Left and escaped simple quote
let condition =
Filter::from_array(vec![Either::Left(["channel = 'Mister Mv'"])]).unwrap().unwrap();
let expected = Filter::from_str("channel = 'Mister Mv'").unwrap();
assert_eq!(condition, expected);
// Array with Right and escaped simple quote
let condition =
Filter::from_array::<_, Option<&str>>(vec![Either::Right("channel = 'Mister Mv'")])
.unwrap()
.unwrap();
let expected = Filter::from_str("channel = 'Mister Mv'").unwrap();
assert_eq!(condition, expected);
// Simple with parenthesis
let condition =
Filter::from_array(vec![Either::Left(["(channel = mv)"])]).unwrap().unwrap();
let expected = Filter::from_str("(channel = mv)").unwrap();
assert_eq!(condition, expected);
// Test that the facet condition is correctly generated.
let condition = Filter::from_array(vec![
Either::Right("channel = gotaga"),
Either::Left(vec!["timestamp = 44", "channel != ponce"]),
])
.unwrap()
.unwrap();
let expected =
Filter::from_str("channel = gotaga AND (timestamp = 44 OR channel != ponce)").unwrap();
println!("\nExpecting: {:#?}\nGot: {:#?}\n", expected, condition);
assert_eq!(condition, expected);
}
#[test]
fn not_filterable() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
let rtxn = index.read_txn().unwrap();
let filter = Filter::from_str("_geoRadius(42, 150, 10)").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(error.to_string().starts_with(
"Attribute `_geo` is not filterable. Available filterable attributes are: ``."
));
let filter = Filter::from_str("dog = \"bernese mountain\"").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(error.to_string().starts_with(
"Attribute `dog` is not filterable. Available filterable attributes are: ``."
));
drop(rtxn);
// Set the filterable fields to be the title.
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_searchable_fields(vec![S("title")]);
builder.set_filterable_fields(hashset! { S("title") });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
let rtxn = index.read_txn().unwrap();
let filter = Filter::from_str("_geoRadius(-100, 150, 10)").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(error.to_string().starts_with(
"Attribute `_geo` is not filterable. Available filterable attributes are: `title`."
));
let filter = Filter::from_str("name = 12").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(error.to_string().starts_with(
"Attribute `name` is not filterable. Available filterable attributes are: `title`."
));
}
#[test]
fn geo_radius_error() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
// Set the filterable fields to be `_geo` and `price`.
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_searchable_fields(vec![S("_geo"), S("price")]); // to keep the fields order
builder.set_filterable_fields(hashset! { S("_geo"), S("price") });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
let rtxn = index.read_txn().unwrap();
// _geoRadius has a bad latitude
let filter = Filter::from_str("_geoRadius(-100, 150, 10)").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(
error.to_string().starts_with(
"Bad latitude `-100`. Latitude must be contained between -90 and 90 degrees."
),
"{}",
error.to_string()
);
// _geoRadius has a bad latitude (just outside the valid range)
let filter = Filter::from_str("_geoRadius(-90.0000001, 150, 10)").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(error.to_string().contains(
"Bad latitude `-90.0000001`. Latitude must be contained between -90 and 90 degrees."
));
// _geoRadius has a bad longitude
let filter = Filter::from_str("_geoRadius(-10, 250, 10)").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(
error.to_string().contains(
"Bad longitude `250`. Longitude must be contained between -180 and 180 degrees."
),
"{}",
error.to_string(),
);
// _geoRadius has a bad longitude (just outside the valid range)
let filter = Filter::from_str("_geoRadius(-10, 180.000001, 10)").unwrap();
let error = filter.evaluate(&rtxn, &index).unwrap_err();
assert!(error.to_string().contains(
"Bad longitude `180.000001`. Longitude must be contained between -180 and 180 degrees."
));
}
}
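
The `geo_radius_error` test exercises the range checks that `evaluate` applies to the `_geoRadius` point: latitude must lie in `-90..=90` and longitude in `-180..=180`, with latitude checked first. A minimal sketch of just that validation follows. `GeoError` and `check_point` are hypothetical names, and the real code additionally checks that `_geo` is filterable before looking at the coordinates.

```rust
/// Hypothetical error type mirroring `FilterError::BadGeoLat` / `BadGeoLng`.
#[derive(Debug, PartialEq)]
enum GeoError {
    BadLat(f64),
    BadLng(f64),
}

/// Validates a `[latitude, longitude]` pair using inclusive ranges,
/// checking the latitude before the longitude.
fn check_point(lat: f64, lng: f64) -> Result<[f64; 2], GeoError> {
    if !(-90.0..=90.0).contains(&lat) {
        return Err(GeoError::BadLat(lat));
    }
    if !(-180.0..=180.0).contains(&lng) {
        return Err(GeoError::BadLng(lng));
    }
    Ok([lat, lng])
}
```

The ranges are inclusive, so `check_point(90.0, -180.0)` is accepted while `check_point(-90.0000001, 150.0)` is rejected, as in the test above.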


@ -1,929 +0,0 @@
use std::collections::HashSet;
use std::fmt::Debug;
use std::ops::Bound::{self, Excluded, Included};
use std::result::Result as StdResult;
use std::str::FromStr;
use either::Either;
use heed::types::DecodeIgnore;
use log::debug;
use pest::error::{Error as PestError, ErrorVariant};
use pest::iterators::{Pair, Pairs};
use pest::Parser;
use roaring::RoaringBitmap;
use self::FilterCondition::*;
use self::Operator::*;
use super::parser::{FilterParser, Rule, PREC_CLIMBER};
use super::FacetNumberRange;
use crate::error::FilterError;
use crate::heed_codec::facet::{
FacetLevelValueF64Codec, FacetStringLevelZeroCodec, FacetStringLevelZeroValueCodec,
};
use crate::{
distance_between_two_points, CboRoaringBitmapCodec, FieldId, FieldsIdsMap, Index, Result,
};
#[derive(Debug, Clone, PartialEq)]
pub enum Operator {
GreaterThan(f64),
GreaterThanOrEqual(f64),
Equal(Option<f64>, String),
NotEqual(Option<f64>, String),
LowerThan(f64),
LowerThanOrEqual(f64),
Between(f64, f64),
GeoLowerThan([f64; 2], f64),
GeoGreaterThan([f64; 2], f64),
}
impl Operator {
/// This method can return two operations in case it must express
/// an OR operation for the between case (i.e. `TO`).
fn negate(self) -> (Self, Option<Self>) {
match self {
GreaterThan(n) => (LowerThanOrEqual(n), None),
GreaterThanOrEqual(n) => (LowerThan(n), None),
Equal(n, s) => (NotEqual(n, s), None),
NotEqual(n, s) => (Equal(n, s), None),
LowerThan(n) => (GreaterThanOrEqual(n), None),
LowerThanOrEqual(n) => (GreaterThan(n), None),
Between(n, m) => (LowerThan(n), Some(GreaterThan(m))),
GeoLowerThan(point, distance) => (GeoGreaterThan(point, distance), None),
GeoGreaterThan(point, distance) => (GeoLowerThan(point, distance), None),
}
}
}
#[derive(Debug, Clone, PartialEq)]
pub enum FilterCondition {
Operator(FieldId, Operator),
Or(Box<Self>, Box<Self>),
And(Box<Self>, Box<Self>),
Empty,
}
impl FilterCondition {
pub fn from_array<I, J, A, B>(
rtxn: &heed::RoTxn,
index: &Index,
array: I,
) -> Result<Option<FilterCondition>>
where
I: IntoIterator<Item = Either<J, B>>,
J: IntoIterator<Item = A>,
A: AsRef<str>,
B: AsRef<str>,
{
let mut ands = None;
for either in array {
match either {
Either::Left(array) => {
let mut ors = None;
for rule in array {
let condition = FilterCondition::from_str(rtxn, index, rule.as_ref())?;
ors = match ors.take() {
Some(ors) => Some(Or(Box::new(ors), Box::new(condition))),
None => Some(condition),
};
}
if let Some(rule) = ors {
ands = match ands.take() {
Some(ands) => Some(And(Box::new(ands), Box::new(rule))),
None => Some(rule),
};
}
}
Either::Right(rule) => {
let condition = FilterCondition::from_str(rtxn, index, rule.as_ref())?;
ands = match ands.take() {
Some(ands) => Some(And(Box::new(ands), Box::new(condition))),
None => Some(condition),
};
}
}
}
Ok(ands)
}
pub fn from_str(
rtxn: &heed::RoTxn,
index: &Index,
expression: &str,
) -> Result<FilterCondition> {
let fields_ids_map = index.fields_ids_map(rtxn)?;
let filterable_fields = index.filterable_fields(rtxn)?;
let lexed = FilterParser::parse(Rule::prgm, expression).map_err(FilterError::Syntax)?;
FilterCondition::from_pairs(&fields_ids_map, &filterable_fields, lexed)
}
fn from_pairs(
fim: &FieldsIdsMap,
ff: &HashSet<String>,
expression: Pairs<Rule>,
) -> Result<Self> {
PREC_CLIMBER.climb(
expression,
|pair: Pair<Rule>| match pair.as_rule() {
Rule::greater => Ok(Self::greater_than(fim, ff, pair)?),
Rule::geq => Ok(Self::greater_than_or_equal(fim, ff, pair)?),
Rule::eq => Ok(Self::equal(fim, ff, pair)?),
Rule::neq => Ok(Self::equal(fim, ff, pair)?.negate()),
Rule::leq => Ok(Self::lower_than_or_equal(fim, ff, pair)?),
Rule::less => Ok(Self::lower_than(fim, ff, pair)?),
Rule::between => Ok(Self::between(fim, ff, pair)?),
Rule::geo_radius => Ok(Self::geo_radius(fim, ff, pair)?),
Rule::not => Ok(Self::from_pairs(fim, ff, pair.into_inner())?.negate()),
Rule::prgm => Self::from_pairs(fim, ff, pair.into_inner()),
Rule::term => Self::from_pairs(fim, ff, pair.into_inner()),
_ => unreachable!(),
},
|lhs: Result<Self>, op: Pair<Rule>, rhs: Result<Self>| match op.as_rule() {
Rule::or => Ok(Or(Box::new(lhs?), Box::new(rhs?))),
Rule::and => Ok(And(Box::new(lhs?), Box::new(rhs?))),
_ => unreachable!(),
},
)
}
fn negate(self) -> FilterCondition {
match self {
Operator(fid, op) => match op.negate() {
(op, None) => Operator(fid, op),
(a, Some(b)) => Or(Box::new(Operator(fid, a)), Box::new(Operator(fid, b))),
},
Or(a, b) => And(Box::new(a.negate()), Box::new(b.negate())),
And(a, b) => Or(Box::new(a.negate()), Box::new(b.negate())),
Empty => Empty,
}
}
fn geo_radius(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
if !filterable_fields.contains("_geo") {
return Err(FilterError::InvalidAttribute {
field: "_geo".to_string(),
valid_fields: filterable_fields.into_iter().cloned().collect(),
}
.into());
}
let mut items = item.into_inner();
let fid = match fields_ids_map.id("_geo") {
Some(fid) => fid,
None => return Ok(Empty),
};
let parameters_item = items.next().unwrap();
// We don't need more than 3 parameters, but to handle errors correctly we are still going
// to extract the first 4 parameters
let param_span = parameters_item.as_span();
let parameters = parameters_item
.into_inner()
.take(4)
.map(|param| (param.clone(), param.as_span()))
.map(|(param, span)| pest_parse(param).0.map(|arg| (arg, span)))
.collect::<StdResult<Vec<(f64, _)>, _>>()
.map_err(FilterError::Syntax)?;
if parameters.len() != 3 {
return Err(FilterError::Syntax(PestError::new_from_span(
ErrorVariant::CustomError {
message: format!("The _geoRadius filter expect three arguments: _geoRadius(latitude, longitude, radius)"),
},
// we want to point to the last parameter, and if there were no parameters we
// point to the parentheses
parameters.last().map(|param| param.1.clone()).unwrap_or(param_span),
)).into());
}
let (lat, lng, distance) = (&parameters[0], &parameters[1], parameters[2].0);
if !(-90.0..=90.0).contains(&lat.0) {
return Err(FilterError::Syntax(PestError::new_from_span(
ErrorVariant::CustomError {
message: format!("Latitude must be contained between -90 and 90 degrees."),
},
lat.1.clone(),
)))?;
} else if !(-180.0..=180.0).contains(&lng.0) {
return Err(FilterError::Syntax(PestError::new_from_span(
ErrorVariant::CustomError {
message: format!("Longitude must be contained between -180 and 180 degrees."),
},
lng.1.clone(),
)))?;
}
Ok(Operator(fid, GeoLowerThan([lat.0, lng.0], distance)))
}
fn between(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
let mut items = item.into_inner();
let fid = match field_id(fields_ids_map, filterable_fields, &mut items)? {
Some(fid) => fid,
None => return Ok(Empty),
};
let (lresult, _) = pest_parse(items.next().unwrap());
let (rresult, _) = pest_parse(items.next().unwrap());
let lvalue = lresult.map_err(FilterError::Syntax)?;
let rvalue = rresult.map_err(FilterError::Syntax)?;
Ok(Operator(fid, Between(lvalue, rvalue)))
}
fn equal(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
let mut items = item.into_inner();
let fid = match field_id(fields_ids_map, filterable_fields, &mut items)? {
Some(fid) => fid,
None => return Ok(Empty),
};
let value = items.next().unwrap();
let (result, svalue) = pest_parse(value);
let svalue = svalue.to_lowercase();
Ok(Operator(fid, Equal(result.ok(), svalue)))
}
fn greater_than(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
let mut items = item.into_inner();
let fid = match field_id(fields_ids_map, filterable_fields, &mut items)? {
Some(fid) => fid,
None => return Ok(Empty),
};
let value = items.next().unwrap();
let (result, _svalue) = pest_parse(value);
let value = result.map_err(FilterError::Syntax)?;
Ok(Operator(fid, GreaterThan(value)))
}
fn greater_than_or_equal(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
let mut items = item.into_inner();
let fid = match field_id(fields_ids_map, filterable_fields, &mut items)? {
Some(fid) => fid,
None => return Ok(Empty),
};
let value = items.next().unwrap();
let (result, _svalue) = pest_parse(value);
let value = result.map_err(FilterError::Syntax)?;
Ok(Operator(fid, GreaterThanOrEqual(value)))
}
fn lower_than(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
let mut items = item.into_inner();
let fid = match field_id(fields_ids_map, filterable_fields, &mut items)? {
Some(fid) => fid,
None => return Ok(Empty),
};
let value = items.next().unwrap();
let (result, _svalue) = pest_parse(value);
let value = result.map_err(FilterError::Syntax)?;
Ok(Operator(fid, LowerThan(value)))
}
fn lower_than_or_equal(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
item: Pair<Rule>,
) -> Result<FilterCondition> {
let mut items = item.into_inner();
let fid = match field_id(fields_ids_map, filterable_fields, &mut items)? {
Some(fid) => fid,
None => return Ok(Empty),
};
let value = items.next().unwrap();
let (result, _svalue) = pest_parse(value);
let value = result.map_err(FilterError::Syntax)?;
Ok(Operator(fid, LowerThanOrEqual(value)))
}
}
impl FilterCondition {
/// Aggregates the documents ids that are part of the specified range automatically
/// going deeper through the levels.
fn explore_facet_number_levels(
rtxn: &heed::RoTxn,
db: heed::Database<FacetLevelValueF64Codec, CboRoaringBitmapCodec>,
field_id: FieldId,
level: u8,
left: Bound<f64>,
right: Bound<f64>,
output: &mut RoaringBitmap,
) -> Result<()> {
match (left, right) {
// If the request is an exact value we must go directly to the deepest level.
(Included(l), Included(r)) if l == r && level > 0 => {
return Self::explore_facet_number_levels(
rtxn, db, field_id, 0, left, right, output,
);
}
// lower TO upper when lower > upper must return no result
(Included(l), Included(r)) if l > r => return Ok(()),
(Included(l), Excluded(r)) if l >= r => return Ok(()),
(Excluded(l), Excluded(r)) if l >= r => return Ok(()),
(Excluded(l), Included(r)) if l >= r => return Ok(()),
(_, _) => (),
}
let mut left_found = None;
let mut right_found = None;
// We must create a custom iterator to be able to iterate over the
// requested range as the range iterator cannot express some conditions.
let iter = FacetNumberRange::new(rtxn, db, field_id, level, left, right)?;
debug!("Iterating between {:?} and {:?} (level {})", left, right, level);
for (i, result) in iter.enumerate() {
let ((_fid, level, l, r), docids) = result?;
debug!("{:?} to {:?} (level {}) found {} documents", l, r, level, docids.len());
*output |= docids;
// We save the leftmost and rightmost bounds we actually found at this level.
if i == 0 {
left_found = Some(l);
}
right_found = Some(r);
}
// Can we go deeper?
let deeper_level = match level.checked_sub(1) {
Some(level) => level,
None => return Ok(()),
};
// We must refine the left and right bounds of this range by retrieving the
// missing part in a deeper level.
match left_found.zip(right_found) {
Some((left_found, right_found)) => {
// If the bound is satisfied we avoid calling this function again.
if !matches!(left, Included(l) if l == left_found) {
let sub_right = Excluded(left_found);
debug!(
"calling left with {:?} to {:?} (level {})",
left, sub_right, deeper_level
);
Self::explore_facet_number_levels(
rtxn,
db,
field_id,
deeper_level,
left,
sub_right,
output,
)?;
}
if !matches!(right, Included(r) if r == right_found) {
let sub_left = Excluded(right_found);
debug!(
"calling right with {:?} to {:?} (level {})",
sub_left, right, deeper_level
);
Self::explore_facet_number_levels(
rtxn,
db,
field_id,
deeper_level,
sub_left,
right,
output,
)?;
}
}
None => {
// If we found nothing at this level it means that we must find
// the same bounds but at a deeper, more precise level.
Self::explore_facet_number_levels(
rtxn,
db,
field_id,
deeper_level,
left,
right,
output,
)?;
}
}
Ok(())
}
fn evaluate_operator(
rtxn: &heed::RoTxn,
index: &Index,
numbers_db: heed::Database<FacetLevelValueF64Codec, CboRoaringBitmapCodec>,
strings_db: heed::Database<FacetStringLevelZeroCodec, FacetStringLevelZeroValueCodec>,
field_id: FieldId,
operator: &Operator,
) -> Result<RoaringBitmap> {
// Make sure we always bound the ranges with the field id and the level,
// as the facets values are all in the same database and prefixed by the
// field id and the level.
let (left, right) = match operator {
GreaterThan(val) => (Excluded(*val), Included(f64::MAX)),
GreaterThanOrEqual(val) => (Included(*val), Included(f64::MAX)),
Equal(number, string) => {
let (_original_value, string_docids) =
strings_db.get(rtxn, &(field_id, &string))?.unwrap_or_default();
let number_docids = match number {
Some(n) => {
let n = Included(*n);
let mut output = RoaringBitmap::new();
Self::explore_facet_number_levels(
rtxn,
numbers_db,
field_id,
0,
n,
n,
&mut output,
)?;
output
}
None => RoaringBitmap::new(),
};
return Ok(string_docids | number_docids);
}
NotEqual(number, string) => {
let all_numbers_ids = if number.is_some() {
index.number_faceted_documents_ids(rtxn, field_id)?
} else {
RoaringBitmap::new()
};
let all_strings_ids = index.string_faceted_documents_ids(rtxn, field_id)?;
let operator = Equal(*number, string.clone());
let docids = Self::evaluate_operator(
rtxn, index, numbers_db, strings_db, field_id, &operator,
)?;
return Ok((all_numbers_ids | all_strings_ids) - docids);
}
LowerThan(val) => (Included(f64::MIN), Excluded(*val)),
LowerThanOrEqual(val) => (Included(f64::MIN), Included(*val)),
Between(left, right) => (Included(*left), Included(*right)),
GeoLowerThan(base_point, distance) => {
let rtree = match index.geo_rtree(rtxn)? {
Some(rtree) => rtree,
None => return Ok(RoaringBitmap::new()),
};
let result = rtree
.nearest_neighbor_iter(base_point)
.take_while(|point| {
distance_between_two_points(base_point, point.geom()) < *distance
})
.map(|point| point.data)
.collect();
return Ok(result);
}
GeoGreaterThan(point, distance) => {
let result = Self::evaluate_operator(
rtxn,
index,
numbers_db,
strings_db,
field_id,
&GeoLowerThan(point.clone(), *distance),
)?;
let geo_faceted_doc_ids = index.geo_faceted_documents_ids(rtxn)?;
return Ok(geo_faceted_doc_ids - result);
}
};
// Ask for the biggest value that can exist for this specific field. If it exists,
// that's fine; if it doesn't, the value just before will be returned instead.
let biggest_level = numbers_db
.remap_data_type::<DecodeIgnore>()
.get_lower_than_or_equal_to(rtxn, &(field_id, u8::MAX, f64::MAX, f64::MAX))?
.and_then(|((id, level, _, _), _)| if id == field_id { Some(level) } else { None });
match biggest_level {
Some(level) => {
let mut output = RoaringBitmap::new();
Self::explore_facet_number_levels(
rtxn,
numbers_db,
field_id,
level,
left,
right,
&mut output,
)?;
Ok(output)
}
None => Ok(RoaringBitmap::new()),
}
}
pub fn evaluate(&self, rtxn: &heed::RoTxn, index: &Index) -> Result<RoaringBitmap> {
let numbers_db = index.facet_id_f64_docids;
let strings_db = index.facet_id_string_docids;
match self {
Operator(fid, op) => {
Self::evaluate_operator(rtxn, index, numbers_db, strings_db, *fid, op)
}
Or(lhs, rhs) => {
let lhs = lhs.evaluate(rtxn, index)?;
let rhs = rhs.evaluate(rtxn, index)?;
Ok(lhs | rhs)
}
And(lhs, rhs) => {
let lhs = lhs.evaluate(rtxn, index)?;
let rhs = rhs.evaluate(rtxn, index)?;
Ok(lhs & rhs)
}
Empty => Ok(RoaringBitmap::new()),
}
}
}
/// Retrieve the field id based on the pest value.
///
/// Returns an error if the given value is not filterable.
///
/// Returns Ok(None) if the given value is filterable, but is not yet associated with a field_id.
///
/// The pest pair is simply a string associated with a span, a location to highlight in
/// the error message.
fn field_id(
fields_ids_map: &FieldsIdsMap,
filterable_fields: &HashSet<String>,
items: &mut Pairs<Rule>,
) -> StdResult<Option<FieldId>, FilterError> {
// lexing ensures that we at least have a key
let key = items.next().unwrap();
if key.as_rule() == Rule::reserved {
return match key.as_str() {
key if key.starts_with("_geoPoint") => {
Err(FilterError::ReservedKeyword { field: "_geoPoint".to_string(), context: Some("Use the _geoRadius(latitude, longitude, distance) built-in rule to filter on _geo field coordinates.".to_string()) })
}
"_geo" => {
Err(FilterError::ReservedKeyword { field: "_geo".to_string(), context: Some("Use the _geoRadius(latitude, longitude, distance) built-in rule to filter on _geo field coordinates.".to_string()) })
}
key =>
Err(FilterError::ReservedKeyword { field: key.to_string(), context: None }),
};
}
if !filterable_fields.contains(key.as_str()) {
return Err(FilterError::InvalidAttribute {
field: key.as_str().to_string(),
valid_fields: filterable_fields.into_iter().cloned().collect(),
});
}
Ok(fields_ids_map.id(key.as_str()))
}
/// Tries to parse the pest pair into the type `T` specified, always returns
/// the original string that we tried to parse.
///
/// Returns the parsing error associated with the span if the conversion fails.
fn pest_parse<T>(pair: Pair<Rule>) -> (StdResult<T, pest::error::Error<Rule>>, String)
where
T: FromStr,
T::Err: ToString,
{
let result = match pair.as_str().parse::<T>() {
Ok(value) => Ok(value),
Err(e) => Err(PestError::<Rule>::new_from_span(
ErrorVariant::CustomError { message: e.to_string() },
pair.as_span(),
)),
};
(result, pair.as_str().to_string())
}
#[cfg(test)]
mod tests {
use big_s::S;
use heed::EnvOpenOptions;
use maplit::hashset;
use super::*;
use crate::update::Settings;
#[test]
fn string() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
// Set the filterable fields to be the channel.
let mut wtxn = index.write_txn().unwrap();
let mut map = index.fields_ids_map(&wtxn).unwrap();
map.insert("channel");
index.put_fields_ids_map(&mut wtxn, &map).unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_filterable_fields(hashset! { S("channel") });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
// Test that the facet condition is correctly generated.
let rtxn = index.read_txn().unwrap();
let condition = FilterCondition::from_str(&rtxn, &index, "channel = Ponce").unwrap();
let expected = Operator(0, Operator::Equal(None, S("ponce")));
assert_eq!(condition, expected);
let condition = FilterCondition::from_str(&rtxn, &index, "channel != ponce").unwrap();
let expected = Operator(0, Operator::NotEqual(None, S("ponce")));
assert_eq!(condition, expected);
let condition = FilterCondition::from_str(&rtxn, &index, "NOT channel = ponce").unwrap();
let expected = Operator(0, Operator::NotEqual(None, S("ponce")));
assert_eq!(condition, expected);
}
#[test]
fn number() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
// Set the filterable fields to be the timestamp.
let mut wtxn = index.write_txn().unwrap();
let mut map = index.fields_ids_map(&wtxn).unwrap();
map.insert("timestamp");
index.put_fields_ids_map(&mut wtxn, &map).unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_filterable_fields(hashset! { "timestamp".into() });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
// Test that the facet condition is correctly generated.
let rtxn = index.read_txn().unwrap();
let condition = FilterCondition::from_str(&rtxn, &index, "timestamp 22 TO 44").unwrap();
let expected = Operator(0, Between(22.0, 44.0));
assert_eq!(condition, expected);
let condition = FilterCondition::from_str(&rtxn, &index, "NOT timestamp 22 TO 44").unwrap();
let expected =
Or(Box::new(Operator(0, LowerThan(22.0))), Box::new(Operator(0, GreaterThan(44.0))));
assert_eq!(condition, expected);
}
#[test]
fn parentheses() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
// Set the filterable fields to be the channel and the timestamp.
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_searchable_fields(vec![S("channel"), S("timestamp")]); // to keep the fields order
builder.set_filterable_fields(hashset! { S("channel"), S("timestamp") });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
// Test that the facet condition is correctly generated.
let rtxn = index.read_txn().unwrap();
let condition = FilterCondition::from_str(
&rtxn,
&index,
"channel = gotaga OR (timestamp 22 TO 44 AND channel != ponce)",
)
.unwrap();
let expected = Or(
Box::new(Operator(0, Operator::Equal(None, S("gotaga")))),
Box::new(And(
Box::new(Operator(1, Between(22.0, 44.0))),
Box::new(Operator(0, Operator::NotEqual(None, S("ponce")))),
)),
);
assert_eq!(condition, expected);
let condition = FilterCondition::from_str(
&rtxn,
&index,
"channel = gotaga OR NOT (timestamp 22 TO 44 AND channel != ponce)",
)
.unwrap();
let expected = Or(
Box::new(Operator(0, Operator::Equal(None, S("gotaga")))),
Box::new(Or(
Box::new(Or(
Box::new(Operator(1, LowerThan(22.0))),
Box::new(Operator(1, GreaterThan(44.0))),
)),
Box::new(Operator(0, Operator::Equal(None, S("ponce")))),
)),
);
assert_eq!(condition, expected);
}
#[test]
fn reserved_field_names() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
let rtxn = index.read_txn().unwrap();
assert!(FilterCondition::from_str(&rtxn, &index, "_geo = 12").is_err());
assert!(FilterCondition::from_str(&rtxn, &index, r#"_geoDistance <= 1000"#).is_err());
assert!(FilterCondition::from_str(&rtxn, &index, r#"_geoPoint > 5"#).is_err());
assert!(FilterCondition::from_str(&rtxn, &index, r#"_geoPoint(12, 16) > 5"#).is_err());
}
#[test]
fn geo_radius() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
// Set the filterable fields to be `_geo` and `price`.
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_searchable_fields(vec![S("_geo"), S("price")]); // to keep the fields order
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_filterable_fields(hashset! { S("_geo"), S("price") });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
let rtxn = index.read_txn().unwrap();
// basic test
let condition =
FilterCondition::from_str(&rtxn, &index, "_geoRadius(12, 13.0005, 2000)").unwrap();
let expected = Operator(0, GeoLowerThan([12., 13.0005], 2000.));
assert_eq!(condition, expected);
// basic test with latitude and longitude at the max angle
let condition =
FilterCondition::from_str(&rtxn, &index, "_geoRadius(90, 180, 2000)").unwrap();
let expected = Operator(0, GeoLowerThan([90., 180.], 2000.));
assert_eq!(condition, expected);
// basic test with latitude and longitude at the min angle
let condition =
FilterCondition::from_str(&rtxn, &index, "_geoRadius(-90, -180, 2000)").unwrap();
let expected = Operator(0, GeoLowerThan([-90., -180.], 2000.));
assert_eq!(condition, expected);
// test the negation of the GeoLowerThan
let condition =
FilterCondition::from_str(&rtxn, &index, "NOT _geoRadius(50, 18, 2000.500)").unwrap();
let expected = Operator(0, GeoGreaterThan([50., 18.], 2000.500));
assert_eq!(condition, expected);
// composition of multiple operations
let condition = FilterCondition::from_str(
&rtxn,
&index,
"(NOT _geoRadius(1, 2, 300) AND _geoRadius(1.001, 2.002, 1000.300)) OR price <= 10",
)
.unwrap();
let expected = Or(
Box::new(And(
Box::new(Operator(0, GeoGreaterThan([1., 2.], 300.))),
Box::new(Operator(0, GeoLowerThan([1.001, 2.002], 1000.300))),
)),
Box::new(Operator(1, LowerThanOrEqual(10.))),
);
assert_eq!(condition, expected);
// georadius has no parameters and no parentheses
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error.to_string().contains(
"The _geoRadius filter expect three arguments: _geoRadius(latitude, longitude, radius)"
));
// georadius has empty parentheses
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius()");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error.to_string().contains(
"The _geoRadius filter expect three arguments: _geoRadius(latitude, longitude, radius)"
));
// georadius doesn't have enough parameters
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius(1, 2)");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error.to_string().contains(
"The _geoRadius filter expect three arguments: _geoRadius(latitude, longitude, radius)"
));
// georadius has too many parameters
let result =
FilterCondition::from_str(&rtxn, &index, "_geoRadius(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error.to_string().contains(
"The _geoRadius filter expect three arguments: _geoRadius(latitude, longitude, radius)"
));
// georadius has a bad latitude
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius(-100, 150, 10)");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error
.to_string()
.contains("Latitude must be contained between -90 and 90 degrees."));
// georadius has a bad latitude (just below the lower bound)
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius(-90.0000001, 150, 10)");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error
.to_string()
.contains("Latitude must be contained between -90 and 90 degrees."));
// georadius has a bad longitude
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius(-10, 250, 10)");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error
.to_string()
.contains("Longitude must be contained between -180 and 180 degrees."));
// georadius has a bad longitude (just above the upper bound)
let result = FilterCondition::from_str(&rtxn, &index, "_geoRadius(-10, 180.000001, 10)");
assert!(result.is_err());
let error = result.unwrap_err();
assert!(error
.to_string()
.contains("Longitude must be contained between -180 and 180 degrees."));
}
#[test]
fn from_array() {
let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();
// Set the filterable fields to be the channel and the timestamp.
let mut wtxn = index.write_txn().unwrap();
let mut builder = Settings::new(&mut wtxn, &index, 0);
builder.set_searchable_fields(vec![S("channel"), S("timestamp")]); // to keep the fields order
builder.set_filterable_fields(hashset! { S("channel"), S("timestamp") });
builder.execute(|_, _| ()).unwrap();
wtxn.commit().unwrap();
// Test that the facet condition is correctly generated.
let rtxn = index.read_txn().unwrap();
let condition = FilterCondition::from_array(
&rtxn,
&index,
vec![
Either::Right("channel = gotaga"),
Either::Left(vec!["timestamp = 44", "channel != ponce"]),
],
)
.unwrap()
.unwrap();
let expected = FilterCondition::from_str(
&rtxn,
&index,
"channel = gotaga AND (timestamp = 44 OR channel != ponce)",
)
.unwrap();
assert_eq!(condition, expected);
}
}
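
The `negate` methods in the removed file push a `NOT` down to the leaves: `AND` and `OR` are swapped (De Morgan), and a negated `TO` range becomes two comparisons joined by `OR`. Below is a reduced sketch of that idea. The `Op` and `Cond` types are simplified stand-ins invented for this example (no field ids, no string or geo operators), so this is an illustration and not the code that was deleted.

```rust
/// Simplified stand-in for the removed `Operator` enum (numeric cases only).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    GreaterThan(f64),
    GreaterThanOrEqual(f64),
    LowerThan(f64),
    LowerThanOrEqual(f64),
    Between(f64, f64),
}

/// Simplified stand-in for the removed `FilterCondition` enum.
#[derive(Debug, Clone, PartialEq)]
enum Cond {
    Leaf(Op),
    And(Box<Cond>, Box<Cond>),
    Or(Box<Cond>, Box<Cond>),
}

impl Op {
    /// Returns one operator, or two when the negation needs an OR
    /// (the inclusive range `lo TO hi` becomes `< lo OR > hi`).
    fn negate(self) -> (Op, Option<Op>) {
        match self {
            Op::GreaterThan(n) => (Op::LowerThanOrEqual(n), None),
            Op::GreaterThanOrEqual(n) => (Op::LowerThan(n), None),
            Op::LowerThan(n) => (Op::GreaterThanOrEqual(n), None),
            Op::LowerThanOrEqual(n) => (Op::GreaterThan(n), None),
            Op::Between(lo, hi) => (Op::LowerThan(lo), Some(Op::GreaterThan(hi))),
        }
    }
}

impl Cond {
    /// De Morgan: NOT (a AND b) == (NOT a) OR (NOT b), and the reverse for OR.
    fn negate(self) -> Cond {
        match self {
            Cond::Leaf(op) => match op.negate() {
                (op, None) => Cond::Leaf(op),
                (a, Some(b)) => Cond::Or(Box::new(Cond::Leaf(a)), Box::new(Cond::Leaf(b))),
            },
            Cond::And(a, b) => Cond::Or(Box::new(a.negate()), Box::new(b.negate())),
            Cond::Or(a, b) => Cond::And(Box::new(a.negate()), Box::new(b.negate())),
        }
    }
}
```

With these types, negating `Between(22.0, 44.0)` yields `Or(LowerThan(22.0), GreaterThan(44.0))`, which mirrors the `NOT timestamp 22 TO 44` expectation in the `number` test above.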

@ -1,11 +1,9 @@
pub use self::facet_distribution::FacetDistribution;
pub use self::facet_number::{FacetNumberIter, FacetNumberRange, FacetNumberRevRange};
pub use self::facet_string::FacetStringIter;
pub use self::filter_condition::{FilterCondition, Operator};
pub(crate) use self::parser::Rule as ParserRule;
pub use self::filter::Filter;
mod facet_distribution;
mod facet_number;
mod facet_string;
mod filter_condition;
mod parser;
mod filter;

@ -1,12 +0,0 @@
use once_cell::sync::Lazy;
use pest::prec_climber::{Assoc, Operator, PrecClimber};
pub static PREC_CLIMBER: Lazy<PrecClimber<Rule>> = Lazy::new(|| {
use Assoc::*;
use Rule::*;
pest::prec_climber::PrecClimber::new(vec![Operator::new(or, Left), Operator::new(and, Left)])
});
#[derive(Parser)]
#[grammar = "search/facet/grammar.pest"]
pub struct FilterParser;
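
The removed `PREC_CLIMBER` lists `or` before `and`, both left-associative. As far as I understand pest's `PrecClimber`, later entries bind tighter, so `a OR b AND c` groups as `a OR (b AND c)`. The sketch below shows the same precedence-climbing idea on a flat token list. `Tok`, `Expr`, `precedence` and `climb` are names made up for this example and it does not use pest at all.

```rust
/// Hypothetical token type for this sketch: a leaf condition or an operator.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok<'a> {
    Leaf(&'a str),
    And,
    Or,
}

#[derive(Debug, PartialEq)]
enum Expr<'a> {
    Leaf(&'a str),
    And(Box<Expr<'a>>, Box<Expr<'a>>),
    Or(Box<Expr<'a>>, Box<Expr<'a>>),
}

/// OR has the lowest precedence, AND binds tighter; leaves are not operators.
fn precedence(tok: Tok<'_>) -> Option<u8> {
    match tok {
        Tok::Or => Some(1),
        Tok::And => Some(2),
        Tok::Leaf(_) => None,
    }
}

/// Precedence climbing: parse a leaf, then keep folding operators whose
/// precedence is at least `min_prec`. Returns `None` on malformed input.
fn climb<'a>(tokens: &[Tok<'a>], pos: &mut usize, min_prec: u8) -> Option<Expr<'a>> {
    let mut lhs = match *tokens.get(*pos)? {
        Tok::Leaf(s) => Expr::Leaf(s),
        _ => return None,
    };
    *pos += 1;
    while let Some(&op) = tokens.get(*pos) {
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        *pos += 1;
        // Left associativity: the right side may only contain tighter operators.
        let rhs = climb(tokens, pos, prec + 1)?;
        lhs = match op {
            Tok::And => Expr::And(Box::new(lhs), Box::new(rhs)),
            Tok::Or => Expr::Or(Box::new(lhs), Box::new(rhs)),
            Tok::Leaf(_) => unreachable!("leaves have no precedence"),
        };
    }
    Some(lhs)
}
```

Calling `climb(&tokens, &mut 0, 1)` parses a whole expression; starting at the lowest precedence lets every operator participate.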

@ -14,8 +14,7 @@ use meilisearch_tokenizer::{Analyzer, AnalyzerConfig};
use once_cell::sync::Lazy;
use roaring::bitmap::RoaringBitmap;
pub(crate) use self::facet::ParserRule;
pub use self::facet::{FacetDistribution, FacetNumberIter, FilterCondition, Operator};
pub use self::facet::{FacetDistribution, FacetNumberIter, Filter};
pub use self::matching_words::MatchingWords;
use self::query_tree::QueryTreeBuilder;
use crate::error::UserError;
@ -35,7 +34,8 @@ mod query_tree;
pub struct Search<'a> {
query: Option<String>,
filter: Option<FilterCondition>,
// this should be linked to the String in the query
filter: Option<Filter<'a>>,
offset: usize,
limit: usize,
sort_criteria: Option<Vec<AscDesc>>,
@ -97,7 +97,7 @@ impl<'a> Search<'a> {
self
}
pub fn filter(&mut self, condition: FilterCondition) -> &mut Search<'a> {
pub fn filter(&mut self, condition: Filter<'a>) -> &mut Search<'a> {
self.filter = Some(condition);
self
}

@ -567,7 +567,7 @@ mod tests {
use super::*;
use crate::update::{IndexDocuments, Settings};
use crate::FilterCondition;
use crate::Filter;
#[test]
fn delete_documents_with_numbers_as_primary_key() {
@ -667,7 +667,7 @@ mod tests {
builder.delete_external_id("1_4");
builder.execute().unwrap();
let filter = FilterCondition::from_str(&wtxn, &index, "label = sign").unwrap();
let filter = Filter::from_str("label = sign").unwrap();
let results = index.search(&wtxn).filter(filter).execute().unwrap();
assert!(results.documents_ids.is_empty());

@ -526,7 +526,7 @@ mod tests {
use super::*;
use crate::error::Error;
use crate::update::IndexDocuments;
use crate::{Criterion, FilterCondition, SearchResult};
use crate::{Criterion, Filter, SearchResult};
#[test]
fn set_and_reset_searchable_fields() {
@ -1068,7 +1068,8 @@ mod tests {
wtxn.commit().unwrap();
let rtxn = index.read_txn().unwrap();
FilterCondition::from_str(&rtxn, &index, "toto = 32").unwrap_err();
let filter = Filter::from_str("toto = 32").unwrap();
let _ = filter.evaluate(&rtxn, &index).unwrap_err();
}
#[test]

@ -1,5 +1,5 @@
use either::{Either, Left, Right};
use milli::{Criterion, FilterCondition, Search, SearchResult};
use milli::{Criterion, Filter, Search, SearchResult};
use Criterion::*;
use crate::search::{self, EXTERNAL_DOCUMENTS_IDS};
@ -13,11 +13,7 @@ macro_rules! test_filter {
let rtxn = index.read_txn().unwrap();
let filter_conditions =
FilterCondition::from_array::<Vec<Either<Vec<&str>, &str>>, _, _, _>(
&rtxn, &index, $filter,
)
.unwrap()
.unwrap();
Filter::from_array::<Vec<Either<Vec<&str>, &str>>, _>($filter).unwrap().unwrap();
let mut search = Search::new(&rtxn, &index);
search.query(search::TEST_QUERY);