✨ feat: enhance search capabilities and indexing across collections
@@ -79,6 +79,7 @@ Tibi supports multiple search modes via collection `search:` config:
 - `filter`
 - `ngram`
 - `vector`
+- `combined`
 
 Use explicit search configs when search is a real product feature. Auto-fallback is useful, but it is not a substitute for a deliberate retrieval model.
 
@@ -92,7 +93,7 @@ Use when:
 - exact field ownership of the text index is clear
 - keyword search is enough
 
-Requires a text index.
+Requires a MongoDB text index (wildcard `{ "$**": "text" }` or on specific fields).
 
 ### `regex`
 
@@ -100,9 +101,16 @@ Use when:
 
 - the searchable fields are explicit
 - case-insensitive matching is enough
-- weighted field scoring is useful
+- weighted field scoring is useful (via `regex.weights: { "meta.title": 10, path: 5 }`)
 
-Good for smaller datasets or precise keyed fields.
+Good for smaller datasets or precise keyed fields. Easy to configure, with no external dependencies. Example:
 
+```yaml
+search:
+  - name: default
+    mode: regex
+    fields: [title, "alt.de", description]
+```
+
 ### `filter` or `eval`
 
@@ -121,23 +129,62 @@ Use when:
 - users search codes, names, transliterated terms, or partial inputs
 
 This is enrichment-based search. It stores generated `_search` data and benefits from clear regeneration expectations.
+_Note:_ Field weighting is not natively supported inside a single `ngram` mode, because all `fields` are concatenated into one large ngram index block per document.
 
 ### `vector`
 
 Use when:
 
 - semantic similarity matters more than literal keyword overlap
-- the project can support embedding-provider setup and operator cost expectations
+- the project can support embedding-provider setup (e.g. `bge-m3` in `api/config.yml`)
 - search quality justifies added complexity
 
-Vector mode can use:
+Vector mode requires a registered provider.
 
-- `fields`
-- custom `eval` transformation
-- `documentPrefix`
-- `queryPrefix`
-- `overflow: truncate|chunk`
-- `rrf` tuning for hybrid scoring
+### `combined` (RRF)
+
+Use when:
+
+- Hybrid search is required (e.g. `vector` + `ngram` to catch typos and semantic meaning).
+- You need to simulate field weighting for `vector` or `ngram` by splitting them into multiple search blocks and fusing them with different weights.
+
+`mode: combined` uses Reciprocal Rank Fusion (RRF). It delegates execution to other configured search blocks (which should be hidden in the admin UI via `meta.hide: true`).
+
+**Field-Weighting Workaround with combined:**
+Because `vector` and `ngram` concatenate all fields, you can rank important fields (like titles) above deep content fields by creating multiple ngram/vector blocks and boosting the important one in the `combined` weights:
+
+```yaml
+search:
+  - name: main_search
+    mode: combined
+    rrf:
+      k: 60
+      topK: 100
+      weights:
+        semantic: 1.5
+        fuzzy_important: 2.0 # Boosts matches in title/headline
+        fuzzy_content: 0.5 # Lowers weight for deep text matches
+    meta:
+      label: { de: "Suche", en: "Search" }
+
+  - name: fuzzy_important
+    mode: ngram
+    fields: [name, "meta.title", "blocks.headline"]
+    autoRegenerate: true
+    meta: { hide: true }
+
+  - name: fuzzy_content
+    mode: ngram
+    fields: ["blocks.text", "blocks.items.answer"]
+    autoRegenerate: true
+    meta: { hide: true }
+
+  - name: semantic
+    mode: vector
+    fields: [name, "meta.title", "blocks.text"]
+    vector: { provider: bge-m3 }
+    autoRegenerate: true
+```
 
 ## Auto-regeneration and admin flows
 
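The `combined` section added in the last hunk relies on Reciprocal Rank Fusion to merge the ranked results of several search blocks. The diff does not show how Tibi computes the fused score, so the following is only a minimal sketch of the commonly used weighted RRF formula, `score(d) = sum_i w_i / (k + rank_i(d))`. The function name `weighted_rrf`, the default weight of 1.0, and the reading of `topK` as a per-block candidate cap are assumptions made for illustration, not Tibi's API.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60, top_k=100):
    """Fuse ranked ID lists with weighted Reciprocal Rank Fusion.

    rankings: block name -> list of document IDs, best match first.
    weights:  block name -> weight (hypothetical default: 1.0).
    k:        RRF smoothing constant; larger values flatten rank differences.
    top_k:    assumed here to cap how many hits each block contributes.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        # Ranks are 1-based, as in the usual RRF formulation.
        for rank, doc_id in enumerate(ranked_ids[:top_k], start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Block names and weights mirror the YAML example; the hit lists are made up.
rankings = {
    "semantic": ["a", "b", "c"],
    "fuzzy_important": ["b", "a"],
    "fuzzy_content": ["c", "b"],
}
weights = {"semantic": 1.5, "fuzzy_important": 2.0, "fuzzy_content": 0.5}

fused = weighted_rrf(rankings, weights)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

In this toy input, document `b` ranks first because it is the top hit of the heavily weighted `fuzzy_important` block, which is the effect the field-weighting workaround aims for.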