forked from cms/tibi-svelte-starter
✨ feat: enhance medialib image handling and add asset URL resolution
- Implemented `resolveApiAssetUrl` function to normalize asset URLs based on API base.
- Updated `MedialibImage` component to utilize new asset URL resolution and added support for alt text and class properties.
- Enhanced image loading behavior with improved width measurement and focal point handling.
- Added placeholder image handling and improved accessibility with alt text.
- Introduced new test script for auditing broken links in skill documentation.
- Expanded seeded test content to include medialib entries and updated related tests for pagebuilder previews.
- Improved global setup and teardown logging for clarity on seeded content management.
@@ -0,0 +1,191 @@
---
name: search-and-embeddings
description: Model search and semantic retrieval for tibi website projects. Covers embedding provider configuration, collection search modes, auto-regeneration, regenerate-search admin flows, and how later agents should decide between no search, classic search, ngram search, and vector search.
---

# search-and-embeddings

## When to use this skill

Use this skill when:

- a project needs explicit search behavior beyond generic CRUD filtering
- search should be typo-tolerant, weighted, or semantic
- embedding providers must be configured
- later agents need a clear yes/no decision for search instead of vague optionality

## Goal

Give later agents a practical workflow for deciding whether search is needed and, if yes, which search mode belongs to the project.

This skill is separate from editor AI features. Search and embeddings affect content retrieval, operational setup, and index/regeneration behavior, not just editor assistance.

## Source of truth

Use these sources when implementing or reviewing search behavior:

- `tibi-server/docs/02-configuration.md`
- `tibi-server/docs/04-collections.md`
- `tibi-server/docs/09-llm-integration.md`
- `.agents/skills/nova-ai-editor-features/SKILL.md`
- `.agents/skills/mongodb-and-indexes/SKILL.md`

## First decision: no search vs explicit search

Do not leave search in an implied state.

Make one explicit decision:

- no search in this project
- classic keyword search only
- fuzzy substring search (`ngram`)
- semantic/vector search
- hybrid search with deliberate ranking behavior

If the answer is “not used”, document that clearly so later agents do not accidentally wire providers or regress into half-configured search.
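
One way to keep the decision from drifting back into an implied state is to record it as a typed value in the project. The following is a minimal sketch, not a tibi API: `SearchDecision` and `describeDecision` are hypothetical names introduced here for illustration only.

```typescript
// Hypothetical project-level record of the search decision (not part of tibi).
type SearchDecision =
  | { kind: "none"; reason: string }
  | { kind: "keyword" }
  | { kind: "ngram" }
  | { kind: "vector"; provider: string }
  | { kind: "hybrid"; provider: string; rankingNote: string };

// Turn the decision into a one-line statement suitable for project docs.
function describeDecision(d: SearchDecision): string {
  switch (d.kind) {
    case "none":
      return `Search is not used: ${d.reason}`;
    case "keyword":
      return "Classic keyword search only";
    case "ngram":
      return "Fuzzy substring search (ngram)";
    case "vector":
      return `Semantic/vector search via provider "${d.provider}"`;
    case "hybrid":
      return `Hybrid search via provider "${d.provider}": ${d.rankingNote}`;
  }
}
```

Because the union has exactly one member per option, a later agent cannot express "maybe search" without changing the type.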

## Server-level provider setup

Embedding providers are configured server-side:

```yaml
embedding:
  providers:
    - name: bge-m3
      type: native
      modelPath: /models/bge-m3
      dimensions: 1024
    - name: openai-embed
      type: openai
      model: text-embedding-3-small
      apiKey: ${EMBEDDING_OPENAI-EMBED_APIKEY}
      baseURL: https://api.openai.com/v1
      dimensions: 1536
```

Important:

- collection search config references the provider by name
- embedding secrets and model paths can come from environment variables
- vector search is not only a collection concern; the server must actually provide the embedding backend
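
Since collections reference providers by name, a dangling name is an easy mistake to make. The sketch below shows one way a review script could catch it. The interfaces are assumed minimal shapes, not the real tibi config schema, and the collection names and `findMissingProviders` are made up for the example.

```typescript
// Assumed minimal shapes; the real tibi configuration has more fields.
interface EmbeddingProvider {
  name: string;
  dimensions: number;
}
interface VectorSearchRef {
  collection: string; // hypothetical collection name
  provider: string;   // provider name referenced by the collection's search config
}

// Return one message per collection whose provider is not defined server-side.
function findMissingProviders(providers: EmbeddingProvider[], refs: VectorSearchRef[]): string[] {
  const known = new Set(providers.map((p) => p.name));
  return refs
    .filter((r) => !known.has(r.provider))
    .map((r) => `${r.collection}: unknown embedding provider "${r.provider}"`);
}

const problems = findMissingProviders(
  [
    { name: "bge-m3", dimensions: 1024 },
    { name: "openai-embed", dimensions: 1536 },
  ],
  [
    { collection: "articles", provider: "bge-m3" },
    { collection: "products", provider: "bge-large" },
  ],
);
// problems contains a single entry, for the "products" collection.
```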

## Collection search modes

Tibi supports multiple search modes via collection `search:` config:

- `text`
- `regex`
- `eval`
- `filter`
- `ngram`
- `vector`

Use explicit search configs when search is a real product feature. Auto-fallback is useful, but it is not a substitute for a deliberate retrieval model.

## Choosing the right mode

### `text`

Use when:

- MongoDB text indexing is sufficient
- it is clear exactly which fields the text index covers
- keyword search is enough

Requires a text index.

### `regex`

Use when:

- the searchable fields are explicit
- case-insensitive matching is enough
- weighted field scoring is useful

Good for smaller datasets or precise keyed fields.
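
To make "weighted field scoring" concrete, here is a minimal sketch of the general idea: each field that matches the query case-insensitively contributes its weight to the score. This is not tibi's scoring implementation; the function names, field names, and weights are assumptions chosen for illustration.

```typescript
// Illustrative only: weighted, case-insensitive field matching.
type WeightedField = { field: string; weight: number };

// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sum the weights of all fields whose value contains the query.
function scoreDocument(
  doc: Record<string, string>,
  query: string,
  fields: WeightedField[],
): number {
  const pattern = new RegExp(escapeRegExp(query), "i");
  return fields.reduce(
    (score, f) => (pattern.test(doc[f.field] ?? "") ? score + f.weight : score),
    0,
  );
}
```

A title match weighted 3 and a body match weighted 1 would rank a document that matches in both fields (score 4) above one that matches only in the body (score 1).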

### `filter` or `eval`

Use when:

- search logic depends on auth, project context, or business-specific filtering
- plain keyword matching is not the full contract

Treat these as controlled power tools. The resulting filters are still sanitized against blocked operators.

### `ngram`

Use when:

- typo tolerance or substring matching is needed
- users search codes, names, transliterated terms, or partial inputs

This is enrichment-based search. It stores generated `_search` data and benefits from clear regeneration expectations.
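
The following sketch illustrates why n-gram matching tolerates typos: a misspelled query still shares most of its character trigrams with the stored value. It is a generic illustration, not tibi's actual `_search` format or matching logic, and the function names are hypothetical.

```typescript
// Illustrative trigram matching; not tibi's actual `_search` data format.
function ngrams(text: string, n = 3): Set<string> {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i + n <= normalized.length; i++) {
    grams.add(normalized.slice(i, i + n));
  }
  return grams;
}

// Fraction of the query's n-grams that also occur in the stored value (0..1).
function ngramOverlap(query: string, stored: string, n = 3): number {
  const queryGrams = ngrams(query, n);
  if (queryGrams.size === 0) return 0;
  const storedGrams = ngrams(stored, n);
  let shared = 0;
  queryGrams.forEach((g) => {
    if (storedGrams.has(g)) shared++;
  });
  return shared / queryGrams.size;
}
```

For example, the misspelled query `espreso` shares four of its five trigrams with `Espresso Machine`. The stored n-grams are derived data, which is why a change to the fields or the n-gram settings implies regeneration.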

### `vector`

Use when:

- semantic similarity matters more than literal keyword overlap
- the project can support embedding-provider setup and operator cost expectations
- search quality justifies added complexity

Vector mode can use:

- `fields`
- custom `eval` transformation
- `documentPrefix`
- `queryPrefix`
- `overflow: truncate|chunk`
- `rrf` tuning for hybrid scoring
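
For readers unfamiliar with the `rrf` option, the sketch below shows standard reciprocal rank fusion, a common way to merge a keyword ranking and a vector ranking into one hybrid ranking. It is the textbook formula only; how tibi exposes or tunes `rrf` may differ, and the constant `k = 60` is a conventional default assumed here, not a value taken from the tibi docs.

```typescript
// Standard reciprocal rank fusion (RRF); tibi's own `rrf` options may differ.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based: the top hit of a list contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const keywordHits = ["a", "b", "c"]; // hypothetical document ids
const vectorHits = ["c", "a", "d"];
const fused = reciprocalRankFusion([keywordHits, vectorHits]).map(([id]) => id);
// fused is ["a", "c", "b", "d"]: documents found by both rankings rise to the top.
```

A larger `k` flattens the difference between ranks, so tuning it changes how strongly a top position in one list dominates the fused result.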

## Auto-regeneration and admin flows

For `ngram` and `vector`, `autoRegenerate: true` can refresh stale enrichment data after config changes.

If regeneration is needed manually, the admin flow depends on project admin tokens with:

- `allowRegenerateSearch: true`

Treat regeneration as part of the search contract, not as an implementation footnote.
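
To illustrate what "stale after a config change" can mean in practice, here is a hypothetical staleness check based on a configuration fingerprint. How tibi detects stale enrichment data internally is not described in this skill, so treat every name and shape below as an assumption made for the example.

```typescript
// Hypothetical staleness check; tibi's internal mechanism may work differently.
interface EnrichmentConfig {
  mode: "ngram" | "vector";
  fields: string[];
  provider?: string; // only relevant for vector mode
}

// Build a stable fingerprint: the order of `fields` must not change the result.
function configFingerprint(cfg: EnrichmentConfig): string {
  const fields = [...cfg.fields].sort();
  return JSON.stringify({ mode: cfg.mode, fields, provider: cfg.provider ?? null });
}

// A document is stale when it was enriched under a different configuration
// (or has never been enriched at all).
function isStale(storedFingerprint: string | undefined, current: EnrichmentConfig): boolean {
  return storedFingerprint !== configFingerprint(current);
}
```

Whatever the real mechanism is, the operator question stays the same: which config changes invalidate stored enrichment data, and who triggers the refresh.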

## Search and LLM are related but not identical

The LLM system and the embedding system are adjacent, but they are not the same thing.

- `llm.providers` drive chat/completion features
- `embedding.providers` drive vector search enrichment
- org/user budgets affect LLM usage workflows
- search design still needs its own retrieval and operator decisions

Do not assume that enabling editor AI automatically defines a sound search architecture.

## Anti-patterns

- leaving search unspecified and hoping auto-fallback is “good enough”
- enabling vector search without a real provider/runtime plan
- forgetting text indexes for `mode: text`
- enabling enrichment modes without a regeneration story
- mixing editor AI decisions with search decisions until neither is clear

## Verification checklist

After search-related changes, verify all of these:

1. the project has an explicit yes/no search decision
2. server-side embedding providers exist when vector search is configured
3. required text or search indexes exist
4. `?q=` and `?qName=` behavior matches the intended search contract
5. regeneration behavior is defined for enrichment-based modes
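
The checklist can also be expressed as a small review helper so that failed items are reported by number. This is a sketch under the assumption that the reviewer fills in the flags by hand or from their own tests; `SearchReview` and `failedChecks` are hypothetical names, not part of tibi.

```typescript
// Hypothetical summary of a project's search setup, filled in by the reviewer.
interface SearchReview {
  decisionDocumented: boolean;
  usesVector: boolean;
  embeddingProviderConfigured: boolean;
  requiredIndexesExist: boolean;
  queryBehaviorVerified: boolean; // `?q=` and `?qName=` checked against the contract
  usesEnrichment: boolean;        // ngram or vector
  regenerationDefined: boolean;
}

// Return the checklist items that fail, prefixed with their checklist number.
function failedChecks(r: SearchReview): string[] {
  const failures: string[] = [];
  if (!r.decisionDocumented) failures.push("1: no explicit search decision");
  if (r.usesVector && !r.embeddingProviderConfigured) {
    failures.push("2: vector search without a server-side provider");
  }
  if (!r.requiredIndexesExist) failures.push("3: required indexes missing");
  if (!r.queryBehaviorVerified) failures.push("4: query behavior not verified");
  if (r.usesEnrichment && !r.regenerationDefined) {
    failures.push("5: no regeneration behavior defined");
  }
  return failures;
}
```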

## What an LLM should inspect first

When asked to add or review search on this starter, inspect in this order:

1. `tibi-server/docs/04-collections.md`
2. `tibi-server/docs/02-configuration.md`
3. existing collection `search:` config
4. whether the project needs keyword, fuzzy, semantic, or no search
5. operator expectations for regeneration and provider secrets

This prevents over-engineered vector setups and under-specified search behavior.