✨ feat: enhance medialib image handling and add asset URL resolution

- Implemented `resolveApiAssetUrl` function to normalize asset URLs based on API base.
- Updated `MedialibImage` component to utilize new asset URL resolution and added support for alt text and class properties.
- Enhanced image loading behavior with improved width measurement and focal point handling.
- Added placeholder image handling and improved accessibility with alt text.
- Introduced new test script for auditing broken links in skill documentation.
- Expanded seeded test content to include medialib entries and updated related tests for pagebuilder previews.
- Improved global setup and teardown logging for clarity on seeded content management.
2026-05-17 00:52:41 +00:00
parent 958b45272d
commit 4020ad62c5
44 changed files with 4276 additions and 867 deletions
@@ -0,0 +1,191 @@
+---
+name: search-and-embeddings
+description: Model search and semantic retrieval for tibi website projects. Covers embedding provider configuration, collection search modes, auto-regeneration, regenerate-search admin flows, and how later agents should decide between no search, classic search, ngram search, and vector search.
+---
+
+# search-and-embeddings
+
+## When to use this skill
+
+Use this skill when:
+
+- a project needs explicit search behavior beyond generic CRUD filtering
+- search should be typo-tolerant, weighted, or semantic
+- embedding providers must be configured
+- later agents need a clear yes/no decision for search instead of vague optionality
+
+## Goal
+
+Give later agents a practical workflow for deciding whether search is needed and, if so, which search mode fits the project.
+
+This skill is separate from editor AI features. Search and embeddings affect content retrieval, operational setup, and index/regeneration behavior, not just editor assistance.
+
+## Source of truth
+
+Use these sources when implementing or reviewing search behavior:
+
+- `tibi-server/docs/02-configuration.md`
+- `tibi-server/docs/04-collections.md`
+- `tibi-server/docs/09-llm-integration.md`
+- `.agents/skills/nova-ai-editor-features/SKILL.md`
+- `.agents/skills/mongodb-and-indexes/SKILL.md`
+
+## First decision: no search vs explicit search
+
+Do not leave search in an implied state.
+
+Make one explicit decision:
+
+- no search in this project
+- classic keyword search only
+- fuzzy substring search (`ngram`)
+- semantic/vector search
+- hybrid search with deliberate ranking behavior
+
+If the answer is “not used”, document that clearly so later agents do not accidentally wire providers or regress into half-configured search.
+
+## Server-level provider setup
+
+Embedding providers are configured server-side:
+
+```yaml
+embedding:
+    providers:
+        - name: bge-m3
+          type: native
+          modelPath: /models/bge-m3
+          dimensions: 1024
+        - name: openai-embed
+          type: openai
+          model: text-embedding-3-small
+          apiKey: ${EMBEDDING_OPENAI-EMBED_APIKEY}
+          baseURL: https://api.openai.com/v1
+          dimensions: 1536
+```
+
+Important:
+
+- collection search config references the provider by name
+- embedding secrets and model paths can come from environment variables
+- vector search is not only a collection concern; the server must actually provide the embedding backend
+
+## Collection search modes
+
+Tibi supports multiple search modes via collection `search:` config:
+
+- `text`
+- `regex`
+- `eval`
+- `filter`
+- `ngram`
+- `vector`
+
+Use explicit search configs when search is a real product feature. Auto-fallback is useful, but it is not a substitute for a deliberate retrieval model.
+
+## Choosing the right mode
+
+### `text`
+
+Use when:
+
+- MongoDB text indexing is sufficient
+- it is clear which fields the text index covers
+- keyword search is enough
+
+Requires a text index.
+
+### `regex`
+
+Use when:
+
+- the searchable fields are explicit
+- case-insensitive matching is enough
+- weighted field scoring is useful
+
+Good for smaller datasets or fields that hold exact keys such as codes or identifiers.
+
+### `filter` or `eval`
+
+Use when:
+
+- search logic depends on auth, project context, or business-specific filtering
+- plain keyword matching is not the full contract
+
+Treat these as controlled power tools. The resulting filters are still sanitized against blocked operators.
+
+### `ngram`
+
+Use when:
+
+- typo tolerance or substring matching is needed
+- users search codes, names, transliterated terms, or partial inputs
+
+This is enrichment-based search. It stores generated `_search` data, so decide up front when and how that data is regenerated.
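+
+A minimal sketch of what an `ngram` entry in a collection `search:` config might look like. Only `search:`, `mode`, `ngram`, and `autoRegenerate` come from this skill; the list layout, the `name` key, the use of `fields` in this mode, and the field names are assumptions for illustration and should be checked against `tibi-server/docs/04-collections.md` before use.
+
+```yaml
+# Hypothetical collection search entry; the key layout is an assumption.
+search:
+    - name: fuzzy            # assumed identifier for this search config
+      mode: ngram
+      fields:                # assumed: fields that feed the generated _search data
+          - name             # hypothetical field
+          - sku              # hypothetical field
+      autoRegenerate: true   # refresh stale enrichment data after config changes
+```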
+
+### `vector`
+
+Use when:
+
+- semantic similarity matters more than literal keyword overlap
+- the project can support embedding-provider setup and the ongoing operator cost
+- search quality justifies added complexity
+
+Vector mode can use:
+
+- `fields`
+- custom `eval` transformation
+- `documentPrefix`
+- `queryPrefix`
+- `overflow: truncate|chunk`
+- `rrf` tuning for hybrid scoring
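+
+The options above could be combined roughly as follows. This is a sketch under stated assumptions, not the documented schema: the list layout, the `name` and `provider` keys, the field names, and the prefix values are hypothetical, while `mode`, `fields`, `documentPrefix`, `queryPrefix`, `overflow`, and `autoRegenerate` are the names used in this skill. `rrf` is left out because its tuning keys are not described here.
+
+```yaml
+# Hypothetical vector search entry; verify key names in tibi-server/docs/04-collections.md.
+search:
+    - name: semantic                  # assumed identifier for this search config
+      mode: vector
+      provider: bge-m3                # assumed key; must match a name under embedding.providers
+      fields:
+          - title                     # hypothetical field
+          - body                      # hypothetical field
+      documentPrefix: "document: "    # illustrative value, prepended before embedding documents
+      queryPrefix: "query: "          # illustrative value, prepended before embedding queries
+      overflow: chunk                 # alternative: truncate
+      autoRegenerate: true
+```
+
+Whether prefixes help depends on the embedding model, so treat both prefix values as placeholders rather than recommendations.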
+
+## Auto-regeneration and admin flows
+
+For `ngram` and `vector`, `autoRegenerate: true` can refresh stale enrichment data after config changes.
+
+If regeneration is needed manually, the admin flow depends on project admin tokens with:
+
+- `allowRegenerateSearch: true`
+
+Treat regeneration as part of the search contract, not as an implementation footnote.
+
+## Search and LLM are related but not identical
+
+The LLM system and the embedding system are adjacent, but they are not the same thing.
+
+- `llm.providers` drive chat/completion features
+- `embedding.providers` drive vector search enrichment
+- org/user budgets affect LLM usage workflows
+- search design still needs its own retrieval and operator decisions
+
+Do not assume that enabling editor AI automatically defines a sound search architecture.
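+
+The separation can be pictured as two independent top-level blocks in the server configuration. In this sketch the `embedding` entry is copied from the provider example in this skill, and the empty `llm.providers` list is only a placeholder, since the shape of LLM provider entries is not covered here (see `tibi-server/docs/09-llm-integration.md`).
+
+```yaml
+# Two separate provider lists; configuring one does not configure the other.
+llm:
+    providers: []              # placeholder: chat/completion provider entries omitted
+embedding:
+    providers:
+        - name: bge-m3         # referenced by name from collection search config
+          type: native
+          modelPath: /models/bge-m3
+          dimensions: 1024
+```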
+
+## Anti-patterns
+
+- leaving search unspecified and hoping auto-fallback is “good enough”
+- enabling vector search without a real provider/runtime plan
+- forgetting text indexes for `mode: text`
+- enabling enrichment modes without a regeneration story
+- mixing editor AI decisions with search decisions until neither is clear
+
+## Verification checklist
+
+After search-related changes, verify all of these:
+
+1. the project has an explicit yes/no search decision
+2. server-side embedding providers exist when vector search is configured
+3. required text or search indexes exist
+4. `?q=` and `?qName=` behavior matches the intended search contract
+5. regeneration behavior is defined for enrichment-based modes
+
+## What an LLM should inspect first
+
+When asked to add or review search on this starter, inspect in this order:
+
+1. `tibi-server/docs/04-collections.md`
+2. `tibi-server/docs/02-configuration.md`
+3. existing collection `search:` config
+4. whether the project needs keyword, fuzzy, semantic, or no search
+5. operator expectations for regeneration and provider secrets
+
+This prevents over-engineered vector setups and under-specified search behavior.