# Backend API

## main.py
Health check endpoint.
Returns:

| Name | Type | Description |
|---|---|---|
| dict | Dict[str, str] | A simple JSON message confirming that the backend is running. |
Example response: `{ "message": "Multimodal RAG backend is running!!!" }`
Source code in backend/main.py
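Given the return table above, the handler presumably looks something like the sketch below. The function name `read_root` is an assumption; in the real backend the function is registered as a FastAPI route (e.g. `@app.get("/")`).

```python
# Hypothetical sketch of the health-check handler in backend/main.py.
# In the actual app this would be decorated as a FastAPI route.
def read_root() -> dict[str, str]:
    """Return a simple JSON message confirming the backend is running."""
    return {"message": "Multimodal RAG backend is running!!!"}
```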
## api/endpoints.py
Bases: BaseModel
Schema for user query requests.
Source code in backend/api/endpoints.py
Bases: BaseModel
Schema for text embedding endpoint.
Source code in backend/api/endpoints.py
Bases: BaseModel
Schema for image embedding endpoint.
Source code in backend/api/endpoints.py
Bases: BaseModel
Schema for video embedding endpoint.
Source code in backend/api/endpoints.py
Handle a chat request with optional model selection and attachment.
Source code in backend/api/endpoints.py
Run RAG with user filtering and optional attachment/model.
Source code in backend/api/endpoints.py
Generate embedding for a text payload.
Source code in backend/api/endpoints.py
Generate embedding for a local image file.
Source code in backend/api/endpoints.py
Generate embedding for a local video file.
Source code in backend/api/endpoints.py
Upload a file from a chat attachment and persist its metadata.
Source code in backend/api/endpoints.py
Return the latest query history for the current user.
Source code in backend/api/endpoints.py
## core/embeddings.py
Generate an embedding vector for a given text string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The input text to encode. | required |
| provider_name | str \| None | Optional provider name override. | None |
Returns:

| Type | Description |
|---|---|
| list[float] | A list of floats representing the text embedding. |
Notes
- This embedding can be stored in a vector database like Qdrant.
- Ensure the model used for text embeddings is compatible with your retrieval pipeline.
Source code in backend/core/embeddings.py
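To illustrate the call shape, here is a hedged sketch using a stub provider. The names `get_text_embedding`, `Provider`, and `StubProvider` are assumptions for illustration; a real provider would wrap a SentenceTransformers model's `encode` call rather than return a fixed vector.

```python
# Sketch of the text-embedding call shape; the provider is a stub
# standing in for a real SentenceTransformers-backed implementation.
from typing import Protocol


class Provider(Protocol):
    def embed_text(self, text: str) -> list[float]: ...


class StubProvider:
    def embed_text(self, text: str) -> list[float]:
        # A real provider would call model.encode(text) here.
        return [float(len(text)), 0.0, 1.0]


def get_text_embedding(text: str, provider: Provider) -> list[float]:
    """Return the embedding vector for a text string via the given provider."""
    return provider.embed_text(text)
```

The returned list of floats can then be upserted into Qdrant alongside payload metadata.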
Generate an embedding vector for an image from a file path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_path | str | Path to the image file to encode. | required |
| provider_name | str \| None | Optional provider name override. | None |
Returns:

| Type | Description |
|---|---|
| List[float] | A list of floats representing the image embedding. |
Notes
- The image is converted to RGB before encoding.
- Uses a CLIP-based model ("clip-ViT-B-32") for generating visual embeddings.
- Embeddings can be stored in Qdrant or compared with other image embeddings.
Source code in backend/core/embeddings.py
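The notes above mention comparing image embeddings with one another; a standard way to do that is cosine similarity over the returned float vectors. This helper is illustrative and not part of the documented module:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```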
Generate an embedding vector for a video from a file path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| video_path | str | Path to the video file to encode. | required |
| sample_fps | float \| None | Sampling FPS. Falls back to config when None. | None |
| provider_name | str \| None | Optional provider name override. | None |
Source code in backend/core/embeddings.py
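The `sample_fps` parameter implies the video is sampled at a fixed rate before frames are embedded. A minimal sketch of that sampling step, assuming the frames are selected by index against the video's native frame rate (the actual implementation may differ):

```python
def frame_indices(total_frames: int, native_fps: float, sample_fps: float) -> list[int]:
    """Indices of frames to embed when sampling a video at sample_fps.

    Every round(native_fps / sample_fps)-th frame is kept.
    """
    step = max(1, round(native_fps / sample_fps))
    return list(range(0, total_frames, step))
```

For example, a 100-frame clip at 30 fps sampled at 1 fps yields frames 0, 30, 60, and 90.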
## core/llm.py
Wrapper for a multimodal Vision-Text LLM (Qwen2-VL) to generate text from images and prompts.
Attributes:

| Name | Type | Description |
|---|---|---|
| processor | | Processor for preparing images and text for the model. |
| model | | The loaded Vision2Seq model for multimodal inference. |
Source code in backend/core/llm.py
Helper function to generate a response using the global QwenVisionLLM instance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prompt | str | User prompt or question. | required |
| context | list[dict] | Retrieved documents for context. | None |
| image | str or Image | Image to include in generation. | None |
| model | str \| None | Requested model identifier. | None |
Returns:

| Name | Type | Description |
|---|---|---|
| str | str | Generated text from the LLM. |
Source code in backend/core/llm.py
## core/multimodal_rag.py
Local Retrieval-Augmented Generation (RAG) pipeline using Qdrant and a local LLM.
Attributes:

| Name | Type | Description |
|---|---|---|
| client | QdrantClient | Client to connect to the local Qdrant vector database. |
Source code in backend/core/multimodal_rag.py
Retrieve top-K most similar documents from Qdrant for a given query.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User text query. | required |
| top_k | int | Number of top documents to retrieve. Defaults to 5. | 5 |
| user_id | str \| None | Filter results by user id. Defaults to None. | None |
| folder_scopes | list[str] \| None | Optional folder scope filter. | None |
| file_ids | list[str] \| None | Optional file filter. | None |
Returns:

| Type | Description |
|---|---|
| List[Dict[str, str]] | A list of dictionaries for the retrieved documents, each with 'text' (str), the text content of the document chunk, and 'source' (str), the source file path or metadata for the chunk. |
Source code in backend/core/multimodal_rag.py
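The `user_id`, `folder_scopes`, and `file_ids` parameters suggest a payload filter is assembled and passed to the Qdrant search. The sketch below shows only the likely shape of that filter using plain dicts; real code would build `qdrant_client.models.Filter` / `FieldCondition` objects, and the payload key names here are assumptions.

```python
def build_filter(user_id=None, folder_scopes=None, file_ids=None):
    """Assemble a Qdrant-style payload filter from the optional scoping args.

    Returns None when no filtering is requested, so the search runs unscoped.
    """
    must = []
    if user_id:
        must.append({"key": "user_id", "match": {"value": user_id}})
    if folder_scopes:
        must.append({"key": "folder", "match": {"any": folder_scopes}})
    if file_ids:
        must.append({"key": "file_id", "match": {"any": file_ids}})
    return {"must": must} if must else None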
Generate an answer using the local LLM based on the query and optionally an image.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User question or prompt. | required |
| top_k | int | Number of top documents to retrieve for context. Defaults to 5. | 5 |
| image | Optional[str] | Optional image path or URL to include in the prompt. Defaults to None. | None |
| user_id | str \| None | Filter context by user id. Defaults to None. | None |
| model | str \| None | LLM model name hint for backend routing. | None |
| folder_scopes | list[str] \| None | Optional folder filter for retrieval. | None |
| file_ids | list[str] \| None | Optional file filter for retrieval. | None |
| extra_docs | list[dict[str, str]] \| None | Extra context docs from attachments. | None |
Returns:

| Type | Description |
|---|---|
| Dict[str, Any] | A dictionary with 'answer' (str), the generated text answer from the LLM, and 'retrieved_docs' (List[Dict[str, str]]), the list of retrieved documents used as context. |
Source code in backend/core/multimodal_rag.py
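Putting the retrieval and generation steps together, the overall flow likely resembles the sketch below. The retriever and LLM are passed in as stubs here; the real method calls the class's own `retrieve` and the `generate_response` helper documented above.

```python
# End-to-end sketch of the RAG generate step with stubbed collaborators.
def generate(query, retrieve, llm, extra_docs=None, top_k=5):
    """Retrieve context, merge any attachment docs, call the LLM, and
    return the answer together with the documents used as context."""
    docs = retrieve(query, top_k)
    if extra_docs:
        docs = docs + extra_docs
    return {"answer": llm(query, docs), "retrieved_docs": docs}
```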
## core/embedding_providers.py
Bases: ABC
Base interface for embedding providers.
Source code in backend/core/embedding_providers.py
Bases: EmbeddingProvider
SentenceTransformers-based provider for text, image, and video.
Source code in backend/core/embedding_providers.py
Resolve embedding provider by name.
Source code in backend/core/embedding_providers.py
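Resolving a provider by name is typically done with a registry keyed by provider name, falling back to a default when no override is given. A minimal sketch, assuming a default of "sentence-transformers" (the registry shape and default name are assumptions):

```python
# Sketch of name-based provider resolution via a simple registry.
_PROVIDERS: dict = {}


def register(name, factory):
    """Register a provider factory under a name."""
    _PROVIDERS[name] = factory


def get_provider(name=None, default="sentence-transformers"):
    """Resolve a provider by name, falling back to the default when None."""
    key = name or default
    if key not in _PROVIDERS:
        raise ValueError(f"Unknown embedding provider: {key}")
    return _PROVIDERS[key]()
```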
## utils/load_data.py
Unified loader for text documents and images with optional Qdrant upsert.
Source code in backend/utils/load_data.py
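A "unified loader" for text and images implies dispatching on file type before embedding and upserting. A hedged sketch of that dispatch step (the extension sets are assumptions, and the embedding/upsert calls are omitted):

```python
from pathlib import Path

# Assumed extension sets; the real loader may accept more formats.
TEXT_EXTS = {".txt", ".md", ".pdf"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def classify(path: str) -> str:
    """Decide whether a file should go through the text or image pipeline."""
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXTS:
        return "text"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"Unsupported file type: {ext}")
```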
## monitoring/metrics.py
Observe common HTTP metrics.
Source code in backend/monitoring/metrics.py
Observe RAG query metrics.
Source code in backend/monitoring/metrics.py
Observe embedding generation metrics.
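The observers above likely record counts and latencies per endpoint or operation; a real implementation would use `prometheus_client` counters and histograms. A stdlib-only sketch of the pattern, with an in-memory store standing in for the metrics registry:

```python
import time
from contextlib import contextmanager


@contextmanager
def observe(store: dict, label: str):
    """Record the wall-clock duration of the wrapped block under a label.

    Stand-in for a prometheus_client Histogram.observe() call.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        store.setdefault(label, []).append(time.perf_counter() - start)
```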