RAG API
This document outlines the API endpoints for managing Retrieval-Augmented Generation (RAG) components in PySpur.
Document Collections
Create Document Collection
Description: Creates a new document collection from uploaded files and metadata. The files are processed asynchronously in the background.
URL: /rag/collections/
Method: POST
Form Data:
files: List[UploadFile] # List of files to upload (optional)
metadata: str # JSON string containing collection configuration
Where metadata is a JSON string representing:
class DocumentCollectionCreateSchema:
    name: str  # Name of the collection
    description: str  # Description of the collection
    text_processing: ChunkingConfigSchema  # Configuration for text processing
Response Schema:
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
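As a minimal sketch of the request described above, the snippet below serializes the metadata field and posts it as form data together with any files. The helper names, the base_url argument, and the chunk_size key are assumptions made for illustration (the fields of ChunkingConfigSchema are not listed in this section), and the third-party requests library is assumed to be installed.

```python
import json
import os
from contextlib import ExitStack


def build_collection_metadata(name, description, text_processing):
    """Serialize the DocumentCollectionCreateSchema fields to a JSON string."""
    return json.dumps(
        {
            "name": name,
            "description": description,
            "text_processing": text_processing,
        }
    )


def create_collection(base_url, metadata_json, file_paths=()):
    """POST the metadata string and any files as form data; return the parsed JSON."""
    import requests  # third-party; imported lazily so the helper above works without it

    with ExitStack() as stack:
        # Each file is sent under the repeated form field name "files".
        files = [
            ("files", (os.path.basename(path), stack.enter_context(open(path, "rb"))))
            for path in file_paths
        ]
        response = requests.post(
            f"{base_url}/rag/collections/",
            data={"metadata": metadata_json},
            files=files,
        )
    response.raise_for_status()
    return response.json()


# "chunk_size" is a hypothetical key; substitute the real ChunkingConfigSchema fields.
metadata = build_collection_metadata("manuals", "Product manuals", {"chunk_size": 500})
```

Calling create_collection(base_url, metadata, ["guide.pdf"]) against a running server should return a dictionary shaped like DocumentCollectionResponseSchema. Because processing happens in the background, the returned status is likely to be processing rather than ready.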
List Document Collections
Description: Lists all document collections.
URL: /rag/collections/
Method: GET
Response Schema:
List[DocumentCollectionResponseSchema]
Get Document Collection
Description: Gets details of a specific document collection.
URL: /rag/collections/{collection_id}/
Method: GET
Parameters:
collection_id: str # ID of the document collection
Response Schema:
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
Delete Document Collection
Description: Deletes a document collection and its associated data.
URL: /rag/collections/{collection_id}/
Method: DELETE
Parameters:
collection_id: str # ID of the document collection
Response: 200 OK with message
Get Collection Progress
Description: Gets the processing progress of a document collection.
URL: /rag/collections/{collection_id}/progress/
Method: GET
Parameters:
collection_id: str # ID of the document collection
Response Schema:
class ProcessingProgressSchema:
    id: str  # ID of the collection
    status: str  # Status of processing
    progress: float  # Progress percentage (0-100)
    current_step: Optional[str]  # Current processing step
    total_files: Optional[int]  # Total number of files
    processed_files: Optional[int]  # Number of processed files
    total_chunks: Optional[int]  # Total number of chunks
    processed_chunks: Optional[int]  # Number of processed chunks
    error_message: Optional[str]  # Error message if processing failed
    created_at: str  # When processing started (ISO format)
    updated_at: str  # When processing was last updated (ISO format)
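Because collections are processed in the background, a client typically polls this endpoint until processing finishes. The sketch below shows one way to do that. It assumes the progress status takes the same values as the collection status (processing, ready, failed), and the function name, poll interval, and timeout are illustrative choices rather than part of the API. The HTTP call is injected as a callable so the loop can be exercised without a server.

```python
import time


def wait_for_processing(fetch_progress, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the status is 'ready' or 'failed'; return the final progress dict.

    fetch_progress is any zero-argument callable that returns a dict shaped like
    ProcessingProgressSchema, for example a wrapper around the GET request.
    """
    deadline = time.monotonic() + timeout
    while True:
        progress = fetch_progress()
        status = progress["status"]
        if status == "ready":
            return progress
        if status == "failed":
            raise RuntimeError(progress.get("error_message") or "processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Simulated responses stand in for GET /rag/collections/{collection_id}/progress/.
_responses = iter(
    [
        {"id": "c1", "status": "processing", "progress": 40.0},
        {"id": "c1", "status": "ready", "progress": 100.0},
    ]
)
final = wait_for_processing(lambda: next(_responses), sleep=lambda seconds: None)
print(final["status"], final["progress"])  # prints: ready 100.0
```

In real use, fetch_progress would issue the GET request and return the decoded JSON body.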
Add Documents to Collection
Description: Adds documents to an existing collection. The documents are processed asynchronously in the background.
URL: /rag/collections/{collection_id}/documents/
Method: POST
Parameters:
collection_id: str # ID of the document collection
Form Data:
files: List[UploadFile] # List of files to upload
Response Schema:
class DocumentCollectionResponseSchema:
    # Same as the Get Document Collection response
Get Collection Documents
Description: Gets all documents and their chunks for a collection.
URL: /rag/collections/{collection_id}/documents/
Method: GET
Parameters:
collection_id: str # ID of the document collection
Response Schema:
List[DocumentWithChunksSchema]
Where DocumentWithChunksSchema contains:
class DocumentWithChunksSchema:
    id: str  # ID of the document
    title: str  # Title of the document
    metadata: Dict[str, Any]  # Metadata about the document
    chunks: List[DocumentChunkSchema]  # List of chunks in the document
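To illustrate the shape of this response, the sketch below summarizes a decoded List[DocumentWithChunksSchema] by counting the chunks in each document. The helper name and the sample data are made up for illustration. The chunk entries are left as empty placeholders because the fields of DocumentChunkSchema are not listed in this section.

```python
def chunk_counts(documents):
    """Map each document ID to its title and number of chunks."""
    return {
        doc["id"]: {"title": doc["title"], "chunks": len(doc["chunks"])}
        for doc in documents
    }


# Hand-written sample standing in for the decoded JSON response.
sample = [
    {"id": "d1", "title": "Guide", "metadata": {}, "chunks": [{}, {}, {}]},
    {"id": "d2", "title": "FAQ", "metadata": {}, "chunks": [{}]},
]
counts = chunk_counts(sample)
print(counts["d1"]["chunks"], counts["d2"]["chunks"])  # prints: 3 1
```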
Delete Document from Collection
Description: Deletes a document from a collection.
URL: /rag/collections/{collection_id}/documents/{document_id}/
Method: DELETE
Parameters:
collection_id: str # ID of the document collection
document_id: str # ID of the document to delete
Response: 200 OK with message
Preview Chunk
Description: Previews how a document would be chunked with a given configuration.
URL: /rag/collections/preview_chunk/
Method: POST
Form Data:
file: UploadFile # File to preview
chunking_config: str # JSON string containing chunking configuration
Response Schema:
{
"chunks": List[Dict[str, Any]], # Preview of chunks
"total_chunks": int # Total number of chunks
}
Vector Indices
Create Vector Index
Description: Creates a new vector index from a document collection. The index is created asynchronously in the background.
URL: /rag/indices/
Method: POST
Request Payload:
class VectorIndexCreateSchema:
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    embedding: EmbeddingConfigSchema  # Configuration for embedding
Response Schema:
class VectorIndexResponseSchema:
    id: str  # ID of the vector index
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    status: str  # Status of the index (processing, ready, failed)
    created_at: str  # When the index was created (ISO format)
    updated_at: str  # When the index was last updated (ISO format)
    document_count: int  # Number of documents in the index
    chunk_count: int  # Number of chunks in the index
    embedding_model: str  # Name of the embedding model
    vector_db: str  # Name of the vector database
    error_message: Optional[str]  # Error message if processing failed
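Unlike the collection endpoints, this one takes a request payload rather than form data. The sketch below builds that request with the standard library, assuming the payload is sent as a JSON body. The base URL, the helper name, and the keys inside embedding are assumptions for illustration, since the fields of EmbeddingConfigSchema are not listed in this section.

```python
import json
import urllib.request


def build_create_index_request(base_url, name, description, collection_id, embedding):
    """Build (but do not send) a JSON POST request matching VectorIndexCreateSchema."""
    payload = {
        "name": name,
        "description": description,
        "collection_id": collection_id,
        "embedding": embedding,
    }
    return urllib.request.Request(
        f"{base_url}/rag/indices/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_index_request(
    "http://localhost:6080/api",  # assumed base URL; adjust to your deployment
    name="manuals-index",
    description="Index over the manuals collection",
    collection_id="c1",
    embedding={"model": "example-embedding-model"},  # hypothetical keys
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen sends it. The decoded body should then follow VectorIndexResponseSchema, and its id can be used with the progress endpoint to track the background build.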
List Vector Indices
Description: Lists all vector indices.
URL: /rag/indices/
Method: GET
Response Schema:
List[VectorIndexResponseSchema]
Get Vector Index
Description: Gets details of a specific vector index.
URL: /rag/indices/{index_id}/
Method: GET
Parameters:
index_id: str # ID of the vector index
Response Schema:
class VectorIndexResponseSchema:
    # Same as Create Vector Index response
Delete Vector Index
Description: Deletes a vector index and its associated data.
URL: /rag/indices/{index_id}/
Method: DELETE
Parameters:
index_id: str # ID of the vector index
Response: 200 OK with message
Get Index Progress
Description: Gets the processing progress of a vector index.
URL: /rag/indices/{index_id}/progress/
Method: GET
Parameters:
index_id: str # ID of the vector index
Response Schema:
class ProcessingProgressSchema:
    # Same as Get Collection Progress response
Retrieve from Index
Description: Retrieves relevant chunks from a vector index based on a query.
URL: /rag/indices/{index_id}/retrieve/
Method: POST
Parameters:
index_id: str # ID of the vector index
Request Payload:
class RetrievalRequestSchema:
    query: str  # Query to search for
    top_k: Optional[int] = 5  # Number of results to return
    score_threshold: Optional[float] = None  # Minimum score threshold
    semantic_weight: Optional[float] = 1.0  # Weight for semantic search
    keyword_weight: Optional[float] = 0.0  # Weight for keyword search
Response Schema:
class RetrievalResponseSchema:
    results: List[RetrievalResultSchema]  # List of retrieval results
    total_results: int  # Total number of results
Where RetrievalResultSchema contains:
class RetrievalResultSchema:
    text: str  # Text of the chunk
    score: float  # Relevance score
    metadata: ChunkMetadataSchema  # Metadata about the chunk
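The following sketch ties the request and response schemas together: it builds a RetrievalRequestSchema payload using the documented defaults and orders the results of a decoded response by score. The helper names and the sample response are made up for illustration, and treating a higher score as more relevant is an assumption that this section does not state explicitly.

```python
def build_retrieval_payload(
    query,
    top_k=5,
    score_threshold=None,
    semantic_weight=1.0,
    keyword_weight=0.0,
):
    """Build a dict matching RetrievalRequestSchema, using the documented defaults."""
    payload = {
        "query": query,
        "top_k": top_k,
        "semantic_weight": semantic_weight,
        "keyword_weight": keyword_weight,
    }
    # Leave score_threshold out when unset so the documented default (None) applies.
    if score_threshold is not None:
        payload["score_threshold"] = score_threshold
    return payload


def rank_results(response):
    """Return the results sorted from highest to lowest score."""
    return sorted(response["results"], key=lambda result: result["score"], reverse=True)


payload = build_retrieval_payload("How do I reset the device?", top_k=3)

# Hand-written sample standing in for a decoded RetrievalResponseSchema.
sample_response = {
    "results": [
        {"text": "B", "score": 0.42, "metadata": {}},
        {"text": "A", "score": 0.91, "metadata": {}},
    ],
    "total_results": 2,
}
ranked = rank_results(sample_response)
print([result["text"] for result in ranked])  # prints: ['A', 'B']
```

The payload would be sent as the body of the POST to /rag/indices/{index_id}/retrieve/.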