RAG API

This document describes the PySpur API endpoints for managing Retrieval-Augmented Generation (RAG) components: document collections and vector indices.

Document Collections

Create Document Collection

Description: Creates a new document collection from uploaded files and metadata. The files are processed asynchronously in the background.
URL: /rag/collections/
Method: POST
Form Data:
files: Optional[List[UploadFile]]  # Files to upload (optional)
metadata: str  # JSON string containing collection configuration
Where metadata is a JSON string representing:
class DocumentCollectionCreateSchema:
    name: str  # Name of the collection
    description: str  # Description of the collection
    text_processing: ChunkingConfigSchema  # Configuration for text processing
Response Schema:
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed
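
Because this endpoint takes multipart form data, with metadata sent as a JSON string alongside the files, it can help to see what such a request looks like on the wire. The following is a minimal, standard-library-only sketch that builds (but does not send) the request. The base URL is an assumption, and text_processing is passed through as an opaque dict because the fields of ChunkingConfigSchema are not listed in this document.

```python
import json
import uuid
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your PySpur server address


def build_multipart(fields, files):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: dict of form field name -> string value
    files: list of (field_name, filename, content_bytes, content_type) tuples
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
            + value.encode()
            + b"\r\n"
        )
    for name, filename, data, ctype in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def create_collection_request(name, description, text_processing, files=()):
    """Build (but do not send) a POST /rag/collections/ request."""
    metadata = json.dumps(
        {"name": name, "description": description, "text_processing": text_processing}
    )
    # Every file is sent under the same repeated form field, "files".
    file_parts = [("files", fn, data, ctype) for fn, data, ctype in files]
    body, content_type = build_multipart({"metadata": metadata}, file_parts)
    return urllib.request.Request(
        f"{BASE_URL}/rag/collections/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(req) should return a DocumentCollectionResponseSchema as JSON. Because processing runs in the background, its status is likely to read processing at first rather than ready.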

List Document Collections

Description: Lists all document collections.
URL: /rag/collections/
Method: GET
Response Schema:
List[DocumentCollectionResponseSchema]
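
On the client side, the list response can be loaded into typed objects that mirror DocumentCollectionResponseSchema. The sketch below is one way to do that; the DocumentCollection dataclass and the helper names are hypothetical, and unknown keys are dropped so that extra server fields do not break parsing.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class DocumentCollection:
    """Hypothetical client-side mirror of DocumentCollectionResponseSchema."""
    id: str
    name: str
    description: str
    status: str  # processing, ready, or failed
    created_at: str
    updated_at: str
    document_count: int
    chunk_count: int
    error_message: Optional[str] = None


_KNOWN_FIELDS = {f.name for f in fields(DocumentCollection)}


def parse_collections(raw_json: str) -> List[DocumentCollection]:
    # Keep only documented fields so extra keys from the server are ignored.
    return [
        DocumentCollection(**{k: v for k, v in item.items() if k in _KNOWN_FIELDS})
        for item in json.loads(raw_json)
    ]


def failed_collections(collections: List[DocumentCollection]) -> List[Tuple[str, Optional[str]]]:
    """Return (name, error_message) for every collection whose processing failed."""
    return [(c.name, c.error_message) for c in collections if c.status == "failed"]
```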

Get Document Collection

Description: Gets details of a specific document collection.
URL: /rag/collections/{collection_id}/
Method: GET
Parameters:
collection_id: str  # ID of the document collection
Response Schema:
class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed

Delete Document Collection

Description: Deletes a document collection and its associated data.
URL: /rag/collections/{collection_id}/
Method: DELETE
Parameters:
collection_id: str  # ID of the document collection
Response: 200 OK with message

Get Collection Progress

Description: Gets the processing progress of a document collection.
URL: /rag/collections/{collection_id}/progress/
Method: GET
Parameters:
collection_id: str  # ID of the document collection
Response Schema:
class ProcessingProgressSchema:
    id: str  # ID of the collection
    status: str  # Status of processing
    progress: float  # Progress percentage (0-100)
    current_step: Optional[str]  # Current processing step
    total_files: Optional[int]  # Total number of files
    processed_files: Optional[int]  # Number of processed files
    total_chunks: Optional[int]  # Total number of chunks
    processed_chunks: Optional[int]  # Number of processed chunks
    error_message: Optional[str]  # Error message if processing failed
    created_at: str  # When processing started (ISO format)
    updated_at: str  # When processing was last updated (ISO format)
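
Since collections and indices are processed in the background, a client typically polls this endpoint until processing finishes. The sketch below is a generic polling loop under two assumptions: the fetch function is supplied by the caller, and the terminal status values are ready and failed, borrowed from the collection status field because ProcessingProgressSchema does not enumerate its own values.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_for_processing(
    fetch_progress: Callable[[], Dict[str, Any]],
    terminal_statuses: Iterable[str] = ("ready", "failed"),  # assumption, see lead-in
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a progress endpoint until its status is terminal or the timeout expires."""
    terminal = set(terminal_statuses)
    deadline = time.monotonic() + timeout
    while True:
        progress = fetch_progress()
        if progress["status"] in terminal:
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"status still {progress['status']!r} at {progress['progress']:.0f}%"
            )
        sleep(poll_interval)
```

In practice fetch_progress could be lambda: json.load(urllib.request.urlopen(url)) with url pointing at /rag/collections/{collection_id}/progress/. The same loop should also work for /rag/indices/{index_id}/progress/, which returns the same schema. Injecting sleep keeps the loop testable without real delays.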

Add Documents to Collection

Description: Adds documents to an existing collection. The documents are processed asynchronously in the background.
URL: /rag/collections/{collection_id}/documents/
Method: POST
Parameters:
collection_id: str  # ID of the document collection
Form Data:
files: List[UploadFile]  # List of files to upload
Response Schema:
DocumentCollectionResponseSchema  # Same as Get Document Collection

Get Collection Documents

Description: Gets all documents and their chunks for a collection.
URL: /rag/collections/{collection_id}/documents/
Method: GET
Parameters:
collection_id: str  # ID of the document collection
Response Schema:
List[DocumentWithChunksSchema]
Where DocumentWithChunksSchema contains:
class DocumentWithChunksSchema:
    id: str  # ID of the document
    title: str  # Title of the document
    metadata: Dict[str, Any]  # Metadata about the document
    chunks: List[DocumentChunkSchema]  # List of chunks in the document
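
A quick way to sanity-check chunking is to count chunks per document in this response. The sketch below treats each chunk as opaque, since the fields of DocumentChunkSchema are not listed here; the helper names are hypothetical. A document with zero chunks may point to an empty or unreadable file that is worth inspecting.

```python
from typing import Any, Dict, List, Tuple


def chunks_per_document(documents: List[Dict[str, Any]]) -> Dict[str, Tuple[str, int]]:
    """Map document ID -> (title, chunk count) for a parsed DocumentWithChunksSchema list."""
    return {doc["id"]: (doc["title"], len(doc["chunks"])) for doc in documents}


def documents_without_chunks(documents: List[Dict[str, Any]]) -> List[str]:
    """Return the titles of documents that produced no chunks at all."""
    return [doc["title"] for doc in documents if not doc["chunks"]]
```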

Delete Document from Collection

Description: Deletes a document from a collection.
URL: /rag/collections/{collection_id}/documents/{document_id}/
Method: DELETE
Parameters:
collection_id: str  # ID of the document collection
document_id: str  # ID of the document to delete
Response: 200 OK with message
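
Both IDs in this URL are path parameters. The sketch below builds the DELETE request and percent-encodes each ID, including any "/", so that an ID always stays a single path segment. That encoding is a defensive client-side choice, not something this document requires, and the base URL is an assumption.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your PySpur server address


def _segment(value: str) -> str:
    # Percent-encode a path parameter, including "/", so it stays one segment.
    return urllib.parse.quote(value, safe="")


def delete_document_request(collection_id: str, document_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one document in a collection."""
    url = (
        f"{BASE_URL}/rag/collections/{_segment(collection_id)}"
        f"/documents/{_segment(document_id)}/"
    )
    return urllib.request.Request(url, method="DELETE")
```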

Preview Chunk

Description: Previews how a document would be chunked with a given configuration.
URL: /rag/collections/preview_chunk/
Method: POST
Form Data:
file: UploadFile  # File to preview
chunking_config: str  # JSON string containing chunking configuration
Response Schema:
{
    "chunks": List[Dict[str, Any]],  # Preview of chunks
    "total_chunks": int  # Total number of chunks
}

Vector Indices

Create Vector Index

Description: Creates a new vector index from a document collection. The index is created asynchronously in the background.
URL: /rag/indices/
Method: POST
Request Payload:
class VectorIndexCreateSchema:
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    embedding: EmbeddingConfigSchema  # Configuration for embedding
Response Schema:
class VectorIndexResponseSchema:
    id: str  # ID of the vector index
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    status: str  # Status of the index (processing, ready, failed)
    created_at: str  # When the index was created (ISO format)
    updated_at: str  # When the index was last updated (ISO format)
    document_count: int  # Number of documents in the index
    chunk_count: int  # Number of chunks in the index
    embedding_model: str  # Name of the embedding model
    vector_db: str  # Name of the vector database
    error_message: Optional[str]  # Error message if processing failed
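
Unlike collection creation, this endpoint takes a request payload rather than form data. The sketch below assumes the payload is sent as a JSON body and builds the request without sending it. The base URL is an assumption, and embedding is passed through as an opaque dict because the fields of EmbeddingConfigSchema are not listed in this document.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # assumption: replace with your PySpur server address


def create_index_request(
    name: str, description: str, collection_id: str, embedding: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST /rag/indices/ request with a JSON payload."""
    payload = {
        "name": name,
        "description": description,
        "collection_id": collection_id,
        "embedding": embedding,  # EmbeddingConfigSchema fields go here
    }
    return urllib.request.Request(
        f"{BASE_URL}/rag/indices/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```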

List Vector Indices

Description: Lists all vector indices.
URL: /rag/indices/
Method: GET
Response Schema:
List[VectorIndexResponseSchema]
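
Since each index records the collection it was built from, the list response can be grouped to find which indices are usable for a given collection. The sketch below is a hypothetical helper that keeps only indices whose status is ready and groups their IDs by collection_id.

```python
from collections import defaultdict
from typing import Any, Dict, List


def ready_indices_by_collection(indices: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group the IDs of ready vector indices by the collection they were built from."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for index in indices:
        if index["status"] == "ready":
            grouped[index["collection_id"]].append(index["id"])
    return dict(grouped)
```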

Get Vector Index

Description: Gets details of a specific vector index.
URL: /rag/indices/{index_id}/
Method: GET
Parameters:
index_id: str  # ID of the vector index
Response Schema:
VectorIndexResponseSchema  # Same as Create Vector Index response

Delete Vector Index

Description: Deletes a vector index and its associated data.
URL: /rag/indices/{index_id}/
Method: DELETE
Parameters:
index_id: str  # ID of the vector index
Response: 200 OK with message

Get Index Progress

Description: Gets the processing progress of a vector index.
URL: /rag/indices/{index_id}/progress/
Method: GET
Parameters:
index_id: str  # ID of the vector index
Response Schema:
ProcessingProgressSchema  # Same as Get Collection Progress response

Retrieve from Index

Description: Retrieves relevant chunks from a vector index based on a query.
URL: /rag/indices/{index_id}/retrieve/
Method: POST
Parameters:
index_id: str  # ID of the vector index
Request Payload:
class RetrievalRequestSchema:
    query: str  # Query to search for
    top_k: Optional[int] = 5  # Number of results to return
    score_threshold: Optional[float] = None  # Minimum score threshold
    semantic_weight: Optional[float] = 1.0  # Weight for semantic search
    keyword_weight: Optional[float] = 0.0  # Weight for keyword search
Response Schema:
class RetrievalResponseSchema:
    results: List[RetrievalResultSchema]  # List of retrieval results
    total_results: int  # Total number of results
Where RetrievalResultSchema contains:
class RetrievalResultSchema:
    text: str  # Text of the chunk
    score: float  # Relevance score
    metadata: ChunkMetadataSchema  # Metadata about the chunk
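
The sketch below builds a retrieval request from the RetrievalRequestSchema fields and then orders the returned texts by score. It rests on a few assumptions. The body is sent as JSON. A higher score means more relevant, which is consistent with score_threshold being a minimum. The top_k check is a client-side guard rather than a documented server rule. score_threshold is omitted when None so the server default applies. This document does not specify how semantic_weight and keyword_weight are combined, so the sketch passes them through unchanged.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "http://localhost:8000"  # assumption: replace with your PySpur server address


def retrieve_request(
    index_id: str,
    query: str,
    top_k: int = 5,
    score_threshold: Optional[float] = None,
    semantic_weight: float = 1.0,
    keyword_weight: float = 0.0,
) -> urllib.request.Request:
    """Build (but do not send) a POST /rag/indices/{index_id}/retrieve/ request."""
    if top_k < 1:
        # Client-side sanity check; an assumption, not a documented server rule.
        raise ValueError("top_k must be at least 1")
    payload: Dict[str, Any] = {
        "query": query,
        "top_k": top_k,
        "semantic_weight": semantic_weight,
        "keyword_weight": keyword_weight,
    }
    if score_threshold is not None:
        payload["score_threshold"] = score_threshold
    url = f"{BASE_URL}/rag/indices/{urllib.parse.quote(index_id, safe='')}/retrieve/"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def texts_by_score(response: Dict[str, Any], min_score: float = float("-inf")) -> List[str]:
    """Return result texts, highest score first, keeping only scores >= min_score."""
    results = sorted(response["results"], key=lambda r: r["score"], reverse=True)
    return [r["text"] for r in results if r["score"] >= min_score]
```

Filtering with texts_by_score after the call is useful when you want to try different cutoffs locally without issuing a new request for each score_threshold value.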