RAG API

This document outlines the API endpoints for managing Retrieval-Augmented Generation (RAG) components in PySpur.

Document Collections

Create Document Collection

Description: Creates a new document collection from uploaded files and metadata. The files are processed asynchronously in the background.

URL: /rag/collections/

Method: POST

Form Data:

files: List[UploadFile]  # List of files to upload (optional)
metadata: str  # JSON string containing collection configuration

Where metadata is a JSON string representing:

class DocumentCollectionCreateSchema:
    name: str  # Name of the collection
    description: str  # Description of the collection
    text_processing: ChunkingConfigSchema  # Configuration for text processing

Response Schema:

class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed

List Document Collections

Description: Lists all document collections.

URL: /rag/collections/

Method: GET

Response Schema:

List[DocumentCollectionResponseSchema]

Get Document Collection

Description: Gets details of a specific document collection.

URL: /rag/collections/{collection_id}/

Method: GET

Parameters:

collection_id: str  # ID of the document collection

Response Schema:

class DocumentCollectionResponseSchema:
    id: str  # ID of the document collection
    name: str  # Name of the collection
    description: str  # Description of the collection
    status: str  # Status of the collection (processing, ready, failed)
    created_at: str  # When the collection was created (ISO format)
    updated_at: str  # When the collection was last updated (ISO format)
    document_count: int  # Number of documents in the collection
    chunk_count: int  # Number of chunks in the collection
    error_message: Optional[str]  # Error message if processing failed

Delete Document Collection

Description: Deletes a document collection and its associated data.

URL: /rag/collections/{collection_id}/

Method: DELETE

Parameters:

collection_id: str  # ID of the document collection

Response: 200 OK with message

Get Collection Progress

Description: Gets the processing progress of a document collection.

URL: /rag/collections/{collection_id}/progress/

Method: GET

Parameters:

collection_id: str  # ID of the document collection

Response Schema:

class ProcessingProgressSchema:
    id: str  # ID of the collection
    status: str  # Status of processing
    progress: float  # Progress percentage (0-100)
    current_step: Optional[str]  # Current processing step
    total_files: Optional[int]  # Total number of files
    processed_files: Optional[int]  # Number of processed files
    total_chunks: Optional[int]  # Total number of chunks
    processed_chunks: Optional[int]  # Number of processed chunks
    error_message: Optional[str]  # Error message if processing failed
    created_at: str  # When processing started (ISO format)
    updated_at: str  # When processing was last updated (ISO format)

Add Documents to Collection

Description: Adds documents to an existing collection. The documents are processed asynchronously in the background.

URL: /rag/collections/{collection_id}/documents/

Method: POST

Parameters:

collection_id: str  # ID of the document collection

Form Data:

files: List[UploadFile]  # List of files to upload

Response Schema:

class DocumentCollectionResponseSchema:
    # Same as Get Document Collection

Get Collection Documents

Description: Gets all documents and their chunks for a collection.

URL: /rag/collections/{collection_id}/documents/

Method: GET

Parameters:

collection_id: str  # ID of the document collection

Response Schema:

List[DocumentWithChunksSchema]

Where DocumentWithChunksSchema contains:

class DocumentWithChunksSchema:
    id: str  # ID of the document
    title: str  # Title of the document
    metadata: Dict[str, Any]  # Metadata about the document
    chunks: List[DocumentChunkSchema]  # List of chunks in the document

Delete Document from Collection

Description: Deletes a document from a collection.

URL: /rag/collections/{collection_id}/documents/{document_id}/

Method: DELETE

Parameters:

collection_id: str  # ID of the document collection
document_id: str  # ID of the document to delete

Response: 200 OK with message

Preview Chunk

Description: Previews how a document would be chunked with a given configuration.

URL: /rag/collections/preview_chunk/

Method: POST

Form Data:

file: UploadFile  # File to preview
chunking_config: str  # JSON string containing chunking configuration

Response Schema:

{
    "chunks": List[Dict[str, Any]],  # Preview of chunks
    "total_chunks": int  # Total number of chunks
}

Vector Indices

Create Vector Index

Description: Creates a new vector index from a document collection. The index is created asynchronously in the background.

URL: /rag/indices/

Method: POST

Request Payload:

class VectorIndexCreateSchema:
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    embedding: EmbeddingConfigSchema  # Configuration for embedding

Response Schema:

class VectorIndexResponseSchema:
    id: str  # ID of the vector index
    name: str  # Name of the index
    description: str  # Description of the index
    collection_id: str  # ID of the document collection
    status: str  # Status of the index (processing, ready, failed)
    created_at: str  # When the index was created (ISO format)
    updated_at: str  # When the index was last updated (ISO format)
    document_count: int  # Number of documents in the index
    chunk_count: int  # Number of chunks in the index
    embedding_model: str  # Name of the embedding model
    vector_db: str  # Name of the vector database
    error_message: Optional[str]  # Error message if processing failed

List Vector Indices

Description: Lists all vector indices.

URL: /rag/indices/

Method: GET

Response Schema:

List[VectorIndexResponseSchema]

Get Vector Index

Description: Gets details of a specific vector index.

URL: /rag/indices/{index_id}/

Method: GET

Parameters:

index_id: str  # ID of the vector index

Response Schema:

class VectorIndexResponseSchema:
    # Same as Create Vector Index response

Delete Vector Index

Description: Deletes a vector index and its associated data.

URL: /rag/indices/{index_id}/

Method: DELETE

Parameters:

index_id: str  # ID of the vector index

Response: 200 OK with message

Get Index Progress

Description: Gets the processing progress of a vector index.

URL: /rag/indices/{index_id}/progress/

Method: GET

Parameters:

index_id: str  # ID of the vector index

Response Schema:

class ProcessingProgressSchema:
    # Same as Get Collection Progress response

Retrieve from Index

Description: Retrieves relevant chunks from a vector index based on a query.

URL: /rag/indices/{index_id}/retrieve/

Method: POST

Parameters:

index_id: str  # ID of the vector index

Request Payload:

class RetrievalRequestSchema:
    query: str  # Query to search for
    top_k: Optional[int] = 5  # Number of results to return
    score_threshold: Optional[float] = None  # Minimum score threshold
    semantic_weight: Optional[float] = 1.0  # Weight for semantic search
    keyword_weight: Optional[float] = 0.0  # Weight for keyword search

Response Schema:

class RetrievalResponseSchema:
    results: List[RetrievalResultSchema]  # List of retrieval results
    total_results: int  # Total number of results

Where RetrievalResultSchema contains:

class RetrievalResultSchema:
    text: str  # Text of the chunk
    score: float  # Relevance score
    metadata: ChunkMetadataSchema  # Metadata about the chunk