Evaluations API

This document outlines the API endpoints for managing evaluations in PySpur.

List Available Evaluations

Description: Lists all available evaluations by scanning the tasks directory for YAML files. Returns metadata for each evaluation, including its name, description, type, and number of samples.

URL: /evals/

Method: GET

Response Schema:

List[Dict[str, Any]]

Each dictionary in the list contains:

{
    "name": str,  # Name of the evaluation
    "description": str,  # Description of the evaluation
    "type": str,  # Type of evaluation
    "num_samples": str,  # Number of samples in the evaluation
    "paper_link": str,  # Link to the paper describing the evaluation
    "file_name": str  # Name of the YAML file
}
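
Example: the listing can be fetched with a plain HTTP GET. This is a minimal sketch using the requests library; the base URL http://localhost:8000 is an assumption and should be replaced with your PySpur deployment's address.

import requests

BASE_URL = "http://localhost:8000"  # assumed local PySpur instance

# Fetch metadata for every evaluation found in the tasks directory
response = requests.get(f"{BASE_URL}/evals/")
response.raise_for_status()

for evaluation in response.json():
    print(f"{evaluation['name']} ({evaluation['num_samples']} samples): {evaluation['description']}")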

Launch Evaluation

Description: Launches an evaluation job by triggering the evaluator with the specified evaluation configuration. The evaluation runs asynchronously in the background.

URL: /evals/launch/

Method: POST

Request Payload:

class EvalRunRequest:
    eval_name: str  # Name of the evaluation to run
    workflow_id: str  # ID of the workflow to evaluate
    output_variable: str  # Output variable to evaluate
    num_samples: int = 100  # Number of random samples to evaluate

Response Schema:

class EvalRunResponse:
    run_id: str  # ID of the evaluation run
    eval_name: str  # Name of the evaluation
    workflow_id: str  # ID of the workflow being evaluated
    status: EvalRunStatusEnum  # Status of the evaluation run
    start_time: datetime  # When the evaluation started
    end_time: Optional[datetime]  # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
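
Example: launching an evaluation is a POST with an EvalRunRequest body. The sketch below is illustrative; the base URL, eval_name, workflow_id, and output_variable values are placeholders, and eval_name should match an evaluation returned by the listing endpoint.

import requests

BASE_URL = "http://localhost:8000"  # assumed local PySpur instance

payload = {
    "eval_name": "gsm8k",         # placeholder: an evaluation name from /evals/
    "workflow_id": "wf_123",      # placeholder: ID of an existing workflow
    "output_variable": "answer",  # placeholder: workflow output to score
    "num_samples": 50,            # optional; defaults to 100
}

response = requests.post(f"{BASE_URL}/evals/launch/", json=payload)
response.raise_for_status()

run = response.json()
print(f"Launched run {run['run_id']} with status {run['status']}")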

Get Evaluation Run Status

Description: Gets the status of a specific evaluation run, including results if the evaluation has completed.

URL: /evals/runs/{eval_run_id}

Method: GET

Path Parameters:

eval_run_id: str  # ID of the evaluation run

Response Schema:

class EvalRunResponse:
    run_id: str  # ID of the evaluation run
    eval_name: str  # Name of the evaluation
    workflow_id: str  # ID of the workflow being evaluated
    status: EvalRunStatusEnum  # Status of the evaluation run
    start_time: datetime  # When the evaluation started
    end_time: Optional[datetime]  # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
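
Example: because the evaluation runs in the background, clients typically poll this endpoint until the run reaches a terminal state. The sketch below assumes the same local base URL and treats "COMPLETED" and "FAILED" as terminal EvalRunStatusEnum values; check the enum in your PySpur version for the exact names.

import time
import requests

BASE_URL = "http://localhost:8000"  # assumed local PySpur instance
eval_run_id = "run_abc123"          # placeholder: run_id returned by /evals/launch/

while True:
    response = requests.get(f"{BASE_URL}/evals/runs/{eval_run_id}")
    response.raise_for_status()
    run = response.json()

    if run["status"] in ("COMPLETED", "FAILED"):  # assumed terminal statuses
        break
    time.sleep(10)  # wait before polling again

print(run["results"])  # populated only once the run has completed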

List Evaluation Runs

Description: Lists all evaluation runs, ordered by start time descending.

URL: /evals/runs/

Method: GET

Response Schema:

List[EvalRunResponse]

Where EvalRunResponse contains:

class EvalRunResponse:
    run_id: str  # ID of the evaluation run
    eval_name: str  # Name of the evaluation
    workflow_id: str  # ID of the workflow being evaluated
    status: EvalRunStatusEnum  # Status of the evaluation run
    start_time: datetime  # When the evaluation started
    end_time: Optional[datetime]  # When the evaluation ended (if completed)
    results: Optional[Dict[str, Any]]  # Results of the evaluation (if completed)
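
Example: a quick way to review recent activity is to list all runs and print a one-line summary of each. This sketch again assumes a local deployment at http://localhost:8000.

import requests

BASE_URL = "http://localhost:8000"  # assumed local PySpur instance

response = requests.get(f"{BASE_URL}/evals/runs/")
response.raise_for_status()

# Runs are returned newest-first (ordered by start_time descending)
for run in response.json():
    print(f"{run['run_id']}  {run['eval_name']}  {run['status']}  started {run['start_time']}")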