Evals
Learn how to evaluate, test, and validate Foldspace agent actions by running natural language prompts, verifying schema mapping, and refining action accuracy
The Evaluations tab is where you test whether a user’s natural language prompt is correctly translated into the action you’ve defined, and whether the right input parameters are extracted based on the schema.
Goal
- Confirm the AI routes the user intent to the correct action.
- Check that the schema fields (parameters) are filled correctly from the prompt.
- Validate how the agent handles the server response using the mock response.
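The examples on this page assume a Create Campaign action. A minimal sketch of what its parameter schema and a required-field check might look like (the field names match the examples below, but the schema shape here is illustrative, not Foldspace’s actual format):

```python
# Hypothetical parameter schema for a "Create Campaign" action.
# The dict shape is illustrative only; Foldspace defines schemas in its own UI.
CREATE_CAMPAIGN_SCHEMA = {
    "campaignName": {"type": "string", "required": True},
    "startDate":    {"type": "string", "required": True},   # ISO 8601 date
    "endDate":      {"type": "string", "required": False},
    "budget":       {"type": "number", "required": True},
    "status":       {"type": "string", "required": False},  # e.g. ACTIVE, PAUSED
}

def missing_required(schema: dict, extracted: dict) -> list[str]:
    """Return the required fields the agent failed to extract from the prompt."""
    return [name for name, spec in schema.items()
            if spec["required"] and name not in extracted]

# A prompt like "I want to create a campaign called 'Holiday Promo'."
# yields only the name, so validation should flag the missing fields:
print(missing_required(CREATE_CAMPAIGN_SCHEMA, {"campaignName": "Holiday Promo"}))
# → ['startDate', 'budget']
```

This is the distinction the evaluations exercise: which fields the schema requires versus which ones a given prompt actually supplies.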
Generating Test Prompts
- Open the Evaluations tab.
- Click Generate Prompts to auto-create a set of test prompts based on your action schema.
- Example (for a Create Campaign action):
  - “Create a new campaign named ‘Summer Sale’ starting from July 1 to July 31 with a budget of 5000 and status ACTIVE.”
  - “I want to create a campaign called ‘Holiday Promo’.”
  - “Create campaign starting on August 1 with a budget of 2000 and status PAUSED.”
You can also click + Add New Test to write your own custom prompt.
Running a Test
- Click Run next to a test prompt.
- The agent will process the input and attempt to:
- Match the correct action.
- Extract values for each field in the Schema.
- Return the Response Mock you defined.
Reviewing Results
- On the right-hand side, you’ll see the Agent Testing output.
  - Example:
    Campaign Created: Spring Sale
    Name: Spring Sale
    Start Date: 2024-05-01
    End Date: 2024-05-31
    Budget: 10,000
    Status: ACTIVE
- In the Agent Testing panel, click the action link (e.g., Create Campaign was executed).
  - This opens the Arguments view, which shows the raw schema extraction:
    { "startDate": "2024-05-01", "endDate": "2024-05-31", "budget": 10000, "status": "ACTIVE", "campaignName": "Spring Sale" }

This lets you confirm that user language (e.g., “budget of 10k”) is mapped into structured schema fields.
Best Practices
- Create tests that cover:
  - All required fields provided (happy path).
  - Only required fields provided (minimal input).
  - Missing required fields (should fail validation).
  - Partial optional fields (some extras given, others missing).
- Update your schema descriptions or instructions if the AI is misinterpreting user prompts.
- Always re-run evaluations after editing the schema or response mock.
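The coverage matrix above lends itself to table-driven tests. A sketch of that idea (the prompts’ extracted arguments and the required-field list are invented for illustration):

```python
# Table-driven evaluation cases mirroring the four coverage categories above.
# The required-field set and extracted arguments are illustrative.
REQUIRED = {"campaignName", "startDate", "budget"}

CASES = [
    # (description, extracted arguments, should pass validation)
    ("happy path",       {"campaignName": "Summer Sale", "startDate": "2024-07-01",
                          "endDate": "2024-07-31", "budget": 5000, "status": "ACTIVE"}, True),
    ("minimal input",    {"campaignName": "Promo", "startDate": "2024-08-01",
                          "budget": 2000}, True),
    ("missing required", {"campaignName": "Holiday Promo"}, False),
    ("partial optional", {"campaignName": "Q4 Push", "startDate": "2024-10-01",
                          "budget": 1500, "status": "PAUSED"}, True),
]

def passes_validation(args: dict) -> bool:
    """True when every required field was extracted."""
    return REQUIRED <= args.keys()

for name, args, expected in CASES:
    outcome = "PASS" if passes_validation(args) == expected else "FAIL"
    print(f"{name}: {outcome}")
```

Keeping the cases in a table like this makes it easy to re-run the whole matrix after every schema or mock edit.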
✅ Use Evaluations before publishing to ensure your action works reliably across different ways a user might phrase their request.