Preparing Conversational datasets
While it’s recommended to generate synthetic data to achieve both quantity and quality—especially for testing edge cases and building a diverse dataset—you may already have a manually curated, highly focused dataset or collected important data from failing responses or conversations in production.
It’s best practice to continually monitor your LLM application in production and add failing responses to new evaluation datasets for iterative evaluation in staging environments. To learn more about this feature on Confident AI, visit this section of the tutorial.
In this section, we'll be utilizing conversations with our chatbot to curate a conversational evaluation dataset for our medical chatbot application. We'll be covering:
- How to create conversational test cases from existing data
- Creating a Conversational Dataset
Login to Confident AI
As usual, let's begin by logging into Confident AI. In this section, we'll be leveraging the platform to view, edit, and leave feedback on our conversational datasets. Run the code below in your CLI to login:
deepeval login
1. Creating Conversational Test Cases
Let's say we've curated conversational data by having humans chat with our medical chatbot and saving these interactions to a JSON file. A conversational data might look like this:
conversation = [
{
"input": "I’ve been feeling fatigued and experiencing mild headaches for the past few days. Can I also know the available appointment slots for the next two days?",
"response": "Let's start by creating an appointment and recording your symptoms. Could you please provide more specific details about your symptoms? For example, are you experiencing any other symptoms like dizziness, nausea, or changes in vision? Also, do you have any known medical conditions or are you taking any medications? Once I have more details, I'll proceed with the diagnosis and check the available appointment slots for you."
},
{
"input": "I have been feeling fatigued and experiencing mild headaches for the past few days. Occasionally, I feel a bit dizzy, especially when standing up quickly. While there’s no nausea, I have noticed slight blurriness in my vision when the headache intensifies. I don’t have any known medical conditions, although I have experienced occasional migraines in the past. Currently, I am not taking any medications. Let me know if you need additional details!",
"response": "Your symptoms have been recorded, and a preliminary diagnosis suggests that they may be related to migraines, especially given your history. It's recommended to consult a healthcare professional for a thorough evaluation and to discuss potential preventive treatments. Could you please provide your name, the date you would like the appointment, and your email address so I can finalize the appointment details for you?"
},
...
]
We'll be using DeepEval to convert the conversation into a ConversationalTestCase
. Essentially, ConversationalTestCases
are collections of LLM test cases, with each test case referred to as a "turn." Therefore, they are designed to handle multi-turn conversations. Here’s how you can do it:
from deepeval.test_case import LLMTestCase, ConversationalTestCase
# Create Test Cases for each Turn
turns = [
LLMTestCase(
input=turn["input"],
actual_output=turn["response"]
)
for turn in conversation
]
# Define Conversational Test Case
conversation_test_case = ConversationalTestCase(turns=turns)
2. Creating a Conversational Dataset
To create a conversational dataset, simply pass in the list of conversational test cases to the EvaluationDataset
object.
from deepeval.dataset import EvaluationDataset
dataset = EvaluationDataset(conversational_goldens=[conversation_test_case])