Evaluate a model's response
In this quickstart, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the Microsoft.Extensions.AI.Evaluation libraries.
Note
This quickstart demonstrates the simplest usage of the evaluation API. Notably, it doesn't demonstrate use of the response caching and reporting functionality, which is important if you're authoring unit tests that run as part of an "offline" evaluation pipeline. The scenario shown in this quickstart is suitable for use cases such as "online" evaluation of AI responses within production code and logging scores to telemetry, where caching and reporting aren't relevant. For a tutorial that demonstrates the caching and reporting functionality, see Tutorial: Evaluate a model's response with response caching and reporting.
Prerequisites
- .NET 8 or a later version
- Visual Studio Code (optional)
Configure the AI service
To provision an Azure OpenAI service and model using the Azure portal, complete the steps in the Create and deploy an Azure OpenAI Service resource article. In the "Deploy a model" step, select the `gpt-4o` model.
Create the test app
Complete the following steps to create an MSTest project that connects to the `gpt-4o` model you deployed.
In a terminal window, navigate to the directory where you want to create your app, and create a new MSTest app with the `dotnet new` command:

```dotnetcli
dotnet new mstest -o TestAI
```
Navigate to the `TestAI` directory, and add the necessary packages to your app:

```dotnetcli
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
```
Run the following commands to add app secrets for your Azure OpenAI endpoint, model name, and tenant ID:

```dotnetcli
dotnet user-secrets init
dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-azure-openai-endpoint>
dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
dotnet user-secrets set AZURE_TENANT_ID <your-tenant-id>
```
(Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the DefaultAzureCredential.)
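If you skip the tenant ID, a minimal sketch of the credential setup looks like the following. This is illustrative only; the full client setup appears later in the GetAzureOpenAIChatConfiguration method.

```csharp
// A minimal sketch, assuming no tenant ID is required:
// the parameterless DefaultAzureCredential picks up your
// signed-in identity (for example, from the Azure CLI).
AzureOpenAIClient azureClient =
    new(
        new Uri(endpoint),
        new DefaultAzureCredential());
```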
Open the new app in your editor of choice.
Add the test app code
Rename the Test1.cs file to MyTests.cs, and then open the file and rename the class to `MyTests`.

Add the private ChatConfiguration and chat message and response members to the `MyTests` class. The `s_messages` field is a list that contains two ChatMessage objects: one sets the behavior of the chat bot, and the other is the question from the user.

```csharp
private static ChatConfiguration? s_chatConfiguration;

private static IList<ChatMessage> s_messages = [
    new ChatMessage(
        ChatRole.System,
        """
        You're an AI assistant that can answer questions related to astronomy.
        Keep your responses concise and try to stay under 100 words.
        Use the imperial measurement system for all measurements in your response.
        """),
    new ChatMessage(
        ChatRole.User,
        "How far is the planet Venus from Earth at its closest and furthest points?")];

private static ChatResponse s_response = new();
```
Add the `InitializeAsync` method to the `MyTests` class.

```csharp
[ClassInitialize]
public static async Task InitializeAsync(TestContext _)
{
    /// Set up the <see cref="ChatConfiguration"/>,
    /// which includes the <see cref="IChatClient"/> that the
    /// evaluator uses to communicate with the model.
    s_chatConfiguration = GetAzureOpenAIChatConfiguration();

    var chatOptions = new ChatOptions
    {
        Temperature = 0.0f,
        ResponseFormat = ChatResponseFormat.Text
    };

    // Fetch the response to be evaluated
    // and store it in a static variable.
    s_response = await s_chatConfiguration.ChatClient.GetResponseAsync(s_messages, chatOptions);
}
```
This method accomplishes the following tasks:
- Sets up the ChatConfiguration.
- Sets the ChatOptions, including the Temperature and the ResponseFormat.
- Fetches the response to be evaluated by calling GetResponseAsync(IEnumerable<ChatMessage>, ChatOptions, CancellationToken), and stores it in a static variable.
Add the `GetAzureOpenAIChatConfiguration` method, which creates the IChatClient that the evaluator uses to communicate with the model.

```csharp
private static ChatConfiguration GetAzureOpenAIChatConfiguration()
{
    IConfigurationRoot config = new ConfigurationBuilder().AddUserSecrets<MyTests>().Build();

    string endpoint = config["AZURE_OPENAI_ENDPOINT"];
    string model = config["AZURE_OPENAI_GPT_NAME"];
    string tenantId = config["AZURE_TENANT_ID"];

    // Get a chat client for the Azure OpenAI endpoint.
    AzureOpenAIClient azureClient =
        new(
            new Uri(endpoint),
            new DefaultAzureCredential(new DefaultAzureCredentialOptions() { TenantId = tenantId }));
    IChatClient client = azureClient.AsChatClient(modelId: model);

    return new ChatConfiguration(client);
}
```
Add a test method to evaluate the model's response.
```csharp
[TestMethod]
public async Task TestCoherence()
{
    IEvaluator coherenceEvaluator = new CoherenceEvaluator();
    EvaluationResult result = await coherenceEvaluator.EvaluateAsync(
        s_messages,
        s_response,
        s_chatConfiguration);

    /// Retrieve the score for coherence from the <see cref="EvaluationResult"/>.
    NumericMetric coherence = result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);

    // Validate the default interpretation
    // for the returned coherence metric.
    Assert.IsFalse(coherence.Interpretation!.Failed);
    Assert.IsTrue(coherence.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);

    // Validate that no diagnostics are present
    // on the returned coherence metric.
    Assert.IsFalse(coherence.ContainsDiagnostics());
}
```
This method does the following:
- Invokes the CoherenceEvaluator to evaluate the coherence of the response. The EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken) method returns an EvaluationResult that contains a NumericMetric. A `NumericMetric` contains a numeric value that's typically used to represent numeric scores that fall within a well-defined range.
- Retrieves the coherence score from the EvaluationResult.
- Validates the default interpretation for the returned coherence metric. Evaluators can include a default interpretation for the metrics they return. You can also change the default interpretation to suit your specific requirements, if needed; a sketch of that appears after this list.
- Validates that no diagnostics are present on the returned coherence metric. Evaluators can include diagnostics on the metrics they return to indicate errors, warnings, or other exceptional conditions encountered during evaluation.
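The following sketch shows one way you might replace the default interpretation with a stricter rule that fails any coherence score below 4. It's illustrative only and assumes the EvaluationMetricInterpretation constructor parameters shown here (rating, failed, reason); check the Microsoft.Extensions.AI.Evaluation API reference for the exact signature.

```csharp
// A hypothetical stricter interpretation (assumed constructor shape):
// treat any coherence score below 4 as a failure for this scenario.
double score = coherence.Value ?? 0;
coherence.Interpretation = new EvaluationMetricInterpretation(
    rating: score >= 4 ? EvaluationRating.Good : EvaluationRating.Unacceptable,
    failed: score < 4,
    reason: score < 4 ? "Coherence scores below 4 are unacceptable." : null);
```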
Run the test/evaluation
Run the test using your preferred test workflow, for example, by using the CLI command `dotnet test` or through Test Explorer.
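For example, to run only the coherence test from the command line (the `~` operator in the filter matches a substring of the fully qualified test name):

```dotnetcli
dotnet test --filter "FullyQualifiedName~MyTests.TestCoherence"
```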
Next steps
- Evaluate the responses from different OpenAI models.
- Add response caching and reporting to your evaluation code. For more information, see Tutorial: Evaluate a model's response with response caching and reporting.