Evaluate a model's response
In this quickstart, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the Microsoft.Extensions.AI.Evaluation libraries.
Note
This quickstart demonstrates the simplest usage of the evaluation API. Notably, it doesn't demonstrate use of the response caching and reporting functionality, which is important if you're authoring unit tests that run as part of an "offline" evaluation pipeline. The scenario shown in this quickstart is suitable for use cases such as "online" evaluation of AI responses within production code and logging scores to telemetry, where caching and reporting aren't relevant. For a tutorial that demonstrates the caching and reporting functionality, see Tutorial: Evaluate a model's response with response caching and reporting.
Prerequisites
- .NET 8 or a later version
- Visual Studio Code (optional)
Configure the AI service
To provision an Azure OpenAI service and model using the Azure portal, complete the steps in the Create and deploy an Azure OpenAI Service resource article. In the "Deploy a model" step, select the `gpt-4o` model.
Create the test app
Complete the following steps to create an MSTest project that connects to the `gpt-4o` model you deployed.
In a terminal window, navigate to the directory where you want to create your app, and create a new MSTest app with the `dotnet new` command:

```dotnetcli
dotnet new mstest -o TestAI
```
Navigate to the `TestAI` directory, and add the necessary packages to your app:

```dotnetcli
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
dotnet add package Microsoft.Extensions.Configuration
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
```
Run the following commands to add app secrets for your Azure OpenAI endpoint, model name, and tenant ID:

```dotnetcli
dotnet user-secrets init
dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-azure-openai-endpoint>
dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
dotnet user-secrets set AZURE_TENANT_ID <your-tenant-id>
```
(Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the DefaultAzureCredential.)
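If you skip the tenant ID, a minimal sketch of the credential setup looks like the following. This is illustrative only; the full client setup appears later in the GetAzureOpenAIChatConfiguration method.

```csharp
// A minimal sketch, assuming no tenant ID is required:
// the parameterless DefaultAzureCredential picks up your
// signed-in identity (for example, from the Azure CLI).
AzureOpenAIClient azureClient =
    new(
        new Uri(endpoint),
        new DefaultAzureCredential());
```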
Open the new app in your editor of choice.
Add the test app code
Rename the Test1.cs file to MyTests.cs, and then open the file and rename the class to `MyTests`.

Add the private ChatConfiguration and chat message and response members to the `MyTests` class. The `s_messages` field is a list that contains two ChatMessage objects: one sets the behavior of the chat bot, and the other is the question from the user.

```csharp
private static ChatConfiguration? s_chatConfiguration;

private static IList<ChatMessage> s_messages = [
    new ChatMessage(
        ChatRole.System,
        """
        You're an AI assistant that can answer questions related to astronomy.
        Keep your responses concise and try to stay under 100 words.
        Use the imperial measurement system for all measurements in your response.
        """),
    new ChatMessage(
        ChatRole.User,
        "How far is the planet Venus from Earth at its closest and furthest points?")];

private static ChatResponse s_response = new();
```
Add the `InitializeAsync` method to the `MyTests` class.

```csharp
[ClassInitialize]
public static async Task InitializeAsync(TestContext _)
{
    /// Set up the <see cref="ChatConfiguration"/>,
    /// which includes the <see cref="IChatClient"/> that the
    /// evaluator uses to communicate with the model.
    s_chatConfiguration = GetAzureOpenAIChatConfiguration();

    var chatOptions = new ChatOptions
    {
        Temperature = 0.0f,
        ResponseFormat = ChatResponseFormat.Text
    };

    // Fetch the response to be evaluated
    // and store it in a static variable.
    s_response = await s_chatConfiguration.ChatClient.GetResponseAsync(s_messages, chatOptions);
}
```
This method accomplishes the following tasks:
- Sets up the ChatConfiguration.
- Sets the ChatOptions, including the Temperature and the ResponseFormat.
- Fetches the response to be evaluated by calling GetResponseAsync(IEnumerable<ChatMessage>, ChatOptions, CancellationToken), and stores it in a static variable.
Add the `GetAzureOpenAIChatConfiguration` method, which creates the IChatClient that the evaluator uses to communicate with the model.

```csharp
private static ChatConfiguration GetAzureOpenAIChatConfiguration()
{
    IConfigurationRoot config = new ConfigurationBuilder().AddUserSecrets<MyTests>().Build();

    string endpoint = config["AZURE_OPENAI_ENDPOINT"];
    string model = config["AZURE_OPENAI_GPT_NAME"];
    string tenantId = config["AZURE_TENANT_ID"];

    // Get a chat client for the Azure OpenAI endpoint.
    AzureOpenAIClient azureClient =
        new(
            new Uri(endpoint),
            new DefaultAzureCredential(new DefaultAzureCredentialOptions() { TenantId = tenantId }));
    IChatClient client = azureClient.AsChatClient(modelId: model);

    return new ChatConfiguration(client);
}
```
Add a test method to evaluate the model's response.
```csharp
[TestMethod]
public async Task TestCoherence()
{
    IEvaluator coherenceEvaluator = new CoherenceEvaluator();
    EvaluationResult result = await coherenceEvaluator.EvaluateAsync(
        s_messages,
        s_response,
        s_chatConfiguration);

    /// Retrieve the score for coherence from the <see cref="EvaluationResult"/>.
    NumericMetric coherence = result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);

    // Validate the default interpretation
    // for the returned coherence metric.
    Assert.IsFalse(coherence.Interpretation!.Failed);
    Assert.IsTrue(coherence.Interpretation.Rating is EvaluationRating.Good or EvaluationRating.Exceptional);

    // Validate that no diagnostics are present
    // on the returned coherence metric.
    Assert.IsFalse(coherence.ContainsDiagnostics());
}
```
This method does the following:
- Invokes the CoherenceEvaluator to evaluate the coherence of the response. The EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken) method returns an EvaluationResult that contains a NumericMetric. A `NumericMetric` contains a numeric value that's typically used to represent numeric scores that fall within a well-defined range.
- Retrieves the coherence score from the EvaluationResult.
- Validates the default interpretation for the returned coherence metric. Evaluators can include a default interpretation for the metrics they return. You can also change the default interpretation to suit your specific requirements, if needed; a sketch of that appears after this list.
- Validates that no diagnostics are present on the returned coherence metric. Evaluators can include diagnostics on the metrics they return to indicate errors, warnings, or other exceptional conditions encountered during evaluation.
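The following sketch shows one way you might replace the default interpretation with a stricter rule that fails any coherence score below 4. It's illustrative only and assumes the EvaluationMetricInterpretation constructor parameters shown here (rating, failed, reason); check the Microsoft.Extensions.AI.Evaluation API reference for the exact signature.

```csharp
// A hypothetical stricter interpretation (assumed constructor shape):
// treat any coherence score below 4 as a failure for this scenario.
double score = coherence.Value ?? 0;
coherence.Interpretation = new EvaluationMetricInterpretation(
    rating: score >= 4 ? EvaluationRating.Good : EvaluationRating.Unacceptable,
    failed: score < 4,
    reason: score < 4 ? "Coherence scores below 4 are unacceptable." : null);
```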
Run the test/evaluation
Run the test using your preferred test workflow, for example, by using the CLI command `dotnet test` or through Test Explorer.
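For example, to run only the coherence test from the command line (the `~` operator in the filter matches a substring of the fully qualified test name):

```dotnetcli
dotnet test --filter "FullyQualifiedName~MyTests.TestCoherence"
```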
Next steps
- Evaluate the responses from different OpenAI models.
- Add response caching and reporting to your evaluation code. For more information, see Tutorial: Evaluate a model's response with response caching and reporting.