Generative AI strategies for math and data queries

11/20/2024

Currently, there are varying opinions on how well tools such as ChatGPT or other language models handle math and data queries. In this article, we're going to identify strategies and set expectations when building agents in Copilot Studio that handle math and data queries.

Definitions of math and data queries in this article

The goal of this article isn't to evaluate whether generative AI can assist with calculating the perimeter of a rectangle or the diameter of a circle. Math, in this context, refers to typical natural language questions that someone would ask an agent. These questions assume the AI can aggregate and interpret sums, averages, and trends across the knowledge sources or data tables used to ground the models.

The desired outcome, in this case, isn't to answer a math equation. Instead, it's to help the user evaluate or understand data more efficiently. When users are looking for deep data analytics, such as looking for advanced predictive or prescriptive analytics, a custom agent is typically not the tool of choice. However, there are several agents in the Microsoft stack that are more directly focused on analytics. For example, the following agents supplement the language model with Microsoft application code for this purpose:

Data aggregates in Natural Language Understanding

When we ground an agent in our own knowledge sources, we're simplifying the discovery of information asked by a user in natural language. Keep in mind that language models are designed to predict the next word in a sequence rather than perform rigorous math. However, they can still provide useful insights and explanations. These insights are faster for information discovery than browsing keyword search results or manually scrolling through all the records in a table.

Copilot Studio agents can scan knowledge sources on our behalf. These agents summarize answers across topics, actions, and knowledge sources, whether they involve aggregates of numeric data or not. However, as we ground the models with our data, we must contextualize the data required for the AI to respond. With this understanding, we know when to provide more context or topic nodes. Extra context is especially relevant when the data sources contain niche terms or highly technical verbiage. The following are examples of data queries that involve mathematical expressions:

Example questions	Things to consider
How many of our customers in North America purchased product X?	This prompt involves multiple structured tables in a relational database, and commonly requires scanning hundreds or even thousands of records.
What was the total cost impact for repair work items after the hurricane?	This prompt involves a table of items repaired, with a column for cost impact for each work item. If the table also contains repairs that are unrelated to the hurricane, then a category or reason column is needed for the AI to know which work items are related to the hurricane.
Which of our customers submitted the most requests for change?	This prompt involves a table with change requests, and a related table with customer names. The query first counts requests by customer, then returns the customer with the highest number of requests (and not the highest cost impact dollar value).
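
To make the hurricane example concrete, the following is a minimal Python sketch of the aggregate that the question implies. The table rows, the column names (`reason`, `cost_impact`), and the helper `total_cost_impact` are all hypothetical and exist only for illustration. The sketch shows why a category or reason column matters: without it, the sum includes unrelated repairs.

```python
# Hypothetical repair work items. The "reason" column is an assumption,
# added so hurricane-related rows can be told apart from other repairs.
work_items = [
    {"id": "WI-001", "reason": "Hurricane", "cost_impact": 12500.00},
    {"id": "WI-002", "reason": "Routine maintenance", "cost_impact": 800.00},
    {"id": "WI-003", "reason": "Hurricane", "cost_impact": 4300.00},
    {"id": "WI-004", "reason": "Vandalism", "cost_impact": 1500.00},
]


def total_cost_impact(items, reason=None):
    """Sum cost_impact over all items, or only over items with the given reason."""
    return sum(
        item["cost_impact"]
        for item in items
        if reason is None or item["reason"] == reason
    )


# Without a reason filter, every repair is included in the total.
unfiltered_total = total_cost_impact(work_items)

# With the reason column, only hurricane-related work items are summed.
hurricane_total = total_cost_impact(work_items, reason="Hurricane")

print(unfiltered_total)  # 19100.0
print(hurricane_total)   # 16800.0
```

The two totals differ, which is the kind of discrepancy a user might see when the data lacks a column that distinguishes the rows the question is really about.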

Prompt clarity and structure

Language models rely heavily on how well the question is phrased. A well-structured prompt that clearly explains the math problem, defines variables, and breaks the task into steps leads to more accurate responses. For instance, asking for a direct answer to a simple arithmetic problem likely works well, but vague or multi-layered questions without clear context might confuse the model.

Here are some sample prompts based on a structured knowledge source such as a Dataverse table. This sample illustrates the addition of a Power Apps Dataverse table, as shown in the following image.

Screenshot of a Power Apps Dataverse table.

The Dataverse table was added as a knowledge source, and given an accurate knowledge description, along with synonyms and glossary definitions to aid the AI in interpreting the data.

Screenshot of a knowledge source, highlighting the description.

Screenshot of a knowledge source, highlighting the synonyms and glossary definitions.

Specific prompts

These prompts are specific and scoped to the information being requested.

"Can you provide full details on change order reference PCO-1003, including account name, amount requested, and reason for request?"
"How many accounts submitted change requests in August of 2024?"
"What is the total number of change orders requested to date?"
"Which customer submitted the highest cost impact in 2024?"

Generalized prompts

These prompts are generalized. They're unlikely to consistently aggregate all of the results, and likely return only the top three results.

"Please list our accounts in the order of their respective revenue."
"Please list the change requests which were submitted this year in August, and include change amounts and status."
"Can you list all the change order requests submitted to date?"

Note

Enabling or disabling the AI's ability to use its own general knowledge can affect the accuracy or appropriateness of the returned answers.

Tips and tricks

Here are a few suggestions for working with Copilot Studio that help you set expectations around generative answers that rely on mathematical expressions.

Plan for scenarios that highlight top trends, rather than expecting calculations over thousands of records. Make users aware that this conversational approach summarizes rather than itemizes.
Favor structured knowledge sources (tabular over nontabular) to optimize mathematical expressions.
Support specific scenarios, and understand the dependencies for the differences. For example, note the difference between these two questions:
- Which of our customers submitted the most requests for change? Counts request IDs and returns the customer with the most requests, ignoring other columns.
- Which of our customers have the highest cost impact across requests for change? Sums the cost impact column by customer, and returns the customer who submitted the highest total dollar amount. It only returns this information if it finds an appropriate column that is numeric or currency-based.
Be sure to identify and define any numeric columns for calculations. Ensure that they're formatted with the appropriate data type, both at the knowledge source level and when used in any Copilot Studio variables. When possible, include a clear description and common synonyms for the relevant columns in the table, column, or action descriptions.

Tip

With natural language understanding, if the table headers are too technical in their naming convention, the AI might not be able to answer the human-centric questions asked during the conversation flow. Add descriptors with typical verbiage used by your users.
Recognize that people only get answers over the data that they have permission to see. For example, a Sales table in Dataverse might expose only some records to specific business groups. So, be sure that your agent doesn’t set the wrong expectations on how the data is summarized. For example, a request for total sales in 2024 only sums the records that the user owns or that are shared with the user.
Always set consumer expectations for AI-driven answers. Use the agent Conversation start or the first message following topic triggers to gently highlight the purpose and constraints for one or more relevant knowledge sources.
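
The difference between counting requests and summing cost impact, and the effect of record-level permissions on a total, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the rows, the column names, the `team` field that stands in for record visibility, and the helper functions. Dataverse security is considerably richer than a single field.

```python
from collections import Counter, defaultdict

# Hypothetical change requests. "team" is a simplified stand-in for
# which business group can see each record.
requests = [
    {"request_id": "CR-1", "customer": "Contoso", "team": "East", "cost_impact": 1000.0},
    {"request_id": "CR-2", "customer": "Contoso", "team": "East", "cost_impact": 500.0},
    {"request_id": "CR-3", "customer": "Contoso", "team": "West", "cost_impact": 250.0},
    {"request_id": "CR-4", "customer": "Fabrikam", "team": "West", "cost_impact": 9000.0},
]


def customer_with_most_requests(rows):
    """Count requests per customer and return the customer with the most."""
    counts = Counter(row["customer"] for row in rows)
    return counts.most_common(1)[0][0]


def customer_with_highest_cost(rows):
    """Sum cost_impact per customer and return the customer with the largest total."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["cost_impact"]
    return max(totals, key=totals.get)


def visible_total(rows, team):
    """Sum cost_impact over only the rows that the given team can see."""
    return sum(row["cost_impact"] for row in rows if row["team"] == team)


most_requests = customer_with_most_requests(requests)    # "Contoso" (3 requests)
highest_cost = customer_with_highest_cost(requests)      # "Fabrikam" (9000.0 total)
east_total = visible_total(requests, "East")             # 1500.0
all_total = sum(row["cost_impact"] for row in requests)  # 10750.0

print(most_requests, highest_cost, east_total, all_total)
```

The two similar-sounding questions return different customers, and a user who sees only the East records gets a smaller total than the full table would produce. Both outcomes are correct for the data in scope, which is why it helps to state the scope up front.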

Use AI Builder prompt actions

Prompt actions enable you to add generative AI capabilities from Power Apps to your agents and solutions in Copilot Studio. This feature allows you to perform tasks such as classification, summarization, draft content generation, data transformation, and much more. With prompt actions, you can also tailor generative AI responses to use specific filters and aggregations from tables.

In the following screenshot, you see how a Copilot Studio maker used AI Builder prompt actions in Copilot Studio to summarize change order requests from both the Account table and the related PCO table.

Screenshot of the AI Builder prompt being displayed within Copilot Studio.

In the preceding example, the agent’s knowledge sources weren't used. Instead, the prompt includes the dynamic prompt variable for the Account Number, and a table from Dataverse as Data.

Tip

Related tables are assumed by the AI and don’t need to be added in this case. (The PCO table has a many-to-one relationship with Accounts.)
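
As a conceptual illustration only, the idea of a prompt that combines a dynamic variable with table data can be sketched in Python. This isn't AI Builder's prompt syntax or API. The template text, the account numbers, the rows, and the helper `build_prompt` are all hypothetical. The sketch filters the rows to one account and then fills the variable and the data into the prompt text.

```python
from string import Template

# Hypothetical change order rows; the values are made up for illustration.
pco_rows = [
    {"account_number": "ACC-001", "reference": "PCO-1001", "amount": 1200.0},
    {"account_number": "ACC-002", "reference": "PCO-1002", "amount": 300.0},
    {"account_number": "ACC-001", "reference": "PCO-1003", "amount": 450.0},
]

# Hypothetical prompt text with two placeholders: the dynamic variable
# (account number) and the table data that grounds the answer.
PROMPT = Template(
    "Summarize the change order requests for account $account_number.\n"
    "Data:\n$data"
)


def build_prompt(account_number, rows):
    """Keep only the rows for one account, then fill in the prompt template."""
    matching = [row for row in rows if row["account_number"] == account_number]
    data = "\n".join(
        f"{row['reference']}: {row['amount']:.2f}" for row in matching
    )
    return PROMPT.substitute(account_number=account_number, data=data)


prompt_text = build_prompt("ACC-001", pco_rows)
print(prompt_text)
```

Scoping the data before the model sees it means the model only has to summarize a small, relevant set of rows rather than filter a whole table on its own.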
