Azure AI Foundry Inference (Preview)

This is a first-party Swagger specification for Azure AI Foundry models, specifically designed for Chat Completion tasks. It currently supports only the Chat Completions endpoint.

This connector is available in the following products and regions:

| Service | Class | Regions |
|---------|-------|---------|
| Logic Apps | Standard | All Logic Apps regions except: Azure Government regions, Azure China regions, US Department of Defense (DoD) |
Contact

Name: Microsoft
URL: Microsoft LogicApps Support

Connector Metadata

Publisher: Microsoft

Creating a connection

The connector supports the following authentication types:

| Name | Description | Regions | Shareable |
|------|-------------|---------|-----------|
| Default | Parameters for creating connection. | All regions | Yes |

Default

Applicable: All regions

Parameters for creating connection.

This is a shareable connection. If the Power App is shared with another user, the connection is shared as well. For more information, see Connectors overview for canvas apps - Power Apps | Microsoft Docs.

| Name | Type | Description | Required |
|------|------|-------------|----------|
| API Key | securestring | The API Key for this Model Inference Endpoint | True |
| Target Uri | string | Specify the inference endpoint for the Foundry model | True |
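As a sketch of how the two connection parameters are used, the snippet below builds an authenticated HTTP request against the inference endpoint. The URL and header scheme shown are illustrative assumptions, not values from this document; some deployments expect an `api-key` header rather than `Authorization: Bearer`, so check your endpoint's documentation.

```python
import json
from urllib import request

# Illustrative placeholders -- substitute the Target Uri and API Key
# you supplied when creating the connection.
TARGET_URI = "https://example.services.ai.azure.com/models/chat/completions"
API_KEY = "<your-api-key>"

def build_request(payload: dict) -> request.Request:
    """Build a POST request carrying the API Key connection parameter.

    Assumes bearer-style auth; some deployments use an 'api-key'
    header instead.
    """
    return request.Request(
        TARGET_URI,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
```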

Throttling Limits

| Name | Calls | Renewal Period |
|------|-------|----------------|
| API calls per connection | 5000 | 60 seconds |
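The limit above can also be guarded client-side to avoid hitting the service's throttle. The sliding-window limiter below is an illustrative sketch, not part of the connector; the service enforces the real limit regardless.

```python
import time
from collections import deque

class ConnectionThrottle:
    """Client-side guard for the documented limit:
    5000 calls per 60 seconds per connection."""

    def __init__(self, max_calls: int = 5000, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls: deque = deque()  # monotonic timestamps of recent calls

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the renewal period.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest call leaves the window.
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```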

Actions

Create a chat completion

Generates a completion for a conversation, based on the provided messages and model configuration.

Parameters

| Name | Key | Required | Type | Description |
|------|-----|----------|------|-------------|
| API version | api-version | | string | The version of the API to use for this model chat completions endpoint. |
| role | role | | string | The role of the sender of the message (e.g., 'user', 'assistant'). |
| content | content | | object | The content of the message. |
| temperature | temperature | | float | The sampling temperature to use, between 0 and 1. Higher values make the output more random. |
| top_p | top_p | | float | The top-p sampling parameter, between 0 and 1. |
| max_tokens | max_tokens | | integer | The maximum number of tokens to generate in the response. |
| model | model | | string | Model Deployment Name. |
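Putting the parameters above together, a request might look like the sketch below. The endpoint URL, deployment name, and api-version value are illustrative placeholders, not values from this document.

```python
# The api-version parameter travels as a query string appended to the
# Target Uri; the placeholder below is illustrative only.
API_VERSION = "<api-version>"
url = (
    "https://example.services.ai.azure.com/models/chat/completions"
    f"?api-version={API_VERSION}"
)

# Request body fields match the parameter table above; values are examples.
payload = {
    "model": "my-deployment",  # Model Deployment Name (illustrative)
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of caching."},
    ],
    "temperature": 0.7,   # 0-1; higher values make output more random
    "top_p": 0.95,        # nucleus sampling cutoff, 0-1
    "max_tokens": 256,    # cap on generated tokens
}
```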

Returns

Body: ChatCompletionResponse

Definitions

Choice

| Name | Path | Type | Description |
|------|------|------|-------------|
| content_filter_results | content_filter_results | object | Results from the content filter applied to the response. |
| finish_reason | finish_reason | string | The reason the model stopped generating further tokens. Possible values include 'stop', 'length', 'content_filter', etc. |
| index | index | integer | The index of this choice within the generated set of completions. |
| logprobs | logprobs | string | Log probabilities associated with each token in the response (if requested). |
| content | message.content | string | The content of the generated message in the conversation. This is the response to the user's natural-language query. |
| refusal | message.refusal | string | If the model refuses to generate a message, this field describes the refusal. |
| role | message.role | string | The role of the sender of the message (e.g., 'user', 'assistant'). |

PromptFilterResult

| Name | Path | Type | Description |
|------|------|------|-------------|
| prompt_index | prompt_index | integer | The index of the prompt in the original input. |
| content_filter_results | content_filter_results | object | The content filter metadata applied to the prompt. |

CompletionTokensDetails

Details about the token usage for completion.

| Name | Path | Type | Description |
|------|------|------|-------------|
| accepted_prediction_tokens | accepted_prediction_tokens | integer | The number of tokens accepted as valid predictions for the response. |
| reasoning_tokens | reasoning_tokens | integer | The number of tokens used for the model's reasoning process. |
| rejected_prediction_tokens | rejected_prediction_tokens | integer | The number of tokens rejected during the prediction process. |

PromptTokensDetails

Details about the tokens used in the prompt.

| Name | Path | Type | Description |
|------|------|------|-------------|
| cached_tokens | cached_tokens | integer | The number of tokens that were cached and reused for the prompt. |

Usage

Token usage details for the request, including both prompt and completion tokens.

| Name | Path | Type | Description |
|------|------|------|-------------|
| completion_tokens | completion_tokens | integer | The number of tokens consumed by the completion. |
| completion_tokens_details | completion_tokens_details | CompletionTokensDetails | Details about the token usage for completion. |
| prompt_tokens | prompt_tokens | integer | The number of tokens consumed by the prompt. |
| prompt_tokens_details | prompt_tokens_details | PromptTokensDetails | Details about the tokens used in the prompt. |
| total_tokens | total_tokens | integer | The total number of tokens consumed by the entire request (prompt + completion). |

ChatCompletionResponse

| Name | Path | Type | Description |
|------|------|------|-------------|
| choices | choices | array of Choice | The list of generated completions for the given prompt. |
| id | id | string | A unique identifier for the chat completion request. |
| model | model | string | The model used for generating the chat completion. |
| prompt_filter_results | prompt_filter_results | array of PromptFilterResult | The content filter results for each prompt in the request. |
| usage | usage | Usage | Token usage details for the request, including both prompt and completion tokens. |
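To show how the ChatCompletionResponse fields fit together, here is a hand-built sample response (every value is invented for illustration, not taken from a real call) and a small helper that extracts the assistant's reply:

```python
# Sample shaped per the ChatCompletionResponse, Choice, and Usage
# definitions above; all values are illustrative.
sample = {
    "id": "chatcmpl-123",
    "model": "my-deployment",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "logprobs": None,
            "message": {"role": "assistant", "content": "Hello!", "refusal": None},
        }
    ],
    "prompt_filter_results": [],
    "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12},
}

def first_reply(response: dict) -> str:
    """Return the assistant text from the first choice.

    A finish_reason of 'length' means the reply was cut off at
    max_tokens and may be incomplete.
    """
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        print("warning: reply truncated at max_tokens")
    return choice["message"]["content"]
```

Here `first_reply(sample)` returns the message.content of the first choice, which is the field holding the response to the user's query.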