How to call text-to-text DIAL applications

This notebook shows how to call text-to-text DIAL applications via the DIAL API chat/completions endpoint.

View Jupyter Notebook

For this example, we use a sample text-to-text application called Echo, which returns the content of the last user message.

Setup

Step 1: install the necessary dependencies and import the libraries we are going to use.

!pip install requests==2.31.0
!pip install openai==1.9.0
!pip install langchain-openai==0.0.3
import requests
import openai
import langchain_openai

Step 2: if a DIAL Core server is already configured and running, set the DIAL_URL and APP_NAME environment variables to point to that server and to the text-to-text application (or model) you want to use.

Otherwise, run the docker-compose file in a separate terminal to start a DIAL Core server locally along with the sample Echo application. DIAL Core will become available at http://localhost:8080:

docker compose up core echo

Step 3: configure the DIAL_URL and APP_NAME environment variables. The defaults assume that DIAL Core is running locally via the docker-compose file.

import os

dial_url = os.environ.get("DIAL_URL", "http://localhost:8080")
os.environ["DIAL_URL"] = dial_url

app_name = os.environ.get("APP_NAME", "echo")
os.environ["APP_NAME"] = app_name

Using Curl

Recall that:

  • The DIAL deployment is called app_name.
  • The local DIAL Core server URL is dial_url.
  • The OpenAI API version we are going to use is 2023-12-01-preview.

Therefore, the application is accessible via the following URL:

${DIAL_URL}/openai/deployments/${APP_NAME}/chat/completions?api-version=2023-12-01-preview

The curl command that requests a completion for a single-message chat is:

!curl -X POST "${DIAL_URL}/openai/deployments/${APP_NAME}/chat/completions?api-version=2023-12-01-preview" \
-H "Api-Key: dial_api_key" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello world!"}]}'
{"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Hello world!"}}],"usage":null,"id":"37ffdc98-da4d-48e8-8dec-2d0ec0fd94b1","created":1707310417,"object":"chat.completion"}

Using Python Requests Library

Let’s make an HTTP request using the Python requests library and make sure the output message matches the message in the request.

The arguments are identical to the curl command above.

Let’s call the application in the non-streaming mode first:

response = requests.post(
    f"{dial_url}/openai/deployments/{app_name}/chat/completions?api-version=2023-12-01-preview",
    headers={"Api-Key": "dial_api_key"},
    json={"messages": [{"role": "user", "content": "Hello world!"}]},
)
body = response.json()
display(body)

completion = body["choices"][0]["message"]["content"]
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
{'choices': [{'index': 0,
   'finish_reason': 'stop',
   'message': {'role': 'assistant', 'content': 'Hello world!'}}],
 'usage': None,
 'id': 'dd3647aa-2496-461c-adc4-746e323ee13f',
 'created': 1707310430,
 'object': 'chat.completion'}
Completion: 'Hello world!'

When streaming is enabled, the chat completion returns a sequence of messages, each containing a chunk of a generated response:

response = requests.post(
    f"{dial_url}/openai/deployments/{app_name}/chat/completions?api-version=2023-12-01-preview",
    headers={"Api-Key": "dial_api_key"},
    json={"messages": [{"role": "user", "content": "Hello world!"}], "stream": True},
)
for chunk in response.iter_lines():
    print(chunk)
b'data: {"choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":null,"id":"3c231303-2c25-48a0-bf5e-4e46243ba2eb","created":1707310448,"object":"chat.completion.chunk"}'
b''
b'data: {"choices":[{"index":0,"finish_reason":null,"delta":{"content":"Hello world!"}}],"usage":null,"id":"3c231303-2c25-48a0-bf5e-4e46243ba2eb","created":1707310448,"object":"chat.completion.chunk"}'
b''
b'data: {"choices":[{"index":0,"finish_reason":"stop","delta":{}}],"usage":null,"id":"3c231303-2c25-48a0-bf5e-4e46243ba2eb","created":1707310448,"object":"chat.completion.chunk"}'
b''
b'data: [DONE]'
b''
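
Each non-empty line of the streaming response is a server-sent event: a data: prefix followed by a JSON chunk, terminated by a [DONE] sentinel. The following sketch re-assembles the full completion from the stream (it assumes the event format shown above and Python 3.9+ for bytes.removeprefix):

import json

response = requests.post(
    f"{dial_url}/openai/deployments/{app_name}/chat/completions?api-version=2023-12-01-preview",
    headers={"Api-Key": "dial_api_key"},
    json={"messages": [{"role": "user", "content": "Hello world!"}], "stream": True},
    stream=True,  # let requests yield the response incrementally as well
)

completion = ""
for line in response.iter_lines():
    if not line:
        continue  # skip the blank lines between events
    payload = line.removeprefix(b"data: ")
    if payload == b"[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    completion += chunk["choices"][0]["delta"].get("content", "")

print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"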

Using OpenAI Python SDK

The DIAL deployment can be called using the OpenAI Python SDK as well.

openai_client = openai.AzureOpenAI(
    azure_endpoint=dial_url,
    azure_deployment=app_name,
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)

Let’s call the application in the non-streaming mode:

chat_completion = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Hello world!",
        }
    ],
    model=app_name,
)
print(chat_completion)
completion = chat_completion.choices[0].message.content
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
ChatCompletion(id='1d020e70-9de6-402a-a2e0-cb45e340aafa', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello world!', role='assistant', function_call=None, tool_calls=None))], created=1707310540, model=None, object='chat.completion', system_fingerprint=None, usage=None)
Completion: 'Hello world!'
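
If the deployment name does not exist or the API key is rejected, DIAL Core responds with an HTTP error status, which the SDK raises as an exception. A minimal sketch (the nonexistent-app deployment name is hypothetical and expected to fail):

bad_client = openai.AzureOpenAI(
    azure_endpoint=dial_url,
    azure_deployment="nonexistent-app",  # hypothetical deployment name
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)
try:
    bad_client.chat.completions.create(
        messages=[{"role": "user", "content": "Hello world!"}],
        model="nonexistent-app",
    )
except openai.APIStatusError as e:
    print(f"DIAL Core returned an error: {e.status_code}")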

Let’s call the application in the streaming mode:

chat_completion = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Hello world!",
        }
    ],
    stream=True,
    model=app_name,
)

completion = ""
for chunk in chat_completion:
    print(chunk)
    content = chunk.choices[0].delta.content
    if content:
        completion += content

print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
ChatCompletionChunk(id='3a99fb21-d47c-411d-a2c2-6f51ea9d12f6', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1707310529, model=None, object='chat.completion.chunk', system_fingerprint=None, usage=None)
ChatCompletionChunk(id='3a99fb21-d47c-411d-a2c2-6f51ea9d12f6', choices=[Choice(delta=ChoiceDelta(content='Hello world!', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1707310529, model=None, object='chat.completion.chunk', system_fingerprint=None, usage=None)
ChatCompletionChunk(id='3a99fb21-d47c-411d-a2c2-6f51ea9d12f6', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1707310529, model=None, object='chat.completion.chunk', system_fingerprint=None, usage=None)
Completion: 'Hello world!'
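
The SDK also ships an asynchronous client. A minimal sketch using openai.AsyncAzureOpenAI; top-level await works in a notebook, while a plain script would wrap the call in asyncio.run:

async_client = openai.AsyncAzureOpenAI(
    azure_endpoint=dial_url,
    azure_deployment=app_name,
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)

# Top-level await is supported in Jupyter notebooks.
chat_completion = await async_client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello world!"}],
    model=app_name,
)
print(chat_completion.choices[0].message.content)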

Using LangChain

Let’s call the application via the LangChain library.

from langchain_core.messages import HumanMessage

llm = langchain_openai.AzureChatOpenAI(
    azure_endpoint=dial_url,
    azure_deployment=app_name,
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)

Let’s call the application in the non-streaming mode:

output = llm.generate(messages=[[HumanMessage(content="Hello world!")]])
print(output)
completion = output.generations[0][0].text
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
generations=[[ChatGeneration(text='Hello world!', generation_info={'finish_reason': 'stop', 'logprobs': None}, message=AIMessage(content='Hello world!'))]] llm_output={'token_usage': {}, 'model_name': 'gpt-3.5-turbo'} run=[RunInfo(run_id=UUID('ca6e6bbf-84cb-489a-abcf-9c6ed922713d'))]
Completion: 'Hello world!'

Let’s call the application in the streaming mode:

output = llm.stream(input=[HumanMessage(content="Hello world!")])

completion = ""
for chunk in output:
    print(chunk.dict())
    completion += chunk.content

print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
{'content': '', 'additional_kwargs': {}, 'type': 'AIMessageChunk', 'example': False}
{'content': 'Hello world!', 'additional_kwargs': {}, 'type': 'AIMessageChunk', 'example': False}
{'content': '', 'additional_kwargs': {}, 'type': 'AIMessageChunk', 'example': False}
Completion: 'Hello world!'
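
Since the application behaves like any other LangChain chat model, it can also be composed into a chain. A minimal sketch using LangChain's prompt-template and output-parser primitives (the prompt text is illustrative):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# A trivial chain: format a prompt, send it to the Echo application,
# and parse the reply into a plain string.
prompt = ChatPromptTemplate.from_messages([("human", "{text}")])
chain = prompt | llm | StrOutputParser()

result = chain.invoke({"text": "Hello world!"})
print(f"Completion: {result!r}")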