How to call text-to-text DIAL applications
This notebook shows how to call text-to-text DIAL applications via the DIAL API chat/completions endpoint.
For this example, we use a sample text-to-text application called Echo, which returns the content of the last user message.
Setup
Step 1: install the necessary dependencies and import the libraries we are going to use.
!pip install requests==2.32.3
!pip install openai==1.43.0
!pip install langchain-openai==0.1.23
import requests
import openai
import langchain_openai
Step 2: if a DIAL Core server is already configured and running, set the env vars DIAL_URL and APP_NAME to point to the DIAL Core server and the text-to-text application (or model) you want to use. Otherwise, run the docker-compose file in a separate terminal to start the DIAL Core server locally along with the sample Echo application. DIAL Core will become available at http://localhost:8080:
docker compose up core echo
Step 3: configure the DIAL_URL and APP_NAME env vars. The default values assume that DIAL Core is running locally via the docker-compose file.
import os
dial_url = os.environ.get("DIAL_URL", "http://localhost:8080")
os.environ["DIAL_URL"] = dial_url
app_name = os.environ.get("APP_NAME", "echo")
os.environ["APP_NAME"] = app_name
Using Curl
- The DIAL deployment is called app_name.
- The local DIAL Core server URL is dial_url.
- The OpenAI API version we are going to use is 2023-12-01-preview.
Therefore, the application is accessible via the following URL:
${DIAL_URL}/openai/deployments/${APP_NAME}/chat/completions?api-version=2023-12-01-preview
The curl command that requests a completion for a single-message chat is:
!curl -X POST "${DIAL_URL}/openai/deployments/${APP_NAME}/chat/completions?api-version=2023-12-01-preview" \
  -H "Api-Key: dial_api_key" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello world!"}]}'
{"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Hello world!"}}],"usage":null,"id":"37ffdc98-da4d-48e8-8dec-2d0ec0fd94b1","created":1707310417,"object":"chat.completion"}
Using the Python library Requests
Let’s make an HTTP request using the Python library requests and make sure the output message matches the message in the request.
The arguments are identical to the curl command above.
Let’s call the application in the non-streaming mode first:
response = requests.post(
    f"{dial_url}/openai/deployments/{app_name}/chat/completions?api-version=2023-12-01-preview",
    headers={"Api-Key": "dial_api_key"},
    json={"messages": [{"role": "user", "content": "Hello world!"}]},
)
body = response.json()
display(body)
completion = body["choices"][0]["message"]["content"]
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
{'choices': [{'index': 0,
'finish_reason': 'stop',
'message': {'role': 'assistant', 'content': 'Hello world!'}}],
'usage': None,
'id': 'dd3647aa-2496-461c-adc4-746e323ee13f',
'created': 1707310430,
'object': 'chat.completion'}
Completion: 'Hello world!'
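Note that in a real application it is worth checking the HTTP status before parsing the body; for example:
# Raise a descriptive exception for 4xx/5xx responses (e.g. an invalid API key)
# instead of attempting to parse an error payload as a completion.
response.raise_for_status()
body = response.json()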
When streaming is enabled, the chat completion returns a sequence of messages, each containing a chunk of a generated response:
response = requests.post(
    f"{dial_url}/openai/deployments/{app_name}/chat/completions?api-version=2023-12-01-preview",
    headers={"Api-Key": "dial_api_key"},
    json={"messages": [{"role": "user", "content": "Hello world!"}], "stream": True},
    stream=True,  # let requests yield the response incrementally instead of buffering it all
)
for chunk in response.iter_lines():
    print(chunk)
b'data: {"choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":null,"id":"3c231303-2c25-48a0-bf5e-4e46243ba2eb","created":1707310448,"object":"chat.completion.chunk"}'
b''
b'data: {"choices":[{"index":0,"finish_reason":null,"delta":{"content":"Hello world!"}}],"usage":null,"id":"3c231303-2c25-48a0-bf5e-4e46243ba2eb","created":1707310448,"object":"chat.completion.chunk"}'
b''
b'data: {"choices":[{"index":0,"finish_reason":"stop","delta":{}}],"usage":null,"id":"3c231303-2c25-48a0-bf5e-4e46243ba2eb","created":1707310448,"object":"chat.completion.chunk"}'
b''
b'data: [DONE]'
b''
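Each non-empty line is a server-sent event: a data: prefix followed by a JSON chunk, with the literal [DONE] marking the end of the stream. The chunks can be parsed and assembled into the full completion like this:
import json

response = requests.post(
    f"{dial_url}/openai/deployments/{app_name}/chat/completions?api-version=2023-12-01-preview",
    headers={"Api-Key": "dial_api_key"},
    json={"messages": [{"role": "user", "content": "Hello world!"}], "stream": True},
    stream=True,
)
completion = ""
for line in response.iter_lines():
    if not line:
        continue  # skip the blank lines separating events
    payload = line.removeprefix(b"data: ")
    if payload == b"[DONE]":
        break  # end-of-stream marker
    chunk = json.loads(payload)
    completion += chunk["choices"][0]["delta"].get("content", "")
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"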
Using OpenAI Python SDK
The DIAL deployment can be called using the OpenAI Python SDK as well.
openai_client = openai.AzureOpenAI(
    azure_endpoint=dial_url,
    azure_deployment=app_name,
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)
Let’s call the application in the non-streaming mode:
chat_completion = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Hello world!",
        }
    ],
    model=app_name,
)
print(chat_completion)
completion = chat_completion.choices[0].message.content
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
ChatCompletion(id='1d020e70-9de6-402a-a2e0-cb45e340aafa', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello world!', role='assistant', function_call=None, tool_calls=None))], created=1707310540, model=None, object='chat.completion', system_fingerprint=None, usage=None)
Completion: 'Hello world!'
Let’s call the application in the streaming mode:
chat_completion = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Hello world!",
        }
    ],
    stream=True,
    model=app_name,
)
completion = ""
for chunk in chat_completion:
print(chunk)
content = chunk.choices[0].delta.content
if content:
completion += content
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
ChatCompletionChunk(id='3a99fb21-d47c-411d-a2c2-6f51ea9d12f6', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1707310529, model=None, object='chat.completion.chunk', system_fingerprint=None, usage=None)
ChatCompletionChunk(id='3a99fb21-d47c-411d-a2c2-6f51ea9d12f6', choices=[Choice(delta=ChoiceDelta(content='Hello world!', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1707310529, model=None, object='chat.completion.chunk', system_fingerprint=None, usage=None)
ChatCompletionChunk(id='3a99fb21-d47c-411d-a2c2-6f51ea9d12f6', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1707310529, model=None, object='chat.completion.chunk', system_fingerprint=None, usage=None)
Completion: 'Hello world!'
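The SDK also provides an async client configured with the same arguments. A minimal sketch using openai.AsyncAzureOpenAI:
import asyncio

async_client = openai.AsyncAzureOpenAI(
    azure_endpoint=dial_url,
    azure_deployment=app_name,
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)

async def complete():
    chat_completion = await async_client.chat.completions.create(
        messages=[{"role": "user", "content": "Hello world!"}],
        model=app_name,
    )
    return chat_completion.choices[0].message.content

# In a notebook cell, `await complete()` works directly;
# in a plain script, use asyncio.run instead.
completion = asyncio.run(complete())
print(f"Completion: {completion!r}")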
Using LangChain
Let’s call the application via the LangChain library.
from langchain_core.messages import HumanMessage
llm = langchain_openai.AzureChatOpenAI(
    azure_endpoint=dial_url,
    azure_deployment=app_name,
    api_key="dial_api_key",
    api_version="2023-12-01-preview",
)
Let’s call the application in the non-streaming mode:
output = llm.generate(messages=[[HumanMessage(content="Hello world!")]])
print(output)
completion = output.generations[0][0].text
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
generations=[[ChatGeneration(text='Hello world!', generation_info={'finish_reason': 'stop', 'logprobs': None}, message=AIMessage(content='Hello world!'))]] llm_output={'token_usage': {}, 'model_name': 'gpt-3.5-turbo'} run=[RunInfo(run_id=UUID('ca6e6bbf-84cb-489a-abcf-9c6ed922713d'))]
Completion: 'Hello world!'
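generate accepts a batch of message lists; for a single chat, the Runnable-style invoke method is a simpler equivalent that returns an AIMessage directly:
message = llm.invoke([HumanMessage(content="Hello world!")])
print(f"Completion: {message.content!r}")
assert message.content == "Hello world!", "Unexpected completion"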
Let’s call the application in the streaming mode:
output = llm.stream(input=[HumanMessage(content="Hello world!")])
completion = ""
for chunk in output:
print(chunk.dict())
completion += chunk.content
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"
{'content': '', 'additional_kwargs': {}, 'type': 'AIMessageChunk', 'example': False}
{'content': 'Hello world!', 'additional_kwargs': {}, 'type': 'AIMessageChunk', 'example': False}
{'content': '', 'additional_kwargs': {}, 'type': 'AIMessageChunk', 'example': False}
Completion: 'Hello world!'
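LangChain chat models also support asynchronous streaming via astream. A minimal sketch (in a notebook cell, the async function can be awaited directly; in a script, use asyncio.run):
import asyncio

async def stream_completion():
    completion = ""
    # astream yields AIMessageChunk objects, just like the synchronous stream above
    async for chunk in llm.astream(input=[HumanMessage(content="Hello world!")]):
        completion += chunk.content
    return completion

completion = asyncio.run(stream_completion())
print(f"Completion: {completion!r}")
assert completion == "Hello world!", "Unexpected completion"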