Streaming + Async
Streaming Responses
LiteLLM supports streaming the model response back by passing stream=True as an argument to the completion function.
Usage
from litellm import completion

messages = [{"role": "user", "content": "Hey, how's it going?"}]
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)

for part in response:
    print(part.choices[0].delta.content or "")
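Each chunk carries only the delta for that step, and delta.content can be None (for example on the final chunk), which is why the example above falls back to an empty string. If you want the full text once the stream finishes, one option is to accumulate the deltas yourself. A minimal sketch (collected_text is an illustrative name, not part of the LiteLLM API):

from litellm import completion

messages = [{"role": "user", "content": "Hey, how's it going?"}]
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)

# Accumulate each chunk's delta; "or ''" guards against chunks
# whose delta.content is None.
collected_text = ""
for part in response:
    collected_text += part.choices[0].delta.content or ""

print(collected_text)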
Helper function
LiteLLM also exposes a helper function, stream_chunk_builder, to rebuild the complete response from the list of streamed chunks.
import litellm
from litellm import completion

messages = [{"role": "user", "content": "Hey, how's it going?"}]
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)

chunks = []
for chunk in response:
    chunks.append(chunk)

print(litellm.stream_chunk_builder(chunks, messages=messages))
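The rebuilt object follows the regular (non-streaming) completion response format, so the final text should be readable the same way as for an ordinary completion. A short sketch under that assumption:

import litellm
from litellm import completion

messages = [{"role": "user", "content": "Hey, how's it going?"}]
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)

chunks = [chunk for chunk in response]

# Rebuild the full response, then read the message text from it
# as you would for a non-streaming completion.
rebuilt = litellm.stream_chunk_builder(chunks, messages=messages)
print(rebuilt.choices[0].message.content)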
Async Completion
LiteLLM provides an asynchronous version of the completion function called acompletion.
Usage
from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
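Because acompletion is a coroutine, several requests can be awaited concurrently with standard asyncio tooling instead of running one after another. A sketch using asyncio.gather (the prompts and function name here are illustrative):

from litellm import acompletion
import asyncio

async def fetch_all():
    # Illustrative prompts; each acompletion call is started
    # before any of them has finished, so they run concurrently.
    prompts = ["Hello, how are you?", "Tell me a joke."]
    tasks = [
        acompletion(model="gpt-3.5-turbo",
                    messages=[{"content": p, "role": "user"}])
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

responses = asyncio.run(fetch_all())
for r in responses:
    print(r)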
Async Streaming
We've implemented an __anext__() function in the streaming object returned, which enables async iteration (async for) over the stream.
Usage
Here's an example of using it with OpenAI.
from litellm import acompletion
import asyncio, traceback

async def completion_call():
    try:
        print("test acompletion + streaming")
        response = await acompletion(
            model="gpt-3.5-turbo",
            messages=[{"content": "Hello, how are you?", "role": "user"}],
            stream=True
        )
        print(f"response: {response}")
        async for chunk in response:
            print(chunk)
    except Exception:
        print(f"error occurred: {traceback.format_exc()}")

asyncio.run(completion_call())
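Async streaming composes with the helper function shown earlier: collect the chunks inside the async for loop, then pass them to stream_chunk_builder once the stream ends. A minimal sketch combining the two (stream_and_rebuild is an illustrative name):

import litellm
from litellm import acompletion
import asyncio

async def stream_and_rebuild():
    messages = [{"content": "Hello, how are you?", "role": "user"}]
    response = await acompletion(
        model="gpt-3.5-turbo", messages=messages, stream=True
    )
    chunks = []
    async for chunk in response:
        chunks.append(chunk)  # collect chunks as they arrive
    # Rebuild the complete response from the streamed chunks.
    return litellm.stream_chunk_builder(chunks, messages=messages)

print(asyncio.run(stream_and_rebuild()))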