Creating Chat Completion Predictors
You can create chat completion predictors compatible with Muna's openai.chat.completions.create interface.
1. Accepting Chat Messages
Chat completion predictors should accept a list of input messages with type list[muna.beta.openai.Message] in llm.py:
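Here is a minimal sketch of what this looks like in llm.py; the function name predict and the ellipsis body are illustrative, and any Muna compilation or registration decorator is omitted:

```python
from muna.beta.openai import Message

def predict(messages: list[Message]):
    # Generate a chat completion from the input messages (body added in later steps)
    ...
```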
2. Returning Chat Completion Chunks
Chat completion predictors must return an iterator of completion chunks, with type Iterator[muna.beta.openai.ChatCompletionChunk], in llm.py:
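Extending the sketch above, the signature might look like the following; the function name remains illustrative:

```python
from collections.abc import Iterator
from muna.beta.openai import ChatCompletionChunk, Message

def predict(messages: list[Message]) -> Iterator[ChatCompletionChunk]:
    # Yield completion chunks as they are generated (body added in the next step)
    ...
```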
3. Creating Chat Completions
We recommend using the llama-cpp-python package to create chat completions using Llama.cpp in llm.py:
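Below is a sketch of a complete llm.py under several assumptions: the GGUF model path is illustrative, Message instances are assumed to expose role and content fields, and ChatCompletionChunk is assumed to accept the chunk dictionaries that llama-cpp-python streams back:

```python
from collections.abc import Iterator
from llama_cpp import Llama
from muna.beta.openai import ChatCompletionChunk, Message

# Load the model once at module load time (path is illustrative)
llama = Llama(model_path="model.gguf", n_ctx=4096)

def predict(messages: list[Message]) -> Iterator[ChatCompletionChunk]:
    # llama-cpp-python expects plain dicts; `role`/`content` fields are assumed on Message
    chat = [{ "role": message.role, "content": message.content } for message in messages]
    # Stream OpenAI-style chunk dictionaries from Llama.cpp
    for chunk in llama.create_chat_completion(messages=chat, stream=True):
        # Assumes ChatCompletionChunk can be constructed from the streamed dict
        yield ChatCompletionChunk(**chunk)
```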
Creating Embedding Predictors
You can create text embedding predictors compatible with Muna's openai.embeddings.create interface.
1. Accepting Input Texts
Embedding predictors should accept a list of input texts to embed, as a list[str], in embed_text.py:
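A minimal sketch of the embed_text.py signature; the function name predict is illustrative:

```python
def predict(texts: list[str]):
    # Embed the input texts (return type and body added in the next step)
    ...
```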
2. Returning the Embeddings
Embedding predictors must return an embedding matrix as a NumPy ndarray. The array must have a Parameter.Embedding annotation in embed_text.py:
The returned ndarray must have a float32 data type.
The returned ndarray must be a 2D array with shape (N,D), where N is the number of input texts and D is the embedding dimension.
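Here is a sketch of embed_text.py that fills in these requirements. It assumes the Parameter annotation is attached with typing.Annotated, that Parameter is importable from muna.beta (import path assumed), and it uses sentence-transformers purely as an illustrative embedding backend:

```python
from typing import Annotated
import numpy as np
from sentence_transformers import SentenceTransformer  # illustrative embedding backend
from muna.beta import Parameter                         # import path is an assumption

model = SentenceTransformer("all-MiniLM-L6-v2")         # model choice is illustrative

def predict(texts: list[str]) -> Annotated[np.ndarray, Parameter.Embedding]:
    # Return an (N, D) float32 matrix: one row per input text
    embeddings = model.encode(texts)
    return embeddings.astype(np.float32)
```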
3. (Optional) Supporting Matryoshka Embeddings
Some embedding models allow for specifying the number of embedding dimensions, based on Matryoshka representation learning. To expose this setting, add an int parameter with the Parameter.EmbeddingDims annotation in embed_text.py:
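A sketch of the Matryoshka variant, under the same assumptions as above; the default dimension and the truncate-then-renormalize strategy are illustrative:

```python
from typing import Annotated
import numpy as np
from sentence_transformers import SentenceTransformer  # illustrative embedding backend
from muna.beta import Parameter                         # import path is an assumption

model = SentenceTransformer("all-MiniLM-L6-v2")         # model choice is illustrative

def predict(
    texts: list[str],
    dims: Annotated[int, Parameter.EmbeddingDims] = 256  # default value is illustrative
) -> Annotated[np.ndarray, Parameter.Embedding]:
    # Compute full embeddings, then truncate to the requested Matryoshka dimension
    embeddings = model.encode(texts).astype(np.float32)[:, :dims]
    # Re-normalize rows after truncation
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms
```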
Creating Speech Predictors
You can create speech generation predictors compatible with Muna's openai.audio.speech.create interface.
1. Accepting Input Text
Speech generation predictors should accept an input text str in generate_speech.py:
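A minimal sketch of the generate_speech.py signature; the function name predict is illustrative:

```python
def predict(text: str):
    # Generate speech audio for the input text (voice and return type added in later steps)
    ...
```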
2. Accepting a Generation Voice
Speech generation predictors must also accept a generation voice argument. We recommend using a Literal or StrEnum type. Regardless of the type you choose, the parameter must have a Parameter.AudioVoice annotation in generate_speech.py:
The generation voice must be a required parameter, because developers are required to specify the voice in the OpenAI interface.
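Extending the sketch, the voice parameter might look like the following; the Literal voice names and the Parameter import path are assumptions, and the annotation is assumed to attach via typing.Annotated:

```python
from typing import Annotated, Literal
from muna.beta import Parameter  # import path is an assumption

Voice = Literal["alloy", "echo", "nova"]  # voice names are illustrative

def predict(
    text: str,
    voice: Annotated[Voice, Parameter.AudioVoice]  # required: no default value
):
    ...
```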
3. Returning the Generated Audio
Speech generation predictors must return the generated audio as a NumPy ndarray containing linear PCM samples. The array must have a Parameter.Audio annotation in generate_speech.py:
The returned ndarray must have a float32 data type.
The returned ndarray must either be a 1D array with shape (F,) for single-channel audio, or a 2D array with shape (C,F) where C is the channel count.
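A sketch of generate_speech.py returning single-channel audio; synthesize stands in for whatever TTS engine you use (a hypothetical helper), and the other assumptions match the previous sketches:

```python
from typing import Annotated, Literal
import numpy as np
from muna.beta import Parameter  # import path is an assumption

Voice = Literal["alloy", "echo", "nova"]  # voice names are illustrative

def predict(
    text: str,
    voice: Annotated[Voice, Parameter.AudioVoice]
) -> Annotated[np.ndarray, Parameter.Audio]:
    # `synthesize` is a hypothetical TTS helper assumed to return a 1D float32
    # array of linear PCM samples with shape (F,)
    samples = synthesize(text, voice=voice)
    return samples.astype(np.float32)
```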
4. (Optional) Supporting Audio Speed
Some speech generation predictors support configuring the speed of the generated audio. To expose this setting, add a float parameter with a Parameter.AudioSpeed annotation in generate_speech.py:
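Building on the sketch above, the speed parameter might be exposed as follows; the default of 1.0 and the assumption that the hypothetical synthesize helper accepts a speed multiplier are both illustrative:

```python
from typing import Annotated, Literal
import numpy as np
from muna.beta import Parameter  # import path is an assumption

Voice = Literal["alloy", "echo", "nova"]  # voice names are illustrative

def predict(
    text: str,
    voice: Annotated[Voice, Parameter.AudioVoice],
    speed: Annotated[float, Parameter.AudioSpeed] = 1.0  # default speed is illustrative
) -> Annotated[np.ndarray, Parameter.Audio]:
    # `synthesize` is the same hypothetical TTS helper, here assumed to accept
    # a playback-speed multiplier
    samples = synthesize(text, voice=voice, speed=speed)
    return samples.astype(np.float32)
```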