Muna’s mock OpenAI client allows developers to use millions of open-source AI
models without changing their existing code.
Creating Chat Completion Predictors
You can create chat completion predictors compatible with Muna's openai.chat.completions.create interface.
1. Accepting Chat Messages
Chat completion predictors should accept a list of input messages with type list[muna.beta.openai.Message] (llm.py).
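As a minimal sketch of this signature, the snippet below uses a plain dataclass as a stand-in for muna.beta.openai.Message, which is assumed here to carry role and content fields:

```python
from dataclasses import dataclass

# Stand-in for `muna.beta.openai.Message`; the real type is assumed
# to carry a `role` string and a `content` string.
@dataclass
class Message:
    role: str
    content: str

def predict(messages: list[Message]) -> str:
    """Toy predictor that echoes the latest user message."""
    user_texts = [m.content for m in messages if m.role == "user"]
    return user_texts[-1] if user_texts else ""
```

A real predictor would annotate the parameter with the Muna message type and run a model instead of echoing.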
2. Returning Chat Completion Chunks
Chat completion predictors must return an iterator of completion chunks, with type Iterator[muna.beta.openai.ChatCompletionChunk] (llm.py).
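To illustrate the streaming shape, here is a sketch that yields OpenAI-style chunk dicts as stand-ins for muna.beta.openai.ChatCompletionChunk instances:

```python
from typing import Iterator

def predict(prompt: str) -> Iterator[dict]:
    """Toy predictor that streams a canned response one token at a time.

    Each yielded dict mimics an OpenAI streaming chunk; a real predictor
    would yield `muna.beta.openai.ChatCompletionChunk` instances instead.
    """
    for token in ["Hello", ",", " world", "!"]:
        yield { "choices": [{ "delta": { "content": token } }] }
```

Consumers reassemble the response by concatenating each chunk's delta content, just as with the OpenAI streaming API.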
3. Creating Chat Completions
We recommend using the llama-cpp-python package to create chat completions using Llama.cpp (llm.py).
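A sketch of that approach is below. The model path is a hypothetical placeholder, messages are plain dicts for self-containment, and the llama_cpp import is done lazily so the helper can be used without the package installed:

```python
from typing import Iterator

def to_llama_messages(messages: list[dict]) -> list[dict]:
    """Normalize messages into the {role, content} dicts that
    llama-cpp-python's chat API expects."""
    return [{ "role": m["role"], "content": m["content"] } for m in messages]

def predict(messages: list[dict], model_path: str = "model.gguf") -> Iterator[dict]:
    """Stream chat completion chunks from a local GGUF model."""
    from llama_cpp import Llama  # imported lazily to keep the sketch import-safe
    llm = Llama(model_path=model_path)
    # With stream=True, create_chat_completion yields OpenAI-style chunk dicts
    yield from llm.create_chat_completion(
        messages=to_llama_messages(messages),
        stream=True,
    )
```

In a real predictor, the yielded chunks would be converted to (or typed as) muna.beta.openai.ChatCompletionChunk per step 2.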
Creating Embedding Predictors
You can create text embedding predictors compatible with Muna's openai.embeddings.create interface.
1. Accepting Input Texts
Embedding predictors should accept a list of input texts to embed, as a list[str] (embed_text.py).
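A minimal sketch of the full signature follows; the embedding itself is a placeholder (simple text statistics), where a real predictor would run an embedding model:

```python
import numpy as np

def embed_text(input: list[str]) -> np.ndarray:
    """Toy embedder: maps each input text to a fixed-size feature vector.

    Returns a 2D float32 array of shape (N, D), where N is the number of
    input texts and D is the embedding dimension.
    """
    dims = 4
    matrix = np.zeros((len(input), dims), dtype=np.float32)
    for i, text in enumerate(input):
        matrix[i, 0] = len(text)                          # character count
        matrix[i, 1] = (text.count(" ") + 1) if text else 0  # rough word count
    return matrix
```

In the real signature, the returned array would additionally carry the Parameter.Embedding annotation described in the next step.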
2. Returning the Embeddings
Embedding predictors must return an embedding matrix as a Numpy ndarray. The array must have a Parameter.Embedding annotation (embed_text.py).
The returned array must have a float32 data type, and must be a 2D array with shape (N,D), where N is the number of input texts and D is the embedding dimension.
3. (Optional) Supporting Matryoshka Embeddings
Some embedding models allow specifying the number of embedding dimensions, based on Matryoshka representation learning. To expose this setting, add an int parameter with the Parameter.EmbeddingDims annotation (embed_text.py).
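A sketch of the Matryoshka pattern is below. The random matrix stands in for a real embedding model, and in the real signature the dimensions parameter would carry the Parameter.EmbeddingDims annotation:

```python
import numpy as np

def embed_text(input: list[str], dimensions: int = 64) -> np.ndarray:
    """Toy Matryoshka-style embedder: computes a full-size embedding,
    then truncates each vector to the requested number of dimensions."""
    full_dims = 64
    rng = np.random.default_rng(0)  # placeholder for a real embedding model
    matrix = rng.standard_normal((len(input), full_dims)).astype(np.float32)
    # Matryoshka truncation: the leading dimensions form a valid smaller embedding
    return matrix[:, :dimensions]
```

Note that dimensions has a default value, keeping input as the predictor's only required parameter.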
To remain compatible with the OpenAI embeddings interface, the predictor must have only one required parameter. As a result, make sure to specify a default value for the dimensions parameter.
Creating Speech Predictors
You can create speech generation predictors compatible with Muna's openai.audio.speech.create interface.
1. Accepting Input Text
Speech generation predictors should accept an input text str (generate_speech.py).
2. Accepting a Generation Voice
Speech generation predictors must also accept a generation voice argument. We recommend using a Literal or StrEnum type. Regardless of the type you choose, the parameter must have a Parameter.AudioVoice annotation (generate_speech.py).
The generation voice must be a required parameter, because developers are required to specify the voice in the OpenAI interface.
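The steps so far can be sketched as follows. The voice names are hypothetical, the audio is placeholder silence, and in the real signature the voice parameter would carry the Parameter.AudioVoice annotation:

```python
from typing import Literal
import numpy as np

# Hypothetical voice names; a real predictor would list the voices
# its TTS model actually supports.
Voice = Literal["alloy", "ember"]

def generate_speech(input: str, voice: Voice) -> np.ndarray:
    """Toy speech predictor: returns silent linear PCM samples whose
    length scales with the input text. A real predictor would
    synthesize audio with a TTS model, conditioned on `voice`."""
    sample_rate = 24_000
    frames = sample_rate * max(len(input), 1) // 20
    return np.zeros(frames, dtype=np.float32)
```

Note that voice has no default value, making it a required parameter as the OpenAI interface demands.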
3. Returning the Generated Audio
Speech generation predictors must return the generated audio as a Numpy ndarray containing linear PCM samples. The array must have a Parameter.Audio annotation (generate_speech.py).
The returned array must have a float32 data type. The returned array must either be a 1D array with shape (F,) for single-channel audio, or a 2D array with shape (C,F), where C is the channel count and F is the frame count.
4. (Optional) Supporting Audio Speed
Some speech generation predictors support configuring the speed of the generated audio. To expose this setting, add a float parameter with a Parameter.AudioSpeed annotation (generate_speech.py).
The audio speed parameter must have a default value, because it is an optional setting in the OpenAI interface.
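A sketch of the speed setting is below (the voice parameter is omitted for brevity, and the audio is placeholder silence). In the real signature, speed would carry the Parameter.AudioSpeed annotation:

```python
import numpy as np

def generate_speech(input: str, speed: float = 1.0) -> np.ndarray:
    """Toy predictor: generates silent PCM samples, then resamples them
    to honor `speed`. A real predictor would pass `speed` to the TTS
    model, or resample its output."""
    sample_rate = 24_000
    frames = sample_rate * max(len(input), 1) // 20
    audio = np.zeros(frames, dtype=np.float32)
    # Faster speech is shorter: keep roughly 1/speed of the frames
    out_frames = max(int(frames / speed), 1)
    indices = np.linspace(0, frames - 1, out_frames)
    return np.interp(indices, np.arange(frames), audio).astype(np.float32)
```

Because speed defaults to 1.0, callers that omit it (as the OpenAI interface allows) get audio at normal speed.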