Compiling Chat Completion Models
You can compile chat completion models compatible with Muna’s openai.chat.completions.create interface.
Accepting Chat Messages
Chat completion functions should accept a list of input messages with type list[muna.beta.openai.Message].
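As a hedged illustration, the sketch below uses plain dicts in place of the muna.beta.openai.Message type, assuming it mirrors the OpenAI chat message shape (a role and a content field):

```python
# Assumed shape: plain dicts mirroring OpenAI chat messages (`role` + `content`);
# the real `muna.beta.openai.Message` type may differ in detail.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

def predict(messages: list[dict]) -> None:
    # A chat completion function receives the full conversation history
    for message in messages:
        print(f"{message['role']}: {message['content']}")

predict(messages)
```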
Returning Chat Completion Chunks
Chat completion functions must return an iterator of completion chunks, with type Iterator[muna.beta.openai.ChatCompletionChunk].
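For illustration, here is a minimal streaming function sketched with plain dicts standing in for the muna.beta.openai.ChatCompletionChunk type, following the OpenAI chunk layout (choices[].delta.content, with a final finish_reason):

```python
from typing import Iterator

def predict(messages: list[dict]) -> Iterator[dict]:
    """Stream a canned reply as OpenAI-style chat completion chunks."""
    reply = "Hello from the model"
    for token in reply.split(" "):
        yield {
            "object": "chat.completion.chunk",
            "choices": [{"index": 0, "delta": {"content": token + " "}, "finish_reason": None}],
        }
    # A final chunk with an empty delta signals the end of the stream
    yield {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }

chunks = list(predict([{"role": "user", "content": "Hi"}]))
```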
Creating Chat Completions
We recommend using the llama-cpp-python package to create chat completions with llama.cpp.
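A minimal sketch of such a function, assuming a local GGUF checkpoint at a hypothetical model.gguf path (the import is deferred so the module loads even without llama-cpp-python installed):

```python
from typing import Iterator

def predict(messages: list[dict]) -> Iterator[dict]:
    """Stream chat completions from a local model via llama-cpp-python."""
    from llama_cpp import Llama  # deferred so the sketch imports cleanly
    # Hypothetical checkpoint path; point this at your own GGUF file
    llm = Llama(model_path="model.gguf")
    # With stream=True, this yields OpenAI-style chat completion chunk dicts
    yield from llm.create_chat_completion(messages=messages, stream=True)
```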
Compiling Embedding Models
You can compile text embedding models compatible with Muna’s openai.embeddings.create interface.
Accepting Input Texts
Embedding functions should accept a list of input texts to embed, as a list[str].
Returning the Embeddings
Embedding functions must return an embedding matrix as a NumPy ndarray. The array must have a Parameter.Embedding annotation.
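As a sketch, here is a toy embedding function with the right contract; a hash-based bag-of-characters stands in for a real model, and the Muna-specific Parameter.Embedding annotation is omitted:

```python
import numpy as np

def embed_text(texts: list[str]) -> np.ndarray:
    """Toy embedding: bag-of-characters, standing in for a real model."""
    D = 8  # embedding dimension (a real model fixes this, e.g. 384 or 768)
    matrix = np.zeros((len(texts), D), dtype=np.float32)
    for i, text in enumerate(texts):
        for ch in text:
            matrix[i, ord(ch) % D] += 1.0
    # L2-normalize each row, as most embedding models do
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return (matrix / np.maximum(norms, 1e-9)).astype(np.float32)

embeddings = embed_text(["hello", "world"])
```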
The returned ndarray must have a float32 data type.
The returned ndarray must be a 2D array with shape (N,D), where N is the number of input texts and D is the embedding dimension.
(Optional) Supporting Matryoshka Embeddings
Some embedding models allow specifying the number of embedding dimensions, based on Matryoshka representation learning.
To expose this setting, add an int parameter with the Parameter.EmbeddingDims annotation.
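A toy sketch of the Matryoshka pattern, with fixed-seed random vectors standing in for a real model and the Muna-specific annotation omitted: the full embedding is truncated to the first dims dimensions and renormalized.

```python
import numpy as np

def embed_text(texts: list[str], dims: int = 8) -> np.ndarray:
    """Toy Matryoshka-style embedding with a configurable dimension count."""
    # Full-size embeddings from the underlying model (toy: fixed-seed random)
    full = np.random.default_rng(0).standard_normal((len(texts), 8)).astype(np.float32)
    # Matryoshka truncation: keep the first `dims` dimensions, then renormalize
    truncated = full[:, :dims]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return (truncated / np.maximum(norms, 1e-9)).astype(np.float32)

embeddings = embed_text(["hello", "world"], dims=4)
```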
Compiling Speech Models
You can compile text-to-speech models compatible with Muna’s openai.audio.speech.create interface.
Accepting a Generation Voice
Text-to-speech functions must also accept a generation voice argument. We recommend using a Literal or StrEnum type. Regardless of the type you choose, the parameter must have a Parameter.AudioVoice annotation.
The generation voice must be a required parameter, because the OpenAI interface requires developers to specify a voice.
Returning the Generated Audio
Speech generation functions must return the generated audio as a NumPy ndarray containing linear PCM samples. The array must have a Parameter.Audio annotation.
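A toy sketch with the right contract: a sine tone stands in for a real TTS model, the voice names are hypothetical, and the Muna-specific Parameter.AudioVoice and Parameter.Audio annotations are omitted.

```python
from typing import Literal
import numpy as np

def generate_speech(text: str, voice: Literal["narrator", "whisper"]) -> np.ndarray:
    """Toy speech generator: a sine tone in place of a real TTS model."""
    sample_rate = 24_000
    duration = 0.05 * max(len(text), 1)  # roughly 50 ms per character
    t = np.arange(int(sample_rate * duration)) / sample_rate
    frequency = 220.0 if voice == "narrator" else 440.0
    # 1D float32 array of linear PCM samples, shape (F,)
    return (0.5 * np.sin(2 * np.pi * frequency * t)).astype(np.float32)

audio = generate_speech("Hello", voice="narrator")
```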
The returned ndarray must have a float32 data type.
The returned ndarray must either be a 1D array with shape (F,) for single-channel audio; or a 2D array with shape (F,C), where C is the channel count (interleaved).
(Optional) Supporting Audio Speed
Some text-to-speech functions support configuring the speed of the generated audio. To expose this setting, add a float parameter with a Parameter.AudioSpeed annotation.
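Many TTS models expose speed natively; when one does not, a naive fallback is to resample the waveform (note this also shifts pitch). A sketch of that fallback:

```python
import numpy as np

def apply_speed(audio: np.ndarray, speed: float = 1.0) -> np.ndarray:
    """Resample 1D PCM audio by a speed factor using linear interpolation."""
    n_out = max(int(len(audio) / speed), 1)
    # Sample positions in the original waveform for each output sample
    positions = np.linspace(0, len(audio) - 1, num=n_out)
    return np.interp(positions, np.arange(len(audio)), audio).astype(np.float32)

fast = apply_speed(np.zeros(1000, dtype=np.float32), speed=2.0)
```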
Compiling Transcription Models
You can compile speech-to-text models compatible with Muna’s openai.audio.transcriptions.create interface.
Accepting Input Audio
Transcription functions should accept input audio as a NumPy ndarray with a Parameter.Audio annotation.
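A toy sketch of the expected signature, with a placeholder body in place of a real speech-to-text model, an assumed 16 kHz mono input, and the Muna-specific annotation omitted:

```python
import numpy as np

def transcribe(audio: np.ndarray) -> str:
    """Placeholder transcription: a real model would decode text from the audio."""
    assert audio.dtype == np.float32  # linear PCM samples, float32
    duration = len(audio) / 16_000   # assuming 16 kHz mono input
    return f"(transcribed {duration:.2f}s of audio)"

text = transcribe(np.zeros(16_000, dtype=np.float32))
```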