Specifying the Function Signature
The prediction function must be a module-level function, and it must have parameter and return type annotations:
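Below is a minimal sketch of a valid prediction function. The `@compile` decorator, its import path, and its arguments are shown for illustration and are assumptions, not confirmed API; only the requirement of a module-level function with full type annotations comes from the text above.

```python
from muna import compile  # import path is an assumption

@compile(
    tag="@username/greeting",           # hypothetical predictor tag
    description="Generate a greeting."  # decorator arguments are assumptions
)
def predict(name: str, excited: bool) -> str:
    # A module-level function with parameter and return type annotations.
    return f"Hello, {name}" + ("!" if excited else ".")
```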
Supported Parameter Types
Muna supports a fixed set of predictor input and output value types. Below are the supported type annotations:
Floating Point Values
Use the `float` built-in type, or the `numpy.float16`, `numpy.float32`, and `numpy.float64` types:
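For example (the function itself is illustrative):

```python
import numpy as np

def scale(value: float, factor: np.float32) -> np.float64:
    # Both the built-in `float` and `numpy` scalar types are valid annotations.
    return np.float64(value) * np.float64(factor)
```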
Integer Values
Use the `int` built-in type, or the `numpy.int8`, `numpy.int16`, `numpy.int32`, and `numpy.int64` types:
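For example (the function itself is illustrative):

```python
import numpy as np

def add(a: int, b: np.int32) -> np.int64:
    # Built-in `int` and `numpy` integer scalar types are both valid annotations.
    return np.int64(a) + np.int64(b)
```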
Boolean Values
Use the `bool` built-in type.
Tensor Values
Use the `numpy.typing.NDArray[T]` type, where `T` is the tensor element type:
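For example, a predictor that normalizes a tensor (the function itself is illustrative):

```python
import numpy as np
from numpy.typing import NDArray

def normalize(tensor: NDArray[np.float32]) -> NDArray[np.float32]:
    # `NDArray[T]` annotates a tensor whose element type is `T`.
    return tensor / np.linalg.norm(tensor)
```

The table below maps numpy data types to Muna data types.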
| Numpy data type | Muna data type |
|---|---|
| `np.float16` | `float16` |
| `np.float32` | `float32` |
| `np.float64` | `float64` |
| `np.int8` | `int8` |
| `np.int16` | `int16` |
| `np.int32` | `int32` |
| `np.int64` | `int64` |
| `np.uint8` | `uint8` |
| `np.uint16` | `uint16` |
| `np.uint32` | `uint32` |
| `np.uint64` | `uint64` |
| `bool` | `bool` |
String Values
Use the `str` built-in type.
List Values
Use the `list[T]` built-in type, where `T` is the element type. `T` is optional but strongly recommended, because it is used to generate a schema for the parameter or return value:
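For example (the function itself is illustrative):

```python
def tokenize(sentence: str) -> list[str]:
    # The element type `str` lets Muna generate a schema for the return value.
    return sentence.split()
```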
Dictionary Values
There are two ways to specify a dictionary parameter or return value, sketched below:
- Using a Pydantic `BaseModel` subclass.
- Using the `dict[str, T]` built-in type.
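A sketch of both approaches; the `Person` model and its fields are hypothetical:

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

def make_person(name: str, age: int) -> Person:
    # A Pydantic `BaseModel` subclass doubles as a schema for the output.
    return Person(name=name, age=age)

def count_words(sentence: str) -> dict[str, int]:
    # `dict[str, T]` works when the keys are not known ahead of time.
    counts: dict[str, int] = {}
    for word in sentence.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```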
Image Values
Use the `PIL.Image.Image` type:
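For example (the function itself is illustrative):

```python
from PIL import Image

def grayscale(image: Image.Image) -> Image.Image:
    # `PIL.Image.Image` annotates both image inputs and image outputs.
    return image.convert("L")
```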
Binary Values
There are a few ways to specify a binary parameter or return value, sketched below:
- Using the `bytes` built-in type.
- Using the `bytearray` built-in type.
- Using the `io.BytesIO` type.
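For example (the function itself is illustrative):

```python
import zlib
from io import BytesIO

def compress(data: bytes) -> BytesIO:
    # `bytes`, `bytearray`, and `io.BytesIO` are all valid binary annotations.
    return BytesIO(zlib.compress(data))
```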
Using Parameter Annotations
Muna supports attaching additional annotations to the function’s parameter and return types:
- They help users know what input data to provide to the predictor, and how to use output data from the predictor, via the parameter description.
- They help users search for predictors using highly detailed queries (e.g. MCP clients).
- They help the Muna client automatically provide familiar interfaces around your prediction function, e.g. with the OpenAI interface.
- They help the Muna website automatically create interactive visualizers for your prediction function.
Generic Annotation
Use the `Parameter.Generic` annotation to provide information about a general input or output parameter:
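A minimal sketch of the annotation pattern. The import path for `Parameter` and the `description` field are assumptions; consult the `Parameter.Generic` definition for the actual fields:

```python
from typing import Annotated
from muna.types import Parameter  # import path is an assumption

def summarize(
    text: Annotated[str, Parameter.Generic(description="Text to summarize.")]
) -> Annotated[str, Parameter.Generic(description="Generated summary.")]:
    # Illustrative body: a real predictor would run a summarization model here.
    return text[:200]
```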
Numeric Annotation
Use the `Parameter.Numeric` annotation to specify numeric input or output parameters:
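A minimal sketch; the `min` and `max` fields are hypothetical placeholders for whatever the `Parameter.Numeric` definition actually exposes:

```python
from typing import Annotated
from muna.types import Parameter  # import path is an assumption

def generate(
    prompt: str,
    temperature: Annotated[float, Parameter.Numeric(min=0.0, max=2.0)],  # fields are hypothetical
) -> str:
    ...
```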
Audio Annotation
Use the `Parameter.Audio` annotation to specify audio parameters.
Audio Speed Annotation
Use the `Parameter.AudioSpeed` annotation to specify audio speed parameters in audio generation predictors.
Audio Voice Annotation
Use the `Parameter.AudioVoice` annotation to specify audio voice parameters in audio generation predictors.
Bounding Box Annotation
Use the `Parameter.BoundingBox` or `Parameter.BoundingBoxes` annotations to specify bounding box parameters in object detection predictors.
Depth Map Annotation
Use the `Parameter.DepthMap` annotation to specify depth map parameters in depth estimation predictors.
Embedding Annotation
Use the `Parameter.Embedding` annotation to specify vector embedding parameters in embedding predictors:
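A minimal sketch; the import path, the no-argument constructor, and the function body are assumptions:

```python
from typing import Annotated
import numpy as np
from numpy.typing import NDArray
from muna.types import Parameter  # import path is an assumption

def embed(text: str) -> Annotated[NDArray[np.float32], Parameter.Embedding()]:
    # Illustrative body: a real predictor would run an embedding model here.
    return np.zeros(768, dtype=np.float32)
```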
Embedding Dimensions Annotation
Use the `Parameter.EmbeddingDims` annotation to specify an embedding Matryoshka dimension parameter in embedding predictors.
Writing the Function Body
The function body can contain arbitrary Python code. Because the Muna compiler is currently a proof of concept, it has limited coverage of Python language features. Below is a list of language features that are either partially supported or not supported at all:
Functions
| Statement | Status | Notes |
|---|---|---|
| Recursive functions | 🔨 | Recursive functions must have a return type annotation. |
| Lambda expressions | 🚧 | Lambda expressions can be invoked, but cannot be used as objects. |
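For example, per the note above, a recursive function must declare its return type:

```python
def factorial(n: int) -> int:
    # Recursive functions must have a return type annotation.
    return 1 if n <= 1 else n * factorial(n - 1)
```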
Literals
| Collection | Status | Notes |
|---|---|---|
| List literals | 🚧 | List must contain primitive members (e.g. int, str). |
| Dictionary literals | 🚧 | Dictionary must contain primitive members (e.g. int, str). |
| Set literals | 🚧 | Set must contain primitive members (e.g. int, str). |
| Tuple literals | 🚧 | Tuple must contain primitive members (e.g. int, str). |
Classes
Exceptions
| Statement | Status | Notes |
|---|---|---|
| `raise` statements | 🔨 | |
| `try..except` statements | 🔨 | |
Using Compiler Metadata
Muna’s compiler supports specifying metadata, allowing you to configure the compiler or provide additional information.
TensorRT Inference Metadata
Use the `TensorRTInferenceMetadata` metadata type to compile a PyTorch `nn.Module` to TensorRT:
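A minimal sketch, assuming the metadata type is importable from `muna.beta` (as `IREEInferenceMetadata` is below) and is passed to the compiler through a `metadata` argument; the decorator arguments and the `model`/`model_args` field names are assumptions:

```python
import numpy as np
import torch
from numpy.typing import NDArray
from muna import compile                         # import path is an assumption
from muna.beta import TensorRTInferenceMetadata  # import path is an assumption

model = torch.nn.Linear(4, 2).eval()
sample_input = torch.randn(1, 4)

@compile(
    tag="@username/linear",                      # hypothetical tag
    description="Run a linear layer with TensorRT.",
    metadata=[TensorRTInferenceMetadata(         # field names are assumptions
        model=model,
        model_args=[sample_input]
    )]
)
def predict(x: NDArray[np.float32]) -> NDArray[np.float32]:
    # Run the PyTorch module; the compiler lowers this call to TensorRT.
    with torch.inference_mode():
        return model(torch.from_numpy(x)).numpy()
```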
Target CUDA Architectures
TensorRT engines must be compiled for specific target CUDA architectures. Below are the CUDA architectures that our compiler supports:

| CUDA Architecture | GPU Family |
|---|---|
| `sm_80` | Ampere (e.g. A100) |
| `sm_86` | Ampere |
| `sm_87` | Ampere |
| `sm_89` | Ada Lovelace (e.g. L40S) |
| `sm_90` | Hopper (e.g. H100) |
| `sm_100` | Blackwell (e.g. B200) |
TensorRT Inference Precision
TensorRT allows for specifying the inference engine’s precision. Below are the supported precision modes:

| Precision | Notes |
|---|---|
| `fp32` | 32-bit single precision inference. |
| `fp16` | 16-bit half precision inference. |
| `int8` | 8-bit quantized integer inference. |
OnnxRuntime Inference Metadata
Use the `OnnxRuntimeInferenceMetadata` metadata type to compile a PyTorch `nn.Module` for inference with OnnxRuntime:
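The same pattern as the TensorRT sketch above, with the same assumptions about the import path and field names:

```python
import torch
from muna.beta import OnnxRuntimeInferenceMetadata  # import path is an assumption

model = torch.nn.Linear(4, 2).eval()
metadata = OnnxRuntimeInferenceMetadata(             # field names are assumptions
    model=model,
    model_args=[torch.randn(1, 4)]
)
```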
OnnxRuntime Inference Session Metadata
Use the `OnnxRuntimeInferenceSessionMetadata` metadata type to compile an OnnxRuntime `InferenceSession`.
CoreML Inference Metadata
Llama.cpp Inference Metadata
Use the `LlamaCppInferenceMetadata` metadata type to compile a `Llama` instance.
Llama.cpp Hardware Backends
Llama.cpp supports several hardware backends to accelerate model inference. Below are the backends that are currently supported by Muna:

| Backend | Notes |
|---|---|
| `cuda` | Nvidia CUDA backend. Linux only. |
ExecuTorch Inference Metadata
Use the `ExecuTorchInferenceMetadata` metadata type to compile a PyTorch `nn.Module` for inference with ExecuTorch.
ExecuTorch Hardware Backends
ExecuTorch supports several hardware backends to accelerate model inference. Below are the backends that are currently supported by Muna:

| Backend | Notes |
|---|---|
| `xnnpack` | XNNPACK CPU backend. Always enabled. |
| `vulkan` | Vulkan GPU backend. Only supported on Android. |
LiteRT Inference Metadata
TensorFlow Lite Interpreter Metadata
Use the `TFLiteInterpreterMetadata` metadata type to compile a TensorFlow Lite `Interpreter`.
QNN Inference Metadata
Use the `QnnInferenceMetadata` metadata type to compile a PyTorch `nn.Module` to a Qualcomm QNN context binary:
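A minimal sketch; the import path follows the `muna.beta` convention from the IREE section, and the `backend` and `quantization` field names are assumptions, with values taken from the tables below:

```python
import torch
from muna.beta import QnnInferenceMetadata  # import path is an assumption

model = torch.nn.Linear(4, 2).eval()
metadata = QnnInferenceMetadata(            # field names are assumptions
    model=model,
    model_args=[torch.randn(1, 4)],
    backend="htp",                          # Hexagon NPU backend (see table below)
    quantization="w8a16"                    # required when using the `htp` backend
)
```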
QNN Hardware Backends
QNN requires that a hardware backend is specified ahead of time. Below are the supported backends:

| Backend | Notes |
|---|---|
| `cpu` | Reference aarch64 CPU backend. |
| `gpu` | Adreno GPU backend, accelerated by OpenCL. |
| `htp` | Hexagon NPU backend. |
QNN Model Quantization
When using the `htp` backend, you must specify a model quantization mode, as the Hexagon NPU only supports running integer-quantized models. Below are the supported quantization modes:

| Quantization | Notes |
|---|---|
| `w8a8` | Weights and activations are quantized to uint8. |
| `w8a16` | Weights are quantized to uint8 while activations are quantized to uint16. |
| `w4a8` | Weights are quantized to uint4 while activations are quantized to uint8. |
| `w4a16` | Weights are quantized to uint4 while activations are quantized to uint16. |
OpenVINO Inference Metadata
Use the `OpenVINOInferenceMetadata` metadata type to compile a PyTorch `nn.Module` to OpenVINO IR. OpenVINO inference is only supported on x86_64 devices with Intel processors.
IREE Inference Metadata
Use the `muna.beta.IREEInferenceMetadata` metadata type to compile a PyTorch `nn.Module` for inference with IREE:
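A minimal sketch; the import path comes from the text above, while the `target_backends` field name is an assumption, with its value taken from the HAL target table below:

```python
import torch
from muna.beta import IREEInferenceMetadata

model = torch.nn.Linear(4, 2).eval()
metadata = IREEInferenceMetadata(       # field names are assumptions
    model=model,
    model_args=[torch.randn(1, 4)],
    target_backends=["vulkan"]          # see the HAL target table below
)
```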
IREE HAL Target Backends
IREE supports several HAL target backends that the model can be compiled against. Below are the targets that are currently supported by Muna:

| Target | Notes |
|---|---|
| `vulkan` | Vulkan GPU backend. Only supported on Android. |
MIGraphX Inference Metadata
Library Coverage
We are adding support for popular libraries across tensor frameworks, scientific computing, and more.
Supported Libraries