Muna supports compiling a tiny-but-growing subset of Python language constructs. Below are requirements and guidelines for compiling a Python function with Muna:

Specifying the Function Signature

The prediction function must be a module-level function, and must have parameter and return type annotations:
from muna import compile

@compile(...)
def greeting(name: str) -> str:
    return f"Hello {name}"
The prediction function must not have any variable-length positional or keyword arguments.
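For example, a signature like the following would be rejected (an illustrative sketch; greet_all is a hypothetical function):
from muna import compile

# Not supported: variable-length positional or keyword arguments (illustrative)
@compile(...)
def greet_all(*names: str, **options: str) -> str:
    return ", ".join(names)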

Supported Parameter Types

Muna supports a fixed set of predictor input and output value types. Below are supported type annotations:
Floating-point input and return values should be annotated with the float built-in type.
from muna import compile

@compile(...)
def square(number: float) -> float:
    return number ** 2
Unlike Python, which defaults to 64-bit floats, Muna always lowers a Python float to 32 bits.
For control over the binary width of the number, use the numpy.float[16,32,64] types:
from muna import compile
import numpy as np

@compile(...)
def square(number: np.float64) -> np.float64:
    return number ** 2
Integer input and return values should be annotated with the int built-in type.
from muna import compile

@compile(...)
def square(number: int) -> int:
    return number ** 2
Unlike Python, which supports arbitrary-precision integers, Muna always lowers a Python int to 32 bits.
For control over the binary width of the integer, use the numpy.int[8,16,32,64] types:
from muna import compile
import numpy as np

@compile(...)
def square(number: np.int16) -> np.int16:
    return number ** 2
Boolean input and return values must be annotated with the bool built-in type.
from muna import compile

@compile(...)
def invert(on: bool) -> bool:
    return not on
Tensor input and return values must be annotated with the NumPy numpy.typing.NDArray[T] type, where T is the tensor element type.
from muna import compile
import numpy as np
from numpy.typing import NDArray

@compile(...)
def cholesky_decompose(tensor: NDArray[np.float64]) -> np.ndarray:
    return np.linalg.cholesky(tensor).astype("float32")
You can also annotate with the np.ndarray type, but doing so will always assume a float32 element type (following PyTorch semantics).
Below are the supported element types:
Numpy data type | Muna data type
np.float16 | float16
np.float32 | float32
np.float64 | float64
np.int8 | int8
np.int16 | int16
np.int32 | int32
np.int64 | int64
np.uint8 | uint8
np.uint16 | uint16
np.uint32 | uint32
np.uint64 | uint64
bool | bool
Muna does not yet support complex numbers, whether as scalars or as tensor element types.
Muna only supports, and will always assume, little-endian ordering for multi-byte element types.
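If your data originates in big-endian form, convert it to little-endian before returning it. Below is a minimal sketch using standard NumPy conversions (the function and data are illustrative):
from muna import compile
import numpy as np
from numpy.typing import NDArray

@compile(...)
def load_samples() -> NDArray[np.float32]:
    # Suppose the samples were produced with a big-endian dtype (illustrative)
    samples = np.arange(16, dtype=np.dtype(">f4"))
    # Re-encode as little-endian float32 before returning
    return samples.astype(np.dtype("<f4"))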
String input and return values must be annotated with the str built-in type.
from muna import compile

@compile(...)
def uppercase(text: str) -> str:
    return text.upper()
List input and return values must be annotated with the list[T] built-in type, where T is the element type.
from muna import compile

@compile(...)
def slice(items: list[str]) -> list[str]:
    return items[:3]
Providing an element type T is optional but strongly recommended, as it is used to generate a schema for the parameter or return value.
When the list element type T is a Pydantic BaseModel, a full JSON schema will be generated, as in the sketch below.
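For instance, a list of Pydantic models can be annotated directly (a minimal sketch; the Item model and take_items function are illustrative):
from muna import compile
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float

@compile(...)
def take_items(items: list[Item]) -> list[Item]:
    # Keep only the first three items (illustrative)
    return items[:3]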
Dictionary input and return values can be annotated in one of two ways:
  1. Using a Pydantic BaseModel subclass.
  2. Using the dict[str, T] built-in type.
from muna import compile
from pydantic import BaseModel
from typing import Literal

class Person(BaseModel):
    city: str
    age: int

class Pet(BaseModel):
    sound: Literal["bark", "meow"]
    legs: int

@compile(...)
def choose_favorite_pet(person: Person) -> Pet:
    return Pet(sound="meow", legs=6)
We strongly recommend the Pydantic BaseModel annotation, as it allows us to generate a full JSON schema.
When using the dict annotation, the key type must be str. The value type T can be any arbitrary type.
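For instance, a dictionary with primitive values can be annotated directly (a minimal sketch; the describe function is illustrative):
from muna import compile

@compile(...)
def describe(age: int) -> dict[str, int]:
    # Return a plain dictionary with primitive values (illustrative)
    return { "age": age, "age_in_months": age * 12 }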
Image input and return values must be annotated with the Pillow PIL.Image.Image type.
from muna import compile
from PIL import Image

@compile(...)
def resize(image: Image.Image) -> Image.Image:
    return image.resize((512, 512))
Binary input and return values can be annotated in one of three ways:
  1. Using the bytes built-in type.
  2. Using the bytearray built-in type.
  3. Using the io.BytesIO type.
from muna import compile
from PIL import Image

@compile(...)
def resize_pixels(pixels: bytes) -> bytes:
    return Image.frombytes("L", (4,4), pixels).resize((8,8)).tobytes()
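The io.BytesIO annotation works the same way; below is a minimal sketch (the encode_thumbnail function is illustrative):
from muna import compile
from io import BytesIO
from PIL import Image

@compile(...)
def encode_thumbnail(image: Image.Image) -> BytesIO:
    # Encode a resized copy of the image as PNG bytes (illustrative)
    buffer = BytesIO()
    image.resize((64, 64)).save(buffer, format="PNG")
    return buffer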

Using Parameter Annotations

Muna supports attaching additional annotations to the function’s parameter and return types:
from muna import compile, Parameter
from typing import Annotated

@compile(...)
def area(
    radius: Annotated[
        float,
        Parameter.Generic(description="Radius of the circle.")
    ]
) -> Annotated[
    float,
    Parameter.Generic(description="Area of the circle.")
]:
    ...
These annotations serve multiple important purposes:
  • They help users know what input data to provide to the predictor and how to use output data from the predictor, via the parameter description.
  • They help users search for predictors using highly detailed queries (e.g. MCP clients).
  • They help the Muna client automatically provide familiar interfaces around your prediction function, e.g. with the OpenAI interface.
  • They help the Muna website automatically create interactive visualizers for your prediction function.
While not required, we highly recommend using parameter annotations on your compiled functions.
Below are currently supported annotations:
Use the Parameter.Generic annotation to provide information about a general input or output parameter:
predictor.py
from muna import compile, Parameter
from typing import Annotated

@compile(...)
def area(
    radius: Annotated[
        float,
        Parameter.Generic(description="Radius of the circle.")
    ]
) -> float:
    ...
Below is the full Parameter.Generic annotation definition:
@classmethod
def Generic(
    cls,
    *,
    description: str  # Parameter description.
) -> Parameter: ...
Use the Parameter.Numeric annotation to specify numeric input or output parameters:
calculate_area.py
from muna import compile, Parameter
from typing import Annotated

@compile(...)
def area(
    radius: Annotated[
        float,
        Parameter.Numeric(
            description="Circle radius.",
            min=1.,
            max=12.
        )
    ]
) -> float:
    ...
Below is the full Parameter.Numeric annotation definition:
@classmethod
def Numeric(
    cls,
    *,
    description: str,         # Parameter description.
    min: float | None=None,   # Minimum value.
    max: float | None=None    # Maximum value.
) -> Parameter: ...
Use the Parameter.Audio annotation to specify audio parameters:
transcribe_audio.py
from muna import compile, Parameter
from numpy import ndarray
from typing import Annotated

@compile(...)
def transcribe_audio(
    audio: Annotated[
        ndarray,
        Parameter.Audio(
            description="Input audio.",
            sample_rate=24_000
        )
    ]
) -> str:
    ...
The Parameter.Audio annotation allows the compiled predictor to be used by our OpenAI speech client.
Below is the full Parameter.Audio annotation definition:
@classmethod
def Audio(
    cls,
    *,
    description: str, # Parameter description.
    sample_rate: int  # Audio sample rate in Hertz.
) -> Parameter: ...
Use the Parameter.AudioSpeed annotation to specify audio speed parameters in audio generation predictors:
generate_speech.py
from muna import compile, Parameter
from numpy import ndarray
from typing import Annotated

@compile(...)
def generate_speech(
    text: str,
    speed: Annotated[
        float,
        Parameter.AudioSpeed(
            description="The speed of the generated audio.",
            min=0.25,
            max=4.0
        )
    ] = 1.0
) -> ndarray:
    ...
Below is the full Parameter.AudioSpeed annotation definition:
@classmethod
def AudioSpeed(
    cls,
    *,
    description: str,       # Parameter description.
    min: float | None=None, # Minimum audio speed.
    max: float | None=None  # Maximum audio speed.
) -> Parameter: ...
Use the Parameter.AudioVoice annotation to specify audio voice parameters in audio generation predictors:
generate_speech.py
from muna import compile, Parameter
from numpy import ndarray
from typing import Annotated, Literal

Voice = Literal["almas", "parv", "rhea", "sam"]

@compile(...)
def generate_speech(
    text: str,
    voice: Annotated[
        Voice,
        Parameter.AudioVoice(description="Voice to use when generating audio.")
    ],
    speed: float=1.0
) -> ndarray:
    ...
Below is the full Parameter.AudioVoice annotation definition:
@classmethod
def AudioVoice(
    cls,
    *,
    description: str    # Parameter description.
) -> Parameter: ...
Use the Parameter.BoundingBox or Parameter.BoundingBoxes annotations to specify bounding box parameters in object detection predictors:
from muna import compile, Parameter
from pydantic import BaseModel
from PIL import Image
from typing import Annotated

# Example detection schema; the fields shown here are illustrative
class Detection(BaseModel):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

@compile(...)
def detect_object(
    image: Image.Image
) -> Annotated[
    Detection,
    Parameter.BoundingBox(description="Detected object.")
]:
    ...
Below is the full Parameter.BoundingBox annotation definition:
@classmethod
def BoundingBox(
    cls,
    *,
    description: str    # Parameter description.
) -> Parameter: ...
Use the Parameter.DepthMap annotation to specify depth map parameters in depth estimation predictors:
estimate_depth.py
from muna import compile, Parameter
from numpy import ndarray
from PIL import Image
from typing import Annotated

@compile(...)
def estimate_depth(
    image: Image.Image
) -> Annotated[
    ndarray,
    Parameter.DepthMap(description="Metric depth tensor.")
]:
    ...
Below is the full Parameter.DepthMap annotation definition:
@classmethod
def DepthMap(
    cls,
    *,
    description: str    # Parameter description.
) -> Parameter: ...
Use the Parameter.Embedding annotation to specify vector embedding parameters in embedding predictors:
embed_text.py
from muna import compile, Parameter
from numpy import ndarray
from typing import Annotated

@compile(...)
def embed_text(
    text: str
) -> Annotated[
    ndarray,
    Parameter.Embedding(description="Embedding vector.")
]:
    ...
The Parameter.Embedding annotation allows the compiled predictor to be used by our OpenAI embedding client.
Below is the full Parameter.Embedding annotation definition:
@classmethod
def Embedding(
    cls,
    *,
    description: str  # Parameter description.
) -> Parameter: ...
Use the Parameter.EmbeddingDims annotation to specify an embedding Matryoshka dimension parameter in embedding predictors:
embed_text.py
from muna import compile, Parameter
from numpy import ndarray
from typing import Annotated

@compile(...)
def embed_text(
    text: str,
    dims: Annotated[
        int,
        Parameter.EmbeddingDims(description="Embedding dimensions.")
    ]
) -> ndarray:
    ...
Below is the full Parameter.EmbeddingDims annotation definition:
@classmethod
def EmbeddingDims(
    cls,
    *,
    description: str,       # Parameter description.
    min: int | None=None,   # Minimum embedding dimensions.
    max: int | None=None    # Maximum embedding dimensions.
) -> Parameter: ...

Writing the Function Body

The function body can contain arbitrary Python code. Given that the Muna compiler is currently a proof of concept, it has limited coverage for Python language features. Below is a list of Python language features that we either partially support, or do not support at all:
Statement | Status | Notes
Recursive functions | 🔨 | Recursive functions must have a return type annotation.
Lambda expressions | 🚧 | Lambda expressions can be invoked, but cannot be used as objects.
Collection | Status | Notes
List literals | 🚧 | List must contain primitive members (e.g. int, str).
Dictionary literals | 🚧 | Dictionary must contain primitive members (e.g. int, str).
Set literals | 🚧 | Set must contain primitive members (e.g. int, str).
Tuple literals | 🚧 | Tuple must contain primitive members (e.g. int, str).
Tracing through classes is not yet supported.
Statement | Status
raise statements | 🔨
try..except statement | 🔨
Over time, the list of unsupported language features will shrink and eventually be empty.
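As a rough illustration, the following function stays within the partial support described above — a list literal with primitive members and a directly invoked lambda expression (the combine function is an illustrative sketch):
from muna import compile

@compile(...)
def combine(a: int, b: int) -> list[int]:
    # Lambda expressions can be invoked directly, but not stored or passed as objects
    total = (lambda x, y: x + y)(a, b)
    # List literals with primitive members are supported
    return [a, b, total]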

Using Compiler Metadata

Muna’s compiler supports specifying metadata, allowing you to configure the compiler or provide additional information.
Use the TensorRTInferenceMetadata metadata type to compile a PyTorch nn.Module to TensorRT:
ai.py
from muna import compile
from muna.beta import TensorRTInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use TensorRT for model inference
        TensorRTInferenceMetadata(
            model=model,
            model_args=example_args,
            cuda_arch="sm_100",
            precision="int8"
        )
    ]
)
def predict() -> None:
    pass
The TensorRT inference backend is only available on Linux and Windows devices with compatible Nvidia GPUs.
We are working on adding support for consumer RTX GPUs with TensorRT for RTX.

Target CUDA Architectures

TensorRT engines must be compiled for specific target CUDA architectures. Below are CUDA architectures that our compiler supports:
CUDA Architecture | GPU Family
sm_80 | Ampere (e.g. A100)
sm_86 | Ampere
sm_87 | Ampere
sm_89 | Ada Lovelace (e.g. L40S)
sm_90 | Hopper (e.g. H100)
sm_100 | Blackwell (e.g. B200)

TensorRT Inference Precision

TensorRT allows for specifying the inference engine’s precision. Below are supported precision modes:
Precision | Notes
fp32 | 32-bit single precision inference.
fp16 | 16-bit half precision inference.
int8 | 8-bit quantized integer inference.
Use the OnnxRuntimeInferenceMetadata metadata type to compile a PyTorch nn.Module for inference with ONNXRuntime:
ai.py
from muna import compile
from muna.beta import OnnxRuntimeInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use ONNXRuntime for model inference
        OnnxRuntimeInferenceMetadata(
            model=model,
            model_args=example_args
        )
    ]
)
def predict() -> None:
    pass
Use the OnnxRuntimeInferenceSessionMetadata metadata type to compile an OnnxRuntime InferenceSession:
ai.py
from muna import compile
from muna.beta import OnnxRuntimeInferenceSessionMetadata
from onnxruntime import InferenceSession

# Given an ONNXRuntime inference session...
model_path = "/path/to/model.onnx"
session = InferenceSession(model_path)

@compile(
    ...,
    metadata=[
        # Use ONNXRuntime for model inference
        OnnxRuntimeInferenceSessionMetadata(
            session=session,
            model_path=model_path
        )
    ]
)
def predict(...) -> None:
    pass
The ONNX model file must exist at the provided model_path within the compiler sandbox.
Use the CoreMLInferenceMetadata metadata type to compile a PyTorch nn.Module to CoreML:
ai.py
from muna import compile
from muna.beta import CoreMLInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use CoreML for model inference
        CoreMLInferenceMetadata(
            model=model,
            model_args=example_args
        )
    ]
)
def predict() -> None:
    pass
The CoreML inference backend is only available on iOS, macOS, and visionOS devices.
Use the LlamaCppInferenceMetadata metadata type to compile a Llama instance:
llm.py
from muna import compile
from muna.beta import LlamaCppInferenceMetadata
from llama_cpp import Llama

# Given an LLM
llm = Llama(...)

@compile(
    ...,
    metadata=[
        # Specify Llama.cpp inference metadata
        LlamaCppInferenceMetadata(
            model=llm,
            backends=["cuda"]
        )
    ]
)
def predict() -> None:
    pass

Llama.cpp Hardware Backends

Llama.cpp supports several hardware backends to accelerate model inference. Below are targets that are currently supported by Muna:
Backend | Notes
cuda | Nvidia CUDA backend. Linux only.
Use the ExecuTorchInferenceMetadata metadata type to compile a PyTorch nn.Module for inference with ExecuTorch:
ai.py
from muna import compile
from muna.beta import ExecuTorchInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use ExecuTorch for model inference
        ExecuTorchInferenceMetadata(
            model=model,
            model_args=example_args,
            backend="xnnpack"
        )
    ]
)
def predict() -> None:
    pass
The ExecuTorch inference backend is only available on Android.

ExecuTorch Hardware Backends

ExecuTorch supports several hardware backends to accelerate model inference. Below are targets that are currently supported by Muna:
Backend | Notes
xnnpack | XNNPACK CPU backend. Always enabled.
vulkan | Vulkan GPU backend. Only supported on Android.
Use the LiteRTInferenceMetadata metadata type to compile a PyTorch nn.Module for inference with LiteRT:
ai.py
from muna import compile
from muna.beta import LiteRTInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use LiteRT for model inference
        LiteRTInferenceMetadata(
            model=model,
            model_args=example_args
        )
    ]
)
def predict() -> None:
    pass
Use the TFLiteInterpreterMetadata metadata type to compile a TensorFlow Lite Interpreter:
ai.py
from muna import compile
from muna.beta import TFLiteInterpreterMetadata
from tensorflow import lite

# Given a TFLite interpreter...
model_path = "/path/to/model.tflite"
interpreter = lite.Interpreter(model_path)

@compile(
    ...,
    metadata=[
        # Use TensorFlow Lite for model inference
        TFLiteInterpreterMetadata(
            interpreter=interpreter,
            model_path=model_path
        )
    ]
)
def predict(...) -> None:
    pass
The TensorFlow Lite model file must exist at the provided model_path within the compiler sandbox.
Use the QnnInferenceMetadata metadata type to compile a PyTorch nn.Module to a Qualcomm QNN context binary:
ai.py
from muna import compile
from muna.beta import QnnInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use QNN for model inference
        QnnInferenceMetadata(
            model=model,
            model_args=example_args,
            backend="gpu",
            quantization=None
        )
    ]
)
def predict() -> None:
    pass
The QNN inference backend is only available on Android and Windows devices with Qualcomm processors.

QNN Hardware Backends

QNN requires that a hardware device backend is specified ahead of time. Below are supported backends:
Backend | Notes
cpu | Reference aarch64 CPU backend.
gpu | Adreno GPU backend, accelerated by OpenCL.
htp | Hexagon NPU backend.
Learn more about QNN hardware backends.

QNN Model Quantization

When using the htp backend, you must specify a model quantization mode as the Hexagon NPU only supports running integer-quantized models. Below are supported quantization modes:
Quantization | Notes
w8a8 | Weights and activations are quantized to uint8.
w8a16 | Weights are quantized to uint8 while activations are quantized to uint16.
w4a8 | Weights are quantized to uint4 while activations are quantized to uint8.
w4a16 | Weights are quantized to uint4 while activations are quantized to uint16.
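For example, targeting the Hexagon NPU with w8a16 quantization might look like the following sketch, based on the QnnInferenceMetadata fields shown above (the exact value passed for quantization is an assumption):
from muna import compile
from muna.beta import QnnInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Target the Hexagon NPU with uint8 weights and uint16 activations (assumed value format)
        QnnInferenceMetadata(
            model=model,
            model_args=example_args,
            backend="htp",
            quantization="w8a16"
        )
    ]
)
def predict() -> None:
    pass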
Use the OpenVINOInferenceMetadata metadata type to compile a PyTorch nn.Module to OpenVINO IR:
ai.py
from muna import compile
from muna.beta import OpenVINOInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use OpenVINO for model inference
        OpenVINOInferenceMetadata(
            model=model,
            model_args=example_args
        )
    ]
)
def predict() -> None:
    pass
At runtime, the OpenVINO IR will be used for inference with the OpenVINO toolkit.
The OpenVINO inference backend is only available on Linux and Windows x86_64 devices with Intel processors.
Use the muna.beta.IREEInferenceMetadata metadata type to compile a PyTorch nn.Module for inference with IREE:
ai.py
from muna import compile
from muna.beta import IREEInferenceMetadata
from torch import randn, Tensor
from torch.nn import Module

# Given a PyTorch model...
model: Module = ...
# With some example arguments...
example_args: list[Tensor] = [randn(1, 3, 224, 224)]

@compile(
    ...,
    metadata=[
        # Use IREE for model inference
        IREEInferenceMetadata(
            model=model,
            model_args=example_args,
            backend="vulkan"
        )
    ]
)
def predict() -> None:
    pass
The IREE inference backend is only available on Android devices.

IREE HAL Target Backends

IREE supports several HAL target backends that the model can be compiled against. Below are targets that are currently supported by Muna:
Target | Notes
vulkan | Vulkan GPU backend. Only supported on Android.
Coming soon 🤫.

Library Coverage

We are adding support for popular libraries across tensor frameworks, scientific computing, and more.
If you need a specific library to be supported by the Muna compiler, reach out to us.