Muna supports deploying compiled models to your GPU fleets, whether on-prem or from third-party GPU platforms. Compiled models can yield up to 45x reductions in cold-start times, along with reduced latency and higher utilization.
Deploying to Modal
Use the muna deploy CLI command to deploy a compiled model to Modal:
# Deploy a compiled LLM to Modal
$ muna deploy @google/gemma-4-26b-a4b-it --provider modal --gpu b200
This command will create a lightweight app on Modal that runs an OpenAI-compatible web server. This server then forwards requests to the compiled model.
This command requires the modal package to be installed. Run pip install modal.
Deploying to Baseten
Use the muna deploy CLI command to deploy a compiled model to Baseten:
# Deploy a compiled LLM to Baseten
$ muna deploy @google/gemma-4-26b-a4b-it --provider baseten --gpu b200
This command will create and deploy a lightweight service on Baseten that runs an OpenAI-compatible web server. This server then forwards requests to the compiled model.
This command requires the truss package to be installed. Run pip install truss.
Generating a Truss Config
The muna deploy CLI command supports emitting a Truss config without deploying the app. To generate a Truss config, use the --dry-run flag:
# Generate a Truss config for a compiled model
$ muna deploy @google/gemma-4-26b-a4b-it --provider baseten --dry-run