Muna can adaptively search for the best hardware to run models on, depending on your cost, latency, and throughput requirements. Use the `muna.predictions.create` method and specify your constraints in natural language:
```js
// 🔥 Run inference with the lowest latency
const prediction = await muna.predictions.create({
  tag: "@openai/gpt-oss-120b",
  inputs: { messages },
  acceleration: "lowest latency"
});
```
This feature is in early alpha and is offered only to select teams.
Request access on our Slack.
Use the `muna.predictions.create` method to run inference locally:
```js
// 🔥 Run inference with the local NPU
const prediction = await muna.predictions.create({
  tag: "@bytedance/depth-anything-3",
  inputs: { image },
  acceleration: "local_npu"
});
```
Some Muna clients allow you to specify the acceleration device used to make predictions.
Our clients expose this field as an untyped integer or pointer; the underlying type depends on the current operating system:
The prediction device is merely a hint: setting a device does not guarantee that any, let alone all, operations in the prediction function will actually run on that device.
You should absolutely never set this unless you know exactly what you're doing.
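As a rough illustration of the untyped field described above, a client could carry the value as either an integer handle or a raw pointer address. This is a hypothetical sketch: the helper name `describeAccelerationDevice` and the per-OS interpretations in the comments are assumptions for illustration, not part of the documented Muna API.

```js
// Hypothetical sketch — names and types here are illustrative only.
// The device value is an untyped integer or pointer whose meaning is
// OS-specific (e.g. a GPU device pointer on one platform, an opaque
// integer handle on another).
function describeAccelerationDevice (device) {
  if (typeof device === "bigint")      // raw pointer address
    return `pointer 0x${device.toString(16)}`;
  if (Number.isInteger(device))        // opaque integer handle
    return `handle ${device}`;
  throw new TypeError("device must be an integer handle or a pointer address");
}
```

Because the field is untyped, it is on you to pass a value whose underlying type matches what the current operating system expects.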