
Runtime#

Runtime module for Focoos models

This module provides the functionality needed to load, preprocess, run inference with, and benchmark ONNX and TorchScript models using different execution providers such as CUDA, TensorRT, and CPU. It also includes utility functions for image preprocessing and postprocessing, and for interfacing with the ONNX Runtime and TorchScript libraries.
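
The typical entry point is load_runtime, which selects the backend and wraps the exported model. A minimal usage sketch (the model path is a placeholder and the ModelMetadata construction is omitted, since it is defined outside this module):

```python
import numpy as np

from focoos.runtime import RuntimeTypes, load_runtime

# Placeholder path; metadata must be a ModelMetadata instance describing the exported model.
model_path = "./model.onnx"
metadata = ...  # construction/loading of ModelMetadata is outside this module

runtime = load_runtime(
    RuntimeTypes.ONNX_CPU,  # or ONNX_CUDA32, ONNX_TRT16, TORCHSCRIPT_32, ...
    model_path,
    metadata,
    warmup_iter=2,          # optional warmup passes at load time
)

# The runtimes documented below expect a batched NCHW array (dtype depends on the export).
image = np.random.randint(0, 255, (1, 3, 640, 640), dtype=np.uint8)
outputs = runtime(image)    # raw model outputs, to be fed to det_postprocess / semseg_postprocess
```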

Functions:

| Name | Description |
| --- | --- |
| `det_postprocess` | Postprocesses detection model outputs into sv.Detections. |
| `semseg_postprocess` | Postprocesses semantic segmentation model outputs into sv.Detections. |
| `load_runtime` | Returns an ONNXRuntime or TorchscriptRuntime instance configured for the given runtime type. |

Classes:

| Name | Description |
| --- | --- |
| `RuntimeTypes` | Enum for the different runtime types. |
| `ONNXRuntime` | A class that interfaces with ONNX Runtime for model inference. |
| `TorchscriptRuntime` | A class that interfaces with TorchScript for model inference. |

BaseRuntime #

Abstract base class for runtime implementations.

This class defines the interface that all runtime implementations must follow. It provides methods for model initialization, inference, and performance benchmarking.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `model_path` | str | Path to the model file. |
| `opts` | Any | Runtime-specific options. |
| `model_metadata` | ModelMetadata | Metadata about the model. |
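
A concrete backend fills in `__init__`, `__call__`, and `benchmark`. Purely as an illustration of the contract (this echo backend is hypothetical and does no real inference):

```python
import numpy as np

from focoos.runtime import BaseRuntime


class EchoRuntime(BaseRuntime):
    """Hypothetical backend that returns its input unchanged, only to illustrate the interface."""

    def __init__(self, model_path, opts, model_metadata):
        self.model_path = model_path
        self.opts = opts
        self.model_metadata = model_metadata

    def __call__(self, im: np.ndarray) -> np.ndarray:
        return im  # a real backend runs the model here

    def benchmark(self, iterations=20, size=640):
        raise NotImplementedError("benchmarking is backend-specific")
```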

Source code in focoos/runtime.py
class BaseRuntime:
    """
    Abstract base class for runtime implementations.

    This class defines the interface that all runtime implementations must follow.
    It provides methods for model initialization, inference, and performance benchmarking.

    Attributes:
        model_path (str): Path to the model file.
        opts (Any): Runtime-specific options.
        model_metadata (ModelMetadata): Metadata about the model.
    """

    def __init__(self, model_path: str, opts: Any, model_metadata: ModelMetadata):
        """
        Initialize the runtime with model path, options and metadata.

        Args:
            model_path (str): Path to the model file.
            opts (Any): Runtime-specific configuration options.
            model_metadata (ModelMetadata): Metadata about the model.
        """
        pass

    @abstractmethod
    def __call__(self, im: np.ndarray) -> np.ndarray:
        """
        Run inference on the input image.

        Args:
            im (np.ndarray): Input image as a numpy array.

        Returns:
            np.ndarray: Model output as a numpy array.
        """
        pass

    @abstractmethod
    def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
        """
        Benchmark the model performance.

        Args:
            iterations (int, optional): Number of inference iterations to run. Defaults to 20.
            size (int, optional): Input image size for benchmarking. Defaults to 640.

        Returns:
            LatencyMetrics: Performance metrics including mean, median, and percentile latencies.
        """
        pass

__call__(im) abstractmethod #

Run inference on the input image.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `im` | ndarray | Input image as a numpy array. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Model output as a numpy array. |

Source code in focoos/runtime.py
@abstractmethod
def __call__(self, im: np.ndarray) -> np.ndarray:
    """
    Run inference on the input image.

    Args:
        im (np.ndarray): Input image as a numpy array.

    Returns:
        np.ndarray: Model output as a numpy array.
    """
    pass

__init__(model_path, opts, model_metadata) #

Initialize the runtime with model path, options and metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_path` | str | Path to the model file. | required |
| `opts` | Any | Runtime-specific configuration options. | required |
| `model_metadata` | ModelMetadata | Metadata about the model. | required |
Source code in focoos/runtime.py
def __init__(self, model_path: str, opts: Any, model_metadata: ModelMetadata):
    """
    Initialize the runtime with model path, options and metadata.

    Args:
        model_path (str): Path to the model file.
        opts (Any): Runtime-specific configuration options.
        model_metadata (ModelMetadata): Metadata about the model.
    """
    pass

benchmark(iterations=20, size=640) abstractmethod #

Benchmark the model performance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `iterations` | int | Number of inference iterations to run. | 20 |
| `size` | int | Input image size for benchmarking. | 640 |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `LatencyMetrics` | LatencyMetrics | Performance metrics including mean, median, and percentile latencies. |

Source code in focoos/runtime.py
@abstractmethod
def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
    """
    Benchmark the model performance.

    Args:
        iterations (int, optional): Number of inference iterations to run. Defaults to 20.
        size (int, optional): Input image size for benchmarking. Defaults to 640.

    Returns:
        LatencyMetrics: Performance metrics including mean, median, and percentile latencies.
    """
    pass

ONNXRuntime #

Bases: BaseRuntime

ONNX Runtime wrapper for model inference with different execution providers.

This class implements the BaseRuntime interface for ONNX models, supporting various execution providers like CUDA, TensorRT, OpenVINO, and CoreML. It handles model initialization, provider configuration, warmup, inference, and performance benchmarking.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | str | Name of the model, derived from the model path. |
| `opts` | OnnxRuntimeOpts | Configuration options for the ONNX runtime. |
| `model_metadata` | ModelMetadata | Metadata about the model. |
| `ort_sess` | InferenceSession | ONNX Runtime inference session. |
| `active_providers` | list | List of active execution providers. |
| `dtype` | dtype | Input data type for the model. |
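
ONNXRuntime instances are normally created through load_runtime, but direct construction is possible when you already have an OnnxRuntimeOpts. A sketch under the assumption that OnnxRuntimeOpts is importable from focoos.ports (the field names match the source below; the model path and metadata are placeholders):

```python
from focoos.ports import OnnxRuntimeOpts  # import path is an assumption
from focoos.runtime import ONNXRuntime

opts = OnnxRuntimeOpts(
    cuda=True,       # try the CUDA execution provider first
    trt=False,       # skip the TensorRT provider (and its engine build)
    fp16=False,
    coreml=False,
    warmup_iter=3,   # warmup passes run at construction time
    verbose=False,
)

# metadata: a ModelMetadata instance for the exported model (construction omitted)
runtime = ONNXRuntime("./model.onnx", opts, metadata)
```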

Source code in focoos/runtime.py
class ONNXRuntime(BaseRuntime):
    """
    ONNX Runtime wrapper for model inference with different execution providers.

    This class implements the BaseRuntime interface for ONNX models, supporting
    various execution providers like CUDA, TensorRT, OpenVINO, and CoreML.
    It handles model initialization, provider configuration, warmup, inference,
    and performance benchmarking.

    Attributes:
        name (str): Name of the model derived from the model path.
        opts (OnnxRuntimeOpts): Configuration options for the ONNX runtime.
        model_metadata (ModelMetadata): Metadata about the model.
        ort_sess (ort.InferenceSession): ONNX Runtime inference session.
        active_providers (list): List of active execution providers.
        dtype (np.dtype): Input data type for the model.
    """

    def __init__(self, model_path: str, opts: OnnxRuntimeOpts, model_metadata: ModelMetadata):
        self.logger = get_logger()

        self.logger.debug(f"🔧 [onnxruntime device] {ort.get_device()}")

        self.name = Path(model_path).stem
        self.opts = opts
        self.model_metadata = model_metadata

        # Setup session options
        options = ort.SessionOptions()
        options.log_severity_level = 0 if opts.verbose else 2
        options.enable_profiling = opts.verbose

        # Setup providers
        self.providers = self._setup_providers(model_dir=Path(model_path).parent)
        self.active_provider = self.providers[0][0]
        self.logger.info(f"[onnxruntime] using: {self.active_provider}")
        # Create session
        self.ort_sess = ort.InferenceSession(model_path, options, providers=self.providers)

        if self.opts.trt and self.providers[0][0] == "TensorrtExecutionProvider":
            self.logger.info(
                "🟢 [onnxruntime] TensorRT enabled. First execution may take longer as it builds the TRT engine."
            )
        # Set input type
        self.dtype = np.uint8 if self.ort_sess.get_inputs()[0].type == "tensor(uint8)" else np.float32

        # Warmup
        if self.opts.warmup_iter > 0:
            self._warmup()

    def _setup_providers(self, model_dir: str):
        providers = []
        available = ort.get_available_providers()
        self.logger.info(f"[onnxruntime] available providers:{available}")
        _dir = Path(model_dir)
        models_root = _dir.parent
        # Check and add providers in order of preference
        provider_configs = [
            (
                "TensorrtExecutionProvider",
                self.opts.trt,
                {
                    "device_id": GPU_ID,
                    "trt_fp16_enable": self.opts.fp16,
                    "trt_force_sequential_engine_build": False,
                    "trt_engine_cache_enable": True,
                    "trt_engine_cache_path": str(_dir / ".trt_cache"),
                    "trt_ep_context_file_path": str(_dir),
                    "trt_timing_cache_enable": True,  # Timing cache can be shared across multiple models if layers are the same
                    "trt_builder_optimization_level": 3,
                    "trt_timing_cache_path": str(models_root / ".trt_timing_cache"),
                },
            ),
            (
                "OpenVINOExecutionProvider",
                self.opts.vino,
                {"device_type": "MYRIAD_FP16", "enable_vpu_fast_compile": True, "num_of_threads": 1},
            ),
            (
                "CUDAExecutionProvider",
                self.opts.cuda,
                {
                    "device_id": GPU_ID,
                    "arena_extend_strategy": "kSameAsRequested",
                    "gpu_mem_limit": 16 * 1024 * 1024 * 1024,
                    "cudnn_conv_algo_search": "EXHAUSTIVE",
                    "do_copy_in_default_stream": True,
                },
            ),
            ("CoreMLExecutionProvider", self.opts.coreml, {}),
        ]

        for provider, enabled, config in provider_configs:
            if enabled and provider in available:
                providers.append((provider, config))
            elif enabled:
                self.logger.warning(f"{provider} not found.")

        providers.append(("CPUExecutionProvider", {}))
        return providers

    def _warmup(self):
        self.logger.info("⏱️ [onnxruntime] Warming up model ..")
        np_image = np.random.rand(1, 3, 640, 640).astype(self.dtype)
        input_name = self.ort_sess.get_inputs()[0].name
        out_name = [output.name for output in self.ort_sess.get_outputs()]

        for _ in range(self.opts.warmup_iter):
            self.ort_sess.run(out_name, {input_name: np_image})

        self.logger.info("⏱️ [onnxruntime] Warmup done")

    def __call__(self, im: np.ndarray) -> list[np.ndarray]:
        """
        Run inference on the input image.

        Args:
            im (np.ndarray): Input image as a numpy array.

        Returns:
            list[np.ndarray]: Model outputs as a list of numpy arrays.
        """
        input_name = self.ort_sess.get_inputs()[0].name
        out_name = [output.name for output in self.ort_sess.get_outputs()]
        out = self.ort_sess.run(out_name, {input_name: im})
        return out

    def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
        """
        Benchmark the model performance.

        Runs multiple inference iterations and measures execution time to calculate
        performance metrics like FPS, mean latency, and other statistics.

        Args:
            iterations (int, optional): Number of inference iterations to run. Defaults to 20.
            size (int or tuple, optional): Input image size for benchmarking. Defaults to 640.

        Returns:
            LatencyMetrics: Performance metrics including FPS, mean, min, max, and std latencies.
        """
        gpu_info = get_gpu_info()
        device_name = "CPU"
        if gpu_info.devices is not None and len(gpu_info.devices) > 0:
            device_name = gpu_info.devices[0].gpu_name
        else:
            device_name = get_cpu_name()
            self.logger.warning(f"No GPU found, using CPU {device_name}.")

        self.logger.info(f"⏱️ [onnxruntime] Benchmarking latency on {device_name}..")
        size = size if isinstance(size, (tuple, list)) else (size, size)

        np_input = (255 * np.random.random((1, 3, size[0], size[1]))).astype(self.dtype)
        input_name = self.ort_sess.get_inputs()[0].name
        out_name = [output.name for output in self.ort_sess.get_outputs()]

        durations = []
        for step in range(iterations + 5):
            start = perf_counter()
            self.ort_sess.run(out_name, {input_name: np_input})
            end = perf_counter()

            if step >= 5:  # Skip first 5 iterations
                durations.append((end - start) * 1000)

        durations = np.array(durations)

        metrics = LatencyMetrics(
            fps=int(1000 / durations.mean()),
            engine=f"onnx.{self.active_provider}",
            mean=round(durations.mean().astype(float), 3),
            max=round(durations.max().astype(float), 3),
            min=round(durations.min().astype(float), 3),
            std=round(durations.std().astype(float), 3),
            im_size=size[0],
            device=str(device_name),
        )
        self.logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ")
        return metrics

__call__(im) #

Run inference on the input image.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `im` | ndarray | Input image as a numpy array. | required |

Returns:

| Type | Description |
| --- | --- |
| list[ndarray] | Model outputs as a list of numpy arrays. |
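
The array must already match the model's input layout: the warmup and benchmark code on this page uses a batched NCHW tensor, and the session's expected dtype is exposed as the dtype attribute. A hedged preprocessing sketch, where runtime is an already constructed ONNXRuntime (the 640x640 size and the absence of normalization are assumptions that depend on how the model was exported):

```python
import numpy as np
from PIL import Image  # any loader that yields an HWC uint8 array works

img = np.asarray(Image.open("example.jpg").resize((640, 640)))  # HWC uint8
nchw = img.transpose(2, 0, 1)[None, ...]        # -> (1, 3, 640, 640)
nchw = nchw.astype(runtime.dtype)               # match the session's input dtype

outputs = runtime(nchw)                         # list of raw output arrays
```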

Source code in focoos/runtime.py
def __call__(self, im: np.ndarray) -> list[np.ndarray]:
    """
    Run inference on the input image.

    Args:
        im (np.ndarray): Input image as a numpy array.

    Returns:
        list[np.ndarray]: Model outputs as a list of numpy arrays.
    """
    input_name = self.ort_sess.get_inputs()[0].name
    out_name = [output.name for output in self.ort_sess.get_outputs()]
    out = self.ort_sess.run(out_name, {input_name: im})
    return out

benchmark(iterations=20, size=640) #

Benchmark the model performance.

Runs multiple inference iterations and measures execution time to calculate performance metrics like FPS, mean latency, and other statistics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `iterations` | int | Number of inference iterations to run. | 20 |
| `size` | int or tuple | Input image size for benchmarking. | 640 |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `LatencyMetrics` | LatencyMetrics | Performance metrics including FPS, mean, min, max, and std latencies. |
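
A short usage sketch; the LatencyMetrics fields read here (fps, mean, std, min, max, engine, device) match the source below, and runtime is assumed to be an already constructed ONNXRuntime:

```python
# 50 timed iterations at 640x640; 5 extra warmup passes are run and discarded internally.
metrics = runtime.benchmark(iterations=50, size=640)

print(
    f"{metrics.engine} on {metrics.device}: {metrics.fps} FPS, "
    f"{metrics.mean} ± {metrics.std} ms (min {metrics.min}, max {metrics.max})"
)
```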

Source code in focoos/runtime.py
def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
    """
    Benchmark the model performance.

    Runs multiple inference iterations and measures execution time to calculate
    performance metrics like FPS, mean latency, and other statistics.

    Args:
        iterations (int, optional): Number of inference iterations to run. Defaults to 20.
        size (int or tuple, optional): Input image size for benchmarking. Defaults to 640.

    Returns:
        LatencyMetrics: Performance metrics including FPS, mean, min, max, and std latencies.
    """
    gpu_info = get_gpu_info()
    device_name = "CPU"
    if gpu_info.devices is not None and len(gpu_info.devices) > 0:
        device_name = gpu_info.devices[0].gpu_name
    else:
        device_name = get_cpu_name()
        self.logger.warning(f"No GPU found, using CPU {device_name}.")

    self.logger.info(f"⏱️ [onnxruntime] Benchmarking latency on {device_name}..")
    size = size if isinstance(size, (tuple, list)) else (size, size)

    np_input = (255 * np.random.random((1, 3, size[0], size[1]))).astype(self.dtype)
    input_name = self.ort_sess.get_inputs()[0].name
    out_name = [output.name for output in self.ort_sess.get_outputs()]

    durations = []
    for step in range(iterations + 5):
        start = perf_counter()
        self.ort_sess.run(out_name, {input_name: np_input})
        end = perf_counter()

        if step >= 5:  # Skip first 5 iterations
            durations.append((end - start) * 1000)

    durations = np.array(durations)

    metrics = LatencyMetrics(
        fps=int(1000 / durations.mean()),
        engine=f"onnx.{self.active_provider}",
        mean=round(durations.mean().astype(float), 3),
        max=round(durations.max().astype(float), 3),
        min=round(durations.min().astype(float), 3),
        std=round(durations.std().astype(float), 3),
        im_size=size[0],
        device=str(device_name),
    )
    self.logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ")
    return metrics

TorchscriptRuntime #

Bases: BaseRuntime

TorchScript Runtime wrapper for model inference.

This class implements the BaseRuntime interface for TorchScript models, supporting both CPU and CUDA devices. It handles model initialization, device placement, warmup, inference, and performance benchmarking.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `device` | device | Device to run inference on (CPU or CUDA). |
| `opts` | TorchscriptRuntimeOpts | Configuration options for the TorchScript runtime. |
| `model` | ScriptModule | Loaded TorchScript model. |
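
As with the ONNX wrapper, instances usually come from load_runtime; direct construction looks like this (the import path for TorchscriptRuntimeOpts is an assumption, warmup_iter matches the source below, and metadata is a placeholder):

```python
from focoos.ports import TorchscriptRuntimeOpts  # import path is an assumption
from focoos.runtime import TorchscriptRuntime

opts = TorchscriptRuntimeOpts(warmup_iter=2)

# metadata: a ModelMetadata instance for the exported model (construction omitted)
runtime = TorchscriptRuntime("./model.pt", opts, metadata)

# __call__ takes a numpy array and moves it to the selected device as float32 internally.
outputs = runtime(nchw_image)  # nchw_image: np.ndarray of shape (1, 3, H, W)
```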

Source code in focoos/runtime.py
class TorchscriptRuntime(BaseRuntime):
    """
    TorchScript Runtime wrapper for model inference.

    This class implements the BaseRuntime interface for TorchScript models,
    supporting both CPU and CUDA devices. It handles model initialization,
    device placement, warmup, inference, and performance benchmarking.

    Attributes:
        device (torch.device): Device to run inference on (CPU or CUDA).
        opts (TorchscriptRuntimeOpts): Configuration options for the TorchScript runtime.
        model (torch.jit.ScriptModule): Loaded TorchScript model.
    """

    def __init__(
        self,
        model_path: str,
        opts: TorchscriptRuntimeOpts,
        model_metadata: ModelMetadata,
    ):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.logger = get_logger(name="TorchscriptEngine")
        self.logger.info(f"🔧 [torchscript] Device: {self.device}")
        self.opts = opts

        map_location = None if torch.cuda.is_available() else "cpu"

        self.model = torch.jit.load(model_path, map_location=map_location)
        self.model = self.model.to(self.device)

        if self.opts.warmup_iter > 0:
            self.logger.info("⏱️ [torchscript] Warming up model..")
            with torch.no_grad():
                np_image = torch.rand(1, 3, 640, 640, device=self.device)
                for _ in range(self.opts.warmup_iter):
                    self.model(np_image)
            self.logger.info("⏱️ [torchscript] WARMUP DONE")

    def __call__(self, im: np.ndarray) -> list[np.ndarray]:
        """
        Run inference on the input image.

        Args:
            im (np.ndarray): Input image as a numpy array.

        Returns:
            list[np.ndarray]: Model outputs as a list of numpy arrays.
        """
        with torch.no_grad():
            torch_image = torch.from_numpy(im).to(self.device, dtype=torch.float32)
            res = self.model(torch_image)
            return [r.cpu().numpy() for r in res]

    def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
        """
        Benchmark the model performance.

        Runs multiple inference iterations and measures execution time to calculate
        performance metrics like FPS, mean latency, and other statistics.

        Args:
            iterations (int, optional): Number of inference iterations to run. Defaults to 20.
            size (int or tuple, optional): Input image size for benchmarking. Defaults to 640.

        Returns:
            LatencyMetrics: Performance metrics including FPS, mean, min, max, and std latencies.
        """
        gpu_info = get_gpu_info()
        device_name = "CPU"
        if gpu_info.devices is not None and len(gpu_info.devices) > 0:
            device_name = gpu_info.devices[0].gpu_name
        else:
            device_name = get_cpu_name()
            self.logger.warning(f"No GPU found, using CPU {device_name}.")
        self.logger.info("⏱️ [torchscript] Benchmarking latency..")
        size = size if isinstance(size, (tuple, list)) else (size, size)

        torch_input = torch.rand(1, 3, size[0], size[1], device=self.device)
        durations = []

        with torch.no_grad():
            for step in range(iterations + 5):
                start = perf_counter()
                self.model(torch_input)
                end = perf_counter()

                if step >= 5:  # Skip first 5 iterations
                    durations.append((end - start) * 1000)

        durations = np.array(durations)

        metrics = LatencyMetrics(
            fps=int(1000 / durations.mean().astype(float)),
            engine="torchscript",
            mean=round(durations.mean().astype(float), 3),
            max=round(durations.max().astype(float), 3),
            min=round(durations.min().astype(float), 3),
            std=round(durations.std().astype(float), 3),
            im_size=size[0],
            device=str(device_name),
        )
        self.logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ")
        return metrics

__call__(im) #

Run inference on the input image.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `im` | ndarray | Input image as a numpy array. | required |

Returns:

| Type | Description |
| --- | --- |
| list[ndarray] | Model outputs as a list of numpy arrays. |

Source code in focoos/runtime.py
def __call__(self, im: np.ndarray) -> list[np.ndarray]:
    """
    Run inference on the input image.

    Args:
        im (np.ndarray): Input image as a numpy array.

    Returns:
        list[np.ndarray]: Model outputs as a list of numpy arrays.
    """
    with torch.no_grad():
        torch_image = torch.from_numpy(im).to(self.device, dtype=torch.float32)
        res = self.model(torch_image)
        return [r.cpu().numpy() for r in res]

benchmark(iterations=20, size=640) #

Benchmark the model performance.

Runs multiple inference iterations and measures execution time to calculate performance metrics like FPS, mean latency, and other statistics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `iterations` | int | Number of inference iterations to run. | 20 |
| `size` | int or tuple | Input image size for benchmarking. | 640 |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `LatencyMetrics` | LatencyMetrics | Performance metrics including FPS, mean, min, max, and std latencies. |

Source code in focoos/runtime.py
def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
    """
    Benchmark the model performance.

    Runs multiple inference iterations and measures execution time to calculate
    performance metrics like FPS, mean latency, and other statistics.

    Args:
        iterations (int, optional): Number of inference iterations to run. Defaults to 20.
        size (int or tuple, optional): Input image size for benchmarking. Defaults to 640.

    Returns:
        LatencyMetrics: Performance metrics including FPS, mean, min, max, and std latencies.
    """
    gpu_info = get_gpu_info()
    device_name = "CPU"
    if gpu_info.devices is not None and len(gpu_info.devices) > 0:
        device_name = gpu_info.devices[0].gpu_name
    else:
        device_name = get_cpu_name()
        self.logger.warning(f"No GPU found, using CPU {device_name}.")
    self.logger.info("⏱️ [torchscript] Benchmarking latency..")
    size = size if isinstance(size, (tuple, list)) else (size, size)

    torch_input = torch.rand(1, 3, size[0], size[1], device=self.device)
    durations = []

    with torch.no_grad():
        for step in range(iterations + 5):
            start = perf_counter()
            self.model(torch_input)
            end = perf_counter()

            if step >= 5:  # Skip first 5 iterations
                durations.append((end - start) * 1000)

    durations = np.array(durations)

    metrics = LatencyMetrics(
        fps=int(1000 / durations.mean().astype(float)),
        engine="torchscript",
        mean=round(durations.mean().astype(float), 3),
        max=round(durations.max().astype(float), 3),
        min=round(durations.min().astype(float), 3),
        std=round(durations.std().astype(float), 3),
        im_size=size[0],
        device=str(device_name),
    )
    self.logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ")
    return metrics

load_runtime(runtime_type, model_path, model_metadata, warmup_iter=0) #

Creates and returns a runtime instance based on the specified runtime type. Supports both ONNX and TorchScript runtimes with various execution providers.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `runtime_type` | RuntimeTypes | The type of runtime to use (see the supported values below). | required |
| `model_path` | str | Path to the model file (.onnx or .pt). | required |
| `model_metadata` | ModelMetadata | Model metadata containing task type, classes, etc. | required |
| `warmup_iter` | int | Number of warmup iterations before inference. | 0 |

Supported runtime types:

- ONNX_CUDA32: ONNX runtime with CUDA FP32
- ONNX_TRT32: ONNX runtime with TensorRT FP32
- ONNX_TRT16: ONNX runtime with TensorRT FP16
- ONNX_CPU: ONNX runtime with CPU
- ONNX_COREML: ONNX runtime with CoreML
- TORCHSCRIPT_32: TorchScript runtime with FP32

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `BaseRuntime` | BaseRuntime | A configured runtime instance (ONNXRuntime or TorchscriptRuntime). |

Raises:

| Type | Description |
| --- | --- |
| ImportError | If required dependencies (torch/onnxruntime) are not installed. |
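
A hedged selection sketch: it uses the RuntimeTypes values listed above, relies on the documented ImportError for missing backends, and leaves the model path and metadata as placeholders:

```python
from focoos.runtime import RuntimeTypes, load_runtime

# metadata: a ModelMetadata instance for the exported model (construction omitted)
try:
    runtime = load_runtime(
        RuntimeTypes.ONNX_TRT16,  # TensorRT FP16; the provider setup falls back to CPU if TRT is unavailable
        "./model.onnx",
        metadata,
        warmup_iter=2,
    )
except ImportError:
    # Raised when onnxruntime (or torch, for TORCHSCRIPT_32) is not installed;
    # install focoos with the matching extra: 'cpu', 'cuda', 'tensorrt', or 'torch'.
    raise
```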

Source code in focoos/runtime.py
def load_runtime(
    runtime_type: RuntimeTypes,
    model_path: str,
    model_metadata: ModelMetadata,
    warmup_iter: int = 0,
) -> BaseRuntime:
    """
    Creates and returns a runtime instance based on the specified runtime type.
    Supports both ONNX and TorchScript runtimes with various execution providers.

    Args:
        runtime_type (RuntimeTypes): The type of runtime to use. Can be one of:
            - ONNX_CUDA32: ONNX runtime with CUDA FP32
            - ONNX_TRT32: ONNX runtime with TensorRT FP32
            - ONNX_TRT16: ONNX runtime with TensorRT FP16
            - ONNX_CPU: ONNX runtime with CPU
            - ONNX_COREML: ONNX runtime with CoreML
            - TORCHSCRIPT_32: TorchScript runtime with FP32
        model_path (str): Path to the model file (.onnx or .pt)
        model_metadata (ModelMetadata): Model metadata containing task type, classes etc.
        warmup_iter (int, optional): Number of warmup iterations before inference. Defaults to 0.

    Returns:
        BaseRuntime: A configured runtime instance (ONNXRuntime or TorchscriptRuntime)

    Raises:
        ImportError: If required dependencies (torch/onnxruntime) are not installed
    """
    if runtime_type == RuntimeTypes.TORCHSCRIPT_32:
        if not TORCH_AVAILABLE:
            logger.error(
                "⚠️ Pytorch not found =(  please install focoos with ['torch'] extra. See https://focoosai.github.io/focoos/setup/ for more details"
            )
            raise ImportError("Pytorch not found")
        opts = TorchscriptRuntimeOpts(warmup_iter=warmup_iter)
        return TorchscriptRuntime(model_path, opts, model_metadata)
    else:
        if not ORT_AVAILABLE:
            logger.error(
                "⚠️ onnxruntime not found =(  please install focoos with one of 'cpu', 'cuda', 'tensorrt' extra. See https://focoosai.github.io/focoos/setup/ for more details"
            )
            raise ImportError("onnxruntime not found")
        opts = OnnxRuntimeOpts(
            cuda=runtime_type == RuntimeTypes.ONNX_CUDA32,
            trt=runtime_type in [RuntimeTypes.ONNX_TRT32, RuntimeTypes.ONNX_TRT16],
            fp16=runtime_type == RuntimeTypes.ONNX_TRT16,
            warmup_iter=warmup_iter,
            coreml=runtime_type == RuntimeTypes.ONNX_COREML,
            verbose=False,
        )
    return ONNXRuntime(model_path, opts, model_metadata)