runtime

Runtime Module for ONNX-based Models

This module provides the functionality needed to load, preprocess, run inference with, and benchmark ONNX models using different execution providers such as CUDA, TensorRT, OpenVINO, CoreML, and CPU. It includes utility functions for image preprocessing and postprocessing and for interfacing with the ONNX Runtime library.
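
A minimal end-to-end sketch. The `RuntimeTypes` import path and the `metadata` object are assumptions here; substitute the equivalents from your installation:

```python
import numpy as np

from focoos.runtime import get_runtime
from focoos.ports import RuntimeTypes  # assumed import path

# `metadata` is a ModelMetadata instance obtained elsewhere in the package.
runtime = get_runtime(
    runtime_type=RuntimeTypes.ONNX_CUDA32,  # falls back to CPU if CUDA is absent
    model_path="model.onnx",                # hypothetical exported model
    model_metadata=metadata,
    warmup_iter=2,
)

# Inference expects a preprocessed NCHW batch matching the session dtype.
im = np.random.rand(1, 3, 640, 640).astype(runtime.dtype)
detections = runtime(im, conf_threshold=0.5)
```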

Functions:

| Name | Description |
| --- | --- |
| `det_postprocess` | Postprocesses detection model outputs into `sv.Detections`. |
| `semseg_postprocess` | Postprocesses semantic segmentation model outputs into `sv.Detections`. |
| `get_runtime` | Returns an ONNXRuntime instance configured for the given runtime type. |

Classes:

| Name | Description |
| --- | --- |
| `ONNXRuntime` | A class that interfaces with ONNX Runtime for model inference. |

ONNXRuntime #

A class that interfaces with ONNX Runtime for model inference using different execution providers (CUDA, TensorRT, OpenVINO, CoreML, etc.). It manages preprocessing, inference, and postprocessing of data, as well as benchmarking the performance of the model.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `logger` | `Logger` | Logger for the ONNXRuntime instance. |
| `name` | `str` | The name of the model (derived from its path). |
| `opts` | `OnnxEngineOpts` | Options used for configuring the ONNX Runtime. |
| `model_metadata` | `ModelMetadata` | Metadata related to the model. |
| `postprocess_fn` | `Callable` | The function used to postprocess the model's output. |
| `ort_sess` | `InferenceSession` | The ONNX Runtime inference session. |
| `dtype` | `np.dtype` | The data type for the model input. |
| `binding` | `Optional[str]` | The binding type for the runtime (e.g., CUDA, CPU). |
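
These attributes can be inspected after construction, for example to check which input dtype the session expects (a sketch assuming an existing `runtime`):

```python
print(runtime.name)     # model file stem, e.g. "model"
print(runtime.dtype)    # np.uint8 or np.float32, depending on the model input
print(runtime.binding)  # currently always None (see the TODO in the source)
```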

Source code in focoos/runtime.py
class ONNXRuntime:
    """
    A class that interfaces with ONNX Runtime for model inference using different execution providers
    (CUDA, TensorRT, OpenVINO, CoreML, etc.). It manages preprocessing, inference, and postprocessing
    of data, as well as benchmarking the performance of the model.

    Attributes:
        logger (Logger): Logger for the ONNXRuntime instance.
        name (str): The name of the model (derived from its path).
        opts (OnnxEngineOpts): Options used for configuring the ONNX Runtime.
        model_metadata (ModelMetadata): Metadata related to the model.
        postprocess_fn (Callable): The function used to postprocess the model's output.
        ort_sess (InferenceSession): The ONNXRuntime inference session.
        dtype (np.dtype): The data type for the model input.
        binding (Optional[str]): The binding type for the runtime (e.g., CUDA, CPU).
    """

    def __init__(self, model_path: str, opts: OnnxEngineOpts, model_metadata: ModelMetadata):
        """
        Initializes the ONNXRuntime instance with the specified model and configuration options.

        Args:
            model_path (str): Path to the ONNX model file.
            opts (OnnxEngineOpts): The configuration options for ONNX Runtime.
            model_metadata (ModelMetadata): Metadata for the model (e.g., task type).
        """
        self.logger = get_logger()
        self.logger.debug(f"[onnxruntime device] {ort.get_device()}")
        self.logger.debug(f"[onnxruntime available providers] {ort.get_available_providers()}")
        self.name = Path(model_path).stem
        self.opts = opts
        self.model_metadata = model_metadata
        self.postprocess_fn = det_postprocess if model_metadata.task == FocoosTask.DETECTION else semseg_postprocess
        options = ort.SessionOptions()
        if opts.verbose:
            options.log_severity_level = 0
        options.enable_profiling = opts.verbose
        # options.intra_op_num_threads = 1
        available_providers = ort.get_available_providers()
        if opts.cuda and "CUDAExecutionProvider" not in available_providers:
            self.logger.warning("CUDA ExecutionProvider not found.")
        if opts.trt and "TensorrtExecutionProvider" not in available_providers:
            self.logger.warning("Tensorrt ExecutionProvider not found.")
        if opts.vino and "OpenVINOExecutionProvider" not in available_providers:
            self.logger.warning("OpenVINO ExecutionProvider not found.")
        if opts.coreml and "CoreMLExecutionProvider" not in available_providers:
            self.logger.warning("CoreML ExecutionProvider not found.")
        # Set providers
        providers = []
        dtype = np.float32
        binding = None
        if opts.trt and "TensorrtExecutionProvider" in available_providers:
            providers.append(
                (
                    "TensorrtExecutionProvider",
                    {
                        "device_id": 0,
                        # 'trt_max_workspace_size': 1073741824,  # 1 GB
                        "trt_fp16_enable": opts.fp16,
                        "trt_force_sequential_engine_build": False,
                    },
                )
            )
            dtype = np.float32
        elif opts.vino and "OpenVINOExecutionProvider" in available_providers:
            providers.append(
                (
                    "OpenVINOExecutionProvider",
                    {
                        "device_type": "MYRIAD_FP16",
                        "enable_vpu_fast_compile": True,
                        "num_of_threads": 1,
                    },
                    # 'use_compiled_network': False}
                )
            )
            options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
            dtype = np.float32
            binding = None
        elif opts.cuda and "CUDAExecutionProvider" in available_providers:
            binding = "cuda"
            options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
            providers.append(
                (
                    "CUDAExecutionProvider",
                    {
                        "device_id": GPU_ID,
                        "arena_extend_strategy": "kSameAsRequested",
                        "gpu_mem_limit": 16 * 1024 * 1024 * 1024,
                        "cudnn_conv_algo_search": "EXHAUSTIVE",
                        "do_copy_in_default_stream": True,
                    },
                )
            )
        elif opts.coreml and "CoreMLExecutionProvider" in available_providers:
            #     # options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
            providers.append("CoreMLExecutionProvider")
        else:
            binding = None

        binding = None  # TODO: remove this
        providers.append("CPUExecutionProvider")
        self.dtype = dtype
        self.binding = binding
        self.ort_sess = ort.InferenceSession(model_path, options, providers=providers)
        self.active_providers = self.ort_sess.get_providers()
        self.logger.info(f"[onnxruntime] Active providers:{self.ort_sess.get_providers()}")
        if self.ort_sess.get_inputs()[0].type == "tensor(uint8)":
            self.dtype = np.uint8
        else:
            self.dtype = np.float32
        if self.opts.warmup_iter > 0:
            self.logger.info("⏱️ [onnxruntime] Warming up model ..")
            for _ in range(self.opts.warmup_iter):
                np_image = np.random.rand(1, 3, 640, 640).astype(self.dtype)
                input_name = self.ort_sess.get_inputs()[0].name
                out_name = [output.name for output in self.ort_sess.get_outputs()]
                if self.binding is not None:
                    io_binding = self.ort_sess.io_binding()
                    io_binding.bind_input(
                        input_name,
                        self.binding,
                        device_id=GPU_ID,
                        element_type=self.dtype,
                        shape=np_image.shape,
                        buffer_ptr=np_image.ctypes.data,
                    )
                    io_binding.bind_cpu_input(input_name, np_image)
                    io_binding.bind_output(out_name[0], self.binding)
                    self.ort_sess.run_with_iobinding(io_binding)
                    io_binding.copy_outputs_to_cpu()
                else:
                    self.ort_sess.run(out_name, {input_name: np_image})

            self.logger.info(f"⏱️ [onnxruntime] {self.name} WARMUP DONE")

    def __call__(self, im: np.ndarray, conf_threshold: float) -> sv.Detections:
        """
        Runs inference on the provided input image and returns the model's detections.

        Args:
            im (np.ndarray): The preprocessed input image.
            conf_threshold (float): The confidence threshold for filtering results.

        Returns:
            sv.Detections: A sv.Detections object containing the model's output detections.
        """
        out_name = None
        input_name = self.ort_sess.get_inputs()[0].name
        out_name = [output.name for output in self.ort_sess.get_outputs()]
        if self.binding is not None:
            self.logger.info(f"binding {self.binding}")
            io_binding = self.ort_sess.io_binding()

            io_binding.bind_input(
                input_name,
                self.binding,
                device_id=GPU_ID,
                element_type=self.dtype,
                shape=im.shape,
                buffer_ptr=im.ctypes.data,
            )

            io_binding.bind_cpu_input(input_name, im)
            io_binding.bind_output(out_name[0], self.binding)
            self.ort_sess.run_with_iobinding(io_binding)
            out = io_binding.copy_outputs_to_cpu()
        else:
            out = self.ort_sess.run(out_name, {input_name: im})

        detections = self.postprocess_fn(out, (im.shape[2], im.shape[3]), conf_threshold)
        return detections

    def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
        """
        Benchmarks the model by running multiple inference iterations and measuring the latency.

        Args:
            iterations (int, optional): Number of iterations to run for benchmarking. Defaults to 20.
            size (int, optional): The input image size for benchmarking. Defaults to 640.

        Returns:
            LatencyMetrics: The latency metrics (e.g., FPS, mean, min, max, and standard deviation).
        """
        self.logger.info("⏱️ [onnxruntime] Benchmarking latency..")
        size = size if isinstance(size, (tuple, list)) else (size, size)

        durations = []
        np_input = (255 * np.random.random((1, 3, size[0], size[1]))).astype(self.dtype)
        input_name = self.ort_sess.get_inputs()[0].name
        out_name = self.ort_sess.get_outputs()[0].name
        if self.binding:
            io_binding = self.ort_sess.io_binding()

            io_binding.bind_input(
                input_name,
                "cuda",
                device_id=0,
                element_type=self.dtype,
                shape=np_input.shape,
                buffer_ptr=np_input.ctypes.data,
            )

            io_binding.bind_cpu_input(input_name, np_input)
            io_binding.bind_output(out_name, "cuda")
        else:
            out_name = [output.name for output in self.ort_sess.get_outputs()]

        for step in range(iterations + 5):
            if self.binding:
                start = perf_counter()
                self.ort_sess.run_with_iobinding(io_binding)
                end = perf_counter()
            else:
                start = perf_counter()
                self.ort_sess.run(out_name, {input_name: np_input})
                end = perf_counter()

            if step >= 5:
                durations.append((end - start) * 1000)
        durations = np.array(durations)
        provider = self.active_providers[0]
        if provider in ["CUDAExecutionProvider", "TensorrtExecutionProvider"]:
            device = get_gpu_name()
        else:
            device = get_cpu_name()
        metrics = LatencyMetrics(
            fps=int(1000 / durations.mean()),
            engine=f"onnx.{provider}",
            mean=round(durations.mean(), 3),
            max=round(durations.max(), 3),
            min=round(durations.min(), 3),
            std=round(durations.std(), 3),
            im_size=size[0],
            device=str(device),
        )
        self.logger.info(f"🔥 FPS: {metrics.fps}")
        return metrics

__call__(im, conf_threshold) #

Runs inference on the provided input image and returns the model's detections.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `im` | `np.ndarray` | The preprocessed input image. | required |
| `conf_threshold` | `float` | The confidence threshold for filtering results. | required |

Returns:

| Type | Description |
| --- | --- |
| `sv.Detections` | An `sv.Detections` object containing the model's output detections. |
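
A short usage sketch, assuming `runtime` is an already-constructed ONNXRuntime and the image has been preprocessed into an NCHW batch:

```python
import numpy as np

im = np.random.rand(1, 3, 640, 640).astype(runtime.dtype)  # NCHW, session dtype
detections = runtime(im, conf_threshold=0.5)
print(detections.xyxy, detections.class_id, detections.confidence)
```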

Source code in focoos/runtime.py
def __call__(self, im: np.ndarray, conf_threshold: float) -> sv.Detections:
    """
    Runs inference on the provided input image and returns the model's detections.

    Args:
        im (np.ndarray): The preprocessed input image.
        conf_threshold (float): The confidence threshold for filtering results.

    Returns:
        sv.Detections: A sv.Detections object containing the model's output detections.
    """
    out_name = None
    input_name = self.ort_sess.get_inputs()[0].name
    out_name = [output.name for output in self.ort_sess.get_outputs()]
    if self.binding is not None:
        self.logger.info(f"binding {self.binding}")
        io_binding = self.ort_sess.io_binding()

        io_binding.bind_input(
            input_name,
            self.binding,
            device_id=GPU_ID,
            element_type=self.dtype,
            shape=im.shape,
            buffer_ptr=im.ctypes.data,
        )

        io_binding.bind_cpu_input(input_name, im)
        io_binding.bind_output(out_name[0], self.binding)
        self.ort_sess.run_with_iobinding(io_binding)
        out = io_binding.copy_outputs_to_cpu()
    else:
        out = self.ort_sess.run(out_name, {input_name: im})

    detections = self.postprocess_fn(out, (im.shape[2], im.shape[3]), conf_threshold)
    return detections

__init__(model_path, opts, model_metadata) #

Initializes the ONNXRuntime instance with the specified model and configuration options.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_path` | `str` | Path to the ONNX model file. | required |
| `opts` | `OnnxEngineOpts` | The configuration options for ONNX Runtime. | required |
| `model_metadata` | `ModelMetadata` | Metadata for the model (e.g., task type). | required |
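
A sketch of direct construction. The `OnnxEngineOpts` field names mirror those used in the source above; its import path and the `metadata` object are assumptions:

```python
from focoos.runtime import ONNXRuntime
from focoos.ports import OnnxEngineOpts  # assumed import path

opts = OnnxEngineOpts(cuda=True, fp16=False, warmup_iter=2, verbose=False)
runtime = ONNXRuntime("model.onnx", opts, metadata)  # `metadata`: a ModelMetadata
```

In most cases `get_runtime` (documented below) is the more convenient entry point, since it maps a `RuntimeTypes` value to the matching set of options.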
Source code in focoos/runtime.py
def __init__(self, model_path: str, opts: OnnxEngineOpts, model_metadata: ModelMetadata):
    """
    Initializes the ONNXRuntime instance with the specified model and configuration options.

    Args:
        model_path (str): Path to the ONNX model file.
        opts (OnnxEngineOpts): The configuration options for ONNX Runtime.
        model_metadata (ModelMetadata): Metadata for the model (e.g., task type).
    """
    self.logger = get_logger()
    self.logger.debug(f"[onnxruntime device] {ort.get_device()}")
    self.logger.debug(f"[onnxruntime available providers] {ort.get_available_providers()}")
    self.name = Path(model_path).stem
    self.opts = opts
    self.model_metadata = model_metadata
    self.postprocess_fn = det_postprocess if model_metadata.task == FocoosTask.DETECTION else semseg_postprocess
    options = ort.SessionOptions()
    if opts.verbose:
        options.log_severity_level = 0
    options.enable_profiling = opts.verbose
    # options.intra_op_num_threads = 1
    available_providers = ort.get_available_providers()
    if opts.cuda and "CUDAExecutionProvider" not in available_providers:
        self.logger.warning("CUDA ExecutionProvider not found.")
    if opts.trt and "TensorrtExecutionProvider" not in available_providers:
        self.logger.warning("Tensorrt ExecutionProvider not found.")
    if opts.vino and "OpenVINOExecutionProvider" not in available_providers:
        self.logger.warning("OpenVINO ExecutionProvider not found.")
    if opts.coreml and "CoreMLExecutionProvider" not in available_providers:
        self.logger.warning("CoreML ExecutionProvider not found.")
    # Set providers
    providers = []
    dtype = np.float32
    binding = None
    if opts.trt and "TensorrtExecutionProvider" in available_providers:
        providers.append(
            (
                "TensorrtExecutionProvider",
                {
                    "device_id": 0,
                    # 'trt_max_workspace_size': 1073741824,  # 1 GB
                    "trt_fp16_enable": opts.fp16,
                    "trt_force_sequential_engine_build": False,
                },
            )
        )
        dtype = np.float32
    elif opts.vino and "OpenVINOExecutionProvider" in available_providers:
        providers.append(
            (
                "OpenVINOExecutionProvider",
                {
                    "device_type": "MYRIAD_FP16",
                    "enable_vpu_fast_compile": True,
                    "num_of_threads": 1,
                },
                # 'use_compiled_network': False}
            )
        )
        options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
        dtype = np.float32
        binding = None
    elif opts.cuda and "CUDAExecutionProvider" in available_providers:
        binding = "cuda"
        options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
        providers.append(
            (
                "CUDAExecutionProvider",
                {
                    "device_id": GPU_ID,
                    "arena_extend_strategy": "kSameAsRequested",
                    "gpu_mem_limit": 16 * 1024 * 1024 * 1024,
                    "cudnn_conv_algo_search": "EXHAUSTIVE",
                    "do_copy_in_default_stream": True,
                },
            )
        )
    elif opts.coreml and "CoreMLExecutionProvider" in available_providers:
        #     # options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
        providers.append("CoreMLExecutionProvider")
    else:
        binding = None

    binding = None  # TODO: remove this
    providers.append("CPUExecutionProvider")
    self.dtype = dtype
    self.binding = binding
    self.ort_sess = ort.InferenceSession(model_path, options, providers=providers)
    self.active_providers = self.ort_sess.get_providers()
    self.logger.info(f"[onnxruntime] Active providers:{self.ort_sess.get_providers()}")
    if self.ort_sess.get_inputs()[0].type == "tensor(uint8)":
        self.dtype = np.uint8
    else:
        self.dtype = np.float32
    if self.opts.warmup_iter > 0:
        self.logger.info("⏱️ [onnxruntime] Warming up model ..")
        for _ in range(self.opts.warmup_iter):
            np_image = np.random.rand(1, 3, 640, 640).astype(self.dtype)
            input_name = self.ort_sess.get_inputs()[0].name
            out_name = [output.name for output in self.ort_sess.get_outputs()]
            if self.binding is not None:
                io_binding = self.ort_sess.io_binding()
                io_binding.bind_input(
                    input_name,
                    self.binding,
                    device_id=GPU_ID,
                    element_type=self.dtype,
                    shape=np_image.shape,
                    buffer_ptr=np_image.ctypes.data,
                )
                io_binding.bind_cpu_input(input_name, np_image)
                io_binding.bind_output(out_name[0], self.binding)
                self.ort_sess.run_with_iobinding(io_binding)
                io_binding.copy_outputs_to_cpu()
            else:
                self.ort_sess.run(out_name, {input_name: np_image})

        self.logger.info(f"⏱️ [onnxruntime] {self.name} WARMUP DONE")

benchmark(iterations=20, size=640) #

Benchmarks the model by running multiple inference iterations and measuring the latency.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `iterations` | `int` | Number of iterations to run for benchmarking. | `20` |
| `size` | `int` or `Tuple[int, int]` | The input image size for benchmarking. | `640` |

Returns:

| Type | Description |
| --- | --- |
| `LatencyMetrics` | The latency metrics (FPS, mean, min, max, and standard deviation). |
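
A short sketch, assuming `runtime` is an existing ONNXRuntime. Note that the loop below runs `iterations + 5` passes and discards the first 5 as extra warmup, so the reported statistics cover only the measured runs:

```python
metrics = runtime.benchmark(iterations=50, size=640)
print(f"{metrics.fps} FPS on {metrics.device} "
      f"(mean {metrics.mean} ms, std {metrics.std} ms)")
```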

Source code in focoos/runtime.py
def benchmark(self, iterations=20, size=640) -> LatencyMetrics:
    """
    Benchmarks the model by running multiple inference iterations and measuring the latency.

    Args:
        iterations (int, optional): Number of iterations to run for benchmarking. Defaults to 20.
        size (int, optional): The input image size for benchmarking. Defaults to 640.

    Returns:
        LatencyMetrics: The latency metrics (e.g., FPS, mean, min, max, and standard deviation).
    """
    self.logger.info("⏱️ [onnxruntime] Benchmarking latency..")
    size = size if isinstance(size, (tuple, list)) else (size, size)

    durations = []
    np_input = (255 * np.random.random((1, 3, size[0], size[1]))).astype(self.dtype)
    input_name = self.ort_sess.get_inputs()[0].name
    out_name = self.ort_sess.get_outputs()[0].name
    if self.binding:
        io_binding = self.ort_sess.io_binding()

        io_binding.bind_input(
            input_name,
            "cuda",
            device_id=0,
            element_type=self.dtype,
            shape=np_input.shape,
            buffer_ptr=np_input.ctypes.data,
        )

        io_binding.bind_cpu_input(input_name, np_input)
        io_binding.bind_output(out_name, "cuda")
    else:
        out_name = [output.name for output in self.ort_sess.get_outputs()]

    for step in range(iterations + 5):
        if self.binding:
            start = perf_counter()
            self.ort_sess.run_with_iobinding(io_binding)
            end = perf_counter()
        else:
            start = perf_counter()
            self.ort_sess.run(out_name, {input_name: np_input})
            end = perf_counter()

        if step >= 5:
            durations.append((end - start) * 1000)
    durations = np.array(durations)
    provider = self.active_providers[0]
    if provider in ["CUDAExecutionProvider", "TensorrtExecutionProvider"]:
        device = get_gpu_name()
    else:
        device = get_cpu_name()
    metrics = LatencyMetrics(
        fps=int(1000 / durations.mean()),
        engine=f"onnx.{provider}",
        mean=round(durations.mean(), 3),
        max=round(durations.max(), 3),
        min=round(durations.min(), 3),
        std=round(durations.std(), 3),
        im_size=size[0],
        device=str(device),
    )
    self.logger.info(f"🔥 FPS: {metrics.fps}")
    return metrics

det_postprocess(out, im0_shape, conf_threshold) #

Postprocesses the output of an object detection model and filters detections based on a confidence threshold.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `out` | `List[np.ndarray]` | The output of the detection model. | required |
| `im0_shape` | `Tuple[int, int]` | The original shape of the input image (height, width). | required |
| `conf_threshold` | `float` | The confidence threshold for filtering detections. | required |

Returns:

| Type | Description |
| --- | --- |
| `sv.Detections` | An `sv.Detections` object containing the filtered bounding boxes, class ids, and confidences. |
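
A worked example with synthetic outputs, in the order the function unpacks them (class ids, normalized xyxy boxes, confidences); the boxes are scaled to pixel coordinates inside the function:

```python
import numpy as np

from focoos.runtime import det_postprocess

cls_ids = np.array([0.0, 1.0, 2.0])
boxes = np.array([
    [0.1, 0.2, 0.5, 0.6],
    [0.0, 0.0, 1.0, 1.0],
    [0.3, 0.3, 0.4, 0.4],
])  # normalized xyxy
confs = np.array([0.9, 0.2, 0.75])

dets = det_postprocess([cls_ids, boxes, confs], (480, 640), conf_threshold=0.5)
print(dets.xyxy)      # pixel-space boxes for the two detections above 0.5
print(dets.class_id)  # [0 2]
```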

Source code in focoos/runtime.py
def det_postprocess(out: List[np.ndarray], im0_shape: Tuple[int, int], conf_threshold: float) -> sv.Detections:
    """
    Postprocesses the output of an object detection model and filters detections
    based on a confidence threshold.

    Args:
        out (List[np.ndarray]): The output of the detection model.
        im0_shape (Tuple[int, int]): The original shape of the input image (height, width).
        conf_threshold (float): The confidence threshold for filtering detections.

    Returns:
        sv.Detections: A sv.Detections object containing the filtered bounding boxes, class ids, and confidences.
    """
    cls_ids, boxes, confs = out
    boxes[:, 0::2] *= im0_shape[1]
    boxes[:, 1::2] *= im0_shape[0]
    high_conf_indices = (confs > conf_threshold).nonzero()

    return sv.Detections(
        xyxy=boxes[high_conf_indices].astype(int),
        class_id=cls_ids[high_conf_indices].astype(int),
        confidence=confs[high_conf_indices].astype(float),
    )

get_runtime(runtime_type, model_path, model_metadata, warmup_iter=0) #

Creates and returns an ONNXRuntime instance based on the specified runtime type and model path, with options for various execution providers (CUDA, TensorRT, CPU, etc.).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `runtime_type` | `RuntimeTypes` | The type of runtime to use (e.g., ONNX_CUDA32, ONNX_TRT32). | required |
| `model_path` | `str` | The path to the ONNX model. | required |
| `model_metadata` | `ModelMetadata` | Metadata describing the model. | required |
| `warmup_iter` | `int` | Number of warmup iterations run when the runtime is created. | `0` |

Returns:

| Type | Description |
| --- | --- |
| `ONNXRuntime` | A fully configured ONNXRuntime instance. |
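
A sketch assuming the `RuntimeTypes` import path and a `metadata` object obtained elsewhere. Per the option mapping below, `ONNX_TRT16` enables both the TensorRT provider and FP16 mode:

```python
from focoos.runtime import get_runtime
from focoos.ports import RuntimeTypes  # assumed import path

runtime = get_runtime(RuntimeTypes.ONNX_TRT16, "model.onnx", metadata, warmup_iter=5)
metrics = runtime.benchmark(iterations=20, size=640)
```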

Source code in focoos/runtime.py
def get_runtime(
    runtime_type: RuntimeTypes,
    model_path: str,
    model_metadata: ModelMetadata,
    warmup_iter: int = 0,
) -> ONNXRuntime:
    """
    Creates and returns an ONNXRuntime instance based on the specified runtime type
    and model path, with options for various execution providers (CUDA, TensorRT, CPU, etc.).

    Args:
        runtime_type (RuntimeTypes): The type of runtime to use (e.g., ONNX_CUDA32, ONNX_TRT32).
        model_path (str): The path to the ONNX model.
        model_metadata (ModelMetadata): Metadata describing the model.
        warmup_iter (int, optional): Number of warmup iterations before benchmarking. Defaults to 0.

    Returns:
        ONNXRuntime: A fully configured ONNXRuntime instance.
    """
    opts = OnnxEngineOpts(
        cuda=runtime_type == RuntimeTypes.ONNX_CUDA32,
        trt=runtime_type in [RuntimeTypes.ONNX_TRT32, RuntimeTypes.ONNX_TRT16],
        fp16=runtime_type == RuntimeTypes.ONNX_TRT16,
        warmup_iter=warmup_iter,
        coreml=runtime_type == RuntimeTypes.ONNX_COREML,
        verbose=False,
    )
    return ONNXRuntime(model_path, opts, model_metadata)

semseg_postprocess(out, im0_shape, conf_threshold) #

Postprocesses the output of a semantic segmentation model and filters based on a confidence threshold.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `out` | `List[np.ndarray]` | The output of the semantic segmentation model. | required |
| `im0_shape` | `Tuple[int, int]` | The original shape of the input image (height, width). | required |
| `conf_threshold` | `float` | The confidence threshold for filtering detections. | required |

Returns:

| Type | Description |
| --- | --- |
| `sv.Detections` | An `sv.Detections` object containing the masks, class ids, and confidences. |
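
A worked example with synthetic outputs. The function expects batched arrays: per-segment class ids and confidences, plus an (H, W) index map assigning each pixel to one of the predicted segments:

```python
import numpy as np

from focoos.runtime import semseg_postprocess

cls_ids = np.array([[3, 7]])
seg_map = np.zeros((1, 64, 64), dtype=np.int64)
seg_map[0, 32:, :] = 1  # bottom half belongs to the second segment
confs = np.array([[0.8, 0.4]])

dets = semseg_postprocess([cls_ids, seg_map, confs], (64, 64), conf_threshold=0.5)
print(dets.mask.shape)  # (1, 64, 64): only the 0.8-confidence segment survives
print(dets.class_id)    # [3]
```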

Source code in focoos/runtime.py
def semseg_postprocess(out: List[np.ndarray], im0_shape: Tuple[int, int], conf_threshold: float) -> sv.Detections:
    """
    Postprocesses the output of a semantic segmentation model and filters based
    on a confidence threshold.

    Args:
        out (List[np.ndarray]): The output of the semantic segmentation model.
        im0_shape (Tuple[int, int]): The original shape of the input image (height, width).
        conf_threshold (float): The confidence threshold for filtering detections.

    Returns:
        sv.Detections: A sv.Detections object containing the masks, class ids, and confidences.
    """
    cls_ids, mask, confs = out[0][0], out[1][0], out[2][0]
    masks = np.equal(mask, np.arange(len(cls_ids))[:, None, None])
    high_conf_indices = np.where(confs > conf_threshold)[0]
    masks = masks[high_conf_indices].astype(bool)
    cls_ids = cls_ids[high_conf_indices].astype(int)
    confs = confs[high_conf_indices].astype(float)
    return sv.Detections(
        mask=masks,
        # xyxy is required from supervision
        xyxy=np.zeros(shape=(len(high_conf_indices), 4), dtype=np.uint8),
        class_id=cls_ids,
        confidence=confs,
    )