fai-rtdetr-m-obj365#

Overview#

The model is an RT-DETR model optimized by FocoosAI for the Objects365 dataset. It is an object detection model able to detect 365 thing classes (dog, cat, car, etc.).

Model Details#

The model is based on the RT-DETR architecture. It is an object detection model that uses a transformer-based encoder-decoder architecture.

Neural Network Architecture#

This implementation is a reimplementation of the RT-DETR model by FocoosAI. The original model is fully described in this paper.

RT-DETR is a hybrid model that uses three main components: a backbone for extracting features, an encoder for upscaling the features, and a transformer-based decoder for generating the detection output.

In this implementation:

  • the backbone is a ResNet-50, which offers good accuracy while remaining efficient.
  • the encoder is the Hybrid Encoder proposed in the paper: a bi-FPN (bidirectional feature pyramid network) that applies a transformer encoder only to the lowest-resolution feature map to improve efficiency.
  • the query selection mechanism selects the features of the pixels (aka queries) with the highest probability of containing an object and passes them to a transformer decoder head that generates the final detection output. In this implementation, we select 300 queries and use 6 transformer decoder layers.
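
The query selection step can be illustrated with a minimal sketch. It assumes that the "probability of containing an object" at a location is approximated by its highest class score; the function name `select_top_k_queries` and the toy scores are hypothetical and are not part of the Focoos code base.

```python
from typing import List


def select_top_k_queries(class_scores: List[List[float]], k: int = 300) -> List[int]:
    """Return the indices of the k locations with the highest best-class score.

    class_scores[i][c] is the (assumed) probability of class c at location i.
    Indices are returned in descending score order.
    """
    # Objectness proxy: the highest class probability at each location.
    best = [max(scores) for scores in class_scores]
    # Sort location indices by that score, highest first, and keep at most k.
    order = sorted(range(len(best)), key=lambda i: best[i], reverse=True)
    return order[:k]


# Toy example: 5 locations, 3 classes, keep the 2 most confident locations.
scores = [
    [0.10, 0.20, 0.05],
    [0.90, 0.01, 0.02],
    [0.30, 0.40, 0.10],
    [0.05, 0.05, 0.80],
    [0.01, 0.02, 0.03],
]
top = select_top_k_queries(scores, k=2)
print(top)  # [1, 3]
```

In the model itself the same idea is applied with k = 300 over the encoder feature locations, and the selected features initialize the decoder queries.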

Losses#

We use the same losses as the original paper:

  • loss_vfl: a variant of the binary cross entropy loss for classification, in which each positive target is weighted by the IoU between the predicted bounding box and its ground truth box.
  • loss_bbox: an L1 loss computing the distance between the predicted bounding boxes and the ground truth bounding boxes.
  • loss_giou: a loss maximizing the generalized IoU (overlap) between the predicted bounding boxes and the ground truth bounding boxes. For more details look at GIoU.

These losses are applied to each output of the transformer decoder, meaning that we apply it on the output and on each auxiliary output of the transformer decoder layers. Please refer to the RT-DETR paper for more details.
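
To make the three terms concrete, here is a minimal single-box sketch in plain Python. It is an illustration under stated assumptions, not the FocoosAI training code: boxes are in [x1, y1, x2, y2] format with positive area, the L1 term is computed directly on those coordinates, and the varifocal defaults `alpha=0.75` and `gamma=2.0` are assumed values. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Sequence[float]  # [x1, y1, x2, y2]


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """IoU and generalized IoU of two boxes (both assumed to have positive area)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    iou = inter / union
    # Area of the smallest box enclosing both a and b.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = iou - (enclose - union) / enclose
    return iou, giou


def l1_loss(pred: Box, target: Box) -> float:
    """Sum of absolute coordinate differences (loss_bbox for one box)."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def giou_loss(pred: Box, target: Box) -> float:
    """1 - GIoU: zero for a perfect match, larger for worse overlap (loss_giou)."""
    return 1.0 - iou_and_giou(pred, target)[1]


def vfl_loss(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Varifocal-style loss for one class probability p.

    q is the IoU with the matched ground truth for the positive class, 0 otherwise.
    """
    eps = 1e-12  # avoids log(0)
    if q > 0:
        # Positive: binary cross entropy against the soft target q, weighted by q.
        return -q * (q * math.log(p + eps) + (1 - q) * math.log(1 - p + eps))
    # Negative: confident false positives are penalized more strongly.
    return -alpha * (p ** gamma) * math.log(1 - p + eps)


pred, gt = [0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0]
iou, giou = iou_and_giou(pred, gt)
print(l1_loss(pred, gt), round(iou, 4), round(giou_loss(pred, gt), 4))
```

For the two boxes above the intersection is 1, the union is 7 and the enclosing box has area 9, so the IoU is 1/7 and the GIoU is 1/7 - 2/9. During training these per-box terms would be computed for every matched prediction at each decoder layer and summed with loss weights.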

Output Format#

Before post-processing, the output of the model is a set of bounding boxes with associated class probabilities. In particular, the output is composed of three tensors:

  • class_ids: a tensor of 300 elements containing the class id associated with each bounding box (such as 1 for Person, 2 for Sneakers, etc.)
  • scores: a tensor of 300 elements containing the corresponding probability of the class_id
  • boxes: a tensor of shape (300, 4) where the values represent the coordinates of the bounding boxes in the format [x1, y1, x2, y2]

The model does not need NMS (non-maximum suppression) because the output is already a set of bounding boxes with associated class probabilities and has been trained to avoid overlaps.

After post-processing, the output is a Focoos Detections object containing the predicted bounding boxes with confidence greater than a specific threshold (0.5 by default).
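
The thresholding step can be sketched as follows. This is only an illustration: plain Python lists stand in for the three output tensors, the helper `filter_detections` is hypothetical, and the result is a list of tuples rather than the actual Focoos Detections object. The class names come from the table in the Classes section.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[int, float, Sequence[float]]  # (class_id, score, [x1, y1, x2, y2])


def filter_detections(
    class_ids: List[int],
    scores: List[float],
    boxes: List[Sequence[float]],
    threshold: float = 0.5,
) -> List[Detection]:
    """Keep only the predictions whose score is greater than the threshold."""
    return [
        (cid, score, box)
        for cid, score, box in zip(class_ids, scores, boxes)
        if score > threshold
    ]


# Toy raw output with 4 queries instead of 300.
class_ids = [93, 140, 1, 6]
scores = [0.91, 0.42, 0.77, 0.05]
boxes = [
    [10.0, 20.0, 110.0, 220.0],
    [30.0, 40.0, 90.0, 100.0],
    [200.0, 50.0, 260.0, 210.0],
    [0.0, 0.0, 5.0, 5.0],
]

# A few id-to-name pairs copied from the class table.
names = {1: "Person", 6: "Car", 93: "Dog", 140: "Cat"}

kept = filter_detections(class_ids, scores, boxes, threshold=0.5)
for cid, score, box in kept:
    print(names[cid], score, box)
```

Because the model emits a fixed set of 300 predictions without NMS, this score filter is the only step needed to discard low-confidence boxes in the sketch.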

Classes#

The model is pretrained on the Objects365 dataset, which has 365 classes.

Class ID Class Name AP
1 Person 68.8
2 Sneakers 47.3
3 Chair 52.7
4 Other Shoes 14.8
5 Hat 56.4
6 Car 37.2
7 Lamp 43.7
8 Glasses 47.8
9 Bottle 40.6
10 Desk 48.2
11 Cup 53.9
12 Street Lights 38.9
13 Cabinet/shelf 46.8
14 Handbag/Satchel 29.5
15 Bracelet 27.8
16 Plate 70.5
17 Picture/Frame 66.0
18 Helmet 47.7
19 Book 23.0
20 Gloves 42.9
21 Storage box 24.1
22 Boat 31.1
23 Leather Shoes 29.3
24 Flower 39.6
25 Bench 26.6
26 Potted Plant 42.4
27 Bowl/Basin 57.7
28 Flag 38.6
29 Pillow 56.0
30 Boots 39.8
31 Vase 37.1
32 Microphone 39.1
33 Necklace 30.3
34 Ring 19.5
35 SUV 33.4
36 Wine Glass 68.6
37 Belt 35.8
38 Moniter/TV 75.1
39 Backpack 29.9
40 Umbrella 37.4
41 Traffic Light 39.0
42 Speaker 58.3
43 Watch 47.1
44 Tie 37.0
45 Trash bin Can 49.1
46 Slippers 43.9
47 Bicycle 46.8
48 Stool 48.9
49 Barrel/bucket 39.9
50 Van 30.7
51 Couch 62.5
52 Sandals 43.7
53 Bakset 40.1
54 Drum 54.8
55 Pen/Pencil 29.9
56 Bus 45.3
57 Wild Bird 12.6
58 High Heels 41.9
59 Motorcycle 31.8
60 Guitar 64.7
61 Carpet 59.8
62 Cell Phone 43.8
63 Bread 23.6
64 Camera 32.8
65 Canned 37.4
66 Truck 23.0
67 Traffic cone 44.5
68 Cymbal 55.7
69 Lifesaver 31.2
70 Towel 54.0
71 Stuffed Toy 40.7
72 Candle 30.5
73 Sailboat 55.5
74 Laptop 73.8
75 Awning 27.4
76 Bed 66.1
77 Faucet 41.8
78 Tent 30.8
79 Horse 46.3
80 Mirror 59.0
81 Power outlet 43.1
82 Sink 53.3
83 Apple 22.0
84 Air Conditioner 30.1
85 Knife 46.1
86 Hockey Stick 59.0
87 Paddle 25.5
88 Pickup Truck 45.2
89 Fork 57.7
90 Traffic Sign 30.0
91 Ballon 47.7
92 Tripod 29.7
93 Dog 58.5
94 Spoon 46.7
95 Clock 61.6
96 Pot 44.3
97 Cow 18.7
98 Cake 15.7
99 Dinning Table 46.2
100 Sheep 30.5
101 Hanger 9.9
102 Blackboard/Whiteboard 47.5
103 Napkin 35.5
104 Other Fish 30.4
105 Orange/Tangerine 10.8
106 Toiletry 30.1
107 Keyboard 71.8
108 Tomato 36.4
109 Lantern 47.7
110 Machinery Vehicle 30.5
111 Fan 49.4
112 Green Vegetables 13.2
113 Banana 30.2
114 Baseball Glove 38.8
115 Airplane 60.9
116 Mouse 61.5
117 Train 50.5
118 Pumpkin 53.4
119 Soccer 29.0
120 Skiboard 24.3
121 Luggage 32.5
122 Nightstand 62.3
123 Tea pot 31.7
124 Telephone 45.8
125 Trolley 36.8
126 Head Phone 40.9
127 Sports Car 67.8
128 Stop Sign 49.3
129 Dessert 28.7
130 Scooter 35.5
131 Stroller 42.7
132 Crane 46.0
133 Remote 47.1
134 Refrigerator 70.2
135 Oven 51.9
136 Lemon 33.4
137 Duck 43.3
138 Baseball Bat 40.4
139 Surveillance Camera 24.2
140 Cat 67.5
141 Jug 24.1
142 Broccoli 29.3
143 Piano 41.5
144 Pizza 50.9
145 Elephant 66.9
146 Skateboard 19.0
147 Surfboard 44.0
148 Gun 23.8
149 Skating and Skiing shoes 64.6
150 Gas stove 39.2
151 Donut 45.0
152 Bow Tie 28.8
153 Carrot 15.6
154 Toilet 73.2
155 Kite 44.1
156 Strawberry 24.2
157 Other Balls 36.1
158 Shovel 18.9
159 Pepper 18.5
160 Computer Box 49.3
161 Toilet Paper 39.0
162 Cleaning Products 21.1
163 Chopsticks 40.3
164 Microwave 68.0
165 Pigeon 48.1
166 Baseball 32.6
167 Cutting/chopping Board 40.9
168 Coffee Table 49.8
169 Side Table 34.6
170 Scissors 28.1
171 Marker 20.6
172 Pie 20.8
173 Ladder 34.4
174 Snowboard 36.8
175 Cookies 13.0
176 Radiator 50.3
177 Fire Hydrant 47.6
178 Basketball 32.9
179 Zebra 58.3
180 Grape 10.4
181 Giraffe 64.0
182 Potato 11.0
183 Sausage 25.1
184 Tricycle 27.4
185 Violin 39.6
186 Egg 34.7
187 Fire Extinguisher 47.3
188 Candy 1.9
189 Fire Truck 55.7
190 Billards 63.0
191 Converter 18.0
192 Bathtub 58.8
193 Wheelchair 62.9
194 Golf Club 33.8
195 Briefcase 35.1
196 Cucumber 26.6
197 Cigar/Cigarette 11.2
198 Paint Brush 15.0
199 Pear 5.5
200 Heavy Truck 35.3
201 Hamburger 40.5
202 Extractor 62.7
203 Extention Cord 19.1
204 Tong 16.4
205 Tennis Racket 51.4
206 Folder 8.8
207 American Football 22.6
208 earphone 7.6
209 Mask 36.9
210 Kettle 43.9
211 Tennis 37.3
212 Ship 49.0
213 Swing 48.4
214 Coffee Machine 50.3
215 Slide 46.7
216 Carriage 59.5
217 Onion 7.4
218 Green beans 3.6
219 Projector 48.6
220 Frisbee 33.0
221 Washing Machine/Drying Machine 53.6
222 Chicken 49.5
223 Printer 54.8
224 Watermelon 28.7
225 Saxophone 52.2
226 Tissue 31.6
227 Toothbrush 23.6
228 Ice cream 27.6
229 Hotair ballon 77.2
230 Cello 45.9
231 French Fries 42.7
232 Scale 28.3
233 Trophy 37.6
234 Cabbage 11.9
235 Hot dog 39.9
236 Blender 44.3
237 Peach 6.2
238 Rice 44.3
239 Wallet/Purse 30.4
240 Volleyball 51.2
241 Deer 45.0
242 Goose 17.5
243 Tape 24.0
244 Tablet 39.9
245 Cosmetics 18.8
246 Trumpet 36.7
247 Pineapple 19.1
248 Golf Ball 39.5
249 Ambulance 78.2
250 Parking meter 33.8
251 Mango 0.8
252 Key 3.0
253 Hurdle 33.9
254 Fishing Rod 29.1
255 Medal 22.8
256 Flute 31.9
257 Brush 8.2
258 Penguin 57.5
259 Megaphone 18.8
260 Corn 22.4
261 Lettuce 2.3
262 Garlic 16.8
263 Swan 46.1
264 Helicopter 42.7
265 Green Onion 0.5
266 Sandwich 29.5
267 Nuts 0.6
268 Speed Limit Sign 43.4
269 Induction Cooker 27.3
270 Broom 19.6
271 Trombone 33.0
272 Plum 3.7
273 Rickshaw 24.7
274 Goldfish 14.3
275 Kiwi fruit 8.3
276 Router/modem 14.5
277 Poker Card 31.2
278 Toaster 48.6
279 Shrimp 5.0
280 Sushi 22.3
281 Cheese 21.2
282 Notepaper 9.3
283 Cherry 4.3
284 Pliers 12.2
285 CD 21.1
286 Pasta 36.3
287 Hammer 20.6
288 Cue 46.2
289 Avocado 9.8
290 Hamimelon 2.9
291 Flask 31.2
292 Mushroon 3.1
293 Screwdriver 13.6
294 Soap 19.0
295 Recorder 32.9
296 Bear 46.2
297 Eggplant 8.8
298 Board Eraser 38.7
299 Coconut 10.0
300 Tape Measur/ Ruler 15.0
301 Pig 50.0
302 Showerhead 11.6
303 Globe 57.1
304 Chips 13.7
305 Steak 28.2
306 Crosswalk Sign 48.2
307 Stapler 27.2
308 Campel 55.3
309 Formula 1 59.1
310 Pomegranate 3.9
311 Dishwasher 53.1
312 Crab 11.6
313 Hoverboard 29.9
314 Meat ball 7.6
315 Rice Cooker 30.5
316 Tuba 24.6
317 Calculator 38.0
318 Papaya 4.3
319 Antelope 24.4
320 Parrot 34.1
321 Seal 41.1
322 Buttefly 36.4
323 Dumbbell 8.0
324 Donkey 42.0
325 Lion 33.8
326 Urinal 53.0
327 Dolphin 39.5
328 Electric Drill 24.1
329 Hair Dryer 10.0
330 Egg tart 5.1
331 Jellyfish 40.2
332 Treadmill 43.9
333 Lighter 12.6
334 Grapefruit 1.2
335 Game board 37.3
336 Mop 5.8
337 Radish 0.6
338 Baozi 40.2
339 Target 14.6
340 French 27.3
341 Spring Rolls 29.1
342 Monkey 37.8
343 Rabbit 36.4
344 Pencil Case 22.8
345 Yak 37.7
346 Red Cabbage 7.1
347 Binoculars 15.7
348 Asparagus 4.1
349 Barbell 17.1
350 Scallop 12.4
351 Noddles 21.1
352 Comb 14.1
353 Dumpling 5.2
354 Oyster 17.6
355 Table Teniis paddle 22.2
356 Cosmetics Brush/Eyeliner Pencil 40.3
357 Chainsaw 13.8
358 Eraser 16.4
359 Lobster 18.0
360 Durian 33.7
361 Okra 0.1
362 Lipstick 36.3
363 Cosmetics Mirror 8.2
364 Curling 44.7
365 Table Tennis 25.1

What are you waiting for? Try it!#

from focoos import Focoos
import os

# Initialize the Focoos client with your API key
focoos = Focoos(api_key=os.getenv("FOCOOS_API_KEY"))

# Get the remote model (fai-rtdetr-m-obj365) from Focoos API
model = focoos.get_remote_model("fai-rtdetr-m-obj365")

# Run inference on an image
predictions = model.infer("./image.jpg", threshold=0.5)

# Output the predictions
print(predictions)