fai-m2f-m-ade#
Overview#
The model is a Mask2Former model optimized by FocoosAI for the ADE20K dataset. It is a semantic segmentation model able to segment 150 classes, comprising both stuff (sky, road, etc.) and things (dog, cat, car, etc.).
Benchmark#
Note: FPS is measured on an NVIDIA T4 GPU using TensorRT with an image size of 640x640.
Model Details#
The model is based on the Mask2Former architecture, a segmentation model with a transformer-based encoder-decoder design. Unlike traditional segmentation models (such as DeepLab), Mask2Former uses a mask-classification approach, where the prediction is a set of segmentation masks, each with associated class probabilities.
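To make the mask-classification idea concrete, the following minimal sketch turns per-query class logits and mask logits into a per-pixel label map, using the semantic inference commonly described for mask-classification models (softmax over classes, sigmoid over masks, weighted sum, argmax). It is an illustration in plain Python, not the FocoosAI implementation. The function names, the toy shapes, the 0-based class indices, and the trailing "no object" column are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_from_mask_classification(class_logits, mask_logits):
    """Combine Q query predictions into an H x W label map.

    class_logits: Q x (C + 1) nested list; the last column is assumed
                  to be a "no object" class and is dropped.
    mask_logits:  Q x H x W nested list of per-pixel mask logits.
    """
    class_probs = [softmax(row)[:-1] for row in class_logits]
    num_queries = len(class_probs)
    num_classes = len(class_probs[0])
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over queries of
            # (class probability) * (mask probability).
            scores = [
                sum(
                    class_probs[q][c] * sigmoid(mask_logits[q][y][x])
                    for q in range(num_queries)
                )
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy example: 2 queries, 2 classes (+ "no object"), a 1 x 2 image.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
toy_mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]
toy_labels = semantic_from_mask_classification(toy_class_logits, toy_mask_logits)
print(toy_labels)  # [[0, 1]]
```

In the toy example, the first query is confident about class 0 and covers the left pixel, while the second query is confident about class 1 and covers the right pixel, so each pixel takes the class of the query that covers it.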
Neural Network Architecture#
The FocoosAI implementation of Mask2Former optimizes the original neural network architecture to improve the model's efficiency and performance. The original model is fully described in the Mask2Former paper.
Mask2Former is a hybrid model that uses three main components: a backbone for extracting features, a pixel decoder for upscaling the features, and a transformer-based decoder for generating the segmentation output.
In this implementation:
- the backbone is STDC-2, which offers a strong trade-off between performance and efficiency.
- the pixel decoder is an FPN that takes features from stages 2 (1/4 resolution), 3 (1/8 resolution), 4 (1/16 resolution) and 5 (1/32 resolution) of the backbone. Unlike the original paper, and for the sake of portability, we removed the deformable attention modules from the pixel decoder, which speeds up inference while only marginally affecting accuracy.
- the transformer decoder is a lighter version of the original, with only 3 decoder layers (instead of 9) and 100 learnable queries.
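As a quick worked example of the resolutions listed above, the sketch below collects the stated architecture choices in a small, hypothetical summary object and computes the spatial size of each feature map fed to the FPN for a 640x640 input. The class and function names are illustrative assumptions and are not part of the Focoos SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureSummary:
    # Values come from the description above; the class itself is hypothetical.
    backbone: str = "STDC-2"
    pixel_decoder: str = "FPN"
    feature_strides: tuple = (4, 8, 16, 32)  # backbone stages 2, 3, 4, 5
    decoder_layers: int = 3
    num_queries: int = 100


def feature_map_sizes(height, width, strides):
    """Spatial size of each FPN input.

    Assumes height and width are divisible by every stride.
    """
    return [(height // s, width // s) for s in strides]


summary = ArchitectureSummary()
sizes = feature_map_sizes(640, 640, summary.feature_strides)
print(sizes)  # [(160, 160), (80, 80), (40, 40), (20, 20)]
```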
Losses#
We use the same losses as the original paper:
- loss_ce: cross-entropy loss for classifying each predicted mask
- loss_dice: Dice loss applied to the predicted segmentation masks
- loss_mask: binary cross-entropy loss applied to the predicted segmentation masks
These losses are applied to every output of the transformer decoder, that is, to the final output and to each auxiliary output of the 3 transformer decoder layers. Please refer to the Mask2Former paper for more details.
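The three terms and their application to every decoder output can be sketched as follows. This is a deliberately simplified illustration: it handles a single query that is assumed to be already matched to its ground-truth mask, evaluates the mask losses on all pixels, and uses unit loss weights as placeholders. The matching step and the actual loss weights are not described here, so all function names and defaults below are assumptions.

```python
import math


def ce_loss(class_logits, target_class):
    """Cross-entropy of the target class under a softmax over class_logits."""
    m = max(class_logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in class_logits))
    return log_sum - class_logits[target_class]


def bce_loss(mask_logits, target_mask):
    """Binary cross-entropy with logits, averaged over pixels (stable form)."""
    total = 0.0
    for z, t in zip(mask_logits, target_mask):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(target_mask)


def dice_loss(mask_logits, target_mask, smooth=1.0):
    """Dice loss on sigmoid probabilities; the smoothing term is an assumption."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in mask_logits]
    intersection = sum(p * t for p, t in zip(probs, target_mask))
    denominator = sum(probs) + sum(target_mask)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def total_loss(layer_outputs, target_class, target_mask, weights=(1.0, 1.0, 1.0)):
    """Sum the three losses over every decoder output (deep supervision).

    layer_outputs: list of (class_logits, flat_mask_logits), one per output.
    weights: (w_ce, w_dice, w_mask); placeholder values, not the trained ones.
    """
    w_ce, w_dice, w_mask = weights
    total = 0.0
    for class_logits, mask_logits in layer_outputs:
        total += w_ce * ce_loss(class_logits, target_class)
        total += w_dice * dice_loss(mask_logits, target_mask)
        total += w_mask * bce_loss(mask_logits, target_mask)
    return total


# Toy check: three decoder outputs, target class 0, a 4-pixel target mask.
target = [1.0, 0.0, 1.0, 0.0]
good_outputs = [([10.0, 0.0, 0.0], [10.0, -10.0, 10.0, -10.0])] * 3
bad_outputs = [([0.0, 10.0, 0.0], [-10.0, 10.0, -10.0, 10.0])] * 3
good = total_loss(good_outputs, 0, target)
bad = total_loss(bad_outputs, 0, target)
print(good < bad)  # True
```

A prediction that agrees with the target on both class and mask yields a total close to zero, while the inverted prediction is penalized by all three terms at every decoder output.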
Output Format#
The raw output of the model, before post-processing, is a set of masks with associated class probabilities. In particular, the output is composed of three tensors:
- class_ids: a tensor of 100 elements containing the class id associated with each mask (such as 1 for wall, 2 for building, etc.)
- scores: a tensor of 100 elements containing the corresponding probability of the class_id
- masks: a tensor of shape (100, H, W) where H and W are the height and width of the input image and the values represent the index of the class_id associated with the pixel
The model does not need NMS (non-maximum suppression) because the output is already a set of masks with associated class probabilities and has been trained to avoid overlapping masks.
After post-processing, the output is a Focoos Detections object containing the predicted masks with confidence greater than a given threshold (0.5 by default).
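The confidence filtering described above can be sketched in a few lines. The snippet below uses plain Python lists in place of tensors and a hypothetical `filter_predictions` helper. It does not use or reproduce the Focoos SDK API, and the toy values are invented for illustration.

```python
def filter_predictions(class_ids, scores, masks, threshold=0.5):
    """Keep only the predictions whose score is strictly above threshold.

    class_ids, scores and masks are parallel sequences with one entry per
    query. Returns the three filtered lists in the same order.
    """
    kept = [i for i, score in enumerate(scores) if score > threshold]
    return (
        [class_ids[i] for i in kept],
        [scores[i] for i in kept],
        [masks[i] for i in kept],
    )


# Toy example with 3 queries and 1 x 2 masks (illustrative values only).
toy_ids = [1, 3, 13]
toy_scores = [0.92, 0.31, 0.77]
toy_masks = [[[1, 0]], [[0, 0]], [[0, 1]]]

kept_ids, kept_scores, kept_masks = filter_predictions(toy_ids, toy_scores, toy_masks)
print(kept_ids)  # [1, 13]
```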
Classes#
The model is pretrained on the ADE20K dataset with 150 classes.
Class ID | Class | mIoU |
---|---|---|
1 | wall | 75.369549 |
2 | building | 79.835995 |
3 | sky | 94.176995 |
4 | floor | 79.620841 |
5 | tree | 73.204506 |
6 | ceiling | 82.303035 |
7 | road, route | 80.822591 |
8 | bed | 87.573840 |
9 | window | 57.452584 |
10 | grass | 70.099493 |
11 | cabinet | 56.903790 |
12 | sidewalk, pavement | 62.247267 |
13 | person | 79.460606 |
14 | earth, ground | 38.537802 |
15 | door | 43.930878 |
16 | table | 56.753292 |
17 | mountain, mount | 61.160462 |
18 | plant | 48.995487 |
19 | curtain | 71.951930 |
20 | chair | 52.852125 |
21 | car | 80.725703 |
22 | water | 51.233498 |
23 | painting, picture | 66.989493 |
24 | sofa | 58.103663 |
25 | shelf | 34.979205 |
26 | house | 36.828611 |
27 | sea | 51.219096 |
28 | mirror | 58.572852 |
29 | rug | 54.897799 |
30 | field | 29.053876 |
31 | armchair | 39.565663 |
32 | seat | 53.113668 |
33 | fence | 41.113128 |
34 | desk | 37.930189 |
35 | rock, stone | 44.940982 |
36 | wardrobe, closet, press | 39.897858 |
37 | lamp | 60.921356 |
38 | tub | 78.041637 |
39 | rail | 31.893878 |
40 | cushion | 53.029316 |
41 | base, pedestal, stand | 20.233620 |
42 | box | 18.276924 |
43 | column, pillar | 42.655306 |
44 | signboard, sign | 35.959448 |
45 | chest of drawers, chest, bureau, dresser | 36.521600 |
46 | counter | 29.353667 |
47 | sand | 38.729599 |
48 | sink | 72.303141 |
49 | skyscraper | 44.122387 |
50 | fireplace | 66.614683 |
51 | refrigerator, icebox | 72.137179 |
52 | grandstand, covered stand | 29.061628 |
53 | path | 26.629478 |
54 | stairs | 31.833328 |
55 | runway | 76.017706 |
56 | case, display case, showcase, vitrine | 37.452627 |
57 | pool table, billiard table, snooker table | 93.246039 |
58 | pillow | 54.689591 |
59 | screen door, screen | 58.096890 |
60 | stairway, staircase | 29.962829 |
61 | river | 15.010211 |
62 | bridge, span | 66.617580 |
63 | bookcase | 31.383789 |
64 | blind, screen | 39.221180 |
65 | coffee table | 63.300795 |
66 | toilet, can, commode, crapper, pot, potty, stool, throne | 84.038177 |
67 | flower | 35.994798 |
68 | book | 43.252042 |
69 | hill | 6.240850 |
70 | bench | 35.007473 |
71 | countertop | 56.592858 |
72 | stove | 74.866261 |
73 | palm, palm tree | 49.092486 |
74 | kitchen island | 32.353614 |
75 | computer | 57.673329 |
76 | swivel chair | 43.202283 |
77 | boat | 48.170742 |
78 | bar | 24.034261 |
79 | arcade machine | 11.467819 |
80 | hovel, hut, hutch, shack, shanty | 10.258017 |
81 | bus | 81.375072 |
82 | towel | 54.954106 |
83 | light | 53.256340 |
84 | truck | 29.656645 |
85 | tower | 36.864496 |
86 | chandelier | 63.787459 |
87 | awning, sunshade, sunblind | 23.610311 |
88 | street lamp | 29.944617 |
89 | booth | 29.360433 |
90 | tv | 61.512572 |
91 | plane | 53.270513 |
92 | dirt track | 4.206758 |
93 | clothes | 35.342074 |
94 | pole | 20.678348 |
95 | land, ground, soil | 3.195710 |
96 | bannister, banister, balustrade, balusters, handrail | 17.522631 |
97 | escalator, moving staircase, moving stairway | 20.889345 |
98 | ottoman, pouf, pouffe, puff, hassock | 47.003450 |
99 | bottle | 15.504667 |
100 | buffet, counter, sideboard | 26.077572 |
101 | poster, posting, placard, notice, bill, card | 30.691103 |
102 | stage | 11.744151 |
103 | van | 40.161822 |
104 | ship | 79.300311 |
105 | fountain | 0.112958 |
106 | conveyer belt, conveyor belt, conveyer, conveyor, transporter | 60.552373 |
107 | canopy | 25.086350 |
108 | washer, automatic washer, washing machine | 63.550537 |
109 | plaything, toy | 18.290597 |
110 | pool | 32.873865 |
111 | stool | 39.256308 |
112 | barrel, cask | 6.358771 |
113 | basket, handbasket | 29.850719 |
114 | falls | 57.657161 |
115 | tent | 93.717152 |
116 | bag | 10.629695 |
117 | minibike, motorbike | 56.217901 |
118 | cradle | 69.441302 |
119 | oven | 38.940583 |
120 | ball | 45.543376 |
121 | food, solid food | 52.779065 |
122 | step, stair | 10.843115 |
123 | tank, storage tank | 30.871163 |
124 | trade name | 27.908376 |
125 | microwave | 32.381977 |
126 | pot | 41.040635 |
127 | animal | 55.882266 |
128 | bicycle | 50.185374 |
129 | lake | 0.007605 |
130 | dishwasher | 58.970317 |
131 | screen | 60.016197 |
132 | blanket, cover | 26.963189 |
133 | sculpture | 27.667732 |
134 | hood, exhaust hood | 58.025458 |
135 | sconce | 39.341998 |
136 | vase | 31.185747 |
137 | traffic light | 23.810429 |
138 | tray | 7.244281 |
139 | trash can | 30.072544 |
140 | fan | 52.113861 |
141 | pier | 56.678802 |
142 | crt screen | 9.133357 |
143 | plate | 38.900407 |
144 | monitor | 3.323130 |
145 | bulletin board | 52.337659 |
146 | shower | 4.692180 |
147 | radiator | 43.811464 |
148 | glass, drinking glass | 14.036491 |
149 | clock | 25.044316 |
150 | flag | 40.007933 |
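For readers who want to reproduce this kind of per-class score, the sketch below computes intersection-over-union per class and its mean from a predicted and a ground-truth label map. It assumes the table reports IoU as a percentage and uses flat lists of 0-based class indices. Real evaluations usually accumulate the counts over the whole validation set rather than per image, and the function names here are illustrative.

```python
def per_class_iou(pred, target, num_classes):
    """IoU in percent for each class; None when a class never appears.

    pred and target are flat, equally long lists of class indices.
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        ious.append(100.0 * tp / union if union else None)
    return ious


def mean_iou(ious):
    """Mean over the classes that have a defined IoU."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


# Toy example: 4 pixels, 2 classes.
toy_pred = [0, 1, 1, 1]
toy_target = [0, 0, 1, 1]
toy_ious = per_class_iou(toy_pred, toy_target, num_classes=2)
toy_mean = mean_iou(toy_ious)
print(toy_ious[0])  # 50.0
```

In the toy example, class 0 has one correct pixel out of a union of two, and class 1 has two correct pixels out of a union of three.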
What are you waiting for? Try it!#