
fai-m2f-l-ade#

Overview#

This model is a Mask2Former model optimized by FocoosAI for the ADE20K dataset. It is a semantic segmentation model able to segment 150 classes, comprising both stuff (sky, road, etc.) and thing (dog, cat, car, etc.) categories.

Benchmark#

Benchmark comparison note: FPS values are measured on an NVIDIA T4 GPU using TensorRT with a 640x640 input image.

Model Details#

The model is based on the Mask2Former architecture. It is a segmentation model that uses a transformer-based encoder-decoder architecture. Differently from traditional segmentation models (such as DeepLab), Mask2Former uses a mask-classification approach, where the prediction is a set of segmentation masks, each with associated class probabilities.
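To illustrate the mask-classification formulation, the toy NumPy sketch below (random numbers, not the actual model code) shows how a set of per-query masks and class probabilities can be combined into a per-pixel semantic map: each mask is weighted by its class probabilities, and every pixel takes the highest-scoring class.

```python
import numpy as np

# Toy example: 3 predicted masks over a 4x4 image, 5 semantic classes.
num_queries, num_classes, H, W = 3, 5, 4, 4
rng = np.random.default_rng(0)
mask_logits = rng.normal(size=(num_queries, H, W))          # per-query mask logits
class_logits = rng.normal(size=(num_queries, num_classes))  # per-query class logits

# Softmax over classes, sigmoid over mask logits.
class_probs = np.exp(class_logits) / np.exp(class_logits).sum(axis=1, keepdims=True)
mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))

# Per-pixel class scores: weight each mask by its class probabilities,
# then take the most likely class at every pixel.
pixel_scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)
semantic_map = pixel_scores.argmax(axis=0)  # (H, W) map of class indices

print(semantic_map.shape)  # (4, 4)
```

The same idea scales directly to the real model's 100 queries and 150 ADE20K classes.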

Neural Network Architecture#

The FocoosAI implementation of Mask2Former optimizes the original neural network architecture to improve the model's efficiency and performance. The original model is fully described in this paper.

Mask2Former is a hybrid model that uses three main components: a backbone for extracting features, a pixel decoder for upscaling the features, and a transformer-based decoder for generating the segmentation output.

Figure: Mask2Former architecture (backbone, pixel decoder, transformer decoder).

In this implementation:

  • the backbone is a ResNet-101, which offers a good trade-off between accuracy and efficiency.
  • the pixel decoder is an FPN that takes features from stages 2 (1/4 resolution), 3 (1/8 resolution), 4 (1/16 resolution), and 5 (1/32 resolution) of the backbone. Differently from the original paper, for the sake of portability we removed the deformable attention modules in the pixel decoder, speeding up inference while only marginally affecting accuracy.
  • the transformer decoder is a lighter version of the original, with only 6 decoder layers (instead of 9) and 100 learnable queries.
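To make the stage resolutions above concrete, simple stride arithmetic (not library code) gives the spatial size of each feature map the pixel decoder receives for a 640x640 input:

```python
# Spatial size of each backbone stage for a 640x640 input,
# given the strides listed above (stage 2 = 1/4, ..., stage 5 = 1/32).
input_size = 640
strides = {"stage2": 4, "stage3": 8, "stage4": 16, "stage5": 32}
feature_sizes = {name: input_size // s for name, s in strides.items()}

print(feature_sizes)  # {'stage2': 160, 'stage3': 80, 'stage4': 40, 'stage5': 20}
```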

Losses#

We use the same losses as the original paper:

  • loss_ce: Cross-entropy loss for the classification of the classes
  • loss_dice: Dice loss for the segmentation of the classes
  • loss_mask: A binary cross-entropy loss applied to the predicted segmentation masks

These losses are applied to each output of the transformer decoder, meaning that they are applied to the final output and to each auxiliary output of the 6 transformer decoder layers. Please refer to the Mask2Former paper for more details.
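For intuition, here is a minimal NumPy sketch of the dice and binary cross-entropy terms on a toy 2x2 mask. The smoothing constant and loss weights are illustrative; the trained model follows the exact formulation and weighting of the Mask2Former paper.

```python
import numpy as np

def dice_loss(pred, target, eps=1.0):
    # Soft Dice loss between a predicted mask probability map and a binary target.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-7):
    # Binary cross-entropy averaged over pixels.
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

pred = np.array([[0.9, 0.1], [0.8, 0.2]])    # predicted mask probabilities
target = np.array([[1.0, 0.0], [1.0, 0.0]])  # ground-truth binary mask

print(round(dice_loss(pred, target), 3))  # ≈ 0.12
print(round(bce_loss(pred, target), 3))   # ≈ 0.164
```

In training, a weighted sum of these terms plus the cross-entropy classification loss is computed for the final output and for each auxiliary decoder output.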

Output Format#

The pre-processed output of the model is a set of masks with associated class probabilities. In particular, the output is composed of three tensors:

  • class_ids: a tensor of 100 elements containing the class id associated with each mask (such as 1 for wall, 2 for building, etc.)
  • scores: a tensor of 100 elements containing the corresponding probability of the class_id
  • masks: a tensor of shape (100, H, W), where H and W are the height and width of the input image and each mask indicates the pixels belonging to its associated class_id

The model does not need NMS (non-maximum suppression) because it predicts a fixed set of masks with associated class probabilities and is trained to avoid overlapping masks.

After the post-processing, the output is a Focoos Detections object containing the predicted masks with confidence greater than a specific threshold (0.5 by default).
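The thresholding step can be sketched as follows on random placeholder outputs (the actual Focoos detections object may differ; this only illustrates filtering the 100 raw predictions by confidence):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder model outputs: 100 queries over a 32x32 image.
class_ids = rng.integers(1, 151, size=100)  # one ADE20K class id per mask
scores = rng.random(100)                    # confidence per mask
masks = rng.random((100, 32, 32)) > 0.5     # toy binary masks

# Keep only predictions above the confidence threshold (0.5 by default).
threshold = 0.5
keep = scores > threshold
kept_class_ids = class_ids[keep]
kept_scores = scores[keep]
kept_masks = masks[keep]

print(kept_scores.shape[0], "masks kept out of", scores.shape[0])
```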

Classes#

The model is pretrained on the ADE20K dataset with 150 classes.

| Class ID | Class Name | mIoU |
|---|---|---|
| 1 | wall | 77.284 |
| 2 | building | 81.396 |
| 3 | sky | 94.337 |
| 4 | floor | 81.584 |
| 5 | tree | 74.103 |
| 6 | ceiling | 83.073 |
| 7 | road, route | 83.013 |
| 8 | bed | 88.120 |
| 9 | window | 61.048 |
| 10 | grass | 69.099 |
| 11 | cabinet | 56.303 |
| 12 | sidewalk, pavement | 62.300 |
| 13 | person | 82.073 |
| 14 | earth, ground | 35.094 |
| 15 | door | 45.140 |
| 16 | table | 59.436 |
| 17 | mountain, mount | 60.538 |
| 18 | plant | 51.829 |
| 19 | curtain | 71.510 |
| 20 | chair | 56.219 |
| 21 | car | 83.766 |
| 22 | water | 49.028 |
| 23 | painting, picture | 70.214 |
| 24 | sofa | 68.081 |
| 25 | shelf | 35.453 |
| 26 | house | 45.656 |
| 27 | sea | 51.205 |
| 28 | mirror | 61.611 |
| 29 | rug | 64.144 |
| 30 | field | 30.577 |
| 31 | armchair | 45.761 |
| 32 | seat | 61.850 |
| 33 | fence | 40.992 |
| 34 | desk | 41.814 |
| 35 | rock, stone | 47.600 |
| 36 | wardrobe, closet, press | 39.846 |
| 37 | lamp | 64.062 |
| 38 | tub | 74.760 |
| 39 | rail | 24.105 |
| 40 | cushion | 56.811 |
| 41 | base, pedestal, stand | 27.777 |
| 42 | box | 24.670 |
| 43 | column, pillar | 40.094 |
| 44 | signboard, sign | 33.495 |
| 45 | chest of drawers, chest, bureau, dresser | 41.847 |
| 46 | counter | 21.387 |
| 47 | sand | 29.763 |
| 48 | sink | 74.092 |
| 49 | skyscraper | 37.613 |
| 50 | fireplace | 65.037 |
| 51 | refrigerator, icebox | 57.648 |
| 52 | grandstand, covered stand | 46.626 |
| 53 | path | 24.543 |
| 54 | stairs | 28.681 |
| 55 | runway | 73.779 |
| 56 | case, display case, showcase, vitrine | 38.437 |
| 57 | pool table, billiard table, snooker table | 91.825 |
| 58 | pillow | 49.388 |
| 59 | screen door, screen | 59.058 |
| 60 | stairway, staircase | 32.832 |
| 61 | river | 18.597 |
| 62 | bridge, span | 56.011 |
| 63 | bookcase | 28.848 |
| 64 | blind, screen | 43.934 |
| 65 | coffee table | 59.869 |
| 66 | toilet, can, commode, crapper, pot, potty, stool, throne | 86.346 |
| 67 | flower | 38.141 |
| 68 | book | 42.528 |
| 69 | hill | 6.905 |
| 70 | bench | 45.494 |
| 71 | countertop | 49.007 |
| 72 | stove | 73.973 |
| 73 | palm, palm tree | 49.478 |
| 74 | kitchen island | 42.603 |
| 75 | computer | 72.142 |
| 76 | swivel chair | 44.262 |
| 77 | boat | 73.689 |
| 78 | bar | 37.749 |
| 79 | arcade machine | 78.733 |
| 80 | hovel, hut, hutch, shack, shanty | 30.537 |
| 81 | bus | 90.808 |
| 82 | towel | 58.158 |
| 83 | light | 57.444 |
| 84 | truck | 31.745 |
| 85 | tower | 32.058 |
| 86 | chandelier | 67.524 |
| 87 | awning, sunshade, sunblind | 28.566 |
| 88 | street lamp | 30.507 |
| 89 | booth | 39.696 |
| 90 | tv | 76.194 |
| 91 | plane | 50.005 |
| 92 | dirt track | 18.268 |
| 93 | clothes | 37.748 |
| 94 | pole | 23.343 |
| 95 | land, ground, soil | 0.001 |
| 96 | bannister, banister, balustrade, balusters, handrail | 16.222 |
| 97 | escalator, moving staircase, moving stairway | 54.888 |
| 98 | ottoman, pouf, pouffe, puff, hassock | 32.444 |
| 99 | bottle | 22.166 |
| 100 | buffet, counter, sideboard | 48.994 |
| 101 | poster, posting, placard, notice, bill, card | 31.773 |
| 102 | stage | 18.731 |
| 103 | van | 46.747 |
| 104 | ship | 79.937 |
| 105 | fountain | 21.205 |
| 106 | conveyer belt, conveyor belt, conveyer, conveyor, transporter | 62.591 |
| 107 | canopy | 23.719 |
| 108 | washer, automatic washer, washing machine | 66.458 |
| 109 | plaything, toy | 35.377 |
| 110 | pool | 34.297 |
| 111 | stool | 41.199 |
| 112 | barrel, cask | 61.803 |
| 113 | basket, handbasket | 34.313 |
| 114 | falls | 57.149 |
| 115 | tent | 94.077 |
| 116 | bag | 19.126 |
| 117 | minibike, motorbike | 71.207 |
| 118 | cradle | 85.775 |
| 119 | oven | 50.996 |
| 120 | ball | 32.601 |
| 121 | food, solid food | 58.662 |
| 122 | step, stair | 16.474 |
| 123 | tank, storage tank | 37.627 |
| 124 | trade name | 20.788 |
| 125 | microwave | 37.998 |
| 126 | pot | 53.411 |
| 127 | animal | 57.360 |
| 128 | bicycle | 58.772 |
| 129 | lake | 41.597 |
| 130 | dishwasher | 74.543 |
| 131 | screen | 79.757 |
| 132 | blanket, cover | 15.202 |
| 133 | sculpture | 53.537 |
| 134 | hood, exhaust hood | 52.684 |
| 135 | sconce | 48.160 |
| 136 | vase | 45.300 |
| 137 | traffic light | 35.375 |
| 138 | tray | 14.093 |
| 139 | trash can | 30.699 |
| 140 | fan | 56.574 |
| 141 | pier | 10.286 |
| 142 | crt screen | 0.936 |
| 143 | plate | 53.268 |
| 144 | monitor | 9.358 |
| 145 | bulletin board | 29.970 |
| 146 | shower | 8.978 |
| 147 | radiator | 59.763 |
| 148 | glass, drinking glass | 18.246 |
| 149 | clock | 29.088 |
| 150 | flag | 37.727 |
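When consuming predictions, the numeric class_ids can be mapped back to the names above. A minimal sketch with a handful of entries (the full 150-entry mapping follows the table; the fallback label is purely illustrative):

```python
# A small excerpt of the ADE20K id-to-name mapping from the table above.
ADE20K_CLASSES = {1: "wall", 2: "building", 3: "sky", 13: "person", 21: "car"}

def class_name(class_id: int) -> str:
    # Fall back to a generic label for ids not in this excerpt.
    return ADE20K_CLASSES.get(class_id, f"class_{class_id}")

print(class_name(13))   # person
print(class_name(150))  # class_150
```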

What are you waiting for? Try it!#

import os

from focoos import Focoos

# Initialize the Focoos client with your API key
focoos = Focoos(api_key=os.getenv("FOCOOS_API_KEY"))

# Get the remote model (fai-m2f-l-ade) from Focoos API
model = focoos.get_remote_model("fai-m2f-l-ade")

# Run inference on an image
predictions = model.infer("./image.jpg", threshold=0.5)

# Output the predictions
print(predictions)