The Cityscapes Dataset
Introduction
Cityscapes is commonly used for semantic segmentation. Its labels are grouped into 8 categories, including one named “void”, and each category contains several classes. Cityscapes defines 30 classes in all, but the annotations use 35 labels in total, because labels such as “unlabeled” are not counted as classes.
Most papers use only a portion of the data, namely the finely annotated dataset. The additional coarsely annotated data is used relatively rarely. Let’s first introduce the commonly used data.
Some of the commonly used data is shown in the figure below:

The top two items are the annotation masks, and the bottom two are the original data, i.e. the 8-bit photos. Each pair is split into two parts: one is finely annotated, and the other is the additional coarsely annotated data corresponding to train_extra, which is usually not used for training. The finely annotated data comprises 5000 photos with their corresponding annotations, of which 2975 are used for train, 500 for val, and 1525 for test and the benchmark.
The Annotated Data in Cityscapes
Here we focus on the annotated data in Cityscapes, which is a bit more involved. Each image has four annotation files, all stored in the gtFine folder: xxx_gtFine_color.png, xxx_gtFine_instanceIds.png, xxx_gtFine_labelIds.png, and xxx_gtFine_polygons.json. Among them, xxx_gtFine_color.png is a color rendering of the annotation, mainly used for visualization; xxx_gtFine_instanceIds.png is the annotation for the instance-level segmentation task; xxx_gtFine_labelIds.png is the annotation for pixel-level segmentation; and xxx_gtFine_polygons.json holds the raw annotation details, including the image dimensions, the label of each annotated object, and the vertices of its polygon.
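To make the contents of xxx_gtFine_polygons.json more concrete, here is a minimal sketch that reads one such annotation. The sample below is hand-written rather than taken from the dataset, and the key names (imgHeight, imgWidth, objects, label, polygon) are my assumption about the file layout, so check them against a real file before relying on this. The helper names are hypothetical.

```python
import json

# Hypothetical, hand-written sample that mimics a *_gtFine_polygons.json file.
# The key names used here are assumed, not confirmed by the text above.
SAMPLE = """
{
  "imgHeight": 1024,
  "imgWidth": 2048,
  "objects": [
    {"label": "road", "polygon": [[0, 600], [2047, 600], [2047, 1023], [0, 1023]]},
    {"label": "car", "polygon": [[100, 500], [300, 500], [300, 620], [100, 620]]}
  ]
}
"""


def annotation_files(prefix):
    """Return the four annotation file names that belong to one image prefix."""
    suffixes = ("color.png", "instanceIds.png", "labelIds.png", "polygons.json")
    return [f"{prefix}_gtFine_{suffix}" for suffix in suffixes]


def summarize_polygons(text):
    """Return (height, width, [(label, vertex_count), ...]) for one annotation."""
    data = json.loads(text)
    objects = [(obj["label"], len(obj["polygon"])) for obj in data["objects"]]
    return data["imgHeight"], data["imgWidth"], objects


print(annotation_files("xxx"))
print(summarize_polygons(SAMPLE))
```

For a real file you would pass the result of reading the file into summarize_polygons instead of SAMPLE.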

Note that the files here are not suitable for use in training directly. Cityscapes assigns a number to every label, including things like sky; we call this number the absolute ID.
For the instance-level task, the grayscale value of each pixel in xxx_gtFine_instanceIds.png does not map one-to-one to these IDs, so it has to be converted first. The conversion functions can be found in cityscapesscripts. The converted file is xxx_gtFine_instanceTrainIds.png, which can then be used for instance-level training.
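As a rough illustration of why the instance values are not plain label IDs, the sketch below decodes a single pixel value. It assumes the encoding I understand cityscapesscripts to use: values below 1000 are ordinary label IDs without an instance, and larger values are the label ID times 1000 plus an instance index. That convention is not stated in the text above, so treat it as an assumption; decode_instance_id is a hypothetical helper, not a function from cityscapesscripts.

```python
def decode_instance_id(value):
    """Split an instanceIds pixel value into (label_id, instance_index).

    Assumed convention: values below 1000 are plain label IDs with no
    instance; larger values encode label_id * 1000 + instance_index.
    """
    if value < 1000:
        return value, None
    return value // 1000, value % 1000


# 7 is road (no instances); 26xxx are cars; 24xxx are persons.
for value in (7, 26000, 26001, 24003):
    print(value, decode_instance_id(value))
```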
For the pixel-level task, you likewise cannot use xxx_gtFine_labelIds.png directly as the annotation. Its grayscale values are the numbers Cityscapes assigns to each label, i.e. the absolute IDs, and many of these labels, such as parking or guard rail, are used neither in training nor in the final val. So these IDs also need to be converted. You can use the createTrainIdLabelImgs script in cityscapesscripts to convert them into xxx_gtFine_labelTrainIds.png, where the grayscale value of each pixel is a temporary ID: -1, 0 through 18, or 255. Neither -1 nor 255 is evaluated during val, so the two can be merged into ID 19. The final training IDs then run from 0 to 19, which gives 20 categories to classify in total. Note that if you want to submit your results to the Cityscapes benchmark, you must convert these classes back to the absolute IDs before uploading. The correspondence between the various IDs and colors can be found in the labels.py file in cityscapesscripts, where trainId is precisely the temporary ID used for pixel-level segmentation training.
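The remapping described above can be sketched in a few lines. This is only an illustration of the idea, not the code that cityscapesscripts runs: the trainId values are copied from labels.py, the names ID_TO_TRAIN_ID, IGNORE_CLASS, to_training_id, and remap are hypothetical, and the image is represented as a plain list of rows to keep the example dependency-free.

```python
# trainId for every absolute ID 0..33, in order, copied from labels.py.
TRAIN_IDS = [255] * 7 + [
    0, 1, 255, 255, 2, 3, 4, 255, 255, 255, 5, 255, 6, 7, 8, 9, 10,
    11, 12, 13, 14, 15, 255, 255, 16, 17, 18,
]

ID_TO_TRAIN_ID = dict(enumerate(TRAIN_IDS))
ID_TO_TRAIN_ID[-1] = -1  # license plate

# Merge the two non-evaluated trainIds (-1 and 255) into one extra class.
IGNORE_CLASS = 19


def to_training_id(label_id):
    """Map one absolute ID to a training ID in the range 0..19."""
    train_id = ID_TO_TRAIN_ID[label_id]
    return IGNORE_CLASS if train_id in (-1, 255) else train_id


def remap(label_rows):
    """Apply to_training_id to every pixel of a label image (list of rows)."""
    return [[to_training_id(value) for value in row] for row in label_rows]


# road, sky, car / unlabeled, parking, license plate
print(remap([[7, 23, 26], [0, 9, -1]]))
```

On real images an array library with a lookup table would be much faster than nested lists, but the mapping itself stays the same.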
| name | id | trainId | category | catId | hasInstances | ignoreInEval | color |
|---|---|---|---|---|---|---|---|
| unlabeled | 0 | 255 | void | 0 | False | True | (0, 0, 0) |
| ego vehicle | 1 | 255 | void | 0 | False | True | (0, 0, 0) |
| rectification border | 2 | 255 | void | 0 | False | True | (0, 0, 0) |
| out of roi | 3 | 255 | void | 0 | False | True | (0, 0, 0) |
| static | 4 | 255 | void | 0 | False | True | (0, 0, 0) |
| dynamic | 5 | 255 | void | 0 | False | True | (111, 74, 0) |
| ground | 6 | 255 | void | 0 | False | True | (81, 0, 81) |
| road | 7 | 0 | flat | 1 | False | False | (128, 64, 128) |
| sidewalk | 8 | 1 | flat | 1 | False | False | (244, 35, 232) |
| parking | 9 | 255 | flat | 1 | False | True | (250, 170, 160) |
| rail track | 10 | 255 | flat | 1 | False | True | (230, 150, 140) |
| building | 11 | 2 | construction | 2 | False | False | (70, 70, 70) |
| wall | 12 | 3 | construction | 2 | False | False | (102, 102, 156) |
| fence | 13 | 4 | construction | 2 | False | False | (190, 153, 153) |
| guard rail | 14 | 255 | construction | 2 | False | True | (180, 165, 180) |
| bridge | 15 | 255 | construction | 2 | False | True | (150, 100, 100) |
| tunnel | 16 | 255 | construction | 2 | False | True | (150, 120, 90) |
| pole | 17 | 5 | object | 3 | False | False | (153, 153, 153) |
| polegroup | 18 | 255 | object | 3 | False | True | (153, 153, 153) |
| traffic light | 19 | 6 | object | 3 | False | False | (250, 170, 30) |
| traffic sign | 20 | 7 | object | 3 | False | False | (220, 220, 0) |
| vegetation | 21 | 8 | nature | 4 | False | False | (107, 142, 35) |
| terrain | 22 | 9 | nature | 4 | False | False | (152, 251, 152) |
| sky | 23 | 10 | sky | 5 | False | False | (70, 130, 180) |
| person | 24 | 11 | human | 6 | True | False | (220, 20, 60) |
| rider | 25 | 12 | human | 6 | True | False | (255, 0, 0) |
| car | 26 | 13 | vehicle | 7 | True | False | (0, 0, 142) |
| truck | 27 | 14 | vehicle | 7 | True | False | (0, 0, 70) |
| bus | 28 | 15 | vehicle | 7 | True | False | (0, 60, 100) |
| caravan | 29 | 255 | vehicle | 7 | True | True | (0, 0, 90) |
| trailer | 30 | 255 | vehicle | 7 | True | True | (0, 0, 110) |
| train | 31 | 16 | vehicle | 7 | True | False | (0, 80, 100) |
| motorcycle | 32 | 17 | vehicle | 7 | True | False | (0, 0, 230) |
| bicycle | 33 | 18 | vehicle | 7 | True | False | (119, 11, 32) |
| license plate | -1 | -1 | vehicle | 7 | False | True | (0, 0, 142) |
The category annotations in Cityscapes
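Using the table above, converting predictions back to absolute IDs before a benchmark submission might look like the sketch below. The (name, id, trainId) triples are the 19 rows whose ignoreInEval is False. Sending the merged class 19 to ID 0 (unlabeled) is my own assumption about a reasonable fallback, and the names EVAL_CLASSES, TRAIN_ID_TO_ID, and to_absolute_ids are hypothetical.

```python
# (name, id, trainId) for the 19 evaluated classes, copied from the table.
EVAL_CLASSES = [
    ("road", 7, 0), ("sidewalk", 8, 1), ("building", 11, 2), ("wall", 12, 3),
    ("fence", 13, 4), ("pole", 17, 5), ("traffic light", 19, 6),
    ("traffic sign", 20, 7), ("vegetation", 21, 8), ("terrain", 22, 9),
    ("sky", 23, 10), ("person", 24, 11), ("rider", 25, 12), ("car", 26, 13),
    ("truck", 27, 14), ("bus", 28, 15), ("train", 31, 16),
    ("motorcycle", 32, 17), ("bicycle", 33, 18),
]

TRAIN_ID_TO_ID = {train_id: label_id for _, label_id, train_id in EVAL_CLASSES}
# Assumption: the merged "ignore" class 19 is written back as 0 (unlabeled).
TRAIN_ID_TO_ID[19] = 0


def to_absolute_ids(pred_rows):
    """Convert a prediction (list of rows of training IDs) to absolute IDs."""
    return [[TRAIN_ID_TO_ID[value] for value in row] for row in pred_rows]


# road, sky, bicycle / car, ignore, person
print(to_absolute_ids([[0, 10, 18], [13, 19, 11]]))
```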