The Cityscapes Dataset

Introduction

Cityscapes is commonly used for semantic segmentation. Its labels are organized into 8 categories, one of which is named “void”, and each category contains several classes. Cityscapes defines 30 classes, but the annotations use 35 labels in total, because they also include labels such as “unlabeled” that are not counted as classes.

Papers mostly use only part of this data, namely the finely annotated set. The additional coarsely annotated data is used much less often. Let’s start with the commonly used data.

Some of the commonly used data is shown in the figure below:

Examples of commonly used Cityscapes data

The top two entries are the annotation masks, and the bottom two are the original data, i.e. the 8-bit photos. Each pair has two parts: one is finely annotated, and the other is the additional coarsely annotated data, corresponding to train_extra, which is usually not used for training. The finely annotated data consists of 5000 photos with their corresponding annotations, of which 2975 are used for train, 500 for val, and 1525 for test and the benchmark.

The Annotated Data in Cityscapes

Here we focus on the annotated data in Cityscapes, which is a bit more involved. Each image has four annotation files, all stored in the gtFine folder: xxx_gtFine_color.png, xxx_gtFine_instanceIds.png, xxx_gtFine_labelIds.png, and xxx_gtFine_polygons.json. Among them, xxx_gtFine_color.png is a colored rendering of the annotation, mainly used for visualization; xxx_gtFine_instanceIds.png is the annotation for the instance-level segmentation task; xxx_gtFine_labelIds.png is the annotation for pixel-level segmentation; and xxx_gtFine_polygons.json holds the details of the annotation, including the image dimensions, the annotated labels, and the vertices of the polygons.
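As a rough illustration of this naming scheme, the sketch below derives the four annotation file names from a photo's file name. It assumes the photo is named xxx_leftImg8bit.png, which is the convention I believe the dataset uses but which is not stated above; the helper name annotation_names is made up for this example.

```python
# Hypothetical helper: maps a photo file name to its four gtFine file names.
# Assumption: photos are named '<prefix>_leftImg8bit.png'.

IMAGE_SUFFIX = "_leftImg8bit.png"

# Annotation kind -> file extension, as listed in the paragraph above.
ANNOTATION_KINDS = {
    "color": "png",
    "instanceIds": "png",
    "labelIds": "png",
    "polygons": "json",
}


def annotation_names(image_name: str) -> dict:
    """Return {kind: file name} for the four annotations of one photo."""
    if not image_name.endswith(IMAGE_SUFFIX):
        raise ValueError(f"unexpected image name: {image_name}")
    prefix = image_name[: -len(IMAGE_SUFFIX)]
    return {
        kind: f"{prefix}_gtFine_{kind}.{ext}"
        for kind, ext in ANNOTATION_KINDS.items()
    }


if __name__ == "__main__":
    for kind, name in annotation_names("xxx_leftImg8bit.png").items():
        print(kind, name)
```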

The annotation information for each image

Note that the files here are not suitable for use in training directly. Cityscapes assigns a number to every label, including things like sky; we call this number the absolute ID.

For the instance-level task, the grayscale value of each pixel in xxx_gtFine_instanceIds.png does not map one-to-one to these IDs; it has to be converted first. The conversion functions can be found in cityscapesscripts. The converted file is xxx_gtFine_instanceTrainIds.png, which can then be used for instance-level training.
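To see why these pixel values are not plain label IDs, here is a minimal sketch of how one value can be decoded. It assumes the encoding used by cityscapesscripts as I understand it: pixels that belong to an individual instance store id * 1000 + instance index, while all other pixels store the plain id. The function name decode_instance_value is hypothetical.

```python
def decode_instance_value(value: int) -> tuple:
    """Split one instanceIds pixel value into (label id, instance index).

    Assumption: values of 1000 or more encode id * 1000 + instance index;
    smaller values are plain label ids without an individual instance.
    """
    if value < 1000:
        return value, None
    return value // 1000, value % 1000


if __name__ == "__main__":
    print(decode_instance_value(26003))  # a car pixel (id 26), instance 3
    print(decode_instance_value(23))     # a sky pixel (id 23), no instance
```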

For the pixel-level task, likewise you cannot directly use xxx_gtFine_labelIds.png as the annotation. The grayscale values here are the numbers Cityscapes assigns to each label, i.e. the absolute IDs, and many of these labels are not used in training or in the final val, such as parking or guard rail. So these IDs also need to be converted. You can use the createTrainIdLabelImgs script in cityscapesscripts to convert them into xxx_gtFine_labelTrainIds.png, where the grayscale value of each pixel is a temporary ID, comprising the IDs -1, 0 to 18, and 255.

Neither the -1 nor the 255 IDs are evaluated during val, so the two can be grouped into ID 19. The final training IDs then become the 20 labels from 0 to 19, so there are 20 categories to classify in total. Note that if you want to submit your results to the Cityscapes benchmark, you must convert these classes back to the absolute IDs before uploading.

The correspondence between the various IDs and colors can be found in the labels.py file in cityscapesscripts, where the trainId is precisely the temporary ID used for pixel-level segmentation training.
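The remapping described above can be sketched with a 256-entry lookup table. This is only an illustration of the id to trainId conversion, not the createTrainIdLabelImgs code itself. The pairs are copied from labels.py, the ignore value 19 follows the grouping suggested above, and the names EVALUATED, IGNORE_TRAIN_ID, and remap_to_train_ids are made up for this example. The input is assumed to be the raw 8-bit pixel buffer of a labelIds image; loading the PNG is left to an image library of your choice.

```python
# Absolute id -> trainId for the 19 evaluated classes (from labels.py).
EVALUATED = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}

# Assumption: every ignored label is grouped into one extra class, 19.
IGNORE_TRAIN_ID = 19

# One output byte per possible 8-bit input value.
LOOKUP = bytes(EVALUATED.get(i, IGNORE_TRAIN_ID) for i in range(256))


def remap_to_train_ids(pixels: bytes) -> bytes:
    """Convert a buffer of absolute ids into training ids 0..19."""
    return pixels.translate(LOOKUP)


if __name__ == "__main__":
    # road, sky, unlabeled, parking, bicycle
    sample = bytes([7, 23, 0, 9, 33])
    print(list(remap_to_train_ids(sample)))
```

When training with a loss that supports an ignore index, class 19 would typically be passed as that index rather than learned as a real class.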

name	id	trainId	category	catId	hasInstances	ignoreInEval	color
unlabeled	0	255	void	0	False	True	(0, 0, 0)
ego vehicle	1	255	void	0	False	True	(0, 0, 0)
rectification border	2	255	void	0	False	True	(0, 0, 0)
out of roi	3	255	void	0	False	True	(0, 0, 0)
static	4	255	void	0	False	True	(0, 0, 0)
dynamic	5	255	void	0	False	True	(111, 74, 0)
ground	6	255	void	0	False	True	(81, 0, 81)
road	7	0	flat	1	False	False	(128, 64, 128)
sidewalk	8	1	flat	1	False	False	(244, 35, 232)
parking	9	255	flat	1	False	True	(250, 170, 160)
rail track	10	255	flat	1	False	True	(230, 150, 140)
building	11	2	construction	2	False	False	(70, 70, 70)
wall	12	3	construction	2	False	False	(102, 102, 156)
fence	13	4	construction	2	False	False	(190, 153, 153)
guard rail	14	255	construction	2	False	True	(180, 165, 180)
bridge	15	255	construction	2	False	True	(150, 100, 100)
tunnel	16	255	construction	2	False	True	(150, 120, 90)
pole	17	5	object	3	False	False	(153, 153, 153)
polegroup	18	255	object	3	False	True	(153, 153, 153)
traffic light	19	6	object	3	False	False	(250, 170, 30)
traffic sign	20	7	object	3	False	False	(220, 220, 0)
vegetation	21	8	nature	4	False	False	(107, 142, 35)
terrain	22	9	nature	4	False	False	(152, 251, 152)
sky	23	10	sky	5	False	False	(70, 130, 180)
person	24	11	human	6	True	False	(220, 20, 60)
rider	25	12	human	6	True	False	(255, 0, 0)
car	26	13	vehicle	7	True	False	(0, 0, 142)
truck	27	14	vehicle	7	True	False	(0, 0, 70)
bus	28	15	vehicle	7	True	False	(0, 60, 100)
caravan	29	255	vehicle	7	True	True	(0, 0, 90)
trailer	30	255	vehicle	7	True	True	(0, 0, 110)
train	31	16	vehicle	7	True	False	(0, 80, 100)
motorcycle	32	17	vehicle	7	True	False	(0, 0, 230)
bicycle	33	18	vehicle	7	True	False	(119, 11, 32)
license plate	-1	-1	vehicle	7	False	True	(0, 0, 142)

The category annotations in Cityscapes
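Reading the table in the other direction gives the conversion needed before a benchmark submission: each trainId from 0 to 18 corresponds to exactly one absolute id. The sketch below is a minimal illustration under two assumptions of mine: predictions are stored as raw 8-bit trainId values, and any value outside 0 to 18 is written as id 0 (unlabeled). The names TRAIN_ID_TO_ID and to_label_ids are hypothetical.

```python
# Position = trainId, value = absolute id (read off the table above).
TRAIN_ID_TO_ID = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22,
                  23, 24, 25, 26, 27, 28, 31, 32, 33]


def to_label_ids(pred: bytes, fallback_id: int = 0) -> bytes:
    """Convert predicted trainIds back to absolute ids.

    Assumption: values outside 0..18 (e.g. an ignore class) are mapped
    to fallback_id, which defaults to 0, the id of 'unlabeled'.
    """
    table = bytes(
        TRAIN_ID_TO_ID[t] if t < len(TRAIN_ID_TO_ID) else fallback_id
        for t in range(256)
    )
    return pred.translate(table)


if __name__ == "__main__":
    # road, sky, bicycle, ignore class
    print(list(to_label_ids(bytes([0, 10, 18, 19]))))
```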

Technology

2018 · 11 · 02