Once you give the program an image, it predicts the each pixel as one of the classes (e.g. person, car, road, etc.) To put it simply, semantic segmentation is classifying each pixel. Useful for various other tasks such as image processing, object detection, etc. The labels are given in the same H x W dimension of the image, i.e. a mask. The mask labels each pixel in the image with one of the classes. The mask can be visualized to show different types of objects within the image.
Need to work on getting the accuracy better... train more models