from: "Keras" semantic segmentation of remote sensing images based on segnet and U-net
I spent two months taking part in a competition on semantic segmentation of high-resolution remote sensing images, called "Eye of the Sky." At the end of a two-week data mining course, the project our group chose was also semantic segmentation of remote sensing images, so I reorganized and strengthened the earlier work a bit and wrote this article to record the complete workflow of using deep learning for remote sensing image semantic segmentation, along with some useful ideas and tricks.

Dataset
First of all, the dataset we use is the data provided by the CCF Big Data Competition (high-resolution remote sensing imagery of a city in southern China, 2015). It is a small dataset containing 5 large RGB remote sensing images (with sizes ranging from about 3000x3000 to 6000x6000 pixels), annotated with 4 classes of objects: vegetation (label 1), buildings (label 2), water (label 3), and roads (label 4), plus everything else (label 0). Arable land, woodland, and grassland are all merged into the vegetation class. To better inspect the annotations, we visualize three of the training images as follows: blue = water, yellow = buildings, green = vegetation, brown = roads. A more detailed description can be found here.
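The provided ground-truth masks only contain the integer class indices, so a small helper is handy for producing the colored overlay described above. Here is a minimal sketch using OpenCV and NumPy; the helper name and the exact BGR colors are my own choices, not part of the competition data:

import cv2
import numpy as np

# Assumed BGR colors per class index: 0 others, 1 vegetation, 2 building, 3 water, 4 road
COLORS = np.array([
    [0, 0, 0],        # others     -> black
    [0, 255, 0],      # vegetation -> green
    [0, 255, 255],    # building   -> yellow
    [255, 0, 0],      # water      -> blue
    [42, 42, 165],    # road       -> brown
], dtype=np.uint8)

def colorize_label(label_mask):
    """Map a (H, W) uint8 label mask with values 0-4 to a (H, W, 3) BGR image."""
    return COLORS[label_mask]

# Example: visualize the first training label
# label = cv2.imread('./data/label/1.png', cv2.IMREAD_GRAYSCALE)
# cv2.imwrite('label_vis.png', colorize_label(label))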
Now let's talk about our data processing steps. We have 5 large remote sensing images that cannot be fed into the network directly: they would not fit in memory, and their sizes differ. So we start by cropping them randomly, that is, we generate random (x, y) coordinates, cut out the 256x256 patch at those coordinates, and then apply the following data augmentation operations:
- rotate both the image and its label map by 90, 180, and 270 degrees
- mirror both the image and its label map along the y axis
- blur the image only
- adjust the illumination (gamma) of the image only
- add noise to the image only (Gaussian noise, salt-and-pepper noise)
Instead of using Keras's built-in data augmentation, I wrote the corresponding augmentation functions with OpenCV.
import random

import cv2
import numpy as np
from tqdm import tqdm

img_w = 256
img_h = 256

image_sets = ['1.png', '2.png', '3.png', '4.png', '5.png']

def gamma_transform(img, gamma):
    gamma_table = [np.power(x / 255.0, gamma) * 255.0 for x in range(256)]
    gamma_table = np.round(np.array(gamma_table)).astype(np.uint8)
    return cv2.LUT(img, gamma_table)

def random_gamma_transform(img, gamma_vari):
    log_gamma_vari = np.log(gamma_vari)
    alpha = np.random.uniform(-log_gamma_vari, log_gamma_vari)
    gamma = np.exp(alpha)
    return gamma_transform(img, gamma)

def rotate(xb, yb, angle):
    M_rotate = cv2.getRotationMatrix2D((img_w / 2, img_h / 2), angle, 1)
    xb = cv2.warpAffine(xb, M_rotate, (img_w, img_h))
    yb = cv2.warpAffine(yb, M_rotate, (img_w, img_h))
    return xb, yb

def blur(img):
    img = cv2.blur(img, (3, 3))
    return img

def add_noise(img):
    for i in range(200):  # add point noise; the number of noise pixels is assumed, the value is missing in the source
        temp_x = np.random.randint(0, img.shape[0])
        temp_y = np.random.randint(0, img.shape[1])
        img[temp_x][temp_y] = 255
    return img

def data_augment(xb, yb):
    if np.random.random() < 0.25:
        xb, yb = rotate(xb, yb, 90)
    if np.random.random() < 0.25:
        xb, yb = rotate(xb, yb, 180)
    if np.random.random() < 0.25:
        xb, yb = rotate(xb, yb, 270)
    if np.random.random() < 0.25:
        xb = cv2.flip(xb, 1)  # flipCode > 0: flip along the y axis
        yb = cv2.flip(yb, 1)
    if np.random.random() < 0.25:
        # note: gamma_vari=1.0 makes the gamma range degenerate (gamma is always 1); a value > 1 gives a visible effect
        xb = random_gamma_transform(xb, 1.0)
    if np.random.random() < 0.25:
        xb = blur(xb)
    if np.random.random() < 0.2:
        xb = add_noise(xb)
    return xb, yb

def creat_dataset(image_num=100000, mode='original'):
    # assumes ./aug/train/{src,label,visualize}/ already exist
    print('creating dataset...')
    image_each = image_num // len(image_sets)
    g_count = 0
    for i in tqdm(range(len(image_sets))):
        count = 0
        src_img = cv2.imread('./data/src/' + image_sets[i])  # 3 channels
        label_img = cv2.imread('./data/label/' + image_sets[i], cv2.IMREAD_GRAYSCALE)  # single channel
        X_height, X_width, _ = src_img.shape
        while count < image_each:
            random_width = random.randint(0, X_width - img_w - 1)
            random_height = random.randint(0, X_height - img_h - 1)
            src_roi = src_img[random_height:random_height + img_h, random_width:random_width + img_w, :]
            label_roi = label_img[random_height:random_height + img_h, random_width:random_width + img_w]
            if mode == 'augment':
                src_roi, label_roi = data_augment(src_roi, label_roi)
            visualize = label_roi * 50  # scale labels 0-4 to 0-200 so they are visible as grayscale
            cv2.imwrite('./aug/train/visualize/%d.png' % g_count, visualize)
            cv2.imwrite('./aug/train/src/%d.png' % g_count, src_roi)
            cv2.imwrite('./aug/train/label/%d.png' % g_count, label_roi)
            count += 1
            g_count += 1
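To actually generate the crops, the function above is invoked once; the driver code is not shown in the article, so the following call is only a hypothetical example:

if __name__ == '__main__':
    creat_dataset(mode='augment')   # writes crops to ./aug/train/{src,label,visualize}/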
After the data augmentation described above, we obtain a much larger training set: 100,000 images of size 256x256.
Convolutional Neural Network
Faced with this kind of image semantic segmentation task, there are many classic networks to choose from, such as FCN, U-Net, SegNet, DeepLab, RefineNet, Mask R-CNN, and HED-Net. All of these are classic architectures that are widely used in competitions, so we can pick one or two of them as our solution for this segmentation task. Our group selected U-Net and SegNet as the main networks for our experiments.

SegNet
SegNet has been around for several years. It is not the most effective semantic segmentation network, but its strengths are a clear, easy-to-understand structure and fast training with few pitfalls, so we use it for this task as well. The SegNet architecture is a very elegant encoder-decoder structure. It is worth noting that SegNet-based segmentation pipelines usually attach a CRF module at the end as post-processing, in order to further refine the segmentation along object edges. Readers interested in the details can look here.
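The article itself does not include CRF code. Purely as an illustration of the idea, here is a minimal sketch of dense-CRF post-processing using the third-party pydensecrf package (not part of the original project), applied to the per-pixel softmax output of a segmentation network; the function name and parameter values are assumptions:

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image_bgr, softmax_probs, n_iters=5):
    """image_bgr: (H, W, 3) uint8; softmax_probs: (n_classes, H, W) float probabilities."""
    n_classes, h, w = softmax_probs.shape
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(softmax_probs))        # -log(p) unary potentials
    d.addPairwiseGaussian(sxy=3, compat=3)                      # smoothness term
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image_bgr), compat=10)  # appearance term
    q = d.inference(n_iters)
    return np.argmax(np.array(q).reshape((n_classes, h, w)), axis=0)  # refined label map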
Now for the code. First, we define the SegNet network structure.
from keras.models import Sequential
from keras.layers import Activation, BatchNormalization, Conv2D, MaxPooling2D, Permute, Reshape, UpSampling2D

# img_w, img_h and n_label are module-level globals; input_shape=(3, img_w, img_h)
# implies Keras is configured with image_data_format = 'channels_first'.
def SegNet():
    model = Sequential()
    # encoder
    model.add(Conv2D(64, (3, 3), strides=(1, 1), input_shape=(3, img_w, img_h), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # (128,128)
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # (64,64)
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # (32,32)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # (16,16)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # (8,8)
    # decoder
    model.add(UpSampling2D(size=(2, 2)))
    # (16,16)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    # (32,32)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    # (64,64)
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    # (128,128)
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    # (256,256)
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(n_label, (1, 1), strides=(1, 1), padding='same'))
    model.add(Reshape((n_label, img_w * img_h)))
    # swap axis 1 and axis 2, equivalent to np.swapaxes(layer, 1, 2)
    model.add(Permute((2, 1)))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
    model.summary()
    return model
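The Reshape/Permute/Softmax tail is what turns the (n_label, 256, 256) convolutional output into a (65536, n_label) per-pixel class distribution that categorical_crossentropy can consume; the label generators below therefore have to produce one-hot targets of exactly that shape. A quick sanity check, under the assumption that img_w = img_h = 256, n_label = 5, and Keras is configured for channels_first:

# Sanity check of the output shape (assumed configuration, see lead-in above)
model = SegNet()
print(model.output_shape)   # expected: (None, 65536, 5) -> one softmax distribution per pixel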
Next, the dataset needs to be read in. The validation set size we choose here is 0.25 of the full training set.
def get_train_val(val_rate=0.25):
    train_url = []
    train_set = []
    val_set = []
    for pic in os.listdir(filepath + 'src'):
        train_url.append(pic)
    random.shuffle(train_url)
    total_num = len(train_url)
    val_num = int(val_rate * total_num)
    for i in range(len(train_url)):
        if i < val_num:
            val_set.append(train_url[i])
        else:
            train_set.append(train_url[i])
    return train_set, val_set

# data for training
def generateData(batch_size, data=[]):
    while True:
        train_data = []
        train_label = []
        batch = 0
        for i in range(len(data)):
            url = data[i]
            batch += 1
            # img = load_img(filepath + 'src/' + url, target_size=(img_w, img_h))
            img = load_img(filepath + 'src/' + url)
            img = img_to_array(img)
            train_data.append(img)
            # label = load_img(filepath + 'label/' + url, target_size=(img_w, img_h), grayscale=True)
            label = load_img(filepath + 'label/' + url, grayscale=True)
            label = img_to_array(label).reshape((img_w * img_h,))
            train_label.append(label)
            if batch % batch_size == 0:
                train_data = np.array(train_data)
                train_label = np.array(train_label).flatten()
                train_label = labelencoder.transform(train_label)
                train_label = to_categorical(train_label, num_classes=n_label)
                train_label = train_label.reshape((batch_size, img_w * img_h, n_label))
                yield (train_data, train_label)
                train_data = []
                train_label = []
                batch = 0

# data for validation
def generateValidData(batch_size, data=[]):
    while True:
        valid_data = []
        valid_label = []
        batch = 0
        for i in range(len(data)):
            url = data[i]
            batch += 1
            # img = load_img(filepath + 'src/' + url, target_size=(img_w, img_h))
            img = load_img(filepath + 'src/' + url)
            img = img_to_array(img)
            valid_data.append(img)
            # label = load_img(filepath + 'label/' + url, target_size=(img_w, img_h), grayscale=True)
            label = load_img(filepath + 'label/' + url, grayscale=True)
            label = img_to_array(label).reshape((img_w * img_h,))
            valid_label.append(label)
            if batch % batch_size == 0:
                valid_data = np.array(valid_data)
                valid_label = np.array(valid_label).flatten()
                valid_label = labelencoder.transform(valid_label)
                valid_label = to_categorical(valid_label, num_classes=n_label)
                valid_label = valid_label.reshape((batch_size, img_w * img_h, n_label))
                yield (valid_data, valid_label)
                valid_data = []
                valid_label = []
                batch = 0
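The generators above rely on a few module-level names (filepath, n_label, labelencoder, and the Keras helpers) that are defined elsewhere in the project. A minimal sketch of what that setup might look like; the path and the class values are assumptions that match the crop script above, not code copied from the original:

import os
import random

import numpy as np
from keras.preprocessing.image import img_to_array, load_img
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

img_w, img_h = 256, 256
n_label = 5                                # classes 0-4: others, vegetation, building, water, road
classes = [0., 1., 2., 3., 4.]
filepath = './aug/train/'                  # directory holding the src/ and label/ crops

labelencoder = LabelEncoder()
labelencoder.fit(classes)                  # maps the raw label values to 0..n_label-1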
Next we define the training procedure. We set the batch size to 16 and the number of epochs to 30, keep only the best model during training (save_best_only=True), and at the end of training plot the loss/accuracy curves and save them to disk.
import matplotlib.pyplot as plt
from keras.callbacks import ModelCheckpoint

def train(args):
    EPOCHS = 30
    BS = 16
    model = SegNet()
    modelcheck = ModelCheckpoint(args['model'], monitor='val_acc', save_best_only=True, mode='max')
    callable = [modelcheck]
    train_set, val_set = get_train_val()
    train_numb = len(train_set)
    valid_numb = len(val_set)
    print("the number of train data is", train_numb)
    print("the number of val data is", valid_numb)
    H = model.fit_generator(generator=generateData(BS, train_set), steps_per_epoch=train_numb // BS,
                            epochs=EPOCHS, verbose=1,
                            validation_data=generateValidData(BS, val_set), validation_steps=valid_numb // BS,
                            callbacks=callable, max_q_size=1)

    # plot the training loss and accuracy
    plt.style.use("ggplot")
    plt.figure()
    N = EPOCHS
    plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
    plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
    plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
    plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
    plt.title("Training Loss and Accuracy on SegNet Satellite Seg")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss/Accuracy")
    plt.savefig(args["plot"])
    plt.legend(loc="lower left")
    plt.savefig(args["plot"])
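The train function expects an args dictionary with the paths for the saved model and the loss/accuracy plot; in scripts like this it usually comes from argparse. A minimal hypothetical invocation (the key names simply mirror how args is used above):

if __name__ == '__main__':
    train({'model': 'segnet_satellite.h5', 'plot': 'segnet_training_curve.png'})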
Then the long training begins; it takes close to 3 days. The resulting loss/accuracy curves are shown below:
The training loss drops to about 0.1 and the training accuracy reaches about 0.9, but the validation loss and accuracy are not nearly as good, so there seem to be some problems.
First of all, let's look at the prediction results.
Here we need to think about how to predict on an entire remote sensing image. We know that the input we chose when training the model is 256x256, so prediction should also be done on 256x256 patches. The question then is how to stitch these small predicted patches back into one large picture. A basic solution is the following: first zero-pad the big image so that its height and width are multiples of 256, and allocate an all-zero image A of the same padded size; then slide a 256x256 window over the padded image with a stride of 256, feed each patch to the model, and write each predicted patch into the corresponding position of A; finally, crop A back to the size of the original image, which completes the prediction.
from keras.models import load_model

image_size = 256  # tile size used at training time

def predict(args):
    # load the trained convolutional neural network
    print("[INFO] loading network...")
    model = load_model(args["model"])
    stride = args['stride']
    for n in range(len(test_set)):  # test_set: list of test image filenames, defined elsewhere
        path = test_set[n]
        # load the image
        image = cv2.imread('./test/' + path)
        # pre-process the image for classification
        # image = image.astype("float") / 255.0
        # image = img_to_array(image)
        h, w, _ = image.shape
        padding_h = (h // stride + 1) * stride
        padding_w = (w // stride + 1) * stride
        padding_img = np.zeros((padding_h, padding_w, 3), dtype=np.uint8)
        padding_img[0:h, 0:w, :] = image[:, :, :]
        padding_img = padding_img.astype("float") / 255.0
        padding_img = img_to_array(padding_img)
        print('src:', padding_img.shape)
        mask_whole = np.zeros((padding_h, padding_w), dtype=np.uint8)
        for i in range(padding_h // stride):
            for j in range(padding_w // stride):
                crop = padding_img[:3, i * stride:i * stride + image_size, j * stride:j * stride + image_size]
                _, ch, cw = crop.shape
                if ch != 256 or cw != 256:
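The article's listing breaks off at this point. Purely as a sketch of how the loop plausibly continues, following the stitching procedure described above (skip malformed border tiles, predict each 256x256 crop, take the per-pixel argmax, write it into mask_whole, and finally crop back to the original size); the variable names follow the code above, but this continuation is my reconstruction, not the author's original:

                    # --- continuation sketch (assumed, not from the original article) ---
                    print('invalid size!')
                    continue
                crop = np.expand_dims(crop, axis=0)      # batch of one tile, shape (1, 3, 256, 256)
                pred = model.predict(crop)               # shape (1, 256*256, n_label), per-pixel softmax
                # class indices coincide with the raw labels 0-4 here
                pred = pred.argmax(axis=2).reshape((256, 256)).astype(np.uint8)
                mask_whole[i * stride:i * stride + image_size, j * stride:j * stride + image_size] = pred
        # crop the stitched mask back to the original size and save it
        cv2.imwrite('./predict/pre' + str(n + 1) + '.png', mask_whole[0:h, 0:w])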