Project Introduction
Project address: poke around here. The reason for this project: the author wanted to take part in HackMIT, which required recognizing 10,000 out of 15,000 CAPTCHAs, or reaching 90% recognition accuracy per character. He did not want to label the data by hand (just that willful ~), so he decided to generate a batch of CAPTCHAs with a synthesizer and then use a refiner (a GAN) to adjust the synthesized CAPTCHAs so that they look like the real training samples. That gives him, in effect, a batch of labeled CAPTCHAs; with this labeled data he can train a classifier and then classify the 15,000 images that need to be hacked. The underlying paper was published by Apple in 2016, poke around here. However, the model he trained this way was only about 55% accurate on real samples, so he had a classmate label 4,000 of the images to be hacked (that classmate originally intended to label 10,000), and in the end he happily earned the qualification to enter the competition without labeling a single image himself.
If you don't care about the details of the paper, skip this section and go directly to the project code.

Overview
The figure below shows the overall structure from the paper; in the paper, the images being synthesized and refined are eye images. (figure: Overview.jpg)
The simulator first synthesizes some images (synthetic), then a refiner refines (improves, adjusts) them, and a discriminator tries to tell the refined images apart from real but unannotated images. The goal is for the discriminator to have no way of distinguishing real images from refined ones. We can then use the simulator to generate a batch of annotated data, run it through the refiner, and end up with images that resemble the original training set.

Objective
Here is a brief overview of the loss functions the model needs. Simulated+Unsupervised learning uses some unlabeled real images y to learn a refiner, and the refiner is then used to refine our synthetic images x. The key point is to make the refined image x' look like a real image (realism) while keeping the annotation information. For example, you want the texture of the refined image to match the texture of the real images, but you cannot lose the content of the synthetic image (the digits and letters on the CAPTCHA). So there are two losses that the refiner needs to optimize:
The ℓ_real in the figure above is the loss between the refined synthetic image x_i' and the real images Y. ℓ_reg is the loss between the original synthetic image x_i and its refined version x_i'. λ is a hyperparameter.
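Written out as in the SimGAN paper, the refiner's overall objective is:

```latex
\mathcal{L}_R(\theta) = \sum_i \ell_{\mathrm{real}}(\theta; x_i, \mathcal{Y}) + \lambda\,\ell_{\mathrm{reg}}(\theta; x_i)
```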
The refiner's goal is to fool the discriminator D as much as possible, so that the discriminator cannot tell whether an image is real or refined. The discriminator D's objective is exactly the opposite: to distinguish them as well as possible. So the discriminator's loss is this:
This is the cross entropy of a binary classification. D(.) is the probability that the input image is a synthetic (refined) image, and 1 − D(.) is the probability that it is a real image. In other words, if the input is a synthetic image the loss comes from the first term, and if the input is a real image it comes from the second term. In the implementation, when the input is a synthetic image x_i the label is 1, otherwise it is 0. In each mini-batch we randomly sample some real images and some synthetic images. The model is a ConvNet whose last layer outputs the probability that the sample is a synthetic image, and the parameters are updated with SGD. (So the discriminator is just a convolutional network with a binary categorical cross-entropy loss, optimized with SGD.)
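Written out as in the paper, with x̃_i = R_θ(x_i) the refined images and y_j the real images:

```latex
\mathcal{L}_D(\phi) = -\sum_i \log\!\big(D_\phi(\tilde{x}_i)\big) \;-\; \sum_j \log\!\big(1 - D_\phi(y_j)\big)
```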
Then, opposite to the discriminator's objective, the refiner should make the discriminator unable to tell the refined images apart from real ones. So its ℓ_real looks like this:
Next is ℓ_reg. In order to retain the content of the original image, we need a loss that forces the model not to make the refined image too different from the original synthetic image; this is the self-regularization loss. It is the per-pixel difference between the refined image and the original synthetic image.
Putting these together, the refiner's loss is as follows:
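Again written out as in the paper (ψ maps the image into a feature space and is simply the identity map here, so ℓ_reg is a per-pixel L1 distance):

```latex
\ell_{\mathrm{real}}(\theta; x_i, \mathcal{Y}) = -\log\!\big(1 - D_\phi(R_\theta(x_i))\big), \qquad
\ell_{\mathrm{reg}}(\theta; x_i) = \lVert \psi(R_\theta(x_i)) - \psi(x_i) \rVert_1
```

```latex
\mathcal{L}_R(\theta) = \sum_i \Big[ -\log\!\big(1 - D_\phi(R_\theta(x_i))\big) + \lambda\, \lVert \psi(R_\theta(x_i)) - \psi(x_i) \rVert_1 \Big]
```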
During training, we minimize the losses of the refiner and the discriminator alternately. When updating the refiner, the discriminator's parameters are fixed and not updated; when updating the discriminator, the refiner's parameters are fixed.
Here are two tricks.

1. Local adversarial loss. The refiner should not introduce artifacts when learning to model real images. When we train a strong discriminator, the refiner tends to over-emphasize certain image features to fool the current discriminator, which leads to artifacts. How do we solve this? The observation is that if we cut a patch out of a refined image, its statistics should be similar to the statistics of a real image. So instead of defining a global discriminator (synthetic or real for the whole image), we can discriminate every patch of the image. This not only limits the receptive field, it also provides more samples for training the discriminator. The discriminator is a fully convolutional network whose output is a w*h map of the probability that each patch is a synthetic image. When we update the refiner, we add up the cross-entropy losses of these w*h patches.
For example, in the figure above, the output is a 2*3 matrix, and each value is the probability that the corresponding patch is a synthetic image. When computing the loss, we add up the cross entropy of these 6 patches.
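A minimal NumPy sketch of that idea (the shapes and the function name here are illustrative, not the author's code):

```python
import numpy as np

def local_adversarial_loss(patch_probs, is_synthetic):
    """Cross entropy summed over all w*h patches of one image.

    patch_probs  -- (h, w) array, probability that each patch is a synthetic image
    is_synthetic -- True if the whole input image is synthetic (label 1), else False (label 0)
    """
    eps = 1e-7
    p = np.clip(patch_probs, eps, 1 - eps)
    if is_synthetic:
        return -np.sum(np.log(p))       # every patch of a synthetic image has label 1
    return -np.sum(np.log(1 - p))       # every patch of a real image has label 0

# the 2*3 output from the figure: six patch probabilities for a single synthetic image
probs = np.array([[0.9, 0.8, 0.7],
                  [0.6, 0.9, 0.8]])
print(local_adversarial_loss(probs, is_synthetic=True))  # sum of 6 cross-entropy terms
```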
2. Updating the discriminator with a history of refined images. One problem with adversarial training is that the discriminator only looks at the most recently refined images. This causes two issues: the adversarial training diverges, and the refiner re-introduces artifacts that the discriminator has long since forgotten. The discriminator is therefore updated using a buffer of historical refined images rather than just the current mini-batch. Concretely, in each round of discriminator training we sample B/2 images from the current batch and B/2 images from a buffer of size B, and use them together to update the discriminator's parameters. After the round, we replace B/2 images in the buffer with B/2 newly generated images.
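The repository ships this logic in image_history_buffer.py (listed in the project overview below); here is a simplified sketch of the idea, where the class and method names are my own and not necessarily that file's API:

```python
import numpy as np

class ImageHistoryBuffer:
    """Keep previously refined images and mix them into each discriminator batch."""

    def __init__(self, image_shape, max_size, batch_size):
        self.buffer = np.zeros((0,) + image_shape)  # empty buffer of refined images
        self.max_size = max_size
        self.batch_size = batch_size

    def add(self, images):
        """Replace B/2 old images in the buffer with B/2 newly refined ones."""
        half = self.batch_size // 2
        images = images[:half]
        if len(self.buffer) < self.max_size:
            self.buffer = np.concatenate((self.buffer, images))
        else:
            self.buffer[:half] = images
            np.random.shuffle(self.buffer)

    def sample(self):
        """Sample up to B/2 historical refined images (fewer early in training)."""
        half = self.batch_size // 2
        if len(self.buffer) == 0:
            return self.buffer
        idx = np.random.choice(len(self.buffer), size=min(half, len(self.buffer)), replace=False)
        return self.buffer[idx]
```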
Parameter Details

Implementation details. Refiner: input image 55*35 => 64 3*3 filters => 4 resnet blocks => 1 1*1 filter, whose output is the refined image (black and white, so 1 channel). A resnet block looks like this:
Discriminator: 96 3*3 filters, stride=2 => 64 3*3 filters, stride=2 => max_pool 3*3, stride=1 => 32 3*3 filters, stride=1 => 32 1*1 filters, stride=1 => 2 1*1 filters, stride=1 => softmax
Our networks are fully convolutional; the last layers of the refiner and the discriminator are very similar (the refiner's output has the same size as the original image, while the discriminator shrinks the image down to roughly w/4 * h/4, each cell representing the probability for one patch). First, the refiner network is trained with only the self-regularization loss for 1,000 steps, then the discriminator is trained for 200 steps. After that, for every update of the discriminator, the refiner is updated twice.
The specific details of the algorithm (Algorithm 1 in the paper) are as follows:
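A compressed sketch of that training loop in Python, reusing the history-buffer idea above; refiner_step, discriminator_step and refine are hypothetical stand-ins for one gradient update / one forward pass of the corresponding Keras models, not the notebook's actual functions:

```python
import numpy as np

def simgan_training(synth_batches, real_batches, history_buffer,
                    refiner_step, discriminator_step, refine,
                    T=1000, K_g=2, K_d=1):
    """Alternate K_g refiner updates and K_d discriminator updates per outer iteration."""
    for _ in range(T):
        for _ in range(K_g):
            # refiner update: adversarial + self-regularization loss, discriminator frozen
            refiner_step(next(synth_batches))
        for _ in range(K_d):
            refined = refine(next(synth_batches))
            real = next(real_batches)
            # mix in B/2 previously refined images from the history buffer
            old = history_buffer.sample()
            history_buffer.add(refined)
            if len(old):
                refined = np.concatenate((refined[:len(refined) - len(old)], old))
            # discriminator update: refined images labelled 1, real images labelled 0, refiner frozen
            discriminator_step(refined, real)
```

K_g=2 and K_d=1 match the schedule above (two refiner updates per discriminator update).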
Project Code Overview

challenges/: the data samples that need to be predicted
imgs/: the folder of images extracted from challenges
SimGAN-Captcha.ipynb: the notebook for the whole project
arial-extra.otf: the font the simulator uses to generate CAPTCHAs
avg.png: the competition organizers drew lines on each CAPTCHA based on the contestant's information; these lines need to be removed during training
image_history_buffer.py: the buffer of historical refined images used when updating the discriminator

Preprocessing
In the original version of this part, the author downloaded the Base64-encoded images from a URL, but since that was last year's competition the URL no longer works, so the author put the corresponding files directly into the challenges folder. We therefore start from the second step, decompression. Because Python 2 and Python 3 differ and the author used Python 2, I give the Python 3 version of the code here.

Decompression
Each file under the challenges folder is a JSON file containing 1,000 base64-encoded JPG images, so for each file we decode the base64 strings into JPEGs and put them under the orig folder.
import requests
import threading

URL = "https://captcha.delorean.codes/u/rickyhan/challenge"
DIR = "challenges/"
num_challenges = ...  # the exact value is elided here; set it to the number of challenge files
lock = threading.Lock()

import json, base64, os
img_dir = "./orig"
fnames = ["{}/challenge-{}".format(DIR, i) for i in range(num_challenges)]
if not os.path.exists(img_dir):
    os.mkdir(img_dir)

def save_imgs(fname):
    with open(fname, 'r') as f:
        l = json.loads(f.read(), encoding="latin-1")
    for image in l['images']:
        byte_image = bytes(map(ord, image['jpg_base64']))
        b = base64.decodebytes(byte_image)
        name = image['name']
        with open(img_dir + "/{}.jpg".format(name), 'wb') as f:
            f.write(b)

for fname in fnames:
    save_imgs(fname)

assert len(os.listdir(img_dir)) == 1000 * num_challenges
The image after decompression looks like this:
from PIL import Image

example_image_path = img_dir + "/" + os.listdir(img_dir)[0]
example_image_path2 = img_dir + "/" + os.listdir(img_dir)[3]
im = Image.open(example_image_path)
im2 = Image.open(example_image_path2)
img_fnames = [img_dir + '/' + p for p in os.listdir(img_dir)]
im   # display the first example image
im2  # display the second example image
Convert to Black-and-White Images

Binary images save a lot of computation, so we set a threshold here and convert each image to the corresponding binary image. (The conversion method used here is explained in the comments below.)
def gray(img_path):
    # convert to grayscale, then binarize
    # L = R * 299/1000 + G * 587/1000 + B * 114/1000
    img = Image.open(img_path).convert("L")  # convert to grayscale, one 8-bit byte per pixel
    img = img.point(lambda x: 255 if x > THRESHOLD or x == 0 else x)  # the threshold value is elided here; it was found through trial and error
    img = img.point(lambda x: 0 if x < 255 else 255, "1")  # convert to a binary image
    img.save(img_path)

for img_path in img_fnames:
    gray(img_path)

im = Image.open(example_image_path)
im
Extract Mask
You can see that these images all share the same horizontal lines. As mentioned above, because this is a competition, the lines on these CAPTCHAs are generated from the contestant's name. In real life we could filter out this kind of noise with OpenCV's morphological transformation functions. Here the author sums all the images to obtain an average mask. He also recommends filtering with a bit mask (&=), like this:
mask = np.ones((height, width))
for im in ims:
    mask &= im
Here we instead add up all the images and take the average:
import numpy as np

WIDTH, HEIGHT = im.size
mask_dir = "avg.png"

def generatemask():
    n = 1000 * num_challenges
    arr = np.zeros((HEIGHT, WIDTH), np.float)
    for fname in img_fnames:
        imarr = np.array(Image.open(fname), dtype=np.float)
        arr = arr + imarr / n
    arr = np.array(np.round(arr), dtype=np.uint8)
    out = Image.fromarray(arr, mode="L")  # save as grayscale
    out.save(mask_dir)

generatemask()

im = Image.open(mask_dir)  # OK, this could also be done with a binary mask: &=
im
Then touch the mask up a bit:
im = Image.open(mask_dir)
im = im.point(lambda x: 255 if x > 230 else x)
im = im.point(lambda x: 0 if x < 255 else 255, "1")  # 1-bit bilevel, stored with the leftmost pixel in the most significant bit; 0 means black, 1 means white
im.save(mask_dir)
im
The Real Image Generator

We also need to feed the real images into training, so here we use Keras's flow_from_directory to generate images and do some preprocessing on them.
from keras import models
from keras import layers
from keras import optimizers
from keras import applications
from keras.preprocessing import image
import tensorflow as tf

# real data generator
datagen = image.ImageDataGenerator(
    preprocessing_function=applications.xception.preprocess_input
    # calls imagenet_utils' preprocess_input function
    # "tf" mode: will scale pixels between -1 and 1, sample-wise
)

flow_from_directory_params = {'target_size': (HEIGHT, WIDTH),
                              'color_mode': 'grayscale',
                              'class_mode': None,
                              'batch_size': batch_size}  # batch_size is defined elsewhere in the notebook

real_generator = datagen.flow_from_directory(
    directory=".",
    **flow_from_directory_params
)
The (Fake) Generator (the Simulator)

Next we need to define a generator that produces (CAPTCHA, label) pairs, and these generated CAPTCHAs should look as much like the real images as possible.
# synthetic captcha generator
from PIL import ImageFont, ImageDraw
from random import choice, random
from string import ascii_lowercase, digits

alphanumeric = ascii_lowercase + digits

def fuzzy_loc(locs):
    acc = []
    for i, loc in enumerate(locs[:-1]):
        if locs[i + 1] - loc < 8:
            continue
        else:
            acc.append(loc)
    return acc

def seg(img):
    arr = np.array(img, dtype=np.float)
    arr = arr.transpose()
    # arr = np.mean(arr, axis=2)
    arr = np.sum(arr, axis=1)
    locs = np.where(arr < arr.min() + 2)[0].tolist()
    locs = fuzzy_loc(locs)
    return locs

def is_well_formed(img_path):
    original_img = Image.open(img_path)
    img = original_img.convert('1')
    return len(seg(img)) == 4

noiseimg = np.array(Image.open("avg.png").convert("1"))
# noiseimg = np.bitwise_not(noiseimg)
fnt = ImageFont.truetype('./arial-extra.otf', num)  # `num` (the font size) is left as in the original

def gen_one():
    og = Image.new("1", (100, 50))
    text = ''.join([choice(alphanumeric) for _ in range(4)])
    draw = ImageDraw.Draw(og)
    for i, t in enumerate(text):
        txt = Image.new('L', (40, 40))
        d = ImageDraw.Draw(txt)
        d.text((0, 0), t, font=fnt, fill=255)
        if random() > 0.5:
            w = txt.rotate(-20 * (random() - 1), expand=1)
            og.paste(w, (i * 20 + int(25 * random()), int(25 + 30 * (random() - 1))), w)
        else:
            w = txt.rotate(20 * (random() - 1), expand=1)
            og.paste(w, (i * 20 + int(25 * random()), int(20 * random())), w)
    segments = seg(og)
    if len(segments) != 4:
        return gen_one()
    ogarr = np.array(og)
    ogarr = np.bitwise_or(noiseimg, ogarr)
    ogarr = np.expand_dims(ogarr, axis=2).astype(float)
    ogarr = np.random.random(size=(50, 100, 1)) * ogarr
    ogarr = (ogarr > 0.0).astype(float)  # add noise
    return ogarr, text

def synth_generator():
    arrs = []
    while True:
        for _ in range(batch_size):
            img, text = gen_one()
            arrs.append(img)
        yield np.array(arrs)
        arrs = []
The code above randomly generates strings of different characters and digits, rotates them, pastes the characters together, and overlays the original noise image avg.png; synthesized CAPTCHAs whose characters overlap too much are discarded. If you run into problems here, it is strongly recommended to upgrade Pillow; I debugged this for a long time... sigh~
def get_image_batch(generator):
    """keras generators may generate an incomplete batch for the last batch"""
    # img_batch = generator.next()
    img_batch = next(generator)
    if len(img_batch) != batch_size:
        img_batch = next(generator)
    assert len(img_batch) == batch_size
    return img_batch
Let's look at what a real image looks like:
import matplotlib.pyplot as plt
%matplotlib inline

imarr = get_image_batch(real_generator)
imarr = imarr[0, :, :, 0]
plt.imshow(imarr)
What does the image we produce look like?
imarr = get_image_batch(synth_generator())[0, :, :, 0]
print(imarr.shape)
plt.imshow(imarr)
Note that the images above appear colored only because of plt.imshow's default colormap; they are actually grayscale binary images.
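If you prefer to see them in actual grayscale, you can pass an explicit colormap:

```python
plt.imshow(imarr, cmap='gray')
```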
As for this image-generation part, I personally think readers can simply download a CAPTCHA generator from GitHub and convert its output into binary images following the earlier steps, choosing a font that is as similar as possible to the CAPTCHAs you need to predict.

Model Definition
There are three parts to the entire network.

Refiner
The refiner, R_θ, is a ResNet. It modifies the generated image at the pixel level, rather than changing the image content as a whole, so that the overall structure and the annotation are preserved. (Otherwise it would be awkward: if the letter A were turned into a different letter, the label would no longer be correct.)

Discriminator
The discriminator, D_φ, is a simple ConvNet containing 5 convolution layers and max pooling. It is a binary classifier that distinguishes whether a CAPTCHA was synthesized by us or comes from the real sample set.

Put 'em together
Feed the refined images into the discriminator.

Refiner
The refiner is mainly 4 resnet_blocks stacked together, with a final 1*1 filter constructing a feature map that serves as the generated image. Notice that every border_mode is 'same', which means the output of every step has the same width and height as the original image (fully convolutional).
A resnet_block looks like this:
We first convolve the input image with 64 3*3 filters, and then feed the result (input_features) into the 4 resnet_blocks.
def refiner_network(input_image_tensor):
    """
    :param input_image_tensor: input tensor that corresponds to a synthetic image.
    :return: output tensor that corresponds to a refined synthetic image.
    """
    def resnet_block(input_features, nb_features=64, nb_kernel_rows=3, nb_kernel_cols=3):
        """
        A ResNet block with two `nb_kernel_rows` x `nb_kernel_cols` convolutional layers,
        each with `nb_features` feature maps. See Figure 6 in https://arxiv.org/pdf/1612.07828v1.pdf.

        :param input_features: input tensor to ResNet block.
        :return: output tensor from ResNet block.
        """
        y = layers.Convolution2D(nb_features, nb_kernel_rows, nb_kernel_cols, border_mode='same')(input_features)
        y = layers.Activation('relu')(y)
        y = layers.Convolution2D(nb_features, nb_kernel_rows, nb_kernel_cols, border_mode='same')(y)
        y = layers.merge([input_features, y], mode='sum')
        return layers.Activation('relu')(y)

    # an input image of size w x h is convolved with 3x3 filters that output 64 feature maps
    x = layers.Convolution2D(64, 3, 3, border_mode='same', activation='relu')(input_image_tensor)

    # the output is passed through 4 ResNet blocks
    for _ in range(4):
        x = resnet_block(x)

    # the output of the last ResNet block is passed to a 1x1 convolutional layer
    # producing 1 feature map corresponding to the refined synthetic image
    return layers.Convolution2D(1, 1, 1, border_mode='same', activation='tanh')(x)
Discriminator
Note that subsample here means stride. Because subsample=(2,2) halves the width and height, and there are two such layers, the final feature map is roughly 1/16 the size of the original image. For example, if the input image is 100*50, after one such convolution it becomes 50*25, and after another it becomes 25*13.

Finally, 2 feature maps are produced: one is used to judge whether the patch is real, the other whether it is refined.
def discriminator_network(input_image_tensor):
    """
    :param input_image_tensor: input tensor corresponding to an image, either real or refined.
    :return: output tensor that corresponds to the probability of whether an image is real or refined.
    """
    x = layers.Convolution2D(96, 3, 3, border_mode='same', subsample=(2, 2), activation='relu')(input_image_tensor)
    x = layers.Convolution2D(64, 3, 3, border_mode='same', subsample=(2, 2), activation='relu')(x)
    x = layers.MaxPooling2D(pool_size=(3, 3), border_mode='same', strides=(1, 1))(x)
    x = layers.Convolution2D(32, 3, 3, border_mode='same', subsample=(1, 1), activation='relu')(x)
    x = layers.Convolution2D(32, 1, 1, border_mode='same', subsample=(1, 1), activation='relu')(x)
    x = layers.Convolution2D(2, 1, 1, border_mode='same', subsample=(1, 1), activation='relu')(x)

    # here one feature map corresponds to `is_real` and the other to `is_refined`,
    # and the custom loss function is then `tf.nn.sparse_softmax_cross_entropy_with_logits`
    return layers.Reshape((-1, 2))(x)
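The notebook then wires the two networks together (the "put 'em together" step above). A minimal Keras 1-style sketch of that wiring, assuming HEIGHT, WIDTH and the two functions above are in scope; this is an illustration of the idea, not the notebook's exact code:

```python
# build the refiner model
synthetic_image_tensor = layers.Input(shape=(HEIGHT, WIDTH, 1))
refined_image_tensor = refiner_network(synthetic_image_tensor)
refiner_model = models.Model(input=synthetic_image_tensor, output=refined_image_tensor, name='refiner')

# build the discriminator model
refined_or_real_image_tensor = layers.Input(shape=(HEIGHT, WIDTH, 1))
discriminator_output = discriminator_network(refined_or_real_image_tensor)
discriminator_model = models.Model(input=refined_or_real_image_tensor,
                                   output=discriminator_output, name='discriminator')

# combined model: refined synthetic images go straight into the discriminator;
# the two outputs carry the self-regularization loss and the adversarial loss.
# The discriminator's weights are frozen while training this combined model,
# matching the alternating update scheme described earlier.
combined_output = discriminator_model(refiner_model.output)
combined_model = models.Model(input=refiner_model.input,
                              output=[refiner_model.output, combined_output], name='combined')
```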