A record of a discussion with the experts on applying GANs to NLP


To be honest, I mostly just listened to the experts, chiming in with an "ah, ah" here and there like the supporting performer in a crosstalk act.

A while ago, PaperWeekly's GAN discussion group held a discussion and put a number of topics up for a vote. I was most interested in GANs in NLP, GANs and RL, and semi-supervised GANs; there were also more orthodox, image-related GAN questions.

I didn't expect GANs in NLP to end up getting the most votes. I had always thought of applying GANs to NLP as an unconventional move, and didn't expect there were so many like-minded people...

What follows is a full record of the discussion; at the end of this post are a few thoughts of my own.

Some of the questions and answers below may read as disjointed. First, because they follow the order in which people spoke, interruptions were inevitable. Second, because attention shifted during the discussion, many questions were only wrapped up at the end, and some topics attracted few participants and never reached a conclusion. The questions follow the pre-set agenda, with some raised by students along the way.

Finally, these notes represent neither my own position nor the "correct" answers; they are just a record of the discussion.

First question: what is the GAN currently being used for in NLP, what are the main ideas, and how well does it work?

A: Dialogue systems, pure text generation, machine translation, IR; more and more things are being tried.
B: Also Chinese word segmentation and text classification.
C: With plain text you can do all sorts of odd tasks, like poetry generation and so on.
D: I've seen one that writes Tang-dynasty-style poems.
E: The main idea is to use D as a real/fake discriminator, then use RL's policy gradient to score the samples and update G.
Why use policy gradient instead of backpropagating gradients directly?
A: That feels like a difficulty of NLP itself; many applications lack genuinely reasonable evaluation criteria.
B: D itself acts as the discriminator, so its output probability serves directly as the score.
C: Because text is discrete, gradients can't be passed back through it.
D: Because small gradient adjustments are meaningless for discrete sequences.
E: I tried some character-level generation examples; the results were not very good.
F: Adding RL on top of the original task works better.
G: That is, even with something continuous like word2vec, fine-tuning may land you on a vector that doesn't represent anything.
H: Policy gradient's variance will also be larger.
I: Especially in text generation with word2vec. Let me raise a question here; I don't know whether anyone in industry has plans for it. A generated vector in embedding space almost never corresponds exactly to a meaningful point (a word), so you can only snap it to the nearest word. If you then optimize the vector, the GAN is very sensitive to the change; it may make the discriminator's judgment harder, yet the nearest dictionary word is insensitive: the ranking of words by distance to the generated vector may not change at all.

So how can we reduce the GAN's sensitivity and make this optimization meaningful?
...
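To make E's description above concrete, here is a minimal sketch (in PyTorch) of the update these papers broadly share: sample a discrete sequence from G, let D score it, and feed that score back as a REINFORCE reward. The generator and discriminator interfaces (`init_state`, `start_tokens`, calling D on token IDs) are hypothetical placeholders, not any particular paper's code.

```python
import torch

def policy_gradient_step(generator, discriminator, optimizer,
                         batch_size=32, max_len=20):
    """Sample sequences from G, score them with D, update G via REINFORCE."""
    log_probs, tokens = [], []
    state = generator.init_state(batch_size)   # hypothetical API
    inp = generator.start_tokens(batch_size)   # hypothetical API
    for _ in range(max_len):
        logits, state = generator(inp, state)  # logits: (batch, vocab)
        dist = torch.distributions.Categorical(logits=logits)
        inp = dist.sample()                    # discrete: blocks ordinary backprop
        log_probs.append(dist.log_prob(inp))
        tokens.append(inp)
    seq = torch.stack(tokens, dim=1)           # (batch, max_len) token IDs
    with torch.no_grad():
        reward = discriminator(seq)            # D's "real" probability as the score
    # REINFORCE: raise the log-probability of sequences D found realistic.
    loss = -(torch.stack(log_probs, dim=1).sum(dim=1) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The sampling step is exactly where C and D's points above bite: `dist.sample()` is not differentiable, so the reward has to enter through the log-probabilities rather than through the tokens themselves.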

How well do GANs actually work in NLP right now?

A: For IR, I remember the best result was IRGAN.
B: I think the difference from the traditional GAN setup in NLP is that you don't feed in noise and ask G to build something from scratch; you improve on the original task instead. In IRGAN, for instance, G actually takes candidate sentences and runs a softmax over them.
C: G and D are both IR systems, and the paper says that under adversarial training one side usually ends up better than it would with direct training.
D: SeqGAN was comparatively early; it went on arXiv at the end of 2016, but results may be better now. For example, LeCun's group just put their NIPS 2017 submission on arXiv, and it uses WGAN. Both IRGAN and SeqGAN came from Jun Wang's group at UCL together with Weinan Zhang's group.
E: IRGAN is a bit like the minimax game theory of the 1980s. IRGAN's reward is also applied to G; it doesn't generate directly either, but picks from the candidates with a softmax.
F: Also, I noticed the QA part of IRGAN's open-source code doesn't use the question-answer attention that the referenced papers have tried, so there may be room for improvement there. G's reward is computed by D.
G: I read the code; the network structure is written very simply, with no attention mechanism.
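As a rough illustration of the selection step B and E describe, where G softmaxes over real candidates instead of synthesizing documents, here is a hedged sketch; `score_fn` and the candidate list are assumed placeholders, not IRGAN's actual code.

```python
import torch

def select_candidate(score_fn, query, candidates, temperature=1.0):
    """Sample one real candidate under G's softmax; return index and log-prob."""
    scores = torch.stack([score_fn(query, d) for d in candidates])  # (num_candidates,)
    dist = torch.distributions.Categorical(logits=scores / temperature)
    idx = dist.sample()  # still discrete, so G trains with policy gradient,
    return idx.item(), dist.log_prob(idx)  # with D's score of the chosen pair as reward
```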
What about the current results on other tasks, say dialogue systems or machine translation? Any students want to answer?
A: I've read one paper on dialogue systems; its idea is almost the same as SeqGAN's.
B: Is SeqGAN a pioneering work?
C: It went on arXiv at the end of 2016, making it about the earliest, so the results are accordingly not great; WGAN and the like had only just been proposed back then.
D: But there the G network is a seq2seq model and D is a hierarchical encoder (not a CNN).
E: SeqGAN's first author was an undergraduate at the time; really admirable.
F: Apart from "Adversarial Learning for Neural Dialogue Generation", I don't know of others for chatbots; it seems... there aren't any.
Second question: what approaches currently exist for adversarial text generation? Tell us about the papers you've read.
A: Gumbel-Softmax doesn't seem to have been mentioned yet.
B: It's a way to generate discrete data, but I haven't fully understood it.
C: I was just reading up on Gumbel-Softmax at noon; it can replace policy gradient and makes things directly differentiable.
D: It's a simple, crude solution.
E: But judging by the results it's not much better; still, it is one of the more promising methods and provides a useful way of thinking about the problem.
F: After all, it's only an approximation.
G: So the current approaches are mainly these two, policy gradient and Gumbel-Softmax. Is there anything else? (No answer.)
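For reference, a minimal sketch of the Gumbel-Softmax trick just discussed: perturb the logits with Gumbel noise and replace the hard sampling step with a temperature-controlled softmax, so D's gradients can flow into G. (PyTorch also ships a built-in version, torch.nn.functional.gumbel_softmax.)

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=False):
    """Differentiable relaxation of sampling a one-hot token from `logits`."""
    # Gumbel(0, 1) noise via inverse transform; the epsilons guard the logs.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y = F.softmax((logits + gumbel) / tau, dim=-1)
    if hard:
        # Straight-through: one-hot on the forward pass, soft gradient backward.
        index = y.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y).scatter_(-1, index, 1.0)
        y = (y_hard - y).detach() + y
    return y
```

As F says above, it is only an approximation: the bias shrinks as tau approaches zero, but a small tau also makes the gradients noisy.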
A basic question: generating discrete data means gradients can't flow back, but isn't the hidden layer continuous? Why not just use the hidden-layer representation? Doesn't seq2seq also compute its loss on the softmax output?
A: Because the real data is discrete; to give D generated data that matches the real data, G also has to generate discrete outputs.
B: Because subtle changes in the hidden layer don't mean anything.
Third question: when using GANs on NLP problems, what are the common network architectures for G and D, and what are their respective strengths and weaknesses?
A: A lot of the architectures here are seq2seq, and there are hierarchical seq2seq variants too. A classmate earlier mentioned G as seq2seq with D as a hierarchical encoder.
(The answers below gradually drift back to the old chestnut of discreteness.)
B: You can use some embedding method to map the real data into a representation space, but escaping the continuity problem inside that space is still an issue.
C: On embeddings, Ian Goodfellow has said that word vectors are smooth, but the dictionary is discrete.
D: Mapping back is only ever approximate.
E: So I wonder whether weighting the loss would work for the discrete case, say taking (distance to the nearest word vector + 1) as a weight multiplying the original loss (a toy sketch appears after this exchange).
F: So the closer G's output is to a discrete point, the smaller the loss. (Right.)
G: That's actually quite WGAN-like.
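A toy sketch of E's suggestion; the exact form of the penalty and all names here are my own assumptions, since E only sketched the idea verbally. The original loss is multiplied by one plus the distance from each generated vector to its nearest real word embedding, so outputs near dictionary words are penalized less.

```python
import torch

def nearest_word_weighted_loss(base_loss, gen_vecs, embedding_matrix):
    """base_loss: scalar; gen_vecs: (batch, dim); embedding_matrix: (vocab, dim)."""
    dists = torch.cdist(gen_vecs, embedding_matrix)  # (batch, vocab) pairwise distances
    nearest = dists.min(dim=1).values                # distance to the closest word vector
    return base_loss * (1.0 + nearest.mean())        # (distance + 1) as a multiplier
```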
This reminds me of IRGAN and how it chooses documents for a query. When the job is framed as making a "choice", the discreteness problem doesn't seem especially serious.
It passes features through a G network and finally uses a softmax to decide which document best matches the query.
Both RNNs and CNNs have been used for G and D in published work. Is there any prior knowledge (call it folklore) to tell us which to use in which situation?
A: My feeling is that CNNs are used more.
B: Is no one using CNNs as the generator now?
C: There is, but it doesn't seem to work very well.
Fourth question: what are the drawbacks of using GANs for these discrete-sequence problems?
A: They include what Ian Goodfellow pointed out: the contradiction between discrete data and a continuous embedding space means we can only approximate; and both policy gradient and Gumbel-Softmax introduce estimation error.
B: In text generation, the GAN models the entire text sequence. For a partially generated sequence, it is very hard to estimate the score that its completions, as full sequences, will eventually receive (see the rollout sketch after this exchange).
C: Another potential challenge relates to the nature of RNNs (most generated text uses RNN models). If we try to generate text from latent codes, errors accumulate exponentially with sentence length. The first few words may be relatively reasonable, but quality keeps deteriorating as the sentence grows. Also, sentence length has to come from the random latent representation, so it is hard to control.
D: On this error-accumulation problem, one could add self-attention. Has anyone applied attention within a GAN framework yet?
E: If no one has, it's worth doing; if the results are good, a paper might come straight out of it.
F: Jiwei Li's paper on dialogue generation with GANs used seq2seq + attention.
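On B's point about scoring partially generated sequences: SeqGAN's published answer is Monte Carlo rollout, i.e. complete the prefix several times with the current policy and average D's scores on the completions. A rough sketch, where `generator.complete` is a hypothetical sampling API:

```python
import torch

def rollout_reward(generator, discriminator, prefix, max_len, n_rollouts=16):
    """Estimate the expected final score of a partial sequence `prefix`."""
    rewards = []
    for _ in range(n_rollouts):
        full = generator.complete(prefix, max_len)  # hypothetical: sample a completion
        with torch.no_grad():
            rewards.append(discriminator(full))     # (batch,) score per full sequence
    return torch.stack(rewards).mean(dim=0)         # averaged reward for the prefix
```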
Next question: which NLP problems do you think might be well suited to GANs? Existing GAN methods have plenty of pitfalls and aren't necessarily right for every kind of NLP problem.
A: I think GANs for semi-supervised learning are the more promising route; text classification looks like a viable application.
B: Dialogue generation should be doable if there were an evaluation metric, but right now there isn't one.
C: Ha, machine translation has shown that GANs really do work.
D: Semi-supervision is more promising. Hmm, could we also do data augmentation with GANs?

(At this point the host summons expert number one.)
- You have work on GANs in NMT; could you introduce it?

A: The accuracy of D, including its design, is critical, and you have to pay close attention to controlling the learning rate. The paper is "Adversarial Neural Machine Translation".
Q: Are there particular tricks to controlling the learning rate?
A: Once you replace the update rule with RL's policy gradient, you need to pay special attention to the learning rate of those updates.
Q: Could you say more about the design of D?
A: It's mostly designed from experience; in the RL phase, keep the learning rate as small as possible.
A: I tried different learning rates and found that if it's too large it really does hurt the results.
Q: Because the policy gradient's variance is too large?
A: Yes, you could say that.
A: Also, mix in supervised training on bilingual data as much as possible; it makes training much more stable.
A: Personally I think the GAN plus RL-style updates really does help.
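Pulling that advice together: a much smaller learning rate for the policy-gradient phase than for MLE pretraining, with supervised bilingual batches interleaved for stability. A hypothetical training-loop skeleton; the helpers `mle_step` and `pg_step` and the specific learning rates are placeholders, not the paper's settings.

```python
import torch

def train_adversarial_nmt(generator, discriminator, batches, steps):
    # Separate optimizers: a normal rate for supervised MLE updates and a
    # much smaller one for the high-variance policy-gradient updates.
    mle_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    pg_opt = torch.optim.Adam(generator.parameters(), lr=1e-5)
    for step in range(steps):
        src, tgt = next(batches)
        if step % 2 == 0:
            mle_step(generator, src, tgt, mle_opt)           # hypothetical helper
        else:
            pg_step(generator, discriminator, src, pg_opt)   # hypothetical helper
```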
Now for today's final question: what combinations of GANs with retrieval-based question answering currently exist?
A: IRGAN is one.
B: Work on dialogue seems generally focused on generation; using the same idea as IRGAN to build a retrieval-based dialogue system doesn't look very hard.
A: Both generative and discriminative retrieval methods could be done. Judging from IRGAN, though, that would be gap-filling innovation; it feels like a special case of IRGAN.
A: Broadly it divides into generative and discriminative models, and IRGAN already covers both; perhaps we can discuss further.
A: How would a generative/discriminative retrieval dialogue system be combined with a GAN?
A: I remember the IRGAN paper says G could generate samples directly, but they don't do that; they select with a softmax instead.
A: So, would generating samples directly be an option? How feasible is it?
C: Retrieval-based dialogue has no value.
A: It's not worthless. I think after retrieving results and extracting the key information, you can follow up with customized, personalized answers.

This was the last question. Clearly not many students cared about it, so no one really answered; the host kept voicing his own views, until... classmate C dropped the line "retrieval-based dialogue has no value"... and the discussion ended.
Since I don't know much about this problem, I can't evaluate that claim.

Finally, a few of my own thoughts on the whole discussion:

First, text is discrete. There's no doubt about that, and it is precisely what makes applying GANs to NLP so interesting.

As the discussion above shows, most threads eventually come back to this problem.

Next, the result students mentioned most was IRGAN. I haven't read the paper yet and intend to study it.
Attached: a link to an explainer of the paper.

As for SeqGAN, it can be regarded as pioneering work; it contributed the idea of applying RL to GANs.

Then there's Gumbel-Softmax, which can replace policy gradient. For now it's only a line of thought, since its generation quality isn't great, which also means this direction has plenty of room for improvement.

Students also mentioned the possibility of using GANs for semi-supervised text classification, which is one of the directions I'm currently focused on.

Finally, I welcome the experts' guidance.

Attached as well: one expert's summary of this discussion.

Original address: http://blog.csdn.net/yinruiyang94/article/details/77618344
