Using .NET to develop an MSN chatbot: the secrets of MSN chatbot development

Source: Internet
Author: User
Preface:
I am not a developer or an expert; I just like to play around. When it comes to technology I don't enjoy digging deep and prefer to take shortcuts. You cannot simply copy from this article, or take my robot and turn it into your own: I think my program is badly written, so I will not open-source it. However, if you have some knowledge of .NET or C#, I believe this article will point you to all the resources you need to develop your own fully working MSN robot. To chat with my robot, add tbot01@hotmail.com; its name is "taqikma", taken from the anime Ghost in the Shell. You can also go to http://www.guanqun.com, where there is a web chat robot similar to this MSN robot. Try talking to it first, and try chatting in Chinese.

This is not an article for beginners. If you know nothing about .NET or databases, I recommend that you read up on them first. I also hope that the real experts will not laugh at me. After all, there is nothing wrong with an ordinary computer enthusiast, who is not a developer, exploring and telling everyone how to make something fun.

I. Why MSN chatbots?

1. The reasons I can think of

The most important reason is that it is fun. What your MSN robot says will reflect your own personality (if you want it to). Of course, that is my reason: this robot started out simply because one day I wanted to make one myself. Perhaps you want your robot to do something useful for you, such as acting as an expert system or a customer service system.

2. Current MSN chatbots

There are many MSN robots now. If you have added any, the most common ones on your list are probably "Xiao Bu" and his crowd of siblings (http://www.9zi.com). For load reasons there are many of them, so every time you go online you may be flooded with friend requests from the whole family. There are also some so-called "free SMS" robots, which I have always assumed are run by SPs (service providers); to put it bluntly, so as not to get in the way of your making money, I will not comment on such robots. Worth mentioning is the msgerai (msgerai@hotmail.com) robot. The person who developed it really wants to build something as intelligent as a person. He may not manage it in his lifetime, but I wish him success; after all, it is good to have a dream, and this robot can already do some work for him (http://www.funnyok.net/nlp). There are also other MSN robots, for example ones that provide information lookup services or search Google for you. There is a list at http://www.msning.com; just take a look.

II. Why .NET

The reason is simple. C# is very similar to Java, but for Java I could not find a really handy IDE that suits my habits. C# is different: VS.NET (http://msdn.microsoft.com/vstudio/) is of course the best to use, C# Builder (http://www.borland.com/csharpbuilder/) is also good, and even SharpDevelop (http://www.icsharpcode.net/OpenSource/SD/) is quite comfortable. So .NET is the better choice.

In addition, .NET is very convenient to develop with. As long as you have a little development background, writing programs with .NET is not very difficult. I approach it as a user rather than a developer, so I don't have to study too many technical details or optimizations. I am not aiming for Microsoft Research, and I don't have the ability anyway.

I recommend using the latest version of Visual Studio .NET to save a lot of trouble.

There are also many resources available for .NET development, which are mentioned later.

III. What kind of chatbot do you want?

1. Ideas before development

By "chatbot" I mean a robot that can chat with you. You need a program to "teach" it to speak, let it grasp the general meaning of what is said, and give answers that are at least not too far off.

2. What else can it do?

You can also have it do many other things, such as looking up IP addresses, mobile phone numbers, registration numbers, or flight numbers, or searching Google for you. None of this is difficult if you want it.

IV. Let the robot speak first

Whether or not your robot is intelligent, the most important thing is that it can answer on MSN. So you need an MSN account to connect to the MSN server, receive messages from the server, and send messages back.

Of course, you can analyze the MSN protocol (http://www.hypothetic.org/docs/msn/index.php) and write the communication part yourself. However, as I said, I like shortcuts, so it is better to find a ready-made interface. I found these interfaces for MSN development.

msnphelper:
http://sourceforge.net/projects/msnphelper/

DotMSN:
http://members.home.nl/b.geertsema/dotMSN/

Both are developed for .NET. I use DotMSN, which implements the MSNP8 protocol. Do not use the DotMSN version on SourceForge; use the address given above.
Next, download this example:
http://members.home.nl/b.geertsema/dotMSN/...ple/Example.zip

Open it in VS.NET, compile, and run.

Take a look. After logging on, double-click a person on the list and a "Hello world!" message is sent to that person. You can talk to people directly without using the original MSN program.

The code for this part is:

private void ContactJoined(Conversation sender, ContactEventArgs e)
{
    // Someone joined our conversation! Remember that this also occurs when you are
    // only talking to 1 other person. Log this event.
    Log.Text += e.Contact.Name + " joined the conversation.\r\n";

    // Now say something back. You can send messages using the conversation object.
    sender.SendMessage("Hello world!");
}

This means that when the other party joins the conversation, you send him a "Hello world!". Likewise, if a person on your list double-clicks your name, he will receive a "Hello world!".

V. Make the robot understand Chinese

1. Database

Because we want to build a Chinese chatbot, the size of the corpus directly determines how smart your robot seems. Out of habit, I use MySQL as the database for storing the corpus and the Chinese dictionary. MySQL is extremely fast. Of course, you can also use Access or SQL Server, which is easier. To call MySQL from .NET, you can find MySQLDriverCS here:
http://sourceforge.net/projects/mysqldrivercs/

2. Full-sentence matching

The idea of full-sentence matching is very simple. In chat, people you don't know usually open with "hello", "Hi~~" and the like. These are simple and vary little, so just have the robot answer directly. For example, if the other party says "hello", the robot sees this "hello" and answers "hello". If the other party says "88" (chat slang for "bye-bye"), you can have the robot say "goodbye" or "88". This is called full-sentence matching: the robot takes the entire sentence and looks it up in the corpus. If the corpus has answers for this sentence, it picks one, and the other party will not think the robot is stupid.

Even if the other party says "You are stupid" and you have the robot answer "I am not stupid", the other party will feel that this robot is not bad: it even knows when someone calls it stupid.
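As a rough illustration of full-sentence matching, the sketch below keeps the corpus in memory rather than in a database. The class name SentenceMatcher, the Corpus table, and the sample entries are hypothetical and only mirror the examples in the text; this is not the author's program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the full-sentence corpus table.
public static class SentenceMatcher
{
    private static readonly Random Rng = new Random();

    // Each known sentence maps to several candidate replies.
    public static readonly Dictionary<string, string[]> Corpus =
        new Dictionary<string, string[]>
        {
            { "hello", new[] { "hello", "hi there" } },
            { "88", new[] { "goodbye", "88" } },
            { "you are stupid", new[] { "I am not stupid" } },
        };

    // Returns a random reply for an exact match, or null when the sentence is unknown.
    public static string Reply(string sentence)
    {
        // Normalize lightly so " Hello " and "hello" hit the same entry.
        string key = sentence.Trim().ToLowerInvariant();
        string[] replies;
        if (!Corpus.TryGetValue(key, out replies))
        {
            return null;
        }
        return replies[Rng.Next(replies.Length)];
    }
}
```

A null result is the signal to fall through to the next stage, for example word segmentation.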

3. Chinese word segmentation

A chatbot must understand Chinese, and the basis of Chinese processing is word segmentation. What is word segmentation? "Word segmentation is the process of recombining a continuous sequence of characters into a sequence of words according to certain rules." I copied this definition; see this article: http://www.hylanda.com/center/knowledge.htm. They have a solid reputation in Chinese word segmentation. ICTCLAS is also a good Chinese word segmentation system and comes with VC source code that you can look at:
http://www.nlp.org.cn/project/project.php?proj_id=6

Some people may say: I don't understand this stuff and have never studied it. Neither do I. However, without Chinese word segmentation, a chatbot can only stay at the level of full-sentence matching. We can use the maximum matching method to do simple word segmentation on what the chatbot receives. For the algorithm, see Mr. Zhan Weidong's lecture notes.

Course name: Basic Chinese Information Processing

http://ccl.pku.edu.cn/doubtfire/Course/Chi...2002_2003_1.htm

Download this PPT handout: http://ccl.pku.edu.cn/doubtfire/Course/Chinese%20Information%20Processing/contents/Chapter_07_1.ppt

The word segmentation algorithm does not need to be complex; a simple one will do.

In addition, the word segmentation algorithm requires a Chinese word segmentation dictionary. I provide one in MySQL format, which can be downloaded here; import it into your MySQL instance. For other databases, the SQL statements can be adapted with simple modifications.
Chinese dictionary download: http://www.guanqun.com/down/wordlist.rar
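To make the maximum matching method concrete, here is a minimal sketch of forward maximum matching, one common variant. It assumes the dictionary has already been loaded into a HashSet; the class name MaxMatchSegmenter and the maxWordLength parameter are hypothetical, and the lecture notes may describe a different variant.

```csharp
using System;
using System.Collections.Generic;

public static class MaxMatchSegmenter
{
    // Forward maximum matching: at each position, take the longest dictionary
    // word that starts there; fall back to a single character if none matches.
    public static List<string> Segment(string text, HashSet<string> dictionary, int maxWordLength)
    {
        var words = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            int len = Math.Min(maxWordLength, text.Length - pos);
            // Shrink the candidate until it is in the dictionary or one character is left.
            while (len > 1 && !dictionary.Contains(text.Substring(pos, len)))
            {
                len--;
            }
            words.Add(text.Substring(pos, len));
            pos += len;
        }
        return words;
    }
}
```

For example, with a toy dictionary containing 你, 真 and 好玩, calling MaxMatchSegmenter.Segment("你真好玩", dictionary, 4) yields 你 / 真 / 好玩. The sample sentence is only an assumed Chinese rendering of "You are really fun".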

4. Word matching

Word segmentation alone is not enough. If you really want the robot to understand what people say, you must use artificial intelligence algorithms. We only want to make a robot for fun, so there is no need to go that deep. Even with artificial intelligence as far along as it is today, there are few truly smart chatbots. Besides, that is best left to professional researchers; we are just playing. So we use the simplest method: have the robot find the keyword of the sentence and the approximate part-of-speech pattern of the sentence, and then look in the corpus for an answer that matches this rule.

A simple example. Suppose the other party says:

"You are really fun"

We first use the word segmentation algorithm to split this sentence into words:

"You / are / really / fun"

Then we pick out the keyword "fun" and also record the part-of-speech pattern of the sentence. When we then find the keyword "fun" in the corpus, we check further whether there is an answer for a sentence with a similar pattern. If so, we pick one at random, for example "Haha...", so that the user feels understood.

Then the question arises: how do we find the keyword? My method (crude, but usually effective) is to take the longest word in the sentence as the keyword, for no other reason than that it is faster. If every word in the sentence were treated as a keyword and looked up in the database, some matching problems could arise. (Not scientific, but usually effective.)
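The longest-word heuristic is small enough to sketch directly. The class name KeywordPicker is hypothetical, and the tie-breaking rule (the earliest word wins) is an assumption, since the text does not say what happens when two words have the same length.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordPicker
{
    // Returns the longest word in the segmented sentence.
    // Assumption: on a tie the earliest word wins. Returns null for an empty list.
    public static string LongestWord(IList<string> words)
    {
        string best = null;
        foreach (string word in words)
        {
            if (best == null || word.Length > best.Length)
            {
                best = word;
            }
        }
        return best;
    }
}
```

The input is the word list produced by segmentation; the result is the single keyword to look up in the corpus.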

VI. Make the robot smarter

1. Design of the full-sentence matching corpus

The first step is to build your full-sentence matching corpus. You must write the corpus yourself; do not be lazy. Find out what people say most often, for example hello, thank you, sorry, and put several answers in for each, so that the reply is not the same every time. Then write an SQL statement to query the answer, for example:

select * from reply where `key` = '" + sentense + "' order by rand() limit 1

If a row is found, reply with it directly. If the full sentence cannot be found, perform word segmentation. Note that sentense is raw user input, so escape it or use a parameterized query before sending it to the database.

2. Design of the word segmentation matching corpus

Because our word segmentation algorithm is not optimized and our way of finding keywords is not very good, your answers must not be too specific. To put it bluntly, the answers should be vague. The goal is to make people feel that the robot understood what they said and that the answer is more or less "right". There is no need to be on target 100% of the time; as long as more than 40% of the answers fit, chat users will basically accept them. It is also recommended to write the answers so that, when the other party replies to them, that reply can again be found in your corpus, ideally as a full-sentence match.

For example:

Q: "Are you male or female?" / "Are you male or female" (It does not matter whether punctuation is present. We need to record the part-of-speech pattern of the sentence and handle punctuation at the same time.)

For a sentence like this, word segmentation lets us find the keyword "or". From the parts of speech we can tell that this is a question, and that it asks for a choice between two options. (Of course, simple algorithms cannot tell us that the sentence is actually about gender.)

How should your robot answer this question? It is actually very simple. First, answer in a way that fits the question and try not to sound wrong. At the very least, people should think that your robot knows what was asked. So my robot replied:

Robot answer: Yes... Haha

Because the answer is a joking chat phrase, it makes the chat partner feel that this robot is not so stupid.

This is just a simple example. You have to analyze many specific sentences yourself. Of course, the larger the corpus, the more the robot understands and the more intelligent it appears.

3. What if no keyword is found?

In quite a few cases our word segmentation algorithm will fail to match a suitable answer. So we need yet another corpus for replies when no keyword matches. Writing these replies takes the kind of skill a fortune-teller has, because the other party may say anything and our robot will not understand it. We therefore have to find a way to bluff through, and try to steer the conversation toward things your robot can answer. Try chatting with "Xiao Bu": when it cannot answer, it just picks a line of Buddhist scripture.

In fact, the key technique is the fortune-teller's: everything he says is vague and cloudy, which leaves people puzzled yet feeling that it might be right. We need the robot to learn this skill in order to appear "smart".
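The three corpora described in this section can be chained as: full-sentence match first, then keyword match, then a vague fallback. The sketch below shows that order with in-memory tables; the class ReplyChain and its constructor parameters are hypothetical, and it leaves out the part-of-speech check that the text also mentions.

```csharp
using System;
using System.Collections.Generic;

public class ReplyChain
{
    private readonly Dictionary<string, string[]> sentenceCorpus;
    private readonly Dictionary<string, string[]> keywordCorpus;
    private readonly string[] fallbackReplies;
    private readonly Random rng = new Random();

    public ReplyChain(
        Dictionary<string, string[]> sentenceCorpus,
        Dictionary<string, string[]> keywordCorpus,
        string[] fallbackReplies)
    {
        this.sentenceCorpus = sentenceCorpus;
        this.keywordCorpus = keywordCorpus;
        this.fallbackReplies = fallbackReplies;
    }

    // keyword may be null when segmentation produced nothing usable.
    public string Reply(string sentence, string keyword)
    {
        string[] candidates;
        // 1. Full-sentence match.
        if (sentenceCorpus.TryGetValue(sentence, out candidates))
        {
            return Pick(candidates);
        }
        // 2. Keyword match.
        if (keyword != null && keywordCorpus.TryGetValue(keyword, out candidates))
        {
            return Pick(candidates);
        }
        // 3. Nothing matched: answer vaguely, fortune-teller style.
        return Pick(fallbackReplies);
    }

    private string Pick(string[] candidates)
    {
        return candidates[rng.Next(candidates.Length)];
    }
}
```

Keeping several replies in each array is what stops the robot from repeating itself; the fallback array in particular should hold many vague lines.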

Last words:
In fact, writing such a robot program is very quick. If you are familiar with the tools, you can write it in a day. It took me about a day and a half, plus some time to prepare the corpus. If you really want to build a somewhat "smart" robot, this article should save you at least 3-5 hours of searching for material. If you are too lazy to study it yourself, you can only download someone else's program that does full-sentence matching and play with that.

First published on my blog: http://bot.donews.net/bot. If you reprint this, please do not remove this line.
