ML: Natural language processing (NLP) questions


Three milestones in natural language processing:

http://blog.csdn.net/sddamoke/article/details/1419973
The article first points out two facts:

First, phrase structure grammar cannot describe natural language effectively.

Second, the coverage of phrase structure rules is limited. Chomsky assumed that a natural language has a finite number of grammar rules but can generate an infinite number of sentences.
The three milestones mentioned in the article are:

First, complex feature sets. A complex feature set is also called a multi-attribute description.

Second, lexicalism. Linguists hold that it is impossible to write grammar rules for individual words without dividing words into parts of speech, but they also note that any classification loses some important information about the individual word.

Third, statistical language models. This is also known as the corpus-based method.

Summary:

Complex feature sets, unification grammar and lexicalism are all important contributions made within the original rationalist framework (production rules or logical inference). In particular, the development of lexicalism is increasingly supported by corpora and statistical methods, which amounts to a combination of empiricism and rationalism. These approaches will become the mainstream of natural language processing technology.


Natural Language Processing:

Salary above 10k; location: Beijing; more than 1 year of experience; master's degree or above; full-time

Job perks: performance bonus, plenty of room for promotion, outstanding team, half-yearly salary adjustment

Job Description:
1. Carry out applied research on natural language processing technology in the areas of dialogue, general question answering, intent understanding, and knowledge base processing and management;
2. Work with the other members of the project team, according to the research and project plan, to complete research and development tasks on time and with assured quality.
Qualifications:
1. Education requirements: Graduate of a Project 211 or Project 985 university, master's or doctoral degree, English at level 4 or above.
2. Working experience:

More than 1 year of working experience in a related field; familiar with natural language processing techniques; research experience in dialogue, general question answering, or intent understanding
Familiar with C/C++/Python, with a background in algorithm research
Good English, broad knowledge, strong research ability; familiar with and proficient in applying natural language processing to intelligent human-computer interaction.
3. Capacity requirements:
Strong research interest and willingness to learn; good learning, problem analysis, and problem solving abilities;
Good communication and coordination skills and team spirit; takes the initiative to summarize and share development experience.
4. Quality requirements:
Steady and down-to-earth at work, positive attitude, able to work under pressure and adapt to strict project management;
Strong sense of responsibility, honest and trustworthy.


Interview and written test questions:

Baidu campus recruitment, natural language processing engineer. Original link: http://bishi.cnrencai.com/jingyan/1775.html

First, short answer (30 points)

1. Briefly describe the principle and necessary conditions of database and thread deadlock, and briefly describe how to avoid deadlock. (10 points)
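
One common way to avoid deadlock is to break the circular-wait condition by always acquiring locks in a fixed global order. The sketch below illustrates that idea with Python threads; `Account`, `transfer` and `run_demo` are hypothetical names chosen for illustration, and this is only one of several possible answers (lock timeouts and deadlock detection are others).

```python
import itertools
import threading

_ids = itertools.count()  # source of unique ids that define the lock order


class Account:
    """Hypothetical resource guarded by its own lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.order = next(_ids)


def transfer(src, dst, amount):
    """Move amount from src to dst, locking both accounts in a fixed order."""
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would block
    # Lock the account with the smaller order value first, so two opposite
    # transfers can never each hold one lock while waiting for the other.
    first, second = (src, dst) if src.order < dst.order else (dst, src)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_demo(rounds=1000):
    """Run opposite transfers concurrently and return the final balances."""
    a, b = Account(100), Account(100)

    def move(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    threads = [
        threading.Thread(target=move, args=(a, b)),
        threading.Thread(target=move, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

If each thread instead locked `src` first and `dst` second, the two threads could block each other forever; the fixed order removes that possibility.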

2. Please enumerate the three basic elements of object-oriented design and five main design principles. (10 points)

3. How do multiple threads synchronize? (10 points)
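
As a minimal sketch of one synchronization primitive, the example below protects a shared counter with a mutex (`threading.Lock`). The function name `count_with_lock` and its parameters are assumptions for illustration; semaphores, condition variables and events are other common answers to this question.

```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Increment a shared counter from several threads under one lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write step.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place the result is always `num_threads * increments`; without it, concurrent updates may be lost.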

Second, algorithm and program design (45 points in total)

1. 100 light bulbs stand in a row. In the first round, all bulbs are switched on. In the second round, every second bulb is toggled, so the even-numbered bulbs are switched off. In the third round, every third bulb is toggled: a bulb that is on is switched off, and a bulb that is off is switched on. And so on. At the end of the 100th round, how many bulbs are still on? Write code to implement this. (15 points)
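
The rounds can be simulated directly. The sketch below is one straightforward way to do it; the function name `bulbs_on` is an assumption for illustration.

```python
def bulbs_on(n=100):
    """Simulate n rounds over n bulbs and return the bulbs that stay on."""
    on = [False] * (n + 1)  # index 0 is unused; all bulbs start off
    for rnd in range(1, n + 1):
        # In round rnd, toggle every rnd-th bulb: rnd, 2*rnd, 3*rnd, ...
        for i in range(rnd, n + 1, rnd):
            on[i] = not on[i]
    return [i for i in range(1, n + 1) if on[i]]


if __name__ == "__main__":
    remaining = bulbs_on(100)
    print(len(remaining), remaining)
```

Bulb i is toggled once for each divisor of i, so only bulbs with an odd number of divisors, the perfect squares, stay on. For 100 bulbs that gives 10 bulbs: 1, 4, 9, ..., 100.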

2. There is a set of millions of strings (worddict), in which each string is 2 to 5 characters long. For any query string (query), a fuzzy match against worddict is defined as follows: after removing up to 6 consecutive characters from inside the query, the remainder exactly matches a word in worddict. For example, if worddict contains the string "Baidu Company", the query "Beijing Baidu Network Technology Co., Ltd." matches it after 6 consecutive characters ("Network Technology Limited") are removed. The character counts refer to the original Chinese strings.

Now you need to design an algorithm to implement this function:

/**
 * @brief Query match function
 * @param worddict  the set of strings; you may define the data structure of the dictionary yourself
 * @param query     the query string
 * @param querylen  the length of query
 * @return 1 if the query fuzzy-matches a string in the dictionary, -1 otherwise
 */
int check_query(const dict *worddict, const char *query, const int querylen);

Requirements: give the design of the data structure dict and complete the check_query function. (20 points)
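
A minimal sketch of one possible approach, written in Python rather than the C signature of the question: store the dictionary in a hash set and, for each query, try every way of removing 0 to 6 consecutive characters. Because dictionary words have 2 to 5 characters, only queries of 2 to 11 characters can match, so the number of candidates per query is small. The helper `build_dict`, the constant names, and the choices that removing zero characters counts as a match and that the removed span may sit anywhere in the query are assumptions, not part of the original question.

```python
MIN_WORD_LEN = 2   # shortest dictionary word
MAX_WORD_LEN = 5   # longest dictionary word
MAX_REMOVED = 6    # at most this many consecutive characters may be removed


def build_dict(words):
    """Hypothetical dict design: a hash set gives O(1) average lookups."""
    return frozenset(words)


def check_query(worddict, query):
    """Return 1 if query fuzzy-matches a word in worddict, otherwise -1."""
    n = len(query)
    # A match is impossible if the query is too short or too long.
    if n < MIN_WORD_LEN or n > MAX_WORD_LEN + MAX_REMOVED:
        return -1
    if query in worddict:  # assumption: removing zero characters is allowed
        return 1
    for start in range(n):
        for length in range(1, MAX_REMOVED + 1):
            end = start + length
            if end > n:
                break
            rest = query[:start] + query[end:]
            if MIN_WORD_LEN <= len(rest) <= MAX_WORD_LEN and rest in worddict:
                return 1
    return -1


if __name__ == "__main__":
    d = build_dict(["abcd"])
    print(check_query(d, "abXXXXXXcd"))   # six X's can be removed
    print(check_query(d, "abXXXXXXXcd"))  # seven X's cannot
```

A trie would serve equally well as the dictionary structure; the hash set is used here only because it keeps the sketch short.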

Third, system design (35 points in total)

1. Spelling correction is a search engine feature that automatically analyzes the query entered by the user, checks it for spelling errors, and, if there are any, suggests the correct spelling. For example, when "Lenovo mobile phone" is entered with a wrong character, the search engine usually shows a hint such as "Did you mean: Lenovo mobile phone".

Generally speaking, spelling correction consists of two main steps: first, identify the wrong words entered by the user; second, change the wrong words into the correct ones.

Questions: 1) In Chinese, a common input error is a homophone written with different characters, for example "Apple" entered as "Pingguo"; in English, common errors are typing mistakes, such as "latest" typed as "latst". Give a solution for each of these two kinds of errors.

2) User queries often contain contextual information (such as "when will the Pingguo mobile phone be released"). How can this context be used to improve error correction?
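
The sketch below illustrates one possible answer to each part of question 1, under stated assumptions. For English typos it generates all strings within edit distance 1 of the input and keeps the most frequent one found in a vocabulary. For Chinese homophones it converts the input to pinyin and looks up the most frequent word with that pronunciation. The tables `word_freq`, `char_pinyin` and `pinyin_index`, and all the sample data, are hypothetical toy inputs; a real system would derive them from query logs and a pronunciation dictionary.

```python
import string


def edits1(word):
    """Return all strings within one edit (delete, swap, replace, insert)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def correct_english(word, word_freq):
    """Pick the most frequent known word within edit distance 1."""
    word = word.lower()
    if word in word_freq:
        return word
    candidates = [w for w in edits1(word) if w in word_freq]
    return max(candidates, key=word_freq.get) if candidates else word


def correct_homophone(word, char_pinyin, pinyin_index, word_freq):
    """Pick the most frequent word that has the same pinyin as the input."""
    key = "".join(char_pinyin.get(ch, ch) for ch in word)
    candidates = pinyin_index.get(key, [word])
    return max(candidates, key=lambda w: word_freq.get(w, 0))


if __name__ == "__main__":
    print(correct_english("latst", {"latest": 50, "phone": 30}))
    toy_pinyin = {"平": "ping", "苹": "ping", "果": "guo"}
    toy_index = {"pingguo": ["苹果", "平果"]}
    toy_freq = {"苹果": 1000, "平果": 3}
    print(correct_homophone("平果", toy_pinyin, toy_index, toy_freq))
```

For question 2, the same candidate lists could be re-ranked by how well each candidate fits the neighbouring words, for example with an n-gram language model over the whole query; that step is not shown here.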

Baidu 2015 campus recruitment, natural language processing engineer written examination

Original link: http://blog.csdn.net/shymi1991/article/details/39432775

First, short answer

1. The difference between new and malloc.

1) malloc and free are standard library functions of C and C++, while new and delete are C++ operators. Both pairs can be used to request and release dynamic memory.

2) For objects that are not built-in data types, malloc/free cannot meet the requirements of dynamic objects. The constructor must run automatically when an object is created, and the destructor must run automatically before the object is destroyed. Because malloc/free are library functions rather than operators, they are outside the compiler's control, so the task of running constructors and destructors cannot be imposed on them.

3) The C++ language therefore needs the operator new, which performs dynamic memory allocation and initialization, and the operator delete, which performs cleanup and releases the memory. Note that new/delete are not library functions.

4) C++ programs often call C functions, and C programs can only manage dynamic memory with malloc/free.

2. Hash collisions and their solutions.

Elements with different keys may be mapped to the same address in the hash table; this is a hash collision. Solutions:

1) Open addressing: when a collision occurs, a probing technique generates a probe sequence in the hash table. The sequence is followed one cell at a time until the given key is found or an open address (an empty cell) is reached. On insertion, reaching an open address means the new node can be stored in that cell. On lookup, reaching an open address means the key is not in the table, so the lookup fails.

2) Rehashing: construct several different hash functions, and when one produces a collision, try the next.

3) Chaining: all elements whose hash address is i are linked into a singly linked list called a synonym chain, and the head pointer of that list is stored in cell i of the hash table, so lookup, insertion and deletion are mainly done within the synonym chain. Chaining suits cases where insertions and deletions are frequent.

4) Public overflow area: the hash table is divided into a base table and an overflow table; every element that collides with an entry in the base table is placed in the overflow table.
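
Two of the strategies above, chaining and open addressing with linear probing, can be sketched as small Python classes. The class names, the fixed table size and the absence of resizing and deletion are simplifying assumptions for illustration only.

```python
class ChainedHashTable:
    """Chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))      # colliding keys share the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


class LinearProbingTable:
    """Open addressing: on a collision, try the next slot in sequence."""

    _EMPTY = object()  # marks a slot that has never been used

    def __init__(self, size=8):
        self.keys = [self._EMPTY] * size
        self.values = [None] * size

    def _probe(self, key):
        size = len(self.keys)
        start = hash(key) % size
        for step in range(size):
            yield (start + step) % size

    def put(self, key, value):
        for i in self._probe(key):
            if self.keys[i] is self._EMPTY or self.keys[i] == key:
                self.keys[i], self.values[i] = key, value
                return
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        for i in self._probe(key):
            if self.keys[i] is self._EMPTY:
                return default  # reached an open address: key is absent
            if self.keys[i] == key:
                return self.values[i]
        return default
```

With a table size of 8, the integer keys 1, 9 and 17 all hash to slot 1, which makes them a convenient way to exercise both collision strategies.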

Second, programming

1. Implement merge sort.
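
A minimal top-down merge sort, as one possible answer; the function name `merge_sort` is an assumption, and the version below returns a new list rather than sorting in place.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves; "<=" keeps equal elements in order (stable).
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

The running time is O(n log n) in all cases, at the cost of O(n) extra space for the merged lists.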

2. S-shaped (zigzag) traversal of a binary tree.

The first level is visited from left to right, the second level from right to left, the third level from left to right, and so on.
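
A sketch of the traversal, assuming that "S-type" means the direction alternates from one level to the next. It does a breadth-first traversal with a queue and reverses every other level; the `Node` class and the function name `zigzag_levels` are hypothetical.

```python
from collections import deque


class Node:
    """Hypothetical binary tree node."""

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def zigzag_levels(root):
    """Return node values level by level, alternating the direction."""
    if root is None:
        return []
    result = []
    queue = deque([root])
    left_to_right = True
    while queue:
        level = []
        for _ in range(len(queue)):  # process exactly one level
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level if left_to_right else level[::-1])
        left_to_right = not left_to_right
    return result
```

For a complete tree with levels 1 / 2 3 / 4 5 6 7, the function returns `[[1], [3, 2], [4, 5, 6, 7]]`.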

3. A text file contains 2 billion URLs, one URL per line, with duplicates. Count the frequency of each URL.
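
Assuming the full set of URLs does not fit in memory, one common approach is to partition the file by a hash of the URL, so that all copies of the same URL land in the same bucket file, and then count each bucket separately. The sketch below shows the idea on a single machine; the function name `count_urls`, the bucket count and the use of CRC32 as the partitioning hash are assumptions for illustration.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_urls(path, num_buckets=64):
    """Yield (url, count) pairs for a file that holds one URL per line."""
    with tempfile.TemporaryDirectory() as tmp:
        bucket_paths = [
            os.path.join(tmp, f"bucket_{i}.txt") for i in range(num_buckets)
        ]
        buckets = [open(p, "w", encoding="utf-8") for p in bucket_paths]
        try:
            # Pass 1: route each URL to a bucket chosen by its hash.
            with open(path, encoding="utf-8") as src:
                for line in src:
                    url = line.strip()
                    if url:
                        idx = zlib.crc32(url.encode("utf-8")) % num_buckets
                        buckets[idx].write(url + "\n")
        finally:
            for f in buckets:
                f.close()
        # Pass 2: each bucket is small enough to count in memory on its own.
        for p in bucket_paths:
            with open(p, encoding="utf-8") as f:
                counts = Counter(line.rstrip("\n") for line in f)
            yield from counts.items()
```

Since the buckets are independent, the second pass could also be distributed across machines; the number of buckets would be chosen so that one bucket's distinct URLs fit in memory.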


Third, system design

1. Forward matching is often used for word segmentation in natural language processing. For example, for the phrase

The distant ancient Babylon

the forward matching segmentation result is:

remote | Cuba | Babylon

and the backward matching segmentation result is:

far | away | ancient | Babylon

Write the interface for forward matching and its implementation.
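
A minimal sketch of forward maximum matching: at each position, take the longest dictionary word that starts there, and fall back to a single character when nothing matches. The function name `forward_max_match`, its parameters and the toy dictionary are assumptions; the Chinese string in the usage example is an assumed reconstruction of the translated phrase above, used only because the method operates on unsegmented text.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Segment text greedily from left to right, longest match first."""
    if max_word_len is None:
        max_word_len = max((len(w) for w in dictionary), default=1)
    max_word_len = max(1, max_word_len)

    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_word_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            # A single character is always accepted so the scan can advance.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += length
                break
    return tokens


if __name__ == "__main__":
    toy_dict = {"遥远", "古巴", "比伦", "远古", "巴比伦"}  # hypothetical entries
    print(" | ".join(forward_max_match("遥远古巴比伦", toy_dict)))
```

Because the scan is greedy, an early long match can force a poor split later in the sentence, which is why forward and backward matching may disagree on the same input.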

Summary: I have only excerpted this much; I could not finish them all anyway.
