ML: Natural Language Processing (NLP) interview questions

Source: Internet
Author: User

Three milestones in natural language processing:

Source article: http://blog.csdn.net/sddamoke/article/details/1419973
The article starts from two facts:

Phrase structure grammar cannot describe natural language effectively.

Phrase structure rules have limited coverage. Chomsky put forward the hypothesis that a natural language has a finite number of grammatical rules but generates an infinite number of sentences.
The three milestone developments mentioned in the article are as follows:

First, complex feature sets. A complex feature set is also called a multi-attribute description.

Second, lexicalism. Linguists argue that grammatical structure cannot be described without dividing words into parts of speech, and that writing rules word by word is "impossible"; but they also note that any classification loses some important information about the individual word.

Third, statistical language models. This is also known as the corpus-based method.

Summary:

Complex feature sets, unification grammar, and lexicalism are all important contributions within the original rationalist framework (generative rules or logical inference). In particular, the development of lexicalist methods is increasingly supported by corpora and statistical methods, which integrates the empiricist and rationalist approaches. They will become the mainstream of natural language processing technology.


A sample natural language processing job posting:

Salary 10k and above; at least 1 year of experience; Beijing; master's degree or above; full-time

Job perks: performance bonus, multiple promotions, large room for growth, excellent team, semi-annual salary

Job Description:
1, carry out applied research on natural language processing technology in dialogue, general question answering, intent understanding, knowledge base processing and management, and related directions;
2, follow the research and project plan and work with the other members of the project team to complete research and development tasks on time while ensuring quality.
Qualifications:
1. Education requirements: graduated from a 211 or 985 institution, master's degree or above, PhD preferred; English CET-4 or above.
2. Working experience:

At least 1 year of working experience in a related field; familiar with natural language processing technology; research experience in dialogue, general question answering, and intent understanding
Familiar with C/C++/Python, with an algorithm research background
Good English, broad knowledge, strong research ability; familiar with and proficient in applying natural language processing to intelligent human-machine interaction.
3. Competency Requirements:
Strong interest in research and willingness to learn, with good learning, problem analysis, and problem-solving abilities;
Good communication and coordination skills and a sense of teamwork; actively summarizes and shares development experience.
4. Quality Requirements:
Practical, with a positive attitude; able to work under pressure and adapt to strict project management;
Strong sense of responsibility, honest and trustworthy.


Interview written test questions:

Baidu campus recruitment, natural language processing engineer. Original link: http://bishi.cnrencai.com/jingyan/1775.html

First, short answer (30 points total)

1. Briefly describe the principle and necessary conditions of database deadlock and thread deadlock, and briefly describe how to avoid deadlock. (10 points)

2. Please list three basic elements of object-oriented design and five main design principles. (10 points)

3. How to synchronize multiple threads. (10 points)
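
One common answer to question 3 is a mutual-exclusion lock around the shared state. The sketch below is only an illustration of that idea, not the expected exam answer; `run_counter` and its parameters are hypothetical names introduced here.

```python
import threading


def run_counter(num_threads=4, increments=10000):
    """Increment one shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()  # guards every read-modify-write of `counter`

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread at a time may update the counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all workers before reading the result
    return counter
```

With the lock in place the result is always `num_threads * increments`. Other standard tools for the same question include condition variables, semaphores, and events; when several locks are needed, acquiring them in one fixed global order is a usual way to avoid deadlock.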

Second, algorithm and program design (45 points total)

1. 100 bulbs are arranged in a row. In the first round, all bulbs are switched on. In the second round, every second bulb is switched off, that is, the even-numbered bulbs are turned off. In the third round, every third bulb is toggled: a bulb that is on is turned off, and a bulb that is off is turned on. And so on. After the 100th round, how many bulbs are on? Write code to implement this. (15 points)
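
A minimal simulation of the bulb problem, assuming round r toggles every bulb whose 1-based position is a multiple of r (which matches the first three rounds described above). The function name `bulbs_on` is introduced here for illustration.

```python
def bulbs_on(n=100):
    """Return the 1-based positions of the bulbs that are on after n rounds."""
    on = [False] * (n + 1)  # index 0 is unused so positions stay 1-based
    for rnd in range(1, n + 1):
        # Round `rnd` toggles bulbs rnd, 2*rnd, 3*rnd, ...
        for pos in range(rnd, n + 1, rnd):
            on[pos] = not on[pos]
    return [pos for pos in range(1, n + 1) if on[pos]]


if __name__ == "__main__":
    lit = bulbs_on(100)
    print(len(lit), lit)
```

A bulb is toggled once per divisor of its position, so only positions with an odd number of divisors, the perfect squares, end up on. For 100 bulbs that gives 10 bulbs: 1, 4, 9, ..., 100.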

2. There is a collection of millions of strings (worddict), and each string in worddict is 2 to 5 characters long. For any query string, a fuzzy match of the query against worddict is defined as follows: after removing at most 6 consecutive Chinese characters from inside the query, the remainder exactly matches a word in worddict. For example: worddict contains the string "Baidu Company"; for the query "Beijing Baidu Network Technology Co., Ltd.", removing 6 consecutive characters ("Network Technology Limited") from the query lets it match "Baidu Company".

Now you need to design an algorithm to implement this function:

/** @brief: query match function
 * @param worddict: string collection; you may define your own data structure for the dictionary worddict;
 * @param query: the query;
 * @param querylen: length of query;
 * @return: 1 if the query fuzzily matches a string in the dictionary, 0 otherwise;
 */
int check_query (const dict *worddict, const char *query, const int querylen);

Requirements: give the design of the data structure dict and complete the check_query function (20 points)
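
One possible approach, sketched in Python rather than the C interface above: store the dictionary in a hash set and try every way of deleting 0 to 6 consecutive characters from the query. Several things here are assumptions, not part of the question: the deleted span may start at any position (including the ends), deleting nothing counts as a match, one Python character stands for one Chinese character, and `build_dict` plus the keyword parameters are hypothetical names.

```python
def build_dict(words):
    """The dictionary is a hash set, so an exact lookup is O(1) on average."""
    return frozenset(words)


def check_query(worddict, query, max_removed=6, min_len=2, max_len=5):
    """Return 1 if deleting at most `max_removed` consecutive characters
    from `query` leaves a string found in `worddict`, otherwise 0."""
    n = len(query)
    # Dictionary words have 2..5 characters, so longer queries cannot match.
    if n < min_len or n > max_len + max_removed:
        return 0
    for removed in range(max_removed + 1):
        remaining = n - removed
        if remaining < min_len:
            break  # removing even more only makes the remainder shorter
        if remaining > max_len:
            continue  # remainder is still too long to be a dictionary word
        for start in range(n - removed + 1):
            candidate = query[:start] + query[start + removed:]
            if candidate in worddict:
                return 1
    return 0


if __name__ == "__main__":
    d = build_dict(["abcd"])  # placeholder word, not from the original question
    print(check_query(d, "abXXXXXXcd"))
```

Because a matching query has at most 11 characters, each query tries at most a few dozen candidates, so the cost per query does not depend on the dictionary size.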

Third, system design (35 points total)

1. Spelling correction is a search engine feature: it automatically analyzes the query entered by the user, checks it for spelling errors, and, if there are any, suggests the correct spelling. For example, "Lenovo mobile" is mistyped as "Lenovo phone." The search engine then generally shows a hint such as "Did you mean: Lenovo Phone".

In general, spelling correction consists of two important steps: the first is to identify the wrong words entered by the user, and the second is to change the wrong words into the correct words.

Questions: 1) In Chinese, a common input error is a different word with the same pronunciation, for example "Apple" mistyped as "Pingguo"; in English, the common input errors are spelling errors, for example "latest" mistyped as "latst". Please provide a solution for each of these two kinds of errors in Chinese and English input.

2) The query entered by the user often also contains some contextual information (for example "when Pingguo mobile phone release"). How can this context be used to improve the correction?
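
For the English half of question 1), one widely used approach is to generate every string within one edit of the input and keep the most frequent one that is a known word. The sketch below assumes a tiny made-up frequency table (`WORD_FREQ`); a real system would build it from query logs or a corpus, and all names here are hypothetical.

```python
import string
from collections import Counter

# Hypothetical word frequencies, for illustration only.
WORD_FREQ = Counter({"latest": 50, "last": 40, "later": 30, "phone": 20})


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right
                for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freq=WORD_FREQ):
    """Return the most frequent known word within one edit, else the input."""
    word = word.lower()
    if word in freq:
        return word  # already a known word, nothing to fix
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        return word  # no suggestion available
    return max(candidates, key=lambda w: freq[w])


if __name__ == "__main__":
    print(correct("LATST"))
```

The same generate-then-rank structure could apply to the Chinese case by indexing words under their pinyin and ranking the homophones, and to question 2) by ranking candidates with an n-gram score over the neighbouring words instead of a single-word frequency; both are left out of this sketch.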

Baidu 2015 campus recruitment, natural language processing engineer written test

Original link: http://blog.csdn.net/shymi1991/article/details/39432775

First, short answer

1. The difference between new and malloc.

1, malloc and free are standard library functions of C/C++, while new/delete are C++ operators. Both can be used to allocate and free dynamic memory.

2, for objects of non-built-in types, malloc/free alone cannot meet the requirements of dynamic objects. A constructor must run automatically when an object is created, and a destructor must run automatically before the object is destroyed. Because malloc/free are library functions rather than operators, they are outside the compiler's control, so the task of running constructors and destructors cannot be imposed on malloc/free.

3, so the C++ language needs an operator new that performs dynamic memory allocation and initialization, and an operator delete that performs cleanup and memory release. Note that new/delete are not library functions.

4, C++ programs often call C functions, while C programs can only use malloc/free to manage dynamic memory.

2. Hash collisions and solutions.

Elements with different keys may be mapped to the same address in the hash table; this is a hash collision. Solutions:

1) Open addressing: when a collision occurs, a probe sequence is formed in the hash table using some probing technique. The table is searched cell by cell along this sequence until either the given key is found or an open address (an empty cell) is reached. For insertion, the new node is stored in that open cell. For lookup, reaching an open address means the key is not in the table, so the lookup fails.

2) Rehashing: construct several different hash functions; when one of them produces a collision, the next one is tried.

3) Separate chaining: all elements whose hash address is i are linked into a singly linked list called a synonym chain, and the head pointer of that list is stored in slot i of the hash table, so lookups, insertions, and deletions are performed mainly within the synonym chain. Separate chaining suits workloads with frequent insertions and deletions.

4) Public overflow area: the hash table is split into a base table and an overflow table; every element that collides with an entry in the base table is placed in the overflow table.
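
To make methods 1) and 3) concrete, here is a small sketch of a linear-probing table and a chained table. It assumes a fixed capacity with no resizing and no deletion (deletion under open addressing would additionally need tombstone markers); the class names are introduced here for illustration.

```python
class LinearProbingTable:
    """Open addressing with linear probing (fixed capacity, no deletion)."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot is None or (key, value)

    def _probe(self, key):
        # Probe sequence: home slot first, then the following slots, wrapping.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key, value):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None or slot[0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("table is full")

    def lookup(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:
                return None  # reached an open address: the key is absent
            if slot[0] == key:
                return slot[1]
        return None


class ChainedTable:
    """Separate chaining: each bucket holds the synonym chain for one address."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def lookup(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for existing, value in bucket:
            if existing == key:
                return value
        return None
```

With a capacity of 8, the integer keys 1, 9, and 17 all hash to slot 1, so inserting them exercises the collision path in both tables.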

Second, programming

1. Implement merge sort.
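
A minimal top-down merge sort, given as one possible answer. It returns a new list instead of sorting in place, which is a choice made here for brevity.

```python
def merge_sort(items):
    """Return a new sorted list; the sort is stable and runs in O(n log n)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])  # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged
```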

2. S-shaped (zigzag) traversal of a binary tree.

The first level from left to right, the second level from right to left, the third level from left to right ...
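
One way to sketch the S-shaped traversal is a level-order walk with a queue, reversing every second level. The `Node` class and the function name are assumptions introduced for this example.

```python
from collections import deque


class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def zigzag_levels(root):
    """Return node values level by level, alternating the direction."""
    if root is None:
        return []
    result = []
    queue = deque([root])
    left_to_right = True
    while queue:
        level = []
        for _ in range(len(queue)):  # exactly the nodes of the current level
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level if left_to_right else level[::-1])
        left_to_right = not left_to_right
    return result
```

An alternative that avoids the reversal uses two stacks, one per direction; the queue version is shown because it is shorter.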

3. 2 billion URLs are stored in a text file, one URL per line, with duplicates. Count the frequency of each URL.
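
At this scale a single in-memory map may not fit, so a common approach is to hash-partition the file so that identical URLs land in the same partition, then count each partition separately. The sketch below assumes local temporary files and a hypothetical `num_parts` setting; in practice the partition count would be chosen so that one partition fits in memory.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_urls(path, num_parts=16):
    """Yield (url, count) pairs for the file at `path`, one URL per line."""
    with tempfile.TemporaryDirectory() as workdir:
        part_paths = [os.path.join(workdir, "part_%d.txt" % i)
                      for i in range(num_parts)]
        parts = [open(p, "w", encoding="utf-8") for p in part_paths]
        try:
            # Pass 1: route each URL to a partition chosen by a stable hash.
            with open(path, encoding="utf-8") as src:
                for line in src:
                    url = line.strip()
                    if url:
                        idx = zlib.crc32(url.encode("utf-8")) % num_parts
                        parts[idx].write(url + "\n")
        finally:
            for handle in parts:
                handle.close()
        # Pass 2: every copy of a URL is in one partition, so count each alone.
        for part_path in part_paths:
            counts = Counter()
            with open(part_path, encoding="utf-8") as part:
                for line in part:
                    counts[line.rstrip("\n")] += 1
            yield from counts.items()
```

`zlib.crc32` is used instead of the built-in `hash` because string hashing in Python is randomized per process. The same two-pass idea maps directly onto a distributed setup, with partitions sent to different machines.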


Third, system design

1. Forward matching is often used for word segmentation in natural language processing. For example, for the text

The Distant ancient Babylon

the forward matching segmentation result is

The Distant | cuba | Babylon

and the backward matching segmentation result is

Remote | ancient | babylon

Write the forward matching interface and its implementation.
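
A minimal sketch of forward maximum matching: at each position take the longest dictionary word that starts there, falling back to a single character. The translated example above no longer shows the character-level ambiguity, so the sketch assumes the original sentence was 遥远古巴比伦; that string and the small vocabulary are assumptions made for illustration, as is the function name.

```python
def forward_max_match(text, vocab, max_word_len=None):
    """Segment `text` greedily from left to right using the words in `vocab`."""
    if max_word_len is None:
        max_word_len = max((len(w) for w in vocab), default=1)
    max_word_len = max(1, max_word_len)  # always allow single characters
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a word matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)  # an unknown single character is kept
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Assumed vocabulary, for illustration only.
    vocab = {"遥远", "远古", "古巴", "比伦", "巴比伦"}
    print(" | ".join(forward_max_match("遥远古巴比伦", vocab)))
```

With this vocabulary the greedy left-to-right scan picks 遥远, then 古巴, then 比伦, which mirrors the forward result quoted above and shows why forward matching alone can split a phrase incorrectly.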

Summary: this is only an excerpt; there are too many questions to work through them all anyway.
