I have a list of non-everyday English vocabulary, and I need to find which of these words occur most frequently in English articles.
My first thought was to loop over the word list and use substr_count() to count each word's occurrences in turn, but that scans the whole article once per word. Alternatively, I could split the article into words and use the array functions to compute an intersection, but that still doesn't feel ideal.
Any ideas? The application is essentially keyword extraction.
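For reference, a minimal sketch of the substr_count() idea described above; the file name and word list are placeholders, and note that substr_count() matches raw substrings, so it can overcount (e.g. "art" inside "particular"):

$words = array('ubiquitous', 'ephemeral', 'paradigm');   // placeholder word list
$text  = strtolower(file_get_contents('article.txt'));   // placeholder file name

$counts = array();
foreach ($words as $w) {
    // one full scan of the article per word
    $counts[$w] = substr_count($text, strtolower($w));
}
arsort($counts);   // most frequent first
print_r($counts);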
Replies (solutions)
Why would splitting be a problem? Splitting English text into an array of words is very convenient, certainly simpler than Chinese.
I don't really understand your requirement either; for pure frequency counting, array_count_values() is convenient enough.
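For example, a minimal sketch of that idea (splitting on non-letter characters is my own assumption about tokenisation):

$text   = 'The quick brown fox jumps over the lazy dog, and the fox runs on.';
$tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
$freq   = array_count_values($tokens);   // one pass: counts and deduplicates
arsort($freq);
print_r($freq);   // e.g. the => 3, fox => 2, ...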
So you already have a vocabulary list, and now you need to count how many times each stored word occurs in the article.
If so, you can use the trie algorithm (which I have posted before).
Then you only need to scan the article once; of course, you first build the trie from the vocabulary list.
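The Ttrie class itself isn't reproduced in this thread, so here is only a rough, self-contained sketch of the single-scan idea (a hand-rolled trie with made-up function names, ASCII-only, not the posted Ttrie code):

function trie_build(array $words) {
    $root = array();
    foreach ($words as $w) {
        $node = &$root;
        foreach (str_split(strtolower($w)) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = array();
            }
            $node = &$node[$ch];
        }
        $node['#'] = $w;      // end-of-word marker stores the original word
        unset($node);         // break the reference before the next word
    }
    return $root;
}

function trie_count(array $trie, $text) {
    $text   = strtolower($text);
    $len    = strlen($text);
    $counts = array();
    // single left-to-right pass: at each position, walk down the trie
    for ($i = 0; $i < $len; $i++) {
        $node = $trie;
        for ($j = $i; $j < $len && isset($node[$text[$j]]); $j++) {
            $node = $node[$text[$j]];
            if (isset($node['#'])) {
                $w = $node['#'];
                $counts[$w] = isset($counts[$w]) ? $counts[$w] + 1 : 1;
            }
        }
    }
    return $counts;
}

$trie = trie_build(array('fox', 'ox', 'dog'));
print_r(trie_count($trie, 'The quick brown fox jumps over the lazy dog'));
// fox => 1, ox => 1, dog => 1  (substring matches; add word-boundary checks if needed)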
What is the best way to store the vocabulary list: MySQL, JSON, XML, or a plain PHP array?
If the article is about 5KB and the vocabulary list has 1000 words, and I foreach over those 1000 words one by one, matching each against the article, which of these is more efficient and lighter on resources (CPU, RAM):
mysql_query()
json_decode()
simplexml_load_file()
a plain PHP array?
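I can't benchmark it for you, but as a rough sketch of what each option looks like (file names and the XML layout are placeholders), the usual intuition is that a plain PHP array in an included file is parsed once by PHP (and cached by an opcode cache), whereas json_decode() and simplexml_load_file() re-parse the text on every request, and MySQL adds a query round trip; the matching loop afterwards is the same either way:

// Option 1: plain PHP array, e.g. wordlist.php containing  <?php return array('ubiquitous', ...);
$words = include 'wordlist.php';

// Option 2: JSON file, decoded on every request
$words = json_decode(file_get_contents('wordlist.json'), true);

// Option 3: XML file, parsed on every request (assumes <words><word>...</word></words>)
$words = array();
foreach (simplexml_load_file('wordlist.xml')->word as $w) {
    $words[] = (string) $w;
}

// Option 4 (MySQL via the old mysql_query()/mysql_fetch_assoc() API, removed in PHP 7)
// behaves the same once the rows are fetched, plus the cost of the query itself.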
A 5KB article is unlikely to contain 1000 distinct words; is every word in the article on your list?
Even if it were 1000, that is not a large amount, and after removing duplicates it should be much smaller; a single array intersection is enough.
My line of thought: split the article into a word array, then array_count_values() does both the counting and the deduplication.
Then keep only the words that occur often enough (too few occurrences probably don't qualify as meaningful keywords anyway); very few remain, and intersecting those with the existing vocabulary list is enough, as sketched below.
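A minimal sketch of that pipeline, with an arbitrary threshold and placeholder names:

$text     = file_get_contents('article.txt');                // placeholder file name
$wordlist = array('ubiquitous', 'ephemeral', 'paradigm');    // the existing vocabulary list

// 1. split the article into words
$tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

// 2. count occurrences (this also deduplicates)
$freq = array_count_values($tokens);

// 3. drop words that occur too rarely to count as keywords (threshold is arbitrary)
$freq = array_filter($freq, function ($n) { return $n >= 2; });

// 4. keep only words that are on the existing list
$keywords = array_intersect_key($freq, array_flip($wordlist));
arsort($keywords);
print_r($keywords);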
Although the OP is indeed talking about English vocabulary, your algorithm only works for English words, which makes it rather pointless.
What you're saying makes sense.
But I prefer to keep a simple problem simple: since he said English, I thought about it that way; there's no need to spend too much time on the algorithm.
If he had said mixed languages, I probably just wouldn't have replied to this thread, huh?
Quoting Snmr_com's reply on the 4th floor (see above).
I didn't understand the prefix-tree (trie) version, so for now I've chosen to scan the article several times.
A simple example
include 'ttrie.php';

class Wordkey extends Ttrie {
    // callback fired for every matched dictionary word;
    // here the matched text is simply put back unchanged
    function B() {
        $t = array_pop($this->buffer);
        $this->buffer[] = "$t";
    }
}

$p = new Wordkey;
$p->set('Qin Shihuang', 'B');
$p->set('Luoyang', 'B');
$t = $p->match('Qin Shihuang east patrol Luoyang');
echo join('', $t);   // reassemble the matched segments
Output: Qin Shihuang east patrol Luoyang
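Going only by the set()/match()/callback behaviour visible in this snippet (so the details of Ttrie here are my assumption, not its documented API), the same mechanism could in principle count English keywords in a single scan, along these lines:

include 'ttrie.php';

// Assumption: the named callback fires once per matched dictionary word,
// with the matched text on top of $this->buffer, as in the example above.
class WordCounter extends Ttrie {
    public $counts = array();
    function hit() {
        $t = array_pop($this->buffer);
        $this->buffer[] = $t;   // put the matched text back unchanged
        $this->counts[$t] = isset($this->counts[$t]) ? $this->counts[$t] + 1 : 1;
    }
}

$p = new WordCounter;
foreach (array('ubiquitous', 'ephemeral', 'paradigm') as $w) {   // placeholder word list
    $p->set($w, 'hit');
}
$p->match(file_get_contents('article.txt'));   // one scan of the article
arsort($p->counts);
print_r($p->counts);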