Counting how often vocabulary words occur in an article

Source: Internet
Author: User
I have a list of uncommon English words, and I need to find which of them occur most frequently in English articles.
My first idea was to loop over the word array and call substr_count for each word in turn, but that rescans the whole article once per word. Alternatively, the article could be split into words and the array functions used to compute the intersection with the list, but that still does not feel ideal.
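For reference, the first idea above might look like the following minimal sketch. The sample article and the word list are made up for illustration.

```php
<?php
// Naive approach: one substr_count() call, and so one full scan, per word.
// $article and $vocabulary are made-up sample data.
$article = 'The quixotic knight met an ephemeral ghost. The ghost was ephemeral indeed.';
$vocabulary = ['ephemeral', 'quixotic', 'ubiquitous'];

$haystack = strtolower($article);
$counts = [];
foreach ($vocabulary as $word) {
    // substr_count() matches substrings, so a word inside a longer word also counts.
    $counts[$word] = substr_count($haystack, strtolower($word));
}

arsort($counts); // most frequent first
print_r($counts);
```

With 1000 words this means 1000 scans of the article, which is the repetition the question is trying to avoid.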

Does anyone have a better idea? The application is essentially keyword extraction.


Reply to discussion (solution)

What is wrong with splitting? Splitting English text into an array is very convenient, and at least much simpler than Chinese.
I do not really understand your requirement, though. For pure counting, array_count_values is convenient enough.
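A minimal sketch of that suggestion, assuming the article is plain English text and that str_word_count is an acceptable way to split it (the thread does not say which splitting function was meant):

```php
<?php
// Made-up sample text.
$article = 'The cat sat. The hat sat on the mat.';

// Split into lowercase words, then count every distinct word in one call.
$words = str_word_count(strtolower($article), 1);
$frequencies = array_count_values($words);

arsort($frequencies); // most frequent first
print_r($frequencies);
```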

So you already have a word list, and you need to count how many times each word in that list occurs in the article.
If so, you can use the trie algorithm (which I posted).
The article only has to be scanned once, although of course the trie has to be built from the word list first.
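The trie idea can be sketched as follows. This is not the ttrie.php class mentioned in the thread, whose source is not shown here; buildTrie and countWithTrie are hypothetical helpers, and the sketch assumes the vocabulary consists of purely alphabetic, non-empty words.

```php
<?php
/** Build a nested-array trie; the '$' key marks the end of a word. */
function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        $node = &$trie;
        foreach (str_split(strtolower($word)) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        $node['$'] = $word; // terminal marker stores the original word
        unset($node);
    }
    return $trie;
}

/** Walk the text left to right and count whole-word matches. */
function countWithTrie(array $trie, string $text): array
{
    $text = strtolower($text);
    $len = strlen($text);
    $counts = [];
    for ($i = 0; $i < $len; $i++) {
        // Only start matching at the beginning of a word.
        if ($i > 0 && ctype_alpha($text[$i - 1])) {
            continue;
        }
        $node = $trie;
        for ($j = $i; $j < $len && ctype_alpha($text[$j]) && isset($node[$text[$j]]); $j++) {
            $node = $node[$text[$j]];
            // Count only if the vocabulary word and the article word end together.
            $atWordEnd = ($j + 1 === $len) || !ctype_alpha($text[$j + 1]);
            if ($atWordEnd && isset($node['$'])) {
                $counts[$node['$']] = ($counts[$node['$']] ?? 0) + 1;
            }
        }
    }
    return $counts;
}

$trie = buildTrie(['ephemeral', 'quixotic']);
print_r(countWithTrie($trie, 'Ephemeral joys; a quixotic, ephemeral plan. Unephemeral?'));
```

The cost of the scan depends on the length of the article and of the longest vocabulary word, not on how many words the list contains.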

What is the best way to store the word list? MySQL, JSON, XML, or a plain array?

Suppose an article is 5 KB and the word list has 1000 words, and the 1000 words are matched against the article one by one in a foreach. The options for loading the list are:

mysql_query()
json_decode()
simplexml_load_file()
a plain array

Which is more efficient and uses fewer resources (CPU, RAM)?
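Whichever storage is chosen, the word list ends up as a PHP array before matching. As one illustration only, with no measurements behind it, here is a sketch that loads the list from JSON and turns it into a lookup table; the JSON string is inline, made-up data standing in for a file.

```php
<?php
// Hypothetical word list serialized as JSON (inline here instead of a file).
$json = '["ephemeral", "quixotic", "ubiquitous"]';
$vocabulary = json_decode($json, true);

// Flip so the words become keys and membership tests are hash lookups.
$lookup = array_flip($vocabulary);

$article = 'A quixotic plan with ubiquitous flaws and quixotic goals.';
$counts = [];
foreach (str_word_count(strtolower($article), 1) as $word) {
    if (isset($lookup[$word])) {
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
}
print_r($counts);
```

Here the loop runs over the words of the article rather than over the 1000 list entries, so the article is read once.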

A 5 KB article is unlikely to contain 1000 words. Or do you mean all the words in the article?

Even if it is 1000, that is not a large amount. After removing duplicates there should be far fewer, and a single array intersection is enough.

My idea is to split the article into a word array and run array_count_values on it, which does the work of two functions: counting and removing duplicates.
Then keep only the words that occur often enough (words that occur too rarely are not what you are after, right?). Very few remain, and intersecting them with the existing word list is enough.
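The steps in this reply can be sketched as below. The sample data and the $minCount threshold are assumptions for illustration, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Made-up sample data.
$article = 'Ephemeral trends fade. Quixotic ideas and ephemeral fads fade too, yet ideas stay.';
$vocabulary = ['ephemeral', 'quixotic', 'ubiquitous'];
$minCount = 2; // hypothetical threshold for "occurs often enough"

// 1. Split into words and count each distinct word.
$frequencies = array_count_values(str_word_count(strtolower($article), 1));

// 2. Drop words that occur fewer than $minCount times.
$frequent = array_filter($frequencies, fn (int $n): bool => $n >= $minCount);

// 3. Keep only the words that are also in the vocabulary list.
$keywords = array_intersect_key($frequent, array_flip($vocabulary));

arsort($keywords); // most frequent first
print_r($keywords);
```

array_intersect_key is used instead of array_intersect so that the counts, which are stored as values, survive the intersection.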

Although the original poster only mentioned English vocabulary, an algorithm that works only for English words is of little value.


What you say makes sense.
But I think a simple problem should be kept simple. Since he said English, I thought about it in those terms, and there is no need to spend too much time on the algorithm.
If he had said mixed languages, I probably would not have replied to this post at all.

I did not understand the prefix tree (trie) version, so for now I have gone with scanning the article multiple times.

A simple example

include 'ttrie.php';

class Wordkey extends Ttrie {
  // Callback named in the set() calls below.
  function B() {
    $t = array_pop($this->buffer);
    $this->buffer[] = "$t";
  }
}

$p = new Wordkey;
$p->set('Qin Shihuang', 'B');
$p->set('Luoyang', 'B');
$t = $p->match('Qin Shihuang East Patrol Luoyang');
echo join('', $t);

Output:
Qin Shihuang East Patrol Luoyang