A problem I saw on Weibo yesterday, and some solutions

Last Update: 2017-05-14 Source: Internet

Author: User


I saw a problem on Weibo yesterday: among 1 million usernames, find the ones that were created automatically by a machine.
It is essentially a simple anti-spam measure.

Some people suggested searching Google or Baidu for each username to see whether it has left any traces online. Setting aside how unreliable that is, the original poster clearly wants to solve the problem algorithmically rather than through social engineering.

My first idea was to segment each of the 1 million usernames into words, then count how many times each word appears across all of them, that is, its word frequency. Sort the words by frequency in descending order and take the top n. Then find the usernames that contain any of those top-n words; these were probably created by machines.
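The word-frequency idea above can be sketched in Python. This is a minimal sketch under stated assumptions: `segment` is a hypothetical stand-in tokenizer that splits a username into runs of Latin letters, runs of digits, and single CJK characters (a real system would need a proper Chinese word segmenter), and `top_n_words` and `flag_suspects` are illustrative names. The value of `n` still has to be chosen by hand.

```python
import re
from collections import Counter

# Hypothetical stand-in segmenter (assumption): runs of letters, runs of digits,
# and single CJK characters. A real system would use a Chinese word segmenter.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[\u4e00-\u9fff]")


def segment(username):
    """Split a username into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(username)]


def top_n_words(usernames, n):
    """Count every token occurrence across all usernames; return the n most frequent."""
    counts = Counter()
    for name in usernames:
        counts.update(segment(name))
    return [word for word, _ in counts.most_common(n)]


def flag_suspects(usernames, n):
    """Return the usernames that contain at least one of the top-n words."""
    top = set(top_n_words(usernames, n))
    return [name for name in usernames if top & set(segment(name))]
```

For example, `flag_suspects(["tom123", "tomcat", "alice", "tom_x", "bob"], 1)` flags `tom123` and `tom_x`, because `tom` is the most frequent token. Note that nothing here distinguishes a bot's template word from a word that many real people happen to like.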

However, this approach is not rigorous, and a large number of legitimate usernames could be wrongly flagged. Every period has its own trending words, and many people like to use them as part of their usernames; some classic words are also used by a great many people.

So I think this method is of little use unless a person steps in to identify the hot words and exclude them from the top n.
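Excluding manually identified hot words from the top n could look like the following sketch. It assumes word frequencies are already held in a `collections.Counter`, and that `hot_words` is a list maintained by a human reviewer; both the function name and that input are hypothetical.

```python
from collections import Counter


def top_n_excluding(word_counts, n, hot_words):
    """Return the n most frequent words, skipping a manually curated hot-word list.

    word_counts: Counter mapping word -> frequency (e.g. built from segmented usernames).
    hot_words: words a reviewer marked as trending or classic (hypothetical input).
    """
    hot = {w.lower() for w in hot_words}
    result = []
    for word, _ in word_counts.most_common():
        if len(result) >= n:
            break
        if word.lower() in hot:
            continue  # popular with real users, so not treated as a machine signal
        result.append(word)
    return result
```

The reviewer's list only has to cover the handful of words that would otherwise reach the top n, which keeps the manual effort small.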

I'd like to hear your ideas so we can discuss them together. Note: the question concerns only the usernames themselves, not what users post or when they registered.

------ Solution --------------------
1. From past registration experience, machine-created usernames are either combined from the registration information the user submits, or built by adding a prefix.
2. Checking for usernames that share the same prefix is the simplest method.
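Checking for usernames that share a prefix can be sketched by grouping names on their first few characters. The prefix length of 4 and the minimum group size of 3 are arbitrary values picked for illustration, not thresholds from the thread, and `prefix_groups` is a hypothetical helper.

```python
from collections import defaultdict


def prefix_groups(usernames, prefix_len=4, min_size=3):
    """Group usernames by their first prefix_len characters (case-insensitive).

    Returns {prefix: [usernames]} for groups with at least min_size members.
    prefix_len and min_size are illustrative defaults, not tuned thresholds.
    """
    groups = defaultdict(list)
    for name in usernames:
        if len(name) > prefix_len:  # skip names no longer than the prefix itself
            groups[name[:prefix_len].lower()].append(name)
    return {p: names for p, names in groups.items() if len(names) >= min_size}
```

Large groups are candidates for manual review rather than proof: common given names (for example `alic...`) also form groups, so the threshold has to be set well above what ordinary name popularity produces.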

If you have data at hand, you could explore an algorithm. Unfortunately, I don't.
------ Solution --------------------
I'm following this too, haha, though as a beginner I'm not very familiar with the topic.
------ Solution --------------------
Quote:

1. From past registration experience, machine-created usernames are either combined from the registration information the user submits, or built by adding a prefix.
2. Checking for usernames that share the same prefix is the simplest method.

If you have data at hand, you could explore an algorithm. Unfortunately, I don't.

You could try it on the CSDN user database... I also have another 100M+ database at hand....

So far, the pattern that looks reliable is some characters followed by a number, where the number keeps incrementing.
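The "characters plus an ever-increasing number" pattern can be checked by splitting each name into a letter prefix and a numeric suffix, then looking for runs of consecutive numbers under the same prefix. This is a minimal sketch assuming names look like `user1001` (ASCII letters followed by digits); `sequential_runs` and `min_run` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed shape: a letter prefix followed by a decimal number, e.g. "user1001".
NAME_RE = re.compile(r"^([A-Za-z]+)(\d+)$")


def sequential_runs(usernames, min_run=3):
    """Find groups <letters><k>, <letters><k+1>, ... with at least min_run members."""
    by_prefix = defaultdict(dict)  # prefix -> {number: username}
    for name in usernames:
        m = NAME_RE.match(name)
        if m:
            by_prefix[m.group(1).lower()][int(m.group(2))] = name
    runs = []
    for numbered in by_prefix.values():
        nums = sorted(numbered)
        start = 0
        for i in range(1, len(nums) + 1):
            # A run ends at the end of the list or where the next number is not +1.
            if i == len(nums) or nums[i] != nums[i - 1] + 1:
                if i - start >= min_run:
                    runs.append([numbered[k] for k in nums[start:i]])
                start = i
    return runs
```

Because `int()` drops leading zeros, `a007` and `a7` are treated as the same number, and any gap in a sequence (for example a failed registration) splits it into shorter runs.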

------ Solution --------------------
If I were the machine, I wouldn't use simple words or English; I'd use Japanese, Korean, or Malay. Could you really tell them apart, even with such a large database?
So the real safeguard is still a verification code (CAPTCHA).
------ Solution --------------------
There is no solution to this problem using algorithms...

Ci169
Ci1699
Ci16999
Ci169999
Ci1699999

For example, which of the CSDN accounts above could an algorithm identify as machine-registered?
------ Solution --------------------
Why would hot words be considered a sign of a machine???
------ Solution --------------------
Interesting question. Does anyone have free LAMP hosting? I'd upload a copy.

<?php
echo 'tom' . substr(str_shuffle('abcdefghijklmnopqrstuvwxyz'), 0, 4); // "tom" followed by 4 distinct random letters

------ Solution --------------------
Bayesian classification should be the right approach, but how to organize the raw data is a problem.
It is a bit premature to talk about algorithms while so much is still uncertain.
I recommend testing with Weka (a Java data-mining tool).
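As a rough sketch of the Bayesian idea, here is a from-scratch multinomial naive Bayes classifier over character bigrams with Laplace smoothing. It assumes you already have usernames labeled `human` or `machine`, which, as the reply says, is the hard part; the training names below are made up for illustration, and the class and function names are hypothetical. Weka provides ready-made classifiers if you prefer not to write one.

```python
import math
from collections import Counter


def bigrams(name):
    """Character bigrams with start/end markers, e.g. 'ab' -> ['^a', 'ab', 'b$']."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class NaiveBayes:
    """Multinomial naive Bayes over character bigrams with Laplace (add-one) smoothing."""

    def fit(self, names, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # class priors come from these
        self.counts = {c: Counter() for c in self.classes}
        for name, label in zip(names, labels):
            self.counts[label].update(bigrams(name))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, name):
        """Return the class with the highest log posterior for this username."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            for g in bigrams(name):
                score += math.log((self.counts[c][g] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    # Made-up toy training data, for illustration only.
    names = ["alice", "bobby", "charlie", "daniel", "emily",
             "xq7zk", "kz9qx", "qxz8k", "zkq7x", "xkz9q"]
    labels = ["human"] * 5 + ["machine"] * 5
    model = NaiveBayes().fit(names, labels)
    print(model.predict("annie"), model.predict("qzx9k"))  # human machine
```

With real data, the result depends entirely on how representative the labeled sample is; a bot that generates plausible-looking names will simply land in the `human` class.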
------ Solution --------------------
A username registered by a person usually follows some logic so that it is easy to remember, whereas a machine registering accounts automatically has no such need.
I think we could first screen the usernames with a dictionary.
The goal is just to catch as many as possible.
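One way to read the dictionary-screening suggestion is to measure how much of each username can be split into known words. The sketch below uses dynamic programming to find the largest number of characters covered by non-overlapping dictionary words; the tiny word set, the `max_word_len` cap, and the function name are all assumptions for illustration, and any cutoff for "human-like" would have to be chosen from real data.

```python
def dictionary_coverage(name, dictionary, max_word_len=12):
    """Return the fraction of characters in name covered by non-overlapping dictionary words.

    best[i] is the most characters coverable in name[:i]; at each position we either
    leave one character uncovered or end a dictionary word there.
    """
    s = name.lower()
    n = len(s)
    if n == 0:
        return 0.0
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]  # character s[i-1] left uncovered
        for j in range(max(0, i - max_word_len), i):
            if s[j:i] in dictionary:
                best[i] = max(best[i], best[j] + (i - j))
    return best[n] / n
```

For example, with the dictionary `{"sun", "flower", "love", "cat"}`, `SunFlower88` scores 9/11 because only the two digits are uncovered. A low score alone does not prove a name was machine-made.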

In fact, even usernames made of jumbled letters can't be confirmed as machine-registered.
Without auxiliary information such as login behavior and registration intervals, I really think this approach is meaningless.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
