Analysis of a phishing attack against Alexa Top 100 websites


Ladies and gentlemen, we are doing something special today. This article was jointly written by Ethan Dodge (@__eth0) and Brian Warehime. We discuss the anti-phishing measures taken by the Alexa Top 100 domains and, along the way, expose some cases of phishing attacks against those domains.

We used DNStwist, a new domain permutation tool, together with some Python scripts of our own to collect and analyze everything we found. You are welcome to follow our steps and explore the data yourself.

Overview

We take the top 100 domains in the Alexa ranking and use a script to collect the permuted domains and the permutation type (bitsquatting, insertion, omission, replacement, etc.) produced by DNStwist. With this list of domains, we look up each host to obtain its IP address. Finally, we run WHOIS queries and reverse DNS queries on those IP addresses and compare the registration and PTR records against the original domain.

After comparing the data, we can see which permutation type is covered most widely (that is, the original owner has registered the permuted domains, possibly to prevent phishing) and which permutation type is covered the least.

Capture Data

First, we need to obtain the list of the Alexa top one million sites.
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

After unzipping the archive, narrow the list down to the first 100 domains:
cat top-1m.csv | awk -F',' '{print $2}' | head -n 100 > alexatop100.txt
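
The same extraction can be sketched in Python, for readers who prefer to keep the whole pipeline in one language. This is a minimal sketch, assuming the unzipped file keeps the rank,domain layout that the awk command relies on; the function name top_domains is ours and purely illustrative.

```python
import csv
import io
from itertools import islice


def top_domains(csv_file, n=100):
    """Return the first n domains from an Alexa-style 'rank,domain' CSV."""
    reader = csv.reader(csv_file)
    # Column 0 is the rank, column 1 is the domain name.
    return [row[1].strip() for row in islice(reader, n) if len(row) >= 2]


if __name__ == "__main__":
    # Tiny in-memory stand-in for top-1m.csv (illustrative rows only).
    sample = io.StringIO("1,google.com\n2,facebook.com\n3,youtube.com\n")
    for domain in top_domains(sample, n=2):
        print(domain)
```

To run it against the real file, open top-1m.csv with open(..., newline="") and pass the file object instead of the in-memory sample.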

Retrieve domain names

First, we get a list of permutations from DNStwist. We modified the original script to strip out the redundant output; we only need the permutation type and the domain name generated for that type.
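
To give a feel for what such a list of (permutation type, domain) pairs contains, here is a rough Python sketch of four permutation types. It is a simplified illustration and not DNStwist's actual implementation: for example, it inserts and replaces with any letter of the alphabet, which is an assumption made to keep the code short. All function names are ours.

```python
import string

ALPHABET = string.ascii_lowercase
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def omission(name):
    """Drop one character at a time."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def insertion(name):
    """Insert one extra letter between two existing characters."""
    return {name[:i] + c + name[i:] for i in range(1, len(name)) for c in ALPHABET}


def replacement(name):
    """Replace one character with a different letter."""
    return {name[:i] + c + name[i + 1:]
            for i in range(len(name)) for c in ALPHABET if c != name[i]}


def bitsquatting(name):
    """Flip a single bit in one character, keeping only valid hostname characters."""
    variants = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in VALID_CHARS:
                variants.add(name[:i] + flipped + name[i + 1:])
    return variants


def permutations(domain):
    """Return sorted (perm_type, permuted_domain) pairs for a domain like 'google.com'."""
    name, _, tld = domain.partition(".")
    generators = {
        "Bitsquatting": bitsquatting,
        "Insertion": insertion,
        "Omission": omission,
        "Replacement": replacement,
    }
    rows = []
    for perm_type, generate in generators.items():
        for variant in sorted(generate(name)):
            # Skip empty labels, the original name, and labels with edge hyphens.
            if not variant or variant == name or variant[0] == "-" or variant[-1] == "-":
                continue
            rows.append((perm_type, variant + "." + tld))
    return rows


if __name__ == "__main__":
    for perm_type, domain in permutations("google.com")[:5]:
        print(perm_type, domain)
```

Even this reduced set of rules produces several hundred candidates per domain, which is why the later lookup steps take much longer than the generation step.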

The following shows the permutations generated for google.com as an example.


Now that we have our list of domains, we use the following bash command to loop over all of them, running the modified dnstwist on each one and writing the result to a new file.
while read domain; do python dnstwist.py "$domain" > ~/Desktop/alexatop100/"$domain"; done < ~/Desktop/alexatop100.txt
The preceding command takes about five seconds to run. The resulting directory contains one output file per original domain.


Host search

The next step in our plan is to look up the host behind each generated domain. We want a text file containing the permuted domain, the permutation type, and the IP address.

The following bash command goes through the permutation list of each domain and runs a host lookup on every entry, writing the results to a new file for further analysis.
for file in *; do python hostlookup.py "$file"; done
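
The hostlookup.py script itself is not listed here. As a hedged illustration of what such a step could look like, the sketch below resolves each permuted domain with the standard library and keeps only the ones that resolve. The function name lookup_hosts, the resolve parameter, and the (domain, ip, perm_type) output order are our assumptions, chosen to line up with the fields described above.

```python
import socket


def lookup_hosts(rows, resolve=socket.gethostbyname):
    """Resolve (perm_type, domain) pairs and return (domain, ip, perm_type) tuples.

    Domains that do not resolve are skipped. The resolver is injectable so the
    function can be exercised without network access.
    """
    results = []
    for perm_type, domain in rows:
        try:
            ip = resolve(domain)
        except OSError:
            # socket.gaierror is a subclass of OSError: unregistered or dead domain.
            continue
        results.append((domain, ip, perm_type))
    return results


if __name__ == "__main__":
    # Offline demo with a stand-in resolver; 192.0.2.0/24 is a documentation range.
    def fake_resolver(domain):
        if domain == "gogle.com":
            return "192.0.2.10"
        raise socket.gaierror("name not known")

    print(lookup_hosts([("Omission", "gogle.com"), ("Insertion", "gooogle.com")],
                       resolve=fake_resolver))
```

With the default resolver, each failed lookup waits for a DNS timeout, which is consistent with this step taking far longer than generating the permutations.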

The bash loop above takes about 30 minutes to run. Afterwards, the directory contains 100 additional files with the _hostlookup suffix. In each file, we can see the IP address of each permuted domain along with the other collected fields.

Reverse DNS query

Initially, we wanted to use reverse DNS queries to look at the PTR record of each IP address. We later came to believe that WHOIS queries give more complete information, but we kept this step anyway.

Next, we wrote a Python script to retrieve the PTR record for each returned IP address. Its output includes the domain name, IP address, permutation type, PTR record, and a true/false flag that indicates whether the returned hostname matches the original domain.

Run the following bash command to obtain 100 files with the _rdns suffix.
for file in *; do python rdnslookup.py "$file"; done
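
The reverse lookup step can be sketched along the same lines. This is not the authors' rdnslookup.py; it is a minimal sketch, assuming that a match means the PTR hostname is the original domain or one of its subdomains. That matching rule, the function name rdns_row, and the resolve parameter are all our assumptions.

```python
import socket


def rdns_row(domain, ip, perm_type, original, resolve=socket.gethostbyaddr):
    """Return (domain, ip, perm_type, hostname, is_match) for one permuted domain.

    hostname is the PTR name for ip, or "" when no PTR record exists.
    is_match is True when hostname equals the original domain or is a subdomain of it.
    """
    try:
        hostname = resolve(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = ""
    name = hostname.lower().rstrip(".")
    target = original.lower()
    is_match = name == target or name.endswith("." + target)
    return (domain, ip, perm_type, hostname, is_match)


if __name__ == "__main__":
    # Offline demo with a stand-in resolver and a documentation IP address.
    def fake_ptr(ip):
        return ("server-1.example.com", [], [ip])

    print(rdns_row("exampel.com", "192.0.2.1", "Replacement", "example.com",
                   resolve=fake_ptr))
```

A strict suffix rule like this one misses owners that host everything under a differently named domain, which is one way false negatives can creep into the results.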


In each file, we can see the PTR record and the true/false match result.


WHOIS Query

The WHOIS queries build on the data obtained during the host lookup.

In this step, we want to capture the description field in the WHOIS information. With both the WHOIS and the reverse DNS results, we can check whether the IP address behind each permuted domain belongs to the owner of the original domain.
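
As a rough illustration of the matching idea, the sketch below pulls an owner description out of raw WHOIS text and compares it with the original domain. The field names it looks for (descr and orgname), the substring comparison, and the sample record are all assumptions made for the example; real WHOIS output varies by registry, and the authors' own script may work differently.

```python
def whois_owner(whois_text, keys=("descr", "orgname")):
    """Return the first description-like value in WHOIS text, or "" if none is found."""
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in keys and value.strip():
            return value.strip()
    return ""


def owner_matches(owner, original_domain):
    """Assumed rule: the first label of the original domain appears in the owner text."""
    label = original_domain.split(".")[0].lower()
    return label in owner.lower()


if __name__ == "__main__":
    # Invented sample record using a documentation address range.
    sample = (
        "inetnum:  192.0.2.0 - 192.0.2.255\n"
        "descr:    Example Corp Hosting\n"
        "country:  ZZ\n"
    )
    owner = whois_owner(sample)
    print(owner, owner_matches(owner, "example.com"))
```

A substring rule of this kind is deliberately crude: it will miss owners whose registered name differs from their domain label.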

In the end, we have two datasets: one from the WHOIS queries and one from the reverse DNS queries. We use this data to compute statistics that answer the questions raised earlier in the article. Before that, we need to import the data into Splunk.

Splunk Setup

To enable Splunk to recognize these fields, we configure props.conf in the /opt/splunk/etc/system/local/ directory.
[phishing]
REPORT-phishing = REPORT-phishing
[whois]
REPORT-whois = REPORT-whois

Edit the transforms.conf file in /opt/splunk/etc/system/local/ as well.
[REPORT-phishing]
DELIMS = ""
FIELDS = "domain", "ip", "perm_type", "hostname", "is_match"
[REPORT-whois]
DELIMS = ""
FIELDS = "domain", "ip", "perm_type", "owner", "is_match"

With these settings, Splunk extracts the named fields from each record, which makes the analysis that follows much easier.
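
Conceptually, a delimiter-based extraction splits each record on the delimiter and assigns the pieces to the field names in order. The Python sketch below mimics that behavior for illustration only. The comma delimiter and the sample record are assumptions, not taken from the configuration above.

```python
PHISHING_FIELDS = ("domain", "ip", "perm_type", "hostname", "is_match")
WHOIS_FIELDS = ("domain", "ip", "perm_type", "owner", "is_match")


def extract_fields(line, fields, delim=","):
    """Split one record on delim and pair the values with field names by position."""
    values = [value.strip() for value in line.split(delim)]
    return dict(zip(fields, values))


if __name__ == "__main__":
    # Invented sample record; the comma delimiter is an assumption.
    record = "gogle.com,192.0.2.10,Omission,host.example.net,False"
    print(extract_fields(record, PHISHING_FIELDS))
```

The two field lists differ only in the fourth position: hostname for the reverse DNS data and owner for the WHOIS data.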



WHOIS Analysis

Let's get to the point and start with the WHOIS data.

The following search lists the most common permutation types.
sourcetype=whois | top perm_type


So, how many of the permuted domains are registered by the original domain owner?
sourcetype=whois is_match=true | stats count

By our count, which is not fully complete, 460 of the permuted domains are registered by the original domain owner.

Now let's look at the ranking of permutation types that original domain owners register most often.


Insertion-type domains are very popular. The following search shows which original domains own the most of them.
sourcetype=whois is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain
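
For readers less familiar with Splunk, the rex step pulls the original domain out of the source file path (everything after /tmp/ up to the first underscore), and top then counts how often each value appears. A rough Python equivalent is sketched below; the sample paths and the _whois file suffix are assumptions made for illustration.

```python
import re
from collections import Counter

# Same idea as the rex pattern: capture everything after /tmp/ up to the first "_".
SOURCE_PATTERN = re.compile(r"/tmp/(?P<original_domain>[^_]+)")


def original_domain(source):
    """Return the original domain encoded in a source path, or None if absent."""
    match = SOURCE_PATTERN.search(source)
    return match.group("original_domain") if match else None


def top_original_domains(sources, limit=10):
    """Count original domains across source paths, most frequent first."""
    counts = Counter(d for d in map(original_domain, sources) if d)
    return counts.most_common(limit)


if __name__ == "__main__":
    # Invented sample paths; the _whois suffix is an assumption.
    sources = ["/tmp/amazon.com_whois", "/tmp/amazon.com_whois", "/tmp/yahoo.com_whois"]
    print(top_original_domains(sources))
```

Unlike Splunk's top command, this sketch reports raw counts only and does not compute percentages.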


Now let's ignore the permutation type and look at the overall ranking.


Amazon seems to care the most about protecting its users.

Reverse DNS Analysis

Now we turn to the reverse DNS data, starting again with the most common permutation types.
sourcetype=phishing | top perm_type


How many of the permuted domains are registered by the original domain owner?
sourcetype=phishing is_match=true | stats count

By our count, which is not fully complete, 381 of the permuted domains are registered by the original domain owner.

Now let's look at the ranking of permutation types that original domain owners register most often.


Both data sources indicate that insertion-type domains are very popular.
sourcetype=phishing is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain


Ignoring the permutation type:
sourcetype=phishing is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain


Consistent with the conclusions drawn from the WHOIS data, Amazon has made great efforts to defend against phishing domains.

DDoS Defense site

Of course, we know that these statistics are not completely accurate. For example, the wikipedia.com domain is owned by Wikimedia, but our comparison still records it as a non-matching site.

We also found that a large number of records point to prolexic.com, which is a DDoS defense service. We doubt that phishing domains would use such an anti-DDoS service, because the cost of handling high traffic is more than the average person can afford. On that basis, we count the sites pointing to prolexic.com as legitimate sites.

Let's re-run some of the initial searches.

First, how many permuted domains have been registered for protection?
sourcetype=phishing | eval ddos=if(searchmatch("hostname=*prolexic*"), "True", "False") | search ddos="True" OR is_match="True" | stats count
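
The logic of this search is simple: a record counts as protected if its hostname mentions prolexic or if it already matched the original owner. A minimal Python sketch of the same filter follows, assuming each record is a dictionary of string fields; the sample records are invented for illustration.

```python
def is_protected(record):
    """True if the record points at the DDoS defense service or matched the owner."""
    points_to_ddos = "prolexic" in record.get("hostname", "").lower()
    owner_match = str(record.get("is_match", "")).lower() == "true"
    return points_to_ddos or owner_match


def count_protected(records):
    """Count the records that pass the combined filter."""
    return sum(1 for record in records if is_protected(record))


if __name__ == "__main__":
    # Invented sample records, not taken from the collected data.
    records = [
        {"domain": "a.example", "hostname": "unknown.prolexic.com", "is_match": "False"},
        {"domain": "b.example", "hostname": "www.b.example", "is_match": "True"},
        {"domain": "c.example", "hostname": "parked.example.net", "is_match": "False"},
    ]
    print(count_protected(records))
```

Because the two conditions are combined with OR, a record that satisfies both is still counted only once.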

The result is 808 instead of the previous 381, which is a big change.

Permutation type ranking:


Which domain is the most diligent?


Summary

Finally, here are the most interesting results of our research.

The most common permutation types are replacement and insertion (for example, netflox.com and netfliix.com). We also found that most companies point their permuted domains at DDoS defense sites (an interesting point, though not a surprising one). Finally, amazon.com, booking.com, and yahoo.com are very diligent on behalf of their users, evidently guarding against the mistakes users make when typing URLs.

We salute amazon.com, booking.com, and yahoo.com for their efforts to defend against phishing attacks!
