Analysis of a phishing attack against Alexa Top 100 websites
Ladies and gentlemen, we are doing something special today. This article is written jointly by Ethan Dodge (@__eth0) and Brian Warehime. We discuss the anti-phishing measures taken for the Alexa Top 100 domains and, along the way, expose some cases of phishing aimed at those domains.
We used a new DNS tool, dnstwist, together with some Python scripts of our own to collect and analyze everything we found. You are welcome to follow our approach and explore the data yourself.
Overview
We take the top 100 domains in the Alexa ranking and use a script to collect the permuted domains, along with their permutation type (bitsquatting, insertion, omission, replacement, etc.), as generated by dnstwist. With the domains in this list, we run a host lookup on each one to obtain its IP address. Finally, we run WHOIS and reverse DNS lookups on the IP addresses to compare their registration and PTR records.
Comparing the data shows which permutation type is covered most widely (that is, where the original domain owner has registered the permuted domain, possibly to prevent phishing) and which type is covered least.
Capture Data
First, we need Alexa's list of the top one million sites.
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
Then we unzip the archive and narrow the list down to the top 100.
unzip top-1m.csv.zip
cat top-1m.csv | awk -F',' '{print $2}' | head -n 100 > alexatop100.txt
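For readers who prefer to stay in Python, the same extraction can be sketched as follows. This is only an illustration, not a script from the article; the function name `top_domains` is hypothetical, and it assumes the CSV has the `rank,domain` layout that the awk command above implies.

```python
import csv
import io

def top_domains(csv_text, limit=100):
    """Return the first `limit` domains from 'rank,domain' CSV text."""
    domains = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(domains) >= limit:
            break
        if len(row) < 2:
            continue  # skip blank or malformed lines
        domains.append(row[1].strip())
    return domains

# Small inline sample standing in for top-1m.csv.
sample = "1,google.com\n2,facebook.com\n3,youtube.com\n"
print(top_domains(sample, limit=2))
```

In practice the text would come from reading top-1m.csv, and the result would be written out one domain per line.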
Retrieve domain names
First, we get a list of permutations from dnstwist. We modified the original script to strip the redundant output, since we only need the permutation type and the domain generated for that type.
The following shows the permutations generated for google.com as an example.
Now that we have our domain list, we use the following bash command to loop over every domain, run the modified dnstwist on it, and write the result to a new file.
while read domain; do python dnstwist.py $domain > ~/Desktop/alexatop100/$domain; done < ~/Desktop/alexatop100.txt
The command takes about five seconds to run. The resulting directory holds one output file per domain.
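To give a feel for what a permutation list contains, here is a minimal sketch that generates three of the permutation types mentioned earlier. It is not dnstwist's implementation, whose own logic differs and covers more types; the function name `simple_permutations` is hypothetical, and trying every lowercase letter is a simplifying assumption.

```python
import string

def simple_permutations(domain):
    """Return (perm_type, permuted_domain) pairs for three simple typo classes."""
    name, dot, tld = domain.partition(".")
    suffix = dot + tld
    results = []
    seen = set()

    def add(perm_type, candidate):
        pair = (perm_type, candidate + suffix)
        # Skip empty labels, the original name, and duplicates.
        if candidate and candidate != name and pair not in seen:
            seen.add(pair)
            results.append(pair)

    for i in range(len(name)):
        # Omission: drop the character at position i.
        add("Omission", name[:i] + name[i + 1:])
        for c in string.ascii_lowercase:
            # Replacement: swap the character at position i for another letter.
            add("Replacement", name[:i] + c + name[i + 1:])
            # Insertion: add one letter in front of position i.
            add("Insertion", name[:i] + c + name[i:])
    return results

perms = simple_permutations("netflix.com")
print(len(perms), perms[:3])
```

Each pair corresponds to one line of the per-domain output file: a permutation type and the domain produced for it.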
Host Lookup
The next step in our plan is to look up the host for every domain in the generated lists. We want a text file containing the permuted domain, the permutation type, and the IP address.
The following bash command walks through the permutation file for each domain, runs the host lookups, and writes the results to a new file for further analysis.
for file in *; do python hostlookup.py $file; done
The command takes about 30 minutes to run, after which the directory is populated with the results.
Note that the directory now contains 100 additional _hostlookup files. Each file lists the IP address of every permuted domain that resolved, along with its related details.
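The article does not show hostlookup.py, so the following is only a sketch of what such a lookup step might look like. The function name `lookup_hosts` and the row layout are assumptions; the only real API used is `socket.gethostbyname` from the standard library.

```python
import socket

def lookup_hosts(permutations, resolve=socket.gethostbyname):
    """Resolve each (perm_type, domain) pair; unresolved domains are skipped."""
    rows = []
    for perm_type, domain in permutations:
        try:
            ip = resolve(domain)
        except OSError:
            continue  # the domain is unregistered or does not resolve
        rows.append((domain, ip, perm_type))
    return rows

# Demo with a stub resolver so the example runs without network access.
# The address comes from the 192.0.2.0/24 documentation range.
known = {"gogle.com": "192.0.2.10"}

def stub_resolve(domain):
    if domain not in known:
        raise socket.gaierror("name does not resolve")
    return known[domain]

pairs = [("Omission", "gogle.com"), ("Insertion", "googgle.com")]
print(lookup_hosts(pairs, resolve=stub_resolve))
```

Leaving out the `resolve` argument makes the function use real DNS, which is what accounts for most of the running time on a full list.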
Reverse DNS Lookup
At first, we planned to run reverse DNS lookups to see the PTR record for each IP address. We later concluded that WHOIS lookups would give more complete information, but we carried out this step anyway.
Next, we wrote a Python script to retrieve the PTR record for each entry returned by the host lookup. Its output includes the domain, the IP address, the permutation type, the PTR record, and a true/false flag indicating whether the returned hostname matches the original domain.
Running the following bash command produces 100 files suffixed with _rdns.
for file in *; do python rdnslookup.py $file; done
Each file contains the PTR records and the true/false match results.
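As a rough illustration of this step, the sketch below resolves a PTR record and derives the true/false flag. rdnslookup.py itself is not shown in the article, so the names `reverse_lookup` and `is_match` are hypothetical, and the matching rule (the hostname equals the original domain or is a subdomain of it) is an assumption.

```python
import socket

def reverse_lookup(ip, resolve=socket.gethostbyaddr):
    """Return the PTR hostname for an IP address, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolve(ip)
    except OSError:
        return None
    return hostname

def is_match(hostname, original_domain):
    """True when the hostname is the original domain or one of its subdomains."""
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    original = original_domain.lower()
    return hostname == original or hostname.endswith("." + original)

# Stub resolver mimicking the (hostname, aliases, addresses) return shape.
def stub_ptr(ip):
    if ip == "192.0.2.10":
        return ("edge.amazon.com", [], [ip])
    raise socket.herror("no PTR record")

host = reverse_lookup("192.0.2.10", resolve=stub_ptr)
print(host, is_match(host, "amazon.com"))
```

A rule like this misses owners whose infrastructure sits under a different name, which is one reason PTR data alone undercounts defensive registrations.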
WHOIS Query
The WHOIS lookup builds on the data obtained during the host lookup.
Here, we want to capture the description field of the WHOIS record. With both the WHOIS and reverse DNS lookups done, we can match the IP address of each permuted domain against the owner of the original domain.
We end up with two result sets, one from the WHOIS lookups and one from the reverse DNS lookups. We use this data to compile the statistics that answer the questions raised at the start of the article. Before that, we need to import the data into Splunk.
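The sketch below shows one way a description field could be pulled out of a raw WHOIS response and compared with the original domain. The article does not include the WHOIS script, so both function names are hypothetical, the `descr:` key is an assumption about the record format, and the substring comparison is an assumed matching rule. The sample record is made up for illustration.

```python
def whois_descriptions(whois_text):
    """Collect the values of 'descr:' lines from a raw WHOIS response."""
    values = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "descr" and value.strip():
            values.append(value.strip())
    return values

def owner_matches(whois_text, original_domain):
    """True when the first label of the domain appears in any description."""
    label = original_domain.split(".")[0].lower()
    return any(label in d.lower() for d in whois_descriptions(whois_text))

# Hypothetical WHOIS response, using a documentation address range.
sample_whois = (
    "inetnum: 192.0.2.0 - 192.0.2.255\n"
    "descr: Example Networks Ltd\n"
    "country: US\n"
)
print(whois_descriptions(sample_whois))
print(owner_matches(sample_whois, "example.com"))
```

A rule like this is crude: an owner registered under a different organization name will be recorded as a non-match.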
Splunk Setup
To have Splunk recognize these fields, we configure props.conf in the /opt/splunk/etc/system/local/ directory.
[phishing]
REPORT-phishing = REPORT-phishing
[whois]
REPORT-whois = REPORT-whois
Then we edit the transforms.conf file in /opt/splunk/etc/system/local/.
[REPORT-phishing]
DELIMS = ""
FIELDS = "domain", "ip", "perm_type", "hostname", "is_match"
[REPORT-whois]
DELIMS = ""
FIELDS = "domain", "ip", "perm_type", "owner", "is_match"
With this in place, Splunk extracts the fields we need for the analysis.
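Conceptually, a delimiter-based extraction splits each event on the delimiter and assigns the pieces to the listed field names in order. The Python sketch below mimics that idea for the phishing sourcetype. It is not Splunk code, the function name `extract_fields` is hypothetical, and the comma delimiter is purely an assumption, since the delimiter value is not legible in the configuration above.

```python
PHISHING_FIELDS = ["domain", "ip", "perm_type", "hostname", "is_match"]

def extract_fields(line, fields=PHISHING_FIELDS, delim=","):
    """Split one event on `delim` and pair the values with field names."""
    values = [value.strip() for value in line.split(delim)]
    return dict(zip(fields, values))

# Hypothetical event line; the comma delimiter is an assumption.
event = "gogle.com,192.0.2.10,Omission,host.example.net,False"
print(extract_fields(event))
```

Once the fields are extracted this way, searches can filter and aggregate on names such as perm_type and is_match.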
WHOIS Analysis
Let's get to the point and start with the WHOIS data.
The following search lists the most common permutation types.
sourcetype=whois | top perm_type
So how many permuted domains are registered by the original domain owner?
sourcetype=whois is_match=true | stats count
Across all domains, by our count, which is not exhaustive, 460 permuted domains are registered by the original domain owner.
Now let's look at which permutation types the original domain owners favor.
Insertion-type domains are very popular.
sourcetype=whois is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain
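The rex step pulls the original domain out of each event's source path by capturing everything between /tmp/ and the first underscore, and top then ranks the captured values. A Python sketch of the same idea follows. The function name `top_original_domains` and the sample file names are hypothetical.

```python
import re
from collections import Counter

# Capture everything after /tmp/ up to the first underscore.
SOURCE_RE = re.compile(r"/tmp/(?P<original_domain>[^_]+)")

def top_original_domains(sources, limit=10):
    """Rank original domains by how many source paths mention them."""
    counts = Counter()
    for source in sources:
        match = SOURCE_RE.search(source)
        if match:
            counts[match.group("original_domain")] += 1
    return counts.most_common(limit)

# Hypothetical source paths, one per matching event.
sources = [
    "/tmp/amazon.com_whois",
    "/tmp/amazon.com_whois",
    "/tmp/yahoo.com_whois",
]
print(top_original_domains(sources))
```

Because the capture stops at the first underscore, the approach relies on the domain part of the file name containing no underscore.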
Now let's ignore the permutation type and look across all matches.
Amazon seems to be the most concerned about protecting users.
Reverse DNS Analysis
Now let's run the same analysis on the reverse DNS data, starting with the most common permutation types.
sourcetype=phishing | top perm_type
How many permuted domains are registered by the original domain owner?
sourcetype=phishing is_match=true | stats count
Across all domains, by our count, which is not exhaustive, 381 permuted domains are registered by the original domain owner.
Now let's look at which permutation types the original domain owners favor.
Both data sources indicate that insertion-type domains are very popular.
sourcetype=phishing is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain
Ignoring the permutation type:
sourcetype=phishing is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain
As with the conclusions drawn from the WHOIS data, Amazon has made a considerable effort to defend against phishing domains.
DDoS Protection Sites
Of course, we know these statistics are not entirely accurate. For example, the wikipedia.com domain is owned by Wikimedia, yet we still record it as a non-match.
We also found that a large number of domain records point to prolexic.com, a DDoS protection site. We doubt that a phishing domain would use this anti-DDoS service, because the cost of such a high-traffic service is more than the average person can afford. On that basis, we count sites pointing to prolexic.com as legitimate.
Let's re-run some of the initial searches.
First, how many permuted domains have been registered for protection?
sourcetype=phishing | eval ddos=if(searchmatch("hostname=*prolexic*"),"True","False") | search ddos="True" OR is_match="True" | stats count
The result is 808 instead of the previous 381, which is a big change.
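The logic of that search, counting an event when its hostname mentions prolexic or when it is already flagged as a match, can be sketched in Python as follows. The function name `count_protected` and the sample records are hypothetical, and comparing values case-insensitively is an assumption about how the data is written.

```python
def count_protected(records):
    """Count records that are owner matches or point at a prolexic hostname."""
    total = 0
    for record in records:
        hostname = (record.get("hostname") or "").lower()
        flag = (record.get("is_match") or "").lower()
        if "prolexic" in hostname or flag == "true":
            total += 1
    return total

# Hypothetical reverse DNS records using the field names configured earlier.
records = [
    {"domain": "amazom.com", "hostname": "edge.amazon.com", "is_match": "True"},
    {"domain": "yaho.com", "hostname": "unknown.prolexic.com", "is_match": "False"},
    {"domain": "gogle.com", "hostname": "host.example.net", "is_match": "False"},
]
print(count_protected(records))
```

Only the first two sample records are counted: one is an owner match and the other points at a prolexic hostname.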
Permutation type ranking:
Which domains are the most conscientious?
Summary
Finally, here are the most interesting findings.
The most common permutation types are replacement and insertion (for example, netflox.com and netfliix.com). We also found that most companies point their permuted domains at DDoS protection sites (an interesting detail, though not a surprising one). Finally, amazon.com, booking.com, and yahoo.com are the most conscientious toward their users, guarding against the mistakes people make when typing URLs.
For their efforts to defend against phishing attacks, we salute amazon.com, booking.com, and yahoo.com!