Construction of social engineering data search engine

Last Update:2014-10-04 Source: Internet

Author: User

Tags sqlite database

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Author: le4f

Http://le4f.net/post/post/build-social-engineer-evildata-search-engine

How to design and build a social engineering database

From the design of a social engineering database at the beginning to the present beta, it took a whole month before and after the beginning, and all the records are recorded by Lin Zong, leaving them to the ones who need them.

The format of the leaked database is not the same, and it looks amazing. Because the user name, password, mailbox, and hash information need to be stored in the database by table, data sorting is a very time-consuming step.

Overall Architecture

Debian x64 + MySQL + mysqlcft + sphft + nginx + PHP

Simple SQLite database, but the query speed and space consumption are too high, resulting in poor processing efficiency for SQLite requests

Mangodb, but mangodb has better advantages in distributed computing, but it is not obvious in the construction of social engineering libraries.

In MySQL and PostgreSQL, select MySql

To facilitate the migration of the social engineering database, a VM is used to create a virtual machine. All databases are in the virtual machine. The lnmp environment only provides the PHP query interface.

In actual testing, the centos6.4 virtual machine loses too much data due to power failure and is not as stable as expected. In addition, the mmseg Chinese Word Segmentation compilation of coreseek versions may encounter various problems, which are difficult to solve. The 32-bit operating system has a limit on the size of a single index in sphbench. After several tossing times, it still moved back to debian7 x64.

Sort data

Generally, bash scripts are used. Demo:

Common Format:

#!/bin/bashsed ‘/^[ \t]\{0,\}$/d‘ $1 | sed ‘s/[\t ]/\t/g‘ | sed ‘s/\r//g‘ |sed ‘/^\s*$/d‘ | awk -F"\t" ‘{print "NULL|||"$1"|||"$2"||||||"$1"|||xxxx|||" }‘ | iconv -c -f gb18030 -t UTF-8 | sort | uniq he

Ucmember format:

#!/bin/bashsed ‘s/[\r\n ]//g‘ $1 | tr -d ‘\n‘ | sed "s/INSERTINTO\`bbs\_uc\_members\`VALUES(//g" | sed "s/‘,‘/\t/g" | sed "s/,‘/\t/g" | sed "s/‘,/\t/g" |sed "s/‘);/\n/g" | awk -F"\t" ‘{ print "NULL|||"$2"||||||"$3"|||"$9"|||"$4"|||"$7"|||xxxx+xxxx+evil++"}‘ | sed "/+evil++||||||||||||/d"

Pw_member format:

#!/bin/bashsed ‘s/[\r\n ]//g‘ $1 | tr -d ‘\n‘ | sed "s/INSERTINTO\`pw_members\`VALUES(‘/\n/g" | sed "s/‘,‘/\t/g" | awk -F"\t" ‘{ print $1"|||"$2"||||||"$3"|||"$5"|||xxxx|||"}‘ | sed "/+evil++||||||||||||/d"

Processing scripts in different file formats:

#!/bin/bashsed ‘s/\r//g‘ $1 | sed "s/‘/[fenhao]/g" | sed ‘s/[ \t]\{1,\}/\t/g‘ | sed ‘s/----/\t/g‘ | awk -F "\t" ‘{ if ($2~/@163.com/){if ($3~/^$/) {print "NULL|||"$1"|||"$2$3"||||||"}else {print "NULL|||"$1"|||"$3"||||||"}}else{print "NULL|||"$1"|||"$2"||||||"}}‘ | iconv -c -f gb18030 -t UTF-8 | sort | uniq

Batch import by line:

#!/bin/bashsed ‘/^[ \t]\{0,\}$/d‘ $1 | sed ‘s/\r//g‘ | sed ‘/^\s*$/d‘ | awk ‘{print "NULL|||"$0"|||xxxx|||" }‘

It is also important for database design. After the test, the number of records stored by table and single table cannot exceed 3000. Otherwise, it takes too long to create a full-text index.

Database and Table Design

CREATE TABLE `evil_xxxx_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`username` VARCHAR( 64 ) ,`password` VARCHAR( 64 ) ,`passhash` VARCHAR( 64 ) ,`email` VARCHAR( 64 ));CREATE TABLE `evil_ucmember_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`username` VARCHAR( 64 ) ,`password` VARCHAR( 64 ) ,`passhash` VARCHAR( 64 ) ,`salt` VARCHAR( 16 ) ,`email` VARCHAR( 64 ) ,`ip` VARCHAR( 32 ) ,`site` VARCHAR( 64 ) ,PRIMARY KEY (id));CREATE TABLE `evil_multinfo_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`info` VARCHAR( 1024 ) ,`site` VARCHAR( 64 ) ,PRIMARY KEY (id));

Table sharding solution:

INSERT INTO evil_xx_1(evil_xx.id,evil_xx.username,evil_xx.password,evil_xx.passhash) SELECT (evil_xx.id,evil_xx.username,evil_xx.password,evil_xx.passhash)FROM evil_xx where user.id <= 30000000;

Command Query

Show table status; // view the database status DESC table; // view the table structure set names 'utf8'; // set the encoding, data table unified encoding UTF-8drop index index_name on table_name; // Delete the index alter table table_name add primary key (ID); // Add the primary key alter table table_name drop primary key; // remove the primary key truncate table table_name; // clear the alter table table name Rename to the new table name; // rename the alter table table_name change name varchar (16 ); // rename the column and format alter table table_name add column XXXX varchar (64) after AAAA; // Add the alter table Table Name drop column name; // Delete the column set @ T = 0; update table_name Set ID = (@ T: [email protected] + 1); // recreate the ID

Import Data

load data infile ‘/tmp/xxxx.sql‘ into table `evil_xxxx` fields terminated by ‘|||‘lines terminated by ‘\n‘;

Relatively Fast

About Indexes

Full-text Chinese index mainly depends on mysqlctf and sphsf-test. It is found that the sphsf-word segmentation index is faster, but the word segmentation causes such as searching humor but not searching humorous, however, the index speed of mysqlctf is relatively slow and there is no progress information, but the index size is controllable. Generally, the full-text index 1g Database Information corresponds to the myi size of 5-9G

Many official documents on coreseek compilation and usage

Sphsf-configuration file (full-text search includes five letters for Word Segmentation and minimum Word Segmentation)

Source evil_xxx {type = MySQL SQL _host = localhost SQL _user = OOO SQL _pass = xxx SQL _db = dB SQL _port = 3306 SQL _query_pre = set names utf8 SQL _query = select ID, username, password, passhash, email from XXXX SQL _query_info = select * from XXXX where id = $ id} index evil_xxx {source = evil_xxx Path =/usr/local/coreseek/var/data/evil_xxx docinfo = extern mlock = 0 morphology = none min_word_len = 1 html_strip = 0 charset_dictpath =/usr/local/mmseg3/etc/charset_type = zh_cn.utf-8 ngram_len = 0 min_infix_len = 5 enable_star = 1 // allow fuzzy search}

Call the official API for retrieval.

How to create an index using mysqlcft

First, install the plug-in

wget http://mysqlcft.googlecode.com/files/mysqlcft-1.0.0-x86_64-bin.tar.gztar zxvf mysqlcft-1.0.0-x86_64-bin.tar.gzmkdir -p /usr/local/mysqlcft/lib/mysql/plugin/cp mysqlcft.so /usr/local/mysqlcft/lib/mysql/plugin/

After logging on to MySQL

INSTALL PLUGIN mysqlcft SONAME ‘mysqlcft.so‘;SHOW PLUGINS;

Create an index after installing OK

Alter ignore table database name. Table name Add Fulltext index full-text joint index name (field name 1, field name 2) with parser mysqlcft;

For example

ALTER IGNORE TABLE db.evil_xxxx ADD FULLTEXT INDEX fullindex(username，password,passhash,email) WITH PARSER mysqlcft;

When searching

SELECT * FROM evil_xxxx WHERE MATCH(username,password,passhash,email) AGAINST (‘searchinfo‘ IN BOOLEAN MODE);

Re-Indexing

Repair table database name. Table Name quick;

Several tips

Add source database/website fields to facilitate Query
The format of separators in the same database is different. Regular Expression replacement
The database engine selects MyISAM (instead of InnoDB) and can use alter conversion.
To avoid overlap with password characters, you can use separators such as = AAAA = to replace
Some mailboxes in the database are in front of the password, some are in front of the password, and some are in front of the mailbox, use awk to write several more lines of branch statements
Configure the my. CNF file to maximize Mysql Data processing efficiency

From le4f 'blog

Construction of social engineering data search engine

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More