Author: le4f
Http://le4f.net/post/post/build-social-engineer-evildata-search-engine
How to design and build a social engineering database
From the design of a social engineering database at the beginning to the present beta, it took a whole month before and after the beginning, and all the records are recorded by Lin Zong, leaving them to the ones who need them.
The format of the leaked database is not the same, and it looks amazing. Because the user name, password, mailbox, and hash information need to be stored in the database by table, data sorting is a very time-consuming step.
Overall Architecture
Debian x64 + MySQL + mysqlcft + sphft + nginx + PHP
Simple SQLite database, but the query speed and space consumption are too high, resulting in poor processing efficiency for SQLite requests
Mangodb, but mangodb has better advantages in distributed computing, but it is not obvious in the construction of social engineering libraries.
In MySQL and PostgreSQL, select MySql
To facilitate the migration of the social engineering database, a VM is used to create a virtual machine. All databases are in the virtual machine. The lnmp environment only provides the PHP query interface.
In actual testing, the centos6.4 virtual machine loses too much data due to power failure and is not as stable as expected. In addition, the mmseg Chinese Word Segmentation compilation of coreseek versions may encounter various problems, which are difficult to solve. The 32-bit operating system has a limit on the size of a single index in sphbench. After several tossing times, it still moved back to debian7 x64.
Sort data
Generally, bash scripts are used. Demo:
Common Format:
#!/bin/bashsed ‘/^[ \t]\{0,\}$/d‘ $1 | sed ‘s/[\t ]/\t/g‘ | sed ‘s/\r//g‘ |sed ‘/^\s*$/d‘ | awk -F"\t" ‘{print "NULL|||"$1"|||"$2"||||||"$1"|||xxxx|||" }‘ | iconv -c -f gb18030 -t UTF-8 | sort | uniq he
Ucmember format:
#!/bin/bashsed ‘s/[\r\n ]//g‘ $1 | tr -d ‘\n‘ | sed "s/INSERTINTO\`bbs\_uc\_members\`VALUES(//g" | sed "s/‘,‘/\t/g" | sed "s/,‘/\t/g" | sed "s/‘,/\t/g" |sed "s/‘);/\n/g" | awk -F"\t" ‘{ print "NULL|||"$2"||||||"$3"|||"$9"|||"$4"|||"$7"|||xxxx+xxxx+evil++"}‘ | sed "/+evil++||||||||||||/d"
Pw_member format:
#!/bin/bashsed ‘s/[\r\n ]//g‘ $1 | tr -d ‘\n‘ | sed "s/INSERTINTO\`pw_members\`VALUES(‘/\n/g" | sed "s/‘,‘/\t/g" | awk -F"\t" ‘{ print $1"|||"$2"||||||"$3"|||"$5"|||xxxx|||"}‘ | sed "/+evil++||||||||||||/d"
Processing scripts in different file formats:
#!/bin/bashsed ‘s/\r//g‘ $1 | sed "s/‘/[fenhao]/g" | sed ‘s/[ \t]\{1,\}/\t/g‘ | sed ‘s/----/\t/g‘ | awk -F "\t" ‘{ if ($2~/@163.com/){if ($3~/^$/) {print "NULL|||"$1"|||"$2$3"||||||"}else {print "NULL|||"$1"|||"$3"||||||"}}else{print "NULL|||"$1"|||"$2"||||||"}}‘ | iconv -c -f gb18030 -t UTF-8 | sort | uniq
Batch import by line:
#!/bin/bashsed ‘/^[ \t]\{0,\}$/d‘ $1 | sed ‘s/\r//g‘ | sed ‘/^\s*$/d‘ | awk ‘{print "NULL|||"$0"|||xxxx|||" }‘
It is also important for database design. After the test, the number of records stored by table and single table cannot exceed 3000. Otherwise, it takes too long to create a full-text index.
Database and Table Design
CREATE TABLE `evil_xxxx_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`username` VARCHAR( 64 ) ,`password` VARCHAR( 64 ) ,`passhash` VARCHAR( 64 ) ,`email` VARCHAR( 64 ));CREATE TABLE `evil_ucmember_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`username` VARCHAR( 64 ) ,`password` VARCHAR( 64 ) ,`passhash` VARCHAR( 64 ) ,`salt` VARCHAR( 16 ) ,`email` VARCHAR( 64 ) ,`ip` VARCHAR( 32 ) ,`site` VARCHAR( 64 ) ,PRIMARY KEY (id));CREATE TABLE `evil_multinfo_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`info` VARCHAR( 1024 ) ,`site` VARCHAR( 64 ) ,PRIMARY KEY (id));
Table sharding solution:
INSERT INTO evil_xx_1(evil_xx.id,evil_xx.username,evil_xx.password,evil_xx.passhash) SELECT (evil_xx.id,evil_xx.username,evil_xx.password,evil_xx.passhash)FROM evil_xx where user.id <= 30000000;
Command Query
Show table status; // view the database status DESC table; // view the table structure set names 'utf8'; // set the encoding, data table unified encoding UTF-8drop index index_name on table_name; // Delete the index alter table table_name add primary key (ID); // Add the primary key alter table table_name drop primary key; // remove the primary key truncate table table_name; // clear the alter table table name Rename to the new table name; // rename the alter table table_name change name varchar (16 ); // rename the column and format alter table table_name add column XXXX varchar (64) after AAAA; // Add the alter table Table Name drop column name; // Delete the column set @ T = 0; update table_name Set ID = (@ T: [email protected] + 1); // recreate the ID
Import Data
load data infile ‘/tmp/xxxx.sql‘ into table `evil_xxxx` fields terminated by ‘|||‘lines terminated by ‘\n‘;
Relatively Fast
About Indexes
Full-text Chinese index mainly depends on mysqlctf and sphsf-test. It is found that the sphsf-word segmentation index is faster, but the word segmentation causes such as searching humor but not searching humorous, however, the index speed of mysqlctf is relatively slow and there is no progress information, but the index size is controllable. Generally, the full-text index 1g Database Information corresponds to the myi size of 5-9G
Many official documents on coreseek compilation and usage
Sphsf-configuration file (full-text search includes five letters for Word Segmentation and minimum Word Segmentation)
Source evil_xxx {type = MySQL SQL _host = localhost SQL _user = OOO SQL _pass = xxx SQL _db = dB SQL _port = 3306 SQL _query_pre = set names utf8 SQL _query = select ID, username, password, passhash, email from XXXX SQL _query_info = select * from XXXX where id = $ id} index evil_xxx {source = evil_xxx Path =/usr/local/coreseek/var/data/evil_xxx docinfo = extern mlock = 0 morphology = none min_word_len = 1 html_strip = 0 charset_dictpath =/usr/local/mmseg3/etc/charset_type = zh_cn.utf-8 ngram_len = 0 min_infix_len = 5 enable_star = 1 // allow fuzzy search}
Call the official API for retrieval.
How to create an index using mysqlcft
First, install the plug-in
wget http://mysqlcft.googlecode.com/files/mysqlcft-1.0.0-x86_64-bin.tar.gztar zxvf mysqlcft-1.0.0-x86_64-bin.tar.gzmkdir -p /usr/local/mysqlcft/lib/mysql/plugin/cp mysqlcft.so /usr/local/mysqlcft/lib/mysql/plugin/
After logging on to MySQL
INSTALL PLUGIN mysqlcft SONAME ‘mysqlcft.so‘;SHOW PLUGINS;
Create an index after installing OK
Alter ignore table database name. Table name Add Fulltext index full-text joint index name (field name 1, field name 2) with parser mysqlcft;
For example
ALTER IGNORE TABLE db.evil_xxxx ADD FULLTEXT INDEX fullindex(username,password,passhash,email) WITH PARSER mysqlcft;
When searching
SELECT * FROM evil_xxxx WHERE MATCH(username,password,passhash,email) AGAINST (‘searchinfo‘ IN BOOLEAN MODE);
Re-Indexing
Repair table database name. Table Name quick;
Several tips
- Add source database/website fields to facilitate Query
- The format of separators in the same database is different. Regular Expression replacement
- The database engine selects MyISAM (instead of InnoDB) and can use alter conversion.
- To avoid overlap with password characters, you can use separators such as = AAAA = to replace
- Some mailboxes in the database are in front of the password, some are in front of the password, and some are in front of the mailbox, use awk to write several more lines of branch statements
- Configure the my. CNF file to maximize Mysql Data processing efficiency
From le4f 'blog
Construction of social engineering data search engine