Construction of social engineering data search engine

Source: Internet
Author: User
Tags sqlite database

Author: le4f

Http://le4f.net/post/post/build-social-engineer-evildata-search-engine

How to design and build a social engineering database

From the design of a social engineering database at the beginning to the present beta, it took a whole month before and after the beginning, and all the records are recorded by Lin Zong, leaving them to the ones who need them.

The format of the leaked database is not the same, and it looks amazing. Because the user name, password, mailbox, and hash information need to be stored in the database by table, data sorting is a very time-consuming step.

Overall Architecture

Debian x64 + MySQL + mysqlcft + sphft + nginx + PHP

Simple SQLite database, but the query speed and space consumption are too high, resulting in poor processing efficiency for SQLite requests

Mangodb, but mangodb has better advantages in distributed computing, but it is not obvious in the construction of social engineering libraries.

In MySQL and PostgreSQL, select MySql

To facilitate the migration of the social engineering database, a VM is used to create a virtual machine. All databases are in the virtual machine. The lnmp environment only provides the PHP query interface.

In actual testing, the centos6.4 virtual machine loses too much data due to power failure and is not as stable as expected. In addition, the mmseg Chinese Word Segmentation compilation of coreseek versions may encounter various problems, which are difficult to solve. The 32-bit operating system has a limit on the size of a single index in sphbench. After several tossing times, it still moved back to debian7 x64.

Sort data

Generally, bash scripts are used. Demo:

Common Format:

#!/bin/bashsed ‘/^[ \t]\{0,\}$/d‘ $1 | sed ‘s/[\t ]/\t/g‘ | sed ‘s/\r//g‘ |sed ‘/^\s*$/d‘ | awk -F"\t" ‘{print "NULL|||"$1"|||"$2"||||||"$1"|||xxxx|||" }‘ | iconv -c -f gb18030 -t UTF-8 | sort | uniq he

Ucmember format:

#!/bin/bashsed ‘s/[\r\n ]//g‘ $1 | tr -d ‘\n‘ | sed "s/INSERTINTO\`bbs\_uc\_members\`VALUES(//g" | sed "s/‘,‘/\t/g" | sed "s/,‘/\t/g" | sed "s/‘,/\t/g" |sed "s/‘);/\n/g" | awk -F"\t" ‘{ print "NULL|||"$2"||||||"$3"|||"$9"|||"$4"|||"$7"|||xxxx+xxxx+evil++"}‘ | sed "/+evil++||||||||||||/d"

Pw_member format:

#!/bin/bashsed ‘s/[\r\n ]//g‘ $1 | tr -d ‘\n‘ | sed "s/INSERTINTO\`pw_members\`VALUES(‘/\n/g" | sed "s/‘,‘/\t/g" | awk -F"\t" ‘{ print $1"|||"$2"||||||"$3"|||"$5"|||xxxx|||"}‘ | sed "/+evil++||||||||||||/d"

Processing scripts in different file formats:

#!/bin/bashsed ‘s/\r//g‘ $1 | sed "s/‘/[fenhao]/g" | sed ‘s/[ \t]\{1,\}/\t/g‘ | sed ‘s/----/\t/g‘ | awk -F "\t" ‘{ if ($2~/@163.com/){if ($3~/^$/) {print "NULL|||"$1"|||"$2$3"||||||"}else {print "NULL|||"$1"|||"$3"||||||"}}else{print "NULL|||"$1"|||"$2"||||||"}}‘ | iconv -c -f gb18030 -t UTF-8 | sort | uniq

Batch import by line:

#!/bin/bashsed ‘/^[ \t]\{0,\}$/d‘ $1 | sed ‘s/\r//g‘ | sed ‘/^\s*$/d‘ | awk ‘{print "NULL|||"$0"|||xxxx|||" }‘

It is also important for database design. After the test, the number of records stored by table and single table cannot exceed 3000. Otherwise, it takes too long to create a full-text index.

Database and Table Design
CREATE TABLE `evil_xxxx_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`username` VARCHAR( 64 ) ,`password` VARCHAR( 64 ) ,`passhash` VARCHAR( 64 ) ,`email` VARCHAR( 64 ));CREATE TABLE `evil_ucmember_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`username` VARCHAR( 64 ) ,`password` VARCHAR( 64 ) ,`passhash` VARCHAR( 64 ) ,`salt` VARCHAR( 16 ) ,`email` VARCHAR( 64 ) ,`ip` VARCHAR( 32 ) ,`site` VARCHAR( 64 ) ,PRIMARY KEY (id));CREATE TABLE `evil_multinfo_1` (`id` INT( 10 ) NOT NULL AUTO_INCREMENT,`info` VARCHAR( 1024 ) ,`site` VARCHAR( 64 ) ,PRIMARY KEY (id));

Table sharding solution:

INSERT INTO evil_xx_1(evil_xx.id,evil_xx.username,evil_xx.password,evil_xx.passhash) SELECT (evil_xx.id,evil_xx.username,evil_xx.password,evil_xx.passhash)FROM evil_xx where user.id <= 30000000;
Command Query
Show table status; // view the database status DESC table; // view the table structure set names 'utf8'; // set the encoding, data table unified encoding UTF-8drop index index_name on table_name; // Delete the index alter table table_name add primary key (ID); // Add the primary key alter table table_name drop primary key; // remove the primary key truncate table table_name; // clear the alter table table name Rename to the new table name; // rename the alter table table_name change name varchar (16 ); // rename the column and format alter table table_name add column XXXX varchar (64) after AAAA; // Add the alter table Table Name drop column name; // Delete the column set @ T = 0; update table_name Set ID = (@ T: [email protected] + 1); // recreate the ID
Import Data
load data infile ‘/tmp/xxxx.sql‘ into table `evil_xxxx` fields terminated by ‘|||‘lines terminated by ‘\n‘;

Relatively Fast

About Indexes

Full-text Chinese index mainly depends on mysqlctf and sphsf-test. It is found that the sphsf-word segmentation index is faster, but the word segmentation causes such as searching humor but not searching humorous, however, the index speed of mysqlctf is relatively slow and there is no progress information, but the index size is controllable. Generally, the full-text index 1g Database Information corresponds to the myi size of 5-9G

Many official documents on coreseek compilation and usage

Sphsf-configuration file (full-text search includes five letters for Word Segmentation and minimum Word Segmentation)

Source evil_xxx {type = MySQL SQL _host = localhost SQL _user = OOO SQL _pass = xxx SQL _db = dB SQL _port = 3306 SQL _query_pre = set names utf8 SQL _query = select ID, username, password, passhash, email from XXXX SQL _query_info = select * from XXXX where id = $ id} index evil_xxx {source = evil_xxx Path =/usr/local/coreseek/var/data/evil_xxx docinfo = extern mlock = 0 morphology = none min_word_len = 1 html_strip = 0 charset_dictpath =/usr/local/mmseg3/etc/charset_type = zh_cn.utf-8 ngram_len = 0 min_infix_len = 5 enable_star = 1 // allow fuzzy search}

Call the official API for retrieval.

How to create an index using mysqlcft

First, install the plug-in

wget http://mysqlcft.googlecode.com/files/mysqlcft-1.0.0-x86_64-bin.tar.gztar zxvf mysqlcft-1.0.0-x86_64-bin.tar.gzmkdir -p /usr/local/mysqlcft/lib/mysql/plugin/cp mysqlcft.so /usr/local/mysqlcft/lib/mysql/plugin/

After logging on to MySQL

INSTALL PLUGIN mysqlcft SONAME ‘mysqlcft.so‘;SHOW PLUGINS;

Create an index after installing OK

Alter ignore table database name. Table name Add Fulltext index full-text joint index name (field name 1, field name 2) with parser mysqlcft;

For example

ALTER IGNORE TABLE db.evil_xxxx ADD FULLTEXT INDEX fullindex(username,password,passhash,email) WITH PARSER mysqlcft;

When searching

SELECT * FROM evil_xxxx WHERE MATCH(username,password,passhash,email) AGAINST (‘searchinfo‘ IN BOOLEAN MODE);

Re-Indexing

Repair table database name. Table Name quick;
Several tips
  • Add source database/website fields to facilitate Query
  • The format of separators in the same database is different. Regular Expression replacement
  • The database engine selects MyISAM (instead of InnoDB) and can use alter conversion.
  • To avoid overlap with password characters, you can use separators such as = AAAA = to replace
  • Some mailboxes in the database are in front of the password, some are in front of the password, and some are in front of the mailbox, use awk to write several more lines of branch statements
  • Configure the my. CNF file to maximize Mysql Data processing efficiency

From le4f 'blog

Construction of social engineering data search engine

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.