We use MariaDB with the Audit Plugin enabled (https://mariadb.com/kb/en/library/mariadb-audit-plugin/).
The plugin's raw log is not convenient for downstream processing, because each entry looks like this:
20180727 11:40:17,aaa-main-mariadb-bjc-001,user,10.1.111.11,3125928,6493942844,query,account, ' Select ID, company_id, user_id, Department, title, role, Create_time, Update_time, status, Is_del, Receive_email , contact From company WHERE ( user_id = 101 and Is_del = 0) ', 0
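For orientation, a quick sketch (host name and query shortened for readability) of how the first eight comma-separated fields split out. The quoted SQL text itself may contain commas, which is why naive comma splitting is only safe for the leading fixed fields:

```shell
#!/bin/sh
# Pull the fixed leading fields out of one audit line; everything after
# the 8th comma is the quoted query text plus the trailing return code.
line="20180727 11:40:17,db01,user,10.1.111.11,3125928,6493942844,query,account,'SELECT 1',0"
echo "$line" | awk -F',' '{print "ts="$1, "host="$2, "op="$7, "db="$8}'
# prints: ts=20180727 11:40:17 host=db01 op=query db=account
```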
So it needs to be reformatted with Logstash.
input {
  file {
    path => ["/data/logs/mariadb/server_audit.log"]
    start_position => "end"
    codec => multiline {
      charset => "ISO-8859-1"
      pattern => "^[0-9]{8}"
      negate => true
      what => "previous"
    }
  }
}
filter {
  if "Quartz" in [message] {
    drop {}
  }
  mutate {
    gsub => ["message", "\s", " ", "message", "\s+", " "]
  }
  dissect {
    mapping => {
      "message" => "%{ts} %{+ts},%{hostname},%{user},%{dbhost},%{connid},%{queryid},%{operate},%{db},%{object}"
    }
  }
  mutate {
    replace => { "message" => "%{ts}	%{hostname}	%{user}	%{dbhost}	%{operate}	%{db}	%{object}" }
  }
}
output {
  file {
    path => "/data/logs/mariadb/%{host}_%{+yyyy-MM-dd_HH}.gz"
    gzip => true
    codec => line { format => "%{message}" }
  }
}
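The gsub pair above first turns every whitespace character (newlines, tabs) into a space and then collapses repeated spaces, so multi-line queries become a single line. A minimal sketch of the same normalization in plain shell, on an illustrative fragment:

```shell
#!/bin/sh
# Same normalization as the gsub pair: map all whitespace to spaces,
# then squeeze runs of spaces down to one.
q=$(printf 'Select  ID,\n\tname\nFrom t')
norm=$(printf '%s' "$q" | tr '\n\t' '  ' | tr -s ' ')
echo "$norm"
# prints: Select ID, name From t
```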
Attention!
The fields in the replace above are separated by literal tab characters. If you edit the config in Vim, do not type \t — Hive will not recognize it as a field separator. Instead press CTRL+V and then TAB to insert a real tab.
With :set list enabled, Vim makes the tab characters visible so you can verify them.
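Outside of Vim, GNU `cat -A` is another quick way to check: a real tab prints as `^I`, while the two characters backslash-t print literally. A small check (the file path is illustrative):

```shell
#!/bin/sh
# A real TAB shows up as ^I under cat -A; the literal characters "\t" do not.
printf 'ts\thostname\n' > /tmp/sep_check.txt
cat -A /tmp/sep_check.txt
# prints: ts^Ihostname$
```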
The MariaDB audit log cannot be rotated by the hour on its own, so in the Logstash config above I have Logstash write the log into hourly gz files, which are then pushed to HDFS. Along the way I tried several approaches:
Logstash's webhdfs output: throughput was not good enough and we lost data.
Syslog-ng
Rsyslog
None of them worked well; in the end, calling the HDFS CLI directly turned out to be the cleanest solution.
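To make the file layout concrete, here is a small sketch (host name, namenode address, and the fixed timestamp are illustrative) of the names the hourly push works with — the local hourly gz file and the daily HDFS target directory:

```shell
#!/bin/bash
# Compute the local gz name and the HDFS target dir for the "last hour",
# pinned to a fixed UTC timestamp so the example is reproducible.
ts="2018-07-27 12:05"
local_file="/data/logs/mariadb/db01_$(date -u -d "$ts last hour" +'%Y-%m-%d_%H').gz"
hdfs_dir="hdfs://active_namenode/mysql_audit/$(date -u -d "$ts last hour" +'%Y-%m-%d')/"
echo "$local_file"
echo "$hdfs_dir"
# then: hdfs dfs -copyFromLocal "$local_file" "$hdfs_dir" && rm -f "$local_file"
```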
I packaged the Logstash configuration and the HDFS push command into an RPM; the RPM spec file is recorded below.
Name: logstash
Version: 1.0.0
Release: 1%{?dist}
Summary: specialized for MySQL audit log collection
License: GPL
AutoReqProv: no
%define __os_install_post %{nil}

%description

%prep

%build

%install
rm -rf $RPM_BUILD_ROOT
mkdir -p %{buildroot}/{apps,etc,usr,var/lib/logstash,var/log/logstash}
cp -r %{_builddir}/etc/* %{buildroot}/etc/
cp -r %{_builddir}/usr/* %{buildroot}/usr/
cp -r %{_builddir}/apps/* %{buildroot}/apps/

%post
chown -R root:root /usr/share/logstash
chown -R root:root /var/log/logstash
chown -R root:root /var/lib/logstash
chown -R root:root /apps/hadoop-2.6.0
/usr/share/logstash/bin/system-install /etc/logstash/startup.options
cat >> /etc/hosts << EOF
# for logstash push mysql audit to HDFS, fill in HDFS namenode and datanode hosts
EOF
echo "$(shuf -i 3-15 -n 1) * * * *" ' source /etc/profile; /apps/hadoop-2.6.0/bin/hdfs dfs -copyFromLocal /data/logs/mariadb/${HOSTNAME}_$(date -u +"\%Y-\%m-\%d_\%H" -d "last hour").gz hdfs://active_namenode/mysql_audit/$(date -u +"\%Y-\%m-\%d")/ && rm -f /data/logs/mariadb/${HOSTNAME}_$(date -u +"\%Y-\%m-\%d_\%H" -d "last hour").gz' >> /var/spool/cron/root
initctl start logstash

%files
%defattr(-,root,root)
/apps/hadoop-2.6.0
/etc/logstash
/usr/share/logstash
/var/lib/logstash
/var/log/logstash

%preun
if [ $1 -eq 0 ]; then
  # upstart
  if [ -r "/etc/init/logstash.conf" ]; then
    if [ -f "/sbin/stop" ]; then
      /sbin/stop logstash >/dev/null 2>&1 || true
    else
      /sbin/service logstash stop >/dev/null 2>&1 || true
    fi
    if [ -f "/etc/init/logstash.conf" ]; then
      rm /etc/init/logstash.conf
    fi
  # sysv
  elif [ -r "/etc/init.d/logstash" ]; then
    /sbin/chkconfig --del logstash
    if [ -f "/etc/init.d/logstash" ]; then
      rm /etc/init.d/logstash
    fi
  # systemd
  else
    systemctl stop logstash >/dev/null 2>&1 || true
    if [ -f "/etc/systemd/system/logstash-prestart.sh" ]; then
      rm /etc/systemd/system/logstash-prestart.sh
    fi
    if [ -f "/etc/systemd/system/logstash.service" ]; then
      rm /etc/systemd/system/logstash.service
    fi
  fi
  if getent passwd logstash >/dev/null; then
    userdel logstash
  fi
  if getent group logstash >/dev/null; then
    groupdel logstash
  fi
fi

%postun

%clean
rm -rf $RPM_BUILD_ROOT
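The cron entry installed by %post uses `shuf` to pick a random minute between 3 and 15, so that many hosts do not all hit the namenode at the same moment. A quick sketch of what that contributes (the push script path is illustrative):

```shell
#!/bin/sh
# shuf picks the cron minute at install time; each host gets its own
# random minute in [3, 15], staggering the hourly HDFS uploads.
minute=$(shuf -i 3-15 -n 1)
echo "$minute * * * * /path/to/push.sh"   # illustrative cron line
test "$minute" -ge 3 && test "$minute" -le 15 && echo "minute in range"
```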
I bundled the Hadoop client (hadoop-2.6.0) into the RPM, which keeps deployment convenient.
Logstash is started automatically after the RPM is installed.
Now the logs are written to HDFS partitioned by day; next they go into Hive.
Create the Hive table first:
CREATE TABLE mysql_audit (
  ts string,
  hostname string,
  `user` string,
  dbhost string,
  operate string,
  db string,
  object string
)
PARTITIONED BY (dt int, hour int, module string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
The table is divided into three partition levels: dt, hour, and module.
Then load the hourly files from HDFS into Hive:
#!/bin/bash
# Description: load HDFS MySQL audit gz files into Hive
# Author: quke
# Date: 2018
source /root/.bash_profile
cur_date=$(date -u +"%Y-%m-%d" -d "last hour")
cur_date_short=$(date -u +"%Y%m%d" -d "last hour")
cur_hour=$(date -u +"%H" -d "last hour")
for fn in $(hdfs dfs -ls /mysql_audit/${cur_date}/*_${cur_hour}.gz | awk '{print $NF}'); do
    host_name=$(echo "$fn" | awk -F '[/_]' '{print $(NF-2)}')
    module=${host_name%-bjc*}
    echo "load data inpath 'hdfs://ossmondb${fn}' into table mysql_audit partition (dt=${cur_date_short}, hour=${cur_hour}, module='${module}');" >> hive.sql
done
hive -f hive.sql && rm -f hive.sql
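To see how the script derives host_name and module from an HDFS path, here is the same awk/parameter-expansion logic run on a sample path following the layout shown earlier:

```shell
#!/bin/bash
# Derive host_name and module from an HDFS file path: awk splits on
# both "/" and "_", and ${host_name%-bjc*} strips the "-bjc..." suffix.
fn="/mysql_audit/2018-07-27/aaa-main-mariadb-bjc-001_2018-07-27_11.gz"
host_name=$(echo "$fn" | awk -F '[/_]' '{print $(NF-2)}')
module=${host_name%-bjc*}
echo "$host_name"   # aaa-main-mariadb-bjc-001
echo "$module"      # aaa-main-mariadb
```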
Questions and comments are welcome.
MariaDB audit log import into Hive