Solutions to data asymmetry between MySQL and Elasticsearch
The Logstash jdbc-input-plugin can only append data: Elasticsearch receives incremental writes, but rows in the upstream (source) database may be deleted or updated. Over time, the search engine's index therefore drifts out of sync with the database.
Of course, if you have a development team, you can write a program that propagates deletes and updates to the search engine. If you do not have that capacity, try the following approach.
Here is a data table, article. Its mtime field is defined with ON UPDATE CURRENT_TIMESTAMP, so mtime is refreshed every time the row is updated.
```
mysql> desc article;
+-------------+---------------+------+-----+-------------------+-----------------------------+
| Field       | Type          | Null | Key | Default           | Extra                       |
+-------------+---------------+------+-----+-------------------+-----------------------------+
| id          | int(11)       | NO   |     | 0                 |                             |
| title       | mediumtext    | NO   |     | NULL              |                             |
| description | mediumtext    | YES  |     | NULL              |                             |
| author      | varchar(100)  | YES  |     | NULL              |                             |
| source      | varchar(100)  | YES  |     | NULL              |                             |
| content     | longtext      | YES  |     | NULL              |                             |
| status      | enum('Y','N') | NO   |     | N                 |                             |
| ctime       | timestamp     | NO   |     | CURRENT_TIMESTAMP |                             |
| mtime       | timestamp     | YES  |     | NULL              | on update CURRENT_TIMESTAMP |
+-------------+---------------+------+-----+-------------------+-----------------------------+
9 rows in set (0.00 sec)
```
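For reference, a column like mtime can be declared as follows. This is a sketch; adapt the statement to your own schema:

```sql
-- Add or redefine an auto-updating modification timestamp on the article table.
-- NULL default plus ON UPDATE CURRENT_TIMESTAMP means mtime is set on every UPDATE.
ALTER TABLE article
  MODIFY mtime TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
```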
Add an mtime query rule to the Logstash JDBC input
```
jdbc {
    jdbc_driver_library => "/usr/share/java/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cms"
    jdbc_user => "cms"
    jdbc_password => "password"
    schedule => "* * * * *"    # cron expression: run every minute
    statement => "select * from article where mtime > :sql_last_value"
    use_column_value => true
    tracking_column => "mtime"
    tracking_column_type => "timestamp"
    record_last_run => true
    last_run_metadata_path => "/var/tmp/article-mtime.last"
}
```
Create a recycle-bin table to record rows that are deleted or disabled (status = 'N').
```sql
CREATE TABLE `elasticsearch_trash` (
  `id` int(11) NOT NULL,
  `ctime` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
Create triggers for the article table
```sql
DELIMITER $$

CREATE DEFINER=`dba`@`%` TRIGGER `article_BEFORE_UPDATE` BEFORE UPDATE ON `article` FOR EACH ROW
BEGIN
  -- When an article's status changes to 'N', its document must be deleted
  -- from the search engine, so record its id in the recycle bin.
  IF NEW.status = 'N' THEN
    INSERT INTO elasticsearch_trash(id) VALUES (OLD.id);
  END IF;
  -- If the status is set back to 'Y' while the id is still in
  -- elasticsearch_trash, the document would be deleted by mistake,
  -- so remove the stale recycle-bin record.
  IF NEW.status = 'Y' THEN
    DELETE FROM elasticsearch_trash WHERE id = OLD.id;
  END IF;
END$$

CREATE DEFINER=`dba`@`%` TRIGGER `article_BEFORE_DELETE` BEFORE DELETE ON `article` FOR EACH ROW
BEGIN
  -- A deleted article goes into the search-engine recycle bin.
  INSERT INTO elasticsearch_trash(id) VALUES (OLD.id);
END$$

DELIMITER ;
```
Next, write a simple shell script that runs every minute (from cron, for example), reads the ids in the elasticsearch_trash table, and uses curl to call the Elasticsearch RESTful API to delete those documents.
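A minimal sketch of such a script, assuming local MySQL and Elasticsearch instances and the index/type names `information`/`article` used by the Java task below; the credentials and hosts are placeholders to adapt:

```shell
#!/bin/bash
# Hypothetical connection settings -- adjust for your environment.
MYSQL="mysql -h localhost -u cms -ppassword cms --batch --skip-column-names"
ES="http://localhost:9200"

# Walk every id currently sitting in the recycle bin.
for id in $($MYSQL -e "SELECT id FROM elasticsearch_trash"); do
    # Delete the document and capture only the HTTP status code.
    status=$(curl -s -o /dev/null -w "%{http_code}" -X DELETE "$ES/information/article/$id")
    # 200 = deleted, 404 = already gone; either way the trash row is done.
    if [ "$status" = "200" ] || [ "$status" = "404" ]; then
        $MYSQL -e "DELETE FROM elasticsearch_trash WHERE id = $id"
    fi
done
```

Schedule it with a crontab entry such as `* * * * * /usr/local/bin/clean-es-trash.sh`.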
You can also implement this as a program. Here is an example of a Spring Boot scheduled task.
Entity
```java
package cn.netkiller.api.domain.elasticsearch;

import java.util.Date;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table
public class ElasticsearchTrash {
    @Id
    private int id;

    @Column(columnDefinition = "TIMESTAMP DEFAULT CURRENT_TIMESTAMP")
    private Date ctime;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public Date getCtime() {
        return ctime;
    }

    public void setCtime(Date ctime) {
        this.ctime = ctime;
    }
}
```
Repository
```java
package cn.netkiller.api.repository.elasticsearch;

import org.springframework.data.repository.CrudRepository;

import cn.netkiller.api.domain.elasticsearch.ElasticsearchTrash;

public interface ElasticsearchTrashRepository extends CrudRepository<ElasticsearchTrash, Integer> {
}
```
Scheduled task
```java
package cn.netkiller.api.schedule;

import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.rest.RestStatus;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import cn.netkiller.api.domain.elasticsearch.ElasticsearchTrash;
import cn.netkiller.api.repository.elasticsearch.ElasticsearchTrashRepository;

@Component
public class ScheduledTasks {
    private static final Logger logger = LoggerFactory.getLogger(ScheduledTasks.class);

    @Autowired
    private TransportClient client;

    @Autowired
    private ElasticsearchTrashRepository elasticsearchTrashRepository;

    public ScheduledTasks() {
    }

    @Scheduled(fixedRate = 1000 * 60) // run the cleanup task once a minute
    public void cleanTrash() {
        for (ElasticsearchTrash elasticsearchTrash : elasticsearchTrashRepository.findAll()) {
            DeleteResponse response = client.prepareDelete("information", "article", elasticsearchTrash.getId() + "").get();
            RestStatus status = response.status();
            logger.info("delete {} {}", elasticsearchTrash.getId(), status.toString());
            // OK means the document was deleted; NOT_FOUND means it was already
            // gone. In both cases the recycle-bin record can be removed.
            if (status == RestStatus.OK || status == RestStatus.NOT_FOUND) {
                elasticsearchTrashRepository.delete(elasticsearchTrash);
            }
        }
    }
}
```
The Spring Boot main class, with scheduling enabled:
```java
package cn.netkiller.api;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
```
That concludes this solution to data asymmetry between MySQL and Elasticsearch. If you have questions, please leave a comment. Thank you for reading, and I hope this article helps you.