To analyze the performance of inserting massive amounts of data into a Cassandra cluster or into Oracle, that is, the insertion rate, we sampled the insertion process with a Java program and then plotted the samples with JFreeChart.
To keep the comparison fair, we did the following:
1. All loop variables are declared outside the loop.
2. For Cassandra, the replication factor is set to 1, so inserting a record does not require writing additional replicas.
3. For Oracle, we use precompiled (prepared) statements so that the execution plan for the insert operation can be reused.
4. All tests were run over the weekend, so that no one else would interfere with the servers.
5. All other processes running on these machines were killed, so that CPU and memory were dedicated to the tests.
6. When inserting Cassandra records from Java code, I used the Thrift API because it was more efficient than the Hector API.
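To illustrate point 3, a precompiled insert through JDBC might look like the sketch below. The table name `student`, its columns `id` and `name`, and the names `PreparedInsertSketch` and `insertStudents` are assumptions made for illustration, not the schema or code used in this test. The `Connection` is assumed to be opened elsewhere, and the try-with-resources form assumes Java 7 or later.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PreparedInsertSketch {

    // Hypothetical table and columns, used only for illustration.
    static final String INSERT_SQL = "INSERT INTO student (id, name) VALUES (?, ?)";

    /**
     * Inserts count rows through a single PreparedStatement, so the
     * statement is prepared once and only the bind values change per row.
     * Returns the number of rows the driver reported as inserted.
     */
    public static int insertStudents(Connection conn, int count) throws SQLException {
        int inserted = 0;
        // The statement is prepared once, outside the loop.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < count; i++) {
                ps.setInt(1, i);                 // bind the id
                ps.setString(2, "student" + i);  // bind the name
                inserted += ps.executeUpdate();
            }
        }
        return inserted;
    }
}
```

Preparing the statement inside the loop instead would ask the database to handle a new statement on every iteration, which is the overhead point 3 is meant to avoid.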
The experiment has two parts: sampling and data analysis.
Part 1: Sampling
Sampling of Cassandra:
As before, we use a loop to insert 500,000 records. The difference is that at the start of the loop and after every 10,000 records we record a timestamp in a list, and at the end we write the list to a text file (Cassandra_insert_sample_data.txt):
package com.charles.cassandra.demo;

import java.io.File;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

import com.charles.cassandra.util.CassandraOperationUtil;

public class CassandraClusterStressTest {
    public static void main(String[] args) throws Exception {
        // Wrap the socket in a framed transport
        TTransport tr = new TFramedTransport(new TSocket("192.168.129.34", 9160));
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();
        if (!tr.isOpen()) {
            System.out.println("Unable to connect to server!");
            return;
        }
        System.out.println("Starting the stress test: inserting 500,000 records into the 2-node cluster");
        System.out.println("...");
        // Mark the start time
        long startTime = System.currentTimeMillis();
        client.set_keyspace("Charles_stress_test2"); // use the Charles_stress_test2 keyspace
        ColumnParent parent = new ColumnParent("Student"); // column family
        /*
         * Here we insert 500,000 records into Student.
         * Each record includes an id and a name.
         */
        String key_user_id = "a";
        String k;
        long timestamp;
        Column idColumn = null;
        Column nameColumn = null;
        // sampleData holds the timestamp (in milliseconds) taken at the start
        // and at every 10,000th record inserted into the cluster
        List<Integer> sampleData = new ArrayList<Integer>(51);
        for (int i = 0; i < 500000; i++) {
            k = key_user_id + i; // row key
            timestamp = System.currentTimeMillis(); // timestamp
            // First field in each row (id field)
            idColumn = new Column(CassandraOperationUtil.stringToByteBuffer("id")); // field name
            idColumn.setValue(CassandraOperationUtil.stringToByteBuffer(i + "")); // field value
            idColumn.setTimestamp(timestamp); // timestamp
            client.insert(CassandraOperationUtil.stringToByteBuffer(k), parent,
                    idColumn, ConsistencyLevel.ONE);
            // Second field in each row (name field)
            nameColumn = new Column(CassandraOperationUtil.stringToByteBuffer("name"));
            nameColumn.setValue(CassandraOperationUtil.stringToByteBuffer("student" + i));
            nameColumn.setTimestamp(timestamp);
            client.insert(CassandraOperationUtil.stringToByteBuffer(k), parent,
                    nameColumn, ConsistencyLevel.ONE);
            // Sample the first record (start timestamp) and every 10,000th record
            if ((i == 0) || ((i + 1) % 10000 == 0)) {
                // Note: the int cast keeps only the low 32 bits of the millisecond
                // timestamp, so only differences between samples are meaningful
                sampleData.add((int) (timestamp));
            }
        }
        // Mark the end time
        long endTime = System.currentTimeMillis();
        // Total elapsed time
        long elapsedTime = endTime - startTime;
        System.out.println("The stress test completed, elapsed time: " + elapsedTime + " milliseconds");
        // Close the connection
        tr.close();
        // After the stress test, write all the sample data to a file for later processing
        FileWriter fw = new FileWriter(new File("Cassandra_insert_sample_data.txt"));
        for (int j = 0; j < sampleData.size(); j++) {
            fw.write(sampleData.get(j) + "\n");
        }
        fw.flush();
        fw.close();
    }
}
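
The sampling rule used in the loop can be checked in isolation. The sketch below is not part of the original program; the names `SamplingSketch`, `sampledIndices` and `recordsPerSecond` are hypothetical. It reproduces the sampling condition and shows one way two consecutive samples might be turned into an insertion rate, assuming the samples are the int-truncated millisecond timestamps written by the program.

```java
import java.util.ArrayList;
import java.util.List;

public class SamplingSketch {

    /** Returns the loop indices at which the stress test takes a sample. */
    public static List<Integer> sampledIndices(int total, int step) {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < total; i++) {
            // Same condition as in the stress test loop.
            if ((i == 0) || ((i + 1) % step == 0)) {
                indices.add(i);
            }
        }
        return indices;
    }

    /**
     * Approximate insertion rate (records per second) between two samples,
     * assuming `records` records were inserted between the two timestamps.
     * int subtraction wraps around, so the difference stays correct even if
     * the truncated timestamps overflow between samples, as long as the real
     * gap is below 2^31 ms (about 24 days).
     */
    public static double recordsPerSecond(int earlierMillis, int laterMillis, int records) {
        int elapsedMillis = laterMillis - earlierMillis;
        return records * 1000.0 / elapsedMillis;
    }
}
```

For 500,000 records and a step of 10,000, `sampledIndices(500000, 10000)` returns 51 indices (index 0 plus 50 multiples of 10,000), which matches the initial capacity of `sampleData`. With samples 2,000 ms apart, `recordsPerSecond(1000, 3000, 10000)` gives 5000.0 records per second.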