Background: a while ago we switched our HappyBase writes from put() to batch(), only to find that performance did not improve.
Reading the source, I found that put() is itself implemented on top of the batch mechanism.
table.py
def put(self, row, data, timestamp=None, wal=True):
    """
    Store data in the table.

    This method stores the data in the `data` argument for the row
    specified by `row`. The `data` argument is dictionary that maps columns
    to values. Column names must include a family and qualifier part, e.g.
    `cf:col`, though the qualifier part may be the empty string, e.g.
    `cf:`.

    Note that, in many situations, :py:meth:`batch()` is a more
    appropriate method to manipulate data.

    .. versionadded:: 0.7
       `wal` argument

    :param str row: the row key
    :param dict data: the data to store
    :param int timestamp: timestamp (optional)
    :param wal bool: whether to write to the WAL (optional)
    """
    with self.batch(timestamp=timestamp, wal=wal) as batch:
        batch.put(row, data)  # put() is clearly just a one-row batch operation
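Since put() opens a fresh batch on every call, a batch presumably only helps when many rows share one Batch object. The following is a minimal sketch of that calling pattern, not code from HappyBase itself: `write_rows` and its parameter names are hypothetical, and `table` is assumed to be a happybase.Table (or any object exposing the same `batch()` context manager).

```python
def write_rows(table, rows, batch_size=1000):
    """Write many rows through one shared batch.

    `table` is assumed to behave like happybase.Table; `rows` is an
    iterable of (row_key, data_dict) pairs. Both names are illustrative.
    """
    count = 0
    # One Batch object for the whole loop: mutations accumulate in it
    # instead of each row going through its own one-row batch.
    with table.batch(batch_size=batch_size) as batch:
        for row_key, data in rows:
            batch.put(row_key, data)
            count += 1
    return count
```

The default of 1000 is an arbitrary illustration; a suitable value depends on row size and cluster configuration.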
batch.py
class Batch(object):
    """
    Batch mutation class.

    This class cannot be instantiated directly; use
    :py:meth:`Table.batch` instead.
    """
    def __init__(self, table, timestamp=None, batch_size=None,
                 transaction=False, wal=True):
        """Initialise a new Batch instance."""
        if not (timestamp is None or isinstance(timestamp, Integral)):
            raise TypeError("'timestamp' must be an integer or None")

        if batch_size is not None:
            if transaction:
                raise TypeError("'transaction' cannot be used when "
                                "'batch_size' is specified")
            if not batch_size > 0:
                raise ValueError("'batch_size' must be > 0")

        self._table = table
        self._batch_size = batch_size
        self._timestamp = timestamp
        self._transaction = transaction
        self._wal = wal
        self._families = None
        self._reset_mutations()

    def _reset_mutations(self):
        """Reset the internal mutation buffer."""
        self._mutations = defaultdict(list)
        self._mutation_count = 0

    def send(self):
        """Send the batch to the server."""
        bms = [BatchMutation(row, m) for row, m in self._mutations.iteritems()]
        if not bms:
            return

        logger.debug("Sending batch for '%s' (%d mutations on %d rows)",
                     self._table.name, self._mutation_count, len(bms))
        if self._timestamp is None:
            self._table.connection.client.mutateRows(self._table.name, bms, {})
        else:
            self._table.connection.client.mutateRowsTs(
                self._table.name, bms, self._timestamp, {})

        self._reset_mutations()

    #
    # Mutation methods
    #

    def put(self, row, data, wal=None):
        """
        Store data in the table.

        See :py:meth:`Table.put` for a description of the `row`, `data`,
        and `wal` arguments. The `wal` argument should normally not be
        used; its only use is to override the batch-wide value passed to
        :py:meth:`Table.batch`.
        """
        if wal is None:
            wal = self._wal

        self._mutations[row].extend(
            Mutation(
                isDelete=False,
                column=column,
                value=value,
                writeToWAL=wal)
            for column, value in data.iteritems())

        self._mutation_count += len(data)
        if self._batch_size and self._mutation_count >= self._batch_size:
            # The data is only actually sent once the mutation count
            # reaches _batch_size
            self.send()
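To make the buffering behaviour concrete, here is a toy model of the class above that counts simulated sends instead of talking to HBase. It is a sketch under stated assumptions, not HappyBase code: `SketchBatch`, `send_count` and the two helper functions are hypothetical names, and the `__exit__` method assumes that any leftover mutations are flushed when the `with` block ends (the excerpt above does not show that part).

```python
from collections import defaultdict


class SketchBatch:
    """Toy stand-in for Batch: counts sends, never contacts a server."""

    def __init__(self, batch_size=None):
        self._batch_size = batch_size
        self._mutations = defaultdict(list)
        self._mutation_count = 0
        self.send_count = 0  # hypothetical counter, for illustration only

    def send(self):
        if not self._mutations:
            return  # nothing buffered, so no round trip
        self.send_count += 1  # one simulated round trip
        self._mutations = defaultdict(list)
        self._mutation_count = 0

    def put(self, row, data):
        self._mutations[row].extend(data.items())
        self._mutation_count += len(data)
        # Same threshold check as in the excerpt above.
        if self._batch_size and self._mutation_count >= self._batch_size:
            self.send()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: leftovers are flushed when the with block ends.
        self.send()


def sends_with_put_per_row(n):
    """Mimic Table.put(): a new one-row batch for every row."""
    total = 0
    for i in range(n):
        with SketchBatch() as batch:
            batch.put("row-%d" % i, {"cf:col": "v"})
        total += batch.send_count
    return total


def sends_with_shared_batch(n, batch_size):
    """Mimic sharing one batch with a batch_size across all rows."""
    with SketchBatch(batch_size=batch_size) as batch:
        for i in range(n):
            batch.put("row-%d" % i, {"cf:col": "v"})
    return batch.send_count
```

In this model, ten rows written one batch at a time cost ten simulated sends, while ten rows in a shared batch with a batch_size of 4 cost three. That is consistent with the observation that swapping put() for a batch gains nothing unless many rows are written through the same batch.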
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
HappyBase put() operation defaults to batch