Fetching a Large Amount of Data with Thrift
Description
In Thrift, a method invocation works like a short-lived operation: each call is independent, and the connection is closed when it completes. For a call that needs to return a large amount of data, fetching everything in one invocation leads to poor performance. What we would like is something closer to socket programming: hold a long-lived connection, fetch only part of the data on each call, and loop until all the data has been consumed before closing the connection. Thrift does not provide this long-connection, batched style of access by default, so let's look at how to implement it by other means.
Note that for data stored in a database we can rely on the database's own paging, so that case does not apply here; this article targets data obtained in other ways, for example from a command-line call.
The program before optimization
The following code is the initial implementation: it fetches all the data at once.
Job.thrift
The interface definition file, which declares a method for fetching all job information from the server.
struct Job {
  1: string id,
  2: string name,
  3: string queue,
  4: string user,
  5: string cmd,
  # ...
}

service JobService {
  list<Job> getJobs()
}
jobserver.py
The server-side code. It simulates 100,000 jobs in memory; this is only a simulation, but in practice such a list of 100,000 job records might come from a command-line call.
#!/usr/bin/env python
import sys, glob
sys.path.append('gen-py')

from job import JobService
from job.ttypes import *

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

class JobServiceHandler:
    def __init__(self):
        self.jobs = []
        for i in range(0, 100000):
            self.jobs.append(Job(id=str(i), name='job_' + str(i), queue='normal', user='kongxx', cmd='sleep 1'))

    def getJobs(self):
        return self.jobs

handler = JobServiceHandler()
processor = JobService.Processor(handler)
transport = TSocket.TServerSocket(port=9090)
tfactory = TTransport.TBufferedTransportFactory()
pfactory = TBinaryProtocol.TBinaryProtocolFactory()

server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)
print 'Starting the server...'
server.serve()
print 'done.'
jobclient.py
The client code: establish a connection to the server, then fetch all the data in a single call.
#!/usr/bin/env python
import sys, glob
sys.path.append('gen-py')

from job import JobService
from job.ttypes import *

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    transport = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = JobService.Client(protocol)

    transport.open()
    jobs = client.getJobs()
    transport.close()
except Thrift.TException, tx:
    print '%s' % (tx.message)
Test
Generate the code with "thrift -r --gen py job.thrift", then run "python jobserver.py" and, in another terminal, "python jobclient.py".
The program after optimization
The optimized program replaces the single fetch-everything call with batched retrieval of partial data. The open and close methods are added to delimit a complete query operation, somewhat like acquiring and releasing a database connection.
Job.thrift
The interface definition file now declares three methods:
open() - starts an operation transaction on the server and returns an operation ID. The operation ID must be unique; this example simply uses the system time, which is acceptable for testing, but a production environment needs to guarantee uniqueness. In the example, open() also simulates generating the 100,000 job records on the server side.
close() - ends an operation transaction and cleans up the server-side data.
getJobs() - fetches the data of a single operation transaction in batches. It must be called between open() and close(), otherwise an error is raised.
struct Job {
  1: string id,
  2: string name,
  3: string queue,
  4: string user,
  5: string cmd,
  # ...
}

exception JobServiceException {
  1: i32 code,
  2: string message
}

service JobService {
  string open() throws (1: JobServiceException e),
  void close(1: string operationId) throws (1: JobServiceException e),
  list<Job> getJobs(1: string operationId, 2: i32 offset, 3: i32 size) throws (1: JobServiceException e)
}
jobserver.py
The server-side code defines three classes:
JobServiceInstance - holds one job query instance; its constructor simulates generating the 100,000 jobs.
JobServiceInstanceManager - manages the job query instances: creating, destroying, and looking them up.
JobServiceHandler - handles the actual query operations, returning results or raising exceptions as needed. All operations are implemented through the JobServiceInstanceManager and JobServiceInstance classes.
#!/usr/bin/env python
import sys, glob, time
sys.path.append('gen-py')

from job import JobService
from job.ttypes import *

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer

class JobServiceHandler:
    def __init__(self):
        self.manager = JobServiceInstanceManager()

    def open(self):
        try:
            operation_id = str(time.time())
            self.manager.create_instance(operation_id)
            return operation_id
        except Exception as e:
            raise JobServiceException(message=e.message)

    def close(self, operation_id):
        try:
            self.manager.drop_instance(operation_id)
        except Exception as e:
            raise JobServiceException(message=e.message)

    def getJobs(self, operation_id, offset, size):
        instance = self.manager.get_instance(operation_id)
        if instance:
            try:
                return instance.get_jobs(offset, size)
            except Exception as e:
                raise JobServiceException(message=e.message)
        else:
            raise JobServiceException(message='Invalid operation ID.')

class JobServiceInstance:
    def __init__(self):
        self.jobs = []
        for i in range(0, 100000):
            self.jobs.append(Job(id=str(i), name='job_' + str(i), queue='normal', user='kongxx', cmd='sleep 1'))

    def get_jobs(self, offset, size):
        return self.jobs[offset:offset+size]

    def clean(self):
        self.jobs = []

class JobServiceInstanceManager:
    def __init__(self):
        self.instances = dict()

    def get_instance(self, id):
        return self.instances.get(id)

    def create_instance(self, id):
        instance = JobServiceInstance()
        self.instances.update({id: instance})
        return instance

    def drop_instance(self, id):
        instance = self.instances.get(id)
        if instance:
            instance.clean()
            self.instances.pop(id)

handler = JobServiceHandler()
processor = JobService.Processor(handler)
transport = TSocket.TServerSocket(port=9090)
tfactory = TTransport.TBufferedTransportFactory()
pfactory = TBinaryProtocol.TBinaryProtocolFactory()

server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)
print 'Starting the server...'
server.serve()
print 'done.'
jobclient.py
The client: first call client.open() to obtain an operation ID. Then, using that ID, query one page of data per iteration according to the current offset and the page size. When the query completes, call client.close() to end the query operation.
#!/usr/bin/env python
import sys, glob
sys.path.append('gen-py')

from job import JobService
from job.ttypes import *

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    transport = TSocket.TSocket('localhost', 9090)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = JobService.Client(protocol)

    transport.open()

    offset = 0
    size = 1000
    total = 0

    operation_id = client.open()
    while True:
        jobs = client.getJobs(operation_id, offset, size)
        if len(jobs) == 0:
            break
        offset += size
        total += len(jobs)
        for job in jobs:
            print job
    print "Total: %s" % total
    client.close(operation_id)

    transport.close()
except Thrift.TException, tx:
    print '%s' % (tx.message)
Test
Generate the code with "thrift -r --gen py job.thrift", then run "python jobserver.py" and, in another terminal, "python jobclient.py".
Remaining issues after optimization
Comparing the two versions, the query time alone does not change much. The batched approach is still worthwhile, though, because the client can process each batch asynchronously as it arrives, instead of having to wait for the full result set as before the optimization.
Since the client may forget to call close() and leave garbage data on the server, we can guard against this with a timeout: have the JobServiceInstanceManager track when each JobServiceInstance was last used, update that timestamp on every getJobs() call, and destroy any instance that has not been invoked within the configured timeout period.
Also, the operation ID in this test code is not guaranteed to be unique; a production environment should generate it with a proper algorithm as needed.
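As a rough sketch of the timeout idea above (the class and method names here are illustrative, not part of the generated Thrift code), the manager can record a last-access timestamp per instance, refresh it on every fetch, and periodically evict stale instances; a uuid4 string is also a simple way to get collision-free operation IDs:

```python
import time
import uuid


class InstanceManager(object):
    """Keeps per-operation instances and evicts ones idle longer than `timeout` seconds."""

    def __init__(self, timeout=300):
        self.timeout = timeout
        self.instances = {}    # operation_id -> instance
        self.last_used = {}    # operation_id -> timestamp of the last call

    def create_instance(self, instance):
        # uuid4 avoids the collisions that str(time.time()) allows
        # when two clients call open() at (almost) the same moment
        operation_id = str(uuid.uuid4())
        self.instances[operation_id] = instance
        self.last_used[operation_id] = time.time()
        return operation_id

    def get_instance(self, operation_id):
        instance = self.instances.get(operation_id)
        if instance is not None:
            # refresh the timestamp on every getJobs-style call
            self.last_used[operation_id] = time.time()
        return instance

    def drop_instance(self, operation_id):
        self.instances.pop(operation_id, None)
        self.last_used.pop(operation_id, None)

    def evict_stale(self):
        # run this periodically, e.g. from a background thread on the server
        now = time.time()
        for operation_id in list(self.instances):
            if now - self.last_used[operation_id] > self.timeout:
                self.drop_instance(operation_id)


# usage: an instance left idle past the timeout disappears after eviction
manager = InstanceManager(timeout=0.1)
op = manager.create_instance(['job_0', 'job_1'])
assert manager.get_instance(op) == ['job_0', 'job_1']
time.sleep(0.2)
manager.evict_stale()
assert manager.get_instance(op) is None
```

A real server would call evict_stale() from a timer or background thread, and getJobs() would raise the "Invalid operation ID." exception once an instance has been evicted.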
Please indicate the original link when reprinting.
This article link: http://blog.csdn.net/kongxx/article/details/52188408