You have now seen how intricate it is to implement exactly-once semantics. The benefit of Trident is that it puts all of the fault-tolerance logic inside the State. As a user, you don't have to deal with comparing txids, storing extra values in the database, or anything else like that. All you have to write is simple code like this:
```java
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(MemcachedState.opaque(serverLocations), new Count(), new Fields("count"))
        .parallelismHint(6);
```
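To give a sense of what that hidden fault-tolerance logic involves, here is a simplified sketch of the opaque-transactional update rule for a counter. It is not MemcachedState's actual code: it assumes each key stores its current value, the value before the most recent batch, and that batch's txid. The names `OpaqueValue` and `OpaqueCountStore` are hypothetical, and the sketch runs on plain Java with no Storm dependency.

```java
import java.util.HashMap;
import java.util.Map;

// Per-key record kept by an opaque-transactional state (simplified, hypothetical).
final class OpaqueValue {
    final long txid;  // txid of the batch that last wrote this key
    final long curr;  // current value
    final Long prev;  // value before that batch, or null if there was none

    OpaqueValue(long txid, long curr, Long prev) {
        this.txid = txid;
        this.curr = curr;
        this.prev = prev;
    }

    OpaqueValue apply(long batchTxid, long delta) {
        if (batchTxid == txid) {
            // Replay of the batch that last wrote this key: its earlier result may
            // have come from different tuples, so recompute from the prior value.
            long base = (prev == null) ? 0L : prev;
            return new OpaqueValue(batchTxid, base + delta, prev);
        }
        // A new batch: the current value becomes the "previous" value.
        return new OpaqueValue(batchTxid, curr + delta, curr);
    }
}

// Hypothetical counter store applying the rule above to every key in a batch.
final class OpaqueCountStore {
    private final Map<String, OpaqueValue> store = new HashMap<>();

    void applyBatch(long txid, Map<String, Long> deltas) {
        for (Map.Entry<String, Long> e : deltas.entrySet()) {
            OpaqueValue old = store.get(e.getKey());
            OpaqueValue next = (old == null)
                    ? new OpaqueValue(txid, e.getValue(), null)
                    : old.apply(txid, e.getValue());
            store.put(e.getKey(), next);
        }
    }

    long get(String key) {
        OpaqueValue v = store.get(key);
        return (v == null) ? 0L : v.curr;
    }
}
```

The replay branch is the important one: when a batch with the same txid arrives again (possibly containing different tuples, which opaque spouts allow), the count is recomputed from the previous value instead of being added twice.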
All the logic needed to manage opaque transactional state is handled inside the MemcachedState.opaque call, and updates are automatically batched to minimize round trips to the database. The base State interface has only two methods:
```java
public interface State {
    void beginCommit(Long txid); // can be null for things like partitionPersist occurring off a DRPC stream
    void commit(Long txid);
}
```
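One way to use the two callbacks is to buffer a batch's writes and make them visible together on commit. The sketch below is an illustrative assumption, not something Trident requires: it declares its own `TxState` interface mirroring the two methods so it compiles without Storm, and `BufferedKvState` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Local copy of the two-method contract so the sketch compiles without Storm.
interface TxState {
    void beginCommit(Long txid);
    void commit(Long txid);
}

// Hypothetical key-value state that buffers a batch's writes until commit.
class BufferedKvState implements TxState {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> pending = new HashMap<>();
    private Long currentTxid;        // may be null, e.g. for a DRPC-driven update
    private Long lastCommittedTxid;

    @Override
    public void beginCommit(Long txid) {
        currentTxid = txid;
        pending.clear();             // fresh buffer for this batch
    }

    void put(String key, String value) {
        pending.put(key, value);
    }

    @Override
    public void commit(Long txid) {
        if (!Objects.equals(txid, currentTxid)) {
            throw new IllegalStateException("commit(" + txid + ") without matching beginCommit");
        }
        committed.putAll(pending);   // the batch's writes become visible together
        pending.clear();
        lastCommittedTxid = txid;
    }

    String get(String key) {
        return committed.get(key);
    }

    Long getLastCommittedTxid() {
        return lastCommittedTxid;
    }
}
```

Whether buffering makes sense depends on the backing store; a database with its own transactions could instead open a transaction in `beginCommit` and commit it in `commit`.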
You are told when a state update begins and when it ends, and in each case you are given the txid. Trident makes no assumptions about how your state works. For example, suppose you have built your own database to store user location information and you want to access it from Trident. Your State implementation would then have methods for setting and getting user locations:
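```java
import org.apache.storm.trident.state.State; // storm.trident.state.State in pre-1.0 Storm releases

public class LocationDB implements State {
    public void beginCommit(Long txid) {
    }

    public void commit(Long txid) {
    }

    public void setLocation(long userId, String location) {
        // code to access the database and set the location
    }

    public String getLocation(long userId) {
        // code to get the location from the database
        throw new UnsupportedOperationException("lookup not implemented yet");
    }
}
```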
Next, you need to provide Trident with a StateFactory that creates instances of your State object inside Trident tasks. The StateFactory for LocationDB might look like this:
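```java
import java.util.Map;
import org.apache.storm.task.IMetricsContext;
import org.apache.storm.trident.state.State;
import org.apache.storm.trident.state.StateFactory;

public class LocationDBFactory implements StateFactory {
    @Override
    public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
        return new LocationDB();
    }
}
```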
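StateFactory extends `java.io.Serializable`: the factory travels with the topology, while `makeState` runs inside each task. A natural pattern is therefore to keep only configuration in the factory and open connections in the State it creates. The sketch below illustrates that split without Storm on the classpath. `LocalState`, `LocalStateFactory`, `ConnectedLocationFactory` and the host string are hypothetical stand-ins, and the serialization round trip only approximates what happens when a topology is submitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;

// Local stand-ins for Storm's State and StateFactory (names are hypothetical).
interface LocalState { }

interface LocalStateFactory extends Serializable {
    LocalState makeState(Map<String, Object> conf, int partitionIndex, int numPartitions);
}

// A state that would hold a live, non-serializable connection in a real system.
class ConnectedLocationState implements LocalState {
    final String host;
    final int partitionIndex;

    ConnectedLocationState(String host, int partitionIndex) {
        this.host = host;             // a real state would open its connection here
        this.partitionIndex = partitionIndex;
    }
}

// The factory carries only serializable configuration.
class ConnectedLocationFactory implements LocalStateFactory {
    private static final long serialVersionUID = 1L;
    private final String host;

    ConnectedLocationFactory(String host) {
        this.host = host;
    }

    @Override
    public LocalState makeState(Map<String, Object> conf, int partitionIndex, int numPartitions) {
        return new ConnectedLocationState(host, partitionIndex);
    }
}

final class SerializationCheck {
    // Serializes and deserializes an object with Java serialization.
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the factory held the connection itself, the round trip would fail (or silently lose the connection), which is why the connection belongs in the State.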
Trident provides the QueryFunction interface for writing Trident operations that query a source of state, and the StateUpdater interface for writing operations that update a source of state. For example, let's write an operation, QueryLocation, that queries LocationDB for users' locations. Start by looking at how you would use it in a topology, assuming the topology consumes an input stream of user IDs:
```java
TridentTopology topology = new TridentTopology();
TridentState locations = topology.newStaticState(new LocationDBFactory());
topology.newStream("myspout", spout)
        .stateQuery(locations, new Fields("userid"), new QueryLocation(), new Fields("location"));
```
Now let's look at what the implementation of QueryLocation might look like:
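```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseQueryFunction;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

public class QueryLocation extends BaseQueryFunction<LocationDB, String> {
    public List<String> batchRetrieve(LocationDB state, List<TridentTuple> inputs) {
        List<String> ret = new ArrayList<String>();
        for (TridentTuple input : inputs) {
            ret.add(state.getLocation(input.getLong(0))); // one database lookup per tuple
        }
        return ret;
    }

    public void execute(TridentTuple tuple, String location, TridentCollector collector) {
        collector.emit(new Values(location));
    }
}
```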
A QueryFunction executes in two steps. First, Trident collects a batch of reads and passes them together to batchRetrieve; in this example, batchRetrieve receives multiple user IDs. batchRetrieve must return a list of results the same size as the list of input tuples: the first element of the result list is the result for the first input tuple, the second element is the result for the second input tuple, and so on. Trident then calls execute once for each input tuple together with its result. As you can see, this code does not take advantage of the batching Trident does, because it queries LocationDB once per input tuple. A better way to write LocationDB is:
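The two-phase contract can be simulated in a few lines. In this simplified sketch, `SimpleQueryFunction` and `QueryDriver` are hypothetical stand-ins for Trident's QueryFunction and the code that calls it; inputs are plain user IDs rather than TridentTuples, and a `Map` plays the role of LocationDB. The driver calls `batchRetrieve` once, checks the size contract, and then pairs each input with the result at the same position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified stand-in for Trident's QueryFunction: inputs are user IDs, not TridentTuples.
interface SimpleQueryFunction<S, T> {
    List<T> batchRetrieve(S state, List<Long> inputs);
    void execute(Long input, T result, Consumer<Object[]> emit);
}

// Hypothetical driver mimicking the two phases described above.
final class QueryDriver {
    static <S, T> List<Object[]> run(SimpleQueryFunction<S, T> fn, S state, List<Long> batch) {
        List<T> results = fn.batchRetrieve(state, batch);          // phase 1: once per batch
        if (results.size() != batch.size()) {
            throw new IllegalStateException("batchRetrieve must return one result per input");
        }
        List<Object[]> emitted = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            fn.execute(batch.get(i), results.get(i), emitted::add); // phase 2: pair by position
        }
        return emitted;
    }
}

// Per-input lookup against a Map standing in for LocationDB.
class MapLocationQuery implements SimpleQueryFunction<Map<Long, String>, String> {
    @Override
    public List<String> batchRetrieve(Map<Long, String> db, List<Long> userIds) {
        List<String> ret = new ArrayList<>();
        for (Long id : userIds) {
            ret.add(db.get(id));                 // null when the user is unknown
        }
        return ret;
    }

    @Override
    public void execute(Long userId, String location, Consumer<Object[]> emit) {
        emit.accept(new Object[] {userId, location});
    }
}
```

In Trident itself, `execute` emits only `new Values(location)` and the value is appended to the input tuple's fields; the sketch emits the user ID explicitly to make the pairing visible.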
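```java
import java.util.List;
import org.apache.storm.trident.state.State;

public class LocationDB implements State {
    public void beginCommit(Long txid) {
    }

    public void commit(Long txid) {
    }

    public void setLocationsBulk(List<Long> userIds, List<String> locations) {
        // set locations in bulk
    }

    public List<String> bulkGetLocations(List<Long> userIds) {
        // get locations in bulk
        throw new UnsupportedOperationException("bulk lookup not implemented yet");
    }
}
```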
You can then rewrite QueryLocation to use the bulk method:
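```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseQueryFunction;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

public class QueryLocation extends BaseQueryFunction<LocationDB, String> {
    public List<String> batchRetrieve(LocationDB state, List<TridentTuple> inputs) {
        List<Long> userIds = new ArrayList<Long>();
        for (TridentTuple input : inputs) {
            userIds.add(input.getLong(0));
        }
        return state.bulkGetLocations(userIds); // one database lookup for the whole batch
    }

    public void execute(TridentTuple tuple, String location, TridentCollector collector) {
        collector.emit(new Values(location));
    }
}
```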
This version is much more efficient than the previous one because it reduces the number of round trips to the database. To update state, use the StateUpdater interface. Here is a StateUpdater that writes new location information into LocationDB:
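To make the difference concrete, the sketch below counts simulated round trips for the two QueryLocation strategies. `CountingLocationDb` is a hypothetical in-memory stand-in for LocationDB, and treating each method call as one round trip is an assumption about how a real database client would behave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory LocationDB that counts simulated database round trips.
class CountingLocationDb {
    private final Map<Long, String> rows = new HashMap<>();
    int roundTrips = 0;

    void setLocationsBulk(List<Long> userIds, List<String> locations) {
        roundTrips++;                              // one round trip for the whole batch
        for (int i = 0; i < userIds.size(); i++) {
            rows.put(userIds.get(i), locations.get(i));
        }
    }

    String getLocation(Long userId) {
        roundTrips++;                              // one round trip per key
        return rows.get(userId);
    }

    List<String> bulkGetLocations(List<Long> userIds) {
        roundTrips++;                              // one round trip for the whole batch
        List<String> ret = new ArrayList<>();
        for (Long id : userIds) {
            ret.add(rows.get(id));
        }
        return ret;
    }
}

final class RoundTripComparison {
    // Mirrors the first QueryLocation: one lookup per input.
    static List<String> perTuple(CountingLocationDb db, List<Long> ids) {
        List<String> ret = new ArrayList<>();
        for (Long id : ids) {
            ret.add(db.getLocation(id));
        }
        return ret;
    }

    // Mirrors the rewritten QueryLocation: one bulk lookup per batch.
    static List<String> bulk(CountingLocationDb db, List<Long> ids) {
        return db.bulkGetLocations(ids);
    }
}
```

For a batch of four user IDs, the per-tuple version makes four calls and the bulk version makes one, while both return the same list in input order.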
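```java
import java.util.ArrayList;
import java.util.List;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.state.BaseStateUpdater;
import org.apache.storm.trident.tuple.TridentTuple;

public class LocationUpdater extends BaseStateUpdater<LocationDB> {
    public void updateState(LocationDB state, List<TridentTuple> tuples, TridentCollector collector) {
        List<Long> ids = new ArrayList<Long>();
        List<String> locations = new ArrayList<String>();
        for (TridentTuple t : tuples) {
            ids.add(t.getLong(0));         // "userid" field
            locations.add(t.getString(1)); // "location" field
        }
        state.setLocationsBulk(ids, locations);
    }
}
```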
Here is how you would use the LocationUpdater declared above in a Trident topology:
```java
TridentTopology topology = new TridentTopology();
TridentState locations =
    topology.newStream("locations", locationsSpout)
        .partitionPersist(new LocationDBFactory(), new Fields("userid", "location"), new LocationUpdater());
```
The partitionPersist operation updates a source of state: it hands the State and a batch of tuples containing updates to the StateUpdater, which performs the actual update. In this code, the updater simply extracts the user IDs and locations from the input tuples and does a bulk set into the State. partitionPersist returns a TridentState object representing the location database updated by this Trident topology, and you can then use this state in stateQuery operations elsewhere in the topology. Note also that StateUpdaters are given a TridentCollector; tuples emitted to this collector go to the "new values stream". In this example there is nothing interesting to emit, but if you were doing something like updating counts in a database, you could emit the updated counts to that stream. You can then access the new values stream for further processing through the TridentState#newValuesStream method.
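The count-updating case can be sketched as follows. `CountDb` and `CountUpdater` are hypothetical; tuples are modeled as `(key, delta)` pairs, and a plain `List` stands in for the TridentCollector and the new values stream. In a real topology the updater would extend BaseStateUpdater and call `collector.emit` instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical count store standing in for a database table of counts.
class CountDb {
    private final Map<String, Long> counts = new HashMap<>();

    // Adds each delta and returns the resulting totals, in input order.
    List<Long> incrementBulk(List<String> keys, List<Long> deltas) {
        List<Long> updated = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            long total = counts.getOrDefault(keys.get(i), 0L) + deltas.get(i);
            counts.put(keys.get(i), total);
            updated.add(total);
        }
        return updated;
    }
}

// Simplified analogue of a StateUpdater: "collector" collects the new values stream.
class CountUpdater {
    void updateState(CountDb state, List<Object[]> tuples, List<Object[]> collector) {
        List<String> keys = new ArrayList<>();
        List<Long> deltas = new ArrayList<>();
        for (Object[] t : tuples) {
            keys.add((String) t[0]);
            deltas.add((Long) t[1]);
        }
        List<Long> totals = state.incrementBulk(keys, deltas);  // one bulk write per batch
        for (int i = 0; i < keys.size(); i++) {
            collector.add(new Object[] {keys.get(i), totals.get(i)}); // emit the updated count
        }
    }
}
```

Downstream operations reading the new values stream would then see each key's count as it stood after the batch, rather than the raw deltas.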