StreamInsight: Synchronizing Slow-Moving Reference Streams with Fast-Moving Data Streams Using Time Import

Source: Internet
Author: User

Link: http://blogs.msdn.com/b/masimms/archive/2010/09/27/streaminsight-synchronizing-slow-moving-reference-streams-with-fast-moving-data-streams-time-import.aspx

A common task in StreamInsight is to use a reference stream to bring in metadata or reference data from a relatively static data source (such as a SQL Server table). The difficulty with joining a reference stream is liveliness, that is, how promptly the join can produce output when one of its inputs rarely advances.

In the figure from the original post, two streams (the data stream and the reference stream) are joined together to create a joined stream. Expressed in LINQ syntax, this looks like:

    CepStream<SensorReading> dataStream = ...;
    CepStream<SensorMetadata> metadataStream = ...;

    var joinedQuery = from e1 in dataStream
                      join e2 in metadataStream
                      on e1.SensorId equals e2.SensorId
                      select e1;

Only two output events are produced. This is because the StreamInsight engine works on application time, which is advanced by CTIs (current time increments). In this example, the StreamInsight engine sees the following:

    • The data stream's time is at T5.
    • The reference stream's time is at T0.
    • Because the two streams are joined, output can be produced only once all events for a given period of time have been received from both inputs. Since the reference stream is joined to the data stream, events can be output only at the pace of the reference stream.
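The effect described in these bullets can be sketched with a small model. This is plain Python, not the StreamInsight API: the function name `releasable_outputs` and the integer timestamps are assumptions made for illustration, and the model simplifies CTI handling to "an event is final once both inputs have a CTI later than its timestamp". It is not meant to reproduce the exact event count in the original figure.

```python
def releasable_outputs(data_events, data_cti, ref_cti):
    """Return the data events that a join could safely emit.

    A join can only finalize results up to the CTI of its slower input,
    so the effective join time is the minimum of the two CTIs.
    """
    join_time = min(data_cti, ref_cti)
    return [(t, payload) for (t, payload) in data_events if t < join_time]


# Data events at T0..T5 (hypothetical payloads).
data_events = [(t, f"reading_{t}") for t in range(6)]

# Data stream has advanced to T5, reference stream is still at T0.
stalled = releasable_outputs(data_events, data_cti=5, ref_cti=0)

# Both streams have advanced to T5.
caught_up = releasable_outputs(data_events, data_cti=5, ref_cti=5)

print(len(stalled), len(caught_up))  # prints "0 5"
```

While the reference stream sits at T0, nothing can be released, no matter how far the data stream has advanced.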

In general, this behavior is entirely correct. Sometimes, though, it is not what we want. For example, when the reference stream changes slowly, we do not want to keep waiting for it. In other words, we want results to be produced at the pace of the data stream. So what should we do?

Import the CTIs from the data stream into the reference stream.

You can do this with an overload of the CepStream<>.Create() method. The code below demonstrates it, using one CSV file as the sample data source and another CSV file as the sample reference stream. The entire project can be downloaded from the original post.

    ////////////////////////////////////////////////////////////////////
    // Create a time import setting that specifies the stream will import
    // its CTIs from dataStream
    //
    var timeImportSettings = new AdvanceTimeSettings(
        null,
        new AdvanceTimeImportSettings("dataStream"),
        AdvanceTimePolicy.Adjust);

    ////////////////////////////////////////////////////////////////////
    // Create a reference data stream from the refStream.csv file, using
    // the CTIs imported from dataStream as defined in timeImportSettings
    //
    CepStream<SensorMetadata> metadataStream = CepStream<SensorMetadata>.Create(
        "refStream",
        typeof(TextFileReaderFactory),
        new TextFileReaderConfig()
        {
            CtiFrequency = 1,
            CultureName = CultureInfo.CurrentCulture.Name,
            Delimiter = ',',
            InputFileName = "refStream.csv"
        },
        EventShape.Point,
        timeImportSettings);
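To see why importing CTIs restores liveliness, here is a minimal plain-Python model (again not StreamInsight code). The function `join_progress` and its parameters are hypothetical names for this sketch; it assumes that an importing stream simply adopts every CTI of the stream it imports from.

```python
def join_progress(data_ctis, ref_ctis, import_from_data=False):
    """Return the join's application time after each data-stream CTI.

    Without import, the reference stream only advances through its own
    CTIs. With import, it also adopts each CTI seen on the data stream.
    """
    ref_time = max(ref_ctis, default=0)
    progress = []
    for data_time in data_ctis:
        if import_from_data:
            ref_time = max(ref_time, data_time)
        # The join advances only as far as its slower input.
        progress.append(min(data_time, ref_time))
    return progress


data_ctis = [1, 2, 3, 4, 5]   # fast-moving data stream
ref_ctis = [0]                # slow-moving reference stream

without_import = join_progress(data_ctis, ref_ctis)
with_import = join_progress(data_ctis, ref_ctis, import_from_data=True)

print(without_import)  # [0, 0, 0, 0, 0]
print(with_import)     # [1, 2, 3, 4, 5]
```

With the import enabled, the join's time follows the data stream instead of stalling at the reference stream's last CTI.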

Note that the Create() call above assumes that a stream named "dataStream" exists. If we now join the data stream and the reference stream, we see a steady output stream. However, if we simply look at the raw output of the metadata stream:

    var rawData = metadataStream.ToQuery(cepApp, "metadataStream", "",
        typeof(TracerFactory), traceConfig,
        EventShape.Interval, StreamEventOrder.FullyOrdered);

The following error is returned:

    Error in query: Microsoft.ComplexEventProcessing.ManagementException: Advance time import stream 'dataStream' does not exist. --->
    Microsoft.ComplexEventProcessing.Compiler.CompilerException: Advance time import stream 'dataStream' does not exist.

Why? What does it mean that the import stream does not exist? It has already been defined! The reason is that the import stream has not been physically joined to another stream. To solve this, join the two streams before binding the output adapter.

    ////////////////////////////////////////////////////////////////////
    // Join the two streams and bind the result to the console
    //
    var joinedQuery = from e1 in dataStream
                      join e2 in metadataStream
                      on e1.SensorId equals e2.SensorId
                      select e1;

    var query = joinedQuery.ToQuery(cepApp, "joinedOutput", "",
        typeof(TracerFactory), traceConfig,
        EventShape.Interval, StreamEventOrder.FullyOrdered);

Now let's look at the output:

    Ref, interval from 06/25/2009 00:00:00 +00:00 to 06/25/2009 00:00:00 +00:00:, mysensor_1001, 1001, 14
    ref: CTI at 06/25/2009 00:00:00 +00:00
    ref: CTI at 06/25/2009 00:00:09 +00:00
    ref: CTI at 12/31/9999 23:59:59 +

Why? Where is the rest of the output? Note that the reference data is only a series of point events. To use it as a reference stream, we need to convert that series of point events into edge events. The AlterEventDuration and ClipEventDuration operators do this:

 
    // Convert the point events in the reference stream into edge events
    var edgeEvents = from e in metadataStream
                         .AlterEventDuration(e => TimeSpan.MaxValue)
                         .ClipEventDuration(metadataStream, (e1, e2) => (e1.SensorId == e2.SensorId))
                     select e;

This code does the following:

    • Extends the duration of each point event to infinity.
    • Clips each event when a later point event arrives with the same sensor ID. For example, given the value (1001, sensorid_1001), if another event with the value (1001, mysensor) arrives later, the original event is clipped at that point and the value becomes mysensor from then on.
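The two steps above can be modelled in plain Python to make the resulting intervals concrete. This is a sketch of the semantics only, not the StreamInsight operators: `to_edge_intervals`, the tuple layout, and the integer timestamps 0 and 7 are assumptions chosen for illustration, while the sensor ID and names come from the example in the bullet.

```python
from collections import defaultdict

INFINITY = float("inf")


def to_edge_intervals(point_events):
    """Turn (time, sensor_id, name) points into clipped intervals.

    Returns a sorted list of (start, end, sensor_id, name) tuples.
    """
    by_sensor = defaultdict(list)
    for time, sensor_id, name in sorted(point_events):
        by_sensor[sensor_id].append((time, name))

    intervals = []
    for sensor_id, updates in by_sensor.items():
        for i, (start, name) in enumerate(updates):
            # Step 1: the event would last forever (end = infinity).
            # Step 2: clip it at the next update for the same sensor.
            end = updates[i + 1][0] if i + 1 < len(updates) else INFINITY
            intervals.append((start, end, sensor_id, name))
    return sorted(intervals)


points = [(0, 1001, "sensorid_1001"), (7, 1001, "mysensor")]
intervals = to_edge_intervals(points)
for interval in intervals:
    print(interval)
```

The first value is valid from time 0 until the update at time 7, and the second stays valid indefinitely, which is what lets every later data event find a matching reference event.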

Putting it all together:

    // Convert the point events in the reference stream into edge events
    var edgeEvents = from e in metadataStream
                         .AlterEventDuration(e => TimeSpan.MaxValue)
                         .ClipEventDuration(metadataStream, (e1, e2) => (e1.SensorId == e2.SensorId))
                     select e;

    ////////////////////////////////////////////////////////////////////
    // Join the two streams, taking the sensor name from the reference
    // stream and the value from the data stream
    //
    var joinedQuery = from e1 in dataStream
                      join e2 in edgeEvents
                      on e1.SensorId equals e2.SensorId
                      select new
                      {
                          SensorId = e1.SensorId,
                          Name = e2.Name,
                          Value = e1.Value
                      };

The final result is as follows:

    ref, interval, 12:00:00. 00.000, 12: 00: 1001, mysensor_1001, 14
    ref: CTI at 06/25/2009 00:00:00 +00:00
    ref, interval, 12:00:01. 01.000, 12: 00: 1001, mysensor_1001, 4
    ref, interval, 12:00:02. 02.000, 12: 00: 1001, mysensor_1001, 77
    ref, interval, 12:00:03. 03.000, 12: 00: 1001, mysensor_1001, 44
    ref, interval, 12:00:04. 04.000, 12: 00: 1001, mysensor_1001, 22
    ref, interval, 12:00:05. 05.000, 12: 00: 1001, mysensor_1001, 51
    ref, interval, 12:00:06. 06.000, 12: 00: 1001, mysensor_1001, 46
    ref, interval, 12:00:07. 07.000, 12: 00: 1001, mysensor_1001, 71
    ref, interval, 12:00:08. 08.000, 12: 00: 1001, mysensor_1001, 37
    ref, interval, 12:00:09. 09.000, 12: 00: 1001, mysensor_1001, 45
    ref: CTI at 12/31/9999 23:59:59 +
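The shape of this result can be checked with a plain-Python model of the temporal join: a data event matches a reference event when the sensor IDs are equal and the data event's timestamp falls inside the reference event's lifetime. The function `temporal_join`, the tuple layouts, and the integer timestamps are assumptions of this sketch; the sensor ID, name, and first three values are taken from the output above.

```python
def temporal_join(readings, ref_intervals):
    """Join (time, sensor_id, value) readings against
    (start, end, sensor_id, name) reference intervals."""
    results = []
    for time, sensor_id, value in readings:
        for start, end, ref_id, name in ref_intervals:
            # Match on key equality and on temporal overlap.
            if sensor_id == ref_id and start <= time < end:
                results.append((time, sensor_id, name, value))
    return results


# One reference event, extended to infinity and never clipped.
ref_intervals = [(0, float("inf"), 1001, "mysensor_1001")]

# The first three readings from the output above, one per second.
readings = [(0, 1001, 14), (1, 1001, 4), (2, 1001, 77)]

joined = temporal_join(readings, ref_intervals)
for row in joined:
    print(row)
```

Because the single reference event now covers the whole timeline, every reading picks up the sensor name, matching the one-row-per-reading pattern in the final output.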
