Previously wrote a merged string of the CLR aggregate function, basically is copied Ms example, plus some processing, has been put into use for a long time, no problem also did not how to study, recently wanted to change, encountered some problems, then tinkering with some experience, recorded as follows.
I. Miscellaneous
- CLR aggregation functions, although they are functions in SQL, are actually present in C # in the form of a class or struct (yes, the entity that implements the aggregate function can be a struct or a class ), which differs from the CLR common function, which is the class , for this reason, the class or struct that implements the aggregate function is collectively referred to as the aggregate class to avoid confusion in the reader's mind
- The aggregation class must be described with an attribute [sqluserdefinedaggregate]. The [Serializable] feature is also used in the MS example, which is not required to be tested
- Aggregation classes are always serialized and deserialized at execution time, which requires the aggregation class to be serializable, and the details below say
- sqluserdefinedaggregate characteristics of Isinvarianttonulls, isinvarianttoduplicates, Isinvarianttoorder, Isnullifempty These properties are used by query handlers for the SQL engine, presumably as the basis for optimizing the execution plan, and are not related to the implementation of the aggregation class . What do you mean, like isinvarianttonulls=true, is to tell the SQL query processor that I have this aggregate function regardless of whether you have nullin the field (meaning nullin SQL, same as below), The returned results are consistent, and the query processor may be based on this to determine whether or not to reuse an existing execution plan; it does not mean that it will automatically filter out null values for the incoming accumulate method. In other words, set isinvarianttonulls=True , the accumulate in the null will come in, you have to deal with the processing. The same is true of a few other attributes, which do not take care of anything, and your mind must be drilled. So in this case, then your implementation is best with the above attribute description of the same, or deceived the query processor, estimated to have good fruit for you to eat. Like you and MM said to ask her to eat 6 dollars of spicy soup, in fact you only bought 1 dollars, the consequences of their own brain repair
Ii. Order of execution
After the interrupt point debugging, the aggregation classes are executed in the following order:
- Enter the Init method. This is the first step after the start of the aggregation, tried to give the class without a parameter constructor, but did not go in, here the aggregation class has been secretly warn you, do not treat it as a general class . As for why, do not know, Wang Gao answers
- Enter the accumulate method (this step is not necessarily happening, as explained later). The parameter of the method is the last deployment to SQL, the parameters that can be passed when the aggregation is called (SQL 2005 only supports 1 parameters), the equivalent of a SQL-oriented portal, the elements to be aggregated will be thrown one after another, come in a single execution once, and then come in one more to execute again, So the method is executed several times based on the number of elements to be aggregated, of course, if the element is 0, it will not be executed and will jump directly to the next step, which is why this step is not necessarily happening. It should be noted that the work of the aggregate function is grouped into a cycle, that is, group by out there are several groups, the aggregation function will be called several times, here is the order of execution in each time, so the accumulate method is the number of cycles of a single group of rows, not all groups of rows. For a chestnut, group by out of 2 groups, the 1th group has 2 rows, the 2nd group has 3 rows, then the entire aggregation function will be called 2 times, the first time the accumulate will cycle 2 times, finished the next step, the entire cycle completed, and then began to aggregate the next group, obviously, In the second round, the accumulate will cycle 3 times.
- The
- is serialized. Dry hair to serialize, I also want to know, I just know that after this step, the values of all the fields that aggregate the class will be emptied (to be exact reset to the class default), so if you don't seize the opportunity to save the data at serialization, Will spoil all the work that you did in init and accumulate, because you will not get any data in the next deserialization process, which will result in an unlimited number of possible returns in the final terminate Method! So this step of serialization must be focused on understanding. This step can only be seen if it has been serialized. In a nutshell, the serialization behavior of the aggregation class is divided into two types, by sqluserdefinedaggregate The Format property of the attribute specifies that the property (which is an enumerated class type) has 3 values: Unknown, Native, UserDefined, where Unknown is present as a default, class The class , like none of the other enumerations, represents the sqluserdefinedaggregate attribute, format must be specified as native or userdefined, and if it is unknown, an exception is thrown. Therefore, the serialization behavior of the aggregation class can only be native, userdefined two kinds:
- Native. Represents the aggregation class to the CLR to go to serialize and deserialize, do not need to implement, it looks beautiful, but the native way some premise is that the aggregation class can only exist members of the value class type, cannot have a reference class Type, including string, and if the aggregation class is class and not a struct, then you must use [structlayout( layoutkind.sequential)] attribute is marked, and if the above conditions are not met, deployment will fail. So sometimes you can't be lazy, you have to use the UserDefined way
- userdefined。 means that you must implement your own serialization and deserialization behavior by letting the aggregationclassRealizeibinaryserializeinterface, as in the case of Ms. The interface has two methods,Public voidWrite (BinaryWriterW) andPublic voidRead (BinaryReaderR), which represents the serialization and deserialization process, respectively. Finally, the point is, in the aggregationclassTo serialize this step, you are responsible for writing all the data you want to save to W (aBinaryWriterinstance) in the underlying stream, either by the Write method of W or by direct access to W. BaseStream operation of the underlying stream, or like newBinaryFormatter(). Serialize (W.basestream, obj) This way, the entire object isBinaryFormatterWrite to the basic flow, in short, the method is diverse, belongs to the knowledge category of flow, this article does not repeat, anyway, the ultimate goal is to write data into the W. BaseStream, for the sake of insurance, after writing can be W. Flush () a bit. This is about the second point, the question of the amount of data that can be written,sqluserdefinedaggregateThe attribute has a MaxByteSize property that, when used in userdefined mode, must be specified to indicate the maximum number of bytes of data that can be written at serialization. Does not specify is 0, is what data also cannot save ~ Play Mao. The maximum value that the maxbytesize can set is determined by theSqluserdefinedaggregateattribute. Maxbytesizevalue constants, and this constant. net2.0-3.5 are all 8000, and subsequent versions do not know if there are any changes. That is, when serializing, you can write up to 8000 bytes of data, and you can save 4000 kanji? Oh, good. ~nonono, according to my debugging, W encoding method is UTF8 (uncertainty with the environment has no relationship, because W is the CLR is responsible for the incoming, under what circumstances, what coded w, not elegant. If it is a fixed transmission UTF8, that can only say a little bit of the people of the non-ASCII area), and cannot be changed, that is, 1 Chinese characters may occupy four to five bytes, according to 3 bytes also about 2,600 kanji, should be said to be not rich, can only touch the limit of praying application. So my opinion, must save to use this volume, only saves the necessary data, does not have the diagram to be easy to serialize the whole whole object in. For example, MS can onlyStringBuilderIn thestringTucked in, without putting the wholeStringBuilderObject plugged in.
- The
- is deserialized. The last said, to serialize the data you want to save, it is natural to take the data out. Similarly, you can access R by using various readxxx methods of R (a binaryreader ). BaseStream operations the underlying stream to fetch data. The mind needs to stay awake, that is, after extracting the data to process and return the result in the Terminate method, instead of having to restore the member object, and then to manipulate the object in the terminate. What do you mean, also take MS's example to say things, but this time it is a negative case, in read in the previous saved string , there is no need to revert to stringbuilder , you can fully use a string field to catch, Then it's OK to process the field and return it in terminate
- Enter the Terminate method. As mentioned above, accumulate is a SQL-oriented portal, and terminate is the exit, and the result of the aggregation calculation is returned to the caller by terminate, so the return class of the method is the class type obtained in SQL. By the above, it is known that between accumulate and terminate, there is a serialization and deserialization link, and that after serialization, the value of the class field has been emptied, not the original class Fields (unless you restore them when deserializing). With this in mind, you should be aware of the seemingly justified approach like this:
Public void accumulate (SqlString str) { = str;} Public SqlString Terminate () { returnnew SqlString (s);}
It is a great error in an aggregation class , unless S is saved at serialization and restored at deserialization time.
- Starts the next set of init→accumulate→ serialization/deserialization →terminate. Of course if there is no next group, the whole end
Finished? There is not a merge method, I am sorry, I do not know when the goods will be used. In my many debugs, I never encountered a situation where the merge was executed. As described in the MSDN documentation, my guess is that the CLR does not guarantee that the same aggregate class instance is used in a single aggregation, that it is possible to work on another instance at any time, and that the new instance's Merge method merges the old instance's data into the new instance, and releases the old instance. Do not know this speculation is right, the master if clear, also hope to advise, thanked first. If this is true, it is clear that the merge method is to incorporate the data of the old instance (other) into the current instance, and what should be written about the reader. It is important to note that if the aggregation class is designed to handle only non-repeating elements, the elements stored in each instance are guaranteed to be unique, but the elements in the two instances may be the same, and be aware of this when implementing merge, to ensure that the merged data remains unique.
Third, the last
For now, in my opinion, the aggregation class is a class /struct in C #, but it's strange everywhere, such as not executing a constructor, emptying the class field while running and passing the state in a serialized and deserialized way. Make it less like a normal class, so I suggest that before fully figuring it out, do not use some of the methods of OOP to achieve it, such as inheritance rewrite what, do not think, honestly fill in the blanks. In addition, for the doubts raised in the text, hope to get expert advice, again thanked.
Finally, attach a modified string aggregation (ignoring null, whitespace, repeating string, removing the first and last blanks):
usingMicrosoft.SqlServer.Server;usingSystem;usingSystem.Collections.Generic;usingSystem.Data.SqlTypes;usingSystem.IO;namespaceahdung.sqlclr{[sqluserdefinedaggregate(format.userdefined, Isinvarianttonulls=true, Isinvarianttoduplicates=true, Isinvarianttoorder=true, MaxByteSize= sqluserdefinedaggregateattribute.maxbytesizevalue)] Public classJoinstring:ibinaryserialize {stringsptr, result; Dictionary<string,Object>dic; Public voidInit () {sptr=string. Empty; DiC=Newdictionary<string,Object> (stringcomparer.currentcultureignorecase);//Ignore Case } Public voidaccumulate (SqlString str, SqlString separater) {if(Sptr. Length = =0&&!separater. IsNull &&!string. IsNullOrEmpty (Separater. Value) {sptr =Separater. Value; } strings; if(str. IsNull | | Str. Value.trim (). Length = =0|| Dic. ContainsKey (s = str.) Value.trim ())) {return; } Dic. ADD (S,NULL); } Public voidMerge (joinstring other) {foreach(stringSinchOther.dic.Keys) {if(DIC. ContainsKey (s)) {Continue; } Dic. ADD (S,NULL); } } PublicSqlString Terminate () {return NewSqlString (Result); } Public voidRead (BinaryReader r) {result=r.readstring (); } Public voidWrite (BinaryWriter w) {string[] ss =New string[DIC. Count]; Dic. Keys.copyto (SS,0); W.write (string. Join (SPTR, SS)); } }}
Finish
"SQL" CLR aggregation function what a ghost