CBoard Framework Data Set Implementation: DataProvider

Source: Internet
Author: User
Tags: foreach, arrays

Contents: Related class analysis · Lambda aggregation implementation · Data source aggregation implementation (JDBC) · Summary

Having used CBoard for a while, I came to agree that the implementation of DataProvider is the core of the framework (the rest is fairly generic... ha). So I recently spent a little time studying how DataProvider is implemented. What follows is only a brief analysis, but it may offer a useful line of thinking for your own development.

1. Related class analysis

Classes related to DataProvider fall into four main categories:
Config classes: describe the definitions of dimension columns and indicator columns (DTOs);
Result class: describes the data result;
Aggregatable classes: define the aggregation interface and implement data aggregation with Java 8 features;
DataProvider classes: the DataProvider abstract class and its concrete implementations.

1.1. Config classes

The following three classes are of primary concern:

DimensionConfig: dimension column definition, including column name, filter condition (type, values), and rename;

ValueConfig: indicator column definition, including column name and aggregation type;

AggConfig: the data set description for one chart, including rows, columns, indicator columns, and filter conditions.
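To make the correspondence concrete, here is a hypothetical, heavily simplified sketch of what these DTOs describe; the field names are illustrative assumptions, not the exact CBoard definitions:

```java
import java.util.List;

// Illustrative stand-ins for the CBoard config DTOs (field names assumed).
class DimensionConfig {
    String columnName;   // the dimension column
    String filterType;   // filter condition type, e.g. "=", "in", "between"
    List<String> values; // filter values
    String alias;        // rename
}

class ValueConfig {
    String column;       // the indicator column
    String aggType;      // "sum", "avg", "max", "min", "distinct", "count"
}

class AggConfig {
    List<DimensionConfig> rows;    // row dimensions of the chart
    List<DimensionConfig> columns; // column dimensions of the chart
    List<ValueConfig> values;      // indicator columns
    List<DimensionConfig> filters; // filter conditions
}
```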

These classes correspond to the JSON object submitted with each data query; by modifying the JSON field values you customize the query criteria. They effectively tell the DataProvider what data you need and in what format.

1.2. Result class

The Result class is relatively simple: it is just a description of the result. For every data source the result consists of a columnIndex (the column definitions) and a String[][] two-dimensional data array.

I think this could be extended to support more result dimensions.

1.3. Aggregatable classes

I think this part is the most elegant part of the framework. It has some limitations, but it makes very good use of lambda expressions.

Besides the generic interface definitions, we focus on the following:


Aggregatable interface: whether aggregation is done directly by the data source or via lambda expressions, the Aggregatable interface must be implemented. Its main methods are:

queryAggData: gets the aggregated data set;
queryDimVals: gets the candidate filter values of a given dimension;
viewAggDataQuery: previews the query (e.g. the SQL after the aggregate columns are added);
separateNull: null-value handling, with a default implementation.
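As a rough sketch (signatures paraphrased as assumptions, not copied from the CBoard source), the contract looks something like this:

```java
// Stub types standing in for the real CBoard DTOs (illustrative only).
class AggConfig { }
class AggregateResult { }

// Hedged sketch of the Aggregatable contract; method names follow the list
// above, parameter lists and the null marker are assumptions.
interface Aggregatable {
    // gets the aggregated data set for one chart's config
    AggregateResult queryAggData(AggConfig config) throws Exception;

    // gets the candidate filter values of one dimension column
    String[] queryDimVals(String columnName, AggConfig config) throws Exception;

    // previews the query, e.g. the SQL after aggregate columns are added
    String viewAggDataQuery(AggConfig config) throws Exception;

    // null-value handling with a default implementation: map null to a marker
    default String separateNull(String value) {
        return value == null ? "(null)" : value;
    }
}
```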

AggregateCollector class: implements the Collector interface (the same one behind the Collectors we use every day) and defines the aggregation rules.

The matching collector from Collectors is returned according to the aggregation type (aggType):
private Collector newCollector(ColumnIndex columnIndex) {
    switch (columnIndex.getAggType()) {
        case "sum":
            return Collectors.summingDouble(this::toDouble);
        case "avg":
            return Collectors.averagingDouble(this::toDouble);
        case "max":
            return Collectors.maxBy(Comparator.comparingDouble(this::toDouble));
        case "min":
            return Collectors.minBy(Comparator.comparingDouble(this::toDouble));
        case "distinct":
            return new CardinalityCollector();
        default:
            return Collectors.counting();
    }
}

Perhaps, following the example of CardinalityCollector, this could be extended to support richer aggregation types and custom aggregation processing.
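CardinalityCollector is a good template for such extensions. A plausible re-implementation (the actual CBoard internals may differ) collects values into a HashSet and reports its size as the distinct count:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Illustrative distinct-count collector: accumulate values into a HashSet,
// then finish by reporting the set's size as a Double.
class CardinalityCollector implements Collector<String, Set<String>, Double> {
    @Override public Supplier<Set<String>> supplier() { return HashSet::new; }
    @Override public BiConsumer<Set<String>, String> accumulator() { return Set::add; }
    @Override public BinaryOperator<Set<String>> combiner() {
        return (a, b) -> { a.addAll(b); return a; }; // merge partial sets
    }
    @Override public Function<Set<String>, Double> finisher() {
        return s -> (double) s.size();              // cardinality
    }
    @Override public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }
}
```

Used as Stream.of("a", "b", "a").collect(new CardinalityCollector()), it yields 2.0.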


JvmAggregator class: based on the aggregation rules provided by AggregateCollector, it processes plain in-memory data sets and returns the data in its aggregated format.

public AggregateResult queryAggData(AggConfig config) throws Exception {
    String[][] data = rawDataCache.get(getCacheKey());
    // column definition, e.g. Map<String, Integer>
    Map<String, Integer> columnIndex = getColumnIndex(data);

    Filter rowFilter = new Filter(config, columnIndex);
    Stream<ColumnIndex> columns = config.getColumns().stream().map(ColumnIndex::fromDimensionConfig);
    Stream<ColumnIndex> rows = config.getRows().stream().map(ColumnIndex::fromDimensionConfig);
    List<ColumnIndex> valuesList = config.getValues().stream()
            .map(ColumnIndex::fromValueConfig).collect(Collectors.toList());
    List<ColumnIndex> dimensionList = Stream.concat(columns, rows).collect(Collectors.toList());
    dimensionList.forEach(e -> e.setIndex(columnIndex.get(e.getName())));
    valuesList.forEach(e -> e.setIndex(columnIndex.get(e.getName())));

    // one-shot grouping; dimensionList has no hierarchical relationship
    Map<Dimensions, Double[]> grouped = Arrays.stream(data).skip(1)
            .filter(rowFilter::filter)
            .collect(Collectors.groupingBy(row -> {
                String[] ds = dimensionList.stream().map(d -> row[d.getIndex()]).toArray(String[]::new);
                return new Dimensions(ds);
            }, AggregateCollector.getCollector(valuesList)));

    // TODO hierarchical aggregation: dimensionList preserving the hierarchy
    String[][] result = new String[grouped.keySet().size()][dimensionList.size() + valuesList.size()];
    int i = 0;
    for (Dimensions d : grouped.keySet()) {
        result[i++] = Stream.concat(Arrays.stream(d.dimensions),
                Arrays.stream(grouped.get(d)).map(e -> e.toString())).toArray(String[]::new);
    }

    // null value processing
    int dimSize = dimensionList.size();
    for (String[] row : result) {
        IntStream.range(0, dimSize).forEach(d -> {
            if (row[d] == null) row[d] = NULL_STRING;
        });
    }

    dimensionList.addAll(valuesList);
    // set the index of each column; another place the lambda style shines
    IntStream.range(0, dimensionList.size()).forEach(j -> dimensionList.get(j).setIndex(j));
    return new AggregateResult(dimensionList, result);
}

The aggregation process is roughly: filter the raw data (String[][]) -> compute the indicator columns (per the AggregateCollector rules) -> merge the dimension columns and indicator columns.
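To see that filter -> aggregate pipeline in isolation, here is a toy, self-contained example (not CBoard code) that filters String[][] rows, groups by one dimension column, and sums one indicator column with a downstream collector:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class LambdaAggDemo {
    // Sum the "amount" column (index 1) per "region" (index 0),
    // skipping the header row and filtering out unwanted rows.
    public static Map<String, Double> sumByRegion(String[][] data) {
        return Arrays.stream(data)
                .skip(1)                                   // skip the header row
                .filter(row -> !"excluded".equals(row[0])) // row filter
                .collect(Collectors.groupingBy(
                        row -> row[0],                     // dimension column
                        Collectors.summingDouble(r -> Double.parseDouble(r[1]))));
    }

    public static void main(String[] args) {
        String[][] data = {
                {"region", "amount"},
                {"east", "10"},
                {"east", "5"},
                {"west", "7"},
                {"excluded", "99"}
        };
        System.out.println(sumByRegion(data).get("east")); // 15.0
    }
}
```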

When aggregation is done through JvmAggregator, the data is fetched directly from the cache; the checkAndLoad() method has already loaded it before aggregation.

1.4. DataProvider class

The following methods are the main ones of interest:

doAggregationInDataSource: determines whether the current data source supports "data source aggregation";

getAggData: gets the aggregated data:

public final AggregateResult getAggData(AggConfig ac, boolean reload) throws Exception {
    evalValueExpression(ac);
    if (this instanceof Aggregatable && doAggregationInDataSource()) {
        return ((Aggregatable) this).queryAggData(ac);
    } else {
        checkAndLoad(reload);
        return innerAggregator.queryAggData(ac);
    }
}

evalValueExpression, evaluator, and getFilterValue: expression parsing. When a filter parameter is chosen as, say, a time period, these convert the expression into concrete values;

Reference: Com.googlecode.aviator.runtime.function.AbstractFunction

I once extended this to expose some system parameters for use as filter conditions;

getLockKey: generates the data cache ID;

Note the key.intern() used here, a very interesting trick;
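What makes key.intern() interesting: String.intern() returns the one canonical instance for a given character sequence, so two equal cache keys built independently can synchronize on the same monitor. A minimal illustration (the load method and its body are hypothetical, not CBoard code):

```java
public class InternLockDemo {
    // All callers holding an equal key contend on the same lock object,
    // because intern() maps equal contents to one canonical String instance.
    public static void load(String cacheKey) {
        synchronized (cacheKey.intern()) {
            // ... load and cache the data for this key ...
        }
    }

    public static void main(String[] args) {
        String a = new String("ds:42"); // distinct objects,
        String b = new String("ds:42"); // equal contents
        System.out.println(a == b);                   // false
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

Note that interning every key pollutes the global string pool, so it suits a bounded set of data-source keys better than arbitrary user input.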

configComp2DimConfigList: converts a given multi-dimensional definition into column data;

so the data after aggregation processing ends up in two-dimensional array format.

JdbcDataProvider (JDBC data source implementation)

JdbcDataProvider supports two aggregation modes, data source aggregation and lambda aggregation, which can be selected in the data source configuration.

Data source aggregation:

private BiFunction<ValueConfig, Map<String, Integer>, String> toSelect = (config, types) -> {
    String aggExp;
    if (config.getColumn().contains(" ")) {
        // the column is an expression: qualify each referenced column with __view__
        aggExp = config.getColumn();
        for (String column : types.keySet()) {
            aggExp = aggExp.replaceAll(" " + column + " ", " __view__." + column + " ");
        }
    } else {
        aggExp = "__view__." + config.getColumn();
    }
    switch (config.getAggType()) {
        case "sum":
            return "SUM(" + aggExp + ")";
        case "avg":
            return "AVG(" + aggExp + ")";
        case "max":
            return "MAX(" + aggExp + ")";
        case "min":
            return "MIN(" + aggExp + ")";
        case "distinct":
            return "COUNT(DISTINCT " + aggExp + ")";
        default:
            return "COUNT(" + aggExp + ")";
    }
};

That's right, this is SQL nesting: the result set of the original SQL query is aggregated by an outer query, and data filtering is implemented the same way.

It follows that the available aggregations are limited to what the database supports, and the generated SQL must stay generic (no Oracle ROLLUP and the like).

Lambda aggregation:

This is handled by the JvmAggregator described above.

2. Lambda aggregation implementation

3. Data source aggregation implementation (JDBC)

Both have already been covered above.

4. Summary

I have done little research on the other data sources, so I do not know how much of an advantage data source aggregation brings for other data source types; but in the JDBC case it basically meets the common aggregation requirements. As I mentioned earlier, though, the most elegant part is the JvmAggregator implementation, where you can see the strengths of lambda expressions in data processing. Lambda expressions can be a bit hard to read at first, but once you are familiar with the basic syntax you will find them easy to understand.
