TalkingData recently open-sourced Fregata. Its main purpose is to speed up machine learning on Spark. It is said that on data at the 1 billion × 1 billion scale, training finishes in about 1 second if the data is cached in memory and in about 10 seconds if it is not. If that holds up, it is seriously impressive. What follows is only a translation; if anything is wrong, corrections are welcome.
Brief introduction
Fregata is a lightweight, super fast, large-scale machine learning library based on Spark, and it provides a high-level Scala API.
• High accuracy: on a variety of problems, Fregata achieves higher accuracy than MLlib.
• High speed: 10 to 100 times faster than MLlib. For linear model training on data at the 1 billion × 1 billion scale, it is said to finish in about 1 second with the data cached in memory, or about 10 seconds without caching.
• Parameter free: Fregata optimizes with GSA, so the learning rate does not need tuning; an appropriate rate is computed during training. For ultra-high-dimensional problems, Fregata dynamically checks the remaining memory to decide how sparse the output should be, balancing accuracy and speed automatically. These two features let Fregata serve as a standard data processing module for different problems.
• Lightweight: Fregata uses only Spark's standard API, so it can be integrated quickly and seamlessly into the Spark data processing pipelines of most companies.
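To make the "no learning rate to tune" point more concrete, here is a small sketch of the general idea as I understand it: derive a step size from each sample and use the running average of those steps as the learning rate. This is not Fregata's code. As far as I know GSA stands for Greedy Step Averaging, and this sketch swaps in plain squared loss, where the step that exactly minimizes one sample's loss along its gradient has the closed form 1 / ||x||². The names `GreedyStepSketch`, `greedyStep`, and `train` are hypothetical.

```scala
// Illustrative only: squared loss stands in for Fregata's models, and
// GreedyStepSketch / greedyStep / train are hypothetical names, not Fregata API.
object GreedyStepSketch {
  // Dot product of two dense vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Step size that exactly minimizes the squared loss of ONE sample along
  // its negative gradient: for loss 0.5 * (w.x - y)^2 this is 1 / ||x||^2.
  def greedyStep(x: Array[Double]): Double = {
    val n2 = dot(x, x)
    if (n2 > 0.0) 1.0 / n2 else 0.0
  }

  // SGD whose learning rate is the running average of the greedy steps
  // seen so far, so the caller never supplies a rate.
  def train(data: Seq[(Array[Double], Double)], dim: Int, epochs: Int): Array[Double] = {
    val w = new Array[Double](dim)
    var avgStep = 0.0
    var seen = 0L
    for (_ <- 1 to epochs; (x, y) <- data) {
      seen += 1
      avgStep += (greedyStep(x) - avgStep) / seen // incremental mean of greedy steps
      val err = dot(w, x) - y                     // gradient of the loss is err * x
      var i = 0
      while (i < dim) { w(i) -= avgStep * err * x(i); i += 1 }
    }
    w
  }
}
```

Calling `GreedyStepSketch.train(samples, dim, epochs)` with a sequence of (features, label) pairs returns the weight vector. Note that the only knobs are the data and the number of passes. A sketch this simple can diverge when sample norms differ widely, which is one reason to treat it as an illustration rather than a recipe.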
Architecture:
Current version 0.1
Core: implements the stand-alone versions of the algorithms based on GSA, covering classification, regression, and clustering; for now only binary and multiclass classification models are supported.
Spark: wraps core.jar to provide the corresponding machine learning algorithms on Spark.
Currently only Scala 2.10 is supported, together with Spark 1.x and 2.x.
I build my project with Maven; the dependencies are declared as follows:
<dependency>
    <groupId>com.talkingdata.fregata</groupId>
    <artifactId>core</artifactId>
    <version>0.0.1</version>
</dependency>
<dependency>
    <groupId>com.talkingdata.fregata</groupId>
    <artifactId>spark</artifactId>
    <version>0.0.1</version>
</dependency>
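For readers who build with sbt instead of Maven, the same two coordinates could be declared roughly as below. This is a sketch, not something I have verified against a repository: it assumes the artifacts are resolvable from a repository your build already knows about, and it uses a single `%` because the artifact IDs above carry no Scala version suffix.

```scala
// build.sbt fragment (assumption: artifacts are resolvable from a configured repository).
// If they were installed into the local Maven repository, also add:
// resolvers += Resolver.mavenLocal
libraryDependencies ++= Seq(
  "com.talkingdata.fregata" % "core"  % "0.0.1",
  "com.talkingdata.fregata" % "spark" % "0.0.1"
)
```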
The next post will move on to the implementation.