As for Python wrappers around the Hadoop APIs, the well-known music site Last.fm released the Python-based Dumbo ("little flying elephant") project.
Dumbo helps Python developers write Hadoop applications more conveniently and provides a flexible, easy-to-use Python API.
Klaas Bosteels, a Last.fm developer and the initiator of the Dumbo project,
believes that using Python instead of Java to write custom Hadoop applications makes the work more efficient.
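To give a feel for the style of API described here, the following is a minimal word-count sketch written as a plain Python mapper and reducer. The `run_locally` helper is a hypothetical stand-in added only so the example runs without a cluster; it is not part of Dumbo. With Dumbo installed, a job of this shape would typically be started with `dumbo.run(mapper, reducer)` instead.

```python
from collections import defaultdict


def mapper(key, value):
    # key: position of the line (unused here); value: one line of text
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values: every count emitted by the mapper for this word
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical local driver (not part of Dumbo): map, group by key, reduce."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper(offset, line):
            groups[k].append(v)
    result = {}
    for k in sorted(groups):
        for out_key, out_value in reducer(k, groups[k]):
            result[out_key] = out_value
    return result


counts = run_locally(["to be or not to be"])
print(counts)
```

The mapper and reducer are ordinary generator functions, which is what keeps this kind of job short compared with an equivalent Java class.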
Another similar project, Happy, makes it convenient for Jython developers to use the Hadoop framework.
Happy wraps Hadoop's complex calling process and makes map-reduce development easier.
In Happy, a map-reduce job is defined in a subclass of happy.HappyJob. After creating an instance of the class, the user sets the job's input and output parameters and then calls the run() method
to start the map-reduce processing. At that point the Happy framework serializes the user's job instance and copies the job and its dependent libraries to the Hadoop cluster for execution.
Happy is currently used by the data integration site freebase.com for site data mining and analysis.
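The workflow described above (subclass the job class, create an instance, set input and output, call run()) can be sketched as follows. This is only an illustration in plain Python: the `HappyJob` class below is a hypothetical in-process stand-in, not the real happy.HappyJob, and the attribute names `input_records` and `output` as well as the `map`/`reduce` signatures are assumptions made for this example. The real framework would serialize the job and ship it to the cluster rather than run it locally.

```python
from collections import defaultdict


class HappyJob(object):
    """Hypothetical local stand-in for the framework's job base class."""

    def __init__(self):
        self.input_records = []  # assumed name: records passed to map()
        self.output = {}         # assumed name: where reduce() results are stored

    def map(self, record):
        raise NotImplementedError

    def reduce(self, key, values):
        raise NotImplementedError

    def run(self):
        # The real framework would serialize this instance and copy it, with its
        # dependencies, to the Hadoop cluster; this sketch runs in-process.
        groups = defaultdict(list)
        for record in self.input_records:
            for key, value in self.map(record):
                groups[key].append(value)
        for key in sorted(groups):
            self.output[key] = self.reduce(key, groups[key])
        return self.output


class WordCount(HappyJob):
    def map(self, record):
        # Emit (word, 1) for every word in the input line
        for word in record.split():
            yield word, 1

    def reduce(self, key, values):
        # Sum the counts collected for one word
        return sum(values)


job = WordCount()                                   # create the job instance
job.input_records = ["map reduce map", "reduce map"]  # set the input
result = job.run()                                  # start the processing
print(result)
```

The point of the design is that the user only writes the subclass and the three lines at the bottom; everything about distributing and executing the job stays inside run().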