The feather package: fast data frame reading and writing that you deserve to have

Source: Internet
Author: User

What is feather?

Feather is a file format for fast data storage and interchange between R and Python. It currently supports R data.frame objects and Python pandas DataFrame objects.

Feather is backed by the Apache Arrow project. Apache Arrow is a new open source, top-level project of the Apache Software Foundation, designed as a cross-platform data layer to speed up big data analytics workloads.

Features of Feather

Feather is a fast, lightweight, easy-to-use binary file format for storing data frames. The main features are as follows:

    • Lightweight, with a minimal API
    • Language-independent: supports Python and R, and files can also be read from other languages
    • High-performance reading and writing

Code Demo

My computer's configuration: Windows 7, 64-bit, 8 GB of memory, and a dual-core A6 CPU. Read and write times vary with hardware, so readers can run the code below and see for themselves.

The feather package was introduced on RStudio's official blog on March 29. Because it had only just been posted on GitHub, Windows users had to compile and install it with the GCC 4.9.3 toolchain, which was cumbersome. The feather package was officially released on CRAN today, so under R 3.3.0 we now only need to install it with install.packages(). Windows users who have not yet upgraded to R 3.3.0 can refer to the article "Hand in hand: upgrading R in a Windows environment". Let's see how fast the feather package is in R:

    library(feather)

    x <- runif(1e7)
    x[sample(1e7, 1e6)] <- NA  # 10% NA values
    df <- as.data.frame(replicate(10, x))

    # Memory usage
    format(object.size(df), "MB")
    # [1] "762.9 Mb"

    # Write the data out
    system.time(write_feather(df, "test.feather"))
    #  user  system elapsed
    #  3.97    3.37   29.47

    # Read the data back in
    system.time(read_feather("test.feather"))
    #  user  system elapsed
    #  3.83    3.51   50.39

    # Look at the first few rows
    data <- read_feather("test.feather")
    head(data)
    class(data)
    # [1] "tbl_df"     "tbl"        "data.frame"
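To check that nothing is lost on the way to disk and back, a round trip can be tried on a tiny data frame first. This is only a minimal sketch: it assumes the feather package is installed, and the data frame `small` and its columns are made up for illustration, not taken from the article.

```r
library(feather)

# Small illustrative data frame (hypothetical example data)
small <- data.frame(
  id    = 1:5,
  value = c(0.5, NA, 1.5, 2.5, NA),
  label = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back
path <- tempfile(fileext = ".feather")
write_feather(small, path)
back <- read_feather(path)

# read_feather() returns a tibble, so convert it before comparing
same <- isTRUE(all.equal(as.data.frame(back), small))
print(same)

# Only some columns can be read if the rest are not needed
subset_cols <- read_feather(path, columns = c("id", "value"))
print(names(subset_cols))

unlink(path)  # clean up the temporary file
```

Using tempfile() keeps the experiment out of the working directory, and reading only the needed columns can save time on wide data.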

Originally I intended to compare the speed of the feather package with that of the readr package, but my computer was not up to it: writing the data out with readr ran for nearly an hour without finishing, so I gave up. Readers interested in the readr package can refer to its introduction.
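On modest hardware, the comparison may still be feasible at a much smaller scale. The sketch below is one possible way to do it, not the author's benchmark: it assumes both feather and readr are installed, and the row count `n` of 1e5 is an arbitrary choice, far smaller than the 1e7 rows used above. Timings will differ from machine to machine.

```r
library(feather)
library(readr)

n <- 1e5                      # assumed size, much smaller than the article's 1e7
x <- runif(n)
x[sample(n, n / 10)] <- NA    # 10% NA values, as in the demo above
df_small <- as.data.frame(replicate(10, x))

feather_path <- tempfile(fileext = ".feather")
csv_path     <- tempfile(fileext = ".csv")

# Time each writer on the same data
t_feather <- system.time(write_feather(df_small, feather_path))
t_csv     <- system.time(write_csv(df_small, csv_path))

# Elapsed wall-clock seconds for each writer
elapsed <- c(feather = t_feather[["elapsed"]], readr_csv = t_csv[["elapsed"]])
print(elapsed)

unlink(c(feather_path, csv_path))  # clean up the temporary files
```

If the small run finishes quickly, `n` can be raised step by step until the times become long enough to compare meaningfully.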

Summary

Feather is fast, but it is still under development; its developers say it is not suitable for long-term storage and do not guarantee compatibility between versions. Still, it works well for exchanging data between R and Python. Importing 762.9 Mb of data took only 50.39 seconds, so the feather package is one you deserve to have.

Reference articles:

    • Feather: A Fast On-Disk Format for Data Frames for R and Python, powered by Apache Arrow
    • Feather: an on-disk storage format for exchanging data between R and Python

This article was compiled by the snow-clear data network. When reprinting, please cite this article's link: http://www.xueqing.tv/cms/article/210
