1. Spark: reading gzip (.gz) compressed files on HDFS
Spark 1.5 and later versions can read .gz files directly, with no difference from reading other plain-text files. Start the spark-shell and read the .gz file the same way you would read a plain-text file.
How to decompress a .gz file
# gzip -d xxx.gz
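A minimal round trip, assuming gzip is on PATH; the file name xxx.txt is only a placeholder. It shows that gzip -d (equivalent to gunzip) restores the original file and removes the .gz by default.

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small sample file (hypothetical name) and compress it
printf 'hello gzip\n' > xxx.txt
gzip xxx.txt            # produces xxx.txt.gz and removes xxx.txt

# Decompress it again; gzip -d behaves like gunzip
gzip -d xxx.txt.gz      # restores xxx.txt and removes xxx.txt.gz

cat xxx.txt             # prints: hello gzip
```

To keep the .gz file while decompressing, gzip -dc xxx.txt.gz > xxx.txt writes the decompressed data to standard output instead of replacing the archive.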
tar command
[root@linux ~]# tar [-cxtzjvfpPN] files and directories ...
Parameters:
-c: create a new archive (c stands for create);
-x: extract (unpack) an archive;
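A short sketch of the -c and -x options together with -z (gzip compression), -t (list) and -f (archive file name), assuming GNU tar or bsdtar; the directory and file names are hypothetical. Note that uppercase -C is a different option: it changes to a directory before extracting.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small directory to archive (hypothetical names)
mkdir -p logs
printf 'line 1\n' > logs/a.txt
printf 'line 2\n' > logs/b.txt

# -c create, -z compress with gzip, -f name the archive file
tar -czf logs.tar.gz logs

# -t list the archive contents without extracting
tar -tzf logs.tar.gz

# -x extract; -C switches into "restored" first
mkdir -p restored
tar -xzf logs.tar.gz -C restored

cat restored/logs/a.txt   # prints: line 1
```

Because -f takes the archive name as its argument, it is conventionally written last in a bundled option group such as -czf.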
Background: I need to generate a matrix of doubles of about 1.5 TB. The hard disk cannot hold it and the I/O time is too costly, so I tried compressing the data before writing it out. The matrix is generated in Java, and the computations on it are done in C++. I then tried to use
I previously wrote a script that extracts specific entries from a log file (a TXT file) and writes them to a MySQL database. Because the logs are too large, the maintenance staff packed and compressed them into tar.gz format.
Before that, a single TXT file was more than 2 GB,
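One way to keep such a script working on tar.gz logs without unpacking them to disk is to stream the member to standard output with tar's -O option. This is a sketch under assumptions (GNU tar or bsdtar; app.log and the ERROR pattern are hypothetical), not the original author's script.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample log packed into tar.gz (hypothetical name and contents)
printf 'INFO start\nERROR disk full\nINFO done\n' > app.log
tar -czf app.tar.gz app.log
rm app.log

# -O writes the extracted member to stdout instead of to disk,
# so a large log can be filtered without unpacking it first
tar -xzOf app.tar.gz app.log | grep 'ERROR' > errors.txt

cat errors.txt   # prints: ERROR disk full
```

For a single-file .gz log (not a tar archive), gzip -dc app.log.gz | grep 'ERROR' does the same job.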
Linux: decompress multiple .gz or .tar.gz files at once
Decompressing multiple archives
To decompress multiple .gz files, use this command:
for gz in *.gz; do gunzip "$gz"; done
To extract multiple .tar.gz files, use the following
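A minimal sketch of such a loop for .tar.gz files, assuming GNU tar or bsdtar (an illustrative version, not necessarily the exact command the original gives). A loop is needed because tar -xzf a.tar.gz b.tar.gz treats b.tar.gz as a member name inside a.tar.gz, not as a second archive.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Make two sample archives (hypothetical names)
for name in alpha beta; do
  mkdir "$name"
  printf '%s\n' "$name" > "$name/readme.txt"
  tar -czf "$name.tar.gz" "$name"
  rm -r "$name"
done

# Extract every .tar.gz in the current directory
for f in *.tar.gz; do
  [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
  tar -xzf "$f"             # quoting "$f" keeps names with spaces intact
done

cat alpha/readme.txt beta/readme.txt   # prints: alpha, then beta
```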
Managing packages with the rpm command on CentOS
1. RPM package management covers installation, upgrade, removal, query and verification, and database maintenance.
Installing RPM packages
The option parameters added after the rpm
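Since most rpm operations need root and an RPM-based system, the sketch below only prints the command line for each task listed above instead of running it. The helper name rpm_cmd and the package names are hypothetical; the rpm options themselves (-i, -U, -e, -q, -V, --rebuilddb) are standard.

```shell
# Print the rpm command line for a management task without running it.
# rpm_cmd is a hypothetical helper, not part of rpm.
rpm_cmd() {
  task=$1
  shift
  case "$task" in
    install)   echo "rpm -ivh $*" ;;     # -i install, -v verbose, -h hash-mark progress
    upgrade)   echo "rpm -Uvh $*" ;;     # -U upgrade (installs if not yet present)
    remove)    echo "rpm -e $*" ;;       # -e erase an installed package by name
    query)     echo "rpm -q $*" ;;       # -q query an installed package
    verify)    echo "rpm -V $*" ;;       # -V verify files against the rpm database
    rebuilddb) echo "rpm --rebuilddb" ;; # rebuild the rpm database indexes
    *)         echo "unknown task: $task" >&2; return 1 ;;
  esac
}

rpm_cmd install foo-1.0.rpm   # prints: rpm -ivh foo-1.0.rpm
rpm_cmd query foo             # prints: rpm -q foo
```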
1. Batch-creating folders and batch-reading folder names
Today I ran into a problem at work: my boss gave us the ID codes of more than 200 companies (such as 6007, 7920, etc.), and we need to search for and download news based on these IDs, so that the downloads go to
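A shell sketch of the two batch steps, assuming the IDs sit one per line in a text file (ids.txt is a hypothetical name); the original article's own approach is not shown here.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ids.txt: one company ID per line (assumed input format)
printf '6007\n7920\n' > ids.txt

# Batch-create one folder per ID, skipping blank lines
while IFS= read -r id; do
  if [ -n "$id" ]; then
    mkdir -p "$id"
  fi
done < ids.txt

# Batch-read the folder names back (the trailing / matches directories only)
for d in */; do
  echo "${d%/}"
done
```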