Common command notes
Common Python operations:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Fix font problems when plotting
from pylab import mpl
mpl.rcParams['font.sans-serif'] = ['FangSong']  # specify the default font
mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from being rendered as a square in saved images
# Show all rows and columns
import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
# Output every expression in a cell, not only the last one (IPython/Jupyter)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# Rotate the x-axis tick labels
import pylab as pl
pl.xticks(rotation=45)
# Keep only the rows whose string consists entirely of digits
data1['T'] = data1['atx'].str.isdigit()
data1 = data1[data1['T'] == True]
# Connect to the ipvs database and read data
import psycopg2
conn = psycopg2.connect(database="", user="", password="", host="", port="")
cur = conn.cursor()
cur.execute("select title from \"fd_content_interaction\";")
rows = cur.fetchone()  # fetchone(), fetchall(), or fetchmany(size) determine how many rows are fetched
print(rows)
conn.commit()
cur.close()
conn.close()  # Close the connection because the database limits the number of concurrent connections. Rows arrive in Python as tuples, and a JSON array value is converted to dict form inside the tuple.
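The connect, query, fetch, close sequence above can be tried without a PostgreSQL server. The sketch below swaps in Python's built-in sqlite3 module as a stand-in for psycopg2, purely so that it runs anywhere; the sample rows and the helper name fetch_titles are hypothetical, and only the table name comes from the notes.

```python
import sqlite3
from contextlib import closing


def fetch_titles(limit=2):
    """Open a connection, run a query, and always close the connection."""
    # sqlite3 stands in for psycopg2 here; ":memory:" is a throwaway database.
    with closing(sqlite3.connect(":memory:")) as conn:
        cur = conn.cursor()
        # Hypothetical sample data so the query has something to return.
        cur.execute("create table fd_content_interaction (title text)")
        cur.executemany(
            "insert into fd_content_interaction values (?)",
            [("first",), ("second",), ("third",)],
        )
        conn.commit()

        cur.execute("select title from fd_content_interaction order by rowid")
        one = cur.fetchone()         # a single row, as a tuple
        some = cur.fetchmany(limit)  # up to `limit` further rows
        rest = cur.fetchall()        # every remaining row
        cur.close()
    return one, some, rest


print(fetch_titles())
```

Wrapping the connection in closing() guarantees that it is released even if the query raises, which matters when the server caps concurrent connections.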
# Filter out empty strings, 0 and None
a = list(filter(None, list1))  # empty values, 0 and None all evaluate to False
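A small sketch of what filter(None, ...) keeps and drops. The helper name drop_falsy and the sample list are made up for illustration.

```python
def drop_falsy(values):
    """Return the items of `values` that are truthy."""
    # filter(None, ...) keeps an item only when bool(item) is True.
    return list(filter(None, values))


sample = ["a", "", 0, None, 3, [], "0", False]
print(drop_falsy(sample))
```

The string "0" survives because any non-empty string is truthy. If real zeros must be kept, filtering on `x is not None` is probably the safer choice.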
# Generate a time series
pd.date_range('20140901', periods=10)  # 10 consecutive days starting on 2014-09-01
# Open a file
with open("/tmp/foo.txt") as file:
    data = file.read()
# Apply a function to several columns of each row and store the result in a new column
def my_test(a, b):
    return a + b
df['value'] = df.apply(lambda row: my_test(row['c1'], row['c2']), axis=1)
Common Hive operations:
hive
hive> show tables;  -- list the tables
hive> show databases;  -- list the databases; each database contains multiple tables
hive> use brain;  -- switch to the database, then run show tables; to view its tables
hive> show tables;
hive> set hive.cli.print.header=true;  -- display the table header, that is, the column names
hive> select * from ttc_show where ttc_show.date_col = 20180601 and ttc_show.app_id = 'abcdef' and ttc_show.alg = '["related"]' limit 20;
Common SQL:
select name, country from websites;
select distinct country from websites;  -- return the values of country with duplicates removed
select * from websites where country = 'CN';  -- mind single vs. double quotation marks: the name after "from" takes double quotation marks, the comparison after "where" is a single equals sign, text values go in single quotation marks, and numeric values need no quotation marks
select * from emp where sal > 2000 or comm > 500;  -- "or"; use "and" to require both conditions
select * from emp where comm is null;
select * from emp where sal between 1500 and 3000;  -- between works on numeric and string types; strings are ordered by their first letter
select * from emp where sal in (1500);
select * from websites where alexa > 15 and (country = 'CN' or country = 'USA');
select * from websites order by alexa;  -- sort by a column. With several columns, rows are sorted by the first column, then by the second.
insert into websites (name, url, alexa, country) values ('Baidu', 'https://www.baidu.com/', '4', 'CN');  -- the id column is filled in automatically and need not be supplied
update websites set alexa = '20140901', country = 'USA' where name = 'cainiao tutorial';  -- update overwrites data. Take great care with an update that has no where clause: without one, the column is changed in every row.
delete from websites where name = 'Baidu' and country = 'CN';
select * from websites where name like 'G%';  -- fuzzy match: names starting with G
select * from websites where name like '%k';  -- fuzzy match: names ending with k
select * from websites where name like '%oo%';  -- names containing "oo"
select * from websites where name not like '%oo%';  -- names not containing "oo"
select * from username where username like 'segment_%';  -- matches "segment" followed by at least one character
Pattern summary:
'%a'   -- data ending with a
'a%'   -- data starting with a
'%a%'  -- data containing a
'_a_'  -- three characters with a in the middle
'_a'   -- two characters ending with a
'a_'   -- two characters starting with a
select * from websites where name regexp '^[GFS]';  -- names starting with "G", "F", or "S"
select * from websites where name regexp '^[A-H]';  -- names starting with a letter from A to H
select * from websites where name in ('Google', 'cainiao tutorial');  -- websites named "Google" or "cainiao tutorial"
select * from websites where alexa not between 1 and 20;
select column_name(s)
from table1 inner join table2 on table1.column_name = table2.column_name;  -- return the rows whose column values match in both tables
A left join (right join) keeps every row of the left (right) table even when the other table has no matching row; in that case the columns from the other table are null.
Note: string values must be written in single quotation marks; double quotation marks are for names such as table names.
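The like patterns and the left join behaviour described above can be checked with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for whichever database the notes were written against, the visits table and all rows are hypothetical, and SQLite's like is case-insensitive for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table websites (id integer primary key, name text, country text);
    create table visits (site_id integer, hits integer);  -- hypothetical table
    insert into websites (name, country) values
        ('Google', 'USA'), ('Facebook', 'USA'), ('Taobao', 'CN');
    insert into visits values (1, 45), (3, 10);
    """
)


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


starts_with_g = names("select name from websites where name like 'G%' order by id")
contains_oo = names("select name from websites where name like '%oo%' order by id")
# '_a%' means: any one character, then 'a', then anything.
second_is_a = names("select name from websites where name like '_a%' order by id")

# The left join keeps every website; Facebook has no visits row, so hits is
# NULL, which sqlite3 returns as None.
left_join = conn.execute(
    """
    select w.name, v.hits
    from websites w left join visits v on w.id = v.site_id
    order by w.id
    """
).fetchall()

print(starts_with_g, contains_oo, second_is_a, left_join)
conn.close()
```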
Common Linux commands:
Run man followed by a command name to view the documentation for that command.
u: undo the last operation (vim)
cat: print the contents of a file
cat: merge files
$ cat file1.txt file2.txt > file.txt    (any number of files can be read)
Use >> to append a text stream to the end of another file (a single > overwrites the file)
$ cat file1.txt >> file2.txt
grep: search by condition; use | to pipe another command's output into it
dd: delete the current line (vim)
d5d: delete 5 lines (vim)
cp file dir: copy the file to the new folder
cp -r dir newdir: copy the folder and all files in it to the new directory
mv file file2: move the file and name it file2
hadoop fs -put dir1 dir2: upload the data in dir1 to dir2 (on HDFS)
View file contents on Hadoop:
hadoop fs -ls /dir/file
hadoop fs -cat /dir/file | head -10    (compressed files must be decompressed first; otherwise the output is garbled)
Download a file from HDFS to the local machine: hadoop fs -get
Differences between ``, $, $(), and ${} in Linux:
$() is command substitution and is equivalent to backquotes ``. For example, todaydate=$(date +%Y%m%d) runs the date command and assigns its output to the variable todaydate; it can also be written as todaydate=`date +%Y%m%d`.
${} holds a variable name. For example, echo ${PATH} takes the value of the PATH variable and prints it. The braces can be omitted, as in $PATH.
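For comparison, the same two ideas (capturing a command's output and reading an environment variable) look roughly like this in Python. The sketch assumes a Unix-like system where the date command is available; command_output is a hypothetical helper name.

```python
import os
import subprocess


def command_output(args):
    """Run a command and return its standard output without the trailing newline."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Equivalent of: todaydate=$(date +%Y%m%d)
todaydate = command_output(["date", "+%Y%m%d"])

# Equivalent of: echo ${PATH}
path_value = os.environ.get("PATH", "")

print(todaydate)
print(path_value)
```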
Decompress a file: tar -zxvf date0000.tar.gz extracts the archive into the current folder, under the same name as in the tar file.
Compress a folder: tar -zcvf finaldata.tar.gz finaldata/
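When the same compress and extract steps have to happen inside a Python script, the standard tarfile module covers both. A minimal sketch; the helper names and the temporary file layout are made up for the round-trip demonstration.

```python
import os
import tarfile
import tempfile


def compress_dir(src_dir, archive_path):
    """Create a gzip-compressed tar archive, like `tar -zcvf archive src_dir/`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the folder name inside the archive, not the full path.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))


def extract_archive(archive_path, dest_dir):
    """Unpack the archive into dest_dir, like `tar -zxvf archive -C dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


# Round trip in a temporary directory (the file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "finaldata")
    os.mkdir(src)
    with open(os.path.join(src, "part.txt"), "w") as f:
        f.write("hello")
    archive = os.path.join(tmp, "finaldata.tar.gz")
    compress_dir(src, archive)
    out = os.path.join(tmp, "out")
    extract_archive(archive, out)
    with open(os.path.join(out, "finaldata", "part.txt")) as f:
        restored = f.read()

print(restored)
```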
hadoop fs -getmerge can only write to a local folder; put the merged file back to HDFS afterwards.
View a running program: ps -ef | grep ks_collect_data
Firedata 30458 88548 0 00:00:00 pts/10 sh ks_collect_data.sh 20180813 20180826
Kill this task: $ kill -9 30458
Kill a task on the cluster: hadoop job -kill job_1524660075674
Copy a file between servers with scp: scp 211.100.28.190:/dir/file /dir copies the remote file into the local /dir (swap the two arguments to upload instead), then enter the password.
Loop over multiple folders:
for file in /home/hustyangju/*
do
    echo "$file"    # placeholder: run the operation on each entry here
done
Ctrl+C: stop the running program. Ctrl+Z: suspend the task without finishing it. A suspended task can be resumed with fg or bg: fg brings it back to the foreground, and bg continues it in the background.
Download a file from the remote server to the local machine: sz /dir/file, or cd to the folder and run sz file.
Log: /dir/test.sh > /dir/test.log 2>&1 &
nohup java -jar demo2.jar > test.out 2>&1    (the program keeps running after the account logs out or the terminal is closed)
Large files: if debugging interactively on the remote server is troublesome, run tail -n 10000 file > file_sample, download the sample to Windows with sz file_sample, debug the program there, then upload it to the server to run.
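The same sampling trick can be done from Python when tail is not at hand. A minimal sketch: tail_lines is a hypothetical helper, and the 100-row file is generated only for the demonstration.

```python
import os
import tempfile
from collections import deque


def tail_lines(path, n):
    """Return the last n lines of a text file, like `tail -n`."""
    with open(path, encoding="utf-8") as f:
        # A deque with maxlen keeps only the most recent n lines in memory.
        return list(deque(f, maxlen=n))


with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "file")
    with open(big, "w", encoding="utf-8") as f:
        f.writelines(f"row {i}\n" for i in range(100))
    sample = tail_lines(big, 3)

print(sample)
```

Because the file is read line by line, memory use stays bounded by n lines even for very large inputs.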
wc: counts the bytes, words, and lines in the specified file and prints the result. -c: number of bytes, -l: number of lines, -w: number of words, -m: number of characters.
When options are combined, the order of the output columns is fixed and does not depend on the order in which the options are given. Output order: lines, words, characters, bytes, file name.
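To make the difference between the four counts concrete, here is a rough Python equivalent. It is a sketch that assumes UTF-8 input and whitespace-separated words; the function name wc and the sample text are illustrative only.

```python
def wc(data: bytes):
    """Return (lines, words, characters, bytes) for UTF-8 encoded data."""
    text = data.decode("utf-8")
    lines = text.count("\n")   # like wc -l: counts newline characters
    words = len(text.split())  # like wc -w: whitespace-separated words
    chars = len(text)          # like wc -m: characters
    nbytes = len(data)         # like wc -c: bytes
    return lines, words, chars, nbytes


print(wc("héllo world\nsecond line\n".encode("utf-8")))
```

The character and byte counts differ here because "é" is one character but two bytes in UTF-8.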
tail -f file: follow the file to check whether data is still being written
crontab -l lists the scheduled jobs; crontab -e opens them for editing in an editor such as vim.
du -h filename: view the size of filename; ll -h: view details of all files in the folder
Time: date=`date -d "+1 day $startdate" +%Y%m%d`
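The same "start date plus one day" computation can be written in Python with the standard datetime module. A minimal sketch; next_day is a hypothetical helper name, and the input is assumed to be in YYYYMMDD form as in the shell line above.

```python
from datetime import datetime, timedelta


def next_day(startdate):
    """Return the day after `startdate`, both formatted as YYYYMMDD."""
    day = datetime.strptime(startdate, "%Y%m%d") + timedelta(days=1)
    return day.strftime("%Y%m%d")


print(next_day("20180813"))  # 20180814
print(next_day("20181231"))  # 20190101
```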
Computer Knowledge:
Encoding:
In memory, text is handled uniformly as Unicode; when it has to be saved to disk or transmitted, it is converted to UTF-8.
When a file is edited in Notepad, the UTF-8 characters read from the file are converted to Unicode in memory; when editing is finished and the file is saved, the Unicode is converted back to UTF-8.
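Python 3 makes this split visible: str holds Unicode text in memory, and bytes holds the encoded form that goes to disk or over the network. A short sketch with a made-up sample string:

```python
text = "中文 abc"                    # str: Unicode text in memory
encoded = text.encode("utf-8")      # bytes: the form written to disk or sent over a network
decoded = encoded.decode("utf-8")   # back to str when the data is read again

print(len(text), len(encoded), decoded == text)
```

The two lengths differ because each Chinese character takes three bytes in UTF-8 while each ASCII character takes one.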
An IP address is made up of numbers and is hard to remember, so domain names exist: the IP address can be looked up from the domain name.
http://zhidao.baidu.com is a domain name; one IP address can have multiple domain names.
http://168.103.123.465 is an IP address.
Linux:
SSH is short for Secure Shell, a protocol that provides security for remote login sessions and other network services. Xshell supports the SSH protocol and allows remote access.
Git Bash is a command-line tool for Windows that emulates a terminal so that git commands are convenient to use on Windows. Git Shell is a shell installed with git, and bash is a shell.