[Data cleansing] Cleaning "dirty" data in Pandas (3)
Preview Data
This time we use Artworks.csv and select the first 100 rows of data to work through this material. Procedure:
DataFrame is the built-in tabular data structure in Pandas, and it displays very quickly. With a DataFrame, we can quickly preview and analyze data. The code is as follows:
import pandas as pd
df = pd.read_csv('../data/Artworks.csv').head(100)
df.head(10)
Statistical date data
Let's take a closer look at the data i
Create a tbs_usage table on the data host to record how much space the database's data files use. tbs_timeid is the primary key of the table and serves as the id that uniquely identifies a tablespace of the database on a given day. tbs_timeid is built as df.tablespace_name || '-' || (sysdate).
1. The pansky user is responsible for daily management. Currently, it is mainly used to monitor tablespace data volume.
SQL> create user pansky identifi
Most people who do data analysis start with Excel, the most highly rated tool in the Microsoft Office suite. But when the amount of data is very large, Excel struggles. The third-party Python package pandas greatly extends what Excel can do. Getting started takes a little time, but it is an essential tool for working with large data.
1. Read data from a file
Pandas supports reading many data formats. The most common are Excel files, CSV files, and TXT files.
name
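As a minimal sketch of reading file data with pandas: the example below parses CSV and tab-separated text from in-memory strings so that it runs without any files on disk. The column names and values are made up for illustration and do not come from the original article.

```python
import io

import pandas as pd

# Hypothetical sample data; a real script would pass a file path instead.
csv_text = "name,score\nAnn,90\nBob,85\n"
tsv_text = "name\tscore\nAnn\t90\nBob\t85\n"

# read_csv accepts a path or any file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))

# Tab-separated text files use the same function with sep="\t".
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(csv_df.head())
print(csv_df.equals(tsv_df))
```

Excel workbooks are read in a similar way with pd.read_excel, which needs an Excel engine such as openpyxl to be installed.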
Basic operations:
Get the Spark version number at run time (Spark 2.0.0 in this example):
from pyspark.sql import SparkSession
sparksn = SparkSession.builder.appName("PythonSQL").getOrCreate()
print(sparksn.version)
Create and convert formats:
Pandas and Spark DataFrames can be converted to each other:
pandas_df = spark_df.toPandas()
spark_df = sqlContext.createDataFrame(pandas_df)
Convert to a Spark RDD:
rdd_df = df.rdd
For example, the current time is 13:31:40 and the past time is 11:30:24. Now I want to get the difference between the two dates in the form: XX days XX hours XX minutes XX seconds.
Method 1: Java
Code
DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
try
{
Date d1 = df.parse("13:31:40");
Date d2 = df.parse("11:30:24");
long diff = d1.getTime() - d2.getTime();
long days = d
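The same days/hours/minutes/seconds breakdown can be sketched in Python. This is an illustrative alternative, not the Java method from the snippet above: the function name format_diff and the full timestamps (including the dates) are assumptions chosen for the example, and the sketch assumes the first argument is the later time.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # same layout as yyyy-MM-dd HH:mm:ss


def format_diff(later: str, earlier: str) -> str:
    """Return the gap between two timestamps as days/hours/minutes/seconds."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    total = int(delta.total_seconds())
    # Peel off each unit in turn, carrying the remainder to the next one.
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} days {hours} hours {minutes} minutes {seconds} seconds"


# Hypothetical dates; only the clock times appear in the original text.
print(format_diff("2024-01-03 13:31:40", "2024-01-01 11:30:24"))
# -> 2 days 2 hours 1 minutes 16 seconds
```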
What is the following data format? How should I process it with PHP? Thank you!
i:5;a:10:{
i:2;a:2:{s:2:"df";s:1:"0";s:2:"da";s
Understanding of: The SQL Server 2005/2008 Database Engine manages a hierarchical collection of entities that can be secured with permissions. These entities are called "securables." The most prominent securables are servers and databases, but discrete permissions can be set at a finer level. SQL Server controls the actions that a principal performs on securables by verifying that the principal has been granted the appropriate permissions. The relationships among securables are as follows: Her
DataFrame data filtering: loc, iloc, ix, at, iat
Conditional filtering
Single-condition filter
Select records whose col1 value is greater than n: data[data['col1'] > n]
Filter for records whose col1 value is greater than n, but display only the col2 and col3 columns: data[['col2', 'col3']][data['col1'] > n]
Select specific rows: use the isin function to filter records based on specific values. Filter records whose col1 value equals an element in l
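The filters described above can be tried on a small frame. This is a minimal sketch: the sample values, the threshold n, and the list of wanted values are all invented for illustration. It uses .loc for the row-and-column selection, which is an alternative to the chained form shown above. Note that the ix indexer was removed in pandas 1.0, so loc and iloc are used instead in current versions.

```python
import pandas as pd

# Hypothetical sample data using the column names from the text.
data = pd.DataFrame({
    "col1": [1, 5, 8, 3],
    "col2": ["a", "b", "c", "d"],
    "col3": [10, 20, 30, 40],
})
n = 4

# Rows where col1 is greater than n.
greater = data[data["col1"] > n]

# Same row condition, but keep only col2 and col3.
subset = data.loc[data["col1"] > n, ["col2", "col3"]]

# Rows whose col1 value appears in a list of wanted values.
wanted = [1, 3]
picked = data[data["col1"].isin(wanted)]

print(greater)
print(subset)
print(picked)
```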