Teach you to sort out SQL Server input data

Source: Internet
Author: User
Tags sort

My current project includes a table with 62 million rows and more than 500 columns. Where the data comes from outside of SQL Server, they arrive in a table that identifies the primary key, and all remaining columns are saved in varchar (50).

Problem solving

There are many dates saved in YYYYMMDD format, which is the first question I have noticed. I'll change these dates to integer columns. Then I found that many integers were saved in varchar form, and I changed them to integers.

Next, many columns contain "Y" or "N" or null. We are likely to turn them into ranks, but there is a major flaw in doing so-you can't index a ranked one. So I changed them to char (1) NULL.

In addition, there are columns that contain a ZIP code, preceded by a column of five digits, followed by four digits or null. I changed them to char (5) and char (4), all null.

Another problem I encountered was that I could not understand the contents of a considerable portion of the column. I treated them separately, for varchar (50). I want to determine the maximum length of the string saved in each varchar column, but it is impractical to examine all the data in 62 million rows. So I chose to generate a char or varchar list of all the columns.

SELECT
COLUMN_NAME, DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='tblArthurClean'
AND DATA_TYPE IN('char','varchar')

I copied the first column in the results table and pasted it into Notepad (this is my favorite text editor because it has excellent find-and-replace features.) To determine the longest data in a particular column, use the following script:

SELECT Max(Len( ColumnName ))
FROM TableName

By extension, I want to traverse the column list and generate the declarations I need. I focus on a few formatting lines and use a separate query to do this.

SELECT
',Max(Len( ' + COLUMN_NAME + ' ))
AS ' + COLUMN_NAME
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tblArthurClean'
AND DATA_TYPE IN( 'char', 'varchar' )

A short result set in List A. Now follow these steps:

Max(Len( Record_Length_Indicator_or_Filler ))
AS Record_Length_Indicator_or_Filler, 
Max(Len( Job_Number )) AS Job_Number,  
Max(Len( Personal_or_Firm_Name ))
AS Personal_or_Firm_Name,  
Max(Len( Mailing_Address_Supplied_by_Customer

Paste the result set into a text editor.

Inserts a select before the first line.

Adds the From and table names at the end of the file.

I add the following at the beginning and end of the file.

SELECT GetDate()
GO

Then I delete the comma in front of the first line of the output result, save it as a query, load it and let it run.

I am very satisfied with the performance of it on my server. It only takes a little more than one hours to work out the results-not too long for the analysis of a 62 million-row hundreds of-column table.

The result is a row of result sets, which I can use as a guide to modifying the table structure and column names, and to relate these results to the original table and make adjustments accordingly.

This article explains how to take a step back and think about how you can save a lot of input time. My motto is: "As long as SQL can do it, it should be done." ”

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.