How to organize SQL Server Input Data

Source: Internet
Author: User

My current project contains a table with 62 million rows and more than 500 columns. The data comes from outside SQL Server. The tables they arrive at have a primary key, and all the remaining columns are saved in varchar (50) format.

Problem Solving

Many dates are saved in the YYYYMMDD format, which is the first problem I have noticed. I changed these dates to integer columns. Then I found that many integers are saved as varchar, and I changed them to integers.

Next, many columns contain "Y", "N", or NULL. We are likely to convert them into rankings, but there is a major defect in doing so-you cannot index a ranking. Therefore, I changed them to char (1) NULL.

In addition, some columns contain zip codes. The first column is five digits, and the last column is four digits or NULL. I changed them to char (5) and char (4), both of which are NULL.

Another problem I encountered was that I could not understand the content in a considerable number of columns. I process them separately, for varchar (50 ). I want to determine the maximum length of the string saved in each varchar column, but it is impractical to check all the data in row 62 million. Therefore, I choose to generate a char or varchar list for all columns.

       
        SELECTCOLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE FROMINFORMATION_SCHEMA.COLUMNSWHERE TABLE_NAME ='tblArthurClean'AND DATA_TYPE IN('char','varchar')
       

Copy the first column in The result table and paste it into notepad (this is my favorite text editor because it has excellent search-replace functions .) To determine the maximum data in a specific column, use the following script:

       
        SELECT Max(Len( ColumnName )) FROM TableName
       

In other words, I want to traverse the column list and generate the declaration I need. I concentrated the required content into some formatting rows and used a separate query to achieve this goal.

       
        SELECT',Max(Len( ' + COLUMN_NAME + ' )) AS ' + COLUMN_NAME FROMINFORMATION_SCHEMA.COLUMNSWHERE TABLE_NAME = 'tblArthurClean'AND DATA_TYPE IN( 'char', 'varchar' )
       

List A is A short result set. Perform the following steps:

       
        Max(Len( Record_Length_Indicator_or_Filler )) AS Record_Length_Indicator_or_Filler,   Max(Len( Job_Number )) AS Job_Number,    Max(Len( Personal_or_Firm_Name )) AS Personal_or_Firm_Name,    Max(Len( Mailing_Address_Supplied_by_Customer
       

Paste the result set into a text editor.

Insert SELECT before the first line.

Add the FROM and table names at the end of the file.

I add the following content at the beginning and end of the file.

       
        SELECT GetDate()GO
       

Then I delete the comma before the first line of the output result, save it as a query, load it, and run it again.

I am very satisfied with its performance on my server. It takes more than an hour to calculate the result. It is not too long to analyze a table with hundreds of rows and hundreds of columns.

The result is a result set of a row. I can use it as a guide for modifying the table structure and column name, and associate these results with the original table to make corresponding adjustments.

This article explains how to save a lot of input time. My motto is: "As long as SQL can be done, it should be done ."

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.