Disk Space Usage and SQL Server Performance

by Gregory A. Larsen

When you think of SQL Server performance components, you probably think of the CPU, memory, and I/O it takes to process a query. However, there is another component you should consider: disk space usage. In the old days disk space was expensive, so much so that people went to great lengths to conserve it. Remember the Y2K issues with dates being stored in a 6-character field? Early programmers left off the first two digits of the year to save 2 bytes of space on every stored date. Today disk drives are inexpensive, so we don't spend much time thinking of ways to optimize our database designs to minimize disk space usage. Nevertheless, every extra byte of space you waste in your database takes a toll on your application's performance. This article looks at disk space usage and how it affects performance.

Performance Impacts of Disk Space Usage

To discuss the performance impacts of disk space usage, we need to review what it takes to read data from and write data to disk drives. Each time you read a piece of data, SQL Server needs to retrieve the information from disk. This retrieval causes a disk I/O. Data in SQL Server is stored in a number of physical pages. Each page is 8 KB in size; for the calculations below, assume 8,060 bytes of each page are available for row data. For every page of data, SQL Server requires one I/O to retrieve that data.

To better understand how I/O can impact performance, let's consider how many I/Os it would take to retrieve 10,000,000 records from a SQL Server table. Say each record is 300 bytes long. This means you can store 26 records per page. The entire 10,000,000 records would require 384,616 data pages just to store the raw data; this doesn't take into account space for indexes. So to read every record in this 10-million-record table, it would take 384,616 I/Os.

Now say I saved just 2 bytes of data from every record, making each record 298 bytes long. This would mean you could store 27 records per SQL Server page. With this 2-byte savings, you can now retrieve one more record with each I/O operation. The table now fits in 370,371 pages, so the total I/O savings if you read the entire 10,000,000-record table would be 14,245 I/Os. That is a lot of I/O to save for only 2 bytes of space per record.
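
The page arithmetic in this example can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes, as the article does, that 8,060 bytes of each page hold row data, that records never span pages, and that a full scan costs one I/O per page.

```python
USABLE_BYTES_PER_PAGE = 8060  # the article's figure for row data per page


def rows_per_page(record_bytes: int) -> int:
    """Whole records that fit on one page (records do not span pages)."""
    return USABLE_BYTES_PER_PAGE // record_bytes


def pages_needed(record_count: int, record_bytes: int) -> int:
    """Data pages (and so full-scan I/Os) needed to hold the table."""
    per_page = rows_per_page(record_bytes)
    # Integer ceiling division: a partly filled last page still counts.
    return -(-record_count // per_page)


if __name__ == "__main__":
    records = 10_000_000
    before = pages_needed(records, 300)  # 300-byte records
    after = pages_needed(records, 298)   # 298-byte records
    print(before, after, before - after)
```

For 300-byte and 298-byte records this gives 384,616 and 370,371 pages, a difference of 14,245. Note that the savings appear only because the smaller record lets a 27th row fit on each page; shaving 2 bytes off a 310-byte record would leave the page count unchanged.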

So, each time you can save a few bytes of data from each record stored in a SQL Server table, you improve your performance. The larger the table, the bigger the performance gains you will see. Therefore, you want to try to minimize your record size so you can maximize the number of records that can be stored in each data page.

I/O is not your only savings when you minimize the space it takes to store your data. Keep in mind that each page that is read first needs to be stored in the buffer pool. The smaller the records, the more records you can fit into a single buffer pool page. Therefore, by conserving disk space for storing your data, you are also conserving the amount of memory you will need when reading your data.
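
To put a rough number on the memory side, the sketch below converts page counts into buffer pool bytes. It is a simplified illustration, assuming every page of the table is cached at once and that each cached page occupies a full 8 KB; the function names are hypothetical, and the page counts are the ones from the 10,000,000-record example.

```python
PAGE_BYTES = 8 * 1024  # assumed size of one cached page in the buffer pool


def buffer_pool_bytes(page_count: int) -> int:
    """Memory needed to hold the given number of pages in the buffer pool."""
    return page_count * PAGE_BYTES


def to_mib(byte_count: int) -> float:
    """Convert bytes to mebibytes for easier reading."""
    return byte_count / (1024 * 1024)


if __name__ == "__main__":
    wide = buffer_pool_bytes(384_616)    # 300-byte records
    narrow = buffer_pool_bytes(370_371)  # 298-byte records
    print(f"{to_mib(wide - narrow):.1f} MiB saved")
```

Under these assumptions the 14,245 pages saved correspond to about 111 MiB of buffer pool that can be used to cache other data.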

Using Data Types to Minimize Disk Space Usage

When selecting a data type for a particular column, you need to make sure you select the right one. It's easy to pick the wrong data type and waste some disk space. Therefore, you need to be careful and make sure you select a data type that meets your data requirements and also minimizes the amount of disk space required to store each column. I'll review the different data types and discuss space considerations for each.

First, let me talk about Unicode data types. The Unicode data types are NVARCHAR, NCHAR, and NTEXT. Unicode data types require 2 bytes to store every character, whereas non-Unicode data types like VARCHAR, CHAR, and TEXT take only one byte per character. The non-Unicode data types can only store 256 different characters. With Unicode data types, you can store up to 65,536 different 2-byte patterns. Because of the limit on the number of unique characters that can be stored using non-Unicode data types, the hexadecimal representation of a particular character is not the same across different code pages. When you use Unicode data types, the character representation for commonly used characters is the same across code pages. Unicode data types are typically used for international applications. If your application does not need to be supported internationally, then you should consider just using the VARCHAR, CHAR, and TEXT data types, provided the characters your application uses can be represented using the 1-byte, 256-character set. By using the non-Unicode data types, you will use half the disk space for each character-based column. If you store lots of character data, your disk space savings from using non-Unicode characters could be substantial. I converted one of our databases from using all Unicode data types to using all non-Unicode data types and realized a 40% reduction in disk space consumption. This kind of disk space savings can provide drastic performance gains over using Unicode data types. The performance improvement comes from maximizing the number of records that can be stored in a single SQL Server page.
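
The two points in this paragraph, the 2-to-1 size difference and the code page dependence of single-byte characters, can be illustrated outside the database. The Python sketch below is only an analogy: it uses the UTF-16 little-endian codec as a stand-in for Unicode column storage and the cp1252 and cp437 codecs as example single-byte code pages. Those choices and the function names are assumptions made for illustration.

```python
def storage_bytes(text: str, unicode_column: bool) -> int:
    """Bytes needed for the characters alone (no row or length overhead)."""
    # utf-16-le uses 2 bytes for every character in the Basic Multilingual
    # Plane; cp1252 is one example of a 1-byte, 256-character code page.
    encoding = "utf-16-le" if unicode_column else "cp1252"
    return len(text.encode(encoding))


def decode_byte(value: int, code_page: str) -> str:
    """Interpret a single byte value under the given code page."""
    return bytes([value]).decode(code_page)


if __name__ == "__main__":
    print(storage_bytes("SQL Server", True), storage_bytes("SQL Server", False))
    # The same byte value maps to different characters in different code pages.
    print(decode_byte(0xE9, "cp1252"), decode_byte(0xE9, "cp437"))
```

The ten-character string takes 20 bytes in the 2-byte encoding and 10 bytes in the single-byte one. The byte 0xE9 decodes to a different character under each code page, while a plain ASCII byte such as 0x41 decodes to "A" under both.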

Something else to consider when storing character data is your use of the CHAR and VARCHAR data types. CHAR is a fixed-length data type. If you define a column as CHAR(n), it always stores n bytes of data. So if you populate a CHAR(n) column with the value "ABC", SQL Server stores "ABC" followed by n - 3 spaces. A VARCHAR column, on the other hand, is a variable-length column. If you create a column as VARCHAR(n) and populate it with a value of "ABC", only 5 bytes are stored: 2 bytes for the length of the data, plus 3 bytes for the value "ABC". If you have sparsely populated character columns, storing those columns as VARCHAR will help reduce the disk space they use.
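
The difference can be sketched as a small model. The Python below is a simplified illustration, not SQL Server's actual row format: it assumes single-byte characters and counts only the 2-byte length the article attributes to VARCHAR. The function names and the declared length of 50 used in the demonstration are hypothetical.

```python
VARCHAR_LENGTH_BYTES = 2  # per the article: length stored alongside the data


def char_stored_value(value: str, declared_length: int) -> str:
    """A fixed-length column pads the value with trailing spaces."""
    return value.ljust(declared_length)


def char_bytes(declared_length: int) -> int:
    """A fixed-length column always uses its full declared length."""
    return declared_length


def varchar_bytes(value: str) -> int:
    """A variable-length column uses the data length plus the length field."""
    return VARCHAR_LENGTH_BYTES + len(value)


if __name__ == "__main__":
    declared = 50  # hypothetical declared column length
    print(repr(char_stored_value("ABC", declared)))
    print(char_bytes(declared), varchar_bytes("ABC"))
```

With a declared length of 50, the fixed-length column uses 50 bytes for "ABC" (the value plus 47 spaces) while the variable-length column uses 5. The model also shows where VARCHAR stops paying off: once values fill nearly the whole declared length, the 2-byte length field makes VARCHAR slightly larger than CHAR.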

Now let's consider how integer data types are stored. There are 4 different integer data types: TINYINT, SMALLINT, INT, and BIGINT. Each of these data types requires a different number of bytes to store its value. TINYINT takes 1 byte and supports values from 0 to 255. SMALLINT takes 2 bytes and supports values from -32,768 to 32,767. INT takes 4 bytes and can handle values from -2,147,483,648 to 2,147,483,647. BIGINT takes 8 bytes and can store values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

When defining a column to store an integer value, you need to consider all the possible values that your application might require. If your column will represent a small set of ten or so different values, say from 0 to 9, then you should use a TINYINT column. If you used one of the other integer data types, you would just be wasting storage space.

It is common practice to have an ID column in a table defined as an identity column. When creating identity columns, consider the maximum number of rows you expect in the table when determining the data type. If you aren't going to have more than 32,767 rows, then you should select SMALLINT over INT for your identity column to save space. People commonly define all identity columns as INT even when the table is expected to have only a few rows. When declaring the data type for an identity column, make sure you select the appropriate integer data type.
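
The selection rule for integer columns can be written down as a lookup over the four ranges listed earlier. The Python sketch below is a hypothetical helper for reasoning about the choice, not part of SQL Server; it returns the narrowest type whose range covers the values you expect.

```python
# (type name, storage bytes, minimum value, maximum value)
INTEGER_TYPES = [
    ("TINYINT", 1, 0, 255),
    ("SMALLINT", 2, -32_768, 32_767),
    ("INT", 4, -2_147_483_648, 2_147_483_647),
    ("BIGINT", 8, -9_223_372_036_854_775_808, 9_223_372_036_854_775_807),
]


def smallest_integer_type(low: int, high: int):
    """Return (name, bytes) of the narrowest type covering low..high."""
    for name, size, type_min, type_max in INTEGER_TYPES:
        if type_min <= low and high <= type_max:
            return name, size
    raise ValueError("range does not fit in any integer data type")


if __name__ == "__main__":
    print(smallest_integer_type(0, 9))        # a ten-value status code
    print(smallest_integer_type(1, 32_767))   # a small identity column
    print(smallest_integer_type(1, 100_000))  # a larger identity column
```

Note that TINYINT has no negative range, so a column that must hold even one negative value moves up to SMALLINT. When sizing an identity column, pass the largest row count you ever expect rather than the current one.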

Another data type to consider for saving space is the BIT data type. The BIT data type stores a column value of 1 or 0, which makes it ideal for storing true/false conditions. Since the smallest unit of storage is a byte, a single byte of storage can hold up to 8 different BIT columns. If you have true/false or 0/1 conditions, remember to use the BIT data type instead of one of the integer data types. If you have only a single BIT column in your table, then defining it as BIT or TINYINT uses the same amount of disk storage, because storing a single BIT column still requires 1 byte. Your space savings over TINYINT come into play when you have multiple BIT columns in a single table.
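
The packing rule described here, up to 8 BIT columns per byte, amounts to a ceiling division. The short Python sketch below compares it with using one TINYINT per flag; the function names are hypothetical.

```python
def bit_columns_bytes(column_count: int) -> int:
    """Bytes per row for BIT columns, packed up to 8 per byte."""
    return (column_count + 7) // 8


def tinyint_columns_bytes(column_count: int) -> int:
    """Bytes per row if each flag is stored as its own TINYINT."""
    return column_count


if __name__ == "__main__":
    for flags in (1, 8, 9, 20):
        print(flags, bit_columns_bytes(flags), tinyint_columns_bytes(flags))
```

One flag costs 1 byte either way, which matches the single-column case in the text. Eight flags still cost 1 byte as BIT columns versus 8 bytes as TINYINT columns, and a ninth flag starts a second byte.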

The DATETIME data type is another commonly misused data type that wastes space in your database. There are two different date data types: SMALLDATETIME and DATETIME. Like the other data types, these two date types require a different number of bytes to store their values. SMALLDATETIME requires 4 bytes of storage and stores dates from January 1, 1900 through June 6, 2079, with a time portion that is accurate only to the minute. DATETIME takes 8 bytes to store its value and supports dates from January 1, 1753 through December 31, 9999, with a time portion accurate to about 3 milliseconds. If your application only needs to store the current date or a future date that does not go beyond the year 2079, and you don't need a time portion, then SMALLDATETIME is the correct data type to use. Most applications that deal with dates can save space by using SMALLDATETIME, provided the application doesn't need time precision finer than one minute. Any time your application needs to store time down to the second, or has dates prior to 1900 or beyond 2079, DATETIME must be used.
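
The decision between the two date types comes down to a range check and a precision check. The Python sketch below encodes the limits quoted in this paragraph; the helper names are hypothetical, and it does not validate the lower bound of the 8-byte type (the year 1753).

```python
from datetime import datetime

# Limits of the 4-byte type as quoted in the article (minute accuracy).
SMALLDATETIME_MIN = datetime(1900, 1, 1, 0, 0)
SMALLDATETIME_MAX = datetime(2079, 6, 6, 23, 59)


def fits_smalldatetime(value: datetime) -> bool:
    """True if the value is in range and needs no sub-minute precision."""
    in_range = SMALLDATETIME_MIN <= value <= SMALLDATETIME_MAX
    whole_minute = value.second == 0 and value.microsecond == 0
    return in_range and whole_minute


def date_type_for(values):
    """Return (type name, bytes) for the narrowest type that fits all values."""
    if all(fits_smalldatetime(v) for v in values):
        return "SMALLDATETIME", 4
    return "DATETIME", 8


if __name__ == "__main__":
    print(date_type_for([datetime(2024, 5, 1, 10, 30)]))
    print(date_type_for([datetime(2024, 5, 1, 10, 30, 15)]))
    print(date_type_for([datetime(1899, 12, 31)]))
```

Run against a representative sample of the values a column must hold, a check like this shows whether the 4-byte type is sufficient before the column is declared.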

Sometimes people use the uniqueidentifier data type for ID columns within a table. This data type holds a GUID and takes 16 bytes to store it. It is ideal if you need a column value that is unique across all columns, tables, and SQL Server machines, regardless of where the column is defined. One common reason for using this data type is replication: a uniqueidentifier column will be unique across both the publisher and subscriber databases. I mention this data type here only because if you are using a uniqueidentifier in a non-replicated environment, you are wasting space. In a non-replicated environment, use one of the integer data types instead of a uniqueidentifier to save disk space.
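
The size gap between a GUID and an integer key is easy to quantify. The Python sketch below uses the standard `uuid` module to show that a GUID is 16 bytes and then estimates the per-table difference against a 4-byte INT key. The function name and row count are hypothetical, and the estimate covers only the column itself, not any indexes built on it.

```python
import uuid

GUID_BYTES = len(uuid.uuid4().bytes)  # a GUID is a 128-bit value
INT_BYTES = 4                         # storage size of an INT key


def id_column_savings(row_count: int) -> int:
    """Bytes saved in the ID column by using INT instead of a GUID."""
    return row_count * (GUID_BYTES - INT_BYTES)


if __name__ == "__main__":
    print(GUID_BYTES)
    print(id_column_savings(10_000_000))
```

At 12 bytes per row, a 10,000,000-row table gives up 120,000,000 bytes to the wider key, before counting any index that repeats the key value.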

Some applications use the MONEY and SMALLMONEY data types to store currency amounts. Just like the other data types, MONEY and SMALLMONEY take different amounts of space to store their values. SMALLMONEY takes 4 bytes and supports values from -214,748.3648 to 214,748.3647, whereas MONEY takes 8 bytes and stores values from -922,337,203,685,477.5808 to 922,337,203,685,477.5807. If your application stores monetary transactions that will never exceed a little over 200,000, then you will save 4 bytes for each transaction amount by using the SMALLMONEY data type.
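
The ranges of the two currency types look arbitrary until you treat each one as a signed integer scaled by 10,000, that is, with four decimal places. The Python sketch below works under that assumption, which is consistent with the documented ranges but is stated here as a model rather than a description of the storage engine; the function names are hypothetical.

```python
from decimal import Decimal

SCALE = Decimal(10_000)  # four decimal places


def money_range(storage_bytes: int):
    """Range of a signed integer of the given size, scaled by 10,000."""
    bits = storage_bytes * 8
    low = Decimal(-(2 ** (bits - 1))) / SCALE
    high = Decimal(2 ** (bits - 1) - 1) / SCALE
    return low, high


def fits_smallmoney(amount: Decimal) -> bool:
    """True if the amount is inside the 4-byte currency range."""
    low, high = money_range(4)
    return low <= amount <= high


if __name__ == "__main__":
    print(money_range(4))
    print(money_range(8))
    print(fits_smallmoney(Decimal("199999.99")), fits_smallmoney(Decimal("250000")))
```

The 4-byte range works out to -214,748.3648 through 214,748.3647 and the 8-byte range to -922,337,203,685,477.5808 through 922,337,203,685,477.5807, which is where the "little over 200,000" ceiling for the smaller type comes from.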

Binary data can be stored using two different data types: BINARY and VARBINARY. These data types function just like CHAR and VARCHAR: BINARY is a fixed-length data type, whereas VARBINARY is a variable-length data type. Just like VARCHAR, VARBINARY stores a 2-byte length along with the data so SQL Server can determine the actual length of the VARBINARY data. If you plan to have variable-length binary strings, then the VARBINARY data type will conserve disk space.

Conclusion

Using an incorrect data type, one that requires SQL Server to use additional disk space to store a given value, not only wastes disk space but also causes performance issues. You should always pick the data type that uses the least amount of disk space needed to represent all the possible values of a given column. The next time you are contemplating which data type to use for a particular column, remember the space and performance issues that might occur by choosing the wrong data type.

Original URL:

http://www.databasejournal.com/features/mssql/article.php/3718066/Disk-Space-Usage-and-SQL-Server-Performance.htm
