Performance and Storage Space of varchar(n) and nvarchar(n) in SQL Server

Source: Internet
Author: User
This article examines in detail the performance and storage space of varchar(n) and nvarchar(n) in SQL Server.


What n means in varchar(n) and nvarchar(n):
nvarchar(n) can store up to n characters, with no distinction between Chinese and English characters.
varchar(n) can store up to n bytes. Under a Chinese code page, one Chinese character takes two bytes.
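
The difference between a character limit and a byte limit can be sketched in Python. This is only a model, not SQL Server itself: it assumes that a varchar column under a Chinese code page behaves like Python's gbk codec (two bytes per Chinese character, one per ASCII letter) and that nvarchar behaves like UTF-16. The helper names fits_varchar and fits_nvarchar are hypothetical.

```python
def fits_varchar(value: str, n: int, codepage: str = "gbk") -> bool:
    """varchar(n): n limits the number of BYTES in the column's code page.
    The gbk code page is an assumption standing in for a Chinese collation."""
    return len(value.encode(codepage)) <= n


def fits_nvarchar(value: str, n: int) -> bool:
    """nvarchar(n): n limits the number of two-byte UTF-16 units."""
    return len(value.encode("utf-16-le")) // 2 <= n


five_chinese = "一二三四五"
print(fits_nvarchar(five_chinese, 5))  # five characters fit in nvarchar(5)
print(fits_varchar(five_chinese, 5))   # ten bytes do not fit in varchar(5)
print(fits_varchar("hello", 5))        # five ASCII bytes fit in varchar(5)
```

Under these assumptions, five Chinese characters fit in nvarchar(5) but would need varchar(10).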

Space occupied:

In nvarchar(n), every character occupies two bytes.
In varchar(n), a Chinese character occupies two bytes and an English character occupies one.

Value range of n:
For nvarchar(n), n ranges from 1 to 4000.
For varchar(n), n ranges from 1 to 8000.

Does the size of n affect performance?
In the tests described in this article, the declared length n of varchar and nvarchar does not affect storage space or performance, unless the column is declared as max and the content exceeds 4000 characters (nvarchar) or 8000 bytes (varchar).
Setting n is mostly a business requirement. For example, if an ID card number is limited to 18 characters, inserting a longer value raises an error. A limit also keeps malicious input from filling up the disk. It has no effect on space or performance.

Which value of n to choose:
Since n has no impact on space or performance, only the business needs have to be considered. In the Microsoft databases I have analyzed, most columns are set to 256. Values such as 64, 128, 512, and max also appear, perhaps because they are easy to remember.


Examples of varchar (n) and nvarchar (n) buckets:
Unicode data with a variable length of n characters. The storage size of bytes is twice the number of input characters.
The two fields have Field Values: Me and coffee.
The varchar field occupies 2 × 2 + 6 = 10 bytes of storage space, while the nvarchar field occupies 8 × 2 = 16 bytes of storage space.
If the field value is only English, you can select varchar. If the field value contains many double-byte (Chinese, Korean, etc.) characters, nvarchar is used.

How do I choose between varchar and nvarchar?

In SQL Server, varchar stores data in the column's code page, where an English character takes a single byte, while nvarchar stores data as Unicode. A Chinese character is saved as two bytes in either type. An English character occupies one byte if the field type is varchar and two bytes if the field type is nvarchar.

Example

The code is as follows:

----------------------------------------------------------------------------------
-- Subject: whether n and max in nvarchar (n) and nvarchar (max) affect the performance.
-- Environment: Microsoft SQL Server 2008 R2 (RTM)-10.50.1600.1 (Intel X86)
-- Apr 2 2010 15:53:02
-- Copyright (c) Microsoft Corporation
-- Enterprise Evaluation Edition on Windows NT 5.2 (Build 3790: Service Pack 2)
----------------------------------------------------------------------------------
-- Create two tables, one for nvarchar (4000) and the other for nvarchar (max)

CREATE TABLE [dbo].[testnvarchar4000] (
    id  INT IDENTITY(1, 1),
    cnt NVARCHAR(4000)
)
GO
CREATE TABLE [dbo].[testnvarcharmax] (
    id  INT IDENTITY(1, 1),
    cnt NVARCHAR(MAX)
)
GO

-- Then insert 100,000 rows into each table. Each cnt value holds 4000 characters
-- (the maximum n for nvarchar(n)), which is about 1.6 GB in total.


BEGIN TRANSACTION
DECLARE @I INT;
SET @I = 0;
WHILE @I < 100000
BEGIN
    -- Repeat the row number plus a marker string, then cut to exactly 4000 characters
    INSERT INTO [testnvarchar4000] (cnt) VALUES (LEFT(REPLICATE(CAST(@I AS NVARCHAR) + 'I am Liu yongfa', 1000), 4000))
    INSERT INTO [testnvarcharmax] (cnt) VALUES (LEFT(REPLICATE(CAST(@I AS NVARCHAR) + 'I am Liu yongfa', 1000), 4000))
    SET @I = @I + 1
END
COMMIT


-- Clear the caches (or restart the SQL Server service) before testing query speed and LOB reads (LOB = large object)
-- Test testnvarcharmax
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT COUNT(*)
FROM testnvarcharmax
WHERE cnt LIKE '%Liu yongfa%'

SET STATISTICS TIME OFF
SET STATISTICS IO OFF

-- Test testnvarchar4000
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT COUNT(*)
FROM testnvarchar4000
WHERE cnt LIKE '%Liu yongfa%'

SET STATISTICS TIME OFF
SET STATISTICS IO OFF


-- Result:
-- (1 row(s) affected)
-- Table 'testnvarcharmax'. Scan count 3, logical reads 100000, physical reads 8494, read-ahead reads 99908, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--
-- SQL Server Execution Times:
-- CPU time = 1172 ms, elapsed time = 30461 ms.


-- (1 row(s) affected)
-- Table 'testnvarchar4000'. Scan count 3, logical reads 100000, physical reads 8523, read-ahead reads 99916, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--
-- SQL Server Execution Times:
-- CPU time = 968 ms, elapsed time = 30038 ms.


-- The results show that the read counts and timings are essentially the same and that no LOB pages are read. The figures held across multiple test runs.

-- Append one character to the first 1000 rows of testnvarcharmax, then rerun the query to test whether LOB reads occur once the length exceeds 4000 characters
UPDATE testnvarcharmax SET cnt = cnt + '1' WHERE id <= 1000

-- Result:

-- (1 row(s) affected)
-- Table 'testnvarcharmax'. Scan count 3, logical reads 100000, physical reads 8292, read-ahead reads 99696, lob logical reads 900, lob physical reads 83, lob read-ahead reads 0.
--
-- SQL Server Execution Times:
-- CPU time = 1124 ms, elapsed time = 30318 ms.

-- The result shows that LOB pages are now read. However, the difference in elapsed time is small.


-- Double the field value in those rows and test again:
UPDATE testnvarcharmax SET cnt = cnt + cnt WHERE id <= 1000
-- Result:

-- (1 row(s) affected)
-- Table 'testnvarcharmax'. Scan count 3, logical reads 100000, physical reads 8164, read-ahead reads 99521, lob logical reads 1000, lob physical reads 101, lob read-ahead reads 0.
--
-- SQL Server Execution Times:
-- CPU time = 1094 ms, elapsed time = 31095 ms.
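
The "logical reads 100000" figure and the "about 1.6 GB" estimate in the script can be cross-checked with some back-of-the-envelope arithmetic. The Python sketch below is a rough model under stated assumptions: a data page is 8192 bytes, at most 8060 bytes of one row fit in a page (the limit quoted later in this article), and row overhead is ignored.

```python
PAGE_BYTES = 8192      # size of one data page (assumed standard page size)
MAX_ROW_BYTES = 8060   # maximum size of a single row
ROWS = 100_000         # rows inserted into each table
CHARS_PER_ROW = 4000   # characters stored in cnt
BYTES_PER_CHAR = 2     # nvarchar uses two bytes per character

cnt_bytes = CHARS_PER_ROW * BYTES_PER_CHAR          # bytes of cnt per row
rows_per_page = max(1, MAX_ROW_BYTES // cnt_bytes)  # rows that fit on a page
pages = -(-ROWS // rows_per_page)                   # ceiling division
table_gb = pages * PAGE_BYTES / 1e9                 # size of one table

print(cnt_bytes, rows_per_page, pages)
print(round(2 * table_gb, 2))  # both tables together, in GB
```

With one 8000-byte row per page, a full scan touches one page per row, which agrees with the 100000 logical reads reported for each table.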

Normally, we can use varchar to store Chinese characters. However, if the server runs an English operating system without full Chinese support (more precisely, if the collation's code page has no Chinese characters, which is the default on an English-locale installation), Chinese text stored in a varchar column comes back garbled (displayed as ??). Development hosts normally run a Chinese environment, so storing data in varchar causes no visible problem during development, and in most cases deployment goes smoothly as well.
But if the deployment host runs an English operating system without Chinese support, the problem appears: every varchar field that stores Chinese text is garbled (displayed as ??). Usually you will not realize that the wrong data type is the cause. You will try installing Chinese fonts or changing the operating system's language settings, and neither solves the problem. The only solution is to change the field type to nvarchar (or nchar). Anyone familiar with project management knows how painful it is to modify the database at the deployment stage.

Another advantage of nvarchar is that you do not need to consider the difference between Chinese and English characters when evaluating strings, for example when checking their length.
Of course, storing English characters in nvarchar doubles the storage space. However, given how cheap storage is, the compatibility brings more benefit than the space costs.

Therefore, we should prefer nvarchar when designing a database, and use varchar only when we are sure that the field will never store Chinese characters.


If varchar(300) and varchar(8000) columns store the same number of characters, there is no difference in performance or storage behavior, because they share the same storage structure: a two-byte offset for each variable-length column and a two-byte count of variable-length columns in the row. The only difference is the maximum capacity.
Most performance comparisons focus on varchar versus char and on varchar versus varchar(max), as well as on out-of-row storage (supported since SQL Server 2005).

varchar(max) (a LOB type) is stored differently from varchar(n).
When the LOB data is small enough, it can be stored directly in the data row (on the data page where the row lives), which avoids extra LOB page reads and makes access to the LOB data more efficient. For the older text, ntext, and image types, the threshold for in-row storage is set with the text in row option; for varchar(max) and nvarchar(max), the large value types out of row option controls whether values are kept in the row.
When the LOB data exceeds this threshold, or the row size exceeds 8060 bytes (the maximum size of a single row), the LOB data is stored on LOB pages and a 16-byte pointer to those pages is kept on the data page. Access is then less efficient.
In addition, malicious users can exploit an unbounded max column to fill up your disk.
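
The in-row versus off-row rule described above can be sketched as a small Python function. This is a simplified model, not the storage engine's real logic: the function name in_row_bytes, the other_bytes parameter, and the default threshold of 8000 bytes are assumptions made for illustration, while the 8060-byte row limit and the 16-byte pointer come from the text above.

```python
MAX_ROW_BYTES = 8060    # maximum size of a single row
LOB_POINTER_BYTES = 16  # pointer left in the row when data moves to LOB pages


def in_row_bytes(value_bytes: int, other_bytes: int, threshold: int = 8000) -> int:
    """Return how many bytes the max-typed value occupies inside the data row.

    threshold is an assumed in-row limit for the value itself;
    other_bytes is the space taken by the rest of the row.
    """
    fits = value_bytes <= threshold and value_bytes + other_bytes <= MAX_ROW_BYTES
    return value_bytes if fits else LOB_POINTER_BYTES


print(in_row_bytes(4000 * 2, other_bytes=4))  # 4000 characters stay in the row
print(in_row_bytes(4001 * 2, other_bytes=4))  # one more character: pointer only
```

Under this model, the 4000-character rows in the test stay in the row, and appending a single character pushes the value out to LOB pages, which is consistent with the LOB reads that appear after the first UPDATE.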
