Storage in Windows Azure
Windows Azure provides scalable storage services for both structured and unstructured data. The services scale in two ways:
• An application can expand its storage to hundreds of terabytes of data.
• The storage service can scale data access to achieve better performance, depending on the data model.
Although they are often used together, the storage services are actually independent of any hosted service. Access to Windows Azure storage goes through a REST-based API, which means that any client with an HTTP stack can reach the storage services. In practice, you can colocate your data with your hosted service in Windows Azure to achieve the best performance. Like the hosted services, the storage services are fault tolerant and highly reliable: every bit of data stored in Windows Azure is replicated within the data center and across data centers in different geographic locations, and the data is continuously monitored for corruption. (Currently, three copies of all your data are kept.)
All data is accessed through HTTP requests that follow REST conventions. The .NET Framework includes many libraries for interacting with REST-based services at different levels of abstraction; at the lowest level, for example, you can issue direct HTTP requests through the WebRequest class.
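The original .NET example referenced here did not survive; as an illustration only, the following sketch uses Python's standard library to build (but not send) a raw REST request against blob storage. The account name, container name, and version header value are hypothetical.

```python
import urllib.request

# Sketch: build a raw REST request against blob storage without sending it.
# "myaccount" and "mycontainer" are hypothetical names.
url = "http://myaccount.blob.core.windows.net/mycontainer?restype=container&comp=list"
req = urllib.request.Request(url, method="GET")

# Storage REST calls carry a version header and, normally, an Authorization
# header computed from the account key (omitted in this sketch).
req.add_header("x-ms-version", "2011-08-18")

print(req.get_method(), req.full_url)
```

A real call would pass this request to `urllib.request.urlopen` after adding the signed Authorization header.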
The Windows Azure SDK also includes a dedicated client library that provides domain models for all Windows Azure storage services. Because the services are REST-based, they can also be used from many other platforms, such as Java, PHP, and Ruby; almost any programming stack that can issue HTTP requests can interact with Windows Azure storage.
Windows Azure storage comes in four forms: blobs, drives, tables, and queues.
To access Windows Azure storage, you must first create a storage account on the Windows Azure portal at http://windows.azure.com.
A storage account is associated with a specific geographic location. Currently, each storage account can store data up to a multi-terabyte limit, composed of any mix of blobs, tables, queues, and drives. You can have as many storage accounts as you want, although by default you can create at most five (presumably, exceeding this limit costs more?).
By default, all access to Windows Azure storage must be authenticated. Each storage account has two 256-bit symmetric keys.
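As a rough illustration of how those keys are used, the sketch below shows shared-key-style signing: an HMAC-SHA256 over a canonicalized request string, keyed with the base64-decoded account key. The real service's canonicalization rules are considerably more involved; the string-to-sign and the key here are made up.

```python
import base64
import hashlib
import hmac

def sign(string_to_sign: str, account_key_b64: str) -> str:
    """Simplified sketch of shared-key signing: HMAC-SHA256 of a
    canonicalized request string, keyed with the decoded account key.
    (The real canonicalization rules are more involved than this.)"""
    key = base64.b64decode(account_key_b64)
    mac = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("utf-8")

# Hypothetical 256-bit key, base64-encoded (32 zero bytes for the sketch).
fake_key = base64.b64encode(b"\x00" * 32).decode()
sig = sign("GET\n\n\n/myaccount/mycontainer", fake_key)
print(sig)
```

The resulting signature would be sent in the request's Authorization header.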
In general, blobs provide access to large blocks of data, such as images, videos, documents, and code. Each storage account can be subdivided into any number of containers, and each container can hold any number of blobs. Storage limits apply at the account level, not to any particular container or blob. A blob is referenced by a URL in the following format:
http(s)://<storage account name>.blob.core.windows.net/<container>/<blob name>
Windows Azure blob storage supports the concept of a root container, which is useful when you want to address a blob directly under the domain name. The container name $root is reserved by Windows Azure for this special case. The following URL refers to a blob named mypicture.jpg in the root container of the account myaccount:

http://myaccount.blob.core.windows.net/$root/mypicture.jpg

This is equivalent to the following URL:

http://myaccount.blob.core.windows.net/mypicture.jpg
You can name your blobs so that they appear to form a hierarchical namespace, but in fact the namespace is flat. For example, the following blob reference seems to imply a hierarchical structure:

http://myaccount.blob.core.windows.net/pictures/trips/Seattle/spaceneedle.jpg
You might mistakenly assume a hierarchy of folders named "pictures", "trips", and "Seattle", but in fact all the path segments after the container are part of the blob name. In other words, the container name is "pictures" and the blob name is "trips/Seattle/spaceneedle.jpg".
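Because the namespace is flat, parsing a blob URL is just a matter of splitting off the first path segment; everything after it is the blob name. A minimal sketch:

```python
from urllib.parse import urlparse

def split_blob_url(url: str):
    """Split a blob URL into (container, blob name). The namespace is flat,
    so everything after the first path segment is the blob name, slashes
    included."""
    path = urlparse(url).path.lstrip("/")
    container, _, blob_name = path.partition("/")
    return container, blob_name

container, blob = split_blob_url(
    "http://myaccount.blob.core.windows.net/pictures/trips/Seattle/spaceneedle.jpg")
print(container)  # pictures
print(blob)       # trips/Seattle/spaceneedle.jpg
```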
Both containers and blobs can store up to 8 KB of metadata as name/value pairs. In addition to create, update, and delete operations, blobs support more specialized operations such as copy, snapshot, and lease.
A container acts as a security boundary for blob storage. By default, all access to blob storage requires a key; however, you can set an access policy on a container to change this behavior and allow anonymous access. These access rules are applied at the container level and govern only blob access. Container-level access permits enumeration and discovery of all blobs in the container, while blob-only access requires the exact Uniform Resource Identifier (URI) of a blob. If the access rule is removed, the default behavior of requiring a key is restored.
To distribute blob content efficiently, Windows Azure provides a content delivery network (CDN). The CDN caches frequently accessed blob data close to the applications that use it. For example, if a video is particularly popular among users in Asia, the CDN moves the blob data to servers geographically closer to those users. The CDN is a blob feature you must explicitly enable, and using it affects your bill; it is not free. Figure 3 illustrates how blobs are stored: an account holds blob containers, multiple containers can be associated with one account, and each container holds blob data.
Blob data comes in two types: block blobs and page blobs.
A block blob can store gigabytes of data, divided into blocks of up to 4 MB each. Block blobs are optimized for streaming workloads and perform well for large volumes of data such as streaming video, images, documents, and code. Block blob operations are well suited to uploading large amounts of information safely: for example, you can use the API to upload blocks concurrently, and if an error occurs you can resume the upload of a single block instead of re-uploading the entire data set.
For example, to upload a 10 GB file to blob storage, you can split it into 4 MB blocks and then upload each block independently with the PutBlock operation (or in parallel with other blocks to increase throughput). Finally, you call the PutBlockList operation to commit all these blocks into a single readable blob. Figure 4 illustrates this example.
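The splitting step can be sketched as follows. The 8-digit zero-padded counter used as a block ID is an arbitrary choice made for this sketch; what matters is that block IDs are base64-encoded and, within a given blob, of equal length.

```python
import base64

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the maximum block size mentioned above

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a payload into (block_id, chunk) pairs. Each pair would be
    uploaded with a PutBlock call; PutBlockList then commits the IDs in
    order. This sketch only performs the splitting."""
    blocks = []
    for i in range(0, len(data), block_size):
        # Zero-padded counter -> base64, so all IDs have equal length.
        block_id = base64.b64encode(f"{i // block_size:08d}".encode()).decode()
        blocks.append((block_id, data[i:i + block_size]))
    return blocks

payload = b"x" * (9 * 1024 * 1024)  # 9 MB -> three blocks (4 + 4 + 1 MB)
blocks = split_into_blocks(payload)
print(len(blocks))
```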
A page blob has a predefined maximum size of 1 TB. It consists of a sequence of 512-byte pages, and it is best suited to random-access read/write I/O. Write operations such as PutPage must be aligned to page boundaries; read operations such as GetPage, by contrast, can start at any offset within the valid range. Page blobs are billed for the information actually stored, not for the space reserved: if you create a 1 GB page blob containing two pages, Windows Azure charges you for only 1 KB of data. Figure 5 illustrates the basic page blob read/write operations.
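The alignment rule for writes can be sketched as a simple validation. The constant and function name below are illustrative, not part of any API:

```python
PAGE_SIZE = 512  # page blobs are composed of 512-byte pages

def check_put_page(offset: int, data: bytes) -> None:
    """Sketch of the alignment rule described above: a PutPage-style write
    must start on a page boundary and cover whole pages."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("write offset must be 512-byte aligned")
    if len(data) % PAGE_SIZE != 0:
        raise ValueError("write length must be a multiple of 512 bytes")

check_put_page(0, b"\x00" * 1024)       # OK: two whole pages at offset 0
try:
    check_put_page(100, b"\x00" * 512)  # misaligned offset
except ValueError as e:
    print("rejected:", e)
```

Reads, by contrast, would accept any in-range offset and length.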
Windows Azure drives
A Windows Azure drive is a virtual hard disk formatted as a single NTFS volume. A single role instance can mount a drive in read/write mode, or many instances can mount the same drive simultaneously in read-only mode; the two options cannot be combined. Typically, one instance mounts a drive in read/write mode and periodically takes snapshots of it; those snapshots can then be mounted by other instances in read-only mode. Because the underlying storage of a Windows Azure drive is a page blob, once the drive is mounted on a compute node, everything the node writes is serialized to that blob. You can write to the blob after acquiring a lease on the drive; a lease is one of the concurrency-control mechanisms of Windows Azure storage, essentially a lock on a blob. Windows Azure drives are useful for legacy applications that depend on the NTFS file system and standard I/O libraries. All page blob operations are available for Windows Azure drives.
Figure 6 illustrates a Windows Azure drive.
A Windows Azure drive is accessible to code running in a role. Data written to the drive is stored in a page blob defined by the Windows Azure blob service and cached in the local file system.
Windows Azure tables
A Windows Azure table provides scalable structured storage. Tables are associated with a storage account. Windows Azure tables are not like tables in a typical relational database: they implement neither relationships nor a schema. Instead, each entity stored in a table can have a different set of properties of different types, such as string or int. Table updates and deletes use optimistic concurrency, which is timestamp-based: it assumes that concurrency violations occur infrequently and simply rejects any update or delete that would cause one. Figure 7 illustrates table storage.
All entities stored in a table have three fixed properties: a PartitionKey, a RowKey, and a system-maintained property, LastUpdate. An entity is uniquely identified by its PartitionKey and RowKey; the LastUpdate property is used for optimistic concurrency.
Windows Azure monitors the PartitionKey property and automatically scales the table when there is enough activity, potentially spreading a table's entities across thousands of storage nodes. The PartitionKey also guarantees that related entities are kept together, which makes choosing a good partition key very important. The combination of PartitionKey and RowKey uniquely identifies an entity within a table.
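A toy in-memory model can make the keying and optimistic-concurrency scheme concrete. The class and names below are invented for illustration, and a version counter stands in for the timestamp-based check:

```python
import itertools

class TableSketch:
    """Toy in-memory model of the keying scheme described above: entities
    are addressed by (PartitionKey, RowKey), and each write bumps a version
    number used for an optimistic-concurrency check."""
    def __init__(self):
        self._rows = {}
        self._versions = itertools.count(1)

    def insert(self, pk, rk, props):
        key = (pk, rk)
        if key in self._rows:
            raise KeyError("entity already exists")
        self._rows[key] = (next(self._versions), dict(props))

    def get(self, pk, rk):
        version, props = self._rows[(pk, rk)]
        return version, dict(props)

    def update(self, pk, rk, props, expected_version):
        # Optimistic concurrency: reject the write if the entity changed
        # since the caller last read it.
        version, _ = self._rows[(pk, rk)]
        if version != expected_version:
            raise RuntimeError("concurrency violation: entity changed")
        self._rows[(pk, rk)] = (next(self._versions), dict(props))

t = TableSketch()
t.insert("customers", "alice", {"city": "Seattle"})
v, props = t.get("customers", "alice")
t.update("customers", "alice", {"city": "Redmond"}, expected_version=v)
```

Retrying the same update with the stale version `v` would now raise, which is how a concurrency violation surfaces to the client.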
A query against a Windows Azure table that specifies both the PartitionKey and the RowKey returns at most one entity. Any other type of query can return many entities, because the query conditions do not guarantee unique results. Windows Azure tables return data in pages (currently, at most 1,000 entities per query). If more data matches, the response includes a continuation token that can be used to retrieve the next page.
A continuation token is returned only when more data is available. Tables currently do not support aggregate functions such as Sum or Count. Although you can count rows or sum a column yourself, most of that work happens on the client side and requires scanning the entire table, which is expensive. Consider other approaches, such as precomputing and storing the values you need, or providing estimates that approximate the actual values.
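The paging loop a client runs looks roughly like this. Here the continuation token is simulated as a plain index and the page limit is shrunk to 3 for illustration:

```python
PAGE_LIMIT = 3  # stand-in for the 1,000-entity-per-query limit

def query_page(entities, continuation=None):
    """Toy server-side query: return up to PAGE_LIMIT entities starting at
    the continuation token (simulated here as an index), plus the next
    token, or None when no more data is available."""
    start = continuation or 0
    page = entities[start:start + PAGE_LIMIT]
    next_token = start + PAGE_LIMIT if start + PAGE_LIMIT < len(entities) else None
    return page, next_token

entities = [f"entity-{i}" for i in range(8)]
results, token = [], None
while True:
    page, token = query_page(entities, token)
    results.extend(page)
    if token is None:  # no continuation token -> all pages retrieved
        break
print(len(results))  # 8
```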
Transactions are supported for data within a single table and a single partition. For example, you can create, delete, or update entities in a single atomic operation, also called a batch operation. The maximum batch payload is 4 MB. There are many APIs for interacting with tables: at the highest level you can use WCF Data Services, and at the lowest level you can use the REST endpoints exposed by Windows Azure.
Windows Azure queues
Unlike blobs and tables, which store data, queues serve a different purpose: primarily, they allow web roles to communicate with worker roles, typically for notifications and for dispatching work.
Queues provide serialized, asynchronous messaging. Each message can be up to 8 KB long. Applications that consume messages from a queue should be designed to be idempotent, because a message can be processed more than once; idempotence means that an operation can be executed multiple times without changing the result. Consumers should also be designed to handle poison messages. A poison message contains malformed data that causes the queue processor to throw an exception; as a result, the message is never processed, remains in the queue, and fails again on every subsequent attempt. Figure 8 illustrates queue storage: a storage account can contain queues, which in turn contain messages.
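A defensive consumer can cap how many times a message is retried before parking it aside, as sketched below. The threshold, the local deque standing in for the queue service, and the names are all illustrative:

```python
from collections import deque

MAX_DEQUEUE_COUNT = 3  # hypothetical retry threshold

def process(queue, poison_queue, handler):
    """Sketch of defensive queue consumption: each message carries a count
    of how many times it has been dequeued; after MAX_DEQUEUE_COUNT failed
    attempts it is parked instead of being retried forever."""
    while queue:
        body, dequeue_count = queue.popleft()
        try:
            handler(body)  # handler should be idempotent
        except Exception:
            dequeue_count += 1
            if dequeue_count >= MAX_DEQUEUE_COUNT:
                poison_queue.append(body)            # park the poison message
            else:
                queue.append((body, dequeue_count))  # retry later

queue = deque([("good", 0), ("poison", 0)])
parked = []

def handler(body):
    if body == "poison":
        raise ValueError("malformed payload")

process(queue, parked, handler)
print(parked)  # ['poison']
```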
The SDK contains a domain model that implements a high-level abstraction; you can also interact with queues through the REST endpoints.
SQL Azure
SQL Azure is a cloud-based relational database management system (RDBMS). Currently, it focuses on the features required for transactional workloads; for example, it provides indexes, views, triggers, and stored procedures. Applications that access an on-premises SQL Server can use SQL Azure with little or no modification. Customers can also use on-premises software, such as SQL Server Reporting Services, together with SQL Azure.
You can connect to SQL Azure in many ways, such as ADO.NET, PHP, or ODBC. This means the way you develop database applications today carries over to SQL Azure: if your database is in the cloud, you only need to change the connection string.
Applications can connect to a cloud database either from the cloud or from on-premises; the first option is called code near, the second code far. Wherever the application runs, it accesses data over TCP/IP using the Tabular Data Stream (TDS) protocol, the same protocol used to access an on-premises SQL Server database. SQL Azure includes a security feature that allows only machines within specific IP ranges to access the database: you specify the IP addresses of the machines that may connect, and all other requests are rejected at the network level.
To access SQL Azure, you must create an account at http://sql.azure.com. Each account can have one or more logical servers, which are implemented as multiple physical servers within a geographic location. Each logical server can contain one or more databases, which are partitioned and distributed across multiple physical machines.
You can create a database through the SQL Azure server administration interface, which is available on the web portal. You can also use tools such as SQL Server Management Studio to create databases, add objects (user-defined objects, tables, views, and indexes), or modify firewall settings.
SQL Azure databases come in three sizes: 1 GB, 10 GB, and 50 GB. Your bill is based on the database size, not on the actual amount of data you store.
The main purpose of Windows Azure is to make life simple for application owners. One way it achieves this is by providing an automated service-management layer. Through this layer, a developer creates an application, deploys it to the cloud, and configures the service settings and constraints; after that, Windows Azure is responsible for running the service and keeping it healthy. Windows Azure also exposes many management operations, such as monitoring applications and managing storage accounts, hosted services, service deployments, and affinity groups. You can perform these operations through the web portal or programmatically through the REST API. The API uses a different authentication method than the web portal: all programmatic calls are authenticated with an X.509 client certificate. You can upload any valid X.509 certificate to the Windows Azure developer portal and then use it as the client authentication certificate for API calls.
Note: the Windows Azure management API discussed here refers to the API dedicated to the Windows Azure components (compute and storage). Other parts of the platform, such as SQL Azure and AppFabric, have their own management interfaces.