As a programmer, SQL is a fundamental skill. Even after writing SQL for many years, I still forget some of the common commands, so I am writing this series of blog posts to consolidate my memory of the basics and to revisit some of the newer features of T-SQL itself.
First, a review of the basic concepts. Feel free to skip this part; it is a bit dry. Structured Query Language (SQL) is based on set theory and predicate logic, material that mostly overlaps with the digital logic and discrete mathematics courses from university.
Set theory was created by the mathematician Georg Cantor and is one of the mathematical branches on which the relational model is based. Cantor's definition of a set is, roughly, any collection M of definite, distinct objects m of our perception or of our thought, considered as a whole.
Predicate logic, whose origins can be traced back to ancient Greece, is the other mathematical branch on which the relational model is based. Edgar F. Codd, the creator of the relational model, proposed managing and querying data through predicate logic. In general, a predicate is a property or an expression that either holds or does not hold, that is, evaluates to "true" or "false". The relational model relies on predicates to maintain the logical integrity of the data and to define its structure. The Predicate<T> you often see in C# is this same logical predicate.
The relational model involves concepts such as propositions, predicates, relations, tuples, and attributes. In set theory a relation is a representation of a set; in the relational model a relation is a set of related information, which corresponds to a table in SQL (not to a relationship between tables). It is important to note that a single relation represents a single set, and that the result of operating on multiple relations (based on relational algebra) is itself a relation, as with a join operation. Intuitively we expect a predicate to be either "true" or "false", but recall the famous physics thought experiment "Schrödinger's Cat", whose central idea involves an unknown state; likewise, in a database a predicate can evaluate to one of three cases: "true", "false", or "unknown".
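For instance, any comparison against NULL yields UNKNOWN, and rows for which the filter predicate evaluates to UNKNOWN are not returned. A minimal sketch of this three-valued logic, using a hypothetical dbo.Employees table:

```sql
-- Hypothetical table used only to illustrate three-valued logic.
CREATE TABLE dbo.Employees
(
    EmpId  INT         NOT NULL,
    Region VARCHAR(10) NULL
);

INSERT INTO dbo.Employees (EmpId, Region)
VALUES (1, 'US'), (2, 'UK'), (3, NULL);

-- For EmpId 3 the predicate evaluates to UNKNOWN, so that row is
-- returned by neither of the two queries below.
SELECT EmpId FROM dbo.Employees WHERE Region = 'US';   -- returns 1
SELECT EmpId FROM dbo.Employees WHERE Region <> 'US';  -- returns 2 only

-- To treat the UNKNOWN case explicitly, test for NULL directly:
SELECT EmpId FROM dbo.Employees
WHERE Region <> 'US' OR Region IS NULL;                -- returns 2 and 3
```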
At the same time, the relational model uses constraints to define data integrity as part of the model. The most common ones are candidate keys, which provide entity integrity, and foreign keys, which provide referential integrity. A candidate key is a key defined on one or more attributes that prevents duplicate tuples from appearing in a relation; a predicate based on a candidate key can uniquely identify a row. Multiple candidate keys can be defined on a relation; usually one of them is chosen as the primary key, and the remaining candidate keys are called alternate keys. It is worth adding that the concept of a primary key (candidate key) is separate from the concept of an index, although SQL Server implements a primary key as a clustered index by default. Foreign keys are used to enforce referential integrity: a foreign key is defined on one or more attributes of the referencing relation and points to a candidate key of the referenced relation, restricting the values in the foreign-key attributes to values that appear in the candidate-key attributes of the referenced relation. Note that in high-concurrency environments database foreign keys are often avoided and the checks are enforced at the business layer instead. In addition, there are NOT NULL constraints, UNIQUE constraints, and so on.
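A hedged sketch of these constraint types on two hypothetical tables (the table and constraint names are illustrative only):

```sql
CREATE TABLE dbo.Customers
(
    CustId INT          NOT NULL,                   -- NOT NULL constraint
    Email  VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY (CustId),   -- candidate key chosen as the primary key
    CONSTRAINT UQ_Customers_Email UNIQUE (Email)    -- alternate key enforced with a UNIQUE constraint
);

CREATE TABLE dbo.Orders
(
    OrderId INT NOT NULL,
    CustId  INT NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderId),
    CONSTRAINT FK_Orders_Customers                  -- referential integrity
        FOREIGN KEY (CustId) REFERENCES dbo.Customers (CustId)
);
```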
Next comes a common concept that has been around for a long time but is easy to forget: normalization rules (also known as normal forms). Normalization is a formal mathematical process used to ensure that each entity is represented by a single relation, to avoid anomalies during data modification, and to keep redundancy to a minimum without sacrificing integrity. The three most common normal forms are listed below (a small sketch follows the list):
1NF: every tuple in the relation must be unique, and every attribute must be atomic (the atomicity requirement is usually driven by actual needs and is not absolute).
2NF: in addition to satisfying the first normal form, every non-key attribute must be fully functionally dependent on the entire candidate key, for every candidate key. In short, to look up the value of any non-key attribute you need to supply the values of all attributes of the candidate key in the same tuple, and knowing the values of all candidate-key attributes lets you retrieve the value of any attribute of that tuple.
3NF: in addition to satisfying the second normal form, every non-key attribute must be non-transitively dependent on the candidate keys. Put simply, non-key attributes must be mutually independent; one non-key attribute cannot depend on another non-key attribute.
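As an illustrative sketch (hypothetical tables): in the first design below, ProductName depends only on ProductId, which is just part of the composite candidate key (OrderId, ProductId), violating 2NF and duplicating the name on every order line. Splitting the product attributes into their own relation resolves it:

```sql
-- Unnormalized: ProductName has a partial dependency on ProductId only.
CREATE TABLE dbo.OrderLines_Unnormalized
(
    OrderId     INT          NOT NULL,
    ProductId   INT          NOT NULL,
    ProductName VARCHAR(100) NOT NULL,   -- depends on part of the key => 2NF violation
    Qty         INT          NOT NULL,
    CONSTRAINT PK_OrderLinesU PRIMARY KEY (OrderId, ProductId)
);

-- Normalized: the product name lives in its own relation, and the order line
-- keeps only attributes that depend on the whole candidate key.
CREATE TABLE dbo.Products
(
    ProductId   INT          NOT NULL PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.OrderLines
(
    OrderId   INT NOT NULL,
    ProductId INT NOT NULL REFERENCES dbo.Products (ProductId),
    Qty       INT NOT NULL,
    CONSTRAINT PK_OrderLines PRIMARY KEY (OrderId, ProductId)
);
```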
Next is a brief introduction to the life cycle of data. A typical business system starts with only the online transaction processing stage, but as the business grows it gradually derives several stages in the big-data direction. The acronyms in Figure 1 stand for:
- OLTP: online transaction processing
- DSA: data staging area
- DW (OLAP): data warehouse (online analytical processing)
- BISM: Business Intelligence Semantic Model
- DM: data mining
- ETL: extract, transform, and load
- MDX: Multidimensional Expressions
- DAX: Data Analysis Expressions
Figure 1 Data life cycle
The data warehouse (DW) is designed and optimized to support data-retrieval requirements; its model is deliberately redundant, with fewer tables and simplified relationships. The simplest DW structure is the star schema, consisting of several dimension tables and a fact table, where each dimension table represents a subject to be analyzed. For example, in an order and sales system you might want to analyze data by customer, product, employee, time, and similar subjects. In a star schema each dimension is filled into a single table with redundant data (for example, a ProductDim table is synthesized from Product, ProductSubCategory, and ProductCategory). If instead you normalize a dimension, it is spread across multiple tables that together represent the dimension, producing a snowflake dimension; such a design is known as a snowflake schema.
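A hedged sketch of a star schema with hypothetical table names: one fact table surrounded by denormalized dimension tables, where DimProduct deliberately folds the Product, ProductSubCategory, and ProductCategory attributes into a single redundant table:

```sql
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT          NOT NULL PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    SubCategory VARCHAR(50)  NOT NULL,   -- redundant: copied from ProductSubCategory
    Category    VARCHAR(50)  NOT NULL    -- redundant: copied from ProductCategory
);

CREATE TABLE dbo.DimDate
(
    DateKey  INT  NOT NULL PRIMARY KEY,  -- e.g. 20120101
    FullDate DATE NOT NULL
);

-- The fact table holds the measures and points at each dimension.
CREATE TABLE dbo.FactSales
(
    ProductKey INT            NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    DateKey    INT            NOT NULL REFERENCES dbo.DimDate (DateKey),
    Quantity   INT            NOT NULL,
    SalesAmt   DECIMAL(18, 2) NOT NULL
);
```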
The process of extracting data from the source systems, transforming it, and loading it into the data warehouse is known as extract, transform, and load (ETL). The related SQL Server product is the familiar SSIS (SQL Server Integration Services), and ETL often makes use of the data staging area (DSA) that sits between OLTP and the DW.
The Business Intelligence Semantic Model (BISM) provides rich and flexible analysis and reporting capabilities through three layers: data model, business logic, and data access. Models can be deployed to Analysis Services or to PowerPivot; the former targets BI professionals and uses either a multidimensional or a tabular data model, while the latter targets business users and uses the tabular data model. The business-logic layer uses two languages: Multidimensional Expressions (MDX), based on multidimensional concepts, and Data Analysis Expressions (DAX), based on tabular concepts. The data access layer can obtain data from different sources: relational databases such as the DW, files, cloud services, OData feeds, and so on. It can either cache the data locally or serve as a pass-through layer that connects directly to the data source. In cached mode two storage engines are available: a pre-aggregating engine called MOLAP, designed to support the multidimensional model, and a newer engine called VertiPaq, which implements a column-store concept with a high compression ratio and a very fast processing engine, requiring no pre-aggregation, indexes, and so on.
Whereas the BISM provides the user with possible answers, DM provides the user with the correct answers. In other words, data mining algorithms comb through the data and filter out the useful information; the data mining algorithms supported by Analysis Services include clustering, decision trees, and others.
I will skip some of the introductory "ABC" material about SQL Server, although Microsoft's cloud services do have some bright spots, including its cloud database service (Windows Azure SQL Database). Now on to the more technical concepts.
A SQL Server instance is uniquely identified by machine name\instance name, for example Server1\inst1. An instance contains several system databases: the master database stores instance-wide metadata, server configuration, and so on; the model database is the template used when creating new databases; the tempdb database is where temporary data is stored, such as work tables, sort space, and row-versioning information; and the msdb database is where a service called SQL Server Agent stores its data. SQL Server Agent provides automation (including jobs, schedules, and alerts), replication services, Database Mail, Service Broker, backups, and so on.
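A quick way to list the system databases described above on an instance (the server and instance names in the sqlcmd comment are placeholders from the example above):

```sql
-- The four system databases always occupy the lowest database ids:
-- 1 = master, 2 = tempdb, 3 = model, 4 = msdb.
SELECT database_id, name
FROM sys.databases
WHERE database_id <= 4;

-- Connecting to a named instance from the command line with Windows authentication:
-- sqlcmd -S Server1\inst1 -E
```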
A database has data files and log files. When you create a database you can define various properties for each file, including the file name, location, initial size, maximum size, and autogrowth increment. Data files store the data, while log files record transaction information. Although SQL Server can write to multiple data files in parallel, it writes to only one log file at a time, sequentially. Therefore, unlike data files, multiple log files do not bring a performance boost; you may still need to add a log file if the disk holding the log runs out of space, and in practice an oversized log file that eats up disk space and makes the database service unavailable is a common incident. Data files are organized into logical groups called filegroups. The primary filegroup contains the primary data file xxx.mdf along with the database's system catalog; secondary data files xxx.ndf can be added to the primary or other filegroups, while xxx.ldf denotes the log file. In addition, the database provides schemas as a further layer for organizing database objects.
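A minimal sketch of these ideas (file paths, names, and sizes are placeholders): a database with an explicit primary data file, a secondary data file in its own filegroup, a log file, and a schema for organizing objects:

```sql
CREATE DATABASE SampleDB
ON PRIMARY
(
    NAME = SampleDB_data,
    FILENAME = 'C:\SQLData\SampleDB.mdf',       -- primary data file (.mdf)
    SIZE = 100MB, MAXSIZE = 1GB, FILEGROWTH = 64MB
),
FILEGROUP FG_Secondary
(
    NAME = SampleDB_data2,
    FILENAME = 'C:\SQLData\SampleDB.ndf',       -- secondary data file (.ndf)
    SIZE = 100MB, FILEGROWTH = 64MB
)
LOG ON
(
    NAME = SampleDB_log,
    FILENAME = 'C:\SQLData\SampleDB.ldf',       -- log file (.ldf)
    SIZE = 50MB, FILEGROWTH = 64MB
);
GO

-- Schemas add a further logical layer for organizing objects inside the database.
USE SampleDB;
GO
CREATE SCHEMA Sales;
GO
CREATE TABLE Sales.Orders (OrderId INT NOT NULL PRIMARY KEY);
```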
Figure 2 Database File layout
This post is mainly for my own review and reference; apologies for any omissions.
Resources:
- (US) Itzik Ben-Gan. Microsoft SQL Server 2012 T-SQL Fundamentals [M]. Beijing: Posts & Telecom Press.