When you create tables and data warehouses, you have to make many decisions. Some that seem insignificant at the time will end up causing pain for you and your customers for as long as the database is in use.
We have worked with thousands of people and their databases. After countless hours of reading and writing queries, we can say that we have seen almost every situation. The following are 10 rules that help create a good schema.
1. Use only lowercase letters, numbers, and underscores
Do not use dots, spaces, or dashes in database, schema, table, or column names. Dots are used to qualify objects, usually in the form database.schema.table.column.
Putting a dot inside an object name is therefore confusing. Similarly, spaces in an object name force you to add otherwise unnecessary quotation marks to every query that touches it.
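As a rough illustration of the quoting cost, the sketch below uses Python's built-in sqlite3 module, chosen here only so the example is runnable. The table and column names are hypothetical, and quoting rules differ somewhat between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A name containing a space only works when it is double-quoted everywhere.
conn.execute('create table "user accounts" ("first name" text)')
conn.execute('insert into "user accounts" ("first name") values (?)', ("Ada",))
quoted = conn.execute('select "first name" from "user accounts"').fetchall()

# The lowercase, underscore-separated equivalent needs no quoting at all.
conn.execute("create table user_accounts (first_name text)")
conn.execute("insert into user_accounts (first_name) values (?)", ("Ada",))
plain = conn.execute("select first_name from user_accounts").fetchall()
```

Both queries return the same data; only the second can be typed without thinking about quotes.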
Uppercase letters in table or column names also make queries harder to write. If everything is lowercase, nobody has to remember whether the users table is called Users or users.
And when you eventually switch databases or copy your tables into a data warehouse, you will not need to remember which database is case-sensitive, since only some are.
2. Use simple and descriptive column names
If the users table needs a foreign key that references the packages table, package_id is a good name for it. Avoid short and cryptic column names like pkg_fk, because others will struggle to know what they mean. Descriptive names make the schema easier for others to understand, which matters for keeping the team productive as it grows.
Do not give ambiguous names to data that can be interpreted in more than one way. If you find yourself creating columns with names like item_type or item_value, you are probably better off with more columns that have specific names, such as photo_count, view_count, and transaction_price.
That way, what a column contains is always known from the schema, and does not depend on other values in the row.
Do not prefix column names with the table name. Columns such as user_birthday, user_created_at, and user_name in the users table generally gain nothing from the prefix.
Finally, avoid reserved keywords such as column, tag, or user as column names. Once you use one, you have to quote it in every query. When someone forgets to do so, the database produces very confusing error messages, because it cannot parse a query in which a keyword appears where a column name is expected.
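The keyword problem can be sketched with Python's built-in sqlite3 module. Which words are reserved varies by database, so this example assumes SQLite and uses order, which is reserved there, rather than the keywords listed above. The line_items table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The column can only be created by quoting the keyword.
conn.execute('create table line_items (id integer primary key, "order" integer)')
conn.execute('insert into line_items ("order") values (1)')

# Forgetting the quotes makes the statement unparseable.
try:
    conn.execute("select order from line_items")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

# With quotes the query works, but every author has to remember them.
quoted_rows = conn.execute('select "order" from line_items').fetchall()
```

A non-reserved name such as position or sort_order (both hypothetical alternatives) would avoid the quoting entirely.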
3. Use simple and descriptive table names
If a table name is made up of multiple words, separate them with underscores: package_deliveries is easier to read than packagedeliveries.
Where possible, use one word instead of two: deliveries is easier to read still.
Do not prefix table names with a schema-like name. If you need to group tables into a scope, put them in a schema. Tables such as store_items, store_transactions, and store_coupons, which all share a prefix, are candidates for this.
We recommend plural table names (such as packages), and pluralizing both words in the name of a join table. Singular table names are more likely to collide with keywords by accident and generally read less naturally in queries.
4. Use an integer as the primary key
Even if you use UUIDs as keys, or think an auto-incrementing integer primary key is pointless (for example, on a join table), we recommend adding a standard id column backed by an auto-incrementing integer sequence. This kind of primary key makes certain analyses easier, such as selecting only the first row of each group.
It is also a lifesaver around data imports, because you can easily delete specific rows by their primary key.
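Both uses of an integer id can be sketched as follows, using Python's built-in sqlite3 module. The deliveries table and its rows are hypothetical, and the duplicated import is an assumed scenario used to illustrate deleting specific rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table deliveries ("
    "id integer primary key, package_id integer, status text)"
)
# Simulate an import that loaded some rows twice.
conn.executemany(
    "insert into deliveries (package_id, status) values (?, ?)",
    [(1, "shipped"), (1, "shipped"), (2, "pending"), (2, "pending"), (2, "delivered")],
)

# First row of each group: the row with the lowest id per package.
first_rows = conn.execute(
    "select id, package_id, status from deliveries "
    "where id in (select min(id) from deliveries group by package_id) "
    "order by id"
).fetchall()

# Remove duplicates: keep only the lowest id of each (package_id, status) pair.
conn.execute(
    "delete from deliveries where id not in ("
    "select min(id) from deliveries group by package_id, status)"
)
remaining = conn.execute(
    "select id, package_id, status from deliveries order by id"
).fetchall()
```

Without the id column, the two identical rows for a package could not be told apart, so deleting just one of them would be much harder.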
Avoid multi-column primary keys. They make queries harder to understand and modify when you are trying to write efficient queries. Instead, use an integer primary key together with a multi-column unique constraint or single-column indexes.
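One possible shape for such a table is sketched below with Python's built-in sqlite3 module. The packages_users join table and the index name are hypothetical. The unique constraint takes over the job a two-column primary key would otherwise do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table packages_users (
    id integer primary key,          -- single-column integer key
    package_id integer not null,
    user_id integer not null,
    unique (package_id, user_id)     -- replaces a two-column primary key
);
-- single-column index for lookups by user
create index packages_users_user_id_idx on packages_users (user_id);
""")

conn.execute("insert into packages_users (package_id, user_id) values (1, 10)")
try:
    # The same pair a second time violates the unique constraint.
    conn.execute("insert into packages_users (package_id, user_id) values (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("select count(*) from packages_users").fetchone()[0]
```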
5. Be consistent with foreign keys
There are many styles for naming primary and foreign keys. We recommend the most common one: for any table foo, name its primary key id, and name every foreign key that references it foo_id.
Another style uses globally unique key names. In this style, the primary key of table foo is called foo_id, and all foreign keys that reference it are also called foo_id. Whichever style you use, avoid abbreviations (for example, uid for a key into the users table), because they always end up causing confusion or name collisions.
Whatever style you choose, stick to it. Do not use uid in some places and user_id or users_fk in others.
Also be careful with foreign keys whose names do not obviously match a table. A column named owner_id may be a foreign key into the users table, or it may not. Name the column user_id or, if necessary, owner_user_id.
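A minimal sketch of the recommended id / foo_id style, using Python's built-in sqlite3 module. The tables and rows are hypothetical. The point is that owner_user_id makes the join target obvious without consulting the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (id integer primary key, name text);
create table packages (
    id integer primary key,
    owner_user_id integer references users (id),  -- name says which table it points at
    name text
);
insert into users (id, name) values (1, 'ada');
insert into packages (id, owner_user_id, name) values (7, 1, 'books');
""")

# The join condition reads naturally: a user id on both sides.
owners = conn.execute(
    "select packages.name, users.name from packages "
    "join users on users.id = packages.owner_user_id"
).fetchall()
```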
6. Store dates and times as datetime types
Do not store dates as Unix timestamps or strings; convert them to a proper datetime type instead. SQL's date functions are not the best, but using them is still easier than doing date arithmetic on timestamps yourself. And with timestamp storage, every query that uses the SQL date functions needs a conversion from timestamp to datetime.
Do not store the year, month, and day in separate columns. Doing so makes every time series query much harder to write, and it trips up most SQL beginners who try to use the table's date information.
7. Always use UTC
Using a time zone other than UTC causes endless problems. Good tools (including our peris) have everything you need to convert data from UTC to your time zone. In peris, simply adding :pst converts UTC to Pacific Time, for example in a query from users.
Set the database's time zone to UTC, and give all datetime columns a type that strips the time zone (for example, timestamp without time zone).
If your database's time zone is not UTC, or your database mixes UTC and non-UTC datetimes, time series analysis becomes much harder.
8. Have a single source of truth
Any piece of data should have a single source of truth. Views and rollups should be labeled as such, so that data consumers know the difference between the data they are using and the raw source of truth.
select *
from daily_usage_rollup
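To make the labeling concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the name daily_usage_rollup comes from the query above. The usage_events table, its columns, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The raw source of truth: one row per usage event.
create table usage_events (id integer primary key, user_id integer, used_on date);

-- Derived data, clearly labeled as a rollup by its name.
create view daily_usage_rollup as
    select used_on, count(*) as event_count
    from usage_events
    group by used_on;

insert into usage_events (user_id, used_on) values
    (1, '2024-01-01'), (2, '2024-01-01'), (1, '2024-01-02');
""")

rollup = conn.execute(
    "select * from daily_usage_rollup order by used_on"
).fetchall()
```

Anyone querying daily_usage_rollup can tell from the name alone that the numbers are aggregated and that the underlying events live elsewhere.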
Likewise, keeping legacy columns around, such as user_id, user_id_old, or user_id_v2, causes endless trouble. Make sure that deprecated tables and columns that are no longer used are dropped during routine maintenance.
9. Prefer tall tables without JSON columns
Avoid tables with too many columns. If a table has dozens of columns and some of them are named in sequence (for example, answer1, answer2, and answer3), you are headed for trouble.
The right approach is to convert such a table into a schema with no repeated columns, which is far easier to query. For example, a single query can then count the number of completed questions in a survey.
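A minimal sketch of such a schema and the counting query, using Python's built-in sqlite3 module. The survey_answers table, its columns, and the sample rows are hypothetical. An unanswered question is assumed to be stored as null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per answer instead of answer1, answer2, answer3 columns.
create table survey_answers (
    id integer primary key,
    survey_response_id integer not null,
    question_number integer not null,
    answer text,
    unique (survey_response_id, question_number)
);
""")
conn.executemany(
    "insert into survey_answers (survey_response_id, question_number, answer) "
    "values (?, ?, ?)",
    [(1, 1, "yes"), (1, 2, None), (1, 3, "no"), (2, 1, "maybe")],
)

# count(answer) skips nulls, so it counts only completed questions.
completed = conn.execute(
    "select survey_response_id, count(answer) as completed_questions "
    "from survey_answers "
    "group by survey_response_id "
    "order by survey_response_id"
).fetchall()
```

With answer1, answer2, and answer3 columns, the same count would need one expression per column, and the query would have to change whenever a question is added.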