DynamoDB
I originally wanted to write about processes, threads, lightweight threads, goroutines, and coroutines. Why do I list goroutines separately? Because a goroutine is not a coroutine; translating it as "coroutine" borrows a concept that does not really fit. It is simply something lighter than a thread. Anyway, never mind that. Today I want to write about DynamoDB and record my experiences with it.
As the official website describes it, DynamoDB scales without limit. How that is implemented is anyone's guess, but the premise of this unlimited scaling is a long list of restrictions. When deciding whether DynamoDB suits your project, remember to analyze its limitations; otherwise they will sooner or later become a bottleneck for the project, or slow down business development.
First, DynamoDB is unfriendly to statistics. Want to run ad-hoc aggregations over your data? Better give up the idea, or pay an astonishing cost to Scan the whole table once. Set up your own logging system before you adopt DynamoDB.
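For illustration only, here is a minimal sketch of what that full-table Scan looks like, using boto3 (the newer Python SDK; the post itself talks about boto). The table name and region are invented for the example. Every page of the Scan consumes read capacity, which is exactly why counting things this way gets expensive.

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    def count_items(table_name="events"):
        # Scan reads every item, page by page; each page burns read capacity units.
        total = 0
        kwargs = {"TableName": table_name, "Select": "COUNT"}
        while True:
            resp = dynamodb.scan(**kwargs)
            total += resp["Count"]
            last_key = resp.get("LastEvaluatedKey")
            if not last_key:
                return total
            # Resume from where the previous page stopped.
            kwargs["ExclusiveStartKey"] = last_key

    print(count_items())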
Secondly, it is not a cache. Although it runs on SSDs, the overhead of every connection (HTTPS) is nothing an ordinary cache would put up with, and even its internal query latency is on the order of tens to a hundred milliseconds.
Third, do not assume it is pleasant to use. There are SDKs for various languages; take Python's boto as an example: the implementations in both versions are incomplete, and some features are simply not exposed by the SDK at all, so be prepared to write raw RESTful requests yourself.
Finally, if you really want to use DynamoDB and can read English, you should stop reading here and go straight to the detailed documentation and guides on the official website. What follows is just a record of my personal experiences and the pitfalls I ran into.
The first pitfall is the data types. They do not fully conform to JSON, so format conversion takes some work; List and Map, the so-called document types, were only supported recently, and before that the official recommendation was to compress your JSON into a Binary attribute, which... The maximum size of an item (that is, a single record) is stated only in the English documentation; the Chinese documentation has not been updated and still says 64 KB, which is a little small. (In general the Chinese documentation is incomplete and not worth relying on.)

Provisioned Throughput is the biggest pitfall: it limits read/write rate and capacity by traffic. A single read unit buys one read per second of an item up to 4 KB, so reading 40 KB costs 10 read units. Similarly, a single write unit buys one write per second of an item up to 1 KB.
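As a back-of-the-envelope check of that arithmetic, here is a small sketch (the function names are mine, not from any SDK) that computes how many capacity units a single read or write of a given item size consumes, assuming 4 KB per read unit and 1 KB per write unit as described above.

    import math

    READ_UNIT_KB = 4   # one read capacity unit covers up to 4 KB per read
    WRITE_UNIT_KB = 1  # one write capacity unit covers up to 1 KB per write

    def read_units(item_kb):
        # Partial units are rounded up: a 5 KB read still costs 2 units.
        return math.ceil(item_kb / READ_UNIT_KB)

    def write_units(item_kb):
        return math.ceil(item_kb / WRITE_UNIT_KB)

    print(read_units(40))   # 10, matching the example above
    print(write_units(40))  # 40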
The second pitfall is the provisioned read/write throughput itself. A global secondary index has its provisioned throughput calculated separately from the base table. The nastiest part: if you write to the base table and the table's provisioned throughput is sufficient but the global secondary index's is not, the write fails with an error and the data cannot be inserted. In that respect a global secondary index is no different from a separate table, and a separate table is actually more flexible. Another pitfall of provisioned throughput is that your hash key (the whole database uses the hash key as the primary key; if that is news to you, go read the tutorials on the official website first) must hash evenly to get the full benefit of the provisioned capacity. For example, if you provision 1,000 read units (and provisioning more than 10,000 read units requires filing a request with AWS), Amazon may automatically spread the data across 20 partitions, and then each partition only gets 50 read units of that capacity. If you happen to read one kind of data heavily and that data happens to be concentrated in one partition, sorry, your consumption is multiplied several-fold or dozens of times (and Amazon's service is expensive to begin with). Also, if you want consistent reads (the default is eventually consistent reads; look them up if you are not familiar with the distinction), each one consumes twice as many units as a dirty (eventually consistent) read.
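To make the "separate throughput" point concrete, here is a hedged boto3 sketch of creating a table where the base table and a global secondary index each declare their own ProvisionedThroughput; the table, key, and index names are invented for the example. If the index's capacity is exhausted, writes to the base table get throttled even when the table itself has capacity to spare.

    import boto3

    client = boto3.client("dynamodb", region_name="us-east-1")

    client.create_table(
        TableName="orders",
        AttributeDefinitions=[
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "order_id", "KeyType": "RANGE"},
        ],
        # Throughput of the base table.
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
        GlobalSecondaryIndexes=[
            {
                "IndexName": "status-index",
                "KeySchema": [{"AttributeName": "status", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # The GSI has its own, separately throttled throughput;
                # if this runs out, writes to the base table fail too.
                "ProvisionedThroughput": {"ReadCapacityUnits": 20, "WriteCapacityUnits": 20},
            }
        ],
    )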
The third pitfall is indexing. The global secondary index mentioned above is a pitfall, and the local secondary index is a pitfall too: everything under a single hash key, including the index contents and the projected attributes (see the official website for the projection concept), must not exceed 10 GB. That is, if you add an index that projects a single item of about 200 bytes (200 bytes really is not much; a UUID is casually 32 bytes, though of course you can compress), then one hash key can hold at most 50 million entries. This limit is there to constrain you: you cannot pile everything under a single hash key, let alone project too many attributes. On top of that, local secondary indexes cannot be added, modified, or deleted later; they can only be defined when the table is created. Despite the 10 GB limit, I think the range key plus local secondary index design is quite distinctive and can cover a large share of needs, provided that two levels of indexing is as deep as you need; for third- and fourth-level indexes you have to find your own workaround. Even so, this model can satisfy most requirements; a query sketch follows.
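For reference, here is a minimal boto3 sketch of the kind of query the range key makes cheap: fetch the items under one hash key whose range key falls in an interval. The table and attribute names continue the invented example above; they are not from the post.

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb", region_name="us-east-1").Table("orders")

    # One hash key plus a bounded range on the sort key: this is the access
    # pattern DynamoDB is built for, and it stays within a single partition.
    resp = table.query(
        KeyConditionExpression=Key("user_id").eq("u-123")
        & Key("order_id").between("2014-01-01", "2014-12-31"),
    )
    for item in resp["Items"]:
        print(item)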
Fourth, do not count on transactions, atomic operations across items, and so on: it simply cannot do them. A simple atomic counter (auto-increment) is still possible, though; see the sketch below.
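As an illustration of the one thing that does work, here is a hedged boto3 sketch of an atomic counter using UpdateItem with an ADD update expression; the table and attribute names are invented.

    import boto3

    table = boto3.resource("dynamodb", region_name="us-east-1").Table("counters")

    # Atomically add 1 to the counter; DynamoDB applies the increment
    # server-side, so concurrent callers do not lose updates.
    resp = table.update_item(
        Key={"counter_name": "page_views"},
        UpdateExpression="ADD hits :inc",
        ExpressionAttributeValues={":inc": 1},
        ReturnValues="UPDATED_NEW",
    )
    print(resp["Attributes"]["hits"])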
The last pitfall is the SDK. For casual use the Python SDK, boto, is mostly fine, but once you care about concurrent-write conflicts, modifying individual attributes, or whether a write should overwrite existing data, boto cannot handle it at all, and you end up spending a lot of effort issuing RESTful requests by hand. And if you want to use Tornado, there is no ready-made asynchronous module available either.
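For what it is worth, the newer boto3 SDK does expose conditional writes, which is the usual way to handle the "do not overwrite" case; here is a minimal sketch with invented names. At the time the post describes, doing the same from boto meant dropping down to the raw API.

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb", region_name="us-east-1").Table("users")

    try:
        # Only write the item if no item with this key exists yet,
        # so an existing record is never silently overwritten.
        table.put_item(
            Item={"user_id": "u-123", "name": "alice"},
            ConditionExpression="attribute_not_exists(user_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            print("item already exists, not overwriting")
        else:
            raise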
If you have gotten past all the pitfalls above, I believe DynamoDB is not far from meeting your project's needs. That said, I have not done much performance testing yet; we are about to launch a project at the million-plus scale, so those details will have to wait.