Hello, everyone ~ Today's SequoiaDB Q&A column is online!
As commercial open source software, SequoiaDB already has a large community of users. Since it was open-sourced, users have raised plenty of questions with us, from big topics such as distributed database principles and architecture down to small ones such as installing SequoiaDB (SDB). So we invited our technical experts to have a good chat with everyone. Whatever your question, bring it on!
Yesterday, two SequoiaDB technical experts joined the official SequoiaDB group to answer members' questions about distributed databases. Here are the takeaways!
1. Official figures say SequoiaDB outperforms MongoDB in many respects. Can it replace MongoDB? If it cannot replace it completely, where does it fall short of MongoDB, why, and are design factors involved? What is the relationship between MongoDB and SDB?
SDB can fully replace MongoDB, and it supports many features that MongoDB does not, such as transactions and join queries.
At present, SDB's performance is above MongoDB's, and its distribution mechanism is also more complete than MongoDB's.
If we talk about weaknesses, they are mainly in promotion and community development. MongoDB started out from the developer's point of view.
The E-R relational model is too slow for iterative development, whereas JSON documents map one-to-one onto objects and have no schema constraints, which helps enormously with POCs and other fast iterations. Many internet companies ship a new version every half day, so their demands on development speed are very high.
MongoDB does ease of use very well. We are working hard on this too, and we hope for your support!
There is no relationship between MongoDB and SDB; the two started from very different points.
2. Developing a database is highly technical and needs theory to back it up, for example the balance between A and C in the CAP theorem. I would like to hear an analysis of the underlying principles.
CAP stands for Consistency, Availability, and Partition tolerance. It is a theory proposed mainly for distributed storage systems. P is a must in a distributed system, and when the network or another component fails, A and C cannot both be satisfied, hence the idea of "pick two out of three" and the choice between AP and CP.
"Pick two out of three" misleads many newcomers into thinking a system is either CP or AP. In fact that choice only applies in the extreme case; it is not a simple black-or-white decision.
First, because partitions rarely occur, there is no reason to sacrifice C or A when the system has no partition. Second, the trade-off between C and A can be made repeatedly, at very fine granularity, within the same system, and each decision may vary with the specific operation or even the specific data or user involved. So the balance can sit anywhere between 0 and 100%.
For example, take a deployment with one primary and two replicas. It guarantees AP, but the speed of network synchronization determines how much CP it can deliver, so this capability varies between 0 and 100% with the network environment.
When conditions are good, both are basically satisfied. When the network breaks, a choice has to be made; but once synchronization has completed, a replica is promoted to primary and service resumes.
Of course, if you want to guarantee CP 100%, you need strong consistency, and then when one machine goes down, strong consistency can no longer be achieved and CP cannot be met.
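The one-primary, two-replica example can be sketched as a toy simulation. This is only an illustration of the trade-off, not SequoiaDB's replication code: the names Node, ReplicaGroup, required_acks and stale_replicas are all hypothetical, and real replication involves logs, elections and timeouts that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node holding a single value; `up` marks whether it is reachable."""
    name: str
    up: bool = True
    value: object = None


@dataclass
class ReplicaGroup:
    """Hypothetical group with one primary and any number of replicas."""
    primary: Node
    replicas: list = field(default_factory=list)

    def write(self, value, required_acks):
        """Apply a write if at least `required_acks` replicas can confirm it.

        required_acks=0 favours availability (the write always succeeds),
        required_acks=len(replicas) demands strong consistency.
        """
        reachable = [r for r in self.replicas if r.up]
        if len(reachable) < required_acks:
            # Too few replicas can confirm: refuse the write (C chosen over A).
            return False
        self.primary.value = value
        for replica in reachable:
            replica.value = value
        return True

    def stale_replicas(self):
        """Names of replicas whose value lags behind the primary."""
        return [r.name for r in self.replicas if r.value != self.primary.value]


group = ReplicaGroup(Node("primary"), [Node("r1"), Node("r2")])
group.write("v1", required_acks=2)   # healthy network: C and A both hold
group.replicas[1].up = False         # one machine goes down
strict = group.write("v2", required_acks=2)   # strong consistency: rejected
relaxed = group.write("v2", required_acks=0)  # availability first: accepted
print(strict, relaxed, group.stale_replicas())
```

With every node up, the strict write succeeds and nothing is stale. Once r2 is down, the same strict write is refused, while the relaxed write goes through and leaves r2 behind until it catches up, which is the per-operation choice between C and A described above.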
ACID is the most important characteristic of traditional relational databases: Atomicity, Consistency, Isolation, and Durability. It emphasizes consistency and belongs to CP.
BASE stands for Basically Available, Soft state, and Eventually consistent. Soft state is a transitional state, for example a replica node in a distributed system whose synchronization is lagging. From the BASE point of view, high availability is guaranteed first. It belongs to AP.
How to balance the two depends on the business logic, user needs, and business requirements.
3. SequoiaDB now comes in a Community Edition and an Enterprise Edition. Is the Community Edition stripped down? What scenarios is each edition aimed at?
The database kernel source code is the same in the open source Community Edition and the Enterprise Edition. Compared with the Enterprise Edition, the Community Edition is limited only in visual operations management and in the Spark SQL part.
The other difference is professional services: the Enterprise Edition includes some of our dedicated debugging tools, which makes it easier to service, but this has little impact on community users.
4. How does SequoiaDB integrate with Spark? If queries are not ordinary SQL, which method is used?
Spark is suitable for complex queries with low concurrency and large data volumes.
SequoiaDB has developed a dedicated connector for Spark that supplies Spark with raw data. Spark SQL syntax largely follows Hive SQL syntax, so you can basically write it by consulting the Hive SQL syntax.
5. What is the difference between Spark SQL and ordinary SQL? Is it a different syntax or a different language?
Spark SQL is a language similar to standard SQL, so there is basically no barrier to writing it. But it is not meant for OLTP (On-Line Transaction Processing) after all, so some syntax is not supported and it amounts to a subset of SQL.
6. Where is the bottleneck of SequoiaDB?
At present the biggest bottleneck is the isolation mechanism. SDB mainly pursues high availability and high performance. Of the ACID and BASE mechanisms, SDB's advantages are more obvious on the BASE side.
ACID and BASE drove the development of relational databases and NoSQL respectively, and NewSQL is now looking for a better balance: supporting ACID as far as possible on a highly available foundation.
For today's businesses, high-availability scenarios far outnumber OLTP scenarios.
7. What is the difference between NoSQL and NewSQL?
We have to start with NoSQL. The internet era brought huge numbers of users, so high availability became very important, and the BASE mechanism was introduced.
BASE stands for Basically Available, Soft state, and Eventually consistent. Soft state is a transitional state, for example a replica node in a distributed system whose synchronization is lagging. From the BASE point of view, high availability is guaranteed first. It belongs to AP in CAP and is mainly designed to ensure convenient business iteration and high availability. NewSQL builds on BASE and tries to satisfy ACID as far as possible.
8. Does SequoiaDB support distributed file storage?
Yes. SequoiaDB has a product, SequoiaCM, dedicated to unstructured storage. It is distributed in the same way, but the storage structure is different: it is block storage.
At present SDB supports two storage structures: the BSON structure and a block storage structure. SDB calls block storage LOB (Large Object), and it supports storing unstructured files of any size. So you can use the SDB LOB feature to store files: a file is cut into blocks of 256 KB (the default) and stored distributed across the whole database cluster, and the data redundancy depends on how the data partition groups are set up.
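A minimal sketch of the idea of cutting a file into 256 KB blocks and spreading them over a cluster might look like this. It does not use the SequoiaDB LOB API: split_into_pieces and place_pieces are hypothetical helpers, and the hash-based placement rule is an assumption made for illustration, not the database's actual distribution algorithm.

```python
import hashlib

DEFAULT_PIECE_SIZE = 256 * 1024  # 256 KB, the default block size mentioned above


def split_into_pieces(data: bytes, piece_size: int = DEFAULT_PIECE_SIZE):
    """Cut a byte string into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def place_pieces(lob_id: str, pieces, group_count: int):
    """Map each piece index to a partition group number.

    Hypothetical rule: hash "<lob_id>:<index>" and take it modulo the
    number of groups, so pieces of one file spread over the cluster.
    """
    placement = {}
    for index in range(len(pieces)):
        digest = hashlib.md5(f"{lob_id}:{index}".encode()).digest()
        placement[index] = int.from_bytes(digest[:4], "big") % group_count
    return placement


data = b"x" * (600 * 1024)                 # a 600 KB stand-in for a file
pieces = split_into_pieces(data)
placement = place_pieces("video-001", pieces, group_count=3)
print([len(p) for p in pieces])            # two full pieces and a shorter tail
print(placement)                           # piece index -> partition group
```

Reading the file back is then a matter of fetching the pieces in index order and concatenating them. Because the pieces live on different groups, reads and writes of one large file can be served by several machines at once.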
Related reading: "SequoiaDB Solutions" ECM Enterprise Content Management
9. Does SDB currently have an official Docker image?
There is no official Docker image at the moment.
The main reason is that, from a performance standpoint, Docker requires a lot of care in memory management and I/O usage. However, we will consider releasing a Docker image from the perspective of small projects and ease of use.
We recently published an application image on QingCloud, and we are also considering Docker-based deployment for other clouds.
10. If I store video files, how will performance change?
If users keep writing large unstructured files to SDB, the change in performance will depend mainly on your servers' I/O.
Do you have more questions about SDB or distributed databases? Add "Shanshan" as a friend and let's talk ~