Learn more about the SequoiaDB database and how to connect to it from Python

As the company's needs grow more complex and varied and its massive data business expands rapidly, we need to provide efficient services while reducing equipment and program maintenance costs. Enough grand talk: put plainly, I need to fetch a large amount of data from the SequoiaDB database and do not yet know how, so I need to study hard. Along the way I am taking notes for future reference.

Here I start from the basics of the SequoiaDB database: its history, its performance, its deployment, and so on.

From a rough survey of what is available online, BAT (Baidu, Alibaba, Tencent) and other giants have their own NoSQL projects: some are developed in-house on top of open source projects, and some use MongoDB or another NoSQL database as the basis for a data analysis platform. In recent years, public cloud providers have also started to offer NoSQL services in addition to relational databases such as SQL Server, MySQL, and MariaDB (of course, these NoSQL databases are based on MongoDB).

BAT and other big internet companies have deep pockets, and mostly have a team of programmers adapt an open source database to their own business, so very few such products are ever released.

Thankfully, there are also NoSQL developers in China. Guangzhou-based SequoiaDB Software released the enterprise-class NoSQL database SequoiaDB, which is said to be a match for MongoDB in features and engineering. The name SequoiaDB used to be unfamiliar, but in recent years it has appeared often at big data conferences and forums; the company recently announced a new funding round, and announced at the ArchSummit conference that the product would be open sourced.

The company introduces its product as follows: SequoiaDB is a distributed, document-oriented NoSQL database, and the only product of its kind in the industry that supports both transactions and SQL. SequoiaDB can serve as a data source for Hadoop and Spark, handling a mixed load of real-time queries and analytics, or as a high-performance, flexible, easy-to-use database used directly by applications. Its customers already include well-known IT and internet companies and Fortune Global 500 companies.

One: Setting up the Python environment

1.1 Installing and deploying a cluster environment (this step was not run here; the environment had already been deployed)

Open a terminal and change into the directory where the SequoiaDB package is stored; mine is the home (~) directory.

The SequoiaDB installation package is downloaded as a tar.gz archive, which must be extracted first:

tar -zxvf sequoiadb-1.10-linux_x86_64-installer.tar.gz

This command extracts the compressed package to the current directory.

Run the extracted installer:

sudo ./sequoiadb-1.10-linux_x86_64-installer.run

This command installs SequoiaDB. In a desktop environment it runs as a graphical wizard by default; if you want the text-mode wizard, use the following command instead:

sudo ./sequoiadb-1.10-linux_x86_64-installer.run --mode text --sms false

I used the text-mode wizard. It asks for the following in turn:

    1. Language: Chinese or English; I chose Chinese;
    2. License agreement notice: if you have questions about the open source license, choose 2 to view the full text of the agreement;
    3. Confirm the license: choose Y;
    4. Installation directory: the default is /opt/sequoiadb;
    5. Database administrator account: the user name defaults to sdbadmin and the password defaults to sdbadmin; to set your own, enter them at the corresponding prompts;
    6. Cluster management service port: the default is 11790;
    7. Start on boot: I chose Y to enable it;
    8. OM Server installation and other confirmations: I chose Y to confirm the installation.

Finally, the wizard asks whether to continue with the installation, and you must choose Y. After you confirm, the following output appears:

Please wait while Setup installs SequoiaDB Server on your computer.
Installing
0% ______________ 50% ______________ 100%
###########################

  

Wait patiently for the installation to finish; when it completes, you are returned to the shell prompt.

Note: during installation, the installer creates a system user for the database administrator, which is used to manage and deploy the database.

At this point the database is installed correctly :)

After the installation completes, check the database service status by running service sdbcm status in the shell.

If all is well, the sdbcm process ID and the message "sdbcm is running." should appear:

tynia@milky:~$ service sdbcm status
4991
sdbcm is running.

  

If the sdbcm service has not started, you can run service sdbcm start manually to start it.

If it still fails, check the installation process for errors.

Once the sdbcm service status is correct, you can continue with the deployment.

(The steps above follow a blog post; for the specific deployment actions, see www.cnblogs.com/tynia/p/sequoiadb01.html)

1.2 Installing the SequoiaDB driver

I really do not understand why downloading the SequoiaDB driver requires me to register and enter personal information. Never mind; beggars can't be choosers. The download link is: http://download.sequoiadb.com/cn/index-cat_id-2#

My development language is Python, so I downloaded the Python driver (note that it is the Linux build).

For Windows, no driver development package has been released, so, well...

1.3 Database Operations

  Connecting to the database (connect)

    import pysequoiadb
    from pysequoiadb import client

    # Connect to the local db, using default argument values.
    host = 'localhost'
    port = 11810
    # user = '', password = ''
    db = client(host, port)
    # If no error occurs, the connection to the specified server succeeded.
    print('Connect success')
    db.disconnect()

(This tutorial follows the official SequoiaDB documentation.) This example connects to the local database on service port 11810, using an empty user name and password. Adjust the parameters to your own environment; for example, change db = client(host, port) in the code above to db = client('192.168.10.188', 11810). Once users have been created in the database, you must connect with a correct user name and password, or the connection fails.

  Creating collection spaces and collections

The following creates a collection space named "foo" and, inside it, a collection named "bar"; the data page size for collections in the collection space is 16k. A different data page size can be chosen to suit the actual situation. Once the collection is created, you can insert, delete, update, and query records in it.

    # Connect to db
    db = client("localhost", 11810)
    # Create collection space
    cs_name = 'foo'
    cs = db.create_collection_space(cs_name)
    # Create collection
    cl_name = 'bar'
    cl = cs.create_collection(cl_name)

  Inserting data (insert)

    # Create a dict object
    record = {"name": "Tom", "age": 24}
    oid = cl.insert(record)

record is the input parameter: the data to be inserted. The dict object is converted to BSON before it is inserted into the collection. oid is the ObjectId of the BSON record, returned when the record is inserted.

  Queries (query)

    import pysequoiadb
    from pysequoiadb import client
    from pysequoiadb.error import SDBEndOfCursor

    cr = cl.query()
    try:
        while True:
            try:
                record = cr.next()
                print(record)
            except SDBEndOfCursor:
                # No more records in the result set
                break
    finally:
        # Close the cursor once, after the loop has finished
        cr.close()

A query needs a cursor object to hold the query result locally, and the results are obtained through cursor operations. This example uses the cursor's next interface, which fetches one record from the query results. No query condition, selector, or sort order is specified here, and only the default index is used.

  Indexing (index)

    from collections import OrderedDict

    index_name = "index_name"
    # 1 means ascending, -1 means descending
    idx = OrderedDict([('name', 1), ('age', -1)])
    cl.create_index(idx, index_name, False, False)

This creates an index on the collection object cl, with "name" in ascending order and "age" in descending order.
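To make the ordering described by such an index definition concrete, here is a small pure-Python sketch. It does not talk to the database and says nothing about how SequoiaDB stores an index internally; the sample records and the helper name index_order are invented for illustration.

```python
from collections import OrderedDict

# Same shape as the index definition above: 1 = ascending, -1 = descending.
idx = OrderedDict([("name", 1), ("age", -1)])

# Invented sample records, for illustration only.
records = [
    {"name": "Tom", "age": 24},
    {"name": "Ann", "age": 31},
    {"name": "Tom", "age": 30},
]


def index_order(rows, index_def):
    """Return rows in the order described by index_def (hypothetical helper)."""
    ordered = list(rows)
    # Sort by the least significant field first. Python's sort is stable,
    # so fields listed earlier in index_def end up taking priority.
    for field, direction in reversed(list(index_def.items())):
        ordered.sort(key=lambda r, f=field: r[f], reverse=(direction == -1))
    return ordered


sorted_records = index_order(records, idx)
for r in sorted_records:
    print(r["name"], r["age"])
```

Records with the same name are grouped together, and within each name the older record comes first.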

  Updates (update)

    rule = {"$set": {"age": 19}}
    print(rule)
    cl.update(rule)

This updates records in the collection object cl. No matching condition is specified in the example, so it updates all records in the collection.
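The effect of a "$set" rule with and without a matching condition can be imitated on a plain Python list. This is only a toy model of the behaviour described above, not the driver's implementation; apply_set and the sample data are invented, and only equality conditions are handled.

```python
def apply_set(records, rule, condition=None):
    """Toy in-memory "$set" update; returns how many records were changed."""
    changes = rule["$set"]
    updated = 0
    for rec in records:
        # With no condition every record matches; otherwise all
        # condition fields must be equal to the record's fields.
        if condition is None or all(rec.get(k) == v for k, v in condition.items()):
            rec.update(changes)
            updated += 1
    return updated


people = [{"name": "Tom", "age": 24}, {"name": "Ann", "age": 31}]
n_all = apply_set(people, {"$set": {"age": 19}})
n_one = apply_set(people, {"$set": {"age": 20}}, condition={"name": "Ann"})
print(n_all, n_one, people)
```

The first call changes every record because no condition is given; the second changes only the record whose name is "Ann".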

1.4 SQL to SequoiaDB shell to Python

SequoiaDB queries are expressed as dict (BSON) objects. The following table takes SQL statements as examples and compares them with the corresponding SequoiaDB shell statements and SequoiaDB Python driver syntax.

SQL: insert into bar(a, b) values(1, -1)
SequoiaDB shell: db.foo.bar.insert({a: 1, b: -1})
Python driver:
    cl = db.get_collection("foo.bar")
    obj = {"a": 1, "b": -1}
    cl.insert(obj)

SQL: select a, b from bar
SequoiaDB shell: db.foo.bar.find(null, {a: "", b: ""})
Python driver:
    cl = db.get_collection("foo.bar")
    selected = {"a": "", "b": ""}
    cr = cl.query(selector=selected)

SQL: select * from bar
SequoiaDB shell: db.foo.bar.find()
Python driver:
    cl = db.get_collection("foo.bar")
    cr = cl.query()

SQL: select * from bar where age=20
SequoiaDB shell: db.foo.bar.find({age: 20})
Python driver:
    cl = db.get_collection("foo.bar")
    cond = {"age": 20}
    cr = cl.query(condition=cond)

SQL: select * from bar where age=20 order by name
SequoiaDB shell: db.foo.bar.find({age: 20}).sort({name: 1})
Python driver:
    cl = db.get_collection("foo.bar")
    cond = {"age": 20}
    orderby = {"name": 1}
    cr = cl.query(condition=cond, order_by=orderby)

SQL: select * from bar where age > 20 and age < 30
SequoiaDB shell: db.foo.bar.find({age: {$gt: 20, $lt: 30}})
Python driver:
    cl = db.get_collection("foo.bar")
    cond = {"age": {"$gt": 20, "$lt": 30}}
    cr = cl.query(condition=cond)

SQL: create index testindex on bar(name)
SequoiaDB shell: db.foo.bar.createIndex("testindex", {name: 1}, false)
Python driver:
    cl = db.get_collection("foo.bar")
    obj = {"name": 1}
    cl.create_index(obj, "testindex", False, False)

SQL: select * from bar limit 20 offset 10
SequoiaDB shell: db.foo.bar.find().limit(20).skip(10)
Python driver:
    cl = db.get_collection("foo.bar")
    cr = cl.query(num_to_skip=10L, num_to_return=20L)

SQL: select count(*) from bar where age > 20
SequoiaDB shell: db.foo.bar.find({age: {$gt: 20}}).count()
Python driver:
    cl = db.get_collection("foo.bar")
    count = 0L
    condition = {"age": {"$gt": 20}}
    count = cl.get_count(condition)

SQL: update bar set a=2 where b=-1
SequoiaDB shell: db.foo.bar.update({$set: {a: 2}}, {b: -1})
Python driver:
    cl = db.get_collection("foo.bar")
    condition = {"b": -1}
    rule = {"$set": {"a": 2}}
    cl.update(rule, condition=condition)

SQL: delete from bar where a=1
SequoiaDB shell: db.foo.bar.remove({a: 1})
Python driver:
    cl = db.get_collection("foo.bar")
    condition = {"a": 1}
    cl.delete(condition=condition)
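To get a feel for how a condition dict selects records, the following toy evaluator checks plain Python dicts against a condition. It is a sketch for understanding the dict syntax only, not the engine's matching logic; the function name matches is invented, and it supports nothing beyond plain equality and the $gt and $lt operators.

```python
def matches(record, condition):
    """Toy check of one record against a condition dict."""
    for field, expected in condition.items():
        if field not in record:
            return False
        value = record[field]
        if isinstance(expected, dict):
            # Operator form, e.g. {"$gt": 20, "$lt": 30}
            for op, bound in expected.items():
                if op == "$gt":
                    if not value > bound:
                        return False
                elif op == "$lt":
                    if not value < bound:
                        return False
                else:
                    raise ValueError("unsupported operator: " + op)
        elif value != expected:
            # Plain form, e.g. {"age": 20}
            return False
    return True


rows = [
    {"name": "Tom", "age": 24},
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 19},
]
cond = {"age": {"$gt": 20, "$lt": 30}}
selected = [r["name"] for r in rows if matches(r, cond)]
print(selected)
```

Only the record whose age lies strictly between the two bounds is selected.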
1.5 Python API

1 Interfaces added to the collection class

    alter: modify the properties of the collection
    enable_sharding: enable the partitioning feature on the collection
    disable_sharding: turn off the partitioning feature on the collection
    enable_compression: enable compression on the collection
    disable_compression: turn off compression on the collection
    set_attributes: modify the properties of the collection

2 Interfaces added to the collectionspace class

    alter: modify the properties of the collection space
    set_attributes: modify the properties of the collection space
    set_domain: modify the domain to which the collection space belongs
    remove_domain: remove the collection space from the domain to which it belongs

3 Interfaces added to the domain class

    add_groups: add data groups to the domain
    set_groups: set the data groups of the domain
    remove_groups: remove some data groups from the domain
    set_attributes: set the properties of the domain

  For details, please refer to: http://doc.sequoiadb.com/cn/index/Public/Home/document/300/api/python/html/index.html

Two: A preliminary understanding of the SequoiaDB database

  (The material in this section draws on an online blog post: 12106133)

SequoiaDB, described as the world's first enterprise-class, document-oriented, non-relational database, provides a platform that is highly scalable, highly available, high-performance, easy to maintain, and low-cost. The following looks at SequoiaDB from three angles: its features, its data model, and its system architecture.

2.1 SequoiaDB Features

1. Where a traditional relational database cannot scale out horizontally, SequoiaDB offers a solution: by partitioning data vertically and applying a new non-relational data model, it reduces the heavy cross-partition data exchange that bottlenecks traditional databases, and so achieves linear horizontal scalability.
2. SequoiaDB can keep multiple copies of every piece of user data in real time, guarding against downtime losses caused by server failures, data center failures, and human error, so that the system stays available online at all times.
3. SequoiaDB provides enterprises with user-friendly, complete management, maintenance, and monitoring interfaces, together with 24x7 telephone and on-site technical support, for full enterprise-level support.
4. SequoiaDB uses the JSON data model, which flexibly reduces the complexity of the relational model and brings the database closer to the application, greatly reducing application development and maintenance costs.
5. In a large-scale distributed environment, SequoiaDB provides eventual consistency guarantees for data, meeting users' needs for both timeliness and consistency.
6. SequoiaDB uses its sharding mechanism to separate reads from writes, so that front-end online applications and back-end data analysis run in parallel without interfering with each other, and it can be combined with Hadoop for analysis of massive data sets.

2.2 SequoiaDB Data Model

SequoiaDB does not use the traditional relational data model but the JSON data model. JSON stands for JavaScript Object Notation. It is a lightweight data interchange format that is easy for people to read and write and easy for machines to generate and parse; it is plain text and supports nested structures and arrays.

2.3 JSON is built on two structures:

1. A collection of key-value pairs. In this structure, each data element has a name and a value; the value can be a common type such as a number or a string, or a nested JSON object or array.

2. An array. Elements in an array have no names; each value can be a common type such as a number or a string, or a nested JSON object or array.
The two structures can be nested inside each other to form a typical nested data structure.
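A small illustration of such a nested structure, written with Python's standard json module. The document below is invented: its field names and values are assumptions for illustration and do not come from SequoiaDB.

```python
import json

# Invented document: field names and values are for illustration only.
doc = {
    "name": "Tom",
    "age": 24,
    "address": {"city": "Guangzhou", "zip": "510000"},   # nested object
    "phones": ["555-0100", "555-0101"],                  # array of strings
    "orders": [{"id": 1, "items": ["book", "pen"]}],     # array of objects
}

# Serialize to JSON text.
text = json.dumps(doc, indent=2, sort_keys=True)
print(text)

# Parsing the text gives back an equal Python structure.
restored = json.loads(text)
print(restored["orders"][0]["items"][1])
```

The key-value structure maps to a Python dict and the array structure maps to a Python list, which is why the driver can accept dict objects as records.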

  

2.4 SequoiaDB System Architecture

SequoiaDB uses a distributed architecture:

On the client (or application) side, local and/or remote applications are linked with the SequoiaDB client library. Local and remote clients communicate with the coordination node over TCP/IP.
The coordination node stores no user data; it acts purely as a request-dispatching node that forwards user requests to the appropriate data nodes.
The catalog node stores the system's metadata; the coordination node communicates with the catalog node to learn how data is actually distributed across the data nodes. One or more catalog nodes can form a replication group.
The data nodes store user data. One or more data nodes can form a replication group, and the data on the nodes of a replication group is synchronized with eventual consistency. Data replication groups are also called data shards, and the data stored in different shards does not overlap.
Each shard can contain one or more data nodes. When there are several, data is replicated asynchronously between them. A shard has at most one master node and any number of slave nodes. The master node handles both reads and writes; slave nodes handle reads only.

A slave node going offline does not affect the normal operation of the master node. When the master node goes offline, a new master node is automatically elected from the slave nodes to handle write requests.

When a node recovers, or a new node joins the shard, it synchronizes automatically, so that its data is consistent with the master node once synchronization completes.
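The master/slave behaviour described above can be walked through with a toy in-memory model. This is a sketch under strong assumptions, not SequoiaDB's replication protocol: the class names are invented, the log is treated as shared and durable, replication happens only when sync() is called, and the election simply picks the first online node.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = []        # records applied on this node so far


class ReplicaGroup:
    """Toy shard: one master accepts writes, slaves catch up later."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.master = self.nodes[0]
        self.log = []         # ordered list of all accepted writes

    def write(self, record):
        if self.master is None or not self.master.online:
            raise RuntimeError("no master node: write rejected")
        self.log.append(record)
        self.master.data.append(record)

    def sync(self):
        # Asynchronous replication step: every online node replays
        # the log entries it has not applied yet.
        for node in self.nodes:
            if node.online:
                node.data.extend(self.log[len(node.data):])

    def fail(self, name):
        for node in self.nodes:
            if node.name == name:
                node.online = False
        if not self.master.online:
            self.elect()

    def elect(self):
        # Simplification: bring survivors up to date, take the first one.
        self.sync()
        candidates = [n for n in self.nodes if n.online]
        self.master = candidates[0] if candidates else None

    def recover(self, name):
        for node in self.nodes:
            if node.name == name:
                node.online = True
        self.sync()           # the returning node catches up


group = ReplicaGroup(["n1", "n2", "n3"])
group.write({"a": 1})
stale = len(group.nodes[1].data)    # slave has not replicated yet
group.sync()
group.fail("n1")                    # master goes offline
group.write({"a": 2})               # handled by the newly elected master
group.recover("n1")                 # old master rejoins as a slave
print(stale, group.master.name, [len(n.data) for n in group.nodes])
```

Reading from a slave before sync() runs returns stale data, which is the window that eventual consistency allows; after synchronization all nodes hold the same records.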

The architecture within a single data node is as follows:

Within a data node, activity is carried out by engine dispatchable units (EDUs). Each node is one process in the operating system, and each EDU is one thread within the node. External user requests are handled by agent threads; for requests inside the cluster, the synchronization agent thread handles synchronization events within a shard, and the shard agent thread handles synchronization events between shards. All data writes go into the log buffer, which the logger writes to disk asynchronously. User data is written directly by the agent thread into the file system buffer pool, and the operating system then writes it asynchronously to the underlying disk.
These three aspects give a preliminary understanding of the SequoiaDB database and lay the theoretical groundwork for studying and applying it in more depth.

Three: Some basic concepts of the SequoiaDB database

(The basics in this section come from a blog post: 12106005)

3.1 Documents

Documents in SequoiaDB are in JSON format and are generally called records. Inside the database, JSON data is stored as BSON, a binary encoding. A document consists of one or more fields, and each field has two parts: a key and a value. Note that a BSON document may contain more than one field with the same name, but most SequoiaDB interfaces do not support duplicate field names; some documents created by SequoiaDB's internal programs may contain fields with duplicate names, but duplicate keys are never added to existing user documents.
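On the Python side, duplicate field names are a problem of their own: a dict cannot hold two entries with the same key, so the later value silently replaces the earlier one. The following sketch uses only the standard json module to show this and to detect duplicates before a document is built; find_duplicate_keys is an invented helper, not part of the SequoiaDB driver.

```python
import json


def find_duplicate_keys(text):
    """Return the names that occur more than once within a single JSON object."""
    duplicates = []

    def check(pairs):
        # pairs is the list of (key, value) tuples of one JSON object.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=check)
    return duplicates


raw = '{"name": "Tom", "age": 24, "name": "Tim"}'
plain = json.loads(raw)           # a plain dict keeps only the last "name"
print(plain)
print(find_duplicate_keys(raw))
```

Checking for duplicates up front avoids sending a record whose content differs from what the JSON text appeared to say.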

3.2 Collections

A collection is a logical object that holds documents in a SequoiaDB database. Every document belongs to exactly one collection.

3.3 Collection Spaces

A collection space is the physical object that holds collections in the database. Every collection belongs to exactly one collection space. Each collection space corresponds to one file on the data node.

3.4 Database Server

SequoiaDB is a document-model, non-relational database server that provides software services for managing information securely and efficiently. A database server is a computer with the SequoiaDB database engine installed. The SequoiaDB engine is the basic unit of data access; in the distributed architecture, each database engine exists as a node, and data is not shared between nodes. On a single computer, each SequoiaDB database engine corresponds to one database path, and all the collection spaces in that database are placed in that directory. The database path contains one or more collection spaces. Each database engine can contain at most 4,096 collection spaces.

3.5 Index

In SequoiaDB, an index is a special data object. The index itself is not a container for user data; rather, as a special kind of metadata, it improves the efficiency of data access. Each index must be built on a collection, and a collection can have at most 64 indexes. An index can be thought of as the data sorted by one or more given fields, in which records matching a user-specified query condition can be found quickly. SequoiaDB indexes use a B-tree structure.
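The idea of an index as "data sorted by a field, searched quickly" can be sketched with a sorted list and binary search. A sorted list stands in for the B-tree here purely for illustration; the sample records and the function range_lookup are invented, and nothing below reflects SequoiaDB's on-disk index format.

```python
import bisect

records = [
    {"name": "Tom", "age": 24},
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 19},
    {"name": "Eve", "age": 27},
]

# The "index": (key, position) pairs kept sorted by key.
# It holds no record data itself, only where each record lives.
age_index = sorted((rec["age"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in age_index]


def range_lookup(low, high):
    """Return records with low < age < high via binary search on the index."""
    start = bisect.bisect_right(keys, low)   # first key strictly above low
    end = bisect.bisect_left(keys, high)     # first key at or above high
    return [records[pos] for _, pos in age_index[start:end]]


found = range_lookup(20, 30)
print([r["name"] for r in found])
```

The lookup touches only the matching slice of the index instead of scanning every record, which is the benefit an index on "age" gives a range query.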

3.6 Transactions

A transaction is a logical unit of work made up of a series of operations. Within one session (or connection), only one transaction may be open at a time: once the user starts a transaction in a session, no new transaction can be started until that one ends. A transaction executes as one complete unit of work, so its operations either all succeed or all fail. Only insert, update, and delete operations are covered by a SequoiaDB transaction; other operations performed during a transaction are not part of it, which means these non-transactional operations are not rolled back when the transaction is rolled back. If a table or table space (that is, a collection or collection space) contains data involved in a transaction, it cannot be dropped. By default, the transaction feature is turned off.
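The two rules above (one open transaction per session, all-or-nothing rollback) can be imitated with a toy in-memory session. This is an illustrative model only: ToySession and its methods are invented names, and it takes a full snapshot on begin, which is not how a real database implements transactions.

```python
import copy


class ToySession:
    """In-memory session over one collection; at most one open transaction."""

    def __init__(self):
        self.rows = []
        self._snapshot = None     # copy of rows taken when a transaction begins

    def begin(self):
        if self._snapshot is not None:
            raise RuntimeError("a transaction is already open in this session")
        self._snapshot = copy.deepcopy(self.rows)

    def insert(self, row):
        self.rows.append(row)

    def commit(self):
        self._snapshot = None     # keep the changes

    def rollback(self):
        if self._snapshot is None:
            raise RuntimeError("no open transaction")
        self.rows = self._snapshot   # discard everything done since begin()
        self._snapshot = None


s = ToySession()
s.insert({"a": 1})                # outside any transaction
s.begin()
s.insert({"a": 2})
try:
    s.begin()                     # second transaction in the same session
    nested_allowed = True
except RuntimeError:
    nested_allowed = False
s.rollback()
print(nested_allowed, s.rows)
```

After the rollback only the record inserted before begin() remains, and the attempt to open a second transaction in the same session is refused.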

3.7 Eventual consistency policy

To improve data reliability and to separate reads from writes, SequoiaDB uses an "eventual consistency" policy for data within a replication group. With read/write separation, the data that is read may not be the latest for a period of time, but it eventually becomes consistent.

3.8 Read/write separation

In SequoiaDB, all write requests are sent only to the master node; if there is no master node, the data group cannot process write requests.

3.9 Clusters

A SequoiaDB cluster combines multiple database servers to compute in parallel and so handle data requests more efficiently. A cluster provides high-performance data access, ensures high availability of data, and lets the database scale horizontally.

3.10 Operating mode

The operating mode is whether the SequoiaDB service is started in standalone mode or in cluster mode. Standalone mode is the leanest way to start SequoiaDB: only one data node needs to be started to provide data service. (Standalone mode is generally recommended for development environments, to reduce hardware requirements.) Cluster mode is the standard way to start SequoiaDB and requires at least three nodes.

3.11 Nodes

Catalog node: a logical node that stores the database's metadata and no other user data. Apart from the catalog nodes, no node in the cluster keeps global metadata on disk. When a node other than the catalog node needs to access data on other nodes, it first looks for the collection information in its local cache and, if it is not there, fetches it from the catalog node. The catalog node communicates with other nodes mainly over the catalog service port.
Coordination node: also a logical node, and it stores no user data. As the coordinator of data requests, it does not take part in matching or reading data; it only dispatches each request to the data nodes that need to process it. The coordination node communicates with other nodes mainly over the partition service port.
Data node: also a logical node, and the one that stores user data. A data node holds no dedicated catalog information of its own, so it must request a collection's metadata from the catalog node before it accesses that collection for the first time. In standalone mode, the data node is a self-contained service provider that communicates directly with the application or client and does not need to access any catalog information.

3.12 Partition Groups

Also known as a replication group. A partition group can contain one or more data nodes (or catalog nodes), and the nodes use an asynchronous log replication mechanism to keep their data eventually consistent. All nodes in a partition group communicate over the replication service port and periodically send each other heartbeat messages to verify their state. Each node in a partition group is in one of two states: master node (readable and writable; all written data is written synchronously to the log file, and the log records in the log file are written asynchronously to the slave nodes) or slave node (read-only; all data written on the master node is written asynchronously to the slave node, so there may be temporary inconsistencies between a slave node and the master node, but the replication mechanism guarantees that the data is eventually consistent).

3.13 Data Partitioning

In a SequoiaDB cluster, users typically store data across different logical and physical nodes in order to compute in parallel. Since every node within a partition group holds exactly the same data, each partition group is called a "partition"; the data in different partitions is independent and is not shared.
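One common way to spread records over partitions is to hash a key and use the result to pick a partition group. The sketch below shows that generic technique with CRC32; it is an assumption for illustration, not a description of how SequoiaDB assigns records to partitions, and the group names and the function pick_partition are invented.

```python
import zlib


def pick_partition(key, partitions):
    """Map a key to one partition group using a stable hash (CRC32 here)."""
    h = zlib.crc32(str(key).encode("utf-8"))
    return partitions[h % len(partitions)]


groups = ["group1", "group2", "group3"]   # hypothetical partition group names

# Place ten invented user ids; each id lands in exactly one partition.
placement = {}
for user_id in range(10):
    placement.setdefault(pick_partition(user_id, groups), []).append(user_id)
print(placement)
```

Because the hash is stable, the same key always maps to the same partition group, so a request for that key only needs to be sent to one group.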
