Introduction to Crawlers (8): Using MongoDB from Python

Source: Internet
Author: User
Tags bulk insert mongoclient

MongoDB Connectivity and data access

MongoDB is a cross-platform, document-oriented NoSQL database that offers high performance, high availability, and easy scalability.
It is built around a few key concepts: databases, collections, and documents.
We will not cover MongoDB's features and general usage here; interested readers can consult the official documentation.
This article focuses on how to connect to MongoDB from Python and work with the database through PyMongo.
We assume MongoDB is already installed; for an installation tutorial, see:
http://www.yiibai.com/mongodb/mongodb_environment.html
Thanks to the authors of that tutorial.

Installing PyMongo

At the time of writing, the latest version is 3.5.1. Do not install the standalone bson package separately: PyMongo ships its own bson module, and the standalone package is not compatible with it.
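
To check what is actually installed, something like the following may help. It is only a sketch using the standard library: installed_version is a hypothetical helper name, and it assumes the standalone package is distributed under the name "bson".

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


pymongo_version = installed_version("pymongo")
# Assumption: the conflicting standalone package is distributed as "bson".
standalone_bson = installed_version("bson")

print("pymongo:", pymongo_version or "not installed")
if standalone_bson is not None:
    print("warning: standalone 'bson' distribution found:", standalone_bson)
```

If the warning appears, removing the standalone package and reinstalling PyMongo is the usual way out.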

Using MongoClient to establish a connection

When using PyMongo, the first step is to create a MongoClient for the running mongod instance, with the following code.
Before testing the code, make sure the MongoDB service is running; otherwise the connection will fail.

from pymongo import MongoClient
client = MongoClient()
# This connects to the default host and port; you can also specify the host and port explicitly.
from pymongo import MongoClient
# client = MongoClient()
client = MongoClient('localhost', 27017)
# client = MongoClient('mongodb://localhost:27017/')
# Any of the forms above works.
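
If the host and port come from configuration, the URI form can be assembled with a small helper. This is a sketch, not part of PyMongo: build_mongo_uri is a hypothetical name, and the credentials and the db.example.com host are assumptions that the article does not use.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a mongodb:// URI; credentials are percent-encoded when given."""
    if username is not None and password is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    else:
        credentials = ""
    return f"mongodb://{credentials}{host}:{port}/"


print(build_mongo_uri())
# Hypothetical host and credentials, for illustration only.
print(build_mongo_uri("db.example.com", 27017, "crawler", "p@ss/word"))
```

The resulting string can be passed to MongoClient() in the same way as the literal URI above.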

Get database

Once the connection is established, we can access a database.
The first way is attribute-style access on the client instance, that is, client.database_name.
Suppose our database is named pyTest:

db = client.pyTest

The second way is dictionary-style access:

db = client['pyTest']

Get collection

With a database handle, we can get one of its collections. A collection is similar to a table in an SQL database and is where documents are stored. Again there are two ways; assume the pyTest database has a collection called first.

collection = db.first
# collection = db['first']

Note that, unlike in traditional SQL databases, collections and databases in MongoDB are created lazily: they only come into existence when the first document is inserted into a collection.

Document (data)

In MongoDB, a stored record is called a document and is kept in BSON format. In PyMongo, a document is represented as a dictionary, for example a document describing a blog post:

import datetime
post = {"author": "xingzhui",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow()}
# As you can see, a document is a dictionary made of key-value pairs; if one key has several values, wrap them in [].
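
When a crawler builds many such documents, a small factory function keeps them consistent. The following is a sketch only: make_post is a hypothetical helper, and it uses a timezone-aware timestamp because datetime.utcnow() is deprecated since Python 3.12.

```python
import datetime


def make_post(author, text, tags):
    """Return a blog-post document as a plain dict (hypothetical helper)."""
    return {
        "author": author,
        "text": text,
        "tags": list(tags),  # several values under one key go in a list
        "date": datetime.datetime.now(datetime.timezone.utc),
    }


post = make_post("xingzhui", "My first blog post!", ["mongodb", "python", "pymongo"])
print(sorted(post))
```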

Insert Document

To insert a single document into a collection, use the insert_one() method.
To insert several documents at once, use the insert_many() method.
The two methods take similar parameters. Here is an example, assuming all the steps so far have succeeded.

# Get the collection named posts in the pyTest database; it is created if it does not exist.
posts = db.posts
# Insert post into posts and get the primary key (_id) of the inserted document.
post_id = posts.insert_one(post).inserted_id
print("post id is ", post_id)
# The same statement split in two. Run one version or the other: insert_one() adds _id to post,
# so inserting the same dictionary a second time raises DuplicateKeyError.
result = db.posts.insert_one(post)
print(result.inserted_id)

Here is an insert_many() example:

>>> db.test.count()
0
>>> result = db.test.insert_many([{'x': i} for i in range(2)])
>>> result.inserted_ids
[ObjectId('54f113fffba522406c9cc20e'), ObjectId('54f113fffba522406c9cc20f')]
>>> db.test.count()
2

Note that if a document does not contain an _id field, one is added automatically, and the value of _id must be unique within the collection.
To list all the collections in the database, use the following code:

# Passing False excludes system collections.
# From PyMongo 3.6 on, db.list_collection_names() is the replacement; collection_names() was removed in PyMongo 4.
cur_collection = db.collection_names(False)
print(cur_collection)

Get a single document using find_one()

The most basic query in MongoDB is find_one(), which returns a single document matching the query, or None if there is no match. Use find_one() when you know there is only one matching document, or when you only care about the first match.
Here's an example:

# Get the first document; it is the dictionary inserted earlier, plus an _id field.
post_first = db.posts.find_one()
print(post_first)
# Pass a query to match specific fields, for example documents whose author is xingzhui.
post_xingzhui = db.posts.find_one({'author': 'xingzhui'})
print(post_xingzhui)
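
Because find_one() returns None when nothing matches, calling code should check for that before indexing into the result. The sketch below shows one way to do so. first_author is a hypothetical helper, and FakeCollection is a hypothetical in-memory stand-in (top-level equality filters only) so that the example runs without a MongoDB server; with PyMongo you would pass db.posts instead.

```python
from typing import Any, Mapping, Optional


def first_author(collection: Any, query: Mapping[str, Any]) -> Optional[str]:
    """Return the author of the first match, or None when nothing matches."""
    doc = collection.find_one(query)
    if doc is None:  # find_one() returns None instead of raising
        return None
    return doc.get("author")


class FakeCollection:
    """Hypothetical stand-in for a collection; not part of PyMongo."""

    def __init__(self, docs):
        self._docs = list(docs)

    def find_one(self, query=None):
        query = query or {}
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None


posts = FakeCollection([{"_id": 1, "author": "xingzhui", "text": "My first blog post!"}])
print(first_author(posts, {"author": "xingzhui"}))
print(first_author(posts, {"author": "nobody"}))
```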

Query by ObjectId

We can also look up a post by its _id. This is especially convenient when we assign the _id ourselves, for example:

post = {"_id": 200,
        "author": "Suifeng",
        "text": "This is the My first post!",
        "tags": ["Shell", "Pymongo"],
        "date": datetime.datetime.utcnow()}
post_id = db.posts.insert_one(post).inserted_id
print(post_id)
post_user = db.posts.find_one({'_id': post_id})
print('By Post ID:', post_user['author'])
# Output:
# 200
# By Post ID: Suifeng
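
For a crawler, one possible way to choose your own _id is to derive it from the page URL, so that crawling the same page twice maps to the same key. This is a sketch of that idea rather than anything the article prescribes: url_to_id is a hypothetical helper, the example.com URL is a placeholder, and SHA-1 is used here only as a stable fingerprint.

```python
import hashlib


def url_to_id(url):
    """Map a URL to a stable 40-character hex string usable as _id."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


# Placeholder URL, for illustration only.
page = {"_id": url_to_id("https://example.com/post/1"), "author": "Suifeng"}
print(len(page["_id"]))
```

Since _id must be unique within the collection, inserting a page with an _id that already exists fails instead of creating a duplicate.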

Bulk insert with insert_many()

To make more complex queries possible, let's insert a few more documents into the posts collection.
Besides inserting one document at a time, insert_many() inserts multiple documents with a single command.
For example:

new_posts = [{"_id": 1000,
              "author": "Curry",
              "text": "Another post!",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2019, 11, 12, 11, 14)},
             {"_id": 1001,
              "author": "Maxsu",
              "title": "MongoDB is fun",
              "text": "and pretty easy too!",
              "date": datetime.datetime(2019, 11, 10, 10, 45)}]
result = db.posts.insert_many(new_posts)
print('Bulk inserts result is:', result.inserted_ids)

Note that:
The result of insert_many() exposes two _id values through inserted_ids, one per inserted document. Here they are the integers we supplied; they are ObjectId instances only when the _id is generated automatically.
Also, the second post has a title field instead of the tags field, and it is still inserted without complaint.
This is what it means for MongoDB to be schema-free.
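
A crawler often collects far more than two documents, and it can be convenient to send them in batches of a fixed size. The following sketch shows only the batching step: chunked is a hypothetical helper, and the batch size of 2 is arbitrary.

```python
def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


docs = [{"x": i} for i in range(5)]
batches = list(chunked(docs, 2))
print([len(batch) for batch in batches])  # [2, 2, 1]
```

Each batch could then be passed to insert_many() in turn.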

Querying multiple documents

To retrieve more than one document, use the find() method. find() returns a Cursor instance that lets you iterate over all matching documents.

for post in db.posts.find():
    print(post)

Similarly, we can pass a filter to find(), for example:

for post in db.posts.find({'author': 'xingzhui'}):
    print(post)
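
The filter is an ordinary dictionary, so it can be assembled from optional conditions before it is passed to find(). A minimal sketch follows; build_filter is a hypothetical helper, and the field names are the ones used by the posts in this article.

```python
def build_filter(author=None, tag=None):
    """Build a filter dict for find(); omitted arguments add no condition."""
    query = {}
    if author is not None:
        query["author"] = author
    if tag is not None:
        # Equality on an array field matches documents whose array contains the value.
        query["tags"] = tag
    return query


print(build_filter(author="xingzhui"))
print(build_filter(author="xingzhui", tag="python"))
print(build_filter())  # an empty filter matches every document
```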
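
Counting

If you only want to know how many documents match a query, you can call count() instead of running a full query.
You can count all the documents in a collection, or only those that match a specific query:

# From PyMongo 3.7 on, use db.posts.count_documents({}) and
# db.posts.count_documents({"author": "xingzhui"}); count() was removed in PyMongo 4.
print(db.posts.count())
print(db.posts.find({"author": "xingzhui"}).count())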

That covers the MongoDB basics a crawler is likely to need; next we will start on actual crawlers.
Over the next month I plan to cover crawling static pages, dynamic pages, pages behind login verification, and other cases. I hope you find it interesting.
Let's make progress together!
