Python crawler data saved to MongoDB

Source: Internet
Author: User
Tags: MongoClient, MongoDB, connection string

MongoDB is a non-relational database written in C++. It is an open-source database system based on distributed file storage. Data is stored as documents similar to JSON objects, whose field values can themselves be other documents, arrays, or arrays of documents.

Before using it, we need to make sure MongoDB is installed and the service is started. Since we want to save Python data into it, we also need to install the Python PyMongo library: run `pip install pymongo` to complete the installation. In the Python interpreter, you can view the version information through `pymongo.version`.
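As a sketch of this check (the helper name `pymongo_status` is my own), the following guards the import so it also works on machines where PyMongo is not yet installed:

```python
import importlib.util

def pymongo_status():
    """Return the installed PyMongo version, or an installation hint."""
    if importlib.util.find_spec('pymongo') is None:
        return 'pymongo is not installed; run: pip install pymongo'
    import pymongo
    return 'pymongo version ' + pymongo.version

print(pymongo_status())
```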

  

Step 1: Connect to MongoDB

We connect through MongoClient in the pymongo library. The first parameter, host, is the address of MongoDB; the second parameter, port, is the port (27017 by default if the parameter is not passed).

client = pymongo.MongoClient(host='127.0.0.1', port=27017)

Another way is to pass a MongoDB connection string directly, which begins with mongodb://.

client = pymongo.MongoClient('mongodb://127.0.0.1:27017/')
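When the connection string carries credentials, characters such as @ and / must be percent-encoded. A minimal sketch using only the standard library (the username and password here are hypothetical):

```python
from urllib.parse import quote_plus

# Hypothetical credentials; '@' and '/' must be percent-encoded in a MongoDB URI
user = 'webuser'
password = 'p@ss/word'
uri = 'mongodb://%s:%s@127.0.0.1:27017/' % (quote_plus(user), quote_plus(password))
print(uri)  # mongodb://webuser:p%40ss%2Fword@127.0.0.1:27017/
```

The resulting string can then be passed to MongoClient just like the plain URI above.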

Step 2: Select a database and collection

In MongoDB, you can build multiple databases, each of which contains many collections, similar to tables in a relational database. There are two ways to select a database, both of which work the same way.

db = client.test    # Test Database
db = client['test']

After choosing a good database we need to specify the collection to manipulate, similar to the database selection.

p = db.persons    # persons collection
p = db['persons']

Step 3: Insert data

person = {'ID': '00001', 'name': 'ABC', 'age': 19}
result = p.insert(person)
print(result)

This adds a piece of data through the collection's insert() method, which returns the value of the _id attribute generated automatically during insertion; this value is unique. In addition, we can insert more than one piece of data at a time by passing the documents in a list.

person = {'ID': '00001', 'name': 'ABC', 'age': 19}
person1 = {'ID': '00002', 'name': 'DFG', 'age': 20}
result = p.insert([person, person1])
# insert_many() is recommended instead; its result.inserted_ids lists the _id values of the inserted data
print(result)

Step 4: Query data

To query data we can use the find_one() or find() method: find_one() returns a single matching result, while find() returns a cursor over all matching results.

res = p.find_one({'name': 'ABC'})  # query the person named ABC; returns a dictionary
print(res)

find() is used to query multiple pieces of data and returns a cursor; we have to iterate over it to get all the results.

res = p.find({'age': 20})  # query documents in the collection whose age is 20
# res = p.find({'age': {'$gt': 20}})  # query documents whose age is greater than 20
print(res)
for r in res:
    print(r)

In addition, we can also query through regular expression matching.

res = p.find({'name': {'$regex': '^A.*'}})  # query documents whose name begins with A

To count the total number of results in a query, use the count() method (newer PyMongo versions use count_documents() on the collection instead).

count = p.find().count()  # count all documents in the collection

For sorting, call the sort() method directly, passing the ascending or descending flag as required.

res = p.find().sort('age', pymongo.ASCENDING)  # sort by age; pymongo.ASCENDING means ascending, pymongo.DESCENDING means descending

When we only need some of the elements, we can use the skip() method to offset the first few positions and get the remaining results after the offset.

res = p.find({'name': {'$regex': '^A.*'}}).skip(2)
print([r['name'] for r in res])  # prints names beginning with A, starting from the third result

Step 5: Update data

We update data using the update() method, specifying the update condition and the data that needs to be updated.

where = {'name': 'ABC'}
res = p.find_one(where)
res['age'] = 25
result = p.update(where, res)  # update_one() or update_many() is recommended instead
print(result)

The returned data is in dictionary form, {'ok': 1, 'nModified': 1, 'n': 1, 'updatedExisting': True}, where ok indicates that execution succeeded and nModified represents the number of documents affected.

In addition, we can use the $set operator to update the data. With $set, only the fields listed in the dictionary are updated; other fields are neither changed nor deleted. Without $set, the whole document is replaced, and any existing fields not included in the new data are deleted.
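A pure-Python sketch (no MongoDB required) of this difference, where the plain dictionaries stand in for a stored document and the two update outcomes:

```python
# 'doc' stands for a document already stored in the collection
doc = {'_id': 1, 'name': 'ABC', 'age': 19, 'city': 'Beijing'}

# update(where, {'age': 25}) without $set replaces the whole document body;
# fields not mentioned are lost:
replaced = {'_id': doc['_id'], 'age': 25}

# update(where, {'$set': {'age': 25}}) only changes the listed fields;
# the others are kept:
set_updated = {**doc, 'age': 25}

print(replaced)      # {'_id': 1, 'age': 25}
print(set_updated)   # {'_id': 1, 'name': 'ABC', 'age': 25, 'city': 'Beijing'}
```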

where = {'age': {'$gt': 20}}
result = p.update_many(where, {'$inc': {'age': 1}})  # add 1 to the age of every document in the collection older than 20
print(result)
print(result.matched_count, result.modified_count)  # number of documents matched and number modified

Step 6: Delete data

To delete data, call the remove() method, specifying a delete condition.

result = p.remove({'name': 'ABC'})  # delete documents whose name is ABC
# delete_one() and delete_many() are recommended instead; after execution, result.deleted_count gives the number of documents deleted

The return value is a dictionary: {'ok': 1, 'n': 1}.

In addition, we can also operate on indexes: create_index() creates a single index, create_indexes() creates multiple indexes, drop_index() deletes an index, and so on.

Reference: Quiet Blog https://cuiqingcai.com/5584.html

