A supplementary chapter to the Python crawler series: installing the MongoDB and Redis databases on the master node without using the apt method.
Storing crawled data in a fixed table structure turned out to be both troublesome and redundant,
so I want to try a non-relational database and see how well it works for storage.
I don't plan to include Redis in the comparison here, because it is an in-memory database that is good at caching and at statistics and classification over small data sets
(memcached is the other major player in that area). Redis will be used alongside other applications to improve efficiency.
The main comparison here is the performance difference between MongoDB and MySQL, specifically in application environments with complex relationship networks.
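To make the redundancy point concrete, here is a minimal Python sketch. The record layout (url, title, links, tags) is a hypothetical crawl result, not the actual schema of this crawler. It compares keeping one crawled page as a single nested document, the shape a document database such as MongoDB stores, with splitting the same page across relational tables.

```python
import json

# Hypothetical crawl result; the field names are assumptions for illustration.
page = {
    "url": "http://example.com/a",
    "title": "Page A",
    "links": ["http://example.com/b", "http://example.com/c"],
    "tags": ["news", "tech"],
}

def to_document(page):
    """Document style: the whole page is one self-contained JSON record."""
    return json.dumps(page, sort_keys=True)

def to_tables(page):
    """Relational style: one pages row plus one row per link and per tag,
    each child row repeating the url as its foreign key."""
    return {
        "pages": [(page["url"], page["title"])],
        "links": [(page["url"], link) for link in page["links"]],
        "tags": [(page["url"], tag) for tag in page["tags"]],
    }

tables = to_tables(page)
print(to_document(page))
print(sum(len(rows) for rows in tables.values()), "rows across", len(tables), "tables")
```

The nested form keeps the lists inside the record, while the table form needs a row and a repeated key for every list element. Which one is better still depends on how the data is queried.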
apt-cache depends package # show the packages that this package depends on
apt-cache rdepends package # show which packages depend on this package
sudo apt-get build-dep package # install the build environment needed to compile the package
apt-get source package # download the source code of the package
sudo apt-get clean && sudo apt-get autoclean # clear the archive of downloaded files && clear only outdated packages
sudo apt-get check # check for broken dependencies
In addition, if a package installation is suddenly interrupted, run:
sudo rm /var/lib/dpkg/updates/*
The usual cause is that the information in the /var/lib/dpkg/updates folder is incorrect, which makes the software update program fail, so those files have to be deleted completely.
The sudo apt-get update command re-creates this information, and sudo apt-get upgrade then uses the refreshed package details to update the installed software to the latest versions.
sudo apt-get update # update the package sources
sudo apt-get upgrade # update installed packages; this is different from sudo apt-get dist-upgrade, which upgrades the system
Accidental damage to dpkg and apt-get
First remove the lock files:
sudo rm /var/cache/apt/archives/lock
sudo rm /var/lib/dpkg/lock
Then run the update commands above again.
Install MongoDB first:
See the installation document on the official MongoDB website: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
That page shows how to uninstall an old version and how to install with apt-get.
I strongly recommend installing with the apt method!
Don't do what I did; it was a painful lesson. Follow the apt tutorial on the official MongoDB website to install MongoDB!
I didn't choose the apt method because, as a newbie, I found that apt scatters the files across various system folders and I can't tell where they end up.
Besides, I only want to run MongoDB as my own user, not as root.
So I installed it by extracting the tarball instead.
After downloading and extracting the package, I found that it contains no conf file; everything in it is a compiled program binary.
1. curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.0.5.tgz
2. tar -zxvf mongodb-linux-x86_64-3.0.5.tgz
3. mkdir -p mongodb-3.0.5/
cp -R -n mongodb-linux-x86_64-3.0.5/ ~/mongodb-3.0.5/
4. export PATH=<mongodb-install-directory>/bin:$PATH
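Running
1. Create the data directory:
mkdir -p /data/db
2. Make sure the user that runs mongod has read and write permissions on the data directory, for example:
sudo chmod -R 777 /home/luis/mongodb-3.0.5/data/db/
3. Start the server:
mongod --dbpath <path to data directory>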
The next step is some auxiliary work.
Modify the maximum number of connections:
Edit the configuration file /etc/security/limits.conf.
Run sudo gedit /etc/security/limits.conf.
Add the following lines:
* soft nofile 3000
* hard nofile 20000
root soft nofile 3000
root hard nofile 20000
* means the setting applies to all users. The wildcard does not cover root, so the two root lines must be added explicitly.
The hard limit is the maximum number of files that can be open at the same time, and is generally chosen according to the system's hardware resources (mainly memory). The soft limit is set within the hard limit, so the soft value cannot be higher than the hard value.
nofile means the maximum number of open files.
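The soft/hard relationship described above can be checked with a short Python sketch. The helper names parse_limits and soft_within_hard are made up for this illustration, and the parser only handles the simple four-column integer form used in this article, not the full limits.conf syntax. The last line uses the standard resource module to read the limits of the current process.

```python
import resource

# The four lines from this article: <domain> <type> <item> <value>
SAMPLE = """\
* soft nofile 3000
* hard nofile 20000
root soft nofile 3000
root hard nofile 20000
"""

def parse_limits(text):
    """Return {(domain, item): {"soft": n, "hard": n}} for integer values only."""
    limits = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # this sketch skips "unlimited" and malformed lines
        domain, kind, item, value = parts
        limits.setdefault((domain, item), {})[kind] = int(value)
    return limits

def soft_within_hard(limits):
    """True when every entry that has both values satisfies soft <= hard."""
    return all(
        v["soft"] <= v["hard"]
        for v in limits.values()
        if "soft" in v and "hard" in v
    )

limits = parse_limits(SAMPLE)
print(limits[("root", "nofile")])
print("soft within hard:", soft_within_hard(limits))

# (soft, hard) limits of the current process; the soft value is what
# `ulimit -n` reports by default.
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```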
Restart the computer (the change takes effect only after a restart) and run ulimit -a to check:
open files (-n) 3000
The setting has taken effect. Start the MongoDB server again and the problem is solved.
Then set MongoDB to start at boot.
Create a script file named mongodb in the /etc/init.d/ directory:
#!/bin/sh
### BEGIN INIT INFO
# Provides: mongodb
# Required-Start:
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: mongodb
# Description: mongo db server
### END INIT INFO

. /lib/lsb/init-functions

PROGRAM=/home/luis/mongodb-3.0.5/bin/mongod
MONGOPID=`ps -ef | grep 'mongod' | grep -v grep | awk '{print $2}'`

test -x $PROGRAM || exit 0

case "$1" in
  start)
    ulimit -n 3000
    log_begin_msg "Starting MongoDB server"
    $PROGRAM --fork --quiet -journal -maxConns=2400 -rest --dbpath /home/luis/mongodb-3.0.5/data/db --logpath /home/luis/mongodb-3.0.5/data/db/journal/mongodb.log
    log_end_msg 0
    ;;
  stop)
    log_begin_msg "Stopping MongoDB server"
    if [ ! -z "$MONGOPID" ]; then
      kill -15 $MONGOPID
    fi
    log_end_msg 0
    ;;
  status)
    ;;
  *)
    log_success_msg "Usage: /etc/init.d/mongodb {start|stop|status}"
    exit 1
esac

exit 0
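The script finds the server's PID with the pipeline ps -ef | grep 'mongod' | grep -v grep | awk '{print $2}'. As a rough illustration of what that pipeline does, the Python sketch below applies the same filtering to a captured sample. The sample ps output, its PIDs, and the helper name find_pids are made up for this example.

```python
# Made-up `ps -ef` output for illustration; PIDs and users are hypothetical.
SAMPLE_PS = """\
UID        PID  PPID  C STIME TTY          TIME CMD
luis      1234     1  0 10:00 ?        00:00:05 /home/luis/mongodb-3.0.5/bin/mongod --fork --quiet
luis      2345  2000  0 10:05 pts/0    00:00:00 grep mongod
root       999     1  0 09:59 ?        00:00:01 /usr/sbin/sshd -D
"""

def find_pids(ps_output, pattern="mongod"):
    """Mirror: grep pattern | grep -v grep | awk '{print $2}'."""
    pids = []
    for line in ps_output.splitlines():
        # keep lines that mention the pattern, but not the grep process itself
        if pattern in line and "grep" not in line:
            pids.append(line.split()[1])  # the second column is the PID
    return pids

print(find_pids(SAMPLE_PS))
```

A substring match like this also catches any other command line that contains mongod, for example a path ending in mongodb, so the result may hold more than one PID.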
Run sudo chmod +x /etc/init.d/mongodb to make the script executable.
Run the following command to register the boot script:
update-rc.d mongodb defaults
You can also remove it with update-rc.d -f mongodb remove
After a restart, you can see the auto-started service process with ps -def | grep mongod. After that, you can stop and start the service with the following commands.
sudo service mongodb stop
sudo service mongodb start
Client login to the server
With the server started as described above, now test from another terminal whether it is working properly.
Go to /usr/local/mongodb-linux-x86_64-2.0.2/bin and run ./mongo
The following appears:
MongoDB shell version: 2.0.2
connecting to: test
Execute db.foo.save({1: "hello world"})
Then query it with db.foo.find();
{ "_id" : ObjectId("4e4b395986738efa2d0718b9"), "1" : "hello world" }
Congratulations! MongoDB has been installed successfully.
You can also connect to a remote MongoDB server in the following way. The default port is 27017. For example:
./mongo 192.168.30.25
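Before trying a remote login it can help to confirm that the port is reachable at all. The sketch below is a generic TCP check written with Python's standard socket module; the helper name is_port_open is made up here, and a successful connection only shows that something is listening on the port, not that it is a healthy MongoDB server.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, or timed out
        return False

if __name__ == "__main__":
    # 192.168.30.25 is the example address from the text; 27017 is the default port.
    print(is_port_open("192.168.30.25", 27017))
```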
Create a database
If the mydb database does not exist, run the following command on the client:
use mydb
This switches the current database to mydb; the database itself is created once data is written to it.
Until then, show dbs does not list the database name. Run db.stats() to check the status of the current database.
Standard inspection process
1. First run ulimit -a
Check whether open files (-n) is set.
If it is set in the boot script, this step can be skipped; carry out the following four steps.
2. ps -def | grep mongod
Check whether the service is started
3. cd /data/db/journal/
cat mongodb.log
Check whether the server log looks correct
4. Go to http://192.168.1.199:28017
Check whether the server started normally
5. Go to /usr/mongodb/bin and run ./mongo
Check whether you can log in
Install Redis:
See the official website: http://redis.io/download
Installation
$ wget http://download.redis.io/releases/redis-3.0.3.tar.gz
$ tar xzf redis-3.0.3.tar.gz
$ cd redis-3.0.3
$ make
The compiled binaries are now available in the src directory. Run Redis with:
$ src/redis-server
You can interact with Redis using the built-in client:
$ src/redis-cli
redis> set foo bar
OK
redis> get foo
"bar"
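For the curious, the set/get exchange above travels over the wire in the Redis serialization protocol (RESP). The Python sketch below shows how such a command is encoded and how the two replies seen above are decoded. The helper names are made up for this illustration, and only the reply types from this session are handled; it is not a full client.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

def decode_simple_reply(raw):
    """Decode the two reply kinds seen above: a simple string and a bulk string."""
    if raw.startswith(b"+"):  # simple string, e.g. +OK\r\n
        return raw[1:].split(b"\r\n", 1)[0].decode("utf-8")
    if raw.startswith(b"$"):  # bulk string, e.g. $3\r\nbar\r\n
        header, rest = raw[1:].split(b"\r\n", 1)
        length = int(header)
        # a length of -1 marks a missing key
        return None if length < 0 else rest[:length].decode("utf-8")
    raise ValueError("reply type not handled in this sketch")

print(encode_command("SET", "foo", "bar"))
print(decode_simple_reply(b"+OK\r\n"))
print(decode_simple_reply(b"$3\r\nbar\r\n"))
```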