Cola is a distributed crawler framework written in Python, designed to make distributed deployment easier. It still has rough edges, but it is worth exploring. This article walks through configuring the Cola runtime environment in detail. Before you begin, make sure the system has a working build toolchain (gcc, make, autoconf, and so on). You may also need to install python-dev first:
sudo apt-get install python-dev
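As a quick sanity check before continuing, the build tools mentioned above can be looked up on the PATH. The following is only a minimal sketch: the names `REQUIRED_TOOLS`, `find_in_path`, and `missing_tools` are made up for illustration and are not part of Cola.

```python
import os

# Tools named in the text above; extend the list as needed.
REQUIRED_TOOLS = ["gcc", "make", "autoconf"]


def find_in_path(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if find_in_path(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All listed build tools were found.")
```

If anything is reported missing, install it through apt before moving on to the next step.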
1. Install MongoDB
Cola stores its data in MongoDB, a document-oriented database, so MongoDB must be set up first. The method below is a convenient way to deploy it.
MongoDB is developed and maintained by 10gen. First, add the 10gen repository to apt:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/10gen.list
sudo apt-get update
After completing the preceding steps, you can install the latest stable version of MongoDB:
sudo apt-get install mongodb-10gen
After the installation is complete, the MongoDB service is automatically started.
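One way to confirm that the service really is running is to try a TCP connection to it. The sketch below assumes MongoDB is listening on localhost at its default port, 27017; if your configuration differs, change the two constants. The helper name `port_open` is hypothetical.

```python
import socket

# Assumption: MongoDB runs locally on its default port, 27017.
MONGO_HOST = "localhost"
MONGO_PORT = 27017


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    if port_open(MONGO_HOST, MONGO_PORT):
        print("Something is listening on port %d." % MONGO_PORT)
    else:
        print("Nothing is listening on port %d." % MONGO_PORT)
```

An open port only shows that some process accepts connections there; it does not prove that the process is MongoDB or that it is healthy.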
2. Install pip
pip is a Python package management tool. Cola depends on many Python libraries, so to simplify the later steps we set up pip first. pip itself depends on setuptools, so setuptools must be installed before pip. The steps are as follows:
wget https://bitbucket.org/pypa/setuptools/raw/0.7.5/ez_setup.py -O - | sudo python
sudo apt-get install curl
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
sudo python get-pip.py
3. Install the libraries Cola depends on
sudo apt-get install libyaml-dev
sudo pip install pyyaml
sudo pip install mechanize
sudo pip install python-dateutil
sudo pip install BeautifulSoup4
sudo pip install mongoengine
sudo easy_install rsa
git clone https://github.com/chineking/cola.git
The last command fetches the Cola source code. You can then run Cola in standalone or distributed mode; see https://github.com/chineking/cola/wiki
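Before running Cola, it can help to check that the Python dependencies import cleanly. The sketch below is an illustration, not something shipped with Cola. The import names in `COLA_MODULES` are assumptions about what the packages installed in step 3 expose (for example `yaml` for pyyaml, `dateutil` for python-dateutil, `bs4` for BeautifulSoup4); adjust the list to match what you actually installed.

```python
import importlib

# Assumed import names for the packages installed in step 3.
COLA_MODULES = ["yaml", "mechanize", "dateutil", "bs4", "mongoengine", "rsa"]


def missing_modules(names):
    """Return the module names from `names` that raise ImportError."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(COLA_MODULES)
    if missing:
        print("Could not import: " + ", ".join(missing))
    else:
        print("All listed modules imported successfully.")
```

Run it with the same Python interpreter that pip installed the packages for; a module reported as missing usually means the corresponding install command failed or targeted a different interpreter.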