Configuring the Distributed Crawler Framework Cola on Ubuntu 12.04

Source: Internet
Author: User
Tags: install, mongodb

Cola is a distributed crawler framework written in Python, designed to make distributed deployment easier. It still has rough edges, but it is worth exploring. This article walks through configuring the Cola runtime environment in detail. Before you begin, make sure the system has a working build toolchain (gcc, make, autoconf, etc.). You may also need to install python-dev first:

sudo apt-get install python-dev
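
Before moving on, it can help to confirm that the build tools named above are actually on the PATH. The following is only a minimal sketch: the helper names `find_tool` and `missing_tools` are made up for this illustration, and the tool list simply mirrors the one mentioned above.

```python
import os


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names):
    """Return the subset of names that could not be found on PATH."""
    return [name for name in names if find_tool(name) is None]


if __name__ == "__main__":
    # The list mirrors the tools named in the text; extend it as needed.
    absent = missing_tools(["gcc", "make", "autoconf"])
    print("missing: " + (", ".join(absent) or "none"))
```

If anything is reported missing, install it with apt-get before continuing.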

1. Install MongoDB
Cola stores its data in MongoDB, a document-oriented database, so MongoDB must be set up first. The steps below use a convenient package-based installation.

MongoDB is developed and maintained by 10gen. First, add the 10gen repository to apt's package sources:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/10gen.list

sudo apt-get update

After completing the preceding steps, you can install the latest stable version of MongoDB:

sudo apt-get install mongodb-10gen

After the installation is complete, the MongoDB service is automatically started.
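
One quick way to check that the service really is up is to try a TCP connection to it. The sketch below assumes MongoDB is listening on its default port 27017 on the local machine. The helper name `port_open` is hypothetical, and a successful connection only shows that something accepts connections on that port, not that it is a healthy mongod.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 27017 is MongoDB's default port; change it if mongod is configured differently.
    print("mongod reachable: %s" % port_open("127.0.0.1", 27017))
```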

2. Install pip
pip is a Python package management tool. Cola relies on many Python libraries, so to make the later installation steps easier we set up pip first. pip itself depends on setuptools, so setuptools must be installed before pip. The steps are as follows:

wget https://bitbucket.org/pypa/setuptools/raw/0.7.5/ez_setup.py -O - | sudo python

sudo apt-get install curl

curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py

sudo python get-pip.py
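
To confirm that both tools ended up on the PATH, you can ask each one for its version. This is a rough sketch rather than part of the original procedure: `tool_version` is a made-up helper that runs a command and returns whatever it prints, or None if the command is missing or exits with an error.

```python
import subprocess


def tool_version(command):
    """Run a version command (a list of arguments); return its output or None on failure."""
    try:
        output = subprocess.check_output(command, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None
    return output.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    for command in (["pip", "--version"], ["easy_install", "--version"]):
        print("%s -> %s" % (command[0], tool_version(command)))
```

A result of None for either tool suggests the corresponding installation step did not complete.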

3. Install the libraries Cola depends on
sudo apt-get install libyaml-dev

sudo pip install pyyaml

sudo pip install mechanize

sudo pip install python-dateutil

sudo pip install BeautifulSoup4

sudo pip install mongoengine

sudo easy_install rsa

git clone https://github.com/chineking/cola.git

The last command fetches the Cola source code. You can then run Cola in standalone or distributed mode; see: https://github.com/chineking/cola/wiki
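
Before running Cola, it may be worth checking that the libraries from step 3 import cleanly. The sketch below uses a hypothetical helper, `missing_modules`, and lists only the import names that are certain from the package names above (pyyaml imports as `yaml`, python-dateutil as `dateutil`, BeautifulSoup4 as `bs4`). Add the import names of the remaining libraries you installed.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names differ from package names: pyyaml -> yaml,
    # python-dateutil -> dateutil, BeautifulSoup4 -> bs4.
    absent = missing_modules(["yaml", "dateutil", "bs4", "rsa"])
    print("missing: " + (", ".join(absent) or "none"))
```

Any name reported as missing points to a pip or easy_install step that needs to be repeated.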

