To sum up briefly, the key areas are storage and message processing (asynchronous vs. blocking, queues, message middleware).
Job requirements for reference
Job responsibilities of a data crawler engineer:
1. Distributed web crawler R&D: continuously improve the existing crawl system by splitting and optimizing its fetching, parsing, scheduling, and storage modules; shape it into a targeted crawler engine with local service characteristics; keep improving and iterating on it, and promote building it out as an open service;
2. Crawl-data demand support: continuously meet the needs of refined operations; while keeping the crawling system steadily improving, complete the daily crawling and parsing tasks and take responsibility for data stability;
3. Understanding of asynchronous processing or message-based processing patterns; familiarity with, and hands-on project use of, the Twisted framework or message middleware (such as RabbitMQ or ActiveMQ) is a plus (see the message-queue sketch after this list);
4. Skilled in using relational databases (such as MySQL or PostgreSQL) or NoSQL databases (such as MongoDB and Redis); proficient in at least one of them, having used it across multiple projects and formed your own lessons and insights (see the Redis sketch after this list).
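Since item 3 names message middleware, here is a minimal sketch of how a crawler might hand tasks through RabbitMQ using the pika client. This is an illustration under assumptions, not any company's actual setup: the broker address (localhost) and the queue name (task_queue) are hypothetical.

```python
# Minimal RabbitMQ sketch with the pika client (pip install pika).
# Assumes a broker on localhost; the queue name "task_queue" is illustrative.
# In practice the producer and consumer run as separate processes.
import pika

# --- Producer: push a crawl task onto the queue ---
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_queue", durable=True)  # survive broker restarts
channel.basic_publish(
    exchange="",
    routing_key="task_queue",
    body=b"https://example.com/page-to-crawl",
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)
connection.close()

# --- Consumer: process tasks as they arrive ---
def on_message(ch, method, properties, body):
    print("crawling:", body.decode())
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after the work is done

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_queue", durable=True)
channel.basic_consume(queue="task_queue", on_message_callback=on_message)
channel.start_consuming()  # blocks, dispatching messages to on_message
```

The point of the queue is decoupling: the scheduler can enqueue URLs faster than workers consume them, and either side can be restarted or scaled without touching the other.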
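For item 4, here is an equally small sketch of Redis as crawl-side storage, using the redis-py client. Again an assumption-laden illustration: the server on localhost:6379 and the key names are hypothetical.

```python
# Minimal Redis sketch with the redis-py client (pip install redis).
# Assumes a Redis server on localhost:6379; key names are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Strings: cache a crawled page body with a 1-hour expiry.
r.set("page:https://example.com", "<html>...</html>", ex=3600)
print(r.get("page:https://example.com"))

# Lists: a simple crawl frontier (push URLs in, workers pop them out).
r.lpush("frontier", "https://example.com/a", "https://example.com/b")
print(r.rpop("frontier"))
```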
Other references:
A. Familiarity with common libraries; proficiency in Django architecture and development and in commonly used third-party packages; familiarity with RESTful API design and use, and with non-blocking I/O and asynchronous I/O techniques (see the asyncio sketch below).
B. Python fundamentals: familiarity with I/O, multithreading, and other basic techniques.
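To make the asynchronous-I/O point in item A concrete, here is a minimal sketch of concurrent fetching with asyncio plus the third-party aiohttp package. The URLs are placeholders and error handling is omitted; it is a sketch of the technique, not a production fetcher.

```python
# Minimal asynchronous-I/O sketch with asyncio + aiohttp (pip install aiohttp).
# URLs are placeholders; error handling and timeouts are omitted for brevity.
import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # Non-blocking HTTP GET: the event loop runs other tasks while we wait.
    async with session.get(url) as resp:
        return await resp.text()

async def main() -> None:
    urls = ["https://example.com/a", "https://example.com/b"]
    async with aiohttp.ClientSession() as session:
        # gather() drives all fetches concurrently on a single thread.
        pages = await asyncio.gather(*(fetch(session, u) for u in urls))
    for url, page in zip(urls, pages):
        print(url, len(page))  # length of each response body

asyncio.run(main())
```

Unlike the multithreading mentioned in item B, this handles many slow network waits on one thread, which is why it suits I/O-bound crawling.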
PS: As a full-time developer you spend a lot of effort on the business itself, such as on-board debugging of embedded devices, back-end debugging for big data, and debugging the front-end interface display. You may also take some detours in choosing and using tools; after all, no choice is perfect in every respect. Give yourself some confidence as you catch up.
Some knowledge points related to Python data processing (learning points)