Java Crawler Engineer Skill List (Reposted)

Source: Internet
Author: User
Tags: zookeeper, zookeeper client

First, skill list

1. Master Java, especially network programming; read Li Gang's Java fundamentals book at least three times

2. Familiar with HTML, JS, Ajax, Firebug
3. Web page deduplication; identifying the distinguishing features of a site
4. Distributed systems
5. Multithreading
6. Relational databases: MySQL, Oracle, SQL Server; MyBatis
7. Regular expressions, CSS selectors, XPath
8. DNS caching
9. TCP/IP and HTTP protocols, HTTP 2.0
10. Web login protocols; SSO and OAuth principles

11. Anti-crawling strategies
12. Familiar with HttpClient, OkHttp3 ...
13. Familiar with extraction tools: Jsoup, Selenium WebDriver ...
14. Search technology: familiar with Lucene, Nutch, Heritrix, Solr, Elasticsearch
15. Familiar with XML, JSON, and the SOAP protocol
16. MongoDB, Redis, HBase, Hadoop
17. Text analysis, machine learning, data mining, natural language processing (NLP)
18. Accurate extraction of data from web pages, Weibo, Tieba, forums, and similar sources
19. RPC protocols
20. Netty, NIO
21. HtmlUnit, PhantomJS, SlimerJS, CasperJS
22. Proxy deployment schemes: HTTP/SOCKS
23. Nginx, Squid, Jetty
24. Hacking iOS
25. CAPTCHAs, OCR, Tess4J
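
Item 3 in the list above (web page deduplication) can be illustrated with a small sketch. This is only one possible approach, assuming exact-duplicate detection by content fingerprint: the page text is normalized, hashed with SHA-256, and the digest is kept in an in-memory set. The class name `PageDeduplicator` and its methods are hypothetical and used for illustration only; they do not come from any library named in the list.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: remembers fingerprints of pages already seen. */
class PageDeduplicator {
    // Fingerprints of every page accepted so far (in memory only).
    private final Set<String> seen = new HashSet<>();

    /** Collapse whitespace and lowercase so trivial formatting differences are ignored. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Hex-encoded SHA-256 digest of the normalized text. */
    static String fingerprint(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true the first time this content is seen, false for a duplicate. */
    boolean markIfNew(String pageText) {
        return seen.add(fingerprint(pageText));
    }

    public static void main(String[] args) {
        PageDeduplicator dedup = new PageDeduplicator();
        System.out.println(dedup.markIfNew("Hello   World")); // first sighting
        System.out.println(dedup.markIfNew(" hello world ")); // same text after normalization
        System.out.println(dedup.markIfNew("Another page"));  // different content
    }
}
```

A fingerprint of this kind only catches pages whose normalized text is identical. Near-duplicate detection (for example, pages that differ only in ads or timestamps) would need a similarity-based technique instead, and a distributed crawler would likely keep the set in a shared store rather than in process memory.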

Second, crawler tools

1. PhantomJS

2. BerserkJS (an improved version based on PhantomJS)

3. SlimerJS

4. CasperJS

5. Selenium

Third, Java-related

Common IDEs: IntelliJ IDEA, Eclipse, NetBeans

Web development: Tomcat, Resin, Jetty, WebLogic, etc.; common components: Struts, Spring, Hibernate

Netty: an asynchronous, event-driven network application framework for high-concurrency network programming (an NIO framework)

MINA: simplifies the development of high-performance, highly reliable network applications (also an NIO framework); many mobile game servers are built with it

jOOQ: a Java ORM framework

Activiti: a workflow engine, similar to jBPM and Snaker

Prefuse: a user interface toolkit for presenting structured and unstructured data as interactive visualizations

Gephi: open source software for complex network analysis, used mainly for interactive visualization and exploration of networks, complex systems, and dynamic and hierarchical graphs

Nutch: a well-known crawler project; Hadoop grew out of it

Web-Harvest: a web data extraction tool

POM tools: Maven + Artifactory

Netflix Curator: a ZooKeeper client library open-sourced by Netflix that simplifies ZooKeeper client programming

Akka: a concurrency framework based on the actor model
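
To give a feel for the actor model mentioned in the Akka entry, here is a minimal sketch that uses only the Java standard library. It is not Akka's API: the class `CounterActor` and its `tell`, `ask`, and `stop` methods are hypothetical names chosen for illustration. The idea shown is that an actor owns private state and processes messages one at a time from a mailbox, so callers never touch the state directly and no explicit locking is needed.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical mini-actor: private state plus a mailbox drained by a single thread. */
class CounterActor {
    // A single-threaded executor plays the role of the mailbox:
    // messages are queued and processed one at a time, in submission order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // State is read and written only from the mailbox thread.
    private int count = 0;

    /** Fire-and-forget message: record one crawled page. Returns immediately. */
    void tell(String url) {
        mailbox.execute(() -> count++);
    }

    /** Request-reply message: waits until earlier messages are processed, then returns the count. */
    int ask() {
        try {
            return mailbox.submit(() -> count).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stop accepting new messages; queued ones still run. */
    void stop() {
        mailbox.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    actor.tell("http://example.com/page"); // placeholder URL
                }
            });
            senders[i].start();
        }
        for (Thread t : senders) {
            t.join();
        }
        System.out.println(actor.ask()); // total messages processed so far
        actor.stop();
    }
}
```

A real actor framework adds much more on top of this idea, such as supervision, addressing, and distribution across machines, so the sketch should be read as an explanation of the concept rather than a substitute.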
