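Multithreading is almost unavoidable when writing a crawler in Java. We need to crawl several websites at the same time, and crawling even one site with a single thread is far too slow. A single crawler thread also spends most of its time waiting on the network, which leaves the CPU underused. Once you have multiple threads, you also need some way to control them, usually a thread pool. When I built our current crawling framework, I used java.util.concurrent.ExecutorService as the thread pool. Code using ExecutorService looks roughly like this: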
The java.util.concurrent.Executors class provides a number of static factory methods for creating thread pools.
1. A fixed-size thread pool:
package BackStage;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class JavaThreadPool {
    public static void main(String[] args) {
        // Create a reusable thread pool with a fixed number of threads (2)
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // The pool only needs Runnable objects, so there is no need to extend Thread
        // (a Thread passed to execute() is never started itself; a pool thread
        // just calls its run() method)
        Runnable t1 = new MyTask();
        Runnable t2 = new MyTask();
        Runnable t3 = new MyTask();
        Runnable t4 = new MyTask();
        Runnable t5 = new MyTask();
        // Hand the tasks to the pool; at most 2 of them run at the same time
        pool.execute(t1);
        pool.execute(t2);
        pool.execute(t3);
        pool.execute(t4);
        pool.execute(t5);
        // Stop accepting new tasks; already-submitted tasks still run to completion
        pool.shutdown();
    }
}

class MyTask implements Runnable {
    @Override
    public void run() {
        // Prints the name of the pool thread running this task, e.g. pool-1-thread-1
        System.out.println(Thread.currentThread().getName() + " is running...");
    }
}
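Executors offers more than newFixedThreadPool. The sketch below, assuming Java 8+ for lambdas, tries three other factory methods that exist in java.util.concurrent.Executors: newCachedThreadPool, newSingleThreadExecutor and newScheduledThreadPool. The class name ExecutorsFactoryDemo and the helper runTasks are hypothetical, made up for this illustration. Unlike the example above, it also calls awaitTermination so the caller can wait until the submitted tasks have finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorsFactoryDemo {
    // Hypothetical helper: submits `tasks` trivial jobs, shuts the pool down,
    // waits (up to 10 s) for them to finish and returns how many actually ran.
    public static int runTasks(ExecutorService pool, int tasks) {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                System.out.println(Thread.currentThread().getName() + " is running...");
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // no new tasks accepted; queued tasks still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 2. Cached pool: creates threads on demand and reuses idle ones
        //    (idle threads are discarded after 60 seconds)
        System.out.println(runTasks(Executors.newCachedThreadPool(), 5));
        // 3. Single-thread executor: tasks run one at a time, in submission order
        System.out.println(runTasks(Executors.newSingleThreadExecutor(), 5));
        // 4. Scheduled pool: can run tasks after a delay or periodically
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        scheduler.schedule(() -> System.out.println("delayed task"), 100, TimeUnit.MILLISECONDS);
        scheduler.shutdown(); // by default, already-scheduled delayed tasks still run
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

For a crawler, a cached pool grows without bound under load, so the fixed-size pool shown earlier is usually the safer default when the number of concurrent connections must be limited.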
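Later I found that ExecutorService was not as capable as I had hoped: at best it is just a container for threads. So I switched to java.lang.ThreadGroup. ThreadGroup has several advantages. The most important is that you can enumerate the threads in a group and so tell which threads have finished and which are still running. Code using ThreadGroup looks like this: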
class MyThread extends Thread {
    // Set by myStop(); checked by the loop in run()
    private boolean stopped;

    MyThread(ThreadGroup tg, String name) {
        super(tg, name); // place this thread in the given group
        stopped = false;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " starting.");
        try {
            for (int i = 1; i < 1000; i++) {
                System.out.print(".");
                Thread.sleep(250);
                synchronized (this) {
                    if (stopped)
                        break;
                }
            }
        } catch (InterruptedException exc) {
            // Thrown by sleep() when tg.interrupt() is called in main()
            System.out.println(Thread.currentThread().getName() + " interrupted.");
        }
        System.out.println(Thread.currentThread().getName() + " exiting.");
    }

    // Ask the thread to stop cooperatively at its next loop iteration
    synchronized void myStop() {
        stopped = true;
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        ThreadGroup tg = new ThreadGroup("My Group");

        MyThread thrd = new MyThread(tg, "MyThread #1");
        MyThread thrd2 = new MyThread(tg, "MyThread #2");
        MyThread thrd3 = new MyThread(tg, "MyThread #3");

        thrd.start();
        thrd2.start();
        thrd3.start();

        Thread.sleep(1000);

        // activeCount() is an estimate of the live threads in the group
        System.out.println(tg.activeCount() + " threads in thread group.");

        // Copy the live threads into an array; enumerate() returns how many it
        // actually copied, which can be fewer if some threads finished meanwhile
        Thread[] thrds = new Thread[tg.activeCount()];
        int n = tg.enumerate(thrds);
        for (int i = 0; i < n; i++)
            System.out.println(thrds[i].getName());

        // Stop only the first thread
        thrd.myStop();

        Thread.sleep(1000);

        System.out.println(tg.activeCount() + " threads in tg.");
        // Interrupt every remaining thread in the group so the program can exit
        tg.interrupt();
    }
}
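From the code above, ThreadGroup has the following advantages over ExecutorService:
1. A ThreadGroup can enumerate its threads, so you can tell which threads have finished and which are still running.
2. ThreadGroup.activeCount() tells you roughly how many threads are still running (its Javadoc describes the value as an estimate), so you can use it to cap how many new threads you add.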
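Point 2 can be sketched as a minimal crawler loop, assuming Java 8+: it starts one thread per URL inside a ThreadGroup and polls activeCount() before each start, so that no more than maxActive threads run at once. ThrottledCrawler, crawlAll, fetch and pause are hypothetical names, and fetch only sleeps instead of downloading anything.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledCrawler {
    // Counts how many (simulated) fetches have completed
    public static final AtomicInteger FETCHED = new AtomicInteger();

    // Starts one thread per URL in a ThreadGroup, waiting whenever the group's
    // activeCount() has already reached maxActive.
    // Returns the highest activeCount() observed right after a start.
    public static int crawlAll(List<String> urls, int maxActive) {
        ThreadGroup group = new ThreadGroup("crawlers");
        int peak = 0;
        for (String url : urls) {
            // activeCount() is documented as an estimate; we simply poll it
            while (group.activeCount() >= maxActive) {
                pause(10);
            }
            Thread t = new Thread(group, () -> fetch(url), "crawler-" + url);
            t.start();
            peak = Math.max(peak, group.activeCount());
        }
        // Wait until every thread in the group has finished
        while (group.activeCount() > 0) {
            pause(10);
        }
        return peak;
    }

    // Hypothetical stand-in for real crawling work: sleeps to simulate network I/O
    static void fetch(String url) {
        pause(50);
        FETCHED.incrementAndGet();
        System.out.println(Thread.currentThread().getName() + " fetched " + url);
    }

    // Thread.sleep without the checked exception; restores the interrupt flag
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("a.example", "b.example", "c.example", "d.example", "e.example");
        int peak = crawlAll(urls, 2);
        System.out.println("peak active = " + peak + ", fetched = " + FETCHED.get());
    }
}
```

Polling activeCount() keeps the sketch close to the ThreadGroup approach described here. Because the count is only an estimate, a crawler that needs a hard limit might instead use a fixed-size pool or a java.util.concurrent.Semaphore.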