Why Axel, a Linux multi-threaded download tool, stalls

1. What is Axel?
Axel is a multi-threaded download tool for Linux. Official website: http://axel.alioth.debian.org/

2. Problems I encountered
$> axel -a -n 10 -s 409600 "myurl"
Occasionally the download makes no progress for a period of time. The problem is hard to reproduce.

3. Axel source code logic

main()
{
    axel_new()
    {
        /* Send an HTTP GET request for 1 byte to learn the size of the file to download */
    }
    axel_open()
    {
        /* Assign each connection the start and end bytes of its part of the file,
           and open the file used to save the downloaded data */
        axel->outfd = open();
    }
    axel_start()
    {
        /* Create the thread pool */
        setup_thread()
        {
            /* The thread function returns only after the connection has been created,
               and no timeout is set for connect() */
            gethostbyname();
            connect();
            return;
        }
    }
    /* Register the SIGINT and SIGTERM signal handlers */
    while (the download is incomplete and no signal has been received)
    {
        axel_do()
        {
            /*
               Save the download state.
               Register the descriptors with select().
               If the thread pool has no valid connection, that is, all descriptors
               are invalid (axel->conn[0..n].fd <= 0), the threads are reclaimed or re-created.
               If a thread has returned, whether its connection succeeded or failed,
               the thread is reclaimed, and a thread is created again to perform the connect.
               If a thread does not return within the specified time, pthread_cancel()
               is used to drop it. However, pthread_cancel() does not take effect immediately.
               If a connection is readable, data is read from it and written to the
               corresponding position in the file.
               If a connection has not been readable for 45 seconds, it is closed,
               and a thread is created to reconnect in the next loop iteration.
            */
        }
    }
}
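The byte-range assignment described for axel_open() can be sketched roughly as follows. This is only an illustration under stated assumptions, not Axel's actual code: the names byte_range and split_ranges are hypothetical, and giving the remainder to the last connection is an assumed policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one connection's share of the file, as inclusive byte offsets. */
struct byte_range {
    long long first;
    long long last;
};

/* Split a file of `size` bytes into `n` contiguous ranges, one per connection.
 * The last range absorbs the remainder (an assumption for this sketch).
 * Returns 0 on success, -1 if the input cannot be split. */
int split_ranges(long long size, int n, struct byte_range *out)
{
    assert(out != NULL);
    if (size <= 0 || n <= 0 || n > size)
        return -1;

    long long chunk = size / n;
    for (int i = 0; i < n; i++) {
        out[i].first = (long long)i * chunk;
        out[i].last = (i == n - 1) ? size - 1 : out[i].first + chunk - 1;
    }
    return 0;
}
```

Each connection would then request only its own range, and the data it receives would be written at the matching offset of the output file.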

4. Analysis process
Use strace to trace the running Axel process:
strace -f -tt -p PID
Then look at the execution of individual threads:

16457 14:28:54.194584 clone(child_stack=0xb5e35494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb5e35bd8, {entry_number:6, base_addr:0xb5e35b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb5e35bd8) = 23423
23423 14:28:54.194938 set_robust_list(0xb5e35be0, 0xc <unfinished ...>
23423 14:28:54.195125 <... set_robust_list resumed> ) = 0
23423 14:28:54.195300 futex(0xb77bcde0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
16457 14:29:14.077986 tgkill(16457, 23423, SIGRTMIN <unfinished ...>
23423 14:29:14.078204 <... futex resumed> ) = ? ERESTARTSYS (To be restarted)
23423 14:29:14.078424 --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
23423 14:29:14.078676 madvise(0xb5635000, 8372224, MADV_DONTNEED <unfinished ...>
23423 14:29:14.078761 <... madvise resumed> ) = 0
23423 14:29:14.078916 _exit(0)          = ?

16457 14:28:54.195374 <... clone resumed> child_stack=0xb5634494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb5634bd8, {entry_number:6, base_addr:0xb5634b70, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb5634bd8) = 23424
23424 14:28:54.195524 set_robust_list(0xb5634be0, 0xc <unfinished ...>
23424 14:28:54.195666 <... set_robust_list resumed> ) = 0
23424 14:28:54.195877 futex(0xb77bcde0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
16457 14:29:14.078985 tgkill(16457, 23424, SIGRTMIN) = 0
23424 14:29:14.079124 <... futex resumed> ) = ? ERESTARTSYS (To be restarted)
23424 14:29:14.079353 --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
23424 14:29:14.079510 madvise(0xb4e34000, 8372224, MADV_DONTNEED <unfinished ...>
23424 14:29:14.079719 <... madvise resumed> ) = 0
23424 14:29:14.079944 _exit(0)          = ?
...

Found:

-> The newly created threads all stop at
|      futex(0xb77bcde0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
|
V
-> Apparently a deadlock.
|  Why?
V
-> Check the source code: it does not use any locking mechanism. The only thread functions called in the whole code are:
|      pthread_create()
|      pthread_join()
|      pthread_cancel()
|      pthread_setcancelstate()
|      pthread_setcanceltype()
|
V
-> So futex() is not called explicitly by the code. Some function must call futex() internally.
|  Which function calls futex()?
V
-> Trace an Axel thread that executes successfully:
|  10363 16:28:41 <... clone resumed> child_stack=..., flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=..., {entry_number:6, base_addr:..., limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb6b4cbd8) = 10370
|  10370 16:28:41 set_robust_list(0xb6b4cbe0, 0xc <unfinished ...>
|  10370 16:28:41 <... set_robust_list resumed> ) = 0
|  10370 16:28:41 futex(0x28ae68, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
|  10370 16:28:41 <... futex resumed> ) = 0
|  10370 16:28:41 open("/etc/resolv.conf", O_RDONLY <unfinished ...>
|  10370 16:28:41 <... open resumed> ) = 4
|  10370 16:28:41 fstat64(4, <unfinished ...>
|  10370 16:28:41 <... fstat64 resumed> {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
|  10370 16:28:41 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7711000
|  10370 16:28:41 read(4, "nameserver 219.141.136.10\nnamese"..., 4096) = 52
|  10370 16:28:41 read(4, "", 4096) = 0
|  10370 16:28:41 close(4) = 0
|  10370 16:28:41 munmap(0xb7711000, 4096) = 0
|  10370 16:28:41 uname({sys="Linux", node="201221021jm93x", ...}) = 0
|  10370 16:28:41 stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
|  10370 16:28:41 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 4
|  10370 16:28:41 fstat64(4, {st_mode=S_IFREG|0644, st_size=3382, ...}) = 0
|  10370 16:28:41 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7711000
|  10370 16:28:41 read(4, "127.0.0.1\tlocalhost\n10.2.30.159\t"..., 4096) = 3382
|  10370 16:28:41 read(4, "", 4096) = 0
|  10370 16:28:41 close(4) = 0
|  10370 16:28:41 munmap(0xb7711000, 4096) = 0
|  10370 16:28:41 stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
|  10370 16:28:41 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
|  10370 16:28:41 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.136.10")}, 16) = 0
|  10370 16:28:41 gettimeofday({1345624121, 428914}, NULL) = 0
|  10370 16:28:41 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
|  10370 16:28:41 send(4, "#\301\1\0\0\1\0\0\0\0\0\0\5cacti\6bokecc\3com\0\0\1"..., 34, MSG_NOSIGNAL) = 34
|  10370 16:28:41 poll([{fd=4, events=POLLIN}], 1, 5000 <unfinished ...>
|  10370 16:28:41 <... poll resumed> ) = 1 ([{fd=4, revents=POLLIN}])
|  10370 16:28:41 ioctl(4, FIONREAD, [188]) = 0
|  10370 16:28:41 recvfrom(4, "#\301\201\200\0\1\0\1\0\2\0\6\5cacti\6bokecc\3com\0\0\1"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("219.141.136.10")}, [16]) = 188
|  10370 16:28:41 close(4) = 0
|  10370 16:28:41 futex(0x28ae68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
|  10370 16:28:41 <... futex resumed> ) = 1
|  10370 16:28:41 socket(PF_INET, SOCK_STREAM, IPPROTO_IP <unfinished ...>
|  10370 16:28:41 <... socket resumed> ) = 6
|  10370 16:28:41 connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("114.113.152.135")}, 16 <unfinished ...>
|  10370 16:28:41 <... connect resumed> ) = 0
|  10370 16:28:41 gettimeofday( <unfinished ...>
|  10370 16:28:41 <... gettimeofday resumed> {1345624121, 435907}, NULL) = 0
|  10370 16:28:41 write(6, "GET /test.flv HTTP/1.0\r\nHos"..., 116 <unfinished ...>
|
V
-> The second system call the thread makes after it starts is futex():
|  10370 16:28:41 set_robust_list(0xb6b4cbe0, 0xc <unfinished ...>
|  10370 16:28:41 <... set_robust_list resumed> ) = 0
|  10370 16:28:41 futex(0x28ae68, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
|  10370 16:28:41 <... futex resumed> ) = 0
|  10370 16:28:41 open("/etc/resolv.conf", O_RDONLY <unfinished ...>
|  10370 16:28:41 <... open resumed> ) = 4
|  10370 16:28:41 fstat64(4, <unfinished ...>
|  10370 16:28:41 <... fstat64 resumed> {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
|  10370 16:28:41 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7711000
|  ...
|  10370 16:28:41 close(4) = 0
|  ...
|  10370 16:28:41 stat64("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
|  10370 16:28:41 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 4
|  ...
|  10370 16:28:41 close(4) = 0
|  ...
|  10370 16:28:41 close(4) = 0
|  10370 16:28:41 futex(0x28ae68, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
|  10370 16:28:41 <... futex resumed> ) = 1
|  10370 16:28:41 socket(PF_INET, SOCK_STREAM, IPPROTO_IP <unfinished ...>
|  10370 16:28:41 <... socket resumed> ) = 6
|  10370 16:28:41 connect(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("114.113.152.135")}, 16 <unfinished ...>
|
V
-> Clearly it is gethostbyname() that calls futex().
|
V
-> I remember an instructor saying that gethostbyname() is not thread-safe. Could that be related?
|  Replace it with gethostbyname_r() or getaddrinfo() and test again:
|  32559 16:39:39 clone(child_stack=0xb7357494, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=..., {entry_number:6, base_addr:..., limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7357bd8) = 32619
|  32619 16:39:39 set_robust_list(0xb7357be0, 0xc <unfinished ...>
|  32619 16:39:39 <... set_robust_list resumed> ) = 0
|  32619 16:39:39 open("/etc/resolv.conf", O_RDONLY <unfinished ...>
|  32619 16:39:39 <... open resumed> ) = 4
|  32619 16:39:39 fstat64(4, <unfinished ...>
|  32619 16:39:39 <... fstat64 resumed> {st_mode=S_IFREG|0644, st_size=52, ...}) = 0
|  32619 16:39:39 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
|  32619 16:39:39 <... mmap2 resumed> ) = 0xb6355000
|  32619 16:39:39 read(4, <unfinished ...>
|  32619 16:39:39 <... read resumed> "nameserver 219.141.136.10\nnamese"..., 4096) = 52
|  32619 16:39:39 read(4, <unfinished ...>
|  32619 16:39:39 <... read resumed> "", 4096) = 0
|  32619 16:39:39 close(4 <unfinished ...>
|  32619 16:39:39 <... close resumed> ) = 0
|  32619 16:39:39 munmap(0xb6355000, 4096 <unfinished ...>
|  32619 16:39:39 <... munmap resumed> ) = 0
|  32619 16:39:39 uname( <unfinished ...>
|  32619 16:39:39 <... uname resumed> {sys="Linux", node="201221021jm93x", ...}) = 0
|  32619 16:39:39 futex(0x98e1e8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
|  32619 16:39:39 <... futex resumed> ) = 0
|  32619 16:39:39 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 6
|  32619 16:39:39 fstat64(6, {st_mode=S_IFREG|0644, st_size=3382, ...}) = 0
|  32619 16:39:39 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7713000
|  32619 16:39:39 read(6, "127.0.0.1\tlocalhost\n10.2.30.159\t"..., 4096) = 3382
|  32619 16:39:39 read(6, "", 4096) = 0
|  32619 16:39:39 close(6) = 0
|  32619 16:39:39 munmap(0xb7713000, 4096) = 0
|  32619 16:39:39 futex(0x98e1e8, FUTEX_WAKE_PRIVATE, 1) = 1
|
V
-> gethostbyname_r() turns out to call futex() as well.
|  So the problem may not lie in gethostbyname() or gethostbyname_r() themselves.
|  Verification: put gethostbyname_r() in a separate standalone program; its strace output shows no futex() call.
|  futex() is called because the pthread library is linked in at build time. Even so, the deadlock must be related to it:
|  a thread executing inside gethostbyname() has just completed
|      futex(0x98e1e8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>    (lock)
|  and is cancelled before it can unlock. Other threads that call gethostbyname() need the same lock. Because nobody releases it, each of them blocks until the 20 s timeout expires and the main thread cancels it.
|  The main thread then creates a new thread, which deadlocks in the same way. This produces the symptom described at the beginning: Axel makes no download progress for a period of time.
|
V
-> The cause of the symptom described at the beginning can therefore be inferred:
|  a thread must not be cancelled after it has taken a lock and before it has released it, yet that is what happens here. This is evidently caused by the cancel type set in the thread function:
|      "pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldstate);"
|  It selects asynchronous cancellation. The cancel options section (p. 333) of <Advanced Programming in the UNIX Environment>, 2nd edition, says:
|      "Asynchronous cancellation differs from deferred cancellation. With asynchronous cancellation, the thread can be canceled at any time, not only when it reaches a cancellation point."
|
V
-> Why did the author of Axel do this?
|  To bound the time spent in gethostbyname() and connect() in the thread function. A timeout can be set for connect(), but not for gethostbyname().
|
V
-> What should be done?
|  1. Change gethostbyname() to the thread-safe function gethostbyname_r().
|  2. Change the cancel type set in the thread function from asynchronous cancellation to deferred cancellation.
|  3. Use non-blocking I/O and select() to implement a 10 s connect() timeout.
|
V
-> Is a timeout needed for gethostbyname_r()?
|  The DNS resolver has its own default timeout. See man 5 resolv.conf:
|      "the default is RES_TIMEOUT (currently 5, see <resolv.h>)"
|  So a timeout for gethostbyname_r() is not considered here.
V
End
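The deadlock mechanism inferred above can be reproduced in miniature. The sketch below is not Axel's code and does not touch the resolver: an ordinary mutex stands in for the library-internal lock (an assumption), and the names demo, worker and trylock_after_async_cancel are hypothetical. A worker thread with asynchronous cancellation enabled takes the lock and is cancelled before it unlocks; the caller then checks whether the lock can still be acquired.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a library-internal lock such as the resolver's. */
struct demo {
    pthread_mutex_t lock;
    atomic_int taken;
};

static void *worker(void *arg)
{
    struct demo *d = arg;
    int old;

    /* The cancel settings the article attributes to Axel's thread function. */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);

    pthread_mutex_lock(&d->lock);   /* like taking the lock inside gethostbyname() */
    atomic_store(&d->taken, 1);
    for (;;) {
        /* No cancellation point here: only asynchronous
         * cancellation can stop the thread in this loop. */
    }
    return NULL;                    /* never reached, so the lock is never released */
}

/* Returns the result of pthread_mutex_trylock() after the lock holder was cancelled. */
int trylock_after_async_cancel(void)
{
    struct demo d;
    pthread_t tid;

    pthread_mutex_init(&d.lock, NULL);
    atomic_init(&d.taken, 0);
    int rc = pthread_create(&tid, NULL, worker, &d);
    assert(rc == 0);

    while (!atomic_load(&d.taken))
        sched_yield();              /* wait until the worker owns the lock */

    pthread_cancel(tid);            /* plays the role of the main thread's timeout */
    pthread_join(tid, NULL);        /* the worker is gone, the lock stays held */

    return pthread_mutex_trylock(&d.lock);
}
```

On Linux with glibc the function is expected to return EBUSY: the lock outlives its owner, so any later thread that blocks on it waits forever, which matches the threads parked in futex() in the traces. Build with -pthread.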

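Fixes 1 and 3 from the list above might look roughly like this. It is a sketch under assumptions, not a patch for Axel: connect_with_timeout is a hypothetical helper, it uses getaddrinfo() (the other thread-safe replacement mentioned earlier) instead of gethostbyname_r(), it restricts itself to IPv4, and it only tries the first address returned. Name resolution is left to the resolver's own timeout, as argued above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve host, then connect to host:port, giving up
 * on the connect after timeout_sec seconds.
 * Returns a connected socket in blocking mode, or -1 on failure or timeout. */
int connect_with_timeout(const char *host, int port, int timeout_sec)
{
    struct addrinfo hints, *res = NULL;
    char service[16];

    snprintf(service, sizeof service, "%d", port);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* assumption: IPv4 only, as in the traces */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)   /* thread-safe lookup */
        return -1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) {
        freeaddrinfo(res);
        return -1;
    }

    /* Switch to non-blocking mode so connect() returns immediately. */
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    int rc = connect(fd, res->ai_addr, res->ai_addrlen);
    freeaddrinfo(res);
    if (rc < 0 && errno != EINPROGRESS) {
        close(fd);
        return -1;
    }

    if (rc < 0) {
        /* Connection in progress: wait until the socket becomes writable. */
        fd_set wfds;
        struct timeval tv = { timeout_sec, 0 };
        int err = 0;
        socklen_t len = sizeof err;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0 ||
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 ||
            err != 0) {
            close(fd);                  /* timed out or connection failed */
            return -1;
        }
    }

    fcntl(fd, F_SETFL, flags);          /* restore the original (blocking) mode */
    return fd;
}
```

A thread function built on such a helper returns by itself within a bounded time, so the main thread no longer has to rely on asynchronous pthread_cancel() to enforce the timeout. A call such as connect_with_timeout(host, 80, 10) would correspond to the 10 s limit from fix 3.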