Get the width and height of all images on a web page as fast as possible

Source: Internet
Author: User
Have you ever used http://pinterest.com? After you register, it offers an "Add a Pin" feature: submit a website URL and press Find Images, and it lists all the images on the submitted page, filtered by width and height. The whole process generally takes about 10 seconds.

Recently I wanted to imitate it and build a small component. I dropped getimagesize() (which took 48.64 seconds) in favor of imagecreatefromstring() (26.13 seconds), but that is still a long way from Pinterest's roughly 10 seconds.

Taking TCP connections into account, and keeping server resource use and execution time to a minimum, how can I optimize the code further so that it runs faster?



function ranger($url) {
    // Request only roughly the first 32 KB of each image
    $headers = array("Range: bytes=0-32768");
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl); // release the handle before returning
    return $data;
} // curl settings

require dirname(__FILE__) . '/simple_html_dom.php';
// Use simple_html_dom.php to parse the HTML nodes

$url = 'http://www.huffingtonpost.com/';

$html = file_get_html($url);
if ($html->find('img')) {
    foreach ($html->find('img') as $element) {
        $raw = ranger($element->src);
        $im = @imagecreatefromstring($raw);
        if (!$im) {
            continue; // the partial data could not be decoded as an image
        }
        $width = imagesx($im);
        $height = imagesy($im);
        if ($width >= 200 || $height >= 200) {
            echo $element; // output images whose width or height is at least 200
        }
    }
}



------ Solution ----------------------
A detour may help here: reduce the network load on the server.
The server is only responsible for parsing the HTML, collecting the image tag information, and sending the collected text data back to the client.
The client loads the images. Once an image has loaded, reading its width and height properties gives the original image size.
This has many benefits, but the likely trouble is hotlink protection (anti-leech).
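
A rough sketch of the server half of this idea: parse the HTML, collect the img src values, and return them as JSON for the client to load. It assumes PHP 7.1+ with the DOM extension; `collectImageUrls` and the `images` key are illustrative names, and relative URLs are returned as found rather than resolved.

```php
<?php
// Collect the unique, non-empty src values of all <img> tags in an HTML string.
function collectImageUrls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[$src] = true; // array keys de-duplicate repeated images
        }
    }
    return array_keys($urls);
}

// Example: the server would echo this JSON and leave image loading to the browser.
$html = '<html><body><img src="a.png"><img src="b.jpg"><img src="a.png"><img></body></html>';
echo json_encode(['images' => collectImageUrls($html)]);
```

The server never downloads an image in this arrangement, so its cost is one page fetch plus parsing.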
------ Solution ----------------------
Agreed with the reply above:
use PHP to fetch the resources,
and JavaScript to read the image width and height.
------ Solution ----------------------
Reading and parsing: 2.8 seconds
Reading images (138): 27 seconds
Found: 7

Optimizing the code alone probably will not gain much.
Consider fetching over multiple concurrent connections.
------ Solution ----------------------
Reading and parsing: 3.6 seconds
Starting the image-reading processes (138): 1.3 seconds
The result file contains 7 records
http://s.huffpost.com/images/v/logos/v4/tagline.gif
http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9
http://i.huffpost.com/gen/559399/thumbs/r-OLBERMANN-huge.jpg
http://s.huffpost.com/images/facebook_promo_connect.png?3
http://images.huffingtonpost.com/2012-04-04-michaeljfoxmarlo2SECOND.jpg
http://images.huffingtonpost.com/2012-04-05-Screenshot20120405at9.40.24AM.jpg
http://i.huffpost.com/gen/557914/thumbs/s-SCORSESE-large300.jpg


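The original loop is changed to:
if ($html->find('img')) {
    foreach ($html->find('img') as $element) {
        // URL-encode the image address so it survives as a query parameter
        tenor('tenorcall.php?v=' . urlencode($element->src));
    }
}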


tenorcall.php
function ranger($url) {
    // Request only roughly the first 32 KB of each image
    $headers = array("Range: bytes=0-32768");
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl); // release the handle before returning
    return $data;
} // curl settings

$raw = ranger($_GET['v']);
$im = @imagecreatefromstring($raw);
if ($im) {
    $width = imagesx($im);
    $height = imagesy($im);
    if ($width >= 200 || $height >= 200) {
        // record images whose width or height is at least 200
        file_put_contents('tenorcall.txt', $_GET['v'] . PHP_EOL, FILE_APPEND);
    }
}


/**
 * Function tenor
 * Starts a request to a URL but does not wait for the response
 * Parameter $page: the page script to execute
 * Returns nothing
 **/
if (!function_exists('tenor')):
function tenor($page) {
    $host = $_SERVER["HTTP_HOST"];
    $fp = fsockopen($host, 80, $errno, $errstr);
    if (!$fp) {
        echo "$errstr ($errno)\n";
    } else {
        // The blank line (\r\n\r\n) ends the request headers
        fputs($fp, "GET /$page HTTP/1.0\r\nHost: $host\r\n\r\n");
        fclose($fp);
    }
}
endif;


The code is essentially the original code. It has grown rather than shrunk.
However, because the requests run concurrently, the speed improves significantly.

Note that the tenor function does not run reliably on some web servers (such as IIS 6), for reasons unknown.
------ Solution ----------------------
I think loading the images on the client is feasible.

The client then submits the required image information to the server, and the server verifies and saves it...


How was the value 32768 chosen? Would bytes 1-200 not be enough?
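
On the question of why 200 bytes may not be enough: in a JPEG the dimensions sit in a start-of-frame (SOF) segment, which can follow other segments such as EXIF metadata, so they are not at a fixed offset and can lie well past the first few hundred bytes. The following is a minimal sketch, assuming PHP 7.1+; `jpegSizeFromBytes` is a hypothetical helper name, not code from this thread, and it only handles the common segment layout.

```php
<?php
// Walk the JPEG segments in a (possibly truncated) byte string and return
// [width, height] from the first start-of-frame segment, or null if not found.
function jpegSizeFromBytes(string $bytes): ?array
{
    $len = strlen($bytes);
    if ($len < 4 || substr($bytes, 0, 2) !== "\xFF\xD8") {
        return null; // no JPEG start-of-image marker
    }
    $pos = 2;
    while ($pos + 4 <= $len) {
        if ($bytes[$pos] !== "\xFF") {
            return null; // not at a marker, give up
        }
        $marker = ord($bytes[$pos + 1]);
        if ($marker === 0xFF) {
            $pos++; // fill byte before the real marker
            continue;
        }
        $segLen = unpack('n', substr($bytes, $pos + 2, 2))[1];
        // SOF markers are 0xC0-0xCF except 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC).
        $isSof = $marker >= 0xC0 && $marker <= 0xCF
            && !in_array($marker, [0xC4, 0xC8, 0xCC], true);
        if ($isSof) {
            if ($pos + 9 > $len) {
                return null; // frame header was cut off by the byte range
            }
            // Layout: marker(2) length(2) precision(1) height(2) width(2)
            $dim = unpack('nheight/nwidth', substr($bytes, $pos + 5, 4));
            return [$dim['width'], $dim['height']];
        }
        $pos += 2 + $segLen; // skip to the next segment
    }
    return null; // ran out of data before reaching a frame header
}
```

A null result on truncated data suggests the requested range was too short for that file, which is one argument for a generous range such as 32768 bytes.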
------ Solution ----------------------
Learning from this thread! Does PHP read the image header information directly once it has the image URL?
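
The code in this thread decodes the downloaded bytes with GD rather than reading the header itself. For PNG and GIF, though, the dimensions sit at fixed offsets near the start of the file, so they can be read directly. Here is a minimal sketch, assuming PHP 7.1+; `imageSizeFromHeader` is a hypothetical helper name. PHP's built-in getimagesizefromstring() (PHP 5.4+) may also be worth trying on the partial data.

```php
<?php
// Return [width, height] read from the first bytes of a PNG or GIF,
// or null if the data is neither format or is too short.
function imageSizeFromHeader(string $bytes): ?array
{
    // PNG: 8-byte signature, then the IHDR chunk. Width and height are
    // big-endian 32-bit integers at offsets 16 and 20.
    if (strlen($bytes) >= 24 && substr($bytes, 0, 8) === "\x89PNG\r\n\x1a\n") {
        $dim = unpack('Nwidth/Nheight', substr($bytes, 16, 8));
        return [$dim['width'], $dim['height']];
    }
    // GIF: 6-byte signature, then little-endian 16-bit width and height.
    if (strlen($bytes) >= 10
        && in_array(substr($bytes, 0, 6), ['GIF87a', 'GIF89a'], true)) {
        $dim = unpack('vwidth/vheight', substr($bytes, 6, 4));
        return [$dim['width'], $dim['height']];
    }
    return null;
}
```

For these two formats a range of a few dozen bytes would be enough, and no image decoding is needed on the server.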
------ Solution ----------------------
The Pinterest pin is creative but technically simple. You bookmark a snippet of JS code; when you click that bookmark, it appends a JS file to the current page's document. The JS file itself is easy to write: it mainly traverses document.getElementsByTagName('img').
------ Solution ----------------------

This post was last edited by xuzuning at 15:25:06

Does fetching 138 images concurrently consume 138 connections?
Yes.

Should php.ini be modified to increase the number of connections?
No. The connections are outbound. Changing that would require the other side to change as well.

What about CPU and memory overhead?
That is hard to measure.

As for using JS to do the check: those suggestions cannot be tested, because nobody provided code.
I wrote two versions myself, and neither was satisfactory.

Between JavaScript concurrency and direct PHP concurrency, which consumes fewer resources?
In terms of resource consumption they are the same: every image has to be loaded completely.
However, the former consumes client resources while the latter consumes server resources.
In addition, the browser's mechanism is unknown, so it is unclear whether its loading is truly concurrent.
------ Solution ----------------------
This code takes about 1.8 seconds here, not counting the time for file_get_html($url).

$res[] = $url; // $temp;
This stores the network address.

It saves the image as a local file and uses getimagesize to get the size.

It presumably works through curl concurrency. I don't know much about that mechanism.
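
For readers unfamiliar with curl concurrency, here is a minimal sketch using PHP's curl_multi functions, which run many transfers in parallel inside one PHP process. It assumes PHP 7.1+ with the curl extension; `fetchRanges`, the 10-second timeout, and the default byte count are illustrative choices, not the code this reply refers to.

```php
<?php
// Fetch the first $maxBytes of every URL in parallel.
// Returns an array mapping each unique URL to the bytes received.
function fetchRanges(array $urls, int $maxBytes = 32768): array
{
    $multi = curl_multi_init();
    $handles = [];
    // De-duplicate first: pages often repeat the same spacer image many times.
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RANGE, '0-' . ($maxBytes - 1));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for network activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $bodies;
}
```

Each returned body can then be passed to imagecreatefromstring() as in the original loop. Servers that ignore the Range header will send the whole file.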
------ Solution ----------------------
However, the line if (in_array($absUrl, $visited)) continue; reports an error: Warning: in_array() expects parameter 2 to be array, null given.

His code does not contain the line you mention.
The error comes from file_get_html.
file_get_html reads URLs with file_get_contents, which has a low success rate.
You may need to refresh two or three times before the data is retrieved.
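
Instead of refreshing by hand, the fetch could be retried in code. A minimal sketch, assuming PHP 7.1+; `fetchWithRetry`, the three attempts, and the 200 ms delay are hypothetical choices for illustration.

```php
<?php
// Call $fetch up to $attempts times and return the first non-empty result,
// or false if every attempt fails.
function fetchWithRetry(callable $fetch, int $attempts = 3, int $delayMs = 200)
{
    for ($i = 1; $i <= $attempts; $i++) {
        $result = $fetch();
        if ($result !== false && $result !== '') {
            return $result;
        }
        if ($i < $attempts) {
            usleep($delayMs * 1000); // short pause before the next attempt
        }
    }
    return false;
}

// Usage sketch: fetch the page text with retries, then hand it to the parser,
// for example with simple_html_dom's str_get_html($page).
// $page = fetchWithRetry(function () use ($url) {
//     return @file_get_contents($url);
// });
```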
------ Solution ----------------------
JS can get the image dimensions directly by reading the image header information.
This method is at least 10 times faster than reading them after the image has fully loaded.
I remember seeing this in a post somewhere,
but I did not bookmark it and cannot find it right now. Frustrating.

------ Solution ----------------------
I just registered at http://pinterest.com. It loads the images on the client.
Click Add, choose Pin, and paste the URL http://www.huffingtonpost.com/
In Chrome's Network panel we can see a request:
GET /pin/create/find_images/?url=http%253A%2F%2Fwww.huffingtonpost.com HTTP/1.1
The response is a JSON object:
images: [http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9,…]
0: "http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9"
1: "http://s.huffpost.com/images/v/logos/v4/tagline.gif"
2: "http://s.huffpost.com/images/splash/t_mini-a.png"
3: "http://s.huffpost.com/images/splash/t_mini-a.png"
4: "http://s.huffpost.com/images/splash/t_mini-a.png"
5: "http://s.huffpost.com/images/splash/t_mini-a.png"
6: "http://s.huffpost.com/images/splash/t_mini-a.png"
7: "http://s.huffpost.com/images/splash/t_mini-a.png"
8: "http://s.huffpost.com/images/splash/t_mini-a.png"
9: "http://s.huffpost.com/images/splash/t_mini-a.png"
10: "http://s.huffpost.com/images/splash/t_mini-a.png"
11: "http://s.huffpost.com/images/splash/t_mini-a.png"
12: "http://s.huffpost.com/images/splash/t_mini-a.png"
13: "http://s.huffpost.com/images/splash/t_mini-a.png"
14: "http://s.huffpost.com/images/splash/t_mini-a.png"
15: "http://s.huffpost.com/images/splash/t_mini-a.png"
16: "http://s.huffpost.com/images/splash/t_mini-a.png"
17: "http://i.huffpost.com/gen/560770/thumbs/r-GSA-LAS-VEGAS-VIDEO-huge.jpg"
18: "http://s.huffpost.com/images/webslice12x12.png"
19: "http://s.huffpost.com/images/v/blog_column.png"
20: "http://s.huffpost.com/contributors/gary-hart/headshot.jpg"
21: "http://www.huffingtonpost.com/images/trans.gif"
22: "http://www.huffingtonpost.com/images/trans.gif"
23: "http://www.huffingtonpost.com/images/trans.gif"
24: "http://images.huffingtonpost.com/2012-04-06-campbellguitar.jpg"
25: "http://www.huffingtonpost.com/images/trans.gif"
26: "http://www.huffingtonpost.com/images/trans.gif"
27: "http://www.huffingtonpost.com/images/trans.gif"
28: "http://www.huffingtonpost.com/images/trans.gif"
29: "http://www.huffingtonpost.com/images/trans.gif"
30: "http://www.huffingtonpost.com/images/trans.gif"
31: "http://images.huffingtonpost.com/2012-04-06-Screenshot20120406at7.09.17PM.jpg"
32: "http://www.huffingtonpost.com/images/trans.gif"
33: "http://www.huffingtonpost.com/images/trans.gif"
34: "http://www.huffingtonpost.com/images/trans.gif"
35: "http://www.huffingtonpost.com/images/trans.gif"
36: "http://www.huffingtonpost.com/images/trans.gif"
37: "http://www.huffingtonpost.com/images/trans.gif"
38: "http://www.huffingtonpost.com/images/trans.gif"
39: "http://www.huffingtonpost.com/images/trans.gif"
40: "http://www.huffingtonpost.com/images/trans.gif"
41: "http://www.huffingtonpost.com/images/trans.gif"
42: "http://www.huffingtonpost.com/images/trans.gif"
43: "http://www.huffingtonpost.com/images/trans.gif"
44: "http://www.huffingtonpost.com/images/trans.gif"
45: "http://www.huffingtonpost.com/images/trans.gif"
46: "http://www.huffingtonpost.com/images/trans.gif"
47: "http://www.huffingtonpost.com/images/trans.gif"
48: "http://www.huffingtonpost.com/images/trans.gif"
49: "http://www.huffingtonpost.com/images/trans.gif"
50: "http://www.huffingtonpost.com/images/trans.gif"
51: "http://www.huffingtonpost.com/images/trans.gif"
52: "http://www.huffingtonpost.com/images/trans.gif"
53: "http://www.huffingtonpost.com/images/trans.gif"
54: "http://www.huffingtonpost.com/images/trans.gif"
55: "http://www.huffingtonpost.com/images/trans.gif"
56: "http://www.huffingtonpost.com/images/trans.gif"
57: "http://www.huffingtonpost.com/images/trans.gif"
58: "http://www.huffingtonpost.com/images/trans.gif"
59: "http://www.huffingtonpost.com/images/trans.gif"
60: "http://www.huffingtonpost.com/images/trans.gif"
61: "http://www.huffingtonpost.com/images/trans.gif"
62: "http://www.huffingtonpost.com/images/trans.gif"
63: "http://www.huffingtonpost.com/images/trans.gif"
64: "http://www.huffingtonpost.com/images/trans.gif"
65: "http://www.huffingtonpost.com/images/trans.gif"
66: "http://www.huffingtonpost.com/images/trans.gif"
67: "http://www.huffingtonpost.com/images/trans.gif"
68: "http://www.huffingtonpost.com/images/trans.gif"
69: "http://www.huffingtonpost.com/images/trans.gif"
70: "http://www.huffingtonpost.com/images/trans.gif"
71: "http://www.huffingtonpost.com/images/trans.gif"
72: "http://www.huffingtonpost.com/images/trans.gif"
73: "http://www.huffingtonpost.com/images/trans.gif"
74: "http://www.huffingtonpost.com/images/trans.gif"
75: "http://s.huffpost.com/images/blank.gif"
76: "http://s.huffpost.com/images/blank.gif"
77: "http://s.huffpost.com/images/blank.gif"
78: "http://s.huffpost.com/images/blank.gif"
79: "http://s.huffpost.com/images/blank.gif"
80: "http://s.huffpost.com/images/blank.gif"
81: "http://s.huffpost.com/images/blank.gif"
82: "http://s.huffpost.com/images/facebook_promo_connect.png?3"
83: "http://s.huffpost.com/images/loader.gif"
84: "http://www.huffingtonpost.com/images/trans.gif"
85: "http://www.huffingtonpost.com/images/trans.gif"
86: "http://www.huffingtonpost.com/images/trans.gif"
87: "http://www.huffingtonpost.com/images/trans.gif"
88: "http://www.huffingtonpost.com/images/trans.gif"
89: "http://www.huffingtonpost.com/images/trans.gif"
90: "http://s.huffpost.com/contributors/gary-hart/headshot.jpg"
91: "http://s.huffpost.com/contributors/mike-campbell/headshot.jpg"
92: "http://s.huffpost.com/contributors/roma-downey/headshot.jpg"
93: "http://s.huffpost.com/contributors/gavin-newsom/headshot.jpg"
94: "http://s.huffpost.com/contributors/sarah-shourd/headshot.jpg"
95: "http://s.huffpost.com/contributors/jacqueline-novogratz/headshot.jpg"
96: "http://s.huffpost.com/contributors/peggy-drexler/headshot.jpg"
97: "http://s.huffpost.com/contributors/mohamed-a-elerian/headshot.jpg"
98: "http://s.huffpost.com/contributors/bill-mckibben/headshot.jpg"
99: "http://s.huffpost.com/contributors/marlo-thomas/headshot.jpg"
100: "http://www.huffingtonpost.com/images/v/something_to_say_button.png"
101: "http://www.huffingtonpost.com/images/trans.gif"
102: "http://www.huffingtonpost.com/images/trans.gif"
103: "http://www.huffingtonpost.com/images/trans.gif"
104: "http://www.huffingtonpost.com/images/trans.gif"
105: "http://www.huffingtonpost.com/images/trans.gif"
106: "http://www.huffingtonpost.com/images/trans.gif"
107: "http://www.huffingtonpost.com/images/trans.gif"
108: "http://www.huffingtonpost.com/images/trans.gif"
109: "http://www.huffingtonpost.com/images/trans.gif"
110: "http://www.huffingtonpost.com/images/trans.gif"
111: "http://www.huffingtonpost.com/images/trans.gif"
112: "http://www.huffingtonpost.com/images/trans.gif"
113: "http://www.huffingtonpost.com/images/trans.gif"
114: "http://www.huffingtonpost.com/images/trans.gif"
115: "http://www.huffingtonpost.com/images/trans.gif"
116: "http://www.huffingtonpost.com/images/trans.gif"
117: "http://www.huffingtonpost.com/images/trans.gif"
118: "http://www.huffingtonpost.com/images/trans.gif"
119: "http://www.huffingtonpost.com/images/trans.gif"
120: "http://www.huffingtonpost.com/images/trans.gif"
121: "http://www.huffingtonpost.com/images/trans.gif"
122: "http://www.huffingtonpost.com/images/trans.gif"
123: "http://www.huffingtonpost.com/images/trans.gif"
124: "http://www.huffingtonpost.com/images/trans.gif"
125: "http://www.huffingtonpost.com/images/trans.gif"
126: "http://www.huffingtonpost.com/images/trans.gif"
127: "http://www.huffingtonpost.com/images/trans.gif"
128: "http://www.huffingtonpost.com/images/trans.gif"
129: "http://www.huffingtonpost.com/images/trans.gif"
130: "http://www.huffingtonpost.com/images/trans.gif"
131: "http://www.huffingtonpost.com/images/trans.gif"
132: "http://www.huffingtonpost.com/images/trans.gif"
133: "http://www.huffingtonpost.com/images/trans.gif"
134: "http://b.scorecardresearch.com/p?c1=2&c2=6723616&c3=&c4=&c5=front&c6=&c15=&cj=1"
135: "http://www.huffingtonpost.com//secure-us.imrworldwide.com/cgi-bin/m?ci=us-703240h&cg=0&cc=1&ts=noscript"
136: "http://vertical-stats.huffpost.com/?-1&&"
137: "http://www.huffingtonpost.com//pixel.quantserve.com/pixel/p-6fTutip1SMLM2.gif?labels=Home"
images_count: 138
redirected: false
status: "success"
title: "Breaking News and Opinion on The Huffington Post"
type: "text/html; charset=utf-8"


Almost as soon as the server returns the result, the browser starts loading the images. In Chrome's monitoring, the yellow line marks the request that submits the URL to get the image list, and the images are loaded after it. The loading speed depends on my network.

Because the JS code of http://pinterest.com/ has been minified and uses jQuery, it takes some effort to find the relevant part. In fact, anyone can work out how it is done: traverse the JSON data, create an img element for each entry, set its src attribute, and keep the object. The browser does the rest on its own.
------ Solution ----------------------
Quote:

Just registered http://pinterest.com. It uses a client to load data.
Click Add to select Pin, paste the URL http://www.huffingtonpost.com/
In chrome's Network, we can see a request.
GET /pin/create/find_images/?url=http%253A%2F%2Fwww.huffingtonpo ......

What object?
Do you mean the image link data returned by the server? There is no need to save it. Once the ajax request returns, just parse the returned data.
In addition, the browser loads all external resources asynchronously. That is, with or without jQuery, the images load asynchronously and do not block each other, much like the PHP version posted earlier in this thread.
