A guide to the pitfalls of the Java URL class

Source: Internet
Author: User
Background Information

I recently wrote an RSS reader for my own use. It fetches a JSON file containing a list of RSS feeds from the server, then downloads and parses the RSS content based on that file. The core code is as follows:

class PresenterImpl(val context: Context, val activity: MainActivity) : IPresenter {
    private val URL_API = "https://vimerzhao.github.io/others/rssreader/rss.json"

    override fun getRssResource(): RssSource {
        val gson = GsonBuilder().create()
        return gson.fromJson(getFromNet(URL_API), RssSource::class.java)
    }

    private fun getFromNet(url: String): String {
        val result = URL(url).readText()
        return result
    }
    ...
}

This worked until two days ago, when I bought the domain name vimerzhao.top and redirected the original domain vimerzhao.github.io to vimerzhao.top. The tool then stopped working, even though entering URL_API in a browser still returned the data:

So why did URL.readText() fail to get the data?

Redirection not supported

You can test this with the following code:

import java.net.*;
import java.io.*;

public class TestRedirect {
    public static void main(String args[]) {
        try {
            URL url1 = new URL("https://vimerzhao.github.io/others/rssreader/rss.json");
            URL url2 = new URL("http://vimerzhao.top/others/rssreader/RSS.json");
            read(url1);
            System.out.println("=--------------------------------=");
            read(url2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Print the content behind the URL line by line.
    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));

            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The results were as follows:

 

The HTTP response code is 301, which means a redirect occurred. In the browser this happens so quickly that we never see the 301 page. Note that URL.readText() is a Kotlin extension function that in essence still calls the openStream method of the URL class. Part of its source code is as follows:

.....
/**
 * Reads the entire content of this URL as a String using UTF-8 or the specified [charset].
 *
 * This method is not recommended on huge files.
 *
 * @param charset a character set to use.
 * @return a string with this URL entire content.
 */
@kotlin.internal.InlineOnly
public inline fun URL.readText(charset: Charset = Charsets.UTF_8): String = readBytes().toString(charset)

/**
 * Reads the entire content of the URL as byte array.
 *
 * This method is not recommended on huge files.
 *
 * @return a byte array with this URL entire content.
 */
public fun URL.readBytes(): ByteArray = openStream().use { it.readBytes() }
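
For readers more at home in Java, the two extension functions roughly correspond to the sketch below. The names ReadTextSketch and readText are hypothetical, and the stream is passed in rather than opened from a URL so that the example runs offline; with a real URL one would pass url.openStream(), which is exactly the call the test code above exercises. It assumes Java 9 or later for readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadTextSketch {
    /** Reads the whole stream into a String and closes the stream afterwards. */
    public static String readText(InputStream in, Charset charset) {
        try (InputStream stream = in) {
            // Like the Kotlin version, this loads everything into memory,
            // so it is not suitable for huge content.
            return new String(stream.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(); the content is made up for the example.
        InputStream in = new ByteArrayInputStream(
                "{\"feeds\": []}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(in, StandardCharsets.UTF_8));
    }
}
```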

So the test code above explains why URL.readText() failed.
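
One possible workaround is to follow the redirect by hand. The sketch below assumes the failure comes from the redirect switching from https to http, a case that HttpURLConnection does not follow automatically even though it follows same-protocol redirects by default. The class RedirectSketch and its methods are hypothetical names, not part of the original tool, and the network-facing method is untested against the author's site.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class RedirectSketch {
    /** Returns true for the HTTP status codes that signal a redirect. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a Location header (absolute or relative) against the current address. */
    public static String resolve(String current, String location) {
        return URI.create(current).resolve(location).toString();
    }

    /** Opens the URL, following at most maxHops redirects manually. */
    public static InputStream openFollowingRedirects(URL url, int maxHops) throws IOException {
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false); // handle every redirect ourselves
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without a Location header");
            }
            url = new URL(resolve(url.toString(), location));
        }
        throw new IOException("Too many redirects");
    }
}
```

The hop limit guards against redirect loops. Whether silently following an https to http downgrade is acceptable depends on the application, which may be part of why the JDK does not do it automatically.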
However, whether it is reasonable for URL not to support redirection, and why it does not, has yet to be explored.

The unstable equals method

First, look at the description of equals in the documentation (URL, Java Platform SE 7):

Compares this URL for equality with another object.
If the given object is not a URL then this method immediately returns false.
Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can't be resolved, the host names must be equal without regard to case; or both host names equal to null.
Since hosts comparison requires name resolution, this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

Next, look at the code:

import java.net.*;

public class TestEquals {
    public static void main(String args[]) {
        try {
            // vimerzhao's blog home page
            URL url1 = new URL("https://vimerzhao.github.io/");
            // zhanglanqing's blog home page
            URL url2 = new URL("https://zhanglanqing.github.io/");
            // the domain that vimerzhao's blog home page redirects to
            URL url3 = new URL("http://vimerzhao.top/");
            System.out.println(url1.equals(url2));
            System.out.println(url1.equals(url3));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Based on the definition, what do you expect the output to be? Running it gives:

true
false

You may have guessed right, but if I disconnect from the network and run it again, the result is:

false
false

But in fact all three domain names resolve to the same IP address, as you can verify with ping:

zhaoyu@inspiron ~/project $ ping vimezhao.github.io
PING sni.github.map.fastly.net (151.101.77.147) bytes of data.
bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
^C
--- sni.github.map.fastly.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 396.692/396.692/396.692/0.000 ms
zhaoyu@inspiron ~/project $ ping zhanglanqing.github.io
PING sni.github.map.fastly.net (151.101.77.147) bytes of data.
bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=396 ms
^C
--- sni.github.map.fastly.net ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1000ms
rtt min/avg/max/mdev = 396.009/396.009/396.009/0.000 ms
zhaoyu@inspiron ~/project $ ping vimezhao.top
ping: unknown host vimezhao.top
zhaoyu@inspiron ~/project $ ping vimerzhao.top
PING sni.github.map.fastly.net (151.101.77.147) bytes of data.
bytes from 151.101.77.147: icmp_seq=1 ttl=44 time=409 ms
^C
--- sni.github.map.fastly.net ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1001ms
rtt min/avg/max/mdev = 409.978/409.978/409.978/0.000 ms

First consider the case with a network connection. vimerzhao.github.io and zhanglanqing.github.io are my blog and a classmate's blog. Their content differs, but they resolve to the same IP and use the same protocol, port, and so on, so they are judged equal. vimerzhao.github.io and vimerzhao.top point to the same blog, but one uses HTTPS and the other HTTP; the protocols differ, so they are judged not equal. I believe this is contrary to most people's intuition: URLs that point to different blogs are equal, while URLs that point to the same blog are not.
Next, analyze the result without a network connection. First look at the source of URL:

    public boolean equals(Object obj) {
        if (!(obj instanceof URL))
            return false;
        URL u2 = (URL) obj;

        return handler.equals(this, u2);
    }

Then look at the source of the handler object:

    protected boolean equals(URL u1, URL u2) {
        String ref1 = u1.getRef();
        String ref2 = u2.getRef();
        return (ref1 == ref2 || (ref1 != null && ref1.equals(ref2))) &&
               sameFile(u1, u2);
    }

The source of sameFile:

    protected boolean sameFile(URL u1, URL u2) {
        // Compare the protocols.
        if (!((u1.getProtocol() == u2.getProtocol()) ||
              (u1.getProtocol() != null &&
               u1.getProtocol().equalsIgnoreCase(u2.getProtocol()))))
            return false;

        // Compare the files.
        if (!(u1.getFile() == u2.getFile() ||
              (u1.getFile() != null && u1.getFile().equals(u2.getFile()))))
            return false;

        // Compare the ports.
        int port1, port2;
        port1 = (u1.getPort() != -1) ? u1.getPort() : u1.handler.getDefaultPort();
        port2 = (u2.getPort() != -1) ? u2.getPort() : u2.handler.getDefaultPort();
        if (port1 != port2)
            return false;

        // Compare the hosts.
        if (!hostsEqual(u1, u2))
            return false; // without a network connection, this return is triggered

        return true;
    }

Finally, the source of hostsEqual:

    protected boolean hostsEqual(URL u1, URL u2) {
        InetAddress a1 = getHostAddress(u1);
        InetAddress a2 = getHostAddress(u2);
        // if we have internet address for both, compare them
        if (a1 != null && a2 != null) {
            return a1.equals(a2);
        // else, if both have host names, compare them
        } else if (u1.getHost() != null && u2.getHost() != null)
            return u1.getHost().equalsIgnoreCase(u2.getHost());
         else
            return u1.getHost() == null && u2.getHost() == null;
    }

With a network connection, a1 and a2 are both non-null, so return a1.equals(a2) executes and returns true. Without a network, the second branch, return u1.getHost().equalsIgnoreCase(u2.getHost()), executes instead. The host of url1 (vimerzhao.github.io) and the host of url2 (zhanglanqing.github.io) are obviously different, so it returns false, which makes if (!hostsEqual(u1, u2)) true, and return false executes.
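
The two branches can be reproduced offline with a small model of that decision. The sketch below is not the JDK code: HostsEqualSketch and its helper ip are hypothetical names, and the resolved addresses are passed in explicitly (null standing for "resolution failed") instead of being looked up. The IP address is the one from the ping output above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostsEqualSketch {
    /**
     * Mirrors the decision in hostsEqual: a1 and a2 are the resolved addresses
     * (null when resolution failed), h1 and h2 are the host names.
     */
    public static boolean hostsEqual(InetAddress a1, InetAddress a2, String h1, String h2) {
        if (a1 != null && a2 != null) {
            return a1.equals(a2);
        } else if (h1 != null && h2 != null) {
            return h1.equalsIgnoreCase(h2);
        } else {
            return h1 == null && h2 == null;
        }
    }

    /** Builds an IPv4 address from raw bytes; no DNS lookup is performed. */
    public static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String h1 = "vimerzhao.github.io";
        String h2 = "zhanglanqing.github.io";
        // Online: both names resolve to the same address.
        System.out.println(hostsEqual(ip(151, 101, 77, 147), ip(151, 101, 77, 147), h1, h2));
        // Offline: neither name resolves, so the names themselves are compared.
        System.out.println(hostsEqual(null, null, h1, h2));
    }
}
```

Running it prints true and then false, the same pair of answers the real equals gave for url1 and url2 with and without a network.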
So the equals method of the URL class is not only counterintuitive but also inconsistent: it gives different results in different environments, which is very dangerous.

The time-consuming equals method

In addition, equals is a time-consuming operation, because it needs DNS resolution when a network is available. The same applies to hashCode(), which we take as the example here. The source of hashCode() in the URL class:

    public synchronized int hashCode() {
        if (hashCode != -1)
            return hashCode;

        hashCode = handler.hashCode(this);
        return hashCode;
    }

The hashCode() method of the handler object:

    protected int hashCode(URL u) {
        int h = 0;

        // Generate the protocol part.
        String protocol = u.getProtocol();
        if (protocol != null)
            h += protocol.hashCode();

        // Generate the host part.
        InetAddress addr = getHostAddress(u);
        if (addr != null) {
            h += addr.hashCode();
        } else {
            String host = u.getHost();
            if (host != null)
                h += host.toLowerCase().hashCode();
        }

        // Generate the file part.
        String file = u.getFile();
        if (file != null)
            h += file.hashCode();

        // Generate the port part.
        if (u.getPort() == -1)
            h += getDefaultPort();
        else
            h += u.getPort();

        // Generate the ref part.
        String ref = u.getRef();
        if (ref != null)
            h += ref.hashCode();

        return h;
    }

Here getHostAddress() is very time-consuming, so storing URL objects in a hash-table-based container is a disaster. The following code compares the performance of URL and URI when each is stored 50 times:

import java.net.*;
import java.util.*;

public class TestHash {
    public static void main(String args[]) {
        HashSet<URL> list1 = new HashSet<>();
        HashSet<URI> list2 = new HashSet<>();
        try {
            URL url1 = new URL("https://vimerzhao.github.io/");
            URI url2 = new URI("https://zhanglanqing.github.io/");
            long cur = System.currentTimeMillis();
            int cnt = 50;
            // Time 50 insertions of the URL.
            for (int i = 0; i < cnt; i++) {
                list1.add(url1);
            }
            System.out.println(System.currentTimeMillis() - cur);
            cur = System.currentTimeMillis();
            // Time 50 insertions of the URI.
            for (int i = 0; i < cnt; i++) {
                list2.add(url2);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The output is:

271
0
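
Given the gap between the two timings, one option is to key hash-based containers by URI and convert to URL only at the point where a connection is actually opened. The sketch below is a minimal illustration under that assumption; UriSetSketch and collect are hypothetical names.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class UriSetSketch {
    /** Collects addresses into a set keyed by URI; no DNS lookup happens here. */
    public static Set<URI> collect(String... addresses) {
        Set<URI> seen = new HashSet<>();
        for (String address : addresses) {
            seen.add(URI.create(address));
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<URI> feeds = collect(
                "https://vimerzhao.github.io/",
                "https://zhanglanqing.github.io/",
                "https://vimerzhao.github.io/");
        // Duplicates are detected by comparing the text components, not resolved IPs,
        // so the two different blogs stay distinct and the repeated one collapses.
        System.out.println(feeds.size());
        // When a connection is needed, convert at that point: uri.toURL()
    }
}
```

Because URI compares components textually, the result does not depend on whether the machine is online.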

Therefore, it is best not to store URLs in hash-table-based containers.

The role of the trailing slash

The so-called trailing slash is the slash at the end of the domain name. For example, the browser shows vimerzhao.top, but copying and pasting it gives http://vimerzhao.top/. First, test with the following code:

import java.net.*;
import java.io.*;

public class TestTrailingSlash {
    public static void main(String args[]) {
        try {
            URL url1 = new URL("https://vimerzhao.github.io/");
            URL url2 = new URL("https://vimerzhao.github.io");
            System.out.println(url1.equals(url2));
            outputInfo(url1);
            outputInfo(url2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Print the ref, file, and host components of the URL.
    public static void outputInfo(URL url) {
        System.out.println("------" + url.toString() + "----------");
        System.out.println(url.getRef());
        System.out.println(url.getFile());
        System.out.println(url.getHost());
        System.out.println("----------------");
    }
}

The results were as follows:

false
------https://vimerzhao.github.io/----------
null
/
vimerzhao.github.io
----------------
------https://vimerzhao.github.io----------
null

vimerzhao.github.io
----------------

In fact, whether read with the earlier read() method or typed directly into the address bar, url1 and url2 return the same content. However, the trailing / indicates a directory, while its absence indicates a file, so getFile() returns different results for the two, which causes equals to return false. When typing into the address bar you would not even notice the trailing slash; the same result comes back, yet equals returns false, which is very hard to guard against.
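
If the two forms should be treated as the same address, one option is to normalize before comparing. The sketch below is an assumption-laden illustration rather than a general fix: TrailingSlashSketch and withRootPath are hypothetical names, it uses URI to avoid the DNS lookups discussed earlier, and it only handles the empty root path. It deliberately leaves paths such as /tags alone, because whether /tags and /tags/ are the same resource is decided by the server.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class TrailingSlashSketch {
    /** Replaces an empty path with "/"; any other URI is returned unchanged. */
    public static URI withRootPath(URI uri) {
        String path = uri.getPath();
        if (uri.getHost() == null || (path != null && !path.isEmpty())) {
            return uri;
        }
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), "/",
                    uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI withSlash = URI.create("https://vimerzhao.github.io/");
        URI withoutSlash = URI.create("https://vimerzhao.github.io");
        System.out.println(withSlash.equals(withoutSlash));               // raw comparison
        System.out.println(withSlash.equals(withRootPath(withoutSlash))); // after normalizing
    }
}
```

The raw comparison prints false and the normalized one prints true.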
This raises another question: one is a file and the other is a directory, so why do they return the same result?
Investigation shows that if the request ends with a /, the server looks for an index.html file in that directory. If it does not, taking vimerzhao.top/tags as an example, the server first looks for a file named tags; if none is found, it automatically appends a / and then looks for index.html in the tags directory. As shown in the figure:

Here is an interesting test. Write two pieces of code as follows:

import java.net.*;
import java.io.*;

public class TestTrailingSlash {
    public static void main(String args[]) {
        try {
            URL urlWithSlash = new URL("http://vimerzhao.top/tags/");
            int cnt = 5;
            long cur = System.currentTimeMillis();
            for (int i = 0; i < cnt; i++) {
                read(urlWithSlash);
            }
            System.out.println(System.currentTimeMillis() - cur);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void read(URL url) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            String inputLine;
            while ((inputLine = in.readLine()) !=