Efficiently identify the different elements in two lists

Source: Internet
Author: User
Tags: diff

For example, suppose there are two lists, List<String> list1 and List<String> list2, each with tens of thousands of elements. How can we extract the elements that differ between the two lists?

Method 1: Traverse both lists:

package com.czp.test;

import java.util.ArrayList;
import java.util.List;

public class TestList {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        getDiffrent(list1, list2);
        // Output: Total times 2566454675
    }

    /**
     * Gets the elements of list1 that are not in list2 (one direction only).
     */
    private static List<String> getDiffrent(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> diff = new ArrayList<String>();
        for (String str : list1) {
            // contains() scans list2 linearly, so this is an m x n double loop
            if (!list2.contains(str)) {
                diff.add(str);
            }
        }
        System.out.println("Total times " + (System.nanoTime() - st));
        return diff;
    }
}

Avoid this method. The total number of iterations is the product of the two list sizes, and the output shows that it takes a relatively long time. Is there another way? Of course.
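To make the m × n cost concrete, the sketch below counts the element comparisons that Method 1's nested traversal performs on the same test data. It is not from the original article, and the class name `NestedLoopCost` and method `countComparisons` are illustrative. It mimics how `ArrayList.contains()` scans `list2` from the front and stops at the first match.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedLoopCost {
    // Counts equals() comparisons made by "for each str in list1: list2.contains(str)".
    // contains() scans from index 0 and stops at the first match, as modelled here.
    static long countComparisons(List<String> list1, List<String> list2) {
        long comparisons = 0;
        for (String str : list1) {
            for (String other : list2) {
                comparisons++;
                if (str.equals(other)) {
                    break; // found: contains() would return true at this point
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<String> list1 = new ArrayList<>();
        List<String> list2 = new ArrayList<>();
        for (int i = 0; i < 10000; i++) { // same test data as Method 1
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        // Odd i: not in list2, so a full scan of 10000 each: 5000 * 10000 = 50,000,000.
        // Even i: found at index i/2 after i/2 + 1 comparisons: 1 + 2 + ... + 5000 = 12,502,500.
        System.out.println(countComparisons(list1, list2)); // 62502500
    }
}
```

Even with early exits on matches, this data needs more than 6 × 10^7 comparisons, although the two lists hold only 2 × 10^4 elements in total.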

Method 2: Use the retainAll() method provided by List:

package com.czp.test;

import java.util.ArrayList;
import java.util.List;

public class TestList {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        getDiffrent(list1, list2);  // Output: Total times 2566454675
        getDiffrent2(list1, list2); // Output: getDiffrent2 total times 2787800964
    }

    /** Method 2: retainAll() keeps the elements list1 shares with list2 (the intersection). */
    private static List<String> getDiffrent2(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> result = new ArrayList<String>(list1); // copy: leave list1 unchanged
        result.retainAll(list2);
        System.out.println("getDiffrent2 total times " + (System.nanoTime() - st));
        return result;
    }

    /** Method 1: elements of list1 that are not in list2. */
    private static List<String> getDiffrent(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> diff = new ArrayList<String>();
        for (String str : list1) {
            if (!list2.contains(str)) {
                diff.add(str);
            }
        }
        System.out.println("getDiffrent total times " + (System.nanoTime() - st));
        return diff;
    }
}

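Unfortunately, this approach takes only a few lines of code, but it is even more time-consuming. To see why, look at the source code of retainAll(). This is the generic version from java.util.AbstractCollection. ArrayList overrides it, but its version also calls c.contains() once per element: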

public boolean retainAll(Collection<?> c) {
    boolean modified = false;
    Iterator<E> e = iterator();
    while (e.hasNext()) {
        if (!c.contains(e.next())) {
            e.remove();
            modified = true;
        }
    }
    return modified;
}
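The cost sits in `c.contains()`. When `c` is an `ArrayList`, every call is a linear scan. As an aside that is not part of the original comparison, a common remedy is to pass a `HashSet` as the argument instead, because its `contains()` runs in expected constant time. The sketch below uses `removeAll()` on a copy of `list1` to get the elements of `list1` that are not in `list2`. The same trick applies to `retainAll()`. The class name `SetBackedRetain` and the method `onlyInFirst` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class SetBackedRetain {
    // Elements of list1 that are NOT in list2. removeAll() calls contains() on its
    // argument once per element, so a HashSet argument avoids the m x n scan.
    static List<String> onlyInFirst(List<String> list1, List<String> list2) {
        List<String> result = new ArrayList<>(list1); // copy: leave the caller's list intact
        result.removeAll(new HashSet<>(list2));
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "c");
        List<String> list2 = Arrays.asList("b", "c", "d");
        System.out.println(onlyInFirst(list1, list2)); // [a]
        System.out.println(onlyInFirst(list2, list1)); // [d]
    }
}
```

This handles one direction only. To get every element that appears in just one of the two lists, call it in both directions and combine the results.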

As long as the argument is a List, this cost is unavoidable. Is there a better way? Both methods above perform m × n iterations, and there is no real need to loop that many times. Our goal is to find the elements that differ between the two lists, so we can do this instead:

1. Use a map to store all the elements of list1. The key is the element, and the value is the number of times it has been seen.
2. Put all the elements of list2 into the map. If a key already exists, increment its value.
3. Take the elements whose value is 1.

This needs only about m + n iterations, which greatly reduces the number of loops.
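The test lists here have 10,000 elements each (m = n = 10^4). The difference in the number of loop iterations works out as:

```latex
\begin{aligned}
\text{nested loops (Methods 1 and 2):}\quad & m \times n = 10^4 \times 10^4 = 10^8 \ \text{comparisons in the worst case} \\
\text{map counting:}\quad & m + n = 10^4 + 10^4 = 2 \times 10^4 \ \text{map operations}
\end{aligned}
```

The final pass over the map adds at most another m + n steps. The total therefore stays linear in the list sizes, assuming HashMap lookups take expected constant time.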

package com.czp.test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TestList {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        getDiffrent(list1, list2);  // Output: Total times 2566454675
        getDiffrent2(list1, list2); // Output: getDiffrent2 total times 2787800964
        getDiffrent3(list1, list2); // Output: getDiffrent3 total times 61763995
    }

    /** Method 3: counts occurrences in a map; a count of 1 means the element is in only one list. */
    private static List<String> getDiffrent3(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        Map<String, Integer> map = new HashMap<String, Integer>(list1.size() + list2.size());
        List<String> diff = new ArrayList<String>();
        for (String string : list1) {
            map.put(string, 1);
        }
        for (String string : list2) {
            Integer cc = map.get(string);
            if (cc != null) {
                map.put(string, ++cc); // already seen: increment the count
                continue;
            }
            map.put(string, 1);
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == 1) {
                diff.add(entry.getKey());
            }
        }
        System.out.println("getDiffrent3 total times " + (System.nanoTime() - st));
        return diff;
    }

    /** Method 2: retainAll() on a copy, so list1 is left unchanged for later calls. */
    private static List<String> getDiffrent2(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> result = new ArrayList<String>(list1);
        result.retainAll(list2);
        System.out.println("getDiffrent2 total times " + (System.nanoTime() - st));
        return result;
    }

    /** Method 1: elements of list1 that are not in list2. */
    private static List<String> getDiffrent(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> diff = new ArrayList<String>();
        for (String str : list1) {
            if (!list2.contains(str)) {
                diff.add(str);
            }
        }
        System.out.println("getDiffrent total times " + (System.nanoTime() - st));
        return diff;
    }
}

Obviously, this method greatly reduces the running time. It needs roughly 1/40 of the time of Method 1 and about 1/45 of the time of Method 2, which is a considerable improvement. It is still not the best solution, though. In Method 3 we arbitrarily picked list1 to put into the map first. The second loop does a lookup before every put, so when list2 is larger than list1, that more expensive loop runs over the larger list. The following version makes this improvement:

package com.czp.test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TestList {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        getDiffrent(list1, list2);  // getDiffrent total times 2789492240
        getDiffrent2(list1, list2); // getDiffrent2 total times 3324502695
        getDiffrent3(list1, list2); // getDiffrent3 total times 24710682
        getDiffrent4(list1, list2); // getDiffrent4 total times 15627685
    }

    /** Method 4: the larger list fills the map; the get-then-put loop runs over the smaller one. */
    private static List<String> getDiffrent4(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        Map<String, Integer> map = new HashMap<String, Integer>(list1.size() + list2.size());
        List<String> diff = new ArrayList<String>();
        List<String> maxList = list1;
        List<String> minList = list2;
        if (list2.size() > list1.size()) {
            maxList = list2;
            minList = list1;
        }
        for (String string : maxList) {
            map.put(string, 1);
        }
        for (String string : minList) {
            Integer cc = map.get(string);
            if (cc != null) {
                map.put(string, ++cc);
                continue;
            }
            map.put(string, 1);
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == 1) {
                diff.add(entry.getKey());
            }
        }
        System.out.println("getDiffrent4 total times " + (System.nanoTime() - st));
        return diff;
    }

    /** Method 3: the same counting, with list1 always put into the map first. */
    private static List<String> getDiffrent3(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        Map<String, Integer> map = new HashMap<String, Integer>(list1.size() + list2.size());
        List<String> diff = new ArrayList<String>();
        for (String string : list1) {
            map.put(string, 1);
        }
        for (String string : list2) {
            Integer cc = map.get(string);
            if (cc != null) {
                map.put(string, ++cc);
                continue;
            }
            map.put(string, 1);
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == 1) {
                diff.add(entry.getKey());
            }
        }
        System.out.println("getDiffrent3 total times " + (System.nanoTime() - st));
        return diff;
    }

    /** Method 2: retainAll() on a copy, so list1 is left unchanged for later calls. */
    private static List<String> getDiffrent2(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> result = new ArrayList<String>(list1);
        result.retainAll(list2);
        System.out.println("getDiffrent2 total times " + (System.nanoTime() - st));
        return result;
    }

    /** Method 1: elements of list1 that are not in list2. */
    private static List<String> getDiffrent(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> diff = new ArrayList<String>();
        for (String str : list1) {
            if (!list2.contains(str)) {
                diff.add(str);
            }
        }
        System.out.println("getDiffrent total times " + (System.nanoTime() - st));
        return diff;
    }
}

Here the sizes of the two lists are compared, and the smaller list is processed last. This reduces the number of lookups in the second loop and gives a further performance gain. As a friend put it, there is no end to programming: if you think it through carefully, you can always find a better way.
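The counting idea in Methods 3 and 4 has a weakness with duplicate elements. Suppose an element appears only in the second list but appears there twice. It ends up with a count of 2, so it is wrongly dropped. The small demo below reproduces the counting logic of getDiffrent3 to show this. It uses `Map.merge`, which is equivalent to the original get/put pair. The class name `CountingFlawDemo` and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountingFlawDemo {
    // Same counting as getDiffrent3: the first list is put with 1, the second list increments.
    static List<String> countingDiff(List<String> first, List<String> second) {
        Map<String, Integer> map = new HashMap<>();
        for (String s : first) {
            map.put(s, 1);
        }
        for (String s : second) {
            map.merge(s, 1, Integer::sum); // +1 if present, otherwise put 1
        }
        List<String> diff = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (e.getValue() == 1) {
                diff.add(e.getKey());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b");
        List<String> list2 = Arrays.asList("c", "c"); // "c" is only in list2, but twice
        // "c" is counted twice and dropped, although it does not occur in list1 at all
        System.out.println(countingDiff(list1, list2).contains("c")); // false
    }
}
```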

Many thanks to Binglian for pointing out the problem with lists that contain duplicate elements. The code is corrected as follows.

First, an element that can be found in both lists must not be included in the return value, no matter how many times it is repeated. So the second loop makes this check: if the current element is not found in the map, it must be added to the return value; if it is found, its value is incremented. After this traversal, diff contains the elements that are in the smaller list (minList) but not in the larger one (maxList). The remaining work is to find the elements of maxList that are not in minList. This is done by traversing the map and taking the entries whose value is 1:

package com.czp.test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TestList {
    public static void main(String[] args) {
        List<String> list1 = new ArrayList<String>();
        List<String> list2 = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            list1.add("test" + i);
            list2.add("test" + i * 2);
        }
        getDiffrent(list1, list2);
        getDiffrent3(list1, list2); // getDiffrent3 total times 32271699
        getDiffrent5(list1, list2); // getDiffrent5 total times 12239545
        getDiffrent4(list1, list2); // getDiffrent4 total times 16786491
        getDiffrent2(list1, list2); // getDiffrent2 total times 2438731459
    }

    /** Gets the different elements of the two lists. */
    private static List<String> getDiffrent5(List<String> list1, List<String> list2) {
        long st = System.nanoTime();
        List<String> diff = new ArrayList<String>();
        List<String> maxList = list1;
        List<String> minList = list2;
        if (list2.size() > list1.size()) {
            maxList = list2;
            minList = list1;
        }
        Map<String, Integer> map = new HashMap<String, Integer>(maxList.size());
        for (String string : maxList) {
            map.put(string, 1
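The listing above is cut off in the source. Below is a minimal sketch of how getDiffrent5 might be completed from the correction described before the listing:

- Build the map from the larger list with value 1.
- For each element of the smaller list, add it to diff if the map does not contain it; otherwise, increment its value.
- Finally, collect the map entries whose value is still 1.

This is a reconstruction, not the author's code. The class name `DiffSketch` is hypothetical, and the timing output is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiffSketch {
    // Hypothetical completion of getDiffrent5, following the described correction.
    static List<String> getDiffrent5(List<String> list1, List<String> list2) {
        List<String> diff = new ArrayList<>();
        List<String> maxList = list1;
        List<String> minList = list2;
        if (list2.size() > list1.size()) {
            maxList = list2;
            minList = list1;
        }
        Map<String, Integer> map = new HashMap<>(maxList.size());
        for (String string : maxList) {
            map.put(string, 1); // duplicates inside maxList still map to 1
        }
        for (String string : minList) {
            Integer cc = map.get(string);
            if (cc == null) {
                diff.add(string); // in minList but not in maxList
            } else {
                map.put(string, cc + 1); // present in both lists
            }
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == 1) {
                diff.add(entry.getKey()); // in maxList but not in minList
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("a", "b", "b", "x");
        List<String> list2 = Arrays.asList("c", "x", "x");
        // Contains c, a and b; x is in both lists, so it is excluded
        System.out.println(getDiffrent5(list1, list2));
    }
}
```

An element that occurs several times in the smaller list but not at all in the larger one is added to diff once per occurrence. Wrap the result in a LinkedHashSet if each element should appear only once.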
