Zheng Haibo 2013-07-08
Problem:
Given two lists, List<String> list1 and List<String> list2, each with tens of thousands of elements, how can we find the elements in which the two lists differ, that is, the elements that appear in only one of them?
Problem Analysis:
Since each list holds tens of thousands of elements, a simple traversal lookup algorithm needs on the order of 10000*10000 comparisons. Obviously, this is extremely inefficient. So is there a better plan? After some thought, I came up with two methods. Comments and corrections are welcome.
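For reference, the simple traversal lookup mentioned above can be sketched roughly as follows. This is only an illustrative baseline: the class name NaiveDiff and the method name diff are made up for this sketch and are not part of the original code. Each contains call scans the other list from the start, so for two lists of n elements the cost is on the order of n*n string comparisons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative baseline only: NaiveDiff and diff are hypothetical names.
class NaiveDiff {
    // Returns the elements that appear in exactly one of the two lists.
    static List<String> diff(List<String> list1, List<String> list2) {
        List<String> result = new ArrayList<String>();
        for (String s : list1) {
            if (!list2.contains(s)) { // linear scan of list2
                result.add(s);
            }
        }
        for (String s : list2) {
            if (!list1.contains(s)) { // linear scan of list1
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b", "c");
        List<String> b = Arrays.asList("b", "c", "d");
        System.out.println(diff(a, b)); // prints [a, d]
    }
}
```

Neither input list is modified here, which is the main difference from Method One below.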
Method One: An improved traversal algorithm
Idea: For each element of list1, search for it in list2. If it is not found, add the element to diffList; if it is found, remove the matching element from list2. When the loop ends, whatever remains in list2 has no counterpart in list1, so it is appended to diffList as well. Because list2 shrinks every time a match is removed, later lookups get cheaper, and the more elements the two lists share, the shorter the running time. Of course, if the number of shared elements is much smaller than the length of the lists, the cost is about the same as that of plain traversal, and the algorithm becomes very slow and impractical.
Method Two: Using the uniqueness of keys in a map
Idea: First copy the elements of list1 into a Map<String, Integer>, setting each Integer value to 1. Then look up each element of list2 in the map. If the string is already present, add 1 to its Integer value (the value records the number of occurrences of the string); if it is not present, put it into the map with the value 1. In the end, the strings whose value is still 1 are the elements in which the two lists differ.
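To make the counting idea concrete, here is a small sketch on two three-element lists. The class name CountingDiffDemo, its method names, and the sample strings are chosen for illustration only. It assumes that neither list contains the same string twice: a string that occurred twice in list2 but never in list1 would reach a count of 2 and be left out of the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: the class and method names are hypothetical.
class CountingDiffDemo {
    // Builds the string -> occurrence count map described in the idea.
    static Map<String, Integer> countOccurrences(List<String> list1, List<String> list2) {
        // LinkedHashMap is used here only to make the printed order predictable;
        // a plain HashMap works the same way for the algorithm itself.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String s : list1) {
            counts.put(s, 1);
        }
        for (String s : list2) {
            Integer c = counts.get(s);
            counts.put(s, c == null ? 1 : c + 1);
        }
        return counts;
    }

    // Keeps the strings whose count is still 1.
    static List<String> diff(List<String> list1, List<String> list2) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : countOccurrences(list1, list2).entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y", "z");
        List<String> b = Arrays.asList("y", "z", "w");
        System.out.println(countOccurrences(a, b)); // prints {x=1, y=2, z=2, w=1}
        System.out.println(diff(a, b));             // prints [x, w]
    }
}
```

Each list is traversed once and each map operation takes roughly constant time on average, which is why this method scales about linearly with the list length.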
The following code is implemented in Java; the same approach can also be tested with the C++ STL.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/*
 * @author: Zhenghaibo
 * 2013-07-08 Nanjing,conris,china
 */
public class TestMian {
    private static final int ListLen = 10000;     // length of each list
    private static final Integer FlagUnique = 1;  // map value of a string that has been seen only once
    public List<String> list1 = new ArrayList<String>();
    public List<String> list2 = new ArrayList<String>();

    public static void main(String[] args) {
        TestMian mTest = new TestMian();
        mTest.initList();
        List<String> listDiff1 = mTest.getDiffElementUseEach(mTest.list1, mTest.list2); // get the differing elements
        mTest.initList(); // Method 1 removes elements from list2, so rebuild both lists
        List<String> listDiff2 = mTest.getDiffElementUseMap(mTest.list1, mTest.list2);  // get the differing elements
        System.out.println("The number of the diff element is:" + listDiff1.size());
        System.out.println("The number of the diff element is:" + listDiff2.size());
        // mTest.printList(listDiff1);
        // mTest.printList(listDiff2);
    }

    // Initializes both lists and guarantees that they share some elements
    public void initList() {
        list1.clear();
        list2.clear();
        for (int i = 0; i < ListLen; i++) {
            list1.add("conris_list_of" + i + "test");
            list2.add("conris_list_of" + 3 * i + "test");
        }
    }

    // Gets the differing elements of two lists: lookup-and-remove method
    public List<String> getDiffElementUseEach(List<String> list1, List<String> list2) {
        System.out.println("-----------------------Method 1----------------------");
        long runtime = System.nanoTime(); // start timing
        List<String> diffList = new ArrayList<String>(); // holds the differing elements of the two lists
        for (String string : list1) { // look up every element of list1 in list2
            int index = list2.indexOf(string);
            if (index == -1) { // the element does not exist in list2
                diffList.add(string);
            } else { // the element exists in list2, so remove it from list2
                list2.remove(index);
            }
        }
        for (String string : list2) { // the shared elements have been removed from list2, so copy the rest to diffList
            diffList.add(string);
        }
        System.out.println("getDiffElementUseRemove run time:"
                + (System.nanoTime() - runtime));
        return diffList;
    }

    // Gets the differing elements of two lists: map method
    public List<String> getDiffElementUseMap(List<String> list1, List<String> list2) {
        System.out.println("-----------------------Method 2----------------------");
        long runtime = System.nanoTime(); // start timing
        // Relies on the fact that a map cannot hold duplicate keys
        Map<String, Integer> map = new HashMap<String, Integer>(list1.size() + list2.size());
        List<String> diffList = new ArrayList<String>(); // holds the differing elements of the two lists
        for (String string : list1) {
            map.put(string, FlagUnique); // first copy the elements of list1 into the map
        }
        for (String string : list2) {
            Integer count = map.get(string); // occurrence count stored for this string, or null
            if (count != null) { // already in the map, i.e. the element exists in list1: add 1 to its count
                map.put(string, ++count);
            } else { // not present yet: put it into the map
                map.put(string, FlagUnique);
            }
        }
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            // An entry whose value is still FlagUnique belongs to only one list.
            // Integer objects are compared with equals() rather than ==.
            if (FlagUnique.equals(entry.getValue())) {
                diffList.add(entry.getKey());
            }
        }
        System.out.println("getDiffElementUseMap run time:"
                + (System.nanoTime() - runtime));
        return diffList;
    }

    public void printList(List<String> list) {
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
        }
    }
}
Experimental results:
When ListLen is set to 10000:
Result 1:
-----------------------Method 1----------------------
getDiffElementUseRemove run time:2015792051
-----------------------Method 2----------------------
getDiffElementUseMap run time:37966034
The number of the diff element is:13332
The number of the diff element is:13332
When ListLen is set to 100000, Method 1 had still not produced a result after a long wait, while Method 2 gave the following result:
-----------------------Method 2----------------------
getDiffElementUseMap run time:471017640
The number of the diff element is:133332
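It can be seen that when the amount of data grows tenfold to 100000, Method 2 still works, and its running time grows roughly linearly with the data volume: System.nanoTime() reports nanoseconds, so the times above are about 38 ms for 10000 elements and about 471 ms for 100000.
Method 1, by contrast, took about 2 seconds for 10000 elements and had still not produced a result for 100000 after a long time ...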
This shows that the HashMap approach is much faster and meets the basic requirement. If you have other ideas, I would be glad to exchange them. I hope this is of help to everyone.
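As one further idea in the same spirit (a sketch only, not part of the experiment above and not timed here): if the lists may contain the same string more than once, two HashSet objects can take the place of the counter. The class name SetDiff and the method name diff are made up for this sketch. The order of the returned elements follows HashSet iteration order, so it should be treated as unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: SetDiff and diff are hypothetical names.
class SetDiff {
    // Returns each string that occurs in exactly one of the two lists, once.
    static List<String> diff(List<String> list1, List<String> list2) {
        Set<String> set1 = new HashSet<String>(list1); // duplicates inside list1 collapse here
        Set<String> set2 = new HashSet<String>(list2); // duplicates inside list2 collapse here
        List<String> result = new ArrayList<String>();
        for (String s : set1) {
            if (!set2.contains(s)) { // average constant-time lookup
                result.add(s);
            }
        }
        for (String s : set2) {
            if (!set1.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b", "b", "c");
        List<String> b = Arrays.asList("b", "c", "d", "d");
        // Contains "a" and "d" once each; the order is not guaranteed.
        System.out.println(diff(a, b));
    }
}
```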
PS: Implemented with the C++ STL, this should run even faster. I will try that when I have time.
Problem solving and programming practices for finding the different elements in two lists