Introduction
This week's assignment is a bit involved, with a fair amount of code to complete and a few twists along the way. This week's lectures focus on classes, inheritance, and polymorphism in Scala, and the homework tests that material from several angles. The content of TweetSet.scala is rather trendy: it is all about Twitter. An abstract class TweetSet is defined, and its two subclasses Empty and NonEmpty represent the empty set and non-empty sets. A non-empty set is represented as a binary tree. The root of the tree is a Tweet object, representing one tweet (what we in China would call a "Weibo" post). A Tweet has three fields, user, text, and retweets, holding the author, the content, and the number of retweets respectively. The tree is ordered by the text of its nodes: the text of every node in the left subtree is lexicographically smaller than the root's text, and the text of every node in the right subtree is lexicographically greater.
Note that all of these classes are immutable: no operation changes the object it is called on; instead it returns a new object that reflects the modification.
The assignment suggests studying the implementations of contains and incl before starting. Let's take a look at how these two methods are implemented in the Empty and NonEmpty classes:
def contains(tweet: Tweet): Boolean = false
def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)

def contains(x: Tweet): Boolean =
  if (x.text < elem.text) left.contains(x)
  else if (elem.text < x.text) right.contains(x)
  else true

def incl(x: Tweet): TweetSet = {
  if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
  else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
  else this
}
The two snippets above are the implementations of contains and incl in Empty and NonEmpty respectively. The corresponding declarations in their parent class are as follows:
def incl(tweet: Tweet): TweetSet

/**
 * Returns a new `TweetSet` which excludes `tweet`.
 */
def remove(tweet: Tweet): TweetSet

/**
 * Tests if `tweet` exists in this `TweetSet`.
 */
def contains(tweet: Tweet): Boolean
As we can see, none of these methods has a body in TweetSet. They are abstract: an abstract class is never instantiated directly, and Empty and NonEmpty each supply their own implementations, so a body written in TweetSet for these methods would never be the one that runs.
The implementation of the two methods in Empty is very simple. Because Empty contains no elements, contains returns false for every argument. The incl method adds an element to Empty; once an element is added, the empty set becomes a NonEmpty whose root is the newly added Tweet and whose left and right subtrees are both empty.
Now let's look at NonEmpty. The contains method first compares the text of the argument with the text of the root. If the argument's text is smaller than the root's text, a tweet with that text can only be in the left subtree, so the search continues there; if it is greater, the search continues in the right subtree. If the two texts are equal, true is returned.
Did you notice that membership is decided purely by comparing text? Neither user nor retweets is checked. (You could call this a bug, but we take it to be a deliberate simplification.)
Next, the incl method. If the argument's text is smaller than the root's text, the tweet is added to the left subtree; if it is greater, it is added to the right subtree. If the two texts are equal, the element is already in the set and nothing needs to be added, so the set itself is returned.
One more detail: incl does not modify the set it is called on. It builds and returns a new object that reflects the addition, which is consistent with the immutability requirement stated at the beginning of the assignment.
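To make these two observations concrete, here is a small self-contained sketch. The class bodies repeat the contains and incl code quoted above; the InclDemo object and its sample tweets are made up purely for illustration.

```scala
class Tweet(val user: String, val text: String, val retweets: Int)

abstract class TweetSet {
  def incl(tweet: Tweet): TweetSet
  def contains(tweet: Tweet): Boolean
}

class Empty extends TweetSet {
  def contains(tweet: Tweet): Boolean = false
  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  def contains(x: Tweet): Boolean =
    if (x.text < elem.text) left.contains(x)
    else if (elem.text < x.text) right.contains(x)
    else true

  def incl(x: Tweet): TweetSet =
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this
}

// Hypothetical sample data, not taken from the assignment.
object InclDemo {
  val a = new Tweet("alice", "hello scala", 3)
  val b = new Tweet("bob", "immutable sets", 7)

  val empty: TweetSet = new Empty
  val one: TweetSet = empty.incl(a) // `empty` itself is left untouched
  val two: TweetSet = one.incl(b)   // `one` itself is left untouched

  // Same text as `a`, but a different user and retweet count.
  val aClone = new Tweet("carol", "hello scala", 99)
}
```

In this sketch, InclDemo.one still does not contain b after two has been built, and InclDemo.two.contains(InclDemo.aClone) is true even though only the text matches.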
Filtering
Now that we have read through the two methods already implemented for the set, let's analyze the first task of the assignment: filtering.
This part implements a filter method. Its parameter is a predicate, a function from a Tweet to a Boolean, and it returns the subset of the set whose elements all satisfy the predicate. For example, the following call:
tweets.filter(tweet => tweet.retweets > 10)
returns the set of all elements of tweets whose retweets is greater than 10.
The assignment gives a hint: add a helper method filterAcc for filter. The signatures of the two methods are as follows:
/** This method takes a predicate and returns a subset of all the elements
 *  in the original set for which the predicate is true.
 */
def filter(p: Tweet => Boolean): TweetSet
def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet
For Empty, the implementation is very simple. Because Empty contains no elements, filtering it with any predicate yields the empty set again, so filter can simply return this.
Next, let's see how NonEmpty implements this method. The pattern in contains and incl shows that a method that has to walk the set usually handles the root of the binary tree first and then handles the left and right subtrees separately. This is a recursive traversal. Applied to filter: first check whether the root element satisfies the predicate, and if it does, add it to the set to be returned; then filter the left and right subtrees. Each subtree can be treated as an independent set, and its own subtrees repeat the same operation. Does that sound familiar? (As the old fable has it: sons beget grandsons, grandsons beget sons of their own, and so on without end...)
So we need something to hold the elements that have already been visited and satisfy the predicate. And there it is: the second parameter of filterAcc is a TweetSet, which is exactly what that accumulator is for!
Therefore filter needs to call filterAcc. Because acc holds no elements at the start, an Empty object is passed in. p is the predicate and is passed along unchanged.
The key question is how to implement filterAcc. On a given set, filterAcc must first check the root, that is, whether elem satisfies the predicate p. And if it does? Right, add it to acc, which can be done with the incl method we already have. After handling the root, the same operation has to be applied to the left and right subtrees. Note that when filterAcc is called on the subtrees, the acc accumulated so far must not be discarded: every call to filterAcc must receive, as its second argument, the set produced by the previous step.
The logic would have been hard to work out from scratch, but the filterAcc hint makes the path much clearer. And again, do not forget that everything is immutable: each step returns a new set rather than changing an existing one!
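The description above can be turned into a minimal sketch along the following lines. It is one possible reading of the text, not the reference solution. Two details are assumptions of this sketch: Empty.filterAcc returns acc (the post does not spell this out), and the FilterDemo object with its sample tweets is invented for illustration.

```scala
class Tweet(val user: String, val text: String, val retweets: Int)

abstract class TweetSet {
  def filter(p: Tweet => Boolean): TweetSet
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet
  def incl(tweet: Tweet): TweetSet
  def contains(tweet: Tweet): Boolean
}

class Empty extends TweetSet {
  // Filtering an empty set gives the empty set again.
  def filter(p: Tweet => Boolean): TweetSet = this
  // Assumption: nothing is left to visit, so hand back what was accumulated.
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = acc
  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
  def contains(tweet: Tweet): Boolean = false
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  // Start with an empty accumulator; p is passed along unchanged.
  def filter(p: Tweet => Boolean): TweetSet = filterAcc(p, new Empty)

  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
    // Handle the root first: keep it only if it satisfies p.
    val withRoot = if (p(elem)) acc.incl(elem) else acc
    // Thread the accumulator through the left subtree, then the right one.
    right.filterAcc(p, left.filterAcc(p, withRoot))
  }

  def incl(x: Tweet): TweetSet =
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this

  def contains(x: Tweet): Boolean =
    if (x.text < elem.text) left.contains(x)
    else if (elem.text < x.text) right.contains(x)
    else true
}

// Hypothetical sample data for illustration.
object FilterDemo {
  val quiet = new Tweet("a", "quiet tweet", 5)
  val warm = new Tweet("b", "warm tweet", 20)
  val hot = new Tweet("c", "hot tweet", 50)

  val tweets: TweetSet = (new Empty).incl(quiet).incl(warm).incl(hot)
  val popular: TweetSet = tweets.filter(tweet => tweet.retweets > 10)
}
```

Here FilterDemo.popular holds warm and hot but not quiet, while FilterDemo.tweets still holds all three.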
Taking Unions
The second task is to complete the union method. Its parameter is another TweetSet, and it returns the union of the set it is called on and the argument. The signature is as follows:
def union(that: TweetSet): TweetSet
With the groundwork above, a solution should come to mind easily. Start with Empty: the union of the empty set with any set is that other set, so Empty can return its argument directly.
For NonEmpty, following the earlier pattern, first add the root of the binary tree to the argument set, and then merge in the left and right subtrees. As before, each step must take the set produced by the previous step as the argument of the next union call.
It's easy to implement.
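One way to read the description of union is sketched below. The exact nesting of the calls in NonEmpty.union is this sketch's interpretation, and the UnionDemo object with its sample tweets is hypothetical.

```scala
class Tweet(val user: String, val text: String, val retweets: Int)

abstract class TweetSet {
  def union(that: TweetSet): TweetSet
  def incl(tweet: Tweet): TweetSet
  def contains(tweet: Tweet): Boolean
}

class Empty extends TweetSet {
  // The union of the empty set with any set is that set.
  def union(that: TweetSet): TweetSet = that
  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
  def contains(tweet: Tweet): Boolean = false
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  // Add the root to `that`, then merge in the right and left subtrees,
  // each call receiving the set built by the previous step.
  // Every recursive call is made on a strictly smaller tree, so this terminates.
  def union(that: TweetSet): TweetSet =
    left.union(right.union(that.incl(elem)))

  def incl(x: Tweet): TweetSet =
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this

  def contains(x: Tweet): Boolean =
    if (x.text < elem.text) left.contains(x)
    else if (elem.text < x.text) right.contains(x)
    else true
}

// Hypothetical sample data for illustration.
object UnionDemo {
  val t1 = new Tweet("a", "first", 1)
  val t2 = new Tweet("b", "second", 2)
  val t3 = new Tweet("c", "third", 3)

  val s1: TweetSet = (new Empty).incl(t1).incl(t2)
  val s2: TweetSet = (new Empty).incl(t3)
  val all: TweetSet = s1.union(s2)
}
```

In this sketch UnionDemo.all contains t1, t2, and t3, while s1 and s2 are unchanged.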
Sorting Tweets by Their Influence
This part sorts a TweetSet and returns a TweetList. TweetSet.scala also defines a trait, TweetList, to represent a list:
trait TweetList {
  def head: Tweet
  def tail: TweetList
  def isEmpty: Boolean
  def foreach(f: Tweet => Unit): Unit =
    if (!isEmpty) {
      f(head)
      tail.foreach(f)
    }
}
This is a typical representation of a list: head is a Tweet object, and tail is the list made up of all elements after head.
TweetList has two implementations: the empty list Nil and the non-empty list Cons:
object Nil extends TweetList {
  def head = throw new java.util.NoSuchElementException("head of EmptyList")
  def tail = throw new java.util.NoSuchElementException("tail of EmptyList")
  def isEmpty = true
}

class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
  def isEmpty = false
}
As we can see, both head and tail of Nil throw an exception. Since there is no need for more than one empty list instance, Nil is defined as a singleton, that is, with the object keyword.
A Cons consists of two parts: head, which is a Tweet object, and tail, which is itself a TweetList.
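As a quick illustration of how these pieces fit together, the sketch below builds a two-element list from Cons and Nil and walks it with foreach. The trait and the two implementations follow the definitions quoted above (with explicit result types added on head and tail of Nil); the ListDemo object, its sample tweets, and the texts helper are hypothetical.

```scala
import scala.collection.mutable.ListBuffer

class Tweet(val user: String, val text: String, val retweets: Int)

trait TweetList {
  def head: Tweet
  def tail: TweetList
  def isEmpty: Boolean
  def foreach(f: Tweet => Unit): Unit =
    if (!isEmpty) {
      f(head)
      tail.foreach(f)
    }
}

// Note: inside this snippet, Nil refers to this object, not to scala.Nil.
object Nil extends TweetList {
  def head: Tweet = throw new java.util.NoSuchElementException("head of EmptyList")
  def tail: TweetList = throw new java.util.NoSuchElementException("tail of EmptyList")
  def isEmpty = true
}

class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
  def isEmpty = false
}

// Hypothetical sample data and helper for illustration.
object ListDemo {
  val t1 = new Tweet("alice", "first", 10)
  val t2 = new Tweet("bob", "second", 5)

  // t1 is the head; the tail is the one-element list holding t2.
  val list: TweetList = new Cons(t1, new Cons(t2, Nil))

  // Collects the text of every tweet, in list order, using foreach.
  def texts(l: TweetList): List[String] = {
    val buf = ListBuffer.empty[String]
    l.foreach(t => buf += t.text)
    buf.toList
  }
}
```

Calling ListDemo.texts(ListDemo.list) visits t1 first and t2 second, while asking Nil for its head throws a NoSuchElementException.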
That was a bit of a detour. Coming back to the task: to rearrange the elements of a TweetSet into a TweetList ordered by retweet count (that is, by influence), we start from Nil and repeat the following steps:
1. Select the tweet with the most retweets.
2. Add the selected tweet to the TweetList to be returned.
3. Remove that tweet from the original TweetSet.
Chained together, these three steps make up the sort-by-influence function. Its signature is as follows:
def descendingByRetweet: TweetList
This method takes no parameters and returns a TweetList.
With the three steps above in mind, reading further in the assignment shows that the method needed for step 3 is already implemented:
def remove(tw: Tweet): TweetSet =
  if (tw.text < elem.text) new NonEmpty(elem, left.remove(tw), right)
  else if (elem.text < tw.text) new NonEmpty(elem, left, right.remove(tw))
  else left.union(right)
So we only need to take care of steps 1 and 2.
Let's start with step 1. A prototype for it is also given in the abstract class TweetSet:
def mostRetweeted: Tweet = ???
No implementation is given there, and none is needed there; instead we implement the method in Empty and in NonEmpty. What should the most retweeted tweet of an empty set be? We can use a sentinel object to mark this case: for example, return a Tweet whose retweets is -1.
And in NonEmpty? Following the earlier pattern, compare the root element with the mostRetweeted results of the left and right subtrees and return whichever has the most retweets. Concretely, define three Tweet values: the root element, the mostRetweeted result of the left subtree, and the mostRetweeted result of the right subtree; then compare them and return the one with the largest retweets. Because every chain of subtrees ends in an empty set after finitely many steps, and we have defined what the empty set returns, the recursion is guaranteed to terminate with the most retweeted tweet.
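A minimal sketch of this step, following the sentinel approach described above, might look as follows. The empty user and text of the sentinel Tweet, the tie-breaking order, and the MostDemo object with its sample tweets are all choices made for this illustration.

```scala
class Tweet(val user: String, val text: String, val retweets: Int)

abstract class TweetSet {
  def mostRetweeted: Tweet
  def incl(tweet: Tweet): TweetSet
}

class Empty extends TweetSet {
  // Sentinel: a placeholder tweet with -1 retweets marks "no tweet here".
  def mostRetweeted: Tweet = new Tweet("", "", -1)
  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  def mostRetweeted: Tweet = {
    val l = left.mostRetweeted   // best of the left subtree (or the sentinel)
    val r = right.mostRetweeted  // best of the right subtree (or the sentinel)
    val best = if (l.retweets > r.retweets) l else r
    // On a tie the root wins; a real tweet always beats the -1 sentinel.
    if (elem.retweets >= best.retweets) elem else best
  }

  def incl(x: Tweet): TweetSet =
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this
}

// Hypothetical sample data: "middle" becomes the root,
// "alpha" goes to the left subtree and "zulu" to the right.
object MostDemo {
  val tweets: TweetSet = (new Empty)
    .incl(new Tweet("a", "middle", 3))
    .incl(new Tweet("b", "alpha", 42))
    .incl(new Tweet("c", "zulu", 7))
}
```

In this sketch MostDemo.tweets.mostRetweeted is the tweet with text alpha and 42 retweets, and the empty set returns the sentinel with retweets equal to -1.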
Step 2 is to put the most retweeted Tweet at the front of the list and return it. First we need a check: if the retweets of the returned Tweet is -1 (the magic number we defined earlier), return Nil, meaning that sorting an empty set gives an empty list. Otherwise, the most retweeted Tweet becomes the head, and the tail is the list obtained by sorting, by retweet count, the set that remains after removing that Tweet from the original TweetSet. Is the description of tail hard to follow? Put simply: remove the Tweet from the original TweetSet, then call descendingByRetweet on the result to get the tail.
Look, it's another beautiful recursion.
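Putting the three steps together, a self-contained sketch could look like this. It is an illustration under stated assumptions rather than the official solution: the -1 sentinel follows the approach described above, placing descendingByRetweet in the abstract class is this sketch's choice, the union body is one possible implementation, and SortDemo with its sample data is hypothetical.

```scala
class Tweet(val user: String, val text: String, val retweets: Int)

trait TweetList {
  def head: Tweet
  def tail: TweetList
  def isEmpty: Boolean
}
object Nil extends TweetList {
  def head: Tweet = throw new java.util.NoSuchElementException("head of EmptyList")
  def tail: TweetList = throw new java.util.NoSuchElementException("tail of EmptyList")
  def isEmpty = true
}
class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
  def isEmpty = false
}

abstract class TweetSet {
  def incl(tweet: Tweet): TweetSet
  def remove(tweet: Tweet): TweetSet
  def union(that: TweetSet): TweetSet
  def mostRetweeted: Tweet

  // Assumption: defined once here and shared by both subclasses.
  def descendingByRetweet: TweetList = {
    val top = mostRetweeted                              // step 1: pick the top tweet
    if (top.retweets == -1) Nil                          // sentinel: the set is empty
    else new Cons(top, remove(top).descendingByRetweet)  // steps 2 and 3
  }
}

class Empty extends TweetSet {
  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
  def remove(tweet: Tweet): TweetSet = this
  def union(that: TweetSet): TweetSet = that
  def mostRetweeted: Tweet = new Tweet("", "", -1)
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  def incl(x: Tweet): TweetSet =
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this

  def remove(tw: Tweet): TweetSet =
    if (tw.text < elem.text) new NonEmpty(elem, left.remove(tw), right)
    else if (elem.text < tw.text) new NonEmpty(elem, left, right.remove(tw))
    else left.union(right)

  def union(that: TweetSet): TweetSet = left.union(right.union(that.incl(elem)))

  def mostRetweeted: Tweet = {
    val l = left.mostRetweeted
    val r = right.mostRetweeted
    val best = if (l.retweets > r.retweets) l else r
    if (elem.retweets >= best.retweets) elem else best
  }
}

// Hypothetical sample data and helper for illustration.
object SortDemo {
  val tweets: TweetSet = (new Empty)
    .incl(new Tweet("a", "middle", 3))
    .incl(new Tweet("b", "alpha", 42))
    .incl(new Tweet("c", "zulu", 7))

  // Reads the retweet counts off a TweetList, front to back.
  def retweetsOf(l: TweetList): List[Int] =
    if (l.isEmpty) List() else l.head.retweets :: retweetsOf(l.tail)
}
```

For the sample set, SortDemo.retweetsOf(SortDemo.tweets.descendingByRetweet) gives the counts in descending order, and sorting an empty set gives Nil.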
Tying everything together
With filter, union, and sorting done, we can now do something fun: for example, pick out the most retweeted tweets from a batch. TweetReader.scala in the assignment project contains a batch of Twitter data that has already been collected for us; calling TweetReader.allTweets returns the set of these tweets as a TweetSet object.
In addition, TweetSet.scala provides two keyword lists: the first contains words related to Google and Android, and the second contains words related to Apple and iOS.
The assignment first asks us to assign two TweetSet values. googleTweets holds every Tweet whose text mentions a word from the first list, and appleTweets holds every Tweet whose text mentions a word from the second list. Both are lazy values (we will cover lazy values later), and their signatures are as follows:
lazy val googleTweets: TweetSet
lazy val appleTweets: TweetSet
After these two values are assigned, the assignment also asks us to sort the elements of their union by retweet count, by completing the following definition:
/**
 * A list of all tweets mentioning a keyword from either apple or google,
 * sorted by the number of retweets.
 */
lazy val trending: TweetList = ???
First, the two values. We of course need a value that holds all the tweets; assigning that TweetSet needs little explanation, since it takes a single statement.
Then we use the filter method on the complete set of tweets. What is the filter condition? Whether the text contains a word from the corresponding list. So the predicate passed to filter tests whether the tweet's text contains at least one word from the list.
The two TweetSet values are computed the same way; only the keyword list differs.
trending is even simpler: take the union of googleTweets and appleTweets, and call the sorting method on that union.
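The keyword filtering and the union can be sketched as follows. Everything specific here is assumed for illustration: the keyword lists are short sample values rather than the lists shipped with the assignment, allTweets is a tiny hand-made set standing in for TweetReader.allTweets, the mentions helper is hypothetical, and the filter and union bodies are one possible implementation. The final sorting step (calling descendingByRetweet on the union) is left out to keep the sketch short.

```scala
class Tweet(val user: String, val text: String, val retweets: Int)

abstract class TweetSet {
  def filter(p: Tweet => Boolean): TweetSet = filterAcc(p, new Empty)
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet
  def union(that: TweetSet): TweetSet
  def incl(tweet: Tweet): TweetSet
  def contains(tweet: Tweet): Boolean
}

class Empty extends TweetSet {
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = acc
  def union(that: TweetSet): TweetSet = that
  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)
  def contains(tweet: Tweet): Boolean = false
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
    val withRoot = if (p(elem)) acc.incl(elem) else acc
    right.filterAcc(p, left.filterAcc(p, withRoot))
  }
  def union(that: TweetSet): TweetSet = left.union(right.union(that.incl(elem)))
  def incl(x: Tweet): TweetSet =
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this
  def contains(x: Tweet): Boolean =
    if (x.text < elem.text) left.contains(x)
    else if (elem.text < x.text) right.contains(x)
    else true
}

object Trends {
  // Sample keyword lists (assumed values, not the assignment's lists).
  val google = List("android", "galaxy", "nexus")
  val apple = List("ios", "iphone", "ipad")

  // Hand-made stand-in for TweetReader.allTweets.
  val androidTweet = new Tweet("gadget", "new android phone announced", 12)
  val iphoneTweet = new Tweet("mobile", "iphone sales are up", 30)
  val otherTweet = new Tweet("chef", "best pasta recipe", 99)
  val allTweets: TweetSet =
    List(androidTweet, iphoneTweet, otherTweet)
      .foldLeft[TweetSet](new Empty)((set, t) => set.incl(t))

  // True if the tweet's text contains at least one keyword (case-sensitive).
  def mentions(keywords: List[String], t: Tweet): Boolean =
    keywords.exists(k => t.text.contains(k))

  lazy val googleTweets: TweetSet = allTweets.filter(t => mentions(google, t))
  lazy val appleTweets: TweetSet = allTweets.filter(t => mentions(apple, t))

  // The set that would then be sorted by retweet count to obtain trending.
  lazy val googleOrApple: TweetSet = googleTweets.union(appleTweets)
}
```

With this sample data, Trends.googleOrApple holds the android and iphone tweets but not the pasta one, however many retweets it has.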
Conclusion
Two points stand out in this assignment. 1. These classes are immutable: an operation on an object does not modify the original object but expresses the change by returning a new object. Immutability is a very important concept in functional programming. 2. Because of the way TweetSet and TweetList are represented, recursion is the natural way to implement the functions. When recursing, do not forget to pass the result of the previous step as the argument of the next step, otherwise the result will be wrong!
Two more things are worth your attention. First, besides the functions to be completed, the assignment ships with many functions already implemented; these fit the functional programming style well and the code is very tidy, so we recommend reading and learning from them. Second, there is a TweetSetSuite test class under the test path; you can add test cases to it to check the robustness of your functions!
Week 3 assignment: http://download.csdn.net/detail/doggie_wangtao/7395957
Analysis of previous exercises:
Coursera open course Functional Programming Principles in Scala exercise answer: Week 2
Coursera open course Functional Programming Principles in Scala exercise answer: Week 1
Statement:
This article is original work. Commercial use is prohibited. When reposting, please credit the source: http://blog.csdn.net/asongoficeandfire/article/details/26842279