Reposted from: http://www.ithao123.cn/content-6053935.html
The difference between cache and persist is easy to see in the RDD.scala source code:
    def persist(newLevel: StorageLevel): this.type = {
      if (storageLevel != StorageLevel.NONE && newLevel != storageLevel) {
        throw new UnsupportedOperationException(
          "Cannot change storage level of an RDD after it was already assigned a level")
      }
      sc.persistRDD(this)
      sc.cleaner.foreach(_.registerRDDForCleanup(this))
      storageLevel = newLevel
      this
    }

    /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
    def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

    /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
    def cache(): this.type = persist()
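From this we can see:
1) The RDD cache() method simply calls persist(), so its storage level is always MEMORY_ONLY;
2) The persist(newLevel) method lets you set the StorageLevel explicitly, so you can choose whatever storage level the project needs. Once a level has been assigned, asking for a different one throws an UnsupportedOperationException;
3) Neither cache nor persist is an action. Both only mark the RDD, and nothing is stored until an action triggers the computation;
Note: both cache and persist can be undone with unpersist.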
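
The points above can be checked with a small local program. This is only a sketch: it assumes Spark is on the classpath, and the object name CacheVsPersist, the local[2] master, the application name, and the sample data are all made up for illustration. It relies on RDD.getStorageLevel to read back the level that cache() and persist() assigned.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; names and data are illustrative only.
object CacheVsPersist {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-vs-persist")
    val sc = new SparkContext(conf)
    try {
      // 1) cache() is persist() with the default level MEMORY_ONLY.
      val cached = sc.parallelize(1 to 1000).map(_ * 2).cache()
      assert(cached.getStorageLevel == StorageLevel.MEMORY_ONLY)

      // 2) persist(level) lets the caller pick the storage level.
      val persisted = sc.parallelize(1 to 1000).map(_ + 1)
        .persist(StorageLevel.MEMORY_AND_DISK)
      assert(persisted.getStorageLevel == StorageLevel.MEMORY_AND_DISK)

      // 3) Nothing has been computed or stored yet; these actions trigger it.
      cached.count()
      persisted.count()

      // Asking for a different level on an already-persisted RDD is rejected.
      val rejected =
        try { cached.persist(StorageLevel.DISK_ONLY); false }
        catch { case _: UnsupportedOperationException => true }
      assert(rejected)

      // unpersist() drops the stored blocks and resets the level to NONE.
      cached.unpersist()
      persisted.unpersist()
      assert(cached.getStorageLevel == StorageLevel.NONE)
    } finally {
      sc.stop()
    }
  }
}
```

After unpersist() the level is NONE again, so the RDD can be persisted with a different storage level if needed.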