What do Go (the language) and Scala (an object-functional programming language) have in common? They are both growing in popularity. They are both statically typed languages. And they are capturing mindshare from dynamic language experts.
At OSCON this week I attended “Getting Started with Scalding, Twitter’s High-level Scala API for Hadoop MapReduce” by Avi Bryant of Stripe. Conference veterans might remember Avi from his keynote at RailsConf in 2007. Avi talked about Smalltalk and Ruby, two decidedly “dynamic” languages, where dynamic means no static typing: everything is just an object, and the interpreter does not tell you that your data does not fit your functions until after the program has started running.
Another way of thinking about typing is to look at the other case: languages that are statically typed. Data inside a program written in a static language is constrained by a type. Imagine types as classifications. Without classifications in the real world, you could stick a duck into a beehive. That makes beehives easier to build and ducks easier to store, but you might find the honey tastes strange once in a while. Dynamic languages enable people to ship things quickly. Twitter started with Ruby and bootstrapped fast; but now we have Avi talking about Scalding from his time at Twitter. Twitter has moved much of its architecture (an architecture that now needs stability, not rapid iteration) to static languages.
Why are static languages surging in popularity? As Avi noted in his talk today, static languages report errors at compile time rather than run time. When we were building the first versions of web apps, our write/compile/run loop was short: we might make a change, hit the server, and see the results in our browser in a few seconds. The feedback loop was small. Avi’s talk today was about Hadoop clusters computing over millions of records of data. As Avi noted, a “job” might take ten hours, and you want to see your programming errors at compile time (before the job has been submitted to Hadoop) rather than during the running job. A static language won’t finish compiling, and so won’t let you submit a job, until the compiler has verified that your code uses the correct inputs and outputs. In other words, when you transform millions of records into something newsworthy about duck and bee relations, the compiler knows whether a record should be a row of bees or a row of ducks, and if the function is “countDucks” the compiler won’t let you proceed unless the row you are feeding it comes from a duck factory. With Ruby or Smalltalk this can’t happen: those languages figure out whether it is a bee or a duck after the job has started (“Hey, don’t yell at me, they both have wings!”), and with a Hadoop job that moment might be five hours in, meaning you just wasted five hours.
So, static languages catch bugs earlier. As more and more work moves to large data set computation, we will see more reliance on these languages. Ironically, as our computing power has grown, so have our data sets and log files. Processing this sheer volume of data has brought back long-running jobs, harkening back to when “job” referred to the punchcard jobs of early computing. We’ve largely forgotten the techniques of programming long-running jobs. Static languages provide a way back into writing reliable long-running processes. But they also rid us of the tiresome memory management issues that plagued earlier static languages like C.
One last thing these two languages excel at: concurrency. In his talk Avi showed code to process “Alice in Wonderland”; the pipeline looked like this (the input-reading lines at the top are my reconstruction for context, since the slide began mid-pipeline):

TypedPipe.from(TextLine(args("input")))
  .flatMap { line => line.split("""\s+""").toList }
  .filter { word => word.length > 2 }
  .filterNot { word => word.isEmpty }
  .filter { word => word.head.isUpper }
  .map { word => (word, 1) }
  .group
  .sum
  .groupAll
  .maxBy { case (word, count) => count }
This code is special because Scala and Scalding can optimize it depending on whether it is running locally (on your laptop) or over a cluster like Hadoop. But remember: the code needs no modification regardless of where it runs. You simply pass a different switch on the command line (--local vs --hdfs). Letting an optimizing compiler figure out how to map a job onto a running compute engine is a safer way to do transformation and transposition. Both Go and Scala provide language-level concurrency constructs that are incredibly easy to use compared to Java’s threading or Node.js’s async callbacks (even when using promise-based libraries). Concurrency is necessary for the distributed computing we see more and more of. These languages provide a simple and safe way to do concurrency, and that is another reason they are surging in popularity. When you want to act in a swarm, as bees or ants do, you build with simple components and simple workers. Bees are the way to go, not ducks. Use the right languages, like Go and Scala.
Feature image via Flickr Creative Commons