Small Functions Considered Harmful
In this blog post, my goals are to:
Shed light on some of the purported benefits of small functions
Explain why I personally think some of those benefits are not as good as advertised
Explain why small functions are sometimes counterproductive
Explain where I think small functions are genuinely useful, such as when mocking in tests
Programming advice almost always praises small functions as more elegant and more useful. The book "Clean Code", widely regarded as something of a programming Bible, has a chapter dedicated to functions. The chapter opens with a very long, headache-inducing function. The book holds that the biggest problem with this function is its length, and points out:
Not only is it long, but it has duplicated code, lots of odd strings, and many strange and unclear data types and APIs. Do you understand the function after three minutes of study? Probably not. There is too much going on at too many different levels of abstraction. There are strange strings and odd function calls mixed in with doubly nested if statements controlled by flags.
The chapter briefly considers what qualities would make the code "easy to read and understand" and "allow any reader to intuitively understand the program they encounter", before stating that to achieve this, functions must be made smaller.
The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.
The view that functions should be small is treated as close to gospel and beyond question: in code reviews, on Twitter, at conferences, in books and podcasts on programming, in articles on refactoring best practices, and so on. The idea entered my timeline again a few days ago in the form of a tweet.
In his tweet, Fowler linked to his article on function length, which goes on to say:
If you have to spend effort looking at a fragment of code to figure out what it is doing, then you should extract it into a function and name the function after that "what".
Once I accepted this principle, I developed the habit of writing very small functions, typically only a few lines long. Any function longer than half a dozen lines starts to make me uncomfortable, and it is not unusual for me to have functions that are a single line of code.
Some people are so enamored of small functions that the idea of abstracting any logic that seems even slightly complex into a separate function is enthusiastically cheered on.
I have worked on code bases inherited from people who had internalized this view to such a distorted extent that the end result was beyond repair and entirely contrary to the good intentions behind it. In this article, I want to explain why some of the oft-touted benefits do not always pan out the way people hope, and why applying some of these ideas can sometimes be counterproductive.
Supposed benefits of smaller functions
A handful of reasons are usually offered in support of small functions.
Do one thing only
The idea is simple: a function should do only one thing and do it well. On the surface this seems like a very good idea, and it echoes the Unix philosophy.
Things get murky when this "one thing" has to be defined. "One thing" can be anything from a simple return statement to a conditional expression, a piece of mathematical computation, or a network call. In many cases, "one thing" means a single level of abstraction of some (usually business) logic.
For example, in a web application, a CRUD operation like "create user" might be "one thing". At the very least, creating a user involves creating a record in the database (and handling any accompanying errors). In addition, we may need to send the user a welcome email after they register. On top of that, we may also want to publish a custom event to a message broker like Kafka so that the event can be fed into other systems.
So a "single level of abstraction" is not just a single level. What I have seen is that programmers who fully buy into the idea that a function should do "one thing" often find it hard to resist applying the same principle recursively to every function and method they write.
As a result, instead of a reasonably sized abstraction that can be understood (and tested) as a unit, we end up carving out ever smaller units to describe each component of the "one thing", until the code is fully modular and entirely DRY (Don't Repeat Yourself).
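To make the "create user" example concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the database is a plain dict, the email outbox and the event stream are plain lists, and all function names are invented. The first version keeps the three steps visible in one place; the second shows what the same logic tends to look like once "one thing" has been applied recursively.

```python
# Version 1: one function, with all three steps readable in one place.
def create_user(db: dict, outbox: list, events: list, email: str) -> dict:
    if email in db:
        raise ValueError(f"user {email} already exists")
    user = {"email": email}
    db[email] = user                                          # 1. persist the record
    outbox.append(("welcome", email))                         # 2. queue the welcome email
    events.append({"type": "user_created", "email": email})   # 3. publish the event
    return user


# Version 2: identical behavior, split into one tiny function per step.
def ensure_user_does_not_already_exist(db: dict, email: str) -> None:
    if email in db:
        raise ValueError(f"user {email} already exists")


def build_user_record(email: str) -> dict:
    return {"email": email}


def persist_user_record(db: dict, user: dict) -> None:
    db[user["email"]] = user


def enqueue_welcome_email(outbox: list, email: str) -> None:
    outbox.append(("welcome", email))


def publish_user_created_event(events: list, email: str) -> None:
    events.append({"type": "user_created", "email": email})


def create_user_decomposed(db: dict, outbox: list, events: list, email: str) -> dict:
    ensure_user_does_not_already_exist(db, email)
    user = build_user_record(email)
    persist_user_record(db, user)
    enqueue_welcome_email(outbox, email)
    publish_user_created_event(events, email)
    return user
```

Both versions do the same work. The second simply asks the reader to visit five more definitions before they can be sure of that.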
The fallacy of DRY
DRY and a tendency to make functions as small as possible are not necessarily the same thing, but I have seen the latter frequently turn the former into a goal. DRY is a good guiding principle in my opinion, but all too often pragmatism and reason are sacrificed to a dogmatic reading of it, especially by programmers of the Rails persuasion.
Raymond Hettinger, a Python core developer, gave a wonderful talk called Beyond PEP8, about best practices for beautiful, easy-to-understand code. It is a must-watch, not only for Python programmers but for anyone interested in programming or who programs for a living, because it very clearly lays bare the fallacy of dogmatic compliance with PEP8, the Python style guide that many linters implement. That the talk is about PEP8 matters less than the broader insights it offers, many of which are language agnostic.
Even if you do not watch the whole talk, you should watch the minute of it that draws a startling analogy to the siren call of DRY. Programmers who insist on DRYing up the code as much as possible end up focusing only on the local picture and losing sight of the whole.
My main problem with DRY is that it forces abstractions, and nested and premature ones at that. Since it is impossible to abstract perfectly, the best we can do is abstract well enough. Defining "well enough" is difficult and depends on many factors.
In the figure below, the term "abstraction" can be used interchangeably with "function". For example, suppose we want to design abstraction layer A. We may need to consider the following points:
The assumptions underpinning abstraction A, how likely they are to hold, and for how long
The tendency of the abstraction layers underlying A (layers X and Y), and of any layer built on top of A (layer Z), to remain consistent, flexible, extensible and correct in their implementation and design
The requirements and expectations of any future abstraction that may be built on top of A (layer M), and of any abstraction that may need to be supported beneath A (layer N)
The abstraction layer A we develop will inevitably be reevaluated in the future, and it will probably be partially or even completely invalidated. The single most important property that will help us through these inevitable changes is flexibility, so we should design our abstractions to be flexible.
DRYing up the code as far as possible today means depriving our future selves of the flexibility needed to adapt to change. What we should optimize for is leaving ourselves enough room to make the inevitable changes that will be required sooner or later, rather than optimizing immediately for a perfect fit.
The best abstraction is one that is optimized well enough, but not perfectly. This is a feature, not a bug. Understanding this very salient property of abstractions is the key to designing good ones.
Alex Martelli, a well-known Pythonista closely associated with duck typing, gave a famous talk called "The Tower of Abstraction". The slides are well worth reading.
Rubyist Sandi Metz has a famous talk called All The Little Things, in which she argues that "duplication is far cheaper than the wrong abstraction" and that we should therefore "prefer duplication over the wrong abstraction".
In my opinion, an abstraction can never be entirely "right" or "wrong", because the line between "right" and "wrong" is inherently blurry. In reality, our carefully designed "perfect" abstraction is only one business requirement or one bug report away from being invalidated.
I think it helps to view abstraction as a spectrum, like the one in the figure earlier in this article. One end of the spectrum optimizes for precision, where every aspect of our code must be exact. This certainly has its advantages, but because it strives for perfect alignment, it does not make for good abstractions. The other end of the spectrum optimizes for imprecision and a lack of boundaries. Although this allows maximum flexibility, I have found that this extreme leads to other drawbacks.
As with most things, the "ideal" lies somewhere in between. There is no one-size-fits-all answer. The "ideal" also depends on many factors, both technical and interpersonal, and good engineering means being able to identify where the "ideal" lies in a given context, and to keep re-evaluating and recalibrating it.
The name of the game
Speaking of abstraction, once you have decided what to abstract and how, you need to give it a name.
Naming things has always been hard.
It is generally held as a truism in programming that longer, more descriptive names are a good thing. Some even advocate replacing comments in the code with functions named after the comment. The idea is that the more descriptive a name, the better the encapsulation.
This view is common in the Java world, where long names are the norm, but I have never found that these long names make code easier to read. What could have been 4-5 lines of code is now hidden away behind a function with a very long name. When I am reading code, a very long name suddenly appearing stops me in my tracks, because I have to process all the different syllables in the function's name, try to fit it into the mental model I have built so far, and decide whether or not to jump to its definition to read the implementation.
The problem with "small functions", however, is that the pursuit of small functions leads to even more small functions, all of which tend to be given very verbose names in the spirit of making the code self-documenting and avoiding comments.
As a result, processing verbose function (and variable) names adds cognitive overhead. I have to map them into the mental model I have constructed so far, decide which functions need to be explored in depth and which can be skimmed, and piece the puzzle together to see the big picture. Lengthy function (and variable) names make every one of these steps harder.
Personally, I find that the keywords, constructs and idioms provided by the programming language are visually much easier to take in than custom variable or function names. For example, when I read an if-else block, I rarely need to spend energy processing the keywords if or elseif. I only need to spend time understanding the logical flow of the program.
A name like aVeryVeryLongFuncNameAndArgList interrupts my flow of reasoning. This is especially true when the function being called is actually a one-liner that could easily be inlined. Context switching is expensive, whether it is a CPU switching context or a programmer having to switch mental context while reading code.
Another problem with an abundance of small functions, especially ones whose names are very descriptive but not intuitive, is that they are harder to find in the code base. A function called createUser is easy and intuitive to grep for. By contrast, a name like renderPageWithSetupsAndTeardowns (held up as a shining example in "Clean Code") is neither the easiest name to remember nor the easiest to search for. Many editors also perform fuzzy searches over the code base, so functions with similar prefixes are more likely to pull up extraneous results, which is not what we want.
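As a small illustration of the one-liner point above, consider the following Python sketch. The business rule and all names are invented for this example. Both functions compute the same thing, but one of them makes the reader parse a long name and then decide whether to jump to its definition, while the other states the condition using nothing but the language's own operators.

```python
MINIMUM_SIGNUP_AGE = 18  # hypothetical business rule, for illustration only


def is_applicant_old_enough_to_create_an_account(age: int) -> bool:
    # A one-line helper whose name is longer than its body.
    return age >= MINIMUM_SIGNUP_AGE


def can_sign_up_via_helper(age: int, has_accepted_terms: bool) -> bool:
    # The reader has to stop, parse the long name, and maybe look it up.
    return is_applicant_old_enough_to_create_an_account(age) and has_accepted_terms


def can_sign_up_inline(age: int, has_accepted_terms: bool) -> bool:
    # Same logic with the comparison inlined: nothing to look up.
    return age >= MINIMUM_SIGNUP_AGE and has_accepted_terms
```

Whether the helper earns its keep depends on context, for example on whether the rule is reused in many places. The sketch only shows the reading cost being discussed.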
Loss of Locality
(Note: this refers to code that could have lived in the same function, file or package, but was moved into other functions, files or packages for the sake of small functions)
Small functions work best when we do not have to jump across files or packages to find the definition of a function. The book "Clean Code" proposes a principle called "The Stepdown Rule".
We want the code to read like a top-down narrative. We want every function to be followed by those at the next level of abstraction, so that we can read the program descending one level of abstraction at a time as we read down the list of functions. I call this "The Stepdown Rule".
This sounds fine in theory, but it rarely works out in practice. Instead, what I see almost every time is that locality is lost as more functions are added to the code.
Let us assume we start with three functions A, B and C, each calling the next. Our initial abstractions rested on certain assumptions, requirements and caveats, all of which we carefully researched and reasoned about during the initial design.
Soon enough, suppose a new requirement comes up, or an edge case we had not foreseen, or a new constraint we need to cater to. We need to modify function A because the "one thing" it encapsulates is no longer valid (or perhaps was never valid to begin with, and now we need to fix it). In keeping with what we learned from "Clean Code", we decide that the best way to deal with this is to create more functions that tuck away the various new requirements.
A few weeks after we make this change, if our requirements change again, we may need to create even more functions to encapsulate all the additional changes required.
Repeat this a few more times and we arrive at exactly the problem Sandi Metz describes in her blog post "The Wrong Abstraction". The post says:
Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. Unfortunately, the sad truth is that the more complicated and incomprehensible the code, that is, the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy").
I believe this holds when the same team continues to maintain the code, but when new programmers (or managers) take ownership of the code base, I see the opposite. Code that started out with good intentions has by now turned into spaghetti that is anything but clean, and the urge to "refactor" or sometimes even rewrite it becomes all the more tempting.
Now, people may argue that to some extent this is inevitable. They are right. What we rarely discuss is how important it is to write code that can die gracefully. I have written in the past about the importance of making code operationally easy to decommission, and this applies even more to the code base itself.
Programmers usually think of code as "dead" only when it has been deleted or is no longer used. If we start thinking of the code we write as something that dies every time a new git commit is added, I think we may be more motivated to write code that is easy to modify. When thinking about abstraction, it helps to recognize that the code we are building may be only a few hours away from dying (being modified). Optimizing for ease of modification therefore tends to work better than trying to build the top-down narrative proposed in "Clean Code".
Pollution
In languages that support object-oriented programming, small functions lead to either larger classes or more classes. In a language like Go, I have seen this tendency lead to larger interfaces (combined with the double whammy of interface pollution) or to a large number of tiny packages.
This adds to the cognitive overhead of mapping business logic onto the abstractions we have created. The greater the number of classes, interfaces and packages, the harder it is to take everything in at once, which does little to justify the maintenance cost of all the classes, interfaces and packages we have built.
Fewer parameters
Proponents of smaller functions almost always also favor passing fewer parameters to a function.
The problem with fewer function parameters is the risk that dependencies are no longer explicit.
I have seen Ruby classes with 5-10 tiny methods, each of which does something very simple and takes perhaps one or two variables as parameters. I have also seen many of them mutate shared global state, or rely on singletons that are never explicitly passed around. If anything counts as an anti-pattern, this does.
In addition, when dependencies are not explicit, testing becomes more complicated. Before each individual test of our itty-bitty functions can run, we have to set up and reset the shared state.
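The following Python sketch illustrates the hidden-dependency problem described above. It is an invented example, not taken from any real code base: the implicit version offers functions with no parameters at all, which looks tidy but hides the fact that both functions depend on a shared module-level list. The explicit version takes the same data as arguments.

```python
# Implicit: zero parameters, but both functions secretly depend on _cart.
_cart: list = []


def add_default_item() -> None:
    _cart.append(("notebook", 3))


def cart_total() -> int:
    return sum(price for _, price in _cart)


# Explicit: every dependency is visible in the signature.
def add_item(cart: list, name: str, price: int) -> list:
    return cart + [(name, price)]


def total(cart: list) -> int:
    return sum(price for _, price in cart)


def test_implicit() -> None:
    _cart.clear()  # forget this reset and earlier tests leak into this one
    add_default_item()
    assert cart_total() == 3


def test_explicit() -> None:
    # No setup or teardown: the test owns all of its inputs.
    assert total(add_item([], "notebook", 3)) == 3
```

The implicit test passes only because it remembers to clear the shared list first. The explicit test needs no such ceremony, at the cost of one or two extra parameters.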
Harder to read
This has already been said, but it is worth repeating: an explosion of small functions, especially one-line functions, makes the code base harder to read. This especially hurts the very people the code should have been optimized for: newcomers.
There are several kinds of newcomers to a code base. In my experience, a good rule of thumb is to keep in mind someone who might tick several of those "new" boxes at once. Doing so helps me re-evaluate my assumptions and rethink the overhead I may be inadvertently imposing on a newcomer who is reading the code for the first time. I have come to realize that this approach actually leads to better and simpler code than I would otherwise write.
Simple code is not necessarily easy to write, and it is rarely the DRYest code. It takes a great deal of careful thought, attention to detail and care to arrive at the simplest solution that is both correct and easy to reason about. The most striking thing about such hard-won simplicity is that it is easy to understand for both new and old programmers, for all possible definitions of "old" and "new".
When I am new to a code base, if I am fortunate enough to already know the language or framework it uses, the biggest challenge for me is understanding the business logic or the implementation details. When I am not so lucky and face the daunting task of working through a code base written in a language foreign to me, the biggest challenge is walking a tightrope: understanding the language and framework well enough to follow what the code does without falling down a rabbit hole, while isolating the "one thing" I actually need to understand to make the necessary progress on the project.
At no point during any of this have I looked at an unfamiliar code base and said:
Well, these functions are nice and small, and so DRY.
When I venture into unknown territory looking for answers to my questions, what I really want is the smallest possible number of mental hops and context switches.
Investing time and effort in making things easier for future maintainers or consumers of the code pays off hugely, especially for open source projects. This is something I wish I had done better earlier in my career, and something I pay close attention to these days.
When does a small function make sense
All things considered, I do believe small functions have their place, especially when it comes to testing.
Network I/O
This is not an article on how best to write functional, integration and unit tests for a large number of services. However, when it comes to unit testing, the way network I/O is tested is, well, by not actually testing it.
I am not a big fan of mocks. Mocks have several shortcomings. First, a mock is a hand-written simulation of some outcome, so it is only as good as our imagination and our ability to predict the various failure modes our application may encounter. Mocks are also likely to drift out of sync with the real services they stand in for, unless every mock is rigorously tested against the real service. Mocks work best when there is only one instance of each particular mock and every test uses the same one.
That said, mocks are still the only way to unit test some forms of network I/O. We live in an era of microservices and of outsourcing most (if not all) concerns outside our main product to vendors. The core functionality of many applications now requires one or more network calls, and the best way to unit test these calls is to mock them out.
Overall, I find it best to limit the surface area of mocking to as little code as possible. Calling an email service's API to send a welcome email to our newly created user requires an HTTP connection. Isolating this request in the smallest possible function lets us mock the least amount of code in tests. Typically, this should be a function of no more than 1-2 lines that makes the HTTP connection and returns any error along with the response. The same applies to publishing an event to Kafka or creating a new user in the database.
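Here is a minimal Python sketch of the idea above, using only the standard library (urllib.request and unittest.mock). The endpoint URL, the function names and the payload shape are all assumptions made up for this example, and the database is a plain dict. The only function that touches the network is a few lines long, so it is the only thing a test needs to replace.

```python
import json
import urllib.request
from typing import Callable
from unittest import mock

EMAIL_API_URL = "https://email.example.invalid/send"  # placeholder, not a real service


def post_welcome_email(payload: bytes) -> int:
    # The only function that touches the network: open the connection,
    # return the HTTP status, and let any error propagate to the caller.
    request = urllib.request.Request(EMAIL_API_URL, data=payload, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def create_user(
    db: dict,
    name: str,
    email: str,
    send: Callable[[bytes], int] = post_welcome_email,
) -> dict:
    if email in db:
        raise ValueError(f"user {email} already exists")
    user = {"name": name, "email": email}
    db[email] = user
    status = send(json.dumps({"to": email, "template": "welcome"}).encode())
    user["welcomed"] = status == 200
    return user


def test_create_user_sends_welcome_email() -> None:
    fake_send = mock.Mock(return_value=200)  # stands in for the HTTP call
    user = create_user({}, "Ada", "ada@example.com", send=fake_send)
    fake_send.assert_called_once()
    assert user["welcomed"] is True
```

This sketch passes the sender in as a parameter instead of patching a module attribute, which is one of several reasonable ways to swap in a mock. Either way, everything except the tiny network function runs for real in the test.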
Property-based testing
For something that provides such large benefits with so little code, property-based testing is badly underused. Pioneered by the Haskell library QuickCheck and adopted in other languages such as Scala (ScalaCheck) and Python (Hypothesis), property-based testing lets one generate a large number of inputs that conform to a given specification and assert that the test passes for every one of them.
Many property-based testing frameworks target functions, so it makes sense to isolate anything that may be subjected to property-based testing in a single function. I find this particularly useful when testing the encoding or decoding of data, such as JSON or msgpack parsing.
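The testing style described above is commonly called property-based testing. To show its shape without depending on any third-party library, the sketch below hand-rolls the idea using only Python's standard library: it generates many random records and asserts a round-trip property on each one. A real framework such as Hypothesis does considerably more (smarter generation, shrinking of failing inputs), so treat this as an illustration of the technique and not as a substitute. The encode and decode functions and the record shape are assumptions for the example.

```python
import json
import random
import string


def encode(record: dict) -> str:
    # The small, isolated function under test.
    return json.dumps(record, sort_keys=True)


def decode(text: str) -> dict:
    return json.loads(text)


def random_text(rng: random.Random) -> str:
    length = rng.randint(0, 12)
    return "".join(rng.choice(string.printable) for _ in range(length))


def random_record(rng: random.Random) -> dict:
    # String keys mapped to an int, a string, None or a bool.
    record = {}
    for _ in range(rng.randint(0, 5)):
        value = rng.choice(
            [rng.randint(-10**6, 10**6), random_text(rng), None, rng.random() < 0.5]
        )
        record[random_text(rng)] = value
    return record


def check_round_trip(trials: int = 500, seed: int = 0) -> int:
    # Property: decoding an encoded record gives back the original record.
    rng = random.Random(seed)
    for _ in range(trials):
        record = random_record(rng)
        assert decode(encode(record)) == record, record
    return trials
```

Because encode and decode are small, pure functions, the property can be checked against hundreds of generated inputs with no setup at all.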
Conclusion
The intention of this article is not to say that DRY or small functions are inherently bad (even though the title hints as much), only that they are not inherently good either.
The number of small functions in a code base, or the average function length, is not in itself a metric worth bragging about. There was a talk at PyCon 2016 called onelineizer, about a Python program of the same name that can convert any Python program (including itself) into a single line of code. Although this makes for a fun and compelling conference talk, it would be rather silly to write production code in the same way.
The advice above applies generally, not just to Go. As the complexity of the programs we write has increased greatly, and the constraints we work against have become ever more variable, programmers should adjust their thinking accordingly.
Unfortunately, orthodox programming thinking is still heavily influenced by the era in which object-oriented programming and design patterns reigned supreme. Many of the ideas and best practices that have been widely promoted so far have gone largely unchallenged for decades. They urgently need to be reconsidered, especially since the programming landscape and its paradigms have changed so much in recent years.
Clinging to the old ways is not only lazy, it also lulls programmers into a false sense of comfort that they can ill afford.