PS: These three hypotheses can also be related to everyday life and venture capital:
- Hypothesis one: The brain optimizes cost functions.
- Hypothesis two: Cost functions are diverse across brain areas and change over development.
- Hypothesis three: Specialized systems allow efficient solution of key computational problems.
mon1st
Links: https://zhuanlan.zhihu.com/p/23021318
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source.
Lead
Deep learning may have originated in neuroscience, but in recent years it has clearly developed into a field of its own, (almost) unrelated to neuroscience. Machine learning experts are interested in how to further refine their algorithms, while neuroscientists want to know how the human brain works, not how deep networks work.
This "brain circuit" image would be rejected by computer scientists and biologists alike: it is neither a real deep network architecture nor a description of how the brain works.
Konrad Kording is trying to change this trend by restarting the dialogue between neuroscience and machine learning. His paper with Adam Marblestone (MIT Media Lab) and Greg Wayne (Google DeepMind), "Toward an integration of deep learning and neuroscience", lays out this idea; it was posted to bioRxiv in June and published in Frontiers in Computational Neuroscience in September.
Some readers may have seen the earlier article "Can a neuroscientist understand a microprocessor? The theoretical dilemma of neuroscience in the age of big data", which also introduced Kording's work. If that article raised a sharp question (are the existing research tools of neuroscience adequate?), this one can be seen as one possible answer: borrow the ideas developed in deep learning to study the brain. The paper is very long and covers a lot of ground, so here I only introduce the general idea and skip many side branches for now, interesting as they are. Reading the original (open access) is highly recommended.
Three trends in modern machine learning
The author first points out three characteristics of modern machine learning:
- A focus on optimizing cost functions.
- Recent work introduces complex cost functions.
  - These include cost functions that are not uniform across space and time, and cost functions that are generated inside the network.
- Architectures in machine learning are becoming more and more diverse.
  - Newly developed structures include memory cells, "capsules", external memory, pointers, and hard-coded arithmetic operations.
Three hypotheses about how the brain works
After pointing out the above three machine learning features, the author proposes three hypotheses:
- Hypothesis one: The brain optimizes cost functions.
- Hypothesis two: Cost functions are diverse across brain areas and change over development.
- Hypothesis three: Specialized systems allow efficient solution of key computational problems.
A brief digression; feel free to skip this section.
Honestly, my reaction when I read this far back in June was:
What a rip-off!
It looked like yet another hollow, unfalsifiable flight of fancy. Hypothesis one: "cost function" is such a broad concept that finding a cost function for the activity of the nervous system is trivial. Hypothesis two is obviously a patch: if you cannot find a global cost function, call it local, and if it is unstable over time, blame development... Hypothesis three says nothing at all: who does not know that brain areas are specialized?
So I threw the paper aside. Three months later it actually passed peer review, so I took a look at the new version.
This time I jumped straight to the conclusion, and was suddenly hooked:
In other words, this framework could be viewed as proposing a kind of "society" of cost functions and trainable networks, permitting internal bootstrapping processes reminiscent of the Society of Mind (Minsky, 1988). In this view, intelligence is enabled by many computationally specialized structures, each trained with its own developmentally regulated cost function, where both the structures and the cost functions are themselves optimized by evolution like the hyperparameters in neural networks.
That is, the framework can be seen as a "society" of cost functions and trainable networks, which permits internal bootstrapping processes similar to those in Minsky's Society of Mind. In this view, intelligence is implemented by a number of specialized computational structures, each trained with its own cost function, which is in turn regulated by development; both the structures and the cost functions are optimized by evolution, much like hyperparameters.
Although it is just a rewording, it somehow felt very intuitive to me. Perhaps it was Minsky's name, but more likely it is because a cost function, in a biologist's terms, is a drive, and the diversity of drives is a question I have been thinking about recently.
End of digression; welcome back.
The figure above is the paper's only (hand-waving) illustration. Figure A is a typical architecture in conventional machine learning, with the red dashed lines as cost functions (entering the network in the form of errors). Figure B is a hypothetical neural network in the brain, where the cost function is computed inside the system from external input. In Figure C, several different brain regions are trained with different cost functions and interact with each other.
The brain can optimize cost functions
The first criticism facing any attempt to unify theories of machine learning and the nervous system must be: how could the nervous system possibly implement backpropagation?
This is a first-rate question, and the authors spend eight pages on it in one go. The core idea is:
(a) the brain has powerful mechanisms for credit assignment during learning that allow it to optimize global functions in multi-layer networks by adjusting the properties of each neuron to contribute to the global outcome, and that (b) the brain has mechanisms to specify exactly which cost functions it subjects its networks to, i.e., that cost functions are highly tunable, shaped by evolution and matched to the animal's ethological needs. Thus, the brain uses cost functions as a key driving force of its development, much as modern machine learning systems do.
(a) The brain has mechanisms powerful enough to solve the credit assignment problem. By adjusting the properties of each neuron in a multilayer network, the brain can optimize a global cost function.
(b) The brain has mechanisms for specifying exactly which cost functions its networks are subjected to, i.e., the cost functions are highly tunable, shaped by evolution and matched to the animal's own ecological needs.
As a result, the brain uses cost functions as a key driver of its development, just like today's machine learning systems.
The full text is too long-winded to go through in detail, so here I only mention some of the interesting models. (They are rather technical, so for now they have been moved to the end of the article.)
Machine learning-inspired neuroscience
A hypothesis, of course, should guide practice: is it possible to test the hypothesis that "there are a variety of cost functions in the brain that guide the learning of neural circuits"?
1. By guessing the cost function, you can predict the state of the network: the network should sit in the optimized state specified by that cost function.
2. Optimizing a cost function necessarily involves gradient descent in parameter space. In other words, there should be more movement along the gradient-descent direction than meaningless movement in orthogonal directions. If the weights of the neural network could be observed (I laughed out loud when I read this), we should find them moving down the gradient.
3. Following from 1, external perturbations will push the system away from its optimized state. By changing synaptic weights we can create a small perturbation and predict that the system will return to the same optimized state. This has started to become possible in the motor domain (via brain-machine interfaces, BMI).
4. If we know which cells and connections carry error signals, we can impose a user-defined cost function on the system by stimulating the specified connections. This would amount to treating the brain circuit in question as a trainable deep network and studying how it learns. At the other end, one can also feed in new information through a brain-machine interface and examine whether the resulting behavior follows optimization principles (Dadarlat et al., 2015).
5. Train artificial neural networks with hypothesized candidate cost functions and compare them with actual brain circuits to test the hypotheses (many people have already used this approach).
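Prediction 2 in the list above can be made concrete with a toy simulation. The sketch below is not from the paper: it assumes a hypothetical quadratic cost, a hypothetical "circuit" whose weights follow noisy gradient descent, and a control whose weights drift at random. It then measures how well each observed weight change lines up with the descent direction. All names (`alignment_with_descent`, `cost_grad`, `w_star`) are illustrative.

```python
import numpy as np

def alignment_with_descent(delta_w, grad):
    """Cosine between an observed weight change and the descent direction -grad."""
    return float(-(delta_w @ grad) / (np.linalg.norm(delta_w) * np.linalg.norm(grad)))

rng = np.random.default_rng(0)
dim = 50
A = rng.standard_normal((dim, dim))
H = A @ A.T / dim + np.eye(dim)     # positive-definite curvature of a toy quadratic cost
w_star = rng.standard_normal(dim)   # assumed optimum of the hypothesized cost function

def cost_grad(w):
    """Gradient of the toy cost 0.5 * (w - w_star)^T H (w - w_star)."""
    return H @ (w - w_star)

w = rng.standard_normal(dim)
lr, noise = 0.05, 0.01
learner, drifter = [], []
for _ in range(50):
    g = cost_grad(w)
    step = -lr * g + noise * rng.standard_normal(dim)   # noisy gradient step
    learner.append(alignment_with_descent(step, g))
    # Control: a weight change that ignores the cost function entirely
    drifter.append(alignment_with_descent(rng.standard_normal(dim), g))
    w = w + step

mean_learner = float(np.mean(learner))
mean_drifter = float(np.mean(drifter))
print(f"learning circuit: {mean_learner:.2f}, random drift: {mean_drifter:.2f}")
```

Under these assumptions the learning circuit shows a clearly positive average alignment while random drift averages close to zero, which is the kind of signature prediction 2 asks experimenters to look for. In a real experiment neither the cost function nor the weights are directly observable, so this only illustrates the logic of the test.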
Neuroscience-inspired machine learning
The authors regard the brain as an implicit machine learning mechanism produced by evolution. If so, the brain should be able to efficiently optimize multiple cost functions over many kinds of data. In fact, compared with existing machine learning systems, the brain's hardware is very slow (limited by the rate of biochemical reactions), and we know very little about how to optimize nonlinear, non-differentiable, time-dependent, spike-based systems with massive feedback connections. At the architecture level, the brain can learn from a small number of stimuli, across many different timescales, and through active learning. If the brain really is an instance of machine learning (specifically, if it really does solve the credit assignment problem in multilayer networks), then we stand to learn many useful optimization algorithms from it.
On the other hand, even if the brain does not use backpropagation, we will learn new techniques that do not rely on backpropagation.
The machine learning field has begun to study how to generate cost functions using networks (Watter et al., 2015). Examining how the brain generates and applies different cost functions over the course of development will help us design better cost functions and hierarchical behavior in machine learning.
The diversification of architectures now under way in machine learning can also benefit from the structural diversity of the brain.
The brain combines a jumble of specialized structures in a way that works. Solving this problem de novo in machine learning promises to be very difficult, making it attractive to be inspired by observations about how the brain does it.
That is, the brain combines a bunch of specialized structures in a way that works. Solving this problem from scratch in machine learning will be very difficult, which is why it is so appealing to look at how the brain does it.
Did evolution separate cost functions from optimization algorithms?
Deep learning is successful because it divides machine learning into two parts: (1) an algorithm, backpropagation, for efficient and distributed optimization, and (2) techniques for converting any given problem into a suitable cost function. In today's deep learning, most of the work goes into finding more appropriate cost functions.
Did the brain also find this approach through evolution? The authors think so: different cortical regions may share the same optimization algorithm (microcircuit structure) but receive different data and cost functions. In fact, the cost functions for developing cortical regions may themselves be delivered as inputs, just like the data.
Another possibility is that, within the cortical microcircuit, some of the connections and learning rules determine the optimization algorithm (fixed), while others determine the cost function (variable). This idea can be likened to an FPGA (admittedly, that is quite a leap of imagination).
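The division of labor described in this section, one shared optimizer and many interchangeable cost functions, is easy to see in code. The following is a minimal sketch under illustrative assumptions: plain gradient descent stands in for "the optimization algorithm", and two textbook cost functions (mean squared error and logistic loss) stand in for the cost functions that different regions might receive. The data and all names are hypothetical.

```python
import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=500):
    """Generic optimizer: it knows nothing about the task except a gradient function."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad_fn(w)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_w = np.array([1.5, -2.0, 0.5])

# Cost function 1: mean squared error on a regression target
y_reg = X @ true_w
def mse_grad(w):
    return 2.0 * X.T @ (X @ w - y_reg) / len(X)

# Cost function 2: logistic loss on a binary classification target
y_cls = (X @ true_w > 0).astype(float)
def logistic_grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y_cls) / len(X)

# The same optimizer is reused unchanged; only the cost function differs
w_reg = gradient_descent(mse_grad, np.zeros(3))
w_cls = gradient_descent(logistic_grad, np.zeros(3))
accuracy = float(np.mean((X @ w_cls > 0) == (y_cls == 1)))
print(w_reg, accuracy)
```

Here `gradient_descent` plays the role of the fixed microcircuit, and `mse_grad` and `logistic_grad` play the role of the variable part. Whether cortex is actually organized this way is exactly what the hypothesis leaves open.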
Conclusion
The paper's conclusion is very compelling; a rough translation follows.
Because of the complexity and variability of the brain, purely bottom-up analysis of neural data is hard to interpret. A theoretical framework can be used to constrain the hypothesis space, allowing researchers to first address high-level principles and system structure, and then "zoom in" to work out the details. Existing top-down theoretical frameworks include entropy maximization, efficient coding, faithful approximation of Bayesian inference, minimization of prediction error *, attractor dynamics, modularity, the ability to carry out symbolic operations, and so on (Pinker, 1999; Marcus, 2001; Bialek, 2002; Knill and Pouget, 2004; Bialek et al., 2006; Friston, 2010). Many of these top-down theories boil down to optimizing a single cost function for a single computational structure. We extend these hypotheses by proposing sets of cost functions that are diverse and change over development, together with multiple specialized subsystems.
Many neuroscientists focus on finding the "neural code", that is, which stimuli tend to drive the activity of specific neurons or brain regions. But if the brain does optimize cost functions, then we have to keep in mind that simple cost functions can produce complex stimulus responses. This may steer us toward a different kind of question. A deeper dialogue between neuroscience and machine learning can help clarify many problems. Much of machine learning is focused on faster end-to-end gradient descent in neural networks. Neuroscience may inspire machine learning at many levels. The optimization algorithms used by the brain have been refined by evolution over millions of years. The brain may have found ways to simplify learning by using heterogeneous sets of cost functions that interact over development to guide the outcome of unsupervised learning. The various specialized structures that evolved in the brain may suggest how to improve the efficiency of learning systems that face multiple computational problems across multiple timescales. By seeking insights from neuroscience, machine learning may move toward a strong artificial intelligence that can learn in a structurally heterogeneous world with only a limited amount of labeled data.
In a sense, our hypothesis runs contrary to popular theories: it posits no single optimization mechanism, no single cost function, no single form of representation, and no homogeneous structure. All of these heterogeneous elements are unified by the principle of optimizing internally generated cost functions. Many early approaches to AI also rejected a single unified theory. For example, Minsky and Papert's work on the Society of Mind, and more broadly their ideas about the development of interconnected systems through genetic staging and internal bootstrapping, emphasized that intelligence requires a system of internal monitors and critics, specialized communication and storage mechanisms, and a hierarchical organization of simple control systems.
At the time of that early work, it was not yet known that gradient-based optimization could give rise to powerful feature representations and behavioral policies. The theory presented here can be seen as restating the heterogeneous approach in terms of today's popular end-to-end optimization. In other words, the framework can be seen as a "society" of cost functions and trainable networks, which permits internal bootstrapping processes similar to those in Minsky's Society of Mind. In this view, intelligence is implemented by a number of specialized computational structures, each trained with its own cost function, which is in turn regulated by development; both the structures and the cost functions are optimized by evolution, much like hyperparameters.
--- Technical details moved to the end ---
The brain can optimize cost functions
2.1. Local self-organization and optimization without multi-layer credit assignment
Pehlevan and Chklovskii (2015) suggest that a class of Hebbian plasticity can be seen as extracting the principal components (PCs) of the input, thereby minimizing reconstruction error.
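To show how a purely local Hebbian rule can end up extracting a principal component, here is a minimal sketch using Oja's rule, a classic single-neuron example chosen for illustration. It is not the Hebbian/anti-Hebbian network derived by Pehlevan and Chklovskii, and the input covariance, learning rate `eta`, and sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Zero-mean 2-D inputs with most of their variance along the direction (1, 1)
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
X = rng.multivariate_normal(np.zeros(2), cov, size=5000)

w = rng.standard_normal(2)
w /= np.linalg.norm(w)
eta = 0.01
for x in X:
    y = w @ x                     # postsynaptic activity
    w += eta * y * (x - y * w)    # Hebbian term y*x plus a local decay term -y^2*w

# Compare the learned weights with the leading eigenvector of the sample covariance
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
pc1 = eigvecs[:, -1]              # eigh sorts eigenvalues in ascending order
overlap = float(abs(w @ pc1) / np.linalg.norm(w))
print(f"overlap with first principal component: {overlap:.3f}")
```

Each update uses only the presynaptic input `x`, the postsynaptic output `y`, and the synapse's own weight, so no error signal has to be propagated across layers. The weight vector nevertheless converges toward the first principal component, the direction that minimizes the reconstruction error of a one-dimensional linear code.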
2.2. Biological implementation of optimization
2.2.1. The need for efficient gradient descent in multi-layer networks
The importance of gradient descent is well known, so I will not say much about it here. There is a separate topic devoted specifically to gradient descent.
2.2.2. Biologically plausible approximations of gradient descent
There are surprisingly many mechanisms in the brain that could be used to approximate gradient descent. What they have in common is the use of feedback connections to propagate errors. One example is O'Reilly's XCAL algorithm (O'Reilly et al., 2012), which implements backpropagation of errors through local Hebbian learning rules.
Another possible route to backpropagation is based on spike-timing-dependent plasticity (STDP). Hinton has pointed out that neurons could encode the error derivatives required for backpropagation in the temporal derivative of their firing rate (Hinton, 2007, 2016).
Another possible mechanism involves random feedback connections whose strengths are independent of the feedforward connection strengths. In a model known as "feedback alignment", error computation almost as good as backpropagation can be achieved through synaptic normalization and sign concordance between feedforward and feedback connections (Liao et al., 2015).
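The idea of sending errors back through feedback weights that only share the sign of the feedforward weights can be sketched in a few lines. This is an illustrative toy, not the setup used by Liao et al.: the network size, the linear teacher, the learning rate, and the omission of any normalization step are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, n_samples = 10, 20, 5, 256

X = rng.standard_normal((n_samples, n_in))
teacher = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
Y = X @ teacher                                    # hypothetical target mapping

W1 = rng.standard_normal((n_in, n_hid)) / np.sqrt(n_in)
W2 = rng.standard_normal((n_hid, n_out)) / np.sqrt(n_hid)
# Fixed random feedback strengths, unrelated to the magnitudes in W2
feedback_magnitude = rng.uniform(0.0, 0.5, size=(n_out, n_hid))

def loss():
    return float(np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2))

initial_loss = loss()
lr = 0.05
for _ in range(2000):
    H = np.tanh(X @ W1)                            # hidden activity
    err = H @ W2 - Y                               # output error, shape (samples, n_out)
    B = feedback_magnitude * np.sign(W2.T)         # same sign as W2.T, different strength
    delta_h = (err @ B) * (1.0 - H ** 2)           # exact backprop would use err @ W2.T
    W2 -= lr * H.T @ err / n_samples               # output layer: ordinary delta rule
    W1 -= lr * X.T @ delta_h / n_samples           # hidden layer: error sent through B
final_loss = loss()
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The only departure from backpropagation is the line that builds `B`: the hidden layer never sees the exact transpose of `W2`, only its signs. In this toy setting the loss should still fall, which is the qualitative point of the paragraph above. It does not show that such a scheme matches backpropagation on hard tasks.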
2.2.2.1. Temporal credit assignment
An important issue left unresolved in the discussion above is temporal credit assignment: in recurrent networks (recurrent nets), machine learning implements backpropagation through time (BPTT) by unrolling the network in time. The nervous system apparently cannot unroll its own activity in time in order to backpropagate through it.
The authors offer some ideas for a solution. One is to use memory to turn the temporal problem into a spatial one (e.g. Weston et al., 2014).
Another approach comes from work on supervised learning in recurrent networks. In the FORCE model proposed by Sussillo and Abbott (2009), the output of the network is clamped to the specified target, and random fluctuations generated within the network provide the feedback signals used to update the weights.
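To make clear what "unrolling in time" demands, here is a minimal sketch of BPTT for a one-unit recurrent network, checked against a finite-difference gradient. The scalar network, the inputs, and the target are hypothetical choices made only to keep the example short.

```python
import numpy as np

def bptt_grad(w, xs, target):
    """Gradient of 0.5 * (h_T - target)^2 with respect to the recurrent weight w,
    for h_t = tanh(w * h_{t-1} + x_t) with h_0 = 0."""
    hs = [0.0]
    for x in xs:                       # forward pass: unroll and store every state
        hs.append(np.tanh(w * hs[-1] + x))
    grad_w = 0.0
    dL_dh = hs[-1] - target            # error at the final time step
    for t in range(len(xs), 0, -1):    # backward pass: visit the time steps in reverse
        dpre = dL_dh * (1.0 - hs[t] ** 2)
        grad_w += dpre * hs[t - 1]     # the same weight w is shared by every time step
        dL_dh = dpre * w               # send the error one step further back in time
    return grad_w

def loss(w, xs, target):
    h = 0.0
    for x in xs:
        h = np.tanh(w * h + x)
    return 0.5 * (h - target) ** 2

xs = [0.5, -0.3, 0.8, 0.1]
w, target, eps = 0.7, 0.2, 1e-6
analytic = bptt_grad(w, xs, target)
numeric = (loss(w + eps, xs, target) - loss(w - eps, xs, target)) / (2 * eps)
print(analytic, numeric)
```

The backward loop needs the whole stored history `hs` and walks through it in reverse order. That requirement is exactly what seems implausible for a biological circuit, and it is what the memory-based and FORCE-style alternatives above try to avoid.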
2.2.2.2. Spiking networks
2.3. Other principles for biological learning
Clearly, even if the brain does use an optimization algorithm similar to backpropagation, that does not rule out its using other, completely different algorithms as well.
2.3.1. Exploiting biological neural mechanisms
In particular, when we look at the structure of individual neurons, we find (all of this is well known) that: dendrites can perform local computations; neurons contain multiple compartments; each neuron can be seen as a small local network; when a neuron fires an action potential, the signal that propagates backward (to the dendrites) is transmitted more strongly to recently active branches, which may simplify the credit assignment problem (Körding and König, 2000); and so on.
An important feature of biological neural networks is neuromodulation: depending on the neuromodulatory state, the same neural network can be viewed as switching between multiple overlapping circuits (Bargmann, 2012; Bargmann and Marder, 2013). This may allow learned weights to be shared between different circuits.
2.3.2. Learning in the cortical sheet
The six-layer structure of the cortex is striking, and several learning theories attempt to explain this endlessly repeated structure. It is generally believed that the cortex performs unsupervised learning through prediction (O'Reilly et al., 2014b; Brea et al., 2016). This line of work includes efforts to map cortical structure directly onto message passing in Bayesian inference (Lee and Mumford, 2003; Dean, 2005; George and Hawkins, 2009), while others try to explain observed cortical activity using learning theory.
These and other preliminary theories of how the cortex works go beyond backpropagation.
---
* On this theory, see Zhao Shi's article: "The brain is constantly 'predicting' the world"
Original text:
Marblestone, A. H., Wayne, G. & Kording, K. Toward an integration of deep learning and neuroscience. Front. Comput. Neurosci. 10, 1–61 (2016).
Frontiers | Toward an integration of deep learning and neuroscience
Other references
Bargmann, C. I. (2012). Beyond the connectome: how neuromodulators shape neural circuits. BioEssays 34, 458–465. doi:10.1002/bies.201100185
Bargmann, C. I., and Marder, E. (2013). From the connectome to brain function. Nat. Methods 10, 483–490. doi:10.1038/nmeth.2451
Bialek, W. (2002). "Thinking about the brain," in Physics of Bio-molecules and Cells, Vol, eds F. Flyvbjerg, F. Jülicher, P. Ormos, and F. David (Berlin; Heidelberg: Springer), 485–578.
Bialek, W., De Ruyter Van Steveninck, R., and Tishby, N. (2006). "Efficient representation as a design principle for neural coding and computation," in 2006 IEEE International Symposium on Information Theory (Los Alamitos: IEEE), 659–663.
Brea, J., Gaál, A. T., Urbanczik, R., and Senn, W. (2016). Prospective coding by spiking neurons. PLoS Comput. Biol. 12:e1005003. doi:10.1371/journal.pcbi.1005003
Dadarlat, M. C., O'Doherty, J. E., and Sabes, P. N. (2015). A learning-based approach to artificial sensory feedback leads to optimal integration. Nat. Neurosci. 18, 138–144. doi:10.1038/nn.3883
Dean, T. (2005). "A computational model of the cerebral cortex," in Proceedings of the 20th National Conference on Artificial Intelligence (Pittsburgh, PA).
Enel, P., Procyk, E., Quilodran, R., and Dominey, P. F. (2016). Reservoir computing properties of neural dynamics in prefrontal cortex. PLoS Comput. Biol. 12:e1004967. doi:10.1371/journal.pcbi.1004967
Friston, K. (2010). The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138. doi:10.1038/nrn2787
George, D., and Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Comput. Biol. 5:e1000532. doi:10.1371/journal.pcbi.1000532
Hinton, G. (2007). "How to do backpropagation in a brain," in Invited Talk at the NIPS'2007 Deep Learning Workshop (Vancouver, BC).
Hinton, G. (2016). "Can the brain do back-propagation?," in Invited Talk at Stanford University Colloquium on Computer Systems (Stanford, CA).
Knill, D., and Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719. doi:10.1016/j.tins.2004.10.007
Körding, K., and König, P. (2000). A learning rule for dynamic recruitment and decorrelation. Neural Netw. 13, 1–9. doi:10.1016/s0893-6080 (00088-x)
Lee, T. S., and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 20, 1434–1448. doi:10.1364/josaa.20.001434
Liao, Q., Leibo, J. Z., and Poggio, T. (2015). How important is weight symmetry in backpropagation? arXiv:1510.05067.
Marcus, G. (2001). The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT Press.
O'Reilly, R. C., Wyatte, D., and Rohrlich, J. (2014b). Learning through time in the thalamocortical loops. arXiv:1407.3432, 37.
Pehlevan, C., and Chklovskii, D. B. (2015). "Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening," in 53rd Annual Allerton Conference on Communication, Control, and Computing (Monticello, IL), 1458–1465.
Pinker, S. (1999). How the mind works. Ann. N.Y. Acad. Sci. 882, 119–127.
Sussillo, D., and Abbott, L. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron 63, 544–557. doi:10.1016/j.neuron.2009.07.018.
Watter, M., Springenberg, J., Boedecker, J., and Riedmiller, M. (2015). "Embed to control: a locally linear latent dynamics model for control from raw images," in Advances in Neural Information Processing Systems (Montreal, QC), 2728–2736.
Weston, J., Chopra, S., and Bordes, A. (2014). Memory networks. arXiv:1410.3916
[Repost] Machine learning and neuroscience: is your brain also doing deep learning?