In 2007, some of the leading thinkers behind deep neural networks organized an unofficial “satellite” meeting at the margins of a prestigious annual conference on artificial intelligence. The conference had rejected their request for an official workshop; deep neural nets were still a few years away from taking over AI. The bootleg meeting’s final speaker was Geoffrey Hinton of the University of Toronto, the cognitive psychologist and computer scientist responsible for some of the biggest breakthroughs in deep nets. He started with a quip: “So, about a year ago, I came home to dinner, and I said, ‘I think I finally figured out how the brain works,’ and my 15-year-old daughter said, ‘Oh, Daddy, not again.’”
The audience laughed. Hinton continued, “So, here’s how it works.” More laughter ensued.
Hinton’s jokes belied a serious pursuit: using AI to understand the brain. Today, deep nets rule AI in part because of an algorithm called backpropagation, or backprop. The algorithm enables deep nets to learn from data, endowing them with the ability to classify images, recognize speech, translate languages, make sense of road conditions for self-driving cars, and accomplish a host of other tasks.
But real brains are highly unlikely to be relying on the same algorithm. It’s not just that “brains are able to generalize and learn better and faster than the state-of-the-art AI systems,” said Yoshua Bengio, a computer scientist at the University of Montreal, the scientific director of the Quebec Artificial Intelligence Institute and one of the organizers of the 2007 workshop. For a variety of reasons, backpropagation isn’t compatible with the brain’s anatomy and physiology, particularly in the cortex.
Bengio and many others inspired by Hinton have been thinking about more biologically plausible learning mechanisms that might at least match the success of backpropagation. Three of them have shown particular promise: feedback alignment, equilibrium propagation and predictive coding. Some researchers are also incorporating the properties of certain types of cortical neurons and processes such as attention into their models. All these efforts are bringing us closer to understanding the algorithms that may be at work in the brain.
“The brain is a huge mystery. There’s a general impression that if we can unlock some of its principles, it might be helpful for AI,” said Bengio. “But it also has value in its own right.”
Learning Through Backpropagation
For decades, neuroscientists’ theories about how brains learn were guided primarily by a rule introduced in 1949 by the Canadian psychologist Donald Hebb, which is often paraphrased as “Neurons that fire together, wire together.” That is, the more correlated the activity of adjacent neurons, the stronger the synaptic connections between them. This principle, with some modifications, was successful at explaining certain limited types of learning and visual classification tasks.
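Hebb’s rule can be stated in a line of code. The sketch below is a minimal illustration; the learning rate and activity values are invented for the example, not taken from Hebb.

```python
# Hebbian learning sketch: the synaptic weight between two neurons grows
# in proportion to the co-occurrence of their activity.
# `lr` (learning rate) is an illustrative constant, not from the article.
def hebbian_update(w, pre, post, lr=0.1):
    """Strengthen the weight when presynaptic and postsynaptic neurons fire together."""
    return w + lr * pre * post

w = 0.5
# Both neurons active: "fire together, wire together" -> the weight grows.
w = hebbian_update(w, pre=1.0, post=1.0)
# One neuron silent: no correlation, so the weight is left unchanged.
w = hebbian_update(w, pre=0.0, post=1.0)
```

Note what is missing: nothing in this update refers to an error, which is exactly the limitation the next paragraph describes.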
But it worked far less well for large networks of neurons that had to learn from mistakes; there was no directly targeted way for neurons deep within the network to learn about discovered errors, update themselves and make fewer mistakes. “The Hebbian rule is a very narrow, particular and not very sensitive way of using error information,” said Daniel Yamins, a computational neuroscientist and computer scientist at Stanford University.
Nevertheless, it was the best learning rule that neuroscientists had, and even before it dominated neuroscience, it inspired the development of the first artificial neural networks in the late 1950s. Each artificial neuron in these networks receives multiple inputs and produces an output, like its biological counterpart. The neuron multiplies each input by a so-called “synaptic” weight (a number signifying the importance assigned to that input) and then sums up the weighted inputs. This sum is the neuron’s output. By the 1960s, it was clear that such neurons could be organized into a network with an input layer and an output layer, and the artificial neural network could be trained to solve a certain class of simple problems. During training, a neural network settled on the best weights for its neurons to eliminate or minimize errors.
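The weighted-sum computation of a single artificial neuron is compact enough to write out directly. The inputs and weights below are arbitrary numbers chosen for illustration:

```python
# One artificial neuron: multiply each input by its synaptic weight, then sum.
def neuron_output(inputs, weights):
    return sum(x * w for x, w in zip(inputs, weights))

# Three inputs with illustrative weights; the result is approximately
# 0.2 + 0.4 - 0.2 = 0.4 (up to floating-point rounding).
out = neuron_output([1.0, 0.5, -2.0], [0.2, 0.8, 0.1])
```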
However, it was obvious even in the 1960s that solving more complicated problems required one or more “hidden” layers of neurons sandwiched between the input and output layers. No one knew how to effectively train artificial neural networks with hidden layers until 1986, when Hinton, the late David Rumelhart and Ronald Williams (now of Northeastern University) published the backpropagation algorithm.
The algorithm works in two phases. In the “forward” phase, when the network is given an input, it infers an output, which may be erroneous. The second, “backward” phase updates the synaptic weights, bringing the output more in line with a target value.
To understand this process, think of a “loss function” that describes the difference between the inferred and desired outputs as a landscape of hills and valleys. When a network makes an inference with a given set of synaptic weights, it ends up at some location on the loss landscape. To learn, it needs to move down the slope, or gradient, toward some valley, where the loss is minimized to the extent possible. Backpropagation is a method for updating the synaptic weights to descend that gradient.
In essence, the algorithm’s backward phase calculates how much each neuron’s synaptic weights contribute to the error and then updates those weights to improve the network’s performance. This calculation proceeds sequentially backward from the output layer to the input layer, hence the name backpropagation. Do this over and over for sets of inputs and desired outputs, and you’ll eventually arrive at an acceptable set of weights for the entire neural network.
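The two phases can be sketched end to end on a tiny network with one hidden layer. This is a toy illustration of the idea, not anyone’s production implementation: the task (XOR), network sizes, random seed and learning rate are all arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # inputs
y = np.array([[0.], [1.], [1.], [0.]])                  # desired outputs (XOR)

W1 = rng.normal(size=(2, 4))   # input -> hidden synaptic weights
W2 = rng.normal(size=(4, 1))   # hidden -> output synaptic weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss(out):
    return float(np.mean((out - y) ** 2))

initial_loss = loss(sigmoid(sigmoid(X @ W1) @ W2))

for _ in range(5000):
    # Forward phase: infer an output, which may be erroneous.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward phase: compute each layer's contribution to the error,
    # working sequentially from the output layer back toward the input layer.
    delta_out = (out - y) * out * (1 - out)
    delta_h = (delta_out @ W2.T) * h * (1 - h)
    # Update the weights to descend the gradient of the loss landscape.
    W2 -= 0.5 * h.T @ delta_out
    W1 -= 0.5 * X.T @ delta_h

final_loss = loss(out)   # lower than initial_loss: the network has descended
```

Note that `delta_h` is computed from `W2.T`: the hidden layer’s update needs the output layer’s weights, which is precisely the “weight transport” step the article returns to below.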
Impossible for the Brain
The invention of backpropagation immediately elicited an outcry from some neuroscientists, who said it could never work in real brains. The most notable naysayer was Francis Crick, the Nobel Prize-winning co-discoverer of the structure of DNA who later became a neuroscientist. In 1989 Crick wrote, “As far as the learning process is concerned, it is unlikely that the brain actually uses back propagation.”
Backprop is considered biologically implausible for several major reasons. The first is that while computers can easily implement the algorithm in two phases, doing so for biological neural networks isn’t trivial. The second is what computational neuroscientists call the weight transport problem: The backprop algorithm copies or “transports” information about all the synaptic weights involved in an inference and updates those weights for more accuracy. But in a biological network, neurons see only the outputs of other neurons, not the synaptic weights or internal processes that shape that output. From a neuron’s point of view, “it’s OK to know your own synaptic weights,” said Yamins. “What’s not okay is for you to know some other neuron’s set of synaptic weights.”
Any biologically plausible learning rule also needs to abide by the limitation that neurons can access information only from neighboring neurons; backprop may require information from more remote neurons. So “if you take backprop to the letter, it seems impossible for brains to compute,” said Bengio.
Nonetheless, Hinton and a few others immediately took up the challenge of working on biologically plausible variations of backpropagation. “The first paper arguing that brains do [something like] backpropagation is about as old as backpropagation,” said Konrad Kording, a computational neuroscientist at the University of Pennsylvania. Over the past decade or so, as the successes of artificial neural networks have led them to dominate artificial intelligence research, the efforts to find a biological equivalent for backprop have intensified.
Staying More Lifelike
Take, for example, one of the strangest solutions to the weight transport problem, courtesy of Timothy Lillicrap of DeepMind in London and his colleagues in 2016. Their algorithm, instead of relying on a matrix of weights recorded from the forward pass, used a matrix initialized with random values for the backward pass. Once assigned, these values never change, so no weights need to be transported for each backward pass.
To almost everyone’s surprise, the network learned. Because the forward weights used for inference are updated with each backward pass, the network still descends the gradient of the loss function, but by a different path. The forward weights slowly align themselves with the randomly selected backward weights to eventually yield the correct answers, giving the algorithm its name: feedback alignment.
“It turns out that, actually, that doesn’t work as bad as you might think it does,” said Yamins, at least for simple problems. For large-scale problems and for deeper networks with more hidden layers, feedback alignment doesn’t do as well as backprop: Because the updates to the forward weights are less accurate on each pass than they would be from truly backpropagated information, it takes much more data to train the network.
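The change relative to plain backprop is a one-line substitution. In the sketch below (same toy XOR setup as before, with arbitrary sizes and seed), the backward pass uses a fixed random matrix `B` where backprop would transport the forward weights `W2`:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 1))
B = rng.normal(size=(4, 1))    # fixed random feedback weights; never updated
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

initial_loss = float(np.mean((sigmoid(sigmoid(X @ W1) @ W2) - y) ** 2))

for _ in range(5000):
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    delta_out = (out - y) * out * (1 - out)
    # Backprop would use W2.T here; feedback alignment substitutes the
    # fixed random matrix B, so no weight transport is needed.
    delta_h = (delta_out @ B.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ delta_out
    W1 -= 0.5 * X.T @ delta_h

final_loss = float(np.mean((out - y) ** 2))   # lower than initial_loss
```

Because the forward weights gradually align with `B`, the hidden-layer updates become increasingly useful even though `B` carries no information about the forward pass.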
Researchers have also explored ways of matching the performance of backprop while maintaining the classic Hebbian learning requirement that neurons respond only to their local neighbors. Backprop can be thought of as one set of neurons doing the inference and another set of neurons doing the computations for updating the synaptic weights. Hinton’s idea was to work on algorithms in which each neuron was doing both sets of computations. “That was basically what Geoff’s talk was [about] in 2007,” said Bengio.
Building on Hinton’s work, Bengio’s team proposed a learning rule in 2017 that requires a neural network with recurrent connections (that is, if neuron A activates neuron B, then neuron B in turn activates neuron A). If such a network is given some input, it sets the network reverberating, as each neuron responds to the push and pull of its immediate neighbors.
Eventually, the network reaches a state in which the neurons are in equilibrium with the input and each other, and it produces an output, which can be erroneous. The algorithm then nudges the output neurons toward the desired result. This sets another signal propagating backward through the network, setting off similar dynamics. The network finds a new equilibrium.
“The beauty of the math is that if you compare these two configurations, before the nudging and after nudging, you’ve got all the information you need to find the gradient,” said Bengio. Training the network involves simply repeating this process of “equilibrium propagation” iteratively over lots of labeled data.
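The two phases can be illustrated on a tiny recurrent network with symmetric weights. This is a simplified sketch, not Bengio’s exact 2017 formulation: the network size, dynamics, nudging strength and update constant are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4                                   # neurons; the last one is the "output"
W = rng.normal(scale=0.1, size=(n, n))
W = (W + W.T) / 2                       # recurrent: A drives B and B drives A
np.fill_diagonal(W, 0)

def settle(s, x, beta=0.0, target=0.0, steps=200, dt=0.1):
    """Let the network reverberate until the neurons reach equilibrium.
    With beta > 0, the output neuron is nudged toward the target."""
    for _ in range(steps):
        ds = -s + np.tanh(W @ s + x)
        ds[-1] += beta * (target - s[-1])
        s = s + dt * ds
    return s

x = np.array([1.0, -1.0, 0.0, 0.0])                  # external input drive
s_free = settle(np.zeros(n), x)                       # phase 1: free equilibrium
s_nudged = settle(s_free, x, beta=0.5, target=1.0)    # phase 2: nudged equilibrium

# Comparing the two equilibrium configurations yields the weight update:
# a contrastive, Hebbian-style difference of activity correlations.
dW = (np.outer(s_nudged, s_nudged) - np.outer(s_free, s_free)) / 0.5
```

The key property matches the quote above: each neuron’s update depends only on its own activity in the two phases, yet the contrast between them carries the gradient information.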
The constraint that neurons can learn only by reacting to their local environment also finds expression in new theories of how the brain perceives. Beren Millidge, a doctoral student at the University of Edinburgh and a visiting fellow at the University of Sussex, and his colleagues have been reconciling this new view of perception, called predictive coding, with the requirements of backpropagation. “Predictive coding, if it’s set up in a certain way, will give you a biologically plausible learning rule,” said Millidge.
Predictive coding posits that the brain is constantly making predictions about the causes of sensory inputs. The process involves hierarchical layers of neural processing. To produce a certain output, each layer has to predict the neural activity of the layer below. If the highest layer expects to see a face, it predicts the activity of the layer below that can justify this perception. The layer below makes similar predictions about what to expect from the one beneath it, and so on. The lowest layer makes predictions about actual sensory input (say, the photons falling on the retina). In this way, predictions flow from the higher layers down to the lower layers.
But errors can occur at each level of the hierarchy: differences between the prediction that a layer makes about the input it expects and the actual input. The bottommost layer adjusts its synaptic weights to minimize its error, based on the sensory information it receives. This adjustment results in an error between the newly updated lowest layer and the one above, so the higher layer has to readjust its synaptic weights to minimize its prediction error. These error signals ripple upward. The network goes back and forth, until each layer has minimized its prediction error.
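The iterative error minimization can be sketched for a single stage of such a hierarchy. Everything here is illustrative: a higher-level “cause” `z` predicts the sensory input through fixed generative weights `W`, and repeated small updates shrink the prediction error.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 2))     # generative weights: cause -> sensory input
z_true = np.array([1.0, -0.5])  # the hidden cause that generated the input
x = W @ z_true                  # sensory input (e.g., photons on the retina)

z = np.zeros(2)                 # the network's initial guess at the cause
initial_error = np.linalg.norm(x - W @ z)

for _ in range(200):            # many small back-and-forth iterations
    err = x - W @ z             # prediction error at the sensory layer
    z += 0.05 * W.T @ err       # nudge the inferred cause to reduce the error

final_error = np.linalg.norm(x - W @ z)   # smaller than initial_error
```

The loop makes the cost of predictive coding concrete: where backprop’s backward phase is a single pass, this inference has to iterate many times before the error settles, which is exactly the plausibility question raised below.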
Millidge has shown that, with the proper setup, predictive coding networks can converge on much the same learning gradients as backprop. “You can get really, really, really close to the backprop gradients,” he said.
However, for every backward pass that a traditional backprop algorithm makes in a deep neural network, a predictive coding network has to iterate multiple times. Whether or not this is biologically plausible depends on exactly how long this might take in a real brain. Crucially, the network has to converge on a solution before the inputs from the outside world change.
“It can’t be like, ‘I’ve got a tiger leaping at me, let me do 100 iterations back and forth, up and down my brain,’” said Millidge. Still, if some inaccuracy is acceptable, predictive coding can arrive at generally useful answers quickly, he said.
Some scientists have taken on the nitty-gritty task of building backprop-like models based on the known properties of individual neurons. Standard neurons have dendrites that collect information from the axons of other neurons. The dendrites transmit signals to the neuron’s cell body, where the signals are integrated. That may or may not result in a spike, or action potential, going out on the neuron’s axon to the dendrites of post-synaptic neurons.
But not all neurons have exactly this structure. In particular, pyramidal neurons, the most abundant type of neuron in the cortex, are distinctly different. Pyramidal neurons have a treelike structure with two distinct sets of dendrites. The trunk reaches up and branches into what are called apical dendrites. The root reaches down and branches into basal dendrites.
Models developed independently by Kording in 2001, and more recently by Blake Richards of McGill University and the Quebec Artificial Intelligence Institute and his colleagues, have shown that pyramidal neurons could form the basic units of a deep learning network by doing both forward and backward computations simultaneously. The key is in the separation of the signals entering the neuron for forward-going inference and for backward-flowing errors, which could be handled in the model by the basal and apical dendrites, respectively. Information for both signals can be encoded in the spikes of electrical activity that the neuron sends down its axon as an output.
In the latest work from Richards’ team, “we’ve gotten to the point where we can show that, using fairly realistic simulations of neurons, you can train networks of pyramidal neurons to do various tasks,” said Richards. “And then using slightly more abstract versions of these models, we can get networks of pyramidal neurons to learn the sort of difficult tasks that people do in machine learning.”
The Role of Attention
An implicit requirement for a deep net that uses backprop is the presence of a “teacher”: something that can calculate the error made by a network of neurons. But “there is no teacher in the brain that tells every neuron in the motor cortex, ‘You should be switched on and you should be switched off,’” said Pieter Roelfsema of the Netherlands Institute for Neuroscience in Amsterdam.
Roelfsema thinks the brain’s solution to the problem is in the process of attention. In the late 1990s, he and his colleagues showed that when monkeys fix their gaze on an object, neurons that represent that object in the cortex become more active. The monkey’s act of focusing its attention produces a feedback signal for the responsible neurons. “It is a highly selective feedback signal,” said Roelfsema. “It’s not an error signal. It is just saying to all those neurons: You’re going to be held responsible [for an action].”
Roelfsema’s insight was that this feedback signal could enable backprop-like learning when combined with processes revealed in certain other neuroscientific findings. For example, Wolfram Schultz of the University of Cambridge and others have shown that when animals perform an action that yields better results than expected, the brain’s dopamine system is activated. “It floods the whole brain with neural modulators,” said Roelfsema. The dopamine levels act like a global reinforcement signal.
In theory, the attentional feedback signal could prime only those neurons responsible for an action to respond to the global reinforcement signal by updating their synaptic weights, said Roelfsema. He and his colleagues have used this idea to build a deep neural network and study its mathematical properties. “It turns out you get error backpropagation. You get basically the same equation,” he said. “But now it became biologically plausible.”
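The gating idea can be sketched in a single update rule. This is an illustration of the general mechanism, not Roelfsema’s exact model: the function name, the 0/1 attention mask and the learning rate are all invented for the example.

```python
import numpy as np

# Illustrative attention-gated, reward-modulated update: only the weights
# of neurons "held responsible" for an action (attention mask = 1) respond
# to the global, dopamine-like reinforcement signal.
def attention_gated_update(W, pre, post, attention, reward_delta, lr=0.1):
    """W: weights onto a layer; attention: 0/1 mask over that layer's neurons;
    reward_delta: global scalar (outcome better or worse than expected)."""
    return W + lr * reward_delta * np.outer(attention * post, pre)

W = np.zeros((3, 2))
pre = np.array([1.0, 0.5])               # activity of the upstream neurons
post = np.array([0.8, 0.2, 0.9])         # activity of this layer's neurons
attention = np.array([1.0, 0.0, 1.0])    # neurons 0 and 2 drove the action
W_new = attention_gated_update(W, pre, post, attention, reward_delta=1.0)
# Rows 0 and 2 change; the unattended neuron's weights stay untouched.
```

The update is still Hebbian in flavor (a product of pre- and postsynaptic activity), but the attention mask and global reward signal together steer it toward the same corrections backprop would make.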
The team presented this work at the Neural Information Processing Systems online conference in December. “We can train deep networks,” said Roelfsema. “It’s only a factor of two to three slower than backpropagation.” As such, he said, “it beats all the other algorithms that have been proposed to be biologically plausible.”
Nevertheless, concrete empirical evidence that living brains use these plausible mechanisms remains elusive. “I think we’re still missing something,” said Bengio. “In my experience, it could be a little thing, maybe a few twists to one of the existing methods, that’s going to really make a difference.”
Meanwhile, Yamins and his colleagues at Stanford have suggestions for how to determine which, if any, of the proposed learning rules is the correct one. By analyzing 1,056 artificial neural networks implementing different models of learning, they found that the type of learning rule governing a network can be identified from the activity of a subset of neurons over time. It’s possible that such information could be recorded from monkey brains. “It turns out that if you have the right collection of observables, it might be possible to come up with a fairly simple scheme that would allow you to identify learning rules,” said Yamins.
Given such advances, computational neuroscientists are quietly optimistic. “There are a lot of different ways the brain could be doing backpropagation,” said Kording. “And evolution is pretty damn awesome. Backpropagation is useful. I presume that evolution kind of gets us there.”
Original content at: www.quantamagazine.org…
Author: Anil Ananthaswamy