The BRAIN Initiative: Surviving the Data Deluge

by Alan S. Brown

Mapping brain activity will produce nearly as much data as the Large Hadron Collider, yet managing the sheer volume of information will be the simplest challenge for brain data managers.

By mapping the activity of neurons in the brain, an important aspect of the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, researchers hope to discover fundamental insights into how the mind develops and functions, as well as new ways to address horrific brain diseases and trauma.

Yet before researchers can even begin taking their first measurements, they must face up to an unusual challenge: deciding how to handle the torrent of data this project is expected to generate.

The scope of the problem is startling. Measuring just a fraction of the neurons in the brain of a single mouse could generate nearly as much data as the 17-mile-long Large Hadron Collider or the most advanced astronomical observatories produce.

This poses unprecedented challenges to brain researchers. These include developing universally accepted ways to label and organize data; deciding how to share expensive, one-of-a-kind measurement tools; and scaling up analytical and visualization software to handle massive amounts of information.

The Kavli Foundation's Kavli Futures Symposium, "Data Deluge from the Brain Activity Map," brought together 18 leading brain and big data experts from academia, industry, and government to sort through these issues.

Terrence Sejnowski, director of the Salk Institute for Biological Studies' Computational Neurobiology Laboratory and one of the symposium's organizers, went into the meeting expecting to find intractable problems. Instead, he said, attendees pointed to many possible solutions based on existing models and techniques.

Even so, Sejnowski said, implementing those suggestions is not going to be easy.

The Million-Neuron March

The data problem arises because even the simplest brains are extraordinarily complex. Scientists will need to sample tens and then hundreds of thousands of neurons in multiple locations to understand how they work together to sense, think, and act.

Most of those measurements will likely begin with the cerebral cortex, the outermost layer of the brain. Its specialized brain cells, or neurons, play a key role in perception, attention, memory, and thought. The cerebral cortex of a rat might have 15-20 million neurons; a cat, 300 million; and a chimpanzee, 5-6 billion. For a human, the figure is around 20 billion.

A single neuron looks like a long, branching vine with thousands of tendrils. Where those tendrils meet other neurons, they form junctions called synapses. Tiny electrical currents fluctuate along the neuron's surface. Then, perhaps in response to a stimulus or to send a command, the neuron fires a jolt of electric current, or spike, that crosses a synapse into another neuron.

The BrainGate Neural Interface System is an implantable device with 96 microelectrodes capable of picking up electrical activity from nearby neurons. Researchers are developing smaller arrays whose nanoscale electrodes will one day be able to interrogate thousands and tens of thousands of neurons. (Credit: Brown University)
This process is repeated hundreds of times each second as electrical signals race from one side of our brains to the other. This creates ever-changing networks of millions of neurons -- and billions of connections -- that give rise to our senses, thoughts, and actions.

Until now, the best way to observe these networks has been to implant a small electrode array in the brain. This enables investigators to measure spikes from up to 200 nearby neurons. Such measurements have taught us a great deal about the inner workings of the brain. Several research teams now use spike measurements to enable paralyzed humans to control robotic arms and communicate through computer devices.

Yet today's electrodes provide the barest sampling of the brain's sprawling neural networks. It is like looking at the brushwork in one corner of a painting without seeing the masterpiece itself. This is about to change. Building on advances in nanotechnology, electronics, and optics, scientists and engineers are already prototyping ways to monitor networks of thousands and tens of thousands of neurons at a time.

Ultimately, they hope to build instruments that will monitor the activity of hundreds of thousands and perhaps 1 million neurons, taking 1,000 or more measurements each second. The early work will likely focus on research animals, such as zebrafish, mice, and monkeys. While this is only a small fraction of the neurons in their cerebral cortex, it is a large enough sample to provide powerful insights into neural behavior as thoughts and sensory stimuli propagate across the brain.

This million-neuron march will unleash a torrent of data. A brain observatory that monitors 1 million neurons (or 100,000 neurons in each of 10 subjects) 1,000 times per second would generate 1 gigabyte of data every second, nearly 4 terabytes each hour, and close to 100 terabytes per day. Even after compressing the data by a factor of 10 to make it easier to store, a single advanced brain laboratory would produce about 3 petabytes of data annually, roughly as much as the world's largest and most complex science projects produce.
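For readers who want to check the arithmetic, here is a minimal back-of-envelope sketch in Python. It assumes each measurement is stored as a single byte, an assumption the article does not state, and it shows where the rounded hourly and daily figures come from.

# Back-of-envelope check of the data rates quoted above.
# Assumption (not stated in the article): one byte per measurement.
NEURONS = 1_000_000        # neurons monitored at once
SAMPLES_PER_SEC = 1_000    # measurements per neuron per second
BYTES_PER_SAMPLE = 1       # assumed storage per measurement
COMPRESSION = 10           # compression factor before archiving

bytes_per_sec = NEURONS * SAMPLES_PER_SEC * BYTES_PER_SAMPLE
print(f"{bytes_per_sec / 1e9:.1f} GB per second")        # 1.0
print(f"{bytes_per_sec * 3600 / 1e12:.1f} TB per hour")  # 3.6, rounded to 4
print(f"{bytes_per_sec * 86400 / 1e12:.1f} TB per day")  # 86.4, quoted as ~100
per_year = bytes_per_sec * 86400 * 365 / COMPRESSION
print(f"{per_year / 1e15:.1f} PB per year, compressed")  # ~3.2, quoted as 3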

Probing the Brain

Brain researchers have little experience dealing with such massive amounts of data. This is complicated by a second issue: they understand so little about how the brain actually works that they are not even certain what data they should collect.

Take memory, for example. Scientists know that emotions cause memories to stick and often trigger their recall. Yet they can barely discern the mechanisms by which millions of individual neurons interact to create those emotions, or how emotions alter the way we form, store, retrieve, and forget memories.

Compare the plight of brain investigators with that of physicists or astronomers. Their fields bloom with mysteries, yet they know enough to propose hypotheses and make predictions that they can confirm by measuring a particular set of wavelengths or band of energy. In other words, their hypotheses guide their research.

Because we know so little about the brain, researchers have fewer hypotheses to guide them. Instead, they must measure every piece of data they can. Only after analyzing it will they be able to develop theories that enable them to search for specific patterns.

In the meantime, though, scientists will have to brace for petabytes of data. Managing so much data will be like trying to drink water from a fire hose, Sejnowski said.

Synapse in the brain (Credit: UCLA/Public Domain)
Fortunately, companies like Google, Microsoft, and Qualcomm have experience with storing, maintaining, and accessing large data sets. They recommend locating data storage, high-performance computers, and specialized software at "brain observatories" that would house the specialized instruments used to generate brain activity data. This would let the observatories use high-speed connections to move data from the laboratory into banks of servers. Researchers could then access, search, visualize, and analyze petabytes of data through the cloud by using a web browser.

Before that happens, though, scientists must develop a consistent way to describe and label their data. This is not as easy as it sounds.

To start with, investigators must describe precisely where they collected the data, Clay Reid, a professor of neurobiology at Harvard University, told the symposium. If they cannot anchor recordings to neurons in specific parts of the brain, it will be difficult to duplicate experiments or compare results.

That means neuroscientists need a consistent map of brain anatomy. Unfortunately, they have never had one. In part, this is due to the natural advance of science: in the fast-moving field of neuroscience, researchers constantly reorganize brain maps to reflect new knowledge. They also face a vocabulary problem. Sometimes, different research groups will use several words to describe a single location; other times, a single word may mean different things to different researchers. Nor do maps remain consistent when moving across species.

Yet consistent maps are not enough. Scientists must also describe their experiments more precisely, so others can duplicate them or compare results, Anthony Lewis, a senior director at Qualcomm, told the symposium. He recommends describing experimental methods in such detail that an industrial robot could replicate each step of the procedure. For example, instead of merely stating that the researchers implanted an electrode into a certain part of the brain, he believes they should also include such factors as insertion speed, depth, force, and angle. This would improve the consistency of the results and make comparisons easier.
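In software terms, Lewis's recommendation amounts to capturing experimental metadata in a structured, machine-readable record. Here is a minimal sketch in Python of what one such record might look like; the field names, units, and values are hypothetical illustrations, not a proposed standard.

from dataclasses import dataclass

@dataclass
class ElectrodeInsertion:
    # Hypothetical record of a single implantation. Field names and
    # units are illustrative only; no community standard is implied.
    brain_region: str       # label from an agreed anatomical atlas
    depth_um: float         # insertion depth, micrometers
    speed_um_per_s: float   # insertion speed, micrometers per second
    angle_deg: float        # angle relative to the cortical surface
    force_mn: float         # insertion force, millinewtons

record = ElectrodeInsertion("primary motor cortex", 1500.0, 10.0, 90.0, 2.5)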

Brain Observatories

Once researchers resolve labeling and methodology issues, they must collect, store, and disseminate the information. Most symposium attendees believe this will take place through brain observatories. Like astronomical observatories or the first genomic laboratories, these facilities would bring together specialized brain activity mapping instruments unavailable anywhere else. These might range from nanoscale electrodes capable of measuring signals from neurons across the brain to high-speed optical instruments with neuron-scale resolution.

The ability to record continuously from hundreds of thousands or millions of neurons would make new types of experiments possible, said Fritz Sommer, director of the University of California, Berkeley's Redwood Center for Theoretical Neuroscience. For example, he proposed using virtual reality systems to control sensory inputs while tracking freely moving animals.

University of Washington bioengineer Michael Dickinson has done something like that with his Fly-O-Rama. He tethers fruit flies to a steel rod, and then projects a changing virtual landscape on the wall to convince them that they are flying. By controlling the image, Dickinson can prompt flies to repeat specific behaviors, such as avoiding obstacles, so he can study how their responses vary. Sommer believes similar systems would make it easier to study natural animal behaviors in controlled settings.

The virtual reality systems and other one-of-a-kind research instruments housed in brain observatories would change how many investigators work. Today, most brain researchers work in their own laboratories, using their own equipment, and retaining proprietary control over their data.

A BrainGate microelectrode array was implanted in the motor cortex of a 58-year-old woman who had been paralyzed by a stroke for almost 15 years. She uses her thoughts to control a robotic arm, grasp a bottle of coffee, serve herself a drink, and return the bottle to the table. (Credit: Brown University)
They would have to share the expensive, one-of-a-kind instruments housed in brain observatories. There would certainly be more researchers and experiments than any set of observatories could accommodate. Fortunately, there are good models for sharing equipment. Astronomers and physicists have developed a variety of means to ensure the most deserving experiments receive time on one-of-a-kind instruments.

Symposium participants strongly supported sharing all data from experiments funded by the BRAIN Initiative. The Human Genome Project provides a good model for how to do that. Today, researchers cannot publish a paper in genomics unless they agree to add their data to a government repository that anyone can access.

Symposium members envisioned sharing data with as many researchers, communities, and students as possible, including non-experts. As Tom Insel, director of the National Institute of Mental Health (NIMH), noted, many so-called "non-experts" have contributed to groundbreaking results in genomics.

Crunching Numbers

The brain observatory servers, as envisioned at the symposium, would collect, store, and disseminate the massive streams of data generated by specialized laboratory instruments. The servers would also provide a core array of standardized software so researchers could search, analyze, and visualize the massive database.

Creating such software poses challenges too. Scientists already have mathematical tools to help analyze neural data. Some separate spikes from background electrical noise, while others evaluate spike frequencies. Yet all of them are designed to work with small numbers of neurons. Software developers will have to scale up these tools to handle vastly larger data sets. They will also have to adapt the software to run efficiently on powerful supercomputers that consist of thousands or even hundreds of thousands of separate processors.
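To make the scaling problem concrete, here is a minimal sketch in Python of the simplest such tool: detecting spikes in a single voltage trace by amplitude thresholding, using a commonly used robust noise estimate. It is an illustration only; real pipelines add filtering and spike sorting, and the parameters here are assumptions.

import numpy as np

def detect_spikes(trace, fs=30_000, threshold_sd=5.0):
    # Estimate the noise level robustly, then flag samples that cross
    # a negative threshold (extracellular spikes are sharp downward jolts).
    noise_sd = np.median(np.abs(trace)) / 0.6745
    crossings = np.flatnonzero(trace < -threshold_sd * noise_sd)
    if crossings.size == 0:
        return crossings / fs
    # Keep only the first sample of each crossing (1 ms refractory gap).
    keep = np.insert(np.diff(crossings) > fs // 1000, 0, True)
    return crossings[keep] / fs  # spike times in seconds

# Toy trace: one second of noise with two embedded spikes.
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 30_000)
trace[[5_000, 20_000]] -= 15.0
print(detect_spikes(trace))  # two spike times, near 0.167 s and 0.667 s

Running this logic on a million neurons at once is precisely the scaling problem the symposium described: the same analysis must be spread across thousands of processors and petabytes of stored traces.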

Software developers will also want to apply the latest software tools to brain analysis. One example is machine learning, which uses powerful computers to search databases for hidden patterns. Today, machine learning applications range from DNA sequence mining and speech recognition to spam filtering and stock market trading. Such software could help brain investigators find research threads amid all the data they will suddenly be able to access.
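As a toy illustration of the idea, assuming the scikit-learn library is available: the sketch below clusters simulated neurons by their firing-rate profiles to recover two hidden populations. It is far simpler than anything a brain observatory would actually run, but it shows the pattern-finding flavor of machine learning.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# 100 simulated neurons x 50 time bins: two hidden populations firing
# at different average rates (5 vs. 20 spikes per bin).
rates = np.vstack([rng.poisson(5, (50, 50)), rng.poisson(20, (50, 50))])

# Let k-means find the two groups without being told which is which.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(rates)
print(set(labels[:50]), set(labels[50:]))  # each population gets its own cluster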

Visualization is another tool that is likely to be part of any package of core software. While machine learning is powerful, humans also have an astonishing ability to grasp the implications of complex images and video streams. Watching pathways change as information flows through a brain responding to different stimuli may give researchers more insight into a process than analyzing the numbers alone. Any brain observatory would need potent visualization tools to take advantage of this innate human ability.

Observatories designed to study the brain could generate as much data as major scientific instruments, such as the 17-mile-long Large Hadron Collider (LHC), the most powerful particle accelerator ever built. (Credit: Maximilien Brice, CERN)

Richard Weinberg, a research associate professor at the University of Southern California School of Cinematic Arts, gave an example at the symposium. He proposed a three-dimensional holographic display that would superimpose neuronal activity over a map of the brain's anatomical structure. This might help investigators determine, for example, the extent to which neural networks for specific tasks are hardwired or form and reform in our brains. Others suggested data walls that would let researchers interact directly with visualized data, in much the way Tom Cruise's character did in the movie Minority Report.

As the Tenth Kavli Futures Symposium envisioned, new software tools, powerful servers, and pioneering laboratory instruments are likely to alter the way we understand the brain. Current theories of how the brain works are based on low-resolution optical images and recordings from only a handful of neurons. The ability to observe, in real time, networks of hundreds of thousands of neurons is likely to bring startling revelations.

At first, scientists are likely to struggle with trying to identify patterns amid so much information. They will have to constantly adjust and readjust their models. But ultimately, they will begin to develop hypotheses and models based on solid observation. They will see patterns and make predictions that they will be able to test with powerful new instruments.

Slowly, the brain will give up its secrets. We may learn new ways to identify and treat dementia, schizophrenia, paralysis and other neural conditions. Perhaps we will develop new insights into the nature of consciousness itself.

Open, consistent, easily accessible data will make these discoveries possible.