China's New Supercomputer Puts the US Even Further Behind
This week, China’s Sunway TaihuLight officially became the fastest supercomputer in the world. The previous champ? Also from China. What used to be an arms race for supercomputing primacy among technological nations has turned into a blowout.
The Sunway TaihuLight is indeed a monster: theoretical peak performance of 125 petaflops, 10,649,600 cores, and 1.31 petabytes of primary memory. That’s not just “big.” Former Indiana Pacers center Rik Smits is big. This is, like, mountain big. Jupiter big.
But TaihuLight’s abilities are matched only by the ambition that drove its creation. Fifteen years ago, China claimed zero of the top 500 supercomputers in the world. Today, it not only has more than everyone else—including the United States—but its best machine boasts speeds five times faster than the best the US can muster. And, in a first, it achieves those speeds with purely China-made chips.
Think of TaihuLight, then, not in terms of power but of significance. It’s loaded with it, not only for what it can do, but for how it does it.
The Super Supercomputer
If you think of a supercomputer as a souped-up version of what you’re playing EVE Online with at home, well, as it turns out you’re not entirely wrong. “At one level they’re not very different from your desktop system,” says Michael Papka, director of the Argonne Leadership Computing Facility (home to Mira, the world’s sixth-fastest supercomputer). “They have a processor that looks very similar to the one in a laptop or desktop system—there’s just a lot of them connected together.”
Your MacBook, for example, uses four cores; Mira harnesses just under 800,000. It uses them to simulate and study everything from weather patterns to the origins of the universe. The faster the supercomputer, the more precise the models and simulations.
On that basis alone, TaihuLight is a singular accomplishment. Its 10.6 million cores are more than three times as many as in the previous leader, China’s Tianhe-2, and nearly 20 times as many as in the fastest U.S. supercomputer, Titan, at Oak Ridge National Laboratory. “It’s running very high rates of execution speed, very good efficiency, and very good power efficiency,” says University of Tennessee computer scientist Jack Dongarra. “It’s really quite impressive.”
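The arithmetic behind those figures is easy to check. Here is a back-of-the-envelope sketch; the peak and core count are the figures cited above, and the per-core result is my own rough estimate, not a number from the TOP500 report:

```python
# Divide TaihuLight's theoretical peak across its cores to estimate
# what each core contributes at full tilt.
peak_flops = 125 * 10**15      # ~125 petaflops, theoretical peak
cores = 10_649_600             # total core count
per_core = peak_flops / cores  # FLOPS per core
print(f"{per_core / 1e9:.1f} GFLOPS per core")  # ~11.7
```

That modest per-core figure reflects the many-core trade-off: lots of simpler, lower-clocked cores rather than a handful of fast ones.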
If anyone’s qualified to say so, it’s Dongarra. He created the benchmark that TOP500, the organization that still ranks supercomputers today, first used to compare them in 1993, and he published the first independent evaluation of TaihuLight’s capabilities.
Still, hardware’s not everything. Because supercomputers run specialized tasks, they require specialized software. “You can use a factory as an example,” says Papka. “A lot of people are working on putting a car together at the same time, but they’re all working in a coordinated manner. People who write programs for supercomputers have to get all of the pieces working together.”
TaihuLight passes that test, too. In fact, three of the six finalists for a prestigious high-performance computing award are applications built to run on TaihuLight. Aside from relatively slow memory—a conscious trade-off to save money and power—this rig is ready to go to work. “This is not a stunt machine,” says Dongarra. And it’s years ahead of anything the US has.
A Command Line Lead
TaihuLight is faster than anything scheduled to come online in the US until 2018, when three Department of Energy sites will each receive a machine expected to range from 150 to 200 petaflops. That’s ahead of where China is now—but two years is half an eternity in computer time. That the lead has grown so large galls some lawmakers, for reasons both political and practical. Legislation calling for a supercomputer funding boost exists, but it has spent the last year mired in the Senate.
“Massive domestic gains in computing power are necessary to address the national security, scientific, and health care challenges of the future,” says Rep. Randy Hultgren, a Republican from Illinois whose American Super Computing Leadership Act has twice been passed by the House of Representatives. “It is increasingly evident that America is losing our lead.” Meanwhile the DOE is working on innovating with the budget it has.
The other significant TaihuLight achievement stings US interests even more, because it’s political. China’s last champ, Tianhe-2, had Intel inside. But in February of 2015, the Department of Commerce, citing national security concerns—supercomputers excel at crunching metadata for the NSA and its foreign equivalents—banned the sale of Intel Xeon processors to Chinese supercomputer labs.
Rather than slow the rate of Chinese supercomputer technology, the move appears to have had quite the opposite effect. “I believe the Chinese government put more research funding into the projects to develop and put in place indigenous processors,” Dongarra says. “The result of that, in some sense, is this machine today.”
A Race Worth Winning
Broadly, it’s true that better supercomputers benefit the whole world, assuming scientists get to work on them. It doesn’t exactly matter what flavor the chips are. “On some level, it’s a trophy that you put on your mantel,” Dongarra says. “But what’s more important is what kind of science it does, what kind of discoveries you make.”
TaihuLight’s stewards tell Dongarra that they’re putting all that power toward advanced manufacturing, Earth-system modeling and weather forecasting, life science, and big data analytics. That sounds like a broad range, but it’s just a small slice of what supercomputers are capable of. “Each time we make an increase, we can add more science to the problem,” Papka says. “For the foreseeable future, until we can model the real world on a quark-for-quark basis, we’ll need more powerful computers.”
And those computers are coming—especially if the US gets serious about catching up.
China claims exascale by 2020, three years before U.S.
China has set 2020 as the date for delivering an exascale system, the next major milestone in supercomputing performance. This is three years ahead of the U.S. roadmap.
This claim is from China's National University of Defense Technology, as reported Thursday by China's official news agency, Xinhua.
This system will be called Tianhe-3, following a naming convention that began in 2010 when China announced its first petaflop-scale system, Tianhe-1. The first petascale system was developed in the U.S. in 2008.
The U.S. roadmap calls for delivering an exascale system -- capable of 1,000 petaflops -- in 2023.
But it's not clear just what China will deliver in 2020. Theoretically, an exascale computer could be built today, but it wouldn't be practical: its power needs would likely exceed the 20-to-30-megawatt envelope the U.S. believes is achievable by 2023.
"It's entirely probable that one or more governments will deploy supercomputers with hypothetical peak performance of an exaflop by 2020," said Steve Conway, a high-performance computing analyst at IDC. "An exaflop is an arbitrary milestone, a nice round figure with the kind of symbolic lure the four-minute mile once held."
But what will China be capable of delivering in 2020?
The first stage will likely be peak exaflop performance, and then a Linpack run making it eligible for ranking on the Top 500 supercomputing list, said Conway.
But the measure "that counts most, but will likely be celebrated least," said Conway, "is sustained exaflop performance on a full, challenging 64-bit application."
That third stage probably won't happen until the 2022 to 2024 timeframe, he said.
That's the timeframe the U.S. has set, and its definition of exascale is sustained performance.
The White House, in an executive order last year, released a plan for coordinating exascale development and defined an exascale system as one capable of "100 times the performance of current 10-petaflop systems across a range of applications representing government needs."
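That definition lines up with the raw arithmetic in the roadmap above. A one-line sanity check, assuming nothing beyond the numbers already quoted:

```python
# "100 times the performance of current 10-petaflop systems"
# is exactly the 1,000-petaflop (1 exaflop) figure cited earlier.
petaflop = 10**15                    # FLOPS
exascale = 100 * (10 * petaflop)     # 100x a 10-petaflop system
assert exascale == 1000 * petaflop   # = 1 exaflop (10**18 FLOPS)
print(f"{exascale:.0e} FLOPS")
```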
The U.S. emphasis is on application performance, not on a peak performance record. Even if China does meet its 2020 goal, the debate will be over the usefulness of the machine. Nonetheless, China will likely use the machine to underscore its science advancement.
China has been leading the Top 500 list with its 34-petaflop Tianhe-2 system, but that list is due to be updated next week at the ISC High Performance Conference in Frankfurt, Germany.
Sunway-TaihuLight outperforms Tianhe-2 as world's fastest supercomputer
Source: Xinhua | 2016-06-20 15:56:14 | Editor: huaxia
NANJING, June 20 (Xinhua) -- China's new supercomputing system, Sunway-TaihuLight, was named the world's fastest computer at the International Supercomputing Conference in Germany on Monday.
The National Supercomputing Center was also unveiled Monday in Wuxi, east China's Jiangsu Province, where the new-generation supercomputer is installed.
With a peak processing capacity of 125.436 petaflops (PFlops), meaning it can perform more than 125 quadrillion calculations per second, Sunway-TaihuLight is the first supercomputer to achieve speeds in excess of 100 PFlops.
The computing power of the supercomputer is provided by a China-developed many-core CPU chip measuring just 25 square centimeters.
What is Neural Processing?
Neural processing originally referred to the way the brain works, but the term is more typically used to describe a computer architecture that mimics that biological function. In computers, neural processing gives software the ability to adapt to changing situations and to improve its function as more information becomes available. Neural processing is used in software to do tasks such as recognize a human face, predict the weather, analyze speech patterns, and learn new strategies in games.
The human brain is composed of approximately 100 billion neurons. These neurons are nerve cells that individually serve a simple function of processing and transmitting information. When the nerve cells transmit and process in clusters, called a neural network, the results are complex – such as creating and storing memory, processing language, and reacting to sudden movement.
Artificial neural processing mimics this process at a simpler level. A small processing unit, called a neuron or node, performs a simple task of processing and transmitting data. As the simple processing units combine basic information through connectors, the information and processing becomes more complex. Unlike traditional computer processors, which need a human programmer to input new information, neural processors can learn on their own once they are programmed.
For example, a neural processor can improve at checkers. Just like a human brain, the computer learns that certain moves by an opponent are made to create traps. Basic programming might allow the computer to fall for the trap the first time. The more often a certain trap appears, however, the greater attention the computer pays to that data and begins to react accordingly.
Neural programmers call the increasing attention that the computer pays to certain outcomes "weight." Traditional processing would provide the computer only with the basic rules of the game and a limited number of strategies. Neural processing, by gathering data and paying greater attention to more important information, learns better strategies as time goes on.
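The "weight" idea above can be sketched in a few lines. This is a toy illustration, not a real checkers engine; the pattern name, learning rate, and threshold are all made up for the example:

```python
# Toy weight learning: each sighting of a known trap pattern bumps its
# weight; once the weight crosses a threshold, the learner avoids it.
weights = {}  # pattern -> attention weight

def observe(pattern, weights, lr=0.5):
    """Increase the weight on a pattern each time it appears."""
    weights[pattern] = weights.get(pattern, 0.0) + lr

def avoids(pattern, weights, threshold=1.0):
    """Avoid a move once its accumulated weight crosses the threshold."""
    return weights.get(pattern, 0.0) >= threshold

observe("double-jump trap", weights)        # first sighting
print(avoids("double-jump trap", weights))  # False: falls for it once
observe("double-jump trap", weights)        # trap appears again
print(avoids("double-jump trap", weights))  # True: now avoided
```

The more often the trap appears, the larger its weight grows, which is exactly the "increasing attention" described above.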
The power of neural processing is in its flexibility. In the brain, information is presented as an electrochemical impulse – a small jolt or a chemical signal. In artificial neural processing, the information is presented as a numeric value. That value determines whether the artificial neuron goes active or stays dormant, and it also determines where it sends its signal. If a certain checker is moved to a certain square, for instance, the neural network reads that information as numeric data. That data is compared against a growing amount of information, which in turn creates an action or output.
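A single artificial neuron of the kind just described also fits in a few lines. The numeric encoding of the checker move and the weight values below are illustrative assumptions, not values from any real system:

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of numeric inputs crosses
    the threshold; otherwise stay dormant (return 0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Encode "checker moved to square 14" as numeric data: [square, moved-flag]
move = [14, 1]
made_up_weights = [0.1, 0.5]  # illustrative only
print(neuron(move, made_up_weights, threshold=1.5))  # 1: 1.4 + 0.5 = 1.9 fires
```

In a full network, that output value would itself become an input to other neurons, and training would adjust the weights as data accumulates.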
DR. GRACE AUGUSTINE: What we think we know - is that there's some kind of electrochemical communication between the roots of the trees. Like the synapses between neurons. Each tree has ten to the fourth connections to the trees around it, and there are ten to the twelfth trees on Pandora... That's more connections than the human brain. You get it? It's a network - a global network. And the Na'vi can access it - they can upload and download data - memories - at sites like the one you just destroyed.
China unveils first embedded neural network processing unit
China has launched mass production of its newly unveiled first embedded neural network processing unit (NPU), marking another major breakthrough in the country’s NPU research and development.
The VC0758 NPU, developed by China’s leading video technology supplier Vimicro, is based on a data-driven parallel computing model that greatly improves the chip’s computational ability at a lower power consumption rate, said Zhang Yundong, executive director of the Vimicro State Key Laboratory on digital multimedia chip technology.
Vimicro announced Monday that it has achieved mass production of the VC0758 NPU after five years of research, suggesting that China is now among the countries with the most advanced artificial intelligence technology for deep learning based on a data-driven parallel computing model, according to China National Radio (CNR).
Zhang added that the chip excels at processing multimedia data such as video and images, and that its capabilities come into full play in embedded computer vision applications, CNR reported.
CNR noted that the chip will give a major boost to China’s video surveillance industry and could help the country establish a leading position in the field.
According to China Central Television, the VC0758 will be widely employed in drones, intelligent driver assistance systems, and robot vision.