Over the past decade, Nvidia has more or less invented the modern AI and machine-learning market. The company continues to make remarkable strides generation-on-generation and Ampere’s performance per dollar is very good. Nvidia currently has no serious competition in the GPU AI market.
But — having said all that — there’s no such thing as “Huang’s Law.” That’s the name the Wall Street Journal’s Christopher Mims has coined in honor of Nvidia CEO Jensen Huang.
So, what is Huang’s Law? Well, it’s a misunderstood definition of Moore’s Law, but with the name “Huang” in front of it instead of “Moore.” Specifically:
“I call it Huang’s Law, after Nvidia Corp. chief executive and co-founder Jensen Huang. It describes how the silicon chips that power artificial intelligence more than double in performance every two years.”
Why the Definition Doesn’t Work
Mims begins his explanation by conflating Moore’s Law with Dennard scaling. Moore’s Law predicted that the number of transistors on a chip would double every two years. Dennard scaling predicted that building smaller transistors closer together would reduce their power consumption and allow for faster clocks. Moore’s Law is a measure of density. Dennard scaling measures performance per watt. It’s true that these two distinct discoveries are often combined in colloquial conversation, but in this specific case, conflating the two obfuscates the truth of the situation.
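Put compactly (the scaling relations below are the classic ones from Dennard’s 1974 paper, with k the factor by which linear dimensions shrink, 0 < k < 1):

```latex
\begin{align*}
\text{Moore's Law (density):}\quad & N(t) = N_0 \cdot 2^{t/2}
  && \text{transistor count doubles every two years}\\
\text{Dennard scaling (perf/watt):}\quad & P = CV^2 f
  \;\longrightarrow\; (kC)(kV)^2\!\left(\tfrac{f}{k}\right) = k^2 P
  && \text{per-transistor power falls as } k^2
\end{align*}
```

Since die area also shrinks as k², power density stays constant even while clock frequency rises as 1/k; that constancy, not density doubling, is the piece that eventually failed.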
Mims writes: “Moore’s law has slowed, and some say it’s over. But a different law, potentially no less consequential for computing’s next half century, has arisen.”
As we’ve discussed a few times on this website, the meaning of Moore’s Law is complex and prone to periodic shifts. If you mistakenly conflate Moore’s Law and Dennard scaling, Moore’s Law has slowed a great deal. If you strictly consider Moore’s Law as a measure of transistor density, it’s actually kept close to its long-term historical pace. This chart from 1970 – 2018 makes that quite clear. What broke was Dennard scaling, which ended in roughly 2004.
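You can sanity-check that pace without the chart. Here’s a quick back-of-the-envelope in Python, using two well-known public data points of my choosing rather than the chart’s exact series:

```python
import math

# Two public transistor counts, chosen for illustration:
# Intel 4004 (1971) and Nvidia GV100 / Volta (2017).
t0, n0 = 1971, 2_300
t1, n1 = 2017, 21_100_000_000

doublings = math.log2(n1 / n0)              # number of 2x steps between them
years_per_doubling = (t1 - t0) / doublings
print(f"{doublings:.1f} doublings in {t1 - t0} years, "
      f"one every {years_per_doubling:.2f} years")
# -> ~23.1 doublings in 46 years, one every ~1.99 years
```

That’s almost exactly the two-year cadence Moore settled on in 1975.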
To back up his argument, Mims turns to Bill Dally, Senior VP of research at Nvidia:
“Between November 2012 and this May, performance of Nvidia’s chips increased 317 times for an important class of AI calculations, says Bill Dally, chief scientist and senior vice president of research at Nvidia. On average, in other words, the performance of these chips more than doubled every year, a rate of progress that makes Moore’s Law pale in comparison.”
I’m the one who labeled this image — the original lacks labels — but based on the timeline, these are the GPUs the graph is likely referring to. Pascal launched in May 2016, Volta was announced in May 2017, and Turing shipped in the back half of 2018.
I’m going to ignore the fact that “an important class of AI calculations” is literally not a metric and treat the 317x performance claim as truthful. That’s an enormous increase in performance. The only trouble is, Huang’s Law is self-evidently dependent on Moore’s Law + the remains of Dennard scaling + the chum bucket of additional technologies like FinFET that engineers dump into every node to squeeze reasonable improvements out of it.
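Taking Dally’s figure at face value, the arithmetic behind “more than doubled every year” is easy to check. November 2012 to May 2020 is roughly 7.5 years:

```python
# Implied compound annual growth from the 317x claim.
years = 7.5
annual = 317 ** (1 / years)
print(f"{annual:.2f}x per year")        # ~2.15x -- "more than doubled"

# Moore's Law density (2x every two years) compounds at only:
print(f"{2 ** 0.5:.2f}x per year")      # ~1.41x
```

The claim holds up as stated. The question is where that growth came from.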
If you check the chart, most of Nvidia’s performance improvements are tied specifically to node transitions; Turing is the only major exception. Nvidia has significantly improved performance without a node transition twice in recent history: first from Kepler to Maxwell (the first tiny bump just before 2015) and then from Volta to Turing. But as good as Nvidia is at wringing additional performance from the same node, you can also see how important new process nodes have been to Nvidia’s overall performance. Huang’s Law, if it existed, could not be a replacement for Moore’s Law. Huang’s Law is enabled by Moore’s Law.
As the benefits of node transitions shrink, the rate of AI performance improvement is going to slow.
Why Huang’s Law Doesn’t Exist
First, the existence of an independent Huang’s Law is an illusion. Despite Dally’s comments about moving well ahead of Moore’s Law, it would be far more accurate to say “Nvidia has taken advantage of Moore’s Law to boost transistor density, while simultaneously improving total device performance at an effectively faster rate than Dennard scaling alone would have predicted.”
Huang’s Law can’t exist independently of Moore’s Law. If Moore’s Law is in trouble — either in terms of transistor scaling or the loosely defined performance improvements folded into it — Huang’s Law is, too. TSMC has forecast only limited performance improvements at 5nm and below, and that’s going to have an impact on how much performance each new generation of product can deliver. This is going to put more pressure on Nvidia’s engineers to squeeze out better performance at the per-transistor level, and humans aren’t actually very good at that.
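To see how much heavy lifting that leaves for chip designers, plug in some illustrative numbers. Assume a new node arrives every two years and contributes roughly 15 percent more performance (a figure in the neighborhood of TSMC’s public N7-to-N5 speed claim, but treat it as an assumption), while “Huang’s Law” demands 2x per year overall:

```python
# Illustrative decomposition: total gain = process gain x everything else.
# The 1.15 node factor is an assumption, not an official roadmap number.
node_gain_per_2yr = 1.15
target_per_2yr = 2.0 ** 2                   # 2x per year, compounded over 2 years

arch_gain_needed = target_per_2yr / node_gain_per_2yr
print(f"Architecture and software must supply {arch_gain_needed:.2f}x "
      f"every two years on their own")      # -> ~3.48x
```

Sustaining roughly 3.5x every two years from design alone, indefinitely, is the part no one has demonstrated.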
Second, it’s too early to make this kind of determination. When Gordon Moore published his first paper in 1965, he examined the time period from 1959 – 1964. In 1975, he revised his projection, increasing the expected doubling time from one year to two. That same year, Caltech professor Carver Mead popularized the term “Moore’s Law.” By the time he did, the “law” had been in effect for about 16 years. If we look at the WSJ’s representation of Nvidia’s timeline, either Pascal or Volta was the first GPU to really offer any kind of useful AI/ML performance. “Huang’s Law” is all of 3-4 years old. Even if we use Dally’s 2012 figure, it’s just eight years old. It’s a premature declaration.
Third, it’s not clear that AI/ML improvement can continue to grow at its present rate, even if we assume Moore’s Law improvements continue to deliver substantial benefits. Adding support for features like FP16 and INT8 allows AMD, Nvidia, and Intel to increase AI performance by executing more instructions in a single clock cycle, but not every type of workload delivers suitable results this way and there aren’t an infinite number of useful, ever-smaller low-precision targets to choose from. Over the last few years, manufacturers have been very busy picking low-hanging fruit. Eventually, we’re going to run out. We can’t subdivide a floating-point standard down to “FP0.0025” in an attempt to build a hyper-efficient neural net. Amazon, Google, Facebook, and similar companies do not have an infinite amount of space to devote to building ever-larger AI networks.
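To make the low-hanging fruit concrete, here’s a minimal sketch of symmetric INT8 quantization, the kind of trick those features exploit. This is illustrative NumPy, not anyone’s production kernel, and the function names are mine:

```python
import numpy as np

def quantize_int8(x):
    """Map float32 values to int8 using one shared scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)   # stand-in weights
a = rng.standard_normal(1024).astype(np.float32)   # stand-in activations

qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

# Int8 multiplies accumulate in int32 and get rescaled once at the end.
# Hardware can pack 4x as many int8 lanes as fp32 lanes into the same
# datapath width, which is where "more instructions per clock" comes from.
exact = float(w @ a)
approx = int(qw.astype(np.int32) @ qa.astype(np.int32)) * sw * sa
print(f"fp32 dot: {exact:+.3f}   int8 dot: {approx:+.3f}")
```

Each rung of that ladder (FP32 to FP16 to INT8, perhaps INT4) can be climbed once. After that, the precision well runs dry, which is exactly the point.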
Consider the smartphone. Ten years ago, it was not unusual for a new smartphone to double or nearly double the performance of its predecessor, to say nothing of the visual upgrade once “Retina” displays hit the market. That doesn’t happen any longer. The rate of improvement, which was meteoric in the beginning, has slowed.
If a person had proposed a “Jobs’s Law” of smartphone performance improvement back in late 2010, based on the rate of improvement from the iPhone -> iPhone 3G -> iPhone 3GS -> iPhone 4, they would look pretty silly in 2020.
Again, this is not some knock against Nvidia. The AI/ML market has exploded quickly, with dozens of companies working on silicon, and Nvidia has led the entire industry. Jensen Huang is an incredibly successful CEO. But with Dennard scaling gone, the low-hanging fruit quickly being gathered, and TSMC warning of smaller performance improvements on future nodes, it’s premature to declare that anyone has established any kind of law governing long-term performance growth. Dennard scaling lasted for decades. Moore’s Law (again, strictly defined in terms of density) is still chugging along 61 years later.
I say we give it a decade. If Huang’s Law is a real thing now, it’ll still be a real thing in 2030. If it isn’t, it never existed in the first place. No matter what the answer is, Jensen Huang will still go down as one of the business leaders who pioneered artificial intelligence and machine learning.
Now Read:
- Resellers Used Bots to Dominate the RTX 3080 Launch
- Dual GPU Gaming Gives Up the Ghost as Nvidia Ends SLI Support
- Ampere Unleashed: Nvidia’s RTX 3080 Redefines High-End Gaming
Source: ExtremeTech (https://ift.tt/2FPTQ4x)