Nvidia GeForce GTX 970: Correcting the Specs and Exploring Memory Allocation (anandtech.com)
88 points by wtallis on Jan 27, 2015 | 43 comments



Nothing they can't fix with a refund or a discount. It'll hurt, but the vast majority of users will be happy to continue using what is a very high-performance card for a pretty good price.

After all, what are they going to buy instead with their refund? There would have to be a card performing better for the same price in order for a trade to make sense, and even then they'd be spending time ripping apart their PCs and traveling back and forth to stores or sending units back to online shops. Lots of trouble for little gain.

I'm happy that AnandTech took the time to point out the issue, but it isn't nearly as bad as it looks from a technical point of view. The discrepancy only matters if you notice it in an application you actually run; in that case it might be worth the trouble, but if you're happy with the card I don't see the point.

Imagine you bought a car that was advertised as having 7 cylinders and it turns out it only has 6. That would be a nightmare for any car company, but in the end the number of cylinders matters less than whether or not the car performs its assigned duty.


Why would they give a refund or discount in the first place? "I would like to return my Haswell chip because it has come to my attention that some of the execution units can't execute all instructions."

I guess the problem is that they put the memory bandwidth into the official specs.


Not just the memory bandwidth - the 970 has been revealed to have fewer ROPs and less cache than advertised.

If you're sold a Haswell as having 4 cores and 8MB of cache, and after you purchase it you notice it only has 3 cores and 6MB of cache, damn straight a refund is in order.


To make sure their customers are happy.

That's a good thing when you're selling products that people tend to replace every 2 to 3 years.


Or more often. Serious gamers replace their GPU roughly every year.


Yeah, but what else are their customers going to buy? AMD?


The customers aren't actually going to exercise the refund; the point of offering it is just to prevent backlash.

http://www.joelonsoftware.com/articles/customerservice.html (scroll down to #7)


I don't see the relevance here, but that article was actually really interesting. Thanks for the link.


Sure. Some people might just get mad enough to do so.


>Why would they give a refund or discount in the first place?

Because it will look better when they are investigated by the Advertising Standards Authority. Nvidia wants to frame this as an accidental non-disclosure of a technical issue; they don't want you reporting it as fraudulent behavior. Everyone who bought a card, even if they are happy with it, should file a complaint. You and I, as consumers, do not have the information needed to decide whether we were intentionally or unintentionally deceived beyond Nvidia's 'oops, sorry'. A proper investigation may go a long way toward making sure we don't get an 'oops, sorry' on the next generation of products too.


Perhaps I'm being dense in not seeing the issue here? It seems like a relatively minor mistake which turns the card from "obscenely good value for money" into "very good value for money".

Generally I buy hardware based on performance, not on technical specifications. This is certainly normal purchasing behavior for a CPU (a 3.0GHz Intel almost always beats a similarly specced 3.0GHz AMD), so I'm surprised to hear people buy GPUs based on whatever obscure number they've been told to care about.


I bought my last card (GTX 780) with the very specific intention of using all (well, nearly all) of its RAM (3GB) for CUDA calculations I was doing at the time. If it turned out I couldn't allocate it fully, then the card simply would not have been what they advertised and I would have returned it.
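
(Not the parent's code, just a minimal sketch of how one might verify that a card's advertised memory is actually allocatable from CUDA; the 256MB chunk size and the stop-at-first-failure approach are my own assumptions.)

  // Hedged sketch: grab device memory in 256MB chunks and report how far
  // we get before cudaMalloc fails. Compile with nvcc.
  #include <cstdio>
  #include <vector>
  #include <cuda_runtime.h>

  int main() {
      size_t freeB = 0, totalB = 0;
      cudaMemGetInfo(&freeB, &totalB);             // what the driver reports
      printf("total: %zu MB, free: %zu MB\n", totalB >> 20, freeB >> 20);

      const size_t chunk = 256ull << 20;           // 256 MB per allocation (assumption)
      std::vector<void*> ptrs;
      size_t allocated = 0;
      void* p = nullptr;
      while (cudaMalloc(&p, chunk) == cudaSuccess) {
          ptrs.push_back(p);
          allocated += chunk;
      }
      printf("allocated %zu MB before the first failure\n", allocated >> 20);

      for (void* q : ptrs) cudaFree(q);            // clean up
      return 0;
  }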


You can allocate it fully even here, it's just that accessing some addresses is slower than others.

That's a shitty situation. If you want really fully guaranteed specs and performance, buy a workstation card.


Why? If it's not a workstation card, why can I no longer expect solid specs and performance? I don't need to work on doubles, so a consumer-grade card was perfectly reasonable for my work, and it was easily 5-6x cheaper than an equivalent Quadro/Tesla card. I hate the notion that if something is not from the "professional" segment then it can be shitty or not fully spec compliant. It's still misleading advertising.


>If you want really fully guaranteed specs and performance, buy a workstation card

Wrong answer. If you want guaranteed specs, you and every other user should file a complaint with your country's advertising standards authority. When a hefty fine is laid upon said manufacturer, the spec sheets will become amazingly accurate.



>Perhaps I'm being dense in not seeing the issue here?

Taking the technology out of it, it's a rather serious violation of advertising law, at least in the U.S. and Britain. Trying to whitewash the issue with 'well, it's still a good deal' is a poor way of looking at it, because if it isn't dealt with, more fraud from the manufacturer is likely. Even though your value calculation wasn't changed by the revelation, your experience does not encompass all experiences. The 3.5GB split is certainly unexpected behavior.

https://www.gov.uk/marketing-advertising-law/overview


Is it actually a breach? It might be that I just don't look at or for that level of specs, but I didn't think the "advertising", such as it was, included "it has this many ROPs".

There is a line, I think, in what counts as advertising in this case. If the specs that were wrong were not advertised by NV themselves, I imagine they don't count as misleading.


Yup - the documents they sent out to reviewers stated that the 970s had 64 ROPs [1]. This was also further pushed down the line to the OEMs building the cards, who used it in their own advertising (see the chart from ASUS in [2]).

NVIDIA advertised the cards as having 64 ROPs - the fact that the first bunch of dupes parroted it to a wider audience is irrelevant.

[1] http://www.techpowerup.com/209339/gtx-970-memory-drama-plot-...

[2] http://www.hardwarecanucks.com/forum/hardware-canucks-review...


Why didn't the OEMs catch it? You'd think they would verify the specifications of the chips they are given, at least to some degree, to ensure they are accurate. Have they grown complacent?


OEMs only care about the external interfaces on the chips, and only if they're doing a custom PCB rather than using NVidia's reference design. It's not like they're advertising it as their own GPU - they put NVidia's name and logo all over their products.


The advertising and spec sheets talk about 224GB/s of memory bandwidth. Retailers posted the specs provided by Nvidia as part of their advertising; you can go on Amazon/eBay and see it listed everywhere. It may be that the retailers get the brunt of the flak, but you can guarantee they will demand accurate specifications before accepting more of Nvidia's product if they are held liable.


"Consequently, achieving peak memory bandwidth performance on the GTX 970 is still possible, but it requires more effort since simple striping will not do the trick."

So you have to specially write code to reach peak memory bandwidth, the same as all other chips.

Unless that part of the article is wrong?


There have been reports of games having noticeable frame rate drops when they start to utilize the slow memory area, far above the few percent NVidia claims happen "typically". This might be fixable, either in the drivers or with game patches, but if they don't even publish the information that there is a special case, how are people supposed to work around it?

I imagine that if they had documented this from the start the backlash would have been way smaller, and it leaves people to wonder if problems they have are related to it.


I have a GTX 970 and it performs beautifully, and this news makes no difference to my impression. The card is great bang for its buck and runs everything I've thrown at it without issue, supersampled, at obscene framerates. I've been using it to play Elite in the Rift, everything cranked, supersampled, 75Hz... yeah. How it manages memory really doesn't matter currently, and it'll be obsolete on the same timeline as its superior sibling.


Frankly, I am amazed technical marketing at semiconductor companies gets the specifications right as often as it does. They typically don't get to bother engineering all that much, and even if they have worked in engineering before, they probably didn't work at an architecture level.

Seems entirely plausible this information had not made it from engineering to technical marketing and/or been missed in a review.


Everyone in this thread needs to take a step back from the technical merits of the card itself. This is an issue of deceptive advertising and every buyer of the card should report it as such.

Take this for example: you go to the store and buy 1 pound of lean ground beef. You cook it, eat it, and it tastes great. A day later you find out that your beef was actually 90% beef and 10% horse. Were you harmed? No. Is this a deceptive practice? Yes.

Customers in the U.S. should contact the FTC, and those in the U.K. the ASA, so the manufacturer can be investigated and appropriately fined.


I disagree that this analogy is correct. To me it seems more like:

* The butcher offers you "1lb of beef"

* You buy it thinking it's 100% lean beef

* Turns out 12.5% of it is fatty beef

You still got 1lb of beef.


IMHO the data transfer rate is the important info:

  First 3.5 GB : 196 GB/s == 7/8 of 224 GB/s
  Last  0.5 GB :  28 GB/s == 1/8 of 224 GB/s
"When the GPU is accessing this segment of memory, it should achieve 7/8th of its peak potential bandwidth, not far at all from what one would see with a fully enabled GM204. However, transfer rates for that last 512MB of memory will be much slower, at 1/8th the card's total potential." http://techreport.com/review/27724/nvidia-the-geforce-gtx-97...


The data transfer rates are what will bite you, and actually are different than advertised.

I know of at least one group that bought a bunch of GTX 970 cards because they were supposed to have the same memory bandwidth as the GTX 980, just with less computational power. Their application is memory-bandwidth bound, so the additional computation would be wasted.

However, this means they didn't really get what was promised: 196GB/s instead of 224GB/s.

Even so, it still has the best performance/price combo for that particular GPGPU application.


It's somewhat misleading to look at an arithmetic average of the bandwidth of the fast/slow segments. Due to the way they architected it, you cannot access both the fast segment and the slow segment during the same memory-fetch cycle; it's either/or. If control flow depends on data that's stuck in the slow segment, performance could be significantly degraded.

Now - blah blah prediction, blah blah heuristics, yadda yadda. If you don't use the memory fully (compute, 4K, etc) there's no problem, and even then you can optimize the problem away somewhat. This will work pretty well for AAA-grade game engines that get special attention - Unreal, CryEngine, Unity. But for memory-bound (especially latency-sensitive) compute applications, what you have here is a 3.5GB card, not a 4GB card.

Having a card show up with 1/8th of its specced memory units turned off is not acceptable, regardless.


Actually I should correct this - if you access the slow segment at all performance will be degraded, since you cannot also access the fast memory during the same cycle.

Looking at it on a 2-cycle basis, since performance is 7x as high you can either access (7+7) or (7+1) chunks of memory. That's a 43% performance drop if even 1 of the 32 threads in a warp consistently needs to touch the slow segment.

That data being used for control flow will amplify the problem, of course, since latency will double.
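
(A quick sanity check of that 43% figure, taking the either/or model above at face value: over two cycles you move 7+7 units if both hit the fast segment, but only 7+1 if one of them has to touch the slow segment.)

  #include <cstdio>

  int main() {
      double fast_only = 7.0 + 7.0;   // two cycles, fast segment only
      double mixed     = 7.0 + 1.0;   // one fast cycle, one slow cycle
      printf("relative throughput: %.1f%%\n", 100.0 * mixed / fast_only);          // ~57.1%
      printf("performance drop:    %.1f%%\n", 100.0 * (1.0 - mixed / fast_only));  // ~42.9%
      return 0;
  }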


There is an actual difference even for the fast segment, because even in the fast 3.5GB segment the performance is still only 7/8 of the GTX 980's.

See this: http://dl5.guru3d.com/gtx-970vs-980.png

So the peak memory bandwidth of the GTX 970 is 196GB/s.


This is kind of like the PenTile controversy. A full-HD PenTile display does not have 1920x1080 independent, atomic pixels. Therefore, in my eyes, it's an obvious lie that the display is full HD. However, marketing says that a pixel is not defined like that and therefore it's okay.

"Any sufficiently advanced business model is indistinguishable from a scam."

People can say it's okay because the card actually has 4GB. But where do you draw the line? If 0.5GB had a throughput of 1 byte per second, would it be a scam now? Technically it's still 4GB.


The issue isn't the quantity of memory - there is physically 4GB of GDDR5 memory soldered to the board, yes.

The issue is the number of ROPs. If the board had 64 ROPs - as advertised - then there wouldn't be a fast segment and a slow segment of memory. All 4GB would perform equally, there wouldn't be access contention between the fast segment and slow segment, etc.

There's no cute marketing spin you can put on the board being advertised one way and quietly showing up with 1/8th of its memory subsystem disabled. NVIDIA isn't some fly-by-night hardware company like Butterfly Labs or something.


So, simple question - if I bought that card to do CUDA work on it, could I allocate the whole 4GB of GPU memory, or not? If not, I would be absolutely asking for a refund.

Edit: ok, from what I understand you can allocate the whole 4GB, but 512MB of it will be much slower memory. Also a basis for a refund in my eyes.


From the article there is nothing that would make it seem the 4GB would not be available using CUDA, though there is a hint that more wait states may be involved when accessing all of it. Someone with this card and the CUDA toolkit installed could verify that.


It's already been done, thankfully:

http://dl5.guru3d.com/gtx-970vs-980.png [0]

All of the memory is accessible. Memory read performance is worse than on the GTX 980 even in the first 3.5GB, and the performance of the last 512MB is atrocious.

http://www.computerbase.de/forum/showthread.php?t=1435408&p=... for the sourcecode

[0] http://www.guru3d.com/news-story/does-the-geforce-gtx-970-ha...
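
(For anyone who'd rather not dig through that thread, here's a rough sketch of the same idea rather than the actual benchmark: allocate the card in 128MB blocks and time an on-device copy within each block. The block size and the use of cudaMemcpy in place of the original read kernel are my assumptions, and the driver decides where each block physically lands, so the numbers are only indicative.)

  #include <cstdio>
  #include <vector>
  #include <cuda_runtime.h>

  int main() {
      const size_t block = 128ull << 20;            // 128 MB per test block (assumption)
      std::vector<char*> blocks;
      char* p = nullptr;
      while (cudaMalloc((void**)&p, block) == cudaSuccess)
          blocks.push_back(p);                      // allocate until the card is full

      cudaEvent_t start, stop;
      cudaEventCreate(&start);
      cudaEventCreate(&stop);

      for (size_t i = 0; i < blocks.size(); ++i) {
          cudaEventRecord(start);
          // copy the first half of the block onto its second half, on-device
          cudaMemcpy(blocks[i] + block / 2, blocks[i], block / 2,
                     cudaMemcpyDeviceToDevice);
          cudaEventRecord(stop);
          cudaEventSynchronize(stop);
          float ms = 0.0f;
          cudaEventElapsedTime(&ms, start, stop);
          // block/2 bytes read plus block/2 bytes written = block bytes moved
          printf("block %2zu (~%4zu MB in): %.1f GB/s\n",
                 i, (i * block) >> 20, (block / 1e9) / (ms / 1e3));
      }

      for (char* q : blocks) cudaFree(q);
      return 0;
  }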


Sort of - with any card, you will never get that full amount because of the operating system, which is likely using GPU memory (additionally, applications such as your web browser are likely using the memory as well). On the other hand, you can exceed the amount of GPU memory (using virtual memory, either via the drivers or a hand-made solution), but depending on the amount of thrashing it can slow things down greatly.


I don't think that answers his question.

It has nothing to do with the OS, browsers, virtual memory or thrashing; it has to do with the fact that 3.5GB of the 4GB is directly available, but the remaining 512MB is only accessible through an indirect (and therefore slower) path, as per the article.


If you are running CUDA applications on Linux, you can certainly get the full amount of RAM - just don't let X.org use it.


Does anyone know (or can make an intelligent guess) on what impact this will have on CUDA?


Could this be causing judder in the Oculus? And if so, is there a workaround?




