One thing I find kind of interesting about this -- the folks dumping cards on th...

latchkey · on Sept 20, 2022

Most large farms don't have the ability to properly tune cards for best ROI. At scale, it is too hard because each card is a snowflake in terms of performance. In order to tune, you're basically pushing the card to the limits of underclock/undervolt... which means the card crashes. Therefore, most large farms actually undertuned, for stability, which is the least abusive on the cards.

If the card crashes due to tuning, the machine crashes, and the rest of the cards are taken offline until the machine reboots. In order to change settings, you have to reboot the entire machine too. It is a pain in the ass to find optimal tuning at scale.

There is no publicly available software to 'autotune' cards at scale. It took me a long time to first learn how to tune cards (by hand tuning thousands of cards and getting a feeling for things) and then once I figured that out, I built software to do this. It required changes in the AMD driver to get it to output in the kernel messages, which card was crashing. It required updating the mining software itself. It was a nutty insane project, but it worked well. We hit our target hashrate / power usage.

Melatonic · on Sept 20, 2022

You should really pivot and start selling that software. Package it up with a nice pretty bow and wrapper and it could be a very lucrative product.

hailwren · on Sept 20, 2022

What are you doing now? Did you shut the project down entirely?

latchkey · on Sept 20, 2022

Some is shut down, some is mining other coins. Kind of a wait and see time right now.

Ruphin · on Sept 20, 2022

With Ethereum no longer in the game the total USD issuance for GPU mining probably dropped over 80%. Is your strategy to just wait out for miners to capitulate until mining is profitable again? Are you not worried that this will crater the value of your hardware?

latchkey · on Sept 20, 2022

We bootstrapped the company on mining, knowing full well that it would eventually end, and our business model goes beyond just mining ETH.

That was another reason to go with the RX4/5 series... fastest route to ROI.

Kerbonut · on Sept 20, 2022

Considering the current state of PoW mining, it doesn’t seem to me like there is much advantage to holding secrets on auto tuning and diagnostics… (correct me if I’m wrong), would you consider releasing any of that work to the community? Auto tuning and diagnostic information would be incredibly useful for almost everyone.

latchkey · on Sept 20, 2022

My knowledge is very specific to just these cards.

The only 'secret', which isn't a secret, was that I realized that since this is hardware, we know the maximum settings (hash/watt) that should be possible. Therefore, I set the cards to that best setting and then tune down from there. This is the opposite train of thinking than most other miners.

Most people think, let's start low and then tune up from there to make them 'faster'. The cards crash when they can't handle the settings, so it turns out that tuning down is a better way to tune since they stop crashing when they are stable... and thus don't need tuning from there. There are 3 different sets of 'knobs' to tweak, so I had to build an algo to adjust the knobs in the right order to tune things down. I just had the concept of 'current -> next settings'.

Temperature and power fluctuations can make the cards crash too... so by always tuning down, you're always heading towards more stability instead of instability. Since neither of those could be controlled, machines would reboot randomly all the time.

The software I built was a golang daemon that ran on each machine and watched for these crashes and modified the tuning of each card individually. The daemon is pretty cool as it is effectively a task runner. I had different tasks to configure and monitor the machines as well. The machines are all independent, idempotent and self-healing workers. Reliably distributing the software to 20k different workers, is a fun challenge. There are a ton of unit tests, so that helped a lot.

If I have the energy, I may rip out the tasks from the daemon, turn it into a library and open source that. It is kind of a fun project that could be useful for others trying to manage large scale individual workers. Tasks could easily be 'apt install' or monitoring utilities. I even bundled node_exporter into the binary, so that we could monitor the machines with prometheus.

arcticbull · on Sept 20, 2022

The planet thanks you for shutting down. Don't worry, I'd say the same for any tobacco executive.

latchkey · on Sept 20, 2022

Our power is unused hydro, in a geographic location full of other data centers for all the large tech companies.

This comment is just your continued trolling with your anti-crypto trope. Please try another tactic.

Update: I got downvoted to -4 on this one after getting several upvotes. Makes me think that articbull has multiple accounts and is just gaming the system.

robocat · on Sept 20, 2022

> downvoted

You are very likely mistaking normal HN herd behaviour - plenty of people are going to think bullshit to “unused hydro”.

Also “Please don't comment about the voting on comments. It never does any good, and it makes boring reading.” and “Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.” - https://news.ycombinator.com/newsguidelines.html

If you assume that HN is full of good people, and respond accordingly, you will learn more and have more fun. If you assume HN is full of bad people causing trouble, why would you stick around? Good communities are built upon trusting cooperators and learning the valid signals of trust.

latchkey · on Sept 20, 2022

Right after that update, I went from -4 to +2.

> plenty of people are going to think bullshit to “unused hydro”.

Maybe... but the way I went up and then suddenly went down to -4, was super odd.

> If you assume that HN is full of good people, and respond accordingly, you will learn more and have more fun..

I've been around here, with an account, since 2009. It is the only community I participate in any more. I believe that HN is full of great people and that the content is about as good as you're going to get these days. This is why accounts such as articbull are becoming boring to me. All it is is anti-crypto hatred that is often not fully factual. Equating me to a tobacco exec is just icing on the cake. Sorry for calling that out.

mattwilsonn888 · on Sept 23, 2022

Not only have I debated this person multiple times, but on a search about settlement a HN thread came up and he had the top comment citing the infamous Mora et al study which makes the naive assumption that required hash for a proof of work network scales linearly with transaction volume, rather than block volume (which come at a constant rate no matter what the total hash is).

This person is a successful troll, and a pathetic thinker.

arcticbull · on Sept 20, 2022

There's no such thing. That power should be connected to the grid instead of wasted or put to productive use, not to mention all those GPUs that are about to head for the landfill after a long life of guessing random numbers and warming up the outside. Wasting the power for money disincentivizes storage and grid connection.

latchkey · on Sept 20, 2022

You are hilarious, your crypto hatred has clouded your judgement. Since you have so much karma here, you don't mind saying silly things and losing points left and right.

Transmission of power is expensive and lossy. These data centers are in a very remote location that isn't heavily populated. That's why they are there to begin with! The power would absolutely go to waste otherwise. That's also why we picked that area... power was cheap because it wasn't being used.

Our GPUs are 'new old stock'. They were already produced, sitting in warehouses not doing anything, and would have gone to waste anyway.

Update: Similar with the downvotes...

arcticbull · on Sept 20, 2022

> You are hilarious, your crypto hatred has clouded your judgement. Since you have so much karma here, you don't mind saying silly things and losing points left and right.

First question, where do you think I got most of the karma? Sentiment here is broadly against crypto and has been for years (although it comes and goes). The pro side is loud, but enjoys far less support than you might imagine.

> Transmission of power is expensive and lossy.

No, it isn't. Transmission of power is super efficient. The EIA says the US lost only 5% of its electricity to T&D over the 2016-2020 period. [1] Expensive it may be, but that's not a reason not to do it. A single long power line from Boston to LA would only see ~25% loss from one end to the other. The whole Ohm's law thing (P=IV), we use transformers that are 99%+ efficient to raise the voltage to ~345kV and big fat wires to reduce current and resistance and hence transmission line losses.

> That's also why we picked that area... power was cheap because it wasn't being used.

It should be used to do something productive, not generate e-waste.

> Our GPUs are 'new old stock'. They were already produced, sitting in warehouses not doing anything, and would have gone to waste anyway.

Or, prices would have gone down and more people would have bought them. I can't see 3060s getting scrapped due to lack of demand when they were going for $850 a pop.

[1] https://www.eia.gov/tools/faqs/faq.php?id=105&t=3

latchkey · on Sept 20, 2022

> Sentiment here is broadly against crypto and has been for years.

Agreed. However, I'd hope that 'against crypto' was less of one dude ranting endlessly, which is really boring and should get down voted.

> Transmission of power is super efficient.

We're not talking about efficiency. Transmission is expensive. Which goes beyond efficiency. You have to factor in the cost of the lines themselves, which are based on distance. I've seen these costs first hand, have you?

> It should be used to do something productive, not generate e-waste.

Oh yes, please define productive for me.

> Or, prices would have gone down and more people would have bought them.

They weren't for sale to the public. Nor would they ever have been.

> I can't see 3060s getting scrapped due to lack of demand when they were going for $850 a pop.

Huh? This whole thread is about RX4xx, which is what my cards are.

Reason077 · on Sept 21, 2022

> "A single long power line from Boston to LA would only see ~25% loss from one end to the other."

It would actually be less than that using a modern HVDC transmission line. Losses for HVDC are around 3% per 1000 km of cable, and around 0.7% for the converter stations at each end. So around 15% loss for a theoretical Boston-to-LA HVDC.

dumbfounder · on Sept 20, 2022

It’s the storage that’s the problem, not the transmission. Have talked with several in this business, their take is that it allows plants to have extra capacity to expand to meet peak needs, instead of flirting with disaster. Most plants are hard to spin up and down, so having an alternative way to monetize the energy creation is super important.

latchkey · on Sept 21, 2022

In the past, it was aluminum smelting pots, which can't go cold or they take a huge amount of effort to spin up again. They built huge dams just for this [0].

When those plants shut down, the power, which was transmitted directly to the plants, had nowhere to go. Sending it somewhere else requires building high voltage transmission towers, which is very expensive to build as well as maintain.

Coinmint [1] took over the old Alcoa plant and is now pulling power from that dam. As it was explained to me by the head electrician when I visited, Coinmint is doing the dam a favor by having a constant draw of power to replace what was once the smelters. They also are paying for the electricity, which helps with maintenance of the dam, which is effectively a national resource.

[0] https://en.wikipedia.org/wiki/Moses-Saunders_Power_Dam

[1] https://www.coinmint.one/

Reason077 · on Sept 21, 2022

> "It’s the storage that’s the problem, not the transmission... Most plants are hard to spin up and down"

This is true for nuclear and some types of fossil-fuel power plants. But for hydro, all you have to do to vary the power output is adjust the amount of water running through the turbines. Plus, hydro plants that have their own storage lakes/dams have built-in storage: any water you're not using for generation can be saved in the lake for when it's needed.

(OK, some rivers have minimum flow requirements so that fish don't die and water can be used downstream etc...)

schoen · on Sept 20, 2022

According to the HN moderators, if you suspect someone is manipulating votes in some way, you're supposed to e-mail the moderators to have them look at it rather than making an accusation in your post.

latchkey · on Sept 20, 2022

Sure. Apologies for calling that out in my update. That said, I don't feel it is worth @dang's time to waste on validating if someone with 24k karma is using fake accounts. As soon as I posted that update, the karma on the comment reversed.

superjan · on Sept 20, 2022

I don’t particularly like crypto but I don’t think we need to rehash the same debate again. Isn’t it interesting to hear what this guy is going to do? Passing judgement does not improve the conversation.

NikolaNovak · on Sept 20, 2022

>>I think an argument could be made that mined-on cards are more well cared for than those owned by your average gamer.

What strikes me though is people are rarely discussing mileage, only nature of work.

The video card in my gaming computer works, on average, maybe 1hr/week (some weeks more, most weeks 0). So right out of the gate, my video card has been working basically 168 times less long than any mining card running 24x7. We are talking more than two orders of magnitude difference in "mileage", before we discuss how well or poorly it has been taken care of.

LoveMortuus · on Sept 20, 2022

Most of the "damage" that happens to a GPU is due to thermal expansion, which happens at to a more extreme degree if you're cold booting a game and then from a hot gaming state back down to cold. Where are with mining it's usually running constantly at a more or less stable temperature, thus it experiences much less of stress from the thermal expansion side of things.

belval · on Sept 20, 2022

My understanding is that it's similar to a car, changing state (starting/stopping) is much more damaging that running the same thing, at the same temperature/voltage for extended period of time.

latchkey · on Sept 20, 2022

I've rebooted GPUs hundreds of times to find the optimal tuning parameters. Failures after a reboot were very few and long term over the entire collection of GPUs, we've seen very few failures. Definitely within operating parameters.

Obi_Juan_Kenobi · on Sept 20, 2022

They usually undervolt/clock the GPU, but not the memory. VRAM temps can be an issue and only some miners will try to address that properly with copper shims, etc.

The 3090 is notoriously bad for mining because it has memory chips on both sides of the board, leading to more heating in one spot and poor cooling for the 'back side' chip. Other cards tolerate it much better, but it can be an issue.

So while overall mining cards should be good, there are some specific issues to watch out for. A failed memory chip is not an easy fix, requiring a reball/replacement of that chip.

latchkey · on Sept 20, 2022

100% accurate. ethash is memory bound. I've definitely seen 'memory go bad'... but at our scale, the percentage failure rate on that is in the single low digits. Additionally, we have cards that run just fine for years on end.

Maybe, as you suggest, it is just something that is more card specific.

synapse26 · on Sept 22, 2022

I fully agree with your point. However, this only applies when miners are buying new cards. I live in Vietnam, crypto mining/usage is pretty big, during the last crypto craze a few months (a year?) ago, all miners are willing to buy old cards, used cards, even broken cards at high price, I sold my 570 for $300.

Furthermore, flashing custom mining BIOS, what I’m told is that every miner here has to do or it wouldn’t be profitable, voids warranty.

captainbland · on Sept 20, 2022

Yeah if I had to guess, the main concern with very long periods of load is going to involve moving parts rather than anything else. So pretty much the fan is likely to go but not much else.