std_badalloc's comments

std_badalloc · on March 29, 2021

As someone who did cross platform development for iPhone, Android and Windows Phone way back when, Windows Phone did actually have the superior dev experience by far (talking about WinPhone 7+ here). It wasn't free, but neither was iPhone development.

They didn't have any market share though, so there wasn't much money to be made making apps for them. I suspect they failed because they launched 2-3 years after Android and iPhone, so the other platforms had accumulated the network effects of an existing user base and app ecosystem that they couldn't catch up to. And they tried hard, IIRC, Microsoft offered to build a Snapchat client for Snap Inc, and to pay them to be allowed to do so, but were denied.

std_badalloc · on March 5, 2021

PyTorch is the most impressive piece of software engineering that I know of. So yeah, it's a nice interface for writing fast numerical code. And for zero effort you can change between running on CPUs, GPUs and TPUs. There's some compiler functionality in there for kernel fusing and more. Oh, and you can autodiff everything. There's just an incredible amount of complexity being hidden behind behind a very simple interface there, and it just continues to impress me how they've been able to get this so right.

sillysaurusx · on March 5, 2021

and TPUs

BS. There's so much effort getting Pytorch working on TPUs, and at the end of it it's incredibly slow compared to what you have in Tensorflow. I hate this myth and wish it would die.

Old thread on this, detailing exactly why this is true: https://news.ycombinator.com/item?id=24721229

jampekka · on March 5, 2021

OTOH PyTorch seems to be highly explosive if you try to use it outside the mainstream use (i.e. neural networks). There's sadly no performant autodiff system for general purpose Python. Numba is fine for performance, but does not support autodiff. JAX aims to be sort of general purpose, but in practice it is quite explosive when doing something other than neural networks.

A lot of this is probably due to supporting CPUs and GPUs with the same interface. There are quite profound differences in how CPUs and GPUs are programmed, so the interface tends to restrict especially more "CPU-oriented" approaches.

I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs), but Python really needs a general purpose, high performance autodiff.

wxnx · on March 5, 2021

> I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs), but Python really needs a general purpose, high performance autodiff.

As someone who works with machine learning models day-to-day (yes, some deep NNs, but also other stuff) - GPUs really seem unbeatable to me for anything gradient-optimization-of-matrices (i.e. like 80% of what I do) related. Even inference in a relatively simple image classification net takes an order of magnitude longer on CPU than GPU on the smallest dataset I'm working with.

Was this a comment about specific models that have a reputation as being more difficult to optimize on the GPU (like tree-based models - although Microsoft is working in this space)? Or am I genuinely missing some optimization techniques that might let me make more use of our CPU compute?

jampekka · on March 6, 2021

For gradient-optimization-of-matrices for sure. Just make sure that you don't use gradient-optimization-of-matrices just because they run well on GPUs. There may well be more efficient approaches to your problems that are infeasible for the GPUs' wide SIMD architecture you may miss if you tie yourself to GPUs.

In general it's more that some specific models are easy for GPUs. Most models probably are not.

_coveredInBees · on March 5, 2021

I really don't understand the GPUs are overrated comment. As someone who uses Pytorch a lot and GPU compute almost every day, there is an order of magnitude difference in the speeds involved for most common CUDA / Open-CL accelerated computations.

Pytorch makes it pretty easy to get large GPU accelerated speed-ups with a lot of code we used to traditionally limit to Numpy. And this is for things that have nothing to do with neural-networks.

jampekka · on March 6, 2021

For a lot of cases you don't really need that much performance. Modern processors are plenty fast. It seems that current push to use GPU also pushes people towards GPU oriented solutions, such as using huge NNs for more or less anything, while other approaches would in many cases be magnitudes more efficient and robust.

GPUs (or "wide SIMDs" more generally) have quite profound limitations. Branching is very limited, recursion is more or less impossible and parallelism is possible only for identical operations. This makes for example many recursion-based time-series methods (e.g. Bayesian filtering) very tricky or practically impossible. From what I gather, running recurrent networks is also tricky and/or hacky on GPU.

GPUs are great for some quite specific, yet quite generally applicable, solutions, like tensor operations etc. But being tied to GPUs' inherent limitations also limits the space of approaches that are feasible to use. And in the long run this can stunt the development of different approaches.

mpfundstein · on March 6, 2021

> For a lot of cases you don't really need that much performance. Modern processors are plenty fast. It seems that current push to use GPU also pushes people towards GPU oriented solutions, such as using huge NNs for more or less anything, while other approaches would in many cases be magnitudes more efficient and robust.

for instance?

_coveredInBees · on March 6, 2021

I still don't get the criticism of Pytorch. If anything, you can get the best of both worlds in many way with their API supporting on GPU and on CPU operations in exactly the same ways.

ahendriksen · on March 5, 2021

What do you mean by “seems to be highly explosive”? I have used Pytorch to model many non-dnn things and have not experienced highly explosive behavior. (Could be that I have become too familiar with common footguns though)

lgessler · on March 5, 2021

I get what you mean by the GPUs are overrated comment, which is that they're thought of as essential in many cases when they're probably not, but in many domains like NLP, GPUs are a hard requirement for getting anything done

jl2718 · on March 5, 2021

Have you tried using Enzyme* on Numba IR?

* https://enzyme.mit.edu

komuher · on March 5, 2021

Wait wat, jax and also pytorch is used in a lot more areas then NN's. Jax is even consider to do better in that department in terms on performance then all of julia so wat are u talking about

BadInformatics · on March 5, 2021

GP makes a fair point about JAX still requiring a limited subset of Python though (mostly control flow stuff). Also, there's really no in-library way to add new kernels. This doesn't matter for most ML people but is absolutely important in other domains. So Numba/Julia/Fortran are "better in that department in terms on performance" than JAX because the latter doesn't even support said functionality.

jpsamaroo · on March 5, 2021

> Jax is even consider to do better in that department in terms on performance then all of julia so wat are u talking about

Please provide sources for this claim

UncleOxidant · on March 5, 2021

> There's sadly no performant autodiff system for general purpose Python.

Like there is for general purpose Julia code? (https://github.com/FluxML/Zygote.jl)

> I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs),

Do you run much machine learning code? All those matrix multiplications run a good bit faster on the GPU.

UncleOxidant · on March 5, 2021

> Oh, and you can autodiff everything.

Well, not everything. Julia's Zygote AD system can autodiff most Julia code (currently with the exception of code that mutates arrays/matrices).

mpfundstein · on March 6, 2021

and you didn't even talk about data and model parallelism. which often just works out of the box

thecleaner · on March 5, 2021

Its python wrappers on top of existing ThTensor library which was already provided by torch. But yes great engineering nonetheless.

rrss · on March 5, 2021

I don't think this is a particularly accurate description of pytorch in 2021. Yeah, the original c++ backend came from torch, but I think most of that has been replaced. AFAIK, all the development of the c++ backend for pytorch over that last several years has been done as part of the pytorch project -it's not just python wrappers at this point.

danieldk · on March 5, 2021

What I like about PyTorch is that most of the functionality is actually available through the C++ API as well, which has 'beta API stability' as they call it. So, there are good bindings for some other languages as well. E.g., I have been using the Rust bindings in a larger project [1], and they have been awesome. A precursor to the project was implemented using Tensorflow, which was a world of pain.

Even things like mixed-precision training are fairly easy to do through the API.

[1] https://github.com/tensordot/syntaxdot

std_badalloc · on March 3, 2021

It's the same in Norway, and it seems completely indefensible.

1) It's the opposite of progressive taxation, since the tax deduction is higher the larger your mortgage is (and there's no deduction if you don't have a mortgage).

2) It artificially inflates housing prices.

Overall, it's a shifting of cash away from those that want to enter the housing market towards those that are in it, i.e. taking from the poor and giving to the rich. The policy seems like an obviously indefensible mistake, yet no political party dears to touch it because the majority of voters are beneficiaries of it, and the topic is slightly too complicated for there to be an informed public debate about it.

Vinnl · on March 3, 2021

I think it's relatively widely recognised as a bad idea now, but the challenge is that people have made long-term financial plans based on it, and if you were to suddenly cancel it altogether, lots of people would probably get in trouble. They're imposing further restrictions on it in the Netherlands, but it's all gradual and slooooow.

Disclaimer: I don't know much about this.

KSteffensen · on March 3, 2021

We have the same problem in Denmark.

It doesn't have to be cancelled suddenly, it could be removed over a period of 25 years or even more. Similar to what happens with public pensions.

Vinnl · on March 3, 2021

Yeah, I think that's what's going on over here. (Likewise for the retirement age.)

MrDresden · on March 3, 2021

I'm intrigued, what are you referencing in regards to public pensions in Denmark?

KSteffensen · on March 3, 2021

Denmark is in the process of raising the retirement age. This is done gradually to allow people to properly plan for it. This means that the retirement age is based on year of birth. For people born before 1. january 1954 the age is 65.5. For people born in 1996 or later the retirement age is 74. There is a scale between these two points, I was born in the early 80es and my retirement age is 72.

The argument behind this is that the current level of income/taxation can not support people on pension for more than ~12 years on average and as the life expectancy rises the retirement age must rise with it.

There is a fairly big debate happening on 'graduated pensions'. Should a bricklayer who has carried heavy loads all his life be forced to work to the same age as an academic who has spent a large amount of time behind a desk in an office?

jopsen · on March 3, 2021

Does it matter now that we have negative interest rates?

KSteffensen · on March 3, 2021

No, and that means this is the best time to get this done. The least amount of people will be affected.

The interest rate will go back up at some point and we would have a fairer house market without this tax deduction. It's basically a tax break for people with enough resources to buy their house. And it drives up the housing prices making it harder for first-time buyers.

alkonaut · on March 3, 2021

It means it’s the perfect time to phase out the deductions, because they are very low at the moment.

statstutor · on March 6, 2021

> the challenge is that people have made long-term financial plans based on it, and if you were to suddenly cancel it altogether, lots of people would probably get in trouble.

This could be solved by just cancelling it. People taking advantage of this scheme:

a) had no reasonable expectation that it would be long-term, and,

b) had to be wealthy enough to initially access the scheme to benefit from it, and have since continued to benefit.

Handling any exceptional cases (which would be relatively very few) would be a lot more effective, in every sense, than continuing the scheme.

Besides, the Netherlands already has a Mortgage Guarantee Scheme which protects people with mortgages and changed circumstances.

Vinnl · on March 6, 2021

People who recently got one might have some reasonable expectation that it wouldn't be long term (although often their mortgage advisor would probably have given them the impression that it would be), but there are still many home owners who would reasonably have believed so. And even the ones who do expect it to not be long-term, would not have expected it to be cancelled overnight.

I would certainly love to be rid of it sooner rather than later, but I do acknowledge that that's going to be disadvantageous to a lot of people, and will have to take them into account as well.

The mortgage guarantee scheme doesn't really apply here, I think: it raises the buying power of people with low incomes, but once you've bought a house, it only means that their bank gets paid if you're forced to sell it - but you'd still be forced to sell it.

statstutor · on March 6, 2021

> And even the ones who do expect it to not be long-term, would not have expected it to be cancelled overnight.

I find this argument really strange.

Budgets can (and do) change and cancel categories of benefits for poor people overnight.

But benefits that accrue to wealthy people must not be changed overnight?

This is a function solely of power.

Vinnl · on March 6, 2021

Is that so? Like which? Generally, I wouldn't be in favour of suddenly significantly cutting people's benefits without some compensation or other either. Luckily, I don't think that happens a lot.

For example, whereas we need to raise the retirement age, we don't suddenly increase it with a couple of years. It is raised slowly in increments, and less for people who are closer to the retirement age and thus have had less time to plan for it.

LudwigNagasena · on March 3, 2021

> the challenge is that people have made long-term financial plans based on it, and if you were to suddenly cancel it altogether, lots of people would probably get in trouble

This can be solved with a grandfather clause.

mattmanser · on March 3, 2021

Which would make even more brutally regressive and unfair.

Vinnl · on March 3, 2021

Possibly, but that means houses would suddenly become much more expensive for those not ushered in.

jug · on March 3, 2021

This sounds much like the situation in Sweden including why things remain the way they have become. If it's brought to a political level, it use to be more about how house owners with very high mortages can be protected, haha... sigh

justincormack · on March 3, 2021

The UK managed to stop doing this. Also most deductions were limited to basic tax rate not higher rates. At some point the extra tax revenue becomes attractive.

em500 · on March 3, 2021

Dutch political commentators suggest that there had been a tacit housing policy bargain between the left and right wing parties for many decades: the left delivers large rent subsidies while the right delivers mortgage subsidies for their constituencies. It just seemed much harder to win elections with supply side policies (build, baby, build!).

std_badalloc · on Dec 16, 2020

I've seen studies where they tracked wild Norwegian salmon, and found that 90% died before spawning. As far as I understood it, this was interpreted as "natural death", but it's the same figure as the one in the article. It seems like a very plausible explanation that those deaths also largely occurred due to the same poisoning from tires.

std_badalloc · on Dec 1, 2020

The difference is presumably the digital contract, that is written in a programming language that only a handful of people on earth can understand, where a programming error can make your money disappear, such that you have no recourse in the court system.

Another difference is the pyramid scheme incentives in the cryptocurrency tech, causing everyone who's bought into it to be incentivised to talk about how great it is.

hanniabu · on Dec 1, 2020

Please don't comment on subjects you know nothing about and are clearly ignorant and show no desire to actually learn.

std_badalloc · on Dec 1, 2020

I think it was with the seq2seq paper of Sutskever, Vinyals and Le in 2014: https://arxiv.org/abs/1409.3215

People were doing a mix of learning from data and hand engineered solutions before this, but this was the first system learned end-to-end, afaik.

std_badalloc · on Nov 13, 2020

> Isn't DRM... Completely fucking pointless?

> All it does is infringe on our rights to be able to do what we want on our own devices.

No, but this is exactly the point of DRM and the legal protections around circumventing it. It never was about copyright protection. Copyright infringement was already illegal before the DMCA, and the introduction of DRM didn't make a dent in the amount of copyright infringement.

The point of making DRM circumvention illegal is for me to be able to sell you a bunch of bits, but ensure that I don't have any commercial competition in regards to how you use those bits. You can't legally make a device that plays DVDs without the blessing of a cartel known as DVD FLLC. You can't legally make a device that plays music from iTunes without the blessing of Apple. Etc. It's about retaining monopolistic control over media distribution and use, by forbidding certain forms of competition in the market.

Getting a law passed that forbids market competition (in many countries! not just the US) under the guise of being about copyright protection, is one of the greatest cons I've ever heard of, but that is what has happened.

std_badalloc · on Oct 28, 2020

I don't think the example here is the best. There's a case to be made for extracting pure functions and organizing them like this, but I don't think this code makes it. The benefit of pure functions IMO is primarily in that the code becomes easy to reason about if it doesn't depend on state. But any app that does anything will have state, and the question is how you manage that. One guideline could be that individual code units should reduce the amount of state you need to worry about at higher levels of abstraction.

In the example, there is hardly any code that does anything different depending on state. There's no state being managed, so there isn't actually any architectural problem being solved here. Should the API go down or change its format, the code breaks. The pure pluck_definition() will still fail to parse the JSON if the format changes. The pure build_url() will stop working if the API changes its URL format. They will pass unit tests, but fail in practice.

An actual problem to be solved here is to abstract away the details of the REST API, formatting and network errors. One way to do this is to pack that into a component with a well defined interface. You can still do this stateful/non-stateful split within the component if you want, but on the application level you need to apply that heuristic recursively at different levels of abstraction.

whoomp12342 · on Oct 28, 2020

There is absolutely a problem here. Having worked in disasters of a code base, the architectural pattern in the first example is probably fine... until the software grows. The first function truly is a thing-do-er which violates SRP. Then it will easily become a ball of mud.

Why is this so bad? Its not because its expensive, yes that is bad, but the largest issue with working in a ball of mud architecture, is that the code becomes so fragile and interdependent that changing any one thing can easily lead to breaking many other things. This leads to a culture of fear of change which grows tech debt. Then one day someone steps up and decides to actually refactor this ball of mud to have some semblance of logic to it, what a noble soul. That person is then subject to a barrage of bugs and issues from the refactor and is that the mercy of their supervisor.

Dealing with state and other side effect like issues is certainly something to consider in architecture, but it is a different argument entirely.

std_badalloc · on Sept 18, 2020

Yeah, but ResNet does not have sparse matrices, so how could it use them? Post ReLU activations may be sparse, but I don't think that helps when used with a non-sparse Conv2d.

dplavery92 · on Sept 18, 2020

I don't know if there are any white papers with hard details yet (if anyone knows of one, please share!), but nVidia's marketing material[0] for the Ampere architecture claims the following:

"Sparsity is possible in deep learning because the importance of individual weights evolves during the learning process, and by the end of network training, only a subset of weights have acquired a meaningful purpose in determining the learned output. The remaining weights are no longer needed.

Fine grained structured sparsity imposes a constraint on the allowed sparsity pattern, making it more efficient for hardware to do the necessary alignment of input operands. Because deep learning networks are able to adapt weights during the training process based on training feedback, NVIDIA engineers have found in general that the structure constraint does not impact the accuracy of the trained network for inferencing. This enables inferencing acceleration with sparsity."

So the idea seems to be that at the end of training, there's fine tuning that can be done to figure out which weights can be zeroed out without significantly impacting prediction accuracy, and then you can accelerate inferences with sparse matrix multiplication. They consider training acceleration with sparse matrices an "active research area."

I could see it being nice for the sake of running large language models on consumer, or really cool for the few edge computing applications that can actually demand and power conventional GPUs (e.g. self-driving cars.) It's probably not a great boon to the researcher who wants to reduce their iteration timeline though.

[0] https://developer.nvidia.com/blog/nvidia-ampere-architecture...

ryneandal · on Sept 18, 2020

Ah, I should have paid more attention to the question. Read it as "are they enabled?" My bad. :(

std_badalloc · on Sept 14, 2020

First I've heard of this. What is the software situation like on a device like this? I assume no comptability with Android, so apps have to be developed specifically targeting their OS?

fsflover · on Sept 14, 2020

This is GNU/Linux based on Debian, so apps should be developed in the same ways as for desktop Debian. Android apps could work in a VM, see anbox.