Is deep learning a new kind of programming? (tomasp.net)
94 points by jpcooper on Dec 13, 2020 | hide | past | favorite | 60 comments



We trained a deep learning model to look at about 20 system parameters and predict an output. The parameters were binary. So one curious engineer decided to brute-force the trained model with all possible inputs (2^20 of them) to see what the model does. He found that, for the problem we were solving, only 4 of the 20 parameters had any effect on the results. The remaining 16 parameters did not affect them.

So he replaced the model with a single line of code with one boolean expression made with those 4 parameters connected with logical operators.
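A minimal sketch of that kind of brute-force check, assuming the trained model can be called as a plain function of a tuple of booleans. The `model` below is a hypothetical stand-in (not the real network), and 8 bits are used instead of 20 to keep the run short:

```python
from itertools import product

N_BITS = 8  # kept small here; the case described above would use 20


def model(bits):
    """Hypothetical black box: only bits 1, 3, 4 and 6 actually matter."""
    return (bits[1] and not bits[3]) or (bits[4] != bits[6])


def relevant_inputs(f, n_bits):
    """Return the indices whose flip changes f for at least one assignment."""
    relevant = set()
    for bits in product((False, True), repeat=n_bits):
        base = f(bits)
        for i in range(n_bits):
            if i in relevant:
                continue  # already known to matter
            flipped = bits[:i] + (not bits[i],) + bits[i + 1:]
            if f(flipped) != base:
                relevant.add(i)
    return sorted(relevant)


print(relevant_inputs(model, N_BITS))
```

The cost is roughly n * 2^n model evaluations, so about 21 million calls for 20 binary inputs, which is feasible for a small network.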


That kind of problem, with such a limited number of parameters, really shouldn't be thrown into a neural network. A decision tree (or variant) might have been the ideal ML technique, and you may have been able to quickly see which parameters mattered and reduce the four parameters to code if needed.

Neural networks make sense with a huge number of input parameters, where feature selection is really tricky to reason about and decision boundaries are very non-linear, such as in image classification.

Edited: slight clarification


When I studied AI in uni, in the cold cold (cold) winter of AI, this kind of input was really significant, but most problems we consider ML now are vastly more complex, and other problems can be addressed by things that are no longer considered AI at all (while they were back then).

It is funny how my uni's top-research neural nets (on $1m computers) are now considered to make no sense anymore. That went a lot faster than programming.


There's no rule that when the number of parameters is small deep learning shouldn't be used. The one time where deep learning maybe shouldn't be attempted at all is when the number of samples is very limited. While it excels with high dimensional hierarchical data, it can do well on other problems as well. It differs from problem to problem, and usually multiple solutions are tried and compared, starting with EDA and linear regression.


> There's no rule that when the number of parameters is small deep learning shouldn't be used

I would be genuinely interested in examples of problems with a very low number of predictors (say two to five) where a neural net would be appropriate (where, as you say, less complex methods have been tried and failed).

I just can't think of one.


Suppose you have to fit a fairly non-linear curve to make interpolated predictions. A NN could do that with fewer parameters than most other models.

I can't think of a method that would use fewer parameters. If nothing else, it's a decent way to compress the data set for interpolation (on nearby averages) as a use case, no?


For interpolation (just to be clear, not regression, i.e. interpolation means the curve has to pass through every point in the data exactly), polynomial interpolation gives a unique polynomial of lowest possible degree [1]; I'm not sure a NN would have fewer parameters than this for interpolation, strictly speaking.

To your point, I believe you meant "rough interpolation", and it's true in many cases NN's might produce a less overfitted approximating function if one has no prior knowledge of the generating function.

But if one can exploit prior knowledge, one can select an optimal set of basis functions and fit a more parsimonious model than a NN. For instance, if you knew that a nonlinear function was a function of sin, cos and logs, selecting these as basis functions and finding the correct functional form [2] would likely help an optimizer find a more parsimonious model than a NN using standard activation functions (ReLU, sigmoid, etc). As a thought experiment, suppose the generating function was this: (5 parameters)

  y = a1*log(a2*x)/cos(a3*x) + a4*sin(a5*x)
If one attempted to fit this with log, cos and sin basis functions, one is likely to recover this form with ~5 parameters. But suppose we tried to fit this with an NN with the stipulation that the approximation error is under some ε -- I suspect we'll need quite a bit more than 5 parameters.

NN's tend to generalize better (assuming proper regularization) than polynomial approximations and have fewer numerical problems like Runge's phenomenon, but I don't think NNs aim for (or have results that demonstrate) parsimony in parameters.

[1] https://en.wikipedia.org/wiki/Polynomial_interpolation

[2] If the functional form is unknown, there are techniques like "symbolic regression" that attempt to do a structure search to find a well-fitting structure. https://en.wikipedia.org/wiki/Symbolic_regression
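To make the exact-interpolation point concrete, here is a small sketch of Lagrange interpolation using only the standard library. The sample points are an assumption chosen for illustration (they come from y = x^3 + x + 1); n points give a polynomial of degree at most n - 1, so the parameter count equals the number of data points:

```python
from fractions import Fraction


def lagrange_interpolate(points, x):
    """Evaluate the unique lowest-degree polynomial through `points` at x."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                # Basis factor: equals 1 at xi and 0 at every other xj
                term *= (x - xj) / Fraction(xi - xj)
        total += term
    return total


# Illustrative samples of y = x**3 + x + 1
points = [(0, 1), (1, 3), (2, 11), (3, 31)]

# The curve passes through every data point exactly
for px, py in points:
    print(px, lagrange_interpolate(points, px) == py)

# Evaluation away from the samples
print(lagrange_interpolate(points, 4))
```

Exact rational arithmetic is used here only to sidestep floating-point noise in the check; with noisy data or many points, the numerical problems mentioned above (such as Runge's phenomenon) still apply.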


For online learning, multi-output, non-negative output, unlabeled data etc., neural networks work well. The power of deep learning lies in how you can shape the problem and loss function for specific purposes. And even if these circumstances do not exist they can do well; it's all problem specific.


Historically the XOR function has been the simple example that many ML algorithms can't handle. Just imagine a higher dimensional XOR with outliers, and you have a pretty good use case for DL with limited predictors.


Historically, this was solved in the 80's with the multi-layer perceptron, but it seems it still gets repeated.
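For reference, a two-layer perceptron that computes XOR fits in a few lines. This is only a sketch: the weights are picked by hand for illustration rather than learned, and a hard threshold stands in for the activation function:

```python
def step(z):
    """Hard threshold activation."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer: two threshold units over the same inputs
    h_or = step(x1 + x2 - 0.5)   # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)  # fires only if both inputs are 1
    # Output unit: "OR but not AND"
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

A single threshold unit cannot do this because the XOR classes are not linearly separable; the hidden layer is what makes it possible.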


20 parameters is WAY too small for a deepnet. Deepnets are better suited for very high dimensional spaces where the data has sparsity and a hierarchical structure, and can take advantage of the net's rotational or translational invariance etc. (if that architecture is used).

You can't pick completely the wrong tool, and then complain about how unsuitable it was.


Was there any rationale to using a "deep learning model" to train a 20 parameter model? Sounds like some amateur DS convinced the team this is a good idea because he thought deep learning was cool?


For up to 100ish parameters, even mixed with floating point I recommend trying the midaco solver a friend of mine develops. MINLP, ant colony method (i.e. gradient descent with many restarts). From my experience this runs circles around NNs for this class of problems (parameter optimization with relatively low complexity and/or limited amount of training data available).


Yep, there are well established and powerful tools for various problems: linear programming, boolean satisfiability, analytical solutions, etc. Neural networks and co. are for a very specific, yet large, class of problems.


wouldn't principal component analysis have done the same thing without the brute forcing?


The engineer wouldn't have been able to waste the week feeling fancy fiddling with GPUs in that case.


Since the variables are categorical, won't PCA have to be modified to use it effectively? To my knowledge, PCA can only be used for continuous variables.


Yes. I think so. Or maybe some correlation plots. I doubt that a proper EDA was performed pre-modelling.


Have you tried training the network using dropout?


yes, dropout and other regularization techniques were used


I have been writing optimization solvers of many forms to solve problems in engineering for about 20 years. From "make excel do linear regression on some data" to linear least squares to some nonlinear methods, simulated annealing, bayesian methods, deep neural networks -- none of this is "a new kind of programming", it's "do a bunch of data munging, throw matrix at a function, get matrix back, interpret/plot."

there's really no other magic in it than that.


How well does "linear least squares to some nonlinear methods, simulated annealing, bayesian methods" work when you're doing speech to text, text generation or object detection? Deep learning is different in that it's a huge leap closer to human capabilities compared to the other methods you've listed.


Well, your neural network is just a very complicated - ideally continuously differentiable - function. In principle you could just fit this function to data with nonlinear least squares (e.g. Levenberg-Marquardt); we use backpropagation and SGD because it's faster. Concerning simulated annealing: have a look at the old Alphafold and check how they optimize their neural networks...

So, could you explain again how the concepts and foundations of deep learning differ from plain old regression techniques?


Deep learning is closer to linear regression than it is to human capabilities.


> Deep learning is closer to linear regression than it is to human capabilities.

Quantitatively perhaps, in that deep learning is equivalent to many neurons, whereas a least squares model is only equivalent to a single neuron. Other than that there is not much evidence that a human brain is qualitatively different.


Deep learning is much more than a "long" network nowadays. Most of DL's success comes from clever architectures, like Inception Net. AFAIK there's no evidence the human brain is anything like those fancy deep NNs.


Recurrent networks don't give you a matrix back. They return a state machine. Qualitatively different.

Transitioning from a function that returns a scalar, to returning a polynomial, to returning an arbitrary function, and now to returning something stateful is a big difference. I suppose the next step is to write an algorithm that trains a Turing machine.


RNNs provided by common frameworks really do give a matrix back (examples [0, 1]). They're about as stateful as any other object or generator function.

[0]: https://pytorch.org/docs/stable/generated/torch.nn.RNN.html

[1]: https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cu...


What other statistical machine learning algorithm gives you a state machine as an output?

The particular encoding -- matrix or otherwise -- is just an implementation detail.


> What other statistical machine learning algorithm gives you a state machine as an output?

* hidden Markov models

* autoregressive models

* learned LQR

Sure, some of those have finite memory, but RNNs are practically limited too, just in a way that is not as easily quantified.

I see what you're saying about the single-step operation appearing unique, but I also think that RNNs can be viewed through a lens such that they look like "normal" programming concepts like generators and folding iterators.
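One way to read that lens, as a sketch: a single-unit recurrent cell is a step function from (state, input) to a new state, and running it over a sequence is a fold. The scalar weights below are hypothetical values chosen for illustration, not anything a framework provides:

```python
import math
from functools import reduce
from itertools import accumulate

# Hypothetical scalar "weights" for a one-unit recurrent cell
W_H, W_X, B = 0.5, 1.0, 0.0


def step(h, x):
    """One recurrence step: new hidden state from old state and input."""
    return math.tanh(W_H * h + W_X * x + B)


def rnn(xs, h=0.0):
    """Generator view: yields the hidden state after each input."""
    for x in xs:
        h = step(h, x)
        yield h


xs = [1.0, -0.5, 0.25]

final_state = reduce(step, xs, 0.0)                    # fold: last state only
all_states = list(accumulate(xs, step, initial=0.0))   # scan: every state

print(final_state)
print(all_states)
print(list(rnn(xs)))
```

The three views compute the same numbers; the only "state" is the accumulator threaded through the fold. `accumulate(..., initial=...)` needs Python 3.8 or later.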


I'm not familiar with LQRs. Versus HMMs and linear autoregressive models, I think avoiding the need to predefine the form of the autoregression is a bigger leap than it might appear right now. I'd always been a neural network skeptic, thinking they're excessively complex for no benefit. It seems to me that's no longer true. I'm not certain that'll hold, but it's exciting.


The feedback machine is also represented as a matrix, with either implicit or explicit recurrence by the code around the model.


Yes, a finite state machine can be encoded as a matrix. Ultimately it's just a string of bits, represented by whatever physical mechanism it happens to reside in. It's still a state machine, and that's a big difference.
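As an illustration of that encoding, here is a sketch of a two-state machine (parity of the number of 1s seen) written as one transition matrix per input symbol, with the current state as a one-hot vector. The machine and names are assumptions made up for the example:

```python
# One 2x2 transition matrix per input symbol.
# State vector is one-hot: [1, 0] = "even", [0, 1] = "odd".
T = {
    0: [[1, 0],
        [0, 1]],  # reading 0 keeps the current state
    1: [[0, 1],
        [1, 0]],  # reading 1 swaps even and odd
}


def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def run(bits):
    """Feed the symbols through the machine and return the final state."""
    state = [1, 0]  # start in "even"
    for b in bits:
        state = matvec(T[b], state)
    return state


def has_odd_ones(bits):
    return run(bits) == [0, 1]


print(has_odd_ones([1, 0, 1, 1]))
print(has_odd_ones([1, 1]))
```

The matrices are just data, but the loop in `run` is what makes it a state machine, which is roughly the point both sides of this exchange are making.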


Why can’t there be a discussion on machine learning without everyone on HN trying to prove how unnecessary it is? Cue the anecdotes on simpler regression based methods, overpaid scientists, and how much superior some other simpler method is.


It's an expensive (hardware, time, complexity) technique that is rarely the best. Why wouldn't people discuss cheaper, faster, understandable alternatives?


Just because you aren’t employed in a job that can make use of deep learning doesn’t mean it isn’t profitable. It’s extremely profitable. I’m tired of reading about some dumb alternatives that aren’t even relevant to the topic. Deep learning works very well for a certain class of problems. Continuing on with the trope that it doesn’t work just shows you aren’t educated.


While I agree with you that deep learning works very well for a certain class of problems and that some criticism directed towards it is misguided, there are also some arguments for the opposite side.

I normally do not feel the need to comment about deep learning, because it is only tangentially related to some of my past projects, but I can also understand those who might want to comment negatively, because I have seen many cases of managers who did not understand at all how exactly certain problems can be solved, but nevertheless they pushed vigorously for the use of deep learning to replace other better suited solutions, because they believed it to be a modern and universally better method.

So there were times when I was tired of seeing one more attempt to misuse deep learning and to have to explain and demonstrate once more which solution is better.

Obviously, any such opinions about which method is better in a certain case should be proved with numerical results from tests or simulations, not with guesses, but sometimes that requires a lot of work to implement both methods, even if you are pretty sure what the result will be.


As a computer vision engineer, I feel "rarely" quickly changes to "most probably" for a lot of the problems we work on.


I've seen the same in discussions of many things from programming to cars.

For instance, FP vs Imperative, where a simple example (map for instance) is being shown and someone always compares it to a for loop, and then disses garbage collection. They're obviously thinking of their existing language (we know from the GC comment) and not of whatever point the OP is trying to make about composability or whatever.

To some degree, examples are to blame. They often aren't good and don't show, to the actual audience, what the author intended. In my FP example, you need something that can't get stuck in a syntax-level debate and is complex enough for the benefits to show through. Like doing something that would take a 50-level deep stack of for loops in an unnested fashion.

It gives you more respect for great teachers who can pull examples out of the air that both illustrate and refuse to misdirect.

Can you (because I certainly can't) write a good post or blog about the minimum useful deep learning project that can't be done equivalently via any other methods?


Most people don't really have the data set to exploit larger ML models.

And the mistake people make is that they don't start with a simple model. They move straight into the heavy ones.


Is linear regression programming?

A hammer, nail, saw and timber can also be used to solve problems, but I wouldn't call those programming in and of themselves, though they could be used to build analog computers (where cogs, cams etc. are like lines of code or procedures).

Building a neural network to get a result is not at all like programming. There is usually not a "perfect" structure; rather, there are hardware, energy and time constraints to training and inference, balanced by over and under training the network.


> There is usually not a "perfect" structure, rather there are hardware, energy and time constraints to training and inference, balanced by over and under training the network.

The same can be said about complex simulations. The difference lies in knowing and being able to determine the limits of the system and the verification process.

In a simulation we can derive the accuracy of our model from parameters like numerical precision and stability, coarseness and the model used. In other words, we know the function we want to model because we state it explicitly.

Neural nets can model any function and the challenge is to extract the learned function from the trained network, to examine its limits and correctness. This is what I understood the author meant by the "operational" viewpoint.

We can verify that a given network architecture combined with a given optimisation function will find a local minimum w.r.t a given set of training data. This can be verified and tested.

What's not so easy to verify and test, however, are the properties of the modelled function as well as the function itself. That's why we still have to rely on proxies like error metrics on fixed datasets or failure cases.

With a simulation, on the other hand, we can easily control and predict the (quality of the) outcome by manipulating well understood parameters (number of iterations, coarseness of the simulation, numerical precision, etc.).

I picked simulations as an example, because many other classes of program can be verified using formal methods since the desired results are usually known beforehand. Again, just another reason why the author talks about a distinction in terms of operations, not the fundamental type of programming.

I find this to be a very interesting and thought provoking idea.


> Is linear regression programming?

I’d argue not. Mostly. If you fit a model using lm() in R, and then apply that model it’s not the same as hand selecting the weights of a linear equation and coding that equation.

You could in theory select the same weights, and code it by hand. But no one ever does.
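To illustrate the two routes, here is a sketch in Python rather than R: one function gets its weights from a closed-form least-squares fit, the other has them typed in by hand. The data is made up for the example and lies exactly on y = 2x + 1, so both end up as the same function:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)


def fitted(x):
    """Weights come from the data."""
    return slope * x + intercept


def hand_coded(x):
    """Same weights, chosen and typed in by a person."""
    return 2.0 * x + 1.0


print(slope, intercept)
print(fitted(10.0), hand_coded(10.0))
```

The resulting artifacts are identical; what differs is where the weights came from, which seems to be the distinction being drawn here.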


Doesn't that imply that if one can fit an NN in a line of code, then it's not programming?

That seems like a weird distinction to make.


I would go a step further, and say that prompt design will become an important sector of programming.

Modern language models (eg GPT-3 et al) offer the capability to take a natural language input, match it against the context of the sentence, then propose a query that is understandable to the layperson. This abstraction allows us to understand the problem better, rather than just analyzing the way the problem manifests itself in code. Having a programming language that mirrors our everyday communication is an important step forward in making the innovations from software broadly available.

The next wave of programmers will need to understand how human language can be used to efficiently guide models to solve problems that can’t be solved by human-written code. This is a big challenge, and we’re just at the very beginning of it, but I think it will open up new and undiscovered ways to create value in the world.

I have quite a few additional thoughts on the topic which I’ve captured here: https://sundayscaries.substack.com/p/whos-the-real-expert


> Having a programming language that mirrors our everyday communication is an important step forward in making the innovations from software broadly available.

I don't think that's the case at all. Mathematics developed a formalised non-natural language precisely because human language is completely unsuitable for expressing abstract concepts in a concise and unambiguous fashion.

You will find that even in non-technical fields language will quickly converge to a well-defined, coarse and highly coded subset of regular human language when efficiency and correctness are key. You can observe this in the different branches of military, medicine, and trades.

We use programming to formalise algorithms, processes, and models. Those are abstract concepts and the difficulty doesn't lie in expressing them verbally. This has been shown time and again by fruitless efforts to create localised dialects of more accessible programming languages like BASIC or Pascal.

Turns out it doesn't matter whether keywords are written in your native language or if you could write natural language-like sentences: the difficult part remained formalising the abstract concept and ideas in a meaningful, logical and sound way.

What I do think will help tremendously, however, is using a system such as GPT-3 to create another level of abstraction. There are many descriptive tasks that don't need to be put into code manually. The structure and behaviour of UIs comes to mind.


I don't know why you were downvoted, but I am interested in maybe the simpler case of computer-assisted programming. It doesn't necessarily have to be natural language input. Anything that can help debug compiler errors or debug programmes.

I went to a talk years ago on someone's PhD project involving a certain interactive debugger for Haskell, where the user could traverse the graph, making claims about nodes and eliminating possibilities. I wish I could remember its name.

uu-parsinglib [1] is a parser combinator library that provides error correction.

[1] https://hackage.haskell.org/package/uu-parsinglib-2.3.0

Maybe these algorithms could combine with AI to create something better.


Over time the words 'teaching' and 'programming' will converge and at some point you likely won't be able to tell the difference between the two anymore (in a computer context).

Deep learning isn't programming per-se, but it definitely creates results that are of the same kind that programming would be able to create as well (in principle, at least, in many cases).


If you squint hard enough, deep learning can be seen as programming. A computer (human) is used to train a model (write and compile a program). The generated model (compiled program) is then used for inference (run) on the target machine with new input data.


As I've understood it, normal programming is transforming an input with a program to get an output.

Machine learning is giving the input and the output to get a program.

Problem is, it's too difficult to summarize or understand the resulting program, while the program you get is tied to the output data which is never really accurate.

I'm still curious how ML specialists are approaching the task of analyzing a resulting deep neural network, and squeeze some science from it (meaning putting words on things they understand and are able to explain).

I've also read that Google was using ML to test different learning models, to easily find the best model to use for a given problem. I'm not sure, but it sounded like they were feeding the training model and the data into another learning model. I can't remember the details or the article or the reddit comment, but it sounded quite interesting.


You're probably talking about Google AutoML. FWIW there seems to also be Amazon SageMaker and Azure Machine Learning AutoML


For anyone interested, Chris Olah has a good blog post providing further insight on this topic: https://colah.github.io/posts/2015-09-NN-Types-FP/


I wouldn't say that deep learning is programming. I think the key feature of programming is legibility. A program is something that is clear enough to read and understand, to be decomposed in its constituent parts, and that has understandable semantics.

For example, writing an algorithm that has precise steps and procedures is programming. Putting my input into a box, shaking the box, and taking the result out is not programming, even if the box somehow solved the problem. Merely describing a problem and then having it solved is not enough to delineate programming, because that actually does apply to almost anything.


Going by the operationalistic idea of the article, which I interpret as duck typing, I think the key operations that can be applied to programmes are building them, running them, and measuring the build and results of the programme with respect to a certain goal. This applies both to a traditional programme, and to deep learning. Programmes are after all written to satisfy human needs, even if it is an obfuscated C contest.

For instance, imagine you have a black box that observes the horse races, Twitbook, the betting market and so on, and based on those observations executes bets for you with a bookmaker. The execution of the orders has a measurable effect on your net worth.

You might write a traditional programme which takes all of this data, and based on some ETL, statistical models and probability calculations, executes orders.

You might do some ETL, plug it all into a neural network, tune it and execute orders based on the results.

Your traditional programme is very complex, and combinations of small bugs may have large effects on the results. Your unit and integration tests may themselves be wrong. Formal testing possibly reduces the expected value of the system and is an arse to carry out for any large system. The expected value of the system itself becomes harder to reason about as the system grows, based on the operation of reading and understanding the code.

The internals of your neural network are also difficult to reason about in some ways. It is difficult to understand the workings of your neural network and specific parts' effects on the measured effects of its output. It will take time to tune it and build the most profitable model.

Both implementations of the black box may be backtested, and some sort of trust can be established over the expected value of each implementation. Both implementations allow the operations of running, and measuring the results of running. Both implementations are difficult to reason about in various ways.

We are perfectly happy to give money to people for them to do things without fully understanding their inner thoughts and the processes behind those thoughts.

Which is the golden duck?


>We are perfectly happy to give money to people for them to do things without fully understanding their inner thoughts and the processes behind those thoughts.

Yes, but we wouldn't say that we're programming them, which is the problem with the operational definition: it applies to everything. If my drunk uncle is great at horse-betting and I just need to give him a nice six-pack of microbrew to get a measurable net worth increase out, I've not turned into a computer scientist.

Hence my argument that legibility is what matters. Programmers must be able to reason about, rearrange, and understand the relationship between the syntax and semantics of a program.

I think it's more accurate to compare deep learning to running a sort of physical experiment, rather than programming.


I agree that deep learning is not programming, and that maybe the title of the post is wrong. However, I agree with the sentiment of the article. The contexts that both operations are carried out in are closer together than the context of employing your uncle is to either. I was trying to highlight that it is more useful to focus on the similarity of the contexts and the net benefit of either method.

Also, if your uncle is better than your computer, then stop programming it at all. However, if he was actually any good, then he shouldn't be talking to you, and you shouldn't be giving him any beer. Unless he was banned by the bookmaker.


Somehow reminds me of Software 2.0 [1]

[1] https://medium.com/@karpathy/software-2-0-a64152b37c35


Having thought along these lines before, I realized that I didn't get any new useful insights by throwing neural networks and conventional programming into the same bucket. Life went on as usual in both worlds and I stopped thinking about it.

Any insights worth learning about?


About the only useful generalization has been differentiable programming, i.e. AD embedded in a programming language to help mix in learning some functions from data. But this is not a consequence of clubbing the aforesaid two.
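For readers who haven't seen AD inside a language, a minimal forward-mode sketch with dual numbers looks like the following. It is an illustration only: the `Dual` class and `derivative` helper are made up for the example, support just addition and multiplication, and real systems (which usually favour reverse mode for learning) are far more complete:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative w.r.t. the chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(float(x), 1.0)).dot


def f(x):
    # Ordinary-looking code; works on floats and on Dual values alike
    return 3 * x * x + 2 * x + 1


print(f(2.0))
print(derivative(f, 2.0))
```

The point of the sketch is that `f` is written as normal code and the derivative falls out of operator overloading, which is the sense in which AD is "embedded in the language".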


Deep learning is very different to, but feels to me like the way you work with Prolog.



