There's a great recent documentary on Netflix called My Octopus Teacher about an octopus which displayed some of these behaviors (interspecies bonding, tactile drive, play). Another good Netflix documentary, on birds, is Beak & Brain: Genius Birds from Down Under.
True. We haven't confirmed a single other one yet, which is why actually finding one will probably be the most significant discovery humans ever make.
But I would imagine that an interstellar civilization that has discovered thousands of life-bearing planets would start to develop some criteria about what type of life is interesting or not. It could just be that nothing on Earth meets the criteria to be interesting.
My issue with microtransactions is that the cost of making the decision seems like it outweighs the currency cost. Like, I used to buy computer games (e.g. from Steam) when they were on sale. Now I ignore any sale which has a price greater than $0.00, because if a game is free, I don't have to spend any time figuring out whether I will like it.
Where do you see microtransactions being applied in practice?
>Say you flip a coin a few times and only get heads. Your maximum likelihood (frequentist) estimate is that the coin will always land heads. In a Bayesian setting, if you have a (say uniform) prior on the probability that the coin lands heads, your maximum a posteriori estimate of this probability will be non-zero, but will continue to get smaller if you continue only seeing heads.
Not quite. If you have a uniform prior, there will be no difference between MAP and MLE.
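To make the correction concrete, here is a minimal sketch (my own, not from the thread) using the closed-form Beta-prior estimates for a coin's heads-probability. With h heads in n flips, the MLE is h/n and the MAP under a Beta(a, b) prior is (h+a-1)/(n+a+b-2); a uniform prior is Beta(1, 1), which collapses the MAP formula to the MLE.

```python
# Closed-form MLE vs MAP for a coin's heads-probability under a Beta(a, b) prior.
# With h heads in n flips:
#   MLE = h / n
#   MAP = (h + a - 1) / (n + a + b - 2)
# A uniform prior is Beta(1, 1), so the MAP reduces to the MLE exactly.

def mle(h, n):
    return h / n

def map_estimate(h, n, a, b):
    return (h + a - 1) / (n + a + b - 2)

h, n = 5, 5  # five flips, all heads

print(mle(h, n))                 # 1.0
print(map_estimate(h, n, 1, 1))  # 1.0 -- uniform prior: identical to the MLE
print(map_estimate(h, n, 2, 2))  # 6/7 ~= 0.857 -- an informative prior pulls it below 1
```

So the quoted comment's intuition only holds with a non-uniform prior (e.g. Beta(2, 2)); with a uniform prior the MAP estimate is exactly 1, just like the MLE.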
>From the vantage point of Bayesian inference, MLE is a special case of maximum a posteriori estimation (MAP) that assumes a uniform prior distribution of the parameters.
>The training algorithms for deep learning are also the hottest algorithm research area in machine learning, and are certainly applicable beyond deep learning.
The lore I've heard is that most new deep learning training algorithms (optimization algorithms) only work better on particular special cases, and it is hard to do better than the established algorithms in general.
I'm also not sure why you're saying they're applicable beyond deep learning--how do you plan to train a PGM or SVM using Adam?
I'd more generally describe the area as first-order optimization, including methods like acceleration, automatic differentiation, and stochastic approaches. Adam is just one trick for adapting the step-size hyperparameter.
They are usable everywhere derivative-based optimization is usable, which certainly includes SVMs. Though since an SVM is a shallow method, you don't need much data to train it, and hence don't need scalable optimization methods (they would just be unnecessarily slow). But you certainly could do it if you somehow needed to. Here's the first hit on Google for "sgd svm": https://scikit-learn.org/stable/modules/generated/sklearn.li...
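The point is easy to demonstrate without any library at all: a linear SVM is just the hinge loss plus L2 regularization, and you can minimize that with plain SGD on the subgradient. A toy sketch of mine (same objective that sklearn's SGDClassifier(loss="hinge") optimizes, names hypothetical):

```python
import random

# SGD on the subgradient of the regularized hinge loss for a 2-D linear SVM:
#   minimize  lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
def sgd_svm(data, epochs=200, lr=0.05, lam=0.01):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for (x1, x2), y in data:              # y in {-1, +1}
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # L2 shrinkage every step; hinge term only when the margin is violated
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]
            if margin < 1:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b    += lr * y
    return w, b

# Linearly separable toy data: class +1 up-right, class -1 down-left.
data = [((2, 2), 1), ((3, 2), 1), ((2, 3), 1),
        ((-2, -2), -1), ((-3, -2), -1), ((-2, -3), -1)]
random.seed(0)
w, b = sgd_svm(data)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for (x1, x2), _ in data]
print(preds)  # should match the labels
```

Which also illustrates the "unnecessarily slow" point: for six data points this is silly, but the exact same update scales to data that doesn't fit in memory.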
The fact that you can't use first order optimization methods for graphical models is one answer to the question of why everyone doesn't use them. Though for small models there are deep networks which model them and are trained as per usual for neural networks. I think this is still an active research area.
Nice, yeah, I would agree with the vast majority of this. The only thing I would add is that Adam/gradient methods are still useful in a graphical model, e.g. to get a MAP estimate (and then you can get a rough posterior estimate using variational methods or a Laplace approximation once you find the MAP).

But I agree I wasn't clear about what I mean when I say graphical models, since I think most people would understand graphical models to mean a full MCMC sampling of the posterior and marginalization over hyperparameters. I would say it's useful to understand why people do that and why it is useful, but many times that is (1) overkill and (2) inspires overconfidence in the result, because once we marginalize over our prior distribution people tend to forget that our prior may have been a complete fudge. I just mean graphical models as a tool for model building, understanding how different models relate to one another, and as a recipe for deriving a loss function.
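The "gradient ascent to the MAP, then Laplace" recipe can be sketched in one dimension with a hypothetical coin example of mine (not from the thread): the log-posterior for h heads, t tails under a Beta(a, b) prior is L(p) = (h+a-1) log p + (t+b-1) log(1-p) up to a constant, and the Laplace approximation is a Gaussian centered at the MAP with variance -1/L''(p_map).

```python
import math

# Gradient ascent to the MAP of a Beta posterior, then a Laplace (Gaussian)
# approximation from the curvature at the mode.
def laplace_coin(h, t, a=2, b=2, lr=0.01, steps=5000):
    p = 0.5
    for _ in range(steps):                    # plain gradient ascent on log L(p)
        grad = (h + a - 1) / p - (t + b - 1) / (1 - p)
        p = min(max(p + lr * grad, 1e-6), 1 - 1e-6)
    # Laplace: variance is the inverse of the negative second derivative at the MAP
    curv = -(h + a - 1) / p**2 - (t + b - 1) / (1 - p)**2
    return p, math.sqrt(-1 / curv)

p_map, sd = laplace_coin(h=7, t=3)
print(round(p_map, 3))  # analytic mode of Beta(9, 5) is 8/12 ~= 0.667
print(round(sd, 3))
```

In one dimension this is overkill (the posterior is Beta in closed form), but the same two steps are exactly what you'd do with Adam plus an autodiff Hessian in a model where the posterior has no closed form.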
Adam, RMSProp, etc are just flavors of gradient descent so they’re useful on anything from ResNet to logistic regression. There are more flavors like natural gradient that are more useful for smaller problems since they require a Fisher information matrix (a Hessian-like object), but gradient descent is gradient descent. We use Adam in production for logistic regression, not for any particular reason really, just happens to work.
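To show how little is Adam-specific, here's a self-contained sketch of mine: the standard Adam update (moment estimates plus bias correction) driving a one-feature logistic regression on toy data. Swap the inner gradient computation and the same loop trains anything differentiable.

```python
import math

# Full-batch Adam on logistic regression with one weight and a bias.
def adam_logreg(data, steps=2000, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    w = b = 0.0
    m = [0.0, 0.0]; v = [0.0, 0.0]   # first/second moment estimates per parameter
    for t in range(1, steps + 1):
        gw = gb = 0.0
        for x, y in data:            # cross-entropy gradient: (p - y) * x
            p = 1 / (1 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        for i, g in enumerate((gw, gb)):
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            mhat = m[i] / (1 - b1 ** t)           # bias-corrected moments
            vhat = v[i] / (1 - b2 ** t)
            step = lr * mhat / (math.sqrt(vhat) + eps)
            if i == 0: w -= step
            else:      b -= step
    return w, b

data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]  # negative x -> class 0, positive -> class 1
w, b = adam_logreg(data)
print(w > 0)  # True: larger x pushes the prediction toward class 1
```

Nothing in the update rule knows or cares that the model is logistic regression rather than a ResNet; only the gradient computation changes.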
I'm not the OP, but personally I see NN's as being really really useful where the input data is unstructured (such as text or images). The deep approach (appears to) build better features than a human can, but I'm not convinced that they are _that_ much better (or indeed at all) than standard methods for tabular data.
Once upon a time, when I used to hire data people, I'd ask them to tell me about a recent data project. They'd normally mention some kind of complex model, and I'd ask them how much better it was than linear/logistic regression. A really large proportion of candidates (around 50%) couldn't answer this because they'd never compared their approach to anything simpler.
One person told me that linear regression wasn't in the top 10 Kaggle models, so they would never use it.
Oh, so training time is virtually irrelevant to us; if it weren't, we would have to be a lot more careful about optimization methods and possibly about which language to use. We also cannot use NNs for the models we build (we are restricted to LR, but LR has as much model capacity as you need as long as you include more and more feature-interaction terms).
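The interaction-terms point has a classic illustration (my own toy sketch, not from the thread): plain logistic regression cannot fit XOR, but adding the pairwise product x1*x2 as an extra feature makes XOR linearly separable in the expanded space, so the same LR machinery fits it fine.

```python
import math

# Logistic regression with a degree-2 interaction feature can fit XOR,
# which plain LR on [x1, x2] cannot.
def expand(x1, x2):
    return [x1, x2, x1 * x2]                  # the interaction term

def train_lr(rows, steps=10000, lr=0.5):
    n = len(rows[0][0])
    w = [0.0] * n; b = 0.0
    for _ in range(steps):                    # full-batch gradient descent
        gw = [0.0] * n; gb = 0.0
        for feats, y in rows:
            z = sum(wi * f for wi, f in zip(w, feats)) + b
            p = 1 / (1 + math.exp(-z))
            for i, f in enumerate(feats):
                gw[i] += (p - y) * f
            gb += (p - y)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
rows = [(expand(x1, x2), y) for (x1, x2), y in xor]
w, b = train_lr(rows)
preds = [int(sum(wi * f for wi, f in zip(w, feats)) + b > 0) for feats, _ in rows]
print(preds)  # should recover [0, 1, 1, 0]
```

Of course the catch is the one the restriction implies: the number of interaction terms grows combinatorially with the degree, so "as much capacity as you need" can get expensive in feature count.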
NN’s are universal function approximators. They can have arbitrary model capacity, and you can sort of control that with architecture decisions, loss function/regularization choices, and early stopping, but depending on the problem they can cause more problems than they solve. Usually you don’t really know if your NN will generalize well outside of your train/test distributions, so many times it’s better to have a simpler, more predictable model that you can control the behavior of. This is all from my personal experience and is completely moot when we’re talking about e.g. NLP or vision tasks or situations where you’re drowning in data. NNs are super interesting and powerful, don’t mean to suggest otherwise but the mantra is: “what is the right solution to my problem”. Lots of great advantages to NN’s as well (you can get them to do anything with enough cajoling and they can be solutions to major headaches you would usually have in e.g. kernel methods).
all of us employees with startup equity know that it's a far cry from employee ownership, and it's disingenuous to suggest that we have any quantum of control when push comes to shove.
In democratic society, on principle, how much your say is worth is not determined by how much money you are able to invest in it. Your analogy would only make sense if you can "buy" votes, and arguably the political sphere is in a pretty sad state too with how much control voters actually have. At best, you're saying that a company can be just as undemocratic as the modern state. That's not a high bar to cross.
This. These folks are helping to shape new laws that set precedent for every other case. For example: the wealthiest now pay historically low tax rates [1][2]
Do they really set the vision, though? Did they start the #MeToo movement and bring down Harvey Weinstein and other powerful men? Are they leading the push for transgender acceptance? Are they the reason why reparations and universal healthcare are now mainstream policy proposals in the Democratic party?
It certainly makes sense for the wealthy, elite lawyers of a society to set the course of policy. But I'm not sure what the empirical evidence from 2019 America says. I think a lot of these Yale graduates end up becoming cogs in the existing system without actually changing it, and I think other forces might be equally as or more impactful in terms of shaping political currents.
I mean, go look at the professors leading these policy pushes, the journalists writing about metoo, and the policy wonks pushing for healthcare or trans changes. You're going to see a lot of JDs from top schools -- a lot of Yale. You won't see any self-taught coders or mathematicians from CalTech, you know?
Setting the vision for American legislation and so-called "justice" is hardly something to be envious of, or to be proud of if you know anything about legislation or the "justice" system.
And yet we routinely see people here arguing in defence of 80- to 100-hour work weeks at startups. This kind of mind-sickness is not the sole domain of the elite legal profession.