
I didn't see any of the specific job titles you mentioned on the career site. Perhaps this is part of the "antedisciplinary" method you refer to? Practically speaking though, should one just use the "Open Apply" option for Machine Learning Engineer/Scientist roles?


Yes, please use Open Apply.


I would also like to know more about what you mentioned.


PSA... check out this guy's LinkedIn profile. It's a great read.


Thank you! You gave me an excuse to look at my own LinkedIn profile, which I've ignored for a few years.

I have a distinct memory from not that long ago when I was unemployed and asking people for advice on what to put on LinkedIn and on my resume.

One consistent piece of advice was "Don't list anything more than five years ago. People will think you're old!"

I thought, "Screw that, everyone already knows I'm old."

So I decided to just tell my story, all the way back to the beginning in the days of punch cards and Teletypes.

I am grateful that you appreciated that.


A bit surprising to hear that you weren't swimming in offers to hire you all the time. Any idea why that was? Maybe you're somewhat picky about the job and responsibilities that come with it?


It was interesting reading through your LinkedIn profile. I really liked how you summarized the work and tech used for each entry. Then I reached the earliest entry, "Night operator", which sounded like the start of a spy novel!


I’m impressed by Michael’s LinkedIn profile and extensive experience. If you ever take up blogging, I imagine you’d have a lot of lessons and experience to share.


I think that is why he asked the rhetorical "which masks?".


There are a couple of starting points you could take. I spent a weekend hacking out a program that generates fake word/definition pairs with a transformer model set against a dictionary: https://youtu.be/XnJ2TKAn-Vk?t=1547. If you substitute fake words for real words and have a sufficiently accurate model, you could quickly generate reasonable and novel definitions.

There are more complete versions of this kind of thing publicly available: https://github.com/turtlesoupy/this-word-does-not-exist
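To make the "generate fake words, then define them" pipeline concrete, here is a minimal sketch of the first half. It uses a character-bigram model rather than a transformer (a deliberate simplification of the approach described above), trained on a handful of made-up example words:

```python
import random

random.seed(0)

# A tiny character-bigram model trained on a few real words -- a toy
# stand-in for the transformer approach, just to show the fake-word step.
real_words = ["serendipity", "malevolent", "quixotic", "ephemeral", "lucid"]

# Record, for each character, the characters observed to follow it.
transitions = {}
for w in real_words:
    padded = "^" + w + "$"          # ^ marks word start, $ marks word end
    for a, b in zip(padded, padded[1:]):
        transitions.setdefault(a, []).append(b)

def fake_word(max_len=12):
    out, ch = [], "^"
    while len(out) < max_len:
        ch = random.choice(transitions[ch])
        if ch == "$":               # reached an end-of-word marker
            break
        out.append(ch)
    return "".join(out)

words = [fake_word() for _ in range(3)]
print(words)
```

A real version would swap the bigram table for a language model and then condition a second model on each fake word to produce its definition.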

> This would be amazing, for example, to run on a large corpus, generate the dictionary, and then run it again to find words that are used but not defined - not just in the original corpus but in the definitions too.

I think this is how you would gauge the success of the model. That is to say, you would evaluate model accuracy on a set of held-out words with definitions that never appeared in your dictionary training set but that appeared in context in your corpus. You would have to manually annotate whether or not the generated definitions of these held-out words were acceptable.
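The evaluation loop above could be sketched like this. Everything here is hypothetical scaffolding (the tiny dictionary, the placeholder `generate_definition`, and the faked annotations stand in for a trained model and a human annotator):

```python
import random

random.seed(0)

# Hypothetical setup: word -> definition pairs used for training, plus the
# vocabulary of the corpus. Held-out words appear in the corpus but never
# in the training dictionary.
training_dict = {"cat": "a small domesticated feline", "run": "to move quickly"}
corpus_vocab = {"cat", "run", "blorf", "snigglet"}

held_out = sorted(corpus_vocab - set(training_dict))

def generate_definition(word):
    # Placeholder for the trained definition model.
    return f"a thing or action associated with '{word}'"

for w in held_out:
    print(w, "->", generate_definition(w))

# Manual annotation step: a human marks each generated definition as
# acceptable (True) or not (False). Here we fake the annotations.
annotations = {w: random.choice([True, False]) for w in held_out}

accuracy = sum(annotations.values()) / len(held_out)
print("held-out accuracy:", accuracy)
```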


Thanks - that is indeed very interesting, and I will spend my weekend checking it out.

>I think this would be how you would gauge success of the model.

Yes, exactly. I think there would definitely be edge-cases, but the general rule is that there should not be any undefined terms/words in the final dictionary. The degree to which this can be achieved is of course related to the cyclomatic complexity of the original materials. But this is why I want this tool - to see how effective it is for creating training materials that prepare students for obtuse subjects.


Let me take a stab at this (I'll maybe take it halfway there). First of all we want to know what kind of matrix we are talking about.

Imagine that you have a whole bunch of generative models (it's best if you imagine a fully connected Boltzmann machine in particular, whose states you can think of as a binary vector consisting only of zeros and ones) that have the same form but different random realizations of their parameters. This is a typical example of what a toy model of a so-called "spin glass" looks like in statistical physics (the spins are either up or down, usually represented as +1/-1). Each of these models, having been initialized randomly, will have its own particular frequency of a given location (also called a site) of the boolean vector being either a one or a zero.

If the tendency of a site to be either a one or a zero were independent of every other site, the analysis of such a model would be pretty straightforward: every model would just have a vector of N frequencies, and we could compare how close the statistical behavior of each model was to the others by comparing how closely the N frequencies at each site matched one another. But in the general case there will be some interaction or correlation between sites in a given model. If the interaction strength is strong enough, this can result in a model tending to generate groups of patterns in its sequence of zeros and ones that are close to one another. Furthermore, if we compare the overlap of the apparent patterns between two such models, each with its own random parameters, we will find that some of them overlap more than others.

What we can then do is ask how much, on average, the patterns of these random models overlap with one another across the full set of all models. This leads us to the concept of an "overlap matrix". This matrix will have one set of values along the diagonal (corresponding to how much a model's patterns tend to overlap with themselves) and off-diagonal values capturing the overlap between models. You can find through simulation, or with some carefully constructed calculations, that when the interaction strength between sites is small, the off-diagonal elements don't tend to zero, but rather to a single number different from the diagonal value. This is perhaps intuitive: these models were randomly initialized, but they are going to overlap in their behavior in some places.
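A minimal numerical sketch of the weak-interaction case, using independent random +/-1 configurations as a stand-in for actual Boltzmann sampling: the diagonal of the overlap matrix is exactly 1, while the off-diagonal entries all cluster around a single small value.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 1000, 5  # N spins per configuration, M "models" (replicas)

# Stand-in for sampling from M independently initialized models: with
# weak interactions, each configuration is essentially an independent
# random +/-1 vector.
S = rng.choice([-1, 1], size=(M, N))

# Overlap matrix: Q[a, b] = (1/N) * sum_i s_i^a * s_i^b
Q = S @ S.T / N

print(np.round(Q, 2))
```

With strong interactions you would instead sample from the actual Gibbs distribution of each model, and the off-diagonal structure becomes much richer.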

Where things get interesting, though, is when you increase the interaction strength: you find that the overlap matrix starts to take on a block-diagonal form, wherein clusters of models overlap with one another at a certain level, and at a lower but constant level with out-of-cluster models. This is called one-step replica symmetry breaking (1RSB). These different clusters of models can be thought of as having learned different overall patterns, with the similarity quantified by their overlap. If you keep increasing the interaction strength, you will find that this happens again and again, with k-step replica symmetry breaking (kRSB) and a sort of self-similar block structure emerging in the overlap matrix (a picture is worth a thousand words [1]).
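The 1RSB block structure is easy to draw as a cartoon. Here is a sketch for 6 replicas in 2 clusters, with made-up overlap values: q0 between clusters, a larger q1 within a cluster, and the self-overlap qd on the diagonal.

```python
import numpy as np

# Illustrative overlap levels (not from any real model).
q0, q1, qd = 0.2, 0.6, 1.0

within = np.full((3, 3), q1)       # one cluster of 3 replicas
Q = np.kron(np.eye(2), within)     # place two clusters on the diagonal
Q[Q == 0] = q0                     # between-cluster overlap everywhere else
np.fill_diagonal(Q, qd)            # self-overlap

print(Q)
```

For kRSB you would nest this construction: replace each within-cluster block by a smaller block matrix of the same shape, giving the self-similar hierarchy described above.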

Now the really wild part, which Parisi figured out, is what happens when you take this process to the regime of full replica symmetry breaking. You can't really do this with simulations, and the calculations are very tricky (you have a bunch of terms either going to infinity or to zero that need to balance out correctly), but Parisi ended up coming up with an expression for the distribution of overlaps for the infinitely sized matrix with full interaction strength in play. The expression is actually a partial differential equation that itself needs to be solved (I told you the calculations were tricky, right?), but amazingly, it seems to capture the behavior of these kinds of models correctly.
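For the curious, the PDE in question looks schematically like this (for the Sherrington-Kirkpatrick model; sign and normalization conventions vary between references):

```latex
% Parisi's equation: f(q, h) is solved backward from q = 1 to q = 0,
% with x(q) the cumulative distribution of overlaps.
\frac{\partial f}{\partial q}
  = -\frac{1}{2}\left[
      \frac{\partial^2 f}{\partial h^2}
      + x(q)\left(\frac{\partial f}{\partial h}\right)^{2}
    \right],
\qquad
f(1, h) = \frac{1}{\beta}\,\log\!\bigl(2\cosh(\beta h)\bigr)
```

The nonlinearity enters through the order-parameter function x(q) itself, which is why solving it self-consistently is so delicate.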

Whereas mathematicians have a pretty good idea of how to understand the 1RSB process rigorously, the Parisi full replica symmetry breaking scheme is very much not understood, and remains of interest both to complex systems researchers trying to understand their models and to applied mathematicians (probability people in particular) trying to lay the foundations needed to explore the ideas being used by theorists.

Hope that helps a bit!

[1] https://www.semanticscholar.org/paper/Spin-Glasses%2C-Boolea...


I work with many physicists. They are some of my most beloved colleagues. But man, physicists! Amirite?


+1 on the references as a nice introduction. I think the authors overstate the preparation of their hypothetical "pedestrian" (either that, or they need to get away from the physics department a bit more often), but it's a great reference nevertheless. I also got a lot out of sections of Nishimori's textbook [1]. In particular, it helps motivate problems outside of physics and provides some references to start digging into more rigorous approaches via cavity methods (which, incidentally, I think are also more intuitive). I am a novice in this area, but I am sort of crossing my fingers that some of the ideas here will make their way into algorithms for inferring latent variables in upcoming modern associative neural networks [2]. What I mean is that it would be cool not just to have an understanding of the total capacity of the network, but also correct variational approximations to use during training.

[1] https://neurophys.biomedicale.parisdescartes.fr/wp-content/u... [2] https://ml-jku.github.io/hopfield-layers/


I have been using the phrase "Artificial Stupidity" as well, but with the opposite meaning. Specifically, I like to think of human-like artificial stupidity as a challenge for machine intelligence, in which an algorithm is able to replicate the rather sophisticated and incredibly entangled logic, intuitions, and calculus of humans at the height of their stupidity. This seems to me a much greater challenge than the standard sort of supervised learning problem, in that a truly stupid AI must be able to imagine latent variables that allow it to explain away real-world observations in a way that is both statistically implausible and causally serendipitous to its stupid peers. This seems to me to be a requirement for any kind of useful AGI.


You could easily generate stupid statements on demand. Just post a video on Youtube on a related theme, and scrape the comments.


This would be the opposite of a Turing test though, since most people wouldn't be able to do this.


I can imagine a variant of this card that tries to tackle the point you bring up. Not all purchases are equal. Some of them have negative carbon footprints. The tree planting offsets the footprint for every purchase. These things are not easy to estimate exactly but since there are probably order of magnitude differences in carbon footprints for different products there should be useful information about the relative impact of a purchase to display to the consumer and to demonstrate the overall impact of the project.


This feels very clever to me. I think people have very wrong senses of how bad various actions are for their carbon footprint. I know folks who don't eat meat for CO2 reasons (which I'm fine with) but fly across the US all the time (sorry, citation needed, and happy to be corrected, but I did this math once and it was roughly one year of meat = one (round?) trip).

A side-benefit, in my view, is to show that actually offsetting carbon footprints isn’t terribly expensive (for people who have the disposable income to fly across the country), which I think counters the “We’d have to shut down the economy to stop global warming” misconception that I believe a lot of people have (citation also needed here, just personal experience)

The Indulgences That Actually Work Card (in the sense of old Catholic indulgences), you could call it :)


It seems to me that they are basically describing a variational formulation of the "optimization perspective" of reinforcement learning, which is cool, but I am confused... where is the supervised learning? Like what is the input and what is the output?


The way I understand it, the two subproblems are supervised in the sense that they are trained using data sampled from a fixed distribution, instead of data sampled from a distribution that changes as you update your model, as is usually the case in RL. This makes training more stable.
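A toy illustration of that stability point (not the paper's method, just the general phenomenon): fit a simple running-average estimator against a fixed data distribution, then against one whose mean shifts with the current parameter.

```python
import random

random.seed(0)

# Supervised-style: data comes from a fixed distribution (mean 1.0),
# so the running average settles near the true mean.
theta = 0.0
for _ in range(2000):
    x = random.gauss(1.0, 1.0)       # fixed target distribution
    theta += 0.01 * (x - theta)      # running-average update
print("fixed distribution:", round(theta, 2))   # close to 1.0

# RL-style: the sampling distribution depends on the current parameter,
# so the target moves every time we update. Here the feedback pushes the
# effective fixed point out to 10 (solve phi = 1 + 0.9 * phi).
phi = 0.0
for _ in range(2000):
    x = random.gauss(1.0 + 0.9 * phi, 1.0)      # distribution shifts with phi
    phi += 0.01 * (x - phi)
print("shifting distribution:", round(phi, 2))  # drifts far above 1.0
```

The second loop still converges in this linear toy, but to a point far from the original target; with function approximation and bootstrapping the same feedback can destabilize training outright.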


Thanks for clarifying that point.


It seems more as if the authors are abusing terms from Machine Learning like "Supervised Learning".


abusing how?

