marketforlemmas's comments

> dealing with non-linear data

What is non-linear data?


Data derived from non-linear inputs.

That is to say, problems that can't be expressed by linear functions.

E.g., y = mx + b is a linear function.

y = ax^2 + bx + c is a polynomial (non-linear) function.

Linear Programming (LP) involves optimizing a linear objective subject to linear constraints (something like Excel's Solver can do this).

When you are dealing with non-linear functions you need to use a method such as Sequential Quadratic Programming (SQP).
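To make the distinction concrete, here's a minimal pure-Python sketch (the functions are hypothetical, not from any particular solver) that tests linearity via the superposition property f(a*x1 + b*x2) = a*f(x1) + b*f(x2). Note that y = 3x passes, while the quadratic fails; strictly speaking a line with a nonzero intercept is affine rather than linear in this sense.

```python
def linear(x):
    # y = 3x (no intercept, so superposition holds exactly)
    return 3 * x

def quadratic(x):
    # y = 2x^2 + 1 (non-linear: superposition fails)
    return 2 * x**2 + 1

def is_linear(f, x1=1.0, x2=2.0, a=0.5, b=4.0, tol=1e-9):
    # Check f(a*x1 + b*x2) == a*f(x1) + b*f(x2) at one sample point
    return abs(f(a * x1 + b * x2) - (a * f(x1) + b * f(x2))) < tol

print(is_linear(linear))     # True
print(is_linear(quadratic))  # False
```

Methods like SQP exist precisely because once superposition fails, the machinery built for linear systems no longer applies.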


Using a term like nonlinear science is like referring to the bulk of zoology as the study of non-elephant animals.

— Stanislaw Ulam

https://en.wikipedia.org/wiki/Nonlinear_system


> WF lost money on this

This ignores the fact that Wells Fargo is a publicly traded company and has an incentive to make their numbers look good for their shareholders.

... And to be clear, your assertion is that thousands of employees across different branches all independently decided to commit fraud and risk their jobs, all for an additional $450 per paycheck? That is your alternative explanation?


On the contrary, I think they committed fraud to keep their jobs. The $450 is a break-even number; if not getting fired resulted in the employees getting an extra $450 in pay, then WF lost money.


Is being chosen by Joel Spolsky not enough of an endorsement for you?


So I've read the PP piece and your article, and I think your criticism is way off-base. The most technical part of your argument hinges on a p-value of .057 vs. .05, and it's not a good one. No one seriously believes that .05 is a magical number separating true from false; results that land close to .05 but not quite below it are not automatically false.

They go on to give supporting analysis in the form of false positive and false negative rates by race, which is pretty compelling evidence. You claim not to believe it because you can't find it in the notebook, but it's literally right underneath the Cox model section.

I was intrigued by this article and went a step further to plot the ROC curves, and the evidence is solid. It's messy, but you can see it here https://github.com/stoddardg/compas-analysis/blob/master/my_... in cell 78. It's quite clear that the algorithm is choosing a different operating point on the ROC curve for white defendants (a more lenient one) than for black defendants. A white defendant with a risk score of 5 is as likely to commit a crime as a black defendant with a score of 7. That's an obvious case where you could simply relabel the scores and be more fair, but their algorithm chooses not to.
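For readers unfamiliar with the mechanics, each cutoff on the risk score yields one (FPR, TPR) point on the ROC curve; choosing different cutoffs per group is what "different operating points" means. Here's a hedged sketch with made-up scores and labels (not the COMPAS data):

```python
def roc_point(scores, labels, cutoff):
    """labels: 1 = reoffended, 0 = did not. Predict 'high risk' if score >= cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos  # (FPR, TPR)

# Toy data: risk scores on a 1-10 scale
scores = [2, 3, 5, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(roc_point(scores, labels, cutoff=5))  # (0.5, 1.0)
print(roc_point(scores, labels, cutoff=7))  # (0.0, 0.75)
```

Applying cutoff 5 to one group and cutoff 7 to another puts the two groups at visibly different trade-offs between catching reoffenders and flagging people who would not reoffend.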

I also hate when people abuse bad statistics and reasoning to sell page views.


I have plenty of criticisms of hypothesis testing and p-values. Nevertheless, if you choose to run that type of analysis, do it right - that means sticking with your analysis and not reaching for weasel words like "almost statistically significant" when it doesn't come out the way you want. Incidentally, the real p-value is 11.075%, since they ran two hypothesis tests and didn't adjust for multiple comparisons.
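For what it's worth, the 11.075% figure is consistent with a Šidák-style family-wise adjustment for two independent tests each at p = 0.057 (an assumption on my part about how the commenter computed it); a quick check:

```python
# Sidak-style family-wise error for two independent tests at p = 0.057 each:
# P(at least one false positive) = 1 - (1 - p)^2
p_single = 0.057
p_family = 1 - (1 - p_single) ** 2
print(round(p_family, 5))  # 0.11075
```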

Your analysis might be right - if so, that's interesting. I'll take a closer look and write a follow-up piece if it holds up - among other things, a glance at your ROC curves suggests they are pretty close, performing better for whites in some regions and better for blacks in others. But it's 7:30 AM (pre-coffee) and I haven't looked closely yet.

But since PP did not do any of this, my criticism of them holds - they ran an NHST, got the wrong result, and then spouted a bunch of anecdotes instead of admitting that their analysis went against what they wanted to find.


What's your response to the GP's point that you seem to have missed where the false positive/negative rates appear in the notebook? In your blog post, you said this:

> Finally, the article includes a table of false positive probabilities (FPP) and false negative probabilities (FNP). This may or may not be evidence of bias - the authors would need to run a statistical test to determine that, which they don't. In fact, I can't even find the place in their R notebook where they did that calculation. Is this the result of bad statistics? Is it merely random chance? Who knows!

Looking at PP's Jupyter Notebook, the calculations seem to be performed at lines 50 onwards (if you're referring to the table that I think you're referring to).

FWIW, those "weasel words" you allege appear in the write-up of the methodology, where the audience is expected to follow along and see how the 0.057 is calculated. I'm not sure how you're interpreting that calculation... My read is that it's not the bedrock on which all the other analyses rest. Where in the story do you see that particular calculation being used as the main (or even ancillary) thrust of the piece?


Aggregate false positive/false negative rates don't prove anything on their own. They can be caused by composition differences, which the analysis shows exist.

But I'm sure they look convincing to ProPublica's readers who are not statistically sophisticated.
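A hedged toy illustration of that composition point (all numbers made up): two groups in which every stratum has the exact same false positive rate, yet the aggregate FPRs differ purely because the groups mix the strata in different proportions.

```python
# Same classifier behavior within each stratum, for both groups:
FPR_LOW, FPR_HIGH = 0.2, 0.6

def aggregate_fpr(share_high):
    """Mixture FPR when `share_high` of a group's negatives fall in the high-FPR stratum."""
    return (1 - share_high) * FPR_LOW + share_high * FPR_HIGH

print(round(aggregate_fpr(0.2), 2))  # group A (20% high stratum): 0.28
print(round(aggregate_fpr(0.8), 2))  # group B (80% high stratum): 0.52
```

The per-stratum rates are identical, so the 0.28 vs. 0.52 gap says nothing about differential treatment without further analysis.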


What's even more concerning is that these statistical noobs sometimes look to experts, unaware that these math geniuses may lack the literacy to read a plaintext notebook to the end.


I don't think advertising for jobs is a zero-sum game, especially if the matching algorithm is good enough to match employers and employees that had no knowledge of each other before. If you are able to reduce those search frictions, you have created value. Several economics professors won the Nobel Prize for their work in this area:

http://economix.blogs.nytimes.com/2010/10/11/the-work-behind...

https://en.wikipedia.org/wiki/Search_theory


Well, those Nobel Prizes were awarded for a specific theory. In practice, in this case, you see developers looking to get problems solved (or solving problems) getting distracted into looking for another job. So even if you are removing market friction, there's a huge cost in people switching jobs (or being distracted all the time). In my opinion, if people are looking for a job, they should go to a job-hunting website (even if it has lower-quality data about them).

Anyway, it would be nice to have the effects properly quantified.


There are many developers looking for a good job. There are many companies having difficulty finding qualified developers.

Our goal is to solve those problems, and it is not a zero-sum game.


To be a little more clear about the diamond and circle...

LASSO is a diamond because it represents the constraint |w_1| + |w_2| <= 1. The region of (w_1, w_2) pairs satisfying that inequality is a diamond (a square rotated 45 degrees).

Ridge is a circle because it represents the constraint w_1^2 + w_2^2 <= 1. The region of (w_1, w_2) pairs satisfying that inequality is a disk bounded by the unit circle.
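To make the geometry concrete, here's a small illustrative sketch (my own, not from the post) testing membership in the two constraint regions. The point (0.7, 0.7) fits inside the L2 disk but falls outside the L1 diamond, which is the corner-versus-curve difference that makes LASSO prefer sparse solutions:

```python
def in_lasso_ball(w1, w2, t=1.0):
    # L1 ball: |w1| + |w2| <= t  (the diamond)
    return abs(w1) + abs(w2) <= t

def in_ridge_ball(w1, w2, t=1.0):
    # L2 ball: w1^2 + w2^2 <= t  (the disk)
    return w1**2 + w2**2 <= t

print(in_lasso_ball(0.7, 0.7))  # False: 1.4 > 1
print(in_ridge_ball(0.7, 0.7))  # True: 0.98 <= 1
```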


Agh I forgot to mention this! Thank you, I'll add a note to my post.


This is the kind of reasoning I was looking for. Thanks!


Glad you got an answer. :)


Does SSRN come with any guarantee of peer review or minimum quality threshold? The only downside with arXiv is that anything can be posted. It works fine as a paper-hosting service but is terrible as a "social proof that my paper is OK". If SSRN comes along with such a reputation, then it might be a hard transition.

However if SSRN is just a paper-hosting service, then everyone should move to the arXiv immediately.


SSRN posts anything (no peer review). It has the same function as arXiv.

The only issue I see with moving to arXiv right now is that currently arXiv only accepts papers on topics such as physics, math, and CS. For arXiv to be helpful to the SSRN crowd, it would need to have a few new categories (e.g., "quantitative social science", which could be split into subtopics mirroring the main topics in economics and management science journals).


Totally agree; this is basically a textbook example of the "No True Scotsman" argument.


I think it's possible that it's both a sickness and a boon. I fully agree with your point that the "fear of being a loser" does drive people to work harder, do better, etc. All of that has (generally) positive effects (well, assuming you are willing to ignore the fact that many people will use less-than-ethical means of making more money).

However, I can also see it as a sickness in the sense that we fetishize this extreme success. Imagine a hypothetical person who has to choose between doing some risky start-up that will consume their life or taking a solid job that lets them work 40 hours per week and then enjoy their personal life. Nobody would explicitly blame the person who chooses the latter option, but we celebrate (especially on HN) the people who choose the former. We celebrate it to the point that many people choose the harder option simply because they feel like a loser for not going after it and throwing everything they have at it. I don't see it as a good thing that everyone on the fence decides to go for the risky/hard path simply because of some societal zeitgeist.

Then again, as you point out, those individual choices lead to our collective advancement, so it's a really hard balance.


www.guessthekarma.com

It's a simple game where I show you two images from Reddit (SFW of course) and ask you to guess which was more popular. I'm using the data from the site as part of my research into the dynamics of internet popularity.

It's awesome because it's showing that Reddit is a pretty random/fickle thing. In the first iteration of the experiment (we're on the second now), people couldn't really do much better than randomly guessing. Even when one image had 10,000 upvotes and the other 10, people could only pick the popular one about 55% of the time. I wrote up some quick results in this blog post: https://medium.com/@gregstod/guess-the-karma-2-0-82a224a691f...

I'd very much appreciate it if you played the game and donated a few data points :-)


I'm 3/3 so far! Cool site man


One UI suggestion: make the images bigger!


Thanks for the suggestion. If you click on them, they'll pop out to full size. Otherwise, I couldn't figure out how to get them to appear OK on both desktop and mobile (because I'm not really a web developer).

