Tapkee, a dimension reduction library (lisitsyn.me)
45 points by urlwolf on March 7, 2015 | 7 comments



Can anyone explain to a newbie why LDA and embedding through neural networks were not included as dimensionality reduction techniques?


[Assuming you meant LDA = Linear Discriminant Analysis.] It looks like the toolkit implements "unsupervised" methods, where the datapoints don't have a special "label" feature that is treated specially in the embedding. If you do have labels, see also neighbourhood components analysis (NCA), amongst others.

They say the library is biased towards spectral methods. Getting neural net methods to work requires a different type of experience.

If you're prepared to use Matlab, there are neural net and some supervised methods in this toolbox: http://lvdmaaten.github.io/drtoolbox/ A bunch of this stuff, if not all, might work in Octave too.

Some practical advice. Try linear methods like PCA and LDA first. Think about and/or try out different rescalings and representations of your features. Also look at what the code in a "blackbox" toolbox is doing. From memory, I think the Matlab toolbox reduces the dimensionality to 20 with PCA before applying most other methods. (What if your data has fewer than 20 dimensions? Change that code!)
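To make the "rescale, then PCA first" advice concrete, here is a minimal NumPy sketch. It is not code from Tapkee or from the Matlab toolbox; the function names `standardize` and `pca_reduce` and the `target_dim` parameter are made up for illustration. The default of 20 only mirrors the from-memory figure above, and the clamp shows one way to cope with data that has fewer dimensions than that.

```python
import numpy as np


def standardize(X):
    """Rescale each feature (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features become all zeros instead of NaN
    return (X - mean) / std


def pca_reduce(X, target_dim=20):
    """Project X (n_samples x n_features) onto its top principal components.

    target_dim is an illustrative default. It is clamped to the data's own
    size, so inputs with fewer dimensions keep all of their components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA requires centred data
    k = min(target_dim, Xc.shape[0], Xc.shape[1])
    # Thin SVD: rows of Vt are principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    total = np.sum(S ** 2)
    explained = S ** 2 / total if total > 0 else np.zeros_like(S)
    return Xc @ Vt[:k].T, explained[:k]


# Toy data whose features live on wildly different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 1.0])

Z, ratio = pca_reduce(standardize(X), target_dim=20)
print(Z.shape)  # (100, 5): only 5 input dimensions, so the request for 20 is clamped
print(ratio)    # fraction of variance captured by each component
```

Running `pca_reduce(X)` without `standardize` on the same data lets the feature with the largest scale dominate the first component, which is the kind of thing worth checking before blaming the fancier methods.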

I've had poor experience with most of the spectral methods. Things I've got to work include the simplest (like PCA), NCA, t-SNE and variants, and auto-encoders. On many datasets, PCA has worked the best!

If you meant LDA = Latent Dirichlet Allocation: It looks like the toolkit implements methods that are mainly used for real valued features, rather than sets of counts.


I meant Linear Discriminant Analysis - good answer! I've experienced similar results with PCA vs other methods.


Let me comment on that as an author of the library. There are a few reasons, actually. First, as mentioned, it was biased towards spectral methods from the very beginning. The second reason is that I didn't know how to implement and train neural nets properly. And the last reason why it is not here yet: unfortunately I haven't had much time to implement it, so contributions are welcome :)


> "The library is distributed under permissive BSD 3-clause license"

Fantastic! That means it is actually an interesting announcement and worth digging deeper into.


Where's the SVD? Did I miss something?


Comes with a great command-line tool.



