A Practical Guide to Tree-Based Learning Algorithms

iamnafets · on July 23, 2017

I've found Adele Cutler's presentation on random forests to be an outstanding resource for getting intuition of tree-based algorithms.

http://www.math.usu.edu/adele/RandomForests/UofU2013.pdf

Thinking about trees as a supervised recursive partitioning algorithm or a clustering algorithm is useful for problems that may not appear to be simple classification or regression problems.

lackadaisicall · on July 23, 2017

I like this one better:

https://web.csulb.edu/~tebert/teaching/lectures/551/random_f...

I made it.

claytonjy · on July 23, 2017

On the topic of complementary resources, I really like Ben Gorman's explanation: https://gormanalysis.com/random-forest-from-top-to-bottom/. His related posts on singular decisions trees and GBM's are just as good, too.

thearn4 · on July 23, 2017

As interesting as I find the current state of deep learning to be, there is something about random forests that I can't help but find much more cool. Probably the amazing out-of-box performance.

nerdponx · on July 24, 2017

You might also be interested in algorithms like Adaboost [0] and its successors, including the various "gradient boosting" algorithms like XGBoost, LightGBM, and the newly-open-sourced Catboost.

0: https://jeremykun.com/2015/05/18/boosting-census/

shoo · on July 24, 2017

a great paper on this is Friedman's "gradient boosting machine" paper, where he shows how the boosting idea can be generalised to support a range of different loss functions and underlying approximation schemes (especially trees).

"Greedy function approximation: a gradient boosting machine" - JH Friedman

platz · on July 24, 2017

also the model is analyzable so as to determine the variables which are contributing the most.

nerdponx · on July 24, 2017

For some very nice Random Forest visualizations, check out the R package "forestFloor" [0].

I also once started implementing a R package for "partial dependence plots" [1][2], which are popularly associated with Random Forests but aren't specific to them.

[0]: https://CRAN.R-project.org/package=forestFloor

[1]: http://scikit-learn.org/stable/auto_examples/ensemble/plot_p...

[2]: https://github.com/gwerbin/statsplots/blob/master/R/partialp...

codewithcheese · on July 24, 2017

You might be interested in this approach to explaining the predictions for any classifier. https://github.com/marcotcr/lime

petters · on July 23, 2017

Yes, they have very few knobs to turn, which is very attractive.

Aqwis · on July 24, 2017

Does anyone know why most machine learning libraries (notably scikit-learn) implement trees and ensembles of trees based on the CART algorithm? It seems like using other types of trees (See5, MARS) particularly in ensembles could possibly have advantages as these types of trees were specifically developed as improvements to CART/C4.5.

lackadaisicall · on July 24, 2017

> Does anyone know why most machine learning libraries (notably scikit-learn) implement trees and ensembles of trees based on the CART algorithm?

This is just my theory.

Because it was the first tree based algorithm and Leo Brieman really did market it out. He even trademark Random Forest.

Kinda like how XGboost is doing right now.

My professor is also trying to market his version out too. If I get around finishing my thesis. His algorithm problem is that it isn't ported to any language at all. It's written years ago in a C and he's not a programmer.

I'd imagine it is the same with the other algorithms. Leo on the other hand is a CS major on top of a Stat major.

Also there are tons of regression algorithms out there that can be made into trees (their fully nonparametric counter part).

But in the end linear regression is the most popular next to logistic iirc. There's survival trees and BART bayesian trees which is in it's infancy.

joe636434 · on July 24, 2017

A professor who invents his own version of tree but can not program. Seriously. Is this common in academic circles where a computer professor who can not program ?

nerdponx · on July 24, 2017

AFAIK:

- ID3, CART, C4.5, and C5 are all conceptually equivalent "recursive partitioning" algorithms, and CART is sometimes used as a catch-all term instead of the phrase "recursive partitioning".

- MARS requires two passes over the data

- CART is "dumber" than CHAID, which could be seen as a benefit for "high-volume" ensembles like RFs and GBMs. One blogger writes that CHAID is a better explanatory/exploratory tool, while CART is a better prediction tool: http://www.bzst.com/2006/10/classification-trees-cart-vs-cha...

Some other comparisons:

https://stats.stackexchange.com/a/61245/36229

https://stackoverflow.com/q/9979461/2954547

So the answer is that CART specifically isn't used everywhere. Recursive partitioning is used everywhere, mostly because it is simple.

6502nerdface · on July 23, 2017

Nice write-up, thanks for sharing. One possible typo I noticed:

> Maximum depth of tree (vertical depth) The maximum depth of trees. It is used to control over-fitting, higher values prevent a model from learning relations which might be highly specific to the particular sample.

Shouldn't it be lower values, i.e., shallower trees, that control over-fitting?

sadanand4singh · on July 23, 2017

Thanks for pointing. Yes it should be lower value to prevent over-fitting.

Bishonen88 · on July 23, 2017

df_train_set.Income.value_counts() should be df_test_set probably in the part where it's comparing both.

lugg · on July 24, 2017

Is OP here?

Can you please remove the text justification? Makes it really hard to read on mobile.

ErikBjare · on July 24, 2017

Looks fine to me. I wouldn't want him to remove it.

zephyrppt · on July 24, 2017

Looks fine in portrait mode, width-wise truncated in landscape mode.