It can take hours, days, or weeks, depending on the data set, just to find out whether a model is going to work. In those cases, blindly searching the entire model space is sort of futile.
But then, cheatsheets like these don't help much either.
AutoML is good most of the time, but there is still a place for custom model development in particular applications where you need to adjust and tune parameters.
I totally agree. Personally, I don't have the skills to tackle these particular cases, and cheatsheets don't help me with them either. And I presume (maybe I'm wrong?) they're also useless for those with the advanced skills those particular cases require...
Too much focus on implementation details and not enough emphasis on state-of-the-art theory. A new practitioner should start with probabilistic learning for theory and deep learning on the practical side. A cursory overview of linear regression, decision trees, and the various linear-algebra vector-learning tricks is more than enough. They are mathematically interesting but not particularly useful except on sorted and cleaned tabular data. The best introduction to ML is a demonstration of its solutions to previously intractable classic problems: realistic text-to-speech, speech recognition, image classification, image generation, natural language processing. A boosted decision tree is so limited in its applications (Kaggle competitions aside) that it should be something to learn after the fact.
Understanding maximum likelihood and probability chaining at the information-theoretic level is a lot more important than focusing on the nitty-gritty details of the Generalized Linear Model du jour. A good understanding of ideas like KL-divergence and message passing is more crucial than knowing which tree or forest algorithm to use.
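For what it's worth, the basic properties of KL-divergence are easy to see numerically. A minimal sketch (the two distributions are made up for illustration):

```python
import numpy as np

# Two hypothetical discrete distributions over the same three outcomes.
p = np.array([0.7, 0.2, 0.1])
q = np.array([1/3, 1/3, 1/3])  # uniform

# KL(p || q) = sum_i p_i * log(p_i / q_i)
# Non-negative, zero iff p == q, and asymmetric in its arguments.
kl_pq = float(np.sum(p * np.log(p / q)))
kl_qp = float(np.sum(q * np.log(q / p)))

# Minimizing KL(data || model) over model parameters is exactly
# maximum-likelihood estimation -- the link the parent is pointing at.
```

Here kl_pq ≈ 0.297 nats while kl_qp ≈ 0.324, so the order of arguments matters, which is why "distance" is the wrong mental model for it.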
Many tutorials like to focus on linear and logistic regression because they are well understood and mathematically straightforward (for the educator). But machine learning is not purely a field of math, nor is it merely statistics. Too many practitioners never go beyond formula chugging and end up missing the forest for the trees. Sure, a big enough model will solve almost any problem. However, without understanding the information-theoretic and probabilistic explanations of why a model can learn and function, ML will always remain a bag of tricks.
MacKay (Information Theory, Inference, and Learning Algorithms) and Bishop (Pattern Recognition and Machine Learning) are also commonly recommended, but both books are more than a decade old.
I don't understand what use case they've given for SVD. Are they suggesting using it to compress an image? Because that's a terrible idea: O(n^3) algorithmic complexity and a very bad compression ratio.
That's a very nice way of getting some intuition about the SVD, but it is by no means an application of the SVD. Compressing images this way is terrible by any standard (you get a very bad image for still quite a lot of storage, and computing the compression is very slow).
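To make the trade-off concrete, here's a minimal sketch of the rank-k truncation being discussed (numpy, with a random matrix standing in for an image):

```python
import numpy as np

# A hypothetical 64x64 grayscale "image" (random values, purely illustrative).
rng = np.random.default_rng(0)
img = rng.random((64, 64))

# Truncated SVD: keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(img, full_matrices=False)
k = 8
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Storage: k * (64 + 64 + 1) = 1032 numbers instead of 64 * 64 = 4096,
# but the reconstruction error is large unless the image is nearly low-rank.
err = np.linalg.norm(img - approx) / np.linalg.norm(img)
```

For natural images, k has to be fairly large before the result looks acceptable, which is why this makes a nice linear-algebra demo but a poor codec.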
Ah, dammit, I thought he meant "wri g". This changes the meaning entirely. Well, I disagree with that, but I thought there was some real insight in the observation that there are many things wri g with it.
Looking through your comment history, you appear to have a history of passive-aggressive comments like the one above. A typo as extensive as "wri g" is pretty hard to read at first glance, because it has an intervening space.
Looking only at the first graph, I'm curious why they left out "data size requirement". Does the problem you want to solve have access to a lot of data, or only very little (or is data gathering costly)? Is it noisy? Biased? The latter might be beyond the scope, but the amount of data available seems crucial for a cheatsheet like this.
Imagine following the graph, deciding on a neural network, and only then realizing you can't collect enough data for it to work.
Presenting dimensionality reduction as the starting point in machine learning is unusual, but it does make sense, technically. A classification model can be seen as mapping inputs from an N-dimensional space to a set of labels. Regression maps an N-dimensional space to the real line, and so on. Even a sample mean reduces a set with dimensionality equal to the sample size to a single real number.
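That framing is easy to make concrete. A toy sketch (the sample, centroids, and test points are all made up):

```python
import numpy as np

# The simplest "reduction": a sample mean maps a point in R^4 to a single real.
sample = np.array([1.0, 2.0, 3.0, 4.0])
mean = sample.mean()  # R^4 -> R

# A toy nearest-centroid classifier: a map from R^2 to the label set {0, 1}.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # hypothetical class centers

def classify(x):
    """Return the index of the nearest centroid (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(centroids - np.asarray(x), axis=1)))
```

Both functions collapse a higher-dimensional input to something much smaller, which is the point being made above.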
Playing Go at superhuman levels, or playing StarCraft above diamond level; I'm not sure about Dota, but I wouldn't be surprised if RL outperformed hand-crafted AI there too.
Isn't that the sort of thing you thought of as a kid when thinking of AI, rather than "here's another way to serve ads and make hopefully-not-racist financial decisions"?
Tokamak control has seen some RL experiments, but RL is definitely not the only approach there.
AlphaZero is… not quite a practical application.
I am a big believer in using AI to improve the world, but RL is just a direction that isn’t yet sufficiently mature. I think they still need their convnet/transformer moment.
Kids are dumb. The real value of machine learning is in mundane business decisions that used to rely on poorly informed gut feelings or painstaking manual effort.
Hi! Sadly I do not; it was a bit too rough on my body. But I still work out, and a lot of what I learned at CrossFit is still useful today. But I'll never go back, ha.
"In 2019, OpenAI transitioned from non-profit to for-profit. The company distributed equity to its employees and partnered with Microsoft Corporation, who announced an investment package of US$1 billion into the company. OpenAI then announced its intention to commercially license its technologies, with Microsoft as its preferred partner."
AI or Data Science is not my full-time day job, but my go-to when I need any ML model is an AutoML library, such as AutoSklearn [1] or AutoKeras [2].
[1] https://automl.github.io/auto-sklearn/master/
[2] http://autokeras.com/