Machine Learning Algorithms Cheat Sheet (accel.ai)
251 points by headalgorithm on Feb 19, 2022 | hide | past | favorite | 44 comments


I don't get the purpose of these cheatsheets. Haven't we already automated these decisions and model optimization with AutoML?

AI or data science is not my full-time day job, but my go-to when I need an ML model is an AutoML library, such as AutoSklearn [1] or AutoKeras [2].

[1] https://automl.github.io/auto-sklearn/master/

[2] http://autokeras.com/


It can take hours, days, or weeks, depending on the data set, to even know whether a model is going to work. In these cases, blindly searching the entire model space is sort of futile.

But then, cheatsheets like these don't help much either.


Machine learning isn't an exam (and if it were, it would be an open-book one), so nobody really needs cheat sheets.

There is, however, a part of machine learning practice that is not covered by textbooks.


AutoML is good most of the time, but there is still a place for custom model development in applications where you need to adjust and tune parameters yourself.


I totally agree. Personally, I don't have the skills to tackle these particular cases, and cheat sheets don't help me either. And I presume (maybe I'm wrong?) they're also useless for those with advanced skills for particular cases...



That is the one I give to students who want to get started with ML.

Has links to relevant papers for the implementations.


Too much focus on implementation details and not enough emphasis on state-of-the-art theory. A new practitioner should start with probabilistic learning for theory and deep learning on the practical side. A cursory overview of linear regression, decision trees, and the various linear-algebra vector-learning tricks is more than enough. They are mathematically interesting but not particularly useful except on sorted and cleaned tabular data. The best introduction to ML is demonstrating its solutions to previously intractable classic problems: realistic text-to-speech, speech recognition, image classification, image generation, natural language processing. A boosted decision tree is so limited in its applications (outside of Kaggle competitions) that it should be something to learn after the fact.

Understanding maximum likelihood and probability chaining at the information-theoretic level is a lot more important than focusing on the nitty-gritty details of the Generalized Linear Model du jour. A good understanding of ideas like KL divergence and message passing is more crucial than knowing which tree or forest algorithm to use.

Many tutorials like to focus on linear and logistic regression because they are well understood and mathematically straightforward (for the educator). But machine learning is not purely a field of math, nor is it merely statistics. Too many practitioners never go beyond formula chugging and end up missing the forest for the trees. Sure, a big enough model will solve almost any problem. However, without understanding the information-theoretic and probabilistic explanations for why a model can learn and function, ML will always remain a bag of tricks.
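To give the flavor of the ideas mentioned above, here is a minimal numpy sketch of KL divergence for discrete distributions (my own toy example, not from the article):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions, in nats.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0
    contribute nothing by convention (0 * log 0 = 0).
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.5])   # fair coin
q = np.array([0.9, 0.1])   # heavily biased coin

kl_pq = kl_divergence(p, q)  # note: asymmetric, differs from D_KL(q || p)
```

Nothing deep, but it makes the asymmetry and the "extra bits needed when you assume q but the truth is p" interpretation concrete.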


Can you suggest any resources for the information theoretic level explanation that you talk about? Do either of the Hastie-Tibshirani books cover it?


https://probml.github.io/pml-book/

Murphy's should be more than sufficient.

Mackay (Information Theory, Inference, and Learning Algorithms) and Bishop (Pattern Recognition and Machine Learning) are also commonly recommended, but both books are more than a decade old.


Great! Thanks.


So many things wri g with it. PCA and SVD are placed on opposite nodes, but PCA is actually computed using SVD!
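The PCA-via-SVD relationship is easy to check numerically. A toy numpy sketch (my own, not from the cheat sheet) showing that the singular values of the centered data matrix recover the eigenvalues of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features
Xc = X - X.mean(axis=0)                # PCA requires centered data

# PCA via eigendecomposition of the sample covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals = eigvals[::-1]                # eigh returns ascending order; flip to descending

# PCA via SVD of the centered data matrix (how libraries actually do it)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S**2 / (len(Xc) - 1)        # squared singular values give the same variances
```

The rows of `Vt` are the principal axes (up to sign), and `svd_vals` matches `eigvals`, which is why sklearn's PCA is implemented on top of SVD rather than an explicit covariance eigendecomposition.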


I don't understand what use case they've given for SVD. Are they suggesting using it for compressing an image? Because that's a terrible idea, with an algorithmic complexity of O(n^3) and a very bad compression ratio.

[edit]

Just to underscore the point that you should never use SVD for compression except as a teaching example consider a comment here: https://mathoverflow.net/questions/408504/listing-applicatio...

The comment says

  That's a very nice way of getting some intuition about the SVD. But this is by no means an application of the SVD. Compression [sic] images this way is terrible by any standard (you get very bad image for still quite large storage and computing the compression is very slow).
This is by an expert.
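For concreteness, here is what the rank-k approximation under discussion looks like in numpy (a sketch of my own, as a teaching illustration rather than an endorsement): the truncated factors take k(m + n + 1) numbers versus mn for the original, and the reconstruction error is usually large for any k small enough to save space.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((64, 64))             # stand-in for a grayscale image

# Full SVD, then keep only the top-k singular triplets
U, S, Vt = np.linalg.svd(img, full_matrices=False)
k = 8
approx = (U[:, :k] * S[:k]) @ Vt[:k, :]

# Storage cost: k*(m + n + 1) numbers vs m*n for the raw image
stored = k * (img.shape[0] + img.shape[1] + 1)
original = img.size
rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
```

Eckart-Young says this is the best rank-k approximation in the Frobenius norm, which is exactly why it is a nice intuition-builder for the SVD and a poor image codec: "best for a rank budget" is not the same as "good compression."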


Typo: "wri g" should be "wrong".


Ah, dammit, I thought he meant "wri g". This changes the meaning entirely. Well, I disagree with that, but I thought there was some real insight in the observation that there are many things wri g with it.


Looking through your comment history, you appear to have a history of passive-aggressive comments like the one above. A typo as extensive as "wri g" is pretty hard to read at first glance, because it has an intervening space.


This diagram is useful in two ways:

1. If you're an aspiring data scientist, you should know a little bit about each method mentioned there, at least. So it's a handy list.

2. It stimulated discussion on HN, with much better answers.


Looking only at the first graph, I'm curious why they left out data size requirements. Does the problem you want to solve have access to a lot of data, or only very little (or is data gathering costly)? Is it noisy? Biased? The latter might be beyond the scope, but the amount of data available seems crucial for a cheat sheet like this.

Imagine following the graph, deciding on a neural network, and then realizing afterward that you can't collect enough data for it to work.

Looks like a low-effort article to me.


I agree, but I haven’t found any decent cheat sheet resources discussing problems in all these dimensions. Got any ideas? Open to textbooks as well.


Strange to see dimensionality reduction as the root node? I’m having a hard time reading this diagram.


Presenting dimensionality reduction as the starting point in machine learning is unusual, but it does make sense, technically. A classification model can be seen as mapping inputs from an N-dimensional space to a set of labels. Regression maps N-dimensional spaces to the real line, and so on. Even a sample mean reduces a set with dimensionality equal to the sample size to a single real number.


You're right but really the sheet is just a mess. Naive Bayes is on both sides of "Data Too Large" and the wrong side of "Explainable."

It's almost like they put some algorithm names and machine learning concepts in a hat and drew them at random.


Reinforcement learning gets ignored again... :/


They haven’t made it easy enough to understand yet


Reinforcement learning is rarely a good choice


Yes but sometimes it’s your only choice.


For what practical applications is it the only choice?


Playing Go at superhuman levels, or playing StarCraft at above diamond level. Not sure about Dota, but I wouldn't be surprised if RL outperformed hand-crafted AI there as well.


Controlling a tokamak: https://news.ycombinator.com/item?id=30379973

AlphaZero is RL as well.

Isn't that the sort of thing you thought of as a kid when thinking of AI, rather than "here's another way to serve ads and make hopefully-not-racist financial decisions"?


Tokamak control has some RL experiments, but it is definitely not the only way.

AlphaZero is… not quite a practical application.

I am a big believer in using AI to improve the world, but RL is just a direction that isn’t yet sufficiently mature. I think they still need their convnet/transformer moment.


Kids are dumb. The real value of machine learning is mundane business decisions that used to rely on poorly informed gut feelings or painstaking manual effort.


Information Retrieval for one.


RL is definitely not the only option in the information retrieval space.


Hey king, just curious - still do Crossfit 8 years later? ;) Asking here because you have no contact in your profile!


Hi! Sadly I do not, it was a bit too rough on my body. But still work out, and a lot of what I learned at Crossfit is still useful today. But I'll never go back, ha.


If you’re consulting an ML cheat sheet it’s almost certainly not one of those times


Who the motherfucker decided to put a subscribe to mailing list form that blocks 1/3 of the screen on mobile?


Very useful, great AI/ML taxonomy. Tx


Blogspam


Accel.ai is a non-profit


These two things are not mutually exclusive. Despite the word count this is a pretty low-effort, superficial article.


Yes. It reads like thousands of other articles on Medium or equivalent... just content marketing hoping to get some traffic.


So is OpenAI


Not since 2019. From the wikipedia entry:

"In 2019, OpenAI transitioned from non-profit to for-profit. The company distributed equity to its employees and partnered with Microsoft Corporation, who announced an investment package of US$1 billion into the company. OpenAI then announced its intention to commercially license its technologies, with Microsoft as its preferred partner."



