I am not sure how much knowing the underlying math helps you when you are using neural nets. Yes, we are doing matrix multiplies and using gradient descent. That is stuff you probably learned in high school; you just learned partial derivatives in two dimensions instead of tens or hundreds of dimensions.
Even if you understand the math, you are training tens of thousands of parameters inside your model. You are also passing them through nonlinear elements like sigmoids or ReLUs. I am not sure what insight knowing calculus or linear algebra will provide you in building the model unless you can process nonlinear elements in more than three dimensions in your head. I am sure there are people who can do it, but how many?
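To make the point concrete, here is a minimal sketch (in NumPy, with made-up layer sizes) of the kind of composition being described: a 10-dimensional input passed through two nonlinear elements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes for illustration: 10 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(size=(10, 4))
W2 = rng.normal(size=(4, 1))

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = relu(x @ W1)        # first nonlinear element
    return sigmoid(h @ W2)  # second nonlinear element

x = rng.normal(size=(10,))
y = forward(x)  # a single value between 0 and 1
```

Even this toy has 44 parameters; visualizing how they jointly shape the output is exactly the part nobody does in their head.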
Actually, very much. Think of it from an industry perspective.
If you are a software engineer who needs to use a network as a plug-in module, then you may not need much understanding of linear algebra. But then you don't need to know much ML either; it is simply a software engineering job.
However, things change for anyone who gets their hands dirty or actually builds (even just implements known networks) from scratch.
It is common to find that an error caused by transferring a known model to your problem was the result of a key mistake in how the question was posed mathematically. I am currently implementing neural nets in my job, and I have made many errors by misunderstanding the mathematical operation of a layer.
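For what it's worth, a common shape of that kind of mistake (this is a generic NumPy sketch, not the poster's actual bug) is broadcasting silently changing the mathematical operation, e.g. an elementwise difference becoming a pairwise one:

```python
import numpy as np

y_true = np.arange(4, dtype=float)    # shape (4,)
y_pred = y_true.reshape(-1, 1) + 0.1  # shape (4, 1), e.g. a model's column output

# Intended: the mean of four squared errors. Broadcasting instead
# expands (4, 1) - (4,) into a (4, 4) matrix of pairwise differences,
# so the "loss" is computed over 16 numbers, 12 of them meaningless.
wrong_mse = np.mean((y_pred - y_true) ** 2)          # about 2.51
right_mse = np.mean((y_pred.ravel() - y_true) ** 2)  # 0.01, the intended loss
```

Both lines run without error, which is what makes this class of bug so easy to ship.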
PGMs are almost entirely math, and so are embeddings and matrix factorization models. VAEs and GANs also need a solid grasp of the math behind them. Want to touch your loss function? -> Math. Want to change optimization methods? -> Math.
The designs of almost all neural nets are grounded in math, and that fact makes it vital to be good at it if one wishes to gain anything beyond a surface-level understanding.
I am not arguing there is no math involved, but gradient descent is basic calculus, and MSE, MAE, etc. are things you take in high school. This stuff is pretty basic. Even in gradient descent you might have multiple minima; the minimum you get will depend on which set of points you start at. Imagine a 3D curve: you might have different valleys.
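The multiple-minima point is easy to demonstrate with a toy one-dimensional example (the function and learning rate here are arbitrary choices for illustration): f(x) = x^4 - 2x^2 has valleys at x = -1 and x = +1, and plain gradient descent lands in whichever valley the starting point falls toward.

```python
def grad(x):
    # Derivative of f(x) = x**4 - 2*x**2, which has minima at x = -1 and x = +1.
    return 4 * x**3 - 4 * x

def descend(x, lr=0.05, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(0.5))   # converges near +1
print(descend(-0.5))  # converges near -1
```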
Let me quote François Chollet:
"Neural networks" are a sad misnomer. They're neither neural nor even networks. They're chains of differentiable, parameterized geometric functions, trained with gradient descent (with gradients obtained via the chain rule). A small set of highschool-level ideas put together
You need a great knowledge of math to internalize the complexity of neural nets. One particular area of mathematics that is not in the set of highschool-level ideas is topology.
Quick question: do you do much machine learning in your current job on complex and large datasets? I find that the people who claim that machine learning is not math-heavy are not solving the actual hard ML problems. Of course it will be easy if your problem maps well to some framework, where jamming your data into a model produces useful outputs, but there are plenty of problems where you can't do that, and the second that happens, you need to understand the algorithms. It's similar to using some JavaScript framework: plugging in solutions works great until you need to do something outside the framework.
I have been getting more into machine/deep learning et al. recently and have been pleasantly surprised that the calc and linear algebra I learnt 25 years ago just need a review; it all still works the same! So refreshing after the short shelf life (or half-life) of many tech skills. These are the skills that matter (and that, if I'd been entirely self-taught, I never would have learnt).
Well said! It's pretty funny: back when I was studying neuroscience, I used to constantly loathe the fact that my undergrad made me study linear algebra, calc, etc. Now, a few years later, I'm neck deep in all this theoretical ML and boy am I thankful for it haha
As someone who took calculus, linear algebra, and a couple of CS-related courses in my economics program at university 6 years ago, I wish I had had someone around who could explain to me (better) why and how all those courses could be helpful to me. ML and AI were not big in those days (at least I didn't hear as much about them as I do now). Now, looking for my first Python web developer job and dreaming about eventually switching to something more ML-related, I am thankful I didn't completely fail those courses, but I feel extremely sorry I didn't study them properly.
You can get a fair way these days without knowing any maths at all; the tools and libraries are very good, and there are a ton of tutorials and sample code out there. But what a grasp of the maths really gives you is intuition, which you will really, really need as soon as you go off the beaten track. Without it you will just get bogged down in hyperparameter sweeps and other tasks that just don't scale.
Calculus provides the foundation for finding the optimal solutions to the math problems you set up. This happens through a few routes: derivatives and curvature drive gradient descent, and the theory of real functions provides guarantees regarding convex functions and, more generally, where optimal values can occur (extrema and stationary points).
Linear algebra is, in general, the language used to set up and solve the problems systematically (on a computer).
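As one small illustration (the shapes here are made up), a fully connected layer applied to a whole batch of inputs is a single matrix product, which is exactly the systematic setup linear algebra buys you:

```python
import numpy as np

rng = np.random.default_rng(1)

# X holds a batch of 32 examples with 8 features each;
# W maps 8 features to 3 outputs, b is the bias.
X = rng.normal(size=(32, 8))
W = rng.normal(size=(8, 3))
b = np.zeros(3)

out = X @ W + b  # shape (32, 3): every example transformed in one product
```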
Probability and stats build on calculus to provide formal methods for formulating the problems. ML adds techniques to this area to tackle problems focused on getting machines to learn and solve specific domain problems.
Deeper math is probably more necessary when developing machine learning algorithms. To apply machine learning techniques, less math is required. But knowing linear algebra and matrix manipulations (MATLAB or R fluency, etc.) will not be wasted effort either way.
It is concerning to me how many people are deploying these techniques without first understanding the principles and limitations of multivariate regression and time series analysis.
I'm not sure why people are diving into modern techniques without knowing how to properly specify a simple (but powerful) regression model.
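For concreteness, here is a minimal sketch of specifying and fitting such a regression with ordinary least squares (the data-generating coefficients and noise level are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented ground truth: y = 1 + 2*x1 - 3*x2 + small noise.
n = 200
X = rng.normal(size=(n, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.01 * rng.normal(size=n)

# Specify the model: add an intercept column, then solve least squares.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef recovers approximately [1, 2, -3]
```

The same "pose it as a linear system, then solve" step is hiding inside far fancier models.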
It is concerning to me that people are driving cars without first understanding the principles of mechanical engineering, thermodynamics, and fluid dynamics. I'm not sure why people are diving into modern cars without knowing how to properly build a simple (but powerful) steam engine.
The point being, there's a distinction between "machine learning research" and "applied machine learning". Of course those points are on a continuum, not separated by a bright line. But the point is, there are different roles and those different roles have different goals and requirements.
Of course knowing more math and theoretical foundations enables you to do more in some senses... just like a basic knowledge of fluid dynamics would be useful if you want to "port and polish" the cylinder heads in your car. But in reality, a minuscule portion of the population of car owners will ever want to do this. OTOH, it is essential if you're the person designing the car engine in the first place.
No, it's not a perfect analogy, but I think the overall point stands: some people need to know the deep, deep details of linear algebra, multivariable calculus, probability theory, measure theory, topology, etc., etc., for their goals in machine learning, while other people can achieve quite a lot without all of that knowledge.