Learn to learn: Hyperparameter Tuning and Bayesian Optimization

bayesian optimization

In machine learning models, we often need to manually set various hyperparameters, such as the number of trees in a random forest or the learning rate of a neural network. In traditional optimization problems, we can rely on gradient-based approaches to find the optimum. Hyperparameter tuning, however, is a black-box problem: we usually have no closed-form expression for the objective function and do not know its gradient. In this post, I will discuss different approaches to hyperparameter tuning and how we can learn to learn.
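To make the black-box setting concrete, here is a minimal sketch of random search, one simple gradient-free baseline (not necessarily an approach the post itself covers). The function `validation_loss` is a hypothetical stand-in for "train a model and return its validation loss"; the search ranges and the parameter names `learning_rate` and `n_trees` are assumptions chosen for illustration.

```python
import math
import random


def validation_loss(learning_rate, n_trees):
    # Hypothetical objective: in practice this would train a model and
    # return its validation loss. Here it is a made-up smooth function.
    return (math.log10(learning_rate) + 2.0) ** 2 + ((n_trees - 300) / 100.0) ** 2


def random_search(objective, n_trials=200, seed=0):
    """Sample hyperparameters at random and keep the best ones seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Sample the learning rate on a log scale between 1e-4 and 1.
            "learning_rate": 10 ** rng.uniform(-4, 0),
            # Sample the number of trees uniformly from 50 to 500.
            "n_trees": rng.randint(50, 500),
        }
        # The objective is only evaluated, never differentiated.
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(validation_loss)
print(best_params, best_loss)
```

The search only ever calls the objective; it needs neither a formula for it nor a gradient, which is exactly the constraint described above.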


consulting yes or no or maybe?

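Notice: the author of this post has recently started a personal blog (https://yaqiongchen.com/). For questions and comments, please go directly to the original post on the author's blog, "A North American PhD's Road to Applying for Consulting Jobs". Comments here are closed.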

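This post was written at the invitation of the blogger Ju, mainly to summarize the experience and lessons learned by a North American PhD looking for a consulting job. My background: an undergraduate degree from a Top 2 university in China and a PhD in biology from Columbia University. Next month I will join the New York Office of a consulting firm. I received a lot of help during my job search, and I hope to do my own small part to help those who come after me.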

Numerical optimization in machine learning (III): Constrained optimization

KKT conditions

Now that we have discussed unconstrained optimization problems in the previous post, it is time to face reality. In the real world, we often have limitations, such as a total budget, limits on motion angles, or some desired range of values. Life would be so easy (and boring) without boundaries and conditions. Adding constraints certainly makes optimization problems harder, but also more interesting.

Everything you need to know about matrix in machine learning (II): eigendecomposition and singular value decomposition


Why do we care about eigenvalues, eigenvectors, and singular values? Intuitively, what do they tell us about a matrix? When I first studied eigenvalues in college, I regarded them as yet another theoretical math trick with little relevance to my life. Once I passed the final exam, I shelved all my eigen-knowledge in a corner of my memory. Years have passed, and I have gradually come to realize the importance and brilliance of eigenvalues, particularly in the realm of machine learning. In this post, I will discuss how and why we perform eigendecomposition and singular value decomposition in machine learning.

Everything you need to know about matrix in machine learning (I): Solve Ax = b


In machine learning, we often deal with high-dimensional data. For convenience, we usually represent the data as matrices, and numerical optimization in machine learning often involves matrix transformations and computations. To make matrix computation more efficient, we often factorize a matrix into several special matrices, such as triangular matrices and orthogonal matrices. In this post, I will review the essential matrix concepts used in machine learning.
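As a small illustration of why triangular factors are convenient, the sketch below solves Ax = b by LU factorization with partial pivoting (PA = LU), followed by forward and back substitution. It is written in plain Python for readability and is only an assumption about how one might demonstrate the idea; the function name `lu_solve` and the example system are illustrative, and in practice a library routine would be used instead.

```python
def lu_solve(A, b):
    """Solve A x = b using LU factorization with partial pivoting."""
    n = len(A)
    U = [list(map(float, row)) for row in A]  # becomes upper triangular
    L = [[0.0] * n for _ in range(n)]         # becomes unit lower triangular
    perm = list(range(n))                     # records the row swaps (P)

    for k in range(n):
        # Pick the row with the largest absolute entry in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]
        perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        # Eliminate the entries below the pivot, storing the multipliers in L.
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]

    # Forward substitution: solve L y = P b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(L[i][j] * y[j] for j in range(i))

    # Back substitution: solve U x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
b = [5, -2, 9]
x = lu_solve(A, b)
print(x)  # approximately [1.0, 1.0, 2.0]
```

Once the factors are available, each triangular solve costs only on the order of n² operations, so the same factorization can be reused cheaply for many right-hand sides.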

Numerical optimization in machine learning (I): the basics


In logistic regression, the optimal model parameters cannot be computed with a simple analytical formula, so we have to use an iterative numerical optimization approach such as Newton's method or gradient descent. Numerical optimization is a crucial mathematical concept in machine learning and function fitting, and it is deeply integrated into model training, regularization, support vector machines, neural networks, and so on. In the next few posts, I will summarize key concepts and approaches in numerical optimization and their applications in machine learning.
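For a feel of what "iterative" means here, the following is a minimal sketch of batch gradient descent for one-dimensional logistic regression. The toy data, the step size `lr`, and the number of steps are made-up assumptions for illustration, not values taken from the posts.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_gd(xs, ys, lr=0.5, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Move a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up toy data: labels mostly increase with x, but the classes overlap,
# so the optimal weights are finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic_gd(xs, ys)
print(w, b)
```

There is no formula that returns `w` and `b` directly; the loop simply keeps improving the current guess until the gradient is close to zero.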

DIY curriculum for post-school life


Last February, I defended my PhD thesis and graduated after more than two decades of school life. It has now been a full year of post-school life. There are no more exams or curricula, and no GPA to quantify my progress. In this post-school life, I have started to realize that I have to be the one setting my own goals, designing my own curriculum, and evaluating my progress introspectively. In this post, I share some lessons that I have found useful for building a DIY curriculum.