Connect the Dots

Distributed model training II: Parameter Server and AllReduce

Written by Ju on May 20th, 2020 (updated September 21st, 2020).

In the previous post, I talked about using MapReduce and Spark for distributed model training. In this post, I will talk about the parameter server and how it is used in distributed model training.

(more…)

Distributed model training I: MapReduce and Spark

Written by Ju on May 7th, 2020 (updated September 21st, 2020). 1 Comment

In the previous post, I introduced challenges in machine learning systems with big data and complex models. In this post, I will discuss distributed systems in the era of big data.

(more…)

Think big: ML systems in the era of big data

Written by Ju on May 1st, 2020 (updated September 11th, 2023).

Let’s start with linear regression. Using established libraries such as scikit-learn, it is almost trivial to train a linear regression model. We can easily run the model training with a few hundred megabytes of data on a laptop with a built-in CPU.

Now let’s think big.

(more…)

Machine Learning Lifecycle

Written by Ju on January 17th, 2020 (updated January 18th, 2020).

In business management, the product lifecycle is broken into 4 stages, each with a distinct pattern of sales over time: introduction, growth, maturity, and decline. In the diagram below, I adapt the classic product lifecycle curve to show the engineering load over time in machine learning (ML), from model development to maintenance. Managing and coordinating the different stages of the ML lifecycle presents pressing challenges for ML practitioners.

(more…)

5 things you need to know about Machine Learning Systems

Written by Ju on January 9th, 2020. 2 Comments

The more I work on building end-to-end machine learning (ML) pipelines, the more I realize the importance of system design and infrastructure. ML shares many concerns with traditional software development, and poses new challenges to system design.

(more…)

Reinforcement learning (II): Markov Decision Process and RL agent

Written by Ju on August 11th, 2019 (updated August 12th, 2019).

In the previous post, I gave a high-level overview of Reinforcement Learning (RL). In this post, I will summarize different learning paradigms of RL agents. (more…)

Reinforcement learning (I): overview

Written by Ju on August 4th, 2019 (updated December 29th, 2019).

For the past 2 years, I have been following progress in Reinforcement Learning (RL). RL has beaten human experts in Go [1] and reached professional level in Dota 2 [2] and StarCraft [3]. RL is being mentioned more and more often in mainstream media and at conferences.

I think it is a good time for me to revisit RL. (more…)

How exactly does Bayesian Optimization work?

Written by Ju on May 23rd, 2019.

In the previous post, I introduced Bayesian Optimization for black-box function optimization such as hyperparameter tuning. It is now time to look under the hood and understand how the magic happens.

(more…)

Ju Yang

Ph.D. / Machine Learning Practitioner in New York
