We were recently joined by Crissman Loomis, AI Engineer at Preferred Networks, and James Bourbeau, Lead OSS Engineer at Coiled, for a webinar on Training ML Models Faster: Scalable Hyperparameter Optimization with Optuna and Dask.
Optuna is a framework that automates hyperparameter optimization, and Dask is a library for scaling Python. In the webinar, Crissman introduces hyperparameter optimization, demonstrates Optuna code, and talks in depth about how Optuna works internally to make the process efficient. Then, James walks us through Dask’s integration with Optuna and how it can be used to scale hyperparameter optimization.
In this post, we will cover:
- What are hyperparameters?
- Evolution of hyperparameter optimization
- A brief look at Optuna (more about Optuna in the webinar!)
- Key takeaways about hyperparameter optimization and Optuna
What are Hyperparameters?
Hyperparameters are attributes that control the behavior of machine learning algorithms, and they have a direct influence on the algorithm’s performance. They are often left at predefined defaults or set by hand. In a typical neural network, for example, the number of layers and the number of nodes in each layer are chosen manually, which makes them hyperparameters.
We also find hyperparameters outside of machine learning applications. Wherever there is an objective function, you can expect hyperparameters. Examples include LINPACK parameters and database performance settings.
Finding the right set of hyperparameters, widely known as hyperparameter optimization, is important because it can have a significant impact on your application or model. In one object detection example, Crissman and team were tuning the confidence threshold above which a bounding box is displayed. The following slide shows the results before and after hyperparameter tuning.
The difference is stark!
Moreover, the threshold was only one hyperparameter in the entire process. Crissman describes how hyperparameters can be found everywhere, from the ML models down to the chips at the hardware level.
Traditionally, hyperparameter optimization is done manually. You start with some random values and get an accuracy reading. You then continue tweaking the hyperparameter values by hand and find the best accuracy using trial-and-error.
Ideally, we want to automate this process and that’s where Optuna comes in. As we see in the webinar, Optuna not only makes automation easy, but also helps find the right hyperparameters to adjust and provides a multitude of other helpful features!
Evolution of Hyperparameter Optimization
It’s interesting to look at the evolution people go through while working with hyperparameters.
Case 1: Not tuning hyperparameters
A significant number of people do not optimize hyperparameters (as found in a recent survey). Researchers who are replicating papers tend to use the same default hyperparameter values or use the baseline parameters.
Case 2: Manually fidgeting with hyperparameters
In the next stage, they realize the importance of hyperparameters. They fidget with the hyperparameters manually to find a satisfactory accuracy value.
Case 3: Grid search
After tweaking values by hand, the next step is making the process more systematic. They lay out a complete grid of candidate values, using a tool like an Excel spreadsheet, to make sure every combination in the chosen ranges is tried.
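A grid search like this is simple to express in code. The following is a minimal sketch in plain Python, not something shown in the webinar: `validation_loss` is a hypothetical stand-in for training a model and measuring its loss, and the parameter names and candidate values are assumptions chosen for illustration.

```python
from itertools import product


def validation_loss(learning_rate, num_layers):
    # Hypothetical stand-in for "train a model and return its validation loss".
    # By construction, the lowest loss is at learning_rate=0.1, num_layers=3.
    return (learning_rate - 0.1) ** 2 + 0.01 * (num_layers - 3) ** 2


# Candidate values for each hyperparameter (illustrative assumptions).
grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "num_layers": [1, 2, 3, 4],
}

# Evaluate every combination in the grid.
results = []
for lr, layers in product(grid["learning_rate"], grid["num_layers"]):
    results.append((validation_loss(lr, layers), lr, layers))

# Tuples compare by their first element first, so min() picks the lowest loss.
best_loss, best_lr, best_layers = min(results)
print(f"best loss {best_loss:.4f} at learning_rate={best_lr}, num_layers={best_layers}")
```

Note that the number of evaluations is the product of the list lengths (16 here), so the cost grows quickly as hyperparameters are added.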
Case 4: Using Optuna
Finally, they consider automating the process using a framework like Optuna.
A Brief Look at Optuna
Optuna is a very powerful open source framework that helps automate hyperparameter search. It is easy to implement and uses state-of-the-art algorithms to maximize efficiency. You can introduce Optuna into your workflow without making any major changes to your original code!
Optuna comes with a unique set of advantages over other tools and methods of hyperparameter optimization. For instance, some existing frameworks require you to define the search space before optimization using the library’s own syntax, but Optuna defines the search space during optimization using Python. Because the search space is ordinary Python code, it can use conditionals and loops, so one hyperparameter can depend on the value chosen for another.
Internally, Optuna has a sampling strategy and a pruning strategy. Sampling refers to choosing which hyperparameter values to try in each trial, and pruning involves stopping unpromising trials early. Learn more about the gears of the machine in the webinar recording!
Key Takeaways about Hyperparameter Optimization and Optuna
Crissman talks more about how Optuna works, different types of samplers within Optuna, pruning strategies, and some bonus benefits of using Optuna in the webinar. Some key takeaways include:
- Bayesian and evolutionary strategies used in Optuna use the results of earlier trials to decide where to sample next, unlike random search where trial points are randomly distributed in the search space. The following slide shows how random search selects points randomly on the curve, while Optuna selects more points near the global minimum.
- Optuna provides a variety of samplers like TPE (Tree-structured Parzen Estimator), GP (Gaussian Process), and CMA-ES (Covariance Matrix Adaptation Evolution Strategy) that have specific advantages. For example, GP performs better with correlated hyperparameters while CMA-ES is strong when you have a large number of trials.
- Pruning can make optimization almost twice as fast! Optuna looks at the learning curve, compares performance with previous trials, and stops unpromising trials early, saving you a lot of compute time.
- Automating hyperparameter optimization allows you to use a strong framework like Dask to scale up and use more nodes in parallel. Crissman demonstrates this parallel execution in the webinar!
- Visualizations can help answer questions like: What are the most important hyperparameters? What is the contribution of each parameter to the overall performance of your algorithm? How do particular variables change over time?
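To make the pruning idea above more concrete, here is a simplified median-based stopping rule in plain Python. It is an illustrative sketch only, not Optuna's implementation (Optuna ships its own pruners, such as `optuna.pruners.MedianPruner`), and the learning curves below are made-up numbers.

```python
from statistics import median


def should_prune(curve_so_far, finished_curves):
    """Return True if the latest loss is worse than the median loss
    that earlier trials had reached at the same training step."""
    step = len(curve_so_far) - 1
    previous = [curve[step] for curve in finished_curves if len(curve) > step]
    if not previous:
        # Nothing to compare against yet, so let the trial continue.
        return False
    return curve_so_far[-1] > median(previous)


# Made-up validation-loss curves from three earlier, completed trials.
finished = [
    [0.9, 0.6, 0.4, 0.3],
    [0.8, 0.5, 0.35, 0.25],
    [1.0, 0.7, 0.5, 0.45],
]

promising = [0.85, 0.45]   # below the median (0.6) at step 1: keep training
unpromising = [0.95, 0.9]  # above the median (0.6) at step 1: stop early

print(should_prune(promising, finished))
print(should_prune(unpromising, finished))
```

Every trial stopped this way skips its remaining training steps, which is where the compute savings come from.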
Distributed Hyperparameter Optimization
In the second part of the webinar, James demonstrates Dask-Optuna, a library for integrating Dask and Optuna. Dask-Optuna allows you to run optimization trials in parallel on a Dask cluster. James walks through an example that optimizes several hyperparameters for an XGBoost classifier trained on the breast cancer dataset, using Coiled to create a remote Dask cluster on AWS for the demonstration.
Check out the webinar recording and follow along in the demo notebook!