AI Case Study

Researchers from Boston University improve automatic parameter selection for synthetic data creation

Researchers from Boston University and NEC-Labs have developed a method that automatically adjusts the parameters of a data simulator, optimising the distribution of the simulated data for the accuracy of the model trained on it.


Public And Social Sector

Education And Academia

Project Overview

Simulators are used to create synthetic data, but the parameters that determine the distribution of that data have to be chosen, either by hand or automatically. The researchers propose a method in which these parameters are "learned", yielding a distribution of synthetic data that improves the performance of the model subsequently trained on it. They evaluated the technique on both synthetic and real datasets. "Learning to simulate can be seen as a meta-learning algorithm that adjusts parameters of a simulator to generate synthetic data such that a machine learning model trained on this data achieves high accuracies on validation and test sets, respectively."

Reported Results

"Given the need for large-scale data sets to feed deep learning models and the often high cost of annotation and acquisition, we believe our approach is a sensible avenue for practical applications to leverage synthetic data. Our experiments illustrate the concept and demonstrate the capability of learning to simulate on both synthetic and real data."


The technique for automatic parameter selection is based on reinforcement learning.
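
To make the idea concrete, the following is a minimal sketch of how reinforcement learning can tune a simulator parameter. It is not the researchers' implementation. It assumes a REINFORCE-style policy gradient, a one-dimensional toy simulator with a single parameter psi, and a nearest-centroid classifier. All names (simulate, train_model, reward, learn_to_simulate) and all numeric settings are hypothetical choices made for illustration. The reward is the validation accuracy of the model trained on the simulated data.

```python
import random
import statistics

# Toy assumption: in the "real" data, class 1 is centred at 3.0.
REAL_CLASS1_MEAN = 3.0


def simulate(psi, n_per_class, rng):
    """Hypothetical simulator: class 0 ~ N(0, 1), class 1 ~ N(psi, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(psi, 1.0), 1) for _ in range(n_per_class)]
    return data


def train_model(data):
    """Fit a nearest-centroid classifier; the model is the pair of centroids."""
    c0 = statistics.fmean([x for x, y in data if y == 0])
    c1 = statistics.fmean([x for x, y in data if y == 1])
    return c0, c1


def accuracy(model, data):
    """Fraction of points whose nearest centroid matches their label."""
    c0, c1 = model
    correct = sum((abs(x - c1) < abs(x - c0)) == (y == 1) for x, y in data)
    return correct / len(data)


def reward(psi, val_data, rng, n_per_class=100):
    """Reward = validation accuracy of a model trained on data simulated with psi."""
    model = train_model(simulate(psi, n_per_class, rng))
    return accuracy(model, val_data)


def learn_to_simulate(val_data, rng, mean=0.5, sigma=0.5, lr=1.0,
                      iterations=150, batch=8):
    """REINFORCE on the mean of a Gaussian policy over psi (sigma kept fixed)."""
    for _ in range(iterations):
        psis = [rng.gauss(mean, sigma) for _ in range(batch)]
        rewards = [reward(p, val_data, rng) for p in psis]
        # The batch-mean reward serves as a simple variance-reducing baseline.
        baseline = statistics.fmean(rewards)
        # Gradient of log N(psi; mean, sigma) w.r.t. mean is (psi - mean) / sigma^2.
        grad = statistics.fmean(
            [(r - baseline) * (p - mean) / sigma ** 2
             for r, p in zip(rewards, psis)]
        )
        mean += lr * grad  # gradient ascent on expected reward
    return mean


rng = random.Random(0)
# Stand-in for a real validation set (here drawn from the same toy generator).
val_data = simulate(REAL_CLASS1_MEAN, 500, rng)

initial_mean = 0.5
learned_mean = learn_to_simulate(val_data, rng, mean=initial_mean)
acc_initial = reward(initial_mean, val_data, rng, n_per_class=1000)
acc_learned = reward(learned_mean, val_data, rng, n_per_class=1000)

print(f"learned psi mean: {learned_mean:.2f}")
print(f"validation accuracy: initial {acc_initial:.3f}, learned {acc_learned:.3f}")
```

In this toy setting the policy mean should drift towards the value of psi for which the simulated data best matches the validation data, without the gradient ever passing through the simulator itself. That property is the reason a policy-gradient approach suits non-differentiable simulators.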


R And D

Core Research And Development


"In order to train deep neural networks, significant effort has been directed towards collecting largescale datasets for tasks such as machine translation (Luong et al., 2015), image recognition (Deng
et al., 2009) or semantic segmentation (Geiger et al., 2013; Cordts et al., 2016). It is, thus, natural for recent works to explore simulation as a cheaper alternative to human annotation (Gaidon et al., 2016; Ros et al., 2016; Richter et al., 2016). Besides, simulation is sometimes the most viable way to acquire data for rare events such as traffic accidents. However, while simulation makes data collection and annotation easier, it is still an open question what distribution should be used to synthesize data."



"10 sampled datasets for random and learned parameters"