Data is the new oil in today’s age of AI, but only a lucky few are sitting on a gusher. So, many are making their own fuel, one that’s both inexpensive and effective.

Synthetic data is annotated information that computer simulations or algorithms generate as an alternative to real-world data. Put another way, synthetic data is created in digital worlds rather than collected from or measured in the real world.

It may be artificial, but synthetic data reflects real-world data, mathematically or statistically. Research demonstrates it can be as good or even better for training an AI model than data based on actual objects, events or people.

Users can generate synthetic data for autonomous vehicles using Python inside NVIDIA Omniverse.

That’s why developers of deep neural networks increasingly use synthetic data to train their models. Indeed, a survey of the field calls use of synthetic data “one of the most promising general techniques on the rise in modern deep learning, especially computer vision” that relies on unstructured data like images and video. The 156-page report cites 719 papers on synthetic data. It concludes “synthetic data is essential for further development of deep learning … many more potential use cases still remain” to be discovered.

The rise of synthetic data comes as AI pioneer Andrew Ng is calling for a broad shift to a more data-centric approach to machine learning. He’s rallying support for a benchmark or competition on data quality, which many claim represents 80 percent of the work in AI. “Most benchmarks provide a fixed set of data and invite researchers to iterate on the code … perhaps it’s time to hold the code fixed and invite researchers to improve the data,” he wrote in his newsletter, The Batch.

In a report on synthetic data, Gartner predicted that by 2030 most of the data used in AI will be artificially generated by rules, statistical models, simulations or other techniques. “The fact is you won’t be able to build high-quality, high-value AI models without synthetic data,” the report said.

Synthetic data will become the main form of data used in AI. Source: Gartner, “Maverick Research: Forget About Your Real Data – Synthetic Data Is the Future of AI,” Leinar Ramos, Jitendra Subramanyam, 24 June 2021.

Why Is Synthetic Data So Important?

Developers need large, carefully labeled datasets to train neural networks. More diverse training data generally makes for more accurate AI models.

The problem is that gathering and labeling datasets, which may contain anywhere from a few thousand to tens of millions of elements, is time-consuming and often prohibitively expensive.

Enter synthetic data.
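
As a rough illustration of what it means for synthetic data to reflect real-world data statistically, here is a minimal Python sketch. It is not how any particular tool generates data. It assumes a tiny, made-up set of labeled measurements (the `real_samples` values and the `car`/`truck` labels are hypothetical), estimates a mean and standard deviation per label, and then samples as many new labeled points as needed from those estimates.

```python
import random
import statistics


def fit_class_stats(samples):
    """Estimate (mean, standard deviation) per label from (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {
        label: (statistics.mean(values), statistics.stdev(values))
        for label, values in by_label.items()
    }


def generate_synthetic(stats, n_per_class, seed=0):
    """Sample labeled points from a normal distribution fitted to each label."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    synthetic = []
    for label, (mean, std) in stats.items():
        for _ in range(n_per_class):
            synthetic.append((rng.gauss(mean, std), label))
    return synthetic


# Hypothetical "real" measurements (say, vehicle lengths in meters), labeled by hand.
real_samples = [
    (4.1, "car"), (4.5, "car"), (4.3, "car"), (4.8, "car"),
    (11.9, "truck"), (12.4, "truck"), (13.1, "truck"), (12.0, "truck"),
]

stats = fit_class_stats(real_samples)
synthetic = generate_synthetic(stats, n_per_class=1000)

print(len(synthetic))
print(synthetic[0])
```

Every generated point comes with its label for free, which is the appeal: eight hand-labeled samples become two thousand labeled ones. A real pipeline would use far richer generators, such as simulations or learned models, and a single normal distribution per label is only a stand-in for them.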