
Microsoft, Google, and Meta Bet on Synthetic Data to Advance AI Models

Microsoft, Google, and Meta are investing in synthetic data to train AI models. Synthetic data is generated by AI systems and offers advantages in terms of control and privacy. Anthropic, Meta, and Google have already used synthetic data for their models.

With the ever-growing need for data to train AI systems, high-quality real-world data is becoming increasingly scarce. To overcome this challenge, companies are turning to artificial data generated by their own AI systems.


The concept behind synthetic data is simple. Tech companies can use their AI systems to generate writing and other media, which can then be used to train future versions of those same systems. This approach offers several advantages, including the ability to avoid legal, ethical, and privacy concerns associated with real data. It also provides more control over the learning process, allowing AI models to be guided and refined with greater precision.
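As a rough sketch of what that loop looks like in practice, consider the Python snippet below. The generate_text helper, the seed prompts, and the JSONL output format are illustrative placeholders rather than details from any of the companies named: an existing model is prompted to produce text, and its outputs are collected as training examples for a future model.

```python
import json
import random

# Hypothetical stand-in for a call to an existing language model.
# In practice this would be a request to whatever model a lab already
# runs; here it returns a canned completion so the sketch is self-contained.
def generate_text(prompt: str) -> str:
    return f"A short model-written passage responding to: {prompt}"

# Seed prompts that steer what kind of synthetic text gets produced.
seed_prompts = [
    "Explain how rainbows form, for a curious child.",
    "Write two sentences about why recycling matters.",
    "Describe a robot learning to bake bread.",
]

# Collect synthetic (prompt, completion) pairs in the JSONL format
# commonly used for fine-tuning datasets.
with open("synthetic_dataset.jsonl", "w", encoding="utf-8") as f:
    for _ in range(100):
        prompt = random.choice(seed_prompts)
        completion = generate_text(prompt)
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
```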


Anthropic, a leading AI firm, used synthetic data to build its latest chatbot, Claude. Meta and Google have also used synthetic data to develop their open-source models. Google DeepMind even relied on this method to train a model capable of solving Olympiad-level geometry problems. There is also speculation that OpenAI used synthetic data to train its text-to-video generator, Sora.


Microsoft's generative AI research team also embraced synthetic data for a recent project. Instead of feeding a large volume of children's books to their AI model, they created a list of 3,000 words that a four-year-old could understand. The AI model was then prompted to generate children's stories using one noun, one verb, and one adjective from the list. This process resulted in the development of a more capable language model. Microsoft has made this new family of "small" language models, Phi-3, open source and available to the public.
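The recipe can be sketched in a few lines of Python. The word lists and prompt wording below are placeholders, since the article does not reproduce Microsoft's actual 3,000-word vocabulary or prompts; the point is simply the mechanism of sampling three words and asking a model for a story built around them.

```python
import random

# Illustrative stand-ins for Microsoft's 3,000-word child-level vocabulary,
# which is not reproduced in the article.
NOUNS = ["dog", "ball", "tree", "cake", "boat"]
VERBS = ["jumps", "finds", "loses", "shares", "paints"]
ADJECTIVES = ["happy", "tiny", "red", "sleepy", "brave"]

def make_story_prompt() -> str:
    """Pick one noun, one verb, and one adjective, then ask for a
    children's story that uses all three."""
    noun = random.choice(NOUNS)
    verb = random.choice(VERBS)
    adjective = random.choice(ADJECTIVES)
    return (
        "Write a short story for a four-year-old that uses the words "
        f"'{noun}', '{verb}', and '{adjective}'."
    )

# Each prompt would be sent to a capable model, and the resulting stories
# collected as training data for a small model such as Phi-3.
for _ in range(3):
    print(make_story_prompt())
```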


While synthetic data offers promising possibilities, some AI experts have expressed concerns. Researchers at several major universities published a paper highlighting the risks, including the potential for "model collapse," in which a model trained on AI-generated data degrades over successive generations. In their experiment, a model trained on synthetic data began producing nonsensical output and lost track of what it had originally learned. There are also concerns that datasets built from synthetic data can amplify biases and toxicity.
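The feedback loop behind model collapse can be illustrated with a toy numerical experiment. The sketch below is not the researchers' setup; it simply re-fits a simple statistical "model" (a normal distribution) to data sampled from its own previous generation and watches the original spread erode.

```python
import random
import statistics

# Toy illustration of "model collapse": a very simple "model" (a normal
# distribution) is repeatedly re-fit to data sampled from the previous
# generation of itself instead of from the real data.
random.seed(0)

mu, sigma = 0.0, 1.0          # generation 0: the "real" data distribution
samples_per_generation = 200

for generation in range(1, 11):
    # Sample training data from the current model...
    data = [random.gauss(mu, sigma) for _ in range(samples_per_generation)]
    # ...and fit the next generation to that synthetic data only.
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    print(f"generation {generation}: mean={mu:+.3f}, std={sigma:.3f}")

# Over many generations the estimated spread tends to drift downward and the
# tails of the original distribution are gradually forgotten, a crude analogue
# of the degradation described in the model-collapse research.
```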


Despite the debate surrounding synthetic data, pioneers in the field agree that human intelligence is still essential. Real people are needed to create and refine artificial datasets, as synthetic data generation is a complex process that requires significant human labor.


In other AI news, a mysterious chatbot recently appeared on a benchmarking website, showcasing performance comparable to OpenAI's GPT-4. Speculation arose that the chatbot might be from OpenAI, but the developer's identity remains unknown. The chatbot has since been taken offline temporarily due to high traffic and capacity limits.

As the first full school year with AI models like ChatGPT comes to a close, Bloomberg is eager to hear about the impact of generative AI in classrooms. Students and teachers are encouraged to share their experiences.


In a recent interview, Demis Hassabis, CEO of Google DeepMind, seemingly threw shade at his friend-turned-rival Mustafa Suleyman, who was recently named CEO of Microsoft AI. Hassabis remarked that Suleyman's knowledge of AI largely came from working with him over the years.


Source: BLOOMBERG


