Skiff Team / 8.11.2023

A list of services training AI on your data

Companies known to use, sell, or share user data to train AI models.
AI training on the public web is nothing new. It’s safe to assume that your old Tweets, YouTube videos, WordPress posts, and LinkedIn profile have made their way into massive, for-profit datasets.

What is new is training AI models on proprietary content or private data - your calls, video chats, documents, notes, and emails. You’d be surprised to learn that this is becoming even more common across consumer products, even with companies as big as Google and Zoom.

This blog is an updated list of consumer products that do use your data to train AI models.

Have more to share? Open a pull request at this link, or write to [email protected] with any questions or feedback.

The updated list

  • Zoom: In March 2023, Zoom updated its terms of service to allow the company to use user data to train AI models. The update sparked intense media coverage and controversy, as it gave Zoom broad control over user data without clear consent requirements. Zoom later backtracked on the update, saying that it would not use user data to train AI models without consent. Read more in The Washington Post.
  • Google: In January 2023, Google updated its privacy policy to state that it may use user data to train AI models. The update was met with criticism from privacy advocates, who argued that Google was not being transparent about its data collection practices. Google has since clarified that it will only use user data to train AI models that are necessary for its core products and services. Read more in 9to5Google.
  • Facebook: Facebook has an established history of using public data - including photos on Instagram and videos on Facebook - to train AI models. Read more in CNBC or The Verge.
  • Apple: Apple has generally walked a line between training ML models and indiscriminately harnessing user data. Recently, Spotify stopped allowing Apple to train on podcast data (read more in WIRED).
  • Getty Images: Although Getty Images does not train AI models of its own (that we know of), it filed a lawsuit against Stability AI alleging that over 10 million photos were illegally used in AI training. Read more in The Verge.
  • ChatGPT/OpenAI: Unsurprisingly, user interactions with ChatGPT are used to improve the chatbot’s service. However, in a prior version of OpenAI’s API terms of service, the company could use developer-provided API requests as training data as well. OpenAI later changed this policy, as developers objected to their API calls being used as training data by OpenAI. Read more in TechCrunch.
  • Snapchat: At the very least, Snapchat is known to use conversations with “My AI” - its AI chatbot - to train its AI models. Read more on ZDNet. Snapchat also allegedly uses data from Memories to train models capable of recognizing user content.

Questions or comments?

If you have any questions or suggested additions to the list, please contact us at the email address above, or open a pull request against the GitHub file. To have a longer conversation about privacy, join our Discord, Reddit, and Twitter communities. If you're looking to request a feature or update for Skiff, visit our public feature request board.

Join the community

Become a part of our 1,000,000+ community and join the future of a private and decentralized internet.