Changes to Terms of Service to Enable AI Training Catch Flak
Battle plays out between personal privacy and the insatiable need of AI systems for data, especially personal data; Stanford researchers point to a way forward
By John P. Desmond, Editor, AI in Business
Changes to terms of service clauses in the privacy policies of big tech players are aimed at enabling them to collect data to feed their AI large language models.
The changes are not generally trumpeted in press releases; it may be that the higher-ups hope nobody notices. But privacy watchers are tuned in.
“Buried thousands of words into its document, Google tweaked the phrasing for how it used data for its products, adding that public information could be used to train its AI chatbot and other services,” stated the author of a recent account in The New York Times on changes in Google’s privacy policy.
Under the new Google terms of service, if you use Google Translate, Bard, or Cloud AI capabilities, your use of those products contributes publicly available information that Google can use to train its products, according to the account.
Not surprisingly, Google saw some pushback.
Google seeks access to private data protected by a web of federal and state privacy laws that require the company to get permission to access content created online by customers. That private data is estimated to be 10 times the size of publicly available data, according to Tamay Besiroglu, an associate director at Epoch, an AI research institute.
In February, the Federal Trade Commission warned in a blog post that changing terms of service could be a deceptive practice. The warning described how the use of AI by big tech companies with user privacy agreements in place is creating a voracious demand for data, and the massive volume of their users’ private data is a tempting target.
“These companies now face a potential conflict of interest,” the FTC post stated. “They have powerful business incentives to turn the abundant flow of user data into more fuel for their AI products, but they also have existing commitments to protect their users’ privacy.”
Some companies attempt to resolve the conflict by changing the terms of service agreements, even “surreptitiously” in some cases to avoid backlash from users, the FTC warned. This may not be playing by the rules.
“It may be unfair or deceptive for a company to adopt more permissive data practices—for example, to start sharing consumers’ data with third parties or using that data for AI training—and to only inform consumers of this change through a surreptitious, retroactive amendment to its terms of service or privacy policy,” the FTC post stated.
Adobe Gets Pushback
Adobe ran into trouble in June when it tweaked its terms of service agreement, in what the company said was a clarification or documentation of practices it had already been engaging in but had not made clear to its customers.
In the clarification, Adobe stated, “We don’t train generative AI on customer content.” The only exception is work submitted to the Adobe Stock marketplace, which the company can use to train Adobe Firefly.
“We’ve explicitly said we will not train generative AI on your content,” stated Scott Belsky, chief strategy officer of Adobe, in an account in The Verge. “It was always a policy that we had as a company. We always made that very clear, but we never explicitly said that.”
Only if content uploaded into the cloud is flagged as abusive, such as child sexual abuse material, will it be subject to human review, the policy states. Users can opt out of the product improvement program, under which, according to the Adobe policy, “We may use usage data and content characteristics to improve your product experience.”
Privacy Protections in AI Tools Porous Today
Privacy protections for consumers using tools incorporating AI are porous today. “Most privacy laws and regulations do not yet directly address AI and how it can be used, or how data can be used in AI models,” states the author of a recent account in eWeek. “As a result, AI companies have had a lot of freedom to do what they want.”
The issues are being recognized by US lawmakers, who see a model in the European Union’s AI Act for how to go forward, suggests the author, technology writer Shelby Hiter. “Expect more regional, industry-specific, and company-specific regulations to come into play in the coming months and years, with many of them following the EU AI Act as a blueprint for how to protect consumer privacy,” she stated in the account.
Among examples of AI products or services with strong privacy protections, the author mentioned:
Hippocratic AI, a generative AI product designed for healthcare services that complies with HIPAA and is being used by nurses, physicians, health systems, and payer partners. The company raised $50 million from investors to help build out its model and get it into practice. In an interview with TechCrunch last year, cofounder and CEO Munjal Shah stated, “Hippocratic has created the first safety-focused large language model designed specifically for healthcare.”
Glean, an enterprise search product built by founders with experience at big tech companies including Google, Facebook, Microsoft, and Uber. An unidentified customer that supplies software to the financial services industry used Glean to connect 33 different data sources into a single trusted knowledge model. Adoption within the 1,000-person organization was rapid, hovering around 80 percent, and users spent less time searching as a result.
Another Glean customer, Grammarly, which serves millions of customers with its “communication assistance” technology that at its root checks grammar, uses Glean to search across all the applications serving customers, including Confluence, Slack, Zendesk, and Google Drive. Customer Support Manager Iryna Smuk of Grammarly stated, “I found many more documents with important product information than I ever knew existed within Grammarly.”
Simplifai, headquartered in Oslo, Norway, uses natural language processing to interpret unstructured written human communications received through emails, documents, and chats, reducing the time needed to answer customer inquiries. Concentrating on the insurance and banking industries, the company boasts of state-of-the-art GDPR- and ISO-compliant security in products that respect user privacy and prioritize data protection from design through development. Its developers follow the Secured Software Development Cycle approach. The Norwegian Data Protection Authority has confirmed a legal basis for public institutions in Norway to use the Simplifai AI products.
Stanford HAI Researchers Point to a Way Forward on Privacy
Stanford researchers recently published a white paper on privacy in the AI era. “Today, it is basically impossible for people using online products or services to escape systematic digital surveillance across most facets of life—and AI may make matters even worse,” stated Jennifer King, a privacy and data policy fellow at Stanford, in an account from the university’s Human-Centered AI unit. King wrote the white paper, “Rethinking Privacy in the AI Era: Policy Provocations for a Data-Centric World,” with Caroline Meinhardt, Stanford HAI’s policy research manager.
King is optimistic that personal privacy can be better protected in the AI era. “In my view, when I’m browsing online, my data should not be collected unless or until I make some affirmative choice, like signing up for the service or creating an account. And even then, my data shouldn’t be considered public unless I’ve agreed to share it,” she stated.
She credited Apple’s App Tracking Transparency feature, launched in 2021, with addressing how much user data was being collected by third-party apps. When installing a new app, users are asked if they want to allow the app to track them across other apps and websites. Marketing industry reports show that when presented with that choice, 80 to 90 percent of users say no.
Alternatively, browsers could offer a built-in opt-out feature, such as Global Privacy Control. Some browsers, including Firefox and Brave, have it built in, but Microsoft Edge, Apple’s Safari, and Google Chrome do not, King reported.
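For readers curious about the mechanics, Global Privacy Control is a simple HTTP-level signal: a participating browser attaches a `Sec-GPC: 1` header to its requests, and page scripts can read a `navigator.globalPrivacyControl` property. Below is a minimal TypeScript sketch of how a site might honor the signal on the server side; the cookie name and the opt-out handling are illustrative assumptions, not any particular site’s implementation.

```typescript
import * as http from "node:http";

// A browser with Global Privacy Control enabled (e.g., Firefox or Brave)
// sends the "Sec-GPC: 1" request header. Node lowercases header names.
function gpcOptOutRequested(req: http.IncomingMessage): boolean {
  return req.headers["sec-gpc"] === "1";
}

const server = http.createServer((req, res) => {
  const optedOut = gpcOptOutRequested(req);
  if (optedOut) {
    // Hypothetical handling: remember the opt-out and treat it as a
    // do-not-sell/do-not-share signal, e.g., by skipping third-party
    // tracking tags when rendering pages for this visitor.
    res.setHeader("Set-Cookie", "gpc_opt_out=1; Path=/; SameSite=Lax");
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(optedOut ? "GPC signal received; opt-out honored." : "No GPC signal.");
});

server.listen(8080);
```

On the client side, a page script can make the same check before loading analytics, along the lines of `if (navigator.globalPrivacyControl) { /* skip trackers */ }`.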
Regulators examining AI need to look harder at how data is collected, whether it is biased, and whether personal privacy is protected. “This is an area where there is a lot of work to do,” King stated.
One idea is to have a “data intermediary,” a third party that acts as a mediator between data subjects and data users. “It involves delegating the negotiating power over your data rights to a collective that does the work for you, which gives consumers more leverage,” King stated.
It is polite to ask before taking a user’s data and using it for business purposes; maybe the AI industry will get to that.
Read the source accounts and information in The New York Times, in a Federal Trade Commission blog post, in The Verge, in eWeek, and from Stanford University’s Human-Centered AI unit.