Data Privacy More at Risk as Generative AI Grows
Companies using consumer data to help with decisions on credit and insurance risk, often without the knowledge or permission of users; some US agencies responding
By John P. Desmond, Editor, AI in Business

To fuel the insatiable appetites of their generative AI models for data, tech companies are taking the conversations, photos and documents of users and using them to train their AI in how to write, paint and seem more human.
Consumers may be used to being targeted with ads based on what they have been shopping for or buying, but now companies are using consumer data to build new technologies, without really asking for permission.
“We don’t yet understand the risk that this behavior poses to your privacy, reputation or work. But there’s not much you can do about it,” stated Geoffrey Fowler, technology writer, in a recent column in The Washington Post.
Usually, by agreeing to use a big tech product, the user grants permission for whatever the tech supplier wants to do with the user’s personal data. Google, for example, updated its privacy policy earlier this summer to say it can use any “publicly available information” to train its AI, including for the Bard chatbot.
“Most people have no way to make truly informed decisions about how their data is being used to train AI. That can feel like a privacy violation — or just like theft,” stated Fowler.
Those weighing in with increased privacy concerns included Nicholas Piachaud, a director at the open source nonprofit Mozilla Foundation. He stated, “This is an appropriate moment to step back and think: What’s at stake here? Are we willing just to give away our right to privacy, our personal data to these big companies? Or should privacy be the default?”
Ben Winters, a senior counsel at the Electronic Privacy Information Center (EPIC), who has been studying the harms of generative AI, stated, “Everybody is sort of acting as if there is this manifest destiny of technological tools built with people’s data. With the increasing use of AI tools comes this skewed incentive to collect as much data as you can upfront.”
Many companies are concerned that private data sucked in by a generative AI model will leak back out and cause harm. It happened at Samsung, where employees were found on three occasions to have entered confidential company information into ChatGPT. Samsung subsequently banned the use of AI chatbots at work; Apple, Spotify, Verizon and a number of banks have done the same.
AI & Privacy Issues Wide-Ranging
A list of AI privacy issues recently compiled for an account in eWeek shows the issues are wide-ranging:
Little regard for copyright and IP laws;
Unauthorized incorporation of user data;
Limited regulatory bodies and safeguards;
Unauthorized usage of biometric data;
Covert metadata collection practices;
Limited built-in security features for AI models.
Issues around privacy and the collection of data for AI include concerns about these sources:
Web scraping and web crawling;
User queries in AI models;
IoT sensors and devices;
Public records;
User surveys and questionnaires.
The author warns: “Unfortunately, many AI vendors either don’t realize or don’t care when they use someone else’s copyrighted artwork, content, or other intellectual property without their consent.”
Also, “When AI model users input their own data in the form of queries, there’s the possibility that this data will become part of the model’s future training dataset. When this happens, this data can show up as outputs to other users’ queries, which is a particularly big issue if users have input sensitive data into the system.”
And, “Many AI models do not have native cybersecurity safeguards in place. This makes it incredibly easy for unauthorized users and bad-faith actors to access and use other users’ data, including personally identifiable information (PII).”
Some Best Practices to Protect Privacy in the Age of AI
Experts have suggested some best practices for protecting privacy in the age of AI:
Establish an appropriate use policy for AI: Internal users should know what data they can use, and how and when to use it, when engaging with AI tools;
Invest in data governance and security tools: These include tools for extended detection and response (XDR), data loss prevention, and threat intelligence and monitoring;
Read the fine print: AI vendors typically offer some kind of documentation that covers how their products work and the basics of how they were trained. Model cards, for instance, increase transparency by communicating key information about machine learning models; and
Use only non-sensitive data: As a general rule, do not input sensitive business or customer data into any AI tool. Techniques including data anonymization and use of synthetic data may help when pursuing use cases involving sensitive data, as sketched below.
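To make that last practice concrete, here is a minimal sketch of how a team might scrub obvious identifiers from text before it is ever sent to an external AI tool. Everything in it is assumed for illustration: the redact helper, the placeholder labels and the regular-expression patterns are hypothetical and not drawn from any vendor’s product, and production anonymization typically relies on dedicated data loss prevention tooling with far broader coverage.

```python
import re

# Hypothetical, minimal redaction pass: mask a few common identifier patterns
# before text is sent to an external AI tool. A real deployment would lean on
# dedicated data loss prevention / anonymization tooling with broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


if __name__ == "__main__":
    prompt = ("Summarize this complaint from jane.doe@example.com, "
              "SSN 123-45-6789, phone 555-123-4567.")
    print(redact(prompt))
    # Summarize this complaint from [EMAIL REDACTED],
    # SSN [SSN REDACTED], phone [PHONE REDACTED].
```

The trade-off is between lightweight pattern-based masking like this, which is easy to audit but easy to evade, and heavier data loss prevention or synthetic-data pipelines; either way, the point is that redaction happens before the prompt ever leaves the organization.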
Analysts at the legal intelligence site JD Supra have been tracking the explosion in the use of AI and generative AI tools, assisting clients with governance programs and encouraging safe use of AI tools by employees. “We find that many employees have no idea that their use of generative AI or other AI tools may have legal risks,” stated the authors of a recent JD Supra post on AI governance.
Referring to the eWeek article quoted above, the JD Supra authors stated, “Although the article is targeted to consumers, it is instructive for businesses using AI tools to be aware of what consumer facing publications are saying about business use of AI, and how consumers should be responding.”
US CFPB Guidance Says Lenders Must Cite the Reason Credit is Denied
The privacy risks of using AI tools are becoming better known. The US Consumer Financial Protection Bureau, for example, recently issued a press release with guidance on how lenders must adhere to certain legal requirements when denying credit based on the output of an AI model. For example, the lenders must cite “the actual reason for the denial of credit or a change of credit conditions.”
“Technology marketed as artificial intelligence is expanding the data used for lending decisions, and also growing the list of potential reasons for why credit is denied,” stated CFPB Director Rohit Chopra. “Creditors must be able to specifically explain their reasons for denial. There is no special exemption for artificial intelligence.”
The specific reasons for denial must be disclosed, “even if consumers may be surprised, upset, or angered to learn their credit applications were being graded on data that may not intuitively relate to their finances,” such as behavioral spending data of a broad group the applicant may fall into, the CFPB release stated.
Personal Medical Information Now in Play as Well
Personal medical information, some of which health care consumers mistakenly believe is protected by HIPAA privacy laws, is now in the sights of the big tech AI generators and third-party marketers who see a lucrative market, according to a recent account in The Atlantic.
Users of the online therapy app BetterHelp, for example, each completed a questionnaire similar to the ones at therapists’ offices, describing their relevant personal profile, touching on medications and thoughts related to depression, self-harm and intimacy. “These questions were just meant to match you with the best counselor for your needs,” the Atlantic author wrote; users were assured their information would remain private.
Federal regulators later charged BetterHelp with sharing user data, including email addresses, IP addresses and questionnaire answers, with third parties, including Facebook and Snapchat, “for the purposes of targeting ads for its services.” The company finalized a settlement with the Federal Trade Commission in July, agreeing to refund $7.8 million to consumers whose privacy had been compromised. BetterHelp admitted no wrongdoing in a statement and called the sharing of user information “industry-standard practice.”
“When somebody downloads an app on their phone and starts inputting health data in it, or data that might be health indicative, there are definitely no protections for that data other than what the app has promised,” stated Deven McGraw, a former deputy director of health-information privacy in the Office for Civil Rights at the Department of Health and Human Services, to The Atlantic. McGraw is currently the head of data stewardship and data sharing for Invitae, a genetic-testing company.
Some actions are being taken. The FTC is providing a tool to help app developers comply with privacy laws, and the Health and Human Services Office for Civil Rights has provided guidance on the use of online tracking technologies by entities and businesses covered by HIPAA.
And a consumer-privacy framework has been proposed by the nonprofit Center for Democracy & Technology in response to the changing reality that “extraordinary amounts of information reflecting mental and physical well-being are created and held by entities that are not bound by HIPAA obligations.”
Read the source articles and information in The Washington Post, eWeek, JD Supra and The Atlantic.