Ethics and LLMs: Privacy and Security
Mission-driven organizations have a lot of private data. A philanthropy has personally identifiable information about donors. A hospital has medical records. Government agencies have addresses, Social Security numbers, sometimes income and banking information, and all kinds of information about your interactions with the government and its programs.
People trust us. And we’d be in some serious hot water if we violated that trust.
If there is one thing that your organization has an AI policy around, it’s probably data security. No one wants their organization’s private data living on someone else’s servers, waiting to get spit out in response to another user’s query!
In this post we will review:
The differences between privacy and security
How to tell whether and how an LLM is using your data
How to protect your data and still get a useful answer from an LLM
What kind of data do you want an LLM to have about your work?
If you take nothing else away from this post, it’s that you should have a clear policy about data use with LLMs! You can get a simple policy template for free in the Mission-First AI Starter Kit.
Privacy vs Security
People often use these two words interchangeably, but there is a difference.
Privacy is your control of information about you. Some define it as “the right to be left alone,” or the right to seclude yourself and information about you from others.
Data security is the set of safeguards that protect information about you from being seen by people who shouldn’t see it.
Imagine your bedroom. You don’t want just anyone coming in! That feeling is a desire for privacy. Security is the lock on the door.
The danger of conflating privacy with security is that we tend to let the definition of security subsume the word privacy. In doing so, we can under-appreciate the value of privacy. This can make it difficult to talk about how and why privacy matters.
Perhaps the most useful idea about privacy is “contextual integrity:” our preferences about our privacy are tightly tied to context. I am fine with my bank knowing my financial information, but it doesn’t seem right to me that they would know my location. The app that gives me directions can, of course, have my location, but if it wanted my financial information, I would find that invasive! My understanding of how data will be used and the appropriateness of that use for its context are central to my preferences about privacy.
At work, you may be responsible for data security broadly: data you have needs to be safeguarded, and the security measures IT has set up don’t work if the people who work there don’t abide by their rules. This is pretty simple, because there are rules in place, and they often just treat all data as sensitive: keep a password on your computer. Don’t let strangers into the building. You’ve seen the videos, I’m sure!
Privacy at work, however, is a little more complicated. You have information about yourself and your personal life, over which (ideally) you have total control, plus perfect access to the preferences of the data subject (you!). But if you have access to information about others— clients, patients, coworkers, donors, or students, for example— you have a responsibility to safeguard their privacy, too, not only because your organization’s policy and the law likely require it, but also because these people trusted you with their data. You are tasked with protecting others’ privacy without entirely understanding their preferences.
If you don’t have a policy, get started with a simple policy template in the Mission-First AI Starter Kit!
Bonus definition: Anonymity is the inability of others to link information about a data subject with their identity. If you are anonymous, people may still be able to see what you’ve done or said, for example, but they will not be able to associate that speech or behavior with your name. Anonymity is one method of securing your privacy.
How is your LLM using your data?
People worry (understandably!) about their privacy when using LLMs. Let’s talk about the ways that privacy plays out with LLMs.
The main fear clients raise with me about LLMs and data is the idea that the LLM is sucking up all the data you give it in prompts and attached documents, storing that data on its servers (where it may or may not be secure), and using it to inform the answers it gives other users.
Here’s the reality of the situation. As of this writing, the largest American LLMs (ChatGPT, Claude, Co-Pilot, and Gemini) encrypt your data by default, so it is secure in storage and in transit. It is unlikely that an LLM will hand out names and Social Security numbers: those would not be an appropriate answer to any query these LLMs are allowed to answer. (The examples I have seen of LLMs spitting out personally identifiable information have later been proven to be hallucinated, fake information.) However, there are people trying every day to get around the safeguards programmed into LLMs to keep them from answering sensitive questions like that.
Also, personally identifiable information is not the only sensitive information you have access to. An LLM could very well suggest your organizational strategy to a competitor, or decide your grant idea is a perfect fit for someone else. Now, will that other user pick up on that answer and act on it as fast as you can? Probably not. But you still might not want that information out there.
If you are worried about this, you need to check whether the specific LLM you are using, for the account type you have, reuses prompts as training data.
As of this writing, you can turn off prompt reuse in ChatGPT (by disabling "Improve the model for everyone" in your settings) and Gemini (by turning off the Gemini Apps Activity setting in your dashboard). Microsoft says that Co-Pilot does not reuse prompts. Claude will use a specific prompt for fine-tuning if you give it a thumbs up or down, but does not reuse prompting data by default. I (using my own privacy preferences) generally consider these approaches safe.
Grok (xAI/Twitter’s chatbot) does reuse “your content and interactions with Grok (e.g., prompts, searches, and other materials you submit) along with Grok's responses to train our models.” Chinese AI tools (e.g., DeepSeek, ChatGLM, and others) use your data extensively. I (again, personal opinion) consider these approaches not safe.
An under-appreciated privacy concern with LLMs is whether they are using data from outside your chats to personalize your experience. Gemini and Co-Pilot currently use data from your Google and Microsoft accounts, respectively. This is not my personal preference, but I suppose if I was OK with them having the data when I put it on my account in the first place, my complaint rings a bit hollow.
How to get an answer from an LLM without sacrificing your data
If you are using an LLM with an unsafe approach, if you don’t have admin access to the reuse settings described above, if things change, or if you simply don’t trust what the company says about data reuse, you may want strategies for getting useful answers without sharing key data. You can write prompts that help you get useful information from suspect LLMs by using protective prompting or by splitting up your prompts.
Minimizing and anonymizing sensitive data. If anonymization is enough security for you, you can preserve your data privacy by:
1) deleting unnecessary, identifiable information,
2) replacing names and other identifiable features that you need to refer to with fake ones, and
3) using “find and replace” in your word processor to restore the correct information once you have the LLM’s answer (a small script can automate this; see the sketch below).
The key here is making sure that your LLM will not reword things. If the private data is a name, this is fine— you can ask it to write an email to Sally instead of Sarah, and you will be able to find and replace.
However, if what you are replacing isn’t a proper name, the LLM might say it a little bit differently, and find and replace will not work. To try to prevent this, put the word or phrase in quotes, specifically request that it not be referred to any other way, and ask the LLM to return a document. The goal of having your response in a document is that, when you read the answer (which you must do anyway), if you see it has rephrased something, you can ask it to fix it and it won’t regenerate the entire response.
When I tested it today, Claude and ChatGPT happily created a document, Gemini gave me Markdown (which you might find more annoying to work with), and Co-Pilot just responded in chat but treated the response as a document. All were able to execute small tweaks without regenerating all the text. If I leave out the “document” request, I get worse results that the LLM can’t edit the way we want.
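If you do this often, a tiny script can make the swap-and-restore steps less error-prone than doing them by hand. Here is a minimal Python sketch of the idea; the names and organizations in it are made up for illustration, and you would keep your real mapping somewhere private.

```python
# Minimal sketch: swap real names for fake ones before prompting,
# then restore them in the LLM's response. All names are hypothetical.

aliases = {
    "Sarah Jones": "Sally Smith",
    "Riverbend Food Bank": "Lakeside Pantry",
}

def anonymize(text: str) -> str:
    """Replace real names with their fake stand-ins before sending text to an LLM."""
    for real, fake in aliases.items():
        text = text.replace(real, fake)
    return text

def restore(text: str) -> str:
    """Reverse the substitution in the LLM's response."""
    for real, fake in aliases.items():
        text = text.replace(fake, real)
    return text

draft = "Write a thank-you email to Sarah Jones for her gift to Riverbend Food Bank."
safe_prompt = anonymize(draft)      # paste this into the LLM
# llm_response = ...                # whatever the LLM returns
# final_text = restore(llm_response)
```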
Compartmentalizing your prompts. Unlike names, strategies and ideas cannot be anonymized. If you are worried about an LLM giving away your bigger ideas, you may still be able to get an LLM to help you with some parts of your work. The key here is making sure that the LLM is not connecting pieces of your idea together.
LLMs are most powerful when they have as much context as possible, but if you are worried about an LLM internalizing your idea, you want to split it up. Ask about small parts of your project at different stages (e.g. brainstorming approaches, reviewing copy, or evaluating ideas) in separate chats, either with all personalization settings turned off or in separate LLMs altogether.
For example, I have a side project where I write short stories and add annotations to help English language learners build their vocabulary. When I was in the planning stage, I could have asked Claude “Please give me some ideas for brand names centered around learning, adventure, and vocabulary,” then turned to ChatGPT and asked “I’d like some plot ideas for a series of short stories involving travel,” and gone back to Claude to ask “I would like advice about marketing vocabulary-building books on Amazon.” I won’t get as good information as if one model had all the context for a story saved in a project, but I can still get some leverage out of it. (Note that once your idea is on the Internet, it is accessible to LLMs, but of course it is also accessible to everyone else, so it is probably not that private :)
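If you (or someone technical on your team) use the providers’ APIs instead of the chat apps, the same compartmentalization is easy to see in code: each request is a single, self-contained turn with no shared history, so no one provider receives the whole plan. Here is a minimal sketch, assuming the official openai and anthropic Python packages with API keys set in your environment; the model names are examples and will change over time.

```python
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask one provider about branding, with no other project context.
branding = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",  # example model name; check current docs
    max_tokens=500,
    messages=[{"role": "user", "content": "Please give me some ideas for brand "
               "names centered around learning, adventure, and vocabulary."}],
)

# Ask a different provider about plots, again with no shared history.
plots = openai_client.chat.completions.create(
    model="gpt-4o",                    # example model name; check current docs
    messages=[{"role": "user", "content": "I'd like some plot ideas for a "
               "series of short stories involving travel."}],
)

print(branding.content[0].text)
print(plots.choices[0].message.content)
```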
Aggregate your data. This solution can be a bit technical, but if you would like to use AI on your donor database, you can use (or hire someone to use) differential privacy techniques, which let you get insight out of the data through a custom AI product without anyone being able to see or deanonymize individual records.
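To give a flavor of what differential privacy does under the hood, here is a minimal Python sketch of its simplest building block, the Laplace mechanism, applied to an aggregate count. The numbers are made up, and a real deployment involves much more care (privacy budgets, sensitivity analysis, and a vetted library or vendor).

```python
import numpy as np

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """Return a differentially private version of a count query.

    A count has sensitivity 1 (adding or removing one donor changes it by at
    most 1), so Laplace noise with scale 1/epsilon gives epsilon-differential
    privacy: the noisier the answer, the less it reveals about any one person.
    """
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical aggregate: donors who gave more than $500 last year.
print(round(dp_count(812, epsilon=0.5)))  # noisy, but still useful in aggregate
```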
When do you want the LLM to have your data?
If you want your LLM to personalize its responses to you, if you would like to give feedback, or if you want your work to be discoverable through the LLM, it will need your data.
Personalization can be really helpful in some cases. If you use it to plan things for yourself, like travel, it can be pretty helpful for it to know what you like and don’t. If you use it to help you with your interpersonal problems (BE CAREFUL), it can really benefit from the history and learning about your strengths and weaknesses. If you want it to consistently respond from an economic, management, HR, or marketing perspective, it’s nice that it knows to do that without having to explain repeatedly what you care about.
I do not turn on personalization, because I don’t use it for personal stuff, and most of the useful features of personalization I can get by creating project folders and using them consistently. I generally don’t want my projects bleeding into each other, and I can set custom instructions manually where I want global personalization. Perhaps I’ll write posts about how I use projects and custom instructions in the future!
You can learn more about how personalization works in Claude, ChatGPT, Gemini, and Co-Pilot. It is worth reading up on these, because some of them can use data outside of the chatbot to customize.
If you want to give your LLM feedback that it listens to, you need to share your data to some extent. Claude has this built in, in the form of thumbs up and down icons below its responses, so you can give feedback without having to otherwise open up your data reuse settings.
You might want LLMs to improve your discoverability online. Increasingly, people are using LLMs instead of search engines to find the answers to their questions online. You might want your project to be one of the answers they get. If you sell something and would like your product or service to be discoverable through that large language model, you want the LLM to have, and even train on, some of your data.
Keywords for this are “SEO for LLMs,” “LLMO” (Large Language Model Optimization), and “brand visibility.” If you want to learn more about what an LLM knows about your brand, log out, create a blank chat, and ask some questions. Start with broad questions, like “What can I do to [solve the problem my brand solves]?” and “What are some [product type]s?” before you ask directly, “What do you know about [my brand]?”
I hope this post was helpful for your understanding of privacy and security in LLM use! Let me know if there’s anything I missed or can dig into in future posts!
—
LLM disclosure:
I used Answer with AI, the search summary feature in Brave Search, to help find links about the privacy features and policies of different LLMs. I do not trust the AI summaries, but I sometimes use them to get links to sources, and I am confident in my ability to assess the sources. This is a good case for that because it is straightforward for it to return links to official documentation.
I tested Claude, ChatGPT, Gemini, and Co-Pilot for the ability to make small edits without regenerating all the text. I started with:
“What does it mean when a squirrel chirps at you? Please respond with a document.”
When Gemini and Co-Pilot responded with things that didn’t look like the documents I expected, I prompted them to change one word:
“Can you change this to replace the word "agitation" with "activation?"”
Then, I tested to make sure that the “please respond with a document” clause of my original prompt was still necessary by opening a new chat and asking only:
“What does it mean when a squirrel chirps at you?”
and then asking it to change a single word again (this did not work, leading me to believe that “please respond with a document” is still necessary).