AI Is Making Therapy Transcript Re-Identification Easier: Part 3 of 3

This is Part 3 of 3 in a series about the risks of retaining therapy session transcripts. You can read Part 1 here (on "de-identified" transcripts) and Part 2 here (on "de-coupled" transcripts).

So far we've talked about how transcripts are "de-identified" and "de-coupled" -- and why neither of those words means quite what they sound like.

That leaves the question we've been circling the whole time: what can still be figured out anyway?

That's the re-identification risk. And it's the part of this conversation that AI is making harder and harder to ignore.

Re-identification is the process of taking supposedly de-identified or de-coupled data and reconnecting it back to its source -- a specific client, a specific therapist, a specific practice, a specific family, a specific workplace, a specific community.

The risk isn't only what the transcript says directly. The risk is what someone (or something) can piece together from the details that remain.

And the more useful a transcript is for AI training, the more real-life detail it tends to contain. The richness is the whole point. It's what makes the transcript valuable. It's also what makes it risky.

Remember from Parts 1 and 2 -- de-identification can remove the obvious identifiers like names and numbers, and de-coupling can remove the direct database relationship. But these processes don't remove context, meaning, speech and language patterns, the transcript content itself, or clues spread across multiple sessions. So de-identification and de-coupling can make the data less directly connected to an individual. But they don't automatically make the person unrecognizable.

And AI and related technology are evolving faster than ever.

There is more searchable public information about people than there has ever been -- local news, court records, property records, school websites, professional directories, LinkedIn, Facebook, Instagram, obituaries, online reviews, data broker profiles, archived web pages, and more. Combine even a "scrubbed" transcript with the right slice of public information, and the gap between "anonymous" and "oh, that's definitely him" can collapse pretty fast.

And here's the kicker:

AI is unusually good at connecting fragments. It's good at summarizing timelines, extracting relationships, comparing details across documents, inferring missing context, and finding patterns in messy, unstructured information.

This doesn't mean every transcript can easily be re-identified today. But it does mean that "we removed the obvious identifiers and the direct database link" is a much weaker promise than it was a few years ago.

These days, you can't shrug and say "no person is going to skim through these details to piece it all together," because a person doesn't need to do that at all -- AI can do it instead. AI can work on it 24/7, and its ability to do this is only getting better.

Circling back a bit, think about what tends to actually be in a transcript. A small community. A spouse with a specific job. A child involved in a public situation. A recent workplace departure. A custody dispute. A rare diagnosis. A specific therapist specialty. A local event.

None of these are Safe Harbor identifiers (as we discussed in Part 1). None of them require a direct database link to the client. But put a handful of them together and you've basically got a fingerprint.
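
To make the "fingerprint" idea concrete, here is a minimal sketch of how stacking ordinary details shrinks the pool of people a transcript could be about. Everything in it is made up for illustration: the 1,000-person population, the attribute names (`town`, `spouse_job`, `life_event`), and the helper functions are our own assumptions, not real data or any vendor's method. The population is also deliberately tidy (exactly one person per combination), so real-world numbers would be messier.

```python
from itertools import product

# Entirely synthetic: 1,000 made-up "people", one for every combination of
# 10 towns, 10 spouse occupations, and 10 life events. No real data is used.
TOWNS = [f"town_{i}" for i in range(10)]
JOBS = [f"job_{i}" for i in range(10)]
EVENTS = [f"event_{i}" for i in range(10)]

population = [
    {"town": t, "spouse_job": j, "life_event": e}
    for t, j, e in product(TOWNS, JOBS, EVENTS)
]


def narrow(people, clues):
    """Return only the people who match every clue collected so far."""
    return [p for p in people if all(p[k] == v for k, v in clues.items())]


def candidate_counts(people, clues):
    """Count the remaining candidates after each clue is added, in order."""
    counts = []
    seen = {}
    for key, value in clues.items():
        seen[key] = value
        counts.append(len(narrow(people, seen)))
    return counts


# Three details of the kind a "scrubbed" transcript might still contain.
clues = {"town": "town_3", "spouse_job": "job_7", "life_event": "event_1"}
counts = candidate_counts(population, clues)
print(counts)  # [100, 10, 1]
```

In this toy setup, no single clue identifies anyone, but three of them together leave exactly one candidate. That is the whole problem in miniature.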

And one session may reveal a few small clues -- but a year of retained transcripts from the same client may reveal a full history of that client's life events and details. (And since speech and language patterns are distinctive to each of us, linking transcripts together -- again, with AI -- doesn't sound too far-fetched!)
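
Here is a deliberately tiny sketch of the transcript-linking idea: comparing how often each speaker uses a handful of filler words. The word list, the three sample snippets, and the function names are all invented for illustration. Real stylometric tools use far richer features than this, so treat it as a toy under those assumptions rather than a description of how any actual system works.

```python
import math
from collections import Counter

# Hypothetical filler words chosen for this example only.
FILLERS = ["honestly", "basically", "like", "literally", "anyway"]


def filler_profile(text):
    """Count how often each filler word appears in a snippet."""
    words = Counter(text.lower().split())
    return [words[w] for w in FILLERS]


def cosine_similarity(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up snippets: A and B imitate one speaker's habits, C a different one.
session_a = "honestly i just feel like it was basically fine and honestly i dont know"
session_b = "honestly the week was like the last one and honestly that is like fine"
session_c = "anyway it was literally the worst week and anyway i am literally done"

same_speaker = cosine_similarity(filler_profile(session_a), filler_profile(session_b))
different_speaker = cosine_similarity(filler_profile(session_a), filler_profile(session_c))

print(round(same_speaker, 2))       # 0.87
print(round(different_speaker, 2))  # 0.0
```

Even with five words and a few lines of code, the two snippets written in the same voice score far closer to each other than to the third. A system with a year of transcripts to compare has a lot more to work with.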

Even if a de-identified, de-coupled transcript can't be re-identified today, that's not really the only question. The better questions are:

Could it be re-identified a year from now?
Could it be re-identified five years from now?

Because AI is getting better. Data-matching tools are getting better. Public records keep getting easier to search. Data brokers keep collecting more data, not less.

Lastly, retained transcripts can outlive the original product decision, the original privacy policy, the original leadership team, and even the original company. Companies get acquired. Privacy policies get rewritten. Founders leave. Startups shut down and sell their data to whoever's buying. The transcript a vendor de-coupled in 2026 with the best of intentions may end up in completely different hands by 2031.

So we also have to ask: Could this super-sensitive data be connected back to the client later, by someone we never agreed to share it with?

Therapy is where people say things they may not say anywhere else. The details are often personal, specific, painful, and unique to that one person's life. Those same details are exactly what makes the data valuable for AI training.

But why do we have to sacrifice privacy for that? Does it do the client any good, or can it only expose them to possible harm?

We understand how it helps train AI, even if we disagree with it. But let's all step back -- do we have to collect and store this data in the first place?

Thanks for reading the full 3-part series. If you have thoughts (or pushback!) we'd genuinely love to hear them -- jon@quilltherapysolutions.com.

Also, we've since published a more thorough post on this topic: why retaining de-identified therapy session transcripts is a bad idea, including how those transcripts can potentially be re-identified later. Give that a read next! Maybe pour yourself another cup of coffee?
