This is more or less an understood fact, even with models like Copilot & ChatGPT. Given the sheer volume of data being churned through, not all PII is going to get scrubbed. And these LLMs may well be trained on unsanitized data - a cached copy of the web from Archive.org, Getty Images & the like.
I feel this is an unavoidable consequence of using LLMs. We cannot ensure all data is free of identifying markers. I'm not an expert on databases/data engineering, so please take this as an informed opinion.
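As a rough illustration of why scrubbing can't be guaranteed: here's a minimal Python sketch of the kind of regex-based PII filter a data pipeline might apply (the patterns and names are hypothetical, not any real pipeline's). It catches the obvious forms and misses trivially obfuscated ones.

```python
import re

# Illustrative regex-based PII scrubber -- patterns are deliberately
# simple and nowhere near exhaustive.
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),       # plain email addresses
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),   # US-style phone numbers
]

def scrub(text: str) -> str:
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

# Catches the obvious form...
print(scrub("Contact jane.doe@example.com or 555-123-4567"))
# ...but an obfuscated marker sails straight through unredacted:
print(scrub("Contact jane dot doe at example dot com"))
```

At web-crawl scale, every pattern you don't anticipate is PII that stays in the training set.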
Copilot has a ton of well-publicised examples of verbatim code showing up in completions, but I didn't realize it was as trivial as this to go plumbing for it directly.
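For anyone curious what "plumbing for it" looks like mechanically, here's a minimal sketch of one common heuristic: checking model output against a suspected source file for long verbatim n-gram overlaps. All names here are made up for illustration; this is not Copilot's API or any particular paper's method.

```python
def verbatim_ngrams(model_output: str, source_code: str, n: int = 12) -> set[str]:
    """Return n-token spans that appear verbatim in both texts.

    Long shared spans (12+ tokens) are very unlikely to arise by chance,
    which makes them a cheap signal for memorised training data.
    """
    def ngrams(text: str) -> set[str]:
        tokens = text.split()
        return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    return ngrams(model_output) & ngrams(source_code)

# Hypothetical usage: compare a completion against a file you suspect it copies.
# overlaps = verbatim_ngrams(completion_text, open("suspected_source.c").read())
# if overlaps:
#     print("verbatim spans found:", len(overlaps))
```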