Episode 175 — Data Provenance and Privacy: Personal Privacy and the Rise of AI
Artificial intelligence is not new. But AI, now an acronym in everyday use, dominates markets, politics, industry, and our attention. And its use affects personal privacy.
Let’s take a couple of examples. Bathsheba was the mother of Solomon in Torah and biblical days; Solomon’s father was King David. If you ask Google’s Gemini what Bathsheba’s ethnicity was, you’ll get an answer that this is uncertain but that she was probably Hebrew. We can’t ask her, because she died about three millennia ago. But what does Gemini base its answer upon that indicates “probably Hebrew”? Asking that question is asking about data provenance. And if we were asking about a living person and who that individual’s parents are - for example, if adopted - that’s a very private matter. Suppose the data says a major politician is someone’s mother or father, and the individual denies it. You can see how privacy is very much at stake when a dataset is being created and then used to provide information.
So what is data provenance? Think about it as the genealogy of information.
Where did the information originate? How was it used and misused and transformed? How do we know if data is reliable and trustworthy or whether it is disinformation or hallucination? This is where the provenance of a dataset comes in.
Datasets are the lifeblood of AI. Garbage in, garbage out, as the computer age taught us - and misinformation in, disinformation out. This can have major consequences as AI takes over more and more of the roles humans used to play when oral transmission, and later hard copy, was the way information was shared, stories were told, myths were created, and science progressed. In the digital age, it all happens much faster. As AI expands, so does the risk that datasets will be unreliable and invade people’s personal privacy.
Hence the need for standards of data provenance - rules and standards for determining the lineage of information and the reliability of a dataset. Otherwise, AI will be misused, create harm, and improperly invade the privacy and reputations of individuals.
So how are standards for the provenance of data and datasets being created?
Government regulation is one approach. But here governments are still catching up with technological change. There is no gold standard, or perhaps any colorable standard, set by any government about data provenance.
There are some efforts under way from the tech and business world to create data provenance standards. One is the Data Provenance Initiative, powered by Cohere, which aims to track and trace data sources within datasets. Almost 2,000 datasets have been audited and traced to determine the provenance of the data they hold. Picture a variety of techniques for doing this, so we can measure the origin, accuracy, and reliability of the data that feeds AI. Check out https://dataprovenance.org.
Another major effort is by the Data & Trust Alliance, a nonprofit group of major western companies, including Walmart, Deloitte, Humana, Pfizer, Mastercard, and others. The goal is to produce voluntary standards for data provenance. Think of this like voluntary safety standards in other fields: the UL label on electrical products, which is not a government mark, or food safety standards - labels about whether food is organic, or whether fish is farmed or wild, endangered or not.
In the next episode, we’ll talk with representatives of the Data & Trust Alliance about standards announced in June 2024 that cover:
Data type, source, legal rights, privacy and protection, lineage, generation date and method, intended use, and restrictions.
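To make that list concrete, here is a minimal sketch of what a machine-readable provenance record covering fields like these could look like. The field names and structure below are illustrative assumptions for this podcast’s audience, not the Alliance’s actual published schema.

# Illustrative sketch only: field names are assumptions, not the
# Data & Trust Alliance's actual published standard.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class ProvenanceRecord:
    """A hypothetical metadata record describing where a dataset came from."""
    data_type: str                      # e.g., "text", "tabular", "image"
    source: str                         # origin of the data (publisher, URL, organization)
    legal_rights: str                   # license or terms governing use
    privacy_and_protection: str         # how personal data is handled, if any
    lineage: List[str] = field(default_factory=list)   # prior datasets and transformations
    generation_date: Optional[date] = None             # when the data was created
    generation_method: str = ""         # collected, scraped, synthetic, etc.
    intended_use: str = ""              # purpose the data was assembled for
    restrictions: List[str] = field(default_factory=list)  # prohibited uses

# Example: a record for a hypothetical news-article dataset.
record = ProvenanceRecord(
    data_type="text",
    source="Example News Archive (hypothetical)",
    legal_rights="CC BY 4.0",
    privacy_and_protection="Author bylines removed before release",
    lineage=["raw crawl 2023-01", "deduplicated 2023-03"],
    generation_date=date(2023, 3, 15),
    generation_method="web crawl",
    intended_use="language model training",
    restrictions=["no re-identification of individuals"],
)
print(record)

The point of such a record is that anyone downstream can ask the provenance questions raised above - where did this come from, who consented, and what uses are off limits - without guessing.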
Privacy aspects are an essential part of setting standards. Can AI use personal data of individuals gathered without their consent? Can AI use personal identities in formulating datasets about a mass of people? How reliable is data that cannot be traced definitively back to someone who grants consent and has the ability to correct or delete such data? What happens if personally identifiable information is included in a dataset and is then used to broadcast or leak that information, allowing someone to take malicious or unwanted action against that individual? Does anyone truly have a right to be forgotten? A right not to be doxxed? A right not to be maligned because of one’s identity, race, national origin, or religion?
Tune in next week to Episode 176 as we dig deeper into data provenance and privacy.
The first 155 episodes of Data Privacy Detective can be found on the feed of the Frost Brown Todd Podcast. You can listen on Apple Podcasts (https://apple.co/3IrHUTg), Spotify (https://bit.ly/49XRU2k), or Soundcloud (https://bit.ly/3T8EWrw).