From the AnthologyAI Blog

Listen, Don’t Eavesdrop, and Other Lessons Learned on Data Quality

May 7, 2024

“The worst lies are the lies we tell ourselves.” It is with great regret that I start here, with a deeply mis-contextualized quote by the author Richard Bach. 

The lie we’ve told ourselves as data practitioners - and the one we continue to tell ourselves - is that the headwinds we face from regulation and the demand for informed consent are suppressants and not accelerants.

In fact, the optimistic scenario - and the one that we’ve observed at Caden - is that informed consent for passive data collection works much like ethical research practice in the social sciences. It can create not only optimal data quality (in the absence of intermediaries, conflicts of interest, and natural degradation) but also a higher degree of inference and data exploration that makes analytics more meaningful. In clinical trials, sociology, and ethnographic research, this is table stakes; “taking time to build relationships before expecting research participants to consent and replacing informed consent with a negotiated agreement” is not only the more ethical approach but also the premise from which the best research - and, as such, the best analytics - is derived.

For the data-driven marketer, investor, or corporate analytics professional, the analog is easily found. Today’s model of data usage (setting aside provenance for a moment) is set up for failure in the context of signal loss. Let’s take a look at the dependencies and business models that box us in:

  • Signal Loss and Modeling: Digital advertising, and the data industrial complex that supports it, has long been handicapped by the need for scale in its business model (more on this later), operations, and value proposition. It could be argued that in the transition from mass-reach advertising to personalization, we forgot to personalize the media plan. We expect to reach everyone all the time, leveraging granular data to activate messaging and strategies that are the furthest thing from “fine-grained”. The twisted incentive that arises from this conflict is the desire for a supplier of data to generalize and expand audiences far beyond the reach of the analytics and truth sets we can access. In an environment where truth sets are doing nothing but shrinking and losing fidelity, applying the last decade’s strategies to today’s challenges is likely to do little but aggravate an existing data quality catastrophe where “37% of marketers waste spend as a result of poor data quality” and “30% have lost customers as a result” (Forrester Consulting). Scott McKinley, founder and CEO of data validation firm Truthset, puts it most clearly: “the current data ecosystem is built for scale, not accuracy.”
  • Supply Chain Degradation: The “data supply chain” has become a Gordian knot of complexity, dependencies, and questionable provenance, with data sourcing treated (conveniently) as “confidential information” as much for reasons of competition as reputation. All criticism aside, the fidelity loss we see from a source (a panel, an SDK, a cookie, a device ID) is linked more to the game of telephone played between the originator and the consumer of the data than to the source itself. Aggregations, differential privacy (intentional or not), modeling, and loose and “adapted” schemas accumulate across the value chain to the point that it becomes nearly impossible to identify fact data at the point of activation or attribution.
  • The Audience Segment: The real culprit of so many of our problems may in fact be the foundational concept of the audience segment, which far precedes the emergence of programmatic itself. To ease the operations of campaign management, sales processes, and tracking and verification (and scale itself), advertising has had to compress the full spectrum of data collection and provenance into roughly a few thousand loosely defined segments, themselves limited to a purely binary heuristic of audience membership. And if all the segments get smaller with the steady march of privacy compliance, and the only way to scale them is to model more - to oversample more - what could be more futile than continuing on the road more traveled? Already, we’re starting to see positive signs of stakeholders emerging from the innovator’s dilemma and challenging this; Ogury’s view of “personified” vs. “personalized” advertising might be a fantastic example of such a school of thought.
  • The “Cookie-Census”: To (mis)quote The Wizard of Oz: “pay no attention to the cookies behind the curtain.” For years, modeling and scale have largely been enabled through one key tool: cookies mapped to loose contextual media consumption data, providing limited information gain and even more limited predictive power. The appeal, of course, was ubiquity - digital publishers, social login providers, and DSPs were happy to oblige. Now, however, with this brute-force instrument increasingly deprecated by regulation and compliance (and the fact that it wasn’t a great business in the first place), there is a vacuum of model features at the worst possible moment.
  • Yield: The business of the internet gets talked about much less than the features thereof. The summation of all the points above affects publishers uniquely - what marketers think of as “signal”, publishers view as “addressable impressions”. With the lost yield of high-fidelity audience-targeted impressions often making the difference between survival and bust, the only way to avoid an unwinnable war is to reset the rules of engagement. That’s what inspires us here at Caden to work for a fairer data economy, every day.
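To put rough numbers on two of the points above (fidelity lost across supply chain handoffs, and accuracy diluted when a verified audience is modeled out for scale), here is a minimal back-of-the-envelope sketch in Python. Every figure in it is a hypothetical assumption chosen for illustration, not a measurement from Caden, Forrester, or Truthset, and the function names are ours. It also assumes each handoff degrades data independently, which real pipelines may not do.

```python
def fidelity_after_hops(per_hop_retention: float, hops: int) -> float:
    """Share of records still accurate after `hops` handoffs.

    Assumes (hypothetically) that every handoff independently keeps
    the same fraction `per_hop_retention` of accurate records.
    """
    if not 0.0 <= per_hop_retention <= 1.0:
        raise ValueError("per_hop_retention must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_retention ** hops


def blended_precision(seed_size: int, seed_precision: float,
                      modeled_size: int, modeled_precision: float) -> float:
    """Overall accuracy of a segment made of a verified seed plus a
    modeled (lookalike) expansion, weighted by audience size."""
    total = seed_size + modeled_size
    if total <= 0:
        raise ValueError("segment must contain at least one member")
    correct = seed_size * seed_precision + modeled_size * modeled_precision
    return correct / total


if __name__ == "__main__":
    # Hypothetical: five intermediaries, each preserving 90% of accuracy.
    print(f"Fidelity after 5 hops: {fidelity_after_hops(0.9, 5):.2f}")

    # Hypothetical: 10,000 verified members at 90% accuracy,
    # scaled 10x with 90,000 modeled members at 40% accuracy.
    print(f"Blended precision: {blended_precision(10_000, 0.9, 90_000, 0.4):.2f}")
```

Under these made-up inputs, a chain that looks healthy at every individual step still loses a large share of its accuracy end to end, and the modeled expansion, not the verified seed, ends up setting the quality of the segment.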

Here at Caden (launching early March - download the app on the iOS App Store), we believe informed AND explicit consent is the greatest opportunity for data quality yet: a single-source, passively disclosed dataset with full semantic context. More to come next week…

“Privacy means people know what they’re signing up for, in plain language, and repeatedly. I believe people are smart. Some people want to share more than other people do. Ask them.”