Today we’ll look at the basics of the data involved in understanding where customers come from. Like our last question, this subject can be complex, but we’ll keep our focus on doing the basics well. Any company can build complex growth systems, but finding a company that has had clean data from the start is surprisingly rare.
Channel-Level Data Mapping
The effort to reach customers is generally described with terms like “acquisition,” “top of funnel” or “lead gen(eration).” At Yield, we call the data associated with acquisition “channel-level data mapping,” which is a fancy way of saying, “what data can we collect about the channels where we reach customers (and they reach us)?”
The most basic level of defining customers is grouping them into what we call “personas.” Sometimes these are called customer profiles or archetypes and often they are grouped into what marketers call “cohorts.” The basic concept is the same: customer personas are descriptions of the most important and/or most common types of customers your company serves.
The focus in channel-level data mapping is often called “first-touch attribution,” which we’ll discuss in more detail in a future post, but for now the point is this: analyzing your customers’ first interaction with you is critical to understanding which channels, or sources, lead to the most customers.
Even The Masters Had To Master The Basics
I would guess most readers of this post are already doing some sort of channel-level data mapping or at least have a functional understanding of how to use that information in Google Analytics, etc. For many working in growth roles, this is 101-level stuff.
Being 101-level material, though, doesn’t mean that everyone has mastered the basics. In my experience, even though lots of companies and growth teams are doing some form of channel-level data mapping, few are doing it really well, or in a way that will scale.
Don’t get me wrong, any attribution is better than none, but clean first-touch data is critical for sustainable growth and, most importantly, it’s really, really easy to set up a data structure that becomes unmanageable at scale.
Secondarily, and this is something most companies don’t do, channel-level data-mapping is the first step to take in validating hypotheses about personas at the top of the funnel (which we’ll discuss in detail below).
With that said, let’s dig in.
What Is Channel-Level Data Mapping?
As I said before, collecting first-touch data can be very complicated and needs can vary depending on the type of business. Even so, the basics of channel-level data mapping are simple: we want to capture as much data as possible about the first trackable interaction we have with a customer. We’ll spend most of our time in this article looking at basic digital tracking to reinforce this concept (we’ll cover device fingerprinting and stitching in future posts).
The core tactic used for first-touch attribution is link tagging. Most readers will be familiar with this concept, but for the sake of completeness, I’ll explain it quickly for those who aren’t.
Link tagging is the process of appending information to the end of a URL via query string parameters. Query string parameters sound fancy, but they’re really simple: they are key-value pairs added after a “?” in a URL, which makes it easy to send and receive data through the URL itself.
Here’s a really simple use case: let’s say I want to post this article on LinkedIn, then see how many people come to the site to read it. I can use a query string parameter appended to the end of the post URL to explicitly tell Google Analytics where I posted the link (and where inbound traffic from the link came from). The URL could look something like this:
http://yieldgroup.xyz/blog/post?utm_source=LinkedIn
Query string parameters are infinitely flexible—you can name them whatever you want. In this case, we’re using a parameter called utm_source.
Simple enough, right?
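For anyone who would rather build tagged links in code than by hand, here is a minimal sketch using Python’s standard library. The helper name tag_link is my own invention for illustration; it simply appends parameters to a URL and keeps any that are already there.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url: str, **params: str) -> str:
    """Return `url` with the given query string parameters appended."""
    parts = urlsplit(url)
    # Keep any parameters already on the URL, then add (or overwrite) ours.
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = tag_link("http://yieldgroup.xyz/blog/post", utm_source="LinkedIn")
print(tagged)  # http://yieldgroup.xyz/blog/post?utm_source=LinkedIn
```

Building links with a helper like this, instead of typing them, is one small way to avoid typos in parameter names.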
How Query String Parameters Are Used
Let’s look quickly at the most common ways query string parameters are used for channel-level data mapping.
Aggregate Analytics Tools
The first use case for query string parameters is almost always aggregate analytics that help businesses understand the effectiveness of their acquisition efforts. Returning to the example above, if I tag all of the blog article links I post on LinkedIn, I’ll be able to view the behavior of visitors from that source (or channel).
That becomes extremely valuable when you think about being able to answer questions like, “what is the bounce rate for visitors from LinkedIn? How long do they spend on the site? Do visitors behave differently based on the time of day the link is posted?” You can’t see data on individual users in free aggregate analytics tools like Google Analytics, but they are excellent for answering detailed questions about groups of users—like visitors from LinkedIn.
In other words, using query string parameters in a URL can allow growth teams to understand whether visitors from a certain channel are high quality or not.
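To make “answering questions about groups of users” concrete, here is a rough sketch of the kind of grouping an aggregate tool does for you. The session records and the “Email” source are made up for illustration, and I am assuming a bounce means a single-page session; Google Analytics calculates this for you, so treat this only as a picture of the idea.

```python
from collections import defaultdict

# Hypothetical session records: (utm_source, pages_viewed).
sessions = [
    ("LinkedIn", 1),
    ("LinkedIn", 4),
    ("Email", 1),
    ("LinkedIn", 1),
    ("Email", 3),
]


def bounce_rate_by_source(records):
    """Return the share of single-page sessions for each source."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for source, pages_viewed in records:
        totals[source] += 1
        if pages_viewed == 1:  # assumption: one page viewed counts as a bounce
            bounces[source] += 1
    return {source: bounces[source] / totals[source] for source in totals}


rates = bounce_rate_by_source(sessions)
for source, rate in rates.items():
    print(f"{source}: {rate:.0%} bounce rate")
```

None of this works unless the source was tagged on the link in the first place, which is the whole point of tagging.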

Lead Capture Mechanisms
One age-old practice in growth is populating hidden fields on a form with query string parameters so you can append first-touch attribution data directly to a contact record in your CRM, marketing automation tool, spreadsheet, etc.
More and more, though, tools are automating this process in the context of better user experience. Tools like Drift, Autopilot, Sumo, etc. make capturing a visitor’s referring URL and specific query string parameters (and adding them to their contact record) an almost completely hands-off process, which is great.
Even better, most growth teams now have the ability, within the walls of new software tools, to keep data on anonymous visitors and later append it to a contact record when the person does provide their email or another unique identifier. (Leveraging data points for anonymous users across a martech stack is another post for another day…)
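The hidden-field approach boils down to reading the parameters off the landing URL and attaching them to the contact. Here is a minimal sketch of that idea, assuming a plain dictionary stands in for the contact record; the function names and the example email address are hypothetical and not tied to any particular CRM or tool.

```python
from urllib.parse import parse_qs, urlsplit


def first_touch_fields(landing_url: str) -> dict:
    """Pull every utm_* parameter off the landing URL."""
    query = parse_qs(urlsplit(landing_url).query)
    # parse_qs returns a list of values per key; keep the first one.
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


def build_contact(email: str, landing_url: str) -> dict:
    """Combine a submitted email with first-touch data (a stand-in for a CRM record)."""
    contact = {"email": email}
    contact.update(first_touch_fields(landing_url))
    return contact


contact = build_contact(
    "jane@example.com",
    "http://yieldgroup.xyz/blog/post?utm_source=LinkedIn&page=2",
)
print(contact)
```

Parameters that are not part of the tagging scheme (like page in this example) are left out of the contact record.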

User-Level Analytics And Dynamic Content Tools
Last, but certainly not least, first-touch attribution data can be extremely valuable when viewed and acted upon on a user level. Tools like Mixpanel, Amplitude, Adobe Analytics and others allow you to see where specific users came from on a channel level, analyze their behavior, collect them into cohorts, and continue to track their behavior through their entire customer journey—which is your first step towards a multi-touch attribution model for your business.
Additionally, query string parameters are often used to dynamically change content on a site for specific users. For example, if I know a particular user is interested in a certain product, I can change the header image of the site to show that product (as opposed to something ‘random’). If you have enough clean data, you can actually create a completely customized user experience, user-by-user, which is wild (but far more common than most marketers would think).
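As a rough illustration of the header image example, the logic can be as simple as a lookup keyed on a query string parameter. The parameter name product, the image paths, and the fallback are all assumptions I made up for this sketch; dedicated personalization tools do this with far more sophistication.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from a product interest to a header image.
HEADER_IMAGES = {
    "product_a": "/img/header-product-a.jpg",
    "product_b": "/img/header-product-b.jpg",
}
DEFAULT_HEADER = "/img/header-default.jpg"


def header_image_for(url: str) -> str:
    """Pick a header image based on the (hypothetical) `product` parameter."""
    query = parse_qs(urlsplit(url).query)
    product = query.get("product", [""])[0].lower()
    # Fall back to the default image when the value is missing or unknown.
    return HEADER_IMAGES.get(product, DEFAULT_HEADER)


print(header_image_for("http://yieldgroup.xyz/?product=Product_A"))
print(header_image_for("http://yieldgroup.xyz/"))
```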

Best Practices For Using Query String Parameters
The flexibility of query string parameters is one of their greatest strengths, but it’s also one of their greatest weaknesses. You can technically name them anything you want, but if you’re using Google Analytics (which the vast majority of companies are), the best practice is to use Google’s five ‘out of the box’ parameters, which are:
- utm_source
- utm_medium
- utm_campaign
- utm_content
- utm_term
Why would we box ourselves into 5 parameters when we have so much more flexibility? Well, for the use case of tagged links in customer acquisition, using Google’s params ensures that we always capture acquisition data at an aggregate level in Google Analytics, which gives us the ability not only to report on it, but also to leverage that data in other critical tools like tag managers, ad platforms, etc. that are bread-and-butter for growth teams.
There are certainly other ways to do this and other theories on best practices, but having seen this done (and done it myself) on both a small and extremely large scale, I can say with confidence that this is the best practice for a vast majority of companies (in the acquisition use case).
Ok, now that that’s settled, it’s time to warn you that…
…This Is Where Things Get Tricky
The example I used above is simple enough because utm_source is at the highest level of hierarchy in the schema and top-level channel names don’t really vary or have iterations over time.
As you move down the schema, though, it’s a different story. Let’s look at a quick example. Here’s a breakdown of a link someone might use to capture acquisition data:
- URL: http://yieldgroup.xyz/awesome-landing-page
- utm_source: LinkedIn
- utm_medium: Sponsored_Content
- utm_campaign: Useless_Whitepaper_Download_v2
- utm_content: Awesome_Growth_Stats_Robot_Image
- utm_term: {keyword}
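Assembled into a single link, the breakdown above can be sketched like this in Python. One assumption worth flagging: I am treating {keyword} as a placeholder that the ad platform fills in later, so the braces are deliberately left unencoded.

```python
from urllib.parse import urlencode

base_url = "http://yieldgroup.xyz/awesome-landing-page"
utm = {
    "utm_source": "LinkedIn",
    "utm_medium": "Sponsored_Content",
    "utm_campaign": "Useless_Whitepaper_Download_v2",
    "utm_content": "Awesome_Growth_Stats_Robot_Image",
    "utm_term": "{keyword}",
}

# safe="{}" keeps the braces literal so the placeholder survives encoding.
tagged_url = base_url + "?" + urlencode(utm, safe="{}")
print(tagged_url)
```

The result is one long URL with all five parameters joined by “&” after the “?”.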
The goal here is testing whether we can capture leads by offering a white paper download through the Sponsored Content medium on LinkedIn. Viewed in isolation, this tagging setup would work fine for quick “growth hacking,” but now imagine that you are running 10 different ads across 3 campaigns. How about 100 different ads across 15 campaigns? How about 5,000 ads across 75 different campaigns?
Anyone who’s faced this problem has felt the pain of unsustainable spreadsheet and reporting hell. This is what I was referring to earlier in the post when I said that many growth teams do this, but few do it in a way that scales with clean data.
There are several ways to build a good schema and tag links in clever ways, which we’ll discuss in a future post, but here’s a quick look at some of the first-touch data that the best growth teams are capturing:
- Channel (or source)
- Medium within that channel (the actual format of content)
- Content attributes
  - Image
  - Copy (headline, subhead, body, etc.)
  - Product or service (what you are selling)
  - Call to action and/or offer
- Audience attributes
  - Persona (see what we did there?)
  - Geotargeting
  - Demographic information
  - Behavioral information (if retargeting, for example)
- Campaign attributes (especially for paid)
  - Internal name/category/etc. for the growth effort
  - Type (e.g., branded, competitor, etc.)
- Testing attributes
  - Test type
  - Test variation
Hopefully that list leads you to the conclusion that achieving clean, reportable first-touch data is more than tagging—it requires an actual data architecture. If you’re asking, “how in the world do I do that,” you’re in luck—we’ll talk in more detail about query strings in a future post.
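To give a feel for what “data architecture” can mean in practice, here is one possible sketch, not a recommended schema: a handful of the attributes above are packed into a single utm_content value with a fixed field order and a controlled vocabulary. The class name, the allowed values, and the “-” separator are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies; a real team would define its own.
ALLOWED_PERSONAS = {"founder", "marketer"}
ALLOWED_TEST_TYPES = {"image", "headline", "cta"}
SEPARATOR = "-"


@dataclass(frozen=True)
class ContentTag:
    persona: str
    image: str
    test_type: str
    variation: str

    def __post_init__(self):
        # Reject values outside the agreed vocabulary before they reach a link.
        if self.persona not in ALLOWED_PERSONAS:
            raise ValueError(f"unknown persona: {self.persona}")
        if self.test_type not in ALLOWED_TEST_TYPES:
            raise ValueError(f"unknown test type: {self.test_type}")
        for value in (self.persona, self.image, self.test_type, self.variation):
            if SEPARATOR in value:
                raise ValueError(f"value may not contain {SEPARATOR!r}: {value}")

    def encode(self) -> str:
        """Pack the fields into one utm_content value in a fixed order."""
        return SEPARATOR.join((self.persona, self.image, self.test_type, self.variation))

    @classmethod
    def decode(cls, raw: str) -> "ContentTag":
        """Split a utm_content value back into its fields for reporting."""
        persona, image, test_type, variation = raw.split(SEPARATOR)
        return cls(persona, image, test_type, variation)


tag = ContentTag("marketer", "robot_image", "image", "b")
print(tag.encode())
print(ContentTag.decode(tag.encode()) == tag)
```

The design choice that matters here is less the specific format than the fact that every value is validated when the link is created and can be split apart again when it is time to report.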
The Holy Grail: Reporting At Scale For Rapid Optimization
Ok, now that we’re past some of the technical bits, let’s step back and look at the big picture.
The entire goal with channel-level data mapping is to be able to ask any question about your acquisition efforts, quickly get an answer, then take actions to optimize and increase performance. Said simply, this data helps growth teams know if what they are doing at the top of the funnel is working or not.
Clean data saves you the pain and time of munging data together in spreadsheets and gives you immediate access to the insights that will increase performance, which is the holy grail for any growth team.