Hello! We're Pan, a Data Scientist, and David, an Analytics Engineer, here at Monzo. We both work in the marketing team and today we're going to run you through some of the exciting data work being done in this space.
The world of marketing data is a bit like a lasagne; rich and nourishing, but with many layers which can sometimes be hard to piece together. Today we’ll be giving you a flavour of the challenges that come with this, how we’ve tackled them and how we’ve tried to design a system that’s able to deal with future changes in the landscape, both from a Data Science and Analytics Engineering perspective.
Specifically, we’ll be digging into how we attribute value to the paid advertising we buy from Apple, Google, Meta and various other digital channels. We want to do this so we can measure the impact our ads have on user acquisition and promoted feature usage. This helps to inform our wider growth strategies as well as how we invest in this part of the business.
🎢 A brief account of Apple SKAd
Back in 2018, Apple launched a service that would go on to change the face of mobile advertising, but this wasn’t the type of software that had developers excited like children at Christmas. No, instead it had marketeers waking up from data-induced nightmares. That service was Apple SKAdNetwork.
SKAdNetwork is Apple’s answer to attribution data sharing in a world where user privacy is increasingly the top priority. This shift in perspective was realised in April 2021 when Apple’s App Tracking Transparency (ATT) feature was launched, allowing iOS users to opt out of data sharing, which the vast majority did, and forcing companies to turn to SKAdNetwork to understand the performance of their paid media spend.
When data is shared through SKAdNetwork, it prioritises user privacy by withholding information that could potentially identify individuals. This privacy-focused approach then further protects anonymity with the following:
Limited In-App Event Tracking: The conversion value is a 6-bit value configured by the app developers (us) to measure post-install activity and tie it back to the install. This means that, based on our current standard setup, we are limited to tracking 6 in-app events (the advanced setup, which could allow us to track up to 63 conversion events, was ruled out as user journeys in banking are not linear; see documentation for more detail).
The 24-Hour Window: After an install or any other conversion takes place, a 24-hour timer starts. During that period, if the user performs any other action that updates the SKAdNetwork conversion value, the timer resets for another 24 hours. This means that if a user has a gap of more than 24 hours between tracked events, we only track what they did before that gap.
Delayed and Anonymised Postbacks: Only when the conversion value hasn’t been updated for 24 hours is it locked, at which point a final timer starts. At some point during the following 0-24 hours the final postback is sent. The purpose of this is to protect user privacy, making it impossible for marketers to leverage user-level information (including IDFA) to target users.
Together, these features prevent apps from connecting the dots internally to deduce user identities. Finally, data transmitted through SKAdNetwork is aggregated upon delivery, adding an extra layer of protection while still providing the necessary information to attribute value to marketing campaigns.
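The first two mechanics above can be sketched in a few lines of Python. This is only an illustration: it assumes one bit per tracked event, in line with the standard setup described above, and the event names, bit positions and function names are hypothetical rather than a real configuration.

```python
from datetime import datetime, timedelta

# Hypothetical event names and bit positions (illustrative only).
EVENT_BITS = {
    "application_started": 0,
    "application_approved": 1,
    "application_completed": 2,
}

WINDOW = timedelta(hours=24)


def encode_conversion_value(events):
    """Pack event names into a 6-bit integer (0-63), one bit per event."""
    value = 0
    for name in events:
        value |= 1 << EVENT_BITS[name]
    return value


def captured_events(install_time, timed_events):
    """Return the events recorded before the rolling 24-hour timer expires.

    timed_events is a list of (event_name, timestamp) pairs. Each captured
    event resets the timer; the first gap longer than 24 hours locks the value.
    """
    captured = []
    deadline = install_time + WINDOW
    for name, when in sorted(timed_events, key=lambda pair: pair[1]):
        if when > deadline:
            break  # timer expired: the conversion value is locked
        captured.append(name)
        deadline = when + WINDOW  # an update resets the timer
    return captured


install = datetime(2023, 1, 1, 9, 0)
journey = [
    ("application_started", install + timedelta(minutes=10)),
    # Approval arrives more than 24 hours after the previous event.
    ("application_approved", install + timedelta(hours=30)),
    ("application_completed", install + timedelta(hours=31)),
]
seen = captured_events(install, journey)
print(seen, encode_conversion_value(seen))
```

In this made-up journey only the application start is captured, so the locked conversion value has a single bit set, even though the user went on to complete signup an hour after approval.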
⏲️ How SKAd impacts our data at Monzo
Considering the process of opening a bank account, there are scenarios where an application can be approved more than 24 hours after initiation.
Imagine 100 people completing their application journey through three signup stages tracked by SKAd: application started, approved, and completed.
Initially, most of them (95, say) would be reported by SKAdNetwork as having started an application, because the delay between installing the app and starting an application is short. As they progress, though, only around 80 of those 95 applicants would reach the approval stage within 24 hours. Delays can happen for various reasons, such as applicants not having the necessary documents to hand when completing signup.
Similarly, only about 60 of those 80 people would complete the application within the 24-hour limit after approval. Consequently, SKAdNetwork would report only 60 successful signups, even though all 100 people completed the journey.
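The arithmetic of this example can be laid out stage by stage. The counts below are the illustrative figures from the example (with 95 taken as the number reported at the first stage), not real data, and the stage names are placeholders.

```python
# Illustrative counts from the worked example, not real data.
installs = 100
stage_counts = {
    "application_started": 95,
    "application_approved": 80,
    "application_completed": 60,
}

previous = installs
for stage, count in stage_counts.items():
    # Share of the previous stage still visible to SKAd at this stage.
    print(f"{stage}: {count} reported ({count / previous:.0%} of previous stage)")
    previous = count

overall_share = stage_counts["application_completed"] / installs
print(f"Share of completed signups visible to SKAd: {overall_share:.0%}")
```

The losses compound across stages, which is why the final reported figure understates completed signups by far more than any single stage does.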
These nuances and challenges are essential to understand as they impact the way we accurately capture new users attributable to performance marketing. Our dedication to user privacy and a privacy-friendly framework underscores our commitment to responsible data handling and sustainable growth.
In the next section, we will explore how our performance marketing team at Monzo integrates adjustments to compensate for this data loss and ensure effective marketing strategies.
🔮 The complexities this adds for reporting and analysis
At Monzo, one of our key metrics when it comes to Performance Marketing is Cost of Acquiring Customer (CAC), which is defined at the point when a user’s application is successfully completed. Because of the data loss outside of the 24-hour window in SKAd, we often lose track of our users on Apple devices by the time they activate, meaning this metric consistently suggests that campaigns targeting Android outperform those targeting iOS.
Whilst we know that the SKAd numbers are under-reported, the question is “by how much?” Without knowing this, it’s hard to decide whether we should be investing more in acquiring Android users over iOS ones. Spending more money on Android is appealing because it’s easier to measure the return on investment, but we know that the majority of our customers are iOS users.
Things are further complicated by the fact that non-Apple media providers must use SKAdNetwork, whereas Apple itself provides an anonymised user-level dataset for Apple Search Ads. This means that comparisons of channel performance are slightly skewed.
And this is without even thinking about doing any simple attribution modelling ourselves. In the days before SKAd, integrating multiple marketing datasets to give a cohesive view of performance was a breeze. User-level data could be compared across sources, with methods such as last touch attribution (LTA) or multi touch attribution (MTA) applied to confidently assign value.
Now we have a serious headache when trying to accurately model the interactions between our other acquisition channels and performance marketing attributions from our partner Adjust. We simply don’t have the data at the correct granularity to consistently apply even basic logic for our reporting.
🔬 The science behind the adjustments
Our approach to compensating for data loss and adjusting reported personal and business account activations is straightforward yet effective. We utilise SKAd events but refine the SKAd event numbers by incorporating internal data on the percentage of applications approved and completed within the 24-hour time restrictions.
Let's illustrate this with an example: Suppose a campaign registers 100 activations according to SKAd. By analysing our internal data, if we find that 50% of users who completed the application within the last 7 days did so within the 24-hour timer restriction, we can adjust the campaign's activations accordingly. In this case, the campaign would be credited with 200 activations (100 divided by 50%).
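As a rough sketch of that calculation in Python: the function names and the sample timings below are hypothetical, and how the within-window share is actually derived from internal data is an assumption made purely for illustration.

```python
def share_completed_within_window(hours_to_complete, window_hours=24):
    """Share of completed applications that finished within the window."""
    if not hours_to_complete:
        raise ValueError("need at least one completed application")
    within = sum(1 for hours in hours_to_complete if hours <= window_hours)
    return within / len(hours_to_complete)


def adjust_activations(reported, share_within_window):
    """Scale SKAd-reported activations up by the within-window share."""
    if not 0 < share_within_window <= 1:
        raise ValueError("share_within_window must be in (0, 1]")
    return reported / share_within_window


# Hypothetical internal data: hours taken by users who completed in the last 7 days.
recent_hours = [2, 5, 30, 48, 1, 26, 12, 72]
share = share_completed_within_window(recent_hours)  # 4 of 8 are within 24 hours
adjusted = adjust_activations(100, share)
print(share, adjusted)
```

With half of the sample finishing inside the window, 100 reported activations scale to 200, matching the example above. Recomputing the share over a rolling period lets the correction follow changes in how quickly people move through signup.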
This approach offers several advantages. Firstly, it aligns directly with how SKAd activations were filtered out, ensuring accuracy. Additionally, it has minimal impact on channel-level optimisation since there are no significant differences in the time users take to navigate the application funnel across marketing channels.
By employing this methodology, our performance marketing team can make informed decisions and optimise campaigns while mitigating the data loss caused by SKAd limitations.
🔌 How we integrated the adjustments to our reporting systems
Having a methodology for better understanding our data is one thing, but to truly leverage this we need to integrate this into our automated reporting systems. At Monzo, these systems are built in BigQuery and powered by dbt models scheduled in Apache Airflow with dashboarding in Looker.
When it came to marketing, it made sense to design a solution that leveraged the modular nature of the data we get from our partners, whilst keeping things as simple as we could to provide transparency on how metrics are calculated. The end product of this pipeline is two denormalised dbt models that preserve the data at its lowest available granularity upon which all of our reporting is built. These exist at the media campaign and ad levels, with the former aggregating data from the latter and integrating other sources where ad level or equivalent data is not available.
The approach we’ve applied to create these models was first to transform data from all sources into a common structure before unioning it all together. This means the data in the model is somewhat sparsely populated: there are common dimensions across all rows, but some rows carry data on spend and others on ad performance, for example. However, we then exploit the nature of Looker’s aggregated measures and the common dimensions to consolidate the data into the format required for reporting, handling the null values and still allowing us to compute more complex metrics like CAC, which require data from different sources.
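The union-then-aggregate pattern can be illustrated outside of dbt and Looker with a small Python sketch. Everything here is a stand-in: the column names, campaign names and figures are hypothetical, list concatenation plays the role of a SQL union, and the `aggregate` function mimics a summed measure grouped by the common dimensions.

```python
from collections import defaultdict

# Hypothetical rows from two sources, already mapped to a common structure.
# Each source fills only the measures it knows about; the rest are null.
spend_rows = [
    {"date": "2023-06-01", "campaign": "ios_brand", "spend": 500.0, "activations": None},
    {"date": "2023-06-01", "campaign": "android_brand", "spend": 300.0, "activations": None},
]
performance_rows = [
    {"date": "2023-06-01", "campaign": "ios_brand", "spend": None, "activations": 20},
    {"date": "2023-06-01", "campaign": "android_brand", "spend": None, "activations": 15},
]

unioned = spend_rows + performance_rows  # stand-in for a SQL UNION ALL


def aggregate(rows, dimensions=("date", "campaign")):
    """Sum each measure per combination of the common dimensions."""
    totals = defaultdict(lambda: {"spend": 0.0, "activations": 0})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for measure in ("spend", "activations"):
            if row[measure] is not None:  # nulls are skipped, as a SQL SUM would
                totals[key][measure] += row[measure]
    return dict(totals)


def cac(totals):
    """Cost per activation, or None when there are no activations."""
    if not totals["activations"]:
        return None
    return totals["spend"] / totals["activations"]


summary = aggregate(unioned)
for key, totals in summary.items():
    print(key, totals, cac(totals))
```

Because the sparse rows share their dimensions, a metric such as CAC that needs spend from one source and activations from another only comes together at aggregation time, and no join between sources is needed upstream.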
We’ve seen a number of benefits since implementing this strategy:
Since we’re not doing any complex transformations, the data is preserved in something near to its initial structure. This means the models run very efficiently but also that we can see the flow of data through our pipelines much more effectively, allowing us to quickly find the root cause of issues when data queries come in or when debugging.
This modular nature means we have good stability because no single data source forms the foundation of our model. It also has made it very easy to integrate new data sources and components into the pipeline.
One good example of these components is the extension of our modelling to add the adjustment for SKAd data loss. The adjustment ratios and values are calculated in their own model, then the corrections are applied in another union block in our final table. This means that when the data is aggregated in our reporting layer, the reported data and the adjustment are summed to give us the total. This way we can track the effect the adjustments are having on our reporting and easily toggle them on or off in our dashboards.
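A minimal sketch of how such an adjustment block might behave once unioned in, again in Python rather than SQL. The `row_type` column, its values and the figures are assumptions for illustration; the source does not describe how adjustment rows are actually labelled.

```python
# Hypothetical final table: reported rows plus separately calculated
# adjustment rows, distinguished by an assumed `row_type` column.
rows = [
    {"campaign": "ios_brand", "row_type": "reported", "activations": 100.0},
    {"campaign": "ios_brand", "row_type": "skad_adjustment", "activations": 100.0},
    {"campaign": "android_brand", "row_type": "reported", "activations": 80.0},
]


def total_activations(rows, campaign, include_adjustments=True):
    """Sum activations for a campaign, optionally leaving out adjustment rows."""
    return sum(
        row["activations"]
        for row in rows
        if row["campaign"] == campaign
        and (include_adjustments or row["row_type"] != "skad_adjustment")
    )


print(total_activations(rows, "ios_brand"))  # reported plus adjustment
print(total_activations(rows, "ios_brand", include_adjustments=False))  # reported only
```

Keeping the correction as its own rows, rather than overwriting the reported figure, is what makes the toggle a simple filter and leaves the raw numbers available for comparison.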
🏆 What impact has this had for Monzo?
With adjustments to compensate for SKAdNetwork's data loss and accurate tracking of user growth through performance marketing, we achieved remarkable results:
Personal activations have surged by an impressive 20%, while business account activations have seen a notable 5% boost from marketing campaigns. These adjustments provide us with the confidence to invest more in iOS campaigns, which were previously under-indexed due to data limitations.
Overall, this work has given us a clearer picture of campaign success, helping us to optimise our efforts more effectively and drive better results.
🛸 SKAd 4.0 and the future
As the landscape of performance marketing keeps evolving, we must stay adaptable and open to change. Exciting developments are on the horizon, such as the arrival of Google Analytics 4 and the upcoming SKAd 4.0, which will allow us to receive up to three postbacks instead of just one under the current SKAd 3.0 framework.
To thrive in this dynamic environment, we constantly review our approaches and assumptions.
Our solid foundation in the marketing data models makes it easy for us to incorporate new features and data sources as they become available. We also collaborate closely with Adjust to improve our setup and optimise our performance marketing strategies.
At Monzo, safeguarding user privacy is paramount. We are fully committed to working with the challenges posed by SKAdNetwork to ensure not only the protection of user privacy but also to foster sustainable growth within a privacy-friendly framework.
⛳️ Conclusion
The power of data modelling:
The overhaul of our systems has boosted the visibility of data to our marketeers and saved them time in their day to day. This has led to a surge in activations across both our personal and business accounts, helping us grow more quickly and properly represent the value of advertising to Monzo.
Simple solutions can be the most effective:
Some of the problems we work on in data can be very involved and complex. It’s important that our solutions are intuitive and transparent so that other Monzonauts can quickly grasp what is happening and we can easily see the effect they’re having on our data.
The world of performance marketing is always changing:
The key to iterating towards a best-in-class solution in this space is not being complacent. We continually review our modelling and systems and look for ways to improve, integrate new technologies or account for changes in the space.
Collaborating across data disciplines:
Across Monzo, our data scientists and analytics engineers work closely from beginning to end to tackle these challenges and productionise solutions.
We hope we’ve been able to give you a taste of what we’re doing in this space and that you’ve enjoyed digesting this post. If you’re keen to discuss any of the topics further, please do reach out on our community post!
🚀 Join the team
If you want to get even more involved, we’re actively hiring talented individuals who are passionate about solving data problems and delivering results. Join us on this exciting journey as we embrace innovation, leverage emerging technologies, and shape the future of data-driven marketing.