This is the second part in a three-part technical blog series on bringing COVID Hotspot Alert to market in eight weeks. We will focus on the integration aspect of the value proposition and explain how Adatree’s Accredited Data Recipient platform turns sophisticated multi-party integration into simple data APIs and allows innovative participants to focus on business challenges.
As Australia’s only purpose-built Data Recipient platform provider, Adatree’s mission is to transform the sophisticated Consumer Data Right integration into simple and powerful APIs so Data Recipients can focus on business innovation and product features using CDR data.
Adatree kicked off the COVID Hotspot Alert (CHA) project in August 2021, built the service end-to-end using the Adatree ADR platform and released it in October 2021. As a ‘dogfooding’ exercise, the CHA service integrated with the ADR platform the same way Adatree’s Data Recipient customers do. We think it is the perfect time to address a few questions we’ve received:
“We have a strong engineering team in house and we like challenges. How hard is it to retrieve Open Banking data if we build it into our own system from scratch?”
“How long do you think it will take a strong engineering team to build a CDR based solution in house from scratch without all the complexity of a platform?”
“How long do you think it will take to build a minimum viable product using the Adatree ADR Platform?”
To answer these questions in a context relevant to Data Recipients, we will fictionalise this project a little and discuss it in the context of a company that is new to CDR:
- How they would have analysed the challenges
- Why they would choose the platform over implementing the CDR integration in house
- How they can simplify their solution and speed up their delivery while optimising the service using the Aggregated APIs of the Adatree ADR Platform
Accessing CDR data seems like another easy integration…or does it?
As an entrepreneur, Jane Doe is a keen problem solver, and she has been thinking with her passionate teammates about ways to help the community fight COVID-19. When she reads an article about how consumers can share their transactional data securely under the Consumer Data Right (CDR, a.k.a. Open Banking), she has a lightbulb moment: her team could build an intuitive utility to help fight COVID-19 by matching transaction data with the COVID-19 exposure site venue data published by state and territory governments across Australia.
Jane reads more about CDR and quickly learns that once a consumer grants consent to share transaction data held by a bank, as an Accredited Data Holder (ADH), with her company, as an Accredited Data Recipient (ADR), her team can retrieve CDR data from that particular ADH via CDR APIs using a token representing that consent. She also learns that the token expires after a short time to keep the consumer’s data safe, and that her company can obtain a new token to retrieve more transactions if needed, so long as the consent granted by the consumer is still active.
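The token behaviour Jane reads about can be sketched as a small cache: reuse the short-lived token until it is close to expiry, fetch a new one while the consent is still active, and refuse once the consent is gone. This is a minimal sketch, not the CDR token flow itself; `ConsentTokenManager`, `fetch_token` and `consent_is_active` are hypothetical names, and the real token exchange with the Data Holder is hidden behind the injected callable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # epoch seconds at which the token stops working


class ConsentTokenManager:
    """Caches a short-lived access token for one consent (hypothetical helper)."""

    def __init__(
        self,
        fetch_token: Callable[[], AccessToken],   # assumed: performs the real token request
        consent_is_active: Callable[[], bool],    # assumed: checks the consent record
        clock: Callable[[], float] = time.time,
        skew_seconds: float = 30.0,
    ) -> None:
        self._fetch_token = fetch_token
        self._consent_is_active = consent_is_active
        self._clock = clock
        self._skew = skew_seconds
        self._token: Optional[AccessToken] = None

    def get_token(self) -> str:
        # No active consent means no data access, cached token or not.
        if not self._consent_is_active():
            self._token = None
            raise PermissionError("consent is revoked or expired")
        # Refresh slightly before expiry so in-flight requests do not fail.
        if self._token is None or self._clock() >= self._token.expires_at - self._skew:
            self._token = self._fetch_token()
        return self._token.value
```

Injecting the clock and the two callables keeps the sketch testable without a real Data Holder.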
Things seem to come together, so Jane calls her teammate John, the Engineering Lead, for his opinion on the feasibility of the solution. John gets really excited about the idea and skims through the CDR specification while chatting with her over the phone. He then tells her:
“This is a brilliant idea! CDR data API endpoints look nice and neat and it appears to follow common RESTful patterns. It should be easy to call these API endpoints just as any other integrations in our system.”
…a minute later…
“Hmm, hang on, the security profile is complicated and seems to be designed for banks… Wow, the DCR protocol is complex… And one… two… three… at least four OAuth providers are involved! Gosh! Let me read more into it and get back to you later.”
John is not alone in assuming, after a quick skim of the specifications, that participating in CDR is as easy as any other API integration. But he soon realises that the CDR ecosystem is highly regulated so that consumers can share their data with service providers securely, and that its regulatory requirements on security, auditability and reporting are almost as comprehensive as those for big financial institutions.
A highly regulated CDR Ecosystem for secure data sharing
Figure 1. Integrate application service directly with the CDR Ecosystem
n.b. A business needs to pass the CDR data environment audit process first to become accredited and then pass the CDR Conformance Test Suite (CTS) to be active. But the process is worth a separate blog post, so we will skip it here. As a matter of fact, half of the accredited Data Recipients are still stuck at “accredited” status since CDR data sharing commenced.
After further research and some technical design sessions, the engineering team lists all the requirements below and drafts a high level design, as illustrated in Figure 1.
- CDR data can only be used for a specific consented use case
- Requests between ADR and ADH must be secured by mTLS and by access tokens guarded by Private Key JWT Client Authentication
- CDR data cannot leave the secure CDR data boundary
- CDR data lifecycle must be explicitly defined in the use case for consumer’s awareness
- All consent records must be auditable, including the consented use case and scopes
- Access to CDR data must be auditable and reportable
- CDR data within consented scopes can be retrieved from a Data Holder under a consent; if a consumer wants to share banking data from multiple Data Holders, multiple consents will need to be granted
- An access token must be provided to Data Holders to fetch data within consent scopes. A Data Recipient needs to manage access token lifecycle with Data Holders as defined in consented use case
- A Data Recipient must provide machine-to-machine infosec endpoints for Data Holders to send consent arrangement revocation notifications so a consumer can revoke a consent from either the Data Holder or the Data Recipient. This endpoint must be protected using Self-Signed JWT Client Authentication
- A Data Recipient must manage the lifecycle of a consent and make sure the consent expires in time and that consumers are notified of consent lifecycle events
- CDR data must be removed in time from participants’ systems when the corresponding consent arrangement is revoked or has expired
- Every new software product (a collection of use cases) needs to go through the CTS to become active in the register
- A Data Recipient needs to manage Dynamic Client Registrations (DCR) with all Data Holders as Data Holders become ACTIVE or Software Statement Assertions (SSAs) change
- Data Recipients need to raise tickets via the ACCC CDR register and collaborate with individual Data Holders through the CDR portal when there are DCR or data API issues. Those issues could be:
  - misunderstanding of the CDR specification or implementation issues by Data Holders
  - data issues such as invalid enumeration values in payloads
  - connectivity issues (e.g. JWKS or Pushed Authorization Request endpoint failures)
  - some Data Holders force Data Recipients to open tickets in the CDR ticketing system to request domain whitelisting
- The CDR specification is updated every three months and Data Recipients must keep up to date with mandatory changes, especially when the security profile changes
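To give a feel for one item on this list, here is a rough sketch of the claim set behind Private Key JWT client authentication, following the generic `private_key_jwt` shape from OpenID Connect and RFC 7523 rather than any CDR-specific text. The function name and the 60-second lifetime are assumptions for illustration, and signing is deliberately left out: in practice the claims would be signed with the Data Recipient's private key using an algorithm the security profile permits, then presented over mTLS.

```python
import time
import uuid
from typing import Optional


def build_client_assertion_claims(
    client_id: str,
    token_endpoint: str,
    now: Optional[float] = None,
    lifetime_seconds: int = 60,  # assumed value; keep assertions short-lived
) -> dict:
    """Builds the unsigned claim set for a private_key_jwt client assertion."""
    issued_at = int(time.time() if now is None else now)
    return {
        "iss": client_id,             # the client identifies itself...
        "sub": client_id,             # ...and is also the subject
        "aud": token_endpoint,        # the endpoint the assertion is meant for
        "jti": str(uuid.uuid4()),     # unique id so the assertion cannot be replayed
        "iat": issued_at,
        "exp": issued_at + lifetime_seconds,
    }
```

Even this small piece has to be repeated per Data Holder, since each one has its own client ID (issued through DCR) and its own token endpoint.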
The massive burden of non-functional requirements on ADR for consumer benefits
That is a massive list, and more than 90% of the work items address non-functional requirements that govern data security but do not contribute to retrieving CDR data to solve the business problem. Some of these non-functional requirements are far more complicated to implement than simply calling CDR data API endpoints.
When John and the engineering team take a closer look at the release cycle of the CDR specification, they realise that there will be regular updates to the security profile and data models. An ADR’s systems must keep up to date with the latest specification, especially the security profile. That could be a big ongoing challenge, as there are no subject matter experts in the team and everyone prefers feature development to addressing security concerns.
After a finger-in-the-air estimation session, John and his team estimate:
- CDR data retrieval and matching will take about 2 to 3 months to complete
- But a minimum viable product (MVP) of the consent, security profile and non-functional requirements of the CDR system will take the team about a year, assuming they can quickly upskill in the Financial-grade API (FAPI) security profile or hire subject matter experts really fast at additional expense. At the same time, they will need to stop other project work and focus on this project.
The team concludes that it is not feasible to build the solution from scratch. Firstly, the use case will probably no longer be valid after a year; secondly, FAPI expertise is not the core business domain of the company, and spending too much effort on it will incur a massive opportunity cost for the business.
Before pulling the plug, John reaches out to his friend Sean in the Adatree team.
Speed up COVID Hotspot Alert (CHA) delivery with Adatree ADR Platform
Sean tells John that his team’s guesstimate for the Data Recipient backend is in the right ballpark for an MVP of the backend integration, if the team has the right skillset. The team has missed an important component though: a sandbox environment to support end-to-end integration testing. An MVP (with rough edges) of the sandbox will take another few months, and it needs to be built first as it is a dependency for the front end development.
Sean then tells John not to give up hope yet and explains how John’s team can delegate all the heavy lifting of non-functional requirements, including Dynamic Client Registration, FAPI security profile implementation and consent arrangement lifecycle management, to Adatree’s ADR Platform. Adatree’s out-of-box consent dashboard and Industry Sandbox will enable John’s team to kick off the front end development right away.
With Sean’s help, John simplifies his high level design from Figure 1 to Figure 2. John then realises that the Adatree ADR Platform turns the sophisticated CDR specification into a simple integration that involves only a few API endpoints, exactly what everyone in his team is familiar with.
Now John’s team only needs to work with a few APIs:
- Consent APIs to retrieve cdrArrangementId(s) for consent(s)
- Standard CDR APIs to retrieve CDR data by cdrArrangementId(s)
- ConsentUpdate Webhook to receive consent revocation / expiration messages from Adatree ADR Platform to trigger data cleanup in CHA
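The webhook in the last item is what ties consent revocation and expiry to data cleanup. A minimal sketch of such a handler follows, assuming the event is a JSON object carrying a `cdrArrangementId` and a `status`; that payload shape, the status values and the in-memory `TransactionStore` are all illustrative assumptions, not the documented Adatree contract.

```python
from typing import Dict, List

# Assumed status values that should trigger cleanup; check the real webhook contract.
CLEANUP_STATUSES = {"REVOKED", "EXPIRED"}


class TransactionStore:
    """In-memory stand-in for wherever CHA keeps CDR data (hypothetical)."""

    def __init__(self) -> None:
        self._by_arrangement: Dict[str, List[dict]] = {}

    def save(self, arrangement_id: str, transactions: List[dict]) -> None:
        self._by_arrangement.setdefault(arrangement_id, []).extend(transactions)

    def count(self, arrangement_id: str) -> int:
        return len(self._by_arrangement.get(arrangement_id, []))

    def delete_for_arrangement(self, arrangement_id: str) -> int:
        # Remove everything held under this consent and report how much was removed.
        return len(self._by_arrangement.pop(arrangement_id, []))


def handle_consent_update(event: dict, store: TransactionStore) -> int:
    """Deletes stored CDR data when a consent ends; returns the number of records removed."""
    if event.get("status") not in CLEANUP_STATUSES:
        return 0  # other lifecycle events need no cleanup in this sketch
    return store.delete_for_arrangement(event["cdrArrangementId"])
```

Keying stored data by `cdrArrangementId` from the start is what makes the cleanup a single lookup rather than a search.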
Everyone in John’s team feels relieved and happy. After all, they prefer building creative application features and releasing them into the market as fast as they can. Lead time is their enemy! Jane is happy as well, because now the team can focus on creating business value rather than spending time on infrastructure. Time to market is crucial for the company, and it is heaps cheaper to use the platform than to build it in house. On top of that, she has just learnt that application features will be her company’s intellectual assets, while the non-functional requirements are costs.
Sean gives a demo of the out-of-box consent flow, dashboard and Industry Sandbox. John recognises that it would take at least another six months of work if they chose to spin up a mock server internally. Jane is happy to use the out-of-box consent dashboard with minor customisations, but she asks the team to build their own consent flow and simplify it as much as possible for a customised user experience. As a tech-savvy product owner, she knows the out-of-box consent flow can serve as a reference implementation and help reduce the time spent on building the consent flow.
Sean then tells John and Jane that the Conformance Test Suite keeps changing due to CDR specification updates, so Adatree will take care of the Conformance Test for its customers out of goodwill. John and Jane are thrilled.
Simplicity, performance and resilience with Aggregated APIs on the Adatree ADR Platform
Now that all the non-functional requirements are out of the way, John’s team is ready to rock and roll.
Retrieving CDR data could be a simple component that performs the below actions sequentially at intervals:
- get all consents for the use-case
- iterate through all consents for the use-case and retrieve all associated bank accounts for each consent arrangement
- fetch all recent transactions for each bank account from individual Data Holders and save the last fetched transaction for that account
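The three steps above can be sketched as one polling pass. The `CdrClient` interface below is a hypothetical wrapper over the Consent APIs and standard CDR APIs, not a real SDK, and `transactionId` is assumed to be usable as a per-account cursor.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class CdrClient(Protocol):
    """Hypothetical client interface; method names are illustrative only."""

    def list_consents(self, use_case: str) -> List[str]: ...

    def list_accounts(self, arrangement_id: str) -> List[str]: ...

    def list_transactions(
        self, arrangement_id: str, account_id: str, after: Optional[str]
    ) -> List[dict]: ...


Cursor = Dict[Tuple[str, str], str]  # (arrangement id, account id) -> last transaction id


def poll_once(client: CdrClient, use_case: str, cursors: Cursor) -> List[dict]:
    """Runs one sequential pass and returns the transactions not seen before."""
    collected: List[dict] = []
    for arrangement_id in client.list_consents(use_case):            # step 1
        for account_id in client.list_accounts(arrangement_id):      # step 2
            key = (arrangement_id, account_id)
            transactions = client.list_transactions(                 # step 3
                arrangement_id, account_id, cursors.get(key)
            )
            if transactions:
                # Remember the last fetched transaction for this account.
                cursors[key] = transactions[-1]["transactionId"]
                collected.extend(transactions)
    return collected
```

Each pass in this sketch makes one call for the consents, one per consent for accounts and one per account for transactions, and the caller has to persist `cursors` between runs.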
It is a typical MVP, but John has a gut feeling that things can get complicated quickly as the number of consumers and consent arrangements grows. Aside from that, the use case does not need to track the status of individual consents; it only cares about matches between transactions and COVID-19 venues.
He remembers Sean talked about Aggregated APIs during the demo. So he messages Sean and explains his use case.
Sean then does a quick demo on the Aggregated API and explains how it
- allows retrieval of all bank transactions by use-case and time range and simplifies state management for Adatree’s customers
- manages the data retrieval as per SLA with Data Holders to avoid throttling
- provides extra resilience through prefetching and caching data when downstream Data Holder systems run into issues
That is more than what John hoped for! He then updates the high level design to Figure 3.
Now, instead of making thousands of API calls or more to retrieve all recent transactions, the data retrieval component need only call the Aggregated API endpoint to get all recent transactions for the use case. If ever needed, retries will be trivial as well. How good is that!
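As a rough illustration of how small the retrieval component becomes, the sketch below makes a single aggregated fetch by use case and time range, then matches the result against exposure sites. Everything here is an assumption for illustration: `fetch_aggregated` stands in for the Aggregated API call, the `merchantName` and `postingDateTime` field names are assumed, and the exact-name matching rule is a deliberately naive placeholder rather than how CHA actually matches venues.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ExposureSite:
    venue: str
    start: datetime
    end: datetime


def find_matches(transactions: Iterable[dict], sites: Iterable[ExposureSite]) -> List[dict]:
    """Returns transactions whose merchant and time fall inside an exposure window."""
    site_list = list(sites)
    matches: List[dict] = []
    for txn in transactions:
        when = datetime.fromisoformat(txn["postingDateTime"])
        merchant = (txn.get("merchantName") or "").strip().lower()
        for site in site_list:
            # Naive rule (assumption): exact venue name, case-insensitive, within the window.
            if merchant == site.venue.strip().lower() and site.start <= when <= site.end:
                matches.append(txn)
                break
    return matches


def check_use_case(
    fetch_aggregated: Callable[[str, datetime, datetime], List[dict]],  # hypothetical call
    use_case: str,
    start: datetime,
    end: datetime,
    sites: Iterable[ExposureSite],
) -> List[dict]:
    # One call replaces the per-consent, per-account loops and their cursor state.
    return find_matches(fetch_aggregated(use_case, start, end), sites)
```

Because the fetch is a single call over a time range, a retry is just the same call again with the same arguments.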
A real story with a fantastic ending
Three months have passed since the kickoff, and the COVID Hotspot Alert service went live in early October 2021. The Adatree team built it to help the community fight the COVID-19 pandemic because we believe in CDR / Technology for Good.
Note: COVID Hotspot Alert is the WINNER of Best OB/OF Fintech for Good in FDATA’s Global Open Finance Awards 2021!
During this development process, Adatree’s engineering team stepped into our customers’ shoes and forced ourselves to re-experience the complexity of the ecosystem so we can better help our customers to:
- understand the challenge and regulatory requirements
- focus on creating business value with CDR data without worrying about the moving pieces of a continuously evolving security profile
- make an informed decision on Buy vs Build
- innovate with CDR data on top of a simplified, performant and resilient ADR platform
Drop us an email at hello@adatree.com.au and let us help you build something for good using CDR data.