Deep dive: Engineering artificial intelligence for trust

From broad to narrow to broad: Trust in small things builds up trust in the whole solution — as long as the pieces fit together

20 min read

“Artificial intelligence” is a broad term loosely applied to technologies focused on delivering automated or augmented decision-making. Looking behind the geeky buzzword extravaganza describing the various techniques, the essential parts of any AI are a data-generating mechanism, a machine learning software system applied to the data to determine the best choice for a particular question or problem, and a way to deliver the decision to either a machine or a human user. The past decades saw a rapid transition from technological novelty towards relatively well-established capabilities accessible to a wider audience (supported by a maturing technological ecosystem, see fig. 1), and so the big question for the next period will be how to build “AI-native” products that leverage these capabilities in unique and trustworthy ways to serve users.

Fig. 1 A brief timeline of AI adoption across industries (illustrative).

Trust is essential to successful AI-powered products. When we trust something or someone, we rely on them to make the right decision. Since AI-powered products are making decisions for their users – which shows to watch on Netflix, how much to invest in which stock, which customers to target with your marketing to maximize customer acquisition – you will only use these products if you trust them. Trust is a very human thing – we learn about it from an early age and build mental models for what or who can be trusted and when we should be careful. With trust being essential to adoption, engineering your AI-powered product to imbue trust is key to success, whatever success means to you.

With the rapidly growing number of AI-augmented or automated use cases, concerns about incorrect decisions from these products are voiced monthly, if not weekly. So, what would you trust AI to handle for you in your life?

Let’s dive in.

What role does trust play in AI-powered business (B2B) systems?

The particular strength of AI is its ability to infer the right prediction or decision, including generalizing from decisions made by many users to find the best decision for you.

The technical challenges of building AI are hard, yet they are often both simpler and more tangible than earning user trust – especially in B2B, where we all know that the user is not the customer. Solving for trust is just as big a challenge, along the orthogonal axis of product management. Even with rapidly improving technology, there will be decisions the AI does not get right – just as humans don’t get everything right – but unlike with human decision-makers, unclear fallback options for when the AI fails and a lack of clear ownership in case of failure can introduce more challenges than benefits. The essence is that by replacing human decision-making, these technologies take on a high level of responsibility (implicitly or explicitly), yet who will be accountable in case of failure is often much less clear: will it be the people who used to do the job that got handed over to the AI? The senior leaders who signed off on the implementation? Or the internal sponsor or champion who argued for doing it in the first place? Sometimes, those who were formerly accountable for the decision will continue to feel this responsibility and will therefore be very hesitant to adopt an AI-based solution. Their motivations can be professional pride, a sense that the interesting parts of their job are being taken away, a loss of prestige, or a feeling that the job is being reduced to “AI support”; in some situations, there may also be legal responsibilities assigned to roles which cannot simply be handed over to the AI.

Another challenge, particularly in non-consumer segments, is when the choice to implement an AI is driven by senior leaders with one objective in mind, but without appreciating the impact this lack of clarity around accountability can have on adjacent areas. A terrifying example is predictive policing (deploying police based on AI models to maximize the chance of reducing or preventing crime), which has led to increased risk for citizens in both target and non-target areas, in some cases with racial or other biases. Because these systems depend on human input data, they also risk weaponizing policing, as grotesquely illustrated in the 2002 Hollywood hit Minority Report.

A more common example from the “traditional” business domain is a manager who chooses to implement an AI that extracts information from printed documents for the financial report in order to improve his team’s performance, but who, lacking appropriate oversight, fails to see the bias in the AI, which ultimately leads to incorrect financial data for the company. Bummer.

In most cases, until the capable lawyers of the world come up with a good set of regulations, we think it is best to keep human accountability in place and treat AI systems not as “autonomous agents”, but as “augmentation/automation” systems. As the preceding examples show, adoption of any new AI tool hinges on understanding and preempting the explicit and unspoken concerns among the users and beneficiaries. In other words, success with AI products requires an even deeper understanding of user needs, value drivers and emotional drivers than most non-AI products. Or, put bluntly, you need to do even more of what good product people do: speak to your users.

Fig. 2 Talk to people! In many B2B scenarios, especially the more “traditional” ones, the users, the stakeholders, and the beneficiaries are abstracted by layers of reporting and communication lines, leaving the most important people faceless.

Identifying users’ perspectives on risks is the starting point in building for trust

So now that we need an even deeper understanding of our users to earn trust and drive adoption, what is the best way to address people’s concerns when building AI-native products? In our experience, analyzing your idea along the three product development dimensions of entry barriers, adoption barriers, and scale barriers clarifies where the challenges lie. The emerging risk picture is the starting point for product development, from which you can engineer solutions for trust.

Entry barriers are concerns (real or perceived) that make users hesitant to even entertain the notion of using your product in the first place. A minimal set of stakeholder perspectives to analyze is listed below, but it is often worthwhile to extend the list.

  • For the job or function: In the eyes of the user, what is the cost if the AI makes the wrong decision (risk) versus the perceived impact of getting it right (reward)? Getting the support of the users is crucial for the success of any tool, so this message needs to be clear.
  • For the individual (emotional and personal situation): Will my job change to something I don’t like, or will it improve?
  • For the company: What are the transition costs (immediate investments in systems, a new operating model, new skills, operating with a new risk picture) versus the expected benefit (including any changes in the risk picture from introducing the product)?

Adoption barriers are challenges or concerns preventing the use of the product. Common examples include:

  • Trust: users don’t trust the system to make the right decisions and therefore won’t use it, or won’t use it to the fullest extent
  • Knowledge barrier: users don’t have the knowledge or are not permitted to make the choices the AI requires
  • Process misalignment: implementation of the AI requires changes in work processes, affecting jobs and operations
  • Organizational challenges: the use and the operation of the AI may be split across different departments (e.g. a commercial function and IT), each with different objectives and agendas

Scale barriers are challenges limiting the growth of the product:

  • Network effects: products where the value comes from having a large user base, like social networks or buyer-seller platforms such as eBay and Etsy, must consider how to get the first users in and retain them during initial growth
  • Process alignment: many companies must agree to the same process, or at least the same approach towards the outcome.
  • Privacy concerns: users may object to cross-user learning by the AI (it could accidentally reveal trade secrets)

Cultivating trust with your product: concepts and examples

Once you’ve understood the barriers at play for your product, what can you do to address them? We’ve found that the three barriers can be tackled by employing three concepts:

  • Transparency: how is the AI making its decision?
  • Consistency: consistently getting the right outcomes
  • Impact understanding: what’s at stake with letting the product make the decision?

The biggest problem is that there is rarely a case where only one barrier is at play and can be tackled by a single concept (fig. 3). We call this “3×3 = ∞” to illustrate that, for any product with its intricate mix of barriers, you will often need to draw on several of the concepts to build trust.

 

Fig. 3. A matrix mapping the three barriers (entry, adoption, scale) against the three trust-building concepts (transparency, consistency, impact understanding). Unfortunately, this is true – people are not always rational.

Let’s look at some real-life quotes from AI-powered product users and see which concept is best to mitigate the barrier at hand.

“What’s the formula for this? I want to double check…”

Adoption barrier – the user disagrees with the AI prediction, but they do not want to be caught off guard: the numbers must align between what’s in their mind and the “formula” output.

This is a great example of when employing transparency (the mechanism) works well – not by literally exposing the “formula”, because in the world of large neural nets that is practically impossible, but by being transparent about the inputs (context) and the rationale behind the prediction. For example, if we suggest that a task in a team’s project is about to exceed its estimate, the project manager will want to understand why their meticulous estimation process failed in this specific instance.

Instead of trying to prove the user wrong, we can approach it by giving a few examples (a set of past observations) which look similar to the case at hand. Seeing past examples, where the user can validate that the product achieved the desired outcome (in the example above: estimating correctly), builds confidence that the product is likely also making the right call with the latest prediction, even if it “seems off”.
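As a sketch of what this can look like in practice, the snippet below (Python with scikit-learn; the task features, data, and outcomes are all hypothetical) retrieves the most similar past cases so they can be shown alongside the prediction:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical historical tasks, described by [estimated_hours, team_size, n_dependencies]
past_tasks = np.array([
    [40, 3, 2],
    [10, 1, 0],
    [80, 5, 6],
    [35, 3, 3],
])
past_outcomes = ["overran by 20%", "finished on time", "overran by 35%", "finished on time"]

# The task the model has just flagged as "likely to exceed its estimate"
current_task = np.array([[45, 3, 4]])

# Retrieve the two most similar historical cases to show alongside the prediction
nn = NearestNeighbors(n_neighbors=2).fit(past_tasks)
_, indices = nn.kneighbors(current_task)

print("Similar past tasks and how they turned out:")
for i in indices[0]:
    print(f"  features={past_tasks[i].tolist()}, outcome: {past_outcomes[i]}")
```

The point of the design is not the retrieval algorithm itself, but that the user gets concrete, verifiable context they can check against their own experience.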

“Your ‘robots’ are usually 80% right, that’s great!”

Scale barrier – this quote indicates that the scale barrier has been overcome by being predictably consistent – you’re onto something big!

What makes people feel at ease is consistent behavior in their daily interactions with systems. Knowing what to expect from the system helps them plan their days, weeks, and months in a way that works for them, not for your product. The resource manager here knows that the majority of their challenges will be correctly handled by the AI system, giving them time to deal with the tricky cases – a win-win situation!

“Even if you get it 3 times out of 10, it is still useful for my team.”

Entry barrier – the mechanism of impact understanding helps you ship and iterate faster. Often, AI model metrics seem to warrant delaying deployment – the F1 score is too low across the customer base, precision and recall are not well balanced, and so on. While these are important metrics to keep in mind, for the AI product you are building there is usually someone who will find even the imperfect model outputs very impactful.

Spend time figuring out who will benefit most from what you have here and now. There’s only one thing that everyone likes – pizza – and your AI-powered product is not it, so spend time figuring out the segment that will feel the positive impact from even the early versions of your AI-powered product.
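One practical way to find that segment is to slice evaluation metrics per segment instead of only looking at the aggregate. The sketch below assumes an sklearn-style evaluation with hypothetical segment labels and data:

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score

# Hypothetical evaluation data: model predictions vs. actual outcomes, per customer segment
evaluation = pd.DataFrame({
    "segment":   ["smb", "smb", "smb", "enterprise", "enterprise", "enterprise"],
    "actual":    [1, 0, 1, 1, 1, 0],
    "predicted": [1, 0, 1, 0, 1, 1],
})

# Aggregate metrics can look unimpressive even when one segment is already well served
for segment, group in evaluation.groupby("segment"):
    precision = precision_score(group["actual"], group["predicted"], zero_division=0)
    recall = recall_score(group["actual"], group["predicted"], zero_division=0)
    print(f"{segment}: precision={precision:.2f}, recall={recall:.2f}")
```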

Tactics to build for trust: breaking full products into small trust-building entities

Now that we have a framework to map user needs and concerns with AI to tangible risks and some general concepts for how to address those risks, let’s explore some practical product development tactics for AI products.

Trust in the small things builds trust in the product

Trust is a result of consistency over time. This holds for all products and, in fact, for everything we build trust in, whether it’s trusting your friends, trusting that your new car will start in the morning, or trusting that your bank is not cheating you. In other words, to get users to trust your product, you must consistently indicate to them, through all actions of your product, that you understand their context and concerns and that you would make the same choice they would – i.e. you must make it clear that you “get them”.

In particular for AI-powered products which are making decisions for the users, one mechanism to leverage is that trust in the little things builds trust in the whole solution. Align AI suggestions, user journeys and the full experience of the product with what the user cares about and is working to accomplish; that’s how you build their confidence. To get this across, you must align many small pieces within and outside the “core” AI model so you repeatedly reconfirm to the user that you “get them”. Trust is built through consistently doing what the user expects at all these micro-touchpoints; if your product does something unexpected or inconsistent with user expectations – e.g. if your GPS suggests you go the wrong way down a one-way street – users will immediately question every other recommendation. Conversely, adding more trust-building micro-touchpoints can even accelerate the development of user trust, similar to how frequent notifications can build user habits.

Staircase of trust

In practical terms, we have found that a good way to leverage this idea of repeated micro-touchpoints is to stage the features of your product to meet expectations on the little things first. Start by building trust in simpler functionality in earlier releases. Over time, add more sophistication, such as AI-powered features, after users have adopted the simpler iterations. We think of this as building a “staircase of trust”, where the first features unlock the initial barriers users have to trusting advanced features, i.e. unlocking product-market fit for the AI-powered feature sets (Fig. 4). This approach decouples the risk of customers not trusting your overall product from the technical risk of building a strong AI capability, carving out the time to understand your customers well enough to build the latter. At the same time, the less sophisticated features lay the foundation of trust for the more advanced ones.

Fig. 4: Building trust by first releasing simpler functionality, which unlocks the barriers to users trusting more complicated features.

A booking platform one of us worked on offers a good example of these principles. The platform was looking to simplify logistics by making booking and shipment management available to small customers without logistics experience or dedicated logistics departments. Having first created simple, intuitive experiences for booking and shipment management that resonated with customers (giving them what they expected), the team could start adding live shipment tracking (addressing an issue customers only cared about because booking and execution worked), and finally add next-level services powered by AI, such as bespoke recommendations on shipment-related details or bespoke offers.

Development practices to build AI for trust

We’ll close by sharing some practical tips to get product teams set up to leverage the principles we’ve discussed in this article.

To enable the team to build and maintain a deep user understanding, the approaches below can be particularly helpful. It is important to understand the emotional and operational context of both users and beneficiaries/subjects, as a lack of trust on either side can put adoption at risk.

  • Build a set of charter customers to iterate your understanding. The idea is to have a smaller group of target-segment customers and continually invite their unfiltered critique and feedback to deduce their concerns and their view on risks. With this, you continue to build an understanding of the next steps on the “staircase of trust”, enabling you to update your product strategy.
  • Use product principles extensively, for both the product and the AI, to define how the product should behave to imbue trust. An important component of product principles for AI products is to not only describe happy-flow scenarios, but also to explicitly articulate the exception processes. If your AI cannot make a decision with high probability, how should the product behave? Whether it should make the best decision available, stop and ask for user input, or simply skip and move on to the next case will – depending on the problem being solved and the roles involved – often be the difference maker for adoption. And, of course, update your principles when your customer understanding makes it clear that they no longer hold true. (A minimal sketch of such a confidence-gated exception flow follows this list.)
  • Regulatory or legal approval/compliance should be pursued whenever this is expected by users. To some customers, especially in more mature or regulated industries, rubber stamps or proof of compliance with legislation or guidelines such as FDA requirements or the pending EU AI Act will significantly improve trust; in other cases, such approvals may be a requirement before customers buy your product. It is important to stress that the impact of these regulations varies across use cases and types of AI (see e.g. this overview of how different large language models score on draft EU AI regulations.)
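To make the exception-process point above concrete, here is a minimal sketch of a confidence-gated decision flow, assuming an sklearn-style classifier with string labels; the threshold value and routing choices are hypothetical product principles, not a prescription:

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.85  # hypothetical value, set by your product principles


@dataclass
class Decision:
    action: str              # "auto_apply" or "ask_user"
    label: Optional[str]
    confidence: float


def decide(model, features) -> Decision:
    """Apply the AI's decision only when the model is confident enough;
    otherwise fall back to the exception process defined in the product principles."""
    probabilities = model.predict_proba([features])[0]
    label, confidence = max(zip(model.classes_, probabilities), key=lambda pair: pair[1])
    if confidence >= CONFIDENCE_THRESHOLD:
        return Decision("auto_apply", label, confidence)
    # Exception process: route the case to the user instead of guessing
    return Decision("ask_user", None, confidence)
```

Whether the low-confidence branch asks the user, skips the case, or applies a safe default is exactly the kind of behavior a product principle should spell out.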

On the technical side, the following approaches facilitate fast development of consistent AI systems:

  • Control the data-generating process: first, invest effort in creating a strong data-generating process, and then worry about the AI. A frequent challenge is ensuring you can consistently get the data your ML model needs to be effective, e.g. being dependent on users to enter data they don’t want to enter or perhaps cannot share. LinkedIn is a good example of this practice: they started by simply letting users build their network of people they already knew, and only later – once they had a strong data-generating process capturing user networks and whom users added – introduced the “People You May Know” ML-powered functionality. The upshot is also that sharing the right information, without any AI, can, for a time, be all you need to build a user base.
  • Testing your models in production is another good approach to weed out edge cases for your ML models, as well as to get user feedback on relevance, etc. The engineering practices that allow for this are collectively known as MLOps, and a good implementation can enable multiple-X faster iteration cycles. As with any other tool or practice, MLOps/testing in production is not a silver bullet, but investing in making it easy to take a model to production and run it there is always a great accelerator and will, at a minimum, help you launch more things.
  • While all products benefit from feedback loops monitoring user behavior and product performance, AI products need quantitative feedback loops to ensure the ML models powering them consistently remain relevant. By comparing AI proposals to what actually happened, the ML models can adjust and improve (this process is known as model retraining). Identifying the right ways to get quantitative user feedback and building the feedback loops into the product is a necessity for a successful AI product; without them, the model will quickly lose its ability to get things right in the eyes of the user, and all trust will be lost. Hence, while all products benefit from feedback, for AI products it’s crucial that this feedback is quantitative and easy to come by (a minimal sketch of such a feedback loop follows this list).
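The sketch below illustrates such a quantitative feedback loop, with a hypothetical hit-rate metric, retraining threshold, and retrain() hook standing in for your actual pipeline:

```python
import numpy as np

RETRAIN_THRESHOLD = 0.75  # hypothetical minimum acceptable hit rate


def feedback_loop(ai_proposals, actual_outcomes, retrain):
    """Compare what the AI proposed with what actually happened, and
    trigger retraining when quality drifts below the threshold."""
    proposals = np.asarray(ai_proposals)
    actuals = np.asarray(actual_outcomes)
    hit_rate = float(np.mean(proposals == actuals))
    print(f"Hit rate over the last period: {hit_rate:.2f}")
    if hit_rate < RETRAIN_THRESHOLD:
        retrain()  # hypothetical hook into your training pipeline
    return hit_rate
```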

Finally, the choice of ML approach for your AI model can also contribute to building trust. Some ML approaches, such as causal ML, can guarantee reasonable output; others, such as explainable ML, can shed light on how a model reached its decision, allowing people to vet the decision process as a means to build trust. We have summarized these in the table below. Each approach represents a particular tradeoff, and each use case determines the best way to build trust. Put differently, there is no single model framework that always guarantees trust.

 

Model or approach: Explainable ML, strong
  • Example: Symbolic regression
  • Benefit: Higher likelihood that the ML model is consistent than with weak explainable ML, but no guarantee that the ML model itself is correct
  • What it doesn’t do: Guarantee that the identified reasons were in fact the actual reasons for the outcome; there is no guarantee that these models capture the correct cause-and-effect for the problem whose outcomes they predict
  • Solution concept supported: Transparency – indicates what led the AI to its conclusion, i.e. sheds light on “what made the AI give the answer it did”

Model or approach: Explainable ML, weak (post factum)
  • Example: LIME, SHAP
  • Benefit: Easy to implement and interpret, but no guarantee that the ML model itself is correct
  • What it doesn’t do: Same as above
  • Solution concept supported: Transparency – same as above

Model or approach: Causal models
  • Example: Causal inference, causal ML, physics-based AI systems (for decisions related to the physical world)
  • Benefit: Guarantees that the AI has the correct causal relationships, i.e. the same understanding of cause-and-effect as the engineer who designed it
  • What it doesn’t do: Make error-free predictions; some element of uncertainty remains, which means the system won’t always get it right
  • Solution concept supported: Consistency – ensures that the AI consistently makes decisions in line with the cause-and-effect of the problem domain
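As an illustration of the weak explainable ML row, a post-hoc explanation using the shap package could look like the sketch below (assuming a tree-based regressor on hypothetical tabular data):

```python
import numpy as np
import shap  # post-hoc explanations; LIME is a comparable alternative
from sklearn.ensemble import RandomForestRegressor

# Hypothetical tabular data: 200 rows, 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to the input features
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])  # shape (5, 4): one contribution per feature

# These values explain what drove the model's output, not necessarily the
# true cause-and-effect in the underlying problem
print(shap_values[0])
```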

Final words

In many ways, AI products are more extreme than regular software products. They are more finicky to get right due to the strong dependency on the content of the data; it is harder to see when they fail (the product itself may work fine, but if the data content has changed, the decisions may be wrong or bad); and there is often a delayed response in whether they deliver the expected benefits. With such “hypersensitive” products deployed to solve what are oftentimes impactful problems, your product management will be put to the test.

As we’ve laid out in the preceding sections, we’ve found that amplifying good product techniques, together with an understanding of the psychology of trust, is a recipe for success. However, these principles are guideposts rather than guarantees, as the details of your product, your industry and your users will raise a number of detailed product challenges that you will have to discover and overcome to deliver a product your users will trust and value.

{"HashCode":-450582994,"Height":841.0,"Width":595.0,"Placement":"Footer","Index":"Primary","Section":1,"Top":0.0,"Left":0.0}