A new way to look at applied experiments

In this article, Connor Joyce, author of Bridging Intention to Impact, shares how to design effective experiments that drive real results, even with limited resources.

In a recent piece, I shared the power of identifying a feature’s intended purpose as a means of driving alignment around what should be designed and how success should be measured. Once a team has a clear understanding of what their new feature is intended to achieve, it becomes crucial to validate its effectiveness through experimentation. When a feature works, the team can replicate it in future efforts and market the impact it creates. If a feature does not perform as expected, experimentation provides the insights needed to refine and redirect efforts, ensuring that product development remains aligned with user needs and business goals.

While experimentation is valuable, many teams still struggle to conduct experiments, often because of a few misconceptions. One common belief is that all experiments must equate to the gold standard of academia, the Randomized Controlled Trial (RCT). This leads to the false notion that simpler or less rigorous methods are inadequate. Another misconception is that experiments require a complete digital infrastructure to be valuable. While it is true that a full experimentation stack makes testing at scale easier, it is something to strive for rather than a prerequisite. Beyond these, a general lack of knowledge about how to properly set up experiments further complicates the process.

There are many great pieces describing how to set up experiments, but fewer that help teams get started today. This article, along with my new book, “Bridging Intention to Impact”, aims to demystify experimentation by introducing a new gold standard for experiments on digital products. It then breaks down the seven essential components of experimentation, illustrating that not all of them need to be fully in place to conduct a valuable experiment. By providing this comprehensive guide, I hope to empower teams to design and implement experiments regardless of their current capabilities. Doing so equips the team with shareable insights rather than just theoretical pitches, putting them on a path to make the case that the company should invest further in experimental infrastructure.

The new ideal experiment

Randomized Controlled Trials (RCTs) are often considered the gold standard in experimental research due to their ability to minimize noise and extraneous variables, thus providing highly reliable results. However, in an applied setting, RCTs are rarely realistic. The extensive resources, time, and controlled environments required for RCTs make them impractical for most product teams, who operate under tight deadlines and budget constraints. Consequently, teams must break free from the notion that RCTs are the ultimate goal and instead focus on creating practical, feasible experiments that still yield valuable insights.

The ideal experiment for many teams is fully digital and easily executable. This approach allows for rapid testing and iteration, which is essential in today's fast-paced development cycles. Teams like Airbnb exemplify this strategy, running hundreds of experiments annually by leveraging their ability to conduct fully digital tests. This agility enables them to continually optimize their offerings and stay competitive in the market. A fully digital experiment comprises several key components that collectively ensure its effectiveness and feasibility. Firstly, digital execution allows for seamless data capture and broad participant reach. By utilizing feature management systems, teams can quickly deploy and toggle new features for different user groups, facilitating efficient and targeted testing.
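
Feature-management systems differ in their APIs, but the underlying mechanic of splitting users into groups is simple. As a minimal, vendor-neutral sketch (the experiment name and user ID below are hypothetical), deterministic bucketing can be done by hashing the user and experiment together:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user into an experiment variant.

    Hashing the user ID together with the experiment name gives every
    user a stable assignment with no stored state, and independent
    assignments across different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Gate the new feature for the treatment group only.
if assign_variant("user-8341", "onboarding-checklist-v2") == "treatment":
    print("show new onboarding checklist")  # placeholder for the new feature
else:
    print("show existing experience")       # placeholder for the control path
```

Because the assignment is a pure function of the user and experiment IDs, any service can recompute it without a shared database, which is one reason this pattern is common in feature-flagging tools.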

Data infrastructure is another critical component. Robust mechanisms for collecting, storing, and analyzing data are essential for real-time analysis and informed decision-making. Advanced data science capabilities further enhance the experiment's value by applying sophisticated statistical techniques to uncover deeper insights. In addition to behavioral metrics, collecting attitudinal data through surveys and feedback tools provides a comprehensive view of the user experience. This qualitative data complements the quantitative findings, offering a more nuanced understanding of how features impact users. A well-chosen participant pool for both the behavioral and attitudinal data ensures the generalizability of the findings.
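
To make the pairing of behavioral and attitudinal data concrete, here is a small hypothetical sketch in Python with pandas. The column names and values are invented; the point is simply that survey responses joined onto behavioral records let both signals be summarized per experiment variant:

```python
import pandas as pd

# Hypothetical exports: one row per user with assigned variant and a
# behavioral metric, and one row per survey response (1-5 satisfaction).
behavior = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "variant": ["control", "treatment", "control", "treatment"],
    "tasks_completed": [3, 5, 2, 6],
})
surveys = pd.DataFrame({
    "user_id": ["u1", "u2", "u4"],
    "satisfaction": [3, 4, 5],
})

# Not every user answers the survey, so a left join keeps the full
# behavioral sample and fills missing attitudinal responses with NaN.
combined = behavior.merge(surveys, on="user_id", how="left")

# Summarize both signals per variant to see whether they tell the same story.
print(combined.groupby("variant")[["tasks_completed", "satisfaction"]].mean())
```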

In summary, while RCTs may be ideal in theory, the reality of applied settings necessitates a more pragmatic approach. By focusing on fully digital experiments and incorporating the essential components of data infrastructure, advanced analytics, attitudinal data collection, and diverse participant recruitment, teams can conduct effective experiments that drive evidence-based decision-making and continuous improvement.

Digital behavioral experiments are fully digital and completed in real time with a random sample, but without any attitudinal data.

Digital or in-person: This experiment is fully digital, such that a user may not even know they are part of the test.
Real-time or retroactive: It runs in real time, with users actively engaging with the feature in their live experience.
Feature rollout system: This requires a feature-flagging system through which users are greeted with different options.
Data infrastructure: This requires the ability to capture behavioral data (see the logging sketch after this list).
Data science capability: Managing all the behavioral data requires someone trained in data science.
Surveying capability: Not required for completion.
Participant pool: Participants are drawn from this group of product users.
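
Capturing the behavioral side of such an experiment does not require heavy tooling to start. A minimal sketch, assuming a JSON-lines file as a stand-in for whatever event pipeline or warehouse the team actually has, tags every event with its experiment and variant so the groups can be compared later:

```python
import json
import time

def log_event(user_id: str, experiment: str, variant: str,
              event: str, **properties) -> None:
    """Append one behavioral event as a JSON line, tagged with the
    experiment and variant so results can be split by group later."""
    record = {"timestamp": time.time(), "user_id": user_id,
              "experiment": experiment, "variant": variant,
              "event": event, **properties}
    with open("events.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical usage: record that a treatment-group user finished onboarding.
log_event("user-8341", "onboarding-checklist-v2", "treatment",
          "checklist_completed", steps_completed=5)
```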

The seven components of an experiment

The previous section outlined an ideal experiment, structured around seven primary components. These components form the foundation of a robust experimental framework. Each component is given a name and an ideal state, providing a deeper understanding of its role and importance, along with a way to gauge whether a company has established it. It is crucial to remember that not all seven components are required to conduct a successful experiment. Instead, teams should evaluate their current capabilities in each category and design an experiment that is realistic and feasible given their resources and constraints. This approach ensures that even with limited resources, meaningful and actionable insights can still be obtained.

  1. Digital or in-person execution: Choosing between digital and in-person execution is crucial depending on the product's nature and the data required. Digital experiments are often more feasible, allowing for automated data capture and broader participant reach. However, in-person experiments may be necessary for physical products or when collecting detailed attitudinal data. In such cases, usability studies and ethnographic research can provide valuable insights, even if they require larger sample sizes for statistical significance.
  2. Experimentation platform: An effective experimentation platform is essential for deploying new features and managing variations efficiently. In a digital environment, feature management systems enable quick toggling of features for different user groups. This flexibility is critical for iterative testing and rapid adjustments. In physical settings, although more challenging, similar principles apply to ensure that variations can be tested without significant delays.
  3. Data infrastructure: Robust data infrastructure underpins the entire experimentation process. It involves mechanisms for passive data collection, storage solutions, and systems for building and analyzing metrics. Ideally, data should flow seamlessly from collection to analysis, allowing for real-time insights. When advanced infrastructure is lacking, teams may resort to manual data collection and rudimentary storage solutions, but the goal remains to ensure data is accessible and usable.
  4. Data science capability: Advanced data science capabilities elevate the quality of insights derived from experiments. Teams with expertise in statistical analysis and causal inference can perform more sophisticated analyses, such as propensity score matching (sketched after this list), enhancing the validity of the results. While not always necessary, having these capabilities adds a layer of rigor that brings experiments closer to the gold standard of causality.
  5. Real-time or retroactive data collection: Deciding whether to collect data in real-time or retrospectively depends on the experiment's objectives and available resources. Real-time experiments involve recruiting participants and capturing data as they interact with the product. In contrast, retroactive studies use pre-existing data, which can be more efficient but may lack some real-time nuances. Both approaches have their merits, and the choice should align with the study's goals.
  6. Attitudinal data collection: Collecting attitudinal data provides qualitative insights that complement behavioral metrics. Surveys and feedback tools are the primary methods for gathering this data. Advanced setups might use automated triggers to send surveys based on user actions, integrating attitudinal data collection seamlessly into the user experience. This capability helps capture users' perceptions and sentiments, adding depth to the quantitative data.
  7. Participant pool: A well-chosen participant pool ensures the findings are representative and actionable. Recruiting a diverse and representative sample can be challenging, but it is crucial for reducing bias and enhancing the reliability of the results. Strategies for participant recruitment include using digital systems to reach a broad audience or building insider groups willing to participate in research. A convenience sample may suffice in some cases, but striving for randomness and diversity is always beneficial.
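
To illustrate the fourth component, here is a minimal sketch of propensity score matching in Python with scikit-learn. The dataset is synthetic and the covariates are hypothetical; in a real retroactive study they would come from the team’s warehouse. The same technique appears in the fully retro example in the next section:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Synthetic retroactive dataset: users who organically adopted a feature
# (treated) versus those who did not, with covariates that influence adoption.
rng = np.random.default_rng(0)
n = 1000
tenure = rng.normal(12, 4, n)      # months of experience with the product
sessions = rng.normal(5, 2, n)     # weekly sessions
adopt_prob = 1 / (1 + np.exp(-(0.05 * tenure + 0.2 * sessions - 1.5)))
treated = rng.binomial(1, adopt_prob)
outcome = 0.3 * treated + 0.1 * sessions + rng.normal(0, 1, n)
df = pd.DataFrame({"tenure": tenure, "sessions": sessions,
                   "treated": treated, "outcome": outcome})

# 1. Model each user's propensity to adopt the feature from the covariates.
covariates = ["tenure", "sessions"]
ps_model = LogisticRegression().fit(df[covariates], df["treated"])
df["propensity"] = ps_model.predict_proba(df[covariates])[:, 1]

# 2. Match each adopter to the non-adopter with the nearest propensity score.
adopters = df[df["treated"] == 1]
non_adopters = df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(non_adopters[["propensity"]])
_, idx = nn.kneighbors(adopters[["propensity"]])
matched = non_adopters.iloc[idx.ravel()]

# 3. The outcome gap between matched groups approximates the feature's effect.
effect = adopters["outcome"].mean() - matched["outcome"].mean()
print(f"estimated effect of feature adoption: {effect:.2f}")
```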

The spectrum of experiments

Experiments can be viewed on a spectrum ranging from high-fidelity, resource-intensive setups to simpler, more accessible methods. At one end of the spectrum lies the ideal experiment, characterized by fully digital execution, comprehensive data collection, and advanced analytics capabilities. These experiments provide the most reliable and actionable insights but require significant resources and infrastructure. In the middle of the spectrum are fully retro experiments, which use existing behavioral data to draw insights without the need for real-time data collection. This method strikes a balance between fidelity and feasibility, leveraging robust data infrastructure while minimizing the need for new data collection efforts. At the other end of the spectrum are basic moderated experiments, the most accessible and least resource-intensive option. These involve direct interaction with participants in a live setting, providing valuable insights with minimal technological requirements, albeit from a limited sample size. Understanding this spectrum allows teams to choose the experimental approach that best fits their resources and objectives. The following examples describe these experiment types in detail:

Fully retro experiments are digital, and the experimental groups are constructed from historical usage of a feature. Given the passive nature of the study, they can use only behavioral data.

Digital or in-person: This is a digital experiment where all participants interact with the feature on a SaaS application.
Real-time or retroactive: Deployed on behaviors that already occurred, and thus done retroactively.
Feature rollout system: The feature was rolled out to everyone, but not all users have engaged with it.
Data infrastructure: Behavioral data is tracked on users who engage with the feature, and metrics are created to align with the specific behaviors outlined in the user outcome connection.
Data science capability: Using the propensity score matching statistical technique, the study compares feature users against a group of non-users who share similarities on all relevant metrics (for example, demographic characteristics, types of behavior on SaaS products, months of experience with the product). This enables an experiment in a completely retroactive sense.
Surveying capability: Since this is completely retroactive, no surveying occurs.
Participant pool: This study is completed on a random sample of current users.
Moderated experiments involve physically showing prototypes to users and actively collecting behavioral and attitudinal feedback; because participants interact with prototypes rather than a live feature, outcome metrics are only hypothetical.

Digital or in-person: Physical experiments conducted with users in the headquarters office.
Real-time or retroactive: Performed in real time, as people interact with the prototype in a live setting.
Feature rollout system: Participants can interact with different types of prototypes.
Data infrastructure: A notetaker is present in each interview to record how participants interact with the feature; all data is collected through this notetaker.
Data science capability: Limited data science skills on the team restrict the results to comparisons of averages and other basic descriptive analytics (a minimal example follows this list).
Surveying capability: Participants respond to a basic survey after the study.
Participant pool: Interviewees are chosen from a pool of users who responded to a random solicitation about the study.
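
For a moderated study like this, analysis can stay deliberately simple. A minimal sketch with hypothetical task times and survey ratings shows the kind of comparison of averages the list above describes:

```python
import statistics

# Hypothetical notes from a moderated study: task-completion time in seconds
# and post-study survey rating (1-5) for each participant, per prototype.
results = {
    "prototype_a": {"seconds": [48, 62, 55, 70], "rating": [4, 3, 4, 5]},
    "prototype_b": {"seconds": [39, 44, 51, 42], "rating": [5, 4, 4, 5]},
}

# With a handful of participants and no data science support, comparisons
# of averages are often the most honest summary available.
for name, data in results.items():
    print(f"{name}: mean time {statistics.mean(data['seconds']):.1f}s, "
          f"mean rating {statistics.mean(data['rating']):.1f}")
```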

By understanding and utilizing these alternative experiment types, teams can adapt their research strategies to fit their resource availability and still achieve meaningful, actionable insights. Evaluating where your team stands on each of the seven components also sets the stage for a strategic roadmap toward building out all of them. A solid approach is to begin experimenting with what you have today and use the insights to build support for further investment in improving the components.

Ultimately, it is up to the product and research teams to carefully assess their data needs in conjunction with the available resources to choose the most appropriate experimental approach. Whether striving for the high fidelity of fully digital, ideal experiments, leveraging existing data through fully retro experiments, or employing basic moderated experiments to gather direct user feedback, the key is to align the chosen method with the team's objectives and constraints. By thoughtfully considering the spectrum of experimental options and understanding the trade-offs involved, teams can effectively gather the evidence needed to drive informed, evidence-based decision-making, ensuring that product development remains user-focused and impactful.

Connor is a keynote speaker at #mtpcon North America. During his keynote 'AI Features Demand Evidence-Based Decisions', Connor will share the essential skills needed to lead successful teams and the traits that will help next-gen product people excel in their careers. Gain insights into tools that empower practitioners and leaders to excel in their daily work.

Don't miss out on this opportunity to learn from product leaders - buy your ticket!
