A new way to look at applied experiments

In this article, Connor Joyce, author of Bridging Intention to Impact, shares how to design effective experiments that drive real results, even with limited resources.

In a recent piece, I shared the power of identifying a feature’s intended purpose as a means of driving alignment around what should be designed and how success should be measured. Once a team has a clear understanding of what their new feature is intended to achieve, it becomes crucial to validate its effectiveness through experimentation. When a feature works, the team can replicate it in future efforts and market the impact it creates. If a feature does not perform as expected, experimentation provides the insights needed to refine and redirect efforts, ensuring that product development remains aligned with user needs and business goals.

While experimentation is valuable, many teams still struggle to conduct experiments, often because of a few misconceptions. One common belief is that all experiments must equate to the gold standard of academia, the Randomized Controlled Trial (RCT). This leads to the false notion that simpler or less rigorous methods are inadequate. Another misconception is that experiments require a complete digital infrastructure to be valuable. While it is true that a full experimentation stack makes testing at scale easier, it is something to strive for rather than a prerequisite. Beyond these, a general lack of knowledge about how to properly set up experiments further complicates the process.

There are many great pieces describing how to set up experiments, but fewer that help teams get started today. This article, along with my new book, “Bridging Intention to Impact”, aims to demystify experimentation by introducing a new gold standard for experiments on digital products. It then breaks down the seven essential components of experimentation, illustrating that not all of them need to be fully in place to conduct a valuable experiment. By providing this comprehensive guide, I hope to empower teams to design and implement experiments regardless of their current capabilities. Doing so equips the team with shareable insights rather than just theoretical pitches, putting them on a path to make the case that the company should invest further in experimental infrastructure.

The new ideal experiment

Randomized Controlled Trials (RCTs) are often considered the gold standard in experimental research due to their ability to minimize noise and extraneous variables, thus providing highly reliable results. However, in an applied setting, RCTs are rarely realistic. The extensive resources, time, and controlled environments required for RCTs make them impractical for most product teams, who operate under tight deadlines and budget constraints. Consequently, teams must break free from the notion that RCTs are the ultimate goal and instead focus on creating practical, feasible experiments that still yield valuable insights.

The ideal experiment for many teams is fully digital and easily executable. This approach allows for rapid testing and iteration, which is essential in today's fast-paced development cycles. Teams like Airbnb exemplify this strategy, running hundreds of experiments annually by leveraging their ability to conduct fully digital tests. This agility enables them to continually optimize their offerings and stay competitive in the market. A fully digital experiment comprises several key components that collectively ensure its effectiveness and feasibility. Firstly, digital execution allows for seamless data capture and broad participant reach. By utilizing feature management systems, teams can quickly deploy and toggle new features for different user groups, facilitating efficient and targeted testing.
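
Feature-management systems differ in their APIs, but the underlying mechanic of splitting users into groups is simple. As a minimal, vendor-neutral sketch (the experiment name and user ID below are hypothetical), deterministic bucketing can be done by hashing the user and experiment together:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user into an experiment variant.

    Hashing the user ID together with the experiment name gives every
    user a stable assignment with no stored state, and independent
    assignments across different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Gate the new feature for the treatment group only.
if assign_variant("user-8341", "onboarding-checklist-v2") == "treatment":
    print("show new onboarding checklist")  # placeholder for the new feature
else:
    print("show existing experience")       # placeholder for the control path
```

Because the assignment is a pure function of the user and experiment IDs, any service can recompute it without a shared database, which is one reason this pattern is common in feature-flagging tools.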

Data infrastructure is another critical component. Robust mechanisms for collecting, storing, and analyzing data are essential for real-time analysis and informed decision-making. Advanced data science capabilities further enhance the experiment's value by applying sophisticated statistical techniques to uncover deeper insights. In addition to behavioral metrics, collecting attitudinal data through surveys and feedback tools provides a comprehensive view of the user experience. This qualitative data complements the quantitative findings, offering a more nuanced understanding of how features impact users. A well-chosen participant pool for both the behavioral and attitudinal data ensures the generalizability of the findings.
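
To make the pairing of behavioral and attitudinal data concrete, here is a small hypothetical sketch in Python with pandas. The column names and values are invented; the point is simply that survey responses joined onto behavioral records let both signals be summarized per experiment variant:

```python
import pandas as pd

# Hypothetical exports: one row per user with assigned variant and a
# behavioral metric, and one row per survey response (1-5 satisfaction).
behavior = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "variant": ["control", "treatment", "control", "treatment"],
    "tasks_completed": [3, 5, 2, 6],
})
surveys = pd.DataFrame({
    "user_id": ["u1", "u2", "u4"],
    "satisfaction": [3, 4, 5],
})

# Not every user answers the survey, so a left join keeps the full
# behavioral sample and fills missing attitudinal responses with NaN.
combined = behavior.merge(surveys, on="user_id", how="left")

# Summarize both signals per variant to see whether they tell the same story.
print(combined.groupby("variant")[["tasks_completed", "satisfaction"]].mean())
```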

In summary, while RCTs may be ideal in theory, the reality of applied settings necessitates a more pragmatic approach. By focusing on fully digital experiments and incorporating the essential components of data infrastructure, advanced analytics, attitudinal data collection, and diverse participant recruitment, teams can conduct effective experiments that drive evidence-based decision-making and continuous improvement.

Digital behavioral experiments are fully digital and completed in real time with a random sample, but without any attitudinal data.

Digital or in-person: This experiment is fully digital, such that a user may not even know they are part of the test.
Real-time or retroactive: It runs in real time, with users actively engaging with the feature in their live experience.
Feature rollout system: This requires a feature-flagging system through which users are greeted with different options.
Data infrastructure: This requires the ability to capture behavioral data (see the logging sketch after this list).
Data science capability: Managing all the behavioral data requires someone trained in data science.
Surveying capability: Not required for completion.
Participant pool: Participants are drawn from this group of product users.
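
Capturing the behavioral side of such an experiment does not require heavy tooling to start. A minimal sketch, assuming a JSON-lines file as a stand-in for whatever event pipeline or warehouse the team actually has, tags every event with its experiment and variant so the groups can be compared later:

```python
import json
import time

def log_event(user_id: str, experiment: str, variant: str,
              event: str, **properties) -> None:
    """Append one behavioral event as a JSON line, tagged with the
    experiment and variant so results can be split by group later."""
    record = {"timestamp": time.time(), "user_id": user_id,
              "experiment": experiment, "variant": variant,
              "event": event, **properties}
    with open("events.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical usage: record that a treatment-group user finished onboarding.
log_event("user-8341", "onboarding-checklist-v2", "treatment",
          "checklist_completed", steps_completed=5)
```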

The seven components of an experiment

The previous section outlined an ideal experiment, structured around seven primary components. These components form the foundation of a robust experimental framework. Each component is given a name and an ideal state, providing a deeper understanding of its role and importance, along with a way to gauge whether a company has established it. It is crucial to remember that not all seven components are required to conduct a successful experiment. Instead, teams should evaluate their current capabilities in each category and design an experiment that is realistic and feasible given their resources and constraints. This approach ensures that even with limited resources, meaningful and actionable insights can still be obtained.

  1. Digital or in-person execution: Choosing between digital and in-person execution is crucial depending on the product's nature and the data required. Digital experiments are often more feasible, allowing for automated data capture and broader participant reach. However, in-person experiments may be necessary for physical products or when collecting detailed attitudinal data. In such cases, usability studies and ethnographic research can provide valuable insights, even if they require larger sample sizes for statistical significance.
  2. Experimentation platform: An effective experimentation platform is essential for deploying new features and managing variations efficiently. In a digital environment, feature management systems enable quick toggling of features for different user groups. This flexibility is critical for iterative testing and rapid adjustments. In physical settings, although more challenging, similar principles apply to ensure that variations can be tested without significant delays.
  3. Data infrastructure: Robust data infrastructure underpins the entire experimentation process. It involves mechanisms for passive data collection, storage solutions, and systems for building and analyzing metrics. Ideally, data should flow seamlessly from collection to analysis, allowing for real-time insights. When advanced infrastructure is lacking, teams may resort to manual data collection and rudimentary storage solutions, but the goal remains to ensure data is accessible and usable.
  4. Data science capability: Advanced data science capabilities elevate the quality of insights derived from experiments. Teams with expertise in statistical analysis and causal inference can perform more sophisticated analyses, such as propensity score matching (sketched after this list), enhancing the validity of the results. While not always necessary, having these capabilities adds a layer of rigor that brings experiments closer to the gold standard of causality.
  5. Real-time or retroactive data collection: Deciding whether to collect data in real-time or retrospectively depends on the experiment's objectives and available resources. Real-time experiments involve recruiting participants and capturing data as they interact with the product. In contrast, retroactive studies use pre-existing data, which can be more efficient but may lack some real-time nuances. Both approaches have their merits, and the choice should align with the study's goals.
  6. Attitudinal data collection: Collecting attitudinal data provides qualitative insights that complement behavioral metrics. Surveys and feedback tools are the primary methods for gathering this data. Advanced setups might use automated triggers to send surveys based on user actions, integrating attitudinal data collection seamlessly into the user experience. This capability helps capture users' perceptions and sentiments, adding depth to the quantitative data.
  7. Participant pool: A well-chosen participant pool ensures the findings are representative and actionable. Recruiting a diverse and representative sample can be challenging, but it is crucial for reducing bias and enhancing the reliability of the results. Strategies for participant recruitment include using digital systems to reach a broad audience or building insider groups willing to participate in research. A convenience sample may suffice in some cases, but striving for randomness and diversity is always beneficial.
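
To illustrate the fourth component, here is a minimal sketch of propensity score matching in Python with scikit-learn. The dataset is synthetic and the covariates are hypothetical; in a real retroactive study they would come from the team’s warehouse. The same technique appears in the fully retro example in the next section:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Synthetic retroactive dataset: users who organically adopted a feature
# (treated) versus those who did not, with covariates that influence adoption.
rng = np.random.default_rng(0)
n = 1000
tenure = rng.normal(12, 4, n)      # months of experience with the product
sessions = rng.normal(5, 2, n)     # weekly sessions
adopt_prob = 1 / (1 + np.exp(-(0.05 * tenure + 0.2 * sessions - 1.5)))
treated = rng.binomial(1, adopt_prob)
outcome = 0.3 * treated + 0.1 * sessions + rng.normal(0, 1, n)
df = pd.DataFrame({"tenure": tenure, "sessions": sessions,
                   "treated": treated, "outcome": outcome})

# 1. Model each user's propensity to adopt the feature from the covariates.
covariates = ["tenure", "sessions"]
ps_model = LogisticRegression().fit(df[covariates], df["treated"])
df["propensity"] = ps_model.predict_proba(df[covariates])[:, 1]

# 2. Match each adopter to the non-adopter with the nearest propensity score.
adopters = df[df["treated"] == 1]
non_adopters = df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(non_adopters[["propensity"]])
_, idx = nn.kneighbors(adopters[["propensity"]])
matched = non_adopters.iloc[idx.ravel()]

# 3. The outcome gap between matched groups approximates the feature's effect.
effect = adopters["outcome"].mean() - matched["outcome"].mean()
print(f"estimated effect of feature adoption: {effect:.2f}")
```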

The spectrum of experiments

Experiments can be viewed on a spectrum ranging from high-fidelity, resource-intensive setups to simpler, more accessible methods. At one end of the spectrum lies the ideal experiment, characterized by fully digital execution, comprehensive data collection, and advanced analytics capabilities. These experiments provide the most reliable and actionable insights but require significant resources and infrastructure. In the middle of the spectrum are fully retro experiments, which use existing behavioral data to draw insights without the need for real-time data collection. This method strikes a balance between fidelity and feasibility, leveraging robust data infrastructure while minimizing the need for new data collection efforts. At the other end of the spectrum are basic moderated experiments, the most accessible and least resource-intensive option. These involve direct interaction with participants in a live setting, providing valuable insights with minimal technological requirements, albeit from a limited sample size. Understanding this spectrum allows teams to choose the experimental approach that best fits their resources and objectives. The following examples describe these experiment types in detail:

Fully retro experiments are digital, and the experimental groups are constructed from historical usage of a feature. Given the passive nature of the study, they can use only behavioral data.

Digital or in-person: This is a digital experiment where all participants interact with the feature on a SaaS application.
Real-time or retroactive: Deployed on behaviors that already occurred, and thus done retroactively.
Feature rollout system: The feature was rolled out to everyone, but not all users have engaged with it.
Data infrastructure: Behavioral data is tracked on users who engage with the feature, and metrics are created to align with the specific behaviors outlined in the user outcome connection.
Data science capability: Using the propensity score matching statistical technique, the study compares feature users against a group of non-users who share similarities on all relevant metrics (for example, demographic characteristics, types of behavior on SaaS products, months of experience with the product). This enables an experiment in a completely retroactive sense.
Surveying capability: Since this is completely retroactive, no surveying occurs.
Participant pool: This study is completed on a random sample of current users.
Moderated experiments involve physically showing prototypes to users and actively collecting behavioral and attitudinal feedback; because participants interact with prototypes rather than a live feature, outcome metrics are only hypothetical.

Digital or in-person: Physical experiments conducted with users in the headquarters office.
Real-time or retroactive: Performed in real time, as people interact with the prototype in a live setting.
Feature rollout system: Participants can interact with different types of prototypes.
Data infrastructure: A notetaker is present in each interview to record how participants interact with the feature; all data is collected through this notetaker.
Data science capability: Limited data science skills on the team restrict the results to comparisons of averages and other basic descriptive analytics (a minimal example follows this list).
Surveying capability: Participants respond to a basic survey after the study.
Participant pool: Interviewees are chosen from a pool of users who responded to a random solicitation about the study.
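
For a moderated study like this, analysis can stay deliberately simple. A minimal sketch with hypothetical task times and survey ratings shows the kind of comparison of averages the list above describes:

```python
import statistics

# Hypothetical notes from a moderated study: task-completion time in seconds
# and post-study survey rating (1-5) for each participant, per prototype.
results = {
    "prototype_a": {"seconds": [48, 62, 55, 70], "rating": [4, 3, 4, 5]},
    "prototype_b": {"seconds": [39, 44, 51, 42], "rating": [5, 4, 4, 5]},
}

# With a handful of participants and no data science support, comparisons
# of averages are often the most honest summary available.
for name, data in results.items():
    print(f"{name}: mean time {statistics.mean(data['seconds']):.1f}s, "
          f"mean rating {statistics.mean(data['rating']):.1f}")
```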

By understanding and utilizing these alternative experiment types, teams can adapt their research strategies to fit their resource availability and still achieve meaningful, actionable insights. Evaluating where your team stands on each of the seven components also sets the stage for a strategic roadmap toward building out all of them. A solid approach is to begin experimenting with what you have today and use the insights to build support for further investment in improving the components.

Ultimately, it is up to the product and research teams to carefully assess their data needs in conjunction with the available resources to choose the most appropriate experimental approach. Whether striving for the high fidelity of fully digital, ideal experiments, leveraging existing data through fully retro experiments, or employing basic moderated experiments to gather direct user feedback, the key is to align the chosen method with the team's objectives and constraints. By thoughtfully considering the spectrum of experimental options and understanding the trade-offs involved, teams can effectively gather the evidence needed to drive informed, evidence-based decision-making, ensuring that product development remains user-focused and impactful.

Connor is a keynote speaker at #mtpcon North America. During his keynote 'AI Features Demand Evidence-Based Decisions', Connor will share the essential skills needed to lead successful teams and the traits that will help next-gen product people excel in their careers. Gain insights into tools that empower practitioners and leaders to excel in their daily work.

Don't miss out on this opportunity to learn from product leaders - buy your ticket!
