Familiarize yourself with the statistical testing method used by Amplitude Experiment.
Understand and explain the difference between sequential testing and t-tests.

Amplitude Experiment uses a sequential testing method of statistical inference. Sequential testing has several advantages over t-tests, another widely used method. Chief among them: you don't need to know how many observations you'll need to achieve significance before you start the experiment.

Why is this important? With sequential testing, results are valid whenever you view them. That means you can decide to terminate an experiment early based on the observations made to that point, and the number of observations you'll need to make an informed decision is, on average, much lower than the number you'd need with a t-test or similar procedure. You can experiment more quickly, incorporating new learnings into your product and accelerating the pace of your experimentation program.

This article explains the basics of sequential testing, how it fits into Amplitude Experiment, and how you can make it work for you.

Hypothesis testing in Amplitude Experiment

When you run an A/B test, Experiment conducts a hypothesis test using a randomized controlled trial, in which users are randomly assigned to either a treatment variant or the control. The control represents your product as it currently is, while each treatment includes a set of potential changes to your current baseline product. With a predetermined metric, Experiment compares the performance of these two populations using a test statistic.

In a hypothesis test, you're looking for performance differences between the control and your treatment variants. Amplitude Experiment tests the null hypothesis, which states that there's no difference between the treatment's mean and the control's mean. For example, if you're interested in measuring the conversion rate of a treatment variant, the null hypothesis posits that the conversion rates of your treatment variants and your control are the same. The alternative hypothesis states that there is a difference between the treatment and control.

Experiment's statistical model uses sequential testing to look for any difference between treatments and control. There are a number of different sequential testing options. Amplitude Experiment uses a family of sequential tests called the mixture sequential probability ratio test (mSPRT). The weight function, H, is the mixing distribution. So we get the following mixture of likelihood ratios against the null hypothesis that $\theta = \theta_0$:

$$\Lambda_n^{H,\theta_0} = \int_{\Theta} \prod_{i=1}^{n} \frac{f_{\theta}(X_i)}{f_{\theta_0}(X_i)} \, dH(\theta)$$

Here $f_{\theta}$ is the density of a single observation $X_i$ under parameter value $\theta$.

Currently, Amplitude only supports a comparison of arithmetic means between the treatment and control variants for uniques, average totals, and sum of property.

How does sequential testing compare to a t-test?

As mentioned above, using sequential testing lets you look at the results whenever you like. But fixed-horizon tests, such as t-tests, can give you inflated false positives if you peek while your experiment is running.

Below is a visualization of p-values over time in a simulation we ran of 100 A/A tests for a particular configuration (alpha=0.05, beta=0.2). As we ran a t-test on data coming in, we peeked at our results at regular intervals. Whenever we saw the p-value fall below alpha, we stopped the test and concluded that it had reached statistical significance. You can see the p-values fluctuate quite a bit, even before the end of our test, when we've reached 10,000 visitors. By peeking, we are inflating the number of false positives.

The table below summarizes the number of rejections we have for different configurations of our experiment when we run a t-test. Here, “baseline” is the conversion rate of our control variant, and “delta_true” is the absolute difference between our treatment and the control. Since this is an A/A test, there is no difference. With alpha set to 0.05, the number of rejections far exceeds the Type I error threshold we set if we peek at our results: with 100 tests, num_reject should be around 5 in this example.

Now compare that to a sequential testing approach.
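The peeking effect described in this article can be reproduced with a small simulation. The sketch below is illustrative only and is not Amplitude's implementation: it assumes 5,000 visitors per arm, a baseline conversion rate of 0.1, a look every 100 visitors, a pooled two-proportion z-test standing in for the t-test, and a normal-approximation mSPRT whose mixing distribution H is taken to be normal with an assumed variance `TAU_SQ`. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
TAU_SQ = 0.01 ** 2  # assumed variance of the normal mixing distribution H


def z_test_p_value(diff, var, n):
    """Two-sided fixed-horizon p-value for a difference in means."""
    z = diff / math.sqrt(2 * var / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def msprt_likelihood_ratio(diff, var, n, tau_sq=TAU_SQ):
    """Normal-mixture mSPRT likelihood ratio against 'no difference'."""
    scale = 2 * var + n * tau_sq
    return math.sqrt(2 * var / scale) * math.exp(
        n * n * tau_sq * diff * diff / (4 * var * scale)
    )


def simulate_aa_tests(num_tests=100, visitors_per_arm=5000, baseline=0.1,
                      peek_every=100, seed=0):
    """Count false positives in A/A tests under three decision rules."""
    rng = random.Random(seed)
    counts = {"fixed_horizon": 0, "peeking": 0, "msprt": 0}
    for _ in range(num_tests):
        conv_a = conv_b = 0
        p_value = 1.0         # most recent fixed-horizon p-value
        always_valid_p = 1.0  # running minimum of 1 / likelihood ratio
        peek_rejected = False
        for n in range(1, visitors_per_arm + 1):
            # Both arms share the same true conversion rate (A/A test).
            conv_a += rng.random() < baseline
            conv_b += rng.random() < baseline
            if n % peek_every != 0:
                continue
            pooled = (conv_a + conv_b) / (2 * n)
            var = pooled * (1 - pooled)  # plug-in Bernoulli variance
            if var == 0:
                continue  # no usable variance estimate yet
            diff = (conv_b - conv_a) / n
            p_value = z_test_p_value(diff, var, n)
            peek_rejected = peek_rejected or p_value < ALPHA
            ratio = msprt_likelihood_ratio(diff, var, n)
            always_valid_p = min(always_valid_p, 1 / ratio)
        counts["fixed_horizon"] += p_value < ALPHA  # single look at the end
        counts["peeking"] += peek_rejected          # stop at first p < alpha
        counts["msprt"] += always_valid_p <= ALPHA  # sequential decision
    return counts


if __name__ == "__main__":
    print(simulate_aa_tests())
```

Because every variant here is identical, each rejection is a false positive. The "peeking" count plays the role of num_reject when the z-test is checked at every look, "fixed_horizon" checks it only once at the end, and "msprt" uses the running minimum of the inverse likelihood ratio as a p-value that stays valid at every look. Exact counts depend on the seed and on the assumed parameters.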