Why Run A/B Experiments
An online A/B experiment is a scientific method for establishing causality.
Online experiments can detect the impact of small changes.
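Detecting small changes takes the kind of traffic online products have, because the required sample size grows with the inverse square of the minimum detectable effect. Below is a minimal sketch of the standard normal-approximation sample-size formula for comparing two conversion rates with a two-sided test. The baseline rate, α = 0.05, and 80% power are illustrative assumptions, and `sample_size_per_arm` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate units needed per arm for a two-sided two-proportion z-test.

    baseline:     control conversion rate (e.g. 0.05 for 5%)
    relative_mde: smallest relative lift worth detecting (e.g. 0.10 for +10%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Shrinking the detectable lift tenfold multiplies the required sample size by roughly 100.
print(sample_size_per_arm(0.05, 0.10))
print(sample_size_per_arm(0.05, 0.01))
```

With a 5% baseline, detecting a 10% relative lift needs on the order of 31,000 units per arm, while a 1% relative lift needs roughly 3 million, nearly 100 times more.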
Most enterprises and product companies decide which features to implement, track progress, and deliver those features to production.
Success in these companies is measured by the percentage of features delivered on time and within budget.
Online experimentation makes it possible to determine the impact of features on the fly, which lets companies innovate for their customers.
Without online experimentation, companies discover the impact of features only after months or years. During that time, teams get promoted for delivering features. Online experimentation shifts the culture toward delivering impact and learning through experimentation.
To run A/B experiments (A/B testing), experimentation platforms should abstract the following aspects of experimental design:
Statistical unit for observation
Sampling traffic
ML metrics definition and computation
Business metrics definition and computation
Loss functions and alignment with metrics
Modeling architecture
Evaluation
ML engineers need to understand these intricacies in order to build the dials.
The abstraction lets them run experiments with minimal code. In most cases no code change is needed; in some cases, code changes will be required.
Experimenters need to be well informed about these intricacies in order to design experiments and validate hypotheses.
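Two of the aspects listed above, the statistical unit and traffic sampling, are commonly handled by deterministic hashing. Below is a minimal sketch, assuming the user ID is the statistical unit and the experiment name is used as a salt so that assignments in different experiments are independent. `assign_variant` and its parameters are hypothetical and do not correspond to any particular platform's API.

```python
import hashlib
from typing import Optional, Sequence


def _uniform(key: str) -> float:
    """Hash a string to a float in [0, 1) using the top 53 bits of SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_variant(unit_id: str, experiment: str, traffic_fraction: float,
                   variants: Sequence[str] = ("control", "treatment")) -> Optional[str]:
    """Map a statistical unit to a variant, or None if it is not sampled.

    The same (unit_id, experiment) pair always gets the same answer, so a user
    sees a consistent experience without the platform storing assignments.
    """
    # Sampling decision: is this unit inside the experiment's traffic allocation?
    if _uniform(f"{experiment}:sample:{unit_id}") >= traffic_fraction:
        return None
    # Variant decision uses a separately salted hash, independent of traffic_fraction.
    index = int(_uniform(f"{experiment}:variant:{unit_id}") * len(variants))
    return variants[min(index, len(variants) - 1)]


print(assign_variant("user-42", "longer-ad-titles", traffic_fraction=0.1))
```

Because sampling and variant choice use separately salted hashes, ramping `traffic_fraction` from 0.1 to 0.5 only adds units; every unit already in the experiment keeps its variant.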
A/B testing is also referred to as:
Field experiment
Split test
Flight
Bucket test
A/B/n test
AB experiment
A/B testing tools include:
Google Optimize
Optimizely
Convert
Adobe
AB Tasty
Evolv.ai
Evergage
Apptimize
Interaction Studio
Taplytics
Neatab
Intellimize
Leanplum
SiteSpect
split.io
Adobe Target
Eppo
LaunchDarkly
VWO
growthbook.io
Some teams report highly significant results from small sample sizes. Statistically significant results from underpowered experiments exaggerate the lift, leading teams to believe they have great results when they do not.
Underpowered experiments are prone to erroneous conclusions: statistically significant results from small samples (low-powered tests) are a common cause of bad A/B experiments.
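The exaggeration described above can be reproduced with a small simulation. Below is a minimal sketch, assuming a 10% baseline conversion rate, a true 20% relative lift, a two-sided pooled z-test at α = 0.05, and a normal approximation to each arm's sampled conversion rate; all names and numbers are illustrative. It reports an exaggeration ratio: the average absolute lift among statistically significant runs divided by the true lift.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(p_a: float, p_b: float, n: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test with n units per arm."""
    pooled = (p_a + p_b) / 2  # equal arm sizes, so the pooled rate is the plain mean
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def exaggeration_ratio(n_per_arm: int, baseline: float = 0.10, true_lift: float = 0.20,
                       runs: int = 5000, alpha: float = 0.05, seed: int = 7) -> float:
    """Mean |observed relative lift| among significant runs, divided by the true lift."""
    rng = random.Random(seed)
    p_treat = baseline * (1 + true_lift)
    significant_lifts = []
    for _ in range(runs):
        # Normal approximation to the sampling distribution of each arm's conversion rate.
        p_a = rng.gauss(baseline, (baseline * (1 - baseline) / n_per_arm) ** 0.5)
        p_b = rng.gauss(p_treat, (p_treat * (1 - p_treat) / n_per_arm) ** 0.5)
        if not (0 < p_a < 1 and 0 < p_b < 1):
            continue  # skip the (vanishingly rare) out-of-range draws
        if z_test_p_value(p_a, p_b, n_per_arm) < alpha:
            significant_lifts.append(abs(p_b / p_a - 1))
    if not significant_lifts:
        raise ValueError("no significant runs; increase runs or n_per_arm")
    return fmean(significant_lifts) / true_lift


print(exaggeration_ratio(300))   # underpowered: significant "wins" overstate the lift
print(exaggeration_ratio(4000))  # roughly 80% power: estimates stay close to the truth
```

With these defaults, the 300-per-arm experiment has only about 12% power, and its significant results overstate the true 20% lift by well over a factor of two; at about 4,000 units per arm the ratio stays close to 1.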
Example
Bing: lengthen the ad title line by combining text from the ad's first line into the title.
Bing ran approximately 1,000 experiments per year; only a few of them had significant impact.
Bing ran experiments on user interface, personalization, algorithmic, and content changes, among others.
Experiments are run on web portals, mobile sites, mobile apps, and desktop apps.