I know that’s a bold claim, but we can back it up with math, given some easily quantifiable assumptions.
I’ve been working with mobile platform teams lately, and one consistent theme is that platform teams rely heavily on metrics. You might argue that metrics shouldn’t be everything and can obscure more fundamental developer productivity concerns, but that’s a different discussion.
Instead, I like to meet my customers where they are. If metrics are your language, that’s the language I’ll speak to you in.
And most platform teams care a lot about build times. It’s also typically one of the most heavily instrumented metrics. So let’s start there. In a future blog post, I’ll talk about other metrics you can track that are more specific to screenshot testing.
TL;DR: screenshot testing results in developers having to re-run builds, which can be completely avoided with Screenshotbot.
The first assumption is that your organization relies heavily on screenshot testing (a.k.a. snapshot testing). In a company that uses screenshot testing, you can estimate that about 25% of pull requests (or pull request revisions) will result in screenshot changes. I’ve spoken to companies that estimate this a lot higher, but 25% seems to be what our internal data tells us. It’s quite easy to estimate this for your own company by running a script and seeing how many PRs changed screenshots (I’ve sketched one such script below). Let’s call this ratio p, which we’ll keep at 0.25 for the rest of our calculations.
(This number might also vary with teams: a product team might have this closer to 90%, and a backend team might be closer to 0%.)
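If you want a quick way to measure p, here’s a minimal sketch of the kind of script I mean. It assumes your screenshots live under paths like `screenshots/` or `__snapshots__/` and that PRs land as merge commits; the markers and sample size are placeholders you’d adjust for your own repo:

```python
#!/usr/bin/env python3
"""Rough estimate of p: the fraction of merged PRs that changed screenshots."""
import subprocess

# Adjust these for your repo. Both are assumptions about your layout.
SCREENSHOT_MARKERS = ("screenshots/", "__snapshots__/")
SAMPLE_SIZE = 500  # how many recent merged PRs to sample


def changed_files(merge_sha: str) -> list[str]:
    """Files introduced by a PR merge, i.e. the diff against the first parent."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{merge_sha}^1", merge_sha],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()


def main() -> None:
    # List the most recent merge commits (one per merged PR).
    merges = subprocess.run(
        ["git", "log", "--merges", f"-n{SAMPLE_SIZE}", "--pretty=%H"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    with_screenshots = sum(
        1 for sha in merges
        if any(any(m in path for m in SCREENSHOT_MARKERS)
               for path in changed_files(sha))
    )
    if merges:
        print(f"p ≈ {with_screenshots / len(merges):.2f} "
              f"({with_screenshots}/{len(merges)} merged PRs touched screenshots)")


if __name__ == "__main__":
    main()
```

If your repo squash-merges PRs instead of creating merge commits, you’d sample regular commits on your main branch instead, but the idea is the same.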
Now, let’s understand what the workflow looks like if you don’t use Screenshotbot:
A developer sends a UI change, tests run on CI, and the tests fail because the screenshots have not been updated. If the developer decides this is an intentional change and not a regression, at most organizations they would do one of two things:
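- Re-record the screenshots locally on their laptop, commit the updated images, and push a new version of the PR; or
- Trigger a CI job that re-records the screenshots and pushes the updated images back to the PR.

(These are the two workflows I see most often; your organization’s exact mechanism may differ.)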
In both cases, note that we’re running the tests twice on CI (and in the first case, one additional time on the laptop… but we’ll ignore this extra cost for now, since we’re focusing on the platform team’s metrics.)
This means that if the tests initially took time T to run, this PR now takes 2T of test time for essentially one version of the code. This is clearly bad, but how do we quantify the overall cost to the company across all PRs?
Well, this happens in only a fraction p of PRs, since those are the only PRs that affected screenshots. There’s one more thing we need here: the developer only re-runs the screenshot tests if they decide the change is intentional and not a regression.
This brings us to a second variable we need to estimate: when screenshots change, what percentage of those changes are intentional product changes vs. regressions? This number is a little harder to estimate with your existing tooling, but I can suggest two ways of doing it:
Let’s conservatively estimate this at 95%. You can come up with your own estimate here. We’ll call this ratio q.
Given these two variables, we can now estimate the expected build time across the two runs. With probability p, a PR affects screenshots, and then with probability q that change is not a regression, so the developer has to re-run the build. If they re-ran the build, the total test time was 2T, and otherwise just T. Using some simple probability theory, we can write our expected build time as:
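$$\mathbb{E}[\text{build time}] = pq \cdot 2T + (1 - pq) \cdot T = (1 + pq)\,T$$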
Now, if you use Screenshotbot, the build time will always be T, since we’ll never have to trigger a second build. The developer just accepts the screenshots and goes on their way.
Thus, we can estimate the average CI savings, as a fraction of the current expected build time, as:
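$$\text{savings} = \frac{(1 + pq)\,T - T}{(1 + pq)\,T} = \frac{pq}{1 + pq}$$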
For our initial estimates of p = 0.25 and q = 0.95, this works out to about 19%, and it does not depend on the actual time it takes to run each build.
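Plugging in the numbers:

$$\frac{0.25 \times 0.95}{1 + 0.25 \times 0.95} = \frac{0.2375}{1.2375} \approx 0.19$$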
Clearly, this shows that on average you’ll save about 19% across builds that run screenshot tests. In particular, if you use emulator- or simulator-based screenshot tests, this cost can be pretty high, since those jobs typically use beefy machines or expensive emulator cloud services. I don’t know how much you’re paying for CI, but these savings can be substantial in raw monetary value alone.
However, it’s equally important to look at these savings from a developer productivity perspective. When it comes to developer productivity, most platform teams track P95 build times instead of average build times. Well, we’ve already agreed that 25% of PRs result in screenshot changes, so in the P95 case you’re definitely hitting screenshot changes. You can work out the same math for the P95 case, and you’ll find it to be:
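$$\text{P95 savings} = \frac{2T - T}{2T} = 50\%$$

In other words, in the P95 case your effective test time drops from 2T without Screenshotbot to T with it.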
There’s an additional saving from avoiding network transfer if you currently use GitLFS. With GitLFS, you’re downloading all the images during every test run. With Screenshotbot, you’re only uploading the screenshots that actually changed. Without looking at your metrics, I can’t quantify this for you, but if you have gigabytes of screenshots, expect it to be substantial (and let me know what you find).
If you liked this blog post, please share it with your platform/DevEx team. At the end of the day, the platform team’s goal is to keep developers productive and happy. If you’re being slowed down by your screenshot tests, it might be time for your company to start using Screenshotbot.
You can also reach me at arnold@screenshotbot.io.