One of the longest-running performance measurements we have is how long it takes Firefox to start. We do it very simply, just to get a raw number (and yes, there have been many improvements made, but this is the gist of the automation):
- Start Firefox with a URL whose query string carries the launch time, like "start=" + <current time in ms since the epoch>
- The page prints the timestamp to the console (we can do that because we control the browser and the profile)
- Automation reads the console and puts the value in a database
Pretty simple. To apply this to different browsers, you have to nix the "print to console" idea. But how hard could it be to POST to a web service that stuffs your result into a database? Do that and the rest will all "just work", right?
Well, not really. Every browser implements the cross-origin request policy to a different degree, and since we did this on Android, some of them don't seem to support it at all. Once we found a way around that, we realized that not all the data was making it into the database: the automation would kill the browser before it had a chance to POST its results. So we slowed that down, forcing the automation to wait 20s before closing the browser. Then our database crashed. That part we had nothing to do with, but Murphy's law states that you can't have an automation project without at least one bonfire igniting under your chair.
Add to this cross-browser headache that we're automating this on multiple phones. The older Nexus phones (Nexus One and Nexus S) will not stay connected to a wireless network after reboot (appears fixed with the Galaxy Nexus or with ICS, not sure which). Even if you put these phones on an open network with no contention and set them to "join automatically", they will at some point boot into a state with their wireless disabled. We had to write some service code to ensure the wireless remained on and connected to our specific network on boot. Our other phones (a Droid Pro and a Samsung Galaxy S2) have no problem staying connected to the network, but they intermittently "freeze". I'm still trying to debug what this "freeze" actually is, because everything on the phone is functioning fine – network, logcat, process list, etc. are all normal. However, the phone stops running the automation. It's interesting that the Nexus phones never encounter this issue, even though all the phones run the same version of the automation code and browsers.
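The keep-alive idea, viewed from the harness side, amounts to a watchdog like the one below. Our actual fix was service code on the phone itself, so this is just a sketch of the same idea over adb; the `dumpsys wifi` text matched here varies across Android versions, so treat that string as an assumption:

```python
# Poll the phone over adb and force wireless back on when it drops.
import subprocess
import time

def wifi_enabled(dumpsys_output: str) -> bool:
    """Decide from `dumpsys wifi` text whether wireless is currently on."""
    return "Wi-Fi is enabled" in dumpsys_output

def keep_wifi_alive(serial: str, poll_seconds: int = 60) -> None:
    while True:
        out = subprocess.run(["adb", "-s", serial, "shell", "dumpsys", "wifi"],
                             capture_output=True, text=True).stdout
        if not wifi_enabled(out):
            # `svc wifi enable` turns the radio back on; rejoining our
            # specific network still relies on the saved configuration.
            subprocess.run(["adb", "-s", serial, "shell", "svc", "wifi", "enable"])
        time.sleep(poll_seconds)
```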
At long last, we have fought through enough of these issues that we can start to see the data coming into our database (select "2 months" or "all" to see data). Because we are merely firing our "timing" function when the "onload" event happens for the page, we can see the interoperability issues with measuring this event. We knew it wasn't perfect, but the results we are seeing on Android make me question the usefulness of this as a cross-browser comparison tool at all.
- Opera seems to fire the onload event randomly. I’m not sure what they are doing, but their timing is all over the place. Note that this could be a fluke in the automation as the Samsung/Droid Pro hang usually occurs during the Opera test (which, by chance, is also the first test). However, note that the Opera numbers for the Nexus phones are also wild, and they are not afflicted by this unusual hang.
- Dolphin and the stock Android browser are both WebKit-based browsers, and we have always known that WebKit tends to fire this event very early in the page-load sequence. This is reinforced by the fact that the event always fires at roughly the same time regardless of the underlying phone hardware, especially in the stock Android browser.
- Fennec – this automation measures the new native Fennec product. Currently, the system contains results from the beginning of the project to the point at which we moved from the birch tree into the mozilla-central tree. I have another set of jobs to run that will get us the last two weeks from the mozilla-central tree, once the phones finish their jobs from the previous two months. Of the four browsers being measured, the only one changing versions is Fennec; therefore, you can see the effect of our developers’ work as they add features and battle regressions. Native Fennec is still under heavy development, and this is why the Fennec number jumps around as much as it does.
The system is far from perfect. Measuring onload is at best an artificial metric, and not at all indicative of what the user sees. In desktop automation we don't even use onload; we use the "MozAfterPaint" event notification. For the next stage of the cross-browser test we are going to automate some visual comparison tests to get closer to measuring the metric that really matters: real-life user experience. In the meantime, the onload tests will continue to give us a rough barometer of our regressions and performance, especially against our own historical data. To that end, I am going to undertake the next few improvements to this automation:
- Understand what the hang is on the Galaxy S2 and Droid Pro phones and fix it
- Add more phones to the system so that it doesn’t take so long to run through a set of jobs (we only need these temporarily until the system catches up on old data).
- Experiment with lowering the timeout period between “results uploaded” and killing the browser under test. (This might work better now that we have changed database backends).
- Get a better front-end UI for the results. If you'd like to contribute to this, let me know, because this website could sure use your help!
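To give a flavor of the visual-comparison direction mentioned above, a first cut could be as crude as scoring two screenshots by the fraction of mismatched pixels. This is a toy sketch under invented assumptions: real frames would come from something like `adb shell screencap`, while here a frame is just a flat list of (r, g, b) tuples:

```python
# Score how different two same-sized frames are: fraction of pixel positions
# where any channel differs by more than `tolerance`.
Pixel = tuple[int, int, int]

def diff_fraction(a: list[Pixel], b: list[Pixel], tolerance: int = 8) -> float:
    if len(a) != len(b):
        raise ValueError("frames must be the same size")
    mismatched = sum(
        1 for pa, pb in zip(a, b)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb))
    )
    return mismatched / len(a)
```

Tracking how long after launch this score drops near zero against a reference frame would measure something much closer to what the user actually experiences than onload does.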