Posted by: cmtalbert | July 2, 2009

Testing Graphics Hardware Acceleration

Why does changing a car’s oil take an hour in California? In Texas, this is a 15 minute process, in and out. I’ve never been a fan of wasting time, so while waiting on my car this morning, I started researching one of the most exciting test opportunities we have in the upcoming Gecko 1.9.2 platform: Graphics Hardware Acceleration.

Most computers that people use today have some amount of GPU acceleration under the hood. Taking advantage of it should result in improved rendering performance, which we will need as the web becomes more graphically interesting.

Our current test suites do not have any notion of hardware accelerated versus software rendering. In fact, on Linux we run our test suites in an X virtual frame buffer, which never touches any graphics hardware. And the graphics hardware on our boxes is pretty bottom of the barrel. I’d bet that the chip rendering this text for you is more advanced.

So…how on earth are we going to test this? I’m mostly concerned about functional testing, but I recognize that whatever solution we come up with here might also be needed to aid the graphics team with their unit tests. My thoughts and research are just beginning on this effort, so this isn’t a complete plan. It is a preview of my plan and a sincere request for help.

First Steps

We need to determine what the top video cards/drivers are for each of our top three platforms: Windows, Linux, and Mac. We can then grab a small handful of machines that have those configurations. I’m thinking maybe just 3-6 machines here, nothing extravagant. I think we would still want to target middle-of-the-road hardware, not top of the line gaming systems. We want to test what the average web user will be running.

At the same time, we need to figure out what graphics benchmarks exist and whether or not these can be leveraged or re-implemented into a graphics benchmark for the web. Since I’m not interested in how we perform writing instructions into the XPCOM components that make up the underbelly of the graphics support, I’m thinking we should create these benchmarks/tests in JavaScript and canvas. We ought to use this new graphics support the way that real developers will be using it once it is released.

Further Thoughts

What we might use here would be a reftest-like framework that would be augmented to compare the test that is hardware accelerated with a reference image that is software rendered. I imagine there are pitfalls this way–with hardware support, the test image may be more detailed than the reference, and we would need a mechanism to detect this condition to flag it as a “pass” rather than a “failure”. I definitely need to think about this some more.
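One way to picture the comparison described above is a per-pixel diff with a tolerance. The following is only a minimal sketch, not how the reftest harness works: the function name `differing_pixels`, the `tolerance` parameter, and the list-of-rows image representation are all assumptions made for illustration.

```python
def differing_pixels(test_image, reference_image, tolerance=0):
    """Count pixels whose channels differ by more than `tolerance`.

    Both images are lists of rows; each row is a list of (r, g, b) tuples.
    This representation is hypothetical, chosen to keep the sketch
    self-contained.
    """
    if len(test_image) != len(reference_image):
        raise ValueError("images have different heights")
    count = 0
    for test_row, ref_row in zip(test_image, reference_image):
        if len(test_row) != len(ref_row):
            raise ValueError("images have different widths")
        for test_px, ref_px in zip(test_row, ref_row):
            # A pixel counts as different if any channel is outside tolerance.
            if any(abs(t - r) > tolerance for t, r in zip(test_px, ref_px)):
                count += 1
    return count


white = (255, 255, 255)
# Software-rendered reference: a 2x2 white square.
reference = [[white, white], [white, white]]
# Hypothetical accelerated output: one slightly off pixel, one clearly off.
accelerated = [[(253, 255, 255), white], [white, (200, 255, 255)]]

strict_count = differing_pixels(accelerated, reference)
fuzzy_count = differing_pixels(accelerated, reference, tolerance=4)
```

With a tolerance of zero both altered pixels are flagged; with a small tolerance only the clearly different one is. Whether a tolerance like this is the right way to treat a "more detailed" accelerated image as a pass is an open question.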

Vision

There are two basic parts here. First, we need a simple test infrastructure that can test our rendering in both accelerated and software modes. We need a set of representative machines to run these tests on and we need those machines automated into our normal build and test reporting structure.

However, our tiny sample of machines will not give us the coverage we really need. So, the other reason this test infrastructure must be simple is that I want to invite all interested users to run it for us. It must be something that the users can download and run on their nightly builds. Once the test finishes, the users should be able to send us the results data from that run with a click of a button. The anonymized results will help us understand how well our code is handling device/driver/OS combinations.

I wanted to start a conversation here, and throw a line out into the wide world of the net to see who might be interested in helping us figure it out. Thoughts and ideas are most welcome. I’ll keep y’all up to date as my plans progress.

Posted by: cmtalbert | June 30, 2009

Firefox 3.5 Release Day!

We are releasing Firefox 3.5 this morning, so we’re in the office really early. Got to watch the sun rise over downtown Mountain View.

It’s going to be a fun day!

Posted by: cmtalbert | March 26, 2009

Greener Tinderbox Results

So, after much more log parsing and number crunching than I had anticipated, I have some results from the Greener Tinderbox effort. For those that don’t know, I’ve been running all the TUnit automated tests over and over on a set of knock-off Tinderboxen trying to hammer out which failures we are seeing, and which we aren’t. My hope is to keep doing this long term, debugging a failure or two every couple of weeks until we get the tinderboxes back to a sane state. But to get started, first I had to collect a bunch of data, parse the data, and determine what the data means. That takes a while, but now, round 1 is done.

= Executive Summary =
The boxes displayed more noise than normal tinderboxes because of a couple of configuration issues described down in the “Quality of Results” section. Many of the tests that failed have been designated as “intermittent” at one time or another or on different products (SeaMonkey, Fennec, etc.). You can grab the results in an OpenOffice.org spreadsheet:

== Next Steps ==
Looking over the top intermittent failures that were replicated from tinderbox, and comparing that to Ted’s new topfails reports, I’ve decided to attack three tests for debugging:
* browser chrome fuel test: browser_Browser.js (bug 458688)
* chrome test test_wheeltransaction.xul (bug 484853)
* mochitest jquery fx module tests (bug 484994)

My plan is to put in debugging information on these tests and run these three harnesses in tight loops for the next 24 hours to see if I can get the failures to replicate and figure out what is happening there.

On Friday, I want to update to a new changeset and re-run all the tests on the machines for another five days. And after getting that data, I’ll need to give the machines back, but I plan to keep the VMs.  Rolling up results should be quite a bit quicker next time as I won’t have any scripts to write for log crunching.

== The VM versus Machine Question ==
There is no doubt that the high end machines I’m using for this blow the VMs out of the water in terms of performance.  However, in terms of intermittent failures with tests, I don’t see any quantitative difference between VMs and machines.  For the second round, I hope to finally have my Linux machine ready and can compare it with the Linux VM.

= Details Details Details =
== Results ==
The pretty results are here (OpenOffice.org spreadsheet). The raw logs are also available in that directory. And I’ll be checking all my log parsing/number crunching/test running code into http://hg.mozilla.org/qa/tunit-analysis once that repo is available.

== The Math ==
I was going to crunch the numbers, but once I started (you can see some on the “Total” sheet in the spreadsheet), the numbers didn’t look quite right to me, so I’m going to repost my “Counting Failures” blog post but with legible equations this time to make sure I’m going about this correctly.

== Quality of Results ==
There are two problems with the results.
1. The accessibility tests on Linux did not run, because I naively thought that --enable-accessible was the default on Linux as it is on Windows. So all the failures on mochitest-a11y on Linux should be disregarded.
2. On Windows, I set the color depth policy as specified in the reference platform document; however, I could not change the resolution or color depth on those machines. I assumed that the policy would somehow take care of it (should have known better). So a bunch of the color testing reftests failed on the Windows machines.

Otherwise, the results seem to be good. I have queried every failure (except the reftest failures as noted, since many of them seem to be machine configuration issues) against Bugzilla to find out if each failure is noise from the box or if it is a failure that has been reported as intermittent on Tinderbox, and I’ve noted those bugs. If the bug field is blank, then there is no mention of this failure in Bugzilla.

Posted by: cmtalbert | March 20, 2009

Counting Failures

So, I’ve been trying to get some test results together over the last few weeks to try to determine why our Tinderboxes keep going orange.  To do that, I’ve been running the TUnit test harnesses constantly on the same change set, amassing quite a bit of data.  One of the things I’d like to be able to do here would be to determine why we get these random failures and how we can minimize them.  I’ve almost acquired my first set of full results, and to complement the investigation into each failure, I’d like to have a set of metrics I could apply to the entire run as a whole so that I could measure a couple of things.

  • Measure the reliability of each test harness in the TUnit suite (i.e. XPCShell, mochitest, mochitest-chrome, mochitest-browser, mochitest-a11y, Reftest, Crashtest).
  • Measure how closely my machines match Tinderbox in terms of the random failures they encounter.
  • Measure how random each failure is.

So, to get each of these, I’ve come up with an idea on how to calculate them.  However, I’m no math whiz, and that’s why I turn to you to get your opinion before I start crunching numbers.  Here are my thoughts on how to measure this stuff.  I apologize in advance for the copies straight from my notes, I wanted to get feedback before spending time on making it extremely pretty.  I’ll do MathML next time, especially if these prove to be illegible.

Reliability of each Harness

To do this, I’m basing my analysis off the standard Mean Time Between Failures metric. Here, it would be more aptly called mean tests per failure. In other words, how many tests do we run before we hit a failure on a given test harness?

Mean Tests Per Failure from my notes

Here, my idea is to sum, over all runs, the difference between the number of tests run in the harness and the number of failures in those test runs, then divide that by the total number of failures from all runs on this harness. This does seem to make sense on the surface: if there are no failures, your number of tests run per failure encountered approaches infinity. If every test fails, the number of tests run per failure encountered approaches zero. One thing that bothers me about this metric is the amount of noise that will be introduced by upgrading to a new changeset, because with a new changeset, we’ll also have a new set of tests in the harness we are testing against.
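The calculation described above can be sketched in a few lines. This is a minimal illustration rather than the actual log-crunching code; the function name `mean_tests_per_failure`, the `(tests_run, failures)` pair layout, and the sample numbers are all made up for the example.

```python
import math


def mean_tests_per_failure(runs):
    """Return (sum of passing tests) / (sum of failures) for one harness.

    `runs` is a list of (tests_run, failures) pairs, one pair per run.
    With no failures at all the metric is treated as infinite.
    """
    total_failures = sum(failures for _, failures in runs)
    if total_failures == 0:
        return math.inf
    total_passes = sum(tests - failures for tests, failures in runs)
    return total_passes / total_failures


# Hypothetical data: three runs of the same harness on one changeset.
example_runs = [(1000, 2), (1000, 0), (1000, 3)]
mtpf = mean_tests_per_failure(example_runs)
```

For the sample data, 2995 passing tests over 5 failures gives 599 tests per failure. A run where every test fails gives zero, which matches the limiting behaviour described above.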

How close are we to Tinderbox

To do this, I’m going to take a discrete view of the total known failures, both those that are known to be random on tinderbox, and those that I’m encountering:

Percent difference the test machines are from the true Tunit machines

This measurement is across all harnesses. Essentially, it is the number of items in the set of observed failures on my machines (G) that are not in the set of known random failures on Tinderbox (T), divided by the number of items in the set of known failures on Tinderbox (T). This will give me a percentage value that indicates how closely we match Tinderbox. If all the tests that fail on my machines are the same tests that are known to fail on tinderbox, then we have a 0% difference. However, if more tests fail on my machines than on tinderbox (which is what initial analysis is showing), then we will have a non-zero percentage difference. One caveat here is that if fewer tests fail on my machines (i.e. not all Tinderbox failures are displayed), then I would still have a 0% difference. Since I’m only trying to determine “how different are my machines acting from Tinderbox’s behavior,” I think this might be OK. Thoughts?
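In set terms, the measurement above is the size of G minus T divided by the size of T. A minimal sketch, assuming failures are identified by test name; the function name `percent_difference` and the test names below are hypothetical.

```python
def percent_difference(observed, known):
    """Percentage of observed failures (G) not already known on Tinderbox (T).

    Both arguments are sets of failure identifiers. The result is
    100 * |G - T| / |T|, so it can exceed 100 if many new failures appear.
    """
    if not known:
        raise ValueError("the set of known Tinderbox failures is empty")
    unexpected = observed - known  # failures seen only on the test machines
    return 100.0 * len(unexpected) / len(known)


# Hypothetical failure names, for illustration only.
tinderbox_known = {"test_a", "test_b", "test_c", "test_d"}
my_machines = {"test_a", "test_b", "test_x"}

difference = percent_difference(my_machines, tinderbox_known)
subset_difference = percent_difference({"test_a"}, tinderbox_known)
```

One new failure against four known ones gives 25%. The second call shows the caveat from the text: when the observed failures are a subset of the known ones, the result is 0% even though not every Tinderbox failure was reproduced.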

Measuring Randomness of a Failure

This is probably the most straightforward measurement that I am considering.

Measurement of how random a given failure is

This is simply the number of times a particular failure occurred (Cf), divided by the number of times we ran that harness (r).  This gives a simple percentage of how likely (l) it is that we will observe the failure in question.  The reason this calculation can be this simple is that during any given run A we can only hit any given failure X once.
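As a quick sketch of that ratio, l = Cf / r, here is a small helper. The name `failure_likelihood` and the sample counts are assumptions for illustration; the bounds check relies on the point above that a failure can be hit at most once per run.

```python
def failure_likelihood(failure_count, harness_runs):
    """Return l = Cf / r: the fraction of runs in which the failure appeared."""
    if harness_runs <= 0:
        raise ValueError("the harness must have been run at least once")
    if not 0 <= failure_count <= harness_runs:
        # A failure can occur at most once per run, so Cf cannot exceed r.
        raise ValueError("failure count must be between 0 and the run count")
    return failure_count / harness_runs


# Hypothetical counts: a failure seen in 3 of 120 runs of a harness.
likelihood = failure_likelihood(3, 120)
```

For these counts the failure shows up in 2.5% of runs.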

Thoughts?

So, those are my first thoughts on how to measure this data.  I have a lot more questions than I have answers:

  • I’ve been reading a lot, but it’s been a long time since I’ve done this kind of math — are these correct?
  • Are there other metrics that would make sense for this issue? I don’t want to do a bunch of metrics for the sake of numbers, but I definitely want to have a good way to compare the data I get from this effort between runs on different changesets in order to determine if we are making the TUnit test suite more or less reliable.
  • Ideas on how to measure the error in this process?

Thanks for the help!

Posted by: cmtalbert | March 2, 2009

Keeping the Code Fires Burning

In Austin, March means flowers, the beginning of the best season (the not-quite-so-hot-and-not-rainy-season, to be precise) of the year and SXSW music. I really miss SXSW; the way the town erupts into a giant party, the way that there is music in even MORE places than normal (it’s already everywhere in Austin–I’ve seen live bands in grocery stores!)…I wish I were heading home for it.

Alas. But since I am stuck on a wind whipped coast with breakers crashing together in all kinds of crazy directions and rain coming down sideways, I thought I’d share one of my secrets to keeping warm. Since much of the Mozilla world is hammering away on Firefox Beta 3 in cold climates, I figured this quick little dish would be welcome. I found it in The Islands in the Sun Cookbook: Culinary Treasures of the Italian Isles by Marlena Spieler, which is a great book for a bunch of reasons, not the least of them being this recipe.

Pasta Mista in Brodo alla Zenzaro

  • 3 cups (715 ml) chicken broth (I use Pacific Natural Organic Veggie broth)
  • 1 1/2 cups (350 ml) diced tomatoes (canned work great)
  • 1/2 cup (120 ml) tomato juice
  • 1/2 to 1 tsp (1-5ml) finely chopped fresh ginger (you have to have fresh ginger, it’s the only hard part, I swear)
  • 3 cloves garlic (chopped, minced, or pressed–as fine as you can make it, essentially)
  • Some green onions
  • Black pepper
  • Sea Salt
  • Dash of Tabasco sauce (I use several)
  • 8 ounces (255g) of ravioli or tortellini (I use one package of the frozen stuff.  I find that sharper flavors work well like portabello mushroom and cheese.  This is actually kind of important, green or super strong flavors like spinach and Asiago or Gorgonzola really don’t work well with this dish).
  1. Combine broth, tomatoes (dump ‘em in, juice and all), tomato juice, garlic, ginger, salt, and pepper into a big pot.  Bring it to a boil.  I usually let it come to a slow boil over medium heat, but if you’re hungry you can make it go faster.  Cook a minute or two then remove from heat (or just simmer it). Stir in your Tabasco sauce.
  2. Cook up your pasta in another pot, drain that.
  3. Put the pasta in a bowl, put the broth over the pasta, and drizzle green onions (and olive oil if you want, I never use it but it’s in the original recipe) over that.

Makes one awesome, warm, spicy dish.  I’ve been known to drink the tomato broth by itself–it’s just that good, and it warms you from the inside out.  I love this dish because you only have to chop the ginger (and the garlic if you opt to do that, I have a garlic press so it hardly counts as chopping).  You can prepare this in about 10 minutes and it will be ready to eat in about 10 more.  I like to let it cook longer to get the flavors to meld, but that still gets the job done in under 30.

When you’ve got a deadline to meet and a cold wind whipping at your door, this is exactly what the doctor ordered.  Now, back to coding.  Bon Appetit!

Posted by: cmtalbert | February 27, 2009

Matt Miller on the Tyranny of Dead Ideas

So…my local radio station kinda sucks.  They don’t post streaming (or even podcasted) content of some of their shows?  Makes me want to record it off the air and then offer it here.  So, I heard Matt Miller on this program tonight.  It’s an incredible show, and definitely worth staying up late to hear again (unless you’re lucky enough to live far enough east of us that 4am PST is a decent hour in the afternoon). He’s discussing his new book, The Tyranny of Dead Ideas, which sounds like a damn good read.

The best line?  While answering a question about how to overcome partisanship in the face of the giant crisis we are facing, he said: “In a democracy, you get the government you deserve.” Oh yeah.

It’s enough to shame even this old, jaded, activist carcass off the couch, shrug off the defeat from 2003 and get back in the game.  In other words, it’s up to us to build the safety net for the politicians to think outside the box, because they are physically incapable of doing it on their own as they spend far too much time navel gazing on re-election poll findings and how many fundraisers’ bottoms still need kissing.  That’s taking his statement a bit out of context to be fair, but it is still true to the point he was making.  If we want to see the world change for the better, it’s up to us to make that happen.  It’s time to start acting like we already are the change we want to see in the world.

Posted by: cmtalbert | January 16, 2009

Seven Things

I was hoping this would pass me by.  But I do need to blog more, and while I haven’t had a bunch of time, I’ve enjoyed the “seven things” posts I’ve read on Planet.  I was tagged by Simon and Jane.

The rules.

  1. Link back to your original tagger and list the rules in your post.
  2. Share seven facts about yourself.
  3. Tag some (none? :) ) people by leaving names and links to their blogs.
  4. Let them know they’ve been tagged

The Seven Things.

  1. I’ve been a writer all my life: my first word was “book”, I wrote my first story when I was five.  Now, I have four novels finished, hundreds of stories and a fifth novel in progress.  None of them are published because for many years I despised editing.  It was much more fun to write.  In 2007, I finally changed that perspective when a friend of mine published her first book.
  2. I grew up in the swamps on the Texas and Louisiana border.  When I was fourteen, a buddy of mine and I captured an alligator using our bicycles and a rope. We took it to his house.  He was convinced his dad would let him keep it as a pet.  Needless to say, we were both grounded.
  3. When I was very young, my best friend and I created an imaginary high fantasy world to play in.  We continued that story line and played in that world until we were about ten years old.  When we ended it, we held an imaginary hero’s funeral for the characters who had to die to save their world and so that we would be released from it.
  4. When a land developer attempted to bulldoze the small bit of forested land across the street from my childhood home, I led a loosely organized resistance of children to stop him. We’d never read the Monkey Wrench Gang; we had no idea what we were doing. We just knew we had to save Twin Hills (that was our name for it). Ten years later, I met that very land developer face to face, standing upon the ruins of our forest.
  5. I once lived outdoors for so long that when I first went back indoors (it was a restaurant, as I recall) it felt claustrophobic and unnatural.
  6. In college, I befriended an entomology professor and spent a month living in the Caribbean studying bugs.  In the mountains of Dominica at that time, the water was so clean, you could stop at a stream, fill up your bottle and just start drinking.  And for the record, the “bottled spring water” that we are drowning in these days tastes nothing like the real thing.
  7. My first computer was a Commodore 64 too, and my parents have a picture of me holding up a dot-matrix printout that is as tall as I am.  The printout was a text based adventure game that I’d programmed in BASIC (long live GOTO!).  I don’t have a scanner here so I can’t attach the pic.  Bummer.

Let’s see, who shall I spread this thing to next?

* Stephend – who needs to blog more.

* Abillings – who always has an interesting story to tell.

Posted by: cmtalbert | January 1, 2009

SF to Texas, Days 2 & 3 – Time Management

Well, the rest of the trip was pretty uneventful. I stopped at the Petrified Forest in Arizona and walked around there for an hour or two. Other than that, I just drove, drove, drove. It turned out to be 2,125.3 miles. Quite a haul. I’m looking forward to heading back and I promise to actually blog it well this time. But I think that this points to a bigger topic, and one very appropriate for a New Year’s Day.

Happy New Year!

And now what?  That’s my problem.  How will this year be any different from last?  I keep wrestling with that question as my head threatens to explode from the remnants of a very bad decision (mixing scotch and champagne is a no-no).  I don’t procrastinate, I manage time, I make time, I do all that stuff, and still things just don’t get done.  I think I let too many things get in my way between what my heart tells me to do and that ever-present sense of “What I Ought To Be Doing”.  I don’t know if other people have this problem.  None of my extremely successful friends seem to.  I’m making ‘09 a year to work on this.  So the first thing is a set of goals:

  • Finish the Twin Hills Novel
  • Publish some of the stories, finish edits on Legacy Pt 1
  • Do a triathlon
  • Complete my website
  • Bring the principles I use at work on the web to my writing life, quit making an artificial and useless dichotomy there

A very interesting thought popped in my head while reading the first few pages of Seth Godin’s Tribes. It comes from Gandhi’s oft quoted admonishment: “Be the change you wish to see in the world.” My thought was: “It’s not so much that you have to become the change you want to see in the world, it’s that you already ARE that change, you just have to start acting like it.”

So my question, dear readers, is what are you going to do with this “one wild and precious” year?

Posted by: cmtalbert | December 21, 2008

Day 1: SF to Flagstaff, 780 miles

I love deserts. Maybe it’s because I grew up in a swamp. Maybe it’s because of the rock, the hard-edged faces, the difficulty of the landscape. I think it’s because of the absolute silence in the desert. It is perfectly quiet, perfectly still. Not all the time, but most of the time. I am especially drawn to deserts that are somehow knocked out of their element. Last time I saw this was when I watched the remnants of Hurricane Ivan dump in two days the amount of water Big Bend would get in ten years. So you can imagine my delight when I found that the Mojave desert had been hit by this latest winter storm.

Mojave Covered in Snow

Anytime something like this happens in a desert, all sorts of strange things start happening.  Different plants will bloom, odd animals will suddenly appear.  So you can imagine my surprise when I stumbled across the extremely rare Andean Snow Beast!

Rare Andean Snow Beast

RAAAAAAAAAAR!

Posted by: cmtalbert | November 12, 2008

What would you spend a ton of money on?

I met a guy tonight that’s spent a bunch of money (more than I’ve ever spent on a new car) on his dream.  His dream? To combat apathy and empower people.  Pretty cool, eh?

I’ve always had this wild distrust of money, like it’s some unbroken horse that’s going to throw you the moment it gets a chance.  Maybe it’s from the way I grew up, I don’t know.  But, as I’ve spent the last several years in different businesses, I’ve come to see money as just another tool to get you where you want to go.  Nothing scary, just practical, like a pair of channel locks.  Not good for everything, but when you need it, nothing else will do.

My dream is not unlike his.  To spin a story that axes the frozen sea inside all of us (thank Kafka for that line).  But, I’ve never really brought all my talents and tools to bear on that problem.  I’ve done part of it here, part of it there.  I do some of it every single day.

This really came up because of another email I got this week from a domain name registering service.  You know, those people that you pay to get things like www.isntthiscool.com or whateveryouwant.org.  About ten years ago, I registered cmtalbert.com.  In ten years, I’ve never done a thing with it.  Ten years.  It was quite a blast from the past.  I remember registering for it and spending a few minutes trying to imagine what I’d be doing in 2008.

I didn’t think I’d be working for an incredible Open Source movement, living right above the Pacific Ocean, and still not have a published credit to my name.

But I think the time has come to really look at this entire enterprise the way I’d approach something at work.  Let’s spend money where it helps, let’s do what it takes, and climb after my dreams without keeping one hand on the rope at all times.

I would normally write this sort of thing in a journal, and all four of my regular readers (thank you! ;-) ) would  never see it.  But, this time, I thought it made sense to pledge myself to this publicly.

What would you spend a ton of money on if you had to?  What is it you want to do in your heart of hearts?
