Friday May 17, 2013

How to Score Customer Feedback/Bugs and Stories (for Agile)

I am sure some of you are doing Agile Scrum to manage your own software development. We do this as well for My Oracle Support Development. In the past I have talked about user research and touched on how we score issues we find or want to address. I thought in the spirit of the Agile world I would elaborate on this.

Here is the question...

How do I order my stories and bugs in a way that is repeatable and consistent with being Agile? How do you decide what stories to do first? In what order should I fix bugs vs. do new features and enhancements?

My answer is that you score them. Scoring is more powerful than just an order and allows for a natural sort order (supporting the concept of doing "the most important" stories first, from the customer's perspective). Scores can be compared between teams and, in SAFe, help put the stories that other teams need to complete for you into a fair and manageable order (or at least start the discussion). Whatever is "most important" is ranked first and should be done first. We do this rank ordering by scoring each item. Then the highest scores for features or bugs go first. This is more repeatable than just "moving things around" in your Agile tracking tool till it looks right.

See what you think...

How to Score Items

First let me tell you about the wrong way...

Look at your list and see which ones you (as a Product Owner) think should be done first, maybe the ones where you know the developers can do it quickly.

The right way: Use an "unbiased" method to put the items in order using a three-step rating system. This score could be generated by you or someone else. You do this by answering the following questions:

How Many Users Does it Impact (3 - All, 2 - Some, 1 - A few or a Limited User Role)
How Often Does it Occur (3 - All of the Time, 2 - Some of the Time, 1 - Infrequently)
How Bad is the Problem (4 - Severe, 3 - Critical, 2 - Important, 1 - Minor Importance)

Take the "score" from each question and multiply them together (3x2x2=12). Now order your stories by the score. As new stories come in, score them. They will naturally fall into the right order.
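As a minimal sketch of the arithmetic above (in Python; the `Item` class and the `prioritize` function are illustrative names of my own, not part of any Agile tool), each item carries its three ratings, the score is their product, and the backlog is sorted highest score first. The sample items are taken from the scoring examples later in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    users: int      # How many users: 3 = all, 2 = some, 1 = a few
    frequency: int  # How often: 3 = all of the time, 2 = some, 1 = infrequently
    severity: int   # How bad: 4 = severe, 3 = critical, 2 = important, 1 = minor

    def __post_init__(self):
        # Reject ratings that fall outside the three scales.
        if self.users not in (1, 2, 3):
            raise ValueError("users must be 1, 2 or 3")
        if self.frequency not in (1, 2, 3):
            raise ValueError("frequency must be 1, 2 or 3")
        if self.severity not in (1, 2, 3, 4):
            raise ValueError("severity must be 1, 2, 3 or 4")

    @property
    def score(self) -> int:
        # The three ratings are multiplied, not added.
        return self.users * self.frequency * self.severity


def prioritize(items):
    """Return items ordered by score, highest first (ties keep input order)."""
    return sorted(items, key=lambda item: item.score, reverse=True)


backlog = [
    Item("Download Trend for Patch Downloads has no labels", 2, 2, 2),
    Item('The Browser Back button does not take you "Back"', 3, 3, 4),
    Item("Vertical Scroll bar is missing from Patch Recommendations", 2, 3, 3),
]

for item in prioritize(backlog):
    print(item.score, item.title)
```

New stories can simply be appended to the list and the sort re-run; nothing has to be dragged into place by hand.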

How to Score Consistently

By learning some simple rules, two independent people with a common understanding of the scores should be able to score the same item the same way. But just like playing "Poker" to come up with development time estimates for stories, there can be differences. Let someone else score the same story and if they come up with a different score, discuss why. Nine times out of ten you can easily resolve the difference and come to a common agreement. If you don't, use the higher score of the rating. Why? Because if there is confusion about the scope of the problem, you are likely underestimating it anyway. Scope rarely shrinks over time, so go conservative.
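Here is one way the "go conservative" rule could look in code. This is only a sketch under my own assumptions: the two raters' answers are plain dictionaries, the `reconcile` function is a made-up name, and I am reading "use the higher score" as taking the higher rating for each question where the raters differ.

```python
FACTORS = ("users", "frequency", "severity")


def reconcile(first, second):
    """Merge two raters' answers.

    Returns the merged ratings (the higher value wins for each factor)
    and the list of factors where the raters disagreed, so those can be
    discussed before the fallback is accepted.
    """
    merged = {}
    disputed = []
    for factor in FACTORS:
        a, b = first[factor], second[factor]
        if a != b:
            disputed.append(factor)
        merged[factor] = max(a, b)
    return merged, disputed


rater_a = {"users": 3, "frequency": 2, "severity": 2}
rater_b = {"users": 3, "frequency": 2, "severity": 3}

merged, disputed = reconcile(rater_a, rater_b)
print("Discuss first:", disputed)
print("Fallback ratings:", merged)
```

The list of disputed factors is the useful part: it tells you what to talk about, and the higher rating is only the fallback when the talk does not settle it.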

How Many Users Does it Impact

How Many Users Does it Impact (3 - All, 2 - Some, 1 - A few or a Limited User Role)

This score has to be based on the total number of customers for the product. It canNOT be based on the number of people who use the specific feature in question. For example, "everyone" signs in, so there is no question that something to do with Sign In rates a "3". But only a very limited number of people customize their home page, so something to do with that would be a "1". You can't change the scope and say, "well WITHIN the people who customize, ALL of them will use this feature". It doesn't work that way. You are trying to create a score that can be measured against other stories, so that one can easily see where applying resources would do the most good.

Examples of 3's for My Oracle Support (the product I spend most of my time thinking about)

  • Sign In
  • Issues on the Landing Page
  • Searching Knowledge (because "everyone" does this)
  • Viewing Trouble Tickets (we call them Service Requests or "SRs")
  • My Settings

Examples of 2's

  • Advanced Filters in Tables
  • Editing SRs
  • Creating SRs
  • Creating On Demand RFCs (Request for Change to our on-demand service)
  • SR Profiles ("templates" used to file SRs)

Examples of 1's

  • Customizing a Region
  • Approve User (to access content)
  • Help Link in a Feature used by a small audience

So you can see, it is sort of like this: the top 20% of use are 3's, the middle 60% are 2's and the bottom 20% are 1's.

How Often Does it Occur

How Often Does it Occur (3 - All of the Time, 2 - Some of the Time, 1 - Infrequently)

If the problem exists every time you come here, it is easy to make this a "3". If it only happens in a specific mode or state (say when someone does a complex filter on a table THEN your region exhibits this problem), then it is a "2", and those annoying errors that pop up rarely would be a "1". Of course, you have to judge whether those "errors" happen some of the time or infrequently, because as you might expect, when you multiply the values together, multiplying by "1" doesn't do anything. ;-> So we consider "intermittent" errors that are difficult to reproduce, but that you have personally seen more than once, to be a "2". Again you have some flexibility, but all of the time means all of the time; just use your judgement between 2's and 1's. I would say if it is less than 10% of the time, then it is a 1.

Examples of 3's

  • Every time you open a dialog box it is empty
  • A typo would be "all of the time".
  • A scroll bar always appears even when not needed or wanted

Examples of 2's

  • If your saved filter's name is too large then it truncates
  • You get a time-out error after using the product for 10 minutes and clearly you have not timed-out (the time-out is say 4 hours)
  • A dialog box appears off screen some of the time

Examples of 1's

  • An error appears rarely and you have no idea why. It only happened once that session and everything seems to be working
  • When you save an SR profile, on rare occasions it will error out
  • Every once in a while, I go "Back" in the setup wizard or flow and it forgets that I completed a step and shows the wrong state for the step
    -- Remember this is just frequency, don't get freaked out because some of these appear to be bad issues, we should catch that next...

How Bad is the Problem

How Bad is the Problem (4 - Severe, 3 - Critical, 2 - Important, 1 - Minor Importance)

This is probably the easiest one for anyone to score. It is basically the inverse of Bug Severity. Likely you have something well understood in your organization. In our organization a true "Severity 1" issue doesn't come up that frequently. Severity 1 means "service down": totally unavailable and no workaround. We don't tend to see Sev 1's very often in development, because the code is not in production and the system is not down. But from a usability perspective, if I can't complete the task, that is a usability "Sev 1" and thus is worth a score of 4. Likely these are just bugs. We have a long list of definitions of Prioritization of Bugs. I have this printed out by my desk in case I forget. We have a category one down from a "Sev 1" called a Sev 2 Showstopper. That too would get scored a 4.

You might already have a well-tuned definition of what is in each severity in your organization; if so, I would use that. But here are a few examples I would share.

4- Severe ("Sev 1" or P2 Showstoppers in my world)

  • ADA: Major Accessibility issues, including missing labels, non-standard abbreviations, using only color to distinguish UI elements, etc. (These would be flagged as P1s by ADALint)
  • NLS: Missing msg files causing pages not to render; garbled error msgs, can't translate the string
  • Help doesn't come up
  • Performance Issues (beyond our stated level of service)

3- Critical (typically a P2 in a bug system)

  • UI: A significant percentage of users would need assistance to complete the task
  • UI: Context lost during workflow
  • UI: Typos
  • Scalability issues (such as using a shuttle when it's likely that there will be thousands of elements)
  • User isn't prevented from making a serious mistake.

2- Important (P3's)

  • UI: A large number of users would need assistance to complete the task
  • Hard to understand concepts
  • UI: Layout is confusing.
  • UI: Incorrect page header
  • UI: Incorrect breadcrumbs (may be a P2 if severe because the user may lose context)
  • UI: Incorrect time format
  • UI: Grammatical errors

1- Minor Importance (P4s)

  • UI: Minor inconsistencies (like button order, using Delete instead of Remove, using OK instead of Continue, missing units after numbers even if the value is obvious)
  • UI: Incorrect usage of blank table cells versus N/A or unavailable.

Examples of Scoring

Sample (Real) Issue                                               # Users  Bad  Often  Total  Dev
1. The Browser Back button does not take you "Back"               3        4    3      36     H
2. Text: "Run" should be called "Search" in toolbar               3        2    3      18     VL
3. Can't submit a search by pressing return in a field            3        2    3      18     L
4. Vertical Scroll bar is missing from Patch Recommendations      2        3    3      18     M
5. "Enterprise Patch Recommendations" is confusing term           2        2    3      12     VL
6. Deploy column (in Patch Plans) is not sorting correctly        2        2    3      12     M
7. No way to select all language packs you need for an EBS patch  2        2    3      12     M
8. Download Text and number should align correctly                3        1    3      9      VL
9. Download Trend for Patch Downloads has no labels               2        2    2      8      L
10. Task Region cannot be dragged onto the screen when empty      1        2    3      6      M

Some discussion of this can be found in the Blog post.

How to use this in your Agile tool to Show the Rankings of Scores

Most tools can expose additional fields or columns. Typically you might have a Development Priority drop menu (1 to 4 is typical). This is similar.

I like to expose all 4 fields (the three scores and then the final score to sort by). This allows for discussion to validate the assumptions made. Like I said, you might have slight disagreements, and this brings it into the open to clarify. Expect to be able to use this to drive the sort order for your stories and bugs, so that the sprint teams, release management and your customers can see how and why this order exists. Transparency is best here.

Why do this and not "more"?

Of course, you can go "all the way" and create a score so complex that no one would really understand the difference between something with a score of "2032" and "2840". I know a system that has 17 factors, adds and subtracts based on who is escalating the issue (a VP escalating the issue is worth more than if I do it), how old the issue is, when it was filed, and its severity, among the many factors. I just find (and I suspect some simple research would confirm) that mortals like me would have no chance of getting a good feel for the output of a score based on 17 factors. So I will approach this as a "Keep it Simple" method.

And it is true that this simple score will not differentiate enough when you have 20 stories all with a score of 12. You might want to include more rules for ranking these tied items: for example, a 12 that affects more customers (a 3 on that scale) ranks higher than a 12 coming from a worse bug that affects fewer customers (a 2). You decide. But the more complexity you add, the more difficult it is to understand and judge differences, and you have to decide if it is worth the additional complexity, confusion and overhead. At a minimum, I am advocating that you do NOT just rank stories, nor just score them arbitrarily. Use a method that is repeatable and can be applied consistently from team to team, so that when you aggregate your rankings or your backlog you are comparing apples to apples.
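If you do want that one extra rule, it can stay small. The sketch below (Python again; the tuple layout and the `sort_key` name are my own choices for illustration) sorts by score first and, when scores tie, puts the item that touches more users ahead.

```python
def sort_key(item):
    """Order by score (highest first), then by users rating (highest first)."""
    title, users, frequency, severity = item
    score = users * frequency * severity
    # Negate both values so that Python's ascending sort puts the
    # biggest score, and then the biggest users rating, at the top.
    return (-score, -users)


items = [
    # (title, users, frequency, severity)
    ("Worse bug, fewer customers", 2, 2, 3),   # 2 x 2 x 3 = 12
    ("Milder bug, more customers", 3, 2, 2),   # 3 x 2 x 2 = 12
    ("Clearly the top item", 3, 3, 4),         # 3 x 3 x 4 = 36
]

ranked = sorted(items, key=sort_key)
for title, users, frequency, severity in ranked:
    print(users * frequency * severity, title)
```

That is one tie-breaker, not seventeen factors, so the resulting order is still easy to explain to anyone who asks.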

Why have one Backlog for a Large Product?

Clearly we have more than one backlog. The scrum team for the sprint has a backlog, and so does the project, maybe the program and even the release. But effectively these are virtual, and tagging an item for a sprint doesn't mean you can't look at it in the context of all other backlog items. I am suggesting that a single large backlog is sometimes useful to get a few key metrics out of, if done right. In Agile at scale you are looking at different roll-ups of stories. At the Program level you are looking at features that decompose into stories. If the feature has a score of 36, it won't mean that all stories in that feature will have the same value. Clearly as you break it down, some of the stories are more important than others.

The backlog at the Program or Product level might span say 30 or more teams, like it does for my organization. If a few of those teams represent 50% of the backlog (thinking now in terms of story points or work effort, not in terms of number of items), then maybe you should reconsider how you have allocated your teams. Looking at this Backlog across your teams and knowing the value completing those stories would bring to the customer helps you do this allocation. Cool!

Looking at it another way...

If you have one backlog and use a scoring method, you can see how many "customer points" each team is delivering compared to other teams. That is, a team with 5 stories, each with a score of 12, is delivering (5x12) 60 customer points in that sprint. Another team might be doing just three stories with scores of 36, 24, and 12, and thus delivering a comparable amount of value (72 points) to the customer. But a team tackling 15 stories whose summed value is only 30 is delivering half the value of the first team. Maybe a reallocation of teams to projects that have more customer points out there is warranted. This is not a challenge for a single sprint team working one list top down, but when doing Agile at Scale, one has to consider when to reallocate sprint resources to be the most effective.
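To make the roll-up concrete, here is a small Python sketch that totals customer points per team for one sprint. The team names, the `customer_points` function and the assumption that the third team's 15 stories score 2 each are all invented for illustration; only the story counts and scores of the first two teams come from the example above.

```python
from collections import defaultdict


def customer_points(stories):
    """Sum story scores per team; stories is a list of (team, score) pairs."""
    totals = defaultdict(int)
    for team, score in stories:
        totals[team] += score
    return dict(totals)


sprint = (
    [("Team A", 12)] * 5                                   # five stories scored 12
    + [("Team B", 36), ("Team B", 24), ("Team B", 12)]     # three bigger stories
    + [("Team C", 2)] * 15                                 # many low-value stories
)

totals = customer_points(sprint)
for team, points in sorted(totals.items(), key=lambda pair: pair[1], reverse=True):
    print(team, points)
```

The number of stories a team closes says little here; the summed score is what shows where the customer value is going.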

This is the same argument we had about moving away from counting bugs. If you lived in most development organizations it was bugs that were counted and reported. A team with 50 bugs should not be compared to a team with 3 bugs on how quickly they fixed those bugs or if it was worth the time and energy. The team with 3 bugs might have been doing something far more valuable and even more costly (development cost) than the team who has 50 "easy" bugs to fix. For design, we look at customer points. For development, in Agile, we can look at Story Points, or level of effort in days. Then we can look at this single virtual backlog based on scores and learn something valuable.

I might add, a backlog which is just ordered is not as useful. We learned this in basic statistics. Ordinal numbers (1 comes before 2) are not as powerful and cannot be given a more thoughtful interpretation unless they are on an "interval" scale (where the difference between any two adjacent numbers is an equal interval). A score of 10 is effectively twice as important as a score of 5. But something just ranked 5th and something ranked 10th does not communicate this. Tenth is at best five places behind fifth. That is, items 6, 7, 8 and 9 might have huge differences in value or might not. An interval scale explains this. A "score" is an interval scale. A ranking is not. This is why you now see College Football rankings also showing the points used to get that ranking. So now you know how far "behind" your team really is from the next spot up.

How does this fit into Reinertsen's Weighted Shortest Job First (WSJF)?

Primer (for those who have read this far and don't even know what I am talking about).

Please see this short discussion, or read his amazing book (I am still working through it, but if you like numbers and methods, this is so the book for you...) - Reinertsen, Don. Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing, 2009.


I have not fleshed out an answer for this. But I am leaning towards my scoring as a proxy for his numerator (relating User|Business Value, Time Criticality and Risk Reduction|Opportunity Enablement Value). With his use of Fibonacci-type numbers and adding, there is more breadth in his scale. Likewise, dividing by job size creates more spread. But my focus is on the requirements side, hence the numerator. I let development decide the denominator.
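Purely as a thought experiment, here is what using my score as the numerator might look like in Python. WSJF divides a value measure by job size, so the sketch divides the customer score by a size number. Two things here are assumptions of mine and not settled practice: I am reading the "Dev" column in the examples table as very low / low / medium / high development effort, and the `JOB_SIZE` mapping of those labels to Fibonacci-like numbers is hypothetical. Your development team would supply the real denominator.

```python
# Hypothetical mapping from the Dev effort label to a relative job size.
JOB_SIZE = {"VL": 1, "L": 2, "M": 3, "H": 5}


def wsjf(score, dev):
    """Customer score divided by (assumed) job size; higher goes first."""
    return score / JOB_SIZE[dev]


issues = [
    # (title, customer score, Dev label) taken from the examples table
    ('The Browser Back button does not take you "Back"', 36, "H"),
    ('Text: "Run" should be called "Search" in toolbar', 18, "VL"),
    ("Can't submit a search by pressing return in a field", 18, "L"),
]

ranked = sorted(issues, key=lambda issue: wsjf(issue[1], issue[2]), reverse=True)
for title, score, dev in ranked:
    print(round(wsjf(score, dev), 1), title)
```

With these made-up sizes the quick "Run" to "Search" rename jumps ahead of the Back button fix, which is exactly the kind of trade-off the denominator is meant to surface.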

I am considering (and you, the reader, might want to weigh in) whether I should weight my values more. Right now a Sev 1 is worth 4 points, only 33% more than a Sev 2 at 3 points, which in turn is only 50% more than a Sev 3 at 2 points.

How I "Invented" This

I did not. I took this from a ranking method provided to me by Philip Haine, now at Success Factors, many years ago. And I also know that he got it from another well known UX expert. So it has been around the block a few times. I am just saying that this model can also be applied to Agile Scrum to help us all keep our Backlogs in priority order without resorting to magic.




