
My Oracle Support Blog

How to Score Customer Feedback/Bugs and Stories (for Agile)

I am sure some of you are doing Agile Scrum to manage your own software development. We do this as well for My Oracle Support Development. In the past I have talked about user research and touched on how we score issues we find or want to address. I thought in the spirit of the Agile world I would elaborate on this.

Here is the question...

How do I order my stories and bugs in a way that is repeatable and consistent with being Agile? How do you decide which stories to do first? In what order should I fix bugs versus doing new features and enhancements?

My answer is that you score them. A score is more powerful than a simple ordering because it gives you a natural sort order (supporting the concept of doing "the most important" stories first, from the customer's perspective). Scores can also be compared between teams, and in SAFe they help put the stories other teams need to complete for you into a fair and manageable order (or at least start that discussion). Whatever is "most important" is ranked first and should be done first. We get this rank order by scoring each item; then the highest-scoring features or bugs go first. This is more repeatable than just "moving things around" in your Agile tracking tool until it looks right.

See what you think...

How to Score Items

First let me tell you about the wrong way...

Look at your list and see which ones you (as a Product Owner) think should be done first, maybe the ones where you know the developers can do them quickly.

The right way: Use an "unbiased" method to put the items in order using a three-step rating system. The score could be generated by you or someone else. You do this by answering the following questions:

How Many Users Does it Impact (3 - All, 2 - Some, 1 - A few or a Limited User Role)
How Often Does it Occur (3 - All of the Time, 2 - Some of the Time, 1 - Infrequently)
How Bad is the Problem (4 - Severe, 3 - Critical, 2 - Important, 1 - Minor Importance)

Take the "score" from each item and multiply the numbers together (3x2x2=12). Now order your stories by the score. As new stories come in, score them. They will naturally fall into the right order.
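
If it helps to see that arithmetic as code, here is a minimal sketch (the function name, field order and sample stories are my own illustration, not from any particular tool):

    # Minimal sketch: multiply the three ratings and sort the backlog by the
    # resulting score, highest first.

    def score(users, often, bad):
        """users: 1-3, often: 1-3, bad: 1-4 -- the combined customer score."""
        return users * often * bad

    # Hypothetical backlog items: (title, users, often, bad)
    backlog = [
        ("Browser Back button does not go back", 3, 3, 4),
        ("'Run' should be labeled 'Search'",     3, 3, 2),
        ("Dialog box opens empty every time",    2, 3, 3),
    ]

    for title, users, often, bad in sorted(
            backlog, key=lambda item: score(*item[1:]), reverse=True):
        print(f"{score(users, often, bad):>3}  {title}")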

How to Score Consistently

By learning some simple rules, two independent people with a common understanding of the scores should be able to score the same item the same way. But just like playing "Poker" to come up with development time estimates for stories, there can be differences. Let someone else score the same story, and if they come up with a different score, discuss why. Nine times out of ten you can easily resolve the difference and come to a common agreement. If you don't, use the higher of the two scores. Why? Because if there is confusion about the scope of the problem, you are likely underestimating it anyway. Scope rarely shrinks over time, so go conservative.

How Many Users Does it Impact

How Many Users Does it Impact (3 - All, 2 - Some, 1 - A few or a Limited User Role)

This score has to be based on the total number of customers for the product; it cannot be the number of people who use the specific feature in question. For example, "everyone" signs in, so there is no question that something to do with Sign In rates a "3". But only a very limited number of people customize their home page, so something to do with that would be a "1". You can't change the scope and say, "well, WITHIN the people who customize, ALL of them will use this feature." It doesn't work that way. You are trying to create a score that can be measured against other stories, so anyone can easily see where applying resources would do the most good.

Examples of 3's for My Oracle Support (the product I spend most of my time thinking about)

  • Sign In
  • Issues on the Landing Page
  • Searching Knowledge (because "everyone" does this)
  • Viewing Trouble Tickets (we call them Service Requests or "SRs")
  • My Settings

Examples of 2's

  • Advanced Filters in Tables
  • Editing SRs
  • Creating SRs
  • Creating On Demand RFCs (Request for Change to our on-demand service)
  • SR Profiles ("templates" used to file SRs)

Examples of 1's

  • Customizing a Region
  • Approve User (to access content)
  • Help Link in a Feature used by a small audience

So you can see, it works out roughly so that the most heavily used features (the top 20% or so) are 3's, the broad middle are 2's, and the least used are 1's.

How Often Does it Occur

How Often Does it Occur (3 - All of the Time, 2 - Some of the Time, 1 - Infrequently)

If every time you come here the problem exists, it is easy to make this a "3". If it only happens in a specific mode or state (say, only when someone applies a complex filter on a table does your region exhibit the problem), then it is a "2", and those annoying errors that pop up rarely would be a "1". Of course, you have to judge whether those "errors" happen some of the time or infrequently, because, as you might expect when you multiply the values together, multiplying by "1" doesn't do anything. ;-> So we do consider "intermittent" errors that are difficult to reproduce, but that you have personally seen more than once, to be a "2". Again you have some flexibility, but all of the time means all of the time; just use your judgement between 2's and 1's. I would say if it happens less than 10% of the time, then it is a 1.

Examples of 3's

  • Every time you open a dialog box it is empty
  • A typo would be "all of the time".
  • A scroll bar always appears even when not needed or wanted

Examples of 2's

  • If your saved filter's name is too long, it truncates
  • You get a time-out error after using the product for 10 minutes when clearly you have not timed out (the time-out is, say, 4 hours)
  • A dialog box appears off screen some of the time

Examples of 1's

  • An error appears rarely and you have no idea why. It only happened once that session and everything seems to be working
  • When you save an SR profile, on rare occasions it will error out
  • Every once in a while, I go "Back" in the setup wizard or flow and it forgets that I completed a step and shows the wrong state for the step
    -- Remember, this is just frequency; don't get freaked out because some of these appear to be bad issues. We should catch that next...

How Bad is the Problem

How Bad is the Problem (4 - Severe, 3 - Critical, 2 - Important, 1 - Minor Importance)

This is probably the easiest one for anyone to score. It is basically the inverse of Bug Severity, and you likely have something similar that is well understood in your organization. In our organization a true "Severity 1" issue doesn't come up that frequently. Severity 1 means "service down" - totally unavailable and no workaround. We don't tend to see Sev 1's very often in development, because the code is not yet in production and the system is not down. But from a usability perspective, if I can't complete the task, that is a usability "Sev 1" and thus worth a score of 4. Likely these are just bugs. We have a long list of definitions for the Prioritization of Bugs; I have it printed out by my desk in case I forget. We have a category one down from a "Sev 1" called a Sev 2 Showstopper. That too would get scored a 4.

You might already have a well-tuned definition of each severity level in your organization; if so, I would use that. But here are a few examples I would share.

4- Severe ("Sev 1" or P2 Showstoppers in my world)


  • ADA: Major Accessibility issues, including missing labels,
    non-standard abbreviations, using only color to distinguish UI
    elements, etc. (These would be flagged as P1s by ADALint)
  • NLS: Missing msg files causing pages not to render; garbled error msgs, can't translate the string
  • Help doesn't come up
  • Performance Issues (beyond our stated level of service)

3- Critical (typically a P2 in a bug system)

  • UI: A significant percentage of users would need assistance to complete the task
  • UI: Context lost during workflow
  • UI: Typos
  • Scalability issues (such as using a shuttle when it's likely that there will be thousands of elements)
  • User isn't prevented from making a serious mistake.

2- Important (P3's)

  • UI: A large number of users would need assistance to complete the task
  • Hard to understand concepts
  • UI: Layout is confusing.
  • UI: Incorrect page header
  • UI: Incorrect breadcrumbs (may be a P2 if severe because the user may lose context)
  • UI: Incorrect time format
  • UI: Grammatical errors

1- Minor Importance (P4s)


  • UI: Minor inconsistencies (like button order, using Delete instead of
    Remove, using OK instead of Continue, missing units after numbers even if the value is obvious)
  • UI: Incorrect usage of blank table cells versus N/A or unavailable.

Examples of Scoring

Sample (Real) Issue                                                Users  Bad  Often  Total  Dev
 1. The Browser Back button does not take you "Back"                  3     4     3     36    H
 2. Text: "Run" should be called "Search" in toolbar                  3     2     3     18    VL
 3. Can't submit a search by pressing return in a field               3     2     3     18    L
 4. Vertical scroll bar is missing from Patch Recommendations         2     3     3     18    M
 5. "Enterprise Patch Recommendations" is a confusing term            2     2     3     12    VL
 6. Deploy column (in Patch Plans) is not sorting correctly           2     2     3     12    M
 7. No way to select all language packs you need for an EBS patch     2     2     3     12    M
 8. Download text and number should align correctly                   3     1     3      9    VL
 9. Download Trend for Patch Downloads has no labels                  2     2     2      8    L
10. Task Region cannot be dragged onto the screen when empty          1     2     3      6    M

(Dev is the development effort estimate; here VL/L/M/H read as very low/low/medium/high.)

Some discussion of this can be found in the Blog post.


How to use this in your Agile tool to Show the Rankings of Scores

Most tools can expose additional fields or columns. Typically you might have a Development Priority drop-down menu (1 to 4 is typical). This is similar.

I like to expose all 4 fields (the three scores and then the final score to sort by). This allows for discussion to validate the assumptions made. Like I said, you might have slight disagreements, and this brings it into the open to clarify. Expect to be able to use this to drive the sort order for your stories and bugs, so that the sprint teams, release management and your customers can see how and why this order exists. Transparency is best here.
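
As a rough sketch of what carrying those four fields around might look like (the class and field names are my own, not any particular tool's):

    # Sketch: keep the three ratings and the derived total as explicit fields,
    # so the whole team can see why an item sorts where it does.
    from dataclasses import dataclass, field

    @dataclass
    class Story:
        title: str
        users: int   # 1 - 3
        often: int   # 1 - 3
        bad: int     # 1 - 4
        total: int = field(init=False)

        def __post_init__(self):
            self.total = self.users * self.often * self.bad

    stories = [
        Story("Scroll bar missing from Patch Recommendations", users=2, often=3, bad=3),
        Story("Browser Back button does not go back", users=3, often=3, bad=4),
    ]
    for s in sorted(stories, key=lambda s: s.total, reverse=True):
        print(s.total, s.title)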

Why do this and not "more"?

Of course, you can go "all the way" and create a score so complex that no one would really understand the difference between something with a score of "2032" and "2840". I know a system that has 17 factors; it adds and subtracts based on who is escalating the issue (a VP escalating the issue is worth more than if I do it), how old the issue is, when it was filed, and its severity, among many other factors. I just find (and I suspect some simple research into this would confirm) that mortals like me have no chance of getting a good feel for working with a score based on 17 factors. So I will approach this as a "Keep It Simple" method.

And it is true that this simple score will not differentiate enough when you have 20 stories all with a score of 12. You might want to include additional rules for ranking the scored items; for example, a 12 that impacts more customers (a 3 on that scale) ranks higher than a 12 coming from a worse bug that impacts fewer customers (a 2). You decide. But the more complexity you add, the more difficult it is to understand and judge differences, and you have to decide whether it is worth the additional complexity, confusion and overhead. At a minimum, I am advocating that you not rank stories arbitrarily; use a method that is repeatable and can even be applied consistently from team to team, so that when you aggregate your rankings or your backlog you are comparing apples to apples.
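
As a small sketch of that kind of tie-breaking rule (field names match the earlier sketches; the rule itself is just one possible choice):

    # Sketch: ties on the total score are broken by how many users are impacted,
    # then by how bad the problem is.
    items = [
        {"title": "A", "users": 3, "often": 2, "bad": 2, "total": 12},
        {"title": "B", "users": 2, "often": 2, "bad": 3, "total": 12},
    ]
    ranked = sorted(items, key=lambda i: (i["total"], i["users"], i["bad"]), reverse=True)
    print([i["title"] for i in ranked])   # ['A', 'B'] -- same total, but A impacts more users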

Why have one Backlog for a Large Product?

Clearly we have more than one backlog. The scrum team has a backlog for the sprint, and so does the project, maybe the program, and even the release. But effectively these are virtual, and clearly tagging an item for a sprint doesn't mean you can't look at it in the context of all the other backlog items. I am suggesting that a single large backlog is sometimes useful for getting a few key metrics out of, if done right. In Agile at scale you are looking at different roll-ups of stories. At the Program level you are looking at features that decompose into stories. If a feature has a score of 36, it doesn't mean that all of the stories in that feature will have the same value; clearly, as you break it down, some of the stories are more important than others.

The backlog at the Program or Product level might span, say, 30 or more teams, as it does for my organization. If a few of those teams represent 50% of the backlog (thinking now in terms of story points or work effort, not in terms of number of items), then maybe you should reconsider how you have allocated your teams. Looking at this backlog across your teams, and knowing the value that completing those stories would bring to the customer, helps you do this allocation. Cool!

Looking at it another way...

If you have one backlog and use a scoring method, you can see how many "customer points" are being delivered by each team, compared to other teams. That is, a team with 5 stories, each with a score of 12, is delivering (5x12) 60 customer points in that sprint, while another team might be doing just three stories with scores of 36, 24, and 12, and thus delivering about the same value (72 points) to the customer. But even a team tackling 15 stories, if their summed value is 30, is delivering half the value. Maybe a reallocation of teams to projects that have more customer points out there is warranted. This is not a challenge for a single sprint team working one list top down, but when doing Agile at Scale, one has to consider when to reallocate sprint resources to be the most effective.
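
The arithmetic from that example, as a tiny sketch (team names and story scores invented for illustration):

    # Sketch: sum the "customer points" (story scores) each team delivers in a sprint.
    sprint = {
        "Team A": [12, 12, 12, 12, 12],   # five stories scored 12  -> 60 points
        "Team B": [36, 24, 12],           # three bigger stories    -> 72 points
        "Team C": [2] * 15,               # fifteen small stories   -> 30 points
    }
    for team, scores in sprint.items():
        print(team, sum(scores))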

This is the same argument we had about moving away from counting bugs. In most development organizations, it was bugs that were counted and reported. But a team with 50 bugs should not be compared to a team with 3 bugs on how quickly they fixed those bugs, or on whether it was worth the time and energy. The team with 3 bugs might have been doing something far more valuable, and even more costly (in development cost), than the team with 50 "easy" bugs to fix. For design, we look at customer points. For development, in Agile, we can look at Story Points, or level of effort in days. Then we can look at this single virtual backlog based on scores and learn something valuable.

I might add, a backlog which is just ordered is not as useful. We learned this in basic statistics: ordinal numbers (1 comes before 2) are not as powerful and cannot be given a more thoughtful interpretation unless they are on an "interval" scale (where the difference between two numbers is of an equal interval). A score of 10 is effectively twice as important as a score of 5, but something ranked 5th versus 10th does not communicate this; the item in 10th place is, at best, five spots behind the one in 5th. That is, items 6, 7, 8 and 9 might have huge differences in value, or might not. An interval scale explains this. A "score" is an interval scale; a ranking is not. This is why you now see college football rankings also showing the points used to get that ranking, so you know how far "behind" your team really is from the next spot up.

How does this fit into Reinertsen's Weighted Shortest Job First (WSJF)?

Primer (for those who have read this far and don't even know what I am talking about).

Please see this short discussion, or read his amazing book (I am still working through it, but if you like numbers and methods, this is the book for you) - Reinertsen, Don. Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing, 2009.

Answer

I have not fleshed out an answer for this. But I am leaning towards using my scoring as a proxy for his numerator (relating User/Business Value, Time Criticality and Risk Reduction/Opportunity Enablement Value). With his use of a Fibonacci-type scale and addition, there is more breadth in his numbers; likewise, once divided by job size, there is more spread. But my focus is on the requirements side, hence the numerator. I let development decide the denominator.
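
If you want to play with that idea, here is a rough sketch (it treats my score as a stand-in for the WSJF numerator and takes a development-supplied relative job size as the denominator; it is not Reinertsen's exact formulation, and all the names and numbers are mine):

    # Sketch: WSJF-style ordering, using the customer score as a proxy for
    # cost of delay and a relative job size from development as the divisor.
    stories = [
        ("Back button broken",         36, 8),   # (title, score, job size)
        ("'Run' should say 'Search'",  18, 1),
        ("Scroll bar missing",         18, 3),
    ]
    for title, score, job_size in sorted(stories, key=lambda s: s[1] / s[2], reverse=True):
        print(f"{score / job_size:5.1f}  {title}")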

I am considering (and you, the reader, might want to weigh in on) whether I should weight my values more heavily. Right now a Severe is worth 4 points, only 33% more than a Critical at 3 points, and a Critical is only 50% more than an Important at 2 points.


How I "Invented" This

I did not. I took this from a ranking method provided to me by Philip Haine, now at Success Factors, many years ago. And I also know that he got it from another well known UX expert. So it has been around the block a few times. I am just saying that this model can also be applied to Agile Scrum to help us all keep our Backlogs in priority order without resorting to magic.


Enjoy!

