500

I've wanted to use that title since "300" came out last year!

There is a method to my madness, however. As many of you know, we rolled out an upgrade to forums.oracle.com last weekend (to Jive Forums 5.5). Since then, the site has been unstable but steadily improving: on Monday, uptime was 7%; on Tuesday, 70%; and on Wednesday, 80%. We've thrown more hardware at the problem, upgraded various network components, and have a team of Java and clustering specialists working the problems around the clock.

Uptime is still not where it should be, of course; forums.oracle.com is business-critical for a lot of folks (as well as for Oracle), and I'm glad they consider it so. If nobody cared, we'd be wasting our time.

We made a conscious decision early in this process to stick with the upgrade, to fight through the problems instead of running from them. Some people would argue with that decision (many of you have), but so far, I'm glad we made it - even if there are grumblings about the new rewards system. (In fact, I've seen more complaints about that than about downtime!) Regardless, I do want to apologize for the downtime you've suffered through thus far.

Is it possible that we'll have to change course? Sure it is. Stability is our top priority - much more so than features. If we have to trade the latter for the former, we will.

But we're not there yet. After all, "This is Sparta!"

Comments:

You see - this is what really annoys me. Clever people say Oracle doesn't get 'Web 2.0' but here we have a lengthy outage which just proves that Oracle lives on the bleeding edge, is an early adopter and can rival Twitter, Jaiku, Disqus et al. We just need an animal for the FAIL notice. I suggest a poll and a badger.

Posted by Andy C on August 28, 2008 at 05:56 PM PDT #

Thank you for all your effort, and thanks to the team also. I hope Xerxes' army will be killed soon. Good luck ;-)

Posted by Nicolas Gasparotto on August 28, 2008 at 08:00 PM PDT #

"This is where we fight! This is where they die!"

Posted by Bob Rhubart on August 28, 2008 at 09:24 PM PDT #

Justin, I really hope you guys are able to do a public post-mortem on what types of problem solving steps and tools were applied. I for one would find that type of information extremely compelling blog reading. Thanks for the update!

Posted by James Bayer on August 29, 2008 at 12:04 AM PDT #

James, great idea - I'll see what we can do after the smoke clears.

Posted by Justin Kestelyn on August 29, 2008 at 12:09 AM PDT #

@Justin - hoping that the post mortem happens. If not in public, at least with ACE and/or ACE Director community. There are some valuable lessons in there, especially since most other forums I have seen are not as scaled (up and out) as our Oracle Forums. In the mean time, many thanks to YOU and to the team for the effort put into this. Regardless of the grumbling (mine included), this promises to be a boon to the community in the long run.

Posted by Hans Forbrich on August 30, 2008 at 03:26 AM PDT #

Have all the issues been resolved? Have the issues been analyzed? Is a "post-mortem" report available? WHEN? (This is the "response question" to what might be the answers to the first three questions.)

Posted by Hemant K Chitale on October 14, 2008 at 03:37 PM PDT #

No, yes, and not at the moment. We have continuing reports of problems from Europe but are not able to reproduce them internally; relying on external users for help. Aside from inability to increase font size in IE, this is the only outstanding issue AFAIK.

Posted by Justin Kestelyn on October 14, 2008 at 11:10 PM PDT #
