Monday Nov 12, 2007

Identity oracles and derived attributes

There has been some great discussion of late about identity oracles and the need to present
derived/obfuscated identity attributes and whether derived attributes offer greater privacy.

Personally, I am not yet convinced of the business model for a consumer-facing identity oracle. Perhaps I'm a cheapskate, but I can't see paying, say, $5 a month to a service to hand out my identity attributes to save me the trouble of typing them in. Chances are, any given website will have some unusual attribute that my identity oracle didn't think of, and in any case I prefer to type my information into each website myself, because I tend to provide different, erroneous info to each site whenever I can get away with it. I guess that makes me a paranoid cheapskate!

I think the attribute provider concept makes a lot more sense when there is an entity that already has information about a person and which would be authoritative for a particular attribute and authorized (not necessarily by the user) to hand that attribute out, possibly in an obfuscated form. A government might be an authoritative source for whether someone is a citizen, and obliged to respond to another government to that effect. An employer might be an authoritative source for whether someone is employed and obligated to respond to a bank loan officer to that effect. A credit agency might be an authoritative source on someone's credit rating, and obligated to report that to a credit card company. The user doesn't get to intercept, alter or halt any of this, because the entities which need the information trust these authoritative sources more than they trust the user. Of course, this kind of provider would offer up only a limited set of attributes.

For either type of provider, be it identity oracle or limited attribute provider, there are operational and support considerations related to derived or obfuscated attributes. For example, it is important to think about how to troubleshoot issues with derived data when something goes wrong. A deployment I was involved in a couple of years ago is a good example.

A particular employee-facing application was provided by an outsourced application service provider. Federation using Liberty protocols was implemented for authentication between the application service provider and the employer. The authorization requirements for the application stated that some pages were open to all employees, and some pages were restricted to employees who were citizens and either above a certain level within the company, or active shareholders. The service provider didn't need to know a particular user's citizenship status or job level. The service provider really just needed a "YES" or "NO" in terms of eligibility for the special content. So it was determined that the equivalent of a "YES" or "NO" would be made known to the service provider instead of the raw data attributes.
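As a rough illustration of that rule, here is a minimal Python sketch of how the "YES" or "NO" might be derived on the employer's side. The field names, the numeric job level, and the threshold value are all assumptions made up for this example; they are not details of the actual deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeFacts:
    """Raw attributes that stay with the employer (hypothetical field names)."""
    is_citizen: bool
    job_level: int
    is_active_shareholder: bool


# Hypothetical threshold: "above a certain level" is read as strictly greater.
MIN_JOB_LEVEL = 5


def eligibility(facts: EmployeeFacts, min_level: int = MIN_JOB_LEVEL) -> str:
    """Return only the derived answer that the service provider gets to see."""
    senior_enough = facts.job_level > min_level
    eligible = facts.is_citizen and (senior_enough or facts.is_active_shareholder)
    return "YES" if eligible else "NO"
```

The service provider only ever receives the returned string, never the citizenship flag or the job level behind it.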

Unfortunately, the raw data from which eligibility (the "YES" or "NO") was determined came from several different source systems, at least one of which was another outsourced system. Imagine that a user goes to the outsourced application, doesn't get the "special" page which requires eligibility, and calls support to complain because they feel they've been erroneously prevented from viewing the page. To troubleshoot the problem, the support desk would need to a) know where the derived "YES" or "NO" came from, and then b) be able to go out to the individual source systems to see which one caused the erroneous "NO" to be derived. It's not feasible to expect that of the support desk folks.

In this case, we collected the salient facts from the different source systems (each of which of course was updated on a different time schedule) and stored the raw data attributes in a holding pen "behind" our identity service, to facilitate troubleshooting. Not a great solution perhaps, but better than nothing.
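To make the holding pen idea a little more concrete, here is a small Python sketch of one way such a store could help support staff. It assumes each raw attribute is kept together with the name of its source system and the date it was last refreshed. The record layout, the system names, and the eligibility rule parameters are illustrative assumptions, not a description of what we actually built.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RawFact:
    """One raw attribute kept in the holding pen, with its provenance."""
    name: str
    value: object
    source_system: str   # hypothetical system name
    last_updated: date   # each source refreshes on its own schedule


def describe(fact: RawFact) -> str:
    """Render a fact so support can see where it came from and how fresh it is."""
    return (f"{fact.name}={fact.value!r} from {fact.source_system} "
            f"(last updated {fact.last_updated.isoformat()})")


def explain_eligibility(pen: dict, min_level: int):
    """Return the derived answer plus the raw facts that blocked a YES."""
    citizen = pen["is_citizen"]
    level = pen["job_level"]
    shareholder = pen["is_active_shareholder"]

    blockers = []
    if not citizen.value:
        blockers.append(describe(citizen))
    if not (level.value > min_level or shareholder.value):
        # Either fact being true would have satisfied this half of the rule.
        blockers.append(describe(level))
        blockers.append(describe(shareholder))
    return ("NO" if blockers else "YES"), blockers
```

With something like this, a support person who is looking at an unexpected "NO" can see which source system supplied the blocking value and how stale it is, without needing direct access to that system.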

Now, imagine that Application Service Provider (ASP) #1 needs a derived value based on source data items x, y, z. ASP #2 needs a derived value based on source data items a, b, c, from other systems. ASP #3 needs a derived value based on source data items x, a, g, m. Keeping all of these source data items around in order to provide a derived data field to each ASP would start to be a bit of a maintenance headache. I'm not suggesting that the raw attributes should be handed out. On the contrary, I think derived or obfuscated attributes are better, but they may be more work than imagined, when you consider how to support and troubleshoot issues with such systems.
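A quick Python sketch shows how fast the bookkeeping grows. The ASP names and item letters simply mirror the hypothetical example above, and the helper functions are invented for illustration.

```python
# Which source data items each ASP's derived value depends on.
ASP_SOURCE_ITEMS = {
    "ASP1": {"x", "y", "z"},
    "ASP2": {"a", "b", "c"},
    "ASP3": {"x", "a", "g", "m"},
}


def items_to_maintain(requirements: dict) -> list:
    """Every source item that must be collected and kept current."""
    return sorted(set().union(*requirements.values()))


def dependents(requirements: dict, item: str) -> list:
    """ASPs whose derived value could change when this source item changes."""
    return sorted(asp for asp, items in requirements.items() if item in items)
```

For the three ASPs above, `items_to_maintain(ASP_SOURCE_ITEMS)` comes to eight distinct items, and a problem with item x affects both ASP1 and ASP3.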

If the raw data is simple (just a number for age) and stored in the identity service already, and your derived data is simply a quantitative comparison (e.g. is the age number > 21) then things aren't so complicated. However, if the source data comes from multiple places and the derivation requires some logic, then you need to think about how support people, with limited access, could start to troubleshoot things when something goes wrong.

Monday Sep 24, 2007

User-centricity, Trust: Technology or Practice?

There has been a lot of buzz about user-centric identity but too often it seems to assume that user-centricity is completely dictated by technology. From my perspective as an IT architect, the practices and procedures implemented for a deployment have as much, if not more, influence on how user-centric a solution is and how trusted a solution is.

The term "user centricity" is often used for login and single sign-on types of systems if the system allows the user to be 'in control' of their identity information and which aspects of their identity are released to other relying party applications. (See my previous post for a few thoughts on when user-centricity is appropriate and when it's not.)

OpenID was designed with user-centricity in mind, and there are some existing OpenID deployments that are user-centric, but there is nothing about the technology itself that forces user-centricity. It would be very easy to deploy an OpenID provider in an organization-centric manner. A corporation could deploy an OpenID provider for its employees for use in authenticating to OpenID-compliant systems. The corporation could pre-create the accounts for users, assign user ids, and populate the user account attributes such as phone number or email address, from a corporate HR source. The OpenID confirmation page, in which a user can alter the user attributes supplied to a particular relying party, could be set to not allow the end user to change the values of their attributes. This would be a very un-user-centric deployment and might be done to prevent users from impersonating another user or claiming undeserved entitlements. So the user-centricity of an OpenID provider depends in large part on the practices and procedures around the OpenID account creation and authentication process.

On the other hand, while there are many organization-centric SAML-based solutions in production around the world, a SAML-based identity provider could easily be deployed in a user-centric fashion if desired. Any organization, or even an individual, could set up a SAML-based identity provider that allowed users to self-register for accounts and self-specify their user attributes and specify which attributes to share with service providers. So a SAML-compliant solution can be set up that is very user-centric. It all depends on the practices and procedures established by the identity provider.

So, how to decide what you need? First, you should decide whether user-centricity is appropriate for your needs, and then you should consider trust.

User-centricity is appropriate for situations where the user is the authoritative source for information, for example whether they're vegetarian. (See previous post.) User-centricity is often not appropriate where some organization (such as a corporation or university or government entity) is the authoritative source of information for some of the user's attributes, such as the user's job level within a company, physician status within a hospital, affiliation with a university, or perhaps creditworthiness.

This brings us around to the question of trust. OpenID does not require any contact or setup between a relying party website and an OpenID provider, prior to the time a user logs in. The OpenID model assumes that the relying party is willing to trust any random entity on the internet (chosen by the user) to authenticate the user. This effectively means that the relying party website doesn't particularly care about what practices the OpenID provider follows in handing out accounts, how reliable the OpenID provider is, and so on. So this type of scheme is going to be appropriate primarily for low-risk and non-critical sites such as blogs or social networking or sharing photos.

On the other hand, the SAML model assumes some contact between a relying service provider and an identity provider to exchange, out of band, information about the servers at each end of the communication. This means that parties in a SAML environment have some say about who they trust. An office supply website might want to insist that a user making corporate purchases with a P.O. is authenticated by the corporation and not a random identity provider of the user's choice. A medical lab might want to insist that a doctor is authenticated by his or her hospital's identity provider service, and not a random identity service that no one has ever heard of. A university library might want to insist that a visiting scholar is authenticated by the scholar's home university and not a random identity service of the user's choice. The SAML model gives a relying party a way to choose the identity providers they consider trustworthy. It gives an opportunity for the relying parties to ask the identity provider questions about their practices in setting up accounts, whether they vet any of the user attributes, uptime and reliability, and so on.
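The difference between the two trust models can be sketched as a policy check on the relying party side. The Python below is only a conceptual illustration: `TrustPolicy` and the provider identifiers are invented for this example, and it does not represent the API of any OpenID or SAML library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustPolicy:
    """A relying party's rule for which identity providers it will accept."""
    accept_any_provider: bool = False
    approved_providers: frozenset = frozenset()

    def accepts(self, provider_id: str) -> bool:
        # Open model: any provider the user picks is good enough.
        # Vetted model: only providers agreed on ahead of time, out of band.
        return self.accept_any_provider or provider_id in self.approved_providers


# A low-risk site, such as a blog, that takes whatever provider the user chooses.
blog_policy = TrustPolicy(accept_any_provider=True)

# A medical lab that accepts only the hospital it has an agreement with.
lab_policy = TrustPolicy(
    approved_providers=frozenset({"https://idp.hospital.example"})
)
```

Here `lab_policy.accepts("https://idp.hospital.example")` is true, while any other provider identifier is rejected. The blog policy accepts everything.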

I suppose that someone could customize an OpenID environment to implement a white-list of relying parties that are allowed to use the service and a relying party could implement customization to only allow use of certain OpenID providers. It would also be possible to customize a SAML deployment to allow a relying service provider and an identity provider to automatically register with each other without any human involvement or vetting. (In fact, we did a POC on this for a customer.) Which brings me back to my original point: practices and procedures influence the user-centricity and trustedness of any deployment.

So you should choose between user-centricity and organization-centricity based on whether the user or some organization is an appropriate source for authoritative/trusted information.
The technology choice should be a separate decision and should be based on how much trust is needed for the situation.

Thursday Aug 16, 2007

Trusted Sources of Information

I decided to make my first blog post be about trust because as an IT architect for Sun, responsible for identity and application security, I spend a great deal of time thinking about identity and trust.

I think many of us have had a seminal moment that wakes us up to an awareness of trust (or the lack thereof) on the internet. For me it was back in the early '90s when I received an email that stated that gang members in San Jose CA would drive around at night without their lights on and if anyone flashed their lights at them, the gang members would chase them and shoot them. I'd never received a hoax email before, so just unthinkingly forwarded the email to a friend who lived near there. My friend called the police and found out, fortunately, that it was a hoax. Afterward, I felt a little silly for believing something so ridiculous, but learned, sadly, that the age of internet innocence was long gone and that I would have to evaluate whether an email came from a trusted source of information.

When designing or reviewing application security mechanisms, I ask a similar question. "Are the security decisions based on information from a trusted source?"

This is an important question to ask when deciding between what I'll call a user-authoritative identity scheme and an organization-authoritative identity scheme. For the purposes of this post, in a user-authoritative identity scheme the user is in complete control of their identity and is considered the trusted source of information attributes about themselves. For example, if I sign up for an account at, I get to specify what my name is, which email address I'd like to use, and what my birthdate is. There are no checks or validations performed when I sign up because for this type of application, it's not necessary from a risk or trust perspective. If I say my name is Cinderella or Scheherazade, or that my age is 20 or 40 or 60, it is unlikely to matter to anyone because the nature of the application does not require trusted information.

In an organization-authoritative identity scheme, on the other hand, an organization (a company, hospital, government, university, non-profit etc.) owns a portion of the user's identity in that it serves as the authoritative source of information for it, with respect to that organization and any partner organizations. The organization (not the user) is the authoritative source for user attributes such as the user's employee/membership status and id number, job/member level, and access rights or entitlements within the organization(s). For example, if I ask for a AAA discount at a hotel, AAA is the authoritative source for whether I'm a member of this organization. If I apply for a loan, Sun Microsystems is the authoritative source of information about whether I'm employed by Sun. The bank isn't going to just trust me on that, because I might lie in order to get a loan. Similarly, when I use a Sun application, Sun (not me) is the authoritative source of information about what applications I'm allowed to access. So there are many scenarios where an organization-authoritative identity scheme is needed because the user might not be a trusted and authoritative source.

When designing any system, it is important to consider whether a user-authoritative or organization-authoritative identity scheme is most appropriate. It is helpful to evaluate the identity information that is used as a basis for security decisions. Is such information coming from a trusted source? Would the source stand to benefit if it supplied false or incomplete identity information? What could a person do if false identity information were supplied? Does the source have practices to ensure the integrity and correctness of the information supplied? These are a few examples of questions that can be used as a litmus test to evaluate the appropriateness of a design from an identity-centricity perspective.

In my next few posts I'll continue this thread on trusted sources of information, discussing how this relates to concepts such as user-centric identity, security, and risk as well as how we at Sun have used some of today's popular technologies, such as SAML and OpenID, to create identity systems of varying trust levels.


Thoughts on identity management

