Wednesday Nov 07, 2007

Another CIFS server topic to blog about: Filesystem I18N

Late last year I wrote a blog entry about filesystem internationalization. It did not discuss the CIFS server project, but that was definitely on my mind. We had recently had several discussions about the various filesystem I18N problems, and that blog post more or less summarized what we'd decided to do. Now you can see that the Solaris I18N engineers (e.g., Ienup Sung) have delivered several major components.

So now we have codeset conversion APIs, Unicode encoding conversion APIs and Unicode normalization and string preparation APIs in the kernel and in user-land, including Unicode case-folding, and case- and normalization-insensitive Unicode string comparison. That's impressive! Ienup and his team did an excellent job of that. I did code reviews for Ienup, and feel that the code is quite good. Congrats Ienup!

It's time to update that ASCII art picture in my fs I18N blog entry to include SMB...

Did I mention that these components are open source?

Dealing with Windows SIDs in Solaris, part 2

As described in my first post on this subject, Solaris can now map SIDs to POSIX UIDs/GIDs and back, and it can store SIDs in ZFS. The identity models of Windows and Solaris are now unified, and the ACL model of ZFS has been extended. And we now have unified administration of SMB and NFS shares. Wow. I find this exciting, and not just because I've worked on parts of this story.

In this post I want to walk through the identity mapping facility's design. Next I'll talk about implementation, and then about how to use the facility.


Design of the Solaris ID mapping facility

The salient points of the design of the ID mapping facility are:

  • A door server daemon provides the ID mapping service (idmapd, svc:/system/idmap:default)

  • The service can be called via libidmap and the idmap kernel module

  • Given the need to map many SIDs at logon time and when dealing with ACLs, the idmap door protocol (an ONC RPC protocol), the kernel and user-land APIs, and the idmap service are designed to batch operations, to reduce latency by taking advantage of parallelism in the directory server and the network (a sketch of the caller-side batching pattern follows this list)

  • There are two SQLite 2.x databases:
    • /var/idmap/idmap.db -- contains persistent name-based ID mapping rules

    • /var/run/idmap/idmap.db -- caches Windows name<->SID lookups, ID mappings and ephemeral ID allocations

  • There are two types of ID mapping today:
    • ephemeral ID mapping, where we dynamically allocate the next available UID or GID from the erstwhile negative uid_t/gid_t namespace (uid_t and gid_t are now unsigned), but we forget these mappings on reboot (see part 1 for more)

    • name-based ID mapping, where the sysadmin provides rules for mapping Windows users and groups to Solaris users and groups; these rules use names and wildcards, not SIDs and UIDs/GIDs.

  • There are three private system calls for idmapd:
    • idmap_reg() to register idmapd's door; besides a door fd argument there's a boolean that tells the kernel whether idmapd was unable to open/recover /var/run/idmap/idmap.db (see below)

    • idmap_unreg(), which is called when idmapd exits cleanly

    • allocids(), which allocates a number of ephemeral UIDs and GIDs for idmapd to use for dynamic ID allocation

  • The kernel tracks which ranges of ephemeral IDs are valid. Initially there are no (or very few) valid ephemeral IDs, and the kernel begins allocation of IDs at 2^31. When idmapd crashes in such a way that the ephemeral DB is unrecoverable, it tells the kernel about this when it registers, and the kernel moves the low end of the valid ephemeral ID range to match the top end.

  • idmapd uses asynchronous LDAP searches of the Active Directory Global Catalog to do name<->SID resolution

  • idmapd uses DNS SRV RR lookups and LDAP searches of the Active Directory configuration partition to auto-discover Active Directory domain, forest and site names, and domain controller and global catalog directory server names
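
As promised above, here is a minimal, self-contained sketch of the batching pattern. This is a toy, not the libidmap interface; the types and helpers below are invented for illustration. The idea is to queue all the SID-to-UID lookups first, then resolve the whole batch in what would be a single door round trip to idmapd.

    /*
     * Toy illustration of batched SID->UID lookups.  batch_run() stands in
     * for the single door call; here it just hands out made-up IDs.
     */
    #include <stdio.h>

    #define BATCH_MAX 64

    struct sid   { const char *prefix; unsigned rid; };   /* domain SID + RID */
    struct req   { struct sid sid; unsigned *uid_out; };  /* one queued lookup */
    struct batch { struct req reqs[BATCH_MAX]; int n; };

    static void
    batch_queue(struct batch *b, struct sid s, unsigned *uid_out)
    {
        if (b->n < BATCH_MAX)
            b->reqs[b->n++] = (struct req){ s, uid_out };
    }

    /* Stand-in for one door call: the real service resolves the whole batch. */
    static void
    batch_run(struct batch *b)
    {
        static unsigned next_id = 0x80000000u;    /* pretend ephemeral IDs */
        int i;

        for (i = 0; i < b->n; i++)
            *b->reqs[i].uid_out = next_id++;
    }

    int
    main(void)
    {
        struct batch b = { .n = 0 };
        unsigned uid1, uid2;

        batch_queue(&b, (struct sid){ "S-1-5-21-1-2-3", 1103 }, &uid1);
        batch_queue(&b, (struct sid){ "S-1-5-21-1-2-3", 1104 }, &uid2);
        batch_run(&b);                  /* one round trip covers both lookups */

        printf("uid1=%u uid2=%u\n", uid1, uid2);
        return (0);
    }

The real interface adds per-request status, direction variants (SID-to-GID, UID-to-SID, and so on), and flags, but the shape -- queue many, resolve once -- is the point.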

I've run out of time. I'll cover implementation details next.

    Tuesday Nov 06, 2007

    Dealing with Windows SIDs in Solaris

    The CIFS server project integrated into ONNV / OpenSolaris in build 77. This is a very important milestone for Solaris, which now has a fully integrated native SMB server running in the kernel.

    "Fully integrated" implies many good things:

    • proper interaction of SMB share modes, file locks and oplocks with NFSv4 equivalents
    • proper integration with Solaris administration utilities (e.g., sharemgr(1M))
    • support for case- and Unicode normalization-insensitive but case- and normalization-preserving filesystems (yes, we now have Unicode normalization code in the kernel!)
    • integration of the Solaris and Windows identity models
    • filesystem support for the integrated identity model, as well as extended ACLs to support Windows ACL features
    • etcetera

    That is a very significant list! A lot of work went into this project and related sub-projects.

    I'll be blogging about the integration of the Solaris and Windows identity models, both in this post and subsequent ones.


    Solaris has distinct, small, flat user and group identity namespaces (POSIX UIDs and GIDs). Windows has a unified, practically unlimited, and non-flat namespace for user and group identities (SIDs). There's a very high impedance mismatch there!

    We knew we'd need to map between these two models, so we started a project to do that. We needed to be able to map any valid SID in an AD forest to a Unix UID and/or GID, as needed. And we needed such a system to be low-configuration, easy to use, and safe. Mapping between these models isn't hard; it's the other requirements that were challenging to tackle.

    Initially we pursued a notion of persistent dynamic mappings within each Unix (NIS/NIS+/native LDAP) domain, but Mike Shapiro helped us simplify things greatly with an outside-the-box idea: use the heretofore unused "negative" UID/GID namespaces for ephemeral dynamic ID mapping, thus removing two big problems with our earlier design (the need to configure a pool of IDs and the reliability issues associated with having to persistently store important mappings).

    "Negative UID/GID namespaces", you ask? Until now uid_t and gid_t have been signed 32-bit integers in Solaris, but the relevant standards (POSIX, SUS) require UIDs and GIDs to be positive integers, which means that we wasted almost half of the uid_t/gid_t namespace. Mike's insight was that we could use that wasted ID namespace as a pool of IDs that we can dynamically allocate IDs from, resetting the pool at boot time, and that this wouldn't be too expensive in terms of incompatibility (more on that below). So we changed the uid_t and gid_t types, and we reserved the 2\^31..2\^32-2 ID namespace for Solaris-driven allocation (i.e., customers cannot assign these IDs directly).

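    To make the arithmetic concrete, here is a tiny, self-contained sketch of that reserved range. One aside is my own assumption: I read the range as stopping at 2^32-2 so that 2^32-1, i.e. (uid_t)-1, can keep its traditional "no ID" sentinel role.

    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t id32_t;            /* stand-in for the now-unsigned uid_t/gid_t */

    #define EPHEMERAL_MIN   0x80000000u /* 2^31 */
    #define EPHEMERAL_MAX   0xfffffffeu /* 2^32 - 2 */

    static int
    is_ephemeral(id32_t id)
    {
        return (id >= EPHEMERAL_MIN && id <= EPHEMERAL_MAX);
    }

    /* Toy allocator: a "reboot" just means starting over at 2^31. */
    static id32_t next_eph = EPHEMERAL_MIN;

    static id32_t
    alloc_ephemeral(void)
    {
        return (next_eph < EPHEMERAL_MAX ? next_eph++ : EPHEMERAL_MAX);
    }

    int
    main(void)
    {
        id32_t id = alloc_ephemeral();

        printf("%u ephemeral? %d; 1000 ephemeral? %d\n",
            id, is_ephemeral(id), is_ephemeral(1000));
        return (0);
    }
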
    ID mapping then works as follows:

    • there is an ID mapping service, svc:/system/idmap:default
    • the idmap service is accessed via RPC over doors only (i.e., it's a local service)
    • by default the idmap service validates SIDs and maps them to the next available 'ephemeral' UID or GID, and this mapping persists until the system reboots (more on this below)
    • the mapping service also offers name-based ID mapping, where you can map Windows domain users and groups to Unix users and groups by name
    • the consumers are: the SMB server in the kernel and in user-land utilities, the NFSv4 user-land nfsmapid daemon, the kernel ksid*() functions (which are called from cr*() kernel functions that deal with cred_t), and the idmap(1M) utility

    Now, using ephemeral IDs in the erstwhile negative ID space has some implications. First and foremost: ephemeral IDs must not be persistently stored anywhere, including in filesystem objects. Because that is far too restrictive the Solaris VFS and one filesystem, ZFS, have been modified to support storing SIDs instead of ephemeral IDs (the other filesystems simply reject any attempt to store an ephemeral ID). You read that right: ZFS can now use SIDs in ACL entries! Most applications will already do the Right Thing -- either reject or pass through ephemeral IDs -- and those core Solaris apps that needed modification have been modified. C++ mangled symbols for methods that take uid_t or gid_t arguments will change on recompile (this was deemed acceptable). For more information you should see the ARC case that covers ephemeral IDs (which will be available soon, as I understand it).

    By the way, I think Solaris may now be the first non-Windows implementation of NFSv4 that supports the use of user/group names from many domains on the wire!


    Next up: current limitations of ID mapping, ongoing sub-projects, and a guided tour of the source. The impatient can start by looking at the source.

    You can find calls to various ID mapping and related functions using the OpenSolaris source code browser, of course.

    Thursday Oct 11, 2007

    Phishing as a man-in-the-middle kind of attack

    I gave a presentation to the Liberty Staff yesterday about Phishing as an MITM attack, and what can be done about it. I think it went very well, and I'm very excited that I'll be meeting people I didn't know who are working in this space, and that we could have a significant impact on the future of web authentication and in ridding us of phishers.

    I don't have enough time to give this topic a complete treatment in this blog entry, so I'll stick to a very short summary. The relevant Internet-Drafts are linked to in earlier blog entries of mine. Rest assured though, I will be writing more link-rich blog entries about this topic soon enough, and I'll post my presentation once I add a few more slides (mostly to include relevant links and to give credit where it's due -- I had to limit myself to two slides for the presentation itself!).

    The gist of this presentation was: phishing is not about stealing passwords, it is about stealing our money -- passwords are gravy to a phisher. If we replace cleartext passwords in HTML forms POSTed over https as our predominant method of web authentication, but aren't careful enough to defeat MITM attacks, then phishers will still be in business, and they'll still steal our money. Note that there are practical MITM attacks that phishers can and do mount that are not on-path attacks (i.e., the phisher need not be in the route path from the client to the server) -- think of URLs like "http://www.yourbank.tld:sdfjkfgsdf@clever-phisher.tld/login.php".

    It's crucial that we understand that neither DNS registrars nor certificate authorities care to help, nor are they in a position to help us defeat phishing.

    Here's where Project Liberty comes in: federations, by dint of being much smaller than the Internet as a whole, can help. They can help by providing a way to authenticate relying parties to user agents, or at least by providing a relying party authorization function for user agents -- given this, federations can act as whitelists and keep the phishers out. There is a crucial UI element here though: users need to be able to know what identity they used to authenticate to a server, what the server's name is, and what the name of the federation is that mediated mutual authentication.

    Besides the message that federated mutual authentication provides a mechanism to keep phishers out, there's also the issue of ensuring that there are no practical MITM attacks left to phishers. This is where channel binding comes in. If authentication happens above the HTTP/TLS layer, then we need to make sure that the server we think we're talking to at that upper layer is the same as the one at the HTTP/TLS layer, or we have to make sure that all messages to the server are additionally protected above the HTTP/TLS layer (this last is never going to happen). So either we push authentication down the stack, to the HTTP/TLS layer, or we need to provide some way to bind web authentication to the HTTP/TLS "channel."
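
    To make "channel binding" concrete, here is a compile-only sketch using the standard GSS-API channel bindings structure (RFC 2744). How the caller obtains the TLS channel's unique bytes (e.g., the first Finished message, a la tls-unique) is out of scope here and is left as a parameter.

    #include <string.h>
    #include <gssapi/gssapi.h>

    /*
     * Stuff the TLS channel's unique bytes into the GSS-API channel
     * bindings.  If a man in the middle terminates TLS and re-originates
     * it, its channel bytes differ from ours and context establishment
     * fails, even though each TLS leg looked fine on its own.
     */
    OM_uint32
    init_ctx_with_cb(OM_uint32 *minor, gss_name_t server,
        const void *tls_cb, size_t tls_cb_len,
        gss_buffer_t in_tok, gss_buffer_t out_tok, gss_ctx_id_t *ctx)
    {
        struct gss_channel_bindings_struct cb;

        (void) memset(&cb, 0, sizeof (cb));
        cb.initiator_addrtype = GSS_C_AF_NULLADDR;
        cb.acceptor_addrtype = GSS_C_AF_NULLADDR;
        cb.application_data.length = tls_cb_len;
        cb.application_data.value = (void *)tls_cb;

        return (gss_init_sec_context(minor, GSS_C_NO_CREDENTIAL, ctx,
            server, GSS_C_NO_OID, GSS_C_MUTUAL_FLAG | GSS_C_INTEG_FLAG, 0,
            &cb, in_tok, NULL, out_tok, NULL, NULL));
    }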

    I described several ways to do the channel binding and mutual authentication.

    Credit for these ideas, by the way, goes to Sam Hartman, Leif Johansson, and the IETF usual suspects who helped refine them (Jeff Hutzelman, Jeff Altman, Love Hörnquist, RL "Bob" Morgan, and Lisa Dusseault, Chris Newman, and many others).

    Tuesday Sep 18, 2007

    Improving Web Security: AJAX + GSS/SAML/other authentication + channel binding to TLS


    Authentication

    Leif Johansson has a couple of Internet-Drafts (proposals for RFCs) [1][2] that would provide for a novel way to deal with authentication in web applications. This is all related to Sam Hartman's Internet-Draft on dealing with phishing. The idea in Leif's I-Ds is to:

    1. provide a way to do multi-round-trip user/server authentication over HTTP 1.0 (and 1.1), starting with the GSS-API (but applicable to SAML profiles and other schemes),
    2. bind this authentication to the TLS sessions used between the client and the server,
    3. and use cookies to bind all those sessions together with the same authentication event.
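
    Reduced to its GSS-API core, step 1 is just the usual context-establishment loop, with each token carried over an HTTP round trip. The http_exchange_token() helper below is hypothetical; it stands in for however the drafts actually encode tokens into requests and responses.

    #include <gssapi/gssapi.h>

    /* Hypothetical transport: send our token, return the server's reply token. */
    extern int http_exchange_token(gss_buffer_t out, gss_buffer_t in);

    int
    gss_http_login(gss_name_t server, gss_ctx_id_t *ctx)
    {
        OM_uint32 maj, min;
        gss_buffer_desc in = GSS_C_EMPTY_BUFFER;
        gss_buffer_desc out = GSS_C_EMPTY_BUFFER;

        *ctx = GSS_C_NO_CONTEXT;
        do {
            /* Step 2 of the list would pass TLS channel bindings here. */
            maj = gss_init_sec_context(&min, GSS_C_NO_CREDENTIAL, ctx,
                server, GSS_C_NO_OID, GSS_C_MUTUAL_FLAG, 0,
                GSS_C_NO_CHANNEL_BINDINGS, &in, NULL, &out, NULL, NULL);
            if (GSS_ERROR(maj))
                return (-1);
            if (out.length != 0 && http_exchange_token(&out, &in) != 0)
                return (-1);            /* one HTTP round trip */
            (void) gss_release_buffer(&min, &out);
        } while (maj == GSS_S_CONTINUE_NEEDED);

        return (0);     /* mutually authenticated; step 3 sets the cookie */
    }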

    Imagine that you have multiple identities, which you may have enrolled for in a variety of ways (such as at a brick&mortar location, or online via another identity (e.g., via your ISP), or online much like when you sign up for free e-mail accounts).

    Now imagine that you can authenticate these identities using a strong network authentication mechanism (say, Kerberos V), so you don't have to type passwords into forms anymore. And imagine that there are authentication federations, so that you can use one identity in many places without having to expose your passwords to many servers. Lastly, imagine that you can authenticate to some website using an HTTP URL (as opposed to HTTPS) and be immediately redirected to the HTTPS URL for the same site without having to click through the "give your money away to the nice attacker?" dialog box. Oh, and one more thing: all of this happens while leaving the website designer some control over the UI.

    And all that with protection against MITMs, without necessarily depending on DNSSEC or on a true PKI.

    That's what we're talking about here.

    The components of this are, then:

    The XMLHttpRequest extensions I have in mind are, roughly:

    • a method for requesting GSS-API (or other) authentication with a given {mechanism, identity, [federation]} tuple as an argument;
    • a method for requesting GSS-API auth but with the browser displaying an identity selection dialog or, better yet, with a DOM object representing where on the page the browser should prompt for identity (but the browser should not make its UI elements for this available to the calling script via the DOM, so there's no way to leak the set of identities the user has to any script on the page);
    When either of these methods is used, the browser will then do the HTTPS+GSS+channel binding dance, even if the page where the script is running came from plain HTTP (not HTTPS). And then it will, for the length of the session, accept the server's certificate as valid for the server's name from the URL being fetched (which, while we're at it, must not be cacheable).

    The script should set a cookie when the XMLHttpRequest succeeds -- we can't be doing the HTTPS+GSS+cb dance for every URL, just once a session, thank you (or until the cookie expires).

    We'll want another host object to help with online enrollment. This object/class (prototype) should have a method to set the credentials for a {mechanism, federation, identity} tuple. When this method is called the user should be prompted in browser chrome as to whether to accept this credential, and whether to accept it only for this host, this session, whatever.

    Putting it all together:

    1. The user visits an http:// URL (https is not required; using it is fine, but it requires a server cert that validates to some trust anchor).
    2. The page at that site loads a script that uses an XMLHttpRequest object to do GSS+channel binding, authenticating the user to the site and the server to the client, and binding the server's cert to this authentication. The script will set a secure-only cookie for fast user re-authentication and it will ensure that the selected ID is remembered for subsequent use.
    3. The UI will look like this: there will be an "authenticate" button, and maybe an "authenticate as..." button -- click there and it all goes. But also there will be another kind of lock icon to go with the TLS one, or perhaps a different shape and color for the existing one, to indicate that stronger authentication has been done (we want users to want this).


    Enrollment

    So you travel and sometimes use kiosks or other people's 'puters. Which means you may not have your long-lived credentials with you. All you have then are plain old username+password credentials, and, perhaps, the ability to use your cell phone. So you do traditional username+password (preferably a temporary one obtained via your cell phone) form authentication, and a script enrolls new GSS creds for you and your browser. But there may be a field in the HTML form for limiting the lifetime of the resulting credentials (described, perhaps, as a session?).

    When you enroll for an identity for the first time there'd be no username+password, just captchas/whatever, of course.

    Actually, this too might be a good candidate for design as a new method of XMLHttpRequest...

    So we can't get rid of passwords in all circumstances -- preferably you could carry your non-password credentials on a token, but we'll assume you can't. But perhaps we can get the number of passwords you need to remember down to a manageable few, corresponding to how many federations you have identities in.

    I think some parts of this might be very easy to prototype with Mozilla, particularly if we stick to building only extensions to the XMLHttpRequest object (then we don't have to learn how to write plug-ins, new host objects, etc...).

    SOAP indigestion continued

    Ah, a friend tells me that negotiation of all security-related things, like key derivation, is also left to application profiles. Even crucial aspects of a security system, like protection against replay attacks, are left to application profiles. I see no discussion of reflection attacks in SOAP, so I hope that there's some indication of directionality in the parts of SOAP messages that are signed, or that app profiles always use different session keys for each direction. *sigh*

    In the IETF we have security frameworks like TLS and SASL that could be profiled as heavily as SOAP is, but by and large all such frameworks come with required-to-implement functionality that makes it possible to have off-the-shelf implementations that Just Work (tm). The IETF mostly does not produce standard APIs, nor programming-language-specific APIs for them (exceptions: the GSS-API, SCTP), so each off-the-shelf implementation of, say, TLS, may have its own APIs, but at least implementations of application protocols that use TLS just work and just interop without having to have per-application profiles of TLS. TLS handles most of the security-sensitive aspects of a serious security framework, such as negotiation of key exchange and authentication mechanisms, ciphers, MACs, key derivation, re-keying, replay protection, reflection protection, etc...

    Application developers are usually not cryptographic protocol designers. They shouldn't have to know too much about mundane (to them) issues like key derivation, damn it. Complicating a security analyst's job by pushing so much of a security specification out to every application profile doesn't help either. The OASIS WSS TC has done SOAP developers a disservice here, to say nothing of what it's done for SOAP users.

    Of course, SOAP can use TLS. But when it does there is no binding between the use of SOAP Security token profiles for authentication and the TLS sessions used. But at least that's no worse than the state of authentication for web applications. And there is hope for them still (I'll blog about that sometime).

    Monday Sep 17, 2007

    SOAP, Security, Kerberos V, indigestion

    Twice in the past 5 or so years I've sent the OASIS WSS TC some comments on the Kerberos V Token Profile 1.1 for SOAP Message Security 1.1. You can probably go find the mails in the archives. I never got answers to the questions that really mattered, and then I dropped the matter -- I'm not a SOAP implementor, after all -- though I had cared out of fear that someday the Solaris krb5 team would be asked to make it possible to implement a problematic spec.

    A colleague asked me about this profile today. So I went to find answers, again.

    Nowadays the Kerberos V Token Profile 1.1 is no longer a draft.

    Now, reading the two specs (SOAP Message Security 1.1 and the Kerberos V Token Profile 1.1) I note that:

    • The TC added support for using the initial context token from the Kerberos V GSS-API mechanism, probably in response to a question from me as to why they didn't. I so wish they hadn't. The right thing to do would have been to use all of the GSS-API mechanism, including its per-message token services, or to use the GSS_Pseudo_random() function to extract keys from the mechanism's security contexts (see the sketch after this list). As it is I think we'll need GSS-API extensions to get at mechanism-specific internal contents: the krb5 session and sub-session keys.
    • Key derivation is still unspecified, in the token profile and in the base spec, which means that you can only use XMLenc ciphers and XMLdsig signature algorithms whose key sizes are compatible with the enctype used in the Kerberos V AP-REQ Authenticator for the sub-session keys (or in the Ticket for the session key), or you must come up with some profile that specifies key derivation, or you can choose not to interop.
    • Negotiation of things like XMLenc ciphers is also out of scope (in the base SOAP security spec; perhaps it's specified elsewhere?).
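
    For what it's worth, the key-extraction approach sketches out very simply once gss_pseudo_random() (RFC 4401) is available. The label below is made up and a real profile would have to nail it down, and the header that declares gss_pseudo_random() varies by GSS-API implementation.

    #include <string.h>
    #include <sys/types.h>
    #include <gssapi/gssapi.h>
    #include <gssapi/gssapi_ext.h>  /* gss_pseudo_random() on some stacks */

    /* Both peers derive the same key from the established context + label. */
    OM_uint32
    derive_xmlenc_key(OM_uint32 *minor, gss_ctx_id_t ctx, size_t keylen,
        gss_buffer_t key_out)
    {
        gss_buffer_desc label;

        label.value = "xml-encryption key";     /* illustrative label only */
        label.length = strlen(label.value);

        return (gss_pseudo_random(minor, ctx, GSS_C_PRF_KEY_FULL,
            &label, (ssize_t)keylen, key_out));
    }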

    *sigh*

    I think SOAP would benefit from using TLS for transport security and PKI/Kerberos/... for authentication with channel binding to the TLS session. Of course, that would require further profiling, and I gather everything in the SOA world is political, so getting there seems unlikely. But I think I'll try, particularly when On Channel Bindings is published as an RFC (it's in the RFC Editor's queue!).

    Friday Sep 14, 2007

    In Austin for ACL? Don't miss Gotan Project

    I saw Gotan Project last night. Wow. What a show. They play again tonight, don't miss them. The crowd loved them, and they even loved it when they played a straight Tango tune, Canaro en París (a guitar duo version can be heard here), a very difficult piece (it gets very fast at the end), both to play and to dance. Gotan Project's version of Canaro en París was one of the fastest I've heard yet. The crowd (mostly Austinites) even screamed "¡Otra! ¡Otra! ...," demanding an encore.

    Unfortunately, there is no suitable dance floor at Stubb's, and what there is was crowded.

    Gotan Project had:

    • one cellist
    • three violinists
    • one piano player
    • one bandoneonista
    • one guitarist
    • one DJ
    • one keyboardist
    • one singer
    • and a heck of a sound

    Some things were canned: drums, and vocals for at least one song.

    Wednesday Aug 01, 2007

    IDNAbis -- must not miss I18N presentation

    At last week's 69th IETF meeting (in Chicago) there was a presentation on the IDNAbis effort at the SAAG meeting on Thursday that anyone with an interest in I18N should look at and listen to (the presentation starts about 28 minutes into the recording).

    For me the biggest takeaway is this: if you want Unicode version agnosticism, and you *should* want that, then you need to think carefully about where unassigned codepoints will be dealt with. In particular, IF you use ACE encoding on the wire in your protocols then you need only worry about Unicode versions supported at the client end -- a very important point. Of course, administrative authorities must be the ones to enforce rules about Unicode version use, and about use of codepoints heretofore considered dangerous, the latter in context- and language-specific ways (another crucial, and brilliant, insight in the IDNAbis presentation).

    One of the attributes of the IDNAbis proposal is that a lot of constraints from stringprep would be relaxed significantly, to the point where we can, and should, consider the use of A-labels (meaning, the output of toASCII(), that is, punycoded strings), on the wire in critical protocol elements. In particular I'm thinking that Kerberos V I18N should just shove ACE into all instances of GeneralString in the protocol, augmented with UTF8String and OCTET STRING (for legacy names from just-send-8 deployments) aliases of principals and realm names to support migrations.

    Friday Jun 01, 2007

    C with continuations?! A report on a conversation in Cancún

    I shared my thoughts on async, async everywhere with my friend Sam while on a bus ride to some ruins in Cancún this past weekend, and his reaction was, "sure, but why not continuations everywhere, why not just add continuations to C?" This floored me: I'd never considered such a thing -- it seems so... foreign, out of place, yet so clever. Needless to say, we proceeded to have a lively conversation about this.

    What would it mean to add something like Scheme's call/cc to C? We thought it'd mean this: that all function call frames would be allocated from a heap and would be garbage collected, and that there would be no stack, plus all attendant calling conventions changes. (Such a C might as well also have closures, while we're at it.)

    Such a C would have to deal with alloca() (e.g., by turning alloca()s into heap allocations and recording those in the activation record so they can be freed when the activation record is garbage collected), and setjmp() and longjmp() (which, actually, become much simpler in implementation!), and such things. And probably would have to have a "foreign function call" facility for calling normal C code, to make it easier to start using this weird C.

    What would break? Anything that assumes that the addresses of automatic variables in different function call frames in the current code path are ordered in the same way as the function call frames -- i.e., code that assumes a stack. And any C code that uses asm() to get at the stack pointer would likely break (a stack pointer could be provided, but its values couldn't be changed by asm() code). That's not a whole lot of code, probably, as a percentage of an entire OS, but it'd be a largish amount of code in absolute terms, perhaps even excluding code implementing other languages in C.

    The OS would still have to provide async versions of all synchronous system calls, unless, my friend cleverly points out, the kernel itself were built with such a C compiler!

    We're talking about dirt cheap threading here, so cheap that it might outweigh the cost of garbage collection and heap fragmentation (there's no way to do a copying garbage collector for a language like C, I'm afraid, and any GC would have to be very conservative) -- it would have to in order to succeed. With threads this cheap you can start one any time you want, for any reason, and avoid having to write CPS-like code atop lots and lots of async interfaces, each implemented in CPS too.
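
    For concreteness, here is the sort of hand-written CPS we're trying to avoid: a straight-line "open, then read, then report" sequence exploded into callbacks, with the state that would have lived on the stack hoisted into a heap-allocated context. The async_open()/async_read() primitives here are hypothetical.

    #include <stdlib.h>
    #include <sys/types.h>

    struct ctx {                        /* hand-rolled activation record */
        int fd;
        char buf[512];
        void (*done)(struct ctx *, int err);
    };

    /* Hypothetical primitives: return immediately, invoke 'k' later. */
    extern void async_open(const char *, void (*k)(struct ctx *, int),
        struct ctx *);
    extern void async_read(int, void *, size_t,
        void (*k)(struct ctx *, ssize_t), struct ctx *);

    static void
    on_read(struct ctx *c, ssize_t n)           /* third "basic block" */
    {
        c->done(c, n < 0);
    }

    static void
    on_open(struct ctx *c, int fd)              /* second "basic block" */
    {
        if (fd < 0) {
            c->done(c, 1);
            return;
        }
        c->fd = fd;
        async_read(fd, c->buf, sizeof (c->buf), on_read, c);
    }

    void
    fetch(const char *path, void (*done)(struct ctx *, int err))
    {
        /* Allocation failure handling elided; 'done' frees the context. */
        struct ctx *c = calloc(1, sizeof (*c));

        c->done = done;
        async_open(path, on_open, c);           /* first "basic block" */
    }

    With call/cc or sufficiently cheap threads, fetch() collapses back into three consecutive statements.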

    No, this will never happen. It's a fantasy of people who wish we could write procedural-looking code with tiny amounts of syntactic sugar to enable lots of cheap parallelization, while retaining source-level compatibility with millions of lines of code.

    In reality programmers will always just have to deal with writing async, CPS-like code; we'll just have to do what should have been the compiler's job. Scheme lost. Tough beans.

    Still, if we want to parallelize code cheaply we'll need thread creation to be cheap as well, dirt, dirt cheap. What other ways are there to make it so? Pre-creation of threads?

    Tuesday May 22, 2007

    Async, async, everywhere

    A few weeks ago there was a brief sub-thread on the networking-discuss OpenSolaris list about whether the newly proposed kernel sockets API project should begin by delivering only a synchronous API. That suggestion was quickly dismissed, fortunately.

    IMO all APIs that can block on I/O should be asynchronous.

    Even APIs that can "block" only on lengthy computation (e.g., crypto) should be async, as such computation might be offloaded to helper devices (thus getting I/O back into the picture) or the call to such an API might fork a helper thread (think automatic parallelism, which one might want on chip multi-threading (CMT) systems) if that is significantly lighter-weight than just doing the computation.

    For example, gss_init_sec_context(3GSS) should have an option to work asynchronously, probably using a callback function to inform the application of readiness.
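
    Purely as a sketch of what that option might look like -- nothing like this exists in the GSS-API today, and the name and signature below are invented:

    #include <gssapi/gssapi.h>

    /* Completion callback: delivers the usual outputs when they are ready. */
    typedef void (*gss_isc_cb_t)(void *arg, OM_uint32 major, OM_uint32 minor,
        gss_ctx_id_t ctx, gss_buffer_t output_token);

    /*
     * Hypothetical async variant: same inputs as gss_init_sec_context(),
     * but it returns immediately and fires 'cb' once any KDC or directory
     * round trips have completed.
     */
    OM_uint32 gss_init_sec_context_async(OM_uint32 *minor,
        gss_cred_id_t cred, gss_ctx_id_t *ctx, gss_name_t target,
        gss_OID mech, OM_uint32 req_flags, OM_uint32 time_req,
        gss_channel_bindings_t bindings, gss_buffer_t input_token,
        gss_isc_cb_t cb, void *cb_arg);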

    And open(2), creat(2), readdir(3C), and so on should all be asynchronous. If all filesystem-related system calls had async versions then one could build file selection combo box widgets that are responsive even when looking for files in huge directories (the user would see the list of files grow until the whole thing was read in or the file the user was interested in appears, as opposed to having to wait to see anything at all) and which don't need to resort to threading to achieve the same effect. And the NFS requirement that operations like file creation must be synchronous would not penalize clients that support async versions of creat(2) and friends.

    Of course, adding async versions of all filesystem-related system calls without resorting to per-call worker threads probably means radical changes to the VFS and the filesystem modules. Which should prove the point: it's much easier to layer sync interfaces atop async ones than it is to rework the implementation of sync interfaces to support async ones efficiently, so one should always start by implementing async interfaces first.
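
    The easy direction looks like this: a blocking wrapper over a hypothetical async primitive, using nothing but ordinary POSIX threads synchronization. Going the other way means reworking the guts of the synchronous implementation.

    #include <pthread.h>

    struct waiter {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        int             done;
        int             result;
    };

    /* Hypothetical async primitive: starts the I/O, calls 'cb' when done. */
    extern void async_submit(void (*cb)(void *arg, int result), void *arg);

    static void
    wake(void *arg, int result)
    {
        struct waiter *w = arg;

        pthread_mutex_lock(&w->lock);
        w->result = result;
        w->done = 1;
        pthread_cond_signal(&w->cv);
        pthread_mutex_unlock(&w->lock);
    }

    int
    sync_call(void)
    {
        struct waiter w;

        w.done = 0;
        w.result = 0;
        (void) pthread_mutex_init(&w.lock, NULL);
        (void) pthread_cond_init(&w.cv, NULL);

        async_submit(wake, &w);                 /* kick off the operation */
        pthread_mutex_lock(&w.lock);
        while (!w.done)
            pthread_cond_wait(&w.cv, &w.lock);  /* block only this caller */
        pthread_mutex_unlock(&w.lock);

        (void) pthread_cond_destroy(&w.cv);
        (void) pthread_mutex_destroy(&w.lock);
        return (w.result);
    }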

    It's been decades since GUI programming taught us that everything must be async. And even web applications, which used to be synchronous because of the way HTML and browsers worked, work asynchronously nowadays -- async is what Ajax is all about.

    Really, we should all refrain from developing and delivering any more synchronous blocking APIs without first delivering asynchronous counterparts.

    BTW, closures are a wonderful bit of syntactic sugar to have around when coding to async APIs -- they let you define callback function bodies lexically close to where they are referenced.

    Given a really cheap way to test for the availability of a thread in a CMT environment, and a really cheap way to start one, then all those callback invocations (closure invocations) could be done on a separate thread when one is available.

    I like to think of async programming as light-weight threading because I like to think of closures and continuations as light-weight threading. Continuations built on closures and continuation-passing-style (CPS) conversion, in particular (i.e., putting activation records on the heap, rather than on a stack), are a form of very light-weight cooperative multi-threading (green threads): thread creation and context switching between threads have the same cost as calling any function. The trade-off when putting activation records on the heap is that much more garbage is created that needs to be collected -- automatic stack-based memory allocation and deallocation goes out the window. A compromise might be to use delimited continuations and allocate small stacks, with bounds checking rather than guard pages used to deal with stack growth. VM manipulation to set up guard pages, and the large stacks needed to amortize this cost, are, I suspect, some of the largest costs in traditional thread creation as compared to heap allocation of activation records.

    Another reason to think of async as light-weight threading is that a workaround for a missing, but needed, async version of a sync function is to create a worker thread to do the sync call in the background and report back to the caller when the work is done. Threads are fairly expensive to create. At the very least async interfaces allow the implementation cost to be less obvious to the developer and leave more options to the implementor (who might resort to forking a thread per-async call if they really want to).

    Finally, pervasive async programming looks a lot like CPS code, which isn't exactly pleasant. Too bad continuations haven't made it as a mainstream high level language feature.

    Tuesday Apr 10, 2007

    .safe TLD? Probably a bad idea

    F-Secure proposes a .safe TLD.

    How would a global registrar be able to vet a request from a bank in, say, Nigeria? You'd think that ccTLD registrars would be in a much better position to see to it that local registrants are vetted according to local regulations, which argues for a .safe.cc.

    Why just financial institutions? After they move to .safe all the other sensitive services left outside .safe will become targets, so why not move all medical providers, shopping sites, etcetera to .safe too?

    Confusable bank names occur in the brick-and-mortar world anyway, and those cause problems on the Internet, so how is .safe to avoid confusables?

    And so on. I think this is a bad idea. On the other hand, establishing a precedent that registrars can do better would be good!

    Thursday Apr 05, 2007

    C shell pushd/popd on steroids as Korn Shell functions

    Blogfinger inspires me to post some of my crazy KSH function code. The code below is a heavily hacked version of similar functions that I got from Will several years ago. Someday I should post my partial ASN.1 DER encoder/decoder written in KSH :)

    Syntax highlighting courtesy of VIM and its :TOhtml feature.

    typeset|grep '^integer cdx$' > /dev/null || integer cdx=0
    typeset|grep '^integer cdsx$' > /dev/null || integer cdsx=0
    
    function cdhelp
    {
        cat <<-"EOF"
            Usage: cdinit  [<file>]
            Usage: cdcl
                Initialize directory list and stack.
                Clear directory list and stack.
    
            Usage: cdfind  <path> [quiet]
            Usage: cdgrep  <partial path> [first]
                Find a directory, by exact match in the cd list.
                Find matching directories in the cd list.
    
            Usage: cdls
            Usage: cdrf    [-a|--append] [<files>|-]
                Show the cd list. Use this to save your cd list.
                Read in a cd list and replace the current cd list.
    
            Usage: cdsv    [<directory paths>] (if none given then CWD)
                Save the given or current directory to the cd list.
    
            Usage: cdow    <index>
                Overwrite the given entry in the cd list with the CWD.
    
            Usage: cdto    <index>|<path>|[+]<partial path>
                Change the current directory to an entry from the cd list,
                or, if a path is given, then change to the given path and
                save to the cd list.
    
            Usage: cdrm [<path>|<index>]
                Remove a directory, or the current directory from the cdlist.
    
            Usage: cdsort
                Sort the cd list (alphanumerically)
    
            Usage: pushd   [<directory>]
            Usage: popd    [<number>]
            Usage: rightd  [<number>] (reverse of popd)
            Usage: dirs    (shows dirs in pushd/popd stack)
                Directory stack, similar to the C-Shell built-ins of similar names.
    EOF
    }
    
    function cdfind
    {
        typeset p dir
        integer i
        if [[ $# -lt 1 || -z "$1" ]]
        then
            print "Usage: cdfind <path> [quiet]"
            return 5
        fi
        i=0
        p="$1"
        [[ "$1" != /* ]] && p="$PWD/$1"
        for dir in "${cdlist[@]}"
        do
            if [[ "$p" = "$dir" ]]
            then
                if [[ "$2" != quiet ]]
                then
                    print "Found at index $i"
                    return 0
                fi
                return $i
            fi
            i=i+1
        done
        return 255
    }
    
    function cdgrep
    {
        typeset i s
        integer i=0 s=1
        if [[ $# -lt 1 || $# -gt 2 || -z "$1" ]]
        then
            print "Usage: cdgrep <partial path> [first]"
            return 2
        fi
        while [[ i -lt cdx ]]
        do
            if eval "[[ \"${cdlist[i]}\" = *${1}* ]]"
            then
                print "$i ${cdlist[i]}"
                s=0
                [[ "$2" = first ]] && return 0
            fi
            i=i+1
        done
        return $s
    }
    
    function cdshow
    {
        typeset found_at
        cdfind "$PWD" quiet
        found_at=$?
        [[ $found_at -eq 255 ]] && found_at='[unsaved]'
        print "$found_at $PWD"
        return 0
    }
    
    function cdto
    {
        typeset i j dir
        integer i
        if [[ $# -ne 1 || -z "$1" ]]
        then
            print "Usage: cdto <index>|<path>|[+]<partial path>"
            return 2
        fi
        if [[ "$1" = +([0-9]) ]]
        then
            if cd "${cdlist[$1]}"
            then
                print "$1 ${cdlist[$1]}"
                return 0
            fi
        else
            if [[ "$1" != \+* && -d "$1" ]]
            then
                if cd "$1"
                then
                    cdsv > /dev/null
                    cdshow
                    return 0
                fi
            fi
            cdgrep "${1#\+}" first|read j dir
            if [[ -d "$dir" ]]
            then
                if cd "$dir"
                then
                    cdsv > /dev/null
                    cdshow
                    return 0
                fi
            fi
        fi
        print "Could not cd to $1"
        return 1
    }
    
    function cdls
    {
        integer i=0
    
        while [[ i -lt cdx ]]
        do
            print $i ${cdlist[i]}
            i=i+1
        done
    }
    
    cdcl ()
    {
            cdx=0
            cdsx=0
            unset cdlist
            unset cdstack
            set -A cdlist
            set -A cdstack
    }
    
    function cdsv
    {
        typeset dir current
        integer i
        if [[ "$1" = -h || "$1" = --help ]]
        then
            print "Usage: cdsv [<directory paths>] (if none given then CWD)"
            return 1
        fi
        # Look for $PWD in cdlist[]
        current=""
        if [[ $# -eq 0 ]]
        then
            current="current "
            set -- "$PWD"
        fi
        for dir in "$@"
        do
            cdfind "$dir" quiet
            i=$?
            if [[ $i -ne 255 ]]
            then
                print "The ${current}directory $dir is already in the cdlist ($i)"
                continue
            fi
            #cdlist[${#cdlist[@]}]="$PWD"
            cdlist[cdx]=$dir
            cdx=cdx+1
        done
    }
    
    # overwrite entry
    function cdow
    {
        integer i
        if [[ $# -ne 1 || "$1" != +([0-9]) ]];
        then
            print "Usage: cdow <index>   (see cdls)"
            return 1
        fi
        cdfind "$PWD" quiet
        i=$?
        if [[ "$1" -gt ${#cdlist[@]} ]]
        then
            print "Index is beyond cdlist end ($1 > ${#cdlist[@]})"
            return 1
        fi
        if [[ $i -ne 255 ]]
        then
            print "The current directory is already in the cdlist ($i)"
            return 1
        fi
        cdlist[$1]=$PWD
        return 0
    }
    
    function cdrf
    {
        typeset f status dir spath
        typeset cdx_copy cdlist_copy usage
        integer cdx_copy=0
    
        usage="Usage: cdrf [-a|--append] [<files>|-]"
        status=1
    
        set -A cdlist_copy --
    
        # Options
        while [[ $# -gt 0 && "$1" = -?* ]]
        do
            case "$1" in
                -s|--strip)
                    spath=$2
                    shift
                    ;;
                -a|--append)
                    cdx_copy=$cdx
                    set -A cdlist_copy -- "${cdlist[@]}"
                    ;;
                *)  print "$usage"
                    return 1
                    ;;
            esac
            shift
        done
    
        # Default to reading stdin
        [[ $# -eq 0 ]] && set -- -
    
        # Process cdlist files
        for f in "$@"
        do
            if [[ "$f" = - ]]
            then
                # STDIN
                sed -e 's/^[0-9]* //' | while read dir
                do
                        dir=${dir#$spath}
                        [[ "$dir" != /* ]] && dir="$PWD/$dir"
                        cdlist_copy[cdx_copy]=${dir}
                        cdx_copy=cdx_copy+1
                done
                status=0
                continue
            fi
    
            # Find the cdlist file
            if [[ ! -f "$f" && ! -f "$HOME/.cdpath.$f" ]]
            then
            print "No such cdpath file $f or $HOME/.cdpath.$f"
                continue
            fi
    
            [[ ! -f "$f" && -f "$HOME/.cdpath.$f" ]] && f="$HOME/.cdpath.$f"
    
            # Read the cdlist file
        sed -e 's/^[0-9]* //' "$f" | while read dir
            do
                    dir=${dir#$spath}
                [[ "$dir" != /* ]] && dir="$PWD/$dir"
                    cdlist_copy[cdx_copy]=${dir}
                    cdx_copy=cdx_copy+1
            done
            status=0
        done
    
        [[ $status -ne 0 ]] && return $status
    
        # Install new cdlist
        cdcl
        cdx=$cdx_copy
        set -A cdlist -- "${cdlist_copy[@]}"
        return 0
    }
    
    function cdsort
    {
        cdls | sed -e 's/^[0-9]* //' | sort -u | cdrf -
        cdls
    }
    
    function cdrm
    {
        integer i
        if [[ -n "$1" && "$1" != +([0-9]) ]]
        then
            cdfind "${1:-$PWD}" quiet
            i=$?
        elif [[ -n "$1" && "$1" = +([0-9]) ]]
        then
            i=$1
        elif [[ $# -ne 0 ]]
        then
            print "Usage: cdrm <path>|<index>"
            return 1
        else
            cdfind "${1:-$PWD}" quiet
            i=$?
        fi
        # Not found in the cdlist: nothing to remove.
        [[ "$i" -eq 255 ]] && return 1
    
        cdls|grep -v "^${i} "|sed -e 's/^[0-9]* //'|cdrf
        i=$?
        cdls
        return $i
    }
    
    function pushd
    {
        if [[ $# -gt 0 && ! -d "$1" ]]
        then
            print "Can't cd to: $1"
            return 1
        fi
        cdstack[$cdsx]="$PWD"
        if cd "${1:-.}"
        then
            cdsx=cdsx+1
            cdstack[cdsx]="$PWD"
            [[ "$2" = sv || "$2" = save ]] && cdsv
            cdshow
        else
            print "Could not cd to $1"
            return 1
        fi
        return 0
    }
    
    function pushdsv
    {
        pushd "$1" save
    }
    
    function popd
    {
        if [[ $((cdsx-${1:-1})) -lt 0 || $((${#cdstack[@]} - ${1:-1})) -lt 0 ]]
        then
            print "Empty stack or popping too much"
            return 1
        fi
        if [[ ! -d "${cdstack[cdsx-${1:-1}]}" ]]
        then
            print "Can't cd to: ${cdstack[cdsx-${1:-1}]}"
            return 2
        fi
        # Pop the requested number of entries (default 1).
        if cd "${cdstack[cdsx-${1:-1}]}"
        then
            cdsx=cdsx-${1:-1}
            cdshow
        else
            print "Could not popd to ${cdstack[cdsx-${1:-1}]}"
            return 1
        fi
        return 0
    }
    
    function dirs
    {
        integer i
        if [[ ${#cdstack[@]} -eq 0 ]]
        then
            print "Empty stack"
            return 1
        fi
        i=1
        print -n "${cdstack[0]}"
        while [[ $i -le $((cdsx)) && $i -le ${#cdstack[@]} ]]
        do
            print -n " ${cdstack[i]}"
            i=i+1
        done
        if [[ ${#cdstack[@]} -gt $cdsx && $i -lt ${#cdstack[@]} ]]
        then
            print -n " <-> "
            while [[ $i -lt ${#cdstack[@]} ]]
            do
                print -n " ${cdstack[i]}"
                i=i+1
            done
        fi
        print
        return 0
    }
    
    function rightd
    {
        typeset i
        integer i
        if [[ -n "$1" && "$1" != +([0-9]) ]]
        then
            print "Usage: rightd [<number>]"
            return 1
        fi
        i=${1:-1}
        if [[ ${#cdstack[@]} -le $((cdsx+i)) ]]
        then
            print "No directories to the right on the stack"
            return 1
        fi
        if [[ ! -d "${cdstack[cdsx+i]}" ]]
        then
            print "Can't cd to: ${cdstack[cdsx+i]}"
            return 2
        fi
        if cd "${cdstack[cdsx+i]}"
        then
            cdsx=cdsx+i
            cdshow
        else
            print "Could not cd to ${cdstack[cdsx+i]}"
            return 1
        fi
        return 0
    }
    
    cdinit ()
    {
        [[ -n "${recd}" ]] && return 0
        cdcl
        if [[ $# -eq 0 && -f ~/lib/cdpaths ]]
        then
            cdrf < ~/lib/cdpaths
        elif [[ $# -eq 1 ]]
        then
            cdrf "$1"
        fi
        cdls
    }
    
    vicd ()
    {
        ${EDITOR:-vi} ~/.kshcd
    }
    
    recd ()
    {
        typeset recd
        recd=inprogress
        . ~/.kshcd
        recd=""
    }
    
    cdinit
    

    Friday Mar 30, 2007

    Neptune and IPsec

    That new NIC of ours rocks. Its best feature: incoming packet classification offload. "What?" you ask? Neptune can route incoming packets to the CPUs most closely associated with the packet flows to which those incoming packets belong -- and this means lower latency because of hotter caches. Understood?

    This classification works on 5-tuples (or hashes thereof), of course: source and destination addresses, next protocol (e.g., TCP, UDP, SCTP), source and destination port numbers.
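
    A toy illustration of the idea: hash the 5-tuple and use the result to pick a receive ring/CPU, so packets belonging to one flow keep hitting the same warm cache. The hash below is an arbitrary stand-in, not Neptune's.

    #include <stdio.h>
    #include <stdint.h>

    struct flow5 {
        uint32_t saddr, daddr;      /* source/destination addresses */
        uint8_t  proto;             /* next protocol: TCP, UDP, SCTP, ... */
        uint16_t sport, dport;      /* source/destination ports */
    };

    static unsigned
    flow_to_cpu(const struct flow5 *f, unsigned ncpus)
    {
        uint32_t h = f->saddr;

        h = h * 31 + f->daddr;
        h = h * 31 + f->proto;
        h = h * 31 + f->sport;
        h = h * 31 + f->dport;
        return (h % ncpus);
    }

    int
    main(void)
    {
        struct flow5 f = { 0x0a000001, 0x0a000002, 6 /* TCP */, 52000, 445 };

        printf("flow -> cpu %u of 32\n", flow_to_cpu(&f, 32));
        return (0);
    }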

    Curiously absent from the data sheet: IPsec. So I asked and I found out: Neptune can classify just as well by IPsec SA SPI as by plaintext 5-tuples. Of course, so can Solaris, therefore Neptune, Niagara and Solaris fit together well, IPsec or no IPsec.

    Excellent.

    Tuesday Mar 27, 2007

    Google's J2ME apps

    Jonathan Schwartz raves about Google Maps on his Blackberry. I couldn't agree more. Two weeks ago I depended on it heavily (running on my Samsung SPH900) to get around in southern Florida, where I went on vacation. It found locations, plotted routes, and found gas stations near where I was, all very quickly and easily. Wow. I also use Google's GMail J2ME application for my personal e-mail (actually, the primary reason I use GMail is that I get to use it on my cell phone with a wonderful UI).

    About

    I'm an engineer at Oracle (erstwhile Sun), where I've been since 2002, working on SunSSH, Solaris Kerberos, Active Directory interoperability, Lustre, and misc. other things.
