Monday Jan 22, 2007

What's an iPhone without the phone?

What's an iPhone without the phone? My guess: the next-gen iPod with iChat w/ audio, or a Skype-like app? Seems like the logical progression to me...

Monday Jul 17, 2006

Bringing up Solaris domain 0 on Xen

Bringing up Solaris domain 0 (dom0) on Xen was surprisingly easy. Mostly because all of the hard work was already done by other people. The hard work which remained was also done by other people :-)

I apologize in advance for giving credit to the wrong folks or for taking credit for something I didn't do. This was such a blur, it all tends to blend together...

Obviously, this won't cover everything. I tried to talk about some of the more interesting parts. Well, interesting is relative of course :-)

To start with, you first need to be able to build xen on Solaris. You could actually cheat and start with a xen image and skip all the user apps to manage domUs. But that seems kind of pointless unless you have tons of bodies to throw at the effort, which we don't, thankfully.

John L and Dave already had Xen building, so all I had to do was ask them what I needed to do to build it. The first things you need are changes to the gcc and binutils that are shipped in /usr/sfw. That's why you need to download unofficially updated SUNWgcc, SUNWgccruntime, and SUNWbinutils packages in order to build the xen sources on Solaris (they will be officially updated at some point in the future).

There were two things that John L fixed. The first was a bug in how we build gcc (it can't find its own ld scripts). See this bug.

The second fix was to add a -divide option to the binutils gas so it doesn't treat / as a comment. John got this change back into the binutils CVS repository, but it hasn't made it out in a release yet (as far as I know).

Of course, Dave and John L had to change stuff in the xen.hg gate to get it to compile too. If you look at the source, you'll notice there are a few things we don't currently try to compile, e.g. hvm related support. Then, of course, you need to test it to make sure the xen binary works (user apps would have to wait until Solaris dom0 was up). Not sure if it just worked or they had to debug it, but it was working by the time I got to it :-)

So after I built my xen gate, I put xen.gz in /boot (starting with 32-bit dom0) and tried to boot an i86xen (vs i86pc) version of the kernel debugger (kmdb). Again, I was following footsteps here. John L had done a ton of work getting kmdb to work in domU (since we already had Solaris domU running on a Linux dom0). And Todd and/or John L had already debugged kmdb on a Solaris dom0. So I was at the kmdb prompt, ready to venture into unknown territory.

So before I could boot my Solaris dom0, I had to build one. Up to this point, we only had the driver changes we needed for domU. Before xen, we only had one x86 "platform", i86pc.

This is unlike SPARC, which usually gets a new "platform" for every major architecture change (e.g. sun4m, sun4u, sun4v). On SPARC, you'll also see machine specific platmods and platform directories to provide additional functionality and modules which are specific to a given machine (e.g. /platform/SUNW,Sun-Fire-880).

For xen (on x86), we have a new "platform", i86xen. For Solaris dom0, we were missing all of the drivers which were in i86pc (i.e. they did not show up in i86xen). The vast majority of these drivers aren't platform specific and can go into intel, i.e. the directory for code without any platform specific bits (where the platforms today are i86pc and i86xen). So I had to go through each driver and see whether it had platform specific code or not. Since there was only one intel "platform" in the past, the lines were a little gray at times. But I finally got through it and ended up moving around 40 drivers in src/uts, and a little over 15 in closed/uts, from i86pc to intel. For the rest, I needed to create makefiles in i86xen to build platform specific versions of those drivers.

Now I had a Solaris dom0 kernel to boot. I set up my cap-eye install kernel, rebooted into kmdb, and :c'd into a new world. The majority of the hard work was already done bringing up domU. The CPU and VM code for domU, done by Tim, Todd, and Joe, just worked for domain 0. That made life very simple.

The first problem I ran into was the internal PCI config access setup in mlsetup. It was initially shut off for domU; I had added it back in for dom0. However, this setup requires a call to the BIOS, which xen doesn't allow. So I changed the code to default to PCI_MECHANISM_1 for i86xen dom0.
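
For background, mechanism 1 is the standard port I/O way of generating PCI config cycles on x86. Here's a minimal sketch of the textbook method (not the actual Solaris code; outl/inl stand in for the kernel's port I/O routines):

#define	PCI_CONFADD	0xcf8	/* config address port */
#define	PCI_CONFDATA	0xcfc	/* config data port */

/* read a 32-bit PCI config register using mechanism 1 */
uint32_t
pci_mech1_getl(int bus, int dev, int func, int reg)
{
	uint32_t addr;

	/* bit 31 enables config cycles; reg must be dword aligned */
	addr = (1U << 31) | (bus << 16) | (dev << 11) |
	    (func << 8) | (reg & 0xfc);
	outl(PCI_CONFADD, addr);
	return (inl(PCI_CONFDATA));
}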

From there, the next problem I ran into was that ins/outs weren't working. That was fixed with a HYPERVISOR_physdev_op (PHYSDEVOP_SET_IOPL), which ended up being slightly wrong and was fixed by Todd before we released.
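
The exact shape of that hypercall varied across Xen versions, but circa Xen 3.0 it looked something like this (a sketch, not the actual Solaris code); it asks the hypervisor to let kernel-mode code in the domain use I/O ports directly:

	physdev_op_t op;

	op.cmd = PHYSDEVOP_SET_IOPL;	/* allow the domain to do ins/outs */
	op.u.set_iopl.iopl = 1;
	(void) HYPERVISOR_physdev_op(&op);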

Now I was at the point where we were attaching drivers and the drivers were trying to map in their registers. Joe had done a bunch of work in the VM getting the infrastructure ready for foreign PFNs, which are basically PFNs which are tagged to mark them as containing the real MFN, instead of being present in the mfn_list. Since this was the first time trying that code out, I ran into a couple of minor bugs. The more interesting problem was that a debug version of Xen was using one of the software PTE bits, which conflicted with the bit we were using to mark the page as foreign. I commented out that feature, rebuilt Xen, and continued on while Joe worked on changing the PTE software bits to be encoded instead of individual flags, to avoid bit 2 in the PTE software field.
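
The foreign PFN trick is simple enough to sketch. The names and the bit position below are invented for illustration (the real code lives in the VM changes mentioned above):

/* invented bit position: marks a "PFN" that really holds a raw MFN */
#define	PFN_IS_FOREIGN		(1UL << 60)

/* wrap an MFN so it can travel through code that expects a PFN */
#define	MFN_TO_FOREIGN_PFN(mfn)	((pfn_t)(mfn) | PFN_IS_FOREIGN)

/* is this a tagged MFN with no mfn_list entry behind it? */
#define	PFN_IS_FOREIGN_MFN(pfn)	(((pfn) & PFN_IS_FOREIGN) != 0)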

I had already changed the code in rootnex to convert the MFN (for device register access) to a foreign PFN during ddi_regs_map_setup(). So once the PTE software bits were cleared, we were sailing through the driver reading its device registers and on to mapping memory for device DMA.

I had also modified the rootnex dma bind routine. When we're building dma cookies, we need to put MFNs in the cookies instead of PFNs. I had a couple of bugs in that code; once I fixed those up, I ran into the contig alloc code path. I hadn't coded up the contig alloc changes yet (where we want to allocate physically contiguous memory). So I cheated and temporarily took out all the drivers which required contig alloc, and did the contig alloc code at a later time (my boot device didn't need it :-) )
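
To make the cookie change concrete, here's a sketch of the translation (the helper name is invented; mfn_list is the PFN-to-MFN table mentioned above). The cookie has to carry a machine address, because that's what actually appears on the bus:

/* sketch: turn a PFN + page offset into the machine address for a cookie */
static uint64_t
pfn_to_machine_addr(pfn_t pfn, off_t pageoff)
{
	mfn_t mfn = mfn_list[pfn];	/* PFN -> MFN translation */

	return (((uint64_t)mfn << PAGESHIFT) + pageoff);
}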

Now I was up to vfs_mountroot(). This is where the Solaris drivers start taking over disk activity and stop using the BIOS to load blocks. This is also where we first start noticing problems if interrupts don't work.

This is where I handed off to Stu :-). This was the last of the hard problems. Stu had been busily working on Solaris dom0 interrupt support: a mix of event channels, pcplusmp, ACPI, and APICs. Something I would never wish on anyone. Stu got it up and working remarkably fast (something he should talk about :-)) and I was back and running up to the console handover.

The console config code is a little bit messy in Solaris. I waded through that for a little bit. All of the code was originally in the common intel part of the code. I moved the platform specific code into i86pc and i86xen, with a different implementation in i86xen which basically always sends the Solaris console to the Xen console. Not sure if it will stay that way in the end, but that makes the most sense IMO.

And from there, I was at the multi-user prompt...

Some other interesting problems I ran into during the bringup: I had to have isa fail to attach on a Solaris domU, since the ISA leaf drivers assume the device is present and bad things happen. There were a couple of places in the kernel with hard coded physical addresses which it tries to map in (e.g. psm_map_phys_new; the lower 1M of memory, used for BIOS tables, etc.; and xsvc, used by Xorg/Xsun). And we found out the hard way that Xen's low mem alloc implementation is linux specific: it only allocates memory < 4G && > 2G. We need to redo our first pass at implementing memory constrained allocs.

As far as booting 64-bit Solaris dom0 goes, it booted up the first time.

Well, that's enough for now. I'll save the bringup of domUs on a Solaris dom0 for the next post. That was a little more challenging...

Wednesday Nov 16, 2005

New x86 rootnex code and dtrace

In snv_24, there were some significant changes to the DDI DMA routines in the x86 rootnex.

For driver writers, there's some additional visibility into what's going on with the DDI DMA interfaces via dtrace. The following is a hacked up dtrace script to give an idea of what you can see. I need to clean it up and explain what you're seeing in more detail, but for now, I'll just put it out there.

#!/usr/sbin/dtrace -Fs

sdt:rootnex:*:rootnex-bind-prealloc
{
	@prealloc_cookies[arg0] = count();
}

sdt:rootnex:*:rootnex-bind-alloc
{
	@ba[arg0] = count();
}

sdt:rootnex:*:rootnex-alloc-handle
{
	@lq[probefunc] = lquantize(arg0, 0, 16384, 128);
}

sdt:rootnex:*:rootnex-bind-fast
{
	@lq[probefunc] = lquantize(arg1, 0, 16384, 128);
	@bf[arg0] = quantize(arg2);
}

sdt:rootnex:*:rootnex-bind-slow
{
	@lq[probefunc] = lquantize(arg1, 0, 16384, 128);
	@bs[arg0] = quantize(arg2);
}

sdt:rootnex:*:rootnex-bind-sp-alloc
{
	@bw[arg0] = count();
}

sdt:rootnex:*:rootnex-sync-dev
{
	@sd[arg0] = sum(arg1);
}

sdt:rootnex:*:rootnex-sync-cpu
{
	@sc[arg0] = sum(arg1);
}

sdt:rootnex:*:rootnex-alloc-copybuf
{
	@copybuf_alloc[arg0] = sum(arg1);
}

sdt:rootnex:*:rootnex-sgllen-window
{
	@sgllen_window[arg0] = count();
}

sdt:rootnex:*:rootnex-copybuf-window
{
	@copybuf_window[arg0] = count();
}

sdt:rootnex:*:rootnex-maxxfer-window
{
	@maxxfer_window[arg0] = count();
}

fbt:unix:i_ddi_mem_alloc:entry
{
	@mem_alloc[arg0] = sum(arg2);
	@c[probefunc] = count();
}

fbt:genunix:ddi_dma_alloc_handle:entry
{
	@c[probefunc] = count();
}

fbt:genunix:ddi_dma_addr_bind_handle:entry
{
	@c[probefunc] = count();
}

fbt:genunix:ddi_dma_buf_bind_handle:entry
{
	@c[probefunc] = count();
}

fbt:genunix:ddi_dma_mem_alloc:entry
{
	@c[probefunc] = count();
}

END
{
	printf("\\n\\n");

	printf("\\nDMA Function Count\\n");
	printf("    %-26s\\tCNT\\n", "FUNCTION");
	printa("    %-26s\\t%@u\\n", @c);

	printf("\\nalloc: bytes allocated in dma alloc\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @mem_alloc);

	printf("\\nbind: used pre-alloced sgl storage (fast)\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @prealloc_cookies);

	printf("\\nbind: had to alloc memory to store sgl (slow)\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @ba);

	printf("\\nbind: had to alloc memory for window/copybuf state (slow)\\n");	
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @bw);

	printf("\\nbind: bytes allocated for copybuf\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @copybuf_alloc);

	printf("\\nbind: sgllen windows\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @sgllen_window);

	printf("\\nbind: copybuf windows\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @copybuf_window);

	printf("\\nbind: maxxfer windows\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @maxxfer_window);

	printf("\\nbind: took fastpath\\n");
	printa("    DIP = 0x%p\\tvalue=bindsize%@u\\n", @bf);

	printf("\\nbind: took slowpath\\n");
	printa("    DIP = 0x%p\\tvalue=bindsize%@u\\n", @bs);

	printf("\\nalloc/bind: outstanding handles and binds\\n");
	printa("    %s\\t%@u\\n", @lq);

	printf("\\nsync: bytes copied for sync dev\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @sd);

	printf("\\nsync: bytes copied for sync cpu\\n");
	printf("    %-18s\\tCNT\\n", "DIP");
	printa("    0x%p\\t%@u\\n", @sc);
}

Monday Aug 15, 2005

Solaris x86 rootnex warning in syslog

You're probably here because you searched for "x86 rootnex warning" or something like that on google. :-) If so, you came to the right place. I'm including below a slightly edited heads-up message that I sent out when I putback the bug fix related to this warning. It gives some details on the warning.

But first, a little background. Towards the end of s10 development, I fixed a couple of old x86 rootnex bugs. One of these incorrectly returned success from a dma bind operation instead of failing it. This fix ended up finding a few driver bugs. Since we were getting relatively close to s10 RR, and this was a hard to diagnose problem, I ended up printing a warning when we hit this condition. We found a few 3rd party drivers with sgllen related bugs in Solaris Express after this was fixed.

I heard some folks still occasionally run into this, so I figured I'd put this info out there... Here's the rootnex code in question and here's the original bug. Below is a slightly edited version of the heads-up that I sent out.


My recent putback of:

(P1) 3001685 ddi: DMA breakup routines do not match DDI specification
     4926500 ddi: x86 DMA breakup with sgllen==1 can give ncookies!=1
     4796610 ddivs_dmae/ddi_check_dma_handle_* 3 assertions fail due to a product bug

could cause existing buggy drivers, which are thought to be functioning
correctly, to start failing.


------------------


 o description of the problem - what's wrong

   The x86 root nexus's ddi_dma_*_bind_handle() implementation
   can wrongly return more cookies than a driver specifies is
   the maximum that it can handle (when DDI_DMA_PARTIAL is not
   specified). This can be a very serious problem, causing silent
   data corruption. There can be a lot of corner cases depending
   on memory fragmentation, making it difficult to test for.
   Unfortunately, the fix for these bugs can also expose (and has
   exposed) minor bugs in existing drivers, which can be hard to
   identify and debug. An example is the elxl driver, which
   specified that it could only handle 1 cookie for the tx data
   buffers, but really could handle 2 (and was getting 2
   occasionally).


 o does the problem happen on sparc?  why not?

   This problem does *not* happen on SPARC. This is a bug
   in the x86 rootnex driver. This part of the code is not
   shared with any SPARC nexus driver code.


 o what rootnex was doing wrong

   The rootnex driver would return success from a DMA bind
   operation when the cookie count was larger than the maximum
   the driver specified that it could handle, and the driver
   specified that it couldn't handle partial mappings.


 o what rootnex is now doing right

   The rootnex driver fails a DMA bind operation when
   the cookie count would be larger than the maximum
   that a driver specifies that it can handle, and the
   driver specifies that it cannot handle partial mappings.


 o what drivers were doing wrong

   The vast majority of drivers aren't doing anything wrong.
   There may be a few which are not handling the DMA bind
   operation correctly. For example, if a driver cannot
   handle partial DMAs, it should be able to handle the
   following # of cookies ((max possible bind size / page size) + 1).
   If it cannot handle that number of cookies, it must be
   able to expect and correctly handle a failed bind
   operation.


 o what they need to do right.

   If a driver cannot handle partial DMAs, it should be able
   to handle the following # of cookies
   ((max possible bind size / page size) + 1).
   If it cannot handle that number of cookies, it must be
   able to expect and correctly handle a failed bind
   operation.
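
   To make the arithmetic concrete, here's a hypothetical attribute
   setup for a driver that binds at most 64K at a time on a 4K page
   system: (64K / 4K) + 1 = 17 cookies, since the buffer can start
   mid-page. The values here are illustrative only, not a
   recommendation:

   static ddi_dma_attr_t my_dma_attr = {
           DMA_ATTR_V0,                /* dma_attr_version */
           0x0000000000000000ull,      /* dma_attr_addr_lo */
           0x00000000ffffffffull,      /* dma_attr_addr_hi: 32-bit engine */
           0x00000000ffffffffull,      /* dma_attr_count_max */
           0x1000,                     /* dma_attr_align */
           1,                          /* dma_attr_burstsizes */
           1,                          /* dma_attr_minxfer */
           0x0000000000010000ull,      /* dma_attr_maxxfer: 64K max bind */
           0xffffffffffffffffull,      /* dma_attr_seg */
           17,                         /* dma_attr_sgllen: (64K/4K) + 1 */
           512,                        /* dma_attr_granular */
           0                           /* dma_attr_flags */
   };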


 o does this mean a driver which was working ok before
   could now be broken, because a driver may have been
   written to work around the rootnex bug?

   No. A driver didn't need to work around the bug. It does
   mean that a driver which was working or was appearing to
   work before, could now be broken. This could be because the
   code path for the failing bind operation has never been
   tested or the driver never expected the bind operation
   to fail (i.e. the driver has a bug which they never
   debugged since it never failed before unless they noticed
   memory corruption).


 o instructions for using flags

       There are two patchables, rootnex_bind_fail & rootnex_bind_warn. The
       following table explains the behavior of the new bind operation
       failure. The current default behavior is to fail the bind and
       print one warning message per major number.

 rootnex_bind_fail |  rootnex_bind_warn  | Results if sgllen < *ccount
                   |                     | && !DDI_DMA_PARTIAL
 ---------------------------------------------------------------------
        0          |          0          | behaves like code today (bind succeeds, no warning message)
        0          |          1          | bind succeeds, print one warning/major#
        1          |          0          | fails the bind, no warning message
        1          |          1          | fails the bind, print one warning/major#

       To revert to the previous behavior, which is incorrectly returning
       success from these ddi_dma_*_bind_handle() operations, put the
       following in /etc/system then reboot.
          set rootnex:rootnex_bind_fail = 0

       To disable the warning message, put the following in /etc/system
       then reboot.
          set rootnex:rootnex_bind_warn = 0

       Anyone fixing bugs found by the warning should make sure they test
       their fix *without* rootnex_bind_fail = 0, and set kmem_flags = 0x3f
       to ensure they clean up correctly after the failure (since it's likely
       that code path hasn't been tested). e.g. put the following in
       /etc/system
          set rootnex:rootnex_bind_fail = 1
          set kmem_flags = 0x3f


 o so you've got a driver's source code.  how can you
   inspect the code to check for possible errors
   and how can you fix them?

   First you must understand the maximum possible bind size
   that your driver will see. This may not be trivial.
   For example, the ata driver handles ~64K maximum buffers
   for normal operation, but newfs goes through /dev/rdsk,
   which doesn't break buffers down to ~64K buffers initially.
   Once you understand the maximum possible bind size, make sure
   sgllen is set to ((max possible bind size / page size) + 1)
   and that you actually handle that many cookies. Or make
   sure you handle the fail case correctly.


 o can you run your fixed driver on earlier solaris releases?

   Yes. A correctly written driver will run fine on both S10
   and earlier Solaris releases. This bug fix only fixes a
   problem of not correctly identifying a failure case. If a
   driver doesn't generate a failure case, or correctly
   handles the failure case, there will not be any problems.


 o What can I do if my system doesn't boot anymore

   In the unlikely event your system doesn't boot anymore,
   you can recover the system by setting a deferred breakpoint
   in rootnex_attach, clearing the rootnex_bind_fail patchable,
   and then, once you've booted, adding set rootnex:rootnex_bind_fail = 0
   to /etc/system.

   An example is provided below...

     Boot args: 

     Type    b [file-name] [boot-flags]       to boot with options
     or      i                                to enter boot interpreter
     or                                       to boot with defaults

                       <<< timeout in 5 seconds >>>

     Select (b)oot or (i)nterpreter: b kmdb -d
     Loading kmdb...

     Welcome to kmdb
     kmdb: Unable to determine terminal type: assuming `vt100'
     [0]> ::bp rootnex`rootnex_attach
     [0]> :c
     SunOS Release 5.10 Version gate:2004-10-18 32-bit
     Copyright 1983-2004 Sun Microsystems, Inc.  All rights reserved.
     Use is subject to license terms.
     Loaded modules: [ ufs unix krtld genunix specfs ]
     kmdb: stop at rootnex`rootnex_attach
     kmdb: target stopped at:
     rootnex`rootnex_attach: pushl  %ebp
     [0]> rootnex`rootnex_bind_fail?W 0
     rootnex`rootnex_bind_fail:      0x1             =       0x0
     [0]> :c
     Hostname: ...
     [hostname] console login: root
     Password: 

     [CUT]
     bfu'ed from /ws/on10-gate/archives/i386/nightly-nd on 2004-10-18
     Sun Microsystems Inc.   SunOS 5.10      s10_68  December 2004
     # echo "set rootnex:rootnex_bind_fail = 0" >> /etc/system
     #

Tuesday Jun 14, 2005

Solaris x86, Device DMA, and the DDI

I'm going to start a monthly blog entry on a DDI subject of your choice... Now that OpenSolaris is available, I can get into some decent detail referencing kernel code when needed.

So I'm seeking a topic for July... Anyone interested, submit your DDI topic of interest in the comments and I'll pick one for July....

For June, I'm going to talk about Solaris x86, device DMA, and the DDI. Mostly because that's what I have been spending some of my time on lately... I write code, not documents, so don't expect too much. I can't spell worth a damn... My grammar is poor. I use the wrong words a lot. I'll probably jump around a lot too :-). Hopefully, you'll still get something meaningful out of this ;-)

I'm assuming for this entry you already know a little bit about the DDI DMA interfaces in Solaris. If not, you can look at the following manpages for a little background...

  • ddi_dma_alloc_handle(9F)
  • ddi_dma_free_handle(9F)
  • ddi_dma_addr_bind_handle(9F)
  • ddi_dma_buf_bind_handle(9F)
  • ddi_dma_unbind_handle(9F)
  • ddi_dma_sync(9F)
  • ddi_dma_getwin(9F)
  • ddi_dma_nextcookie(9F)
  • ddi_dma_attr(9S)
  • ddi_dma_cookie(9S)

The implementation of these routines lives in sunddi.c, where sunddi.o resides in genunix (/kernel/amd64/genunix). You'll see when you look at this code that most of these routines are just simple wrappers which eventually end up in architecture specific code.

Jumping ahead a little: on x86, we end up in the rootnex driver. The rootnex driver is the x86 root nexus driver. Nexus drivers implement the busops interface in dev_ops. Basically, drivers are hierarchical, where nexus drivers can have children which could be either other nexus drivers or leaf drivers. A leaf driver is the last driver in a branch (i.e. it can't have any children). The root nexus driver is the root driver, similar to the root of a filesystem. Anyway, that's a subject for another entry. For now, just trust me that we end up in rootnex for x86 :-)

So a quick mapping of the code: the handle alloc/free, the binds, unbind, sync, and getwin all bottom out in the rootnex driver on x86, while ddi_dma_nextcookie stays in genunix...

Now let me be the first to say this is pretty rough code... This should be changing soon, but for now, you have been warned... So now that you know where the code is, I'm going to jump back up to a higher level...

If you're still interested, you probably already know the normal sequence is to alloc a dma handle, bind a buffer, get your cookies (physical addresses to DMA into), sync if reading from memory, do your DMA, etc...
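
To make that concrete, here's a minimal sketch of the usual flow (error handling and the device specific bits are trimmed, and my_dma_attr stands in for whatever ddi_dma_attr_t your driver defines):

ddi_dma_handle_t hdl;
ddi_dma_cookie_t cookie;
uint_t ccount, i;

/* allocate a DMA handle; attrs are validated, state pre-allocated */
(void) ddi_dma_alloc_handle(dip, &my_dma_attr, DDI_DMA_SLEEP, NULL, &hdl);

/* bind a kernel virtual buffer; get the first cookie + cookie count */
(void) ddi_dma_addr_bind_handle(hdl, NULL, kaddr, len,
    DDI_DMA_READ | DDI_DMA_STREAMING, DDI_DMA_SLEEP, NULL,
    &cookie, &ccount);

/* program each cookie (physical address + size) into the device */
for (i = 0; i < ccount; i++) {
	/* use cookie.dmac_laddress and cookie.dmac_size here */
	if (i + 1 < ccount)
		ddi_dma_nextcookie(hdl, &cookie);
}

/* ... start the DMA and wait for the device to finish ... */

/* device wrote memory (DDI_DMA_READ), so sync before the CPU looks */
(void) ddi_dma_sync(hdl, 0, 0, DDI_DMA_SYNC_FORCPU);

(void) ddi_dma_unbind_handle(hdl);
ddi_dma_free_handle(&hdl);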

When a dma handle is allocated, the rootnex driver will do some validation on the ddi_dma_attr and pre-allocate some state for you. Nothing very exciting... The fun stuff happens in the bind code, which will be the topic for the rest of this entry. Instead of walking through the code, I'll walk through the concepts... The code should be changing soon, so I don't want to spend a lot of time on code which may not be the same by the time you read this. But first, some terminology I'll be using, which doesn't always match up with other folks' terminology. Sometimes I like to redefine things too :-).

  • Scatter/Gather List (SGL) - a list of physically contiguous buffers
  • Cookie - a single physically contiguous buffer, i.e. an SGL element
  • SGL Length - the maximum number of cookies/SGL elements the DMA engine supports
  • Copy Buffer - a bounce buffer/intermediate buffer, used as a temporary buffer to DMA to/from when the DMA engine can't reach the physical address we are supposed to DMA into

Jumping to the fun stuff, the first concept in the bind is how the buffer is passed down. It can be a kernel virtual address (KVA) w/ size, a linked list of physical pages (without a kernel address), or an array of physical pages (with a kernel virtual address [shadow I/O]). For each page in the buffer, the rootnex driver has to make sure that the dma engine can reach the physical address. There is a DMA engine low address limit and a high address limit passed in the ddi_dma_attr during ddi_dma_alloc_handle(9F) which the rootnex driver uses to do this.

For every page which can't be reached, the rootnex driver will use part of a copy buffer. For these pages, the device will DMA into the copy buffer, and not the actual buffer. The data will be copied to/from the copy buffer when the driver calls ddi_dma_sync(9F). So the driver had better make sure it has syncs in the right place and has the direction correct! Continuing... The copy buffer has a fixed maximum size. Each bind will get its own copy buffer if needed. If the amount of copybuf required in a single bind is greater than the maximum size of a copy buffer, the bind will need to be a partial bind and will require multiple windows. This is a concept I'll talk about further down...
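
Since getting the sync direction wrong is such an easy way to corrupt data when a copy buffer is in play, here's the convention in a nutshell (hdl is a bound handle, as in the sketch above):

/* memory -> device (DDI_DMA_WRITE): sync *before* starting the DMA,
 * so the data is copied from the real buffer into the copybuf the
 * device will actually read from */
(void) ddi_dma_sync(hdl, 0, 0, DDI_DMA_SYNC_FORDEV);

/* device -> memory (DDI_DMA_READ): sync *after* the DMA completes,
 * so the data is copied from the copybuf into the real buffer */
(void) ddi_dma_sync(hdl, 0, 0, DDI_DMA_SYNC_FORCPU);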

What happens when a linked list of physical pages w/o a KVA comes down, you ask? Good question! Well, currently, the rootnex driver will allocate some KVA space (vmem_alloc) without physical memory to back it up, and then map it to the physical page on the fly during sync. Not pretty. This should be changing for the 64-bit kernel in the near future (homework: what is seg kpm). How come the DMA engine can reach the copy buffer, but can't reach the original DMA buffer, you ask? You guys are good... Well, most DMA buffers originate from userland or from a kernel stack which has no idea what the constraints of the DMA engine are (and it shouldn't, since there may be multiple DMA engines with different constraints). The copy buffer is allocated from the same underlying routines that ddi_dma_mem_alloc(9F) uses, which take into account the DMA engine's constraints. i.e. the copy buffer is allocated specifically for the DMA engine we are using...

The copybuf code path got, and is still getting, a lot of usage in s10 and above once we went to a 64-bit kernel on x86. The number of x86 machines with > 4G of memory has gone up tremendously since you can actually use the memory more efficiently these days. OK, maybe efficiently isn't the right word, but you get my point... ;-) A lot of devices only support 32-bit DMA addresses, so they correctly set their DMA high address to 0xFFFF.FFFF. Any physical address above this will require a copy buffer on x86 (on SPARC, we have an IOMMU so it doesn't have this problem, but that's a different entry).

Jumping to the side for a sec... don't confuse 64-bit DMA addresses with a 64-bit card. You may have a 32-bit/33MHz PCI card which supports 64-bit addresses via dual address cycles (DAC), you may have a 64-bit/66MHz PCI card which only supports 32-bit DMA addresses, or you could have an x8 PCI Express card which only supports 32-bit DMA addresses. The speed of the card and the number of bytes that can be transferred in a clock have nothing to do with the DMA address width. If a device only supports a 32-bit DMA address, it will not be able to reach memory above 4G and will require a copy buffer.

Jumping back. It gets more interesting from here. Memory organization on SMP Opteron systems is very similar to our SPARC systems. The memory controller is in the CPU chip (which could have multiple cores). So if I have a two chip Opteron based system, I have at least 2 memory controllers. Solaris is smart, and will allocate memory closest to the core you are running on. Going back to the 2 chip Opteron system: if the system has 16G of memory, and I am a process running on chip 2, when I allocate memory, its physical address will be above 8G (0 - 8G is attached to chip 1). So all I/O on chip 2 will need a copy buffer for a DMA engine with a high address limit of 0xFFFF.FFFF. Lesson learned: if you want performance on this type of system, use a device which supports 64-bit DMA addresses. And make sure if your device supports 64-bit DMA addresses, the driver supports 64-bit DMA addresses!

OK, enough about copy buffers. Jumping back to ddi_dma_attr for a moment. dma_attr_align is used during ddi_dma_mem_alloc(9F); don't expect it to do anything for you in the bind. dma_attr_count_max and dma_attr_seg limit the size of a cookie. If I have a 1M buffer which is physically contiguous, normally I would get an SGL length of 1 and the single cookie would be 1M in size. If I set seg or count_max to 256K-1, I would get an SGL length of 4 or 5 (depending on whether the start address was page aligned), where each cookie would be <= 256K in size. Why do we have both seg and count_max? Don't know...

OK, we finally arrive at windows... The fun stops here. Basically, a window is supposed to be a piece of a DMA bind that fits within the DMA constraints. i.e. if I have a bind which the DMA engine cannot handle in a single transfer, and the driver/device supports partial mappings, the DDI is supposed to break it into multiple windows where each window can be handled by the DMA engine (see the sketch after the list below). Again, jumping back to ddi_dma_attr, there are three things which should require the use of multiple windows during a bind:

  • We need more copybuf space than the maximum copy buffer size allowed
  • The number of cookies required to bind the buffer is greater than the maximum number of cookies the H/W can handle (dma_attr_sgllen)
  • The size of the bind is greater than the maximum transfer size of the DMA engine (dma_attr_maxxfer)
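
For completeness, here's what consuming those windows looks like from the driver side via ddi_dma_getwin(9F). A minimal sketch, with error handling trimmed and hdl/kaddr/buflen as in the earlier sketch:

ddi_dma_cookie_t cookie;
uint_t ccount, nwin, win;
off_t off;
size_t len;

if (ddi_dma_addr_bind_handle(hdl, NULL, kaddr, buflen,
    DDI_DMA_WRITE | DDI_DMA_PARTIAL, DDI_DMA_SLEEP, NULL,
    &cookie, &ccount) == DDI_DMA_PARTIAL_MAP) {
	/* the bind didn't fit the constraints; walk it window by window */
	(void) ddi_dma_numwin(hdl, &nwin);
	for (win = 0; win < nwin; win++) {
		/* activate window 'win'; get its first cookie + count */
		(void) ddi_dma_getwin(hdl, win, &off, &len, &cookie, &ccount);
		/* program the ccount cookies, do the I/O for this window */
	}
}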

But, as a historical note, that's not the way it was originally implemented in the original x86 port. At the time this was written, the only time you will get multiple windows is when we need more copybuf space than the maximum copy buffer size allowed. This should be fixed shortly, but you will still have to handle how the current implementation works for the driver to operate correctly on s10 and before (I'll explain what that behavior is shortly). Don't worry though; once this is fixed, a driver which handles the old behavior will still work great with the correct behavior.

Once we need multiple windows, the rootnex has to worry about the granularity of the device (dma_attr_granular). A device can only transfer data in even multiples of the granularity, e.g. if the granularity is set to 512, the size of a window must be an even multiple of 512. So when the rootnex gets to the end of a window, it sometimes has to subtract some data from the current window and put it into the next window to ensure the current window size is a multiple of the granularity. This is referred to as trimming in the code. This gets pretty complicated with the way the rootnex DMA code is currently architected, and it was the source of a fair number of bugs, for which I had to put some not so obvious hacks in there...

And last, but not least, what happens today in a bind when the driver supports a partial bind and one of these two conditions is hit:

  • The number of cookies required to bind the buffer is greater than the maximum number of cookies the H/W can handle (dma_attr_sgllen)
  • The size of the bind is greater than the maximum transfer size of the DMA engine (dma_attr_maxxfer)

Tune in next week..

Sorry, couldn't resist... Well, I don't think there's an official word for it, but I'm going to make up something as I type, because, remember, I like to do that sort of thing ;-). We get a superwindow, where a superwindow is a window larger than the DMA engine can handle. However, a superwindow is properly trimmed at the conditions mentioned above. So when the driver is going through the cookies, if the next cookie puts it over the DMA engine's sgllen or maxxfer size, it can consider that cookie the start of the next window. So it puts more work back on the driver writer. Of course, if you've already written a Solaris driver for x86 which supports partial mappings, you have probably already figured that out :-/.
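
In other words, the driver ends up cutting the cookie stream at the sgllen boundary itself. Here's a sketch of the idea, using hypothetical driver routines start_transfer() and add_cookie(), ignoring maxxfer for brevity, and with my_dma_attr/hdl/cookie/ccount as in the earlier sketch:

uint_t i, used = 0;

for (i = 0; i < ccount; i++) {
	if (used == my_dma_attr.dma_attr_sgllen) {
		/* the DMA engine is full; this cookie starts a new chunk */
		start_transfer();
		used = 0;
	}
	add_cookie(&cookie);	/* queue cookie for the current chunk */
	used++;
	if (i + 1 < ccount)
		ddi_dma_nextcookie(hdl, &cookie);
}
if (used > 0)
	start_transfer();	/* submit the final chunk */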

Well, that's enough for this month. I have code to finish up and putback. Don't forget to submit your DDI topic of interest in the comments section for next month...

MRJ


Tuesday Jun 22, 2004

What devices do you want supported & Writing Solaris Drivers

Looking for opinions out there... Mostly concentrating on x86, but SPARC comments welcome too.

  • For those of you who are, or would like to run Solaris x86, what's the top device that you would like to see supported?
  • For those of you that write, or would like to write Solaris device drivers, what could we do to make it easier for you?

Thanks!

MRJ

Tuesday Jun 08, 2004

Good place to get Solaris apps

Here's a really useful site...

http://www.blastwave.org/

nice to have pkg-get w/ dependencies on Solaris... :-)

Solaris really needs a standard app which is as easy to use as pkg-get and up2date...

MRJ
