As a developer, you will always have projects where it is hard to find a way forward. When that happens, drop the project for now and pick up another one, if only to rest your eyes and perhaps gain a new insight into the one you set aside. That is how I embarked on making posix_spawn() an actual system call, which you will find in Oracle Solaris 11.4. The original library implementation of posix_spawn() uses vfork(), but why care about the old address space if you are not going to use it? Or, worse, why stop all the other threads in the process and not restart them until exec succeeds or until you call exit()?
As I had already written kernel modules that, for nefarious reasons, run executables directly from the kernel, I decided to benchmark the simple "create a process, execute /bin/true" case against posix_spawn() from the library. Even with two threads, posix_spawn() scaled poorly: adding threads did not yield many additional spawns per second.
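A minimal sketch of that kind of measurement, using only the portable posix_spawn() API, so it measures whatever implementation the system's libc provides: each thread spawns /bin/true in a loop and reaps it with waitpid(), and spawn_rate() reports spawns per second. The names spawn_rate(), spawn_loop() and struct bench_arg are hypothetical; this is an illustration, not the original benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <spawn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/* Hypothetical per-thread state for the benchmark. */
struct bench_arg {
    int count;     /* number of spawns this thread performs */
    int failures;  /* spawns or waits that did not succeed */
};

/* Spawn /bin/true `count` times, reaping each child before the next. */
static void *spawn_loop(void *p)
{
    struct bench_arg *arg = p;
    char *argv[] = { "true", NULL };

    for (int i = 0; i < arg->count; i++) {
        pid_t pid;
        int status;

        if (posix_spawn(&pid, "/bin/true", NULL, NULL, argv, environ) != 0) {
            arg->failures++;
            continue;
        }
        if (waitpid(pid, &status, 0) != pid ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            arg->failures++;
    }
    return NULL;
}

/* Hypothetical helper: run `nthreads` threads of `per_thread` spawns each.
 * Returns spawns per second, or -1.0 if anything failed. */
double spawn_rate(int nthreads, int per_thread)
{
    pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
    struct bench_arg *args = calloc((size_t)nthreads, sizeof *args);
    struct timespec start, end;
    int created = 0, failures = 0;
    double rate = -1.0;

    if (tids == NULL || args == NULL)
        goto out;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (; created < nthreads; created++) {
        args[created].count = per_thread;
        if (pthread_create(&tids[created], NULL, spawn_loop,
                           &args[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tids[i], NULL);
        failures += args[i].failures;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (created == nthreads && failures == 0) {
        double secs = (double)(end.tv_sec - start.tv_sec) +
                      (double)(end.tv_nsec - start.tv_nsec) / 1e9;
        if (secs > 0.0)
            rate = (double)nthreads * per_thread / secs;
    }
out:
    free(tids);
    free(args);
    return rate;
}
```

Comparing, say, spawn_rate(1, 1000) with spawn_rate(4, 1000) on a given system shows how much extra spawning throughput additional threads actually buy.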
Every way of starting a new process needs to copy a number of process properties: file descriptors, credentials, priorities, resource controls, and so on.
The original way to start a new process is fork(): all the pages must be marked copy-on-write, which is O(n) in the number of pages in the process, so it gets more and more expensive as the process grows. In Solaris we also reserve all the swap that might be needed, so a large process calling fork() doubles its swap requirement.
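The traditional fork()/exec pattern looks roughly like the sketch below; run_with_fork() is an illustrative name, not a library function. Everything fork() sets up for the child (the copy-on-write page mappings and, on Solaris, the swap reservation) is discarded a moment later by execv().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: run `path` with no arguments via fork()/execv()
 * and return its exit status (127 if exec failed, -1 on other errors). */
int run_with_fork(const char *path)
{
    pid_t pid = fork();   /* duplicates the whole address space (COW) */
    int status;

    if (pid == -1)
        return -1;
    if (pid == 0) {
        char *const argv[] = { (char *)path, NULL };
        execv(path, argv);    /* replaces the copied address space */
        _exit(127);           /* exec failed: skip the parent's atexit handlers */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```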
BSD introduced vfork(); it borrows the parent's address space and was cheap when it was invented. In much larger processes with hundreds of threads it has become more and more of a bottleneck. Dynamic linking also throws a spanner in the works: what you can safely do between vfork() and the final exec() is extremely limited.
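A minimal sketch of the vfork()/exec pattern, showing how little the child may do; run_with_vfork() is an illustrative name. Note that vfork() was marked obsolescent in POSIX.1-2001 and removed in POSIX.1-2008, so the feature-test macro below is an assumption that holds for glibc; other platforms may need a different one.

```c
#define _DEFAULT_SOURCE   /* assumption: makes glibc declare vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Illustrative helper: run `path` via vfork()/execve() and return its
 * exit status (127 if exec failed, -1 on other errors). */
int run_with_vfork(const char *path)
{
    /* Build everything the child needs before vfork(). */
    char *const argv[] = { (char *)path, NULL };
    int status;
    pid_t pid = vfork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The child runs on the parent's address space and stack:
         * only execve() or _exit() are safe here. No malloc(), no
         * stdio, and never return from this function. */
        execve(path, argv, environ);
        _exit(127);
    }
    /* The parent resumes only after the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```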
In the standards world, posix_spawn() was invented; it was aimed mostly at small embedded systems, and only a very limited number of specific actions can be performed before the new executable is run. As it was part of the standard, Solaris grew its own copy built on top of vfork(). It has, of course, the same problems as vfork(); but because it is implemented in the library, we can be sure we steer clear of all the other vfork() pitfalls.
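Those actions are expressed as file actions (open, close and dup2 in the child) plus attributes such as the signal mask or process group, set through posix_spawnattr_t. A small sketch that redirects the child's standard output to a file; spawn_to_file() is an illustrative helper, not a library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Illustrative helper: run `path` with `argv`, stdout redirected to
 * `outfile`. Returns the child's exit status, or -1 on failure. */
int spawn_to_file(const char *path, char *const argv[], const char *outfile)
{
    posix_spawn_file_actions_t fa;
    pid_t pid;
    int status, err;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    /* One of the few actions allowed before the exec: open outfile as
     * fd 1 in the child only; the parent's descriptors are untouched. */
    err = posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO, outfile,
                                           O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (err == 0)
        err = posix_spawn(&pid, path, &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (err != 0)
        return -1;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, spawn_to_file("/bin/echo", argv, "/tmp/out.txt") with argv = { "echo", "hello", NULL } leaves "hello" in /tmp/out.txt, assuming /bin/echo exists on the system.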
The exec() call copies the arguments from its own address space, but by the time spawn(2) needs them it is already running in the new process. So early in the spawn(2) system call we copy the arguments and the environment vector and save them away. This data blob is handed to the child, and the parent waits until the child is about to return from the system call in the new process, or until the child decides that it cannot actually exec and calls exit instead.
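To illustrate the idea of snapshotting the arguments and environment into a single blob, here is a user-space sketch. It is not the Solaris kernel code: struct arg_blob, blob_pack() and blob_unpack() are hypothetical names, and a kernel implementation would copy the strings in from the calling process rather than use malloc() and memcpy().

```c
/* Hypothetical user-space illustration of "copy argv and envp into one
 * blob early"; this is not the Solaris kernel implementation. */
#include <stdlib.h>
#include <string.h>

struct arg_blob {
    size_t argc, envc;  /* number of argument and environment strings */
    size_t len;         /* bytes used in data */
    char *data;         /* argv strings, then envp strings, NUL-terminated */
};

static size_t vec_count(char *const v[])
{
    size_t n = 0;
    while (v[n] != NULL)
        n++;
    return n;
}

/* Append the n strings of v at p; return the position after the last one. */
static char *copy_strings(char *p, char *const v[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t sz = strlen(v[i]) + 1;   /* include the NUL */
        memcpy(p, v[i], sz);
        p += sz;
    }
    return p;
}

/* Snapshot argv and envp into one allocation. Returns 0 on success,
 * -1 on allocation failure. */
int blob_pack(struct arg_blob *b, char *const argv[], char *const envp[])
{
    size_t len = 0;

    b->argc = vec_count(argv);
    b->envc = vec_count(envp);
    for (size_t i = 0; i < b->argc; i++)
        len += strlen(argv[i]) + 1;
    for (size_t i = 0; i < b->envc; i++)
        len += strlen(envp[i]) + 1;

    b->data = malloc(len > 0 ? len : 1);
    if (b->data == NULL)
        return -1;
    b->len = len;
    copy_strings(copy_strings(b->data, argv, b->argc), envp, b->envc);
    return 0;
}

/* Rebuild NULL-terminated argv/envp arrays pointing into the blob, as the
 * new process would need them for the exec step. */
int blob_unpack(const struct arg_blob *b, char ***argvp, char ***envpp)
{
    char **av = malloc((b->argc + 1) * sizeof *av);
    char **ev = malloc((b->envc + 1) * sizeof *ev);
    char *p = b->data;

    if (av == NULL || ev == NULL) {
        free(av);
        free(ev);
        return -1;
    }
    for (size_t i = 0; i < b->argc; i++, p += strlen(p) + 1)
        av[i] = p;
    av[b->argc] = NULL;
    for (size_t i = 0; i < b->envc; i++, p += strlen(p) + 1)
        ev[i] = p;
    ev[b->envc] = NULL;
    *argvp = av;
    *envpp = ev;
    return 0;
}
```

Because the unpacked vectors point only into b->data, the caller's original argv and envp can change or go away once blob_pack() has returned.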
A process can call spawn(2) from all its threads, and the concurrency is limited only by locks that need to be held briefly while processes are created.
The performance win depends on the application; you won't gain anything unless you use posix_spawn(). I was very happy to see that our standard shell uses posix_spawn() to start new processes, as do popen(3c) and system(3c), so the call is well tested. The more threads you have, the bigger the win: stopping a thread is expensive, especially if it is held up in a system call. The world used to stop; now it just continues.
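Since popen(3c) goes through posix_spawn() on Solaris 11.4, ordinary code like the following should pick up the new path without changes. first_line_of() is an illustrative helper, not a library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Illustrative helper: run a shell command, store the first line of its
 * output (without the newline) in buf, and return pclose()'s status,
 * or -1 if the command could not be started. */
int first_line_of(const char *cmd, char *buf, size_t size)
{
    FILE *fp = popen(cmd, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL)
        buf[0] = '\0';
    else
        buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return pclose(fp);
}
```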
When developing a new system call, special attention needs to be given to its interaction with proc(5) and truss(1). The spawn(2) system call is an exception only in that it is much harder to get right: debuggers also need support, or they won't see a new process starting. This includes mdb(1) but also truss(1). They also need to learn that when spawn(2) succeeds, they are stopped in a completely different executable, and that we may have crossed a privilege boundary, e.g., when spawning su(8) or ping(8).