Let me first say this: I'm typically not a fan of middleware. I like my client code to connect directly to the services it uses. I dislike proxies, load balancers, really anything that gets in the path between my client code and the service it's talking to. Most of this dislike comes from my own history: in the past, the performance of network hops, and especially of the middleware software I've used, has been pretty poor, and anything that adds latency, bottlenecks, or another failure point makes me cringe. So when I first heard of this proxy in the NoSQL database, I was skeptical to say the least. In the past year though, I've moved into the NoSQL database group in Oracle and I've been able to deep dive into this component and its performance characteristics. I still dislike middleware, but I've come to accept and even like the NoSQL httpproxy.
Oracle NoSQL httpproxy is an http-based networking service that acts as a "middle man" between NoSQL clients and NoSQL database server cluster machines. This proxy has multiple purposes:
The proxy supports a standard http(s) interface through a single host/port. This is critical for use in the NoSQL Cloud Service, and simplifies network security and configuration even for on-premise installations. The http(s) interface also simplifies integration with load balancers, which enables high availability and failover since many separate proxies can be configured behind load balancer instances.
NoSQL httpproxy is used for all accesses to the NoSQL Cloud Service. In this case it is also integrated with Oracle Identity and Access Management (IAM) for authentication and authorization. All tenant/user/compartment/etc. info is managed through httpproxy.
The proxy manages all message routing to the internal NoSQL database nodes, handling sharding and automatic failover when nodes experience problems, in the same way that the original java direct driver (still available for on-prem use) does.
In the Oracle NoSQL system, client code that connects to the httpproxy is referred to as a "client driver". There are drivers for many languages: java, python, go, C++, .NET, etc. The drivers know nothing about routing or sharding; they send all their messages through a single https host/port. Since the proxy manages the routing/sharding of messages, the driver code is much simpler, more robust, and easier to update and maintain.
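To give a feel for how little a driver needs to know, here's a minimal sketch using the Oracle NoSQL java driver: the only network detail the client supplies is the proxy's single endpoint. The host, port, and table name below are made-up examples, and the sketch assumes a non-secure on-premise proxy (a secure or cloud setup would use a different authorization provider).

```java
import oracle.nosql.driver.NoSQLHandle;
import oracle.nosql.driver.NoSQLHandleConfig;
import oracle.nosql.driver.NoSQLHandleFactory;
import oracle.nosql.driver.kv.StoreAccessTokenProvider;
import oracle.nosql.driver.ops.GetRequest;
import oracle.nosql.driver.ops.GetResult;
import oracle.nosql.driver.values.MapValue;

public class ProxyClientSketch {
    public static void main(String[] args) {
        // The driver only knows the proxy's endpoint -- it never sees
        // the individual database cluster nodes behind it.
        // "proxyhost:8080" is a placeholder, not a real endpoint.
        NoSQLHandleConfig config =
            new NoSQLHandleConfig("http://proxyhost:8080");

        // No-arg provider: non-secure on-prem configuration.
        config.setAuthorizationProvider(new StoreAccessTokenProvider());

        NoSQLHandle handle = NoSQLHandleFactory.createNoSQLHandle(config);
        try {
            // Read one row; the proxy routes this to the correct shard.
            // "users" and the key are hypothetical.
            MapValue key = new MapValue().put("id", 1);
            GetRequest req = new GetRequest()
                .setTableName("users")
                .setKey(key);
            GetResult res = handle.get(req);
            System.out.println(res.getValue()); // null if no such row
        } finally {
            handle.close();
        }
    }
}
```

Note what's absent: no topology, no shard keys in the client's hands, no node lists. All of that lives behind the proxy's one host/port.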
A side note: the driver clients use the http protocol, but the payload of the messages is an internal proprietary binary format. This is (in theory, anyway) more performant and uses less bandwidth/resources. But it means the drivers don't expose a standard REST interface, so a user couldn't (if so desired) interact with the NoSQL database via something like curl.
So, the httpproxy also provides a REST interface which supports basic data operations as well as table/metadata functionality. This is what the Oracle NoSQL Cloud Service console uses to let users interact with the database via a browser. But it's not really intended for heavy data operations, as the REST protocol adds another layer of overhead. When doing moderate to heavy load data operations, use a client driver instead.
Httpproxy is based on the netty open-source platform. Netty was chosen over others because it's well known, performant, and relatively stable and robust. The proxy uses netty as its main server structure, and uses the NoSQL java direct driver code to communicate with the database server nodes (well, not exactly - it uses the direct driver's lower-level internals for performance reasons). It is fully multithreaded, and the thread pool parameters are configurable if you want to run tests with non-default values (or, as I did, if you want to reduce the threads to near zero and try to break it).
When I first heard of the proxy, my immediate reaction was "oh man, that's gonna be a bottleneck and increase latency". Since then I've done significant internal load testing of single and multiple parallel httpproxies, using thousands of driver clients, and I can say for certain that the proxy is not a bottleneck at all. It typically adds about 0.5ms of latency, but throughput is not an issue, even in very heavily loaded database environments. I've even run a full-scale NoSQL database cluster on many hardware machines, with thousands of driver clients spread across many more machines, all using a single httpproxy running on a cloud VM instance, and the proxy was still not a bottleneck. So I can say now that my fear was unfounded.
Believe me, I tried to make a case that the proxy performance was an issue. But the more I tested it and the harder I pushed it, the more impressed I became. My background is very heavy in C and C++, and I've often scoffed at applications written in java due to their inherent performance issues, especially around garbage collection. But this is one java application that performs surprisingly well. Internally it's written such that there's very little GC, and even when pushed very hard the first bottleneck that usually arises is external network I/O. Its memory footprint and CPU usage are very good.
So, I really wanted to hate this thing... but I have to admit I'm now quite a fan. ... of a java application. Ow, that hurts.
Learn how to set up and use the proxy in the Admin guide.