Since the Ravello trial now supports 16 GB of RAM per VM, this article is obsolete. Please check for the updated version here.
The Big Data Lite Virtual Machine is the most straightforward way to start learning about the Oracle Big Data stack, or even to develop demos and proofs of concept on top of it. Keep in mind, however, that BD Lite only emulates a Big Data environment: it provides the functionality, but it can't be used for any kind of performance benchmark.
As it's a VM, it has some limitations and potential issues.
Migrating the VM to Oracle's IaaS environment is one way to handle some, but not all, of those issues.
Turns out there is a better way: Oracle Ravello. Ravello is an overlay cloud service that brings data-center-like capabilities to the public cloud, enabling one to run VMware and KVM VMs with data-center-like Layer 2 networking. Ravello delivers these capabilities through a nested virtualization engine and a networking and storage overlay built into its built-for-cloud hypervisor, HVX. With Ravello, one can run an existing data-center VMware/KVM application on a public cloud without any modifications, thereby accelerating the move.
We'll take advantage of several of Ravello's key features along the way.
1. Get Ravello Trial Account
Go to the Ravello site and sign up. It's free for 14 days and includes 2880 free CPU hours.
2. Upload BD Lite OVA
Once you have the account, go to Library -> VMs and click the Import VM button. You will need to install a local service that does the actual upload. Simply select the BD Lite OVA file (download and unzip the BD Lite zip files to get it). The process should be straightforward. Admittedly, I haven't done it myself, because somebody was nice enough to do this for me and share the VM with me. More about this later.
3. Create an application
An application in Ravello can be a complex environment required to run a complete application, such as a DB server, app servers, a load balancer, and so on.
Go to Applications and click "Create Application". Give it a meaningful name, e.g. "BD Lite".
4. Application Blueprint
Drag and drop 2 instances of Big Data Lite to the canvas. You will get an alert that there are errors; don't worry about it for now.
Why two machines? Well, the alert is telling you that no more than 8 GB of RAM can be assigned to a single VM (this applies to trial accounts only!). To get around that, we'll create two machines and run hadoop on one and BDD on the other.
Now select one of the machines, go to the system properties on the right and assign 8GB RAM to it. Then repeat for the other VM and save.
You might also want to rename the VMs in the process, e.g. "BigDataLite460-hadoop" and "BigDataLite460-BDD". I haven't done so in my screenshots, so "BigDataLite460" will be running hadoop and "BigDataLite4601" will be running BDD.
Select each of the VMs, go to Services, click Add, and add a Supplied Service on TCP port 22.
You might want to expose additional services, such as hue (8888) on the hadoop side and BDD (9003) on the BDD side. However, if you do this, please remember to change your root and oracle passwords, or your machine will be exposed over the internet.
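Changing the passwords is a two-command job. A minimal sketch, assuming the stock BD Lite accounts (oracle and root) and run in a terminal on each VM:

```shell
# Change the default passwords before exposing any ports to the internet.
# Run as root (e.g. after `su -`); both commands prompt interactively
# for the new password.
passwd oracle
passwd root
```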
Publish the application. When you do, you can change some options. Note the "application scheduled to stop in" setting: by default it will stop in 2 hours, meaning the VMs will shut down. Click the number to change it, or set it to "Never" if you don't want the application to stop automatically.
5. Trick the VMs into thinking there is only one machine
We are going to use a simple trick that requires no reconfiguration of either machine. One machine will only run hadoop while the other will only run BDD, and we will connect them using SSH port forwarding. That way, each machine will appear to connect to localhost for the services running on the other machine, and the change will be totally seamless.
Select the first VM, then in the bottom right corner click Console. This will open a VNC console to the running machine. Log in (oracle/welcome1).
Next, open the "Start/Stop Services" app and make sure that hadoop services, and only hadoop services, are up and running (Zookeeper, HDFS, Hive, Hue, Oozie, Yarn). You can add other hadoop services if you need them.
Next, open a terminal. Then, in the toolbar at the top of the VNC window, click the "Paste Text" button and paste the following command:
ssh -L 9003:localhost:9003 -L 3306:localhost:3306 -L 7010:localhost:7010 -L 7203:localhost:7203 -L 7101:localhost:7101 -L 7102:localhost:7102 firstname.lastname@example.org
Once back in the terminal just press enter to execute the command.
Note 1: I've experienced some errors caused by copy-pasting, so if you get a strange error, try to fix the problem or paste again.
Note 2: In my case, the "BDD" VM had the internal IP 10.0.0.4 and the "hadoop" one had 10.0.0.3. You can find the internal IP in the VNC console if yours differs.
Once executed, the terminal might look a bit puzzling, since both hosts have the same local name. However, this is now an SSH session established to the other machine. Don't close this window, or the port mapping will be lost!
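If you want to confirm the tunnel is actually in place, you can check that the forwarded ports are bound locally. This is just a sanity check run in a second terminal on the hadoop VM (ss is part of iproute2; substitute netstat -ltn on older systems):

```shell
# Verify that the ssh -L session bound the BDD ports (9003, 3306) locally.
ss -ltn 2>/dev/null | grep -E ':(9003|3306)[[:space:]]' \
  && echo "tunnel is up" \
  || echo "tunnel ports not bound"
```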
This command maps some of the ports from the second machine to the first machine, e.g. 9003 (BDD) and 3306 (MySQL). Frankly, I'm not completely sure which ports are required (the bind error suggests there are more than needed), but in my tests this configuration works. If you hit problems, you might want to adjust the list.
Advanced users might want to do this whole step over SSH rather than through the VNC console. Just remember to use screen in that case to keep the session open.
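A sketch of that screen workflow, with an abbreviated port list for readability (the session name and the ServerAliveInterval keepalive are my choices, not requirements, and 10.0.0.4 is the BDD VM's internal IP from my setup):

```shell
# Start a detached screen session holding the port-forwarding ssh open.
# ServerAliveInterval keeps the tunnel from timing out when idle.
screen -dmS bdd-tunnel \
  ssh -o ServerAliveInterval=60 \
      -L 9003:localhost:9003 -L 3306:localhost:3306 \
      oracle@10.0.0.4

# Reattach later with:
#   screen -r bdd-tunnel
```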
5a) Mapping hadoop to the BDD machine
We need to do the reverse on the second (BDD) machine. Again, select it in the Ravello console, connect to its console, and log in.
First, we need to stop all services. Click "Start/Stop Services" and stop all of them (don't start BDD yet!). Wait for the services to stop.
Then open terminal and paste the following command:
ssh \
  -L 50070:localhost:50070 -L 8020:localhost:8020 -L 9000:localhost:9000 \
  -L 50075:localhost:50075 -L 50030:localhost:50060 -L 50010:localhost:50010 \
  -L 50020:localhost:50020 -L 8485:localhost:8485 -L 8480:localhost:8480 \
  -L 8032:localhost:8032 -L 8030:localhost:8030 -L 8031:localhost:8031 \
  -L 8033:localhost:8033 -L 8088:localhost:8088 -L 8040:localhost:8040 \
  -L 8042:localhost:8042 -L 8041:localhost:8041 -L 9083:localhost:9083 \
  -L 10000:localhost:10000 -L 2181:localhost:2181 -L 2888:localhost:2888 \
  -L 3888:localhost:3888 -L 3181:localhost:3181 -L 4181:localhost:4181 \
  -L 8019:localhost:8019 -L 9010:localhost:9010 -L 11000:localhost:11000 \
  -L 11001:localhost:11001 -L 7077:localhost:7077 -L 7078:localhost:7078 \
  -L 18080:localhost:18080 -L 18081:localhost:18081 \
  email@example.com
This maps the hadoop ports from the hadoop VM to the BDD VM. Again, don't close the terminal, or the mapping will be lost.
Now we just need to start BDD, but first we need to tweak the services script: the BDD service is not visible unless enough RAM is assigned to the VM.
Open another terminal and execute nano /opt/bin/services
Change "minimum_memory_BDD" from 11000 to 4000, then save and exit.
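If you prefer a non-interactive edit, the same change can be scripted with sed. On the VM you'd point it at /opt/bin/services; the snippet below demonstrates on a scratch copy and assumes the threshold is stored as a shell-style minimum_memory_BDD=11000 assignment, so check the actual file first:

```shell
# Scratch file standing in for /opt/bin/services.
services=$(mktemp)
echo 'minimum_memory_BDD=11000' > "$services"

# Lower the BDD memory threshold from 11000 to 4000 in place.
sed -i 's/minimum_memory_BDD=11000/minimum_memory_BDD=4000/' "$services"

grep minimum_memory_BDD "$services"   # prints: minimum_memory_BDD=4000
rm -f "$services"
```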
Run "Start/Stop Services" from the desktop again and start the BDD service, but keep all the others turned off!
Congrats, you can now run the whole Big Data Lite stack, including BDD, in the Ravello cloud! If you've configured the supplied services, you can connect to BDD using the VM's external IP at :9003/bdd. Otherwise, you can use the console and work over VNC.
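A quick way to check that BDD answers from outside is to probe the endpoint with curl. The port 9003 and the /bdd path are from the setup above; the external IP is whatever Ravello shows for your VM (the one below is a TEST-NET placeholder):

```shell
# Probe the BDD endpoint: -s silences progress, -o /dev/null discards the
# body, -w prints just the HTTP status code. 000 means no connection.
EXTERNAL_IP=203.0.113.10   # placeholder, substitute your VM's external IP
curl -s -o /dev/null --connect-timeout 5 \
  -w '%{http_code}\n' "http://$EXTERNAL_IP:9003/bdd"
```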
I've tested the key functionality, including command-line import and some transformations, and everything seems to be working fine. Keep in mind, however, that this is a hack and in no way an officially supported configuration.
Other cool stuff
I mentioned that Ravello has some other cool features, like sharing VMs. Once you've completed the process, it's very easy to share the configured application with your peers or customers.
In the application, go to the canvas and save it as a blueprint.
Select your blueprint, click Share, and share it with other Ravello users simply by entering their email address (the address they use for their paid or trial account). This will copy the VMs and the configuration to your peers, which is a huge timesaver.