@OracleIMC Partner Resources & Training: Discover your Modernization options + Reach new potential through Innovation

Using Oracle Big Data Lite Virtual Machine with Ravello trial

Since the Ravello trial now supports 16 GB RAM per VM, this article is obsolete. Please check for the updated version here.

Big Data Lite Virtual Machine is the most straightforward way to start learning about the Oracle Big Data stack, or even to develop demos and proofs of concept on top of it. Don't forget, however, that BD Lite only emulates a Big Data environment: it provides the functionality, but it can't be used for any kind of performance benchmark.

As it's a VM, it has some limitations and potential issues, such as:

  • the machine running it needs to have at least 16GB RAM
  • migrating from one machine to another (e.g. from developer's machine to a customer's demo machine) is a slow process
  • sharing isolated snapshot environments of a version among team members is equally slow and painful

Migrating the VM to an Oracle IaaS environment is one way to handle some, but not all, of those issues.

It turns out there is a better way: Oracle Ravello. Ravello is an overlay cloud service that brings data-center-like capabilities to the public cloud, enabling one to run VMware and KVM VMs with data-center-like Layer 2 networking. Ravello provides these capabilities through a nested virtualization engine and a networking and storage overlay built into HVX, its built-for-cloud hypervisor. With Ravello one can run an existing data-center-based VMware or KVM application on the public cloud without any modifications, thereby accelerating the move.

Some of the key features of Ravello that we want to exploit are:

  • Uploading an OVA file directly (BD Lite is distributed as a zipped OVA archive).
  • Sharing. Once you upload your VM, you can instantly share it with other Ravello users.
  • Blueprints, for getting around Ravello trial limitations.

Obviously, the best way to go is to get Ravello as a part of Oracle IaaS. In that case, the whole process is straightforward: upload the OVA, create a Ravello application, add the VM, assign more RAM to it and run it.

However, if you don't have a paid Oracle IaaS subscription with Ravello yet, it gets a bit tricky. The Ravello trial limits the amount of RAM to 8 GB per VM, while BD Lite requires at least 12 GB RAM to run the complete stack, including Big Data Discovery (BDD).

1. Get Ravello Trial Account

Go to the Ravello site and sign up. The trial is free for 14 days and includes 2880 free CPU hours.

2. Upload BD Lite OVA

Once you have the account, go to Library -> VMs and click the Import VM button. You will need to install a local service which does the actual upload. Simply select the BD Lite OVA file (download and unzip the BD Lite zip files to get it). The process should be straightforward. Admittedly, I haven't done it myself because somebody was nice enough to do this for me and just share the VM with me. More about this later.

3. Create an application

An application in Ravello can be a complex environment required to run a complete application, such as: DB Server, App servers, Load Balancer...

Go to Applications and click "Create Application". Give it a meaningful name, e.g. "BD Lite".

4. Application Blueprint

Drag and drop 2 instances of Big Data Lite to the canvas. You will get an alert that there are errors; don't worry about it for now.

Why two machines? Well, the alert is telling you that no more than 8 GB RAM can be assigned to a single VM (this applies to trial accounts only!). To get around that, we'll create two machines and have Hadoop run on one and BDD on the other.

Now select one of the machines, go to the system properties on the right and assign 8GB RAM to it. Then repeat for the other VM and save.

You might also want to rename the VMs in the process, e.g. "BigDataLite460-hadoop" and "BigDataLite460-BDD". I haven't in my screenshots, so the "BigDataLite460" will be running hadoop and "BigDataLite4601" will be running BDD.

Select each of the VMs, go to Services, click Add, and add a supplied service on TCP port 22 (SSH).

You might want to expose additional services, such as Hue (8888) on the Hadoop side and BDD (9003) on the BDD side. However, if you do this, please remember to change your root and oracle passwords, or your machine will be exposed over the internet with the default credentials.

Publish the application. When you do, you can change some options. Note the "application scheduled to stop in". It will stop in 2 hours by default - meaning the VMs will shut down. Click on the number to change it or change to "Never" if you don't want it to stop automatically.

5. Trick the VMs into thinking there is only one machine

We are going to use a simple trick which requires no reconfiguration of either of the machines. One machine will only run hadoop while the other will only run BDD, and we will connect them using SSH port forwarding. That way each of the machines will seem to connect to localhost for the services running on the other machine, but the change will be totally seamless.

Select the first VM, then click Console in the bottom right corner. This opens a VNC console to the running machine. Log in (oracle/welcome1).

Next, open "Start/Stop Services" app and make sure hadoop and only hadoop services are up and running (Zookeeper, HDFS, Hive, Hue, Oozie, Yarn). You can add other hadoop services if you need them.

Next, open a terminal. Then, in the toolbar at the top of the VNC window, click the "Paste Text" button and paste the following command.

ssh -L 9003:localhost:9003 -L 3306:localhost:3306 -L 7010:localhost:7010 -L 7203:localhost:7203 -L 7101:localhost:7101 -L 7102:localhost:7102 oracle@10.0.0.4

Once back in the terminal just press enter to execute the command.  

Note1: I've experienced some errors produced by copy-pasting, so in case you get a strange error, try to fix the problem or paste again.

Note2: In my case, the "BDD" VM had internal IP 10.0.0.4 and the "Hadoop" one had 10.0.0.3. You can find the internal IP in the VNC console if it changes. 
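
If the address isn't easy to spot in the console, it can also be read from a terminal on each VM. This is a minimal sketch, assuming the `ip` tool (iproute2) is available on the VM; `extract_ipv4` is a hypothetical helper name, not something shipped with BD Lite.

```shell
# extract_ipv4: read `ip -o -4 addr show` output on stdin and print
# the IPv4 address of every interface except loopback.
# (Hypothetical helper; assumes the one-line-per-address format of `ip -o`.)
extract_ipv4() {
  awk '$2 != "lo" { split($4, a, "/"); print a[1] }'
}

# Usage on the VM:
#   ip -o -4 addr show | extract_ipv4
```

Whatever address this prints on the BDD VM is the one to put after oracle@ in the command above.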

Once executed, the terminal might look a bit puzzling since both hosts have the same local name. However, this is now an SSH session established to the other machine. Don't close this window or the port mapping will be lost!

This command maps some of the ports from the second machine to the first machine, e.g. 9003 (BDD) and 3306 (MySQL). Frankly, I'm not completely sure which ports are required, and the bind error indicates there are too many, but in my tests this configuration works. If you hit problems you might want to adjust them.
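
One plausible cause of a bind error is that something on the local machine already listens on a port you are trying to forward. That is an assumption, not something confirmed in the tests above. Here is a rough sketch for checking this before opening the tunnel, assuming bash (it relies on bash's /dev/tcp pseudo-device); `port_in_use` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# port_in_use PORT: succeeds if something already accepts connections
# on 127.0.0.1:PORT. Relies on bash's /dev/tcp pseudo-device.
# (Hypothetical helper, for illustration only.)
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Ports taken from the ssh command above. Run this BEFORE starting the tunnel.
for port in 9003 3306 7010 7203 7101 7102; do
  if port_in_use "$port"; then
    echo "port $port is already in use locally; ssh -L may fail to bind it"
  fi
done
```

Any port reported here is a candidate for removal from the -L list, provided the local service on that port is the one you actually want to use.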

Advanced users might want to do this whole step through ssh rather than VNC console. Just remember to use screen in this case to keep the session open.
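
When working over SSH, the long option list can be generated rather than typed. The following is a minimal sketch in plain POSIX shell; `forward_args` and `BDD_PORTS` are names made up for this example, and 10.0.0.4 is the BDD VM's address in my setup only.

```shell
# forward_args PORT...: print one "-L port:localhost:port" option per port.
# (Hypothetical helper, not part of BD Lite.)
forward_args() {
  args=""
  for port in "$@"; do
    args="${args:+$args }-L ${port}:localhost:${port}"
  done
  printf '%s\n' "$args"
}

# Ports taken from the ssh command above.
BDD_PORTS="9003 3306 7010 7203 7101 7102"

# Print the command instead of running it, so it can be reviewed first.
# $BDD_PORTS is intentionally unquoted so it splits into separate ports.
printf 'ssh %s oracle@10.0.0.4\n' "$(forward_args $BDD_PORTS)"
```

To keep the tunnel alive after you disconnect, start screen first and run the printed command inside that session.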

5a) Mapping hadoop to the BDD machine

We need to do the reverse on the second (BDD) machine. Again, select it in the Ravello console, connect to its console and log in.

First we need to stop all services. Open "Start/Stop Services" and stop all of them (don't start BDD yet!). Wait for the services to stop.

Then open terminal and paste the following command:  

ssh -L 50070:localhost:50070 -L 8020:localhost:8020 -L 9000:localhost:9000 -L 50075:localhost:50075 -L 50030:localhost:50060 -L 50010:localhost:50010 -L 50020:localhost:50020 -L 8485:localhost:8485 -L 8480:localhost:8480 -L 8032:localhost:8032 -L 8030:localhost:8030 -L 8031:localhost:8031 -L 8033:localhost:8033 -L 8088:localhost:8088 -L 8040:localhost:8040 -L 8042:localhost:8042 -L 8041:localhost:8041 -L 9083:localhost:9083 -L 10000:localhost:10000 -L 2181:localhost:2181 -L 2888:localhost:2888 -L 3888:localhost:3888 -L 3181:localhost:3181 -L 4181:localhost:4181 -L 8019:localhost:8019 -L 9010:localhost:9010 -L 11000:localhost:11000 -L 11001:localhost:11001 -L 7077:localhost:7077 -L 7078:localhost:7078 -L 18080:localhost:18080 -L 18081:localhost:18081 oracle@10.0.0.3

This maps the Hadoop ports from the Hadoop VM to the BDD VM. Again, don't close the terminal or the mapping will be lost.
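
With a list this long it is easy to repeat a port while editing it. ssh can bind each local port only once, so a repeated -L entry should produce a bind error for the second occurrence. A small sketch for spotting repeats before pasting follows; `duplicate_ports` is a hypothetical helper name.

```shell
# duplicate_ports PORT...: print every port that appears more than once.
# (Hypothetical helper name, shown only for illustration.)
duplicate_ports() {
  printf '%s\n' "$@" | sort -n | uniq -d
}

# Example with a deliberately repeated port; prints 8020.
duplicate_ports 50070 8020 9000 8020
```

An empty result means every local port in the list is unique.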

Now we just need to start BDD, but we need to tweak the services script first: the BDD service is not visible unless there is enough RAM assigned to the VM.

Open another terminal and execute nano /opt/bin/services

Change "minimum_memory_BDD" from 11000 to 4000, save and exit.
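
The same edit can be scripted instead of done by hand in nano. This is only a sketch: I'm assuming the value 11000 appears on the same line as the name minimum_memory_BDD, without knowing the exact syntax of that line. `lower_bdd_minimum` is a hypothetical helper name.

```shell
# lower_bdd_minimum FILE: on lines mentioning minimum_memory_BDD,
# replace 11000 with 4000. A copy of the original is kept as FILE.bak.
# (Hypothetical helper; the exact layout of the services script is assumed.)
lower_bdd_minimum() {
  file="$1"
  cp "$file" "$file.bak"
  sed '/minimum_memory_BDD/ s/11000/4000/' "$file.bak" > "$file"
}

# Usage on the BDD VM:
#   lower_bdd_minimum /opt/bin/services
```

Writing back into the existing file, rather than replacing it, keeps its permissions intact. If the result looks wrong, copy the .bak file back.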

Open "Start/Stop Services" from the desktop again and start the BDD service, but keep all the others turned off!

Congrats, you can now run the whole Big Data Lite, including BDD in the Ravello cloud! If you've configured supplied services, then you can connect to BDD using VM's external IP:9003/bdd. Otherwise you can use console and work over VNC.

I've tested the key functionality, including command line import and some transformations, and everything seems to be working fine. Keep in mind, however, that this is a hack and in no way an officially supported configuration.

Other cool stuff

I mentioned Ravello has some other cool features, like sharing VMs. Once you've gone through the process, it's very easy to share the configured application with your peers or customers.

In the application, go to canvas and save it as blueprint.

Select your blueprint, click share and share with other Ravello users simply by using their email address (the address they use for either paid or trial account). This will copy VMs and the configuration to your peers, which is a huge timesaver. 
