Earlier I wrote about various aspects of Big Data High Availability, and I intentionally avoided the topic of Disaster Recovery. High Availability answers the question of how a system should keep operating when one of its components (like a NameNode or KDC) fails within a single system (like one Hadoop cluster); Disaster Recovery answers the question of what to do if the entire system fails (the Hadoop cluster or even the whole data center goes down). In this blog, I also would like...
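As a concrete aside on the single-cluster HA half of that distinction, here is a minimal sketch, assuming Python with the `requests` package and hypothetical NameNode hostnames, that asks each NameNode's JMX endpoint which one is currently active (the HTTP port is 50070 on Hadoop 2, 9870 on Hadoop 3):

```python
# Minimal sketch: query each NameNode's JMX servlet to see which is active.
# Hostnames are placeholders; Kerberos/SSL handling is omitted.
import requests

NAMENODES = ["nn1.example.com", "nn2.example.com"]  # hypothetical hosts

for host in NAMENODES:
    url = f"http://{host}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
    try:
        beans = requests.get(url, timeout=5).json()["beans"]
        print(f"{host}: {beans[0]['State']}")  # "active" or "standby"
    except requests.RequestException as exc:
        print(f"{host}: unreachable ({exc})")  # the failure HA is designed to survive
```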
Hadoop is an ecosystem consisting of multiple different components, in which each component (or engine) consumes certain resources. There are a few resource management techniques that allow administrators to define how finite resources are divided across multiple engines. In this post, I'm going to talk about these different techniques in detail. Talking about resource management, I'll divide the topic into these sub-topics: 1. Dealing with low-latency engines...
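As one concrete way to see how those finite resources are carved up, the YARN ResourceManager exposes its scheduler state over REST. A minimal sketch, assuming Python with `requests`, a placeholder ResourceManager address, and the Capacity Scheduler's JSON layout:

```python
# Minimal sketch: list YARN queues with their configured and used capacity.
# Host/port are placeholders; the field names assume the Capacity Scheduler.
import requests

RM = "http://resourcemanager.example.com:8088"  # hypothetical ResourceManager

info = requests.get(f"{RM}/ws/v1/cluster/scheduler", timeout=5).json()
queues = info["scheduler"]["schedulerInfo"]["queues"]["queue"]

for q in queues:
    print(f"queue={q['queueName']} capacity={q['capacity']}% used={q['usedCapacity']}%")
```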
In this blog post, I'd like to briefly review the high availability functionality in Oracle Big Data Appliance and Big Data Cloud Service. The good news is that most of these features are available out of the box on your systems, with no extra steps required from your end; that is one of the key value-adds of leveraging a hardened system from Oracle. A special shout-out to Sandra and Ravi from our team for helping with this blog post. For this post on...
A while ago I wrote up Oracle best practices for building a secure Hadoop cluster, and you can find the details here. In that blog I intentionally didn't mention Kafka security, because that topic deserves a dedicated article. Now it's time to write it, and this blog will be devoted to Kafka security only. Kafka security challenges: 1) Encryption in motion. By default you communicate with a Kafka cluster over an unsecured network, and everyone who can listen to the network between your client...
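To make the encryption-in-motion point concrete, here is a minimal client-side sketch, assuming the third-party kafka-python package, placeholder broker and certificate paths, and a broker that already exposes an SSL listener:

```python
# Minimal sketch: a Kafka producer talking to the cluster over TLS instead
# of plaintext, so a listener on the network sees only encrypted traffic.
# Broker address and certificate paths are placeholders for your environment.
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="broker1.example.com:9093",  # SSL listener, not the plaintext 9092
    security_protocol="SSL",
    ssl_cafile="/etc/security/ca.pem",        # CA that signed the broker certificate
    ssl_certfile="/etc/security/client.pem",  # client certificate (if mutual TLS)
    ssl_keyfile="/etc/security/client.key",
)
producer.send("secured-topic", b"payload travels encrypted on the wire")
producer.flush()
```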
Security is a very important aspect of many projects and you must not underestimate it. Hadoop security is very complex and consists of many components, so it's better to enable security features one by one. Before starting the explanation of the different security options, I'll share some materials that will help you get familiar with the algorithms and technologies that underpin many security features in Hadoop. Before you begin: first of all, I recommend that you...
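In that one-by-one spirit, the very first thing worth checking is whether Kerberos authentication is enabled at all. A minimal sketch using only the Python standard library; the config path is a common default and may differ on your distribution:

```python
# Minimal sketch: read core-site.xml and report whether Hadoop authentication
# is "simple" (no security) or "kerberos". The path is a common default.
import xml.etree.ElementTree as ET

CORE_SITE = "/etc/hadoop/conf/core-site.xml"  # adjust for your cluster

root = ET.parse(CORE_SITE).getroot()
auth = "simple"  # Hadoop's default when the property is absent
for prop in root.iter("property"):
    if prop.findtext("name") == "hadoop.security.authentication":
        auth = prop.findtext("value")
print(f"hadoop.security.authentication = {auth}")
```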
In my previous blogs, I have already covered data loading into HDFS. In the first blog, I covered data loading from generic servers to HDFS. The second blog was devoted to offloading data from Oracle RDBMS. Here I want to explain how to load streaming data into Hadoop. Before anything else, I want to note that I will not cover Oracle GoldenGate for Big Data here, simply because it already has many blog posts. Today I'm going to talk about Flume and Kafka. What is Kafka? Kafka is a distributed...
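To give a feel for the Kafka side of the ingestion before going into details, here is a minimal sketch of pushing a stream of events into a topic, assuming the third-party kafka-python package and placeholder broker/topic names; from Kafka, a sink such as Flume can then drain the stream into HDFS:

```python
# Minimal sketch: emit a stream of JSON events into a Kafka topic.
# Broker and topic names are placeholders.
import json
import time
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="broker1.example.com:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

for i in range(100):  # stand-in for a real event source
    producer.send("clickstream", {"event_id": i, "ts": time.time()})
producer.flush()
```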
Some time ago I started to explain how to move data to the Hadoop Distributed File System (HDFS) from different sources. In my first blog post on the subject I covered batch data loading from generic Linux (or even Unix) servers. Today I'm going to explain some best practices for data movement (offloading) from the Oracle Database to HDFS in batch mode. Generally speaking, there are two major ways: Sqoop and Copy2Hadoop. You may also think about Oracle Table Access for...
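For the Sqoop route, a typical table import looks roughly like the sketch below; it is wrapped in Python only to keep all the examples in one language, and the JDBC connect string, credentials, table, and target directory are placeholders:

```python
# Minimal sketch: invoke a Sqoop import of one Oracle table into HDFS.
# Connection details are placeholders; in practice prefer --password-file
# or a credential alias over a plaintext password.
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@db.example.com:1521/ORCLPDB",
    "--username", "scott",
    "--table", "SALES",
    "--target-dir", "/user/hive/warehouse/sales",
    "--num-mappers", "4",  # parallel slices of the source table
], check=True)
```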
Many customers keep asking me about a "default" (single) compression codec for Hadoop. Actually, the answer to this question is not so easy, and let me explain why. Bzip2 or not Bzip2? In my previous blog post I published compression-ratio results for some particular compression codecs in Hadoop. Based on those results you may think that it's a good idea to compress everything with bzip2. But be careful with this. Within the same research, I noted that bzip2 actually has...
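That trade-off is easy to reproduce outside Hadoop with the standard-library codecs. A minimal sketch; the sample file is a placeholder, and the absolute numbers will vary with your data, but the relative ordering (bzip2 smaller yet markedly slower than gzip) is the point:

```python
# Minimal sketch: compare gzip and bzip2 on ratio and CPU time for the
# same input. The sample input is a placeholder; use your own data.
import bz2
import gzip
import time

data = open("/etc/services", "rb").read() * 50  # any sizeable sample input

for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: ratio={len(data) / len(packed):.1f}x time={elapsed:.2f}s")
```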
Classification of the compression algorithms. The term Big Data implies storing and handling huge amounts of data. Big Data technologies like Hadoop allow us to scale out the storage and processing layers. It's easy to scale your system: just buy new servers and add them to the cluster. But before buying new hardware or deleting historical data, it's a good idea to try to reorganize your storage, specifically to compress your data. I'm sure that many of you have heard about different compression programs that...
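One way to visualize that classification is to run several general-purpose algorithms over the same input and place each on the ratio-versus-speed spectrum. A minimal standard-library sketch, with a placeholder sample file:

```python
# Minimal sketch: rank standard-library codecs by compression ratio and
# speed, illustrating the usual classification: fast/low-ratio (zlib at
# level 1) through slow/high-ratio (lzma).
import bz2
import lzma
import time
import zlib

data = open("/etc/services", "rb").read() * 50  # placeholder sample input

codecs = {
    "zlib-1": lambda d: zlib.compress(d, 1),
    "zlib-9": lambda d: zlib.compress(d, 9),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}
for name, fn in codecs.items():
    start = time.perf_counter()
    out = fn(data)
    print(f"{name:7} ratio={len(data) / len(out):5.1f}x "
          f"time={time.perf_counter() - start:.2f}s")
```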