Pull to refresh
3

Hadoop *

The framework for distributed applications

Show first
Rating limit
Level of difficulty

Reach Out Top Hadoop Consulting Companies To Leverage Big Data In 2020

Reading time 7 min
Views 1.1K
image

Hadoop is divided into different modules, each of which delivers a distinct task crucial for a computer system and is uniquely designed for big data analytics. Apache Software Foundation developed this incredible platform. It is extensively utilized by worldwide developers to build big data Hadoop solutions amazingly and easily.

Big data offers several perks, some of them are; examining root causes of failures, recognizing the potential of data-driven marketing, improving and enhancing customer engagement, and much more. By offering multiple solutions in a single stream it helps in lowering the cost of the organization.

In various industries such as Retail, Manufacturing, Financial insurance, Education, Transportation, Agriculture, Healthcare, Energy, etc big data is utilized and that’s why it’s demand is expanding day by day. The Global Hadoop Market is envisioned to grow to $84.6 billion by 2021, with an expected CAGR of 63.4%.
Read more →
Total votes 3: ↑3 and ↓0 +3
Comments 2

Introducing One Ring — an open-source pipeline for all your Spark applications

Reading time 23 min
Views 1.4K

If you utilize Apache Spark, you probably have a few applications that consume some data from external sources and produce some intermediate result, that is about to be consumed by some applications further down the processing chain, and so on until you get a final result.


We suspect that because we have a similar pipeline with lots of processes like this one:


A process flowchart with more than 50 applications and about 70 datasets
Click here for a bit larger version


Each rectangle is a Spark application with a set of their own execution parameters, and each arrow is an equally parametrized dataset (externally stored highlighted with a color; note the number of intermediate ones). This example is not the most complex of our processes, it’s fairly a simple one. And we don’t assemble such workflows manually, we generate them from Process Templates (outlined as groups on this flowchart).


So here comes the One Ring, a Spark pipelining framework with very robust configuration abilities, which makes it easier to compose and execute a most complex Process as a single large Spark job.


And we just made it open source. Perhaps, you’re interested in the details.

We got you covered!
Total votes 9: ↑8 and ↓1 +7
Comments 0

Apache Hadoop Code Quality: Production VS Test

Reading time 11 min
Views 630

Рисунок 1

In order to get high quality production code, it's not enough just to ensure maximum coverage with tests. No doubts, great results require the main project code and tests to work efficiently together. Therefore, tests have to be paid as much attention as the main code. A decent test is a key success factor, as it will catch regression in production. Let's take a look at PVS-Studio static analyzer warnings to see the importance of the fact that errors in tests are no worse than the ones in production. Today's focus: Apache Hadoop.
Read more →
Total votes 4: ↑4 and ↓0 +4
Comments 0

Authors' contribution