
Big Data

Everything about big data


Machine Learning & Big Data: Let’s Find The Relationship Between Them

Reading time: 4 min

Machine learning is a well-known term in technology. Today we will relate it to another popular term: big data. Both have become buzzwords these days. Let's find out what each of them means.

Big data refers to the process of collecting and analyzing large volumes of data (the data sets themselves are also called big data). This analysis helps uncover useful hidden patterns and other information, such as customer preferences and market trends, which helps organizations stay informed and make customer-oriented business decisions.

The QC House of Cards

Reading time: 4 min
There’s Gold in Them Thar Hills

Gold rushes can make people crazy, and 1848 is proof enough of that. When Sam Brannan announced to the world, ‘Gold! Gold! Gold from the American River!’, half the world’s population (or so it seemed to the tiny population living in California at the time) descended on what was soon to become the newest state of the union.

San Francisco, until then a small hamlet of a few hundred pioneers, became a centre of vice, murder and debauchery overnight.


Some two hundred years earlier, tulip mania had hit Europe, and, like the California gold rush with its argonauts or '49ers, it impoverished more people than it made rich. In the early 2000s, too, the dot-com bubble fed a speculative frenzy in which irrationality overrode all reason.

Four Ways Quantum Computing Will Change Artificial Intelligence Forever

Reading time: 4 min
If science were a dating app, quantum physics and machine learning probably wouldn’t be a match. They come from completely different fields and often require completely different backgrounds and skills. But throw in a little quantum computing and, suddenly, that science-matchmaking app becomes Tinder and the attraction between the two is palpable.

(Image credit: cmo.adobe.com/articles/2017/5/how-will-artificial-intelligence-impact-business-tlp-ptr.html#gs.5zlifl)

Even though the extent of the change that quantum computing will unleash on AI is up for debate, many experts are now convinced that quantum computing will alter AI at some level. Analysts from the bank holding company BBVA, for example, point to the natural synergy between quantum computing and AI as a reason why quantum machine learning will eventually outperform classical machine learning.

“Quantum machine learning can be more efficient than classic machine learning, at least for certain models that are intrinsically hard to learn using conventional computers,” says Samuel Fernández Lorenzo, a quantum algorithm researcher who collaborates with BBVA’s New Digital Businesses area. “We still have to find out to what extent do these models appear in practical applications.”

Reach Out to Top Hadoop Consulting Companies to Leverage Big Data in 2020

Reading time: 7 min

Hadoop is divided into modules (Hadoop Common, HDFS, YARN and MapReduce), each of which performs a distinct task and is designed for big data analytics. Hadoop is an Apache Software Foundation project, and developers worldwide use it extensively to build big data solutions quickly and easily.
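One of Hadoop's modules, MapReduce, is easier to grasp through its programming model. The following is a single-process Python sketch of the classic word-count example: it only mimics the map, shuffle and reduce phases in memory, whereas real Hadoop jobs are written against Hadoop's Java API (or run through Hadoop Streaming) and are distributed across a cluster. The input splits are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group together all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input splits; on a cluster each would be mapped on a different node.
splits = ["big data needs big clusters", "hadoop processes big data"]
counts = reduce_phase(shuffle_phase(chain.from_iterable(map_phase(s) for s in splits)))
print(counts)
```

Because map calls are independent and reduce only needs the values for one key at a time, both phases can be spread across many machines; the shuffle is the step that moves data between them.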

Big data offers several benefits, including examining the root causes of failures, realizing the potential of data-driven marketing, improving customer engagement, and much more. By offering multiple solutions in a single stream, it helps lower an organization's costs.

Big data is used in many industries, such as retail, manufacturing, finance and insurance, education, transportation, agriculture, healthcare, and energy, which is why demand for it keeps growing. The global Hadoop market is projected to grow to $84.6 billion by 2021, at an expected CAGR of 63.4%.
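A compound annual growth rate (CAGR) means the value is multiplied by (1 + rate) every year. The forecast's base year and starting market size are not given here, so the sketch below does not reproduce any published figure; it only inverts the compounding to show what market size a 63.4% CAGR would imply one, two or three years before the $84.6 billion target. Treat the printed values as illustrative arithmetic, not data.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years


def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Invert the compounding: the value `years` before `end_value`."""
    return end_value / (1 + cagr) ** years


CAGR = 0.634        # 63.4% per year, as quoted in the text
TARGET_2021 = 84.6  # billions of USD, as quoted in the text

for years_before in (1, 2, 3):
    start = implied_start(TARGET_2021, CAGR, years_before)
    print(f"{years_before} year(s) before 2021: ${start:.1f}B")
```

At this rate the market would grow more than fourfold in three years, since 1.634 cubed is roughly 4.36.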

Could Quantum Computing Help Reverse Climate Change?

Reading time: 4 min
The unique powers of quantum computation may give humanity an important weapon — or several weapons — against climate change, according to one quantum computer pioneer.
One possible way to deal with excess carbon in the atmosphere and reach global climate goals is to suck it out. That sounds easy, but the technology to do so cheaply and easily isn’t quite here yet, according to Jeremy O’Brien, chief executive officer of PsiQuantum, a quantum computing startup.

Currently, there is no way to accurately simulate large, complex molecules, such as those relevant to capturing carbon dioxide. Classical computers cannot simulate these kinds of molecules because the problem grows exponentially with the size or complexity of the simulated molecule, according to O’Brien, who outlined the issue in an article for the World Economic Forum’s recent annual meeting.

“Crudely speaking, if simulating a molecule with 10 atoms takes a minute, a molecule with 11 takes two minutes, one with 12 atoms takes four minutes and so on,” he writes. “This exponential scaling quickly renders a traditional computer useless: simulating a molecule with just 70 atoms would take longer than the lifetime of the universe (13 billion years).”
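O'Brien's "crude" doubling rule is easy to check with a few lines of Python. The sketch simply encodes the rule from the quote (1 minute at 10 atoms, doubling with each extra atom); it illustrates exponential growth and is not a model of any real molecular simulation code.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25


def simulation_minutes(atoms: int, base_atoms: int = 10, base_minutes: float = 1.0) -> float:
    """Runtime under the quoted rule: time doubles with every extra atom."""
    return base_minutes * 2 ** (atoms - base_atoms)


for n in (10, 11, 12, 20, 70):
    minutes = simulation_minutes(n)
    print(f"{n} atoms: {minutes:.3g} min = {minutes / MINUTES_PER_YEAR:.3g} years")
```

For 70 atoms the rule gives 2^60 minutes, roughly 2.2 trillion years, which is comfortably longer than the 13 billion years quoted.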

The World’s Top 12 Quantum Computing Research Universities

Reading time: 5 min
In just a few years, quantum computing and quantum information theory have gone from a fringe subject offered in small classes at odd hours in the corner of the physics building annex to a full complement of classes in well-funded programs held at quantum centers and institutes at leading universities.

The question now for many would-be quantum computing students is not “Are there universities that even offer classes in quantum computing?” but rather “Which universities are leaders in quantum computing research?”

We’ll look at some of the best right now:

The Institute for Quantum Computing — University of Waterloo


The University of Waterloo can proudly declare that, while many universities avoided offering quantum computing classes the way cat adoption agencies would avoid applications from the Schrödinger family, this Canadian university went all in.

And it paid off.

Introducing One Ring — an open-source pipeline for all your Spark applications

Reading time: 23 min

If you use Apache Spark, you probably have a few applications that consume data from external sources and produce an intermediate result, which is then consumed by other applications further down the processing chain, and so on, until you get a final result.


We suspect this because we have a similar pipeline, with lots of processes like this one:


A process flowchart with more than 50 applications and about 70 datasets


Each rectangle is a Spark application with its own set of execution parameters, and each arrow is an equally parameterized dataset (externally stored datasets are highlighted in color; note how many intermediate ones there are). This example is not the most complex of our processes; it is a fairly simple one. And we don’t assemble such workflows manually: we generate them from Process Templates (outlined as groups on this flowchart).


So here comes the One Ring, a Spark pipelining framework with very robust configuration abilities, which makes it easier to compose and execute even the most complex Process as a single large Spark job.
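One Ring's own configuration format is not shown in this excerpt, so the following is only a generic sketch of the underlying idea: treat each Spark application as a node, each dataset dependency as an edge, and run the applications in topological order. The application names are hypothetical, and Python's standard `graphlib` module (3.9+) stands in for whatever scheduling One Ring actually performs.

```python
from graphlib import TopologicalSorter

# Hypothetical process: each key is a Spark application, each value is the set
# of applications whose output datasets it consumes. Names are illustrative only.
process = {
    "ingest_events": set(),
    "ingest_users": set(),
    "clean_events": {"ingest_events"},
    "join_users": {"clean_events", "ingest_users"},
    "aggregate": {"join_users"},
    "export_report": {"aggregate"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every application runs after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(process)
print(order)
```

`static_order()` raises `graphlib.CycleError` if a process accidentally contains a cycle, which is a useful sanity check for workflows generated from templates.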


And we have just made it open source. Perhaps you’re interested in the details.

We’ve got you covered!

Five Methods For Database Obfuscation

Reading time: 20 min
ClickHouse users already know that its biggest advantage is its high-speed processing of analytical queries. But claims like this need to be confirmed with reliable performance testing. That's what we want to talk about today.



We started running tests in 2013, long before the product was available as open source. Back then, just like now, our main concern was data processing speed in Yandex.Metrica. We had been storing that data in ClickHouse since January of 2009. Part of the data had been written to a database starting in 2012, and part was converted from OLAPServer and Metrage (data structures previously used by Yandex.Metrica). For testing, we took the first subset at random from data for 1 billion pageviews. Yandex.Metrica didn't have any queries at that point, so we came up with queries that interested us, using all the possible ways to filter, aggregate, and sort the data.

ClickHouse performance was compared with similar systems like Vertica and MonetDB. To avoid bias, testing was performed by an employee who hadn't participated in ClickHouse development, and special cases in the code were not optimized until all the results were obtained. We used the same approach to get a data set for functional testing.
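A fair comparison like the one described needs repeatable timings. Here is a minimal, hypothetical harness (not ClickHouse's actual benchmark tooling) that runs a query several times after a warm-up and reports the min, median and max wall-clock latency. `query` is an assumed zero-argument callable standing in for whatever client call sends SQL to the system under test.

```python
import statistics
import time
from typing import Callable


def benchmark(query: Callable[[], object], runs: int = 5, warmup: int = 1) -> dict[str, float]:
    """Time `query` several times and summarize latency in seconds."""
    for _ in range(warmup):
        query()  # warm caches so a cold first run does not skew the summary
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Hypothetical stand-in for "run this query against system X".
result = benchmark(lambda: sum(range(1_000_000)), runs=5)
print(result)
```

Running the same harness, with the same queries and data, against each system keeps the comparison symmetric; the median is less sensitive to one-off stalls than the mean.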

After ClickHouse was released as open source in 2016, people began questioning these tests.


Machine Learning in Static Analysis of Program Source Code

Reading time: 27 min

Machine learning has become firmly entrenched in a variety of fields, from speech recognition to medical diagnosis. The approach is so popular that people try to use it wherever they can, and some attempts to replace classical approaches with neural networks turn out to be unsuccessful. This time, we'll consider machine learning in terms of creating effective static code analyzers for finding bugs and potential vulnerabilities.

How E-commerce Is Fueled by the Pillars of AI Technology

Reading time: 4 min


Today, artificial intelligence is implemented across business operations and in the way we shop and trade online. To hit a home run in the retail game, smart AI applications, PIM (product information management) solutions, and e-commerce development tools now offer capabilities such as predictive analysis, recommendation engines, inventory management, and warehouse automation to create a more profitable shopping experience for consumers.

Now more than ever, e-commerce is an AI innovation game


Artificial intelligence often seems complicated to newcomers, but in reality it is simple to use and gives you the ability to predict customer needs. This paves the way for e-commerce companies to become a “big brand” or “big business” with revolutionary AI tools.

Now that AI algorithms are gaining consumer acceptance like never before, how can you use them to create more profitable outcomes in e-commerce?

Interesting E-commerce Stats:


With an estimated global population of 7.7 billion, 25 percent of people shop through e-commerce stores. According to Statista, 52% of e-commerce stores will have omnichannel capabilities by 2020, which means they can communicate with and sell to their consumers via multiple channels. For example, they can use their e-commerce website, Facebook e-shop, email account, and Instagram account.

Examples of AI tools and PIM software that can help e-commerce businesses set a high bar for customer service and marketing:

Apache Hadoop Code Quality: Production VS Test

Reading time: 11 min


To get high-quality production code, it's not enough just to ensure maximum test coverage. Great results undoubtedly require the main project code and the tests to work efficiently together, so tests deserve as much attention as the main code. A decent test is a key success factor, as it catches regressions in production code. Let's take a look at PVS-Studio static analyzer warnings to see why errors in tests are no less important than errors in production code. Today's focus: Apache Hadoop.

Analyzing the Code of ROOT, Scientific Data Analysis Framework

Reading time: 14 min
While Stockholm was holding the 118th Nobel Week, I was sitting in our office, where we develop the PVS-Studio static analyzer, working on an analysis review of the ROOT project, a big-data processing framework used in scientific research. This code wouldn't win a prize, of course, but the authors can definitely count on a detailed review of the most interesting defects plus a free license to thoroughly check the project on their own.

Introduction



ROOT is a modular scientific software toolkit. It provides all the functionality needed for big data processing, statistical analysis, visualisation and storage. It is mainly written in C++. ROOT was born at CERN, at the heart of high-energy physics research. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations.

What's new in ML.NET and Model Builder

Reading time: 2 min
We are excited to announce updates to Model Builder and improvements in ML.NET. You can learn more in the “What’s new in ML.NET?” session at .NET Conf.

ML.NET is an open-source and cross-platform machine learning framework (Windows, Linux, macOS) for .NET developers.

ML.NET offers Model Builder (a simple UI tool) and a CLI that make it super easy to build custom ML models using AutoML.

Using ML.NET, developers can leverage their existing tools and skill sets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Recommendation, Image Classification and more!


How we created an IoT system for managing solar energy usage

Reading time: 5 min

If you have no idea about the development architecture and the mechanical/electrical design behind IoT solutions, they can look like magic, that is, like something "having seemingly supernatural qualities or powers". For example, if you showed a working IoT system to people from the 18th century, they'd think it was magic. This article is an attempt to bust that myth or, to put it more technically, a set of hints for fine-tuning IoT development, based on an awesome project in the solar energy management area.


PVS-Studio Visits Apache Hive

Reading time: 12 min

For the past ten years, the open-source movement has been one of the key drivers of the IT industry's development and a crucial component of it. The role of open-source projects is becoming more and more prominent not only in terms of quantity but also in terms of quality, which changes the very concept of how they are positioned on the IT market in general. Our courageous PVS-Studio team is not sitting idle: we take an active part in strengthening the presence of open-source software by finding hidden bugs in the enormous depths of codebases and offering free license options to the authors of such projects. This article is just another piece of that activity! Today we are going to talk about Apache Hive. I've got the report, and there are things worth looking at.

Contextual Emotion Detection in Textual Conversations Using Neural Networks

Reading time: 10 min

Nowadays, talking to conversational agents is becoming a daily routine, and it is crucial for dialogue systems to generate responses that are as human-like as possible. One of the main aspects of this is providing emotionally aware responses to users. In this article, we describe the recurrent neural network architecture for emotion detection in textual conversations that participated in SemEval-2019 Task 3 “EmoContext”; SemEval is an annual workshop on semantic evaluation. The task objective is to classify the emotion (happy, sad, angry, or others) in a data set of 3-turn conversations.
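To make the task concrete, here is a minimal NumPy sketch of how a recurrent network can map a 3-turn conversation to one of the four EmoContext classes. It is not the authors' architecture: it is a plain (Elman) RNN forward pass with random, untrained weights, and the vocabulary size, dimensions, token ids and the `SEP` separator token are all hypothetical.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "others"]
rng = np.random.default_rng(0)

VOCAB, EMB, HIDDEN = 50, 8, 16
SEP = 0  # hypothetical token id that separates the three turns

# Randomly initialized parameters; a real model would learn these from data.
embeddings = rng.normal(scale=0.1, size=(VOCAB, EMB))
W_xh = rng.normal(scale=0.1, size=(EMB, HIDDEN))
W_hh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
b_h = np.zeros(HIDDEN)
W_out = rng.normal(scale=0.1, size=(HIDDEN, len(EMOTIONS)))
b_out = np.zeros(len(EMOTIONS))


def encode(turns: list[list[int]]) -> list[int]:
    """Concatenate the turns' token ids, appending SEP after each turn."""
    ids: list[int] = []
    for turn in turns:
        ids.extend(turn + [SEP])
    return ids


def classify(turns: list[list[int]]) -> np.ndarray:
    """Run the RNN over the whole conversation and return class probabilities."""
    h = np.zeros(HIDDEN)
    for token in encode(turns):
        h = np.tanh(embeddings[token] @ W_xh + h @ W_hh + b_h)
    logits = h @ W_out + b_out
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


probs = classify([[3, 7, 12], [5, 9], [21, 4, 4, 30]])
print(dict(zip(EMOTIONS, probs.round(3))))
```

Feeding all three turns through one recurrent state is what lets the final prediction depend on context rather than on the last utterance alone; training the weights (and likely using LSTM or GRU cells) is where the real work lies.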

How to speed up LZ4 decompression in ClickHouse?

Reading time: 23 min
When you run queries in ClickHouse, you might notice that the profiler often shows the LZ_decompress_fast function near the top. What is going on? This question had us wondering how to choose the best compression algorithm.

ClickHouse stores data in compressed form. When running queries, ClickHouse tries to do as little as possible, in order to conserve CPU resources. In many cases, all the potentially time-consuming computations are already well optimized, plus the user wrote a well thought-out query. Then all that's left to do is to perform decompression.



So why does LZ4 decompression become a bottleneck? LZ4 seems like an extremely light algorithm: the data decompression rate is usually from 1 to 3 GB/s per processor core, depending on the data. This is much faster than the typical disk subsystem. Moreover, we use all available CPU cores, and decompression scales linearly across all physical cores.
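The speed figures make more sense once you see how little work LZ4 decoding does per byte. Below is a plain-Python sketch of a decoder for the LZ4 block format (a token whose nibbles hold the literal and match lengths, 255-continued length bytes, a 2-byte little-endian offset, and a 4-byte minimum match). It is not ClickHouse's optimized code, and it ignores the LZ4 frame format and the end-of-block constraints that a compressor must respect.

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (no frame header) into bytes."""
    out = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1

        # High nibble: literal length; 15 means extra length bytes follow.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[pos]
                pos += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[pos:pos + lit_len]
        pos += lit_len

        if pos >= len(src):  # the last sequence carries literals only
            break

        # 2-byte little-endian offset back into the already decoded output.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")

        # Low nibble: match length minus the 4-byte minimum match.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[pos]
                pos += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4

        # Copy byte by byte, since a match may overlap the bytes it produces.
        start = len(out) - offset
        for i in range(match_len):
            out.append(out[start + i])
    return bytes(out)


# Hand-built block: literals "abc", a 9-byte match at offset 3, then literals "xy".
block = bytes([0x35]) + b"abc" + bytes([0x03, 0x00]) + bytes([0x20]) + b"xy"
print(lz4_block_decompress(block))
```

Decoding is just length parsing plus memory copies. Optimized implementations typically replace the byte-by-byte match loop with wider copies, which is why the copy strategy, rather than any arithmetic, dominates LZ4 decompression speed.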

Google News and Leo Tolstoy: visualizing Word2Vec word embeddings using t-SNE

Reading time: 7 min

Everyone perceives texts in their own way, whether they are reading news on the Internet or world-famous classic novels. The same applies to a variety of algorithms and machine learning techniques, which understand texts in a more mathematical way, namely, using a high-dimensional vector space.

This article is devoted to visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful to understand how Word2Vec works and how to interpret relations between vectors captured from your texts before using them in neural networks or other machine learning algorithms. As training data, we will use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer who is regarded as one of the greatest authors of all time.

We start with a brief overview of the t-SNE algorithm, then move on to computing word embeddings with Word2Vec, and finally proceed to visualizing the word vectors with t-SNE in 2D and 3D space. We will write our scripts in Python using Jupyter Notebook.
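As a taste of how t-SNE works, here is a NumPy sketch of its low-dimensional half. The map points are compared with a Student-t kernel, q_ij ∝ (1 + ||y_i − y_j||²)⁻¹, and the layout is scored by the KL divergence KL(P‖Q) against the high-dimensional affinities P. Computing P (Gaussian kernels with per-point bandwidths tuned to a target perplexity) and the gradient-descent loop are omitted; in practice a library implementation such as scikit-learn's `TSNE` handles all of this. The 2-D points below are random stand-ins for embedded word vectors.

```python
import numpy as np


def student_t_affinities(Y: np.ndarray) -> np.ndarray:
    """Low-dimensional similarities q_ij (Student-t with one degree of freedom)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(num, 0.0)  # a point is not its own neighbour
    return num / num.sum()


def kl_divergence(P: np.ndarray, Q: np.ndarray, eps: float = 1e-12) -> float:
    """t-SNE cost: KL(P || Q), summed over pairs where p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))


rng = np.random.default_rng(42)
Y = rng.normal(size=(5, 2))  # 5 hypothetical word vectors placed in 2-D
Q = student_t_affinities(Y)
print(Q.sum(), kl_divergence(Q, Q))
```

The heavy-tailed Student-t kernel lets moderately dissimilar words sit far apart in the map, which is what keeps t-SNE clusters visually separated.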


How to generate a huge financial graph with money laundering patterns?

Reading time: 4 min

A couple of years ago, my team (compliance at a Swiss bank) and I had an interesting task to implement: we had to generate a huge random graph of financial transactions between clients, companies and ATMs. Moreover, we wanted this graph to contain money-laundering and other financial-crime patterns, along with node descriptions such as names, addresses, currencies, etc. All data had to be randomly generated from scratch, since we could not use any real data for obvious reasons.

As a solution, we wrote a generator that I’d love to share with you. This article explains why we needed it and how the generator works, but if you'd rather skip the reading and try it on your own, here is the code: https://github.com/MGrin/transactions-graph-generator. I hope our experience will be helpful to some of you.
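The general approach, random background traffic with suspicious motifs planted on top, can be sketched in a few lines. This is not the code from the linked repository: it is a hypothetical illustration with one invented motif (a "cycle" in which money travels around a ring of clients and returns to its origin, shrinking slightly at each hop), and it omits ATMs and node attributes.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    source: str
    target: str
    amount: float


def random_transactions(nodes: list[str], count: int, rng: random.Random) -> list[Transaction]:
    """Background noise: random transfers between two distinct nodes."""
    txs = []
    for _ in range(count):
        src, dst = rng.sample(nodes, 2)
        txs.append(Transaction(src, dst, round(rng.uniform(10, 5_000), 2)))
    return txs


def cycle_pattern(ring: list[str], amount: float, rng: random.Random) -> list[Transaction]:
    """Hypothetical laundering motif: funds hop around a ring back to the start,
    losing 1-5% at each hop (for example, to fees)."""
    txs = []
    for i, src in enumerate(ring):
        dst = ring[(i + 1) % len(ring)]
        txs.append(Transaction(src, dst, round(amount, 2)))
        amount *= rng.uniform(0.95, 0.99)
    return txs


rng = random.Random(7)  # fixed seed so the generated graph is reproducible
clients = [f"client_{i}" for i in range(20)]
companies = [f"company_{i}" for i in range(5)]
graph = random_transactions(clients + companies, 100, rng)
ring = rng.sample(clients, 4)
graph += cycle_pattern(ring, 9_000.0, rng)
print(len(graph), [t.source for t in graph[-4:]])
```

Planting known patterns into otherwise random data gives you labeled ground truth, which is what makes such a synthetic graph useful for testing detection rules.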
