Pull to refresh
24.57

Scala *

General-purpose programming language providing support for functional programming and a strong static type system

Show first
Rating limit
Level of difficulty

User-defined aggregation functions in Spark

Level of difficultyMedium
Reading time6 min
Views719

Below, we will discuss user-defined aggregation functions (UDAF) using org.apache.spark.sql.expressions.Aggregator, which can be used for aggregating groups of elements in a DataSet into a single value in any user-defined way.

Let’s start by examining an example from the official documentation that implements a simple aggregation

Read more

Big Data Tools EAP 12 Is Out: Experimental Python Support and Search Function in Zeppelin Notebooks

Reading time3 min
Views1.2K

Update 12 of the Big Data Tools plugin for IntelliJ IDEA Ultimate, PyCharm Professional Edition, and DataGrip has been released. You can install it from the JetBrains Plugin Repository or from inside your IDE. The plugin allows you to edit Zeppelin notebooks, upload files to cloud filesystems, and monitor Hadoop and Spark clusters.


In this release, we've added experimental Python support and global search inside Zeppelin notebooks. We’ve also addressed a variety of bugs. Let's talk about the details.


Read more →

Big Data Tools Update 11 Is Out

Reading time3 min
Views1.7K

EAP 11 of the Big Data Tools plugin for IntelliJ IDEA Ultimate, PyCharm, and DataGrip is available starting today. You can install it from the JetBrains Plugin Repository or inside your IDE.


Big Data Tools is a new JetBrains plugin that allows you to connect to Hadoop and Spark clusters and monitor nodes, applications, and jobs. It also brings support for editing and running Zeppelin notebooks inside IntelliJ IDEA and DataGrip, so you can create, edit, and run Zeppelin notebooks without ever having to leave your favorite IDE. The plugin offers smart navigation, code completion, inspections, quick-fixes, and refactoring inside notebooks.


Read more →

ZTools for Apache Zeppelin

Reading time8 min
Views1.4K



Zeppelin is a web-based notebook for data engineers that enables data-driven, interactive data analytics with Spark, Scala, and more.


The project recently reached version 0.9.0-preview2 and is being actively developed, but there are still many things to be implemented.


One such thing is an API for getting comprehensive information about what's going on inside the notebook. There is already an API that completely solves the problems of high-level notebook management, but it doesn’t help if you want to do anything more complex.

Read more →

Automate SOAP client auto-generation routines with WSDL import for SBT and Scala

Reading time5 min
Views3.8K

Working with SOAP often gets tricky, and dealing with WSDL might be a huge contribution to the complexity of this task. Really, it could be the least expected thing to face when you are into a modern & fancy language like for example, Scala, that is well known for its reactiveness and asynchronous way of dealing with requests. In fact, many of the software developers that have made their way into industry quite recently, might not even know about SOAP and WSDL protocols, and get quickly annoyed or even enraged when first trying to connect to such a legacy service. So, should we deprecate this altogether in favour of modern technology stack, or maybe there is a less painful solution?

Read more →

Escaping the Thicket of Tests: Building a Shortcut from a Fixture to an Assertion

Reading time15 min
Views1.1K


In this article, I would like to propose an alternative to the traditional test design style using functional programming concepts in Scala. This approach was inspired by many months of pain from maintaining dozens of failing tests and a burning desire to make them more straightforward and more comprehensible.


Even though the code is in Scala, the proposed ideas are appropriate for developers and QA engineers who use languages supporting functional programming. You can find a Github link with the full solution and an example at the end of the article.

Read more →

Compilable configuration of a distributed system

Reading time17 min
Views1.4K

In this post we'd like to share an interesting way of dealing with configuration of a distributed system.
The configuration is represented directly in Scala language in a type safe manner. An example implementation is described in details. Various aspects of the proposal are discussed, including influence on the overall development process.


Overall configuration management process


(на русском)

Read more →