Data Engineering

discuss data collection and preparation


Konard Apr 1 at 12:15

The Links Theory 0.0.2

Medium

27 min

Data Engineering, Open source, Mathematics, Abnormal programming, Programming

Translation

This world needs a new theory — a theory that could describe all the theories on the planet. A theory that could easily describe philosophy, mathematics, physics, and psychology. The one that makes all kinds of sciences computable.

This is exactly what we are working on. If we succeed, this theory will become the unified meta-theory of everything.

A year has passed since our last publication, and our task is to share the progress with our English-speaking audience. This is still not a stable version; it’s a draft. Therefore, we welcome any feedback, as well as your participation in the development of the links theory.

As with everything we have done before, the links theory is published and released into the public domain: it belongs to humanity, which means it is yours. This work has many authors, but the work itself is far more important than any specific authorship. We hope that today it can become useful to more people.

We invite you to become a part of this exciting adventure.



lukyanchikov Mar 13 at 10:46

jBPM as AI Orchestration Platform

Easy

4 min

647

Artificial Intelligence, Data Engineering, Open source, Openshift, Visual programming

Review

Author: Sergey Lukyanchikov, C-NLTX/Open-Source

Disclaimer: The views expressed in this document reflect the author's subjective perspective on the current and potential capabilities of jBPM.

This text presents jBPM as a platform for orchestrating external AI-centric environments, such as Python, that are used for designing and running AI solutions. We provide an overview of the jBPM functionality most relevant to AI orchestration and walk through a practical example that demonstrates its effectiveness as an AI orchestration platform.

ValRakitine Feb 9 at 14:53

Eco-Methodological Sustainability

6 min

841

Abnormal programming, System Analysis and Design, IT Infrastructure, Developer Relations, Data Engineering

Analytics

Recovery Mode

In recent years, discussions about the environmental impact of information and communication technologies (ICTs) have largely revolved around hardware — data centers, electronic waste, and energy consumption. However, an equally important factor has been overlooked: the software development methodologies themselves.

When I read the UNCTAD “Digital Economy Report 2024”, I was struck by the complete absence of any mention of how programming methodologies impact sustainability. There was no discussion of whether developers use algorithm-centric or code-centric methodologies when creating software, nor how these choices affect the environment.

This realization led me to introduce the concept of Eco-Methodological Sustainability — a new approach that highlights the role of structured software development methodologies in shaping an environmentally sustainable future for the digital economy.

Falcon_eye Jan 11 at 14:55

Apache Kafka… Basics to drive

Medium

5 min

2.2K

Data Engineering, Data storage, Big Data

Review

Apache Kafka is a distributed event-streaming platform designed to handle real-time data feeds. It allows applications to publish, process, and subscribe to streams of data in a highly scalable, fault-tolerant manner.
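The publish/subscribe idea in the teaser can be illustrated with a toy in-memory log. This is only a sketch of the concept, not Kafka's client API: the class name TopicLog, its methods, and the group names are hypothetical, and there is no networking, partitioning, or fault tolerance here.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self):
        self._records = []                 # records in publish order
        self._offsets = defaultdict(int)   # next unread index per group

    def publish(self, record):
        """Append a record and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread records for a group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
log.publish("order-created")
log.publish("order-paid")
first_batch = log.poll("analytics")      # reads both records
second_batch = log.poll("analytics")     # nothing new yet
billing_batch = log.poll("billing", 1)   # independent offset, reads one record
```

Each group keeps its own offset, so several subscribers can read the same stream at their own pace without removing records for the others.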

ArcaneGamingcom Dec 5 2024 at 15:45

How to Choose the Optimal Authentication Solution for Your Application

Medium

3 min

1.6K

API, Asterisk, Big Data, Data Engineering, Email-marketing

Retrospective

In today's digital world, where applications process increasing amounts of sensitive data, ensuring reliable user authentication is critical. Authentication is the process of verifying the identity of a user who is trying to access a system. A properly chosen authentication method protects data from unauthorized access, prevents fraud, and increases user confidence.

However, with the development of technology, new authentication methods are emerging, and choosing the optimal solution can be difficult. This article will help developers and business owners understand the variety of authentication approaches and make informed choices.

Falcon_eye Jul 24 2024 at 21:15

How to set up Apache Airflow in 10 minutes via Docker

Medium

2 min

Data Engineering, Python, Big Data

Tutorial

Prerequisites:
1. Install Docker
2. Install VSCode

STEP BY STEP

1. Open the VSCode you installed earlier and click the "Extensions" tab on the menu bar, then type 'docker' to find the Docker extension and click "Install".

Nikiz May 24 2024 at 09:47

Utilizing Wearable Digital Health Technologies for Cardiovascular Monitoring

Medium

17 min

533

Manufacture and development of electronics, Biotechnologies, IOT, Data Engineering

Case

Wearable Digital Health Technologies for Monitoring in Cardiovascular Medicine

This review article presents a three-part true-life clinical vignette that illustrates how digital health technology can aid providers caring for patients with cardiovascular disease. Specific information that would identify real patients has been removed or altered. Each vignette is followed by a discussion of how these methods were used in the care of the patient.

Ninil Apr 1 2024 at 19:10

User-defined aggregation functions in Spark

Medium

6 min

Data Engineering, Big Data, Scala

Below, we will discuss user-defined aggregation functions (UDAF) using org.apache.spark.sql.expressions.Aggregator, which can be used to aggregate groups of elements in a Dataset into a single value in any user-defined way.

Let’s start by examining an example from the official documentation that implements a simple aggregation.
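Spark's Aggregator is built around four operations: zero, reduce, merge, and finish. As a rough, language-neutral illustration of that contract (not the Scala code from the article or the documentation), here is a plain-Python sketch of an averaging aggregator. The names Average, MyAverage, and aggregate are assumptions made for this sketch, and the encoders that the real class also requires are omitted.

```python
from dataclasses import dataclass
from functools import reduce as fold


@dataclass
class Average:
    """Intermediate buffer: running total and number of values seen."""
    total: float = 0.0
    count: int = 0


class MyAverage:
    """Plain-Python stand-in for the zero/reduce/merge/finish contract."""

    def zero(self):
        # Neutral starting buffer for each partition.
        return Average()

    def reduce(self, buffer, value):
        # Fold one input value into a buffer.
        return Average(buffer.total + value, buffer.count + 1)

    def merge(self, left, right):
        # Combine two buffers computed on different partitions.
        return Average(left.total + right.total, left.count + right.count)

    def finish(self, buffer):
        # Turn the final buffer into the output value.
        return buffer.total / buffer.count if buffer.count else None


def aggregate(aggregator, partitions):
    """Reduce each partition separately, then merge the partial buffers."""
    partials = [fold(aggregator.reduce, part, aggregator.zero())
                for part in partitions]
    merged = fold(aggregator.merge, partials, aggregator.zero())
    return aggregator.finish(merged)


average_salary = aggregate(MyAverage(), [[3000, 4500], [3500, 4000]])
```

Keeping reduce and merge separate is what lets the aggregation run on partitions independently before the partial results are combined.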

barbarisanet Mar 4 2024 at 03:00

Validation of custom data with Minimal API filters, sensible and without fuss

Medium

6 min

1.8K

.NET, API, C#, Microsoft Azure, Data Engineering

FAQ

Translation

This is a step-by-step exploration of a clear and intuitive approach to validating custom data in .NET applications with the help of Minimal API filters and FlatValidator.

rumatavz Jan 16 2024 at 19:42

Evaluating Performance: CosmosDB vs. Azure SQL

Easy

4 min

7.3K

SQL, NoSQL, Microsoft SQL Server, Microsoft Azure, Data Engineering

Analytics

In the evolving landscape of database technology, choosing the right database management system is crucial for the efficiency and scalability of applications. This article presents a detailed comparison of the performance between Microsoft's CosmosDB and MS SQL Server. We'll examine how each database performs under various load conditions and share some interesting findings.

ArcaneGamingcom Oct 30 2023 at 08:55

UX Designer Job Interview: 10 questions to answer, 5 questions to ask

Easy

6 min

846

SQL, IT Standards, Usability, GitHub, Data Engineering

Retrospective

Recovery Mode

No matter how many degrees you have or how much experience, recruiters need to evaluate your knowledge of UX design as a whole. But keep in mind that a job interview is not an exam: you are expected not to recite textbook definitions learned by heart, but rather to share your personal understanding of UX and your role as a designer in general. Consider talking about how you define UX, what creates value in a design, which parts of a UX design process are essential, and what the current trends in UX are. You might also be asked to explain the difference between UI and UX to show how you understand the role of each in the development process.

gfx_pro Oct 9 2023 at 15:55

A (more) accurate camera sensor dynamic range measurement

7 min

1.4K

Photographic equipment, Data Engineering

Analytics

Translation

Hello, everyone! In this post, let's talk about how to (more) accurately measure the dynamic range of a camera sensor and what can be done with these measurements.

Of course, I am not an expert in computer vision, a programmer or a statistician, so please feel free to correct me in the comments if I make mistakes in this post. Here my interest was primarily focused on everyday and practical tasks, such as photography, but I believe the results may also be useful to computer vision professionals.

kirill702b Aug 3 2023 at 12:02

How to access real-time smart contract data from Python code (using Lido contract as an example)

Medium

7 min

Decentralized networks, Python, Solidity, Cryptocurrencies, Data Engineering

Tutorial

Let’s imagine you need access to the real-time data of some smart contracts on Ethereum (or Polygon, BSC, etc.), like Uniswap or even PEPE coin, to analyze their data using the standard data scientist/analyst tools: Python, Pandas, Matplotlib, etc. In this tutorial, I’ll show you more sophisticated data access tools that are more like a surgical scalpel (The Graph subgraphs) than a well-known Swiss Army knife (RPC node access) or hammer (ready-to-use APIs). I hope my metaphors don’t scare you.

Z1at Jun 13 2023 at 17:51

Mathematical meaning of principal component analysis (PCA)

Medium

7 min

2.1K

Big Data, Data Engineering

This article aims to explain the mathematical meaning of principal component analysis (PCA) in practice.

Z1at Jun 2 2023 at 18:13

Pixel image rotation

Easy

13 min

1.5K

C, Data Engineering

Brief problem formulation

The program takes as input the absolute path to an image in BMP format and the path where the result should be saved. It then rotates the image 90 degrees counterclockwise and saves the new image.

The program is written in C.
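The core of such a program is the index mapping for a 90-degree counterclockwise turn. The following is a minimal Python sketch of that mapping only, assuming the pixels have already been read into a list of rows. BMP parsing and writing are left out, the article's own C code may be organized differently, and the function name rotate_ccw is hypothetical.

```python
def rotate_ccw(pixels):
    """Rotate a row-major pixel grid by 90 degrees counterclockwise.

    The right-most column of the input becomes the top row of the output,
    so an image of height H and width W becomes one of height W and width H.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    return [
        [pixels[row][width - 1 - new_row] for row in range(height)]
        for new_row in range(width)
    ]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
rotated = rotate_ccw(image)
```

Applying the function four times returns the original grid, which is a convenient sanity check for the mapping.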

Z1at May 31 2023 at 11:45

Blinking into Morse code

Easy

10 min

2.3K

Data Engineering, IT Infrastructure, Python

From sandbox

Explaining the main algorithm.

For a while I’ve been thinking of writing a scientific article. I wanted it to have some practical use.

Morse code is binary: each signal takes one of two values, either a dot (short) or a dash (long). I decided that short (s) would stand for a blink of both eyes, while long (l) would be a blink of the left eye. Another question emerged: how do we know when the recording of one symbol ends?

The space between two symbols can be represented by a right-eye blink (r). After entering a single symbol made of shorts (dots) and longs (dashes), I blink my right eye once to mark the space before the next symbol.

To separate words, one blinks the right eye twice, which gives rr.

Hence, I collect an ordered sequence of the symbols r, l, and s that can be converted into full-fledged text. Once the transformation is done, I get the answer.
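The conversion step can be sketched as follows. This is a minimal decoder written under stated assumptions, not the article's implementation: a "symbol" is taken to be one Morse-encoded letter, a single r separates letters, rr separates words, only the letters A to Z are covered, and unknown codes become "?". The names MORSE_TO_LETTER and decode_blinks are hypothetical.

```python
# International Morse code for the letters A-Z.
MORSE_TO_LETTER = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}

# s (both eyes) is a dot, l (left eye) is a dash.
BLINK_TO_MORSE = str.maketrans("sl", ".-")


def decode_blinks(blinks):
    """Decode a string of s/l/r blink symbols into upper-case text."""
    words = []
    for raw_word in blinks.split("rr"):       # rr separates words
        letters = [
            MORSE_TO_LETTER.get(code.translate(BLINK_TO_MORSE), "?")
            for code in raw_word.split("r")   # r separates letters
            if code                           # skip empty pieces
        ]
        if letters:
            words.append("".join(letters))
    return " ".join(words)


message = decode_blinks("ssssrssrrllrlllrll")
```

Splitting on rr first and on r second keeps the word boundary from being mistaken for two letter boundaries.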

AmiraB2 May 30 2023 at 07:53

Feature Engineering: Techniques and Best Practices for Data Scientists

8 min

2.5K

Big Data, Data Engineering

Tutorial

The most important stage in the data science process is feature engineering: turning raw data into useful features that can improve the performance of machine learning models. It calls for creativity, data-driven thinking, and domain expertise. By choosing, combining, and inventing relevant features, data scientists can improve the predictive power of their models and find hidden patterns in the data. Feature engineering techniques include handling missing data, scaling features, encoding categorical variables, and constructing interaction terms. Best practices involve exploring the data, testing and refining features iteratively, and applying domain knowledge to draw out important information. Effective feature engineering significantly influences the accuracy and effectiveness of machine learning models.
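To make the listed techniques concrete, here is a small standard-library sketch of four of them: mean imputation, min-max scaling, one-hot encoding, and an interaction term. The function names and sample values are illustrative assumptions. In practice a library such as pandas or scikit-learn would usually handle these steps.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector, one column per sorted level."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


def interaction(xs, ys):
    """Build an interaction term as the element-wise product of two features."""
    return [x * y for x, y in zip(xs, ys)]


ages = impute_mean([20.0, None, 40.0])
scaled_ages = min_max_scale(ages)
city_columns = one_hot(["paris", "berlin", "paris"])
age_times_visits = interaction(ages, [1, 2, 3])
```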

Evrone Nov 16 2022 at 11:13

How we designed the user interface for an enterprise analytical system

5 min

Singula Team corporate blog, Data Engineering, CGI, Big Data

In 2021, we were contacted by an industrial plant that was faced with the need to create a system for analyzing processes in its production. The enterprise team studied ready-made solutions, but none of the analytics system designs fully covered the required functionality. So they turned to us with a request to develop their own analytical system that would collect data from all machines and allow it to be analyzed to see bottlenecks in production. For this project, we created a data-driven UI/UX design and also developed a web-based interface for the equipment monitoring system.

IgKend Oct 21 2022 at 18:42

How Yandex Made Their Biggest Improvement in the Search Engine with the Help of Toloka

5 min

2.3K

Search engines, Data Mining, Machine learning, Artificial Intelligence, Data Engineering

Tutorial

Toloka is a crowdsourcing platform and microtasking project launched by Yandex to quickly mark up large amounts of data. But how can such a simple concept play a crucial role in improving the work of neural networks?


alexandervolchek Jul 11 2022 at 10:02

What are neural networks and what do we need them for?

4 min

4.7K

Mathematics, Machine learning, Data Engineering

Explaining through simple examples

For a long time, people have been thinking about how to create a computer that could think like a person. The advent of artificial neural networks is a significant step in this direction. Our brain consists of neurons that receive information from sensory organs and process it: we recognize people we know by their faces, and we feel hungry when we see delicious food. All of this is the result of brain neurons working and interacting with each other. This is also the principle that artificial neural networks are based on, simulating the processes occurring in the human brain.

What are neural networks

An artificial neural network is software that imitates the way a brain works and is capable of self-learning. Like a biological network, an artificial network also consists of neurons, but they have a simpler structure.

If you connect neurons into a sufficiently large network with controlled interaction, they will be able to perform quite complex tasks. For example, determining what is shown in a picture, or independently creating a photorealistic image based on a text description.
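As a small illustration of how connected neurons can do more than a single one, the sketch below wires three step-activation neurons into a network that computes XOR, which no single neuron of this kind can compute. The weights are hand-picked assumptions for the example, so no learning takes place here, and the function names are hypothetical.

```python
def neuron(inputs, weights, bias):
    """Fire (return 1) when the weighted sum of the inputs plus bias is positive."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def xor_network(a, b):
    """Two hidden neurons (OR and NAND) feed one output neuron (AND)."""
    either = neuron([a, b], [1, 1], -0.5)       # fires if a or b is 1
    not_both = neuron([a, b], [-1, -1], 1.5)    # fires unless both are 1
    return neuron([either, not_both], [1, 1], -1.5)


truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

A trained network works the same way in principle, except that the weights and biases are adjusted automatically from examples instead of being chosen by hand.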