Lingtrain Aligner. How to make parallel books for language learning. Part 1. Python and Colab version

Reading time: 8 min
Views: 4.1K

If you're interested in learning or teaching languages, you probably know about parallel reading. It helps you immerse yourself in the context, expands your vocabulary, and lets you enjoy the learning process. When it comes to reading, you most likely want to choose your favorite author, a theme you like, or something familiar, and this is often impossible if no one has published a parallel edition of that book. It gets even worse when you're learning some cool language like Hungarian or Japanese.


Today we are taking a big step toward fixing this situation.


We will use the lingtrain_aligner tool. It's an open-source Python project that aims to help everyone eager to learn foreign languages. It's part of the Lingtrain project; you can follow us on Telegram, Facebook, and Instagram. Let's start!


Find the texts


First, we need to find the two texts we want to align. Let's take two editions of "To Kill a Mockingbird" by Harper Lee: the Russian translation and the English original.


Millions of orders per second matching engine testing

Reading time: 4 min
Views: 9.9K

Some time ago I worked on a matching engine for a cryptocurrency exchange. It was an interesting and challenging experience. I developed it in pure C++ from scratch. Testing it is also quite a challenging task: you need to get data for testing, run the tests, collect statistics, and finally analyze the collected data to find weak points and bottlenecks. I want to focus on testing the C++ matching engine and show how testing can give insights for optimizations even without changing the code. The matching engine I developed can handle more than 1,000,000 TPS (transactions per second) and is 10 times faster than the matching engine of the Binance cryptocurrency exchange (see one post on Binance Blog).


PVS-Studio 7.14: intermodular analysis in C++ and plugin for JetBrains CLion

Reading time: 4 min
Views: 777

The PVS-Studio team increases the number of diagnostics with each new release. We are also improving the analyzer's infrastructure. This time we added a plugin for JetBrains CLion. We also introduced intermodular analysis of C++ projects and sped up the C# analyzer core.


DSO138 upgrade

Reading time: 11 min
Views: 5.2K

AliExpress sells an interesting toy, an oscilloscope called the DSO138, at a very low price. It has already become quite popular among electronics enthusiasts, but its specifications, alas, make it more or less usable only for debugging very low-frequency circuits. In fact, it is positioned not as a tool but rather as a DIY kit for novice electronics engineers.

This "toy" oscilloscope is built around the STM32F103 microcontroller. Despite a fairly competent digital design, a fairly decent 320×240-pixel color display, and an analog path that is far from the worst, everything, alas, is ruined by the very weak ADCs on board the STM32F103. The claimed 200 kHz bandwidth can be accepted as such only with a very large stretch. Yes, it will show the presence or absence of a signal at that frequency, but you will not be able to really examine anything beyond that.

At the same time, the 103 series has a slightly more powerful brother, the STM32F303. It is almost completely pin-compatible but significantly better in the parameters we care about: it has 4 ADCs on board with a conversion rate of 5 MHz (6 MHz at 10-bit resolution). So if you run all 4 ADCs in parallel at 10-bit resolution, you can get an effective sampling rate of up to an honest 24 MSPS (millions of samples per second). The microcontroller is also inexpensive; you can easily find it on the same AliExpress for very reasonable money. Naturally, the idea of swapping the microcontroller arose almost immediately after I tried the DSO138.
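The 24 MSPS figure is simple arithmetic, and a tiny Python sketch makes it explicit. It assumes the four ADCs sample the same input with evenly staggered triggers (time interleaving), which is an assumption about how "in parallel" is meant here; the function names are hypothetical and only illustrate the calculation.

```python
def interleaved_sample_rate(adc_count: int, per_adc_msps: float) -> float:
    """Aggregate sample rate (MSPS) of several ADCs sampling one signal.

    Assumes ideal time interleaving: every ADC runs at the same rate and
    their triggers are evenly staggered, so the rates simply add up.
    """
    return adc_count * per_adc_msps


def trigger_stagger_ns(adc_count: int, per_adc_msps: float) -> float:
    """Delay between consecutive ADC triggers, in nanoseconds."""
    aggregate_msps = interleaved_sample_rate(adc_count, per_adc_msps)
    return 1000.0 / aggregate_msps  # 1 / MSPS is microseconds; x1000 gives ns


if __name__ == "__main__":
    # Figures quoted in the text: 4 ADCs, 6 MSPS each at 10-bit resolution.
    print(interleaved_sample_rate(4, 6.0))  # 24.0
    # At the 5 MSPS rate quoted for full resolution the total is lower.
    print(interleaved_sample_rate(4, 5.0))  # 20.0
    print(trigger_stagger_ns(4, 6.0))
```

In a real design the achievable rate also depends on clocking and on the analog front end, so this is an upper bound rather than a measured result.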

Once upgraded, the toy can turn into a full-fledged tool that even professionals, not just novice amateurs, could use. With these thoughts in mind, I decided to try to do something with the Chinese toy in my free time.


Speed Dating with the CodeIgniter 4 on GitHub

Reading time: 2 min
Views: 2.9K

Getting acquainted with the CodeIgniter 4 PHP framework is quite simple.

Spend an evening following the instructions in the «Build Your First Application» section. The documentation is written in good, technically clear language, so you can even get some aesthetic pleasure from reading this and other sections.

From then on, the sense of lightness and consistency of the CodeIgniter 4 project will stay with you everywhere.

What is so attractive about CodeIgniter 4?


AngouriMath 1.3 update

Reading time: 5 min
Views: 4.3K

Four months of awesome work together with a few new contributors have finally resulted in a new major release, which I'm happy to announce.

We now have completely new matrices, an improved parser, a lot of new functions, an almost entirely rewritten interactive package (for working in Jupyter), and much more.

This article is about a big update to a FOSS symbolic algebra library for .NET. I hope it will be interesting to someone!


How to be good in hackathons as a developer? Practice creating simple pet projects

Reading time: 6 min
Views: 3.5K

Hackathons can be very intimidating and stressful. The key to getting better is doing simple projects. In this article, we will look at an example of a web app that you can use to sharpen your skills when preparing for a hackathon. We will use a powerful Google API based on machine learning and apply the following technologies: ASP.NET, HTML, Docker, Heroku, and Git.


You don't know Redis

Reading time: 8 min
Views: 4.4K

Originally posted on DEV.to

In my previous post, I touched on the point that Redis is more than just an in-memory cache.

Most people do not even consider Redis as a primary database. There are a lot of use cases where Redis is a perfect choice for non-cache related tasks.

In this article, I will demonstrate how I built a fully functional Q&A board for asking and upvoting the most interesting questions. Redis will be used as a primary database.

I will use Gatsby (React), Netlify serverless functions and Upstash Serverless Redis.

Upstash has been a good choice so far and I decided to try it out in a more serious project. I love everything serverless and how it makes things simpler for me.

Serverless will be a great choice for most tasks; however, you need to know the pros and cons of the tech you are using. I encourage you to learn more about serverless to get the most out of it.


How WCF Shoots Itself in the Foot With TraceSource

Reading time: 11 min
Views: 979

We don't often get the chance to write about parallel programming issues. This time we "got lucky". The standard TraceEvent method has some implementation peculiarities, and they resulted in an error where multiple threads blocked. So we'd like to warn users about this nuance and cover this interesting case from our user support practice. Why was our support involved? Keep reading to find out. Enjoy the read!


«No Windows no problems» What?

Reading time: 5 min
Views: 4.3K

Hi, my name is Alex and I am a DevOps engineer at Altenar. “No Windows, no problems.” That is the answer I got when I asked an Ansible guru "How do you manage Windows?" at one of the local Ansible meetups. Although we have been running a modern stack (k8s, helm, .net core, etc.) in production for about two years, that’s not how it has always been.


Show me your settings and I will tell you who you are

Reading time: 5 min
Views: 1.6K

Today we will discuss the silver bullet of software development: the program settings.

Everybody understands what settings are, more or less. Every computer user is going to stumble upon them eventually. But the user does not always come out of this encounter a winner. There are 3 major problems to tackle: it's hard to find the right setting, the required setting does not exist, and it's unclear what this or that setting is responsible for.

To understand this, let's first think about the origin of settings. In theory, settings provide a way for the developer to adapt a program to a certain use case. Users are different, though: one wants it his way, another demands her own, and even though the differences are minuscule, the decision is left up to the user. As a result, you must know the users' tasks and goals to create the right settings window.

That's the theory so far. What about the practice?

High-Quality Text-to-Speech Made Accessible, Simple and Fast

Reading time: 8 min
Views: 11K

There is a lot of commotion in text-to-speech now. There is a great variety of toolkits, a plethora of commercial APIs from GAFA companies (based both on new and older technologies). There are also a lot of Silicon Valley startups trying to ship products akin to "deep fakes" in speech.


But despite all this ruckus we have not yet seen open solutions that would fulfill all of these criteria:


  • Natural-sounding speech;
  • A large library of voices in many languages;
  • Support for 16kHz and 8kHz out of the box;
  • No GPUs / ML engineering team / training required;
  • Unique voices not infringing upon third-party licenses;
  • High throughput on slow hardware. Decent performance on one CPU thread;
  • Minimalism and lack of dependencies. One-line usage, no builds or coding in C++ required;
  • Positioned as a solution, not yet another toolkit / compilation of models developed by other people;
  • Not affiliated by any means with ecosystems of Google / Yandex / Sberbank.

We decided to share with the community our open non-commercial solution that fits all of these criteria. Since we have published the whole pipeline, we do not focus much on cherry-picked examples, and we encourage you to visit our project's GitHub repo to test our TTS for yourself.

Algorithms in Go

Reading time: 2 min
Views: 5.8K

Most solutions to algorithmic problems can be grouped into a rather small number of patterns. When we start to solve a problem, we need to think about how we would classify it. For example, can we apply the fast and slow algorithmic pattern, or do we need to use the cyclic sort pattern? Some problems have several solutions based on different patterns. In this series, we discuss the most popular algorithmic patterns that cover more than 90% of the usual problems.

It is different from a High-School Algorithms 101 course, as it is not intended to cover things like the Karatsuba algorithm (a fast multiplication algorithm) or proofs of different sorting methods. Instead, Algorithmic Patterns focuses on the practical skills needed to solve common problems. For example, when we set up a Prometheus alert for high request latency, we are dealing with the Sliding Window pattern. Or, let's say, we organize a team event and need to find a time slot that is available for every participant. At first glance, it is not obvious that in this case we are actually solving an algorithmic problem. In fact, during our day we usually solve a bunch of algorithmic problems without realizing that we are dealing with algorithms.
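To make the team-event example concrete, here is a minimal Python sketch (the series itself is written in Go) that treats it as a Merge Intervals problem. The function names, the minutes-since-midnight representation, and the sample calendar are all assumptions made for illustration, not code from the series.

```python
from typing import List, Optional, Tuple

# (start, end) in minutes since midnight; the end is exclusive.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def first_free_slot(
    busy: List[Interval], day_start: int, day_end: int, duration: int
) -> Optional[Interval]:
    """Return the earliest slot of `duration` minutes free for everyone."""
    cursor = day_start
    for start, end in merge_intervals(busy):
        # The gap between the cursor and the next busy block (capped at day end).
        if min(start, day_end) - cursor >= duration:
            return (cursor, cursor + duration)
        cursor = max(cursor, end)
    if day_end - cursor >= duration:
        return (cursor, cursor + duration)
    return None


if __name__ == "__main__":
    # Busy times of all participants pooled together (hypothetical data).
    busy = [(540, 600), (570, 660), (720, 780)]
    print(first_free_slot(busy, 540, 1020, 60))  # (660, 720), i.e. 11:00-12:00
```

Pooling everyone's busy intervals and merging them reduces "free for every participant" to finding a gap in a single sorted list.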

The knowledge about Algorithmic Patterns helps one to classify a problem and then apply the appropriate method.

But probably most importantly, learning algorithmic patterns boosts general programming skills. It is especially helpful when you are debugging production code, as it trains you to understand the execution flow.

Patterns covered so far:

Sliding Window I

Sliding Window II

Merge Intervals

Dutch National Flag

Matrix Spiral

Iterative Postorder Traversal

Bit Manipulation

Stay tuned :)

<Promo> If you are interested in working as a backend engineer, there is an open position in my squad. Prior knowledge of Golang is not required. I am NOT an HR and DO NOT represent the company in any capacity. However, I can share my personal experience as a backend engineer working in the company. </Promo>


Startups going global: a guide to Product Hunt

Reading time: 4 min
Views: 2.1K

Product Hunt is a Y Combinator-backed discovery platform, founded by Ryan Hoover in 2013. Conceived as an email list, it has gone on to become one of the most popular directories, raised $7.5 million in backing, and was acquired by AngelList (a social network for entrepreneurs) in December 2016.

Exposure on the platform contributed to viral successes of Yo and Ship Your Enemies Glitter, and brought multi-million dollar companies, like Robinhood and Gimlet Media, to the public eye.


Algorithms in Go: Sliding Window Pattern

Reading time: 3 min
Views: 5.9K

Let's consider the following problem: we have an array of integers and need to find the length of the smallest subarray whose sum is no less than the target number. If there is no such subarray, we return -1.

We can start with a naive approach and consider every possible subarray in the input:
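Here is a minimal sketch of that naive approach, written in Python rather than the Go used in the series. The function names are hypothetical. For comparison it is followed by a sliding-window version, which relies on the extra assumption that all numbers are non-negative.

```python
from typing import List


def min_subarray_len_naive(nums: List[int], target: int) -> int:
    """Check every start index and grow the subarray until the sum fits.

    Runs in O(n^2) time and works for any integers.
    """
    best = -1
    for start in range(len(nums)):
        total = 0
        for end in range(start, len(nums)):
            total += nums[end]
            if total >= target:
                length = end - start + 1
                if best == -1 or length < best:
                    best = length
                break  # first hit is the shortest subarray for this start
    return best


def min_subarray_len_window(nums: List[int], target: int) -> int:
    """Sliding window in O(n) time; assumes nums holds non-negative integers."""
    best = -1
    window_sum = 0
    left = 0
    for right, value in enumerate(nums):
        window_sum += value
        # Shrink from the left while the window still satisfies the target.
        while window_sum >= target and left <= right:
            length = right - left + 1
            if best == -1 or length < best:
                best = length
            window_sum -= nums[left]
            left += 1
    return best


if __name__ == "__main__":
    nums = [2, 1, 5, 2, 3, 2]
    print(min_subarray_len_naive(nums, 7))   # 2, from the subarray [5, 2]
    print(min_subarray_len_window(nums, 7))  # 2
```

The window version visits each element at most twice (once entering, once leaving), which is where the linear running time comes from.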


Playing with Nvidia's New Ampere GPUs and Trying MIG

Reading time: 11 min
Views: 4.4K


Every time the essential question arises of whether or not to upgrade the cards in the server room, I look through similar articles and watch such videos.


The channel with the aforementioned video is very underrated, although the author does not deal with ML. In general, when analyzing comparisons of accelerators for ML, several things usually catch your eye:


  • The authors usually take into account only the "adequacy" of new cards for the United States market;
  • The ratings are out of touch with everyday practice and are run on very standard networks (which is probably good overall) without details;
  • The popular mantra of training more and more gigantic models skews the comparison;

The answer to the question "which card is better?" is not rocket science: cards of the 20* series never became very popular, while the 1080 Ti from Avito (the Russian Craigslist) is still very attractive (and, oddly enough, isn't getting cheaper, probably for this reason).


All this is fine and dandy, and the standard benchmarks are unlikely to lie too much, but recently I learned about Multi-Instance GPU (MIG) technology for A100 cards and native TF32 support on Ampere devices, and I got the idea to share my experience of actually testing cards based on the Ampere architecture (the 3090 and the A100). In this short note, I will try to answer the following questions:


  • Is the upgrade to Ampere worth it? (spoiler for the impatient: yes);
  • Is the A100 worth the money? (spoiler: in general, no);
  • Are there any cases when the A100 is still interesting? (spoiler: yes);
  • Is MIG technology useful? (spoiler: yes, but for inference and for very specific training cases);

The Different RPG styles between East and West

Reading time: 6 min
Views: 4.5K

The Chinese Mobile RPG: the Genre of Giants.


According to data gathered in 2019 by Game Refinery’s Joel Julkunen, Chinese RPGs make up 56% of the top 500 grossing iOS games in China. Let’s take a look at the genre and at some important points for game developers and RPG enthusiasts to consider about Chinese mobile RPGs.


An RPG, or role-playing game, is a game where the player plays as a character, often the main character, in a fictional game world.


Part 2: Upsetting Opinions about Static Analyzers

Reading time: 4 min
Views: 1.1K

With the article "Upsetting Opinions about Static Analyzers", we meant to get it off our chest and peacefully let it all go. However, the article unexpectedly triggered a strong response. Unfortunately, the discussion went in the wrong direction, so now we will make a second attempt to explain our view of the situation.

Using Flex (Fast Lexical Analyzer Generator)

Reading time: 5 min
Views: 8.6K

Lexical analysis is the first stage of the compilation process. It produces a sequence of tokens from source code: it takes an input character sequence and determines which token starts at the current position, whether it's a language keyword, an identifier, a constant (also called a literal), or perhaps an error. A lexical analyzer (also known as a tokenizer) passes the stream of tokens on to a parser, which builds an AST (abstract syntax tree).

It's possible to write a lexer from scratch, but it's much more convenient to use a lexer generator. If we define rules that correspond to the syntax of the input language, we get a complete lexical analyzer (tokenizer), which can extract tokens from the input program text and pass them to a parser.
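To illustrate the idea of "rules in, token stream out", here is a small hand-written tokenizer in Python built on the standard `re` module. It is only a sketch: the token names, the keyword set, and the toy language are assumptions. It also tries the rules in listed order, whereas a generated lexer typically prefers the longest match.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# Keywords of a hypothetical toy language.
KEYWORDS = {"if", "else", "while", "return"}

# Each rule is (token kind, regular expression); order matters.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|!=|<=|>=|[-+*/=<>(){};]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),  # any other single character is a lexical error
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from `source`, skipping whitespace."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the rule that matched
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        yield Token(kind, text, match.start())


if __name__ == "__main__":
    for token in tokenize("if (x1 >= 42) return y;"):
        print(token.kind, repr(token.text))
```

A parser would then consume these `Token` values one by one instead of working with raw characters.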

One such generator is Flex. In this article, we'll examine how it works in general and look at some nontrivial nuances of developing a lexer with Flex.