InvokeAI 2.2 is now available to everyone. This update brings in exciting features, like UI Outpainting, Embedding Management and more. See highlighted updates below, or the full release notes for everything included in the release.
Image processing *
Working with photos and videos
I trained a neural network on my drawings and give the model for free (and teach you to create your own)
The InvokeAI team is excited to share our latest feature release, with a set of new features, UI enhancements, and CLI capabilities.
The present invention relates to an analog signal capturing devices generally and monochrome or color image capture sensors, such as a scanner or a Charge-Coupled-Device (“CCD”) for video and photo camera in particular, which are almost free from moiré and aliasing. The present invention relates to methods for enhancing the resolution of an image capture device and device for digital color/grey image displaying also.
Multimodality has led the pack in machine learning in 2021. Neural networks are wolfing down images, text, speech and music all at the same time. OpenAI is, as usual, top dog, but as if in defiance of their name, they are in no hurry to share their models openly. At the beginning of the year, the company presented the DALL-E neural network, which generates 256x256 pixel images in answer to a written request. Descriptions of it can be found as articles on arXiv and examples on their blog.
As soon as DALL-E flushed out of the bushes, Chinese researchers got on its tail. Their open-source CogView neural network does the same trick of generating images from text. But what about here in Russia? One might say that “investigate, master, and train” is our engineering motto. Well, we caught the scent, and today we can say that we created from scratch a complete pipeline for generating images from descriptive textual input written in Russian.
In this article we present the ruDALL-E XL model, an open-source text-to-image transformer with 1.3 billion parameters as well as ruDALL-E XXL model, an text-to-image transformer with 12.0 billion parameters which is available in DataHub SberCloud, and several other satellite models.
This article continues a series of notes about colorization. During today's experiment, we’ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.
This is a practical project, so we won’t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide public in hand waving terms would become misguiding.
A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.
Google Colorizing Transformer vs Deoldify
Channel with the aforementioned video is very underestimated, but the author does not deal with ML. In general, when analyzing comparisons of accelerators for ML, several things usually catch your eye:
- The authors usually take into account only the "adequacy" for the market of new cards in the United States;
- The ratings are far from the people and are made on very standard networks (which is probably good overall) without details;
- The popular mantra to train more and more gigantic models makes its own adjustments to the comparison;
The answer to the question "which card is better?" is not rocket science: Cards of the 20* series didn't get much popularity, while the 1080 Ti from Avito (Russian craigslist) still are very attractive (and, oddly enough, don't get cheaper, probably for this reason).
All this is fine and dandy and the standard benchmarks are unlikely to lie too much, but recently I learned about the existence of Multi-Instance-GPU technology for A100 video cards and native support for TF32 for Ampere devices and I got the idea to share my experience of the real testing cards on the Ampere architecture (3090 and A100). In this short note, I will try to answer the questions:
- Is the upgrade to Ampere worth it? (spoiler for the impatient — yes);
- Are the A100 worth the money (spoiler — in general — no);
- Are there any cases when the A100 is still interesting (spoiler — yes);
- Is MIG technology useful (spoiler — yes, but for inference and for very specific cases for training);
В предыдущей статье мы рассказали о том, как создать камеру для улыбок с помощью HUAWEI ML Kit. В этот раз я собираюсь представить вам новую функцию HUAWEI ML Kit.
Вас когда-нибудь просили на учебе или работе принести фотографию определенного размера с цветным фоном для документов? В большинстве случаев у человека не окажется под рукой подходящей фотографии. Однажды в институте нам решили оформить персональные пропуска, но фотостудия оказалась закрыта. Тогда я сфотографировался на телефон, использовав простыню в качестве фона. И получил выговор от преподавателя. Но с помощью инструмента HUAWEI ML Kit вы сможете интегрировать SDK для сегментации изображений в ваше приложение и разработать апплет, чтобы создавать фото на документы самостоятельно и решить проблему отсутствия нужных фотографий.
Самое главное, что этот SDK абсолютно бесплатный и работает на всех телефонах на базе Android.
Разработка апплета для фото на документы самостоятельно
1.1 Добавьте репозиторий Maven Huawei в файл на уровне проекта build.gradle
Откройте файл build.gradle в корневом каталоге вашего проекта Android Studio.
In words, it seems, all the «intellects» are installed already everywhere, the whole country has long been transferred to neural networks, but only in some kind of demonstration pictures, in diagrams, on fingers. There is a mental dissonance — why not take a video camera and shoot at least a fragment of how Russia's super mega technologies work?
As Nikita Sergeevich said, «science ceases to be self-indulgence when its fruits are applied in the national economy.» And today's artificial intelligence is familiar to us only from games. Many people really want to see something useful in reality. Therefore, we were not too lazy and recorded our video of the operation of neural networks from real objects.
Especially my personal opinion is nothing but a consequence of the intervention of politics in science. After all, the colors of the Moon and the Sun from space directly relate to the flights of Americans to the Moon.
I searched through many scientific articles and books in search of information about the color of the Moon and the Sun from space. Fortunately, it turned out that even though they do not have a direct answer to RGB, there is complete information about the spectral density of the solar radiation and the reflectivity of the Moon across the spectrum. This is quite enough to get accurate colors in RGB values. You just need to carefully calculate what, in fact, I did. In this article I will share the results of calculations with you and, of course, I will tell you in detail about the calculations themselves. And you will see the Moon and the Sun from space in real colors!
With the advent of mobile phones with high-quality cameras, we started making more and more pictures and videos of bright and memorable moments in our lives. Many of us have photo archives that extend back over decades and comprise thousands of pictures which makes them increasingly difficult to navigate through. Just remember how long it took to find a picture of interest just a few years ago.
One of Mail.ru Cloud’s objectives is to provide the handiest means for accessing and searching your own photo and video archives. For this purpose, we at Mail.ru Computer Vision Team have created and implemented systems for smart image processing: search by object, by scene, by face, etc. Another spectacular technology is landmark recognition. Today, I am going to tell you how we made this a reality using Deep Learning.
Manual lung segmentation takes about 10 minutes and it requires a certain skill to get the same high-quality result as with automatic segmentation. Automatic segmentation takes about 15 seconds.
I assumed that without a neural network it would be possible to get an accuracy of no more than 70%. I also assumed, that morphological operations are only the preparation of an image for more complex algorithms. But as a result of processing of those, although few, 40 samples of tomographic data on hand, the algorithm segmented the lungs without errors. Moreover, after testing in the first five cases, the algorithm didn’t change significantly and correctly worked on the other 35 studies without changing the settings.
Also, neural networks have a disadvantage — for their training we need hundreds of training samples of lungs, which need to be marked up manually.
Hi everybody! I’m a research engineer at the Mail.ru Group computer vision team. In this article, I’m going to tell a story of how we’ve created AI-based photo restoration project for old military photos. What is «photo restoration»? It consists of three steps:
- we find all the image defects: fractures, scuffs, holes;
- we inpaint the discovered defects, based on the pixel values around them;
- we colorize the image.
Further, I’ll describe every step of photo restoration and tell you how we got our data, what nets we trained, what we accomplished, and what mistakes we made.
However, there are still countless little details that… they are not insolvable, no. They simply take too much of your time, especially if you are a beginner. What would be of help is a step-by-step project, done right in front of you, start to end. A project that does not contain «this part is obvious so let's skip it» statements. Well, almost :)
In this tutorial we are going to walk through a Dog Breed Identifier: we will create and teach a Neural Network, then we will port it to Java for Android and publish on Google Play.
For those of you who want to see a end result, here is the link to NeuroDog App on Google Play.
Web site with my robotics: robotics.snowcron.com.
Web site with: NeuroDog User Guide.
Here is a screenshot of the program: