Comments / Profile of comerc / Habr

Andrew Ka @comerc

#кодеротбога ("a born coder")

Profile | Articles 70 | Posts | News | Comments 829

Asana CEO: The way we work now will soon look like a relic of the past

comerc Dec 31 2023 at 13:33

The board is a tool for organizing collaborative work. An iteration lasts two weeks. The team selects more tasks than it can complete, while 20% of the time should be reserved for backlog work. Tasks are estimated with planning poker, based on complexity rather than time to complete. Any task that takes more than one working day must be decomposed. Everyone is free to pick their next task, with one condition of alternation: one large task, then a smaller one. This keeps the workload evenly distributed. You may not grab tasks in advance. If a task is taken and not finished, that becomes visible immediately. As a result, a manager with a carrot and a stick is no longer needed. If someone pulls you out of a flow state, that is the exception, not the everyday norm of life under micromanagement.
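The pull rules above lend themselves to a small model. This is a hypothetical sketch, not a tool from the comment: the `Task`/`Member` names, the `large_points` threshold that separates "large" from "smaller" tasks, and starting with a large task are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_TASK_DAYS = 1.0  # a task longer than one working day must be decomposed


@dataclass
class Task:
    title: str
    points: int      # planning-poker estimate: complexity, not hours
    est_days: float  # rough duration, used only for the decomposition check


@dataclass
class Member:
    name: str
    current: Optional[Task] = None  # at most one task in progress
    last_was_large: bool = False    # drives the large/smaller alternation


def pull(member: Member, backlog: List[Task], large_points: int) -> Optional[Task]:
    """Hypothetical pull rule: one task at a time, alternating large and smaller."""
    if member.current is not None:
        # No grabbing tasks in advance: the unfinished task stays visible on the board.
        raise RuntimeError(f"{member.name} is still on {member.current.title!r}")
    for task in backlog:
        if task.est_days > MAX_TASK_DAYS:
            raise ValueError(f"{task.title!r} takes more than a day: decompose it")
    want_large = not member.last_was_large
    for task in backlog:
        if (task.points >= large_points) == want_large:
            backlog.remove(task)  # safe: we return right after removing
            member.current = task
            member.last_was_large = want_large
            return task
    return None  # nothing of the required size is left


def finish(member: Member) -> None:
    member.current = None
```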

0

Fine-tuning Whisper for multilingual ASR with Hugging Face Transformers

comerc Dec 26 2023 at 12:39

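faster-whisper has a converter utility for Whisper models (ct2-transformers-converter, which comes from CTranslate2, the inference engine faster-whisper is built on):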

$ ct2-transformers-converter --model mitchelldehaven/whisper-medium-ru --output_dir mitchelldehaven/whisper-medium-ru --quantization int8
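To try several fine-tuned models, the same command can be generated per model. A minimal sketch that only reuses the flags shown above; like the original command, it writes each model to a local directory named after its Hub id, so `WhisperModel(model_id)` run from the same working directory should pick up the converted copy. The helper names are hypothetical.

```python
import subprocess
from typing import Iterable, List


def convert_cmd(model_id: str, quantization: str = "int8") -> List[str]:
    """Argument list for the ct2-transformers-converter call shown above."""
    return [
        "ct2-transformers-converter",
        "--model", model_id,
        "--output_dir", model_id,  # same layout as the original command
        "--quantization", quantization,
    ]


def convert_all(model_ids: Iterable[str], quantization: str = "int8") -> None:
    """Convert each model in turn; check=True stops at the first failure."""
    for model_id in model_ids:
        subprocess.run(convert_cmd(model_id, quantization), check=True)
```

For example, `convert_all(["mitchelldehaven/whisper-medium-ru", "lorenzoncina/whisper-small-ru"])` would convert two of the models listed below, assuming the converter is installed.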

I tried everything I could find. Disappointing results:

from faster_whisper import WhisperModel

# artyomboyko/whisper-small-ru-v2 - errors, but with punctuation
# artyomboyko/whisper-base-fine_tuned-ru - errors, but with punctuation
# mitchelldehaven/whisper-medium-ru - best result, no punctuation
# mitchelldehaven/whisper-large-v2-ru - slow, with errors, no punctuation
# lorenzoncina/whisper-small-ru - errors, but with punctuation
# lorenzoncina/whisper-medium-ru - errors, but with punctuation
# AlanRobotics/whisper-tiny-ru ??
# gggggggg123/whisper-small-ru-golos ??
# sanchit-gandhi/whisper-small-ru-1k-steps - errors, but with punctuation
# Kolbaster/whisper-medium-ru ??
# Shirali/whisper-small-ru - errors, but with punctuation
# erlandekh/whisper-small-russian - errors, but with punctuation

# Local directory written by ct2-transformers-converter (named after the Hub model id)
model_size = "mitchelldehaven/whisper-medium-ru"

model = WhisperModel(model_size, device="cpu", compute_type="int8") # cpu_threads=10, 

segments, info = model.transcribe(
    "audio.wav",
    language="ru",
    beam_size=5,
    # beam_size=1,
    # temperature=0,
    # suppress_tokens=None,
)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# segments is a generator: the actual transcription runs while iterating
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
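The comparison above is by eye ("errors, but with punctuation"). A word error rate (WER) against a short, hand-checked reference transcript would make it more objective; the Hugging Face fine-tuning post evaluates Whisper with WER as well, via the `evaluate` library. Below is a minimal pure-Python sketch; the normalization (lowercasing and dropping punctuation, so punctuated and unpunctuated models are compared on words only) is my assumption.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation so models are compared on words only."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = cur
    return prev[-1] / len(ref)
```

Joining `segment.text` for each model's output and scoring it against the same reference gives one number per model.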

What else can be done? Learn to fine-tune models myself! Plug in GOLOS and SILERO.

-1

Speech recognition, subtitle generation and language learning with Whisper

comerc Dec 25 2023 at 20:07

I've translated it: Fine-tuning Whisper for multilingual ASR with Hugging Face Transformers

+1

How to tell ChatGPT-3.5 from ChatGPT-4: debunking the myths

comerc Dec 25 2023 at 13:05

Bing Chat:

Ah, a riddle! The mistake in your sentence is the repetition of the word “the” in “Can you find the the mistake?”. It’s a common trick in these types of riddles to distract the reader with a list of numbers or other information, while the actual mistake is in the text itself. Good one!

phind:

The mistake in your riddle is that you wrote "the the" instead of "the". The correct sentence should be "Can you find the mistake?"

phind GPT-4:

The riddle you've presented is a classic one. The mistake in the text is not in the numbers, but in the sentence itself. The phrase "Can you find the the mistake?" has the word "the" repeated twice.

perplexity:

There is a mistake in this riddle. The answer can be found on various websites. For example, jagranjosh.com [1] has a riddle where you have to find the mistake in a sentence. Answer: the question has an extra article "the". funwithpuzzles.com [2] has a picture riddle where you have to find the mistake in the numbers. Answer: in the number 14, the letter "l" is written instead of the digit "1". mirror.co.uk [3] also has a riddle where you have to find the mistake in a sentence. Answer: the question has an extra article "the".

perplexity copilot:

The mistake in the riddle is that it asks to find "the" mistake, but there are actually two mistakes. The first mistake is that the sequence of numbers is not complete, as it stops at 73. The second mistake is that the word "the" is repeated. So, the riddle should be "I have a nice riddle for you… Can you find the mistake?"

mixtral-8x7b-instruct:

The mistake is that there are only 70 numbers listed, but the statement claims that there are 72 numbers. The numbers 72 and 73 are missing from the list.

Codeium:

It looks like you've provided a sequence of numbers from 1 to 73. However, there doesn't seem to be a mistake in the sequence. Could you please provide more information or context about the riddle?
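Most of the answers above hinge on spotting the doubled "the". For comparison, the same check done deterministically takes a few lines of Python. This is a hypothetical helper, unrelated to any of the tools quoted: it is case-insensitive and reports each word that repeats the one right before it.

```python
import re
from typing import List, Tuple


def repeated_words(text: str) -> List[Tuple[int, str]]:
    """Return (word index, word) for every word equal to the word just before it."""
    words = re.findall(r"\w+", text.lower())
    return [(i, w) for i, w in enumerate(words) if i > 0 and words[i - 1] == w]
```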

0

Speech recognition, subtitle generation and language learning with Whisper

comerc Dec 25 2023 at 01:12

I never managed to solve this riddle for whisper.cpp.

And the quality of Russian speech recognition leaves much to be desired. I really wanted to try the fine-tuned models from Hugging Face.

I found another route: faster-whisper. It has a converter utility for Whisper models (ct2-transformers-converter, which comes from CTranslate2, the inference engine faster-whisper is built on):

$ ct2-transformers-converter --model mitchelldehaven/whisper-medium-ru --output_dir mitchelldehaven/whisper-medium-ru --quantization int8

I tried everything I could find. Disappointing results:

from faster_whisper import WhisperModel

# artyomboyko/whisper-small-ru-v2 - errors, but with punctuation
# artyomboyko/whisper-base-fine_tuned-ru - errors, but with punctuation
# mitchelldehaven/whisper-medium-ru - best result, no punctuation
# mitchelldehaven/whisper-large-v2-ru - slow, with errors, no punctuation
# lorenzoncina/whisper-small-ru - errors, but with punctuation
# lorenzoncina/whisper-medium-ru - errors, but with punctuation
# AlanRobotics/whisper-tiny-ru ??
# gggggggg123/whisper-small-ru-golos ??
# sanchit-gandhi/whisper-small-ru-1k-steps - errors, but with punctuation
# Kolbaster/whisper-medium-ru ??
# Shirali/whisper-small-ru - errors, but with punctuation
# erlandekh/whisper-small-russian - errors, but with punctuation

# Local directory written by ct2-transformers-converter (named after the Hub model id)
model_size = "mitchelldehaven/whisper-medium-ru"

model = WhisperModel(model_size, device="cpu", compute_type="int8") # cpu_threads=10, 

segments, info = model.transcribe(
    "audio.wav",
    language="ru",
    beam_size=5,
    # beam_size=1,
    # temperature=0,
    # suppress_tokens=None,
)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# segments is a generator: the actual transcription runs while iterating
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
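Since this thread is about subtitles, the segments printed above can also be written out as SubRip (.srt). A minimal sketch that relies only on the `start`, `end` and `text` attributes used in the loop above; the helper names are hypothetical.

```python
from typing import Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable) -> str:
    """Build SRT text from objects with .start, .end and .text attributes."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # faster-whisper segment texts usually start with a space, hence strip()
        blocks.append(
            f"{index}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Because `segments` is a generator, it can be consumed only once: either call `to_srt(segments)` instead of the print loop, or materialize it first with `segments = list(segments)`.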

What else can be done? Learn to fine-tune models myself. Plug in GOLOS and SILERO.

https://huggingface.co/blog/fine-tune-whisper

0

Strange: so much time has passed, yet Habr still hasn't mentioned Alpaca

comerc Dec 22 2023 at 19:14

How do I get a ggml .bin file from a Hugging Face repo?

# try.py
from transformers import WhisperForConditionalGeneration
import tempfile
import torch

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
model.save_pretrained('./my')
torch.save(model.state_dict(), './my/file.pt')

$ python3 try.py
$ python3 models/convert-pt-to-ggml.py ./my/file.pt ~/code/whisper ./my
Traceback (most recent call last):
  File "~/code/whisper.cpp/models/convert-pt-to-ggml.py", line 210, in <module>
    hparams = checkpoint["dims"]
              ~~~~~~~~~~^^^^^^^^
KeyError: 'dims'
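The traceback shows the whisper.cpp script reading `checkpoint["dims"]`, i.e. it expects the checkpoint layout of the original openai/whisper package: a dict with a `"dims"` entry (model hyperparameters) and a `"model_state_dict"` entry (weights). `torch.save(model.state_dict(), ...)` in try.py instead stores a flat mapping of parameter names to tensors, so neither key exists. A small, hypothetical helper to check a loaded checkpoint before running the converter:

```python
from typing import Any, List, Mapping

# Top-level keys of a checkpoint in the original openai/whisper format:
# "dims" holds the hyperparameters, "model_state_dict" holds the weights.
OPENAI_KEYS = ("dims", "model_state_dict")


def missing_openai_keys(checkpoint: Mapping[str, Any]) -> List[str]:
    """Return the OpenAI-format keys that a loaded checkpoint lacks."""
    return [key for key in OPENAI_KEYS if key not in checkpoint]
```

For example, `missing_openai_keys(torch.load("./my/file.pt", map_location="cpu"))` should report both keys for the file produced by try.py. The whisper.cpp repository also ships `models/convert-h5-to-ggml.py`, aimed at Hugging Face checkpoints, which may be the more direct route (not verified here).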

0

Speech recognition, subtitle generation and language learning with Whisper

comerc Dec 22 2023 at 19:13

How do I get a ggml .bin file from a Hugging Face repo?

# try.py
from transformers import WhisperForConditionalGeneration
import tempfile
import torch

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
model.save_pretrained('./my')
torch.save(model.state_dict(), './my/file.pt')

$ python3 try.py
$ python3 models/convert-pt-to-ggml.py ./my/file.pt ~/code/whisper ./my
Traceback (most recent call last):
  File "~/code/whisper.cpp/models/convert-pt-to-ggml.py", line 210, in <module>
    hparams = checkpoint["dims"]
              ~~~~~~~~~~^^^^^^^^
KeyError: 'dims'

0

OpenAI has solved speech recognition! Let's find out whether that's true…

comerc Dec 22 2023 at 19:13

How do I get a ggml .bin file from a Hugging Face repo?

# try.py
from transformers import WhisperForConditionalGeneration
import tempfile
import torch

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
model.save_pretrained('./my')
torch.save(model.state_dict(), './my/file.pt')

$ python3 try.py
$ python3 models/convert-pt-to-ggml.py ./my/file.pt ~/code/whisper ./my
Traceback (most recent call last):
  File "~/code/whisper.cpp/models/convert-pt-to-ggml.py", line 210, in <module>
    hparams = checkpoint["dims"]
              ~~~~~~~~~~^^^^^^^^
KeyError: 'dims'

0

AI available to every developer

comerc Dec 22 2023 at 19:13

How do I get a ggml .bin file from a Hugging Face repo?

# try.py
from transformers import WhisperForConditionalGeneration
import tempfile
import torch

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
model.save_pretrained('./my')
torch.save(model.state_dict(), './my/file.pt')

$ python3 try.py
$ python3 models/convert-pt-to-ggml.py ./my/file.pt ~/code/whisper ./my

Traceback (most recent call last):
  File "~/code/whisper.cpp/models/convert-pt-to-ggml.py", line 210, in <module>
    hparams = checkpoint["dims"]
              ~~~~~~~~~~^^^^^^^^
KeyError: 'dims'

0

A neural network to the rescue, or how I built a Telegram bot that can translate songs

comerc Dec 22 2023 at 19:13

How do I get a ggml .bin file from a Hugging Face repo?

# try.py
from transformers import WhisperForConditionalGeneration
import tempfile
import torch

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
model.save_pretrained('./my')
torch.save(model.state_dict(), './my/file.pt')

$ python3 try.py
$ python3 models/convert-pt-to-ggml.py ./my/file.pt ~/code/whisper ./my

Traceback (most recent call last):
  File "~/code/whisper.cpp/models/convert-pt-to-ggml.py", line 210, in <module>
    hparams = checkpoint["dims"]
              ~~~~~~~~~~^^^^^^^^
KeyError: 'dims'

0

Building a service for serverless voice translation, like in Cyberpunk 2077

comerc Dec 22 2023 at 19:13

How do I get a ggml .bin file from a Hugging Face repo?

# try.py
from transformers import WhisperForConditionalGeneration
import tempfile
import torch

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")
model.save_pretrained('./my')
torch.save(model.state_dict(), './my/file.pt')

$ python3 try.py
$ python3 models/convert-pt-to-ggml.py ./my/file.pt ~/code/whisper ./my

Traceback (most recent call last):
  File "~/code/whisper.cpp/models/convert-pt-to-ggml.py", line 210, in <module>
    hparams = checkpoint["dims"]
              ~~~~~~~~~~^^^^^^^^
KeyError: 'dims'

0

A detailed explanation of the KISS principle in software

comerc Dec 22 2023 at 13:11

I didn't downvote anything. My philosophy: either upvote or ignore.

0

Speech recognition, subtitle generation and language learning with Whisper

comerc Dec 21 2023 at 21:27

Guys, count me in!

+1

Guide: how to set up GitLab CI/CD for a GoLang project

comerc Dec 21 2023 at 16:44

How do I get rid of helper_image?

### config.toml - diff helper_image VS docker image ###
# concurrent = 1
concurrent = 2
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "via docker"
  url = "http://localhost"
  id = 13
  token = "glrt-2GR51rubUbopK_fqMFLz"
  token_obtained_at = 2023-12-12T14:51:21Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  # only for docker image:
  environment = ["DOCKER_DRIVER=overlay2", "DOCKER_LOGIN=gitlab-ci", "DOCKER_PASSWORD=password"]
  [runners.cache]     
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    # image = ""
    image = "docker:latest"
    # privileged = true
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    # volumes = ["/certs/client", "/cache"]
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    shm_size = 0
    network_mode = "host"
    # helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-e0218c92"
    # network_mtu = 0
    # only for docker image:
    pull_policy = ["always", "if-not-present"]

0

Creating atomic commits in Git

comerc Dec 21 2023 at 11:00

In "carpet bombing" mode, "trunk-based development" is what saves you.

0

Guide: how to set up GitLab CI/CD for a GoLang project

comerc Dec 20 2023 at 16:58

(And one more for good measure. I haven't tested it; ChatGPT suggested it.)

This lets you run GitLab CI/CD jobs in separate, isolated Docker containers. Here is an example docker-compose.yml for GitLab Runner:

version: '3.5'
services:
  dind:
    image: docker:20-dind
    restart: always
    privileged: true
    environment:
      DOCKER_TLS_CERTDIR: ''
    command:
      - --storage-driver=overlay2
  runner:
    restart: always
    image: registry.gitlab.com/gitlab-org/gitlab-runner:alpine
    depends_on:
      - dind
    environment:
      - DOCKER_HOST=tcp://dind:2375
    volumes:
      - ./config:/etc/gitlab-runner:z
  register-runner:
    restart: 'no'
    image: registry.gitlab.com/gitlab-org/gitlab-runner:alpine
    depends_on:
      - dind
    environment:
      - CI_SERVER_URL=${CI_SERVER_URL}
      - REGISTRATION_TOKEN=${REGISTRATION_TOKEN}
    command:
      - register
      - --non-interactive
      - --locked=false
      - --name=${RUNNER_NAME}
      - --executor=docker
      - --docker-image=docker:20-dind
      - --docker-volumes=/var/run/docker.sock:/var/run/docker.sock
    volumes:
      - ./config:/etc/gitlab-runner:z

You will also need a .env file with the following contents:

RUNNER_NAME=RUNNER-NAME
REGISTRATION_TOKEN=TOKEN
CI_SERVER_URL=https://gitlab.com/

Replace RUNNER_NAME, REGISTRATION_TOKEN and CI_SERVER_URL with the appropriate values for your project.
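Docker Compose fills the `${...}` placeholders in docker-compose.yml from the .env file in the project directory. A simplified sketch of that idea, for illustration only (the helper names are hypothetical): it handles plain `KEY=VALUE` lines and `${VAR}` references and, like Compose, turns an unset variable into an empty string. Real Compose additionally supports quoting and forms such as `${VAR:-default}`.

```python
import re
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines; blank lines and # comments are skipped."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip()
    return env


def interpolate(template: str, env: Dict[str, str]) -> str:
    """Replace each ${VAR} with its value; unset variables become empty strings."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), template)
```

With the .env above, `interpolate("--name=${RUNNER_NAME}", env)` yields `--name=RUNNER-NAME`, which is what the register-runner service receives.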

0

Guide: how to set up GitLab CI/CD for a GoLang project

comerc Dec 20 2023 at 16:18

version: '3.5'
name: gitlab-runner
services:
  runner:
    restart: always
    image: registry.gitlab.com/gitlab-org/gitlab-runner:alpine
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ~/.gitlab-runner:/etc/gitlab-runner:z
    network_mode: host

The solution was simple all along. This is all you need to run gitlab-runner without jumping through hoops.

0

Creating atomic commits in Git

comerc Dec 20 2023 at 16:12

Cool, yes. But my idol is Andrey Sitnik.

+1

Creating atomic commits in Git

comerc Dec 20 2023 at 10:31

Well, squash kills the time machine. As a code archaeologist, I find that a letdown.

+2

Creating atomic commits in Git

comerc Dec 20 2023 at 10:28

Heh. I play this game too. Dopamine mining.

+4
