Local Whispers

For most of the videos that I make, I also like to have subtitles, because sometimes it's easier to just read along.

I used to make these subtitles with an online service called Otter.io, but they stopped allowing uploading of video files.

And then I found Whisper, which allows me to upload audio files to create subtitles. Whisper is an API from OpenAI, mainly known for ChatGPT.

I didn't like having to upload everything to them either, as that means that they could train their model with my original video audio.

Whisper never really worked that well, because it broke up the sentences in weird places, and I had to make lots of edits. It took a long time to make subtitles.

I recently found out that it's actually possible to run Whisper locally, with an open source project on GitHub. I started looking into this to see whether I could use this to create subtitles instead.

The first thing that their documentation tells you to do is to run: pip install openai-whisper.

But I am on a Debian machine, where Python is installed through distribution packages, and I don't really want to mess that up. apt-get actually suggests creating a virtual environment for Python.

In a virtual environment, you can install packages without affecting your system setup. Once you've made this virtual environment, there are Python binaries symlinked in there that you can then use for installing things.

You create the virtual environment with:

python3 -m venv `pwd`/whisper-local
cd whisper-local

In the bin directory you then have python and pip. Those are the ones you then use for installing packages.
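A quick way to confirm that the interpreter in bin really belongs to the virtual environment is to compare two values that Python exposes. This is only a small sanity check, not part of the Whisper setup itself; the function name is made up for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the system-wide Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment:", in_virtual_environment())
```

Saved to a file and run with bin/python, this should report the whisper-local directory as the interpreter location; run with the system python3, the second line should say False.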

Now let me run pip again, with the same options as before to install Whisper:

bin/pip install -U openai-whisper

It takes quite some time to download. Once it is done, there is a new whisper binary in our bin directory.

You also need to install ffmpeg:

sudo apt-get install ffmpeg

Now we can run Whisper on a video I had made earlier:

./bin/whisper ~/media/movie/xdebug33-from-exception.webm

The first time I ran this, I had some errors.

My video card does not have enough memory (2GB only). I don't actually have a very good video card at all, and was better off disabling it, by instructing "Torch" that I do not have one:

export CUDA_VISIBLE_DEVICES=""

And then run Whisper again:

./bin/whisper ~/media/movie/xdebug33-from-exception.webm

It first detects the language, which you can pre-empt by using --language English.

While it runs, it starts showing information in the console. I quickly noticed it was misspelling lots of things, such as my name Derick as Derek, and Xdebug as XDbook.

I also noticed that it starts breaking up sentences in an odd way after a while. Just like what the online version was doing.

I did not get a good result this first time.

It did create a JSON file, xdebug33-from-exception.json, but it is all in one line.

I reformatted it by installing the yajl-tools package with apt-get, and flowing the data through json_reformat:

sudo apt-get install yajl-tools
cat xdebug33-from-exception.json | json_reformat >xdebug33-from-exception_reformat.json
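If you would rather not install an extra package, the Python that already sits in the virtual environment can do the same reformatting. The following is a minimal sketch using only the standard library's json module; the function name and the shortened sample string are made up for illustration.

```python
import json


def reformat_json(compact: str, indent: int = 4) -> str:
    """Re-serialise a JSON document so each key and element gets its own line."""
    # ensure_ascii=False keeps non-ASCII characters readable instead of
    # turning them into \uXXXX escapes.
    return json.dumps(json.loads(compact), indent=indent, ensure_ascii=False)


# A shortened, single-line stand-in for the file that Whisper writes.
sample = '{"text": " Hi, I\'m Derick.", "segments": [{"id": 0, "seek": 0}]}'
print(reformat_json(sample))
```

The standard library also ships this as a command, so bin/python -m json.tool xdebug33-from-exception.json should give a similar result.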

The reformatted file still has our full text in a line, but then a segments section follows, which looks like:

"segments": [
    {
        "id": 0,
        "seek": 0,
        "start": 3.6400000000000006,
        "end": 11.8,
        "text": " Hi, I'm Derick. For most of the videos that I make, I also like to have subtitles, because",
        "tokens": [
            50363, 15902, 11, 314, 1101, 9626, 624, 13, 1114, 749, 286,
            262, 5861, 326, 314, 787, 11, 314, 635, 588, 284, 423, 44344,
            11, 780, 50960
        ],
        "temperature": 0.0,
        "avg_logprob": -0.20771383965152435,
        "compression_ratio": 1.5128205128205128,
        "no_speech_prob": 0.31353551149368286
    },

Each segment has an id, a start and end time (in seconds), the text for that segment, and a bunch of auxiliary information.
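To get a feel for this structure, it helps to print one line per segment. The sketch below assumes only the keys shown in the excerpt (id, start, end, text); the function name is made up, and the sample is a trimmed copy of the first segment.

```python
import json

# Trimmed copy of the first segment from the excerpt above.
SAMPLE = """
{"segments": [{"id": 0, "seek": 0, "start": 3.6400000000000006, "end": 11.8,
 "text": " Hi, I'm Derick. For most of the videos that I make, I also like to have subtitles, because"}]}
"""


def summarise_segments(document: dict) -> list[str]:
    """Return one 'id  start -> end  text' line for every segment."""
    lines = []
    for segment in document["segments"]:
        lines.append(
            f'{segment["id"]:>3}  {segment["start"]:7.2f} -> {segment["end"]:7.2f}  '
            f'{segment["text"].strip()}'
        )
    return lines


for line in summarise_segments(json.loads(SAMPLE)):
    print(line)
```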

But it is still sort of a sentence at a time, which isn't really what we want. Additionally, I really don't want it to say XDbook and misspell my name.

For both of those there are actually options that we can use.

To make things easier for us later, we're turning on word timestamps. That allows us to see, for each word, where in the recording it was spoken. You do that with the --word_timestamps True option.

To provide a hint on what the video is about, and to get better words, we can use the initial prompt. The option that I used for this video was: --initial_prompt="This video by Derick introduces a new Xdebug feature, regarding exceptions".

To trade accuracy for speed, there's also an option to specify which model to use. Normally the standard one, medium, is fine, but in order to speed up generation, you can use --model tiny. This will give less accurate results.

If the model has not been downloaded before, Whisper will automatically do this.

It is also possible to specify the language, which also makes things go faster if the audio is in English only: --language English. Whisper supports a bunch of languages.

The full command that I used is:

CUDA_VISIBLE_DEVICES="" \
        ./bin/whisper ~/media/movie/xdebug33-from-exception.webm \
        --word_timestamps True \
        --initial_prompt="This video by Derick introduces a new Xdebug feature, regarding exceptions" \
        --language English
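If you want to run this from a script, for example to process several videos in a row, the same invocation can be built in Python. This is a sketch under a few assumptions: it is run from inside the whisper-local directory so that ./bin/whisper resolves, the function names are made up, and it only uses the options shown above.

```python
import os
import subprocess


def build_whisper_command(video: str, prompt: str, language: str = "English") -> list[str]:
    """Build the argument list for the whisper binary in the virtual environment."""
    return [
        "./bin/whisper",
        os.path.expanduser(video),  # subprocess does not expand "~" by itself
        "--word_timestamps", "True",
        "--initial_prompt", prompt,
        "--language", language,
    ]


def cpu_only_environment() -> dict[str, str]:
    """Copy the current environment and hide all CUDA devices from Torch."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_whisper(video: str, prompt: str) -> None:
    """Run Whisper on the CPU and raise an error if it exits unsuccessfully."""
    subprocess.run(build_whisper_command(video, prompt), env=cpu_only_environment(), check=True)
```

Calling run_whisper("~/media/movie/xdebug33-from-exception.webm", "This video by Derick introduces a new Xdebug feature, regarding exceptions") would then be the equivalent of the command above.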

In the output file, we now see another element in each segment:

"words": [
    {
        "word": " Hi,",
        "start": 3.6400000000000006,
        "end": 4.16,
        "probability": 0.6476245522499084
    },
    {
        "word": " I'm",
        "start": 4.28,
        "end": 4.44,
        "probability": 0.9475358724594116
    },
    {
        "word": " Derick.",
        "start": 4.44,
        "end": 4.8,
        "probability": 0.12672505341470242

For each word, it has the start and end, as well as the probability of it being correct. You see that for Derick it was only 13% certain.
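The probability value makes it easy to list the words that are worth double-checking by hand. The following sketch does that; the threshold of 0.5 is an arbitrary assumption rather than something Whisper recommends, the function name is made up, and the sample holds the three words from the excerpt above.

```python
# The three words from the excerpt above, wrapped in a single segment.
SAMPLE_SEGMENTS = [
    {
        "words": [
            {"word": " Hi,", "start": 3.6400000000000006, "end": 4.16, "probability": 0.6476245522499084},
            {"word": " I'm", "start": 4.28, "end": 4.44, "probability": 0.9475358724594116},
            {"word": " Derick.", "start": 4.44, "end": 4.8, "probability": 0.12672505341470242},
        ]
    }
]


def uncertain_words(segments: list[dict], threshold: float = 0.5) -> list[tuple[float, str, float]]:
    """Return (start, word, probability) for every word below the threshold."""
    flagged = []
    for segment in segments:
        # Segments only carry a "words" key when word timestamps are enabled.
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append((word["start"], word["word"].strip(), word["probability"]))
    return flagged


for start, word, probability in uncertain_words(SAMPLE_SEGMENTS):
    print(f"{start:8.2f}s  {word}  ({probability:.0%})")
```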

With this information you can do some analysis to create an actual subtitle script out of this. For that I have written a PHP script.

It loops over all the segments, and for each of the segments over all the words. If the difference in time between the end of a word and the start of a new word is more than a quarter of a second, it emits a new section of subtitles.

Similarly, if a sentence is longer than 60 characters, it also breaks it up into an extra line. This keeps all the subtitles reasonably well sorted.
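The grouping logic described here can be sketched in Python as follows. This is not the PHP script itself, only one interpretation of the description: a pause of more than a quarter of a second starts a new subtitle, and a word that would push the current line past 60 characters starts a new line instead. The function name, the dictionary layout of a subtitle, and the end time of the word "For" in the sample are assumptions made for this example.

```python
GAP_SECONDS = 0.25      # a longer pause between two words starts a new subtitle
MAX_LINE_LENGTH = 60    # a word that would exceed this starts a new line

# First three words from the excerpt; the end time of "For" is invented for the demo.
SAMPLE_SEGMENTS = [
    {
        "words": [
            {"word": " Hi,", "start": 3.6400000000000006, "end": 4.16},
            {"word": " I'm", "start": 4.28, "end": 4.44},
            {"word": " Derick.", "start": 4.44, "end": 4.8},
            {"word": " For", "start": 5.92, "end": 6.1},
        ]
    }
]


def group_words(segments: list[dict]) -> list[dict]:
    """Group word timings into subtitles with 'start', 'end' and 'lines' keys."""
    subtitles = []
    current = None
    for segment in segments:
        for word in segment.get("words", []):
            text = word["word"].strip()
            # Close the running subtitle when the speaker paused long enough.
            if current is not None and word["start"] - current["end"] > GAP_SECONDS:
                subtitles.append(current)
                current = None
            if current is None:
                current = {"start": word["start"], "end": word["end"], "lines": [text]}
                continue
            candidate = current["lines"][-1] + " " + text
            if len(candidate) > MAX_LINE_LENGTH:
                current["lines"].append(text)      # wrap onto an extra line
            else:
                current["lines"][-1] = candidate   # extend the current line
            current["end"] = word["end"]
    if current is not None:
        subtitles.append(current)
    return subtitles


for subtitle in group_words(SAMPLE_SEGMENTS):
    print(subtitle["start"], subtitle["end"], subtitle["lines"])
```

Because the gap is measured against the end of the previous word, the pause between "Derick." and "For" splits the sample into two subtitles.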

The emit function formats each section the way SRT files expect it: a counter, a line with the start and end time, and then the text.
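As an illustration of that format, here is a small Python sketch that turns a time in seconds into an SRT timestamp and formats one subtitle. It is not the PHP emit function; the function names are made up, and it rounds to the nearest millisecond, so its output can differ by a millisecond from a script that truncates instead.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, as used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def format_subtitle(index: int, start: float, end: float, lines: list[str]) -> str:
    """Return one SRT entry: counter, time range, then the text lines."""
    header = f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
    return header + "\n".join(lines) + "\n"


print(format_subtitle(0, 3.6400000000000006, 4.8, ["Hi, I'm Derick."]))
```

Entries are separated by an empty line, which the print call adds here after the trailing newline of the entry.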

With this script, I now create a new SRT file, overwriting the one that Whisper had created:

php whisper-to-srt.php xdebug33-from-exception.json > xdebug33-from-exception.srt

The output looks like:

0
00:00:03,640 --> 00:00:04,799
Hi, I'm Derick.

1
00:00:05,919 --> 00:00:10,980
For most of the videos that I make, I also like to have subtitles,

2
00:00:11,640 --> 00:00:15,779
because sometimes it's easier to just read along for various
different reasons.

These subtitles I can now add when uploading a video to YouTube. I might also create a similar script to generate output that can be used as the base for a textual article. Not everybody likes learning from watching videos.

I would also prefer not to use a model that has been trained with questionable sources. I will be investigating if I can use Mozilla's Common Voice project's data instead.

Shortlink

This article has a short URL available: https://drck.me/local-whispers-io0

Xdebug Update: May 2024

I have not written an update like this for a while. I am sorry.

In the last months I have not spent a lot of time on Xdebug due to a set of other commitments.

Since my last update in November a few things have happened though.

Xdebug 3.3

I released Xdebug 3.3, and the patch releases 3.3.1 and 3.3.2.

Xdebug 3.3 brings a bunch of new features into Xdebug, such as flamegraphs.

The debugger has significant performance improvements in relation to breakpoints. And it can now also show the contents of ArrayIterator, SplDoublyLinkedList, SplPriorityQueue objects, and information about thrown exceptions.

A few bugs were present in 3.3.0, which have been addressed in 3.3.1 and 3.3.2. There is currently still an outstanding issue (or more than one), where Xdebug crashes. There are a lot of confusing reports about this, and I have not yet managed to reproduce any of them.

If you're running into a crash bug, please reach out to me.

There is also a new experimental feature: control sockets. These allow a client to instruct Xdebug to either initiate a debugging connection, or instigate a breakpoint out of band: i.e., when no debugging session is active. More about this in a later update.

Funding Platform

Last year, I made a prototype as part of a talk that I gave at NeosCon.io. In this talk I demonstrated native path mapping — configuring path mapping in/through Xdebug, without an IDE's assistance.

In collaboration with Robert from NEOS, and Luca from theAverageDev, we defined a project plan that explains all the necessary functionality and work.

Adding this to Xdebug is a huge effort, and therefore I decided to set up a way for projects like this to be funded.

There is now a dedicated Projects section linked from the home page, with a list of all the projects. The page itself lists a short description for each project.

For each project, there is a full description and a list of its generous contributors. The Native Xdebug Path Mapping project is currently 85% funded. Once it is fully funded, I will start the work to get this included in Xdebug 3.4. You could be part of this too!

Xdebug Videos

I have created several videos since November.

Two for Xdebug, and several for writing PHP extensions, as part of a new series.

If you have any suggestions, feel free to reach out to me on Mastodon or via email.

Business Supporter Scheme and Funding

In the last month, no new business supporters signed up.

Besides business support, I also maintain a Patreon page, a profile on GitHub sponsors, as well as an OpenCollective organisation.

If you want to contribute to specific projects, you can find those on the Projects page.

Xdebug Cloud

Xdebug Cloud is the Proxy As A Service platform to allow for debugging in more scenarios, where it is hard, or impossible, to have Xdebug make a connection to the IDE. It is continuing to operate as a Beta release.

Packages start at £49/month, and I have recently introduced a package for larger companies. This has a larger initial set of tokens, and discounted extra tokens.

If you want to be kept up to date with Xdebug Cloud, please sign up to the mailing list, which I will use to send out an update not more than once a month.

Shortlink

This article has a short URL available: https://drck.me/xdebug-24may-inz

Friday Night Dinner: Sudu

We went on a rainy Saturday evening. Sudu is pretty close to where we live, but this was our first time.

The exterior is painted a dark grey and is a little unprepossessing; however, once inside, we were in for a treat.

We were quickly seated in a busy restaurant and saw many delicious-sounding things on the menu. My husband chose the beef rendang, which he had served with plain basmati rice. I picked the chicken satay served with coconut rice and a plain roti. This was washed down with a couple of Tiger beers. The service in Sudu was quick and attentive but not intrusive.

The food arrived fairly quickly. My husband's beef rendang was nice and spicy, as it should be, with nice tender chunks of beef that worked well with the rice. My chicken satay came as chicken pieces on skewers with a bowl of a spicy, peanutty satay sauce into which the chicken could be dipped. The coconut rice was subtly flavoured with coconut. Once the chicken skewers were finished, I poured the remaining satay onto my rice. The roti was delicious, slightly crispy on the outside but soft and a little fluffy on the inside and not overly greasy.

The food coming out to neighbouring tables also looked great, so it is likely we’ll be going back to try something else from the menu.

Photos: Chicken Satay, Roti, Beef Rendang

Shortlink

This article has a short URL available: https://drck.me/sudu-int
