Blackmagic DaVinci Resolve – the Sequel

Unlike most cinema films, new software sequels usually arrive with new ideas, and DaVinci Resolve 18.5 and 18.6 (DR for short) are no exception. In particular, they include some quite impressive new AI functions. But this time, one wonders whether one has bought the cinema ticket too early, without waiting for the reviews of others. Or whether one has installed the update too early, without consulting the relevant forums.
Despite the language being set to German, even foreign characters appear.

Let’s start with transcription: The value of a new feature is, of course, extremely dependent on your workflow and specific needs. But for documentaries, especially with interviews, AI-based text recognition (only in the studio version) is perhaps the most useful new feature in 18.5. It can even be helpful in feature films if the director likes to let the actors improvise.

The selection of languages is wider than that for the programme itself.

We ran speech recognition on an entire feature film in order to test it under critical conditions, such as music and strong background noise. Even on a modest MacBook M1 Pro, it ran surprisingly quickly: 4 minutes and 15 seconds for an hour and 42 minutes of film. But then the typical behaviour of today’s AI became apparent: the result was somewhere between Wow! and What? Sometimes we were extremely impressed when it recognised the text correctly despite the film music, where we ourselves had difficulties and had to listen several times.

Sometimes the AI is also quite confused.

But there are also a lot of results that don’t (yet?) replace humans. In passages with long periods of silence or only quiet noises, it started to hallucinate. And not just in German, although we had switched from automatic to our language. Chinese or Korean text could still appear from time to time. Or absurd poetry such as “If you think the body Bugün? Cve is that giant?”

And who says AI isn’t creative? Word creations that probably don’t appear in the internal dictionary were, for example, “Kapitaneverbräucher” as an interesting alternative to “Kapitalverbrecher” or a nightmare that mutates into “Eiltraum”. Sometimes, however, the context is ignored in such a way that “photo album” can become “fodder album” when spoken quickly.


All of this may be rather amusing, but turns of phrase that change the meaning, such as “What happened?” instead of “Did something happen?”, can easily be overlooked. Strangely, even longer, clearly understandable passages are sometimes completely ignored. Conversely, the AI occasionally inserts, several times over, a short text passage that actually only occurs later in the film. In short: you can’t get by without careful checking and manual correction, although the function is better than in Premiere.

The transcription is linked to the viewer, with cut points for a marked sentence.

Editing assistant

Nevertheless, the AI can save a lot of time, especially when editing, because the recognised and, if necessary, corrected text remains linked to the image. If you select a text passage, which you can also find via text search, the corresponding point in the film is shown at the same time. A cursor moves through the text while you play the video. As matching cut points are also set temporarily, a section can be cut into the timeline immediately. This works on both the Cut and Edit pages. The corresponding functions can be found directly in the transcription text window, from playback buttons and insert or append to setting markers and creating subclips.

There are also text functions such as search and replace, changing the font size, and an optional display in black on white. However, major changes to the text length in this window can have unintended side effects when editing. There is therefore a copy function for using text externally, but no paste function. The backspace key crosses out selected passages rather than deleting them, which causes the passage to be omitted during editing. Of course, you can also export the entire text here if required. The text window can be freely configured and moved to a separate monitor.

German subtitles for English tutorials are created with the help of speech recognition and DeepL.

Subtitles

As the transcription is based on text and timecode, it makes sense to generate subtitles automatically in a similar way. The “Create Subtitles from Audio” function is available in the timeline menu for this purpose. This takes a little longer than the transcription, around three and a half minutes for half an hour of film. After that, the subtitle track contains clear texts if the speaker, in this case Cullen Kelly, was clearly audible. This time we tried a tutorial from Blackmagic (BM for short), in which the voiceover was completely clear and could even be heard without background music (it works!).

Now we wanted to create German subtitles for this video about colour management, as no German version exists in this case. In principle, this also works if you simply select German instead of English in the dialogue box for subtitle generation, but unfortunately such an internal translation is not really convincing. We therefore renamed the exported *.SRT file to *.TXT and had it translated by the respectable DeepL. Unfortunately, this is only possible with the full version of DeepL, as otherwise the timecodes and line breaks in the file are not retained (but there is a trial period).

We only had to rename the result to *.SRT again and were then able to import it as a subtitle file. Thanks to the better technical conditions, the text recognition in English was already more accurate, as was the translation by DeepL. Nevertheless, it is essential to review the foreign-language version and correct it where necessary. This applies all the more to the German translation, as even DeepL does not yet have a complete grasp of technical terms such as those used in this complex topic. What, please, is a flat log state? The appearance of a log file.
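The rename round-trip described above is easy to script. Here is a minimal sketch in Python; the helper names are our own invention, and the check merely verifies that the translation service left the SRT timecode lines untouched:

```python
import re
from pathlib import Path

# SRT timecode lines look like: 00:01:02,500 --> 00:01:05,000
TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")

def timecodes(text: str) -> list[str]:
    """Collect all timecode lines of an SRT-formatted string, in order."""
    return [line.strip() for line in text.splitlines() if TIMECODE.match(line.strip())]

def check_roundtrip(original_srt: str, translated_srt: str) -> bool:
    """After translation, the timecodes must be identical and in the same order."""
    return timecodes(original_srt) == timecodes(translated_srt)

def rename_for_deepl(srt_path: Path) -> Path:
    """Rename the exported *.srt to *.txt so DeepL accepts it as a plain document."""
    return srt_path.rename(srt_path.with_suffix(".txt"))
```

After translating, rename the file back to *.srt and run `check_roundtrip` on both versions before importing it into DR.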

The translation by AI is not always helpful.

So here too: trust (in AI) is good, control is better. Nevertheless, it can save a lot of time in both use cases, depending on the sound quality and the sophistication of the speech. For high-quality jobs, however, you will want to revise the formatting of the subtitles: only a minimum time interval can be specified in the dialogue box, different speakers are not recognised, and line breaks and timing are not always perfect.

With the full version of DeepL, the timecodes and line breaks of the subtitles remain intact.

Classification

Last but not least, the AI offers another treat for audio: it can classify the sound of all clips according to criteria such as dialogue, effects, music or silence, and adds sub-categories where it has recognised something specific, such as sirens or dogs barking. These terms appear in the audio metadata under “Category” and “Subcategory” and can of course be corrected or supplemented there. You can then sort the material into Smart Bins on this basis, making your work even easier.
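What this metadata-driven sorting amounts to can be sketched in a few lines of Python. The clip records and field names below are hypothetical illustrations, not the Resolve scripting API; only the category values follow the article:

```python
from collections import defaultdict

# Hypothetical clip records carrying the AI-assigned audio categories.
clips = [
    {"name": "take_01.mov", "category": "Dialogue", "subcategory": ""},
    {"name": "amb_city.wav", "category": "Effects", "subcategory": "Siren"},
    {"name": "score_03.wav", "category": "Music", "subcategory": ""},
    {"name": "amb_park.wav", "category": "Effects", "subcategory": "Dog Barking"},
]

def smart_bins(clips):
    """Group clip names into bins keyed by (category, subcategory)."""
    bins = defaultdict(list)
    for clip in clips:
        key = (clip["category"], clip["subcategory"] or "General")
        bins[key].append(clip["name"])
    return dict(bins)
```

A Smart Bin in DR is essentially such a saved metadata query that updates itself as new clips are analysed.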

Even the preview shows that the contours are not very precise.

Relight

Let’s move on to the image, where the AI has also learnt something new. Relight is intended to simplify tasks that previously required complex Power Windows with tracking and therefore could not take into account the spatial situation in the image. Similar to Depth Map, Relight calculates the spatial constellation in the scene. On this basis, you can then place a directional light source, a point light or a spotlight and subsequently change the lighting.

This creates a halo on the background without an additional mask.

Once again, this is as amazing as it is difficult. Just as Depth Map cannot replace a green screen, the separation from the background is also problematic here. This can already be seen in the map preview: The AI has recognised the spatial situation correctly, but the mask is not clearly separated from the distant background. If we try to soften the lighting in the foreground, the result is unfortunately a “halo”.

Although Relight offers very versatile adjustments and even a connection to the tracker, this fundamental problem cannot be solved with this alone. You need precise masks or a key again. An additional ‘magic mask’ can help to a certain extent, but it is not always clearly delineated enough. On the other hand, it is clear that there is considerable potential with green screen if the background is replaced anyway and the lighting situation needs to be adjusted afterwards.

At dusk, the radiation from lamps can be enhanced quite credibly.

Another use case would be “Day for Night”, or rather twilight instead of night. If you can’t replace all the lamps with more powerful ones, they often can’t compete with the rest of the light. In this case, the function is very well suited to convincingly amplifying the light in their surroundings. The advantage here is that the angle to the light, especially in buildings, is correctly taken into account. Because you often need several light sources for this, while the effect is computationally expensive, one analysis can be used for several nodes. Casey Faris shows this quite well in a YouTube tutorial.

To understand the function: Relight does not generate a light source itself, but only defines its area of influence in 3D space. The result is a mask with grey scales, which then controls the effect of all the usual grading settings. Incidentally, the surface map is compatible with the method often used by 3D software for the alignment of surfaces. Such image data, usually called “normal maps”, can be exported and imported for corresponding tasks. This opens up far-reaching possibilities for the integration of real video and CGI.
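The packing convention behind such normal maps is easy to state: each component of a unit surface normal, in the range [-1, 1], is mapped into the RGB range [0, 1] as n * 0.5 + 0.5 (the exact channel orientation varies between tools). A small sketch, independent of Resolve:

```python
def encode_normal(n):
    """Pack a unit normal with components in [-1, 1] into RGB values in [0, 1]."""
    return tuple(c * 0.5 + 0.5 for c in n)

def decode_normal(rgb):
    """Recover the normal vector from normal-map RGB values."""
    return tuple(c * 2.0 - 1.0 for c in rgb)
```

A surface facing straight at the camera, (0, 0, 1), thus becomes the familiar light-blue (0.5, 0.5, 1.0) of a "flat" normal map.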

The surface map corresponds to a normal map, green is horizontal, red areas point to the right and blue to the left.

Blackmagic Cloud

Previously, only the project data was exchanged in a cloud account at BM. The transfer and synchronisation of proxies or even raw material had to be done via a Dropbox or Google Drive account, which was obviously not always easy. BM therefore now offers its own cloud storage (currently still in beta) at 15 US dollars a month for 500 GB of storage space. For comparison: Dropbox Plus costs 12 euros per month or 99 euros per year, for which you get 2 TB of storage space but can only exchange up to 2 GB of data per day.

Blackmagic now also offers storage space in its own cloud.

Google’s cheapest offer initially costs 1.99 per month for 100 GB. This makes the BM Cloud seem quite expensive. But apart from the fact that you can cancel at any time, it is also more convenient. With the right settings, proxies are automatically generated and synchronised in the background. In addition, the desired storage space can be adjusted at any time, with precise billing after cancellation, even for parts of a month. The Project Libraries still cost an additional 5 US dollars.
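To put these figures into perspective, a quick per-gigabyte comparison (plan prices as quoted above; euro and dollar treated as roughly equivalent here):

```python
# Monthly price divided by included storage, i.e. price per GB and month.
plans = {
    "Blackmagic Cloud (15 USD / 500 GB)": 15 / 500,
    "Dropbox Plus (12 EUR / 2000 GB)": 12 / 2000,
    "Google One (1.99 / 100 GB)": 1.99 / 100,
}
for name, per_gb in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{name}: {per_gb * 100:.1f} cents per GB")
```

Per gigabyte, the BM Cloud is indeed the most expensive of the three, which makes the convenience features all the more important.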

It’s not cheap, but it’s convenient.

With a fast connection, DaVinci Remote Monitor can be used to stream a session in DR to another location. The image quality is so good that it is even possible to judge the image on a calibrated monitor.

Remote monitoring already works quite well.

All computers must be equipped with DR Studio and either with Apple Silicon or, under Windows and Linux, with Nvidia RTX GPUs; other GPUs are currently not supported. On an iPad or iPhone, you can use the free app of the same name. In addition, all participants must have a BM Cloud account. The connections are, of course, protected with a session code.

Improvements and additions

So much for the most spectacular functions in the studio version. The AI-based Depth Map previously found in DR is now also available in Fusion. Magic Mask has been given new fine adjustments, similar to conventional keys, with an additional parameter for “Consistency”. This allows you to better define uneven mask edges over the course of the clip. For this to work, a “stroke” must exist over a sufficient number of individual frames.

The “Magic Mask” can now be fine-tuned.

You will often use the cache for such work, especially on somewhat weaker computers. If any inexplicable errors occur, it usually helps to simply delete the cache and regenerate it. It doesn’t run smoothly yet, especially when combining AI-based tasks and tracking with masks. In addition, you should not work with reduced “Timeline Proxy Resolution”, the other options for smoother work with limited computing power work better.

Timelines can now be quickly saved as individual backups, and colour management can be set separately for each timeline. The Cut page has gained many new functions, including subtitling with the speech recognition described above, simpler split edits, cut detection, and the optional creation of empty spaces in the main track. Anyone still struggling with interlaced video will breathe a sigh of relief, as cuts can now be correctly limited to full-frame boundaries.

Timelines including presets are sent directly to the render queue by right-clicking.

The Media Page now offers the export and import of timelines in OpenTimelineIO format. Timelines can be transferred directly to the render queue from here, with immediate selection of a preset and the storage location. When rendering individual clips, the complete originals can be output instead of the edited version. Power Bins can now also be exported or imported. DR also understands the latest XML format from Final Cut Pro.

Each timeline can have its own colour management.

Fusion now recognises the USD format (Universal Scene Description), which is becoming increasingly widespread and is also supported by the free Blender. There is also a specialised toolset including the MaterialX framework and support for USD Hydra renderers. Multi Merge simplifies the combination of several layers, and a native Depth Map can be found in the studio version. Clean Plates and Anaglyphs now have GPU support, and the splitter has also become much faster. Many will also be pleased with the 3D extrusion of shapes, with bevelled or rounded edges if desired. Corresponding shapes can also be designed with the sPolygon node.

New formats

The latest SDKs have been integrated for BM and RED cameras, and XAVC H and HS from Sony now also work. Apple Log is supported, but unfortunately still no ProRes RAW. Playback of AC3 sound is finally available under Linux and Macs decode AAC with low latency. Compression to ProRes, AV1, H.264, H.265, MP3 and AAC is now offered for the MKV container, plus FFV1 in MKV or QuickTime. Customised render presets can be written as XML files and can be transferred to other systems to achieve identical results (provided the hardware encoders play along).

Speed

HDR material can be graded faster. On Apple Silicon, spatial noise reduction has become significantly faster, but the biggest speed gains this time are for the AI tools on Nvidia GPUs with TensorRT. The first time the current version is started on a corresponding computer, an optimisation run is performed. As this is still a fairly new process, BM has cleverly provided a switch-off option this time. But even with newer AMD GPUs, such as the RX 7900, there are impressive speed gains for the AI tools.

Comment

With so many new features, users should be quite satisfied, shouldn’t they? There is even more in the details, including some long-cherished wishes. But unfortunately, this time Blackmagic has managed to unleash an extremely unfinished version on the world. Presumably IBC was the occasion to snatch software from the developers’ hands that would have been more suitable as a public beta. Even the hastily delivered 18.6.1 still caused considerable problems; only 18.6.2, which we tested, runs reasonably smoothly.
For the future, we would like the developers to concentrate a little more on bug fixes and a little less on new features.

Speed on PC hardware
We would like to provide you with a few more benchmark results from PCs for our test of Mac Studio from DP 23:05. Thankfully, these were created with the dedicated help of several members of the German DR forum. They also showed that 18.6 was mostly faster than 18.5.

A Ryzen 9 3950X with 128GB RAM and the RTX 4090 with 24GB VRAM was already way ahead with 18.5.1, with 6:11 at UHD and 15:57 (!) at 8K. The values for H.265 (Setting Master) and DNxHR HQX 10 Bit were almost identical. 

A Ryzen 7 5800X with 48 GB RAM and an RTX 4070 managed 13:25 in UHD in DNxHR, but needed significantly more, 21:05, due to the lack of a hardware encoder for H.265. This computer did not manage 8K. On the Intel side, a "Hackintosh" based on the 13900KF with 64 GB RAM and the Radeon RX 6900XT was quite fast in UHD with 11:40; 8K was not tested.

Under Windows, an Intel i9-10940X with 64 GB and the GeForce RTX 2080 Ti needed 22:08 for H.265 and 21:22 for DNxHR HQX. An i9-9900K with 32 GB RAM and the GeForce RTX 3080 was significantly slower: under DR 18.6, it took an hour and 6 minutes for H.265, and even a little longer for DNxHR. 8K was not tested.
An i7-9700K with 32 GB RAM and the inexpensive GeForce RTX 3060 Ti needed one hour and 34 minutes for UHD in DNxHR and two hours and 12 minutes for H.265 in 10-bit (8K was also out of the question here). The detailed results and their authors can be found here: bit.ly/bmd_forum

Another useful benchmark for Neatvideo is here, including the results of common devices: bit.ly/neatbench
