
The Scalable Video Technology for AV1 (SVT-AV1 Encoder and Decoder) with perceptual enhancements for psychovisually optimal AV1 encoding

vghuy17ck1/svt-av1-psyex

An exotic, extended, exciting continuation of SVT-AV1-PSY: SVT-AV1-PSYEX

Since the original SVT-AV1-PSY project was sunset, I decided to build svt-av1-psyex: a place for all of us to keep developing the best open AV1 video encoder in novel ways, with the goal of attaining the best visual fidelity at all quality levels when compressing videos.

As such, SVT-AV1-PSYEX is the Scalable Video Technology, Psychovisually Extended: it adds advanced perceptual improvements, additions and tuning for psychovisually optimal media encoding. The goal is to create the best encoding implementation for perceptual quality with AV1. We may or may not implement bleeding-edge features and optimizations, and may even extend mainline features beyond their intended purpose.

The biggest change to SVT-AV1-PSYEX 3.0.2-B: tune 0

Compared to previous SVT-AV1-PSYEX versions, svt-av1-psyex 3.0.2-B has superior default settings. The main change relates to the visual tune: tune 0 is now the default instead of tune 1. This was done to shift visual quality toward fidelity. However, simply switching the default from tune 1 to tune 0 would have considerably increased encoding artifacts.

For this reason, I set --noise-adaptive-filtering 0 by default, which disables noise-adaptive CDEF and restoration filters. This restores original CDEF and restoration filter applications similar to tune 1/tune 2.

Normally in tune 0/3, noise-adaptive CDEF and restoration filters are enabled: if noise levels are high enough, CDEF and restoration filtering get completely disabled. This has the effect of increasing sharpness levels quite a bit in some scenarios.

However, at lower bitrates and for a lot of 2D animated content, completely disabling those filters when there's just a bit of noise can wreck image stability and/or quality around lines. With previous svt-av1 and svt-av1-psy versions, this tradeoff was worth it since there were no other ways to boost image quality; today it's a completely different story, with much more powerful psychovisual options available.

This allowed us to set --tune 0 --noise-adaptive-filtering 0 and still get higher visual quality than the previous default of --tune 1 in svt-av1-psyex 3.0.2-A in almost all encoding scenarios!

If you want the previous tune 0 behavior, you can set --noise-adaptive-filtering 1. Do note that 1 enables both noise-adaptive CDEF and restoration filtering at all times, for all presets. --noise-adaptive-filtering 1 can be forced on any tune, which can be useful if you want --tune 2 to behave more like --tune 3.

For reference, here is where various basic tunes sit on the fidelity-to-appeal scale (highest fidelity on the left, highest appeal on the right):

--tune 0 --noise-adaptive-filtering 1 < --tune 0 --noise-adaptive-filtering 2 < --tune 1

A set of guidelines for high quality encoding in SVT-AV1-PSYEX 3.0.2-B

In the previous encoder version, SVT-AV1-PSYEX 3.0.2-A, I included setting recommendations for various encoding scenarios. However, this backfired when I started seeing people use settings outside of their recommended usage guidelines; I also realized that I had unintentionally contributed to the practice of cargo-culting by recommending settings whose effects only I could imagine and view.

For this reason, I'm avoiding recommending definite setting strings from now on. Instead, I'll just provide a few general recommendations down below.

1) ---> Simplicity is key

If you're doing general encoding or are at the beginning of your encoding journey, I'd recommend sticking with the well-tuned defaults and only changing things like the speed preset (P2 to P6 recommended), CRF, and maybe a higher psy-rd if everything else fails.

2) ---> General recommendations for intermediate encoders.

Note: higher fidelity = sharper with more chances of artifacts, higher appeal = fewer artifacts with more potential for blurriness.

Higher fidelity at no direct speed cost

For sharper, more consistent visuals (with a chance of more artifacts) without slowing the encoder down:

  • Increase --psy-rd strength.
  • Set --qm-min 8
  • Set --noise-adaptive-filtering 1

This will make the encode more consistent and increase visual sharpness, but might introduce more artifacts at the same bitrate; this is usually worth it.

For more challenging content, a further quality increase can be achieved with:

  • --noise-norm-strength 3 (Default is 1)

A bit more consistency over efficiency

For more consistent quality at the expense of compression efficiency:

  • --qp-scale-compress-strength 2 (the default is 1; higher values offer diminishing returns).

Higher visual quality with a CPU tradeoff

For a significant visual quality increase at the cost of more encoding time:

  • Add --complex-hvs 1 to the above settings. This enables much higher quality mode decisions, which can greatly increase visual quality in all scenarios when psy-rd is active, particularly at higher strengths.
  • If using Preset 3 and faster, set --hbd-mds 1 to get static 10-bit mode decision, increasing visual quality and efficiency at all times.
  • If using Preset 4 and faster, set --enable-dlf 2 to increase deblocking filter quality.

Anime Encoding (Minimal Blur)

To preserve smooth, clean high quality lines in anime without excessive bitrate:

  • Keep your high-quality settings and add --noise-adaptive-filtering 4.
    • This enables noise-adaptive filtering only for restoration, allowing CDEF to clean up lines effectively while retaining grain in more demanding scenarios, where restoration gets effectively disabled.

Extreme Grain Retention (Advanced users only)

Only for content with overwhelming natural or artificial grain (e.g., old films, Breaking Bad or massive artificial grain dumps). Not recommended for clean, modern sources. Expect large file sizes and slow encodes, particularly with natural content from most modern cameras, which is relatively clean and doesn't deserve the bloat.

You have been warned: don't expect great efficiency with these settings at CRF50 1080p60 natural content. It is also rather slow.

--preset X --complex-hvs 1 --crf XX --enable-cdef 0 --enable-restoration 0 --enable-tf 0 --spy-rd 1 --noise-norm-strength 3 --qm-min 10 --tune 0 --qp-scale-compress-strength 3 --scm 0 --psy-rd 4.0 --hbd-mds 1

If you want much more detailed information, you can visit the x266 wiki on the subject and expect some future articles there for a truly profound... deep dive. Sorry for the word play :)

Feature Additions

Provides control over our augmented AQ Modes 0 and 2 which can utilize variance information in each frame for more consistent quality under high/low contrast scenes. Four curve strength options are provided, and the default is strength 2; 1: mild, 2: gentle, 3: medium, 4: aggressive

Controls how "selective" the algorithm is when boosting superblocks, based on their low/high 8x8 variance ratio. A value of 1 is the least selective, and will readily boost a superblock if only 1/8th of the superblock is low variance. Conversely, a value of 8 will only boost if the entire superblock is low variance. Lower values increase bitrate. The default value is 6.
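The selectivity rule above can be read as a simple threshold on how many 8x8 blocks in a superblock are low variance. Below is a minimal C sketch, assuming a 64x64 superblock (64 blocks of 8x8) and a hypothetical `low_var_threshold`; `sb_should_boost` is an illustrative name, not a function from the encoder, and the real code may measure and rank variances differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_SB 64 /* a 64x64 superblock holds 8x8 = 64 blocks of 8x8 pixels */

/* Hypothetical helper: returns true when at least octile/8 of the
 * superblock's 8x8 blocks have a variance below low_var_threshold.
 * octile = 1 boosts readily; octile = 8 requires every block to be low variance. */
bool sb_should_boost(const unsigned variances[BLOCKS_PER_SB],
                     unsigned low_var_threshold, int octile)
{
    if (octile < 1 || octile > 8)
        return false; /* outside the documented 1..8 range */

    int low_count = 0;
    for (size_t i = 0; i < BLOCKS_PER_SB; i++)
        if (variances[i] < low_var_threshold)
            low_count++;

    /* low_count / 64 >= octile / 8  <=>  low_count >= octile * 8 */
    return low_count >= octile * (BLOCKS_PER_SB / 8);
}
```

With only 8 of 64 blocks below the threshold, a value of 1 boosts the superblock while a value of 2 does not, which is why lower values spend more bits.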

  • --variance-boost-curve

Enables an alternative variance boost curve, with different bit allocation and visual characteristics. The default is 0. A third curve (--variance-boost-curve 3) will be added in the next release for HDR content.

  • Presets -2 & -3

Terrifically slow encoding modes for research purposes, as well as extremely low filesize challenges.

  • Enhanced Tune 0

Using the knowledge gained from the Tune 3 implementation, we greatly enhanced tune 0 to the point that it's become the favored tune for high fidelity video encoding, including demanding anime content. Further updates on related options will be added to push tune 0 into a different direction for less demanding animu content, since many have asked for less... aggressive tuning.

  • Tune 3

A new tune based on Tune 2 (SSIM) called SSIM with Subjective Quality Tuning. Generally harms metric performance in exchange for better visual fidelity with the SSIM+SSIM-RD tuning.

  • Tune 4

Another new tune based on Tune 2 (SSIM) called Still Picture. Optimized for still images based on SSIMULACRA2 performance on the CID22 Validation test set. Not recommended for use outside of all-intra (image) encoding.

  • --sharpness

A parameter for modifying loop filter deblocking sharpness and rate distortion to improve visual fidelity. The default is 1 (mild sharpness bias).

  • --dolby-vision-rpu path to file

Set the path to a Dolby Vision RPU for encoding Dolby Vision video. SVT-AV1-PSY needs to be built with the enable-libdovi flag enabled in build.sh (see ./Build/linux/build.sh --help for more info) (Thank you @quietvoid !)

  • Progress 3

A new progress mode that provides more detailed information about the encoding process.

Argument for providing a film grain table for synthetic film grain (similar to aomenc's '--film-grain-table=' argument).

  • Extended CRF (for now, only quarter step CRF has been merged to mainline)

Provides a more versatile and granular way to set CRF. Range has been expanded to 70 (from 63) to help with ultra-low bitrate encodes, and can now be set in quarter-step (0.25) increments.
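As a sketch of what quarter-step CRF means in practice, the hypothetical helper below accepts a CRF only if it lies on a 0.25 grid and stores it as an integer count of quarter steps. The upper bound of 70 comes from the text above; the lower bound of 1 is an assumption borrowed from mainline SVT-AV1, so adjust it if your build differs.

```c
#include <stdbool.h>

/* Assumed bounds: 70 is documented above, 1 is assumed from mainline. */
#define CRF_MIN 1.0
#define CRF_MAX 70.0

/* Hypothetical helper: returns true and writes crf * 4 to *quarters when crf
 * is in range and lies exactly on a 0.25 step (e.g. 27.25 -> 109). */
bool crf_to_quarter_steps(double crf, int *quarters)
{
    if (crf < CRF_MIN || crf > CRF_MAX)
        return false;
    double scaled = crf * 4.0;   /* exact for multiples of 0.25 */
    int q = (int)(scaled + 0.5); /* round to nearest; crf is positive here */
    if ((double)q != scaled)
        return false;            /* not a multiple of 0.25 */
    *quarters = q;
    return true;
}
```

Storing quarter steps as integers (27.25 becomes 109) avoids comparing floating-point CRF values later on.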

  • --qp-scale-compress-strength

Increases video quality temporal consistency, especially with clips that contain film grain and/or fast-moving objects. The default is 1, a conservative setting for most content.

  • --enable-dlf 2

Enables a more accurate loop filter that prevents blocking, for a modest increase in compute time (most noticeable at presets 7 to 9). This stops being useful at Preset 3.

The default is 1, which is based on the preset.

  • Higher-quality presets for 8K and 16K

Lowers the minimum available preset from 8 to 2 for higher-quality 8K and 16K encoding (64 GB of RAM recommended per encoding instance).

Previously known as --frame-luma-bias. Enables frame-level luma bias to improve quality in dark scenes by adjusting the frame-level QP based on each frame's average luminance. The default is 0.

Restricts available transform sizes to a maximum of 32x32 pixels. Can help slightly improve detail retention at high fidelity CRFs. The default is 0.

Adaptively varies the film grain blocksize based on the resolution of the input video. Often greatly improves the consistency of film grain in the output video, reducing grain patterns. The default is 1, and is the recommended setting.

  • --hdr10plus-json path to file

Set the path to an HDR10+ JSON file for encoding HDR10+ video. SVT-AV1-PSY needs to be built with the enable-hdr10plus flag enabled in build.sh (see ./Build/linux/build.sh --help for more info) (Thank you @quietvoid !)

  • --tf-strength

Manually adjust temporal filtering strength to tune the trade-off between fewer artifacts in motion and fine detail retention. Each increment is a 2x increase in temporal filtering strength; the default value of 1 is 4x weaker than mainline SVT-AV1's default temporal filter (which would be equivalent to 3 here).

Set the minimum & maximum quantization matrices for chroma planes. The defaults are 8 and 15, respectively. These options decouple chroma quantization matrix control from the luma quantization matrix options currently available, allowing for more control over chroma quality.

Allows the encoder to accept content with odd width and/or height (e.g. 1920x817px). Gone are the "Source Width/Height must be even for YUV_420 colorspace" messages.
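Odd dimensions work because 4:2:0 chroma planes can simply round up: the common convention, also used by AV1, is a chroma size of (luma size + 1) / 2 in each direction. This minimal C sketch shows only that arithmetic; `chroma_size_420` is a hypothetical helper and says nothing about how SVT-AV1-PSYEX pads frames internally.

```c
/* Width and height of one plane, in samples. */
typedef struct { int width, height; } PlaneSize;

/* 4:2:0 chroma is half the luma size in each direction, rounded up,
 * so odd luma dimensions still map to a whole number of chroma samples. */
PlaneSize chroma_size_420(int luma_w, int luma_h)
{
    PlaneSize c = { (luma_w + 1) >> 1, (luma_h + 1) >> 1 };
    return c;
}
```

For the 1920x817 example above, this gives a 960x409 chroma plane.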

Allows the encoder to accept content with width and/or height as small as 4 pixels (e.g. 32x18px).

  • --noise-norm-strength 0 to 4

In a scenario where a video frame contains areas with fine textures or flat regions, noise normalization helps maintain visual quality by boosting certain AC coefficients. The default value is 1; a recommended value is 3.

  • --kf-tf-strength 0 to 4

Manually adjust temporal filtering strength specifically on keyframes. Each increment is a 2x increase in temporal filtering strength; a value of 1 is 4x weaker than mainline SVT-AV1's default temporal filter (which would be equivalent to 3 here). The default value is 1, which reduces alt-ref temporal filtering strength by 4x on keyframes.
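Both temporal filtering options share the same scale: each step doubles the strength, and 3 matches mainline's default. The helper below (a hypothetical name) converts a setting into its strength relative to mainline; it assumes --tf-strength uses the same 0 to 4 range documented for --kf-tf-strength.

```c
/* Strength relative to mainline SVT-AV1's default temporal filter:
 * 2^(setting - 3), computed with plain multiplications to avoid libm. */
double tf_relative_strength(int setting)
{
    double s = 1.0;
    for (int i = setting; i < 3; i++)
        s /= 2.0; /* each step below 3 halves the strength */
    for (int i = 3; i < setting; i++)
        s *= 2.0; /* each step above 3 doubles it */
    return s;
}
```

So the default of 1 maps to 0.25x (4x weaker than mainline), 3 maps to 1x, and 4 maps to 2x.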

  • --enable-tf 2

Adaptively varies temporal filtering strength based on 64x64 block error. This can slightly improve visual fidelity in scenes with fast motion or fine detail. Setting this to 2 overrides --tf-strength and --kf-tf-strength, as their values are then determined automatically by the encoder.

  • --psy-rd 0.0 to 6.0

Configures psychovisual rate distortion strength to improve perceived quality by measuring and attempting to preserve the visual energy distribution of high-frequency details and textures. The default is 1.0.
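To illustrate the idea of preserving a block's visual energy, here is a toy psy-rd term in C. It follows the general approach popularized by x264 (compare the texture energy of the source and the reconstruction, and penalize the difference); it is not SVT-AV1-PSYEX's actual formula, and the energy measure (sum of absolute deviations from the block mean) and the function names are assumptions.

```c
#include <stdlib.h>

/* Illustrative energy measure: sum of absolute deviations from the block
 * mean, i.e. a rough stand-in for the block's AC (texture) content. */
static long ac_energy(const unsigned char *px, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += px[i];
    long mean = sum / n; /* integer mean is enough for a sketch */
    long energy = 0;
    for (int i = 0; i < n; i++)
        energy += labs((long)px[i] - mean);
    return energy;
}

/* Hypothetical psy-rd term: penalize reconstructions whose texture energy
 * differs from the source, scaled by the --psy-rd strength. */
double psy_rd_penalty(const unsigned char *src, const unsigned char *rec,
                      int n, double psy_rd_strength)
{
    long diff = labs(ac_energy(src, n) - ac_energy(rec, n));
    return psy_rd_strength * (double)diff;
}
```

In a real RD search, a term like this would be added to the usual distortion + lambda * rate cost, so a blurred candidate that saves bits by dropping texture becomes less attractive.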

  • --spy-rd 0 to 2

Configure psychovisually-oriented pathways that bias towards sharpness and detail retention, at the possible expense of increased blocking and banding. The default is 0, with 1 being the most aggressive and 2 being less aggressive.

This setting controls the minimum & maximum quantization matrices for chroma planes. The defaults are 8 and 15, respectively. These options decouple chroma quantization matrix control from the luma quantization matrix options currently available, allowing for more control over chroma quality.

The default minimum is already good, but we recommend setting --chroma-qm-min 10 for more challenging content, as the encoder tends to choose overly aggressive chroma quantization matrices.

  • --low-q-taper

This setting prevents the encoder from choosing extremely low quantizers for blocks/keyframes, tapering off the quantizers chosen below q11; this can greatly increase efficiency at very low CRF. Original explanation: "Low q taper. If macroblocks are boosted below q11, taper the effect"
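One way to picture "taper the effect" is to keep only part of any boost that would push q below 11. The C sketch below halves that excess; both the halving factor and the function name are hypothetical, chosen only to show the shape of the idea, and q here is on whatever scale the option uses internally.

```c
/* Threshold taken from the description above ("below q11"). */
#define LOW_Q_TAPER_START 11

/* Hypothetical taper: q values at or above the threshold pass through,
 * while any boost below it is only half applied (assumed factor). */
int low_q_taper(int boosted_q)
{
    if (boosted_q >= LOW_Q_TAPER_START)
        return boosted_q;
    int excess = LOW_Q_TAPER_START - boosted_q;
    return LOW_Q_TAPER_START - excess / 2; /* keep only half the extra boost */
}
```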

Default is 0.

  • --sharp-tx

This setting disables conventional transform optimizations to provide a sharper overall output, decided entirely by other metrics such as psy-rd. It has the effect of making psy-rd much stronger, which is why it is enabled by default. For more appealing output in much less demanding scenarios, you can disable it with --sharp-tx 0, although this is not recommended for grainy content.

Default is 1.

  • --hbd-mds

This setting is short for High Bit Depth Mode DecisionS (hbd-md was already taken internally, and reusing it caused some bugs). It controls the bit depth at which mode decision operations are performed. On Preset 2 and slower, 10-bit mode decision is ALWAYS used, no matter what.

0 follows the default preset behavior, 1 forces 10-bit mode decision for everything, 2 is adaptive 8/10-bit mode decision based on the scenario, 3 is always 8-bit.

Default is 0, following default preset behavior.
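The documented values map naturally onto an enum. This sketch shows one plausible way the effective mode could be resolved, including the Preset 2 and slower override; `preset_choice` is a hypothetical stand-in for whatever the preset would pick on its own, and all names are illustrative rather than the encoder's.

```c
/* Values documented for --hbd-mds. */
typedef enum {
    HBD_MDS_PRESET_DEFAULT = 0, /* follow the preset's own choice */
    HBD_MDS_10BIT = 1,          /* always 10-bit mode decision */
    HBD_MDS_ADAPTIVE = 2,       /* adaptive 8/10-bit mode decision */
    HBD_MDS_8BIT = 3            /* always 8-bit mode decision */
} HbdMds;

/* Hypothetical resolver for the effective mode decision bit depth. */
HbdMds resolve_hbd_mds(int preset, HbdMds requested, HbdMds preset_choice)
{
    if (preset <= 2)
        return HBD_MDS_10BIT;   /* Preset 2 and slower: always 10-bit */
    if (requested == HBD_MDS_PRESET_DEFAULT)
        return preset_choice;   /* 0 defers to the preset */
    return requested;           /* 1, 2 or 3 are forced as given */
}
```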

  • --complex-hvs

This is a new and very interesting setting, as it enables a higher complexity metric for mode and transform decisions. When enabled, it switches from the low complexity VAR/SAD metrics to the higher complexity SSD metric. When combined with --psy-rd, particularly at higher strengths, it can greatly increase visual quality.

Normally, on presets faster than P-1, the default metric is VAR (Variance). With high quality psy-rd enabled (--psy-rd >= 1.2), the metric changes from VAR to SAD (Sum of Absolute Differences). This change alone already increases the strength and quality of psy-rd.

Setting --complex-hvs 1 changes the metric from SAD to SSD (Sum of Squared Differences). In other words, it's the same squared error that SSE, MSE and PSNR are built on.

By itself, it's honestly not worth the extra 20% encoding time that it brings. However, when combined with psy-rd, it significantly amplifies the strength of psy-rd, making it a much stronger, more visually accurate metric; the difference it makes to visual quality in challenging scenarios is honestly mind-blowing.

As some of you might have recognized, psy-rd combined with SSD makes for a low complexity version of PSNR-HVS.

When using psy-rd on Preset 6 and slower, it is highly recommended to set --complex-hvs 1 to optimize visual quality to the fullest. It is not recommended to set --complex-hvs 1 on presets faster than 6.

Default is 0.
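The metric switch described above can be sketched in C as follows. SAD and SSD are computed the standard way; `select_metric` encodes only the rules stated in this section (preset gating is left out), and all names are illustrative assumptions rather than the encoder's internal API.

```c
#include <stdlib.h>

typedef enum { METRIC_VAR, METRIC_SAD, METRIC_SSD } HvsMetric;

/* Metric selection as described above (preset gating omitted). */
HvsMetric select_metric(int complex_hvs, double psy_rd)
{
    if (complex_hvs)
        return METRIC_SSD;
    return psy_rd >= 1.2 ? METRIC_SAD : METRIC_VAR;
}

/* Sum of Absolute Differences between two blocks of n samples. */
long block_sad(const unsigned char *a, const unsigned char *b, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += labs((long)a[i] - b[i]);
    return s;
}

/* Sum of Squared Differences: the error SSE/MSE/PSNR are built on. */
long block_ssd(const unsigned char *a, const unsigned char *b, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++) {
        long d = (long)a[i] - b[i];
        s += d * d;
    }
    return s;
}
```

For an error of {4, 0, 0, 0}, SAD is 4 and SSD is 16, while {1, 1, 1, 1} gives 4 for both: SSD punishes concentrated errors much harder than SAD does.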

  • --noise-adaptive-filtering

This setting controls the noise detection algorithm that turns off CDEF/restoration filtering if the noise level is high enough; this feature is enabled by default if you use tune 0/tune 3. By popular request, a member of our community has decided to add this setting to improve visual appeal on less demanding content.

0 forcefully disables the noise-adaptive CDEF/restoration filters, resulting in CDEF/restoration filtering always being on. 1 enables noise adaptive CDEF and restoration filters.

2 follows the default tune behavior.

3 only enables noise-adaptive filtering for CDEF, forcing restoration filtering at all times. 4 only enables noise-adaptive filtering for restoration, enabling CDEF at all times.

Default is 0 to improve the appeal of tune 0.
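The five values can be summarized as a small lookup. In this sketch, mode 2 assumes that default tune behavior means both filters are noise-adaptive on tune 0/3 and neither is on other tunes, as described above; any preset-dependent gating inside the encoder is not modelled, and the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool adaptive_cdef;        /* CDEF may be switched off on noisy content */
    bool adaptive_restoration; /* restoration may be switched off on noisy content */
} NoiseAdaptive;

/* Sketch of the documented --noise-adaptive-filtering values (0 to 4). */
NoiseAdaptive resolve_noise_adaptive(int mode, int tune)
{
    NoiseAdaptive r = { false, false };
    switch (mode) {
    case 1: /* both filters noise-adaptive */
        r.adaptive_cdef = r.adaptive_restoration = true;
        break;
    case 2: { /* follow the tune: assumed on for tune 0/3 only */
        bool tune_default = (tune == 0 || tune == 3);
        r.adaptive_cdef = r.adaptive_restoration = tune_default;
        break;
    }
    case 3: /* CDEF adaptive, restoration always on */
        r.adaptive_cdef = true;
        break;
    case 4: /* restoration adaptive, CDEF always on */
        r.adaptive_restoration = true;
        break;
    default: /* 0: both filters always on */
        break;
    }
    return r;
}
```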

Modified Defaults

SVT-AV1-PSYEX has enhanced defaults versus mainline SVT-AV1 in order to provide better visual fidelity out of the box. They include:

  • Default 10-bit color depth when given a 10-bit input.
  • Disable film grain denoising by default, as it often harms visual fidelity. (Merged to Mainline)
  • Enable quantization matrices by default through --enable-qm 1
  • --tune 0 has been enabled for higher quality encodes by default
  • --noise-adaptive-filtering 0 has been enabled to balance the sharpness from tune 0 while still allowing for higher sharpness than tune 1.
  • Set --qm-min 4 by default for more consistent performance that min QM level 0 doesn't offer. It has been increased from 2, as 4 provides the most balanced gains overall.
  • --chroma-qm-min 10 has been set by default to greatly increase chroma quality, as the encoder will tend to pick too low chroma QMs otherwise.
  • --enable-variance-boost 1 enabled by default.
  • --keyint -2 (the default) uses a ~10s GOP size instead of ~5s.
  • --sharpness 1 by default to prioritize encoder sharpness.
  • Sharp transform optimizations (--sharp-tx 1) are enabled by default to supercharge svt-av1-psy psy-rd optimizations. It is recommended to disable it if you don't use --psy-rd.
  • --tf-strength 1 by default for much lower alt-ref temporal filtering to decrease blur for cleaner encoding.
  • --kf-tf-strength 1 controls are available to the user and are set to 1 by default to remove KF artifacts.
  • --psy-rd 1.0 is set by default. When combined with --sharp-tx 1, it makes tune 1 much stronger compared to mainline SVT-AV1.
  • --qp-scale-compress-strength 1 has been set to increase visual quality consistency.

We are not in any way affiliated with the Alliance for Open Media or any upstream SVT-AV1 project contributors who have not also contributed here.

Other Changes

Prints the information found in Appendix A.2 of the user guide in order to help users more easily understand the Color Description Options in SvtAv1EncApp.

  • Micro-Releases

We are continuously improving SVT-AV1-PSY, and we always recommend using the master branch to experience exciting new features as soon as they can be considered usable. To make our feature additions clearer, micro-release tags indicate when significant new features have been added. Micro-release tags are letters starting with A, so new releases will be tagged as v#.#.#-A, v#.#.#-B, etc.

  • Enhanced Content Detection

Tune 4 features a smarter content detection algorithm to optimize the encoder for either screen or photographic content based on the image. This helps Tune 4 achieve better visual fidelity on still images.

Building

For Linux, macOS, & Windows build instructions, see the PSY Development page.

Getting Involved

For more information on SVT-AV1-PSYEX and this project's mission, see the PSY Development page.

Use SVT-AV1-PSYEX

One way to get involved is to use SVT-AV1-PSYEX in your own AV1 encoding projects, increasing the impact our work has on others! You and your users will also be able to provide feedback on the encoder's overall performance and report any issues you encounter. Your name will also be added to this page.

If you use svt-av1-hdr or svt-av1-psyex, it doesn't matter; I'll still include you on this page. Just make sure to not miss too many letters and write svt-av1-e instead.

Projects Featuring SVT-AV1-PSY:

Support Development

If you'd like to directly support the team working on this project, we accept monetary donations via the "Sponsor" button at the top of this repository (it has a pink heart within the button frame). Your donations will help the core development team continue to improve the encoder, our support efforts, and our documentation - a little goes a long way, and we appreciate it immensely.

License

Up to v0.8.7, SVT-AV1 is licensed under the BSD-2-clause license and the Alliance for Open Media Patent License 1.0. See LICENSE and PATENTS for details. Starting from v0.9, SVT-AV1 is licensed under the BSD-3-clause clear license and the Alliance for Open Media Patent License 1.0. See LICENSE and PATENTS for details.

SVT-AV1-PSY does not feature license modifications from mainline SVT-AV1.

Documentation

For additional docs, see the PSY Development page.
