
CBR and VBR in mp4 H264 video files

October 26th, 2011

CBR versus VBR in video encoding

When referring to codecs, CBR (constant bitrate) encoding means that the rate at which a codec’s output data should be consumed is constant. By contrast, VBR (variable bitrate) encoding varies the amount of output data per time segment. VBR allows you to set a maximum and a minimum bitrate. The advantage of VBR is that it produces a better quality-to-space ratio than a CBR encoding of the same source material. The available bits are used more flexibly to encode the sound or video data more accurately, with fewer bits spent on less demanding passages and more bits spent on difficult-to-encode passages.

The disadvantages are that VBR takes more time to encode, as the process is more complex, and that it may pose problems when streaming over a web connection, since it is the maximum bitrate that matters there, not the average.

The generally accepted best practice is to use CBR when producing for streaming delivery, and VBR when producing for progressive download.
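
To make the difference between average and maximum bitrate concrete, here is a minimal Python sketch. The per-second segment sizes are made-up illustration values, not measurements from a real file, and the function name is hypothetical.

```python
def bitrate_stats(segment_bytes, segment_seconds=1.0):
    """Return (average, peak) bitrate in kbit/s for equally long segments."""
    rates = [size * 8 / 1000 / segment_seconds for size in segment_bytes]
    return sum(rates) / len(rates), max(rates)

# Hypothetical encodes of the same 5-second clip (bytes per one-second segment)
cbr_segments = [125_000] * 5
vbr_segments = [50_000, 75_000, 250_000, 175_000, 75_000]

cbr_avg, cbr_peak = bitrate_stats(cbr_segments)
vbr_avg, vbr_peak = bitrate_stats(vbr_segments)
print(f"CBR: average {cbr_avg:.0f} kbit/s, peak {cbr_peak:.0f} kbit/s")
print(f"VBR: average {vbr_avg:.0f} kbit/s, peak {vbr_peak:.0f} kbit/s")
```

Both clips have the same size and the same average bitrate, but the VBR clip needs twice the bandwidth during its busiest second, which is what a streaming connection has to sustain.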

 

MPEG-4 containers : mp4, m4a, m4p, m4v

October 25th, 2011

To play H264 encoded movies, the encoded video and audio files must be packaged in a specific type of container following the MPEG-4 Part 12 specification. Stream packaging, also known as muxing, is the procedure to combine multiple elements that enable control of the distribution delivery process into a single multiplexed media file.

The most common container for H264 encoded videos is specified by MPEG-4 Part 14 (standard ISO/IEC 14496-14) and has the file extension mp4. This format, often called the MP4 container, is based on the QuickTime format mov. Audio-only MPEG-4 files have an m4a extension, or m4p when they are encrypted.

Apple introduced the extension m4v for its iTunes applications. It is very close to .mp4; the differences include Apple’s optional DRM copy protection and the treatment of AC3 (Dolby Digital) audio, which is not standardized for the MP4 container.

The following command line tools are available to create and modify MP4 files by combining (multiplexing) previously encoded video or audio tracks, as well as subtitles, chapter information and metadata.

  • AtomicParsley : lightweight program for reading, parsing and setting metadata into MPEG-4 files
  • MP4Creator : tool from Cisco’s mpeg4ip suite that combines video, audio, text and other media to create MPEG-4 streams.

GUIs for both programs are also available.

More information about the MPEG-4 containers is listed hereafter :

Quicktime indexing (MOOV atom) in mp4 H264 video files

October 25th, 2011

QTIndexSwapper 2 (to move MOOV atom)

When a streaming mp4 H264 video file won’t play immediately, the reason could be a QuickTime (QT) index problem. This index is called the moov atom. The moov atom, also referred to as the movie atom, defines the timescale, duration and display characteristics of the movie, and contains subatoms with information for each track in the movie.

Often the moov atom is located at the end of the video file, and the player needs to load the entire file to read this information. The solution is simple: move the moov atom from the end of the file to the beginning. Renaun Erickson, Developer Evangelist for Adobe Systems Inc., created a simple tool called QTIndexSwapper 2 to do this job. This AIR application can be downloaded from his blog. Another tool to move the moov atom is MP4 FastStart.
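
Whether a file needs this treatment can be checked by listing its top-level atoms. The following Python sketch assumes the usual atom header layout (a 4-byte big-endian size followed by a 4-byte type, with size 1 announcing a 64-bit size and size 0 meaning "until end of file"). The helper names are hypothetical, and the demo data is a synthetic byte string, not a playable video.

```python
import struct

def top_level_atoms(data):
    """Yield (type, offset, size) for each top-level atom in an MP4 byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # 64-bit size stored right after the type field
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:  # atom extends to the end of the file
            size = len(data) - offset
        if size < header:
            break  # malformed atom, stop instead of looping forever
        yield kind.decode("latin-1"), offset, size
        offset += size

def moov_before_mdat(data):
    """True if the moov atom precedes the mdat atom (fast-start layout)."""
    order = [kind for kind, _, _ in top_level_atoms(data)]
    return ("moov" in order and "mdat" in order
            and order.index("moov") < order.index("mdat"))

def make_atom(kind, payload=b""):
    """Build a minimal atom for the demo below (not a playable file)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

slow = make_atom(b"ftyp", b"isom") + make_atom(b"mdat", b"\x00" * 32) + make_atom(b"moov")
fast = make_atom(b"ftyp", b"isom") + make_atom(b"moov") + make_atom(b"mdat", b"\x00" * 32)
print(moov_before_mdat(slow), moov_before_mdat(fast))
```

For a real file the bytes would come from `open(path, "rb").read()`; for large videos it would be kinder to memory to read only the atom headers and seek past the payloads.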

More information about atoms in mp4 files is available at the following links :

AVC (H264) video settings

October 24th, 2011

It’s not easy to configure an H264 codec to create videos which will play on different devices and stream from various servers on the web, including Amazon S3 Cloudfront. Some basic information about the different frame types of H264 is given in the post Smart editing of MPEG-4/H264 videos. The following list gives some information about the common H264 parameters :

CABAC : stands for Context Adaptive Binary Arithmetic Coding. Improves encoding efficiency at the expense of playback/decoding efficiency. The default option is on, unless the encoded video is to be played back on devices with limited decoding power (for example iPod). CABAC is only supported by the main and higher profiles.

Trellis : Trellis is only available with CABAC on. It improves quality, while maintaining a small file size but it will increase conversion time slightly. The default value is on.

Encoding mode :

  • Single Pass – Bitrate: encodes the video once with a set constant bitrate for each frame
  • Single Pass – Quantizer: encodes the video with a set quantizer (higher quantizer => lower quality) for each frame. The default value is 26, the maximum value is 51.
  • Single Pass – Quality: encodes the video with a set quality rating for each frame
  • Two Pass: encodes the video twice (once to determine its properties, and once more to ensure the selected output file size is reached with maximum efficiency). This is the most common setting.
  • Multi Pass: same as Two Pass, except for extra encoding passes to ensure even better quality and a more accurate file size. During multipass encoding, the results of the first pass are saved into a log file. In a second step the encoding is done based on the log file data.

Bit Rate : the average bitrate can be set between 0 and 5000 Kbit/s; the default values are 800 Kbit/s for low quality, 1000 Kbit/s for medium quality and 1200 Kbit/s for high quality.

  • Keyframe Boost  : High values give better visual quality but also bigger file sizes. The default value for I-Frames is 40%. Values vary from 0 to 70.
  • B-Frame reduction : these frames are responsible for the interpretation of motion in the video. This setting determines the reduction of quality in B-frames in favor of P-frames (predicted pictures). The default value is 30%, the range varies from 0 to 60%. For cartoons higher values are recommended.
  • Bitrate variability : This attribute indicates how far the bitrate is allowed to vary in relation to the target bitrate. A variable bitrate tells the encoder to vary the bitrate as needed, based on the information in the frames. The default value is 60%, the range varies from 0 to 100%.
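
When a two-pass encode targets a given file size, the average video bitrate follows from simple arithmetic. The sketch below is only an illustration: the function name and the 128 kbit/s audio default are assumptions, 1 MB is taken as 1,000,000 bytes, and container overhead is ignored.

```python
def video_bitrate_for_size(target_mb, duration_s, audio_kbps=128):
    """Video bitrate in kbit/s so that video plus audio fit in target_mb megabytes."""
    total_kbits = target_mb * 1000 * 8  # 1 MB taken as 1,000,000 bytes
    video_kbps = total_kbits / duration_s - audio_kbps
    if video_kbps <= 0:
        raise ValueError("target size is too small for this duration")
    return video_kbps

# An 8-minute clip that should fit into 60 MB
print(video_bitrate_for_size(60, 480))
```

In this example the total budget is 1000 kbit/s, of which 872 kbit/s remain for the video track.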

Quantization limits : these values are only used when the Single Pass – Quantizer encoding mode is selected.

  • Min QP : Values vary from 0 to 50, the default value is 10.
  • Max QP : Values vary from 0 to 51, the default value is 51
  • Max QP step : Values vary from 0 to 50, the default value is 4.
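
As background to the 0 to 51 range: in H.264 the quantizer step size roughly doubles for every increase of 6 in QP. The following sketch uses the step sizes commonly quoted for the standard; it is an illustration of that relationship, not a description of the encoder dialog above, and the names are hypothetical.

```python
# Step sizes for QP 0..5 as commonly quoted for H.264; each further
# block of six QP values doubles them.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def quantizer_step(qp):
    """Return the quantizer step size for a QP between 0 and 51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)

for qp in (10, 26, 32, 51):
    print(qp, quantizer_step(qp))
```

Going from the default of 26 to 32 doubles the step size, which is why a few QP points make a visible difference in quality and file size.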

Scene cuts : this option sets how H264 determines when a scene change has occurred and hence when a key frame is needed.

  • Scene cut threshold : The default value is 40. A higher value will allow H264 to be less sensitive to scene changes. A lower value is recommended for dark videos.
  • Min IDR frame interval : IDR means Instantaneous Decode Refresh. This parameter sets the number of frames that must pass before the encoder can detect a new scene change. Setting it too high results in too few scene changes being detected. Setting it too low results in an unnecessarily high bitrate. The range varies from 0 to 100,000, the default value is 25.
  • Max IDR frame interval : Setting this too low results in too many keyframes and thus wastes bitrate. The range varies from 0 to 100,000, the default value is 250.
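
The IDR intervals are given in frames, so their meaning in seconds depends on the frame rate. A small sketch, assuming a 25 fps source (the frame rate is an assumption for the example, not part of the settings above):

```python
def keyframe_interval_seconds(interval_frames, fps):
    """Convert a keyframe interval from frames to seconds."""
    return interval_frames / fps

# Default values from the list above, at an assumed 25 fps
print(keyframe_interval_seconds(25, 25))   # minimum interval
print(keyframe_interval_seconds(250, 25))  # maximum interval
```

With the defaults at 25 fps, keyframes are at least 1 second and at most 10 seconds apart, which also bounds how precisely a player can seek without decoding extra frames.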

Partitions : During the encoding process, the encoder will break down the video into so-called Macroblocks. Then it will search for similar blocks in order to discard redundant data. The macroblocks can be subdivided into 16×8, 8×16, 8×8, 4×8, 8×4, and 4×4 partitions. The partition searches increase accuracy and compression efficiency. As a general rule, the more search types are performed, the better and stronger the compression will be while maintaining a high quality output.

  • 8×8 transform : the 8×8 Adaptive DCT transform is a very powerful compression technique but it is not compatible with every device. It makes the video High Profile AVC.
  • 8×8, 8×16 and 16×8 P-Frame search : This setting enables the 8×8 partitions on P-Frames and thus improves the visual quality of these frames.
  • 8×8, 8×16 and 16×8 B-Frame search : This setting enables the 8×8 partitions on B-Frames and thus improves the visual quality of these frames.
  • 4×4, 4×8 and 8×4 P-Frame search : This setting enables the 4×4 partitions on P-Frames, but usually the quality improvement is negligible. Therefore this option is not worth the additional encoding time and can safely be turned off.
  • 8×8 intra search : This setting enables the 8×8 partitions on I-Frames and thus improves the visual quality of these frames, but it requires the 8×8 Adaptive DCT Transform.
  • 4×4 intra search : This setting enables the 4×4 partitions on I-Frames and thus improves the visual quality of these frames.
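
To get a feeling for how many macroblocks the encoder has to search, the frame can be divided into 16×16 blocks. This is a minimal sketch with a hypothetical function name; frames whose dimensions are not a multiple of 16 are rounded up to whole macroblocks.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows, total) of macroblocks covering a frame."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols, rows, cols * rows

print(macroblock_grid(720, 576))
print(macroblock_grid(1920, 1080))
```

A 720×576 frame consists of 1620 macroblocks, a 1920×1080 frame of 8160, and each of them may be examined with every enabled partition type.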

B-Frames :

  • Use as a reference : allows a B-Frame to reference another B-Frame to provide better quality. Only useful when using more than 2 consecutive B-Frames.
  • Adaptive : Turns on adaptive B-frames, which allows H264 to determine the number of B-frames to use. The default value is on. This option is only available when at least 1 B-frame has been set.
  • Bidirectional ME : allows predictions based on motion both before and after the B-frames. Default value is on.
  • Weighted bipredictional : allows B-Frames to be predicted more heavily from P-Frames, which results in improved accuracy and therefore a more efficient encoding. Default value is on. This option is only available when at least 1 B-frame has been set.
  • Direct B-Frame mode : temporal or spatial. The default value is temporal. The spatial mode handles animated content better.
  • Max consecutive : the number of consecutive B-Frames. The values vary from 0 to 5, the default value is 3.
  • Bias : Sets how much bias H264 should give to the usage of B-frames (higher means more use of B-frames). Setting this to 100 is the equivalent of not selecting the “Adaptive” option. The default value is 0, possible values vary from -100 to +100.
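
The effect of the "Max consecutive" setting is easiest to see in the resulting sequence of frame types. The sketch below generates a fixed pattern in display order, which corresponds to the non-adaptive case; an adaptive encoder would vary the number of B-frames. The function name is hypothetical.

```python
def frame_pattern(total_frames, consecutive_b=3):
    """Display-order frame types for a fixed (non-adaptive) B-frame pattern."""
    if total_frames < 1:
        raise ValueError("need at least one frame")
    types = []
    for i in range(total_frames):
        if i == 0:
            types.append("I")
        elif i % (consecutive_b + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    # A B-frame needs a later reference frame, so the sequence must not end on one
    if types[-1] == "B":
        types[-1] = "P"
    return "".join(types)

print(frame_pattern(9, consecutive_b=3))
print(frame_pattern(9, consecutive_b=0))
```

With the default of 3 consecutive B-frames, nine frames are coded as IBBBPBBBP; with 0 the result is a plain IPPPPPPPP sequence.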

Motion estimation :

  • Partition decision : This controls the precision with which the motion in the video is estimated. Values range from 1 to 6. The default value is 5. A setting of 6 is even better but it strongly increases the amount of time needed for the conversion.
  • Method : The better the method, the more efficient the compression and the higher the output quality. Hexagonal Search is the default setting. Uneven Multi-hexagon is meant for powerful computers, while Exhaustive search works only on super computers.
  • Range : this field is disabled when you select Hexagonal Search. It only works with the more powerful methods and specifies the motion search range in pixels. The more pixels are examined, the more processor power is needed, but the better the outcome. The values vary from 0 to 64, the default value is 16.
  • Max Ref Frames : This value indicates how many previous frames can be referenced by a P-frame or B-frame. The higher this value, the better the quality at the expense of speed. The values vary from 0 to 16, the default value is 0.
  • Mixed references : offers the codec greater freedom to make references on a smaller scale. This option is only available when the Max Ref Frames value is greater than 1.
  • Chroma ME : uses the color information in the video to estimate motions, which increases the visual quality. It is recommended to set this option on.

Misc. options :

  • Threads : This sets the number of CPU threads to use in encoding. Default value is 1.
  • Noise reduction : this setting depends on whether there is noise in the video images or not. Videos with noise appear grainy. Noise reduction filters out that noise, and the more noise you have, the higher you need to set the value. Varies from 0 to 65535. Default value is 0.
  • Deblocking filter : A deblocking filter is a video filter applied to blocks in decoded video to improve visual quality and prediction performance by smoothing the sharp edges which can form between macroblocks when block coding techniques are used. The strength (values from -6 to +6) and threshold (values from -6 to +6) of the filter are set. The default values are 0 and 0.

The H264 specifications define a number of different profiles specifying which compression features of H.264 are allowed or forbidden. In addition to the profiles, the H264 specifications also define a number of levels putting further restrictions on other properties of the video. These restrictions include the maximum resolution, the maximum bitrate and the maximum framerate. The common notation for Profiles and Levels is “Profile@Level”, for example Main@3.1. There is no way to directly encode a video to a specific level and/or profile; you must choose the encoder settings accordingly. Presets may be helpful to define the correct settings.

The most common profiles for webstreaming are baseline (BP) and main (MP). Some differences in the features for these profiles are shown hereafter :

Compression features | Baseline Profile | Main Profile
B-Frames             | no               | yes
CABAC                | no               | yes
FMO, ASO, RS         | yes              | no
PicAFF, MBAFF        | no               | yes

The next table shows the maximum values for some common levels :

Level Number | Video bitrate | Resolution & frame rate
1.3          | 768 Kbit/s    | 352×288 ; 30 fps
2.2          | 4 Mbit/s      | 352×576 ; 25 fps
3.1          | 14 Mbit/s     | 720×576 ; 25 fps
4.0          | 20 Mbit/s     | 1920×1080 ; 30 fps

Further information about H264 is available at the following websites :

MPEG-4 Part 2 and Part 10

October 19th, 2011

MPEG-4 Part 2 (MPEG-4 Visual) is a video compression technology developed by MPEG, similar to previous standards such as MPEG-1 and MPEG-2 and compatible with H.263. Several popular codecs including DivX and Xvid implement this standard.

MPEG-4 Visual should not be confused with MPEG-4 Part 10 which is commonly referred to as H.264 or AVC (Advanced Video Coding), and was jointly developed by ITU-T and MPEG.

AVC is currently one of the most commonly used formats for the recording, compression, and distribution of high definition video.

Create videos on YouTube

July 23rd, 2011

YouTube recommends the following applications to animate your own story or to create a video slideshow.

  • GoAnimate is a fun app that lets you make animated videos, for free, in just 10 minutes, without having to draw. You can even create your own cast of characters.
  • Xtranormal lets you turn anything you type into a fully-animated CG movie. Set up your scene, type in your script, and animate it instantly.
  • Stupeflix Video Maker lets you tell a story with your digital content. Mix pictures, videos, maps, text, music and watch Stupeflix produce a stunning video in a few seconds.
  • One True Media by SpotMixer is a simply powerful video creation tool. Robust, fast and easy video editing. Combine and clip video and photos.

YouTube Custom Player discontinued

July 21st, 2011

In June 2011, YouTube removed support for the creation of new Custom Players, a specialized way of embedding playlists for playback on third party sites in a customizable interface. Existing players will continue to function. An example of a YouTube custom player is embedded in Cedrix Crespel’s music video webpage at Leslie’s Artgallery.

Similar functionality is available through creating embedded playlists, which can be accessed by visiting http://youtube.com/my_playlists, clicking Share, and then using the embed code given there.

Square and non square video pixels : pixel aspect ratio and picture aspect ratio

July 21st, 2011

last update : November 14, 2011
Whereas pixels in the graphic and computer world are square, pixels in the old video world (PAL and NTSC) are non-square (Recommendation ITU-R BT.601-4). Video pixels in the HD world are, fortunately, square.

The term which describes this squareness or non-squareness is the pixel aspect ratio, expressed as a fraction of horizontal (x) pixel size divided by vertical (y) pixel size.

The PAL (576i) pixel aspect ratio (PAR) is 59/54 (about 1.093), the NTSC (480i) pixel aspect ratio is 10/11 (about 0.909).

The pixel aspect ratio must not be confused with the display aspect ratio (DAR), where the common values are 4:3 and 16:9 (anamorphic format).

When converting a video file from one size or format to another, the resulting video geometry will be stretched or squished if the pixel aspect ratio is not accommodated. Usually the errors are small and there is no great damage in the result if the correct conversion factor is ignored. The difference can become critical if filters are applied or other synthetic effects are added.
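
As a worked example of accommodating the pixel aspect ratio, the width of a non-square-pixel frame can be multiplied by the PAR to obtain the equivalent width in square pixels. The sketch uses the PAL value of 59/54 quoted above; the function name is hypothetical and the 720×576 frame size is an assumption for the example.

```python
from fractions import Fraction

PAL_PAR = Fraction(59, 54)  # PAL pixel aspect ratio quoted above

def square_pixel_size(width, height, par):
    """Return the frame size in square pixels; only the width is rescaled."""
    return round(width * par), height

print(square_pixel_size(720, 576, PAL_PAR))
```

A 720×576 PAL frame shown with square pixels would therefore have to be about 787 pixels wide; displaying it at 720 pixels makes everything look slightly too narrow.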

More detailed information is available in the Lurker’s Guide to Video from Chris Pirazzi.

Another problem is that the commonly used digital video resolutions don’t exactly represent the actual 4:3 or 16:9 picture aspect ratios. All commonly used modern digital video standards are based on their counterparts in analog video standards to avoid too many compatibility issues. The most used sampling rate in PAL and NTSC video systems is 13.5 MHz.

PAL has a line length of 64 µs, of which 52 µs contains actual image information, the rest is reserved for horizontal blanking. 52 µs × 13.5 MHz = 702 samples per scanline. In the vertical direction, there are 574 complete lines and 2 half lines, giving a total of 576 scanlines. Thus, the active image area for a 4:3 or 16:9 frame at 13.5 MHz sampling is 702×576 pixels.

For NTSC, the same calculation gives an image area of 711×486 pixels.

Instead of using 702 or 711 samples per line, the digital video standard defines 720 samples (= pixels) per line to allow for small deviations from the ideal timing values and to use a common sampling rate of 13.5 MHz.

When converting videos from one size to another, cropping or adding black side edges to the video is necessary to keep the correct image aspect ratio. Fortunately some video conversion software takes care of these conditions.
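
The PAL arithmetic above can be reproduced in a few lines of Python. Exact fractions are used to avoid rounding noise; the constant and function names are hypothetical, and the 59/54 pixel aspect ratio is the PAL value given earlier in this post.

```python
from fractions import Fraction

SAMPLING_RATE_HZ = 13_500_000  # 13.5 MHz

def samples_per_line(duration_us):
    """Number of samples taken during a line segment of the given duration."""
    return Fraction(duration_us, 1_000_000) * SAMPLING_RATE_HZ

active = samples_per_line(52)  # active picture part of a PAL line
total = samples_per_line(64)   # whole PAL line including blanking
dar = active / 576 * Fraction(59, 54)

print(active, total, float(dar))
```

The 702 active samples combined with the 59/54 pixel aspect ratio give 767/576, which is within a fraction of a percent of 4:3, whereas the full 720 samples cover slightly more than the 4:3 picture.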
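
How much black border is needed can be computed from the source and target sizes. The following sketch scales the source to fit inside the target frame and centers it; it assumes square pixels on both sides, and the function name is hypothetical.

```python
def fit_with_bars(src_w, src_h, dst_w, dst_h):
    """Scale the source to fit the target frame without distortion.

    Returns (scaled_w, scaled_h, pad_x, pad_y), where pad_x and pad_y are
    the widths of the black bars on each side (left/right and top/bottom).
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    pad_x = (dst_w - scaled_w) // 2
    pad_y = (dst_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_x, pad_y

print(fit_with_bars(640, 480, 1280, 720))  # 4:3 into 16:9, bars left and right
print(fit_with_bars(1280, 720, 640, 480))  # 16:9 into 4:3, bars top and bottom
```

Cropping is the opposite choice: using `max` instead of `min` for the scale factor would fill the target frame and cut off the overhanging edges instead of adding bars.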

More details and a conversion table are available in the Quick Guide to Digital Video Resolution and Aspect Ratio Conversions maintained by Jukka Aho. Another useful tutorial about Pixel Aspect Ratio is available at the doom9.net website.

Ken Burns video effect

June 26th, 2011

The Ken Burns video effect is a popular name for a  panning and zooming effect used in video production from still imagery. The name refers to Kenneth Lauren “Ken” Burns,  an American director and producer of documentary films known for his style of using archival footage and photographs.

The technique is principally used in historical documentaries where film or video material is not available. The effect is included in several movie editing programs, for instance on the Windows platform in AVS Video Editor.

Will McGugan published a tutorial about the Ken Burns effect in javascript and canvas on his blog.
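
The core of the effect is an interpolation of the visible crop rectangle over time. The sketch below is a generic illustration in Python, not the method from the tutorial mentioned above: it moves a rectangle (x, y, width, height) linearly from a start to an end position, one rectangle per output frame, and the function name and image sizes are made up.

```python
def ken_burns_crops(start, end, frames):
    """Linearly interpolate a crop rectangle (x, y, w, h) from start to end."""
    if frames < 2:
        raise ValueError("need at least two frames")
    crops = []
    for i in range(frames):
        t = i / (frames - 1)  # 0.0 at the first frame, 1.0 at the last
        crops.append(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))
    return crops

# Zoom from the full 1600x1200 still image into its center over 5 frames
for crop in ken_burns_crops((0, 0, 1600, 1200), (400, 300, 800, 600), 5):
    print(crop)
```

Each rectangle would then be cut out of the still image and scaled to the output frame size; keeping the same aspect ratio at the start and the end avoids distortion.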

FOURCC.org

November 21st, 2010

FOURCC is short for “four character code” – an identifier for a video codec, compression format, color or pixel format used in media files. Another way to write FOURCC is 4CC. To find out which FOURCCs are used within a media file, you need an application specialized in opening and inspecting media files. GSpot, MediaInfo and ACIcodec are some of the tools supporting FOURCC.
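
Because a FOURCC is just four bytes, it is often stored as a single 32-bit integer. The following sketch converts between the two forms. The little-endian byte order is an assumption that matches AVI-style headers; other containers may simply store the four characters in file order. The function names are hypothetical.

```python
import struct

def fourcc_to_int(code):
    """Pack a four-character code into a 32-bit integer (little-endian)."""
    if len(code) != 4:
        raise ValueError("a FOURCC has exactly four characters")
    return struct.unpack("<I", code.encode("ascii"))[0]

def int_to_fourcc(value):
    """Unpack a 32-bit integer back into its four-character code."""
    return struct.pack("<I", value).decode("ascii")

value = fourcc_to_int("H264")
print(hex(value), int_to_fourcc(value))
```

The round trip shows why hex dumps of such headers look reversed: the first character ends up in the lowest byte of the integer.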

A list of a few hundred video-codecs is available at the FOURCC website, a list of RGB- and YUV-pixel formats is available at the same site.

Audio codecs are not identified by FOURCCs but by audio tags (audio identifiers), each of which identifies one specific audio codec or one type of audio compression scheme. An audio tag is an integer value, often written in hexadecimal.

A great website about video- and audio-codecs is MovieCodec Forums/Downloads.