How Digital Video Compression Works

H.264 is a video compression standard. It is used overwhelmingly for internet video, phones, Blu-ray movies, security cameras, drones, pretty much everything. It is also the final delivery codec that Bonafide Film House uses.

H.264 is an impressive piece of technology. It is the result of 30+ years of work with one single goal in mind: to reduce the bandwidth required for full-motion video transmission.

The purpose of this post is to give you insight into some of the higher-level details of how it works. I will try not to bore you too much.

Why would we want to compress anything in the first place?

A simple uncompressed video is huge. While editing, I export pre-rendered videos in Apple ProRes 422, and a 30-minute video is typically 30 gigabytes, far too large to deliver to a client. A simple uncompressed video file (even larger than the Apple ProRes files I work with) contains an array of 2D buffers holding the pixel data for each frame. So it’s a 3D array of bytes (2 spatial dimensions and 1 temporal). Each pixel takes 3 bytes to store, one byte each for the three primary colors (red, green and blue). So a 1080p video at 60 frames a second comes to about 370 megabytes of raw data every second (1920 × 1080 pixels × 3 bytes × 60 frames ≈ 373 million bytes).

This is next to impossible to deal with. A 50 GB Blu-ray disc would only hold about 2 minutes of video. That’s not going to work.
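To check the arithmetic above, here is a small Python sketch. It assumes 8-bit RGB (3 bytes per pixel), a 1920×1080 frame, 60 frames per second and a 50 GB disc measured in decimal gigabytes. Real files add container overhead, so treat the numbers as rough.

```python
# Back-of-the-envelope size of uncompressed 8-bit RGB video.
WIDTH, HEIGHT = 1920, 1080   # one 1080p frame
BYTES_PER_PIXEL = 3          # one byte each for red, green and blue
FPS = 60                     # frames per second


def raw_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Bytes needed to store one second of uncompressed video."""
    return width * height * bytes_per_pixel * fps


rate = raw_bytes_per_second(WIDTH, HEIGHT, BYTES_PER_PIXEL, FPS)
print(f"{rate / 1e6:.0f} MB per second")   # 373 MB per second

DISC_BYTES = 50e9  # a 50 GB Blu-ray disc, in decimal gigabytes
minutes = DISC_BYTES / rate / 60
print(f"{minutes:.1f} minutes per disc")   # 2.2 minutes per disc
```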

Shedding Weight

Imagine you’re building a car for racing. You obviously want to go fast, so what is the first thing you do? You shed some weight. Say your car weighs 3000 pounds. You throw away stuff you don’t need. The carpets? Gone. That radio? Get it out of here. Heater? Sure don’t need that. Engine? Probably should keep that. You remove everything except the things that matter.

The idea of throwing away bits you don’t need to save space is called lossy compression. H.264 is a lossy codec: it throws away less important information and only keeps the important bits.

Important stuff? How does H.264 know what’s important?

There are a few obvious ways to reduce the size of images. Maybe the top right quarter of the frame is useless all the time, so we could zero out those pixels and discard that area. Then we would only be using three quarters of the space. Or maybe we could crop out a thick border around the edges of the frame, since the important stuff is happening in the middle of the frame anyway. Yes, you could do this, but this isn’t how H.264 works.

What does H.264 actually do?

H.264, like other lossy image algorithms, discards detail information.

[Image: the original photo of a MacBook Pro (left) and the compressed version (right)]

Compare these two images. See how the compressed one does not show the holes in the MacBook Pro’s speaker grilles? If you don’t zoom in, you probably wouldn’t even notice the difference. The image on the right is 7% of the size of the original. That is a huge difference already, and we haven’t even started.

7%? How did you pull that off?

Information Entropy

If you paid attention in your information theory class, you might vaguely remember information entropy. Information entropy is not simply the size of a dataset. It is the minimum number of bits needed to represent all the information contained in that dataset. For example, if your dataset is the result of a single toss of a fair coin, you need 1 bit of entropy. If you have to record two coin tosses, you’ll need 2 bits. Make sense?
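For readers who want the formula, Shannon entropy is H = Σ p · log₂(1/p), summed over the possible outcomes. The Python sketch below computes it from a list of outcomes, taking each probability from how often it appears. The helper name shannon_entropy_bits is made up for illustration. It reproduces the 1-bit and 2-bit coin examples.

```python
import math
from collections import Counter


def shannon_entropy_bits(outcomes) -> float:
    """Shannon entropy H = sum(p * log2(1/p)), with p taken from outcome frequencies."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# One fair toss: two equally likely outcomes.
print(shannon_entropy_bits(["H", "T"]))                 # 1.0
# Two fair tosses: four equally likely outcomes.
print(shannon_entropy_bits(["HH", "HT", "TH", "TT"]))   # 2.0
# A coin that always lands heads tells you nothing new.
print(shannon_entropy_bits(["H"] * 10))                 # 0.0
```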

Suppose you have a coin, you’ve tossed it 10 times, and every time it lands on heads. How would you describe this dataset to someone? You wouldn’t say “HEADS HEADS HEADS HEADS HEADS HEADS HEADS HEADS HEADS HEADS”. You would just say “10 tosses, all heads”. Well, there you go: you’ve just compressed a large dataset. This is obviously an oversimplification, but you’ve transformed some data into another, shorter representation of the same information. You’ve reduced data redundancy, and the information in the dataset has not changed. This type of encoder is called an entropy encoder. It is a general purpose lossless encoder that works for any type of data.
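Strictly speaking, the “10 tosses, all heads” trick is run-length encoding, one of the simplest lossless ways to remove redundancy. The entropy coders inside H.264 are more sophisticated. Here is a minimal Python sketch of the idea, with hypothetical helper names:

```python
from itertools import groupby


def run_length_encode(symbols):
    """Collapse runs of repeated symbols into (symbol, count) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def run_length_decode(pairs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return [sym for sym, count in pairs for _ in range(count)]


tosses = ["HEADS"] * 10
encoded = run_length_encode(tosses)
print(encoded)                                # [('HEADS', 10)]
assert run_length_decode(encoded) == tosses   # lossless: nothing was thrown away
```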

Frequency Domain

Now that you understand information entropy, let’s move on to how data is represented. There are some fundamental units used to represent data. If you use binary, you have 0 and 1. If you use hexadecimal, you have 16 symbols. You can easily transform between the two systems because they are essentially equivalent.

Now imagine you can transform any dataset that varies over space, such as the brightness values of an image, into a different coordinate space. So instead of x-y coordinates, we have frequency coordinates: Frequency X and Frequency Y are the axes now. This is called a frequency domain representation. Another mathematical result, the basis of Fourier analysis, states that you can do this for any such data. The transformation is perfectly lossless as long as you keep enough frequency components, meaning you go high enough in Frequency X and Frequency Y.

What is Frequency X and Frequency Y?

Frequency X and Frequency Y are another kind of base unit. Switching from binary to hexadecimal gives us different fundamental units, and in the same way we’re switching from the familiar X-Y to Frequency X and Frequency Y. Here is what our image looks like in the frequency domain.

[Image: the MacBook Pro photo in the frequency domain]

The fine grille of the MacBook Pro puts most of its information into the higher frequency components of that image: finely varying content means high frequency components. Gradual variations in color and brightness, such as gradients, are low frequency components. Everything else falls somewhere in between. So fine details equal high frequency, and gentle gradients equal low frequency.

In the frequency domain representation, the low frequency components are near the center of the image. The higher frequency components are toward the edge of the image.
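To make “low frequency” and “high frequency” concrete, here is a small Python sketch of a one-dimensional discrete cosine transform (DCT), a close relative of the Fourier transform. H.264 itself uses a small integer approximation of the DCT on 4×4 blocks (and, in High profile, 8×8 blocks) rather than a transform of the whole picture. This sketch only illustrates the idea and does not reproduce the codec’s actual arithmetic. A gentle gradient puts its energy into the first few coefficients, and an alternating pattern like a speaker grille puts it into the last ones. The inverse transform recovers either row exactly.

```python
import math


def _scale(k: int, n: int) -> float:
    """Normalization that makes the transform orthonormal (energy-preserving)."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(x):
    """Orthonormal DCT-II: coefficient k measures how much of frequency k is in x."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (the orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


gradient = [10, 20, 30, 40, 50, 60, 70, 80]      # gentle ramp in brightness
fine_detail = [0, 255, 0, 255, 0, 255, 0, 255]   # alternating, like grille holes

for name, row in (("gradient", gradient), ("fine detail", fine_detail)):
    coeffs = dct(row)
    print(name, [round(c) for c in coeffs])
    # The transform by itself loses nothing: inverting it gives the row back.
    assert all(abs(a - b) < 1e-9 for a, b in zip(idct(coeffs), row))
```

Ignoring the first coefficient, which is just the average brightness, the gradient’s largest coefficient sits at the start of the list. The alternating row’s largest coefficient sits at the very end.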

Why do all this?

Because now you can take the image containing all of the frequency domain information, mask out the edges, and discard the high frequency components. If you convert back to your regular x-y coordinates, you’ll find that the resulting image looks similar to the original but has lost some of the fine details. The image now occupies only a fraction of the space. By controlling how big your mask is, you can tune precisely how detailed you want your output images to be.
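The masking step can be sketched in Python on a single 8×8 block. There are two assumptions to flag. First, the sketch uses a plain orthonormal 2-D DCT rather than H.264’s integer transform. Second, with a DCT the low frequencies sit in the top-left corner of the coefficient grid instead of the center of the plot, so the “mask” keeps a top-left square. The test block is made-up data: a smooth gradient plus a faint checkerboard texture.

```python
import math

N = 8  # block size


def basis(k: int, i: int) -> float:
    """Orthonormal DCT-II basis: value of frequency k at pixel position i."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * N))


def dct2(block):
    """2-D DCT: coeffs[u][v] is vertical frequency u, horizontal frequency v."""
    return [[sum(basis(u, x) * basis(v, y) * block[x][y]
                 for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2(): rebuild pixels from frequency coefficients."""
    return [[sum(basis(u, x) * basis(v, y) * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def low_pass(coeffs, keep: int):
    """The 'mask': zero every coefficient outside the top-left keep x keep square."""
    return [[coeffs[u][v] if u < keep and v < keep else 0.0 for v in range(N)]
            for u in range(N)]


# Made-up 8x8 brightness block: a smooth gradient plus a faint checkerboard texture.
block = [[16 * x + 8 * y + (4 if (x + y) % 2 else 0) for y in range(N)]
         for x in range(N)]
coeffs = dct2(block)

for keep in (8, 4, 2):
    approx = idct2(low_pass(coeffs, keep))
    worst = max(abs(approx[x][y] - block[x][y]) for x in range(N) for y in range(N))
    print(f"kept {keep * keep:2d}/64 coefficients, worst pixel error {worst:.1f}")
```

Keeping all 64 coefficients reproduces the block exactly. Keeping fewer throws away the fine checkerboard texture first, the same way the speaker grille holes disappear.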

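[Image: the image reconstructed with progressively smaller frequency masks, each labeled with its size as a fraction of the original]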

The numbers represent the information entropy of each image as a fraction of the original. Even at 2%, you don’t notice the difference at this zoom level. 2%! Your race car now weighs 60 pounds!

So that’s how you shed weight. This process in lossy compression is called quantization.
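In real codecs the “mask” is not a hard cutoff. H.264, like JPEG, divides each transform coefficient by a step size and rounds the result to an integer. Small high-frequency coefficients round to zero, and the entropy coder can then store those zeros very cheaply. The Python sketch below uses plain floating-point division with made-up coefficient values. H.264’s actual quantizer uses integer scaling controlled by a quantization parameter (QP).

```python
def quantize(coeffs, step):
    """Encoder: divide each coefficient by the step size and round to an integer level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Decoder: scale the levels back up. Whatever rounding threw away is gone for good."""
    return [level * step for level in levels]


# Made-up row of transform coefficients: large low frequencies first,
# small high frequencies at the end.
coeffs = [412.0, -63.5, 18.2, -7.9, 3.1, -2.4, 1.2, 0.6]

for step in (4, 16):
    levels = quantize(coeffs, step)
    print(f"step {step:2d}: levels {levels} -> rebuilt {dequantize(levels, step)}")
```

With a step of 4, two of the eight coefficients become zero. With a step of 16, five do. A larger step means a smaller file and a blurrier picture.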

Chroma Subsampling

The human eye and brain system is not very good at resolving finer details in color. It can detect minor variations in brightness very easily, but not color. So there must be some way to discard color information to shed even more weight.

In a TV signal, RGB (red, green, blue) color information gets transformed to YCbCr. Y is the luminance (essentially brightness, from black to white), and Cb and Cr are the chrominance (color) components. RGB and YCbCr are equivalent in terms of information entropy.
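Here is what the conversion looks like, sketched in Python with the full-range BT.601 coefficients used by JPEG/JFIF. That choice is an assumption for illustration. HD video encoded with H.264 usually uses the BT.709 coefficients and a “limited” 16–235 range, so the exact numbers differ, but the idea is the same. A neutral gray comes out with Cb and Cr at the midpoint value 128, and converting back recovers the original color to within rounding.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG/JFIF) conversion for 8-bit RGB values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b             # luminance: weighted brightness
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr()."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


# A neutral gray has no color, so both chroma values sit at the midpoint.
print([round(v, 3) for v in rgb_to_ycbcr(200, 200, 200)])   # [200.0, 128.0, 128.0]

# Round trip: the published coefficients are rounded to a few decimals,
# so the result matches the original to well within one 8-bit step.
back = ycbcr_to_rgb(*rgb_to_ycbcr(255, 64, 0))
print([round(v) for v in back])                              # [255, 64, 0]
```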

Why complicate matters? Why not use RGB?

Back before we had color television, we only had the Y signal. When color TVs started coming along, engineers had to figure out a way to transmit color along with Y. Instead of using two separate data streams, they encoded the color information into two color-difference signals (the analog ancestors of Cb and Cr) and transmitted them along with the Y information. That way, black and white televisions simply ignore the chrominance components and display Y, while color televisions convert all three back to RGB internally. In today’s digital video, the trick goes further: the Y component typically gets encoded at full resolution and the C components at only a quarter resolution. The eye and brain are terrible at detecting color variations, so you can get away with this. One full-resolution plane plus two quarter-resolution planes is half of three full-resolution planes, so this reduces the total bandwidth by half with very little visual difference. So we’ve reduced the data by half! Your race car now weighs 30 pounds.
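The halving follows from counting samples. The Python sketch below uses hypothetical helper names. It averages each 2×2 block of a chroma plane, which is one simple way to produce 4:2:0 chroma (real encoders may use different downsampling filters). It then counts bytes for an 8-bit 1080p frame with and without subsampling.

```python
def subsample_420(plane):
    """Average each 2x2 neighborhood of a chroma plane (width and height assumed even)."""
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


def frame_bytes(width, height, chroma_divisor):
    """8-bit samples per frame: a full-size Y plane plus two chroma planes shrunk by chroma_divisor."""
    luma = width * height
    chroma = 2 * (width * height) // chroma_divisor
    return luma + chroma


full = frame_bytes(1920, 1080, 1)   # 4:4:4, same as 3 bytes per pixel of RGB
sub = frame_bytes(1920, 1080, 4)    # 4:2:0, each chroma plane at quarter resolution
print(full, sub, sub / full)        # 6220800 3110400 0.5

cb = [[100, 102, 90, 90],
      [98, 100, 90, 90]]
print(subsample_420(cb))            # [[100.0, 90.0]]
```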

This process of discarding some of the color information is called chroma subsampling. It is not specific to H.264 and has been around for decades, but it is used almost universally.

Those are the big weight shedders for lossy compression. Our frames are now tiny, since we discarded most of the detail information and three quarters of the color samples.

Can we take this even further?

Yes, in fact we can. Weight shedding is only the first step. So far we’ve only looked at the spatial domain within a single frame. Now it’s time to explore temporal compression, where we look at a group of frames across time.

Motion Compensation

H.264 is a motion-compensated compression standard.

Imagine you’re watching a tennis match. The camera is fixed on a certain angle. The only thing moving is the ball back and forth. How would you encode this information? The same method using a 3D array of pixels, two dimensional in space and one in time?

No. Most of the image is the same. The court, the net and the crowds are all static. The only thing moving is the ball. What if you could just have one static image of everything in the background and then one moving image of just the ball? That would save lots of space.

This is exactly what H.264 does. It splits up the image into macroblocks, typically 16×16 blocks, that it uses for motion estimation. It encodes one static image, called an intra frame or I-frame. This is a full frame containing all the bits required to construct the frame. Subsequent frames are either P-frames (predicted) or B-frames (bi-directionally predicted). A P-frame encodes a motion vector for each macroblock, pointing to the matching block in a previous frame, along with a residual: the small difference between that prediction and the actual block. So a P-frame has to be constructed by the decoder based on previous frames. The decoder starts with the last I-frame in the video stream and then walks through every subsequent frame, applying the motion vector deltas as it goes along until it arrives at the current frame.
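The encoder’s search for motion vectors is called block matching. Here is a toy Python version with several simplifying assumptions: 4×4 blocks instead of 16×16 macroblocks, whole-pixel motion only, and an exhaustive search scored by the sum of absolute differences (SAD). Real H.264 encoders split macroblocks into smaller partitions, use quarter-pixel motion vectors and search much faster. All function names are made up for the example. A bright 4×4 “ball” moves right 2 pixels and down 1. The search finds the vector pointing from its new position back to where it was in the previous frame. The residual is zero because the ball itself did not change.

```python
def make_frame(width, height, ball_x, ball_y, ball=4):
    """Black frame with a bright ball x ball square whose top-left corner is (ball_x, ball_y)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(ball_y, ball_y + ball):
        for x in range(ball_x, ball_x + ball):
            frame[y][x] = 255
    return frame


def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def best_motion_vector(cur, ref, bx, by, size, search):
    """Exhaustive block matching: try every shift within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def motion_compensate(ref, bx, by, dx, dy, size):
    """Decoder side: predict a block by copying the displaced block from the reference."""
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)] for y in range(size)]


previous = make_frame(12, 12, ball_x=2, ball_y=3)
current = make_frame(12, 12, ball_x=4, ball_y=4)   # ball moved right 2, down 1

cost, dx, dy = best_motion_vector(current, previous, bx=4, by=4, size=4, search=3)
print(dx, dy, cost)   # -2 -1 0

prediction = motion_compensate(previous, 4, 4, dx, dy, 4)
residual = [[current[4 + y][4 + x] - prediction[y][x] for x in range(4)] for y in range(4)]
print(residual)       # all zeros: the vector alone describes this block
```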

B-frames are even more interesting, because the prediction happens bi-directionally, from both past frames and future frames. Since you’re mostly encoding motion vector deltas, this technique is extremely efficient for any video with motion. Now we’ve covered both spatial and temporal compression! Quantization saved us a ton of space, and chroma subsampling halved the space required again. On top of that, motion compensation stored only 3 full frames for the 300 frames in our test video.
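For a B-frame block, the decoder forms two motion-compensated predictions, one from a past frame and one from a future frame, and combines them. In H.264’s default (unweighted) mode, the combination is a rounded average, (a + b + 1) >> 1. Explicit weighted prediction also exists but is left out here. Here is a minimal Python sketch with made-up pixel values:

```python
def bipredict(past_block, future_block):
    """Average two motion-compensated predictions, rounding halves up: (a + b + 1) >> 1."""
    return [[(a + b + 1) >> 1 for a, b in zip(row_past, row_future)]
            for row_past, row_future in zip(past_block, future_block)]


# Made-up 2x2 brightness blocks fetched from a past and a future frame,
# for example a region that is slowly getting brighter.
past = [[100, 102], [104, 106]]
future = [[110, 112], [115, 117]]
print(bipredict(past, future))   # [[105, 107], [110, 112]]
```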

Looks pretty good, now what?

We use a traditional lossless entropy encoder to finish it off.

Entropy Encoder

The I-frames, even after the lossy steps, still contain redundant information. So do the motion vectors for the macroblocks in the P- and B-frames. Entire groups of them have the same values, since several macroblocks move by the same amount when the image pans in our test video.

An entropy encoder will take care of the redundancy. And since it is a general purpose lossless encoder, we don’t have to worry about what tradeoffs it’s making. We can recover all the data that goes in.
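H.264 uses the Exp-Golomb code for many header fields. In its CAVLC entropy mode, it also uses Exp-Golomb for motion vector differences, while its other mode, CABAC, uses arithmetic coding instead. Motion vectors are sent as differences from a prediction based on neighboring blocks, so in a smooth pan most differences are zero. Exp-Golomb gives zero the shortest possible code, a single bit. The Python sketch below uses hypothetical helper names and made-up motion vector differences.

```python
def ue(value: int) -> str:
    """Unsigned Exp-Golomb: leading zeros (one fewer than the bits in value + 1), then value + 1 in binary."""
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def se(value: int) -> str:
    """Signed Exp-Golomb: map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... then use ue()."""
    return ue(2 * value - 1 if value > 0 else -2 * value)


def decode_ue(bits: str, pos: int = 0):
    """Read one ue() code starting at pos; return (value, position after the code)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def decode_se(bits: str, pos: int = 0):
    """Read one se() code and undo the signed-to-unsigned mapping."""
    k, end = decode_ue(bits, pos)
    return ((k + 1) // 2 if k % 2 else -(k // 2)), end


# Made-up motion vector differences during a smooth pan: mostly zero.
mvds = [0, 0, 0, 1, 0, 0, -1, 0]
stream = "".join(se(d) for d in mvds)
print(stream, len(stream))   # 111010110111 12

decoded, pos = [], 0
while pos < len(stream):
    value, pos = decode_se(stream, pos)
    decoded.append(value)
print(decoded == mvds)       # True
```

Eight values fit in 12 bits, and decoding the bit string returns exactly the same values. The code is lossless.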

And we’re done! At its core, this is how H.264 works. These are its tricks.

I am massively oversimplifying several decades of intense research in this field. If you want to know more, the Wikipedia page is pretty descriptive.

By Justin Kietzman

Director and Editor at Bonafide Film House
Published September 1, 2016

Part 2: What I learned from starting a YouTube channel using nothing but GoPros. by Justin Kietzman

After work one day, my previously mentioned co-worker and I decided to head out to the Lower Madison River near Bozeman, Montana. We were armed to the teeth with every piece of fishing equipment we owned and two GoPro Hero 2s. I had cut up their cases so I could stick Panasonic stereo mics in them, which completely removed the waterproofing (I don’t understand how I never destroyed one of those cameras).

We were amped and ready to catch fish.

We got to our fishing spot at Red Mountain campground, a location where we had caught many fish before, put on the GoPros and hit the water with our fly rods. Unluckily for us, a screaming drunk person on a float went past every 2 minutes, spilling beer in the water and making as much noise as possible. So after about two hours of fishing and one complete set of GoPro batteries, we decided to hang up the graceful touch of the fly rods, move downriver to faster water and break out the spinning rods.

Once there, I realized the battery situation was more dire than I had thought. The battery on the GoPro I was wearing was at 50%, and I only had one spare, which went to my co-worker. So the rest of the day was nothing more than me occasionally switching on my GoPro when I thought I was about to catch a fish, then forgetting to switch it back off, draining my battery even more. Luckily, my partner that day caught a few fish while his camera was running, scoring us some decent, if very shaky, footage.

GoPros are the most reliable cameras on the market.

If you treat them well and follow the rules, they will work. As long as you have a fast enough memory card, you will never get a file error, especially in Protune. They can record video continuously for the life of the battery, and they will not shut off on you like most DSLRs. The batteries run for only two hours. Battery life is the primary complaint I hear about GoPros, but it mostly comes from people with little camera experience. Besides one Canon camcorder I owned, the GoPro has the longest single-battery life of any camera I own. If you need the camera to run longer, they sell Battery BacPacs, or you can use an external charger, upping your time to 4-6 hours.

Next week I will talk about my editing workflow with the Go-Pro footage.

 

Stay Tuned

 

-Justin Kietzman

Transitions: Film to Digital By: Anthony Cohen

My first real camera was a 1980 Canon AE-1. I was 18 and in my first semester of college when I entered my first darkroom. After only a week I experienced the magic of an image materializing on the page for the first time, and I never looked back. Nothing is better than blasting some music, chilling out and processing film after work.

If you have ever made the transition from film to digital, you know you lose some of the magic. Digital media is amazing, a marvel of modern technology: you get to take photos lightning fast and review them instantly. These are major bonuses, but the finality of clicking the shutter is gone, and photography becomes less special.

That is when I received my second real camera, a Sony a5100. To be honest, I bought it for its amazing video, and I have since moved up through the line, buying a Sony a6000 and a Sony a7s. When I first got into video production, I noticed some of the old magic coming back. You have that one chance to capture a moment perfectly. You can try and recreate it, but it’s never exactly the same. You might argue that it’s the same way with digital photography, but I feel it more when I am filming something. From the calculated and controlled click of the record button to the hours it takes to edit and refine a video into a workable piece, the similarities between shooting video and shooting 35mm film are there.

When I started shooting live events, the similarities became even more apparent. Just like with film, you have timed shots. You need to take your time and not miss anything, but your camera can only be pointed in so many directions. There is a sort of adrenaline rush in knowing you have one chance to get the perfect shot. Shooting video, just like film, feels more personal.

 

I now mainly shoot on a Sony a7s and an a6000, but I do carry my old AE-1 around from time to time. I still get to use the darkroom at the local university.

Cameras:

Sony A7s shooting in XAVC S or externally

Sony a6000 shooting in XAVC S

Canon AE-1 loaded with Ilford Delta 400, usually pushed to 1600

 

Programs:

Adobe Premiere Pro CC 2016

Adobe After Effects CC 2016

Local darkroom using Ilford RC matte paper

By Anthony Cohen

Bozeman Winter Farmers Market – Behind the Scenes

 

On January 30, 2016, the Bonafide Crew woke up early, still exhausted from filming a concert for a documentary the night before. Luckily, today was an easier, very pleasant job that didn’t involve carrying heavy gear through the snow.

We had prepped the Blackmagic Pocket Cinema the night before, rigging it up with the 7-inch field monitor, Zoom H4 audio recorder, Rode shotgun microphone, 10,000 mAh external battery and the big memory card in the belly of the camera. It was mounted on the Manfrotto 60-inch tripod with the Manfrotto fluid head. Justin would be operating this camera today.

Anthony grabbed the Sony a6000 as the B cam, which he used for wides and zoomed cutaways. Because the a6000 has a “Custom -3 -3 -3” mode, it can come close to the soft, desaturated image of the Blackmagic.

We spent about an hour at the event, starting by filming B-roll of the hallways upstairs while Anthony found out where the event was actually held (downstairs, for anyone wondering). Once we made it to the farmers market, I started filming wides of the crowds while Anthony did some sniper shots of the ceiling lights.

After a few minutes of that, we got a bearing on who was interested in talking to the camera. Luckily, Justin noticed that something was wrong with the audio, as the Blackmagic was showing no active levels. We pulled the Zoom recorder off the rig and set it in front of people as they spoke. This picked up a very high level of room noise, but it was better than nothing.

After having some very pleasant conversations with quite a few of the vendors, we decided to head out.

Once we were home, both cards were dumped to our working drive. Since the Blackmagic records to ProRes HQ and the Sony records to XAVC S, no transcoding was needed.

A little over a week later, Justin started editing in Premiere Pro CC. After cutting, he chose a very warm, soft color scheme that matches the feeling of being at a farmers market. Very little After Effects CC work was needed, besides some warp stabilizing.

 

Cameras Used:

Blackmagic Pocket Cinema
Sony A6000 Shooting in XAVC S

Software Used:

Premiere Pro CC
After Effects CC

 

 

Part 1: What I learned from starting a YouTube channel using nothing but GoPros. by Justin Kietzman

 

As anyone involved in video production knows, it is very easy to get caught up in the gear. You often put off projects, or don’t even consider certain shots, based solely on the fact that you don’t have the right filter with you or your lens isn’t fast enough to get the shot. From a professional standpoint these thoughts are often totally valid, but that’s what’s funny about YouTube: there, they are never valid. The YouTube audience does not care about picture quality. They care only about content.

But first, a bit of backstory.

Over the summer of 2015, after returning to work from reconstructive knee surgery, I was spending most of my time hiking or doing laps around the Gallatin Valley on my road bike, trying to get my strength back. After a few months of this, I started to get very bored with the repetitiveness. I kept seeing the same tour buses heading to West Yellowstone as I passed Gooch Hill, and the same backs of trail runners as they passed me while I headed up Sourdough Canyon Trail.

Noticing my frustration and boredom, a co-worker offered one day to take me fishing, as he knew I loved saltwater fishing.

I hail from central North Carolina, where the only fishing we had was for crappies, which isn’t the most exciting, and for bass. Bass fishing is absurdly hard for a kid fishing in a lake that is regularly used for national bass competitions. So some of the best memories I have as a kid are the times we went deep sea fishing off the coast of Myrtle Beach or Charleston, SC. When you go out 75 miles in a fishing boat with a captain who has been doing it for 30 years, you catch an absolutely insane amount of fish.

During my knee surgery, I spent a lot of time rigging my open-top kayak and going pier and surf fishing with my father. I also spent a lot of time watching Robert Field’s YouTube videos, seeing how he handled the larger fish in his kayak, all the while really taking in how he filmed his shows.

All of this came back really quickly the first time I fished in Montana. The fish here are hungry. Before, when I thought of freshwater fishing, I envisioned sitting on a boat in the middle of a still lake, in 98-degree heat and 95% humidity, not catching anything all day. But now I think about that first time I stood in the middle of the Madison, watching with sniper’s eyes. I would see 14-inch rainbows peek out from behind their rocks and decide to chase my lure while I reeled in my size 2 Mepps as fast as I could, then run off when they saw me standing there. It was exhilarating. I had been involved in an FPV (first person view) action sports YouTube channel the year prior, so I had a feeling this was something people wanted to see: fishing through the eyes of the fisherman.

That’s when Intense Fish was born. It was the simplest of plans, using the simplest of gear, doing the simplest of sports: fishing.

I will get into the technicals of how I made IntenseFish and Bonafide Fishing next week.

Stay Tuned.

 

-Justin Kietzman

Five Rivers Lodge – Behind the Scenes

On January 17th, 2016, the Bonafide Film House crew set out in our production van to Dillon, Montana to film Five Rivers Lodge. We were armed with our Blackmagic Cinema, the Sony a6000 and a slew of GoPro 3s to complement our aerial footage.

We arrived at the lodge around 9:45 am, quickly unpacked our equipment and got the drone in the air. After capturing some stunning footage of the surrounding mountain ranges and breathtaking views, we set the drone aside to set up our time lapse cameras (GoPros in this case). After pressing record on those, we moved into the lodge, breaking out the Blackmagic, lights, sliders and Manfrotto fluid head to capture each room’s unique feel.

Once we completed the inside of the lodge, we moved back outside with a freshly charged DJI Phantom 3 Professional and used the bird’s-eye view to really show the vastness of the property and the beauty of its surroundings. Start to finish, including the 3-hour drive, we were in the field for 6 hours.

Once we got home we started ingestion, dumping and rendering all the time lapses out into the CineForm mezzanine codec. Justin took over from there, using Adobe After Effects CC 2016 and Adobe Premiere Pro CC 2016 to edit, stylize and color correct all of our footage of the lodge to the best of his ability. In the end, Bonafide Film House produced a quality piece to help promote the lodge.

Cameras used:

Programs used:

 

 

Waves of Ventura – Behind the Scenes

On December 25, 2015, I flew from Bozeman, Montana to San Diego International Airport. I packed light for the trip, ditching all the heavier clothing for shorts and a t-shirt in preparation for the transition from temperatures just above zero to those in the 60s. Packing light left room to bring not only my Sony a5100 and a few extra lenses but also a small Manfrotto tripod and fluid head.

After a few days visiting with the family, my wife Amanda and I drove north towards Ventura, California, where this piece was filmed. The first few scenes of the film were taken in Ventura Harbor while walking along the beach. The surfers and sunset clips were shot on the last day of our stay. The airport sequence was taken before we took off from LAX to return home.

Once I returned home, the data was all ingested by Justin. Time lapses were built using Adobe After Effects CC 2016, and the footage was color corrected and cut together using Adobe Premiere Pro CC 2016. Justin took all the footage and wove it into a great little piece showcasing the Waves of Ventura.

Camera used:

Software Used:

Boxes of Bozeman – Behind the Scenes

If you have driven around Bozeman, Montana any time in the last year and a half, you have seen them… electrical boxes wrapped in the beautiful art of local artists. A few weeks ago Justin and I were driving back from a shoot over by Whitehall, MT. We were brainstorming different local projects we could do, and we stumbled upon the boxes. What could be better? Not only were we producing something about art that Bozemanites and people from around Gallatin Valley get to experience on a daily basis, but we could also showcase local artists and the beautiful work they do. So Boxes of Bozeman began.

We used the Blackmagic Pocket Cinema as our primary camera for this shoot. Its high dynamic range and light weight made it the perfect camera for running around town in the snow. Since this was a side project, we mostly captured boxes between shoots as we saw them on the side of the road. Over the course of a few weeks we got enough footage to put together this piece.

All told, we shot about 150 GB worth of footage in RAW on the Blackmagic. Using a Konova slider, a four-foot Indy jib and a Manfrotto fluid head, we were able to capture movement and make each shot more dynamic. The time lapse was shot at night on a Sony a6000, set at f/11 to keep everything crisp, with a shutter speed of 1.5 sec.

Once we had enough footage, Justin put it all into Adobe Premiere Pro CC 2016 and Adobe After Effects CC 2016 for final editing. He used high-end color correction techniques and a modern editing style to give the video a simple yet clean feel.

Cameras used:

 

Leftovers – Behind the Scenes

Leftovers was a pet project of Justin Kietzman. As 2015 started to wind down, Bonafide Film House found that we had a huge collection of time lapses shot over the past year with a variety of cameras. These time lapses, while all decent, each had a small flaw in some way. They may have been bumped, or something unusual happened in frame. For one reason or another, these pieces were not used in our final projects.

Justin took these and ran with them. Using his creative flair, he was able to piece together a final product that shows the power of a time lapse. The completed film was edited using a variety of production suites, including Adobe Premiere Pro CC 2016, Adobe After Effects CC 2016 and GoPro Studio.

These time lapses were shot on a variety of different cameras, including:

Filming Locations:

  • Bozeman, Montana
  • Ventura, California
  • San Diego, California
  • Sarasota, Florida

Tom Catmull “Addiction” – Behind the Scenes

Hailing from Missoula, Montana, Tom Catmull was scheduled to play at Norris Hot Springs on January 31st, 2016. The Bonafide Film House crew packed up the production van and headed south towards Ennis, MT to capture his performance in the best quality possible. Heading out from our studio near Main Street in Bozeman, Montana, we made good time to Norris. Arriving an hour and a half early put us there at magic hour, allowing us to capture the stunning sunset as it slid behind the Tobacco Root range.

After meeting with Mr. Catmull, the crew quickly unloaded and assembled the equipment. Due to the unusual shooting conditions at Norris Hot Springs, we chose a three-camera setup. Our primary camera, the Blackmagic Cinema, was set up for a wide angle of the stage, which is built within a geodesic dome. Our two B cameras were set up within the dome. The GH3 got a 3/4 shot of Tom playing in his chair. The Sony a6000, rigged up with the field monitor on a shoulder rig, got close-ups and more dynamic angles.

Once the performance started, Anthony used the shoulder rig within the dome stage, while Justin manned the Blackmagic and the Lumix and made sure the audio was on point. At the end of the night, with over 200 GB of raw data and some amazing shots using only the stage’s preset lighting, we headed back to the studio.

Justin Kietzman handled all the post-processing. He finished all data ingestion the night of the performance and then edited all the footage a couple of days later using Adobe Premiere Pro CC 2016 and Adobe After Effects CC 2016. The entire project was completed within 4 business days.

 

Cameras used on this shoot:

Software used for post production: