Determining which format to use is a matter of informed choices. The questions to ask in order to make these choices effectively are hierarchical; in other words, the answers to primary questions will help you determine the answers to secondary questions. First, and of utmost importance, what is the final distribution format of the project? While you may choose, based on budget, postproduction considerations, and availability, to work with a format that is of a much higher resolution and bandwidth than the anticipated delivery and/or distribution format, the final format will dictate the minimum requirements of the chosen acquisition medium and format.

When money and time are not at a premium, and a filmmaker wishes to “future-proof” their visual assets for anticipated reuse and reformatting of this material, as well as to invest in a time-tested archival format, nothing beats film; the higher the resolution, the better. For the sake of discussion for independent filmmakers, we won’t go into the very large film formats such as IMAX, VistaVision, and 65mm; for our purposes it’s enough to be aware that there are various aspect ratios used in 35mm and 16mm that determine the shape of the image that is projected on the screen, whether that screen is a video screen or a theater screen, which may be illuminated by a film projector or a video projector.

The aesthetic attributes of film can be argued either positively or negatively. Some filmmakers contend that the look of film is more “cinematic”, that the 24 frames per second frame rate, or temporal resolution, is more removed from reality and thus better for storytelling. Others will state the case that film stock is exponentially more expensive than video recording media, and that the processing and transfer of film to video incur costs of time and money that don’t warrant the expense and bother. The increasing quality of certain video formats makes the aesthetic differences appear to be less and less of a consideration, although ultimately, film is still the highest resolution acquisition format available. While super 16mm is probably less resolved in lines per inch than the currently available 1920 x 1080 high definition video formats, the available resolution of color space in most super 16mm film stocks is much greater. Also, keep in mind that while most filmmakers think “24fps” when they think of film, in reality, one can shoot film at any frame rate; in fact, this has been, until recently, a distinct advantage of film over video. Newer high speed video systems, especially with the advent of solid state recording formats, are beginning to give video acquisition the advantage over film, in both time and money, when dealing with variable frame rates, even when the frame rates need to be manipulated and changed over time, sometimes referred to as “ramping”.

All of the available film formats and aspect ratios, from 2.35:1 and 1.85:1 to 1.33:1, and any others in between, can be accommodated via existing video formats and technologies by the use of standard definition 1.33:1 (4x3) video or high definition 1.78:1 (16x9) video formats, by cropping, pan-and-scanning (less desirable), or letter-boxing (more desirable, leaving the top and bottom black). Even if electronic postproduction methods are employed, a filmmaker can choose to return to a film distribution release print by either “conforming” the original negative via an EDL (edit decision list) to keycode conversion in order to assemble a conformed negative for prints, or by taking the finished video version of the project and “burning” it back to a film negative or print via a film recorder system. These aspects of film formats make it the most flexible of mediums, but at the same time can make film the most complicated, time consuming, and least affordable. When time and expense are lesser issues, though, nothing can beat the reliability and quality of film at this juncture, although the fast-paced evolution of electronic media is making this statement a moving target!
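
The arithmetic behind letter-boxing and cropping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, square pixels are assumed (reasonable for a 1920x1080 raster, not for standard definition), and the term “pillarbox” for black bands at the sides is borrowed from common usage rather than from the discussion above.

```python
def fit_into_raster(source_ratio, raster_width, raster_height):
    """Fit the whole source image inside a video raster without cropping.

    Returns (mode, picture_width, picture_height, bar), where 'bar' is the
    size in pixels of EACH black band. Assumes square pixels.
    """
    raster_ratio = raster_width / raster_height
    if source_ratio >= raster_ratio:
        # Source is wider than the raster: black bands at top and bottom.
        picture_width = raster_width
        picture_height = round(raster_width / source_ratio)
        bar = (raster_height - picture_height) // 2
        return "letterbox", picture_width, picture_height, bar
    # Source is narrower than the raster: black bands at left and right.
    picture_height = raster_height
    picture_width = round(raster_height * source_ratio)
    bar = (raster_width - picture_width) // 2
    return "pillarbox", picture_width, picture_height, bar


def fraction_kept_by_cropping(source_ratio, raster_ratio):
    """Fraction of the source image that survives when the raster is
    filled completely by cropping (as in pan-and-scan)."""
    return min(source_ratio, raster_ratio) / max(source_ratio, raster_ratio)


if __name__ == "__main__":
    print(fit_into_raster(1.85, 1920, 1080))   # 1.85:1 film in a 16x9 frame
    print(fit_into_raster(1.33, 1920, 1080))   # 4x3 material in a 16x9 frame
    print(fraction_kept_by_cropping(1.85, 16 / 9))
```

Under these assumptions a 1.85:1 image letter-boxed into 1920x1080 loses only about 21 lines at the top and bottom, which is why the two ratios are often treated as nearly interchangeable, while cropping the same image to fill the frame discards roughly 4 percent of its width.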

For the sake of discussion, we’ll use the generic term “video” to refer to all resolutions, aspect ratios, and recording formats of electronic acquisition, electronic production, and electronic postproduction. First, let’s look at the differences in how an electronic image is acquired and processed relative to how an image is captured on film.

When light photons or waves (depending on the physics theory applied!) strike a photochemically sensitized film surface, which uses multiple layers of silver halide, chemical changes take place on a sub-molecular level within each silver halide clump, grouping, or granule. This happens linearly, on a frame-by-frame basis, as the film is being exposed or recorded. When the film is processed by being immersed in a series of chemical baths in order to either develop or eliminate granules that have either been affected or not affected by light, the resulting grain patterns, when light is projected through each frame on a frame-by-frame temporal, linear basis, come to represent visual information to the viewer. The audience sees motion because of the phenomenon of “persistence of vision”, whereby one experiences the illusion of movement because the eye and brain no longer perceive the interval between one frame and the next. The viewer sees light and dark because intervening grains, changing on a frame-by-frame basis, interfere with or allow the passage of light through the frame to the screen, and therefore exhibit variation of luminance in particular discrete areas of the frame. Color is represented by the exclusion, omission, or allowed transmission of specific wavelengths of light, as projected through each frame sequentially. This has been achieved mechanically and photochemically, using nineteenth-century “sewing machine” technology that has been around for well over a century.

Now let’s look at how an image is acquired on video. The theoretical methods of registering an image on a line-by-line basis, thereby “scanning” an image in order to break down the visual components into discrete elements for manipulation, encoding, transmission, and reception, have been around almost as long as the telegraph, a mid-nineteenth century invention. A precursor to the modern “fax”, or telephonic facsimile device, scanned images linearly on a rotating drum and transmitted the resulting linear data by telegraph. It was used to relay photos and engravings between newspapers, and because the images were represented by lines, they were ideal for the rotogravure printing process that was used during this period for the impression of visual images in the press.

With the advent of radio in the early twentieth century, the idea of wireless transmission of data became a reality. It didn’t take long for the notion of the transmission of visual information, especially moving pictures, to become an area of keen interest among key scientists and media entrepreneurs worldwide. At first, only very low-resolution images could be transmitted via “closed circuit” systems employing wire methods to carry the limited signal between the point of origin and the point of exhibition. The resolution of a video image was limited then, as it still is today, by the capacity of the carrier signal, which is referred to as “bandwidth”. As the use of radio frequency bandwidth became a more sophisticated field, and regulation of this bandwidth became necessary, standards of transmission and bandwidth allocation came into play. The bandwidth that analogue broadcast video is allocated today is based on standards set when video was a single black and white signal of limited resolution. The advent of color television posed a real challenge to broadcasters, eager to bring this improvement to the public in order to directly compete with what film cinema had to offer. In the United States, the FCC (Federal Communications Commission), which sets standards for broadcast communications, along with the NTSC (National Television System Committee), had set the frame rate at 30 frames per second, each frame divided into two separate scan “fields”, which became a 60 fields per second rate of display. This was chosen because the alternating electrical current in North America was 60 cycles per second, and thus the electrical current could, theoretically, be used as a regulating rate for the video signal.

Acquisition of the video image was effected by projecting a continuous stream of electrons from an electron “gun” onto the phosphor-coated back of a closed vacuum tube. The scan pattern was controlled by passing the electron beam through a yoke of electromagnets, each individually controlled by applied currents precisely timed to move the electron beam linearly left to right, and progressively up and down, in order to form a resolved scan pattern on a field-by-field, frame-by-frame basis. This electromagnetic yoke was called a magnetic lens, as it effectively acted as a “lens” in order to focus the electron beam at certain points at certain times. The arrival of the electron beam at a specific point at a specific time was a theoretical unit called a “pixel”, although there was no actual interruption of this linear beam as it horizontally traversed the phosphor surface of the video “tube”. The front side of this phosphorescent surface was the plane of focus of the lens of the video camera. When light photons (or waves) struck this surface and caused changes in the phosphorescent photochemistry, this precipitated changes in the electron beam being projected linearly as a scan pattern on the back of the tube, and these subsequent changes, reflected back, were recorded as variations in current or signal strength which could then be electronically interpreted as variations in luminance and motion when reconstituted as a projected electron beam in a video display device (television). Televisions used a CRT (cathode ray tube) to display the video signal. The scanning electron beam, controlled by a magnetic lens employing the prerecorded luminance values represented by variations in the electron beam signal strength, along with positional information of the beam contained in the video signal as well, was projected as the recorded scan pattern onto the phosphorescent back of the CRT.
When the phosphor coating was subjected to the electron beam, the phosphor was excited to emit light, which could then be viewed as luminous scanned lines on the front of the tube. As each field and frame was subsequently displayed, sequentially, the phenomenon of persistence of vision once again came into play as moving images were perceived by the viewer. In this way, rudimentarily, a linearly scanned electronic representation of a visual, moving image could be acquired, transmitted, and displayed.
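
The division of each frame into two fields, scanned one after the other, can be sketched as follows. This is a minimal illustration, not a model of any real camera or display: the function names are hypothetical, and which field is scanned first is an assumption made only for the example.

```python
def split_into_fields(frame_lines):
    """Split the scan lines of one frame into two interlaced fields.

    The first field takes every other line starting from the top line,
    the second field takes the lines in between.
    """
    first_field = frame_lines[0::2]
    second_field = frame_lines[1::2]
    return first_field, second_field


def weave(first_field, second_field):
    """Rebuild a full frame by interleaving the lines of two fields."""
    frame = []
    for line_a, line_b in zip(first_field, second_field):
        frame.extend([line_a, line_b])
    # With an odd number of lines the first field holds one extra line.
    if len(first_field) > len(second_field):
        frame.append(first_field[-1])
    return frame


if __name__ == "__main__":
    lines = list(range(1, 10))            # a toy "frame" of 9 scan lines
    first, second = split_into_fields(lines)
    print(first, second)
    print(weave(first, second) == lines)

    frames_per_second = 30
    print(frames_per_second * 2, "fields per second")
```

Because the two fields are captured at different moments, a frame rebuilt this way from moving subject matter contains two slightly different instants in time, which is the root of the field interpolation issues that progressive scanning avoids.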

This brings us back to the issue of format. When discussing format, there are a number of aspects to be considered. First is the acquisition format, which may be much higher resolution, both linearly and in regards to bandwidth and total information, on a frame-by-frame basis, than the subsequent recording format or carrier signal. This is referred to as “oversampling”.

Keep in mind that originally, there really was no way to record the video signal, other than to photograph the frames to film, a process called “kinescope”. All of the early black and white video you see from historical television shows and broadcasts are kinescopes, where the film camera photographed the surface of a CRT in order to create a film recording of a live video signal. All broadcast television was performed live, even the television commercials. Later, a system whereby film could be projected into the lens of a television camera, live, was invented. This was called a “film chain”, and was the precursor of the modern film-to-tape telecine machines of today.

NTSC standard definition is comprised of 486 vertical lines of resolution out of 525 lines total, the other 39 lines representing non-visual information such as audio, timing and synchronization, etc. This gives us a picture with an aspect ratio of 1.33:1, or 4x3. Up until about the beginning of the 21st century, this was the standard broadcast format, although, as mentioned earlier, many subsequent oversampled acquisition formats have been developed and employed to provide the source material for this SD signal. Linear, analogue, tape-based formats such as 8mm, VHS, VHS-C, 3/4 inch, 2 inch, and 1 inch have been devised, implemented, and employed to record and transmit the SD signal. These have been, for the most part, analogue composite signals, meaning that the discrete color carrier signals of red, green, and blue (RGB) have been combined into one signal via a method called multiplexing, whereby the three signals are carried simultaneously using “interleaving”, where the signal is carrying pulses of changing signal strength to represent each color signal simultaneously for subsequent interpolation at the receiving end, a form of analogue encoding and decoding. This “interleaving” process has its problems, causing visual anomalies in the signal called “artifacts”, particularly in the longer-wave end of the spectrum, represented by the red signal, which tends to bleed and exceed the defined boundaries of the video image.
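
The line counts above lend themselves to a quick worked calculation. The sketch below uses the 525 and 486 line figures and the nominal 30 frames per second given in the text; the function and key names are hypothetical. Note as a caveat that color NTSC actually runs slightly slower, at 30000/1001 (about 29.97) frames per second, which the sketch accepts as an optional parameter.

```python
def line_budget(total_lines=525, active_lines=486, frames_per_second=30.0):
    """Basic scan-line arithmetic for an interlaced SD signal."""
    lines_per_second = total_lines * frames_per_second
    return {
        # Lines that carry no picture information.
        "non_picture_lines": total_lines - active_lines,
        # Share of each frame's lines that is visible picture.
        "active_fraction": active_lines / total_lines,
        # How many lines the beam must trace every second.
        "lines_per_second": lines_per_second,
        # Time available to scan one line, in microseconds.
        "line_duration_us": 1_000_000 / lines_per_second,
        # Two fields per frame.
        "fields_per_second": 2 * frames_per_second,
    }


if __name__ == "__main__":
    for name, value in line_budget().items():
        print(name, value)
    # The same figures at the color NTSC rate of 30000/1001 frames per second.
    print(line_budget(frames_per_second=30000 / 1001)["lines_per_second"])
```

At the nominal rate the beam has to trace 15,750 lines every second, or about 63.5 microseconds per line, which gives a sense of how tightly the timing and synchronization portion of the signal has to be controlled.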

In order to avoid these artifact problems, non-broadcast analogue acquisition formats were devised whereby the RGB signal could be recorded discretely. This was called component video, and systems such as Hi-8mm, Super VHS, BetaCam, and BetaCam SP were developed to take advantage of this capability. But there were problems in general with analogue video, whether it was composite or component. Even with the advent of a device called a time base corrector (TBC), which stripped away the non-visual timing information of the analogue signal and regenerated it in the reproduced video signal, thereby significantly reducing the introduction of visual anomalies and artifacts in the reproduced or copied video signal, there were very definite limits to how many generations one could copy the video signal before significant deterioration of the resulting image became apparent, usually two or three generations at the most.

With the invention of the videotape recorder by the Ampex company, initially the 2-inch format, it was possible to record composite video linearly, but editing video was nearly impossible. Videotape could be cut with a razorblade and an edit block, like film, but it was nearly impossible to make frame-accurate edits. When switching devices and mixing boards for the video signal became available, it became possible to edit video by re-recording video on a linear basis, first in real time, then subsequently on a cut-by-cut basis sequentially. This is referred to as linear editing because it had to be done in sequential order, and if changes to an existing recorded edit needed to be effected, this had to be done with great difficulty as an “insert” edit, something that was to be avoided if at all possible. Therefore, editing by its very nature involved reproduction of the video signal, and the generational loss and deterioration of the signal that this entailed.

This is where the advent of digital video came into play, because each picture element of the digital video signal is represented by numerical data in discrete packets representing fields and/or frames, which can be reproduced almost infinitely without any generational loss. Initially-introduced digital video formats such as D-1 (SD digital component) and D-2 (SD digital composite) were higher resolution than SD NTSC broadcast standards, using the 720x486 resolution and the CCIR 601 digital standard of 4:2:2 color sampling. These standards were not field acquisition formats; the machines and tapes were much too large and cumbersome, and not robust enough for these purposes. They were devised and implemented to solve the problems of generational loss encountered in linear editing situations, and they solved these problems very well. The resulting video was downconverted to the SD NTSC broadcast signal for distribution.
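
To give a sense of the amount of data involved, the sketch below estimates the uncompressed bit rate of the active picture for a 720x486 image with 4:2:2 sampling. It is a rough illustration under stated assumptions: 8 bits per sample and a nominal 30 frames per second are assumed, the function names are hypothetical, and blanking, audio, and error correction are ignored, so the result is lower than the rate actually carried on tape or cable.

```python
def samples_per_pixel(j, a, b):
    """Average number of samples per pixel for a J:a:b sampling scheme.

    The notation describes a block J pixels wide and 2 rows high:
    every pixel has a luma sample, and each of the two color-difference
    channels has 'a' samples in the first row and 'b' in the second.
    """
    luma_samples = 2 * j
    chroma_samples = 2 * (a + b)
    return (luma_samples + chroma_samples) / (2 * j)


def active_picture_bitrate(width, height, frames_per_second,
                           bits_per_sample=8, scheme=(4, 2, 2)):
    """Uncompressed bits per second for the visible picture only."""
    return (width * height * samples_per_pixel(*scheme)
            * bits_per_sample * frames_per_second)


if __name__ == "__main__":
    rate = active_picture_bitrate(720, 486, 30)
    print(f"{rate / 1_000_000:.1f} Mbit/s")              # 4:2:2 at 8 bits
    full = active_picture_bitrate(720, 486, 30, scheme=(4, 4, 4))
    print(f"{full / 1_000_000:.1f} Mbit/s")              # no color subsampling
```

Under these assumptions 4:2:2 needs two samples per pixel instead of three, so it carries the picture in two thirds of the data that full 4:4:4 sampling would require, at roughly 168 megabits per second for the active picture.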

With the need for robust digital field acquisition formats came Sony’s Digital Betacam, a digital version of the already ubiquitous ENG (electronic news gathering) format Betacam SP, which had replaced 3/4 inch video as the field camcorder format of choice for the news industry, supplanting 16mm film, which had been used for news gathering in the preceding decades. While Betacam SP has become the dominant industry standard in ENG SD acquisition worldwide, Digital Betacam, despite the obvious technological advantages, has proven to be too expensive, and has made limited inroads into the SD ENG market, although it has handily replaced D-1 as a digital component SD postproduction format, as it is only compressed at a 2:1 ratio (more about compression later…).

DVCPro soon followed, another similar component, digital, Standard Definition field format developed by Panasonic. These formats relied on field tested and proven formats that were linear in nature, and that used moving magnetic tape passing over rotating magnetic record heads to sample the video signal digitally. The exploration of different ways to record the video signal, and different resolutions and aspect ratios to apply to the video image was about to create an explosion of new formats and video technologies that are profoundly affecting what we call video, film, and cinema.

The initial high definition video systems introduced by Sony, Panasonic, and other manufacturers were large and cumbersome cameras and videotape decks, and their use was limited to special projects. A “Grand Alliance” of manufacturers was established in the early 1990s to decide upon a standardized high definition broadcast signal and format resolution, with mixed results. Sony embraced an interlaced 1920x1080, 1.78:1 (16x9) format, which was recorded to a tape format called HDCam. This was followed by the introduction of the very first HD video format capable of 24 frames per second video recording in a “progressive scan” mode, or 24p, where all the lines of each frame were scanned progressively, with no fields or field interpolation. This was highly desirable for the feature film industry, and a kind of “holy grail” as an electronic way to replace film acquisition for independent and low budget feature films, although the development of the technology was driven by some of the heavy hitters in the film world such as George Lucas and James Cameron, as well as up and coming directors like Robert Rodriguez and indie and experimental filmmakers like Lars von Trier, who had already been experimenting with standard def digital 24p, and upconversion using 24p PAL to transfer to film distribution and release. Panasonic followed suit with a high definition version of their already successful DVCPro format, creating a camcorder system called Varicam that was capable of variable frame rates from 1 to 60 frames per second, as well as 24p, supported by their proprietary DVCPro HD tape format. The pixel resolution of this camera system’s HD chipset is 1280x720, significantly lower than HDCam’s 1920x1080, but how the image data is compressed and allocated makes up for this in many ways, giving the Varicam system a more “cinematic”, film-like look, according to many filmmakers who have enthusiastically embraced the Varicam.
Panasonic’s 1/2 inch D-5 tape systems have already been embraced by postproduction facilities around the world as the “Swiss Army knife” of videotape decks, as they are capable of recording and playing back both HD and SD, offering great economy and flexibility.
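
The effect of a variable frame rate such as the Varicam’s 1 to 60 frames per second range is simple arithmetic: the apparent speed of motion is the playback rate divided by the capture rate. The sketch below is illustrative only; the function names are hypothetical, a 24 frames per second playback rate is assumed, and the linear ramp is just one possible way of changing the rate over time.

```python
def playback_speed(capture_fps, playback_fps=24.0):
    """Apparent speed of motion on playback.

    A value below 1.0 is slow motion, above 1.0 is fast motion.
    """
    return playback_fps / capture_fps


def linear_ramp(start_fps, end_fps, steps):
    """Capture rates for a simple linear 'ramp' between two frame rates."""
    if steps < 2:
        return [float(start_fps)]
    span = end_fps - start_fps
    return [start_fps + span * i / (steps - 1) for i in range(steps)]


if __name__ == "__main__":
    print(playback_speed(60))     # shot at 60, played at 24
    print(playback_speed(12))     # shot at 12, played at 24
    for fps in linear_ramp(24, 60, 4):
        print(fps, playback_speed(fps))
```

Shooting at 60 frames per second and playing back at 24 therefore shows motion at 0.4 of its real speed, while shooting at 12 doubles it.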

Video equipment manufacturers such as JVC and Ikegami had already released standard definition professional ENG camcorders using optical disk DVD recording systems for the broadcasting industry, with limited acceptance, as news organizations weren’t comfortable, in general, with the field reliability of this new technology and recording medium. What followed was a move on Sony’s part to introduce XDCam, a highly compressed high definition (1920x1080) optical disk recording format and camcorder system based on the MPEG-2 algorithm, to compete with other HD ENG acquisition formats and systems. The advantages of this medium are lightweight equipment, an inexpensive and reliable optical recording medium, and the ability in audiovisual and IT networks to treat digital video material as data in ways that are becoming more and more common in the workplace.

On the consumer, “pro-sumer”, and low budget and independent filmmaking front, the rise of low-cost digital video formats such as miniDV and DVCam, Sony’s proprietary DV, based on small, inexpensive digital linear videotapes, has progressed rapidly. The first wave was lightweight, fixed lens camcorders such as the three chip Sony VX-1000, using the smaller, less expensive 1/3 inch chipsets. These formats are highly compressed, and while color rendition and overall image quality are surprisingly good, there are many compression artifacts present in the miniDV and DVCam image that can be problematic. The preponderance of visual data is contained in the mid-tones, or gamma part of the characteristic curve of these imaging devices, with very little detail contained in the highlights above 80 IRE. Details in the shadows and black areas fall off precipitously as well, and DV is well known for its “crushed blacks” look. Overall, though, it’s an amazing revelation the first time one shoots DV after using comparable lower end consumer and prosumer formats, to see such a good-looking image come out of such small cameras using such small, inexpensive tapes.

What has followed now seems inevitable, that a high definition format using the same DV tape technology would emerge, and it has. The result is HDV, an even more highly compressed DV format that has been adopted by numerous manufacturers, including Canon, JVC, Panasonic, and Sony.

Using 1920x1080 or 1280x720 chipsets, depending on the manufacturer, the resulting HD video generated with these camera systems is stunning, especially considering the relative economy of the camera systems, tape stocks, and postproduction options that the HDV systems afford. But the use of interframe, or temporal, compression in the form of “groups of pictures”, or GOPs, has been problematic in the postproduction process. Frame accurate editing has been unreliable until fairly recently. The two dominant nonlinear editing systems, Avid and Apple’s Final Cut Pro, have been slow to address the problems posed by frames grouped together in information packets, or “wrappers”, in the compression scheme, but good progress has been reported lately, and these problems appear to be on the road to history soon.
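
Why grouped frames complicate frame accurate editing can be seen in a deliberately simplified model. The sketch below assumes that each group begins with one complete, self-contained frame and that every later frame in the group is stored only as changes from the frame before it; real MPEG-2 groups also contain frames that reference later frames, which this model ignores. The function name and the group length of 15 are illustrative assumptions.

```python
def frames_needed_to_decode(frame_index, gop_length):
    """Frames that must be decoded before 'frame_index' can be shown.

    Simplified model: the first frame of every group is complete, and each
    following frame depends on the one immediately before it.
    """
    group_start = (frame_index // gop_length) * gop_length
    return list(range(group_start, frame_index + 1))


if __name__ == "__main__":
    # Cutting at frame 29 with 15-frame groups: the editor has to rebuild
    # the whole second group up to that frame.
    print(frames_needed_to_decode(29, 15))
    # With a group length of 1 every frame is self-contained, so a cut
    # can land anywhere without touching its neighbors.
    print(frames_needed_to_decode(29, 1))
```

In this model a cut that falls in the middle of a group forces the editing system to decode, and on output re-encode, several neighboring frames, whereas a format in which every frame stands alone can be cut at any frame directly.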

The latest format choice, introduced by Panasonic via the P2 solid-state recording system, is direct compressed recording to solid-state chips. In the P2 system, large capacity semiconductor chips are integrated onto PCMCIA cards for quick insertion and removal into built-in slots on their camera systems, with capacities starting at 2 gigabytes per card, and climbing upward of 16 gigabytes to date, with 32 gigabytes coming soon. Using Panasonic’s DVCPro HD signal, which uses intraframe, spatial compression, the signal can be easily and accurately edited without consideration for groups of frames, just like the tape-based Varicam, because all of the full video data for each frame is contained in a discrete packet, on a frame-by-frame basis. The first camcorder system introduced using the P2 system was the HVX-200, a fixed lens three-chip HD camera with a two-slot PCMCIA card capacity. Larger, more expensive versions are now coming to market, using the larger ENG camera design, with interchangeable lenses and four and five card slot capacities. These cameras offer variable frame rates, instantaneous downloading of clip file data and metadata for field postproduction, and the workflow is being celebrated for its ease of use and flexibility. The primary downside to the P2 system is the cost of the PCMCIA cards, which can be close to 1000 dollars or more per card, depending on the gigabyte capacity of each card. Workflow schemes where the filmmaker can hotswap and download cards to laptops and hard drives while simultaneously shooting have been used extensively in the field with good results. This kind of scenario, without safeguards, redundancy plans, and stringent procedural rules in place, can prove disastrous, and leaves many more experienced production personnel uncomfortable.
Use of large capacity hard drive recording systems such as the Focus Enhancements Firestore system, with a 100-gigabyte capacity, has proven to be a reliable alternative to using P2 cards, but negates some of the convenient aspects of the swap-able solid-state cards.
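
How much footage these capacities hold is a matter of dividing storage by bit rate. The sketch below is an estimate under stated assumptions: a video rate of 100 megabits per second is assumed for DVCPro HD, a gigabyte is taken as one billion bytes, and audio and file overhead are left as an optional allowance. The function name is hypothetical, and real-world figures will vary with frame rate and recording mode.

```python
def recording_minutes(capacity_gigabytes, video_mbit_per_s=100.0,
                      overhead_fraction=0.0):
    """Approximate recording time in minutes for a given storage capacity.

    'overhead_fraction' is an allowance for audio and file overhead,
    expressed as a fraction of the video bit rate.
    """
    capacity_bits = capacity_gigabytes * 1_000_000_000 * 8
    bits_per_second = video_mbit_per_s * 1_000_000 * (1 + overhead_fraction)
    return capacity_bits / bits_per_second / 60


if __name__ == "__main__":
    for gigabytes in (2, 16, 32, 100):
        print(gigabytes, "GB:", round(recording_minutes(gigabytes), 1), "min")
```

On these assumptions a 16 gigabyte card holds a little over 21 minutes and a 100 gigabyte drive a little over two hours, which makes the appeal of swapping and downloading cards during a shoot easy to see.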

On the horizon for formats is a new generation of cameras and video recording systems based on the MPEG-4/AVC compression algorithm, which uses much lower bit transfer rates and less data space. They can record onto standard SD data cards, such as are currently used in digital still cameras, which are much cheaper and more readily available than P2. It remains to be seen what the editing issues are with MPEG-4/AVC relative to frame accuracy, and what compression artifacts one will encounter with this new format, but the future is bright for all things smaller, faster, and cheaper in the new frontiers of digital video, as always.