Corrupt file causes the program to exit prematurely despite a try/except block
Created by: RoscoeTheDog
Overview
When decoding inside a try/except block, PyAV encounters a corrupt file and the entire application exits prematurely; the except clause never runs. Normally, the exception would be caught and the program would continue as usual.
Expected behavior
An exception is raised for the current file, caught by the except clause, and the program continues as usual with the next file.
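The expected behavior corresponds to a per-file loop like the minimal sketch below. `decode_all` is a hypothetical helper, and `decode` stands in for a function such as `pyav_decode()`. Note that an except clause can only handle a Python exception: if the interpreter process itself is terminated by a native crash, no handler in that process runs.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def decode_all(
    paths: Iterable[str], decode: Callable[[str], Dict]
) -> Tuple[Dict[str, Dict], List[Tuple[str, str]]]:
    """Decode every path, recording per-file failures instead of stopping (hypothetical helper)."""
    results: Dict[str, Dict] = {}
    errors: List[Tuple[str, str]] = []
    for path in paths:
        try:
            results[path] = decode(path)
        except Exception as exc:  # only Python-level exceptions ever reach this handler
            errors.append((path, str(exc)))
    return results, errors
```

A file that raises ends up in `errors` and the loop moves on to the next path, which is the "continue operating as usual" behavior described above.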
Actual behavior
Output printed before the application exits:
Format flac detected only with low score of 1, misdetection possible!
Could not find codec parameters for stream 0 (Audio: flac, 0 channels): unspecified sample format
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Reproduction
Pass the path of the unzipped attachment to pyav_decode() and run the code below. Add a print statement after the function call: it never executes, because the whole program exits inside the call.
```python
import av


def pyav_decode(path: str) -> dict:
    """
    :param path: :str: "path to file"
    :return: :dict: "dict of metadata from file"
    """
    # declare different types of streams (defaults to False)
    file_meta = {
        'a_stream': False,
        'v_stream': False,
        'i_stream': False,
        'succeeded': False,
    }
    try:
        # TODO: investigate why open() can take a while to return in some cases (seems mostly video).
        container = av.open(path)
        # Note:
        # Values that == 0 mean not known or a false positive.
        # Decode audio channels first for efficiency (it has the highest probability to fail).
        for frame in container.decode(audio=0):
            channels = frame.layout.channels  # tuple of the frame's audio channels
            counter = len(channels)
            if counter != 0:
                file_meta['channels'] = counter
                file_meta['channel_layout'] = frame.layout.name
            break  # Do not decode all frames for audio channel info
        # decode file's bit-rate
        if int(container.bit_rate / 1000) != 0:
            file_meta['a_bit_rate'] = container.bit_rate / 1000
        # decode file's streams
        for s in container.streams:
            # Certain properties from images (such as stream type) can be mistaken as video.
            # Check the decoder's format name to determine if == image or video.
            # IMAGE STREAMS
            file_meta['i_stream'] = is_image_stream(s.codec_context.format.name)
            if file_meta['i_stream'] is True:  # skip the current stream if it is an image type
                continue
            # VIDEO STREAMS
            elif s.type == 'video':
                file_meta['v_stream'] = True
                # - PyAV does not always return v_duration reliably, but it is the fastest method.
                # - FFprobe is an alternative whenever v_duration is not returned.
                file_meta['v_duration'] = s.metadata.get('DURATION', '')
                if file_meta['v_duration'] == '':
                    # ffprobe(), parse_ffprobe() and validate_keys() are the author's own helpers (not shown).
                    stdout, stderr = ffprobe(path)
                    file_meta = parse_ffprobe(stdout, stderr)
                    file_meta = validate_keys(file_meta)
                    break
                # decode video container's resolution
                if s.width != 0:
                    file_meta['v_width'] = s.width
                if s.height != 0:
                    file_meta['v_height'] = s.height
                # decode actual encoded resolution of video
                if s.coded_width != 0:
                    file_meta['v_buffer_width'] = s.coded_width
                if s.coded_height != 0:
                    file_meta['v_buffer_height'] = s.coded_height
                file_meta['nb_frames'] = s.frames
                if s.frames == 0:
                    file_meta['nb_frames'] = s.metadata.get('NUMBER_OF_FRAMES', '')
                # decode frame-rate (returned in fraction format)
                if int(s.rate) != 0:
                    file_meta['v_frame_rate'] = float(s.rate)
                # decode video format
                if s.pix_fmt:
                    file_meta['v_pix_fmt'] = s.pix_fmt
            # AUDIO STREAMS
            elif s.type == 'audio':
                file_meta['a_stream'] = True
                # decode sample format
                if s.format.name:
                    file_meta['a_sample_fmt'] = s.format.name
                # decode sample rate
                if int(s.sample_rate) != 0:
                    file_meta['a_sample_rate'] = s.sample_rate
                # decode bit depth (note: 24 bit will show as 32; check sample_fmt for pcm_s24le instead)
                if int(s.format.bits) != 0:
                    file_meta['a_bit_depth'] = s.format.bits
        # check dict keys for missing entries or 0s to minimize decoding false positives into the database
        file_meta = validate_keys(file_meta)
        file_meta['succeeded'] = True
    except Exception as e:
        file_meta['succeeded'] = False
        print(e)
    return file_meta


def is_image_stream(stream_fmt: str) -> bool:
    """
    Normally we check mimetypes to decide how the file will be decoded before the function call.
    This function adds a precautionary step for cases where a video/image is being decoded and frames are misinterpreted.
    :param stream_fmt: string value of the av decode stream format
    :return: True if the format is an image type
    """
    if 'pipe' in stream_fmt:  # 'pipe' formats are typically image-type decoders
        return True
    return stream_fmt in ['image2', 'tty', 'ico', 'gif']  # some other image decoders
```
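Since the attached file is 0 bytes, one pragmatic workaround (not a fix for the underlying crash) is to screen out empty or missing files before they ever reach `av.open()`. `guarded_decode` is a hypothetical wrapper, and it assumes a zero-byte input is what triggers the exit; that holds for the attached sample but may not cover other kinds of corrupt files.

```python
import os
from typing import Callable, Dict


def guarded_decode(path: str, decode: Callable[[str], Dict]) -> Dict:
    """Skip missing or zero-byte files before calling the decoder (hypothetical wrapper)."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        # Same default keys as pyav_decode(), marked as failed; the decoder is never called.
        return {'a_stream': False, 'v_stream': False, 'i_stream': False, 'succeeded': False}
    return decode(path)
```

Usage would be `guarded_decode(path, pyav_decode)` in place of a direct `pyav_decode(path)` call.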
Versions
OS: Windows 10 Pro x64
Python: 3.7.4
PyAV: 6.2.0 (py37heb183d3_1, conda-forge)
FFmpeg: 4.1.3, built with gcc 8.3.1 (GCC) 20190414
Additional context
The code above only extracts metadata; it performs no encoding.
In every other case, a try/except catches these problems and keeps other operations running smoothly. This file somehow makes the application exit prematurely and bypasses the handler entirely.
The problem file is attached below. It is clearly corrupt, with a file size of 0 bytes. Detecting that is not the issue; the problem is handling it in the code so that one bad file does not bring down the whole program.
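If the exit is a native crash inside FFmpeg/PyAV rather than a Python exception (the bypassed except clause suggests this, but it is not confirmed here), one way to contain it is to run the risky call in a separate interpreter and inspect its exit code. The sketch below uses only the standard library's `subprocess.run`; `run_isolated` and `probe_in_child` are hypothetical helpers, and the child snippet assumes PyAV is importable in the same Python environment.

```python
import subprocess
import sys
from typing import Dict


def run_isolated(code: str, timeout: float = 60.0) -> Dict[str, object]:
    """Run a Python snippet in a child interpreter so a hard crash cannot kill the caller."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"succeeded": False, "returncode": None, "stdout": "", "stderr": "timed out"}
    return {
        "succeeded": proc.returncode == 0,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def probe_in_child(path: str) -> Dict[str, object]:
    """Open `path` with PyAV in a child process and print the stream types it finds (hypothetical)."""
    snippet = (
        "import av\n"
        f"container = av.open({path!r})\n"
        "print([s.type for s in container.streams])\n"
    )
    return run_isolated(snippet)
```

A caller can check `probe_in_child(path)["succeeded"]` and only run the full in-process decode when the child exited cleanly. The cost is one extra interpreter start per file, which may matter for large batches.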