pts trouble with audio from dshow
Created by: fvollmer
I'm recording audio from dshow, and there seems to be a problem with the pts. ffmpeg doesn't complain about these sources, but I also think this might not be a bug in PyAV. I tried several input devices, and every one of them produces an error like:
```
Encoder did not produce proper pts, making some up.
Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/project/audio.py", line 43, in <module>
    output_container.mux(output_audio_stream.encode(frame))
  File "av\stream.pyx", line 155, in av.stream.Stream.encode
  File "av\codec\context.pyx", line 466, in av.codec.context.CodecContext.encode
  File "av\audio\codeccontext.pyx", line 40, in av.audio.codeccontext.AudioCodecContext._prepare_frames_for_encode
  File "av\audio\resampler.pyx", line 122, in av.audio.resampler.AudioResampler.resample
ValueError: Input frame pts 980000 != expected 1000000; fix or set to None.
```
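The error message suggests that the resampler remembers where the next frame should start (previous pts plus the previous frame's duration) and rejects any frame whose pts differs. The class below is a hypothetical illustration of that check, not PyAV's actual code; `PtsContinuityCheck` and `feed` are made-up names, and the time base, sample rate and frame size are taken from the example output further down.

```python
from fractions import Fraction


class PtsContinuityCheck:
    """Hypothetical illustration of the check implied by the error message.

    Not PyAV's actual implementation: it only tracks where the next frame
    "should" start, given the previous frame's pts and sample count.
    """

    def __init__(self, time_base: Fraction, sample_rate: int) -> None:
        self.time_base = time_base
        self.sample_rate = sample_rate
        self.expected_pts = None  # unknown until the first frame arrives

    def feed(self, pts: int, samples: int) -> None:
        if self.expected_pts is not None and pts != self.expected_pts:
            raise ValueError(
                f"Input frame pts {pts} != expected {self.expected_pts}; "
                "fix or set to None."
            )
        # Frame duration in seconds, converted to time_base ticks
        # (truncated to whole ticks).
        duration_ticks = Fraction(samples, self.sample_rate) / self.time_base
        self.expected_pts = pts + int(duration_ticks)


check = PtsContinuityCheck(Fraction(1, 10_000_000), 44100)
check.feed(0, 4410)  # first frame: accepted, next expected pts is 1000000
try:
    check.feed(980000, 4410)  # second frame from the example output
except ValueError as exc:
    print(exc)
```

With these values the second call fails with the same message as in the traceback, because 4410 samples at 44100 Hz last exactly 1000000 ticks of a 1/10000000 time base.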
I can obviously just set `frame.pts = None`, and the encoder will then make some up. However, this seems to be deprecated and results in the following warnings:
```
Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
Encoder did not produce proper pts, making some up.
```
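One alternative to `frame.pts = None` is to stamp the frames yourself from a running sample count, so the timestamps are set but always contiguous. The sketch below uses a hypothetical `FakeAudioFrame` stand-in so it runs without a capture device; with real PyAV frames the loop body would assign `frame.pts` and `frame.time_base` the same way, assuming your PyAV version allows setting `time_base` on frames.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable, Iterator, Optional


@dataclass
class FakeAudioFrame:
    """Hypothetical stand-in for av.AudioFrame with only the fields used here."""
    pts: Optional[int]
    time_base: Fraction
    sample_rate: int
    samples: int


def restamp_by_sample_count(frames: Iterable[FakeAudioFrame]) -> Iterator[FakeAudioFrame]:
    """Replace capture timestamps with a gap-free count of samples.

    Each frame's pts becomes the number of samples emitted before it, in a
    time base of 1/sample_rate, so consecutive frames always line up exactly.
    """
    samples_seen = 0
    for frame in frames:
        frame.pts = samples_seen
        frame.time_base = Fraction(1, frame.sample_rate)
        samples_seen += frame.samples
        yield frame


# pts values taken from the example output further down in this issue
captured = [
    FakeAudioFrame(pts, Fraction(1, 10_000_000), 44100, 4410)
    for pts in (0, 980000, 2000000, 3010000)
]
for frame in restamp_by_sample_count(captured):
    print(frame.pts, frame.time_base)
```

The trade-off is that the device timestamps are thrown away: if the driver drops or inserts samples, the audio timeline will slowly drift from wall-clock time and from any other stream recorded alongside it.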
To debug the problem, I wrote the following code. It prints the pts, time base, sample rate, number of samples, etc., so you can see exactly what is going on.
```python
import av

# doesn't work with dshow (tried different devices, and also specifying the sample rate)
input_container = av.open(
    format='dshow',
    file='audio=Eingang (High Definition Audio Device)',
    # file='audio=Mikrofon (USB2.0 MIC)',
    options={
        "audio_buffer_size": "100",
        # "sample_rate": "44100",
    },
)
# works
# input_container = av.open(format='lavfi', file='sine=frequency=1000:duration=5',
#                           options={"sample_rate": "44100"})
input_audio_stream = input_container.streams.audio[0]

output_container = av.open('test.mkv', mode='w')
output_audio_stream = output_container.add_stream("aac", rate=44100)

first_pts = None
next_pts = 0
for frame in input_container.decode(audio=0):
    # subtract first pts to start at zero
    if first_pts is None:
        first_pts = frame.pts
    frame.pts -= first_pts

    print(f"pts: {frame.pts}")
    print(f"time base: {frame.time_base}")
    print(f"sample rate: {frame.sample_rate}")
    print(f"samples: {frame.samples}")
    print(f"array_shape/2: {frame.to_ndarray().shape[1]/2}")  # we have stereo

    # time_base ticks per sample = (denominator / numerator) / sample_rate
    pts_per_sample = frame.time_base.denominator / frame.time_base.numerator
    pts_per_sample /= frame.sample_rate
    next_pts = frame.pts + pts_per_sample*frame.samples
    print(f"next pts: {next_pts}")
    print("---------")

    # workaround: let the encoder make up timestamps (triggers the deprecation warning)
    frame.pts = None
    output_container.mux(output_audio_stream.encode(frame))

# flush the encoder and finalize the file (reached when the input ends, e.g. with lavfi)
output_container.mux(output_audio_stream.encode(None))
output_container.close()
```
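The `next pts` value the script prints is the frame's pts plus the frame's duration in time-base ticks, where num/den is the frame's time base. With the values from the example output below (time base 1/10000000, 44100 Hz, 4410 samples per frame) every frame should advance the pts by exactly $10^6$ ticks, i.e. $10^6 \cdot 10^{-7}\,\mathrm{s} = 100\,\mathrm{ms}$:

$$
\mathrm{next\_pts} = \mathrm{pts} + \mathrm{samples} \cdot \frac{\mathrm{den}}{\mathrm{num} \cdot \mathrm{sample\_rate}}
= \mathrm{pts} + 4410 \cdot \frac{10^{7}}{1 \cdot 44100}
= \mathrm{pts} + 10^{6}
$$

So consecutive frames should be exactly 1000000 ticks apart, which is what the resampler expects; the captured pts values instead land up to 20000 ticks (2 ms) early or late.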
Example output:
```
pts: 0
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 1000000.0
---------
pts: 980000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 1980000.0
---------
pts: 2000000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 3000000.0
---------
pts: 3010000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 4010000.0
---------
pts: 4030000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 5030000.0
---------
pts: 5050000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 6050000.0
---------
```
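To quantify the mismatch, this short sketch recomputes two things from the pts values printed above: how far each frame starts from where the previous frame's sample count says it should, and how far the capture timestamps have drifted from a clock driven purely by the sample count. The constants assume the time base and frame size shown in the output.

```python
# pts values copied from the example output (time base 1/10000000)
observed_pts = [0, 980000, 2000000, 3010000, 4030000, 5050000]
TICKS_PER_FRAME = 1_000_000  # 4410 samples at 44100 Hz = 100 ms = 10^6 ticks
TICKS_PER_MS = 10_000        # 1 ms = 10^4 ticks at a 1/10^7 time base

# Frame-to-frame error: how far each frame starts from where the previous
# frame's sample count says it should start.
step_error = [
    cur - (prev + TICKS_PER_FRAME)
    for prev, cur in zip(observed_pts, observed_pts[1:])
]

# Cumulative drift: capture timestamp vs. a clock driven only by sample count.
drift = [pts - i * TICKS_PER_FRAME for i, pts in enumerate(observed_pts)]

print("step error (ms):", [e / TICKS_PER_MS for e in step_error])
print("drift (ms):     ", [d / TICKS_PER_MS for d in drift])
```

For these six frames the step error is always a whole number of milliseconds (-2, +2, +1, +2, +2 ms), and the captured timestamps end up about 5 ms ahead of the sample count after roughly half a second of audio. Six frames are too few to tell whether that drift keeps growing.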
How should we handle this?
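A middle ground between trusting the device timestamps and ignoring them is to accept small jitter and only keep the captured pts when it is far from the expected value. `snap_pts` and the 5 ms tolerance below are hypothetical choices for illustration, fed with the pts values from the example output:

```python
from typing import Optional


def snap_pts(observed: int, expected: Optional[int], tolerance: int) -> int:
    """Hypothetical helper: absorb small timestamp jitter.

    If the captured pts is within `tolerance` ticks of where the previous
    frame ended, use the expected value so frames stay contiguous; otherwise
    keep the captured value so real gaps (e.g. dropped buffers) stay visible.
    """
    if expected is not None and abs(observed - expected) <= tolerance:
        return expected
    return observed


TOLERANCE = 50_000           # 5 ms at a 1/10000000 time base (arbitrary choice)
TICKS_PER_FRAME = 1_000_000  # 4410 samples at 44100 Hz

expected = None
for observed in [0, 980000, 2000000, 3010000, 4030000, 5050000]:
    pts = snap_pts(observed, expected, TOLERANCE)
    print(observed, "->", pts)
    expected = pts + TICKS_PER_FRAME
```

Because the snapped timestamps follow the sample count, a steady drift (if the one in the example output keeps growing) accumulates until it exceeds the tolerance. At that point the captured pts is kept, and that discontinuity still has to be handled, for example by inserting silence or dropping samples, which this sketch does not do.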