pts trouble with audio from dshow

Created by: fvollmer

I'm recording audio from dshow and there seems to be a problem with the pts. ffmpeg itself doesn't complain about these sources, but I also suspect this might not be a bug in PyAV. I tried several input devices, and every one of them produces an error like:

Encoder did not produce proper pts, making some up.
Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/project/audio.py", line 43, in <module>
    output_container.mux(output_audio_stream.encode(frame))
  File "av\stream.pyx", line 155, in av.stream.Stream.encode
  File "av\codec\context.pyx", line 466, in av.codec.context.CodecContext.encode
  File "av\audio\codeccontext.pyx", line 40, in av.audio.codeccontext.AudioCodecContext._prepare_frames_for_encode
  File "av\audio\resampler.pyx", line 122, in av.audio.resampler.AudioResampler.resample
ValueError: Input frame pts 980000 != expected 1000000; fix or set to None.
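
Going by the error message, the resampler apparently expects each frame to start exactly where the previous one ended, i.e. at the previous pts plus the frame's duration expressed in time-base ticks. A minimal sketch of that arithmetic, assuming that is the check being made (the helper name `expected_next_pts` is my own, not PyAV API):

```python
from fractions import Fraction


def expected_next_pts(pts: int, samples: int, sample_rate: int, time_base: Fraction) -> Fraction:
    """Hypothetical helper: pts at which the next frame should start if there is no gap.

    The frame lasts samples / sample_rate seconds; dividing by time_base
    converts seconds into time-base ticks.
    """
    return pts + Fraction(samples, sample_rate) / time_base


# Values from my debug output: 4410 samples at 44100 Hz, time base 1/10000000.
tb = Fraction(1, 10_000_000)
print(expected_next_pts(0, 4410, 44100, tb))  # 1000000
```

The device stamps the second buffer with 980000 instead, i.e. 20000 ticks (2 ms) earlier than the end of the first buffer.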

I can of course set the pts to None so that the encoder makes some up, but this seems to be deprecated and produces the following warnings:

Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
Encoder did not produce proper pts, making some up.
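
One alternative to `None` that I'm considering: ignore the device timestamps and derive the pts from a running sample count, so each frame starts exactly where the previous one ended. The sketch below is plain Python and the helper name is hypothetical. In the real loop the result would be assigned to `frame.pts` together with `frame.time_base = Fraction(1, sample_rate)`; that `time_base` is settable on the frame in the PyAV version in use is an assumption. The obvious downside is that this hides real gaps (dropped buffers) and lets the recording drift against the device clock.

```python
from fractions import Fraction
from typing import Iterable, Iterator, Tuple


def restamp_by_sample_count(frame_sizes: Iterable[int], sample_rate: int) -> Iterator[Tuple[int, Fraction]]:
    """Hypothetical helper: yield (pts, time_base) for consecutive audio frames.

    With a time base of 1/sample_rate, a frame's pts is simply the number of
    samples that came before it, so consecutive frames are always contiguous.
    """
    time_base = Fraction(1, sample_rate)
    samples_so_far = 0
    for n in frame_sizes:
        yield samples_so_far, time_base
        samples_so_far += n


stamps = list(restamp_by_sample_count([4410] * 3, 44100))
print([pts for pts, _ in stamps])  # [0, 4410, 8820]
```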

To debug the problem I wrote the following script, which prints the pts, time base, sample rate, sample count, etc., so you can see exactly what is going on.

import av

# doesn't work with dshow (tried different devices, and also specifying the sample rate)
input_container = av.open(format='dshow',
                          file='audio=Eingang (High Definition Audio Device)',
                          #file='audio=Mikrofon (USB2.0 MIC)',
                          options={"audio_buffer_size": "100",  # dshow buffer size in milliseconds
                                   #"sample_rate": "44100",
                                   },
                          )

# # works
# input_container = av.open(format='lavfi', file='sine=frequency=1000:duration=5',
#                           options={"sample_rate": "44100"})

input_audio_stream = input_container.streams.audio[0]

output_container = av.open('test.mkv', mode='w')
output_audio_stream = output_container.add_stream("aac", rate=44100)

first_pts = None

next_pts = 0
for frame in input_container.decode(audio=0):

    # subtract first pts to start at zero
    if first_pts is None:
        first_pts = frame.pts
    frame.pts -= first_pts

    print(f"pts: {frame.pts}")
    print(f"time base: {frame.time_base}")
    print(f"sample rate: {frame.sample_rate}")
    print(f"samples: {frame.samples}")
    print(f"array_shape/2: {frame.to_ndarray().shape[1]/2}")  # we have stereo
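    # time-base ticks per sample: (1 / time_base) / sample_rate
    pts_per_sample = frame.time_base.denominator / frame.time_base.numerator
    pts_per_sample /= frame.sample_rate
    # where the next frame should start if there is no gap or jitter
    next_pts = frame.pts + pts_per_sample*frame.samples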
    print(f"next pts: {next_pts}")
    print("---------")

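    frame.pts = None  # workaround: let the encoder make up timestamps
    output_container.mux(output_audio_stream.encode(frame))

# flush the encoder and finalize the file once the input ends
output_container.mux(output_audio_stream.encode(None))
output_container.close()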
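
A possible middle ground between trusting the device and ignoring it: compare the incoming pts with the expected value (the `next_pts` computed in the loop) and snap it to the expected value when the difference is small jitter. The helper name and the 50 ms tolerance are arbitrary assumptions. When the deviation is larger, the sketch returns None because that probably means dropped buffers, which would need separate handling (e.g. padding with silence) rather than snapping. The next expected value should then be computed from the snapped pts, not the device one, so that rounding does not accumulate.

```python
from fractions import Fraction
from typing import Optional, Union


def snap_pts(actual: int, expected: Union[int, float, Fraction], time_base: Fraction,
             tolerance_s: Fraction = Fraction(1, 20)) -> Optional[int]:
    """Hypothetical helper: return the expected pts if actual is within tolerance, else None.

    tolerance_s defaults to 50 ms, an arbitrary choice for illustration.
    """
    deviation_s = abs(actual - expected) * time_base  # deviation in seconds
    if deviation_s <= tolerance_s:
        return round(expected)
    return None


tb = Fraction(1, 10_000_000)
print(snap_pts(980000, 1000000.0, tb))   # 1000000 (2 ms of jitter, snapped)
print(snap_pts(2000000, 1000000.0, tb))  # None (100 ms jump, left to the caller)
```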

Example output:

pts: 0
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 1000000.0
---------
pts: 980000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 1980000.0
---------
pts: 2000000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 3000000.0
---------
pts: 3010000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 4010000.0
---------
pts: 4030000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 5030000.0
---------
pts: 5050000
time base: 1/10000000
sample rate: 44100
samples: 4410
array_shape/2: 4410.0
next pts: 6050000.0
---------
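
To quantify the mismatch in this output, the script below (plain Python, values copied from the log above) compares each pts with the end of the previous frame and with a timeline that just counts samples from zero. It only restates the numbers; it doesn't explain where the jitter comes from.

```python
from fractions import Fraction

TIME_BASE = Fraction(1, 10_000_000)  # from the log
SAMPLE_RATE = 44100
SAMPLES_PER_FRAME = 4410
observed_pts = [0, 980000, 2000000, 3010000, 4030000, 5050000]

# Frame duration in ticks: 4410 / 44100 s = 0.1 s = 1_000_000 ticks.
frame_ticks = Fraction(SAMPLES_PER_FRAME, SAMPLE_RATE) / TIME_BASE


def ms(ticks: Fraction) -> float:
    """Convert a tick count to milliseconds."""
    return float(ticks * TIME_BASE * 1000)


# Deviation from "previous pts + one frame duration" (what the resampler complains about).
step_errors = [ms(cur - (prev + frame_ticks)) for prev, cur in zip(observed_pts, observed_pts[1:])]
# Deviation from a timeline that counts samples from zero.
drift = [ms(pts - i * frame_ticks) for i, pts in enumerate(observed_pts)]

print(step_errors)  # [-2.0, 2.0, 1.0, 2.0, 2.0]
print(drift)        # [0.0, -2.0, 0.0, 1.0, 3.0, 5.0]
```

So the device timestamps wobble by a few milliseconds per buffer and, in this short sample, end up 5 ms ahead of the sample count after six buffers.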

How should we handle this?