The main() implementation:

    import argparse

    def main():
        # Two input files plus an optional window length in seconds.
        parser = argparse.ArgumentParser()
        parser.add_argument('--find-offset-of', metavar='audio file', type=str, help='Find the offset of file')
        parser.add_argument('--within', metavar='audio file', type=str, help='Within file')
        parser.add_argument('--window', metavar='seconds', type=int, default=10, help='Only use first n seconds of a target audio')
        args = parser.parse_args()
        offset = find_offset(args.within, args.find_offset_of, args.window)
        print(f"Offset: {offset}s")

    if __name__ == '__main__':
        main()
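As a quick check of how these flags surface in code, the sketch below rebuilds the same parser and feeds it an explicit argument list instead of reading the command line. The file names clip.wav and concert.wav are hypothetical placeholders, and build_parser is a helper introduced only for this illustration. argparse turns the hyphens in --find-offset-of into underscores, which is why main() reads args.find_offset_of.

```python
import argparse


def build_parser():
    # Same three options as in main().
    parser = argparse.ArgumentParser()
    parser.add_argument('--find-offset-of', metavar='audio file', type=str, help='Find the offset of file')
    parser.add_argument('--within', metavar='audio file', type=str, help='Within file')
    parser.add_argument('--window', metavar='seconds', type=int, default=10, help='Only use first n seconds of a target audio')
    return parser


# Hypothetical file names, passed as a list rather than via sys.argv.
args = build_parser().parse_args(['--find-offset-of', 'clip.wav', '--within', 'concert.wav'])
print(args.find_offset_of, args.within, args.window)
```

Since --window is omitted here, it falls back to its default of 10 seconds.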
Next comes the find_offset function. We use librosa to read both of our audio files, match their sampling rates, and convert them to raw samples in float32 format (almost) regardless of the original audio format. The last part is accomplished with ffmpeg, which is used nearly everywhere AV processing is involved.

    # sr=None keeps the native sampling rate of the longer recording.
    y_within, sr_within = librosa.load(within_file, sr=None)
    # Resample the target to that same rate so sample indices line up.
    y_find, _ = librosa.load(find_file, sr=sr_within)
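We then cross-correlate the longer recording with the target audio (or rather the first window seconds of it) using the Fast Fourier Transform (FFT) method:

    c = signal.correlate(y_within, y_find[:sr_within*window], mode='valid', method='fft')
    # The lag with the highest correlation is the best alignment, in samples.
    peak = np.argmax(c)
    # Convert the sample index to seconds.
    offset = round(peak / sr_within, 2)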
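To see why the peak of the cross-correlation gives the offset, here is a self-contained sketch on synthetic data. It does not use librosa or scipy: the FFT-based 'valid' cross-correlation is written out with NumPy alone as a stand-in for signal.correlate(..., method='fft'). The helper name xcorr_valid_fft, the 1 kHz rate, the noise signal, and the 2.5 s offset are all made-up values for illustration.

```python
import numpy as np


def xcorr_valid_fft(a, b):
    """Cross-correlate a with b at lags 0..len(a)-len(b), computed via FFT."""
    # Zero-pad so circular wrap-around cannot reach the lags we keep.
    n = len(a) + len(b) - 1
    spec = np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n))
    full = np.fft.irfft(spec, n)
    return full[:len(a) - len(b) + 1]


sr = 1000                                  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
y_within = rng.standard_normal(5 * sr)     # 5 s of noise as the long recording
true_start = int(2.5 * sr)                 # the excerpt starts 2.5 s in
y_find = y_within[true_start:true_start + sr]
y_find = y_find + 0.1 * rng.standard_normal(sr)  # slightly distorted copy

c = xcorr_valid_fft(y_within, y_find)
peak = int(np.argmax(c))                   # best-matching lag in samples
offset = round(peak / sr, 2)               # same conversion as in find_offset
print(f"Offset: {offset}s")
```

Multiplying one spectrum by the complex conjugate of the other is what turns the FFT product into a correlation rather than a convolution. For long recordings this is much cheaper than sliding the excerpt across the signal sample by sample.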
We used PullTube to download the video clip and ffmpeg to extract the audio and convert it to WAV at a 16 kHz sampling rate. The result: 3242.69 seconds, which is precisely the moment the song starts in the YouTube video. Voilà!
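The audio extraction step could look roughly like the following sketch, which builds an ffmpeg command line from Python. The file names and the helper names build_ffmpeg_cmd and extract_audio are hypothetical. The exact command used for the test above is not shown in the text, so this is only one plausible way to get a 16 kHz WAV.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(video_path, wav_path, sample_rate=16000):
    """Return an ffmpeg command that drops the video stream and writes WAV audio."""
    return [
        'ffmpeg',
        '-i', video_path,          # input file
        '-vn',                     # ignore the video stream
        '-ar', str(sample_rate),   # resample audio to the target rate
        wav_path,                  # the .wav extension selects the WAV format
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    # Fail early with a clear message if ffmpeg is not installed.
    if shutil.which('ffmpeg') is None:
        raise RuntimeError('ffmpeg not found on PATH')
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path, sample_rate), check=True)


# Hypothetical file names; the command is printed here rather than executed.
print(' '.join(build_ffmpeg_cmd('concert.mp4', 'concert.wav')))
```

Calling extract_audio('concert.mp4', 'concert.wav') would run the command and produce a file that can be passed to --within.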