So you want to check whether the data got corrupted 1) on the way from the original source to the "unfriendly service" or 2) on the way from the unfriendly service to you?
(For case 2, hashing works.)
Well, I'd do what acquisition tools for hard disks do:
Hash the data in transit, then hash the data on your hard disk (the resulting file) again and compare the two. That should be possible with tee.
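A minimal sketch of the tee idea, assuming a POSIX shell with md5sum available. The function name save_and_verify and the file names are made up for illustration. tee writes the stream to disk while the same bytes go on to md5sum, and the saved file is then read back and hashed a second time:

```shell
# save_and_verify OUTFILE: hypothetical helper, reads the download from stdin.
save_and_verify() {
    out="$1"
    # Hash the bytes as they stream past while tee writes them to disk.
    transit_sum=$(tee "$out" | md5sum | cut -d' ' -f1)
    # Read the saved file back and hash it again.
    disk_sum=$(md5sum < "$out" | cut -d' ' -f1)
    if [ "$transit_sum" = "$disk_sum" ]; then
        echo "OK $transit_sum"
    else
        echo "MISMATCH transit=$transit_sum disk=$disk_sum" >&2
        return 1
    fi
}
```

Usage would be something like curl -fsS [url to filename] | save_and_verify filename. Note that the read-back may be served from the page cache rather than the physical disk, so this mainly catches errors in the write path.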
If that's too complicated for your use case, just download and hash, then download again and pipe the second download straight into the hashing tool. On Linux I'd do something like:
# download everything
wget -r [url]
# hash everything
md5sum [files]
# download again and only hash
# maybe insert loop here
curl [url to filename] | md5sum > filename.md5
That's quick and dirty. You probably want a loop around curl, but that should not be hard: something like for file in $(find . -name '*.mp4'); do ...; done should work as long as the filenames contain no whitespace (find is safer than parsing ls here).
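The loop could look roughly like this. It is a sketch under assumptions: BASE_URL and the function names are placeholders, and the local directory is assumed to mirror the remote paths (wget -r puts files under a directory named after the host, so point verify_tree at that directory). It reads find's output line by line, which survives spaces in filenames:

```shell
# Placeholder; replace with the real base URL of the service.
BASE_URL="https://example.com"

# fetch_remote REL_PATH: write the remote copy to stdout (assumed URL layout).
fetch_remote() {
    curl -fsS "$BASE_URL/$1"
}

# verify_tree DIR: re-download every .mp4 under DIR and compare MD5 sums.
verify_tree() {
    dir="$1"
    find "$dir" -type f -name '*.mp4' | while IFS= read -r file; do
        rel="${file#"$dir"/}"
        local_sum=$(md5sum < "$file" | cut -d' ' -f1)
        # </dev/null keeps the fetch command from eating find's output.
        remote_sum=$(fetch_remote "$rel" < /dev/null | md5sum | cut -d' ' -f1)
        if [ "$local_sum" = "$remote_sum" ]; then
            echo "OK $rel"
        else
            echo "MISMATCH $rel"
        fi
    done
}
```

A failed download hashes as empty input and therefore shows up as MISMATCH, so verify_tree example.com | grep MISMATCH lists everything worth a second look. Paths containing characters that need URL encoding are not handled here.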
PS: before someone says MD5 is not secure: yes, if you assume an attacker, but here we only want to check that the data is identical, and MD5 is usually faster than the SHA hashes.
@buherator @13reak do you get an ETag? On many storage backends that is basically the MD5 of the content, although for chunked uploads you will have to verify the chunks (the last digits of the ETag should tell you the number of chunks). Or you could compare the Content-Length header?
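A rough sketch of that header check, assuming the service answers HEAD requests and, when the ETag is a plain 32-digit hex string, that it really is the MD5 of the content (not guaranteed by HTTP). The function names are invented for this example:

```shell
# header_value NAME: print the value of header NAME from raw headers on stdin.
header_value() {
    tr -d '\r' | awk -v name="$1" '
        BEGIN { prefix = tolower(name) ":" }
        tolower(substr($0, 1, length(prefix))) == prefix {
            value = substr($0, length(prefix) + 1)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }'
}

# check_headers FILE: compare FILE against response headers read from stdin.
check_headers() {
    file="$1"
    headers=$(cat)
    length=$(printf '%s\n' "$headers" | header_value Content-Length)
    etag=$(printf '%s\n' "$headers" | header_value ETag | tr -d '"' | tr 'A-F' 'a-f')
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$length" = "$size" ]; then echo "size OK"; else echo "size MISMATCH"; fi
    # Only treat the ETag as an MD5 if it looks like one (32 hex digits).
    if printf '%s' "$etag" | grep -Eq '^[0-9a-f]{32}$'; then
        sum=$(md5sum < "$file" | cut -d' ' -f1)
        if [ "$etag" = "$sum" ]; then echo "etag OK"; else echo "etag MISMATCH"; fi
    else
        echo "etag SKIPPED (chunked upload or not an MD5)"
    fi
}
```

Usage would be curl -sI [url to filename] | check_headers filename. An "etag MISMATCH" only means something if the service really uses MD5 for its ETags, so test it on a file you know is intact first.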
@buherator fetch twice and compare two copies in your storage? if you are only concerned about corruption on its way to you or the process of saving it on the disk, of course.
@buherator it should work unless the service does live media encoding. easy to test against too, just download twice. :)
as for your other concern, not much you can do about it without examining the received media in detail. to save time, you could produce a small still image every hour, for example; much easier to inspect daily instead of watching everything, but still requires a human in the loop.
@buherator pretty sure you can find examples of how to do that, as a lot of media sites produce "previews" in a similar way. should be easy to throw together.
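One way to produce such stills is ffmpeg's fps filter. This is a sketch assuming ffmpeg is installed; the function name, the FFMPEG override variable and the output naming are made up for illustration:

```shell
# Allow overriding the binary; defaults to ffmpeg on PATH.
FFMPEG=${FFMPEG:-ffmpeg}

# hourly_stills INPUT OUTDIR [SECONDS]: write one JPEG per interval of video.
hourly_stills() {
    input="$1"
    outdir="$2"
    interval="${3:-3600}"   # one frame per hour by default
    mkdir -p "$outdir"
    # fps=1/N keeps one frame for every N seconds of input.
    "$FFMPEG" -nostdin -loglevel error -i "$input" \
        -vf "fps=1/$interval" "$outdir/still_%04d.jpg"
}
```

For example, hourly_stills recording.mp4 stills would leave a handful of numbered JPEGs in stills/ that a human can skim quickly.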
@buherator @infosecdj I came here to suggest random audits, which would be more robust against an adversary than periodic ones. Otherwise, the ideas in this thread sound good to me.
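For the random audits, a small sketch assuming GNU coreutils (shuf) is available. The function name and the .mp4 filter are illustrative only:

```shell
# pick_random_files DIR COUNT: print COUNT randomly chosen .mp4 files under DIR.
pick_random_files() {
    find "$1" -type f -name '*.mp4' | shuf -n "$2"
}
```

Something like pick_random_files recordings 3 then gives a different, unpredictable handful of files to inspect each time.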