I have this favorite story about "the first 3D shopping mall" on the Hungarian Internet: it was of course a massive failure, because the minds behind it didn't realize that people don't go to the mall to use the escalators (which were precisely modeled in VRML, along with the corridors, benches, etc.).
I think the recent enthusiasm about GenAI-driven voice/video recognition is similar in this respect: in many cases people would prefer a system *not* involving human(-like) interaction (or using an escalator); it's just that we currently don't have better solutions for many tasks. Assuming a speech/video interface is always better than, e.g., a bar code reader results in faster horses, not cars.