Microsoft corporate VP Pavan Davuluri outlined the company’s vision for transforming Windows into a voice- and vision-driven operating system that could eventually reduce reliance on traditional peripherals like mice and keyboards. Speaking in a video released Wednesday, Davuluri emphasized how generative AI will enable new multimodal interaction methods, suggesting a fundamental shift in how users engage with their computers over the next five years.
The big picture: Microsoft envisions Windows evolving beyond traditional input methods toward AI-powered interfaces that can understand speech, visual content, and user intent contextually.
What they’re saying: Davuluri described the scope of these interface changes during the presentation.
• “I think what human interfaces look like today and what they will look like five years from now is one big area of thrust for us that Windows continues to evolve,” he said.
• “You’ll be able to speak to your computer while writing, inking, interacting with another person, for example. You should be able to have a computer semantically understand your intent to interact with it, for instance, from when it’s awake or when you put a machine to sleep.”
Key capabilities planned: Microsoft expects the next generation of Windows to incorporate several AI-driven features that go beyond what the platform handles today.
• Voice commands will become integrated across multiple simultaneous activities, allowing users to speak while writing or interacting with others.
• Visual understanding will allow Windows devices to see and interpret on-screen content automatically.
• Semantic understanding will allow computers to interpret user intent contextually, including system states like sleep and wake functions.
• The interface evolution mirrors science fiction concepts like the Star Trek computer or the AI assistant from the film “Her.”
Privacy considerations: Microsoft plans to address potential data collection concerns through on-device AI processing.
• Davuluri mentioned using AI models that can run “on the device,” suggesting operation without internet connectivity (a rough sketch of what on-device processing looks like follows this list).
• This approach could help protect user privacy while enabling advanced AI capabilities.
• Even so, the vision may raise concerns for users, since this kind of comprehensive contextual understanding would require the system to observe a great deal of on-screen and spoken activity.
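To make the on-device idea concrete, here is a minimal, hypothetical sketch using the open-source Whisper speech-recognition library as a stand-in. It is not Microsoft’s technology or implementation; it simply illustrates running a voice model locally so that audio never has to leave the machine.

```python
# Illustrative only: local speech-to-text with the open-source Whisper model,
# used here as a generic stand-in for "on-device" AI processing.
import whisper  # pip install openai-whisper


def transcribe_locally(audio_path: str) -> str:
    # Loads model weights from the local cache (downloaded once), then runs
    # inference entirely on this machine's CPU/GPU with no cloud calls.
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]


if __name__ == "__main__":
    # Hypothetical audio file of a spoken command; the transcript is produced
    # without sending the recording to any external service.
    print(transcribe_locally("voice_command.wav"))
```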
Current state: Some AI capabilities already exist in Microsoft’s ecosystem, though adoption remains limited.
• Windows 11 users can already access some of these capabilities, including voice interaction, through Microsoft’s Copilot AI assistant.
• Many PC users still run Windows 10, which officially reaches end of support in October.
• The transition to AI-driven interfaces represents a significant departure from current Windows interaction models.
Interface evolution: Microsoft sees this transformation as part of a broader shift in computing interaction paradigms.
• “We certainly look at the interface becoming more multimodal and more capable, based on new interaction technologies that come to life,” Davuluri explained.
• He framed the progression as “mouse and keyboard moving to pen and touch, and so on,” part of a continuous evolution of input methods.
• Voice interaction will become particularly important as Windows expands to diverse device types including tablets and gaming handhelds.