Microsoft's new LAM can perform some tasks in Word.
- panoramagatewayllc
- Jan 4
Microsoft researchers have developed the LAM (Large Action Model), an AI designed to control Windows applications, including Microsoft Office. Unlike traditional language models such as GPT-4, which focus on generating text, LAM turns user requests into real actions by creating step-by-step plans. It processes input from text, voice, and images, and adjusts its actions in real time.
In tests with Word, LAM completed tasks successfully 71% of the time, outperforming GPT-4 without visual input, which achieved 63%. LAM was also faster, completing tasks in 30 seconds compared to GPT-4's 86 seconds. However, when visual information was provided, GPT-4 reached a higher success rate of 75.5%.
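To put those figures side by side, here is a small Python sketch that computes the gaps. The numbers are the ones reported above; the constant and function names are illustrative only and do not come from the researchers.

```python
# Figures as reported in the article for the Word tests.
LAM_SUCCESS = 71.0        # percent of tasks completed
GPT4_TEXT_ONLY = 63.0     # percent, no visual input
GPT4_WITH_VISION = 75.5   # percent, visual input provided
LAM_SECONDS = 30.0        # reported task completion time
GPT4_SECONDS = 86.0       # reported task completion time


def point_gap(a: float, b: float) -> float:
    """Difference between two success rates, in percentage points."""
    return a - b


def speedup(slower: float, faster: float) -> float:
    """How many times faster the second timing is than the first."""
    return slower / faster


if __name__ == "__main__":
    print("LAM vs GPT-4 (text only):", point_gap(LAM_SUCCESS, GPT4_TEXT_ONLY), "points")
    print("GPT-4 (with vision) vs LAM:", point_gap(GPT4_WITH_VISION, LAM_SUCCESS), "points")
    print("Speedup:", round(speedup(GPT4_SECONDS, LAM_SECONDS), 2), "x")
```

On these numbers, LAM leads the text-only GPT-4 by 8 percentage points, trails the vision-enabled GPT-4 by 4.5 points, and finishes tasks a little under three times faster.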
For training, researchers used 29,000 "task-plan" pairs, later expanding this to 76,000 examples with GPT-4's help. Despite challenges such as AI errors and technical limitations, researchers believe LAM represents an important step toward Artificial General Intelligence (AGI).
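As a rough illustration of what a "task-plan" pair might look like, the sketch below models one as a user request plus an ordered list of steps. The class name, field names, and the example task are all assumptions made for illustration; the article does not describe the actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPlanPair:
    """Hypothetical shape of one training example: a request and its plan."""
    task: str                                       # natural-language user request
    plan: list[str] = field(default_factory=list)   # ordered steps to carry it out

    def is_valid(self) -> bool:
        """A pair is usable only if the task and every step are non-empty."""
        return (
            bool(self.task.strip())
            and len(self.plan) > 0
            and all(step.strip() for step in self.plan)
        )


# Invented example, not taken from the researchers' dataset.
example = TaskPlanPair(
    task="Make the first heading bold",
    plan=[
        "Select the first heading",
        "Open the Home tab",
        "Click the Bold button",
    ],
)

if __name__ == "__main__":
    print(example.is_valid(), len(example.plan))
```

A check like `is_valid` is one simple way to filter out unusable pairs before training, which matters when part of the dataset is machine-generated.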
LAM's development follows four key stages: breaking tasks into logical steps, using an advanced model such as GPT-4 to convert plans into actions, finding new solutions independently, and improving the system through reinforcement learning.
