I don't think AGI is required. But multimodality for sure, plus some piece to plan and execute actions. Prompting techniques today would already get you clear tasks, validations and such from an objective like "replace table leg". A unit to process that output would still be required.
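
As a rough sketch of what such a processing unit might look like: something takes the tasks and validations produced for an objective and runs them in order, checking each one. Everything here is made up for illustration (`Task`, `plan`, `execute`, the `world` dict), and the planner is hard-coded where a real one would presumably prompt a multimodal model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    description: str                   # what to do
    action: Callable[[Dict], None]     # mutates the (toy) world state
    validate: Callable[[Dict], bool]   # checks the task actually succeeded


def plan(objective: str) -> List[Task]:
    """Hypothetical planner stub; assumed to be a prompted model in practice."""
    if objective != "replace table leg":
        raise ValueError(f"no canned plan for: {objective}")
    return [
        Task("remove broken leg",
             lambda w: w.update(legs=w["legs"] - 1),
             lambda w: w["legs"] == 3),
        Task("attach new leg",
             lambda w: w.update(legs=w["legs"] + 1),
             lambda w: w["legs"] == 4),
    ]


def execute(tasks: List[Task], world: Dict) -> List[str]:
    """Run tasks in order and stop at the first failed validation."""
    done = []
    for task in tasks:
        task.action(world)
        if not task.validate(world):
            raise RuntimeError(f"validation failed: {task.description}")
        done.append(task.description)
    return done


if __name__ == "__main__":
    world = {"legs": 4}
    print(execute(plan("replace table leg"), world))
```

The point is only the split: planning yields tasks with validations attached, and a separate, fairly dumb loop executes and checks them. No AGI needed for that second part.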