Think of a big database of nouns (objects, meshes, structures, digitized things). Your sensors should be able to look at real life and then categorize what they see into nouns. Modified nouns are adjectives by nature, for instance the color of an object. Verbs, which we teach by hard-coding at first, are the motions of objects or the interactions between them. And modified verbs are adverbs.
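
To make the noun/adjective/verb/adverb idea concrete, here is a minimal Python sketch. The class names `Noun` and `Verb`, the `describe` helper, and the sample values are all made up for illustration; this is just one possible way to lay the data out, not a finished design.

```python
from dataclasses import dataclass, field


@dataclass
class Noun:
    """An object the sensors can recognize (hypothetical structure)."""
    name: str
    # Adjectives modify the noun, e.g. {"color": "red"}.
    adjectives: dict = field(default_factory=dict)


@dataclass
class Verb:
    """A hard-coded motion or interaction (hypothetical structure)."""
    name: str
    # Adverbs modify the verb, e.g. {"speed": "slowly"}.
    adverbs: dict = field(default_factory=dict)


def describe(noun, verb):
    """Join adjectives, noun, verb, and adverbs into one plain description."""
    adjectives = " ".join(noun.adjectives.values())
    adverbs = " ".join(verb.adverbs.values())
    parts = (adjectives, noun.name, verb.name, adverbs)
    return " ".join(part for part in parts if part)


can = Noun("soda can", {"color": "red"})
roll = Verb("roll", {"speed": "slowly"})
print(describe(can, roll))  # red soda can roll slowly
```

The same two records that the code works with are the ones a sentence would talk about, which is the whole point of storing them this way.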

Your code will describe a scene as an action applied to an object, and natural language can describe the same thing in a way that anchors into your code.
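
As a rough sketch of what "anchoring" could look like, the Python below maps a plain sentence onto a hard-coded verb function and a known noun. Everything here (`push`, `lift`, `VERBS`, `NOUNS`, `run_command`) is a hypothetical name chosen for the example, and the word matching is deliberately naive: it does not handle punctuation, synonyms, or grammar.

```python
def push(obj):
    # Stand-in for real motion code; it only reports what it would do.
    return f"pushing {obj}"


def lift(obj):
    return f"lifting {obj}"


# Hard-coded verbs: each word anchors to a function.
VERBS = {"push": push, "lift": lift}

# Nouns that are already in the database (illustrative only).
NOUNS = ("soda can", "desk")


def run_command(sentence):
    """Find a known verb and the first-mentioned known noun, then call the verb."""
    text = sentence.lower()
    verb = next((word for word in text.split() if word in VERBS), None)
    mentioned = [(text.find(noun), noun) for noun in NOUNS if noun in text]
    if verb is None or not mentioned:
        raise ValueError(f"cannot anchor sentence to code: {sentence!r}")
    noun = min(mentioned)[1]  # earliest position in the sentence wins
    return VERBS[verb](noun)


print(run_command("Push the soda can off the desk"))  # pushing soda can
```

The sentence and the call `push("soda can")` end up describing the same action, one in words and one in code.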

First we need to database a lot of real-world objects and be able to recognize them when we encounter them with sensors. If we database them properly, maybe we even describe the elemental composition of the matter they're made of, so we can run physics simulations and predict interactions. But just knowing that a soda can is on the desk, and where the walls and obstacles are in a room, is the beginning. Next, database as many objects as you can and expand the algorithm to understand dynamic objects of non-specific dimensions, and you've done the hardest part of AI. Once that is done, natural language is easy to code, because you have already put in the nouns. It is just a different way of describing the scene than code. In fact, everything you want AI to do is easy to code once it has spatial awareness of objects and of the terrain to navigate.
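
Here is a toy Python sketch of the "database objects, then recognize them" step. It assumes the sensor hands back a rough bounding box (width, height, depth in meters) and matches it to the nearest stored object. The names `KNOWN_OBJECTS` and `recognize`, the dimensions, and the tolerance are all illustrative assumptions; real recognition would need far more than three numbers per object.

```python
import math

# Hypothetical object database: name -> (width, height, depth) in meters.
# The dimensions are rough illustrative values, not measured data.
KNOWN_OBJECTS = {
    "soda can": (0.066, 0.122, 0.066),
    "coffee mug": (0.080, 0.095, 0.080),
    "book": (0.150, 0.230, 0.030),
}


def recognize(measured, tolerance=0.02):
    """Return the closest known object, or None if nothing is close enough."""
    best_name, best_distance = None, float("inf")
    for name, dimensions in KNOWN_OBJECTS.items():
        distance = math.dist(measured, dimensions)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= tolerance else None


# A simple scene: recognized object name -> position in the room (x, y, z).
scene = {}
reading = (0.065, 0.120, 0.067)  # pretend sensor measurement
name = recognize(reading)
if name is not None:
    scene[name] = (1.2, 0.8, 0.75)  # made-up position on the desk

print(scene)
```

Anything that falls outside the tolerance comes back as `None`, which is where the harder problem of dynamic objects with non-specific dimensions would start.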

