Image Understanding

Image Understanding
Image Understanding

At its core, image understanding is the process by which machines interpret visual content, extracting meaningful information just as a human would when looking at a picture. There are several processes involved: Recognizing objects within an image, understanding the context, relationships between objects, even inferring the emotions or actions depicted in the visual content.

 Image understanding has many potential real world applications in fields like healthcare, retail, and autonomous driving.

For instance, here are a couple of attempts of Chat GPT 4 to describe the scene in Fig. 1:

  1. “The image depicts a group of men walking through a hallway with a tiled floor and white walls. In the foreground, a tall man in a dark suit is standing on a weighing scale, checking his weight, and he appears to be holding a clipboard or folder. He is smiling and looking at a shorter man standing next to him, who is laughing and pointing towards the scale, seemingly sharing a light-hearted moment. The shorter man is also dressed in a dark suit and is walking with an air of confidence and joy. Behind them, there are other men dressed in suits who are observing the interaction, some with smiles on their faces, suggesting an overall jovial atmosphere. The environment gives the impression of a professional setting, perhaps a workplace or a formal event, where a moment of humor is being shared among the individuals.”
  2. “This image captures a moment where a group of men are in a hallway with tiled floors and walls. One man is stepping on a scale to measure his weight. He is holding a clipboard, implying that he may be taking part in some form of a health check or tracking his weight for some reason. To his right, another man is looking on with a notably amused expression, while others in the background appear to be enjoying the scene as well, with smiles and laughter. The overall mood of the image seems lighthearted and congenial.”
Figure 1. President Obama is stepping on the weight scale.

Despite describing the image accurately, it misses the main point: “President Barack Obama was playing a prank, making the other man weight more by stepping on the weight scale.” 

If you are interested in how to achieve this type of analysis using Python please consider subscribing to our newsletter, where you will find all the documented and tested Python code.