thinking about the deictic properties of the indefinite article so often found in image captions, especially in a machine learning context. e.g. it's almost always something like "a boat is in the water," not "this/that/the boat..." (or, you know, "my boat," "someone else's boat," etc.). demonstrates the extent to which image captioning as an activity assumes & enforces a kind of physical, temporal and social decontextualization of images: neither captionist nor viewer is "present" in the process

image captions like this always sound like descriptions from text adventure games, or like calling the super for help with the fusebox when a fuse gets tripped, or like when one stranger asks another at some public spectacle, "what happened here?" (very different from how you might "caption"/describe photos you took [e.g.] on vacation while showing them to a friend. in that case you'd be saying things like "and *this* is *my* favorite place in...")

@aparrish Is technology like this being used for captioning? Or just for search engine indexing?

For the reasons you mention, and more, it seems pretty poorly suited to captioning.

@apLundell thinking of e.g. cs.stanford.edu/people/karpath and github.com/tensorflow/models/t both of which are well-known image captioning models based on corpora that have this characteristic (though admittedly I don't know exactly how these particular models are being used in commercial applications)
