Matching Love Poems to Images from The Metropolitan Museum of Art

Study of Cupid by John Quincy Adams Ward, Metropolitan Museum of Art

Chances are good that when you look at this page, you see Cupid. In fact, the adjacent image is titled Study of Cupid and is from the sketchbook of American artist John Quincy Adams Ward. I downloaded it along with 1699 drawings by American artists from the Metropolitan Museum of Art’s open access collection and then asked my computer to generate a caption for each.

About Study of Cupid it said:

a bear of a male in an orange and snow

I was charmed.

Visitors can browse more than 406,000 images in the Met's collection and use those that are in the public domain. The collection’s API provides a way to collect a set of either low- or high-resolution versions programmatically. I used the Met’s API to gather the low-resolution images for my project.
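For readers who want to try something similar, here is a minimal sketch of pulling one image URL from the Met's public Collection API. It is not the script used for this project. The field names (`isPublicDomain`, `primaryImage`, `primaryImageSmall`) are taken from the API's object records as I understand them, and the helper names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Base URL of the Met's public Collection API (no API key required).
BASE = "https://collectionapi.metmuseum.org/public/collection/v1"


def object_url(object_id):
    """Build the URL for one object's JSON record."""
    return f"{BASE}/objects/{object_id}"


def pick_image_url(record, low_res=True):
    """Return an image URL for a public-domain record, or None.

    Assumes `primaryImageSmall` is the web-sized image and
    `primaryImage` is the full-size one.
    """
    if not record.get("isPublicDomain"):
        return None
    key = "primaryImageSmall" if low_res else "primaryImage"
    return record.get(key) or None


def fetch_record(object_id):
    """Download and parse one object record (needs network access)."""
    with urlopen(object_url(object_id)) as response:
        return json.load(response)
```

Calling `pick_image_url(fetch_record(object_id))` for each object ID of interest would give the low-resolution URLs to download; records without a public-domain image come back as `None` and can be skipped.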

My goal was to match each of the 1000 love poems in my collection with a piece of art from the Met and to use my caption generator to help me. The pictures you see in the love carousels on this site are the result of this experiment, which I describe in more detail below.

Here’s What Happened:

After downloading the images from the Met, I built my caption generator using Google’s Image Captioning notebook, which I ran on Google Colab. I tried other implementations before I settled on this one, but ultimately, the Colab notebook was the one that I got running most easily (I used a professional account, and got bumped off from time to time, but that was my biggest trouble).

a plate some small dessert cake on a table

The initial caption generator, which created the “bear of a male in an orange and snow” caption, was trained on just 30,000 captions. That is a relatively small amount of text, but it allowed me to test out the caption generator without spending too much time on it. I tried captioning a couple of my own photographs as a quick way to set my expectations for the drawings.

I wasn’t sure how well the caption generator would do with the drawings, as it had never seen a drawing before, only photographs. Also, in the interest of saving memory, the generator’s vocabulary was constrained to 5000 words, about what the average four-year-old knows.
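To make the vocabulary limit concrete, here is a rough sketch of how a captioning pipeline might cap its vocabulary. It is an illustration rather than the notebook's actual code; the function names and the `<unk>` placeholder are assumptions.

```python
from collections import Counter


def build_vocab(captions, max_size):
    """Keep the max_size most frequent words; everything else is unknown."""
    counts = Counter(word for c in captions for word in c.lower().split())
    return {word for word, _ in counts.most_common(max_size)}


def apply_vocab(caption, vocab, unk="<unk>"):
    """Replace out-of-vocabulary words with a placeholder token."""
    return " ".join(w if w in vocab else unk for w in caption.lower().split())
```

With `max_size=5000`, any word outside the 5000 most common training words collapses into the placeholder, so the generator can never produce it.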

Still, I had a substantial number of poems and images and would take even a four-year-old's help. Originally, I included a few hundred works from the Biodiversity Heritage Library’s open access collection in addition to the work from the Met. I love these images, but the license specified attribution, and I thought that would be too much text for the carousels. I include a few of the images here, however, and you can see the whole collection on Flickr.

My first caption generator produced these gems:

a drawing of a drawing of a sign that appears to be broken

CC BY-NC-SA 3.0 Catalog of North American Mammals, Biodiversity Heritage Library

a pile of bananas and a surf board

CC BY-NC-SA 3.0 Catalog of North American Mammals, Biodiversity Heritage Library

drawing of ship

a large ship very large assortment of pictures with some pictures with a factory with a mask being used as a massive amount of which is hanging from a thick shrubbery

Ship (from McGuire Scrapbook), 1839 Probably Edmund C. Coates Metropolitan Museum of Art/American Wing

After my initial experiment, I decided to train the caption generator on a larger number of captions. For my final caption generator, I trained the system on 414,113 captions and increased the vocabulary size to 10,000 because I wanted my system to be more expressive.

While the new caption generator was training (it took all night, and still didn't run as long as I'd have liked because I lost my connection), I looked at the gender data the Metropolitan Museum provides. Although only a subset of the female artists are identified as such in the dataset, I was able to browse the work of Mary Russell Smith, Henrietta Johnston, Marcia Oakes Woodbury, Fidelia Bridges, Ellen Robbins, Jane Anthony Davis, Maria Edgar, Sarah Fairchild, Emily Maria Spafard Scott, and Ruth Whittier Shute this way (as well as four paintings attributed to James Sharples, which note that Ellen Wallace Sharples is ‘possibly’ the artist). Here is some of the work:

Cypress Bough by Mary Russell Smith

Cypress Bough

Mary Russell Smith

Metropolitan Museum of Art/American Wing

Dutch Woman by Marcia Oakes Woodbury

Dutch Woman

Marcia Oakes Woodbury

Metropolitan Museum of Art/American Wing

Lady Seated in a Boston Rocker by Jane Anthony Davis

Lady Seated in a Boston Rocker

Jane Anthony Davis

Metropolitan Museum of Art/American Wing

Miss Emeline Parker of Lowell by Ruth Whittier Shute

Miss Emeline Parker of Lowell, Massachusetts

Ruth Whittier Shute (Drawn by R.W. Shute and Painted by S. A. Shute.)

Metropolitan Museum of Art/American Wing

At last, the caption generator was ready and I tried it out. Here’s what the new and improved caption generator (i.e., trained on more data and given a larger vocabulary, but not trained for as long as I'd like) had to say about Cupid:

a man holding a skateboard in the dark column

Now that I had my state-of-the-art captions, all I had to do was compare them to the poems to determine which image was most suited for each.

There are many ways to approach text comparison (this piece on Medium offers a nice overview). For my first attempt, I decided to use BERT and cosine similarity to compare the image captions to summaries (representing 'the heart') of the poems. Using computer-generated captions and summaries allowed me to compare two texts of a similar length that also reflected my computer's understanding of the work.
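As a reminder of what cosine similarity measures, here is a small sketch in plain Python. In the real pipeline the vectors would be sentence embeddings produced by a BERT-based encoder; the function below works on any equal-length lists of numbers, and returning 0.0 for an all-zero vector is a convention chosen for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```

Two vectors pointing in the same direction score 1, perpendicular vectors score 0, and the length of each vector does not matter, which is why the measure suits comparing texts of slightly different lengths.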

I decided to use Derek Miller's summarizer (you can try it out here). The linked paper explains that the service was designed to help students summarize lecture content, and nowhere is poetry mentioned. However, the summarizer is extractive (it summarizes text by pulling phrases from the original document), and I thought that using the original language would be ideal for my purposes.

The basic unit for prose is the sentence, but for poetry, the line. To encourage the summarizer to consider each line as a unit, I pretended that each was a sentence by replacing the line break with a period, and then I instructed the summarizer to make a one-sentence summary for each poem.
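The line-to-sentence trick might look something like the sketch below. The function name is hypothetical, and stripping a line's existing trailing punctuation before adding the period is my own assumption, made so that no line ends in a doubled mark.

```python
def poem_to_pseudo_sentences(poem):
    """Turn each non-empty line of a poem into its own 'sentence'."""
    sentences = []
    for line in poem.splitlines():
        line = line.strip()
        if not line:
            continue  # skip stanza breaks
        # Drop trailing punctuation so each line ends in exactly one period.
        sentences.append(line.rstrip(".,;:!?") + ".")
    return " ".join(sentences)
```

The resulting string can then be handed to an extractive summarizer with a request for a single sentence, which in effect asks it to pick the one line that best represents the poem.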

Here, for example, is the poem summary of "The Modern Woman to Her Lover" by Margaret Widdemer: Hand in locked hand we shall pass along

I managed to create a brief summary for the bulk of the poems using the summarizer (some of the summaries were longer than others). For the poems that resisted summation, I just used the first 20 words.
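The fallback could be as simple as the following sketch. It assumes that a poem "resisting summation" means the summarizer returned nothing usable; the function name and that interpretation are mine.

```python
def summary_or_fallback(summary, poem, max_words=20):
    """Use the summarizer's output, or the poem's first words if it is empty."""
    if summary and summary.strip():
        return summary.strip()
    return " ".join(poem.split()[:max_words])
```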

Then I looked to see which caption and summary were most similar to one another so that I could match the corresponding image and poem. When I started this project, I hadn't considered the possibility that the images would match more than one love poem, but several images matched multiple poems. Coming in at the top of the love match (i.e., determined by my computer to be the best match for the most love poems) is Two Studies of a Man:
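Finding the "top of the love match" amounts to taking each poem's highest-scoring image and counting how often each image wins. Here is a sketch under the assumption that the similarity scores are stored as a nested dictionary, `{poem: {image: similarity}}`; that layout and the function names are illustrative.

```python
from collections import Counter


def top_matches(scores):
    """Map each poem to its highest-scoring image."""
    return {poem: max(by_image, key=by_image.get)
            for poem, by_image in scores.items()}


def most_matched_images(scores):
    """Rank images by how many poems chose them as their top match."""
    return Counter(top_matches(scores).values()).most_common()
```

The first entry of `most_matched_images(scores)` is the image that the most poems picked as their best match.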

Two Studies of a Man by Francis William Edmonds

Followed by an evocative sketch by Thomas Sully, which was selected by my computer as the next best match for love poems:

Various Figure Sketches, Including Two Battling Equestrians, Two Wrestlers, Thomas Sully

I wanted each poem to have a unique image. When an image matched multiple poems, I assigned it to the poem with the highest similarity score. E. E. Cummings's Amores (IX) 'won' the Two Studies of a Man image this way, while Anne Marie Macari received Sully's sketch for From the Plane (a little under half the poems were assigned their top match and about two-thirds to an image in their top five). And Cupid, that bear of a male, was (alas!) not assigned at all.
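One way to resolve such ties is a greedy pass over all poem-image pairs, from the highest similarity down. The sketch below is an interpretation of the rule described here, not the exact procedure used. It assumes the scores are stored as `{poem: {image: similarity}}`, and the function name is hypothetical.

```python
def assign_unique_images(scores):
    """Greedily give each poem a distinct image.

    Pairs are visited from the highest similarity down; a pair is
    accepted only if both the poem and the image are still unassigned.
    """
    pairs = sorted(
        ((sim, poem, image)
         for poem, by_image in scores.items()
         for image, sim in by_image.items()),
        key=lambda t: t[0],
        reverse=True,
    )
    assignment, used_images = {}, set()
    for sim, poem, image in pairs:
        if poem in assignment or image in used_images:
            continue
        assignment[poem] = image
        used_images.add(image)
    return assignment
```

A poem that loses its favorite image falls through to its next-best free one, and any image that never comes up as the best remaining option for a poem is simply left unassigned.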

See more poems and images in the Love Carousels

Further reading

Miller, Derek. "Leveraging BERT for Extractive Text Summarization on Lectures." arXiv preprint arXiv:1906.04165 (2019).
Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." arXiv preprint arXiv:1908.10084 (2019).