
AI Case Study

The XiaoIce chatbot automatically generates poems based on images using neural networks

Microsoft's XiaoIce chatbot platform can now generate poetry from an image. Convolutional neural networks extract keywords describing the objects and sentiments in the image, and recurrent (LSTM) neural networks then generate modern poetry from those keywords. So far, users have generated 12 million poems this way.


Consumer Goods And Services

Entertainment And Sports

Project Overview

"Given an image, we first extract a few keywords representing objects and sentiments perceived from the image. These keywords are then expanded to related ones based on their associations in human written poems. Finally, verses are generated gradually from the keywords using recurrent neural networks trained on existing [...]"

Reported Results

The system was deployed on XiaoIce in 2017, and within a year 12 million poems had been generated. A book of 139 generated poems was published in 2017.


"For the image query, object and sentiment detection are used to extract appropriate nouns, such as city and street, and adjectives, such as busy, as initial keyword set.... We propose detecting objects and sentiments from each image with two parallel convolutional neural networks (CNN), which share the same network architecture but with different parameters. Specifically, one network learns to describe objects by the output of noun words, and the other learns to understand the sentiments by the output of adjective words. The two CNNs are pre-trained on ImageNet and fine-tuned on noun and adjective categories, respectively.

After filtering out words with low confidence and rare words, keyword expansion will be applied to construct a keyword set. Next, each keyword is regarded as an initial seed for each sentence in the poem generation process. For example, the first sentence is generated from the seed city. A hierarchical recurrent neural network is proposed for modeling the structure between words and between sentences. We follow the recurrent neural network language model (RNNLM) to predict text sequence. Each word is predicted sequentially by the previous word sequence...
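The filtering and expansion step might look roughly like the sketch below. The thresholds (`min_conf`, `min_freq`), the use of within-poem co-occurrence counts as the measure of "association in human written poems", and the greedy one-word-at-a-time expansion are assumptions for illustration, not details given in the source.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(poems):
    """Count, for each ordered word pair, how many poems contain both."""
    pairs = Counter()
    for poem in poems:
        for a, b in combinations(sorted(set(poem)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def build_keyword_set(detections, poems, num_sentences, min_conf=0.3, min_freq=2):
    """Filter detected words, then expand them via poem co-occurrence.

    detections: {word: confidence} from the image detectors.
    min_conf / min_freq: hypothetical thresholds for "low confidence"
    and "rare" words. Returns one seed keyword per poem sentence.
    """
    doc_freq = Counter(w for poem in poems for w in set(poem))
    ranked = sorted(detections.items(), key=lambda kv: kv[1], reverse=True)
    keywords = [w for w, conf in ranked
                if conf >= min_conf and doc_freq[w] >= min_freq]
    pairs = cooccurrence(poems)
    # Greedy expansion: add the word most associated with the current set.
    while len(keywords) < num_sentences:
        scores = Counter()
        for (a, b), n in pairs.items():
            if a in keywords and b not in keywords:
                scores[b] += n
        if not scores:
            break  # nothing left to expand with
        keywords.append(scores.most_common(1)[0][0])
    return keywords[:num_sentences]


# Usage with a made-up corpus (each poem as a list of words).
poems = [
    ["city", "street", "night", "light"],
    ["city", "night", "rain"],
    ["street", "night", "busy"],
    ["tree", "river"],
    ["busy", "city", "crowd"],
]
detections = {"city": 0.9, "busy": 0.6, "tree": 0.4, "rain": 0.1}
seeds = build_keyword_set(detections, poems, num_sentences=4)
```

With this toy corpus, `city` and `busy` survive filtering, `tree` is dropped as rare, `rain` is dropped for low confidence, and expansion adds `night` and then `street`, giving one seed per sentence of a four-line poem.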

Finally, we apply a fluency checker to automatically detect low-quality sentences early on and re-generate them. We use Long Short-Term Memory (LSTM) for RNN... In the poetry generation model, the recurrent hidden layers for the sentence level and poem level both contain 3 layers and 1024 hidden units for each layer. The sentence encoder dimensionality is 64."
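The sequential word prediction and the regenerate-on-failure loop can be sketched as follows. This is not the XiaoIce model: a bigram count model replaces the hierarchical LSTM (it conditions only on the previous word rather than the whole prefix), and the fluency check, an average log-probability threshold, is a hypothetical rule since the source does not describe how its checker works. Only the sizes in `PoemModelConfig` come from the text above.

```python
import math
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PoemModelConfig:
    """Sizes reported in the text for the (LSTM-based) generation model."""
    sentence_level_layers: int = 3
    poem_level_layers: int = 3
    hidden_units_per_layer: int = 1024
    sentence_encoder_dim: int = 64


class BigramLM:
    """Toy stand-in for the RNNLM: P(next word | previous word) from counts.

    An LSTM conditions on the whole prefix; this model sees only the last
    word, but words are still predicted one at a time, left to right.
    """

    def __init__(self, sentences):
        self.counts = defaultdict(Counter)
        for s in sentences:
            for prev, nxt in zip(s, s[1:] + ["</s>"]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

    def sample_next(self, prev, rng):
        options = self.counts[prev]
        if not options:
            return "</s>"  # unseen word: end the sentence
        words, weights = zip(*options.items())
        return rng.choices(words, weights=weights)[0]


def generate_sentence(lm, seed_word, rng, max_len=8):
    """Start from the seed keyword and append predicted words one by one."""
    words = [seed_word]
    while len(words) < max_len:
        nxt = lm.sample_next(words[-1], rng)
        if nxt == "</s>":
            break
        words.append(nxt)
    return words


def is_fluent(lm, words, min_avg_logprob=-1.5):
    """Hypothetical fluency check: average log-probability of each transition."""
    steps = zip(words, words[1:] + ["</s>"])
    logps = [math.log(max(lm.prob(a, b), 1e-6)) for a, b in steps]
    return sum(logps) / len(logps) >= min_avg_logprob


def generate_poem(lm, keywords, seed=0, max_attempts=10):
    """One sentence per keyword; re-generate sentences that fail the check."""
    rng = random.Random(seed)
    poem = []
    for keyword in keywords:
        for _ in range(max_attempts):
            sentence = generate_sentence(lm, keyword, rng)
            if is_fluent(lm, sentence):
                break
        poem.append(sentence)  # falls back to the last attempt if none passed
    return poem


corpus = [["city", "lights", "never", "sleep"], ["night", "rain", "falls", "softly"]]
poem = generate_poem(BigramLM(corpus), ["city", "night"])
```

Checking each sentence as soon as it is produced, rather than scoring the finished poem, matches the quote's "detect low quality sentences early on": a bad line costs one re-generation instead of restarting the whole poem.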



General Operations


"Generating poetry from images is a special task of text generation from images. There have been many studies in this area. However, most of them focus on image captioning rather than literary creation. Only a few previous systems have addressed the problem of generating poems from images. There have also been many studies and systems for generating poetry. In most cases, a system is provided with a few keywords and is required to compose a poem containing or relating to the keywords."



"As a training corpus, we collect 2,027 modern Chinese poems, composed of 45,729 sentences, from [...]. The character vocabulary size is 4,547. For training the word-based model, word segmentation is applied to the corpus. The word vocabulary size is 54,318.
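Counting the two vocabularies reported above can be sketched like this. The source does not say which segmenter was used; forward maximum matching against a hand-written dictionary is swapped in here purely for illustration, and the two sample lines and the dictionary are made up.

```python
from collections import Counter


def fmm_segment(sentence, dictionary, max_word_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def build_vocabularies(sentences, dictionary):
    """Character-level and word-level token counts for a corpus."""
    char_vocab = Counter(ch for s in sentences for ch in s)
    word_vocab = Counter(w for s in sentences for w in fmm_segment(s, dictionary))
    return char_vocab, word_vocab


# Made-up sample lines and dictionary, for illustration only.
lines = ["城市的街道", "繁忙的城市"]
dictionary = {"城市", "街道", "繁忙"}
char_vocab, word_vocab = build_vocabularies(lines, dictionary)
print(len(char_vocab), len(word_vocab))  # 7 4
```

On the real corpus the word vocabulary (54,318) is far larger than the character vocabulary (4,547), because a few thousand characters combine into many distinct words; the toy example is too small to show that.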

For the model optimization experiment, 100 public domain images are crawled from Bing image search by searching 60 randomly sampled nouns and adjectives in our predefined categories. We focus on 45 images recognized as views for optimizing our model."
