Generating Text with an RNN [Keras]

I didn't manage to get to my homework today either. Tomorrow is the last day of the long weekend, but I'm still going to keep at it today. This time I'll generate text with an RNN. That said, since I'm doing it in English, it's a bit hard for me to judge how correct the output really is.

# -*- coding: utf-8 -*-
from __future__ import print_function

import numpy as np
from keras.layers import Dense, Activation, SimpleRNN
from keras.models import Sequential
import codecs

INPUT_FILE = "C:\\Users\\admin\\log\\11-0.txt"
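# 11-0.txt is the Project Gutenberg plain-text edition of
# "Alice's Adventures in Wonderland" (as the generated output below shows)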

# extract the input as a stream of characters
print("Extracting text from input...")
with codecs.open(INPUT_FILE, "r", encoding="utf-8") as f:
    lines = [line.strip().lower() for line in f
             if len(line.strip()) != 0]
    text = " ".join(lines)

# create lookup tables
# chars is our character "vocabulary"; nb_chars is the number of
# distinct characters, i.e. the size of each one-hot vector
chars = set(text)
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))
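# e.g. char2index maps each character to an integer index; for lowercased
# English text plus punctuation, nb_chars is typically a few dozen symbols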

# create inputs and labels from the text. We do this by stepping
# through the text ${step} character at a time, and extracting a
# sequence of size ${seqlen} and the next output char. For example,
# assuming an input text "The sky was falling", we would get the
# following sequence of input_chars and label_chars (first 5 only)
#   The sky wa -> s
#   he sky was -> " " (the next character is a space)
#   e sky was  -> f
#    sky was f -> a
#   sky was fa -> l
print("Creating input and label text...")
SEQLEN = 10
STEP = 1

input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])
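# with STEP = 1 this yields len(text) - SEQLEN samples (162,739 for this corpus)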

# vectorize the input and label chars
# Each row of the input is represented by seqlen characters, each
# represented as a 1-hot encoding of size len(char). There are
# len(input_chars) such rows, so shape(X) is (len(input_chars),
# seqlen, nb_chars).
# Each row of output is a single character, also represented as a
# dense encoding of size len(char). Hence shape(y) is (len(input_chars),
# nb_chars).
print("Vectorizing input and label text...")
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1

# Build the model. We use a single RNN with a fully connected layer
# to compute the most likely predicted output char
HIDDEN_SIZE = 128
BATCH_SIZE = 128
NUM_ITERATIONS = 25
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100

model = Sequential()
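# return_sequences=False: only the final hidden state feeds the Dense layer;
# unroll=True unrolls the recurrence, trading memory for speed on short sequences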
model.add(SimpleRNN(HIDDEN_SIZE, return_sequences=False,
                    input_shape=(SEQLEN, nb_chars),
                    unroll=True))
model.add(Dense(nb_chars))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

# We train the model in batches and test output generated at each step
for iteration in range(NUM_ITERATIONS):
    print("=" * 50)
    print("Iteration #: {}".format(iteration))
    model.fit(X, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)

    # testing model
    # randomly choose a row from input_chars, then use it to
    # generate text from model for next 100 chars
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating from seed: {}".format(test_chars))
    print(test_chars, end="")
    for i in range(NUM_PREDS_PER_EPOCH):
        Xtest = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(test_chars):
            Xtest[0, j, char2index[ch]] = 1
        pred = model.predict(Xtest, verbose=0)[0]
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move forward with test_chars + ypred
        test_chars = test_chars[1:] + ypred
    print()
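
As an aside, the generation loop at the end of the script can be factored into a small helper so the trained model can be reused with arbitrary seeds. Below is a minimal sketch reusing the variables defined above (generate_text is my own name for illustration, not part of the original code):

def generate_text(model, seed, num_chars):
    # seed must be exactly SEQLEN characters from the training vocabulary
    generated = seed
    window = seed
    for _ in range(num_chars):
        # one-hot encode the current SEQLEN-character window
        Xtest = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(window):
            Xtest[0, j, char2index[ch]] = 1
        pred = model.predict(Xtest, verbose=0)[0]
        next_char = index2char[np.argmax(pred)]
        generated += next_char
        # slide the window one character to the right
        window = window[1:] + next_char
    return generated

# usage: print(generate_text(model, input_chars[0], 100))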

Execution results

Extracting text from input...
Creating input and label text...
Vectorizing input and label text...
==================================================
Iteration #: 0
Epoch 1/1
162739/162739 [==============================] - 9s 55us/step - loss: 2.3730
Generating from seed: as gone, a
as gone, and the the the the the the the the the the the the the the the the the the the the the the the the t
==================================================
Iteration #: 1
Epoch 1/1
162739/162739 [==============================] - 9s 56us/step - loss: 2.0644
Generating from seed: ing, the q
ing, the queen the har she for she for she for she for she for she for she for she for she for she for she for
==================================================
Iteration #: 2
Epoch 1/1
162739/162739 [==============================] - 10s 60us/step - loss: 1.9566
Generating from seed: so i shoul
so i should the was an the was an the was an the was an the was an the was an the was an the was an the was an
==================================================
Iteration #: 3
Epoch 1/1
162739/162739 [==============================] - 9s 58us/step - loss: 1.8721
Generating from seed: of her fav
of her fave and the say har she said the doong to the pooked and the say har she said the doong to the pooked 
==================================================
Iteration #: 4
Epoch 1/1
162739/162739 [==============================] - 9s 58us/step - loss: 1.8049
Generating from seed: ole pack o
ole pack out of the said the was the was the was the was the was the was the was the was the was the was the w
==================================================
Iteration #: 5
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.7491
Generating from seed: d a vague 
d a vague to see the said the doon and the said the doon and the said the doon and the said the doon and the s
==================================================
Iteration #: 6
Epoch 1/1
162739/162739 [==============================] - 10s 61us/step - loss: 1.7034
Generating from seed: can;--but 
can;--but it was the dore of the the say and when the dormouse was the dore of the the say and when the dormou
==================================================
Iteration #: 7
Epoch 1/1
162739/162739 [==============================] - 10s 61us/step - loss: 1.6628
Generating from seed: cense.  1.
cense.  1.e.  ‘which was it was the project gutenberg-tm electronic works in a little began the was the projec
==================================================
Iteration #: 8
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.6296
Generating from seed: e a secure
e a secures and she can the grope a growing the gryphon the growes and the gryphon the growes and the gryphon 
==================================================
Iteration #: 9
Epoch 1/1
162739/162739 [==============================] - 10s 63us/step - loss: 1.6003
Generating from seed:  thing i a
 thing i and and all the was and she said the dormouse she had for the ward alice a look at the rabbit harder 
==================================================
Iteration #: 10
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.5755
Generating from seed:  call it s
 call it she went on an the court in a monether she can the hatter was she can the hatter was she can the hatt
==================================================
Iteration #: 11
Epoch 1/1
162739/162739 [==============================] - 10s 64us/step - loss: 1.5533
Generating from seed: at’s all t
at’s all the same a down and the dittle beanther a look to the choored at the mouse she was she cat all the fi
==================================================
Iteration #: 12
Epoch 1/1
162739/162739 [==============================] - 11s 65us/step - loss: 1.5351
Generating from seed: aw alice. 
aw alice. ‘i could be no mary dore or a little beat had not and be not lear the was of the pare of the same as
==================================================
Iteration #: 13
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.5164
Generating from seed: ou our cat
ou our cat a little she said to herself and the project gutenberg-tm electronic works in a little she said to 
==================================================
Iteration #: 14
Epoch 1/1
162739/162739 [==============================] - 10s 61us/step - loss: 1.5011
Generating from seed: her alarme
her alarmed in the dormouse the office the dormouse the office the dormouse the office the dormouse the office
==================================================
Iteration #: 15
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.4859
Generating from seed: ‘shan’t,’ 
‘shan’t,’ said the caterpillar to the court in the court in the court in the court in the court in the court i
==================================================
Iteration #: 16
Epoch 1/1
162739/162739 [==============================] - 11s 65us/step - loss: 1.4737
Generating from seed: e last con
e last confuring the project gutenberg-tm electronic works in a mare or a little bean to the queen of the mous
==================================================
Iteration #: 17
Epoch 1/1
162739/162739 [==============================] - 11s 67us/step - loss: 1.4626
Generating from seed: st at firs
st at first the was a long as it was got the caterpillar with the caterpillar with the caterpillar with the ca
==================================================
Iteration #: 18
Epoch 1/1
162739/162739 [==============================] - 10s 64us/step - loss: 1.4519
Generating from seed: anxiously 
anxiously at the mock turtle sure to herself the king of the mock turtle sure to herself the king of the mock 
==================================================
Iteration #: 19
Epoch 1/1
162739/162739 [==============================] - 11s 67us/step - loss: 1.4414
Generating from seed: ith no oth
ith no other look at the was the was the was the was the was the was the was the was the was the was the was t
==================================================
Iteration #: 20
Epoch 1/1
162739/162739 [==============================] - 11s 68us/step - loss: 1.4332
Generating from seed:  the white
 the white rabbit hard alice the thing it so election in an a moraly and alice was a little be the way was a l
==================================================
Iteration #: 21
Epoch 1/1
162739/162739 [==============================] - 11s 65us/step - loss: 1.4251
Generating from seed: t on plann
t on planned and alice a little she said to herself ‘what is a grow in a come to got be a looking and alice a 
==================================================
Iteration #: 22
Epoch 1/1
162739/162739 [==============================] - 10s 59us/step - loss: 1.4172
Generating from seed:  the hatte
 the hatter was a little book the read the read the read the read the read the read the read the read the read
==================================================
Iteration #: 23
Epoch 1/1
162739/162739 [==============================] - 9s 57us/step - loss: 1.4103
Generating from seed: are went ‘
are went ‘and they went on, ‘i peniest it, and they went on, ‘i peniest it, and they went on, ‘i peniest it, a
==================================================
Iteration #: 24
Epoch 1/1
162739/162739 [==============================] - 9s 57us/step - loss: 1.4035
Generating from seed:  the hatte
 the hatter was the look of the state her full she had not a large and making of the mock turtle she had not a

In the first iteration the model mostly just repeated "the", but by the end it was producing passably sentence-like text. Grammatically it still has a long way to go, though, as the final sample ending in "a" shows. Incidentally, I also got reasonably sentence-like output when training on Japanese text, so I'll just note the result here: 【歓喜した。 トイレは歓喜した。】 (roughly, "It rejoiced. The toilet rejoiced."). That's all for today. It came out as a rather poetic haiku, of sorts.
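
The repetitive loops ("the the the", "she for she for") are partly an artifact of greedy decoding: always taking the argmax locks the model into its highest-frequency patterns. A common remedy, not used in the script above, is to sample the next character from the softmax distribution with a temperature. A minimal sketch (sample_char is a hypothetical helper):

def sample_char(pred, temperature=0.5):
    # rescale the softmax output: low temperature approaches argmax,
    # high temperature approaches a uniform random choice
    pred = np.asarray(pred).astype("float64")
    pred = np.log(pred + 1e-8) / temperature
    pred = np.exp(pred) / np.sum(np.exp(pred))
    # draw one character index from the rescaled distribution
    return index2char[np.argmax(np.random.multinomial(1, pred))]

# drop-in replacement for the argmax line in the generation loop:
#     ypred = sample_char(pred, temperature=0.5)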

System Development

Posted by @erestage