Trying Text Generation with an RNN [Keras]
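I couldn't get to my homework again today. Tomorrow is the last day of the long weekend, but I'm still going to keep at it today. This time I'll generate text with an RNN. That said, I'm doing it in English, so it's somewhat hard for me to judge whether the output is correct.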
# -*- coding: utf-8 -*-
from __future__ import print_function
import codecs

import numpy as np
from keras.layers import Dense, Activation, SimpleRNN
from keras.models import Sequential

INPUT_FILE = "C:\\Users\\admin\\log\\11-0.txt"

# extract the input as a stream of characters
# note: the len(line) != 0 filter runs before strip(), and lines read
# from a file keep their newline, so blank lines are kept as empty strings
print("Extracting text from input...")
with codecs.open(INPUT_FILE, "r", encoding="utf-8") as f:
    lines = [line.strip().lower() for line in f if len(line) != 0]
text = " ".join(lines)

# creating lookup tables
# Here nb_chars is the number of features in our character "vocabulary"
chars = set(text)
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))

# create inputs and labels from the text. We do this by stepping
# through the text ${step} character at a time, and extracting a
# sequence of size ${seqlen} and the next output char. For example,
# assuming an input text "The sky was falling", we would get the
# following sequence of input_chars and label_chars (first 5 only)
#   "The sky wa" -> "s"
#   "he sky was" -> " "
#   "e sky was " -> "f"
#   " sky was f" -> "a"
#   "sky was fa" -> "l"
print("Creating input and label text...")
SEQLEN = 10
STEP = 1

input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])

# vectorize the input and label chars
# Each row of the input is represented by seqlen characters, each
# represented as a 1-hot encoding of size len(chars). There are
# len(input_chars) such rows, so shape(X) is (len(input_chars),
# seqlen, nb_chars).
# Each row of output is a single character, also represented as a
# 1-hot encoding of size len(chars). Hence shape(y) is
# (len(input_chars), nb_chars).
print("Vectorizing input and label text...")
# the np.bool alias was removed in NumPy 1.24; the builtin bool is equivalent
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1

# Build the model. We use a single RNN with a fully connected layer
# to compute the most likely predicted output char
HIDDEN_SIZE = 128
BATCH_SIZE = 128
NUM_ITERATIONS = 25
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100

model = Sequential()
model.add(SimpleRNN(HIDDEN_SIZE, return_sequences=False,
                    input_shape=(SEQLEN, nb_chars),
                    unroll=True))
model.add(Dense(nb_chars))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

# We train the model in batches and test output generated at each step
for iteration in range(NUM_ITERATIONS):
    print("=" * 50)
    print("Iteration #: {}".format(iteration))
    model.fit(X, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)

    # testing model
    # randomly choose a row from input_chars, then use it to
    # generate text from model for next 100 chars
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating from seed: {}".format(test_chars))
    print(test_chars, end="")
    for i in range(NUM_PREDS_PER_EPOCH):
        Xtest = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(test_chars):
            Xtest[0, j, char2index[ch]] = 1
        pred = model.predict(Xtest, verbose=0)[0]
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move forward with test_chars + ypred
        test_chars = test_chars[1:] + ypred
    print()
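To make the windowing and one-hot steps concrete, here is a small sketch that runs them on the example string from the code comments rather than on the real input file. It needs only NumPy. The use of `sorted(set(text))` is an assumption of this sketch (it just makes the indices reproducible); the script itself uses an unsorted `set`.

```python
import numpy as np

text = "The sky was falling"
SEQLEN, STEP = 10, 1

# slide a SEQLEN-character window over the text; the label is the next char
starts = range(0, len(text) - SEQLEN, STEP)
input_chars = [text[i:i + SEQLEN] for i in starts]
label_chars = [text[i + SEQLEN] for i in starts]

# sorted() is only here so the char -> index mapping is reproducible
chars = sorted(set(text))
char2index = {c: i for i, c in enumerate(chars)}

# one-hot encode: X is (windows, SEQLEN, vocab), y is (windows, vocab)
X = np.zeros((len(input_chars), SEQLEN, len(chars)), dtype=bool)
y = np.zeros((len(input_chars), len(chars)), dtype=bool)
for i, window in enumerate(input_chars):
    for j, ch in enumerate(window):
        X[i, j, char2index[ch]] = True
    y[i, char2index[label_chars[i]]] = True

for window, label in zip(input_chars[:5], label_chars[:5]):
    print(repr(window), "->", repr(label))
print(X.shape, y.shape)
```

The 19-character string yields nine windows over a 14-character vocabulary, so `X` has shape (9, 10, 14) and `y` has shape (9, 14); the real script does the same thing over the whole book.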
Output
Extracting text from input...
Creating input and label text...
Vectorizing input and label text...
==================================================
Iteration #: 0
Epoch 1/1
162739/162739 [==============================] - 9s 55us/step - loss: 2.3730
Generating from seed: as gone, a
as gone, and the the the the the the the the the the the the the the the the the the the the the the the the t
==================================================
Iteration #: 1
Epoch 1/1
162739/162739 [==============================] - 9s 56us/step - loss: 2.0644
Generating from seed: ing, the q
ing, the queen the har she for she for she for she for she for she for she for she for she for she for she for
==================================================
Iteration #: 2
Epoch 1/1
162739/162739 [==============================] - 10s 60us/step - loss: 1.9566
Generating from seed: so i shoul
so i should the was an the was an the was an the was an the was an the was an the was an the was an the was an
==================================================
Iteration #: 3
Epoch 1/1
162739/162739 [==============================] - 9s 58us/step - loss: 1.8721
Generating from seed: of her fav
of her fave and the say har she said the doong to the pooked and the say har she said the doong to the pooked
==================================================
Iteration #: 4
Epoch 1/1
162739/162739 [==============================] - 9s 58us/step - loss: 1.8049
Generating from seed: ole pack o
ole pack out of the said the was the was the was the was the was the was the was the was the was the was the w
==================================================
Iteration #: 5
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.7491
Generating from seed: d a vague
d a vague to see the said the doon and the said the doon and the said the doon and the said the doon and the s
==================================================
Iteration #: 6
Epoch 1/1
162739/162739 [==============================] - 10s 61us/step - loss: 1.7034
Generating from seed: can;--but
can;--but it was the dore of the the say and when the dormouse was the dore of the the say and when the dormou
==================================================
Iteration #: 7
Epoch 1/1
162739/162739 [==============================] - 10s 61us/step - loss: 1.6628
Generating from seed: cense. 1.
cense. 1.e. ‘which was it was the project gutenberg-tm electronic works in a little began the was the projec
==================================================
Iteration #: 8
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.6296
Generating from seed: e a secure
e a secures and she can the grope a growing the gryphon the growes and the gryphon the growes and the gryphon
==================================================
Iteration #: 9
Epoch 1/1
162739/162739 [==============================] - 10s 63us/step - loss: 1.6003
Generating from seed: thing i a
thing i and and all the was and she said the dormouse she had for the ward alice a look at the rabbit harder
==================================================
Iteration #: 10
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.5755
Generating from seed: call it s
call it she went on an the court in a monether she can the hatter was she can the hatter was she can the hatt
==================================================
Iteration #: 11
Epoch 1/1
162739/162739 [==============================] - 10s 64us/step - loss: 1.5533
Generating from seed: at’s all t
at’s all the same a down and the dittle beanther a look to the choored at the mouse she was she cat all the fi
==================================================
Iteration #: 12
Epoch 1/1
162739/162739 [==============================] - 11s 65us/step - loss: 1.5351
Generating from seed: aw alice.
aw alice. ‘i could be no mary dore or a little beat had not and be not lear the was of the pare of the same as
==================================================
Iteration #: 13
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.5164
Generating from seed: ou our cat
ou our cat a little she said to herself and the project gutenberg-tm electronic works in a little she said to
==================================================
Iteration #: 14
Epoch 1/1
162739/162739 [==============================] - 10s 61us/step - loss: 1.5011
Generating from seed: her alarme
her alarmed in the dormouse the office the dormouse the office the dormouse the office the dormouse the office
==================================================
Iteration #: 15
Epoch 1/1
162739/162739 [==============================] - 10s 62us/step - loss: 1.4859
Generating from seed: ‘shan’t,’
‘shan’t,’ said the caterpillar to the court in the court in the court in the court in the court in the court i
==================================================
Iteration #: 16
Epoch 1/1
162739/162739 [==============================] - 11s 65us/step - loss: 1.4737
Generating from seed: e last con
e last confuring the project gutenberg-tm electronic works in a mare or a little bean to the queen of the mous
==================================================
Iteration #: 17
Epoch 1/1
162739/162739 [==============================] - 11s 67us/step - loss: 1.4626
Generating from seed: st at firs
st at first the was a long as it was got the caterpillar with the caterpillar with the caterpillar with the ca
==================================================
Iteration #: 18
Epoch 1/1
162739/162739 [==============================] - 10s 64us/step - loss: 1.4519
Generating from seed: anxiously
anxiously at the mock turtle sure to herself the king of the mock turtle sure to herself the king of the mock
==================================================
Iteration #: 19
Epoch 1/1
162739/162739 [==============================] - 11s 67us/step - loss: 1.4414
Generating from seed: ith no oth
ith no other look at the was the was the was the was the was the was the was the was the was the was the was t
==================================================
Iteration #: 20
Epoch 1/1
162739/162739 [==============================] - 11s 68us/step - loss: 1.4332
Generating from seed: the white
the white rabbit hard alice the thing it so election in an a moraly and alice was a little be the way was a l
==================================================
Iteration #: 21
Epoch 1/1
162739/162739 [==============================] - 11s 65us/step - loss: 1.4251
Generating from seed: t on plann
t on planned and alice a little she said to herself ‘what is a grow in a come to got be a looking and alice a
==================================================
Iteration #: 22
Epoch 1/1
162739/162739 [==============================] - 10s 59us/step - loss: 1.4172
Generating from seed: the hatte
the hatter was a little book the read the read the read the read the read the read the read the read the read
==================================================
Iteration #: 23
Epoch 1/1
162739/162739 [==============================] - 9s 57us/step - loss: 1.4103
Generating from seed: are went ‘
are went ‘and they went on, ‘i peniest it, and they went on, ‘i peniest it, and they went on, ‘i peniest it, a
==================================================
Iteration #: 24
Epoch 1/1
162739/162739 [==============================] - 9s 57us/step - loss: 1.4035
Generating from seed: the hatte
the hatter was the look of the state her full she had not a large and making of the mock turtle she had not a
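The first iteration said nothing but "the", but by the end the output looks reasonably sentence-like. It does end on a dangling "a" (generation simply stops after 100 characters), and grammatically it still has a long way to go. Incidentally, the same approach also produced passable text in Japanese, so here is just the result: 【歓喜した。 トイレは
歓喜した。】 (roughly, "Rejoiced. The toilet / rejoiced."). That's all. It turned out to be a rather tasteful haiku(?).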
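A note on the repeated phrases in the English output ("the was the was ..."): the script picks each next character with `np.argmax(pred)`, and the prediction depends only on the last `SEQLEN` characters, so once a 10-character window recurs the output has to cycle. One common alternative, which the script above does not use, is to sample from the predicted distribution with a temperature. The following is a minimal sketch under that assumption; `sample_index` and `temperature` are hypothetical names introduced here, not part of Keras or of the original code.

```python
import numpy as np


def sample_index(probs, temperature=1.0, rng=None):
    """Draw one index from a probability vector, reshaped by a temperature.

    temperature < 1 sharpens the distribution (closer to argmax);
    temperature > 1 flattens it (more random).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # work in log space; the small constant avoids log(0)
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()  # renormalize to a valid distribution
    return int(rng.choice(len(scaled), p=scaled))


# toy distribution standing in for model.predict(Xtest)[0]
pred = [0.1, 0.7, 0.2]
print(sample_index(pred, temperature=0.5, rng=np.random.default_rng(0)))
```

In the generation loop this would stand in for the argmax line, for example `ypred = index2char[sample_index(pred, temperature=0.5)]`. Whether it actually reads better on this model was not tested here.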