RNN(Vanilla), RNN(LSTM), RNN(GRU) Performance Test

[Figure: ANN vs. RNN(LSTM) vs. RNN(GRU) performance test, from the original post]

Comparison of English character recognition performance among RNN(Vanilla), RNN(LSTM), and RNN(GRU).

Hyungwon Yang
04.19.17
NAMZ Labs

Task

  • Compare the performance of three RNN models: the basic RNN provided by TensorFlow, an RNN with LSTM cells, and an RNN with GRU cells (see the cell sketch below).
  • Train each model on an English character-level dataset, then compare the three models on a test set that was not used during training.
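
The only difference between the three models is which recurrent cell is plugged into the network. A minimal sketch of the swap, using the standard TensorFlow 1.0 cell classes (the build_cell helper itself is hypothetical):

import tensorflow as tf

def build_cell(cell_type, num_units):
    # TensorFlow 1.0 cell classes; this choice is the only difference
    # between the three models compared in this report.
    if cell_type == 'rnn':
        return tf.contrib.rnn.BasicRNNCell(num_units)   # vanilla RNN
    elif cell_type == 'lstm':
        return tf.contrib.rnn.BasicLSTMCell(num_units)  # LSTM
    elif cell_type == 'gru':
        return tf.contrib.rnn.GRUCell(num_units)        # GRU
    raise ValueError('unknown cell type: ' + cell_type)

# Inputs follow the data layout used below: [batch, 20 time steps, 38 features].
x = tf.placeholder(tf.float32, [None, 20, 38])
outputs, state = tf.nn.dynamic_rnn(build_cell('gru', 200), x, dtype=tf.float32)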

Training Corpus

  • Project Gutenberg’s The Divine Comedy, Complete, by Dante Alighieri
  • This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org
  • Part of the corpus was extracted for training.
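
The pg8800 prefix on the data files below suggests Project Gutenberg EBook #8800. A minimal sketch of fetching the raw text, assuming Gutenberg's usual cache URL and header/footer markers (the actual extraction step is in the previous report):

import urllib.request

# EBook #8800: The Divine Comedy, Complete. URL pattern is an assumption.
URL = 'https://www.gutenberg.org/cache/epub/8800/pg8800.txt'
raw = urllib.request.urlopen(URL).read().decode('utf-8')

# Drop the Project Gutenberg header and footer (marker strings assumed).
start = raw.find('*** START OF')
end = raw.find('*** END OF')
text = raw[start:end]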

Experimental Setting

  • Python 3.5.3
  • TensorFlow 1.0.0
  • macOS Sierra 10.12.4

Data Preprocessing

  • The data were preprocessed as described in the previous report; a rough sketch follows.
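
For readers without the previous report at hand, a minimal sketch of the kind of character-level encoding the data shapes below imply (one-hot vectors over 38 characters, 20-step windows, next-character targets); the actual pipeline may differ:

import numpy as np

def make_char_dataset(text, time_step=20):
    # Index every distinct character (38 in this corpus).
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}

    # One-hot encode the whole text: [len(text), len(chars)].
    onehot = np.zeros((len(text), len(chars)), dtype=np.float32)
    for i, c in enumerate(text):
        onehot[i, char_to_id[c]] = 1.0

    # Non-overlapping time_step windows; the target window is shifted by one
    # character, so the model predicts the next character at every step.
    inputs, targets = [], []
    for i in range(0, len(text) - time_step - 1, time_step):
        inputs.append(onehot[i:i + time_step])
        targets.append(onehot[i + 1:i + time_step + 1])
    return np.array(inputs), np.array(targets)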

RNN(Vanilla) Training

  • The model was run three times, with 50, 100, and 200 hidden units; the detailed settings are as follows.
  • Settings
    1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
    2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
    3. Although it was not used in the main experiment, 20% of the training data (1,700 examples) was set aside as a validation set so that this report can show how accuracy changes: the validation accuracy (output characters predicted from input characters) is printed as the epochs progress.
    4. Parameters
      • Epoch: 200 (fixed)
      • Number of hidden layers: 1 (fixed)
      • Number of hidden units: 50, 100, 200
      • Learning rate: 0.001
      • Cost function: AdamOptimizer
import sys
# HY_python_NN absolute directory.
my_absdir = "/Users/hyungwonyang/Google_Drive/Python/HY_python_NN"
sys.path.append(my_absdir)

import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
rnn_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = rnn_data['train_input']
train_output = rnn_data['train_output']
test_input = rnn_data['test_input']
test_output = rnn_data['test_output']

# parameters
problem = 'classification' # classification, regression
rnnCell = 'rnn' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
dropout = 'off'
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

rnn_values = set.RNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

# Setting hidden layers: weightMatrix and biasMatrix
rnn_weightMatrix = rnn_values.genWeight()
rnn_biasMatrix = rnn_values.genBias()
rnn_input_x,rnn_input_y = rnn_values.genSymbol()

rnn_net = net.RNNModel(inputSymbol=rnn_input_x,
                        outputSymbol=rnn_input_y,
                        rnnCell=rnnCell,
                        problem=problem,
                        hiddenLayer=hiddenLayers,
                        trainEpoch=trainEpoch,
                        learningRate=learningRate,
                        learningRateDecay=learningRateDecay,
                        timeStep=timeStep,
                        batchSize=batchSize,
                        dropout=dropout,
                        validationCheck=validationCheck,
                        weightMatrix=rnn_weightMatrix,
                        biasMatrix=rnn_biasMatrix)

# Generate a RNN(vanilla) network.
rnn_net.genRNN()

########## RNN Setting #########
Task : classification
Cell Type : rnn
Hidden Layers : 1
Hidden Units : [200]
Train Epoch : 20
Learning Rate : 0.001
Time Steps : 20
Batch Size : 100
Drop Out : off
Validation : on
########## RNN Setting #########
RNN structure is generated.

# Train the RNN(vanilla) network.
# In this tutorial, we will run only 20 epochs.
rnn_net.trainRNN(train_input,train_output)

Activating training process.
Epoch: 1 / 20, Cost : 2.906381, Validation Accuracy: 34.53%
Epoch: 2 / 20, Cost : 2.269840, Validation Accuracy: 35.91%
Epoch: 3 / 20, Cost : 2.197584, Validation Accuracy: 36.31%
Epoch: 4 / 20, Cost : 2.168425, Validation Accuracy: 36.64%
Epoch: 5 / 20, Cost : 2.152344, Validation Accuracy: 36.72%
Epoch: 6 / 20, Cost : 2.142435, Validation Accuracy: 36.76%
Epoch: 7 / 20, Cost : 2.135697, Validation Accuracy: 36.86%
Epoch: 8 / 20, Cost : 2.130892, Validation Accuracy: 36.87%
Epoch: 9 / 20, Cost : 2.127237, Validation Accuracy: 36.84%
Epoch: 10 / 20, Cost : 2.124216, Validation Accuracy: 36.80%
Epoch: 11 / 20, Cost : 2.121692, Validation Accuracy: 36.79%
Epoch: 12 / 20, Cost : 2.119568, Validation Accuracy: 36.84%
Epoch: 13 / 20, Cost : 2.117731, Validation Accuracy: 36.86%
Epoch: 14 / 20, Cost : 2.116107, Validation Accuracy: 36.88%
Epoch: 15 / 20, Cost : 2.114656, Validation Accuracy: 36.94%
Epoch: 16 / 20, Cost : 2.113353, Validation Accuracy: 37.00%
Epoch: 17 / 20, Cost : 2.112178, Validation Accuracy: 37.01%
Epoch: 18 / 20, Cost : 2.111109, Validation Accuracy: 36.95%
Epoch: 19 / 20, Cost : 2.110125, Validation Accuracy: 36.98%
Epoch: 20 / 20, Cost : 2.109211, Validation Accuracy: 36.93%
The model has been trained successfully.

# Test the trained RNN(vanilla) network.
rnn_net.testRNN(test_input,test_output)

Activating Testing Process
Tested with 1650 datasets.
Test Accuracy: 37.44 %

# Save the trained parameters.
vars = rnn_net.getVariables()
# Terminate the session.
rnn_net.closeRNN()

RNN training session is terminated.

RNN(LSTM) Training

  • The model was run three times, with 50, 100, and 200 hidden units; the detailed settings are as follows.
  • Settings
    1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
    2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
    3. Although it was not used in the main experiment, 20% of the training data (1,700 examples) was set aside as a validation set so that this report can show how accuracy changes: the validation accuracy (output characters predicted from input characters) is printed as the epochs progress.
    4. Parameters
      • Epoch: 200 (fixed)
      • Number of hidden layers: 1 (fixed)
      • Number of hidden units: 50, 100, 200
      • Learning rate: 0.001
      • Cost function: AdamOptimizer
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
lstm_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = lstm_data['train_input']
train_output = lstm_data['train_output']
test_input = lstm_data['test_input']
test_output = lstm_data['test_output']

# parameters
problem = 'classification' # classification, regression
rnnCell = 'lstm' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
dropout = 'off'
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

lstm_values = set.RNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

# Setting hidden layers: weightMatrix and biasMatrix
lstm_weightMatrix = lstm_values.genWeight()
lstm_biasMatrix = lstm_values.genBias()
lstm_input_x,lstm_input_y = lstm_values.genSymbol()

lstm_net = net.RNNModel(inputSymbol=lstm_input_x,
                        outputSymbol=lstm_input_y,
                        rnnCell=rnnCell,
                        problem=problem,
                        hiddenLayer=hiddenLayers,
                        trainEpoch=trainEpoch,
                        learningRate=learningRate,
                        learningRateDecay=learningRateDecay,
                        timeStep=timeStep,
                        batchSize=batchSize,
                        dropout=dropout,
                        validationCheck=validationCheck,
                        weightMatrix=lstm_weightMatrix,
                        biasMatrix=lstm_biasMatrix)

# Generate a RNN(lstm) network.
lstm_net.genRNN()

########## RNN Setting #########
Task : classification
Cell Type : lstm
Hidden Layers : 1
Hidden Units : [200]
Train Epoch : 20
Learning Rate : 0.001
Time Steps : 20
Batch Size : 100
Drop Out : off
Validation : on
########## RNN Setting #########
RNN structure is generated.

# Train the RNN(lstm) network.
# In this tutorial, we will run only 20 epochs.
lstm_net.trainRNN(train_input,train_output)

Activating training process.
Epoch: 1 / 20, Cost : 2.871745, Validation Accuracy: 30.65%
Epoch: 2 / 20, Cost : 2.423226, Validation Accuracy: 33.72%
Epoch: 3 / 20, Cost : 2.288991, Validation Accuracy: 35.03%
Epoch: 4 / 20, Cost : 2.220830, Validation Accuracy: 35.94%
Epoch: 5 / 20, Cost : 2.174125, Validation Accuracy: 37.06%
Epoch: 6 / 20, Cost : 2.134481, Validation Accuracy: 37.87%
Epoch: 7 / 20, Cost : 2.098422, Validation Accuracy: 38.62%
Epoch: 8 / 20, Cost : 2.065490, Validation Accuracy: 39.25%
Epoch: 9 / 20, Cost : 2.035602, Validation Accuracy: 39.88%
Epoch: 10 / 20, Cost : 2.008691, Validation Accuracy: 40.51%
Epoch: 11 / 20, Cost : 1.984487, Validation Accuracy: 41.02%
Epoch: 12 / 20, Cost : 1.962478, Validation Accuracy: 41.44%
Epoch: 13 / 20, Cost : 1.942094, Validation Accuracy: 41.81%
Epoch: 14 / 20, Cost : 1.923390, Validation Accuracy: 42.08%
Epoch: 15 / 20, Cost : 1.905551, Validation Accuracy: 42.37%
Epoch: 16 / 20, Cost : 1.888492, Validation Accuracy: 42.68%
Epoch: 17 / 20, Cost : 1.872363, Validation Accuracy: 42.94%
Epoch: 18 / 20, Cost : 1.856971, Validation Accuracy: 43.18%
Epoch: 19 / 20, Cost : 1.842187, Validation Accuracy: 43.42%
Epoch: 20 / 20, Cost : 1.827953, Validation Accuracy: 43.59%
The model has been trained successfully.

# Test the trained RNN(lstm) network.
lstm_net.testRNN(test_input,test_output)

Activating Testing Process
Tested with 1650 datasets.
Test Accuracy: 45.55 %

# Save the trained parameters.
vars = lstm_net.getVariables()
# Terminate the session.
lstm_net.closeRNN()

RNN training session is terminated.

RNN(GRU) Training

  • The model was run three times, with 50, 100, and 200 hidden units; the detailed settings are as follows.
  • Settings
    1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
    2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
    3. Although it was not used in the main experiment, 20% of the training data (1,700 examples) was set aside as a validation set so that this report can show how accuracy changes: the validation accuracy (output characters predicted from input characters) is printed as the epochs progress.
    4. Parameters
      • Epoch: 200 (fixed)
      • Number of hidden layers: 1 (fixed)
      • Number of hidden units: 50, 100, 200
      • Learning rate: 0.001
      • Cost function: AdamOptimizer
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
gru_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = gru_data['train_input']
train_output = gru_data['train_output']
test_input = gru_data['test_input']
test_output = gru_data['test_output']

# parameters
problem = 'classification' # classification, regression
rnnCell = 'gru' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
dropout = 'off'
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

gru_values = set.RNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

# Setting hidden layers: weightMatrix and biasMatrix
gru_weightMatrix = gru_values.genWeight()
gru_biasMatrix = gru_values.genBias()
gru_input_x,gru_input_y = gru_values.genSymbol()

gru_net = net.RNNModel(inputSymbol=gru_input_x,
                        outputSymbol=gru_input_y,
                        rnnCell=rnnCell,
                        problem=problem,
                        hiddenLayer=hiddenLayers,
                        trainEpoch=trainEpoch,
                        learningRate=learningRate,
                        learningRateDecay=learningRateDecay,
                        timeStep=timeStep,
                        batchSize=batchSize,
                        dropout=dropout,
                        validationCheck=validationCheck,
                        weightMatrix=gru_weightMatrix,
                        biasMatrix=gru_biasMatrix)

# Generate a RNN(gru) network.
gru_net.genRNN()

########## RNN Setting #########
Task : classification
Cell Type : gru
Hidden Layers : 1
Hidden Units : [200]
Train Epoch : 20
Learning Rate : 0.001
Time Steps : 20
Batch Size : 100
Drop Out : off
Validation : on
########## RNN Setting #########
RNN structure is generated.

# Train the RNN(gru) network.
# In this tutorial, we will run only 20 epochs.
gru_net.trainRNN(train_input,train_output)

Activating training process.
Epoch: 1 / 20, Cost : 3.031808, Validation Accuracy: 29.38%
Epoch: 2 / 20, Cost : 2.442505, Validation Accuracy: 33.77%
Epoch: 3 / 20, Cost : 2.280064, Validation Accuracy: 35.53%
Epoch: 4 / 20, Cost : 2.195805, Validation Accuracy: 37.09%
Epoch: 5 / 20, Cost : 2.137235, Validation Accuracy: 38.18%
Epoch: 6 / 20, Cost : 2.088224, Validation Accuracy: 39.11%
Epoch: 7 / 20, Cost : 2.045377, Validation Accuracy: 39.92%
Epoch: 8 / 20, Cost : 2.008134, Validation Accuracy: 40.61%
Epoch: 9 / 20, Cost : 1.975544, Validation Accuracy: 41.13%
Epoch: 10 / 20, Cost : 1.946191, Validation Accuracy: 41.64%
Epoch: 11 / 20, Cost : 1.919675, Validation Accuracy: 42.24%
Epoch: 12 / 20, Cost : 1.896107, Validation Accuracy: 42.59%
Epoch: 13 / 20, Cost : 1.874857, Validation Accuracy: 42.95%
Epoch: 14 / 20, Cost : 1.855431, Validation Accuracy: 43.32%
Epoch: 15 / 20, Cost : 1.837488, Validation Accuracy: 43.69%
Epoch: 16 / 20, Cost : 1.820764, Validation Accuracy: 44.04%
Epoch: 17 / 20, Cost : 1.805057, Validation Accuracy: 44.39%
Epoch: 18 / 20, Cost : 1.790217, Validation Accuracy: 44.61%
Epoch: 19 / 20, Cost : 1.776120, Validation Accuracy: 44.77%
Epoch: 20 / 20, Cost : 1.762664, Validation Accuracy: 44.92%
The model has been trained successfully.

# Test the trained RNN(gru) network.
gru_net.testRNN(test_input,test_output)

Activating Testing Process
Tested with 1650 datasets.
Test Accuracy: 47.36 %

# Save the trained parameters.
vars = gru_net.getVariables()
# Terminate the session.
gru_net.closeRNN()

RNN training session is terminated.

Comments

  • The code above covers only the 200-hidden-unit case, but the full experiment was run with 50, 100, and 200 hidden units; the corresponding results appear in the table below.
  • To show how accuracy changes early in training, each model was trained for only 20 epochs in this code; in the actual experiments, each run was trained for a full 200 epochs.

Result

  1. Compared with the ANN results, where training failed regardless of the number of hidden layers, RNN(Vanilla), RNN(LSTM), and RNN(GRU) all train stably and show corresponding performance gains.
  2. As the table shows, RNN(LSTM) and RNN(GRU) perform similarly (72.59% and 70.89% with 200 hidden units, a gap of about 2%), and both outperform RNN(Vanilla) by a large margin of roughly 22%.
  3. Comparing RNN(LSTM) and RNN(GRU) directly, RNN(LSTM) came out about 2% ahead of RNN(GRU) in this experiment. However, given that recent papers have argued that GRU outperforms LSTM, it is worth watching how the two compare on other tasks.
  4. In terms of accuracy over training, RNN(Vanilla) fluctuated up and down unstably, while RNN(LSTM) and RNN(GRU) improved comparatively steadily.
Model          Hidden Units   Accuracy
RNN(Vanilla)         50        44.42%
RNN(Vanilla)        100        47.86%
RNN(Vanilla)        200        50.23%
RNN(LSTM)            50        49.76%
RNN(LSTM)           100        56.54%
RNN(LSTM)           200        72.59%
RNN(GRU)             50        49.68%
RNN(GRU)            100        55.75%
RNN(GRU)            200        70.89%

Github Code

  • You can reproduce this experiment by downloading the following GitHub code.
  • To run it in Jupyter, download the code above and set the absolute directory near the top of the notebook to your local copy of the folder (/your/path/to/HY_python_NN) before running. A wrapper loop for the full sweep is sketched below.
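
For convenience, a hypothetical wrapper loop over the three cell types and three hidden-unit sizes, reusing only the calls shown above (it assumes my_absdir is set and train_input, train_output, test_input, test_output are loaded as in the first code block):

import main.setvalues as set
import main.rnnnetworkmodels as net

for rnnCell in ['rnn', 'lstm', 'gru']:
    for units in [50, 100, 200]:
        values = set.RNNParam(inputData=train_input,
                              targetData=train_output,
                              timeStep=20,
                              hiddenUnits=[units])
        input_x, input_y = values.genSymbol()
        model = net.RNNModel(inputSymbol=input_x,
                             outputSymbol=input_y,
                             rnnCell=rnnCell,
                             problem='classification',
                             hiddenLayer=[units],
                             trainEpoch=200,  # the full 200 epochs of the report
                             learningRate=0.001,
                             learningRateDecay='off',
                             timeStep=20,
                             batchSize=100,
                             dropout='off',
                             validationCheck='on',
                             weightMatrix=values.genWeight(),
                             biasMatrix=values.genBias())
        model.genRNN()
        model.trainRNN(train_input, train_output)
        model.testRNN(test_input, test_output)
        model.closeRNN()
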
Hyungwon Yang
