RNN(Vanilla), RNN(LSTM), RNN(GRU) Performance Test

[Figure: ANN vs. RNN(LSTM) vs. RNN(GRU) performance test, from the original post]

Comparison of English character recognition performance among RNN(Vanilla), RNN(LSTM), and RNN(GRU).

Hyungwon Yang
04.19.17
NAMZ Labs

Task

  • Compare the performance of three RNN models: the basic RNN provided by TensorFlow, an RNN with LSTM cells, and an RNN with GRU cells (see the cell sketch below).
  • Train each model on an English character-level dataset, then compare the three models on a test set that was not used during training.
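
The only difference between the three models is which recurrent cell is plugged into the network. A minimal sketch of the swap, using the standard TensorFlow 1.0 cell classes (the build_cell helper itself is hypothetical):

import tensorflow as tf

def build_cell(cell_type, num_units):
    # TensorFlow 1.0 cell classes; this choice is the only difference
    # between the three models compared in this report.
    if cell_type == 'rnn':
        return tf.contrib.rnn.BasicRNNCell(num_units)   # vanilla RNN
    elif cell_type == 'lstm':
        return tf.contrib.rnn.BasicLSTMCell(num_units)  # LSTM
    elif cell_type == 'gru':
        return tf.contrib.rnn.GRUCell(num_units)        # GRU
    raise ValueError('unknown cell type: ' + cell_type)

# Inputs follow the data layout used below: [batch, 20 time steps, 38 features].
x = tf.placeholder(tf.float32, [None, 20, 38])
outputs, state = tf.nn.dynamic_rnn(build_cell('gru', 200), x, dtype=tf.float32)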

Training Corpus

  • Project Gutenberg’s The Divine Comedy, Complete, by Dante Alighieri
  • This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org
  • Part of the corpus was extracted for training.
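
The pg8800 prefix on the data files below suggests Project Gutenberg EBook #8800. A minimal sketch of fetching the raw text, assuming Gutenberg's usual cache URL and header/footer markers (the actual extraction step is in the previous report):

import urllib.request

# EBook #8800: The Divine Comedy, Complete. URL pattern is an assumption.
URL = 'https://www.gutenberg.org/cache/epub/8800/pg8800.txt'
raw = urllib.request.urlopen(URL).read().decode('utf-8')

# Drop the Project Gutenberg header and footer (marker strings assumed).
start = raw.find('*** START OF')
end = raw.find('*** END OF')
text = raw[start:end]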

Experimental Setting

  • Python 3.5.3
  • TensorFlow 1.0.0
  • macOS Sierra 10.12.4

Data Preprocessing

  • The data were preprocessed as described in the previous report; a rough sketch follows.
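
For readers without the previous report at hand, a minimal sketch of the kind of character-level encoding the data shapes below imply (one-hot vectors over 38 characters, 20-step windows, next-character targets); the actual pipeline may differ:

import numpy as np

def make_char_dataset(text, time_step=20):
    # Index every distinct character (38 in this corpus).
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}

    # One-hot encode the whole text: [len(text), len(chars)].
    onehot = np.zeros((len(text), len(chars)), dtype=np.float32)
    for i, c in enumerate(text):
        onehot[i, char_to_id[c]] = 1.0

    # Non-overlapping time_step windows; the target window is shifted by one
    # character, so the model predicts the next character at every step.
    inputs, targets = [], []
    for i in range(0, len(text) - time_step - 1, time_step):
        inputs.append(onehot[i:i + time_step])
        targets.append(onehot[i + 1:i + time_step + 1])
    return np.array(inputs), np.array(targets)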

RNN(Vanilla) Training

  • The model was run three times, with 50, 100, and 200 hidden units; the detailed settings are as follows.
  • Settings
    1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
    2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
    3. Although it was not used in the main experiment, 20% of the training data (1,700 examples) was set aside as a validation set so that this report can show how accuracy changes: the validation accuracy (output characters predicted from input characters) is printed as the epochs progress.
    4. Parameters
      • Epoch: 200 (fixed)
      • Number of hidden layers: 1 (fixed)
      • Number of hidden units: 50, 100, 200
      • Learning rate: 0.001
      • Cost function: AdamOptimizer
import sys
# HY_python_NN absolute directory.
my_absdir = "/Users/hyungwonyang/Google_Drive/Python/HY_python_NN"
sys.path.append(my_absdir)

import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
rnn_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = rnn_data['train_input']
train_output = rnn_data['train_output']
test_input = rnn_data['test_input']
test_output = rnn_data['test_output']

# parameters
problem = 'classification' # classification, regression
rnnCell = 'rnn' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
dropout = 'off'
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

rnn_values = set.RNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

# Setting hidden layers: weightMatrix and biasMatrix
rnn_weightMatrix = rnn_values.genWeight()
rnn_biasMatrix = rnn_values.genBias()
rnn_input_x,rnn_input_y = rnn_values.genSymbol()

rnn_net = net.RNNModel(inputSymbol=rnn_input_x,
                        outputSymbol=rnn_input_y,
                        rnnCell=rnnCell,
                        problem=problem,
                        hiddenLayer=hiddenLayers,
                        trainEpoch=trainEpoch,
                        learningRate=learningRate,
                        learningRateDecay=learningRateDecay,
                        timeStep=timeStep,
                        batchSize=batchSize,
                        dropout=dropout,
                        validationCheck=validationCheck,
                        weightMatrix=rnn_weightMatrix,
                        biasMatrix=rnn_biasMatrix)

# Generate a RNN(vanilla) network.
rnn_net.genRNN()

########## RNN Setting #########
Task : classification
Cell Type : rnn
Hidden Layers : 1
Hidden Units : [200]
Train Epoch : 20
Learning Rate : 0.001
Time Steps : 20
Batch Size : 100
Drop Out : off
Validation : on
########## RNN Setting #########
RNN structure is generated.

# Train the RNN(vanilla) network.
# In this tutorial, we will run only 20 epochs.
rnn_net.trainRNN(train_input,train_output)

Activating training process.
Epoch: 1 / 20, Cost : 2.906381, Validation Accuracy: 34.53%
Epoch: 2 / 20, Cost : 2.269840, Validation Accuracy: 35.91%
Epoch: 3 / 20, Cost : 2.197584, Validation Accuracy: 36.31%
Epoch: 4 / 20, Cost : 2.168425, Validation Accuracy: 36.64%
Epoch: 5 / 20, Cost : 2.152344, Validation Accuracy: 36.72%
Epoch: 6 / 20, Cost : 2.142435, Validation Accuracy: 36.76%
Epoch: 7 / 20, Cost : 2.135697, Validation Accuracy: 36.86%
Epoch: 8 / 20, Cost : 2.130892, Validation Accuracy: 36.87%
Epoch: 9 / 20, Cost : 2.127237, Validation Accuracy: 36.84%
Epoch: 10 / 20, Cost : 2.124216, Validation Accuracy: 36.80%
Epoch: 11 / 20, Cost : 2.121692, Validation Accuracy: 36.79%
Epoch: 12 / 20, Cost : 2.119568, Validation Accuracy: 36.84%
Epoch: 13 / 20, Cost : 2.117731, Validation Accuracy: 36.86%
Epoch: 14 / 20, Cost : 2.116107, Validation Accuracy: 36.88%
Epoch: 15 / 20, Cost : 2.114656, Validation Accuracy: 36.94%
Epoch: 16 / 20, Cost : 2.113353, Validation Accuracy: 37.00%
Epoch: 17 / 20, Cost : 2.112178, Validation Accuracy: 37.01%
Epoch: 18 / 20, Cost : 2.111109, Validation Accuracy: 36.95%
Epoch: 19 / 20, Cost : 2.110125, Validation Accuracy: 36.98%
Epoch: 20 / 20, Cost : 2.109211, Validation Accuracy: 36.93%
The model has been trained successfully.

# Test the trained RNN(vanilla) network.
rnn_net.testRNN(test_input,test_output)

Activating Testing Process
Tested with 1650 datasets.
Test Accuracy: 37.44 %

# Save the trained parameters.
vars = rnn_net.getVariables()
# Terminate the session.
rnn_net.closeRNN()

RNN training session is terminated.

RNN(LSTM) Training

  • The model was run three times, with 50, 100, and 200 hidden units; the detailed settings are as follows.
  • Settings
    1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
    2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
    3. Although it was not used in the main experiment, 20% of the training data (1,700 examples) was set aside as a validation set so that this report can show how accuracy changes: the validation accuracy (output characters predicted from input characters) is printed as the epochs progress.
    4. Parameters
      • Epoch: 200 (fixed)
      • Number of hidden layers: 1 (fixed)
      • Number of hidden units: 50, 100, 200
      • Learning rate: 0.001
      • Cost function: AdamOptimizer
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
lstm_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = lstm_data['train_input']
train_output = lstm_data['train_output']
test_input = lstm_data['test_input']
test_output = lstm_data['test_output']

# parameters
problem = 'classification' # classification, regression
rnnCell = 'lstm' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
dropout = 'off'
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

lstm_values = set.RNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

# Setting hidden layers: weightMatrix and biasMatrix
lstm_weightMatrix = lstm_values.genWeight()
lstm_biasMatrix = lstm_values.genBias()
lstm_input_x,lstm_input_y = lstm_values.genSymbol()

lstm_net = net.RNNModel(inputSymbol=lstm_input_x,
                        outputSymbol=lstm_input_y,
                        rnnCell=rnnCell,
                        problem=problem,
                        hiddenLayer=hiddenLayers,
                        trainEpoch=trainEpoch,
                        learningRate=learningRate,
                        learningRateDecay=learningRateDecay,
                        timeStep=timeStep,
                        batchSize=batchSize,
                        dropout=dropout,
                        validationCheck=validationCheck,
                        weightMatrix=lstm_weightMatrix,
                        biasMatrix=lstm_biasMatrix)

# Generate a RNN(lstm) network.
lstm_net.genRNN()

########## RNN Setting #########
Task : classification
Cell Type : lstm
Hidden Layers : 1
Hidden Units : [200]
Train Epoch : 20
Learning Rate : 0.001
Time Steps : 20
Batch Size : 100
Drop Out : off
Validation : on
########## RNN Setting #########
RNN structure is generated.

# Train the RNN(lstm) network.
# In this tutorial, we will run only 20 epochs.
lstm_net.trainRNN(train_input,train_output)

Activating training process.
Epoch: 1 / 20, Cost : 2.871745, Validation Accuracy: 30.65%
Epoch: 2 / 20, Cost : 2.423226, Validation Accuracy: 33.72%
Epoch: 3 / 20, Cost : 2.288991, Validation Accuracy: 35.03%
Epoch: 4 / 20, Cost : 2.220830, Validation Accuracy: 35.94%
Epoch: 5 / 20, Cost : 2.174125, Validation Accuracy: 37.06%
Epoch: 6 / 20, Cost : 2.134481, Validation Accuracy: 37.87%
Epoch: 7 / 20, Cost : 2.098422, Validation Accuracy: 38.62%
Epoch: 8 / 20, Cost : 2.065490, Validation Accuracy: 39.25%
Epoch: 9 / 20, Cost : 2.035602, Validation Accuracy: 39.88%
Epoch: 10 / 20, Cost : 2.008691, Validation Accuracy: 40.51%
Epoch: 11 / 20, Cost : 1.984487, Validation Accuracy: 41.02%
Epoch: 12 / 20, Cost : 1.962478, Validation Accuracy: 41.44%
Epoch: 13 / 20, Cost : 1.942094, Validation Accuracy: 41.81%
Epoch: 14 / 20, Cost : 1.923390, Validation Accuracy: 42.08%
Epoch: 15 / 20, Cost : 1.905551, Validation Accuracy: 42.37%
Epoch: 16 / 20, Cost : 1.888492, Validation Accuracy: 42.68%
Epoch: 17 / 20, Cost : 1.872363, Validation Accuracy: 42.94%
Epoch: 18 / 20, Cost : 1.856971, Validation Accuracy: 43.18%
Epoch: 19 / 20, Cost : 1.842187, Validation Accuracy: 43.42%
Epoch: 20 / 20, Cost : 1.827953, Validation Accuracy: 43.59%
The model has been trained successfully.

# Test the trained RNN(lstm) network.
lstm_net.testRNN(test_input,test_output)

Activating Testing Process
Tested with 1650 datasets.
Test Accuracy: 45.55 %

# Save the trained parameters.
vars = lstm_net.getVariables()
# Terminate the session.
lstm_net.closeRNN()

RNN training session is terminated.

RNN(GRU) Training

  • The model was run three times, with 50, 100, and 200 hidden units; the detailed settings are as follows.
  • Settings
    1. Training data: 8,500 - 20 - 38 (# of examples, # of time steps, # of input features)
    2. Test data: 1,650 - 20 - 38 (# of examples, # of time steps, # of input features)
    3. Although it was not used in the main experiment, 20% of the training data (1,700 examples) was set aside as a validation set so that this report can show how accuracy changes: the validation accuracy (output characters predicted from input characters) is printed as the epochs progress.
    4. Parameters
      • Epoch: 200 (fixed)
      • Number of hidden layers: 1 (fixed)
      • Number of hidden units: 50, 100, 200
      • Learning rate: 0.001
      • Cost function: AdamOptimizer
import numpy as np
import main.setvalues as set
import main.rnnnetworkmodels as net

# import data.
# data directory.
gru_data = np.load(my_absdir+'/train_data/pg8800_lstm_char_data.npz')
train_input = gru_data['train_input']
train_output = gru_data['train_output']
test_input = gru_data['test_input']
test_output = gru_data['test_output']

# parameters
problem = 'classification' # classification, regression
rnnCell = 'gru' # rnn, lstm, gru
trainEpoch = 20
learningRate = 0.001
learningRateDecay = 'off' # on, off
batchSize = 100
dropout = 'off'
hiddenLayers = [200]
timeStep = 20
costFunction = 'adam' # gradient, adam
validationCheck = 'on' # if validationCheck is on, then 20% of train data will be taken for validation.

gru_values = set.RNNParam(inputData=train_input,
                           targetData=train_output,
                           timeStep=timeStep,
                           hiddenUnits=hiddenLayers
                           )

# Setting hidden layers: weightMatrix and biasMatrix
gru_weightMatrix = gru_values.genWeight()
gru_biasMatrix = gru_values.genBias()
gru_input_x,gru_input_y = gru_values.genSymbol()

gru_net = net.RNNModel(inputSymbol=gru_input_x,
                        outputSymbol=gru_input_y,
                        rnnCell=rnnCell,
                        problem=problem,
                        hiddenLayer=hiddenLayers,
                        trainEpoch=trainEpoch,
                        learningRate=learningRate,
                        learningRateDecay=learningRateDecay,
                        timeStep=timeStep,
                        batchSize=batchSize,
                        dropout=dropout,
                        validationCheck=validationCheck,
                        weightMatrix=gru_weightMatrix,
                        biasMatrix=gru_biasMatrix)

# Generate a RNN(gru) network.
gru_net.genRNN()

########## RNN Setting #########
Task : classification
Cell Type : gru
Hidden Layers : 1
Hidden Units : [200]
Train Epoch : 20
Learning Rate : 0.001
Time Steps : 20
Batch Size : 100
Drop Out : off
Validation : on
########## RNN Setting #########
RNN structure is generated.

# Train the RNN(gru) network.
# In this tutorial, we will run only 20 epochs.
gru_net.trainRNN(train_input,train_output)

Activating training process.
Epoch: 1 / 20, Cost : 3.031808, Validation Accuracy: 29.38%
Epoch: 2 / 20, Cost : 2.442505, Validation Accuracy: 33.77%
Epoch: 3 / 20, Cost : 2.280064, Validation Accuracy: 35.53%
Epoch: 4 / 20, Cost : 2.195805, Validation Accuracy: 37.09%
Epoch: 5 / 20, Cost : 2.137235, Validation Accuracy: 38.18%
Epoch: 6 / 20, Cost : 2.088224, Validation Accuracy: 39.11%
Epoch: 7 / 20, Cost : 2.045377, Validation Accuracy: 39.92%
Epoch: 8 / 20, Cost : 2.008134, Validation Accuracy: 40.61%
Epoch: 9 / 20, Cost : 1.975544, Validation Accuracy: 41.13%
Epoch: 10 / 20, Cost : 1.946191, Validation Accuracy: 41.64%
Epoch: 11 / 20, Cost : 1.919675, Validation Accuracy: 42.24%
Epoch: 12 / 20, Cost : 1.896107, Validation Accuracy: 42.59%
Epoch: 13 / 20, Cost : 1.874857, Validation Accuracy: 42.95%
Epoch: 14 / 20, Cost : 1.855431, Validation Accuracy: 43.32%
Epoch: 15 / 20, Cost : 1.837488, Validation Accuracy: 43.69%
Epoch: 16 / 20, Cost : 1.820764, Validation Accuracy: 44.04%
Epoch: 17 / 20, Cost : 1.805057, Validation Accuracy: 44.39%
Epoch: 18 / 20, Cost : 1.790217, Validation Accuracy: 44.61%
Epoch: 19 / 20, Cost : 1.776120, Validation Accuracy: 44.77%
Epoch: 20 / 20, Cost : 1.762664, Validation Accuracy: 44.92%
The model has been trained successfully.

# Test the trained RNN(gru) network.
gru_net.testRNN(test_input,test_output)

Activating Testing Process
Tested with 1650 datasets.
Test Accuracy: 47.36 %

# Save the trained parameters.
vars = gru_net.getVariables()
# Terminate the session.
gru_net.closeRNN()

RNN training session is terminated.

Comments

  • The code above covers only the 200-hidden-unit case, but the full experiment was run with 50, 100, and 200 hidden units; the corresponding results appear in the table below.
  • To show how accuracy changes early in training, each model was trained for only 20 epochs in this code; in the actual experiments, each run was trained for a full 200 epochs.

Result

  1. Compared with the ANN results, where training failed regardless of the number of hidden layers, RNN(Vanilla), RNN(LSTM), and RNN(GRU) all train stably and show corresponding performance gains.
  2. As the table shows, RNN(LSTM) and RNN(GRU) perform similarly (72.59% and 70.89% with 200 hidden units, a gap of about 2%), and both outperform RNN(Vanilla) by a large margin of roughly 22%.
  3. Comparing RNN(LSTM) and RNN(GRU) directly, RNN(LSTM) came out about 2% ahead of RNN(GRU) in this experiment. However, given that recent papers have argued that GRU outperforms LSTM, it is worth watching how the two compare on other tasks.
  4. In terms of accuracy over training, RNN(Vanilla) fluctuated up and down unstably, while RNN(LSTM) and RNN(GRU) improved comparatively steadily.
Model          Hidden Units   Accuracy
RNN(Vanilla)         50        44.42%
RNN(Vanilla)        100        47.86%
RNN(Vanilla)        200        50.23%
RNN(LSTM)            50        49.76%
RNN(LSTM)           100        56.54%
RNN(LSTM)           200        72.59%
RNN(GRU)             50        49.68%
RNN(GRU)            100        55.75%
RNN(GRU)            200        70.89%

Github Code

  • You can reproduce this experiment by downloading the following GitHub code.
  • To run it in Jupyter, download the code above and set the absolute directory near the top of the notebook to your local copy of the folder (/your/path/to/HY_python_NN) before running. A wrapper loop for the full sweep is sketched below.
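
For convenience, a hypothetical wrapper loop over the three cell types and three hidden-unit sizes, reusing only the calls shown above (it assumes my_absdir is set and train_input, train_output, test_input, test_output are loaded as in the first code block):

import main.setvalues as set
import main.rnnnetworkmodels as net

for rnnCell in ['rnn', 'lstm', 'gru']:
    for units in [50, 100, 200]:
        values = set.RNNParam(inputData=train_input,
                              targetData=train_output,
                              timeStep=20,
                              hiddenUnits=[units])
        input_x, input_y = values.genSymbol()
        model = net.RNNModel(inputSymbol=input_x,
                             outputSymbol=input_y,
                             rnnCell=rnnCell,
                             problem='classification',
                             hiddenLayer=[units],
                             trainEpoch=200,  # the full 200 epochs of the report
                             learningRate=0.001,
                             learningRateDecay='off',
                             timeStep=20,
                             batchSize=100,
                             dropout='off',
                             validationCheck='on',
                             weightMatrix=values.genWeight(),
                             biasMatrix=values.genBias())
        model.genRNN()
        model.trainRNN(train_input, train_output)
        model.testRNN(test_input, test_output)
        model.closeRNN()
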
Hyungwon Yang
