Layer (type) Output Shape Param # Connected to
==================================================================================================
Input-Token (InputLayer) (None, 128) 0
__________________________________________________________________________________________________
Input-Segment (InputLayer) (None, 128) 0
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 128, 768), ( 23040000 Input-Token[0][0]
__________________________________________________________________________________________________
Embedding-Segment (Embedding) (None, 128, 768) 1536 Input-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Token-Segment (Add) (None, 128, 768) 0 Embedding-Token[0][0]
Embedding-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 128, 768) 98304 Embedding-Token-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Dropout (Dropout) (None, 128, 768) 0 Embedding-Position[0][0]
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 128, 768) 1536 Embedding-Dropout[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 128, 768) 2362368 Embedding-Norm[0][0]
...
...
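The parameter counts in the summary above can be sanity-checked with a little arithmetic. A minimal sketch, assuming the usual BERT-base embedding dimensions (a vocabulary of 30,000 tokens, hidden size 768, sequence length 128, 2 segment types; the vocabulary size is not shown in the excerpt and is inferred from the token-embedding count):

```python
# Sanity-check the parameter counts shown in the Keras summary.
# Assumed hyperparameters: vocab 30000, hidden 768, seq_len 128, 2 segments.
vocab_size, hidden, seq_len, segments = 30000, 768, 128, 2

token_emb = vocab_size * hidden        # Embedding-Token
segment_emb = segments * hidden        # Embedding-Segment
position_emb = seq_len * hidden        # Embedding-Position
layer_norm = 2 * hidden                # LayerNormalization: gamma + beta
# Q, K, V and output projections: four (hidden x hidden) weights plus biases.
attention = 4 * (hidden * hidden + hidden)

print(token_emb, segment_emb, position_emb, layer_norm, attention)
```

Each value matches its row in the summary (23040000, 1536, 98304, 1536 and 2362368), which confirms the assumed vocabulary size of 30,000.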
Well, apparently not. A length of 256 does not work for me:
Resource exhausted: OOM when allocating tensor with shape[300,256,256]
But if I change the length to 192 (and, to be safe, the batch size to 10), it happily starts humming along. In other words: it gets to work just fine. The epochs do take longer, though: 1562 seconds each. Well, I am curious whether it has become any wiser for it.
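Why does shrinking the sequence length help so much? The attention-score tensors grow quadratically with sequence length. A rough estimate, assuming float32 and that the first dimension of the reported shape [300,256,256] is batch size times number of heads (e.g. 25 × 12, an assumption, since the OOM message does not say):

```python
# Estimate the size of one attention-score tensor like the one in the
# OOM message: shape [300, 256, 256], assumed float32 (4 bytes).
def attn_bytes(first_dim, seq_len, dtype_bytes=4):
    return first_dim * seq_len * seq_len * dtype_bytes

mb = 1024 ** 2
print(attn_bytes(300, 256) / mb)                     # 75.0 MB per tensor
# Memory grows quadratically with sequence length:
print(attn_bytes(300, 256) / attn_bytes(300, 192))   # ~1.78x more than at 192
```

And there is one such tensor per attention layer (plus gradients and activations), so a quadratic factor of ~1.78 across twelve encoder blocks quickly makes the difference between fitting in GPU memory and not.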
Epoch 1/10
42890/42890 [==============================] - 1562s 36ms/step - loss: 1.7809 - sparse_categorical_accuracy: 0.6647
Epoch 2/10
42890/42890 [==============================] - 1555s 36ms/step - loss: 0.7816 - sparse_categorical_accuracy: 0.8110
Epoch 3/10
42890/42890 [==============================] - 1536s 36ms/step - loss: 0.6109 - sparse_categorical_accuracy: 0.8435
Epoch 4/10
42890/42890 [==============================] - 1544s 36ms/step - loss: 0.5158 - sparse_categorical_accuracy: 0.8633
Epoch 5/10
42890/42890 [==============================] - 1543s 36ms/step - loss: 0.4523 - sparse_categorical_accuracy: 0.8755
Epoch 6/10
42890/42890 [==============================] - 1549s 36ms/step - loss: 0.4019 - sparse_categorical_accuracy: 0.8885
Epoch 7/10
42890/42890 [==============================] - 1550s 36ms/step - loss: 0.3620 - sparse_categorical_accuracy: 0.8979
Epoch 8/10
42890/42890 [==============================] - 1548s 36ms/step - loss: 0.3278 - sparse_categorical_accuracy: 0.9080
Epoch 9/10
42890/42890 [==============================] - 1549s 36ms/step - loss: 0.2985 - sparse_categorical_accuracy: 0.9134
Epoch 10/10
42890/42890 [==============================] - 1551s 36ms/step - loss: 0.2720 - sparse_categorical_accuracy: 0.9223
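The reported numbers are internally consistent, which can be checked quickly (batch size 10, as set above):

```python
# Cross-check the training log: steps per epoch and time per step.
steps, epoch_seconds, batch_size = 42890, 1562, 10

examples_per_epoch = steps * batch_size
ms_per_step = epoch_seconds / steps * 1000

print(examples_per_epoch)    # 428900 training examples per epoch
print(round(ms_per_step))    # 36 ms/step, matching the log
```

So an epoch covers about 429 thousand examples, and the 36 ms/step in the log is exactly what 1562 seconds over 42,890 steps works out to.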