After loading the standard model, the rules below are applied. 'Inputs', I think, indicates that the first two inputs are used, each 128 wide (Input-Token and Input-Segment). The 'NSP-Dense' layer (768) is then connected to 'dense', and that in turn is connected to the final new output layer of a chosen size (20 in this example).
I still have to work out exactly why the other final layers disappear from the model, but in the end it seems logical if you follow the layer names.
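The rewiring described above can be sketched in code. This is a minimal sketch, not the real thing: the actual pretrained model would come from keras_bert.load_trained_model_from_checkpoint, and the small stand-in model below only mimics the parts that matter here (the two 128-long inputs and an 'NSP-Dense' layer).

```python
from tensorflow import keras

# Stand-in for the pretrained BERT model. In reality this is built by
# keras_bert.load_trained_model_from_checkpoint; here a single Dense layer
# takes the place of the embedding layers and the 12 encoder blocks.
token_in = keras.layers.Input(shape=(128,), name='Input-Token')
segment_in = keras.layers.Input(shape=(128,), name='Input-Segment')
merged = keras.layers.Concatenate()([token_in, segment_in])
hidden = keras.layers.Dense(768)(merged)           # placeholder for the encoder stack
nsp_dense = keras.layers.Dense(768, name='NSP-Dense')(hidden)
bert_like = keras.Model([token_in, segment_in], nsp_dense)

# The actual rewiring: reuse the first two inputs and the NSP-Dense output,
# then attach the new 20-way output layer. Layers not on the path from the
# inputs to this output (MLM-Dense, MLM-Sim, NSP, ...) simply fall away,
# which is why they no longer appear in the modified summary.
inputs = bert_like.inputs[:2]
features = bert_like.get_layer('NSP-Dense').output
outputs = keras.layers.Dense(20, activation='softmax')(features)
model = keras.Model(inputs, outputs)
model.summary()
```

The new Dense layer contributes 768 × 20 weights plus 20 biases = 15380 parameters, matching the dense_1 row in the summaries below.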
Tail of the original BERT model:
...
Encoder-12-FeedForward-Add (Add (None, 128, 768) 0 Encoder-12-MultiHeadSelfAttention
Encoder-12-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Norm (La (None, 128, 768) 1536 Encoder-12-FeedForward-Add[0][0]
__________________________________________________________________________________________________
MLM-Dense (Dense) (None, 128, 768) 590592 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
MLM-Norm (LayerNormalization) (None, 128, 768) 1536 MLM-Dense[0][0]
__________________________________________________________________________________________________
Extract (Extract) (None, 768) 0 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
MLM-Sim (EmbeddingSimilarity) (None, 128, 30000) 30000 MLM-Norm[0][0]
Embedding-Token[0][1]
__________________________________________________________________________________________________
Input-Masked (InputLayer) (None, 128) 0
__________________________________________________________________________________________________
NSP-Dense (Dense) (None, 768) 590592 Extract[0][0]
__________________________________________________________________________________________________
MLM (Masked) (None, 128, 30000) 0 MLM-Sim[0][0]
Input-Masked[0][0]
__________________________________________________________________________________________________
NSP (Dense) (None, 2) 1538 NSP-Dense[0][0]
==================================================================================================
Tail of the modified model:
...
... __________________________________________________________________________________________________
Encoder-12-FeedForward-Norm (La (None, 128, 768) 1536 Encoder-12-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Extract (Extract) (None, 768) 0 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
NSP-Dense (Dense) (None, 768) 590592 Extract[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 20) 15380 NSP-Dense[0][0]
==================================================================================================
The complete new model:
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input-Token (InputLayer) (None, 128) 0
__________________________________________________________________________________________________
Input-Segment (InputLayer) (None, 128) 0
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 128, 768), ( 23040000 Input-Token[0][0]
__________________________________________________________________________________________________
Embedding-Segment (Embedding) (None, 128, 768) 1536 Input-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Token-Segment (Add) (None, 128, 768) 0 Embedding-Token[0][0]
Embedding-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 128, 768) 98304 Embedding-Token-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Dropout (Dropout) (None, 128, 768) 0 Embedding-Position[0][0]
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 128, 768) 1536 Embedding-Dropout[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 128, 768) 2362368 Embedding-Norm[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 128, 768) 0 Encoder-1-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 128, 768) 0 Embedding-Norm[0][0]
Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 128, 768) 1536 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward (FeedForw (None, 128, 768) 4722432 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward-Dropout ( (None, 128, 768) 0 Encoder-1-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-1-FeedForward-Add (Add) (None, 128, 768) 0 Encoder-1-MultiHeadSelfAttention-
Encoder-1-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-1-FeedForward-Norm (Lay (None, 128, 768) 1536 Encoder-1-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 128, 768) 2362368 Encoder-1-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
...
...
...
__________________________________________________________________________________________________
Encoder-11-FeedForward-Norm (La (None, 128, 768) 1536 Encoder-11-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 128, 768) 2362368 Encoder-11-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 128, 768) 0 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 128, 768) 0 Encoder-11-FeedForward-Norm[0][0]
Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 128, 768) 1536 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward (FeedFor (None, 128, 768) 4722432 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward-Dropout (None, 128, 768) 0 Encoder-12-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Add (Add (None, 128, 768) 0 Encoder-12-MultiHeadSelfAttention
Encoder-12-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Norm (La (None, 128, 768) 1536 Encoder-12-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Extract (Extract) (None, 768) 0 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
NSP-Dense (Dense) (None, 768) 590592 Extract[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 20) 15380 NSP-Dense[0][0]
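The parameter counts in the summaries above can be reproduced from the model dimensions (vocabulary 30000, sequence length 128, hidden size 768, feed-forward size 3072, 20 output classes) as a quick sanity check:

```python
# Reproduce the Param # column from the summaries above.
vocab, seq, hidden, ff, classes = 30000, 128, 768, 3072, 20

emb_token = vocab * hidden                               # Embedding-Token
emb_pos = seq * hidden                                   # Embedding-Position
attention = 4 * (hidden * hidden + hidden)               # Q, K, V and output projections
feedforward = (hidden * ff + ff) + (ff * hidden + hidden)
nsp_dense = hidden * hidden + hidden                     # NSP-Dense
dense_1 = hidden * classes + classes                     # the new 20-way output layer

print(emb_token, emb_pos, attention, feedforward, nsp_dense, dense_1)
```

These come out to 23040000, 98304, 2362368, 4722432, 590592 and 15380, matching the summary rows line for line.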
[Figure: also a model overview :-)]