Switch to batchnorm architecture
Use a single BatchNormalization layer as the first layer instead of the
TimeDistributed(Dense()) layers. The previous architecture is still
available in the timedistributeddense branch.