PyTorch AutoEncoder starts with BatchNorm before the first layer
Dear Contributors,
if BatchNorm is enabled in the options, the PyTorch AutoEncoder applies a BatchNorm layer before passing the input samples to the first linear layer. This is likely unwanted behavior: normally the input data should be preprocessed and then passed directly into the first linear layer.
IMHO, BatchNorm should happen after the data has been processed by the first linear layer. Preprocessing the input with normalization and then immediately applying BatchNorm to it is effectively redundant.
This behavior also differs from the TensorFlow implementation, where the data is passed directly into the first linear layer: https://github.com/yzhao062/pyod/blob/master/pyod/models/auto_encoder.py#L172-L179
EDIT: I just noticed there is also a dropout layer after the last linear layer (and activation). IMHO this should not be the case either, as it deliberately sets elements of the reconstructed sample (the output) to zero. This also differs from the TensorFlow implementation, see https://github.com/yzhao062/pyod/blob/master/pyod/models/auto_encoder.py#L190-L191. Furthermore, Dropout is applied right before BatchNorm, which will definitely skew the result: BatchNorm cannot learn good normalization statistics if dropout randomly zeroes activations right before it during training (and is turned off for inference).
TL;DR
Currently the order is: Input -> BatchNorm -> Neurons -> Activation -> DropOut -> BatchNorm -> ...
IMHO it should be: Input -> Neurons -> BatchNorm -> Activation -> DropOut -> Neurons -> ...
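To make the contrast concrete, here is a minimal sketch in plain torch.nn (not PyOD's actual builder code); the layer sizes are taken from the printout further down, and the variable names are mine.

import torch.nn as nn

# Current ordering (as produced by the PyOD PyTorch AutoEncoder, see printout below):
# the (already preprocessed) input is batch-normalized before it reaches any linear layer.
current_block = nn.Sequential(
    nn.BatchNorm1d(300),   # BatchNorm applied directly to the raw/preprocessed input
    nn.Linear(300, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
)

# Proposed ordering: Linear -> BatchNorm -> Activation -> Dropout,
# so normalization acts on the layer's outputs instead of the input,
# and Dropout no longer sits immediately in front of the next BatchNorm.
proposed_block = nn.Sequential(
    nn.Linear(300, 64),
    nn.BatchNorm1d(64),    # normalizes the output of the linear layer
    nn.ReLU(),
    nn.Dropout(p=0.2),
)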
I verified my suspicion by running the PyTorch AutoEncoder example with print(cls) after line 39.
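For context, here is a minimal reproduction sketch; the import path pyod.models.auto_encoder_torch, the attribute clf.model, and the random stand-in data are my assumptions and may differ from the shipped example script.

import numpy as np
from pyod.models.auto_encoder_torch import AutoEncoder  # assumed import path

# Random stand-in data with 300 features, matching in_features=300 in the printout below.
X_train = np.random.rand(1000, 300).astype(np.float32)

clf = AutoEncoder(hidden_neurons=[64, 32], batch_norm=True,
                  dropout_rate=0.2, epochs=10, preprocessing=True)
clf.fit(X_train)

print(clf)        # sklearn-style parameter repr (first block of the output below)
print(clf.model)  # inner torch module (second block); attribute name is an assumption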
The output was the following:
AutoEncoder(batch_norm=True, batch_size=32, contamination=0.1,
    device=device(type='cpu'), dropout_rate=0.2, epochs=10,
    hidden_activation='relu', hidden_neurons=[64, 32],
    learning_rate=0.001, loss_fn=MSELoss(), preprocessing=True,
    weight_decay=1e-05)
inner_autoencoder(
  (activation): ReLU()
  (encoder): Sequential(
    (batch_norm0): BatchNorm1d(300, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (linear0): Linear(in_features=300, out_features=64, bias=True)
    (relu0): ReLU()
    (dropout0): Dropout(p=0.2, inplace=False)
    (batch_norm1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (linear1): Linear(in_features=64, out_features=32, bias=True)
    (relu1): ReLU()
    (dropout1): Dropout(p=0.2, inplace=False)
  )
  (decoder): Sequential(
    (batch_norm0): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (linear0): Linear(in_features=32, out_features=64, bias=True)
    (dropout0): Dropout(p=0.2, inplace=False)
    (batch_norm1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (linear1): Linear(in_features=64, out_features=300, bias=True)
    (dropout1): Dropout(p=0.2, inplace=False)
  )
)
There you can see that the first element in the encoder's Sequential is a BatchNorm. A pull request resolving this issue is on the way. Thanks for reading! (EDIT: See pull request #435.)
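For reference only (this is not the actual patch in the pull request), here is a minimal sketch of how a block builder could emit the proposed Linear -> BatchNorm -> Activation -> Dropout order while keeping the layer names from the printout; the helper name and its arguments are illustrative.

from collections import OrderedDict
import torch.nn as nn

def build_encoder(layer_dims, dropout_rate=0.2, batch_norm=True):
    """Illustrative builder: layer_dims = [300, 64, 32] yields two blocks
    in the proposed order Linear -> BatchNorm -> ReLU -> Dropout."""
    layers = OrderedDict()
    for i, (n_in, n_out) in enumerate(zip(layer_dims[:-1], layer_dims[1:])):
        layers[f"linear{i}"] = nn.Linear(n_in, n_out)
        if batch_norm:
            layers[f"batch_norm{i}"] = nn.BatchNorm1d(n_out)  # now after the linear layer
        layers[f"relu{i}"] = nn.ReLU()
        layers[f"dropout{i}"] = nn.Dropout(p=dropout_rate)
    return nn.Sequential(layers)

print(build_encoder([300, 64, 32]))

A corresponding decoder builder would additionally skip the trailing Dropout on its last block, so the reconstructed output is not zeroed out, matching the TensorFlow behavior described above.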
If you are annoyed by my fine-grained pull requests or issues, please let me know. I don't want to cause any extra work.