In supervised deep learning, gradients carry the information used to update the weights during training: if a gradient becomes too small or vanishes entirely, the weights barely change and the model stops learning from the data. This article presents solutions to the vanishing gradient problem that arises in Multi-Layer Perceptron (MLP) networks when training models that are very deep (with many hidden layers). It covers six different techniques, ranging from changes to the model architecture to training tactics, that help mitigate vanishing gradients, all evaluated on the FashionMNIST dataset. In addition, we introduce and build MyNormalization(), a custom function similar to PyTorch's BatchNorm, whose purpose is to control the variance and reduce the fluctuation of the features across layers. The ultimate goal is to optimize a deep MLP model so that it can learn efficiently from the data without being hindered by the vanishing gradient problem.
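
To make the effect concrete, the short sketch below (our own illustration, not code from the article) builds a deliberately deep sigmoid MLP in PyTorch and prints the gradient norm of each linear layer after a single backward pass; with many sigmoid layers, the gradients of the early layers are typically orders of magnitude smaller than those of the later layers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A deliberately deep MLP with sigmoid activations to provoke vanishing gradients.
# Input size 784 matches flattened 28x28 FashionMNIST images.
layers = []
in_features = 784
for _ in range(10):
    layers += [nn.Linear(in_features, 128), nn.Sigmoid()]
    in_features = 128
layers += [nn.Linear(in_features, 10)]
model = nn.Sequential(*layers)

# One batch of dummy data (random tensors stand in for FashionMNIST here).
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()

# Gradient norm per Linear layer: the early layers come out much smaller.
for i, module in enumerate(model):
    if isinstance(module, nn.Linear):
        print(f"layer {i:2d}: grad norm = {module.weight.grad.norm().item():.2e}")
```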
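
The article's actual MyNormalization() implementation is described later; as a preview, the following is a minimal sketch of what such a layer could look like, under the assumption that it normalizes each feature over the batch dimension and applies a learnable scale and shift, like a simplified BatchNorm without running statistics.

```python
import torch
import torch.nn as nn

class MyNormalization(nn.Module):
    """Simplified batch-normalization-style layer (illustrative sketch only).

    Normalizes each feature over the batch dimension to zero mean and unit
    variance, then applies a learnable scale (gamma) and shift (beta).
    Unlike nn.BatchNorm1d, this sketch keeps no running statistics.
    """

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch_size, num_features)
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Usage: place the layer between Linear layers, e.g.
# nn.Sequential(nn.Linear(784, 128), MyNormalization(128), nn.ReLU(), ...)
```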