Prediction of Airbnb Listing Prices with Deep Learning: Project to Learn AI vol.4

Photo by Tod Seitz on Unsplash

Introduction

This article continues the previous installment in this series.
After learning classical machine learning, I moved on to deep learning (DL). As with machine learning, the goal was to learn several DL techniques and to understand how to find the best model. I had read several books about DL, but I had never actually put that knowledge into practice. DL studies often focus on image processing, but this project is a regression model that predicts prices from 91 features of Airbnb listings. There were several frameworks to choose from and I didn’t know where to start, so on my mentor’s advice I decided to use Keras, following a blog post [1]. Although there are various optimizers, I used Adam because the aim this time was to learn the basic usage.

The machine specs were as follows.
PC: MacBook Air (Retina, 13-inch, 2018)
Processor: 1.6GHz Intel Core i5
Memory: 16GB

Learned knowledge

  • scikit-learn
  • SparkML
  • Keras
  • TensorFlow
  • PyTorch
  • fast.ai
  • Theano
  • Convolution
  • ReLU
  • Pooling

Dropout: to prevent over-fitting, a certain percentage of neurons is intentionally deactivated at random during training.
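
The deactivation described above is usually called dropout. The following is a minimal sketch of the "inverted dropout" variant in plain Python, not the Keras implementation. The function name `dropout`, the rate of 0.2, and the toy activations are assumptions made for illustration.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` while training.

    Surviving activations are scaled by 1 / (1 - rate) so that the expected
    sum stays the same. Outside training the input is returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
hidden = [1.0] * 10              # hypothetical hidden-layer activations
train_out = dropout(hidden, rate=0.2, rng=rng)                  # some zeros
eval_out = dropout(hidden, rate=0.2, rng=rng, training=False)   # unchanged
```

Because neurons are only dropped while training, a score measured during training is computed with a handicapped network, while validation uses all neurons.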

Early stopping is a method that allows you to specify an arbitrarily large number of training epochs and stop training once the model performance stops improving on a hold out validation dataset.[2]
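
The stopping rule can be sketched in a few lines of plain Python. This is only an illustration of the patience logic, not the Keras callback itself. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are assumptions.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through all available epochs.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses with a late improvement at epoch 6.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
short_patience = early_stopping_epoch(losses, patience=3)
long_patience = early_stopping_epoch(losses, patience=5)
```

With a small patience the run stops at a temporary plateau and never sees the later improvement, so the patience value needs to be larger than the noise in the validation curve.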

Optimizers update the weight parameters to minimize the loss function.

  • SGD
  • MomentumSGD
  • NAG
  • AdaGrad
  • RMSprop
  • AdaDelta
  • Adam
  • RAdam
  • RMSpropGraves
  • SMORMS3
  • AdaMax
  • Nadam
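
To make the idea of "updating the weights to minimize the loss" concrete, here is a small sketch of two update rules from the list, plain SGD and Adam, applied to a one-parameter toy loss. The loss function, the learning rate of 0.1, and the helper names are assumptions for illustration. The Adam hyperparameters are the commonly quoted defaults.

```python
import math


def sgd_step(w, grad, lr=0.1):
    """Plain SGD: move against the gradient by a fixed learning rate."""
    return w - lr * grad


def adam_step(w, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first and second moments."""
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad          # mean of gradients
    v = beta2 * state["v"] + (1 - beta2) * grad * grad   # mean of squares
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    new_w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_w, {"t": t, "m": m, "v": v}


def loss_grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


w_sgd = 0.0
w_adam, adam_state = 0.0, {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(500):
    w_sgd = sgd_step(w_sgd, loss_grad(w_sgd))
    w_adam, adam_state = adam_step(w_adam, loss_grad(w_adam), adam_state)
```

Both rules approach the minimum here. The difference is that Adam divides by a running estimate of the gradient magnitude, so its step size depends much less on the scale of the gradient.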

Tools

  • TensorBoard
  • Bakke

Understanding learning by Interpretability

Classifier

  • VGG-16
  • VGG-19
  • InceptionV3
  • Xception
  • ResNet-50

NLP

  • Deep Dream
  • Image-to-image translation with conditional adversarial nets
  • Pix2Pix
  • CycleGAN
  • Screen Grab
  • WaveNet
  • ACS Images Caption Generation

Practical advice [3]

  • Use the ADAM optimizer.
  • ReLU is a good nonlinearity (activation function).
  • Do NOT use an activation function at your output layer.
  • DO add a bias in every layer.
  • Use variance-scaled initialization.
  • Whiten (normalize) your input data.
  • Scale input data in a way that reasonably preserves its dynamic range.
  • Don’t bother decaying the learning rate (usually).
  • If your convolution layer has 64 or 128 filters, that’s probably plenty.
  • Pooling is for transform invariance.

Technique

The complete Jupyter Notebook is here.

I started with a single hidden layer and tried to improve the score step by step. Specifically, I changed the batch size, standardized the inputs, added a layer, widened a layer, used dropout, used early stopping, and used learning rate decay. The data that was used is here.

Results

First, I tried a model with one hidden layer trained for 100 epochs. Changing the batch size from 50 to 150 improved the validation r2 from 0.615 to 0.620 and also reduced the training time from 2h 44min 44s to 1h 19min 14s. It seems that learning works better when each batch gathers a reasonable number of samples, and the time is shorter because the number of weight updates per epoch is reduced. Increasing the batch size to 300 improved the results in the same way, but the gains were smaller, so I decided to proceed with a batch size of 300.
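
The link between batch size and the number of weight updates is simple arithmetic, sketched below. The training-set size `n_train` is a hypothetical value chosen for illustration, not the size of the actual dataset.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


n_train = 30_000  # hypothetical number of training rows
updates = {bs: steps_per_epoch(n_train, bs) for bs in (50, 150, 300)}
for batch_size, steps in updates.items():
    print(f"batch_size={batch_size}: {steps} updates per epoch")
```

Each update with a larger batch processes more rows, so wall-clock time does not shrink in proportion to the number of updates.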

Next, I added a standardization step before the input layer. The validation r2 was 0.62, the same as before, even though this time I had increased the number of epochs to 300. The learning curve was not very stable and the results were not good. I had already adjusted the price values by taking the log, so I judged normalization to be unnecessary and left it out of subsequent trials.
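
For reference, the two transformations mentioned here act on different things: the log is applied to the target (price), while standardization rescales an input feature to zero mean and unit variance. The sketch below shows both in plain Python. The function name `standardize` and the sample values are assumptions for illustration.

```python
import math
import statistics


def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(x - mean) / std for x in column]


# Hypothetical input feature (e.g. number of guests a listing accommodates).
accommodates = [1.0, 2.0, 4.0, 6.0, 12.0]
scaled_feature = standardize(accommodates)

# Hypothetical target values: the log compresses the long right tail.
prices = [50.0, 80.0, 120.0, 1000.0]
log_prices = [math.log(p) for p in prices]
recovered = [math.exp(v) for v in log_prices]  # invert to report real prices
```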

Next, I compared a deeper network with a wider network. Adding one hidden layer gave a validation r2 of 0.648. Increasing the number of neurons from 91 to 150 instead gave 0.629. Both improved, but the deeper network gave the better result. Since the training times did not differ, I decided to use the deeper model from here on.
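
One way to put the two variants on a common scale is to count their trainable parameters. The sketch below does this for fully connected layers with biases. It assumes 91 inputs, a single output, and that the added hidden layer also has 91 neurons. The added layer's size is an assumption, as are the helper and variable names.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network.

    `layer_sizes` lists the input size, each hidden layer, and the output.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


baseline = dense_param_count([91, 91, 1])      # one hidden layer, 91 neurons
deeper = dense_param_count([91, 91, 91, 1])    # one extra hidden layer
wider = dense_param_count([91, 150, 1])        # hidden layer widened to 150

print(f"baseline={baseline}, deeper={deeper}, wider={wider}")
```

Under these assumptions the two variants are of a similar order of size, which fits the observation that their training times were about the same.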

I compared inserting dropout at the input layer and at the hidden layer. Both results were worse. With dropout applied to the input layer, training became very unstable and r2 decreased. With dropout applied to the hidden layer, training was stable. The training r2 was very low, but that is because neurons are deactivated during training; judging from the validation value, the model seems to have learned properly. However, the results got worse, and dropout might not be effective unless the network is larger. I decided not to use it in later trials.

Deepening the network had worked well, so I decided to go deeper for further improvement. First, adding one hidden layer improved the validation r2 from 0.648 to 0.664. The training time also increased, from 3h 1min 34s to 3h 42min 19s, but I considered this a good result. Adding another hidden layer gave a slightly better r2 of 0.665, but the training time was 4h 43min 29s. I decided that increasing the number of layers beyond this was not effective. Still, this confirmed that going as deep as the resources allow has some effect.

The five-layer model above (deeper_deeper_deeper_model) was trained for 300 epochs, but it did not appear to have reached a plateau yet. So I increased the number of epochs and used early stopping. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit. However, this model stopped before reaching the previous best score. The model might be improved by setting the baseline parameter, but I expected training to stop immediately after exceeding the baseline, and setting it requires knowing the best score in advance, so I thought it was meaningless. I decided not to use early stopping in subsequent trials and compared the scores at a fixed number of epochs as before.

I implemented the decay rate parameter in two patterns based on a calculation formula, but neither gave good results. I also tried step-based decay, but that didn’t work either. Because this tuning was very time-consuming, I broke it off; tuning this parameter seems very difficult. Judging from the learning curves, a significant improvement cannot be expected, so I think it is better not to spend too much time on it unless you really need high precision.
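
The exact formulas used in the notebook are not given here, so the following is only a sketch of two common learning rate schedules: time-based decay, where the rate shrinks a little with every update step, and step decay, where it drops by a fixed factor every few epochs. All function names, parameter names, and values are assumptions for illustration.

```python
def time_based_decay(initial_lr, decay, step):
    """Learning rate after `step` updates: lr0 / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)


def step_decay(initial_lr, drop, epochs_per_drop, epoch):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


# Hypothetical settings: compare the two schedules at a few points.
time_schedule = [time_based_decay(0.001, decay=0.01, step=s)
                 for s in (0, 100, 300)]
step_schedule = [step_decay(0.1, drop=0.5, epochs_per_drop=10, epoch=e)
                 for e in (0, 9, 10, 25)]
```

Each schedule adds at least one more value to tune (the decay rate, or the drop factor and interval), which is part of why this kind of tuning takes so long.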

Conclusion

I learned the basics of building deep learning models with Keras. Getting the best results took a lot of perseverance, such as adjusting the layer structure and various parameters. This is the same as with the classical machine learning I studied previously. There are still parts of the results that I don’t understand deeply, so I will continue to learn. The AutoML field seems to have been developing recently. I also came to understand the importance of automating this tuning work and producing good results without depending on the experience of DL craftsmen. Personally, I enjoyed the process of tuning and getting good results. However, the waiting time is very long, so in the future I would like to use SaaS or cloud services to get a higher-spec environment and do more trial and error. In addition, transfer learning has the potential for a wide range of applications. Since it is impossible to compete with Google’s amount of data and resources, I am also interested in using publicly available models in practical services. As for my own learning, I will take time to tackle Kaggle problems.

References

[1] Regression Tutorial with the Keras Deep Learning Library in Python
https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/

[2] Use Early Stopping to Halt the Training of Neural Networks At the Right Time
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/

[3] Practical Advice for Building Deep Neural Networks
https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/