Before distinguishing between dog breeds, the application first decides whether the image depicts a human or a dog. Human faces are detected with OpenCV's implementation of Haar feature-based cascade classifiers.
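A minimal sketch of the face-detection step is shown here. It assumes the `opencv-python` package and the frontal-face cascade file bundled with it; the function names `count_faces` and `contains_human_face` are illustrative, and the cascade file the application actually uses may differ.

```python
def count_faces(gray_image, classifier):
    """Return how many faces the cascade classifier finds in a grayscale image."""
    # detectMultiScale returns one (x, y, w, h) box per detected face.
    faces = classifier.detectMultiScale(gray_image)
    return len(faces)


def contains_human_face(image_path):
    """Return True if at least one human face is detected in the image file."""
    # Imported lazily so count_faces stays usable without OpenCV installed.
    import cv2

    # Assumption: the default frontal-face cascade shipped with opencv-python.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    classifier = cv2.CascadeClassifier(cascade_path)

    image = cv2.imread(image_path)  # OpenCV loads images in BGR order
    if image is None:
        raise FileNotFoundError(image_path)

    # Haar cascades operate on single-channel (grayscale) images.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return count_faces(gray, classifier) > 0
```

Calling `contains_human_face("photo.jpg")` with a hypothetical file name would return `True` when the cascade reports one or more face boxes.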
Dogs are detected with a pre-trained ResNet-50 model. The application downloads the ResNet-50 model along with weights trained on ImageNet, a large and popular dataset used for image classification and other computer vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, the pre-trained ResNet-50 model returns a prediction, drawn from the ImageNet categories, for the object contained in the image. If the predicted category is one of the dog categories, the image is treated as containing a dog.
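The dog-detection step could be sketched roughly as follows. The sketch assumes a recent TensorFlow release with the bundled Keras API and relies on the fact that, in the standard 1000-class ImageNet labelling, indices 151 through 268 are dog breeds. The names `is_dog_class` and `contains_dog` are illustrative, not taken from the application.

```python
# In the 1000-class ImageNet labelling, indices 151..268 (inclusive) are dog breeds.
DOG_CLASS_RANGE = range(151, 269)


def is_dog_class(class_index):
    """Return True if an ImageNet class index belongs to a dog breed."""
    return class_index in DOG_CLASS_RANGE


def contains_dog(image_path):
    """Return True if ResNet-50's top prediction for the image is a dog breed."""
    # Imported lazily: TensorFlow is only needed for the actual prediction.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    # Downloads the ImageNet weights on first use.
    model = ResNet50(weights="imagenet")

    # ResNet-50 expects 224x224 RGB input, batched as (1, 224, 224, 3).
    img = load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(img_to_array(img), axis=0)

    predictions = model.predict(preprocess_input(batch))
    return is_dog_class(int(np.argmax(predictions)))
```

In a real application the model would be loaded once and reused rather than rebuilt on every call; it is created inside the function here only to keep the sketch self-contained.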
Distinguishing between dog breeds is a complex task, but convolutional neural networks perform well on it. While it is possible to train a network from scratch, transfer learning is a more practical choice. Its key advantage is that it reuses the bottleneck features of a pre-trained model, that is, the output of the pre-trained network with its final classification layers removed. The new model consists mostly of fully connected layers and takes the bottleneck features as input; only those layers are trained. Models built this way perform very well and require far less time and computational power to train.
A pre-trained Xception model is used to solve the problem. The new model, consisting of a global average pooling layer and a fully connected layer, is trained on the available images. To classify an image, Xception takes the image as input and returns bottleneck features; these features are passed to the new model, which makes the classification.
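The two-stage pipeline might look roughly like the sketch below. It assumes a recent TensorFlow release, a 299x299 input size, the Adam optimizer, and a categorical cross-entropy loss; none of these are stated in the description above. The names `build_models`, `classify`, and `top_breed`, as well as the `num_breeds` and `breed_names` parameters, are hypothetical.

```python
def top_breed(probabilities, breed_names):
    """Return the breed name with the highest predicted probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return breed_names[best_index]


def build_models(num_breeds):
    """Return (frozen Xception feature extractor, trainable classification head)."""
    # Imported lazily: TensorFlow is only needed when the models are built.
    from tensorflow.keras import Input
    from tensorflow.keras.applications.xception import Xception
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Sequential

    # Feature extractor: Xception without its ImageNet classification layers.
    base = Xception(weights="imagenet", include_top=False,
                    input_shape=(299, 299, 3))  # assumed input size
    base.trainable = False  # its weights are never updated

    # New model: global average pooling followed by one fully connected layer.
    head = Sequential([
        Input(shape=base.output_shape[1:]),  # shape of the bottleneck features
        GlobalAveragePooling2D(),
        Dense(num_breeds, activation="softmax"),
    ])
    # Assumed training configuration, chosen only for illustration.
    head.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])
    return base, head


def classify(base, head, batch, breed_names):
    """Classify one preprocessed image batch of shape (1, 299, 299, 3)."""
    features = base.predict(batch)             # bottleneck features
    probabilities = head.predict(features)[0]  # one probability per breed
    return top_breed(list(probabilities), breed_names)
```

Because the Xception weights stay fixed, the bottleneck features for the training images can be computed once and reused across epochs, so only the small head is fitted. The batch passed to `classify` is expected to have gone through `tensorflow.keras.applications.xception.preprocess_input` first.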