C++ application designed for automated student attendance tracking
Developed as part of my coursework, this project aims to simplify recording and managing attendance data, making the process more efficient and less error-prone.
First, the model detects faces in an image using a detector based on Histogram of Oriented Gradients (HOG) features. HOG captures brightness gradients and their directions in a feature vector that a classifier uses to label image regions as face or non-face. The image is normalized for brightness and contrast, and a gradient is computed at each pixel. Histograms of gradient orientations are then built over small cells, normalized, and concatenated into the feature vector used for classification.
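The per-cell histogram step can be sketched in plain C++ as follows. This is only an illustration and not the detector the project actually runs: the `GrayImage` type, the function names, the 8x8 cell size, and the 9 unsigned orientation bins are all assumptions chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical grayscale image, stored row by row.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels;
    float at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
};

// Builds the orientation histogram of one cell whose top-left pixel is (x0, y0).
// Each pixel votes for the bin of its gradient direction, weighted by magnitude.
std::vector<float> cellHistogram(const GrayImage& img, std::size_t x0, std::size_t y0,
                                 std::size_t cell = 8, std::size_t bins = 9) {
    // Central differences need one pixel of margin on every side.
    assert(x0 >= 1 && y0 >= 1);
    assert(x0 + cell < img.width && y0 + cell < img.height);

    const float pi = 3.14159265f;
    std::vector<float> hist(bins, 0.0f);
    for (std::size_t y = y0; y < y0 + cell; ++y) {
        for (std::size_t x = x0; x < x0 + cell; ++x) {
            float gx = img.at(x + 1, y) - img.at(x - 1, y);
            float gy = img.at(x, y + 1) - img.at(x, y - 1);
            float magnitude = std::sqrt(gx * gx + gy * gy);

            // Fold the direction into [0, pi): opposite directions share a bin.
            float angle = std::atan2(gy, gx);
            if (angle < 0.0f) angle += pi;
            if (angle >= pi) angle -= pi;

            std::size_t bin = static_cast<std::size_t>(angle / pi * static_cast<float>(bins));
            if (bin >= bins) bin = bins - 1;  // guard against rounding at the upper edge
            hist[bin] += magnitude;
        }
    }
    return hist;
}

// L2-normalizes a histogram in place; the small constant avoids division by zero.
void l2Normalize(std::vector<float>& v) {
    float sumSquares = 0.0f;
    for (float x : v) sumSquares += x * x;
    float norm = std::sqrt(sumSquares + 1e-6f);
    for (float& x : v) x /= norm;
}
```

A full descriptor would concatenate the normalized histograms of all cells in a detection window and pass the result to the face/non-face classifier.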
After a face is detected, the model places an average face shape inside the rectangle around the detected face to initialize the landmark positions. It then adjusts these positions iteratively using texture features sampled around the current landmarks, such as brightness gradients and other image characteristics. At each step, regressors trained on a large dataset predict a correction that reduces the error between the current and true landmark positions, and the refinement continues until the positions stabilize.
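The control flow of this refinement can be sketched as below. All names here (`Point`, `Rect`, `Shape`, `StageRegressor`, `placeMeanShape`, `refineLandmarks`) are hypothetical, the mean shape is assumed to be stored in coordinates normalized to the unit square, and a fixed number of stages stands in for "until stable". Real regressors would sample image features around the landmarks; here a stage is just any function that returns a per-landmark correction.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Point { float x; float y; };
struct Rect { float left; float top; float width; float height; };
using Shape = std::vector<Point>;

// Hypothetical stage: given the current landmarks, return one correction per landmark.
using StageRegressor = std::function<Shape(const Shape&)>;

// Maps a mean shape given in [0, 1] x [0, 1] into the detected face rectangle.
Shape placeMeanShape(const Shape& meanShape, const Rect& face) {
    Shape placed;
    placed.reserve(meanShape.size());
    for (const Point& p : meanShape) {
        placed.push_back({face.left + p.x * face.width, face.top + p.y * face.height});
    }
    return placed;
}

// Applies the stages in order, adding each predicted correction to the shape.
Shape refineLandmarks(Shape shape, const std::vector<StageRegressor>& cascade) {
    for (const StageRegressor& stage : cascade) {
        Shape delta = stage(shape);
        assert(delta.size() == shape.size());
        for (std::size_t i = 0; i < shape.size(); ++i) {
            shape[i].x += delta[i].x;
            shape[i].y += delta[i].y;
        }
    }
    return shape;
}
```

Because each stage sees the output of the previous one, later stages can make progressively smaller corrections, which is what lets the landmark positions settle.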
The dlib_face_recognition_resnet_model_v1.dat model uses a ResNet (Residual Network) architecture to convert face images into compact, informative vector representations (descriptors) for identification and comparison.
The ResNet architecture is built from residual blocks that combine convolutional layers, batch normalization, and ReLU activation. "Skip connections" in each residual block add the block's input to its output, which mitigates the vanishing gradient problem and makes deep networks easier to train. Convolutional layers extract features at various levels of abstraction, batch normalization stabilizes training, and ReLU introduces the non-linearity needed to learn complex functions.
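The skip connection itself is a small idea: the block outputs ReLU(F(x) + x), where F is the block's stack of layers. The sketch below shows only that structure. `Tensor`, `Layer`, and `residualBlock` are hypothetical names, and an arbitrary function stands in for the convolution and batch normalization layers, so this is not how dlib implements its network.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Tensor = std::vector<float>;
// Stand-in for the block's convolution + batch normalization layers.
using Layer = std::function<Tensor(const Tensor&)>;

// Element-wise ReLU: negative values become zero.
Tensor relu(Tensor t) {
    for (float& v : t) v = std::max(v, 0.0f);
    return t;
}

// Residual block: the input is added back to the branch output before the activation.
Tensor residualBlock(const Tensor& x, const Layer& branch) {
    Tensor out = branch(x);
    assert(out.size() == x.size());  // the skip connection needs matching shapes
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] += x[i];
    }
    return relu(out);
}
```

If the branch outputs zeros, the block simply passes its (non-negative) input through, so a layer that has not learned anything useful does no harm, and gradients can flow back along the addition unchanged.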
Pre-processing involves normalizing the face image to 150x150 pixels and adjusting its brightness and contrast. The ResNet converts this image into a 128-dimensional vector (descriptor) encoding key facial features. For face identification, the descriptor of a new image is compared to those in a database using Euclidean distance. If the distance is below a certain threshold, the faces are considered a match, meaning they belong to the same person.
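The comparison step can be sketched as a nearest-neighbour search over stored descriptors. The `Student` record and the function names are hypothetical, and plain arrays stand in for the descriptors that the network would produce. The default threshold of 0.6 is the value suggested in dlib's own face recognition example and is only an assumption here, since the threshold this project uses is not stated.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

using Descriptor = std::array<float, 128>;

// Hypothetical database record: one enrolled student and their face descriptor.
struct Student {
    std::string name;
    Descriptor descriptor;
};

// Euclidean (L2) distance between two 128-dimensional descriptors.
float euclideanDistance(const Descriptor& a, const Descriptor& b) {
    float sumSquares = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float diff = a[i] - b[i];
        sumSquares += diff * diff;
    }
    return std::sqrt(sumSquares);
}

// Returns the index of the closest student whose distance is below the threshold,
// or -1 if no enrolled descriptor is close enough.
int findMatch(const Descriptor& probe, const std::vector<Student>& database,
              float threshold = 0.6f) {
    assert(threshold > 0.0f);
    int best = -1;
    float bestDistance = threshold;
    for (std::size_t i = 0; i < database.size(); ++i) {
        float d = euclideanDistance(probe, database[i].descriptor);
        if (d < bestDistance) {
            bestDistance = d;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Returning the closest match rather than the first one under the threshold avoids marking the wrong student present when two enrolled faces are both within range.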
The model was tested on a dataset from Kaggle.