Machine Vision and Learning Lab

Our Research

The research directions and results of our laboratory in recent years.

CCUMVL-Vehicle-ReID DATASET

This dataset is intended for vehicle re-identification purposes.
The vehicle route starts at the Taibao Branch of the Chiayi County Railway Police Bureau (612, Chiayi County, Taibao City, 23), follows the Gaotie West Road extension of the Jiapu Highway (No. 168, Gaotie W Rd, Taibao City, Chiayi County, 612), and ends at the Puzi Gas Station (Fumaopuzi Station). A total of 8 cameras capture footage along the way.

More details are recorded on the Download page.

Please read the License first, then submit the application form.
Links:
License Link
Application Form Link

Very Long Time Series Data Augmentation via Deep Learning for Silicon Wafer Quality Prediction

In the wafer grinding process, sensors are usually installed in the grinding machine to monitor wear and predict product quality. However, labeling the time series data is very time-consuming. Therefore, this research aims to augment very long time series data with a deep learning model. The augmented data is verified by a wafer quality prediction model. In this report, the proposed deep learning data augmentation method is divided into three stages: augmented data generation, feature representation learning, and quality prediction. The first stage is augmented data generation, in which a Temporal Pattern Attention Long Short-Term Memory (TPA-LSTM) model is used to realize data augmentation. The second stage is feature extraction, which extracts data features with a Long Short-Term Memory (LSTM) model or an Auto-Encoder model. It is expected to convert the high-dimensional original signal data into a low-dimensional feature representation to reduce the burden of model learning. A data slicing method is considered to further reduce the model size and computational complexity during training. The third stage is quality prediction, which evaluates stability under different augmentation settings, feature extraction methods, and models. Experimental results demonstrate a clear performance improvement when the proposed data augmentation method is used for very long time series data generation. The improvement ratio reaches up to 98.42% on the real-world wafer grinding dataset.
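The data slicing step mentioned above can be pictured with a minimal sketch. It assumes slicing means cutting one long signal into fixed-length windows. The function name `slice_series` and the `window` and `stride` values are illustrative and are not taken from the study.

```python
from typing import List, Sequence


def slice_series(series: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Cut one long series into fixed-length windows.

    `window` and `stride` are illustrative parameters (assumptions).
    A trailing remainder shorter than `window` is dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        list(series[start:start + window])
        for start in range(0, len(series) - window + 1, stride)
    ]


# Toy signal standing in for a very long sensor recording.
signal = [float(i) for i in range(10)]
slices = slice_series(signal, window=4, stride=3)
```

Each slice can then be fed to the feature extractor separately, so the model only ever sees `window` samples at a time instead of the full recording.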

Generating Adversarial Examples Based on Perceptual Visual Properties

Recent studies show that deep neural networks (DNNs) are vulnerable to adversarial examples, which are images with added perturbations. Adversarial examples with large perturbations affect a deep neural network strongly, but the difference between them and the original image is also easier to perceive visually. To improve the image quality of adversarial examples, this paper proposes a general improvement method and an adversarial attack method based on perceptual visual characteristics, which include spectral sensitivity and just noticeable difference (JND). The proposed method can be combined with existing adversarial attack methods, which shows its generality, and it can also serve as an independent adversarial attack method. Experiments show that our method can improve the attack performance of existing methods and improve the image quality of adversarial examples in some cases.
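One way a JND constraint could be combined with an existing attack is to clamp the attack's perturbation to a per-pixel visibility threshold. The sketch below shows that idea only; it is not the paper's method. The JND map values, the function names, and the 0..255 pixel range are assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]


def clip_to_jnd(perturbation: Image, jnd: Image) -> Image:
    """Clamp each perturbation value to [-jnd, +jnd] at the same pixel."""
    return [
        [max(-t, min(t, p)) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(perturbation, jnd)
    ]


def apply_perturbation(image: Image, perturbation: Image) -> Image:
    """Add the perturbation and keep pixel values inside 0..255."""
    return [
        [max(0.0, min(255.0, v + p)) for v, p in zip(v_row, p_row)]
        for v_row, p_row in zip(image, perturbation)
    ]


image = [[100.0, 250.0], [5.0, 128.0]]
raw_perturbation = [[8.0, 8.0], [-8.0, -1.0]]  # stand-in for any attack's output
jnd_map = [[3.0, 10.0], [2.0, 4.0]]            # hypothetical per-pixel thresholds

bounded = clip_to_jnd(raw_perturbation, jnd_map)
adversarial = apply_perturbation(image, bounded)
```

Pixels with a high threshold (textured or bright regions, for example) keep most of the perturbation, while pixels with a low threshold are changed only slightly.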

Writing calligraphy with a robot

The research spans three areas: artistic creation, robotic automation, and artificial intelligence (AI). AI technology is used to model the writing style of a famous calligrapher, and a robotic arm performs the writing.
Calligraphy style transfer:
The calligraphy style transfer method is based on CycleGAN. We improve it by adding embedding layers to overcome the limitation that a single model can only convert to one style. By collecting wrist movements during writing, the robot can simulate the calligrapher's writing.
Generating stroke orders and robot trajectory:
To let the robotic arm reproduce the calligrapher's writing action, we thin the transferred calligraphy and convert the coordinates of the thinned images into six-axis data. The six-axis sequence data is then provided to the robot arm for writing the calligraphy characters.
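The conversion from thinned-image coordinates to six-axis data can be sketched as follows. This sketch assumes that "six-axis" means a tool pose (x, y, z, rx, ry, rz) with a fixed brush orientation; a real system may instead need joint angles from inverse kinematics. The scale, origin, and height values are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]                                  # (row, col) in the thinned image
Pose = Tuple[float, float, float, float, float, float]   # x, y, z, rx, ry, rz


def stroke_to_poses(
    stroke: Sequence[Pixel],
    mm_per_pixel: float,
    origin_mm: Tuple[float, float],
    pen_z_mm: float,
    tool_rotation: Tuple[float, float, float],
) -> List[Pose]:
    """Map ordered skeleton pixels of one stroke to planar tool poses."""
    ox, oy = origin_mm
    rx, ry, rz = tool_rotation
    return [
        (ox + col * mm_per_pixel, oy + row * mm_per_pixel, pen_z_mm, rx, ry, rz)
        for row, col in stroke
    ]


# Three ordered skeleton pixels of a single stroke (toy data).
stroke = [(0, 0), (0, 10), (5, 10)]
poses = stroke_to_poses(
    stroke,
    mm_per_pixel=0.5,                # assumed paper scale
    origin_mm=(200.0, -50.0),        # assumed paper corner in the robot frame
    pen_z_mm=10.0,                   # assumed brush height while writing
    tool_rotation=(180.0, 0.0, 0.0)  # assumed: brush pointing straight down
)
```

Varying `pen_z_mm` along the stroke would be one way to model brush pressure, which this sketch leaves constant.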

Incorporating attack information into makeup to attack deep learning models

Machine learning has evolved very rapidly, with good results in both computer vision and natural language processing. Many deep learning techniques are used in everyday life, such as autonomous vehicles and face recognition systems. The growing dependence of daily life on deep neural networks can lead to serious consequences, so the security of neural networks becomes very important. However, deep neural networks have obvious weaknesses. We propose a method based on a generative adversarial network that generates facial makeup images capable of deceiving a face recognition system. We hide the attack perturbation in the resulting makeup photos so that humans cannot detect it. The experimental results show that we can generate high-quality facial makeup images and that our attack achieves a high success rate against the face recognition system.

Using the Generative Adversarial Network (GAN) to generate music rhythm games

Music rhythm games are currently very popular, and we propose to generate rhythm game charts with a method based on the Generative Adversarial Network. The music is separated into two parts, the vocal and the soundtrack, which makes the generated chart closer to the real chart. The model combines two Generative Adversarial Network concepts: Conditional Generative Adversarial Nets (CGANs) for conditioning on music information and Improved Wasserstein GAN (WGAN-GP) for better convergence of the model.
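For reference, the critic loss of WGAN-GP in its usual form is given below, written with a condition c as in a conditional GAN. Treating the music information as the condition c is an assumption about how the two concepts are combined here, and the project's exact loss may differ.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x} \mid c)\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x \mid c)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x} \mid c) \rVert_2 - 1\big)^2\Big]
```

Here P_r is the real data distribution, P_g is the generator distribution, \hat{x} is sampled on straight lines between real and generated samples, and \lambda weights the gradient penalty.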

Be an Artist! Scribble Lines to Painting.

We propose a fully automated system that converts random scribbles into a painting. This is a serious challenge because the input scribbles can be very messy and hide multiple objects, so finding the correlation between these repeated lines and multiple objects is not a simple matter. The system uses selective search, sparse coding, and a Convolutional Neural Network (CNN): selective search finds the regions of the scribble that may contain an object, sparse coding finds the corresponding element, and the CNN applies the target style. The final experimental results show that our methods perform well and produce artistic works.

Clothing style analysis and popular element capture

As clothing and accessory styles multiply, consumers in both physical and online stores spend a lot of time looking for their favorite styles. If consumers can provide photos of outfits they like, the system can analyze them and find relevant information (such as the store address and matching accessories). Stores that collect their customers' preferred clothing styles can adjust their purchasing and in-store displays accordingly, and can recommend matching accessories based on consumer preferences, saving consumers the time of finding them. Garment manufacturers can analyze the data collected by the stores to learn which styles are popular and which are not, and use this as a reference for the next batch of new designs.

Deep Learning for Sensor-Based Rehabilitation

In this work, we aim to evaluate four kinds of rehabilitation exercises at three levels: good, average, and bad. We propose a novel evaluation method that learns the best feature of each class. The idea is to design an evaluation matrix in which each entry corresponds to one level of one exercise. By setting the largest number in one entry, the evaluation matrix can be used along with the output layer of the deep learning model to infer the best feature of that exercise at a particular level. The evaluation score is obtained from the distance between the current feature and the best feature of that class. We also collected a new rehabilitation exercise dataset for evaluation. It contains four different rehabilitation actions at three levels, defined by rehabilitation physicians.
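The scoring step can be illustrated with a small sketch. It assumes a Euclidean distance and an exponential mapping from distance to score; the source only states that the score comes from a distance measure, so both choices, the feature vectors, and the exercise name are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def evaluation_score(feature: Sequence[float], best: Sequence[float], scale: float = 1.0) -> float:
    """Map distance to a score in (0, 1]; 1.0 means identical to the best feature."""
    return math.exp(-euclidean(feature, best) / scale)


# Hypothetical "best features", one per (exercise, level) class.
best_features: Dict[Tuple[str, str], List[float]] = {
    ("arm_raise", "good"): [1.0, 0.0, 0.0],
    ("arm_raise", "average"): [0.0, 1.0, 0.0],
    ("arm_raise", "bad"): [0.0, 0.0, 1.0],
}

current = [0.9, 0.1, 0.0]  # feature of the exercise being evaluated (toy values)
scores = {
    level: evaluation_score(current, best)
    for (_, level), best in best_features.items()
}
```

A feature close to the "good" prototype scores near 1.0 for that level and lower for the others.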

Outdoor low-resolution face recognition

The goal of this project is to compare low-resolution face images and verify whether they show the same person. In unconstrained environments, the effectiveness of face recognition often decreases because of pose, so we establish a normalization method that corrects the face angle from any pose, thereby increasing the effectiveness of face recognition. The project uses two Caffe model architectures: Matching-Convolutional Neural Network (M-CNN) and Siamese Neural Network (SNN). The accuracy of the SNN model is more than 90%, which is higher than that of M-CNN.

Multiple-attribute image classification

When classifying face images, the images to be identified inevitably contain accessories such as sunglasses, scarves, and earrings, or are affected by external environmental factors such as light and angle. These accessories and environmental factors in a face image are called multiple attributes. We extend the existing Local Discriminant Embedding (LDE) algorithm to achieve multiple-attribute classification.

Sparse Coding

In recent years, sparse coding has been very popular in the field of computer vision and image processing. Sparse coding represents input data as a sparse linear combination of the atoms of a dictionary. It can be used for image denoising, restoration, and classification. The laboratory focuses on two research directions based on sparse coding: multiple-attribute image classification and sparse coding of huge amounts of data.
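As a concrete illustration of sparse coding, the sketch below finds a sparse code for one input with the iterative shrinkage-thresholding algorithm (ISTA), minimizing 0.5 * ||x - D a||^2 + lam * ||a||_1. ISTA is a standard solver chosen here for brevity and is not necessarily the one used in the laboratory's work. The dictionary, input, and `lam` are toy values.

```python
from typing import List

Matrix = List[List[float]]  # D[i][j] is entry i of dictionary atom j


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D: Matrix, x: List[float], lam: float, steps: int = 200) -> List[float]:
    """Approximate the sparse code a that minimizes 0.5*||x - D a||^2 + lam*||a||_1."""
    n_rows, n_atoms = len(D), len(D[0])
    # The squared Frobenius norm is an upper bound on the gradient's Lipschitz constant.
    lipschitz = sum(v * v for row in D for v in row)
    a = [0.0] * n_atoms
    for _ in range(steps):
        residual = [
            sum(D[i][j] * a[j] for j in range(n_atoms)) - x[i]
            for i in range(n_rows)
        ]
        grad = [
            sum(D[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_atoms)
        ]
        a = [
            soft_threshold(a[j] - grad[j] / lipschitz, lam / lipschitz)
            for j in range(n_atoms)
        ]
    return a


D = [[1.0, 0.0], [0.0, 1.0]]  # toy dictionary: two orthonormal atoms
x = [3.0, 0.2]                # toy input signal
code = ista(D, x, lam=0.5)
```

With this orthonormal toy dictionary, the small second component is driven to exactly zero while the large first component is kept (shrunk by `lam`), which is the sparsity effect the technique relies on.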