• Hi!
    I'm Yuwei

    Master of Computational Data Science at Carnegie Mellon University.

About Me


Download Resume

       Yuwei is currently a Master of Computational Data Science (MCDS) candidate at Carnegie Mellon University. In addition to maintaining an A-level GPA at one of the world's top computer science schools, Yuwei graduated with honors in Electronic Engineering from Tsinghua University, a top-ranked university in China.
       In terms of research and industry experience, she was a researcher at Tsinghua University and the University of Pennsylvania. Yuwei recently completed a Machine Learning and Relevance Engineer internship with the Artificial Intelligence Algorithms Foundation team at LinkedIn. There she carried out open-ended research and implemented from scratch the first company-wide scalable automated machine learning framework. A patent application was filed for it at the end of her internship, and it will be pushed to production in the next quarter.
       With strong skills in frontier machine learning and deep learning topics, including CV, NLP, AutoML, and multimedia, as well as in large-scale distributed learning systems, Yuwei is ready for a career at a top technology company.

School Work


The Master of Computational Data Science program at CMU equipped me with the skills and knowledge needed to develop the layers of technology involved in the next generation of massive information system deployments and to analyze the data those systems generate.

08/2018 - 12/2019 (Expected)
CGPA - 3.9/4.0
Relevant Coursework: Introduction to Computer Systems (A), Introduction to Machine Learning (A+), Large-Scale Machine Learning (A+), Topics in Deep Learning, Large-Scale Multimedia Analysis, Advanced Cloud Computing.

Carnegie Mellon’s School of Computer Science is widely recognized as one of the first and best computer science programs in the world.

The Bachelor's degree in Electrical Engineering at THU provided me with a thorough background in the fundamentals of electrical and computer engineering, as well as the opportunity for in-depth specialization in particular aspects of these fields.

08/2014 - 07/2018
Graduated with honors (~top 5%).
CGPA - 3.8/4.0
Relevant Coursework: Data Structure & Algorithms, Intro to Machine Learning, Media and Cognition, Operating System, Computer Architecture.

Tsinghua has been ranked as the best engineering and computer science school in the world based on factors including total research output and performance.


Work Experience

Machine Learning and Relevance Engineer Intern, LinkedIn
05/2019 - 08/2019

  • Mentored by Bee-Chung Chen (Distinguished Software Engineer) and worked with the AI Algorithms Foundation Team.
  • Implemented an auto-tuned neural network model with Bayesian Optimization and AdaNet from scratch, and deployed distributed model training on LinkedIn's 1.13TB Job You May Be Interested In dataset.
  • Increased AUC by 1.3% and reduced training time by a factor of 3 compared to the currently used LinkedIn GLMix model.
  • The model will be pushed to production as an online services backend framework in the next quarter.

Teaching Assistant at CIS, University of Pennsylvania
06/2017 - 08/2017

  • Summer Session: Visual Intelligence & ML, under Prof. Jianbo Shi.
  • Held recitations, office hours, oral presentations and final review sessions, and graded homework.
  • Designed problems and all test cases for programming assignments.

Research Intern at EE, Tsinghua University
12/2016 - 06/2017

  • Led a group in developing an interactive system using MATLAB and C++ for 1,280 sets of eye tracking experiments with over 1,000 candidates.
  • Proposed and implemented an unsupervised learning approach with Caffe to generate newly defined features.
  • Wrote a first-author paper on this work, accepted as an oral presentation at ICIG 2017.

Recent Work



LinkedIn, Sunnyvale, CA | 05/2019

  • In learning tasks, data and tasks come and go, but learning itself persists. Learning more effectively, with less trial and error, is a central goal of today's deep learning. Two main problems in so-called “end-to-end deep learning” are, first, the heavy demand for expert human effort and, second, the high cost of hyperparameter tuning and structure learning. Automatic tuning is the trend.
  • In this project, we explored various auto-tuning algorithms to shed light on future directions for different DNN models at LinkedIn. We also eliminated the costly heavyweight structure exploration problem in real-world auto-tuning services by introducing an adaptive structure learning algorithm.
  • To deploy the auto-tuning framework, we implemented asynchronous model training to avoid straggler effects and communication latency on LinkedIn's 1.13TB Job You May Be Interested In dataset.
  • We increased AUC by 1.3% and 3.1% and reduced training time by factors of 3 and 2 on the Job You May Be Interested In and People You May Know datasets, respectively, compared to the currently used LinkedIn GLMix model. The model will be pushed to production as an online services backend framework in the next quarter.

  • Graduate

    Carnegie Mellon University | 05/2019

  • We proposed a generative adversarial network based solution for transforming photos of real-world scenes into Chinese ink wash style images, a valuable and challenging task in computer vision and computer graphics.
  • Ink wash style paintings have three distinctive features: 1) smooth edges rather than clear contours, 2) arc-shaped sketches, and 3) rough color textures. Because of these features, existing methods do not produce satisfactory results.
  • Therefore, we proposed a newly defined edge-weakening adversarial loss to preserve smooth edges and an arc-prompting adversarial loss to maintain the arc shape of most of the lines.
  • Carnegie Mellon University | 04/2019

  • To train larger machine learning models despite the storage and computing limitations of individual workers, I used Spark and Hadoop MapReduce to build a distributed framework for machine learning training.
  • To avoid unnecessary memory consumption, the data samples are stored in a sparse format and join-based parameter communication is used heavily. Also, because shuffling is very costly in Spark RDD communication, I designed an optimized dataflow that shuffles parameters as little as possible.
  • After optimization, my framework could run logistic regression training for 3 iterations with 882,774,562 features within 60 mins, 5 iterations with 54,686,452 features within 60 mins, and 10 iterations with 20,216,830 features within 30 mins.
  • To schedule a group of jobs, I implemented a short-job-first policy in Go with Kubernetes to maximize utility.
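As a rough illustration of the short-job-first idea mentioned above, the following Python sketch orders jobs by estimated duration on a single worker. It is not the original Go/Kubernetes scheduler: the function name, job names, and durations are hypothetical, and it assumes every job is available at time zero and runs without preemption.

```python
import heapq
from typing import List, Tuple


def shortest_job_first(jobs: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (job name, completion time) pairs in shortest-job-first order.

    Assumption: all jobs arrive at time 0 and run one at a time on one worker.
    """
    # Min-heap keyed by estimated duration, so the shortest job is popped first.
    heap = [(duration, name) for name, duration in jobs]
    heapq.heapify(heap)

    clock = 0.0
    schedule = []
    while heap:
        duration, name = heapq.heappop(heap)
        clock += duration  # the job finishes after running for its full duration
        schedule.append((name, clock))
    return schedule


# Hypothetical jobs with estimated durations in minutes.
jobs = [("train-lr", 30.0), ("eval", 5.0), ("preprocess", 12.0)]
schedule = shortest_job_first(jobs)
```

Running short jobs first minimizes the average completion time for a fixed set of jobs, which is one common reason to choose this policy.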
  • Carnegie Mellon University | 02/2019

  • This framework aims to determine whether a pair of face images belongs to the same person. To achieve this, I built an end-to-end deep learning system to explore deep face embeddings. The training process is divided into two stages: a pre-task of face classification to train the feature extractor, and a post-task of face verification using angular loss.
  • For the face classification task, I used a set of vanilla convolutional networks as feature extractors and cropped the input data at different scales. I concatenated the highly abstract embeddings from these networks to form the face identity representation of the input. This model was trained with cross-entropy loss and achieved over 95% accuracy.
  • For the face verification task in the next stage, I removed the classifier layers of the classification model but kept the convolutional and residual blocks as feature extractors. I then embedded these layers into a Siamese network and trained it with angular loss. The final model achieved an AUC of 99% on the test set.
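To make the verification step concrete, here is a minimal Python sketch of how two face embeddings could be compared once a feature extractor has produced them. The cosine-similarity rule, the threshold value, and the function names are assumptions for illustration and are not taken from the original system.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(emb_a: Sequence[float], emb_b: Sequence[float],
                threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept the pair if the angle is small enough."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy 2-D embeddings; real embeddings would have hundreds of dimensions.
close_pair = same_person([1.0, 0.2], [0.9, 0.3])
far_pair = same_person([1.0, 0.0], [0.0, 1.0])
```

Sweeping the threshold over all values and plotting true positive rate against false positive rate gives the ROC curve from which an AUC is computed.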

  • Carnegie Mellon University | 03/2019

  • This framework combines LSTMs, CNNs, and a beam search decoder into a system for speech-to-text transcription. End-to-end, the system transcribes a given speech utterance into its corresponding transcript.
  • Intuitively, the Listener produces a high-level representation of the given utterance and the Speller uses parts of the representation (produced from the Listener) to predict the next word in the sequence.
  • The Listener consists of a Pyramidal Bi-LSTM network that takes in the given utterances and compresses them to produce high-level representations for the Speller network. Attention can intuitively be understood as trying to learn a mapping from a word vector to some areas of the utterance map.
  • To make the Speller robust to its own mistakes, the dataloader passes the generated chars/words rather than the true chars/words during training with some probability; in other words, teacher forcing (feeding the true chars/words) is applied only part of the time.
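The mixing of true and generated tokens described above can be sketched as follows. This is a simplified, framework-free illustration: the function name and the `teacher_forcing_rate` parameter are hypothetical, and in a real decoder the choice is made step by step inside the recurrent loop rather than over precomputed lists.

```python
import random
from typing import List, Sequence


def choose_decoder_inputs(truth: Sequence[str], predicted: Sequence[str],
                          teacher_forcing_rate: float,
                          rng: random.Random) -> List[str]:
    """Pick, per step, the true token or the model's own previous prediction.

    With probability `teacher_forcing_rate` the ground-truth token is fed to
    the decoder; otherwise the generated token is fed instead.
    """
    return [
        t if rng.random() < teacher_forcing_rate else p
        for t, p in zip(truth, predicted)
    ]


truth = list("hello")
predicted = list("hxllo")  # hypothetical model output containing one mistake
mixed = choose_decoder_inputs(truth, predicted, 0.9, random.Random(0))
```

A rate of 1.0 corresponds to pure teacher forcing, while lower rates expose the decoder to its own errors during training.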

  • Carnegie Mellon University | 06/2018

  • C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc and free.
  • I built a dynamic allocation system with segregated free lists and best-fit search.
  • This malloc package makes efficient use of space, with an average utilization of 74.4%, and achieved a throughput of 15735 Kops/sec on an Intel CPU @ 3.10GHz machine, against a benchmark of 16920 Kops/sec.
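The combination of segregated free lists and best-fit search can be modeled in a few lines of Python. This is only a toy model under stated assumptions: blocks are represented by their sizes alone, the size-class limits are hypothetical, and splitting, coalescing, and headers (which a real C allocator needs) are omitted.

```python
from typing import Dict, List, Optional


class SegregatedFreeList:
    """Toy model: free blocks are grouped into size classes (bins)."""

    def __init__(self, class_limits: List[int]) -> None:
        # Bin i holds blocks with size <= class_limits[i]; the last bin holds the rest.
        self.class_limits = sorted(class_limits)
        self.bins: Dict[int, List[int]] = {
            i: [] for i in range(len(self.class_limits) + 1)
        }

    def _class_index(self, size: int) -> int:
        for i, limit in enumerate(self.class_limits):
            if size <= limit:
                return i
        return len(self.class_limits)

    def free(self, size: int) -> None:
        """Return a block of the given size to its size class."""
        self.bins[self._class_index(size)].append(size)

    def malloc(self, size: int) -> Optional[int]:
        """Best fit: smallest adequate block in the first bin that has one."""
        for idx in range(self._class_index(size), len(self.bins)):
            candidates = [s for s in self.bins[idx] if s >= size]
            if candidates:
                best = min(candidates)
                self.bins[idx].remove(best)
                return best
        return None  # a real allocator would extend the heap here


allocator = SegregatedFreeList([32, 64, 128])
for block_size in (48, 64, 100, 256):
    allocator.free(block_size)
block = allocator.malloc(50)  # searches the 33..64 class first
```

Segregating by size keeps each search short, and best fit within a class limits wasted space, which is the usual trade-off between throughput and utilization.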
  • Carnegie Mellon University | 12/2018

  • By mining the Yelp dataset, we set out to answer two questions:
  • Firstly, what are the best locations for restaurants to make profits? There are many different types of restaurants in different places. We would like to know the best place to run a restaurant and which type of restaurant is most likely to be popular in a given region. Probability theory and estimation theory are the foundation of our analysis. We used Tableau and Google Fusion Tables to visualize the data and a gradient boosting regression trees model to predict the tags of each kind of food in a restaurant.
  • Secondly, what does the social network among Yelp users look like? We used the Union-Find algorithm to build the social graph and a Gaussian Mixture Model to cluster users into different types of elites. To visualize the graph and data, we used D3 and Google Fusion Tables.
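For readers unfamiliar with Union-Find, the sketch below shows how friendship pairs can be merged into connected groups. The user names and friendship list are made up for illustration, and the class is a generic textbook version rather than the code used in the project.

```python
from typing import Dict, Hashable


class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        """Return the representative of the group containing x."""
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        """Merge the groups containing a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Hypothetical friendship edges between users.
friendships = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
network = UnionFind()
for user_a, user_b in friendships:
    network.union(user_a, user_b)
```

After processing all edges, two users are in the same connected component exactly when `find` returns the same representative for both.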
  • Carnegie Mellon University | 04/2019

  • In this project, we explored the EIG dataset and additional datasets, such as the census data gathered by the American Community Survey, and built a machine learning system to predict whether a community will become distressed. Our proposed model is a machine learning framework that takes economic data as input and outputs distress scores.
  • We experimented with feedforward neural networks and recurrent neural networks. The recurrent model is able to detect communities trending toward economic distress with a state-of-the-art accuracy of 82% on a 10-year scale.
  • Our findings on real-world economic distress are twofold: first, by computing the correlation between the features and the predictions, we found that demographic information is not necessary; second, deeper networks that incorporate semantic information are essential for improving accuracy.

  • Undergraduate

    University of Pennsylvania | 07/2017

  • A GoPro camera directly captures person-object interaction, a significant departure from traditional passive observation by a third-person camera, and provides information about what the person sees in terms of object appearance and his/her specific action.
  • I designed an advanced LSTM combined with traditional Multi-View Stereo algorithms for sequence processing, which may work well for object detection and skill assessment.
  • Furthermore, I established a system for 3D context reconstruction from a 12GB data set of blurry, narrow egocentric videos.
  • This project received the Outstanding Undergraduate Research Award (~top 1%).
  • Tsinghua University | 04/2017

  • Emotion recognition is the process of identifying human emotion, most typically from facial expressions as well as from verbal expressions. Humans do this automatically, but computational methodologies have also been developed.
  • I built deep networks based on VGG frameworks on various datasets, including the VGG-Face dataset, the FER2013 public test set, the FER2013 private test set, and CK+.
  • My proposed method achieved a mean average accuracy of 92.4%, exceeding state-of-the-art frameworks.
  • Tsinghua University | 03/2018

  • In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes.
  • I developed residual learning models to concatenate deep neural networks including DPN and FPN.
  • Also, I created an Application Programming Interface (API), increasing precision by 2.2% and 1.3% compared to the state-of-the-art method on 1.5GB PASCAL VOC 2012 and 20GB MSCOCO 2014, respectively.
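The difference between multi-label and multiclass prediction described above can be shown with a small Python sketch. The logits, label names, and 0.5 threshold are hypothetical, and independent sigmoid outputs are only one common way to produce multi-label predictions, not necessarily the one used in this project.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_predict(logits: Sequence[float], labels: Sequence[str],
                       threshold: float = 0.5) -> List[str]:
    """Multi-label: each label is decided independently, so several can fire."""
    return [label for z, label in zip(logits, labels) if sigmoid(z) >= threshold]


def multiclass_predict(logits: Sequence[float], labels: Sequence[str]) -> str:
    """Multiclass: exactly one label, the one with the highest score."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


labels = ["person", "car", "dog"]
logits = [2.0, -1.0, 0.3]  # hypothetical network outputs for one image
present = multilabel_predict(logits, labels)
single = multiclass_predict(logits, labels)
```

For an image that contains both a person and a dog, the multi-label rule can report both, whereas the multiclass rule is forced to pick one.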
  • Tsinghua University | 05/2016

  • The pipelined CPU has five stages: instruction fetch, instruction decode, execute, memory access, and write back. We wrote a program in MIPS assembly language to calculate the greatest common divisor and designed a MIPS assembler to translate MIPS assembly code into binary machine code.
  • I designed and implemented a 32-bit pipelined MIPS CPU on an Altera FPGA using Verilog.
  • A serial port was embedded in the CPU to enable communication with a PC.
  • As a final result, two numbers can be input through the serial port, and the CPU runs a program that calculates their greatest common divisor and displays it on seven-segment displays.
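For reference, the greatest common divisor computation that the CPU runs can be expressed in a few lines of Python. This sketch uses the Euclidean algorithm with a remainder step, which is an assumption: the text does not say which GCD method the MIPS program used.

```python
def gcd(a: int, b: int) -> int:
    """Euclidean algorithm: replace (a, b) with (b, a mod b) until b is zero."""
    while b != 0:
        a, b = b, a % b
    return a


# Example: the two numbers that would be sent over the serial port.
result = gcd(48, 18)
```

On a CPU without a divide instruction, the remainder step is often replaced by repeated subtraction, which yields the same result.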
  • Tsinghua University | 04/2016

  • Vector graphics are computer graphics images that are defined in terms of 2D points, which are connected by lines and curves to form polygons and other shapes. Each of these points has a definite position on the x- and y-axis of the work plane and determines the direction of the path; further, each path may have various properties including values for stroke color, shape, curve, thickness, and fill.
  • I constructed three-dimensional Chinese characters using high-dimensional Bézier curves and B-splines.
  • These characters are texture-mapped with natural scene images using a homography.
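As background on the Bézier curves mentioned above, a point on a curve can be evaluated with de Casteljau's algorithm, sketched below in Python. The control points are made up, and the sketch covers only curve evaluation, not the character construction or texture mapping.

```python
from typing import List, Sequence, Tuple


def de_casteljau(control_points: Sequence[Sequence[float]],
                 t: float) -> Tuple[float, ...]:
    """Evaluate a Bézier curve at parameter t in [0, 1].

    Repeatedly interpolates between neighboring points until one point remains.
    Works for any number of control points and any dimension.
    """
    points: List[Tuple[float, ...]] = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


# Hypothetical quadratic curve in 2D: start, control, and end points.
control = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
midpoint = de_casteljau(control, 0.5)
```

The curve starts at the first control point (t = 0) and ends at the last (t = 1); intermediate control points pull the curve toward them without lying on it.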
  • Featured Work


    Hardness Prediction for Object Detection Inspired by Human Vision.
    Yuwei Qiu, Huimin Ma, Lei Gao. International Conference on Image and Graphics (ICIG 2017).

    A Human Visual System Inspired Database For Large Scale Vision Problems.
    Lei Gao, Huimin Ma, Yuwei Qiu. Journal of Images and Graphics (2017).

    HTML5 Bootstrap Template by colorlib.com

           Copyright © All rights reserved
           Yuwei Qiu | Last Updated: 08/27/2019