Computer Vision Intern
July 2020 – Present, Brooklyn, NY
- Applied recent computer vision research to improve personalization and image-based recommendations.
- Developed and trained a deep Siamese model that extracts aesthetic features from listing pictures to recommend style-matching items.
- Trained object detection and image segmentation models on Etsy’s datasets with Detectron2.
Skim AI Technologies, Inc.
Machine Learning Researcher (NLP)
October 2019 – Present, New York, NY
- Trained and deployed MobileBERT for document summarization on mobile devices, achieving a 5.5x speedup while retaining 95% of BERT-large’s performance.
- Trained deep learning models (Transformers, CNN, RNN-LSTM) for document classification and named entity recognition to extract information from legal documents, achieving a 0.95 F1 score and saving the client hundreds of hours of manual labeling.
- Pretrained Spanish BERT and ELECTRA models from scratch on 18 GB of text from the Open Super-large Crawled ALMAnaCH coRpus (OSCAR) using multiple GPUs on AWS EC2 instances, achieving state-of-the-art results on Spanish benchmarks.
- Vectorized documents with pretrained word vectors such as Word2vec, fastText and ELMo to feed classification algorithms.
- Regularly reviewed the latest NLP and CV papers, reported findings to the CTO and wrote code to reproduce research results.
Aurubis Buffalo, Inc.
Data Scientist (Capstone Project)
February 2020 – May 2020, Rochester, NY
- Developed a data preprocessing and regression pipeline to predict yield percentage when coils are processed into finished products. The resulting LightGBM regressor achieved 4% MAE, improving scheduling efficiency and lowering production and inventory costs.
- Performed data labeling, data exploration, data cleaning, feature engineering on coil history, and hyperparameter tuning with cross-validation to optimize the regressor's accuracy.
- Built a web app that deploys the pipeline and quickly generates yield predictions for future production.
Tax Technologies, Inc.
March 2019 – July 2019, Buffalo, NY
Provided technical support to Fortune 500 clients using Tax Series, TTI’s flagship product, an all-inclusive SaaS platform for global data collection, tax compliance and provision.
- Assisted with implementation engagements for new clients, including data collection, integration and setup in Tax Series.
- Conducted essential application diagnostics on client financial data, including periodically generating technical reports, maintaining data integrity and monitoring client databases.
- Conducted in-depth research on tax forms and e-file requirements in 32 states and four foreign countries, helping develop the annual enhancement release for Tax Series.
- Performed application testing to verify that the software worked as designed, logged technical reports and collaborated with software engineers to build an enhancement update for Tax Series.
- Proposed changes to the priority system and data visualization that improved the efficiency of support ticket reporting.
Niagara University - Academic Success Center
August 2018 – December 2018, Lewiston, NY
- Tutored nine students in statistics, business analytics and accounting courses.
- Helped students understand class materials, complete homework and build effective study strategies.
- Rated Excellent in all criteria by eight out of nine students.
Business Analytics Competition & Conference 2018 @ Manhattan College
Data Analytics Team Leader
February 2018 – May 2018, New York, NY
Over a three-month research project and a two-day hackathon, led a team of four students to discover insights from NYC and Boston government data sets and won the runner-up prize for best research poster among 18 participating colleges.
- Cleaned (missing data, outliers, duplicates) and integrated (merge, join, subset) large data sets (6 million records) of government spending, contracts and KPI metrics.
- Utilized Python and Tableau to perform exploratory data analysis and visualization on payroll distribution and minority-owned businesses’ participation in government contracts.
- Built and ran linear models in SPSS to determine socioeconomic factors affecting government spending.
- Applied statistical techniques to predict government KPI metrics, crime rate and education quality.
- Presented research findings before data scientists and Wall Street veterans.
University of Rochester
Master of Science in Business Analytics (STEM), 2020
- GPA: 3.96/4.00
- Coursework: Core Statistics, R Programming, Predictive Analytics with Python (Machine Learning), Causal Analytics with R (A/B Testing), Social Media Analytics (NLP), Database Management (SQL, Cypher), Big Data (Hive, Spark), Pricing Analytics
Bachelor of Business Administration in Accounting, 2019
- GPA: 3.99/4.00, summa cum laude
- Dean’s List (all attended semesters); Top 5 student in the Accounting department
- Coursework: Business Analytics, Linear Models, Management Information Systems, Econometrics
📝 Projects and Articles
I frequently publish articles on recent research in Natural Language Processing and on open-source projects that apply state-of-the-art AI technologies. Please visit my blog and portfolio for more details.
- Programming: Python (NumPy, Pandas, Scikit-learn, PyTorch, TensorFlow), Big Data (Spark, Hive), R, SQL, Cypher, MATLAB
- NLP: Named Entity Recognition, Sentiment Analysis, Machine Translation, Summarization
- Cloud Computing: AWS EC2, GCP
- Visualization: Tableau, Matplotlib, Seaborn
- Statistical Tools: SAS, SPSS, Excel
- Design and Video Editing: Adobe Photoshop, Lightroom, Premiere