I am ZengYan Liang, an algorithm programmer from China.
Age: 35
Education: Master's degree
Address: Haidian, Beijing
E-mail: Lzy_Lyx@163.com
Hometown: Qinghai
PROJECT SHOWCASE

I am currently a senior AI algorithm programmer at Yuanfudao and have worked in this field for 8 years. My AI project experience covers large language models, RAG, multimodal large models, computer vision, video algorithms, machine vision, natural language processing, visual perception, and AI engineering deployment. I first worked in this area as an undergraduate, on DQ QR code recognition, continued researching it throughout my graduate studies, and have a solid theoretical foundation.

The main achievements in recent years are:

  • From 2022 to 2024: I launched English essay correction and polishing (large language model), You Draw, AI Guesses (multimodal model), and homework beautification (vision); all are online and have been well received by teachers and students. I built a session cache service that reduces requests to third-party models and lowers the company's costs, and independently completed the Chinese essay knowledge-point service and a multi-agent RAG service for product manuals. I completed 12 algorithm invention patents in total, including 2 related to large models, 1 related to multimodal large models, and 2 original algorithm invention patents. During this period, I also built an industrial multimodal large model platform and fine-tuned its model (website: http://112.245.58.16:8852/), and built an automatic data annotation platform based on a sparse attention detection model (website: http://112.245.58.16:8851/);
  • 2022: I obtained the intermediate professional title in artificial intelligence issued by the Chinese Academy of Sciences and built a video quality analysis framework that performs static image quality analysis, classification of image quality enhancement bad cases, and intelligent selection of image quality enhancement strategies. During this period, I completed 1 original algorithm invention patent;
  • From 2020 to 2021: I completed a video frame interpolation algorithm on the cellphone, breaking through the limitation that cellphones could only interpolate video frames with dedicated chips and making software-only video frame interpolation possible. I completed 6 original algorithm invention patents, and the original video frame interpolation algorithm reached state-of-the-art level in tests on the relevant datasets.
  • In 2019: I completed automatic reading recognition of instrument pointers on a robot car. This original technology exceeded the accuracy of human-eye meter reading for the first time, bringing economic benefits to the cooperation between our company and Nanjing China Resources Gas Company and BASF. I completed 1 original algorithm invention patent.
  • In 2018: I independently completed the intelligent scanning SDK of the Alpha Note app. The app has been launched, bringing economic benefits to our company. During this period, I completed 1 original algorithm invention patent and led the team in developing an OCR system for converting PDF documents to Microsoft Word documents.

Professional Skills

  • Programming languages
    C, C++, Python, Java, C#, Shell, HTML
  • Major skills
    OpenCV, dlib, FFmpeg, LibTorch, Pillow, skimage
  • AI
    Frameworks: PyTorch, TensorFlow; deep learning & LLM: transformers, vLLM, diffusers, DeepSpeed, Faiss, pymilvus, openai, LangChain, LlamaIndex, AutoGen
  • Engineering
    Platforms: Linux, Windows, Android, ROS; databases: MySQL, SQLite; build: make, CMake; optimization & on-device: CUDA, ONNX, LibTorch, OpenCL, TensorRT, SNPE, NCNN, MACE; deployment: HTTP, RPC
  • Research
    Able to quickly reproduce and optimize code from papers, and to write technical invention patents and papers

Education & Job Experience

Education

2012 - 2015

Computer science and technology

Guizhou University, Master's degree

2007 - 2011

Electronic information engineering

Chengdu University of Information Technology, Bachelor's degree

Job

2022.11 - 2024.6

Senior AI Algorithm Development

Beijing Feixiang Planet Technology Co., Ltd.

2022.5 - 2022.11

Senior Video Algorithm Development

BeiJing Shopee information Technology Co., Ltd

2020.4 - 2022.5

Senior Computer Vision Algorithm Development

BeiJing XiaoMi Pinecone Electronics Co., Ltd

2019.2 - 2020.3

Senior Machine Perception Algorithm Development

BeiJing MouShi Technology Co., Ltd

2018.1 - 2019.1

Computer Vision Algorithm Development

BeiJing XinCheng Technology Co., Ltd

2017.3 - 2017.8

Face Recognition Algorithm Development

Guizhou HuaShang High Technology Co., Ltd

Project Experience

English composition correction and polishing (large model)

Introduction: English composition correction and polishing is an important student-side product of Feixiang Planet, used mainly to improve students' English writing ability; Work: research; data engineering (collection, cleaning, GT production, JSON generation); prompt engineering (scoring: content, coherence, vocabulary and grammar, structure; error correction: words, grammar, format; polishing: words, sentences; 7 versions in total, with positive and negative examples); model fine-tuning (base model llama2-7B-chat, PEFT: LoRA, MaxToken: 4K, Type: FP16); model optimization (LoRA merging, torch compilation, vLLM: pp=1, tp=2, gpu_util=0.9, swap_space=4G, max_token=4096, QPS of 2); generation (streaming output, greedy decoding); project deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100-32G, Mem: 32G)
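To illustrate the LoRA merging step listed above, here is a minimal sketch of the standard LoRA merge, W' = W + (alpha / r) * B A, written with plain Python lists. It is not the project's code: the function names, the toy matrices, and the `alpha` and `rank` values are assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), i.e. the adapter folded into W."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out_features x in_features
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example (hypothetical values): a 2x3 weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # rank x in_features
B = [[0.5], [-0.5]]     # out_features x rank
merged = merge_lora(W, A, B, alpha=2.0, rank=1)
```

After merging, inference needs only the single dense matrix, so the adapter adds no extra matrix multiplications at serving time.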

You Draw, AI Guesses (multimodal large model)

Introduction: You Draw, AI Guesses is one of the most popular products in the Feixiang Dual-Teacher Classroom. Students draw and the AI guesses the drawing in real time, which develops students' drawing skills and imagination and stimulates their interest in exploring artificial intelligence; Work: data engineering (acquisition: quick_draw; screening; image-text pair production); model optimization (base model clip-Vit-32; fine-tuning: the WiSE-FT method, with the backbone frozen and the linear layer fine-tuned); performance optimization (text vector database, vector library grouping, model warmup, QPS of 19.97); engineering deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100, Mem: 16G); scientific research results (algorithm patents: 1 item)
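WiSE-FT, as published, combines a zero-shot model and its fine-tuned version by interpolating their weights. The sketch below shows that interpolation on flat parameter lists. How the project applied it to the frozen backbone and the linear layer is not stated here, so the parameter names, values, and `alpha` are hypothetical.

```python
def wise_ft(zero_shot, fine_tuned, alpha):
    """Interpolate parameters: (1 - alpha) * zero_shot + alpha * fine_tuned."""
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    return {
        name: [(1 - alpha) * z + alpha * f
               for z, f in zip(zero_shot[name], fine_tuned[name])]
        for name in zero_shot
    }


# Hypothetical parameters of a linear classification head.
zero_shot = {"head.weight": [0.0, 2.0], "head.bias": [1.0]}
fine_tuned = {"head.weight": [4.0, 2.0], "head.bias": [3.0]}
blended = wise_ft(zero_shot, fine_tuned, alpha=0.5)
```

Setting `alpha=0` recovers the zero-shot weights and `alpha=1` the fine-tuned ones; values in between trade accuracy on the fine-tuning data against robustness on other data.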

Large model caching service

Introduction: The large model cache service is a product of the Feixiang AI platform that sits in front of large model applications (problem solving, composition, dual-teacher AI). Each request goes to the cache service before it reaches the large model; if the cache already holds an answer, it is returned immediately. The product mainly reduces what the company spends on overseas large models, cutting costs and increasing efficiency; Work: vector data framework construction (Faiss-GPU; create, save, retrieve, insert, delete); vector library creation (naming rules, text2vec-chinese as the vector encoder, FLAT index type, cache configuration for multi-node synchronization); session mechanism (the single-round strategy finds the corresponding response by ID; the multi-round strategy stores vectors in a time-series vector database according to the number of dialogue rounds); project deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100, Mem: 16G); scientific research results (algorithm patents: 2 items)
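The cache-before-model flow can be sketched as a brute-force vector lookup with a similarity threshold, which is what a FLAT index does conceptually. This sketch uses plain Python rather than Faiss; the class name, the 0.9 threshold, and the toy 2-dimensional vectors are assumptions, and real vectors would come from the text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class AnswerCache:
    """Exhaustive-search cache: return a stored answer if a query is close enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (vector, answer)

    def insert(self, vector, answer):
        self.entries.append((vector, answer))

    def lookup(self, vector):
        best_score, best_answer = 0.0, None
        for cached_vector, answer in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = AnswerCache(threshold=0.9)
cache.insert([1.0, 0.0], "cached answer")
hit = cache.lookup([0.99, 0.05])   # close to the cached question
miss = cache.lookup([0.0, 1.0])    # unrelated question, falls through to the model
```

On a miss, the caller would request the large model and insert the new question vector and answer into the cache.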

Large model RAG service

Introduction: The large model RAG service is used in products such as Chinese composition tutoring, Feixiang FAQ, and academic situation analysis. It retrieves from the company's private data to augment the prompt and then generates human-like answers for customers; Work: RAG framework construction (langchain; document creation, update, and saving; document splitter; vector library selection; vector storage; vector encoder); performance optimization (documents with the same ID are split into multiple segments, faiss-gpu, QPS of 20); engineering deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100, Mem: 16G)
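Splitting one document into several segments that keep the same document ID can be sketched as below. This is a plain character-window splitter, not the langchain splitter used in the service; the function name, field names, chunk size, and overlap are assumptions.

```python
def split_document(doc_id, text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks that all carry the same doc_id."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "doc_id": doc_id,
            "chunk": index,
            "text": text[start:start + chunk_size],
        })
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_document("manual-001", "abcdefghij", chunk_size=4, overlap=1)
```

Because every chunk keeps its `doc_id`, updating or deleting a document only requires finding the vectors that share that ID.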

Automatic annotation data platform (visual large model)

Introduction: A personal project that helped my previous company build a platform for automatically labeling industrial data. The platform improves labeling efficiency and reduces the cost of manual labeling; Work: data engineering (acquisition, cleaning, GT generation, diversity); model training (sparse-detr, data augmentation, data parallelism, fine-tuning, transfer learning); platform construction (backend: Tornado; middleware: RabbitMQ; background worker processes: Celery; frontend: HTML + jQuery)

Industrial inspection multimodal large model platform (multimodal large model)

Introduction: A personal project that helped my previous company build an industrial multimodal large model platform. The platform answers FAQs about problems encountered by industrial inspectors, describes industrial scenes, detects industrial targets, and supports company introduction and product promotion; Work: research; data engineering (collection, cleaning, image-text pair production, text pair production); model fine-tuning (base model Qwen-VL-Chat, DeepSpeed: ZeRO-2, PEFT: LoRA, MaxToken: 512, Type: FP16, data parallelism); model optimization (LoRA merging, torch compilation, QPS of 1.3); generation (safety filtering, streaming output, greedy decoding); project deployment (HTTP service: Gradio, CPU: 16 cores, GPU: A4000, Mem: 64G)

Homework beautification

Introduction: Homework beautification cleans up uploaded homework so that it is tidy and legible. It helps teachers correct the homework that students upload to the Feixiang homework system (Web, App). Beautification either beautifies the original homework or re-renders it; Work: original homework beautification: data engineering (acquisition, cleaning, GT generation, diversity); model training (U2Net for beautification, MPR for sharpening, data augmentation, data parallelism, fine-tuning); homework area extraction (morphological processing); model optimization (ONNX, TRT; beautification: 120 ms at FP32, 47 ms at FP16; sharpening: 230 ms at FP16); project deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100, Mem: 16G). Re-rendering and beautification: registration (RPC: QR code detection, KeyNet, affine transformation); answer text area extraction (morphological processing); homework rendering (rendering the answer area onto the annotated homework); scientific research results (algorithm patents: 5 items)

NER of educational subject knowledge points

Introduction: The named entity recognition project for educational subject knowledge points expands the knowledge points of the Yuanfudao Feixiang subject graph, making each subject's knowledge points more comprehensive and fine-grained, so that students in personalized learning can review against a complete and detailed set of knowledge points; Work: research on NER methods; data engineering (question collection, cleaning, GT production, JSON generation); prompt engineering (generating the corresponding knowledge points from questions); model fine-tuning (base model aton-7B-chat, PEFT: LoRA, MaxToken: 4K, Type: FP16); model optimization (LoRA merging, torch compilation, vLLM: pp=1, tp=2, gpu_util=0.9, swap_space=4G, max_token=4096, QPS of 2); generation (streaming output, greedy decoding); post-processing (deduplication, removal of knowledge points already in the graph); project deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100-32G, Mem: 32G)
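The post-processing step (deduplication, then dropping knowledge points the graph already contains) can be sketched as an order-preserving filter. The matching rule used here, case-insensitive comparison after trimming whitespace, is an assumption, as are the function name and the example knowledge points.

```python
def postprocess(generated, graph_points):
    """Keep each new knowledge point once, in order, skipping ones already in the graph."""
    existing = {point.strip().lower() for point in graph_points}
    seen, result = set(), []
    for point in generated:
        key = point.strip().lower()
        if key and key not in seen and key not in existing:
            seen.add(key)
            result.append(point.strip())
    return result


# Hypothetical model output and existing graph content.
new_points = postprocess(
    ["Metaphor", "metaphor ", "Simile", "Parallelism"],
    graph_points=["simile"],
)
```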

Estimating the number of recommended questions from personal knowledge point ability values

Introduction: This model belongs to Feixiang's personalized learning system and predicts how many questions to recommend for a knowledge point from the student's recent ability values on that knowledge point; Work: data engineering (cleaning, GT production); feature engineering (3 discrete features, 9 continuous features); model training (base model DeepFM, one-hot encoding of pushed questions, fine-tuning); performance optimization (GPU inference, QPS of 18.9); engineering deployment (RPC service, Console cloud platform, dual data centers, CPU: 8 cores, GPU: V100, Mem: 16G); scientific research results (algorithm patents: 1 item)
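DeepFM combines a deep network with a factorization machine (FM) component. As background, the sketch below computes the FM second-order interaction term with the usual linear-time identity and checks it against the direct pairwise sum. The feature values and embeddings are made up for the example; this is not the project's model.

```python
from itertools import combinations


def fm_pairwise(x, v):
    """sum over i<j of <v_i, v_j> * x_i * x_j, computed in O(n * k)."""
    total = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s_sq)
    return total


def fm_pairwise_naive(x, v):
    """Same quantity, summed directly over all feature pairs (O(n^2 * k))."""
    return sum(
        sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


x = [1.0, 0.0, 2.0]                          # hypothetical feature values
v = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]    # one 2-d embedding per feature
interaction = fm_pairwise(x, v)
```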

Dynamic car demo (generative large model)

Introduction: An exploratory project for the dual-teacher literacy classroom. Students photograph their hand-drawn cars, and the system generates cars in different colors and styles, turns them into GIF images, and casts them to the screen, where they move. I only built a demo for this project, and it did not work out; Work: photographing the hand-drawn car; cutting the car out of the photo; style and color rendering with a diffusion pix edit model and stable diffusion; a visual algorithm that applies random scratches at the bottom of the wheels; GIF generation; adding the car to a background image and moving it; scientific research results (algorithm patents: 1 item)

Prompt engineering

Introduction: Prompt engineering that I have completed across many large model applications; Work: projects include English composition correction and polishing, mathematical problem solving, academic situation analysis, Chinese composition tutoring (brainstorming, outline, writing), Chinese composition correction and polishing, the dual-teacher classroom AI teacher, and correction of questions with pictures and text; prompt engineering skills (role setting, simplicity, effectiveness, few-shot positive and negative examples)

video quality analysis and content understanding

Work: I built the Shopee Video quality analysis and content understanding framework in C++, completed the static analysis algorithm module SDK, and completed the sandwich detection and face detection algorithms in the content understanding module. For image definition evaluation, I trained a recent vision transformer model (MUSIQ) on the SPAQ dataset and fine-tuned it on a small amount of SPV data; the PLCC reached 93.2%. For SPV enhanced video, the model accurately captures bad cases of blurred texture after enhancement and of enhancement failure, and the latest model's recall for low-definition videos in the SPV definition test set reached 83%
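PLCC is the Pearson linear correlation coefficient between predicted quality scores and subjective scores. A minimal sketch of the plain coefficient follows; some image quality benchmarks fit a nonlinear mapping to the predictions first, and that step is not shown. The sample scores are invented.

```python
import math


def plcc(predicted, target):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(predicted)
    if n != len(target) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("PLCC is undefined when one list is constant")
    return cov / math.sqrt(var_p * var_t)


# Hypothetical model scores vs. mean opinion scores.
perfect = plcc([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
inverse = plcc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```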

low fps video frame interpolation

Work: frame interpolation for low-fps Shopee video. Because motion between frames is too large at low fps, I used the FILM algorithm to estimate and compensate for large-scale motion areas. So far I have reproduced FILM and tested it on Shopee 25 fps video; inference on an NVIDIA A100 averages 100 ms per frame, and the result is clearly effective on video samples with whole-area motion

live video enhancement

Work: I completed the no-reference image definition quality model. The live video enhancement pipeline decodes the live stream into RGB frames and analyzes each frame's definition; low-definition frames are then enhanced with the RESRGAN algorithm

video frame interpolation on cellphone

Work: cleaning the Vimeo90K, Adobe high-frame-rate, and YouTube datasets; training video frame interpolation with the AdaCoF algorithm; correcting and refining optical flow; detecting video scene changes; porting and optimizing the model on the cellphone with the SNPE and MACE frameworks; and implementing in OpenCL the ops that the cellphone does not support. To address AdaCoF's weakness on high-resolution video, I developed a video frame interpolation network that integrates multi-scale motion information (VFI-FMSMI). The algorithm reached state-of-the-art level in tests on the relevant datasets. For a detailed description of the algorithm, see: https://github.com/lzylyx/VFI_FMSMI
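Frame interpolation should not blend frames across a cut, which is why scene changes are detected first. The method used in the project is not described here; the sketch below shows one simple possibility, thresholding the mean absolute difference between two grayscale frames. The function names, the threshold of 40, and the tiny frames are assumptions.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def is_scene_change(frame_a, frame_b, threshold=40.0):
    """Flag a cut when consecutive frames differ more than the threshold."""
    return mean_abs_diff(frame_a, frame_b) > threshold


# Hypothetical 2x2 grayscale frames (values 0..255).
dark = [[10, 12], [11, 13]]
similar = [[12, 12], [10, 14]]
bright = [[200, 210], [190, 205]]
```

When `is_scene_change` is true for a frame pair, an interpolator can repeat the nearer frame instead of synthesizing an in-between one.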

magic sky

Work: cleaning the Xiaomi magic sky dataset, training the sky replacement model with the U2Net algorithm, and porting and optimizing the model on the cellphone with the SNPE framework. I developed context-aware learning for salient object detection (CLN-SOD) to solve U2Net's misclassification of the background. For a detailed description of the algorithm, see: https://github.com/lzylyx/CLN-SOD-

human body key point detection

Work: cleaning the COCO human keypoint dataset, training the human keypoint model with the HRNet algorithm, porting and optimizing the model on ROS with the TensorRT backend, implementing the keypoint post-processing pipeline on ROS with CUDA, TensorRT, and OpenCV, and completing the human keypoint detection SDK on ROS
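Heatmap-based keypoint models such as HRNet output one heatmap per joint, and the core of post-processing is locating each heatmap's peak and scaling it back to image coordinates. The sketch below shows that step in plain Python. The stride of 4, the function name, and the toy heatmaps are assumptions, and sub-pixel refinement is omitted.

```python
def decode_keypoints(heatmaps, stride=4):
    """Return (x, y, score) per heatmap, with x and y scaled to input pixels."""
    keypoints = []
    for heatmap in heatmaps:
        best_x, best_y, best_score = 0, 0, heatmap[0][0]
        for y, row in enumerate(heatmap):
            for x, score in enumerate(row):
                if score > best_score:
                    best_x, best_y, best_score = x, y, score
        keypoints.append((best_x * stride, best_y * stride, best_score))
    return keypoints


# Two hypothetical 2x3 heatmaps, one per joint.
heatmaps = [
    [[0.1, 0.2, 0.1], [0.0, 0.9, 0.3]],
    [[0.7, 0.1, 0.0], [0.2, 0.1, 0.0]],
]
points = decode_keypoints(heatmaps)
```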

gesture recognition

Work: cleaning Xiaomi gesture data, training the gesture detection model with the Yolov4 algorithm, porting and optimizing the gesture detection model on ROS with the ONNX and TensorRT backends, merging in the gesture recognition models to complete the hand Det+Rec framework, and completing the gesture Det+Rec SDK on ROS with CUDA, TensorRT, and OpenCV

video behavior recognition

Work: assisting the model trainers with frame extraction and sorting of the UCF101 dataset, porting and optimizing the video behavior recognition model on the cellphone with the MACE backend and verifying it, optimizing the C3D model, investigating two-stream and TSN methods, and integrating and reproducing existing open-source optical flow algorithms

indication recognition of instrument pointer meter

Work: labeling, sorting, and cleaning the pointer meter data from the gas station; training the meter detection model with the single-stage detection algorithm CorNet Lite; training the meter semantic segmentation model with the PSP algorithm; training the meter number recognition model with the E2E algorithm; integrating the detection, semantic segmentation, and number recognition modules into the automatic pointer meter reading framework; deploying the corresponding nodes on ROS; and testing on site at the gas station. The recognition accuracy of pointer readings is up to 98%
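Once the pointer and the scale end marks are located, a reading can be derived from the pointer angle. The conversion used in the project is not described here; the sketch below assumes a dial whose scale is linear in angle and interpolates between the two end marks. All names and numeric values are hypothetical.

```python
def pointer_reading(angle, min_angle, max_angle, min_value, max_value):
    """Map a pointer angle (degrees) to a reading on a linear dial scale."""
    if max_angle == min_angle:
        raise ValueError("scale end marks must be at different angles")
    fraction = (angle - min_angle) / (max_angle - min_angle)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the printed scale
    return min_value + fraction * (max_value - min_value)


# Hypothetical gauge: 0 at 45 degrees, 1.6 at 315 degrees.
reading = pointer_reading(135.0, min_angle=45.0, max_angle=315.0,
                          min_value=0.0, max_value=1.6)
```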

airport tray object detection

Work: installing and commissioning the RealSense D415, locating the tray's edge area from the D415 depth information, dividing the area inside the tray into local regions, computing local feature statistics, and detecting whether there are goods in the tray from the local features combined with the depth information inside the tray
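The depth part of the tray check can be illustrated as follows: count the pixels inside the tray that sit noticeably above the tray floor and compare that share with a threshold. This covers only the depth cue, not the local feature statistics, and the function name, thresholds, and the convention that a depth of 0 means "no reading" are assumptions.

```python
def tray_has_goods(depth_mm, floor_depth_mm, min_height_mm=15, min_fraction=0.05):
    """True if enough valid pixels are at least min_height_mm above the tray floor."""
    raised = total = 0
    for row in depth_mm:
        for depth in row:
            if depth <= 0:  # assumed marker for an invalid depth reading
                continue
            total += 1
            if floor_depth_mm - depth >= min_height_mm:
                raised += 1
    return total > 0 and raised / total >= min_fraction


# Hypothetical depth crops of the tray interior, in millimetres from the camera.
empty_tray = [[800, 801], [799, 800]]
loaded_tray = [[800, 760], [750, 800]]
```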

power station switch detection

Work: verifying the power station switch detection model and implementing the post-processing of the Yolov3 detection model in C++
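Post-processing for YOLO-style detectors typically ends with non-maximum suppression (NMS). The project implemented its post-processing in C++; the sketch below shows greedy NMS in Python for illustration only. Boxes are assumed to be `(x1, y1, x2, y2, score)` tuples, and the 0.5 IoU threshold is a common default rather than the project's setting.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: two overlapping boxes and one separate box.
detections = [(1, 1, 11, 11, 0.8), (0, 0, 10, 10, 0.9), (20, 20, 30, 30, 0.7)]
kept = nms(detections)
```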

airport runway area detection

Work: cleaning the runway data collected by the RealSense D415, training the runway semantic segmentation model with the DeepLabV3+ algorithm, extracting the 2D coordinates of the runway edge points, and handing the 2D coordinates to SLAM colleagues for post-processing
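One straightforward way to get 2D edge points from a binary segmentation mask is to take the leftmost and rightmost runway pixel in each image row. The project's extraction method is not specified here, so treat the sketch below, including its function name and toy mask, as an assumption.

```python
def edge_points(mask):
    """Return (x, y) of the leftmost and rightmost mask pixel in every row."""
    points = []
    for y, row in enumerate(mask):
        xs = [x for x, value in enumerate(row) if value]
        if xs:
            points.append((xs[0], y))
            if xs[-1] != xs[0]:
                points.append((xs[-1], y))
    return points


# Hypothetical 4x4 runway mask (1 = runway pixel).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
edges = edge_points(mask)
```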

pedestrian following

Work: studying the DeepSort algorithm and its principles, and running it on the ROS car.

sdk of intelligent scanning app

Work: developing the intelligent scanning module SDK of the Alpha Note app. The SDK includes SDK license validation, a document edge detection algorithm, grayscale and black-and-white scanning algorithms, a color scanning algorithm, a seal-retention scanning algorithm, and a collection of filter algorithms, packaged for Android and iOS. App page: https://apps.apple.com/cn/app/id1325527674 ; for the scanning results, see: https://github.com/lzylyx/Scan-SDK

pdf document to Microsoft word document

Work: converting PDF documents into images; detecting text in the images with the CTPN algorithm; segmenting the characters in each line and recording their positions; recognizing the segmented characters with a GB2312 level-1 character recognition model that a colleague trained with the DPNet92 algorithm; converting the recognized characters to txt and the txt to a Microsoft Word document; and testing 400 word file PDF document, with character recognition accuracy reaching 80%

face recognition with fluorite camera

Work: detecting faces with the MTCNN algorithm, aligning faces and extracting facial features with the dlib library, and recognizing faces with the FaceNet algorithm. Without occlusion or strong light interference, the camera recognizes company personnel with an accuracy of about 90%.

face expression recognition

School research project. The Yolo algorithm is trained to detect faces, dlib extracts 68 facial keypoints, and the keypoints are combined for classification; the facial expression is recognized from the classification result. Four expressions are recognized: angry, surprised, happy, and normal. Expression recognition accuracy on the Yale dataset exceeds 95%

enhancement and haze removal in traffic surveillance video

School research project. The goal is to improve imaging quality in traffic surveillance video with image enhancement algorithms. I first developed a local histogram statistics algorithm and then a fuzzy-domain edge contrast enhancement algorithm; the enhancement effect is obvious. For hazy conditions, a contrast-limited adaptive histogram equalization algorithm is applied first, followed by a new fast dehazing algorithm based on the dark channel prior and the atmospheric scattering model, combined with PCA. The dehazing effect is obvious, and the algorithm meets real-time processing requirements.
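For background, the classic dark channel prior recovers a haze-free pixel J from the atmospheric scattering model I = J·t + A·(1 − t), estimating the transmission as t = 1 − ω · dark(I / A). The sketch below implements only that textbook baseline on a tiny RGB image with values in [0, 1]. The fast variant and the PCA step of the project are not shown, and the airlight `A`, `omega`, `t_min`, and `radius` values are assumptions.

```python
def dark_channel(image, radius=1):
    """Per-pixel minimum over RGB channels and a (2*radius+1)^2 neighbourhood."""
    h, w = len(image), len(image[0])
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dark[y][x] = min(
                min(image[j][i])
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return dark


def dehaze(image, airlight, omega=0.95, t_min=0.1, radius=1):
    """Recover J = (I - A) / max(t, t_min) + A with t = 1 - omega * dark(I / A)."""
    normalized = [[tuple(c / a for c, a in zip(px, airlight)) for px in row]
                  for row in image]
    dark = dark_channel(normalized, radius)
    result = []
    for row, dark_row in zip(image, dark):
        out_row = []
        for px, d in zip(row, dark_row):
            t = max(t_min, 1.0 - omega * d)
            out_row.append(tuple((c - a) / t + a for c, a in zip(px, airlight)))
        result.append(out_row)
    return result


# Hypothetical 1x2 hazy image and a white airlight.
hazy = [[(0.5, 0.2, 0.9), (0.8, 0.7, 0.6)]]
clear = dehaze(hazy, airlight=(1.0, 1.0, 1.0), radius=0)
```

The lower bound `t_min` keeps the division stable in dense haze, where the estimated transmission approaches zero.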