AI Engineer

Overview

Lead the research and development of large-scale AI dialogue systems, utilizing advanced machine learning techniques to enhance conversational quality and reduce hallucination in outputs.

Qualifications

  1. Educational Background: Master's degree or above in Computer Science, Artificial Intelligence, Machine Learning, Natural Language Processing, or a related field. Candidates comfortable working in an English-speaking environment are preferred.
  2. Basic Competencies: Profound understanding of computer principles, with a solid foundation in data structures, algorithms, and system design capabilities.
  3. Framework Proficiency: Proficient in PyTorch, able to read and modify its core module source code, and familiar with mechanisms such as autograd, nn.Module, and distributed training.
  4. NLP Practical Experience: Project implementation experience in foundational NLP tasks such as information extraction, text classification, and Machine Reading Comprehension (MRC), capable of independently solving practical business problems.
  5. Core Technology Stack for Large Models:
    • Familiarity with mainstream open-source large model architectures (e.g., Qwen) and their core improvements: RMSNorm, SwiGLU, and RoPE (rotary positional embeddings).
    • Mastery of large model fine-tuning techniques: full-parameter fine-tuning, LoRA (able to hand-implement the low-rank bypass matrices and their fusion into the base weights), Adapter, Prefix-Tuning, and the three-stage RLHF (Reinforcement Learning from Human Feedback) process.
    • Proficiency in large model inference optimization: familiar with quantization (GPTQ/AWQ), model pruning, KV Cache, and deployment with vLLM/TGI.
  6. RAG and Engineering Capabilities:
    • Familiarity with vector retrieval algorithms (HNSW/IVF-PQ) and with the principles and tuning of vector stores (FAISS, Milvus, Pinecone).
    • Familiarity with core modules of LangChain/LlamaIndex (document_loaders, text_splitter, vectorstores, RetrievalQA), and capable of deep customization beyond their abstraction layers.
    • Possessing a "post-processing as a firewall" mindset, capable of designing rule engines or small discriminative models for deduplication, hallucination removal, and format standardization of LLM outputs.
  7. AI System Architecture Capabilities:
    • Leading the research, fine-tuning, and engineering implementation of large-scale pre-trained dialogue models, and constructing high-performance, low-latency online inference services.
    • Designing and implementing a "generation control hub" for dialogue systems, focused on addressing "hallucination" and "repetition" in large models through decoding strategy optimization (repetition penalty, top-p, temperature), post-processing filtering, and RAG enhancement, to ensure the accuracy, diversity, and information density of output content.
    • Constructing and optimizing an end-to-end RAG (Retrieval-Augmented Generation) architecture, from document loading → semantic chunking → vector embedding → ANN retrieval (HNSW) → re-ranking → prompt injection → LLM generation, with performance tuning and effect evaluation at each stage (a minimal sketch follows this list).
    • Owning the development and delivery of core NLP features, including but not limited to text generation, information extraction, intent recognition, topic discovery, and machine reading comprehension (MRC).
    • Independently completing the entire algorithm development process, from prototype design and coding through performance optimization, debugging, and production deployment, including bypassing abstraction frameworks like LangChain to operate directly on underlying vector libraries (FAISS/Milvus) and inference engines (vLLM/TGI).
    • Establishing a scientific evaluation system: designing quantitative metrics (such as ROUGE-L, MRR, Faithfulness Score) and manual evaluation criteria to continuously drive model and system iteration.
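
As a concrete illustration of the "post-processing as a firewall" mindset above, here is a minimal sketch of a rule-based output filter. It is a hypothetical example using only the standard library: difflib.SequenceMatcher stands in for a real similarity model, and the 0.9 threshold is illustrative; a production version might substitute Sentence-BERT embeddings and a small discriminative hallucination check.

```python
import re
from difflib import SequenceMatcher

def postprocess(raw: str, sim_threshold: float = 0.9) -> str:
    """Illustrative 'firewall' pass over raw LLM output: deduplicates
    near-identical sentences and normalizes whitespace."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw) if s.strip()]
    kept: list[str] = []
    for s in sentences:
        # Drop a sentence if it is nearly identical to one already kept.
        if any(SequenceMatcher(None, s.lower(), k.lower()).ratio() >= sim_threshold
               for k in kept):
            continue
        kept.append(s)
    return " ".join(kept)

print(postprocess("Paris is the capital. Paris is the capital. It is in France."))
```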
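
And a minimal sketch of the end-to-end RAG flow described above (chunk → embed → ANN retrieval → prompt injection), assuming the sentence-transformers and faiss packages are installed; the embedding model, toy corpus, and HNSW parameter are illustrative choices, and the re-ranking stage is omitted for brevity.

```python
import faiss
from sentence_transformers import SentenceTransformer

# Toy corpus standing in for loaded-and-chunked documents.
chunks = ["MSPBot supports ticket triage.", "Embeddings are stored in FAISS.",
          "Re-ranking uses a cross-encoder."]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
vecs = model.encode(chunks, normalize_embeddings=True).astype("float32")

index = faiss.IndexHNSWFlat(vecs.shape[1], 32)   # HNSW graph, M=32 neighbors per node
index.add(vecs)

query = "Where are embeddings stored?"
qvec = model.encode([query], normalize_embeddings=True).astype("float32")
_, ids = index.search(qvec, 2)                    # ANN retrieval step

# Prompt injection: retrieved chunks become grounded context for the LLM.
context = "\n".join(chunks[i] for i in ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
print(prompt)
```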

Technical Requirements

  1. PyTorch Deep Learning
    • Proficiency in PyTorch tensor computation and the automatic differentiation engine: capable of hand-writing a custom autograd.Function to propagate gradients through non-differentiable operations (see the straight-through sketch after this list), with practical experience in torch.compile graph optimization and torch.distributed parallel strategies.
    • Mastery of industrial-grade Transformer implementations in PyTorch: capable of building or modifying Decoder-only architectures from scratch with nn.Module, nn.Transformer, or the Hugging Face transformers library, with source-code-level understanding of the KV Cache mechanism, dynamic construction of attention masks (illustrated after this list), and positional-encoding injection points.
    • Proficiency in the complete pipeline of model training and fine-tuning:
      • Capable of designing and implementing mixed-precision training (AMP), gradient accumulation, and learning-rate warmup/decay schedules to maximize GPU utilization (a combined sketch follows this list).
      • Mastery of PyTorch native implementation of Parameter-Efficient Fine-Tuning (PEFT):
        • LoRA: Capable of hand-writing the low-rank bypass matrices (lora_A, lora_B) and implementing their dynamic fusion with the original weights (W = W0 + lora_B @ lora_A); see the module sketch after this list.
        • Adapter/Prefix-Tuning: Capable of inserting trainable parameter modules between FFN or Attention layers in Transformers and controlling their gradient flow.
      • Profound understanding of the PyTorch implementation of the three-stage RLHF process (SFT, Reward Modeling, PPO), capable of debugging reward hacking and training-collapse issues.
    • Possessing model inference optimization capabilities: familiar with the underlying scheduling logic of inference frameworks like vLLM/TGI, capable of reducing inference latency and costs for MSPBot through quantization (GPTQ/AWQ), model pruning, or ONNX Runtime deployment.
  2. Large Model Generation Control
    • Mastery of the "entropy control art" of decoding strategies: capable of precisely regulating the information entropy, novelty, and certainty boundaries of output text by dynamically adjusting sampling hyperparameters such as temperature, top_p, top_k, repetition_penalty, and frequency_penalty within the strategy space of Greedy/Beam Search/Nucleus Sampling, to eradicate the "repetition" issue.
    • Proficiency in a framework for attributing "hallucinations" to their distinct modes and counteracting them systematically:
      • Capable of distinguishing hallucinations caused by data-source inconsistency, contextual attention collapse, and OOD (out-of-distribution) cognitive boundaries.
      • Capable of designing RAG-Augmented Prompting, Constrained Decoding (such as CFG), or Faithfulness Scoring-based posterior verification for systematic governance.
    • Profound understanding of the decisive impact of data preprocessing on generation quality: capable of enhancing training data diversity through synonym replacement, back-translation enhancement, and template perturbation, and designing semantic boundary-based Chunking strategies to optimize RAG contextual injection.
  3. RAG Full-Link Architecture and Vector Retrieval Expertise
    • Leading the design of end-to-end RAG architectures, from Document Loader → semantic Chunking → Embedding Model fine-tuning → Query Rewriting → ANN retrieval (HNSW/IVF-PQ) → Cross-Encoder re-ranking → LLM generation, with the ability to quantitatively analyze performance bottlenecks and accuracy losses at each stage.
    • Mastery of the "index art" of vector retrieval: possessing tuning skills at the "muscle memory" level for the hierarchical search of HNSW graph structures, nonlinear impacts of M and ef_construction parameters on the recall-latency curve, and precision compensation strategies brought by PQ quantization.
    • Constructing the "gold standard" for RAG evaluation:
      • Retrieval layer: Hit Rate@K, MRR, NDCG (reference implementations follow this list).
      • Generation layer: Designing automated evaluation based on LLM-as-a-Judge, quantifying Faithfulness, Answer Relevance, and Context Precision.
  4. Engineering Implementation, Open-Source Collaboration, and Cutting-Edge Insight
    • Mastery of the "glue" philosophy and "anti-pattern" pitfalls of LangChain/LlamaIndex: capable of rapidly building MVPs based on them, and more importantly, bypassing abstraction layers to directly operate FAISS/Milvus and vLLM for extreme performance optimization.
    • Possessing an "post-processing as a firewall" engineering mindset: capable of designing post-processing modules based on rule engines, semantic clustering (Sentence-BERT), or small discriminative models for "triple purification" of LLM raw outputs: deduplication, hallucination removal, and format standardization.
    • Having foresight into the evolution of "RAG 2.0" approaches such as Self-RAG, CRAG, and Graph RAG, and the ability to evaluate their ROI in MSPBot business scenarios.
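
To ground the custom autograd.Function requirement, a minimal straight-through estimator sketch: torch.round has zero gradient almost everywhere, so the hand-written backward passes the upstream gradient through unchanged. This is one common pattern for non-differentiable operations, not the only one.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Straight-through estimator for torch.round, whose true gradient is
    zero almost everywhere and would block learning."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the upstream gradient through unchanged (the "straight-through" trick).
        return grad_output

x = torch.randn(4, requires_grad=True)
RoundSTE.apply(x).sum().backward()
print(x.grad)  # tensor of ones: gradient flowed through the non-differentiable op
```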
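
A small sketch of dynamic causal-mask construction, the mechanism behind Decoder-only attention masking: future positions are filled with -inf before the softmax so each token attends only to itself and earlier positions.

```python
import torch

T = 5  # sequence length
# Boolean upper-triangular mask: True marks the future positions to hide.
mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
scores = torch.randn(T, T)  # raw attention logits (toy values)
attn = scores.masked_fill(mask, float("-inf")).softmax(dim=-1)
print(attn[0])  # the first token attends only to itself
```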
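
A combined sketch of mixed-precision training, gradient accumulation, and linear learning-rate warmup, assuming a CUDA device and the widely used torch.cuda.amp API; the model, loss, and hyperparameter values are placeholders, not recommendations.

```python
import torch
from torch import nn

model = nn.Linear(512, 512).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()
warmup_steps, accum = 100, 4  # illustrative values

# Linear warmup to the base LR; a decay schedule would extend this lambda.
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda step: min(1.0, (step + 1) / warmup_steps))

loader = (torch.randn(8, 512, device="cuda") for _ in range(400))  # dummy data
for step, batch in enumerate(loader):
    with torch.cuda.amp.autocast():                # reduced-precision forward pass
        loss = model(batch).pow(2).mean() / accum  # scale loss for accumulation
    scaler.scale(loss).backward()                  # scaled backward avoids FP16 underflow
    if (step + 1) % accum == 0:                    # step once per `accum` micro-batches
        scaler.step(opt)
        scaler.update()
        opt.zero_grad(set_to_none=True)
        sched.step()
```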
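
A hand-rolled LoRA linear layer matching the fusion formula in the LoRA bullet above; the alpha/r scaling factor follows the standard LoRA convention and is folded into the merge step.

```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Minimal hand-written LoRA: freeze the base layer and learn a low-rank
    bypass, giving an effective weight W = W0 + (alpha / r) * lora_B @ lora_A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze W0 (and bias)
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scaling = alpha / r          # standard LoRA scaling convention

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        """Fuse the bypass into the base weight for zero-overhead inference."""
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)
        return self.base

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```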
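
A sketch of the decoding hyperparameters named above, via the Hugging Face generate API; gpt2 is used only to keep the example self-contained, and the specific values are starting points rather than tuned settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The service outage was caused by", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,          # nucleus sampling instead of greedy/beam search
    temperature=0.7,         # lower -> sharper distribution, more deterministic
    top_p=0.9,               # nucleus mass: smallest token set covering 90% probability
    top_k=50,                # additionally cap candidates at the 50 most likely
    repetition_penalty=1.2,  # down-weight already-generated tokens to curb loops
    max_new_tokens=40,
)
print(tok.decode(out[0], skip_special_tokens=True))
```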
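
A FAISS sketch of the HNSW knobs discussed above: M and efConstruction are fixed at build time, while sweeping the query-time efSearch parameter traces the recall-latency curve; all data here is synthetic.

```python
import faiss
import numpy as np

d = 128
xb = np.random.rand(10_000, d).astype("float32")  # synthetic corpus vectors
xq = np.random.rand(5, d).astype("float32")       # synthetic queries

# M sets graph degree (memory vs. recall); efConstruction sets the build-time
# beam width. Both trade recall against latency nonlinearly.
index = faiss.IndexHNSWFlat(d, 32)                # M = 32
index.hnsw.efConstruction = 200
index.add(xb)

# efSearch is the query-time beam width: raising it improves recall at the
# cost of latency.
for ef in (16, 64, 256):
    index.hnsw.efSearch = ef
    distances, ids = index.search(xq, 10)
    print(ef, ids[0][:3])
```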
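
Finally, reference implementations of the retrieval-layer metrics as plain Python functions; the document IDs and relevance grades are toy values.

```python
import math

def hit_rate_at_k(ranked_ids, relevant_id, k=5):
    """1.0 if the relevant document appears in the top-k results."""
    return float(relevant_id in ranked_ids[:k])

def mrr(ranked_ids, relevant_id):
    """Reciprocal rank of the first relevant hit; 0 if absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevance, k=5):
    """Graded-relevance NDCG: `relevance` maps doc_id -> gain."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

ranked = ["d3", "d1", "d7"]
print(hit_rate_at_k(ranked, "d1"), mrr(ranked, "d1"),
      ndcg_at_k(ranked, {"d1": 3, "d7": 1}))
```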