Senior Operations Expert (Full-Time) - Shanghai

TLDR

Design and implement high-concurrency, low-latency distributed architectures to support exponential business growth and ensure seamless AI interactions for millions of users.

Role Overview

You are the "architect" and "guardian" of Flowith’s global production environment. In this role, you are not just a firefighter handling outages but a cornerstone of exponential business growth. You will master the Cloudflare ecosystem and mainstream global cloud infrastructure to design and implement high-concurrency, low-latency distributed architectures. Through extreme performance optimization and a relentless pursuit of automation, you will ensure that millions of users worldwide always enjoy silky-smooth, stable AI interactions.

Key Responsibilities

  • Global Architecture Implementation: Design and manage cross-platform cloud-native architectures, driving multi-region deployment, elastic scaling, canary releases, and rapid rollbacks to ensure the efficient operation of global distributed applications.
  • Traffic & Performance Optimization: Lead the architectural design of managed caching and asynchronous messaging capabilities to smoothly handle hot-key caching, task decoupling, and traffic peak shaving.
  • High Availability & Continuity: Build and continuously optimize the observability system (SLI/SLO and alert governance). Develop and drill backup/recovery, disaster recovery failover, and emergency response mechanisms to safeguard business continuity.
  • Technical Vision & Empowerment: Participate in tech stack selection and architecture reviews for core business features, finding the optimal balance between reliability, security, cost, and maintainability.

Requirements

  • You build systems that never sleep and automate everything you touch.
  • Hardcore Operations Foundation: 5+ years of SRE/DevOps/Operations experience, battle-tested on systems serving millions to tens of millions of users. A solid foundation in Linux and networking (TCP/IP, DNS, HTTP/HTTPS, TLS), plus strong skills in troubleshooting complex issues.
  • Cloud-Native & Edge Master: Deep understanding of and proficiency with the Cloudflare ecosystem (CDN/WAF/DNS/Edge Computing), plus experience governing resources on mainstream overseas cloud infrastructure (compute, network, load balancing, storage, managed databases).
  • Automation & Monitoring Enthusiast: Proficient in building and maintaining Prometheus + Grafana monitoring systems. Expert in Terraform (or similar IaC) and mainstream CI/CD toolchains. Able to write handy operational tools in Shell/Python/Go.
  • Architectural Vision: Deep understanding of managed cloud caching and messaging systems (Serverless Redis, queues/event-driven architectures), and hands-on experience in security operations (least privilege, key management, access control, auditing).
  • Bonus: Experience in deploying underlying infrastructure for AI applications, or a strong passion for exploring how Agents/LLMs can empower intelligent operations (AIOps).

Benefits

  • Workspace, Culture & Lifestyle
    • Awesome Teammates: Work alongside a kind, creative, and hardworking crew of occasional "geeks" and visionaries.
    • Building the AGI Future: Participate in the in-house development of rapidly evolving AI agents and explore the future of AGI interactive interfaces.
    • Cool Offices in SH & SF: Enjoy our ultra-open workspaces with the ultimate freedom to seamlessly switch between our Shanghai and San Francisco locations.
    • Pet-Friendly Workplace: Bring your furry friends to work! Come play with our resident orange tabby and golden retriever mix, or bring your own pets to hang out.
    • Island Hackathons: Join our annual internal hackathons, where we select a new city or country each year for innovative coding sessions and team bonding.
    • Free AI Tools & Tech Gear: Enjoy free, unlimited access to cutting-edge AI tools, plus the latest tech equipment like Apple Vision Pro and FPV drones.
    • Tech Events: Regularly participate in top-tier global tech meetups and innovation showcases.
    • Parties & Events: Celebrate with monthly birthday bashes and annual milestone parties.
    • Free Snacks & Drinks: Stay fueled with an endless supply of your favorite beverages and unlimited complimentary snacks.
  • Work Arrangements
    • Flexible Working Hours: Customize your schedule by arriving at the office between 10 AM and 1 PM for a standard 8-hour workday, 5 days a week.
    • Remote Work & Care: Embrace a supportive hybrid work model, featuring 1 additional Work-From-Home (WFH) day per month exclusively for female employees.
  • Comprehensive Benefits Package
    • Competitive Compensation: Earn above-market pay, with an optional equity/stock option package.
    • Wellness Program: Take care of your body and mind with free gym access and monthly on-site professional massages.
    • Exclusive Swag & Perks: Receive holiday surprise gift boxes, premium custom company apparel (T-shirts, hoodies, and jackets), and occasional exclusive internal brand discounts.

Flowith builds an AI workspace that merges knowledge, creativity, and execution, helping users turn their ideas into various forms of media, from visuals to websites. Focused on the Japanese market, Flowith not only provides innovative tools but also fosters a community around human-AI collaboration, engaging users through social initiatives and events.