Role Overview
You are the "architect" and "guardian" of Flowith’s global production environment. In this role, you are not just a firefighter putting out outages, but the cornerstone supporting exponential business growth. You will master the Cloudflare ecosystem and mainstream global cloud infrastructure to design and implement high-concurrency, low-latency distributed architectures. Through extreme performance optimization and a relentless pursuit of automation, you will ensure millions of global users always experience silky-smooth and stable AI interactions.
Key Responsibilities
- Global Architecture Implementation: Design and manage cross-platform cloud-native architectures, driving multi-region deployment, elastic scaling, canary releases, and rapid rollbacks to ensure the efficient operation of global distributed applications.
- Traffic & Performance Optimization: Lead the architectural design of managed caching and asynchronous messaging capabilities to handle hot-key caching, task decoupling, and traffic-spike smoothing.
- High Availability & Continuity: Build and continuously optimize the observability system (SLI/SLO and alert governance). Develop and regularly drill backup/recovery, disaster recovery failover, and emergency response mechanisms to safeguard business continuity.
- Technical Vision & Empowerment: Participate in tech stack selection and architecture reviews for core business features, finding the optimal balance between reliability, security, cost, and maintainability.
Requirements
- You build systems that never sleep and automate everything you touch.
- Hardcore Operations Foundation: 5+ years of SRE/DevOps/Operations experience, battle-tested on systems serving millions to tens of millions of users. Solid foundation in Linux and networking (TCP/IP, DNS, HTTP/HTTPS, TLS), plus the ability to troubleshoot complex failures.
- Cloud-Native & Edge Master: Deep understanding of and proficiency with the Cloudflare ecosystem (CDN/WAF/DNS/Edge Computing), plus experience governing resources on mainstream overseas cloud infrastructure (compute, network, load balancing, storage, managed databases).
- Automation & Monitoring Enthusiast: Proficient in building and maintaining Prometheus + Grafana monitoring systems. Expert in Terraform (or similar IaC) and mainstream CI/CD toolchains. Able to write handy operational tools in Shell/Python/Go.
- Architectural Vision: Deep understanding of managed cloud caching and messaging systems (Serverless Redis, queues/event-driven architectures), and hands-on experience in security operations (least privilege, key management, access control, auditing).
- Bonus: Experience in deploying underlying infrastructure for AI applications, or a strong passion for exploring how Agents/LLMs can empower intelligent operations (AIOps).
Benefits
- Workspace, Culture & Lifestyle
  - Awesome Teammates: Work alongside a kind, creative, and hardworking crew of visionaries and occasional "geeks."
  - Building the AGI Future: Participate in the in-house development of rapidly evolving AI agents and explore the future of AGI interactive interfaces.
  - Cool Offices in SH & SF: Enjoy our ultra-open workspaces, with the freedom to switch between our Shanghai and San Francisco locations.
  - Pet-Friendly Workplace: Bring your furry friends to work! Come play with our resident Orange Tabby and Golden Retriever Mix, or bring your own pets to hang out.
  - Island Hackathons: Join our annual internal hackathons, where we select a new city or country each year for innovative coding sessions and team bonding.
  - Free AI Tools & Tech Gear: Enjoy free, unlimited access to cutting-edge AI tools, plus the latest tech equipment like Apple Vision Pro and FPV drones.
  - Tech Events: Regularly participate in top-tier global tech meetups and innovation showcases.
  - Parties & Events: Celebrate with monthly birthday bashes and annual milestone parties.
  - Free Snacks & Drinks: Stay fueled with an endless supply of your favorite beverages and unlimited complimentary snacks.
- Work Arrangements
  - Flexible Working Hours: Customize your schedule by arriving at the office between 10 AM and 1 PM for a standard 8-hour workday, 5 days a week.
  - Remote Work & Care: Embrace a supportive hybrid work model, featuring 1 additional Work-From-Home (WFH) day per month exclusively for female employees.
- Comprehensive Benefits Package
  - Competitive Compensation: Earn an above-market salary, with an optional equity/stock options package.
  - Wellness Program: Take care of your body and mind with free gym access and monthly on-site professional massages.
  - Exclusive Swag & Perks: Receive holiday surprise gift boxes, premium custom company apparel (T-shirts, hoodies, and jackets), and occasional exclusive internal brand discounts.