AI Platform Engineer
Job Description
We're committed to pushing boundaries and leaving our mark as we reinvent the future of ecommerce for enthusiasts.
Our customers are our compass, authenticity thrives, bold ideas are welcome, and everyone can bring their unique selves to work - every day. We're in this together, sustaining the future of our customers, our company, and our planet.

Join a team of passionate thinkers, innovators, and dreamers - and help us connect people and build communities to create economic opportunity for all.

About eBay AI Platform
At eBay, we are building a next-generation AI platform to power intelligent, AI-driven experiences across our global marketplace. The platform's control plane exposes the full ML lifecycle through the AI Hub developer portal, serving ML researchers, Applied Scientists, and Data Engineers across eBay's global organization with reliable, self-service tooling for experimentation, model management, and production deployment.
We focus on building the ML Platform Control Plane, which comprises the AI Metadata Service, Model Management System (MMS), Experiment Management System (EMS), and Deployment Service. We also build developer-facing tooling, including a Python SDK for the AI platform, Jupyter and Ray Workspace environments, the AI Hub portal built on React and Node.js, and production observability via standardized AI runtime metrics and monitoring dashboards.

About the Role
We are looking for an experienced Software Engineer specializing in AI Platform infrastructure and MLOps services to design, build, and operate the control plane that ties together eBay's entire ML ecosystem.
This is a high-impact, full-stack platform role where you will own both core backend MLOps services and the developer-facing AI Hub interface, ensuring every ML practitioner at eBay has reliable, efficient, and intuitive tools to build AI at scale.
You will work on the ML Platform Control Plane services (AI Metadata Service, MMS, EMS, Deployment Service), including the Experiment Management System built on MLflow and the Model Management System with Python SDK integration; Ray Workspace and JupyterHub notebook infrastructure; distributed tracing and observability across platform services; the AI Hub portal built on React and Node.js; and production monitoring dashboards, all integrated with GitOps-based CI/CD pipelines.

Key Responsibilities
Design and build the ML Platform Control Plane services, including the AI Metadata Service, Management Service, and Deployment Service.
Develop and operate the Experiment Management System (EMS) built on MLflow for experiment tracking, metrics, artifacts, and lifecycle governance.
Build and maintain the Model Management System (MMS), including model versioning, lineage tracking, stage transitions, and deployment gating.
Design and operate the AI Metadata Service to store and serve metadata across experiments, model versions, training runs, datasets, and ML pipelines.
Build and manage AI Workspace environments, including JupyterHub and Ray Workspaces on Kubernetes.
Implement distributed tracing and observability across ML Platform services using tools such as OpenTelemetry and Jaeger.
Design and build the AI Hub portal using React and Node.js.
Develop and maintain the Python SDK for the AI platform.
Build and maintain production monitoring and dashboards using Prometheus and Grafana.
Build and operate CI/CD pipelines for ML workflows and platform services using Argo CD and GitOps-based tooling.
Collaborate with ML researchers, Applied Scientists, and Data Engineers to improve developer workflows and platform usability.
Improve reliability, scalability, and developer experience across ML Platform control plane services.

What We're Looking For
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of experience building scalable distributed systems or platform engineering solutions.
Strong programming skills in Python and/or Java.
Proficiency in TypeScript and JavaScript for React and Node.js development.
Hands-on experience with MLOps services such as MLflow, Weights & Biases, or equivalent systems.
Experience designing and operating model management systems with versioning, lineage, and approval workflows.
Experience building metadata services and scalable data stores for ML platforms.
Hands-on experience with Jupyter Notebook and Ray Workspace environments.
Experience implementing distributed tracing across microservices and ML platform components.
Proficiency with React and Node.js for developer-facing web portals and internal tools.
Experience designing and building Python SDKs for platform consumption.
Strong expertise with monitoring and observability tooling such as Prometheus and Grafana.
Experience with Kubernetes, Docker, and GitOps-based CD tooling such as Argo CD.
Strong API design, debugging, and performance optimization skills.

Additional Details
eBay is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, veteran status, disability, or other legally protected status. If you have a need that requires accommodation, please contact us at [email protected]. We will make every effort to respond to your request for accommodation as soon as possible.
View our accessibility statement to learn more about eBay's commitment to ensuring digital accessibility for people with disabilities.

We use cookies to enhance your experience and may use AI tools for administrative tasks in the hiring process. To learn how we handle your personal data and use AI responsibly, please visit our Talent Privacy Notice, Privacy Center, and AI Hiring Guidelines.