Summary
Gainwell is seeking LLM Ops Engineers and ML Ops Engineers to join our growing AI/ML team. This role is responsible for developing, deploying, and maintaining scalable infrastructure and pipelines for Machine Learning (ML) models and Large Language Models (LLMs). You will play a critical role in ensuring smooth model lifecycle management, performance monitoring, version control, and compliance while collaborating closely with Data Scientists and DevOps teams.
Your role in our mission
Core LLM Ops Responsibilities:
- Develop and manage scalable deployment strategies tailored specifically for LLMs (GPT, Llama, Claude, etc.).
- Optimize LLM inference performance, including model parallelization, quantization, pruning, and fine-tuning pipelines.
- Integrate prompt management, version control, and retrieval-augmented generation (RAG) pipelines (a minimal retrieval sketch follows this list).
- Manage vector databases, embedding stores, and document stores used in conjunction with LLMs.
- Monitor hallucination rates, token usage, and overall cost optimization for LLM APIs or on-premises deployments.
- Continuously monitor model performance and ensure alerting systems are in place.
- Ensure compliance with ethical AI practices, privacy regulations, and responsible AI guidelines in LLM workflows.
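For context on the RAG work above, the following is a minimal, illustrative sketch of the retrieval step only, assuming FAISS and sentence-transformers are available; the documents, model name, and function names are hypothetical placeholders, not a prescribed stack.

```python
# Minimal sketch of the retrieval step in a RAG pipeline.
# Assumes `faiss-cpu`, `numpy`, and `sentence-transformers` are installed;
# the documents and model name below are hypothetical placeholders.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Claims must be submitted within 90 days of service.",
    "Prior authorization is required for imaging procedures.",
    "Members can update their address through the provider portal.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(docs, normalize_embeddings=True)

# Inner-product index over L2-normalized vectors = cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [docs[i] for i in ids[0]]

print(retrieve("How long do I have to submit a claim?"))
```

In a full pipeline, the retrieved chunks would be injected into the prompt sent to the LLM and versioned alongside the prompt template.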
Core ML Ops Responsibilities:
- Design, build, and maintain robust CI/CD pipelines for ML model training, validation, deployment, and monitoring.
- Implement version control, model registry, and reproducibility strategies for ML models (see the registry sketch after this list).
- Automate data ingestion, feature engineering, and model retraining workflows.
- Monitor model performance and drift, and ensure proper alerting systems are in place.
- Implement security, compliance, and governance protocols for model deployment.
- Collaborate with Data Scientists to streamline model development and experimentation.
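As an illustration of the model registry and reproducibility item above, here is a minimal sketch using MLflow; the training data, parameters, and registered model name are hypothetical and do not represent Gainwell's actual pipeline.

```python
# Minimal sketch: tracking a training run and registering the resulting
# model in the MLflow Model Registry. Assumes `mlflow` and `scikit-learn`
# are installed and a tracking server/registry is configured; the
# registered model name is a hypothetical placeholder.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

    # Log parameters and metrics so the run is reproducible and comparable.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_accuracy", clf.score(X, y))

    # Log the model artifact and create a new version in the registry.
    mlflow.sklearn.log_model(
        clf,
        artifact_path="model",
        registered_model_name="demo-classifier",
    )
```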
What we're looking for
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Strong experience with ML Ops tools (Kubeflow, MLflow, TFX, SageMaker, etc.).
- Experience with LLM-specific tools and frameworks (LangChain, LangGraph, LlamaIndex, Hugging Face, OpenAI APIs, and vector databases such as Pinecone, FAISS, Weaviate, ChromaDB, etc.).
- Solid experience deploying models in cloud (AWS, Azure, GCP) and on-premises environments.
- Proficiency in containerization (Docker, Kubernetes) and CI/CD practices.
- Familiarity with monitoring tools such as Prometheus and Grafana, and with ML observability platforms (a minimal metrics sketch follows this list).
- Strong coding skills in Python and Bash, and familiarity with infrastructure-as-code tools (Terraform, Helm, etc.).
- Knowledge of healthcare AI applications and regulatory compliance (HIPAA, CMS) is a plus.
- Strong skills in LLM evaluation and testing frameworks such as Giskard and DeepEval.
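For the monitoring item above, this is a minimal, illustrative sketch of exposing LLM-serving metrics for Prometheus to scrape using the prometheus_client library; the metric names, labels, and sample request are hypothetical.

```python
# Minimal sketch: exposing token-usage and latency metrics for Prometheus
# to scrape, using `prometheus_client`. Metric names, labels, and the
# sample request below are hypothetical placeholders.
import time
from prometheus_client import Counter, Histogram, start_http_server

TOKENS_TOTAL = Counter(
    "llm_tokens_total", "Total tokens consumed by LLM requests", ["model"]
)
REQUEST_LATENCY = Histogram(
    "llm_request_latency_seconds", "LLM request latency in seconds"
)

def record_request(model_name: str, tokens: int, duration_s: float) -> None:
    """Record one LLM request so dashboards can track cost and latency."""
    TOKENS_TOTAL.labels(model=model_name).inc(tokens)
    REQUEST_LATENCY.observe(duration_s)

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    record_request("gpt-4o", tokens=512, duration_s=1.3)
    time.sleep(60)  # keep the exporter alive for scraping in this demo
```

Grafana would then visualize these metrics from Prometheus, with alert rules on latency, cost, or drift signals.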
What you should expect in this role
- Fully Remote Opportunity: Work from anywhere in India.
- Minimal Travel Required: Occasional travel opportunities (0-10%).
- Opportunity to work on cutting-edge AI solutions in a mission-driven healthcare technology environment.
Job Type: Permanent