Lead AI Engineer
Kanerika Inc
Job Description
About the Role
We are looking for a Lead AI Engineer who blends deep technical expertise with engineering leadership. This is a builder-leader role: you will architect production-grade AI systems, write and review code on critical paths, and grow a team of engineers delivering Generative AI, agentic AI and applied machine learning solutions for enterprise clients.
You will own the AI architecture decisions that span data, models, orchestration, infrastructure and governance. You will partner with product managers, solution architects, client stakeholders and platform teams to translate business problems into reliable, cost-effective and responsible AI systems. You will also set the engineering bar through code reviews, design reviews, mentorship and hiring.
This is the right role for an engineer who has already shipped multiple AI systems in production, has scars from real-world failure modes, and is now ready to lead architecture and people while staying close to the code.
Key Responsibilities
1. Architecture and System Design
- Design end-to-end AI architectures spanning data ingestion, model serving, orchestration, observability and governance
- Make and document technology choices with clear trade-off analysis: open versus closed models, RAG versus fine-tuning, single-agent versus multi-agent, real-time versus batch
- Define non-functional requirements: latency budgets, throughput, cost per request, accuracy thresholds, uptime targets and graceful degradation paths
- Lead architecture review boards, write Architecture Decision Records (ADRs), and own the long-term technical roadmap for AI capabilities
- Design for scale, multi-tenancy, security and compliance from day one, not as an afterthought
- Evaluate emerging AI technologies and decide what to adopt, what to pilot and what to ignore
2. Hands-On Engineering and Delivery
- Build and ship production AI systems: GenAI applications, RAG pipelines, agentic workflows, fine-tuned models, classical ML services
- Write production-grade Python code, contribute to critical-path components, and stay credible as a senior engineer
- Review code and designs for the team, raise the engineering bar, and be the final reviewer for high-risk changes
- Drive evaluation and observability practices: golden datasets, regression suites, LLM-as-judge frameworks, tracing, metrics and dashboards
- Optimize for cost, latency and quality through caching, batching, quantization, model routing and prompt optimization
- Own production reliability for AI services: on-call rotation, incident response and post-mortems
3. Generative AI, Agents and Applied ML
- Lead the design of GenAI applications across LLMs such as Claude, GPT, Gemini, Llama and Mistral, including the decision logic for model selection per use case
- Architect RAG systems with rigorous retrieval evaluation, chunking strategies, hybrid search and reranking
- Design and implement agentic systems using frameworks like LangGraph, CrewAI, AutoGen, Claude Agent SDK or MCP-based architectures
- Drive fine-tuning, instruction tuning and adapter strategies (LoRA, QLoRA, PEFT) when use cases justify them
- Establish guardrails, prompt injection defenses, PII handling and red teaming as standard practice
4. Team Handling and People Leadership
- Lead a team of 4 to 8 engineers including AI engineers, ML engineers and data scientists
- Run sprint planning, design reviews, one-on-ones and quarterly performance conversations
- Set clear technical and career growth plans for each direct report, and actively coach them
- Hire, onboard and retain strong engineers; design and run technical interviews
- Foster a culture of ownership, learning, blameless post-mortems and shipping with quality
- Resolve technical disagreements with structured decision-making, not seniority
- Represent the team in cross-functional forums and shield the team from unnecessary thrash
5. Stakeholder and Client Engagement
- Partner with product managers and solution architects to translate ambiguous business problems into shippable AI scope
- Lead solution scoping, estimation and proposal work for AI engagements
- Present architectures, trade-offs and progress to executive and client audiences in plain language
- Manage stakeholder expectations on what GenAI can and cannot reliably do, and push back on hype-driven asks
- Contribute to pre-sales by shaping demos, POCs and reference architectures
6. AI Governance, Safety and Responsible AI
- Embed responsible AI practices: bias evaluation, explainability, model cards, data cards and audit trails
- Implement guardrails for content safety, PII protection and prompt injection mitigation
- Ensure compliance posture aligned with frameworks such as EU AI Act, NIST AI RMF, ISO 42001, SOC 2, HIPAA or GDPR as relevant to the engagement
- Define and own evaluation frameworks for safety, robustness and quality regressions
Required Qualifications
Experience
- 8 to 12 years of total software and data engineering experience
- 5+ years in applied AI/ML, with multiple systems shipped to production
- 2+ years of formal or informal team leadership, including direct mentoring of 3 or more engineers
- Demonstrated ownership of at least one AI system from design to production to scale
Technical Skills
Programming and Engineering Foundations
- Expert-level Python, with strong software engineering fundamentals: testing, code structure, performance and debugging
- Solid grasp of data structures, algorithms, concurrency and distributed systems concepts
- Production experience with FastAPI or equivalent, REST and gRPC API design
- Comfort with at least one of Java, Go or TypeScript for cross-team work
System Design and Architecture
- Strong system design skills covering high availability, horizontal scale, caching strategies, queueing, idempotency and consistency models
- Experience designing event-driven architectures using Kafka, Kinesis or equivalent
- Microservices and API design experience, including versioning, backward compatibility and SLA contracts
- Database design across SQL (PostgreSQL, MySQL) and NoSQL (DynamoDB, MongoDB, Redis)
- Vector database design and tuning: Pinecone, Weaviate, Qdrant, Chroma, FAISS, pgvector or Milvus
- Familiarity with security primitives: authentication, authorization, secrets management, network segmentation
AI and Machine Learning
- Deep experience with LLMs and GenAI: RAG, prompt engineering, fine-tuning, evaluation and cost optimization
- Hands-on experience with at least one agentic framework: LangGraph, CrewAI, AutoGen, Claude Agent SDK, OpenAI Agents SDK or Semantic Kernel
- Experience with classical ML: regression, classification, clustering, ensemble methods using scikit-learn, XGBoost or LightGBM
- Experience with deep learning frameworks: PyTorch (preferred) or TensorFlow
- Hands-on with LLM tooling: LangChain, LlamaIndex, Hugging Face, vLLM or Ollama
MLOps, LLMOps and Cloud
- Production MLOps experience: MLflow, Kubeflow, SageMaker, Vertex AI or Azure ML
- LLM observability and evaluation tooling: Langfuse, LangSmith, Arize, Helicone or equivalent
- Containerization and orchestration: Docker and Kubernetes in production
- CI/CD for ML: automated testing, model registry, canary and shadow deployments
- Strong cloud experience on at least one of AWS, Azure or GCP, including GPU-based workloads
- Infrastructure as code: Terraform, CloudFormation or Pulumi
Data Engineering
- Production data pipeline experience: Airflow, Dagster, Prefect or dbt
- Distributed data processing: Spark, Flink or Ray
- Data warehouse and lakehouse exposure: Snowflake, BigQuery, Databricks, Delta Lake or Iceberg
- Streaming systems: Kafka, Kinesis or Pub/Sub
Leadership and Soft Skills
- Proven ability to lead engineers through ambiguity, set direction and drive delivery
- Strong written and verbal communication, comfortable presenting to executives and clients
- Sound judgment on when to ship, when to refactor, and when to throw away
- Bias for action paired with disciplined risk management
- Experience with Agile delivery models, sprint planning and engineering metrics
Education
- Bachelor's or Master's degree in Computer Science, AI, Data Science, Mathematics, Engineering or related field
- Equivalent practical experience considered for exceptional candidates
Preferred Qualifications
- Experience leading AI engagements in a consulting or services environment
- Open-source contributions, technical blog posts, conference talks or patents in AI/ML
- Experience building reusable AI platforms or frameworks adopted across multiple teams
- Cloud certifications: AWS ML Specialty, Azure AI Engineer, GCP ML Engineer or Databricks ML
- Experience with Model Context Protocol (MCP), tool calling standards and emerging agent protocols
- Domain depth in BFSI, healthcare, retail, manufacturing or another regulated vertical
- Hands-on with cost optimization for LLM workloads at meaningful scale