OUR SECTORS

At European Tech Recruit, our sectors cover a wide range of industries within the field of technology.


Senior IT Engineer (Contractor) – AI Infrastructure Management

Recruitment Consultant: Simon Troupe
Posted: 6 hours ago

Senior IT Engineer – AI Infrastructure Management
Job Summary
We are looking for a highly skilled Senior IT Engineer to manage a large-scale AI development and training infrastructure.
The role involves overseeing GPU servers, Kubernetes clusters (Rancher), and storage systems to ensure seamless operations and optimized performance. You will collaborate with development teams, ensuring they have the resources and support needed to run their projects efficiently.
This is a critical technical position requiring expertise in Kubernetes, hardware management, and automation.
Key Responsibilities:

  • Kubernetes and Rancher Management: Configure, scale, and maintain Kubernetes clusters and Rancher for multi-cluster management, ensuring optimal performance and resource allocation.
  • GPU Resource Management: Manage GPU resources and servers, ensuring efficient resource scheduling, load balancing, and performance optimization for AI workloads.
  • Storage Management: Maintain and optimize large storage systems, ensuring high availability, performance, and data persistence.
  • DevOps and Automation: Implement CI/CD pipelines and automate infrastructure management using tools such as Terraform, Ansible, Jenkins, and GitLab CI.
  • Monitoring and Troubleshooting: Set up and manage monitoring and logging systems (e.g., Prometheus, Grafana, ELK) to ensure high availability and rapid issue resolution.
  • AI Framework Optimization: Collaborate with data scientists and AI developers to optimize AI frameworks (e.g., TensorFlow, PyTorch) for GPU and cluster environments.
  • Security and Access Management: Implement and manage role-based access control (RBAC) and ensure data security, encryption, and backup procedures are in place.
  • Team Support and Collaboration: Provide technical support and training to AI teams, ensuring smooth operations and effective use of infrastructure.

Person Specification:
Required:

  • Proven experience in managing large-scale Kubernetes clusters and containerisation technologies (e.g., Docker).
  • Strong understanding of GPU resource management and optimization for AI workloads.
  • Expertise in managing large storage systems and implementing data persistence strategies.
  • Proficiency in scripting and automation (Python, Bash, Go), with experience in infrastructure as code (IaC) using Terraform, Ansible, or similar tools.
  • Familiarity with deep learning frameworks (e.g., TensorFlow, PyTorch) and experience optimizing them for large-scale environments.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, and ELK.
  • Excellent communication and collaboration skills, with a proactive approach to problem-solving and supporting technical teams.

Desired:

  • Experience with Rancher or other Kubernetes management platforms
  • Experience in managing hybrid cloud environments
  • Red Hat Certified System Administrator (RHCSA) certification
  • Certified Kubernetes Administrator (CKA) certification
  • Mandarin speaker
Industry: AI & Machine Learning
Contract Type: Contract
Location: United Kingdom
Work Model: On-Site

Apply Now

By applying for this role, you acknowledge that we may collect, store, and process your personal data on our systems. For more information, please refer to our Privacy Notice.


Other relevant jobs (contract type, location, work model, posting age):

  • Senior IT Engineer (Contractor) – AI Infrastructure Management: Contract, United Kingdom, On-Site; posted 6 hours ago
  • GPU Algorithm Engineer – Image Quality Metrics: Contract, United Kingdom, On-Site; posted 1 day ago
  • AI Researcher – Image Quality Metrics: Contract, United Kingdom, On-Site; posted 4 days ago
  • AI Scientist: Permanent, France, On-Site; posted 4 days ago
  • Senior/Principal Engineer – Video Compression and International Standardization: Permanent, Germany, On-Site; posted 4 days ago
  • Chief AI Scientist – AI/ML Research: Permanent, France, On-Site; posted 4 days ago
  • AI Researcher: Contract, United Kingdom, On-Site; posted 4 days ago
  • Senior Software Engineer – Backend: Permanent, France, Hybrid; posted 4 days ago
  • UK Standardization & Industry Development Consultant: Contract, United Kingdom, On-Site; posted 4 days ago
  • Senior Research Scientist AI Theory: Contract, United Kingdom, On-Site; posted 4 days ago
  • Machine Learning Infrastructure Engineer: Permanent, United Kingdom, On-Site; posted 4 days ago
  • Software Engineering Manager: Permanent, Greece, On-Site; posted 4 days ago
  • Senior AI Processor Software & Hardware Co-design Engineer: Permanent, United Kingdom, On-Site; posted 4 days ago
  • Principal Systems Engineer: Permanent, United States, On-Site; posted 4 days ago
  • AI Accelerator Architect for Embedded Systems: Contract, Sweden, On-Site; posted 4 days ago
  • Machine Learning Resource for VLM: Contract, Sweden, Remote; posted 4 days ago
  • Knowledge Engineer: Permanent, Ireland, On-Site; posted 4 days ago
  • Observability – Principal Engineer: Permanent, Ireland, On-Site; posted 4 days ago