Job Description
DevOps Engineer (Observability Platform)
Key Responsibilities
- Plan and execute infrastructure projects that improve observability tools and platforms, including metrics, logging, distributed tracing, dashboarding, alerting, and application performance management.
- Build and enhance tools, frameworks, and pipelines for monitoring, logging, and tracing across VPbank's systems.
- Develop and maintain systems that enable our engineering colleagues to run their services confidently and with high quality, and to quickly understand and mitigate technical issues.
- Work with engineering colleagues to contribute to our observability strategy, with a strong emphasis on open-source libraries, data formats, and query languages.
- Work with other stakeholders to integrate observability best practices into the development lifecycle and improve overall system resilience.
- Participate in on-call rotations.
- Work with the team to define and implement observability standards (e.g., SLIs, SLOs, dashboards).
- Partner with our observability vendors.
Education
Bachelor's degree in Information Technology or Computer Science
Requirements
Candidate Requirements
- 3+ years of software engineering experience focused on platform engineering, observability, or site reliability engineering (SRE).
- Experience building and operating infrastructure at scale.
- Proficiency in programming languages such as Go, Python, Java, or similar.
- Hands-on experience with observability and monitoring tools such as OpenText, SolarWinds, Prometheus, Grafana, Loki, Jaeger, or OpenTelemetry.
- Strong knowledge of distributed systems, microservices architecture, and cloud platforms (e.g., AWS, GCP, Azure).
- Proficiency in CI/CD tools (Jenkins, GitLab, etc.).
- Experience administering Linux systems and writing scripts to support operations.
- Experience with Infrastructure-as-Code tools such as Terraform and Helm, and with automation tools such as Ansible.
- Experience with data tools such as Kafka, ClickHouse, SQL, and Redis.
- Ability to debug complex systems and design solutions that scale with high traffic and data volumes.