Intermediate Data Engineer (Remote CAN/UK)
Atreides
Company Overview: Atreides helps organizations transform large, complex, multi-modal datasets into information-rich geospatial data subscriptions that can be used across a wide spectrum of use cases. Atreides currently focuses on high-fidelity data solutions that enable customers to derive insights quickly.
We are a fast-moving, high-performance startup. We value a diverse team and believe inclusion drives better performance. We trust our team with autonomy, believing it leads to better results and job satisfaction. With a mission-driven mindset and entrepreneurial spirit, we are building something new and helping unlock the power of massive-scale data to make the world safer, stronger, and more prosperous.
Team Overview: We are a passionate team of technologists, data scientists, and analysts with backgrounds in operational intelligence, law enforcement, large multinationals, and cybersecurity operations. We obsess over designing products that will change the way global companies, governments, and nonprofits protect themselves from external threats and global adversaries.
Position Overview: We are looking for a skilled AWS Infrastructure/Data Engineer with a focus on Infrastructure as Code (IaC) and data engineering. You will design, implement, and maintain scalable, secure cloud infrastructure on AWS, and build and optimize the data pipelines and storage solutions that run on it. You will ensure the reliability and availability of infrastructure for data processing at petabyte scale, and collaborate closely with our infrastructure and data engineers to automate and streamline data processes.
Team Principles:
At Atreides, we believe that teams work best when they:
- Remain curious and passionate in all aspects of their work
- Promote clear, direct, and transparent communication
- Embrace the 'measure twice, cut once' philosophy
- Value and encourage diverse ideas and technologies
- Lead with empathy in all interactions
Responsibilities:
- Develop and maintain scalable data pipelines using Python, Spark, Sedona, and SQL (see the sketch after this list).
- Implement and manage Iceberg tables, ensuring efficient data storage and retrieval.
- Optimize data storage solutions across hot, cold, and archival (Amazon S3 Glacier) storage tiers.
- Develop and enforce data retention policies and ensure compliance with data governance standards.
- Collaborate with software engineers to ensure the infrastructure effectively supports application requirements.
- Ensure data security and implement necessary measures to protect sensitive information.
- Monitor and troubleshoot data pipelines and infrastructure to ensure high availability and performance.
- Document infrastructure designs and data engineering processes, and keep that documentation comprehensive and current.
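To give candidates a concrete feel for the day-to-day work, here is a minimal, illustrative sketch of the kind of pipeline this role owns: geospatial enrichment with Apache Sedona on Spark, persisted to an Apache Iceberg table. The catalog name ("lake"), warehouse path, and table names are hypothetical placeholders, and the sketch assumes the Sedona and Iceberg Spark runtime jars are already on the classpath.

```python
# Illustrative sketch only -- catalog, warehouse, and table names are
# placeholders; assumes apache-sedona and the Iceberg Spark runtime
# are available on the Spark classpath.
from sedona.spark import SedonaContext

config = (
    SedonaContext.builder()
    .appName("geo-enrichment")
    # Register a Hadoop-backed Iceberg catalog named "lake" (hypothetical).
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)
sedona = SedonaContext.create(config)  # registers Sedona's ST_* SQL functions

# Point-in-polygon join: tag each raw observation with the region it falls in.
enriched = sedona.sql("""
    SELECT o.id,
           o.observed_at,
           r.region_name,
           ST_Point(o.lon, o.lat) AS geom
    FROM lake.raw.observations o
    JOIN lake.ref.regions r
      ON ST_Contains(r.boundary, ST_Point(o.lon, o.lat))
""")

# Persist as an Iceberg table; Iceberg's snapshots and hidden partitioning
# keep storage and retrieval efficient as tables grow toward petabyte scale.
enriched.writeTo("lake.analytics.enriched_observations").using("iceberg").createOrReplace()
```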
Desired Qualifications:
- Experience with data orchestration tools such as Airflow, Prefect, Dagster, or Temporal.
- Proficiency in Spark, Scala, Golang, and Python.
- Experience with Postgres, GraphQL, and similar data storage and access tools.
- Knowledge of big data tools and environments, including Apache Iceberg and other datalake concepts.
- Familiarity with geospatial data formats such as Parquet/GeoParquet, GeoJSON, and Shapefiles (see the short example after this list).
- Excellent problem-solving skills and the ability to think quickly in a high-performance environment.
- Effective communication skills to convey technical concepts to both technical and non-technical stakeholders.
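As a small illustration of the formats mentioned above, the sketch below converts a dataset between GeoParquet, GeoJSON, and Shapefile using GeoPandas. The file paths are placeholders, and it assumes geopandas with a Parquet engine (pyarrow) is installed.

```python
# Illustrative only: moving between the geospatial formats named above.
# Paths are placeholders; assumes geopandas and pyarrow are installed.
import geopandas as gpd

# GeoParquet: columnar and compact, preserves CRS metadata -- a natural
# at-rest format in a data lake.
regions = gpd.read_parquet("regions.parquet")

# GeoJSON: human-readable interchange, handy for web maps and debugging.
regions.to_file("regions.geojson", driver="GeoJSON")

# Shapefile: legacy but still ubiquitous in GIS tooling (driver inferred
# from the .shp extension).
regions.to_file("regions.shp")
```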
Compensation and Benefits:
- Competitive salary
- Comprehensive health, dental, and vision insurance plans
- Flexible hybrid work environment
- Additional benefits such as flexible hours, work travel opportunities, competitive vacation time, and parental leave
While meeting all of these criteria would be ideal, we understand that some candidates may meet most but not all. If you're passionate, curious, and ready to "work smart and get things done," we'd love to hear from you.