Software Engineer - Infrastructure, Data vacancy at Quora
General overview of the role
Our data infrastructure team maintains, operates and expands the data ecosystem at Quora, which includes data warehousing, streaming infrastructure, a distributed cluster-computing framework, distributed query engines, messaging systems, data pipelines & automation tools. In this role you will contribute to different aspects of data pipeline development and to the operational stability of our production big data systems. We leverage existing open source technologies like Spark, Flink, Kafka, HDFS, HBase, Hive, Presto and Airflow, and also build our own systems for experimentation & time-series analysis. As a member of our team you will spend time designing and scaling our distributed data systems, working closely with other teams to identify and execute on new use cases, and evangelizing the correct use of data at the company. We are looking for someone who is excited by the prospect of optimizing, enhancing or even re-designing our company’s data architecture and pipelines to support our next-generation data initiatives.
Responsibilities
Design, implement, maintain and optimize data pipelines, architectures and data sets.
Collaborate with data scientists, platform engineers and business partners to understand data needs and drive key data infrastructure decisions.
Bring your expertise to help model structured & unstructured data. Own these data models at a high level & be a data consultant for partner teams.
Own the data definitions & lineage across different data platforms, and maintain systems of record for operational and non-operational data stores.
Engineer reusable capabilities, abstractions & resilience in data pipelines for DML, DDL, ETL & Data flows which can be leveraged across teams.
Be a data mentor & a team player with strong communication, prioritization, and adaptability skills.
Requirements
Ability to be available for meetings and impromptu communication during Quora's "coordination hours" (Mon-Fri, 9am-3pm Pacific Time). Members of our Infrastructure Engineering team are not required to work the full coordination hours, but should expect to be available Mon-Fri from either 11am-2pm or noon-3pm Pacific Time at minimum.
Proficiency in one or more of the programming languages Python, Java and Scala, plus strong query authoring skills in SQL.
Must have 2+ years of experience building data pipelines, including data ingestion, cleaning, processing, transforming, staging & loading.
Proficiency with big data processing frameworks: Spark, Flink, Hive, Hadoop, Kafka, EMR, Presto.
Operational mindset with the ability to handle problem diagnosis, root cause analysis, SLA compliance, performance tuning and incident management in data infrastructure.
Experience building data-intensive applications (high velocity/high volume).
Experience with SQL/NoSQL data stores & data lake operations.
Skills considered a plus
Flexible and positive team player with outstanding interpersonal skills.
Passion for Quora's mission and goals.
Hands-on experience with AWS technologies like S3, Redshift, EMR/EC2 and Athena, or with Snowflake.
Familiarity with designing and operating a streaming platform (e.g. Kafka, Flink, Spark).
Data wrangling & data tooling ability.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.