What you'll do...
Position: Data Engineer III
Job Location: 603 Munger Street, Dallas, TX 75202
Duties: Identifies possible options to address business problems through relevant analytical methodologies. Demonstrates understanding of use cases and desired outcomes. Supports the development of business cases and recommendations. Drives delivery of project activity and tasks assigned by others. Supports process updates and changes. Supports, under guidance, the resolution of business issues. Utilizes knowledge of data value chains (identification, ingestion, processing, storage, analysis, and utilization); data processes and practices; data modeling, storage, integration, and warehousing; data quality framework and metrics; regulatory and ethical requirements around data privacy, security, storage, retention, and documentation; business implications on data usage; data strategy; enterprise regulatory and ethical policies and strategies. Supports the documentation of data governance processes and the implementation of data governance practices. Utilizes understanding of the business value and relevance of data and data-enabled insights/decisions; appropriate application and understanding of the data ecosystem, including data management, data quality standards, data governance, accessibility, storage, and scalability; understanding of the methods and applications that unlock the monetary value of data assets. Understands, articulates, and applies principles of the defined strategy to routine business problems that involve a single function. Utilizes knowledge of functional business domain and scenarios; categories of data and where it is held; business data requirements; database technologies and distributed datastores (e.g. SQL, NoSQL); data quality; existing business systems and processes, including the key drivers and measures of success. Supports the understanding of the priority order of requirements and service level agreements.
Helps identify the most suitable source for data that is fit for purpose and performs initial data quality checks on extracted data. Utilizes data transformation and integration knowledge including: internal and external data sources, including how they are collected, where and how they are stored, and their interrelationships, both within and external to the organization; techniques like ETL batch processing, streaming ingestion, scrapers, APIs, and crawlers; data warehousing services for structured and semi-structured data, or MPP databases such as Snowflake, Microsoft Azure, Presto, or Google BigQuery; pre-processing techniques such as transformation, integration, normalization, and feature extraction, to identify and apply appropriate methods; techniques such as decision trees, advanced regression techniques such as LASSO methods, random forests, etc.; cloud and big data environments like EDO2 systems. Extracts data from identified databases. Creates data pipelines and transforms data to a structure relevant to the problem by selecting appropriate techniques. Develops knowledge of current data science and analytics trends. Utilizes data modeling, including cloud data strategy, data warehouse, data lake, and enterprise big data platforms; data modeling techniques and tools (for example, dimensional design and scalability), entity relationship diagrams, Erwin, etc.; query languages (SQL / NoSQL); data flows through the different systems; tools supporting automated data loads; artificial intelligence-enabled metadata management tools and techniques. Analyzes complex data elements, systems, data flows, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
Minimum education and experience required: Bachelor's degree or the equivalent in Computer Science or a related field plus 2 years of experience in software engineering or a related field; OR Master's degree or the equivalent in Computer Science or a related field.
Skills required: Must have experience with: Utilizing Google Dataproc clusters to pull all the data from the on-prem redundant data stores; Developing Spark code using PySpark or Spark Scala based on business requirements; Building CI/CD pipelines during deployments; Assembling large, complex data sets that meet functional/non-functional business requirements; Scheduling the jobs for the data pipelines using Airflow and crontabs; Developing data analytical processes using Hive; Developing automated scripts to run the jobs and alert mechanisms after each failed data load; Validating the data after loading into Hive tables/GCP BigQuery to ensure data quality and correctness; Utilizing Kafka topics to produce/consume the data into GCP buckets. Employer will accept any amount of experience with the required skills.
Wal-Mart is an Equal Opportunity Employer.
About Walmart
At Walmart, we help people save money so they can live better. This mission serves as the foundation for every decision we make, from responsible sourcing to sustainability and everything in between. As a Walmart associate, you will play an integral role in shaping the future of retail, tech, merchandising, finance and hundreds of other industries, all while affecting the lives of millions of customers all over the world. Here, your work makes an impact every day. What are you waiting for?
Walmart, Inc. is an Equal Opportunity Employer - By Choice. We believe we are best equipped to help our associates, customers, and the communities we serve live better when we really know them. That means understanding, respecting, and valuing diversity (unique styles, experiences, identities, abilities, ideas and opinions) while being inclusive of all people.
"The people you work with are ready to help without hesitation to make sure you succeed, it's just part of the culture." - Shelby, Project Analyst
All the benefits you need for you and your family