Cognizant is unable to provide employment sponsorship now or at any point in the future.
Business Understanding
Understand and define the data vision (strategic data requirements) based on business requirements, and translate the business requirements into technology requirements.
Provide high-level integrated designs to meet business requirements
ELT (Data Processing)
Understand the various data sources and source data structures
Understand data processing requirements – real time, near real time, or batch
Understand read patterns, write patterns, data usage, and dataset sizes to select the right data processing tools
Understand scalability, reliability, maintainability, and recoverability requirements
Define the source-to-target dataflow and ensure data security in the dataflow diagram by ensuring the data is secure at rest, in motion, and in use
Evaluate the schemas of the various data sources and select the right target data format (for analytical or transactional processing) to enable vectorized processing
Select the right compute and storage infrastructure to process data. Perform a POC to evaluate tools if necessary.
Define frameworks, standards, policies, and best practices for data processing
Collaborate with various stakeholders to get feedback on data processing
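The point above about columnar target formats enabling vectorized processing can be illustrated with a short, purely illustrative Python sketch; the records and field names are invented for the example:

```python
# Illustrative sketch (not production code): why columnar layouts such as
# Parquet enable vectorized processing. All data here is invented.

# Row-oriented layout: one record per dict.
rows = [
    {"order_id": 1, "amount": 120.0, "region": "EU"},
    {"order_id": 2, "amount": 75.5,  "region": "US"},
    {"order_id": 3, "amount": 240.0, "region": "EU"},
]

# Columnar layout (Parquet-style): one array per field.
cols = {
    "order_id": [1, 2, 3],
    "amount":   [120.0, 75.5, 240.0],
    "region":   ["EU", "US", "EU"],
}

# Row-at-a-time aggregation touches every field of every record.
total_rows = sum(r["amount"] for r in rows)

# Columnar aggregation scans a single contiguous array, the access
# pattern that engines can vectorize (SIMD, cache-friendly reads).
total_cols = sum(cols["amount"])

assert total_rows == total_cols == 435.5
```

The same contrast is why analytical queries that touch a few columns of wide tables favor columnar formats, while row-oriented formats suit transactional access to whole records.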
Data Lake
Define the folder structure based on the various subject areas and underlying modules
Define the data archival strategy (data lifecycle) based on business requirements
Classify data according to its sensitivity and define access controls
Define standards, policies, and best practices to store and organize data in the data lake
Get feedback from the various stakeholders/users of the data lake store
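As an illustration of the folder-structure duty above, here is a minimal sketch of one possible path convention (zone / subject area / dataset / ingest date). The zone names and the date-partition scheme are assumptions for the example, not a prescribed standard:

```python
from datetime import date
from pathlib import PurePosixPath

def lake_path(zone: str, subject_area: str, dataset: str, ingest: date) -> PurePosixPath:
    """Build a lake path like raw/sales/orders/ingest_date=2024-01-15.

    Zone names ("raw", "curated", ...) and the ingest_date=... partition
    folder are illustrative conventions, not a fixed standard.
    """
    return PurePosixPath(zone, subject_area, dataset,
                         f"ingest_date={ingest.isoformat()}")

p = lake_path("raw", "sales", "orders", date(2024, 1, 15))
assert str(p) == "raw/sales/orders/ingest_date=2024-01-15"
```

Encoding the partition key in the folder name (`ingest_date=...`) mirrors the Hive-style layout that engines such as Spark can use for partition pruning.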
EDW
Understand how data should be organized and managed
Work with the data modeler to define data models that meet the data vision of the organization
Identify the EDW solutions that match the scalability, reliability, recoverability, and maintainability needs. Perform a POC if necessary to select the right tools.
Classify data according to its sensitivity and define access controls
Define data mapping specifications and data lineage
Define standards, policies, and best practices
Get continuous feedback from the various stakeholders/users of the EDW solutions to make sure they match user expectations
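The mapping-specification and lineage duty above can be sketched in a minimal form; the table and column names here are entirely hypothetical:

```python
# Hypothetical shape for a source-to-target mapping spec with a simple
# lineage lookup. All system, table, and column names are invented.
mapping_spec = [
    {
        "source": "crm.customers.cust_nm",
        "target": "edw.dim_customer.customer_name",
        "transform": "TRIM + UPPER",
    },
    {
        "source": "crm.customers.cust_id",
        "target": "edw.dim_customer.customer_key",
        "transform": "surrogate key lookup",
    },
]

def lineage(target_column: str) -> list:
    """Trace a target column back to its source column(s)."""
    return [m["source"] for m in mapping_spec if m["target"] == target_column]

assert lineage("edw.dim_customer.customer_name") == ["crm.customers.cust_nm"]
```

In practice such specs live in a metadata store or lineage tool rather than code, but the source/target/transform triple is the core of the document being defined.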
Analytical
Understand the business use cases for analytics. Analyze and prioritize use cases based on data availability, schedules, and the current environment.
Understand usability, security, and stability requirements to select the right tools
Select technologies and tools for analytics by considering current and future needs. Perform a POC if required to select the right tools.
In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, RDD caching, and Spark MLlib
Expertise in using Spark SQL with various data sources such as JSON, Parquet, and key-value pairs
Experience in creating tables, partitioning, bucketing, loading, and aggregating data using Spark SQL/Scala
In-depth understanding of the Azure cloud, and of data lake and analytics solutions on Azure
In-depth understanding of database structure principles
Experience gathering and analyzing system requirements
Primary Skills: Hands-on proficiency in Spark development (PySpark)
Experience in designing and developing data pipelines using ETL solutions (Talend would be ideal)
Experience with Big Data ecosystem components (Hive, HDFS, etc.)
Experience with RDBMS systems (MySQL, SQL Server, etc.)
Experience with Agile Development methodologies
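The partitioning/bucketing experience listed above rests on one core idea: a row's bucket is the hash of the bucket column modulo the bucket count, so equal keys always co-locate. Below is a pure-Python sketch of that idea only; Spark itself uses Murmur3 hashing, and `zlib.crc32` stands in here purely for illustration (it is not byte-compatible with Spark's bucketing):

```python
# Pure-Python sketch of the idea behind Spark bucketing. NOT Spark code:
# Spark hashes with Murmur3; crc32 is an illustrative stand-in.
from collections import defaultdict
from zlib import crc32

def bucket_for(key: str, num_buckets: int) -> int:
    """Deterministic bucket id for a key (illustrative hash)."""
    return crc32(key.encode()) % num_buckets

def bucketize(rows, key_field, num_buckets):
    """Group rows by their bucket id, preserving input order per bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key_field], num_buckets)].append(row)
    return buckets

rows = [{"cust": c, "amt": a} for c, a in
        [("alice", 10), ("bob", 20), ("alice", 30)]]
buckets = bucketize(rows, "cust", 4)

# Both "alice" rows land in the same bucket; co-locating equal keys is
# what lets Spark avoid a shuffle when joining identically bucketed tables.
alice = [r["amt"] for r in buckets[bucket_for("alice", 4)]
         if r["cust"] == "alice"]
assert alice == [10, 30]
```

The same modulo-hash placement is why two tables bucketed by the same column into the same number of buckets can be joined bucket-to-bucket.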
Strong knowledge of and working experience with SQL and Databricks
Strong understanding of the underlying architecture of data components (access controls/configurations/performance blockers)
Working experience with Azure Data Factory, Azure Data Lake Store, Azure Synapse, and Databricks
Hands-on experience with PySpark, the DataFrame API, and the SQL API
Very good understanding of Spark and the Hadoop MapReduce framework
Hands-on experience with data warehousing
Prior experience in performance tuning for big data workloads (Spark or MapReduce framework)
Prior experience handling structured and unstructured data
Strong knowledge of release management for Azure components
Must understand process control and change control.
Must be able to provide platform solutions/guidance on various cloud services, database technologies, and pipeline structures from the ingestion layer to the consumption layer.
Nice To Have:
Awareness of data security, DMZs, encryption mechanisms, VPCs, etc.
Hands-on experience in building DevOps pipelines
Awareness of DataOps
Technical Skills
SNo   Primary Skill            Proficiency Level *   Rqrd./Dsrd.
1     AIA-Project Management   PL1                   Required

Domain Skills
SNo   Primary Skill               Proficiency Level *   Rqrd./Dsrd.
1     Applying cash to invoices   NA                    Required
* Proficiency Legends
Proficiency Level – Generic Reference
PL1 – The associate has basic awareness and comprehension of the skill and is in the process of acquiring this skill through various channels.
PL2 – The associate possesses working knowledge of the skill, and can actively and independently apply this skill in engagements and projects.
PL3 – The associate has comprehensive, in-depth and specialized knowledge of the skill. He/she has extensively demonstrated successful application of the skill in engagements or projects.
PL4 – The associate can function as a subject matter expert for this skill. The associate is capable of analyzing, evaluating and synthesizing solutions using the skill.