Engineering your Spark application for testing
Advice on how to design and build your Apache Spark application for testability
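A minimal sketch of the idea (not the article's own code), assuming PySpark and a hypothetical reviewText column: keep transformations as pure functions over DataFrames, so they can be unit-tested against an in-process local SparkSession with no cluster, scheduler, or external storage involved.

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def flag_long_reviews(reviews: DataFrame, min_words: int = 50) -> DataFrame:
    # Pure transformation: no I/O and no global state, so it is easy to test.
    return reviews.withColumn(
        "is_long", F.size(F.split(F.col("reviewText"), r"\s+")) >= min_words
    )

def test_flag_long_reviews():
    # Unit test against a local, in-process session.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("short text",)], ["reviewText"])
    out = flag_long_reviews(df, min_words=3)
    assert out.collect()[0]["is_long"] is False

Because flag_long_reviews takes and returns a DataFrame and performs no I/O, the test needs only local[1] and a tiny in-memory DataFrame, which is the essence of designing a Spark application for testability.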
Problem Statement: Design a scalable pipeline using Spark that reads customer reviews from an S3 bucket and stores them in HDFS. Schedule the pipeline to run every hour. Create a folder in the S3 bucket where customer reviews in JSON format can be uploaded. The scheduled big-data pipeline will be triggered manually or automatically to read data from the S3 bucket and dump it into HDFS. Use Spark machine learning to perform sentiment analysis on the customer reviews stored in HDFS. Data: you can use any customer review data from online sources such as UCI.
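A minimal PySpark sketch of this pipeline under the stated requirements; the bucket name, HDFS paths, and the reviewText and label column names are illustrative assumptions, not given in the problem statement.

# Sketch: hourly ingest of JSON reviews from S3 into HDFS, then a simple
# sentiment-analysis pipeline with Spark ML. Bucket, paths, and column
# names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("review-ingest").getOrCreate()

# 1. Read newly uploaded JSON reviews from the S3 upload folder.
reviews = spark.read.json("s3a://my-reviews-bucket/reviews/")

# 2. Dump the raw reviews into HDFS; an external scheduler re-runs
#    this job every hour.
reviews.write.mode("append").parquet("hdfs:///data/reviews/")

# 3. Train a basic sentiment model on labeled reviews in HDFS
#    (assumes a numeric "label" column is available for training).
labeled = (spark.read.parquet("hdfs:///data/reviews/")
           .dropna(subset=["reviewText", "label"]))
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="reviewText", outputCol="words"),
    HashingTF(inputCol="words", outputCol="rawFeatures"),
    IDF(inputCol="rawFeatures", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(labeled)
model.write().overwrite().save("hdfs:///models/sentiment")

The hourly cadence would come from an external scheduler (for example, a cron entry or an Airflow DAG invoking spark-submit); parameterizing the S3 prefix by hour would keep each run from re-reading earlier uploads.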
Design, develop, and maintain a BI solution using Redshift, Tableau, and Alteryx. Optimize the Redshift data warehouse by implementing workload management, sort keys, and distribution keys.
Required Skills:
- 4+ years of experience with databases such as Oracle and MS SQL Server
- 1+ years of hands-on experience with Amazon Redshift architecture (must have)
- 1+ years of AWS pipeline knowledge to develop ETL, preferably using AWS Glue, for data movement to Redshift
- Strong knowledge of the AWS environment and services, with an understanding of S3 storage
- Strong knowledge of multiple cloud technologies, including VPC, EC2, S3, Amazon API Gateway, DynamoDB, SimpleDB, and AWS Route 53
- Understanding of the nuances of moving data from RDBMS sources to a columnar database (Redshift)
- Hands-on experience with Data Warehouse (DW) and BI tool...
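As a hedged illustration of the sort-key and distribution-key point: a hypothetical fact table created through psycopg2, where the cluster endpoint, credentials, and table definition are all invented for the example.

# Sketch only: a Redshift table with an explicit distribution key and
# sort key. Every identifier and connection parameter here is hypothetical.
import psycopg2

DDL = """
CREATE TABLE sales_fact (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- co-locate rows that join on customer_id
SORTKEY (sale_date);    -- speed up range scans filtered on date
"""

conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="admin", password="...")
with conn, conn.cursor() as cur:
    cur.execute(DDL)

Choosing the join column as DISTKEY reduces data shuffling across nodes, and a SORTKEY on the common filter column lets Redshift skip blocks during scans, which is the substance of the optimization the posting describes.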