A seasoned Python developer who can deliver projects end to end.
• Developed ETL flows in a job- and file-based system, handling data movement and updating databases via defined pipelines (Iceberg, Parquet, etc.).
• Developed multiple microservices for updating job status, handling requests, and scheduling tasks using Spring cron and Kubernetes CronJob schedules (see the scheduling sketch after this list).
• Collaborated on pipeline design and implementation, triggering Spark jobs for data enrichment.
• Submitted Spark jobs through a Kubernetes pipeline (see the SparkLauncher sketch after this list).
• Implemented Testcontainers with LocalStack for local Docker-based testing and ephemeral environments (see the LocalStack sketch after this list).
• AWS Certified Cloud Practitioner.
• Managed data in a multi-tenant environment.
• Loaded roughly 100 million records with approximately 1,600 columns in 45-50 minutes.
• Took ownership of completing POCs for new tech stacks and advised application owners on the pros and cons.
• Designed and implemented a reporting framework that gives users the flexibility to schedule their own reports.
• Responsible for UI screens covering functionality such as report schedules and report destinations (FTP locations or email).
• Proposed and implemented a dashboard showing all reports distributed to different customers/clients along with their statuses; any failed job can be re-triggered from there.
• Implemented services that serve data as JSON to downstream systems (see the REST endpoint sketch after this list).
• Implemented forms where clients can post their details, choices, priorities, etc., which are used to drive business improvements.
• Containerized a Spring Boot application and made it horizontally scalable with an Auto Scaling group across multiple nodes.
• Part of the team that designed and implemented the streaming-layer and transport-layer modules, which stream data from all upstream systems by connecting to various data sources (see the consumer sketch after this list).
• Extracted structured data sets from different endpoints such as flat/CSV files, queues/streams, and web services.
• Created an internal caching layer that loads reference data and can be extended horizontally across multiple enrichment processes (see the cache sketch after this list).
• Implemented a replication layer for the HBase database to provide fault tolerance if the primary cluster goes down.
• Developed a reconciliation tool that extracts data from two different sources, compares it, and reports details of any discrepancies (see the reconciliation sketch after this list).
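A minimal sketch of the kind of Spring cron scheduling the microservices bullet describes; the class names, cron expression, and polling logic are illustrative assumptions, not code from the actual project:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@SpringBootApplication
@EnableScheduling
public class SchedulerApp {
    public static void main(String[] args) {
        SpringApplication.run(SchedulerApp.class, args);
    }
}

@Component
class JobStatusPoller {
    // Spring cron uses six fields (seconds first): this runs every five minutes.
    @Scheduled(cron = "0 */5 * * * *")
    public void pollJobStatus() {
        System.out.println("Updating job status..."); // placeholder for the real update
    }
}
```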
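A hedged sketch of submitting a Spark job to Kubernetes with Spark's SparkLauncher API; the API-server URL, container image, jar path, and main class are all hypothetical placeholders:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SparkK8sSubmit {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("k8s://https://kubernetes.example.com:6443") // assumed API server
                .setDeployMode("cluster")
                .setAppResource("local:///opt/app/enrichment-job.jar")  // assumed jar location
                .setMainClass("com.example.EnrichmentJob")              // hypothetical class
                .setConf("spark.kubernetes.container.image", "example/spark:3.3.0")
                .setConf("spark.executor.instances", "4")
                .startApplication();

        // Block until the application reaches a terminal state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(5_000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```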
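One way the Testcontainers/LocalStack setup could look, assuming JUnit 5 and the AWS SDK v2; the image tag and bucket name are assumptions, and a local Docker daemon is required:

```java
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

@Testcontainers
class S3IntegrationTest {

    // Ephemeral S3-compatible endpoint; the container is torn down after the test.
    @Container
    static LocalStackContainer localstack =
            new LocalStackContainer(DockerImageName.parse("localstack/localstack:3.0"))
                    .withServices(LocalStackContainer.Service.S3);

    @Test
    void createsBucketAgainstLocalStack() {
        S3Client s3 = S3Client.builder()
                .endpointOverride(localstack.getEndpointOverride(LocalStackContainer.Service.S3))
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create(localstack.getAccessKey(),
                                                   localstack.getSecretKey())))
                .region(Region.of(localstack.getRegion()))
                .build();
        s3.createBucket(b -> b.bucket("test-bucket")); // assumed bucket name
    }
}
```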
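A minimal sketch of a Spring Boot service returning JSON to downstream consumers; the record fields, path, and sample data are illustrative:

```java
import java.util.List;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
public class ReportApiApp {
    public static void main(String[] args) {
        SpringApplication.run(ReportApiApp.class, args);
    }
}

// Hypothetical payload shape; Spring serializes it to JSON automatically.
record ReportStatus(String reportId, String client, String status) {}

@RestController
class ReportStatusController {
    @GetMapping("/reports/{client}/status")
    List<ReportStatus> statusFor(@PathVariable String client) {
        return List.of(new ReportStatus("r-100", client, "DELIVERED")); // stub data
    }
}
```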
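A sketch of a streaming-layer consumer reading upstream events, assuming Kafka as the transport (Kafka appears in the skills list, though the bullet does not name the exact mechanism); the topic, broker address, and group id are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class UpstreamStreamConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "transport-layer");           // assumed group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("upstream-events")); // assumed topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Hand off to the transport layer for downstream enrichment.
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```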
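A bare-bones version of an internal reference-data cache; the loader function and key/value types are assumptions, and a production version would add refresh and eviction. Horizontal extension here means each enrichment process holds its own instance fed by the same backing store:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ReferenceDataCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ReferenceDataCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Loads on first access, then serves subsequent lookups from memory.
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        ReferenceDataCache<String, String> cache =
                new ReferenceDataCache<>(key -> "value-for-" + key); // assumed loader
        System.out.println(cache.get("country:US")); // first call loads
        System.out.println(cache.get("country:US")); // second call is cached
    }
}
```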
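A simplified sketch of the reconciliation idea: both sources are modeled as in-memory maps keyed by record id, whereas the real tool would read from two live sources and compare full records:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Reconciler {
    // Reports records missing on either side and value mismatches for shared keys.
    public static void reconcile(Map<String, String> source, Map<String, String> target) {
        Set<String> allKeys = new HashSet<>(source.keySet());
        allKeys.addAll(target.keySet());
        for (String key : allKeys) {
            String a = source.get(key);
            String b = target.get(key);
            if (a == null) System.out.println("Missing in source: " + key);
            else if (b == null) System.out.println("Missing in target: " + key);
            else if (!a.equals(b)) System.out.println("Mismatch for " + key + ": " + a + " vs " + b);
        }
    }

    public static void main(String[] args) {
        Map<String, String> src = Map.of("1", "alpha", "2", "beta");   // sample data
        Map<String, String> tgt = Map.of("1", "alpha", "3", "gamma");  // sample data
        reconcile(src, tgt); // prints discrepancies only
    }
}
```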
Technical Skills
Databases: Postgres, Apache HBase, Apache Iceberg, Apache Hive; data storage in Apache Parquet
Languages: Java (11, 17), Python, Rust (working knowledge)
Technologies: Spring, Spring Boot, Apache Spark 3.3, Apache Kafka, AWS services (SQS, Parameter Store, Secrets Manager, IAM, EKS, S3; multi-AZ and multi-Region)
Architecture: Microservices, event-based processing, data lake architecture, CAP (consistency, availability, partition tolerance), real-time processing with Kafka
Other: PySpark, Grafana, Postgres, JupyterHub