This final project should bring together almost all of the topics we have covered throughout this class, from manipulating tidy data, to designing and storing data in a relational database, to working with hierarchical data, to the underpinnings of client-server computing and working with Web APIs from data providers.
It is also intended to be self-defined: do some exploration and find data from a provider that you are interested in investigating.
Requirements
The requirements of this project include the following:
1. Work with a Web API designed by a data system provider.
   - The selected provider must require authentication and delegated authorization (OAuth 2) to enable access to data not available to an anonymous user, for at least some subset of the data.
   - The API must use HTTP for its requests and responses.
2. Find a second data source, which could be openly provided or obtained through web scraping techniques, that complements the data from 1.
3. Use the skills/knowledge from the Hierarchical Data Models unit to parse and extract tables of information from the XML or JSON returned by your primary provider.
4. Build a relational database from your data sources.
5. Practice good software development techniques:
   - Practice functional abstraction and associated code documentation.
   - Program defensively, checking for error codes/returns in every client-server interaction, and handling errors in a reasonable way.
6. Give the acquired data meaning through exploratory data analysis and visualization, asking questions of interest to the student group.
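As a rough illustration of how the pieces above fit together, the sketch below makes an authenticated HTTP request with an OAuth 2 bearer token, checks the response defensively, parses the JSON payload into rows, and stores them in a SQLite table. The endpoint URL, token, and field names (`items`, `id`, `name`) are placeholders for illustration only, not any real provider's API; each project will need to adapt this to its chosen provider's actual OAuth 2 flow and response schema.

```python
import json
import sqlite3
import urllib.error
import urllib.request

API_URL = "https://api.example.com/v1/items"  # hypothetical endpoint
TOKEN = "YOUR_OAUTH2_ACCESS_TOKEN"            # obtained via the provider's OAuth 2 flow


def fetch_items(url: str, token: str) -> list:
    """Request JSON from the API, failing loudly on client/server errors."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req) as resp:
            if resp.status != 200:
                raise RuntimeError(f"API returned unexpected status {resp.status}")
            # Assumes the provider wraps records in a top-level "items" key.
            return json.loads(resp.read())["items"]
    except urllib.error.HTTPError as err:
        # Defensive handling: report the failure in a reasonable way and re-raise.
        raise RuntimeError(f"Request failed: {err.code} {err.reason}") from err


def store_items(items: list, db_path: str = "project.db") -> int:
    """Flatten the hierarchical JSON records into a relational table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO items VALUES (?, ?)",
        [(it["id"], it["name"]) for it in items],
    )
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    conn.close()
    return count
```

Separating the fetch and store steps like this supports the functional-abstraction requirement: each function can be documented and tested on its own, and the error handling lives at the client/server boundary where the failure occurs.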
Let me know if you would like to know more about it.
These are my skills related to web scraping and web crawling:
I have done scraping in Node.js, CasperJS, PhantomJS, and Python.
I have also done testing and automation with Selenium.
I know how to work with databases such as MongoDB, MySQL, and Elasticsearch.
I also know how to handle proxies and CAPTCHAs while scraping.
Hello, I am a data scientist with over 5 years of experience using R for web scraping, data analysis, visualization, and modeling. I can provide intuitive R code for posting, mining, and authenticating tweets through RStudio and the twitteR package, which handles communication with the Twitter API. I can use the tidyverse to collect and store the Twitter data. I have completed a similar project in the past and can provide sample code. I look forward to your response.
Hello, I am qualified to do this job.
I have worked with the Twitter API for an academic article, including some exploratory analyses, and I have also worked at a data mining company where I handled HTTP requests.