Write a PHP program to mine e-mail addresses and company info from HTML pages posted every day
$30-250 USD
Paid on delivery
Every workday, lists of company info are published at [url removed, login to view] in HTML format, with 20 links per page and a varying number of pages.
Your mission is to:
1) write a script that goes through all PAST publication days and produces an Excel sheet with the same info as described under 2) below for every previous date, and
2) write a PHP program that runs once per day via cron and parses the latest such set of links (an example is provided at the end), looking in particular for e-mail addresses and the following information around them:
* E-post: (the e-mail address, but not one that has been used before, because then it is likely an "aggregating" address)
* Org nr (generally the first field in the listing for the company that has the e-mail address above, i.e. it belongs to the same company)
* Firma:
* Verksamhet:
* Säte:
* Bildat:
* LASTNAME and FIRSTNAME, which are deduced in this order:
IF it says "verkställande direktör", take the first name (not number) directly after it as LASTNAME, then all the names between the following comma (,) and the next comma (,) as FIRSTNAME.
ELSEIF it says "ordförande", take the same names in the same way directly after the word "ordförande".
ELSEIF it says "Styrelseledamot", do the same thing if neither of the above two exists.
3) These records should be posted to a predefined URL, populated with the variables picked up under 2) above, one at a time and 1 second apart, starting at 09:30 CET in the morning.
You'll get the URL to post them to when you've been accepted for the job :)
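A minimal sketch of how step 3 and the "not used before" rule for E-post might be handled. Everything named here is hypothetical: the function names, the `email` record key, and the idea that "used before" means "already seen in an earlier listing". It also assumes the target accepts a form-encoded HTTP POST, which the brief does not confirm.

```php
<?php
// Hypothetical helper: keep only records whose e-mail address has not been seen before.
// $seenEmails maps lowercased address => true and is updated in place, so it can be
// persisted between daily runs (how it is stored is left open here).
function filterNewRecords(array $records, array &$seenEmails): array
{
    $fresh = [];
    foreach ($records as $record) {
        $email = strtolower(trim($record['email'] ?? ''));
        if ($email === '' || isset($seenEmails[$email])) {
            continue; // no address, or likely an "aggregating" address
        }
        $seenEmails[$email] = true;
        $fresh[] = $record;
    }
    return $fresh;
}

// Hypothetical helper: hand records to $send one at a time, pausing between them.
// Returns the number of records sent.
function postRecords(array $records, callable $send, int $delaySeconds = 1): int
{
    $sent = 0;
    foreach ($records as $record) {
        if ($sent > 0 && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
        $send($record);
        $sent++;
    }
    return $sent;
}

// Hypothetical sender using the cURL extension; $url stands in for the predefined URL.
function makeCurlSender(string $url): callable
{
    return function (array $record) use ($url): bool {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_POST => true,
            CURLOPT_POSTFIELDS => http_build_query($record),
            CURLOPT_RETURNTRANSFER => true,
        ]);
        return curl_exec($ch) !== false;
    };
}
```

The 09:30 start could then be left to cron, for example a crontab entry beginning `30 9 * * 1-5` on a server whose clock is set to CET. The script path and the server time zone are assumptions.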
An example of the text to be parsed is (it's in Swedish):
Kungörelsetext:
Org nr: 556284-1934
Firma: Upphinds Pålning & Entrepenad AB
Säte: Uppsala
Postadress: Långsjövägen 8, 740 10 ALMUNGE,
E-post: EMAILADDRESS_THAT_I_SHOULDN'T_BE_POSTING_HERE
Typ: Privat aktiebolag
Bildat: 2014-09-19
Verksamhet: Aktiebolaget ska bedriva anläggningsarbeten, reparationer av motorer och hydraulik.
Räkenskapsår: 0101 - 1231
Aktiekapital: 50.000 SEK. Lägst: 50.000 SEK. Högst: 200.000 SEK. Antal aktier: 50. Lägst: 50. Högst: 200.
Kallelse: Kallelse ska ske genom brev.
Föreskrift om antal styrelseledamöter/styrelsesuppleanter: Lägst antal ordinarie ledamöter: 1, högst antal ordinarie ledamöter: 2. Lägst antal suppleanter: 1, högst antal suppleanter: 2.
Förbehåll/avvikelser/villkor: Bestämmelse att företaget inte behöver ha revisor.
Styrelseledamöter: 19780226-0096 Michalak, Anders Hans Bertil, Långsjövägen 8, 740 10 ALMUNGE,
Styrelsesuppleanter: 19560831-1022 Michalak, Lena Sofia Marianne, Kusbyvägen 27 A, 763 35 HALLSTAVIK,
Firmateckning: Firman tecknas av styrelsen
Rättelse: Den registrering som gjordes den 24 september 2014 var felaktig i fråga om följande uppgifter: Firman. Korrekt firma är: Upplands Pålning & Entreprenad AB.
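The sample above could be parsed roughly as follows. This is only a sketch under stated assumptions: the HTML has already been reduced to plain text with one "Label: value" field per line, and each role keyword appears at the start of a label. The function names `parseFields` and `deduceName` are hypothetical, and the prefix `styrelseledam` is chosen so that it also matches the plural "Styrelseledamöter" used in the sample.

```php
<?php
// Hypothetical helper: split one record into "Label: value" fields, one field per line.
// Only the first occurrence of each label is kept.
function parseFields(string $text): array
{
    $fields = [];
    foreach (preg_split('/\R/u', $text) as $line) {
        if (preg_match('/^\s*([^:]+):\s*(.*)$/u', $line, $m)) {
            $label = trim($m[1]);
            if (!isset($fields[$label])) {
                $fields[$label] = trim($m[2]);
            }
        }
    }
    return $fields;
}

// Hypothetical helper: apply the IF / ELSEIF order from the brief.
// Returns ['lastname' => ..., 'firstname' => ...] or null if no role field matches.
function deduceName(array $fields): ?array
{
    // Priority order: verkställande direktör, then ordförande, then Styrelseledamot.
    $roles = ['verkställande direktör', 'ordförande', 'styrelseledam'];
    foreach ($roles as $role) {
        foreach ($fields as $label => $value) {
            if (!preg_match('/^' . preg_quote($role, '/') . '/iu', (string) $label)) {
                continue;
            }
            // Skip a leading number, then take "Lastname, Firstnames," up to the second comma.
            if (preg_match('/^(?:[\d-]+\s+)?([^,\d][^,]*),\s*([^,]+),/u', $value, $m)) {
                return ['lastname' => trim($m[1]), 'firstname' => trim($m[2])];
            }
        }
    }
    return null;
}
```

Matching on the label prefix rather than searching the whole text keeps the line "Föreskrift om antal styrelseledamöter/styrelsesuppleanter" from being mistaken for a board member entry.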
Project ID: #9267506
Awarded to:
Hello, I am John Ofagbe, a professional web developer and software engineer. I have read through your project description and I can write the cron-job script to auto-post as specified. Please kindly provide the URL and ...
4 freelancers are bidding an average of $189 for this job
Hi! I can help you with your task. I have 5 years of experience with PHP and Linux and will be glad to work with you!