We need a small app, built on the AWS platform, that imports data from 2 sources:
- Italian Company Register (data imported as xls): contains company names
- LinkedIn (data imported as csv): contains employee names
The app must then:
- process the data to match companies with their employees and standardize several fields
- return a properly formatted .csv to be passed to another application (Odoo)
-------------------------------------------------------------------
Step 1 create the database on AWS using RDS
A. Basic Tables (standard values, these tables must be editable)
1. Company Categories
2. Industries
3. Italian Provinces (territories)
4. Role (by categories)
5. Status (for company and individuals)
B. Imported raw data tables
6. Master Table Italian Company Registry (xls) MTICR
7. Master Table LinkedIn MTLD
C. Output Tables
1. Company Table (Linkedin_Id, Companyregisted_Id, Category, Industry, Adress, Status, ... all other)
2. Location Tables (each company may have more locations in Italy)
3. Employee Table (Linkedin_Id, Name, Surname, Mail, Function, ....all other)
And relation tables:
One company may have multiple locations
One company may have multiple employees
One individual belongs to one company only, and the link between them is the individual's job role
One individual has one location
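
The relations above can be sketched as foreign keys. This is only an illustration: the table and column names are hypothetical, and SQLite (from the Python standard library) stands in for whichever engine is chosen on RDS.

```python
import sqlite3

# Hypothetical names throughout; SQLite stands in for the RDS engine.
SCHEMA = """
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    ir_pk      TEXT UNIQUE,   -- Italian Register key
    ld_pk      TEXT UNIQUE,   -- LinkedIn company key, filled in after matching
    name       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'to be qualified'
);
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    -- one company may have multiple locations
    company_id  INTEGER NOT NULL REFERENCES company(company_id),
    name        TEXT,
    city        TEXT,
    province    TEXT
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    ld_pk       TEXT UNIQUE,
    name        TEXT,
    surname     TEXT,
    -- one company and one location per individual; job_role describes the link
    company_id  INTEGER NOT NULL REFERENCES company(company_id),
    location_id INTEGER REFERENCES location(location_id),
    job_role    TEXT,
    status      TEXT NOT NULL DEFAULT 'to be qualified'
);
"""


def create_schema(conn):
    """Create the three output tables on an open connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


# Small usage example with made-up data.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO company (ir_pk, name) VALUES ('IR001', 'Acme S.r.l.')")
conn.execute(
    "INSERT INTO location (company_id, name, city, province) "
    "VALUES (1, 'HQ', 'Milano', 'MI')"
)
conn.execute(
    "INSERT INTO location (company_id, name, city, province) "
    "VALUES (1, 'Branch', 'Torino', 'TO')"
)
conn.execute(
    "INSERT INTO employee (ld_pk, name, surname, company_id, location_id, job_role) "
    "VALUES ('LD-9', 'Mario', 'Rossi', 1, 2, 'Sales')"
)
```

Putting company_id directly on the employee row is one way to enforce "one company only" per individual; a separate relation table would also work if that rule is expected to change.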
-----------------------------------------------------
Step 2 create a routine to
- import data from the Italian registry (xls) into the Master Table Italian Company Registry (MTICR), using a unique primary key to avoid duplicate records, and set company status = to be qualified
- import data from the LinkedIn extractor (csv) into the Master Table Linkedin (MTLD), using a unique primary key to avoid duplicate records, and set status = to be qualified
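
A minimal sketch of the duplicate-safe import, shown for the csv source only. The column names (ld_pk, name) and the table name mtld are assumptions, and SQLite again stands in for the RDS engine. INSERT OR IGNORE is SQLite syntax; PostgreSQL would use ON CONFLICT DO NOTHING and MySQL INSERT IGNORE. The xls source would follow the same pattern but needs a spreadsheet reader, which is left out here.

```python
import csv
import io
import sqlite3


def import_linkedin_csv(conn, csv_file):
    """Load rows into MTLD, skipping rows whose primary key already exists.

    Returns the number of rows actually inserted.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mtld (
               ld_pk  TEXT PRIMARY KEY,
               name   TEXT,
               status TEXT NOT NULL DEFAULT 'to be qualified'
           )"""
    )
    inserted = 0
    for row in csv.DictReader(csv_file):
        cur = conn.execute(
            "INSERT OR IGNORE INTO mtld (ld_pk, name) VALUES (?, ?)",
            (row["ld_pk"], row["name"]),
        )
        inserted += cur.rowcount  # 1 when the row was inserted, 0 when ignored
    conn.commit()
    return inserted


# Usage with made-up data: the file contains LD-1 twice and is imported twice.
conn = sqlite3.connect(":memory:")
sample = "ld_pk,name\nLD-1,Acme\nLD-2,Beta\nLD-1,Acme\n"
first = import_linkedin_csv(conn, io.StringIO(sample))
second = import_linkedin_csv(conn, io.StringIO(sample))
```

Because the primary key does the deduplication, the routine can be re-run on the same file without creating extra records.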
-----------------------------------------------------------------------------------------
Step 3.a.
insert company data from Italian Register Master Table to Output Company Table
(IR_Pk, Company_name, Category, Location name, Adress, City, Province, Postcode, Country)
set company status=to be qualified
- Look up a similar company name in the LinkedIn Master Table (MTLD) with Elasticsearch. If the match score is at least 95%, import the match (external key); otherwise ask the operator whether the match is correct
- If the match is correct, add Ld_Pk (the LinkedIn primary key for the company) to both the Company and Employee tables, and set employee status = in qualification and company status = in qualification
for each individual on the LinkedIn table (IR_Pk, Company_name, Category, Location name, Adress, City, Province, Postcode, Country, Ld_Pk, status)
- Look up the industry in Master Table Linkedin MTLD and compare it to the Basic Table Industries with Elasticsearch
If there is a good match with an existing industry, import it; otherwise ask the supervisor
(IR_Pk, Company_name, Category, Location name, Adress, City, Province, Postcode, Country, Ld_Pk, Industry)
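
The accept-or-ask rule in step 3.a can be sketched as below. Note the swap: this uses difflib from the Python standard library as a stand-in for Elasticsearch scoring, so the scores are not comparable to Elasticsearch relevance scores. The function names and the sample data are hypothetical. The same triage could be reused for the industry and job role lookups.

```python
from difflib import SequenceMatcher

AUTO_ACCEPT = 0.95  # the 95% threshold from the spec


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (difflib, not Elasticsearch)."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def best_match(name, candidates):
    """Return (candidate_key, score) for the closest candidate, or (None, 0.0)."""
    best_key, best_score = None, 0.0
    for key, cand_name in candidates.items():
        score = similarity(name, cand_name)
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


def triage(name, candidates):
    """Decide whether a match can be imported directly or needs the operator."""
    key, score = best_match(name, candidates)
    decision = "auto" if score >= AUTO_ACCEPT else "ask_operator"
    return decision, key, score


# Usage with made-up LinkedIn records keyed by Ld_Pk.
linkedin = {"LD-1": "Acme S.r.l.", "LD-2": "Beta SpA"}
exact = triage("ACME S.r.l.", linkedin)
loose = triage("Acme Italia", linkedin)
```

Here the first lookup differs only by letter case and is accepted automatically, while the second falls below the threshold and is routed to the operator.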
3.b. import location data from the Italian Register Master Table
(IR_Pk, ParentCompany, Location name, Adress, City, Province, Postcode, Country, Ld_Pk)
3.c. import individuals data from Linkedin Master Table and populate Company and Employee table
(Ld_pk_individual, Ld_pk_company (parent), name, surname, email, phone)
Use Elasticsearch to standardize the "headline" to one of the roles; the operator must supervise this if the match is not good
(Ld_pk_individual, Ld_pk_company (parent), name, surname, email, phone, Job_function)
- Set status Qualified
INDIVIDUAL (Ld_pk_individual, Ld_pk_company (parent), name, surname, email, phone, Job_function, Qualified)
COMPANY (IR_Pk, Company_name, Category, Location name, Adress, City, Province, Postcode, Country, Ld_Pk, Industry, Qualified)
Step 4: all qualified companies must be extracted and exported in a properly formatted csv file to be imported into another system
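
A minimal sketch of the export, assuming companies arrive as dictionaries keyed by the field names used above. The column selection and order are assumptions; the actual layout must follow whatever the target system expects on import.

```python
import csv
import io

# Assumed column set, taken from the COMPANY field list; adjust to the target system.
FIELDS = [
    "IR_Pk", "Company_name", "Category", "Industry",
    "City", "Province", "Postcode", "Country", "Ld_Pk",
]


def export_qualified(companies, out):
    """Write only companies with status 'Qualified' to out; return the row count."""
    # extrasaction="ignore" drops keys (such as status) that are not in FIELDS.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for company in companies:
        if company.get("status") == "Qualified":
            writer.writerow(company)
            count += 1
    return count


# Usage with made-up records.
companies = [
    {"IR_Pk": "IR001", "Company_name": "Acme S.r.l.", "City": "Milano",
     "Province": "MI", "Country": "Italy", "Ld_Pk": "LD-1", "status": "Qualified"},
    {"IR_Pk": "IR002", "Company_name": "Beta SpA", "status": "in qualification"},
]
buffer = io.StringIO()
exported = export_qualified(companies, buffer)
lines = buffer.getvalue().splitlines()
```

Fields missing from a record are written as empty cells, so partially filled companies still produce a row with the full set of columns.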