My 2 Tech Cents: November 2013

In this first episode of "My Cinema Knowledge" I will try to describe my film catalog by mixing private information (my disk folders) with public information (Freebase).
I will use Ubuntu, OpenRefine, a small Python script and an RDF store, Virtuoso.

Step By Step How-to

Build a CSV with all the folder names on my disk using a Linux command ( find . -type d > myMovies.csv )
Import it into OpenRefine (I used LODRefine, a package that bundles OpenRefine with the RDF extension)
Extract the movie name from the folder name, taking only the last part of the path
Add a reconciliation service based on the Freebase dump created previously (the making of is described in This post) and imported into a Virtuoso triple store
For this I used the SPARQL-based reconciliation service feature of the RDF extension
By using a custom reconciliation service over Freebase I am not limited to the English language supported by the Freebase reconciliation service
After 10 minutes on a machine with 8 GB of RAM, these are the results (out of about 310):

138 movies automatically recognized
66 movies with multiple choices (semi automatic)
109 without a match

The reasons for the missing matches are:

Missing in Freebase (mostly Italian movies)
Missing Italian title in Freebase
Missing in my Freebase copy
Intermediate folders (about 15)
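The folder listing and name extraction steps can also be sketched in Python, which makes it easy to flag the intermediate folders before reconciliation. This is only a minimal sketch: the function name `list_movie_folders` and the `is_leaf` column are my own illustrative choices, and unlike `find . -type d` it skips the root folder itself.

```python
import csv
import os


def list_movie_folders(root):
    """Return (path, name, is_leaf) for every folder below root.

    name is the last part of the path; is_leaf is False for
    intermediate folders that only group other folders.
    """
    rows = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if os.path.abspath(dirpath) == os.path.abspath(root):
            continue  # skip the root itself (find would list it as ".")
        name = os.path.basename(dirpath)
        is_leaf = not dirnames  # no sub-folders: probably a movie folder
        rows.append((dirpath, name, is_leaf))
    return sorted(rows)


def write_csv(rows, out_path):
    """Write the rows to a CSV file that OpenRefine can import."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "name", "is_leaf"])
        writer.writerows(rows)
```

Calling `write_csv(list_movie_folders("."), "myMovies.csv")` would produce a file where the intermediate folders can be filtered out on the `is_leaf` column.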

I also hit a severe bug when selecting new matches: https://github.com/fadmaa/grefine-rdf-extension/issues/82 (grrrr)
UPDATE!!!
There is also a cloud-based reconciliation service for Freebase, which now also works with Italian. It should be included in OpenRefine, but it does not work because of this bug: https://github.com/OpenRefine/OpenRefine/issues/805. You can make it work by adding a new standard reconciliation service with this address:
```
http://reconcile.freebaseapps.com/reconcile 
```
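To get a feel for what OpenRefine sends to such an address, here is a rough sketch that builds a batch request URL. It assumes the service follows the standard reconciliation API convention of a `queries` parameter holding a JSON object; the Freebase type id `/film/film`, the `limit` value and the function names are my own assumptions. The sketch only builds the URL and never contacts the service.

```python
import json
from urllib.parse import urlencode

RECONCILE_URL = "http://reconcile.freebaseapps.com/reconcile"


def build_queries(titles, type_id="/film/film", limit=3):
    """Build the JSON-ready batch of queries, keyed q0, q1, ..."""
    return {
        "q%d" % index: {"query": title, "type": type_id, "limit": limit}
        for index, title in enumerate(titles)
    }


def build_reconcile_url(titles):
    """Return the GET URL carrying the whole batch in 'queries'."""
    payload = json.dumps(build_queries(titles))
    return RECONCILE_URL + "?" + urlencode({"queries": payload})


if __name__ == "__main__":
    print(build_reconcile_url(["1984"]))
```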
Copy the reconciled data into a new column
Export the CSV. One row, for example, is:
./doppiati/1984 , 1984, http://rdf.freebase.com/ns/m.03kp2l
Transform the CSV to RDF using a simple Python script based on python-rdflib. To install rdflib on Ubuntu:
```
sudo apt-get install python-pip
sudo pip install rdflib
```
Use the script this way:
python myMoviesToRDF.py myMovies-csv.csv myMovies.ttl
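The conversion script itself is not reproduced here, so what follows is only a sketch of the idea, not the original code. To stay dependency-free it writes N-Triples lines by hand instead of using rdflib (N-Triples is a subset of Turtle, so the output can still be saved as a .ttl file). The `BASE` namespace and the choice of `rdfs:label` and `owl:sameAs` as predicates are assumptions of mine.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/mymovies/"  # hypothetical namespace for my folders
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def escape_literal(text):
    """Escape the characters that N-Triples string literals require."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(rows):
    """Turn (path, name, freebase_uri) rows into N-Triples lines."""
    lines = []
    for path, name, freebase_uri in rows:
        path, name, freebase_uri = path.strip(), name.strip(), freebase_uri.strip()
        if path.startswith("./"):
            path = path[2:]
        subject = "<%s%s>" % (BASE, quote(path, safe="/"))
        lines.append('%s <%s> "%s" .' % (subject, RDFS_LABEL, escape_literal(name)))
        if freebase_uri:  # unmatched movies have no Freebase link
            lines.append("%s <%s> <%s> ." % (subject, OWL_SAMEAS, freebase_uri))
    return lines


def convert(csv_text):
    """Convert the exported CSV text into an N-Triples document."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [row for row in reader if len(row) == 3]
    return "\n".join(rows_to_ntriples(rows)) + "\n"
```

For the example row above, `convert` yields one label triple and one `owl:sameAs` triple pointing at the Freebase URI.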
Upload the data into my RDF store
Enjoy data analysis NOW!
In this first attempt I used SPARQL queries to get the genre ranking, the director ranking and the director nationality ranking. This is only a first attempt; more will come soon!
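The three rankings share the same query shape, so a small helper can generate them. This is a sketch under assumptions, not the queries I actually ran: it supposes each local movie is linked to its Freebase entity with `owl:sameAs`, and the Freebase property names (`film.film.genre`, `film.film.directed_by`, `people.person.nationality`) should be checked against your own dump. The nationality ranking uses a SPARQL 1.1 property path.

```python
FB = "http://rdf.freebase.com/ns/"

# Property (or property path) to rank on, written as full IRIs.
RANKINGS = {
    "genre": "<%sfilm.film.genre>" % FB,
    "director": "<%sfilm.film.directed_by>" % FB,
    "director_nationality": "<%sfilm.film.directed_by>/<%speople.person.nationality>" % (FB, FB),
}

QUERY_TEMPLATE = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?value (COUNT(DISTINCT ?movie) AS ?movies)
WHERE {
  ?movie owl:sameAs ?film .
  ?film %(path)s ?value .
}
GROUP BY ?value
ORDER BY DESC(?movies)
LIMIT %(limit)d
"""


def ranking_query(name, limit=10):
    """Return the SPARQL text for one of the rankings in RANKINGS."""
    return QUERY_TEMPLATE % {"path": RANKINGS[name], "limit": limit}


if __name__ == "__main__":
    print(ranking_query("genre"))
```

The resulting text can be pasted into the SPARQL endpoint of the RDF store.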

An interactive version here

Who: Stefano Fassina, Deputy Minister of the Economy:

"Cutting golden pensions, even if we count as 'golden' the pensions above 3,500 euros net per month, implies savings of a few hundred million euros per year."

source:
http://www.repubblica.it/politica/2013/11/08/news/reddito_di_cittadinanza_la_proposta_di_grillo_copertura_con_imu_su_immobili_della_chiesa_e_taglio_pensioni_d_oro-70523273/

who: wallstreetitalia

In 2011, the 5.2% of pensioners (861 thousand people in all) who receive a monthly payment above three thousand euros absorbed 45 billion in total, that is 17% of pension spending. That is slightly less than what was paid out to the 7.3 million Italians, 44% of the total, whose income does not exceed one thousand euros a month: 51 billion in all, equal to 19.2% of overall spending.

source:
http://www.wallstreetitalia.com/article/1641597/le-pensioni-d-oro-costano-45-miliardi.aspx

Me:
Did I miss something?
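A quick arithmetic check of the wallstreetitalia figures quoted above, as a small sketch: it only verifies that the two groups imply roughly the same total pension spending and derives the average yearly amount per pensioner. All inputs come from the quote; the function names are mine. Note that the two sources use different thresholds (3,500 euros net versus 3,000 euros), so the numbers are not directly comparable.

```python
# Figures as quoted from the wallstreetitalia article (year 2011).
HIGH_PENSIONERS = 861_000        # pensioners above 3,000 euros per month
HIGH_COST = 45e9                 # euros per year
HIGH_SHARE_OF_SPENDING = 0.17

LOW_PENSIONERS = 7_300_000       # pensioners at or below 1,000 euros per month
LOW_COST = 51e9                  # euros per year
LOW_SHARE_OF_SPENDING = 0.192


def implied_total(part, share):
    """Total spending implied by a partial amount and its share."""
    return part / share


def average_per_year(cost, people):
    """Average yearly amount per pensioner in a group."""
    return cost / people


if __name__ == "__main__":
    print("total implied by high group:", implied_total(HIGH_COST, HIGH_SHARE_OF_SPENDING))
    print("total implied by low group: ", implied_total(LOW_COST, LOW_SHARE_OF_SPENDING))
    print("average high pension/year:  ", average_per_year(HIGH_COST, HIGH_PENSIONERS))
    print("average low pension/year:   ", average_per_year(LOW_COST, LOW_PENSIONERS))
```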

Wednesday, 20 November 2013

My Cinema Knowledge: "my movies" aka Multi-language reconciliation using Freebase

Monday, 11 November 2013

Fact checking: Fassina vs wallstreetitalia