For my work on Linked Data in the Digital Library domain (and out of a personal passion for cinema) I investigated how to get the most complete open dataset about movies into my local database.
After some quality evaluation I identified Freebase as my target.
These are the steps I used:
0) Tried out the Freebase export by Google. It does not work: the RDF has some syntactic problems.
1) Discovered :BaseKB and downloaded the most recent version available at the time (a few months ago): a copy of :BaseKB Lime derived from the 2012-02-10 Freebase RDF dump, obtained using this tool:
https://github.com/paulhoule/infovore/wiki/:BaseKB%20Lime
Thanks Paul Houle!
This tool cleans the original dump and makes it loadable into a database.
2) Realized that it's very big.
With my test machine (4 CPUs and 16 GB of RAM) and one of the best RDF triple stores, Virtuoso 7, I could not finish loading the data.
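For reference, this is a minimal sketch of Virtuoso's standard bulk-loading procedure, run from isql; the directory path, file mask and graph IRI are placeholders to adapt to wherever the :BaseKB files actually sit:

    -- register every N-Triples file in a directory for bulk loading
    -- (path, mask and graph IRI are examples)
    ld_dir ('/data/basekb', '*.nt.gz', 'http://basekb.com');
    -- check what has been queued
    SELECT * FROM DB.DBA.load_list;
    -- run the loader; several isql sessions can run this in parallel
    rdf_loader_run ();
    -- make the load durable
    checkpoint;

Even with this procedure the full dump was too much for my machine, which is why I moved to the piecewise approach below.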
3) Built a Virtuoso script (plus some manual iteration) to do the following (a sketch of the script follows the list):
- Load the dataset in parts (4 of them)
- Create the list of object types related to the Cinema domain, manually, from the Freebase website
- Get the IDs of all the resources related to the Cinema domain
- Load the dataset in parts again
- Export all the triples related to the IDs selected before
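A condensed sketch of the selection and export steps, again from isql. The graph IRI, the ns: prefix and the handful of film types are illustrative: :BaseKB may use its own namespace instead of http://rdf.freebase.com/ns/, and my real type list was much longer.

    -- 1) collect the IDs of resources whose type belongs to the Cinema domain
    SPARQL
    PREFIX ns: <http://rdf.freebase.com/ns/>
    SELECT DISTINCT ?s
    FROM <http://basekb.com>
    WHERE {
      ?s ns:type.object.type ?t .
      FILTER (?t IN (ns:film.film, ns:film.actor, ns:film.director, ns:film.performance))
    };

    -- 2) export every triple whose subject is one of the selected resources
    SPARQL
    PREFIX ns: <http://rdf.freebase.com/ns/>
    CONSTRUCT { ?s ?p ?o }
    FROM <http://basekb.com>
    WHERE {
      ?s ns:type.object.type ?t ;
         ?p ?o .
      FILTER (?t IN (ns:film.film, ns:film.actor, ns:film.director, ns:film.performance))
    };

In practice I ran the selection and export against the part that was loaded at the moment and accumulated the results, since the whole dataset never fit in the database at once.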
Lessons learned:
- The list of types is not complete (awards, for example, are missing)
- Using a database for this kind of bulk processing is not ideal, but it let me use a tool I am familiar with. Alternatives will be the topic of another post.
- A lot of the data is not useful for me. The latest version of :BaseKB is split into parts (see the news), so I could choose which parts to download. It's not free to download (someone has to pay for the big transfer!), but that will be the next step.
- I used it in this hackathon I organized. Very nice.
- I will use it to enrich the information in a video library
- I have just started to play with queries (see the example below)
- Dreaming: some movie recommendations.
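To give an idea of the kind of query I am starting to play with, here is a first attempt at listing films with their directors. Type and property names follow the Freebase schema (ns:film.film, ns:film.film.directed_by, ns:type.object.name); the prefix and graph IRI are the same placeholders as above and may need adjusting for your copy of :BaseKB.

    SPARQL
    PREFIX ns: <http://rdf.freebase.com/ns/>
    SELECT ?filmName ?directorName
    FROM <http://basekb.com>
    WHERE {
      ?film ns:type.object.type ns:film.film ;
            ns:type.object.name ?filmName ;
            ns:film.film.directed_by ?director .
      ?director ns:type.object.name ?directorName .
      FILTER (lang(?filmName) = "en" && lang(?directorName) = "en")
    }
    LIMIT 20;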