Thursday, 13 March 2014

My Cinema Knowledge: "my movies 2" aka kiss open refine

In this episode of "My Cinema Knowledge" I will try to link the video files I have on my PC with Freebase. I tried to keep the process as simple as possible: this time the only requirement is a normal laptop with OpenRefine and its extensions.
I will use this data in the next post!

Step By Step How-to

  • Build a CSV with all the film names. I used the folder names on my disk, extracted with a Linux command ( find . -type d > myMovies.csv )
  • Import it into OpenRefine (I used version 2.6)
  • Extract the movie name from the folder name by taking only the last part of the path. For me it was something like this in GREL:
    value.split("/")[value.split("/").length()-1]
    Here is a guide to expressions:
    https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Expressions
  • Reconciliation: there is now a cloud-based reconciliation service for Freebase, which now also works with Italian. It should be included in OpenRefine, but it does not work because of this bug: https://github.com/OpenRefine/OpenRefine/issues/805. You can make it work by adding a new standard service with this address: http://reconcile.freebaseapps.com/reconcile
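
If you prefer to prepare the CSV outside OpenRefine, the find command and the GREL expression above can be roughly sketched in Python. This is only an illustration under my own assumptions: the function names (last_segment, folder_rows, write_csv) and the two-column layout are hypothetical and not part of the original workflow.

```python
import csv
import os


def last_segment(path):
    # Same idea as the GREL expression: split on "/" and keep the last part.
    return path.split("/")[-1]


def folder_rows(root):
    # Walk every directory under root (root included, as `find . -type d` does)
    # and pair each path with its last segment.
    rows = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a deterministic order
        path = dirpath.replace(os.sep, "/")
        rows.append((path, last_segment(path)))
    return rows


def write_csv(rows, out_path):
    # One folder per line: full path first, then the extracted name.
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "name"])
        writer.writerows(rows)
```

Calling write_csv(folder_rows("."), "myMovies.csv") from the movies folder would give a file that can be imported directly, with the name column ready for reconciliation.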

  • Run reconciliation selecting the "film" type
  • Select the rows with a "high" match using the facet on best match score
  • Match all cells to their highest-scoring candidate (from Reconcile -> Actions)
  • Manually find a match for the remaining items
  • Add a new column based on the reconciled one with the expression: cell.recon.match.id
  • To get the Freebase RDF id, add a new column based on the last one with the expression: "http://rdf.freebase.com/ns/" + "m." +
    value.split("/")[value.split("/").length()-1]
  • Download and compile the RDF extension: https://github.com/fadmaa/grefine-rdf-extension
  • Edit the RDF skeleton like this (the preview does not work because of https://github.com/fadmaa/grefine-rdf-extension/issues/89):


To make a URI out of the row index I just used a custom value: "http://www.mycinemaknowledge/video/" + value
  • Export as RDF
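
To make the two URI expressions above more concrete, here is a minimal Python sketch of the same string building, plus one N-Triples line per row. Treat it as an assumption-laden illustration: the function names are mine, the match id "/m/0abc12" is made up, and the predicate is passed in as a parameter because this sketch does not try to reproduce the actual RDF skeleton.

```python
FREEBASE_NS = "http://rdf.freebase.com/ns/"
VIDEO_BASE = "http://www.mycinemaknowledge/video/"


def freebase_rdf_uri(match_id):
    # Mirrors the GREL expression: keep the last "/" segment of the
    # reconciled id and prefix it with the namespace and "m.".
    return FREEBASE_NS + "m." + match_id.split("/")[-1]


def video_uri(row_index):
    # Mirrors the custom value used for the row index.
    return VIDEO_BASE + str(row_index)


def to_ntriple(row_index, match_id, predicate):
    # predicate is a placeholder chosen by the caller (hypothetical here).
    return "<%s> <%s> <%s> ." % (
        video_uri(row_index),
        predicate,
        freebase_rdf_uri(match_id),
    )


if __name__ == "__main__":
    # Made-up id and hypothetical predicate, for illustration only.
    print(to_ntriple(7, "/m/0abc12", "http://example.org/vocab/film"))
```

The RDF extension does this work inside OpenRefine; the sketch is only a way to check by hand what the exported URIs should look like.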