I will use this data in the next post!
Step By Step How-to
- Build a CSV with all the film names. I used the folder names on my disk, extracted with a Linux command ( find . -type d > myMovies.csv )
- Import it into OpenRefine (I used version 2.6)
- Extract the movie name from the folder name by taking only the last part of the path. For me it was something like this in GREL:
value.split("/")[value.split("/").length()-1]
Here is a guide:
https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Expressions
- Reconciliation: There is now a cloud-based reconciliation service for Freebase, which now also works with Italian. It should be included in OpenRefine, but it does not work because of this bug: https://github.com/OpenRefine/OpenRefine/issues/805. You can make it work by adding a new standard reconciliation service with this address: http://reconcile.freebaseapps.com/reconcile
- Run the reconciliation, selecting the "film" type
- Select the cells with a "high" match score using the facet on the best candidate's score
- Match each cell to its highest-scoring candidate (from Reconcile -> Actions)
- Manually find a match for the remaining items
- Add a new column based on the reconciled one with the expression: cell.recon.match.id
- To get the Freebase RDF id, add a new column based on the previous one with the expression:
"http://rdf.freebase.com/ns/" + "m." + value.split("/")[value.split("/").length()-1]
- Download and compile rdf-extension: https://github.com/fadmaa/grefine-rdf-extension
- Edit the RDF skeleton like this: (the preview does not work because of https://github.com/fadmaa/grefine-rdf-extension/issues/89)
To make a URI out of the row index I just used a custom value: "http://www.mycinemaknowledge/video/" + value
- Export as RDF
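
For anyone who wants to check the string handling outside OpenRefine, here is a minimal Python sketch of what the three expressions above do. It is only an illustration, not part of the OpenRefine workflow: the function names, the sample folder path and the sample id "/m/0abc12" are made up, and it assumes that cell.recon.match.id returns an id of the form "/m/...".

```python
FREEBASE_NS = "http://rdf.freebase.com/ns/"
VIDEO_BASE = "http://www.mycinemaknowledge/video/"


def last_path_segment(value: str) -> str:
    """Mirror of the GREL value.split("/")[value.split("/").length()-1]."""
    parts = value.split("/")
    return parts[len(parts) - 1]


def freebase_rdf_id(match_id: str) -> str:
    """Build the Freebase RDF URI from a reconciliation match id.

    Assumes the id looks like "/m/<something>" (hypothetical sample below).
    """
    return FREEBASE_NS + "m." + last_path_segment(match_id)


def video_uri(row_index: int) -> str:
    """Build the custom URI used in the RDF skeleton from the row index."""
    return VIDEO_BASE + str(row_index)


if __name__ == "__main__":
    # Hypothetical folder path, as produced by: find . -type d
    print(last_path_segment("./Movies/Blade Runner"))
    # Hypothetical match id, not a real Freebase entry
    print(freebase_rdf_id("/m/0abc12"))
    print(video_uri(3))
```

Note that a path ending with "/" would give an empty name here, so the folder list should not contain trailing slashes.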