11. Dataimport

CubicWeb is designed to easily manipulate large amounts of data, and provides utilities to make imports simple.

The main entry point is cubicweb.dataimport.importer which defines an ExtEntitiesImporter class responsible for importing data from an external source in the form ExtEntity objects. An ExtEntity is a transitional representation of an entity to be imported in the CubicWeb instance; building this representation is usually domain-specific – e.g. dependent of the kind of data source (RDF, CSV, etc.) – and is thus the responsibility of the end-user.

Along with the importer, a store must be selected, which is responsible for insertion of data into the database. There exists different kind of stores, allowing to insert data within different levels of the CubicWeb API and with different speed/security tradeoffs. Those keeping all the CubicWeb hooks and security will be slower but the possible errors in insertion (bad data types, integrity error, ...) will be handled.

11.1. Example

Consider the following schema snippet.

class Person(EntityType):
    name = String(required=True)

class knows(RelationDefinition):
    subject = 'Person'
    object = 'Person'

along with some data in a people.csv file:

# uri,name,knows
http://www.example.org/alice,Alice,
http://www.example.org/bob,Bob,http://www.example.org/alice

The following code (using a shell context) defines a function extentities_from_csv to read Person external entities coming from a CSV file and calls the ExtEntitiesImporter to insert corresponding entities and relations into the CubicWeb instance.

from cubicweb.dataimport import ucsvreader, RQLObjectStore
from cubicweb.dataimport.importer import ExtEntity, ExtEntitiesImporter

def extentities_from_csv(fpath):
    """Yield Person ExtEntities read from `fpath` CSV file."""
    with open(fpath) as f:
        for uri, name, knows in ucsvreader(f, skipfirst=True, skip_empty=False):
            yield ExtEntity('Person', uri,
                            {'name': set([name]), 'knows': set([knows])})

extenties = extentities_from_csv('people.csv')
store = RQLObjectStore(cnx)
importer = ExtEntitiesImporter(schema, store)
importer.import_entities(extenties)
commit()
rset = cnx.execute('String N WHERE X name N, X knows Y, Y name "Alice"')
assert rset[0][0] == u'Bob', rset

11.2. Importer API

11.2.1. Stores

11.3. SQLGenObjectStore

This store relies on COPY FROM/execute many sql commands to directly push data using SQL commands rather than using the whole CubicWeb API. For now, it only works with PostgresSQL as it requires the COPY FROM command.

Table Of Contents

Previous topic

10. Full Text Indexing in CubicWeb

Next topic

Web side development