Whether you are migrating from a Drupal 7 website to Drupal 8 or from any other technology (such as WordPress, Joomla, AEM, Sitecore, TYPO3, DotNetNuke, etc.) to Drupal 8, the most challenging task is the migration of the content. Migrating this content manually would call for an enormous amount of effort, which is rarely worthwhile and sometimes impossible. Drupal 8 provides an easy solution with its built-in migration features. Let's take a look at how. I will try to elaborate as much as possible so that this article can serve as your full guide to migrating content into Drupal 8.
Before we get into the how, let me create a glossary of some terms that I will use throughout.
The Migrate API is the key to moving content from any source to Drupal 8. Migrations are essentially ETL processes, where ETL stands for Extract, Transform, Load.
Extract: The first phase, in which data is extracted from a source via source plugins. The source may be a lower-version Drupal website, whose data can be read directly from the database of the existing site, or a non-Drupal one whose data is exported in the form of CSV, XML, etc.
Transform: The extracted data is transformed into the format Drupal expects using process plugins.
Load: The transformed data is loaded into a destination in our Drupal 8 website using destination plugins.
Drupal 8 ships with a predefined set of source, process and destination plugins that contain the code for fetching data from the source, transforming it into a D8-compatible format and loading it into a destination.
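Putting the three phases together, every migration template has the same three-part shape. A bare conceptual sketch (the field names here are placeholders, not a working migration):

```yaml
# Skeleton of a migration template: one plugin per ETL phase.
source:
  plugin: csv           # Extract: read rows from the source.
process:
  title: Title          # Transform: map and process source fields.
destination:
  plugin: entity:node   # Load: save the data as Drupal entities.
```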
As already mentioned, D8 provides built-in features for migrating data from CSV, XML and many other formats. Here is how I used a CSV export from SharePoint to move content to a D8 site.
Illustration: Migrating data from a CSV source
Consider the following CSV export from any website that we wish to migrate to Drupal.
books_info.csv
Book ID,Title,Year of publication,Author,Summary
1,In Search of Lost Time v2,2001,Marcel Proust,"Swann's Way, the first part of A la recherche de temps perdu, Marcel Proust's seven-part cycle, was published in 1913. In it, Proust introduces the themes that run through the entire work."
2,Ulysses,2014,James Joyce,"Ulysses chronicles the passage of Leopold Bloom through Dublin during an ordinary day, June 16, 1904. The title parallels and alludes to Odysseus (Latinised into Ulysses), the hero of Homer's Odyss..."
Plan of Migration:
- A custom module named custom_migration is created in the modules/custom/ directory, with the following dependencies:
- migrate
- migrate_tools
- migrate_plus
- migrate_source_csv
- The module contains configuration files (starting with migrate_plus.migration) with the ID, Source, Destination and Mapping (Process) in its /config/install directory. Enabling this module generates the corresponding configuration entities for the migration.
- The source data is present in CSV format within the assets folder: custom_migration/assets/csv/[.........].csv.
- Migrate Tools Drush commands are used to manage the migration.
Note:
- The first row in the CSV is the header row. It supplies the names used to reference each column's data and does not contain actual data. header_row_count in the source section of the migration template is used to specify this.
- delimiter, in this case, is ',' since it separates the individual values.
- enclosure is needed to mark text that itself contains the delimiter, such as long descriptions. This is illustrated while explaining the migration of "text long with summary".
Migration template:
Create a migration template named migrate_plus.migration.book_info.yml:
id: book_info
label: Migrate Book Info
migration_group: Books
# The source part begins here.
source:
  plugin: csv
  # Full path to the file.
  path: 'modules/custom/custom_migration/assets/csv/books_info.csv'
  # Column delimiter. Comma (,) by default.
  delimiter: ','
  # Field enclosure. Double quotation marks (") by default.
  enclosure: '"'
  # The number of rows at the beginning that are not data.
  header_row_count: 1
  # Track changes to source rows so they can be re-imported on update.
  track_changes: true
  # The column(s) to use as a key. Each column specified will
  # create an index in the migration map table, and too many columns
  # may throw an index size error.
  keys:
    - Book ID
# The process part contains the details of processing.
process:
  # title is the default title field of Drupal 8 nodes;
  # Title is the CSV column that contains the data to migrate.
  title: Title
  field_publication_year: Year of publication
  field_author: Author
  # The body needs to be migrated by specifying value, format and summary:
  # this is the way to migrate "text long with summary". Formatted text
  # has a value and a format part; the body field also contains the
  # summary part.
  'field_body/value': Summary
  'field_body/format':
    plugin: default_value
    default_value: full_text
  'field_body/summary': ''
# The target where the data is migrated.
destination:
  # The entity:node plugin is used to migrate into the bundle named book.
  plugin: entity:node
  default_bundle: book
- Every migration template contains an id that uniquely identifies the migration.
- label provides a human-readable display name for the template.
- Multiple migrations may be grouped under a particular head by specifying a migration_group. These three keys make up the basic metadata of the template.
- Next comes the source section. Here we specify the details of the source: the plugin, the path to the CSV file, the delimiter, the enclosure and so on. plugin: csv means there is a plugin named csv in the Plugin/migrate/source directory of migrate_source_csv that contains the code to read the file and return each row of data.
- The process part contains the individual field mappings. Each field can additionally be passed through process plugins that format the data in a way Drupal accepts. The body field is an example: its format property is processed by the default_value plugin, which is used to specify defaults in our template. A list of core process plugins can be found here: https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins
- Finally, we specify the destination. Here we have used the entity:node plugin with a default_bundle of book.
- Note that each migration template is unique, and it is important to understand how to convert the source data into a form Drupal can accept in its field types. This conversion is achieved using the various plugins that already exist in several migration modules; if none of them fulfils the requirement, we may need to write a custom one.
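Process plugins can also be chained, with the output of one feeding the next. As a small illustration (the field and CSV column names here are hypothetical, not part of the book example), two columns are joined with the core concat plugin and the result is trimmed via the core callback plugin:

```yaml
process:
  field_full_name:
    -
      # Join two CSV columns into one value, separated by a space.
      plugin: concat
      source:
        - First name
        - Last name
      delimiter: ' '
    -
      # Run PHP's trim() on the concatenated result.
      plugin: callback
      callable: trim
```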
Once our template is in place we need to do the following:
- Enable the module using drush en custom_migration. This imports the YAML files in the module's config/install directory as configuration entities.
- Next, list the migrations using drush ms. This shows all migration IDs along with their status.
- Run a migration using drush migrate-import [migration-id].
- To revert a migration, use drush migrate-rollback [migration-id].
There are several options that can be used with these commands. You may view the entire list using the help command.
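For reference, a typical Drush session for this example might look as follows. The --group, --update and --limit options shown here are standard Migrate Tools options, but verify them against your installed version with the help command:

```
drush en custom_migration -y              # enable the module and import its config
drush ms                                  # list migrations and their status
drush migrate-import book_info            # run a single migration
drush migrate-import --group=Books        # run every migration in a group
drush migrate-import book_info --update   # re-import rows that changed in the source
drush migrate-rollback book_info          # undo the migration
```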
A few illustrations:
Migrate to a Link field
Suppose the source contains data for the actual URL (url) and the label/title (uri_title) of the link. The link field in Drupal has two properties, uri and title, and these must be specified explicitly in the template.
...
process:
  'field_url/uri': url
  'field_url/title': uri_title
In case we want a default title for the link fields:
...
process:
  'field_url/uri': url
  'field_url/title':
    plugin: default_value
    default_value: 'Click here!'
Migration to paragraphs within nodes
Suppose we have a multi-valued paragraph field used to add "contact details" to a node, where each "contact details" paragraph contains a Contact name and a Contact email.
Keep in mind that the data for each paragraph must be created separately, by its own migration, and then linked to the node from the main migration template of the node.
For this, the migration template for the node must contain the following:
...
process:
  # A temporary property to fetch the contact data created on the fly
  # by a separate migration.
  temp_contact:
    plugin: migration_lookup
    # Name of the migration that is used for the lookup.
    migration: contact
    # Prevent the creation of a stub entity when no relationship is found.
    no_stub: true
    source: ID
  field_contact:
    # The iterator plugin is used for multiple entries.
    plugin: iterator
    # The source is the looked-up value from the temporary property above.
    source:
      - '@temp_contact'
    process:
      # Data for paragraphs goes into paragraph revisions,
      # hence both target_id and target_revision_id are needed.
      target_id: '0'
      target_revision_id: '1'
The template for the creation of each paragraph is as follows:
id: contact
source:
  ...
  # The key is important; it should be a primary identifier.
  keys:
    - ID
  ...
process:
  type:
    plugin: default_value
    # Machine name of the paragraph type.
    default_value: contact
  field_contact_name: Contact name
  field_email_address: Contact Email
destination:
  plugin: entity_reference_revisions:paragraph
Migrating data into the address field
The address field follows the same concept: data is inserted into each of its properties separately:
# country_code of the address field is assigned a default value of US.
'field_address/country_code':
  plugin: default_value
  default_value: US
# The language code is assigned a default value of en.
'field_address/langcode':
  plugin: default_value
  default_value: en
# Address 1, Address 2, City, Zip Code and State are CSV columns
# whose data is inserted into the individual parts of the address.
'field_address/address_line1': Address 1
'field_address/address_line2': Address 2
'field_address/locality': City
'field_address/postal_code': Zip Code
'field_address/administrative_area': State
'field_address/family_name':
  plugin: default_value
  default_value: NA
'field_address/given_name':
  plugin: default_value
  default_value: NA
Migration of Boolean type
Suppose a boolean field is created in Drupal 8 and the CSV contains data for it in the form of TRUE/FALSE under the column name Switch. We use the static_map plugin in this case to pre-define the value to be stored in the Drupal 8 field for each particular value in the CSV. The map is an array (of one or more dimensions) that defines the mapping between source values and destination values.
...
field_boolean:
  plugin: static_map
  source: Switch
  map:
    'TRUE': 1
    'FALSE': 0
Migration to Body field or text long with summary type:
...
'field_body/value': Copy_Content
'field_body/format':
  plugin: default_value
  default_value: rich_text
'field_body/summary': ''
Migrate to taxonomy entity reference fields:
Let us consider that there is data for terms under the header "Terms", separated by commas as follows: Jira, Slack, Agile.
First, we need to explode the data in the Terms column using ', ' as the separator. The separator may be any unique character.
Secondly, we pass the result to the entity_generate plugin, which generates terms on the fly in the vocabulary named taxonomy_type. entity_generate extends the base plugin entity_lookup and creates an entity if it isn't already present.
entity_lookup looks up an entity if it already exists; otherwise, it returns NULL.
...
field_tax_reference:
  -
    # explode is used to split the CSV data using a specific separator.
    plugin: explode
    delimiter: ', '
    source: Terms
  -
    plugin: entity_generate
    entity_type: taxonomy_term
    # The bundle key for taxonomy terms is vid.
    bundle_key: vid
    # The vocabulary where the terms need to be created.
    bundle: taxonomy_type
    # The term name should contain the data.
    value_key: name
Migrate to multi-valued plain text:
A multi-valued entry in a field requires the source to contain the data for each entry separated by a unique character/symbol within the same column. In the template, we specify that delimiter and explode the source data. If it is a single-valued field, we need not use the explode plugin.
...
field_plain_text:
  -
    plugin: explode
    delimiter: ','
    source: Plain text data
Migration to date field:
The format_date plugin converts a date from from_format to to_format.
...
field_date:
  plugin: format_date
  from_format: 'n/j/y'
  to_format: 'Y-m-d'
  source: Date
How a migration template is written depends entirely on the format of the data obtained from the source and the type of field it feeds. Migrations involve well-planned logic that feeds data into Drupal while creating and maintaining all of its dependencies along the way. Combine this with a clear understanding of how data is stored in D8 entities and fields, and you can automate the entire process of the data migration. Moving content from any source to D8 is now easy with migrations. Say goodbye to tiresome manual content updates and hello to the Migrate API.