Insight | Jul 26, 2022

Drupal 9 Migration Challenges: Tales from the Drupal 8 Crypt

By Brent Schultz

As the top Drupal agency in the world, our development team takes on the toughest migration challenges. For example, a recent project required migrating two multilingual Drupal 8 sites into one unified Drupal 9 site. If that's not tricky enough, the two source sites were built in the early Drupal 8 days, before media was in core. Yeesh! Read on to learn how we approach challenging migration relics.

Uncovering Archaic Roadblocks

An immediate challenge presented itself. The core migrate_drupal module has source plugins for the current version of Drupal, but these only read data from the site they run on, that is, the destination site. They can be used for moving data around within the same site but not for migrating data from another site.

Some helpful community members have created the contrib module migrate_drupal_d8, which partially addresses this issue. Its limitation is that it assumes the same entity types exist on both source and destination, because it uses the field API on the destination site to get field definitions. The source sites had some custom entities, like Paragraphs, which do not exist on the destination but whose data still needed to be migrated, usually to nodes or taxonomy terms on the destination.

This Drupal Migration Approach Brought to You by John Hammond

Our solution was to build SQL-based source plugins inspired by those in migrate_drupal_d8 but with more custom code and less reliance on field config. We used a typical setup involving migrate_plus and migrate_tools, with migrations defined in YAML and run with drush commands.

We were not required to maintain revision history for entities; we only needed to migrate the current revision. This was fortunate because the usual way to migrate revisions and keep them in the correct order is to preserve entity and revision ids between the source and destination. That is probably not possible with two source sites, since node ids and the like would likely collide, so no attempt was made to preserve entity ids. Instead, the database was allowed to allocate new ids via the usual serial integer fields, and the migration mapping tables did their job of tracking the relationship between source and destination ids.

“Our approach for multilingual entities was to use two migrations—one for the original language and one for translations.”

The initial language migration creates the new entities, therefore defining the new entity ids. Its source plugin has a SQL query something like this. Nodes are used as the example since they are the most likely case, but nothing here is specific to nodes.

SELECT d.* FROM node_field_data d
INNER JOIN node n ON n.nid = d.nid AND n.vid = d.vid AND n.langcode = d.langcode

The translation migration adds the translations now that we know the new node id. Its SQL query is something like this:

SELECT d.* FROM node_field_data d
INNER JOIN node n ON n.nid = d.nid AND n.vid = d.vid AND n.langcode <> d.langcode

The YAML for the two migrations is identical except that the one for translation sets nid to the new destination node id by using the migration_lookup process plugin, adding translations to the nodes created in the first migration rather than creating new nodes.
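
A rough sketch of what the translation migration's YAML might look like follows. The migration ids, the source plugin name, the group, and the article bundle are all hypothetical. The translations: true destination setting is an assumption about how the core entity destination is told to accept translation rows; it is not spelled out above.

```yaml
# Hypothetical translation migration; ids and plugin names are illustrative only.
id: example_article_translations
label: 'Article translations'
migration_group: example_content
source:
  # Assumed custom SQL source plugin that runs the translation query
  # (n.langcode <> d.langcode) and identifies rows by nid + langcode.
  plugin: example_node_translations
process:
  # Map the source nid to the node id created by the original-language
  # migration, so this row is saved as a translation of that node
  # instead of becoming a new node.
  nid:
    plugin: migration_lookup
    migration: example_article
    source: nid
  langcode: langcode
  title: title
  status: status
destination:
  plugin: 'entity:node'
  default_bundle: article
  # Assumption: lets the entity destination save rows as translations.
  translations: true
```

The original-language migration would be the same file without the nid mapping and, under the assumption above, without translations: true.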

Field values

Base field values will be included in the above queries because they're in the data table. However, other field values need to be obtained by querying the source field tables. E.g., if your source node has a field named field_description, you need to run a query in the source plugin prepareRow() method.

SELECT f.* FROM node__field_description f
WHERE f.entity_id = :nid AND f.langcode = :langcode AND deleted = 0

Here :nid and :langcode are parameters taken from the current row. The resulting array of data can be further manipulated if required and then added to the source data with $row->setSourceProperty(). This can result in a lot of queries (much like loading any complex Drupal entity), but it's usually a lot easier to understand and maintain than trying to add table joins to the actual source plugin query, which would produce duplicate rows for multi-value fields.

Content Moderation

A big gotcha is the fact that, by default, content moderation always creates a new revision when an entity is updated. This causes havoc with the translation migration described above and with any subsequent migration run with the --update option: new revisions are created that no longer match the ids in the migration mapping table. The solution is this patch, which marks the destination entity as syncing during migration. Hopefully, that will soon be committed to core migration.

Media

Now that Media is in core, most images and documents should be media entities that reference a file entity. This usually means that two migrations are required: one to create the file entities and one for the media entities. Both migrations can normally use the same source plugin. The media migration will use migration_lookup to reference the previously created file entity.
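
To make the file-then-media split concrete, here is a minimal sketch of the media side. It assumes a hypothetical file migration called example_files, a hypothetical shared source plugin of the same name that exposes fid and filename, and an image media type whose source field is field_media_image. None of these names come from the project described here.

```yaml
# Hypothetical media migration; all ids and field names are assumptions.
id: example_media_image
label: 'Image media'
source:
  # Assumed custom source plugin, shared with the file migration.
  plugin: example_files
process:
  name: filename
  # Resolve the source file id to the file entity that the separate
  # file migration already created, rather than creating it here.
  field_media_image/target_id:
    plugin: migration_lookup
    migration: example_files
    source: fid
destination:
  plugin: 'entity:media'
  default_bundle: image
```

Because the file entity belongs to its own migration, the migration system tracks it in that migration's mapping table.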
 

“Don't be tempted to take shortcuts.”

We've seen code where the migration for media entities creates the file entity in a process plugin which returns the file id ready to be referenced by the media. This might work but it means that the migration system doesn't know about that file entity. It won't be deleted if the media migration is rolled back.

If the file is accessible, it can usually be downloaded with the download process plugin as part of the file entity migration. You usually want to set file_exists: 'use existing' so you can run the migration multiple times without hammering the source site unnecessarily for files. That said, remember that a rollback of file entities will delete the physical files, so you might want to take some steps to preserve them if you need to roll back and rerun the migration.
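
A file entity migration along these lines might look roughly like the sketch below. The source plugin, the filepath source property, the base URL, and the destination folder are all hypothetical placeholders. The source_url and destination_uri keys are just intermediate values that the uri mapping reads back with the '@' prefix.

```yaml
# Hypothetical file migration; plugin name, paths, and URL are assumptions.
id: example_files
label: 'Files'
source:
  plugin: example_files
  constants:
    source_base: 'https://source.example.com/sites/default/files/'
    destination_base: 'public://migrated/'
process:
  # Build the remote URL from the base plus an assumed relative
  # "filepath" property supplied by the source plugin.
  source_url:
    plugin: concat
    source:
      - constants/source_base
      - filepath
  # Build the local stream-wrapper URI the same way.
  destination_uri:
    plugin: concat
    source:
      - constants/destination_base
      - filepath
  # Download the file, reusing a copy that is already on disk so
  # reruns do not fetch it from the source site again.
  uri:
    plugin: download
    source:
      - '@source_url'
      - '@destination_uri'
    file_exists: 'use existing'
  filename: filename
destination:
  plugin: 'entity:file'
```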

Dependencies

Make sure you set migration dependencies (i.e., what needs to run before this one) using migration_dependencies: required: in the YAML. This will allow you to run a group of migrations in the correct order by using the --group option of drush migrate:import.
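
As a small illustration, the fragment below shows where the dependency declaration sits in a migration definition. The migration ids and the group name are hypothetical, and migration_group is the migrate_plus key that puts a migration into a group.

```yaml
# Fragment of a hypothetical media migration definition.
id: example_media_image
# migrate_plus key; the group name is an assumption.
migration_group: example_content
migration_dependencies:
  required:
    # Must have run before this migration is allowed to run.
    - example_files
# The whole group could then be imported with:
#   drush migrate:import --group=example_content
```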

Considerations

If a migration appears to run correctly but drush migrate:status reports some unprocessed items, this usually means that the source plugin is producing duplicate ids. This prevents the migration from being treated as complete, so any dependent migrations will not run. Check the source queries carefully. It's difficult to give specific advice, but you may need to add a language condition or perhaps force uniqueness with a GROUP BY, depending on the data.

These migration challenges can be scary. But don’t worry. Third and Grove (TAG) can help ease your fears. Contact us today to talk more about how we can help.
