Insight | Jan 19, 2021
Ask a Core Maintainer Anything 2020
By Nathaniel Catchpole
We set up an internal ask-me-anything session with Nat, the Drupal core maintainer who works at Third and Grove (TAG) and whom we sponsor to contribute to Drupal core every day, and let our engineering team ask him anything they wanted. We got into some very deep topics and learned some things that quite surprised us. Below is a transcript of the most interesting bits of the conversation.
Read on for insights into Drupal 8 and 9!
How do you decide what to work on?
I start my day, and really spend all my time, on the Reviewed & Tested by the Community (RTBC) queue on Drupal.org. There are typically 40-100 issues waiting to be committed. I think it’s been empty just once in the past five years. Some of the issues are small and some of them are huge 200k patches. So really at 9 AM I get on the queue and work the issues, kicking some back for additional refinements and committing some to core.
At the end of the day, there might be more in the queue than when I started. It’s a bit Sisyphean.
In addition, I believe it’s crucial as a Core Maintainer to do client work because you need that perspective from using Drupal in the wild for real client uses. I think that informs our work on core better. I spend roughly half of my time doing client work for these reasons. So it does happen that something annoys me on a client ticket and I then look at improving it in core.
I’m also jointly responsible with xjm for the release schedule. This means keeping an eye on issues that we’re hoping to get done for the next minor release, as well as identifying blockers (and sometimes unblocking) for Drupal 10, which means trying to plan a year or three ahead. This could be features, critical bugs, or dependencies like Symfony.
Is working on Core really just like an issue queue for a contributed module, just bigger?
Essentially. There are always core initiatives happening, some of which have their own teams and funding, so they function more like a contributed project within a project. But, at the end of the day, all of that work ends up as an issue in the queue that I work on. Every change in core needs to be worked on by at least three different people (the patch or merge request author, a reviewer, and a committer), so core maintainers never commit their own code directly. It’s always a collaborative process whether you have commit access or not.
How do you handle strategic long term concerns, when things could go in two different philosophic directions, and conflict?
We don’t have an official guidance document or anything; we really just try to hash things out. Some of these debates are long-running. Often with these big decisions we make a meta issue and hash it out there, and at some point, consensus starts to push it in one direction or the other.
Events versus hooks is a good example of a major issue that as yet has not been resolved. In Drupal 8 we added a Symfony events system because it was needed by core functions like routing. We then started adding events in Drupal Core that contributed modules could take advantage of. We also kept the hook system in place, in parallel.
Some people want to deprecate hooks entirely and use only events. Others, like me, prefer to modernize the hook system and drop the event system entirely. We don’t yet have a consensus on this issue, but I’m confident that at some point we will. Until then we’re stuck with both, and while this might be an annoyance sometimes, it’s not really broken.
When there are important bug fixes that are also blocked on architectural changes things can become a bit more urgent, although this doesn’t always mean they’re resolved faster. Having said that, it’s rare that architectural disagreements are what holds an issue up for very long. What tends to take most of the time is ensuring that changes in one system don’t break another and that people are able to move from old APIs to new ones smoothly - more like city planning than architecture.
What are the shortcomings of turning ideas in the issue queue into Drupal functionality?
It’s better than it used to be. In Drupal 6 and 7, it was challenging because the release cycle was 3-4 years and no new features could go into Drupal 6 or 7 after their first stable release. So if new functionality didn’t make it into Drupal 7, that was often kicked back or punted to Drupal 8.
With our six-month release cycle, we now have a clear, quick process to get new functionality into the next release of Drupal or set it on a clear path for a future release, usually no more than a year away. With experimental modules, it’s also a lot easier to introduce things in stages, while keeping the overall system working.
We have a lot of technical debt in core so there have been some occasions where two groups of people will work on different issues, not knowing the other issue exists, and not knowing that these two issues are actually duplicates. Work can get wasted, buried in those unknown duplicates.
There is a new initiative called Bug Smash that is trying to address this, cleaning out old issues and cleaning up the past. It’s been a great success so far at reducing technical debt in Drupal, but it has still resolved only hundreds of issues out of a backlog of thousands.
Getting more people involved in core is great, but it has required adding more structure to how issues progress. I think this speeds up throughput overall since it’s easier to see what stage any particular issue is at, what’s left to be done, what should be tackled in a follow-up or parallel issue etc., but at the cost of additional steps for each individual issue compared to core development say ten years ago.
What are your favorite things to come out of Drupal 8 & 9?
The two major ones for me are the Entity System and the caching layer.
The Entity System changes in Drupal 8 compared to 7 have been great. It actually is a coherent system now. We added automated Entity schema updates late in Drupal 8, which introduced upgrade path bugs at the time but made the system much more powerful. It feels solid and complete now.
Cache tags and contexts, the new caching layers in Drupal 8, were among the first things I worked on in core. Lots of people implement the system wrong, which I notice in client work, but when you implement it right you have a really powerful, granular caching system at your disposal that works with edge caching layers.
What is the right way to use cache tags?
When you add something to the cache system, you give it a cache ID (like an entity ID): a unique identifier for the thing that you are caching. The cache context, by contrast, is not an identifier for the thing being rendered; it covers things that come from the request, like the current user or time zone. So even if you’re rendering the same content teaser, contextual links may or may not show up depending on permissions, or the author name may or may not link to their user profile depending on whether you’re allowed to access it. The combination of cache ID and cache context determines which actual cache object the render API will retrieve when building a page.
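As a rough illustration of how a cache ID and cache contexts combine, here is a minimal Python sketch. It is a toy model, not Drupal's PHP API: the `CONTEXT_RESOLVERS` table, the `build_cache_key` helper, and the key format are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List

# Hypothetical context resolvers: each one maps a request to a string value.
CONTEXT_RESOLVERS: Dict[str, Callable[[dict], str]] = {
    "user.permissions": lambda req: ",".join(sorted(req["permissions"])),
    "timezone": lambda req: req["timezone"],
}


def build_cache_key(cache_id: str, contexts: List[str], request: dict) -> str:
    """Combine the cache ID with the current value of each cache context."""
    parts = [cache_id]
    for name in sorted(contexts):
        parts.append(f"[{name}]={CONTEXT_RESOLVERS[name](request)}")
    return ":".join(parts)


editor = {"permissions": ["access content", "edit any article"], "timezone": "UTC"}
visitor = {"permissions": ["access content"], "timezone": "UTC"}
other_visitor = {"permissions": ["access content"], "timezone": "UTC"}

# Same teaser (same cache ID), but the permissions context differs.
key_editor = build_cache_key("node:42:teaser", ["user.permissions"], editor)
key_visitor = build_cache_key("node:42:teaser", ["user.permissions"], visitor)
key_other = build_cache_key("node:42:teaser", ["user.permissions"], other_visitor)
```

In this sketch the editor and the visitor get different cache entries for the same teaser, while two visitors with identical permissions share one entry.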
Cache tags are stored with the cache items. So a list of ten nodes might have cache tags for each node in the list, as well as the authors of each node and any media items. When a node is updated, we invalidate its cache tags, and the next time that list of nodes is rendered, the tags are checked and it’ll be a cache miss. The list of nodes has to be rerendered, the node we updated has to be rerendered, but the other nine nodes in the list will usually be retrieved from the cache because they don’t have that cache tag.
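The invalidation behaviour described above can be sketched with a small Python model. This is an assumption-laden toy, not Drupal's implementation: the `TaggedCache` class and its counter-based checksum are hypothetical, chosen only to show tags being stored with items and checked on read.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache: each item remembers its tags and a checksum of tag versions."""

    def __init__(self):
        self._items = {}                         # cid -> (value, tags, checksum)
        self._invalidations = defaultdict(int)   # tag -> invalidation count

    def _checksum(self, tags):
        return sum(self._invalidations[tag] for tag in tags)

    def set(self, cid, value, tags):
        tags = frozenset(tags)
        self._items[cid] = (value, tags, self._checksum(tags))

    def get(self, cid):
        item = self._items.get(cid)
        if item is None:
            return None
        value, tags, checksum = item
        # A tag was invalidated after this item was stored: cache miss.
        if checksum != self._checksum(tags):
            return None
        return value

    def invalidate_tags(self, tags):
        for tag in tags:
            self._invalidations[tag] += 1


cache = TaggedCache()
for nid in range(1, 11):
    cache.set(f"node:{nid}:teaser", f"<article>{nid}</article>", [f"node:{nid}"])
# The list carries the tags of every node it contains.
cache.set("node_list", "<ul>...</ul>", [f"node:{nid}" for nid in range(1, 11)])

# Updating node 3 invalidates its tag.
cache.invalidate_tags(["node:3"])
```

After the invalidation, the list and the teaser for node 3 are misses and must be rebuilt, while the other nine teasers are still served from the cache.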
The combination of cache contexts and cache tags means you’re showing the right thing to the right people at the right time.
People tend to get it wrong in areas like access checks. For example, if I’m implementing an access check depending on the author of a node, someone could come along, edit the node, and change the author, and my access check is still based on the old author. These bugs can be hard to track down because by the time a developer goes to look at the information being displayed, the cache may have been invalidated for some other reason. By calling <code>RefinableCacheableDependencyInterface::addCacheableDependency($node)</code> on the access result, the node’s cache tags are attached to it, so when the node is updated, the cached access result is invalidated too.
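To make the stale access check concrete, here is a small Python sketch of the failure and the fix. It only mirrors the idea of declaring a cacheable dependency: the `Node`, `AccessResult`, `check_author_access`, and `save_node` names are hypothetical stand-ins, not Drupal classes or functions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    nid: int
    author_id: int

    def cache_tags(self):
        return {f"node:{self.nid}"}


@dataclass
class AccessResult:
    allowed: bool
    tags: set = field(default_factory=set)

    def add_cacheable_dependency(self, other):
        # Inherit the tags of the object this result depends on.
        self.tags |= other.cache_tags()
        return self


def check_author_access(node, user_id, cache, declare_dependency):
    cid = f"access:{node.nid}:{user_id}"
    if cid in cache:
        return cache[cid].allowed
    result = AccessResult(allowed=(node.author_id == user_id))
    if declare_dependency:
        result.add_cacheable_dependency(node)
    cache[cid] = result
    return result.allowed


def save_node(node, cache):
    """Saving a node drops every cached result tagged with it."""
    stale = [cid for cid, res in cache.items() if res.tags & node.cache_tags()]
    for cid in stale:
        del cache[cid]


def run(declare_dependency):
    cache = {}
    node = Node(nid=1, author_id=7)
    check_author_access(node, 7, cache, declare_dependency)  # cached as allowed
    node.author_id = 8                                       # author changes
    save_node(node, cache)
    return check_author_access(node, 7, cache, declare_dependency)


stale_answer = run(declare_dependency=False)  # old author is still allowed
fresh_answer = run(declare_dependency=True)   # recomputed after the save
```

Without the declared dependency, the cached result carries no tags, so saving the node cannot reach it and the old author keeps access; with it, the save invalidates the result and the check is recomputed.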
Another area that people get wrong, or rather often don’t know exists, is placeholdering. So if you have something that is per user, ideally you want to placeholder that, and we have a Placeholder API in the render system. If you have 200,000 users on a site, you don’t want to cache your entire site header 200,000 times - you want to cache as much as possible once, and their username separately, even though it’s rendered as part of the header. You can take advantage of the placeholdering API almost for free - it just requires using a #lazy_builder callback in a render array for the content that is per-user, instead of putting everything in there directly. The render API will then figure out that this is content which needs placeholdering almost by itself.
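The idea behind placeholdering can be shown with a short Python sketch. It is a simplified model under stated assumptions, not the render API: the `PLACEHOLDER` token, `render_header`, and `username_lazy_builder` are hypothetical names, and the real system decides what to placeholder from cacheability metadata rather than from a hard-coded token.

```python
PLACEHOLDER = "<placeholder:username>"

render_count = {"header": 0}
_shared_cache = {}


def render_header():
    """Expensive, user-independent markup; built and cached once for everyone."""
    if "header" not in _shared_cache:
        render_count["header"] += 1
        _shared_cache["header"] = (
            f"<header><nav>...</nav><span>{PLACEHOLDER}</span></header>"
        )
    return _shared_cache["header"]


def username_lazy_builder(user):
    """Cheap per-user fragment, built only when the placeholder is replaced."""
    return f"Hello, {user}"


def render_page_header(user):
    # Swap the placeholder for the per-user fragment at the last moment.
    return render_header().replace(PLACEHOLDER, username_lazy_builder(user))


pages = [render_page_header(user) for user in ("alice", "bob", "carol")]
```

Here the header markup is built once and reused for all three users, and only the small username fragment is produced per user.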