Why I love data files in Jekyll
Chen Hui Jing / @hj_chen
Hello everyone! This is the first time I'm presenting at an online conference, so it's pretty cool to be speaking to everyone from my own room for a change. It's also my first JekyllConf, so this is kinda exciting. Today I'm going to be talking about how I used data files to help me manage the website for SingaporeCSS, which is the only CSS-centric meetup group in Singapore at the moment.
🇲🇾
👾
🏀
🚲
🖌️
👟
💻
🖊️
🎙
🦊
🥑
🧗
🏳️‍🌈
My name is Hui Jing, and these are the emojis that pretty much describe who I am as a person. There is also a diagram explaining my name because I found that Chinese names can sometimes be confusing. Anyway, feel free to interpret the emojis however you like. If any of them don't make sense to you, maybe you could tweet me or something.
🥑 Developer Advocate 🥑
I also have a day job, because gainful employment is kind of a good look on me. Currently, I'm with Nexmo as a Developer Advocate. Nexmo is a company that does communication APIs for messaging, voice and verification, among many others.
What are data files?
Custom data that can be accessed via the Liquid templating system
Supported formats: YAML, JSON, CSV and TSV
Helps avoid repetition in your templates
Keeps the focus on the content
Jekyll is currently on 4.0.0, released last month. If you are on GitHub Pages, then the gem's version is 3.8.5. The ability to store data in external files and then access them was first introduced in 2013 with the 1.3.0 release, so it was already a feature when I picked up Jekyll for the first time.
There are loads of static site generators out there today, and if we look at the most popular few, Jekyll isn't the only one that supports external data files. Hugo has something similar, but it's called Data Templates, and Gatsby uses GraphQL to manage data sources. I have no idea what Next.js does. But my point is, the ability to access custom data stored in external files makes maintenance and organisation of the site much easier to handle.
Using data files (1/2)
Place data files (supported formats only) in the _data folder
Jekyll will recognise them during site generation
Data will be accessible via site.data.NAME_OF_FILE
Some data file basics. Place the data files you'd like Jekyll to access in a _data folder at the root of your project, and Jekyll will parse them during the site generation process. The data in those files can be accessed with the syntax site.data. followed by whatever you named your data file.
Using data files (2/2)
speakers.yml
gloria:
  name: "Gloria Soh"
  twitter: "Gloriasohss"
  shortcode: "gloria"
  bio: "Gloria is a CSS queen who is bitcoin rich."
jacob:
  name: "Jacob Tan"
  twitter: "jacobtyq"
  shortcode: "jacob"
  bio: "Plays too much Factorio… and Developer by day"
sarah:
  name: "Sarah Lienert"
  twitter: "SazzarJ"
  shortcode: "sarah"
  bio: "Sarah likes to bake things for Talk.CSS and teach children."
speakers.html
<ul class="l-speakers c-speakers">
  {% for speaker in site.data.speakers %}
  <li class="l-speaker c-speaker">
    <figure>
      <img class="c-speaker__img" src="{{ site.url }}/assets/img/speakers/{{ speaker[1].shortcode }}.jpg" srcset="{{ site.url }}/assets/img/speakers/{{ speaker[1].shortcode }}@2x.jpg 2x" alt="{{ speaker[1].name }}"/>
      {% if speaker[1].twitter %}
      <figcaption><a class="c-speaker__link" href="https://twitter.com/{{ speaker[1].twitter }}">@{{ speaker[1].twitter }}</a></figcaption>
      {% elsif speaker[1].github %}
      <figcaption><a class="c-speaker__link" href="https://github.com/{{ speaker[1].github }}">@{{ speaker[1].github }}</a></figcaption>
      {% else %}
      <figcaption><span class="c-speaker__link">@{{ speaker[1].shortcode }}</span></figcaption>
      {% endif %}
    </figure>
    <p class="c-speaker__intro">{{ speaker[1].bio }}</p>
  </li>
  {% endfor %}
</ul>
For example, to display a list of speakers on my template with data from the speakers.yml file, I would do a for loop that runs through every entry in the data file and picks out the relevant fields I want formatted in a specific way on my template. I will cover this in more detail later on.
So Jekyll supports 4 file formats, YAML, JSON, CSV and TSV. And we can shape the data in these files to our needs via the Liquid templating engine. If you are already very familiar with Liquid, please bear with me, or take this as a refresher. I will be going into some features of Liquid that are very relevant when dealing with custom data.
So maybe think of this as a data files and Liquid talk.
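As a rough sketch of how one of the non-YAML formats looks in practice, assume a hypothetical _data/venues.csv. The file name and its name and address columns are invented for illustration and are not part of the SingaporeCSS site. Jekyll treats the first row of a CSV file as column headers, so each later row can be read by header name:

```liquid
{% comment %}
  Hypothetical _data/venues.csv (illustration only):

  name,address
  HOOQ,1 Example Street
  Skyscanner,2 Example Avenue
{% endcomment %}
<ul>
{% for venue in site.data.venues %}
  <li>{{ venue.name }}: {{ venue.address }}</li>
{% endfor %}
</ul>
```

The loop itself looks the same as it would for a YAML list, which is part of the appeal: the template does not need to care which format the data came from.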
Liquid templating engine
Open-source template language created by Shopify and written in Ruby
Features include:
Logical and comparison operators (==, !=, <, >, <=, >=, or, and)
Truthy and falsy
Control flow
Iteration
Variables
Filters
Like most self-respecting templating engines, there are numerous functions to add logic to your templates. We have the basic logical and comparison operators, which form the basis for control flow logic. We have equals, not equals, greater than, less than, greater than or equal to, less than or equal to, then and and or. You might notice an omission here, but we'll talk about that more later on.
Then we have truthiness, control flow, iterators, variables and filters. Again, typical in many templating engines. But there are some Liquid specific behaviours and limitations that we need to understand so we can work efficiently with it.
Truthy and falsy in Liquid
All values in Liquid are truthy except nil and false
Strings and arrays, even when empty, are truthy
Check for emptiness using blank or empty
As a web developer, my main programming language of choice is JavaScript. Please don't judge me. But this is probably why I'm not entirely surprised by the slight weirdness of truthiness and falsiness in Liquid. The key point to be aware of is that everything in Liquid is truthy by default, unless it resolves explicitly to false or it's a nil value.
Strings and arrays are considered truthy even if there's nothing in them. Their mere existence is truth. Same goes for the number 0, any integers or floats, you get the picture.
nil is a special empty value that is returned when Liquid cannot find anything to return. It is not a string with the characters "nil"; the letters N, I and L do not show up anywhere. It is nothing.
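A minimal sketch of those rules, using a made-up tagline variable and a hypothetical page.subtitle front matter key (neither comes from the real site):

```liquid
{% assign tagline = "" %}

{% comment %} An empty string still exists, so this branch renders. {% endcomment %}
{% if tagline %}
  <p>tagline is truthy, even though it is empty</p>
{% endif %}

{% comment %} Compare against empty to find out whether there is any content. {% endcomment %}
{% if tagline == empty %}
  <p>tagline has no content</p>
{% endif %}

{% comment %} If page.subtitle is not defined, Liquid returns nil and nothing renders. {% endcomment %}
{% if page.subtitle %}
  <p>{{ page.subtitle }}</p>
{% endif %}
```

The first two checks both pass for the same variable, which is exactly the behaviour that tends to trip people up.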
Control flow
if: Executes a block of code only if a certain condition is true
unless: The opposite of if; executes a block of code only if a certain condition is not met
elsif / else: Adds more conditions within an if or unless block
case / when: Creates a switch statement to compare a variable with different values. case initialises the switch statement, and when compares its values
Now that we've sorted that out, control flow. Control flow is pretty standard, except for the fact that there is and and or but no not operator. There is an unless tag though, which sort of behaves like a not, but not exactly.
It is possible to achieve the effect of a not operator, but we would have to nest conditionals instead of putting them all on the same line. For example, to have a conditional like “if a and b and not c”, you'd have to first do an “if a and b”, then nest an “unless c” within it.
Relevant questions
I picked that example up from a GitHub issue on the Liquid repository, which has been reopened a couple of times over the past 7 years, I think. It is still open today. Clearly, not is something most people expect, but unfortunately it is not provided as a basic operator.
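The “if a and b and not c” workaround described above could be sketched like this. The choice of post.location, post.event-id and post.special as a, b and c is only an assumption to make the example concrete:

```liquid
{% comment %} if a and b ... {% endcomment %}
{% if post.location and post.event-id %}
  {% comment %} ... and not c {% endcomment %}
  {% unless post.special %}
    <p>{{ post.title }} at {{ post.location }}</p>
  {% endunless %}
{% endif %}
```

It is one more level of nesting than a not operator would need, but it reads clearly enough once you know the pattern.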
unless: your best substitute for not
{% unless post.special %}
  <header class="c-event__header">
    <h2 class="c-event__title">
      {{ post.title }}
    </h2>
    <p><strong>Event location:</strong> {{ post.location }}</p>
    <p><strong>Event date:</strong> {{ post.event-date | date: "%b %-d, %Y" }}</p>
  </header>
  <a href="https://www.meetup.com/SingaporeCSS/events/{{ post.event-id }}/" class="c-rsvp"><span>RSVP</span><span>at meetup.com</span></a>
  <div class="c-event__content">
    {{ post.content | markdownify }}
  </div>
{% endunless %}
Controls rendering when conditional returns nil or false
Prevent rendering of empty HTML elements
Useful when used in conjunction with front matter variables
Instead, we have unless. As long as the specified condition is NOT met, the code block will render. So for this example, if my post has the front matter value of special: true, the content inside my unless block is not going to show up.
Could I achieve the same effect with an if/else block, using post.special as the variable in the conditional? Yes, as a matter of fact I could. But it's nice to have options, no?
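For comparison, a trimmed-down sketch of the if/else version, reusing only the title markup from the slide:

```liquid
{% if post.special %}
  {% comment %} Special posts: render nothing here. {% endcomment %}
{% else %}
  <h2 class="c-event__title">
    {{ post.title }}
  </h2>
{% endif %}
```

The empty first branch is the price of not having a not operator, which is why unless tends to read better for this kind of check.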
Iteration
Iterator | Description | Parameters
for | Repeatedly executes a block of code | limit, offset, range, reversed
cycle | Loops through a group of strings and outputs them in the order that they were passed as parameters (must be used within a for loop) | cycle group
tablerow | Generates an HTML table (has to be wrapped with <table> tags) | limit, offset, range, cols
Now let's move on to iterators, which are pretty much a major motivation for most people to want to use templating languages in the first place.
When we refer to data, we often are talking about pieces of information formatted in a specific way, essentially a repeatable pattern. Iterators are important for transforming the data into a format that suits our purpose.
Liquid provides the very basic for loop, but allows for the usage of parameters, like limit, offset or range and so on, which can be combined in different ways to make Jekyll quite flexible.
And now I'm going to illustrate some of the parameters by going through the different for loops which are used in the SingaporeCSS website.
Display only latest post
{% for post in site.posts limit:1 %}
<article class="c-upcoming-event {{ post.css }}">
  {% unless post.special %}
  <header class="c-event__header">
    <h2 class="c-event__title">
      {{ post.title }}
    </h2>
    <p><strong>Event location:</strong> {{ post.location }}</p>
    <p><strong>Event date:</strong> {{ post.event-date | date: "%b %-d, %Y" }}</p>
  </header>
  <a href="https://www.meetup.com/SingaporeCSS/events/{{ post.event-id }}/" class="c-rsvp"><span>RSVP</span><span>at meetup.com</span></a>
  <div class="c-event__content">
    {{ post.content | markdownify }}
  </div>
  {% endunless %}
</article>
{% endfor %}
This is the homepage of the site. And what I wanted was for some of the details of the upcoming meetup to be displayed, appropriately formatted, of course. Jekyll already has site.posts as a predefined variable we can use to access all the posts in our _posts folder.
This can be done in a fairly straightforward manner using the limit parameter with the value of 1 in the for loop, since posts are in reverse chronological order by default.
Display list of 3 posts before the latest one
<h2>Past meetups</h2>
<ul class="l-past-events c-past-events">
  {% for post in site.posts offset:1 limit:3 %}
  <li class="l-past-event c-past-event">
    <a class="c-past-event__link" href="{{ post.url | prepend: site.baseurl }}">
      <span class="c-past-event__meta">{{ post.event-date | date: "%b %-d, %Y" }}</span>
      <h3>{{ post.title }}</h3>
    </a>
  </li>
  {% endfor %}
</ul>
<a class="l-archive__link c-archive__link" href="{{ "/archives" | prepend: site.baseurl }}">View full list</a>
The next section is a preview list of the 3 previous meetups. Here I didn't want the latest post to be on the list, so this is where the offset parameter comes in handy.
I could just offset the post listing by 1 and only display from the second post onwards. With offset, you can also skip entire blocks of posts if that's what you want to do.
There are other options like cycle and range, but they haven't been necessary thus far. Maybe if I add more features to the site in future.
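For completeness, a small sketch of the options not used on the site. The loops and the odd and even class names are hypothetical, made up purely to show the syntax:

```liquid
{% comment %} reversed flips the default order, so the oldest post comes first. {% endcomment %}
{% for post in site.posts reversed %}
  <p>{{ post.title }}</p>
{% endfor %}

{% comment %} A range loops over numbers; cycle alternates its values on each pass. {% endcomment %}
{% for i in (1..4) %}
  <p class="{% cycle "odd", "even" %}">Row {{ i }}</p>
{% endfor %}
```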
Variables
Possible to create new Liquid variables in addition to predefined ones
assign: Creates a new variable
capture: Captures the string inside of the opening and closing tags and assigns it to a variable
increment: Creates a new number variable, and increases its value by one every time it is called
decrement: Creates a new number variable, and decreases its value by one every time it is called
Custom front matter variables can be defined and accessed via the page.VARIABLE_NAME syntax
Jekyll provides some predefined global and site variables which cover a lot of common use cases, but also allows us to create our own. There are 2 types of custom variables, if you will. One is custom front matter, which you can access via the page.VARIABLE_NAME syntax, and the other is to use the assign or capture tags.
As for the increment and decrement options, I have yet to discover a relevant use case for them, but if you have used them before, I'd love to hear about it.
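One possible use, offered only as a sketch and not something the SingaporeCSS site does, is numbering items as they render. The counter name speaker_number is made up. As far as I understand the Liquid documentation, the first call to increment outputs 0, so the numbering starts from zero:

```liquid
{% for speaker in site.data.speakers %}
  <p>#{% increment speaker_number %}: {{ speaker[1].name }}</p>
{% endfor %}
```

Inside a for loop, forloop.index would usually be the simpler choice, so increment probably only earns its keep when the count has to carry on across separate loops or includes.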
Establishing relationships across data files
speakers.yml
gloria:
  name: "Gloria Soh"
  twitter: "Gloriasohss"
  shortcode: "gloria"
  bio: "Gloria is a CSS queen who is bitcoin rich."
jacob:
  name: "Jacob Tan"
  twitter: "jacobtyq"
  shortcode: "jacob"
  bio: "Plays too much Factorio… and Developer by day"
sarah:
  name: "Sarah Lienert"
  twitter: "SazzarJ"
  shortcode: "sarah"
  bio: "Sarah likes to bake things for Talk.CSS and teach children."
videos.yml
s2502:
  title: "CSS grid for noobs"
  link: "https://youtu.be/0Laanmn3zWc"
  shortcode: "s2502"
  description: "Shirlaine and Gloria share their experience of learning and using CSS grid. Video from JuniorDev.SG ."
s2601:
  title: "Frontend development for distributed teams"
  link: "https://youtu.be/xj-MbcBfu6Y"
  shortcode: "s2601"
  description: "Ardy shares how frontend development is done at Skyscanner, with a distributed team of more than 400 engineers."
s2602:
  title: "Faux sub-grid"
  link: "https://youtu.be/ChnITWfUfFU"
  shortcode: "s2602"
  description: "Using CSS grid but sad that subgrid isn't here yet? Zell has open-sourced his Sass-powered workaround."
meetups.yml
25:
  colour:
    name: lime
    hex: "#00FF00"
    rgba: rgba(0, 255, 0, 1)
  videos:
    - ref: s2501
    - ref: s2502
  speakers:
    - ref: ed
    - ref: gloria
    - ref: shirlaine
26:
  colour:
    name: indigo
    hex: "#4b0082"
    rgba: rgba(75, 0, 130, 1)
    text: "#fff"
  videos:
    - ref: s2601
    - ref: s2602
    - ref: s2603
  speakers:
    - ref: zell
    - ref: sheldon
    - ref: ardy
The SingaporeCSS website had always been a pet project of mine over the past 3 years and even though I doubt anybody visits the site, it's my baby and I love it regardless. When I first started, all the markup was manually written, which didn't make sense as the site grew because every post had a similar format. But I was lazy. Oh, so lazy.
When I finally wanted to create a page to list all speakers, I realised the existing implementation made the idea infeasible, and hence I stopped procrastinating and decided to resolve this technical debt.
Some planning was necessary in order to extract all the hard-coded values into data files that could be iterated upon. Each post represented 1 meetup, each meetup had multiple speakers with their corresponding videos. Videos would not be repeated across meetups, but speakers could be.
Then we also have variations, where sometimes there would be no video, and also some additional bits like CSS colour of the month, or news updates. So it took a combination of templating features to get things wired together.
Using assign to simplify custom data (1/2)
40:
  colour:
    name: lightseagreen
    hex: "#20b2aa"
    rgba: rgba(32, 178, 170, 1)
  videos:
    - ref: s4001
    - ref: s4002
    - ref: s4003
    - ref: s4004
  speakers:
    - ref: alex
    - ref: ujjwal
    - ref: linhan
    - ref: mike
{% assign meetup = site.data.meetups[page.meetup] %}
<div class="c-videos">
  {% comment %} other code {% endcomment %}
  {% for videos in meetup.videos %}
  {% assign video = site.data.videos[videos.ref] %}
  <div class="c-video">
    <a class="c-video__link" href="{{ video.link }}">
      <img class="c-video__img" src="{{ site.url }}/assets/img/videos/talk-{{ page.meetup }}/{{ video.shortcode }}.jpg" srcset="{{ site.url }}/assets/img/videos/talk-{{ page.meetup }}/{{ video.shortcode }}@2x.jpg 2x" alt="Link to {{ video.title }} video"/>
    </a>
    <p class="c-video__desc">{{ video.description }}</p>
  </div>
  {% endfor %}
  {% comment %} moar code {% endcomment %}
</div>
My data format of choice is YAML, so all the examples today are using YAML. The key data file was the meetups.yml file, as it was the one that referenced both the speakers file and the videos file, and sort of tied them together.
To access specific values in the file, the syntax ended up being rather long, like site.data.meetups[KEY_VALUE]. Here's where I used the assign tag to put all that into a variable. This made the code much neater for when I wanted to loop through the video entries in the meetup data file, then use those values as keys in the videos data file.
I make use of the matching of values in one file with the keys of another in order to stitch all the data together, so if you do dig into my source code, you will find things like shortcode or ref, whose main purpose is matching key-value pairs. Sort of like a foreign key in relational databases, I suppose.
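The same lookup pattern works for speakers. This is a sketch that mirrors the video loop above rather than the site's actual speaker template, and the entry variable name and the markup are my own:

```liquid
{% assign meetup = site.data.meetups[page.meetup] %}
{% for entry in meetup.speakers %}
  {% comment %} entry.ref (e.g. "alex") is used as the key into speakers.yml. {% endcomment %}
  {% assign speaker = site.data.speakers[entry.ref] %}
  {% if speaker %}
    <p>{{ speaker.name }}: {{ speaker.bio }}</p>
  {% endif %}
{% endfor %}
```

The if speaker check leans on Liquid's truthiness: if a ref has no matching key in speakers.yml, the lookup returns nil and that entry is skipped instead of rendering empty markup.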
Using assign to simplify custom data (2/2)
40:
  colour:
    name: lightseagreen
    hex: "#20b2aa"
    rgba: rgba(32, 178, 170, 1)
  videos:
    - ref: s4001
    - ref: s4002
    - ref: s4003
    - ref: s4004
  speakers:
    - ref: alex
    - ref: ujjwal
    - ref: linhan
    - ref: mike
{% if meetup.colour %}
{% assign colour = meetup.colour %}
<div class="c-colour">
  {% if colour.text %}
  <div class="c-swatch" style="background-color:{{ colour.hex }};color:{{ colour.text }}">
  {% else %}
  <div class="c-swatch" style="background-color:{{ colour.hex }}">
  {% endif %}
    <div class="c-swatch__txt">
      <p>{{ colour.name }}</p>
      <p>{{ colour.hex }}</p>
      <p>{{ colour.rgba }}</p>
    </div>
  </div>
  <h4>CSS colour of the month</h4>
</div>
{% endif %}
To deal with the fact that some sections on the site were added on later, like the CSS colour of the month, some checking logic was required. So here's where Liquid's truthiness comes in handy. I can check for meetup.colour, and if this key does not exist in the entry on the meetup data file, the code block will not render as Jekyll returns nil in that case.
It all sounds rather straightforward now that I've streamlined the explanation to the pertinent bits, but it took me quite a while to figure out Liquid's truthiness which is why I spent a slide and a half talking about it earlier.
I personally prefer doing an additional assign for aesthetic reasons so the code is easier to read, but you don't really have to if you are okay with the longform style.
Count of speakers with forloop.length
Variables can be used outside the for loop they were declared in.
{% for speaker in site.data.speakers %}{% assign count = forloop.length %}{% endfor %}
So far, {{ count }} lovely people have spoken at Talk.CSS, we want you to be one of them too. If you're thinking, “but I have nothing to talk about”, we can help. Anything related to CSS will work. It could be something you debugged at work, or about a new property you read about, or something you built, or even a rant about why something doesn't work the way you want it.
☝️ speakers.md ☝️
Earlier on I mentioned that I wanted a listing of all speakers, right? Liquid also has a forloop object which contains attributes of its parent for loop. And this is pretty useful.
It allowed me to have a dynamically updated speaker count as part of the copy for the speakers page, so every time I added a new speaker to the file, this number updates itself.
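As an alternative sketch, not what the site uses, Liquid's size filter should give the same number without running a loop, since it returns the number of entries in the data file:

```liquid
So far, {{ site.data.speakers | size }} lovely people have spoken at Talk.CSS, we want you to be one of them too.
```

The forloop.length approach has the advantage of demonstrating that a variable assigned inside a loop is still available after it ends, which is the point of the slide.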
Using capture for complex strings
Variables created through {% capture %} are strings.
Can create complex strings using other variables created with assign.
{% if page.css %}
  {% capture special_styles %}/assets/css/{{ page.css }}.css{% endcapture %}
  <link rel="stylesheet" href="{{ special_styles | prepend: site.baseurl }}">
{% endif %}
Finally, let's talk about capture. With assign, you can wrap a value in quotation marks to save it as a string. capture takes all the content between its opening and closing tags and assigns it to the variable.
So it may seem like there's simply a minor difference in syntax here. But the advantage of capture is you can use it to create complex strings that are made up of multiple string variables created through assign.
My particular use case was for conditional CSS in the head of my site, but I wanted the URL of the stylesheet to be processed through filters, just to make sure I didn't have inconsistent paths.
One thing about filters is that all the Liquid filters are supported by Jekyll, in addition to those that are specific to Jekyll only. The one I've used here, prepend, is a standard Liquid filter applied to the Jekyll site variable site.baseurl, while markdownify from the earlier examples is a Jekyll-specific filter. The full list is available on the documentation site.
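A possible variation, sketched here as an alternative and not taken from the site, builds the same path with Liquid's append filter and hands it to Jekyll's relative_url filter, which prepends the baseurl for you:

```liquid
{% if page.css %}
  {% assign special_styles = "/assets/css/" | append: page.css | append: ".css" %}
  <link rel="stylesheet" href="{{ special_styles | relative_url }}">
{% endif %}
```

Whether this reads better than capture is a matter of taste. capture keeps the path looking like a path, while the append chain keeps everything in a single tag.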
SingaporeCSS website workflow (1/2)
speakers.yml
ollie:
  name: "Ollie Monk"
  twitter: "ollie_monk"
  shortcode: "ollie"
  bio: "A modern Renaissance man with a wide range of interests. Looking for his headphones. Clearly did not write this bio himself."
jinjiang:
  name: "Zhao Jinjiang"
  twitter: "zhaojinjiang"
  shortcode: "jinjiang"
  bio: "0.1x engineer. Loves football. Does frontend."
post.markdown
---
layout: post
title: "Talk.CSS #43"
date: 2019-08-07 19:00:00 +0800
location: HOOQ
event-date: 2019-09-04 19:00:00 +0800
categories: meetup
meetup: 43
event-id: 261692436
novideos: true
---
Now with the templates all set up, I no longer have to type (or copy and paste) any markup code. The site is managed entirely via markdown files and front matter on each individual post, and the 3 data files.
When a new speaker submits a talk, they are added to the speakers data file with required information like name, social media handles, 1-liner bios etc., and this information is used in the speaker.html include file, using shortcode as the reference.
The post itself is written in markdown, with some custom front matter variables, that are used in the templates as logic checks. For example, before the meetup, there are no videos, so with novideos set to true, the markup for the videos section won't render.
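A minimal sketch of how such a check might look, assuming the videos section is wrapped in an unless block keyed on the novideos front matter (the exact wiring in the real template may differ):

```liquid
{% unless page.novideos %}
<div class="c-videos">
  {% comment %} video loop goes here {% endcomment %}
</div>
{% endunless %}
```

Once novideos is removed from the front matter, page.novideos resolves to nil and the section renders.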
videos.yml
s4301:
  title: "Lean animations"
  link: "https://youtu.be/_QN_ahQN0Ic"
  shortcode: "s4301"
  description: "Jinjiang shares some very practical tips on how to quickly add animations to your design or product."
s4302:
  title: "An exploration of CJSS - A CSS based web framework"
  link: "https://youtu.be/x1U9tPwRiiw"
  shortcode: "s4302"
  description: "CSS-in-JS? How about JS-in-CSS instead? Yishu ponders some deep questions about web development, plus LIVE-CODING!"
s4303:
  title: "Sometimes you gotta abuse Grid"
  link: "https://youtu.be/9DwzDzCNDwI"
  shortcode: "s4303"
  description: "Sheldon encountered a problem and solved it the best way he could, with Grid. Context is everything, my friends."
post.markdown
---
layout: post
title: "Talk.CSS #43"
date: 2019-08-07 19:00:00 +0800
location: HOOQ
event-date: 2019-09-04 19:00:00 +0800
categories: meetup
meetup: 43
event-id: 261692436
---
Once a meetup is done and dusted, the content of the post, which had previously been details of each speaker's talk, would be replaced with a short paragraph of recap text.
The novideos front matter would be removed, and the videos go into the videos.yml file, and that data gets parsed into the video section of the recap template. Same goes for CSS colour of the month, as that information gets added to the meetups.yml file after the meetup is over.
I could have saved a bit of time by doing this refactoring a bit earlier, but oh well.
If you are also using YAML…
There is a difference between lists and dictionaries.
41:
  colour:
    name: steelblue
    hex: "#4682b4"
    rgba: rgba(70, 130, 180, 1)
    text: "#fff"
  videos:
    - ref: s4101
    - ref: s4102
    - ref: s4103
    - ref: s4104
  speakers:
    - ref: van
    - ref: dylan
    - ref: weiyuan
    - ref: ollie
All members of a list are lines at the same indentation level starting with a "- "
A dictionary is represented by key/value pairs in the form key: value, where the colon must be followed by a space
Complicated data structures like lists of dictionaries, or dictionaries with lists as values are possible
Okay, I'm almost done here. I wasn't too familiar with YAML when I started, but after a lot of trial and error, I learned that the dash makes a difference in how the YAML file is parsed. Because I was using the square bracket notation to iterate over videos and speakers, those had to be a list. If I used a dictionary syntax, that is, without the dash, things didn't work out.
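The difference shows up in what the for loop hands you on each pass. A short sketch using the data files from the earlier slides (the markup is my own):

```liquid
{% assign meetup = site.data.meetups[page.meetup] %}

{% comment %} List (entries start with a dash): each item is the entry itself. {% endcomment %}
{% for video in meetup.videos %}
  <p>{{ video.ref }}</p>
{% endfor %}

{% comment %} Dictionary (key: value pairs): each item is a [key, value] pair. {% endcomment %}
{% for speaker in site.data.speakers %}
  <p>{{ speaker[0] }}: {{ speaker[1].name }}</p>
{% endfor %}
```

This is also why the speakers template earlier uses speaker[1] everywhere: speakers.yml is a dictionary, so index 0 is the key and index 1 is the actual speaker data.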
I want to plug the Jekyll documentation, which I personally find really well written and should be your go-to reference whenever you want to implement something slightly more complicated than rendering front matter on your template.
I hope some of the things I covered today are useful to some of you, and thank you all for taking the time to listen to me talk about data files! If you have any questions, you can find me on Twitter, and I'll try my best to answer them.
https://huijing.github.io/slides/67-jekyllconf-2019 @hj_chen