Kyle Stratis Software Developer in Boston, MA

Hash Traversal in Perl

Today, I want to talk about hash traversal in Perl, with the aim of flattening a complex multi-level hash into a simpler single-level hash. Hash traversal is a perfect way to get some practice with recursion, so we will be using that approach here. Let’s outline how this might look:

  1. Inputs
  2. Iterate over original hash
  3. Recursion!
  4. Flattening

We can choose to flatten in any way we want. Since this was originally conceived to work with MongoDB queries (specifically to assist in accessing values in complex objects), we will join the keys down to the final key with a period (.).

Inputs

Since we will be writing a recursive function, we know we will at least have to take as an input the next level of hash available (we will be doing a depth-first traversal). Since we need to keep track of every level we get into for flattening, we should also keep track of the keys through each call. Another hash reference, which we will call output, will also need to be passed in to each recursive call, which will be loaded with our key-value pairs as we unwind the stack.

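This will be a special, unnamed first parameter. On the call your program makes to flatten_hash, it should be a reference to an empty hash: the function dereferences it right away, and dereferencing undef there is a fatal error. So the parameters we will pass in to our function will be the empty output hashref, your input hash (as original_hash), and an initially empty arrayref (as keys_list) that will keep track of the keys that point to subhashes.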

# Named inputs: original_hash, keys_list
sub flatten_hash {
    my %output = %{shift @_};
    my %args = @_;
}

Iteration

As a necessity, we need to iterate over the keys of the hash we are working on. Instead of the more common foreach my $key (keys %hash) idiom, we will instead use the each built-in. There are some arguments against using each, which include the fact that the each internals can get confused if the original hash is modified while being iterated through. Luckily, that doesn’t cover our use case, and each is fine to use here. Use caution with it in general, though.
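One of the reasons each deserves caution is that every hash has a single internal iterator, shared by each, keys, and values. The following is a minimal sketch (the hash %h and its contents are made up for illustration) showing that calling keys resets that iterator, which is why mixing the two on the same hash inside a loop can cause surprises:

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

my ($first) = each %h;   # take one pair; the iterator now sits after it
keys %h;                 # calling keys resets that same iterator
my ($again) = each %h;   # so this starts over from the first pair

print $first eq $again ? "iterator was reset\n" : "iterator kept going\n";
# prints "iterator was reset"
```

In flatten_hash this is not a problem, because each level of recursion works on a different hash with its own iterator.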

each returns to us the key and value of each element of a hash. We will unpack those results as we iterate, and also set up the recursive call and the list of previous keys.

while (my ($key, $value) = each(%{$args{original_hash}})) {
    my @data_address = defined($args{keys_list}) ? @{$args{keys_list}} : ();
    push(@data_address, $key);
}

Notice the ternary operator - if the keys_list parameter is defined, we will dereference it and set it to @data_address, otherwise we will initialize @data_address as an empty array. We will then push the current key we’re on to the end of the @data_address array.

Recursion

We looked a little bit at the $key part of the each call’s results, but let’s take a look at the $value return. If we’re examining a multi-level hash, there will be occasions where $value’s type is a hashref, instead of a scalar. That indicates to us that there is more to explore – we have arrived at our recursive case.

if (ref($value) eq 'HASH') {
    %output = flatten_hash(\%output, original_hash => $value, keys_list => \@data_address);
}

Flattening

If $value is not a reference to a hash, then we’re in our base case (you can expand the recursive case check to include arrays, but I’ll leave that as an exercise to the reader) - $value is an actual scalar value.

else {
    my $addr = join('.', @data_address);
    $output{$addr} = $value;
}

Here, we join all the keys leading to this particular piece of data with a period. You can use whatever delimiter you need for your specific application. Then we assign the value stored in $value to the new combined key in the %output hash, which creates the entry. All that is left is to return %output, which you can see in the final code below:

sub flatten_hash {
    my %output = %{shift @_};
    my %args = @_;
    while (my ($key, $value) = each(%{$args{original_hash}})) {
        my @data_address = defined($args{keys_list}) ? @{$args{keys_list}} : ();
        push(@data_address, $key);

        if (ref($value) eq 'HASH') {
            %output = flatten_hash(\%output, original_hash => $value, keys_list => \@data_address);
        }
        else {
            my $addr = join('.', @data_address);
            $output{$addr} = $value;
        }
    }
    return %output;
}

Here is a quick test script that shows how to use this method and what its output looks like.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $test_hash = {
    a => 'a',
    b => {
        d => 'd',
        e => 'e',
        f => {
            g => 'g',
            h => 'h',
        },
    },
    c => 'c',
};

my %empty;
my %res = flatten_hash(\%empty, original_hash => $test_hash, keys_list => []);
print "Test hash:\n";
print Dumper($test_hash);
print "Result\n";
print Dumper(\%res);
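Data::Dumper prints keys in hash order, which can differ from run to run. As a self-contained sketch of what the flattened result contains, the script below repeats flatten_hash from this post and prints the pairs in sorted order. The sorted printing is an addition for illustration, not part of the original test script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# flatten_hash as developed in this post
sub flatten_hash {
    my %output = %{shift @_};
    my %args = @_;
    while (my ($key, $value) = each(%{$args{original_hash}})) {
        my @data_address = defined($args{keys_list}) ? @{$args{keys_list}} : ();
        push(@data_address, $key);
        if (ref($value) eq 'HASH') {
            %output = flatten_hash(\%output, original_hash => $value, keys_list => \@data_address);
        }
        else {
            $output{join('.', @data_address)} = $value;
        }
    }
    return %output;
}

my $test_hash = {
    a => 'a',
    b => { d => 'd', e => 'e', f => { g => 'g', h => 'h' } },
    c => 'c',
};

my %res = flatten_hash({}, original_hash => $test_hash, keys_list => []);

# Sort the keys so the listing is stable from run to run
my @lines = map { "$_ => $res{$_}" } sort keys %res;
print "$_\n" for @lines;
# a => a
# b.d => d
# b.e => e
# b.f.g => g
# b.f.h => h
# c => c
```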

This code grew out of modifications I made to an excellent Stack Overflow answer, which gives a basic template for traversing a hash with a callback in Perl.
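For the exercise mentioned earlier (extending the recursive case to arrays), here is one possible sketch. It is not the function from this post: flatten_any is a hypothetical name, it takes positional arguments, and it fills a single hashref passed down by reference instead of copying %output on every call. Array elements are addressed by their index, which matches how MongoDB's dot notation refers to array positions.

```perl
use strict;
use warnings;

# Hypothetical variant: walks hashes and arrays, collecting leaves in $out
sub flatten_any {
    my ($node, $prefix, $out) = @_;
    $prefix //= [];   # keys (or indexes) leading to $node
    $out    //= {};   # shared accumulator, created on the first call

    if (ref($node) eq 'HASH') {
        flatten_any($node->{$_}, [@$prefix, $_], $out) for keys %$node;
    }
    elsif (ref($node) eq 'ARRAY') {
        flatten_any($node->[$_], [@$prefix, $_], $out) for 0 .. $#$node;
    }
    else {
        # Base case: a plain value, stored under the joined path
        $out->{ join('.', @$prefix) } = $node;
    }
    return $out;
}

my $flat = flatten_any({ a => 1, b => { c => [10, { d => 20 }] } });
print "$_ => $flat->{$_}\n" for sort keys %$flat;
# a => 1
# b.c.0 => 10
# b.c.1.d => 20
```

Note that empty hashes and empty arrays contribute nothing to the output in this sketch; whether that is acceptable depends on your application.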

Hash Slices in Perl

Recently, I had a bug at work that would be solved by simply stripping certain characters from the keys of a hash. This seems easy at first: iterate through the keys with the keys operator and run a regex substitution on that, something like the following (with the goal of stripping periods):

my %my_hash = ('al.pha' => 'a', 'beta' => 'b');

s/\.*//g foreach keys %my_hash;

For those deeply familiar with Perl, the error probably seems obvious - keys doesn’t return references to the keys in the hash for you to manipulate in place, it instead returns copies of the keys. These copies are what we’re iterating through with foreach and stripping the period from.

If we use Data::Dumper on %my_hash, we will be greeted with the hash in its original state:

$VAR1 = {
          'al.pha' => 'a',
          'beta' => 'b'
        };

But if we instead print the default variable within the foreach after the substitution with a little code reorganization:

foreach (keys %my_hash) {
    s/\.*//g;
    print "Default var: $_\n";
}
print Dumper(keys %my_hash);

This is our output (the order of the keys may vary between runs):

Default var: alpha
Default var: beta
$VAR1 = 'al.pha';
$VAR2 = 'beta';

So the keys stored in $_ were correctly stripped of periods, but the actual keys in the hash were unaffected. Let’s continue on this path (using the substitution operator s/// with a foreach) - if you have used Perl for any appreciable amount of time, you know TIMTOWTDI (pronounced “Tim Toady”: there is more than one way to do it) is one of the language’s guiding philosophies, and there are a number of ways to tackle a problem like this. We will be continuing the use of keys, though. Since keys returns a copy of a hash’s keys, we can load it into a new array:

my @new_keys = keys %my_hash;

Easy enough. Now we can iterate over this and use the substitution operator to substitute in place:

s/\.*//g foreach @new_keys;

Here comes the Perl magic. Can you see all the things that are accomplished in this one line?

@my_hash{@new_keys} = delete @my_hash{keys %my_hash};

Here we finally make use of hash slices - twice, in fact. Both times that we encounter my_hash, it’s in a list context, as are the keys that we are addressing. This technique allows us to operate on or assign to a list of keys all at once. So @my_hash{keys %my_hash}, in the state it is in now, gives us a list of all the values that are in my_hash. Why are we calling delete on those, though? delete has an interesting side effect beyond deleting things: delete also returns what it is deleting.

So not only do we delete the old values from the hash, we load them into the appropriate new keys, all in one line! We can also guarantee that the values get loaded into the correct keys. This is because keys returns entries in the same (apparently random) order every time it is called on the same hash, as long as the hash is not modified between calls. One caveat: if two old keys collapse into the same new key after the substitution, only one of their values survives.
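To see the return value of delete on a hash slice in isolation, here is a small sketch (the hash %h and its keys are made up for illustration):

```perl
use strict;
use warnings;

my %h = (x => 1, y => 2, z => 3);

# delete on a hash slice removes several keys at once and
# returns their values in the order the keys were listed
my @removed = delete @h{qw(x z)};

print "removed: @removed\n";                    # removed: 1 3
print "left: ", join(',', sort keys %h), "\n";  # left: y
```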

To put it all together, we have three lines (any wizards want to make it even shorter?) that, in effect, will run a regex substitution on all keys of a hash, seemingly in place.

my @new_keys = keys %my_hash;
s/\.*//g foreach @new_keys;
@my_hash{@new_keys} = delete @my_hash{keys %my_hash};

And the results:

$VAR1 = {
          'alpha' => 'a',
          'beta' => 'b'
        };

Success!
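In the spirit of TIMTOWTDI, here is one shorter alternative, offered as a sketch and not as the approach used above. It rebuilds the hash with map and relies on the /r modifier of the substitution operator, which needs Perl 5.14 or later. The variable $new is introduced only for this example:

```perl
use strict;
use warnings;

my %my_hash = ('al.pha' => 'a', 'beta' => 'b');

# s///r returns the changed string and leaves $_ untouched,
# so each old key can still be used to look up its value
%my_hash = map { my $new = s/\.//gr; ($new => $my_hash{$_}) } keys %my_hash;

print "$_ => $my_hash{$_}\n" for sort keys %my_hash;
# alpha => a
# beta => b
```

The same caveat applies as with the slice version: keys that become identical after the substitution will overwrite one another.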

A Short Update

Well, it’s been quite a while, hasn’t it? I know this site doesn’t get a ton of traffic, but it probably still makes sense to explain the long gap in posts. Around this time last year, I got engaged. After the initial excitement leveled out, panic decided to set in. How will I pay for the wedding? How will I continue to pay for my student loans? I eventually decided to begin looking for other jobs, putting this blog on the backburner. I looked all over the state of Florida, not wanting to be too far from where my new fiancée and I were to be married. Like my initial job search, it was met with a lot of auto-rejects due to not having a CS degree. After many, many applications, lots of coding challenges, and working through the first few levels of the Google Secret Interview challenge (which was an absolute blast and may become fodder for another post in the future), I finally got an interview locally, this time with homes.com. The first two interviews were over the phone, first with the recruiter, then with the hiring manager. The second interview described the job in more depth, how the team works with large amounts of data, a number of databases, and, of all languages, Perl. The manager liked my excitement for the job (backend data engineering? Too interesting for me to decline!), as well as the fact that I didn’t have a CS degree. Score!

Perl

As a digression, let’s talk a little bit about Perl, and my experience with it. Having had Internet access since the late 90s, I was familiar with Perl via CGI scripts that were common on websites at that time. This was quite a bit different from how the job was described for me, so I wasn’t too worried that I’d be working on some hugely deprecated presentation codebase that hasn’t been meaningfully updated since 2001, but some grad school experience with Perl made me nervous. My first real experience with the language was writing a data parsing script for a project when I was in grad school, and it also happened to be my first experience with regular expressions. You can guess how well that went.

With this memory, I was pretty nervous going into it. So leading up to my technical interview I bought Learning Perl (no ref) and Intermediate Perl (no ref) and began reading. I had made it through Learning Perl prior to my in-person interview, but decided not to rely on my still-wobbly Perl legs for this interview.

The Interview

This interview was actually my first real technical interview. I remember being very nervous, and it took a lot of time (and questions) for me to get an accurate view of the actual problem. I was shaking the entire time, and by the time I left (this was mid-September in Florida) I was sweating all over and sure I had failed. I did eventually get the solution (using Python-ish), but not like I would have if I had done it at home and not surrounded by three strangers. I felt like a failure, and didn’t want to tell my fiancée about this failure. I did, and she was nothing but supportive.

The Call

The next week I got the call: they were extending me an offer, and I took it. I submitted my 2 weeks’ notice to my employer at the time, which was nerve-wracking itself because I had built quite a friendship with many of my coworkers. They were very supportive and understanding, and we had a nice lunch at Tijuana Flats to send me off.

The Job

I started with homes.com at the end of September, and have really been enjoying it. Perl is no longer an untameable monster, and it really can be quite elegant (and fast). Also very odd and quirky, but I enjoy the little bit of challenge that that provides. The people I work with are great, and I’ve again built new friendships that have been very rewarding, and learned a lot of new skills. I also participated in Digital Ocean Hacktoberfest and got a great tee shirt out of some fun bug-hunting for a few great open source projects.

The Future

So what am I working on now, and what does the future have in store? Well, right now I’m starting back up with a traditional-ish CS program to hone some skills that I’ve been lacking in. I’ve been especially interested in machine intelligence, so that is one of the goals. Eventually, I’d like to work on another M.S., this time in computer science. I was also recruited to work on an Android app called WeatherGirl with a few other local developers who are all learning Android development, something I’ve been trying to get started for a while now. On top of all that, I am in the very early stages of writing the life story of my paternal grandparents. My grandmother came to America from Greece as a refugee, then later went back to Greece to find her man at a time when her sisters were put into arranged marriages. On my last visit with them, I learned that there was a lot I still didn’t know, despite reading my great uncle’s book Eleni (non ref) and growing up with them. I am hoping to be able to keep their incredible story alive for my children and, eventually, their children. And come October, I’ll be married, and who knows what is in store for us after that? It’s a very exciting time! With everything going on, it sounds like I’ll be unable to blog much more, but on the contrary, I plan to bring my posting volume back up. I have a few posts already in varying stages of completion, so something should be finding its way up soon. Until then!

Setting Up an Elegant Blog with Github Pages, Jekyll, and Poole; Part II

I guess it’s time to talk a little bit about the finishing touches I did to customize this site further beyond what Poole and Lanyon offer.

Resources

I used the following sites for help with these customizations, ideas, code, and more. Be sure to visit them and poke around a bit:

Enabling Disqus Comments

First, visit the Disqus website, set up an account, and log in. Upon logging in you will be brought to your dashboard. Click the gear in the top right corner, and in the dropdown box select “Add Disqus to Site.” Follow the prompts until you get to the unique code for your Disqus comments.

Now, shifting focus to your site, you’ll want to add the following line to _layouts/default.html:

{% include comments.html %}

If you take a look at my default layout you will see that this is placed in the content container div underneath the liquid tag for the post’s (or page’s) content. The relevant code:

<div class="container content">
  
  {{ content }}
  {% include comments.html %}
  
</div>

The parent divs are wider than the content container, and so adding the include statement outside of this one will make the comments section (once you implement it in comments.html) stretch across the entire page, rather than being confined to the content’s area.

Now that you are armed with Disqus’s universal code and you have set the location of your comments in the HTML, you can create comments.html in the _includes folder. Here’s what my comments.html looks like:


{% if page.comments %}
<div id="disqus_thread"></div>
<script type="text/javascript">
    /* * * CONFIGURATION VARIABLES * * */
    var disqus_shortname = "YOUR_SHORTNAME_HERE"; // placeholder: replace with your Disqus shortname
    var disqus_identifier = "{{ site.disqusid }}{{ page.url | replace:'index.html','' }}";
    
    /* * * DON'T EDIT BELOW THIS LINE * * */
    (function() {
        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
        dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
    })();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>
{% endif %}

The first thing you see is this liquid tag:

{% if page.comments %}

which is paired with

{% endif %}

at the end of the file. This will run all the code it encloses if comments is set to true in the YAML frontmatter of whichever page or post is being generated. After that is the universal code (slightly modified) provided by Disqus. The value of the disqus_shortname variable also comes from Disqus: it is the shortname you chose when registering your site with them. Don’t forget to set comments to true or false in your YAML frontmatter!
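As a minimal sketch, the frontmatter of a post that should show comments might look like the fragment below. The layout and title values are placeholders chosen for illustration; only the comments key is what the if block in comments.html checks.

```yaml
---
layout: post
title: Example Post   # placeholder title
comments: true        # page.comments is what comments.html tests
---
```

Setting comments to false, or leaving it out, means the Disqus block is skipped for that page.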

The disqus_identifier configuration variable does not come from the universal code, but it helps Disqus return the appropriate thread for your article. See more about Disqus configuration variables here.

This code was adapted from Joshua Lande’s excellent article.

Enabling Google Analytics

This is very straightforward and follows the same pattern as setting up the Disqus comments. Go to the Google Analytics site, sign up for an account if you don’t have one already, use universal tracking, and copy the code that they provide. Create a file in the _includes folder and name it something like google_analytics.html. Paste in the code Google gave you and save the file. Now all that’s left is to open _layouts/default.html and add the following line:

{% include google_analytics.html %}

Favicon

A favicon can be built in an image editor like GIMP, and there are a number of sites that have pre-made designs you can choose from. Save your favicon.ico to the folder public (which should have a placeholder favicon already present), then head over to _includes/head.html and verify that the reference to your favicon exists:

<link rel="shortcut icon" href="{{ site.baseurl }}public/favicon.ico">

Tags Page

This page was modified from Patrick Steadman’s site that I mentioned earlier. You can store this page in the root folder for your site as tags.html. A slight modification from his version was to comment out the code that displays the number of articles with a given tag, as you can see in my repo. Feel free to copy and paste the code directly, and note how the YAML frontmatter declares it a page rather than a post.

Sidebar

I again drew inspiration from Patrick Steadman’s design here, which is clean and balanced, and has some nice features not present in the default Lanyon sidebar. Let’s first take a look at the sidebar and its structure. At the top I have a pretty picture of myself cropped as a circle, below which I have some attractive icons that link to my various social networks. Below that still is a short tagline, then site navigation, version number, and a copyright notice (don’t worry, this is just for the blog content, the actual code is available under MIT licensing). You can find the sidebar code in _includes/sidebar.html. All of the code in the following subsections will be within the first div of this file, aside from the CSS file:

<div class="sidebar" id="sidebar">

Custom CSS

First, though, we will want to apply some custom CSS for the sidebar to allow for a few aesthetic features like opacity. To do this, we will create a new file in public/css/ called custom.css. Save the empty file as it is, then open head.html in _includes and add the following line under the CSS comment:

<link rel="stylesheet" href="https://kylestratis.github.io/public/css/custom.css">

Save that, then go back to your custom CSS file and add the lines below. Feel free to play with the values to get the effect that is most pleasing to your eye. This is your site, after all! Since you’ve got the CSS file linked, you can use jekyll serve in your terminal to test your changes to the CSS.

.sidebar {
       opacity: 0.8;
}
   
.sidebar:hover {
       opacity: 1.0;
}

This simply sets the default and hover opacities for the sidebar.

.sidebar-logo {
      padding: 1.5rem;
}
  
.sidebar-logo img {
      border-radius: 50%;
      -moz-border-radius: 50%;
      -webkit-border-radius: 50%;
      width: 160px;
      margin: 0 auto;
}

The logo will be your gravatar (details below). These rules set its position (via padding, measured in rem units; for more about rem units see this article), its shape (via border-radius, where 50% turns a square image into a circle), its size (via width), and again its position (via margin). Now that we’ve done some setup, we can move on to the gravatar logo.

Gravatar

First, we will set up a gravatar. Go to Gravatar’s site and set up an account, if you don’t have one already. Upload the image you wish to use, and follow all the instructions for setting up your gravatar. Eventually, you will be given an email hash, and you should add this to your _config.yml file, something like this (with other identifiers removed):

author:
    gravatar_md5: YOUR_HASH_HERE    # placeholder: the email hash Gravatar gives you

Then open up your sidebar HTML file and add these lines right inside the first div:

<div class="sidebar-logo" style="text-align:center">
  <img src="https://www.gravatar.com/avatar/{{ site.author.gravatar_md5 }}?s=120" />
</div>

This will allow the CSS to be applied to your gravatar properly.

Contact List

Below the logo, we have some stylish icons that link to various methods of contacting me. These icons are from the excellent (and free) Font Awesome icon pack, and are a breeze to install. The easiest way is to use their Bootstrap CDN: all you have to do is add the following line to your _includes/head.html with your other CSS links:

<link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">

Now you have access to the icons, but how do you use them? I will show you some example code below, which works with your _config.yml file, but first take a look at the 519 icons included in the Font Awesome pack. Chances are you’ll find some that you like better than the ones I use, or you’ll find one for a social media account that I don’t have linked, but that you do.

On to the code: the first thing you’ll want to do is set up a div container for your contact list, like so:

<div id="contact-list" style="text-align:center">
<!-- contact icons go here --> 
</div>

You will want one of these div containers for each row of icons you want to use. Within these, you will add an if block that will display an icon and link if you have specified the site in your config.

In the example below, your config is checked to see if you have defined an attribute named github under author, and if it is present, a link to Github is constructed with your Github username (assuming that’s what the github attribute is set to). Within that a tag is a span tag whose class attribute tells it to stack the icons within. We are stacking icons to allow for a border: the first i tag defines the border, and the second i tag holds the enclosed icon. For more examples on what you can do with the Font Awesome CSS, see their examples page.

{% if site.author.github %}
  <a href="https://github.com/{{ site.author.github }}">
    <span class="fa-stack fa-lg">
      <i class="fa fa-square-o fa-stack-2x"></i>
      <i class="fa fa-github-alt fa-stack-1x"></i>
    </span>
  </a>
{% endif %}

Your actual link will be different based on the contact method: LinkedIn will have a different way to get to a given profile than Github, as an example, and if you want to include an email you will instead have a mailto: link. Repeat this pattern (or one that you make with Font Awesome’s many styling features) for each icon you want on each row, and don’t forget that each row will have its own contact-list div!
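For the if block above to produce a link, _config.yml needs a matching entry. A minimal sketch of that entry, assuming the author/github layout described above (the username is a placeholder):

```yaml
author:
  github: your-github-username   # placeholder; ends up in https://github.com/<username>
```

Other contact methods would follow the same shape, with one key per site under author and a matching if block in the sidebar.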

Description and Navigation

These items are mostly unchanged from the original Lanyon sidebar. However, I did remove the download link because it is mostly redundant with the link to the project, and it doesn’t work if you are primarily working on the master branch. If you do want the download link, change this:

<a class="sidebar-nav-item" href="{{ site.github.repo }}/archive/v1.0.1.zip">Download</a>

to this:

<a class="sidebar-nav-item" href="{{ site.github.repo }}/archive/master.zip">Download</a>

RSS

Lanyon comes with a generated RSS file, atom.xml. All you have left to do is link to it where necessary. As an example, I have added a link to the RSS feed on the second row of the contact list.

Closing Notes

I hope this gives a little insight into the use of Jekyll, Poole, and Lanyon in bootstrapping and customizing a personal site on Github Pages. For further reading, please refer to the links at the top of this post.

Setting Up an Elegant Blog with Github Pages, Jekyll, and Poole; Part I

Recently, as an extension of my note-taking obsession, I decided to start this blog, primarily to share what I’ve learned the hard way. I already had my domain name registered via Namecheap (non-ref), so it was time to find hosting. Luckily, Github offers free hosting with support for Jekyll. Jekyll is a Ruby-based static site generator that makes authoring easy by automatically building pages from Markdown files, but remains powerful enough to create all sorts of site customizations.

In this tutorial, I will walk you through the steps I took and the resources I used to build this blog, the first undertaking of its kind for me. There were many tools that made setup easy, but it was still enough work that it was a fun and enlightening process.

To begin with Github pages, you will have to create a repo on Github entitled yourname.github.io. This will be your URL if you don’t register a custom domain, but having your own domain is always nice. It takes just a few extra steps, but is worth it for the professional look it offers your site.

Setting Up Poole

The first step is to get Poole. This site uses the minimalist Lanyon theme, which keeps navigation in a collapsible sidebar. Poole on its own is described as a butler for Jekyll, which is really a great analogy: it provides a vanilla install of Jekyll, setting up many of the folders that Jekyll will use to generate your site. Follow the instructions at the Poole website to install the gemfile, then clone down the empty repository you made, go to the repo for the theme you want and download the provided archive. Extract that to your local repo and push it to your remote, and you’re all set with Poole!

Setting Up a Custom URL

Github offers two sets of instructions for setting up your domain depending on whether you are using an apex domain or subdomain. To set up the appropriate records with Namecheap specifically, navigate to the Manage Domains page, select the domain you wish to set the record for, and click the ‘Edit Selected’ button. Now, on the left navigation pane, click the ‘All Host Records’ link. On this page, you will need to set two A records for hostname @, one to 192.30.252.153 and the other to 192.30.252.154 (the addresses Github listed when this was written; they have changed since, so check Github’s current documentation). You will also need to set a CNAME record for hostname www to your Github Pages URL (in my case kylestratis.github.io).

When you have done this, all that is left is to edit the CNAME file (a blank one comes with Poole) to have your custom URL. Save the file and push it to your remote, and navigating to your blog via a custom URL should work shortly.

Setting Up Your Configuration

The main way to do basic configuration on your site is to edit your _config.yml file in the root of your repo. Here, you can set the markdown engine, code highlighting engine, and more. These will allow you to leverage the liquid templating system and customize the sidebar later. For an example _config.yml file, see my config.

Setting Up Your About Page

Poole comes with an example about.md page that gives you a general idea of how to leverage Markdown for the content of your page. What you will also notice is some information at the top of the page. This is the YAML frontmatter, and gives Jekyll some important information about your page (or post) that it will use when generating your site. Here is the default frontmatter in about.md:

---
layout: page
title: About
---

Layout specifies which layout to use (layouts are located in your _layouts folder); in this case we are using a page layout, as opposed to a post layout. We can also specify the title. I didn’t make any additional changes to the frontmatter here; the frontmatter is much more powerful (and useful) in the context of a blog post.

Open up your local about.md in your favorite text editor if you haven’t already, and write something up! Now, I’m sure you don’t want to push every minor change to Github just to preview your changes, so instead of that fire up your Terminal and run jekyll serve. This will allow you to access your generated website at localhost:4000 and will rebuild your site when any file is changed so that you can quickly view any edits you make before publishing. Play around, and when you’re satisfied with your page, push it to your remote repo.

Setting Up Your First Post

The primary difference between a post and a page is the location of a post (they will be put in your _posts folder) and the filename, which will be the date followed by the title, e.g. 2015-04-06-sublime-markdown.md (for my excellent previous post). You can also take advantage of more YAML variables here, such as tags and category. Go ahead and write an introductory post and give it some tags! To preview your post, fire up your terminal, navigate to your local repo for the site, and run jekyll serve. Like above, this command will build and serve your site, allowing you to view it at localhost:4000. When you want to publish (you can go ahead and publish what you’ve written now, or write another post later), use git to add your files to the staging area, commit them, and then push them to your remote repo.
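As a sketch of what the extra YAML variables might look like in practice, here is possible frontmatter for a first post saved as, say, _posts/2015-04-20-hello-world.md. The filename, title, tags, and category are all made-up examples:

```yaml
---
layout: post
title: Hello, World        # placeholder title
tags: [jekyll, blogging]   # example tags
category: meta             # example category
---
```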

Setting Up Drafts and .gitignore

Another handy feature is the ability to save drafts. Instead of saving a dated markdown file to the _posts folder, save an undated markdown file (e.g. ruby-blog-first-draft.md) to the _drafts folder. In order to preview, run jekyll serve --drafts. When you are ready to publish the post, rename it according to the convention above and move it to your _posts folder.

Before you push anything to your remote (and public) Github repo, it may be wise to create a .gitignore file. Like the filename says, this file tells git which files to ignore so that they don’t show up in your public repo. Simply add the name of a specific file, or a pattern that matches several files (using asterisks as wildcards), on its own line. For a template, feel free to refer to my own .gitignore file. The most important thing for me was to not share my _drafts folder, so that I could both keep my commits clean (and not have dozens of draft edits being pushed along with a major page overhaul) and keep post ideas private until they are finished, polished, and ready for the world to see. Read more about .gitignore here.
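A minimal sketch of such a file is shown below. It is an illustration, not a copy of my actual .gitignore; _site is the folder Jekyll writes the generated site to, and the last pattern is just an example of a wildcard.

```
# keep unfinished posts out of the public repo
_drafts/
# Jekyll's generated output
_site/
# editor swap files, as an example of a wildcard pattern
*.swp
```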

That is it for Part 1 of this guide. Please check back again soon for Part 2, where I will cover more customizations and other blogs that helped me set mine up.