PyHP: Python Home Page

Published on 8 min read

Introduction

So, the other day, I was at a local Python meetup, where one of the members shared an interesting project prompt:

Create a minimalist static site generator using only the Python standard library.

Of course, one option would be to write a tokenizer and parser from scratch, parsing your template language of choice quickly and efficiently. This has the merit of being the most technically challenging way to approach this that I can think of, and allowing for complete creative control. However, this approach also seems to carry a significant amount of work for a fun hobby project.

The person who introduced the prompt took it in an interesting direction, choosing XML as the templating language, and then using Python’s native XML parsing capabilities to generate XHTML documents, championing XML’s validity in 2023. I like this approach, as I believe it was a good learning experience regarding Python’s XML support, though he said that he did end up writing more error-checking code for invalid XML structures than he would have liked.

On the other hand, I felt inspired to take this approach in an more minimalist direction: Using the Python parser itself to handle template generation.

Ruby Inspiration

It was interesting to me to see what is required to build template generators in Python, as, in my experience, Ruby makes this type of project trivial. The site you’re reading right now is generated by a bodged-together static site generator I wrote using Ruby’s standard library.

Notably, Ruby’s standard library includes the templating system ERB, so including a template is as simple as including just a few lines.

ERB has two main features that let you use Ruby code to produce HTML:

  1. Execution tags: <% ... %>
    • Used for loops, comments, if statements, function definitions, and anything that shouldn’t end up in the result.
  2. Expression tags: <%= ... %>
    • Used to produce values for your HTML
require 'erb'
template = ERB.new("
<% 1.upto(3) do |i| %>  <%# execution tag  %>
   <h<%= i %>>          <%# expression tag %>
    <%= phrase %>       <%# expression tag %>
   </h<%= i %>>         <%# expression tag %>
<% end %>               <%# execution tag  %>
")
phrase = "Hello World" # the variable used in the template above
puts template.result(binding)

With whitespace removed, this produces:

<h1>Hello World</h1>
<h2>Hello World</h2>
<h3>Hello World</h3>

Instead of having a separate templating language, you just use normal Ruby code. I wanted to find an easy way to get the same vibe in Python.

PHP Inspiration

As a disclaimer, I really haven’t used PHP very much, so I didn’t take any influences from PHP beyond the name.

Originally, I wanted to call this project “PHP”, but I found it caused more confusion than laughs, so PyHP felt like a good compromise.

I’m nowhere near the first person to think of this pun, so credit to the following people for getting there first:

My Solution

While it’s totally possible to try to build out a full static site generator under this prompt, I was mostly inspired to find an elegant, minimalist solution to produce basic templates.

However, because I’m working in Python, I chose to loosely base the project on Jinja templates:

SyntaxDescription
{{ ... }}expression tag
{% ... %}execution tag

In general, I figured the flow should go as follows:

Parser flowchart

Given this framework, Python gives us two really useful tools for processing strings as Python code, and they happen to be exactly what we want!

FunctionDescription
evalevaluate a string and return a result
execexecute a string and return None

Basic usage is as follows:

# eval returns native Python types
x = eval("1 + 2")
assert type(x) is int and x == 3
# exec has side effects
exec("x += 1")
exec("print(x)")  # => prints 4
 
def f(x):
    return x * 2
# these functions can be called recursively
arr = eval("[eval('f(i)') for i in range(x)]")
print(arr) # => prints [0, 2, 4, 6]

exec and eval both take three parameters:

  1. The code we’re executing
  2. the global definitions the code has access to
  3. the local definitions the code has access to

In our case, we want to be able to call global functions with ease, so we pass in all globals() for parameter 2. We also want to be able to define new methods and variables on the fly, so we pass in a shared namespace for all templates as well.

Based on this reasoning, we get the following core solution:

EXECUTE_REGEX = re.compile(r'\{\%(.*?)\%\}', re.DOTALL)
EVALUATE_REGEX = re.compile(r'\{\{(.*?)\}\}', re.DOTALL)
template_namespace = {}
 
def parse(template: str) -> str:
    def make_replacer(handler: Callable) -> Callable[[Match], str]:
        def replace(match: Match) -> str:
            # extract the code inside the template
            code = str(match[1]).strip()
            # recursively parse, then execute that code
            return str(handler(parse(code), globals(), template_namespace) or "")
        return replace
 
    executor = make_replacer(exec)
    evaluator = make_replacer(eval)
 
    # execute everything between '{%' and '%}' and replace it with nothing
    template = EXECUTE_REGEX.sub(executor, template)
 
    # evaluate everything between '{{' and '}}' and recursively replace it with its result
    return EVALUATE_REGEX.sub(evaluator, template)

(Source on Github)

I felt like this was a pretty elegant solution to this prompt.

Let’s see what it can do!

Examples

Basic Example

<p>
    {% from datetime import datetime %}
    {{ datetime.now().strftime("%Y-%m-%d") }}
</p>

When evaluated with python3 src/main.py --file $(realpath example_site/example.php), this will print out:

<p>2023-06-17</p>

Including templates

To handle templates, we just have to define a function that looks for a filename, and returns the file as a string.

Something like

def include(path: str):
    assert os.path.isfile(path)
    with open(path) as f:
        return parse(f.read())

will allow us to easily include other files in templates.

{{ include('./head.php') }}

This example will include the entire ”./head.php” file in whatever template we’re using.

Given these basic building blocks, it should be possible to create many types of static sites.

Putting files together

.
├── footer.php
├── head.php
└── index.php

footer.php:

{% from datetime import datetime %}
 
<footer>{{ f'&copy; {datetime.now().strftime("%Y")} {author_name}'}}</footer>

head.php:

<head>
  <meta charset="UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Example Document</title>
</head>

index.php:

<!DOCTYPE html>
<html lang="en">
  {{ include('./head.php') }}
  <!-- These variables are included in ./footer.php -->
  {% author_name = "Zack Sargent" %}
 
  <body>
    <h1>Hello World!</h1>
    {% import time %}
    <p>Generated at {{ time.time() }}</p>
 
    {{ include('./footer.php') }}
  </body>
</html>

When index.php is processed, it will print:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Example Document</title>
  </head>
 
  <body>
    <h1>Hello World!</h1>
 
    <p>Generated at 1686989964.6053464</p>
 
    <footer>&copy; 2023 Zack Sargent</footer>
  </body>
</html>

Defining functions

I was initially wary about using a whitespace-sensitive language for use in html templating, as I wasn’t sure if HTML indentation would cause problems.

Luckily, Python does not require consistent depths between different levels of indentation.

The following is a weird, but still valid, function definition:

def calculate(x: int) -> str:
                        if x % 2 == 0:
                         return "even"
                        else:
                         return "odd"

This gives a certain level of freedom to how functions are defined in PyHP.

Additionally, there are some annoying scoping limitations when working with exec. For example, the following code works as expected, between exec and eval:

{% def f(x):
    return x + 1
%}
{{f(1) # returns 2}}

but the following code errors out:

{%
def f(x):
    return x + 1
def g(x):
    return f(x) + 1
%}
{{g(1)}}
NameError: name 'f' is not defined

The best way I’ve found to resolve this is to force the definition by assigning it to globals():

{%
def f(x):
    return x + 1
globals()['f'] = f

def g(x):
    return f(x) + 1
%}
{{g(1) # returns 3 }}

But, that’s a bit ugly. To resolve this scoping issue, there’s a simple decorator you can use instead!

{%
@define
def f(x):
    return x + 1

def g(x):
    return f(x) + 1
%}
{{g(1) # returns 3 }}

Email obfuscation example

Imagine a situation where someone wanted to put their email on their blog, but didn’t want their email to be accessible by people scraping the website, so they put a bunch of garbage in the text and then cleaned it up with CSS.

{%
import random
import string
define(random)  # could be nested in get_garbage as well
define(string)
 
@define
def obfuscate(letter: str, garbage: bool = False) -> str:
    if garbage:
        return f"<span class=\"garbage\">{letter}</span>"
    else:
        return f"<span>{letter}</span>"
 
@define
def get_garbage() -> str:
    return random.choice(string.ascii_letters)
 
def obfuscate_email(email: str) -> str:
    result = ""
    for letter in email:
        result += obfuscate(letter)
        result += obfuscate(get_garbage(), garbage=True)
    return result
%}
 
<style>
.garbage {
    display: none;
}
</style>
 
<div id="my-email-i-want-to-protect-from-scammers">
    {{ obfuscate_email("e@a.com") }}
</div>

produces:

<style>
  .garbage {
    display: none;
  }
</style>
 
<div id="my-email-i-want-to-protect-from-scammers">
  <span>e</span>
  <span class="garbage">w</span>
  <span>@</span>
  <span class="garbage">h</span>
  <span>a</span>
  <span class="garbage">i</span>
  <span>.</span>
  <span class="garbage">T</span>
  <span>c</span>
  <span class="garbage">g</span>
  <span>o</span>
  <span class="garbage">Y</span>
  <span>m</span>
  <span class="garbage">J</span>
</div>

Conclusion

Anyway, that’s all I have for now! I hope you found this at least mildly interesting. 😊

This is still a super janky project, but I thought it was interesting how it’s possible to get pretty useful results from such as simple approach.

Just don’t create a template that deletes everything!

<!-- This file deletes main.py -->
{% import os %}
{% os.remove(__file__) %}