Making events-api

Author: Teddy Xinyuan Chen

OpenAPI docs: https://events-api.teddysc.me/docs

Online data explorer & demo: https://events-data.teddysc.me/events/events

The quick and dirty, and the ugly. Just do what works, and do repeat yourself, because premature optimization is considered harmful.

Motivation

I often find myself opening 10 tabs from sm.teddysc.me (archive.is, blog post) to find events in Raleigh. So I thought, why not make something more fun?

I want to see all events in one place and be able to search and filter through the data, so I made this.

Tech Stack

  • Chrome DevTools for finding the actual source I can get the events from.
    Some websites use a PHP backend to generate part of the HTML, some have JSON APIs, and one site's API requires token authentication, with the token sent to the client side in the HTML! Amazing. Thank you for making events accessible. :)
    The NCSU events calendar uses the Localist framework, and its HTML is highly structured, which makes my work easier.
  • FastAPI - provides an OpenAPI 3.1-compliant API, which can be consumed by any app or even used as a GPT Action.
  • BeautifulSoup for scraping the events.
    I used GPT-4 to generate the code for parsing HTML from the different sites. It took some trial and error and prompting effort, but it was really easy and almost smooth.
  • HTTPS requests are issued with urllib, using Chrome's user agent to fool the websites. requests is heavy, and I don't need its extra features.
  • Maya for parsing dates (the sources never use the same format, and I don't want to write a custom parser for each event source).
  • Redis for caching (the time-to-live depends on how often each event source updates its events: some daily, some weekly, some only on Tuesdays (looking at you, ThisIsRaleigh)).
    There's another layer of caching for database connections and function calls, but these don't persist when the Python interpreter exits.
    I use RedisJSON to store JSON data in the database.
  • Postgres for storing events and supporting filtering and full-text search (via a generated tsvector column and an index).
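
The urllib-with-a-Chrome-user-agent approach from the list above fits in a few lines of standard library. This is a minimal sketch, not the project's actual code: `build_request`, `fetch_html`, and the exact `CHROME_UA` string are assumptions (any current desktop Chrome UA string should behave the same way).

```python
import urllib.request

# Assumed Chrome-like User-Agent string; the project's exact value is unknown.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that presents itself as desktop Chrome (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": CHROME_UA})


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Fetch a page and decode it with the charset the server declares, falling back to UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned string can go straight into BeautifulSoup, e.g. `BeautifulSoup(fetch_html(url), "html.parser")`.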

The Ugliness

DRY: Do Repeat Yourself

I thought about turning the Redis caching into a decorator, but it's kinda intertwined with the rest of the logic, and sometimes intermediate results are also cached, so I didn't bother doing that in this phase.

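from typing import Optional

from fastapi import FastAPI, Query

# get_redis_connection, maya_parse_date, the *_url_extract_events scrapers and the
# redis_*/URL constants are the project's own helpers, defined elsewhere.
app = FastAPI()


@app.get("/ncmns/events", summary="Get NC Museum of Natural Sciences Events")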
async def get_ncmns_events(tabular: Optional[bool] = Query(False)):
    """
    Fetch, parse and extract events from
    https://naturalsciences.org/calendar/events/
    """
    r = get_redis_connection()
    key = "ncmns"
    key_tabular = "ncmns:tabular"
    if not tabular and r.exists(key):
        return r.json().get(key, redis_path_root_path)[0]  # type: ignore
    if tabular and r.exists(key_tabular):
        return r.json().get(key_tabular, redis_path_root_path)[0]  # type: ignore
    data = ncnms_url_extract_events(ncmns_events_url)
    r.json().set(key, redis_path_root_path, data)
    r.expire(key, redis_expiry_seconds)
    if tabular:
        tabular_events = []
        for d in data:
            epoch = maya_parse_date(d["date"])
            tabular_events.append({"epoch": epoch, "source": "ncmns", "event": d})
        r.json().set(key_tabular, redis_path_root_path, tabular_events)
        r.expire(key_tabular, redis_expiry_seconds)
        return tabular_events
    return data


@app.get("/ncsu/events", summary="Get NC State University Events")
async def get_ncsu_events(tabular: Optional[bool] = Query(False)):
    """
    Fetch, parse and extract events from
    https://calendar.ncsu.edu/calendar
    """
    r = get_redis_connection()
    key = "ncsu"
    key_tabular = "ncsu:tabular"
    if not tabular and r.exists(key):
        return r.json().get(key, redis_path_root_path)[0]  # type: ignore
    if tabular and r.exists(key_tabular):
        return r.json().get(key_tabular, redis_path_root_path)[0]  # type: ignore
    data = ncsu_url_extract_events(ncsu_events_calendar_base_url)
    r.json().set(key, redis_path_root_path, data)
    r.expire(key, redis_expiry_seconds)
    if tabular:
        tabular_events = []
        for d in data:
            epoch = maya_parse_date(d["startDate"])
            tabular_events.append({"epoch": epoch, "source": "ncsu", "event": d})
        r.json().set(key_tabular, redis_path_root_path, tabular_events)
        r.expire(key_tabular, redis_expiry_seconds)
        return tabular_events
    return data


@app.get(
    "/ncsu/events/path/{event_path:path}",
    summary="Get NC State University Events at $path",
)
async def get_ncsu_events_with_path(
    event_path: str, tabular: Optional[bool] = Query(False)
):
    """
    Fetch, parse and extract events from
    https://calendar.ncsu.edu/calendar

    event_path: The path to append to the base url above
    format: {,{day,week}/YYYY/MM/DD}{,2,3,4,5}
    """
    redis_key = f"ncsu:{event_path}"
    redis_key_tabular = f"ncsu:{event_path}:tabular"
    r = get_redis_connection()
    if not tabular and r.exists(redis_key):
        return r.json().get(redis_key, redis_path_root_path)[0]  # type: ignore
    if tabular and r.exists(redis_key_tabular):
        return r.json().get(redis_key_tabular, redis_path_root_path)[0]  # type: ignore
    data = ncsu_url_extract_events(f"{ncsu_events_calendar_base_url}/{event_path}")
    r.json().set(redis_key, redis_path_root_path, data)
    r.expire(redis_key, redis_expiry_seconds)
    if tabular:
        tabular_events = []
        for d in data:
            epoch = maya_parse_date(d["startDate"])
            tabular_events.append({"epoch": epoch, "source": "ncsu", "event": d})
        r.json().set(redis_key_tabular, redis_path_root_path, tabular_events)
        r.expire(redis_key_tabular, redis_expiry_seconds)
        return tabular_events
    return data
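
For comparison, here is one possible shape for the decorator mentioned above, applied to the synchronous `*_url_extract_events` scrapers rather than the async endpoints. It is a sketch under stated assumptions: it stores `json.dumps` output under plain Redis string keys instead of using RedisJSON, it expects a redis-py style client with `get(name)` and `set(name, value, ex=seconds)`, and `redis_cached`, `to_tabular`, and `seconds_until_next_weekday` are hypothetical names. The last helper shows one way to express a "refresh on Tuesday" TTL.

```python
import functools
import json
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List, Union

# A fixed TTL in seconds, or a zero-argument callable that computes one at write time.
Ttl = Union[int, Callable[[], int]]


def redis_cached(client: Any, key_fn: Callable[..., str], ttl: Ttl) -> Callable:
    """Cache-aside decorator (hypothetical): serve JSON from the cache, else call fn and store it.

    `client` only needs redis-py style `get(name)` and `set(name, value, ex=seconds)`.
    Plain string keys holding json.dumps output stand in for RedisJSON here.
    """

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = key_fn(*args, **kwargs)
            hit = client.get(key)
            if hit is not None:
                return json.loads(hit)  # accepts str, or bytes from a default redis-py client
            result = fn(*args, **kwargs)
            seconds = ttl() if callable(ttl) else ttl
            client.set(key, json.dumps(result), ex=seconds)
            return result

        return wrapper

    return decorator


def to_tabular(
    data: List[Dict[str, Any]],
    source: str,
    date_field: str,
    parse_date: Callable[[str], Any],
) -> List[Dict[str, Any]]:
    """The loop each endpoint repeats: wrap every event with its epoch and source."""
    return [{"epoch": parse_date(d[date_field]), "source": source, "event": d} for d in data]


def seconds_until_next_weekday(now: datetime, weekday: int) -> int:
    """Seconds from `now` until the next midnight falling on `weekday` (Monday == 0)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    target = midnight + timedelta(days=(weekday - now.weekday()) % 7)
    if target <= now:  # today is `weekday` and its midnight has passed: use next week's
        target += timedelta(days=7)
    # At least 1 second, because Redis rejects a non-positive expire time.
    return max(1, int((target - now).total_seconds()))
```

Usage might look like `@redis_cached(get_redis_connection(), key_fn=lambda url: f"ncsu:{url}", ttl=redis_expiry_seconds)` on `ncsu_url_extract_events`, with the endpoint calling `to_tabular(data, "ncsu", "startDate", maya_parse_date)` when `tabular` is set. The trade-off: the tabular rows need their own decorated function or key if they should be cached too, which is the intertwining that made the decorator less attractive in the first place.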

JDWW: Just Do What Works

I didn't use FastAPI's dependency injection (with yield statements) for the database connections, and, uglier still, there's this:

@app.get(
    "/events_iso8601_like/{dt_iso8601_like}",
    summary="Get Events Where dt ISO8601 Like (try '04-01' etc)",
)
async def get_events_iso8601_like(dt_iso8601_like: str):
    try:
        conn = get_pg_connection()
        return get_events_where_dt_iso8601_like(conn, dt_iso8601_like)
    except:
        get_pg_connection.cache_clear()
        conn = get_pg_connection()
        return get_events_where_dt_iso8601_like(conn, dt_iso8601_like)


@app.get(
    "/events_ts/{event_ts}",
    summary="Run FTS on the event field (try 'talley', 'beer' etc)",
)
async def get_events_ts(event_ts: str):
    try:
        conn = get_pg_connection()
        return get_events_where_event_ts(conn, event_ts)
    except:
        get_pg_connection.cache_clear()
        conn = get_pg_connection()
        return get_events_where_event_ts(conn, event_ts)

I'm running on a scale-to-zero serverless platform, and I don't even bother closing database connections. When a connection times out while the runtime is still alive, a try with a bare except clears the cached connection, opens a new one, and runs the query again. It works, BTW. :)
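
The try/bare-except pattern repeated in each endpoint above could be factored into one helper. A minimal sketch, assuming `get_pg_connection` is memoized with `functools.lru_cache` or `functools.cache` (the `cache_clear()` call in the original implies that); `run_with_reconnect` is a hypothetical name. It catches `Exception` by default instead of using a bare `except:`, and `retry_on` lets a caller narrow it further.

```python
from typing import Any, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_reconnect(
    get_conn: Callable[[], Any],
    query: Callable[..., T],
    *args: Any,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run query(conn, *args); on a listed error, drop the cached connection and retry once.

    Assumes get_conn is memoized with functools.lru_cache (or functools.cache), so
    get_conn.cache_clear() makes the next call open a fresh connection.
    """
    try:
        return query(get_conn(), *args)
    except retry_on:
        # With the default (Exception,), KeyboardInterrupt and SystemExit are no longer
        # swallowed the way a bare `except:` swallows them.
        get_conn.cache_clear()  # type: ignore[attr-defined]
        return query(get_conn(), *args)
```

Each endpoint body would then shrink to `return run_with_reconnect(get_pg_connection, get_events_where_event_ts, event_ts)`. Passing the driver's connection error class as `retry_on` (for psycopg 2 or 3, `OperationalError`) would also stop it from retrying on ordinary query bugs.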

New Things Learned

  • Doing FTS on Postgres
  • Using RedisJSON
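
For the Postgres FTS piece, here is a minimal sketch of the generated-column-plus-index setup described in the tech stack list. Every name is an assumption: a table `events` with a JSONB `event` column, a generated column `event_ts` (guessed from the `/events_ts` endpoint name), and `search_events`. It also swaps in `websearch_to_tsquery` (PostgreSQL 11+), which tolerates free-form input; the project may use `plainto_tsquery` or `to_tsquery` instead. Generated columns require PostgreSQL 12 or later.

```python
from typing import Any, List, Tuple

# Hypothetical schema: an `events` table whose `event` column is JSONB.
# The STORED generated column keeps the tsvector in sync on every write,
# and the GIN index lets the @@ match avoid a full table scan.
FTS_DDL = """
ALTER TABLE events
    ADD COLUMN IF NOT EXISTS event_ts tsvector
    GENERATED ALWAYS AS (to_tsvector('english', event)) STORED;
CREATE INDEX IF NOT EXISTS events_event_ts_idx ON events USING GIN (event_ts);
"""

# websearch_to_tsquery accepts user input such as 'talley' or 'beer -ipa'.
FTS_QUERY = """
SELECT epoch, source, event
FROM events
WHERE event_ts @@ websearch_to_tsquery('english', %s)
ORDER BY ts_rank(event_ts, websearch_to_tsquery('english', %s)) DESC;
"""


def search_events(conn: Any, term: str) -> List[Tuple[Any, ...]]:
    """Run the full-text query through a psycopg-style connection (hypothetical helper)."""
    with conn.cursor() as cur:
        cur.execute(FTS_QUERY, (term, term))
        return cur.fetchall()
```

Keeping the tsvector in a generated column means the scraper code never has to compute it; Postgres recomputes it whenever `event` changes.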

Conclusion

For toy projects like this, just do what's fast and what works! It can work out well. :)