Making events-api
Author: Teddy Xinyuan Chen
OpenAPI docs: https://events-api.teddysc.me/docs
Online data explorer & demo: https://events-data.teddysc.me/events/events
The quick and dirty, and the ugliness: just do what works, and do repeat yourself, because premature optimization is considered harmful.
Motivation
I often find myself opening 10 tabs from sm.teddysc.me (archive.is, blog post) to find events in Raleigh. So I thought, why not make something more fun?
I want to see all the events in one place and be able to search and filter through the data, so I made this.
Tech Stack
- Chrome Dev Tools for finding the actual source I can get the events from. Some websites use a PHP backend to generate part of the HTML, some have JSON APIs, and the API of one of the sites requires token authentication, with the token sent to the client side in the HTML! Amazing. Thank you for making events accessible. :) The NCSU events calendar uses the Localist framework and its HTML is highly structured, which makes my work easier.
- FastAPI - provides an OpenAPI 3.1 compliant API that can be used by any app, or even as a GPT Action.
- BeautifulSoup for scraping the events. I used GPT-4 to generate the code for parsing the HTML from the different sites. It took me a few trials and some prompting effort, but it was really easy and almost smooth.
- `urllib` with Chrome's user agent to fool the websites; that's how HTTPS requests are issued, since `requests` is heavy and I don't need the extra features. (There's a quick sketch after this list.)
- Maya for parsing dates (no two sources use the same format, and I don't want to write a custom parser for each event source; see the example after this list).
- Redis for caching (the time-to-live depends on how often the event source updates its events: some update daily, some weekly, and some only on Tuesdays (looking at you, ThisIsRaleigh)). There's another layer of caching for database connections and function calls, but those don't persist when the Python interpreter exits. I use RedisJSON to store JSON data in the database.
- Postgres for storing events and supporting filtering and full-text search (via a generated column of `tsvector` type and an index).
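The `urllib` trick is just a `Request` with a spoofed header. A minimal sketch (the URL is one of the real calendar pages, but the user agent string is just an example):

```python
import urllib.request

url = "https://naturalsciences.org/calendar/events/"
req = urllib.request.Request(
    url,
    headers={
        # A Chrome-like user agent so the site serves the normal HTML.
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    },
)
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8")
```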
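And Maya happily normalizes whatever date format a source throws at it into an epoch, which is all the tabular endpoints below need (sample strings here are made up, not real event data):

```python
import maya

# Every source formats dates differently; Maya copes with most of them.
for s in ["2024-04-01T19:00:00-04:00", "April 1, 2024 7:00 PM"]:
    print(s, "->", maya.parse(s).epoch)
```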
The Ugliness
DRY: Do Repeat Yourself
I thought about making the Redis caching a decorator, but the caching and the fetching are kind of intertwined, and sometimes intermediate results are cached too, so I didn't bother doing that in this phase.
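For the record, the decorator would have looked something like this sketch (reusing the same `get_redis_connection` / `redis_path_root_path` / `redis_expiry_seconds` helpers the endpoints use; it's what I'd write, not what the API actually does, and it doesn't cover the intermediate-result caching):

```python
import functools

def redis_json_cache(key: str):
    """Cache a function's JSON-serializable result in RedisJSON under `key`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            r = get_redis_connection()
            if r.exists(key):
                # JSON.GET with a JSONPath returns a list of matches.
                return r.json().get(key, redis_path_root_path)[0]
            data = fn(*args, **kwargs)
            r.json().set(key, redis_path_root_path, data)
            r.expire(key, redis_expiry_seconds)
            return data
        return wrapper
    return decorator

@redis_json_cache("ncmns")
def fetch_ncmns_events():
    return ncnms_url_extract_events(ncmns_events_url)
```

Instead, the endpoints just repeat themselves: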
@app.get("/ncmns/events", summary="Get NC Museum of Natural Sciences Events")
async def get_ncmns_events(tabular: Optional[bool] = Query(False)):
"""
Fetch, parse and extract events from
https://naturalsciences.org/calendar/events/
"""
r = get_redis_connection()
key = "ncmns"
key_tabular = "ncmns:tabular"
if not tabular and r.exists(key):
return r.json().get(key, redis_path_root_path)[0] # type: ignore
if tabular and r.exists(key_tabular):
return r.json().get(key_tabular, redis_path_root_path)[0] # type: ignore
data = ncnms_url_extract_events(ncmns_events_url)
r.json().set(key, redis_path_root_path, data)
r.expire(key, redis_expiry_seconds)
if tabular:
tabular_events = []
for d in data:
epoch = maya_parse_date(d["date"])
tabular_events.append({"epoch": epoch, "source": "ncmns", "event": d})
r.json().set(key_tabular, redis_path_root_path, tabular_events)
r.expire(key_tabular, redis_expiry_seconds)
return tabular_events
return data
@app.get("/ncsu/events", summary="Get NC State University Events")
async def get_ncsu_events(tabular: Optional[bool] = Query(False)):
"""
Fetch, parse and extract events from
https://calendar.ncsu.edu/calendar
"""
r = get_redis_connection()
key = "ncsu"
key_tabular = "ncsu:tabular"
if not tabular and r.exists(key):
return r.json().get(key, redis_path_root_path)[0] # type: ignore
if tabular and r.exists(key_tabular):
return r.json().get(key_tabular, redis_path_root_path)[0] # type: ignore
data = ncsu_url_extract_events(ncsu_events_calendar_base_url)
r.json().set(key, redis_path_root_path, data)
r.expire(key, redis_expiry_seconds)
if tabular:
tabular_events = []
for d in data:
epoch = maya_parse_date(d["startDate"])
tabular_events.append({"epoch": epoch, "source": "ncsu", "event": d})
r.json().set(key_tabular, redis_path_root_path, tabular_events)
r.expire(key_tabular, redis_expiry_seconds)
return tabular_events
return data
@app.get(
"/ncsu/events/path/{event_path:path}",
summary="Get NC State University Events at $path",
)
async def get_ncsu_events_with_path(
event_path: str, tabular: Optional[bool] = Query(False)
):
"""
Fetch, parse and extract events from
https://calendar.ncsu.edu/calendar
event_path: The path to append to the base url above
format: {,{day,week}/YYYY/MM/DD}{,2,3,4,5}
"""
redis_key = f"ncsu:{event_path}"
redis_key_tabular = f"ncsu:{event_path}:tabular"
r = get_redis_connection()
if not tabular and r.exists(redis_key):
return r.json().get(redis_key, redis_path_root_path)[0] # type: ignore
if tabular and r.exists(redis_key_tabular):
return r.json().get(redis_key_tabular, redis_path_root_path)[0] # type: ignore
data = ncsu_url_extract_events(f"{ncsu_events_calendar_base_url}/{event_path}")
r.json().set(redis_key, redis_path_root_path, data)
r.expire(redis_key, redis_expiry_seconds)
if tabular:
tabular_events = []
for d in data:
epoch = maya_parse_date(d["startDate"])
tabular_events.append({"epoch": epoch, "source": "ncsu", "event": d})
r.json().set(redis_key_tabular, redis_path_root_path, tabular_events)
r.expire(redis_key_tabular, redis_expiry_seconds)
return tabular_events
return data
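(Per that format string, `event_path` may be empty, or something like `day/2024/04/01` or `week/2024/04/01`, optionally with a trailing `2` through `5` tacked on, e.g. `week/2024/04/01/2`.)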
JDWW: Just Do What Works
I didn't use FastAPI's dependency injection (with `yield` statements) for the database connections, and what's even uglier is this:
```python
@app.get(
    "/events_iso8601_like/{dt_iso8601_like}",
    summary="Get Events Where dt ISO8601 Like (try '04-01' etc)",
)
async def get_events_iso8601_like(dt_iso8601_like: str):
    try:
        conn = get_pg_connection()
        return get_events_where_dt_iso8601_like(conn, dt_iso8601_like)
    except:
        get_pg_connection.cache_clear()
        conn = get_pg_connection()
        return get_events_where_dt_iso8601_like(conn, dt_iso8601_like)


@app.get(
    "/events_ts/{event_ts}",
    summary="Run FTS on the event field (try 'talley', 'beer' etc)",
)
async def get_events_ts(event_ts: str):
    try:
        conn = get_pg_connection()
        return get_events_where_event_ts(conn, event_ts)
    except:
        get_pg_connection.cache_clear()
        conn = get_pg_connection()
        return get_events_where_event_ts(conn, event_ts)
```
I'm running on a scale-to-zero serverless platform and don't even bother closing database connections. When a connection times out while the runtime is still alive, I just use `try` with a bare `except` to grab a new connection and run the query again. It works, BTW. :)
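For contrast, the tidy version would be a `yield` dependency that hands out a connection per request and always closes it. Roughly like this sketch (`connect_pg` is a hypothetical non-cached helper, not something in my codebase):

```python
from fastapi import Depends, FastAPI

app = FastAPI()

def pg_conn():
    # connect_pg() is hypothetical: open a fresh connection per request.
    conn = connect_pg()
    try:
        yield conn
    finally:
        conn.close()  # always closed, even if the handler raises

@app.get("/events_ts/{event_ts}")
async def get_events_ts(event_ts: str, conn=Depends(pg_conn)):
    return get_events_where_event_ts(conn, event_ts)
```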
New Things Learned
- Doing FTS on Postgres (see the sketch below)
- Using RedisJSON
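On the FTS bit, the core is just a generated `tsvector` column, a GIN index, and the `@@` operator with a tsquery. A rough sketch with a hypothetical `events(event jsonb)` table, not my exact schema:

```python
import psycopg2  # any Postgres driver works the same way here

conn = psycopg2.connect("postgresql://localhost/events")
with conn, conn.cursor() as cur:
    # Generated column that indexes the JSON event payload for FTS.
    cur.execute("""
        ALTER TABLE events ADD COLUMN IF NOT EXISTS event_ts tsvector
        GENERATED ALWAYS AS (to_tsvector('english', event::text)) STORED
    """)
    cur.execute(
        "CREATE INDEX IF NOT EXISTS events_ts_idx ON events USING GIN (event_ts)"
    )
    # Query it: full-text search for events mentioning 'beer'.
    cur.execute(
        "SELECT event FROM events "
        "WHERE event_ts @@ websearch_to_tsquery('english', %s)",
        ("beer",),
    )
    rows = cur.fetchall()
```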
Conclusion
For toy projects like this, just do what's fast and what works! It can work out well. :)