When I was first tasked with building Zai's EventCatalog, I knew it wasn’t going to be simple.
What I didn’t realise was that it would become one of the most rewarding technical journeys I’ve been on, filled with challenges, caffeinated problem-solving, unexpected lessons, and a lot of pride at the finish line.
This is the story of how Zai's EventCatalog came to life: from a simple goal to a secure, automated, scalable platform.
Setting the Goal
The initial brief sounded straightforward:
"Create a simple internal catalog website using EventCatalog that documents key system events, make it secure (via our corporate identity provider), easy to deploy, and resilient."
At the time, I thought, "Sounds like a few S3 buckets and CloudFront. How hard can it be?"
(You already know where this is going.)
Before touching any code, I spent time thinking through the requirements:
- Private access only — this wasn't a public-facing site.
- User authentication and session management — not just a password page.
- Scalability and global reach — minimal latency for users anywhere.
- Ease of deployment — so future updates wouldn't be a manual nightmare.
The first decision? Start small, prove the concept, and grow it carefully.
Early Days: Building the MVP
I started with the basics:
- An S3 bucket to host static HTML.
  - Used aws s3 sync to upload local files.
- A CloudFront distribution in front to cache and serve the content globally.
  - Created a distribution with the S3 bucket as the origin.
  - Set the default root object to index.html.
  - Set a low minimum TTL so changes showed up quickly during testing.
- Public access was temporarily enabled just to get something working.
And sure enough, after a few hours, I saw my first "Hello World" on a real URL.
When I finally accessed the website over CloudFront, it was a great milestone: the site was live globally within minutes.
Small win, big motivation.
Hitting Real Problems
Of course, making something work and making it secure are two very different things.
Problems started stacking up:
- Public S3 access wasn't acceptable long-term.
- CloudFront access policies became confusing quickly.
- How would I lock it down to internal users only? S3 bucket policies? Signed URLs? IAM auth?
Now the real engineering work started.
S3 Bucket Policy Tightening
- Blocked all public access on the bucket.
- Configured a custom CloudFront origin access control (OAC) to allow CloudFront to fetch from the bucket securely.
- Attached a bucket policy like this:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<YOUR_BUCKET>/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceArn": "arn:aws:cloudfront::<ACCOUNT_ID>:distribution/<DISTRIBUTION_ID>"
        }
      }
    }
  ]
}
```
When you’re experimenting with toy data in a development account, you can (briefly) put your docs bucket in public-read. But be mindful of what you upload there and never carry that setting into staging or prod.
Authentication Plan
No matter which way I turned, the answer kept pointing toward one thing:
Lambda@Edge + one of our corporate identity providers.
Auth: My Identity Boss Fight
Implementing SAML authentication was a whole new world.
The first time I opened a SAMLResponse, I stared at it like it was hieroglyphics. Definitely read the SAML docs, because you'll be dealing with:
- XML signatures.
- SAML conditions such as NotBefore and NotOnOrAfter.
- Base64 encoding inside HTTP POSTs.
This was by far the most technical part of the project.
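The Base64-inside-a-POST part is mechanical once you see the layers. Below is a minimal sketch in Node.js, assuming the CloudFront behaviour for /callback has the "include body" option enabled (so Lambda@Edge receives request.body with an encoding of base64 or text) and that the identity provider uses the standard HTTP-POST binding with a form field named SAMLResponse. The function name extractSamlXml is hypothetical, and it only unwraps the XML; it verifies nothing.

```javascript
// Hypothetical helper: unwrap the SAMLResponse posted to /callback.
// Assumes "include body" is enabled on the CloudFront behaviour, so
// Lambda@Edge sees request.body = { encoding: 'base64' | 'text', data }.
function extractSamlXml(request) {
  if (!request.body || typeof request.body.data !== 'string') {
    throw new Error('Request has no body');
  }

  // Layer 1: Lambda@Edge exposes the request body base64-encoded.
  const formBody = request.body.encoding === 'base64'
    ? Buffer.from(request.body.data, 'base64').toString('utf8')
    : request.body.data;

  // Layer 2: the IdP's auto-submitting form is application/x-www-form-urlencoded.
  const samlResponse = new URLSearchParams(formBody).get('SAMLResponse');
  if (!samlResponse) {
    throw new Error('SAMLResponse field missing');
  }

  // Layer 3: the SAMLResponse field is itself base64-encoded XML.
  // The result is untrusted until its signature and Conditions are checked.
  return Buffer.from(samlResponse, 'base64').toString('utf8');
}

// Usage with a fake payload built the same way a browser would post it:
const xml = '<samlp:Response ID="example"/>';
const form = 'SAMLResponse=' + encodeURIComponent(Buffer.from(xml).toString('base64'));
const request = { body: { encoding: 'base64', data: Buffer.from(form).toString('base64') } };
console.log(extractSamlXml(request)); // <samlp:Response ID="example"/>
```

Keeping decoding separate from validation makes it obvious that nothing coming out of this function should be trusted yet.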
SAML Flow
- User tries to access the catalog.
- CloudFront triggers a viewer-request Lambda@Edge.
- Lambda checks whether the user has a valid AuthToken cookie (a JWT). If it does, the request continues to the S3 origin.
- If there is no valid cookie, Lambda redirects to the SSO URL.
- Identity provider authenticates and POSTs a SAMLResponse to /callback.
- Lambda validates the SAML signature and conditions.
- Lambda issues a new JWT to the browser as a session cookie.
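The flow above boils down to a three-way decision in the viewer-request handler. Here is a minimal sketch in Node.js: the cookie name AuthToken comes from the flow, while SSO_URL is a placeholder (not the real login URL), routeViewerRequest and getCookie are hypothetical names, and isValidSession stands in for the actual JWT verification. Returning a small decision object rather than the final CloudFront result keeps the logic easy to unit test.

```javascript
// Hypothetical names throughout: the cookie name comes from the post, but
// SSO_URL is a placeholder and isValidSession stands in for real JWT checks.
const AUTH_COOKIE = 'AuthToken';
const SSO_URL = 'https://idp.example.com/sso/saml';

// Lambda@Edge headers are keyed by lowercase name; each value is an array
// of { key, value } objects, and one Cookie header can hold many cookies.
function getCookie(headers, name) {
  for (const { value } of headers.cookie || []) {
    for (const pair of value.split(';')) {
      const [key, ...rest] = pair.trim().split('=');
      if (key === name) return rest.join('=');
    }
  }
  return undefined;
}

// Decides what the viewer-request handler should do with a request.
// A real handler would return `request` to forward it, or `response` to short-circuit.
function routeViewerRequest(request, isValidSession) {
  // The IdP POSTs the SAMLResponse here; SAML validation happens next.
  if (request.method === 'POST' && request.uri === '/callback') {
    return { action: 'validate-saml' };
  }

  // A valid session cookie lets the request through to the S3 origin.
  const token = getCookie(request.headers, AUTH_COOKIE);
  if (token && isValidSession(token)) {
    return { action: 'forward', request };
  }

  // No valid cookie: answer with a redirect to the IdP instead of fetching from S3.
  return {
    action: 'redirect',
    response: {
      status: '302',
      statusDescription: 'Found',
      headers: {
        location: [{ key: 'Location', value: SSO_URL }],
      },
    },
  };
}

const req = {
  method: 'GET',
  uri: '/index.html',
  headers: { cookie: [{ key: 'Cookie', value: 'theme=dark; AuthToken=abc.def.ghi' }] },
};
console.log(routeViewerRequest(req, (t) => t === 'abc.def.ghi').action); // forward
console.log(routeViewerRequest({ ...req, headers: {} }, () => true).action); // redirect
```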
Edge Lambda
Wrote the Lambda@Edge function in Node.js v20.
Used libraries:
- A hardened SAML validation library for verifying the signature
- xmldom for parsing XML
- jsonwebtoken for issuing JWTs
Validating SAML Response
- Extracted and parsed the <Signature> block.
- Verified the XML signature against our corporate certificate.
- Checked <Conditions> timestamps:
  - NotBefore (must not be in the future)
  - NotOnOrAfter (must not have been reached yet, i.e. not expired)
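The timestamp checks are easy to get subtly wrong, because NotBefore is inclusive and NotOnOrAfter is exclusive. A minimal sketch of just that check, assuming the attribute values have already been read from <Conditions> by the XML parser; the function name and the one-minute CLOCK_SKEW_MS allowance are hypothetical choices, not the values used in the real Lambda.

```javascript
// Assumes the XML parser (xmldom in the post) has already read the NotBefore
// and NotOnOrAfter attributes from <Conditions>; only the time check is shown.
// CLOCK_SKEW_MS is a hypothetical allowance for drift between IdP and Lambda clocks.
const CLOCK_SKEW_MS = 60 * 1000;

function conditionsAreActive(notBefore, notOnOrAfter, now = Date.now()) {
  const start = Date.parse(notBefore);
  const end = Date.parse(notOnOrAfter);

  // Missing or malformed timestamps parse to NaN: reject rather than assume valid.
  if (Number.isNaN(start) || Number.isNaN(end)) return false;

  // NotBefore is inclusive (valid from that instant onwards);
  // NotOnOrAfter is exclusive (expired at that instant).
  return now >= start - CLOCK_SKEW_MS && now < end + CLOCK_SKEW_MS;
}

const now = Date.parse('2024-05-01T10:00:00Z');
console.log(conditionsAreActive('2024-05-01T09:58:00Z', '2024-05-01T10:03:00Z', now)); // true
console.log(conditionsAreActive('2024-05-01T09:50:00Z', '2024-05-01T09:55:00Z', now)); // false
```

Passing now in as a parameter is what makes the "missing or expired Conditions" cases straightforward to unit test.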
I had to:
- Decode and parse the SAML payload.
- Verify the digital signature.
- Validate that the SAML Conditions were still active.
- Extract the user’s NameID.
- Then issue my own JWT to use as a session token.
Sent this token back as a Secure, HttpOnly cookie.
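```javascript
const jwt = require('jsonwebtoken');

const nowSeconds = Math.floor(Date.now() / 1000);
const token = jwt.sign(
  {
    user: userId,               // NameID extracted from the validated SAML assertion
    iat: nowSeconds,            // issued-at, in seconds since the epoch
    exp: nowSeconds + expTime   // token expiry duration, in seconds
  },
  fetchSecretKey(),             // signing secret
  { algorithm: 'HS256' }        // pin the algorithm explicitly
);
```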
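For readers curious about what jwt.sign produces, here is a dependency-free sketch of the same kind of HS256 token (jsonwebtoken's default algorithm) using only node:crypto, plus the matching verification a viewer-request check needs and the Secure, HttpOnly cookie header. The helper names, the hard-coded demo secret, and the Path, Max-Age and SameSite cookie attributes are assumptions; in production the library remains the safer choice.

```javascript
const crypto = require('node:crypto');

// Dependency-free sketch of an HS256 JWT, roughly what jwt.sign(payload, secret)
// produces with jsonwebtoken's default algorithm. Not a replacement for the library.
const b64url = (text) => Buffer.from(text).toString('base64url');

function signHs256(payload, secret) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const signature = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Returns the payload for a genuine, unexpired token, otherwise null.
// The header's alg is never trusted: the HMAC is always recomputed as HS256.
function verifyHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;

  const expected = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(signature, 'base64url');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }

  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  // Treat a missing exp as invalid; the token is expired from exp onwards.
  if (typeof payload.exp !== 'number' || nowSeconds >= payload.exp) return null;
  return payload;
}

// Lambda@Edge response header entry for the session cookie.
// Attributes other than Secure and HttpOnly are assumptions.
function sessionCookieHeader(token, maxAgeSeconds) {
  return {
    key: 'Set-Cookie',
    value: `AuthToken=${token}; Path=/; Max-Age=${maxAgeSeconds}; Secure; HttpOnly; SameSite=Lax`,
  };
}

const secret = 'dev-only-secret'; // hypothetical; the post loads its key via fetchSecretKey()
const issuedAt = Math.floor(Date.now() / 1000);
const token = signHs256({ user: 'jane@example.com', iat: issuedAt, exp: issuedAt + 3600 }, secret);
console.log(verifyHs256(token, secret).user); // jane@example.com
console.log(verifyHs256(token, 'wrong-secret')); // null
```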
Handling Unauthenticated Requests
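```javascript
// Returned from the viewer-request handler instead of the original request,
// so CloudFront answers the browser directly with a redirect to the IdP.
return {
  status: '302',
  statusDescription: 'Found',
  headers: {
    location: [{
      key: 'Location',
      value: 'https://login.hellozai.com/.../sso/saml?'
    }]
  }
};
```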
Result:
Anyone without a valid session would automatically be redirected to login.
All of this runs inside an Edge Lambda, at CloudFront's viewer-request stage.
When I finally got the first successful login redirect, I celebrated (I'm not exaggerating) like I had just won a championship.
Automation and Pipelines
Once the basics were working, it was time to automate everything.
I moved from:
Manual AWS Console clicks ➜ CloudFormation templates.
Manual deployments ➜ Buildkite pipelines.
All builds and deployments for EventCatalog happen in a dedicated non-production AWS account (this is purely an internal tool), which lets us avoid touching prod.
CloudFormation Templates
- 01-bucket.yml: Edge Lambda upload bucket
- 02-edgeLambda.yml: Edge Lambda build and version export
- 03-site.yml: Bucket, CloudFront, Lambda@Edge association
Wrote a deploy pipeline that:
- Dockerised the Edge Lambda build (Dockerfile.edge-lambda)
- Build steps:
  - Build and package the Lambda code
  - Deploy the upload bucket (01-bucket.yml) if it's missing
  - Upload the ZIP
  - Deploy the Edge Lambda (02-edgeLambda.yml)
  - Deploy the site stack (03-site.yml)
  - Deploy the website
Dockerfile for Testing
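```dockerfile
FROM node:20-alpine
WORKDIR /app
# Install dependencies first so this layer stays cached when only source files change
COPY edgeLambda/package*.json ./
RUN npm install
COPY edgeLambda/ .
CMD ["npm", "test"]
```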
- Ran Mocha tests inside a clean container.
- No local pollution.
- Tests validated authentication flows, rewrites, and token handling.
More Challenges and Failures
It wasn’t all smooth sailing:
- I accidentally triggered stack rollbacks by trying to delete exported values still in use.
  - Had to rename stack resources to work around it.
- Misconfigured S3 permissions led to “Access Denied” screens (hours of head-scratching).
- Updating Lambda code required publishing a new version every time.
  Important takeaway: CloudFront only accepts published, numbered Lambda@Edge versions, not aliases or $LATEST.
- Node version mismatches broke Lambda deployments.
- Buildkite agents refused to npm install without proper Docker isolation.
(Pro tip: never run npm install directly on your CI agent; use containers for clean, consistent builds.)
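The published-versions lesson is easy to enforce in a pipeline before CloudFront ever sees a bad ARN. A minimal sketch, assuming a step that receives the function ARN exported by the Edge Lambda stack before the site stack deploys; the guard, its regex, and the function name catalog-auth are hypothetical, and it only handles the standard aws partition. It also checks the region, since Lambda@Edge functions must be created in us-east-1.

```javascript
// Hypothetical pre-deploy guard: before associating a function with CloudFront,
// confirm the ARN points at a numbered, published version in us-east-1.
// Aliases ("...:live") and $LATEST are rejected, as CloudFront would reject them.
const EDGE_VERSION_ARN =
  /^arn:aws:lambda:us-east-1:\d{12}:function:[A-Za-z0-9_-]{1,64}:(\d+)$/;

function assertPublishedEdgeVersion(arn) {
  const match = EDGE_VERSION_ARN.exec(arn);
  if (!match) {
    throw new Error(`Expected a published us-east-1 Lambda version ARN, got: ${arn}`);
  }
  return Number(match[1]); // the published version number
}

console.log(assertPublishedEdgeVersion('arn:aws:lambda:us-east-1:123456789012:function:catalog-auth:7')); // 7

try {
  assertPublishedEdgeVersion('arn:aws:lambda:us-east-1:123456789012:function:catalog-auth:live');
} catch (err) {
  console.log(err.message); // logs the error: "live" is an alias, not a published version
}
```

Failing fast here is cheaper than waiting for a CloudFormation update to roll back.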
Each obstacle taught me something, sometimes painfully, but I wouldn’t trade it.
Testing and Polish
We didn’t stop at “it works.”
I added:
- Unit tests for the Edge Lambda:
  - Verifying SAML responses
  - Mocking JWT creation and validation
  - Handling missing or expired Conditions
- Dockerised the test runner to isolate dependencies
- Made sure CI/CD pipelines tested before deploys
- Put monitoring on access attempts
This made deployments safer and will give future team members the confidence to trust every push.
Reflections: What I Learned
- Automation saves future-you.
  It’s worth spending extra time upfront automating builds, deployments, and tests.
- Identity is complex.
  SAML, JWT, X.509 certificates: these are powerful tools, but they demand deep attention to detail.
- Celebrate small wins.
  Sometimes just getting a 302 redirect instead of a 403 error feels like magic.
- Clean environments matter.
  Building inside containers avoids the endless "works on my machine" cycle.
Advice for Future Builders
If you're trying to do something similar:
- Start simple.
  Get a basic "Hello World" live first; don't worry about security yet.
- Invest early in automation.
  CloudFormation, Buildkite, and Docker saved weeks of work later.
- Be meticulous with identity.
  SAML validation is tricky. Never blindly trust a SAMLResponse without:
  - Signature verification
  - Condition timestamp validation
- Test your Lambda@Edge in isolation.
  Debugging CloudFront behaviour is slow, so test locally as much as possible.
Conclusion: Beyond EventCatalog
Today, EventCatalog stands as a secure, scalable, easy-to-deploy internal platform.
But more importantly, it represents the journey I went through: from basic cloud hosting, through authentication challenges, automation pipelines, and deep technical growth.
It wasn't just about building a system.
It was about building an engineer.
About the Author
Fathy Abdelshahid
A software engineer who wrangles APIs by day, lifts weights by night, and blends technical know-how with humour, caffeine, and the occasional bad joke.