Adnan Shahriar

Product Management · Data Analysis

Selected Projects
01 · Case Study

Should London Pursue a Future Olympic Bid?

This study quantifies London's economic trajectory after the 2012 Games and weighs whether another Olympic bid would be worth it.

Interrupted Time-Series · Diff-in-Diff · Poisson Regression · Python
View Report ↓

The Challenge

Simple before-and-after comparisons of Olympic host cities are meaningless — London's economy was changing regardless. The question is whether the Games produced a measurable effect above and beyond what would have happened anyway.

This study examines three regeneration indicators across 2007–2017: international tourism, residential property prices, and business dynamism in the six host boroughs (Hackney, Newham, Tower Hamlets, Greenwich, Barking & Dagenham, Waltham Forest).

Each indicator required a different analytical design, chosen to match the structure of the data and the nature of the counterfactual question.

Three Methods, Three Questions

Tourism — Interrupted Time-Series: With London as a single unit and no comparable control city, ITS was the appropriate design. The model separates a level shift (an abrupt jump) from a slope change (a gradual acceleration), with 2012 excluded from estimation to avoid contaminating the pre/post split.
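The ITS specification can be sketched on a synthetic series (the growth rates below are planted by construction to mirror the study's shape of effect, not taken from its data): a linear trend, a level-shift dummy, and a post-2012 slope-change term, with 2012 dropped before fitting.

```python
import numpy as np

# Synthetic annual series, 2007-2017: log-visits grow slowly pre-2012
# and faster afterwards. Values are illustrative, not the study's data.
years = np.arange(2007, 2018)
t = years - 2007
post = (years > 2012).astype(float)
log_y = 0.5 + 0.014 * t + 0.043 * post * (years - 2012)

# Exclude the intervention year (2012) from estimation, as in the study,
# to avoid contaminating the pre/post split.
keep = years != 2012
X = np.column_stack([np.ones_like(t), t, post, post * (years - 2012)])[keep]
beta, *_ = np.linalg.lstsq(X, log_y[keep], rcond=None)

# beta[2] is the level shift (abrupt jump); beta[3] is the slope change
# (gradual acceleration) - here 0.043 by construction.
```

On real data the two coefficients can move independently: a Games that produces a one-off spike shows up in beta[2], a durable steepening in beta[3].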

Property Prices — Matched Difference-in-Differences: A naïve DiD risks bias if host boroughs were already on different trajectories. Nearest-neighbour matching (k=3) on pre-period price levels and trends produced a balanced comparison group of 13 control boroughs. A two-way fixed-effects model with clustered standard errors captures the host×post effect.
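The two-way fixed-effects estimator can be sketched in numpy on a toy borough-year panel (6 treated and 13 control units, all values synthetic; clustered standard errors are omitted here). Double demeaning removes unit and year fixed effects, leaving the host×post coefficient.

```python
import numpy as np

# Toy panel: 19 units (6 "host", 13 "control") observed over 10 years,
# with a planted host-x-post effect of +4.5% on log prices. Synthetic
# illustration of the estimator, not the study's data.
rng = np.random.default_rng(0)
n_host, n_ctrl, T = 6, 13, 10
units = n_host + n_ctrl
host = np.repeat(np.arange(units) < n_host, T).astype(float)
year = np.tile(np.arange(T), units)
post = (year >= 5).astype(float)
unit_fe = np.repeat(rng.normal(0.0, 1.0, units), T)   # borough effects
year_fe = np.tile(rng.normal(0.0, 0.2, T), units)     # common shocks
effect = 0.045
log_price = unit_fe + year_fe + effect * host * post

Y = log_price.reshape(units, T)
D = (host * post).reshape(units, T)

def within(M):
    # two-way demeaning: subtract unit means and year means, add grand mean
    return M - M.mean(1, keepdims=True) - M.mean(0, keepdims=True) + M.mean()

y, d = within(Y).ravel(), within(D).ravel()
beta = (d @ y) / (d @ d)   # host-x-post coefficient; recovers 0.045 here
```

With no idiosyncratic noise the planted effect is recovered exactly; on real data the same regression is run with standard errors clustered by borough.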

Business Dynamism — Fixed-Effects Poisson: Firm births and deaths are count outcomes; Poisson regression with a log-offset (active enterprises) converts coefficients into incidence rate ratios, normalising for borough size. Negative binomial re-estimation confirmed robustness to overdispersion.
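The offset logic can be sketched on synthetic counts (the 0.08 base rate and 1.062 IRR below are planted by construction, not estimates): Newton iterations maximise the Poisson log-likelihood with log(active enterprises) as offset, and exponentiating the coefficient gives the incidence rate ratio.

```python
import numpy as np

# Toy Poisson rate model with a log offset. Birth counts scale with the
# active-enterprise stock; exp(beta) on the host indicator is the IRR.
# All numbers are synthetic and planted, not the study's data.
active = np.array([500.0, 800.0, 1200.0, 600.0, 900.0, 1100.0])
host = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
true_irr = 1.062
births = active * 0.08 * true_irr ** host   # expected counts, no noise

X = np.column_stack([np.ones_like(host), host])
offset = np.log(active)
beta = np.zeros(2)
for _ in range(50):                          # Newton-Raphson / IRLS steps
    mu = np.exp(offset + X @ beta)
    beta += np.linalg.solve((X.T * mu) @ X, X.T @ (births - mu))

irr = np.exp(beta[1])   # recovers the planted 1.062 rate ratio
```

Dividing by borough size via the offset is what makes the coefficient a rate comparison rather than a raw-count comparison.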

What the Data Shows

Tourism: Annual growth accelerated from ~1.4% pre-2012 to ~5.7% post-2012 — a statistically significant +4.3 percentage point shift (β₃ ≈ 0.042, p = 0.003). This is a gradual steepening, not a one-off spike.
Property Prices: Host boroughs gained approximately +4.5% in median prices relative to matched controls, but the estimate is statistically imprecise (p = 0.185). Event-study coefficients show divergence emerging from 2014 onward — a slow-burn legacy, not an immediate repricing.
Business Dynamism: Firm births rose ~6.2% (IRR = 1.062, p = 0.072); overall churn was significantly elevated (IRR ≈ 1.051, p = 0.045). However, five-year survival showed no improvement — more entry, but not better durability.

The Honest Answer

The evidence supports a qualified yes — but with domain-specific caveats. Tourism shows the clearest and most durable effect. Property price uplift exists but cannot be precisely separated from London's broader post-2012 housing cycle. Business dynamism increased at the entry margin without translating into stronger firm longevity.

The policy implication is that hosting alone is insufficient. Converting Olympic investment into sustained local prosperity requires complementary support — destination marketing, infrastructure, and business development programmes that outlast the event itself.

Key limitations: all data is aggregated annually; spillovers from host boroughs into controls blur the host–control contrast; the post-period ends at 2017, potentially too early to capture the full legacy arc.

02 · Case Study

Lung Cancer Risk Prediction: Linear vs. Ensemble Machine Learning Models

Logistic regression is compared against random forest on a small medical dataset to study whether model complexity improves lung cancer risk prediction.

Logistic Regression · Random Forest · MATLAB
View Poster ↓

The Challenge

Lung cancer remains one of the leading causes of cancer-related mortality. Early identification of at-risk individuals from survey data could support earlier intervention — but the dataset (309 observations, 15 predictors) is small and severely imbalanced: 87.4% of patients have cancer.

This imbalance immediately makes accuracy a misleading metric. A classifier that labels everyone as "cancer" achieves 87% accuracy with 0% specificity. The real question is whether either model can meaningfully discriminate beyond this baseline — and whether the added complexity of Random Forest is worth it.
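The baseline is easy to verify from the study's class balance (270 of 309 positive, ≈87.4%): predicting "cancer" for everyone scores high on accuracy and even on F1, while identifying zero healthy patients.

```python
import numpy as np

# Degenerate all-positive classifier on the study's class balance:
# 270 of 309 patients positive (~87.4%). Labels are reconstructed from
# the stated proportions, not the actual dataset.
y_true = np.array([1] * 270 + [0] * 39)
y_pred = np.ones_like(y_true)

accuracy = (y_pred == y_true).mean()                      # ~0.874
tn = ((y_pred == 0) & (y_true == 0)).sum()
specificity = tn / (y_true == 0).sum()                    # 0.0
tp = ((y_pred == 1) & (y_true == 1)).sum()
fp = ((y_pred == 1) & (y_true == 0)).sum()
f1 = 2 * tp / (2 * tp + fp + 0)                           # ~0.933
```

Note that even F1 is flattered by the imbalance (~0.933 for the degenerate classifier), which is why specificity and AUC are reported alongside it.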

Deliberate Methodological Choices

F1-score as the tuning metric, not accuracy. In imbalanced medical data, missed cancers and false alarms both carry clinical cost. F1 balances precision and recall appropriately.

Stratified 70/30 split with a fixed random seed preserves class proportions across train and test sets and ensures reproducibility.

Z-score normalisation fitted on training data only, then applied to the test set — preventing data leakage.
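A minimal sketch of the leakage-safe transform (shapes follow the 70/30 split of 309 rows; the feature values are synthetic):

```python
import numpy as np

# Leakage-safe z-score normalisation: statistics are fitted on the
# training rows only, then reused to transform the test rows.
# Synthetic stand-in data; 216/93 mirrors a 70/30 split of 309 rows.
rng = np.random.default_rng(42)
X_train = rng.normal(5.0, 2.0, (216, 15))
X_test = rng.normal(5.0, 2.0, (93, 15))

mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_z = (X_train - mu) / sigma
X_test_z = (X_test - mu) / sigma   # train statistics, never the test set's
```

Fitting mu and sigma on the full dataset before splitting would let test-set information leak into training, optimistically biasing every reported metric.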

Ridge (L2) regularisation for LR, chosen over Lasso because coefficient shrinkage rather than elimination was preferred given only 15 predictors. Grid search across 15 values of λ on a log scale, evaluated via 5-fold CV.

RF grid search across 12 combinations: trees {100, 200}, minimum leaf size {1, 5}, predictors per split {3, 8, 12}.

Results

LR F1-Score: 0.957
LR AUC-ROC: 0.93
LR Specificity: 0.727
Logistic Regression outperformed Random Forest across nearly every metric: Accuracy 0.924 vs 0.913, F1 0.957 vs 0.951, AUC 0.93 vs 0.90. Both models missed 4 cancer cases; LR produced fewer false positives (3 vs 4).

This is explainable: the correlation matrix showed only weak-to-moderate predictor correlations (|r| < 0.4), suggesting an approximately linear decision boundary where LR is well-suited. With n=309, Random Forest's flexibility introduces variance without proportional bias reduction.

What This Actually Means

The result illustrates the bias-variance tradeoff concretely: ensemble methods require sufficient data to realise their advantage. On small datasets with approximately linear structure, a regularised linear model can match or exceed a complex ensemble.

Honest limitations: with only 11 healthy test samples, specificity estimates are unstable. Statistical significance was not formally tested. Repeated train/test splits reporting mean ± SD would provide more robust estimates — a clear path for future work.

The operating threshold is fixed at 0.5 throughout. In a real clinical deployment, a lower threshold prioritising sensitivity (fewer missed cancers) might be preferable — the ROC curves support threshold-independent evaluation regardless.

03 · Case Study

Handwriting Recognition System

This project developed a handwritten digit recognition system trained on a custom dataset, using multinomial logistic regression as the classifier.

Machine Learning · Neural Networks · Computer Vision
View Report ↓

The Problem

Handwriting recognition is a classic benchmark problem in machine learning — deceptively simple on clean data, but meaningfully hard when digits are skewed, rotated, or noisy. The objective was to build a classification system capable of reliably identifying handwritten digits and to understand which design choices most affect performance.

Method

Placeholder: model architecture details, preprocessing pipeline, training strategy, and evaluation setup to be added here.

Findings

Placeholder: key accuracy metrics, confusion matrix observations, and notable failure cases to be added here.

Reflection

Placeholder: limitations, what would be done differently, and paths for future improvement to be added here.

04 · Case Study

TfL Cycling Data Analysis for Informed Congestion Charge Policy

Three TfL monitoring stations are analysed to determine whether consistent peak windows exist to anchor a congestion charging recommendation.

Python · Decision Tree · EDA · Group Project
View Report ↓

The Brief

TfL collects continuous cycle count data across London monitoring stations. The objective was to identify peak usage periods that could inform a congestion charging window — specifically, when cycling demand is high enough to justify a charge as a traffic management tool.

Three stations were assigned: ML0025 (Northbound, primary), ML0029, and ML0037. Analysis needed to confirm whether peak patterns were consistent across locations before any recommendation could be made.

The Analysis

Weather and seasonality: Scatter plots of private cycle counts over time, coloured by weather condition, revealed strong seasonal and weather-dependent patterns. Usage drops markedly in poor conditions and rises in dry and sunny periods.

Time-of-day and mode: Private cycles dominate hired cycles across all hours. Clear peaks emerge during morning and evening rush hours — visible even in raw scatter plots.

Directionality: Northbound cycles peak in the morning; southbound cycles peak in the evening. This matches a predictable commuting pattern — and confirms the peaks are genuine commuter demand rather than leisure noise.

Decision tree (depth 3): A classification tree modelling cycle count by time of day confirmed the two high-usage windows and provided precise boundaries for the recommendation.
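How a shallow tree pins down window boundaries can be sketched with a toy regression tree on synthetic hourly counts (the step levels and boundaries below are illustrative, not TfL's figures, and this is a from-scratch sketch rather than the project's actual model):

```python
import numpy as np

# Synthetic count series: flat off-peak demand with two raised plateaus
# roughly where the report's windows sit. Illustrative values only.
hours = np.arange(0, 24, 0.25)
counts = np.where((hours >= 7.5) & (hours < 9.5), 300.0,
         np.where((hours >= 16.5) & (hours < 18.5), 280.0, 40.0))

def best_split(x, y):
    # threshold minimising total within-node squared error
    best_thr, best_sse = None, np.inf
    for thr in np.unique(x)[1:]:
        left, right = y[x < thr], y[x >= thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_thr, best_sse = thr, sse
    return best_thr

def grow(x, y, depth):
    if depth == 0 or np.unique(y).size == 1:
        return float(y.mean())                 # leaf: predicted count
    thr = best_split(x, y)
    m = x < thr
    return (thr, grow(x[m], y[m], depth - 1), grow(x[~m], y[~m], depth - 1))

tree = grow(hours, counts, 3)

def thresholds(node, acc=None):
    acc = set() if acc is None else acc
    if isinstance(node, tuple):
        acc.add(node[0])
        thresholds(node[1], acc)
        thresholds(node[2], acc)
    return acc

# On this toy series the depth-3 tree splits at 7.5, 9.5 and 16.5 -
# i.e. it recovers the morning window and the evening onset directly
# from the counts.
print(sorted(thresholds(tree)))
```

The split thresholds are the tree's version of "precise boundaries": each internal node is a time-of-day cut chosen purely to separate high-count from low-count periods.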

Key Findings

Peak window 1: 7:30–9:30. Consistent across all three stations (ML0025, ML0029, ML0037), with an identical peak time of 8:30.
Peak window 2: 16:30–18:30. All three stations show peaks at 17:45 and 18:00 — confirming the evening commute pattern is station-agnostic.
The directional split (northbound AM, southbound PM) provides independent corroboration that these peaks reflect genuine commuter flows.

The Recommendation

TfL should implement congestion charging during 7:30–9:30 and 16:30–18:30. These windows are supported by three independent lines of evidence: time-of-day scatter analysis, directional patterns, and the decision tree model — across all three assigned stations.

The consistency across stations is the strongest part of the case. A recommendation grounded in a single monitoring point would be weaker; the cross-station agreement makes it actionable.

Live Product
05 · Visit site →

Co-founder · Choicemate
choicemate.gg

An image polling platform that helps users decide between two options. Co-built and product-managed from concept to live product.

Next.js · React · Framer Motion · Three.js
View CV ↓