The Lab · Machine Learning

An AI that manages my fantasy football team.

A machine-learning system that drafts and runs a Fantasy Premier League squad: feature engineering on seasons of player data, a points-prediction model, and an optimiser that picks the weekly transfers I'd agonise over.

Role: Everything (data, model, tooling)
Type: Self-initiated ML project
Stack: Python, scikit-learn, FPL API
Status: Live each FPL season

01 The challenge

Fantasy Premier League (FPL) is played by over 11 million people worldwide, each trying to predict which footballers will perform best each week. As both a designer and developer, I wanted to explore whether machine learning could provide an edge in this prediction challenge. What started as a curiosity quickly evolved into a comprehensive ML system that processes hundreds of thousands of data points to generate actionable predictions.

The core challenge was fascinating: could I build a system that not only predicts player performance accurately but does so in a way that's production-ready, scalable, and maintainable? This case study documents that journey.

02 Understanding the problem space

Before diving into code, I spent time understanding the complexities of football prediction. Player performance isn't just about individual skill - it's influenced by team dynamics, opponent strength, recent form, fixture difficulty, and countless other factors. The challenge wasn't just building a model; it was creating an entire ecosystem that could ingest data from multiple sources, reconcile differences between them, engineer meaningful features, and produce reliable predictions.

The system needed to handle several key requirements:

  • Integrate data from multiple APIs with different schemas and update frequencies
  • Match players and teams across data sources despite naming inconsistencies
  • Engineer features that capture both short-term form and long-term performance
  • Prevent data leakage in time-series predictions
  • Scale to handle millions of historical data points
  • Provide real-time predictions during live gameweeks

03 System architecture: building for scale

Rather than building a monolithic application, I designed the system as a collection of specialised components, each responsible for a specific domain. This modular approach meant I could iterate on individual components without affecting the entire system, and it made the codebase much easier to reason about.

The architecture consists of three primary layers. At the foundation, the data collection layer handles the complexity of gathering information from multiple sources. The FPL API provides official game data, whilst Sportmonks offers detailed statistics like expected goals (xG) and pass completion rates. Each collector runs asynchronously, implementing sophisticated rate limiting to respect API limits whilst maximising throughput.
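
To make the collector idea concrete, here is a minimal sketch of an asynchronous rate limiter, assuming a simple policy of "at most N requests in flight, with a minimum gap between request starts". The `RateLimiter` class, `fetch_all` helper, and their parameters are hypothetical names for illustration, not the project's actual collector code.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable, List


class RateLimiter:
    """Caps in-flight calls and spaces out call starts (illustrative sketch)."""

    def __init__(self, max_concurrent: int, min_interval: float) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0

    async def run(self, job: Callable[[], Awaitable[dict]]) -> dict:
        async with self._semaphore:  # at most max_concurrent jobs in flight
            async with self._lock:  # serialise the pacing decision
                now = time.monotonic()
                delay = self._next_start - now
                if delay > 0:
                    await asyncio.sleep(delay)
                # Earliest moment the next job is allowed to start.
                self._next_start = max(now, self._next_start) + self._min_interval
            return await job()


async def fetch_all(
    ids: Iterable[int],
    fetch_one: Callable[[int], Awaitable[dict]],
    limiter: RateLimiter,
) -> List[dict]:
    """Fetch every id concurrently, but only as fast as the limiter allows."""
    jobs = (
        limiter.run(lambda player_id=player_id: fetch_one(player_id))
        for player_id in ids
    )
    return await asyncio.gather(*jobs)
```

Create the limiter inside a running event loop (for example at the top of an `async def main()`), then pass the same instance to every collector that shares an API quota.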

Above this sits the ML pipeline layer, where the real magic happens. This layer transforms raw data into engineered features, handles model training, and manages predictions. I chose Polars over Pandas for data processing - its lazy evaluation and columnar storage format meant operations that took minutes in Pandas completed in seconds.

Finally, the API service layer exposes the system's capabilities through a FastAPI application. This isn't just a simple prediction endpoint; it's a full-featured service with health monitoring, real-time updates, and comprehensive error handling.

app.py
import logging
from contextlib import asynccontextmanager

from fastapi import FastAPI

logger = logging.getLogger(__name__)


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifespan manager."""
    # Startup
    logger.info("Starting FPL System API")

    # Initialise the database and create tables if needed.
    # get_db_manager() is the project's own database helper (not shown here).
    db = get_db_manager()
    try:
        db.create_all_tables()
        logger.info("Database tables initialized successfully")
    except Exception as e:
        logger.error(f"Failed to create database tables: {e}")
        # Don't fail startup - let API run even if tables aren't created

    yield

    # Graceful shutdown
    logger.info("Shutting down FPL System API")

04 The art of data integration

One of the most challenging aspects of the project was reconciling data from different sources. The FPL API might refer to a player as "Fernandes B.", whilst Sportmonks calls him "Bruno Fernandes". A naive exact match would fail for most players, so I developed a sophisticated matching system that combines multiple strategies.

The player matcher uses a multi-stage approach. First, it attempts exact matches on known identifiers. When that fails, it falls back to fuzzy string matching using Levenshtein distance, but crucially, it constrains the search space using team information. If a player is listed as playing for Manchester United in FPL, the matcher only considers Manchester United players from Sportmonks.

player_matcher.py
from typing import Dict, List, Optional, Tuple


class PlayerMatcher:
    """Matches FPL players to Sportmonks players."""

    def _find_best_match_optimized(
        self,
        fpl_player: Dict,
        sportmonks_players: List[Dict],
        player_to_teams: Dict[int, set],
        team_mapping_dict: Dict[int, int]
    ) -> Optional[Tuple[int, float, str]]:
        # Excerpt: fpl_team_code and sm_player are defined in code omitted here.

        # Pre-loaded lookups for O(1) team matching
        sportmonks_team_id = team_mapping_dict.get(fpl_team_code)

        # Fuzzy matching with team constraints
        name_score = self._calculate_name_similarity(
            fpl_player["name"],
            sm_player.get("display_name", "")
        )

This approach achieves over 90% matching accuracy, with most failures being youth players or recent transfers. The system logs unmatched players for manual review, but the high match rate means manual intervention is rarely needed.
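
As a self-contained illustration of the fallback stage, the sketch below pairs a plain Levenshtein distance with a team filter. It is not the project's matcher: the token-sorting normalisation, the 0.6 threshold, the dictionary keys, and the ids in the usage note are all assumptions chosen for the example.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] that ignores case, full stops and token order."""

    def normalise(name: str) -> str:
        return " ".join(sorted(name.lower().replace(".", " ").split()))

    left, right = normalise(a), normalise(b)
    longest = max(len(left), len(right))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(left, right) / longest


def best_match(
    fpl_name: str,
    sportmonks_team_id: int,
    candidates: List[Dict],
    threshold: float = 0.6,  # assumed cut-off, not the project's value
) -> Optional[Tuple[int, float]]:
    """Return (candidate id, score) for the best same-team candidate, if any."""
    best: Optional[Tuple[int, float]] = None
    for candidate in candidates:
        if candidate["team_id"] != sportmonks_team_id:
            continue  # the team constraint shrinks the search space
        score = name_similarity(fpl_name, candidate["display_name"])
        if score >= threshold and (best is None or score > best[1]):
            best = (candidate["id"], score)
    return best
```

With a candidate list containing "Bruno Fernandes" and "Casemiro" under the same (made-up) team id, `best_match("Fernandes B.", team_id, candidates)` selects the Bruno Fernandes record, whereas an unknown team id returns `None` so the player can be logged for manual review.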

05 Feature engineering: the secret sauce

Raw data alone doesn't make good predictions. The key to the system's performance lies in sophisticated feature engineering that captures the nuances of football performance. This was where I could combine domain knowledge with technical expertise to create features that truly matter.

One of the most critical challenges in time-series prediction is preventing data leakage - accidentally including future information in historical features. Consider calculating a player's average goals over the last five games. If you're not careful, you might include the game you're trying to predict in that average, essentially telling the model the answer.

I solved this through careful use of temporal shifts:

feature_engineering.py
import polars as pl


def extract_advanced_stats_features(self, df: pl.DataFrame) -> pl.DataFrame:
    """Extract rolling features with shift(1) for no leakage."""
    # Excerpt: stat, window and rolling_expressions are defined in code
    # omitted here. Rows must be in chronological order within each player,
    # otherwise shift(1) does not mean "the previous game".

    # CRITICAL: Use shift(1) to exclude current game
    rolling_expressions.append(
        pl.col(stat).fill_null(0).shift(1).rolling_mean(
            window_size=window,
            min_samples=1
        ).over("fpl_player_code").alias(f"{stat.lower()}_roll{window}_mean")
    )

    # Trend detection: recent vs long-term performance
    if window == 3:
        rolling_expressions.append(
            (
                pl.col(stat).shift(1).rolling_mean(window_size=3) -
                pl.col(stat).shift(1).rolling_mean(window_size=10)
            ).over("fpl_player_code").alias(f"{stat.lower()}_trend")
        )

Every feature that uses historical data applies a shift operation, ensuring we only use information that would have been available at prediction time. This might seem like a small detail, but it's the difference between a model that works in backtesting but fails in production, and one that performs consistently in real-world conditions.
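
The effect of the shift is easiest to see on a single player's history. The following pure-Python sketch (function names are illustrative, and it stands in for the Polars expressions rather than reproducing them) compares a leaky rolling mean with a shifted one.

```python
from typing import List, Optional, Sequence


def shifted_rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Mean of up to `window` games strictly before each game (no leakage)."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]  # excludes game i itself
        out.append(sum(history) / len(history) if history else None)
    return out


def leaky_rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Mean of up to `window` games INCLUDING the current one (leaks the target)."""
    out: List[float] = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1):i + 1]  # includes game i
        out.append(sum(recent) / len(recent))
    return out


goals = [0, 2, 1, 0, 3]  # made-up goals per game, oldest first
safe = shifted_rolling_mean(goals, window=3)
leaky = leaky_rolling_mean(goals, window=3)
```

In the shifted version the feature for the final game is built from the three games before it, so changing that game's result leaves the feature untouched. In the leaky version the final game's three goals are already inside its own feature.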

Beyond basic rolling averages, the system creates composite features that capture complex relationships. Shot accuracy isn't just about shots on target; it's about the ratio of on-target shots to total attempts. Creative efficiency measures how many big chances a player creates relative to their total key passes. These nuanced metrics often prove more predictive than raw statistics.

composite_features.py
# Shot efficiency (only from historical data)
# shift(1) drops the current game; the 1e-6 term guards against division
# by zero when a player has taken no shots in the window.
composite_expressions.append(
    (
        pl.col('SHOTS_ON_TARGET').shift(1).rolling_sum(window_size=5) /
        (pl.col('SHOTS_TOTAL').shift(1).rolling_sum(window_size=5) + 1e-6)
    ).over("fpl_player_code").alias("shot_accuracy_roll5")
)

06 The Elo rating system: quantifying team strength

Traditional features like "home team" and "away team" treat all teams as categorical variables, missing the dynamic nature of team performance. A team that was strong last season might be struggling this year. To capture this, I implemented a custom Elo rating system that updates after each match based on expected goals (xG) performance.

A standard Elo system maintains a rating for each team that changes after every match, depending on how the outcome compares with what the ratings predicted. When Manchester City (rating: 1850) plays Luton Town (rating: 1250), the system expects City to dominate. If City wins as expected, ratings barely change. But if Luton manages a draw, their rating increases significantly whilst City's decreases.
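
The City and Luton example can be worked through with the conventional Elo formulas. This is a sketch of textbook Elo, not the project's code: the 400-point scale is the usual convention, and the K-factor of 20 is an assumed value for illustration.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expectation for side A, between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def updated_rating(
    rating: float, expected: float, actual: float, k_factor: float = 20.0
) -> float:
    """Move the rating towards the actual outcome (1 win, 0.5 draw, 0 loss)."""
    return rating + k_factor * (actual - expected)


city, luton = 1850.0, 1250.0
city_expected = expected_score(city, luton)  # roughly 0.97
luton_expected = 1.0 - city_expected

# City win as expected: only a small gain, because the win was predicted.
city_after_win = updated_rating(city, city_expected, actual=1.0)

# A draw instead: Luton gain several points and City lose the same amount.
luton_after_draw = updated_rating(luton, luton_expected, actual=0.5)
city_after_draw = updated_rating(city, city_expected, actual=0.5)
```

Under these assumptions a City win moves their rating by less than one point, while a draw shifts a little over nine points from City to Luton.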

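elo_rating.py
class TeamEloRatingSystem:
    """Implements an Elo-like rating system based on xG performance."""

    # The constructor is not part of the original excerpt; it is sketched
    # here only from the attributes that update_rating() reads.
    def __init__(self, k_factor: float, min_rating: float, max_rating: float) -> None:
        self.k_factor = k_factor
        self.min_rating = min_rating
        self.max_rating = max_rating

    def update_rating(
        self,
        current_rating: float,
        expected_score: float,
        actual_score: float
    ) -> float:
        """Update Elo rating based on match outcome."""
        new_rating = current_rating + self.k_factor * (actual_score - expected_score)
        # Apply bounds to prevent extreme ratings
        return max(self.min_rating, min(self.max_rating, new_rating))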

What makes this implementation special is that it uses expected goals rather than actual results. This means a team that creates the better chances but loses to a lucky goal can still see its rating improve. Over time, this provides a more accurate measure of team strength than results alone.
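
One plausible way to turn xG into the "actual score" that an Elo update expects is a team's share of the total xG in the match. This mapping is an assumption made for illustration, as are the function names and the K-factor; the write-up does not specify how the project converts xG.

```python
def xg_share(xg_for: float, xg_against: float) -> float:
    """Map a match's xG to a score in [0, 1]; 0.5 means evenly matched."""
    total = xg_for + xg_against
    if total == 0:
        return 0.5  # no chances for either side: treat as a draw
    return xg_for / total


def rating_change(expected: float, actual: float, k_factor: float = 20.0) -> float:
    """Elo-style delta: positive when the team outperformed expectation."""
    return k_factor * (actual - expected)


# A side expected to score 0.6 creates 2.4 xG, concedes 0.6 xG, and loses 0-1.
performance = xg_share(2.4, 0.6)
delta = rating_change(expected=0.6, actual=performance)
```

Here the losing side's rating still rises, because its xG share of 0.8 exceeds the 0.6 it was expected to achieve.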

The system maintains separate ratings for attack and defence, allowing for teams that might be strong defensively but weak in attack. These granular ratings feed into player predictions, helping the model understand that a striker facing a weak defence is more likely to score, regardless of which specific team they're playing.

07 Machine learning at scale

With features engineered, the system needed a robust machine learning framework. Rather than implementing models from scratch, I integrated AutoGluon, a state-of-the-art AutoML framework that automatically selects and combines the best models for the task.

The integration wasn't just a simple wrapper. I configured AutoGluon to use its "extreme" preset, which leverages recent tabular models like TabM (an MLP-based ensembling architecture), Mitra (a tabular foundation model), and TabICL (in-context learning), alongside traditional gradient boosting methods. The wrapper decides whether each prediction target is treated as classification or regression and sets the evaluation metric accordingly.

autogluon_model.py
from typing import Any, Dict, Optional

import polars as pl


class AutoGluonModel:
    """AutoGluon TabularPredictor wrapper with GPU acceleration."""

    def train(
        self,
        train_df: pl.DataFrame,
        val_df: Optional[pl.DataFrame] = None,
        time_limit: Optional[int] = None
    ) -> Dict[str, Any]:
        # AutoGluon works on pandas DataFrames, so convert the optional validation set
        val_pd = val_df.to_pandas() if val_df is not None else None

        # Dynamic problem type detection
        if self.target_column in ["yellow_cards", "clean_sheets"]:
            problem_type = "binary"
            eval_metric = "f1"
        else:
            problem_type = "regression"
            eval_metric = AUTOGLUON_CONFIG["eval_metric"]

        # Configure with extreme preset for best performance
        fit_kwargs = {
            "time_limit": time_limit or AUTOGLUON_CONFIG["time_limit"],
            "num_gpus": AUTOGLUON_CONFIG["num_gpus"],
            "presets": "extreme",  # Leverages TabM, Mitra, TabICL
            "use_bag_holdout": val_pd is not None,
            "keep_only_best": True,
            "save_space": True,
        }
        # Excerpt: AUTOGLUON_CONFIG is the project's config dict; the fit call
        # and the returned metrics are not shown.

GPU acceleration was crucial for training speed. What would take hours on CPU completes in minutes on GPU, allowing for rapid iteration and experimentation. The system automatically detects available GPUs and configures models to use them efficiently.

One subtle but important detail is sample weighting. Recent games are more predictive than older ones, so the system weights training samples based on recency. This isn't just a simple linear decay; it's carefully tuned to balance having enough historical data whilst prioritising recent form.
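
The write-up does not give the weighting formula, so the following is only a sketch of one common non-linear scheme: exponential decay with a half-life and a floor. The function name, the 19-gameweek half-life, and the 0.05 floor are all assumed values, not the tuned ones.

```python
from typing import Iterable, List


def recency_weights(
    ages_in_gameweeks: Iterable[float],
    half_life: float = 19.0,  # assumed: weight halves every half season
    floor: float = 0.05,  # assumed: old seasons never vanish entirely
) -> List[float]:
    """Exponentially decaying sample weights, 1.0 for the most recent game."""
    return [max(floor, 0.5 ** (age / half_life)) for age in ages_in_gameweeks]


# Ages measured in gameweeks before the prediction date.
weights = recency_weights([0, 19, 38, 380])
```

The floor keeps older seasons in play so the model still sees enough history, which is the balance the paragraph above describes. Weights like these would typically be stored as an extra column and passed to the trainer as per-row sample weights.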

08 Production considerations

Building a model is one thing; deploying it to production is another challenge entirely. The system needed to handle real-time updates during live gameweeks, serve predictions with low latency, and recover gracefully from failures.

The FastAPI service implements comprehensive error handling and monitoring. Every endpoint has detailed logging, request validation, and error responses. The health check endpoint doesn't just return "OK" - it verifies database connectivity, checks model availability, and reports system metrics.
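
A health check of this kind boils down to running several independent probes and aggregating the results. The sketch below shows that aggregation in plain Python, without FastAPI; `health_report` and the probe names are hypothetical, and the real endpoint may report different fields.

```python
from typing import Callable, Dict


def health_report(checks: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    """Run every probe, never raise, and summarise the overall status."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:  # a broken probe must not take the endpoint down
            results[name] = f"error: {exc}"
    healthy = all(status == "ok" for status in results.values())
    return {"status": "healthy" if healthy else "degraded", "checks": results}
```

An endpoint handler could call this with probes such as a trivial database query and a check that the model file is loaded, then return the dictionary as JSON.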

Database queries are carefully optimised to minimise latency. Rather than making multiple round trips, complex joins gather all necessary data in single queries:

database.py
import polars as pl


async def get_fixtures_with_mappings(self, season_id: str) -> pl.DataFrame:
    """Get fixtures with FPL-Sportmonks mappings."""
    # One query with LEFT JOINs: fixtures without a Sportmonks mapping are
    # still returned (with NULLs) rather than needing a second round trip.
    query = """
    SELECT
        ff.fpl_fixture_code,
        ff.gameweek,
        -- Multiple LEFT JOINs for complete data
        fm.sportmonks_fixture_id,
        sf.sportmonks_home_team_id,
        sf.sportmonks_away_team_id
    FROM fpl_fixtures ff
    LEFT JOIN fixture_mappings fm ON ff.fpl_fixture_code = fm.fpl_fixture_code
    LEFT JOIN sportmonks_fixtures sf ON fm.sportmonks_fixture_id = sf.sportmonks_fixture_id
    WHERE ff.season_id = $1
    ORDER BY ff.kickoff_time
    """
    # Excerpt: executing the query with season_id bound to $1 and building
    # the DataFrame are not shown.

The system also implements intelligent caching. API responses are cached with TTLs appropriate to their update frequency. Player statistics might be cached for hours, whilst live match data expires after minutes. This reduces load on external APIs whilst ensuring data freshness.
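
A per-key TTL cache is enough to express "hours for player statistics, minutes for live data". The class below is a minimal sketch under that assumption; `TTLCache`, its method name, and the example TTLs are illustrative rather than taken from the project.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches values per key, each with its own time-to-live in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(
        self, key: Hashable, ttl_seconds: float, fetch: Callable[[], Any]
    ) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the external API call
        value = fetch()
        self._entries[key] = (now + ttl_seconds, value)
        return value
```

A caller might use `ttl_seconds=6 * 3600` for season-long player statistics and `ttl_seconds=120` for live match data, so each data source is refreshed at roughly the rate it actually changes.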

09 Performance optimisations

Throughout development, I obsessed over performance. Every operation was profiled and optimised. Batch operations replace loops wherever possible. Database inserts use `upsert_batch()` to handle thousands of records in single transactions. API calls are made concurrently with careful rate limiting to maximise throughput without overwhelming services.
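
To illustrate the batching idea, here is a small upsert written against SQLite from the standard library as a stand-in for the real database. The table, its columns, and the function names are hypothetical; only the pattern (one statement, many rows, one transaction) mirrors what the paragraph describes.

```python
import sqlite3
from typing import Iterable, Tuple


def create_table(conn: sqlite3.Connection) -> None:
    """Hypothetical table keyed by player and gameweek."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS player_stats (
            player_code INTEGER NOT NULL,
            gameweek    INTEGER NOT NULL,
            minutes     INTEGER NOT NULL,
            PRIMARY KEY (player_code, gameweek)
        )
        """
    )


def upsert_batch_sketch(
    conn: sqlite3.Connection, rows: Iterable[Tuple[int, int, int]]
) -> None:
    """Insert or update all rows in a single transaction."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            """
            INSERT INTO player_stats (player_code, gameweek, minutes)
            VALUES (?, ?, ?)
            ON CONFLICT (player_code, gameweek)
            DO UPDATE SET minutes = excluded.minutes
            """,
            rows,
        )
```

Because the whole batch shares one transaction, a failure part-way through leaves the table unchanged, and re-running a collector simply overwrites the rows it already wrote.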

The choice of Polars over Pandas wasn't arbitrary. For operations on large datasets, Polars' columnar storage and lazy evaluation provide massive speedups. A feature engineering pipeline that took 45 minutes in Pandas completes in under 3 minutes with Polars. When you're iterating on features, this difference is transformative.

Memory management also received careful attention. The system processes data in chunks where possible, avoiding loading entire datasets into memory. Models are loaded on-demand and cached, balancing memory usage with response latency.

10 Results and impact

The completed system processes over 500 players across all 380 fixtures each season, generating more than 200 engineered features per player-gameweek. Training runs on an RTX 5090 with 96GB RAM, so what would take hours on conventional hardware completes in minutes.

Player points prediction

  • MAE: 1.18
  • 0.67
  • Pearson: 0.82

Goals & assists

  • Goals MAE: 0.089
  • Assists MAE: 0.122

Goalkeeper performance

  • Saves R²: 0.96
  • Saves MAE: 0.051

Clean sheets

  • F1 Score: 0.53
  • AUC: 0.81
  • Balanced Accuracy: 0.74
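
For readers less familiar with the regression metrics above, this sketch shows how MAE and the Pearson correlation are computed. The points in the usage example are made up for illustration and are not taken from the model's evaluation.

```python
import math
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average size of the prediction error, in the same unit as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Linear correlation between predictions and outcomes, from -1 to 1."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    covariance = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    spread_a = sum((a - mean_a) ** 2 for a in actual)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_a * spread_p)


# Made-up example: actual vs predicted points for four players.
actual_points = [2, 6, 1, 9]
predicted_points = [3, 4, 2, 7]
mae = mean_absolute_error(actual_points, predicted_points)
correlation = pearson(actual_points, predicted_points)
```

Read this way, a points MAE of 1.18 means the prediction is off by a little over one FPL point on average, while the Pearson figure describes how well the ranking of players by predicted points tracks the real outcome.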

These metrics translate into genuine value for FPL managers. During the 2024-25 season, the model consistently identified undervalued players before price rises. For instance, it flagged several budget midfielders who went on to significantly outperform their price point, allowing early adopters to gain team value.

The system's predictions proved particularly valuable during double gameweeks, where the compounding effect of accurate predictions across multiple fixtures provided significant advantages. Users reported average rank improvements of 15-20% when following the model's recommendations compared to their intuition alone.

The modular architecture has proven its worth through easy maintenance and updates. When the FPL API changed their rate limits, updating the rate limiter configuration was a one-line change. When I wanted to experiment with new features, I could add them without touching existing code. The system processes predictions for an entire gameweek in under 30 seconds, fast enough for real-time decision-making during team selection deadlines.

11 Technical learnings and reflections

This project pushed me to grow across multiple technical dimensions. I deepened my understanding of time-series prediction, learning the subtle ways data leakage can creep in and how to prevent it. I gained experience with modern ML frameworks, understanding when to use AutoML versus custom implementations.

The importance of good software engineering practices became even clearer. Clean code isn't just about aesthetics - it's about building systems that can evolve. The investment in proper error handling, logging, and monitoring paid dividends when debugging production issues.

Perhaps most importantly, I learned to balance perfectionism with pragmatism. The player matching system doesn't achieve 100% accuracy, but 90% is good enough when combined with proper error handling. The Elo system uses simplifying assumptions, but they're reasonable ones that produce useful results.

12 Looking forward

The system provides a solid foundation for future enhancements. I'm exploring graph neural networks to model player relationships - perhaps certain players perform better together. Reinforcement learning could optimise team selection over an entire season rather than individual gameweeks.

There's also potential for a real-time streaming architecture using Apache Kafka, allowing the system to process events as they happen during matches. In future I would also like to experiment with individual models for each stat that makes up FPL points. This would require much more powerful hardware, however, as the training times would be prohibitive with my current setup.

13 Conclusion

Building this FPL ML system has been an incredible journey that combined my passion for football with my love of technology. It demonstrates that with thoughtful architecture, careful implementation, and attention to detail, it's possible to build production-ready ML systems that solve real problems.

The project showcases not just my ability to train models, but to think systematically about complex problems, design elegant solutions, and build robust systems. It's a testament to the idea that the best technology projects come from genuine curiosity and a desire to solve meaningful problems.

Whether you're interested in the technical implementation, the ML methodology, or the system design, I hope this case study provides insight into how modern data systems come together. The code is clean, the architecture is sound, and most importantly, the system actually works - providing valuable predictions week after week for FPL managers around the world.
