Topic Extraction Using Non-Negative Matrix Factorization
As part of my exploration into emerging research trends in machine learning, I recently revived an older project of mine. The core idea was simple: gather abstracts from top machine learning journals, process them, and use Non-Negative Matrix Factorization (NMF) to uncover overarching themes in the field.
The pipeline looks like this:
Collect abstracts from leading journals.
Preprocess text with tokenization and TF-IDF vectorization.
Apply NMF to extract latent topic clusters.
Analyze results by tracking popularity over time and identifying representative/cited papers for each theme.
Figure 1 End-to-end workflow for topic extraction using NMF: from journal abstract collection to clustering and trend analysis.
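A minimal sketch of steps 2 and 3 with scikit-learn is shown below; the number of topics, vocabulary size, and other settings are illustrative assumptions rather than the exact configuration used for this project.

```python
# Sketch of the TF-IDF + NMF stage; assumes abstracts are already collected
# into a list of strings. Parameter values here are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

abstracts: list[str] = [...]  # placeholder: one string per collected abstract

# Tokenize and vectorize; TF-IDF down-weights terms common to all abstracts
vectorizer = TfidfVectorizer(stop_words="english", max_features=20000)
X = vectorizer.fit_transform(abstracts)        # shape: (n_docs, n_terms)

# Factorize X ~ W * H with non-negative W (doc-topic) and H (topic-term)
nmf = NMF(n_components=10, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)                       # document-topic weights
H = nmf.components_                            # topic-term weights

# Print the top terms for each extracted topic
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(H):
    top = topic.argsort()[::-1][:8]
    print(f"Topic {k}: " + ", ".join(terms[i] for i in top))
```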
Why NMF?
Unlike hard clustering approaches, NMF represents each document as a weighted combination of topics. This fits research papers well, since most span multiple themes (e.g., a paper on attention mechanisms may also touch on optimization strategies).
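Concretely, continuing the sketch above, the row of W for a single abstract gives its soft topic membership, so one paper can load on several themes at once:

```python
# Topic weights for one abstract (a row of the document-topic matrix W above).
doc_topics = W[0]
for k, weight in enumerate(doc_topics):
    if weight > 0.01:                 # skip near-zero topics
        print(f"Topic {k}: weight {weight:.3f}")
# e.g. a paper on attention mechanisms might load on both a "transformers"
# topic and an "optimization" topic.
```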
The output not only highlighted well-established areas but also surfaced several emerging fields that were new to me – making this a strong starting point for deeper exploration.
Figure 2 Popularity trends of extracted topics across time, highlighting emerging research themes.
Key Groups and Themes
1. Broad, Foundational Topics
The first four clusters capture the backbone of the ML research landscape:
Particularly relevant is the expansion of transformer architectures into computer vision.
“Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation capabilities, researchers are looking at ways to apply transformer to computer vision tasks… Given its high performance and less need for vision-specific inductive bias, transformer is receiving more and more attention from the computer vision community.” – 10.1109/TPAMI.2022.3152247
“… feedback optimized control design for a class of strict-feedback nonlinear systems that contain unknown internal dynamics and states that are immeasurable and constrained within some predefined compact sets.” (example: a quarter-car active suspension)
This was the most active industry-driven topic, with contributions from Microsoft, Google, and Snapchat.
“Single-image super-resolution is the process of generating a high-resolution image that is consistent with an input low-resolution image. It falls under the broad family of image-to-image translation tasks, including colorization, in-painting, and de-blurring”
“Diffusion-based image generators have seen widespread commercial interest, such as Stable Diffusion and DALL-E. These models typically combine diffusion models with other models, such as text-encoders and cross-attention modules to allow text-conditioned generation.”
“Deep learning has revolutionized many machine learning tasks in recent years… The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged”
Keywords: domain adaptation, cross-domain, transfer
“If an image classifier was trained on photo images, would it work on sketch images? What if a car detector trained using urban images is tested in rural environments?… Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning.”
“Small Object Detection (SOD), as a sub-field of generic object detection, which concentrates on detecting those objects with small size, is of great theoretical and practical significance in various scenarios such as surveillance, drone scene analysis, pedestrian detection, traffic sign detection in autonomous driving, etc. There remains a huge performance gap in detecting small and normal-sized objects even for leading detectors.”
7. Predictive Learning of Spatiotemporal Sequences
Keywords: temporal, video, motion, frames
“As a key application of predictive learning, generating future frames from historical consecutive frames has received growing interest in machine learning and computer vision communities. It benefits many practical applications and downstream tasks, such as the precipitation forecasting, traffic flow prediction, physical scene understanding, early activity recognition, deep reinforcement learning, and the vision-based model predictive control. Many of these existing approaches suggested leveraging RNNs with stacked LSTM units to capture the temporal dependencies of spatiotemporal data.”
“Modern cyberphysical systems (CPS), such as those encountered in manufacturing, aircraft and servers, involve sophisticated equipment that records multivariate time-series (MVTS) data from 10s, 100s, or even thousands of sensors. The MVTS need to be continuously monitored to ensure smooth operation and prevent expensive failures”
This project demonstrates how topic modeling can map the evolving landscape of machine learning research. By systematically analyzing abstracts, we can:
Track emerging vs. declining research themes
Identify seminal works within each cluster
Spot areas ripe for exploration and collaboration
For me, the most exciting part was uncovering entire research areas I hadn’t explored before – a reminder of just how fast the field continues to expand.
In my work measuring digital advertising campaigns, I often see how quickly the landscape shifts—driven by advances in transformers and deep learning. These changes make it important to pause and assess which innovations are most relevant to improving outcomes for advertisers and platforms. The Deep Interest Network (DIN) paper caught my attention as a practical and meaningful contribution. Using it as a starting point, I explored the key developments that led here, considered where DIN fits into the broader CTR prediction landscape, and reflected on possible directions for future progress.
Click-through rate (CTR) prediction is the lifeblood of modern digital advertising and recommendation systems. Whether it’s deciding which product to show on Amazon, which video to recommend on YouTube, or which ad wins in a real-time bidding (RTB) auction, CTR estimates directly affect both user experience and revenue.
While CTR prediction has been studied for years, the Deep Interest Network (DIN)—introduced by Alibaba Group researchers—brings a fresh perspective to a long-standing challenge: how to model diverse and context-dependent user interests effectively.
Why CTR Prediction Is Hard
CTR prediction systems face three major challenges:
Extreme Sparsity in Data: Most user and item attributes are categorical and extremely sparse. In datasets like Criteo's, the feature space can have tens of millions of dimensions with >99.99% sparsity; simple models like logistic regression struggle to generalize in such conditions (a toy illustration follows this list).
Complex Feature Interactions: User behavior patterns often depend on high-order interactions, e.g., {Gender=Male, Age=10, ProductCategory=VideoGame}, which can't be captured by looking at features in isolation. Manually engineering these interactions is labor-intensive and incomplete.
Context Dependence of User Interests: A single user's browsing history may span multiple categories (e.g., handbags, jackets, electronics), yet not all of those interests are equally relevant to every ad impression.
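To make the sparsity point concrete, here is a toy back-of-the-envelope illustration; the field names and cardinalities are invented, not taken from Criteo.

```python
# Toy illustration of how a few categorical fields one-hot encode into a
# huge, almost entirely zero feature vector. Cardinalities are invented.
field_sizes = {"user_id": 10_000_000, "item_id": 5_000_000,
               "category": 10_000, "gender": 3}

total_dims = sum(field_sizes.values())   # ~15 million dimensions
active_dims = len(field_sizes)           # one active value per field per example

sparsity = 1 - active_dims / total_dims
print(f"{total_dims:,} dims, {active_dims} non-zero -> {sparsity:.4%} sparse")
```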
Modeling Approaches
From Embeddings to Interest Modeling
The first step in modern CTR models is to convert high-dimensional categorical features into dense, low-dimensional embeddings—similar to how NLP models embed words. This reduces dimensionality and captures latent structure (e.g., product category similarity). In standard models, these embeddings are pooled (often averaged) into a single user vector and fed into a prediction network. While effective, this approach compresses all user interests into one fixed-length vector, losing granularity.
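A minimal PyTorch sketch of that pooled baseline, assuming a single behavior field; the class name and layer sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class PooledBaseline(nn.Module):
    """Embedding-and-pooling baseline: the user's behavior sequence is
    averaged into one fixed-length vector before the prediction MLP."""
    def __init__(self, n_items=1_000_000, dim=32):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, behavior_ids, ad_id):
        # behavior_ids: (batch, seq_len) item ids; ad_id: (batch,) candidate ad
        behaviors = self.item_emb(behavior_ids)      # (batch, seq, dim)
        user_vec = behaviors.mean(dim=1)             # all interests pooled away
        ad_vec = self.item_emb(ad_id)                # (batch, dim)
        logit = self.mlp(torch.cat([user_vec, ad_vec], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)      # predicted CTR
```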
Implicit vs. Explicit Feature Interaction Approaches
Industry research has pursued two main paths:
Implicit Interaction Models – Let deep neural networks discover feature interactions automatically. Examples:
Wide & Deep (Google Play)
YouTube’s recommendation DNN
Explicit Interaction Models – Model and expose feature interactions explicitly for interpretability. Examples:
AutoInt (multi-head self-attention for feature interaction)
DCNv3 (deep cross networks with interpretable crossing layers)
Deep Interest Network
Limitations of Fully Connected Implicit Interaction Models
Fully connected implicit interaction models can only handle fixed-length inputs, so all of a user's diverse interests must be compressed into a single fixed-length vector of manageable size, which limits the model's expressive ability. Nor is that compression necessary: when scoring a candidate ad, only part of the user's interests will influence the action (to click or not to click).
DIN’s Key Innovation: Local Activation Unit
As an implicit interaction model, DIN builds on the approaches discussed above but takes them a step further by directly addressing the context-dependence problem. It does this through a local activation unit, which adaptively weighs a user’s historical behaviors according to their relevance to the current ad.
For a t-shirt ad, clothing-related behaviors get more weight.
For a smartphone ad, electronics-related behaviors dominate.
Unlike traditional attention in transformers, DIN’s activation weights are not normalized, allowing stronger emphasis on dominant interests.
This produces a context-specific user interest vector, enabling the model to focus on the subset of historical interactions most likely to influence the click decision.
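A rough PyTorch sketch of the idea (not Alibaba's production code): a small MLP scores each behavior against the candidate ad, using an element-wise product as a simplified stand-in for the paper's interaction feature, and the resulting weights are deliberately left unnormalized.

```python
import torch
import torch.nn as nn

class LocalActivationUnit(nn.Module):
    """Scores each historical behavior against the candidate ad; the weights
    are NOT softmax-normalized, so strong interests keep their intensity."""
    def __init__(self, dim=32):
        super().__init__()
        # Input: behavior, ad, and an element-wise interaction term
        self.mlp = nn.Sequential(nn.Linear(3 * dim, 36), nn.ReLU(),
                                 nn.Linear(36, 1))

    def forward(self, behaviors, ad):
        # behaviors: (batch, seq, dim); ad: (batch, dim)
        ad_exp = ad.unsqueeze(1).expand_as(behaviors)
        feats = torch.cat([behaviors, ad_exp, behaviors * ad_exp], dim=-1)
        weights = self.mlp(feats)                    # (batch, seq, 1), unnormalized
        # Weighted sum -> context-specific user interest vector for this ad
        return (weights * behaviors).sum(dim=1)      # (batch, dim)
```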
Training Techniques That Make It Work at Scale
Alibaba’s production environment involves 60M users and 0.6B products, so efficiency matters.
Mini-batch Aware Regularization – Applies L2 regularization only to features present in the current mini-batch, avoiding unnecessary computation on inactive features.
Dice Activation – A data-adaptive activation function that shifts its rectification threshold based on the input distribution, improving stability across layers.
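A compact sketch of Dice as described in the paper, using a non-affine BatchNorm to supply the mini-batch mean and variance; alpha is a learned per-channel slope for the negative part.

```python
import torch
import torch.nn as nn

class Dice(nn.Module):
    """Data-adaptive activation from the DIN paper: the rectification point
    shifts with the mini-batch statistics of each input channel."""
    def __init__(self, num_features, eps=1e-8):
        super().__init__()
        # Non-affine BatchNorm standardizes s using batch mean and variance
        self.bn = nn.BatchNorm1d(num_features, eps=eps, affine=False)
        self.alpha = nn.Parameter(torch.zeros(num_features))

    def forward(self, s):
        p = torch.sigmoid(self.bn(s))               # p(s): soft gate in [0, 1]
        return p * s + (1.0 - p) * self.alpha * s   # f(s) = p*s + (1-p)*alpha*s
```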
Results: Tangible Gains at Alibaba
In nearly a month of online A/B testing at Alibaba (2017):
+11% weighted AUC compared to the base DNN.
+10% CTR and +3.8% revenue per mille (RPM).
These gains led to DIN’s deployment across Alibaba’s large-scale display advertising system.
Visualization Insights
A t-SNE visualization of the learned embeddings (points with the same shape correspond to ads in the same category; color indicates each ad's predicted CTR) showed:
Items from the same category naturally clustered together—despite no explicit category supervision.
Click probabilities varied smoothly within these clusters, showing the model’s nuanced interest mapping.
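For readers who want to reproduce this kind of picture on their own model, here is a small scikit-learn t-SNE sketch; the embeddings and CTR scores below are random placeholders standing in for a trained model's outputs.

```python
# Sketch of projecting item embeddings to 2-D with t-SNE and coloring by CTR.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

item_embeddings = np.random.rand(500, 32)    # placeholder learned embeddings
predicted_ctr = np.random.rand(500)          # placeholder per-ad CTR scores

coords = TSNE(n_components=2, random_state=0).fit_transform(item_embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=predicted_ctr, cmap="viridis", s=6)
plt.colorbar(label="predicted CTR")
plt.show()
```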
Why DIN Matters
DIN’s contribution goes beyond Alibaba’s needs. It illustrates a broader principle for recommender and ranking systems:
User interests should be modeled in a context-dependent, dynamic way rather than compressed into a single static profile.
From my perspective in digital ad measurement, DIN represents a solid step toward more context-aware personalization. It bridges current best practices with the potential of emerging approaches, hinting at a future where CTR models combine adaptiveness with greater interpretability. As transformer-based architectures and advanced attention mechanisms continue to evolve, I expect DIN’s core ideas to remain central to shaping the next generation of advertising intelligence.
References
Zhou et al., Deep Interest Network for Click-Through Rate Prediction, KDD 2018.
Song et al., AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks, 2019.
Wang et al., DCNv3: Towards Next Generation Deep Cross Network for Click-Through Rate Prediction, 2024.
Cheng et al., Wide & Deep Learning for Recommender Systems, 2016.
Covington et al., Deep Neural Networks for YouTube Recommendations, 2016.