Frank

And lost be the day to us in which a measure was not danced.

Und verloren sei uns der Tag, wo nicht einmal getanzt wurde!

—— Nietzsche, Thus Spoke Zarathustra

15 posts

2026 · 06 · 13
On Business Taste

When SpaceX was on the brink of bankruptcy, Peter Thiel made a life-saving investment that turned into over 50 billion dollars on the day the company went public. From Girard's mimetic desire, to the four traits of monopoly in Zero to One, to Warren Buffett and Duan Yongping's principle of 'look at the business model first'—I've come to realize that taste in business models is the single most important thing, whether you're investing or building a company.
Business Read more →
2026 · 06 · 12
How to Persuade People

The best theory of persuasion comes not from a modern business school but from Aristotle, twenty-three hundred years ago: Ethos (credibility), Pathos (emotion), Logos (logic). Every popular persuasion framework today—Claim → Reason → Evidence for a raise, STAR for interviews, BLUF for email, Problem → Solution → Traction for the pitch, Pain → Gain → Proof for sales, the Golden Circle for the launch event, BATNA for negotiation, AIDA for copywriting, Nonviolent Communication for intimate relationships—is at bottom a localized version of these three elements for a different audience. This piece walks through ten scenarios one by one and lands on a single line: the framework is the map, sincerity is the compass.
Essays Read more →
2026 · 06 · 09
Lessons From Building Products for Overseas Markets

Claude Code has gotten really good lately, and a lot of people now have a shot at building their own products. This post shares the tech stack and the various SaaS platforms I commonly use when building products for overseas markets: static sites, full front-end/back-end apps, mobile, and desktop—covering framework choices, deployment, databases, auth, payments, AI services, and image hosting, with free-tier limits and starting costs wherever I can.
Technology, Business Read more →
2026 · 06 · 05
How to Iterate on Yourself Systematically

Karpathy on how to become an expert and Feynman on how to test real understanding are, at bottom, the same thing: use output to check input. This piece maps the method—do a project first, fill in theory later, only compare yourself to your past self—onto the training loop of a neural network. Forward is acting first, Loss is the gap from the ideal result, and Backward plus gradient descent is the review-and-improve step. The biggest difference between people and networks is that a large Loss hurts us, so the key is to separate criticizing behavior from negating worth: gradient descent only updates the parameters (your behavior, your skills), so don't blow up the architecture (who you are). Two final tricks: write your mistakes down serially, then immediately move on to the next round—don't keep running Backward on the same sample and overfit.
Essays Read more →
2026 · 06 · 03
Common Go Concurrency Pattern Templates

A ready-to-use Go concurrency reference. From "locking shared variables, Once, channels, Pub/Sub" to "select/Actor, pipeline/fan-out/worker pool, context, errgroup," each pattern gives when to use it, template code, and key pitfalls, closing with a cheat sheet: if you can avoid sharing, communicate instead; if you must share, wrap it with the simplest synchronization possible; and every goroutine needs a clear exit path.
Go, Concurrency, channel Read more →
2026 · 05 · 27
My Claude Code Practices

I've been using Claude Code for almost a year now. As one of its earliest users, I've watched both Claude Code and the models themselves improve enormously over the past year. This post shares some ways I use it beyond writing code, along with what I've learned: a workflow built on an always-on home machine + Tailscale + tmux; managing frontend and backend with a monorepo + submodules + OpenAPI; doing Deep Research with tree-structured prompts; cleaning up long conversations and messy desktop folders; and using it to draw SVGs, process images, make videos with Remotion, and more.
Technology Read more →
2026 · 05 · 19
Reason Is the Slave of the Passions

Hume said that reason is, and ought only to be, the slave of the passions. Translate that into language a software engineer knows today: emotion is the machine instruction set, the thing the CPU actually executes; reason is a high-level language that has to be compiled before it can run. Any chain of rational reasoning ultimately has to bottom out in something you care about. A good system first builds its framework in a high-level language, then profiles out the hot spots and hand-optimizes that small slice to run close to the hardware. Port the same architecture onto life: use reason to build the big framework, use love to drive the high-frequency everyday.
Essays Read more →
2026 · 05 · 17
The Barbell Strategy and What It Teaches Us About Personal Growth

Taleb's barbell strategy: put your resources at the extremes and deliberately avoid the middle. The idea was first used to talk about investing, then extended to reading, careers, even life planning—but its core was never about how much to put on each end. It's about accepting that the world is fundamentally uncertain: first make sure you survive, then go after the opportunities that might let you leap.
Essays Read more →
2026 · 05 · 09
Hong Kong's Mountains: A Hiking Map of Trails, History, and Urban Memory

The 100 km MacLehose Trail, Lion Rock sung about for fifty years, the pavilion at Pat Sin Leng that fire kept failing to consume—70% of Hong Kong is country park, and every trail threads through a piece of institutional history, an old song, or a war. 10 routes × history × culture.
Hong Kong, Hiking, MacLehose Trail Read more →
2026 · 05 · 09
Where Ideas Are Born: A Pilgrimage Map of European Philosophy

Plato's Academy, a Danube army camp, a Bordeaux tower, an Ulm stove-room, the Sils-Maria boulder, a Black Forest hut—putting 12 philosophical classics back into the specific rooms, cities, and landscapes where they were born.
Europe, Philosophy, Philosophy Travel Read more →
2026 · 04 · 11
Reading Notes: Freedom of Money

I spent two days reading Changpeng Zhao's autobiography Freedom of Money. I'm not into crypto, but I've always been curious about CZ's legendary story—selling his house to go all-in on Bitcoin, becoming the richest ethnic Chinese person, building a crypto empire, and serving time in a U.S. prison. A few reflections.
Essays, Business Read more →
2026 · 04 · 06
Survival, Competition, and Freedom

There are three stages to the things people do: in the survival stage you are driven by instinct, in the competition stage your coordinate system is defined by your rivals, and when you finally reach freedom, what you face is an open wilderness with no compass. Freedom without values is a painful prison; freedom with values is the most powerful engine. Charles Zhang and Elon Musk are two of the most telling case studies.
Essays Read more →
2026 · 02 · 15
Analyzing the Business Models of Independent Content Creators

Content creation may be the biggest lever available to ordinary people today: low barrier to entry, high ceiling, zero marginal cost of distribution. Looked at through the lens of business models, there are really only three ways to make a living from it: sell content directly, help others sell goods, or help yourself sell goods.
Business Read more →
2026 · 02 · 01
The Boundaries of Compression

Knowledge falls into two kinds: Techne, which can be serialized, and Metis, which depends on context, trial and error, and the body. AI excels at compressing the former but cannot touch the latter—and that is precisely the boundary of its capability. As standardized work gets rapidly replaced, the experience that cannot be compressed becomes scarcer instead.
Essays Read more →
2026 · 01 · 23
Book Notes: The Technological Republic

Palantir CEO Alex Karp released a new book, The Technological Republic, in 2025, and it was published in mainland China at the end of the year. I bought it and read it right away. The views in the book represent the right-wing current of thought in Silicon Valley, and its shadow can be seen everywhere in actual American politics.
Essays, Business Read more →
2025 · 12 · 21
The Data Supplier Behind the Large Models: Surge AI

I first heard of Surge AI from a podcast interview with Edwin Chen, right as they were raising their first round of funding. Edwin came across as extremely pragmatic and efficient, and his views left a strong impression.
Business Read more →
2025 · 12 · 19
The VL Model Behind the Doubao AI Phone

According to public reporting, the model used in the Doubao AI phone is a closed-source version of UI-TARS optimized for phones. UI-TARS itself is the result of SFT on top of Alibaba's Qwen2 VL, with the 7B version currently open-sourced (Qwen2 VL has open-sourced models from 3B to 72B). Rather than dwelling on Qwen here (Qwen2 VL already has UI Operation capabilities), this post focuses on how UI-TARS improves further on top of Qwen2 VL, split into data and training.
Technology, Business Read more →
2025 · 10 · 15
A Dimension-by-Dimension Walkthrough of LLM Inference, with the Core Formulas

Using Llama 3 8B as the baseline, we trace the entire inference path — token ID → embedding → Transformer → sampling — and write out the dimensional flow and core formulas from memory.
LLM, Transformer, Inference Read more →
2024 · 04 · 04
Using UTM Tags to Analyze Traffic Sources

Promotion usually runs several channels at once: cold email, Google Ads, Twitter, SEO, communities. Without tagging your links, GA can't tell them apart. This post lays out the five UTM parameters clearly and gives a dozen real-world naming examples.
Business Read more →
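To make the tagging idea concrete, here is a minimal Python sketch that appends the five standard UTM parameters (`utm_source`, `utm_medium`, `utm_campaign`, `utm_term`, `utm_content`) to a link. The helper name `add_utm`, the example URL, and the naming values are hypothetical illustrations, not the examples from the post.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# The five standard UTM parameters, in a fixed order.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def add_utm(url, **utm):
    """Return url with the given UTM parameters appended (hypothetical helper)."""
    unknown = set(utm) - set(UTM_KEYS)
    if unknown:
        raise ValueError(f"not a UTM parameter: {sorted(unknown)}")
    parts = urlparse(url)
    # Preserve any query parameters the link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Append tags in canonical order so tagged links stay easy to compare.
    query += [(key, utm[key]) for key in UTM_KEYS if key in utm]
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical naming for a Twitter campaign link.
link = add_utm(
    "https://example.com/pricing?ref=home",
    utm_source="twitter",
    utm_medium="social",
    utm_campaign="spring_launch",
)
print(link)
```

Keeping one helper like this per project is one way to stop channel names from drifting (for example `Twitter` vs `twitter`), which analytics tools typically treat as different sources.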
2024 · 03 · 21
jenni.ai's Cold Start and Growth Strategy

Jenni.ai is an AI tool for assisting with paper writing and reading. It has already reached $5M ARR with 2.5M users and is still growing fast. In its early days it actually did SEO writing and could barely survive; later it narrowed down to academic writing and came back from the dead. CEO David Park has candidly shared their growth path from zero to several million dollars (from cold calls, to lurking in groups, to viral growth on TikTok), and much of it is worth learning from.
Business Read more →
2024 · 03 · 03
Startup Perks Worth Knowing About

A systematic rundown of the perks the big international vendors offer startups, covering cloud, AI tokens, banking, and SaaS tools, with special attention to which ones you can apply for directly without a VC referral.
Business Read more →
2023 · 04 · 15
Shuangyue Bay: When You're Worn Out, Come Here to Lie Flat and Watch the Sea

Just two hours from Shenzhen, Shuangyue Bay in Huizhou is a place made for doing nothing: watch the sunrise over the water from your guesthouse balcony, set off fireworks and grill skewers on the beach, eat seafood far cheaper than in Shenzhen, then climb the lookout to see how two curving coastlines come together to form a "double moon." Includes a concrete two-day, one-night itinerary.
Travel Read more →
2023 · 02 · 11
Hong Kong Country Trails: Walking the Ridges and the Coast of Ten Thousand Columns

Hong Kong is more than Victoria Harbour and Temple Street. Cross the border from Shenzhen and in half a day you can step into its other side—weathered ridges, alpine meadows like Wugong Mountain, deserted beaches all to yourself, and a coast of ten thousand columns left behind by Jurassic volcanoes. This piece gathers four country hikes: Beiling Double Crossing, Kai Kung Leng, Sharp Peak, and Po Pin Chau.
Travel Read more →
2022 · 12 · 10
Zhongshan + Jiangmen: Visiting the Filming Locations of "The Knockout" and "Let the Bullets Fly"

Stringing Zhongshan and Jiangmen into a single weekend route: Thirty-Three Market Street recreates the old factory district from "The Knockout," the Mei Family Mansion is the Goose Town of "Let the Bullets Fly," a single ticket to the Zhongshan Film Studio lets you shoot scenes from many countries, plus two easy hikes up Shitou Mountain and Yaji Mountain. A hands-on guide for movie fans and photo lovers.
Travel Read more →
2022 · 11 · 12
Dapeng Peninsula: The Cleanest Sea and the Most Beautiful Coastline

Right next to a megacity like Shenzhen, there's this pristine landscape of mountains and sea. Dongxichong, Dayanding, Daluqiang, the Mermaid Cave at Luzui Villa, and the Shenzhen Observatory—five spots on the Dapeng Peninsula I keep coming back to, from hardcore hikes to gentle seaside strolls you can take your parents on.
Travel Read more →
2022 · 10 · 15
Shenzhen After Hours: A Few Places for Weeknights and Weekends

No need to travel far—right within the city, Shenzhen hides plenty of places worth a visit: chase the sunset up Tanglang Mountain after work, watch the sun sink below the horizon at the Mangrove Park, listen to live music at sea aboard the Greater Bay Area No. 1 cruise, pray at Hongfa Temple and stroll through Fairy Lake Botanical Garden, or hike Meishajian at night to overlook Yantian Port.
Travel Read more →
2022 · 08 · 13
Wugong Mountain: Alpine Meadows Above the Clouds

Wugong Mountain is in Pingxiang, Jiangxi, and is best known for its rolling alpine meadows—green grasslands in summer, with a better chance of catching a sea of clouds in winter. This is a record of a three-day, two-night trip from Shenzhen: taking the cable car up to skip the thousand-meter climb, walking the cliffside boardwalk, watching the sunrise, spending a night on the summit, and happening to catch a bonfire music festival on the meadow.
Travel Read more →
2021 · 06 · 12
NCNN Peak Memory Benchmark: A Layer-by-Layer Analysis of MobileNet

To understand how much peak memory a model actually consumes during inference, I ran a layer-by-layer benchmark of MobileNet with NCNN on x86, compared several networks side by side, and analyzed how light_mode, fp16, and int8 affect peak memory — along with the differences in how "peak memory" itself is defined.
Model Compression, Inference Engine, MobileNet Read more →
2021 · 04 · 18
A Panorama of ML System Design: Inference, Training, Data, and Deployment

Course notes from Stanford's CS 329S, Machine Learning Systems Design: inference and learning paradigms, data storage and feature management, sampling and class imbalance, parallel training and system testing, experiment-management tooling, and model deployment — a full picture of taking ML systems into production.
MLSys, System Design, Deep Learning Read more →
2021 · 04 · 12
Profiling Performance in Deep Learning Training and Inference

Performance bottlenecks in training and inference often aren't obvious; you need profiling to analyze them quantitatively. Notes on the profiling methods I use for the training phase (breaking down the time spent on data loading, the forward pass, and the backward pass) and the inference phase.
Profiling, Performance Optimization, Deep Learning Read more →
2021 · 02 · 07
Pruning Convolutional Neural Networks: Decide the Plan, Prune, Then Finetune

Pruning is subtraction applied to a trained network. It breaks down into three steps: first decide how much to prune in each layer (by intuition, by sensitivity analysis, or by searching with reinforcement learning as in AMC), then actually prune (unstructured weight sparsity and the "Lottery Ticket Hypothesis"; or structured pruning that removes whole filters using criteria like L1/L2 norm, geometric median FPGM, BN's γ, or feature-map rank HRank), and finally finetune to recover the accuracy. Or you can simply prune while training—Slimmable Networks and AutoSlim.
Technology Read more →
2021 · 02 · 01
Training Tricks Miscellany: Reparameterization, Label Smoothing, and Dropout

A few scattered but practical training tricks. Reparameterization (ACNet / RepVGG / RepMLP) adds extra branches at training time to help optimization, then fuses them back into the original structure at inference—essentially gaining a bit of accuracy for free without adding any inference cost. Plus weight EMA, Shake-Shake, label smoothing, and Dropout—common tactics for fighting overfitting and improving generalization.
Technology Read more →
2021 · 01 · 30
Knowledge Distillation: Teaching a Small Network to Mimic a Large One

Knowledge distillation is a common way to boost a small network's accuracy and speed up convergence, and it's often used to fine-tune small models after compression or NAS. The core idea is straightforward: treat the outputs of a high-accuracy "teacher network" as soft labels for a "student network," letting the student mimic the teacher. This post walks through several representative methods—classic temperature-scaled distillation, FitNets' intermediate-layer hints, FSP matrices, the teacher-assistant network (TAKD), and DML, where two networks learn from each other.
Technology Read more →
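The "soft labels" idea can be sketched in a few lines of plain Python. This is a minimal illustration of one common formulation of temperature-scaled distillation (KL divergence between softened teacher and student outputs, scaled by T²), not necessarily the exact loss used in the post; the function names, logits, and the temperature value are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature; a higher temperature gives softer labels."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Made-up logits for a 3-class problem.
teacher = [5.0, 1.0, -2.0]
student = [2.0, 1.5, 0.0]
print(softmax(teacher, 1.0))  # sharp distribution
print(softmax(teacher, 4.0))  # softened distribution used as the target
print(distillation_loss(student, teacher))
```

In practice this term is usually mixed with the ordinary cross-entropy on the hard labels; the mixing weight is another hyperparameter not shown here.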
2021 · 01 · 24
Image Data Augmentation: From Random Crop to AutoAugment

Data augmentation is a standard trick for preventing overfitting in computer vision. This post first walks through the common image augmentations—random resized crop, cutout, random erase, mixup—then covers the automated AutoAugment and Fast AutoAugment (I used the latter to win the image track of the AutoDL competition back in the day), and finally adds a note on test-time augmentation (TTA).
Technology Read more →
2021 · 01 · 18
Dealing with Class Imbalance: Resampling, Weighting, and Ensembles

Real-world data is rarely as tidy as public datasets, and severely imbalanced positive and negative samples are the norm. There are three main approaches to class imbalance: resampling (undersampling the majority class, e.g. Tomek Links; oversampling the minority class, e.g. SMOTE), weighting different classes or easy/hard samples (Focal Loss), and ensembles (Bagging).
Technology Read more →
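As a small illustration of the weighting approach, here is a sketch of binary Focal Loss in plain Python. The defaults `gamma=2.0` and `alpha=0.25` follow the values commonly quoted from the Focal Loss paper; the function name and the sample probabilities are assumptions for the example.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct
hard = focal_loss(0.1, 1)  # confident and wrong
print(easy, hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is a quick way to sanity-check an implementation.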
2020 · 12 · 12
Deep Learning Hyperparameter Tuning: From Hand-Crafted Alchemy to Automated Search

Tuning in deep learning is both a black art and a make-or-break step, so much so that training is jokingly called "alchemy" and engineers call themselves "tuning monkeys." This post first covers the essentials of manual tuning (how batch size and learning rate move together, the update rules for SGD / momentum / Adagrad / Adam, learning-rate warmup and decay, and a small weight-decay trick), then turns to automated hyperparameter optimization (HPO): grid/random search, the CMA-ES evolutionary algorithm, and Bayesian optimization, the most sample-efficient of the three (Gaussian process + acquisition function), plus off-the-shelf tools like NNI and Ray.
Technology Read more →
2020 · 12 · 06
Designing Compact Networks: Taking Convolution Apart

To run on mobile with low latency and a small memory footprint, a family of compact networks was designed. This piece walks through their design ideas: MobileNet splits the standard convolution into depthwise + 1×1 convolutions (cutting FLOPs by up to 9×); ShuffleNet's grouped 1×1 convolution + channel shuffle; MobileNetV2's inverted residual; ShuffleNetV2's four design guidelines built around memory access cost (MAC); and MobileNetV3's h-swish, EfficientNet's compound scaling, and GhostNet's cheap feature generation.
Technology Read more →
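The "up to 9×" figure for depthwise separable convolution can be checked with a short calculation. The sketch below counts multiply-accumulates (MACs) for a standard convolution versus a depthwise + 1×1 pair; the layer sizes are made-up example values, and bias terms and stride are ignored.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """MACs of a k x k standard convolution producing an h x w output map."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_macs(k, c_in, c_out, h, w):
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in * h * w  # one k x k filter per input channel
    pointwise = c_in * c_out * h * w  # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 14x14 output.
k, c_in, c_out, h, w = 3, 256, 256, 14, 14
ratio = standard_conv_macs(k, c_in, c_out, h, w) / depthwise_separable_macs(k, c_in, c_out, h, w)
print(f"reduction: {ratio:.2f}x")
```

The cost ratio of separable to standard is 1/c_out + 1/k², so for a 3×3 kernel the saving approaches, but never quite reaches, 9× as the number of output channels grows.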
2020 · 12 · 01
The Evolution of Convolutional Network Architectures: From LeNet to SENet

Following the ILSVRC timeline, this walks through several milestones in convolutional neural network architecture: LeNet-5 from 1998, AlexNet that ignited deep learning in 2012, GoogLeNet with its multi-branch Inception modules, the deeper VGG, Inception with Batch Normalization, ResNet that trained hundreds of layers via residual connections, and SENet that introduced channel attention.
Technology Read more →
2020 · 11 · 15
How to Measure Whether a Neural Network Is "Good" and "Fast"

Optimization presupposes measurement: if you cannot measure it, you cannot improve it. A neural network's performance actually splits into two lines: task-oriented accuracy (Top-1 for classification, mAP for detection, mIoU for segmentation, PSNR for restoration) and efficiency-oriented cost (latency, FLOPs, parameter count, peak memory). This piece lays out the definitions and computations of these metrics, and why FLOPs so often lie (DenseNet161 is more than twice as slow as VGG16), all in one go.
Technology Read more →
2020 · 11 · 03
A Quantitative Analysis of PyTorch Training Acceleration

Starting from a baseline, this article progressively optimizes training speed through a variety of software and hardware methods, ultimately cutting training time to one eighth.
Technology Read more →
2020 · 02 · 29
Notes on a Survey of Automated Data Augmentation Methods

Notes on three automated data augmentation methods: AutoAugment searches for augmentation policies with reinforcement learning, RandAugment drastically shrinks the search space, and Fast AutoAugment speeds up the search by merging policies.
Paper Notes, Data Augmentation, AutoML Read more →
2019 · 12 · 31
AutoDL Image Competition Tuning Log

A running log of tuning experiments on the AutoDL image-classification track (Pedro dataset): ResNet18 / 34 backbones, CELU in place of ReLU, resetting BN and Group Norm, FP16 mixed precision, ReID / pedestrian-attribute pretraining, and a series of controlled experiments including online-data reshuffling.
Hyperparameter Tuning, Image Classification, Experiment Log Read more →
2019 · 12 · 01
Milestones in Neural Architecture Search (NAS)

Neural Architecture Search (NAS) has been red-hot this year. This post is a quick rundown of the work I personally find most representative. Corrections and additions are welcome, hahaha.
Technology Read more →
2019 · 08 · 12
Feeding the GPU in Deep Learning

I trained a lot of models recently and found that brute force is not always magic—more GPUs is not always better. Sometimes one V100 and two V100s make almost no difference, because the bottleneck is elsewhere. Here is a write-up of a few small tricks I have picked up.
Technology Read more →
2019 · 07 · 10
Notes on Three CVPR 2019 Neural Network Pruning Papers

The core ideas of three CVPR 2019 pruning papers: Variational Pruning, which estimates channel saliency with a probability distribution; Importance Estimation; and Cascaded Projection, for end-to-end compression and acceleration. Also includes the day's experimental progress.
Paper Notes, Pruning, CVPR Read more →
2019 · 07 · 09
Progressive Pruning: Breaking the Train-Prune-Finetune Paradigm

Mainstream pruning is a three-stage train-prune-finetune pipeline. These notes work through the idea of progressive pruning: using relative cross-layer statistics to guide non-uniform pruning, and treating the number of channels pruned each round as an analog of the learning rate, converging gradually rather than deciding all at once.
Pruning, Model Compression, Ideas Read more →
2019 · 07 · 01
Dynamic Network Inference: Design Thoughts on Early Exit and Dynamic Channel Pruning

Starting from a sensitivity-based pruning experiment, some thoughts on designing dynamic inference networks: why FLOPs are unreliable, abandoning multi-network dynamics in favor of single-network dynamics, attaching a classifier after each block for early exit, and how dynamic channel selection can be combined with pruning.
Pruning, Dynamic Inference, Model Compression Read more →
2019 · 05 · 23
Notes on a Survey of Model Pruning Algorithms

Organizing a batch of model-pruning papers into three categories—predefined structured (L1, ThiNet, LASSO), automatic structured (Network Slimming, SSS), and others (AOFP, NISP, SNIP, Autopruner)—and discussing the thought-provoking conclusion that "for a predefined structure, training from scratch is enough."
Paper Notes, Pruning, Model Compression Read more →
2019 · 05 · 09
Notes on MobileNetV3 and the Lottery Ticket Hypothesis

Two notes: MobileNetV3 applies NetAdapt fine-tuning on top of the MnasNet seed architecture, plus head/tail and activation-function tweaks — engineering refinements; the Lottery Ticket Hypothesis argues that large networks hide trainable sparse sub-networks, contrasted with "Rethinking the Value of Network Pruning."
Paper Notes, MobileNet, Pruning Read more →
2019 · 05 · 06
DoReFa-Net: Notes on Low-Bit Quantized Training

DoReFa-Net quantizes weights, activations, and gradients separately to low bit-width, pushing most of the computation in both training and inference down to the bit-operation level. The notes cover the relationship between bit operations and dot products, the STE (Straight-Through Estimator), how each of the three is quantized, the overall algorithm flow, and the fused inference optimization.
Paper Notes, Quantization, Model Compression Read more →
2019 · 04 · 24
pix2pix: A General Framework for Image Translation with cGANs

pix2pix uses a conditional GAN for general-purpose image translation: the same architecture transfers across tasks just by swapping datasets. The notes cover why L1 / L2 losses are not used on their own (they only capture low frequencies and produce blurry results), how PatchGAN models high frequencies, noise inputs and stochasticity, and the role of BN.
Papers, GAN, Image Translation Read more →
2019 · 04 · 23
GAN Training Stability: Notes on Improved Techniques for Training GANs

Notes on "Improved Techniques for Training GANs": GAN training is essentially the search for a Nash equilibrium in a high-dimensional non-convex game, which makes convergence hard. A walkthrough of feature matching, minibatch discrimination, historical averaging, one-sided label smoothing, virtual BN, plus the Inception Score and the semi-supervised setup.
Paper Notes, GAN, Training Techniques Read more →
2019 · 04 · 18
A Collection of Original Ideas on NAS and Model Pruning

A batch of scattered ideas accumulated in NAS and model compression, spanning search strategies (MCMC, greedy linear structures), pruning, dynamic inference, and adaptive inference, gathered into a pick-and-experiment checklist, with reading notes on the AutoML Survey.
NAS, Model Compression, Ideas Read more →
2019 · 04 · 16
OctaveConv: Reducing Convolutional Redundancy via Frequency Decomposition

OctaveConv's angle isn't parameter count but the redundancy in feature maps—decompose them into low- and high-frequency components, control the ratio with a hyperparameter α, and store and compute the low-frequency part at a lower resolution. The parameter count stays the same while the compute drops.
Papers, Model Compression, Convolution Read more →
2019 · 04 · 11
A Side-by-Side Comparison of Attention Mechanisms in CV, with Single-Path NAS Notes

A side-by-side comparison of several attention mechanisms in CV: SENet weights channels, Non-local applies self-attention over spatial pixels, CBAM chains the two, DANet uses dual attention for segmentation, plus notes on how Single-Path NAS compresses multi-path search into a single path.
Papers, Attention, NAS Read more →
2019 · 04 · 10
Resource-Constrained NAS: Using Submodular Optimization to Justify Greedy Search

Searching for the highest-accuracy network under a resource budget is NP-hard. This paper casts NAS into the submodular optimization framework from combinatorial optimization, obtaining an approximation guarantee via a greedy algorithm and further cutting the compute cost—explaining, in theory, why greedy search is reasonable.
Paper Notes, NAS, Optimization Algorithms Read more →
2019 · 04 · 07
Randomly Wired Neural Networks: A New Take on Replacing NAS with Graph Theory

Kaiming He et al. use random graph generators (ER / BA / WS) to produce a network's wiring directly, skipping selection altogether—under the same generator parameters, the performance variance across different random seeds is tiny. Notes on the three random-graph algorithms, the human priors retained in the overall network structure, and the ImageNet / COCO / robustness experiments.
Paper Notes, NAS, Network Architecture Read more →
2019 · 04 · 01
Notes on a Survey of Deep Learning Interpretability

A set of interpretability notes built around "Visual Interpretability for Deep Learning: a Survey," walking through visualization, representation diagnosis, decomposing networks into explanatory graphs/decision trees, learning interpretable representations directly, and evaluation metrics — with Interpretable CNN, Network Dissection, and a black-box explanation survey thrown in.
Paper Notes, Interpretability, CNN Read more →
2019 · 03 · 30
A Survey of Generative Models and a Checklist of Image-Classification Training Tricks

Two sets of notes combined: one on the basic landscape of generative models (PixelRNN / CNN, AE / VAE, GAN), and one a checklist of training tricks for image classification (training, data augmentation, validation). They sit together because the regularization and stability ideas in generative models share a lot with discriminative training.
Generative Models, GAN, Training Tricks Read more →
2019 · 03 · 28
Dataset Distillation: Compressing an Entire Dataset Into a Handful of Images

Dataset distillation compresses an entire dataset into a few synthetic images per class plus a learning rate, reaching near full-training accuracy in just a few iterations. We work through the paper's five stages step by step: fixed / random initialization, linear-model analysis, multi-step gradients, and different initialization distributions.
Papers, Dataset Distillation, Model Compression Read more →
2019 · 03 · 21
Auto-DeepLab Reading Notes: Rethinking the Essence of the NAS Search Space

Auto-DeepLab moves NAS from classification to semantic segmentation, and beyond searching the internal structure of a cell, it also searches the macro network topology outside the cell. I use this paper to reflect on the essence of the NAS search space, and on one possible research direction.
Paper Notes, NAS, Semantic Segmentation Read more →
2019 · 03 · 20
Notes on TensorRT Inference Acceleration

A quick rundown of what TensorRT is for and the key optimizations it uses to speed up inference: layer fusion, automatic kernel selection, low precision, and the rough flow of building an engine.
TensorRT, Inference Acceleration, Model Deployment Read more →
2019 · 03 · 19
A Close Reading of the ResNet Training Bag of Tricks

"Bag of Tricks for Image Classification" stacks a pile of training tricks together, pushing ResNet-50's Top-1 on ImageNet from 75.3 to 79.29. A chapter-by-chapter record of the baseline pipeline, the hardware-oriented large batch / warm-up / FP16 tricks, plus cosine decay, label smoothing, knowledge distillation, mixup, and other refinements.
Paper Notes, Training Tricks, ResNet Read more →
2019 · 03 · 12
MobileNetV2 Close Reading: Inverted Residuals and Linear Bottlenecks

A close reading of MobileNetV2: why ReLU is dropped at the low-dimensional bottleneck, the inverted residual structure, choosing the expansion rate, the overall network and ReLU6 — all about balancing accuracy, FLOPs, and latency.
Paper Notes, MobileNet, Network Architecture Read more →
2019 · 02 · 23
Notes on DARTS and ProxylessNAS

DARTS uses a softmax to relax the discrete choice of operations into a continuous mixture, then takes gradients with respect to the architecture parameters. To address its memory bottleneck, ProxylessNAS switches to binarized path sampling, searches directly on ImageNet, and models hardware latency as a differentiable objective. Reading the two together makes clear how differentiable NAS evolved from "saving search cost" to "saving memory and fitting the hardware."
Papers, NAS, Deep Learning Read more →
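The softmax relaxation at the heart of DARTS can be shown with a toy example. The sketch below mixes a few scalar "operations" using softmax weights over architecture parameters, then picks the strongest one to discretize; the candidate operations, the `alphas` values, and the function names are all hypothetical stand-ins (real DARTS mixes convolutions, pooling, and skip connections, and learns the alphas by gradient descent, which is not shown).

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy candidate operations on a scalar input (stand-ins for conv / pool / skip).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "negate": lambda x: -x,
}


def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of every candidate operation."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(wt * op(x) for wt, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After search, keep only the operation with the largest architecture weight."""
    return max(alphas, key=alphas.get)


uniform = {"identity": 0.0, "double": 0.0, "negate": 0.0}
searched = {"identity": 0.1, "double": 1.5, "negate": -2.0}
print(mixed_op(3.0, uniform))
print(discretize(searched))
```

Because every candidate runs on every forward pass, memory grows with the number of operations, which is the bottleneck the summary says ProxylessNAS addresses by sampling paths instead.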
2018 · 11 · 22
Learning to Push by Grasping: Using multiple tasks for effective learning

End-to-end learning frameworks have become popular in robotic control: they take state/images as direct input and directly output predicted torques and action parameters. But they have been criticized for their heavy data requirements, sparking debate about their scalability—does the end-to-end approach require building a separate model for every task? Intuitively, sharing across tasks should help, since they all require some common understanding of the environment. This paper attempts the next step for data-driven end-to-end learning frameworks: moving from task-specific models to a joint model across multiple robotic tasks, with surprising results. Under the same amount of data, multi-task learning outperforms single-task learning. For the grasp task, for example, a model trained on 2.5k grasp samples plus 2.5k push samples outperforms a model trained on 5k grasp samples.
Technology Read more →
2018 · 11 · 17
Playing Atari with Deep Reinforcement Learning

This paper by Volodymyr Mnih et al., presented at the NIPS 2013 Deep Learning Workshop, is one of the two founding works of DQN; the other is the 2015 Nature paper.
Technology Read more →
2018 · 11 · 02
The Cityscapes Dataset

Cityscapes is a dataset commonly used for semantic segmentation. Its classes are grouped into 8 categories, one of which is named "void". There are 30 classes in all, but the annotations use 35 labels in total, because labels such as "unlabeled" are not counted as classes.
Technology Read more →
2018 · 10 · 30
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Earlier networks for segmentation were either too slow or too inaccurate. This paper designs EDANet, a network built from modules that combine asymmetric convolution, dilated convolution, and dense connectivity. It outperforms FCN across the board, and does so without a decoder structure, a context module, a post-processing scheme, or a pretrained model. Experiments are run on Cityscapes and CamVid.
Technology Read more →
2018 · 10 · 22
Darts: Differentiable Architecture Search

This paper takes on architecture search by formulating the task in a differentiable form, instead of the traditional approach of using reinforcement learning over a discrete, non-differentiable space. The method is based on a continuous relaxation of the architecture representation, allowing efficient methods such as gradient descent to be used for architecture search. Experiments show that the algorithm performs well at discovering high-performance CNN architectures for image recognition and RNN architectures for language modeling, and is far faster than existing state-of-the-art non-differentiable search methods.
Technology Read more →
2018 · 10 · 15
Compressing Neural Networks with the Hashing Trick

As deep networks are increasingly deployed on mobile devices, a dilemma becomes ever more apparent: the trend in deep learning is to develop models that can absorb larger and larger datasets, yet mobile devices have limited storage and cannot hold overly large models. This paper proposes HashedNets, which reduce model size by exploiting the inherent redundancy inside neural networks. HashedNets use a low-cost hash function to randomly group connection weights into different hash buckets, and all connections that fall into the same bucket share a single parameter value. These parameters are tuned during standard backpropagation, and the hashing process introduces no extra memory overhead. Performance on a range of benchmark datasets shows that HashedNets can substantially reduce storage requirements while preserving generalization performance.
Technology Read more →
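The weight-sharing idea in the HashedNets summary above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's implementation: `bucket_index`, `HashedLayer`, and the use of CRC32 as the cheap hash are all choices made here for the example.

```python
import zlib

def bucket_index(i, j, num_buckets, seed=0):
    """Map connection (i, j) to a bucket with a cheap deterministic hash (CRC32 here)."""
    key = f"{seed}:{i}:{j}".encode()
    return zlib.crc32(key) % num_buckets

class HashedLayer:
    """A virtual n_in x n_out weight matrix backed by only len(shared_params) real weights."""

    def __init__(self, n_in, n_out, shared_params):
        self.n_in = n_in
        self.n_out = n_out
        self.params = shared_params  # the only weights actually stored

    def weight(self, i, j):
        # No index table is kept: the bucket is recomputed from (i, j) on demand.
        return self.params[bucket_index(i, j, len(self.params))]

    def forward(self, x):
        """Plain matrix-vector product using the virtual weights."""
        return [
            sum(x[i] * self.weight(i, j) for i in range(self.n_in))
            for j in range(self.n_out)
        ]
```

For example, `HashedLayer(4, 3, [0.5, -1.0])` behaves like a 12-weight layer while storing two parameters. During training, the gradient of a shared parameter would be the sum of the gradients of all connections hashed to its bucket.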
2018 · 10 · 11
ShuffleNetV2

Many network designs today consider only indirect metrics of computational complexity (such as FLOPs), yet direct metrics (such as speed) are not determined by FLOPs alone: MAC (memory access cost) and platform characteristics also influence speed. This paper argues for measuring the direct metric on the target platform, which is far better than considering FLOPs alone. Based on a series of controlled experiments, it proposes several guidelines for efficient networks, and from those guidelines derives a new architecture, ShuffleNetV2. Comprehensive ablation experiments show the model achieves a state-of-the-art trade-off between speed and accuracy.
Technology Read more →
2018 · 10 · 10
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

This paper introduces a highly efficient network, ShuffleNet, which centers on two operations—pointwise group convolution and channel shuffle—that drastically cut computation while maintaining accuracy. It outperforms prior networks on both ImageNet and COCO.
Technology Read more →
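The channel shuffle operation named in the ShuffleNet summary above amounts to a reshape, a transpose, and a flatten. The sketch below shows it on a flat list of channel indices instead of a real feature map; the function name and the list representation are assumptions made for the example.

```python
def channel_shuffle(channels, groups):
    """Reshape to (groups, n), transpose to (n, groups), then flatten back to a list."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    # Row g holds the channels produced by group g of the previous group convolution.
    grid = [channels[g * n:(g + 1) * n] for g in range(groups)]
    # Reading column by column interleaves channels from different groups.
    return [grid[g][k] for k in range(n) for g in range(groups)]
```

With 6 channels and 3 groups, `channel_shuffle([0, 1, 2, 3, 4, 5], 3)` returns `[0, 2, 4, 1, 3, 5]`, so the next group convolution sees channels from every earlier group.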
2018 · 10 · 04
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Targeting mobile and embedded vision applications, this paper proposes an efficient model called MobileNets, a lightweight neural network built on depthwise separable convolutions. The model uses two hyperparameters to trade off accuracy against latency, and extensive experiments balancing the two were conducted on ImageNet, demonstrating strong performance compared with other models. Experiments also showcase MobileNets' strengths across a wide range of applications, including object detection, fine-grained classification, face attributes, and large-scale geolocalization.
Technology Read more →
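To see why the depthwise separable convolutions in the MobileNets summary above are cheap, it helps to count multiply-adds. The sketch below assumes a `dk x dk` kernel, `m` input channels, `n` output channels, and a `df x df` feature map whose size is unchanged by the layer (stride 1 with padding); the function names are made up for the example.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution."""
    return dk * dk * m * n * df * df

def depthwise_separable_cost(dk, m, n, df):
    """Depthwise dk x dk filtering per channel, then a 1 x 1 pointwise convolution."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise

def cost_ratio(dk, m, n, df):
    """Separable cost divided by standard cost; algebraically 1/n + 1/dk**2."""
    return depthwise_separable_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)
```

For a 3 x 3 kernel the ratio is close to 1/9 once `n` is large, which is roughly an 8 to 9 times reduction in computation for that layer.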
2018 · 09 · 28
Notes on Inception-V4

In recent years, very deep convolutional neural networks have been the single biggest driver of improvements in image recognition performance. The Inception architecture achieves strong performance while keeping computational cost relatively low. Combining residual connections with conventional architectures produced the best results on the 2015 ILSVRC, comparable to Inception-V3. This paper considers combining Inception networks with residual connections; there is ample evidence that residual connections can greatly accelerate the training of Inception networks, and also evidence that a residual Inception slightly outperforms a non-residual Inception of almost the same computational cost. The paper also proposes several new Inception networks, both with and without residual connections, and these changes likewise markedly improve single-frame classification performance on the 2012 ILSVRC. Finally, it shows that scaling the activations appropriately can make the training of very wide residual Inception networks more stable.
Technology Read more →
2018 · 09 · 21
On Differentiating Vectors and Matrices

Machine learning algorithms involve a great deal of matrix-related differentiation and derivatives. Here we introduce some common derivative formulas for matrices and vectors.
Technology Read more →
2018 · 09 · 14
A General Solution to Stock-Trading Problems in Dynamic Programming

There is a class of dynamic-programming problems that give you a sequence of stock prices and ask for the maximum profit you can earn by buying and selling. These problems come in many variants — only one transaction allowed, unlimited transactions, an added transaction fee, and so on. In other words, the maximum profit is generally determined by the trading day and the maximum number of transactions allowed (where one transaction is a single buy paired with a single sell).
Technology Read more →
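One common way to write the general solution described in the stock-trading summary above is a table indexed by the number of transactions used, updated once per trading day. The sketch below covers the "at most k transactions" variant only (no fee or cooldown); the function and variable names are choices made for this example.

```python
def max_profit(prices, k):
    """Maximum profit with at most k transactions (one transaction = one buy plus one sell)."""
    if not prices or k <= 0:
        return 0
    # With k >= len(prices) // 2 the limit never binds: collect every upward move.
    if k >= len(prices) // 2:
        return sum(max(b - a, 0) for a, b in zip(prices, prices[1:]))
    # cash[j]: best profit with at most j completed transactions, holding no stock.
    # hold[j]: best profit while holding the stock bought for transaction j.
    cash = [0] * (k + 1)
    hold = [float("-inf")] * (k + 1)
    for price in prices:
        for j in range(1, k + 1):
            hold[j] = max(hold[j], cash[j - 1] - price)  # buy today, or keep holding
            cash[j] = max(cash[j], hold[j] + price)      # sell today, or stay out
    return cash[k]
```

Setting `k = 1` gives the single-transaction variant and a very large `k` gives the unlimited one, which is why the day and the transaction count are the natural state of the recurrence.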
2018 · 08 · 31
Definition of Convex Sets and Common Convex Sets

Similar to solving the equality-constrained optimization problems discussed earlier, optimization problems with inequality constraints can likewise be solved using the method of Lagrange multipliers
Technology Read more →
2018 · 08 · 24
Deriving the SVM (3)

The previous posts covered the derivation of the hard-margin SVM. This post continues with the mathematical derivation of the soft-margin SVM, which allows some samples to be misclassified when the data is not linearly separable.
Technology Read more →
2018 · 08 · 18
Deriving the SVM (Part 2)

In the previous post (Part 1) we discussed the derivation of the hard-margin SVM and its dual form, whose dual problem can be simplified into the following form
Technology Read more →
2018 · 08 · 10
Deriving the SVM (Part 1)

The SVM is a classic method in machine learning. Beyond the hard-margin SVM, there are also variants such as the soft-margin SVM and the kernel trick. This article focuses on deriving the hard-margin SVM.
Technology Read more →
2018 · 07 · 26
Solving Systems of Linear Equations (3)

The pseudoinverse introduced here is the Moore-Penrose inverse
Technology Read more →
2018 · 07 · 21
Solving Systems of Linear Equations (2)

The previous post covered one case of linear systems—where the number of unknowns is smaller than the number of equations—and introduced the least-squares method. This post covers the other case, where the number of equations is smaller than the number of unknowns. Here the system has infinitely many solutions, but the one closest to the origin—the solution with the smallest norm—is unique. This is the minimum-norm solution of a linear system that we introduce here.
Technology Read more →
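For the underdetermined case in the summary above, the minimum-norm solution of `A x = b` is `x = A^T (A A^T)^{-1} b` when `A` has full row rank. The sketch below handles only the simplest instance, a single equation `a . x = b`, where `A A^T` is the scalar `a . a`; the function name is made up for the example.

```python
def min_norm_solution(a, b):
    """Smallest-norm x with sum(a[i] * x[i]) == b, for one equation with a != 0."""
    aa = sum(v * v for v in a)  # A A^T reduces to the scalar a . a
    if aa == 0:
        raise ValueError("coefficient vector must be nonzero")
    # x = a * b / (a . a): the solution is a multiple of a, i.e. orthogonal to the null space.
    return [v * b / aa for v in a]
```

For `x + 2y = 5`, `min_norm_solution([1.0, 2.0], 5.0)` gives `[1.0, 2.0]`, which has a smaller norm than other solutions such as `[5.0, 0.0]`.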
2018 · 07 · 20
207. Course Schedule

This problem uses DFS and BFS to determine whether a given graph admits a topological ordering.
Technology Read more →
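The BFS half of the Course Schedule approach mentioned above can be sketched with Kahn's algorithm: repeatedly remove nodes with no remaining prerequisites, and check whether every node gets removed. The input convention below (each pair `[a, b]` means `b` must come before `a`) follows the usual statement of the problem; the function name is an assumption.

```python
from collections import deque

def can_finish(num_courses, prerequisites):
    """Return True if the prerequisite graph has a topological ordering (no cycle)."""
    adjacent = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        adjacent[prereq].append(course)
        indegree[course] += 1
    # Start from every course that has no prerequisites.
    queue = deque(c for c in range(num_courses) if indegree[c] == 0)
    taken = 0
    while queue:
        node = queue.popleft()
        taken += 1
        for nxt in adjacent[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Courses stuck on a cycle never reach indegree 0, so they are never counted.
    return taken == num_courses
```

The DFS variant reaches the same answer by looking for a back edge, that is, a node revisited while it is still on the current recursion path.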
2018 · 07 · 20
Solving Systems of Linear Equations (1)

This article discusses solving one particular case of systems of linear equations, namely considering the system
Technology Read more →
2018 · 07 · 14
Numerical Computation in Machine Learning (1)

Machine learning algorithms usually require a great deal of numerical computation, that is, solving for approximate values iteratively rather than obtaining analytical solutions. These algorithms typically involve optimization and solving systems of linear equations. Representing real numbers with a finite number of bits on a computer carries inherent error, so we need certain methods to guarantee the precision of our computations.
Technology Read more →
2018 · 07 · 06
Training a Simple Neural Network with TensorFlow

Here we use TensorFlow's Eager Execution to build the model, so that we no longer have to create a Graph and Session as before, which makes training a neural network more convenient and faster. Below we use the Iris dataset as an example to train a neural network, with the code taken from Google's tutorial.
Technology Read more →
2018 · 06 · 29
Doing Deep Learning on GeekCloud

I was recently working on an image-related deep learning assignment from my professor. After debugging the code, I found my computer didn't have enough memory (an 8GB laptop), and later I discovered a really handy deep learning cloud service platform
Technology Read more →
2018 · 06 · 15
LiDAR + Camera Data Fusion on KITTI

KITTI offers many datasets; here we pick the raw_data (raw data) for fusion.
Technology Read more →
2018 · 06 · 08
Solving Optimization Problems with Inequality Constraints

Similar to the equality-constrained optimization problems discussed earlier, optimization problems with inequality constraints can also be solved using the method of Lagrange multipliers.
Technology Read more →
2018 · 06 · 02
Constructors in C++

Every class defines how its objects are initialized. A class controls the initialization of its objects through one or more special member functions called constructors. The job of a constructor is to initialize the data members of a class object, and a constructor runs whenever an object of the class is created.
Technology Read more →
2018 · 06 · 01
Associative Containers in C++

Associative containers support efficient lookup and access by key. The two primary associative containers are set and map. The elements of a map are key-value pairs, where the key acts as an index and the value represents the data associated with that index; the elements of a set contain only a key. A set supports efficient key lookup: std::set keeps its keys ordered and is typically implemented as a balanced binary search tree (usually a red-black tree), while the hash-table-based counterpart is unordered_set.
Technology Read more →
2018 · 06 · 01
Deriving Backpropagation for Neural Networks

For the training process of a neural network, the backpropagation algorithm lies at its core
Technology Read more →
2018 · 05 · 25
Sequential Containers in C++

A container is a collection of objects of a specific type. Sequential containers give you control over the order in which elements are stored and accessed.
Technology Read more →
2018 · 05 · 24
An Introduction to Decision Trees and Random Forests

A decision tree is a method for classification and regression; this article focuses mainly on decision trees used for classification. A decision tree has a tree-like structure, and in classification problems it represents the process of classifying data based on features. It can usually be regarded as a collection of if-then rules, or as a conditional probability distribution defined over the feature space and the class space. Its main advantages are good model interpretability and fast classification. During training, the decision tree model is built from the training data according to the principle of minimizing a loss function. During prediction, the decision tree is used to classify new data. Learning a decision tree usually involves three steps: feature selection, decision tree generation, and decision tree pruning. These decision-tree ideas come mainly from the ID3 algorithm proposed by Quinlan in 1986 and the C4.5 algorithm proposed in 1993, as well as the CART algorithm proposed by Breiman et al. in 1984.
Technology Read more →
2018 · 05 · 18
IO Classes in C++

C++ does not handle input and output directly; instead, it relies on a set of types defined in the standard library to deal with IO. These types support IO operations that read data from devices and write data to devices, where a device can be a file, a console window, and so on. Some types also allow in-memory IO, that is, reading data from a string, writing data to a string, and the like.
Technology Read more →
2018 · 05 · 18
Solving Optimization Problems with Equality Constraints

This article discusses optimization problems of the following form
Technology Read more →
2018 · 05 · 11
Duality in Linear Programming

Every linear programming problem has a corresponding dual problem. The dual is itself a linear program, and the dual of the dual is the original problem. The optimal solution of the primal can be obtained from the dual; sometimes solving a linear program via duality theory is simpler and reveals the essence of the problem more clearly. Inspired by duality theory, the performance of the simplex method has been improved, and some non-simplex methods for solving linear programs have emerged, which this article does not cover in detail.
Technology Read more →
2018 · 05 · 04
Parameter Passing in C++ Functions

In a C++ program, calling a function requires passing it arguments. Apart from an empty parameter list (void), parameter passing comes in two kinds: pass by reference and pass by value.
Technology Read more →
2018 · 05 · 04
The Simplex Algorithm for Solving Linear Programming Problems

In 1947, Dantzig proposed a method for solving linear programming problems, now known as the simplex method. It is a concise and efficient algorithm, hailed as one of the ten algorithms with the greatest impact on scientific development and engineering practice in the 20th century.
Technology Read more →
2018 · 04 · 27
An Overview of Linear Programming

Among optimization problems there is a class known as linear programming problems, which belong to constrained optimization. Linear programming is the problem of finding the extremum of a linear objective function under linear constraints (equalities or inequalities).
Technology Read more →
2018 · 04 · 26
The const Keyword in C++

When programming we often need to define a kind of variable whose value never changes, for example pi=3.14, e=2.72, or the elastic modulus of a material. That is when the const keyword comes in.
Technology Read more →
2017 · 03 · 29
Self-Tuning PID Parameters with a Genetic Algorithm (Simulink Implementation)

Automatically tuning PID parameters with a genetic algorithm in MATLAB / Simulink: building the simulation platform, designing an error-centric objective function, writing the interface function between the GA and Simulink, and iterating with a main function. Of limited practical use, but worth noting as a learning method.
Genetic Algorithms, PID, Simulink Read more →
2017 · 03 · 27
How Genetic Algorithms Work, with a MATLAB Implementation

Walking through the complete genetic-algorithm pipeline on a classic problem—finding the maximum of a two-variable function—covering individual encoding, the initial population, fitness, selection / crossover / mutation, with a MATLAB implementation and a quick reference for the most-used functions in the Sheffield GA Toolbox.
Genetic Algorithms, MATLAB, Optimization Algorithms Read more →
2017 · 03 · 26
Plotting a Vehicle's Dynamic Performance Curves with MATLAB

A major assignment from my Automotive Theory class: given the parameters of a medium-duty truck, use MATLAB to compute and plot the driving-force / road-resistance balance diagram, the acceleration-time curve, the dynamic-factor diagram, and the power-balance diagram, and along the way find the top speed and the maximum gradeability.
MATLAB, Automotive Theory, Vehicle Engineering Read more →
