Frank
And lost be the day to us in which a measure was not danced.
Und verloren sei uns der Tag, wo nicht einmal getanzt wurde!
—— Nietzsche, Thus Spoke Zarathustra
- 2026 · 06 · 13
On Business Taste
When SpaceX was on the brink of bankruptcy, Peter Thiel made a life-saving investment that turned into over 50 billion dollars on the day the company went public. From Girard's mimetic desire, to the four traits of monopoly in Zero to One, to Warren Buffett and Duan Yongping's principle of 'look at the business model first'—I've come to realize that taste in business models is the single most important thing, whether you're investing or building a company.
- 2026 · 06 · 12
How to Persuade People
The best theory of persuasion comes not from a modern business school but from Aristotle, twenty-three hundred years ago: Ethos (credibility), Pathos (emotion), Logos (logic). Every popular persuasion framework today—Claim → Reason → Evidence for a raise, STAR for interviews, BLUF for email, Problem → Solution → Traction for the pitch, Pain → Gain → Proof for sales, the Golden Circle for the launch event, BATNA for negotiation, AIDA for copywriting, Nonviolent Communication for intimate relationships—is at bottom a localized version of these three elements for a different audience. This piece walks through ten scenarios one by one and lands on a single line: the framework is the map, sincerity is the compass.
- 2026 · 06 · 09
Lessons From Building Products for Overseas Markets
Claude Code has gotten really good lately, and a lot of people now have a shot at building their own products. This post shares the tech stack and the various SaaS platforms I commonly use when building products for overseas markets: static sites, full front-end/back-end apps, mobile, and desktop—covering framework choices, deployment, databases, auth, payments, AI services, and image hosting, with free-tier limits and starting costs wherever I can.
- 2026 · 06 · 05
How to Iterate on Yourself Systematically
Karpathy on how to become an expert and Feynman on how to test real understanding are, at bottom, the same thing: use output to check input. This piece maps the method—do a project first, fill in theory later, only compare yourself to your past self—onto the training loop of a neural network. Forward is acting first, Loss is the gap from the ideal result, and Backward plus gradient descent is the review-and-improve step. The biggest difference between people and networks is that a large Loss hurts us, so the key is to separate criticizing behavior from negating worth: gradient descent only updates the parameters (your behavior, your skills), so don't blow up the architecture (who you are). Two final tricks: write your mistakes down serially, then immediately move on to the next round—don't keep running Backward on the same sample and overfit.
- 2026 · 06 · 03
Common Go Concurrency Pattern Templates
A ready-to-use Go concurrency reference. From "locking shared variables, Once, channels, Pub/Sub" to "select/Actor, pipeline/fan-out/worker pool, context, errgroup," each pattern gives when to use it, template code, and key pitfalls, closing with a cheat sheet: if you can avoid sharing, communicate instead; if you must share, wrap it with the simplest synchronization possible; and every goroutine needs a clear exit path.
- 2026 · 05 · 27
My Claude Code Practices
I've been using Claude Code for almost a year now. As one of its earliest users, I've watched both Claude Code and the models themselves improve enormously over the past year. This post shares some ways I use it beyond writing code, along with what I've learned: a workflow built on an always-on home machine + Tailscale + tmux; managing frontend and backend with a monorepo + submodules + OpenAPI; doing Deep Research with tree-structured prompts; cleaning up long conversations and messy desktop folders; and using it to draw SVGs, process images, make videos with Remotion, and more.
- 2026 · 05 · 19
Reason Is the Slave of the Passions
Hume said that reason is, and ought only to be, the slave of the passions. Translate that into language a software engineer knows today: emotion is the machine instruction set, the thing the CPU actually executes; reason is a high-level language that has to be compiled before it can run. Any chain of rational reasoning ultimately has to bottom out in something you care about. A good system first builds its framework in a high-level language, then profiles out the hot spots and hand-optimizes that small slice to run close to the hardware. Port the same architecture onto life: use reason to build the big framework, use love to drive the high-frequency everyday.
- 2026 · 05 · 17
The Barbell Strategy and What It Teaches Us About Personal Growth
Taleb's barbell strategy: put your resources at the extremes and deliberately avoid the middle. The idea was first used to talk about investing, then extended to reading, careers, even life planning—but its core was never about how much to put on each end. It's about accepting that the world is fundamentally uncertain: first make sure you survive, then go after the opportunities that might let you leap.
- 2026 · 05 · 09
Hong Kong's Mountains: A Hiking Map of Trails, History, and Urban Memory
The 100 km MacLehose Trail, Lion Rock sung about for fifty years, the pavilion at Pat Sin Leng that fire kept failing to consume—70% of Hong Kong is country park, and every trail threads through a piece of institutional history, an old song, or a war. 10 routes × history × culture.
- 2026 · 05 · 09
Where Ideas Are Born: A Pilgrimage Map of European Philosophy
Plato's Academy, a Danube army camp, a Bordeaux tower, an Ulm stove-room, the Sils-Maria boulder, a Black Forest hut—putting 12 philosophical classics back into the specific rooms, cities, and landscapes where they were born.
- 2026 · 04 · 11
Reading Notes: Freedom of Money
I spent two days reading Changpeng Zhao's autobiography Freedom of Money. I'm not into crypto, but I've always been curious about CZ's legendary story—selling his house to go all-in on Bitcoin, becoming the richest ethnic Chinese person, building a crypto empire, and serving time in a U.S. prison. A few reflections.
- 2026 · 04 · 06
Survival, Competition, and Freedom
There are three stages to the things people do: in the survival stage you are driven by instinct, in the competition stage your coordinate system is defined by your rivals, and when you finally reach freedom, what you face is an open wilderness with no compass. Freedom without values is a painful prison; freedom with values is the most powerful engine. Charles Zhang and Elon Musk are two of the most telling case studies.
- 2026 · 02 · 15
Analyzing the Business Models of Independent Content Creators
Content creation may be the biggest lever available to ordinary people today: low barrier to entry, high ceiling, zero marginal cost of distribution. Looked at through the lens of business models, there are really only three ways to make a living from it: sell content directly, help others sell goods, or sell your own goods.
- 2026 · 02 · 01
The Boundaries of Compression
Knowledge falls into two kinds: Techne, which can be serialized, and Metis, which depends on context, trial and error, and the body. AI excels at compressing the former but cannot touch the latter—and that is precisely the boundary of its capability. As standardized work gets rapidly replaced, the experience that cannot be compressed becomes scarcer instead.
- 2026 · 01 · 23
Book Notes: The Technological Republic
Palantir CEO Alex Karp released a new book, The Technological Republic, in 2025, and it was published in mainland China at the end of the year. I bought it and read it right away. The views in the book represent the right-wing current of thought in Silicon Valley, and its shadow can be seen everywhere in actual American politics.
- 2025 · 12 · 21
The Data Supplier Behind the Large Models: Surge AI
I first heard of Surge AI from a podcast interview with Edwin Chen, right as they were raising their first round of funding. Edwin came across as extremely pragmatic and efficient, and his views left a strong impression.
- 2025 · 12 · 19
The VL Model Behind the Doubao AI Phone
According to public reporting, the model used in the Doubao AI phone is a closed-source version of UI-TARS optimized for phones. UI-TARS itself is the result of SFT on top of Alibaba's Qwen2 VL, with the 7B version currently open-sourced (Qwen2 VL has open-sourced models from 3B to 72B). Rather than dwelling on Qwen here (Qwen2 VL already has UI Operation capabilities), this post focuses on how UI-TARS improves further on top of Qwen2 VL, split into data and training.
- 2025 · 10 · 15
A Dimension-by-Dimension Walkthrough of LLM Inference, with the Core Formulas
Using Llama 3 8B as the baseline, we trace the entire inference path — token ID → embedding → Transformer → sampling — and write out the dimensional flow and core formulas from memory.
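A rough sketch of that dimensional flow in code. The configuration numbers are the publicly released Llama 3 8B values as I know them, not figures quoted from the post, and `LlamaConfig`, `trace_shapes`, and `count_params` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaConfig:
    # Assumed Llama 3 8B hyperparameters (not quoted in the post itself).
    vocab_size: int = 128_256
    hidden: int = 4_096
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: int = 8          # grouped-query attention: 4 query heads per KV head
    ffn_hidden: int = 14_336

    @property
    def head_dim(self) -> int:
        return self.hidden // self.n_heads


def trace_shapes(cfg, batch, seq):
    """Tensor shapes along one prefill pass; the block part repeats n_layers times."""
    return [
        ("token ids", (batch, seq)),
        ("embedding", (batch, seq, cfg.hidden)),
        ("Q", (batch, cfg.n_heads, seq, cfg.head_dim)),
        ("K / V", (batch, cfg.n_kv_heads, seq, cfg.head_dim)),
        ("attention scores", (batch, cfg.n_heads, seq, seq)),
        ("attention output", (batch, seq, cfg.hidden)),
        ("FFN inner", (batch, seq, cfg.ffn_hidden)),
        ("block output", (batch, seq, cfg.hidden)),
        ("logits", (batch, seq, cfg.vocab_size)),
    ]


def count_params(cfg):
    """Parameter count, assuming untied input/output embeddings and bias-free layers."""
    kv_dim = cfg.n_kv_heads * cfg.head_dim
    attention = 2 * cfg.hidden * cfg.hidden + 2 * cfg.hidden * kv_dim  # Wq, Wo, Wk, Wv
    ffn = 3 * cfg.hidden * cfg.ffn_hidden                              # gate, up, down
    norms = 2 * cfg.hidden                                             # two RMSNorms per block
    per_layer = attention + ffn + norms
    embeddings = 2 * cfg.vocab_size * cfg.hidden                       # input table + LM head
    return cfg.n_layers * per_layer + embeddings + cfg.hidden          # + final norm


for name, shape in trace_shapes(LlamaConfig(), batch=1, seq=16):
    print(f"{name:18s} {shape}")
print(f"{count_params(LlamaConfig()):,} parameters")
```

Under these assumptions the count lands at about 8.03B, which is a handy sanity check that the dimensions are wired together consistently.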
- 2024 · 04 · 04
Using UTM Tags to Analyze Traffic Sources
Promotion usually runs several channels at once: cold email, Google Ads, Twitter, SEO, communities. Without tagging your links, GA can't tell them apart. This post lays out the five UTM parameters clearly and gives a dozen real-world naming examples.
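As a small illustration, here is one way to tag links consistently. The five parameter names are the standard `utm_source`, `utm_medium`, `utm_campaign`, `utm_term`, and `utm_content`; the helper name `add_utm` and the example values are made up for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url, source, medium, campaign, term=None, content=None):
    """Return url with UTM parameters appended, keeping the rest of the query string."""
    parts = urlsplit(url)
    # Drop stale utm_* tags so re-tagging a link does not duplicate them.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith("utm_")]
    utm = {
        "utm_source": source,      # where the click came from, e.g. "twitter"
        "utm_medium": medium,      # channel type, e.g. "social", "email", "cpc"
        "utm_campaign": campaign,  # the promotion this link belongs to
        "utm_term": term,          # optional: paid keyword
        "utm_content": content,    # optional: which creative or link variant
    }
    query += [(k, v) for k, v in utm.items() if v is not None]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_utm("https://example.com/pricing?ref=1", "twitter", "social", "spring_launch"))
```

Keeping values lowercase and underscore-separated avoids the same channel showing up as several rows in reports.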
- 2024 · 03 · 21
jenni.ai's Cold Start and Growth Strategy
Jenni.ai is an AI tool that assists with writing and reading academic papers. It has already reached $5M ARR with 2.5M users and is still growing fast. In its early days it focused on SEO writing and could barely survive; later it narrowed down to academic writing and came back from the dead. CEO David Park has candidly shared their growth path from zero to several million dollars (from cold calls, to lurking in groups, to viral growth on TikTok), and much of it is worth learning from.
- 2024 · 03 · 03
Startup Perks Worth Knowing About
A systematic rundown of the perks the big international vendors offer startups, covering cloud, AI tokens, banking, and SaaS tools, with special attention to which ones you can apply for directly without a VC referral.
- 2023 · 04 · 15
Shuangyue Bay: When You're Worn Out, Come Here to Lie Flat and Watch the Sea
Just two hours from Shenzhen, Shuangyue Bay in Huizhou is a place made for doing nothing: watch the sunrise over the water from your guesthouse balcony, set off fireworks and grill skewers on the beach, eat seafood far cheaper than in Shenzhen, then climb the lookout to see how two curving coastlines come together to form a "double moon." Includes a concrete two-day, one-night itinerary.
- 2023 · 02 · 11
Hong Kong Country Trails: Walking the Ridges and the Coast of Ten Thousand Columns
Hong Kong is more than Victoria Harbour and Temple Street. Cross the border from Shenzhen and in half a day you can step into its other side—weathered ridges, alpine meadows like Wugong Mountain, deserted beaches all to yourself, and a coast of ten thousand columns left behind by Jurassic volcanoes. This piece gathers four country hikes: Beiling Double Crossing, Kai Kung Leng, Sharp Peak, and Po Pin Chau.
- 2022 · 12 · 10
Zhongshan + Jiangmen: Visiting the Filming Locations of "The Knockout" and "Let the Bullets Fly"
Stringing Zhongshan and Jiangmen into a single weekend route: Thirty-Three Market Street recreates the old factory district from "The Knockout," the Mei Family Mansion is the Goose Town of "Let the Bullets Fly," a single ticket to the Zhongshan Film Studio lets you shoot scenes from many countries, plus two easy hikes up Shitou Mountain and Yaji Mountain. A hands-on guide for movie fans and photo lovers.
- 2022 · 11 · 12
Dapeng Peninsula: The Cleanest Sea and the Most Beautiful Coastline
Right next to a megacity like Shenzhen, there's this pristine landscape of mountains and sea. Dongxichong, Dayanding, Daluqiang, the Mermaid Cave at Luzui Villa, and the Shenzhen Observatory—five spots on the Dapeng Peninsula I keep coming back to, from hardcore hikes to gentle seaside strolls you can take your parents on.
- 2022 · 10 · 15
Shenzhen After Hours: A Few Places for Weeknights and Weekends
No need to travel far—right within the city, Shenzhen hides plenty of places worth a visit: chase the sunset up Tanglang Mountain after work, watch the sun sink below the horizon at the Mangrove Park, listen to live music at sea aboard the Greater Bay Area No. 1 cruise, pray at Hongfa Temple and stroll through Fairy Lake Botanical Garden, or hike Meishajian at night to overlook Yantian Port.
- 2022 · 08 · 13
Wugong Mountain: Alpine Meadows Above the Clouds
Wugong Mountain is in Pingxiang, Jiangxi, and is best known for its rolling alpine meadows—green grasslands in summer, with a better chance of catching a sea of clouds in winter. This is a record of a three-day, two-night trip from Shenzhen: taking the cable car up to skip the thousand-meter climb, walking the cliffside boardwalk, watching the sunrise, spending a night on the summit, and happening to catch a bonfire music festival on the meadow.
- 2021 · 06 · 12
NCNN Peak Memory Benchmark: A Layer-by-Layer Analysis of MobileNet
To understand how much peak memory a model actually consumes during inference, I ran a layer-by-layer benchmark of MobileNet with NCNN on x86, compared several networks side by side, and analyzed how light_mode, fp16, and int8 affect peak memory — along with the differences in how "peak memory" itself is defined.
- 2021 · 04 · 18
A Panorama of ML System Design: Inference, Training, Data, and Deployment
Course notes from Stanford's CS 329S, Machine Learning Systems Design: inference and learning paradigms, data storage and feature management, sampling and class imbalance, parallel training and system testing, experiment-management tooling, and model deployment — a full picture of taking ML systems into production.
- 2021 · 04 · 12
Profiling Performance in Deep Learning Training and Inference
Performance bottlenecks in training and inference often aren't obvious; you need profiling to analyze them quantitatively. Notes on the profiling methods I use for the training phase (breaking down the time spent on data loading, the forward pass, and the backward pass) and the inference phase.
- 2021 · 02 · 07
Pruning Convolutional Neural Networks: Decide the Plan, Prune, Then Finetune
Pruning is subtraction applied to a trained network. It breaks down into three steps: first decide how much to prune in each layer (by intuition, by sensitivity analysis, or by searching with reinforcement learning as in AMC), then actually prune (unstructured weight sparsity and the "Lottery Ticket Hypothesis"; or structured pruning that removes whole filters using criteria like L1/L2 norm, geometric median FPGM, BN's γ, or feature-map rank HRank), and finally finetune to recover the accuracy. Or you can simply prune while training—Slimmable Networks and AutoSlim.
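To make the structured-pruning step concrete, here is a minimal sketch of the L1-norm criterion only: rank a layer's filters by the sum of absolute weights and keep the largest. The nested-list layout and the function names are assumptions for illustration; a real pipeline would also slice the next layer's input channels and then finetune.

```python
def l1_norm(filter_weights):
    """Sum of absolute values over one filter stored as nested lists [in_ch][kh][kw]."""
    total = 0.0
    for channel in filter_weights:
        for row in channel:
            total += sum(abs(w) for w in row)
    return total


def select_filters_to_keep(layer_filters, prune_ratio):
    """Indices of the filters that survive, in their original order."""
    n_keep = max(1, round(len(layer_filters) * (1.0 - prune_ratio)))
    ranked = sorted(range(len(layer_filters)),
                    key=lambda i: l1_norm(layer_filters[i]),
                    reverse=True)  # largest norm first
    return sorted(ranked[:n_keep])


# Four toy 1x1x2 filters with L1 norms 4.0, 0.4, 9.0, and 1.0.
toy_layer = [
    [[[1.0, -3.0]]],
    [[[0.1, 0.3]]],
    [[[5.0, 4.0]]],
    [[[0.5, -0.5]]],
]
kept = select_filters_to_keep(toy_layer, prune_ratio=0.5)
```

The per-layer `prune_ratio` is exactly what the first step (intuition, sensitivity analysis, or a search) has to supply.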
- 2021 · 02 · 01
Training Tricks Miscellany: Reparameterization, Label Smoothing, and Dropout
A few scattered but practical training tricks. Reparameterization (ACNet / RepVGG / RepMLP) adds extra branches at training time to help optimization, then fuses them back into the original structure at inference—essentially gaining a bit of accuracy for free without adding any inference cost. Plus weight EMA, Shake-Shake, label smoothing, and Dropout—common tactics for fighting overfitting and improving generalization.
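The fusing step works because convolution is linear. The toy below shows that in one dimension with a single channel: a 3-tap branch, a 1-tap branch, and an identity branch collapse into one 3-tap kernel. It is a simplified stand-in for the RepVGG-style fusion (BatchNorm folding is left out), and all names are illustrative.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1-D correlation with an odd-length kernel; output length equals input."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_branch(x, k3, k1):
    """Training-time structure: 3-tap conv + 1-tap conv + identity, summed."""
    a = conv1d_same(x, k3)
    b = conv1d_same(x, [k1])
    return [p + q + r for p, q, r in zip(a, b, x)]


def fuse(k3, k1):
    """Inference-time kernel: add the 1-tap weight and the identity (1.0) to the center tap."""
    fused = list(k3)
    fused[1] += k1 + 1.0
    return fused


x = [1.0, 2.0, -1.0, 0.5]
k3, k1 = [0.5, -1.0, 0.25], 2.0
train_out = multi_branch(x, k3, k1)
infer_out = conv1d_same(x, fuse(k3, k1))
```

Both paths produce the same output, but the fused one runs a single convolution.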
- 2021 · 01 · 30
Knowledge Distillation: Teaching a Small Network to Mimic a Large One
Knowledge distillation is a common way to boost a small network's accuracy and speed up convergence, and it's often used to fine-tune small models after compression or NAS. The core idea is straightforward: treat the outputs of a high-accuracy "teacher network" as soft labels for a "student network," letting the student mimic the teacher. This post walks through several representative methods—classic temperature-scaled distillation, FitNets' intermediate-layer hints, FSP matrices, the teacher-assistant network (TAKD), and DML, where two networks learn from each other.
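A minimal sketch of the classic temperature-scaled variant, in plain Python. It computes only the soft-label term (KL divergence between softened teacher and student distributions, scaled by T squared as in the usual formulation); in practice this is typically mixed with the ordinary hard-label loss. Function names and the default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)   # soft labels from the teacher
    q = softmax(student_logits, temperature)   # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


teacher = [6.0, 2.0, -1.0]
student = [3.0, 3.5, 0.0]
loss = distillation_loss(student, teacher)
```

A higher temperature flattens the teacher's distribution, so the relative ranking of the wrong classes carries more weight in the loss.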
- 2021 · 01 · 24
Image Data Augmentation: From Random Crop to AutoAugment
Data augmentation is a standard trick for preventing overfitting in computer vision. This post first walks through the common image augmentations—random resized crop, cutout, random erase, mixup—then covers the automated AutoAugment and Fast AutoAugment (I used the latter to win the image track of the AutoDL competition back in the day), and finally adds a note on test-time augmentation (TTA).
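Of the augmentations listed, mixup is the easiest to show in a few lines: blend two samples and their one-hot labels with the same coefficient. This sketch uses flat Python lists in place of image tensors, and the names `sample_lambda` and `mixup_pair` are illustrative.

```python
import random


def sample_lambda(alpha, rng):
    """Mixing coefficient drawn from Beta(alpha, alpha), as mixup is usually described."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two inputs and of their one-hot labels."""
    def mix(u, v):
        return [lam * p + (1.0 - lam) * q for p, q in zip(u, v)]
    return mix(x_a, x_b), mix(y_a, y_b)


rng = random.Random(0)
lam = sample_lambda(0.2, rng)   # a small alpha pushes lam toward 0 or 1
mixed_x, mixed_y = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

Because the label is mixed too, the loss has to accept soft targets rather than class indices.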
- 2021 · 01 · 18
Dealing with Class Imbalance: Resampling, Weighting, and Ensembles
Real-world data is rarely as tidy as public datasets, and severely imbalanced positive and negative samples are the norm. There are three main approaches to class imbalance: resampling (undersampling the majority class, e.g. Tomek Links; oversampling the minority class, e.g. SMOTE), weighting different classes or easy/hard samples (Focal Loss), and ensembles (Bagging).
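For the weighting approach, a minimal binary Focal Loss sketch. The defaults `gamma=2.0` and `alpha=0.25` are the commonly cited ones and are an assumption here, not values taken from the post.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted positive-class probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p                 # probability assigned to the true class
    if alpha is None:
        alpha_t = 1.0                              # no class weighting
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)                # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)   # confident and wrong: keeps most of its loss
```

With `gamma=0` and `alpha=None` the function reduces to ordinary cross-entropy, which makes the effect of the modulating factor easy to compare.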
- 2020 · 12 · 12
Deep Learning Hyperparameter Tuning: From Hand-Crafted Alchemy to Automated Search
Tuning in deep learning is both a black art and a make-or-break step, so much so that training is jokingly called "alchemy" and engineers call themselves "tuning monkeys." This post first covers the essentials of manual tuning (how batch size and learning rate move together, the update rules for SGD / momentum / Adagrad / Adam, learning-rate warmup and decay, and a small weight-decay trick), then turns to automated hyperparameter optimization (HPO): grid/random search, the CMA-ES evolutionary algorithm, and the most sample-efficient option, Bayesian optimization (Gaussian process + acquisition function), plus off-the-shelf tools like NNI and Ray.
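A small sketch of two of the manual-tuning pieces. It assumes the common linear scaling rule for batch size and learning rate, linear warmup, and cosine decay; the post may recommend different variants, and the reference batch size of 256 is an assumption.

```python
import math


def peak_lr(base_lr, batch_size, base_batch=256):
    """Linear scaling rule: grow the learning rate in proportion to the batch size."""
    return base_lr * batch_size / base_batch


def lr_at(step, total_steps, warmup_steps, peak):
    """Linear warmup to peak, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


peak = peak_lr(base_lr=0.1, batch_size=1024)   # 4x the batch, 4x the learning rate
schedule = [lr_at(s, total_steps=1000, warmup_steps=50, peak=peak) for s in range(1001)]
```

Warmup matters most when the scaled peak is large, since the first updates are taken on nearly random weights.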
- 2020 · 12 · 06
Designing Compact Networks: Taking Convolution Apart
To run on mobile with low latency and a small memory footprint, a family of compact networks was designed. This piece walks through their design ideas: MobileNet splits the standard convolution into depthwise + 1×1 convolutions (cutting FLOPs by roughly 8 to 9× for 3×3 kernels); ShuffleNet's grouped 1×1 convolution + channel shuffle; MobileNetV2's inverted residual; ShuffleNetV2's four design guidelines built around memory access cost (MAC); and MobileNetV3's h-swish, EfficientNet's compound scaling, and GhostNet's cheap feature generation.
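The FLOPs saving from splitting the convolution can be checked with a few lines of arithmetic. The sketch counts multiply-accumulates for one layer; the layer size in the example is arbitrary.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates of a k x k convolution on an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def reduction_factor(h, w, c_in, c_out, k):
    """How many times cheaper the separable version is; equals 1 / (1/c_out + 1/k^2)."""
    return standard_conv_macs(h, w, c_in, c_out, k) / depthwise_separable_macs(h, w, c_in, c_out, k)


factor = reduction_factor(h=14, w=14, c_in=512, c_out=512, k=3)
print(f"{factor:.2f}x fewer multiply-accumulates")
```

The factor approaches k² = 9 as the number of output channels grows, which is where the "8 to 9×" figure for 3×3 kernels comes from.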
- 2020 · 12 · 01
The Evolution of Convolutional Network Architectures: From LeNet to SENet
Following the ILSVRC timeline, this walks through several milestones in convolutional neural network architecture: LeNet-5 from 1998, AlexNet that ignited deep learning in 2012, GoogLeNet with its multi-branch Inception modules, the deeper VGG, Inception with Batch Normalization, ResNet that trained hundreds of layers via residual connections, and SENet that introduced channel attention.
- 2020 · 11 · 15
How to Measure Whether a Neural Network Is "Good" and "Fast"
Optimization presupposes measurement: if you cannot measure it, you cannot improve it. A neural network's performance actually splits into two lines: task-oriented accuracy (Top-1 for classification, mAP for detection, mIoU for segmentation, PSNR for restoration) and efficiency-oriented cost (latency, FLOPs, parameter count, peak memory). This piece lays out the definitions and computations of these metrics, and why FLOPs so often lie (DenseNet161 is more than twice as slow as VGG16), all in one go.
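Three of the accuracy metrics are short enough to write out directly. This is a plain-Python sketch on flat lists, using the textbook definitions; the function names are illustrative, and mAP is omitted because it needs a full matching procedure.

```python
import math


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = sum(1 for row, y in zip(scores, labels)
               if max(range(len(row)), key=row.__getitem__) == y)
    return hits / len(labels)


def mean_iou(pred, target, n_classes):
    """Mean over classes of intersection / union, skipping classes absent from both."""
    ious = []
    for c in range(n_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


def psnr(image, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


acc = top1_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1])
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
db = psnr([0, 0], [0, 255])
```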
- 2020 · 11 · 03
A Quantitative Analysis of PyTorch Training Acceleration
Starting from a baseline, this article progressively optimizes training speed through a variety of software and hardware methods, ultimately cutting training time to one eighth.
- 2020 · 02 · 29
Notes on a Survey of Automated Data Augmentation Methods
Notes from a survey of three automated data augmentation methods: AutoAugment searches for augmentation policies with reinforcement learning, RandAugment drastically shrinks the search space, and Fast AutoAugment speeds up the search by merging policies.
- 2019 · 12 · 31
AutoDL Image Competition Tuning Log
A running log of tuning experiments on the AutoDL image-classification track (Pedro dataset): ResNet18 / 34 backbones, CELU in place of ReLU, resetting BN and Group Norm, FP16 mixed precision, ReID / pedestrian-attribute pretraining, and a series of controlled experiments including online-data reshuffling.
- 2019 · 12 · 01
Milestones in Neural Architecture Search (NAS)
Neural Architecture Search (NAS) has been red-hot this year. This post is a quick rundown of the work I personally find most representative. Corrections and additions are welcome, hahaha.
- 2019 · 08 · 12
Feeding the GPU in Deep Learning
I trained a lot of models recently and found that brute force is not always magic—more GPUs is not always better. Sometimes one V100 and two V100s make almost no difference, because the bottleneck is elsewhere. Here is a write-up of a few small tricks I have picked up.
- 2019 · 07 · 10
Notes on Three CVPR 2019 Neural Network Pruning Papers
The core ideas of three CVPR 2019 pruning papers (Variational Pruning, which estimates channel saliency with a probability distribution; Importance Estimation; and Cascaded Projection, for end-to-end compression and acceleration), plus the day's experimental progress.
- 2019 · 07 · 09
Progressive Pruning: Breaking the Train-Prune-Finetune Paradigm
Mainstream pruning is a three-stage train-prune-finetune pipeline. These notes work through the idea of progressive pruning: using relative cross-layer statistics to guide non-uniform pruning, and treating the number of channels pruned each round as an analog of the learning rate, converging gradually rather than deciding all at once.
- 2019 · 07 · 01
Dynamic Network Inference: Design Thoughts on Early Exit and Dynamic Channel Pruning
Starting from a sensitivity-based pruning experiment, some thoughts on designing dynamic inference networks: why FLOPs is unreliable, abandoning multi-network dynamics in favor of single-network dynamics, attaching a classifier after each block for early exit, and how dynamic channel selection can be combined with pruning.
- 2019 · 05 · 23
Notes on a Survey of Model Pruning Algorithms
Organizing a batch of model-pruning papers into three categories—predefined structured (L1, ThiNet, LASSO), automatic structured (Network Slimming, SSS), and others (AOFP, NISP, SNIP, Autopruner)—and discussing the thought-provoking conclusion that "for a predefined structure, training from scratch is enough."
- 2019 · 05 · 09
Notes on MobileNetV3 and the Lottery Ticket Hypothesis
Two notes: MobileNetV3 applies NetAdapt fine-tuning on top of the MnasNet seed architecture, plus head/tail and activation-function tweaks — engineering refinements; the Lottery Ticket Hypothesis argues that large networks hide trainable sparse sub-networks, contrasted with "Rethinking the Value of Network Pruning."
- 2019 · 05 · 06
DoReFa-Net: Notes on Low-Bit Quantized Training
DoReFa-Net quantizes weights, activations, and gradients separately to low bit-width, pushing most of the computation in both training and inference down to the bit-operation level. These notes go from the relationship between bit operations and dot products, to the STE (Straight-Through Estimator), to how each of the three is quantized, the overall algorithm flow, and the fused inference optimization.
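A forward-pass sketch of the weight and activation quantizers, as I understand them from the paper, for bit-widths k > 1. The backward pass is not shown: under the STE the rounding step is simply treated as identity when computing gradients. It assumes at least one nonzero weight, and gradient quantization is left out.

```python
import math


def quantize_k(r, k):
    """Map r in [0, 1] onto the nearest of 2**k evenly spaced levels in [0, 1]."""
    levels = 2 ** k - 1
    return math.floor(r * levels + 0.5) / levels   # round half up


def quantize_weights(weights, k):
    """Squash with tanh, rescale into [0, 1], quantize, then map back to [-1, 1]."""
    squashed = [math.tanh(w) for w in weights]
    scale = max(abs(s) for s in squashed)          # assumes not all weights are zero
    return [2.0 * quantize_k(s / (2.0 * scale) + 0.5, k) - 1.0 for s in squashed]


def quantize_activation(a, k):
    """Clip the activation to [0, 1] before quantizing."""
    return quantize_k(min(max(a, 0.0), 1.0), k)


w_q = quantize_weights([-2.0, -0.1, 0.0, 0.3, 1.5], k=2)
a_q = [quantize_activation(a, k=3) for a in (-0.2, 0.4, 1.7)]
```

With k bits every quantized weight takes one of at most 2^k values, which is what allows dot products to be rewritten as bit operations.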
- 2019 · 04 · 24
pix2pix: A General Framework for Image Translation with cGANs
pix2pix uses a conditional GAN for general-purpose image translation: the same architecture transfers across tasks just by swapping datasets. These notes cover why L1 / L2 losses are not used on their own (they only capture low frequencies and produce blurry results), how PatchGAN models high frequencies, noise inputs and stochasticity, and the role of BN.
- 2019 · 04 · 23
GAN Training Stability: Notes on Improved Techniques for Training GANs
Notes on "Improved Techniques for Training GANs": GAN training is essentially the search for a Nash equilibrium in a high-dimensional non-convex game, which makes convergence hard. A walkthrough of feature matching, minibatch discrimination, historical averaging, one-sided label smoothing, virtual BN, plus the Inception Score and the semi-supervised setup.
- 2019 · 04 · 18
A Collection of Original Ideas on NAS and Model Pruning
A batch of scattered ideas accumulated in NAS and model compression, spanning search strategies (MCMC, greedy linear structures), pruning, dynamic inference, and adaptive inference, gathered into a pick-and-experiment checklist, with reading notes on the AutoML Survey.
- 2019 · 04 · 16
OctaveConv: Reducing Convolutional Redundancy via Frequency Decomposition
OctaveConv's angle isn't parameter count but the redundancy in feature maps—decompose them into low- and high-frequency components, control the ratio with a hyperparameter α, and store and compute the low-frequency part at a lower resolution. The parameter count stays the same while the compute drops.
- 2019 · 04 · 11
A Side-by-Side Comparison of Attention Mechanisms in CV, with Single-Path NAS Notes
A side-by-side comparison of several attention mechanisms in CV: SENet weights channels, Non-local applies self-attention over spatial pixels, CBAM chains the two, DANet uses dual attention for segmentation, plus notes on how Single-Path NAS compresses multi-path search into a single path.
- 2019 · 04 · 10
Resource-Constrained NAS: Using Submodular Optimization to Justify Greedy Search
Searching for the highest-accuracy network under a resource budget is NP-hard. This paper casts NAS into the submodular optimization framework from combinatorial optimization, obtaining an approximation guarantee via a greedy algorithm and further cutting the compute cost—explaining, in theory, why greedy search is reasonable.
- 2019 · 04 · 07
Randomly Wired Neural Networks: A New Take on Replacing NAS with Graph Theory
Kaiming He et al. use random graph generators (ER / BA / WS) to produce a network's wiring directly, skipping selection altogether—under the same generator parameters, the performance variance across different random seeds is tiny. Notes on the three random-graph algorithms, the human priors retained in the overall network structure, and the ImageNet / COCO / robustness experiments.
- 2019 · 04 · 01
Notes on a Survey of Deep Learning Interpretability
A set of interpretability notes built around "Visual Interpretability for Deep Learning: a Survey," walking through visualization, representation diagnosis, decomposing networks into explanatory graphs/decision trees, learning interpretable representations directly, and evaluation metrics — with Interpretable CNN, Network Dissection, and a black-box explanation survey thrown in.
- 2019 · 03 · 30
A Survey of Generative Models and a Checklist of Image-Classification Training Tricks
Two sets of notes combined: one on the basic landscape of generative models (PixelRNN / CNN, AE / VAE, GAN), and one a checklist of training tricks for image classification (training, data augmentation, validation). They sit together because the regularization and stability ideas in generative models share a lot with discriminative training.
- 2019 · 03 · 28
Dataset Distillation: Compressing an Entire Dataset Into a Handful of Images
Dataset distillation compresses an entire dataset into a few synthetic images per class plus a learning rate, reaching near full-training accuracy in just a few iterations. We work through the paper's five stages step by step: fixed / random initialization, linear-model analysis, multi-step gradients, and different initialization distributions.
- 2019 · 03 · 21
Auto-DeepLab Reading Notes: Rethinking the Essence of the NAS Search Space
Auto-DeepLab moves NAS from classification to semantic segmentation, and beyond searching the internal structure of a cell, it also searches the macro network topology outside the cell. I use this paper to reflect on the essence of the NAS search space, and on one possible research direction.
- 2019 · 03 · 20
Notes on TensorRT Inference Acceleration
A quick rundown of what TensorRT is for and the key optimizations it uses to speed up inference: layer fusion, automatic kernel selection, low precision, and the rough flow of building an engine.
- 2019 · 03 · 19
A Close Reading of the ResNet Training Bag of Tricks
"Bag of Tricks for Image Classification" stacks a pile of training tricks together, pushing ResNet-50's Top-1 on ImageNet from 75.3 to 79.29. A chapter-by-chapter record of the baseline pipeline, the hardware-oriented large batch / warm-up / FP16 tricks, plus cosine decay, label smoothing, knowledge distillation, mixup, and other refinements.
- 2019 · 03 · 12
MobileNetV2 Close Reading: Inverted Residuals and Linear Bottlenecks
A close reading of MobileNetV2: why ReLU is dropped at the low-dimensional bottleneck, the inverted residual structure, choosing the expansion rate, the overall network and ReLU6 — all about balancing accuracy, FLOPs, and latency.
- 2019 · 02 · 23
Notes on DARTS and ProxylessNAS
DARTS uses a softmax to relax the discrete choice of operations into a continuous mixture, then takes gradients with respect to the architecture parameters. To address its memory bottleneck, ProxylessNAS switches to binarized path sampling, searches directly on ImageNet, and models hardware latency as a differentiable objective. Reading the two together makes clear how differentiable NAS evolved from "saving search cost" to "saving memory and fitting the hardware."
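The continuous relaxation fits in a few lines. In this toy, scalar functions stand in for the candidate operations on one edge (the three ops are hypothetical placeholders, not the real search space), and the architecture parameters are plain floats rather than learned tensors.

```python
import math

# Hypothetical scalar stand-ins for candidate operations such as conv, skip, or zero.
CANDIDATE_OPS = {
    "skip": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,
}


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Search-time edge output: softmax(alpha)-weighted sum of every candidate op."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After the search, keep only the op with the largest architecture parameter."""
    return max(alphas, key=alphas.get)


uniform = mixed_op(3.0, {"skip": 0.0, "zero": 0.0, "double": 0.0})
chosen = discretize({"skip": 0.1, "zero": -1.0, "double": 2.0})
```

Every candidate op runs on every edge during the search, which is the memory cost that path sampling in ProxylessNAS is meant to avoid.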
- 2018 · 11 · 22
Learning to Push by Grasping: Using multiple tasks for effective learning
End-to-end learning frameworks have become popular in robotic control: they take state/images as direct input and directly output predicted torques and action parameters. But they have been criticized for their heavy data requirements, sparking debate about their scalability—does the end-to-end approach require building a separate model for every task? Intuitively, sharing across tasks should help, since they all require some common understanding of the environment. This paper attempts the next step for data-driven end-to-end learning frameworks: moving from task-specific models to a joint model across multiple robotic tasks, with surprising results. Under the same amount of data, multi-task learning outperforms single-task learning. For the grasp task, for example, a model trained on 2.5k grasp samples plus 2.5k push samples outperforms a model trained on 5k grasp samples.
- 2018 · 11 · 17
Playing Atari with Deep Reinforcement Learning
This paper by Volodymyr Mnih, published at NIPS 2013, is roughly the founding work of DQN; the other one is the Nature 2015 paper.
- 2018 · 11 · 02
The Cityscapes Dataset
Cityscapes is commonly used for semantic segmentation. Its data is divided into 8 categories in total, including one named "void", and each category contains multiple classes. Cityscapes has 30 classes in all, but once labeled there are 35 kinds of labels in total, which also include labels such as "unlabeled" that are not counted as classes.
- 2018 · 10 · 30
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
Earlier networks for segmentation were either too slow or too inaccurate. This paper designs EDANet, built from an EDA module that combines asymmetric convolution, dilated convolution, and dense connectivity. It outperforms FCN across the board, and does so without a decoder structure, a context module, a post-processing scheme, or a pretrained model. Experiments are run on Cityscapes and CamVid.
Read more → - 2018 · 10 · 22
Darts: Differentiable Architecture Search
This paper takes on architecture search by formulating the task in a differentiable form, instead of the traditional approach of using reinforcement learning over a discrete, non-differentiable space. The method is based on a continuous relaxation of the architecture representation, allowing efficient methods such as gradient descent to be used for architecture search. Subsequent experiments show that the algorithm performs well at discovering high-performance CNN architectures for image recognition and RNN architectures for language modeling, and is far faster than existing state-of-the-art non-differentiable search methods.
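A minimal sketch of the continuous relaxation idea, assuming a softmax-weighted mixture over candidate operations. The names `mixed_op`, `alphas`, and the three toy scalar operations are illustrative assumptions, not code from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas, ops):
    """Continuous relaxation: a softmax-weighted sum of every candidate op's output."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Hypothetical candidate operations on a scalar input (stand-ins for conv, pooling, etc.).
candidate_ops = [lambda x: x, lambda x: 0.0, lambda x: 2.0 * x]

# With equal architecture scores, the output is the plain average of the candidates.
uniform_output = mixed_op(3.0, [0.0, 0.0, 0.0], candidate_ops)
```

Because the output depends smoothly on `alphas`, the architecture scores can be updated by gradient descent alongside the ordinary weights, instead of being searched over as discrete choices.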
Read more → - 2018 · 10 · 15
Compressing Neural Networks with the Hashing Trick
As deep networks are increasingly deployed on mobile devices, a dilemma becomes ever more apparent: the trend in deep learning is to develop models that can absorb larger and larger datasets, yet mobile devices have limited storage and cannot hold overly large models. This paper proposes HashedNets, which reduce model size by exploiting the inherent redundancy inside neural networks. HashedNets use a low-cost hash function to randomly group connection weights into different hash buckets, and all connections that fall into the same bucket share a single parameter value. These parameters are tuned during standard backpropagation, and the hashing process introduces no extra memory overhead. Performance on a range of benchmark datasets shows that HashedNets can substantially reduce storage requirements while preserving generalization performance.
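The weight-sharing idea can be sketched as follows. This is a simplified illustration under stated assumptions: `zlib.crc32` stands in for the low-cost hash function, and `bucket_index`, `virtual_weight`, and `forward` are hypothetical names, not the paper's implementation.

```python
import zlib

def bucket_index(i, j, num_buckets, seed=0):
    """Map connection (i, j) to a bucket; crc32 is a stand-in hash (assumption)."""
    key = f"{seed}:{i}:{j}".encode()
    return zlib.crc32(key) % num_buckets

def virtual_weight(params, i, j):
    """Look up the shared parameter for connection (i, j); no index table is stored."""
    return params[bucket_index(i, j, len(params))]

def forward(params, inputs, num_outputs):
    """Dense layer whose virtual weight matrix is backed by len(params) real values."""
    return [
        sum(virtual_weight(params, i, j) * x for j, x in enumerate(inputs))
        for i in range(num_outputs)
    ]

# A virtual 2 x 3 weight matrix stored in only 4 real parameters.
outputs = forward([1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0], num_outputs=2)
```

Since the bucket is recomputed from `(i, j)` on every lookup, only the shared parameter vector has to be stored. During backpropagation, the gradient of a shared parameter is the sum of the gradients of all connections that hash to its bucket.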
Read more → - 2018 · 10 · 11
ShuffleNetV2
Many network designs today consider only indirect metrics of computational complexity (such as FLOPs), yet direct metrics (such as speed) are not determined by FLOPs alone; MAC (memory access cost) and platform characteristics also influence speed. This paper argues for measuring directly on a specific platform, which is far better than considering FLOPs alone. Based on a series of controlled experiments, it proposes several guidelines for efficient networks, and from those guidelines derives a new architecture, ShuffleNetV2. Comprehensive ablation experiments show the model achieves a state-of-the-art trade-off between speed and accuracy.
Read more → - 2018 · 10 · 10
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
This paper introduces a highly efficient network, ShuffleNet, which centers on two operations—pointwise group convolution and channel shuffle—that drastically cut computation while maintaining accuracy. It outperforms prior networks on both ImageNet and COCO.
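Channel shuffle can be illustrated on a flat list of channel labels. This is a minimal sketch, assuming channels are laid out group by group; on a real tensor the same reordering is usually expressed as a reshape, a transpose, and a flatten. The function name is an assumption.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so each output group mixes channels from every input group."""
    per_group, remainder = divmod(len(channels), groups)
    if remainder != 0:
        raise ValueError("number of channels must be divisible by groups")
    # View the list as a (groups, per_group) grid, then read it column by column.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]

# Two groups of three channels each: group "a" and group "b".
shuffled = channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2)
# shuffled == ["a0", "b0", "a1", "b1", "a2", "b2"]
```

Without this step, stacked group convolutions would only ever see channels from their own group; the shuffle lets information flow across groups at no arithmetic cost.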
Read more → - 2018 · 10 · 04
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Targeting mobile and embedded vision applications, this paper proposes an efficient model called MobileNets, a lightweight neural network built on depthwise separable convolutions. The model uses two hyperparameters to trade off accuracy against latency, and extensive experiments balancing the two were conducted on ImageNet, demonstrating strong performance compared with other models. Experiments also showcase MobileNets' strengths across a wide range of applications, including object detection, fine-grained classification, face attributes, and large-scale geolocalization.
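To see why depthwise separable convolutions are cheaper, here is a rough cost comparison in multiply-adds. It is a sketch under simplifying assumptions (square kernels and feature maps, stride 1, output size equal to input size); the function and variable names are illustrative.

```python
def standard_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Multiply-adds of a standard convolution layer."""
    return kernel * kernel * in_ch * out_ch * feature_size * feature_size

def separable_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Multiply-adds of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_ch * feature_size * feature_size
    pointwise = in_ch * out_ch * feature_size * feature_size
    return depthwise + pointwise

# Example layer: 3x3 kernel, 32 input channels, 64 output channels, 56x56 feature map.
standard = standard_conv_cost(3, 32, 64, 56)
separable = separable_conv_cost(3, 32, 64, 56)
ratio = separable / standard  # equals 1/out_ch + 1/kernel**2
```

For a 3x3 kernel the ratio is dominated by the 1/9 term, so the separable layer needs roughly an eighth to a ninth of the computation of the standard layer.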
Read more → - 2018 · 09 · 28
Notes on Inception-V4
In recent years, very deep convolutional neural networks have been the single biggest driver of improvements in image recognition performance. The Inception architecture achieves strong performance while keeping computational cost relatively low. Combining residual connections with conventional architectures produced the best results on the 2015 ILSVRC, comparable to Inception-V3. This paper considers combining Inception networks with residual connections; there is ample evidence that residual connections can greatly accelerate the training of Inception networks, and also evidence that a residual Inception slightly outperforms a non-residual Inception of almost the same computational cost. The paper also proposes several new Inception networks, both with and without residual connections, and these changes likewise markedly improve single-frame classification performance on the 2012 ILSVRC. Finally, it shows that scaling the activations appropriately can make the training of very wide residual Inception networks more stable.
Read more → - 2018 · 09 · 21
On Differentiating Vectors and Matrices
Machine learning algorithms involve a great deal of differentiation with respect to vectors and matrices. Here we introduce some common derivative formulas for matrices and vectors.
Read more → - 2018 · 09 · 14
A General Solution to Stock-Trading Problems in Dynamic Programming
There is a class of dynamic-programming problems that give you a sequence of stock prices and ask for the maximum profit you can earn by buying and selling. These problems come in many variants — only one transaction allowed, unlimited transactions, an added transaction fee, and so on. In other words, the maximum profit is generally determined by the trading day and the maximum number of transactions allowed (where one transaction is a single buy paired with a single sell).
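One common way to write the general solution is sketched below, assuming the state is "how many transactions have been started" plus "holding or not holding a share". The function name and variable names are illustrative, and the transaction-fee variant is not covered.

```python
def max_profit(prices, max_transactions):
    """Maximum profit with at most `max_transactions` buy/sell pairs."""
    if not prices or max_transactions <= 0:
        return 0
    # More than len(prices) // 2 transactions can never help.
    k = min(max_transactions, len(prices) // 2)
    # cash[j]: best profit with no share in hand, using at most j transactions.
    # hold[j]: best profit with a share in hand, bought as part of transaction j.
    cash = [0] * (k + 1)
    hold = [float("-inf")] * (k + 1)
    for price in prices:
        for j in range(1, k + 1):
            hold[j] = max(hold[j], cash[j - 1] - price)  # buy today or keep holding
            cash[j] = max(cash[j], hold[j] + price)      # sell today or stay out
    return cash[k]

one_trade = max_profit([7, 1, 5, 3, 6, 4], 1)
two_trades = max_profit([3, 3, 5, 0, 0, 3, 1, 4], 2)
unlimited = max_profit([7, 1, 5, 3, 6, 4], len([7, 1, 5, 3, 6, 4]))
```

The "only one transaction" and "unlimited transactions" variants fall out of the same function by choosing `max_transactions` as 1 or as any value of at least half the number of days.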
Read more → - 2018 · 08 · 31
Definition of Convex Sets and Common Convex Sets
Similar to solving the equality-constrained optimization problems discussed earlier, optimization problems with inequality constraints can likewise be solved using the method of Lagrange multipliers
Read more → - 2018 · 08 · 24
Deriving the SVM (Part 3)
The previous posts covered the derivation of the hard-margin SVM. This post continues with the mathematical derivation of the soft-margin SVM, which allows some samples to be misclassified when the data is not linearly separable.
Read more → - 2018 · 08 · 18
Deriving the SVM (Part 2)
In the previous post (Part 1) we discussed the derivation of the hard-margin SVM and its dual form, whose dual problem can be simplified into the following form
Read more → - 2018 · 08 · 10
Deriving the SVM (Part 1)
The SVM is a classic method in machine learning. Beyond the hard-margin SVM, there are also extensions such as the soft-margin SVM and the kernel trick. This article focuses on deriving the hard-margin SVM.
Read more → - 2018 · 07 · 26
Solving Systems of Linear Equations (3)
The pseudoinverse introduced here is the Moore-Penrose inverse
Read more → - 2018 · 07 · 21
Solving Systems of Linear Equations (2)
The previous post covered one case of linear systems—where the number of unknowns is smaller than the number of equations—and introduced the least-squares method. This post covers the other case, where the number of equations is smaller than the number of unknowns. Here the system has infinitely many solutions, but the one closest to the origin—the solution with the smallest norm—is unique. This is the minimum-norm solution of a linear system that we introduce here.
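For reference, the minimum-norm problem and its closed-form solution can be written as follows, assuming A has full row rank so that A A^T is invertible (the symbols A, b, and x are notation chosen here, not necessarily the post's own):

```latex
\min_{x \in \mathbb{R}^{n}} \ \|x\|_{2}
\quad \text{subject to} \quad Ax = b,
\qquad A \in \mathbb{R}^{m \times n},\ m < n,\ \operatorname{rank}(A) = m

x^{*} = A^{\top} \left( A A^{\top} \right)^{-1} b
```

As a quick check, the single equation x_1 + x_2 = 2 has A = [1 1] and A A^T = 2, which gives x* = (1, 1)^T, the point on the line closest to the origin.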
Read more → - 2018 · 07 · 20
207. Course Schedule
This problem can be solved with either DFS or BFS by determining whether the given directed graph admits a topological ordering, that is, whether it contains no cycle.
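A minimal sketch of the BFS variant (Kahn's algorithm). It assumes each pair is given as `[course, prerequisite]`; the function name and input layout are assumptions for illustration.

```python
from collections import deque

def can_finish(num_courses, prerequisites):
    """Return True if the prerequisite graph has a topological ordering (no cycle)."""
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)  # edge: prereq -> course
        indegree[course] += 1

    # Start from every course that has no prerequisites.
    queue = deque(i for i in range(num_courses) if indegree[i] == 0)
    taken = 0
    while queue:
        node = queue.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Courses stuck on a cycle never reach indegree 0, so they are never counted.
    return taken == num_courses
```

For example, `can_finish(2, [[1, 0]])` is `True`, while `can_finish(2, [[1, 0], [0, 1]])` is `False` because the two courses depend on each other.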
Read more → - 2018 · 07 · 20
Solving Systems of Linear Equations (1)
This article discusses solving one particular case of systems of linear equations, namely considering the system
Read more → - 2018 · 07 · 14
Numerical Computation in Machine Learning (1)
Machine learning algorithms usually require a great deal of numerical computation, that is, solving for approximate values iteratively rather than obtaining analytical solutions. These algorithms typically involve optimization and solving systems of linear equations. Representing real numbers with a finite number of bits on a computer carries inherent rounding error, so we need certain methods to guarantee the precision of our computations.
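Two small illustrations of the point, written as a sketch: rounding error in a simple sum, and a stabilized log-sum-exp as one example of a technique that avoids overflow. The choice of examples is an assumption; the post itself may use different ones.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation, so the sum is slightly off.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)

def logsumexp(values):
    """Compute log(sum(exp(v))) without overflow by factoring out the maximum."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

# A naive math.exp(1000.0) would raise OverflowError; the shifted form stays finite.
stable = logsumexp([1000.0, 1000.0])
```

Comparing floats with a tolerance and rearranging formulas so that intermediate values stay in range are two of the most common safeguards in numerical code.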
Read more → - 2018 · 07 · 06
Training a Simple Neural Network with TensorFlow
Here we use TensorFlow's Eager Execution to build the model, so that we no longer have to create a Graph and Session as before, which makes training a neural network more convenient and faster. Below we use the Iris dataset as an example to train a neural network, with the code taken from Google's tutorial.
Read more → - 2018 · 06 · 29
Doing Deep Learning on GeekCloud
I was recently working on an image-related deep learning assignment from my professor. After debugging the code, I found my computer didn't have enough memory (an 8GB laptop), and later I discovered a really handy deep learning cloud service platform
Read more → - 2018 · 06 · 15
LiDAR + Camera Data Fusion on KITTI
KITTI offers many datasets; here we pick the raw_data (raw data) for fusion.
Read more → - 2018 · 06 · 08
Solving Optimization Problems with Inequality Constraints
Similar to the equality-constrained optimization problems discussed earlier, optimization problems with inequality constraints can also be solved using the method of Lagrange multipliers.
Read more → - 2018 · 06 · 02
Constructors in C++
Every class defines how its objects are initialized. A class controls the initialization of its objects through one or more special member functions called constructors. The job of a constructor is to initialize the data members of a class object, and a constructor runs whenever an object of the class is created.
Read more → - 2018 · 06 · 01
Associative Containers in C++
Associative containers support efficient lookup and access by key. The two primary associative containers are set and map. The elements of a map are key-value pairs, where the key acts as an index and the value represents the data associated with that index; the elements of a set contain only a key. A set supports efficient key lookup; std::set is typically implemented as a balanced binary search tree (commonly a red-black tree), while the hash-table-based counterpart is unordered_set.
Read more → - 2018 · 06 · 01
Deriving Backpropagation for Neural Networks
For the training process of a neural network, the backpropagation algorithm lies at its core
Read more → - 2018 · 05 · 25
Sequential Containers in C++
A container is a collection of objects of a specific type. Sequential containers give you control over the order in which elements are stored and accessed.
Read more → - 2018 · 05 · 24
An Introduction to Decision Trees and Random Forests
A decision tree is a method for classification and regression; this article focuses mainly on decision trees used for classification. A decision tree has a tree-like structure, and in classification problems it represents the process of classifying data based on features. It can usually be regarded as a collection of if-then rules, or as a conditional probability distribution defined over the feature space and the class space. Its main advantages are good model interpretability and fast classification. During training, the decision tree model is built from the training data according to the principle of minimizing a loss function. During prediction, the decision tree is used to classify new data. Learning a decision tree usually involves three steps: feature selection, decision tree generation, and decision tree pruning. These decision-tree ideas come mainly from the ID3 algorithm proposed by Quinlan in 1986 and the C4.5 algorithm proposed in 1993, as well as the CART algorithm proposed by Breiman et al. in 1984.
Read more → - 2018 · 05 · 18
IO Classes in C++
C++ does not handle input and output directly; instead, it relies on a set of types defined in the standard library to deal with IO. These types support IO operations that read data from devices and write data to devices, where a device can be a file, a console window, and so on. Some types also allow in-memory IO, that is, reading data from a string, writing data to a string, and the like.
Read more → - 2018 · 05 · 18
Solving Optimization Problems with Equality Constraints
This article discusses optimization problems of the following form
Read more → - 2018 · 05 · 11
Duality in Linear Programming
Every linear programming problem has a corresponding dual problem. The dual is itself a linear program, and the dual of the dual is the original problem. The optimal solution of the primal can be obtained from the dual; sometimes solving a linear program via duality theory is simpler and reveals the essence of the problem more clearly. Inspired by duality theory, the performance of the simplex method has been improved, and some non-simplex methods for solving linear programs have emerged, which this article does not cover in detail.
Read more → - 2018 · 05 · 04
Parameter Passing in C++ Functions
In a C++ program, calling a function requires passing it arguments. Apart from an empty parameter list (void), parameter passing comes in two kinds: pass by reference and pass by value.
Read more → - 2018 · 05 · 04
The Simplex Algorithm for Solving Linear Programming Problems
In 1947, Dantzig proposed a method for solving linear programming problems, now known as the simplex method. It is a concise and efficient algorithm, hailed as one of the ten algorithms with the greatest impact on scientific development and engineering practice in the 20th century.
Read more → - 2018 · 04 · 27
An Overview of Linear Programming
Among optimization problems there is a class known as linear programming problems, which belong to constrained optimization. Linear programming is the problem of finding the extremum of a linear objective function under linear constraints (equalities or inequalities).
Read more → - 2018 · 04 · 26
The const Keyword in C++
When programming we often need to define a kind of variable whose value never changes, for example pi=3.14, e=2.72, or the elastic modulus of a material. That is when the const keyword comes in.
Read more → - 2017 · 03 · 29
Self-Tuning PID Parameters with a Genetic Algorithm (Simulink Implementation)
Automatically tuning PID parameters with a genetic algorithm in MATLAB / Simulink: building the simulation platform, designing an error-centric objective function, writing the interface function between the GA and Simulink, and iterating with a main function. Of limited practical use, but worth noting as a learning method.
Read more → - 2017 · 03 · 27
How Genetic Algorithms Work, with a MATLAB Implementation
Walking through the complete genetic-algorithm pipeline on a classic problem—finding the maximum of a two-variable function—covering individual encoding, the initial population, fitness, selection / crossover / mutation, with a MATLAB implementation and a quick reference for the most-used functions in the Sheffield GA Toolbox.
Read more → - 2017 · 03 · 26
Plotting a Vehicle's Dynamic Performance Curves with MATLAB
A major assignment from my Automotive Theory class: given the parameters of a medium-duty truck, use MATLAB to compute and plot the driving-force / road-resistance balance diagram, the acceleration-time curve, the dynamic-factor diagram, and the power-balance diagram, and along the way find the top speed and the maximum gradeability.
Read more →