Advice for Machine Learning PhD Students
This is my collection of advice and principles for machine learning PhD students, distilled from years of research experience, supervising others, and reflecting on what has worked well (and what hasn’t).
The Three Pillars of ML Research
Machine learning research rests on three pillars, in order of importance: (1) understanding, (2) communicating, and (3) implementing. Spend most of your time understanding the problems and their solutions; the better you understand them, the less time you will spend coding.
Research Philosophy & Mindset
Clarity is all you need. Research often boils down to reducing problems, solutions, and ideas to their simplest version. Refactor and rework the math until it’s so clear that an outsider could grasp the ideas with ease at first glance. This often takes time, but it is how you arrive at the best version of your contribution. Implementing, writing, and publishing become much easier once the conceptual work is done.
Focus on problems, not solutions. Instead of looking for solutions, look for problems that are true, novel, and significant. Find open problems by looking at what the state of the art can’t do, does poorly, or ignores.
Make a point. Don’t just present a method with 2% better performance. So what? There are thousands of papers every year: why should one care about a 2% improvement? Why is this important and significant; what do we learn from this; why should everyone know this method?
Don’t be pretentious. Math for math’s sake is rarely useful in machine learning. Don’t drown your audience in theory unless it matters. Keep things as simple as possible and tied to the real-world problem you are solving.
Benchmarks are not science. This may be controversial, but I think it’s important to say: papers should not be treated as benchmark competitions, but as opportunities to identify novel research problems, understand their root causes, and address them. Benchmark tables by themselves are not scientifically interesting: every year new methods come out and errors go down, often with little gain in insight. Instead, aim to understand the qualitative improvement behind a contribution, to find gaps in the literature, or to expose problems behind SOTA models. These insights usually come from understanding related work and your own model in more depth. Solve true open problems, and the SOTA results will follow.
It’s a marathon. A PhD is around 1000 days of work. Plan long term and check your progress twice a year.
Keep learning. During your PhD you need to become the world’s top expert on your topic. This means reading hundreds of papers. Most scientists know one thing very well and apply it everywhere: differential geometry, Bayesian methods, numerics, etc. This makes publishing much easier.
Stay curious. The field moves fast, but fundamental principles remain. Always ask “why” and “what if”. Read widely beyond your narrow specialization. Attend seminars outside your immediate area.
Coding & Engineering Excellence
Become a coder wiz. Learn to automate your workflows: code, experiments, runs, logging, analysis, plotting, results. Make sure you can reconfigure, restart, and reanalyse your experiments in “one click” on a GPU cluster. Write clean code and refactor often. Learn the latest tools and frameworks. Ask your colleagues for their best practices. You can no longer survive a PhD without being a proficient ML engineer. This will multiply your research output.
Learn git. Version control is essential for any software project, research or not. If you don’t know how to use git, learn it NOW: how to branch, merge, and open pull requests. This will save you from countless headaches and lost work. You can’t do a PhD without knowing git professionally.
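A minimal sketch of “one-click” reconfiguration, assuming experiments are driven by a flat config dictionary. It expands a sweep grid into individual runs, gives each run a deterministic ID derived from its config (so rerunning the same config lands in the same place), and emits one launch command per run. The script name `train.py`, its flags, and all function names here are hypothetical; on a real cluster you would hand these commands to your scheduler instead of printing them.

```python
import hashlib
import itertools
import json


def run_id(config: dict) -> str:
    """Deterministic short ID: identical configs always map to the same run."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest()[:8]


def expand_sweep(base: dict, grid: dict) -> list[dict]:
    """Cartesian product of the grid values, each merged over the base config."""
    keys = sorted(grid)
    configs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(base)
        cfg.update(zip(keys, values))  # grid values override base defaults
        configs.append(cfg)
    return configs


def launch_commands(configs: list[dict], script: str = "train.py") -> list[str]:
    """One shell command per run; `train.py` and its --key=value flags are hypothetical."""
    cmds = []
    for cfg in configs:
        flags = " ".join(f"--{k}={v}" for k, v in sorted(cfg.items()))
        cmds.append(f"python {script} {flags} --run_id={run_id(cfg)}")
    return cmds


if __name__ == "__main__":
    base = {"model": "mlp", "epochs": 50, "seed": 0}
    grid = {"lr": [1e-3, 1e-4], "seed": [0, 1, 2]}
    for cmd in launch_commands(expand_sweep(base, grid)):
        print(cmd)
```

Changing the sweep then means editing `grid` and rerunning one script; the config-derived IDs keep results from different sweeps from overwriting each other.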
Exploit LLMs wisely. Use LLMs to code faster, write better papers, do literature surveys, and solve math. LLMs are a game-changer in how we do science. But don’t rely on them blindly: always verify their outputs, and use them as a tool to increase your productivity, not as a crutch to avoid learning the fundamentals.
Document everything. Keep detailed notes of experiments, failures, ideas, and insights. Use a lab notebook (digital or physical). Tag files and code consistently. What you forget today will be impossible to reproduce tomorrow.
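One lightweight way to keep a digital lab notebook is an append-only JSON Lines file with one record per run and consistent tags you can search later. The sketch below assumes that layout; the function names, record fields, and file name are illustrative, not a standard format.

```python
import json
import time
from pathlib import Path


def log_entry(path: Path, tags: list[str], config: dict, metrics: dict, note: str = "") -> dict:
    """Append one experiment record to an append-only JSON Lines notebook."""
    record = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "tags": sorted(tags),  # consistent tag order makes records easy to diff
        "config": config,
        "metrics": metrics,
        "note": note,  # the "why" and the "that's odd", in plain words
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def find(path: Path, tag: str) -> list[dict]:
    """Return every record carrying `tag`, oldest first (file order)."""
    p = Path(path)
    if not p.exists():
        return []
    records = []
    with open(p, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                if tag in record["tags"]:
                    records.append(record)
    return records
```

Because the file is only ever appended to, failed runs stay on record next to successful ones, which is exactly the history that is hardest to reconstruct later.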
Do your homework. Follow good ML principles and keep the quality bar high. Don’t take shortcuts: they will come back to cause trouble later. Understand your own code, data, and literature throughout. Don’t rush to implement a cool idea before you’ve gone through all the related papers. Be prepared for any question a colleague could ask. Don’t submit unfinished manuscripts with sloppy presentation and incomplete results.
Debugging & Problem Solving
Debug to understand. When things are not working, visualise everything: the loss, the optimisation, the network, the activations, the weights, the data, the likelihood, the gradients, the layers, etc. A “that’s odd…” moment will come.
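Before plotting, a cheap first pass is to print summary statistics for every tensor you track. The sketch below uses plain Python lists as stand-ins for flattened weights, activations, or gradients; the function names and the thresholds for “exploding” and “dead” are arbitrary assumptions to tune for your own model.

```python
import math
import statistics


def tensor_summary(name: str, values: list[float]) -> dict:
    """Summary statistics for a flattened tensor (weights, activations, gradients)."""
    finite = [v for v in values if math.isfinite(v)]
    return {
        "name": name,
        "n": len(values),
        "non_finite": len(values) - len(finite),  # NaN or +/-inf entries
        "mean": statistics.fmean(finite) if finite else float("nan"),
        "std": statistics.pstdev(finite) if finite else float("nan"),
        "max_abs": max((abs(v) for v in finite), default=float("nan")),
        "frac_zero": sum(v == 0.0 for v in finite) / len(values) if values else float("nan"),
    }


def odd_flags(s: dict, explode: float = 1e3, dead: float = 0.9) -> list[str]:
    """Heuristic 'that's odd' checks; thresholds are illustrative assumptions."""
    flags = []
    if s["non_finite"]:
        flags.append("non-finite values (NaN/inf)")
    if s["max_abs"] > explode:
        flags.append("very large magnitudes (exploding?)")
    if s["frac_zero"] > dead:
        flags.append("mostly zeros (dead units / vanishing?)")
    return flags


if __name__ == "__main__":
    s = tensor_summary("relu1_activations", [0.0] * 19 + [1.0])
    print(s, odd_flags(s))
```

Logging these numbers per layer and per step turns a vague “the loss is weird” into a concrete lead, such as one layer whose activations are almost all zero.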
Understand the data. If you are working on a particular application with non-trivial data, spend time understanding the data. Look at it, visualise it, study how it is represented and preprocessed.
Backtrack to debug. When things don’t work, backtrack until you reach a solid foundation that you fully understand and that works exactly as expected in every way. Then add your components back one at a time, verifying each one. Example: my ELBO is not converging -> remove the KL term -> it converges -> the KL term is the problem!
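Assuming the model in this example is a standard variational autoencoder (the text does not say which model it is), the ELBO splits into a reconstruction term and a KL term. Putting an explicit weight β on the KL term makes the backtracking step a one-line switch: β = 0 removes the KL term, β = 1 restores the full ELBO.

```latex
\mathcal{L}_\beta(\theta, \phi; x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
  \;-\; \beta \, \underbrace{\mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{\text{KL term}},
  \qquad \beta \in \{0, 1\}
```

If training converges with β = 0 but not with β = 1, the reconstruction path is a solid foundation, and the next place to look is how the KL term is computed and scaled.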
Slow down to speed up. Spend time understanding the problem you want to solve, and verify it exists. Formulate hypotheses on how to improve. Start from trivial baselines (random forest, linear regression), simple neural networks, and pre-trained SOTA baselines. Run a sequence of more and more complex models, where ideally you change and quantify only one thing at a time. You want to do coordinate descent: move along one design axis at a time (is A better than B? Is C better than D?). This way you can understand the contribution of each component, and build up to a strong final model. Don’t jump to complex models before understanding the problem and the solution.
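The coordinate-descent idea can be sketched as a loop that changes exactly one design axis at a time and keeps a change only if it improves the validation score. In this sketch `evaluate` is a hypothetical stand-in for “train on the training split, score on the validation split”, and the toy scores in the demo are invented for illustration only.

```python
from typing import Callable


def coordinate_search(
    axes: dict[str, list],
    start: dict,
    evaluate: Callable[[dict], float],
    sweeps: int = 1,
) -> tuple[dict, float, list[tuple[str, object, float]]]:
    """Vary one design axis at a time, holding the others fixed (higher score is better).

    Returns the best configuration found, its score, and a log of every
    single-axis comparison that was run.
    """
    current = dict(start)
    best_score = evaluate(current)
    history = []
    for _ in range(sweeps):
        for axis, options in axes.items():
            for option in options:
                if option == current[axis]:
                    continue  # already evaluated as part of `current`
                candidate = {**current, axis: option}  # change exactly one thing
                score = evaluate(candidate)
                history.append((axis, option, score))
                if score > best_score:
                    current, best_score = candidate, score
    return current, best_score, history


if __name__ == "__main__":
    # Toy validation scores, invented for illustration only.
    arch_score = {"linear": 0.70, "mlp": 0.80, "transformer": 0.78}
    reg_bonus = {"none": 0.00, "dropout": 0.02}

    def toy_evaluate(cfg: dict) -> float:
        return arch_score[cfg["arch"]] + reg_bonus[cfg["reg"]]

    best, score, log = coordinate_search(
        {"arch": list(arch_score), "reg": list(reg_bonus)},
        {"arch": "linear", "reg": "none"},
        toy_evaluate,
    )
    for axis, option, s in log:
        print(f"{axis}={option}: {s:.2f}")
    print("best:", best, f"{score:.2f}")
```

The `history` log is the useful by-product: each entry answers one “is A better than B?” question. The trade-off is that one-axis-at-a-time search can miss interactions between axes, so a second sweep (`sweeps=2`) is a cheap sanity check.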
Don’t tell me “it doesn’t work”. This is one of the most common phrases I hear from students (MS and PhD alike). Even outside research, an engineer who just tells a manager “it doesn’t work” is not helping. It doesn’t tell us anything about the problem, the solution, or the next steps. It’s a dead end. Instead, be specific: what exactly doesn’t work? What did you expect to happen, and what happened instead? What steps did you take to try to fix it? Be proactive!
Writing & Communication
Learn LaTeX. LaTeX is the standard for scientific writing in machine learning. Ask your colleagues and your supervisor for their templates and macros.
Write simple papers. Write in a way that is accessible to a non-expert reader. Use illustrations, colors, and short paragraphs. Use LLMs to polish the language. Shorten, distill, and simplify as much as you can. If you can’t write a simple paper, the idea is not yet ready. Be explicit and use precise math.
Get to the point. Write what you want to say, and nothing more. As a reader I want to see an abstract of 10 to 15 lines and a two-paragraph introduction: please do this. Bullet your contributions. Related work often works best at the end of the paper. Use short paragraphs, and use \paragraph to title them. Papers often flow well when whitespace is maximised. Make a feature table comparing your method with related methods. Put conclusions in boxes. Color equations. Use \underbrace to explain complex formulas. Use LaTeX templates that encourage clear writing (e.g., those used by ICML, NeurIPS, JMLR).
Spend time on figures. Always try to make a good “abstract” figure for page 1 that illustrates the main innovation, even if your paper is mostly theoretical. Animations typically make even the most complex ideas understandable.
How to present in meetings. Make slides for every meeting, but don’t waste time making them pretty. Always include a “big picture” context slide at the beginning and a “todo” slide at the end. Distill your ideas to their simplest version: what are the main points you need to convey? Great slides usually have one picture and little text (closer to 10 words than 100 per slide). Distribute an agenda beforehand and a summary afterward.
How to prepare talks. Start with a clear outline of the main points you want to convey. Make a “big picture” slide at the beginning and a “takeaway” slide at the end. Skip heavy math and details, and focus on intuition and motivation. Remember that an average listener has a mental budget of at most five equations and will stop listening if you present more. Use animations to illustrate complex ideas. Don’t read from the slides and don’t rush through them (budget 2 to 3 minutes per slide).
Advertise your papers. Make a website for each paper you publish. Release the code, and spend time polishing it: demos, tutorials, and notebooks. Write a friendly blog post for each of your papers. The more user-friendly your method is, the more citations and impact it will get. Often the most famous method in a field is not the best one, but the one that is easiest to use and has the best documentation.
Be visible. Have a website where colleagues and bigshots can find you. If you have no papers yet, a technical blog is a good way to show your expertise.
Collaboration & Organisation
Don’t hide from your supervisors. Supervisors love talking about science, being challenged, and hearing about your ideas. Most likely, the supervisor has a clear vision of what they want to do, and is waiting for you to come up with the details. Don’t be afraid to ask for advice, feedback, and help. But don’t just ask for suggestions: come with your own ideas and solutions (and don’t be afraid to be wrong).
Be honest. Tell your supervisor when you don’t understand something or when you are struggling. Don’t nod if you didn’t understand; ask for clarification. Pretending otherwise makes you difficult to work with, and so does implying that you are doing fine when you aren’t. Never cancel meetings.
Calendar, not TODO lists. Don’t make TODO lists. They expand until they become unbearable, and you restart. Instead, block time for tasks on your calendar and put deadlines on them.
Ask to supervise. Later in your PhD (at least 1 year in), ask your supervisor if you can supervise a master’s student or an intern. This is a great way to learn how to mentor others, and to get help on your projects. It also looks good on your CV. Supervising others will also help you clarify your own ideas and research direction.
“Inspired by the advice of Markus Heinonen, Aalto University”