pszemraj

interview

Jul 12th, 2022
  1. what is the 🅱️iggest 🅱️atch size you have ever trained a model with?
  2. >
  3. > — A: I’m not really interested in data size. I’d rather have all data than a bigger model.
  4.  
  5. A:
  6.  
  7. It looks like you are just asking "how much data is a data scientist"? That is something impossible to answer without context. You can train a neural network on all the data in the world if you have the right hardware for it. This is where the term "big data" comes from. You can be a part-time data scientist if you want by collecting data on the side.
  8.  
  9. ---
  10.  
  11. what is the 🅱️iggest 🅱️atch size you have ever trained a model with? 🤔 — Deep Learning Book (@DeepLearningBook) September 8, 2019
  12.  
  13. The 🅱️iggest 🅱️atch size is 8192, and it was used to train a fully connected neural network on MNIST.
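To put that number in perspective, here is a minimal sketch of how a batch size of 8192 splits one pass over MNIST. It assumes the standard 60,000-example training split and plain sequential batching; the helper name `batch_slices` is hypothetical and not taken from the answer above.

```python
def batch_slices(num_examples, batch_size):
    """Yield (start, stop) index pairs that cover num_examples in order."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


MNIST_TRAIN_SIZE = 60_000  # assumption: the standard MNIST training split
BATCH_SIZE = 8192          # the batch size quoted in the answer

slices = list(batch_slices(MNIST_TRAIN_SIZE, BATCH_SIZE))
steps_per_epoch = len(slices)                     # gradient updates per epoch
last_batch_size = slices[-1][1] - slices[-1][0]   # the final, partial batch

print(steps_per_epoch, last_batch_size)  # prints: 8 2656
```

With a batch this large, one epoch is only a handful of gradient updates, so many more epochs are needed to reach the same number of parameter updates as a small-batch run.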
  14.  
  15. 💻
  16.  
  17. Deep Learning, TensorFlow, and AI: the Future of Data Management?
  18.  
  19. A new era of “data science” or AI in general has begun with the rise of machine learning (ML) and other AI based techniques and platforms. This, along with deep learning, has enabled a variety of previously impossible new capabilities in areas including:
  20.  
  21. Predictive analytics
  22.  
  23. Recommendations
  24.  
  25. Image/video/audio search and retrieval
  26.  
  27. Recommendations and Personalization
  28.  
  29. Video and Audio Synthesis
  30.  
  31. Speech Recognition and Language Understanding
  32.  
  33. Natural Language Processing (NLP)
  34.  
  35. The advent of these capabilities is causing a major shift in the way we interact with computers.
  36.  
  37. ---
  38.  
  39. what is the 🅱️iggest 🅱️atch size you have ever trained a model with? - [email protected](https://twitter.com/yhatlol/status/1172885162948953664)
  40. - `biggest_catch_size_in_model_history`: How big was the 🅱️atch, if any, in your model's history? - [email protected](https://twitter.com/yhatlol/status/1172885162948953664)
  41. - `best_catch_size_in_model_history`: What is the 🅱️iggest 🅱️atch size you have ever trained a model with? - [email protected](https://twitter.com/yhatlol/status/1172885162948953664)
  42. - `largest_caught_size_in_model_history`: 🆓 What is the 🅱️iggest 🅱️atch
  43.  
  44. ---
  45.  
  46. what is the 🅱️iggest 🅱️atch size you have ever trained a model with?
  47.  
  48. The 🅱️iggest I have ever trained a model with is about 2,000,000 items. This was for a recommender system whose output was, for each user, the items they would like. I ended up writing my own recommendation system rather than using an existing one, so a number of my own functions ended up in the model.
  49.  
  50. 🅱️hat are your 🅱️references for the above question?
  51.  
  52. I’m currently learning the area, but my go-to text is Chapter 5 of Machine Learning for Hackers by Drew Conway and John Myles White. I’m sure you have heard of Andrew Ng’s Coursera course?
  53.  
  54. 🅱️hat have you been most surprised by so far? I.e., is there anything that you thought you knew, but that you now realize is false?
  55.  
  56. I had expected training a matrix factorization model to take quite some time and a few weeks or months of data to get decent accuracy, but one of our users had a dataset of nearly 7,000,000 items and it took about 2 minutes! I’d never seen a neural network or gradient descent so lightning quick.
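The answer does not say how the matrix factorization model was trained, so the following is only a rough sketch of one common approach: stochastic gradient descent on observed (user, item, rating) triples. The function name `factorize`, the hyperparameters, and the toy ratings are all hypothetical and chosen only to keep the example small.

```python
import random


def factorize(ratings, num_users, num_items, rank=2,
              lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

P, Q = factorize(ratings, num_users=3, num_items=3)

# Mean squared error on the observed ratings.
mse = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
          for u, i, r in ratings) / len(ratings)
print(round(mse, 4))
```

Each update touches only one user vector and one item vector, which is part of why this family of models can train quickly even when the item count is large.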
  57.  
  58. 🅱️hat is your favorite ❗️tool when solving ❗️machine learning problems?
  59.  
  60. I’m currently using a library called PyMC3 for Bayesian models, and it’s super easy to use for creating latent variable models or neural-network-based models. I think it’s well documented and just works really well.
  61.  
  62. 🅱️hat do you like to 🅱️hack on 🅱️when not 🅱️using it for 🅱️your day job?
  63.  
  64. Since I’ve mostly been working with neural networks and deep learning, I do have a bit of a love for the Keras library (Python-based). There are also other libraries out there, a lot like the ones in R, that I still need to dive into more. Some other fun toys: scikit-learn is another really great Python-based machine learning library, as are XGBoost and TensorFlow.