    • I use JupyterLab as my main coding environment for my data science work, but I'm looking for a tool (ideally integrated into the Jupyter ecosystem, but that's not necessary) in which I can take notes and jot down thoughts, ideas, etc. for my machine learning experiments, and keep them organized and easily searchable. Any ideas?

    • I haven't tried JupyterLab (I know, it's the next-gen UI, can't keep up with everything :) ), but Jupyter notebooks were, well, notebooks; is that functionality lost in Lab?

      If it can be an external tool, I would look to either Evernote, Google Keep or OneNote.

    • My solution is not particularly automated or integrated. What I do is write code to print out the state of the whole pipeline: input data vs. output metrics. Here, standardization and annotation matter, so learn how to annotate smartly. I integrate a logger to print out each step in the pipeline (which file was loaded, the method used, data curation steps, etc.). Now I have complete visibility into input, steps, and output.

      Now it is time to do your lab notebook. I keep mine in Numbers. This is perhaps the most controversial part of my methodology. I chose Numbers (essentially Excel, but much better) because it has a very clean UI, is backed by a giant (Apple), and stays fast and stable even with large datasets and large figures. Importantly, the process of writing your notebook should not be too automated. It pays to manually organize results and write down your notes.
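The logging approach described above (a logger recording which file was loaded, which method was used, and each curation step) might look roughly like this in Python; the file name, method name, and metric value are hypothetical placeholders, not details from the post:

```python
import logging

# Minimal sketch of pipeline logging for visibility into
# input -> steps -> output. Names below are placeholders.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("pipeline")

def run_pipeline(path, method):
    """Run a toy pipeline, logging every step along the way."""
    log.info("loaded file: %s", path)
    log.info("method used: %s", method)
    log.info("curation step: dropped rows with missing values")
    metrics = {"accuracy": 0.0}  # placeholder output metrics
    log.info("output metrics: %s", metrics)
    return metrics

run_pipeline("data/train.csv", "random_forest")
```

The log output can then be copied, suitably annotated, into whatever lab notebook you keep alongside the experiments.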

    • Welcome to Cake, Oriol. 🙂 It's amazing how much traction JupyterLab seems to be getting. How has your experience been?

    • Yes, I do use Evernote or just text files in JupyterLab (or markdown cells in the notebook), but ideally I'd like to have a file directly linked to a notebook (or a project) where I can write notes to myself, etc. It would be like each notebook comes associated with a notes file that I can open and close with a shortcut to quickly jot down some thoughts, without having to open different programs or be careful with file names in order to find the notes file that corresponds to the notebook. Does that make sense?
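One way to sketch that idea: a small helper that maps a notebook to a sidecar Markdown notes file derived from the notebook's own name, so the pairing never depends on remembering a naming scheme. The helper name and the `.notes.md` convention are my own invention here, not an existing Jupyter feature:

```python
from pathlib import Path

def notes_for(notebook_path):
    """Return (creating it if needed) the sidecar notes file for a notebook.

    The notes file lives next to the notebook and shares its base name,
    so finding it is mechanical rather than a matter of memory.
    """
    nb = Path(notebook_path)
    notes = nb.with_name(nb.stem + ".notes.md")
    if not notes.exists():
        notes.write_text(f"# Notes for {nb.name}\n\n")
    return notes

# e.g. experiment.ipynb -> experiment.notes.md, right next to the notebook
```

A JupyterLab keyboard shortcut could then open this file in a side panel, though wiring that up would take a small extension.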

    • I see. I guess I'm lazier than you! Ideally, I would have a tool that, with a shortcut, opens a notes page that is linked to the current notebook so that I can jot down thoughts, document things, etc. For keeping track of the values of the experiment, I'm in the process of deploying MLflow at work. It looks promising, but I'm not sure it will do everything I need.

    • I really like it, and I use it daily for my work. It still doesn't have feature parity with jupyter notebooks and some things are annoying, but I'm hoping it's just a matter of time to iron out the remaining kinks. Have you tried it?

    • I'm a Learning Architect and Technologist at Teradata, and the trend I've seen within the industry has been Jupyter Notebooks. You can have your notes, thoughts, and comments in-line with your queries, results, etc. It's an easy integration with JupyterLab and a lot of other analytics providers (including Teradata Vantage - sorry for the shameless plug).

      I'm working today to put together a Citizen Data Scientist program and considered posting about the development and architectural/content decisions as we go.

    • It would be great to hear about the decisions you make as you go! As a data scientist I'm struggling with some of the data engineering weaknesses where I work (a startup without enough resources), so I'm keen on learning from what others are doing. As for the use of the Jupyter Notebooks, I do keep some comments in markdown cells to document what I'm doing, but I guess I find the need for a 'connected' notebook to the notebook (if that makes any sense) where I can write musings, ideas, etc., separate from the code which will also be read and executed by others at work.

    • I started attending Data Science Meetups, and that's where I found a lot of guys like you looking for better ways to do things. I learned more about real data science at the meetups than I did after 2 years of night school. There are a lot of bright minds and new technologies that will totally change the landscape - great for progress but challenging for analytics companies. I'm waiting for the day that microservices and containerization become the new norm - that will make sharing access easier without compromising security (today's tradeoff).

      I was a victim of my own success - a few of my first data science analytics proved valuable, and suddenly my time was spent re-tooling the analytics 10 different ways for 10 different teams. Next thing I know my time is spent running daily reports and parsing out the results to the appropriate audiences: feeding the monster I created.

      The focus of the last modules of the CDS course will be operationalization - taking the useful analytics you've created and turning them into a repeatable deliverable you can share with your organization without monopolizing your time. Too often we see great ideas turned capabilities turned memories because of the difficulty with those next steps.

    • Yes, I'm experiencing the problem of getting my analytics work to become something useful and lasting in my organization. I've been spending a lot of time building dashboards through Dash or Superset just to make sure that the analytics work doesn't become a one-off thing that is soon lost and forgotten. Part of the problem, however, is not only the infrastructure and work needed to turn data science products into useful reproducible analyses, but people's attitudes towards using new tools. After spending a lot of time building what my CEO needed, he ended up telling me that he preferred receiving Excel sheets periodically with the updated numbers...

      I'm curious about your CDS program. Is that something you're building internally for Teradata? If not, what platform will you be hosting it on?
