Flattening the (other) curve: Data Science Learning Resources

Greetings from Dataspace!

First of all, we here at Dataspace hope that you and all of your loved ones are safe and well in this time of uncertainty, and that we’ll all soon be back on our feet.

As the nation’s workers struggle with unemployment or adjust to working from home, it has become even more essential for each of us to pay attention to our self-care habits (or lack thereof).

Just as it is important to build exercise into our routines to keep our bodies healthy, we also need to make an effort to keep our brains fit and agile… and one of the best ways to do this is by learning new things!

Time for Homeschool: Data Science Learning Resources

In our upcoming newsletters, we’re hoping to provide you with some resources to help you beef up your analytics skills. There’s a difference between a data scientist and a programmer who knows data science toolkits, and both are important. A data scientist generally knows the technology but also understands the business, along with the statistics and techniques that can improve it.

So, rather than on technology, today we focus on the basic statistics and concepts that underlie modern analytics. Sharpen your pencil and let’s get started!

  • Why yes, I did study at Harvard (remotely, for free). Didn’t everyone? The edX website has a ton of free data science material produced by some top universities. This introduction to probability course, for example, is from Harvard. We dare you to view the intro video and not want to jump right in.
  • When you’re feeling like nothing is “normal” anymore… check out this five-part series on data science concepts; the first installment dives deep into statistics and distributions (normal or otherwise).
  • Just because you’re quarantined doesn’t mean you can’t go for a walk through a (random) forest. One “crowd favorite” data science technique is random forest classification, a way to build predictive models by combining the votes of many decision trees. This blog post provides a great introduction to what it’s all about.
  • What should I do if the machines take over the office while I’m gone? Keep learning about machine learning! This free lecture series from a real Caltech course covers the theory and practice of learning from data, for humans and machines alike. There are even “homework” assignments and a final exam available if you really want to feel like you’re back in school!
  • But can my computer keep me company while I’m stuck at home? If you’re interested in learning more about how machines process and understand human language, check out this Introduction to Natural Language Processing – what it is, how it works, and some common techniques.

Stay tuned next week for more learning-at-home resources, this time focusing on a few specific analytics tools.

Ben’s Take

My Coronavirus Project: Have any input for me?

Greetings (from six feet away, of course)!

One thing I’m doing to make lemonade out of this stuck-at-home crisis is to work on a piece of software I’ve been thinking about for a long time. It’s a cloud-based tool for finding matches across data sets. For example, it can recognize that the John Smith in your CRM system is the same person as the John Smith in your sales system, but a different person from the John Smith in your warranty system (and it works for any kind of data, not just people). It can return the results in bulk or keep the data synchronized over time, serving as a master data management (MDM) solution.
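To make the John Smith example concrete, here is a toy Python sketch of record matching in general. It is not how Golden Record works (its internals aren’t described here); the field names `name`, `email`, and `zip`, the 0.85 similarity threshold, and the rule “names must be similar and at least one other attribute must agree” are all hypothetical assumptions chosen for illustration. Name similarity uses the standard library’s `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'John  SMITH' equals 'john smith'."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical after normalizing."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: similar names AND a matching email or zip code."""
    if similarity(rec_a["name"], rec_b["name"]) < threshold:
        return False
    email_a = normalize(rec_a.get("email", ""))
    email_b = normalize(rec_b.get("email", ""))
    same_email = bool(email_a) and email_a == email_b
    same_zip = bool(rec_a.get("zip")) and rec_a.get("zip") == rec_b.get("zip")
    return same_email or same_zip


def find_matches(left: list, right: list, threshold: float = 0.85) -> list:
    """Brute-force pairwise comparison; returns (left_index, right_index) pairs."""
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if is_match(a, b, threshold)
    ]


# Made-up example records, one per hypothetical system.
crm = {"name": "John Smith", "email": "jsmith@example.com", "zip": "10001"}
sales = {"name": "john  smith", "email": "JSmith@example.com", "zip": "10001"}
warranty = {"name": "John Smith", "email": "john.smith@example.org", "zip": "94105"}

print(is_match(crm, sales))                     # True
print(is_match(crm, warranty))                  # False
print(find_matches([crm], [sales, warranty]))   # [(0, 0)]
```

Comparing every pair of records grows quadratically with data size, so practical matching tools typically narrow the candidate pairs first (often called “blocking”) before scoring them.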

Yes, I do realize that there are already matching and MDM products on the market. I’m hoping that this one, tentatively called Golden Record, will be different in a few ways:

  • It will be lightweight and cloud-based.
  • It will have both an API and a browser-based user interface.
  • It will provide both one-time matching and long-term, persistent MDM and integration.
  • It will be less expensive than existing solutions, which can cost hundreds of thousands of dollars.

I’ve heard from a few folks in the software industry about their need for something like this, but I could really use your input, too. In particular, if you have a few free minutes (and who doesn’t right now?), could you please let me know…

  • If you’ve addressed a need like the one I’m targeting, how did you do it?
  • Do you have, or anticipate having, a need for something like this?
  • Do you know of industries and use cases where Golden Record might be a good fit?

I’d love to talk if you’re up for it. Just email me at benjamin.taub@dataspace.com.

And, above all else, stay safe! Thanks for reading.
-Ben

Suggestions?

What do you think? How can we make this newsletter more useful to you? What topics would you like to see more of? Want to contribute an article? Just want to catch up and chat?

I’d love to hear from you! Email me at benjamin.taub@dataspace.com.