More data, more problems (to solve)…

Greetings from Dataspace!

Another week, another newsletter! Before we get into any new news, a quick update on a piece of old news:

It looks like our new website was experiencing some problems of its own when we sent out our last newsletter with the launch announcement. Hopefully all technical issues have been resolved now, so if you weren’t able to see our new look before, give our site another try!

Time for Homeschool: Practicing with real data!

Speaking of recording problems in spreadsheets... this week’s online learning resource section includes links to some interesting datasets that give you the chance to practice the data science techniques you’ve been working on and hone your problem-solving skills.

Happy Learning!

  • I’ve got 99 problems… how do I pick ONE (data science technique)? If you’re at a bit of a loss for where to start, this article provides some insight into picking the right kind of data for the technique you are hoping to practice. Similarly, you can find suggestions for structured data science projects, along with links to the appropriate datasets, here.
  • My main problem right now is being tired of watching re-runs of my favorite reality TV shows… is there a data science technique for that? Not exactly. However, the VLOG Dataset curated by researchers at the University of Michigan (link not data science related, just nostalgic for Michigan football games) catalogues massive amounts of data gathered from Lifestyle Video Blogs, and also provides some resources discussing the best ways to tag, organize, and analyze this kind of data.
  • I’d rather do data science on other people’s problems, not mine. What is the right technique for me? Never fear, this tutorial walks you through working with streaming data (specifically, the Twitter API) and shows how to collect and analyze the information published by others online.
  • Enough tutorials, just show me the data! This free data repository at Harvard University provides access to massive amounts of research data across a wide variety of fields, from Astronomy to Law to Military History. Play as you please!

Stay tuned for more learning-at-home resources. Next time we’ll highlight some more unorthodox and creative applications for data science techniques.

Ben’s Take

Have you thought about what comes after COVID-19 (pattern-wise, COVID-20, I guess)?

Yes, almost everyone has cut back in a really big way. Things are tight now. But have you thought about what comes next? Sadly, when the current crisis ends, a number of us will be looking for new jobs. Others, however, will have to figure out where to go next. For those in analytics, that means answering some very important questions, like:
  • What projects are most important and need to continue?
  • Should we hire new staff to tackle our hot projects or does the risk of reoccurrence make that dangerous?
  • Has this bout of working from home made the concept of remote resources more, or perhaps less, appealing?

Frankly, we’re trying to figure out how companies are going to answer questions like these. As you may know, we provide both temporary contractors and permanent employees in analytics and data engineering. I, personally, think that companies are going to forgo hiring for a while and lean on contractors to meet immediate needs until the situation calms down and stabilizes. What do you think? Do you have a plan for what comes next? I’d love to hear what you’re thinking and, of course, I’m here as a sounding board if you need one.

More on my Data Matching project, aka Golden Record

So, it turns out that funky characters from alphabets other than traditional US English can throw off a database load routine. Who knew? I suspect it’s a problem that has followed Mr. Ziębo all his life, however. In other words, I was busy on Sunday.

In the broader picture, we are making great progress in our effort to develop a system that matches people and other things across data sets, and we remain on track for a late June proof-of-concept (POC) release. Interesting accomplishments in the past few weeks include:
  • We now have a web page that describes Golden Record (in perhaps too much detail).
  • We’ve tested our basic matching algorithms and, thank goodness, they work!
  • A number of folks have stepped forward to provide input on their needs and to describe situations where they face matching problems. (Thank you, thank you, thank you!)
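
For the curious, here is a minimal sketch of one way accented characters can be handled before matching names across data sets. It is an illustration only, not the Golden Record implementation: the function names, the 0.85 threshold, and the sample surnames are all assumptions made up for this example. It relies only on Python’s standard unicodedata and difflib modules.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Return a comparison key: accents stripped, case folded, spaces collapsed."""
    # NFKD splits a letter such as "ę" into a base "e" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def similarity(a: str, b: str) -> float:
    """Score two names between 0.0 and 1.0 after normalizing both."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return the closest candidate, or None if nothing clears the threshold.

    The threshold is a hypothetical value chosen for this sketch.
    """
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None


if __name__ == "__main__":
    # Hypothetical records: the same surname loaded with and without its accent.
    print(best_match("Ziębo", ["Smith", "Ziebo"]))
```

The normalized form is meant only as a comparison key, so the original spelling should still be stored as entered. Some letters, such as the Polish "ł", have no Unicode decomposition and would need their own handling.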

I am very eager to hear about other situations where matching records between data sets could be useful. If you know of a potential need and are willing to talk, please do reach out to me at Benjamin.Taub@Dataspace.com. I promise that I won’t try to sell you anything (at least not until after June 30). I just want to hear about your needs. And, if you’re comfortable sharing some sample data sets, that would be heaven! In any case, don’t hesitate to reach out if you have any input or questions for me. Thanks!

That’s all for now. Thanks for reading. Until next time, please don’t let anyone sneeze within six feet of you.

-Ben

Suggestions?

What do you think? How can we make this newsletter more useful to you? What topics would you like to see more of? Want to contribute an article? Just want to catch up and chat?

I’d love to hear from you! Email me at benjamin.taub@dataspace.com.