Tools and techniques for management reporting and analysis have evolved since computers first came out. One can argue that the first management reporting tool was COBOL (COmmon Business Oriented Language). It allowed business people to get data out of systems created by computer people. COBOL was eventually replaced for reporting by tools like Information Builders (IBI), which started as programming languages used for reporting.  

 

Spreadsheets appeared as a way to analyze and present data. Originally they couldn’t access databases but that has long since changed.  

 

Since then we have moved on to decision support systems and business intelligence, with tools like Business Objects, MicroStrategy, Tableau, Qlik, Cognos, and many others. 

 

What almost all of these tools have in common is that they let non-technical managers access and present data. For example, a sales manager using Tableau can grab sales data for the past two years and see it presented in a pivot table showing the top customers by region and in a chart showing the trend in overall sales. However, almost all applications of these business intelligence tools are backward looking. They present summaries of the data we own. 

 

The next step in this reporting evolution is data science. Data science is more about identifying correlations, calculating probabilities, and predicting the future than reporting the past. Thus, one may use a data science model to predict the likelihood that a customer will purchase based on their zip code (and, in most cases, a host of other factors).  
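
As a rough illustration of the zip code example, here is a minimal Python sketch that estimates the likelihood of a purchase from past outcomes. The function name, the sample history, and the smoothing toward a 50% prior are all assumptions made for illustration; a real model would draw on many more factors.

```python
from collections import defaultdict

def purchase_rate_by_zip(history, prior_rate=0.5, prior_weight=2.0):
    """Estimate P(purchase | zip code) from (zip_code, purchased) pairs.

    Each estimate is pulled toward prior_rate so that zip codes with
    only a few observations do not produce extreme probabilities.
    """
    counts = defaultdict(lambda: [0, 0])  # zip code -> [purchases, total]
    for zip_code, purchased in history:
        counts[zip_code][0] += int(purchased)
        counts[zip_code][1] += 1
    return {
        zip_code: (bought + prior_rate * prior_weight) / (total + prior_weight)
        for zip_code, (bought, total) in counts.items()
    }

# Hypothetical past outcomes: (zip code, did the customer purchase?)
history = [
    ("48104", True), ("48104", True), ("48104", False), ("48104", True),
    ("49001", False), ("49001", False),
]
rates = purchase_rate_by_zip(history)
```

With this made-up history, the estimate for 48104 is about 0.67 and the estimate for 49001 is 0.25. Neither number is a fact about any individual customer; each is a likelihood derived from what similar customers did in the past.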

 

While it’s not initially obvious, prediction extends into realms like facial recognition. Most facial recognition is performed using data science tools and data about people the system already knows, such as photographs tagged with people’s names. These systems don’t know that you are who you say you are; they calculate a likelihood that you are that person. To reinforce the point that this is simply a prediction, and that predictions sometimes fail, one study found that an Amazon facial recognition tool misidentified 28 members of Congress. 

 

So, the major difference between data science and business intelligence is this focus on being forward, rather than backward, looking.  However, there are a few other differences.

 

TOOL SETS: As you might expect, data scientists use different tools than do BI users. Whereas most BI tools provide users with point and click interfaces to access and format data, data science tools tend to be programming languages, like Python or R, with add-in libraries of code tailored to specific problems.

 

AUDIENCE: While vendors are developing end-user-accessible data science tools, the vast majority of data science technologies require strong technical skills as well as a background in statistics, math, or a related field. As a result, data science technologies are used by a very limited set of people. These people are called upon to build ‘predictive models’, the results of which are distributed to much broader audiences.

 

PROGRAMMING COMPETENCE: Because BI tools are intended to give easy data access to managers, they do not require much technical skill (beyond that needed to initially configure them). Data science tools, however, do require programming chops. There are differences between the job of a programmer and the job of a data scientist, but that’s a topic for another post.

 

So, if you’re just getting into this field, what do you need, business intelligence or data science? This is obviously not a simple question to answer. You likely want to start with business intelligence as almost all companies moving into data science have. Not only will it give your managers access to their data but BI tools will be used by your future data scientists as they investigate that data. Once you’re comfortable with these tools, then take the leap into data science. Remember, whatever you do, purchasing technologies just to ‘keep up with the Joneses’ is ill-advised. Start with a clear picture of what you’re expecting and know what it is you want to do that you can’t do today.

 

Finally, if you have any questions, reach out to Dataspace! We’d love to help you navigate this powerful yet complex space!

contractors for data science

Data science is, and has been, in vogue.  Every forward-thinking company wants to have a data science program because by now it is conventionally understood that it will improve profitability and efficiency across the business.

 

To some companies with troves of data waiting to be ‘data scienced,’ this means hiring data scientists willy-nilly even if they don’t know exactly what they’ll be doing.  Several months down the line, many find that nothing meaningful has come of their data science efforts and that they have invested in a team of highly-paid individuals who are now looking to jump ship.

 

We call this the “Step 2 Problem”.  Companies erroneously think that the next step after collecting or simply having data is hiring data scientists and building a team – but what many companies fail to realize is that a data science strategy must precede the onboarding of data scientists themselves.

 

Having a plan is key to hiring and retaining top talent

 

The data science market is competitive and it can take more than just generous compensation to lure and retain talent.  While it is certainly still the case that money, benefits and corporate culture are crucial factors, we often find that data science candidates want more – they want to understand the part they’ll play in a well-considered corporate data science strategy and the specific nature of the problems they will be solving.

 

We often come across data scientists that are interested in a change because they feel underutilized in their current jobs despite having all of the necessary skills.  These candidates tend to favor opportunities with companies that can clearly articulate the path towards achieving the objectives of their data science strategy as well as how that data science strategy plays into the overall business strategy.  So, be prepared and keep strategy in mind when crafting job descriptions and reaching out to candidates that you want to bring into your fold. They want to know that your organization will be able to give them the mentorship and skill development that they need to grow within the industry as well as a few feathers in their caps in the form of successful projects.  

 

Do you need help conveying your strategy?

 

We are always happy to kick ideas around and can also help you convey your data science strategy to prospective candidates.  We speak the language of data science and can help you and your organization avoid falling victim to the Step 2 problem.

 

Dataspace can help with job descriptions, screening, and interviewing candidates for your team. We have an unusually high rate of success in finding and screening the data science practitioners companies struggle to find on their own. In fact, most of our recruiting customers come to us because their internal teams are having trouble locating and, even more frequently, screening candidates.

As you may know, many big data technologies are defined as schema on read. What this means is that you can throw whatever you want on the disk and then, when you need the data, you tell the data store what that data means (e.g. the second column contains price per unit). Traditional relational databases, on the other hand, are schema on write – you tell the system what your data will look like before you put the data into it. And, if your data doesn’t look like you said, the system will reject it (I originally said, ‘barf it back to you’ but that line was removed in editing).

 

I was recently reminded of the value of schema on write systems when I ported data from an old forecasting system (ex-Dataspace cadets may remember a system called PrOps) to a new, cloud-based system we’ve developed. I created the new schema in Google Cloud SQL, exported data from our old, MS Access database, and tried to import it into the new schema. Lo and behold, much of the data got rejected. Here’s just one reason why: MS Access had prefixed every dollar value with a dollar sign. Thus, the new database saw those values as strings, not floating point numbers. I then had to put in some effort to fix the load routines to address this.
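
The post does not show the actual fix to the load routines, so the following is only a minimal Python sketch of one way to handle the dollar-sign problem. The function name `parse_dollars` is hypothetical, and the choice of `Decimal` over floating point is an assumption made here to avoid rounding surprises with money.

```python
from decimal import Decimal, InvalidOperation

def parse_dollars(raw):
    """Convert text such as '$1,234.50' to a Decimal.

    Raises ValueError when the text is not a number once the dollar
    sign and thousands separators are removed.
    """
    cleaned = raw.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a dollar amount: {raw!r}") from None

# Hypothetical values as they might appear in the exported file
exported = ["$1,234.50", "$12", "$0.99"]
amounts = [parse_dollars(value) for value in exported]
total = sum(amounts)
```

Cleaning values this way before the import lets the strict target schema keep doing its job: anything that still fails to parse is rejected rather than silently stored as text.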

 

Suppose for a moment that our new database was in a schema on read technology, like Hadoop. It would have just let the data in, dollar signs and all. The problem wouldn’t be exposed until later, when someone tried to work with that data, getting strings when they expected floats and potentially getting a faulty result to their query.
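
The timing difference can be sketched in a few lines of Python. This is a toy model with plain lists standing in for the data stores, not how Cloud SQL or Hadoop actually work; the function names and the two-column layout are hypothetical.

```python
def write_with_schema(table, row, schema):
    """Schema on write: convert every field before storing, rejecting bad rows."""
    typed = {}
    for column, caster in schema.items():
        try:
            typed[column] = caster(row[column])
        except (KeyError, ValueError) as exc:
            raise ValueError(f"rejected row {row!r}: bad {column!r}") from exc
    table.append(typed)

def read_with_schema(raw_rows, schema):
    """Schema on read: rows were stored untouched; types are applied at query time."""
    for row in raw_rows:
        yield {column: caster(row[column]) for column, caster in schema.items()}

schema = {"item": str, "unit_price": float}  # hypothetical layout
rows = [
    {"item": "widget", "unit_price": "4.25"},
    {"item": "gadget", "unit_price": "$9.99"},  # dollar sign left in by the export
]

# Schema on write: the bad row is rejected while loading.
table, rejected = [], []
for row in rows:
    try:
        write_with_schema(table, row, schema)
    except ValueError:
        rejected.append(row)

# Schema on read: both rows are stored, and the problem appears only when queried.
raw_store = list(rows)
try:
    total = sum(r["unit_price"] for r in read_with_schema(raw_store, schema))
    read_error = None
except ValueError as exc:
    read_error = exc
```

In this sketch the schema-on-read query at least fails loudly because it casts each value. A query that skipped the cast would get the string back and could produce a faulty result without any error at all.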

 

It’s not that schema-on-read systems are bad, it’s just that they have their place. When developing applications where exactness is paramount, think very carefully about whether or not schema-on-read makes the most sense.

In the distant past (circa 2015), IT was responsible for providing data consumers with data, tools, and development expertise. Today, the landscape is shifting, especially in larger organizations. Nowadays, IT provides some of the data – the certified, clean data. Consuming departments fill the rest of the needs – tools, analytic experts, even non-certified data. This shift has been enabled by a number of factors including cloud technologies, an increase in technical competence in the population at large, and a desire to do more and do it more quickly. There’s still a role for IT’s analytic data sets, like data warehouses, but as just one component in a broader infrastructure that is increasingly controlled by the consumers of data.

 

How is your organization adapting to the new realities of analytics? I’d love to hear. Pop me an email or a call sometime.

 

-Ben

Predictive analytics is this super-complex field that only statisticians and data scientists can understand, right? Well, perhaps it takes some training to do it well but it only takes two sentences to understand what it’s all about: In predictive analytics we determine the likelihood of something by looking at data about it. We do this simply by looking for similarities between that data and data from past cases where we actually know the outcome. So for example, the Not Hot Dog app, based on HBO’s show Silicon Valley, doesn’t actually know that it’s looking at a hot dog; it is simply predicting the likelihood that what it sees is a hot dog based on actual hot dogs it’s been shown (i.e. trained with) in the past. It doesn’t tell you that there’s only a 72% probability that what it’s seeing right now is a hot dog but, in fact, that is what it ‘knows’. Simple.
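
Those two sentences can be turned into a small Python sketch. This uses a simple nearest-neighbor comparison, which is an assumption chosen for readability and not how an image app like Not Hot Dog is actually built; the two features and all of the sample values are made up.

```python
import math

def likelihood(new_case, past_cases, k=4):
    """Share of the k most similar past cases whose known outcome was positive."""
    nearest = sorted(past_cases, key=lambda case: math.dist(case[0], new_case))[:k]
    return sum(1 for _, outcome in nearest if outcome) / len(nearest)

# Hypothetical features: (length-to-width ratio, redness score).
# The outcome is True when the past case really was a hot dog.
past_cases = [
    ((5.0, 0.8), True), ((4.5, 0.7), True), ((5.5, 0.9), True),
    ((1.0, 0.2), False), ((1.2, 0.9), False), ((4.8, 0.1), False),
]

p = likelihood((4.9, 0.75), past_cases)
```

Here three of the four most similar past cases were hot dogs, so `p` is 0.75. The sketch never decides that the new item is a hot dog; it only reports how often similar past cases turned out to be one.
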
Pitching data science budget

Last week we offered some suggestions on how to attack your data management initiatives in 2018 according to your organization’s level of data expertise.  This week we follow up with some tips on how to pitch those new data science technologies and initiatives to increase your chances of getting resources allocated to your projects.

 

As a general rule, when pitching your budget for data-related activities, it is important to avoid simply listing products and technologies that you think your organization needs.  What senior executive is going to fund a bunch of acronyms that they’ve never heard of and don’t understand? A more effective approach is to provide a vision of the future where these technologies are helping answer important questions and achieve strategic goals.  For example, pitching the ability to know things that people in your organization desperately want to know about their customers will be far more successful than pitching a TensorFlow project that sources data from Hadoop.

 

Speak in terms of ROI

 

When seeking budget, you are essentially pitching an investment and making an argument for why you need the money.  You have to present your case as a story which ends with more money coming into the business than what goes into the project.  Start by defining a problem and then propose the technologies and people assets necessary to solve it.  Generally the dollar amount of such an investment and the costs needed to sustain the data science assets over time can be clearly articulated.

 

What is less clear is how this money, and more, is going to come back into the business, so this is where it is important to identify concrete potential for savings and new revenue areas.  What will the company see in return after one year?  Two years? Five years?
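
To make the one, two, and five year question concrete, here is a minimal Python sketch of a cumulative return calculation. Every figure in it is hypothetical, and it assumes flat annual costs and benefits with no discounting, which a real business case would likely refine.

```python
def cumulative_roi(upfront_cost, annual_cost, annual_benefit, years):
    """Return (total benefit - total cost) / total cost after the given years."""
    total_cost = upfront_cost + annual_cost * years
    total_benefit = annual_benefit * years
    return (total_benefit - total_cost) / total_cost

# Hypothetical project: $200,000 up front, $50,000 per year to sustain,
# and $150,000 per year in savings and new revenue.
for years in (1, 2, 5):
    roi = cumulative_roi(200_000, 50_000, 150_000, years)
    print(f"After {years} year(s): ROI = {roi:.0%}")
```

With these made-up numbers the project is 40% under water after one year, breaks even after two, and returns roughly 67% after five. Laying the story out this way shows management both when the money comes back and how much patience is required.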

 

Remember also that decision-making doesn’t rely exclusively on quantitative rationale and that it can be helpful to make emotional appeals as well.  Make note of efforts that your competitors are making so that your colleagues understand the consequences of falling behind.  Envision a future where data is easing challenges, making people’s jobs easier and playing a meaningful role in your company’s success.

 

Speak in terms of hot buttons

 

Big data can solve a broad spectrum of business problems.  It may be helpful to ask executives what their concerns are and also the roadblocks they face in trying to set the guiding strategy.  You will find that management is more eager to hand over cash when there is a benefit that resonates with their own agenda.

 

Consider Competitive Advantage

 

Look at your competitors and express how your project contributes to making your company’s offering better than theirs.  Will your data science initiative result in a far better product? Will it allow you to operate at a lower cost and pass that savings onto shareholders or customers?  Is the effort necessary simply because you have already fallen behind your competition and need to catch up?

 

If you are doing things correctly, your data science investments will alarm your competition and encourage them to follow suit.  (So, even if you’re successful in getting budget and changing your business, never let your guard down.)

 

Speak to the impact on people and processes

 

Management frequently thinks about their business in terms of people, process, and technology. While you’re hoping to introduce groundbreaking technology, management will be thinking about that bigger picture. In their minds, the technology is just part of the solution to a business problem. You need to think through a vision that shows who will use the technology, how they’ll be trained, and how the company’s business processes will be changed. As we’re fond of saying, the ROI on a new business intelligence system is zero – the system simply represents the I, the investment. The R, the return, comes from doing business differently with that system.

 

If you think your investment will take needless hassle and effort out of people’s lives, mention it!  Maybe the new system will change how the sales team targets accounts, allowing them to identify and prioritize leads with the greatest likelihood of purchasing.  Maybe it will automate administrative procedures for employees that could be spending their time on tasks demanding more intellectual rigor.

 

Demonstrating that processes and people’s lives can be improved in concrete and specific ways will strengthen your case and help you win resources.

 

Are you pitching data science projects for 2018? How are you justifying them to management? We love new stories from the field and would enjoy hearing more about your efforts to improve your company’s data strategy.

 

Additionally, we provide amazingly talented and rigorously vetted data science staff at affordable rates.  If you need more hands to shore up a new project, we would love to help you out.  Get in touch with us at 734.761.5962 or leave us a note at info@dataspace.com!

data science budget

It’s that time of year again!  With budgeting for 2018 at the forefront of your agenda, you may be wondering where to head with your data science efforts.  Investments in big data are often expensive, but when planned correctly you can manage costs and also ensure that your new systems and employees generate data-driven profits.  We at Dataspace wanted to share a few things our clients are pursuing or considering to improve their data management teams and systems to bring big insights to their 2018 strategy.

 

Our list is arranged from the least mature to the most mature data science environment. So, regardless of today’s capabilities, there’s a next step that you can take.

 

We use Excel, but it’s starting to feel inadequate

 

Spreadsheets were one of the earliest BI tools. So, for those of you who think you’ve been bypassed by the BI revolution, think again. However, if you don’t yet have a more modern BI tool, you’re probably working with messy, disparate spreadsheets that require manual data entry.  Maybe you’re frustrated by the complexity, the inability to integrate the various streams of data collection, or the lack of centralized control over who sees what and who can change the analyses.

 

The first step to amending this situation is to adopt a modern BI tool such as Qlik Sense, IBM Cognos or Tableau, which can link to the detail data in source systems and join it to data from the same Excel spreadsheets you currently use.  You will gain an integrated view for senior management and the ability for business users to correlate pieces of data from across the business. And, with their cloud offerings, it’s easier and cheaper to get into these kinds of tools than ever before.

 

We are currently working with a client that performs data analysis by running SQL queries and then manually porting the data into Excel spreadsheets.  The organization’s new CEO is frustrated by the unintegrated nature of the analyses she receives. She has no way to compare trends across her different business units or to drill into those trends. Other executives are frustrated by the amount of time it takes to create and refresh analyses and also by the static nature of the visualizations.  We are helping them implement Qlik dashboards to remedy these pain points.

 

We have a modern BI tool, but our data systems need help

 

Already have a BI tool in place?  The next step is thinking about the data itself and the source systems from which it originates.  It is time to consider building a data warehouse where you can integrate all of your data and make it easy for all employees to access and use for reporting.

 

Another one of our clients adopted Qlik Sense about six months ago. They’ve started rolling out applications and are seeing some significant successes. Now they’re taking the next step – building a data warehouse to allow easier access to more clean, reliable data.

 

Need help taking this step? Dataspace is experienced in helping clients plan and take the steps to build data warehouses.  

 

We Have a Data Warehouse

 

So you’ve built and know how to manage your data warehouse.  From this stage, we recommend two moves.

 

Step 1: Build your data science capabilities – develop or hire resources who know how to interface with the business, ask the critical “what if” questions and then use their technical skills to integrate data and reveal insights.

 

Step 2:  Provide these individuals with data.  This data is often unstructured, in the form of web logs, textual data, images, voice recordings or what have you.  This often entails moving to new data storage technologies such as Hadoop.  Hadoop is highly scalable and flexible in the sense that you can run it across many servers and process huge amounts of both structured and unstructured data.

 

We have Hadoop or other Big Data technologies

 

If you are already succeeding with Hadoop or different data storage and analysis platforms, the next step is to move towards using true predictive analytics.

 

Push your data science staff to adopt R, SAS, Python, TensorFlow and other tools, languages, and methodologies.  Artificial intelligence, predictive analytics, and machine learning will allow you to develop applications, like product recommendation engines and pricing optimization engines, that rely on advanced analytic techniques to inform future decisions. Further, think outside of your organization and consider applications tailored to your strategic goals and the needs of your customers.

 

We’ve started working with predictive analytics

 

If you are feeling good about your current data strategy, congratulations!  You are ahead of the curve so keep up the good work.  Please also keep in mind that Dataspace is able to assist with providing temporary resources for any projects you have requiring extra hands!  

 

Are you budgeting to make your next big step in the big data world? Do you already know your direction but need top-quality, expertly-vetted staff to execute your plan?

 

Either way, we can help! For more information, contact Ben at 734.761.5962 x 503 or email us at info@dataspace.com

actuary

The hot new thing, data science, isn’t so new after all. Since the advent of modern actuarial science in the late 1600s, insurance companies have relied on actuaries to use math and statistics to anticipate the future.

 

 

What Actuaries and Data Scientists have in common

 

Actuaries and data scientists share many of the same responsibilities and have similar educations and skill-sets.  They use many of the same techniques when analyzing data  to make informed predictions about the future. Techniques include statistics, data visualization and pattern recognition.  

 

Both can never get enough data – the more they get, the better chance they have of improving the predictive ability of their models. For example, consider the rise in importance of vehicle telematics for actuaries and data scientists alike. Vehicle telematics describes the systems that connect cars to the outside world, allowing them to communicate with other vehicles and pedestrians, physical infrastructure and the cloud to provide drivers with an enriched experience and to provide researchers with data sets about driving habits.

 

It is telematics technology that has provided the abundance of data that allows data scientists to, for example, optimize the behavior of autonomous vehicles and teach them how to behave better in relation to their surroundings.   

 

At the same time, insurance companies have identified an opportunity to use telematics to adjust premiums based on driver behavior, a technique known as usage-based insurance (UBI).  With real data about users’ driving behavior available to their actuaries, insurers can offer lower prices to responsible drivers and penalize those who display unsafe behavior while driving.  

 

Where Actuaries and Data Scientists differ

 

Problem Sets

 

Actuaries are found primarily in the insurance industry, where they focus on predicting the possibility of loss, estimating the cost of that loss, and proposing prices to charge that will allow the insurer to cover that loss and still make a profit.  For example, an actuary might be tasked with answering the following question: given information about renewal rates and claims volume on different insurance policies and the popularity of these policies across different populations, what is the lowest price we can charge for our policies and still expect to make a profit?

 

A data scientist, conversely, can be found in virtually any industry and would be tasked with a much wider array of problems to solve.  Designing the questions themselves is often just as important as finding the insights that the answers reveal.

 

Training

 

Actuaries have rigorous formal training, testing, and credentialing such as ASA (Associate of the Society of Actuaries), FSA (Fellow of the Society of Actuaries) and CERA (Chartered Enterprise Risk Analyst).   Actuaries likely have a more thorough and foundational knowledge of statistics than do many data scientists.  

 

A data scientist, by contrast, is often formally trained in statistics and programming, but there isn’t (yet) a formal recognition or testing of their abilities.  

 

Tools

 

Actuaries generally make use of SAS, Excel, VBA, and SQL on a frequent basis and also have additional knowledge of finance industry software such as MoSes and Prophet.  

 

A data scientist is probably more programming savvy than the average actuary and generally has a solid command of the previously mentioned languages in addition to C++, R, Python and NoSQL databases (Hadoop, etc.).  While actuaries may fall short of data scientists in terms of programming ability now, it is likely that the actuaries of tomorrow will develop a broader skill set to become more competitive with data science professionals.

 

Income Statement Focus

 

The purpose and goals of an actuary are strictly financial, and most of the problems they solve directly impact their companies’ bottom lines.  

 

Data scientists, conversely, address problems that hopefully impact the P&L positively in the long term but not necessarily immediately.  For example, their solution could involve optimizing the customer experience in some way to improve the reputation of the brand, the financial impact of which might not be felt for many years down the line, but which would certainly translate into increased revenue for a company.  

 

Communication Skills

 

Actuaries have a massive advantage in that their knowledge is very insurance industry-specific.  While data scientists are often expected to have stronger business acumen than the average statistician or programmer, they work across many industries and most certainly wouldn’t match the level of insurance expertise that a certified actuary would have.    

 

The Future

 

Which profession has the better prognosis in terms of utility and survivability?  Many people believe actuarial science will merge into the data science profession, and it is conceivable that many actuaries would love the flexibility of working across industries as opposed to being pigeonholed into the insurance space.  

 

Regardless of what the future holds, predictive analytics & big data is the place to be nowadays!

 

 

Trying to get more insights out of your data, or struggling to adopt a data-driven strategy? If superior staffing or consulting is the right formula, give us a call at 734.761.5962 or send us an email at info@dataspace.com!

If you have huge volumes of data, chances are that you can get value out of analyzing that data. But, would you derive significant value from analyzing it in real time? This is an important question because, while there are benefits to real time analytics, there are also real costs. To help you decide whether real-time analytics make sense, consider these questions.

 

Does data arrive to you on a constant basis?

 

Remember Doug Laney / Gartner’s textbook definition of ‘big data’? It tells us that big data isn’t just about the volume of data but it’s also about the variety and the velocity of that data. So, if you’ve got high velocity (e.g. machines are always humming, trucks are always driving and customers are always buying) you may see major operational improvements through real-time analytics.  On the other hand, if your company operates largely in batch mode, where the exact timing of transactions is relatively unimportant, it probably doesn’t make sense to build a real time analytics infrastructure.

 

Are you building rockets or launching rockets?

 

While both are necessary to delivering payloads, the job of building rockets is far different than the job of launching them. Building a rocket takes a long time. Plans are made, and changed, based on the data that is captured and aggregated over that long time. If a change is required, it is considered and implemented over time. The decision to abort a program could take months to make.

 

 

When launching a rocket, however, events happen and decisions must be made on time horizons of seconds. The decision to abort a launch needs to be made in seconds, or less. So, when considering a real time analytics investment, consider your situation: Is your business closer to building rockets…or launching them?

 

Are there decisions you would make differently based on changes that occurred in the past hour?

 

Imagine that you run a hospital emergency room and that you schedule a certain number of doctors to be in the facility throughout the day. One day, when you’re staffed for an average of ten patients per hour, you start seeing 12, and then 15, and then 18… What do you do? You probably start calling your on call staff to come in and pick up some of the unexpected load. In other words, you have the ability to make changes based on things you’re seeing at the present time.

 

 

Alternatively, imagine you run a railroad and you notice that you’re selling far more tickets for today’s trip to Kalamazoo than usual. You’ll likely be able to sell twice what you normally sell.  However, the train you have scheduled on that route is sized to your normal capacity and you don’t have any spares. While you hate to lose the revenue, there’s likely not much you can do about it in the short term. You can’t buy a new train with a moment’s notice. Thus, in this case, knowing in real time might be somehow comforting, but it doesn’t pay much in the way of dividends.

 

 

Notice that in both cases, however, analyzing trends over a long time period might provide insights on how to better set your capacity in the future. Which raises an interesting side point: even if you benefit from real time analytics, you can also probably benefit from analyzing your data in the traditional way, too.

 


Do your customers or employees need data immediately after it becomes available?

 

Have you ever entered data into a web site only to find that there’s a delay of a few hours before it becomes available for you to view? Some hotel and car reservation sites work this way. If you check for your reservation and don’t see it, you’re left wondering if it really got entered. Situations like this present a good argument for real-time analytics.

 

 

Situations like this require a good look at the dividing line between operational systems (those that do day to day transaction processing) and informational systems (those used for analysis). While the business need exists, should the data come from your operational or informational systems?

 

 

It’s sometimes helpful to consider what kind of data the user wants right after it’s entered. If it’s just atomic details (e.g. show me the reservation I just made), you may want to simply direct those queries to the original, operational system, or a real-time replica of that system. However, if the need is for data that aggregates the most recent with older data (e.g. show me the total of all my purchases for the year, including this most recent one), you may have a strong argument for real-time analytics and real-time analytic data storage.
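
The second case, combining the most recent data with older data, can be sketched in Python as follows. This assumes a stored total that is refreshed in batch plus a feed of purchases recorded since that refresh; the function name, the timestamps, and the amounts (in cents) are all hypothetical.

```python
from datetime import datetime

def year_to_date_total(batch_total, batch_as_of, recent_purchases):
    """Add purchases made after the last batch refresh to the stored total.

    recent_purchases is an iterable of (timestamp, amount_in_cents) pairs.
    Anything at or before batch_as_of is already counted in batch_total.
    """
    fresh = sum(amount for ts, amount in recent_purchases if ts > batch_as_of)
    return batch_total + fresh

# Hypothetical customer: total as of the midnight refresh, then today’s activity
batch_total = 125_000                      # cents, from the nightly aggregate
batch_as_of = datetime(2018, 3, 1, 0, 0)
recent = [
    (datetime(2018, 2, 28, 23, 0), 9_999),  # already included in the batch total
    (datetime(2018, 3, 1, 9, 0), 4_500),
    (datetime(2018, 3, 1, 14, 30), 1_250),
]

total = year_to_date_total(batch_total, batch_as_of, recent)
```

The answer here is 130,750 cents: the stored aggregate plus the two purchases made since the refresh. Whether the fresh purchases come from the operational system, a replica, or a real-time analytic store is exactly the design decision described above.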

 

 

Designing and implementing effective real-time analytical systems requires large upfront investment and significant monitoring costs.  In-memory databases, scalable platforms and other technologies required for analyzing high-transaction throughput activities are not easy on the budget and so it is important to understand which areas of an organization will benefit sufficiently from real-time insights.

 

 

While it is highly probable that real-time analysis can enhance your business, not every operational activity will require it.   It may be that the speed of customer interactions and business transactions flowing in and out of your operation isn’t fast enough to warrant the cost of these capabilities.  A lower batch-processing frequency such as daily or biweekly processing may give you all of the actionable data you are looking for.  In most companies, the customer-facing departments are those that benefit most from real-time processing, while backend activities such as purchasing and accounting are not time-sensitive enough to warrant the investment.

 

 

Do you feel like you aren’t moving as fast as your data?  Dataspace is a vendor-neutral provider of specialized expertise in business intelligence, data science, predictive analytics and data warehousing for companies nationwide.  Contact us today for more information on how our staffing and consulting solutions can help you make the most of your data assets.

There is chronic confusion about the distinction between a software engineer (i.e. a programmer) and a data scientist. This is understandable, considering that both jobs involve programming and that the term “data science” sounds so much like “computer science”. However, the two differ in some significant ways.

 

This week we break down the key differences between the two professions to clarify what being a data scientist really means.  

 

Software engineers “create the products that create the data” – Data scientists analyze the data

 

Software engineers work on front-end/back-end development, build web and mobile apps, develop operating systems and design software to be used by organizations.  Data scientists, on the other hand, focus on building predictive models and developing machine learning capabilities to analyze the data captured by that software.

 

Data scientists specialize in solving business problems that require statistical analysis. They take the data created by the organization’s systems and turn it into actionable insights and recommendations for purposes such as risk mitigation and demand analysis. And while a software engineer designs tools for recurrent use by the business (i.e. they build a system that is then used, relatively unchanged, for years), a data scientist often deals with discrete, situational analyses, each of which may require its own tailored tools and processes. So, for example, the software engineer might design and build an order entry system that the company uses for 20 years. The data scientist, on the other hand, might take data from that system one month to determine whether there is a correlation between customer geography and sales quantity and, the next month, to determine the effect of customer demographics on purchasing propensity by day of week and time of day.

 

A software engineer creates deterministic algorithms whereas data scientists create probabilistic algorithms

 

Every program that a software engineer writes should produce the exact same result every time it runs with the same inputs. For example, the programmer at Amazon.com knows that when you buy four items at five dollars each, the total sale will be $20.

 

Data scientists, however, because they are dealing with statistics, can’t always guarantee an outcome. Thus, the data scientist can’t say with certainty that, because you bought a hockey stick, you’re also going to buy a bag of pucks. But they can estimate the likelihood that you will and, from that, Amazon can decide whether or not to recommend pucks to you when you buy that stick.
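To make the contrast concrete, here is a small Python sketch. The order total is the deterministic case from the example above. The likelihood estimate is a deliberately simple illustration: the baskets are made-up data, `purchase_likelihood` is a hypothetical helper, and the 0.4 recommendation threshold is an arbitrary assumption, not how Amazon or any real recommender works.

```python
def order_total(quantity: int, unit_price: int) -> int:
    """Deterministic: the same inputs always give the same total."""
    return quantity * unit_price


# Made-up purchase history; each set is one customer's basket.
baskets = [
    {"hockey stick", "pucks"},
    {"hockey stick"},
    {"hockey stick", "pucks", "tape"},
    {"skates"},
    {"hockey stick", "tape"},
]


def purchase_likelihood(baskets, given: str, target: str) -> float:
    """Estimate how often `target` appears in baskets that contain `given`."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        return 0.0  # no evidence either way
    return sum(target in b for b in with_given) / len(with_given)


total = order_total(4, 5)
likelihood = purchase_likelihood(baskets, "hockey stick", "pucks")

# Hypothetical business rule: recommend when the estimate reaches 40%.
recommend_pucks = likelihood >= 0.4

print(total)            # 20, every time
print(likelihood)       # 0.5, an estimate that shifts as new baskets arrive
print(recommend_pucks)  # True
```

The first function can be checked once and trusted from then on. The second yields an estimate that is only as good as the data behind it, which is why the result feeds a recommendation rather than a guarantee.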

 

While there is some overlap, software engineers and data scientists use different tools.

 

Nowadays, programmers typically work with SQL databases and programming languages like Java, JavaScript, and Python.

 

Data scientists typically work with SQL databases too, as well as Hadoop data stores. They are more likely to work in Excel and frequently program with statistical software like SAS and R. There is also a big trend toward Python, but with different libraries (NumPy, pandas, etc.) than those programmers typically use. An interesting programming environment used by data scientists is Jupyter, a tool that allows the data scientist to write a few lines of code, show the intermediate result, add some documentation, and continue in that mode until a conclusion is reached. This approach helps make the final result, and how it was reached, more obvious to people reviewing and using it.

 

It is important to remember that although data scientists need coding skills, those skills are not what makes them special. Rather, it is their training in mathematics, statistics and social sciences that gives them an edge when it comes to solving business problems with data.

 

A good data scientist also has strong business acumen and a keen sense of observation, which they need in order to ask the right questions to guide their analytical process. They have an intellectual passion for getting insights out of data, as well as the strategic competence to integrate these insights with the overarching business strategy.

 

Do you need data science, big data, or data warehousing staff?

 

We at Dataspace have developed unique methods for understanding your needs and delivering the contract staff and consultants you need to succeed. Our clients tell us that our track record is unsurpassed. So, if you need help moving your data science efforts forward, let’s talk!