hashing

What is Hashing?

 

Hashing is simply passing some data through a formula that produces a result, called a hash. That hash is usually a string of characters, and the hashes generated by a given formula are always the same length, regardless of how much data you feed into it. For example, the MD5 formula always produces 32-character hashes. Regardless of whether you feed in the entire text of MOBY DICK or just the letter C, you’ll always get 32 characters back.

 

Finally (and this is important) each time you run that data through the formula, you get the exact same hash out of it. So, for example, the MD5 formula for the string Dataspace returns the value e2d48e7bc4413d04a4dcb1fe32c877f6. Every time it will return that same value. Here, try it yourself.

 

Changing even one character will produce an entirely different result. For example, the MD5 of dataspace (with a small d) yields 8e8ff9250223973ebcd4d74cd7df26a7.
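You can see both properties for yourself with a few lines of Python (a minimal sketch; the input strings are arbitrary examples):

import hashlib

# Wildly different input sizes, same 32-character output every time
for text in ("C", "Call me Ishmael. " * 1000, "Dataspace", "dataspace"):
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    print(len(digest), digest)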

 

Hashing is One-Way

 

Hashing works in one direction only – for a given piece of data, you’ll always get the same hash BUT you can’t turn a hash back into its original data. If you need to go in both directions, you need encryption, rather than hashing.

 

With encryption, you pass some data through an encryption formula and get a result that looks something like a hash. The big difference is that you can take the encrypted result, run it through a decryption formula, and get your original data back.
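Here’s a minimal sketch of that round trip using the third-party Python cryptography package (the message is just an example):

from cryptography.fernet import Fernet

key = Fernet.generate_key()        # a secret key; anyone who holds it can decrypt
f = Fernet(key)

token = f.encrypt(b"Dataspace")    # looks like gibberish, much like a hash...
print(f.decrypt(token))            # ...but decryption gives us b'Dataspace' back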

 

Remember, hashing is different – you can’t get your original data back simply by running a formula on your hash (a bit about how to hack these, though, in a moment).

 

What Hash Formulae are Available?

 

There are a huge number of widely accepted hashing algorithms available for general use. For example, MD5, SHA1, SHA224, SHA256, Snefru… Over time these formulae have become more complex and produce longer hashes which are, presumably, harder to hack.

 

Hashing capability is available in standard libraries in common programming languages. Here’s a quick example coded in Python (call me if you’d like to walk through this code – I’d love to chat!):

 

import hashlib

hash = hashlib.md5("Dataspace".encode('utf-8'))

print(hash.hexdigest())

 

The result comes back as: e2d48e7bc4413d04a4dcb1fe32c877f6

 

Notice that it’s the same as the hash value we created earlier! In the words of Bernadette Peters in THE JERK, “This s__t really works!”

 

Hashing and Passwords

 

When an online system stores your credentials, it usually stores both your username and password in a database. There’s a problem here, though: any employee who accesses the database, or any hacker who breaks into the system, can see everyone’s username and password. They can then go out to the logon screen for that system, type in that username and password, and get access to anything that you are allowed to do on that system.

 

However, if the system stores your password as a hash, then seeing it won’t do a hacker any good. He can see that the hash is, for example, 5f4dcc3b5aa765d61d8327deb882cf99, but he can’t use that to get into the system and look like you. He has no way of knowing that your password (i.e. the value you type into a logon screen) is actually the word password.

 

Can I Break a Hash? Can I Keep Someone Else From Breaking it?

 

Can hashes be hacked? Absolutely. One of the easiest ways is to use a precomputed list of words and the hash that each one produces. There are websites that publish millions of words and their related hash values; anyone (usually a hacker, actually) can go to these sites, search for a hash value, and instantly find what the value was before it was hashed.

To protect against this, security professionals use a technique known as salting. To salt a hash, simply append a known value to the string before you hash it. For example, if every password is salted with the string ‘dog’ before it’s hashed and stored in the database, the resulting hash will likely not be found in those online lookup sites. So, password salted with dog (i.e. passworddog) and then run through the MD5 calculator becomes 854007583be4c246efc2ee58bf3060e6.

 

To use these passwords when you log in, the system takes the password that you enter, appends the word ‘dog’ to it, runs that string through the hashing algorithm, and finally looks up the result in its database to see if you’re really authorized and if you’ve typed in the right password.
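Putting the pieces together, a login check might look something like this minimal Python sketch (it reuses the ‘dog’ salt and the hash quoted above; real systems use a unique, random salt per user and a slower algorithm such as bcrypt):

import hashlib

SALT = "dog"
stored_hash = "854007583be4c246efc2ee58bf3060e6"   # the salted hash quoted above

def login_ok(entered_password):
    candidate = hashlib.md5((entered_password + SALT).encode("utf-8")).hexdigest()
    return candidate == stored_hash

print(login_ok("password"))   # True if the quoted hash really is md5("passworddog")
print(login_ok("hunter2"))    # False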

 

Hey Ben, Do You Know of Other Cool Uses for Hashing?

 

Why, yes, there are some other great uses for hashing beyond storing passwords. Here are two:

  • Fighting computer viruses: When a computer virus ‘infects’ a program it does so by changing some of the code in that program, making it do something malicious. One way to protect against viruses, therefore, is to create a hash value for a program when it’s distributed to users (i.e. run the computer code through a hashing algorithm and get a hash). Then, whenever that program is run, create a new hash value for the file you’re about to run. Compare the new hash to the original hash. If the two values match then you’re fine. If they don’t match, someone has fiddled with your copy of the program.

 

  • Change data capture: When reading data into a data warehouse we frequently want to know if any records in our source system changed. To do this we sometimes read every field in every source record and compare it to every field in the related record in our data warehouse – a complex process that requires a lot of computer cycles. However, we can speed it up as follows:
    • Read all the fields in the source record, concatenate them together, and create a hash of the result
    • Compare that hash to a hash value that was stored on the related record in the data warehouse when it was last updated
    • If the two don’t match, you know that the source record has changed and the changes should be migrated to the warehouse
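Both of these checks boil down to just a few lines of code. First, a minimal sketch of the virus-style integrity check (the file name and the ‘known good’ hash are placeholders; real tools typically use SHA-256 rather than MD5):

import hashlib

KNOWN_GOOD = "<hash published when the program was distributed>"   # placeholder

def file_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):   # hash the file in chunks
            h.update(chunk)
    return h.hexdigest()

if file_hash("myprogram.exe") != KNOWN_GOOD:
    print("Warning: this copy of the program has been modified!")

And a minimal sketch of the change data capture comparison (the record fields and the stored hash are invented for illustration):

import hashlib

def row_hash(row):
    # Join fields with a delimiter so ("ab", "c") and ("a", "bc") don't produce the same string
    return hashlib.md5("|".join(str(value) for value in row).encode("utf-8")).hexdigest()

source_row = ("C-1001", "Acme Corp", "Ann Arbor", "MI")
warehouse_hash = "<hash stored on the warehouse record at last load>"   # placeholder

if row_hash(source_row) != warehouse_hash:
    print("Record changed - migrate it to the warehouse")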

 

So…

 

OK, this one got a little out of hand. I was asked to write a short paragraph for our monthly email and ended up with four pages of text. Thanks for hearing me out. I just think the concept of and uses for hashes are way cooler than most people realize.

 

If you’d like to talk about hashes, Python, data science, big data, or World War II aviation, please get in touch – I’d love to chat!

 

Ben

kafka
Number One in Dataspace’s Data Science Series: What does it mean?

 

If nothing else, the data science industry is good at coming up with new, unique, confusing names and terms. ZooKeeper, MapReduce, Hadoop, Pig, Storm, Mahout, MongoDB…the list keeps growing, and it’s totally understandable if you can’t always identify or explain the different technologies and tools of the industry.

 

In each article in this series we will select a term and give you a quick background on what it means and its implications for your data science efforts and broader data strategy.  This week we begin with Kafka!

 

Apache Kafka: What is it?

 

Originally developed at LinkedIn, Kafka is an open-source product from the Apache Software Foundation, the well-known provider of open-source software. It is a self-described “distributed streaming platform.” The website reads: “Kafka™ is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.”

 

What problems does it solve?

 

Kafka allows you to pass data between systems as transactions occur, in real time. Imagine a situation where you have multiple web properties, all kicking out transactions, and multiple downstream systems that need to read those transactions (e.g. a CRM, data warehouse, and order management system). Each of those systems could build a connection directly to the sources, creating a brittle, spaghetti-like architecture of interwoven systems.

 

With Kafka, however, each of those sources, known in Kafka as producers, writes its data just to Kafka.  Each of the downstream systems (known in Kafka as consumers) reads the data from Kafka.  The data is therefore organized for easy access.
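To make the producer/consumer idea concrete, here’s a minimal sketch using the third-party kafka-python package (the broker address and the ‘orders’ topic are hypothetical):

from kafka import KafkaProducer, KafkaConsumer

# A source system (a producer) writes each transaction to a Kafka topic...
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("orders", b'{"order_id": 123, "amount": 49.95}')
producer.flush()

# ...and any number of downstream systems (consumers) read from that topic independently
consumer = KafkaConsumer("orders", bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for message in consumer:
    print(message.value)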

 

There are also mechanisms for transforming this data as it flows through, called stream processors. It’s starting to sound a bit like a combination of our traditional staging area and ETL, isn’t it?

 

Kafka can also store this data, allowing downstream systems to reload history should they ever lose it.

 

Who’s using it?

 

Kafka is used by thousands of major companies, including Twitter, PayPal, Netflix and many other significant players.

 

One good example of Kafka in action is at Walmart, a particularly massive company with a variety of data sources and users.  They began looking at options for scalable data processing systems three years ago and ultimately used Kafka as a company-wide, multi-tenant data hub.

 

It has allowed Walmart to onboard sellers and launch product listings faster by enabling fast processing of data from various sources.  By centralizing all incoming data updates in Kafka, they were able to use the same data for activities such as reprocessing catalog information, analytics and A/B testing instead of each activity pulling data directly from the source systems.

 

Are you considering using Kafka as your data processing platform?  Dataspace can help you find the right people to help you make the transition.  We bring over 25 years of experience in the big data and data science space and leverage our expertise to identify top talent for our clients.  

 

Contact Dataspace at info@dataspace.com or give us a call at 734.761.5962.  

artificial intelligence insurance

Any serious conversation about the future of data and analytics invariably turns to the topic of artificial intelligence.  The past year has seen AI surge in popularity, with high profile corporations and personalities getting behind what many people believe could be the biggest technological watershed moment since the advent of the internet.

 

Indeed, many companies have seriously ambitious goals for their AI investments and it is fully expected that materially higher productivity, profitability and efficiency levels will result for those companies and the users of their products.

 

Over the course of the next few months Dataspace will be examining the impact of AI on the different industries with which we work. Even if you aren’t in those industries, each provides lessons that are broadly applicable. Today we begin with insurance.

 

Chatbots and Natural Language Processing

 

According to Accenture, the insurance industry will see dramatic improvements in customer experience through the use of automation for simple claim handling and individual risk-based underwriting processes.  Many insurance companies already use some form of AI-equipped bot to handle customer inquiries that don’t require human involvement.

Chatbots employ what is known as natural language processing (NLP), a field concerned with the interaction between computers and language as it is spoken by mere humans, like us. NLP is based on algorithms that improve over time as they gather input from users and learn from their corrections.
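As a toy illustration of the kind of text classification that sits underneath such a bot, here’s a minimal Python sketch with scikit-learn (the training phrases and intent labels are invented):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny, made-up training set: customer messages and the intent each expresses
messages = ["I need to file a claim", "my car was hit", "how much is my premium",
            "what does my policy cover", "I want to cancel my policy"]
intents = ["claim", "claim", "billing", "coverage", "cancel"]

bot = make_pipeline(CountVectorizer(), MultinomialNB())
bot.fit(messages, intents)

print(bot.predict(["my car was hit by another driver"]))   # should print ['claim']

A real chatbot layers much more on top of this – entity extraction, dialog management, and retraining on corrected conversations – but the core idea of mapping free text to an intent is the same.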

 

Insurance startup Lemonade, which makes use of a chatbot named Maya to handle claims, made headlines last year when a claim was processed and paid through the app in less than three seconds. While many claims are complex enough to still require human intervention, Lemonade hopes that one day 90% of its claims will be processed solely using NLP algorithms.

 

In fact, those looking well into the future believe that NLP will one day render computer programming languages obsolete and that all programming will be accomplished through natural language voice commands.  But let’s not get ahead of ourselves…

 

More automation of claims management

 

AI image analysis startup Tractable hopes to automate claims management for insurance companies. By using a decade’s worth of photos to train a deep learning algorithm, it has developed image recognition technology capable of predicting the total cost of damages with the same accuracy as an experienced adjuster.

 

Users submit photos of their accidents and the incurred damage, and the algorithm assesses the value of the damages. This ensures that insurance companies do not pay more for damages than is fairly due and do not pay for unnecessary repairs suggested by dishonest body shops.

 

Telematics

 

As the quantity of data available to insurance companies grows, so does their ability to assess risk accurately. Telematics, the long-distance transmission of user behavior data to insurance companies, is one way of collecting more of this data, and it will allow insurers to link human behavior directly to the pricing of premiums. With knowledge of driving habits, insurers will tailor their policies to shift premium costs away from safe drivers and toward risky ones. These technologies will incentivize good behavior, whether that means responsible driving, leading a healthy lifestyle, or more.

 

It seems inevitable that all vehicles in the near future will send data to insurance companies to reward drivers that exhibit responsible habits and penalize those with high-risk tendencies.  Telematics will allow for the expansion of usage-based insurance (UBI), which will adjust premiums based on how much and how responsibly you use your car, bike, boat or whichever futuristic mode of transport we have at our disposal in the coming years.

 

Similarly, health insurance companies will likely offer the option to wear sensors that monitor physical activity, eating and drinking habits, and more indicators of healthy and unhealthy lifestyles, to adjust premiums accordingly.

 

Wider implications of automation

 

Insurance exists because humans are flawed creatures, adept at making mistakes. A significant anticipated benefit of automation is the prospect of eventually reducing errors, omissions and accidents to negligible levels as it takes over responsibilities from humans.

 

So what happens when 100% of the cars on the road are driverless and life continues on without any of the rainy-day events that make auto insurance companies necessary?  Many experts believe that premiums will at the very least become dramatically cheaper as accident claims fall precipitously.

 

Further, there are predictions that self-driving technology will greatly reduce car ownership, changing it to more of a usage-based model. When you need a car, you’ll use an app to contact the car company and the vehicle will navigate itself to your location to pick you up. In this scenario, not only will premiums go down, but the number of customers will also decrease.

 

Tesla has ambitions to one day include insurance in the cost of its driverless cars, as it has already noticed significant declines in accidents since the advent of its Autopilot technology.

 

Forward-thinking insurers are already considering the effect that automation will have on their businesses, and coming up with mitigation plans.

 

Are you an insurance company looking forward to the AI revolution and needing help managing your ever-expanding data assets?  Dataspace is a vendor-neutral provider of big data staffing and consulting solutions and we stand ready to assist you with your data needs and challenges.  Give us a call at 734.761.5962 or email us at info@dataspace.com.

Cyberattacks have become increasingly damaging and visible in recent years, in part because of numerous, high-profile instances of hacking affecting everything from your personal files to global election outcomes.

 

And, in fact, hacking is no longer just about unauthorized access to, or loss of data. As the internet of things continues to expand, so has the threat of so-called “kinetic” cyberattacks, attacks that result in physical damage or worse, the loss of human life. In what was perhaps the highest profile instance of a kinetic cyberattack, the Stuxnet worm destroyed Iranian nuclear centrifuges by manipulating the code that controlled their speed.

 

Another growing concern is data sabotage, the subtle manipulation of data within transactional databases with the aim of some direct or indirect benefit.  Cyber criminals have realized that they can reap similar benefits from changing information as they can from stealing it, and stand a higher chance of going undetected by current monitoring tools.  This is of particular concern in the financial sector, where tweaking certain numbers, such as the revenue figures on an earnings report, or the price of a recent stock transaction, can send ripples through the stock market and cause billions of dollars in damage.

 

Thankfully, the good guys are keeping up and developing strategies to thwart modern cyberattacks.  We take a look this week at how cyberattacks have traditionally been detected and how data-centric threat detection is changing the cybersecurity sphere, leading security companies to take a highly contextualized and analytical approach to threat detection.

 

Scalability and Data Unification: To catch intrusions, you have to look at every piece of data

 

Traditional security information and event management (SIEM) tools do not collect enough data to detect modern, sophisticated attacks. And, while they do use some historical data, many do not have the storage or processing capacity to analyze anything older than 30 days, which can lead them to miss key abnormalities. Additionally, these tools examine different sources of data individually and not in conjunction (i.e. not correlated) with one another.

 

So new tools have emerged which take into account the size, speed, complexity and variety of data in order to detect the new generation of cyberattacks.  The new paradigm calls for layering predictive analytics and machine learning algorithms on top of all sources of data in an organization’s cyber infrastructure.
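None of these vendors publish their internals, but the general idea of layering a learning algorithm over event data can be sketched in a few lines of Python with scikit-learn (the features and numbers are invented for illustration):

import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-session features: bytes transferred, login failures, distinct hosts contacted
normal_traffic = np.random.rand(1000, 3)             # stand-in for months of "normal" history
model = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)

new_sessions = np.array([[0.5, 0.4, 0.6],            # looks ordinary
                         [9.0, 8.0, 7.0]])           # far outside anything seen before
print(model.predict(new_sessions))                   # 1 = normal, -1 = flagged as anomalous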

 

With such quantities of data, well-designed visualization is essential

 

Visual representations of infrastructure data can help make security vulnerabilities obvious. However, today’s security professionals are not well versed in data visualization.  Typically, they are only formally trained in computer science, statistics, and security.  But in situations where data is captured across much longer time horizons and from multiple, disparate sources, well-designed visualization becomes indispensable to threat analysis.

 

Those companies that do use visualization tools have traditionally used them for post-attack illustration and not for analysis of real-time threats.  But the integrated platforms described above, when paired with elegant and streamlined visualization, now give users the ability to quickly and accurately identify system vulnerabilities.

 

Real-time is a necessity

 

A few weeks ago, we wrote about the importance of knowing when real-time analytics can help your strategic decision making, and security is one of the cases where it makes all the difference. When it comes to data-centric security, it is imperative that your platform be able to process all of the information going in and out of your network in real time.

 

Cybersecurity is getting more expensive, making smaller companies more vulnerable

 

It used to be the case that hackers would target massive corporations with large-scale cyberattacks intended to disrupt thousands of systems and make front page news.  By contrast, the modern cyberattack is more likely to be a low-profile attack on confidential data, intended to go unnoticed.  Smaller companies are the most vulnerable, because they can’t afford to implement and manage systems that track the big data moving through the endpoints of their organizations.

 

The software and human talent to enable this type of monitoring are not necessarily expensive, but the hardware to handle the processing of such massive amounts of data can be extremely costly.  Thus, your security approach should depend chiefly on the value of the assets you are protecting.

 

Here are a few options for protecting your IT systems

 

Platfora and MapR offer a security solution that combines data transformation, visualization and analytics on top of a native Hadoop platform, allowing multiple varieties of data to coexist in a single repository.  By combining the scalable platform with sensors at the gateways of the business IT infrastructure, the algorithms are able to detect irregularities and then present them in a visually digestible way to end-users.

 

Another platform, Sqrrl Enterprise, uses a three-step, data-driven approach to expose threats and intrusions. It allows users to embark on “hunts” driven by established indicators, or on “exploratory hunts” driven by hypotheses and optimized using its automated analytics and machine learning processes. Then, by scouring current and historical network data coming in and out of your organization, Sqrrl is able to pinpoint threats that other security solutions would have missed. In identifying and disrupting attacks, the system learns to generate new indicators to inform future hunts. And like all user-friendly platforms, it offers advanced risk scoring and visualization capabilities.

 

Moving Ahead

 

Automation and the big data it produces are a double-edged sword: they bring huge business benefits, but all of that data and reliance on technology also introduce major security challenges. Just like a recursive algorithm, though, it’s fascinating to see how these same technologies are being used to monitor and protect themselves.

 

Need Help with Your Big Data and Data Science Efforts?

 

For almost 25 years Dataspace has helped our clients navigate the opportunities presented by big data, analytics, and data science. We provide both data strategy consulting and cost-effective, highly talented implementation staff. Want to kick around your needs? Contact us at 734.761.5962 or info@Dataspace.com. Thanks!

 

facial recognition

A key feature of big data is its lack of structure – we’re talking about the stuff that doesn’t fit neatly into Excel columns and isn’t easily described at first glance using numbers or other descriptors. This includes things such as images, satellite data and social media posts, which together comprise the bulk of all data in the world. One of the hottest and most controversial applications of unstructured big data is in the field of biometrics, the analysis of data that can identify us as individual human beings.

 

This week we will take a look at some of the interesting use cases of facial recognition and explore how the algorithms and technologies function to identify people based on their facial features.

 

Facial Recognition

 

Thanks in part to the trove of images that has been uploaded to social media sites such as Facebook, facial recognition has progressed by leaps and bounds in the past several years.  You have likely seen it in action using Google Photos or Facebook, with functionality such as automatic tagging of recognizable faces.  It is indeed an impressive capability, albeit somewhat creepy.

 

And that is because the technology has a high propensity to be used for nefarious or oppressive purposes. It is becoming exceedingly popular with governments around the world who want to automate or enhance certain aspects of law enforcement through the tracking of their citizens. A recent Wall Street Journal article detailed how China is using facial recognition to police its citizens and even go as far as “scoring” them based on automated observation of their social behavior. A citizen who jaywalks, for example, can be caught via a street camera and penalized either through a financial penalty or, more disturbingly, a dent in their ‘social score.’

 

Many retailers have started to employ the technology to spot shoplifters when they attempt to re-enter their stores. By tracking the identities of known shoplifters, loss prevention professionals are able to pinpoint a shoplifter and take measures to avoid repeat offenses. An article in Loss Prevention Media quotes a VP of a major retailer, who notes high recidivism rates among shoplifters and how little could be done about them prior to this technology: “We now know that 26 percent of the people we detain, we see again in the brand within one month, on average 13 days later. We never had a way of knowing things like this before. This is stuff that LP associates will salivate over. It’s going to be a game changer.”

 

So how does this facial recognition technology work?

 

Facial recognition software is made of highly refined machine learning algorithms that have taught themselves to identify relative characteristics that are unique to different faces. It first takes an image and isolates the faces it identifies within it (to try this yourself with Python, check this out). The software then analyzes the image to determine if any reorientation or resizing needs to occur before looking more closely at individual features. If necessary, it will adjust the image so that the key points are at a comparable pixel position to those in the existing photos in the database. So, for example, the right eyebrow will need to be in approximately the same position as the right eyebrow in database photos.
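The face isolation step, for example, can be reproduced with OpenCV’s bundled Haar cascade detector in a few lines of Python (the image file name is a placeholder):

import cv2

image = cv2.imread("group_photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)   # draw a box around each face
cv2.imwrite("faces_found.jpg", image)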

 

The software then looks at the relative positioning of different facial features to create a “faceprint,” the set of unique characteristics that make one face different from every other face. This includes the shape of the eyebrows and their distance from the eyes, the corners of the eyes and mouth, the points of the nose and the shape of the lips and chin. More than 100 other features may be processed to improve the accuracy of the match.

 

Once the faceprint is established, the software looks for a match in the database of existing photos. In many cases it will identify the face in the image, sometimes more accurately than a human would. However, error rates for facial recognition remain high compared to those of fingerprints and retina scans.
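Open-source libraries such as face_recognition wrap this whole pipeline – encoding a face as a numeric ‘faceprint’ and comparing it to known faces – in just a few calls (a minimal sketch; the file names are placeholders):

import face_recognition

known_image = face_recognition.load_image_file("badge_photo.jpg")
unknown_image = face_recognition.load_image_file("camera_frame.jpg")

known_encoding = face_recognition.face_encodings(known_image)[0]     # the "faceprint"
unknown_encodings = face_recognition.face_encodings(unknown_image)   # one per face found

for encoding in unknown_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    print("Match" if match else "No match")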

 

If you know or are learning Python, here’s a tutorial on how to use OpenCV and Python to perform face recognition!

 

Privacy Concerns

 

According to survey data, consumers are afraid of companies and governments that use facial recognition data.  Ask Your Target Market conducted a survey in 2016 that found that 62% of people are at least somewhat concerned over how facial recognition might impact their personal privacy.  Only 10% of people thought it would be acceptable for companies to use the technology for marketing or advertising purposes.

 

Indeed, companies employing facial recognition will soon have the ability to track the movements of anyone that enters their stores and people fear that the information could be used in ways that are unfair to consumers.  The data could even be sold to other companies or used against them in a legal context.

 

Unsurprisingly, the primary consumers of this technology are governments, many of whom would like the technology to monitor citizen activity in a way that exceeds the requirements of normal law enforcement.  The WSJ notes an instance where a vocal government critic had been detained by police in southwestern China despite him making a concerted effort to hide his location from authorities.  The man said that authorities were able to track him using cameras in certain intersections that used his faceprint to identify him.

 

Biometrics, like much modern technology that once seemed like distant fantasy, is now a staple feature of the new, data-centric reality.   This is all built on two key concepts: big data to store massive libraries of faces and machine learning to run and improve the recognition algorithms.

 

It is expected that as biometric data continues to be captured, the algorithms for processing it will become more accurate.  What is less certain is how our society will react to the use of these technologies and how regulation will evolve to prevent (or enable) companies and governments from taking advantage of customers and citizens.

 

Are you a company looking to do more with biometric data and in need of talent to help you achieve your goals?  Dataspace is a vendor-neutral provider of big data staffing and consulting services.  Contact us today at 734.761.5962 or email us at info@dataspace.com for more information on how we can work together!

We here at Dataspace thought it would be helpful to share some intriguing examples of how data can be easily manipulated to bring efficiency and value into our day-to-day lives.  This week we will focus on how the daunting and time-consuming task of learning a foreign language can be made easier by taking a data-driven approach.

 

With a little help from my Friends

 

A few weeks ago, data analyst and researcher Tomi Mester of data36.com posted a detailed article about his unorthodox method for quickly understanding the Swedish language.  Mester knew that he didn’t have the time it takes to truly gain fluency in a foreign language, much less a language as difficult as Swedish.  But he wanted to pursue it as a “hobby language” to make his time in Sweden a more rewarding and accessible experience.

 

Instead of picking up a phrasebook or downloading a language-learning app, Mester found all of the Swedish subtitles for two popular sitcoms: How I Met Your Mother and Friends.  A few lines of code later, he had every word used in these series organized by frequency of usage.  He made a cut-off at the top 1000 words and asked “how much will I understand if I learn only these 1000 words?”

 

By simply comparing the total instances of these top words to the total number of words counted in the scripts, he realized he could understand as much as 85% of the sitcoms just by learning 1000 words!

 

So what tools do I need to do this myself?

 

Mester carried out the cleaning and analysis for this task almost entirely with Bash. Bash is simply a language that allows you to interact with your operating system via the command line. With a few lines of Bash code, he was able to scour the compilation of subtitles, sort its contents into an alphabetized list of individual words, clean the words into a uniform format, and print the frequency of each word next to it in a .csv export.
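Mester used Bash, but the same frequency count translates directly into a few lines of Python (a rough sketch; the subtitle file name is a placeholder):

import re
from collections import Counter

with open("swedish_subtitles.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-zåäö]+", f.read().lower())   # crude tokenizer, lower-cased

counts = Counter(words)
top_1000 = counts.most_common(1000)

coverage = sum(n for _, n in top_1000) / len(words)
print(f"The top 1,000 words cover {coverage:.0%} of everything spoken in the shows")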

 

But, you say, “I don’t know Bash. How can this be easy?” Two answers: first, if you look carefully at the lines of Tomi’s code, you start to realize that you could duplicate his program in a tool you probably do know: Excel. Second, if you’ve worked in a command line environment (anyone remember DOS?), you’ll realize that Bash isn’t too hard to pick up. Give it an hour or two and you’ll know the basics. Here’s a great place to start.

 

Is data science really this easy?

 

Mester emphatically notes that, unfortunately, this is lightweight data science at best. But it’s a start, and it provides a nice introduction to the steps in the data science process. Learning tools such as Python, R, Bash and SQL represents another meaningful step in the direction of becoming a data scientist, or at least a citizen data scientist.

 

The point here is that data can be useful in unexpected places, and making it useful is not always a complicated endeavor.

 

Have you done anything cool with data lately?  We would love to hear about it!  Drop us a line and let us know how simple data tools have helped you in your everyday life.  

Challenge

Here at Dataspace we’ve worked with Qlikview for a number of years. It’s a great tool, but it has one glaring limitation: Qlikview does not offer an inexpensive, native method to publish and distribute reports to users. In order to publish reports from Qlikview, one must purchase both Qlikview Publisher and PDF Distributor, which are expensive and far from pixel perfect. However, NPrinting offers an alternative, at less than a third of the cost, to generate, schedule, and distribute customized reports.

 

NPrinting is a 3rd party tool by Vizubi that connects to one or more Qlikview files to quickly create unified reports that can be built from scratch or from existing templates. With the ability to integrate into enterprise systems, NPrinting could serve as a viable option for Fortune 500 companies just as well as for smaller companies. Since NPrinting relies on Qlikview and integrates with MS Office, the learning curve is gentle and results come quickly.

 

Using NPrinting

 

How simple is distributing reports in NPrinting? Well, if you know Qlikview and MS Office – simple!  The process follows these steps: 

  1. Create the objects you want on your reports in QlikView
  2. Start Nprinting and
    1. Create a report template
    2. Create a report task
    3. Create a schedule
    4. Create your output folder and/or set up recipients
    5. Run or schedule your reports 

How simple is it to receive and read NPrinting reports?  Even simpler – just open your email and there they are, in pdf or a variety of MS Office document formats!

 

Demo

 

Here, we’re using a demo Expense Management report available from Qlikview. First we’ll load the Qlikview document into NPrinting to create a connection. Note that we can connect to a QlikView Access Point if we so desire:

 


 

Now that we’re connected, here is the Nprinting main page. On the top left is the “Reports” section, which provides a list of the formats in which you can distribute your reports. On the bottom left are tabs for Scheduling, Tasks, Reports, etc. Once you select a tab, it will bring you to the respective page where you can control your options:

 


 

Now you can create a new template that can be reused for future reports. It is a very simple interface, but still gives a lot of flexibility since it uses Excel formatting and syntax. On the left panel, you’ll note the list of Page, Levels, Images, Tables, Cells, and other options that can be added to the report. To do so, just right-click on the object you want to add and then the object list will pop-up:


Now let’s add objects to our report.  Start by right-clicking on the list on the left to add the respective Page, Level, Image, or Tables. Nprinting will present us with a list of all of the objects in the Qlikview document to which we are connected. When we first created the connection, Nprinting automatically populated the list of the available objects that we can use in our reports. Simply select the charts that you would like added to your report and hit OK. 


 

TIP: It is helpful to keep the original Qlikview document open in another window so you can quickly find the proper chart ID: 

 

TIP: As you can tell from the picture, if you’re designing a QlikView application for distribution with Nprinting, naming your important objects will make the Nprinting development process easier.

 

Now that we’ve added several images and tables to our list, we can simply drag and drop them onto the Excel sheet on the right:

 


 

Click the Preview button to get a quick look at the report.

 


 

 

 Now that we’ve got a basic report template, we can save our report and exit back to the main screen to create a report task. The task describes how and when to distribute the report. 

For our report task, let’s select an output folder in which to create the report. It’s important to note that we could also select individual users or distribution lists for the report to be sent out to – one of Nprinting’s most valuable features. Note that reports can also be password protected.

 


 

 

Once our task is saved, we can run it manually or create schedules to automate it.

 


 

Now that the task has completed successfully, let’s head to our Output folder to see the result:

 


 

Voila! In just a few minutes, we’ve managed to set up our connection to a Qlikview document, create a report, and export it to a folder where users can access it. 

 

Key Features:

 

nPrinting’s key features include: 

 

Schedule and Distribution

  • It can create tasks and jobs to reload documents, run macros, import recipients, and distribute documents
  • Reports can be delivered in formats such as PDF, Excel, PowerPoint, Word, HTML, Qlikview entity, or PNG image
  • Excel reports can be created to offer great flexibility for end users
  • Reports can be emailed, saved to directories, or published online on an automated schedule 

 

Security

  • Nprinting supports Qlikview section access and user login to control permissions to reports
  • Reports can be encrypted
  • Filters can be associated with users or groups so they only receive relevant information 

 

Connectivity

  • Nprinting can connect to AccessPoint server and combine multiple QVWs into a single report 

 

Reports

  • Reports can be pixel perfect
  • Report templates can be created for fast development and reuse
  • Existing Qlikview bookmarks can be used as NPrinting filters
  • Qlikview objects can be simply dragged and dropped onto reports 

 

Contacts

  • Contacts can be manually input or accessed from file, Qlikview, or LDAP directories
  • nPrinting developers can create filters so that different users or groups receive customized reports based on credentials.

 

Downsides

 

Of course, there are a few limitations of which you should be aware:

 

  • Jobs cannot be run in parallel. However, it seems that this feature may be added in future releases.
  • Users cannot manipulate their reports – what they get is, well, what they get

 

My Recommendation

NPrinting is a fantastic addition to Qlikview that offers far more flexibility and customization than Qlikview’s competing options do. Not only does NPrinting give a wider array of options, but it costs a fraction of what Qlik’s option does. The company also has excellent customer support. If you are looking for a quick, inexpensive way to leverage your QlikView investment by publishing static reports to users, I strongly recommend taking a look at NPrinting.

 

I am consistently impressed by the number and quality of new data tools I’m seeing online. In the past, if you did not have access to an enterprise-level tool, your charting and analysis was done in Excel (and even if you did have an enterprise-level tool, your charting was probably still done in Excel).

 

The latest cool tool to come across my screen is DataHero. With the ability to connect to popular cloud-based services, accept CSV and Excel file uploads, and combine all of them together (when sensible), DataHero makes charting and analysis pretty darn easy.

 

Is this a tool for everyone in the enterprise? No, but I don’t think that’s what it is trying to be. It seems that the better fit would be a smaller department, say a marketing or sales department in a midsize business. Often, these users are relying on SaaS applications to do their day-to-day work, applications like HubSpot and MailChimp. And my experience has shown that these users tend to be on something of an island when it comes to enterprise data, since much of their activity involves working with data before it ever hits the company’s operational applications. As a result, these generally non-technical users are left to either figure out charting in Excel, or to buy cups of coffee for their more technical co-workers in exchange for a spreadsheet or two.

 

A Quick Chart

 

How easy is it to use? Given my background with data and reporting tools, I may not have the same perspective as a data novice, but it seems pretty straightforward to me. Select your SaaS data source (from among 23 choices), or upload an Excel or .csv file from your computer. Once imported, DataHero asks you to confirm the data types in an easy-to-use format (in this case I’m using one of the available sample data sets provided by data provider Quandl):

 


 

When everything looks good, a single click on the ‘IT LOOKS GOOD!’ button and DataHero brings up a number of suggested charts against your data (I think this is really cool):

 


 

One option is to select one of the suggested charts and make modifications to it, if needed. If you don’t like the suggested charts, or have a specific chart in mind, clicking on the ‘CREATE NEW CHART’ icon brings you to a blank screen with your data fields along the left side, ready to drag and drop onto the chart area:

 


 

As you drag and drop the various fields onto the chart area, DataHero will figure out the best (or at least a good) way to display it. From the example above, I first dragged ‘Exports (metric tons)’ onto the chart area, followed by ‘Fuel Type’. The resulting chart looks clean and usable:

 


 

But maybe the chart doesn’t have enough context yet. So, let’s add in the ‘Date’ (which in the source data is only the year) to make it more meaningful:

 


 

Getting there! Note that the Fuel Types in the legend are sorted alphabetically (as directed by the ‘Order: A-Z’ in the ‘Colored By’ dialog box on the left side). Instead, let’s sort them by the largest values. Simply drop down the Order dialog and select ‘Order: Top by Value’ (I also added a title to the chart while I was at it):

 


 

There! First chart done. Of course I could continue to tweak it by adding labels, etc., but you get the idea.

 

Other Features

 

Filtering – I didn’t show it in my demo above, but charts can be filtered by a data item whether it shows up on the chart or not.

 

Combining Datasets – A great feature of DataHero is the ability to tie two (or more) datasets together. Once your first dataset is selected, you tell DataHero to ‘Combine Your Data’. It asks which field in the dataset should be used as the link, then prompts for your second dataset. Note that your second dataset already needs to be in DataHero before you start this operation; I didn’t see any way to add it in on the fly. Once the second dataset has loaded, you tell DataHero which field to link on (and, yes, they do need to match!).

 

Dashboards – In a manner similar to Tableau, charts can be grouped into dashboards. Note that, unlike combining datasets, which is available on a limited basis with the free account, you must have a paid account in order to create dashboards.

 

Data Update – Because your underlying data changes, periodic updating is a necessity. These updates can be kicked off manually, or can be scheduled through DataHero. Note that files uploaded from your computer must be updated manually.

 

Export – Interestingly, exports of charts are available only as .png files, while dashboards can be exported as a PDF packet or a zip file of images. While this would be acceptable in many cases, it also has its limitations.

 

Free Account Limitations

 

Though a free account can provide value right out of the box, it does have limitations:

 

  • Uploaded files can only be 2MB, while a paid account allows for 10MB uploads
  • Only one dataset combination is allowed with free accounts, while there is no limit with paid accounts
  • As mentioned above, dashboards are only available with a paid account
  • Data updates are available only with paid accounts (though you can try one update with the free account)

 

Paid Plans

 

DataHero keeps it pretty simple with their plans. In addition to the free plan discussed throughout, there are two paid plans: Premium ($49/month), which gives access to one user, and Team ($69/month/user), which allows multiple users to work on the charts and dashboards, and has access to priority phone and email support.

 

Caveats

 

Now before you go out and switch all of your department’s reporting over to DataHero, remember that with any SaaS there are possible issues:

 

  • Longevity – I don’t have any clue about the company’s financial health, but there have certainly been cases over the years of apps disappearing quickly, and with little or no warning.
  • Lack of Calculations – I didn’t see any way to create calculations within the charts, so either I missed it or you need to make sure that everything is available in your source data before you start.
  • Volume – I didn’t attempt any large-volume tests with the tool so I can’t vouch for its performance when data sets get large. At the same time, DataHero is not claiming to be the right tool for large-volume environments.
  • Database Connectivity – As of right now, there is no way to hook directly to databases. Of course, adding that capability would pit DataHero against enterprise-level tools, and that doesn’t appear to be their target market.

 

As with every other piece of software ever written, DataHero is not perfect. But for what it is trying to do, it sure seems like a good candidate for many of the data-poor teams out there. Give it a try and let me know what you think!

 

My Recommendation

 

I am really impressed with DataHero as a low-impact data visualization tool.  If you’ve got needs that just don’t warrant the weight of a MicroStrategy, a Business Objects, or even a Tableau or Qlik, give it a try.  I think you’ll be pretty impressed.