JSFeeds: macwright.org - Map, a moderately better dictionary datastructure for JavaScript

Monday, 13 March, 2017 UTC

Map, a moderately better dictionary datastructure for JavaScript

Summary

From the beginning
JavaScript objects don’t map from any string to any value
Why it matters
The ‘right way’ to use objects as data: Object.create(null)
Introducing the Map
- Pro: testing for emptiness is easier with Map
- Pro: testing for a key’s existence is easier with a Map
- Con: Maps and JSON don’t get along
- Con: Maps don’t work very well with Flowtype
Footnotes

From the beginning:

At the bottom of programming languages are primitive data types. They include the booleans, true and false, all sorts of numbers like integers and floating point, characters (letters and numbers), and sometimes other types. Primitive data types can’t usually be broken into pieces - they’re like atoms [1].

And then there are composite data types, which combine primitive data types into collections of data. At the very least, most languages have two kinds: arrays, also known as lists or tuples, and dictionaries, also called maps, objects, hashes or structs [2]. Lists are used to store data that’s organized like a sequence, whereas dictionaries store data that has named parts, because dictionaries allow you to retrieve each part by name.

If you’re dealing with real-world information, chances are you’ll need to use a dictionary at some point. It’s the datastructure that makes it easy to store people’s phone numbers and email addresses together without needing to remember whether you stored the phone number in [0] or in [1].

{ "email": "[email protected]", "phone": "1 (555) 555-5555", "age": 61 }

From this example and the above introduction, you should have some expectations about JavaScript’s dictionary type. It would be fair to expect objects to work roughly like the diagram of their encoded form from JSON’s documentation:

{string:value,}

That is, they map from string keys to values. Digging deeper, the claim would be

any string key
maps to
any value

There’s good reason why math notation has a special symbol, ∀, for ‘universal quantification’. There is a big difference between a statement being true for a few examples, versus remaining true for all members of the domain (in this case, all strings).

JavaScript objects don’t map from any string to any value

Unfortunately, this statement isn’t true: objects, as commonly used, don’t actually support mappings from any string to any value.

If you’re curious, you can run these counterproofs in a Node.js REPL or in your browser’s console.

There are some keys that you can’t set:

> var x = {};
> x['__proto__'] = 'test'
'test'
> x['__proto__']
{}

And other keys that already exist for ‘empty’ objects:

> var x = {};
> x['constructor']
[Function: Object]

So, much unlike empty, unopinionated containers for our own data, JavaScript’s objects contain quite a bit of pre-existing functionality, as well as opinions about certain keys that can’t be overwritten.
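To get a feel for how much an ‘empty’ object already carries, one can list the properties defined on Object.prototype and check which of them are reachable through a fresh {}. This is only a minimal sketch: the variable names are illustrative, and the exact list of inherited names varies between JavaScript engines and versions.

```javascript
// Names of the properties defined directly on Object.prototype.
// The exact list depends on the engine and its version.
const inheritedNames = Object.getOwnPropertyNames(Object.prototype);

// A brand-new object literal with no keys of its own.
const empty = {};

// Keys that already resolve to something on the "empty" object.
const preexisting = inheritedNames.filter(function (name) {
  return empty[name] !== undefined;
});

console.log(Object.keys(empty).length); // 0 own keys...
console.log(preexisting);               // ...but lookups for these names still succeed
```

Every name in that second list is a string that cannot be treated as a plain, unused key on a normally-created object.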

Why? Well, objects in JavaScript play many roles. In this article, I’m talking about them as containers for data, but they’re just as commonly used in the realm of object-oriented programming, in which they contain functionality, state, and other features bundled up together. So when we use them as containers for data, they have some unwanted properties hanging around.

Why it matters

You can go a long time before getting bitten by this problem, but it will eventually catch up to you. For instance, we often use JavaScript’s objects to contain the values of dictionaries coming from other languages or environments, like the contents of a query string in a URL. A query including __proto__=true is perfectly valid in a URL, but would be incorrectly ignored if query parameters are represented with a JavaScript object.
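The query string case can be sketched as follows. This assumes an environment where URLSearchParams is available as a global (browsers and recent Node.js versions); the query string and the variable names are made up for illustration.

```javascript
// A made-up query string in which __proto__ is just another parameter name.
const params = new URLSearchParams('__proto__=true&page=2');

// Copy the parameters into a plain object, as a lot of code does.
const asObject = {};
params.forEach(function (value, key) {
  // Assigning the string 'true' to __proto__ is silently ignored.
  asObject[key] = value;
});

console.log(params.has('__proto__'));  // true: the parameter is really there
console.log(Object.keys(asObject));    // [ 'page' ]: it was lost in the copy
```

No error is thrown along the way, which is part of what makes this kind of bug hard to notice.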

Similarly, we often use objects to ‘index’ values dynamically. In this example, we take a string that’s perfectly valid as a value in an object, try to use it as a key instead, and fail in that effort.

function indexById(array) {
  var x = {};
  array.forEach(function (val) {
    x[val.id] = val;
  });
  return x;
}
var indexed = indexById([{ id: "__proto__" }]); // == {}

Realizing this endemic problem made me think back to all the times I had used objects in this manner, and shudder at the potential bugs that usage might cause.

The ‘right way’ to use objects as data: Object.create(null)

There’s a trick you can use to create objects that don’t have any special pre-filled or protected properties: Object.create(null). See, the properties like __proto__ and constructor that we just identified are due to normally-created objects, the ones you get with {}, inheriting functionality from the Object prototype.

I’ve been using Object.create(null) for data structures in documentation.js with some success, but it comes with drawbacks. As we’ll see below, Object.create(null) creates an object with no methods, not even useful ones like .hasOwnProperty, so you’ll have to use some dodgy hacks to use those methods when you need them. And, similarly, if you’re passing your objects with no methods into other people’s libraries and APIs, those libraries often reasonably expect that the objects you pass in have the full prototype, rather than being bare-bones data representations created with Object.create(null).

> var x = Object.create(null);
> x['__proto__'] = 'hi';
> x['__proto__']
'hi' // yay!
> Object.prototype.hasOwnProperty.call(x, 'emptyKey') // tedious!
false

Unfortunately, though Object.create(null) fixes some of the problems of using objects as data, it’s non-obvious, somewhat laborious, and not a very popular technique.
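The drawbacks are easy to reproduce. The sketch below uses a hypothetical object named bare and shows what is missing, what still works, and one way that code expecting a full prototype can break.

```javascript
// A prototype-less object holding one piece of data.
const bare = Object.create(null);
bare.name = 'example';

// No inherited methods at all.
console.log(typeof bare.hasOwnProperty); // undefined
console.log(typeof bare.toString);       // undefined

// Static helpers on Object don't rely on the prototype, so they still work.
console.log(Object.keys(bare)); // [ 'name' ]

// Code that coerces the object to a string has nothing to call and throws.
let coercionError = null;
try {
  String(bare);
} catch (err) {
  coercionError = err;
}
console.log(coercionError instanceof TypeError); // true
```

The string coercion is only one example of the kind of assumption a third-party library might make about the objects it receives.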

Introducing the Map

In an effort to solve this problem and improve JavaScript’s data types, ES6, a new version of JavaScript supported in Node 4+ and a wide range of browsers, introduced the Map. It’s a new data structure that, unlike objects, is able to truly map from any string to any value.

And it does one better, by also supporting other kinds of keys, including numbers [3], functions, and objects.

var myMap = new Map();
myMap.set(1, 'This is the number 1');
myMap.set('1', 'This is the string "1"');
myMap.get(1) // This is the number 1
myMap.get("1") // This is the string "1"
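Object and function keys work the same way. In this small sketch the names metadata, user and greet are hypothetical; the point to notice is that such keys are matched by identity rather than by their contents.

```javascript
const metadata = new Map();

const user = { name: 'Ada' };          // hypothetical object used as a key
function greet() { return 'hello'; }   // hypothetical function used as a key

metadata.set(user, 'an object key');
metadata.set(greet, 'a function key');

console.log(metadata.get(user));  // an object key
console.log(metadata.get(greet)); // a function key

// Keys are compared by identity, so a look-alike object is a different key.
console.log(metadata.get({ name: 'Ada' })); // undefined
console.log(metadata.size);                 // 2
```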

Unfortunately, the Map data type isn’t strictly better - in a few ways which I’ll describe, it’s less convenient than traditional objects. But, when you need a way to use a dictionary as an index or otherwise index it with strings that you don’t know ahead of time - in other words, use ‘any string’ as a key - Map is a big improvement.
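As a sketch of that improvement, here is the indexById function from earlier rewritten on top of a Map. The second id, 'constructor', is an extra test value added for illustration.

```javascript
// Index an array of records by their id, using a Map instead of an object.
function indexById(array) {
  const index = new Map();
  array.forEach(function (val) {
    index.set(val.id, val);
  });
  return index;
}

const indexed = indexById([{ id: '__proto__' }, { id: 'constructor' }]);

console.log(indexed.size);                // 2
console.log(indexed.get('__proto__').id); // __proto__
console.log(indexed.has('toString'));     // false: nothing is pre-filled
```

Both troublesome ids are stored and retrieved like any other string, and keys that were never set do not appear to exist.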

Pro: testing for emptiness is easier with Map

Testing if a Map is empty is super easy:

> myMap.size === 0
true

Whereas testing if an object is empty is super error prone. The most succinct way to do it would be this:

> Object.keys(myObject).length === 0
true

But the performance-minded will note that that method is inefficient - for gigantic objects, you’re generating a huge list of keys, just to compare it to 0. Similarly inefficient is the JSON stringification method:

> JSON.stringify(myObject) === '{}'
true

To efficiently test for emptiness, you’ll need this function:

function isEmpty(value) {
  for (var key in value) {
    // Only count the object's own keys, not inherited enumerable ones.
    if (Object.prototype.hasOwnProperty.call(value, key)) return false;
  }
  return true;
}

So, in terms of testing for emptiness, Maps have a big advantage.

Pro: testing for a key’s existence is easier with a Map

The Map object has a nice method named has() which tests whether a key has been set.

> myMap.has('hi')
true

For objects, it’s not quite that easy. If you create an object with {}, you might be tempted to write

> myObject['hi'] !== undefined
true

But what if you defined the value as undefined, so the key is, in fact, set? In that case you can upgrade to

> myObject.hasOwnProperty('hi')
true

But what if you created myObject with Object.create(null) to dodge the predefined keys issue? In that case, your myObject doesn’t even have the hasOwnProperty method, so you’ll need to borrow it from Object.prototype itself:

> var myObject = Object.create(null);
> myObject['hi'] = true;
> Object.prototype.hasOwnProperty.call(myObject, 'hi')
true

That’s gross, and most people won’t go through the trouble to do it ‘the right way’, especially because the right way looks like a hack.
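With a Map, the same edge cases need no workaround. The sketch below uses a hypothetical Map named settings to show that has() separates a key set to undefined from a key that was never set, and that inherited names don’t leak in.

```javascript
const settings = new Map();
settings.set('hi', undefined); // the key exists, its value is undefined

console.log(settings.get('hi'));  // undefined
console.log(settings.has('hi'));  // true: the key was set

console.log(settings.get('bye')); // undefined
console.log(settings.has('bye')); // false: the key was never set

console.log(settings.has('constructor')); // false: no pre-existing keys
```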

Con: Maps and JSON don’t get along

The flipside of the Map’s flexibility in terms of keys - that it can accept numbers, objects, and other types as keys - is that it’s no longer similar to JSON’s idea of objects. JSON’s objects are, in fact, just strings mapped to values, so they are a subset of what a Map can represent. For this reason, the Map data type doesn’t fluidly become JSON like a traditional JavaScript object can.

Stringifying a simple JavaScript object:

> JSON.stringify({ x: 1 })
'{"x":1}'

Stringifying a Map:

> var myMap = new Map();
> myMap.set('x', 1)
Map { 'x' => 1 }
> JSON.stringify(myMap);
'{}' // oops, that didn't work!

So… simply calling JSON.stringify doesn’t work in this case. You’ll need a helper function, and sadly that function simply converts the Map to an object before turning it into JSON.

function mapToObject(map) {
  var o = {};
  // Map's forEach passes the value first, then the key.
  map.forEach(function (value, key) {
    o[key] = value;
  });
  return o;
}
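One alternative, offered here as a sketch and not as the article’s own approach, is to serialize the Map’s entries as an array of [key, value] pairs. The helper names mapToJSON and jsonToMap are hypothetical. This keeps the number 1 and the string "1" apart, but it only works when the keys and values are themselves JSON-friendly, and the result is a JSON array, not a JSON object.

```javascript
// Serialize a Map as a JSON array of [key, value] pairs.
function mapToJSON(map) {
  return JSON.stringify(Array.from(map));
}

// Rebuild a Map from the JSON produced by mapToJSON.
function jsonToMap(text) {
  return new Map(JSON.parse(text));
}

const original = new Map();
original.set('x', 1);
original.set(1, 'the number 1');
original.set('1', 'the string "1"');

const serialized = mapToJSON(original);
console.log(serialized); // [["x",1],[1,"the number 1"],["1","the string \"1\""]]

const restored = jsonToMap(serialized);
console.log(restored.get(1));   // the number 1
console.log(restored.get('1')); // the string "1"
```

Object and function keys would not survive this round trip, since JSON has no way to represent their identity.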

Con: Maps don’t work very well with Flowtype

Flowtype is a layer on top of JavaScript that adds static types, much like those in C++, and helps you catch errors before a program even runs. It works incredibly well with traditional JavaScript objects - you can declare which keys can contain which values and it’ll enforce that you use those values correctly from then on.

Unfortunately, like Immutable.js’s objects, Maps are too dynamic for Flow to track key by key, so you can’t declare which keys of a Map hold which types of values.

Summing up, here are some of the things you might want to do with data, and whether they’re easy, hard, possible or impossible with objects and maps.

	Object	Map	Object.create(null)
is it empty?	hard	easy	hard
keys can be any string	impossible	possible	possible
keys can be non-strings	impossible	possible	impossible
asking if a key is in it	easy	easy	hard
you can define Flow types	possible	impossible	possible
JSON stringify & parse	easy	hard	easy

Footnotes

[1] Or, yeah, subatomic particles if you want to be nerdy.
[2] Yep, these words refer to slightly different things in different contexts - Python’s idea of dictionaries lets them be modified with new keys, whereas structs typically have a defined layout. But conceptually they’re similar enough that I’m grouping them together here.
[3] But I thought that objects supported numeric keys? Objects let you provide a number as an index, but they cast that number to a string before using it as an index. So you can’t have a key 1 and another key "1" in the same traditional object - they’ll both cast to "1".
