Hacker News
Show HN: Shale – a Ruby object mapper and serializer for JSON, YAML and XML (shalerb.org)
182 points by beerkg on May 31, 2022 | 52 comments


Hi, I released Shale, a Ruby gem that allows you to parse JSON, YAML and XML and convert it into Ruby data structures, as well as serialize your Ruby data model to JSON, YAML or XML.

Features:

- convert JSON, XML or YAML into Ruby data model

- serialize data model to JSON, XML or YAML

- generate JSON and XML Schema from Ruby models

- compile JSON Schema into Ruby models (compiling XML Schema is a work in progress)

A quick example so you can get a feel of it:

  require 'shale'

  class Address < Shale::Mapper
    attribute :street, Shale::Type::String
    attribute :city, Shale::Type::String
  end

  class Person < Shale::Mapper
    attribute :first_name, Shale::Type::String
    attribute :last_name, Shale::Type::String
    attribute :address, Address
  end

  # parse data and convert it into Ruby data model
  person = Person.from_json(<<~JSON) # or .from_xml / .from_yaml
  {
    "first_name": "John",
    "last_name": "Doe",
    "address": {
      "street": "Oxford Street",
      "city": "London"
    }
  }
  JSON

  # It will give you
  # =>
  #  #<Person:0xa0a4
  #    @address=#<Address:0xa0a6
  #      @city="London",
  #      @street="Oxford Street",
  #      @zip="E1 6AN">,
  #    @age=50,
  #    @first_name="John",
  #    @hobbies=["Singing", "Dancing"],
  #    @last_name="Doe",
  #    @married=false>

  # serialize Ruby data model to JSON
  Person.new(
    first_name: 'John',
    last_name: 'Doe',
    address: Address.new(street: 'Oxford Street', city: 'London')
  ).to_json # or .to_xml / .to_yaml
Source code is available on GitHub: https://github.com/kgiszczak/shale


Hey this is a very cool project! When you were developing it, I'm curious if you took any special security precautions in your design of this project, seeing how XML/JSON/YAML serialization and de-serialization are the topic of many high profile CVEs, particularly in the Ruby community?


Shale uses Ruby's standard library parsers out of the box, so as long as you keep Ruby up to date with security updates you should be fine. Others in this thread also suggested setting minimum versions on dependencies, so I'll probably do that in a future version.


When using the shale gem, how would you avoid the mass assignment problem? Is there a configuration, or a way of using the shale gem to avoid it?

CWE-915: Improperly Controlled Modification of Dynamically-Determined Object Attributes <https://cwe.mitre.org/data/definitions/915.html> (Ruby on Rails Mass assignment bug)


This seems like programmer error. Don't put restricted fields into types you're deserializing off the wire. It's like accepting user input and directly inserting it into a database without any validation.


If you don't define attributes explicitly on the model, Shale will ignore them.

Regarding attributes that you defined but still don't want to be assigned, you should probably filter them before passing them to Shale, or alternatively filter them with Shale before passing them further down the stack (e.g. to ActiveRecord).
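
The "filter before passing it on" idea can be sketched in plain Ruby with an allowlist. This is only an illustration using the standard library: `PERMITTED_KEYS` and `permitted_attributes` are hypothetical names, not part of Shale or Rails.

```ruby
require 'json'

# Hypothetical allowlist: the only keys a client is allowed to set.
PERMITTED_KEYS = %w[first_name last_name].freeze

# Parse the payload and keep only allowlisted keys, so that fields such as
# "admin" never reach the layer that assigns attributes.
def permitted_attributes(json)
  JSON.parse(json).slice(*PERMITTED_KEYS)
end

payload = '{"first_name": "John", "last_name": "Doe", "admin": true}'
attrs = permitted_attributes(payload)
puts attrs.keys.inspect
```

An allowlist is generally safer than a denylist here, because a field added to the model later stays unassignable until someone permits it explicitly.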


Just noticed when sharing the site link - the summary reads: Vue-powered Static Site Generator. A bit misleading.

<meta name="description" content="Vue-powered Static Site Generator">

Kudos for choosing Vue tho =)


The documentation site was based on https://vuepress.vuejs.org/ but it evolved so much that I dropped Vue altogether and went with plain HTML instead. I must have left that meta tag in from the early days.

Regarding Vue I use it daily at my job, great library :)


In the last example, where does it find the values for the `married`, `age`, `zip` and `hobbies` attributes? They are not present in the JSON string?


Ah, I messed up the example, Person class definition should look like this:

  class Person < Shale::Mapper
    attribute :first_name, Shale::Type::String
    attribute :last_name, Shale::Type::String
    attribute :age, Shale::Type::Integer
    attribute :married, Shale::Type::Boolean, default: false
    attribute :hobbies, Shale::Type::String, collection: true
    attribute :address, Address
  end
And the JSON used for parsing should also contain those attributes, like:

  {
    "first_name": "John",
    "last_name": "Doe",
    "age": 30,
    "married": false,
    "hobbies": ["Singing", "Dancing"],
    "address": {
      "street": "Oxford Street",
      "city": "London"
    }
  }


Serialization/deserialization is such an important part of web development that I have no idea why Rails includes the ancient JBuilder library (which is also very slow, since it goes through templating) instead of investing in a proper library. Let alone deserialization, which is equally important.

I think the API Shale provides is pretty sane. I would probably use it in my next Ruby/Rails project. I don't like the fact that Nokogiri is included by default, it would be nice to declare a core type, and then bring in what you need (JSON, XML, YAML) as a different gem. But that's not a deal breaker for me.

I have created my own serializers in the past (SimpleAMS[1]) because I really detested AMS, no offence to AMS contributors, but AMS library should just die. Rails, and way more importantly Ruby, should come up with an "official" serializers/deserializers library that is flexible enough, rock solid and fast. For instance I had done some benchmarking among common serializer libraries [2] and AMS was crazy slow, without providing much flexibility, really (meaning, slowness is not justified). Others were faster, but were supporting only one JSON spec format (like jsonapi-rb). I am wondering where shale stands.

Another thing is that most serialization libraries seem to have ActiveSupport as a main dependency (not shale though) which I think is a bit too much, and actually has a performance hit on the methods it provides.

I really think the Ruby community can do better here.

[1] https://github.com/vasilakisfil/SimpleAMS

[2] https://vasilakisfil.social/blog/2020/01/20/modern-ruby-seri... (scroll towards the end for benchmarks)


I'm glad you like it. One clarification: Nokogiri is not required by default, you have to explicitly require "shale/adapter/nokogiri" to use it. If you don't, Shale will use REXML, which comes from Ruby's standard library.


Rexml has been gemified. Shale's gemspec doesn't require a specific version of rexml and rexml<3.2.5 is vulnerable to CVE-2021-28965. I just checked Ubuntu 20.04 LTS and got Ruby 2.7 with rexml 3.2.3 by default so this seems like a realistic concern and it would be safer if shale required a minimum rexml version.

See http://www.ruby-lang.org/en/news/2021/04/05/xml-round-trip-v...
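
As a rough illustration of what a minimum-version constraint does, RubyGems' own `Gem::Requirement` can be checked directly. The `patched_rexml?` helper is a hypothetical name for this sketch; in a gemspec the same constraint would be written as `spec.add_runtime_dependency 'rexml', '>= 3.2.5'`, and in a Gemfile as `gem 'rexml', '>= 3.2.5'`.

```ruby
require 'rubygems'

# The constraint discussed above: anything below 3.2.5 is rejected.
MINIMUM_REXML = Gem::Requirement.new('>= 3.2.5')

# Hypothetical helper: does this rexml version satisfy the constraint?
def patched_rexml?(version_string)
  MINIMUM_REXML.satisfied_by?(Gem::Version.new(version_string))
end

puts patched_rexml?('3.2.3') # the version reported for Ubuntu 20.04 above
puts patched_rexml?('3.2.5')
```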


I have mixed feelings about this: standard library vulnerabilities are part of Ruby's vulnerabilities, so you would update your Ruby version anyway. But you're right, specifying the version explicitly would prevent this.


I think one of the motivations for splitting the stdlib into gems was exactly this kind of scenario: some users might not be able to update their Ruby immediately. The ruby-lang advisory explicitly recommends bumping the REXML version.


I have definitely been in situations where I couldn't update the ruby version in a timely manner, but have been able to bump a gem version (like in this example)


If I get a dependabot alarm for my Rails project, I would do well to make a bet that it's a nokogiri vulnerability. I haven't looked into the "why" or what's really going on, but it does feel like there's a lot of room to look at attack surface or any core design issues.


Nokogiri is one of the most security-sensitive parts of any Rails codebase, since it's used for parsing and sanitizing untrusted HTML and XML documents. Accordingly, there's a lot of scrutiny on it (and its upstream dependency, libxml2). That said, as far as I'm aware, almost all of the recent vulnerabilities I've noticed have been related to XSLT and other obscure XML features that most people probably don't use (and aren't enabled by default). So there's a combination of both 1) lots of scrutiny on the library itself leads to high security standards and 2) the goal of fully-featured XML processing adds a large attack surface that may not be relevant to most people that leads to a lot of vulnerability alerts.

Personally, though, I've been seeing almost 10x as many alerts for useless "vulnerabilities" like ReDoS in Node.js projects. Either way, alert fatigue is real.


The last one was about libxslt... and I'd be shocked if anyone is using XSLT in a production environment that is also actively maintained.


XML is chock-full of misfeatures ripe for creating security vulnerabilities. It's not just nokogiri – XML parsing libs are one of the hottest sources of vulnerability notifications in many ecosystems (a large number of those CVE alerts come by way of using libxml2 under the hood, which nokogiri also depends on).

Safely parsing untrusted XML is an extremely hairy task.


This library looks great for those using it, but I wish the situation for "ActiveRecord model -> JSON representation" in open-source libraries was better. This library seems to be overkill for that, since you'll almost always want completely separate code for "deserializing" attribute updates from a request, and it requires you to specify the type of every single property. ActiveModel::Serializer was great while it lasted, but it's unmaintained and missing a lot of features. Blueprinter seems a lot less battle-tested and may have performance problems. Last I looked, almost no library easily supports eager-loading. Is this right? I feel like I must be missing something. How do people render their models in modern Rails apps?


Have you checked out Alba [0]? I think it’s one of the better options right now.

[0]: https://github.com/okuramasafumi/alba


I use a middle layer to transform the ActiveRecord objects before passing them to the serializer.


Glad to see folks actively pushing things in the Ruby space further. I've said it before, but I recently returned to Ruby and Rails after many years away, and my productivity has reached levels I couldn't imagine. Subjective for sure, but ruby is a beautiful fun language, and rails has everything (especially now with https://hotwired.dev) that a single founder needs.


The new, modern Rails stack is leaps and bounds ahead of anything else out there. Rails truly makes web development enjoyable, fast and effective.

It's a shame Ruby and Rails are not getting all the recognition they deserve.


I'm currently doing both Rails and Go, it's just different worlds. I'm a Go noob so it's not a fair comparison but still - I did Django, Node, etc etc and Go is just miles behind anything productive.


You are comparing a language to two complete frameworks and a runtime.

Go can be extremely productive but it's definitely not a great choice if you need to create a web app over a weekend.

RoR, Django etc. have ready solutions for things like authorization/authentication, administration tools, OAuth... Not to mention that a 'framework' assumes some sort of contract, so that everything built for the framework in question can talk to each other.

Go is a good choice if you need to build a custom solution for your needs. Not if you are looking for a set of building blocks you have to configure for your task.


Am using Chi and GORM for what it's worth.


I also like to use Chi and don't use any query builders (except for one service I guess) or ORMs.


I did Go for around a year and found I'm not its target audience. I need to build database-backed web apps quickly and, while possible in Go, it wasn't easy. Rails is a dream in comparison for that purpose. I experimented with many of the Go web frameworks, but it felt very much like a square peg in a round hole.

I really liked Go for what it was, but it wasn't the right fit for my set of problems.


Ah yes indeed, I should have mentioned the company is mostly a standard CRUD web app. Sure - Go's advantage is the services weigh less and are able to handle more - but that is such an insignificant advantage compared to the human cost of developing and maintaining it (compared to Rails) imo.

It might have been different if I needed something more low level - though in that case one can use C/C++.


Exactly my experience. Rails allows me to build quickly and iterate even faster. Highly recommended. Hotwire is pretty cool, too. I‘ve built an action palette type of dialog with keyboard navigation without any stateful JavaScript (except for the cursor).


I haven't used Turbo yet, but Stimulus is a great little framework for slapping a bit of JS onto an existing fairly vanilla server-side app to add some nice interactive experiences.

Have really enjoyed using it recently in my Rails apps.


One of the things that keeps being repeated in ruby land is that domain objects are usually married to storage/serialisation method. At some point of application maturity you'll need some other method of serialisation, some other type casting or conversion logic for your form or something else, but by that time a lot of surrounding code would depend on implicit logic of the original base library. ActiveRecord does this, and your library does it too. Object mappers which can initialize or serialize instances of other classes, including PORO, are much more versatile and future-proof. And API for doing that could look almost the same as yours.


Great point. I feel like this is an often ignored advantage of JS/TS projects. Most often data is passed around as POJOs. It's dead simple and easy to duplicate, serialize, and mutate


POJSOs? :D


I totally agree with your points, but this approach has one big advantage - it's dead simple - define attributes and mapping and you're good to go.


You don't have to sacrifice that simplicity, actually. (And I insist that this is the wrong kind of simplicity: it'll bite users of your library basically right away, when they try to use it for anything apart from storage/serialisation.)

But you can just give an upgrade path! Consider something like this:

  class Address
    attr_accessor :street, :city
  end

  class Person
    attr_accessor :address
  end

  class AddressMapper < Shale::Mapper
    mapped_class Address
    attribute :street, Shale::Type::String
    attribute :city, Shale::Type::String
  end

  class PersonMapper < Shale::Mapper
    mapped_class Person
    attribute :address, AddressMapper
  end

  # use like this
  PersonMapper.from_xml("...."); PersonMapper.to_xml(person)
And then, for _dead_ simplicity, you can add another method, generate_mapped_class "Person", which will define that PORO class for the user for extra DRYness. The API is basically the same, with no repetition, but the amount of rewriting needed when requirements change is drastically smaller.

I'm not asking you to rewrite your library, and I probably won't write and release mine, just saying that considering future self isn't that hard. And yeah, it's a bit of a rant about ActiveRecord from user of Rails, since 2006.


I haven’t looked at the Shale source code, but I suspect that it would not be hard to add `mapped_class` support the way you’ve described it, so that the business objects are not themselves mapper instances. At a guess, `from_xml` probably does something like this (vastly oversimplified):

    def from_xml(xml_string)
      new.tap { |o|
        parse_xml(xml_string) do |key, value|
          o.__send__(:"#{key}=", value)
        end
      }
    end
It would then be possible to change this to:

    def from_xml(xml_string)
      (mapped_class || self).new.tap { |o|
        parse_xml(xml_string) do |key, value|
          o.__send__(:"#{key}=", value)
        end
      }
    end
      
This would make it easier to solve a larger problem of needing to serialize the same business object in different ways for different consumers with different levels of detail. It would also permit the construction of mappers for temporary objects that contain the details for more complex serializations that have indirect connections.
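
To make the idea concrete without touching Shale itself, here is a minimal runnable sketch using JSON from the standard library. `TinyMapper` is a hypothetical stand-in, not Shale's implementation, and `mapped_class` is the proposed method from this thread rather than an existing Shale API.

```ruby
require 'json'

# Hypothetical stand-in for a mapper base class (not Shale's real code).
class TinyMapper
  class << self
    # With an argument, records the target class; without one, returns it.
    def mapped_class(klass = nil)
      @mapped_class = klass if klass
      @mapped_class
    end

    def attributes
      @attributes ||= []
    end

    # Registers an attribute and defines accessors on the mapper, so the
    # mapper can still act as its own model when no mapped_class is set.
    def attribute(name)
      attributes << name
      attr_accessor name
    end

    # Builds an instance of the mapped class (or of the mapper itself).
    def from_json(json)
      data = JSON.parse(json)
      (mapped_class || self).new.tap do |object|
        attributes.each do |name|
          object.public_send(:"#{name}=", data[name.to_s]) if data.key?(name.to_s)
        end
      end
    end

    # Serializes any object that responds to the registered attributes.
    def to_json(object)
      JSON.generate(attributes.to_h { |name| [name.to_s, object.public_send(name)] })
    end
  end
end

# A plain Ruby object that knows nothing about mapping.
class Address
  attr_accessor :street, :city
end

class AddressMapper < TinyMapper
  mapped_class Address
  attribute :street
  attribute :city
end

address = AddressMapper.from_json('{"street": "Oxford Street", "city": "London"}')
puts address.class
puts AddressMapper.to_json(address)
```

Because the mapper only relies on public accessors, several mappers could target the same `Address` class with different attribute sets, which is the "different consumers, different levels of detail" case mentioned above.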


An added advantage of this approach is that it allows clean integration w/any other mapper or library. E.g. you could define a mapping to your Sequel or ActiveRecord models, and in one go you have the ability to roundtrip between JSON/XML etc. and the ORM.

To the point of rewriting: A halfway point is to drop inheritance in favour of include/extend'ing the models. If that's done cautiously, it allows for co-existing with model objects from libraries that require inheritance. That is, this:

  class Person
    include Shale::Mapper
  end
is preferable to

  class Person < Shale::Mapper
  end
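
A rough sketch of how the include-based variant could work, using the common `included` hook to add class-level methods. `TinyMappable` and `LibraryBase` are hypothetical names for illustration; in Shale as released, `Shale::Mapper` is a class, so this is not its actual API.

```ruby
# Hypothetical mixin; it adds an `attribute` macro without using inheritance.
module TinyMappable
  # Called by Ruby when the module is included; adds the class-level DSL.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def attributes
      @attributes ||= []
    end

    def attribute(name)
      attributes << name
      attr_accessor name
    end
  end

  # Collects the registered attributes into a hash.
  def to_h
    self.class.attributes.to_h { |name| [name, public_send(name)] }
  end
end

# Stand-in for a base class imposed by another library (e.g. an ORM).
class LibraryBase; end

class Person < LibraryBase
  include TinyMappable
  attribute :first_name
end

person = Person.new
person.first_name = 'John'
puts person.to_h.inspect
```

The model keeps its single superclass slot free for the other library, which is the coexistence benefit described above.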


I like it actually, using POROs (or any class for that matter) is definitely a big advantage. Maybe I'll implement something like that for version 2 :)


Agreed that this is a big advantage. I’ve switched to having a separate set of serialization objects with straightforward copy constructors or mapping functions and let the serialization library do the job against those. I used to hand-roll the serialization, but this is admittedly easier.
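
For readers unfamiliar with the pattern, a minimal sketch of a separate serialization object with a copy constructor might look like this. `User`, `UserPayload`, and `from_user` are hypothetical names, and JSON from the standard library stands in for whichever serialization library is actually used.

```ruby
require 'json'

# Hypothetical domain object; it holds a field that must never be serialized.
class User
  attr_reader :name, :email, :password_digest

  def initialize(name:, email:, password_digest:)
    @name = name
    @email = email
    @password_digest = password_digest
  end
end

# Separate serialization object: it carries only what the consumer should see.
UserPayload = Struct.new(:name, :email, keyword_init: true) do
  # Copy constructor: picks the public fields off the domain object.
  def self.from_user(user)
    new(name: user.name, email: user.email)
  end

  def to_json(*args)
    to_h.to_json(*args)
  end
end

user = User.new(name: 'Ada', email: 'ada@example.com', password_digest: 'secret')
json = UserPayload.from_user(user).to_json
puts json
```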


Thanks for pushing the Ruby space further!


Nice library with very approachable documentation, congrats!

I'll probably give it a go to replace my current implementation using nokogiri-happymapper (https://github.com/mvz/happymapper)


HappyMapper was actually an inspiration for Shale. If it had support for JSON, Shale probably wouldn't be created :)


I like this idea, I remember seeing something similar in Trailblazer. But basically you just define your models once, and then you can transform them into different formats, and have them play nicely with ActiveRecord as well. Pretty cool :)


Nice, it seems like a generic version of grape-entity https://github.com/ruby-grape/grape-entity


It would be great to be able to generate the Ruby models from XML Schema Definition files (.xsd). No mistakes and a huge time saver.


Yeah, Shale supports generating models from JSON Schema for now; XML Schema support is a work in progress and should be ready in two or three weeks.


Wonderful, thanks!


Thanks for this! Definitely going to use this for one of our big projects.

*Edit: nice docs site as well - what are you using for it?


It's a custom template I created (based on https://vuepress.vuejs.org/), because I couldn't find anything that simple. The source code is available on https://github.com/kgiszczak/shale-website

Interactive examples are powered by https://opalrb.com/



