Normalize Data with Rails 7.1
Storing user input directly in a database can be a security risk. Malicious users can inject harmful code into the database, which could lead to data breaches or other problems. To protect your database, you should sanitize (remove HTML tags from user input to prevent cross-site scripting attacks) or standardize (convert the data into a consistent format) the data before saving it.
Before Rails 7.1
Before Rails 7.1, before_save or before_validation callbacks can be used to ensure data is correctly formatted and sanitized.
The most basic use case for such scenarios is storing the user's email address in the Rails application.
1. Using before_save or before_validation callbacks
With before_save or before_validation, you can implement the normalization flow as below:
class User < ApplicationRecord
  before_save :sanitize_email

  private

  # strip and downcase return new strings, so assign the result back.
  def sanitize_email
    self.email = email&.strip&.downcase
  end
end
## or
class User < ApplicationRecord
  before_validation :sanitize_email

  private

  # strip and downcase return new strings, so assign the result back.
  def sanitize_email
    self.email = email&.strip&.downcase
  end
end
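A detail that is easy to miss with callbacks is that String#strip and String#downcase return new strings instead of modifying the receiver, so the callback has to assign the result back to the attribute. The following is a minimal plain-Ruby sketch of that difference. Account and its two methods are hypothetical stand-ins for a model and are not part of Rails.

```ruby
# Hypothetical stand-in for a model; plain Ruby, no Rails needed.
Account = Struct.new(:email) do
  # Discards the result: strip and downcase return new strings.
  def sanitize_without_assignment
    email.strip.downcase
  end

  # Writes the normalized value back to the attribute.
  def sanitize_with_assignment
    self.email = email.strip.downcase
  end
end

account = Account.new("  Jane.Doe@Example.COM \n")

account.sanitize_without_assignment
puts account.email.inspect # still the raw value

account.sanitize_with_assignment
puts account.email.inspect # now the normalized value
```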
2. Using attribute setter
Another way to normalize the data is to use the setter methods as follows:
class User < ApplicationRecord
  # Call super to write the attribute; self.email = here would call this setter again.
  def email=(value)
    super(value&.strip&.downcase)
  end
end
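Inside a custom setter, the value has to be handed to the original writer, because calling self.email = from within email= would invoke the same method again. A small sketch in plain Ruby, assuming a hypothetical BaseRecord parent that plays the role of the framework by providing the default accessor (Member is hypothetical as well):

```ruby
# Hypothetical parent class that supplies the default reader and writer.
class BaseRecord
  attr_accessor :email
end

class Member < BaseRecord
  # Normalize first, then delegate to the inherited writer via super.
  # Safe navigation keeps nil as nil instead of raising NoMethodError.
  def email=(value)
    super(value&.strip&.downcase)
  end
end

member = Member.new
member.email = "  Jane.Doe@Example.COM \n"
puts member.email.inspect

member.email = nil
puts member.email.inspect
```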
3. Using normalize gem
If you want to avoid using the above two methods, you can use the normalize gem.
As the gem README explains, you must create an app/normalizers directory in your Rails application.
The directory will contain your normalization classes.
For the above use case, an email normalizer class can be created as below:
class EmailNormalizer
  def self.call(email)
    email.strip.downcase
  end
end
You need to specify the EmailNormalizer to run on the email attribute in your User model as below:
class User < ApplicationRecord
  normalize :email, with: EmailNormalizer
end
In Rails 7.1
Rails 7.1 adds the normalizes method to ActiveRecord::Base, which can be used to declare normalization for attribute values.
The method can be used to sanitize user inputs, enforce consistent formatting, and clean up data from external sources.
The normalizes method takes the following arguments:
- one or more names of the attributes to be normalized
- with: a callable (such as a lambda) that defines the normalization logic. It receives the attribute value and returns the normalized value, and can contain any Ruby code you want to run on the value.
- apply_to_nil: an optional flag that controls whether the callable also runs for nil values. It is false by default.
For example, the following code normalizes the email attribute by downcasing it and stripping surrounding whitespace:
class User < ApplicationRecord
  normalizes :email, with: -> email { email.downcase.strip }
end
> user = User.create(email: "\n [email protected]")
> user.email
=> "[email protected]"
When a new user is created, or an existing user's email address is updated, the normalization runs on the email address as it is assigned, so the value is already normalized by the time it is saved to the database. The email address is downcased, stripped of surrounding whitespace, and then saved.
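Since the normalization logic is just something that responds to call, it can be tried out in plain Ruby before it is attached to a model. The sketch below compares three interchangeable forms: a lambda, a module with a call method, and a Method object. EmailCleaner and the sample address are hypothetical and used for illustration only.

```ruby
# A lambda that accepts the raw value and returns the normalized one.
email_lambda = ->(email) { email.downcase.strip }

# A hypothetical module exposing the same logic through .call.
module EmailCleaner
  def self.call(email)
    email.downcase.strip
  end
end

# A Method object also responds to call.
email_method = EmailCleaner.method(:call)

raw = "\n Jane.Doe@Example.COM "
results = [email_lambda, EmailCleaner, email_method].map do |callable|
  callable.call(raw)
end

puts results.uniq.inspect # a single entry means all three agree
```

Keeping the logic in a named object such as EmailCleaner can make it easier to unit test in isolation, while a lambda keeps short rules next to the model declaration.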
Normalize multiple attributes
The normalizes method can also be used to normalize multiple attributes at once.
For example, the following code normalizes the email and the username attributes:
class User < ActiveRecord::Base
  normalizes :email, :username, with: -> attribute { attribute.strip.downcase }
end
Normalize nil values
By default, normalization is not applied to nil values.
If you pass the username as nil, assuming it is not mandatory, it will be set to nil.
The normalization code is not executed for nil values; otherwise nil.strip would raise NoMethodError: undefined method 'strip' for nil:NilClass.
> user = User.create(email: "\n [email protected]", username: nil)
> user.email
=> "[email protected]"
> user.username
=> nil
The nil behaviour can be changed by setting apply_to_nil to true.
apply_to_nil is false by default.
class User < ActiveRecord::Base
  normalizes :username, with: -> username { username&.downcase&.titleize || 'No username' }, apply_to_nil: true
end
> user = User.create(email: "\n [email protected]", username: nil)
> user.email
=> "[email protected]"
> user.username
=> "No username"
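The attribute list, the with: option, and the apply_to_nil: flag described above can be modelled in a few lines of plain Ruby. This is a toy sketch for building intuition only, not how Rails implements normalizes; ToyNormalization, Contact, and the nickname attribute are hypothetical names.

```ruby
# Toy macro for illustration; NOT the Rails implementation.
module ToyNormalization
  def normalizes(*names, with:, apply_to_nil: false)
    names.each do |name|
      attr_reader name

      # Define a writer that normalizes on assignment.
      define_method("#{name}=") do |value|
        # Skip nil unless the caller opted in with apply_to_nil: true.
        value = with.call(value) if apply_to_nil || !value.nil?
        instance_variable_set("@#{name}", value)
      end
    end
  end
end

class Contact
  extend ToyNormalization

  normalizes :email, :username, with: ->(value) { value.strip.downcase }
  normalizes :nickname, with: ->(value) { value&.strip || "No nickname" }, apply_to_nil: true
end

contact = Contact.new
contact.email = "  Jane.Doe@Example.COM \n"
contact.username = nil # callable is skipped, so no NoMethodError
contact.nickname = nil # callable runs and supplies the fallback

puts contact.email.inspect
puts contact.username.inspect
puts contact.nickname.inspect
```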
Normalization process
Normalization is applied when the attribute is assigned or updated. The normalization is also applied to the corresponding keyword argument of finder methods. This enables the creation of a record and subsequent querying using unnormalized values.
> User.create!(email: "\n [email protected]")
#<User:0x000000001x987261 id: 1, email: "[email protected]"
created_at: Fri, 11 June 2023 00:00:20.058984000 UTC +00:00,
updated_at: Fri, 11 June 2023 00:00:20.058984000 UTC +00:00>
> user = User.find_by!(email: "\n [email protected]")
> user.email
=> "[email protected]"
> User.exists?(email: "\n [email protected]")
=> true
Note: Normalization is not applied when the attribute value is passed as part of a raw SQL condition.
> User.exists?(["email = ?", "\n [email protected]"])
=> false
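The mismatch comes from comparing an unnormalized value against stored values that are already normalized. A plain-Ruby sketch of the same effect, where NORMALIZE_EMAIL stands in for the model's normalization and STORED_EMAILS stands in for the email column (both are hypothetical):

```ruby
# Hypothetical stand-ins for the model's normalization and the stored column.
NORMALIZE_EMAIL = ->(email) { email.strip.downcase }
STORED_EMAILS = ["jane.doe@example.com"].freeze

raw_input = "\n Jane.Doe@Example.COM"

# Comparing the raw value directly behaves like the raw condition: no match.
raw_match = STORED_EMAILS.include?(raw_input)

# Normalizing first, then comparing, finds the stored value.
normalized_match = STORED_EMAILS.include?(NORMALIZE_EMAIL.call(raw_input))

puts raw_match        # false
puts normalized_match # true
```

In a Rails application the equivalent would be to run the raw value through the model's normalization before binding it to a hand-written condition.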
Normalize existing records
If a user's email were already stored in the database before normalization was added to the model, the email would not be retrieved in the normalized format.
## user created with email "[email protected]"
> user = User.find(1)
> user.email
=> "[email protected]"
This means that attributes of existing records are not normalized automatically. You can normalize them explicitly using the ActiveRecord::Normalization#normalize_attribute method.
class User < ActiveRecord::Base
  normalizes :email, with: -> email { email&.downcase&.strip }
end
### Migration to normalize existing records
User.find_each do |legacy_user|
  legacy_user.normalize_attribute(:email)
  legacy_user.save
end
Normalize using a class method
You can call the normalize_value_for method at the class level to normalize a value for a given attribute as below:
> User.normalize_value_for(:email, "\n [email protected]")
=> "[email protected]"
To know more about this feature, please refer to this PR.