Archives

All posts for the month December, 2014

Another SSL vulnerability, the POODLE bug, has surfaced: server-side measures taken

SSL 3.0 POODLE vulnerability - October 16th, 2014

Just a few months after the Heartbleed bug shook confidence in the supposedly secure SSL/TLS encryption layer and put data transfers, emails, instant messages, etc. at risk, Google security researchers have brought a new SSL vulnerability to light.

According to the Google researchers, a weakness in the SSL 3.0 protocol could be used to eavesdrop on sensitive data transferred over an encrypted connection between web browsers, apps, etc. and servers.

The ‘new’ bug is called POODLE – an acronym for Padding Oracle On Downgraded Legacy Encryption.

The mechanism of the POODLE attack

The newly discovered POODLE exploit poses a serious threat to online security, since it affects an old SSL version that is still supported by the majority of servers and clients.

It allows an attacker who can intercept the connection (a man-in-the-middle) to trick a web client into believing that the server does not support the more secure TLS (Transport Layer Security) protocol, so the client falls back to SSL 3.0.

This downgrade opens the door to abuse: attackers can then exploit a weakness in how SSL 3.0 checks CBC-mode padding to decrypt secure HTTP data, such as session cookies, and steal the protected information.
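To make the padding-oracle step concrete, here is a toy simulation in Python. It is a minimal sketch under loud assumptions, not real SSL code: a 4-round Feistel cipher built from HMAC-SHA256 stands in for the real block cipher, the MAC check is left out, and each request the attacker makes the browser resend is modeled by a fresh random IV. All function names are hypothetical. Because SSL 3.0 checks only the last padding byte, copying a secret block over the padding block is accepted about once in 256 tries, and every acceptance reveals one plaintext byte.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes, as with AES in CBC mode


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, r: int, half: bytes) -> bytes:
    # Round function of a toy 4-round Feistel cipher (a stand-in for AES/3DES).
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[:8]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in range(4):
        left, right = right, xor(left, _round(key, r, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(4)):
        left, right = xor(right, _round(key, r, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    blocks, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        blocks.append(prev)
    return blocks


def ssl3_padding_ok(key: bytes, iv: bytes, blocks: list) -> bool:
    # SSL 3.0 verifies only the final padding-length byte; the other padding
    # bytes are arbitrary. (The MAC check that follows is omitted here.)
    prev = blocks[-2] if len(blocks) > 1 else iv
    return xor(decrypt_block(key, blocks[-1]), prev)[-1] == BLOCK - 1


def recover_last_byte(key: bytes, secret_block: bytes, max_tries: int = 10_000):
    # The record ends in a full padding block: 15 filler bytes + length byte 15.
    padding = bytes(BLOCK - 1) + bytes([BLOCK - 1])
    for attempt in range(1, max_tries + 1):
        iv = os.urandom(BLOCK)  # every resent request is encrypted afresh
        blocks = cbc_encrypt(key, iv, secret_block + padding)
        tampered = blocks[:-1] + [blocks[0]]  # copy the secret block over the padding
        if ssl3_padding_ok(key, iv, tampered):
            # The server saw D(C1) xor C1 ending in 15, and D(C1) = P1 xor IV.
            return (BLOCK - 1) ^ blocks[0][-1] ^ iv[-1], attempt
    return None, max_tries
```

For example, `recover_last_byte(os.urandom(16), b"Cookie: s=7f3a9Z")` returns a tuple whose first element is `ord("Z")` (90), usually after a few hundred attempts; a real attacker then shifts the secret by one byte and repeats.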

Measures taken against POODLE attacks

With the discovery of POODLE, the security specialists at Google immediately recommended measures for dealing with this encryption issue.

First and foremost, SSL 3.0 needs to be disabled on both sides of the connection, the server and the client, so that they default to the more secure TLS. This will stop attackers from forcing the communication through the vulnerable SSL 3.0.

Server-side measures:

In response to the Google team’s recommendation, our web hosting servers no longer support SSL 3.0 or older versions of the protocol. Our administrators have also set the minimum supported protocol version to the more secure TLS 1.0.

NOTE: As a result, Internet Explorer 6.0 and older will not be able to access websites hosted on our servers over HTTPS.
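Administrators running their own servers can enforce the same floor in code. The following is a minimal sketch using Python's standard `ssl` module (3.7 or newer), not our actual server configuration; certificate loading is omitted and the function name `make_server_context` is hypothetical.

```python
import ssl
import warnings


def make_server_context() -> ssl.SSLContext:
    """Server-side TLS context that refuses SSL 3.0 and anything older."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    with warnings.catch_warnings():
        # Recent Python versions flag TLS 1.0 as deprecated; this floor mirrors
        # the 2014 policy described above, not a current recommendation.
        warnings.simplefilter("ignore", DeprecationWarning)
        ctx.minimum_version = ssl.TLSVersion.TLSv1
    return ctx
```

Recent Python versions already default to a higher floor than TLS 1.0; the explicit setting simply mirrors the policy described in this post.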

Client-side measures:

As far as web clients are concerned, Google specialists recommend that end users immediately disable SSL 3.0 support in their browsers, where that option exists.

In response to the issue, Google plans to remove SSL 3.0 support from all its products completely in the coming months. It already offers a Chromium patch that disables the SSL 3.0 fallback.

Mozilla has also announced plans to turn off SSL 3.0 in Firefox: it will be disabled by default in Firefox 34, which will be released in November. The code that disables the protocol is already available in Firefox Nightly. In the meantime, you can also use the SSL Version Control add-on for Firefox.

Upcoming actions against POODLE attacks

To further secure our system against future downgrade attacks, our admins are also planning to implement TLS_FALLBACK_SCSV (Transport Layer Security Signaling Cipher Suite Value) on all our servers shortly. We’ll keep you posted.
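The server-side rule behind TLS_FALLBACK_SCSV (later standardized in RFC 7507) is short enough to sketch. A client that retries a handshake with a lower protocol version adds the special value 0x5600 to its cipher suite list; if the server supports a higher version than the client is now offering, the retry must have been forced, so the server aborts with the inappropriate_fallback alert (86). The Python below models only that decision; the function and constant names are illustrative, not a real TLS library's API.

```python
# Protocol versions as (major, minor) pairs, as sent on the wire.
SSL_3_0 = (3, 0)
TLS_1_0 = (3, 1)
TLS_1_1 = (3, 2)
TLS_1_2 = (3, 3)

TLS_FALLBACK_SCSV = 0x5600   # cipher suite value {0x56, 0x00}
INAPPROPRIATE_FALLBACK = 86  # fatal alert the server sends back


def check_fallback(client_version, cipher_suites, server_max_version):
    """Return an alert code if the handshake is a suspicious fallback, else None."""
    if TLS_FALLBACK_SCSV in cipher_suites and client_version < server_max_version:
        # The client signals "this is a retry at a lower version", yet the server
        # supports something higher: the earlier attempt was likely tampered with.
        return INAPPROPRIATE_FALLBACK
    return None
```

With a client falling back to SSL 3.0 against a TLS 1.2 server, `check_fallback(SSL_3_0, [0x002F, TLS_FALLBACK_SCSV], TLS_1_2)` returns 86, while a genuine retry against a server whose maximum really is SSL 3.0 returns None.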

 

ModSecurity now enabled on all VPSs

The ModSecurity Apache module is a great solution for minimizing the number of hack attacks against websites and applications.

It acts as an application-layer firewall and can effectively prevent most brute-force and URL forgery attacks, as well as forum spamming attempts targeted at sites.

Some time ago, we enabled the ModSecurity protection layer as a default feature with all shared hosting accounts. Now the highly effective anti-hack firewall is enabled with all VPSs as well.

ModSecurity enabled on all VPSs

As with shared hosting accounts, the ModSecurity firewall is enabled by default on your VPS, so you don’t have to configure anything in order to have your websites protected.

ModSecurity is running in blocking mode, so it will automatically block all incoming requests that are flagged as insecure according to the commercial rule set from http://www.atomicorp.com.

You can access the ModSecurity section in the Hepsia Control Panel from the newly added shortcut on the Control Panel’s home page or from the Advanced drop-down menu:

ModSecurity in the Control Panel

How exactly does ModSecurity work?

Over 70% of all attacks are now carried out at the web application level. Being a web application firewall (WAF) itself, ModSecurity effectively addresses this problem.

Its purpose is to establish an external security layer that monitors and analyzes HTTP traffic in real time, and it offers a powerful API for implementing advanced protection.

This way, the firewall provides enhanced security: malicious attacks are detected and prevented before they reach the web applications.

ModSecurity against brute force attacks

ModSecurity has proven to be very effective in preventing brute-force attacks, i.e. attempts to guess the username and password of a web application by combining usernames and passwords from a predefined list.

Thanks to the ModSecurity firewall, if there are more than 15 failed login attempts from an IP address within 3 minutes, the IP address will be blocked from accessing the website for the next 30 minutes.
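The lockout policy described above can be modeled in a few lines. This Python sketch is not ModSecurity's rule syntax or implementation; it only illustrates the stated thresholds (more than 15 failures within 3 minutes triggers a 30-minute block) with a per-IP sliding window. The class and method names are hypothetical, and timestamps are plain seconds.

```python
from collections import defaultdict, deque

WINDOW = 3 * 60      # look-back window, in seconds
MAX_FAILURES = 15    # more than this many failures inside the window => block
BLOCK_FOR = 30 * 60  # block duration, in seconds


class LoginGuard:
    def __init__(self):
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failed logins
        self.blocked_until = {}             # ip -> time at which the block expires

    def is_blocked(self, ip: str, now: float) -> bool:
        return self.blocked_until.get(ip, 0) > now

    def record_failure(self, ip: str, now: float) -> None:
        recent = self.failures[ip]
        recent.append(now)
        # Drop failures that have slid out of the 3-minute window.
        while recent and recent[0] <= now - WINDOW:
            recent.popleft()
        if len(recent) > MAX_FAILURES:
            self.blocked_until[ip] = now + BLOCK_FOR
            recent.clear()
```

Sixteen failures from one address within a few seconds block it for the next 1,800 seconds, while the same number of failures spread 20 seconds apart never exceed the limit.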

So far, the ModSecurity module has dramatically reduced the number of hacked websites on our servers.

If you have any questions about ModSecurity and about how it will work on your Virtual Private Server, don’t hesitate to contact our support team by opening a ticket from the Web Hosting Control Panel.

Password Setup interface added for new web hosting accounts

Password transmission over email has always been a hot topic for security-sensitive users, and with good reason: password thieves keep getting smarter at inventing new ways of stealing private information.

We’ve addressed this issue by moving passwords out of welcome emails and by implementing a password setup interface for first-time customers.

The interface will be used both for new hosting account signups and when users request to reset their passwords.

A Password Setup interface for new customers

From today, welcome emails sent to newly registered users will feature special instructions on how to set up their hosting account passwords on their own:

Password Setup interface - Welcome mail

Through a special link, they will be taken to a secure page where they can set up their web hosting account password:

Password setup interface - login page

After they enter their password in the two fields, following the password strength tips included in the form, they will be able to log in to their hosting accounts immediately.

The password they set through the form will also be valid for the FTP account that is created for the user at signup.

A Password Setup interface for resetting passwords

The password setup form can also be used for password changes. When customers request a password reset from the login form of the Web Hosting Control Panel, they will receive an email with a link to the same password setup form:

Password setup interface - reset password

After filling out the form, the user will be instantly logged into the Control Panel with the new password.

NOTE: Since the hosting account password will no longer be sent to users in written form, we recommend that users save their passwords in the browser right away or use a password management tool.
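Both the welcome link and the password reset link depend on a token that cannot be guessed or reused. One common way to build such links is sketched below in Python; it is an assumption about the general technique, not a description of how our Control Panel does it. The URL, the one-day lifetime and all names are hypothetical.

```python
import hashlib
import secrets
import time

TOKEN_TTL = 24 * 3600  # hypothetical lifetime of a setup link, in seconds

# sha256(token) -> (account, expiry time); only hashes are stored, so a leaked
# table does not reveal working links.
_pending = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def issue_setup_link(account: str, now=None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    _pending[_digest(token)] = (account, now + TOKEN_TTL)
    return "https://cp.example.com/password-setup?token=" + token


def redeem_token(token: str, now=None):
    """Return the account the token belongs to, or None if invalid or expired."""
    now = time.time() if now is None else now
    entry = _pending.pop(_digest(token), None)  # pop: each link works only once
    if entry is None or now > entry[1]:
        return None
    return entry[0]
```

A second click on the same link, or a click after the lifetime has passed, yields None, so the user would have to request a new link.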

The .MX country-code TLD is finally available for registration!

Felicidades (congratulations)! Starting today, you and your customers can register .MX domain names (‘nombres de dominio’) and become part of the large online Mexican family.

.MX had been a long-sought-after extension before it was opened to the public a few years ago. It was closed for registration for a long time, and only third-level .COM.MX domain names were available to the public.

Now the .MX ccTLD is open for registration to anyone and you can get your .MX domain today without any restrictions whatsoever.

Why register .MX domain names?

  • Show your personal/business interest in Mexico. The .MX ccTLD is a great chance for companies and individuals with economic, political or cultural ties to Mexico to voice their identity online.
  • Show commitment to Mexican customers. A local domain name will contribute to your company’s professional image and will help you demonstrate your commitment to the local audience.
  • Increase your sales on the Mexican market. By targeting the Mexican market with a local TLD, your brand will be recognized by potential customers and you will stand a much better chance of increasing your revenue.
  • Protect your local trademark. By registering a .MX domain, you will secure your brand as an online trademark pertaining to Mexico.
  • Get a short and easy-to-remember website name. The chance of getting a short, recognizable site name is much greater with a ccTLD like .MX than with a widely used TLD like .COM or .NET.

Who can register .MX domain names?

There are no local presence or citizenship requirements for registering a .MX domain, unlike the situation with some other ccTLDs.

Anyone can register a .MX domain name for a period of 1-5 years. Also, if you have a .MX domain registered somewhere else, you can transfer it over to us, so as to keep your site(s) and domain name(s) in one place. An EPP code will be needed for the transfer to take place.

On the heels of the recent PHP updates to our system, the introduction of PHP sockets and the upgrade to PHP 5.6.0, we have now activated PHP NG on our servers.

PHP NG was first introduced in May, 2014 as a major effort to refactor the PHP codebase in order to reduce memory consumption and to increase overall performance.

Although it is still in alpha development, PHP NG already shows impressive performance improvements over the current stable version, PHP 5.6, while maintaining 100% compatibility.

Developers’ tests have shown that PHP 5.7 NG performs anywhere between 20% and 110% faster than PHP 5.6 in real-world applications such as WordPress, Drupal or SugarCRM. Memory consumption has been reduced significantly too.

Also, PHP NG’s refactored codebase is an excellent basis for future optimizations, and its performance level keeps improving.

PHP 5.7 NG is intended for development purposes only and will most probably not be ready for production use this year. However, it has every potential to become a game changer for servers around the world, with more stable betas and even release candidates expected in early 2015.

You can enable PHP NG from the PHP Configuration section of the Web Hosting Control Panel.

PHP NG option in the Control Panel

Semi-dedicated servers in the Australian data center

Last week, our admins were busy configuring server packages in our partner data centers in Europe: semi-dedicated and OpenVZ VPS packages in Finland and in Eastern Europe, as well as semi-dedicated packages in the UK and in the TelePoint data center in Sofia, Bulgaria.

This week, they added semi-dedicated servers to the data center in Australia, so now you will be able to target a new niche on the Australian hosting market.

Target customers from Australia & the region with more advanced needs

With the addition of semi-dedicated servers to the list of services offered in Australia, you can now expand your reach by addressing the needs of more demanding customers.

From today, owners of more complex sites, for example resource-heavy blogs or high-traffic online stores, will be able to seamlessly move to a more powerful web hosting solution with a click of the mouse.


Since the semi-dedicated servers are based on the same cloud hosting platform that is implemented on our shared hosting servers, an existing shared hosting customer will be able to quickly upgrade to a semi-dedicated server by opening a ticket from their Web Hosting Control Panel. Our technicians will allocate the new extra resources to the customer’s account in a matter of minutes and let the customer know immediately.

The Australian data center – key highlights

SIS Group, the Australian data center we partner with, is located in the business district of Sydney. It is a perfect hosting choice for customers from Australia, New Zealand, the islands of Oceania and Southeast Asia.

The facility relies on redundant high-speed Internet connections and is well-equipped to handle high volumes of traffic from the area and abroad.

Strict temperature and humidity control equipment, water detection systems, security controls and powerful backup power supplies guarantee the continuous flow of customers’ data.

Here’s an overview of the facility’s main advantages:

  • 24×7 Onsite Security
  • Raised Computer Room Flooring
  • Multiple UPS Systems
  • Backup Diesel Generators
  • Dark Fibre Connectivity
  • Fire Protection
  • Video Surveillance
  • Rack Mount Equipment
  • Desktop Equipment
  • Device Monitoring
  • Load Balancing
  • Tape Rotation and Offsite Storage
  • Online Data Backups
  • Remote Hands

The following is a repost of the current Wikipedia article on ZFS.


ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

ZFS was originally implemented as open-source software, licensed under the Common Development and Distribution License (CDDL). The ZFS name is registered as a trademark of Oracle Corporation.[3][4]

OpenZFS is an umbrella project aimed at bringing together individuals and companies that use the ZFS file system and work on its improvements.

 

Data integrity

One major feature that distinguishes ZFS from other file systems is that ZFS is designed with a focus on data integrity. That is, it is designed to protect the user’s data on disk against silent data corruption caused by bit rot, current spikes, bugs in disk firmware, phantom writes (the write is dropped on the floor), misdirected reads/writes (the disk accesses the wrong block), DMA parity errors between the array and server memory or from the driver (since the checksum validates data inside the array), driver errors (data winds up in the wrong buffer inside the kernel), accidental overwrites (such as swapping to a live file system), etc.

Data integrity is a high priority in ZFS because recent research shows that none of the currently widespread file systems, such as UFS, Ext,[8] XFS, JFS, or NTFS, nor hardware RAID provide sufficient protection against such problems (hardware RAID has some issues with data integrity).[9][10][11][12][13] Initial research indicates that ZFS protects data better than earlier efforts.[14][15] It is also faster than UFS and can be seen as its successor.[16][17]

ZFS data integrity

For ZFS, data integrity is achieved by using a (Fletcher-based) checksum or a (SHA-256) hash throughout the file system tree.[18] Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system’s data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree.[18] In-flight data corruption or phantom reads/writes (the data written/read checksums correctly but is actually wrong) are undetectable by most filesystems as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so the entire pool self-validates.[19]
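The checksum-in-the-parent idea can be illustrated with a small Merkle tree. This Python sketch uses SHA-256 for every block and models pointers as objects that carry the child's checksum; the class and function names are hypothetical, and it ignores ZFS's on-disk layout.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def serialize(block) -> bytes:
    # A data block is raw bytes; an indirect block is a list of child
    # pointers and is serialized as the concatenation of their checksums.
    if isinstance(block, bytes):
        return block
    return b"".join(child.checksum for child in block)


class BlockPointer:
    """Points at a block and stores that block's checksum (not the block itself)."""

    def __init__(self, block):
        self.block = block
        self.checksum = checksum(serialize(block))


def verify(ptr: BlockPointer) -> bool:
    """Walk from this pointer down to the leaves, re-checking every checksum."""
    if checksum(serialize(ptr.block)) != ptr.checksum:
        return False
    if isinstance(ptr.block, list):
        return all(verify(child) for child in ptr.block)
    return True
```

Overwriting a leaf's data after the fact, as bit rot or a misdirected write would, makes `verify(root)` return False, because the stale checksum lives one level up rather than next to the damaged data.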

When a block is accessed, regardless of whether it is data or meta-data, its checksum is calculated and compared with the stored checksum value of what it “should” be. If the checksums match, the data are passed up the programming stack to the process that asked for it. If the values do not match, then ZFS can heal the data if the storage pool has redundancy via ZFS mirroring or RAID.[20] If the storage pool consists of a single disk, it is possible to provide such redundancy by specifying “copies=2” (or “copies=3”), which means that data will be stored twice (thrice) on the disk, effectively halving (or, for “copies=3”, reducing to one third) the storage capacity of the disk.[21] If redundancy exists, ZFS will fetch a copy of the data (or recreate it via a RAID recovery mechanism), and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update the faulty copy with known-good data so that redundancy can be restored.
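The read path just described, with its fallback to a redundant copy, can be sketched as follows. This is a simplified Python model assuming each copy is a whole block and the expected SHA-256 comes from the parent block pointer; `self_healing_read` is a hypothetical name, and real ZFS also handles RAID-Z reconstruction, which is not shown.

```python
import hashlib


def self_healing_read(copies: list, expected: bytes) -> bytes:
    """Return a copy whose SHA-256 matches `expected`, repairing any bad copies.

    `copies` is a mutable list of byte strings: the sides of a mirror, or the
    replicas kept by "copies=2"/"copies=3" on a single disk.
    """
    good = next((c for c in copies if hashlib.sha256(c).digest() == expected), None)
    if good is None:
        # No intact copy left: the error can only be reported, not fixed.
        raise OSError("all copies failed checksum verification")
    for i, data in enumerate(copies):
        if data != good:
            copies[i] = good  # rewrite the damaged copy with known-good data
    return good
```

With a single copy and no redundancy, a checksum mismatch ends in the error branch, which is exactly the situation “copies=2” is meant to avoid on a one-disk pool.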

ZFS and hardware RAID

If the disks are connected to a RAID controller, it is most efficient to configure it in JBOD mode (i.e. turn off RAID functionality). If a hardware RAID card is used, ZFS always detects all data corruption but cannot always repair it, because the hardware RAID card will interfere. Therefore, the recommendation is not to use a hardware RAID card, or to flash it into JBOD/IT mode. For ZFS to be able to guarantee data integrity, it needs either access to a RAID set (so all data is copied to at least two disks) or, if a single disk is used, redundancy through copies, which duplicates the data on the same logical drive. ZFS copies are a useful feature on notebooks and desktop computers, since the disks are large and copies provide at least some limited redundancy with just a single drive.

There are several reasons why it is better to rely solely on ZFS by using several independent disks and RAID-Z or mirroring. For example, a ZFS volume on top of RAID-0 volumes can be failure-prone even with “copies=2”, as a RAID-0 volume fails if any one of its disks fails. Thus, storing data on RAID-0 with a ZFS volume and “copies=2” enabled does not increase data reliability; it reduces it.
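A back-of-the-envelope calculation shows why. Assuming independent disk failures, each with probability p over some period (an idealization), a RAID-0 set is lost if any member fails, while an N-way ZFS mirror is lost only if all members fail. With “copies=2” on a RAID-0 volume, both copies live on the same volume, so they do not change the first figure. The function names are illustrative.

```python
def raid0_loss(p: float, disks: int) -> float:
    """Probability a RAID-0 set is lost: it dies if ANY member disk dies."""
    return 1 - (1 - p) ** disks


def mirror_loss(p: float, disks: int) -> float:
    """Probability an N-way mirror is lost: ALL member disks must die."""
    return p ** disks
```

With p = 0.05, a single disk gives 0.05, a two-disk RAID-0 gives 0.0975, and a two-way mirror gives 0.0025.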

When using hardware RAID, the controller usually adds controller-dependent data to the drives which prevents software RAID from accessing the user data. While it is possible to read the data with a compatible hardware RAID controller, this inconveniences consumers as a compatible controller usually isn’t readily available. Using the JBOD/RAID-Z combination, any disk controller can be used to resume operation after a controller failure.

Note that hardware RAID configured as JBOD may still detach drives that do not respond in time (as has been seen with many energy-efficient consumer-grade hard drives), and as such, may require TLER/CCTL/ERC-enabled drives to prevent drive dropouts.[22]

Software RAID using ZFS

ZFS offers software RAID through its RAID-Z and mirroring organization schemes. RAID-Z is immune to the write hole error that other RAID types suffer from. There are three RAID-Z modes: RAID-Z1 is similar to RAID 5 (allows one disk to fail), RAID-Z2 is similar to RAID 6 (allows two disks to fail), and RAID-Z3 allows three disks to fail. The need for RAID-Z3 arose recently because RAID configurations with future disks (say, 6–10 TB) may take a long time to repair, in the worst case weeks. During those weeks, the rest of the disks in the RAID are stressed more by the intensive repair process and might subsequently fail, too. By using RAID-Z3, the risk involved with disk replacement is reduced.[23]
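Single parity, the idea behind RAID 5 and RAID-Z1, boils down to XOR: the parity block is the XOR of the data blocks, so any one missing block equals the XOR of everything that survived. The Python sketch below shows only that arithmetic; it ignores RAID-Z's variable stripe width, and RAID-Z2/Z3 need additional parity that plain XOR cannot provide. Function names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity(data_blocks: list) -> bytes:
    return xor_blocks(*data_blocks)


def rebuild_missing(surviving: list, parity_block: bytes) -> bytes:
    # XOR is its own inverse: P ^ D0 ^ D2 = D1 when P = D0 ^ D1 ^ D2.
    return xor_blocks(parity_block, *surviving)
```

Losing a second block before the rebuild finishes leaves two unknowns and one equation, which is why long rebuilds on large disks motivate double and triple parity.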

Mirroring, the other ZFS RAID option, is essentially the same as RAID 1. The difference is that ZFS allows any number of disks in the mirror, for instance, you could create a mirror consisting of three disks, or even eleven disks.[24]

Resilvering and scrub

ZFS has no equivalent of fsck, the repair tool common on Unix filesystems that performs file system validation and repair.[25] Instead, ZFS has a repair tool called “scrub”, which examines and repairs silent corruption and other problems. Some differences are:

  • fsck must be run on an offline filesystem, which means the filesystem must be unmounted and is not usable while being repaired.
  • scrub does not need the ZFS filesystem to be taken offline; scrub is designed to be used on a mounted, live filesystem.
  • fsck usually only checks metadata (such as the journal log) but never checks the data itself. This means, after an fsck, the data might still be corrupt.
  • scrub checks everything, including metadata and the data. The effect can be observed by comparing fsck to scrub times – sometimes a fsck on a large RAID completes in a few minutes, which means only the metadata was checked. Traversing all metadata and data on a large RAID takes many hours, which is exactly what scrub does.

The official recommendation from Sun/Oracle is to scrub enterprise-level disks once a month, and cheaper commodity disks once a week.[26][27]

Storage pools

Unlike traditional file systems which reside on single devices and thus require a volume manager to use more than one device, ZFS filesystems are built on top of virtual storage pools called zpools. A zpool is constructed of virtual devices (vdevs), which are themselves constructed of block devices: files, hard drive partitions, or entire drives, with the latter being the recommended usage.[28] Block devices within a vdev may be configured in different ways, depending on needs and space available: non-redundantly (similar to RAID 0), as a mirror (RAID 1) of two or more devices, as a RAID-Z (similar to RAID-5) group of three or more devices, or as a RAID-Z2 (similar to RAID-6) group of four or more devices.[29] In July 2009, triple-parity RAID-Z3 was added to OpenSolaris.[30][31] RAID-Z is a data-protection technology featured by ZFS in order to reduce the block overhead in mirroring.[32]

Thus, a zpool (ZFS storage pool) is vaguely similar to a computer’s RAM. The total RAM capacity depends on the number of memory sticks and the size of each stick. Likewise, a zpool consists of one or more vdevs. Each vdev can be viewed as a group of hard disks (or partitions, or files, etc.). Each vdev should have redundancy, because if a vdev is lost, the whole zpool is lost. Thus, each vdev should be configured as RAID-Z1, RAID-Z2, mirror, etc. It is not possible to change the number of drives in an existing vdev (Block Pointer Rewrite would allow this, and also allow defragmentation), but it is always possible to increase storage capacity by adding a new vdev to a zpool. It is possible to replace a drive with a larger one and resilver (repair) the zpool. If this procedure is repeated for every disk in a vdev, the zpool will grow in capacity when the last drive is resilvered. Each drive in a vdev contributes only as much capacity as the smallest drive in the group. For instance, a vdev consisting of three 500 GB drives and one 700 GB drive will have a capacity of 4 × 500 GB.
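The capacity rule at the end of the paragraph is simple arithmetic: every drive in a vdev contributes only as much as the smallest member. A tiny Python helper (hypothetical name, raw capacity before any RAID-Z parity or mirroring overhead) reproduces the example and the grow-by-replacement behavior.

```python
def vdev_raw_capacity(drive_sizes_gb: list) -> int:
    """Raw vdev capacity: each member counts only as much as the smallest drive."""
    return len(drive_sizes_gb) * min(drive_sizes_gb)


print(vdev_raw_capacity([500, 500, 500, 700]))      # 2000, i.e. 4 x 500 GB
print(vdev_raw_capacity([1000, 1000, 1000, 500]))   # still 2000 until the last swap
print(vdev_raw_capacity([1000, 1000, 1000, 1000]))  # 4000 once every drive is larger
```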

In addition, pools can have hot spares to compensate for failing disks. When mirroring, block devices can be grouped according to physical chassis, so that the filesystem can continue in the case of the failure of an entire chassis.

Storage pool composition is not limited to similar devices, but can consist of ad-hoc, heterogeneous collections of devices, which ZFS seamlessly pools together, subsequently doling out space to diverse filesystems as needed. Arbitrary storage device types can be added to existing pools to expand their size at any time.[33]

The storage capacity of all vdevs is available to all of the file system instances in the zpool. A quota can be set to limit the amount of space a file system instance can occupy, and a reservation can be set to guarantee that space will be available to a file system instance.

ZFS cache: ARC (L1), L2ARC, ZIL

ZFS uses different layers of disk cache to speed up read and write operations. Ideally, all data should be stored in RAM, but that is too expensive. Therefore, data is automatically cached in a hierarchy to optimize performance vs cost.[34] Frequently accessed data is stored in RAM, and less frequently accessed data can be stored on slower media, such as SSD disks. Data that is not often accessed is not cached and left on the slow hard drives. If old data is suddenly read a lot, ZFS will automatically move it to SSD disks or to RAM.

The first level of disk cache is RAM, which uses a variant of the ARC algorithm. It is similar to a level 1 CPU cache. RAM will always be used for caching, thus this level is always present. There are claims that ZFS servers must have huge amounts of RAM, but that is not true. It is a misinterpretation of the desire to have large ARC disk caches. The ARC is very clever and efficient, which means disks will often not be touched at all, provided the ARC size is sufficiently large. In the worst case, if the RAM size is very small (say, 1 GB), there will hardly be any ARC at all; in this case, ZFS always needs to reach for the disks. This means read performance degrades to disk speed.

The second level of disk cache consists of SSD disks. This level is optional and is easy to add or remove during live usage, as there is no need to shut down the zpool. There are two different caches: one for reads and one for writes.

  • The read SSD cache is called L2ARC and is similar to a level 2 CPU cache. The L2ARC will also considerably speed up deduplication if the entire dedup table can be cached in L2ARC. It can take several hours to fully populate the L2ARC (before it has decided which data are “hot” and should be cached). If the L2ARC device is lost, all reads will go out to the disks, which slows down performance, but nothing else will happen (no data will be lost).
  • The write SSD cache is called the Log Device, and it is used by the ZIL (ZFS intent log). ZIL basically turns synchronous writes into asynchronous writes, which helps e.g. NFS or databases.[35] All data is written to the ZIL like a journal log, but only read after a crash. Thus, the ZIL data is normally never read. Every once in a while, the ZIL will flush the data to the zpool; this is called Transaction Group Commit. In case there is no separate log device added to the zpool, a part of the zpool will automatically be used as ZIL, thus there is always a ZIL on every zpool. It is important that the log device use a disk with low latency. For improved performance, a disk consisting of battery-backed RAM should be used. Because the log device is written to often, an SSD disk will eventually be worn out, but a RAM disk will not. If the log device is lost, it is possible to lose the latest writes, therefore the log device should be mirrored. In earlier versions of ZFS, loss of the log device could result in loss of the entire zpool, therefore one should upgrade ZFS if planning to use a separate log device.

Capacity

ZFS is a 128-bit file system,[36] so it can address $1.84 \times 10^{19}$ times more data than 64-bit systems such as Btrfs. The limits of ZFS are designed to be so large that they should not be encountered in the foreseeable future.

Some theoretical limits in ZFS are:

  • $2^{48}$: number of entries in any individual directory[37]
  • 16 exbibytes ($2^{64}$ bytes): maximum size of a single file
  • 16 exbibytes: maximum size of any attribute
  • 256 zebibytes ($2^{78}$ bytes): maximum size of any zpool
  • $2^{56}$: number of attributes of a file (actually constrained to $2^{48}$ for the number of files in a ZFS file system)
  • $2^{64}$: number of devices in any zpool
  • $2^{64}$: number of zpools in a system
  • $2^{64}$: number of file systems in a zpool
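The figures above are powers of two, which a few lines of Python can check with exact integer arithmetic. KiB, EiB and ZiB are defined here only for the check.

```python
KiB = 2 ** 10
EiB = KiB ** 6  # exbibyte: 2**60 bytes
ZiB = KiB ** 7  # zebibyte: 2**70 bytes

# A 128-bit address space is 2**128 / 2**64 = 2**64 times larger than a 64-bit one.
ratio = 2 ** 128 // 2 ** 64
print(f"{ratio:.3e}")  # 1.845e+19, the 1.84 x 10^19 quoted above

max_file = 16 * EiB    # equals 2**64 bytes
max_pool = 256 * ZiB   # equals 2**78 bytes
print(max_file == 2 ** 64, max_pool == 2 ** 78)  # True True
```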

Copy-on-write transactional model

ZFS uses a copy-on-write transactional object model. All block pointers within the filesystem contain a 256-bit checksum or 256-bit hash (currently a choice between Fletcher-2, Fletcher-4, or SHA-256)[38] of the target block, which is verified when the block is read. Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and ZIL (intent log) write cache is used when synchronous write semantics are required. The blocks are arranged in a tree, as are their checksums (see Merkle signature scheme).
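Fletcher-4, one of the checksum choices listed, is cheap enough to compute on every block. This Python sketch follows the native, little-endian form: four 64-bit running sums over the block's 32-bit words, together forming the 256-bit checksum. It is written for clarity, not speed, and ZFS also has a byte-swapped variant for blocks written on hosts of the other byte order.

```python
import struct

MASK64 = (1 << 64) - 1  # the four accumulators wrap around at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Fletcher-4: four running sums over little-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d  # 4 x 64 bits = 256-bit checksum
```

Each sum feeds the next, so the result depends on word order as well as content: swapping two words changes b, c and d even though a stays the same.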

Snapshots and clones

An advantage of copy-on-write is that, when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are created very quickly, since all the data composing the snapshot is already stored. They are also space efficient, since any unchanged data is shared among the file system and its snapshots.

Writeable snapshots (“clones”) can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is an implementation of the Copy-on-write principle.
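Copy-on-write is what makes snapshots nearly free. The toy Python store below keeps one level of block pointers instead of ZFS's tree (a real write also rewrites the indirect blocks up to the root); class and method names are hypothetical. A snapshot only records the current block ids, and a later write allocates a new block instead of touching the old one.

```python
import itertools


class CowStore:
    """Toy copy-on-write store: blocks are written once and never modified."""

    def __init__(self, blocks: list):
        self._ids = itertools.count()
        self.blocks = {}                               # block id -> bytes, never overwritten
        self.live = [self._alloc(b) for b in blocks]   # live "file": list of block ids
        self.snapshots = {}                            # snapshot name -> frozen tuple of ids

    def _alloc(self, data: bytes) -> int:
        block_id = next(self._ids)
        self.blocks[block_id] = data
        return block_id

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: allocate a new block and repoint; the old block stays intact.
        self.live = self.live[:index] + [self._alloc(data)] + self.live[index + 1:]

    def snapshot(self, name: str) -> None:
        # Instant and cheap: record the current block ids, copy no data.
        self.snapshots[name] = tuple(self.live)

    def read(self, ids) -> bytes:
        return b"".join(self.blocks[i] for i in ids)
```

Starting from `CowStore([b"aa", b"bb", b"cc"])`, calling `snapshot("monday")` and then `write(1, b"XX")` leaves the live ids reading b"aaXXcc" while the snapshot still reads b"aabbcc", and the unchanged first block is shared by both. A clone would simply be a new live list initialized from a snapshot's ids.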

Sending and receiving snapshots

ZFS file systems can be moved to other pools, including pools on remote hosts over the network, as the send command creates a stream representation of the file system’s state. This stream can either describe complete contents of the file system at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g. for synchronizing offsite backups or high availability mirrors of a pool.
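The delta stream is cheap because every ZFS block pointer records the transaction group (txg) in which the block was born, so blocks written after the older snapshot are easy to find. The Python sketch below shows only that selection step, over a hypothetical flat map of block ids; a real send stream also walks the block tree, skips unchanged subtrees, and describes freed data.

```python
def incremental_send(snapshot_blocks: dict, from_txg: int) -> dict:
    """Select what an incremental stream must carry.

    snapshot_blocks maps block id -> (birth_txg, data) for every block the
    newer snapshot references; birth_txg is the transaction group in which
    the block was written. Anything born after the older snapshot's txg is new.
    """
    return {
        block_id: data
        for block_id, (birth, data) in snapshot_blocks.items()
        if birth > from_txg
    }
```

The cost scales with the number of changed blocks, not with the size of the file system, which is what makes frequent offsite synchronization practical.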

Dynamic striping

Dynamic striping across all devices to maximize throughput means that as additional devices are added to the zpool, the stripe width automatically expands to include them; thus, all disks in a pool are used, which balances the write load across them.

Variable block sizes

ZFS uses variable-sized blocks, with 128 KB as the default size. Available features allow the administrator to tune the maximum block size which is used, as certain workloads do not perform well with large blocks. If data compression is enabled, variable block sizes are used. If a block can be compressed to fit into a smaller block size, the smaller size is used on the disk to use less storage and improve IO throughput (though at the cost of increased CPU use for the compression and decompression operations).[39]
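Whether compression saves space comes down to how many disk sectors the compressed block needs. This Python sketch assumes 512-byte sectors (ashift=9) and uses zlib as a stand-in for ZFS's compressors; `allocated_size` is a hypothetical helper. It stores a block compressed only when that takes less room than storing it raw.

```python
import os
import zlib

SECTOR = 512  # assumed allocation unit (ashift=9)


def allocated_size(block: bytes) -> int:
    """On-disk bytes for one logical block when compression is enabled."""
    compressed = zlib.compress(block)  # stand-in for ZFS's lzjb/lz4/gzip
    rounded = -(-len(compressed) // SECTOR) * SECTOR  # round up to whole sectors
    return rounded if rounded < len(block) else len(block)  # no gain: store raw


print(allocated_size(bytes(128 * 1024)))       # 512
print(allocated_size(os.urandom(128 * 1024)))  # 131072
```

A 128 KiB block of zeros shrinks to a single sector, while random data, which does not compress, is stored at full size.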

Lightweight filesystem creation

In ZFS, filesystem manipulation within a storage pool is easier than volume manipulation within a traditional filesystem; the time and effort required to create or expand a ZFS filesystem is closer to that of making a new directory than it is to volume manipulation in some other systems.

Cache management

ZFS also uses the Adaptive Replacement Cache (ARC), a new method for read cache management, instead of the traditional Solaris virtual memory page cache. For write caching, ZFS employs the ZFS Intent Log (ZIL). ZFS makes allowances for both of these methods to incorporate separate virtual devices to improve the total IOPS. For read operations it is the “cache” vdev and for write operations it is the “log” vdev.[40]

Adaptive endianness

Pools and their associated ZFS file systems can be moved between different platform architectures, including systems implementing different byte orders. The ZFS block pointer format stores filesystem metadata in an endian-adaptive way; individual metadata blocks are written with the native byte order of the system writing the block. When reading, if the stored endianness does not match the endianness of the system, the metadata is byte-swapped in memory.

This does not affect the stored data; as is usual in POSIX systems, files appear to applications as simple arrays of bytes, so applications creating and reading data remain responsible for doing so in a way independent of the underlying system’s endianness.

Deduplication

Data deduplication capabilities were added to the ZFS source repository at the end of October 2009,[41] and relevant OpenSolaris ZFS development packages have been available since December 3, 2009 (build 128).

Effective use of deduplication may require large RAM capacity; recommendations range between 1 and 5 GB of RAM for every TB of storage.[42][43][44] Insufficient physical memory or lack of ZFS cache can result in virtual memory thrashing, which can either lower performance or result in complete memory starvation.[citation needed] Solid-state drives (SSDs) can be used to cache deduplication tables, thereby speeding up deduplication performance.[citation needed]
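The memory cost comes from the deduplication table, which holds one entry per unique block and is consulted on every deduplicated write. The toy table below keys blocks by their SHA-256 digest and keeps only a reference count. A real ZFS dedup table entry holds more, such as the block's on-disk location, so treat this as a simplified model. The second function applies the 1 to 5 GB per TB rule of thumb quoted above. The 12 TB pool is hypothetical.

```python
import hashlib


def dedup_write(ddt: dict, block: bytes) -> bool:
    """Record a write in a toy dedup table.

    Returns True if the block is new and must be allocated, or False if an
    identical block already exists and only its reference count grows.
    """
    key = hashlib.sha256(block).digest()
    if key in ddt:
        ddt[key] += 1
        return False
    ddt[key] = 1
    return True


def dedup_ram_range_gb(pool_tb: float, low: float = 1.0, high: float = 5.0) -> tuple[float, float]:
    """Apply the quoted rule of thumb: 1 to 5 GB of RAM per TB of storage."""
    return pool_tb * low, pool_tb * high


table = {}
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]   # third block duplicates the first
new_allocations = [dedup_write(table, b) for b in blocks]
print(new_allocations, len(table))
print(dedup_ram_range_gb(12))                      # hypothetical 12 TB pool
```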

Other storage vendors use modified versions of ZFS to achieve very high data compression ratios. Two examples in 2012 were GreenBytes[45] and Tegile.[46]

Encryption

With Oracle Solaris, the encryption capability in ZFS[47] is embedded into the I/O pipeline. During writes, a block may be compressed, encrypted, checksummed and then deduplicated, in that order. The policy for encryption is set at the dataset level when datasets (file systems or ZVOLs) are created. The wrapping keys provided by the user/administrator can be changed at any time without taking the file system offline. By default, child datasets inherit the wrapping key. The data encryption keys are randomly generated at dataset creation time. Only descendant datasets (snapshots and clones) share data encryption keys.[48] A command is provided to switch to a new data encryption key for a clone, or at any time; this does not re-encrypt already existing data, but instead uses an encrypted master-key mechanism.

Additional capabilities

  • Explicit I/O priority with deadline scheduling.
  • Claimed globally optimal I/O sorting and aggregation.
  • Multiple independent prefetch streams with automatic length and stride detection.
  • Parallel, constant-time directory operations.
  • End-to-end checksumming, using a kind of “Data Integrity Field”, allowing data corruption detection (and recovery if you have redundancy in the pool).
  • Transparent filesystem compression. Supports LZJB and gzip.[49]
  • Intelligent scrubbing and resilvering (resyncing).[50]
  • Load and space usage sharing among disks in the pool.[51]
  • Ditto blocks: Configurable data replication per filesystem, with zero, one or two extra copies requested per write for user data, and with that same base number of copies plus one or two for metadata (according to metadata importance).[52] If the pool has several devices, ZFS tries to replicate over different devices. Ditto blocks are primarily an additional protection against corrupted sectors, not against total disk failure.[53]
  • ZFS design (copy-on-write + superblocks) is safe when using disks with write cache enabled, if they honor the write barriers. This feature provides safety and a performance boost compared with some other filesystems.
  • On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.
  • Per-user and per-group quotas support.[54]
  • Filesystem encryption since Solaris 11 Express[1]
  • Pools can be imported in read-only mode
  • It is possible to recover data by rolling back entire transactions at the time of importing the zpool.
  • ZFS is not a clustered filesystem; however, clustered ZFS is available from third parties.[citation needed]
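End-to-end checksumming with recovery from redundancy can be sketched using a Fletcher-4 style checksum, one of the checksum algorithms ZFS offers. The sketch makes three assumptions. It reads data as little-endian 32-bit words. It models "redundancy" as a list of candidate copies, such as mirror sides or ditto blocks. It only returns a good copy, whereas ZFS also repairs the damaged one. The function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> tuple[int, int, int, int]:
    """Fletcher-4 style checksum: four running 64-bit sums over 32-bit words."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):   # little-endian 32-bit words
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def read_verified(copies: list[bytes], expected: tuple[int, int, int, int]) -> bytes:
    """Return the first copy whose checksum matches the one stored by the parent."""
    for copy in copies:
        if fletcher4(copy) == expected:
            return copy
    raise IOError("checksum mismatch on every copy: unrecoverable corruption")


good = bytes(range(64))
stored_checksum = fletcher4(good)      # kept separately, with the parent block pointer
damaged = b"\xff" + good[1:]           # simulate a corrupted sector
print(read_verified([damaged, good], stored_checksum) == good)
```

The checksum is stored apart from the data it covers. Because of that, a corrupted block cannot "vouch for itself", and the reader can tell which copy is intact.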

Limitations

  • Capacity expansion is normally achieved by adding groups of disks as a top-level vdev: simple device, RAID-Z, RAID-Z2, RAID-Z3, or mirrored. Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to heal itself; the heal time will depend on the amount of stored information, not the disk size.
  • As of Solaris 10 Update 11 and Solaris 11.2, it is neither possible to reduce the number of top-level vdevs in a pool, nor to otherwise reduce pool capacity.[55] This functionality was said to be in development already in 2007.[56]
  • It is not possible to add a disk as a column to a RAID-Z, RAID-Z2, or RAID-Z3 vdev. This feature depends on the block pointer rewrite functionality due to be added soon. One can however create a new RAID-Z vdev and add it to the zpool.[57]
  • Vdevs cannot be nested, so a mirror or RAID-Z top-level vdev can only contain files or disks. Mirrors of mirrors (or other combinations) are not allowed.[citation needed]
  • Reconfiguring the number of devices in a top-level vdev requires copying data offline, destroying the pool, and recreating the pool with the new top-level vdev configuration. There are two exceptions. Extra redundancy can be added to an existing mirror at any time. In addition, if all top-level vdevs are mirrors with sufficient redundancy, the zpool split[58] command can be used to remove a vdev from each top-level vdev in the pool, creating a second pool with identical data.
  • Resilvering (repairing) a crashed disk in a ZFS RAID takes a long time; this applies to all types of RAID in one way or another. Future large disks, say 5 TB or 6 TB, can take several days to repair. Repairing a RAID puts additional stress on the remaining disks, which might cause another of them to crash and, in a raidz1 configuration, lose all data in the storage pool. For this reason raidz1 (similar to RAID-5) should be avoided with large disks in favour of raidz2 (tolerates two failed disks) or raidz3 (tolerates three failed disks).[59] Note, however, that ZFS RAID differs from conventional RAID by reconstructing only live data and metadata when replacing a disk, not the entire disk including blank and garbage blocks. Replacing a member disk in a ZFS pool that is only partially full therefore takes proportionately less time than with conventional RAID.[60]
  • IOPS performance of a ZFS storage pool can suffer if the ZFS RAID is not appropriately configured; this applies to all types of RAID in one way or another. If the zpool consists of only one group of disks, say eight disks in raidz2, then write IOPS performance will be that of a single disk, while read IOPS will be the sum of the eight individual disks. To get high write IOPS performance, the zpool should therefore consist of several vdevs, because each vdev gives the write IOPS of a single disk. There are also ways to mitigate this IOPS problem, for instance adding SSDs as a ZIL device, which can boost IOPS into the hundreds of thousands.[61] In short, a zpool should consist of several vdevs, each consisting of 8–12 disks. Creating a zpool with a single large vdev, say 20 disks, is not recommended, because write IOPS performance will be that of a single disk and resilver time will be very long (possibly weeks with future large drives).
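The rules of thumb in the last two items can be turned into a rough calculator. It follows the text's simplifications: each vdev contributes about one disk's write IOPS, reads scale with the total disk count, and resilver time scales with live data rather than disk size. The per-disk IOPS and resilver throughput figures are hypothetical inputs, not measurements. The calculator also assumes a constant resilver speed.

```python
def pool_iops(vdev_widths: list[int], disk_iops: int) -> dict[str, int]:
    """Estimate pool IOPS from the number of disks in each top-level vdev."""
    return {
        "write_iops": len(vdev_widths) * disk_iops,   # about one disk per vdev
        "read_iops": sum(vdev_widths) * disk_iops,    # sum of all disks, per the text
    }


def resilver_hours(live_data_tb: float, throughput_mb_s: float) -> float:
    """Time to rebuild only the live data (1 TB = 1,000,000 MB) at a constant rate."""
    return live_data_tb * 1_000_000 / throughput_mb_s / 3600


DISK_IOPS = 150  # hypothetical figure for one spinning disk
print(pool_iops([8], DISK_IOPS))              # one 8-disk raidz2 vdev
print(pool_iops([8, 8, 8], DISK_IOPS))        # three 8-disk raidz2 vdevs
print(round(resilver_hours(2.0, 100.0), 1))   # 2 TB of live data at 100 MB/s
```

With these assumptions, three 8-disk vdevs triple the estimated write IOPS compared with one. This is why the text recommends several vdevs over a single wide one.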

Platforms

Solaris

Solaris 10 update 2 and later

ZFS is part of Sun’s own Solaris operating system and is thus available on both SPARC and x86-based systems.

Solaris 11

After Oracle’s Solaris 11 Express release, the OS/Net consolidation (the main OS code) was made proprietary and closed-source, and further ZFS upgrades and implementations inside Solaris (such as encryption) are not compatible with other non-proprietary implementations which use previous versions of ZFS.

When creating a new ZFS pool, to retain the ability to access the pool from other non-proprietary Solaris-based distributions, it is recommended to upgrade to Solaris 11 Express from OpenSolaris (snv_134b), and thereby stay at ZFS version 28.

OpenSolaris

OpenSolaris 2008.05 and 2009.06 use ZFS as their default filesystem. There are over a dozen third-party distributions, of which nearly a dozen are mentioned here. (OpenIndiana and illumos are two new distributions not included on the OpenSolaris distribution reference page.)

OpenIndiana

OpenIndiana 148 and 151 use ZFS version 28, as implemented in illumos.

By upgrading from OpenSolaris snv_134 to both OpenIndiana and Solaris 11 Express, one also has the ability to upgrade and separately boot Solaris 11 Express on the same ZFS pool, but one should not install Solaris 11 Express first because of ZFS incompatibilities introduced by Oracle past ZFS version 28.[62]

BSD

OS X

OpenZFS on OSX (abbreviated to O3X) is an implementation of ZFS for OS X.[63] O3X is under active development, with close relation to ZFS on Linux and illumos’ ZFS implementation, while maintaining feature flag compatibility with ZFS on Linux. O3X implements zpool version 5000, and includes the Solaris Porting Layer (SPL) originally authored for MacZFS, which has been further enhanced to include a memory management layer based on the illumos kmem and vmem allocators. O3X is fully featured, supporting LZ4 compression, deduplication, ARC, L2ARC, and SLOG.[citation needed]

MacZFS is free software providing support for ZFS on OS X. The stable legacy branch provides up to ZFS pool version 8 and ZFS filesystem version 2. The development branch, based on ZFS on Linux and OpenZFS, provides updated ZFS functionality, such as up to ZFS zpool version 5000 and feature flags.[64][65]

A proprietary implementation of ZFS (Zevo) was available at no cost from GreenBytes, Inc., implementing up to ZFS file system version 5 and ZFS pool version 28.[66] Zevo offered a limited ZFS feature set, pending further commercial development; it was sold to Oracle in 2014, with unknown future plans.[citation needed]

DragonFlyBSD

Edward O’Callaghan started the initial port of ZFS to DragonFlyBSD.[67]

NetBSD

The NetBSD ZFS port was started as a part of the 2007 Google Summer of Code and in August 2009, the code was merged into NetBSD’s source tree.[68]

FreeBSD

Paweł Jakub Dawidek ported ZFS to FreeBSD, and it has been part of FreeBSD since version 7.0.[69] This includes zfsboot, which allows booting FreeBSD directly from a ZFS volume.[70][71]

FreeBSD’s ZFS implementation is fully functional; the only missing features are kernel CIFS server and iSCSI, but at least the latter can be added using externally available packages.[72] Samba can be used to provide a userspace CIFS server.

FreeBSD 7-STABLE (where updates to the series of versions 7.x are committed to) uses zpool version 6.

FreeBSD 8 includes a much-updated implementation of ZFS, and zpool version 13 is supported.[73] zpool version 14 support was added to the 8-STABLE branch on January 11, 2010,[74] and is included in FreeBSD release 8.1. zpool version 15 is supported in release 8.2.[75] The 8-STABLE branch gained support for zpool version 28 and zfs version 5 in early June 2011.[76] These changes were released mid-April 2012 with FreeBSD 8.3.[77]

FreeBSD 9.0-RELEASE uses ZFS Pool version 28.[78][79]

FreeBSD 9.2-RELEASE is the first FreeBSD version to use the new “feature flags” based implementation, and thus pool version 5000.[80]

MidnightBSD

MidnightBSD, a desktop operating system derived from FreeBSD, supports ZFS storage pool version 6 as of 0.3-RELEASE. This was derived from code included in FreeBSD 7.0-RELEASE. An update to storage pool version 28 is in progress in 0.4-CURRENT, based on 9-STABLE sources around FreeBSD 9.1-RELEASE code.

PC-BSD

PC-BSD is a desktop version of FreeBSD, which inherits FreeBSD’s ZFS support, similarly to FreeNAS. It also allows installation with disk encryption using geli. Its graphical installer can place even / (root) on ZFS, set up RAID-Z pools, and install GNOME right from the start. The current PC-BSD 10.0+ “Joule Edition” has ZFS filesystem version 5 and ZFS storage pool version 5000.

FreeNAS

FreeNAS, an embedded open source network-attached storage (NAS) distribution based on FreeBSD, has the same ZFS support as FreeBSD and PC-BSD.

NAS4Free

NAS4Free, an embedded open source network-attached storage (NAS) distribution based on FreeBSD 9.2, has the same ZFS support as FreeBSD 9.2, ZFS storage pool version 5000. This project is a continuation of the FreeNAS 7 series project.[81]

Debian GNU/kFreeBSD

Being based on the FreeBSD kernel, Debian GNU/kFreeBSD has ZFS support from the kernel; however, additional userland tools are required.[82] It is possible to have ZFS as the root or /boot file system,[83] in which case the required GRUB configuration has been performed by the Debian installer since the Wheezy release.[84]

As of 31 January 2013, the ZPool version available is 14 for the Squeeze release, and 28 for the Wheezy-9 release.[85]

Linux

ZFS has several Linux implementations despite the fact that the GNU General Public License (GPL), under which the Linux kernel is licensed, is incompatible with the Common Development and Distribution License (CDDL) under which ZFS is distributed.[86] Under these licensing models, a single derived work of both projects cannot be legally distributed, as it is not possible to simultaneously meet both licenses’ requirements.[87] To include ZFS in the Linux kernel, ZFS would have to be cleanly reimplemented, and patents may hamper this.[88]

This problem is being worked around by providing the kernel facilities through a separate kernel module, a technical solution for a legal problem that is also being employed by vendors and distributors of proprietary hardware drivers.

Native ZFS on Linux

A native port of ZFS for Linux produced by the Lawrence Livermore National Laboratory (LLNL) was released in March 2013,[89][90] with the following key events:[91]

  • 2008: prototype to determine viability
  • 2009: initial ZVOL and Lustre support
  • 2010: development moved to GitHub
  • 2011: POSIX layer added
  • 2011: community of early adopters
  • 2012: production usage of ZFS
  • 2013: stable GA release.

Of the major distributions, Ubuntu and Gentoo have very good support for ZFS on Linux, meaning that required packages can be installed from their own package repositories, and configuring a ZFS root filesystem is well documented.[92][93] Slackware also provides documentation on supporting ZFS, both as a kernel module[94] and when built into the kernel.[95]

The current zpool version supported by ZFS on Linux is 5000.[96]

Linux FUSE

Another solution to the license incompatibility was to port ZFS to Linux’s FUSE system, so that the filesystem runs entirely in userspace instead of as part of the Linux kernel, in which case it is not considered a derived work of the kernel. A project to do this was sponsored by Google’s Summer of Code program in 2006.[97]

KQ InfoTech

Another native port for Linux was developed by KQ InfoTech in 2010.[98][99] This port used the zvol implementation from the Lawrence Livermore National Laboratory as a starting point. A release supporting zpool v28 was announced in January 2011.[100] In April 2011, KQ InfoTech was acquired by sTec, Inc., and their work on ZFS ceased.[101] Source code of this port can be found on GitHub.[102]

The work of KQ InfoTech was pulled back into the native port of ZFS for Linux, produced by the Lawrence Livermore National Laboratory.[101]

List of operating systems supporting ZFS

The original Wikipedia article lists the operating systems, distributions and add-ons that support ZFS, the zpool version each supports, and the Solaris build each is based on (if any).

 

Liquid Layer Networks | Cloud Hosting with ZFS

LiquidLayer.net