Misplaced Pages

Independent Reference Model

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Independent Reference Model" – news · newspapers · books · scholar · JSTOR (October 2014) (Learn how and when to remove this message)
This article is written like a personal reflection, personal essay, or argumentative essay that states a Misplaced Pages editor's personal feelings or presents an original argument about a topic. Please help improve it by rewriting it in an encyclopedic style. (May 2024) (Learn how and when to remove this message)
(Learn how and when to remove this message)

The Independent Reference Model (I.R.M.) is a conceptual model used in the analysis of storage system: disk drives, caches, etc.

Under this model the references to stored objects are independent random variables.

Description

The motivation for coming up with this model (and others like it) is to compensate for the lack of "traces" in such storage devices.

A "trace" is simply a logging of a set of data regarding the performance of the storage device, focusing on the I/O requests: how many read/write operations, the size of each request, the exact address (LUN-wise), and a time-stamp.

Accurate, valid and detailed traces of actual storage systems are very difficult to get for the purpose of academic analysis (for reasons that are beyond the scope of this article), and that is why such models are a necessity.

Usually, the data that is available is much coarser and low in quality, in comparison to a full trace: For example, the data might record for every unit of time T, the number of I/Os that took place at each LUN (or track), along with the total hit/miss ratios.

For example: For a disk drive with 4 tracks, A, B, C and D and after 15 minutes of work, the I/O requests were the following: 7600, 20, 50, 6000 for A, B, C and D respectively.

It is easy to see why this data is insufficient to determine the actual workload: Consider a second even more simple example: Two tracks, A and B, which each have 1,000 I/Os during 15 minutes.

To answer the simple question, "How hard did the disk work during those 15 minutes?" then consider these two following scenarios:

  • (I) The disk first received and responded to all 1,000 I/O requests at track A, and later, all 1,000 I/Os of track B.
  • (II) The disk received and responded to an I/O request from a different track interchangeably: First at A, then at B, then A again, alternating A/B up to 1000 times.

It is easy to see that, at each of these scenarios, the amount of work done by the disks is very different (in the first scenario, the case being that the disk did a minimal amount work, not having to travel between tracks more than once, and in the second scenario, a maximal amount of work).

The I.R.M. was first introduced by E. Coffman and P. Denning, and it is still in active use today. It is the most simplified model.

In this "memoryless" model, every I/O reference represents an i.i.d multinomial random variable, whose outcome is the location of the next reference track. Thus, arrivals to any given track must occur at a specific average rate, which is directly proportional to the probability of requesting the track. More specifically, the arrival rate of requests to a given track equals its access probability, of being referenced, multiplied by the overall rate of requests.

That is, we denote N as the sum of all the I/O requests (both read and write) and assign to each track the probability the number of I/Os, which came from it, divided by N. In the case of our first example: N = 7600 + 20 + 50 + 6000 = 13,670 and we'll assign the following probabilities to each track:

A → 7600/N, B → 20/N, C → 50/N and D → 6000/N.

The benefit of this model, other than being simple and easy to work with, is its conservative property. This means that when analyzing the worst-case-scenario, we cannot be off by very much in the result of the model, as illustrated in the following example:

  • Going back to the second example:
In the best-case-scenario - the disk only traveled once from A to B, and in the worst-case, the disk traveled 2,000 times back or forth.
Using the I.R.M. model (a technical computation that will not be brought here), then the expectancy is 1,000 travels between tracks.
That is: the result was off from the worst-case-scenario by a multiple of two, while in the best-case-scenario, it actually spared a multiple of 1,000!

Indeed, it can be proven that the I.R.M. model always satisfies that it is always "off" by, at most, a multiple of two.

References

  1. Flajolet, Philippe (1992). "Birthday paradox, coupon collectors, caching algorithms and self-organizing search". Discrete Applied Mathematics. 39 (3): 207–229. doi:10.1016/0166-218x(92)90177-c.
  2. Coffman, Edward G. Jr.; Denning, Peter J. (1973-01-01). Operating Systems Theory. Prentice Hall Professional Technical Reference. ISBN 978-0136378686.
Category: