Notions, Notations and Axioms

Set theory was developed in the second half of the Nineteenth Century. It has its roots in the work of Georg Cantor, although contributions of others such as Gottlob Frege and Giuseppe Peano were significant. Ultimately, the goal of Set Theory was to provide a common axiomatic basis for all of mathematics. In some sense, mathematics could then be reduced to logic. Attempts to provide an axiomatic basis for mathematics were undertaken by such prominent individuals as Bertrand Russell, Alfred North Whitehead, and David Hilbert.

A set is a collection of things of any kind. If B is a set we call the "things" in B the elements or members of B. In symbolsmeans  b is an element of B. Similarly, for a set B the statementmeans that the object b is not in B. Sets themselves are often symbolized by enclosing their elements within "curly brackets". For example {Jimmy Carter, Ronald Reagan, George Bush, Bill Clinton} represents the set of US presidents during the years 1980 to 1995. Note: The use of { } , which is also employed as a grouping symbol, for sets could be confusing, but generally the context makes what is intended clear.

Sets can also be described by a rule or predicate members must satisfy. If P(x) is the predicate "x was a president of the US during the years 1980 to 1995", then the set listed above could be symbolized as {x | P(x) }. This is rendered in words as "the set of all x such that P(x)". The "|" stands for the expression "such that", although some authors prefer ":" to mean the same thing. The use of curly brackets and a rule to specify sets is called set-builder notation and is used throughout mathematics.

Naïve Set Theory is based on the following two axioms.

1. Axiom of Abstraction (or Comprehension): Any collection of objects that can either be listed or described by some predicate constitutes a set. Furthermore, an object is a member of a set if and only if it is one of the objects listed or satisfies the open predicate describing the set. In symbols given any predicate, P(x), the set {x | P(x)} exists and.

2. Axiom of Extensionality: Given two sets A and B, A = B if and only if . In words, two sets are identical if and only if they have exactly the same elements.

Let A = { Bob Jones, Felix the cat, Mars, 9 } and B = { 9, Mars, Bob Jones, Felix the cat}. By the application of the axioms A = B, since every member of one set is also in the other set. Thus, the order in which elements are listed is immaterial in defining a set. A set has no characteristics other than its status as a collection of its elements.

These two axioms seem very intuitive and self-evident. In fact, they seem almost devoid of content. You might suspect that they lack sufficient power to prove anything of importance. In fact, these two axioms are too powerful! You can prove absolutely everything from them because they are self-contradictory! We will discus this later when we outline Russell's Paradox. Suffice it to say that the problem lies in the Axiom of Abstraction's lack of restriction on the predicate P(x). Thus, by the Axiom of Abstraction, we are allowed to formulate things such as the set of all sets. By making our universe too big we are begging for trouble! In our use of Naïve set theory we will purposely avoid such constructions.

For a given predicate P(x), consider the set . Suppose for some a, then by the Axiom of Extensionality, . Hence, by Indirect Proof, we've established that the set Q has no members, i.e., . This set is called the empty Set or the null Set and is commonly symbolized by either empty brackets { } or the symbol . The empty set goes by infinitely many aliases. For example, the following sets are all different names for the one and only empty set. (Z is the "standard" name for the set of integers.)

Note: The set is not the empty set. This is a set of sets which has an element, namely the null set.

Some sets are contained entirely within other sets. For example, the set of women is contained within the set of human beings. We say that is a subset of . In symbols,

. The formal definition of inclusion (A being a subset of B) is stated as follows:

In words, given any two sets A and B, A is a subset of B if and only if every element of A is also an element of B.

Some authors use the symbolinstead offor inclusion. In analogy to the symbols <and, we will use for inclusion and will reserve to mean that A is a proper subset of B. This means that A is contained in B, but is not all of B. In symbols, if A and B are sets, . Sinceisalways true when p is false and since is always false, for any set A, it is true that . Hence, the empty set is a subset of every set,. From the notion of subset we can characterize set equality by the following theorem, given two sets A and B,.

Proof: Assume A = B, then by the Axiom of Extensionality every element in A is also in B and every element in B is also in A, so and. So. Assume . Pick any x. If x is in set A, since, x is also in B. If x is in set B, since, xis also in A. So,. So by the Axiom of Extensionality, A = B. So.

Consider the set L = {a, b, c}. The full list of the 8 subsets of L is as follows:

,{a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}.

The set of all possible subsets of a given set A is called the power set of A, in symbols P(A). Thus, for the set L,

P(L) = {,{a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.

Unions, Intersections and Relative Complements

Given two sets A and B we can define the following binary set operations.

1. The union of A and B .

2. The intersection of A and B :.

3. The relative complement of A in B .

The union of two sets is everything in either one of them. It is the "sum total" of the two sets. The intersection of two sets is the "overlap", i.e., the part of the two sets which is common to both. If the intersection is empty, we say the two sets are disjoint. This means they have no elements in common. In probability theory two disjoint events are called mutually exclusive, which means one event happening excludes the possibility of the other having happened. The relative complement of A in B are the elements in B without the ones also in A. One might read this as B "minus"

A or B "without" A. For example, if A = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and B = {0, 2, 4, 6, 8, 10, 12, 14, 16}, then


From the definitions of intersection, union and inclusion, we can immediately establish that for any sets A and B,, and .

Applying the indicated tautology or contradiction along with the set operation definitions establishes each of the following set operation identities for any sets A, B, and C.

Tautology Set Operation Identity
Contradiction Set Operation Identity

The above identities serve as the basis for an "algebra of sets".

We can now prove the following two theorems:




1. . Now consider any x in A. Since , x is also in B. Therefore x is in . Hence,. Since ,we have .

2. . Now consider any x in . Then x is either in A or B, but since , if x is in A, it is also in B. Therefore x is in B. Hence,. Since,we have .

As was noted earlier, allowing the scope of sets to become too large (such as the set of all sets) can lead to contradictions. This can often be avoided by imagining that all the sets under consideration are subsets of some finite universe. Some authors use U for such a set, but we will call it S to avoid confusion with the set operation of union. S is also the symbol used in probability for the universe of all possible outcomes of an experiment. In this context S is called the Sample Space. A common usage is to abbreviateby . This of course is a more concise notation. It also reflects the definitions of Boolean Algebra, where multiplication, as indicated by the juxtaposition of two variables, is defined as set intersection and addition is set union. Working within the restriction of a universe S, allows us to define the complement of a set A as the relative complement of A in S.

Since the complement of A is that part of "everything" (i.e., S ) that's not A , we have the interpretation of complement as negation. In symbols, is "like".

Assuming that the sets A and B are subsets of S and using the algebra of sets we can establish the following:



Set Identity Interpretation
What's not part of everything is nothing
What's not part of nothing is everything.
Not (not A) is A.
A and not A is always false.
B but not A is B without A.
DeMorgan's Rule for negating an OR.
DeMorgan's Rule for negating an AND.
A is present when either A is present with B or A is present without B.
Either A is in the universe (S) or it is not (EMI).

Set operations in a universe S can be viewed graphically by a Venn diagram. Here the rectangle represents S and any subset of S is drawn as a closed figure within this rectangle. This is illustrated below.

Venn Diagram Showing the Intersection and Union of Sets A and B

Sets of Numbers

The most basic numbers are the natural numbers ("counting" numbers or positive integers). While there are ways to construct the natural numbers using sets, such a development is beyond the scope of these notes. Furthermore, this construction is not intuitive and strikes many people (mathematicians included) as artificial. Thus, we will consider the natural numbers as undefined and primitive objects. This approach can also be taken to extremes as seen in the work of Leopold Kronecker, who declared "God created the integers, all else is the work of man". The set of natural numbers is often symbolized as either Nor Z+. If the ellipsis is understood to represent the same sequence repeating without end, then N = { 1, 2, 3, 4, 5, 6, 7, … }.

After much resistance zero was eventually accepted as a "valid" number. The set of whole numbers is given by

W = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, … } = union of N with { 0 }. The negative of a natural number, n, is its opposite , so that . We define the set of integers as Z =.

Since not every mathematical process (for example, measurements with a rule) results in an integer answer, we need to consider numbers that are fractions or ratios of integers. Assuming that the process of division is understood, we can write


For integers this can be interpreted as repeated subtraction just as multiplication can be interpreted as repeated addition. Thus, 34 divided by 5 is 6, remainder 4. This means that we can subtract 5 from 34 six times before we get a remainder of 4 less than 5. Essentially, thecomputation  tells us how many 5's are in 34. There are six 5's in 34 with 4 left over. This is one reason why division by zero is meaningless. How many 0's are in 34? How many 0's could we subtract from 34 till we get a remainder less than 0?  So when we write , we assume that q is not zero. We therefore define the set of rational numbers as the set of all permissible ratios of integers. In symbols, the set of rationals is given by

Q =. A second way to characterize the set of rationals is

Q = {x | x has a terminating or repeating decimal representation }. In fact this definition can be shortened as just the set of repeating decimals. Any terminating decimal is a repeating decimal with a repeating string of 0's. For example, 3.458 = 3.458000… . One problem with defining rational numbers this way it that the decimal representation of some rational numbers is not unique. Consider the decimal 0.49999… , where it is understood that the string of 9's never stops. This is the rational number one half which is also represented by the terminating decimal 0.5 . This redundancy in representation exits for every rational number that is a terminating decimal. To understand the equivalence of these forms requires a familiarity with limits and infinite geometric series. These concepts are beyond the scope of the present notes. However, the following computation may make the result at least seem plausible.

Since we would like our decimal representation of rational numbers to be unique, we will assume that all numbers that end in an infinite string of 9's have been rewritten as the corresponding terminating decimal.

Still there are processes whose answers can not be rational. Pythagoras is said to have known that the square root of 2 is such a number. A number that is not rational is called irrational. This certainly does not mean that the number is "deranged" or even illogical, it simply means that the number can not be expressed as a ratio of integers. Other Greek mathematicians such as Theodorus, were aware of additional irrational quantities.

The proof that is irrational was one of the first examples of indirect proof. It assumes some familiarity with arithmetic and is presented below.

Assume that , where P and Q are both positive integers without any common factors. If P and Q were not originally in lowest terms, we could remove as many common factors as necessary until we arrived at a rational representation of  that was in lowest terms. So without any loss of generality, we will assume that P and Q have no common factors. Now, by definition of the square root, . So P must be even since it has an even square. Hence,

P = 2M for some natural number M. Substituting this into the expression for P squared yields . So Q must be even since it has an even square. Hence, 2 divides both P and Q in contradiction to the assumption that P and Q have no common factors. Therefore, it is impossible to expressas a rational number.

In fact, using the Fundamental Theorem of Arithmetic, we can show that the n'th root of any natural number that is not a perfect n'th power must be irrational. Let a be a natural number that is not a perfect n'th power. This means that there is no natural number m with the property that mn = a .

Now suppose , where P and Q are natural numbers with P > Q > 1 . We know Q > 1, since Q = 1 implies that Pn is a. As before we can assume that P and Q have no common factors. Now by the Fundamental Theorem of Arithmetic, P and Q can both be written as unique products of prime factors. Specifically, if the J prime factors of P are p1, p2, … pJ, and K prime factors of Q are q1, q2, … qK , then none of the prime factors of P match any of the prime factors of Q.

By the definition of the n'th root, we have

. Now, the prime factors of Pn are the same as the prime factors of P, i.e.,

p1, p2, … pJ, and the prime factors of Qn are the same as the prime factors of Q . Only the multiplicity (how often each factor occurs) has changed.

However, so we have a second factorization of Pn , which involves different prime factors. This is a contradiction of the Fundamental Theorem of Arithmetic. So for a not a perfect n'th power, is irrational.

By taking the union of the set of rational numbers with the set of irrational numbers we form a bigger set, R, called the set of real numbers. It is convenient to represent the real numbers geometrically as points on a "number line". This is illustrated below.

Often we want to work with subsets of real numbers that correspond to intervals on the number line.

The open interval from a to b does not include the endpoints;.

The closed interval from a to b does include the endpoints;.

The semi-open or semi-closed intervals from a to b include just one endpoint;


The entire real number line can be represented using this notation;R.Here the symbol is "infinity" and is not any kind of real number. It is used in this context to mean that the interval has no bounds.

Relations and Functions

An ordered pair is a pair of objects in which (unlike a set) order makes a difference. We represent ordered pairs by (a, b) . This use of parentheses reflects an earlier time when the choice of typographical symbols was limited. Unfortunately, it is the third use of parentheses we've encountered! The earlier uses were as a grouping symbol and to designate an open interval of real numbers. Reluctance to change notation means that we'll have to rely on context to distinguish which use of parentheses is intended. In (a, b) a is called the first component and b is called the second component. The main idea of an ordered pair is that for two ordered pairs to be equal their components must match.

In fact, the relation (a, b) can defined as the set of sets { {a}, {a, b} } , which ensures the above property.

A set of ordered pairs is called a relation. In symbols this can be written as follows:

Here, in the second statement we have introduced an abbreviated form of set builder notation in which we designate immediately after the curly brackets that this will be a set of ordered pairs. The nature or conditions of the relation are given by the predicate xRy. For example, let xMy be the statement "y is the mother of x". Then the relation M is a set whose elements are of the form (child, mother). We might say this relation defines the "relationship" of motherhood. In fact, this is why the name relation was chosen to designate a set of ordered pairs.

The set of values of the first component of the ordered pairs in a relation is called the domain of the relation and is designated as Dom(R). The set of values of the second component of the ordered pairs in a relation is called the range of the relation and is designated as Rng(R). Here, R means the given relation, not the set of real numbers.

Dom(R) = 

Rng(R) = 

In the relation M, Dom(M) is the set of all individuals who were someone's child, while Rng(M) is the set of all individuals who are mothers. Undoubtedly, the intersection of these two sets is not empty!

Given any relation, we can define its inverse by reversing each ordered pair. The inverse of M, usually designated by M-1 would be the set of all (mother, child) pairs. In general, we define the inverse of a relation as follows:

From this it is obvious that the Dom(R-1)= Rng(R) and Rng(R-1) = Dom(R).As a second example, if R were the relation consisting of all (names, phone numbers) in a community (i.e., the contents of a "phone book"), then R-1 would be an "inverse phone book" with entries of the form (phone numbers, names).

So far we have allowed the predicate relationship xRy to define the relation and its domain and range. An alternate approach is to start with two sets A and B and define the relation (and hence the relationship) as all possible pairings of elements of A with elements of B. This is called the Cartesian (after René Descartes) cross product of A with B.

 From this definition we see that .

Thus in general, the Cartesian product is non-commutative, i.e., . The set R xRcommonly called R2 is the set of all ordered pairs of real numbers. Just as Rrepresents all the points on a line, RxR represents all the points in a plane. Thus RxR is referred to as the Cartesian plane.

The schematic below graphically represents a relation between two sets A and B. From the 10 connections drawn between the two sets we can list the 10 ordered pairs in R. Thus, each connection represents a "relationship" between an object in the domain and an object in the range. Note: In a relation, a given element in the domain can connect to more than one element in the range and vice versa.

A relation R is called reflexive on a set A if and only if.

A relation R is called symmetric if and only if .

A relation R is called transitive if and only if.


1. If R is symmetric, then Dom(R) = Rng(R). To prove this, let x be any element of Dom(R), then there must be some

y in Rng(R) with xRy. Since R is symmetric, we must also have yRx. Hence, x is also an element of Rng(R) and

. Now let y be any element of Rng(R), then there must be some x in the Dom(R) with xRy.

Again since R is symmetric, we must also have yRx. Hence, y is an element of Dom(R) and .

2. If R is symmetric and transitive, then R is reflexive on Dom(R). Since R is symmetric, we have Dom(R) = Rng(R).

Let x be any element of Dom(R), then there is a y in Dom(R) with (x, y) in R. Since R is symmetric, we have both

xRy and yRx. Since R is transitive, xRy and yRx implies xRx. Hence, for all x in Dom(R), we have xRx.

If a relation is reflexive, symmetric, and transitive, it is called an equivalence relation on the set A. The prototypical equivalence relation is equality, which obviously has all three properties. A slightly different example of an equivalence relation is "x weighs the same as y", which of course does not assert that x and y are the same object. It merely states that when the objects are weighed the measurements are equal. A third example of an equivalence relation is "x has the same shape as y". This is the geometric equivalence relation called similarity.

The significance of an equivalence relation on a set A is that it allows us to partition A into disjoint subsets all of which have the same property. For example, we could partition a set of people into "equal weight" groups, or the set of all possible triangles into subsets based on similarity (shape).

Often we are interested in relationships between two sets of objects in which the choice of an object in one set completely and unambiguously determines the related object in the second set. For example, if a TV is working properly, the choice of channel number should unambiguously determine one and only one picture! We often label the first set, that controls the choice of objects in the second set, as the input set or the independent variable set. The second set is then called the output or dependent variable set. The idea of these terms is that there is "some freedom" in the choice of the independent variable (like the TV channel), while the output seen depends on this choice. In such a determined relationship it is customary to state the independent variable as the first component (it's selected first after all), and the output as the second component. The key restriction is that for a given input, there must be one and only one (i.e., a unique) output. The input determines the output, so we should only see one response. Thus, in this kind of relation, an ordered pair with a given first component can only appear once! Such a relation is called a function.

A relation f is a function if and only if . Since the second or output component of a function is completely determined by the first or input component, it is common practice to use the following notation in defining a function f. Here, f (x) is read "f of x" and designates the unique element in the Rng(f ) that is associated with the input x from the Dom(f ). We can picture a function as a "machine" or "black box" that converts the input x into an output designated by f (x) . We then think of the function not so much as a set of ordered pairs, but rather a process that produces a unique response based on the initial input. This terminology is also used in statements like "the health of the nation's economy is a function of the level of productivity", which is interpreted to mean that the level of productivity determines the health of the economy. Labeling certain buttons on a calculator as "function keys" means that using these buttons along with the appropriate input generates a particular and unique output. Sometimes in mathematics we can specify the process that converts input into output through a formula. For example, if Dom(f ) = R and we have the formula, we generate ordered pairs in RxR by applying this formula to each real number x .

The formula can be applied to any input as is illustrated by the following:

The last four examples illustrate that when we write a formula for a function, f (x) = …x… , the variable x is a "dummy" and just stands for the input value. The essence of the process, which is the function, stays the same for any choice of the independent variable name.

 For , the Rng(f ) = is not all of R.

A second notation for functions is . Here A is the domain of f, and the "action" of the function is to map or transform (thesymbol, which in this context does not mean implication!) elements of this domain into elements of the set B. The notation of a function "of a set", f (A), means the range of f when the domain of f is A.

f (A) =Rng(f ) = . In general, the range is only a subset of B. If the Rng(f ) = B, we say the function is onto or surjective. This is expressed in symbols by . To be more precise,


In the example,, we would write f : RR, the function is into R, but not onto R.

What makes a function a function is that for every input in the domain there is one and only one output in the range.

This means a particular first component or input occurs in one and only one ordered pair in the complete set of ordered pairs that constitutes the function. If in addition, we have that every second component or output occurs in

one and only one ordered pair in the function, we say that the function is 1-1 or injective and write . This means that every output in the range is generated once and only once by its own particular input. Hence, if the outputs match, then the inputs must match. This gives us the following definition:


Equivalently a function is 1-1 if and only if


In the TV example discussed earlier, the function that converts channels into pictures is usually not 1-1. It is quite possible that the same picture is being shown on different stations. If the president is addressing the nation, I don't conclude my set is broken just because the same picture pops up on different channels. The picture is still a function of channel, it's just not a 1-1 function. The function given by is also not 1-1, since we get the same output for both a given number and its negative.

One important property of 1-1 functions is that the inverse relation f -1 is also a function. Recall that the inverse of a relation is set of ordered pairs obtained by switching the order of all pairs in the original relation. For a function this means switching input with output. The domain of f -1 is the range of f and the range of f -1 is the domain of f.

Suppose for some x, y, and z that (x, y) and (x, z) are all in f -1. Then (y, x) and (z, x) are all in f , i.e., f (y) = x and

f (z) = x. Since f is 1-1, it must be that y = z . Hence, we have established that

, so that f-1 is a function.

 Since f -1 is a function, we can write the relation in two different, but equivalent ways.

Thus, we have the following equivalence that for any 1-1 function f .

Thus, for any x in the range of a 1-1 function f,


and for any x in the domain of a 1-1 function f ,


 Consider the function defined from set A to set B by the four ordered pairs shown below.

This is certainly a function since no element in the domain A is mapped to more than one element in the range. However, this function is not 1-1 since both x1 and x2 in the domain get mapped to the same element, y5 in the range. The function is also not onto since there are elements in B, y2, y4, and y6 , which are not in the range of f .


Consider the function defined from set A to set B by the four ordered pairs shown below.

This is a 1-1 function since every element in the domain A connects to one and only one element in the range, and every element in the range is connected to only one element in the domain. Since the range does not include the elements y6 and y4 of B, the function is not onto.

Note: Since f is a 1-1 function, an inverse function f -1 exits. Its domain is not B, since f was not onto.

Consider the function defined from set A to set B by the four ordered pairs shown below.

This is a function since every element in the domain A connects to one and only one element in the range. However, this function is not 1-1 since both x1 and x4 in the domain get mapped to the same element, y3 in the range. This is an onto function since there are no elements in B which are not in the range of f .


A function from set A to set B that is both 1-1 and onto is called a bijection. 

If f is a bijection, then not only is Rng(f ) = Dom( f-1) = B, but the Rng(f -1) = Dom(f ) = A. In a bijection each element of A is matched with one and only one element of B, and each element of B is matched with one and only one element of A. It's like a game of "musical chairs" where there are exactly the same number of chairs as people. Each person gets a chair and each chair gets a person. The two sets are in a 1-1 correspondence and have exactly the same number of elements.

Consider the function defined from set A to set B by the four ordered pairs shown below.

This is a 1-1 function since every element in the domain A connects to one and only one element in the range and every element in the range connects to one and only one element in the domain. This is also an onto function since there are no elements in B which are not in the range of f . Thus, this function is a bijection.

Note: Since f is a 1-1 function, an inverse function f -1exits. Since f is an onto function the domain of f -1 is B.

An example of a function which is a bijection from any set unto itself is the Identity Function, I(x) = x . This function is called the identity function because it returns whatever was put in. The output is identical to the input.

For any set A, .

Often the output of one process becomes the input of a new process. To model this mathematically we need to apply two or more functions in succession. This is called composition and is symbolized by an "open circle" between the two function names. To be more precise we state the following definition: given sets A, B, and C, and functions f, and g with , the composition of "g with f " is a function whose output on the Dom(f ) is characterized by.


We have the following three results.

1. Compositions of onto functions are onto. 

Let c be any element of C, then since g is an onto function from B to C, there is a b in B with g(b) = c. Now, since f is an onto function from A to B, there is a a in A with f (a) = b . Hence, there is an a in A with .

2. Compositions of 1-1 functions are 1-1 .

Suppose for some a in A and b in A that. Now  and  and g is 1-1 from B to C, so . However, f is 1-1 from A to B, so a = b . Hence, .

3. Compositions of bijections are bijections.

This follows from the combined application of 1 and 2.

Finite, Infinite, Denumerable and Uncountable Sets

As we have seen, two sets A and B have exactly the same number of elements if there exits a bijection or one-to-one correspondence between them. We formalize this observation with the following definition. Two sets A and B are said to be cardinally equivalent, in symbols, , if and only if there exits a bijection between them.

A set A is finite if and only if either A is empty or, for some N , the set {1, 2, 3, … nA. The natural number n is called the cardinality of A or Card(A) and is simply the number of elements of A. If A is empty, Card(A) = 0.

Note: This definition recalls the most basic of mathematical operations: counting. As we count, we go through {1, 2, 3, … n} setting up a one-to-one correspondence with the objects being counted.

Cardinal equivalence is an equivalence relation. To demonstrate this, we need only find or construct an appropriate bijection.

Reflexive: Let f (x) = I(x) = x , the identity function. For any set A,.

Symmetric: If , there is a bijection f from A to B. Then the function f-1 is a bijection from B to A.

Transitive: If and, there are bijections f and g with  and . Then the composition is a bijection from A to C.

In general for a finite set A with n elements, A = {e1, e2, e3, e4, e5, … en}, P(A) has 2n elements. More formally, if Card(A) is n, then Card(P(A)) = 2n. We can verify this formula using mathematical induction.

 Base Case: n = 0 (Technically, 0 is not a natural number, but establishing this base case combined with the Induction Step will establish the desired theorem.) If n = 0, A. Suppose, then .

If , then , which is a contradiction. . Hence, . Therefore, has only 1 = 20 elements.

Induction Step: Assume that any set of n elements has 2n elements. Now consider any set of the form

A = {e1, e2, e3, e4, e5, … en, en+1}. Let , then either. If , then x must be one of the 2n subsets of {e1, e2, e3, e4, e5, … en}. If , then let. Every element of y must then come from the set {e1, e2, e3, e4, e5, … en}, i.e., then y must be one of the 2n subsets of {e1, e2, e3, e4, e5, … en}. Hence, there are 2n different choices for x with. Thus, the total number of choices for x is 2n + 2n =2n+1.

What if there is no subset of Nwhich has a bijection to A? In that case we say the set A is infinite and the Card(A) is not a natural number. Stated differently, a set A is infinite if and only if it's not finite. The set Nis an infinite set. This result may seem intuitively obvious, but it still requires a proof based on the definitions. We could use Mathematical Induction and indirect proof to show that for every natural number n, there does not exist a bijection from {1, 2, 3, …, n } to N . For the sake of brevity, we will not work through this argument. A set A which is cardinally equivalent to Nis called denumerable or countably infinite.

Galileo seems to have been the first person to claim that the set of even numbers E = {2, 4, 6, 8, …}, despite that it has "only half" of the elements in N, has the same number of elements as N, i.e., N . This result follows from the fact that the function f (x) = 2x is a bijection from N to E. For any non-zero m, a linear function of the form,

f (x) = mx + b, is a bijection from N tothe infinite arithmetic sequence {b + m, b + 2m, b + 3m, b + 4m, … }. We therefore conclude that all such sequences are cardinally equivalent to N.

Even infinite geometric sequences like {10, 100, 1000, 104, 105, 106, 107 , … }, despite their "sparse" appearance, are cardinally equivalent to N. This is true, since for , the function is a bijection from N to such sets.

What can we say about infinite sets that include all of the natural numbers? Are they "bigger" than N ? The following bijection from N to Z shows that NZ.

LetQ + be the set of positive rational numbers. The following scheme shows that a bijection from N to Q + exists.

Arrange all the positive rational numbers P/Q in a table with the numerator P labeling the columns of the table and the denominator Q labeling the rows. Since the desired map from N to Q +is to be 1-1 it is necessary to remove all multiple occurrences of the same rational numbers. Thus, all unit fractions such as 2/2, 3/3, etc. are "struck off" the table since they equal 1/1. In a like manner any fraction equivalent to a fraction in an earlier row is removed. After this process is complete (it will take forever!) we have each element ofQ + listed once and only once in the table. But the positions of these entries can be indexed by a single natural number by moving diagonally through the table and "counting off" as we go from each listed entry to the next listed entry. [A slightly different form of this argument would be to use the movement along the diagonal to "count" all the elements of N xN (i.e., N xNN).

 We then argue that, since the elements of Q + can be constructed from a infinite subset ofN xN, Q +N . However, this approach anticipates definitions and arguments that we have yet to elaborate.]

In the function f which is the bijection from N to Q +;

Now we can show that NQ. The argument is almost identical to the demonstration that Z. Let f be the bijection from N to Q +shown above. Then the function g defined below is a bijection from from N to Q.

Note: The result NQshould seem rather surprising! Unlike N or Z,where the elements are separatedby "gaps" (for example, there are no integers between 2 and 3), Q has the density property that between every two rational numbers we can always find another. While this is not a "continuum" of values, from a measurement point of view we could never tell the difference. Every measurement is a ratio with respect to some standard. We could never measure an irrational quantity since all measurements have only a finite number of decimal digits! To even the sharpest eye, marking only the points on the number line which are in Q would "look" like the "real" thing (the double entendre is intentional). We wouldn't notice that the irrational numbers were missing!

Cantor designated the Card(N) as  (aleph naught, aleph being the first letter of the Hebrew alphabet). Cantor believed that, while not a number in the ordinary sense of the word, represented the "smallest kind of infinity". He called the first transfinite cardinal.

In order to compare cardinalities of various sets, we introduce the following notations for two sets A and B.


Thus, , while .

The idea is that, if a 1-1 map from A to B exists, then B must have at least as many members as A. Furthermore, if B has at least as many members as A, but a one to one correspondence from A to B is impossible, then B must have "more members" than A.

The similarity of  and  to the ordinary symbols  is intentional. For any real numbers a, b, and c, we have the transitive property that if a < b and b < c , then a < c . For  and  we have the following results which seem almost intuitively obvious based on "everyday" notions of size and comparison. With the exception of property 6, the arguments are pretty straightforward. They soon, in fact, seem identical! Once you have understood say the first seven, you can probably skim the rest. Property 10 is needed when we compare Card(R) to Card(N).

1. For any sets A and B.

Suppose A is a subset of B, let f be the identity function, i.e., I(x) = x, and Dom(f ) = A, then f (A) = Rng(f ) = A, which is a subset of B. Since I is 1-1, we have a 1-1 function from A to B .

2. .

If  we can find 1-1 functions f and g with and, then the composition of g with f is 1-1 from A to C.

3. .

If  we can find 1-1 functions f and g with and, then the composition of g with f is 1-1 from A to C.


Suppose . Then there is a 1-1 function f with . Now, pick any x in A. Since A is a subset of B, x is also in B and so f (x) is in C. Since f is 1-1on B, it must also be 1-1 on A. So we have .


Suppose . Then there is a 1-1 function f with . Now, pick any x in A. Then f (x) is in B. Since B is a subset of C, f (x) is also in C. So we have .

6. .

If  we can find a 1-1 function f with and, so and.

The other part of the theorem,  is a result proved by Cantor, Felix Bernstein, and Friedrich Schröder and is known as the Cantor- Schröder-Bernstein (CSB) Theorem. It's proof is rather complex and technical, but also extremely clever and maybe even beautiful. It involves using the generalized union of sets of infinite sequences and is beyond the scope of these notes. The interested reader should consult the references.

7. .

Suppose . Then we can find 1-1 functions f and g with and. The composition of g with f is 1-1 from A to C, so . As an indirect proof, suppose now that , then . Since , we have , hence by property 3. But implies that, so by the CSB Theorem . This contradicts , so it must not be true that .

8. .

Suppose . Then we can find 1-1 functions f and g with and. The composition of g with f is 1-1 from A to C , so . As an indirect proof, suppose now that , then . Since , we have, hence by property 3. But implies that, so by the CSB Theorem . This contradicts , so it must not be true that .


Suppose . Then we have and so from property 4, we have. As an indirect proof, suppose now that , then . Since , by property 3 we have . Since and property 1 imply that , by the CSB Theorem . This contradicts , so it must not be true that .


Suppose . Then we have and so from property 5, we have. As an indirect proof, suppose now that , then . Since , by property 1 we have . Hence, by property 3. The CSB Theorem then implies. This contradicts , so it must not be true that .

If a set A has the property that  N , we say that A is countable.

A set A is said to be uncountable if and only if NA .

We are now ready to show that there are uncountable sets "bigger" than N. The following proof is due to Cantor and presents his famous diagonal argument that N R. First, consider the function


This is certainly a 1-1 function from N to the open interval (0, 1). We will show that no such 1-1 function can be onto.Let g be any 1-1 function from N to the open interval (0, 1). Let designate the elements of the range of g. Each of these a's is a real number between zero and one. If we agree that all rational numbers that are terminating decimals be represented only by their repeating string of 0's and not by a repeating string of 9's, the output of g can be represented uniquely by the decimal expansion of each an. As is shown below, let an, j be the j'th decimal digit of an = g(n). Now pick two different decimal digits, say 3 and 5. Define the following function h(n) and let bn = h(n) for every natural number n.

Consider the real number b determined by the decimal expansion .

Then b is a real number in the open interval (0, 1) whose decimal expansion consists only of 3's and 5's. Furthermore, at decimal place n, bn fails to match the n'th decimal digit (an,n) of an .

For any natural number n, the decimal expansion of b does not match the decimal expansions of any of an. Therefore, the number b is not an element of the range of g. Thus, no 1-1 map from N to the open interval (0, 1) can be onto.

Hence, N(0, 1). Since the open interval (0, 1) is a subset of R,from property 10 we conclude N R. Thus, the set of real numbers is uncountable.

Note: You might think we could solve the problem in the above proof by simply including the derived number b in the range of the 1-1 function g. This wouldn't solve the problem. Because now we'd be working with a "new" g, and then we'd construct a "new" b that would not be in the range of this function. There are "just too many" real numbers to count! Any scheme that tries to count them all is guaranteed to miss some.

Surprisingly, set size as measured with cardinality is very different than geometric measures of length or area. For any pair of real numbers a and d, with a < d, the linear function is a bijection from (0, 1) to the interval (a, d). Thus, (a, d(0, 1). Since (0, 1) is uncountable, so is (a, d). The geometric length of (0, 1) is 1, while the geometric length of (a, d), which ranges from the "indescribably small as a approaches d to the incomprehensibly large as d is chosen arbitrarily far from a. Yet, never the less, (0, 1) and (a, d) have the "same number" of elements.

Using some properties of rational functions, we could show that the following function is a bijection from the open interval to .

A "cleaner looking" bijection from to involves the inverse tangent function.

Either way, despite the differences in geometric size, we have R.

An even more unbelievable result is that . The first set is one-dimensional and has no geometric area, while the second set is two-dimensional and has an area of 1.

To see the cardinal equivalency consider the function f from todefined by the following "coding" scheme. Given any number in (0, 1) it has a unique decimal expansion (provided we convert all trailing strings of 9's into terminating decimals). Let  . The output of f applied to a is the ordered pair . First, this function is 1-1, since there is an inverse "decoding" in which we use alternate decimal digits from the first and second components of the output to reconstruct the decimal expansion of the input. Second, given any ordered pair (b, d) in , using the decoding scheme allows us to reconstruct the number a in (0, 1) with f (a) = (b, d), so the function is onto.

It might now seem that Ris the biggest possible set. Cantor, however, showed how starting with any set, we can get a set with larger cardinality. Specifically, for any set A, P(A) .

Proof. First consider the function, T(x) ={ x }, which given an object forms the set with that object and only that object as an element (the "singleton" of x). Then for any a in A, T(a) = {a} is a subset of A, so T(a) is in P(A). Now suppose T(a) = T(b), .i.e., {a} = {b}. Since two sets are equal if and only if they have the same elements, we conclude a = b. So we have P(A) . Now consider any function f with P(A). Consider the set B,

. Now since every element in B is also in A, we have that B is a subset of A. Hence, B is an element of P(A). Now, can we find x in A, with f (x) = B ?

Assume we've found an x in A with f (x) = B. Now, is this x in B? If x is in B, then x must be in A and x is not in the set given by f (x) = B. That is, since x is in A, if x is in B, then it is not in B! Thus, if x is in B we get a contradiction. Therefore, it must be the case that x is not in B. However, since x is not in B, x is not in f (x). That is, we have x in A and x not in f (x), which is precisely the membership criteria for B! Thus, x is both not in B and in B. Another contradiction! We must therefore conclude that there is no x in A with f (x) = B. Thus, no 1-1 map from A to P(A) can be onto.

Thus, there is no largest set! By forming the power set we can always make a larger one. In particular,

N P(N) and R P(R).

Both of the sets R and P(N) are uncountable. Which is larger? Again Cantor provided the answer.

Theorem: RP(N ) .

Proof. In analogy with the decimal representation every real b number in the open interval (0, 1) can be represented by a binary number, b = 0.b1b2b3b4b5b6b7b8bn, where each bn is either 0 or 1. As with decimal numbers we will assume that any repeating binary number with a trailing string of 1's has been rewritten as a terminating binary number. Hence, the binary representation of the real numbers in (0, 1) is unique. We will now generate another coding scheme which codes every subset of N into a binary number in [0, 1]. Let A be any subset of N, then we can construct the function f that maps A into the binary number 0.a1a2a3a4a5a6a7a8an…, with an = 1 if n is in A, and an = 0 if n is not in A. If A is empty f (A) = 0, and if A is N f (A) = 1. The function f is certainly 1-1 from P(N ) to [0, 1], since if two subsets of N map to the same binary number, they must contain exactly the same natural numbers. Let b be any real number in (0, 1), then let B be the subset of N with n in B if and only if bn = 1. By construction f (B) = b, so f is an onto function. So we have established that P(N )[0, 1]. Since the interval [0, 1]R,we have RP(N ).

The symbol c is sometimes used for the Card(R). Since Card({1, 2, 3,…n} ) = 2n, the result RP(N ) is often written symbolically as


Cantor made the conjecture that there is no set A, whose cardinality is between the natural numbers and the real numbers. More precisely this assertion, known as the Continuum Hypothesis, says that there does not exist a set A with the property that R .

Russell's Paradox

In the year 1900, at the dawn of a new century, David Hilbert, a passionate advocate of Cantor's work, addressed the Second International Congress of Mathematicians in Paris. He presented 23 famous unsolved problems for consideration. The first problem on the list was to establish the truth or falsehood of the Continuum Hypothesis. Hilbert presented in his speech what was almost an article of faith among mathematicians, namely, that if a mathematical proposition is true, it must either be an axiom or theorem of mathematics. According to this view all true mathematical statements can be proved.

 To quote Hilbert ,

"The great importance of definite problems for the progress of mathematical science in general ... is undeniable. ... [for] as long as a branch of knowledge supplies a surplus of such problems, it maintains its vitality. ... every mathematician certainly shares ..the conviction that every mathematical problem is necessarily capable of strict resolution ... we hear within ourselves the constant cry: There is the problem, seek the solution. You can find it through pure thought..."

In 1902 while Gottlob Frege was preparing to publish his major opus on the foundations of mathematics representing his life's work, he received a letter from the young English philosopher Bertrand Russell. After reading this correspondence, Frege wrote in the introduction of his book,

"A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. In this position I was put by a letter from Mr. Bertrand Russell as the work was nearly through the press."

What had happened? Inspired by the work of Cantor and others, Bertrand Russell considered a construction reminiscent of the set B in Cantor's proof that P(A). Specifically, he considered the set of all sets that are not elements of themselves, i.e., the set of all collections that are not self-contained. Let . This set is blatantly self-referential, but it is not necessarily unreasonable. For example, a set of dogs is in W, for a set of dogs is not a dog! In any case, the Axiom of Abstraction says we can talk about W. The big question we arrive at now is the following: is ? Well, if, then by the Axiom of Abstraction,. Similarly, if , then . We get a contradiction either way! There's no way out. Naïve set theory is inherently self-contradictory!

Almost immediately mathematicians and logicians tried to "fix" set theory by various modifications of the Axiom of Abstraction that seemed to be the source of the trouble. Russell and Whitehead in their book, Principia Mathematica, developed a "theory of types" which did not even allow the question, "is a set an element of itself ", to be posed. Other mathematicians developed alternate axiom schemes for set theory that did not allow Russell's Paradox to occur. The most widely accepted are based on the work of Ernst Zermelo, Adolf Fraenkel , and Thoralf Skolem. This system of axioms is called ZF and many mathematicians feel that it (or perhaps it with the Axiom of Choice added) contain all of the formal methods required for mathematics. Some would even go so far as to say that a proof of a theorem is valid if and only if it could, at least in principle, be formulated and proven in ZF.

In 1930 Kurt Gödel completely turned over the apple cart! He showed that all of the systems of set theory developed to avoid Russel's Paradox could not be proven to be consistent. That is, using methods acceptable to mathematicians, he was able to show that there is no guarantee that there are not other paradoxes, just like Russell's Paradox, which would render all of these set theories self-contradictory! In fact, the details of his proof are even more perplexing. For what Gödel really demonstrated was that if these set theories are consistent (no contradictions possible), then they must be incomplete. This means that there are well-formed formulas in these axiomatic systems which can neither be proved nor disproved using the rules of these axiomatic systems. For this reason, Gödel's result is called Gödel's Incompleteness Theorem. A statement that can neither be proved nor disproved within an axiomatic system is called undecidable. Gödel in essence showed that any axiomatic system with sufficient "power" to encapsulate the methods of mathematics must of necessity contain undecidable statements. The situation is again reminiscent of Cantor's theorem that the real numbers in the open interval (0, 1) are uncountable. Adding the number b developed by the Cantor "diagonal slash" construction to the list of real numbers does not make the real numbers countable, since now a "new" b could be generated which is not on the list. In an analogous fashion, adding either an undeciable statement (or its negation) to the list of axioms to make a "new" axiomatic system, just generates a "new" undecidable statement within this "new" system!

In particular, Gödel's Incompleteness Theorem guarantees that no finite scheme of axioms can prove or disprove all the well-formed statements involving the natural numbers! There are statements in any axiomatic formulation of number theory (probably the most basic subject in mathematics) that are undecidable.

Based on the work of Gödel in 1940 and of Paul Cohen in 1963, the Continuum Hypothesis is undecidable within standard ZF theory! So much for Hilbert's first problem. This constituted a resolution he probably did not imagine when he first posed the question. There are some very deep, if not downright disturbing, questions here. Is the Continuum Hypothesis true? What is meant by the truth of any undecidable proposition? What does this tell us about the truth of any mathematical statement?