In computer science, type punning is a common term for any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.
In C and C++, constructs such as pointer type conversion and union — C++ adds reference type conversion and reinterpret_cast to this list — are provided in order to permit many kinds of type punning, although some kinds are not actually supported by the standard language.
In the Pascal programming language, variant records may be used to treat a particular data type in more than one manner, or in a manner not normally permitted.
The bind function is usually called as follows:
struct sockaddr_in sa = {
    .sin_family = AF_INET,
    .sin_port = htons(port)
};
int sockfd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
bind(sockfd, (struct sockaddr*)&sa, sizeof sa);
The Berkeley sockets library fundamentally relies on the fact that in C, a pointer to struct sockaddr_in is freely convertible to a pointer to struct sockaddr; and, in addition, that the two structure types lay out their leading address-family field identically. Therefore, a reference to the structure field my_addr->sa_family (where my_addr is of type struct sockaddr*) will actually refer to the field sa.sin_family (where sa is of type struct sockaddr_in). In other words, the sockets library uses type punning to implement a rudimentary form of polymorphism or inheritance.
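The "common header" idea behind this idiom can be sketched in a form that the C standard guarantees, by embedding the generic structure as the first member of the specific one. This is a minimal sketch, not the sockets API itself: demo_addr, demo_addr_in, address_family and family_of are hypothetical names standing in for struct sockaddr, struct sockaddr_in and library code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sockaddr: only the common header. */
struct demo_addr {
    unsigned short family;      /* discriminator read by generic code */
};

/* Hypothetical stand-in for struct sockaddr_in. */
struct demo_addr_in {
    struct demo_addr base;      /* must be the first member */
    unsigned short port;
    unsigned char ip[4];
};

/* The pointer conversion below is only meaningful if base sits at offset 0. */
static_assert(offsetof(struct demo_addr_in, base) == 0, "base must come first");

/* Generic code sees only the common header. */
unsigned short address_family(const struct demo_addr *addr) {
    return addr->family;
}

/* Caller side: pass the specific struct through the generic pointer type. */
unsigned short family_of(const struct demo_addr_in *in) {
    return address_family((const struct demo_addr *)in);
}
```

Because a pointer to a structure, suitably converted, points to its first member, the cast in family_of is well defined here. The real sockets structures do not embed one another, so they rely on matching layouts instead.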
A common technique is the use of "padded" data structures, which allow different kinds of values to be stored in what is effectively the same storage space. This is often seen when two structures are used in mutual exclusivity for optimization.
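One way to read this is as a tagged union, where two mutually exclusive values share the same bytes and a tag records which one is live. The sketch below assumes that reading; the names value, value_kind, make_int, make_float and value_to_double are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum value_kind { KIND_INT, KIND_FLOAT };

struct value {
    enum value_kind kind;   /* records which union member is active */
    union {
        int32_t i;
        float f;
    } as;                   /* both members occupy the same storage */
};

struct value make_int(int32_t i) {
    struct value v;
    v.kind = KIND_INT;
    v.as.i = i;
    return v;
}

struct value make_float(float f) {
    struct value v;
    v.kind = KIND_FLOAT;
    v.as.f = f;
    return v;
}

/* Reads only the member that was last written, so no reinterpretation occurs. */
double value_to_double(struct value v) {
    return v.kind == KIND_INT ? (double)v.as.i : (double)v.as.f;
}
```

Consulting the tag before every read keeps the shared storage safe; punning only arises when a member other than the last-written one is read.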
bool is_negative(float x) {
    return x < 0.0f;
}
However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:
bool is_negative(float x) {
    int* i = (int*)&x;
    return *i < 0;
}
Note that the behaviour will not be exactly the same: in the special case of x being negative zero, the first implementation yields false while the second yields true. Also, the first implementation returns false for any NaN value, whereas the second may return true for NaN values with the sign bit set. Lastly, floating-point data may be stored in big-endian or little-endian byte order, and that order need not match the byte order used for integers, so the sign bit may fall in either the least significant or the most significant byte of the integer. The use of type punning with floating-point data is therefore a questionable method whose results can be unpredictable.
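The two behavioural differences can be made concrete with a small sketch. It uses memcpy rather than the pointer cast so that the comparison itself is well defined, and it assumes float is a 32-bit IEEE 754 value; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Comparison-based test: false for -0.0f and for every NaN. */
bool is_negative_compare(float x) {
    return x < 0.0f;
}

/* Sign-bit test: true whenever the top bit is set (assumes IEEE 754 binary32). */
bool is_negative_signbit(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the object representation */
    return (bits >> 31) != 0;
}
```

For ordinary values such as -1.5f and 2.0f the two functions agree; they disagree for negative zero and for NaNs whose sign bit is set.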
This kind of type punning is more dangerous than most. Whereas the sockets example relied only on guarantees made by the C programming language about structure layout and pointer convertibility, the float example relies on assumptions about a particular system's hardware. The C99 language specification (ISO/IEC 9899:1999) has the following warning in section 6.3.2.3, Pointers: "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined." Therefore one should be very careful with the use of type punning.
Some situations, such as time-critical code that the compiler otherwise fails to optimize, may require dangerous code. In these cases, documenting all such assumptions in comments, and introducing static assertions to verify portability expectations, helps to keep the code maintainable.
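As an illustration of such static assertions, the following sketch checks a few representation assumptions at compile time before exposing the bit pattern of a float. The particular set of checks and the helper name float_bits are assumptions chosen for the example, not an exhaustive portability test.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Assumption 1: bytes are 8 bits wide. */
static_assert(CHAR_BIT == 8, "unexpected byte width");

/* Assumption 2: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Assumption 3: float has the parameters of IEEE 754 binary32. */
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float does not look like IEEE 754 binary32");

/* Returns the raw bit pattern of x; only compiled if the checks above hold. */
uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

If any assumption fails, the build stops with the given message instead of producing code that silently misbehaves on the new platform.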
Practical examples of floating-point punning include fast inverse square root popularized by Quake III, fast FP comparison as integers, and finding neighboring values by incrementing as an integer (implementing ).
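The neighboring-value trick can be sketched as follows. This is a minimal illustration that assumes IEEE 754 binary32 floats and only handles non-negative, finite inputs below FLT_MAX; the name next_up_nonnegative is hypothetical, and memcpy is used in place of a pointer cast.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Next representable float above x.
 * Valid only for non-negative, finite x below FLT_MAX (assumption of this sketch). */
float next_up_nonnegative(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* view the float as an integer */
    bits += 1;                        /* adjacent bit pattern = adjacent value */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

This works because, for non-negative IEEE 754 values, the ordering of the bit patterns taken as integers matches the ordering of the floats; negative values, infinities and NaNs need separate handling.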
// in C
bool is_negative(float x) {
    int32_t i = *(int32_t*)&x;
    return i < 0;
}

// in C++
bool is_negative(float x) {
    int32_t i = *reinterpret_cast<int32_t*>(&x);
    return i < 0;
}
The C standard's aliasing rules state that an object shall have its stored value accessed only by an lvalue expression of a compatible type. The types float and int32_t are not compatible, therefore this code's behavior is undefined. Although on GCC and LLVM this particular program compiles and runs as expected, more complicated examples may interact with assumptions made by strict aliasing and lead to unwanted behavior. The option -fno-strict-aliasing will ensure correct behavior of code using this form of type-punning, although using other forms of type punning is recommended.
bool is_negative(float x) {
    union {
        int i;
        float f;
    } my_union;
    my_union.f = x;
    return my_union.i < 0;
}
Accessing my_union.i after most recently writing to the other member, my_union.f, is an allowed form of type-punning in C, provided that the member read is not larger than the one whose value was set (otherwise the read has unspecified behavior). The same is syntactically valid but has undefined behavior in C++ (ISO/IEC 14882:2011, section 9.5), where only the last-written member of a union is considered to have any value at all.
For another example of type punning, see Stride of an array.
bool is_negative(float x) {
    int i;
    memcpy(&i, &x, sizeof(int)); // or std::memcpy in C++
    return i < 0;
}
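#include <bit>
#include <cstdint>
#include <limits>
using std::numeric_limits;
constexpr bool is_negative(float x) noexcept {
    static_assert(numeric_limits<float>::is_iec559); // require IEEE 754 floats
    return std::bit_cast<std::int32_t>(x) < 0;       // std::bit_cast needs C++20
}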
type
    VariantRecord = record
        case RecType : LongInt of
            1: (I : array[1..2] of Integer);  (* not shown here: there can be several variables in a variant record's case statement *)
            2: (L : LongInt               );
            3: (R : Real                  );
            4: (C : array[1..4] of Char   );
    end;

var
    V  : VariantRecord;
    K  : Integer;
    LA : LongInt;
    RA : Real;
    Ch : Char;

V.I[1] := 1;
Ch     := V.C[1];  (* this would extract the first byte of V.I *)
V.R    := 8.3;
LA     := V.L;     (* this would store a Real into an Integer *)
These examples could be used to create strange conversions, although in some cases there may be legitimate uses for these types of constructs, such as determining the locations of particular pieces of data. In the following example, a pointer and a LongInt are both presumed to be 32 bits wide:
type
    PA = ^Arec;

    Arec = record
        case RT : LongInt of
            1: (P : PA     );
            2: (L : LongInt);
    end;

var
    PP : PA;
    K  : LongInt;

New(PP);
PP^.P := PP;
WriteLn('Variable PP is located at address ', Hex(PP^.L));
The reinterpret-cast technique from C/C++ also works in Pascal. This can be useful, e.g., when reading dwords from a byte stream that should be treated as floats. Here is a working example, where we reinterpret-cast a dword to a float:
type
    pReal = ^Real;

var
    DW : DWord;
    F  : Real;

F := pReal(@DW)^;  (* assumes Real is as wide as DWord, i.e. 32 bits *)
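The byte-stream use case mentioned above can also be sketched in C. The helper below is hypothetical: it assumes the stream stores each value as a little-endian IEEE 754 binary32 number, and that the host uses the same byte order for float and uint32_t objects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Decode one little-endian dword from the stream and reinterpret it as a float. */
float read_f32_le(const unsigned char *p) {
    /* Assemble the dword byte by byte, so host integer endianness does not matter. */
    uint32_t dw = (uint32_t)p[0]
                | ((uint32_t)p[1] << 8)
                | ((uint32_t)p[2] << 16)
                | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &dw, sizeof f);   /* reinterpret the bits without a pointer cast */
    return f;
}
```

For instance, the bytes 00 00 80 3F decode to 1.0 under these assumptions. Copying with memcpy also avoids any alignment requirement on the position within the stream.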
// Pointer-based punning; must appear in an unsafe context
float pi = 3.14159f;
uint piAsRawData = *(uint*)&pi;
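// Struct-based punning: both fields share offset 0, like a C union
// (requires: using System.Runtime.InteropServices;)
[StructLayout(LayoutKind.Explicit)]
struct FloatAndUIntUnion
{
    [FieldOffset(0)]
    public float DataAsFloat;

    [FieldOffset(0)]
    public uint DataAsUInt;
}

// ...

FloatAndUIntUnion union = default; // initialize so that both fields count as assigned
union.DataAsFloat = 3.14159f;
uint piAsRawData = union.DataAsUInt;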
This can be circumvented by the following CIL code:
.method public static hidebysig
    !!TEnum CombineEnums<TEnum>(
        !!TEnum a,
        !!TEnum b
    ) cil managed
{
    .maxstack 2

    ldarg.0
    ldarg.1
    or  // this will not cause an overflow, because a and b have the same type, and therefore the same size.
    ret
}
The cpblk CIL opcode allows for some other tricks, such as converting a struct to a byte array:
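.method public static hidebysig
    uint8[] ToByteArray<T>(
        !!T& v // 'ref T' in C#
    ) cil managed
{
    .locals init (
        [0] uint8[]
    )

    .maxstack 3

    // create a new byte array with length sizeof(T) and store it in local 0
    sizeof !!T
    newarr uint8
    dup           // keep a copy on the stack for later (1)
    stloc.0

    ldc.i4.0
    ldelema uint8

    // memcpy(local 0, &v, sizeof(T));
    // the destination address is still on the stack, see (1)
    ldarg.0       // the address of 'v', because its type is '!!T&'
    sizeof !!T
    cpblk

    ldloc.0
    ret
}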
import java.nio.ByteBuffer;

void main(String[] args) {
    int value = 42;
    ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
    buffer.putInt(value);
    // "Pun" this data into a float (just as an example)
    buffer.flip();
    float punResult = buffer.getFloat();
}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);
    long address = unsafe.allocateMemory(4);
    unsafe.putInt(address, 42);
    // Interpret the memory location as a float
    float result = unsafe.getFloat(address);
    unsafe.freeMemory(address);
}