C language advanced programming guide (1)

Source: Internet
Author: User
Tags posix

 

This article is the first part of a series. The translated content covers integer type conversion and memory allocation.

C is the language of choice for systems programming, embedded systems, and many other applications. Anyone with a serious interest in computers is likely to come into contact with it, yet becoming familiar with every aspect of the language and its many details is a great challenge. This article tries to shed light on some of them, including integer type conversion, memory allocation, array-to-pointer conversion, explicit memory functions, interpositioning, and vectorization.
 

Integer overflow and type conversion

Many C programmers tend to assume that basic operations on integer types are safe and use them without much scrutiny. In fact, these operations are prone to problems. Consider the following code:

#include <stdio.h>

int main(int argc, char** argv) {
    long i = -1;

    if (i < sizeof(i)) {
        printf("OK\n");
    }
    else {
        printf("error\n");
    }

    return 0;
}

(Translator's note: the program prints "error", which surprises many people. The author's explanation follows.)

This is because the variable i is converted to an unsigned integer type. Its value is therefore no longer -1 but the maximum value of size_t, which is the type produced by the sizeof operator.
The specific reason is given in the section on usual arithmetic conversions in the C99/C11 standard:
"Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type."

size_t is defined in the C standard as an unsigned integer type of at least 16 bits. In practice, its width depends on the platform. Its conversion rank is normally at least that of long, so the rule above forces the signed variable to be converted to the unsigned type.
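
One way to avoid this surprise is to test the sign before comparing, so that a negative value never reaches the unsigned conversion. The following is only a minimal sketch; the helper name less_than_size is hypothetical and does not come from the original article.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compare a signed value with a size_t limit
   without letting a negative value turn into a huge unsigned one. */
bool less_than_size(long value, size_t limit)
{
    if (value < 0) {
        /* A negative number is smaller than every size_t value. */
        return true;
    }
    /* value is non-negative here, so the conversion preserves it. */
    return (unsigned long)value < limit;
}
```

With this helper, less_than_size(-1, sizeof(long)) is true, which is what the original comparison was presumably meant to express.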

(For an introduction to sizeof, see http://blog.csdn.net/sword_8367/article/details/4868283)

Using the integer types brings further problems. The C standard does not fix the sizes of short, int, long, long long, or their unsigned versions; it only requires minimum sizes. Taking the x86_64 architecture as an example, long is 64 bits on Linux but still 32 bits on 64-bit Windows. For more portable code, the common approach is to use fixed-width types such as uint16_t or int32_t, which are defined in the C99 header stdint.h. Three families of integer types are defined there:
1. Exact width: uint8_t, uint16_t, int32_t, and so on.
2. Smallest type with at least the given width: uint_least8_t, uint_least16_t, int_least32_t, etc.
3. Fastest type with at least the given width: uint_fast8_t, uint_fast16_t, int_fast32_t, etc.
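
The three families can be compared in a small sketch. The struct and function names below are illustrative assumptions, not part of the article; the code only checks the width guarantees that stdint.h gives for each family.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative record using one type from each family. */
struct sample {
    uint16_t       exact;  /* exactly 16 bits */
    uint_least16_t least;  /* smallest type with at least 16 bits */
    uint_fast16_t  fast;   /* fastest type with at least 16 bits; may be wider */
};

/* Returns 1 if the widths satisfy the guarantees described above. */
int widths_are_consistent(void)
{
    return sizeof(uint16_t) * CHAR_BIT == 16
        && sizeof(uint_least16_t) * CHAR_BIT >= 16
        && sizeof(uint_fast16_t) * CHAR_BIT >= 16;
}
```

On many 64-bit Linux systems uint_fast16_t is wider than 16 bits, which is why it is a poor choice for data whose layout must be fixed.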

Unfortunately, using stdint.h does not avoid all problems. The integer promotion rule says:
"If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions."
The following code returns 65536 on a platform with 32-bit int and 0 on a platform with 16-bit int:
#include <stdint.h>

uint32_t sum(void)
{
    uint16_t a = 65535;
    uint16_t b = 1;
    return a + b;
}
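
A possible way to make the result independent of the width of int is to widen the operands explicitly before adding them. This is a sketch, and the function name sum_wide is an assumption introduced here for illustration.

```c
#include <stdint.h>

/* Sketch: cast both operands to uint32_t so that the addition is
   carried out in at least 32 bits, whatever the width of int is. */
uint32_t sum_wide(uint16_t a, uint16_t b)
{
    return (uint32_t)a + (uint32_t)b;
}
```

Calling sum_wide(65535, 1) yields 65536 on platforms with either 16-bit or 32-bit int.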
Integer promotion preserves the sign of the value. But is a plain char promoted as a signed or an unsigned number?


Whether plain char is signed generally depends on the hardware architecture and the operating system, and it is usually specified by the platform's application binary interface (ABI). If char is promoted as signed char, the following code prints -128,-127 (for example, on x86); otherwise it prints 128,129. GCC provides the -funsigned-char option to force char to be unsigned on x86.
char c = 128;
char d = 129;
printf("%d,%d\n", c, d);
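
When the numeric value of a byte matters, one option is to avoid relying on plain char at all. The sketch below uses two hypothetical helpers (plain_char_is_signed and byte_value are names made up for this example): one reports the platform's choice through CHAR_MIN, the other converts through unsigned char so the result does not depend on that choice.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this platform, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Interpret a char as a byte value in the range 0..UCHAR_MAX,
   independent of the signedness of plain char. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```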


Memory Allocation and memory management: malloc, calloc, realloc, and free
malloc allocates an uninitialized block of memory of the specified size in bytes. If the size is 0, the result depends on the implementation. Neither C nor POSIX pins this behavior down:

If the requested size is 0, the behavior is implementation-defined: the value returned is either a null pointer or a unique pointer.

malloc(0) usually returns a unique valid pointer. Either return value must be safe to pass to free without error; for a null pointer, free performs no action.

Therefore, if you use the result of an expression as the argument to malloc, you need to check for integer overflow.

size_t computed_size;

if (elem_size && num > SIZE_MAX / elem_size) {
    errno = ENOMEM;
    err(1, "overflow");
}

computed_size = elem_size * num;
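
The same check can be wrapped in a small allocation helper so that callers cannot forget it. This is a sketch under the assumption that returning NULL with errno set to ENOMEM is an acceptable way to report the overflow; alloc_array is a hypothetical name, not a library function.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate num elements of elem_size bytes each,
   refusing requests whose total size would not fit in size_t. */
void *alloc_array(size_t num, size_t elem_size)
{
    if (elem_size != 0 && num > SIZE_MAX / elem_size) {
        errno = ENOMEM;
        return NULL;
    }
    return malloc(num * elem_size);
}
```

Unlike the snippet above, the helper does not terminate the program; the caller decides how to react to a NULL result.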

void *calloc(size_t nelem, size_t elsize);
In general, when allocating a sequence of objects of the same size, you should use calloc instead of computing the total size with an expression. In addition, calloc initializes the memory to zero. As with malloc, release the allocated memory with free.
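
As a brief illustration, the sketch below allocates an array of counters with calloc. The function name new_counters is an assumption for this example; the point is that the element count and element size are passed separately and the memory starts out zeroed.

```c
#include <stdlib.h>

/* Sketch: allocate n counters that all start at zero.
   No size multiplication appears in the caller's code. */
unsigned *new_counters(size_t n)
{
    return calloc(n, sizeof(unsigned));
}
```

The caller still has to check the result for NULL and pass it to free when done.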

void *realloc(void *ptr, size_t newsize);
realloc changes the size of a previously allocated memory object. The returned pointer refers to a possibly new memory location whose contents are unchanged up to the smaller of the old and new sizes. If the new size is larger than the old one, the added space is not initialized. If the old pointer is NULL and the size is not 0, the function is equivalent to malloc. If the size is 0 and the old pointer is not NULL, the result depends on the implementation.

Most implementations free the old memory and return either malloc(0) or NULL. For example, Windows frees the memory and returns NULL, while OpenBSD frees the memory and returns a pointer to a zero-sized object.
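
Because the zero-size case differs between platforms, one defensive option is never to call realloc with a size of 0. The wrapper below is a sketch of that idea; resize_buffer is a hypothetical name, and treating "resize to 0" as an explicit free is a design choice made for this example.

```c
#include <stdlib.h>

/* Hypothetical wrapper that never calls realloc with size 0.
   On success returns 0 and updates *pp; on failure returns -1 and
   leaves *pp (and the memory it points to) untouched. */
int resize_buffer(void **pp, size_t new_size)
{
    if (pp == NULL)
        return -1;

    if (new_size == 0) {
        free(*pp);      /* explicit, portable meaning for "resize to 0" */
        *pp = NULL;
        return 0;
    }

    void *tmp = realloc(*pp, new_size);
    if (tmp == NULL)
        return -1;

    *pp = tmp;
    return 0;
}
```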

If realloc fails, it returns NULL and leaves the original memory block untouched. Therefore, it is necessary not only to check the size argument for integer overflow, but also to handle the old memory block correctly when realloc fails.
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define VECTOR_OK            0
#define VECTOR_NULL_ERROR    1
#define VECTOR_SIZE_ERROR    2
#define VECTOR_ALLOC_ERROR   3

struct vector {
    int *data;
    size_t size;    /* size of data in bytes */
};

int create_vector(struct vector *vc, size_t num) {
    if (vc == NULL) {
        return VECTOR_NULL_ERROR;
    }

    vc->data = 0;
    vc->size = 0;

    /* check for integer and SIZE_MAX overflow */
    if (num == 0 || SIZE_MAX / num < sizeof(int)) {
        errno = ENOMEM;
        return VECTOR_SIZE_ERROR;
    }

    vc->data = calloc(num, sizeof(int));

    /* calloc failed */
    if (vc->data == NULL) {
        return VECTOR_ALLOC_ERROR;
    }

    vc->size = num * sizeof(int);
    return VECTOR_OK;
}

int grow_vector(struct vector *vc) {
    void *newptr = 0;
    size_t newsize;

    if (vc == NULL) {
        return VECTOR_NULL_ERROR;
    }

    /* check for integer and SIZE_MAX overflow */
    if (vc->size == 0 || SIZE_MAX / 2 < vc->size) {
        errno = ENOMEM;
        return VECTOR_SIZE_ERROR;
    }

    newsize = vc->size * 2;
    newptr = realloc(vc->data, newsize);

    /* realloc failed; vector stays intact, size was not changed */
    if (newptr == NULL) {
        return VECTOR_ALLOC_ERROR;
    }

    /* upon success; update new address and size */
    vc->data = newptr;
    vc->size = newsize;
    return VECTOR_OK;
}



Avoiding fatal errors

The general way to avoid problems with dynamic memory is to write code as carefully and defensively as possible. However, there are a few common problems and some ways to avoid them.
1) Double free. Calling free repeatedly can crash the program. The problem arises when the argument of free is neither a NULL pointer nor a pointer obtained from malloc and related functions (a wild pointer), or when it has already been released by free/realloc (a dangling pointer). To avoid this problem, you can follow these rules:
1. If a valid value cannot be assigned to a pointer immediately, initialize it to NULL at its declaration.
2. Pay attention to compiler warnings: both GCC and Clang warn about uninitialized variables.
3. Do not use the same pointer to refer to both static and dynamic memory.
4. After calling free, set the pointer to NULL, so that an accidental second call to free does no harm.
5. To catch double frees, use assert or a similar mechanism during testing and debugging.

 

 

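#include <assert.h>
#include <stdlib.h>

char *ptr = NULL;

/* ... */

/* free *pptr and reset it to NULL; the assert catches a second release */
void nullfree(void **pptr) {
    void *ptr = *pptr;
    assert(ptr != NULL);
    free(ptr);
    *pptr = NULL;
}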

 

2) Accessing memory through a null or uninitialized pointer. With the rules above, your code only has to deal with null pointers or valid pointers. It is enough to check whether a dynamic memory pointer is null at the beginning of a function or code block.

3) Out-of-bounds memory access. An out-of-bounds access does not necessarily crash the program. The program may continue to run with corrupted data, with dangerous consequences, or the access may be exploited to make the program take other branches or execute injected code. Careful manual checking of array bounds and dynamic memory bounds is the main way to avoid these dangers. Bounds information can be tracked manually. The sizeof operator gives the size of an array, but an array is often converted to a pointer (for example, when it is passed to a function, sizeof returns the size of the pointer instead of the size of the array). The Annex K interfaces in the C11 standard are bounds-checking interfaces. They define a set of new library functions that provide simpler and safer alternatives to parts of the standard library (such as string and I/O operations). There are also open-source implementations such as slibc, but these interfaces are not widely used. BSD-based systems (including Mac OS X) provide the strlcpy and strlcat functions for safer string operations. On other systems they are available through the libbsd library. Many operating systems provide interfaces to control access to memory regions and to protect memory against reads and writes, such as POSIX mprotect. These mechanisms mainly work on whole memory pages.

Avoiding memory leaks

Memory leaks are caused by dynamic memory that is no longer used but is not released by the program. To really understand the lifetime of allocated memory, the most important thing is to know when free is to be called. However, this becomes more and more difficult as the complexity of the program grows, so memory management needs to be part of the initial design.
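
Returning to the out-of-bounds problem in item 3 above, the usual manual technique is to pass the buffer size along with the pointer, since sizeof cannot recover it once the array has been converted to a pointer. The sketch below shows that idea; ARRAY_LEN and copy_bounded are hypothetical names introduced for this example and are not the Annex K or BSD functions mentioned in the text.

```c
#include <stddef.h>
#include <string.h>

/* Number of elements of a true array. Only valid where the array has
   not been converted to a pointer (e.g. not on a function parameter). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Copy src into dst, which has room for dst_size bytes, and always
   NUL-terminate. Returns 0 on success, -1 on bad arguments or if src
   did not fit (dst then holds a truncated copy). */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);   /* includes the terminating NUL */
    return 0;
}
```

A caller that owns the array can write copy_bounded(buf, sizeof buf, text), because sizeof still sees the whole array at that point.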
Below are some approaches to these problems:

1) Allocation at startup. Allocating all the heap memory the program needs at startup makes memory management easy. The memory is reclaimed by the operating system when the program ends, without the program having to call free. In many cases this approach is satisfactory, especially for programs that process their input in a batch and then terminate.

2) Variable-length arrays. If you need temporary storage of variable size whose lifetime is limited to a single function, you can consider a VLA (variable-length array). Its use is restricted, however: each function should use only a small amount of space this way. Because variable-length arrays, defined in C99 (and made optional in C11), have automatic storage duration, their scope is limited like that of other automatic variables. Although the standard does not say so explicitly, VLAs are usually placed on the stack. The maximum size of a VLA is SIZE_MAX bytes, but the stack size of the target platform is the real limit, so care is needed to avoid overflowing the stack or corrupting data in the adjacent memory segment.

3) Manual reference counting. The idea is to count every assignment and every loss of a reference to an allocated object. The count is incremented each time a reference is assigned and decremented each time a reference is lost. When the count reaches 0, the memory is no longer in use and is released. C supports neither automatic destructors (although GCC and Clang support the cleanup extension) nor overriding the assignment operator, so the counting is done by calling retain/release functions manually. Put another way, the program marks the several places where its use of a block of memory begins or ends.
Even with this method, you have to follow many rules to make sure that you neither forget to call release (a memory leak) nor call it too often (a premature release). However, if the lifetime of a memory object is determined by external events and the structure of the program means the object is handled in several places, this troublesome method is worthwhile. The following code block is a simple reference counter for memory management.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_REF_OBJ 100
#define RC_ERROR -1

struct mem_obj_t {
    void *ptr;
    uint16_t count;
};

static struct mem_obj_t references[MAX_REF_OBJ];
static int16_t reference_count = 0;

/* create memory object and return handle, or RC_ERROR on failure */
int16_t create(size_t size) {
    if (reference_count >= MAX_REF_OBJ)
        return RC_ERROR;

    if (size) {
        void *ptr = calloc(1, size);

        if (ptr != NULL) {
            references[reference_count].ptr = ptr;
            references[reference_count].count = 0;
            return reference_count++;
        }
    }

    return RC_ERROR;
}

/* get memory object and increment reference counter */
void *retain(int16_t handle) {
    if (handle >= 0 && handle < reference_count
            && references[handle].ptr != NULL) {
        references[handle].count++;
        return references[handle].ptr;
    } else {
        return NULL;
    }
}

/* decrement reference counter; free the object when it drops to zero */
void release(int16_t handle) {
    printf("release\n");

    if (handle >= 0 && handle < reference_count) {
        struct mem_obj_t *object = &references[handle];

        if (object->count <= 1) {
            printf("released\n");
            free(object->ptr);
            /* keep the slot so that the other handles stay valid */
            object->ptr = NULL;
            object->count = 0;
        } else {
            printf("decremented\n");
            object->count--;
        }
    }
}




If compatibility across compilers is not a concern, you can use the cleanup attribute to emulate automatic destructors in C.
(References: http://blog.csdn.net/haozhao_blog/article/details/14093155
http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html)
void cleanup_release(void **pmem) {
    int i;

    for (i = 0; i < reference_count; i++) {
        if (references[i].ptr == *pmem)
            release(i);
    }
}

void usage(void) {
    int16_t ref = create(64);
    /* cleanup_release(&mem) runs automatically when mem goes out of scope */
    void *mem __attribute__((cleanup(cleanup_release))) = retain(ref);

    /* ... */
}




Another drawback of cleanup_release is that it releases by object address rather than by handle. cleanup_release therefore has to search the references array, which is costly. One remedy is to change the retain interface so that it returns a pointer to the mem_obj_t struct. Another is to use the macros below, which create a variable that holds the handle and attach the cleanup attribute to it.




/* helper macros */
#define __COMB(X,Y) X##Y
#define COMB(X,Y) __COMB(X,Y)
#define __CLEANUP_RELEASE __attribute__((cleanup(cleanup_release)))

#define retain_auto(REF) retain(REF); int16_t __CLEANUP_RELEASE COMB(__ref,__LINE__) = REF

void cleanup_release(int16_t *phd) {
    release(*phd);
}

void usage() {
    int16_t ref = create(64);
    void *mem = retain_auto(ref);

    /* ... */
}



4) Memory pools. If a program goes through several stages while it runs, each stage can have its own memory pool, created at the beginning of that stage. Whenever the program needs to allocate memory, it takes it from one of the pools. The pool is chosen according to the lifetime of the allocated memory, and each pool belongs to a particular stage of the program. At the end of each stage, the whole pool is released at once. This approach is popular in long-running programs such as daemons, because it can reduce memory fragmentation overall. The following is a simple example of memory pool management.

#include <stddef.h>
#include <stdlib.h>

struct pool_t {
    void *ptr;
    size_t size;
    size_t used;
};

/* create memory pool */
struct pool_t *create_pool(size_t size) {
    struct pool_t *pool = calloc(1, sizeof(struct pool_t));

    if (pool == NULL)
        return NULL;

    if (size) {
        void *mem = calloc(1, size);

        if (mem != NULL) {
            pool->ptr = mem;
            pool->size = size;
            pool->used = 0;
            return pool;
        }
    }

    /* size was 0 or the allocation failed: do not leak the pool header */
    free(pool);
    return NULL;
}

/* allocate memory from memory pool */
void *pool_alloc(struct pool_t *pool, size_t size) {
    if (pool == NULL)
        return NULL;

    size_t avail_size = pool->size - pool->used;

    if (size && size <= avail_size) {
        /* cast to char * because arithmetic on void * is not standard C */
        void *mem = (char *)pool->ptr + pool->used;
        pool->used += size;
        return mem;
    }

    return NULL;
}

/* release memory for whole pool */
void delete_pool(struct pool_t *pool) {
    if (pool != NULL) {
        free(pool->ptr);
        free(pool);
    }
}





Implementing a memory pool is not an easy task. Some existing libraries may meet your needs:
GNU libc obstack
Samba talloc
Ravenbrook Memory Pool System

5) Data structures. Many memory management problems come down to using the right data structure to store the data. The choice of data structure is mainly determined by the algorithm's requirements for accessing and storing the data. Using structures such as linked lists, hash tables, and trees can bring additional benefits, for example the ability to traverse the structure and release all of its data quickly. Although the standard library does not provide such data structures, there are some useful libraries:

For traditional Unix implementations of linked lists and trees, see BSD's queue.h and tree.h macros; both are part of libbsd.
GNU libavl
Glib Data Types
For an additional list, see http://adtinfo.org/index.html
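
As a hedged illustration of the BSD queue.h macros mentioned above, the sketch below builds a singly linked list and then releases every node in a single traversal. It assumes a system that ships the sys/queue.h header (the BSDs, macOS, and glibc-based Linux do); the struct and function names are made up for this example.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct item {
    int value;
    SLIST_ENTRY(item) link;   /* embedded "next" pointer */
};

SLIST_HEAD(item_list, item);  /* declares struct item_list */

/* Push a new heap-allocated item; returns 0 on success, -1 on failure. */
int list_push(struct item_list *list, int value)
{
    struct item *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->value = value;
    SLIST_INSERT_HEAD(list, it, link);
    return 0;
}

/* Sum all values by traversing the list. */
long list_sum(struct item_list *list)
{
    long sum = 0;
    struct item *it;
    SLIST_FOREACH(it, list, link)
        sum += it->value;
    return sum;
}

/* Release every node; the list is empty afterwards. */
void list_clear(struct item_list *list)
{
    while (!SLIST_EMPTY(list)) {
        struct item *it = SLIST_FIRST(list);
        SLIST_REMOVE_HEAD(list, link);
        free(it);
    }
}
```

Because every allocation is reachable from the list head, list_clear is the single place where the memory is freed, which is the benefit the paragraph above describes.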


6) Mark-and-sweep garbage collector. Another approach is to use automatic garbage collection, which reduces the need to release memory manually. Reference counting releases memory as soon as it is no longer used. Garbage collection, in contrast, is triggered by specific events, such as a failed allocation or the amount of allocated memory reaching a certain threshold. The mark-and-sweep algorithm is one way to implement garbage collection. It first traverses all references to previously allocated memory in the heap and marks the objects that can be reached, then it sweeps away the ones that are not marked.
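
The two phases can be sketched with a toy heap. Everything below is an illustrative assumption rather than a real collector: objects hold at most one outgoing reference, the heap is a fixed table, and the roots are passed in explicitly instead of being discovered on the stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_OBJECTS 16

/* A toy heap object that can reference at most one other object. */
struct object {
    struct object *ref;   /* outgoing reference, or NULL */
    bool marked;
};

static struct object *heap[MAX_OBJECTS];  /* every live allocation */

/* Allocate an object and record it in the heap table. */
struct object *gc_new(struct object *ref)
{
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        if (heap[i] == NULL) {
            struct object *obj = calloc(1, sizeof *obj);
            if (obj != NULL) {
                obj->ref = ref;
                heap[i] = obj;
            }
            return obj;
        }
    }
    return NULL;  /* table full */
}

/* Mark phase: follow references starting from one root. */
static void mark(struct object *obj)
{
    while (obj != NULL && !obj->marked) {
        obj->marked = true;
        obj = obj->ref;
    }
}

/* Sweep phase: free everything unmarked and clear the marks on the
   survivors. Returns the number of objects freed. */
static size_t sweep(void)
{
    size_t freed = 0;
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        if (heap[i] == NULL)
            continue;
        if (heap[i]->marked) {
            heap[i]->marked = false;
        } else {
            free(heap[i]);
            heap[i] = NULL;
            freed++;
        }
    }
    return freed;
}

/* Run one collection with the given root set. */
size_t gc_collect(struct object **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    return sweep();
}
```

An object that is reachable only through another root object survives a collection, while an object that nothing references is freed by the sweep.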


Perhaps the best-known garbage collector for C is the Boehm-Demers-Weiser conservative garbage collector. The drawbacks of garbage collection are its performance overhead and the unpredictable pauses it introduces into the program. Another problem is memory obtained with malloc, which is not managed by the garbage collector and has to be managed manually.
Although unpredictable pauses are unacceptable in real-time systems, in many environments the advantages outweigh the disadvantages. As for performance, there are even claims that it can be better. The Mono project, the GNU Objective-C runtime, and the Irssi IRC client all use Boehm GC.

 
