Program flow and C code implementation for removing duplicate lines from a source file
I. Requirement Description
A file (the source file) contains several lines of records, some of which are duplicates. The records must be deduplicated and the result written to another file (the target file), so that no two records in the generated file have the same content. Blank lines between records in the source file are also dropped from the target file.
Two records are considered the same if either of the following holds:
1) Their characters are identical in number and content.
2) Their content is exactly the same after spaces and carriage return / line feed characters are removed.
Example:
Source file example:
ABCD
EFGH
abcd
AB CD
abcd
12345
Target file example:
ABCD
EFGH
abcd
12345
II. General Procedure
To implement deduplication, we use a linked list. The program reads the records from the source file one at a time and compares each with the records already stored in the linked list; a record that is not yet present is appended to the list. After all records in the source file have been read, the records in the linked list are written to the target file.
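The flow described above can be sketched in a compact form. This is only an illustration under stated assumptions, not the article's program: the names Node, RECORD_LEN, normalize, contains and dedup_stream are hypothetical, plain char buffers are used instead of UINT8, the comparison is an exact strcmp, and the input and output are passed in as already opened streams.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_LEN 256  /* assumed maximum record length, as in szContentLine[256] */

/* Hypothetical node type for this sketch. */
typedef struct Node {
    char line[RECORD_LEN];
    struct Node *next;
} Node;

/* Drop spaces, '\r' and '\n' from s in place. */
static void normalize(char *s)
{
    char *dst = s;
    for (const char *src = s; *src != '\0'; src++) {
        if (*src != ' ' && *src != '\r' && *src != '\n') {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}

/* Return 1 if line is already stored in the list, 0 otherwise. */
static int contains(const Node *head, const char *line)
{
    for (const Node *p = head; p != NULL; p = p->next) {
        if (strcmp(p->line, line) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Copy each distinct, non-empty record from in to out.
   Returns the number of records written, or -1 if allocation fails. */
int dedup_stream(FILE *in, FILE *out)
{
    Node *head = NULL, *tail = NULL;
    char buf[RECORD_LEN];
    int written = 0;
    int failed = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        normalize(buf);
        if (buf[0] == '\0' || contains(head, buf)) {
            continue;               /* blank line or duplicate */
        }
        Node *node = calloc(1, sizeof *node);   /* zeroed, so next == NULL */
        if (node == NULL) {
            failed = 1;
            break;
        }
        strcpy(node->line, buf);    /* buf holds at most RECORD_LEN - 1 characters */
        if (tail != NULL) {
            tail->next = node;      /* append after the current tail */
        } else {
            head = node;            /* first node becomes the head */
        }
        tail = node;
    }

    /* Write the list (unless allocation failed) and free it in one pass. */
    while (head != NULL) {
        Node *next = head->next;
        if (!failed) {
            fputs(head->line, out);
            fputc('\n', out);
            written++;
        }
        free(head);
        head = next;
    }
    return failed ? -1 : written;
}
```

A caller would open the two files with fopen, pass the streams to dedup_stream, and close them afterwards. Keeping a tail pointer makes each append constant time, while the duplicate check still walks the whole list for every record. Records longer than RECORD_LEN - 1 characters would be split by fgets in this sketch.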
The general flow of the program is shown in Figure 1.
Figure 1 General Procedure
III. Code Description
1. Linked list node definition
Each record is stored in one node of the linked list. The node structure is defined as follows:
typedef struct T_FileInfoStruct
{
    UINT8 szContentLine[256];
    struct T_FileInfoStruct *pNext;
} T_FileInfo;
2. Function that determines whether a record already exists in the linked list
Each time a record is read (except for the first one), the program must determine whether the record already exists in the linked list. The operation is simple: traverse the entire linked list and compare the record with each node. It is implemented by the IsInList function, whose code is as follows:
UINT32 IsInList(T_FileInfo *ptContentListHead, UINT8 *pszContentLine)
{
    T_FileInfo *pTmpInfo = ptContentListHead;

    if (ptContentListHead == NULL || pszContentLine == NULL)
    {
        printf("IsInList: input parameter(s) is NULL!\n");
        return 0;
    }

    while (pTmpInfo != NULL)
    {
        // strncmp compares only the first strlen(pszContentLine) characters,
        // so a record that is a prefix of a stored record also counts as existing
        if (strncmp(pszContentLine, pTmpInfo->szContentLine, strlen(pszContentLine)) == 0)   // Exists in the linked list
        {
            return 1;
        }
        pTmpInfo = pTmpInfo->pNext;
    }

    return 0;   // Does not exist in the linked list
}
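One detail of the strncmp call in IsInList is worth illustrating: because the length argument is strlen(pszContentLine), only that many leading characters are compared. The sketch below contrasts this prefix test with an exact comparison. The function names matches_prefix and matches_exact are hypothetical and do not appear in the article's program; whether exact matching is wanted depends on how strictly the first rule in the requirements is read.

```c
#include <string.h>

/* Prefix test, mirroring the strncmp call in IsInList: only the first
   strlen(candidate) characters of stored are compared. */
int matches_prefix(const char *candidate, const char *stored)
{
    return strncmp(candidate, stored, strlen(candidate)) == 0;
}

/* Exact test: the two strings must have the same length and content. */
int matches_exact(const char *candidate, const char *stored)
{
    return strcmp(candidate, stored) == 0;
}
```

With a stored record ABCDEFGH and a new record ABCD, matches_prefix reports a match and matches_exact does not, so a prefix test would drop ABCD as a duplicate.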
3. Removing the carriage return / line feed at the end of a record and the spaces inside it
The RmNewLine function removes the carriage return and line feed characters at the end of a record, and the RemoveSpaceInStr function removes the spaces in the record. For the code of these two functions, see the appendix.
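The two normalization steps can also be written in place, without temporary buffers. The following is a minimal sketch of that idea, not the appendix code: the names strip_line_end and remove_spaces are hypothetical, and plain char strings are assumed.

```c
#include <string.h>

/* Remove trailing '\r' and '\n' characters from s in place. */
void strip_line_end(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r')) {
        s[--len] = '\0';
    }
}

/* Remove every space character from s in place. */
void remove_spaces(char *s)
{
    char *dst = s;                       /* next position to write */
    for (const char *src = s; *src != '\0'; src++) {
        if (*src != ' ') {
            *dst++ = *src;               /* keep non-space characters */
        }
    }
    *dst = '\0';
}
```

Calling strip_line_end and then remove_spaces on a buffer holding "AB CD" followed by a carriage return and line feed leaves "ABCD" in the buffer. Because the write position never overtakes the read position, no second buffer is needed.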
IV. Program Compilation and Execution Results
Upload the RemoveRepeatLine.c program (for the detailed code, see the appendix) to a Linux machine, and compile it with the command "gcc -g -o RemoveRepeatLine RemoveRepeatLine.c" to generate the executable RemoveRepeatLine.
In this program, the source file is TestFile.txt, located in the zhouzhaoxiong/zzx/RemoveRepeatLine/TestFile directory under the current user's home directory. The generated target file is ResultFile.txt, located in the same directory as the source file.
The content of the source file TestFile.txt is:
ABCDEFGHAB CD1243565dfbfdjdgbsjn1234 ndsaghhn zvb vcawsswghEFGHzvb1234 nfedhh EFG H
Run the RemoveRepeatLine command to generate the result file ResultFile.txt, whose content is:
ABCDEFGH1243565dfbfdjdgbsjn1234ndsaghhnzvbvcawsswghfedhh
As can be seen, the program meets the requirements: duplicate lines and blank lines are removed.
V. Summary
A few notes on the program:
First, the linked list is widely used in real software development projects. You should be familiar with its usage, in particular how to insert data into a linked list, how to traverse the entire list, and how to clear it.
Second, in the WriteToFile function, the statement "memset(szContentBuf, 0x00, sizeof(szContentBuf));" must come before the statement "strncpy(szContentBuf, ptContentListHead->szContentLine, strlen(ptContentListHead->szContentLine));". Because strncpy is given exactly strlen bytes, it does not copy the terminating null character, so without the memset any characters left over from a longer previous record stay in the buffer and the content written to the file may not be what we want. You can remove the memset statement on your own machine and compare the generated result files to see the difference.
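The effect of the memset statement can be reproduced in isolation. The sketch below is an illustration only: copy_record and BUF_LEN are hypothetical names, and the buffer is a plain char array rather than UINT8. It copies a record the same way as the strncpy statement in WriteToFile, with or without clearing the buffer first.

```c
#include <string.h>

#define BUF_LEN 256  /* assumed buffer size, as in szContentBuf[256] */

/* Copy record into buf without its terminating '\0', as a strncpy call
   with strlen(record) as the length does. When clear_first is nonzero,
   the whole buffer is zeroed beforehand. */
void copy_record(char *buf, const char *record, int clear_first)
{
    if (clear_first) {
        memset(buf, 0x00, BUF_LEN);
    }
    strncpy(buf, record, strlen(record));
}
```

If a buffer that still holds "dfbfdjdgbsjn" receives "zvb" with clear_first set to 0, it ends up containing "zvbfdjdgbsjn"; with clear_first set to 1 it contains just "zvb". Some compilers warn about a strncpy length that depends on the source string, which is a hint that the terminator is not being copied.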
Third, in programs that generate and process ticket files, deduplicating the source ticket file is a problem that must be considered. The program in this article can serve as a reference for developers of such projects.
Appendix: complete program code
/**********************************************************************
* All Rights Reserved (C) 2015, Zhou Zhaoxiong.
*
* File name:         RemoveRepeatLine.c
* File ID:           none
* Content abstract:  remove duplicate lines in the source file
* Other description: none
* Current version:   V1.0
* Author:            Zhou Zhaoxiong
* Completion date:   20151209
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Data type redefinition
typedef unsigned char UINT8;
typedef signed   int  INT32;
typedef unsigned int  UINT32;

// Linked list node that stores the content of each line of the file
typedef struct T_FileInfoStruct
{
    UINT8 szContentLine[256];
    struct T_FileInfoStruct *pNext;
} T_FileInfo;

// Function declarations
void RemRepLineAndWriResFile(UINT8 *pszTestFile);
void RmNewLine(UINT8 *pInStr);
UINT32 IsInList(T_FileInfo *ptContentListHead, UINT8 *pszContentLine);
void WriteToFile(T_FileInfo *ptContentListHead);
void ClearList(T_FileInfo *ptContentListHead);
void RemoveSpaceInStr(UINT8 *pszStr, UINT32 iStrLen);

/**********************************************************************
* Function description: main function
* Input parameter:      none
* Output parameter:     none
* Return value:         0
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
INT32 main()
{
    UINT8 szTestFile[128] = {0};

    // Assemble the path of the test file
    snprintf(szTestFile, sizeof(szTestFile)-1, "%s/zhouzhaoxiong/zzx/RemoveRepeatLine/TestFile/TestFile.txt", getenv("HOME"));

    RemRepLineAndWriResFile(szTestFile);   // Call the function to complete the deduplication and file write operations

    return 0;
}

/**********************************************************************
* Function description: remove duplicate lines from the source file and
*                       write the deduplicated content to the result file
* Input parameter:      pszTestFile - test file name with path
* Output parameter:     none
* Return value:         none
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
void RemRepLineAndWriResFile(UINT8 *pszTestFile)
{
    UINT8  szContentLine[256] = {0};
    UINT32 iLineLen           = 0;
    UINT32 iRetVal            = 0;

    FILE *fp = NULL;
    T_FileInfo *ptContentListHead = NULL;
    T_FileInfo *ptContentListTail = NULL;
    T_FileInfo *ptCurrentContent  = NULL;

    if (pszTestFile == NULL)
    {
        printf("RemRepLineAndWriResFile: pszTestFile is NULL!\n");
        return;
    }

    printf("RemRepLineAndWriResFile: now, begin to process file %s\n", pszTestFile);

    if ((fp = fopen(pszTestFile, "r")) == NULL)
    {
        printf("RemRepLineAndWriResFile: open file %s failed!\n", pszTestFile);
        return;
    }
    else
    {
        ptContentListHead = NULL;
        ptContentListTail = NULL;

        while (feof(fp) == 0 && ferror(fp) == 0)
        {
            memset(szContentLine, 0x00, sizeof(szContentLine));
            if (fgets(szContentLine, sizeof(szContentLine), fp) == NULL)   // Read a line from the source file
            {
                printf("RemRepLineAndWriResFile: get line null, break.\n");
                break;
            }
            else
            {
                printf("RemRepLineAndWriResFile: get content line: %s\n", szContentLine);
            }

            RmNewLine(szContentLine);                                 // Remove the carriage return / line feed at the end of the string
            RemoveSpaceInStr(szContentLine, strlen(szContentLine));   // Remove the spaces in the string

            iLineLen = strlen(szContentLine);
            if (iLineLen == 0)   // If the valid length of this line is 0, continue with the next line
            {
                printf("RemRepLineAndWriResFile: the length of line is 0, continue to read the next content line.\n");
                continue;
            }

            if (ptContentListHead != NULL)   // Determine whether the current line already exists
            {
                iRetVal = IsInList(ptContentListHead, szContentLine);
                if (iRetVal == 1)   // Already exists
                {
                    printf("RemRepLineAndWriResFile: this content line has already existed.\n");
                    continue;
                }
            }

            // Add the current line to the linked list
            ptCurrentContent = (T_FileInfo *)malloc(sizeof(T_FileInfo));
            if (ptCurrentContent == NULL)
            {
                printf("RemRepLineAndWriResFile: exec malloc failed, memory may be not enough.\n");
                fclose(fp);
                ClearList(ptContentListHead);
                return;
            }

            // Zero the node so that szContentLine is null-terminated and pNext is NULL
            memset(ptCurrentContent, 0x00, sizeof(T_FileInfo));
            memcpy(ptCurrentContent->szContentLine, szContentLine, strlen(szContentLine));

            if (ptContentListHead == NULL)   // When the linked list is empty, the node becomes both head and tail
            {
                ptContentListHead = ptCurrentContent;
                ptContentListTail = ptCurrentContent;
            }
            else
            {
                if (ptContentListTail != NULL)   // Insert at the end of the linked list
                {
                    ptContentListTail->pNext = ptCurrentContent;
                    ptContentListTail = ptCurrentContent;
                }
            }
        }

        // Close the source file
        fclose(fp);
        fp = NULL;
    }

    // Write the deduplicated result to the file
    WriteToFile(ptContentListHead);

    // Clear the linked list
    ClearList(ptContentListHead);
    ptContentListHead = NULL;
}

/**********************************************************************
* Function description: remove the carriage return / line feed
*                       characters at the end of the string
* Input parameter:      pInStr - input string
* Output parameter:     none
* Return value:         none
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
void RmNewLine(UINT8 *pInStr)
{
    UINT32 iStrLen = 0;

    if (pInStr == NULL)
    {
        printf("RmNewLine: pInStr is NULL!\n");
        return;
    }

    iStrLen = strlen(pInStr);
    while (iStrLen > 0)
    {
        if (pInStr[iStrLen-1] == '\n' || pInStr[iStrLen-1] == '\r')
        {
            pInStr[iStrLen-1] = '\0';
        }
        else
        {
            break;
        }

        iStrLen--;
    }
}

/**********************************************************************
* Function description: determine whether the content of a line
*                       already exists in the linked list
* Input parameter:      ptContentListHead - head of the linked list
*                       pszContentLine    - content of the line
* Output parameter:     none
* Return value:         1 - exists, 0 - does not exist
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
UINT32 IsInList(T_FileInfo *ptContentListHead, UINT8 *pszContentLine)
{
    T_FileInfo *pTmpInfo = ptContentListHead;

    if (ptContentListHead == NULL || pszContentLine == NULL)
    {
        printf("IsInList: input parameter(s) is NULL!\n");
        return 0;
    }

    while (pTmpInfo != NULL)
    {
        // Only the first strlen(pszContentLine) characters are compared
        if (strncmp(pszContentLine, pTmpInfo->szContentLine, strlen(pszContentLine)) == 0)   // Exists in the linked list
        {
            return 1;
        }
        pTmpInfo = pTmpInfo->pNext;
    }

    return 0;   // Does not exist in the linked list
}

/**********************************************************************
* Function description: write the content to the file
* Input parameter:      ptContentListHead - file record linked list
* Output parameter:     none
* Return value:         none
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
void WriteToFile(T_FileInfo *ptContentListHead)
{
    FILE *fp = NULL;
    UINT8 szLocalFile[500]  = {0};
    UINT8 szContentBuf[256] = {0};

    if (ptContentListHead == NULL)
    {
        printf("WriteToFile: input parameter is NULL!\n");
        return;
    }

    snprintf(szLocalFile, sizeof(szLocalFile)-1, "%s/zhouzhaoxiong/zzx/RemoveRepeatLine/TestFile/ResultFile.txt", getenv("HOME"));
    fp = fopen(szLocalFile, "a+");   // "a+" appends, so an existing result file keeps its old content
    if (fp == NULL)
    {
        printf("WriteToFile: open local file failed, file=%s\n", szLocalFile);
        return;
    }

    while (ptContentListHead != NULL)
    {
        memset(szContentBuf, 0x00, sizeof(szContentBuf));
        strncpy(szContentBuf, ptContentListHead->szContentLine, strlen(ptContentListHead->szContentLine));

        printf("WriteToFile: LocalFile=%s, ContentBuf=%s\n", szLocalFile, szContentBuf);

        fputs(szContentBuf, fp);
        fputs("\n", fp);   // End the record with a line break
        fflush(fp);

        ptContentListHead = ptContentListHead->pNext;
    }

    fclose(fp);
    fp = NULL;

    return;
}

/**********************************************************************
* Function description: clear the linked list
* Input parameter:      ptContentListHead - linked list pointer
* Output parameter:     none
* Return value:         none
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
void ClearList(T_FileInfo *ptContentListHead)
{
    T_FileInfo *ptContentList = NULL;
    T_FileInfo *pTmpData      = NULL;

    if (ptContentListHead == NULL)
    {
        printf("ClearList: input parameter is NULL!\n");
        return;
    }

    ptContentList = ptContentListHead;
    while (ptContentList != NULL)
    {
        pTmpData = ptContentList;
        ptContentList = ptContentList->pNext;
        free(pTmpData);
    }
}

/**********************************************************************
* Function description: remove the spaces in the string
* Input parameter:      pszStr  - input string
*                       iStrLen - maximum length
* Output parameter:     none
* Return value:         none
* Other description:    none
* Modification date   Version   Modifier         Content
* -------------------------------------------------------------
* 20151209            V1.0      Zhou Zhaoxiong   created
***********************************************************************/
void RemoveSpaceInStr(UINT8 *pszStr, UINT32 iStrLen)
{
    UINT8  szResult[256] = {0};
    UINT8  szBuf[256]    = {0};
    UINT32 iLoopFlagSrc  = 0;
    UINT32 iLoopFlagDst  = 0;

    // Reject a NULL string or a length that does not fit in the local buffers
    if (pszStr == NULL || iStrLen >= sizeof(szBuf))
    {
        return;
    }

    memcpy(szBuf, pszStr, iStrLen);

    for (iLoopFlagSrc = 0; iLoopFlagSrc < strlen(szBuf); iLoopFlagSrc++)
    {
        if (szBuf[iLoopFlagSrc] != ' ')
        {
            szResult[iLoopFlagDst] = szBuf[iLoopFlagSrc];
            iLoopFlagDst++;
        }
    }

    szResult[iLoopFlagDst] = 0;   // Terminate the result string

    memcpy(pszStr, szResult, iStrLen);

    return;
}