What follows is my own rough understanding, written as a note to myself, so it may be somewhat disorganized. If I have misunderstood anything, please correct me.
Flume NG previously had no source that streams files in real time; it only provided the Spooling Directory (spooldir) source. This source monitors a specified directory, and a file placed in that directory must not be changed afterwards in any way (neither its modification time nor its size). The two errors in the title correspond to these two conditions, which are enforced in the org.apache.flume.client.avro.ReliableSpoolingFileEventReader.retireCurrentFile() method:
    File fileToRoll = new File(currentFile.get().getFile().getAbsolutePath());

    currentFile.get().getDeserializer().close();

    // Verify that spooling assumptions hold
    if (fileToRoll.lastModified() != currentFile.get().getLastModified()) {
      String message = "File has been modified since being read: " + fileToRoll + "\n"
          + "fileToRoll.lastModified(): " + fileToRoll.lastModified()
          + " currentFile.get().getLastModified(): " + currentFile.get().getLastModified();
      throw new IllegalStateException(message);
    }
    if (fileToRoll.length() != currentFile.get().getLength()) {
      String message = "File has changed size since being read: " + fileToRoll + "\n"
          + "fileToRoll.length(): " + fileToRoll.length()
          + " currentFile.get().getLength(): " + currentFile.get().getLength();
      throw new IllegalStateException(message);
    }
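To make the two checks concrete, here is a minimal, self-contained sketch of the same idea outside Flume: record a file's modification time and size when it is opened, then compare them when reading is finished. FileSnapshot and verifyUnchanged are hypothetical names of mine, not Flume classes; the messages only mimic the ones above.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class SpoolSnapshotCheck {

  /** Hypothetical holder for the metadata recorded when a file is opened for reading. */
  static final class FileSnapshot {
    final File file;
    final long lastModified;
    final long length;

    FileSnapshot(File file) {
      this.file = file;
      this.lastModified = file.lastModified();
      this.length = file.length();
    }
  }

  /** Same two checks as retireCurrentFile(): throws if the file changed since the snapshot. */
  static void verifyUnchanged(FileSnapshot snap) {
    File now = new File(snap.file.getAbsolutePath());
    if (now.lastModified() != snap.lastModified) {
      throw new IllegalStateException("File has been modified since being read: " + now);
    }
    if (now.length() != snap.length) {
      throw new IllegalStateException("File has changed size since being read: " + now);
    }
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("spool", ".log");
    f.deleteOnExit();
    try (FileWriter w = new FileWriter(f)) {
      w.write("first line\n");
    }
    FileSnapshot snap = new FileSnapshot(f);
    verifyUnchanged(snap); // nothing has changed yet, so this passes

    // Append more data, as if a copy were still in progress.
    try (FileWriter w = new FileWriter(f, true)) {
      w.write("more data\n");
    }
    try {
      verifyUnchanged(snap);
      System.out.println("no error");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because appending changes the size, the second check always throws; which of the two messages appears depends on whether the file system's timestamp resolution registers the change in modification time.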
The problem is that we never modified the files, yet this error was still reported. After reading the code, we found that the reader thread polls every 500 ms. When we copy a large file, the copy has not finished within 500 ms, so the error appears. Flume uses 500 ms by design because it assumes small files: a log file is written every few minutes or every few seconds, and in that case the problem does not arise.
In org.apache.flume.source.SpoolDirectorySource:

    // This polling interval is too short: when a file is large, copying it takes
    // more than 500 ms, and the "file modified" or "file size changed" error is reported.
    // Originally 500; raised here.
    private static final int POLL_DELAY_MS = 15000;

The default is 500 ms; I changed it to 15000 ms.
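The effect of the polling delay can be seen with a plain ScheduledExecutorService. This sketch assumes that SpoolDirectorySource runs its reader task with scheduleWithFixedDelay and POLL_DELAY_MS, which is how I read the Flume 1.x source; countPolls is a hypothetical helper that only counts how often a task runs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PollDelayDemo {

  /** Runs a no-op "poll" with a fixed delay for runForMs and returns how many times it ran. */
  static int countPolls(long pollDelayMs, long runForMs) throws InterruptedException {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger polls = new AtomicInteger();
    // First run starts immediately; each later run starts pollDelayMs after the previous one ends.
    executor.scheduleWithFixedDelay(polls::incrementAndGet, 0, pollDelayMs, TimeUnit.MILLISECONDS);
    Thread.sleep(runForMs);
    executor.shutdownNow();
    executor.awaitTermination(1, TimeUnit.SECONDS);
    return polls.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("500 ms delay, 2 s window: " + countPolls(500, 2000) + " polls");
    System.out.println("15000 ms delay, 2 s window: " + countPolls(15000, 2000) + " polls");
  }
}
```

A longer delay only makes polls rarer. It does not prevent a poll from landing in the middle of a copy.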
Having found the cause, I wanted to raise this value, so I set it to 15 seconds. After testing, however, the error still occurred.
Why? The reason is simple. Even if we set this value to 10,000 seconds, suppose a large file starts being copied at second 9,999. One second later the thread wakes up, finds the new file, and records its modification time and size; but because the file has not finished copying, both recorded values are wrong. When reading is done, the program checks the file's modification time and size to see whether it has changed. Since more of the file has been copied in by then, the error above is reported.
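The race can be reproduced without Flume. This sketch writes a file in chunks, records its size after the first chunk (standing in for the moment the spooling thread picks the file up), then finishes the "copy". MidCopyRace and snapshotGoesStale are hypothetical names; only the length is compared, because modification-time resolution varies between file systems.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class MidCopyRace {

  /**
   * Simulates a copy that is still in progress when the poller takes its snapshot.
   * Returns true if the recorded size no longer matches once the copy finishes,
   * i.e. the situation in which retireCurrentFile() would throw.
   */
  static boolean snapshotGoesStale(File target, int chunks, int chunkSize) throws IOException {
    byte[] chunk = new byte[chunkSize];
    // FileOutputStream is unbuffered, so length() sees each write immediately.
    try (FileOutputStream out = new FileOutputStream(target)) {
      out.write(chunk);                   // the first part of the copy has landed
      long seenLength = target.length();  // the poller wakes up here and records the size
      for (int i = 1; i < chunks; i++) {
        out.write(chunk);                 // the rest of the copy arrives after the snapshot
      }
      return target.length() != seenLength;
    }
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("copy", ".log");
    f.deleteOnExit();
    // prints "snapshot stale after 4-chunk copy: true"
    System.out.println("snapshot stale after 4-chunk copy: " + snapshotGoesStale(f, 4, 64 * 1024));
  }
}
```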
Knowing the root cause, the way to solve the problem fundamentally is to wait until the file has been completely copied and only then start reading it. So I looked for the code that picks the file to read:
    /**
     * Find and open the oldest file in the chosen directory. If two or more
     * files are equally old, the file name with lower lexicographical value is
     * returned. If the directory is empty, this will return an absent option.
     */
    private Optional<FileInfo> getNextFile() {
      /* Filter to exclude finished or hidden files */
      FileFilter filter = new FileFilter() {
        public boolean accept(File candidate) {
          String fileName = candidate.getName();
          if ((candidate.isDirectory()) ||
              (fileName.endsWith(completedSuffix)) ||
              (fileName.startsWith(".")) ||
              ignorePattern.matcher(fileName).matches()) {
            return false;
          }
          return true;
        }
      };
      // Get the files in spoolDirectory that meet the criteria
      List<File> candidateFiles = Arrays.asList(spoolDirectory.listFiles(filter));
      if (candidateFiles.isEmpty()) {
        return Optional.absent();
      } else {
        // Sort files by last modified time, then by name
        Collections.sort(candidateFiles, new Comparator<File>() {
          public int compare(File a, File b) {
            int timeComparison = new Long(a.lastModified()).compareTo(
                new Long(b.lastModified()));
            if (timeComparison != 0) {
              return timeComparison;
            } else {
              return a.getName().compareTo(b.getName());
            }
          }
        });
        // Every processed file is marked as completed, so take the first one
        File nextFile = candidateFiles.get(0);
        // Fix for the "file has been modified" error when transferring large files:
        // block here until the copy finishes or about 20 checks (~10 s) have passed
        this.checkFileCpIsOver(nextFile);
        try {
          // Roll the meta file, if needed
          String nextPath = nextFile.getPath();
          PositionTracker tracker =
              DurablePositionTracker.getInstance(metaFile, nextPath);
          if (!tracker.getTarget().equals(nextPath)) {
            tracker.close();
            deleteMetaFile();
            tracker = DurablePositionTracker.getInstance(metaFile, nextPath);
          }

          // Sanity check
          Preconditions.checkState(tracker.getTarget().equals(nextPath),
              "Tracker target %s does not equal expected filename %s",
              tracker.getTarget(), nextPath);

          ResettableInputStream in =
              new ResettableFileInputStream(nextFile, tracker,
                  ResettableFileInputStream.DEFAULT_BUF_SIZE, inputCharset,
                  decodeErrorPolicy);
          EventDeserializer deserializer = EventDeserializerFactory.getInstance(
              deserializerType, deserializerContext, in);

          return Optional.of(new FileInfo(nextFile, deserializer));
        } catch (FileNotFoundException e) {
          // File could have been deleted in the interim
          logger.warn("Could not find file: " + nextFile, e);
          return Optional.absent();
        } catch (IOException e) {
          logger.error("Exception opening file: " + nextFile, e);
          return Optional.absent();
        }
      }
    }
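The selection rule in getNextFile() can be written as a small pure function, which makes it easy to test. This is a sketch under assumptions: Candidate is a hypothetical stand-in for java.io.File, the code uses java.util.Optional instead of Guava's Optional, it takes the minimum instead of sorting the whole list (same result), and the suffix ".COMPLETED" and pattern "^$" in the example are what I believe Flume's defaults to be.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class NextFileSelector {

  /** Hypothetical stand-in for java.io.File: only the attributes getNextFile() looks at. */
  static final class Candidate {
    final String name;
    final long lastModified;
    final boolean directory;

    Candidate(String name, long lastModified, boolean directory) {
      this.name = name;
      this.lastModified = lastModified;
      this.directory = directory;
    }
  }

  /** Applies the same filter as getNextFile(), then picks the oldest file, breaking ties by name. */
  static Optional<String> pickNext(List<Candidate> files, String completedSuffix, Pattern ignorePattern) {
    return files.stream()
        .filter(c -> !c.directory)                           // skip directories
        .filter(c -> !c.name.endsWith(completedSuffix))      // skip already processed files
        .filter(c -> !c.name.startsWith("."))                // skip hidden files
        .filter(c -> !ignorePattern.matcher(c.name).matches())
        .min(Comparator.comparingLong((Candidate c) -> c.lastModified)
            .thenComparing(c -> c.name))
        .map(c -> c.name);
  }

  public static void main(String[] args) {
    List<Candidate> files = new ArrayList<>();
    files.add(new Candidate("b.log", 2000L, false));
    files.add(new Candidate("a.log", 2000L, false));
    files.add(new Candidate("old.log.COMPLETED", 1000L, false));
    files.add(new Candidate(".a.log.tmp", 500L, false));
    files.add(new Candidate("archive", 100L, true));
    // prints Optional[a.log]
    System.out.println(pickNext(files, ".COMPLETED", Pattern.compile("^$")));
  }
}
```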
In this method, the line that calls candidateFiles.get(0) is where the file to read is picked. Before my change, that file was opened right away without checking whether it had finished copying; the call to checkFileCpIsOver(nextFile) on the line after it is the method I added.
Here is the method:
    /**
     * @Title: checkFileCpIsOver
     * @Description: Check whether the file copy is complete.
     * @param file the file to check
     * @return void
     * @throws IllegalStateException if the file is still changing after about 10 seconds
     */
    private void checkFileCpIsOver(File file) {
      long modified = file.lastModified(); // current modification time of the file
      long length = file.length();         // current size of the file
      try {
        Thread.sleep(1000); // wait 1 second
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
      }
      File currentFile = new File(file.getAbsolutePath());
      int count = 0; // number of checks; after more than 20 (about 10 seconds) throw an exception
      while (currentFile.lastModified() != modified || currentFile.length() != length) {
        if (count > 20) {
          String message = "File copy is taking too long. Please check whether the copy failed!\n"
              + "File at: " + file.getAbsolutePath() + "\n"
              + "File current length is: " + currentFile.length();
          throw new IllegalStateException(message); // the original created the exception but never threw it
        }
        count++;
        modified = currentFile.lastModified();
        length = currentFile.length();
        try {
          Thread.sleep(500); // wait 500 milliseconds
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        currentFile = new File(file.getAbsolutePath());
      }
      // The file stopped changing between two checks, so the copy is assumed complete
    }
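A variant of checkFileCpIsOver written with java.nio.file, sketched under my own assumptions: the interval and the number of checks are parameters, and it returns false instead of throwing, so a caller could skip the file and try again on the next poll rather than failing. waitUntilStable is a hypothetical name, not part of Flume.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyCompletionCheck {

  /**
   * Polls a file until its size and modification time are unchanged across one
   * interval. Returns true when the file looks stable, or false if it was still
   * changing after maxChecks intervals.
   */
  static boolean waitUntilStable(Path file, long intervalMs, int maxChecks)
      throws IOException, InterruptedException {
    long size = Files.size(file);
    long modified = Files.getLastModifiedTime(file).toMillis();
    for (int i = 0; i < maxChecks; i++) {
      Thread.sleep(intervalMs);
      long newSize = Files.size(file);
      long newModified = Files.getLastModifiedTime(file).toMillis();
      if (newSize == size && newModified == modified) {
        return true; // nothing changed during a full interval
      }
      size = newSize;
      modified = newModified;
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    Path f = Files.createTempFile("spool", ".log");
    Files.write(f, "complete file\n".getBytes(StandardCharsets.UTF_8));
    System.out.println("stable: " + waitUntilStable(f, 100, 5));
    Files.deleteIfExists(f);
  }
}
```

Like the method above, this is a heuristic: if a copy stalls for longer than one interval, the file will look complete even though it is not.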
With the fix in place, I rebuilt the modified classes into a jar, replaced the jar in production with it, and verified the fix again.
Flume SpoolDirectorySource: "File has been modified since being read" and "File has changed size since being read" errors