当前位置:  开发笔记 > Android > 正文

阅读Excel Open XML忽略了空白单元格

如何解决《阅读ExcelOpenXML忽略了空白单元格》经验,为你挑选了4个好方法。

我在这里使用公认的解决方案将excel表转换为数据表.如果我有"完美"数据,这可以正常工作,但如果我的数据中间有一个空白单元格,那么它似乎会在每列中放入错误的数据.

我想这是因为在下面的代码中:

row.Descendants().Count()

是填充单元格的数量(不是所有列)AND:

GetCellValue(spreadSheetDocument, row.Descendants().ElementAt(i));

似乎找到下一个填充的单元格(不一定是该索引中的内容)所以如果第一列为空并且我调用ElementAt(0),它将返回第二列中的值.

这是完整的解析代码.

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}

amurra.. 42

这是有道理的,因为Excel不会为null的单元格存储值.如果使用Open XML SDK 2.0 Productivity Tool打开文件并将XML遍历到单元级别,您将看到只有具有数据的单元格才会出现在该文件中.

您可以选择在要遍历的单元格区域中插入空白数据,或者以编程方式找出已跳过的单元格并适当调整索引.

我在单元格引用A1和C1中创建了一个带有字符串的excel文档示例.然后我在Open XML Productivity Tool中打开了excel文档,这里是存储的XML:


  
    0
  
  
    1
  

在这里,您将看到数据对应于第一行,并且只为该行保存了两个单元格的数据.保存的数据对应于A1和C1,并且不保存具有空值的单元格.

要获得所需的功能,您可以像上面那样遍历单元格,但是您需要检查Cell引用的值,并确定是否已跳过任何单元格.为此,您需要两个实用程序函数来从单元格引用中获取列名称,然后将该列名称转换为基于零的索引:

    private static List Letters = new List() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    /// 
    /// Given a cell name, parses the specified cell to get the column name.
    /// 
    /// Address of the cell (ie. B2)
    /// Column Name (ie. B)
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// 
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). 
    /// A length of three can be implemented when needed.
    /// 
    /// Column Name (ie. A or AB)
    /// Zero based index if the conversion was successful; otherwise null
    public static int? GetColumnIndexFromName(string columnName)
    {
        int? columnIndex = null;

        string[] colLetters = Regex.Split(columnName, "([A-Z]+)");
        colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray();

        if (colLetters.Count() <= 2)
        {
            int index = 0;
            foreach (string col in colLetters)
            {
                List col1 = colLetters.ElementAt(index).ToCharArray().ToList();
                int? indexValue = Letters.IndexOf(col1.ElementAt(index));

                if (indexValue != -1)
                {
                    // The first letter of a two digit column needs some extra calculations
                    if (index == 0 && colLetters.Count() == 2)
                    {
                        columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26);
                    }
                    else
                    {
                        columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue;
                    }
                }

                index++;
            }
        }

        return columnIndex;
    }

然后,您可以迭代单元格并检查单元格引用与columnIndex的比较.如果它小于那么你将空白数据添加到tempRow,否则只需读入单元格中包含的值.(注意:我没有测试下面的代码,但一般的想法应该有帮助):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants())
{
   // Gets the column index of the cell with data
   int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

   if (columnIndex < cellColumnIndex)
   {
      do
      {
         tempRow[columnIndex] = //Insert blank data here;
         columnIndex++;
      }
      while(columnIndex < cellColumnIndex);
    }
    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
       Console.Write(tempRow[i].ToString());
    }
    columnIndex++;
}

请参阅@ amurra的答案[here](http://stackoverflow.com/questions/4923359/how-do-i-have-open-xml-spreadsheet-uncollapse-cells-in-a-spreadsheet)查看该定义"信件"清单. (4认同)


Waylon Flinn.. 19

这是一个IEnumerable应该执行你想要的,编译和单元测试的实现.

    ///returns an empty cell when a blank cell is encountered
    ///
    public IEnumerator GetEnumerator()
    {
        int currentCount = 0;

        // row is a class level variable representing the current
        // DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

以下是它依赖的功能:

    /// 
    /// Given a cell name, parses the specified cell to get the column name.
    /// 
    /// Address of the cell (ie. B2)
    /// Column Name (ie. B)
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// 
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// 
    /// Column Name (ie. A or AB)
    /// Zero based index if the conversion was successful
    /// thrown if the given string
    /// contains characters other than uppercase letters
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

把它扔进课堂并尝试一下.



1> amurra..:

这是有道理的,因为Excel不会为null的单元格存储值.如果使用Open XML SDK 2.0 Productivity Tool打开文件并将XML遍历到单元级别,您将看到只有具有数据的单元格才会出现在该文件中.

您可以选择在要遍历的单元格区域中插入空白数据,或者以编程方式找出已跳过的单元格并适当调整索引.

我在单元格引用A1和C1中创建了一个带有字符串的excel文档示例.然后我在Open XML Productivity Tool中打开了excel文档,这里是存储的XML:


  
    0
  
  
    1
  

在这里,您将看到数据对应于第一行,并且只为该行保存了两个单元格的数据.保存的数据对应于A1和C1,并且不保存具有空值的单元格.

要获得所需的功能,您可以像上面那样遍历单元格,但是您需要检查Cell引用的值,并确定是否已跳过任何单元格.为此,您需要两个实用程序函数来从单元格引用中获取列名称,然后将该列名称转换为基于零的索引:

    private static List Letters = new List() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    /// 
    /// Given a cell name, parses the specified cell to get the column name.
    /// 
    /// Address of the cell (ie. B2)
    /// Column Name (ie. B)
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// 
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). 
    /// A length of three can be implemented when needed.
    /// 
    /// Column Name (ie. A or AB)
    /// Zero based index if the conversion was successful; otherwise null
    public static int? GetColumnIndexFromName(string columnName)
    {
        int? columnIndex = null;

        string[] colLetters = Regex.Split(columnName, "([A-Z]+)");
        colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray();

        if (colLetters.Count() <= 2)
        {
            int index = 0;
            foreach (string col in colLetters)
            {
                List col1 = colLetters.ElementAt(index).ToCharArray().ToList();
                int? indexValue = Letters.IndexOf(col1.ElementAt(index));

                if (indexValue != -1)
                {
                    // The first letter of a two digit column needs some extra calculations
                    if (index == 0 && colLetters.Count() == 2)
                    {
                        columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26);
                    }
                    else
                    {
                        columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue;
                    }
                }

                index++;
            }
        }

        return columnIndex;
    }

然后,您可以迭代单元格并检查单元格引用与columnIndex的比较.如果它小于那么你将空白数据添加到tempRow,否则只需读入单元格中包含的值.(注意:我没有测试下面的代码,但一般的想法应该有帮助):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants())
{
   // Gets the column index of the cell with data
   int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

   if (columnIndex < cellColumnIndex)
   {
      do
      {
         tempRow[columnIndex] = //Insert blank data here;
         columnIndex++;
      }
      while(columnIndex < cellColumnIndex);
    }
    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
       Console.Write(tempRow[i].ToString());
    }
    columnIndex++;
}


请参阅@ amurra的答案[here](http://stackoverflow.com/questions/4923359/how-do-i-have-open-xml-spreadsheet-uncollapse-cells-in-a-spreadsheet)查看该定义"信件"清单.

2> Waylon Flinn..:

这是一个IEnumerable应该执行你想要的,编译和单元测试的实现.

    ///returns an empty cell when a blank cell is encountered
    ///
    public IEnumerator GetEnumerator()
    {
        int currentCount = 0;

        // row is a class level variable representing the current
        // DocumentFormat.OpenXml.Spreadsheet.Row
        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for ( ; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

以下是它依赖的功能:

    /// 
    /// Given a cell name, parses the specified cell to get the column name.
    /// 
    /// Address of the cell (ie. B2)
    /// Column Name (ie. B)
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// 
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// 
    /// Column Name (ie. A or AB)
    /// Zero based index if the conversion was successful
    /// thrown if the given string
    /// contains characters other than uppercase letters
    public static int ConvertColumnNameToNumber(string columnName)
    {
        Regex alpha = new Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }

把它扔进课堂并尝试一下.



3> Tim Schmelte..:

这是Waylon回答的略微修改版本,也依赖于其他答案.它将他的方法封装在一个类中.

我变了

IEnumerator GetEnumerator()

IEnumerable GetRowCells(Row row)

这是类,您不需要实例化它,它只是作为实用程序类:

public class SpreedsheetHelper
{
    ///returns an empty cell when a blank cell is encountered
    ///
    public static IEnumerable GetRowCells(Row row)
    {
        int currentCount = 0;

        foreach (DocumentFormat.OpenXml.Spreadsheet.Cell cell in
            row.Descendants())
        {
            string columnName = GetColumnName(cell.CellReference);

            int currentColumnIndex = ConvertColumnNameToNumber(columnName);

            for (; currentCount < currentColumnIndex; currentCount++)
            {
                yield return new DocumentFormat.OpenXml.Spreadsheet.Cell();
            }

            yield return cell;
            currentCount++;
        }
    }

    /// 
    /// Given a cell name, parses the specified cell to get the column name.
    /// 
    /// Address of the cell (ie. B2)
    /// Column Name (ie. B)
    public static string GetColumnName(string cellReference)
    {
        // Match the column name portion of the cell name.
        var regex = new System.Text.RegularExpressions.Regex("[A-Za-z]+");
        var match = regex.Match(cellReference);

        return match.Value;
    }

    /// 
    /// Given just the column name (no row index),
    /// it will return the zero based column index.
    /// 
    /// Column Name (ie. A or AB)
    /// Zero based index if the conversion was successful
    /// thrown if the given string
    /// contains characters other than uppercase letters
    public static int ConvertColumnNameToNumber(string columnName)
    {
        var alpha = new System.Text.RegularExpressions.Regex("^[A-Z]+$");
        if (!alpha.IsMatch(columnName)) throw new ArgumentException();

        char[] colLetters = columnName.ToCharArray();
        Array.Reverse(colLetters);

        int convertedValue = 0;
        for (int i = 0; i < colLetters.Length; i++)
        {
            char letter = colLetters[i];
            int current = i == 0 ? letter - 65 : letter - 64; // ASCII 'A' = 65
            convertedValue += current * (int)Math.Pow(26, i);
        }

        return convertedValue;
    }
}

现在,您可以通过这种方式获取所有行的单元格:

// skip the part that retrieves the worksheet sheetData
IEnumerable rows = sheetData.Descendants();
foreach(Row row in rows)
{
    IEnumerable cells = SpreedsheetHelper.GetRowCells(row);
    foreach (Cell cell in cells)
    {
         // skip part that reads the text according to the cell-type
    }
}

它将包含所有单元格,即使它们是空的.



4> jaccso..:

看我的实施:

  Row[] rows = worksheet.GetFirstChild()
                .Elements()
                .ToArray();

  string[] columnNames = rows.First()
                .Elements()
                .Select(cell => GetCellValue(cell, document))
                .ToArray();

  HeaderLetters = ExcelHeaderHelper.GetHeaderLetters((uint)columnNames.Count());

  if (columnNames.Count() != HeaderLetters.Count())
  {
       throw new ArgumentException("HeaderLetters");
  }

  IEnumerable> cellValues = GetCellValues(rows.Skip(1), columnNames.Count(), document);

//Here you can enumerate through the cell values, based on the cell index the column names can be retrieved.

使用此类收集HeaderLetters:

    private static class ExcelHeaderHelper
    {
        public static string[] GetHeaderLetters(uint max)
        {
            var result = new List();
            int i = 0;
            var columnPrefix = new Queue();
            string prefix = null;
            int prevRoundNo = 0;
            uint maxPrefix = max / 26;

            while (i < max)
            {
                int roundNo = i / 26;
                if (prevRoundNo < roundNo)
                {
                    prefix = columnPrefix.Dequeue();
                    prevRoundNo = roundNo;
                }
                string item = prefix + ((char)(65 + (i % 26))).ToString(CultureInfo.InvariantCulture);
                if (i <= maxPrefix)
                {
                    columnPrefix.Enqueue(item);
                }
                result.Add(item);
                i++;
            }
            return result.ToArray();
        }
    }

辅助方法是:

    private static IEnumerable> GetCellValues(IEnumerable rows, int columnCount, SpreadsheetDocument document)
    {
        var result = new List>();
        foreach (var row in rows)
        {
            List cellValues = new List();
            var actualCells = row.Elements().ToArray();

            int j = 0;
            for (int i = 0; i < columnCount; i++)
            {
                if (actualCells.Count() <= j || !actualCells[j].CellReference.ToString().StartsWith(HeaderLetters[i]))
                {
                    cellValues.Add(null);
                }
                else
                {
                    cellValues.Add(GetCellValue(actualCells[j], document));
                    j++;
                }
            }
            result.Add(cellValues);
        }
        return result;
    }


private static string GetCellValue(Cell cell, SpreadsheetDocument document)
{
    bool sstIndexedcell = GetCellType(cell);
    return sstIndexedcell
        ? GetSharedStringItemById(document.WorkbookPart, Convert.ToInt32(cell.InnerText))
        : cell.InnerText;
}

private static bool GetCellType(Cell cell)
{
    return cell.DataType != null && cell.DataType == CellValues.SharedString;
}

private static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
    return workbookPart.SharedStringTablePart.SharedStringTable.Elements().ElementAt(id).InnerText;
}

该解决方案处理共享单元项(SST索引单元).

推荐阅读
有风吹过best
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有